Floating Point Math

Your language isn’t broken, it’s doing floating point math. Computers can only natively store integers, so they need some way of representing decimal numbers. This representation is not perfectly accurate. This is why, more often than not, 0.1 + 0.2 != 0.3.
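As a quick illustration, here is a minimal Python sketch of the effect. It assumes Python's built-in float, which is an IEEE 754 double on essentially every platform; the variable name `total` is only for illustration.

```python
# Python's float is an IEEE 754 double on common platforms (assumed here).
total = 0.1 + 0.2

print(total)              # 0.30000000000000004
print(total == 0.3)       # False

# Printing more digits shows that neither operand is stored exactly.
print(format(0.1, ".20f"))    # 0.10000000000000000555
print(format(total, ".20f"))  # 0.30000000000000004441
```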

Why does this happen?

It’s actually rather interesting. A base-10 system (like ours) can only cleanly express fractions whose denominators have no prime factors other than those of the base. The prime factors of 10 are 2 and 5. So 1/2, 1/4, 1/5, 1/8, and 1/10 can all be expressed cleanly because their denominators use only prime factors of 10. In contrast, 1/3, 1/6, 1/7 and 1/9 are all repeating decimals because their denominators include a prime factor of 3 or 7.
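The prime-factor rule can be checked mechanically. The sketch below is one way to do it, not a canonical algorithm; the function name `terminates` is made up for this example, and it assumes a positive denominator.

```python
from math import gcd

def terminates(numerator: int, denominator: int, base: int = 10) -> bool:
    """Return True if numerator/denominator has a finite expansion in `base`.

    Assumes denominator > 0. The name and signature are illustrative only.
    """
    # Reduce the fraction so only the essential denominator remains.
    denominator //= gcd(numerator, denominator)
    # Repeatedly strip every factor the denominator shares with the base.
    shared = gcd(denominator, base)
    while shared > 1:
        while denominator % shared == 0:
            denominator //= shared
        shared = gcd(denominator, base)
    # Anything left over is a prime factor the base does not have.
    return denominator == 1

for d in (2, 3, 4, 5, 6, 7, 8, 9, 10):
    print(f"1/{d}: {'clean' if terminates(1, d) else 'repeating'} in base 10")
```

Passing `base=2` applies the same rule to binary, where `terminates(1, 10, base=2)` is False.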

In binary (or base-2), the only prime factor is 2, so you can only cleanly express fractions whose denominator has only 2 as a prime factor. In binary, 1/2, 1/4, 1/8 would all be expressed cleanly, while 1/5 or 1/10 would be repeating fractions. So 0.1 and 0.2 (1/10 and 1/5), while clean decimals in a base-10 system, are repeating fractions in the base-2 system the computer uses. Because the computer keeps only a finite number of binary digits, each of these values is rounded when stored. When you perform math on the rounded values, you end up with leftovers which carry over when you convert the computer’s base-2 (binary) number into a more human-readable base-10 representation.
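To make the repeating pattern and the leftover visible, here is a small Python sketch. The helper `binary_digits` is a hypothetical name for this example; `Decimal(0.1)` is used only to display the exact value of the double nearest to 0.1.

```python
from decimal import Decimal
from fractions import Fraction

def binary_digits(value: Fraction, count: int) -> str:
    """First `count` binary digits after the point of a fraction in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value)        # the integer part is the next digit, 0 or 1
        digits.append(str(bit))
        value -= bit            # keep only the fractional part
    return "".join(digits)

# 1/10 in binary: the group 0011 repeats forever after the leading 0.
first_digits = binary_digits(Fraction(1, 10), 20)
print(first_digits)             # 00011001100110011001

# The exact value the computer actually stores for 0.1 (slightly too large).
stored = Decimal(0.1)
print(stored)
print(stored > Decimal("0.1"))  # True
```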

Below are some examples of sending .1 + .2 to standard output in a variety of languages.

Read more:

Wikipedia

IEEE 754

Stack Overflow

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Language<br>Code<br>Result

PowerShell uses the double type by default, but because it runs on .NET it has the same types as C#. As a result, the Decimal type can be used, either directly by providing the type name [decimal] or via the suffix d.

More about that in the C# section.

ABAP

WRITE / CONV f( '.1' + '.2' ).

and

WRITE / CONV decfloat16( '.1' + '.2' ).

0.30000000000000004

and

0.3

APL

0.1 + 0.2

and

⎕PP ← 17<br>0.1 + 0.2

and

0.3 = 0.1 + 0.2

and

⎕CT←0<br>0.3 = 0.1 + 0.2

and

⎕FR ← 1287<br>⎕PP ← 34<br>0.1 + 0.2

and

⎕FR ← 1287<br>⎕DCT ← 0<br>0.3 = 0.1 + 0.2

0.3

and

0.30000000000000004

and

1

and

0

and

0.3

and

1

APL has a default printing precision of 10 significant digits. Setting ⎕PP to 17 shows the error; however, 0.3 = 0.1 + 0.2 is still true (1) because there’s a default comparison tolerance of about 10⁻¹⁴. Setting ⎕CT to 0 shows the inequality. Dyalog APL also supports 128-bit decimal numbers (activated by setting the float representation, ⎕FR, to 1287, i.e. 128-bit decimal), where even setting the decimal comparison tolerance (⎕DCT) to zero still makes the equation hold true. Multi-precision floats, unlimited precision rationals, and ball arithmetic are available in NARS2000.
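The idea of a comparison tolerance carries over to other languages. The Python sketch below is only a loose analogue, not APL's actual comparison rule: it uses `math.isclose` with a relative tolerance, and the constant `TOLERANCE` and the helper `tolerant_equal` are names assumed for this example.

```python
import math

# Assumed value, chosen to mirror the roughly 10^-14 default mentioned above.
TOLERANCE = 1e-14

def tolerant_equal(x: float, y: float, tolerance: float = TOLERANCE) -> bool:
    """Compare two floats using a relative tolerance (illustrative helper)."""
    return math.isclose(x, y, rel_tol=tolerance, abs_tol=0.0)

print(tolerant_equal(0.3, 0.1 + 0.2))        # True: the gap is far below 1e-14
print(tolerant_equal(0.3, 0.1 + 0.2, 0.0))   # False: zero tolerance means exact equality
```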

Ada

with Ada.Text_IO; use Ada.Text_IO;<br>procedure Sum is<br>A : Float := 0.1;<br>B : Float := 0.2;<br>C : Float := A + B;<br>begin<br>Put_Line(Float'Image(C));<br>Put_Line(Float'Image(0.1 + 0.2));<br>end Sum;

3.00000E-01<br>3.00000E-01

AutoHotkey

MsgBox, % 0.1 + 0.2

0.3

AutoIt

ConsoleWrite(0.1 + 0.2)

0.3

C

#include <stdio.h>

int main(int argc, char** argv) {<br>printf("%.17f\n", .1 + .2);<br>return 0;<br>}

0.30000000000000004

C#

Console.WriteLine("{0:R}", .1 + .2);

and

Console.WriteLine("{0:R}", .1f + .2f);

and

Console.WriteLine("{0:R}", .1m + .2m);

0.30000000000000004

and

0.3

and

0.3

C# has support for 128-bit decimal numbers, with 28-29 significant digits of precision. Their range, however, is smaller than that of both the single and double precision floating point types: roughly ±7.9e28. Decimal literals are denoted with the m suffix.
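For comparison, the same idea can be sketched in Python. This is an analogue, not the .NET type: Python's `decimal` module is a separate implementation whose default context happens to use 28 significant digits, and the precision is set explicitly here so the example does not depend on that default.

```python
from decimal import Decimal, getcontext

# Set the precision explicitly; 28 digits is also Python's default context.
getcontext().prec = 28

# Build the operands from strings so they start out as exact decimal values.
total = Decimal("0.1") + Decimal("0.2")
print(total)                     # 0.3
print(total == Decimal("0.3"))   # True

# Decimal arithmetic is still finite: 1/3 is rounded to 28 significant digits.
third = Decimal(1) / Decimal(3)
print(third)
```

Constructing the operands from float values instead, as in `Decimal(0.1)`, would carry the binary rounding error into the decimal result.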

C++

#include <iomanip><br>#include <iostream>

int main() {<br>std::cout << std::setprecision(17) << 0.1 + 0.2 << std::endl;<br>return 0;<br>}

0.30000000000000004

COBOL

IDENTIFICATION DIVISION.<br>PROGRAM-ID. HFPMATH.<br>DATA DIVISION.<br>WORKING-STORAGE SECTION.<br>77 X-32 USAGE COMP-1.<br>77 X-64 USAGE COMP-2.<br>PROCEDURE DIVISION.<br>COMPUTE X-32 = 0.1 + 0.2.<br>DISPLAY X-32.<br>COMPUTE X-64 = 0.1 + 0.2.<br>DISPLAY X-64.<br>MOVE 0.1 TO X-64.<br>DISPLAY X-64.

0.30000001<br>0.3<br>0.09999999999999999

This COBOL example encodes floating-point numbers using the hexadecimal floating-point (HFP) format found on IBM mainframes instead of IEEE 754, leading to different but analogous computation inaccuracies.

Clojure

(+ 0.1 0.2)

0.30000000000000004

Clojure supports arbitrary precision and ratios. (+ 0.1M 0.2M) returns 0.3M, while (+ 1/10 2/10) returns 3/10.
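Exact ratios are not unique to Clojure. As a rough Python counterpart, the standard `fractions` module keeps the numerator and denominator as integers, so no rounding occurs until the value is converted to a float. The variable name `ratio_sum` is only for illustration.

```python
from fractions import Fraction

# 1/10 + 2/10 with exact rational arithmetic; the result is reduced automatically.
ratio_sum = Fraction(1, 10) + Fraction(2, 10)

print(ratio_sum)                     # 3/10
print(ratio_sum == Fraction(3, 10))  # True

# Converting to float rounds once, to the double nearest 3/10.
print(float(ratio_sum))              # 0.3
```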

ColdFusion

<cfset foo = .1 + .2><br><cfoutput>#foo#</cfoutput>

0.3

Common Lisp

(+ .1 .2)

and

(+ 1/10 2/10)

and

(+ 0.1d0 0.2d0)

and

(- 1.2 1.0)

0.3

and

3/10

and

0.30000000000000004d0

and

0.20000005

CL’s spec doesn’t actually even require radix-2 floats (let alone specifically 32-bit singles and 64-bit doubles), but the high-performance implementations all seem to use IEEE floats with the usual sizes. This was tested on SBCL and ECL in particular.
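The single-float results above can be reproduced outside Lisp. The Python sketch below assumes IEEE 754 32-bit singles and emulates them by rounding doubles through the `struct` module; the helper name `to_single` is hypothetical. Rounding after each operation stands in for true single-precision arithmetic, which is adequate for one addition or subtraction.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

# (- 1.2 1.0) with single floats: 1.2 is stored slightly too large.
difference = to_single(to_single(1.2) - to_single(1.0))
print(format(difference, ".8g"))   # 0.20000005

# (+ .1 .2) with single floats rounds to the same single as 0.3 itself,
# which is why single-precision examples often print a clean 0.3.
total = to_single(to_single(0.1) + to_single(0.2))
print(total == to_single(0.3))     # True
```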

Crystal

puts 0.1 + 0.2

and

puts 0.1_f32 + 0.2_f32

0.30000000000000004

and

0.3

D

import std.stdio;

void main(string[] args)...
