ISSUE #1450

Single precision floating point formatting issues

open · by dhylands · opened 2015-08-31 · updated 2024-08-28
docs

While investigating #1435 I ran into some other formatting issues. The output isn't necessarily wrong, but it doesn't match CPython either (and I am concerned that outputting more data than requested could cause a buffer overflow).

1 - Outputting more data than requested.
On pyboard:

>>> '{:.4e}'.format(9.99999e-5)
'1.00000e-04'

On CPython:

>>> '{:.4e}'.format(9.99999e-5)
'1.0000e-04'

2 - Making the wrong choice about when to use e notation:
pyboard:

>>> '%5.g' % 9.99999
'   10'

CPython:

>>> '%5.g' % 9.99999
'1e+01'

3 - Failing to normalize when rounding:
pyboard:

>>> '%5.e' % 9.99999
'10e+00'

CPython:

>>> '%5.e' % 9.99999
'1e+01'
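
The three cases above can be collected into a small regression check. This is only a sketch: the expected strings are the CPython results quoted in this issue, and the names `CASES` and `mismatches` are made up for illustration.

```python
# Each entry pairs the formatted result with the CPython output quoted above.
CASES = [
    ("{:.4e}".format(9.99999e-5), "1.0000e-04"),  # 1: no extra digits
    ("%5.g" % 9.99999, "1e+01"),                  # 2: switch to e notation
    ("%5.e" % 9.99999, "1e+01"),                  # 3: normalize after rounding
]


def mismatches(cases=CASES):
    """Return the (got, expected) pairs that differ from CPython."""
    return [(got, want) for got, want in cases if got != want]


for got, want in CASES:
    status = "ok  " if got == want else "FAIL"
    print(status, repr(got), "expected", repr(want))
```

Running the same file on the target board and on CPython shows at a glance which of the three behaviours still differ.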
PULL REQUEST #17444

py/formatfloat: Improve accuracy of float formatting code.

merged · by yoctopuce · opened 2025-06-06 · updated 2025-07-31
py-core

Summary

Following discussions in PR #16666, this pull request updates the float formatting code to reduce the repr reversibility error, i.e. the percentage of valid floating point numbers that do not parse back to the same number when formatted by repr.

The baseline before this commit is an error rate of ~46%, when using double-precision floats.

This new code is available in two flavors, based on a preprocessor definition:

  • In the simplest version, it reduces the error down to ~40%, using an integer representation of the decimal mantissa rather than working on floats. It is also slightly faster, and improves the rounding in some conditions.
  • In the most complete version, it reduces the error down to ~5%. This extra code works by iterative refinement, and makes the code slightly slower than CPython when tested on ports/unix.
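
To illustrate why rounding on an integer decimal mantissa helps, here is a minimal Python sketch. It is not the C code from this pull request: the function name `format_e` is hypothetical, and it relies on `fractions.Fraction` for exact arithmetic, which an embedded implementation would not do. The point it shows is that once the mantissa is an integer, a rounding carry (9.99999 becoming 10) is easy to detect and renormalize, which avoids output such as '10e+00'.

```python
import math
from fractions import Fraction


def format_e(x, prec):
    """Format a positive finite float like '%.<prec>e' via an integer mantissa."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("this sketch handles positive finite values only")
    exact = Fraction(x)                # exact binary value of the float
    exp10 = math.floor(math.log10(x))  # estimate, corrected in the loop
    while True:
        # Integer mantissa with prec + 1 significant digits, rounded half-even.
        mant = round(exact * Fraction(10) ** (prec - exp10))
        if mant >= 10 ** (prec + 1):   # rounding carried, or estimate too low
            exp10 += 1
        elif mant < 10 ** prec:        # estimate too high
            exp10 -= 1
        else:
            break
    digits = str(mant)
    body = digits[0] + ("." + digits[1:] if prec else "")
    return "%se%+03d" % (body, exp10)


print(format_e(9.99999, 0))     # 1e+01
print(format_e(9.99999e-5, 4))  # 1.0000e-04
```

Because the carry check works on integers, the exponent is bumped before any digit is emitted, so the mantissa always keeps a single leading digit.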

Testing

The new formatting code was tested for reversibility using the code provided by Damien in PR #16666.
A variant using the formats {:.7g}, {:.8g} and {:.9g} was used for single-precision testing.
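
The script from PR #16666 is not reproduced here. As a rough idea of what a reversibility measurement can look like, the sketch below draws random 64-bit patterns, formats each finite double, parses the string back, and reports the percentage that changed. The names `random_double` and `reversibility_error` are hypothetical, and the code is written for CPython (it uses `random.Random`), so it may need adapting to run under MicroPython.

```python
import math
import random
import struct


def random_double(rng):
    """Return a finite double drawn uniformly from all 64-bit patterns."""
    while True:
        bits = rng.getrandbits(64)
        (x,) = struct.unpack("<d", struct.pack("<Q", bits))
        if math.isfinite(x):
            return x


def reversibility_error(fmt="{!r}", samples=10000, seed=0):
    """Percentage of sampled doubles that do not survive format + parse."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(samples):
        x = random_double(rng)
        if float(fmt.format(x)) != x:
            failures += 1
    return 100.0 * failures / samples


print("repr error: %.1f%%" % reversibility_error("{!r}"))
```

On CPython the repr error is expected to be 0%, since its repr always round-trips. Passing a shorter format such as "{:.5g}" gives a non-zero figure, which is a quick way to confirm that the harness detects failures.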

Compatibility with CPython on the various float formats was tested by comparing the output using the following code:

for mant in [34567, 76543]:
    for exp in range(-16, 16):
        print("Next number: %de%d" % (mant, exp))
        num = mant * (10.0**exp)
        for mode in ['e', 'f', 'g']:
            maxprec = 16
            # MicroPython has a length limit in objfloat.c
            if mode == 'f' and 6 + exp + maxprec > 31:
                maxprec = 31 - 6 - exp
            for prec in range(1, maxprec):
                fmt = "%." + str(prec) + mode
                print("%5s: " % fmt, fmt % num)

The integration tests also uncovered some corner cases in the new code, which have been fixed.
For single-precision floats, some test cases had to be adapted:

  • float_format_ints is tapping into an ill-defined partial digit of the mantissa (the 10th), which is not available in single-precision floats with the new code due to integer limitations. So the display range has been updated accordingly.
  • similarly, float_struct_e uses a 15-digit representation which is meaningless on single-precision floats. A separate version for double-precision has been made instead.
  • in float_format_ints, there is one test case specific to single-precision floats which verifies that the largest possible mantissa value 16777215 can be used to store that exact number and retrieve it as-is. Unfortunately the rounding in the simplest version of the new algorithm makes it display as a slightly different number. This would cause the CI test to fail on single-precision floats when the improved algorithm is not enabled.
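
The significance of 16777215 can be checked from CPython by rounding values through a single-precision `struct` field. The helper name `to_float32` is made up for this sketch. 16777215 is 2**24 - 1, i.e. all 24 bits of a single-precision mantissa set, so it is stored exactly, whereas 2**24 + 1 needs 25 bits and gets rounded.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


largest = 2**24 - 1  # 16777215, all 24 mantissa bits set

print(to_float32(float(largest)) == largest)      # True: stored exactly
print(to_float32(float(2**24 + 1)) == 2**24 + 1)  # False: rounded to 2**24
print("%.8g" % to_float32(float(largest)))        # 16777215 on CPython
```

A formatter that prints this value as anything other than 16777215 has lost information in its own rounding, since the stored float is exact.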

Trade-offs and Alternatives

It is unclear at this point whether the simplest version of this improvement is worth the change:

  • going from 46% error to 40% error in double precision is not a big improvement.
  • there is no improvement for single-precision
  • the new code is only marginally faster

The full version of the enhancement makes a much bigger difference in terms of precision, for both double-precision and single-precision floats, but it adds about 20% overhead to conversion time and makes the code a bit bigger.

Looking forward to reading your feedback...

Edit 1: See https://github.com/micropython/micropython/pull/17444#issuecomment-2979968660 for updates on accuracy results
Edit 2: Updated values in https://github.com/micropython/micropython/pull/17444#issuecomment-2987116217
