Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strconv: inaccurate string to float64 conversion ParseFloat #36657

Closed
g7r opened this issue Jan 20, 2020 · 20 comments
Closed

strconv: inaccurate string to float64 conversion ParseFloat #36657

g7r opened this issue Jan 20, 2020 · 20 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@g7r
Copy link
Contributor

g7r commented Jan 20, 2020

What version of Go are you using (go version)?

$ go version
version go1.13 linux/amd64

Does this issue reproduce with the latest release?

Not sure. But https://play.golang.org/p/izkaZ-XBog5 reproduces the bug.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Here's the code: https://play.golang.org/p/izkaZ-XBog5

I'm converting string "1090544144181609348835077142190" to float64 with strconv.ParseFloat("1090544144181609348835077142190", 64) and it returns incorrect value.

What did you expect to see?

It is expected that 1090544144181609419040633126912 should be returned as the closest value representable by float64. The difference between string value and correct float64 value is -70205555984722 (see https://www.wolframalpha.com/input/?i=1090544144181609348835077142190-1090544144181609419040633126912)

What did you see instead?

strconv.ParseFloat(...) actually returns 1090544144181609278303144771584. The difference between string value and returned float64 value is 70531932370606 (see https://www.wolframalpha.com/input/?i=1090544144181609348835077142190-1090544144181609278303144771584)

Additional info

Go compiler correctly converts float64 literals to float64 constants, as you can see in Playground example https://play.golang.org/p/izkaZ-XBog5

@randall77
Copy link
Contributor

I can confirm. It seems like the fast path in strconv/atof.go:627, particularly the AssignDecimal call, is to blame. If I turn the optimize flag off, the bug goes away.
This has been an issue since at least 1.11.
@griesemer

@randall77 randall77 added this to the Go1.15 milestone Jan 20, 2020
@ianlancetaylor ianlancetaylor added NeedsFix The path to resolution is known, but the work has not been done. release-blocker labels Jan 20, 2020
@odeke-em odeke-em changed the title Inaccurate string to float64 conversion in strconv.ParseFloat strconv: inaccurate string to float64 conversion ParseFloat Jan 22, 2020
@smasher164
Copy link
Member

This behavior has existed since go 1, so it wasn't introduced in any recent release. Digging into extfloat now.

@ALTree
Copy link
Member

ALTree commented Jan 23, 2020

The ParseFloat doc says:

If s is well-formed and near a valid floating-point number, ParseFloat returns the nearest floating-point number rounded using IEEE754 unbiased rounding.

So it appears that it doesn't unconditionally guarantee to always return the closest IEEE754 number... only if the argument is "near a valid floating-point number" (whatever that means). I believe that your number does not qualify as "near a valid floating-point number".

@smasher164
Copy link
Member

Also, the CL associated with #15672 solves this.

@andybons
Copy link
Member

This has been like this forever. Removing release-blocker label.

@odeke-em
Copy link
Member

We didn’t cover much progress on the related CL unfortunately because folks were swamped. I shall move this to the Unplanned milestone given that the parallel issue is in the Unplanned milestone too.

@odeke-em odeke-em modified the milestones: Go1.15, Unplanned May 13, 2020
@lemire
Copy link

lemire commented Oct 23, 2020

@nigeltao Given your recent work, is this still an issue?

@JoshStrobl
Copy link

This is still an issue as of 1.15.3. I'm still losing precision. I have a float string of -0.63571428571428568 and it's getting truncated to -0.6357142857142857, losing 1428568.

@odeke-em
Copy link
Member

odeke-em commented Oct 23, 2020 via email

@ALTree
Copy link
Member

ALTree commented Oct 23, 2020

The reproducer in the first post still prints

1.0905441441816094e+30 is not equal to 1.0905441441816093e+30
expected: 0x462b8779f2474dfb     7748782196739579p+47 1090544144181609419040633126912.000000000
actual  : 0x462b8779f2474dfa     7748782196739578p+47 1090544144181609278303144771584.000000000

at tip, so yes, this is still an issue.

go version devel +c3fe874f25 Fri Oct 23 18:02:16 2020 +0000 linux/amd64

@lemire
Copy link

lemire commented Oct 23, 2020

Right because the input string has many more than 17 digits. I was being hopeful.

@smasher164
Copy link
Member

IIRC, CL 170078 (implement Ryū-like algorithm for atof) did address this issue, but it was abandoned in favor of CL 260858 (use the Eisel-Lemire ParseFloat algorithm).

@lemire
Copy link

lemire commented Oct 23, 2020

I have basically ported the Go code more or less as-is to C++ from @nigeltao's own C port... and I don't have this issue. See https://github.com/lemire/fast_float/pull/6

The code follows the same logic, so it ought to be work. I mean... there is no big conceptual gap.

@lemire
Copy link

lemire commented Oct 23, 2020

Ok. So the problem goes away entirely if you comment out this third fast path in atof.go...

		// Try another fast path.
		/*ext := new(extFloat)
		if ok := ext.AssignDecimal(mantissa, exp, neg, trunc, &float64info); ok {

			b, ovf := ext.floatBits(&float64info)
			f = math.Float64frombits(b)
			if ovf {
				err = rangeError(fnParseFloat, s)
			}
			return f, n, err
		}*/

Looking at ext.AssignDecimal, it says...

	// Errors (in the "numerical approximation" sense, not the "Go's error
	// type" sense) in this function are measured as multiples of 1/8 of a ULP,
	// so that "1/2 of a ULP" can be represented in integer arithmetic.
	//
	// The C++ double-conversion library also uses this 8x scaling factor:
	// https://github.com/google/double-conversion/blob/f4cb2384/double-conversion/strtod.cc#L291
	// but this Go implementation has a bug, where it forgets to scale other
	// calculations (further below in this function) by the same number. The
	// C++ implementation does not forget:
	// https://github.com/google/double-conversion/blob/f4cb2384/double-conversion/strtod.cc#L366
	//
	// Scaling the "errors" in the "is mant_extra in the range (halfway ±
	// errors)" check, but not scaling the other values, means that we return
	// ok=false (and fall back to a slower atof code path) more often than we
	// could. This affects performance but not correctness.
	

The part where it says we return ok=false (and fall back to a slower atof code path) more often than we could. This affects performance but not correctness might be incorrect, as illustrated by the example (1090544144181609348835077142190). In that case, it appears to me that double-conversion does reject the fast path but not the Go code?

Maybe it is helpful to have some perspective. Prior to the recent changes made by @nigeltao, this third fast path was the second fast path. I would argue that it was absolutely needed.

Now you have effectively three fast paths followed by a fallback.

The first and second fast paths will catch nearly everything in the real world. Like 99% of anything that has no more than 19 digits.

So what the third fallback does is protect against the fallback. But that may not be useful:

  1. Nobody that cares about performance will generate floats with more than 19 digits of accuracy (mantissa). It is an awkward thing to do in the best of times.
  2. People who do throw long mantissa at the parser, would probably prefer to get correct rounding given a choice. That is, they are probably not in the business of ingesting large CSV files filled with 100 digit numbers. That's really an edge case.
  3. Having more layers can sometimes make things worse. It certainly increases the surface area for maintenance.
  4. The fallback is not that slow. I have not benchmarked the Go version, but the direct C++ port runs at an acceptable speed. You may be rightfully concerned with denial of service attacks where someone throws numbers with very long strings at you... but that's not an effective attack... in fact, beyond a certain point, the parsing speeds goes up. My C++ port of the fallback runs at 2 GB/s for streams of long numbers. (Asymptotically, parsing is O(N).)

I think point 4 is maybe key. Assuming that the fallback is "fast enough", and assuming that it is correct (my port of it sure appears correct), then when not go to it directly?

@gopherbot
Copy link

Change https://golang.org/cl/264518 mentions this issue: strconv: remove extfloat.go atof code path

@nigeltao
Copy link
Contributor

Ok. So the problem goes away entirely if you comment out this third fast path in atof.go...

Coincidentally, that's what https://golang.org/cl/264518 does, which I sent out yesterday.

@odeke-em
Copy link
Member

Thank you for the change and foresight @nigeltao! Thank you @lemire for the rationale in #36657 (comment) and also for the algorithms!

@lemire, it would be a delight to have you as a Go contributor/reviewer, and Nigel's CL could use the rationale you gave for points 1 and 4 in his commit message, but would be nice if you could post those up and reflect in the record, and have you also approve that change :)
@nigeltao please take a look at @lemire's rationale in the various points and perhaps also include that reasoning that he raised in those points.

@nigeltao
Copy link
Contributor

@lemire said:

Nobody that cares about performance will generate floats with more than 19 digits of accuracy (mantissa). It is an awkward thing to do in the best of times.

OTOH, separately on https://go-review.googlesource.com/c/go/+/264518/3#message-7fda4a76b2d909bb21794b84f9c3d30f4730adb5 @remyoudompheng said:

It's not for malicious inputs. It's for inputs which are too long. In my environment I have old apps which produce non-shortest floats and they are a pain.

In the same thread, @remyoudompheng also mentions that https://golang.org/cl/264677 should help with the more-than-19-digits case.

@nigeltao
Copy link
Contributor

concerned with denial of service attacks where someone throws numbers with very long strings at you

In any case, if it's through a JSON API, there might be more worrisome DoS attacks where decoding lots of empty JSON arrays or JSON objects might be more expensive (due to mallocs and garbage collection) than decoding long JSON numbers.

@lemire
Copy link

lemire commented Oct 23, 2020

Always happy to help.

@nigeltao I think https://golang.org/cl/264677 is what I had in mind in an independently sent private message. As I also privately communicated, we could tighten further the second fast path if we wanted to catch even more cases (e.g., subnormals and round-to-even cases) but it is only worth doing if too many fall through.

@golang golang locked and limited conversation to collaborators Oct 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests