How do you get an impossible-to-fix (completely) bug? Just follow me...

flabdablet

@ben_lubar said:

It's JSON, so numeric literals are floats by definition.

Cite, please.

ben_lubar

RaceProUK

@flabdablet said:

Cite, please.

JSON = JavaScript Object Notation
JavaScript has no integer type, only float

flabdablet

The spec allows for data interchange of numeric literals in scientific notation. It does not mandate the use of scientific notation for any particular instance of data interchange, and is deliberately silent on the question of the types that the producers and consumers of the data so interchanged are to use to represent it.

JSON can transfer numeric values that cannot be represented even approximately in any IEEE754 type; the only common native type that can guarantee to consume any JSON number without data loss is the string. So in practice, some reasonable agreement on numeric type is going to need to exist ahead of time between the producer and consumer of any given chunk of JSON.
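To make that concrete, here is a minimal Go sketch, assuming the standard library's encoding/json as the consumer; `decodeAny` and `decodeInt64` are hypothetical helper names. The same literal, 2^53+1, survives intact when the consumer has agreed to read it as an int64, but is silently rounded when decoded into an untyped `interface{}` (which encoding/json fills with a float64). A strict integer reader also rejects `1e3`, even though it denotes an integer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeAny lets encoding/json pick the type: JSON numbers become float64.
func decodeAny(lit string) (interface{}, error) {
	var v interface{}
	err := json.Unmarshal([]byte(lit), &v)
	return v, err
}

// decodeInt64 reflects a prior agreement that this value is an integer.
func decodeInt64(lit string) (int64, error) {
	var n int64
	err := json.Unmarshal([]byte(lit), &n)
	return n, err
}

func main() {
	const lit = "9007199254740993" // 2^53 + 1
	v, _ := decodeAny(lit)
	n, _ := decodeInt64(lit)
	fmt.Printf("as interface{}: %.0f\n", v) // rounded to 9007199254740992
	fmt.Printf("as int64:       %d\n", n)   // exact
	_, err := decodeInt64("1e3")
	fmt.Println("1e3 as int64:", err) // rejected, though it denotes 1000
}
```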

There are traditional forms for representing integers and traditional forms for representing floats, and although the JSON spec itself is silent on the distinction, there is nothing to stop that distinction forming part of such a producer-consumer agreement.

I can see no good argument for emitting a number in scientific notation when asked to JSON-encode an integer. It will almost never be shorter than the equivalent plain digits notation, and it causes exactly the kind of pain you're now experiencing, by sabotaging the use of even such minimal type inference as the JSON spec makes possible.

ben_lubar

@flabdablet said:

[scientific notation] will never be shorter than the equivalent plain digits notation

Go Playground - The Go Programming Language
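The linked playground isn't reproduced here. As a minimal sketch of the same point, assuming Go's encoding/json as the producer: for float64 values of 1e21 and above it switches to exponent form, and there the scientific form is far shorter than plain digits. `jsonForm` and `plainForm` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// jsonForm returns what encoding/json emits for a float64.
func jsonForm(f float64) string {
	b, err := json.Marshal(f)
	if err != nil { // only NaN and ±Inf fail
		return "error: " + err.Error()
	}
	return string(b)
}

// plainForm writes the same value as plain digits with no exponent.
func plainForm(f float64) string {
	return strconv.FormatFloat(f, 'f', -1, 64)
}

func main() {
	for _, f := range []float64{123456, 1e21, 1e300} {
		j, p := jsonForm(f), plainForm(f)
		fmt.Printf("%s (%d chars) vs %d plain digits\n", j, len(j), len(p))
	}
}
```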

flabdablet

@RaceProUK said:

JavaScript has no integer type, only float

Javascript is not Java, and JSON is not Javascript. JSON has its own spec. Let us all now bow our heads for today's reading. Please open your prayer books to the Introduction.

JSON is a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications. JSON was inspired by the object literals of JavaScript aka ECMAScript as defined in the ECMAScript Language Specification, third Edition. It does not attempt to impose ECMAScript’s internal data representations on other programming languages. Instead, it shares a small subset of ECMAScript’s textual representations with all other programming languages.
JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits. All programming languages know how to make sense of digit sequences even if they disagree on internal representations. That is enough to allow interchange.

Emphasis mine. Amen.

ben_lubar

json package - encoding/json - Go Packages

flabdablet

Number is not a common native type. Near as I can tell it's not even native in Go.

ben_lubar

encoding/json.Number is a string in Go.
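A minimal sketch of putting it to use, assuming Go's encoding/json: with `Decoder.UseNumber`, a number decoded into an `interface{}` arrives as a json.Number, i.e. its original digit string, so nothing is lost until the consumer decides which machine type to convert it to. `decodeNumber` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeNumber decodes a single JSON value with UseNumber enabled, so a
// number arrives as json.Number (a string type) instead of a float64.
func decodeNumber(doc string) (json.Number, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber()
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	n, ok := v.(json.Number)
	if !ok {
		return "", fmt.Errorf("expected a number, got %T", v)
	}
	return n, nil
}

func main() {
	n, err := decodeNumber("18446744073709551616") // 2^64: too big for int64 and uint64
	fmt.Println(n, err)                             // digits preserved as written
	if _, err := n.Int64(); err != nil {
		fmt.Println("Int64:", err) // conversion is the consumer's choice, and can fail
	}
}
```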

flabdablet

Are we still arguing, or are we singing in glorious harmony now? I can't tell.

RaceProUK

@flabdablet said:

JSON is agnostic about numbers

Like I said, JSON has no integer type, only float. And since it's all strings anyway, it doesn't have to adhere to IEEE754.

flabdablet

@RaceProUK said:

JSON has no integer type, only float.

It doesn't have either. The spec defines a syntax for numeric literals, and explicitly states that it does not itself have anything to say about their interpretation as machine types:

JSON is agnostic about numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal. That can make interchange between different programming languages difficult. JSON instead offers only the representation of numbers that humans use: a sequence of digits.

314159265358979323846264338327950288419716939937510582097494459230 is a valid JSON numeric literal. So is that "really" integer, or float, or fixed point?

RaceProUK

Since the format allows scientific notation, it's float. But like I said, as it's a string, it doesn't have to be an IEEE754 float.

flabdablet

Actually, it isn't. It's a fixed-point representation of an approximation of pi, correct to 65 decimal places.
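To illustrate that the interpretation lives in the producer/consumer agreement rather than in the literal, here is a sketch in Go using math/big. The helper names and the scale of 65 are assumptions for illustration: the same digit string is read as an exact integer, as an exact fixed-point value with 65 decimal places, and (lossily) as a float64.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

const piLiteral = "314159265358979323846264338327950288419716939937510582097494459230"

// asInteger reads the digit string as an exact integer.
func asInteger(lit string) (*big.Int, error) {
	n, ok := new(big.Int).SetString(lit, 10)
	if !ok {
		return nil, fmt.Errorf("not an integer literal: %q", lit)
	}
	return n, nil
}

// asFixedPoint reads the same digits as an exact fixed-point value whose
// least significant digit is worth 10^-scale (an out-of-band agreement).
func asFixedPoint(lit string, scale int64) (*big.Rat, error) {
	n, err := asInteger(lit)
	if err != nil {
		return nil, err
	}
	den := new(big.Int).Exp(big.NewInt(10), big.NewInt(scale), nil)
	return new(big.Rat).SetFrac(n, den), nil
}

func main() {
	n, _ := asInteger(piLiteral)
	r, _ := asFixedPoint(piLiteral, 65)
	f, _ := strconv.ParseFloat(piLiteral, 64) // lossy: only ~17 significant digits survive
	fmt.Println(n)
	fmt.Println(r.FloatString(20))
	fmt.Println(f)
}
```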

The format allows scientific notation, but that particular literal is not in scientific notation.

RaceProUK

Translation: "I shall disprove your point by ignoring it"

flabdablet

Floating point is a fixed-size machine representation for numeric quantities that allows you to place the point between any arbitrary pair of digits and/or scale the digits by a power of the base. Scientific notation is not the same thing as floating point. Scientific notation can be used to represent floating point quantities. It can also be used to represent integers (Ben provides some examples above).

RaceProUK

@flabdablet said:

Floating point is a fixed-size machine representation

But we're not talking about that, are we? We're talking about a string representation in which the decimal point can float. Hence it is floating-point.

flabdablet

@RaceProUK said:

We're talking about a string representation in which the decimal point can float. Hence it is floating-point.

But the string I asked you for an opinion on doesn't even have a decimal point. It's not in scientific notation. The spec allows but does not mandate the use of a decimal point, floating or otherwise, and it allows but does not mandate the use of the e convention to separate mantissa from exponent. You could design a program that used fully spec-compliant JSON to transfer integers or fixed-point numbers from place to place with never a decimal point in sight.

When the spec itself claims explicitly to be agnostic about numbers, why is this so difficult to take at face value?

RaceProUK

The presence of a decimal point may be optional; doesn't make it any less able to float.

And I like how you are still ignoring the fact I said it's not an IEEE754 float.

flabdablet

@RaceProUK said:

ignoring the fact I said it's not an IEEE754 float.

Sorry, did you need that acknowledged? I didn't realize it was in dispute.

@RaceProUK said:

JSON = JavaScript Object Notation
JavaScript has no integer type, only float

Javascript numbers, for what it's worth, are 64-bit IEEE754 floats per spec.

RaceProUK

Am I not allowed to alter my stance in light of new information now?

flabdablet

@RaceProUK said:

The presence of a decimal point may be optional; doesn't make it any less able to float.

It can't float if it isn't there. These are digit strings, as the spec makes completely clear, and producers and consumers of those digit strings are allowed to ascribe any numeric meaning to them that makes sense.

The difficulty that Ben is having is not due to any fault in JSON or its spec. It's due to the fact that the JSON producer and the JSON consumer in his JSON library of choice have got different rules for converting machine ints to and from JSON.

JSON is perfectly capable of transferring machine ints of any bit width; it is not bound by the limitations that 64 bit IEEE754 floats place on Javascript.

It is also capable of transferring machine floats, though because the JSON representation has a decimal mantissa you might need your encoder to emit a few bogus trailing digits just to make sure the consumer ends up with the same bits you encoded.
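A small Go sketch of that trade-off, assuming strconv as the encoder: 17 significant digits always suffice for a float64 to round-trip, but precision -1 asks for the shortest digit string that still parses back to the same bits (which is also what Go's encoding/json emits). The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// shortest is the fewest decimal digits that still parse back to the same float64.
func shortest(f float64) string { return strconv.FormatFloat(f, 'g', -1, 64) }

// seventeen always uses 17 significant digits: enough for any float64 to
// round-trip, but often more digits than the shortest form needs.
func seventeen(f float64) string { return strconv.FormatFloat(f, 'g', 17, 64) }

// roundTrips reports whether s parses back to exactly f.
func roundTrips(s string, f float64) bool {
	g, err := strconv.ParseFloat(s, 64)
	return err == nil && g == f
}

func main() {
	a, b := 0.1, 0.2 // variables, so the sum is computed in float64, not as an exact constant
	for _, f := range []float64{0.1, a + b, 1.0 / 3} {
		fmt.Printf("%-22s %-22s %v %v\n", shortest(f), seventeen(f),
			roundTrips(shortest(f), f), roundTrips(seventeen(f), f))
	}
}
```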

It is even capable of transferring implicit type information about whether any particular JSON number is the result of encoding a machine int or a machine float, using the same well-tested convention that compilers have been using since FORTRAN: a numeric literal that includes an explicit decimal point and/or exponent delimiter represents a float, and one without represents an int.
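A sketch of that convention as a consumer-side rule, in Go. `classify` is a hypothetical helper; it leans on json.Number to keep the literal's original spelling, then picks int64 or float64 from the presence of '.', 'e' or 'E'.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// classify decodes one JSON number and types it with the FORTRAN-style rule:
// a literal containing '.', 'e' or 'E' is a float, anything else an integer.
func classify(doc string) (interface{}, error) {
	var n json.Number
	if err := json.Unmarshal([]byte(doc), &n); err != nil {
		return nil, err
	}
	if strings.ContainsAny(n.String(), ".eE") {
		return n.Float64()
	}
	return n.Int64()
}

func main() {
	for _, doc := range []string{"42", "42.0", "1e3", "-7"} {
		v, err := classify(doc)
		fmt.Printf("%-5s -> %T %v %v\n", doc, v, v, err)
	}
}
```

Under this rule `1e3` comes out as a float64, which is why a producer that writes integers in scientific notation breaks consumers that rely on it.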

JSON does not inherit any of the limitations that occasionally make Javascript's lack of an explicit int type annoying. It is perfectly capable of representing any arbitrary machine int without loss. To say that integers are not in the JSON spec, or that any JSON number is "really" a float, is completely misleading.

RaceProUK

@flabdablet said:

It can't float if it isn't there

The presence of a decimal point is implicit, just like all those zeroes that people don't write down because it's pointless. Just because it isn't written down, doesn't mean it can't float.

flabdablet

@RaceProUK said:

The presence of a decimal point is implicit

Agreed. It's just that the implicit place is not necessarily to the right of all the digits. If I want to create a JSON producer and a JSON consumer that transfer fixed-point numbers, I can encode those in JSON without explicit decimal points; all that's required is that my producer and my consumer agree on the place value of the least significant JSON digit.

flabdablet

@RaceProUK said:

Just because it isn't written down, doesn't mean it can't float.

I think the difficulty we're having here is that you conflate two ideas I see as distinct. In your world, it seems to me, scientific notation or even the possibility of scientific notation are the same thing as floating point.

In my world, floating point is a specific encoding technique that allows precision to be traded off against representable magnitude within a fixed-length stored numeric; scientific notation per se implies no such tradeoff, because it can be used to write down numbers with arbitrary precision and magnitude, and scientific notation is also almost always written in base 10.

The JSON spec for numbers describes a specific Unicode encoding for strings with or without scientific-notation semantics representing numbers with or without fractional parts. You could use either plain or scientific-notation forms to exchange machine floats or machine ints.

I think the distinction between scientific notation and floating-point representation is important, because keeping it in mind helps avoid pitfalls involving floating-point numbers that are unexpectedly unwieldy to represent exactly in scientific notation, or numbers with short scientific notation representations that can't be exactly represented as machine floats.
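A sketch of both pitfalls at once in Go, using math/big (`exactDecimal` is a hypothetical helper): 0.1 has a three-character decimal form, but the nearest float64 is a different number whose exact decimal expansion runs to 55 places. Every finite float64 is a fraction with a power-of-two denominator 2^k, and since 1/2^k = 5^k/10^k, exactly k decimal places are needed to write it out.

```go
package main

import (
	"fmt"
	"math/big"
)

// exactDecimal writes out the exact value stored in a finite float64.
func exactDecimal(f float64) (string, error) {
	r := new(big.Rat).SetFloat64(f) // exact conversion; nil for NaN or ±Inf
	if r == nil {
		return "", fmt.Errorf("not a finite number: %v", f)
	}
	// r is normalized, so its denominator is 2^k; 1/2^k = 5^k/10^k needs
	// exactly k decimal places.
	k := r.Denom().BitLen() - 1
	return r.FloatString(k), nil
}

func main() {
	s, _ := exactDecimal(0.1)
	fmt.Println(s)
	r := new(big.Rat).SetFloat64(0.1)
	fmt.Println(r.Cmp(big.NewRat(1, 10)) == 0) // false: the stored value is not 1/10
}
```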

If my wretched insistence on hammering that distinction here has ruffled your quills, I apologise.

RaceProUK

@flabdablet said:

If my wretched insistence on hammering that distinction here has ruffled your quills, I apologise.

S'alright; I thought we were just having the normal sort of WTDWTF argument, where both sides are being pedantic for the sake of pedantry

RobFreundlich

@flabdablet said:

About 50em makes for comfortable reading of descriptive text. So if you have a Text Label whose width is not set explicitly, not constrained by that of its container, and not inferrable via alignment with other controls above or below, then 50em in whatever font its text is in would be a reasonable default.

That's the arbitrary number we went with, I think.

@flabdablet said:

@RobFreundlich said:
when you word-wrap a Text Label, its height changes. So you might end up having to change the height of the container. This causes a recalculation of the container, and then a re-layout, which, if the container is constrained in some other way ... can cause scrollbars to appear... That affects the available width and height of the Text Label, which causes another recalculation of the wrapping point of the text, which causes another recalculation, which blah blah blah.

If that were my problem, I'd be doing all my layouts not against the container itself, but against an invisible box that would still fit in the container even if the container were to grow both X and Y scrollbars.

THANK YOU!!!!!!! I was so deep into trying to get this perfect that I never thought of that possibility when I did the original implementation. And, as it happens, this week is when I am going to start tackling the list of bug reports that have come in since it went out the door, and your suggestion just might save me a huge headache.

flabdablet

I'll be here all week, tip your waitress etc.

abarker

@RaceProUK said:

The presence of a decimal point is implicit,

However, implicit and explicit decimal points can have different meanings. For example, since the decimal point in 1000 is only implicit, the number has only one significant digit. If the decimal point were explicit (1000.), the number would be presumed to have 4 significant digits. Thus, it is important to remember that there are differences between implicit and explicit decimal points.

I don't think I really had a point; your comment just made me think of this.

RaceProUK

The implicit DP doesn't imply anything about significant digits though; unless otherwise stated, typically all present digits are significant :P

dkf

@flabdablet said:

I think the difficulty we're having here is that you conflate two ideas I see as distinct. In your world, it seems to me, scientific notation or even the possibility of scientific notation are the same thing as floating point.

He's trying to ascribe a single fixed type to the result of each of the rules in the grammar. That doesn't work with JSON unless you go for the very vague “numeric” so he's trying to (ab)use floating point for that purpose. Which is wrong in general (though adequate for a lot of real-world JSON documents, to be fair).

We should pass cryptographic credentials around by JSON document. The long integers involved would bust many of these easy type assumptions, and so add another layer of security!!!

flabdablet

@dkf said:

We should pass cryptographic credentials around by JSON document

And since IEEE754 floats do have a fully specified binary representation (vendor-independent interop was one of the main motivations for creating that format), there should also be a standard JSON idiom for passing them around packed into JSON strings - and we'll be having none of this wussy hex or base64 encoding shit either, at least not if it doesn't actually save space.

abarker

@RaceProUK said:

unless otherwise stated, typically all present digits are significant

Except for trailing zeroes in integers (such as the zeroes in 1000) or leading zeroes where the stated value is less than 1 and greater than -1 (such as the zeroes in 0.00001). Otherwise, you are correct. This is because those zeroes are assumed to be there only to indicate scale. The one exception is when a decimal point is deliberately placed at the end of an integer value, which is done to indicate that the value was actually measured to that precision, such that 1000. indicates the value was measured to the nearest unit.

PleegWat

That's not how I learned it. As I learned it, the only correct way to write 1000 with one significant digit is 1 × 10³.

dkf

@flabdablet said:

And since IEEE754 floats do have a fully specified binary representation

And come in three or four actual binary representations (assuming we're talking about double-precision here) corresponding to different orderings used for placing the bytes on the wire. This is stupid. This is reality.

flabdablet

The whole idea of "significant figures" is a bit of a crude hack anyway.

If I multiply 2.0×10¹ by 1.0×10¹ to get 2.0×10² then what I'm really doing is multiplying a value claimed to be correct to within 1 part in 20 by another claimed to be correct to within 1 part in 10; the larger uncertainty will dominate, so the implied 1-in-20 accuracy of the final result is a lie.

The right way to do these things is to have an explicitly specified uncertainty range for all inputs, correctly track the worst-case uncertainty all the way through the arithmetic, then quote it explicitly for the output. But that's very rarely done. Values including an explicit uncertainty range are not a native type in any programming language I'm aware of.
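A minimal interval-arithmetic sketch in Go of tracking worst-case bounds. The `Interval` type and its methods are hypothetical, and reading 2.0×10¹ as 20±1 (±1 in the last quoted digit) is an assumption that matches the 1-in-20 figure above. The bounds themselves are computed in ordinary float64; a rigorous implementation would also round Lo down and Hi up.

```go
package main

import (
	"fmt"
	"math"
)

// Interval is a closed range [Lo, Hi] known to contain the true value.
type Interval struct{ Lo, Hi float64 }

// PlusMinus builds the interval written v±u.
func PlusMinus(v, u float64) Interval { return Interval{v - u, v + u} }

// Add: the worst cases are simply the sums of the endpoints.
func (a Interval) Add(b Interval) Interval { return Interval{a.Lo + b.Lo, a.Hi + b.Hi} }

// Mul: the worst case is found among the four endpoint products, which also
// covers negative intervals and intervals that straddle zero.
func (a Interval) Mul(b Interval) Interval {
	p := [4]float64{a.Lo * b.Lo, a.Lo * b.Hi, a.Hi * b.Lo, a.Hi * b.Hi}
	lo, hi := p[0], p[0]
	for _, x := range p[1:] {
		lo, hi = math.Min(lo, x), math.Max(hi, x)
	}
	return Interval{lo, hi}
}

// String quotes the interval as midpoint±half-width.
func (a Interval) String() string {
	return fmt.Sprintf("%g±%g", (a.Lo+a.Hi)/2, (a.Hi-a.Lo)/2)
}

func main() {
	// 2.0×10¹ and 1.0×10¹, read as ±1 in the last quoted digit.
	x, y := PlusMinus(20, 1), PlusMinus(10, 1)
	fmt.Println(x.Mul(y)) // 201±30: far wider than the ±10 that "2.0×10²" implies
}
```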

flabdablet

@dkf said:

different orderings used for placing the bytes on the wire

This is an idiom we're inventing here, so as long as the bits all mean the same thing when they're put back in the right places (and they do) then we're free to specify their network byte order. Exponent-first big-endian is consistent with traditional networking byte order; do we need a tighter spec than that?

Also, since the default byte encoding for JSON's Unicode code points is UTF-8, I'm not sure we'll get a portably worthwhile increase in bit density by going outside the ASCII range (we certainly would for UTF-16 encoded JSON). So what about some form of Base85 - using 5 JSON characters to transmit each 32 bits of float looks like a pretty natural fit, Base85 includes the !!!!! = z compression hack for a block of 32 zeroes, and we can easily pick a Base85 alphabet that excludes both " and \.

PleegWat

Agreed. But it's the only uncertainty notation I ever learned.

dkf

@flabdablet said:

do we need a tighter spec than that?

I would've said “no”, except I've seen the stupid shit that actually happens in the real world. There's some really broken-headed things out there!

flabdablet

@PleegWat said:

it's the only uncertainty notation I ever learned.

Stuff like 3.142±0.125 is pretty standard where I come from.

abarker

I can't help that you were taught wrong. See the following:

@wikipedia said:

The significant figures of a number are those digits that carry meaning contributing to its precision. This includes all digits except:

-All leading zeros;
-Trailing zeros when they are merely placeholders to indicate the scale of the number[1]

@Texas A&M University said:

Leading zeros are never significant.
Imbedded zeros are always significant.
Trailing zeros are significant only if the decimal point is specified.

...

EXAMPLES:



        | Number of   |               |
Raw     | Significant | Scientific    |
Number  | Figures     | Notation      | Notes
--------+-------------+---------------+---------------------------------
0.00682 |      3      | 6.82 × 10⁻³   | Leading zeros are not significant.
1.072   |      4      | 1.072 (× 10⁰) | Imbedded zeros are always significant.
300     |      1      | 3 × 10²       | Trailing zeros are significant only if the decimal point is specified.
300.    |      3      | 3.00 × 10²    |
300.0   |      4      | 3.000 × 10²   |

[2]
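The three rules quoted above are mechanical enough to code up. A minimal Go sketch (`sigFigs` is a hypothetical name) that handles plain decimal strings like those in the table, with an optional leading minus sign, but not exponents:

```go
package main

import (
	"fmt"
	"strings"
)

// sigFigs counts significant figures in a plain decimal string using the
// rules quoted above. Exponents (e.g. "3e2") are not handled.
func sigFigs(raw string) int {
	s := strings.TrimPrefix(raw, "-")
	hasPoint := strings.Contains(s, ".")
	digits := strings.Replace(s, ".", "", 1)
	// Leading zeros are never significant.
	digits = strings.TrimLeft(digits, "0")
	// Trailing zeros are significant only if the decimal point is specified.
	if !hasPoint {
		digits = strings.TrimRight(digits, "0")
	}
	// Embedded zeros survive both trims, so they are always counted.
	return len(digits)
}

func main() {
	for _, raw := range []string{"0.00682", "1.072", "300", "300.", "300.0", "1000."} {
		fmt.Printf("%-8s %d\n", raw, sigFigs(raw))
	}
}
```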

flabdablet

Pretty much regardless of which convention you use for decimal points in raw numbers, the only unambiguous way to specify a value whose significant figures include some trailing zeroes, like 3.0×10², is to do so in scientific notation.
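In code, Go's strconv 'e' format does this directly once you note that its precision counts digits after the point, so n significant figures means precision n-1. `sciSigFigs` is a hypothetical helper; its output also happens to be a valid JSON number.

```go
package main

import (
	"fmt"
	"strconv"
)

// sciSigFigs writes x in scientific notation with exactly n significant
// figures, so trailing zeros that are significant stay visible.
func sciSigFigs(x float64, n int) (string, error) {
	if n < 1 {
		return "", fmt.Errorf("need at least one significant figure, got %d", n)
	}
	// With the 'e' format, precision counts digits after the point.
	return strconv.FormatFloat(x, 'e', n-1, 64), nil
}

func main() {
	for _, c := range []struct {
		x float64
		n int
	}{{300, 1}, {300, 2}, {300, 4}, {12000, 3}} {
		s, _ := sciSigFigs(c.x, c.n)
		fmt.Printf("%g to %d sf: %s\n", c.x, c.n, s)
	}
}
```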

abarker

@flabdablet said:

Pretty much regardless of which convention you use for decimal points in raw numbers, the only standard unambiguous way to specify a value whose significant figures include some trailing zeroes, like 3.0×10², is to do so in scientific notation.

FTFY

There are other methods used, such as underlining or overlining the least significant digit (i.e., 1000 with the second zero underlined has three significant digits). Some people specify the number of digits, such as "12000 to 3sf." While unambiguous, none of these methods are standard.

tarunik

@dkf said:

I would've said “no”, except I've seen the stupid shit that actually happens in the real world. There's some really broken-headed things out there!

See this page under "ARM floating points"...damn you, mixed-endian doubles! (Thankfully, the FPA is p. much dead and buried now -- just about all ARMs are either soft-float or VFP)

Jerome_Grimbert

@tarunik said:

Chicago is one of the main places where the major US railroads (yes, basically all of them) join together and swap railcars around -- it's an absolute spaghetti bowl of railroad tracks.

Still no problem: that's what computers and their databases are made for, handling large sets of data. Chicago just gets a bunch of very short tracks and plenty of switches/static crossings.

dkf

@Jerome_Grimbert said:

Still no problem: that's what computers and their databases are made for, handling large sets of data. Chicago just gets a bunch of very short tracks and plenty of switches/static crossings.

So you're saying that it's just a bunch of SELECTs and JOINs?

flabdablet

It's just a SMOP.

Filed under: six weeks