JSONx is Sexy LIESx


  • Discourse touched me in a no-no place

    @VaelynPhi said:

    This is easy: [1,,3,,5,,7,,9]... and so on. The point @tarunik made earlier about JS arrays already having a notation for this and JSON for some reason not supporting it is very relevant here.

    It's not even a particularly compact format when it comes to real sparse arrays, where the average distance between elements can be 10 or 20 entries. :p



  • @Jaime said:

    I doubt it. In languages without sparse arrays, key-value pairs fill the same logical role. The true access semantics are either lookup by key or lookup by index. If you intended to create something that behaves exactly as JavaScript sparse arrays do, then you need professional help.

    I don't think key-value pairs being the go-to for this supports the notion that they're best. Either way, you conceded the point: sparseness is a semantic feature of the data being interchanged. You just want to argue over the best way to do it.

    @flabdablet said:

    Yeah, it is, but it's about as good as JSON can do for a sparse array. Of course, now that I have finally understood that you don't get any control at all over the actual JSON stringify/parse process, it's all moot; you can't do better than the post-facto null-stripping you're already doing with map().

    Presumably I can... it just seems like more work to find out how and fiddle with middleware internals than to just add the relevant couple of lines to my own code.
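
    (For reference, the stripping is roughly this; a minimal sketch, assuming the nulls in the received JSON mark the gaps:)

    const received = JSON.parse('["a",null,"c"]');
    const restored = [];
    received.forEach((v, i) => { if (v !== null) restored[i] = v; });
    console.log(restored);      // [ 'a', <1 empty item>, 'c' ]
    console.log(1 in restored); // false: a genuine gap again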

    @dkf said:

    It's not even a particularly compact format when it comes to real sparse arrays, where the average distance between elements can be 10 or 20 entries.

    Perhaps, but between [,,,,,,,,,,,,,,,,] and [{ key: 0, value: null }, ::rinse and repeat:: ], I'll take the former over the latter, which looks like XML. Maybe [{},{},{}...] is a better equivalent? I dunno. Either way, anything with internal keys in an array already having keys, even for sparse arrays, in a dynamic language, is a bit ridiculous. For C(++) or other languages where sparse arrays are truly sparse (i.e., correspond to vast pieces of unused memory), a solution with internal keys might make sense. Then again, using native arrays for a sparse array (or really at all for most things) is probably asking for it in those languages.

    Another relevant Python (Monty) style question is "Well, how sparse?" I could probably spend the time figuring out where in an array of size n the cost of storing extra characters for existing entries in the key/value form overwhelms the cost of extra commas for non-existing entries, but I'm going to assume that we can all figure out this tradeoff and make a rational decision.

    The data I'm dealing with is contiguous from load time until the user deletes something, then the undefined (null in the case of JSON-sent data) elements are intentional gaps. When saving the data (which happens when the server shuts down or when all the clients have disconnected), these are filtered and the data is contiguous again. So, barring users being really crazy, the data I'm dealing with will always have a low sparseness compared to, say, the ridiculously bad test array I posted earlier.
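
    A minimal sketch of that lifecycle (hypothetical data, not the real thing):

    const items = ["a", "b", "c", "d"];
    delete items[1];                       // user deletes; leaves a hole, no shifting
    console.log(JSON.stringify(items));    // ["a",null,"c","d"]: holes go out as null
    console.log(items.filter(() => true)); // [ 'a', 'c', 'd' ]: filter skips holes, contiguous again on save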

    Point being, at some point the sparseness can overwhelm your data and contiguous, array-semantic data starts being non-contiguous hash-semantic data. This is most evident in the case where data is actually mapped to memory (in the C-style case), where having huge gaps is just wasted space and speed loses the tradeoff to memory. For this, we have indexed hashmaps to take over.


  • Discourse touched me in a no-no place

    @VaelynPhi said:

    Perhaps, but between [,,,,,,,,,,,,,,,,] and [{ key: 0, value: null }, ::rinse and repeat:: ], I'll take the former over the latter, which looks like XML. Maybe [{},{},{}...] is a better equivalent? I dunno. Either way, anything with internal keys in an array already having keys, even for sparse arrays, in a dynamic language, is a bit ridiculous. For C(++) or other languages where sparse arrays are truly sparse (i.e., correspond to vast pieces of unused memory), a solution with internal keys might make sense. Then again, using native arrays for a sparse array (or really at all for most things) is probably asking for it in those languages.

    I was comparing these syntaxes:

    ["abc",,,,,,,,,,,,,,,,,,,,"def"]
    
    {"0":"abc","20":"def"}
    

    Both are possible ways to encode the same sparse array: the first is not legal JSON right now but is what you wanted; the second is an alternative that is both legal and shorter by 10 characters. If keys could be any atomic value instead of needing to be strings, we could make it 4 characters shorter still:

    {0:"abc",20:"def"}
    

    Note that I'm talking about encoding and not what you've got in memory or what language you're using; it's all about how you write the bytes on the wire (or on the disk, if you use it for that). I'm militantly unfussed about what happens when the other side picks it apart.

    If I could, I'd allow keys to be any JSON value. I don't know what I'd do with the ability to have structured keys, but I bet it could be used interestingly.
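
    For what it's worth, picking the key/value form apart into a sparse array again is only a few lines if the endpoint happens to be JavaScript (a sketch):

    const wire = '{"0":"abc","20":"def"}';
    const sparse = [];
    for (const [key, value] of Object.entries(JSON.parse(wire))) {
      sparse[Number(key)] = value;  // writing past .length grows the array
    }
    console.log(sparse.length);     // 21
    console.log(5 in sparse);       // false: the gaps are real holes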



  • @VaelynPhi said:

    For C(++) or other languages where sparse arrays are truly sparse (i.e., correspond to vast pieces of unused memory), a solution with internal keys might make sense.

    Sure, maybe, in the target language. But given a data interchange format that defines a concise native representation for key->value maps, I think letting implementation details of an endpoint's hashmap/sparsearray implementation leak into the data interchange is a mistake.

    @dkf said:

    I'm militantly unfussed about what happens when the other side picks it apart.

    What he said.



  • @dkf said:

    If I could, I'd allow keys to be any JSON value. I don't know what I'd do with the ability to have structured keys, but I bet it could be used interestingly.

    Fuck it. Let's just give up on JSON and use LISP syntax for data interchange.

    Filed under: (parentheses for the win)



  • @flabdablet said:

    I think letting implementation details of an endpoint's hashmap/sparsearray implementation leak into the data interchange is a mistake.

    I think I could make a pretty good argument that it's the [,,,,,,] syntax that is leaking implementation details. Suppose you want to represent a sparse array that has 1024 entries, with indices 0 and 1023 used and the others open. Which is leaking implementation details more:

    {0: "abc", 1023: "def"}
    

    or

    ["abc",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"def"]
    

    ? (That may be off by a couple, or a factor of two, or something.)

    Which one is the saner representation?
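
    (A quick sketch to check the sizes, pretending for a moment the comma form were legal JSON:)

    const asMap = JSON.stringify({ 0: "abc", 1023: "def" });
    const asCommas = '["abc"' + ",".repeat(1023) + '"def"]'; // not legal JSON today
    console.log(asMap.length);    // 24
    console.log(asCommas.length); // 1035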



  • @EvanED said:

    (That may be off by a couple, or a factor of two, or something.)

    That's just one more reason to not ever do that, ever. I don't know why all the fuss about a different way to represent key-value pairs.


  • Discourse touched me in a no-no place

    @another_sam said:

    I don't know why all the fuss about a different way to represent key-value pairs.

    Because someone said something silly and wanted to backpedal without appearing to do so too obviously. 🕶



  • @EvanED said:

    Which one is the saner representation?

    For the OP's use case, i.e.

    @VaelynPhi said:

    So, barring users being really crazy, the data I'm dealing with will always have a low sparseness compared to, say, the ridiculously bad test array I posted earlier.

    the one with the multiple commas would probably be better. As he says:

    @VaelynPhi said:

    Point being, at some point the sparseness can overwhelm your data and contiguous, array-semantic data starts being non-contiguous hash-semantic data.

    Personally I would like to see JSON include specific support for sparse arrays using syntax like

    [
        "zero",
        "one",
        ,
        ,
        "four",
        1023:"one thousand and twenty-three",
        "one thousand and twenty-four"
    ]

    ...because although JavaScript treats sparse arrays as string-indexed hashmaps with array semantics tacked on, sparse arrays are their own thing with their own semantics. It should be up to the endpoints, not the data interchange format, to decide whether it's better to implement sparse arrays as hashmaps or in some other fashion.

    There are precedents for the idea of setting a specific index and then carrying on from there: the designated-initializer syntax in C99 arrays and the whole-array assignment syntax in bash - the latter being an example of a language that does make a semantic distinction between sparse arrays and hashmaps.
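
    A hypothetical encoder for that proposed syntax could fall back to an index jump only for long gaps (a sketch; the output is deliberately not legal JSON today, and the threshold is arbitrary):

    function encodeSparse(arr, jumpThreshold = 4) {
      const parts = [];
      let prev = -1;
      arr.forEach((value, i) => {  // forEach visits only the filled slots
        const gap = i - prev - 1;
        parts.push(gap >= jumpThreshold
          ? `${i}:${JSON.stringify(value)}`
          : ",".repeat(gap) + JSON.stringify(value));
        prev = i;
      });
      return `[${parts.join(",")}]`;
    }

    const a = [];
    a[0] = "zero"; a[1] = "one"; a[4] = "four"; a[1023] = "x";
    console.log(encodeSparse(a)); // ["zero","one",,,"four",1023:"x"]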



  • Standards


  • That's all very well, but building new standards as strict supersets of existing ones can work. Consider USB: it's effectively displaced PS/2, IEEE-1284, RS-232, parallel SCSI, and even PCI for some kinds of peripherals; and you can still plug USB1 devices into USB3 ports and have them work.



  • JSON replaced the overcomplicated XML crap because it is simple. If it gets extended to include everyone's favourite misfeature, it will become overcomplicated and will have to be replaced again.

    I prefer protobuf (though the compiler has its own share of :wtf:) anyway.</ 🔥 >


  • Discourse touched me in a no-no place

    @Bulb said:

    JSON replaced the overcomplicated XML crap because it is simple. If it gets extended to include everyone's favourite misfeature, it will become overcomplicated and will have to be replaced again.

    It's all because some people simply won't accept that things have an inherent level of complexity, and that naturally complicated stuff can't be communicated without that complexity either showing up in the communications or as some sort of (explicit or implicit) shared context. It's gotta be somewhere, OK?



  • @dkf said:

    If I could, I'd allow keys to be any JSON value. I don't know what I'd do with the ability to have structured keys, but I bet it could be used interestingly.

    The first and most directly comparable language that comes to mind that supports that is Lua. (I'm sure it's not the only one.) It has an object literal notation that could be extracted into a data interchange format--pretty sure someone was working on that last year--but I think that effort (somewhat reasonably, perhaps) did not attempt to support, say, functions.

    Yep, here it is. https://bitbucket.org/alexames/lon



  • @kilroo said:

    Yep, here it is. https://bitbucket.org/alexames/lon

    How should one take that seriously? It does not even have a description of the actual format, or a list of features, or anything. Something like YAML at least does.

    @dkf said:

    complexity […] It's gotta be somewhere, OK?

    XML, however, contains a lot of complexity that is completely unnecessary for data serialization, because it is a repurposed document markup. And it is absurdly verbose. And the good features it has often get ignored by developers and something silly and non-standard gets used instead anyway, because ignorance is easy.

    And then there are, of course, the naive SAX and DOM APIs that are a pain in the arse to use.



  • @flabdablet said:

    Fuck it. Let's just give up on JSON and use LISP syntax for data interchange.



  • That wouldn't work for @VaelynPhi either:

    nil

    nil represents nil, null or nothing. It should be read as an object with similar meaning on the target platform.

    No undefined, which is what started this whole thing.



  • @Eldelshell said:

    No undefined

    Of course not. Because undefined (distinct from null; in some other languages undefined is a name for null and that doesn't count) is a JavaScript quirk.



  • @Eldelshell said:

    No undefined, which is what started this whole thing.

    I don't feel like re-reading this thread: Why would you need undefined in a serialization format? You don't need to specify that a property is undefined, just don't mention it in the data.
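
    For object properties, JSON.stringify already does exactly that; it's arrays where an entry can't be dropped without shifting the indices, so it writes null instead:

    JSON.stringify({ a: undefined, b: null }); // '{"b":null}': undefined properties are omitted
    JSON.stringify(["a", undefined, "c"]);     // '["a",null,"c"]': array slots can't be omitted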



  • @Bort said:

    re-reading this thread

    You don't have to. It started the whole thing, so it suffices to read the initial post.

    @Bort said:

    Why would you need undefined in a serialization format?

    Because @VaelynPhi abuses the JavaScript quirk that it has two distinct null values, null and undefined, and the JavaScript quirk that an array with undefined items (but not null items) behaves as a sparse array (because it is not really an array but a hashmap whose numeric indices are consecutive unless you delete some, so deleting from it does not shift the following indices like in approximately any other language).
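
    A quick sketch of the quirk:

    const arr = [1, 2, 3];
    delete arr[1];           // leaves a hole; nothing shifts, length stays 3
    console.log(1 in arr);   // false: a hole, not a stored undefined
    arr[1] = undefined;      // now the slot exists and holds undefined
    console.log(1 in arr);   // true
    arr.splice(1, 1);        // splice, by contrast, shifts like other languages
    console.log(arr);        // [ 1, 3 ]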


  • ♿ (Parody)

    @Bort said:

    You don't need to specify that a property is undefined, just don't mention it in the data.

    The use case was a sparse array. And, actually, your idea was what he wanted JSON to do for the array.



  • Now that I'm all caught up, I can actually participate in the conversation.

    well...

    Nothing to add here, carry on...



  • @flabdablet said:

    it's effectively displaced... RS-232

    I wish. Wouldn't have to drag these [picture of RS-232 serial cables] around all the time nor have to try to figure out what arbitrary settings the last guy decided to use.


  • Discourse touched me in a no-no place

    What are you using that you still need those?

    Most people don't need those older-style cables.



    Industrial controls (think PLCs, Motion Controls, Touch Screens, etc). The cheap ones generally don't come w/ USB interfaces and anything > ~5 years old definitely doesn't. I think a fair number of embedded systems still use it too. Even every hipster's "favourite" Arduino uses an FTDI USB-to-serial UART bridge to talk to the microcontroller over USB.

    And don't get me started on the things that still use RS-422/485... :headdesk:

    Now if they'd only come out with a converter that could fake a reliable LPT port that would work with legacy hardware keys.



  • @ben_lubar said:

    You have 0.0000001 elements in your hash table?

    How about if I have 1 element stored in 10000000 tables?



    10,000,000 tables? It's just a match-3 thing, not a big deal.



  • @Bulb said:

    YAML

    +1 for YAML, I've been using it for data modelling and serialization in personal projects, and I've yet to find a use case it can't handle. It's like using XML, except, you know, *actually* human-readable.

    (Also, it's a superset of JSON, so you can pretend it's JSON...)


  • Discourse touched me in a no-no place

    @tar said:

    (Also, it's a superset of JSON, so you can pretend it's JSON...)

    The model is very close. The syntax isn't.



  • ... every JSON file is also a valid YAML file. This makes it easy to migrate from JSON to YAML if/when the additional features are required.

    If you can find a case of JSON that is not valid YAML, I'm pretty sure that's a bug that should be reported.


  • FoxDev

    @Salamander said:

    If you can find a case of JSON that is not valid YAML, I'm pretty sure that's a bug that should be reported.

    found one!

    {
      "A": 1,
      "A": 2
    }
    

    Valid JSON, invalid YAML.

    From http://yaml.org/spec/1.2/spec.html#id2759572

    In practice, since JSON is silent on the semantics of such duplicates, the only portable JSON files are those with unique keys, which are therefore valid YAML files.

    Filed under: 🎣4⃣🚩



    They're referring to an old specification. The newer ones pretty much treat it as undefined behaviour, with implementations doing anything up to and including throwing parse errors.


  • FoxDev

    well the reference implementation i'll refer to is the V8 implementation used in chrome and nodejs

    JSON.parse('{"A": 1,"A": 2}')
    

    using that code results in { A: 2 } (the last duplicate key wins).

    Please, if you know of one, show me an implementation that does anything differently.



    Will the Java implementation by Douglas Crockford suffice?


  • FoxDev

    Nota Bene: I know that i am asserting a De Facto Standard behavior, not a De Jure Standard behavior.

    Officially the behavior is undefined; in practice it's accepted, and the implementation is that the last of any duplicate keys wins. ;-)


  • FoxDev

    fair enough, and that is standards compliant.

    is anyone using it?


    also where did your post come from. when i posted my nota bene your post was nowhere to be seen! even after i left the thread and returned your post wasn't there.... :wtf:



  • @accalia said:

    is anyone using it?

    A quick search tells me it is used by Selenium and the JavaScript Closure Compiler, among hundreds of other things.

    @accalia said:

    also where did your post come from. when i posted my nota bene your post was nowhere to be seen! even after i left the thread and returned your post wasn't there....

    I think we both know the answer to that.


  • Discourse touched me in a no-no place

    @accalia said:

    If you can find a case of json that is not valid yaml, I'm pretty sure that's a bug that should be reported.

    Found an easy example: the whitespace after the colon below is a literal tab. This is accepted by JSONLint but rejected by YAMLLint.

    {
       "a":	"b"
    }
    




  • @dkf said:

    This is accepted by JSONLint but rejected by YAMLLint.

    Considering YAMLLint barfs on examples from the YAML specification, I wouldn't exactly consider it a good example.


  • Discourse touched me in a no-no place

    @ben_lubar said:

    http://yaml-online-parser.appspot.com/?yaml={"a"%3A"b"} &type=json

    Try this one…
    http://yaml-online-parser.appspot.com/?yaml={"a"%3A "b"} &type=json

    (That's the second online YAML parser that I've made throw up on valid JSON. I think I might win this round…)



    Again, fails on the same example from the YAML specification I posted earlier.

    Most likely those online parsers are 1.1 or earlier, not 1.2.


  • FoxDev

    @Salamander said:

    A quick search tells me it is used by Selenium and the JavaScript Closure Compiler, among hundreds of other things.

    hmm...

    fair enough then. i'll claim only half a victory as my example is accepted by all browser-based parsers i'm aware of and is not explicitly denied by the spec.

    i will agree that it is a bit of a degenerate case to specify the same key more than once, hence my half a victory (wouldn't that be a draw?)



  • @tar said:

    +1 for YAML

    Can it do sparse arrays?


  • Discourse touched me in a no-no place

    @Salamander said:

    Again, fails on the same example from the YAML specification I posted earlier.

    I have no plans whatsoever to write a YAML parser. The YAML parsers I've found online with trivial searching failed to parse that valid JSON document. Ergo either you're wrong, or you need to find a parser that has evidential proof of conforming to the version of the spec you are talking about and which will parse that exact document I provided.

    If we get to round 2, I'll see if I can find some other evilness. 😈



  • To my way of thinking, that ridiculously overcomplicated spec - a spec, incidentally, that allows both JSON and stuff that looks like Markdown to be valid YAML - is plenty evil enough.



  • @flabdablet said:

    YAML

    Can it do sparse arrays?

    On the basis of this thread, the most succinct acceptable answer is probably going to be "no".

    Having said that, serializing your sparse array out into a map and using that map data to reconstruct the sparse array is a viable strategy.
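
    A minimal sketch of the serializing half, assuming plain objects for the map (the reconstruction half is the mirror image):

    function sparseToMap(arr) {
      const map = {};
      arr.forEach((v, i) => { map[i] = v; }); // forEach visits only filled slots
      return map;
    }

    const a = [];
    a[0] = "abc"; a[20] = "def";
    console.log(JSON.stringify(sparseToMap(a))); // {"0":"abc","20":"def"}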

    Does *XML* support sparse arrays?



    Isn't the ***REAL*** WTF the fact that JavaScript supports both undefined and null, and that they have similar but not identical semantics?


  • Discourse touched me in a no-no place

    @tar said:

    Isn't the REAL WTF ... JavaScript

    Probably, yes.



    If PHP were JavaScript it would also have real_undefined and real_null, and a non-commutative equality relationship between all of them.

