JSONx is Sexy LIESx


  • FoxDev

    @FrostCat said:

    It's not, strictly, O(1), but it's a hell of a lot closer to O(1) than to O(n), or you'd have been seeing a several-orders-of-magnitude increase in time.

    fun fact. in C, array indexing IS an O(1) operation.

    funner fact: there are three different speeds that O(1) operation can complete at.

    listed from fastest to slowest:

    1. The indexed value is in CPU cache (L1, L2, or L3, which run at slightly different but very similar speeds)
    2. The indexed value is in main memory (RAM)
    3. The indexed value is in RAM that has been paged out to disk



  • Discourse touched me in a no-no place

    Maybe it is, but n is something like .0000001, so who cares?



  • You have 0.0000001 elements in your hash table?

    In big-O notation, constants don't matter.



  • @Jaime said:

    My preference would be something like:

    {
        "things": [
            {"key": "0", "value": "zero"},
            {"key": "1", "value": "one"},
            {"key": "2", "value": "two"}
        ]
    }

    That's @Rhywden's inner platform at work again. Given a competent JSON parsing library in the target language, why would that be any easier to deserialize than a straight JSON object with the same inner keys and values stored as native JSON keys and values?


  • Imagine the target is either C# or Java. If you use the key/value schema, then you can deserialize to classes that have known properties. If you have a JSON blob that could have any property names, then you could only access the data using reflection. It's not an inner platform, it's simply having a well defined exchange interface that's generally sane and doesn't have compatibility problems.

    You are looking at the problem through JavaScript-colored glasses. But even in JavaScript, you have to see that if you are exchanging an object that could contain any data at all, then you don't actually have a data interface.
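
    For example, a minimal sketch of that key/value schema in Java, assuming Jackson as the data-binding library (the class and field names here are just illustrative):

    import java.util.List;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class ThingsExample {
        // Every entry has the same known property names, so it binds to a class
        // with known fields; no reflection over arbitrary keys is needed.
        public static class ThingEntry {
            public String key;
            public String value;
        }

        public static class ThingsDocument {
            public List<ThingEntry> things;
        }

        public static void main(String[] args) throws Exception {
            String json = "{\"things\": [{\"key\": \"0\", \"value\": \"zero\"}]}";
            ThingsDocument doc = new ObjectMapper().readValue(json, ThingsDocument.class);
            System.out.println(doc.things.get(0).value); // prints "zero"
        }
    }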


  • Discourse touched me in a no-no place

    @Jaime said:

    Imagine the target is either C# or Java.

    The usual deserializer for Java turns a JSON object into a Map<String,Object> (or the logical equivalent). It's usually implemented with a hash table, because they work really well in practice. You do see some people going all the way to a POJO, but that's a lot rarer; at that point, you might as well use JSON Schema to say what you're doing anyway.

    Accurately describing the actual format of the data you might be sending is both tedious and hard. People tend to try to avoid it, the utter fools! If you don't describe what the messages are, how can you expect a third party to write something that interfaces with what you're doing sensibly? (Oh, you don't think that anyone would ever want to do that…?)
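
    A minimal sketch of that usual approach, assuming Jackson as the deserializer (any of the common libraries would look much the same):

    import java.util.Map;
    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class MapExample {
        public static void main(String[] args) throws Exception {
            String json = "{\"0\": \"zero\", \"1\": \"one\", \"2\": \"two\"}";
            // An arbitrary JSON object becomes a Map<String, Object> (typically a
            // LinkedHashMap under the hood): nested objects become nested Maps,
            // arrays become Lists, numbers become Numbers.
            Map<String, Object> data = new ObjectMapper()
                    .readValue(json, new TypeReference<Map<String, Object>>() {});
            System.out.println(data.get("1")); // "one"
        }
    }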


  • Discourse touched me in a no-no place

    @ben_lubar said:

    You have 0.0000001 elements in your hash table?

    :frystare: are you angling to be the next blakeyrat?

    If the difference between 100 and 100000000 or whatever elements is only a factor of 3, then n is a very very very small number and for all intents and purposes, the operation can safely be considered O(n). Is this not obvious?


  • I survived the hour long Uno hand

    @FrostCat said:

    If the difference between 100 and 100000000 or whatever elements is only a factor of 3, then n is a very very very small number

    In O notation, n is the size of the input data; in this case, the number of items in the hash table (100 and 100000000). So if n is 0.0000001, you have 0.0000001 items in your table. What you probably meant was that the coefficient of n was 0.0000001, but the coefficient is ignored when comparing relative algorithm speed.

    Basically, O(0.0000001n) and O(10000000n) are considered the same speed, because as n grows to infinity, the coefficient stops mattering.
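
    In symbols, a quick sketch of the usual formal definition, which is why the coefficient gets absorbed:

    f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \ge 0 \;:\; f(n) \le c \cdot g(n) \text{ for all } n \ge n_0

    0.0000001\,n \le 1 \cdot n \quad\text{and}\quad 10^{7}\,n \le 10^{7} \cdot n \quad\Rightarrow\quad \text{both are } O(n)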



  • @dkf said:

    The usual deserializer for Java turns a JSON object into a Map<String,Object>

    Yeah, but that treats the whole data structure as an untyped mess. Instead of getting serialization errors on bad formats, you get null reference exceptions on use. That defeats the whole point of strong typing. It's deserialization to a POJO or POCO that I was referring to.


  • Discourse touched me in a no-no place

    @Jaime said:

    Yeah, but that treats the whole data structure as an untyped mess.

    Yes, it's an honest representation.



  • @FrostCat said:

    If the difference between 100 and 100000000 or whatever elements is only a factor of 3, then n is a very very very small number and for all intents and purposes, the operation can safely be considered O(n). Is this not obvious?

    No. In fact all you can conclude from the two benchmarks presented is that the operation requires time t(n) where t() is a function that grows as n grows. With only two data points, you can't really tell whether t() is polynomial, which would in fact class the associated algorithm as O(n^m) where m is the degree of the highest polynomial term; or logarithmic, yielding O(log n); or something else.

    Note particularly that t(n) = 1 + 0.000001n, which is possibly the kind of relationship you have in mind, is still theoretically O(n) because it's a polynomial whose highest degree is 1. However, for most practical purposes an O(n) algorithm whose constant time component dominates to that extent is close enough to constant time to be treated as if it were in fact O(1).

    In fact any decent hash table implementation is likely to be closer to O(log n) than O(n), and its actual t(n) function will look something like a + b log n with b very small relative to a. The entire point of a good hash table is to achieve a nearly constant lookup time.



  • @Jaime said:

    If you use the key/value schema, then you can deserialize to classes that have known properties.

    You can do that equally well with the raw object schema, precisely because the type of the object you're trying to deserialize any particular piece of JSON into is known ahead of time. There is nothing at all to stop you deserializing an ordinary JSON object straight into some instance of whatever data structure your local language uses to represent a hashmap if you're expecting the JSON to represent an arbitrary hashmap at that point.

    The argument that JSON shouldn't be able to represent sparse arrays because it will also need to be used by languages that don't support those is weak, in my view, because JSON can and does represent arrays whose length is implicit and data-dependent; the languages that don't have sparse arrays are the very same ones that don't have variable-length arrays either.
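
    To make the first point concrete, a small sketch in Java, assuming Jackson as the parsing library (the Wrapper name is illustrative): an ordinary JSON object with numeric-string keys deserializes straight into a Map field, no key/value wrapping required.

    import java.util.Map;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class SparseKeysExample {
        // A field typed as a Map accepts whatever keys arrive, including "0", "7", ...
        public static class Wrapper {
            public Map<String, String> things;
        }

        public static void main(String[] args) throws Exception {
            String json = "{\"things\": {\"0\": \"zero\", \"7\": \"seven\"}}";
            Wrapper w = new ObjectMapper().readValue(json, Wrapper.class);
            System.out.println(w.things.get("7")); // "seven"
        }
    }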



  • @dkf said:

    >Jaime:
    Yeah, but that treats the whole data structure as an untyped mess.

    Yes, it's an honest representation.

    JSON data is loosely and implicitly typed. That doesn't stop anybody designing a strongly typed object to deserialize any particular chunk of JSON data into, and ensuring that the supplied JSON data fits it properly at parse time rather than use time.
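
    A minimal sketch of that parse-time checking in Java, assuming Jackson (the Customer type is illustrative); rejecting unknown properties is the library's default behaviour, enabled explicitly here for clarity:

    import com.fasterxml.jackson.databind.DeserializationFeature;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class StrictParseExample {
        public static class Customer {
            public String name;
            public int age;
        }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper()
                    // Reject JSON carrying properties the target type doesn't declare.
                    .enable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

            // Throws at parse time because "agee" doesn't fit Customer,
            // instead of leaving a half-populated object to blow up at use time.
            mapper.readValue("{\"name\": \"Ada\", \"agee\": 36}", Customer.class);
        }
    }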


  • Discourse touched me in a no-no place

    @flabdablet said:

    JSON data is loosely and implicitly typed. That doesn't stop anybody designing a strongly typed object to deserialize any particular chunk of JSON data into, and ensuring that the supplied JSON data fits it properly at parse time rather than use time.

    Sure, but you tend to do that as a separate mapping step, and at least in Java the libraries that do this aren't standard (there are some awful problems when you get to deal with edge cases unless you do “extra cunning stuff”). Sane programmers mostly prefer the main JSON mapping instead, it's much easier to make work.



  • @dkf said:

    you tend to do that as a separate mapping step, and at least in Java the libraries that do this aren't standard

    I have bugger-all Java experience. Are you telling me it's not normal practice, when using JSON to persist and/or transfer Java objects, to define a constructor that accepts a JSON string or stream, along with a toJSON() method, for such objects?



  • A constructor? That's stupid. A fromJSON method, maybe, but you don't want to tie your models to some data transfer format or external data representation.



  • As I said, I have bugger-all Java experience. Does Java not do the thing where you can have multiple constructors and it works out which one to use by parameter type?



  • Yes, but placing all that "logic" in a constructor is a big NO-NO, and that's nothing specific to Java, just good design.



  • Fair enough. I'm perfectly willing to believe that being able to say something like

    Foo foo = new Foo(someJSONdataSource);

    is a terrible idea for reasons I'm simply too inexperienced to have seen. Live and learn.

    Could you do me a favour and point me to some materials I can study in order to understand why constructors that do significant work are the Wrong Thing?


  • Discourse touched me in a no-no place

    @flabdablet said:

    Are you telling me it's not normal practice, when using JSON to persist and/or transfer Java objects, to define a constructor that accepts a JSON string or stream, along with a toJSON() method, for such objects?

    Correct. One would define POJOs that the (de)serialization engine could interact with directly, possibly guided by annotations. It's a lot less error-prone than creating constructors for everything, and there are some excellent implementations for working with XML. The state of the tooling for JSON is nothing like as good (alas; I've been dragged through that briar bush…)
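
    A minimal sketch of that annotation-guided style, assuming Jackson (Invoice is an illustrative name); the engine reads and writes the POJO directly, no hand-written constructor involved:

    import com.fasterxml.jackson.annotation.JsonProperty;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class AnnotatedPojoExample {
        public static class Invoice {
            @JsonProperty("invoice_id")   // annotation maps the wire name to the field
            public String invoiceId;

            @JsonProperty("total")
            public double total;
        }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            Invoice inv = mapper.readValue("{\"invoice_id\": \"A-17\", \"total\": 12.5}", Invoice.class);
            System.out.println(mapper.writeValueAsString(inv)); // round-trips back to JSON
        }
    }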



  • So the essential problem, then, is that with Java you're not given a competent JSON parsing library in the target language.

    Personally I would still strongly prefer that internal hashmap implementation details for my client language of choice did not leak into my JSON interchange formats; {"key": "0", "value": "zero"} as a proposed replacement for {"0": "zero"} just smells horrible to me.


  • Discourse touched me in a no-no place

    @flabdablet said:

    So the essential problem, then, is that with Java you're not given a competent JSON parsing library in the target language.

    No. It's just that there are two ways of interpreting JSON. One way says that we should build a model of JSON in Java datastructures, and the other way says that we should build a model of Java datastructures in JSON. If you're dealing with arbitrary JSON, you need a very specific approach out of that pair or you're SOL…
    @flabdablet said:

    Personally I would still strongly prefer that internal hashmap implementation details for my client language of choice did not leak into my JSON interchange formats; {"key": "0", "value": "zero"} as a proposed replacement for {"0": "zero"} just smells horrible to me.

    Yep. Maps should become maps. Compact numerically-indexed sequences (lists, arrays, whatever you want to call them) should become compact numerically-indexed sequences.



  • Is there a standard Java idiom for dealing with the fact that a JSON array's length is unknown at the time it begins to get parsed?


  • ♿ (Parody)

    The most obvious approach is to use a List (variable, dynamic length) instead of an array.


  • FoxDev

    @flabdablet said:

    Is there a standard Java idiom for dealing with the fact that a JSON array's length is unknown at the time it begins to get parsed?

    don't do much Java these days but these seem like reasonable approaches:

    • Map to an ArrayList, then get the array from the ArrayList once parsing is complete
    • do a two-pass parse: one pass to get the length, one to parse the contents
    • use a library that someone else maintains and has already solved these problems.
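
    a rough sketch of the first and third options together, assuming Jackson as the library someone else maintains:

    import java.util.List;
    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonArrayExample {
        public static void main(String[] args) throws Exception {
            String json = "[\"zero\", \"one\", \"two\"]";
            // The library grows the List as it parses, so the array length never
            // needs to be known up front.
            List<String> values = new ObjectMapper()
                    .readValue(json, new TypeReference<List<String>>() {});
            String[] asArray = values.toArray(new String[0]); // only if an array is truly needed
            System.out.println(asArray.length); // 3
        }
    }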

  • Discourse touched me in a no-no place

    @accalia said:

    Map to an ArrayList, then get the array from the ArrayList once parsing is complete

    I wouldn't bother. Most of the time you don't need an actual array; a List will do well enough. (ArrayList will manage the array for you, and its add method uses the well-known amortisation approach to reallocation.)



  • So, I've just found out with a quick Google search that Java and C# both have readily available sparse array libraries. And I'd be astounded if something similar isn't available in any language, to be frank; hell, it wouldn't even be hard to whip up a half-decent one in C.

    I'm still at a loss to think of any good reason why sparse arrays should have been left out of JSON, especially given the tidiness of the Javascript syntax for sparse array literals.


  • Discourse touched me in a no-no place

    @flabdablet said:

    >Is this not obvious?

    No.

    Ugh. I meant to write "considered O(1)" there.





  • @flabdablet said:

    Personally I would still strongly prefer that internal hashmap implementation details for my client language of choice did not leak into my JSON interchange formats; {"key": "0", "value": "zero"} as a proposed replacement for {"0": "zero"} just smells horrible to me.

    Remember that this example is quite contrived, since there aren't a whole lot of use cases where you have a list of unknown length containing things you know little about. The actual situation would dictate how you would represent it.

    In order to end up with the problem the OP had, you almost certainly are in a situation where you should model it as a map or a key/value pair.



  • @dkf said:

    Correct. One would define POJOs that the (de)serialization engine could interact with directly, possibly guided by annotations. It's a lot less error-prone than creating constructors for everything, and there are some excellent implementations for working with XML. The state of the tooling for JSON is nothing like as good (alas; I've been dragged through that briar bush…)

    I'm surprised; .NET has excellent support for serialization to/from JSON.



  • As blakey might say, and I would agree with:

    What? Java does something worse than .NET? What is the world coming to? That's completely unthinkable!





  • Yay! Go has json methods! Can we talk about good languages now?


  • Discourse touched me in a no-no place

    @Jaime said:

    .Net has excellent support for serialization to/from JSON.

    Is that of the “handle arbitrary JSON” or “handle arbitrary .NET objects” variety?



  • This is the de facto JSON library for .NET:

    james.newtonking.com/json

    I assume something similar exists for Java



  • I've never tried to access JSON as unstructured data - I was referring to the ability to deserialize JSON to a POCO.

    I don't understand how people end up with these JSON blobs that don't have a defined schema. How would you expect to do anything meaningful with the data if you don't even know its structure? How does a property "go missing"?



  • @flabdablet said:

    Foo foo = new Foo(someJSONdataSource);

    is a terrible idea

    You are putting a dependency in your business objects on the method by which they are serialised and deserialised. Even worse, you are requiring your business objects to know how to deserialise themselves. Will you add a constructor that takes XML, one for a JDBC ResultSet, etc? I hope not.

    @flabdablet said:

    with Java you're not given a competent JSON parsing library in the target language.

    I've used Jackson for serialisation; it supports deserialisation too, but I've never used it for that.

    @flabdablet said:

    {"key": "0", "value": "zero"} as a proposed replacement for {"0": "zero"} just smells horrible to me.

    They have identical semantics. You are being confused by the integer keys in the sparse array that JavaScript supports. They are just keys, and the values are just values.



  • @dkf said:

    Is that of the “handle arbitrary JSON” or “handle arbitrary .NET objects” variety?

    Pretty much both. You can deserialize JSON to a POCO and have the library throw up when the model doesn't match (which is very often what you actually want - after all, you expect the client to send something meaningful), or to a dynamic object which works in most of the other cases (rules for .NET identifiers still apply though), or to a string dictionary (I think - I've never used that last one).

    I don't think there's a variety of JSON that can't be covered by .NET somehow.



  • @another_sam said:

    They have identical semantics.

    Exactly, which is why using the untidy version smells bad.

    Unnecessarily prolix data interchange formats give me hives. The entire point of text-based data interchange is easy human readability; obscuring the actual data in redundant boilerplate is an error, to my way of thinking.



  • @another_sam said:

    You are putting a dependency in your business objects on the method by which they are serialised and deserialised. Even worse, you are requiring your business objects to know how to deserialise themselves.

    Speaking as somebody who has not yet been bitten by enough OO antipatterns to know any better, I don't yet understand why exposing enough of an object's internals to allow some other object to do ser/des for it is a Better Thing than teaching it to do its own. Is data encapsulation no longer Teh New Hotness?



  • In OO, data encapsulation is still the new hotness. That's pretty core, I don't see that changing in a hurry. But like many things, it's best in moderation.

    Things like separation of concerns and loosely coupled code are also considered important. If your object knows JSON, it's tightly coupled to JSON. Ditto XML, JDBC, etc. It also becomes a large chunk of code that's trying to do many things instead of modelling just one business-logic thing. Such objects are commonly referred to as God objects, and they're a common antipattern.

    A more flexible system has your business logic at the core, loosely coupled so it's easily unit tested. Then you have a JSON/XML/JDBC library, and a serialiser/serialiser/DAO. That serialiser/serialiser/DAO knows JSON/XML/JDBC and just enough of your business object to get the job done.

    You may even have multiple implementations for different serialisation in the same project, and it doesn't pollute your business objects.
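
    A minimal sketch of that shape in Java, assuming Jackson for the JSON handling (Order and OrderJsonSerialiser are illustrative names): the business object stays format-free, and the serialiser knows JSON plus just enough of Order to get the job done.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    class Order {                              // business object: no JSON, XML, or JDBC here
        private final String id;
        private final int quantity;

        Order(String id, int quantity) {
            this.id = id;
            this.quantity = quantity;
        }

        String getId() { return id; }
        int getQuantity() { return quantity; }
    }

    class OrderJsonSerialiser {                // knows JSON, and just enough of Order
        private final ObjectMapper mapper = new ObjectMapper();

        String toJson(Order order) {
            return mapper.createObjectNode()
                    .put("id", order.getId())
                    .put("quantity", order.getQuantity())
                    .toString();
        }

        Order fromJson(String json) throws Exception {
            JsonNode node = mapper.readTree(json);
            return new Order(node.get("id").asText(), node.get("quantity").asInt());
        }

        public static void main(String[] args) throws Exception {
            OrderJsonSerialiser s = new OrderJsonSerialiser();
            String json = s.toJson(new Order("A-17", 3));
            System.out.println(json);                            // {"id":"A-17","quantity":3}
            System.out.println(s.fromJson(json).getQuantity());  // 3
        }
    }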



  • Thanks, that makes sense.



  • @another_sam said:

    That serialiser/serialiser/DAO knows JSON/XML/JDBC and just enough of your business object to get the job done.

    How would it be if your business objects had generic serialize() and deserialize() methods that took a Serializer object as a parameter, and you derived specialist Serializers for JSON, XML, unit test stub, or whatever? That way, Serializer would only ever need to know how to deal with a handful of primitive types rather than needing to see inside your business objects at all, and your business objects would not need to know anything about specific serialization formats.
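
    A rough sketch of that idea (all names here are hypothetical, not any real library): the Serializer only ever sees primitive name/value pairs, so a JSON, XML, or test-stub implementation can be swapped in without the business object knowing the format.

    // Hypothetical interfaces sketching the pattern described above.
    interface Serializer {
        void writeString(String name, String value);
        void writeInt(String name, int value);
    }

    interface Deserializer {
        String readString(String name);
        int readInt(String name);
    }

    class Account {
        private final String owner;
        private final int balance;

        Account(String owner, int balance) {
            this.owner = owner;
            this.balance = balance;
        }

        // The object walks its own fields; it never learns the concrete format.
        void serialize(Serializer out) {
            out.writeString("owner", owner);
            out.writeInt("balance", balance);
        }

        static Account deserialize(Deserializer in) {
            return new Account(in.readString("owner"), in.readInt("balance"));
        }
    }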



  • Why write extra code? Just let the serializer handle your business objects. We have something called "reflection" now.



  • @ben_lubar said:

    Why write extra code?

    To avoid the possibility of business logic needing to leak into the serializer.

    @ben_lubar said:

    We have something called "reflection" now.

    Doesn't using it run entirely counter to the principle of data encapsulation?



  • @flabdablet said:

    How would it be if your business objects had generic serialize() and deserialize() methods that took a Serializer object as a parameter

    As always, it depends. It's not necessarily a terrible design. Native Java serialisation is similar except there's only one kind of serialiser and all JVM objects have a default implementation of "Serialise ALL the fields!".

    It does require your objects to know how to serialise even if they don't know the format, which, depending on object graph complexity, might lead to unnecessary complexity in your objects. It's often better for that complexity to be in the object that is responsible for it: the serialiser.

    @flabdablet said:

    To avoid the possibility of business logic needing to leak into the serializer.

    That's pretty much the entire job of the serialiser.

    @flabdablet said:

    Doesn't using [reflection] run entirely counter to the principle of data encapsulation?

    Yes, and compile-time safety and performance and many other things. Sometimes it's still the best solution to the problem.



  • @flabdablet said:

    Doesn't using it run entirely counter to the principle of data encapsulation?

    You can make some sort of a facade POCO object tied to your actual model. Kinda like a viewmodel, or something.

    In this regard, the list of the object's properties, i.e. what you get via reflection, is your contract.
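
    A small sketch of that facade idea in Java (hypothetical names): the DTO's public properties are the wire contract the serializer reflects over, and the richer domain object stays behind it.

    class Employee {                           // domain model, free to carry behaviour
        private final String name;
        private final double hourlyRate;

        Employee(String name, double hourlyRate) {
            this.name = name;
            this.hourlyRate = hourlyRate;
        }

        String getName() { return name; }
        double getHourlyRate() { return hourlyRate; }
    }

    class EmployeeDto {                        // facade POJO: its properties are the contract
        public String name;
        public double hourlyRate;

        static EmployeeDto from(Employee e) {
            EmployeeDto dto = new EmployeeDto();
            dto.name = e.getName();
            dto.hourlyRate = e.getHourlyRate();
            return dto;
        }
    }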



  • So instead of adding serdes methods to your objects, you just bolt friends onto the sides of them? How much complexity does that save, in practice? ("That depends" is the answer I'm expecting there.)

