PostgreSQL with JSON: Should I take the plunge?



  • So far, my usage of PostgreSQL JSON field was limited to validity checking. Eg. test that the provided blob of text is valid JSON.

    The only time I was bit was when I tried to compare two records containing a JSON field.

    SELECT '{"name": "a", "age": 13}'::json = '{"age" :13,"name": "a"}'::json
    

    ERROR: operator does not exist: json = json
    SQL state: 42883

    But fine, I understand how difficult this comparison would be to implement.

    But now, I'm on verge of taking a plunge into the deep water. I have a bunch of auto-generated tables with fixed data schema. Eg. each table has columns id and name. But now I need to add variable number of additional columns, that will be configurable through user interface.

    One way is to just alter all my functions and code to work with arbitrary number of columns. Which is just barely this side of possible. But a huge pain in the ass.

    Or I can use json. My quick tests so far worked like a dream. Just add a column data of type json, and store a hash of key/values in it. The two thing I expect to need (dig into json data in WHERE queries and add indexes) seem to be covered.

    My question is, what's the catch? Am I missing some terrible downside that's gonna bite me in the ass later on?

    Anyone here has any experience to share?


  • Winner of the 2016 Presidential Election

    @cartman82 said:

    Or I can use json. My quick tests so far worked like a dream. Just add a column data of type json, and store a hash of key/values in it. The two thing I expect to need (dig into json data in WHERE queries and add indexes) seem to be covered.

    I intended to use something similar, but it didn't get the time to test it. Are you storing any additional metadata about too, or just values? This bit is just my own curiosity though.

    As for experience, I don't store any JSON but I deal with decoding and encoding to and from JSON in a stored procedure and I can confirm that I had no hiccups, works great. Granted, datasets I work with are pretty small but so far I had no problems with it.

    Unfortunately, I don't store any JSON at this point so I can't comment on how good or bad the indexing and querying is.



  • @cartman82 said:

    But fine, I understand how difficult this comparison would be to implement.

    JSON?

    That's actually pretty easy to compare. (Not fast, but not difficult.)



  • @Onyx said:

    I intended to use something similar, but it didn't get the time to test it. Are you storing any additional metadata about too, or just values? This bit is just my own curiosity though.

    I'll be storing the list of hash keys per table in a metadata table. So I can make sure user doesn't try to do something with a key that's not there.

    @Onyx said:

    As for experience, I don't store any JSON but I deal with decoding and encoding to and from JSON in a stored procedure and I can confirm that I had no hiccups, works great. Granted, datasets I work with are pretty small but so far I had no problems with it.

    Unfortunately, I don't store any JSON at this point so I can't comment on how good or bad the indexing and querying is.

    I pussied out when I was making procedures and just used regular arrays. Eg.

    CREATE FUNCTION blagh(arg1 integer[], arg2 text[])
    
    #...
    
    SELECT blagh(ARRAY[1, 2], ARRAY['one', 'two'])
    
    

    instead of

    CREATE FUNCTION blagh(args json[]) AS...
    
    #...
    
    SELECT blagh('[ {"arg1": 1, "arg2": "one" }, { "arg1": 2: "arg2": "two" } ]'::json)
    

    I guess it's more efficient, but boy is it annoying to prepare these arguments in code.

    @blakeyrat said:

    JSON?

    That's actually pretty easy to compare. (Not fast, but not difficult.)

    Yeah, some people already did it on their own. I guess pg guys are kind of like Apple, and would rather not have the feature than leave a performance mine for the unwary to get burned.



  • @cartman82 said:

    Yeah, some people already did it on their own. I guess pg guys are kind of like Apple, and would rather not have the feature than leave a performance mine for the unwary to get burned.

    According to that page:

    UPDATE: PostgreSQL 9.4 comes with the new JSONB type. It is better, stronger, faster and can be sorted. Just use that if possible.

    So presumably (?) they can test for equality with it.



  • Autogenerated JSON is awesome. There is no catch, except that you have to be careful about validation for clients (and you should be, anyway). This makes your job easier. Especially if your application code lets you build JSON parsers/emitters for a class and they are syntactically compatible encodings. (Typically, you can adjust the encoding syntax some, since there is no canonical encoding).



  • They're probably pre-tokening it, which would make compares a lot faster than comparing raw text.



  • Nice one @boomzilla!

    SELECT '{"txt":"a", "num":1}'::jsonb = '{"num" :1, "txt":  "a"}'::jsonb
    

    t

    I actually completely missed the jsonb format. I limited all my doc searches to 9.3 because that's the version I have in production. I vaguely remember hearing better json support is coming. Seems like it's here.

    This is even better. Definitely worth pushing the servers to 9.4.



  • json columns will certainly work great for things like user preferences, because you have tri-state bools for free - TRUE, FALSE, DEFAULT.

    Also, whenever you have an array that will be usually 1-3 items but maybe more, json will be nicer than a table you JOIN on.


    I believe in Go you can do this:

    var currentSettings Settings // where Settings is a struct{...} - i.e. by-value
    // currentSettings is all zeroes, so copy the defaults in
    currentSettings = defaultSettings
    // Now expand the settings json on top of that
    decoder.Decode(&currentSettings)
    

    Free tri-state booleans.



  • You can also put a *bool field in your struct so you can tell the difference between the default value and a nonexistent value.



  • @riking said:

    json columns will certainly work great for things like user preferences, because you have tri-state bools for free - TRUE, FALSE, DEFAULT.

    I don't understand.

    @riking said:

    Also, whenever you have an array that will be usually 1-3 items but maybe more, json will be nicer than a table you JOIN on.

    I use pg arrays for that. JSON seemed like overhead.

    But since pg 9.4, from their docs:

    JSON objects are better suited than arrays for testing containment or existence when there are many keys or elements involved, because unlike arrays they are internally optimized for searching, and do not need to be searched linearly.

    Who would have thought.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.