JSON is hard...
-
So I'm working on a project that requires me to visually compare images for similarity.
Well this is just a perceptual hashing and distance calculation problem! That's EASY*!
* No, it's not. This is what we call foreshadowing.
ImageMagick can generate that using its `moments` module. ImageMagick also supports JSON output from all "identify"-type operations, of which `moments` is one. So you would think all I have to do is:
- Spawn `convert -quiet -moments $filename json:-`
- Read stdout
- Parse with `JSON.parse()`
Right?
Ha ha ha ha ha ha ha ha!
BZZZZZZZT! WRONG! (but thanks for playing)
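For reference, the naive plan from the list above could be sketched in Node roughly like this. It assumes Node's built-in `child_process.execFile` and ImageMagick's `convert` on the PATH; `runAndParseJson` and `readMoments` are made-up names. The command runs without a shell, so odd filenames can't inject anything, and stdout goes straight into `JSON.parse()`, which is exactly the step that falls over:

```javascript
const { execFile } = require('node:child_process');

// Run a command without a shell, collect stdout, and parse it as JSON.
// maxBuffer is raised because multi-frame images can produce a lot of output.
function runAndParseJson(command, args) {
  return new Promise((resolve, reject) => {
    execFile(command, args, { maxBuffer: 64 * 1024 * 1024 }, (err, stdout) => {
      if (err) return reject(err);
      try {
        resolve(JSON.parse(stdout));
      } catch (parseErr) {
        reject(parseErr); // ImageMagick's "JSON" ends up here
      }
    });
  });
}

// The command from the list above; assumes ImageMagick's `convert` is on PATH.
function readMoments(filename) {
  return runAndParseJson('convert', ['-quiet', '-moments', filename, 'json:-']);
}
```

`readMoments(filename)` returns a promise; with output like the examples that follow, it rejects with a `SyntaxError`.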
Depending on the input image the output can contain:
- embedded `␍` characters. That's the U+240D "SYMBOL FOR CARRIAGE RETURN".
Ugh... fiiiine. I can fix that......
stdout = stdout.replace(/␍/gm, '')
- Number fields in the output that for some reason have no value are printed as (nan) instead of the more sensible null (or even just omitting the field entirely).
Okay, okay, I can fix that too.
stdout = stdout.replace(/␍/gm, '') .replace(/\b(nan)\b/g, 'null')
- Certain dimension fields are printed as (+12345+67890)
What the actual flying fuck? Okay! Fine!~ You're now strings!
stdout = stdout.replace(/␍/gm, '') .replace(/\b(nan)\b/g, 'null') .replace(/\s(\+\d+\+\d+)$/gm, '"$1"')
- Commas may be omitted arbitrarily at the end of fields that contain objects and arrays, for reasons I do not know.
YOU SAID YOU SUPPORTED JSON OUTPUT! :giant_squid_of_anger: THIS IS NOT JSON OUTPUT!
stdout = stdout.replace(/␍/gm, '') .replace(/\b(nan)\b/g, 'null') .replace(/\s(\+\d+\+\d+)$/gm, '"$1"') .replace(/([^,}\][{])($|\n)/g, '$1,$2')
- Processing images that have multiple frames, such as GIFs, APNGs, or simply images that consist of several images concatenated together (I don't know why), results in a list of response objects that are not comma separated and not embedded in an array.
-sobbing quietly-
stdout = '[' + stdout.replace(/␍/gm, '') .replace(/^\s+$/gm,'') .replace(/\n\n/gm, '\n') .replace(/\b(nan)\b/g, 'null') .replace(/\s(\+\d+\+\d+)$/gm, '"$1"') .replace(/([^,}\][{])($|\n)/g, '$1,$2') .replace(/,((\s)*[\]}])/g, '$1') .replace(/^\}\n/gm, '}, \n') .replace(/, \n,/gm, '') + ']'
Finally it parses, and so far... it hasn't been thrown an image that this monstrosity doesn't fix the JSON output to be parseable, but I'm only about 10% of the way through processing all the images I need to process... Still tens of thousands of user-submitted images to go...
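An alternative for the multi-frame case only, sketched under the assumption that braces in the output appear either as structure or inside double-quoted strings: instead of gluing frames together with `[`, `]` and regex-inserted commas, walk the text once and cut it wherever the brace depth returns to zero. Each piece would still need the other per-field fixes before `JSON.parse()`; `splitTopLevelObjects` is a hypothetical name.

```javascript
// Split concatenated top-level objects ("{...}\n{...}") into separate texts
// by tracking brace depth. Braces inside double-quoted strings are ignored,
// and backslash escapes inside strings are honoured.
function splitTopLevelObjects(text) {
  const docs = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      if (depth === 0) start = i; // a new top-level object begins
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) docs.push(text.slice(start, i + 1)); // object closed
    }
  }
  return docs;
}
```

Usage would be along the lines of `splitTopLevelObjects(stdout).map(fixOneFrame).map(JSON.parse)`, where `fixOneFrame` stands in for whatever per-field repairs are still needed.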
Then comes the fun part of comparing all those results against each other to find images visually similar to those claimed to be copyrighted by what I suspect to be a copyright troll company trying to get us to pay them money to go away.... But.... That's a matter for our corporate lawyers. I just gotta get this analysis they want.
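For the comparison step, one simple distance is the sum of squared differences between two hash vectors. This is a sketch: it assumes the perceptual-hash numbers have already been pulled out of each parsed result into flat arrays in the same order, and it is not necessarily what ImageMagick's own PHASH compare metric computes. `phashDistance`, `findSimilar` and the threshold are hypothetical.

```javascript
// Sum of squared differences between two equal-length numeric vectors.
function phashDistance(a, b) {
  if (a.length !== b.length) throw new RangeError('hash vectors differ in length');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Every candidate within `threshold` of the target hash, closest first.
// The threshold would have to be tuned on real data.
function findSimilar(target, candidates, threshold) {
  return candidates
    .map((c) => ({ id: c.id, distance: phashDistance(target, c.hash) }))
    .filter((r) => r.distance <= threshold)
    .sort((x, y) => x.distance - y.distance);
}
```

Against a handful of claimed images this is one linear scan per claimed image, which stays cheap even for tens of thousands of candidates.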
But seriously ImageMagick...... If you're going to be generating your JSON output by concatenating strings, maintaining internal state, and praying you're generating valid JSON........ Please don't.
I'd rather have dealt with parsing a fixed width flat file output than this.....
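For contrast, the build-a-value-then-serialize approach, illustrated in JavaScript rather than ImageMagick's C (the field names `offset` and `value` and both function names are made up): the serializer writes every comma and bracket, the offset stays a quoted string, all frames land in one array, and `JSON.stringify` writes `NaN` as `null`, so none of the regex repairs above would be needed.

```javascript
// Format an x/y offset like the post shows it, e.g. "+12345+67890".
function formatOffset(x, y) {
  const signed = (n) => (n < 0 ? `-${-n}` : `+${n}`);
  return signed(x) + signed(y);
}

// Build plain objects for every frame, then serialise once at the end.
function framesToJson(frames) {
  const out = frames.map((f) => ({
    offset: formatOffset(f.x, f.y), // a string, so it is always quoted
    value: f.value, // NaN and Infinity are written as null by JSON.stringify
  }));
  return JSON.stringify(out, null, 2); // one array, however many frames
}
```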
-
/me passes @Vixen a hankey
-
@Vixen You might try the parser VSCode uses for JSON. It seems to cope with all manner of syntax errors - a fact I discovered when I noticed an egregious error in my VSCode config, yet all my other settings were unaffected.
-
@error said in JSON is hard...:
@Vixen You might try the parser VSCode uses for JSON. It seems to cope with all manner of syntax errors - a fact I discovered when I noticed an egregious error in my VSCode config, yet all my other settings were unaffected.
is that one available as a nuget package? cause i didn't see it...
-
@Vixen said in JSON is hard...:
@error said in JSON is hard...:
@Vixen You might try the parser VSCode uses for JSON. It seems to cope with all manner of syntax errors - a fact I discovered when I noticed an egregious error in my VSCode config, yet all my other settings were unaffected.
is that one available as a nuget package? cause i didn't see it...
Nuget? I thought we were talking about JavaScript.
-
Or you could parse the data as JS and traverse the AST.
-
@error said in JSON is hard...:
we were talking about JavaScript.
only because it's more concise to show the transformations i need to apply to stdout to get it to parse.
Y'all don't want to see the C# it was originally written in, do you?
;-P
-
@error said in JSON is hard...:
Or you could parse the data as JS and traverse the AST.
but those transformations needed...... aren't valid JS either. and if i have to munge it, it's better to munge it to JSON proper, yes?
-
@Vixen said in JSON is hard...:
@error said in JSON is hard...:
Or you could parse the data as JS and traverse the AST.
but those transformations needed...... aren't valid JS either. and if i have to munge it, it's better to munge it to JSON proper, yes?
Yes. JS parsers just tend to be more resilient.
-
@Vixen What about JSON containing COBOL data? It contains only one long string; the first value starts at index 0 of that string and extends for n1 characters, followed by value 2 extending for n2 characters, ...
I am pretty sure that someone already invented that (ehm, actually several people invented it independently).
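The fixed-width layout described above is easy to unpack once the widths are known. A minimal sketch (the function name and the example widths are assumptions): field k starts where field k-1 ended, and a width list that doesn't cover the record exactly is treated as an error rather than silently truncated.

```javascript
// Cut one long record into fields of known widths: field 1 is
// record[0, n1), field 2 is record[n1, n1 + n2), and so on.
function splitFixedWidth(record, widths) {
  const fields = [];
  let offset = 0;
  for (const width of widths) {
    fields.push(record.slice(offset, offset + width));
    offset += width;
  }
  if (offset !== record.length) {
    throw new RangeError(`widths cover ${offset} chars, record has ${record.length}`);
  }
  return fields; // padding is kept; trimming is left to the caller
}
```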
-
Isn't the standard solution for these cases base64 encoding?
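In Node, the base64 wrapping would look something like this (a sketch; the `encoding`/`data` envelope and the function names are made up). Whatever bytes go in, the JSON stays valid because the payload is just an ASCII string:

```javascript
// Wrap arbitrary bytes as base64 inside a small JSON envelope.
function wrapPayload(bytes) {
  return JSON.stringify({ encoding: 'base64', data: Buffer.from(bytes).toString('base64') });
}

// Reverse of wrapPayload: parse the envelope and decode the bytes.
function unwrapPayload(json) {
  const { encoding, data } = JSON.parse(json);
  if (encoding !== 'base64') throw new Error(`unexpected encoding: ${encoding}`);
  return Buffer.from(data, 'base64');
}
```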
-
@_P_ said in JSON is hard...:
Isn't the standard solution for these cases base64 encoding?
The best thing about standards is there's always a different one?
-
@Vixen said in JSON is hard...:
It hasn't been thrown an image that this monstrosity doesn't fix the JSON output to be parseable
Yeah no, it's still throwing bad JSON. Only 50 images out of 80k tho. And of those, 30 are incomplete uploads or uploads of something that isn't an image in the first place (oh. hello there WannaCry..... too bad you were uploaded to a Linux host.... [DELETE! DELETE! DELETE!])
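With a few dozen failures in 80k, it helps to parse each image's output independently and set the failures aside for manual review instead of aborting the whole batch. A sketch, where `parseBatch` is a made-up name and `repair` is a hypothetical stand-in for the regex chain from the first post:

```javascript
// Parse every image's output on its own; collect failures instead of throwing.
function parseBatch(outputs, repair = (s) => s) {
  const parsed = [];
  const failed = [];
  for (const { file, stdout } of outputs) {
    try {
      parsed.push({ file, data: JSON.parse(repair(stdout)) });
    } catch (err) {
      failed.push({ file, error: err.message }); // keep for manual review
    }
  }
  return { parsed, failed };
}
```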
-
@dcon said in JSON is hard...:
@_P_ said in JSON is hard...:
Isn't the standard solution for these cases base64 encoding?
The best thing about standards is there's always a different one?
base85? base91? salted base64?
-
@_P_ said in JSON is hard...:
@dcon said in JSON is hard...:
@_P_ said in JSON is hard...:
Isn't the standard solution for these cases base64 encoding?
The best thing about standards is there's always a different one?
base85? base91? salted base64?
Don't forget to throw in some rot13 - or even better, rot26!
-
@Vixen said in JSON is hard...:
YOU SAID YOU SUPPORTED JSON OUTPUT! :giant_squid_of_anger: THIS IS NOT JSON OUTPUT!
Use a HOCON parser, that'll work on all of the stuff you mentioned.
-
@pie_flavor said in JSON is hard...:
@Vixen said in JSON is hard...:
YOU SAID YOU SUPPORTED JSON OUTPUT! :giant_squid_of_anger: THIS IS NOT JSON OUTPUT!
Use a HOCON parser, that'll work on all of the stuff you mentioned.
Should I consider this to replace YAML for my config files?
-
@error If you're looking to replace YAML for your config files, start with TOML.
But after that, yes.
-
@pie_flavor said in JSON is hard...:
TOML
*choking back bile after reading the spec* Oh, right. Earth-73.
-
@error I assume you have some religious aversion to first-class date-times.
-
@pie_flavor said in JSON is hard...:
first-class date-times
It sounds like dates are not really on the list of concerns...
-
@Tsaukpaetra said in JSON is hard...:
It sounds like dates are not really on the list of concerns...
you are more concerned with the rejection?
-
@pie_flavor said in JSON is hard...:
@error I assume you have some religious aversion to first-class date-times.
Nah. Just can't afford first class.
-
@error said in JSON is hard...:
@pie_flavor said in JSON is hard...:
@Vixen said in JSON is hard...:
YOU SAID YOU SUPPORTED JSON OUTPUT! :giant_squid_of_anger: THIS IS NOT JSON OUTPUT!
Use a HOCON parser, that'll work on all of the stuff you mentioned.
Should I consider this to replace YAML for my config files?
I will switch to HOCON when it has:
- a fully compliant JS implementation
- IDE support
-
@error said in JSON is hard...:
@pie_flavor said in JSON is hard...:
TOML
*choking back bile after reading the spec*
TOML is a human-written config file format, not a data exchange format. It's great for human-written config files, mostly because it avoids endless nesting - but that's exactly why it sucks for data exchange.
-
@Gąska said in JSON is hard...:
@error said in JSON is hard...:
@pie_flavor said in JSON is hard...:
TOML
*choking back bile after reading the spec*
TOML is a human-written config file format, not a data exchange format. It's great for human-written config files, mostly because it avoids endless nesting - but that's exactly why it sucks for data exchange.
Yeah, I want to use it for config files, but I don't want my config files resembling ini files (though I know lots of FOSS projects do).
-
@error why not? It's very readable. It's easy to write. It's trivial to comment out small fragments, rearrange sections, or copy snippets from the internet. What's not to like?
-
@error said in JSON is hard...:
@Gąska said in JSON is hard...:
@error said in JSON is hard...:
@pie_flavor said in JSON is hard...:
TOML
*choking back bile after reading the spec*
TOML is a human-written config file format, not a data exchange format. It's great for human-written config files, mostly because it avoids endless nesting - but that's exactly why it sucks for data exchange.
Yeah, I want to use it for config files, but I don't want my config files resembling ini files (though I know lots of FOSS projects do).
Is it bad that the entirety of my configuration these days is
const frobWidgets = process.env['PRODPREFIX_FROB_WIDGETS'] || true
cause it is....
I'd do a configuration file, but like I only have like five configuration options, and four of them are just overriding the database connection stuff (defaults: host `postgres`, database `postgres`, user `postgres`, password `letmein`), and the defaults are suitable for connecting to a containerized database anyway..... so you know. Really you should only have the one configuration setting to change, because you just give me a container to connect to as the database. :-)
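Side note on that one-liner: environment variables are strings, so `process.env[...] || true` gives `true` when the variable is unset but the (truthy) string `"false"` when it is set to false. A sketch that parses the flag properly and applies the database defaults listed above; `envBool`, `readConfig` and the `PRODPREFIX_DB_*` variable names are hypothetical:

```javascript
// Interpret an environment variable as a boolean so "false"/"0" really mean false.
function envBool(value, fallback) {
  if (value === undefined || value.trim() === '') return fallback;
  const v = value.trim().toLowerCase();
  if (['1', 'true', 'yes', 'on'].includes(v)) return true;
  if (['0', 'false', 'no', 'off'].includes(v)) return false;
  throw new Error(`not a boolean: ${value}`);
}

// Defaults match the containerized database described above.
function readConfig(env = process.env) {
  return {
    frobWidgets: envBool(env.PRODPREFIX_FROB_WIDGETS, true),
    db: {
      host: env.PRODPREFIX_DB_HOST || 'postgres',
      database: env.PRODPREFIX_DB_NAME || 'postgres',
      user: env.PRODPREFIX_DB_USER || 'postgres',
      password: env.PRODPREFIX_DB_PASSWORD || 'letmein',
    },
  };
}
```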
-
@Gąska said in JSON is hard...:
@error why not? It's very readable. It's easy to write. It's trivial to comment out small fragments, rearrange sections, or copy snippets from the internet. What's not to like?
I think YAML is a better fit for my use-case.
roles:
  admin:
    $and:
      - roles: { $nin: [ bots, persona_non_grata ] }
      - $or:
          - roles: { $in: [ owner ] }
          - groups:
              $in:
                - administrators
                - trust_level_4
  bots:
    groups: { $in: [ bots ] }
  owner:
    userslug: { $eq: error }
  persona_non_grata:
    userslug:
      $in:
        - levicki
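If those rules are read as MongoDB-style query operators, a tiny evaluator might look like the following. This is a sketch under assumptions not in the post: a user is an object with `userslug` and a `groups` array, and a condition on `roles` is satisfied by evaluating the referenced role's own rule (with a guard against cycles). `rules` is the YAML above after parsing; `hasRole`, `matches` and `has` are hypothetical names.

```javascript
// The YAML rules above as a plain object, as a YAML parser would return them.
const rules = {
  admin: {
    $and: [
      { roles: { $nin: ['bots', 'persona_non_grata'] } },
      { $or: [
        { roles: { $in: ['owner'] } },
        { groups: { $in: ['administrators', 'trust_level_4'] } },
      ] },
    ],
  },
  bots: { groups: { $in: ['bots'] } },
  owner: { userslug: { $eq: 'error' } },
  persona_non_grata: { userslug: { $in: ['levicki'] } },
};

// Does `user` have `value` in `field`? "roles" is resolved through the rules.
function has(user, field, value, ruleSet, seen) {
  if (field === 'roles') return hasRole(user, value, ruleSet, seen);
  const actual = user[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

function matches(user, cond, ruleSet, seen) {
  return Object.entries(cond).every(([key, sub]) => {
    if (key === '$and') return sub.every((c) => matches(user, c, ruleSet, seen));
    if (key === '$or') return sub.some((c) => matches(user, c, ruleSet, seen));
    // Otherwise `key` is a field name and `sub` maps operators to arguments.
    return Object.entries(sub).every(([op, arg]) => {
      if (op === '$eq') return has(user, key, arg, ruleSet, seen);
      if (op === '$in') return arg.some((v) => has(user, key, v, ruleSet, seen));
      if (op === '$nin') return !arg.some((v) => has(user, key, v, ruleSet, seen));
      throw new Error(`unsupported operator ${op}`);
    });
  });
}

function hasRole(user, role, ruleSet = rules, seen = new Set()) {
  // Unknown roles and cycles (a role depending on itself) count as "no".
  if (seen.has(role) || !Object.prototype.hasOwnProperty.call(ruleSet, role)) return false;
  return matches(user, ruleSet[role], ruleSet, new Set(seen).add(role));
}
```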
-
Tangentially, I once believed that a YAML parser could parse JSON, because said YAML parser said it was JSON-compatible. It lied - it could only parse JSON written in the very-specific whitespace-sensitive way that YAML likes.
Valid:
{key: "value"}
Invalid:
{key:"value"}
-
@PotatoEngineer said in JSON is hard...:
Tangentially, I once believed that a YAML parser could parse JSON, because said YAML parser said it was JSON-compatible. It lied - it could only parse JSON written in the very-specific whitespace-sensitive way that YAML likes.
Yeah, it fooled me with that claim, too. It doesn't even support tab indentation, which is one of my major gripes with it. The others are:
- a lack of ways to import other YAML documents
- you can merge objects but not concat arrays
- you can't reference individual properties of objects
HOCON addresses several of these.
-
@PotatoEngineer Years ago I had a task where the output was an XML file with only a couple items changed. The normal process was to copy the example XML file, copy a few fields from a row in a browser window (IE-only, of course) and put them in the right place in the XML file, then submit the changed file. I wrote myself a small program where you would paste the entire row and it would grab the fields and do the needful, and spit out an XML file. However, the files got rejected. I checked the diff between my generated file and a working one, and the only change was some whitespace in between the XML tags.
I still sometimes wonder what they were using to consume these files.
-
@error said in JSON is hard...:
@Gąska said in JSON is hard...:
@error why not? It's very readable. It's easy to write. It's trivial to comment out small fragments, rearrange sections, or copy snippets from the internet. What's not to like?
I think YAML is a better fit for my use-case.
roles:
  admin:
    $and:
      - roles: { $nin: [ bots, persona_non_grata ] }
      - $or:
          - roles: { $in: [ owner ] }
          - groups:
              $in:
                - administrators
                - trust_level_4
  bots:
    groups: { $in: [ bots ] }
  owner:
    userslug: { $eq: error }
  persona_non_grata:
    userslug:
      $in:
        - levicki
Heh, your configs read like Crusader Kings/Europa Universalis event scripts. Yeah, I don't think it'd benefit much from a change. TOML is more for the simple, deeply-nested-hierarchy-of-plain-mostly-optional-values config files, not for the quasi-declarative-programming kind.
-
@error said in JSON is hard...:
- a lack of ways to import other YAML documents
- you can merge objects but not concat arrays
- you can't reference individual properties of objects
HOCON addresses several of these.
Depending on the particular implementation, you'll run into different walls regarding references and includes (I'm only familiar with the Java implementation from Lightbend, and it can only do absolute names and doesn't like includes being put in arrays). And rolling your own is quite a feat: HOCON is basically a mashup of several different incompatible object notations, similar to including HTML in Markdown.
-
@hungrier said in JSON is hard...:
I still sometimes wonder what they were using to consume these files.
No, you really shouldn't...
-
@dcon said in JSON is hard...:
@hungrier said in JSON is hard...:
I still sometimes wonder what they were using to consume these files.
No, you really shouldn't...
Homegrown XML/JSON parsers, as usual. Shouldn't have expected anything else.
-
@error said in JSON is hard...:
@PotatoEngineer said in JSON is hard...:
Tangentially, I once believed that a YAML parser could parse JSON, because said YAML parser said it was JSON-compatible. It lied - it could only parse JSON written in the very-specific whitespace-sensitive way that YAML likes.
Yeah, it fooled me with that claim, too. It doesn't even support tab indentation, which is one of my major gripes with it. The others are:
- a lack of ways to import other YAML documents
- you can merge objects but not concat arrays
- you can't reference individual properties of objects
HOCON addresses several of these.
If you need something flexible, this looks interesting:
It is a superset of JSON, it has comments and unquoted keys to make hand-writing easier, but it can also do imports and define variables for repeated bits and reference earlier values and a lot more. You can also use it to pre-process configuration and spit it out as JSON, YAML or INI.
-
@Bulb it looks very interesting, but I shudder at the thought of what can be done with it by our not very technical data scientists...
-
@Bulb said in JSON is hard...:
@error said in JSON is hard...:
@PotatoEngineer said in JSON is hard...:
Tangentially, I once believed that a YAML parser could parse JSON, because said YAML parser said it was JSON-compatible. It lied - it could only parse JSON written in the very-specific whitespace-sensitive way that YAML likes.
Yeah, it fooled me with that claim, too. It doesn't even support tab indentation, which is one of my major gripes with it. The others are:
- a lack of ways to import other YAML documents
- you can merge objects but not concat arrays
- you can't reference individual properties of objects
HOCON addresses several of these.
If you need something flexible, this looks interesting:
It is a superset of JSON, it has comments and unquoted keys to make hand-writing easier, but it can also do imports and define variables for repeated bits and reference earlier values and a lot more. You can also use it to pre-process configuration and spit it out as JSON, YAML or INI.
It seems to meet all my needs except for a JavaScript implementation, which is kind of a showstopper. Never mind, this should work: https://www.npmjs.com/package/jsonnet-loader
-
@error It isn't a JavaScript implementation though, just a hook to allow running it from webpack as an external tool.
-
@Bulb said in JSON is hard...:
@error It isn't a JavaScript implementation though, just a hook to allow running it from webpack as an external tool.
Good enough for my porpoises.
Filed under: Eee! Eee!
-
@error Then all is dandy ;-).
-
Wait, . They have a Linux binary, and an OSX binary. Hm, are they perhaps missing a popular platform?
Filed under: I'm talking about BSD, obviously.
-
@Gąska said in JSON is hard...:
@Bulb it looks very interesting, but I shudder at the thought of what can be done with it by our not very technical ~~data scientists~~ business users...
-
@error said in JSON is hard...:
Wait, . They have a Linux binary, and an OSX binary. Hm, are they perhaps missing a popular platform?
Filed under: I'm talking about BSD, obviously.
As usual, you have to build it yourself. If you can't do that, you failed the entry test as an Open Source Developer and hence no support will be given to you.