Elemental: War of Magic

eBusiness

@eBusiness said:
FYI, RELOAD is a binary format, and a fine example that a file doesn't need to include human-readable overhead to implement key-value and tree structured storage.

Yeah, I know, as I designed it.

As dhromed also implied, that doesn't prevent your initial statement from being a logical fallacy.

Though I would have done a few things differently, the format looks like a pretty decent job. Mainly it seems like you haven't decided whether you are optimizing for file size or parsing speed. For file size a VLI will almost always be better than a fixed length integer, but it costs a little extra CPU time. And then of course arrays of keyless data would seem like a natural element to have.
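For readers who haven't met the term, here is a minimal sketch of one common variable-length integer (VLI) scheme: 7 data bits per byte, low bits first, with the high bit flagging "more bytes follow" (LEB128-style). This is an assumption for illustration only; RELOAD's own VLI encoding may differ, and the names `encode_vli` and `decode_vli` are hypothetical.

```python
def encode_vli(n: int) -> bytes:
    """Encode a non-negative integer using 7 data bits per byte, low bits first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_vli(data: bytes, pos: int = 0):
    """Decode one integer starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Values below 128 take one byte instead of four, which is where the file-size saving comes from; the loop and bit masking are the extra CPU time.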

pkmnfrk

@dhromed said:

classic snap.

Nonetheless,
@pkmnfrk said:
I pick [RELOAD] over a binary format
Wait what?

Ok, let me clarify. By "binary format", I mean this:

@eBusiness said:

There is a data format class that most people seem to have forgotten exists, binary data. It can be formed in countless ways, but basically it is very easy to use. Every programming language capable of file handling has these neat commands, typically named something like readint, writeint, writeint64, writestring, readstring. All one has to do is use the corresponding sequence of read commands when reading the file, as was used for writing it. It is way faster than all the text data formats.
Any programmer who thinks of binary data as clumsy, weird, hard to use or "only for the pros" really ought to get to know their computer better. A binary format should be the default choice, and then XML might be chosen in a few cases where a normal binary file format lacks a feature that XML has.
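The "same sequence of reads as writes" idea in the quote can be sketched roughly as follows. The helper names mirror the ones the quote mentions, but the record layout (a name, a level, a gold count), the little-endian 32-bit integers, and the length-prefixed UTF-8 strings are all assumptions made up for this illustration.

```python
import io
import struct


def write_int(f, value):
    f.write(struct.pack("<i", value))  # 32-bit signed, little-endian


def read_int(f):
    return struct.unpack("<i", f.read(4))[0]


def write_string(f, text):
    raw = text.encode("utf-8")
    write_int(f, len(raw))  # length prefix, then the bytes
    f.write(raw)


def read_string(f):
    length = read_int(f)
    return f.read(length).decode("utf-8")


def save(f, name, level, gold):
    # Hypothetical record: the field order here defines the format.
    write_string(f, name)
    write_int(f, level)
    write_int(f, gold)


def load(f):
    # Must read in exactly the same order as save() wrote.
    name = read_string(f)
    level = read_int(f)
    gold = read_int(f)
    return name, level, gold
```

Note that nothing in the file says which field is which: the reader and writer have to agree on the order out of band, which is the point the two posters are arguing about.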

I designed RELOAD to be like XML, but with minimal overhead. Yes, it's binary, but it's semantic and not prone to the kind of clusterfucks that this guy is advocating. Tags are named, but the names are stored in a string table so they only need to appear once, etc. You get the benefits of a binary format, with most of the benefits of an XML [/JSON/YAML] file (the only one missing is that it's not user-editable. But, to fix this, I also created a utility to convert an XML document to RELOAD).
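A string table of the kind described can be sketched like this. It only shows the general idea (each distinct tag name is stored once and nodes refer to it by index); the class name and the way RELOAD actually lays the table out on disk are not taken from the format and are assumptions.

```python
class StringTable:
    """Maps each distinct tag name to a small integer index."""

    def __init__(self):
        self.names = []    # index -> name, written to the file once
        self._index = {}   # name -> index, used while writing

    def intern(self, name):
        """Return the index for name, adding it on first use."""
        if name not in self._index:
            self._index[name] = len(self.names)
            self.names.append(name)
        return self._index[name]


table = StringTable()
tag_ids = [table.intern(tag) for tag in ["item", "name", "item", "name", "price"]]
```

Here `tag_ids` holds one small integer per node, and `table.names` holds the three distinct names that would be written to the file a single time.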

@eBusiness said:

@pkmnfrk said:
@eBusiness said:
FYI, RELOAD is a binary format, and a fine example that a file doesn't need to include human-readable overhead to implement key-value and tree structured storage.

Yeah, I know, as I designed it.
As dhromed also implied, that doesn't prevent your initial statement from being a logical fallacy.
Though I would have done a few things differently, the format looks like a pretty decent job. Mainly it seems like you haven't decided whether you are optimizing for file size or parsing speed. For file size a VLI will almost always be better than a fixed length integer, but it costs a little extra CPU time. And then of course arrays of keyless data would seem like a natural element to have.

We debated using VLI for sizes recently. The reason I didn't was to make the file-writing routines a lot simpler than they would be otherwise. The writer doesn't know the size of an element on disk until it has written it. So, it writes a four-byte blank, remembers where it is, and then continues on. Once it's done writing the element fully (including children, etc.), it skips back and fills in the size.
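The write-a-blank-then-backpatch routine described above might look roughly like this. The element layout (a four-byte little-endian size followed by a payload and then the children) and the function name are assumptions for the sketch, not the real RELOAD layout.

```python
import io
import struct


def write_element(f, payload, children=()):
    """Write one element: [4-byte size][payload][children...].

    children is a sequence of (payload, children) pairs.
    """
    size_pos = f.tell()
    f.write(b"\x00\x00\x00\x00")      # four-byte blank, filled in later
    start = f.tell()
    f.write(payload)
    for child_payload, grandchildren in children:
        write_element(f, child_payload, grandchildren)
    end = f.tell()
    f.seek(size_pos)                  # skip back...
    f.write(struct.pack("<I", end - start))  # ...and fill in the size
    f.seek(end)                       # resume at the end of the element
```

The fixed four-byte slot is what keeps this simple: with a VLI the width of the size field would depend on the size itself, so the writer could not reserve the right amount of space up front.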

The other option would be to omit the size entirely, and force any reader to (at least) scan the entire document. In many of its current uses, this is what happens anyway (the whole tree is loaded and accessed in memory), but I've also implemented "delay loading", which only loads the tree as you access it. It's designed for large trees with thousands of nodes, where you might only touch a few branches.
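To show why a stored size matters for this, here is a small sketch of skipping over records without reading their bodies. It uses a flat list of size-prefixed records rather than a real tree, and the layout and function names are hypothetical; the actual delay-loading implementation is not shown in this thread.

```python
import io
import struct


def write_record(f, body):
    f.write(struct.pack("<I", len(body)))  # 4-byte little-endian size
    f.write(body)


def read_nth_record(f, n):
    """Read record n, seeking past earlier records using only their sizes."""
    f.seek(0)
    for _ in range(n):
        (size,) = struct.unpack("<I", f.read(4))
        f.seek(size, io.SEEK_CUR)          # jump over the body unread
    (size,) = struct.unpack("<I", f.read(4))
    return f.read(size)
```

Without the size field the reader would have to parse every earlier body just to find where the next one starts.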

eBusiness

You seem not to have noticed that I simply described binary formats in general. I didn't go on to describe more specific kinds of binary formats because that seemed beside the point, and the appropriate format varies a lot depending on use. Anything can be messed up, and independently built file formats are somewhat prone to ending up messed up, but that shouldn't keep able programmers from making them. As for your use of the word "clusterfuck", that is what you end up with if you are not an able programmer, but then your project should be a learner's project, and making some clusterfuck will probably turn out to be a valuable lesson. Saying that I'm advocating clusterfuck (to anyone not working on toy projects) is just plain rude.

While I think it should be quite possible to generate the file in memory and avoid using a clumsy write routine, my main concern was the actual data: you made some nice variable-length integer routines, but don't make them available for the bulk of the data, where they would actually make a noticeable impact on file size. In any case, those are technicalities; if you are only handling thousands of nodes this is certainly not going to make or break the project. I'm merely discussing the choices because it's an interesting subject.
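As a rough sketch of the "generate it in memory" alternative: if each element is serialized bottom-up into a byte string, its size is known before the size field is written, so a VLI prefix needs no seeking back. The element layout, the LEB128-style VLI, and the function names are all assumptions for illustration.

```python
def encode_vli(n):
    """LEB128-style unsigned integer: 7 bits per byte, high bit = more follows."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def serialize(payload, children=()):
    """Return [VLI size][payload][children...] as bytes.

    children is a sequence of (payload, children) pairs, serialized first
    so the total body length is known before the size is emitted.
    """
    body = payload + b"".join(serialize(p, c) for p, c in children)
    return encode_vli(len(body)) + body
```

The trade-off is memory: every subtree is held as bytes until its parent is complete, which is fine for thousands of nodes but worth measuring for much larger trees.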