Install under any other name…


  • Banned

    @dkf said:

    Which is all funny because XML was really designed to be used for describing a logical document. It's actually rather good at doing that

    Logical documents have some structure. Structure means order. Order means predictability. Predictability means that saying what tag you're closing is pure redundancy.

    @dkf said:

    while you can wedge text formatting in, that's actually Discoursean in its levels of :doing_it_wrong:

    FWIW, HTML wasn't that bad, back in non-interactive non-DOM-manipulating non-content-presentation-separation days. And it's basically XML.

    @dkf said:

    What's worse, most of the things that people propose instead are actually done better in XML than in most of the alternatives

    @dkf said:

    Most programmers should be kept a long way away from any kind of format that might be visible to users at all ever, because most programmers are twerps

    The problem is, the whole internet is based on plain text, so you can't run away from it without creating your own internet.


  • Garbage Person

    User. System maps would require IT to give a fuck.


  • Garbage Person

    @Gaska said:

    Storage format? Not good. Data exchange format? Not good. Serialization format? Not good. Database format? Not good. Configuration files? Not good. Text formatting? Works well, but wasn't designed for it.

    XML as a programming language.
    Welcome to my hell.

    Of particular note is the hellish recursive XSLT-based runtime JIT compiler that burns CPUs alive.


  • Banned

    I would complain about unmatched parentheses, but in XML, parens have no special meaning, so it's okay.


  • Discourse touched me in a no-no place

    @Gaska said:

    Predictability means that saying what tag you're closing is pure redundancy.

    And redundancy isn't necessarily bad. ;)

    Re JSON vs XML for IPC: JSON works very well for simple cases. Simple cases are enough a lot of the time. Try to build anything complicated though (and yes, using multiple controlled vocabularies is a great way to get into “complicated”) and that JSON starts to grow some really ugly warts; the advantages over XML really don't show for anything large. You're sending a document. That document should be clear as to what it means. Once you've got that clarity at the conceptual level, the mapping to the serialization format is pretty simple whatever you're going to use.

    At least it's not ASN.1…


  • Banned

    @dkf said:

    And redundancy isn't necessarily bad. 😉

    It's good if it lets you detect and correct corrupted data. In XML, it only lets you detect, so it's pretty useless - especially since more redundancy = bigger file = higher chance of an error happening in the first place.

    @dkf said:

    Re JSON vs XML for IPC: JSON works very well for simple cases. Simple cases are enough a lot of the time.

    Given that simple cases are the typical case, and that JSON is the alternative most often proposed to XML, when you said earlier that in most cases XML is better than most alternatives, you effectively lied.

    @dkf said:

    Try to build anything complicated though (and yes, using multiple controlled vocabularies is a great way to get into “complicated”) and that JSON starts to grow some really ugly warts

    If we get rid of the JS in JSON (by which I mean, make every value a string - except, of course, objects and arrays), I don't see anything that can go wrong. I don't have a goddamn clue what a controlled vocabulary is (and the Wikipedia article isn't just unhelpful - it's anti-helpful), so pray tell, what feature exactly makes XML a better choice here than JSON? With a concrete example, if you would.

    @dkf said:

    You're sending a document. That document should be clear as to what it means.

    Clear to the human operator, or clear to the machine that parses it? If the former, PDF is almost always the best choice. If the latter, XML is almost always the worst choice due to the complexity of syntax.

    @dkf said:

    Once you've got that clarity at the conceptual level, the mapping to the serialization format is pretty simple whatever you're going to use.

    What you said is like claiming that you can write any program in any programming language and it will perform the same. While technically true, it's missing the point completely - it's not about whether something is possible (of course it's possible - Turing-completeness and all), but about which language would be the easiest or most efficient choice. Claiming that XML is good at very complicated stuff is like claiming that PHP is good at very large websites. Yes, there are very large websites built in PHP. It means nothing.

    @dkf said:

    At least it's not ASN.1…

    http://i3.kym-cdn.com/photos/images/newsfeed/000/270/513/406.gif


  • Discourse touched me in a no-no place

    @Gaska said:

    I don't have a goddamn clue what a controlled vocabulary is (and the Wikipedia article isn't just unhelpful - it's anti-helpful), so pray tell, what feature exactly makes XML a better choice here than JSON?

    Controlled vocabularies are basically a fancy kind of enumeration in concept space, except that it's not just “here's a bunch of terms: pick one” but rather “here's a bunch of terms with meanings: pick one”. Controlled vocabularies tend to be pretty useful when building metadata schemes, as they make it much easier for people to search for things that are relevant. Generalized tagging schemes work too, but suffer from problems with everyone making their own tags and the folksonomy spinning out of control.

    Controlled vocabularies are about the only form of ontology that has found significant real world traction. Lots of people say they want the more complicated stuff so they can do complex reasoning, but such efforts rapidly run into the sand IME. CVs are about what we can actually get away with.

    The relevance to XML is that XML's tackled the awful namespacing problem so that it is possible to talk about two or more CVs at once. Thus, a “Foo” in domain A is not identified as being the same as a “Foo” in domain B (they're probably completely different ideas, or worse, not completely different but still not the same) yet they can be talked about together in the one document. That's really really useful for cross-domain integrative work.
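
    A minimal sketch of what that looks like in a document (namespace URLs invented purely for illustration):

    <report xmlns:a="http://example.org/vocab/domainA"
            xmlns:b="http://example.org/vocab/domainB">
        <!-- Both elements are spelled "Foo", but they are different terms
             because they live in different namespaces. -->
        <a:Foo>a term from domain A's vocabulary</a:Foo>
        <b:Foo>an unrelated term from domain B's vocabulary</b:Foo>
    </report>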

    It's possible to do the same thing with JSON, YAML and stuff like that, but the result is no easier to read and requires that you reinvent all the namespacing stuff yourself, meaning that nobody else will understand it. Belgium that.

    (Before you ask, I really don't like how XML does namespaces. But at least it bothers.)

    @Gaska said:

    Clear to the human operator, or clear to the machine that parses it? If the former, PDF is almost always the best choice. If the latter, XML is almost always the worst choice due to the complexity of syntax.

    Except that the alternatives tend to be either much more specific to a domain (e.g., LaTeX is pretty good at actual document content description, but nobody uses it for anything else) or outright miserable. Yes, I see people going “aaaah, angle brackets!!!” but they really don't come up with anything that is better for non-trivial examples.

    Case in point: PDF isn't nice for anything other than showing to people. Getting the content out of it in a machine-readable form too often requires running it through OCR because of the awful things that some PDF generators do…


  • Banned

    @dkf said:

    Controlled vocabularies are basically blah blah blah blah blah

    Thank you very much for the explanation. I'm sorry that even though you wrote all this stuff, I barely grasp the concept.

    @dkf said:

    The relevance to XML is that XML's tackled the awful namespacing problem so that it is possible to talk about two or more CVs at once.

    Correct me if I'm wrong, but isn't namespacing just tacking on something: in front of an identifier? It's very simple to implement for other formats.

    @dkf said:

    It's possible to do the same thing with JSON, YAML and stuff like that, but the result is no easier to read

    It's not easier, but it's not harder either. But it's definitely shorter, and easier on the parser.

    @dkf said:

    and requires that you reinvent all the namespacing stuff yourself, meaning that nobody else will understand it.

    I wonder how many people in your team knew about XML namespacing before being hired into a place where they needed this knowledge. I bet that most didn't, and I bet that teaching them the concept was just as hard (or easy) as it would be to teach them a custom solution for any other format.

    @dkf said:

    Except that the alternatives tend to be either much more specific to a domain (e.g., LaTeX is pretty good at actual document content description, but nobody uses it for anything else)

    See? The right tool for the right job.

    @dkf said:

    or outright miserable.

    Earlier, you said that in the common case, JSON is sufficient. I think you're missing at least one more "or" in this sentence.

    @dkf said:

    Yes, I see people going “aaaah, angle brackets!!!” but they really don't come up with anything that is better for non-trivial examples.

    My main gripe about XML is that the end of an element must repeat the name of the element (redundancy), and that data can be put in either a parameter or a child element (TIMTOWTDI). Not to mention mixed content. All of these make it unnecessarily hard to parse. XML tries to be a jack of all trades - a single format that's appropriate for everything. While it's indeed very versatile and can be used in many different ways, a custom-tailored format that drops most of those features and is good for one thing and one thing only will always be better (more readable, more performant, easier to parse) than XML.

    @dkf said:

    Case in point: PDF isn't nice for anything other than showing to people.

    But it's really awesome at showing to people.



  • Jaime

    @Gaska said:

    https://i.imgflip.com/u2u4n.jpg

    You can actually serialize a date in XML. Not just a string that happens to represent a date, but an element that any deserializer will deserialize as a real date without having to know that it's supposed to be a date.
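
    For example (a rough sketch - the element names are made up, but xsi:type and xs:dateTime are the standard schema mechanism):

    <event xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <!-- The xsi:type annotation marks this as a date-time value, so a
             schema-aware deserializer yields a real date, not just a string. -->
        <occurred xsi:type="xs:dateTime">2015-06-01T14:30:00Z</occurred>
    </event>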


  • Banned

    The thing is, my serialized objects make sense ONLY in my application. Why would I care about any other deserializer than the one in my application?



  • Bulb

    @Jaime said:

    an element that any deserializer will deserialize as a real date without having to know that it's supposed to be a date.

    That is UtterlyUseless™ Anyway™.

    If the deserializer does not know it's supposed to be a date, chances are the application that called the deserializer does not know either, and therefore isn't able to do anything useful with the date anyway, and therefore does not care whether the deserializer recognizes it as a date or not.

    @Gaska said:

    The thing is, my serialized objects make sense ONLY in my application. Why would I care about any other deserializer than the one in my application?

    No, you wouldn't. And within a single application a simpler format is indeed usually preferred (I personally prefer protobuf).

    However

    @Gaska said:

    Correct me if I'm wrong, but isn't namespacing just tacking on something: in front of an identifier? It's very simple to implement for other formats.

    only XML actually went and defined it. If you need a format that will be used by multiple applications, and you expect the applications may want to add their own extensions, it being possible to add them is not enough. You need it to be defined already, and that is what only XML does.


  • Discourse touched me in a no-no place

    @Gaska said:

    Correct me if I'm wrong, but isn't namespacing just tacking on something: in front of an identifier?

    Not really. Yes, that's most of what it looks like on the surface, but the complicated part is that you've also got to come up with a way of assigning owners to those namespaces in a way that persuades others that the namespace really is owned by someone who is authoritative for it.

    The way XML does it - namespaces are really URLs, with the foo: stuff in front being just a mapped prefix - is both great (people buy into owning URLs very easily) and very sucky (the prefix mapping mechanism has resulted in many people writing XML “parsers” with regexps and other such sins). Programming languages like Java and C# do a much nicer job of the namespace problem IMO, but at the cost of a much more complex state model.

    However, the thing that really annoys me about XML's namespaces is that there's no guarantee at all that the namespace will map to its definition (i.e., a schema document of some kind) in any sort of a sane way, because namespaces and schemas are totally different things in the minds of people at the W3C. (Of course, Java and C# also don't think so much in terms of providing a mechanism to validate what people talk about in the namespace, which is a whole 'nother level of complexity: programming languages are not document description languages.) ASN.1 is the other system that really tackled the namespace problem, and their answer (the OID system) is absolutely fucking awful.
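
    To be concrete about the prefix mapping (URL invented for illustration): the prefix is just a local alias and the URL is the real name, so these two elements are exactly the same element as far as a namespace-aware parser is concerned:

    <foo:thing xmlns:foo="http://example.org/some-vocabulary"/>
    <bar:thing xmlns:bar="http://example.org/some-vocabulary"/>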

    The silly thing is that if you solve these hard problems anywhere else, you're still going to have something almost as nasty. JSON and YAML avoid the problems by pretending that they don't exist, and you can get away with that when you're in a little closed world where you control what all the servers and all the clients are; you control what the terms mean and force everyone to work inside that assumption. Walled gardens work efficiently, but the thing that makes them work is precisely the thing that makes them relatively weak at interoperation: their extreme myopia and extensive assumed shared context. Wider reality is not so forgiving: the closed world assumption doesn't work.


  • Discourse touched me in a no-no place

    @Bulb said:

    that is what only XML does

    No, but the alternatives are worse.


  • Banned

    @Bulb said:

    If you need a format that will be used by multiple applications, and you expect the applications may want to add their own extensions, it being possible to add them is not enough. You need it to be defined already, and that is what only XML does.

    My hypothetical custom made-up-on-the-spot format that will only ever be used in this scenario also defines it. Isn't that enough?

    @dkf said:

    Not really. Yes, that's most of what it looks like on the surface, but the complicated part is that you've also got to come up with a way of assigning owners to those namespaces in a way that persuades others that the namespace really is owned by someone who is authoritative for it.

    The only place I've ever actually encountered namespaces is XAML, and they seem totally random to me. They don't make anything any clearer. But maybe it's just me, or it's MS misusing the feature.



  • Bulb

    @Gaska said:

    My hypothetical custom made-up-on-the-spot format that will only ever be used in this scenario also defines it. Isn't that enough?

    Then you are making more work for yourself, because you have to specify many things where with XML you can just point at the existing specification and be done with it, and you are reducing chances for adoption, because you are creating YetAnotherStandard™. XML is not perfect, but it is GoodEnough™ and established already.

    @Gaska said:

    The only place I've ever actually encountered namespaces is XAML, and they seem totally random to me. They don't make anything any clearer. But maybe it's just me, or it's MS misusing the feature.

    XAML is another place where just slapping an unadorned xmlns attribute in the root element would be about the right amount of namespace use.

    One good example of putting XML namespaces to good use is the Genshi templating engine. The template is any XML document with directives, that are attributes or elements in namespace http://genshi.edgewall.org/. That way the template is a valid XML document, so it can be edited in normal XML editor, and even valid XML document of the target format (usually XHTML, but can be anything), so it can be loaded for checking (for that it allows to have placeholders in the template that will be replaced by the actual values using content or replace directives).
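
    A trimmed-down sketch of such a template (details from memory, so treat it as illustrative rather than exact):

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:py="http://genshi.edgewall.org/">
        <body>
            <!-- py:for is a Genshi directive; to an XHTML-only tool it is just an
                 attribute in a foreign namespace, and the file stays valid XML. -->
            <ul>
                <li py:for="item in items">${item}</li>
            </ul>
        </body>
    </html>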

    And a good example of the intended main use case might be GPX. Older GPS units provided data in ad-hoc manufacturer-specific formats. New units switched to GPX. But since each unit has some extra data, a new Garmin unit might return something that starts with e.g.:

    <gpx xmlns="http://www.topografix.com/GPX/1/1"
        xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3"
        xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
        creator="Oregon 400t" version="1.1"
        …>
    

    And it will be a valid GPX that any tool that understands GPX can use. If it also understands the Garmin extensions, then it can use them, and if not, it won't confuse them with any other similar extensions that someone else might have defined.

    Another case might be mixing HTML (or DocBook or even OpenDocument) with XLIFF. XLIFF defines an x element (unfortunately not shown in the examples on the WP page) for annotating parts of the string to translate. These can be used to annotate parts of an HTML document (or a Genshi template) with notes like which names should not be translated or what will be substituted for various placeholders. Then the document would be taken apart by translate toolkit, creating an XLIFF file where the entries would still embed some HTML (like emphasis tags and links); the XLIFF would be translated, and the translated document reassembled again with translate toolkit.

    The key thing here is that in these cases it is someone extending or adapting a format defined by someone else and using namespaces to avoid ambiguity with any future development of the formats and to allow each application to simply pass through the data that will be handled by another component of the processing chain reliably.


  • Banned

    @Bulb said:

    Then you are making more work for yourself, because you have to specify many things where with XML you can just point at the existing specification and be done with it

    But I have to implement the mapping between XML data model and the actual data model I'll be using in my application, and for this, I need to reimplement most features of XML in one way or another, including namespaces. So how exactly is that less work than doing the same with my custom-tailored format?

    @Bulb said:

    and you are reducing chances for adoption, because you are creating YetAnotherStandard™

    Every XML file is its own standard. If it wasn't like that, there wouldn't be such a thing as XSD.

    @Bulb said:

    XML is not perfect, but it is GoodEnough™ and established already.

    You know what else was established? PHP.

    @Bulb said:

    XAML is another place where just slapping an unadorned xmlns attribute in the root element would be about the right amount of namespace use.

    XAML uses namespaces for:

    • metadata
    • access to libraries
    • access to containment hierarchy parent's fields
    • access to inheritance hierarchy parent's fields

    It's not just decoration; the namespaces are actually meaningful. The organization is batshit insane, though - but I'm not sure whether it's because of XAML in particular or XML in general.

    @Bulb said:

    One good example of putting XML namespaces to good use is the Genshi templating engine. The template is any XML document with directives, that are attributes or elements in namespace http://genshi.edgewall.org/. That way the template is a valid XML document, so it can be edited in normal XML editor, and even valid XML document of the target format (usually XHTML, but can be anything), so it can be loaded for checking (for that it allows to have placeholders in the template that will be replaced by the actual values using content or replace directives).

    The grammatical structure of this paragraph makes it nigh-incomprehensible. From the parts I understand, I draw a conclusion that Genshi is awesome because XML, not because namespaces. So I still don't see why they're any good.

    @Bulb said:

    And a good example of the intended main use case might be GPX. Older GPS units provided data in ad-hoc manufacturer-specific formats. New units switched to GPX. But since each unit has some extra data, a new Garmin unit might return something that starts with e.g. (...) And it will be a valid GPX that any tool that understands GPX can use.

    And without the xmlns:, it wouldn't be. Because... raisins. I think I've got it.

    @Bulb said:

    Another case might be mixing HTML (or DocBook or even OpenDocument) with XLIFF. XLIFF defines an x element (unfortunately not shown in the examples on the WP page) for annotating parts of the string to translate.

    And it's done via namespaces. Because it cannot be physically done via regular parameters.

    @Bulb said:

    The key thing here is that in these cases it is someone extending or adapting a format defined by someone else and using namespaces to avoid ambiguity with any future development of the formats and to allow each application to simply pass through the data that will be handled by another component of the processing chain reliably.

    Can't I just skip the bytes I can't understand? Wouldn't that work the same?



  • Jaime

    @Gaska said:

    But I have to implement the mapping between XML data model and the actual data model I'll be using in my application, and for this, I need to reimplement most features of XML in one way or another, including namespaces. So how exactly is that less work than doing the same with my custom-tailored format?

    If you choose XML, you don't have to consider these things:

    • How do I deal with embedded delimiters?
    • How do I represent common data types?
    • How do I get my editor to syntax highlight?
    • Writing a parser.

    It doesn't solve everything (or even the majority of things), but why volunteer to do ground-up construction when you can start with a foundation?

    XML does the foundation stuff better than many of the alternatives. JSON has limited data type support. CSV requires the exchanging parties to agree on an embedded delimiter strategy. ASN.1 is impossible to use without sophisticated tools. The main downsides of XML are its verboseness and the lack of native binary data support. Neither of those bother most people.
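
    To make the delimiter point concrete: XML simply escapes the reserved characters as entities, so there is nothing for the two parties to negotiate. A trivial sketch:

    <note>profit &lt; cost &amp; morale is low</note>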


  • Banned

    @Jaime said:

    How do I deal with embedded delimiters?

    It's a much simpler problem than "how do I link to this XML library?" in most environments.

    @Jaime said:

    How do I represent common data types?

    How does XML represent common data types? And what do you mean by common data type?

    @Jaime said:

    How do I get my editor to syntax highlight?

    If you need syntax highlighting, it means your format has terrible readability. That, or you shouldn't even look at it because it's not meant for human eyes to start with.

    @Jaime said:

    Writing a parser.

    Simple parsers are simple.

    @Jaime said:

    It doesn't solve everything (or even the majority of things), but why volunteer to do ground-up construction when you can start with a foundation?

    Because it still saves a ton of work in the long run and makes my life a little more pleasant?

    @Jaime said:

    JSON has limited data type support.

    JSON is dynamically typed.

    @Jaime said:

    CSV requires the exchanging parties to agree on an embedded delimiter strategy.

    I consider each delimiter to be a separate file format. Because they fucking are. They're all named the same just to fuck with us, programmers.

    @Jaime said:

    ASN.1 is impossible to use without sophisticated tools.

    Same with XML.

    @Jaime said:

    The main downsides of XML are its verboseness and the lack of native binary data support.

    Why would a text format support binary data? :wtf:

    @Jaime said:

    Neither of those bother most people.

    Most people don't care about SQL Injection vulnerabilities either.



  • Bulb

    @Gaska said:

    It's not just decoration; the namespaces are actually meaningful. The organization is batshit insane, though - but I'm not sure whether it's because of XAML in particular or XML in general.

    It is definitely not because of XML in general. I suspect it is because XAML is using the wrong tool for the job, but then I am not actually using XAML, so I can't tell for sure.

    @Gaska said:

    …that Genshi is awesome because XML, not because namespaces.

    Namespaces are what makes it sensible and sane. It could be done without them by prefixing all the attributes with genshi- or something to create a poor man's namespace (after all, that's exactly how it's done in HTML and CSS, which don't have namespaces), but it would be somewhat hacky and many programs would choke on that, because they would not expect it and so on. With namespaces it is something XML was designed for.

    @Gaska said:

    And without the xmlns:, it wouldn't be. Because... raisins.

    Because there are many extensions to GPX. And one day, sooner or later, more than one is going to use the same identifier, or next version of GPX is going to add something already used by some extension. Namespaces are to avoid that kind of conflict.

    @Gaska said:

    And it's done via namespaces. Because it cannot be physically done via regular parameters.

    It is done via regular parameters. It's just that they are namespaced. Because every other XML-based format might have a tag called x. But with namespaces you can still safely mix them.
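
    Roughly like this (simplified; the XLIFF 1.2 namespace URN is real, the surrounding markup is invented for illustration):

    <p xmlns="http://www.w3.org/1999/xhtml"
       xmlns:xlf="urn:oasis:names:tc:xliff:document:1.2">
        <!-- The XLIFF x placeholder sits inside ordinary HTML content; an HTML-only
             tool sees a foreign-namespace element it can safely pass through. -->
        Hello <xlf:x id="username"/>, welcome back!
    </p>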

    @Gaska said:

    Can't I just skip the bytes I can't understand? Wouldn't that work the same?

    You don't always want to skip the bytes you don't understand. Often you want to give at least a warning that the input contains some features that are not supported. With namespaces you can warn about unknown things in your namespace (because they mean the data is probably from a newer (or broken) version and may not work) and ignore things in different namespace.


  • Discourse touched me in a no-no place

    @Gaska said:

    XAML

    Oh, you're talking about XAML. That's Different. 😉

    XML's system of careful definitions, namespaces and schemas works really quite well, provided you really do treat it like the recursive language that it is. Yes, this means that cranking out an XML parser isn't trivial, but there's a boat load of good existing parsers so why would you write your own? (Except some people do just that; they just can't seem to bring themselves to believe that regexps are a bad choice of tool for this task.)

    There are some valid possible criticisms of XML, but the verbosity is by far the least of them (and when you can compress stuff well, verbosity is really a triviality). However, I'm curious whether you already know about any of the real criticisms of XML and so won't enlighten you just yet…


  • Banned

    @Bulb said:

    Namespaces are what makes it sensible and sane. It could be done without them by prefixing all the attributes with genshi- or something to create a poor man's namespace

    If prefixing everything with "genshi-" works the same as using namespace "http://whatever.com/genshi", then what exactly is the benefit of using namespaces, functionally? Because it sounds like it's just a prefix for the sake of having a prefix. A prefix that can be different from file to file but actually always refers to the same thing!

    @Bulb said:

    but it would be somewhat hacky and many programs would choke on that

    Is there any other program that parses Genshi templates except for Genshi? And if so, what's the worst that can happen if you don't have namespaces in your XML? Is an XML file that doesn't use xmlns: even once in the whole file ill-formed?

    @Bulb said:

    With namespaces it is something XML was designed for.

    Extreme verbosity?

    @Bulb said:

    Because there are many extensions to GPX. And one day, sooner or later, more than one is going to use the same identifier

    Most likely on purpose, based on the last 40 years of computer history.

    @Bulb said:

    or next version of GPX is going to add something already used by some extension

    There are at least four alternative ways to solve this problem, all of them being less verbose and just as good.

    @Bulb said:

    It is done via regular parameters.

    By regular parameters, I meant non-namespaced ones. Obviously.

    @Bulb said:

    Because every other XML-based format might have a tag called x.

    Why do I care about other formats? My XML is in my format, isn't it? Is there some written or unwritten rule that my manifest XML has to be valid OOXML in case someone might want to zip it?

    @Bulb said:

    You don't always want to skip the bytes you don't understand.

    The alternative is to throw an error. There isn't any reasonable third option.

    @Bulb said:

    Often you want to give at least a warning that the input contains some features that are not supported.

    I don't treat logging as anything meaningful. Giving a warning and proceeding is equivalent to just ignoring the damn thing.

    @Bulb said:

    With namespaces you can warn about unknown things in your namespace (because they mean the data is probably from a newer (or broken) version and may not work) and ignore things in different namespace.

    Or I could just introduce an <ext> tag that can be placed under any component and can include arbitrary XML data.

    @dkf said:

    XAML [An XML “application” that sane people try to avoid.]

    Don't be so harsh. It's the best declarative GUI thingy that I've ever used, and I used pretty much everything available for free (at least for C#, C++ and Java).

    @dkf said:

    XML's system of careful definitions, namespaces and schemas works really quite well, provided you really do treat it like the recursive language that it is.

    Sorry, but I still don't see any real benefits over using other formats. A hundred posts ago, I thought namespaces actually had some special logic behind them, but it turned out they're literally nothing more than prefixes to names (with a logical equivalent of #define to keep them within a sane length).

    @dkf said:

    There are some valid possible criticisms of XML, but the verbosity is by far the least of them (and when you can compress stuff well, verbosity is really a triviality). However, I'm curious whether you already know about any of the real criticisms of XML and so won't enlighten you just yet…

    It completely misses the point. It tries to be a generic format for all data ever, forgetting that a file format isn't just syntactic structure but also the semantic meaning of the data. The XML standard doesn't define any format - it merely defines how you should define your own special snowflake format if you want to be enterprisey. And there are three ways to attach data to each element (parameters, child elements and content) - which encourages people to exploit all three in equal proportions, which other people then have to deal with, wondering whether a given thing should be a <data> element, or simply content, or maybe a data parameter. Sometimes it makes sense to have these separate options (see: HTML), but most often it doesn't.



  • Maciejasjmj

    @Gaska said:

    Can't I just skip the bytes I can't understand?

    What's the point of having a schema to verify against at all, then? I don't really know precisely how XSD works, but I assume the point is to have your schema validator barf on a typoed or out-of-context tag.

    So when you have:

    <person>
        <name>John</name>
        <price>900000.00</price>
    </person>
    

    your validator sees that <price> doesn't belong in a <person> object and rejects your XML, but when you have:

    <person>
        <name>John</name>
        <footballers:price>900000.00</footballers:price>
    </person>
    

    it sees the namespace, thinks "well, it's probably some extension to this schema, anyway not my job" and can skip it.


  • Banned

    @Maciejasjmj said:

    What's the point of having a schema to verify against at all, then?

    Yes, I've been wondering that too.

    @Maciejasjmj said:

    I don't really know precisely how XSD works

    It works horribly. It would be much, much better if it wasn't XML.

    @Maciejasjmj said:

    but I assume the point is to have your schema validator barf on a typoed or out-of-context tag.

    I'm pretty sure I can define "whatever" tags in it too.

    @Maciejasjmj said:

    So when you have (...) your validator sees that <price> doesn't belong in a <person> object and rejects your XML

    One of the most stupid misconceptions in the world of IT is that a file with all valid data and then some should be treated as invalid. The problem wouldn't exist if they hadn't created it themselves in the first place.



  • Maciejasjmj

    @Gaska said:

    One of the most stupid misconceptions in the world of IT is that a file with all valid data and then some should be treated as invalid.

    You'd rather it silently went "well, I don't understand these data, so they must not be important"?

    <stock>
        <name>WTF</name>
        <price>2.50</price>
        <price_mutliplier>10</price_mutliplier>
    </stock>
    

    A good validator would note the (optional) price_mutliplier is misspelled and tell you to go fix your shit. Your validator would just happily let you sell your stock at 2.50 per share instead of 25.00.
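
    Roughly - I'm hand-waving the exact XSD syntax, but the idea is that the schema declares only the children it knows about (xs: being the usual XML Schema namespace prefix), so the typo gets rejected instead of silently dropped:

    <xs:element name="stock">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="name" type="xs:string"/>
                <xs:element name="price" type="xs:decimal"/>
                <!-- the correctly spelled, optional element -->
                <xs:element name="price_multiplier" type="xs:decimal" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>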


  • Banned

    @Maciejasjmj said:

    You'd rather it silently went "well, I don't understand these data, so they must not be important"?

    Yes, and maybe show some warning that the file contains bonus data that might not have been loaded properly.

    @Maciejasjmj said:

    A good validator would note the (optional) price_mutliplier is misspelled and tell you to go fix your shit. Your validator would just happily let you sell your stock at 2.50 per share instead of 25.00.

    You deserve it for not unit-testing your shit.



  • Bulb

    @Gaska said:

    Why do I care about other formats? My XML is in my format, isn't it?

    As long as we are talking about examples of good use of namespaces, no, it isn't, because the document is a mix of several formats (e.g. HTML+Genshi+XLIFF or XLIFF+OOXML or GPX+Garmin extensions etc.)

    If it is your format and just your format, you don't need namespaces. And it should either not have any, or have one global namespace declaration to make it absolutely clear this is your format.
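
    Something like this (the URL is made up for the example) - one default namespace on the root and nothing else:

    <config xmlns="http://example.org/my-format">
        <option name="debug">false</option>
    </config>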

    @Gaska said:

    Is there some written or unwritten rule that my manifest XML has to be valid OOXML in case someone might want to zip it?

    No. That's why I think the way Android uses namespaces in its manifest (and resources) is a :wtf:.

    @Gaska said:

    I don't treat logging as anything meaningful.

    That is totally beside the point. The point is that new versions may need to be treated differently from independent extensions (e.g. the XLIFF annotations) and namespaces are a tool to distinguish the two cases.

    @Gaska said:

    Or I could just introduce an <ext> tag

    …and then the extensions still need to avoid conflicting with each other. Namespaces handle both already.

    @Gaska said:

    … it turned out they're literally nothing more than prefixes to names.

    Exactly.

    @Gaska said:

    The XML standard doesn't define any format - it merely defines how you should define your own special snowflake format if you want to be enterprisey.

    Of course it does not (which is why having a file with extension .xml is :doing_it_wrong: and so is MIME-type text/xml without further specification).

    However it is not just about being enterprisey. It is about having a way to extend it in future versions, by other tools that need to add something and/or by tools that do processing to some orthogonal aspect, without having to tackle all the problems, because the mechanisms are already specified and tools already written.

    If you don't need any of that extensibility, XML is probably overkill. But if you do, I don't see any suitable alternatives.


  • Banned

    @Bulb said:

    As long as we are talking about examples of good use of namespaces, no, it isn't, because the document is a mix of several formats (e.g. HTML+Genshi+XLIFF or XLIFF+OOXML or GPX+Garmin extensions etc.)

    OK, I see now. Namespaces are good when what you really want is ZIP.

    @Bulb said:

    That is totally beside the point.

    You raised it first.

    @Bulb said:

    …and then the extensions still need to avoid conflicting with each other. Namespaces handle both already.

    Namespaces don't solve anything. The only advantage over other methods is that the identifier is BY CONVENTION very long and BY CONVENTION always unique.

    @Bulb said:

    Of course it does not (which is why having a file with extension .xml is :doing_it_wrong: and so is MIME-type text/xml without further specification).

    +1

    @Bulb said:

    However it is not just about being enterprisey. It is about having a way to extend it in future versions, by other tools that need to add something and/or by tools that do processing to some orthogonal aspect, without having to tackle all the problems, because the mechanisms are already specified and tools already written.

    None of this implies that XML was a good idea to build all this tooling around in the first place, or that namespaces were a good thing to settle on as the standard way to achieve extensibility.


    Okay, now about conflicting extensions. The Linux ecosystem is basically a million programmers doing a million random products every year. Most of the good ones are incorporated into various distros. These distros usually have a central package manager with each program and library represented as a separate package. These packages all have to be uniquely named. When was the last time you heard that someone had to rename something because of a pre-existing package?


  • Discourse touched me in a no-no place

    @Gaska said:

    These distros usually have a central package manager with each program and library represented as a separate package. These packages all have to be uniquely named. When was the last time you heard that someone had to rename something because of a pre-existing package?

    Someone (well, many someones) sorts out the problems for you and you take that as evidence that there are no problems?


  • Banned

    You might look at it this way if you forget that applications are included in package managers after they're made, and that before inclusion, conflicts didn't matter - so if there were any, it would mean the original author having to change the name of his project after it's already established. Turns out it doesn't happen.


  • Discourse touched me in a no-no place

    @Gaska said:

    Turns out it doesn't happen.

    You really seem to believe this. I think we have to go back to the meme image…


  • Banned

    Prove me wrong.



  • @Gaska said:

    The Linux ecosystem is basically a million programmers doing a million random products every year. Most of the good ones are incorporated into various distros. These distros usually have a central package manager with each program and library represented as a separate package. These packages all have to be uniquely named. When was the last time you heard that someone had to rename something because of a pre-existing package?

    I definitely do remember a couple of cases. The most notable was probably git, where the older but less well-known gnuit was renamed (both the package and the main executable). I am also sure I've seen a couple of packages that deviate from the upstream name because of a conflict.


  • Banned

    Cool, you've proved me wrong! I love when someone proves me wrong because it lets me learn new things!


  • Garbage Person

    Well, you see, Firefox was once called Firebird. And before that, Phoenix. Both names were abandoned because of naming conflicts.


  • Banned

    Silver medal goes to you!

