Why is XHTML bad?



  • Continuing the discussion from Representative lines from a view file:

    @Arantor said:

    Whoever thought XHTML5 was a good idea requires being forced to kneel on rice whilst holding a penny to the wall with their nose, to contemplate what they've done.

    I recently discovered that <br> and <option selected> are valid HTML after doing <br /> and <option selected='selected'> virtually all my adult life. Apparently I actually learned XHTML without realizing it was a separate thing from HTML. I even do that stuff on HTML5 pages and it seems to work.

    So why is XHTML bad? 😕
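
For anyone curious how the two spellings actually differ, here is a minimal Python sketch using only the standard library (`html.parser` and `xml.etree.ElementTree`). The helper names `TagCollector`, `html_tags`, and `is_well_formed_xml` are made up for illustration, and Python's HTML parser is only a rough stand-in for what a browser does. The gist: an HTML parser sees the same tags either way, while an XML parser only accepts the XHTML-style spelling.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag and its attributes as the HTML parser sees them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Called for XML-style tags such as <br />; record them like any start tag.
        self.seen.append((tag, attrs))


def html_tags(markup):
    """Return the (tag, attributes) pairs found by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = "<p>one<br>two <option selected>x</option></p>"
xml_style = "<p>one<br />two <option selected='selected'>x</option></p>"

print([tag for tag, _ in html_tags(html_style)])
print([tag for tag, _ in html_tags(xml_style)])
print(is_well_formed_xml(html_style), is_well_formed_xml(xml_style))
```

Both strings yield the same tag names from the HTML parser. Only the second passes as XML, since an unclosed `<br>` and a valueless `selected` attribute are not well-formed XML.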



  • Better question: what good is it? What benefit is there to being able to produce valid XML out of HTML5 documents?


  • BINNED

    @Arantor said:

    What benefit is there to being able to produce valid XML out of HTML5 documents?

    http://thedailywtf.com/Articles/Sketchy-Skecherscom.aspx?

    You could skip the XML! It would be glorious!



  • @Arantor said:

    Better question: what good is it? What benefit is there to being able to produce valid XML out of HTML5 documents?

    I don't know. I always thought HTML was XML and have always coded as such. So for me it's just habit I suppose.



  • The whole point of HTML5 was to undo the damage the W3C was doing with XHTML2. Which is what they (seriously!) thought would replace HTML4. Because they're ON CRACK COCAINE CONSTANTLY, apparently.

    Look, while turning HTML into an XML-based language might seem like a good idea, it's fucking not. You end up with an XHTML1, where nobody uses it in "strict" mode, so you still can't parse it as XML and it's just a somehow-more-annoying HTML4. But you need a parser that's actually pretty different than your existing HTML4 one.

    The concept of XHTML2 was that it's always strict, and so web browsers can be made "simpler" by just using off-the-shelf XML parsers instead of HTML parsers! Except they can't, because there are 300,000,000 websites that don't magically disappear just because the W3C thinks that should-- in fact the W3C made web browsers more complicated, because due to XHTML1 Strict, they all needed XML parsers in addition to their existing HTML parsers.

    Basically, it was the world's dumbest idea, it helped nobody, it made the world more annoying for no benefit, and HTML5 was a nice retreat away from it.



  • Much as it pains me to admit it, I completely agree with @blakeyrat.



  • Here's a follow-up question then.

    Is it bad that I use the HTML5 doctype but then do some XML-y things in the document?

    Like I said, I never realized HTML wasn't XML so if I have potentially bad habits I want to identify and correct them. Can't have my MUD code showing up here for using HTML5 and self-closing line break tags 😄


  • BINNED

    It will probably fail strict checks, but I never saw a browser failing to render it properly.

    Not that you should rely on that, of course.

    Edit: Why in the fuck is this topic not tracked for me after 2 replies I posted in it? Discoursesistency!



  • Good idea in theory, bad in practice.

    If they made html fully XML compliant from the start, that would have been great. That's what they should have done. But alas, the same idiotic mindset that allowed semicolon inference in JavaScript also led to the initial loose html standards.

    Years later, they realized they screwed up and tried to fix it. But it was too little too late. You can't push toothpaste back into the tube any more than you can force millions of websites to rewrite themselves following standards.



  • I have to agree, but is there no way to move to improve web standards? Can they only become more fragmented and confused over time?



    It would take the agreement of all the big names (Google, Mozilla, Microsoft) to create a drop-dead date on old standards, and discontinue support for old methods 100% after that date (i.e., like the drop-dead date for Windows 9 support). This assumes they can fix a true standard and all agree on how it should be implemented.

    So probably not for at least the next 30-40 years.

    It would still only be an 85-90% fix though, since you have all the random implementations, supporters, etc of web browsers, and giant entities like governments.



  • This post is deleted!


  • @Bort said:

    How do you get 2^45 web pages converted?

    Make a big announcement about the discontinued support. People who care about being accessible will update their pages. Screw the rest.

    @Bort said:

    Would newer versions of browsers just stop supporting old web sites?

    That's pretty much what you would have to do to make the standards sane. Otherwise, people have no incentive to conform to the new standards.

    @Bort said:

    (post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

    And now you've deleted the post. But I still have evidence!



  • @mott555 said:

    Is it bad that I use the HTML5 doctype but then do some XML-y things in the document?

    No; one of the great things about HTML5 is its laissez-faire "hey do what you want" attitude.

    @mott555 said:

    Like I said, I never realized HTML wasn't XML so if I have potentially bad habits I want to identify and correct them.

    Kids these days. This is the problem with computer science: schools teach the current state but not how we got here. Without knowing how we got here, how the fuck can you improve anything? Oh wait you can't, that's one of the main reasons IT sucks ass.

    @cartman82 said:

    If they made html fully XML compliant from the start, that would have been great.

    Yeah, well, they didn't have a time portal. HTML pre-dates XML. Both were based on SGML, originally, but as should be now obvious SGML was not very well-specified.

    @cartman82 said:

    That's what they should have done.

    Yeah, uh. I'm all for criticizing the W3C, but I don't think you can expect them to see into the future.

    @cartman82 said:

    Years later, they realized they screwed up and tried to fix it. But it was too little too late.

    "Fix" what? HTML not being XML was a feature, not a bug. There's nothing to fix. HTML is loose and easy. XML is strict and harsh. HTML is allowing of unknown input. XML is rigidly defined (in theory). Other than both being based on SGML, they're nothing alike.

    The problem is that the W3C was thinking like you, "wow these languages are similar, what if we made them the same?" Yeah. Well. A hobby helicopter and a truck might share the same engine, but that doesn't mean you put huge off-road tires on the helicopter and a tail rotor on the truck.

    @Bort said:

    I have to agree, but is there no way to move to improve web standards? Can they only become more fragmented and confused over time?

    The way to improve web standards is to improve web standards. Not to invent something entirely new (XHTML2) which sucks, and then try to force everybody to adopt it and if that site from 1996 never gets updated, then fuck you it won't display anymore.

    HTML5 is improving web standards. Without turning HTML into something else entirely.

    @Bort said:

    As has been covered above, it's not the browsers that are hard to change, it's the websites. How do you get 2^45 web pages converted?

    See, you're already smarter than the W3C. Their focus was on making browsers easier to write, for some strange reason I can't fathom.

    The best you could do would be to write a web service that could load them with an existing, say, Firefox parser, then re-write the loaded DOM back into the new standard. But then again, Firefox could just do that (and actually does), so what would be the point?

    You can't magically hack into someone's server and replace their page content. Even if you could technically, you can't legally. Even if you could legally, you can't morally.



  • @blakeyrat said:

    See, you're already smarter than the W3C. Their focus was on making browsers easier to write, for some strange reason I can't fathom.

    I suspect it has something to do with these companies being W3C members:

    • Apple
    • Google
    • Microsoft
    • Mozilla
    • Opera
    • and more!

  • ♿ (Parody)

    @blakeyrat said:

    Kids these days. This is the problem with computer science: schools teach the current state but not how we got here. Without knowing how we got here, how the fuck can you improve anything? Oh wait you can't, that's one of the main reasons IT sucks ass.

    This is a huge problem in most areas of human activity. Every new generation believes it invented everything and is much better at it than the idiots who came before. Perhaps the exception is engineering disciplines involved with large catastrophes: e.g., ship building, large construction. Not foolproof, of course, but keener on learning than other areas.



  • @blakeyrat said:

    HTML5 is improving web standards. Without turning HTML into something else entirely.

    HTML5 is adding to the standard. I think the idea of XHTML was to remove things from the standard, so that...

    @blakeyrat said:

    Their focus was on making browsers easier to write, for some strange reason I can't fathom.

    It might have been more about keeping a simpler standard so there would be fewer opportunities for browsers to have bugs. A simpler, more consistent standard (XHTML supposedly) is easier to implement correctly than a confused one (HTML).

    @blakeyrat said:

    HTML is loose and easy. XML is strict and harsh.

    What might the ramifications of having a loose and easy standard be? Horrible incompatibility between implementations? Just like what happened?

    I would call fostering compatibility to be an improvement of the standard.

    I'm sure there are incompatibilities between XML parsers, but are they anything like the problems we've seen with HTML parsers/renderers?



  • @blakeyrat said:

    Yeah, well, they didn't have a time portal. HTML pre-dates XML. Both were based on SGML, originally, but as should be now obvious SGML was not very well-specified.

    @blakeyrat said:

    "Fix" what? HTML not being XML was a feature, not a bug. There's nothing to fix. HTML is loose and easy. XML is strict and harsh. HTML is allowing of unknown input. XML is rigidly defined (in theory). Other than both being based on SGML, they're nothing alike.

    The problem is that the W3C was thinking like you, "wow these languages are similar, what if we made them the same?" Yeah. Well. A hobby helicopter and a truck might share the same engine, but that doesn't mean you put huge off-road tires on the helicopter and a tail rotor on the truck.

    Fair enough. It didn't have to be XML or anything at all. Just something well defined, that forced people to follow the spec.

    From my reading and/or podcasting, it seems the feeling at the time was "Let's just allow anyone to write anything and then we do our best to render it". This kind of policy leads to web browsers being able to swallow pretty much anything you throw at them. This is good if you want to allow your average secretary to put something up, but also makes the applications that need to process this super complicated and hardware demanding. Turns out the former requirement wasn't all that important, while the second one still is.

    As for XML, this was just the thing they picked because it's close enough to HTML and widely used. It might have as well been something else entirely.


  • I survived the hour long Uno hand

    My understanding is that the feeling was more like "Oh crap, people are writing stuff! Quick, write all this down and call it a standard. Just don't dare suggest any of them are wrong or we'll be right in the middle of a huge browser war."



  • @Bort said:

    It might have been more about keeping a simpler standard so there would be fewer opportunities for browsers to have bugs. A simpler, more consistent standard (XHTML supposedly) is easier to implement correctly than a confused one (HTML).

    And I'd agree with that, but with the minor caveat that at the time this work was being done all browsers had a bug-free HTML 4.01 parser.

    Now CSS, that's a different story entirely. But HTML implementations have been generally bug-free for a long time-- the bugs were all in XHTML1* and XHTML2, partially because those were the new standards but mostly because nobody fucking used those and so the bugs never got sorted out.

    *) Pedantic dickweed alert: XHTML1 non-strict was actually quite popular for a couple of years.

    @Bort said:

    What might the ramifications of having a loose and easy standard be? Horrible incompatibility between implementations? Just like what happened?

    When was there horrible incompatibility between implementations of HTML?

    There were horrible incompatibilities between scripting language support (to wit: IE supported VBScript, something which was BTW supported by the HTML spec, and other browsers did not). There were horrible incompatibilities between CSS implementations. There were some pretty bad incompatibilities between DOM implementations.

    But HTML?

    You're smoking crack.

    @Bort said:

    I would call fostering compatibility to be an improvement of the standard.

    It would be, were it necessary or worthwhile work.

    Then again, improving DOM and CSS are like three orders of magnitude more important, and the W3C basically ignores those. DOM is stone-dead, progress-wise, and CSS takes 27 years to get a version revision out.

    @cartman82 said:

    Fair enough. It didn't have to be XML or anything at all. Just something well defined, that forced people to follow the spec.

    But don't you see? Both HTML and XML are compliant with the SGML spec. So that's already been done. The problem is the SGML spec is really, really "loose".

    Remember CSS1? Remember how CSS1 didn't specify whether border widths applied inside or outside the box? Remember how IE implemented it one way (perfectly compliant with the spec) and Netscape implemented it the complete opposite way (also perfectly compliant with the spec)? Yeah. Same thing with HTML vs. XML.

    @cartman82 said:

    Turns out the former requirement wasn't all that important, while the second one still is.

    Why? Of all the CPU time your browser spends on websites, how much is spent parsing HTML/XHTML/XML/whatever? I mean, seriously?

    That was a valid argument back when people thought Palm IIIs should be able to surf the web. It hasn't been a valid argument for a long, long time.



  • @Yamikuronue said:

    My understanding is that the feeling was more like "Oh crap, people are writing stuff! Quick, write all this down and call it a standard. Just don't dare suggest any of them are wrong or we'll be right in the middle of a huge browser war."

    Followed directly by

    "I CAN'T FIND THE PEN!"



  • @blakeyrat said:

    But don't you see? Both HTML and XML are compliant with the SGML spec. So that's already been done. The problem is the SGML spec is really, really "loose".

    Remember CSS1? Remember how CSS1 didn't specify whether border widths applied inside or outside the box? Remember how IE implemented it one way (perfectly compliant with the spec) and Netscape implemented it the complete opposite way (also perfectly compliant with the spec)? Yeah. Same thing with HTML vs. XML.

    Ok, now you're just being overly pedantic. Follow the spec = stricter spec. Less chance that different browsers will parse the code differently.

    @blakeyrat said:

    Why? Of all the CPU time your browser spends on websites, how much is spent parsing HTML/XHTML/XML/whatever? I mean, seriously?

    That was a valid argument back when people thought Palm IIIs should be able to surf the web. It hasn't been a valid argument for a long, long time.

    Fair argument about browsers. Cleaner spec is always better, but not to the point where you disrupt everything when things are already working. That's my whole point: no use pushing for XHTML now, but it would have been nice if they set up a stricter standard to begin with.

    On the other hand, it WOULD be nice if you could reliably parse HTML like you can XML, for example.



  • @cartman82 said:

    On the other hand, it WOULD be nice if you could reliably parse HTML like you can XML, for example.

    IE, Chrome, Firefox, Opera all pull it off-- what's your excuse?

    I'm not going to say HTML is an excellent spec, but it's not that difficult to get right, and the few edge cases (tags that don't close in the correct order) have all been hashed-over years ago.



  • I assume an XML parser would do a better job of letting you know when you screw up. I had a page do all kinds of weird things, but there were no errors. Eventually I discovered I was missing a </div>. Personally I'd prefer the browser to tell me start tag count doesn't match end tag count instead of just making crap up and trying to continue like nothing is wrong.
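
A small Python sketch of that difference, using only the standard library. The `PAGE` sample and the helper names `xml_error` and `html_parser_raises` are invented for illustration, and Python's `html.parser` is only an approximation of how forgiving a browser is.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page with a <div> that is never closed.
PAGE = """<body>
<div>
<p>missing a closing div tag</p>
</body>"""


def xml_error(markup):
    """Return None if markup is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def html_parser_raises(markup):
    """Return True if Python's lenient HTML parser raises on this markup."""
    try:
        parser = HTMLParser()
        parser.feed(markup)
        parser.close()
    except Exception:
        return True
    return False


error = xml_error(PAGE)
print(error)                     # the strict parser reports a mismatched tag
print(html_parser_raises(PAGE))  # the lenient parser carries on silently
```

Note that the XML parser reports the spot where it noticed the problem (the `</body>` line), not the spot where the `</div>` should have been, so it narrows the search but does not finish it.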



  • @blakeyrat said:

    IE, Chrome, Firefox, Opera all pull it off-- what's your excuse?

    I'm not going to say HTML is an excellent spec, but it's not that difficult to get right, and the few edge cases (tags that don't close in the correct order) have all been hashed-over years ago.

    No excuse. I guess I'm just not up to the task of re-implementing one of the major browser 10-years-of-accumulated-development parsers when I need to strip an HTML document.



  • @blakeyrat said:

    You're smoking crack.

    More like I don't know what I'm talking about.

    My attitudes are more reflective of CSS and DOM, as you've said.

    We would have the same problems replacing or improving those, though, ugggggg.



  • Can you extract the HTML parser out of an open-source browser (Chromium) and make it a library? Is it already a library?



  • Last time I needed this, it was in C# and I used some kind of library that converts html into a "proper" XML, and then you can browse through it using XPATH. The caveat in the docs was, "works in most cases". Things might have improved since.
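
The same idea can be sketched in Python with nothing but the standard library: feed the events from `html.parser` into an `xml.etree.ElementTree.TreeBuilder`, then query the result with ElementTree's limited XPath support. This is not the C# library mentioned above. The class `HtmlToTree`, its `finish` method, the `html_to_tree` helper, and the synthetic `document` root are all assumptions made for illustration, and the sketch does not implement HTML5's implied end tags, so "works in most cases" would be generous.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class HtmlToTree(HTMLParser):
    """Builds an ElementTree from HTML, closing leftover tags on its own."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []
        self.builder.start("document", {})  # synthetic root element

    def _start(self, tag, attrs):
        # Valueless attributes such as `selected` get the XML-style value.
        self.builder.start(tag, {k: (k if v is None else v) for k, v in attrs})

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs)
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # XML-style <tag /> is treated as an empty element.
        self._start(tag, attrs)
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # ignore stray closing tags
        while self.open_tags:
            name = self.open_tags.pop()
            self.builder.end(name)  # also closes anything left open inside
            if name == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        self.builder.end("document")
        return self.builder.close()


def html_to_tree(markup):
    parser = HtmlToTree()
    parser.feed(markup)
    return parser.finish()


tree = html_to_tree('<div><p class="a">one<br>two</p><p>three</div>')
print([p.text for p in tree.findall(".//p")])
print(tree.find(".//p[@class='a']").text)
```

The second `<p>` is never closed in the input, yet the tree still comes out well formed because the closing `</div>` closes it on the way.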



  • I think this critique of HTML (and friends) is an example of learning from the past, not a failure to do so.

    The lesson is: Keep things as simple as possible from the start. I would criticize the designers of the web for failing to learn this lesson from their predecessors[*].

    XHTML was a desired simplification of HTML that came too late.

    [*] Personally, I would have preferred SHTML:

    (html (head (title "Title")) (body (ul (li "Item1") (li "Item2"))))
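
To show the notation really is just HTML in different clothes, here is a rough Python sketch that reads the expression above and prints ordinary tags. "SHTML" is not a real standard, and the `parse` and `render` functions are invented for illustration. The sketch handles only bare tag names and double-quoted text: no attributes, no escapes inside strings, and no friendly error for a missing paren.

```python
import re
from html import escape

# Tokens: open paren, close paren, double-quoted text, or a bare symbol.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Turn an S-expression string into nested lists; quoted text becomes ('text', s)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing paren
            return items
        if token.startswith('"'):
            return ("text", token[1:-1])
        return token  # bare symbol, used as a tag name

    return read()


def render(node):
    """Serialize the nested structure produced by parse() as HTML."""
    if isinstance(node, tuple):   # quoted text
        return escape(node[1])
    if isinstance(node, str):     # bare symbol in child position
        return escape(node)
    tag, *children = node
    return f"<{tag}>" + "".join(render(child) for child in children) + f"</{tag}>"


source = '(html (head (title "Title")) (body (ul (li "Item1") (li "Item2"))))'
print(render(parse(source)))
```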
    


  • Can you imagine the nightmare of trying to find a missing ) though? No thanks.



  • You use an editor that matches parens. I haven't found it to be a problem in practice.

    Definitely not any more than missing closing tags.

    Or missing curly braces.



  • @mott555 said:

    I assume an XML parser would do a better job of letting you know when you screw up.

    You might assume that, but the XHTML2 spec called for not displaying anything at all if the XML didn't validate. Just a blank window. The W3C really did fuck it up in all possible ways.



  • At least with the closing tags you know what the missing tag is (if you have a half-decent editor). It's a lot easier to figure out where to put a missing </tr> tag than a missing ), because at least you know what is missing the tag.

    In languages where curly braces are an issue, you generally aren't using enough in one section of code to make this an issue. If you do encounter a scenario where you are having trouble matching the braces, it's probably a good idea to refactor.

    On a web page however, you can't really refactor. SHTML would just introduce a whole new can of worms, with the only "benefit" being that there are fewer characters on the raw page.


  • ♿ (Parody)

    @Bort said:

    I think this critique of HTML (and friends) is an example of learning from the past, not a failure to do so.

    Collectively, agreed, but how many individuals are learning from it? Too many people still want to tear down Chesterton's gate.



  • @blakeyrat said:

    You might assume that, but the XHTML2 spec called for not displaying anything at all if the XML didn't validate. Just a blank window. The W3C really did fuck it up in all possible ways.

    I never got into XHTML, so I am unfamiliar with the spec. Keep that in mind when I say:

    Wha ... What?! Why?! That ... TDEMSYR!!

    Disclaimer: All vitriol and incredulity contained within this post is directed at W3C, not at blakeyrat.



  • I think I'm starting to understand. Writing HTML as XML isn't bad. It's the official standard for that that's bad...right?



  • I would say: thinking XML and HTML are the same language based on some cosmetic similarities is bad.

    But if you do make that mistake, don't worry, you're in good company with the W3C and thousands of web developers over the years.



  • @abarker said:

    It's a lot easier to figure out where to put a missing </tr> tag than a missing ), because at least you know what is missing the tag.

    I don't know why this would be the case and in my experience with this kind of syntax, it isn't.

    Also, HTML needs variables. And functions.



  • @Bort said:

    The lesson is: Keep things as simple as possible from the start. I would criticize the designers of the web for failing to learn this lesson from their predecessors[*].

    Actually I want to go back and address this.

    The lesson is: don't trade human simplicity for machine simplicity. XML is "simple"* for machines but complicated and unforgiving for humans. HTML is the opposite, more difficult for machines to parse but very forgiving to humans.

    That's like the number one philosophical difference between the two languages. And I'd argue that both languages are right to do what they do, given what they were designed for.

    *) Scare quotes because XML's actually pretty damned complicated to parse correctly.



  • @Bort said:

    I don't know why this would be the case and in my experience with this kind of syntax, it isn't.

    Also, HTML needs variables. And functions.

    It's easier to figure out where to put a missing </tr> because you know what's missing, so you can narrow down where to look. You don't need to look at locations where you expect </a> or </div> tags.

    If you only know you are missing a ), then the first course of action is to figure out which element the paren belongs to. Since all ) look alike, the fact that one is missing doesn't help you. Let's say you have an editor that matches the parens for you. So you step through each element, from the beginning, and determine if the matched paren is in the correct spot. Since a paren is missing, you should be able to skip the html element, because it won't match up to anything. Have fun with that. I'd rather pass.
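
The "you know what's missing" point can be sketched as a stack-based checker in Python. Because closing tags carry a name, the checker can say which element was left open and where it was opened. The names `UnclosedTagFinder` and `find_unclosed` are made up for illustration, and the check is deliberately stricter than real HTML, where some end tags (including `</tr>`) are optional.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Tracks open tags on a stack and names the ones that never get closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag name, line where it was opened)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # XML-style <tag /> opens and closes itself

    def handle_endtag(self, tag):
        if tag not in [name for name, _ in self.open_tags]:
            self.problems.append(f"stray </{tag}> on line {self.getpos()[0]}")
            return
        # Pop back to the matching tag; anything popped on the way was left open.
        while self.open_tags:
            name, line = self.open_tags.pop()
            if name == tag:
                break
            self.problems.append(f"<{name}> opened on line {line} is missing </{name}>")


def find_unclosed(markup):
    finder = UnclosedTagFinder()
    finder.feed(markup)
    finder.close()
    for name, line in finder.open_tags:
        finder.problems.append(f"<{name}> opened on line {line} is never closed")
    return finder.problems


table = """<table>
<tr><td>a</td>
<tr><td>b</td></tr>
</table>"""
print(find_unclosed(table))
```

With parens, the equivalent checker could only report that the count is off by one, since every `)` looks the same.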



  • @blakeyrat said:

    The lesson is: don't trade human simplicity for machine simplicity. XML is "simple" for machines but complicated and unforgiving for humans. HTML is the opposite, more difficult for machines to parse but very forgiving to humans.

    Why does HTML need to sacrifice machine simplicity for human simplicity? It's supposed to be written by people who are already used to dealing with machine simplicity over human simplicity.



  • @mott555 said:

    Why does HTML need to sacrifice machine simplicity for human simplicity? It's supposed to be written by people who are already used to dealing with machine simplicity over human simplicity.

    Who says it is? Who says HTML isn't intended for my grandma to create her knitting club's website on?

    I hate that kind of tech elitism. Computers are for everybody. The Internet is for everybody.



  • @mott555 said:

    Why does HTML need to sacrifice machine simplicity for human simplicity? It's supposed to be written by people who are already used to dealing with machine simplicity over human simplicity.

    But it's also supposed to allow people without a highly technical background to write web-pages. At least, that's part of how it started.



  • @blakeyrat said:

    Who says it is? Who says HTML isn't intended for my grandma to create her knitting club's website on?

    @blakeyrat: you appear to have had a parsing error. @mott555 said "written by" not "written for".



  • I'm apparently reading that sentence the exact opposite way you are? I don't see the distinction between "written by" and "written for" in that sentence...

    It's supposed to be written by people who are already used to dealing with machine simplicity over human simplicity.

    It's supposed to be written for people who are already used to dealing with machine simplicity over human simplicity.

    Hm. Well whatever.


  • ♿ (Parody)

    @blakeyrat said:

    Who says it is? Who says HTML isn't intended for my grandma to create her knitting club's website on?

    I hate that kind of tech elitism. Computers are for everybody. The Internet is for everybody.

    And...human simplicity benefits the elite, too, since they don't have to struggle past it (even if they are much more proficient than blakey's grandma).

    Of course, engineering is about tradeoffs, and it's not at all easy to figure out when human simplicity outweighs technical capability.



  • @blakeyrat said:

    I'm apparently reading that sentence the exact opposite way you are? I don't see the distinction between "written by" and "written for" in that sentence...

    Written by: talking about the people who designed the spec.

    Written for: talking about the intended users of the spec.

    Then again, you might be right. @mott555 might not be typing what he means.



  • @blakeyrat said:

    don't trade human simplicity for machine simplicity

    I think this is often a false dilemma. XML is easier to write a parser for and easier to read because there are fewer variations.

    HTML is more forgiving, so it is easier to write (the writer can make mistakes and overlook details), but that doesn't make it any easier to read (the reader has to keep more details in mind while reading).

    Another lesson: code is read at least as often as it is written. Usually much more often. By people and machines.



  • The confusion, to beat this dead horse, is whether "it" refers to the spec itself, or the HTML documents the spec describes. I don't know how to read it anymore, you've confused my brain, congratulations.



  • I think the guy was referring to the stuff in the thing about which we were all yakking.

