What's the Purpose of XHTML?

PSWorx

Or more accurately, what's its purpose in the W3C philosophy? I'm trying to familiarise myself with the whole "semantic web" and "future of the web" stuff, because, although it's a few years old already, it sounds pretty interesting to me. (Yes, I am a nerd, I know...)

While XHTML seems clean and nifty at first, I increasingly don't get how it is supposed to "fit in the greater scheme of things". Basically, from what I've gathered so far, the W3C wants to promote XML and RDF datasets for semantic data and CSS and XSL for presentational data. While this sounds like an improvement to me, what would be left as "valid" XHTML content then?

I mean, I see what good ol' HTML was good for (or in fact, still is). It combined semantic and presentational data because it was being developed a long time before the whole standardisation stuff started. But XHTML was created long after RDF, XML and CSS, so why was it created in the first place?

bonzombiekitty

It comes down to HTML having to be lenient, and thus more complicated.

Wikipedia to the rescue:

The need for a more strict version of HTML was felt primarily because World Wide Web content now needs to be delivered to many devices (like mobile devices) apart from traditional computers, where extra resources cannot be devoted to support the additional complexity of HTML syntax.

Another goal for XHTML and XML was to reduce the demands on parsers and user-agents in general. With HTML, user-agents increasingly took on the burden of “correcting” errant documents. Instead XML requires user-agents to fail when encountering malformed XML. This means an XHTML browser can theoretically be faster and made to run more easily on miniaturized devices than a comparable HTML browser. The recommendation for browsers to post an error rather than attempt to render malformed content should help eliminate malformed content. Even when authors do not validate code, and simply test against an XML browser, errors will be revealed.

An especially useful feature XHTML inherits from its XML underpinnings is XML namespaces. With namespaces, authors or communities of authors can define their own XML elements, attributes and content models to mix within XHTML documents. This is similar to the semantic flexibility of the ‘class’ attribute from HTML, but with much more power. Some W3C XML namespaces/schema that can be mixed with XHTML include MathML for semantic math markup, Scalable Vector Graphics for markup of vector graphics, and RDFa for embedding RDF data.
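
To make the namespace point in the quote concrete, here is a minimal sketch in Python. The sample document is invented for illustration; only the two namespace URIs (XHTML and MathML) are the real W3C ones. It parses an XHTML page with embedded MathML and groups the element names by the namespace they belong to, which is how an XML-aware consumer can tell the two vocabularies apart.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Invented sample page: XHTML with a small MathML island inside it.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Namespaces demo</title></head>
  <body>
    <p>The square of x:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <msup><mi>x</mi><mn>2</mn></msup>
    </math>
  </body>
</html>"""


def elements_by_namespace(xml_text):
    """Group element local names by the namespace URI they belong to."""
    grouped = {}
    for element in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified names as "{namespace-uri}localname".
        uri, _, local = element.tag[1:].partition("}")
        grouped.setdefault(uri, []).append(local)
    return grouped


grouped = elements_by_namespace(DOCUMENT)
print(grouped[XHTML_NS])   # element names from the XHTML vocabulary
print(grouped[MATHML_NS])  # element names from the MathML vocabulary
```

The same idea extends to any other namespaced vocabulary mixed into the page, such as SVG.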

stratos

Besides the above, HTML used to be all about being easy to use. Then Netscape, and later IE, started "inventing" new tags and new stuff that was supposed to work.
A lot of the ideas in HTML 2/3/4 really came from the browsers. This made things messy.
XHTML 1 is just the first step; in the coming years a lot of stuff will be pulled out of XHTML because it should be done in CSS.

Also, as the quote above says, because XHTML is based on XML it will be easier to integrate different XML-based stuff into it. See for instance XForms, which will be a big boon for professional web devs once it actually works and gets fleshed out. Because of this, a lot of what is handled by modules like XForms, XFrames, XLink, etc. will be pulled out of XHTML and replaced by the corresponding module.

The downside at the moment, however, is that XHTML isn't supported by IE yet. There is a lot of debate about this, because IE will render your XHTML page correctly (if you're lucky). But since it does not support application/xhtml+xml, IMHO it does not support XHTML, and if it renders the page correctly, it's only because XHTML and HTML 4 are 98% the same in terms of tag names and behavior.
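
One common workaround for the MIME type problem is content negotiation on the server: send application/xhtml+xml only to clients that ask for it, and text/html to everyone else. Below is a minimal sketch in Python. The function name `choose_content_type` is made up for this example, and the Accept-header parsing is deliberately simplified (wildcards like `*/*` are treated as "no explicit XHTML support").

```python
def choose_content_type(accept_header):
    """Return the media type to serve for an XHTML page.

    Simplified: application/xhtml+xml is chosen only when the client lists
    it explicitly with a non-zero q-value; anything else, including
    wildcards such as */*, falls back to text/html.
    """
    for item in accept_header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        if media_type.lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # q defaults to 1 when the parameter is absent
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q-value as a refusal
        if quality > 0:
            return "application/xhtml+xml"
    return "text/html"


# A header that names XHTML explicitly gets the XML media type.
print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# A header with only image types and a wildcard gets plain text/html.
print(choose_content_type("image/gif, image/jpeg, */*"))
```

Whether serving the same markup under two media types is a good idea is exactly the debate mentioned above; the sketch only shows the mechanism.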

However, like I said, there's quite a bit of debate about that, so the above is only my opinion.

Personally I don't bother with XHTML; I simply use HTML 4.01 Strict and it works, in all browsers, as intended (bar the browser bugs, of course).
When I hear signals that XHTML 2.0, or perhaps some lower major version, is coming up and will actually be supported by all browsers, I'll start using XHTML. However, that might take a while ;) (This of course doesn't mean I don't read up on current developments in the tech; it's always good to read about stuff like that, if only to better understand how it's supposed to work.)
I'm all for progress and cleaner tools for the job, but I simply hate using something that "happens" to work, instead of something that works because it's damn well supported.

asuffield

First: "semantic web". This is the Duke Nukem Forever of the web industry; it will probably never be finished, and if it is ever finished, it will be obsolete and crap. Ignore it.

People talk a lot about how XHTML is about making things simpler or more elegant or whatever. The reality is, as usual, more gritty and kinda puerile. It's all about the XML buzzword. Everybody was rewriting their perfectly adequate systems in XML so that they could claim to be using XML. W3C "needed" to incorporate XML somewhere or risk getting sidelined in favour of dozens of proprietary XML-based systems (purely so that the people using them could say "XML", not because it was in any way better). XHTML is their response. The rest is just rationalisation of their need for buzzword-compliance.

There is some debate about whether W3C themselves were jumping on the buzzword bandwagon, or whether they were just reacting to a great many other stupid people doing so. Personally, I can't see that it makes any difference either way.

stratos

@asuffield said:

First: "semantic web". This is the Duke Nukem Forever of the web industry; it will probably never be finished, and if it is ever finished, it will be obsolete and crap. Ignore it.
People talk a lot about how XHTML is about making things simpler or more elegant or whatever. The reality is, as usual, more gritty and kinda puerile. It's all about the XML buzzword. Everybody was rewriting their perfectly adequate systems in XML so that they could claim to be using XML. W3C "needed" to incorporate XML somewhere or risk getting sidelined in favour of dozens of proprietary XML-based systems (purely so that the people using them could say "XML", not because it was in any way better). XHTML is their response. The rest is just rationalisation of their need for buzzword-compliance.
There is some debate about whether W3C themselves were jumping on the buzzword bandwagon, or whether they were just reacting to a great many other stupid people doing so. Personally, I can't see that it makes any difference either way.

Gee, and here I was thinking the W3C invented XML.

The changes that are happening will make it much easier to change small things once the whole web thing gets modular. Right now, if some form element needs to be added, or something else changes, they need to release an entire new HTML standard. When XHTML 2 is done, they can just release a new XForms and be done with it.

I'm fine with bashing the abuse of XML and all, but this actually isn't abuse.

kirchhoff

There's only one real reason why XHTML exists:

The theory goes that everyone would want to store their structured data in XML at some point: office documents, spreadsheets, phonebooks, or whatever else. So wouldn't it be nice if there was a variant of HTML that was also XML? Then you (the company that creates the XML file format for phonebooks and office documents) could provide an XSLT that acts as a filter for those documents, creating XHTML output, which is transformed and in turn rendered by the web browser, so that you don't need umpteen plugins to browse your intranet.

That's basically it. Since XSLT only transforms XML -> XML, there needed to exist an XML variant of HTML (HTML itself was almost well-formed XML, but not quite). That's the destination presentation format for web-browser consumption; the end-result target for your enterprisey XML manipulation.
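
The pipeline described here (domain-specific XML in, XHTML out) can be sketched in a few lines. Note the swap: Python's standard library has no XSLT processor, so this sketch does the transformation by hand with ElementTree instead of a stylesheet. The phonebook format and the function name `phonebook_to_xhtml` are invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical phonebook format, invented for this example.
PHONEBOOK = """<phonebook>
  <entry><name>Ada</name><number>555-0100</number></entry>
  <entry><name>Grace</name><number>555-0199</number></entry>
</phonebook>"""


def phonebook_to_xhtml(xml_text):
    """Turn the phonebook XML into an XHTML table, serialised as a string."""
    source = ET.fromstring(xml_text)
    # Declaring xmlns on the root puts every element in the XHTML namespace.
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Phonebook"
    table = ET.SubElement(ET.SubElement(html, "body"), "table")
    for entry in source.findall("entry"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = entry.findtext("name")
        ET.SubElement(row, "td").text = entry.findtext("number")
    return ET.tostring(html, encoding="unicode")


xhtml = phonebook_to_xhtml(PHONEBOOK)
print(xhtml)
```

Because the output is itself well-formed XML, it can be fed straight into another XML tool, which is the property the post is pointing at.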

Edit: And I know those aren't the only reasons that were thrown about. But it was the only serious driving issue that anyone really cared about. No one really believed that XHTML would somehow make rendering web pages any easier, or thought it would be valuable to use XHTML as an XML data source for further manipulation (because that would be ass-backwards), or that it would have a hand in the development of the DOM or anything.

asuffield

@stratos said:

I'm fine with bashing the abuse of XML and all, but this actually isn't abuse.

I didn't say that XHTML was abusing XML, just that the initial motivation was "have stuff in XML" and not "improve HTML". It is perfectly possible to do a good job of implementing something even when starting from the worst of intentions.

masklinn

XHTML has no purpose; it was merely conceived as a wedge, part of the W3C's complete switch to XML (XHTML 1.0 and 1.1 are merely reformulations of HTML 4.01 in XML instead of SGML), and it's failing, which is why the W3C finally decided to restart the HTML WG.

Whiskey_Tango_Foxtro

Wow... everyone has a hate-on for XML. Why? I rather like XML. Sure, it's overused as a buzzword, but the concepts behind it are actually pretty solid for hierarchically structured data. HTML that conforms to XML isn't a Bad Idea (TM), IMO. I mean... isn't HTML 4 strict well-formed XML? What's wrong with making sure *all* of your HTML is well-formed XML?

joost

A very useful feature of XHTML over HTML is that you can use any XML parser to generate or process it. It removes a great deal of frustration in working with PHP - I can let the XML parser generate correct XHTML for me and I can have text be automatically escaped.
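
As a rough illustration of the "generate it with an XML library and get escaping for free" point, here is a sketch in Python rather than PHP. The `render_comment` helper is hypothetical; the escaping itself is what ElementTree's serialiser does for text content.

```python
import xml.etree.ElementTree as ET


def render_comment(author, text):
    """Build an XHTML fragment; the serialiser escapes the text content."""
    div = ET.Element("div", {"class": "comment"})
    ET.SubElement(div, "strong").text = author
    ET.SubElement(div, "p").text = text
    return ET.tostring(div, encoding="unicode")


# User-supplied text containing markup characters is never pasted in raw.
untrusted = 'if (a < b && c > 0) { alert("<b>hi</b>"); }'
fragment = render_comment("joost", untrusted)
print(fragment)
```

Since the tree is built from elements rather than string concatenation, the output is well-formed by construction and the `<`, `>` and `&` in the user text come out as entity references.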

re: Whiskey Tango Foxtrot? Over.

Some valid HTML is invalid XML:

<ul><li>foo<li>bar<li>baz</ul>

foo bar

ë
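
A quick way to see the difference is to hand such markup to an XML parser. The sketch below uses the unclosed-list snippet `<ul><li>foo<li>bar<li>baz</ul>`, which appears where this post is quoted further down the thread. The entity case is an assumption: it guesses that the "ë" example was originally written as the named entity `&euml;`, which HTML defines but a bare XML parser does not.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


html_style = "<ul><li>foo<li>bar<li>baz</ul>"                # end tags omitted
xhtml_style = "<ul><li>foo</li><li>bar</li><li>baz</li></ul>"  # every tag closed

print(is_well_formed(html_style))       # False: </ul> does not match the open <li>
print(is_well_formed(xhtml_style))      # True
# Assumed example: a named HTML entity with no DTD to define it.
print(is_well_formed("<p>&euml;</p>"))  # False: undefined entity
print(is_well_formed("<p>&#235;</p>"))  # True: numeric references always work
```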

ArneArts

Everybody should read this -> http://www.digital-web.com/articles/html5_xhtml2_and_the_future_of_the_web/

Basically; XHTML2 is a dud, and has no support whatsoever from any major browsers.

Hitsuji

@ArneArts said:

Everybody should read this -> http://www.digital-web.com/articles/html5_xhtml2_and_the_future_of_the_web/

Basically; XHTML2 is a dud, and has no support whatsoever from any major browsers.

/me *cries*

Why, oh why, are they turning away from a good thing, and moving back to the mess that already is HTML? In my eyes XHTML is a good thing: get rid of the crap that developers shouldn't be using, and make it easier for error detection and cleaner code. And I'd definitely like the removal of the ability to embed JS or CSS in the HTML code itself.

asuffield

@Hitsuji said:

@ArneArts said:
Everybody should read this -> http://www.digital-web.com/articles/html5_xhtml2_and_the_future_of_the_web/

Basically; XHTML2 is a dud, and has no support whatsoever from any major browsers.

/me *cries*

Why, oh why, are they turning away from a good thing, and moving back to the mess that already is HTML? In my eyes XHTML is a good thing: get rid of the crap that developers shouldn't be using, and make it easier for error detection and cleaner code. And I'd definitely like the removal of the ability to embed JS or CSS in the HTML code itself.

Because web pages are written by morons, and if you take away all that stuff, none of the morons will bother putting an xhtml tag at the top of their pages (much like they currently do not bother), and continue to generate the same HTML 3.2-ish stuff that they always have.

XHTML 1 is almost unused in practice. XHTML 2 would probably be used even less. Any idiot can design a markup language, but that won't change what websites use.

Hitsuji

But there's always the option of ~~backwards~~ moron compatibility: default the document type to HTML 4.01 if no doctype is declared, and real website designers can do things the right way by using the doctype declaration.

Tweenk

@Hitsuji said:

But there's always the option of ~~backwards~~ moron compatibility: default the document type to HTML 4.01 if no doctype is declared, and real website designers can do things the right way by using the doctype declaration.

Unless you have to support IE... then there is no "right" way, only hacks utilizing hacks that have been hacked to hack into hacks...

In Microsoft IE, the browser hacks YOU!

Hitsuji

@Tweenk said:

@Hitsuji said:
But there's always the option of ~~backwards~~ moron compatibility: default the document type to HTML 4.01 if no doctype is declared, and real website designers can do things the right way by using the doctype declaration.

Unless you have to support IE... then there is no "right" way, only hacks utilizing hacks that have been hacked to hack into hacks...

In Microsoft IE, the browser hacks YOU!

What I meant by the right way was: the HTML code contains only the content of the webpage, CSS files contain the layout of the page, and all JS lives in its own JS files. And of course any tags that should never be used are never used. I was looking forward to the next revision of XHTML, hoping it would have a stricter syntax and remove any HTML features that are covered (or should be covered) by a more appropriate tool such as CSS. HTML these days is far too much of a mess. And the whole idea of HTML 5.0 being backwards compatible is a bit of a joke. If someone wants features from a previous version of HTML, then change the doctype.

Monkeyget

Really good brain food on the subject: Why you should be using HTML 4.01 instead of XHTML

asuffield

@Hitsuji said:

But there's always the option of ~~backwards~~ moron compatibility: default the document type to HTML 4.01 if no doctype is declared, and real website designers can do things the right way by using the doctype declaration.

And when the number of people "doing things the right way" (for whatever value of "right") approximates zero, then XHTML is declared a dud and no browser authors waste their time implementing further support for it.

This is what has happened. The causes for it are varied, but they all basically come down to: there is no real reason for using XHTML, and it's more work.

Hitsuji

@Monkeyget said:

Really good brain food on the subject: Why you should be using HTML 4.01 instead of XHTML

To summarise this, from what I glanced through: very few groups support XHTML, and those who do support it badly and/or not by default. I counted about 20 places in this article where it mentioned that IE does not support XHTML. But I still can't see why it's not supported; from reading through the article it seems that most companies are just too lazy to write a separate parser for XHTML files.

asuffield

@asuffield said:

@Hitsuji said:
/me cries

As always, the forum software is the real WTF. This is the silliest misfeature I've seen all day.

masklinn

@joost /**/ said:

A very useful feature of XHTML over HTML is that you can use any XML parser to generate or process it. It removes a great deal of frustration in working with PHP - I can let the XML parser generate correct XHTML for me and I can have text be automatically escaped.
re: Whiskey Tango Foxtrot? Over.
Some valid HTML is invalid XML:
<ul><li>foo<li>bar<li>baz</ul> 
foo bar 
ë

Use a real language with real screen scraping libraries (e.g. BeautifulSoup in Python or hpricot in Ruby), problem solved.

@Hitsuji said:

@ArneArts said:
Everybody should read this -> http://www.digital-web.com/articles/html5_xhtml2_and_the_future_of_the_web/

Basically; XHTML2 is a dud, and has no support whatsoever from any major browsers.

*cries*

Why, oh why, are they turning away from a good thing, and moving back to the mess that already is HTML.

Because XML sucks for web publishing?

And you may have missed that one of the goals of the working group is to standardize failure (that is, to standardize error conditions and error handling across implementations, which is the biggest issue in HTML).

@Hitsuji said:

And the whole idea of HTML 5.0 being backwards compatible is a bit of a joke. If someone wants features from a previous version of HTML, then change the doctype.

It's not about features, it's about migration of existing content...

asuffield

@masklinn said:

It's not about features, it's about migration of existing content...

Nobody cares about old content when you can hire a bunch of college morons to type it all in again for you. It's about migration of existing morons.

Ice__Heat

At least XHTML looks nice.

dhromed

@Ice^^Heat said:

At least XHTML looks nice.

(X)(H)TML looks nice because I set pretty syntax colouring in my editor.

The language is a dog.

woof

@masklinn said:

@joost /**/ said:
A very useful feature of XHTML over HTML is that you can use any XML parser to generate or process it. It removes a great deal of frustration in working with PHP - I can let the XML parser generate correct XHTML for me and I can have text be automatically escaped.
[... invalid XHTML snipped ...]

Use a real language with real screen scraping libraries (e.g. BeautifulSoup in Python or hpricot in Ruby), problem solved.

Can you define 'realness' when it comes to programming/scripting languages? Because I think PHP is a real language, just not an exceptionally pretty or powerful one. And of course, I can think of numerous technical reasons (not least the libraries) not to use PHP, but the reality is that many web apps are written in PHP, and that not many of those who do the programming get to do the platform choosing.

These super-lenient HTML parsers, I would only use them if I don't have control over the validity of the original markup, because that's where the problem really lies. In every other case, generating invalid XML and then parsing around all the errors is crazy.
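
For the case where you really don't control the markup, this is roughly what the lenient approach looks like. The sketch uses `html.parser` from Python's standard library as a stand-in for the BeautifulSoup and hpricot libraries mentioned above; the `ListItemCollector` class is made up for the example and only handles flat lists.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of <li> elements, tolerating omitted end tags."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # A new <li> implicitly ends the previous one.
            self.items.append("")
            self._in_item = True

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def list_items(markup):
    """Return the stripped text of every list item found in the markup."""
    collector = ListItemCollector()
    collector.feed(markup)
    collector.close()
    return [item.strip() for item in collector.items]


# The unclosed list from earlier in the thread parses without complaint.
print(list_items("<ul><li>foo<li>bar<li>baz</ul>"))  # ['foo', 'bar', 'baz']
```

The parser never reports an error here, which is convenient for scraping but is also why it is a poor check on markup you generate yourself.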