Emil Stenström: Why XHTML is a bad idea
| Torsten wrote: | ||
|
XML would be a huge barrier for most people publishing on the web because well-formedness and other XML requirements are not intuitive concepts for Joe Public.
| Torsten wrote: | ||
|
In contrast, there are very few browser manufacturers and they already have the expertise of developing sophisticated error handling. They also have the time (and funding) to continue this. It makes sense to keep this most difficult aspect where the talent and resources for solving it already exist.
Better compliance from the major commercial authoring tools would be a good thing as well, especially those made by companies who also make browsers (such as Microsoft).
| Torsten wrote: |
| I know next to nothing about HTML parsers, but it seems to me that you can't completely separate the application of style rules from parsing the underlying mark-up i.e. broken mark-up makes for broken CSS. |
Remember that tidy, browser-friendly markup can be written in HTML. It ain't got to be soup!
| Torsten wrote: |
| It's always been my assumption that XHTML, in addition to allowing for greater efficiency and accuracy in parsing (with all the benefits that this entails), would also make the application of CSS less error prone. I'd welcome any corrections on that score from those in the know. |
Last edited by Ben Millard on 08 Sep 2006 02:13 pm; edited 1 time in total
| Cerbera wrote: |
| The only significant HTML interoperability problem I know of is the lack of <abbr> support in IE, but the CSS interoperability problems between browsers are numerous (from what I've experienced and seen reported). |
Simon Pieters
| Cerbera wrote: |
| I've often had to help friends whose blogs have become scrambled because they missed an end tag, or copy-pasted too many end tags. Because their pages were being sent as text/html their readers can at least still see the content; in an XML web all they'd get would be an error message. |
...telling them exactly what the problem is. So they have to add or remove one or two closing tags, is that really an enormous hurdle? I don't think so.
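As a rough illustration of the kind of message a strict parser can give, here is a minimal sketch using the XML parser in Python's standard library. The function name `check_well_formed` and the sample markup are made up for this example, and a browser's own XML error page will be worded differently.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup: str):
    """Return None if markup parses as XML, else (line, column, message)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical blog fragment with a missing </em> on the second line.
broken = "<div>\n<p>Hello <em>world</p>\n</div>"
fixed = "<div>\n<p>Hello <em>world</em></p>\n</div>"

print(check_well_formed(broken))  # a tuple describing the error
print(check_well_formed(fixed))   # None
```

For the broken sample the parser reports a mismatched tag on line 2, which is where the missing end tag belongs.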
| Cerbera wrote: |
| XML would be a huge barrier for most people publishing on the web because well-formedness and other XML requirements are not intuitive concepts for Joe Public. |
I simply do not agree. To the contrary, having a strictly defined and simple set of rules (far simpler than those governing SGML and by extension HTML), should create a stronger conceptual model. There's nothing inherently complicated about the concept of 'well formed-ness'. It really couldn't be simpler.
| Cerbera wrote: |
| A lot of pages are authored through a variety of homebrew systems, open-source news scripts and custom CMSs. Moving all error correction to each of the small teams and individuals for these projects wouldn't work. They don't have the time or expertise to develop the sophisticated error handling to sanitise erroneous user input with the 100% reliability an XML web would require. |
You misunderstand me. I wasn't talking about error correction, I meant that authoring tools should create valid mark-up in the first place. If you're referring to authoring tools that allow authors to manipulate the mark-up itself, then the simple answer is for them not to accept malformed mark-up, and thereby force the authors to correct their errors. Again, I don't see this as a major hurdle for content authors. Anybody who's prepared to author their mark-up 'by hand' is, it seems to me, unlikely to have any difficulty grasping the requirements of XML.
| Cerbera wrote: |
| The two are tied together, you're right, but tied loosely. HTML browsers apply styling to an error-corrected version of the document. That softens the blow of erroneous markup, sometimes making its effect imperceivable. |
(my emphasis)
That's precisely my point. Two different browsers, by virtue of their independent error detection/correction heuristics, may ultimately be looking at two different documents. These differences may often be small, but they may also be significant. As an advocate of the importance of semantics, I'm surprised that you're happy with the idea of User Agents having to guess what a content author intended to convey. XHTML resolves these ambiguities.
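A small sketch of how this can happen, assuming two invented recovery strategies (neither is claimed to match any real browser). It uses the tolerant tokenizer in Python's standard `html.parser` module; the class name `Recovering`, the strategy names, and the nested-list tree format are all made up for the example.

```python
from html.parser import HTMLParser


class Recovering(HTMLParser):
    """Builds a nested-list tree using one of two hypothetical strategies."""

    def __init__(self, strategy: str):
        super().__init__()
        self.strategy = strategy      # "ignore" or "close_up_to"
        self.root = ["#root"]
        self.stack = [self.root]      # currently open elements

    def handle_starttag(self, tag, attrs):
        node = [tag]
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_data(self, data):
        self.stack[-1].append(data)

    def handle_endtag(self, tag):
        open_tags = [node[0] for node in self.stack]
        if self.stack[-1][0] == tag:
            self.stack.pop()          # well-nested case
        elif self.strategy == "close_up_to" and tag in open_tags[1:]:
            # Close every open element up to and including the match.
            while self.stack.pop()[0] != tag:
                pass
        # Otherwise ("ignore" strategy, or no match): drop the stray end tag.


def parse(markup: str, strategy: str):
    parser = Recovering(strategy)
    parser.feed(markup)
    parser.close()
    return parser.root


misnested = "<b>bold <i>both</b> italic</i>"
print(parse(misnested, "ignore"))
print(parse(misnested, "close_up_to"))
```

With the first strategy the text " italic" ends up inside both `<b>` and `<i>`; with the second it ends up outside both. Same bytes on the wire, two different documents for CSS to style.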
| Cerbera wrote: |
| Remember that tidy, browser-friendly markup can be written in HTML. It ain't got to be soup! |
If you're in favour of promoting 'well formed-ness' in HTML, why not go that little bit further and make it a requirement? To my knowledge nobody, myself included, is suggesting that such a requirement be enforced tomorrow. I think everybody recognises that it's likely to be a slow and gradual process, but that doesn't mean we shouldn't strive to attain it.
| Cerbera wrote: |
| XML compliant XHTML documents will almost always be at least a little larger than the equivalent HTML due to the more verbose syntax. The time lost in data transfer is likely to be greater than that gained through the marginally faster parsing it allows, resulting in an overall performance loss. (Data transfer rates around the internals of a PC are very much faster than those across the web.) |
I said “efficiency and accuracy”, not speed. We're not just talking about PCs here, but potentially a wide range of resource limited devices. Implementing a complex HTML parser, with all the error detection and correction that it would require, may not even be possible on such devices.
There's simply no mileage in the 'larger file sizes' argument as far as I'm concerned. With one or two notable exceptions (tables for example), the difference amounts to nothing. Furthermore, the level of knowledge required to actually benefit from HTML's comparatively slack rules is surely a far greater cognitive challenge than required by XML.
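For anyone who wants to put numbers on the table case, here is a back-of-the-envelope sketch. It compares a table written with the end tags HTML lets you omit (`</td>`, `</tr>`) against the same table with every tag closed. The helper names and the placeholder cell text "cell" are assumptions for the example; real pages carry far more content per cell, and compression on the wire is ignored.

```python
def table(rows: int, cols: int, close_tags: bool) -> str:
    """Build a table, optionally writing the </td> and </tr> end tags."""
    td_end, tr_end = ("</td>", "</tr>") if close_tags else ("", "")
    row = "<tr>" + ("<td>cell" + td_end) * cols + tr_end
    return "<table>" + row * rows + "</table>"


def overhead(rows: int, cols: int) -> int:
    """Extra characters needed when every optional end tag is written."""
    html = table(rows, cols, close_tags=False)
    xhtml = table(rows, cols, close_tags=True)
    return len(xhtml) - len(html)


print(len(table(10, 4, close_tags=False)))  # 375
print(overhead(10, 4))                      # 250
```

The overhead grows as rows × (5 × cols + 5) characters, so it only looks large when the cells themselves are nearly empty, as they are in this toy table.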
Turn the clock forward, maybe 30 years, and a pure XML web is not inconceivable as a long-term ambition.
But probably these details are irrelevant, and what will ultimately emerge will be based on something entirely new that we haven't thought of yet. But the idea that underlies it all is to try to create a semantic web - a web of structured, semantic content, rather than a chaotic mash-up of unstructured noise.
Maybe it will turn out that the W3C becomes irrelevant in this process - maybe microformats show a way for us to do it ourselves by the back door.


