MSXML WTF & lots of head banging against the wall



  • Ouch, my forehead... I've been banging my head against a solid brick wall for a couple of hours now. I am writing an application for our customer to integrate several systems. My application collects data into an internal XML document and uses an XSLT transform to send it to another application. Works well, so far so good.

    I realized that I could transform the internal XML document to some user-friendly format, such as XHTML, to be viewed directly in a web browser. Well, that's what XSLT is meant for, so this should be quite an easy task. I created a very simple stylesheet:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html>
          <head>
            <title>Summary</title>
          </head>
          <body>
            <xsl:apply-templates/>
          </body>
        </html>
      </xsl:template>
      <xsl:template match="RootElement">
        <p>foo</p>
      </xsl:template>
    </xsl:stylesheet>

    Then I wrote a very simple piece of code to transform the internal XML to HTML using the transformNodeToObject method:

    ' blah blah...
    ' Target document that receives the transform result
    Dim summaryDocument
    Set summaryDocument = CreateObject("Msxml2.DOMDocument")

    ' Document that holds the XSLT stylesheet
    Dim summaryStylesheet
    Set summaryStylesheet = CreateObject("Msxml2.DOMDocument")

    ' Load the stylesheet, transform the internal document, save the result
    summaryStylesheet.load stylesheetPath & summaryStylesheetName
    intermediateXMLDocument.transformNodeToObject summaryStylesheet, summaryDocument
    summaryDocument.save summaryFolderPath & summaryFileName

    I know, I know, the real WTF is that this is actually VBScript code (!), so no need to mention it, but anyway... For a couple of hours I have been trying to find out why the XSLT transform fails and the resulting HTML file is empty, going through the code and the stylesheet (which does not even do any formatting of the actual data) row by row and character by character to figure out what is wrong.

    Finally I found out that in the extremely complex XSLT transform mentioned above, the following stylesheet source

          <head>
            <title>Summary</title>
          </head>

    transforms into

    <head>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-16">
    <title>Summary</title>
    </head>

    in the result document, which is not well-formed XML since the META tag is not closed. Therefore it causes a parse error and I get an empty result. Simple enough.

    But hey, WTF - where did that META tag come from? Apparently the Mycro$oft XML parser uses some clever internal heuristics (maybe the Office Assistant) to help me output "proper" HTML. "It seems that you are trying to output HTML data. Perhaps you would like to include some extra tags in your source? How about some META ones?" Whoops, it's not valid XML anymore, but who cares.
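
The parse failure described above is easy to reproduce outside MSXML. The following is a minimal sketch in Python rather than the poster's VBScript, and the helper name `is_well_formed` is made up for the illustration. It feeds both versions of the head element to a strict XML parser:

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False on a parse error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The <head> element as written in the stylesheet.
source_head = "<head><title>Summary</title></head>"

# The <head> element as it appears in the result: META has no end tag.
html_head = (
    "<head>"
    '<META http-equiv="Content-Type" content="text/html; charset=UTF-16">'
    "<title>Summary</title>"
    "</head>"
)

print(is_well_formed(source_head))
print(is_well_formed(html_head))
```

The second fragment fails because the parser reaches `</head>` while META is still open, which matches the empty result the post describes.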
     

     


     



  • That might actually be somewhat in compliance with the specs, if I recall right... if you want to output XHTML, you'd need to use the HTML namespace and it probably wouldn't hurt to use an xsl:output saying the output is xml.



  • Nope, XHTML will not parse unless all the tags close. It has the HTML namespace, but the XML tag closing rules. If Microsoft is adding unclosed meta tags, Microsoft is wrong.



  • Of course Microsoft is wrong...

    Microsoft apparently has no clue what XHTML is. Check out their "XHTML-Strict" search page www.live.com. It's rare to see a page declared XHTML with this many validation errors.




  • @pcooper said:

    That might actually be somewhat in compliance with the specs, if I recall right... if you want to output XHTML, you'd need to use the HTML namespace and it probably wouldn't hurt to use an xsl:output saying the output is xml.
     

    Sorry, much as I'd love to blame MickeySoft, it's you.

    From the W3C (Gospel) at http://www.w3.org/TR/xslt#output:

    =================

    The default for the method attribute is chosen as follows. If

    • the root node of the result tree has an element child,

    • the expanded-name of the first element child of the root node (i.e. the document element) of the result tree has local part html (in any combination of upper and lower case) and a null namespace URI, and

    • any text nodes preceding the first element child of the root node of the result tree contain only whitespace characters,

    then the default output method is html; otherwise, the default output method is xml. The default output method should be used if there are no xsl:output elements or if none of the xsl:output elements specifies a value for the method attribute.

    =====================

    I.e., you didn't tell it what you were writing out, and the first thing you wrote out was an <html> tag. Therefore it (sensibly) defaults to HTML output. Not XHTML, as that breaks some stupid braindead browsers... which may well be Microsoft's fault...
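
The rule quoted from the spec can be written down as a small function. This is a sketch in Python, not MSXML's implementation. The top-level children of the result tree are modelled as a plain list in which a string stands for a text node and a `(namespace_uri, local_name)` tuple stands for an element. That representation and the function name are assumptions made for the example, and other node types such as comments are left out.

```python
from typing import List, Optional, Tuple, Union

# A top-level node of the result tree: a text node (str) or an element,
# given as (namespace URI or None, local name). Illustrative model only.
Node = Union[str, Tuple[Optional[str], str]]

XML_WHITESPACE = " \t\r\n"


def default_output_method(children: List[Node]) -> str:
    """Apply the XSLT 1.0 rule for choosing the default output method."""
    for node in children:
        if isinstance(node, str):
            # Text before the first element must be whitespace only.
            if node.strip(XML_WHITESPACE) != "":
                return "xml"
            continue
        namespace_uri, local_name = node
        # First element child found: it must be named "html" in any
        # letter case and must not be in a namespace.
        if local_name.lower() == "html" and not namespace_uri:
            return "html"
        return "xml"
    # The root node has no element child at all.
    return "xml"


print(default_output_method([(None, "html")]))
print(default_output_method([("http://www.w3.org/1999/xhtml", "html")]))
```

The stylesheet in the first post writes an un-namespaced `<html>` as its first element, so it falls into the first case. Putting the element in the XHTML namespace, as suggested earlier in the thread, falls into the second.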

     

     



  • >Sorry, much as I'd love to blame MickeySoft, it's you.

     Of course it was, and it was just a mouse click away to STFG how to fix this. The point was just that it was quite unintuitive; I may still be a newbie, but I would expect such a simple stylesheet to pass through as-is.

     



  • @sirhegel said:

    I realized that I could transform the internal XML document to some user-friendly format, such as XHTML, to be viewed directly in a web browser. Well, that's what XSLT is meant for, so this should be quite an easy task. I created a very simple stylesheet:

     

    Doh!  Much, much easier to use CSS to style any old XML for browserification:

     


    <?xml etc etc>
    <?xml-stylesheet type="text/css" href="/my.css" ?>
    <rootNode>  <child>Hey!</child></rootNode>

    /* my.css */
    rootNode { text-align: center; font-weight: bold; color: red; }
    child { margin-left: 3em; }

     

    and so on... great for RSS feeds etc.

     



  • Amen to this one!

    I've had this problem on a number of occasions.

    Sometimes it feels like Microsoft doesn't like you using XML. It's too non-proprietary. So they introduce naff little bugs like this (and others - never use their MSSOAP SDK 3.0 libraries) to make it as irritating to use as possible :)

    On top of that they have no plans to support XPath 2.0 (and thus XSLT 2.0 or XQuery) because it will clash with LINQ. Gyargh!

    I've found that the .NET port of Saxon-B is the best free alternative.
     



  • Did you not read where it was determined that this was perfectly standards compliant behaviour, or were you too blinded by the chance to bash Microsoft?



  • @sirhegel said:

    >Sorry, much as I'd love to blame MickeySoft, it's you.

     Of course it was, and it was just a mouse click away to STFG how to fix this. The point was just that it was quite unintuitive; I may still be a newbie, but I would expect such a simple stylesheet to pass through as-is.

     

     If you are a newbie, let me give you the best advice you will ever get: blame the tools last. Blame yourself first. How much time did you spend bashing "stupid micro$oft!!" when all you had to do was some research? And also realize that no tool is perfect, from Microsoft or Linux or Apple or elsewhere; it is a fact of life that programming is an art, not a science, and there are lots of ways to get things done and reasons that things work the way they do that you will never even begin to understand.

    It's amazing how many "newbies" out there have this attitude. Dare to be different from the pack of script kiddies out there -- don't look to assign blame to every popular scapegoat out there that's "cool" to bash.  You'll be a better programmer and less of a cliched, boring, cloned, "M$"-hater.  (or google-hater, or Apple, or whatever the trendy thing to hate is ...)

    And, by the way, even using the expression "Microsoft" with the $ shows a lot of childish, unoriginal, blind ignorance.  Again, dare to be "unique" and "different" and evaluate things based on their actual merits and not based on what you read on "slashdot" or "digg" so that you can comply with all of the uninformed script kiddies out there.

    Just browse this very forum and look at all of the "WTF's" that people are submitting (this one included).  About 1 in 10 are legitimate. Most are from people who have no idea what they are doing and are just looking to assign trendy blame to something to "join the fun" and to hopefully be considered a knowledgeable, experienced programmer.  Of course, the end result is they clearly portray themselves as the opposite.... It's kind of scary how often this happens ...  (example: http://forums.worsethanfailure.com/forums/122421/ShowThread.aspx )

    Anyway, take the advice or leave it, but consider it.  Good luck.  Don't be a sheep, be a leader and be brave enough to think independently on your own.
     



  • >If you are a newbie, blah blah..

    I am shocked, I thought that this site and the anecdotes written here are meant to be humorous and thus amuse the readers. I am sorry if I have misunderstood. I am very sorry if I spoiled your day by writing a Daily WTF post which did not fulfil your WTFiness criteria.

     >Don't be a sheep

    Don't be a donkey, go learn some sense of humour, jackass.


    The real WTF in this topic seems to be how seriously these WTF's can be taken.  

     -h-
     


     



  • @sirhegel said:

    >If you are a newbie, blah blah..

    I am shocked, I thought that this site and the anecdotes written here are meant to be humorous and thus amuse the readers. I am sorry if I have misunderstood. I am very sorry if I spoiled your day by writing a Daily WTF post which did not fulfil your WTFiness criteria.

     >Don't be a sheep

    Don't be a donkey, go learn some sense of humour, jackass.


    The real WTF in this topic seems to be how seriously these WTF's can be taken.  

     -h-  

     
    I didn't mean it as an insult, I meant it as advice. Take it or leave it. Sounds like you are leaving it. That's ok -- the world needs ignorant "M$-haters" just as much as it needs intelligent programmers who can think on their own and evaluate things objectively based on their own opinion.

    I have a great sense of humor, but there are criteria: the joke must be at least somewhat original and also at least a little funny. If you were making a joke somewhere in this thread, did you feel that it was either of the two?  Do we really need more "M$ suckz, dude!" jokes?



  • FWIW:

    I always find it useful when developing stylesheets for XML documents to save off a sample .xml document, add the

    <?xml-stylesheet type="text/xsl" href="mystyle.xslt" ?>

    line to the .xml file, and open the .xml file in Internet Explorer.  Assuming all your markup is valid, it works as a nice stylesheet debugger, without the hassle of digging error messages out of JS/VBS code.



  • You may be surprised, but the output is 100% W3C compliant HTML 4.01, which is the default output method of a stylesheet.  A document is required to provide a character encoding to the user agent either by means of an HTTP "Content-Type" header or a <META> tag in the head of the document:

    Specifying the character encoding

    By the HTML 4.01 DTD, the META tag is FORBIDDEN to have an ending tag:

    HTML 4.01 DTD : META

    Thus, the MSXML parser has guaranteed you a fully compliant HTML 4.01 document, regardless of your ability to send a content type in an HTTP header.  Your intention is perhaps to choose the xsl:output method of "xml" as described by PC Paul in an earlier post.  Beware that without the proper content type of "application/xhtml+xml" your browser may render it funky.  Consider the following:

    <html>
    <body><textarea/><div>This is &quot;outside the text area&quot;</div></body>
    </html>

    Fully compliant XML.  Fully broken HTML 4.01.  IE, Firefox, and Safari render it in a way that you may not expect.  Content type is everything.

    I hope this can give you some appreciation of the difficulty in standards compliance, and I also hope that you have a better understanding of the perceived problem and can find the solution you're wanting.
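
The html and xml output methods differ in how they serialize empty elements such as META. Python's `xml.etree.ElementTree` happens to offer the same two serialization modes, so it can serve as a stand-in to show the effect. This only illustrates the behaviour and is not MSXML itself.

```python
import xml.etree.ElementTree as ET

# Build the element that the html output method adds to <head>.
meta = ET.Element("meta")
meta.set("http-equiv", "Content-Type")
meta.set("content", "text/html; charset=UTF-16")

# HTML serialization: meta is an empty element and gets no end tag.
as_html = ET.tostring(meta, encoding="unicode", method="html")

# XML serialization: the same element is written as a self-closed tag.
as_xml = ET.tostring(meta, encoding="unicode", method="xml")

print(as_html)
print(as_xml)
```

The first string is correct HTML 4.01 but not well-formed XML. The second is well-formed XML, which is what a downstream XML parser needs.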



  • @Jeff S said:

    Do we really need more "M$ suckz, dude!" jokes?

    I think that you do not understand the purpose of this site. With proper spelling and grammar, yes, we do. It's why we're here. 



  • @asuffield said:

    @Jeff S said:

    Do we really need more "M$ suckz, dude!" jokes?

    I think that you do not understand the purpose of this site. With proper spelling and grammar, yes, we do. It's why we're here. 

     
    <emotion type="sarcasm" level="novice" intention="humorous">Maybe we should encourage each other to use specific tags to indicate the intended emotional content of the written text, just to help people with no sense of humour.</emotion>
     



  • @asuffield said:

    @Jeff S said:

    Do we really need more "M$ suckz, dude!" jokes?

    I think that you do not understand the purpose of this site. With proper spelling and grammar, yes, we do. It's why we're here. 

    uh, no -- I've been at this site a long time and believe me, I understand it.  If you are here to make unoriginal, uninformed and ignorant M$ jokes, then you are at the wrong place.  Then again, maybe not; originally this site was a bunch of decent programmers making fun of bad code, but it definitely has morphed into a typical digg/slashdot-style "anti-M$" website, I suppose.   Now people just write random "The real WTF is using VB!" comments over and over -- most of them not having used VB since about 1998, if at all, and having no idea what they are talking about ... it's too bad.  There's a whole new generation of programmers out there looking to blame everything that ever goes wrong in all the code they write on someone else.

     



  • @Jeff S said:

    uh, no -- I've been at this site a long time and believe me, I understand it.  If you are here to make unoriginal, uninformed and ignorant M$ jokes, then you are at the wrong place.  Then again, maybe not; originally this site was a bunch of decent programmers making fun of bad code, but it definitely has morphed into a typical digg/slashdot-style "anti-M$" website, I suppose.   Now people just write random "The real WTF is using VB!" comments over and over -- most of them not having used VB since about 1998, if at all, and having no idea what they are talking about ... it's too bad.  There's a whole new generation of programmers out there looking to blame everything that ever goes wrong in all the code they write on someone else.



  • @asuffield said:

    @Jeff S said:

    uh, no -- I've been at this site a long time and believe me, I understand it.  If you are here to make unoriginal, uninformed and ignorant M$ jokes, then you are at the wrong place.  Then again, maybe not; originally this site was a bunch of decent programmers making fun of bad code, but it definitely has morphed into a typical digg/slashdot-style "anti-M$" website, I suppose.   Now people just write random "The real WTF is using VB!" comments over and over -- most of them not having used VB since about 1998, if at all, and having no idea what they are talking about ... it's too bad.  There's a whole new generation of programmers out there looking to blame everything that ever goes wrong in all the code they write on someone else.

    You have a talented knack for reading the exact opposite of what people are telling you .... Did what I wrote suggest to you that it was "much harder back in the good old days"?   My goodness, you are not so good at reading words on a page and then drawing a logical conclusion from those words, as demonstrated at least 4 or 5 times today.  Read sloooowly and more carefully and you might start getting it.  Best of luck.  I should note, however, that it seems that maybe English is not your native language; that would explain a lot, and if that is the case, then it is rude of me to criticize your reading comprehension and I apologize.

