XPath query on sun-jaxws.xml file



  •  Right... so I wrote this little tool using Java 6 that scans a sun-jaxws.xml file inside a .war file, and starts all web services it finds as endpoints. It's like a mini-Tomcat, but just for web services. It used JAXB to scan the file, it worked, and everything was fine.

    Recently, I converted it to Java 7 and Maven, and here's the problem: JAXB, Maven and NetBeans 7.x simply do not work together. NetBeans generates build files that are manifestly wrong. So I decided to use XPath to scan the file.

    But when I have this file

    <?xml version="1.0" encoding="UTF-8"?>
    <endpoints version="2.0" xmlns="http://java.sun.com/xml/ns/jax-ws/ri/runtime">
      <endpoint implementation="com.initech.wtf.WTFWS" name="WTFWS" url-pattern="/WTFWS"/>
    </endpoints>

    and I try this XPath query

    //endpoint

    it simply doesn't work. I ended up using '/*/*' as the query to read all <endpoint> tags, but I was hoping for something a little more... elegant.



  • Not a namespace issue, by any chance? That's caught me out before (change of namespace as part of versioning)

    But yeah, that ought to work. Is there any other basic XML you can test your XPath processor against?



  • This looks very much like a namespace issue. What you want to do is not to query for //endpoint but //x:endpoint, where x is bound to http://java.sun.com/xml/ns/jax-ws/ri/runtime. Unfortunately doing this binding with the Java XPath library is, as some in this forum would put it, a goddamned fucking pain in the ass, which involves way more code than is healthy or in any way reasonable, even for Java. If you feel masochistic, here is a good tutorial.

    If there are no other namespaces in your document, it's usually easier to turn off namespace processing completely. You can do this by setting the feature http://xml.org/sax/features/namespaces of your parser to false. At least this worked for Xerces; I haven't tried it yet with JAXB...
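
    A minimal sketch of the "turn off namespace processing" option, assuming the file is parsed into a DOM with the JDK's built-in JAXP classes rather than a raw SAX parser. With DOM, the equivalent switch is DocumentBuilderFactory.setNamespaceAware(false), which is also the factory default. The class name NamespaceUnawareQuery and the inlined XML string are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original tool.
class NamespaceUnawareQuery {
    // The sun-jaxws.xml content from the first post, inlined for the demo.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<endpoints version=\"2.0\" xmlns=\"http://java.sun.com/xml/ns/jax-ws/ri/runtime\">\n"
        + "  <endpoint implementation=\"com.initech.wtf.WTFWS\" name=\"WTFWS\" url-pattern=\"/WTFWS\"/>\n"
        + "</endpoints>";

    // Counts the nodes matched by //endpoint with namespace processing on or off.
    static int countEndpoints(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//endpoint", doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Wrapped so the demo method needs no throws clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware:   " + countEndpoints(true));
        System.out.println("namespace-unaware: " + countEndpoints(false));
    }
}
```

    The trade-off is that a namespace-unaware parse cannot tell apart elements with the same name from different namespaces, so this only suits documents like this one that use a single namespace.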



  • Ugly hack that works:

    //*[local-name()='endpoint']
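
    For comparison, a small sketch of how this hack behaves against the file from the first post when the parser is namespace-aware. The class name LocalNameQuery is hypothetical. The third query adds a namespace-uri() check, which is a stricter variant not mentioned above: it still needs no prefix binding, but it will not match an endpoint element from some other namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original tool.
class LocalNameQuery {
    static final String NS = "http://java.sun.com/xml/ns/jax-ws/ri/runtime";

    // The sun-jaxws.xml content from the first post, inlined for the demo.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<endpoints version=\"2.0\" xmlns=\"" + NS + "\">\n"
        + "  <endpoint implementation=\"com.initech.wtf.WTFWS\" name=\"WTFWS\" url-pattern=\"/WTFWS\"/>\n"
        + "</endpoints>";

    // Counts the nodes an XPath expression selects in a namespace-aware DOM.
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Wrapped so the demo method needs no throws clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String hack = "//*[local-name()='endpoint']";
        String strict = "//*[local-name()='endpoint' and namespace-uri()='" + NS + "']";
        System.out.println("//endpoint -> " + count("//endpoint"));
        System.out.println(hack + " -> " + count(hack));
        System.out.println(strict + " -> " + count(strict));
    }
}
```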


  • @PSWorx said:

    What you want to do is not to query for //endpoint but //x:endpoint, where x is bound to http://java.sun.com/xml/ns/jax-ws/ri/runtime.
     

    If that is the case, then TRWTF is Java (or something) "pulling an IE"[1], confusing results with success.

    @PSWorx said:

    If you feel masochistic, here is a good tutorial that's badly presented

    WTF is up with that embedded markup and pink highlight? Or is FF refusing to display it in all its glory?

    [1] flawed processing to produce correct results. When the processing is later fixed, the real (incorrect) results are accurately displayed.



  • @Cassidy said:

    If that is the case, then TRWTF is Java (or something) "pulling an IE"[1], confusing results with success.


    From what I know, it's not technically incorrect. They conform to the XPath spec; it's just that the designer of the Java XPath API apparently wanted to make dealing with namespaces as painful as possible.

    [Dork mode on]

    According to spec, if you do a query //foo that means "Get all descendants that have local name foo and that belong to the current default namespace". Unfortunately, the spec doesn't say a word about how you declare what "the current default namespace" is - that's left to implementors. Now, Java - as we'd expect from them - made the most practical and reasonable choice and hard-wired it to the empty namespace without any way for you to change it. That works well if everything you deal with is in the empty namespace (as it is for documents with no namespace declarations at all). But as soon as you want to access something that isn't - like you now - you need to deal with prefixes, namespace contexts, etc...
    [Dork mode off]

    @Cassidy said:

    @PSWorx said:

    If you feel masochistic, here is a good tutorial that's badly presented

    WTF is up with that embedded markup and pink highlight? Or is FF refusing to display it in all its glory?

    [1] flawed processing to produce correct results. When the processing is later fixed, the real (incorrect) results are accurately displayed.

    I can't see any pink highlights in chrome, but agreed, on second thought that is not a good tutorial. Have a more readable impromptu-tutorial here:

    1. Choose a namespace prefix. I used x but you might as well use jax, foo or purpledildo.
    2. Write a custom class that implements NamespaceContext, which is just a bidirectional map between prefixes and namespace URIs. The only really important method is getNamespaceURI(). For a single prefix, you can "simplify" the implementation by having that method just return your URI, and the other methods return your prefix, no matter the input. Yes, you have to do this yourself. No, there is no AbstractNamespaceContext or somesuch.
    3. Before doing any queries, call setNamespaceContext() on your XPath object to hook up your class.
    4. Change your queries so that they use the prefix. Yes, even if your class doesn't care about your prefix, the library does, for the reasons stated above.
    5. profit, etc etc
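
    The steps above can be sketched roughly as follows. This is an illustration, not the tool's real code: the names XPathWithPrefix and SinglePrefixContext are made up, and the context here checks the prefix instead of answering the same URI for every input, which is slightly stricter than the shortcut in step 2. The prefix is needed because javax.xml.xpath evaluates XPath 1.0, where an unprefixed name in a path step only matches elements in no namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original tool.
class XPathWithPrefix {
    static final String NS = "http://java.sun.com/xml/ns/jax-ws/ri/runtime";

    // The sun-jaxws.xml content from the first post, inlined for the demo.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<endpoints version=\"2.0\" xmlns=\"" + NS + "\">\n"
        + "  <endpoint implementation=\"com.initech.wtf.WTFWS\" name=\"WTFWS\" url-pattern=\"/WTFWS\"/>\n"
        + "</endpoints>";

    // Step 2: a NamespaceContext that knows exactly one prefix/URI pair.
    static final class SinglePrefixContext implements NamespaceContext {
        private final String prefix;
        private final String uri;

        SinglePrefixContext(String prefix, String uri) {
            this.prefix = prefix;
            this.uri = uri;
        }

        @Override
        public String getNamespaceURI(String p) {
            return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String u) {
            return uri.equals(u) ? prefix : null;
        }

        @Override
        public Iterator<String> getPrefixes(String u) {
            return uri.equals(u)
                    ? Collections.singletonList(prefix).iterator()
                    : Collections.<String>emptyList().iterator();
        }
    }

    // Returns the implementation attribute of the first <endpoint> element.
    static String firstImplementation() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Step 3: hook the context up before running any query.
            xpath.setNamespaceContext(new SinglePrefixContext("x", NS));
            // Step 4: use the chosen prefix (step 1) in the query.
            return xpath.evaluate("//x:endpoint/@implementation", doc);
        } catch (Exception e) {
            // Wrapped so the demo method needs no throws clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstImplementation()); // prints com.initech.wtf.WTFWS
    }
}
```

    The prefix x in the query does not have to match anything in the document, which declares the namespace as its default; only the URI behind the prefix is compared.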



  • @PSWorx said:

    [Dork mode on]
     

    That's... pretty interesting. I just presumed it's always worked because I've moved my queries into the desired namespace to ensure they're contextualised - I didn't think of the empty/default situation.

    @PSWorx said:

    I can't see any pink highlights in chrome, but agreed, on second thought that is not a good tutorial.

    The content is probably good, but the presentation foncuses the gessamge. I dislike sites (and printed material) where it's not obvious what is entered code, what is returned results, what is textual explanation and what's a link. Nielsen would rip a new one.

    @PSWorx said:

    Have a more readable impromptu-tutorial here:

    That's a major improvement and I thank thee kindly. I've had a re-read of that link and I can see what he's getting at... but I think his escaping (or HTML translation) is a tad busted.



  • @Cassidy said:

    @PSWorx said:

    What you want to do is not to query for //endpoint but //x:endpoint, where x is bound to http://java.sun.com/xml/ns/jax-ws/ri/runtime.
     

    If that is the case, then TRWTF is Java (or something) "pulling an IE"[1], confusing results with success.

    It's just as much of a pain in the ass in .Net and MSSQL.



  • @Jaime said:

    Ugly hack that works:

    //*[local-name()='endpoint']

     

     

    +20 points to Jaime!

     



  •  Thanks for all the help. The concrete problem that I had is that I'd called setNamespaceAware( true ) on the DocumentBuilderFactory instance. I added a bit of code so that you can add prefix/URI tuples to the constructor, and it will create a neat little NamespaceContext instance for you. More importantly, the code actually works.
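
    The poster's helper isn't shown, so the following is only a guess at its shape: a NamespaceContext whose constructor takes alternating prefix/URI strings. The class name SimpleNamespaceContext and the varargs constructor are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical reconstruction; the original helper was not posted.
class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> uriByPrefix = new LinkedHashMap<String, String>();

    // Takes alternating prefix/URI strings, e.g. ("ws", "http://...").
    SimpleNamespaceContext(String... prefixUriPairs) {
        if (prefixUriPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected prefix/URI pairs");
        }
        for (int i = 0; i < prefixUriPairs.length; i += 2) {
            uriByPrefix.put(prefixUriPairs[i], prefixUriPairs[i + 1]);
        }
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = uriByPrefix.get(prefix);
        // Unbound prefixes map to the empty "no namespace" URI.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> prefixes = new ArrayList<String>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                prefixes.add(entry.getKey());
            }
        }
        return prefixes.iterator();
    }

    public static void main(String[] args) {
        String ns = "http://java.sun.com/xml/ns/jax-ws/ri/runtime";
        SimpleNamespaceContext ctx = new SimpleNamespaceContext("ws", ns);
        System.out.println(ctx.getNamespaceURI("ws"));
        System.out.println(ctx.getPrefix(ns));
    }
}
```

    An instance built this way would be passed to XPath.setNamespaceContext() before evaluating a query such as //ws:endpoint. For brevity the sketch leaves out the reserved xml and xmlns prefixes, which the NamespaceContext Javadoc also asks implementations to handle.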


  • @PSWorx said:

    the designer of the Java XPath API apparently wanted to make dealing with namespaces as painful as possible

    Having encountered XPath systems in rather a lot of different languages, I've found namespaces are a PITA in every last one of them. I conclude that the problem is that the W3C screwed the pooch with an 8-foot dildo covered in rusty razors with this one, turning a potentially useful syntax into something that makes using the DOM directly look nice as soon as namespaces rear their heads.

