XML Streaming Deserializer in C#



  • I have an XML file. It could be huge. Like gigabytes huge.

    I have a set of C# classes that replicate the XSD of the XML file.

    I can do this:

    // Deserializes the whole document in one call, so the entire object graph ends up in memory.
    using (var blahXMLStream = new FileStream(blahXMLPath, FileMode.Open))
    {
        var serializer = new XmlSerializer(typeof(BlahSet));
        var blahSet = (BlahSet)serializer.Deserialize(blahXMLStream);
    }
    

    Awesometastic. Except, that loads the entire XML she-bang into memory all at once.

    So I think, "hey LINQ-to-XML, LINQ's all about streaming, it must do that!" and it does, but, AFAICT, there's no way to make LINQ-to-XML type-aware. (Meaning: there's no way to get an XElement and say "hey this XElement is really this class, please convert for me thanx".)

    So I think, "hey XmlReader! It lets you stream the Xml data as it comes in!" but XmlReader is also not type-aware, AFAICT.

    So. I have an XML file. I have C# classes that were used to generate that XML file. I want to deserialize into the classes. I don't want to have to write my own deserializer from scratch. Do I have any options?



  • @blakeyrat You need a SAX parser. Here's a random one I found on Google: http://saxdotnet.sourceforge.net/



    1. No I need an XML parser, I don't even know what "SAX" is.

    2. That website looks sketchy as hell. Assuming a "SAX parser" is indeed what I need, do you have one that looks like it's been maintained for the last 9 years?



  • How can you parse the XML without having the full thing? Won't you run into issues like trying to parse "what I have" without a closing brace somewhere?


  • SockDev

    @blakeyrat said in XML Streaming Deserializer in C#:

    I don't even know what "SAX" is.

    Who's the dumb fucker now?



  • @blakeyrat said in XML Streaming Deserializer in C#:

    I don't even know what "SAX" is.

    SAX = Simple API for XML. The difference between SAX and the other methods you are accustomed to is that SAX parsers consume the XML file as a stream and fire your events as they process each component of the document.

    The only well-supported SAX parser I've ever seen is Apache Xerces. I can't find anything for .Net that looks solid. SAX is pretty simple - maybe one of the toy projects is good enough for what you are doing.



  • @RaceProUK Maybe Blakey just isn't interested in SAX.



  • @Lorne-Kates said in XML Streaming Deserializer in C#:

    How can you parse the XML without having the full thing?

    The outermost element is simply a container of thousands of identical elements.

    @Lorne-Kates said in XML Streaming Deserializer in C#:

    Won't you run into issues like trying to parse "what I have" without a closing brace somewhere?

    Right; and XML Reader has handy functions that'll say "start at the opening tag and read to the closing tag and give me the whole she-bang at once", which is great. The problem I have is once I have the entire tag, I need to convert it to a C# object and AFAICT there's no way to do that automatically.

    @Jaime said in XML Streaming Deserializer in C#:

    The difference between SAX and the other methods you are accustomed to is that SAX parsers consume the XML file as a stream and fire your events as they process each component of the document.

    How is that different from what C#'s XmlReader already does?



  • @blakeyrat said in XML Streaming Deserializer in C#:

    How is that different from what C#'s XmlReader already does?

    Just read your post more thoroughly... A SAX parser would put you in the same position as XmlReader and you would have to implement your own deserializer on top of it.
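
    For comparison, here is a minimal sketch of the pull model that XmlReader exposes: instead of a SAX parser pushing events at your handlers, your loop asks for the next node. The BlahSet/BlahItem element names are borrowed from the thread, and the PullDemo class and CountElements helper are made up for illustration. Nothing here is type-aware, which is exactly the gap being discussed.

    ```csharp
    using System;
    using System.IO;
    using System.Xml;

    public static class PullDemo
    {
        // Walks the document node by node and counts matching elements.
        // No DOM is built, so memory use stays flat regardless of file size.
        public static int CountElements(TextReader input, string localName)
        {
            int count = 0;
            using (var reader = XmlReader.Create(input))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                    {
                        count++;
                    }
                }
            }
            return count;
        }

        public static void Main()
        {
            const string xml = "<BlahSet><BlahItem/><BlahItem/><Other/></BlahSet>";
            Console.WriteLine(CountElements(new StringReader(xml), "BlahItem")); // prints 2
        }
    }
    ```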


  • I survived the hour long Uno hand

    @blakeyrat said in XML Streaming Deserializer in C#:

    How is that different from what C#'s XmlReader already does?

    Java has an XMLReader interface that must be implemented by any SAX2 parser. I suspect that C#'s XmlReader is actually a SAX parser itself, it just doesn't advertise that fact because nobody really cares.



  • @blakeyrat said in XML Streaming Deserializer in C#:

    The outermost element is simply a container of thousands of identical elements.

    Gut instinct is "you're going to have to do some custom parsing" on this.

    @blakeyrat said in XML Streaming Deserializer in C#:

    Right; and XML Reader has handy functions that'll say "start at the opening tag and read to the closing tag and give me the whole she-bang at once", which is great. The problem I have is once I have the entire tag, I need to convert it to a C# object and AFAICT there's no way to do that automatically.

    There's got to be some way. If you write a webservice, you can make the signature look something like this: Public Sub FuckYouGiveMeObject(ByVal o As BlakeyObject). You then call it with XML (or JSON I suppose ick), and .Net converts it to the object. So that exists somewhere.

    If the outermost element is just a container, and you have thousands of BlakeyObjects inside-- the only way I can think of doing this is inheriting the streamreader. Have it look for the open and close of a <blakeyobject> node. Take that text and run it through another XML parser (to cast into a C# BlakeyObject).
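
    The "find one node, hand it to another parser" idea can probably be done without subclassing anything, since XmlReader.ReadSubtree() returns a reader scoped to the current element and XmlSerializer.Deserialize accepts an XmlReader. A rough sketch, assuming a simplified BlahItem class with a single Name property (a stand-in, not the real generated class) and XML without a namespace:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    // Hypothetical stand-in for one of the XSD-generated classes.
    public class BlahItem
    {
        public string Name { get; set; }
    }

    public static class SubtreeDemo
    {
        // Lazily yields one BlahItem at a time; only the current element is materialized.
        public static IEnumerable<BlahItem> ReadItems(TextReader input)
        {
            var serializer = new XmlSerializer(typeof(BlahItem));
            using (var reader = XmlReader.Create(input))
            {
                while (reader.ReadToFollowing("BlahItem"))
                {
                    // The subtree reader sees only this element. Disposing it leaves
                    // the outer reader at the end of the element, ready for the next one.
                    using (var subtree = reader.ReadSubtree())
                    {
                        yield return (BlahItem)serializer.Deserialize(subtree);
                    }
                }
            }
        }

        public static void Main()
        {
            const string xml =
                "<BlahSet><BlahItem><Name>a</Name></BlahItem><BlahItem><Name>b</Name></BlahItem></BlahSet>";
            foreach (var item in ReadItems(new StringReader(xml)))
            {
                Console.WriteLine(item.Name);
            }
        }
    }
    ```

    Going through ReadSubtree rather than deserializing straight off the outer reader keeps the reader position predictable between items, so adjacent elements with no whitespace between them are not skipped.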



  • @Jaime Right; so despite the snark from RaceProlapse and powerlord that helps me not at all.

    I mean, thanks for introducing the term I guess, but it doesn't address the question I'm asking.


    Take a look at this page: https://msdn.microsoft.com/en-us/library/mt693195.aspx

    Using that as an example, I believe I can get an XElement, and then buried in this StackOverflow answer is an example of using an XmlSerializer on a single XElement instead of an entire document, so I think I'm set. I hope so.

    EDIT: except one is an XElement and the other an XNode, so never the twain shall meet. It looks like both implement a CreateReader(), so let's give this a go...
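
    A sketch of how the CreateReader() route might look, assuming the usual XNode.ReadFrom streaming pattern and a simplified BlahItem class (a hypothetical stand-in for the generated one). The sample XML has no namespace, so it sidesteps the namespace error that comes up later in the thread.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;
    using System.Xml.Linq;
    using System.Xml.Serialization;

    // Hypothetical stand-in for one of the XSD-generated classes.
    public class BlahItem
    {
        public string Name { get; set; }
    }

    public static class XElementStreamDemo
    {
        // Streams matching elements one at a time as XElement instances.
        public static IEnumerable<XElement> StreamElements(TextReader input, string localName)
        {
            using (var reader = XmlReader.Create(input))
            {
                reader.MoveToContent();
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                    {
                        // ReadFrom consumes the whole element and advances past it,
                        // so no extra Read() is needed on this branch.
                        yield return (XElement)XNode.ReadFrom(reader);
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
        }

        // Deserializes each streamed element through XElement.CreateReader().
        public static List<string> ReadNames(TextReader input)
        {
            var serializer = new XmlSerializer(typeof(BlahItem));
            var names = new List<string>();
            foreach (var element in StreamElements(input, "BlahItem"))
            {
                using (var elementReader = element.CreateReader())
                {
                    names.Add(((BlahItem)serializer.Deserialize(elementReader)).Name);
                }
            }
            return names;
        }

        public static void Main()
        {
            const string xml =
                "<BlahSet><BlahItem><Name>a</Name></BlahItem><BlahItem><Name>b</Name></BlahItem></BlahSet>";
            Console.WriteLine(string.Join(",", ReadNames(new StringReader(xml))));
        }
    }
    ```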



  • @Yamikuronue said in XML Streaming Deserializer in C#:

    I suspect that C#'s XmlReader is actually a SAX parser itself, it just doesn't advertise that fact because nobody really cares.

    Probably. It's just frustrating when people "answer" a question by just dropping a few keywords without bothering to explain how those keywords are intended to answer the question. Maybe my response was a little rude, but.

    (Then when I spent 20 minutes doing my own research, I found out that the SAX API is AFAICT not even slightly different from the XmlReader API, and I already mentioned I was trying XmlReader in the original question!)



  • @blakeyrat said in XML Streaming Deserializer in C#:

    The outermost element is simply a container of thousands of identical elements.

    Well, if they're identical, you only need to parse the first one...

    But seriously, I've used SAX a lot; I'm surprised you don't know of it.

    Can you use SAX to grab each entry, and then use XMLSerializer on just that one entry by streaming the "substring" you captured?

    EDIT: Wow, that took a long time to post.... didn't see your previous post.



  • @blakeyrat You are indeed set. In the case where the outer element is just a list, you can stream in each child using LINQ to XML and then deserialize it using your old XmlSerializer method. You can get the XML as a string by simply calling ToString on the XElement.



  • @xaade said in XML Streaming Deserializer in C#:

    But seriously, I've used SAX a lot; I'm surprised you don't know of it.

    I don't work in XML very much. As should be obvious from this question.

    I'd still much rather have a type-aware XmlReader, and I'm more than a little surprised that no such thing exists.



  • @DogsB said in XML Streaming Deserializer in C#:

    Then write one. Patent it. Realize that there is no money in software patents and open source it. Then get pissed off at the community and piss all over it. Come back here and tell us how you trolled an entire industry. We would build statues in your likeness.

    Fuck off. This is Coding Help, not "Be An Irredeemable Dick To Blakeyrat".



  • @blakeyrat New problem:

    <BlahItem xmlns='http://schemas.blah.com/BlahSet/v2.0'> was not expected.
    

    It looks like the XmlReader "decorates" all the tags it spits out with the XML namespace, which should be ok. But somehow that's confusing the deserializer when I tell it to deserialize a BlahItem...

    I saw some advice to add a root attribute, like so:

                XmlRootAttribute root = new XmlRootAttribute();
                root.ElementName = "BlahSet";
    

    But that did not help. Not sure how to solve this exception...


  • Winner of the 2016 Presidential Election

    @blakeyrat said in XML Streaming Deserializer in C#:

    I'd still much rather have a type-aware XmlReader, and I'm more than a little surprised that no such thing exists.

    There's a parser generator for C++, with Visual Studio integration, which I've used before: http://www.codesynthesis.com/products/xsd/c++/parser/

    It generates a (SAX) parser skeleton from a XML Schema and you just have to fill in the methods which instantiate your objects.

    Unfortunately, I don't know of anything similar for other languages.



  • @blakeyrat XmlSerializer seems to think I want to Deserialize the entire XML document and not just the single node, I believe that's the problem-- it's looking at the XSD and saying "ok your root is BlahSet, but this XML's root tag is BlahItem ERROR ERROR!"

    At least that's my guess. Still no clue how to communicate to it that I only want to deserialize the one element...



  • @blakeyrat Can't you just do a little bit of string mangling to get you the rest of the way there?



  • @cartman82 said in XML Streaming Deserializer in C#:

    Can't you just do a little bit of string mangling to get you the rest of the way there?

    I have no idea what that error is complaining about, so no. I'm not sure what input it would find acceptable. I've just been flailing around so far.

    All the Googling is people saying adding a XmlRootAttribute fixed it, but I've tried every single variation of that and I can't get the error to change.



  • @blakeyrat Create a root tag with a namespace prefix, so the xmlns= on the child doesn't collide with the existing namespace of the root tag?



  • @PleegWat said in XML Streaming Deserializer in C#:

    Create a root tag with a namespace prefix, so the xmlns= on the child doesn't collide with the existing namespace of the root tag?

    Create a root tag how? How do I find out if the child xmlns is colliding with the namespace of the root tag?

    Fucking I hate people who build compilers and programming tools. "Unexpected input". Great. BUT TELL ME WHAT IS EXPECTED! It's useless to tell me you didn't get what you wanted without also telling me WHAT YOU WANTED!!!!!



  • @blakeyrat On second inspection I may have misread what you were trying to do - I thought you were editing the exported XML fragment string - re-wrapping it with the expected root tag. I guess that's not what you're doing.



  • @PleegWat I'm certainly not trying to do that.

    What I'm actually doing is quickly becoming "throw this laptop across the room" angry at the lack of any kind of useful error messages, debugging methods, or even Google results about this issue.



  • @BaconBits Fine; thank you for the long tirade on why I'm a stupid wrong idiot. Complete with an "erm," at the beginning. Awesome.

    If you have any help, please provide it. If you just want to call me stupid, feel free to move to literally any other category on this forum and knock yourself out.


  • I survived the hour long Uno hand

    @BaconBits said in XML Streaming Deserializer in C#:

    new XmlSerializer(typeof(BlahSet));

    Wait, maybe stupid question, but have you tried using new XmlSerializer(typeof(BlahItem)) instead?



  • Ok, I'm up to 2 co-workers stumped, so at least I feel a bit better about it.

    There's absolutely no reason any of us can see that this serializer shouldn't be able to deserialize this XML fragment. The XML fragment is confirmed correct and, other than the added namespace attribute, exactly matches the contents of the physical file. Moreover, the solution that helped AFAICT every single other person who's run into this exact problem does not help for this code.

    I should have been a plumber.



  • @Yamikuronue said in XML Streaming Deserializer in C#:

    Wait, maybe stupid question, but have you tried using new XmlSerializer(typeof(BlahItem)) instead?

    I've tried a million different things. Yes I've tried a serializer with the set class and the element-in-the-set class.



  • Maybe you can work out a hybrid solution using XmlTextReader? I think it does what you want as far as not loading everything in one shot. I don't know about deserialization with it.

    https://support.microsoft.com/en-us/kb/301228



  • @fwd My problem isn't the streaming part, that works fine. My problem is the deserialization part.


  • Discourse touched me in a no-no place

    @blakeyrat said in XML Streaming Deserializer in C#:

    All the Googling

    Does this do what you want? If I understand it, the code there will remove the namespace, which maybe will make your error go away.


  • Winner of the 2016 Presidential Election

    @blakeyrat Would you be OK with a solution that involves a code generator, like the one I mentioned above? Just tried to find something similar for C# and found this post on SO, it might help:

    If you want to create custom objects from XML, a parser generator is probably what you want.

    Edit: Also found Linq2XSD, which sounds promising, but appears to be dead.



  • Co-worker 3 found the answer, although mostly by accident.

    I found a hundred pages on Google that said you needed to add a XmlRootAttribute to the XmlSerializer to fix it; we finally discovered through trial and error that if your XML has a namespace, the XmlRootAttribute must have that namespace specified. Nobody mentioned this. Not in ANY webpage EVER.

    This 75 lines of code was like 6 hours of my day, thank God everybody uses JSON now.
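
    For anyone landing here later, the fix described above might look roughly like this. It is a sketch, not the actual 75 lines: the simplified BlahItem class and the "BlahItem" root element name are assumptions, and the namespace URI is the one from the error message earlier in the thread.

    ```csharp
    using System;
    using System.Xml.Linq;
    using System.Xml.Serialization;

    // Hypothetical stand-in for one of the XSD-generated classes.
    public class BlahItem
    {
        public string Name { get; set; }
    }

    public static class NamespaceRootDemo
    {
        // Namespace taken from the "was not expected" error message.
        private const string Ns = "http://schemas.blah.com/BlahSet/v2.0";

        // Built once and reused: serializers created with this constructor
        // overload are not cached by the framework.
        private static readonly XmlSerializer Serializer = new XmlSerializer(
            typeof(BlahItem),
            new XmlRootAttribute("BlahItem") { Namespace = Ns });

        // Deserializes a single namespaced element on its own, without its parent.
        public static BlahItem DeserializeItem(XElement element)
        {
            using (var reader = element.CreateReader())
            {
                return (BlahItem)Serializer.Deserialize(reader);
            }
        }

        public static void Main()
        {
            var element = XElement.Parse(
                "<BlahItem xmlns='" + Ns + "'><Name>first</Name></BlahItem>");
            Console.WriteLine(DeserializeItem(element).Name);
        }
    }
    ```

    Setting only ElementName leaves the serializer expecting an element in the empty namespace, so the namespaced element still counts as "not expected". Holding the serializer in a static field matters when this runs once per element over a huge file, because each new instance built with an XmlRootAttribute generates a fresh dynamic assembly.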



  • @blakeyrat said in XML Streaming Deserializer in C#:

    Great. BUT TELL ME WHAT IS EXPECTED! It's useless to tell me you didn't get what you wanted without also telling me WHAT YOU WANTED!!!!!

    This is my everyday experience.

    Our internal diagnostics suck...



  • @blakeyrat said in XML Streaming Deserializer in C#:

    If you have any help, please provide it.

    Time to fire up my Visual Studio. I'll get back to you if I find anything.



  • @xaade It's solved.

