XML Format



  •  I'm defining an XML format, but I'm not seeing clear advantages to using attributes over child elements.

    Here's the thing:

    	<pagetree>
    	
    		<page title="" url="">
    			<page title="" url=""></page>
    			<page title="" url=""></page>
    			<page title="" url=""></page>
    			<page title="" url="">
    				<page title="" url=""></page>
    				<page title="" url=""></page>
    				<page title="" url=""></page>
    			</page>
    		</page>
    
    
    
    		<page>
    			<title>home</title>
    			<url>/zvzzv/</url>
    			<children>
    				<page>
    					<title>Sub 1</title>
    					<url>/zvzzv/jhgf/</url>
    					<children></children>
    				</page>
    				<page>
    					<title>Sub 2</title>
    					<url>/zvzzv/kjhg/</url>
    					<children></children>
    				</page>
    				<page>
    					<title>Sub 3</title>
    					<url>/zvzzv/fdsaw/</url>
    					<children></children>
    				</page>
    			</children>
    		</page>
    
    	</pagetree>
    

    What would you recommend?
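    Here is a minimal sketch of how little the choice matters for simple string fields. It is in Java, since Java already appears further down the thread. It uses the JDK's built-in DOM parser (`javax.xml.parsers`, `org.w3c.dom`) to read a page in either layout. The class `PageFields` and its methods are hypothetical, made up for illustration. It only looks at direct children, so a sub-page's `<title>` is never mistaken for its parent's.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: reads a page's title and url whether they were
// written as attributes or as child elements.
class PageFields {

    // Returns {title, url} for the root <page> element of the given XML.
    static String[] read(String xml) {
        try {
            Element page = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new String[] { field(page, "title"), field(page, "url") };
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Prefers the attribute; otherwise looks only at DIRECT child elements,
    // so the <title> of a nested sub-page is never picked up by mistake.
    private static String field(Element page, String name) {
        if (page.hasAttribute(name)) {
            return page.getAttribute(name);
        }
        for (Node n = page.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent();
            }
        }
        return null; // field absent in both forms
    }

    public static void main(String[] args) {
        String[] a = read("<page title=\"home\" url=\"/zvzzv/\"></page>");
        String[] b = read("<page><title>home</title><url>/zvzzv/</url><children></children></page>");
        System.out.println(a[0] + " " + a[1]); // home /zvzzv/
        System.out.println(b[0] + " " + b[1]); // home /zvzzv/
    }
}
```

    Both calls print the same pair. The layouts only start to differ once a value needs structure or repetition, which an attribute (a single string per name) cannot hold.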



  • The first problem I see is that this is not a very extensible format. Might I suggest this? http://thedailywtf.com/Articles/Extensible-XML.aspx

    For reals... you've identified one of the many reasons why XML is an absurdly stupid tool for representing data. The fact of the matter is, no one knows... not even the guys who invented it, who thought it'd be a good idea to make a language based off of HTML. So, just flip a coin. Or pick the one you think looks prettier.

    The good news is, it will still be human readable.



  • The attribute syntax is less bloated, but I seem to recall there being some "special" characters that are impossible to embed in attributes, but can be escaped or embedded in CDATA in elements... which characters those are, I don't recall, unfortunately...
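    For what it's worth, the XML 1.0 rules are roughly these. Any legal XML character can be stored in an attribute value, but `<` and `&` (and the quote character delimiting the value) must be written as references. CDATA sections are not allowed inside attribute values. A conforming parser also replaces each literal tab, carriage return, or line feed in an attribute value with a space (attribute-value normalization). So the awkward characters are mainly whitespace. A small Java sketch against the JDK's built-in DOM parser shows the difference. The class name `AttributeVsText` is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Parses the same text once as an attribute value and once as element content.
class AttributeVsText {

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Literal newline + tab in both places (the Java "\n\t" becomes real characters).
        Element p = parse("<p note=\"line1\n\tline2\">line1\n\tline2</p>");
        // Attribute-value normalization turns each literal whitespace char into a space.
        System.out.println("[" + p.getAttribute("note") + "]"); // [line1  line2]
        // Element content keeps the newline and tab.
        System.out.println("[" + p.getTextContent() + "]");

        // Markup characters only need escaping, and a character reference such as
        // &#10; survives normalization, so a real newline can still be stored.
        Element q = parse("<p note=\"a&#10;b &lt;c&gt; &amp; &quot;d&quot;\"/>");
        System.out.println("[" + q.getAttribute("note") + "]");
    }
}
```

    The first line prints the attribute with two spaces where the newline and tab were, while the element text keeps both characters. If a newline really has to live in an attribute, write it as `&#10;`.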



  • Per IBM: http://www.ibm.com/developerworks/xml/library/x-eleatt/index.html

    If you consider the information in question to be part of the essential material that is being expressed or communicated in the XML, put it in an element. For human-readable documents this generally means the core content that is being communicated to the reader. For machine-oriented records formats this generally means the data that comes directly from the problem domain.

    If you consider the information to be peripheral or incidental to the main communication, or purely intended to help applications process the main communication, use attributes. This avoids cluttering up the core content with auxiliary material. For machine-oriented records formats, this generally means application-specific notations on the main data from the problem-domain.



  • Basically that^^.

    Also, please, please, please put child pages inside a container element (such as children in your second example).  It makes serializing so much easier.



  • Welp, elements it is, then.

    Thanks, boys.



  • My experience has taught me that the most elegant way to design XML is to make it look like your OO schema. So if you have nested containers in your business objects like this:

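    using System.Collections.Generic;

    // Single property (Name) -> attribute; set (Children) -> child elements.
    // All types public so that a serializer such as XmlSerializer can see them.
    public class A
    {
       public List<AChild> Children { get; set; }
       public string Name { get; set; }
    }

    public class AChild
    {
       public List<AGrandChild> Children { get; set; }
       public string Name { get; set; }
    }

    public class AGrandChild
    {
       public string Name { get; set; }
    }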

    It would be in XML like this:

    <a name = "blah">
       <achild name = "larry blah">
          <agrandchild name = "larry blah jr."/>
       </achild>
    </a>

    Basically, use child elements to model sets and use attributes to model single properties. It makes for a pretty simple and consistent translation between objects and xml IMAO.
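    Here is the same mapping sketched in Java with the JDK's DOM and identity `Transformer`, with no third-party serializer involved. `TreeNode` is a hypothetical stand-in for `A`/`AChild`/`AGrandChild`. Its single property (`name`) becomes an attribute, and each item of its set (`children`) becomes a child element. As in the example above, there is no container element.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical object model standing in for A / AChild / AGrandChild:
// one single property (name) and one set (children).
class TreeNode {
    final String tag;
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String tag, String name) {
        this.tag = tag;
        this.name = name;
    }

    // Appends a child and returns this node, so trees can be built inline.
    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

class ObjectToXml {

    // Single property -> attribute; each item in the set -> child element.
    static Element toElement(Document doc, TreeNode node) {
        Element e = doc.createElement(node.tag);
        e.setAttribute("name", node.name);
        for (TreeNode child : node.children) {
            e.appendChild(toElement(doc, child));
        }
        return e;
    }

    static String toXml(TreeNode root) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(toElement(doc, root));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        TreeNode a = new TreeNode("a", "blah")
                .add(new TreeNode("achild", "larry blah")
                        .add(new TreeNode("agrandchild", "larry blah jr.")));
        // Same shape as the <a>/<achild>/<agrandchild> sample, on one line.
        System.out.println(toXml(a));
    }
}
```

    The output has the same nesting as the XML above, just without indentation. Because values go through `setAttribute`, the serializer also handles escaping of characters like `&` and `<` on output.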



  • fun fact: 90% of the XML I defined was discarded by the third party and written down in essentially the same way but with different tag names.

    I don't really mind, as the goal is that we're using the same settings format, but I still have a hard time figuring out the motives for that one.



  • @dhromed said:

    fun fact: 90% of the XML I defined was discarded by the third party and written down in essentially the same way but with different tag names.

    I don't really mind, as the goal is that we're using the same settings format, but I still have a hard time figuring out the motives for that one.

    That would only make sense if they intended to merge it with other XML, hated namespaces, and were wtfs in and of themselves.

     



  • @hoodaticus said:

    My experience has taught me that the most elegant way to design XML is to make it look like your OO schema. [...]

    It would be in XML like this:

    <a name = "blah">
       <achild name = "larry blah">
          <agrandchild name = "larry blah jr."/>
       </achild>
    </a>

    Basically, use child elements to model sets and use attributes to model single properties. It makes for a pretty simple and consistent translation between objects and xml IMAO.

    No, no noooooo! Like I said above, please use container elements:

    No, no noooooo! Like I said above, please use container elements:

    <a name = "blah">
       <children>
          <achild name = "larry blah">
             <children>
                <agrandchild name = "larry blah jr."/>
             </children>
          </achild>
       </children>
    </a>

    This makes it so much easier to serialize.
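    Sutherlands' concrete reason, given further down the thread, is C#'s `XmlSerializer`. From the reading side, one way to see the appeal is that a reader can treat everything inside `<children>` as an item without knowing the item tag names. Below is a hedged Java sketch using the JDK's DOM parser. `ContainerReader` and the `name`/`children` conventions are assumptions taken from the sample above, not a standard.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical reader for the container layout: <children> holds the set,
// so every element inside it is an item, whatever its tag name.
class ContainerReader {

    // Direct element children of e; tag == null means "any tag".
    static List<Element> childElements(Element e, String tag) {
        List<Element> result = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && (tag == null || n.getNodeName().equals(tag))) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Walks the tree and writes each "name" attribute, indented by depth.
    static void appendOutline(Element node, String indent, StringBuilder out) {
        out.append(indent).append(node.getAttribute("name")).append('\n');
        for (Element container : childElements(node, "children")) {
            for (Element item : childElements(container, null)) {
                appendOutline(item, indent + "  ", out);
            }
        }
    }

    static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            appendOutline(root, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a name=\"blah\"><children>"
                + "<achild name=\"larry blah\"><children>"
                + "<agrandchild name=\"larry blah jr.\"/>"
                + "</children></achild>"
                + "</children></a>";
        System.out.print(outline(xml));
    }
}
```

    For the container version of the sample, it prints `blah`, `larry blah`, and `larry blah jr.`, each indented one level deeper than its parent.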



  • @hoodaticus said:

    were wtfs
     

    there



  • @Sutherlands said:

    No, no noooooo! Like I said above, please use container elements:[...]

    This makes it so much easier to serialize.

    Can you be a bit more specific about what part of the serialization process is made easier? I've always found container elements pretty dumb; it's not like I don't know /child/child is a child of /child if it's not wrapped in /child/children/child.

    Personally, the only rule I use for elements vs. attributes is that attributes have a one-to-one relation with the element, while elements can have a one-to-many relation.

    If you need a many-to-many relation, resort to id and ref attributes.
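    Here is a sketch of the id/ref approach in Java, using the JDK's DOM parser. The element and attribute names (`tag`, `tagref`, `label`) are hypothetical. One-to-one values (`id`, `title`, `label`) are attributes, and the one-to-many list of tag references is child elements. The many-to-many link between pages and tags goes through `id`/`ref`. XML does have built-in `ID`/`IDREF` attribute types, but they only apply when declared in a DTD or schema, so the sketch resolves references by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical many-to-many layout: each <tag> is defined once with an id,
// and each <page> points at its tags through <tagref ref="..."/> children.
class IdRefResolver {

    static List<String> tagsOf(String xml, String pageTitle) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }

        // Index tags by id by hand: Document.getElementById only finds attributes
        // declared as type ID (e.g. in a DTD), and this document has no DTD.
        Map<String, Element> byId = new HashMap<>();
        NodeList tags = doc.getElementsByTagName("tag");
        for (int i = 0; i < tags.getLength(); i++) {
            Element tag = (Element) tags.item(i);
            byId.put(tag.getAttribute("id"), tag);
        }

        List<String> labels = new ArrayList<>();
        NodeList pages = doc.getElementsByTagName("page");
        for (int i = 0; i < pages.getLength(); i++) {
            Element page = (Element) pages.item(i);
            if (!page.getAttribute("title").equals(pageTitle)) {
                continue;
            }
            // Only this page's own <tagref> children, not those of nested pages.
            for (Node n = page.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals("tagref")) {
                    Element target = byId.get(((Element) n).getAttribute("ref"));
                    if (target != null) {
                        labels.add(target.getAttribute("label"));
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String xml = "<site>"
                + "<tag id=\"t1\" label=\"news\"/><tag id=\"t2\" label=\"blog\"/>"
                + "<page title=\"home\"><tagref ref=\"t1\"/><tagref ref=\"t2\"/></page>"
                + "<page title=\"Sub 1\"><tagref ref=\"t1\"/></page>"
                + "</site>";
        System.out.println(tagsOf(xml, "home"));  // [news, blog]
        System.out.println(tagsOf(xml, "Sub 1")); // [news]
    }
}
```

    Each tag is stored once and shared by any number of pages, and each page can reference any number of tags, which is the many-to-many case that plain nesting can't express.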



  • In C#, you have an object:

    public class PageTree
    {
       public List<Page> Pages { get; set; }
    }

    Serializes to:

    <PageTree>
       <Pages>
          <Page/>
          <Page/>
       </Pages>
    </PageTree>

    simply by using XmlSerializer.Serialize. Try getting it to deserialize from the XML without the container elements.

    Btw, I couldn't get the code or pre tags to work for formatting, so...



  • Of course: your PageTree contains a List, and the List contains pages. Your container element indicates that the List is there.

    I'm not familiar with Serialize, but you might want to try this instead:

    // List is an interface in Java, so a class has to extend a concrete list such as ArrayList.
    public class PageTree extends ArrayList<Page>
    {

    }

    Serializes to:

    <PageTree>
        <Page/>
        <Page/>
    </PageTree>

    Arguably this is a cleaner design too. If extending ArrayList is too much, extend a more abstract superclass such as AbstractList, or implement the List interface directly, instead.

    P.S. Use the HTML button to add code tags :P



  • That works if it's just a straight list, but it doesn't allow multiple separate lists of the same thing, doesn't allow new elements or attributes, and doesn't allow it to be a list of itself (so you can't just nest children of the same type).



  • My mistake, you should be able to get a long way with something like:

    public class PageTree extends ArrayList<IPage> implements IPage
    {

    }

    Then again, XmlSerializer.Serialize probably doesn't support half the stuff I used above, considering I'm mostly writing Java lately. Multiple separate lists might still be a problem, but I'm not sure what you mean by new elements or attributes; the class above should still be able to contain other members.

    Glad I've mostly been using XSLT for my XML manipulation, and lately some SOAP if we must talk in XML. The former allows every random XML representation you can think of, and the latter supports both Serialize and Deserialize, so you never have to look at or think about the resulting XML.



  • Well, first, I'm using C#, not Java. It looks like your sample is trying to fix the "can't contain a list of itself" problem, but I'm not sure. The class CAN'T contain other members; they're simply not serialized.

