Google sitemap: URL encoding querystring


  • Trolleybus Mechanic

    The company's Sitemap generation code is being reviewed for the first time in about a hundred thousand years. There's a question that came up that I can't figure out.

    Okay, so Google insists that the sitemap contain URL encoded characters. Which makes sense, since you don't want to send this:

    http://example.com?id=1&name=dog humper

    You want to send this:

    http://example.com?id=1&name=dog%20humper

    But they also seem to want the & character encoded as &amp;. But I can't figure out if they want the querystring key separator encoded or not.

    In other words, do they want this:

    http://example.com?id=1&name=dog%20humper

    Or this:

    http://example.com?id=1&amp;name=dog%20humper

    And if it is the latter, will their indexing translate that back to &, or do I have to write a special handler to turn &amp; back to &? (Otherwise evaluating querystring("id") results in the string "1&amp;name=dog%20humper".)



  • The sitemap must be valid XML, and & in XML is always encoded as &amp;.


  • FoxDev

    @Lorne-Kates

    THIS SOUNDS LIKE A JOB FOR

    As a developer I would expect that URLs in the sitemap should be encoded as the browser is expected to navigate to them, thus ?id=1&2 where the & is part of the data would be ?id=1&amp;2 and .... but wait.... no, that's wrong.

    the proper URL encoding for & isn't &amp; it's %26..... &amp; is the proper XML encoding of &

    .... correction. encode literal & as %26 as per standard URL stuff and separator & as &amp; because XML apparently.
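
    The two layers described above can be sketched roughly like this in Python. This is only an illustration, not the poster's code: the helper name `sitemap_loc` and the sample parameters are made up, and it assumes the standard library's `urlencode` and `xml.sax.saxutils.escape` are acceptable for the job.

```python
from urllib.parse import quote, urlencode
from xml.sax.saxutils import escape


def sitemap_loc(base, params):
    """Hypothetical helper: build the text that goes inside <loc>."""
    # Layer 1: URL encoding. A literal & inside a value becomes %26 and a
    # space becomes %20 (quote_via=quote avoids the default '+' for spaces).
    query = urlencode(params, quote_via=quote)
    url = f"{base}?{query}"
    # Layer 2: XML escaping. The & that separates key=value pairs becomes
    # &amp; so the sitemap stays well-formed XML.
    return escape(url)


print(sitemap_loc("http://example.com/", {"id": "1&2", "name": "dog humper"}))
# http://example.com/?id=1%262&amp;name=dog%20humper
```

    The order matters: URL-encode first, then XML-escape the finished URL.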



  • Sounds like a job for …

    regular expressions!



  • @ben_lubar said in Google sitemap: URL encoding querystring:

    The sitemap must be valid XML, and & in XML is always encoded as &amp;.

    This. You have to keep in mind that URL encoding is completely separate from properly encoding special characters in an XML file. You need to encode the URL first, then encode everything as XML entities. If you're using some programming language's XML library to generate the file, it might do the XML encoding for you.

    Edit: I just checked Python's ElementTree library, and it does in fact encode & as &amp; for you.
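
    For what it's worth, here is a minimal sketch of that ElementTree behaviour. The `build_sitemap` function is a hypothetical name, and the namespace URI is assumed to be the one from the sitemaps.org 0.9 protocol. It also parses the output back, to show that an XML parser returns the plain & again, so no special handler should be needed on the reading side.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for sitemap files (sitemaps.org protocol 0.9).
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: serialize already URL-encoded URLs as a sitemap."""
    urlset = ET.Element("urlset", xmlns=NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        # Assign the URL with a plain &; the serializer escapes it on output.
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap(["http://example.com/?id=1&name=dog%20humper"])
print(xml_text)  # the <loc> text contains id=1&amp;name=dog%20humper

# Any XML parser undoes the escaping when it reads the file back.
root = ET.fromstring(xml_text)
loc = root.find(f"{{{NS}}}url/{{{NS}}}loc").text
print(loc)  # http://example.com/?id=1&name=dog%20humper
```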

