Google sitemap: URL encoding querystring

  • The company's Sitemap generation code is being reviewed for the first time in about a hundred thousand years. There's a question that came up that I can't figure out.

    Okay, so Google insists that the sitemap contain URL encoded characters. Which makes sense, since you don't want to send this: humper

    You want to send this:

    But they also seem to want the & character encoded as &amp;. What I can't figure out is whether they want the querystring key separator encoded or not.

    In other words, do they want this:

    Or this:

    And if it is the latter, will their indexing translate that back to &, or do I have to write a special handler to turn &amp; back to &? (Otherwise evaluating querystring("id") results in the string "1&amp;name=dog%20humper".)

  • The sitemap must be valid XML, and & in XML is always encoded as &amp;.
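A minimal sketch of why no special handler should be needed on the consuming side: any conforming XML parser turns &amp; back into & before the URL is ever used. The host, path, and the helper names `read_locs` and `query_of` are made up for illustration; only the querystring comes from the question.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Hypothetical sitemap fragment; host and path are placeholders.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/page?id=1&amp;name=dog%20humper</loc>
  </url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_locs(xml_text):
    """Return the text of every <loc>, with XML entities already decoded."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]


def query_of(url):
    """Parse the querystring of a URL into a dict of value lists."""
    return parse_qs(urlsplit(url).query)


locs = read_locs(SITEMAP)
print(locs[0])            # the separator is a plain & again
print(query_of(locs[0]))  # id and name come out as separate keys
```

The &amp; only exists inside the XML file. Once the file is parsed, the URL is an ordinary URL again, so `id` and `name` come out as separate keys.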

  • SockDev

    As a developer I would expect that URLs in the sitemap should be encoded as the browser is expected to navigate to them, thus ?id=1&2 where the & is part of the data would be ?id=1&amp;2 and .... but wait.... no, that's wrong.

    the proper URL encoding for & isn't &amp;, it's %26..... &amp; is the proper XML encoding of &

    .... correction. encode literal & as %26 as per standard URL stuff and separator & as &amp; because XML apparently.
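The two layers described above can be sketched by hand like this. It is only an illustration: `build_query` and `loc_text` are hypothetical helper names, and the base URL is a placeholder.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_query(params):
    """URL layer: percent-encode each key and value, join pairs with a plain &."""
    return "&".join(
        f"{quote(key, safe='')}={quote(value, safe='')}" for key, value in params
    )


def loc_text(base, params):
    """XML layer: escape the finished URL for use as <loc> element text."""
    url = f"{base}?{build_query(params)}"
    return escape(url)  # & becomes &amp;, %26 is left untouched


# A literal & inside the data (id = "1&2") and a space in the name.
print(loc_text("http://www.example.com/page", [("id", "1&2"), ("name", "dog humper")]))
```

The & that is part of the data becomes %26 in the first step, so the only & left for the XML step to escape is the separator.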

  • Sounds like a job for …

    regular expressions!

  • @ben_lubar said in Google sitemap: URL encoding querystring:

    The sitemap must be valid XML, and & in XML is always encoded as &amp;.

    This. You have to keep in mind that URL encoding is completely separate from properly encoding special characters in an XML file. You need to encode the URL first, then encode everything as XML entities. If you're using some programming language's XML library to generate the file, it might do the XML encoding for you.

    Edit: I just checked Python's ElementTree library, and it does in fact encode & as &amp; for you.
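A short sketch of that behavior, assuming the sitemap is built with the standard library's `xml.etree.ElementTree`. The function name `build_sitemap` and the example URL are invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a <urlset> document; ElementTree escapes & in text on output."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Step 1: URL-encode the querystring (quote_via=quote gives %20 for spaces).
page = "http://www.example.com/page?" + urlencode(
    {"id": "1", "name": "dog humper"}, quote_via=quote
)

# Step 2: hand the plain URL to the XML library and let it do the escaping.
print(build_sitemap([page]))
```

The URL is assigned to `.text` with a plain & in it. Escaping it yourself beforehand would produce a double-encoded &amp;amp; in the output.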
