Character encoding



  • @morbiuswilters said:

    Transmit? Sure. Store? Seems unlikely. But, see, the problem here is software that transcodes my data. Why are you doing that? You shouldn't be doing that (or if you absolutely must you should be XML-aware so you can make the appropriate change to the document).
    Databases are the biggest culprit here.  A lot of people store XML data as strings in databases and modern DBMSs do a lot of transcoding.



  • @Jaime said:

    @morbiuswilters said:

    Transmit? Sure. Store? Seems unlikely. But, see, the problem here is software that transcodes my data. Why are you doing that? You shouldn't be doing that (or if you absolutely must you should be XML-aware so you can make the appropriate change to the document).
    Databases are the biggest culprit here.  A lot of people store XML data as strings in databases and modern DBMSs do a lot of transcoding.

    Good point. I just use UTF-8 for everything, though. It still seems like a problem with the way things are being done. To me, the spec should require an explicit declaration containing the encoding, and that should be the only source of encoding information. Implicit encodings, out-of-band encodings... all this leads to trouble. I hear people bitching about encoding problems all of the time but I rarely ever encounter them, perhaps because I am diligent about enforcing my own standards.
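To make the transcoding complaint concrete, here is a minimal sketch of what goes wrong when a store re-encodes an XML string without touching its declaration, next to an "XML-aware" version that rewrites the declaration as well. The function names are hypothetical, and the regex only handles a plain declaration at the start of the document; it is an illustration, not how any particular DBMS behaves.

```python
import re


def transcode_blindly(doc: bytes, src: str, dst: str) -> bytes:
    """Re-encode the bytes and leave the XML declaration untouched."""
    return doc.decode(src).encode(dst)


def transcode_xml_aware(doc: bytes, src: str, dst: str) -> bytes:
    """Re-encode the bytes and rewrite the declared encoding to match."""
    text = doc.decode(src)
    # Replace only the encoding name inside the first XML declaration.
    text = re.sub(
        r'(<\?xml[^>]*?encoding=["\'])[^"\']+',
        lambda m: m.group(1) + dst,
        text,
        count=1,
    )
    return text.encode(dst)


original = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><n>caf\u00e9</n>'
).encode("iso-8859-1")

# The declaration still says ISO-8859-1, but the bytes are now UTF-8.
blind = transcode_blindly(original, "iso-8859-1", "utf-8")
# A reader that trusts the declaration gets mojibake.
misread = blind.decode("iso-8859-1")

# Declaration and bytes agree again.
aware = transcode_xml_aware(original, "iso-8859-1", "utf-8")
```

Decoding `blind` according to its own declaration turns the two UTF-8 bytes of "é" into two separate Latin-1 characters, which is the kind of breakage being described.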


  • Discourse touched me in a no-no place

    @morbiuswilters said:

    @PJH said:
    Wouldn't the sensible (hah!) thing to do be to use the most recent specification? i.e. use the one in the headers if specified, and replace it with the one inline if one is specified?

    HTTP headers take precedence over inline declarations. The theory being that the server may transcode the data without altering the declaration.

    Um. Not that I know much (if anything) about XML, but in general (i.e. any blob of data being served) if the server knows enough about what it's mutilating when sending it to provide 'meta-'meta data, shouldn't it mutilate the meta-data in what it's sending to match?



  • @PJH said:

    Um. Not that I know much (if anything) about XML, but in general (i.e. any blob of data being served) if the server knows enough about what it's mutilating when sending it to provide 'meta-'meta data, shouldn't it mutilate the meta-data in what it's sending to match?

    I agree but the W3C and IETF do not, unfortunately.
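The precedence rule described above (a charset in the HTTP header wins over the inline declaration) could be sketched roughly like this. The function name and the UTF-8 fallback are assumptions made for illustration, and the declaration sniffing only works when the body is in an ASCII-compatible encoding; a real parser does considerably more (BOM detection, UTF-16, and so on).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the body. Only meaningful for ASCII-compatible encodings.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def pick_encoding(content_type, body: bytes) -> str:
    """Pick an encoding, letting the HTTP header override the declaration."""
    # 1. Out-of-band: charset parameter of the Content-Type header.
    if content_type:
        m = _CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # 2. In-band: encoding named in the XML declaration.
    m = _XML_DECL.match(body)
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. Assumed default when neither source says anything.
    return "utf-8"


body = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
from_header = pick_encoding("application/xml; charset=ISO-8859-1", body)
from_decl = pick_encoding("application/xml", body)
fallback = pick_encoding(None, b"<a/>")
```

With this ordering, a server that transcodes the body and updates only the header still gets read correctly, which is the stated rationale; the cost is that the inline declaration can end up being wrong.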


  • ♿ (Parody)

    @morbiuswilters said:

    @Jaime said:
    It's not like this is a new problem that the W3C couldn't have expected; FTP implementations are infamous for causing problems by doing improper CRLF translations.

    I haven't used FTP for, like, 14 years, so I'll take your word on it. Also, the problem sounds like the CRLF translations: why are you doing that? Stop doing that.

    I haven't done much FTP lately, either, but IIRC the problem is really when you are operating in text mode instead of binary mode (or vice versa...who fucking cares, really?), which I suppose is its own WTF.*



  • @boomzilla said:

    @morbiuswilters said:
    @Jaime said:
    It's not like this is a new problem that the W3C couldn't have expected; FTP implementations are infamous for causing problems by doing improper CRLF translations.

    I haven't used FTP for, like, 14 years, so I'll take your word on it. Also, the problem sounds like the CRLF translations: why are you doing that? Stop doing that.

    I haven't done much FTP lately, either, but IIRC the problem is really when you are operating in text mode instead of binary mode (or vice versa...who fucking cares, really?), which I suppose is its own WTF.*

    I seem to recall that. I also seem to recall a rule to never use text mode, probably for that reason.
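For anyone who has not been bitten by it, the text-mode problem can be simulated in a few lines. This is a simplified model, not an FTP client: the helper names are made up, and it assumes the sender converts its local line endings to CRLF on the wire while the receiver converts CRLF back to its own convention.

```python
CRLF = b"\r\n"


def to_network(data: bytes, local_eol: bytes) -> bytes:
    """Sender side of a text-mode transfer: local line endings become CRLF."""
    if local_eol == CRLF:
        return data
    return data.replace(local_eol, CRLF)


def from_network(data: bytes, local_eol: bytes) -> bytes:
    """Receiver side: CRLF on the wire becomes the local line ending."""
    if local_eol == CRLF:
        return data
    return data.replace(CRLF, local_eol)


def text_mode_transfer(data: bytes, sender_eol: bytes, receiver_eol: bytes) -> bytes:
    """Simulate one text-mode transfer between two hosts."""
    return from_network(to_network(data, sender_eol), receiver_eol)


# Text survives, just with different line endings.
text = text_mode_transfer(b"a\r\nb\r\n", CRLF, b"\n")

# Binary data does not: the 8-byte PNG signature contains a CR LF pair.
png_signature = b"\x89PNG\r\n\x1a\n"
mangled = text_mode_transfer(png_signature, CRLF, b"\n")
```

Sending the PNG signature from a CRLF host to an LF host in text mode silently drops a byte, which is why "always use binary mode" became the rule of thumb.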

