We transferred a website to another host today and everything kept working. Well, almost: some links to files did not work anymore. I hate investigating that shit because obviously: Would it be the files with non-ASCII chars in their filename? Yes.
So I'm looking at hunting down all the links and fixing them. But wait, most filenames with Unicode in them work just fine. Just some do not, one example being files with å in them. Even if I percent-encode the char in the URL (turning it into %C3%A5) it does not work. Well, trusty hexdump has the answer: on the new host this character is stored as the byte sequence 61 cc 8a, as opposed to the c3 a5 found in the link. Now the weird thing is that on the old system the filename uses c3 a5 for å, so it must have been converted in transit.
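To see the two spellings side by side, here is a small Python sketch. It only uses the standard library; the variable names are mine. It builds å in both Unicode normalization forms and prints the UTF-8 bytes and the percent-encoded form a URL would need for each.

```python
import unicodedata
from urllib.parse import quote

# Precomposed form: a single code point, U+00E5.
nfc = unicodedata.normalize("NFC", "\u00e5")
# Decomposed form: plain "a" followed by U+030A (combining ring above).
nfd = unicodedata.normalize("NFD", "\u00e5")

print(nfc.encode("utf-8").hex(" "))  # c3 a5
print(nfd.encode("utf-8").hex(" "))  # 61 cc 8a

# What each spelling looks like once percent-encoded for a URL.
print(quote(nfc))  # %C3%A5
print(quote(nfd))  # a%CC%8A

# Both render as the same glyph, but they are different strings.
print(nfc == nfd)  # False
```

So a link containing %C3%A5 asks for a filename that, byte for byte, is no longer on disk.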
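Turns out this is a known problem with HFS+. It stores all filenames in (a variant of) NFD, the decomposed Unicode normalization form, where å is a plain a followed by a combining ring, and it will happily convert any filename it is given to that form. Everybody else mostly uses NFC, the precomposed form, where å is a single code point. So when my boss used his Mac to move the files, he inadvertently converted NFC to NFD.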
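Instead of touching every link, one option would be to rename the files on the new host back to NFC. The following is a rough sketch of that idea, not something battle-tested: the function name normalize_tree and its parameters are made up for illustration, and it assumes the new host's filesystem treats filenames as plain bytes (as typical Linux filesystems do), so that the NFC and NFD spellings really are distinct names.

```python
import os
import unicodedata


def normalize_tree(root, form="NFC", dry_run=True):
    """Rename entries under root whose names are not in the given form.

    Returns a list of (old_path, new_path) pairs. With dry_run=True
    nothing is renamed; the list only shows what would happen.
    """
    renamed = []
    # Walk bottom-up so children are handled before their parent
    # directory gets a new name.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            target = unicodedata.normalize(form, name)
            if target == name:
                continue  # already in the wanted form
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, target)
            if os.path.exists(dst):
                # Both spellings exist side by side; leave that to a human.
                continue
            if not dry_run:
                os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Running it with the default dry_run=True first and reading the returned list seems wise before letting it rename anything for real.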
The funny thing is that if the files were served from an HFS+ system it would continue to work, because it lets you open files by their NFC names and happily converts them to NFD. Should I blame HFS+, I mean Apple, for being stubborn assholes? Or everybody else for allowing filenames with equivalent strings to exist alongside each other? Or the Unicode Consortium? Oh wait, they're all part of the Unicode Consortium, so it's easy to just blame that.