NFC NFD WTF
-
We transferred a website to another hoster today and everything kept working. Well, some links to files did not work anymore. I hate investigating that shit because obviously: would it be the files with non-ASCII chars in their filenames? Yes.
So I'm looking at hunting down all the links and fixing them. But wait, most filenames with Unicode in them work just fine. Just some do not, one example being files with å in them. Even if I percent-encode the char in the URL (turning it into %C3%A5) it does not work. Well, trusty hexdump has the answer: on the new host this character is stored as the byte sequence 61 cc 8a, as opposed to the c3 a5 found in the link. Now the weird thing is that on the old system the filename uses c3 a5 for å, so it must have been converted in transit.
Turns out this is a known problem with HFS+. It stores all filenames in NFD and will happily convert filenames to NFD. So when my boss used his Mac to move the files, he inadvertently converted NFC to NFD. The funny thing is that if the files were served from an HFS+ system it would continue to work, because HFS+ allows you to open files with chars in NFC and happily converts them to NFD. Should I blame HFS+, I mean Apple, for being stubborn assholes? Or everybody else for allowing filenames with equivalent strings to exist alongside each other? Or the Unicode Consortium? Oh wait, they're all part of the Unicode Consortium, so it's easy to just blame that.
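For anyone who wants to reproduce the byte sequences without a Mac, here is a minimal Python sketch using only the standard library. The helper names `utf8_hex` and `nfc_rename_plan` are made up for illustration, and it assumes the fix would be to rename the files back to NFC rather than to rewrite the links. It only computes the renames; it does not touch the disk.

```python
import unicodedata
from urllib.parse import quote

# The same visible character "å" in both normalization forms.
NFC_A_RING = unicodedata.normalize("NFC", "\u00e5")  # one code point, U+00E5
NFD_A_RING = unicodedata.normalize("NFD", "\u00e5")  # "a" + U+030A combining ring


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex (hypothetical helper)."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


def nfc_rename_plan(filenames):
    """Return (old, new) pairs for names that are not already in NFC.

    Hypothetical helper: it only builds the plan, nothing is renamed.
    """
    plan = []
    for name in filenames:
        normalized = unicodedata.normalize("NFC", name)
        if normalized != name:
            plan.append((name, normalized))
    return plan


if __name__ == "__main__":
    print(utf8_hex(NFC_A_RING))   # c3 a5
    print(utf8_hex(NFD_A_RING))   # 61 cc 8a
    print(quote(NFC_A_RING))      # %C3%A5
    print(quote(NFD_A_RING))      # a%CC%8A
    # The two strings render identically but do not compare equal.
    print(NFC_A_RING == NFD_A_RING)  # False
```

Feeding the output of `os.listdir()` into `nfc_rename_plan` would give the list of files that got decomposed along the way. Whether to rename them or to fix the links instead is a judgment call this sketch does not make.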
-
@gleemonk NFD is superior.
-
@gleemonk said in NFC NFD WTF:
the Unicode Consortium
Blame them. It's their fault we're in this shit.
-
@thegoryone said in NFC NFD WTF:
@gleemonk said in NFC NFD WTF:
the Unicode Consortium
And now I have a new name for my custom civilization in Stellaris tonight.
Unicode Consortium: Universally reviled for doing a job nobody else wanted to do.
-
@gleemonk IMO Unicode has worked out remarkably well. It's just that developers don't bother to learn about the aspects beyond basic text encoding and maybe sometimes character classes. Normalization opens a fucking world of understanding and opportunity.
-
@Weng I agree. It's really nice how a lot of worries have been taken care of by them.
For a hobby project I'm dicking around in the PRIVATE USE AREA and I can actually create fonts that render both my characters and the standard ones just fine. It's a lot of fun to create a website for your ancient writing system
-
@gleemonk Not to mention that the private use area has allowed the existence of FontAwesome, meaning scalable icons and no more folders full of 16x16 GIFs
-
@RaceProUK nope the PRIVATE USE AREA is all mine! Mine mine mine! I was there first. FA can go somewhere else!
-
@RaceProUK pff, you don't need the private use area. Before it existed, you just redefined existing codepoints, a la Wingdings and Webdings.
-
@gleemonk Why is this in Meta?
-
@blakeyrat Because it isn't funny enough for Funny Stuff
-
Moved to sidebar.
-
@RaceProUK You know, I'd never bothered to think or look into how FA worked. This makes complete sense.
-
@thegoryone the Unicode Consortium is taking some extraordinary steps in reputation building.