Big list of software that cannot handle spaces or accents in paths
-
@marczellm In this case, whether or not there are other issues about, I've identified that the switch-keyboard-but-not-language case is not being handled correctly on Windows. (The code simply ignores the critical info.)
Better to file a bug multiple times than never. (Also, those other bugs you identified have subtle differences.)
-
@zecc That's a limitation of the vFAT filesystem (and allowed-but-not-a-good-idea on any other filesystem)
-
@gwowen I don't know, it created the directory anyway. It just failed to then proceed to copy any files into it.
-
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
not-a-good-idea
Why not? Sure, it's probably not a good idea if you insist on using command lines that interpret everything as a string, but any proper API shouldn't have any problem with a string containing a |
-
@jaloopa said in Big list of software that cannot handle spaces or accents in paths:
Why not? Sure, it's probably not a good idea if you insist on using command lines that interpret everything as a string, but any proper API shouldn't have any problem with a string containing a |
Even if it's not a problem for me, it's a potential problem for everyone who uses my files, who may not use @jaloopa-approved APIs.
Not only can they not copy the files to a vFAT formatted USB stick or phone (i.e. essentially all of them) but the world is full of code that's subtly broken, and doing this will cause completely-avoidable problems.
And I don't think it's a good idea to create completely-avoidable potential problems for my friends/clients (or strangers for that matter), because I'm not a sociopath.
-
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
@zecc That's a limitation of the vFAT filesystem (and allowed-but-not-a-good-idea on any other filesystem)
- The VFAT filesystem does not have such a limitation. In fact, it has very few limitations. Windows does have limitations, but not when using the \\?\ prefix. So it is actually possible to create such a file.
- It is the MTP protocol (or maybe its implementations in various devices) that forbids many things and never tells you what the problem is.
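A minimal sketch in C of building such a path, for anyone who wants to try it. The function name is hypothetical. It assumes the input is already an absolute path using backslashes, because with the \\?\ prefix Windows passes the string to the filesystem without its usual normalization (no / to \ conversion, no . or .. handling). UNC paths take the \\?\UNC\server\share form instead. This only builds the string; what the filesystem or driver then accepts is a separate question.

```c
#include <stdlib.h>
#include <string.h>

/* Turn an absolute Win32 path into an extended-length path.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *to_extended_path(const char *abs_path)
{
    const char *prefix = "\\\\?\\";   /* the literal characters \\?\ */
    const char *rest = abs_path;

    if (strncmp(abs_path, "\\\\?\\", 4) == 0) {
        prefix = "";                  /* already extended: copy as-is */
    } else if (strncmp(abs_path, "\\\\", 2) == 0) {
        prefix = "\\\\?\\UNC\\";      /* UNC: \\server\share -> \\?\UNC\server\share */
        rest = abs_path + 2;          /* drop the original leading two backslashes */
    }

    char *out = malloc(strlen(prefix) + strlen(rest) + 1);
    if (out == NULL)
        return NULL;
    strcpy(out, prefix);
    strcat(out, rest);
    return out;
}
```

For example, C:\Data\a b.txt becomes \\?\C:\Data\a b.txt, and \\server\share\f.txt becomes \\?\UNC\server\share\f.txt.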
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
The VFAT filesystem does not have such limitation
https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations claims it does. What's your argument that it doesn't?
-
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
The VFAT filesystem does not have such limitation
I have two arguments:
- The characters are not special to the filesystem, so it is not the one limiting them. The operating system might restrict them, but that's not the same thing. And the operating system (Windows 7 in this case) doesn't restrict them in all cases.
- I have actually tested it. It is possible to create a file with | and * in its name, on Windows 7, on a FAT32 (which is really VFAT32; it supports long names) flash disk. Explorer shows them with some replacement characters, and so does the dir command, but if you copy the name (even in Explorer), the result still contains | and *. I believe it is also possible to create a name with /, but not with MSys2, which is what I used to get the | and * in there.
-
Linux perf.
I was profiling an in-house plasma modelling library together with an NLopt-based application, which all resided in a directory with spaces in its path.
"Hmm, that's interesting. Which part of the Voigt profile calculation takes the most CPU time?"
"Let's annotate the disassembly with the timing data."
What.
The Debian version is even worse than current upstream because it uses just plain snprintf("... %s ...", ...) instead of snprintf("... \"%s\" ...", ...) (which is why I saw the bug). The way it's done right now, it would still work with most filenames but will barf on anything with a double quote, $ or a backtick in filenames. Since they're building a shell command line, they should have passed their arbitrary string filenames as command line parameters via execl("/bin/sh", "sh", "-c", command, "--", filename, NULL) and used proper shell escapes to access positional parameters containing arbitrary characters (snprintf("... \"$1\" ...", ...)).
I used to like UNIX philosophy when I was in high school, but the way people implement it depresses me.
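A minimal sketch of that approach in C, assuming a POSIX system with /bin/sh. The helper name run_sh_with_filename is made up for illustration. The filename only ever reaches the shell as $1, so the script has to refer to it as "$1" (quoted). One detail of sh -c: the first argument after the script becomes $0, so the "--" in the post's execl call simply ends up as $0; this sketch passes "sh" there instead, which is the more common convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `script` with /bin/sh, passing `filename` as positional parameter $1.
 * The filename is never pasted into the script text, so spaces, double
 * quotes, '$' and backticks in it are not interpreted by the shell, as long
 * as the script refers to it as "$1".
 * Returns the shell's exit status, or -1 if fork/wait failed or the shell
 * was killed by a signal. */
int run_sh_with_filename(const char *script, const char *filename)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* After "-c script", the next argument becomes $0 and the one after it $1. */
        execl("/bin/sh", "sh", "-c", script, "sh", filename, (char *)NULL);
        _exit(127); /* reached only if execl itself failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_sh_with_filename("wc -c \"$1\"", name) stays safe even if name contains a space, a double quote, $ or a backtick, which is exactly where the snprintf version breaks.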
-
@aitap said in Big list of software that cannot handle spaces or accents in paths:
I used to like UNIX philosophy when I was in high school, but the way people implement it depresses me.
There are lots of things that are appealing to high school students but are to adults with properly-functioning brains.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
The VFAT filesystem does not have such limitation
I have two arguments:
- The characters are not special to the filesystem, so it is not the one limiting them. The operating system might restrict them, but that's not the same thing. And the operating system (Windows 7 in this case) doesn't restrict them in all cases.
- I have actually tested it. It is possible to create a file with | and * in its name, on Windows 7, on a FAT32 (which is really VFAT32; it supports long names) flash disk. Explorer shows them with some replacement characters, and so does the dir command, but if you copy the name (even in Explorer), the result still contains | and *. I believe it is also possible to create a name with /, but not with MSys2, which is what I used to get the | and * in there.
It's also worth noting that NTFS is technically UCS2 for its filenames. From what I understand of that "encoding" it's really just an array of int16, not possible to join many int16s into a char like UTF-16. Maybe it has a teensy bit of specification beyond the int16s. The Windows GUI probably treats it as a UTF-16 string.
-
@mikehurley IIRC it's WTF-16: while UTF-16 only allows surrogates in pairs, and proper UCS-2 doesn't allow them at all, WTF-16 typically allows unpaired surrogates (here for historic reasons).
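A small sketch in C of the difference between the three, working on raw 16-bit code units. The enum and function names are made up; following the description above, it only looks at how surrogates are paired. Every UCS-2 string is also valid UTF-16, and every UTF-16 string is also valid WTF-16, so it reports the narrowest one that fits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum u16_kind { U16_UCS2, U16_UTF16, U16_WTF16 };

static bool is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static bool is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Return the narrowest encoding the n code units in s are valid in:
 * no surrogates at all -> UCS-2; surrogates only in high+low pairs -> UTF-16;
 * any unpaired surrogate -> only WTF-16. */
enum u16_kind classify_u16(const uint16_t *s, size_t n)
{
    bool saw_pair = false;
    for (size_t i = 0; i < n; i++) {
        if (is_high_surrogate(s[i]) && i + 1 < n && is_low_surrogate(s[i + 1])) {
            saw_pair = true;
            i++;                  /* skip the low half of the pair */
        } else if (is_high_surrogate(s[i]) || is_low_surrogate(s[i])) {
            return U16_WTF16;     /* lone high or lone low surrogate */
        }
    }
    return saw_pair ? U16_UTF16 : U16_UCS2;
}
```

A filename such as { 0xD83D, 0x0041 } (a high surrogate followed by an ordinary letter) is exactly the kind of thing NTFS will store but a strict UTF-16 decoder will reject.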
-
@mikehurley said in Big list of software that cannot handle spaces or accents in paths:
It's also worth noting that NTFS is technically UCS2 for its filenames. From what I understand of that "encoding" it's really just an array of int16, not possible to join many int16s into a char like UTF-16. Maybe it has a teensy bit of specification beyond the int16s. The Windows GUI probably treats it as a UTF-16 string.
Actually it's a long shot to call it anything Unicode-related, because the comparison used does not follow Unicode equivalence rules: the composed and decomposed forms are considered different.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@mikehurley said in Big list of software that cannot handle spaces or accents in paths:
It's also worth noting that NTFS is technically UCS2 for its filenames. From what I understand of that "encoding" it's really just an array of int16, not possible to join many int16s into a char like UTF-16. Maybe it has a teensy bit of specification beyond the int16s. The Windows GUI probably treats it as a UTF-16 string.
Actually it's a long shot to call it anything Unicode-related, because the comparison used does not follow Unicode equivalence rules: the composed and decomposed forms are considered different.
Most of what I know about it came from a guy who did a lot of work on NTFS data recovery code. He maybe gave me the lazy explanation but the important part I believe he was trying to convey was it's technically UCS-2; UCS-2 is a rather worthless encoding so you may as well treat it as an array of int16 or byte.
-
@mikehurley UCS-2 is not that worthless an encoding. It is 16-bit code words, fixed width, Unicode Basic Multilingual Plane only. Since letters of all live languages do fit in the Basic Multilingual Plane, it's quite sufficient for many purposes.
The actual encoding used by Windows on its filesystems is much worse. It is 16-bit code words, interpreted haphazardly.
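For comparison, a sketch in C of how UTF-16 encodes a single code point: anything in the Basic Multilingual Plane is one 16-bit unit (the same value UCS-2 would store), and anything above U+FFFF needs a surrogate pair. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if cp is not
 * encodable (a surrogate code point, or above U+10FFFF).
 * Only the 1-unit case is representable in UCS-2. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                /* surrogate code points are not characters */
    if (cp <= 0xFFFF) {          /* Basic Multilingual Plane: identical to UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;               /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

U+00C5 (Å) comes out as the single unit 0x00C5, while U+1F4A9 PILE OF POO comes out as the pair 0xD83D 0xDCA9, which a UCS-2-only reader would see as two unrelated code units.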
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
The VFAT filesystem does not have such limitation
I have two arguments:
- The characters are not special to the filesystem, so it is not the one limiting them. The operating system might restrict them, but that's not the same thing. And the operating system (Windows 7 in this case) doesn't restrict them in all cases.
- I have actually tested it. It is possible to create a file with | and * in its name, on Windows 7, on a FAT32 (which is really VFAT32; it supports long names) flash disk. Explorer shows them with some replacement characters, and so does the dir command, but if you copy the name (even in Explorer), the result still contains | and *. I believe it is also possible to create a name with /, but not with MSys2, which is what I used to get the | and * in there.
If it's anything like when I tested creating files with | and * in NTFS via cygwin, it is cygwin that replaces the characters with something else - not explorer.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
Since letters of all live languages do fit in the Basic Multilingual Plane
Unless you're dealing with proper names in Japan. (Or is it China that has the problems? I see the bug reports for them go past, but I don't field them myself so I can't remember which is particularly painful.)
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
Unless you're dealing with proper names in Japan.
Do you count emoji as a live language? Sometimes you need to store a pile of poo!
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
Unless you're dealing with proper names in Japan. (Or is it China that has the problems? I see the bug reports for them go past, but I don't field them myself so I can't remember which is particularly painful.)
I am not sure whether it is, but if it is related to Han unification, it is not really a problem as long as the application uses the appropriate font depending on whether it is dealing with Japanese or Chinese. Which is not good, but can be worked around.
@zemm said in Big list of software that cannot handle spaces or accents in paths:
Do you count emoji as a live language? Sometimes you need to store a pile of poo!
Yes, but I do not count them as letters.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
the comparison used does not follow Unicode equivalence rules: the composed and decomposed forms are considered different.
I dare you to find a piece of software that actually considers those equal.
-
@gąska I guess the only sane way to compare those as equal would be to transform both sides into a canonical form before doing a straight bit comparison.
-
@khudzlin … and that's precisely what the normal forms are for.
@gąska said in Big list of software that cannot handle spaces or accents in paths:
I dare you to find a piece of software that actually considers those equal.
Well, the MacOS filesystem does, by virtue of normalizing all the filenames.
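A toy sketch in C of "normalize both sides, then compare". It works on arrays of code points and uses a hard-coded three-entry decomposition table, which is an assumption made purely to keep it short: real NFD needs the full Unicode Character Database plus canonical reordering of combining marks, so real code would use a library such as ICU rather than anything like this. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy canonical decomposition table (illustration only, NOT real NFD). */
static const struct { uint32_t composed, base, mark; } toy_nfd[] = {
    {0x00C5, 0x0041, 0x030A}, /* Å -> A + COMBINING RING ABOVE */
    {0x00E1, 0x0061, 0x0301}, /* á -> a + COMBINING ACUTE ACCENT */
    {0x00E9, 0x0065, 0x0301}, /* é -> e + COMBINING ACUTE ACCENT */
};

/* Decompose n code points from in into out (room for 2*n needed).
 * Returns the decomposed length. */
static size_t toy_decompose(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        bool replaced = false;
        for (size_t k = 0; k < sizeof toy_nfd / sizeof toy_nfd[0]; k++) {
            if (in[i] == toy_nfd[k].composed) {
                out[len++] = toy_nfd[k].base;
                out[len++] = toy_nfd[k].mark;
                replaced = true;
                break;
            }
        }
        if (!replaced)
            out[len++] = in[i];
    }
    return len;
}

/* Decompose both sides, then compare code point by code point.
 * Inputs longer than 64 code points are out of scope for this sketch. */
bool toy_canonically_equal(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb)
{
    uint32_t da[128], db[128];
    if (na > 64 || nb > 64)
        return false;
    size_t la = toy_decompose(a, na, da);
    size_t lb = toy_decompose(b, nb, db);
    if (la != lb)
        return false;
    for (size_t i = 0; i < la; i++)
        if (da[i] != db[i])
            return false;
    return true;
}
```

With it, U+00C5 and U+0041 U+030A compare equal, while a plain code-point comparison of the same two arrays says they differ, which is the NTFS behaviour described earlier.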
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
I am not sure whether it is, but if it is related to Han unification, it is not really a problem as long as the application uses the appropriate font depending on whether it is dealing with Japanese or Chinese. Which is not good, but can be worked around.
I believe that works… provided you're not dealing with family names in the relevant culture, as those have a tendency to use characters which are now obsolete in general usage. The whole thought of this makes me far more depressed than dealing with emoji, as people get understandably touchy about being able to write their names properly.
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
the comparison used does not follow Unicode equivalence rules: the composed and decomposed forms are considered different.
I dare you to find a piece of software that actually considers those equal.
MacOS does, and not just in its filesystem, at least as far as editing them in a text field is concerned. Here are two Ås in Unicode Code Converter:
These take only two presses of Backspace to delete entirely; it's not that the first press deletes the right Å and the second turns the left one into an a without accent.
Though, of course, you might mean "considers those equal behind the scenes," in which case you might be right, I don't know.
-
@gurth said in Big list of software that cannot handle spaces or accents in paths:
@gąska said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
the comparison used does not follow Unicode equivalence rules: the composed and decomposed forms are considered different.
I dare you to find a piece of software that actually considers those equal.
MacOS does, and not just in its filesystem, at least as far as editing them in a text field is concerned. Here are two Ås in Unicode Code Converter:
These take only two presses of Backspace to delete entirely; it's not that the first press deletes the right Å and the second turns the left one into an a without accent.
Though, of course, you might mean "considers those equal behind the scenes," in which case you might be right, I don't know.
Mac OS stores filenames as some Apple-specific variant of UTF-16 NFD form.
-
@magnusmaster said in Big list of software that cannot handle spaces or accents in paths:
@asdf said in Big list of software that cannot handle spaces or accents in paths:
I don't want to derail the discussion too much, but I hate blanket statements like that. You're missing half of the point of democracy, which is not only about the right to vote, but also about the right to get involved and get elected yourself.
If you want to run for office, you are probably a psychopath. Most people do not want to risk losing friends and family
and your job, apparently:
-
@marczellm wow, that's an article full of names that are very appropriate for this topic.
-
I just saw that the latest stable release of Anaconda* is 5.0.1. Since I've got 4.x installed, I went to check out the release notes:
Spaces are no longer allowed in the installation path on Windows.
(emphasis mine)
Welcome to 1995, we've got long file names now. I assume this means it was always broken, but now at least their installer prevents you from installing it in a path it can't handle. That's, um, something?!
This must be the "throw your hands in the air and just give up" approach to software development.
*Anaconda is a "python distribution". Probably comes with lots of cool stuff, but I mainly use it to install locally as an unprivileged user without a ton of manual futzing.
-
@topspin said in Big list of software that cannot handle spaces or accents in paths:
Probably comes with lots of cool stuff
Not really. It's just a Python distribution…
-
Throwback to when the Steve-o was still in charge:
-
@bb36e said in Big list of software that cannot handle spaces or accents in paths:
Throwback to when the Steve-o was still in charge:
?
Oh, throwback. You meant the other definition of the word....
-
Recently linked elsethread:
-
@tsaukpaetra What does "complex hard disk setups" mean?
-
@gurth said in Big list of software that cannot handle spaces or accents in paths:
@tsaukpaetra What does "complex hard disk setups" mean?
I would assume having more than one hard drive? (Like what was possible in old Mac Pro) Or maybe having a case-sensitive file system (even though that breaks a lot of stuff).
-
@gurth said in Big list of software that cannot handle spaces or accents in paths:
@tsaukpaetra What does "complex hard disk setups" mean?
Having more than just one partition named "Mac HD"
-
@zemm said in Big list of software that cannot handle spaces or accents in paths:
I would assume having more than one hard drive? (Like what was possible in old Mac Pro)
That's more than one internal drive. Though I've never owned a Mac Pro, I'm not sure how that would really differ from having one internal and one or more external ones, given that macOS is a Unix and so mounts everything under a directory of the root filesystem (/Volumes being the default). Why software would break because of something like this, I'm not sure, but it sounds like a .
Or maybe having a case-sensitive file system (even though that breaks a lot of stuff).
That is indeed seriously a bad idea. I tried it once, long ago, and even Opera wouldn't run, IIRC.
@tsaukpaetra said in Big list of software that cannot handle spaces or accents in paths:
Having more than just one partition named "Mac HD"
Not sure if you're serious… But just to be on the safe side, I just tried it with some disk images: I made one, named the volume Test, and mounted it. Then I copied the disk image and mounted that too. Here's what results:
$ ls -l /Volumes/
total 40
lrwxr-xr-x 1 root wheel 1 8 feb 17:24 Macintosh HD -> /
drwxr-xr-x 5 gurth staff 238 26 feb 19:19 Test
drwxr-xr-x 5 gurth staff 238 26 feb 19:20 Test 1
But it does show a minor WTF elsewhere:
-
@gurth said in Big list of software that cannot handle spaces or accents in paths:
macOS is a Unix and so mounts everything under a directory of the root filesystem (/Volumes being the default). Why software would break because of something like this, I'm not sure, but it sounds like a .
From the original description of the bug fix it would indeed be a . I never said Mac OS itself couldn't handle multiple drives, just that it is more complex than one. These developers already demonstrated that there were bugs around install paths so this would add to it.
I used to use a Mac Pro with several terabytes of hard drive space for video editing and rendering. All that software was fine.
-
Silk. Apparently.
Edit: Fixed. Fuckers.
-
This Emacs bug is ridiculous.
-
@marczellm said in Big list of software that cannot handle spaces or accents in paths:
This Emacs bug is ridiculous.
I "love" how it seems to depend on whether during some sort of startup phase it inspects where it is currently and tries to guess how to interpret filenames elsewhere based on that (seemingly incorrect in some cases) initial guess. That's… baroque and troublesome.
-
@dkf smells like leakage of Linux "filenames are sequences of binary bytes" mantra. I can see some bright spark crafting a "read the path to the working directory -> if it contains valid UTF-8 chars, treat all filenames as UTF-8 -> otherwise, treat all filenames as ISO-8859-1" routine and merrily applying it to NTFS as well, despite NTFS using UTF-16.
-
@dcoder I'd hazard the ntfs filesystem driver transcodes between utf-8 and utf-16 as needed.
-
@pleegwat I was thinking that emacs path building code, after checking the working dir and not finding any UTF-8 symbols there, might think all future filenames, including "c:/Users/M\uC3A1rton/", are ISO-8859-1 strings that need to be converted to UTF-8 or -16.
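A sketch in C of the speculated failure, assuming the heuristic described above (this is not Emacs's actual code; both function names are hypothetical). looks_like_utf8 is a structural check only, and latin1_to_utf8 shows what happens when bytes that were already UTF-8 get treated as ISO-8859-1 and converted again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Structural UTF-8 check: lead bytes followed by the right number of
 * continuation bytes. Does not reject overlong forms or surrogates. */
bool looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        size_t extra;
        if (*p < 0x80)                extra = 0;
        else if ((*p & 0xE0) == 0xC0) extra = 1;
        else if ((*p & 0xF0) == 0xE0) extra = 2;
        else if ((*p & 0xF8) == 0xF0) extra = 3;
        else return false;            /* stray continuation or invalid lead byte */
        p++;
        for (size_t i = 0; i < extra; i++, p++)
            if ((*p & 0xC0) != 0x80)  /* also stops at the terminating NUL */
                return false;
    }
    return true;
}

/* Re-encode bytes as if they were ISO-8859-1: every byte >= 0x80 becomes a
 * two-byte UTF-8 sequence. out needs room for 2 * strlen(in) + 1 bytes. */
void latin1_to_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    unsigned char *q = (unsigned char *)out;
    for (; *p; p++) {
        if (*p < 0x80) {
            *q++ = *p;
        } else {
            *q++ = (unsigned char)(0xC0 | (*p >> 6));
            *q++ = (unsigned char)(0x80 | (*p & 0x3F));
        }
    }
    *q = '\0';
}
```

Feeding the UTF-8 bytes of "Márton" through latin1_to_utf8 yields "MÃ¡rton": still perfectly valid UTF-8, just the wrong text, which is why this kind of bug tends to surface only as a "file not found".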
-
Of course none of you attempts to find the relevant code yourself, instead making wild buttumptions.
Which is what makes this forum fun. Carry on. :)
I don't know if you noticed this, but a very similar bug was fixed after I reported it: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=25038
-
@dcoder said in Big list of software that cannot handle spaces or accents in paths:
@dkf smells like leakage of Linux "filenames are sequences of binary bytes" mantra. I can see some bright spark crafting a "read the path to the working directory -> if it contains valid UTF-8 chars, treat all filenames as UTF-8 -> otherwise, treat all filenames as ISO-8859-1" routine and merrily applying it to NTFS as well, despite NTFS using UTF-16.
Well, that would be just as wrong in Linux. Filenames are sequences of bytes means that:
- you avoid recoding them and
- you display them according to the locale.
@pleegwat said in Big list of software that cannot handle spaces or accents in paths:
@dcoder I'd hazard the ntfs filesystem driver transcodes between utf-8 and utf-16 as needed.
Windows doesn't use UTF-8. It has a UTF-16LE version of the interface and an 8-bit version using the legacy charset according to the system locale.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@dcoder said in Big list of software that cannot handle spaces or accents in paths:
@dkf smells like leakage of Linux "filenames are sequences of binary bytes" mantra. I can see some bright spark crafting a "read the path to the working directory -> if it contains valid UTF-8 chars, treat all filenames as UTF-8 -> otherwise, treat all filenames as ISO-8859-1" routine and merrily applying it to NTFS as well, despite NTFS using UTF-16.
Well, that would be just as wrong in Linux. Filenames are sequences of bytes means that:
- you avoid recoding them and
- you display them according to the locale.
@pleegwat said in Big list of software that cannot handle spaces or accents in paths:
@dcoder I'd hazard the ntfs filesystem driver transcodes between utf-8 and utf-16 as needed.
Windows doesn't use UTF-8. It has a UTF-16LE version of the interface and an 8-bit version using the legacy charset according to the system locale.
Other way round.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
you display them according to the locale.
Virtually all Unix platforms now use locales that have UTF-8 as their encoding. The only troublesome paths are on legacy filesystems where no mount options were passed in to persuade the filesystem driver to do the translation for you. (In that particular case, there's not really all that much that applications can do as shit's just plain fucked on disk and in the OS.)
They have a utf-16-le version of the interface
And that's what applications should use all the time. Nobody then cares what the app does internally as long as it talks to the OS correctly. Using the legacy 8-bit stuff is in this day and age.
-
@pleegwat said in Big list of software that cannot handle spaces or accents in paths:
Other way round.
Which other way?
-
@marczellm said in Big list of software that cannot handle spaces or accents in paths:
This Emacs bug is ridiculous.
I would have thought Emacs was "perfect" by now: I'm surprised that it's still actively developed since "everyone" uses vim instead. And also surprised that the discussion was relatively helpful, even if following the standard pattern of "works on my machine. oh wait, it doesn't. here, try this fix".