Big list of software that cannot handle spaces or accents in paths
-
@cark said in Big list of software that cannot handle spaces or accents in paths:
@Grunnen said in Big list of software that cannot handle spaces or accents in paths:
@ben_lubar That's also find-specific.
Now try to make this work in a whitespace-proof way:
$ rm `tar -tf archive.tar`
#!/usr/bin/env python3
import tarfile, os
with tarfile.open('./file.tar', 'r') as f:
    for name in f.getnames():
        os.unlink(name)
It's 2017, why are you still writing shell scripts for anything non-trivial?
To show why this mentality of "you should just not use whitespace in file names" is so widespread on an OS that technically has been supporting whitespace in file names for decades, that's why.
-
@LB_ said in Big list of software that cannot handle spaces or accents in paths:
What is the sound of one hand clapping?
-
@Grunnen tar -tf file.tar | while read file;do if [ -f "$file" ] ; then rm "$file" ; fi; done
It's not a good thing that you have to jump through these hoops, but it is possible (and I'm not sure if that works with filenames with newlines in them -- quick check, no it doesn't).
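For comparison, here is a minimal sketch of the same "delete everything the archive lists" job done through Python's tarfile module, which hands back each member name as a whole string instead of a stream of lines. The helper names (build_archive, remove_listed_files) and the sample file names are made up for illustration, and it assumes a filesystem that allows newlines in names (so not Windows). It does not guard against absolute paths or ".." in member names, which a real script should.

```python
import io
import os
import tarfile
import tempfile


def build_archive(names):
    """Return the bytes of an in-memory tar holding one empty file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


def remove_listed_files(archive_bytes, root):
    """Unlink every regular file named in the archive, relative to root."""
    removed = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            path = os.path.join(root, member.name)
            if member.isfile() and os.path.isfile(path):
                os.unlink(path)
                removed.append(member.name)
    return removed


names = ["plain.txt", "with space.txt", "with\nnewline.txt"]
archive = build_archive(names)

# What a line-oriented reader sees: the newline splits one name into two.
line_split = "\n".join(names).splitlines()

with tempfile.TemporaryDirectory() as root:
    for name in names:
        open(os.path.join(root, name), "w").close()
    removed = remove_listed_files(archive, root)
    leftovers = os.listdir(root)
```

The member names never pass through a newline-delimited text stream, so there is nothing for whitespace to break.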
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
@ben_lubar said in Big list of software that cannot handle spaces or accents in paths:
Any text file that starts with #!/ is a shell script.
Any file that starts with that is a script. It'd need to be executing with something that calls itself a shell for it to be a shell script. (For example, you can use it with Perl and then it would be a perl script since nobody sane calls Perl a shell.)
well, no one sane uses Go, so i can see where @blubar got confused.
:-p
-
@gwowen Filenames can have newlines in. They're not even the most evil thing that can be in there either; ANSI escape sequences are also legal…
-
@dkf I know that. That's why I checked.
Indeed, that is HOW I checked. I made some files with newlines in the filename, tested how well the "while read" loop worked. (Not well)
Given that I tested explicitly whether it worked with filenames with newlines, and reported the result, I'm slightly confused as to why you felt the need to tell me that such things were possible.
-
To vindicate @blakeyrat's point about CLIs, I'll re-tell a personal anecdote:
There's a partitioning program you've probably used called GParted. It's the standard in all GTK-based distros, and it actually has a really good UI in my opinion, one of the few examples of nice and properly designed Linux programs.
When you make a change to a partition, it adds that change to a "pending actions" list, and when you click a button it applies all the changes in the list. Actions are decomposed into their sub-actions (e.g. resize a partition = check file system, resize partition, resize file system) and every action is done by calling an external command line program or a library function (and you can see the output of all those actions).
So, I tried to resize an NTFS partition. First it successfully modified the partition limits, then it called the command line program ntfsresize to resize the file system in it... however, the ntfsresize developers had recently added an "are you sure you wish to continue?" prompt, which GParted was not programmed to expect. It detected an unknown output and stopped the entire operation, leaving a partially-resized partition. Oops.
And there's the problem. Programmers should be able to change their programs' human interfaces without breaking other programs.
-
@anonymous234 said in Big list of software that cannot handle spaces or accents in paths:
Programmers should be able to change their programs' human interfaces without breaking other programs.
If the only interface to your program is its CLI, then your program's human and machine interface are one and the same.
And -let's be real here- there are humans out there that are no better at handling unexpected "workflow"-breaking CLI changes than machines.
-
@bugmenot cli is used by shell scripts too, you shouldn't change it for no good reason
-
Is this going to be one of those threads where everyone says the same thing at each other, angrily, as if to prove that they disagree?
-
@Yamikuronue Don't be stupid. This is obviously a thread where everyone's statements are the same thing so they angrily say them to each other because they want to disagree!
-
@Yamikuronue said in Big list of software that cannot handle spaces or accents in paths:
Is this going to be one of those threads where everyone says the same thing at each other, angrily, as if to prove that they disagree?
And it was going well until you came in and changed the subject
-
@Yamikuronue said in Big list of software that cannot handle spaces or accents in paths:
Is this going to be one of those threads where everyone says the same thing at each other, angrily, as if to prove that they disagree?
pretty sure that there will be a couple of denizens that prove POE's law too.... but yeah the majority will be fighting to agree harder than everyone else.
-
@bugmenot said in Big list of software that cannot handle spaces or accents in paths:
If the only interface to your program is its CLI, then your program's human and machine interface are one and the same and you're an idiot.
FTFY
You should at minimum have an actual API, and not be like stupid git.
-
@Steve_The_Cynic said in Big list of software that cannot handle spaces or accents in paths:
@Yamikuronue said in Big list of software that cannot handle spaces or accents in paths:
Is this going to be one of those threads where everyone says the same thing at each other, angrily, as if to prove that they disagree?
I believe the phrase you're looking for is "violently agreeing".
-
The news just came in that Python joined the gang:
-
Well that's the Python build system, rather than python itself. And the reason it barfs is because two different parts of Visual C (the compiler and the linker) output their text in two different encodings (UTF-8 and ANSI-OEM code page).
So, if you pipe the output from VC and attempt to parse, it changes encoding in midstream, and - mystifyingly I know - Python detects this as an invalidly encoded text stream. So those patches are a hack to detect VC doing this and adjust encoding on the fly.
But, yeah, claim that that means "Python can't cope with accents in filenames" if you must.
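To make the failure concrete, here is a rough sketch of a stream that switches encoding partway through, plus a per-line fallback decoder. This is not the actual patch; decode_mixed, the sample paths, and the choice of cp1252 as the legacy code page are all assumptions for illustration.

```python
def decode_mixed(raw, fallback="cp1252"):
    """Decode line by line: try UTF-8 first, then a legacy code page."""
    lines = []
    for line in raw.splitlines():
        try:
            lines.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            lines.append(line.decode(fallback))
    return lines


# Hypothetical tool output: one line in UTF-8, the next in a legacy code page.
compiler_line = "c:\\Users\\Márk\\main.c: warning".encode("utf-8")
linker_line = "c:\\Users\\Márk\\main.obj: error".encode("cp1252")
stream = compiler_line + b"\r\n" + linker_line + b"\r\n"

# Decoding the whole stream as UTF-8 fails on the legacy-encoded line.
try:
    stream.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

decoded = decode_mixed(stream)
```

The fallback is only a heuristic: a legacy-encoded line whose bytes happen to form valid UTF-8 would be silently decoded the wrong way.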
-
@marczellm
Reading the bug reports and pull requests made my head hurt. I'd say it's time to kill legacy (i.e. non UTF-X) encodings, once and for all, to stop this madness.
-
@asdf said in Big list of software that cannot handle spaces or accents in paths:
I'd say it's time to kill legacy (i.e. non UTF-X) encodings, once and for all, to stop this madness.
I agree. Unfortunately Microsoft jumped on the wide char bandwagon back when UCS-2 still seemed like it would be enough, and ended up with the WTF-16 encoding. They've invested so much in it that they now don't want to make another encoding transition, with an option to configure the “multi-byte character set” to UTF-8. But meanwhile the rest of the world chose utf-8, because it is actually reasonable.
-
@Bulb
If at least that was used consistently on Windows, that'd be a step in the right direction. But apparently, the various legacy codepages are alive and well; and build systems will happily mix output in different encodings.
-
@Bulb Totally true. They picked an encoding in good faith with the knowledge at the time, and the world moved underneath them. POSIX lucked into UTF-8 as the thing that mostly works OK with C-strings. And MS know that someone somewhere is using some bizarre console app that relies on some obscure OEM codepage and are loath to break it. No-one is to blame, and everyone loses, but we seem to be stuck with it.
-
@accalia said in Big list of software that cannot handle spaces or accents in paths:
pretty sure that there will be a couple of denizens that prove POE's law too.
But here we won't be able to agree on which posts do so.
-
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
They picked an encoding in good faith with the knowledge at the time, and the world moved underneath them.
These terrible stupid developers. Won't get out of their 1990s mindsets!
-
I can't name this one, but someone here is developing a service that:
- expects input in ASCII and was documented so
- its input comes from Java services, where developers frequently forget about encoding
- it counts the number of characters in some strings, and forwards that as the byte length in some binary format that is forwarded to a C program
- when it receives a mis-encoded string, my C program breaks
- the developer claims it's not his problem, because the input should always be ASCII
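The character-count-as-byte-length bug can be sketched in a few lines. The wire format here (a 2-byte big-endian length prefix followed by a UTF-8 payload) and the function names are invented for illustration; the real service's format isn't described above.

```python
import struct


def frame_by_char_count(text):
    """Buggy framing: the length prefix counts characters, not bytes."""
    payload = text.encode("utf-8")
    return struct.pack(">H", len(text)) + payload


def frame_by_byte_count(text):
    """Correct framing: the length prefix counts the encoded bytes."""
    payload = text.encode("utf-8")
    return struct.pack(">H", len(payload)) + payload


def read_frame(frame):
    """Read the 2-byte big-endian length, then that many payload bytes."""
    (length,) = struct.unpack(">H", frame[:2])
    return frame[2:2 + length]


name = "Márk"  # 4 characters, but 5 bytes once encoded as UTF-8
truncated = read_frame(frame_by_char_count(name))
complete = read_frame(frame_by_byte_count(name))
```

With pure ASCII the two counts agree, which is presumably why nobody notices until the first accented character shows up. After that the reader stops one byte short for every extra byte in the encoding, and whatever follows in the stream gets misparsed.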
-
@Bulb said in Big list of software that cannot handle spaces or accents in paths:
But meanwhile the rest of the world chose utf-8, because it is actually reasonable.
... except Java, which also uses UTF-16. (Or perhaps WTF-16, I don't use Java myself.)
Basically, anybody who was designing a big new system around the 1994-1996 timeframe is likely to have come across UTF-16 as the best solution.
It's not some conspiracy, it's like that Raymond Chen example of the kid who asked why they didn't use the space shuttle to rescue Apollo 13: nothing like it existed at the time.
(It was also pre-web-- when it was unlikely any networked computer would have to talk to another networked computer of a different type. Macintoshes talked to Macintoshes, Suns talked to Suns, Windows talked to Windows or, in extraordinary cases, Novell Netware.)
The weirdest thing is why Python doesn't, since it was invented around that time. (I assume it went through a few years of "nobody thought about character encoding whatsoever" before settling on the now-usable UTF-8.)
-
Also this XAMPP installer leaves a bunch of shit around in the drive root. I hate this stuff.
-
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
@accalia said in Big list of software that cannot handle spaces or accents in paths:
pretty sure that there will be a couple of denizens that prove POE's law too.
But here we won't be able to agree on which posts do so.
well obviously. that is rather what POE's law is about.
or is that one of the corollaries?
-
@marczellm that's an interesting date format. I typically just opt for ISO-8601.
-
@accalia said in Big list of software that cannot handle spaces or accents in paths:
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
@accalia said in Big list of software that cannot handle spaces or accents in paths:
pretty sure that there will be a couple of denizens that prove POE's law too.
But here we won't be able to agree on which posts do so.
well obviously. that is rather what POE's law is about.
or is that one of the corollaries?
Eh...some people will find stuff outrageous and others will think the same stuff is obviously correct.
-
@LB_ said in Big list of software that cannot handle spaces or accents in paths:
@marczellm that's an interesting date format. I typically just opt for ISO-8601.
This is the format that Hungarians expect to see. I didn't set it, it came with Windows language settings. I'll keep it though.
-
@marczellm said in Big list of software that cannot handle spaces or accents in paths:
This is the format that Hungarians expect to see.
Weirdos!
-
@marczellm I've run into that too (forget what package - I don't do XAMPP). I think they're including the vs runtime and installing it from C: instead of %TEMP%. Basically, they're idiots.
-
@asdf said in Big list of software that cannot handle spaces or accents in paths:
@marczellm said in Big list of software that cannot handle spaces or accents in paths:
This is the format that Hungarians expect to see.
Weirdos!
I used to work with a Hungarian. Can confirm.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
... except Java, which also uses UTF-16. (Or perhaps WTF-16, I don't use Java myself.)
That's for internal data I think. Data on disk/network is different and there's usually a well-defined moment where one changes into the other.
The biggest problems in this area actually come when dealing with filesystems where somehow you've wound up with different encodings for different parts of the same filename. There's all sorts of things that can cause this (and it is a little more common on Unixes because of the single-rooted filesystem tree) but the root cause is joining things that were written independently with different conventions (memory sticks were the classic examples), and it is a complete ass. The only reasonable fix I know is to force everything to “use ISO 8859-1” (i.e., just the bytes) and hope that the resulting zalgo doesn't annoy people too much. Most users are actually pretty OK with it; if they get their actual data, they don't care nearly so much about the names.
It's a difficult problem. Fortunately not too common any more.
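A small sketch of why the “use ISO 8859-1” trick works: every byte value maps to exactly one character in that encoding, so decoding never fails and re-encoding gives back the original bytes. The path below is a made-up example with a UTF-8 directory part and a cp1252 file part.

```python
# Hypothetical path: directory name written as UTF-8, file name as cp1252.
raw_name = "dir-é".encode("utf-8") + b"/" + "file-é".encode("cp1252")

# As a whole the name is not valid UTF-8.
try:
    raw_name.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# ISO 8859-1 decodes any byte sequence; the UTF-8 part turns into mojibake.
as_text = raw_name.decode("iso-8859-1")

# Encoding back yields the exact original bytes, so the file stays reachable.
round_trip = as_text.encode("iso-8859-1")
```

The displayed name is ugly, but nothing is lost. Python's own surrogateescape error handler is another way to get the same lossless round trip.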
(It was also pre-web-- when it was unlikely any networked computer would have to talk to another networked computer of a different type. Macintoshes talked to Macintoshes, Suns talked to Suns, Windows talked to Windows or, in extraordinary cases, Novell Netware.)
The web was going back then, and had been around for a few years. 1994 was when it started to really commercialise. And it's been going downhill ever since!
-
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
These terrible stupid developers. Won't get out of their 1990s mindsets!
I have literally no idea what this is supposed to mean.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@Bulb said in Big list of software that cannot handle spaces or accents in paths:
But meanwhile the rest of the world chose utf-8, because it is actually reasonable.
... except Java, which also uses UTF-16. (Or perhaps WTF-16, I don't use Java myself.)
According to Oracle's documentation, Java does indeed use UTF-16. As does .NET, as it turns out.
-
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
These terrible stupid developers. Won't get out of their 1990s mindsets!
I have literally no idea what this is supposed to mean.
I'm recycling a blakeymeme.
Explaining the joke
Blakey likes to excoriate people over stuff like CLIs, which he describes as outdated 1970s technology.
-
@RaceProUK said in Big list of software that cannot handle spaces or accents in paths:
According to Oracle's documentation, Java does indeed use UTF-16. As does .NET, as it turns out.
Sounds like .Net uses WTF-16 instead:
Note that, because a String instance consists of a sequential collection of UTF-16 code units, it is possible to create a String object that is not a well-formed Unicode string. For example, it is possible to create a string that has a low surrogate without a corresponding high surrogate. Although some methods, such as the methods of encoding and decoding objects in the System.Text namespace, may performs checks to ensure that strings are well-formed, String class members do not ensure that a string is well-formed.
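The same kind of ill-formed string can be reproduced outside .NET. As a rough illustration in Python (whose str is not UTF-16 internally, so this is an analogy and not the .NET behaviour itself), a lone low surrogate is a perfectly legal str value that only causes trouble once something tries to encode it strictly:

```python
lone_low = "\udc00"  # low surrogate with no preceding high surrogate
ill_formed = "abc" + lone_low

# Strict UTF-8 refuses to encode an unpaired surrogate.
try:
    ill_formed.encode("utf-8")
    strict_encodes = True
except UnicodeEncodeError:
    strict_encodes = False

# The surrogatepass handler writes it out anyway and can read it back.
wobbly = ill_formed.encode("utf-8", errors="surrogatepass")
restored = wobbly.decode("utf-8", errors="surrogatepass")
```

The string type happily holds the value; the well-formedness check only happens at the encoding boundary, much as the quoted passage describes.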
Speaking of, I got curious and tried googling WTF-16, and I found this:
-
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
I'm recycling a blakeymeme.
OK, fair enough. Wasted on me, I'm afraid.
-
@gwowen how do you expect us to know that? Do we look like mind readers?
-
@RaceProUK said in Big list of software that cannot handle spaces or accents in paths:
According to Oracle's documentation, Java does indeed use UTF-16. As does .NET, as it turns out.
What, did you think I was lying to you? Christ.
-
@blakeyrat It sounded like you weren't sure whether it was pure UTF-16 or not, so I checked
-
@RaceProUK Saying I wasn't sure if it was UTF-16 or WTF-16 is this thing called a "joke". You see those sometimes on comedy sites.
-
@marczellm said in Big list of software that cannot handle spaces or accents in paths:
This is the format that Hungarians expect to see.
I would've thought something like
yr2017 mo06 dy19 hr10:mn26:sc44
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@RaceProUK Saying I wasn't sure if it was UTF-16 or WTF-16 is this thing called a "joke". You see those sometimes on comedy sites.
I was unaware this was a comedy site...
-
@Tsaukpaetra Half comedy site, half insane asylum. Depends on which poster you're talking to.
-
@hungrier said in Big list of software that cannot handle spaces or accents in paths:
yr2017 mo06 dy19 hr10:mn26:sc44
damnit i was going to make that joke but then I saw you already did it
@Medinoc said in Big list of software that cannot handle spaces or accents in paths:
Speaking of, I got curious and tried googling WTF-16, and I found this:
The WTF-8 encoding Wobbly Transformation Format
Is that the format that the TARDIS uses?
-
@LB_ said in Big list of software that cannot handle spaces or accents in paths:
@marczellm that's an interesting date format. I typically just opt for ISO-8601.
My favorite date format is YYYY-MM-dd, where YYYY is the year, MM is the minute, and dd is the day.
Most people will assume you meant ISO-8601 until it's too late!
-
@ben_lubar yeap, 2017-36-19 gets them every time?
ORA-01843: not a valid month
Filed Under: Oracle Hate Club
-
@darkmatter second time's a charm