Big list of software that cannot handle spaces or accents in paths
-
@tsaukpaetra said in Big list of software that cannot handle spaces or accents in paths:
Silk. Apparently.
Edit: Fixed. Fuckers.
Is there a batch file equivalent to set -e? You probably don't want ON ERROR RESUME NEXT.
-
@ben_lubar said in Big list of software that cannot handle spaces or accents in paths:
Is there a batch file equivalent to set -e? You probably don't want ON ERROR RESUME NEXT.
I don't think so. I tend to do an
IF ERRORLEVEL 1 EXIT /B %ERRORLEVEL%
after important commands.
-
@ben_lubar said in Big list of software that cannot handle spaces or accents in paths:
@tsaukpaetra said in Big list of software that cannot handle spaces or accents in paths:
Silk. Apparently.
Edit: Fixed. Fuckers.
Is there a batch file equivalent to set -e? You probably don't want ON ERROR RESUME NEXT.
It doesn't matter here, it's just updating a version string or something (failing) and then moving on.
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
Virtually all Unix platforms now use locales that have UTF-8 as their encoding.
Right; but there's no way for software to ask about that and it's not in the POSIX specifications, so you just have to kind of guess and check and pray.
-
@blakeyrat Yes, there is.
nl_langinfo(CODEPAGE)
Ok, this is what I get for typing from memory. Of course the constant is called CODESET.
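For anyone who wants to poke at it without writing C, here is a minimal sketch in Python, whose locale module wraps the same POSIX call on Unix-like systems (it is not available on Windows). The helper name current_codeset is made up for this example.

```python
import locale


def current_codeset() -> str:
    """Return the codeset name of the current locale, as nl_langinfo(CODESET) reports it."""
    try:
        # Adopt the user's locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale this system does not have;
        # stay in whatever locale is already active.
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(current_codeset())
```

Note that this only reports the encoding of the current locale. It says nothing about how the names on any particular disk were written.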
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@blakeyrat Yes, there is.
nl_langinfo(CODEPAGE)
HE WASN'T ASKING TO BE CORRECTED
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@blakeyrat Yes, there is. nl_langinfo(CODEPAGE)
So there's a magical undocumented command that does it. Great.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@blakeyrat Yes, there is. nl_langinfo(CODEPAGE)
So there's a magical undocumented command that does it. Great.
How is something that has a manpage and specification (yes, that is the POSIX specification) undocumented?
-
@bulb nl_langinfo can answer "What locale is in use right now?", but in my scenario we want an answer to "I have a sequence of bytes that represents a filename, "/usr/M\uC3A1rton/", is it in UTF-8 or some other encoding?", to support older/snowflake filesystems.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
How is something that has a manpage and specification (yes, that is the POSIX specification) undocumented?
Because he says you pass in CODEPAGE and that option's not listed in either of the documentation sources people have linked to in this thread yet?
If CODEPAGE is valid and the correct way to do this, well, great, but if it's documented, you guys sure haven't provided any evidence of that. And linking to more sources of documentation that don't mention it isn't really going to help.
Oh and as an added aside: if this is how well Linux users can read documentation, no wonder their software's all so shitty!
But all that aside, the function description says it gives locale information for the current locale. What happens if I mount a disk that was created in a different locale? Does the OS just magically re-encode all its filenames on mount (how would the OS know the encoding of the filenames?), or does this function have (an undocumented) option to get that information specific to a disk in the filesystem?
So even if the function does what the documentation says, and CODEPAGE is a magical undocumented parameter that returns the default text encoding for the system, you still have no way of obtaining the text encoding for a particular disk. Which is something that'd be needed to solve the problem I'm talking about. At minimum; there's probably other cases I haven't considered.
Since people can, you know, move disks between computers.
AFAICT my statement was 100% accurate. But I'm sure I'm still somehow the idiot, right?
-
@dcoder said in Big list of software that cannot handle spaces or accents in paths:
@bulb nl_langinfo can answer "What locale is in use right now?", but in my scenario we want an answer to "I have a sequence of bytes that represents a filename, "/usr/M\uC3A1rton/", is it in UTF-8 or some other encoding?", to support older/snowflake filesystems.
Wow dcoder if you write stuff like this, you're going to get shunned by all the other open source developers who just write bullshit, hope and pray it works, and never look at the spec. It's almost as if you spent a few seconds thinking about the problem... surely a no-no!
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
What happens if I mount a disk that was created in a different locale?
Does the disk include a description of what encoding (or locale) was used to create the filenames on it? If not, all operating systems have to guess. The guess might be informed by user choices in some config system somewhere, but often isn't.
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
If not, all operating systems have to guess.
They don't if the filename encoding is specified in the disk format specs. Or if it's logged to the disk in some neutral unambiguous way. (Like the UTF BOM characters.)
Now if you mean "you put a shitty broken Linux disk into a Windows machine", well, yeah, that disk is going to be a problem no matter where it goes. But Windows probably won't guess any worse than any Linux distro will guess-- my main complaint is that it has to be a guess in the first place, because apparently the people who "designed" this stuff were idiots.
-
@dcoder said in Big list of software that cannot handle spaces or accents in paths:
@bulb nl_langinfo can answer "What locale is in use right now?", but in my scenario we want an answer to "I have a sequence of bytes that represents a filename, "/usr/M\uC3A1rton/", is it in UTF-8 or some other encoding?", to support older/snowflake filesystems.
There's no way to tell. Either you know what encoding it is from the source you got that string from, or you don't and the best you can do is educated guessing. Like "would it decode as valid UTF-8?" or "if it's not UTF-8, does it contain any codes between 0x01..0x1f (probably some exotic crap you don't want to know) or 0x7f..0x9f (perhaps CP1252)?" But all of that is bound to break somewhere.
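To make the "educated guessing" concrete, here is a rough sketch in Python of those checks in that order. The function name and the labels it returns are invented for illustration, and as said above, it is a heuristic that is bound to break somewhere.

```python
def guess_filename_encoding(name: bytes) -> str:
    """Educated guess only; the returned labels are made up for this example."""
    if all(b < 0x80 for b in name):
        # Pure 7-bit names look the same in every ASCII-compatible encoding.
        return "ascii"
    try:
        name.decode("utf-8")  # strict decoding raises on malformed sequences
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if any(0x01 <= b <= 0x1F for b in name):
        # Control codes in a non-UTF-8 name: something exotic.
        return "unknown"
    if any(0x7F <= b <= 0x9F for b in name):
        # ISO 8859-x keeps this range for controls; CP1252 puts printable characters there.
        return "cp1252?"
    return "unknown 8-bit"


if __name__ == "__main__":
    for sample in (b"/usr/Marton/", "/usr/M\u00e1rton/".encode("utf-8"), b"/usr/M\xe1rton/"):
        print(sample, guess_filename_encoding(sample))
```

A Latin-1 name and a Latin-2 name both land in "unknown 8-bit", which is exactly the case where nothing in the bytes themselves can settle the question.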
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
Does the disk include a description of what encoding (or locale) was used to create the filenames on it? If not, all operating systems have to guess.
They don't if the filename encoding is specified in the disk format specs.
It's almost as if @dkf had mentioned exactly that.
Or if it's logged to the disk in some neutral unambiguous way. (Like the UTF BOM characters.)
You mean I can't start my filenames with "þÿ"!?
Now if you mean "you put a shitty broken Linux disk into a Windows machine", well, yeah, that disk is going to be a problem no matter where it goes.
Obviously, what with all the open-sour extfs drivers
-
@laoc said in Big list of software that cannot handle spaces or accents in paths:
It's almost as if @dkf had mentioned exactly that.
Sorry I didn't realize he owned the trademark.
-
Y'all could take a look at the proposed patch? It's extensively commented.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@laoc said in Big list of software that cannot handle spaces or accents in paths:
It's almost as if @dkf had mentioned exactly that.
Sorry I didn't realize he owned the trademark.
What you don't realize is that he owns the trademark!
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
How is something that has a manpage and specification (yes, that is the POSIX specification) undocumented?
Because he says you pass in CODEPAGE and that option's not listed in either of the documentation sources people have linked to in this thread yet?
Are you blind or what? First page linked by @Bulb:
Second page linked by @Bulb:
Oh and as an added aside: if this is how well Linux users can read documentation, no wonder their software's all so shitty!
Imagine how much worse it would be if they had YOUR reading comprehension!
But all that aside, the function description says it gives locale information for the current locale. What happens if I mount a disk that was created in a different locale? Does the OS just magically re-encode all its filenames on mount
Yes.
(how would the OS know the encoding of the filenames?)
The filesystem driver takes care of it. Some drivers support additional mount options that say what encoding the drive uses.
-
@gąska So CODEPAGE is the same thing as CODESET? Is that your assertion?
Short of telepathy, how was I expected to know that?
-
@blakeyrat as if you never made a typo.
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
@blakeyrat as if you never made a typo.
So the thing I saw on the screen is undocumented. And everything I posted here was exactly correct.
And, of course, somehow being wrong is my fault. I should have used my telepathic powers to determine it was a typo all along. Naturally.
Of course I don't even know that/if bulb made a typo. I'll change my mind when I hear it from the horse's mouth.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@gąska said in Big list of software that cannot handle spaces or accents in paths:
@blakeyrat as if you never made a typo.
So the thing I saw on the screen is undocumented.
No. The thing you saw on the screen is misspelled.
And, of course, somehow being wrong is my fault.
If you actually tried to read the provided link (the die.net one), the first thing you'd notice would be that CODESET does EXACTLY what @Bulb said CODEPAGE does. Just a minimum amount of good will.
I should have used my telepathic powers to determine it was a typo all along. Naturally.
Well, I never heard about nl_langinfo in my life before now, and somehow I figured it out. And I assure you I don't practice telepathy.
Of course I don't even know that/if bulb made a typo.
You do. I told you it. You even acknowledged I told you it. The only way you can claim you don't know if it's a typo or not is if you ignore this very conversation we're having right now.
I'll change my mind when I hear it from the horse's mouth.
Oh, so you are ignoring this conversation. And you wonder why people think you're an asshole.
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
No. The thing you saw on the screen is misspelled.
And I was supposed to know that how?
For the 58 millionth time, I am not telepathic.
@gąska said in Big list of software that cannot handle spaces or accents in paths:
Well, I never heard about nl_langinfo in my life before now, and somehow I figured it out.
You didn't figure it out; you just assumed. For all you know, Bulb didn't make a typo and there is an undocumented option.
I try not to assume, at least not without making the fact that I'm assuming obvious, so it would cover my ass if I were wrong.
@gąska said in Big list of software that cannot handle spaces or accents in paths:
You do. I told you it. You even acknowledged I told you it. The only way you can claim you don't know if it's a typo or not is if you ignore this very conversation we're having right now.
You're not the one who typed it. Unless you're telepathically reading Bulb's mind (maybe you are; I lack the telepathy you seem to have), then you don't know whether or not he made a typo.
Sure I get that you seem to think it's beyond question that you're 100% correct here, but I have no reason to believe you are. If A made a typo, then A should come in here and say "whoops that was a typo guys". I don't give a shit what you assume, I want the facts.
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
Just a minimum amount of good will.
Found the problem!
-
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
Found the problem!
Is the problem that the option is undocumented? Because I already pointed that out.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
Found the problem!
Is the problem that the option is undocumented?
Yes, good will on your part has never been documented.
-
@boomzilla So to be clear, in this case "good will" is assuming bulb is an idiot who doesn't type what he means to say. That would have been the "good will" option.
Assuming he's an intelligent human being is the "bad will" option.
Makes... sense?
In any case, I'm done with this conversation. Nothing I said in this thread was wrong. I won't believe Bulb made a typo until he says he did. And your definition of "good will" here is fucking moronic.
-
@blakeyrat Cathy Newman Good discussion!
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@gąska said in Big list of software that cannot handle spaces or accents in paths:
No. The thing you saw on the screen is misspelled.
And I was supposed to know that how?
I told you how already.
@gąska said in Big list of software that cannot handle spaces or accents in paths:
Well, I never heard about nl_langinfo in my life before now, and somehow I figured it out.
You didn't figure it out; you just assumed.
When the assumption is correct, it's called figuring out.
For all you know, Bulb didn't make a typo and there is an undocumented option.
For all we know, that webpage might be fake and not actually how POSIX works. For all we know, you might be in a coma and your entire life is just a dream. While certainly possible, all three are extremely unlikely to be true, and therefore a complete waste of time to consider until more evidence comes up.
I try not to assume, at least not without making the fact that I'm assuming obvious, so it would cover my ass if I were wrong.
You assumed such an essential part of system API might be undocumented. If you ask me, that's a much bolder assumption than that people make mistakes.
@gąska said in Big list of software that cannot handle spaces or accents in paths:
You do. I told you it. You even acknowledged I told you it. The only way you can claim you don't know if it's a typo or not is if you ignore this very conversation we're having right now.
You're not the one who typed it. Unless you're telepathically reading Bulb's mind (maybe you are; I lack the telepathy you seem to have), then you don't know whether or not he made a typo.
I never met Albert Einstein or read any of his papers, yet I know for sure he's the author of the theory of general relativity.
Sure I get that you seem to think it's beyond question that you're 100% correct here, but I have no reason to believe you are.
There's a spec. It defines CODESET. It does exactly what @Bulb says CODEPAGE does. What more evidence do you need?
If A made a typo, then A should come in here and say "whoops that was a typo guys".
I'm sure he will when he goes back to this topic.
I don't give a shit what you assume, I want the facts.
The fact is that CODESET does what you want and it's absolutely documented.
-
@gąska Jesus fuck.
Look, when I'm the one who makes the mistake, by all means pile on. But in this case, I didn't. Bulb made the mistake. Call him an idiot for 47 posts straight. Fuck off. You and Boomzilla both.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@boomzilla So to be clear, in this case "good will" is assuming bulb is an idiot who doesn't type what he means to say.
The minimum of good will is not calling people idiots just because they've made a typo.
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
Nothing I said in this thread was wrong.
I believe you repeated a few times that the option is undocumented. Even if you didn't have any reason to believe otherwise, you were objectively wrong here. Because it IS documented.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@gąska Jesus fuck.
Look, when I'm the one who makes the mistake, by all means pile on.
What do you think I'm doing right now!?
But in this case, I didn't. Bulb made the mistake. Call him an idiot for 47 posts straight.
Why? Unlike you, he didn't do anything idiotic.
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@gąska Jesus fuck.
Look, when I'm the one who makes the mistake, by all means pile on. But in this case, I didn't. Bulb made the mistake. Call him an idiot for 47 posts straight. Fuck off. You and Boomzilla both.
He's not a ginormous dickhole to people who make a fairly obvious typo, is he? Or the one who rages at people who provide helpful information? Hmm...maybe I should be spacing these out if we want to get to 47.
-
@laoc said in Big list of software that cannot handle spaces or accents in paths:
What you don't realize is that he owns the trademark!
I'm a trademark owner now? W00t!
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
Because it IS documented.
But he couldn't find it on MSDN!
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@gąska So CODEPAGE is the same thing as CODESET? Is that your assertion?
Short of telepathy, how was I expected to know that?
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
So the thing I saw on the screen is undocumented. And everything I posted here was exactly correct.
Of course. You would also be correct™ to bitch at people telling you C's printf takes a format string as the first argument that this is an undocumented feature and the official version only takes a const char *.
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@boomzilla So to be clear, in this case "good will" is assuming bulb is an idiot who doesn't type what he means to say.
The minimum of good will is not calling people idiots just because they've made a typo.
It's not exactly a typo, just a far more common but slightly imprecise term. "Codepage" and "Codeset" are often used synonymously but technically multibyte encodings like BIG5 don't have codepages nor are they codepages of any other coding system; UTF-8 also isn't a "codepage" in the narrow sense because it directly encodes Unicode.
so nl_langinfo(CODEPAGE) certainly works, although you can also receive names that don't technically refer to codepages. As should be totally clear from the docs, if @blakeyrat read them.
-
@laoc said in Big list of software that cannot handle spaces or accents in paths:
so nl_langinfo(CODEPAGE) certainly works
As long as you have CODEPAGE constant defined, and it is set to 0. AFAIK it's not the case in any standard library.
UTF-8 also isn't a "codepage" in the narrow sense because it directly encodes Unicode.
But it is in the sense that Windows defines a codepage 65001 for UTF-8 encoding.
-
@dcoder said in Big list of software that cannot handle spaces or accents in paths:
@bulb nl_langinfo can answer "What locale is in use right now?", but in my scenario we want an answer to "I have a sequence of bytes that represents a filename, "/usr/M\uC3A1rton/", is it in UTF-8 or some other encoding?", to support older/snowflake filesystems.
You just assume the name is in locale encoding and if it is not, things may break and the user gets to keep both pieces.
Since the name is still just a string of bytes as far as all the system calls are concerned, many utilities can access files that can't be correctly printed in the current locale, so the user should be actually able to fix it if they know the originating codeset. But trying to guess can only ever break it more, because there is no way to actually tell. So you—or in this case Emacs—shouldn't try.
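As an illustration of carrying a name around without guessing, here is a small Python sketch. On Unix, Python's os.fsdecode and os.fsencode use the surrogateescape error handler for exactly this purpose; the sketch spells the handler out with UTF-8 so it behaves the same everywhere. The helper names to_text and to_bytes are made up.

```python
def to_text(raw: bytes) -> str:
    """Decode as UTF-8, smuggling undecodable bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Inverse of to_text: restores the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


raw_name = b"/usr/M\xe1rton/"      # Latin-1 bytes, not valid UTF-8
text_name = to_text(raw_name)

# The system call still gets the exact bytes it was given.
assert to_bytes(text_name) == raw_name

# Only display is lossy: the escaped byte cannot be printed as UTF-8.
printable = text_name.encode("utf-8", errors="replace").decode("utf-8")
print(printable)
```

The file stays reachable because the bytes are never reinterpreted; only the displayed form is degraded.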
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
But trying to guess can only ever break it more,
Some guesses are worse than others, much much worse. One of the best is ISO 8859-1, as that at least leaves you with a simple way of figuring out what the original bytes were; it's a recoverably bad guess unlike most of the alternatives (which tend to scramble whatever you've got).
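A quick Python sketch of why that guess is recoverable: ISO 8859-1 maps every byte value to exactly one character, so decoding never fails and encoding the result gives back the original bytes. The helper names are invented for the example.

```python
def misread_as_latin1(raw: bytes) -> str:
    """Decode with the 'wrong but safe' guess; this never raises."""
    return raw.decode("iso-8859-1")


def recover(text: str) -> bytes:
    """Undo the guess and get the original bytes back."""
    return text.encode("iso-8859-1")


# Every possible byte value survives the round trip.
every_byte = bytes(range(256))
assert recover(misread_as_latin1(every_byte)) == every_byte

# A UTF-8 name misread as Latin-1 looks wrong but can still be repaired.
original = "M\u00e1rton".encode("utf-8")
mojibake = misread_as_latin1(original)
assert recover(mojibake).decode("utf-8") == "M\u00e1rton"

# By contrast, decoding with replacement characters destroys the byte.
lossy = b"M\xe1rton".decode("utf-8", errors="replace")
assert "\ufffd" in lossy
```

Once a byte has been turned into U+FFFD there is no way to tell which byte it was, which is the "scrambling" kind of guess.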
-
@laoc said in Big list of software that cannot handle spaces or accents in paths:
It's not exactly a typo, just a far more common but slightly imprecise term. "Codepage" and "Codeset" are often used synonymously
@blakeyrat said in A fool and his not-really-money are soon parted:
@anotherusername said in A fool and his not-really-money are soon parted:
I didn't see any "show interest codepage" links,
YES FELLOW ROBOT IT IS GOOD YOU TAKE INSTRUCTION SO LITERALLY BEEP BEEP. BUTTON LABELED "BUY NOW" MIGHT INDICATE INTEREST BUT SINCE IT IS NOT LITERALLY TITLED SHOW INTEREST CODEPAGE YOU WERE CORRECT IN NOT CLICKING IT BEEP BEEP.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
the name is still just a string of bytes as far as all the system calls are concerned
The problem with this idea is that file names are parts of paths, and some characters in paths have special meanings. If you don't know the encoding of the byte string, you can't find these characters, nor correctly split the string of bytes where they are, so for some things you must know the encoding. If the file system can't tell you, you'll have to guess and things will break.
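Here is a small Python sketch of that failure mode, assuming "/" (byte 0x2F) is the separator. With UTF-8 a byte-level split happens to be safe, because 0x2F never occurs inside a multi-byte sequence. With UTF-16 the same byte turns up as half of unrelated characters. The function naive_split is made up for the example.

```python
from typing import List


def naive_split(path_bytes: bytes) -> List[bytes]:
    """Split on the separator byte without knowing the encoding."""
    return path_bytes.split(b"/")


# Two components: "a" and the single character U+2F00.
path = "a/\u2f00"

utf8 = path.encode("utf-8")        # 61 2F E2 BC 80
utf16 = path.encode("utf-16-le")   # 61 00 2F 00 00 2F

print(len(naive_split(utf8)))   # 2: the split lands on the real separator
print(len(naive_split(utf16)))  # 3: the second 0x2F belongs to U+2F00
```

So "just a string of bytes" works for path handling only as long as the encoding keeps the separator byte unambiguous.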
-
@ixvedeusi said in Big list of software that cannot handle spaces or accents in paths:
The problem with this idea is that file names are parts of paths, and some characters in paths have special meanings. If you don't know the encoding of the byte string, you can't find these characters, nor correctly split the string of bytes where they are, so for some things you must know the encoding. If the file system can't tell you, you'll have to guess and things will break.
EBCDIC paths are
-
@laoc said in Big list of software that cannot handle spaces or accents in paths:
EBCDIC paths
Yeah I suppose this may not be much of a problem very often with most encodings currently in use in practice, except of course for Windows WTF-16. But the fact that currently it happens to "mostly work" because particular choices have been made elsewhere doesn't make it any less of a design defect in my eyes.
-
@ixvedeusi said in Big list of software that cannot handle spaces or accents in paths:
@laoc said in Big list of software that cannot handle spaces or accents in paths:
EBCDIC paths
Yeah I suppose this may not be much of a problem very often with most encodings currently in use in practice, except of course for Windows WTF-16.
Ah yeah, true. At first I thought, WTF, basically everyone is using a superset of ASCII now—but yes, there's WTF-16. I think I should consider myself lucky to be able to forget it ^^
But the fact that currently it happens to "mostly work" because particular choices have been made elsewhere doesn't make it any less of a design defect in my eyes.
Yeah, many OSs can't all follow one design. But identifying path separators isn't usually the problem.
-
@ixvedeusi said in Big list of software that cannot handle spaces or accents in paths:
except of course for Windows WTF-16
Pet peeve: when people call the UCS-2-with-surrogates "WTF-16", despite the WTF being invented well over a decade after Windows went Unicode.
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
Pet peeve: when people call the UCS-2-with-surrogates "WTF-16"
https://simonsapin.github.io/wtf-8/ said:
As a result, surrogates do occur in practice and need to be preserved. For example:
- In ECMAScript (a.k.a. JavaScript), a String value is defined as a sequence of 16-bit integers that usually represents UTF-16 text but may or may not be well-formed.
- Windows applications normally use UTF-16, but the file system treats path and file names as an opaque sequence of WCHARs (16-bit code units).
We say that strings in these systems are encoded in potentially ill-formed UTF-16 or WTF-16.
There you have it, straight from the horse's mouth :face_with_stuck-out_tongue:
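For a concrete picture of "potentially ill-formed UTF-16", here is a short Python sketch. It uses Python's surrogatepass error handler to stand in for a system that stores raw 16-bit code units; the sample name is invented.

```python
# A name containing an unpaired low surrogate: fine as a sequence of
# 16-bit code units, but not well-formed UTF-16.
name = "bad\udc80name"

try:
    name.encode("utf-16-le")  # strict UTF-16 refuses lone surrogates
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Passing the surrogate through keeps the code units exactly as they are.
units = name.encode("utf-16-le", errors="surrogatepass")
restored = units.decode("utf-16-le", errors="surrogatepass")

print(strict_ok)          # False
print(restored == name)   # True
```

Anything that insists on strict UTF-16 cannot represent such a name, which is why the surrogates "need to be preserved" as the quoted text says.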
-
@ixvedeusi pet peeve: when creators of new technology rename established pieces of technology that have been there for well over a decade.
-
@gąska He didn't rename it, because “UCS-2-with-surrogates” was hardly ever an established, well-defined name.