Big list of software that cannot handle spaces or accents in paths
-
@Unperverted-Vixen said in Big list of software that cannot handle spaces or accents in paths:
@Tsaukpaetra Half comedy site, half insane asylum. Depends on which poster you're talking to.
I talk to this poster:
-
@ben_lubar said in Big list of software that cannot handle spaces or accents in paths:
@LB_ said in Big list of software that cannot handle spaces or accents in paths:
@marczellm that's an interesting date format. I typically just opt for ISO-8601.
My favorite date format is YYYY-MM-dd, where YYYY is the year, MM is the minute, and dd is the day.
Most people will assume you meant ISO-8601 until it's too late!
Don't you mean YYYY-mm-dd? (or YYYY-MI-DD for Oracle)
-
@ben_lubar said in Big list of software that cannot handle spaces or accents in paths:
My favorite date format is YYYY-MM-dd, where YYYY is the year, MM is the minute, and dd is the day.
Use YYYY-ww-DD; ww is the week number and DD is the number of the day in the year, so the format actually has lots of information. And lots of programmers will really hate you! (Me? I just hate date and time formats…)
-
@Jaloopa I wasn't complaining, I was explaining.
-
@gwowen oh right, I guess I'm just a big stupid dumb wrong moron. I'm so glad I come on here for you to pile on and tell me how stupid dumb wrong I am
-
@Jaloopa ??? Where did that come from ???
-
@dkf down with the oppressive Month-iarchy!
-
@gwowen seemed more fun than a . Those posts were pure Blakeymeme.
-
Tell you what, put in
1712-02-30
-
@Medinoc said in Big list of software that cannot handle spaces or accents in paths:
Speaking of, I got curious and tried googling WTF-16, and I found this:
The WTF-8 encoding Wobbly Transformation Format
Is that the format that the TARDIS uses?
No, theirs is the Wibbly-Wobbly.
-
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
@Jaloopa ??? Where did that come from ???
Shoulder aliens.
-
@boomzilla said in Big list of software that cannot handle spaces or accents in paths:
@gwowen said in Big list of software that cannot handle spaces or accents in paths:
@Jaloopa ??? Where did that come from ???
Shoulder ~~aliens~~ foreign friends. FTFY
-
@blakeyrat said in Big list of software that cannot handle spaces or accents in paths:
@RaceProUK Saying I wasn't sure if it was UTF-16 or WTF-16 is this thing called a "joke". You see those sometimes on comedy sites.
I thought WTF-16 was a joke on UCS-2 (because it's a WTF that looks like UTF-16 and was used in older versions of java, I think)
-
@wharrgarbl said in Big list of software that cannot handle spaces or accents in paths:
I thought WTF-16 was a joke on UCS-2 (because it's a WTF that looks like UTF-16 and was used in older versions of java, I think)
There is some of that, and there is also the general problem that Unicode just isn't staying simply representable using 16-bit quantities (and the costs of going to the next technically-capable unit up — 32-bits per character — are still annoyingly high). And there's some jerks on the Unicode side who seem to claim that any kind of compromise is impossible and who don't see that there could possibly be annoying costs associated with using UTF-8 everywhere. (All while filling the place up with more ways to represent and so on, together with a zillion ways to write some visual characters, since the internet needs a lot more of those sorts of things.)
Unicode. Fucking annoying. But still better than what went before.
-
@dkf Fuck unicode, ISO-8859-1 has everything I need
-
@wharrgarbl said in Big list of software that cannot handle spaces or accents in paths:
@dkf Fuck unicode, ISO-8859-1 has everything I need
Not if you want to talk to @Gąska.
-
-
@dreikin said in Big list of software that cannot handle spaces or accents in paths:
@wharrgarbl said in Big list of software that cannot handle spaces or accents in paths:
@dkf Fuck unicode, ISO-8859-1 has everything I need
Not if you want to talk to @Gąska.
Then you use ISO-8859-2. Or call @G±ska.
@moderator I Can Haz Another Alias? KTHXBAI
Edit: Windows users probably want to mention @G¹ska.
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
And there's some jerks on the Unicode side who seem to claim that any kind of compromise is impossible and who don't see that there could possibly be annoying costs associated with using UTF-8 everywhere.
...or simply that the costs are still far lower than anything else that has been developed so far. UTF8 Everywhere is, simply put, the best bad option we have.
-
@masonwheeler said in Big list of software that cannot handle spaces or accents in paths:
...or simply that the costs are still far lower than anything else that has been developed so far. UTF8 Everywhere is, simply put, the best bad option we have.
Like democracy?
-
@marczellm it's a surprisingly good metaphor.
Democracy: originally meant to let everyone chime in on important stuff going on in their country, today it's a small group of politicians that produces hundreds of new law bills no one ever wanted, while ignoring actual problems.
Unicode: originally meant to let everyone write and read text in any language they want without encoding errors, today it's a small group of Unicode Consortium members that produces hundreds of new emoji no one ever wanted, while ignoring actual problems like missing characters in non-CJK Asian languages.
-
@gąska said in Big list of software that cannot handle spaces or accents in paths:
Democracy: originally meant to let everyone chime in on important stuff going on in their country, today it's a small group of politicians that produces hundreds of new law bills no one ever wanted, while ignoring actual problems.
I don't want to derail the discussion too much, but I hate blanket statements like that. You're missing half of the point of democracy, which is not only about the right to vote, but also about the right to get involved and get elected yourself.
It's easy and slightly hypocritical to bitch about "those politicians", while at the same time refusing to get involved in politics. It's your right and duty as a citizen of a democratic country to make sure that the country is not ruled by some elitist circle that doesn't actually represent the population. In principle, the status quo is your fault as much as it is theirs.
Also, people quickly forget both Hanlon's razor and the fact that people cannot possibly be infallible when politicians are involved.
-
@masonwheeler said in Big list of software that cannot handle spaces or accents in paths:
...or simply that the costs are still far lower than anything else that has been developed so far. UTF8 Everywhere is, simply put, the best bad option we have.
It's not too bad at all for an interchange format, but it's pretty awful for actual text manipulation due to the way that it requires much more processing for many common operations. So yes, put UTF-8 on disk or over the network (and other such similar situations) but transform to something else inside a program if you really need to work with characters.
Theoretically the transform could be done with something rope-like with an index tree pointing into UTF-8 data so that indexing can be fast, but I can't recall seeing anyone actually do that for real when they could just use a fat array of uint32_t (or uint16_t way back in the day).
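For illustration only, here is a much simpler cousin of the "index into UTF-8 data" idea: not a rope or a tree, just a flat table of byte offsets sampled every few code points. `IndexedUtf8`, `STRIDE` and `char_at` are made-up names for this sketch, and the stride value is an arbitrary assumption.

```rust
/// Hypothetical helper: UTF-8 text plus a sparse table of byte offsets,
/// one entry for every `STRIDE` code points.
struct IndexedUtf8 {
    text: String,
    /// Byte offsets of code points 0, STRIDE, 2 * STRIDE, ...
    checkpoints: Vec<usize>,
}

/// Arbitrary sampling interval chosen for the sketch.
const STRIDE: usize = 64;

impl IndexedUtf8 {
    fn new(text: &str) -> Self {
        let checkpoints = text
            .char_indices()
            .step_by(STRIDE)
            .map(|(byte_offset, _)| byte_offset)
            .collect();
        IndexedUtf8 { text: text.to_string(), checkpoints }
    }

    /// Returns code point number `n`: jump to the nearest checkpoint at or
    /// before it, then decode at most STRIDE - 1 code points forward.
    fn char_at(&self, n: usize) -> Option<char> {
        let start = *self.checkpoints.get(n / STRIDE)?;
        self.text[start..].chars().nth(n % STRIDE)
    }
}

fn main() {
    // 400 code points of mixed widths (1, 2, 3 and 4 bytes each).
    let s = IndexedUtf8::new(&"a\u{e9}\u{20ac}\u{1f600}".repeat(100));
    println!("{:?}", s.char_at(203));
}
```

The table costs one `usize` per 64 code points rather than four bytes per code point, at the price of a short bounded scan on every lookup; edits to the text would also need the table rebuilt, which this sketch does not attempt.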
-
@asdf said in Big list of software that cannot handle spaces or accents in paths:
It's your right and duty as a citizen of a democratic country to make sure that the country is not ruled by some elitist circle that doesn't actually represent the population.
How?
-
@asdf said in Big list of software that cannot handle spaces or accents in paths:
I don't want to derail the discussion too much, but I hate blanket statements like that. You're missing half of the point of democracy, which is not only about the right to vote, but also about the right to get involved and get elected yourself.
The point that makes democracy better is that power is now disputed on money instead of blood. The rest is all bullshit.
-
@antiquarian said in Big list of software that cannot handle spaces or accents in paths:
@asdf said in Big list of software that cannot handle spaces or accents in paths:
It's your right and duty as a citizen of a democratic country to make sure that the country is not ruled by some elitist circle that doesn't actually represent the population.
How?
By exercising your right to vote and get elected, your freedom of speech and your right of assembly. This is why you have those!
I'm not saying it's easy to be heard as a single citizen, and it's not supposed to be (otherwise nothing would ever get done), but it's still your job as a citizen to exercise those rights to change something if you think that there is a major problem.
-
@asdf said in Big list of software that cannot handle spaces or accents in paths:
By exercising your right to vote and get elected, your freedom of speech and your right of assembly. This is why you have those!
But people have been doing that for years, and it hasn't prevented the situation @Gąska mentioned.
-
@antiquarian
Who? Where? How many people? Trying to change what exactly? This is a null argument, because you're just throwing a statement out there that cannot possibly be proven or refuted. I'd say that about 80% of the problems people attribute to democracy are actually caused by lack of participation and interest in the political process.
But let's take this argument to the garage if you want to continue it. @wharrgarbl already created a thread there.
-
@dkf Ain't no representation makes all the special snowflake language rules easy to process.
-
@asdf said in Big list of software that cannot handle spaces or accents in paths:
I don't want to derail the discussion too much, but I hate blanket statements like that. You're missing half of the point of democracy, which is not only about the right to vote, but also about the right to get involved and get elected yourself.
It's easy and slightly hypocritical to bitch about "those politicians", while at the same time refusing to get involved in politics. It's your right and duty as a citizen of a democratic country to make sure that the country is not ruled by some elitist circle that doesn't actually represent the population. In principle, the status quo is your fault as much as it is theirs.
So you're saying that we should all join the Unicode Consortium?
Anyway, TIL that this is a thing:
-
@antiquarian said in Big list of software that cannot handle spaces or accents in paths:
@asdf said in Big list of software that cannot handle spaces or accents in paths:
It's your right and duty as a citizen of a democratic country to make sure that the country is not ruled by some elitist circle that doesn't actually represent the population.
How?
By shooting the elitists?
-
@asdf said in Big list of software that cannot handle spaces or accents in paths:
I don't want to derail the discussion too much, but I hate blanket statements like that. You're missing half of the point of democracy, which is not only about the right to vote, but also about the right to get involved and get elected yourself.
If you want to run for office, you are probably a psychopath. Most people do not want to risk losing friends and family and do evil stuff to further their own agenda.
-
@jaloopa said in Big list of software that cannot handle spaces or accents in paths:
Companies don't tend to survive if they're as actively outright hostile as some open source maintainers
:oracle:
-
@magnusmaster said in Big list of software that cannot handle spaces or accents in paths:
If you want to run for office, you are probably a psychopath. Most people do not want to risk losing friends and family and do evil stuff to further their own agenda.
E_NON_SEQUITUR
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
It's not too bad at all for an interchange format, but it pretty awful for actual text manipulation due to the way that it requires much more processing for many common operations.
Well, most of those operations don't make sense for all those scripts.
@greybeard said in Big list of software that cannot handle spaces or accents in paths:
Ain't no representation makes all the special snowflake language rules easy to process.
Exactly. So utf-8 is usually as good as any.
Because when you need to actually work inside the words of the text, codepoints don't mean a thing and you have to work in (extended) graphemes, which are multi-unit in any and every encoding anyway, and if you only work by delimiters, you can just search for them as bytes (or byte strings, if they are not ASCII), because in utf-8 a sequence encoding one character can't ever appear as part of those encoding different ones.
So Rust actually does work on utf-8 internally always except for the purpose of interfacing with wtf-16 and ucs-4 interfeces. They are indexed by byte indices, so you just have to keep in mind an arbitrary number may not be a valid index (using one will throw a panic), but because of the rules it does not make sense to use ones not obtained by some kind of search anyway, so it turns out not to be a problem. And parsing libraries take advantage of the fact that searching for a utf-8 string is equivalent to searching for a byte string. The regex library is actually pretty good at it.
And it's not just Rust. Go uses utf-8 internally as well and so do some other new languages like Julia or Nim. And some older languages (OK, the Unicode support is somewhat hacked on to Perl). And C/C++ frameworks, for example Glib. Basically since programmers actually learned about all those rules, most of them decided that utf-8 is the right way to go everywhere.
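A minimal sketch of the two claims about Rust strings made above, using only the standard library. `find_bytes` is a hypothetical helper written for this example; it is deliberately a dumb byte search that knows nothing about UTF-8.

```rust
/// Naive byte-level substring search with no knowledge of UTF-8.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    // "naive=cafe;x=42" with a diaeresis on the i and an acute on the e,
    // written with escapes so the byte layout is unambiguous.
    let text = "na\u{ef}ve=caf\u{e9};x=42";
    let needle = "\u{e9};";

    // The raw byte search lands on the same offset as the UTF-8 aware one.
    assert_eq!(find_bytes(text.as_bytes(), needle.as_bytes()), text.find(needle));

    // An offset obtained by searching is always a valid slicing index.
    let eq = text.find('=').unwrap();
    println!("{}", &text[..eq]);

    // An arbitrary offset may fall inside a multi-byte sequence.
    assert!(!text.is_char_boundary(3)); // second byte of U+00EF
    assert!(text.get(..3).is_none()); // checked slicing; `&text[..3]` would panic
}
```

The byte search cannot produce a false match in the middle of a character because UTF-8 lead bytes and continuation bytes come from disjoint ranges.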
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
interfeces
I'm adopting that word.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
And it's not just Rust. Go uses utf-8 internally as well and so do some other new languages like Julia or Nim. And some older languages (OK, the Unicode support is somewhat hacked on to Perl). And C/C++ frameworks, for example Glib. Basically since programmers actually learned about all those rules, most of them decided that utf-8 is the right way to go everywhere.
UTF-8 would be fine except it has a nasty habit of increasing the complexity of algorithms (because you lose the ability to assume that characters are of an equal width, forcing linear scans in annoying places). Logically, it's better to define your string to be a sequence of unicode characters and say that the in-memory representation might be either UTF-8 or a fixed-width character form, with behind-the-scenes mutating between the two to promote to the most efficient internal representation for the particular mix of operations. Which sounds a lot more complex than it actually is, but does mean that pointers into the string become unsafe (though indexes are fine).
There are some other subtleties too, but they don't apply so widely.
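To make the complexity point concrete, a small sketch comparing the two representations in Rust. `nth_in_utf8` and `nth_in_fixed` are names invented for this example, and `Vec<char>` stands in for a fixed-width (UTF-32 style) buffer.

```rust
/// n-th code point of a UTF-8 string: a linear scan, because every
/// preceding sequence must be decoded to learn how wide it is.
fn nth_in_utf8(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// n-th code point of a fixed-width buffer: a plain array lookup.
fn nth_in_fixed(buf: &[char], n: usize) -> Option<char> {
    buf.get(n).copied()
}

fn main() {
    let s = "na\u{ef}ve caf\u{e9}";
    let fixed: Vec<char> = s.chars().collect();

    // Same answer either way; only the cost of getting there differs.
    assert_eq!(nth_in_utf8(s, 9), nth_in_fixed(&fixed, 9));

    println!(
        "{} bytes as UTF-8, {} bytes as Vec<char>",
        s.len(),
        fixed.len() * std::mem::size_of::<char>()
    );
}
```

Note that this only indexes code points, so it says nothing about whether a code point is the right unit to index by in the first place.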
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
pointers ... become unsafe
Become?
-
@jaloopa If you're using a single representation, pointers work until you have a defined event that invalidates the whole string. When you have a mutating internal representation, only the implementation code itself (i.e., the thing that understands when mutation occurs) can safely use pointers. It sounds like a trivial difference, but it really isn't.
-
@dkf I'm just coming from managed language privilege, where any raw access to pointers is
-
@jaloopa At least in C#, you have to explicitly turn off the safeties to use raw pointers.
-
@raceprouk and in my years of C# coding, I have never needed to do that
-
@jaloopa IMO, if you ever find you do need to do it, then you're using the wrong language in the first place
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
Logically, it's better to define your string to be a sequence of unicode characters
TDEMSYR. There is no use for “unicode character”. Hell, that term is not even well defined!
@dkf said in Big list of software that cannot handle spaces or accents in paths:
fixed-width character form
That's the thing. Such a thing does not exist.
@dkf said in Big list of software that cannot handle spaces or accents in paths:
(though indexes are fine
No, they ain't! Remember,
@bulb said in Big list of software that cannot handle spaces or accents in paths:
They are indexed by byte indices
-
@jaloopa said in Big list of software that cannot handle spaces or accents in paths:
@dkf said in Big list of software that cannot handle spaces or accents in paths:
pointers ... become unsafe
Become?
Rust does have safe pointers. With this, they would become impossible.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
@dkf said in Big list of software that cannot handle spaces or accents in paths:
Logically, it's better to define your string to be a sequence of unicode characters
TDEMSYR. There is no use for “unicode character”. Hell, that term is not even well defined!
In fact, not defining your strings to be sequences of unicode “codepoints” is a feature, because that way it does not promote the flawed idea that there are useful units of fixed size that make sense individually.
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
TDEMSYR. There is no use for “unicode character”. Hell, that term is not even well defined!
The Unicode consortium has defined a mapping from abstract characters to codepoints (integer numbers in the range 0 to 0x10FFFF, excluding 0xD800 to 0xDFFF and a few other values such as 0xFFFE). Those are — in any reasonable definition of english words — the Unicode characters, and they are independent of the way in which they are encoded.
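As a rough check of that range in code: Rust's `char` type holds exactly the values 0 to 0x10FFFF minus the surrogate block, so `char::from_u32` can be used to probe it. `is_scalar_value` is a helper name made up for this sketch. One caveat: Rust excludes only the surrogates, so a noncharacter such as 0xFFFE is still accepted as a `char`.

```rust
/// True if `cp` is a Unicode scalar value, i.e. something `char` can hold.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

fn main() {
    // Probe the edges of the code space and of the surrogate block.
    for cp in [0x41u32, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0xFFFE, 0x10FFFF, 0x110000] {
        println!("U+{:04X}: {}", cp, is_scalar_value(cp));
    }
}
```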
-
@bulb said in Big list of software that cannot handle spaces or accents in paths:
Rust does have safe pointers. With this, they would become impossible.
But then you'd have Rust on you.
-
@dkf said in Big list of software that cannot handle spaces or accents in paths:
@bulb said in Big list of software that cannot handle spaces or accents in paths:
TDEMSYR. There is no use for “unicode character”. Hell, that term is not even well defined!
The Unicode consortium has defined a mapping from abstract characters to codepoints (integer numbers in the range 0 to 0x10FFFF, excluding 0xD800 to 0xDFFF and a few other values such as 0xFFFE). Those are — in any reasonable definition of english words — the Unicode characters, and they are independent of the way in which they are encoded.
No. The Unicode consortium has defined a mapping from various glyphs, their parts, compositions and variants, in various combinations as they were used in legacy encodings, to codepoints. And then defined various other rules and algorithms in terms of those codepoints. That does not make individual codepoints correspond to “characters” in any useful sense of the word. Because they don't. There are “characters” that are encoded with multiple codepoints, there are codepoints that encode multiple “characters”, there are characters that can be encoded by multiple codepoint sequences, be it due to normalization, presence of presentation forms, or both, etc. Even “graphemes”, which are actually well defined and useful, are basic and extended!
True, “unicode character” is a generally recognized synonym of “unicode codepoint”. That does not match any “reasonable definition of English words”. Because the “reasonable definition” of “character”, in English, is that it is a synonym for “grapheme”, and this “Unicode character” definitely isn't.
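A few of those mismatches can be shown with nothing but the Rust standard library. `code_points` is a helper invented for this sketch. The standard library does not do normalization or grapheme segmentation, so the sketch only counts codepoints and compares raw strings.

```rust
/// Lists the codepoints of a string as plain integers.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    let precomposed = "\u{e9}"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
    let decomposed = "e\u{301}"; // "e" plus U+0301 COMBINING ACUTE ACCENT
    let ligature = "\u{fb01}"; // U+FB01 LATIN SMALL LIGATURE FI

    // One visible character, one or two codepoints depending on the sequence.
    println!("{:X?} vs {:X?}", code_points(precomposed), code_points(decomposed));

    // Without normalization the two spellings do not even compare equal.
    assert_ne!(precomposed, decomposed);

    // A single codepoint standing for two letters.
    println!("{:X?}", code_points(ligature));
}
```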
-
@bulb so can we do, like unicode_real_string where we do actually define every character in a single master list without any of this basic/extended nonsense?