Shame on the majority of the internet for not building in support for the character ‪‫‬‭‮‪‫‬‭‮҉

asuffield

@VGR said:

@asuffield said:
@VGR said:
What you are in effect saying is that there should be no such thing as a plain text document. I think an explicit goal of Unicode was to allow plain text documents, without any markup.
Right, mostly. The goal was to allow plain text documents to have features of a layout engine without actually having a layout engine. I'm saying that this was a bad idea. The sane thing to do would have been to say that a plain text document would either be entirely displayed left-to-right or entirely right-to-left, and if you want anything more than that, use something smarter than plain text.

That may be okay for Latin languages, but right-to-left languages commonly embed Latin characters. The idea that they cannot make use of plain text files probably doesn't sit well with technical people who use those languages.

Yes, which is how this entire mess got started: people who were unwilling to deal with the reality of the situation, that the requested set of features is contradictory. You cannot get a sane solution until you accept that plain text is inadequate to the task.

Consider a simpler example: people want to use bold, italics, and variable font sizes. They also want to use plain text. Should we therefore design character sets with plain/bold/italic/bold+italic versions of each character, and repeat the entire set several times over for each font size? Should we extend character sets with magic character prefixes that represent these other forms? Or should we just tell them "stop being a dick and get used to the idea that you can't use plain text for this"?

It's one of those "old shoe or glass bottle?" things.

asuffield

@Random832 said:

So what's wrong with providing a standard for the behavior of such layout engines, so that people won't be surprised moving from one program to another?

The HTML example earlier is what's wrong with it. Jamming this behaviour in on the character set level breaks real layout engines at higher levels, and effectively precludes providing a smarter, more flexible layout engine. For example, it is impossible to factor out this functionality into CSS.

The problem with most attempts to make all software "unsurprising" is that you limit everything to the lowest common denominator.

The goal was to make plain text smarter.

A goal which many people have chased over the past 20 or 30 years. Every single time, it has been a terrible idea. The reason why plain text is useful is because it is not "smart".

I don't want to live in a "sane" world.

Random832

@asuffield said:

@Random832 said:
The goal was to make plain text smarter.
A goal which many people have chased over the past 20 or 30 years. Every single time, it has been a terrible idea. The reason why plain text is useful is because it is not "smart".

Was using a control character for line breaks instead of just having 80-column cards padded with blank space a bad idea?

Was the soft hyphen a bad idea?

Was letting words automatically wrap at spaces, and having a special different space for it to not wrap, a bad idea?

Plain text IS smarter than it used to be.
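
For anyone who wants to poke at the two characters mentioned above, here is a minimal sketch using only Python's standard library. The helper names `describe` and `wrap` are made up for illustration. It assumes `textwrap`'s documented behaviour of only breaking on ASCII whitespace, so it shows one renderer's handling, not how every plain-text viewer behaves.

```python
import textwrap
import unicodedata

SOFT_HYPHEN = "\u00ad"      # invisible unless a line is broken at it
NO_BREAK_SPACE = "\u00a0"   # looks like a space, but forbids a line break

def describe(ch):
    """Return the Unicode name and general category of one character."""
    return (unicodedata.name(ch), unicodedata.category(ch))

def wrap(text, width):
    """Wrap at whitespace only; never split a word that is too long."""
    return textwrap.wrap(text, width=width, break_long_words=False)

if __name__ == "__main__":
    print(describe(SOFT_HYPHEN))
    print(describe(NO_BREAK_SPACE))
    # An ordinary space is a legal break point...
    print(wrap("aaa bbb ccc", 5))
    # ...while the no-break space keeps "aaa" and "bbb" on one line.
    print(wrap("aaa" + NO_BREAK_SPACE + "bbb ccc", 5))
```

The soft hyphen is a format character (category `Cf`) and the no-break space is a space separator (`Zs`), so both carry layout intent without any markup.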

Random832

@asuffield said:

The HTML example earlier is what's wrong with it. Jamming this behaviour in on the character set level breaks real layout engines at higher levels, and effectively precludes providing a smarter, more flexible layout engine. For example, it is impossible to factor out this functionality into CSS.

Actually, it _can_ be factored into CSS. The "direction" and "unicode-bidi" properties have all the same functionality, and they should be used instead. But you're condemning not ONLY those, but even just the idea of being able to have a run of Latin [or numbers] within a Hebrew sentence without extraneous markup.
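
To make the "run of Latin or numbers within a Hebrew sentence" case concrete, here is a small sketch, assuming Python's standard `unicodedata` module. It only prints the bidirectional class each character carries and does not implement the reordering itself. The function name `bidi_classes` is made up for illustration.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

if __name__ == "__main__":
    # Hebrew "shalom", then a Latin run, then digits.
    sample = "\u05e9\u05dc\u05d5\u05dd abc 123"
    for ch, cls in bidi_classes(sample):
        print(f"U+{ord(ch):04X} {cls}")

    # The explicit embedding/override controls are ordinary code points too.
    for cp in (0x202A, 0x202B, 0x202C, 0x202D, 0x202E):
        print(f"U+{cp:04X} {unicodedata.bidirectional(chr(cp))}")
```

Hebrew letters report `R`, Latin letters `L`, and digits `EN`. A renderer needs nothing more than these implicit classes to lay out a mixed sentence with no markup at all. The explicit controls (`LRE`, `RLE`, `PDF`, `LRO`, `RLO`) are the part that overlaps with what CSS can express.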

Cap_n_Steve

@asuffield said:

@VGR said:
@asuffield said:
@VGR said:
What you are in effect saying is that there should be no such thing as a plain text document. I think an explicit goal of Unicode was to allow plain text documents, without any markup.
Right, mostly. The goal was to allow plain text documents to have features of a layout engine without actually having a layout engine. I'm saying that this was a bad idea. The sane thing to do would have been to say that a plain text document would either be entirely displayed left-to-right or entirely right-to-left, and if you want anything more than that, use something smarter than plain text.

That may be okay for Latin languages, but right-to-left languages commonly embed Latin characters. The idea that they cannot make use of plain text files probably doesn't sit well with technical people who use those languages.
Yes, which is how this entire mess got started: people who were unwilling to deal with the reality of the situation, that the requested set of features is contradictory. You cannot get a sane solution until you accept that plain text is inadequate to the task.
Consider a simpler example: people want to use bold, italics, and variable font sizes. They also want to use plain text. Should we therefore design character sets with plain/bold/italic/bold+italic versions of each character, and repeat the entire set several times over for each font size? Should we extend character sets with magic character prefixes that represent these other forms? Or should we just tell them "stop being a dick and get used to the idea that you can't use plain text for this"?
It's one of those "old shoe or glass bottle?" things.

That's a good point, [b]are[/b] there bold and italic characters in Unicode?
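
One way to check is a short sketch with Python's standard `unicodedata` module. It looks only at two code points from the Mathematical Alphanumeric Symbols block, which were encoded for mathematical notation and not as general-purpose text styling. The helper name `fold_styled` is made up for illustration.

```python
import unicodedata

BOLD_A = "\U0001d400"    # from the Mathematical Alphanumeric Symbols block
ITALIC_A = "\U0001d434"

def fold_styled(text):
    """Fold compatibility variants (including math bold/italic) to plain letters."""
    return unicodedata.normalize("NFKC", text)

if __name__ == "__main__":
    print(unicodedata.name(BOLD_A))
    print(unicodedata.name(ITALIC_A))
    # Compatibility normalization treats the styled letters as plain "A".
    print(fold_styled(BOLD_A + ITALIC_A))
```

So styled letters do exist as separate code points, but only for a limited alphabet, and compatibility normalization treats them as variants of the plain letter.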

dhromed

@Cap'n Steve said:

That's a good point, [b]are[/b] there bold and italic characters in Unicode?

The argument for having plaintext support for bold/italics is not quite the same as the argument about layout characters in Unicode. Bold and italic versions are physically different font files, i.e. glyph sets. The relation between the "normal" and "bold" version of a font is entirely dependent on the matching font name in the file's metadata.

Fuckups in this matching are the reason I have 2,583 Helvetica weights in Photoshop instead of 1 Helvetica font with 2,583 variants in the second dropdown. In the same vein, my Mrs. Eaves font is a ligatures set in "normal", but small-caps in "bold". It's fucked. There really is no explicit way of saying "This single font file is a complete set of its variants".

That's how it is on Windows, anyway. :)

I argue that there is no semantic difference whatsoever between a control character and semantic markup fed into a layout engine. \r\n (CRLF) means the same as <br />. All plaintext renderers (including Notepad) are already, by necessity, layout engines -- even if they are rudimentary ones compared to, say, an ML/CSS parser+renderer.
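
As a rough illustration of that equivalence, here is a minimal sketch using only Python's standard library. The function name `plain_to_html` is made up for illustration. It rewrites line-break control characters as `<br />` markup and leaves everything else as escaped text.

```python
import html

def plain_to_html(text):
    """Turn line-break control characters into explicit <br /> markup."""
    # str.splitlines() treats \r\n, \n and \r (plus a few other Unicode
    # line boundaries) as breaks, so CRLF and LF input come out the same.
    lines = text.splitlines()
    return "<br />\n".join(html.escape(line) for line in lines)

if __name__ == "__main__":
    print(plain_to_html("first line\r\nsecond line\nthird & last"))
```

The control character and the tag carry the same instruction. The only difference is which layer is expected to interpret it.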

So where do you draw the line between rudimentary layout (expressed as characters, i.e. by Unicode) and advanced layout (expressed as explicit commands in some form of source code*)?

I remember Ye Olde WordPerfect with its underwater view: the line was drawn at the absolute plaintext zero. Everything that concerned layout, even (soft) breaks, was visible as a tag.