"ASCII"


  • Discourse touched me in a no-no place

    0_1508694822330_Screenshot_2017-10-22-18-41-23.png


  • BINNED

    **ಠ_ಠ**


  • :belt_onion:

    @pjh said in "ASCII":

    0_1508694822330_Screenshot_2017-10-22-18-41-23.png

    Inb4 review: "Too bad it's Unicode..."


  • BINNED

    @pjh
    Now if we could change those to animated jif's we might have something



  • @luhmann don't give the Unicode guys any more ideas because I bet in like Unicode 14, they'll be outlining characters that are animated and change rendering over time.



  • @arantor said in "ASCII":

    @luhmann don't give the Unicode guys any more ideas because I bet in like Unicode 14, they'll be outlining characters that are animated and change rendering over time.

    Eh ... at this point we probably should just embed something like postscript or SVG into unicode. It's where things are headed anyway.


  • Fake News

    @arantor Oh, that's simple, just insert a zero-width-join and 🎥 between the emojis.
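    (For the record, ZWJ sequences really do work this way; a minimal Python sketch, using one real sequence as an example:)

    ```python
    # U+200D ZERO WIDTH JOINER glues emoji into a single rendered glyph.
    zwj = "\u200D"
    technologist = "\U0001F469" + zwj + "\U0001F4BB"  # WOMAN + ZWJ + PERSONAL COMPUTER
    print(technologist)       # renders as 👩‍💻 on fonts that know the sequence
    print(len(technologist))  # 3 codepoints, typically drawn as one emoji
    ```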



  • @cvi except it's not. Unicode doesn't mandate a rendering; it merely defines what the codepoints mean. A codepoint can have a suggested rendering, but there is no more to it than that.

    U+0041, for example, has a meaning: it means the capital form of the first letter of the Latin alphabet. It has a suggested form, which in this particular font is rendered as 'A'. Other fonts will vary the appearance of this glyph, but they won't change its meaning. But for the standard to impose an appearance on a glyph would go beyond Unicode's remit.



  • @arantor You are right of course, I wasn't serious about the suggestion (and probably shouldn't have picked ps/svg, I was going for making it programmable [which is also not a suggestion to be taken seriously]. Blame lack of coffee.)

    OTOH, it's getting close to imposing appearance in several places (IMO): there are some geometric shapes (e.g., U+25A0: BLACK SQUARE), which would be silly to render differently (I mean, you could make it non-square, but .... that would be missing the point). Plus there are the skin tone modifiers which I'd argue are also somewhere in between presentation meta-data and meaningful information about a glyph. Then there are characters like U+1F34E (RED APPLE), U+1F34F (GREEN APPLE); while rendering them red/green isn't a requirement, it certainly sounds like a strong suggestion.

    Then again, embedding a rendering language would fix the problem with people missing their favourite glyphs... 🛒



  • @cvi said in "ASCII":

    I was going for making it programmable [which is also not a suggestion to be taken seriously]

    But where's the fun in that? 😈

    @cvi said in "ASCII":

    OTOH, it's getting close to imposing appearance in several places (IMO): there are some geometric shapes (e.g., U+25A0: BLACK SQUARE),

    That particular example is more, I think, poorly named rather than anything else. The block it's part of has a set of 'black' and 'white' characters on the basis of filled vs not-filled rather than anything else. I think if you wrote those glyphs in another colour, they would correctly have the font and background colours that match the idea of 'filled' vs 'not filled' rather than 'black' vs 'white'.

    As for the shape... I'm not so fussed about that. If you're going to define something as a block containing 'geometric shapes', a square is a logical choice. The same block has triangle, rectangle and others. There is some value in having a generic 'this is a square' symbol because there is value in having something fairly abstract that can be reused.

    I guess my issue is more on the broader use factor than on 'it must not imply a shape' per se. On the one hand, we have squares. On the other we have 'rolling on the floor laughing' and 'face with hand over mouth', which have meaning and potentially implied representations that aren't generic.

    @cvi said in "ASCII":

    Then there are characters like U+1F34E (RED APPLE), U+1F34F (GREEN APPLE); while rendering them red/green isn't a requirement, it certainly sounds like a strong suggestion.

    See, this is something I have a problem with. What would be more logical would be to define an apple and then support colour variations with combining glyphs in the same way skin tones are handled, making them modifiers rather than explicit items.

    Is there a semantic difference between a red and a green apple? If not, it has no place being a discrete character in the set.
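    (Mechanically, the skin tone modifiers already work like that; a minimal Python sketch with one real base+modifier pair:)

    ```python
    # An emoji modifier is just a codepoint that follows its base character.
    wave = "\U0001F44B"   # U+1F44B WAVING HAND SIGN
    tone = "\U0001F3FD"   # U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4
    modified = wave + tone
    print(len(modified))  # 2 codepoints, rendered as a single toned emoji
    ```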


  • kills Dumbledore

    @cvi said in "ASCII":

    there are some geometric shapes (e.g., U+25A0: BLACK SQUARE), which would be silly to render differently

    ✴



  • @arantor said in "ASCII":

    That particular example is more, I think, poorly named rather than anything else. The block it's part of has a set of 'black' and 'white' characters on the basis of filled vs not-filled rather than anything else.

    Yeah, I agree with the poor naming; it seems much more like filled/non-filled, as you say. Still, the square should be square (even in a proportional font), and so on.

    What would be more logical would be to define an apple and then support colour variations with combining glyphs in the same way skin tones are handled, making them modifiers rather than explicit items.

    I kinda agree.

    I guess the problem is that if colour variations were expressed as combining glyphs, there would be the question of what glyphs you're allowed to apply them on. Technically, there's nothing preventing them from being applied to any glyph. You'd then have colorization in Unicode, which may or may not be something desirable. 𝐀𝐟𝐭𝐞𝐫 𝗮𝗹𝗹, 𝘵𝘦𝘹𝘵 𝖿𝗈𝗋𝗆𝖺𝗍𝗍𝗂𝗇𝗀 𝒉𝒂𝒔 no 𝓹𝓵𝓪𝓬𝓮 𝗂𝗇 𝖀𝖓𝖎𝖈𝖔𝖉𝖊.

    Is there a semantic difference between a red and a green apple? If not, it has no place being a discrete character in the set.

    I'm actually a bit curious about the rationale for inclusion. Is there any resource that documents why something is included (or not)? I don't see why the difference would be more significant than something like 0_1508759709754_warnY.png vs 0_1508759751767_warnR.png.



  • @cvi there's already colorisation in Unicode - c.f. the skin tone combinations for faces. There's also specific meaning combinations for flags.

    I don't have a problem with glyphs having semantic meaning even if the typical representation of that meaning happens to look like a letter - this is why we get duplicate-rendered characters that aren't technically the same as other letters, because there's multiple things that render like 'A' even though they have different semantic meanings. That's cool and all, that's what Unicode sets out.

    @cvi said in "ASCII":

    I'm actually a bit curious about the rationale for inclusion. Is there any resource that documents why something is included (or not)?

    I don't think so, at least not for the earlier stuff. That said there's also an amount of compatibility, e.g. things that existed in the old code pages that needed to be extended forward one way or another. Think of all the line drawing characters in CP437 for example. These need to have a representation somewhere in the BMP.
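    (The flag combinations mentioned above are a concrete case: a flag is two Regional Indicator codepoints whose combination carries the meaning. A quick Python sketch:)

    ```python
    # Two Regional Indicator symbols combine into a single flag glyph.
    flag_us = "\U0001F1FA\U0001F1F8"  # REGIONAL INDICATOR LETTERS U + S -> 🇺🇸
    print(len(flag_us))               # 2 codepoints, one rendered flag
    ```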



  • @arantor said in "ASCII":

    @cvi there's already colorisation in Unicode - c.f. the skin tone combinations for faces.

    Indeed, but the skin tone combinations are (AFAIK) only valid for the faces. That at least seems like a set that's possible to define. But for general colors (like green, red, or possibly even arbitrary RGB)... would you restrict them to a certain set of glyphs, or just let them be applied to anything?



  • @cvi I'd restrict them to the codepoints where it makes sense. And I certainly wouldn't make it RGB because rendering is hard - and there's no need to make it needlessly complicated ;)

    I think Unicode has elements of scope creep though...



  • Unicode has all the advantages of being designed by a worldwide committee and all the disadvantages of being designed by a worldwide committee.

    If you're actually curious about the rationale of why certain decisions were made, the archives of the Unicode Mailing List are public, and go back a long way. Good luck finding what you want, but it's probably in there somewhere.



  • @arantor said in "ASCII":

    @cvi I'd restrict them to the codepoints where it makes sense.

    Great. Another table of stuff to lug around and keep updated when "parsing" Unicode. ;-)

    Not sure why RGB would be that much harder compared to a few discrete colours. In the attempts at rendering Unicode I've seen (which are rather basic ones, TBF), if there's support for per-glyph colors, there's basically already support for full RGB. But YMMV. (Colorizing just a part of the glyph, as seen in e.g. the apples and in some of the emojis with skin tone, seems much more invasive.)



  • @cvi this is the kind of thing that libraries and language-level support should tackle, not us peons writing higher level code.



  • @arantor I'm a C++ person, mostly. This "language-level support" thing that you mention, what's that? (🚎)

    I occasionally have to detour into low-level land, so this kind of stuff worries me, because I might end up having to deal with it at some point. (Besides, somebody has to write the libraries too. They have a hard enough time as it is.)



  • @cvi Toby Faire, it's not my problem as a PHP dev to sort out Unicode support. That's what the intl library in PHP exists for - the people who maintain it have my deepest sympathies but frankly it's not my problem.

    If every dev that so much as farted in the general direction of Unicode had to understand it in order to use it, no-one would ever support it.



  • @arantor said in "ASCII":

    @cvi said in "ASCII":

    Then there are characters like U+1F34E (RED APPLE), U+1F34F (GREEN APPLE); while rendering them red/green isn't a requirement, it certainly sounds like a strong suggestion.

    See, this is something I have a problem with. What would be more logical would be to define an apple and then support colour variations with combining glyphs in the same way skin tones are handled, making them modifiers rather than explicit items.

    Was it introduced before or after the skin tone variations? If before, then there’s the explanation, probably (though it doesn’t explain why they didn’t then also add “skin” tone to the apples).



  • @gurth Probably before. Doesn't make it cromulent though. *pouts*



  • @jaloopa said in "ASCII":

    ✴

    Unfortunately, in this forum and with this image set, ✴ does, in fact, look like an eight-pointed black star: 'black' here means filled, so it renders as white ink on an orange background, but it is still an eight-pointed black star. In contrast, the :disco:🐎 actually had something like a dashed diagonal cross.



  • @cvi said in "ASCII":

    Is there a semantic difference between a red and a green apple? If not, it has no place being a discrete character in the set.

    I'm actually a bit curious about the rationale for inclusion. Is there any resource that documents why something is included (or not)? I don't see why the difference would be more significant than something like 0_1508759709754_warnY.png vs 0_1508759751767_warnR.png.

    Histerical raisins. One of the stated goals of Unicode is to subsume all legacy encodings. Long ago, a bunch of Japanese mobile operators went nuts and encoded all kinds of 💩 like 🕴🏻, 🍔, 💈 or 👨❤👨. And then Apple came around with their iCrap, and it was unsellable in Japan unless it could send these in SMS. But because it was already fully Unicode, they didn't want to use the legacy encodings, so they dutifully proposed these for inclusion, and because they were present in existing encodings, they got accepted.

    And then since it was added to Unicode, it became available worldwide, so more people got emoji-crazy and started demanding addition of more and more. And the committee just accepts it because why not—and because the people in it actually come from the companies that have some financial interest in catering to the craze.

    (or approximately along those lines)


  • ♿ (Parody)

    @bulb said in "ASCII":

    @jaloopa said in "ASCII":

    ✴

    Unfortunately, in this forum and with this image set, ✴ does, in fact, look like an eight-pointed black star: 'black' here means filled, so it renders as white ink on an orange background, but it is still an eight-pointed black star. In contrast, the :disco:🐎 actually had something like a dashed diagonal cross.

    If it were just a font and the colors determined by CSS, user theme, etc, I could see your point. As ✴ is actually an image that literally has its colors defined in its data your post makes no sense.


  • kills Dumbledore

    @boomzilla said in "ASCII":

    where black means filled and it is white ink

    Yeah, and this forum is full of civilised discourse, where civilised means a wretched hive of scum and villainy



  • @boomzilla It's still at least an eight-pointed star. The cross wasn't even that.


  • ♿ (Parody)

    @bulb Yes, it's definitely an improvement over that.


  • Impossible Mission - B

    Wow, this whole thread just makes me think, "ASCII stupid question, get a stupid ANSI."



  • @cvi said in "ASCII":

    @arantor said in "ASCII":

    @cvi I'd restrict them to the codepoints where it makes sense.

    Great. Another table of stuff to lug around and keep updated when "parsing" Unicode. ;-)

    Not sure why RGB would be that much harder compared to a few discrete colours. In the attempts at rendering Unicode I've seen (which are rather basic ones, TBF), if there's support for per-glyph colors, there's basically already support for full RGB. But YMMV. (Colorizing just a part of the glyph, as seen in e.g. the apples and in some of the emojis with skin tone, seems much more invasive.)

    I'd say it depends on what exactly you mean by "arbitrary RGB". The usual 3×8-bit RGB requires 24 bits, which means it could not be encoded in a single codepoint (since the codepoint space only spans about 20.1 bits). So either one needs multiple combining "color channel+value" codepoints, or we'd have to restrict colors to a smaller subset (3×2 bits, for example, is already more than the classic CGA palette, and more than an Amstrad CPC's full palette).
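    (A quick sanity check of the bit budget, in Python:)

    ```python
    import math

    # Unicode spans codepoints 0..0x10FFFF, i.e. 0x110000 values, roughly 20.1 bits.
    bits = math.log2(0x110000)
    print(round(bits, 2))  # 20.09
    assert 24 > bits       # so 24-bit RGB can't fit inside one codepoint
    ```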


  • Discourse touched me in a no-no place

    @medinoc said in "ASCII":

    The usual 3×8 bits RGB requires 24 bits, which means it could not be encoded in a single codepoint (since only 20.1 bits are supported).

    On the basis of past behaviour by the Unicode Consortium, they'll just declare that they will no longer be sticking to the guarantee to keep codepoints no larger than 0x10FFFF and then they'll have plenty of room at a cost of seriously annoying anyone who works with Unicode in computers.

    Again.



  • @medinoc said in "ASCII":

    So either one needs multiple combining "color channel+value" codepoints

    Eh, you could potentially do "rgb[a] codepoint+value+value+value[+value]" for a total of 1+256 codepoints. Your suggestion would be 3+256 (4+256 if you want alpha). If you want to avoid state, you could still get away with 3*256 (4*256 with alpha) by having 256 values for each channel (resulting in "Rvalue+Gvalue+Bvalue"). Not sure which is the least insane.

    If you want to piss off @dkf ... you'd just add the 16M codepoints to cover all of eight-bit RGB (getting RGBA to fit might be a bit more effort).😈
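    (Tallying the schemes above just to make the trade-off concrete; the variable names are made up:)

    ```python
    # Codepoints each scheme would add to the standard (sequence lengths differ too).
    stateful_rgb = 1 + 256   # one RGB introducer + 256 shared value codepoints = 257
    per_channel  = 3 + 256   # R/G/B channel markers + 256 shared values       = 259
    stateless    = 3 * 256   # distinct Rvalue/Gvalue/Bvalue codepoints        = 768
    print(stateful_rgb, per_channel, stateless)  # 257 259 768
    ```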



  • @dkf said in "ASCII":

    On the basis of past behaviour by the Unicode Consortium, they'll just declare that they will no longer be sticking to the guarantee to keep codepoints no larger than 0x10FFFF and then they'll have plenty of room at a cost of seriously annoying anyone who works with Unicode in computers.

    Again.

    The 0x10FFFF cap exists only because of UTF-16, which became so widely used only because of the previous 0xFFFF cap (UCS-2). UTF-8 can easily deal with 31 bits.
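    (The arithmetic behind the cap, sketched in Python:)

    ```python
    # UTF-16 surrogate pairs: 1024 high x 1024 low surrogates address the
    # supplementary planes, which is exactly where 0x10FFFF comes from.
    high = 0xDBFF - 0xD800 + 1            # 1024 high surrogates
    low  = 0xDFFF - 0xDC00 + 1            # 1024 low surrogates
    assert high * low == 0x10FFFF - 0x10000 + 1
    print(hex(0x10000 + high * low - 1))  # 0x10ffff
    ```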



  • @cvi said in "ASCII":

    @medinoc said in "ASCII":

    So either one needs multiple combining "color channel+value" codepoints

    Eh, you could potentially do "rgb[a] codepoint+value+value+value[+value]" for a total of 1+256 codepoints. Your suggestion would be 3+256 (4+256 if you want alpha). If you want to avoid state, you could still get away with 3*256 (4*256 with alpha) by having 256 values for each channel (resulting in "Rvalue+Gvalue+Bvalue"). Not sure which is the least insane.

    The latter is what my initial idea was leaning towards (768 new codepoints), but 3+256 sounds much less wasteful.



  • @khudzlin UTF-8 has 3+6+6+6 bits available at most, so 21, not 31. The first byte tells you how long the encoded codepoint is (4 bytes means a prefix of 11110, leaving only 3 bits for the codepoint, and each following byte has a prefix of 10, leaving 6 bits). Of course, any reasonable UTF-8 validator will also check that the codepoint is below 0x10FFFF, so it would all need to be rewritten if that changed.
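    (A quick check of those limits in Python:)

    ```python
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> 3+6+6+6 = 21 payload bits.
    assert len("\U0010FFFF".encode("utf-8")) == 4  # highest valid codepoint: 4 bytes
    try:
        chr(0x110000)                              # anything above is rejected outright
    except ValueError:
        print("out of range")
    ```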



  • @kian Before RFC 3629 (which capped codepoints at 0x10FFFF), the specification for UTF-8 allowed 5-byte sequences prefixed by 111110 (allowing 2+6+6+6+6 = 26 bits) and 6-byte sequences prefixed by 1111110 (allowing 1+6+6+6+6+6 = 31 bits).
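    (For the curious, that original scheme is easy to sketch; `utf8_old_encode` is a made-up name, and the layout follows the pre-RFC 3629 specification:)

    ```python
    def utf8_old_encode(cp):
        """Encode a codepoint using the original (pre-RFC 3629) UTF-8 scheme,
        which allowed up to 6 bytes and therefore up to 31 bits."""
        if cp < 0x80:
            return bytes([cp])
        for nbytes, limit, prefix in ((2, 0x800, 0xC0), (3, 0x10000, 0xE0),
                                      (4, 0x200000, 0xF0), (5, 0x4000000, 0xF8),
                                      (6, 0x80000000, 0xFC)):
            if cp < limit:
                tail = []
                for _ in range(nbytes - 1):
                    tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
                    cp >>= 6
                return bytes([prefix | cp] + tail[::-1])
        raise ValueError("needs more than 31 bits")

    # Agrees with modern UTF-8 inside the current range...
    assert utf8_old_encode(0x10FFFF) == "\U0010FFFF".encode("utf-8")
    # ...and happily encodes the full 31-bit maximum in 6 bytes.
    print(utf8_old_encode(0x7FFFFFFF).hex())  # fdbfbfbfbfbf
    ```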


  • 🚽 Regular

    @cvi said in "ASCII":

    If you want to piss off @dkf ... you'd just add the 16M codepoints to cover all of eight-bit RGB (getting RGBA to fit might be a bit more effort).

    Don't forget the need to support other colorspaces!
    Unicode isn't applicable only to screens.
    🌈


  • Discourse touched me in a no-no place

    @zecc said in "ASCII":

    Don't forget the need to support other colorspaces!

    Also, we shouldn't be discriminating against tetrachromats.



  • @zecc Also, why is everybody disregarding alpha? Transparency is an important virtue when it comes to communication. 🛒



  • @khudzlin said in "ASCII":

    UTF-8 can easily deal with 31 bits.

    It can also be easily modified to deal with infinite bits.


  • area_can

    In 10 years unicode will be a reimplementation of some subset of HTML styling

    we will finally be able to use <font> and <colour> without HTML


  • Considered Harmful

    @bb36e said in "ASCII":

    In 10 years unicode will be a reimplementation of some subset of HTML styling

    we will finally be able to use <font> and <colour> without HTML

    In 10 years, given everyone's apparent obsession with using old stupid things instead of new fancy things (see: editors over IDEs, git over being sane, etc.), we'll be right back to using ASCII.


  • ♿ (Parody)

    @pie_flavor said in "ASCII":

    we'll be right back to using ASCII

    That's how I use UTF-8.



  • @pie_flavor If you're suggesting that people will go back to being sane, I'd suggest that you're (a) overly optimistic, and (b) unfamiliar with history. :trollface:


  • Considered Harmful

    @cvi No, I'm saying they're going back to using git.


  • area_can

    @pie_flavor ascii embraces the Unix philosophy, this is why our program is only available in North America (Canada, Mexico excluded)



  • @bb36e Maybe this time we'll finally get something that accepts a UTC timestamp and is to be rendered in the renderer's local time! No more having to tell your timezone to the server!



  • @pie_flavor Back to git? Isn't git one of the more recent inventions in source control? I mean there's fossil, but I've never met anybody actually using that.



  • @cvi said in "ASCII":

    I mean there's fossil, but I've never met anybody actually using that.

    They're all dead.


  • Notification Spam Recipient

    @arantor said in "ASCII":

    Is there a semantic difference between a red and a green apple?

    One tastes better than the other. 🚎

