Am I the only one who knows what "strongly-typed" even means anymore???



  • @PWolff said:

    Doesn't convince me at all. To me, characters are still members of an alphabet, and have an order, but nothing even remotely similar to an inherent numerical value which would be necessary to sensibly define a character distance.

    Oh god, it's a tangent, ok?

    The most important point of this whole thing...

    Don't treat a character as a number, and don't give access to its implementation details.

    If, for some god-awful reason that makes no sense, @PWolff, you must have the characters represented as a number, provide an interface that does the conversion.
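
    A minimal sketch of that interface idea in C (the names character_t and character_to_codepoint are invented for illustration, not from any real library): wrapping the value in a struct hides the representation and makes a - b a compile error, while the conversion function is the one explicit escape hatch.

        #include <stdint.h>

        /* Opaque character type: the numeric representation is an
           implementation detail, not part of the public interface. */
        typedef struct { uint32_t codepoint; } character_t;

        /* The single, explicit escape hatch: callers who really need
           a number have to say so instead of getting one by accident. */
        static uint32_t character_to_codepoint(character_t c) {
            return c.codepoint;
        }

        /* character_t a, b;
           a - b;                          -- compile error in C
           character_to_codepoint(a) - character_to_codepoint(b);
                                           -- explicit and intentional */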

    Fucking hell....


  • Considered Harmful

    But this doesn't even return from the tangent completely! Representing a character as a number is in significant ways orthogonal to whether it is useful to be able to use the '-' operator to obtain a character difference - which is itself a separate point from whether that difference can be meaningfully represented as an integer. The snippet of code presented was taking a character difference, mind you.

    It seems that there are significant and useful subsets of Unicode for which an ordering is valid - so it is valid and consistent to use a '-' operator to obtain a difference between two characters which are members of one such subset, and such a difference can be dealt with in a type-safe fashion.
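
    A rough illustration in C, with ASCII lowercase standing in for "one such subset" (subset_char_diff is a made-up name):

        #include <assert.h>
        #include <stdbool.h>

        /* Membership test for one ordered, contiguous subset:
           the ASCII lowercase letters 'a'..'z'. */
        static bool is_ascii_lower(unsigned int c) {
            return 'a' <= c && c <= 'z';
        }

        /* A character difference that is only defined when both
           operands are members of the same ordered subset. */
        static int subset_char_diff(unsigned int a, unsigned int b) {
            assert(is_ascii_lower(a) && is_ascii_lower(b));
            return (int)a - (int)b;
        }

    So subset_char_diff('d', 'a') is 3, while subset_char_diff('d', '7') trips the assertion instead of silently producing nonsense.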



  • @Magus said:

    Yes, I am aware that 'alpha' and 'beta' are Egyptian words.

    Oh? I thought they were from the Phoenician script ('alep', a cognate of the Hebrew 'aleph' and meaning 'ox' in both languages, and 'bet', a cognate of 'beth', meaning 'house', again in both Phoenician and Hebrew). IIRC (and Wicked-pedo backs me up, for what that's worth), the hypothesis that they came out of Ancient Egyptian is no longer considered correct, but the earliest connection anyone can make is to 'Proto-Canaanite', which is a lot later than either Egyptian hieroglyphs or Sumerian cuneiform (sometimes argued as another possible source). Neither of them seems connected to the Minoan scripts, either, though it is hard to say because there are so few samples of them that only one of them (Linear B) is even partially interpretable.

    In any case, as has already been said, we're talking 'alphabets' in the Information Theoretical sense of a set of symbols with an accepted meaning or interpretation, not the philological sense. By that definition, any script with a known meaning, regardless of its form, is an alphabet, so long as there are a finite number of accepted members of that script, including ideograms, hieroglyphics, and syllabaries (well, ideograms could be argued, as there are some generational rules for creating new ones based on existing radicals, but the point remains).



  • No, you have it all wrong. Xenu gave us the words. You probably just don't have a high enough thetan level to know that.



  • Well, you know, getting blown up in a supervolcano tends to mess with one's memories.



  • @flabdablet said:

    Unicode is more than just a set

    See my second comment in the post you're replying to: You're assuming the set of all Unicode code points. You're going to build substandard software if you make assumptions like this. While you can build a working implementation of the set of all Unicode code points and use that for your set operations, it's of extremely limited usefulness.

    @flabdablet said:

    natural interpretation of the subtraction operator applied to characters

    doesn't exist. See my first comment in the post you're replying to. I don't want to go all @blakeyrat on you, but please read what you're replying to.



  • Cool has a charAt method on its String class, but it doesn't have any way of turning numbers into a string.


  • Discourse touched me in a no-no place

    Somewhere in this thread I think we all got a bit lost. Yes, alphabetic distance is only defined with respect to a particular language/locale, but there are good uses for sometimes ignoring that stuff and just using the codepoints. For example, there are sorting and searching algorithms which use that sort of thing (e.g., for building a distance estimator for a binary search). The calling code doesn't need to know that things are ordered, but the algorithms themselves care and can go incredibly fast because of it.
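
    To make that concrete, here's a rough sketch in C (find_codepoint is a made-up name, and a sorted array of codepoints is assumed): interpolation search uses the codepoint difference purely as a position estimate, which is why it can home in faster than plain bisection, and the calling code never touches the arithmetic.

        #include <stddef.h>

        /* Interpolation search over a sorted array of codepoints.
           Returns the index of target, or -1 if it is absent. */
        static long find_codepoint(const unsigned int *a, size_t n,
                                   unsigned int target) {
            size_t lo = 0, hi = n;          /* half-open range [lo, hi) */
            while (lo < hi) {
                unsigned int first = a[lo], last = a[hi - 1];
                size_t mid;
                if (target < first || last < target)
                    return -1;
                if (first < last)           /* estimate position from the codepoint distance */
                    mid = lo + (size_t)(((unsigned long long)(target - first)
                                         * (hi - 1 - lo)) / (last - first));
                else
                    mid = lo;               /* every remaining value is equal */
                if (a[mid] == target)
                    return (long)mid;
                if (a[mid] < target)
                    lo = mid + 1;
                else
                    hi = mid;
            }
            return -1;
        }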

    It's probably better to avoid the words “letter” and “alphabet” when talking about computers. Instead, talk about characters, character sets, codepoints, charsets (this is what Unicode really is, BTW), encodings and glyphs.



  • @dkf said:

    Somewhere in this thread I think we all got a bit lost.

    That never happens here!



  • @another_sam said:

    You're assuming the set of all Unicode code points

    No, I'm saying that specifying Unicode is reasonable design.


  • 🚽 Regular

    @ben_lubar said:

    Cool has a charAt method on its String class, but it doesn't have any way of turning numbers into a string.

    What? No toString method or string formatting? That doesn't sound cool at all.



  • Well, there is a toString method on Int.

        // toString returns a decimal string representation of the integer.
        override def toString() : String =
            if (this < 0)
                "-".concat({
                    // Special case: negating Int.MinValue overflows back to
                    // a negative value, so hard-code its digits instead.
                    var n : Int = -this;
                    if (n < 0)
                        "2147483648"
                    else
                        n.toString()
                })
            else {
                var digits : String = "0123456789";
                var s : String = "";
                var n : Int = this;
                // Peel off the least significant digit each time through
                // and prepend it to the result.
                while (0 < n) {
                    var n10 : Int = n / 10;
                    var d : Int = n - n10 * 10;
                    s = digits.substring(d, d + 1).concat(s);
                    n = n10
                };
                if (s.length() == 0)
                    "0"
                else
                    s
            };


  • @dkf said:

    For example, there are sorting and searching algorithms which use that sort of thing (e.g., for building a distance estimator for a binary search).

    Fine, let the people implementing these edge cases use the special calls (.ToUnicodeCodePoint() or whatever) and make the "-" operator a compile error, so that the former can actually do their job without everyone else accidentally writing software that only works for half of the world.


  • Considered Harmful

    Not sure it's necessary to make an either-or distinction here, particularly the one you're suggesting. If the operands are checked for orderability with one another, the "-" operator seems as fine a choice for taking character differences as "+" does for string concatenation. Granted, that checking can only happen at run-time, unless there's a typing system for characters that takes locale into account.
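
    Something along these lines, say (C; the run table and char_diff are invented for illustration, and the Greek row is only approximate since 0x03A2 is actually unassigned):

        #include <stdio.h>
        #include <stdlib.h>

        /* Ordered, contiguous runs whose members are mutually orderable. */
        typedef struct { unsigned int lo, hi; } run_t;
        static const run_t runs[] = {
            { 'A', 'Z' }, { 'a', 'z' }, { '0', '9' },
            { 0x0391, 0x03A9 },             /* Greek capitals, roughly */
        };

        static int run_of(unsigned int c) {
            for (int i = 0; i < (int)(sizeof runs / sizeof runs[0]); i++)
                if (runs[i].lo <= c && c <= runs[i].hi)
                    return i;
            return -1;
        }

        /* '-' for characters, defined only when both operands are
           orderable with one another, i.e. members of the same run. */
        static int char_diff(unsigned int a, unsigned int b) {
            int ra = run_of(a), rb = run_of(b);
            if (ra < 0 || ra != rb) {
                fprintf(stderr, "characters not mutually orderable\n");
                abort();
            }
            return (int)a - (int)b;
        }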

    I'm not sure that inflexible defense of learned pragmas is the best approach to language design.


  • Discourse touched me in a no-no place

    @Gribnit said:

    a typing system for characters that takes locale into account

    Please, God, no.



  • Since much of the locale-specific weirdness applies more to strings than to individual characters, I'd favour keeping the character type simple and general and building all the locale processing into string libraries.
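
    For what it's worth, standard C already splits things that way: char stays a dumb integral type, and collation lives in the string functions plus the active locale. A small sketch (assuming a system where the de_DE.UTF-8 locale is installed):

        #include <locale.h>
        #include <stdio.h>
        #include <string.h>

        int main(void) {
            /* Characters stay simple; collation is a property of the
               string library plus the currently selected locale. */
            if (setlocale(LC_COLLATE, "de_DE.UTF-8") == NULL)
                return 1;                   /* locale not available here */

            /* strcoll uses the locale's collation rules, under which
               "äpfel" sorts near "apfel", so the result is positive;
               strcmp compares raw bytes, and the UTF-8 lead byte of
               'ä' is bigger than 'z', so the result is negative. */
            printf("strcoll: %d\n", strcoll("zebra", "\xc3\xa4pfel"));
            printf("strcmp:  %d\n", strcmp("zebra", "\xc3\xa4pfel"));
            return 0;
        }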


  • Discourse touched me in a no-no place

    Not just that, things get really gnarly when you're working with multilingual texts, yet at the same time most code doesn't actually give a shit at all.



  • @PWolff said:

    Outside DotNet C++, it seems there isn't even a data type called byte, so we couldn't even pretend a char would be something different. Total BS indeed.

    There is uint8_t in stdint.h, which at least makes your intent clear.

    It's not entirely cross-platform, though: it's only required to be defined if the implementation has an exact 8-bit datatype.
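
    A tiny sketch of the difference in intent:

        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            /* uint8_t announces "these are raw octets", where a plain
               char buffer leaves the reader guessing text vs. bytes. */
            uint8_t raw[4] = { 0xDE, 0xAD, 0xBE, 0xEF };
            for (int i = 0; i < 4; i++)
                printf("%02X", (unsigned)raw[i]);
            printf("\n");                   /* prints DEADBEEF */
            return 0;
        }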

