Git hates UTF-16


  • Discourse touched me in a no-no place

    @boomzilla said in Git hates UTF-16:

    I asked multiple times for someone to show me a sequence of characters that wasn't a sequence of bytes

    I once saw someone implement a string (which is the very definition of a sequence of characters) as a tree of bytes. Does that count? (They forgot to make the tree balanced; their performance sucked balls even before considering the memory effects of using such a thoroughly wasteful representation.)


  • ♿ (Parody)

    @dkf said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    I asked multiple times for someone to show me a sequence of characters that wasn't a sequence of bytes

    I once saw someone implement a string (which is the very definition of a sequence of characters) as a tree of bytes. Does that count? (They forgot to make the tree balanced; their performance sucked balls even before considering the memory effects of using such a thoroughly wasteful representation.)

    Hmm...were the contents of the tree in a block of contiguous memory? If not then I will give you the prize.


  • ♿ (Parody)

    @levicki said in Git hates UTF-16:

    You know what's the difference between a character and a byte? They're different entities.

    @boomzilla said in Git hates UTF-16:

    What? How?

    Because both are different levels of abstraction? You know, it's like a fractal, you can keep zooming in for more details.

    Well, yes. It's using different abstractions on the same thing. But in the context of each abstraction, of course they are different.

    Unicode example:

    "The maximum valid code point in Unicode is U+10FFFF, which makes it a 21-bit code set (but not all 21-bit integers are valid Unicode code points..."

    So how is a code point (of which one or more can represent a single logical character) a sequence of bytes, when it's not whole bytes anymore?

    Any unicode character is going to be stored in some unicode encoding. All encodings I'm aware of currently use some multiple of bytes.

    How do you say "it's just bytes" without describing how to read it back properly?

    What difference does that make? If it's in Chinese I won't be able to read the characters. Does that make them non characters?

    Also, did you know that you can use "accented a" and have one code point, or combine "accent" and "a" and have two code points, both of which render an identical character?

    I know that there is a lot of fuckery possible with Unicode. And when you do it you're using bytes arranged in some encoding format to do it.
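
    A quick Python 3 sketch of that last point, for illustration: the precomposed form and the combining-sequence form of "á" render identically, but they are different code point sequences and different bytes, and only compare equal after normalization.

        # Precomposed vs. combining sequence: same rendered character, different code points and bytes.
        import unicodedata

        precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE
        combining = "a\u0301"    # 'a' followed by U+0301 COMBINING ACUTE ACCENT

        print(len(precomposed), len(combining))                         # 1 2  (code points)
        print(precomposed.encode("utf-8"), combining.encode("utf-8"))   # b'\xc3\xa1' b'a\xcc\x81'
        print(precomposed == combining)                                 # False
        print(unicodedata.normalize("NFC", combining) == precomposed)   # True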


  • Considered Harmful

    @boomzilla said in Git hates UTF-16:

    @levicki said in Git hates UTF-16:

    You know what's the difference between a character and a byte? They're different entities.

    @boomzilla said in Git hates UTF-16:

    What? How?

    Because both are different levels of abstraction? You know, it's like a fractal, you can keep zooming in for more details.

    Well, yes. It's using different abstractions on the same thing. But in the context of each abstraction, of course they are different.

    Unicode example:

    "The maximum valid code point in Unicode is U+10FFFF, which makes it a 21-bit code set (but not all 21-bit integers are valid Unicode code points..."

    So how is a code point (of which one or more can represent a single logical character) a sequence of bytes, when it's not whole bytes anymore?

    Any unicode character is going to be stored in some unicode encoding. All encodings I'm aware of currently use some multiple of bytes.

    How do you say "it's just bytes" without describing how to read it back properly?

    What difference does that make? If it's in Chinese I won't be able to read the characters. Does that make them non characters?

    Also, did you know that you can use "accented a" and have one code point, or combine "accent" and "a" and have two code points, both of which render an identical character?

    I know that there is a lot of fuckery possible with Unicode. And when you do it you're using bytes arranged in some encoding format to do it.

    Well, there's the ternary computing world, where you don't have bits at all, you have trits. There are no bytes there, though there's probably something equivalent.



  • @Gąska said in Git hates UTF-16:

    we're just bike shedding about dictionary.

    Welcome to TDWTF.



  • @Gąska said in Git hates UTF-16:

    Because "is" doesn't work like that.

    It doesn't necessarily work the way you think it does, either.

    "The sky is blue" is a perfectly valid and true statement. But that doesn't mean everything that is blue is sky. class B extends A — B is an A; it has properties that A doesn't have, and it doesn't have (or at least can't directly access) some (private) properties that A does have, but it still is an A. "Is" doesn't necessarily mean "is identical to".



  • @boomzilla said in Git hates UTF-16:

    You've gone to a layer where I'm honestly a lot less knowledgeable, in that I don't really know how semiconductors work at the electron level.

    👋 Ordinarily, I'd love to explain, but this whole argument is :pendant:ic dickweedery of a degree that even I, Viscount Pedantic Dickweed of the Noble Order of the Garter, don't care to descend into any further than I already have. Good luck.

    overly pedantic

    My usual response to this is "impossible," but in this case, I'm going to allow it.


  • BINNED

    @boomzilla said in Git hates UTF-16:

    I fully agree that there are other representations of characters in the world but was only talking about the sorts you get in strings.

    Like these? 🚎



    Man, this is the longest, dumbest argument I've seen here. Everyone agrees on the facts and is just upset over different meanings of the word "is".

    A sequence of characters "is", concretely, a sequence of bytes: memory is an array of bytes, and anything stored in memory must therefore be serialized as a sequence of bytes. Operating on the bytes will affect the sequence of characters, since on the concrete, real world, that's were they exist. As long as your sequence of characters exists in the machine, it is just a way of looking at a specific set of bytes.

    A sequence of characters "is not", abstractly, a sequence of bytes: you can perform operations on a sequence of bytes that result in a valid sequence of bytes, but the resulting sequence of bytes is no longer a valid sequence of characters. You can also conceive of sequences of characters independently of any memory where they might be stored. You can write the sequence down on paper, for example, without using bytes.

    Both statements are true, and you've spent hundreds of posts arguing which meaning of the word "is" you like better. It's amazing.
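
    For illustration, a small Python 3 sketch of both views at once: the same string seen as bytes, and a perfectly legal byte operation that leaves you with a valid sequence of bytes but no valid sequence of characters.

        # One concrete byte view of the string: its UTF-8 encoding.
        s = "naïve"
        raw = bytearray(s.encode("utf-8"))   # b'na\xc3\xafve'

        # A valid operation on bytes...
        raw[2] ^= 0xFF

        # ...whose result is still a sequence of bytes, but no longer decodes to characters.
        try:
            print(raw.decode("utf-8"))
        except UnicodeDecodeError as err:
            print("valid bytes, invalid characters:", err)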


  • Considered Harmful

    @Kian said in Git hates UTF-16:

    A sequence of bytes "is not", abstractly, a sequence of bytes


  • Discourse touched me in a no-no place

    @boomzilla said in Git hates UTF-16:

    Hmm...were the contents of the tree in a block of contiguous memory? If not then I will give you the prize.

    Probably not contiguous. It was all just (equivalent of) malloc()ed nodes with one character per node. Plus all the pointers so that the tree could be traversed. Triumph of theoretically-interesting data structures and all that.

    The creator was so proud of it. And so amazingly disappointed when we proved that it was awful for practical things like reading a file and parsing it. 😆
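
    For illustration, a rough Python 3 sketch of the kind of structure being described (the exact shape is assumed): one character of payload per heap-allocated node plus two child pointers, so the bookkeeping dwarfs the text and a traversal is needed just to read the string back.

        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class Node:
            ch: str                          # exactly one character of payload per node
            left: "Optional[Node]" = None    # plus two pointers of overhead
            right: "Optional[Node]" = None

        def to_str(node: Optional[Node]) -> str:
            """In-order traversal reassembles the string."""
            if node is None:
                return ""
            return to_str(node.left) + node.ch + to_str(node.right)

        # "cat" as a (completely unbalanced) tree: every character hangs off the right child.
        s = Node("c", right=Node("a", right=Node("t")))
        print(to_str(s))   # cat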


  • Discourse touched me in a no-no place

    @Kian said in Git hates UTF-16:

    A sequence of characters "is", concretely, a sequence of bytes: memory is an array of bytes, and anything stored in memory must therefore be serialized as a sequence of bytes. Operating on the bytes will affect the sequence of characters, since on the concrete, real world, that's were they exist. As long as your sequence of characters exists in the machine, it is just a way of looking at a specific set of bytes.

    Bytes are abstractions too, though lower-level ones. It's all a tower of abstractions, one built on another.


  • Banned

    @HardwareGeek said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    Because "is" doesn't work like that.

    It doesn't necessarily work the way you think it does, either.

    "The sky is blue" is a perfectly valid and true statement. But that doesn't mean everything that is blue is sky. class B extends A — B is an A; it has properties that A doesn't have, and it doesn't have (or at least can't directly access) some (private) properties that A does have, but it still is an A. "Is" doesn't necessarily mean "is identical to".

    You're absolutely correct. I've even talked about exactly this for a few dozen posts. And even with this alternative meaning of "is", @boomzilla is still wrong.


  • Banned

    @Kian said in Git hates UTF-16:

    A sequence of characters "is", concretely, a sequence of bytes: memory is an array of bytes, and anything stored in memory must therefore be serialized as a sequence of bytes. Operating on the bytes will affect the sequence of characters, since on the concrete, real world, that's were they exist. As long as your sequence of characters exists in the machine, it is just a way of looking at a specific set of bytes.

    Are the quotes around "is" meaningful? Like, is it just a stylistic choice, or did you mean them as sort of scare quotes, to indicate that this paragraph is kinda sorta right if you squint a lot but actually it isn't entirely correct?



  • @Gąska stylistic choice, to highlight that both 'is' and 'is not' (below) apply.


  • Banned

    @Kian okay, I see. In which case, I can say that your post isn't entirely correct. Because you're making the same mistake as @boomzilla. To go back to his equation analogy - when you write down an equation, the equation doesn't become the symbols you've written. It's just represented by those symbols. There are some operations you can perform on an equation, and you can implement those operations by writing down the steps you've taken. But what you wrote down isn't those operations. It's just a representation of those operations. Those operations can happen independently of what you write down. And what you write down is independent of those operations. When you mess up in the process of writing down the steps of operations so that the left doesn't equal the right, it doesn't change the original equation - it's just that the representation is mangled and doesn't represent the same equation anymore. The equation isn't the symbols - the symbols only represent the equation for ease of processing. Same with characters and bytes.


  • Discourse touched me in a no-no place

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing.

    Arguably, equations are just symbols. They may be interpreted to represent statements about other things, but the equation itself is an arrangement of symbols. Some types of logic are built up purely from symbolic processing: there the symbols really don't represent anything, except they can then be built up to look the same and work the same as much of more conventional mathematical logic…


  • ♿ (Parody)

    @dkf said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    Hmm...were the contents of the tree in a block of contiguous memory? If not then I will give you the prize.

    Probably not contiguous. It was all just (equivalent of) malloc()ed nodes with one character per node. Plus all the pointers so that the tree could be traversed. Triumph of theoretically-interesting data structures and all that.

    The creator was so proud of it. And so amazingly disappointed when we proved that it was awful for practical things like reading a file and parsing it. 😆

    A lot of fun "puzzles" like that are ultimately useless but I think they're still good learning experiences, even if the immediate result is disappointing.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @HardwareGeek said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    Because "is" doesn't work like that.

    It doesn't necessarily work the way you think it does, either.

    "The sky is blue" is a perfectly valid and true statement. But that doesn't mean everything that is blue is sky. class B extends A — B is an A; it has properties that A doesn't have, and it doesn't have (or at least can't directly access) some (private) properties that A does have, but it still is an A. "Is" doesn't necessarily mean "is identical to".

    You're absolutely correct. I've even talked about exactly this for a few dozen posts. And even with this alternative meaning of "is", @boomzilla is still wrong.

    Only in ways that you can't express, though, apparently.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    Because you're making the same mistake as @boomzilla.

    Which is disagreeing with you.

    To go back to his equation analogy - when you write down an equation, the equation doesn't become the symbols you've written.

    And you're getting this all wrong by ignoring what I wrote and replacing it with something in your head. :rolleyes:


  • Banned

    @dkf said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing.

    Arguably, equations are just symbols.

    Wikipedia seems to disagree.

    In mathematics, an equation is a statement that asserts the equality of two expressions.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @dkf said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing.

    Arguably, equations are just symbols.

    Wikipedia seems to disagree.

    In mathematics, an equation is a statement that asserts the equality of two expressions.

    :wtf: do you think an expression is? Do you actually know that you're pendanting the word here or do you actually believe what you're typing?


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    Because you're making the same mistake as @boomzilla.

    Which is disagreeing with you.

    Yes. Disagreeing with someone who's right is a mistake.

    To go back to his equation analogy - when you write down an equation, the equation doesn't become the symbols you've written.

    And you're getting this all wrong by ignoring what I wrote and replacing it with something in your head. :rolleyes:

    I'm not ignoring it. I'm saying what you wrote is wrong.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    I'm not ignoring it. I'm saying what you wrote is wrong.

    Yes, I know what you're saying, but it addresses something other than what I wrote. I can tell because you're talking about writing the equation down, not the equation and the abstractions involved in the equation. But I understand now that you're incapable of moving between different levels and thinking about them at the same time so I guess I understand why you went there instead of dealing with the actual ideas there.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @dkf said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing.

    Arguably, equations are just symbols.

    Wikipedia seems to disagree.

    In mathematics, an equation is a statement that asserts the equality of two expressions.

    :wtf: do you think an expression is?

    A statement that asserts the equality of two expressions.

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @dkf said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing.

    Arguably, equations are just symbols.

    Wikipedia seems to disagree.

    In mathematics, an equation is a statement that asserts the equality of two expressions.

    :wtf: do you think an expression is?

    A statement that asserts the equality of two expressions.

    I invite you to re-read my question and then read your answer.

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.

    And that's just silly and wrong. You've overcomplicated things and have convinced yourself that obvious things are obviously wrong with your interpretations of dictionary definitions and philosophical navel gazing. It's truly marvelous and a testament to the human ability for self-deception.



  • @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing. Same with characters and bytes.

    You are too focused on the ideal value that an object represents and unable to accept that real representations of that value also are that value. That is to say, at the level you are talking about, it is impossible to copy a string, for example, because both copies are actually the same string. Nor can you modify a string, because the first value continues to exist after the representation of it has been changed, and the new value also already existed, you're just now changing the representation to represent the new value. And that level of abstraction can be useful for certain operations, the entirety of functional programming is based on operating at this level, but that is not the only way to operate.

    For most people, the concrete representation of a thing is the thing. A wooden chair is a chair, even if the platonic ideal of a chair encompasses more things. You're arguing that chairs aren't made of wood because chairness doesn't define a material, and you can have metal chairs too. And yes, that's true, but as long as I can sit my ass on it it's a chair, and if I can set it on fire it is wood, and it is both things at once. It being a chair isn't going to protect me from getting splinters just because "you can get a splinter from it" is not a fundamental quality of chairness.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @dkf said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing.

    Arguably, equations are just symbols.

    Wikipedia seems to disagree.

    In mathematics, an equation is a statement that asserts the equality of two expressions.

    :wtf: do you think an expression is?

    A statement that asserts the equality of two expressions.

    I invite you to re-read my question and then read your answer.

    ...yeah, I misread your question. After 200 posts of you mixing various generic insults with repeating the same wrong statements ad nauseam, I get a bit careless. That's entirely on me.

    OK, let me check what the Almighty Wikipedia says about expression...

    OK let me check what the Almighty Wikipedia says about symbols...

    ...

    Okay, you've got me here. What I said makes no sense. I retract everything I said about writing down equations - it's a completely wrong analogy. I should've stuck with geometric lines and their descriptions instead. This is much closer to the relation of characters to bytes - since equations are literally defined in terms of symbols, while lines are independent from their representation as equations, just like characters are independent from bytes.

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.

    And that's just silly and wrong.

    Silly, or wrong? Because pedantry is always silly, but it's never wrong. If it's wrong, it's not pedantry - it's bullshit. Pedantry is annoying specifically because it's entirely correct.

    You've overcomplicated things and have convinced yourself that obvious things are obviously wrong with your interpretations of dictionary definitions and philosophical navel gazing.

    Are my interpretations wrong? If so, what exactly about my interpretations is wrong? Which part of the definitions have I got wrong? What do those parts mean instead?

    If you can't answer any of these questions precisely (ideally with a citation from some dictionary, encyclopedia, or other source), then shut up and just accept the truth. You're free to call me an annoying asshat. You're free to call me a pedantic dickweed. But you can't say I'm wrong when I'm not.


  • Discourse touched me in a no-no place

    @Kian said in Git hates UTF-16:

    And yes, that's true, but as long as I can sit my ass on it it's a chair, and if I can set it on fire it is wood, and it is both things at once.

    It could be a beanbag…

    (image)


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @dkf said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing.

    Arguably, equations are just symbols.

    Wikipedia seems to disagree.

    In mathematics, an equation is a statement that asserts the equality of two expressions.

    :wtf: do you think an expression is?

    A statement that asserts the equality of two expressions.

    I invite you to re-read my question and then read your answer.

    ...yeah, I misread your question. After 200 posts of you mixing various generic insults with repeating the same wrong statements ad nauseam, I get a bit careless. That's entirely on me.

    And perhaps it's been going on for a while. Note: the insults are just because you are denying the trivially true and obvious.

    OK, let me check what the Almighty Wikipedia says about expression...

    OK let me check what the Almighty Wikipedia says about symbols...

    ...

    Okay, you've got me here. What I said makes no sense. I retract everything I said about writing down equations - it's a completely wrong analogy. I should've stuck with geometric lines and their descriptions instead. This is much closer to the relation of characters to bytes - since equations are literally defined in terms of symbols, while lines are independent from their representation as equations, just like characters are independent from bytes.

    You're approaching enlightenment.

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.

    And that's just silly and wrong.

    Silly, or wrong? Because pedantry is always silly, but it's never wrong. If it's wrong, it's not pedantry - it's bullshit. Pedantry is annoying specifically because it's entirely correct.

    You're trying to be pedantically correct with your "the meaning of is" bullshit but you're failing. I've speculated that it's due to the way you learned English. Though I think the ontology discussion in the salon gave you wrong ideas, too, in that you confused a philosophical way of categorizing concepts with actual things that we use (even though they are kind of abstract, they kind of aren't as evidenced by the text we have been posting right here).

    You've overcomplicated things and have convinced yourself that obvious things are obviously wrong with your interpretations of dictionary definitions and philosophical navel gazing.

    Are my interpretations wrong? If so, what exactly about my interpretations is wrong? Which part of the definitions have I got wrong? What do those parts mean instead?

    Remember how I kept asking you where the bytes went that were making up the characters? That's the bit. You need to figure out the mental block you have in seeing that the characters are also bytes. Again, you need to be able to simultaneously see them at multiple abstraction levels.

    If you can't answer any of these questions precisely (ideally with a citation from some dictionary, encyclopedia, or other source), then shut up and just accept the truth. You're free to call me an annoying asshat. You're free to call me a pedantic dickweed. But you can't say I'm wrong when I'm not.

    You and your fucking dictionaries. Just tell me where the bytes go! You know, the ones on the stack or on the heap. Why are they suddenly not bytes when someone is using them as characters? No dictionary is required here. It really is that simple. You're overcomplicating things and getting yourself all spun around.


  • ♿ (Parody)

    @dkf said in Git hates UTF-16:

    @Kian said in Git hates UTF-16:

    And yes, that's true, but as long as I can sit my ass on it it's a chair, and if I can set it on fire it is wood, and it is both things at once.

    It could be a beanbag…

    (image)

    (image)


  • Discourse touched me in a no-no place

    @Gąska said in Git hates UTF-16:

    A statement that asserts the equality of two expressions.

    An equation is a relationship between two expressions, usually under an assumed common evaluation scheme. It's not necessarily true. I can write 1 = 2 but despite that being clearly an equation in its syntactic form, its valuation is always false.

    There are more basic notions of =-like things than equality. Specifically, “definition” (≝) and “syntactically identical” (≡). Neither of those depends on a valuation scheme other than one that is purely about the syntax.

    Most mathematicians aren't quite this cautious with things. It tends to be the ones tending toward theoretical CS who are this :pendant:-inclined, precisely because computers are so thoroughly good at moving symbols around that you need a damn good grip on what they mean in the first place or you end up utterly lost. (For example, all strings are also numbers. Computing has picked a pretty-much universal Gödel numbering for everything, and byte-strings that encode character strings using UTF-8 are pretty close to the heart of it.)
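
    A tiny Python 3 illustration of that last parenthetical: reading a string's UTF-8 bytes as one big integer is a crude but workable Gödel-style numbering, and it round-trips.

        s = "Git hates UTF-16"
        n = int.from_bytes(s.encode("utf-8"), "big")   # every string maps to a number
        print(n)
        back = n.to_bytes((n.bit_length() + 7) // 8, "big").decode("utf-8")
        print(back == s)   # True (round-trips here; a leading NUL byte would be lost)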



  • @dkf said in Git hates UTF-16:

    @Kian said in Git hates UTF-16:

    And yes, that's true, but as long as I can sit my ass on it it's a chair, and if I can set it on fire it is wood, and it is both things at once.

    It could be a beanbag…

    (image)

    Aren't beanbags a kind of chair though?


  • Banned

    @Kian said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    The equation isn't the symbols - the symbols only represent the equation for ease of processing. Same with characters and bytes.

    You are too focused on the ideal value that an object represents and unable to accept that real representations of that value also are that value.

    There's a reason why we call them representations.

    That is to say, at the level you are talking about, it is impossible to copy a string, for example, because both copies are actually the same string. Nor can you modify a string, because the first value continues to exist after the representation of it has been changed, and the new value also already existed, you're just now changing the representation to represent the new value.

    Depends. Are we talking about strings, or instances of strings? Because instances have identity beyond their raw content, and so copying and modifying them makes perfect sense. And it's all still at an abstract, above-memory level, with entities independent of their representation.

    For most people, the concrete representation of a thing is the thing.

    There's even a technical term for exactly this behavior: leaky abstraction.

    Computer science is hard. Many people (basically everybody) simplify many aspects of it because they can't wrap their heads around all the concepts, or because they're tired of it, or because of performance reasons (theory, meet practice). But those simplifications aren't entirely correct. Like all simplifications, they omit some details, usually ones that aren't important. Newtonian dynamics are a great tool that's good enough in most situations, but those laws aren't entirely correct. Same with a statement that a sequence of characters is a sequence of bytes. It's a good enough simplification for most situations, but ultimately it's not entirely correct.

    A wooden chair is a chair, even if the platonic ideal of a chair encompasses more things. You're arguing that chairs aren't made of wood because chairness doesn't define a material, and you can have metal chairs too.

    That's not what I'm saying. I actually said the polar opposite, and was criticized by others for saying that. A sequence of characters in computer memory absolutely is made of bytes, no question about it. But it doesn't mean a sequence of characters is a sequence of bytes. Much like water, which is made of hydrogen and oxygen but cannot be said to be a mixture of hydrogen and oxygen - because an actual mixture of hydrogen and oxygen has entirely different properties. Just like characters have different properties than bytes.

    And yes, that's true, but as long as I can sit my ass on it it's a chair, and if I can set it on fire it is wood, and it is both things at once. It being a chair isn't going to protect me from getting splinters just because "you can get a splinter from it" is not a fundamental quality of chairness.

    Can you XOR characters? Not the bytes that represent them. The characters themselves. Is XOR a valid operation for two arbitrary characters? It is for arbitrary bytes. If it's valid for arbitrary bytes, and characters are bytes, then it must be valid for arbitrary characters. What is "∃ XOR 吾" equal to? And don't ask about encoding. Encoding isn't a property of a character. Encoding is a property of a particular representation of characters as bytes. Characters are the same regardless of how they're encoded. And if XOR is defined on characters, it must yield the same result regardless of encoding. So what is "∃ XOR 吾" equal to?

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.
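
    For what it's worth, a short Python 3 sketch of the XOR point being made above: the XOR only happens once you pick an encoding, and the answer changes with the encoding, so whatever got XORed, it wasn't the characters themselves.

        a, b = "\u2203", "\u543e"   # "∃" and "吾"

        def xor_bytes(x: bytes, y: bytes) -> bytes:
            # XOR two equal-length byte strings position by position.
            return bytes(p ^ q for p, q in zip(x, y))

        print(xor_bytes(a.encode("utf-8"), b.encode("utf-8")))          # b'\x07\x18='
        print(xor_bytes(a.encode("utf-16-le"), b.encode("utf-16-le")))  # b'=v'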


  • Banned

    @boomzilla said in Git hates UTF-16:

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.

    And that's just silly and wrong.

    Silly, or wrong? Because pedantry is always silly, but it's never wrong. If it's wrong, it's not pedantry - it's bullshit. Pedantry is annoying specifically because it's entirely correct.

    You're trying to be pedantically correct with your "the meaning of is" bullshit but you're failing. I've speculated that it's due to the way you learned English. Though I think the ontology discussion in the salon gave you wrong ideas, too, in that you confused a philosophical way of categorizing concepts with actual things that we use (even though they are kind of abstract, they kind of aren't as evidenced by the text we have been posting right here).

    The thing is, "is" isn't what anybody uses for anything practical - at least not in the sense that you're trying to do (putting the wrongness aside for a moment). "Is implemented with" is used in practice. "Is represented by" is used in practice. "Is identical to" is used in practice. "Is subclass of" is used in practice. But "is" alone isn't. Because it's a purely ontological concept, as explained by Captain in that Salon thread.

    You've overcomplicated things and have convinced yourself that obvious things are obviously wrong with your interpretations of dictionary definitions and philosophical navel gazing.

    Are my interpretations wrong? If so, what exactly about my interpretations is wrong? Which part of the definitions have I got wrong? What do those parts mean instead?

    Remember how I kept asking you where the bytes went that were making up the characters? That's the bit. You need to figure out the mental block you have in seeing that the characters are also bytes. Again, you need to be able to simultaneously see them at multiple abstraction levels.

    You have a mental block of your own here. You keep asking about where the bytes go. As if it was relevant to the question of whether sequences of characters are sequences of bytes. I believe I know why you're so fixated on this. It's because you've already assumed right from the start that since characters are always implemented with bytes, it means that characters are bytes. With that assumption, the only way you can see to disprove that statement is to show that characters aren't implemented with bytes. Which is impossible, because it just so happens in our world that characters are always implemented with bytes. And so you conclude that characters are bytes, because they're always implemented with bytes. The conclusion is identical to the assumption you've made at the beginning. This is a textbook example of circular reasoning.

    Let me ask you this: are you able to prove in some other way that sequences of characters are sequences of bytes, without relying on the assumption that if A is always implemented with B, it means A is B?


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @Kian said in Git hates UTF-16:

    A wooden chair is a chair, even if the platonic ideal of a chair encompasses more things. You're arguing that chairs aren't made of wood because chairness doesn't define a material, and you can have metal chairs too.

    That's not what I'm saying. I actually said the polar opposite, and was criticized by others for saying that.

    Because what we read was exactly what he's saying. I'm not sure where the disconnect is but it's somewhere on your side.

    A sequence of characters in computer memory absolutely is made of bytes, no question about it.

    🎉

    But it doesn't mean a sequence of characters is a sequence of bytes.

    But...how?! :wtf: :headdesk:

    Much like water, which is made of hydrogen and oxygen but cannot be said to be a mixture of hydrogen and oxygen - because an actual mixture of hydrogen and oxygen has entirely different properties. Just like characters have different properties than bytes.

    Yes, to say "a mixture" is wrong. But it's definitely hydrogen and oxygen. Those extra properties are 100% irrelevant to that point. However, they are 100% relevant to the second part of the contentious quote, because those other properties are important for using the stuff. The fact that you even bring them up at this point is mind-boggling. Truly. Are you taking aspirin or the traditional vodka to deal with this cognitive dissonance here where you contradict yourself?

    And yes, that's true, but as long as I can sit my ass on it it's a chair, and if I can set it on fire it is wood, and it is both things at once. It being a chair isn't going to protect me from getting splinters just because "you can get a splinter from it" is not a fundamental quality of chairness.

    Can you XOR characters? Not the bytes that represent them. The characters themselves. Is XOR a valid operation for two arbitrary characters? It is for arbitrary bytes. If it's valid for arbitrary bytes, and characters are bytes, then it must be valid for arbitrary characters. What is "∃ XOR 吾" equal to? And don't ask about encoding. Encoding isn't a property of a character. Encoding is a property of a particular representation of characters as bytes. Characters are the same regardless of how they're encoded. And if XOR is defined on characters, it must yield the same result regardless of encoding. So what is "∃ XOR 吾" equal to?

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.

    TDEMSYR. Once again, you are confusing yourself by boxing yourself into particular abstractions and refusing to see the other ones. We should put your brain in a museum.



  • @Gąska said in Git hates UTF-16:

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.

    Here you're contradicting yourself. Setting the chair on fire is a property of the wood it's made of, not of it being a chair. Set it on fire, and once all you have left is ashes you no longer have a chair. Analogously, I can XOR the bytes that make up my string, just as I can set on fire the wood my chair is made of, and the resulting set of characters will probably no longer be a valid string, just as the ashes no longer are a chair.

    You can't set on fire the "chairness" of a chair, just as you can't XOR ideal characters, but I can set on fire the wood and I can XOR the bytes.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.

    And that's just silly and wrong.

    Silly, or wrong? Because pedantry is always silly, but it's never wrong. If it's wrong, it's not pedantry - it's bullshit. Pedantry is annoying specifically because it's entirely correct.

    You're trying to be pedantically correct with your "the meaning of is" bullshit but you're failing. I've speculated that it's due to the way you learned English. Though I think the ontology discussion in the salon gave you wrong ideas, too, in that you confused a philosophical way of categorizing concepts with actual things that we use (even though they are kind of abstract, they kind of aren't as evidenced by the text we have been posting right here).

    The thing is, "is" isn't what anybody uses for anything practical - at least not in the sense that you're trying to do (putting the wrongness aside for a moment). "Is implemented with" is used in practice. "Is represented by" is used in practice. "Is identical to" is used in practice. "Is subclass of" is used in practice. But "is" alone isn't. Because it's a purely ontological concept, as explained by Captain in that Salon thread.

    Stop the navel gazing and actually read what's being said.

    You've overcomplicated things and have convinced yourself that obvious things are obviously wrong with your interpretations of dictionary definitions and philosophical navel gazing.

    Are my interpretations wrong? If so, what exactly about my interpretations is wrong? Which part of the definitions have I got wrong? What do those parts mean instead?

    Remember how I kept asking you where the bytes went that were making up the characters? That's the bit. You need to figure out the mental block you have in seeing that the characters are also bytes. Again, you need to be able to simultaneously see them at multiple abstraction levels.

    You have a mental block of your own here. You keep asking about where the bytes go. As if it was relevant to the question of whether sequences of characters are sequences of bytes.

    Well, duh! I simply can't imagine why you think this is odd.

    I believe I know why you're so fixated on this. It's because you've already assumed right from the start that since characters are always implemented with bytes, it means that characters are bytes.

    It's not exactly an assumption. It was a hypothesis based on experience and knowledge of what's going on. Aside from exotic shit like @Gribnit's trits or @dkf's weirdo tree thing, no one has an example of anything else. And certainly garden variety strings are exactly this, which is what we were talking about, no matter what @ixvedeusi wants to tell us.

    With that assumption, the only way you can see to disprove that statement is to show that characters aren't implemented with bytes. Which is impossible, because it just so happens in our world that characters are always implemented with bytes.

    I know, right? And yet you've still found a way to deny this obvious truth even when you admit it.

    And so you conclude that characters are bytes, because they're always implemented with bytes. The conclusion is identical to the assumption you've made at the beginning. This is a textbook example of circular reasoning.

    Look, I've said from the beginning that it was trivially true. And you may recall that I'm not the one denying trivially true things.

    Let me ask you this: are you able to prove in some other way that sequences of characters are sequences of bytes, without relying on the assumption that if A is always implemented with B, it means A is B?

    This is getting fox retarded now. Seriously. You are the one treating this simple and obvious fact as though it's some kind of deep or clever theorem. The proof follows from the nature of our computers and the way we represent characters in memory. It's really that simple.


  • Banned

    @Kian said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.

    Here you're contradicting yourself. Setting the chair on fire is a property of the wood it's made of, not of it being a chair. Set it on fire, and once all you have left is ashes you no longer have a chair. Analogously, I can XOR the bytes that make up my string, just as I can set on fire the wood my chair is made of, and the resulting set of characters will probably no longer be a valid string, just as the ashes no longer are a chair.

    You can't set on fire the "chairness" of a chair, just as you can't XOR ideal characters, but I can set on fire the wood and I can XOR the bytes.

    In other words, characters aren't bytes. They're just made of bytes.


  • Java Dev

    @Gąska said in Git hates UTF-16:

    @Kian said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.

    Here you're contradicting yourself. Setting the chair on fire is a property of the wood it's made of, not of it being a chair. Set it on fire, and once all you have left is ashes you no longer have a chair. Analogously, I can XOR the bytes that make up my string, just as I can set on fire the wood my chair is made of, and the resulting set of characters will probably no longer be a valid string, just as the ashes no longer are a chair.

    You can't set on fire the "chairness" of a chair, just as you can't XOR ideal characters, but I can set on fire the wood and I can XOR the bytes.

    In other words, characters aren't bytes. They're just made of bytes.

    If I draw a character on a piece of paper, bytes don't come into it. Only when I scan that character in can it be said to be composed of bytes. And if I take that character and encode it as iso-latin-1, the bytes that come out are different than when I initially scanned it in. Additionally, if the original character was, for example, é, the encoding changes if I use utf-8 or utf-16.

    As such, though characters can be encoded as bytes, I think it is not correct to say they are bytes.
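
    A brief Python 3 illustration of the é example above: the same character comes out as different bytes depending on the encoding chosen.

        c = "\u00e9"                  # é (U+00E9)
        print(c.encode("latin-1"))    # b'\xe9'
        print(c.encode("utf-8"))      # b'\xc3\xa9'
        print(c.encode("utf-16-le"))  # b'\xe9\x00'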


  • Discourse touched me in a no-no place

    @Gąska said in Git hates UTF-16:

    In other words, characters aren't bytes. They're just made of bytes.

    If they're on computers, they're made of bits (they're not ground reality, but there's literally no reason to delve deeper unless you're a glutton for punishment). Bits are usually arranged into bytes. Sometimes there's a one-to-one mapping between byte-values and character-values, sometimes not. Strings are always logically character sequences (they're defined that way). Implementations can get much more complicated.

    As I said, you can say that strings and characters are made out of bits and bytes and so on. It's just that very often it doesn't help you to say this; almost all interesting operations on strings are only at best loosely connected to operations on bits.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.

    And that's just silly and wrong.

    Silly, or wrong? Because pedantry is always silly, but it's never wrong. If it's wrong, it's not pedantry - it's bullshit. Pedantry is annoying specifically because it's entirely correct.

    You're trying to be pedantically correct with your "the meaning of is" bullshit but you're failing. I've speculated that it's due to the way you learned English. Though I think the ontology discussion in the salon gave you wrong ideas, too, in that you confused a philosophical way of categorizing concepts with actual things that we use (even though they are kind of abstract, they kind of aren't as evidenced by the text we have been posting right here).

    The thing is, "is" isn't what anybody uses for anything practical - at least not in the sense that you're trying to do (putting the wrongness aside for a moment). "Is implemented with" is used in practice. "Is represented by" is used in practice. "Is identical to" is used in practice. "Is subclass of" is used in practice. But "is" alone isn't. Because it's a purely ontological concept, as explained by Captain in that Salon thread.

    Stop the navel gazing and actually read what's being said.

    Stop the random insults and actually read what's being said.

    You've overcomplicated things and have convinced yourself that obvious things are obviously wrong with your interpretations of dictionary definitions and philosophical navel gazing.

    Are my interpretations wrong? If so, what exactly about my interpretations is wrong? Which part of the definitions have I got wrong? What do those parts mean instead?

    Remember how I kept asking you where the bytes went that were making up the characters? That's the bit. You need to figure out the mental block you have in seeing that the characters are also bytes. Again, you need to be able to simultaneously see them at multiple abstraction levels.

    You have a mental block of your own here. You keep asking about where the bytes go. As if it was relevant to the question of whether sequences of characters are sequences of bytes.

    Well, duh! I simply can't imagine why you think this is odd.

    Because it's wrong?

    I believe I know why you're so fixated on this. It's because you've already assumed right from the start that since characters are always implemented with bytes, it means that characters are bytes.

    It's not exactly an assumption. It was a hypothesis based on experience and knowledge of what's going on. Aside from exotic shit like @Gribnit's trits or @dkf's weirdo tree thing, no one has an example of anything else. And certainly garden variety strings are exactly this, which is what we were talking about, no matter what @ixvedeusi wants to tell us.

    Not quite. The experience and knowledge is that characters are always implemented with bytes. The rest of the statement - specifically the implication there: "since..., then..." - is just an assumption, plain and simple. An unproven assumption, at that (circular reasoning isn't proof).

    With that assumption, the only way you can see to disprove that statement is to show that characters aren't implemented with bytes. Which is impossible, because it just so happens in our world that characters are always implemented with bytes.

    I know, right? And yet you've still found a way to deny this obvious truth even when you admit it.

    And so you conclude that characters are bytes, because they're always implemented with bytes. The conclusion is identical to the assumption you've made at the beginning. This is a textbook example of circular reasoning.

    Look, I've said from the beginning that it was trivially true. And you may recall that I'm not the one denying trivially true things.

    The problem is, it's trivially true because you assumed it to be trivially true. It's a tautology. It's true because it's assumed to be true, and it's only true as long as you keep assuming it's true. Once you stop assuming it's true (and it's always good to have as few assumptions as possible), it stops being true. Unless you can prove it otherwise, but it looks like you can't.

    Let me ask you this: are you able to prove in some other way that sequences of characters are sequences of bytes, without relying on the assumption that if A is always implemented with B, it means A is B?

    This is getting fox retarded now. Seriously.

    It's just a standard process of reconsidering your assumptions. If you can't prove your assumptions without circular reasoning, there are only two options - they're axioms and therefore don't need a proof, just like God doesn't need a proof of His existence, or they just aren't true.

    You are the one treating this simple and obvious fact as though it's some kind of deep or clever theorem. The proof follows from the nature of our computers and the way we represent characters in memory. It's really that simple.

    Except it doesn't. The only thing that follows from the nature of computers is that all characters must be represented as bits. Nothing about the nature of computers says anything about what characters are. And what characters are is a different question from what characters are represented with.


  • Banned

    @PleegWat said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @Kian said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.

    Here you're contradicting yourself. Setting the chair on fire is a property of the wood it's made of, not of it being a chair. Set it on fire, and once all you have left is ashes you no longer have a chair. Analogously, I can XOR the bytes that make up my string, just as I can set on fire the wood my chair is made of, and the resulting set of characters will probably no longer be a valid string, just as the ashes no longer are a chair.

    You can't set on fire the "chairness" of a chair, just as you can't XOR ideal characters, but I can set on fire the wood and I can XOR the bytes.

    In other words, characters aren't bytes. They're just made of bytes.

    If I draw a character on a piece of paper, bytes don't come into it.

    I know it might have been lost in the discussion, but we were talking about characters in programming languages. Those are always backed by bytes.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    Do you actually know that you're pendanting the word here

    Duh? I told you several pages ago that my only gripe with you is that you're using the word "is" wrong. Everything you say is absolutely correct, except for only this one single claim that a sequence of characters is a sequence of bytes. And you're wrong for reasons completely unrelated to how computers work, because it's not a question of how computers work, but of what the abstract entities known as sequences of characters are.

    And that's just silly and wrong.

    Silly, or wrong? Because pedantry is always silly, but it's never wrong. If it's wrong, it's not pedantry - it's bullshit. Pedantry is annoying specifically because it's entirely correct.

    You're trying to be pedantically correct with your "the meaning of is" bullshit but you're failing. I've speculated that it's due to the way you learned English. Though I think the ontology discussion in the salon gave you wrong ideas, too, in that you confused a philosophical way of categorizing concepts with actual things that we use (even though they are kind of abstract, they kind of aren't as evidenced by the text we have been posting right here).

    The thing is, "is" isn't what anybody uses for anything practical - at least not in the sense that you're trying to do (putting the wrongness aside for a moment). "Is implemented with" is used in practice. "Is represented by" is used in practice. "Is identical to" is used in practice. "Is subclass of" is used in practice. But "is" alone isn't. Because it's a purely ontological concept, as explained by Captain in that Salon thread.

    Stop the navel gazing and actually read what's being said.

    Stop the random insults and actually read what's being said.

    Nothing random about these insults. They are a direct result of reading the bilge you've been posting.

    I believe I know why you're so fixated on this. It's because you've already assumed right from the start that since characters are always implemented with bytes, it means that characters are bytes.

    It's not exactly an assumption. It was a hypothesis based on experience and knowledge of what's going on. Aside from exotic shit like @Gribnit's trits or @dkf's weirdo tree thing, no one has an example of anything else. And certainly garden variety strings are exactly this, which is what we were talking about, no matter what @ixvedeusi wants to tell us.

    Not quite. The experience and knowledge is that characters are always implemented with bytes. The rest of the statement - specifically the implication there: "since..., then..." - is just an assumption, plain and simple. An unproven assumption, at that (circular reasoning isn't proof).

    Show how the proof is deficient then. Which step is wrong and why?

    With that assumption, the only way you can see to disprove that statement is to show that characters aren't implemented with bytes. Which is impossible, because it just so happens in our world that characters are always implemented with bytes.

    I know, right? And yet you've still found a way to deny this obvious truth even when you admit it.

    And so you conclude that characters are bytes, because they're always implemented with bytes. The conclusion is identical to the assumption you made at the beginning. This is a textbook example of circular reasoning.

    Look, I've said from the beginning that it was trivially true. And you may recall that I'm not the one denying trivially true things.

    The problem is, it's trivially true because you assumed it to be trivially true. It's a tautology. It's true because it's assumed to be true, and it's only true as long as you keep assuming it's true. Once you stop assuming it's true (and it's always good to have as few assumptions as possible), it stops being true. Unless you can prove it some other way, but it looks like you can't.

    Can't satisfy your irrational demands? No, I too believe I'll never be able to do that. Now, show me where the reasoning is wrong.

    Let me ask you this: are you able to prove in some other way that sequences of characters are sequences of bytes, without relying on the assumption that if A is always implemented with B, it means A is B?

    This is getting fox retarded now. Seriously.

    It's just a standard process of reconsidering your assumptions. If you can't prove your assumptions without circular reasoning, there are only two options - either they're axioms and therefore don't need a proof, just like God doesn't need a proof of His existence, or they just aren't true.

    You've given no reason to think any assumptions (or any of the reasoning) are wrong. The ball is in your court here.

    You are the one treating this simple and obvious fact as though it's some kind of deep or clever theorem. The proof follows from the nature of our computers and the way we represent characters in memory. It's really that simple.

    Except it doesn't. The only thing that follows from the nature of computers is that all characters must be represented as bits. Nothing about the nature of computers says anything about what characters are. And what characters are is a different question from what characters are represented with.

    And once again you're confusing yourself by limiting your thinking to a particular abstraction. And even then you're wrong. What they're represented with is what they are. They are those things. This is a stupid and obviously wrong assumption you're making here and there is simply no way to get around that. The way you try is to say, "Yeah, they are made from those things, but in a particular way." Which somehow makes you now think they aren't those things. Which...TDEMSYR


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @PleegWat said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @Kian said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.

    Here you're contradicting yourself. Setting the chair on fire is a property of the wood it's made of, not of it being a chair. Set it on fire, and once all you have left is ashes you no longer have a chair. Analogously, I can XOR the bytes that make up my string, just as I can set on fire the wood my chair is made of, and the resulting set of characters will probably no longer be a valid string, just as the ashes no longer are a chair.

    You can't set on fire the "chairness" of a chair, just as you can't XOR ideal characters, but I can set on fire the wood and I can XOR the bytes.

    In other words, characters aren't bytes. They're just made of bytes.

    If I draw a character on a piece of paper, bytes don't come into it.

    I know it might have been lost in the discussion, but we were talking about characters in programming languages. Those are always backed by bytes.

    Oh, sure, now you agree with me.



  • @Gąska Character is such a confusing term in computer science, because what users think of as characters might not align with the possible values for a "character" type, or even with the code points of a given character encoding.
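
    For instance, what a user sees as one character can be several code points, and even more code units in a given encoding. A rough Python 3 sketch (the flag emoji is just a convenient worst case):

        flag = "🇵🇱"                                 # renders as one character
        print(len(flag))                            # 2 - two code points (regional indicators)
        print(len(flag.encode("utf-16-le")) // 2)   # 4 - four UTF-16 code units
        print(len(flag.encode("utf-8")))            # 8 - eight UTF-8 bytes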



  • @boomzilla said in Git hates UTF-16:

    Do you actually know that you're pendanting the word here

    Not that there's anything wrong with that. :pendant:

    As long as your pendantry is actually correct.



  • @Khudzlin The real WTF is variable-size characters. They should've left Western languages alone and forced hieroglyphics into 24-bit or something.
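
    For reference, this is roughly what the variable sizes look like in practice (Python 3 sketch; the sample characters are arbitrary):

        for ch in ["a", "é", "中", "😀"]:
            utf8 = ch.encode("utf-8")
            utf16 = ch.encode("utf-16-le")
            print(ch, len(utf8), "bytes in UTF-8,", len(utf16), "in UTF-16")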


  • Banned

    @boomzilla said in Git hates UTF-16:

    I believe I know why you're so fixated on this. It's because you've already assumed right from the start that since characters are always implemented with bytes, it means that characters are bytes.

    It's not exactly an assumption. It was a hypothesis based on experience and knowledge of what's going on. Aside from exotic shit like @Gribnit's trits or @dkf's weirdo tree thing, no one has an example of anything else. And certainly garden variety strings are exactly this, which is what we were talking about, no matter what @ixvedeusi wants to tell us.

    Not quite. The experience and knowledge is that characters are always implemented with bytes. The rest of the statement - specifically the implication there: "since..., then..." - is just an assumption, plain and simple. An unproven assumption, at that (circular reasoning isn't proof).

    Show how the proof is deficient then. Which step is wrong and why?

    I just did. But I'm feeling generous today, let me copy-paste the relevant part.

    "You've already assumed right from the start that since characters are always implemented with bytes, it means that characters are bytes. With that assumption, the only way you can see to disprove that statement is to show that characters aren't implemented with bytes. Which is impossible, because it just so happens in our world that characters are always implemented with bytes. And so you conclude that characters are bytes, because they're always implemented with bytes. The conclusion is identical to the assumption you've made at the beginning. This is a book example of circular reasoning."

    With that assumption, the only way you can see to disprove that statement is to show that characters aren't implemented with bytes. Which is impossible, because it just so happens in our world that characters are always implemented with bytes.

    I know, right? And yet you've still found a way to deny this obvious truth even when you admit it.

    And so you conclude that characters are bytes, because they're always implemented with bytes. The conclusion is identical to the assumption you made at the beginning. This is a textbook example of circular reasoning.

    Look, I've said from the beginning that it was trivially true. And you may recall that I'm not the one denying trivially true things.

    The problem is, it's trivially true because you assumed it to be trivially true. It's a tautology. It's true because it's assumed to be true, and it's only true as long as you keep assuming it's true. Once you stop assuming it's true (and it's always good to have as few assumptions as possible), it stops being true. Unless you can prove it some other way, but it looks like you can't.

    Can't satisfy your irrational demands?

    What's irrational about them? That I'm asking you to prove a false statement? Well, guess what! It wouldn't be that hard if your statement wasn't false!

    Let me ask you this: are you able to prove in some other way that sequences of characters are sequences of bytes, without relying on the assumption that if A is always implemented with B, it means A is B?

    This is getting fox retarded now. Seriously.

    It's just a standard process of reconsidering your assumptions. If you can't prove your assumptions without circular reasoning, there are only two options - either they're axioms and therefore don't need a proof, just like God doesn't need a proof of His existence, or they just aren't true.

    You've given no reason to think any assumptions (or any of the reasoning) are wrong. The ball is in your court here.

    Closing your eyes doesn't make the world disappear. I've already explained several times what's wrong with your reasoning. But okay, I'll repeat one more time: you're using circular reasoning, and that's a logical fallacy - and that makes your argument invalid.

    You are the one treating this simple and obvious fact as though it's some kind of deep or clever theorem. The proof follows from the nature of our computers and the way we represent characters in memory. It's really that simple.

    Except it doesn't. The only thing that follows from the nature of computers is that all characters must be represented as bits. Nothing about the nature of computers says anything about what characters are. And what characters are is a different question from what characters are represented with.

    And once again you're confusing yourself by limiting your thinking to a particular abstraction. And even then you're wrong. What they're represented with is what they are. They are those things.

    Is this relation transitive? If there are two representations of the same thing, are those two representations the same thing? If two different things have the same representation (for example, number 150 and number -106, when represented with 8 bits each), do they become the same thing?
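
    (The 8-bit case is easy to check - a quick Python sketch using the two values above:)

        # 150 as an unsigned byte and -106 as a signed (two's complement) byte
        # end up as the exact same bit pattern, 0x96.
        print((150).to_bytes(1, "big"))                 # b'\x96'
        print((-106).to_bytes(1, "big", signed=True))   # b'\x96'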

    This is a stupid and obviously wrong assumption you're making here and there is simply no way to get around that. The way you try is to say, "Yeah, they are made from those things, but in a particular way."

    ...I never said anything like that. I never said anything about "particular ways" the things are made. I just said that what they are made of doesn't imply anything about what they are.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @PleegWat said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @Kian said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    If you can't set a chair on fire, it's not wood. If you can't XOR characters, then they're not bytes.

    Here you're contradicting yourself. Setting the chair on fire is a property of the wood it's made of, not of it being a chair. Set it on fire, and once all you have left is ashes you no longer have a chair. Analogously, I can XOR the bytes that make up my string, just as I can set on fire the wood my chair is made of, and the resulting set of characters will probably no longer be a valid string, just as the ashes no longer are a chair.

    You can't set on fire the "chairness" of a chair, just as you can't XOR ideal characters, but I can set on fire the wood and I can XOR the bytes.

    In other words, characters aren't bytes. They're just made of bytes.

    If I draw a character on a piece of paper, bytes don't come into it.

    I know it might have been lost in the discussion, but we were talking about characters in programming languages. Those are always backed by bytes.

    Oh, sure, now you agree with me.

    I've always agreed with you on this point. I believe I've written at least six times now that I agree with everything you said in this topic, except that sequences of characters are sequences of bytes. I agree they are made of bytes. I agree they are represented with bytes. I agree they are often thought of as bytes when the distinction doesn't matter (they're also occasionally thought of as numbers, and Unicode code points are thought of as characters, even though none of these is strictly true - it's just that they behave the same most of the time). I just disagree about one teeny tiny detail.
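
    (One concrete case of "behave the same most of the time", as a minimal Python 3 sketch: the same rendered character can be one code point or two, and they only compare equal after normalization:)

        import unicodedata

        one = "\u00e9"     # 'é' as a single precomposed code point
        two = "e\u0301"    # 'e' followed by a combining acute accent
        print(one == two)                                 # False - different code points
        print(unicodedata.normalize("NFC", two) == one)   # True - same character once normalized
        print(len(one), len(two))                         # 1 2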

