Git hates UTF-16


  • Discourse touched me in a no-no place

    @kazitor said in Git hates UTF-16:

            <MyAwesomeProgramOptionNumber>1</MyAwesomeProgramOptionNumber>
            <MyAwesomeProgramOptionValue>Foo</MyAwesomeProgramOptionValue>
    

    I've seen someone seriously propose:

    <MyAwesomeProgramConfiguration>
        <MyAwesomeProgramConfigurationSettings>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="1">Foo</MyAwesomeProgramOptionValue>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="2">Bar</MyAwesomeProgramOptionValue>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="3">Baz</MyAwesomeProgramOptionValue>
        </MyAwesomeProgramConfigurationSettings>
    </MyAwesomeProgramConfiguration>
    

    It took quite a bit of work to persuade them that XML really does preserve the order of elements by default. :facepalm:


  • BINNED

    @dkf At least they recognised the concept of "attributes"! I've seen too many examples of "Name" and "Value" as separate tags.


  • Discourse touched me in a no-no place

    @kazitor If you want to start a fight in XML circles, get a group of practitioners to say which style you should use for a particular application.



  • @dkf said in Git hates UTF-16:

    @kazitor said in Git hates UTF-16:

            <MyAwesomeProgramOptionNumber>1</MyAwesomeProgramOptionNumber>
            <MyAwesomeProgramOptionValue>Foo</MyAwesomeProgramOptionValue>
    

    I've seen someone seriously propose:

    <MyAwesomeProgramConfiguration>
        <MyAwesomeProgramConfigurationSettings>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="1">Foo</MyAwesomeProgramOptionValue>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="2">Bar</MyAwesomeProgramOptionValue>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="3">Baz</MyAwesomeProgramOptionValue>
        </MyAwesomeProgramConfigurationSettings>
    </MyAwesomeProgramConfiguration>
    

    It took quite a bit of work to persuade them that XML really does preserve the order of elements by default. :facepalm:

    I've had the misfortune of this kind of XML (it was a lot worse, because there were about 20 tags with an unknown quantity of attributes on them for describing the object but this describes the general gist of it):

    <Object>
        <ObjectType>Settings</ObjectType>
        <Object>
            <ObjectType>Container</ObjectType>
            <ObjectList>
                <Object>
                    <ObjectType>String</ObjectType>
                    <ObjectValue>Foo</ObjectValue>
                </Object>
                <Object>
                    <ObjectType>String</ObjectType>
                    <ObjectValue>Bar</ObjectValue>
                </Object>
                <Object>
                    <ObjectType>String</ObjectType>
                    <ObjectValue>Baz</ObjectValue>
                </Object>
            </ObjectList>
        </Object>
    </Object>


  • @dkf said in Git hates UTF-16:

    It took quite a bit of work to persuade them that XML really does preserve the order of elements by default. :facepalm:

    It does??? TIL... and I will be able to delete places where we actually use that awful pattern (head bowed in shame).

    I read (a long time ago) in some Qt docs (I had a quick look through the current docs and couldn't find it, but it was just a quick look) that the order wasn't always preserved, which I guess is what you imply with your "by default". So even though in practice it was always preserved, I got into the habit of assuming it might not be. Looks like I am TRWTF here...


  • Discourse touched me in a no-no place

    @remi said in Git hates UTF-16:

    I read (a long time ago) in some Qt docs (I had a quick look through the current docs and couldn't find it, but it was just a quick look) that the order wasn't always preserved,

    XML is order preserving — it's a document format at its core — but the schema might say that the order is unimportant for the contents of some elements. It's a bit like saying that JSON is itself order-preserving, but order is unimportant inside a JSON object (unlike an array).

    Reordering elements without something saying explicitly that it is OK to do so would be a :wtf: of course…
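
For anyone who wants to check this rather than take it on faith, here is a minimal sketch using Python's standard library. The `Settings`/`Option` document is a made-up stand-in for the examples quoted earlier, not anyone's real config.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical config, loosely modelled on the examples earlier in the thread.
doc = """
<Settings>
    <Option>Foo</Option>
    <Option>Bar</Option>
    <Option>Baz</Option>
</Settings>
"""

root = ET.fromstring(doc)
# Child elements come back in document order; no index attribute is needed.
values = [child.text for child in root]
print(values)

# The JSON analogy: array order matters, object member order does not.
array_a = json.loads('["Foo", "Bar"]')
array_b = json.loads('["Bar", "Foo"]')
object_a = json.loads('{"x": 1, "y": 2}')
object_b = json.loads('{"y": 2, "x": 1}')
print(array_a == array_b)    # arrays compared element by element, in order
print(object_a == object_b)  # objects compared as key/value mappings
```

This only shows what one parser does; whether the order *means* anything is still up to whoever defines the format.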



  • @dkf If there is no clearly defined schema (let's face it, that's about all of the XML files I handle...), would it be a :wtf: for something to assume it means the order is unimportant (and therefore it can reorder elements)?

    I'm still not sure whether I should write :wtf:-esque code to cover my ass in case some XML parser messes up, or whether I should assume that 3rd-party code will never do a stupid thing... Which one is the biggest :wtf:?


  • Discourse touched me in a no-no place

    @remi said in Git hates UTF-16:

    @dkf If there is no clearly defined schema (let's face it, that's about all of the XML files I handle...), would it be a :wtf: for something to assume it means the order is unimportant (and therefore it can reorder elements)?

    Yes. Unambiguously, yes.


  • Banned

    @remi said in Git hates UTF-16:

    @dkf If there is no clearly defined schema (let's face it, that's about all of the XML files I handle...), would it be a :wtf: for something to assume it means the order is unimportant (and therefore it can reorder elements)?

    If there is no clearly defined format (not necessarily with an XSD schema), then there is no definite answer to any question you might possibly ask about the format. Both assuming order and assuming lack of order are potentially :doing_it_wrong:.

    But if the format is clearly defined (not necessarily with XSD schema), then assuming anything and everything that the format defines is completely okay.

    If you're asking how most parsers will behave - as far as I know, all the most popular ones always keep the elements in the order they were written in the file.

    I'm still not sure whether I should write :wtf:-esque code to cover my ass in case some XML parser messes up, or whether I should assume that 3rd-party code will never do a stupid thing... Which one is the biggest :wtf:?

    Don't future-proof. Only solve problems that you actually encounter.


  • Discourse touched me in a no-no place

    @Gąska said in Git hates UTF-16:

    @remi said in Git hates UTF-16:

    @dkf If there is no clearly defined schema (let's face it, that's about all of the XML files I handle...), would it be a :wtf: for something to assume it means the order is unimportant (and therefore it can reorder elements)?

    If there is no clearly defined format (not necessarily with an XSD schema), then there is no definite answer to any question you might possibly ask about the format. Both assuming order and assuming lack of order are potentially :doing_it_wrong:.

    The fundamental model of XML (i.e., the XML-flavour DOM) is always ordered (even for attributes, though the model says you're supposed to ignore the order of those). Ignoring element order occasionally is technically easier than introducing order where it doesn't exist (since then you have to do a lot more work to define the ordering criterion).
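
A small sketch of that distinction, assuming Python's `xml.etree.ElementTree` as the parser; the element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Same attributes, written in a different order.
a = ET.fromstring('<Option index="1" name="Foo"/>')
b = ET.fromstring('<Option name="Foo" index="1"/>')

# Element.attrib is a plain dict, so equality ignores attribute order.
same_attributes = a.attrib == b.attrib

# Same child elements, written in a different order.
x = ET.fromstring("<List><A/><B/></List>")
y = ET.fromstring("<List><B/><A/></List>")

# Children are an ordered sequence, so these two documents differ.
same_children = [e.tag for e in x] == [e.tag for e in y]

print(same_attributes, same_children)
```

Treating attributes as a mapping and children as a list is the cheap direction: you can always throw order away later, but you can't get it back.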



  • Qt Widgets' .ui is a special type of hell. It's an XML format for declarative GUI layouts, generated using a GUI editor.

    • The nesting is crazy deep, so the indentation is 1 space.
    • Instead of using XML order to dictate layout positions, GUI widgets in a QFormLayout or QGridLayout (not h/v layouts) have row="" and col="" attributes (if I remember names right).
      • Adding 1 widget in the same row in 2 branches = merge conflict, and afterwards you have to increment the row of all widgets below that row.
        • Yesterday I didn't get a merge conflict but rather a "successful" merge, which produced a syntactically invalid result. Luckily I didn't have to increment widgets below it.
      • If you edit the widgets wrong, you can end up with row="1" followed by row="3" or more. (To fix issue: Qt Designer, right-click, Layout/Simplify Layout)
    • And if you order your widgets wrong in the .ui file, the tab order no longer matches the visual order! (To fix issue: Open XML in a text editor, and reorder widgets manually.)
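
A rough sketch of how the `row="1"` followed by `row="3"` situation could be spotted before opening Qt Designer. The fragment below is a heavily simplified, hand-written stand-in for a grid layout, and the `item`, `row` and `column` names are assumptions, so check them against a real .ui file. `missing_rows` is a hypothetical helper, not part of Qt.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment; element and attribute names are assumed.
UI_FRAGMENT = """
<layout class="QGridLayout" name="gridLayout">
 <item row="0" column="0"><widget class="QLabel" name="nameLabel"/></item>
 <item row="1" column="0"><widget class="QLabel" name="sizeLabel"/></item>
 <item row="3" column="0"><widget class="QLabel" name="dateLabel"/></item>
</layout>
"""


def missing_rows(layout):
    """Return the row numbers skipped between 0 and the highest row in use."""
    used = {
        int(item.get("row"))
        for item in layout.findall("item")
        if item.get("row") is not None
    }
    if not used:
        return []
    return [row for row in range(max(used) + 1) if row not in used]


layout = ET.fromstring(UI_FRAGMENT)
gaps = missing_rows(layout)
print(gaps)
```

It only reports gaps; renumbering everything below a conflicting row is still the tedious part.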

  • 🚽 Regular

    @jimbo1qaz-0

    INB4 @Tsaukpaetra welcomes you back.


  • BINNED

    @Zecc
    I'll give you a like so I know that I have read your reply


  • Considered Harmful

    @jimbo1qaz-0 Welcome back!


  • Notification Spam Recipient

    @Zecc I have been summoned, and so I appear.

    Edit: fuck! I haven't read this thread yet!

    Oh well...



  • @Carnage said in Git hates UTF-16:

    @dkf said in Git hates UTF-16:

    @kazitor said in Git hates UTF-16:

            <MyAwesomeProgramOptionNumber>1</MyAwesomeProgramOptionNumber>
            <MyAwesomeProgramOptionValue>Foo</MyAwesomeProgramOptionValue>
    

    I've seen someone seriously propose:

    <MyAwesomeProgramConfiguration>
        <MyAwesomeProgramConfigurationSettings>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="1">Foo</MyAwesomeProgramOptionValue>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="2">Bar</MyAwesomeProgramOptionValue>
            <MyAwesomeProgramOptionValue MyAwesomeProgramOptionIndex="3">Baz</MyAwesomeProgramOptionValue>
        </MyAwesomeProgramConfigurationSettings>
    </MyAwesomeProgramConfiguration>
    

    It took quite a bit of work to persuade them that XML really does preserve the order of elements by default. :facepalm:

    I've had the misfortune of this kind of XML (it was a lot worse, because there were about 20 tags with an unknown quantity of attributes on them for describing the object but this describes the general gist of it):

    <Object>
        <ObjectType>Settings</ObjectType>
        <Object>
            <ObjectType>Container</ObjectType>
            <ObjectList>
                <Object>
                    <ObjectType>String</ObjectType>
                    <ObjectValue>Foo</ObjectValue>
                </Object>
                <Object>
                    <ObjectType>String</ObjectType>
                    <ObjectValue>Bar</ObjectValue>
                </Object>
                <Object>
                    <ObjectType>String</ObjectType>
                    <ObjectValue>Baz</ObjectValue>
                </Object>
            </ObjectList>
        </Object>
    </Object>
    

    That looks like a Mac plist file...



  • @dkf said in Git hates UTF-16:

    even for attributes, though the model says you're supposed to ignore the order of those

    Really? Every parser I've worked with always alphabetizes those on output...



  • @jimbo1qaz-0 said in Git hates UTF-16:

    It's an XML format ..., generated using a GUI editor.

    DONOTLOOK!!! Looking at generated code will cause mass extinction of brain cells!


  • BINNED

    @Tsaukpaetra said in Git hates UTF-16:

    fuck! I haven't read this thread yet!

    I would think that, with your special "Mark as read" system, that wouldn't be an issue at all. Combined with "Mark unread" at the top and bottom of the page, of course.


  • Notification Spam Recipient

    @kazitor said in Git hates UTF-16:

    @Tsaukpaetra said in Git hates UTF-16:

    fuck! I haven't read this thread yet!

    I would think that, with your special "Mark as read" system, that wouldn't be an issue at all. Combined with "Mark unread" at the top and bottom of the page, of course.

    Sure, it's just rather annoying because there's no "Mark Unread from here" so I'll always return to the end of the thread now and have to manually rewind.

    Problem is only half solved.


  • Discourse touched me in a no-no place

    @dcon said in Git hates UTF-16:

    Every parser I've worked with always alphabetizes those on output...

    It'd be just as correct to arrange them by order of length.



  • @levicki said in Git hates UTF-16:

    Sequence of characters is also a sequence of bytes

    Please don't say this. It's just as correct and useful a statement as saying "a computer program is a very large number". This mental shortcut of conflating abstract concepts with their physical representations is exactly the cause of the bug described in the OP (among many, many others).


  • Discourse touched me in a no-no place

    @ixvedeusi It's one of those wonderful things that is both true and utterly unhelpfully misleading.



  • @dkf I don't agree that it's true, even. "can be represented as" != "is". It's a distinction we ignore way too often in all kinds of situations. That's understandable because accounting for it can make communication rather cumbersome, but it's important to always keep in mind that there is a distinction.


  • ♿ (Parody)

    @ixvedeusi said in Git hates UTF-16:

    @dkf I don't agree that it's true, even. "can be represented as" != "is". It's a distinction we ignore way too often in all kinds of situations. That's understandable because accounting for it can make communication rather cumbersome, but it's important to always keep in mind that there is a distinction.

    In terms of computing, what's an example where it isn't (i.e., "Sequence of characters is not a sequence of bytes")?


  • Banned

    @boomzilla always. Characters are implemented as bytes, but it doesn't mean they are bytes.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla <del>always</del><ins>Never</ins>. Characters are implemented as bytes, but it doesn't mean they are bytes.

    :wtf: You literally said that they are always exactly that.



  • @boomzilla said in Git hates UTF-16:

    @ixvedeusi said in Git hates UTF-16:

    @dkf I don't agree that it's true, even. "can be represented as" != "is". It's a distinction we ignore way too often in all kinds of situations. That's understandable because accounting for it can make communication rather cumbersome, but it's important to always keep in mind that there is a distinction.

    In terms of computing, what's an example where it isn't (i.e., "Sequence of characters is not a sequence of bytes")?

    I think that the main issue is when a single character doesn't map to a single byte.
    A sequence of characters is a sequence of bytes, much the same way very large numbers are a sequence of bytes. Technically true, but not particularly helpful unless you are working with the gritty details of mapping the data to bytes.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla <del>always</del><ins>Never</ins>. Characters are implemented as bytes, but it doesn't mean they are bytes.

    :wtf: You literally said that they are always exactly that.

    I literally said the exact opposite.


  • Discourse touched me in a no-no place

    @Carnage said in Git hates UTF-16:

    I think that the main issue is when a single character doesn't map to a single byte.

    There are many possible mappings. They're called encodings, and lots of programmers don't understand them at all.
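
To make "many possible mappings" concrete, a quick sketch: the same five characters under three encodings. The sample string and the choice of encodings are just for illustration.

```python
text = "Gąska"  # five characters, one of them outside ASCII

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
latin2 = text.encode("iso-8859-2")

# One character count, three different byte counts.
lengths = (len(text), len(utf8), len(utf16), len(latin2))
print(lengths)

# The three byte sequences are all different from each other.
all_distinct = len({utf8, utf16, latin2}) == 3

# UTF-16 pads every ASCII-range character with a zero byte.
has_nul = b"\x00" in utf16

# Each byte sequence decodes back to the same text with the right encoding.
round_trips = (
    utf8.decode("utf-8") == text
    and utf16.decode("utf-16-le") == text
    and latin2.decode("iso-8859-2") == text
)
print(all_distinct, has_nul, round_trips)
```

The bytes only turn back into characters if you know which mapping was used, which is the part people tend to forget.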


  • ♿ (Parody)

    @Carnage said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @ixvedeusi said in Git hates UTF-16:

    @dkf I don't agree that it's true, even. "can be represented as" != "is". It's a distinction we ignore way too often in all kinds of situations. That's understandable because accounting for it can make communication rather cumbersome, but it's important to always keep in mind that there is a distinction.

    In terms of computing, what's an example where it isn't (i.e., "Sequence of characters is not a sequence of bytes")?

    I think that the main issue is when a single character doesn't map to a single byte.
    A sequence of characters is a sequence of bytes, much the same way very large numbers are a sequence of bytes. Technically true, but not particularly helpful unless you are working with the gritty details of mapping the data to bytes.

    Right. That's why @dkf's statement applies: "It's one of those wonderful things that is both true and utterly unhelpfully misleading."

    But then @ixvedeusi says it's not even true. And then you get the weird response that says they're always implemented as bytes and therefore they're not bytes. I'm not sure what to think about a person who says that.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla <del>always</del><ins>Never</ins>. Characters are implemented as bytes, but it doesn't mean they are bytes.

    :wtf: You literally said that they are always exactly that.

    I literally said the exact opposite.

    Yeah, you wrote "Never" and then immediately contradicted that. :mlp_shrug: Like, you posted a version of this:



  • @boomzilla said in Git hates UTF-16:

    In terms of computing, what's an example where it isn't (i.e., "Sequence of characters is not a sequence of bytes")?

    As Gąska said:

    @Gąska said in Git hates UTF-16:

    @boomzilla always

    Nothing you'll ever encounter is a sequence of characters. "Sequence of characters" is an abstract concept. As such it only exists in our minds. You'll only ever encounter physical representations of this abstract concept, which can take all kinds of forms, from traces of ink on a paper, over specific oscillations of air molecules, to the charge distribution in a chunk of silicon (which in turn represents the value of a sequence of bytes which in turn is used to represent a sequence of characters).


  • Banned

    @boomzilla said in Git hates UTF-16:

    And then you get the weird response that says they're always implemented as bytes and therefore they're not bytes.

    Therefore? THEREFORE? Where did you get that from? No, that they're not bytes is not because of that they're implemented as bytes. These are two independent statements. They are not bytes. They are implemented as bytes. No causal relationship of any kind. Learn to read.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    They are not bytes. They are implemented as bytes.

    How do you keep these things in your head at the same time?

    @ixvedeusi said in Git hates UTF-16:

    Nothing you'll ever encounter is a sequence of characters.

    I believe you've descended into sophistry now.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla <del>always</del><ins>Never</ins>. Characters are implemented as bytes, but it doesn't mean they are bytes.

    :wtf: You literally said that they are always exactly that.

    I literally said the exact opposite.

    Yeah, you wrote "Never" and then immediately contradicted that. :mlp_shrug:

    WHAT? You're making stuff up now, and you're not even hiding it. I wrote "always". You used ins/del tags to replace it in quote with "never". And now you're acting like this was the original content of my post. You're literally putting words in my mouth. How can I take you seriously after that?


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    They are not bytes. They are implemented as bytes.

    How do you keep these things in your head at the same time?

    Very simple: I don't leak abstractions.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla <del>always</del><ins>Never</ins>. Characters are implemented as bytes, but it doesn't mean they are bytes.

    :wtf: You literally said that they are always exactly that.

    I literally said the exact opposite.

    Yeah, you wrote "Never" and then immediately contradicted that. :mlp_shrug:

    WHAT? You're making stuff up now, and you're not even hiding it. I wrote "always". You used ins/del tags to replace it in quote with "never". And now you're acting like this was the original content of my post. You're literally putting words in my mouth. How can I take you seriously after that?

    Sorry, you're correct, I got the two mixed up and of course that typo has blown your mind, but I haven't changed anything in the meaning in my post, which is that you're spouting obvious contradictions.

    "That chair is built of wood. It's not wood."



  • @boomzilla said in Git hates UTF-16:

    @ixvedeusi said in Git hates UTF-16:

    Nothing you'll ever encounter is a sequence of characters.

    I believe you've descended into sophistry now.

    I believe you mean to say that you disagree with my statement. According to the definitions I found, "sophistry" means "Plausible but fallacious argumentation." Would you mind pointing out the fallacy for me?


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    They are not bytes. They are implemented as bytes.

    How do you keep these things in your head at the same time?

    Very simple: I don't leak abstractions.

    You just let your brain leak out?


  • ♿ (Parody)

    @ixvedeusi said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @ixvedeusi said in Git hates UTF-16:

    Nothing you'll ever encounter is a sequence of characters.

    I believe you've descended into sophistry now.

    I believe you mean to say that you disagree with my statement. According to the definitions I found, "sophistry" means "Plausible but fallacious argumentation." Would you mind pointing out the fallacy for me?

    It's in the thing I quoted. For instance:

    In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.

    You've waved your hands and dismissed this.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla <del>always</del><ins>Never</ins>. Characters are implemented as bytes, but it doesn't mean they are bytes.

    :wtf: You literally said that they are always exactly that.

    I literally said the exact opposite.

    Yeah, you wrote "Never" and then immediately contradicted that. :mlp_shrug:

    WHAT? You're making stuff up now, and you're not even hiding it. I wrote "always". You used ins/del tags to replace it in quote with "never". And now you're acting like this was the original content of my post. You're literally putting words in my mouth. How can I take you seriously after that?

    Sorry, you're correct, I got the two mixed up

    HOW CAN YOU MIX UP ALWAYS WITH NEVER!? :wtf: :doing_it_wrong: 🤯

    and of course that typo has blown your mind

    That was quite a significant typo, you know. Like, you accused me of directly contradicting myself. While claiming that I said the opposite of what I actually said.

    "That chair is built of wood. It's not wood."

    "That cooking pot is made of recycled garbage. That pot is garbage."


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla <del>always</del><ins>Never</ins>. Characters are implemented as bytes, but it doesn't mean they are bytes.

    :wtf: You literally said that they are always exactly that.

    I literally said the exact opposite.

    Yeah, you wrote "Never" and then immediately contradicted that. :mlp_shrug:

    WHAT? You're making stuff up now, and you're not even hiding it. I wrote "always". You used ins/del tags to replace it in quote with "never". And now you're acting like this was the original content of my post. You're literally putting words in my mouth. How can I take you seriously after that?

    Sorry, you're correct, I got the two mixed up

    HOW CAN YOU MIX UP ALWAYS WITH NEVER!? :wtf: :doing_it_wrong: 🤯

    Wow, that typo really did blow your mind. You really are channeling blakey.

    and of course that typo has blown your mind

    That was quite a significant typo, you know. Like, you accused me of directly contradicting myself. While claiming that I said the opposite of what I actually said.

    And you went back and looked and saw that I got it reversed and still couldn't figure it out, eh?

    "That chair is built of wood. It's not wood."

    "That cooking pot is made of recycled garbage. That pot is garbage."

    Now you're getting it! I'm glad you've accepted your mistake.


  • Banned

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    They are not bytes. They are implemented as bytes.

    How do you keep these things in your head at the same time?

    Very simple: I don't leak abstractions.

    You just let your brain leak out?

    The trick is not to let it leak out. Then you don't confuse "always" with "never", and don't think that just because something is implemented as sequence of bytes, you can't treat it like you would treat any random sequence of bytes.


  • ♿ (Parody)

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    @Gąska said in Git hates UTF-16:

    They are not bytes. They are implemented as bytes.

    How do you keep these things in your head at the same time?

    Very simple: I don't leak abstractions.

    You just let your brain leak out?

    The trick is not to let it leak out. Then you don't confuse "always" with "never", and don't think that just because something is implemented as sequence of bytes, you can't treat it like you would treat any random sequence of bytes.

    Now who is putting words in people's mouths? But of course, in reality you might want to do just that in some cases. Copying memory, transmitting over a network, serializing to disk. You might not, too, but that's irrelevant since there are cases where it could happen.

    But even if it didn't, you haven't explained how these characters aren't sequences of bytes.


  • 🚽 Regular

    @boomzilla said in Git hates UTF-16:

    "That chair is built of wood. It's not wood."

    You guys need a better "to be" verb. :halftroll:



  • @boomzilla said in Git hates UTF-16:

    In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.

    You've waved your hands and dismissed this.

    Not sure how this implies a fallacy in what I said, assuming that you accept my claim that "is represented by" != "is". A character string is just as abstract a concept as a character (necessarily, seeing as it's constituted of characters). There are many ways a string can be represented in a computer system.

    As I see it, saying that a string is a sequence of bytes is a similar kind of categorical error as saying that the database row which tracks an inventory item is that inventory item.


  • ♿ (Parody)

    @ixvedeusi said in Git hates UTF-16:

    @boomzilla said in Git hates UTF-16:

    In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.

    You've waved your hands and dismissed this.

    Not sure how this implies a fallacy in what I said, assuming that you accept my claim that "is represented by" != "is". A character string is just as abstract a concept as a character (necessarily, seeing as it's constituted of characters). There are many ways a string can be represented in a computer system.

    And I asked for an example of a string that isn't represented as bytes.

    As I see it, saying that a string is a sequence of bytes is a similar kind of categorical error as saying that the database row which tracks an inventory item is that inventory item.

    Vehemently disagree. At least until you can show an example of a sequence of characters that isn't bytes (again, in a computer, as I said previously).


  • Banned

    @boomzilla said in Git hates UTF-16:

    But of course, in reality you might want to do just that in some cases. Copying memory, transmitting over a network, serializing to disk.

    Because it just so happens that some of the operations that are valid for arbitrary sequences of bytes are also valid for character strings. Like, for example, bitwise copy of the entire string from the beginning to the very end, skipping nothing in between. Java's ArrayLists are also implemented as sequences of bytes, just like character strings, and there are also some operations that you can do on arbitrary sequences of bytes that are also valid for variable-sized array objects - but if you try to bitwise copy an ArrayList, you're gonna have a bad time.

    You can sometimes in limited circumstances treat character strings identically to arbitrary byte sequences. You can sometimes in limited circumstances treat ArrayLists identically to arbitrary byte sequences. But despite that, ArrayLists are still not the same thing as byte sequences. And neither are character strings.
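
A sketch of the "limited circumstances" point, assuming UTF-8 as the representation: copying every byte is fine, but cutting at an arbitrary byte offset is a valid operation on bytes and not on the string they represent. The sample word is arbitrary.

```python
text = "żółw"
data = text.encode("utf-8")  # 7 bytes for 4 characters

# Copying every byte, start to end, preserves the string.
copy = bytes(data)
copy_ok = copy.decode("utf-8") == text

# Truncating at a byte offset is fine for bytes...
truncated = data[:3]

# ...but here it lands in the middle of a two-byte character.
try:
    truncated.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

# The same operation on the string works in characters, not bytes.
prefix = text[:2]

print(copy_ok, truncated_ok, prefix)
```

Both slices are "take the first few units", and only one of them respects what the units are.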



  • @boomzilla said in Git hates UTF-16:

    At least until you can show an example of a sequence of characters that isn't bytes

    In computer science, everything is represented as bytes. So I obviously cannot show you an example where a string isn't represented with bytes. But fundamentally, computer programming is manipulating electrical charges and nothing else. We just use these electrons to represent things so that we can reason in an abstract space which happens to have similar properties to the abstract space in which the problem we want to solve lives.

    Yes, I'm being very pedantic and philosophical here, but this has many rather concrete real-life consequences we have to deal with daily, such as character encodings and leaky abstractions.

