Overheard at Work



  • Two of my bright young coworkers were talking about a messaging protocol.

    Meathead1: How many bits in a byte?  Four?

    Meathead2: Yeah.

    Me: *facepalm*



  • Well.  Send *them* home for the day.  Tell them to come back smarter tomorrow.



  • Nibbles and bits ... nibbles and bits ... I gotta get me some nibbles and bits!

    <apologies to the appropriate dog and dog food lovers>



  • This can actually be true if they were talking about an encoding that carries along 4 bits of information in a byte.



  • @Wikipedia said:

    Early computers used a variety of 4-bit binary coded decimal (BCD) representations [for bytes]

    ?



  • Tell them they'll get paid today for working (the number of bits in a byte) hours.



  • Yep, a 'byte' is simply the smallest addressable unit of memory. If I recall correctly, there was even a system that could vary from 5 to 12 bits in a byte, depending on configuration options...

    It would have been interesting if recent computers had moved to 32-bit bytes as a bridge to 64-bit: there is little modern useful data that fits in 8 bits anyway (32 bits for an RGBA color, 32 bits for a full unicode char, etc.). You could store 4x as much data while still using 32-bit addressing, which also means that pointers would be one 'byte' as well.



  • @bgodot said:

    ...there is little modern useful data that fits in 8 bits anyway...

    Yeah, text characters are sooooo overrated.



  • @dtech said:

    This can actually be true if they were talking about an encoding that carries along 4 bits of information in a byte.

    This. I would try to find out if it was actually this before giving those guys hell.



  • @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.



  • @Sutherlands said:

    @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.

    I dunno, seems about right to me.

    I don't think I'd advocate 32-bit byte architectures, though. Imagine the hex dumps!



  • @Renan said:

    @dtech said:
    This can actually be true if they were talking about an encoding that carries along 4 bits of information in a byte.
    This. I would try to find out if it was actually this before giving those guys hell.
    Please provide at least one current, real-world example of "this".  I'll wait.



  • @frits said:

    @Renan said:
    @dtech said:
    This can actually be true if they were talking about an encoding that carries along 4 bits of information in a byte.
    This. I would try to find out if it was actually this before giving those guys hell.
    Please provide at least one current, real-world example of "this".  I'll wait.

    RAID-1



    ... on average



  • @Sutherlands said:

    @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.

    Never heard of UTF-32?

    http://en.wikipedia.org/wiki/UTF-32

    "UTF-32 (or UCS-4) is a protocol to encode Unicode characters that uses exactly 32 bits per Unicode code point. All other Unicode transformation formats use variable-length encodings. The UTF-32 form of a character is a direct representation of its codepoint."

    edit: also, most modern CPUs want data structures to be aligned on 2^n boundaries anyway. So a struct with, for example, 4 'chars' would take up 128 bits anyway.
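The fixed-width property quoted above is easy to check; here is a small Python sketch (the sample string and offsets are purely illustrative):

```python
# UTF-32 uses exactly 4 bytes per code point, so the nth character
# sits at a fixed, computable offset; UTF-8 does not.
s = "naïve ☃"

utf8 = s.encode("utf-8")
utf32 = s.encode("utf-32-le")  # explicit endianness avoids a BOM

print(len(utf8))    # 10 bytes for 7 characters (variable width)
print(len(utf32))   # 28 bytes: exactly 4 per code point

# Direct indexing: character 6 lives at bytes [24:28].
print(utf32[6 * 4:(6 + 1) * 4].decode("utf-32-le"))  # ☃
```

The trade-off is exactly the one debated below: constant-time indexing, at the cost of mostly-zero padding bytes for ASCII-heavy text.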



  • @bgodot said:

    @Sutherlands said:

    @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.

    Never heard of UTF-32?

    http://en.wikipedia.org/wiki/UTF-32

    "UTF-32 (or UCS-4) is a protocol to encode Unicode characters that uses exactly 32 bits per Unicode code point. All other Unicode transformation formats use variable-length encodings. The UTF-32 form of a character is a direct representation of its codepoint."

    edit: also, most modern CPUs want data structures to be aligned on 2^n boundaries anyway. So a struct with, for example, 4 'chars' would take up 128 bits anyway.

    Is there a point to this? All it shows is that you would like for me to mostly waste 24 bits per character. Which kind of proves the opposite of storing any modern data usefully in 8 bits.



  • @bgodot said:

    @Sutherlands said:

    @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.

    Never heard of UTF-32?

    Yes, I've heard of it.  Have you heard of UTF-8, UTF-16, or ISO 8859?  Are you saying that the only modern useful character encoding is UTF-32?  Oh, you are?  Pity...



  • @Sutherlands said:

    @bgodot said:

    @Sutherlands said:

    @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.

    Never heard of UTF-32?

    Yes, I've heard of it.  Have you heard of UTF-8, UTF-16, or ISO 8859?  Are you saying that the only modern useful character encoding is UTF-32?  Oh, you are?  Pity...

     

    Have you heard of reading comprehension? 

    Bgodot said that little modern useful data fits in 8 bits. They did not say that the only modern useful character encoding is UTF-32.

    Personally, I agree with the former statement. For character encodings, certainly, 8 bits isn't enough. Or perhaps I should be saying that ISO/IEC 8859-1 isn't enough, because it doesn't contain characters I need in my country. Possibly I could fit all the characters I need in 8 bits, but that would exclude characters that other countries need.

     



  • @Sutherlands said:

    @bgodot said:

    @Sutherlands said:

    @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.

    Never heard of UTF-32?

    Yes, I've heard of it.  Have you heard of UTF-8, UTF-16, or ISO 8859?  Are you saying that the only modern useful character encoding is UTF-32?  Oh, you are?  Pity...

    It's the only fixed size encoding for Unicode. You might not store it on disk or send over the network in that form, but it's useful as a memory data structure to allow simple iteration through characters. And you could still pack 8 bit width data into 32 bit bytes, just use shifting and masking.

    Note that in my original post I said 32-bit bytes would have been interesting, not better. As in the proverb, "May you live in interesting times."



  • @bgodot said:

    Yep, a 'byte' is simply the smallest addressable unit of memory. If I recall correctly, there was even a system that could vary from 5 to 12 bits in a byte, depending on configuration options...

    It would have been interesting if recent computers had moved to 32-bit bytes as a bridge to 64-bit: there is little modern useful data that fits in 8 bits anyway (32 bits for an RGBA color, 32 bits for a full unicode char, etc.). You could store 4x as much data while still using 32-bit addressing, which also means that pointers would be one 'byte' as well.

     

    While the rest are flaming you for using 32 bits to store a character (Unicode is limited to 31 bits, by the way, and at the moment uses 21, I think), and they seem to feel rather hot about that, I'd like to point out that in modern architectures a pointer would be two bytes...

    And that 4 chars definitely do not have to take up 128 bits, unless you're doing something wrong.



  • @havokk said:

    Have you heard of reading comprehension?

    Bgodot said that little modern useful data fits in 8 bits. They did not say that the only modern useful character encoding is UTF-32.

    That's pretty much what he said, at least for "modern" text. Which is BS...see more below.

    @havokk said:

    Personally, I agree with the former statement. For character encodings, certainly, 8 bits isn't enough. Or perhaps I should be saying that  ISO/IEC 8859-1 isn't enough, because it doesn't contain characters I need in my country. Possibly I could fit all the characters I need in 8 bits but that wuld exclude characters that other countries need.

    It's really quite rare that I actually need anything outside of 7-bit ASCII. You can talk all you want about how this offends your sensibilities, or can't handle any arbitrary nonsense that someone could come up with, but it doesn't change the fact that even in modern times, 8 bits is enough to be extremely useful. That doesn't mean it's appropriate for everyone. If you're unfortunate enough to have a crazy alphabet, or require lots of scribbles along with it, then I agree that an 8-bit format may not be sufficient for you. But that's not what was said.



  • @Sutherlands said:

    Have you heard of UTF-8, UTF-16, or ISO 8859?

    Clearly you've never worked with any of those encodings. They aren't fixed-width encodings. For in-memory manipulation of data I can hardly think of worse choices. If I say "write me a function that determines the length of a string," this is suddenly a monumental task. An operation like "give me the 17th character in the string" is an O(n) operation instead of O(1).
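The O(n) claim is straightforward to demonstrate. A Python sketch (hand-rolled decoder for illustration only; it assumes valid UTF-8 input):

```python
# In UTF-8, finding the nth character requires scanning from the
# start, because each code point occupies 1 to 4 bytes.

def nth_char_utf8(data: bytes, n: int) -> str:
    """Return the nth character of valid UTF-8 bytes by linear scan: O(n)."""
    count = 0
    i = 0
    while i < len(data):
        first = data[i]
        # The lead byte tells us how wide this code point is.
        if first < 0x80:
            width = 1
        elif first < 0xE0:
            width = 2
        elif first < 0xF0:
            width = 3
        else:
            width = 4
        if count == n:
            return data[i:i + width].decode("utf-8")
        count += 1
        i += width
    raise IndexError(n)

b = "naïve ☃".encode("utf-8")
print(len(b))               # 10 bytes, but only 7 characters
print(nth_char_utf8(b, 6))  # ☃ — found only after walking past 6 chars
```

With a fixed-width encoding the same lookup is a single multiply and slice, which is the whole point being argued.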



  • @havokk said:

    @Sutherlands said:

    @bgodot said:

    @Sutherlands said:

    @bgodot said:

    there is little modern useful data that fits in 8 bits anyway, [. . .] 32 bits for a full unicode char

    Wha... there is just so much wrong with this statement I don't know where to begin.

    Never heard of UTF-32?

    Yes, I've heard of it.  Have you heard of UTF-8, UTF-16, or ISO 8859?  Are you saying that the only modern useful character encoding is UTF-32?  Oh, you are?  Pity...

     

    Have you heard of reading comprehension? 

    Bgodot said that little modern useful data fits in 8 bits. They did not say that the only modern useful character encoding is UTF-32.

    Have you heard of logic?

    Proposition: <A> does not fit in <B>

    <C> is an <A>

    <C> fits in <B>

    QED, Proposition is false.

    (In case you haven't figured it out, <A> is "modern useful data", <B> is "8 bits", and <C> is UTF-8)

     

    Bgodot: Do you understand yet that my only argument is that "little modern useful data [] fits in 8 bits"? (And certainly there's even more that fits in <32 bits.)  Yes, I know about UTF-32, but it's wasteful.  If you want to use it, that's fine, your prerogative, but there is plenty of useful data that fits in 8 bits.



  • @smxlong said:

    @Sutherlands said:
    Have you heard of UTF-8, UTF-16, or ISO 8859?
    Clearly you've never worked with any of those encodings. They aren't fixed-width encodings. For in-memory manipulation of data I can hardly think of worse choices. If I say "write me a function that determines the length of a string," this is suddenly a monumental task. An operation like "give me the 17th character in the string" is an O(n) operation instead of O(1).
    Clearly.  You must be the most knowledgeable person on this forum, with great insight.  Able to deduce that I haven't worked with an encoding simply because it's not fixed-width.  It must be odd that UTF-8 and UTF-16 are commonly used, whereas UTF-32 isn't, because anyone who is working with those encodings doesn't know anything about them.  You, sir, are a man among men.  You have stood on the shoulders of giants.

     

    If you said "write me a function that determines the length of a string", I'd tell you "String.Length".  Piss off, wanker.

     

    edit: And besides, if you're getting international text and are using an array-index because you think that's a "specific position" character, you'd very possibly be wrong.



  • @Sutherlands said:

    edit: And besides, if you're getting international text and are using an array-index because you think that's a "specific position" character, you'd very possibly be wrong.

    Not to mention these. A single visual character may be composed of a varying number of Unicode code points, such as U+0061 U+0308 to make up ä (this does not display correctly in my browser; YMMV). And since that's exactly equivalent to the pre-combined form U+00E4 (ä), you can't even compare two strings directly with a byte-by-byte comparison, regardless of byte size and encoding used.
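The pre-combined vs. combining-mark equivalence described above can be shown with Python's standard `unicodedata` module:

```python
import unicodedata

# Two spellings of the same visual character 'ä':
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS

# Code-point-for-code-point (hence byte-for-byte) they differ...
print(decomposed == precomposed)          # False
print(len(decomposed), len(precomposed))  # 2 1

# ...so a correct comparison must normalize both sides first.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                 # True
```

This is why byte-wise `memcmp`-style comparison is insufficient for Unicode text regardless of the encoding's width.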



  • @Sutherlands said:

    If you said "write me a function that determines the length of a string", I'd tell you "String.Length".  Piss off, wanker.

    Excellent, you win the prize for missing the point and making yourself look like an idiot.

    smxlong said if he asked you to write a function. String.Length is not writing a function, it's using a built-in function of a language (probably Java or .NET from the looks of it). Multi-byte encodings cause problems when you don't know the exact encoding of a string and just assume, which is why even the string length functions built into languages get it wrong. So internally, instead of looking at the byte length of a string in memory and calculating character length by dividing it by the number of bits used per character, the whole string has to be traversed from start to end to determine the character length based on the byte patterns of each character. Negligible if you do it once, but when have you ever only needed to get the length of a string once in your code?

    @Sutherlands said:

    edit: And besides, if you're getting international text and are using an
    array-index because you think that's a "specific position" character,
    you'd very possibly be wrong.

    This is why the suggestion of using a fixed-length encoding was brought up, because then you can use array-indexes on strings with confidence that it will work as you intend.

     



  • @ASheridan said:

    This is why the suggestion of using a fixed-length encoding was brought up, because then you can use array-indexes on strings with confidence that it will work as you intend.
    But even UCS-4 isn't fixed-length - just search for combining diacritical marks ‧̴̵̶̷̸̡̢̧̨̛̖̗̘̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̰̱̲̳̹̺̻̼͇͈͉͍͎̀́̂̄̃̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̽̾̿̀́͂̓̈́͆͊͋͌̕̚ͅ͏͓͔͕͖͙͚͐͑͒͗͛ͣͤͥͦͧͨͩͪͫͬͭͮͯ͘͜͟͢͝͞͠͡ - if you have (one or more of) those in a string, will it really work as intended?



  • @bgodot said:

    So a struct with, for example, 4 'chars' would take up 128 bits anyway.

    Ah, T0pC0d3r, good to see you back!



  • @bgodot said:


    edit: also, most modern CPUs want data structures to be aligned on 2^n boundaries anyway. So a struct with, for example, 4 'chars' would take up 128 bits anyway.

    Mister, could you please, slowly, lift your hands from the keyboard, and leave the building. CPUs want (or prefer, in some cases) data aligned to the size of the element: so chars on 8 bits, int32_t/floats on 32 bits, and int64_t/doubles on 64 bits.



  • Who the fuck had to go and mention characters?  char != byte

    Just to clarify:  Much of what we do involves communicating with an ad hoc collection of devices over different physical layers (Ethernet, RS422, RS232, etc.).  We have a common architecture to deal with communications with these devices as a stream of bytes.  All encoding is handled after getting the raw data.



  • @frits said:

    Who the fuck had to go and mention characters?  char != byte

    Just to clarify:  Much of what we do involves communicating with an ad hoc collection of devices over different physical layers (Ethernet, RS422, RS232, etc.).  We have a common architecture to deal with communications with these devices as a stream of bytes.  All encoding is handled after getting the raw data.

    Many communication protocol definitions prudently sidestep the byte (and char) issue by using the term 'octet' (= 8 bits).



  • Maybe they'd been studying The Art of Computer Programming, in which a byte is 6 bits (when it's a binary byte, as opposed to a decimal byte!) and got it slightly wrong.



  • @ASheridan said:

     

    @Sutherlands said:

    edit: And besides, if you're getting international text and are using an
    array-index because you think that's a "specific position" character,
    you'd very possibly be wrong.

    This is why the suggestion of using a fixed-length encoding was brought up, because then you can use array-indexes on strings with confidence that it will work as you intend.

    Ah, so after missing the point, you make yourself look like an idiot.  Did you even click the link?  Do you know what a joiner is?  Look at the "character" in ender's post.  How long is that string?  You CANNOT use array-indexes on strings with confidence that it will work as you'd intend.  Ok, I take that back, YOU can, but nobody who knows anything about character encodings can.



  • As far as I know there is no global definition of a byte. In digital electronics it is defined by most textbooks as a group of eight bits.

    The IEEE 1541 doesn't define its size and in C, a byte is sizeof(char) which is at least 8 bits.

    So it depends on the context really. But 4 bits is highly unlikely.



  • @frits said:

    @Renan said:

    @dtech said:
    This can actually be true if they were talking about an encoding that carries along 4 bits of information in a byte.
    This. I would try to find out if it was actually this before giving those guys hell.
    Please provide at least one current, real-world example of "this".  I'll wait.

    Manchester Encoding. [Although the more common approach is 8 bits of information in a 16 bit word - in either case, two bits of storage for one bit of information.]
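The "two line bits per data bit" idea can be sketched in a few lines of Python. This is a toy model using one common convention (IEEE 802.3: 0 → high-low, 1 → low-high); the opposite convention also exists:

```python
# Toy Manchester encoder/decoder: each data bit becomes a pair of
# opposite half-bits, so 8 transmitted bits carry only 4 bits of
# information — the "4 bits per byte" case asked for above.

def manchester_encode(bits):
    out = []
    for b in bits:
        out.extend((1, 0) if b == 0 else (0, 1))  # 0 -> 10, 1 -> 01
    return out

def manchester_decode(line):
    bits = []
    for i in range(0, len(line), 2):
        pair = (line[i], line[i + 1])
        bits.append(0 if pair == (1, 0) else 1)
    return bits

data = [1, 0, 1, 1]             # 4 bits of information...
line = manchester_encode(data)  # ...occupy 8 line bits
print(line)                         # [0, 1, 1, 0, 0, 1, 0, 1]
print(manchester_decode(line) == data)  # True
```

The guaranteed mid-bit transition is what makes the code self-clocking, which is why the 2x bandwidth cost is accepted.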



  • @Manos said:

    As far as I know there is no global definition of a byte. In digital electronics it is defined by most textbooks as a group of eight bits.

    The IEEE 1541 doesn't define its size and in C, a byte is sizeof(char) which is at least 8 bits.

    So it depends on the context really. But 4 bits is highly unlikely.

    I will now go and devise "Unicode C", a dialect of C in which sizeof(char) varies from eight to thirty-two bits depending on "#pragma codepage" statements! >:-D



  • @ekolis said:

    I will now go and devise "Unicode C", a dialect of C in which sizeof(char) varies from eight to thirty-two bits depending on "#pragma codepage" statements! >:-D
    sizeof(char) can never be anything other than 1 in a conforming implementation. You perhaps meant CHAR_BIT (which cannot be less than 8)?



  • @ekolis said:

    I will now go and devise "Unicode C", a dialect of C in which sizeof(char) varies from eight to thirty-two bits depending on "#pragma codepage" statements! >:-D

    I believe that's what wchar_t is for.



  • Come again? These two deserve a spanking. WITH A BELT.



  •  Overheard too many times on the internet-- "The Atari 2600 was a 4-bit system."

     SIGH.



  • And what about booleans?  They only need a single bit!



  • Come to think of it, I think it depends on the byte's race... if a white byte, it is 8 bits; if a black byte, it's only 4.8 bits, due to the Three-Fifths Compromise.



  • @ekolis said:

    Come to think of it, I think it depends on the byte's race... if a white byte, it is 8 bits; if a black byte, it's only 4.8 bits, due to the Three-Fifths Compromise.

    Wow. Topical.

