[Solved] UUIDs, variant (not version) 1. The `variant type` part. What determines the LSB?


  • Discourse touched me in a no-no place

    Variant type 1 is given as:

       The following table lists the contents of the variant field, where
       the letter "x" indicates a "don't-care" value.
    
       Msb0  Msb1  Msb2  Description
    
        0     x     x    Reserved, NCS backward compatibility.
    
        1     0     x    The variant specified in this document.
    
        1     1     0    Reserved, Microsoft Corporation backward
                         compatibility
    
        1     1     1    Reserved for future definition.
    

    I'm asking about the second row of that table: the `1 0 x` "The variant specified in this document" row.

    What determines the value of that x, since
    0) I know it says 'don't care' up there, but I assume (hope!) that only applies to determining the variant. Otherwise you can end up with two different UUIDs which are, for all intents and purposes, identical.

    1. I can't find any reasonable explanation about it (certainly not in RFC4122, and Google isn't being useful to me tonight)
    2. I've spent over an hour wondering why my unit tests, on some code I copied and was testing, were failing on one nibble when the rest of the UUID seemed fine, then eventually stumbled upon - while looking:
    $ uuidgen -s -n 5951ac30-636c-11e8-b27e-834d045ab87b -N ''
    d8dc8f19-a957-5903-86ce-89eebb01625e
    $ uuid -v5 5951ac30-636c-11e8-b27e-834d045ab87b ''
    d8dc8f19-a957-5903-a6ce-89eebb01625e
    

    The first result is what I was using and doesn't match the output from the code I swiped. The second result does.
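
    A quick way to see exactly where the two outputs part company is to XOR them. This is only a sketch using Python's standard `uuid` module as a convenient parser; the variable names are made up for illustration.

```python
import uuid

# The two outputs quoted above.
from_uuidgen = uuid.UUID("d8dc8f19-a957-5903-86ce-89eebb01625e")
from_uuid = uuid.UUID("d8dc8f19-a957-5903-a6ce-89eebb01625e")

# XOR the 128-bit values and list the bit positions that differ
# (bit 0 is the least significant bit of the whole UUID).
diff = from_uuidgen.int ^ from_uuid.int
differing_bits = [i for i in range(128) if (diff >> i) & 1]
print(differing_bits)  # [61]

# Octet 8 is clock_seq_hi_and_reserved; 0x86 ^ 0xa6 == 0x20, i.e. bit 5 of that octet.
print(hex(from_uuidgen.bytes[8] ^ from_uuid.bytes[8]))  # 0x20

# Both values still decode as the RFC 4122 variant, since only the top two bits decide that.
print(from_uuidgen.variant == from_uuid.variant == uuid.RFC_4122)  # True
```

    So the whole disagreement is a single bit: the x in the `1 0 x` row.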

    Why, when used with V5 (and V3 - I have similar issues) UUIDs, which are supposed to be reproducible when used with the same namespace and name, am I getting two different answers from two different (supposedly system) tools?

    And more specifically, which one should I be using (if there is a choice to be made) to construct my unit tests for this?

    Unless there's a wood lurking round these trees and I've been stumbling around too long to see it....


  • Discourse touched me in a no-no place

    Experimentation shows I can get my plagiarized code to generate either result (for all instances where I can get uuid and uuidgen to differ) by either

    1. masking off then fixing the two significant identifier bits (allowing the x to be populated by the rest of the algorithm) (what uuid seems to do)
    2. masking off then fixing all three identifier bits (forcing x to be zero) (uuidgen)
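
    The two options can be sketched in Python as below. The function names are hypothetical, and the input octet `0xE6` is just an illustrative value with bit 5 set, not the real hash octet (which can't be recovered from the outputs alone).

```python
def set_variant_two_bits(octet8: int) -> int:
    """Option 1: force the top two bits to 1 0 and keep the other six, x included."""
    return (octet8 & 0x3F) | 0x80


def set_variant_three_bits(octet8: int) -> int:
    """Option 2: force the top three bits to 1 0 0, which also clears x."""
    return (octet8 & 0x1F) | 0x80


# Illustrative input only: 0xE6 is 1110 0110, so bit 5 (the x) is set.
print(hex(set_variant_two_bits(0xE6)))    # 0xa6
print(hex(set_variant_three_bits(0xE6)))  # 0x86
```

    The two only disagree when bit 5 of the incoming octet is set, which would explain why the tools match for some inputs and not others.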

    I suspect I know which should be the 'right' thing to do, but still digging.

    I've had more interesting things I could be investigating...


  • Discourse touched me in a no-no place

    Found the wood. And it appears it's uuidgen that's borked - it's clearing x when it shouldn't.

    Version 1 (not relevant here, but it's indicative) Section 4.2.2:

    • Set the 6 least significant bits (bits zero through 5) of the
      clock_seq_hi_and_reserved field to the 6 most significant bits
      (bits 8 through 13) of the clock sequence in the same order of
      significance.

    • Set the two most significant bits (bits 6 and 7) of the
      clock_seq_hi_and_reserved to zero and one, respectively.

    Bit 5 is x, so here it comes from the clock sequence rather than being forced to zero.

    Versions 3 and 5 Section 4.3:

    • Set the clock_seq_hi_and_reserved field to octet 8 of the hash.

    • Set the two most significant bits (bits 6 and 7) of the
      clock_seq_hi_and_reserved to zero and one, respectively.
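
    A minimal Python sketch of those Section 4.3 steps, assuming the name is encoded as UTF-8 before hashing. `name_based_uuid` is a made-up helper for illustration, not anyone's production code; the point is that only the top two bits of octet 8 get overwritten.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: str, version: int = 5) -> uuid.UUID:
    """Build a version 3 (MD5) or version 5 (SHA-1) UUID from a namespace and name."""
    if version not in (3, 5):
        raise ValueError("only versions 3 and 5 are name-based")
    hasher = hashlib.sha1 if version == 5 else hashlib.md5
    # Hash the namespace octets followed by the name, then keep the first 16 octets.
    octets = bytearray(hasher(namespace.bytes + name.encode("utf-8")).digest()[:16])
    # Top four bits of octet 6 carry the version number.
    octets[6] = (octets[6] & 0x0F) | (version << 4)
    # Top two bits of octet 8 become 1 0; bit 5 (the x) is left as the hash produced it.
    octets[8] = (octets[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(octets))


ns = uuid.UUID("5951ac30-636c-11e8-b27e-834d045ab87b")
print(name_based_uuid(ns, ""))
```

    Python's own `uuid.uuid5` and `uuid.uuid3` make a handy cross-check for a sketch like this.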

    Version 4 Section 4.4:

    • Set the two most significant bits (bits 6 and 7) of the
      clock_seq_hi_and_reserved to zero and one, respectively.
    • [..]
    • Set all the other bits to randomly (or pseudo-randomly) chosen
      values.
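
    The Version 4 case is the same masking applied to random octets. A sketch, with `random_uuid` as a hypothetical helper name and `os.urandom` assumed as the random source:

```python
import os
import uuid


def random_uuid() -> uuid.UUID:
    """Build a version 4 UUID from 16 random octets."""
    octets = bytearray(os.urandom(16))
    # Top four bits of octet 6 are the version (0100 for version 4).
    octets[6] = (octets[6] & 0x0F) | 0x40
    # Top two bits of octet 8 are the variant (1 0); the other six bits stay random.
    octets[8] = (octets[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(octets))


u = random_uuid()
print(u, u.version)
```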


  • Now that you've solved it, can you explain why this matters?

    Trying to figure out how to generate identical GUIDs seems to defeat the entire purpose of using GUIDs at all. Why do you have unit tests that seem to rely on generating identical GUIDs? Why not just use MD5 or something?



  • @blakeyrat As he said in the OP, by spec Version 3 and Version 5 GUIDs are generated from a namespace and name, and the generation is supposed to be repeatable. (The difference between V3 and V5 is in which hash they use.)

    (From the spec:)

    The requirements for these types of UUIDs are as follows:

    o The UUIDs generated at different times from the same name in the
    same namespace MUST be equal.
    o [additional requirements]

    So if you generate a new GUID for the same namespace and name and it doesn't match the one you had before, that is an error.



  • @scarlet_manuka I get that; I don't get why a unit test relies on that.



  • @blakeyrat said in [Solved] UUIDs, variant (not version) 1. The `variant type` part. What determines the LSB?:

    @scarlet_manuka I get that; I don't get why a unit test relies on that.

    If you're generating namespace GUIDs in your application, it makes sense to have a unit test that says "generate this specific one and make sure you get this value." I don't know what his application is about, but I presume generating namespace GUIDs is a thing some possible application could reasonably do, so I don't see a problem here.


  • Discourse touched me in a no-no place

    TLDR: I'm unit testing newly introduced code for actually generating GUIDs, not stuff that relies on GUIDs.

    @blakeyrat said in [Solved] UUIDs, variant (not version) 1. The `variant type` part. What determines the LSB?:

    Trying to figure out how to generate identical GUIDs seems to defeat the entire purpose of using GUIDs at all.

    Key word there being 'seems,' I'm guessing because you've had a very limited view of how GUIDs are used. There are circumstances where repeatability is required.

    Why not just use MD5 or something?

    Because an MD5 doesn't look like a GUID, when a GUID is required perhaps?

    The 'something' in that would be a V3 (which uses md5 internally) or V5 (sha1) GUID.

    Why do you have unit tests that seem to rely on generating identical GUIDs?

    In short, because I'm testing the actual GUID generation code.

    I've had to implement GUID generation in a language that otherwise didn't naturally support it for a project I'm working on; most packages I found simply added more bloat.

    The repeatability of Version 3 and 5 GUIDs provides ideal unit tests for it, because I can test all 128 bits of output.

    The only thing that can be reliably tested with Version 4 (basically a random number) is the version number and variant, which is basically 6 bits out of the whole 128, and doesn't really prove a lot when it comes to testing, apart from simply making sure they're the right values.
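
    For what it's worth, that 6-bit check can be written as a single mask comparison. This is a sketch only: `has_v4_fixed_bits` is a made-up name, and Python's `uuid.uuid4` stands in here for whatever generator is actually under test.

```python
import uuid

# Bit positions counted from the least significant bit of the 128-bit value.
VERSION_MASK = 0xF << 76   # top four bits of octet 6
VARIANT_MASK = 0x3 << 62   # top two bits of octet 8
FIXED_MASK = VERSION_MASK | VARIANT_MASK


def has_v4_fixed_bits(u: uuid.UUID) -> bool:
    """True if the version bits are 0100 and the variant bits are 1 0."""
    expected = (0x4 << 76) | (0x2 << 62)
    return (u.int & FIXED_MASK) == expected


print(bin(FIXED_MASK).count("1"))      # 6
print(has_v4_fixed_bits(uuid.uuid4()))  # True
```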



  • @pjh said in [Solved] UUIDs, variant (not version) 1. The `variant type` part. What determines the LSB?:

    TLDR: I'm unit testing newly introduced code for actually generating GUIDs, not stuff that relies on GUIDs.

    You could have just posted that without the snark.

    Sorry for being curious, sheesh.


  • Discourse touched me in a no-no place

    @blakeyrat said in [Solved] UUIDs, variant (not version) 1. The `variant type` part. What determines the LSB?:

    You could have just posted that without the snark.

    I wasn't the one being snarky.



  • @pjh said in [Solved] UUIDs, variant (not version) 1. The `variant type` part. What determines the LSB?:

    The only thing that can be reliably tested with Version 4 (basically a random number) is the version number and variant, which is basically 6 bits out of the whole 128, and doesn't really prove a lot when it comes to testing, apart from simply making sure they're the right values.

    You're just not trying hard enough. Generate 1,000,000 Version 4 GUIDs and do some statistical tests for randomness. Start with each bit individually, then do correlations of two, three, .... (and don't forget to check for correlations between GUID n and GUIDs n+1, n+2, ...)

    Oh, you wanted your unit tests to execute quickly? Never mind then.
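
    A very rough sketch of just the first step (per-bit frequencies), assuming a small sample and a deliberately loose tolerance. It is nowhere near a real randomness test, and `bit_frequencies` is a made-up helper with Python's `uuid.uuid4` standing in for the generator under test.

```python
import uuid


def bit_frequencies(samples: int = 5000) -> list[float]:
    """Fraction of generated UUIDs in which each of the 128 bits is set."""
    counts = [0] * 128
    for _ in range(samples):
        value = uuid.uuid4().int
        for bit in range(128):
            counts[bit] += (value >> bit) & 1
    return [count / samples for count in counts]


freqs = bit_frequencies()

# The version (bits 76-79) and variant (bits 62-63) are fixed, so skip them.
fixed = {76, 77, 78, 79, 62, 63}
suspicious = [b for b in range(128) if b not in fixed and abs(freqs[b] - 0.5) > 0.1]
print(suspicious)
```

    Correlations between bits and between successive GUIDs are left as the slow part.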



  • @pjh said in [Solved] UUIDs, variant (not version) 1. The `variant type` part. What determines the LSB?:

    I wasn't the one being snarky.

    I assume you're implying that my questions were "snark" somehow?

    Sorry you read them that way? But they weren't.



  • @blakeyrat As far as I can tell nobody was being snarky. Except me in my last post. :)

