Errm... I don't think that's a good choice of key.



  • Some of you have probably heard of the online virtual world Second Life. Some of you are probably also aware that they open-sourced their client a while ago. Unfortunately, it's a giant WTF.

    For example, I was recently looking at the packet creation code for the UDP-based protocol it uses. Basically, each packet consists of a pre-defined set of named blocks, each of which has its own pre-defined structure. Some blocks are "variable", which means there can be several of them in a packet. The code for adding a block to the list of blocks to be sent in the next packet is... interesting. It basically works something like this:

    void LLTemplateMessageBuilder::nextBlock(const char* blockname)
    {
    	char *bnamep = (char *)blockname; 
    
    	// ok, have we already set this block?
    	LLMsgBlkData* block_data = mCurrentSMessageData->mMemberBlocks[bnamep];
    	if (block_data->mBlockNumber == 0)
    	{
    		// nope! set this as the current block
    		block_data->mBlockNumber = 1;
    		mCurrentSDataBlock = block_data;
    	}
    	else
    	{
    		// already have this block. . . 
    
    		// modify the name to avoid name collision by adding number to end
    		S32  count = block_data->mBlockNumber;
    
    		// incrememt base name's count
    		block_data->mBlockNumber++;
    
    		char *nbnamep = bnamep + count;
    	
    		mCurrentSDataBlock = new LLMsgBlkData(bnamep, count);
    		mMemberBlocks[nbnamep] = mCurrentSDataBlock;
    	}
    }

    That's right - it uses, as the key to the mapping all the blocks are stored in, the sum of a pointer to the block name and (in order to handle variable blocks) the number of blocks of that name that are already there. (The code here is somewhat simplified, but you get the idea. Also, all the comments are genuine, original comments.)



  • No surprise here... considering it's coming from the company that can't support more than two client connections per server.



  • @makomk said:

    That's right - it uses, as the key to the mapping all the blocks are stored in, the sum of a pointer to the block name and (in order to handle variable blocks) the number of blocks of that name that are already there. (The code here is somewhat simplified, but you get the idea. Also, all the comments are genuine, original comments.)

    They didn't overload the + operator to stringify the number, then concatenate it to the original string, did they?  Not that I'm excusing that sort of thing, but it seems slightly more plausible than just bumping the pointer and using whatever the result happens to be pointing too.

    Another possibility is that the "number of blocks of blocks that are already there" is always small, so when the name is something like "foobar", the result is "oobar" or "obar", and it never runs off the end of the string to become a noticable problem.



  • @iwpg said:

    @makomk said:

    That's right - it uses, as the key to the mapping all the blocks are stored in, the sum of a pointer to the block name and (in order to handle variable blocks) the number of blocks of that name that are already there. (The code here is somewhat simplified, but you get the idea. Also, all the comments are genuine, original comments.)

    They didn't overload the + operator to stringify the number, then concatenate it to the original string, did they?

    You can't overload operator+(char*, int) and expect any code to still work after compiling it (in particular, the standard library will stop working).



  • @iwpg said:

    They didn't overload the + operator to stringify the number, then concatenate it to the original string, did they?  Not that I'm excusing that sort of thing, but it seems slightly more plausible than just bumping the pointer and using whatever the result happens to be pointing too.


    Totally breaking pointer arithmatic on char* would be an even bigger WTF, IMO.

    @iwpg said:

    Another possibility is that the "number of blocks of blocks that are already there" is always small, so when the name is something like "foobar", the result is "oobar" or "obar", and it never runs off the end of the string to become a noticable problem.


    Can be up to 255. There's at least two packets (possibly three - not sure if the third is still used) which currently always have 218 VisualParam blocks (each of which consists of a single byte, which is probably a WTF by itself), and I think there are other packets which have variable blocks with large numbers of repeats.



  • Having looked closer at the code, it's even more... interesting than I thought. The order of the blocks in a message is actually the order of the pointers to the names of the blocks (fields within a block are probably done the same way). Since the blocks have no identifying headers, it's important that this doesn't change. How is this achieved? Via a big static hashtable of interesting design:

    const U32 MESSAGE_MAX_STRINGS_LENGTH = 64;
    const U32 MESSAGE_NUMBER_OF_HASH_BUCKETS = 8192;
    
    char mString[MESSAGE_NUMBER_OF_HASH_BUCKETS][MESSAGE_MAX_STRINGS_LENGTH];
    


    All the packet/block/field names are hashed and interned into this table in the appropriate slots. (In the case of a hash collision, the next free slot is used. You may have spotted that this means any change to the message template file used to define the message format will have unpredictable results on block and field ordering. I think message IDs are also assigned in hashtable order.) This ensures that, as long as two systems have the same message template file, the pointers to the interned strings will be in the same order, and therefore they will agree on block and field ordering within messages.

    In order to make the code more efficient, there are also variables of the form PREHASH_SomeString which hold the correct interned pointer to the string in question (in this case, "SomeString"). Most of the code uses these, though some doesn't. If you're using a PREHASH string, you call a method called (say) fooFast, whereas if you're not you have to call a method with the exact same signature but without the "Fast" at the end of the name which interns the string for you and then calls fooFast. (Confusingly, the function I described above corresponds to nextBlockFast rather than nextBlock, despite its name.)



  • well, that is weird code.  nbnamep is a char *, then it's used as an array subscript.  That's mighty odd.

     

     



  • @asuffield said:

    [You can't overload operator+(char*, int) and expect any code to still work after compiling it (in particular, the standard library will stop working).

     

    What a wicked idea, though...



  • @wtf said:

    char *nbnamep = bnamep + count;

    By the way...this is clearly a bug written by a former Java programmer, right?

    He expected to concat the name with the count without dereferencing the pointer, converting int to string, or even using a concat function. No way somebody actually thought it was a good idea to use a pointer as a key. In Java, however, this line would roughly be correct.

     OP, did you file this as a bug? Or are you planning to use this maliciously?



  • @savar said:

    He expected to concat the name with the count without dereferencing the pointer, converting int to string, or even using a concat function. No way somebody actually thought it was a good idea to use a pointer as a key.

    Actually, I do know of a few instances where real, working code has used pointer arithmetic to generate pointers-as-keys (nothing simple enough to explain here though - mostly rather complicated memory management schemes). But I cannot see any way in which this particular instance could be correct.



  • @asuffield said:

    @savar said:

    He expected to concat the name with the count without dereferencing the pointer, converting int to string, or even using a concat function. No way somebody actually thought it was a good idea to use a pointer as a key.

    Actually, I do know of a few instances where real, working code has used pointer arithmetic to generate pointers-as-keys (nothing simple enough to explain here though - mostly rather complicated memory management schemes). But I cannot see any way in which this particular instance could be correct.

    See my post above. Due to the way they allocate the memory for the strings in question, this is in fact pretty much guaranteed to work correctly as long as you don't have more than 64 of each block per packet. Unfortunately, this isn't actually the case. The thing is, unless you get unlucky and another block in the same packet ends up in the next hash bucket from a block you've got too many of, it still works perfectly, which is probably why the code is still there. (I hear their hash algorithm also sucks, BTW).


Log in to reply