Best Linked List Ever



  • @dkf said in Best Linked List Ever:

    @ixvedeusi said in Best Linked List Ever:

    And why would that be (assuming you never have to traverse it in reverse)?

    The information flow inside the processor results in better handling of memory, whereas complex tricks with address arithmetic force the main ALU to be used to generate the address, so you tend to get a longer pipeline stall. By contrast, array accesses have been extremely well optimised for you already, as they're utterly critical to so much performance. (Structure accesses are also really quick.)

    Essentially, if you store the address in "plain sight" (i.e., you store the actual address, and not some value from which it can be derived), your CPU may be able to prefetch memory related to it. If you need to compute the address, it's much less likely that the CPU will see the address ahead of time and be able to prefetch based on that.

    There's a talk from one of the recent CppCons by (IIRC) an LLVM guy. Essentially, arrays work well with cache and prefetch; linked lists can maybe be prefetched one node ahead, and you might get lucky with the cache (but probably not). He suggests an array of pointers as a middle ground in some cases (i.e., with big nodes), since that allows prefetching to be more efficient (look-ahead is easier; see the sketch below).

    Disclaimer: This assumes a modern CPU. Your problem might be different. Use ~~protection~~ profilers. Etc etc etc.
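
    Roughly what that middle ground looks like, as a hedged sketch: `Node`, `ListNode`, and the prefetch distance are all made up for illustration, not anything from the talk.

    ```cpp
    #include <cstddef>
    #include <vector>

    // Hypothetical "big node" type; the field names are invented.
    struct Node {
        double payload[32];   // big enough that nodes won't share cache lines
        double sum() const {
            double s = 0.0;
            for (double v : payload) s += v;
            return s;
        }
    };

    // Linked-list style: each load of `next` depends on the previous one,
    // so the prefetcher has almost nothing to work ahead on.
    struct ListNode {
        Node data;
        ListNode* next;
    };

    double sum_list(const ListNode* head) {
        double s = 0.0;
        for (const ListNode* p = head; p != nullptr; p = p->next)
            s += p->data.sum();
        return s;
    }

    // "Array of pointers" middle ground: the node addresses sit contiguously
    // and in plain sight, so the hardware (or a manual prefetch a few elements
    // ahead) can start fetching node i+k while node i is being processed.
    double sum_ptr_array(const std::vector<Node*>& nodes) {
        double s = 0.0;
        for (std::size_t i = 0; i < nodes.size(); ++i) {
    #ifdef __GNUC__
            if (i + 4 < nodes.size())
                __builtin_prefetch(nodes[i + 4]);   // GCC/Clang builtin; optional
    #endif
            s += nodes[i]->sum();
        }
        return s;
    }
    ```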



  • @Steve_The_Cynic C has its benefits:

    1. Dumb coworkers are rarer and more evident, so they don't mess with me on stupid arguments*
    2. There is less competition, so nobody cares if you're a little insane

    * Like the guy who gave me shit for using the + operator on C# strings instead of string.Concat, and it wasn't even in a loop. Fuck these dumbfuckers.


  • Discourse touched me in a no-no place

    @cvi said in Best Linked List Ever:

    If you need to compute the address, it's much less likely that the CPU will see the address ahead of time and be able to prefetch based on that.

    Fixed offsets work well. Arrays would be a problem, except they've been optimised very hard from the hardware side. Complex structures… are a problem, as you end up with a direct data dependence between one fetch (which produces the next address) and the next (which uses it). Full computed addresses are slow, but aren't really used very often.



  • @dkf said in Best Linked List Ever:

    Full computed addresses are slow, but aren't really used very often.

    I was mainly referring to the linked list that stores a single offset from which you can derive either the next or the previous address, depending on which direction you're traversing the list. (I've seen the same idea, but with XOR-ing the pointer values.)


  • Java Dev

    @cvi Probably also applies to 'our pointer values are always 8-byte aligned so I can use the last 3 bits for something else'.
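
    For what it's worth, a minimal sketch of that trick, assuming the pointee really is 8-byte aligned. The names (`pack`, `pointer_of`, `tag_of`) are made up, and the pointer/uintptr_t round-trip is exactly the implementation-defined territory discussed below.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Assumes the pointee type is at least 8-byte aligned, so the low 3 bits
    // of any valid pointer to it are zero and can carry a small tag.
    constexpr std::uintptr_t kTagMask = 0x7;

    template <typename T>
    std::uintptr_t pack(T* p, unsigned tag) {
        auto bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & kTagMask) == 0 && tag <= kTagMask);
        return bits | tag;
    }

    template <typename T>
    T* pointer_of(std::uintptr_t packed) {
        return reinterpret_cast<T*>(packed & ~kTagMask);
    }

    inline unsigned tag_of(std::uintptr_t packed) {
        return static_cast<unsigned>(packed & kTagMask);
    }
    ```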



  • @PleegWat said in Best Linked List Ever:

    @cvi Probably also applies to 'our pointer values are always 8-byte aligned so I can use the last 3 bits for something else'.

    The nasal demons from that are going to leave a large crater in the face of whoever says it.

    Or in Denver, if you run it on a DeathStation 9000.


  • Java Dev

    @Steve_The_Cynic When you hear such stories, it's usually many years ago and in code targeting a single compiler and architecture, so nasal demons tend to be consistent.



  • Well I had been doing C and C++ for a long time before I first read that the mere act of loading an invalid (but non-NULL) pointer (even without dereferencing it) was nasal demon material.

    Which also means, by the way: be careful about your choice of types when doing an XOR list, because the result of XORing two pointer values cannot be expected to be a valid pointer. You'd need to use uintptr_t for the entire computation and storage (sketch below).
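
    A minimal sketch of what that looks like with uintptr_t doing all the work; `XorNode` and the helper names are invented for illustration.

    ```cpp
    #include <cstdint>

    // Minimal XOR-list sketch. The link field is stored as uintptr_t, never as
    // a pointer: prev XOR next is not a valid pointer value, so it must not
    // live in a pointer-typed object. (The pointer <-> uintptr_t round-trip is
    // implementation-defined, but it's the usual basis for this trick.)
    struct XorNode {
        int            value;
        std::uintptr_t link;   // addr(prev) ^ addr(next); 0 stands in for null
    };

    inline std::uintptr_t addr(const XorNode* p) {
        return reinterpret_cast<std::uintptr_t>(p);
    }

    // Walk one step; `prev` is null when starting from the head.
    inline XorNode* next_node(XorNode* prev, XorNode* cur) {
        return reinterpret_cast<XorNode*>(cur->link ^ addr(prev));
    }

    // Example traversal:
    //   XorNode* prev = nullptr;
    //   for (XorNode* cur = head; cur != nullptr; ) {
    //       XorNode* nxt = next_node(prev, cur);
    //       prev = cur;
    //       cur = nxt;
    //   }
    ```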



  • @PleegWat Well, Win32 guarantees that Win32 handles have the bottom two bits available for special use, and those are stored as pointer types, so all platforms that run Windows should offer guarantees that this doesn't cause UB.

    And as Raymond Chen mentioned, Win32 does NOT guarantee the same about process IDs.



  • @Medinoc I wouldn't be surprised if the prefetching works on 8-byte-aligned blocks or something anyway, so even if it incorrectly assumed that the tag bits were part of the pointer, it'd still end up preloading the correct memory.


  • Java Dev

    @Medinoc said in Best Linked List Ever:

    @PleegWat Well, Win32 guarantees that Win32 handles have the bottom two bits available for special use, and those are stored as pointer types, so all platforms that run Windows should offer guarantees that this doesn't cause UB.

    And as Raymond Chen mentioned, Win32 does NOT guarantee the same about process IDs.

    At that point you're targeting a single platform (Windows) on a single architecture (x86) with a single compiler (whatever the Windows standard libraries get compiled with).

    To your own code, those values are simple 32-bit or 64-bit integer values. Despite being implemented as pointers, they may not even point to a specific memory location in your address space. So from the viewpoint of your code, there is no UB.

    If the standard library implements this in C, it is almost certainly UB there. If it implements it in inline assembly or some other language, I'm not sure about the UB implications. Technically, UB is a poisoned chalice, and UB in the standard library poisons your entire application with nasal demons. In practice, UB is a compiler concept, not a processor concept, and in the absence of certain kinds of link-time optimisation (which isn't performed here, because DLLs) the poison cannot spread.


  • Discourse touched me in a no-no place

    @PleegWat said in Best Linked List Ever:

    UB is a compiler concept

    Also, specific compilers may define behaviour more strictly than the language standard requires. Code that compiles reliably with them would then not necessarily be portable to any other compiler, but would function correctly in the environment that it was intended to be run in.

    The true nasal demons are reserved for cases where neither the compiler nor the language standard defines what happens.



  • @Steve_The_Cynic said in Best Linked List Ever:

    @PleegWat said in Best Linked List Ever:

    @cvi Probably also applies to 'our pointer values are always 8-byte aligned so I can use the last 3 bits for something else'.

    The nasal demons from that are going to leave a large crater in the face of whoever says it.

    Or in Denver, if you run it on a DeathStation 9000.

    IIRC the talk I mentioned above also outlined a tuple<>-like wrapper that combines a pointer with some other data (up to N bits of it). As mentioned, this was from one of the LLVM devs, although I don't think that particular concept has made it into their code base (yet?).
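
    Something along those lines, as a hedged sketch: `PtrWithBits` is an invented name, and it assumes the pointee's alignment actually frees up `Bits` low bits.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical PtrWithBits<T, Bits>: stores a T* plus up to `Bits` extra
    // bits in a single word, relying on T* being aligned to at least
    // (1 << Bits) bytes. A sketch only; a real version would also want
    // compile-time alignment checks via alignof.
    template <typename T, unsigned Bits>
    class PtrWithBits {
        static constexpr std::uintptr_t kMask = (std::uintptr_t{1} << Bits) - 1;
        std::uintptr_t word_ = 0;

    public:
        PtrWithBits() = default;
        PtrWithBits(T* p, unsigned bits) { set(p, bits); }

        void set(T* p, unsigned bits) {
            auto raw = reinterpret_cast<std::uintptr_t>(p);
            assert((raw & kMask) == 0 && bits <= kMask);
            word_ = raw | bits;
        }

        T*       pointer() const { return reinterpret_cast<T*>(word_ & ~kMask); }
        unsigned bits()    const { return static_cast<unsigned>(word_ & kMask); }
    };
    ```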


  • Discourse touched me in a no-no place

    @cvi said in Best Linked List Ever:

    IIRC the talk I mentioned above also outlined a tuple<>-like wrapper that combines a pointer with some other data (up to N bits of it). As mentioned, this was from one of the LLVM devs, although I don't think that particular concept has made it into their code base (yet?).

    Probably hasn't made it in at all; the costs of doing shenanigans with “spare” bits in pointers are higher than you'd think at first glance, since they introduce more complex data dependencies that play merry hell with CPU branch predictors.



  • @dkf Yeah, you'd need to profile that pretty carefully (wrapping it into a generic class should make switching back and forth between implementations easy enough, though). It might be worth it if it allows you to make the resulting object smaller and more cache friendly. I also wouldn't use it on a pointer that's likely to be very frequently accessed. If you have something vector-like with start, end and capacity pointers, you could use a bit in the capacity pointer, for instance -- you never touch it when iterating over the active range (rough sketch below).
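
    A rough sketch of that layout: `SmallishVec` and the "owns its storage" flag are invented for illustration; the point is just that the tagged pointer stays off the hot path.

    ```cpp
    #include <cstdint>

    // Vector-like header with begin/end/capacity. The hot loop only touches
    // begin_ and end_, so a flag (here, a hypothetical "owns its storage" bit)
    // can hide in the low bit of cap_ without adding masking work to iteration.
    template <typename T>
    struct SmallishVec {
        T*             begin_ = nullptr;
        T*             end_   = nullptr;
        std::uintptr_t cap_   = 0;   // address of capacity end, low bit = flag

        T*   capacity_end() const { return reinterpret_cast<T*>(cap_ & ~std::uintptr_t{1}); }
        bool owns_storage() const { return (cap_ & 1) != 0; }

        // Iteration never looks at cap_, so the tagging costs nothing here.
        T* begin() const { return begin_; }
        T* end()   const { return end_; }
    };
    ```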

