Sometimes checking for NULL pointers is a mistake (NVidia CUDA).



  • You'd think that the following check can only prevent problems:

    if(pointer == NULL)
    {
         return 0;
    }

    But when you're programming for NVidia CUDA (an environment for general-purpose programming on the GPU) it can cause very hard-to-debug problems. This is because with the Fermi architecture (the newest GPU architecture from NVidia) the first variable in the on-chip shared memory (one of the memory spaces you can access) has an address of 0.

    I found this out after a lot of cursing and suspecting pretty much anything else (memory leaks, hardware errors, compiler bugs) since I still couldn't figure out how to get the debugger to work for the GPU code.

    Edit: I forgot to mention that this is a new problem introduced with the Fermi architecture. The older GPUs did not generate 0 pointers. You can imagine the fun when doing hardware upgrades.



  •  Assuming this is C, your compiler is broken. From 6.3.2.3:

    3. An integer constant expression with the value 0, or such an expression cast to type
    void *, is called a null pointer constant.55) If a null pointer constant is converted to a
    pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
    to a pointer to any object or function.

    If this is broken, there is no way to determine if a pointer is null.



  • You're returning an arbitrary number and expect that there's no way returning that number could cause an issue?



  • How the fuck did you figure that out?



  •  What arbitrary number is he returning? 0 *is* null in C.



  • I thought the OP meant the check

    if (pointer == NULL)

    which succeeds when it shouldn't, since the pointer actually points to a valid object at address 0?



  • This is exactly why C++0x (and probably C1x) introduces nullptr. They wanted to change NULL, but didn't because it would break lots of old code.



  • The language is CUDA C. It's supposed to be a set of extensions to C for programming on the GPU. And I was returning 0 when I got a NULL pointer because it made sense for the following processing. The pointer was pointing to an array of integers. As for figuring it out: some very confused and increasingly desperate experimenting.

    I didn't know it breaks the C standard. I just thought it was a very stupid idea to have NULL pointers refer to actual data.

    Btw a nested WTF: This pointer is generated by the runtime when launching a GPU function. Unfortunately it is not passed as a parameter or anything so mundane, instead you have to declare it like so:

    extern __shared__ int memory[];
    
    __global__
    void function(int arg1, float* arg2)
    {
    // ...
    }
    

    And there can only be one extern shared array. If you need more than one you have to request more bytes of memory and subdivide it yourself.
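    In plain C, that subdivision is just pointer arithmetic over one raw buffer. A host-side sketch (the buffer and the two sub-array sizes are made up; in real CUDA code you would request the total byte count at kernel launch and carve up the single extern __shared__ array the same way):

    ```c
    #include <stdio.h>

    int main(void)
    {
        /* Stand-in for the single extern __shared__ array: one raw region
           sized for 16 ints followed by 8 floats. */
        unsigned char memory[16 * sizeof(int) + 8 * sizeof(float)];

        int   *ints   = (int *)memory;        /* first sub-array at offset 0 */
        float *floats = (float *)(ints + 16); /* second sub-array right after */

        ints[0]   = 42;
        floats[0] = 1.5f;
        printf("%d %.1f\n", ints[0], floats[0]); /* prints "42 1.5" */
        return 0;
    }
    ```

    When mixing element types this way, putting the most strictly aligned type first keeps the carved-out pointers aligned.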



  • @Sutherlands said:

    You're returning an arbitrary number and expect that there's no way returning that number could cause an issue?

    0 would be the correct answer for no input. Unfortunately it is the wrong answer when there is input. Which is what I was debugging.



  • I never understood why there is so much confusion over this topic.  I thought it was pretty straightforward.  And I also never understood why people found the need to use NULL instead of the literal 0 either.

    But then again, I'm pretty weird.

    So yeah, my vote here would be a broken compiler, because the literal value 0 should be an invalid address.  (If you really want to compare to address 0 in this instance, you'd need to create a pointer constant with value 0. Although I've never tried, I wonder whether an easy way to get this would be to use pointer math, like:

    void* ptr;

    ptr = (&ptr)-(&ptr);

    I'm not sure what the rules of casting would be; that is, I don't know if

    void* ptr = (void*)((int)0);

    would work.)



  • IIRC 0 used to be a valid memory address for architectures that mapped their ROMs to the end of memory. In fact there used to be an XT/AT switch on your keyboard that had to be set based on the processor; eventually an IP wraparound to 0 was implemented, making the processor function in both modes (start at 0, and start at end-32, or whatever it was). This combination of compatibility hacks ended up being an exploitable vector on the original Xbox: the XT/AT compatibility was baked into the components the Xbox used, and by injecting a signal on that keyboard line you could cause the Xbox's processor to start at 0 and bypass the security code mapped into the end of memory (I remember it being far more complicated, and I probably have some details wrong).

    The decision to use 0 as NULL, IIRC, was because mapping the ROM to 0 became standard, and in cases where it wasn't, that address was reserved by the system for internal use, so 0 was on its way to becoming the de facto NULL and got codified into C at some point.

    As mentioned above, adding a nullptr frees up 0 for architectures that can support it (though IIRC most OSes still reserve much of low memory, but a GPU on a card being used as a spare processor may be an exception).



  • ptr = &ptr - &ptr; is completely bogus. Subtracting two pointers gives you an integer, not another pointer.

    (int)0 is also bogus because 0 is already an int. So your code basically says void *ptr = (void *)0, which just sets ptr to NULL (and the cast is redundant anyway).

    I think the simplest way to get an all-bits-zero pointer is memset(&ptr, '\0', sizeof ptr);.

    Adding nullptr doesn't give you more portability or anything; it's a workaround for a problem C++ introduced, namely outlawing implicit conversions from void * to other pointer types. C is fine with defining NULL as ((void *)0), while C++ must define it as plain 0. This just means C++ will silently tolerate using NULL as a number, while C won't. It has no portability implications because the C standard says nothing about the bit representation of (void *)0. See the C FAQ for more fun and exciting information.
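    The memset route mentioned above can be wrapped up like this (all_bits_zero_ptr is a made-up helper name; whether the result compares equal to NULL is only guaranteed on platforms where the null pointer is all-bits-zero, which includes every mainstream one):

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Build a pointer object whose representation is all-bits-zero,
       without going through an integer-to-pointer conversion. */
    static void *all_bits_zero_ptr(void)
    {
        void *p;
        memset(&p, 0, sizeof p);
        return p;
    }

    int main(void)
    {
        /* On an implementation where the null pointer is NOT all-bits-zero,
           this could legitimately print "different". */
        printf("%s\n", all_bits_zero_ptr() == NULL ? "equal" : "different");
        return 0;
    }
    ```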



  • There was a similar exploit on the Wii, too. A function in the kernel does something along the lines of:

    void UnregisterHandler(int idx) {
        if(idx < 0 || idx > MAX_HANDLERS) {
            throwErrorOrSomething();
            return;
        }
        handler *h = Handlers[idx];
        *(h->foo) = 0;
        Handlers[idx] = NULL;
    }

    Handlers[ ] and the structs within are filled in by the kernel when the handler is registered. However, if you unregister a handler twice, the second time, h is NULL. They didn't check for that, and wouldn't you know it, it's possible for userspace code to map a page of memory to address zero, write an arbitrary address there, and use this function to zero out anything in kernelspace, such as a couple of critical instructions in security code. Oops.
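    A hardened sketch of that routine (handler, Handlers, and MAX_HANDLERS are mocked up here; the real kernel types are unknown) rejects both the out-of-range index, fixing the off-by-one in the quoted `idx > MAX_HANDLERS` bound, and the NULL entry that enables the double-unregister attack:

    ```c
    #include <stddef.h>
    #include <stdio.h>

    #define MAX_HANDLERS 8

    typedef struct handler { int *foo; } handler;
    static handler *Handlers[MAX_HANDLERS];

    /* Returns 0 on success, -1 on any invalid request. */
    static int UnregisterHandler(int idx)
    {
        if (idx < 0 || idx >= MAX_HANDLERS)  /* >= fixes the off-by-one */
            return -1;
        handler *h = Handlers[idx];
        if (h == NULL)                       /* already unregistered: refuse */
            return -1;
        *(h->foo) = 0;
        Handlers[idx] = NULL;
        return 0;
    }

    int main(void)
    {
        static int flag = 1;
        static handler h = { &flag };
        Handlers[3] = &h;

        int first  = UnregisterHandler(3);   /* succeeds, clears the flag */
        int second = UnregisterHandler(3);   /* safely rejected */
        printf("%d %d %d\n", first, second, flag); /* prints "0 -1 0" */
        return 0;
    }
    ```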



  • @esoterik said:

    As mentioned above, adding a nullptr frees up 0 for architectures that can support it (though IIRC most OSes still reserve much of low memory, but a GPU on a card being used as a spare processor may be an exception).

    I can kind of understand that they might want to free up address 0 for shared memory since it's a small user-managed on-chip "cache". But small in this case means max. 64kB and the pointer is 32 bits, so I'm pretty sure they don't need to save address space.



  • @too_many_usernames said:

    And I also never understood why people found the need to use NULL instead of the literal 0 either.

    In C, there is a difference, because NULL is:

    #define NULL ((void *)0)
    

    and there are some problems using plain 0 for NULL. E.g. in variadic or unprototyped functions, if sizeof(void *) differs from sizeof(int) (used to be quite common on 16-bit and is becoming common on 64-bit again), 0 is passed in a format incompatible with a pointer. So you actually have to use NULL in some contexts, and then it's easier to use it in all of them, even though 0 will work in most.
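    The variadic case can be demonstrated with a made-up helper (count_ptrs is not a real API): a bare 0 in the argument list is pushed as an int, while (char *)NULL pushes a full pointer, which matters whenever sizeof(void *) != sizeof(int):

    ```c
    #include <stdarg.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Counts the char* arguments up to a terminating null pointer. */
    static int count_ptrs(const char *first, ...)
    {
        va_list ap;
        int n = 0;
        va_start(ap, first);
        for (const char *p = first; p != NULL; p = va_arg(ap, const char *))
            n++;
        va_end(ap);
        return n;
    }

    int main(void)
    {
        /* Correct: the terminator is a genuine pointer-sized null. */
        printf("%d\n", count_ptrs("a", "b", (char *)NULL)); /* prints "2" */
        /* count_ptrs("a", "b", 0) is the bug: an int-sized 0 is pushed, and
           on an ABI where int and pointer sizes differ, va_arg reads junk. */
        return 0;
    }
    ```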

    In C++ the C definition does not work, because C++ does not implicitly convert to/from void *. So C++ introduced the ugly rule that the literal 0 implicitly converts to a null pointer of any type, and redefined NULL to just 0. It was less of a problem than in C, because C++ does not allow unprototyped functions and discourages variadic functions. But then templates and function overloading introduced new cases where using 0 for a null pointer causes problems, which is why C++11 introduces nullptr.

    The main reason to use NULL is readability though, because logically they are different things.

    @too_many_usernames said:

    So yeah, my vote here would be a broken compiler, because the literal value 0 should be an invalid address.

    Yes. Remember, there is no requirement that such a pointer is actually represented by zero bits. The compiler would just have to emit appropriate mangling when converting between integer and pointer.

    @too_many_usernames said:

    (If you really want to compare to address 0 in this instance, you'd need to create a pointer constant with value 0. Although I've never tried, I wonder an easy way to get this would be to use pointer math, like:

    void* ptr;

    ptr = (&ptr)-(&ptr);

    That will give you a NULL pointer. (&ptr)-(&ptr) is 0 of type ptrdiff_t, which is long except on 64-bit Windows, where it is long long (64-bit Windows is an abomination with 32-bit long). And casting integral 0 should always yield a null pointer, not only for literals.

    @too_many_usernames said:

    I'm not sure what the rules of casting would be; that is, I don't know if

    void* ptr = (void*)((int)0);

    would work.

    No, you are still casting an integer to a pointer, so zero must yield a null pointer in all cases. The only way to get a pointer with zero representation in an environment with a different representation of null pointers would be:

    intptr_t temp = 0;
    void *ptr = *(void **)&temp;
    

    Using the C99 intptr_t (an integer at least as large as a pointer); on most platforms size_t or ptrdiff_t (the former unsigned, the latter signed) would do as well. They would not do on platforms using segments, because those only allow pointer arithmetic on offsets, so size_t and ptrdiff_t only need to accommodate the offset (e.g. 32-bit i386 with multiple segments per process (most systems use just one) would have 48-bit pointers, but still only a 32-bit size_t).



  • @Bulb said:

    void *ptr = *(void **)&temp;

    What... the... .... Can you explain what is going on here? I understand the input and the output, but I don't understand how it gets that way.

    That statement has about 200% more asterisks than I'm comfortable with.



  • @Xyro said:

    @Bulb said:
    void *ptr = *(void **)&temp;

    What... the... .... Can you explain what is going on here? I understand the input and the output, but I don't understand how it gets that way.

    That statement has about 200% more asterisks than I'm comfortable with.

    It's a typo, it's supposed to read:

    void ***ptr = ***(void **)&&&temp;&&&;

    In other news, thank God I don't use C anymore.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    It's a typo, it's supposed to read:
    void ***ptr = ***(void **)&&&temp;&&&;

    In other news, thank God I don't use C anymore.

    Given that bollocks as an attempt at humour, I think the rest of us are thanking him.


  • @Bulb said:

    The only way to get pointer with zero value in envrionment with different representation of null pointers would have to be:

    intptr_t temp = 0;
    void *ptr = *(void **)&temp;
    

    Using the C99 intptr_t (interger at least as large as pointer)

    I don’t think that’s right. An integer constant (e.g., something of the form “0” or “0L”), yields a null pointer as a value when cast to pointer type. An integer value of 0 need not.

    To make this clear, consider a platform where 0 is a valid address, and the address 0xDEADBEEF is a special trap value suitable for the null pointer.

    void *null_ptr = 0;
    intptr_t null_int = (intptr_t)null_ptr;
    printf("%p = 0x%jX\n", null_ptr, (intmax_t)null_int); // prints “Null = 0xDEADBEEF”
    

    intptr_t zero_int = 0;
    void *zero_ptr = (void *)zero_int;
    printf("%p = 0x%jX\n", zero_ptr, (intmax_t)zero_int); // prints “0:0 = 0x0”

    assert(null_ptr == NULL); assert(null_ptr == 0); // redundant for emphasis
    assert(zero_ptr != NULL); assert(zero_ptr != 0); assert(zero_ptr != null_ptr);

    (I’ve chosen an odd, but conforming, implementation of the “%p” printf() format for emphasis.)

    Confusing? Sure. That said, if there’s a user-accessible variable at memory address 0, then the compiler really ought to provide this non-zero null to avoid exactly the foul-up noticed by the OP.



  • @Bulb said:

    No, you are still casting integer to pointer, so zero must yield null pointer in all cases. The only way to get pointer with zero value in envrionment with different representation of null pointers would have to be:

    intptr_t temp = 0;
    void *ptr = *(void **)&temp;
    If you use a C compiler (rather than a piece of code which merely bears some passing resemblance to one), that would just give you a null pointer. If you don't want to use anondrifice's suggestion, you can use

        void *ptr = (void *)integer;

    where integer is the number you get when you cast such a pointer to a suitable integral type. How you discover what that number actually is is another question, but anyone who needs to write to a specific point in memory probably already knows how to find out the answer.


  • @Bulb said:

    (&ptr)-(&ptr) is 0 of type ptrdiff_t

    Huh? I'm fairly certain that ptrdiff_t is not an inherent type, so how can an expression evaluate to that type? I may have to research that... OK, at least in the compiler I have, the type of (void*)-(void*) is long.

    @Bulb said:
    intptr_t temp = 0;
    void *ptr = *(void **)&temp;

    That's what I was trying to write. Apparently I was having some cognitive disorder. Again, though, intptr_t is not a built-in type, is it?

    I believe the void *ptr = *(void**)&temp; version is the only way to do it without using a library function like memset

    EDIT: Oh, I just noticed

    @jcsalomon said:
    I don’t think that’s right. An integer constant (e.g., something of the form “0” or “0L”), yields a null pointer as a value when cast to pointer type. An integer value of 0 need not.

    That's actually what I thought, that the null pointer was reserved to the literal 0

    (And, for the record, I do appreciate and stand corrected by the observations on how the type of NULL as defined by #define NULL ((void*)0) is different from the literal 0. But, that said, I don't know why you'd ever do something like sizeof(NULL) instead of sizeof(void*) or, better yet,

    void* ptr;
    :
    :
    sizeof(ptr);



  • Types like size_t, ptrdiff_t, or intptr_t are built-in types—just not under those names. On some systems, size_t will be a typedef for unsigned short; on another, for unsigned long long; on yet a third, for a compiler extension __builtin_size_t. It doesn’t matter to the programmer, so long as he can refer to whichever type it “really” is by the standard alias size_t.


  • Considered Harmful

    So shouldn't the OP's comparison against NULL get cast to the 0xDEADBEEF pointer (or whatever) and still return false for an actual pointer to valid address 0?


  • Discourse touched me in a no-no place

    @__moz said:

    @Bulb said:

    No, you are still casting integer to pointer, so zero must yield null pointer in all cases. The only way to get pointer with zero value in envrionment with different representation of null pointers would have to be:

    intptr_t temp = 0;
    void *ptr = *(void **)&temp;
    If you use a C compiler (rather than a piece of code which merely bears some passing resemblance to one), that would just give you a null pointer.
    Only on implementations where the null pointer happens to be all bits zero. It's certainly not portable.


  • @witchdoctor said:

    The language is CUDA C. It's supposed to be a set of extensions to C for programming on the GPU.
    And I was returning 0 when I got a NULL pointer because it made sense for the following processing. The pointer was pointing to an array of integers.
    As for figuring it out: some very confused and increasingly desperate experimenting.

    I didn't know it breaks the C standard. I just thought it was a very stupid idea to have NULL pointers refer to actual data.

    Btw a nested WTF: This pointer is generated by the runtime when launching a GPU function. Unfortunately it is not passed as a parameter or anything so mundane, instead you have to declare it like so:

    extern __shared__ int memory[];
    
    __global__
    void function(int arg1, float* arg2)
    {
    // ...
    }
    

    And there can only be one extern shared array. If you need more than one you have to request more bytes of memory and subdivide it yourself.

    Weird. You sure you're not running into some sort of weird NVCC bug? (There's plenty of those, especially in some of the old CUDA releases).

    I tried to repro this using

    #include <cstdio>

    __device__ void f( int* p )
    {
        if( p == NULL ) printf( "p is NULL\n" );
        else printf( "p is not NULL\n" );
    }

    __global__ void test()
    {
        extern __shared__ int foo[];
        __shared__ int bar;

        printf( "foo = %p; bar = %p\n", foo, &bar );
        f( foo );
        f( &bar );
        f( 0 );
    }

    int main()
    {
        test<<<1,1,0>>>();
        cudaDeviceSynchronize();
        test<<<1,1,4>>>();
        cudaDeviceSynchronize();
        return 0;
    }

    int main()
    {
    test<<<1,1,0>>>();
    cudaDeviceSynchronize();
    test<<<1,1,4>>>();
    cudaDeviceSynchronize();
    return 0;
    }


    And get that foo is at 0x1000010, and bar at 0x1000000. (It also prints "not NULL", "not NULL", "NULL" for each kernel launch.) I compiled with -arch compute_20, and it's running on a GTX480.



  • @jcsalomon said:

    Types like size_t, ptrdiff_t, or intptr_t are built-in types—just not under those names.
     

    By "built-in" I meant the pedantic "a reserved keyword of the language specification that doesn't require a header file."  All those C99 types are library types as far as I'm concerned (and no I don't really want to get into the discussion about if the standard library should be considered part of the language or not).  While they are extremely useful, they always resolve to one of the primitive types.  Perhaps I should have used the word "primitive" instead of "built-in."

    I actually don't understand the typedefs to things like ptrdiff_t anyway, because they are still not portable; they are still going to be different sizes on different platforms.


  • ♿ (Parody)

    @too_many_usernames said:

    @jcsalomon said:
    Types like size_t, ptrdiff_t, or intptr_t are built-in types—just not under those names.

    By "built-in" I meant the pedantic "a reserved keyword of the language specification that doesn't require a header file."  All those C99 types are library types as far as I'm concerned (and no I don't really want to get into the discussion about if the standard library should be considered part of the language or not).  While they are extremely useful, they always resolve to one of the primitive types.  Perhaps I should have used the word "primitive" instead of "built-in."

    Yes. If anything requires an include directive to use, then I generally don't consider it to be built-in. They might be part of the standard, but they're not built-in to the compiler (not any that I've ever used, at least), which is a very different thing from the pure standard. Most compilers don't seem to fully or correctly implement any particular standard, anyway.

    @too_many_usernames said:

    I actually don't understand the typedefs to things like ptrdiff_t anyway, because they are still not portable; they are still going to be different sizes on different platforms.

    Uh, being different sizes on different platforms is the whole point of portability of source code. You can write code that will do the right thing on different platforms. Perhaps you're confusing source portability with something else.



  • @boomzilla said:

    Perhaps you're confusing source portability with something else.
     

    I suppose when I use the term "portability" I mean both source and data-structure portability.  I've run into interesting situations when switching code between 32- and 64-bit systems, and have had to learn a few ways of doing things to ensure how to deal with communication between the same application compiled both ways.  There's a surprising amount of I/O code you have to write (or include from a library) to ensure data files are portable.



  • @PJH said:

    @blakeyrat said:
    It's a typo, it's supposed to read:
    void ***ptr = ***(void **)&&&temp;&&&;

    In other news, thank God I don't use C anymore.

    Given that bollocks as an attempt at humour, I think the rest of us are thanking him.

    They can't all be winners.



  • Sometimes I feel like all you do is look for YouTube clips to link into random conversations...



  • @Sutherlands said:

    Sometimes I feel like all you do is look for youtube clips to link in to random conversations...

    That's not all I do.



  • @PJH said:

    @__moz said:


    @Bulb said:
    intptr_t temp = 0;
    void *ptr = *(void **)&temp;
    If you use a C compiler (rather than a piece of code which merely bears some passing resemblance to one), that would just give you a null pointer.
    Only on implementations where the null pointer happens to be all bits zero. It's certainly not portable.


    That's my point about using a C compiler. *&tmp is just 0, and the compiler is obliged to treat void *0 as a void pointer, however it is represented.

    If you don't have access to a C compiler, of course, you just have to make the best with whatever you do have.



  • @__moz said:

    That's my point about using a C compiler. *&tmp is just 0, and the compiler is obliged to treat void *0 as a void pointer, however it is represented.

    If you don't have access to a C compiler, of course, you just have to make the best with whatever you do have.
     

    Are you sure? 6.3.2.3 only speaks of constant expressions; I didn't find a rule that every integer expression evaluating to 0 must yield a null pointer when converted to void*.


  • Discourse touched me in a no-no place

    @too_many_usernames said:

    I've run into interesting situations when switching code between 32- and 64-bit systems, and have had to learn a few ways of doing things to ensure how to deal with communication between the same application compiled both ways.  There's a surprising amount of I/O code you have to write (or include from a library) to ensure data files are portable.
    That's what the int32_t (et alia) types are for.
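    For the on-disk side, fixing both the width and the byte order is what makes the file portable. A sketch with made-up helper names (put_u32le/get_u32le), rather than fwrite-ing a long whose size and endianness vary between builds:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Store a 32-bit value in a fixed little-endian byte order. */
    static void put_u32le(unsigned char out[4], uint32_t v)
    {
        out[0] = (unsigned char)( v        & 0xff);
        out[1] = (unsigned char)((v >> 8)  & 0xff);
        out[2] = (unsigned char)((v >> 16) & 0xff);
        out[3] = (unsigned char)((v >> 24) & 0xff);
    }

    /* Read it back; works identically on any host byte order. */
    static uint32_t get_u32le(const unsigned char in[4])
    {
        return (uint32_t)in[0]
             | (uint32_t)in[1] << 8
             | (uint32_t)in[2] << 16
             | (uint32_t)in[3] << 24;
    }

    int main(void)
    {
        unsigned char buf[4];
        put_u32le(buf, 0xDEADBEEFu);
        assert(get_u32le(buf) == 0xDEADBEEFu);
        assert(buf[0] == 0xEF);          /* little-endian on the wire */
        printf("round trip ok\n");
        return 0;
    }
    ```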


  • Discourse touched me in a no-no place

    @__moz said:

    That's my point about using a C compiler. *&tmp is just 0, and the compiler is obliged to treat void *0 as a void pointer, however it is represented.
    No it isn't. It's obliged to treat a literal 0 in a pointer context as 'null.' It is not required to treat a variable containing 0 in a pointer context as 'null.'



  • @__moz said:

    @PJH said:
    @__moz said:


    @Bulb said:
    intptr_t temp = 0;
    void *ptr = *(void **)&temp;
    If you use a C compiler (rather than a piece of code which merely bears some passing resemblance to one), that would just give you a null pointer.
    Only on implementations where the null pointer happens to be all bits zero. It's certainly not portable.


    That's my point about using a C compiler. *&tmp is just 0, and the compiler is obliged to treat void *0 as a void pointer, however it is represented.

    If you don't have access to a C compiler, of course, you just have to make the best with whatever you do have.

    1. There is a cast in the middle and it's important. Even if the specification said that any integer expression with value 0 must yield a null pointer (it does not), I've done the cast so that there is no integer expression. It takes the address of memory filled with 0s, interprets it as the address of a pointer, and reads that pointer. Which yields a pointer with numerical value 0, and that may not be the same as the null pointer.
    2. Since the specification does indeed only speak of a constant integer expression with value 0 meaning the null pointer, (void *)temp actually has to yield the same pointer with numeric value 0, which may not be the same as the null pointer (the conversion is "implementation defined" by the standard, with a note that it should be unsurprising to somebody who knows the addressing model, so usually it should do the same as the one with the double indirection).
    3. Actually I was doing it wrong, because according to strict aliasing rules the cast invokes Undefined Behaviour. The correct way to do it would be:
      char tmp[sizeof(void *)] = { 0 }; /* this syntax zeroes the whole array */
      void *ptr = *(void **)tmp; /* tmp decays to char * */
      
      because strict aliasing explicitly allows casting between char * and any other type of pointer.


  • @Bulb said:

    Actually I was doing it wrong, because according to strict aliasing rules the cast invokes Undefined Behaviour. The correct way to do it would be:



    char tmp[sizeof(void *)] = { 0 }; /* this syntax zeroes the whole array */
    void *ptr = *(void **)tmp; /* tmp decays to char * */

    because strict aliasing explicitly allows casting between char * and any other type of pointer.

    And, from this thread in comp.lang.c, this can even be done (in C99) without the temporary:

    #define ALL_ZEROS(T) ( ( (union {unsigned char uc[sizeof(T)]; T it;}){.uc={0}} ).it )
    #define ZERO_PTR ALL_ZEROS(void *)
    


  • @cvi said:

    @witchdoctor said:

    The language is CUDA C. It's supposed to be a set of extensions to C for programming on the GPU.
    And I was returning 0 when I got a NULL pointer because it made sense for the following processing. The pointer was pointing to an array of integers.
    As for figuring it out: some very confused and increasingly desperate experimenting.

    I didn't know it breaks the C standard. I just thought it was a very stupid idea to have NULL pointers refer to actual data.

    Btw a nested WTF: This pointer is generated by the runtime when launching a GPU function. Unfortunately it is not passed as a parameter or anything so mundane, instead you have to declare it like so:

    extern __shared__ int memory[];
    
    __global__
    void function(int arg1, float* arg2)
    {
    // ...
    }
    

    And there can only be one extern shared array. If you need more than one you have to request more bytes of memory and subdivide it yourself.

    Weird. You sure you're not running into some sort of weird NVCC bug? (There's plenty of those, especially in some of the old CUDA releases).

    I tried to repro this using

    snip...
    

    And get that foo is at 0x1000010, and bar at 0x1000000. (It also prints "not NULL", "not NULL", "NULL" for each kernel launch.) I compiled with -arch compute_20, and it's running on a GTX480.

    I built it with nvcc 4.0 and tested on a Quadro FX5800 (Compute Capability 1.3) and a GTX580 (built without setting -arch). The pointer was NULL on the GTX580 (with my code, not your test code; I'm not at work right now). But yes, an NVCC bug seems likely.

    Edit: The CUDA version is CUDA 4.0

