Printf bashing (C)



  • Here's a weird problem.

    At run time I'm going to have a printf format string and a linked list containing the arguments for that string (thus I have no idea ahead of time how many arguments there will be and since the linked list just happens to contain a union, I don't even know what the arguments will be).  So how do I run a printf on that?

    The first idea was I can cheat by passing a struct to printf like so:

     

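    struct printf_args { ...lots of memory... };
    ...
    struct printf_args args;
    char * p = (char *)&args;   /* char *, because arithmetic on void * is not standard C */
    memcpy( p, &arg0, sizeof(arg0) );
    p += sizeof(arg0);
    memcpy( p, &arg1, sizeof(arg1) );
    p += sizeof(arg1);
    ...
    printf( format, args );     /* relies on the struct being passed like separate varargs; not portable */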

    The problem is what happens when some joker decides to give me a printf string with 4K of data? In order to use the struct trick to make C push random data into printf I need to define that struct ahead of time and I have to allocate the extra memory each time I need this function. And what happens if I don't have a large enough struct type defined?

    My next thought was to break up the format string into individual %foo tokens and build the string from a number of 1-argument printfs that can be hard-coded. But eventually someone's going to try to drop in a string like printf("%*.*f %n %*d",3,2,12.34,&x,20-x) just to see if he can break it.

    I don't suppose someone can think of a portable/elegant solution? (aside from the obvious "Use Perl/Python stupid!")



  • @omega0 said:

    Here's a weird problem.

    At run time I'm going to have a printf format string and a linked list containing the arguments for that string (thus I have no idea ahead of time how many arguments there will be and since the linked list just happens to contain a union, I don't even know what the arguments will be).  So how do I run a printf on that?

    I don't suppose someone can think of a portable/elegant solution? (aside from the obvious "Use Perl/Python stupid!")

    You don't want to use printf. Really. Use vprintf (vfprintf, vsprintf) instead.
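
    For reference, a minimal sketch of how the v* family is normally used: a variadic wrapper obtains a va_list with va_start and forwards it. The wrapper name format_line is hypothetical. Note that this only forwards arguments from an ordinary variadic call; standard C provides no portable way to build a va_list from a linked list at run time.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: formats the caller's variadic arguments into buf.
   Returns what vsnprintf returns: the length the full output needs,
   or a negative value on error. */
int format_line(char *buf, size_t size, const char *format, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && format != NULL);
    va_start(ap, format);                    /* va_list comes from va_start */
    n = vsnprintf(buf, size, format, ap);    /* the v* variant takes the va_list */
    va_end(ap);
    return n;
}
```

    A call such as format_line(buf, sizeof buf, "%s has %d args", "printf", 2) would leave "printf has 2 args" in buf.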
     



  • ...

    ...

    ...

    My head-bashy "how can this work?" problem turned into a head-bashy "why couldn't I have just turned the page?" problem.  Stupid standard library solving all the problems ahead of time.

    Many thanks.
     



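  • Hold on, I doubt it's actually legal to use vprintf to do what you suggest: a va_list is meant to be obtained through va_start inside a variadic function, not assembled by hand from a linked list.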

    Why do you need to use printf instead of making your own function to print things?



  • @Random832 said:

    Why do you need to use printf instead of making your own function to print things?

    I have no control over the input.  My function must handle a pointer to a printf format string and a pointer to the head of a linked list which contains the arguments.



  • @omega0 said:

    I have no control over the input.  My function must handle a pointer to a printf format string and a pointer to the head of a linked list which contains the arguments.


    I don't think there is any safe way to do that.  How can you know ahead of time that there are enough entries in the linked list to satisfy printf's requirements?  I'm not a C programmer but this does not sound like a reasonable design requirement.  I'm not sure if the standard C library can do this without you having to re-implement printf.

    What you really would like would be a version of printf that lets you register a callback function that returns the value for index n, but I don't think that's part of the standard library.

    Aren't there a whole bunch of better strings-for-c libraries out there?



  • *nods* printf is inherently unsafe, but at the moment the users of this routine should be smart enough to correctly count arguments in a printf string (my last words...).  Plus the program is small and does nothing critical enough that a segfault would be unrecoverable.


    However I think it's going to be a moot point:

           The   ANSI  C  standard  specifies that implementations must support at
           least formatted output of up to 509 characters.

    I have a very ordinary test case that uses a 10,000 character printf string.



  • I don't mean to be an idiot here, but I am very glad about the progress that is happening with gcj.  It lets you compile Java into shared object libraries, so it would let you use safe Java string manipulation functions to do this kind of thing, and then link it in to the rest of your app.  C is a painful and unsafe language for doing what you're trying to do.



  • You're probably right; if this project were more serious I should probably be farming out the string processing to some well-tested and safeguarded routines.  At the moment, though, I just care whether it will run under various environments and whether good input gives good results.

    The non-serious aspect is very obvious when you see how it's supposed to work: the data being manipulated is in a linked list (to emulate arrays, since there will be a lot of splicing going on but not much random access) whose nodes contain a union so that it can handle multiple data types (and one of those data types is a pointer to a linked list node, so that you can have arrays of arrays).  Functions get passed a pointer to an array of pointers to these linked lists (and have to figure out for themselves whether there are the right number of parameters and whether the parameters are the right types).  And finally, the instructions this is supposed to operate from are not interpreted, but are converted to C code, compiled, and then linked with object files of the utility functions.
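
    To make the layout described above concrete, here is a rough sketch of such a node. Every name in it (value_type, node, list_length, splice_after) is made up for illustration, and the explicit type tag is an assumption: the description only says that functions have to work out the types for themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tag so that callers can check what a node holds. */
enum value_type { VAL_INT, VAL_DOUBLE, VAL_STRING, VAL_LIST };

struct node {
    enum value_type type;
    union {
        long i;
        double d;
        const char *s;
        struct node *list;       /* head of a nested list ("arrays of arrays") */
    } v;
    struct node *next;
};

/* Number of elements in a list; a nested list counts as one element. */
size_t list_length(const struct node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Splice the whole list starting at first in after pos, without copying. */
void splice_after(struct node *pos, struct node *first)
{
    struct node *last = first;

    assert(pos != NULL);
    if (first == NULL)
        return;
    while (last->next != NULL)   /* find the tail of the inserted run */
        last = last->next;
    last->next = pos->next;
    pos->next = first;
}
```

    Splicing only relinks pointers, which fits the "lots of splicing, not much random access" trade-off; indexing into the list still means walking it.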

    In other words, I have a huge mess that should have been written in Perl if only the person who wrote the core code had been willing.



  • Yeah, it really sounds like the wrong start to a project.  C can be fine for manipulating strings, but the standard lib is NOT fine for manipulating strings, and even with the right library, you are still better off using a safe language, whether it's Java or Perl or whatever.



  • @omega0 said:

    I have no control over the input.  My function must handle a pointer to a printf format string and a pointer to the head of a linked list which contains the arguments.

    Where are these requirements coming from? And parse the format string yourself. It's really not as hard as it sounds (especially if you're willing to skimp on details like padding, farm out the actual formatting to e.g. snprintf calls, etc.).
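
    A rough sketch of that approach, under stated assumptions: the node type arg_node and the function list_snprintf are hypothetical, only int, double and string arguments are handled, and %n, %p, length modifiers and negative * values are simply rejected. Each conversion is copied into a small spec buffer (with * expanded from the list) and handed to snprintf with exactly one argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument node: a tagged union in a singly linked list. */
enum arg_type { ARG_INT, ARG_DOUBLE, ARG_STRING };
struct arg_node {
    enum arg_type type;
    union { int i; double d; const char *s; } v;
    struct arg_node *next;
};

/* Take the next argument if it has the wanted type, else return NULL. */
const struct arg_node *take(const struct arg_node **args, enum arg_type want)
{
    const struct arg_node *a = *args;
    if (a == NULL || a->type != want)
        return NULL;
    *args = a->next;
    return a;
}

/* Returns the number of characters written, or -1 on an unsupported format,
   a missing or mistyped argument, or a buffer that is too small. */
int list_snprintf(char *buf, size_t size, const char *fmt,
                  const struct arg_node *args)
{
    size_t used = 0;

    assert(buf != NULL && fmt != NULL);
    if (size == 0) return -1;
    buf[0] = '\0';
    while (*fmt != '\0') {
        char spec[64];
        size_t n = 0;
        const struct arg_node *a;
        int len;

        if (*fmt != '%') {                       /* literal character */
            if (used + 1 >= size) return -1;
            buf[used++] = *fmt++;
            buf[used] = '\0';
            continue;
        }
        spec[n++] = *fmt++;                      /* the '%' itself */
        /* Copy flags, width and precision; expand '*' from the list. */
        while (*fmt != '\0' && strchr("-+ #0123456789.*", *fmt) != NULL) {
            if (n + 16 >= sizeof spec) return -1;
            if (*fmt == '*') {
                a = take(&args, ARG_INT);
                if (a == NULL || a->v.i < 0) return -1;
                n += (size_t)sprintf(spec + n, "%d", a->v.i);
            } else {
                spec[n++] = *fmt;
            }
            fmt++;
        }
        spec[n++] = *fmt;                        /* conversion character */
        spec[n] = '\0';
        switch (*fmt) {
        case '%':
            if (n != 2) return -1;               /* only a plain "%%" */
            len = snprintf(buf + used, size - used, "%%");
            break;
        case 'd': case 'i': case 'c':
            if ((a = take(&args, ARG_INT)) == NULL) return -1;
            len = snprintf(buf + used, size - used, spec, a->v.i);
            break;
        case 'u': case 'o': case 'x': case 'X':
            if ((a = take(&args, ARG_INT)) == NULL) return -1;
            len = snprintf(buf + used, size - used, spec, (unsigned)a->v.i);
            break;
        case 'e': case 'E': case 'f': case 'g': case 'G':
            if ((a = take(&args, ARG_DOUBLE)) == NULL) return -1;
            len = snprintf(buf + used, size - used, spec, a->v.d);
            break;
        case 's':
            if ((a = take(&args, ARG_STRING)) == NULL) return -1;
            len = snprintf(buf + used, size - used, spec, a->v.s);
            break;
        default:             /* %n, %p, length modifiers, trailing '%' */
            return -1;
        }
        if (len < 0 || (size_t)len >= size - used) return -1;
        used += (size_t)len;
        fmt++;
    }
    return (int)used;
}
```

    With a list holding 6, 2, 12.34, 4, 7, the format "%*.*f|%*d" comes out as " 12.34|   7", and a %n anywhere in the format makes the call return -1 instead of writing through a pointer. Running out of list entries is caught the same way, which covers the "not enough arguments" worry raised earlier in the thread.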
     



  • @omega0 said:

    users of this routine should be smart enough to correctly count arguments in a printf string (my last words...)

     Fun reoccurring situation at work:

    char msg[BUF_SIZE] = "Session ID is %s";
    GetSessionId(msg);

    ...... 

    void GetSessionId(char* foo){
            LogAPI("IN: GetSessionId.  params: [%s]", foo);
            // Put the session ID in formatted string foo.
            sprintf(foo, m_sSessionID);
    }

    ......

    void LogAPI(char* logdata){
             if(LogServerAlive()){
                     // Send logging message to log server
                     SendLog(logdata);
             }
             else{
                     // Uh oh, log server is down.  Print the log to stdout instead
                     printf(logdata);
             }
    }
     

    So we're intentionally trying to log something with a "%s" in it, but can't be entirely certain how many times it'll be formatted.  Yay!

    (and yes, the correct answer is that GetSessionId is poorly defined, and should instead return a char*) 
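
    A minimal sketch of the usual way out, assuming nothing about the real LogAPI or SendLog beyond what is shown above: treat the text as data and never as the format argument, so any % in it is printed literally. The helper names log_text and copy_session_id are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes text verbatim followed by a newline.
   A '%' inside text is never interpreted as a conversion. */
int log_text(FILE *out, const char *text)
{
    assert(out != NULL && text != NULL);
    return fprintf(out, "%s\n", text);
}

/* Hypothetical replacement for sprintf(foo, m_sSessionID): plain copy with
   a size check. Returns 0 on success, -1 if dst is too small. */
int copy_session_id(char *dst, size_t size, const char *session_id)
{
    size_t len;

    assert(dst != NULL && session_id != NULL);
    len = strlen(session_id);
    if (len >= size)
        return -1;
    memcpy(dst, session_id, len + 1);    /* includes the terminating NUL */
    return 0;
}
```

    With these, a session ID or log line that happens to contain "%s" or "%n" is copied and printed as-is, however many layers it passes through.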

