Checking for the existence of... something.



  • I have a test install of the LiveJournal code, ported to run on Apache 2/mod_perl 2 (it seemed like a good idea at the time). A recent change broke it so that all .bml files came up as 404 with a message in the log like "[Sun Aug 26 17:05:11 2007] [error] File does not exist: /home/lj/htdocs/index.bml" (but only after the server had been up for a minute or two). Since /home/lj/htdocs/index.bml did in fact exist, I decided to take a look at the code causing the error:

    sub handler
    {
    my $r = shift;
    my $file;

    $Apache::BML::r = $r;
    
    # determine what file we're supposed to work with:
    if (ref $r eq "Apache::FakeRequest") {
        # for testing.  FakeRequest's 'notes' method is busted, always returning
        # true.
        $file = $r->filename;
        stat($file);
    } elsif ($file = $r->notes("bml_filename")) {
        # when another handler needs to invoke BML directly
        stat($file);
    } else {
        # normal case
        $file = $r->filename;
        $r->finfo;
    }
    
    unless (-e _) {
        $r->log_error("File does not exist: $file");
        return NOT_FOUND;
    }
    
    unless (-r _) {
        $r->log_error("File permissions deny access: $file");
        return FORBIDDEN;
    }
    
    # boring stuff...
    



    Okay, so it's checking for the existence of something - but what? Some quick Googling turns up this - so in the first two cases it checks that $file exists and is readable. However, in the "normal case", it's testing for the existence of whatever file stat() or a file test operator was last used on before it was called. Since the "normal case" code path is for when the code is being called by mod_perl directly as a PerlHandler, alarm bells are now ringing. Presumably, the author noticed that mod_perl was calling stat() or similar on the file before it called the handler, and decided to save a call by reusing the result.

    The reason it "worked" before and suddenly broke? Changes to the config file handling meant that the code to check if the Livejournal config file has been modified (which runs as part of the request cycle) stat()ed a non-existent file. This code was somewhere else (not just a different file, but a different SVN repo), completely unrelated, and called (indirectly) from a different mod_perl handler. (Actually, if I'm reading the code in question correctly, it should always have stat()ed an existing file as the last call to stat() before it returned, but I've done some testing and it does seem to be the culprit, so...)



  • @makomk said:

            $r->finfo;

    I suspect the author expected this to set the "last stat buffer" used by the -e _ and -r _ expressions, but from the documentation (I didn't bother testing it, so I could be wrong) it looks as though the finfo method uses some special Apache thingy, not Perl's normal stat builtin, so it wouldn't work.



  • @iwpg said:

    @makomk said:

            $r->finfo;

    I suspect the author expected this to set the "last stat buffer" used by the -e _ and -r _ expressions, but from the documentation (I didn't bother testing it, so I could be wrong) it looks as though the finfo method uses some special Apache thingy, not Perl's normal stat builtin, so it wouldn't work.

    Ah. That is evil - and as far as I can tell, not documented anywhere. It is documented that finfo can be used by file tests in mod_perl 1.0 and not in mod_perl 2.0, but I wouldn't expect just getting its value to update the last stat buffer. (Looks like another thing I need to port, too.)



  • @makomk said:


     

        } else {
    # normal case
    $file = $r->filename;
    $r->finfo;
    }



    Almost right...  $r->finfo() returns the finfo object (details about the file apache is thinking about serving up at this stage in the request).  I think what the author intended to do was:

      $r->finfo(APR::Finfo::stat($filename, APR::Const::FINFO_NORM, $r->pool));

    You normally do this if you've changed the file apache was going to serve and your code is after the MapToStorage phase of the request.  I use it on one system because I don't let apache do the MapToStorage, because my URLs rarely match real files - it saves a whole bunch of stat()s because normally Apache would look for all possible .htaccess files etc.



  • I'm not sure if I'm misunderstanding this, or something, but doesn't $file always get set to the filename?

    As such, instead of unless (-e _) couldn't the author have used unless (-e $file)?

    Which is better for a whole number of reasons, especially since you wouldn't have to track down subtle bugs like this.



  • They're using the last stat buffer here because they're quite concerned about performance.  Calling (-e $filename) stats the file again (using stat(2) internally), while (-e _) re-uses the results of the last stat.

    On my machine, (-e $file) takes 743ns to complete, while (-e _) takes about 237ns, so it's about 3 times faster (that is, (-e $file) takes three times as long as (-e _)).

    Of course we're talking about a half a microsecond saved here, but I guess every little bit counts.
     



  • @Stephen Deken said:

    They're using the last stat buffer here because they're quite concerned about performance.  Calling (-e $filename) stats the file again (using stat(2) internally), while (-e _) re-uses the results of the last stat.

    On my machine, (-e $file) takes 743ns to complete, while (-e _) takes about 237ns, so it's about 3 times faster (that is, (-e $file) takes three times as long as (-e _)).

    Of course we're talking about a half a microsecond saved here, but I guess every little bit counts.
     

    But, they stat the file just to put it into the stat buffer - they don't do anything with the result of the stat, so what do they gain?

     is "stat $file; (-e _)" faster then "(-e $file)"?
     



  • No, those two are equivalent.  (Actually, "stat $file; (-e _)" is slightly slower than a plain "(-e $file)" if all you're doing is one test, because Perl doesn't optimize it together.)

    In this case, though, they're running two tests on the same buffer -- they do a (-e _) followed by a (-r _).  That's where you get the savings, because you don't have to do the stat again.  Again, the savings are pretty negligible, since each stat call only takes about 500ns.

     


Log in to reply