PHP - random garbage collection



  • Found this one in an open-source e-commerce app. Can anyone elaborate on the virtues of random cleanup routines?

    if ($session->gc_probability > 0) {
      $randmax = getrandmax();
      $nrand = (int)(100 * my_rand() / $randmax);
      if ($nrand < $session->gc_probability) {
        $mod->gc($session->gc_maxlifetime);
      }
    }
    


  • There's a php.ini setting that controls the probability of the session GC routine firing on any given request. I suspect that whoever wrote this code wanted to reduce the probability, but assumed that many users would be running in environments (shared hosting with PHP in safe mode) where they wouldn't have the appropriate access to do it correctly.



  • Come to think of it, I'm curious what my_rand() is all about, there's probably some serious WTF-ery going on in there. I tend to get nervous when folks decide to "build a better rand()"...



  • @Frigax said:

    I'm curious what my_rand() is all about
     

    It's not terrible, but it's still pretty silly:

      function my_rand($min = null, $max = null) {
        static $seeded;
    
    if (!isset($seeded)) {
      mt_srand((double)microtime()*1000000);
      $seeded = true;
    }
    
    if (isset($min) &amp;&amp; isset($max)) {
      if ($min &gt;= $max) {
        return $min;
      } else {
        return mt_rand($min, $max);
      }
    } else {
      return mt_rand();
    }
    

    }



  • @sootzoo said:

    Found this one in an open-source e-commerce app. Can anyone elaborate on the virtues of random cleanup routines?

    Looks to me as if someone didn't grasp the concept of nondeterministic GC - and unfortunately didn't realize he had no clue either.




  •  @sootzoo said:

    @Frigax said:

    I'm curious what my_rand() is all about
     

    It's not terrible, but it's still pretty silly:

      function my_rand($min = null, $max = null) {
    static $seeded;

    if (!isset($seeded)) {
    mt_srand((double)microtime()*1000000);
    $seeded = true;
    }

    if (isset($min) && isset($max)) {
    if ($min >= $max) {
    return $min;
    } else {
    return mt_rand($min, $max);
    }
    } else {
    return mt_rand();
    }
    }

     

     

    mt_rand() hasn't been needed for a few releases now.

     

    TRWTF: I can't format one word in a code tag.



  • @PSWorx said:

    @sootzoo said:

    Found this one in an open-source e-commerce app. Can anyone elaborate on the virtues of random cleanup routines?

    Looks to me as if someone didn't grasp the concept of nondeterministic GC - and unfortunately didn't realize he had no clue either.

    This isn't GC in the sense of memory management.  This is PHP doing "garbage collection" on expired sessions.  For example, say all of your sessions are stored in a table.  You don't want to do "DELETE FROM sessions WHERE (expires < now())" on every single page-hit, so instad you only do it on every 100th or 1000th.  This isn't a WTF, although it could definitely be written better.



  • @PSWorx said:

    Looks to me as if someone didn't grasp the concept of nondeterministic GC - and unfortunately didn't realize he had no clue either.
     

    Or, sootzoo is primarily from the Java/.NET world and the term "garbage collection" is bit of semantic misdirection here.

    It's for session cleanup - fine. Are there not a bazillion better ways to do so than randomizing the whole routine?



  • @niteice said:

    TRWTF: I can't format one word in a code tag.

    <font face="Courier New"><font face="Courier New">word</font></font>



  • @sootzoo said:

    It's for session cleanup - fine. Are there not a bazillion better ways to do so than randomizing the whole routine?

    Why?  It works quite well as-is.  If you expired sessions to be cleared roughly every 1000 hits, just check if (rand(0,999) == 999) and you're good to go.  I suppose you could keep a counter to ensure sessions are cleared precisely every 1000 hits, but that seems kind of pointless.  What other method would you use?



  • @morbiuswilters said:

    @sootzoo said:

    It's for session cleanup - fine. Are there not a bazillion better ways to do so than randomizing the whole routine?

    Why?  It works quite well as-is.  If you expired sessions to be cleared roughly every 1000 hits, just check if (rand(0,999) == 999) and you're good to go.  I suppose you could keep a counter to ensure sessions are cleared precisely every 1000 hits, but that seems kind of pointless.  What other method would you use?

    I guess my first question would be "how expensive is it, time-wise to get a random number?"  If it takes significantly more resources to get a random number than to just increment a counter, then the RNG solution would be pretty pointless.


  • @TooWhiteAndNerdy said:

    @morbiuswilters said:

    @sootzoo said:

    It's for session cleanup - fine. Are there not a bazillion better ways to do so than randomizing the whole routine?

    Why?  It works quite well as-is.  If you expired sessions to be cleared roughly every 1000 hits, just check if (rand(0,999) == 999) and you're good to go.  I suppose you could keep a counter to ensure sessions are cleared precisely every 1000 hits, but that seems kind of pointless.  What other method would you use?

    I guess my first question would be "how expensive is it, time-wise to get a random number?"  If it takes significantly more resources to get a random number than to just increment a counter, then the RNG solution would be pretty pointless.

    It's not expensive at all.  And the counter would have to be stored in a database or the like, which would be slower. 



  • @morbiuswilters said:

    @sootzoo said:

    It's for session cleanup - fine. Are there not a bazillion better ways to do so than randomizing the whole routine?

    Why?  It works quite well as-is.  If you expired sessions to be cleared roughly every 1000 hits, just check if (rand(0,999) == 999) and you're good to go.  I suppose you could keep a counter to ensure sessions are cleared precisely every 1000 hits, but that seems kind of pointless.  What other method would you use?

    Clear one session every hit, if there is at least one session eligible for clearing. This is assuming that the per-session cost of clearing is significantly higher than the fixed cost of clearing.



  • @Carnildo said:

    Clear one session every hit, if there is at least one session eligible for clearing. This is assuming that the per-session cost of clearing is significantly higher than the fixed cost of clearing.

    Since this would be in a database you are essentially running the query "DELETE FROM sessions WHERE (expires < now())".  That clears all expired sessions.  Running that on every page load would add pointless load, so you might as well run it every 1k+ hits.  I would actually use something more like 10k, personally. 



  • @morbiuswilters said:

    Since this would be in a database you are essentially running the query "DELETE FROM sessions WHERE (expires < now())".  That clears all expired sessions.  Running that on every page load would add pointless load, so you might as well run it every 1k+ hits.  I would actually use something more like 10k, personally. 

     

    Wouldn't it make more sense to tie it to time instead of page hits?  Most sites don't have incredibly consistent traffic, but if the rule says "Run the thingy to clear expired sessions if the last time we ran it was over 1 minute ago", then your traffic could triple and this wouldn't have to run any more often.  It also solves the problem of not expiring sessions when you're supposed to in the event of a really slow day with not much traffic.

    Either way you have to check something on each hit - either the number of hits since last cleanup, or the time of last cleanup, and I think the latter makes more sense, but that's just my opinion.  I would never try to hand-code this stuff anyway, that's what state servers/services/providers are for.



  • @Aaron said:

    Wouldn't it make more sense to tie it to time instead of page hits?  Most sites don't have incredibly consistent traffic, but if the rule says "Run the thingy to clear expired sessions if the last time we ran it was over 1 minute ago", then your traffic could triple and this wouldn't have to run any more often.  It also solves the problem of not expiring sessions when you're supposed to in the event of a really slow day with not much traffic.

    Either way you have to check something on each hit - either the number of hits since last cleanup, or the time of last cleanup, and I think the latter makes more sense, but that's just my opinion.  I would never try to hand-code this stuff anyway, that's what state servers/services/providers are for.

    This doesn't handle expiring sessions, just purging the data from already expired sessions.  By doing it randomly, you cut down the amount of queries to the DB and don't have to store anything.  Not the time of the last purge nor the number of hits since last purge.  This is the same method I use to do session cleanup and it works quite well.  Normally you wouldn't hard-code something like this because PHP handles it for you, but I've done it because it gives me more control over the scalability of the app since I can cache sessions in memcached and keep the authoritative data on several different databases to spread out load.  Presumably whoever wrote this sample of code was doing the same thing.  Definitely not a WTF.


Log in to reply