My god, it's full of carts



  • Hi all, I just joined the forum to post about my personal WTF for today.

    I run a small webstore with a custom cart system that gets a few thousand orders per year. It has been working quite well for two years now, but I noticed that it was getting slower by the day. I had been putting it off, but the delay was becoming noticeable enough that I finally had to profile my code. I threw in some timing code and, after 15 minutes of searching for the bottleneck, narrowed it down to this SQL query:

    select id from cart where id = $session_id

    Every time an item is inserted into the shopping cart it goes into this table, and on every page the table has to be checked to know whether the cart is empty or not. With only a few thousand orders I should have been perfectly OK even if that table were never emptied. However, when I checked the table, I noticed I had TWO MILLION carts in it. Huh? After pondering it for a while I made the mental connection to another problem I'd had before. Because my "insert item to cart" link is mistakenly a normal link (GET) instead of a POST form, web spiders would follow that link and insert every item into a cart. Apparently 2 million item pages had been indexed so far, with the web spiders following the "add this item to cart" link for each item and creating a new cart every time. It would have resulted in just one cart with a lot of items, but since the spiders were ignoring cookies they got a new random session id every time. Oops.
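
    The fix, roughly, is to make the add-to-cart action a POST and have the handler refuse anything else. This is only a sketch with made-up file and field names (add_to_cart.php, item_id), not my real code:

    <?php
    // add_to_cart.php -- hypothetical handler; the file and field names are
    // made up for illustration. Refusing anything that is not a POST means a
    // spider following plain links can no longer create a cart.
    session_start();

    if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
        header('HTTP/1.1 405 Method Not Allowed');
        header('Allow: POST');
        exit;
    }

    $item_id = (int) $_POST['item_id'];
    // ... insert the (session id, item) row into the cart table as before ...
    ?>

    <!-- On the product page: a small form instead of a bare <a href> link,
         so merely indexing the page no longer "adds" the item. -->
    <form method="post" action="add_to_cart.php">
        <input type="hidden" name="item_id" value="123">
        <button type="submit">Add to cart</button>
    </form>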



  • Oops indeed. It's a subtle lesson, but once learned not easily forgotten. It happened to me too.

    My company (the one I worked at about 4 years ago) had a utility that would email reports out to various people at the click of a link. Every once in a while, the reports seemed to send themselves out prematurely. Later I found out why, just as you did. There were a number of things I had overlooked or not thought about: using POST instead of GET (i.e., a plain link), checking authentication, using robots.txt, etc. I had actually thought, "Well, if they get to this script, they've already logged in, so I don't have to check it here!"
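
    In hindsight the missing check was only a few lines. A rough sketch (whatever the tool was actually written in, the idea is the same; the script and session field names here are made up):

    <?php
    // send_report.php -- hypothetical name. The point is that the script doing
    // the work re-checks authentication itself rather than assuming anyone who
    // reached it must already be logged in.
    session_start();

    if (empty($_SESSION['user_id'])) {
        header('HTTP/1.1 403 Forbidden');
        exit('Not logged in.');
    }

    if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
        header('HTTP/1.1 405 Method Not Allowed');
        exit;   // a crawled link (GET) can no longer trigger the mailing
    }

    // ... build and email the report here ...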



  • And this is why you don't store temp data in a permanent database? How about putting stuff like that in a session (which expires after a while) and only writing it to your database when the actual order is made?
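
    Something like this, as a rough sketch (assuming PHP sessions and PDO; the table, column, and function names are made up):

    <?php
    // Keep the cart in the session (which expires on its own) and only touch
    // the database at checkout.
    session_start();

    // Adding an item just updates the session array -- no row is written yet.
    function add_to_cart(int $item_id, int $qty = 1): void {
        $_SESSION['cart'][$item_id] = ($_SESSION['cart'][$item_id] ?? 0) + $qty;
    }

    // "Is the cart empty?" no longer needs a query on every page view.
    function cart_is_empty(): bool {
        return empty($_SESSION['cart']);
    }

    // Only when the order is actually placed does anything hit the database.
    function place_order(PDO $db, int $customer_id): void {
        $db->beginTransaction();
        $db->prepare('INSERT INTO orders (customer_id) VALUES (?)')
           ->execute([$customer_id]);
        $order_id = $db->lastInsertId();
        $stmt = $db->prepare(
            'INSERT INTO order_items (order_id, item_id, qty) VALUES (?, ?, ?)');
        foreach ($_SESSION['cart'] as $item_id => $qty) {
            $stmt->execute([$order_id, $item_id, $qty]);
        }
        $db->commit();
        unset($_SESSION['cart']);
    }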



  • robots.txt anyone? Or do those spiders ignore it?



  • @ammoQ said:

    robots.txt anyone? Or do those spiders ignore it?


    You probably would want the spiders to index the product pages themselves.



  • @Pap said:

    @ammoQ said:
    robots.txt anyone? Or do those spiders ignore it?


    You probably would want the spiders to index the product pages themselves.

     

    robots.txt is not all-or-nothing. 
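
    For example, assuming the add-to-cart script lives at a URL like /add_to_cart.php (the store's real URL layout wasn't posted), a couple of lines keep well-behaved spiders out of the cart while the product pages stay indexable:

    # robots.txt -- hypothetical paths, for illustration only
    User-agent: *
    Disallow: /add_to_cart.php
    # Everything not listed above (the product pages) stays crawlable.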



  • @ammoQ said:

    @Pap said:

    @ammoQ said:
    robots.txt anyone? Or do those spiders ignore it?


    You probably would want the spiders to index the product pages themselves.

     

    robots.txt is not all-or-nothing. 

    Doesn't matter much though, as in this case the issue is with the design of the webapp, not with the spiders themselves (link preloading, e.g. by Google Accelerator, would have had the same result, and robots.txt would have been of no use).



  • @ammoQ said:

    robots.txt anyone? Or do those spiders ignore it?

    Spam-searching spiders do, which is why http://www.spamjunkyard.com/ can only cause pain for them >:D



  • I have to nominate this post for the best title in a discussion forum.

