DB question (spec on DELETE)



  • So, i've just been put in charge of writing an online app in php/mySQL. But i've never touched SQL before in my life(nor any other DB language).

    The app is just an events manager.

    One key word i'd like to avoid is the infamous 'DELETE'. So I've decided to try and completely avoid it whatsoever. Instead of ever deleting a rocord i'll just set a deleted flag to true. Then at the end of every month the app is shut down for maintenance and the entire database is copied to a new database but ignoring any records that have been marked deleted. thus the records are effectively deleted and i still have a backup incase something went terribly wrong.

    Is this a good idea or a bad idea?



  • I'm not a DB expert, but wouldn't this mean that you would have to filter every query you write to make sure you don't accidently find deleted data.  This would probably negatively affect performance.  Moving deleted rows into a special "deleted" table might be a better idea.  Why are you trying to avoid DELETE anyway?



  • @burnmp3s said:

    I'm not a DB expert, but wouldn't this mean that you would have to filter every query you write to make sure you don't accidently find deleted data.  This would probably negatively affect performance.  Moving deleted rows into a special "deleted" table might be a better idea.  Why are you trying to avoid DELETE anyway?

    I don't mind the minor performance loss(from what i gather this is just a minor loss, however I could be wrong). And as I said I've never used SQL before so I don't want anything like this to happen:

    @Alex Papadimoulis said:

    Mine was a missing WHERE clause in a certain DELETE query that resulted in the deletion of all 6,000+ rows instead of just one. Whoops. Of course, my blunder only resulted in several hours of lost work, a painful data restore, and one really embarrassed junior developer (me).



  • You're not really winning anything by doing it that way and you're making your application much more complex.  Situations with "forgotten" WHERE clauses come up when entering ad-hoc queries, not in (most) applications' queries. Furthermore, forgetting the WHERE clause on "UPDATE foo SET is_deleted = 1" isn't much less destructive than forgetting it on "DELETE FROM foo".  Yes, you still have the rows, but you can't tell which ones should've been deleted and which ones shouldn't.

    You'll be better served by a reliable backup system than by avoiding a useful tool.
     



  • @Hitsuji said:

    I don't mind the minor performance loss(from what i gather this is just a minor loss, however I could be wrong). And as I said I've never used SQL before so I don't want anything like this to happen:

    @Alex Papadimoulis said:

    Mine was a missing WHERE clause in a certain DELETE query that resulted in the deletion of all 6,000+ rows instead of just one. Whoops. Of course, my blunder only resulted in several hours of lost work, a painful data restore, and one really embarrassed junior developer (me).

    Sure if that happened you'd not delete all of your data, but the result would still be pretty much as bad - everything would be flagged as deleted, so you can't tell valid from invalid data.

    Copying and recreating the database every month is just  a bodge job that won't even protect you from what you want it to. You would be far better off ensuring a solid design from the start. A few ideas to think about:

    Ban direct queries to the live database - every action goes through stored procedures, anddon't develop on the live server - do that on a test db so you can make your mistakes in a safe sandbox.
    When deleting (using a SP/triggers), move the data to a tblName_deleted table with an added auto-populated deletion date column - you're then covered in the event of the above mentioned scenario, since you can tell what you just deleted (deleted_time = now-5) from genuine deleted data.
    Look into the data integrity features either built into your DB, or that you can build into it - I don't know where mySQL is nowadays with this stuff, but stop using it if you have to, there are lots of other choices, free or otherwise.



  • DELETE isn't something to be scared of, just be carefull when testing, i personally always first make prints/var_dumps of my sql statements before i actually use them.

    also, just backup your sql database daily. preferably as insert statements, since csv can be a pain in the ass.

    And if your afraid of human error while maintaining, and figuring you will be using phpmyadmin. Just rip out the empty and drop table buttons.

    And lasty, don't worry. you will fuck up. everyone does. And buy yourself a good book on SQL. if only to learn what a cross join is and why you shouldn't be using it.
     



  • What you are proposing (formally called logical deletes) have their place. Typically they are used when the system needs very high levels of auditing and traceability like financials.  In some respects it also similar to a obsolete flag, ie you can't delete a part number since it is tied to existing orders/invoices but you don't actively sell it.  You also find them in data warehousing applications in some areas when you deal with "corrections" such as general ledger reversals, invoicing credits, etc.  If you really don't need that level of auditing/traceabiltiy, I'm not sure what advantage logical deletes offer.  Just marking them deleted really doesn't help much unless you also track the user who deleted it, the time, reason, etc. 

    Recreating the database every month doesn't sound like a good idea to me either.  I take the approach I learned from managing servers, I always look for ways to keep the system online.  I will exhaust all options before I consider shutting a system down unless there is a immediate danger to the integrity of the system. 

    A good backup strategy, thorough testing, and good deal of learning/research will address most of your concerns.

     



  • A trick i have used before is to copy the row from the table to be deleted to a table called <tablename>_deleteddata or something like that and then delete the row from the original table.  I have used this where users could delete the rows but a record needed to be kept of a previous value for reporting etc.  You can then write an interface for users to restore old data if need be.  It's not the best solution for every situation but for some projects i have found it very useful.  It's easy to set up regular archiving of the _deletedata tables periodically if space is an issue.  The main advantage of this technique over the one described above is that there is not the added overhead of of the WHERE deleted = 0 in every query.  Although i have used the technique described above in situations where users regularly need to "undelete" data because it is easier and less invasive than doing a full restore from backup.  Although in 97% of cases it's probably advisable to just use delete statements, you really need to assess the project and see whether the ability to undelete data is necessary in this situation and worth the tradeoff in efficiency.



  • sorry RayS i just read your post and realised you suggested the same thing first ;-)



  • > mysql --i-am-a-dummy

    It prevents you from running an update or delete without specifying the key. My wife got me into, basically, the same habit on every other database platform... every update and every delete starts it's life as a select statement to make sure I have the records I want to mangle.

    If you're really nervous, just run mysqldump every time before you start testing. Heck, check each mysqldump into version control so you can go back to any revision you want.

    Once you've tested your statements, throw them in your code and stop worrying about it.



  • Hitsuji, this might sound a bit harsh but:

    If you are not competent enough to use the DELETE statement, you should not use UPDATE either. Better stay away from the database until you know how to use it.

    Just look how many WTFs on the front page are created by people who didn't know their stuff. You don't want to be one of them, would you? 



  • @ammoQ said:

    Hitsuji, this might sound a bit harsh but:

    If you are not competent enough to use the DELETE statement, you should not use UPDATE eigther. Better stay away from the database until you know how to use it.

    Just look how many WTFs on the front page are created by people who didn't know their stuff. You don't want to be one of them, would you? 

    And SELECT and INSERT  as well. There is no better way to learn about databases than to not interact with them at all, whatsoever. Read books. Books that are not about SQL. Read cook books.



  • Another thing that really helps it turn off auto commits.  This really helps in testing as the changes aren't complete until you specifically submit a commit statement.  It been a while since I've used MySQL, but I believe commits are supported in InnoDB.

     



  • @dhromed said:

    And SELECT and INSERT  as well. There is no better way to learn about databases than to not interact with them at all, whatsoever. Read books. Books that are not about SQL. Read cook books.

    Good point. I should have written "stay away from the production database".

    Maybe Hitsuji should start learning on a scratch database, where no real damage could be made?



  • @ammoQ said:

    Maybe Hitsuji should start learning on a scratch database, where no real damage could be made?

    I only wish that were possible. Thanx for the info everyone, guess I almost made a WTF of my own there. My original plan was to start out the way I described and then add in the DELETE statements when I got comfortable with SQL, but from the looks of it, it's not that hard for the basics.Guess it's back to the books then. At least it's not a critical system that i'm working with.


Log in to reply