Archive a row before it ON DELETE CASCADEs?



  • Hello


    I have defined an ON DELETE CASCADE on a table; everything works fine. Is there a way I can "backup" that row before it's deleted? you see, the table contains important BLOB data that I'd like to keep for record's sake.


    So basically my question is this: Is there a mechanism via which
    you could archive a row before MySQL deletes it (due to ON DELETE
    CASCADE). For example, is there a way I could trigger an external
    script that saves the row information to the disk and then the ON
    DELETE CASCADE is executed?


    Excuse me if this doesnt sound like the right approach. I am just
    brainstorming here, any pointers for adopting a different strategy
    would be most appreciated.



  • If your using a recent release of MySQL 5.0, you can use triggers to accomplish the task.  (http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html)

    It will be something like the following

    Create Trigger ArchiveData
        Before Delete
        ON MyTable
        For Each Row
        Begin
        -- Archive column to another table here
        Insert into archive (Id, Blob) Values (DELETED.Id, DELETED.Blob);
       END

    The exact syntax might not be right, but that's should be close to what you need.  You might want to read up on the limitations as well, MySQL does have a tendency to support the syntax but fail in the implementation on certain storage engines (Check Constraints on MyISAM).

    If you're using earlier versions, you might not have too many options other than doing it in your DAL.
     



  • Cool thanks. I'll look more deeply into triggers. BTW I am using InnoDB, MyISAM doesn't even support referential integrity!


     



  • If you want to allow rows to be "deleted" but it is important that you maintain the old data for historical purposes, then you shouldn't physically delete the row from the table -- just mark the row as "deleted" or "inactive" or "removed" or something like that, using a Status column or a bit/boolean column to track this. 



  • @Jeff S said:

    If you want to allow rows to be "deleted" but it is important that you maintain the old data for historical purposes, then you shouldn't physically delete the row from the table -- just mark the row as "deleted" or "inactive" or "removed" or something like that, using a Status column or a bit/boolean column to track this. 

    I see what you mean. However the rows contain BLOB data and therefore can't just keep them "inactive".
     



  • @GizmoC said:

    @Jeff S said:

    If you want to allow rows to be "deleted" but it is important that you maintain the old data for historical purposes, then you shouldn't physically delete the row from the table -- just mark the row as "deleted" or "inactive" or "removed" or something like that, using a Status column or a bit/boolean column to track this. 

    I see what you mean. However the rows contain BLOB data and therefore can't just keep them "inactive".
     

    Huh? Either I'm totally missing your point, or I don't think you see what he means.  Jess's suggestion was adding a column named deleted or something. When you delete a row you don't actually delete the row. You use an update to set deleted=1.Then when selecting data you would always have deleted=0 in your where clause. However I think that gets to be a bit annoying. What I would personally do is create a view based off of that table based on the condition deleted=0 then only select your data from the view and not the table.



  • @CorporateFelon said:

    @GizmoC said:

    @Jeff S said:

    If you want to allow rows to be "deleted" but it is important that you maintain the old data for historical purposes, then you shouldn't physically delete the row from the table -- just mark the row as "deleted" or "inactive" or "removed" or something like that, using a Status column or a bit/boolean column to track this. 

    I see what you mean. However the rows contain BLOB data and therefore can't just keep them "inactive".
     

    Huh? Either I'm totally missing your point, or I don't think you see what he means.  Jess's suggestion was adding a column named deleted or something. When you delete a row you don't actually delete the row. You use an update to set deleted=1.Then when selecting data you would always have deleted=0 in your where clause. However I think that gets to be a bit annoying. What I would personally do is create a view based off of that table based on the condition deleted=0 then only select your data from the view and not the table.

    hmm.. I did understand what he meant. Correct me if I am wrong, but doesn't it make more sense to back-up the BLOB data rather than merely marking those rows as "deleted" ? Logically speaking, if I have unneeded BLOBs in my db, I am guessing it'll just make things slower?
     



  • @GizmoC said:

    hmm.. I did understand what he meant. Correct me if I am wrong, but doesn't it make more sense to back-up the BLOB data rather than merely marking those rows as "deleted" ? Logically speaking, if I have unneeded BLOBs in my db, I am guessing it'll just make things slower?

    Doubt it.

    A bigger DB guru than myself (easy to find, because I'm not much of a DBA :) ) might contradict that, but an unneeded blob would do nothing but take up disk space. It's unlikely that it would influence the speed of subsequent operations on that table that do not specifically involve that blob. It's not like a SELECT would scan through all records bit by bit. :)

    After that, you could always create a job that periodically selects the deleted blobs, saves them to an (external) disk, and then truly deletes the old stuff.


     



  • @dhromed said:

    Doubt it.

    A bigger DB guru than myself (easy to find, because I'm not much of a DBA :) ) might contradict that, but an unneeded blob would do nothing but take up disk space. It's unlikely that it would influence the speed of subsequent operations on that table that do not specifically involve that blob. It's not like a SELECT would scan through all records bit by bit. :)

    After that, you could always create a job that periodically selects the deleted blobs, saves them to an (external) disk, and then truly deletes the old stuff.

    The answer isn't quite as cut and dry as there are many factors that influence performance including database platform, schema design, etc.  Generally speaking any variable length record will be slower than a fixed width record because you can't just do a static offset.  Since BLOBs are variable length, then they fall into the slower category.  Some platforms can help minimize that by doing some tricks. Example, in MsSQL you can set the Image/Text file group differently from the regular data. Your BLOB columns are stored in a different file so queries that don't access the BLOB are faster - essentially tiering the table into fixed and variably width portions.

    Its been awhile since I've used MySQL, but I think BLOB support isn't quite up to snuff compared to other database platforms.  The last time I tried it, performance sucked horribly (about 2000 records - 1Gb database).  It's not like I did any comparison to other databases either and I'm not a big fan of BLOB columns anyways, so take that with a grain of salt.




  • It may be slower to query and manipulate blob compared to fixed length data stored in the row, but just to store them in your system as "archived" it is faster  because you are only storing and retrieving and dealing with pointers until when you actually need to retrieve the blob, which wouldn't be often if the data is "archived" of course. 


     


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.