Put it in a branch



  • Our environment consists of two development databases, svn and assorted code branches.

    One of our db tables (that contains an xml-filled clob) had grown unmanageably large (>3B rows), so the powers-that-be are finally acquiescing and letting me partition the table (mostly for aging and offlining of data).

    Since everyone else is working in database A, I do it in database B (which, for some reason nobody likes to use). The change involves renaming table: x =-> x_clob, creating a new table: x_part, and creating a view to mask the change: x. This way, the queries used by other teams don't need to change up front.

    I tell my boss I'm done and he decides it can't also be done in db-A at this time, so I should instead put the code into a branch.

    Erm....

    1. If you want it in a branch, it's a whole lot easier if that's known up front, rather than after the fact (by default, we're supposed to do general development in trunk, which gets snapshotted and promoted for release)

    2.  I can put packages, procedures and functions in the database with a suffix, and configure our application to use a different suffix to get these changes, but the renamed table is too large to have copies; it's a singleton global resource. Regardless of where my code is located, if I make the change to the table in the db, everyone has to grab my code (the view is read only, so all inserts would fail without the new procs).

    Hmmm, we'll need to think about that...

    And so my code sits...

     



  • @snoofle said:

    1. If you want it in a branch, it's a whole lot easier if that's known up front, rather than after the fact

    Step 1: Check out the current trunk into a new directory.

    Step 2: Tag with a branch tag.

    Step 3: Pull the branched code.

    Step 4: Copy files that you haven't checked in to new directory.

    Step 5: Check in on branch.

    Step 6: Stop whining.

     

    Seriously, if this is too difficult for you, you need to go ahead and use the schema everyone else is using.



  • @Sock Puppet 5 said:

    Seriously, if this is too difficult for you
    I know how to move the code into a branch. That wasn't the point. Putting it in a banch doesn't solve the problem of a monstrously large table as a singleton resource.



  • @snoofle said:

    Our environment consists of two development databases, svn and assorted code branches.

    One of our db tables (that contains an xml-filled clob) had grown unmanageably large (>3B rows), so the powers-that-be are finally acquiescing and letting me partition the table (mostly for aging and offlining of data).

    Since everyone else is working in database A, I do it in database B (which, for some reason nobody likes to use). The change involves renaming table: x =-> x_clob, creating a new table: x_part, and creating a view to mask the change: x. This way, the queries used by other teams don't need to change up front.

    I tell my boss I'm done and he decides it can't also be done in db-A at this time, so I should instead put the code into a branch.

    Erm....

    1. If you want it in a branch, it's a whole lot easier if that's known up front, rather than after the fact (by default, we're supposed to do general development in trunk, which gets snapshotted and promoted for release)

    2.  I can put packages, procedures and functions in the database with a suffix, and configure our application to use a different suffix to get these changes, but the renamed table is too large to have copies; it's a singleton global resource. Regardless of where my code is located, if I make the change to the table in the db, everyone has to grab my code (the view is read only, so all inserts would fail without the new procs).

    Hmmm, we'll need to think about that...

    And so my code sits...

     

    TRWTF is development that is not happening on a branch.



    EDIT: Actually TRWTF is unbearably huge development databases, but the first one sounds more pithy.



  • When I read your stories, I feel really bad for you. Then I remember where I work, and I consider you lucky.


    Lets put it this way: if I asked any of the developers I work with what a branch is, the only definition you would hear is one involving the words "Maple" or "Redwood".



  • @pkmnfrk said:

    TRWTF is development that is not happening on a branch

    Agreed, but it's the boss' policy: everything done in trunk unless he expressly says otherwise. Unfortunately, he sometimes makes that decision after the work is done. I'm just a lowly consultant; when in Rome...


    @pkmnfrk said:

    Actually TRWTF is unbearably huge development databases
    Normally, our dev db is usually pretty modest (only a tiny subset of prod data), but in this case, there was no way to verify this change scaled without the whole table. The DBAs had to shuffle around some resources to make it happen, but I actually have a table of 3.5 billion rows, each mostly consisting of a 2K clob filled with XML.

    Personally, I am floored by the fact that they knew it was going to get this big - up front - and didn't design in anything to segment the data.

     



  • @pkmnfrk said:

    TRWTF is development that is not happening on a branch.
    What do you call the main 'core' that everyone merges their changes on branches into?



  • @C-Octothorpe said:

    "Maple" or "Redwood"
    Ouch!



  • @PJH said:

    What do you call the main 'core'
    The way it generally works is:

    1.  All work is done on trunk.

    2. When it's cutoff time, the old preproduction branch is dropped (physically deleted from svn) and a copy of trunk is made and called preproduction. This is what gets qa'd, and bug fixes go here, and ultimately get merged back to trunk.

    3. As part of deployment, the old production branch is physically dropped from svn, and a copy of preprod is taken and called production

    Why physically dropping branches? Space limits.

    I've had numerous discussions with the boss about this, but to no avail.

     

     



  • @snoofle said:

    @PJH said:

    What do you call the main 'core'
    The way it generally works is:

    1.  All work is done on trunk.

    2. When it's cutoff time, the old preproduction branch is dropped (physically deleted from svn) and a copy of trunk is made and called preproduction. This is what gets qa'd, and bug fixes go here, and ultimately get merged back to trunk.

    3. As part of deployment, the old production branch is physically dropped from svn, and a copy of preprod is taken and called production

    Sounds not dissimilar to what I think we're doing:

    1. Between releases, all work on trunk
    2. On upcoming version release, freeze on all code not already in unless it's a bugfix for something (dev's have a personal branch at that point if they need to check stuff in for whatever reason and if they can't wait for the unfreeze)
    3. At the time of the first alpha/beta/whatever to leave the department, trunk is branched and tagged/named at that version number. Trunk is then free to resume for checkins - see (1).
    4. Any bugs/dev required for that release then happens on that branch - and stuff migrated between branch and trunk where necessary.



      @snoofle said:
      Why physically dropping branches? Space limits.
      Fortunately we don't have that problem.


  • @snoofle said:

    One of our db tables (that contains an xml-filled clob) had grown unmanageably large (>3B rows), so the powers-that-be are finally acquiescing and letting me partition the table (mostly for aging and offlining of data).

    Since everyone else is working in database A, I do it in database B (which, for some reason nobody likes to use). The change involves renaming table: x =-> x_clob, creating a new table: x_part, and creating a view to mask the change: x. This way, the queries used by other teams don't need to change up front.

    What is the RDBMS? It would be surprising that a database that large is running on a technology that does not support a partitioning scheme that is transparent at the logical level. On most products one can simply create a partition function (in your case it would likely be on the date field) and data will move to the right file/tablespace/filegroup on its own, initially as well as for any future insert, no need to create multiple tables and hide them behind views. Nowadays even MySql supports table partitions, that says a lot... The database engine, knowing the partition function, can then access only the relevant files or can work in parallel for aggregations.

    If you had a solution that does not change the data model, then there is no branching discussion needed.

    Also I'm curious - a Dev DB that is that big is pretty unusual. Do you really use all that data? Maybe you need to test extensive data mining features where a smaller subset would not work? If you have a different situation it would be interesting to know about it.



  • @Speakerphone Dude said:

    What is the RDBMS? It would be surprising that a database that large is running on a technology that does not support a partitioning scheme that is transparent at the logical level.

    Dude, the DBAs at Snoofle's place have a Post-It note on their monitor reminding them to put on pants every day.

    You should lurk more before posting.



  • @blakeyrat said:

    @Speakerphone Dude said:
    What is the RDBMS? It would be surprising that a database that large is running on a technology that does not support a partitioning scheme that is transparent at the logical level.

    Dude, the DBAs at Snoofle's place have a Post-It note on their monitor reminding them to put on pants every day.

    You should lurk more before posting.

    "First pants, then shoes."



  • @pkmnfrk said:

    "First pants, then shoes."
     

    "And one leg at a time."



  •  @snoofle said:

    Since everyone else is working in database A, I do it in database B (which, for some reason nobody likes to use). The change involves renaming table: x =-> x_clob, creating a new table: x_part, and creating a view to mask the change: x. This way, the queries used by other teams don't need to change up front.
    Ahh the goold old partitioned view trick.  Create a table with the same column definitions but no constraints or indexes. Pump in your new data. Apply the constraints and indexes and update the view.  In SQL Server the Query optimizer will look any constraints you've placed on the underlying tables and use those to decide if it even needs to read a specific table at all.

     



  • @dhromed said:

    @pkmnfrk said:

    "First pants, then shoes."
     

    "And one leg at a time."

    You forget trousers, they go on after pants but before shoes.  Looks like we need more post it notes around here.



  • @KattMan said:

    @dhromed said:

    @pkmnfrk said:

    "First pants, then shoes."
     

    "And one leg at a time."

    You forget trousers, they go on after pants but before shoes.  Looks like we need more post it notes around here.

    ???

    trousers: plural of trou·sers
    Noun: An outer garment covering the body from the waist to the ankles, with a separate part for each leg.
    Synonyms:
    pants - pantaloons - breeches - slacks

    pants: plural of pants
    Noun: Trousers.
    Synonyms:
    trousers - pantaloons - breeches - underpants - slacks



  • @KattMan said:

    @dhromed said:

    @pkmnfrk said:

    "First pants, then shoes."
     

    "And one leg at a time."

    You forget trousers, they go on after pants but before shoes.  Looks like we need more post it notes around here.

    Underwear optional?

    The mind recoils in horror...


  • @dhromed said:

    @pkmnfrk said:

    "First pants, then shoes."
     

    "And one leg at a time."

    Tried that, but the pants only had two legs.



  • @blakeyrat said:

    @Speakerphone Dude said:
    What is the RDBMS? It would be surprising that a database that large is running on a technology that does not support a partitioning scheme that is transparent at the logical level.

    Dude, the DBAs at Snoofle's place have a Post-It note on their monitor reminding them to put on pants every day.

    You should lurk more before posting.

    I don't see how creating modifications to the data model instead of table partitions is a way to avoid incompetent DBAs. It would be quicker for Snoofle to create a partition function compared to the heavy work of changing the model with all the side-effects this involves. But you are right, I don't know his situation that well so maybe it does not apply in this case.



  • @dhromed said:

    @pkmnfrk said:

    "First pants, then shoes."
     

    "And one leg at a time."

    Why would you do it one leg at a time?  Sit down on the edge of your bed for a moment and you can do both legs at once. :P

     



  • @pkmnfrk said:

    ???

    trousers: plural of trou·sers
    Noun: An outer garment covering the body from the waist to the ankles, with a separate part for each leg.
    Synonyms:
    pants - pantaloons - breeches - slacks

    pants: plural of pants
    Noun: Trousers.
    Synonyms:
    trousers - pantaloons - breeches - underpants - slacks

     

     I believe Kattman may be referring to the "underpants" synonym, which is mostly what people around here would assume you meant as well.

     



  • @Mason Wheeler said:

    @dhromed said:

    @pkmnfrk said:

    "First pants, then shoes."
     

    "And one leg at a time."

    Why would you do it one leg at a time?  Sit down on the edge of your bed for a moment and you can do both legs at once. :P

     

     

    Not on the edge of [i]my[/i] bed you can't.  You'll be lucky if you can get back up afterwards.

     



  • @pkmnfrk said:

    pants
    Third link on that search (for me) was "Mens Underwear, Calvin Klein, Ralph Lauren, Emporio Armani ... [www.iwantpants.com/]" which was the intended meaning. Not everyone lives in North America.



  • @oheso said:

    I believe Kattman may be referring to the "underpants" synonym, which is mostly what people around here would assume you meant as well.

    Do people really regularly call "underpants," "pants?" I accept that they're a synonym, since they're a type of garment worn where you'd wear pants. Some of them can even be long. But I've never heard anyone use "pants" when they really meant underpants. I can accept that it was someone trying to stretch a definition for a joke, and maybe that's what you're saying most people did.



  • @boomzilla said:

    Do people really regularly call "underpants," "pants?"
    In the UK, yes. In fact you're more likely to hear 'pants' than 'underpants' to mean undergarments.



  • @snoofle said:

    Why physically dropping branches? Space limits.

    Either you didn't hear the actual reason, you don't understand how Subversion works or the one who defined this (sensible) practice doesn't understand how Subversion works, since it saves exactly no space. That's because it does not affect past revisions, so the files are still in the repository. Just not linked in the latest revision. If it did affect previous revisions, doing this would be a gigantic WTF as you would be loosing the previous releases.

    The actual reason for deleting branches normally is general sanity. When branches accumulate, it gets progressively harder to find the one you are interested in in the long list of obsolete ones. So it's easier to just delete and replace the obsolete branches. They are still accessible by specifying revision number in the right range, but don't fill up the repository browser tree.



  • @PJH said:

    @boomzilla said:
    Do people really regularly call "underpants," "pants?"
    In the UK, yes. In fact you're more likely to hear 'pants' than 'underpants' to mean undergarments.

    Guys it was a throwaway joke, you don't have to spend the rest of your life talking about pants. Sheesh.

    I guess I should feel flattered that people liked my joke so much they kept riffing on it for 46 more posts, but man it ruins the original joke like nothing else. Just stamps it into the ground.



  • @blakeyrat said:

    I guess I should feel flattered that people liked my joke so much they kept riffing on it for 46 more posts, but man it ruins the original joke like nothing else. Just stamps it into the ground.

    Wait, when did we start talking about stuff that we like around here?!



  • @C-Octothorpe said:

    When I read your stories, I feel really bad for you. Then I remember where I work, and I consider you lucky.


    Lets put it this way: if I asked any of the developers I work with what a branch is, the only definition you would hear is one involving the words "Maple" or "Redwood".

    +1

    Sadly, I'm in the same situation



  • @Speakerphone Dude said:

    ...ideas...
    We use Oracle. For these particular tables, we have over 30 billion rows totalling over 30TB. Oracle won't let you partition a table in-place; it effectively moves every row from where it is to it's proper place in the range/hashed partition. The only way to do that would be to shut down our system for way longer than we can shut it down. The only practical alternative is to leave the data in-place, rename the existing table to xxxOld, make a new partitioned table: xxxPart and a view with the original table name to hide the dual tables. Its very quick (seconds) and there's zero hassle with overflowing transaction/redo logs or downtime.



  • @snoofle said:

    2. When it's cutoff time, the old preproduction branch is dropped (physically deleted from svn) and a copy of trunk is made and called preproduction. This is what gets qa'd, and bug fixes go here, and ultimately get merged back to trunk.

    3. As part of deployment, the old production branch is physically dropped from svn, and a copy of preprod is taken and called production

    Why physically dropping branches? Space limits.

     

    Somebody doesn’t know how svn is supposed to work. Copies are cheap, unless you’re exporting and then importing to make these copies (they’re just pointers to the prevoius version plus any changes). Deleting anything from svn doesn’t free up any space, unless they’re actually dumping and reloading the repository through svndumpfilter or something similar.

     



  • @pkmnfrk said:

    TRWTF is development that is not happening on a branch

    Can you branch databases? Are the database files included in the SVN repository? More likely you're talking in abstract terms, but maybe databases have versioning now?



  • @lettucemode said:

    but maybe databases have versioning now?
    Perhaps not, but the DDL scripts can certainly be versioned.



  • @lettucemode said:

    Can you branch databases? Are the database files included in the SVN repository? More likely you're talking in abstract terms, but maybe databases have versioning now?

    No, but you could at least store the DML required to generate the database schema and branch that...


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.