Semantically diffing SQL DDL?



  • So, I'm researching reworking our current (not that great) database schema management process -- right now, we keep the master schemas in our databases, and manually generate SQL for schema updates.

    Ideally, I'd be able to keep the schema DDL for creating the schema from scratch in version control, then basically run a "semantic diff" from the previous DDL to the current DDL in order to generate the correct SQL statements to patch the in-database schemas (so if you add a new column to a CREATE TABLE statement, it'd generate an ALTER TABLE ADD COLUMN for the new column, for instance).
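    For example, given made-up before/after DDL like this, the tool should work out the ALTER on its own:

    ```sql
    -- v1 of the checked-in DDL:
    CREATE TABLE employees (
      employee_id NUMBER NOT NULL,
      full_name   VARCHAR2(60) NOT NULL
    );

    -- v2 adds a column:
    CREATE TABLE employees (
      employee_id NUMBER NOT NULL,
      full_name   VARCHAR2(60) NOT NULL,
      birth_date  DATE
    );

    -- ...and the "semantic diff" tool would emit:
    ALTER TABLE employees ADD (birth_date DATE);
    ```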

    My limited research has identified several tools (such as Red Gate's) for differencing schemas that are actually living in a database, but none that can take two text files of DDL and generate the "patch" SQL needed to move from one to the next.

    (Approaches like Rails migrations are rather undesirable as they create an issue with stacks of deltas that must be applied in sequence to get a database from scratch -- if they get run in the wrong order for some reason, stuff breaks :P )

    P.S. we do have stored procedures in our DB, but those aren't an issue when stored in textual form as we are upserting them anyway (i.e. CREATE OR REPLACE STORED PROCEDURE). I just want to get away from having the in-database copy of the schema being the master, as that approach is rather fragile from a versioning standpoint in my eyes.

    P.P.S. we are on Oracle 11g2, but don't have Oracle Change Management AFAIK -- not that that does what I want anyway :P



  • Never heard of any such thing.

    Since you already know about RedGate, why don't you execute both the old schema and new schema on a dev server (or local machine) and then make a diff using those?



  • I would suggest that you don't do this.

    Doing "offline state diffs" can cause subtle problems. For example, if you split a column into two columns, you may need to do data transformation to distribute the existing data to its new home. It's pretty easy to bake that into a change script, but it can be challenging to coordinate the transformation to go along with the schema change if the schema changes are migrated by a tool.



  • @tarunik said:

    but none that can take two text files of DDL and generate the "patch" SQL needed to move from one to the next

    If you keep the schema DDL in a standardized format, then the text diff of the files will tell you everything you need to know. We have a rule that our schema scripts must always be idempotent - this way, we merge the changes we want into the trunk and simply run all of the affected scripts to apply the database schema. The "patch" SQL is the same as the DDL.



  • @cartman82 said:

    Since you already know about RedGate, why don't you execute both the old schema and new schema on a dev server (or local machine) and then make a diff using those?

    There's only one schema containing this complete set of DDL on a given DB instance at any given time, as the app that uses it relies on public synonyms to point at the correct schema during operation, instead of having app-level facilities to specify the schema name. Running an Oracle instance on my local machine is a no-go, for obvious reasons.

    @Jaime said:

    Doing "offline state diffs" can cause subtle problems. For example, if you split a column into two columns, you may need to do data transformation to distribute the existing data to its new home. It's pretty easy to bake that into a change script, but it can be challenging to coordinate the transformation to go along with the schema change if the schema changes are migrated by a tool.

    I'm thinking you could simply apply the data change after the schema change -- at least with a tool like mine, which is smart enough not to be knocked over when an extra column gets tacked onto a table somewhere. (Considering that I can do the data changes in dev/test myself, and that's not true for schema changes, I don't see why that'd be a major issue...but if you have a corner case in mind that breaks this approach, I'd love to see it!)
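    I.e., for your split-column case, something like this (column names made up):

    ```sql
    -- the tool applies the schema change first:
    ALTER TABLE employees ADD (first_name VARCHAR2(30), last_name VARCHAR2(30));

    -- then we run the data migration by hand afterwards:
    UPDATE employees
    SET first_name = SUBSTR(full_name, 1, INSTR(full_name, ' ') - 1),
        last_name  = SUBSTR(full_name, INSTR(full_name, ' ') + 1);
    ```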

    @Jaime said:

    We have a rule that our schema scripts must always be idempotent - this way, we merge the changes we want into the trunk and simply run all of the affected scripts to apply the database schema. The "patch" SQL is the same as the DDL.

    I'm trying to avoid the situation another part of the project is in right now, where the admins have to run a pile of DB patch SQL scripts in the correct order every time they install a release that has DB changes in it (in other words, pretty much all of them). That gives me an idea, though... ;)


  • Java Dev

    As Jaime says - put them all in one file, in the correct order, and rely on pl/sql to check if each step needs to be done. Anything you're likely to need to know regarding columns, indexes, etc. is described in the data dictionary.
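    Something along these lines -- an untested sketch, with made-up table/column names:

    ```sql
    DECLARE
      l_count NUMBER;
    BEGIN
      -- check the data dictionary to see if this step has already been done
      SELECT COUNT(*) INTO l_count
      FROM user_tab_columns
      WHERE table_name = 'EMPLOYEES' AND column_name = 'BIRTH_DATE';

      IF l_count = 0 THEN
        EXECUTE IMMEDIATE 'ALTER TABLE employees ADD (birth_date DATE)';
      END IF;
    END;
    /
    ```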

    Do be aware that reading some stuff in the data dictionary (performance analysis related) may require additional licensing.



  • Most enterprisey projects I've worked on have generally had a large pile of patch scripts to be run in the specified order.

    There are funky tools to automate much of it. http://www.dbmaintain.org/ looks awesome, but I've not yet had a chance to check it out for real.



  • @tarunik said:

    I'm trying to avoid the situation another part of the project is in right now, where the admins have to run a pile of DB patch SQL scripts in the correct order every time they install a release that has DB changes in it (in other words, pretty much all of them).

    Sample of our standard table DDL script (T-SQL variant):

    -- This was the original script content when the table was first created
    IF NOT EXISTS (SELECT 1 FROM sys.objects WHERE name = 'Employees')
    BEGIN
      CREATE TABLE Employees
      (
        EmployeeID int NOT NULL,
        FirstName varchar(30) NOT NULL,
        LastName varchar(30) NOT NULL
      )
    END
    GO
    
    -- This bit was added at a later date when it was decided to add a column
    IF NOT EXISTS(SELECT 1 FROM sys.columns WHERE object_id = OBJECT_ID('Employees') AND name = 'BirthDate')
    BEGIN
      ALTER TABLE Employees
        ADD BirthDate datetime NULL
    END
    

    The script was only the first block during the original deployment. To add a column, we added a conditional ALTER TABLE. Deployment is done by running all scripts that have been modified, in their entirety. We have a tool that does a dependency analysis to order the scripts for a deployment. In one out of every hundred deployments we need to hand-tweak the final deployment script - usually due to a circular dependency that probably should be engineered out anyway.



  • @Jaime said:

    We have a rule that our schema scripts must always be idempotent

    I keep trying to push a few of my coworkers in that direction, but some of them are all too happy to generate a script from SQL Compare and live with tons of red text after the script is run once...



  • @PleegWat said:

    As Jaime says - put them all in one file, in the correct order, and rely on pl/sql to check if each step needs to be done. Anything you're likely to need to know regarding columns, indexes, etc. is described in the data dictionary.

    I'll float the "single script with a pile of conditional statements in it" approach by our DBA -- although something that generates that script from individual changes in some form or another seems like a better plan than having a file a zillion lines long that's half boilerplate :P



  • @tarunik said:

    (Approaches like Rails migrations are rather undesirable as they create an issue with stacks of deltas that must be applied in sequence to get a database from scratch -- if they get run in the wrong order for some reason, stuff breaks )

    We do the whole migration thing. To avoid having a ton of steps we simply squash the migrations on each version, i.e. delete all the migration scripts and generate one from the current state of the schema. So at any given moment we have one step per version, plus a migration for every change on the current dev branch.


  • :belt_onion:

    We use Liquibase where I work and it does just fine. We're using the SQL migration scripts (which are effectively @Jaime's migrations, but without the need for the "has this already been run" checks in every statement, since Liquibase keeps track of what's been run on the DB already.)

    If you use its custom migration language (JSON / YAML / XML variants thereof) then it can generate the diff script for you for any given migration. It can also introspect your DB and write an initial change log for you, IIRC.
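    E.g. a formatted-SQL changeset looks roughly like this (author/id and the table are made up):

    ```sql
    --liquibase formatted sql

    --changeset alice:add-birth-date
    ALTER TABLE employees ADD (birth_date DATE);
    --rollback ALTER TABLE employees DROP COLUMN birth_date;
    ```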


  • Java Dev

    We've got a lot of it in helper functions - the whole script is one anonymous pl/sql block, most of the checks and some of the more common actions are in helper functions. So we type create_table('TABLE_NAME', 'COLUMNS_CLAUSE') and the exists check, tablespace assignment, etc. are handled centrally.
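    Rough shape of one such helper -- heavily simplified, the real one also handles tablespace assignment etc.:

    ```sql
    CREATE OR REPLACE PROCEDURE create_table(
      p_table_name     IN VARCHAR2,
      p_columns_clause IN VARCHAR2
    ) AS
      l_count NUMBER;
    BEGIN
      -- exists check against the data dictionary, so the script is rerunnable
      SELECT COUNT(*) INTO l_count
      FROM user_tables
      WHERE table_name = UPPER(p_table_name);

      IF l_count = 0 THEN
        EXECUTE IMMEDIATE 'CREATE TABLE ' || p_table_name
                       || ' (' || p_columns_clause || ')';
      END IF;
    END create_table;
    /
    ```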



  • Take a look at http://sqitch.org/.

    It's not a diffing tool (so not a solution for your problem), but a database schema management tool that integrates nicely with VCS managed DDL update scripts.

    You can find the Oracle tutorial at https://metacpan.org/pod/sqitchtutorial-oracle.



  • I needed something similar a couple of years ago, and at the time I couldn't find anything to do it. I ended up writing my own utility that handles both the schema and data changes (as we just store SQL table dumps). It wasn't all that difficult to write. On the other hand, our schema changes are usually small (mostly just adding new columns) so I imagine a general-purpose tool could be a bit more complex.



  • Agreeing with all the "migrations" stuff - the only problem Discourse has had with migrations is that they all have to be in the same folder, so plugins aren't allowed to use them.


  • Garbage Person

    Redgate has a SQL Source Control product that might be helpful for you.

