What's a good tool for Linux database migrations?



  • I am trying to make better use of my database in a webapp I am working on, and to do that, I intend to get rid of its micro-ORM's automatic migrations.

    This means I have to manage my migrations manually, or at least hook migration management into my app. The db server is Postgres on Ubuntu 18.04.

    This is what I am envisioning:

    1. A SQL file in my source repository that gets updated as I make model changes.

    2. A program/library that runs and checks if the changes have been applied, and if not, applies them.

    3. A way to run that program from my webapp at startup. I know #3 is always possible by shelling out or whatever.
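
    Concretely, for #2 I'm picturing the check being driven by a small bookkeeping table on the Postgres side, something like this (names are placeholders, nothing I've settled on):

    -- records which changes from the repo's SQL file have already been applied
    CREATE TABLE IF NOT EXISTS applied_changes (
        change_id  text        PRIMARY KEY,   -- e.g. a label for each change block
        applied_at timestamptz NOT NULL DEFAULT now()
    );

    -- at startup, read this, compare it against the changes in the SQL file,
    -- and apply (and record) whatever is missing, inside a transaction
    SELECT change_id FROM applied_changes ORDER BY applied_at;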

    Someone on Quora just suggested Sqitch. What do you use?


  • Java Dev

    Didn't @apapadimoulis write a soapbox about this a few years back?



  • @PleegWat Maybe but I think @apapadimoulis is more into the Windows ecosystem.



  • I've used Liquibase a couple of times for doing database updates.


  • Discourse touched me in a no-no place

    @Captain said in What's a good tool for Linux database migrations?:

    This is what I am envisioning:

    1. A SQL file in my source repository that gets updated as I make model changes.

    2. A program/library that runs and checks if the changes have been applied, and if not, applies them.

    3. A way to run that program from my webapp at startup. I know #3 is always possible by shelling out or whatever.

    The system I've used in the past for php/web-based systems (note - rollback isn't something I've considered...):

    1. A directory containing (0-padded)-number-prefixed files with SQL statements to create/modify the underlying database schema for tables (but not views, triggers, events). The last command in each file updates the table in #3 with the number of that file.
    2. A subdirectory containing more (optionally similarly prefixed) files (to dictate order if required) with SQL statements to create views, triggers, events.
    3. A single column, single row table called schema_version with the number of the most recent file from #1 that's been applied
    4. A define somewhere in the application dictating how far along the files in #1 should have been applied
    5. A check that #3 and #4 match. If #3 < #4, apply relevant updates from #1 in order and move to #6. If #3 > #4, then there's a problem (code is too old for the current database schema) and stop doing stuff.
    6. If an update was applied in #5, wipe out all views, triggers and events, and rerun everything in #2
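
    In practice #5 is just a one-row read at startup; roughly (pretend the code's define says 19):

    SELECT version FROM schema_version;
    -- = 19 : nothing to do
    -- < 19 : run 00NN.*.sql for NN = version+1 .. 19 in order, then rebuild catalog/
    -- > 19 : the code is older than the database schema - stop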

    The README explains some of the above in more detail, with the reasoning for #6.

    Example listings/files (not anonymized; I don't think there's anything here that requires it)

    $ ls schema/mysql/
    0000.README.sql          0006.permissions.sql   0012.password_tokens.sql    0018.settings.sql
    0001.schema_version.sql  0007.rsa_keys.sql      0013.clients.sql            0019.crontab.sql
    0002.empty.sql           0008.picklists.sql     0014.nested_set_groups.sql  catalog
    0003.sessions.sql        0009.emails.sql        0015.actions.sql            drop-everything.sql
    0004.users.sql           0010.tokens.sql        0016.otp.sql                test_data.sql
    0005.empty.sql           0011.audit_mapper.sql  0017.clients_users.sql
    
    $ ls schema/mysql/catalog/
    0001-example.routine.sql  0007-scendants.view.sql      nested_sets.view.sql
    0002-example.event.sql    actions_users.view.sql       passwords_tokens.view.sql
    0003-example.trigger.sql  addresses_tokens.view.sql    permissions.view.sql
    0004-uuid.routine.sql     audit_audit_fields.view.sql  queued_users.view.sql
    0005-addresses.view.sql   clients_users.trigger.sql
    0006-keys.view.sql        clients.view.sql
    
    $ cat schema/mysql/0001.schema_version.sql
    SET time_zone = "+00:00";
    
    CREATE TABLE IF NOT EXISTS `schema_version` (
      `version` int(11) NOT NULL COMMENT 'Database schema version'
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    
    INSERT INTO `schema_version` (`version`) VALUES (1);
    
    $ cat schema/mysql/0003.sessions.sql 
    SET time_zone = "+00:00";
    
    -- Don't bother forcing \DB\SQL\Session() to see if this table is here.
    
    CREATE TABLE IF NOT EXISTS `sessions` (
      `session_id` varchar(255) NOT NULL DEFAULT '',
      `data` text,
      `ip` varchar(45) DEFAULT NULL,
      `agent` varchar(300) DEFAULT NULL,
      `stamp` int(11) DEFAULT NULL,
      PRIMARY KEY (`session_id`(191))
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    
    UPDATE schema_version SET version = 3;
    
    0000.README.sql
    -- This is just a README, and not expected to be run as a script.
    
    -- Anything after a double-minus sign in a script, whether quoted or not, shall 
    -- be stripped as a comment
    
    -- Update scripts presume that a database connection with full access to a 
    -- database has been provided via db_credentials.php in the root directory.
    
    -- Update scripts in this parent directory should only modify underlying tables; 
    -- views, routines, events and triggers are handled differently - see "Updating 
    -- non-base table objects." below.
    
    -- The database itself can be empty, or partially 'upgraded' using these scripts 
    -- from a previous version of the software.
    
    -- Each update script filename MUST consist of the format
    --   ####.brief_description.sql where 
    -- -- #### is at least a 4-digit, left-zero padded number
    -- -- brief_description is just that
    -- -- .sql as a file suffix
    
    -- The numbers at the start of the filenames MUST be consecutive, contiguous, 
    -- and start at 0001. Should it ever come to having more than 9999 scripts, 
    -- resume starting at 10000.
    
    -- The most recent script presumed to have run should match the define() for 
    -- SCHEMA_VERSION in index.php in the root directory.
    
    
    -- Each script should be self contained and SHOULD serve a single purpose (i.e. 
    -- a single update of the software may contain more than one update script; e.g.
    -- 0002.users creates a single users table, 0005.permissions sets up four linked
    -- tables.)
    
    -- Each script may (MUST) presume that all the scripts numerically before it 
    -- have run to successful completion, and thus the database is in a consistent 
    -- state.
    
    -- Generally scripts will add/modify/remove tables and columns, but they MAY 
    -- also insert/modify/remove table data (e.g. for pick-lists, or default data 
    -- for an empty table, such as the users table with a default admin.)
    
    -- Since any DDL statements (modifying/creating tables/columns) have inherent 
    -- COMMITs before and after execution, scripts should take care of any 
    -- transactional requirements, since the calling routine doesn't provide any.
    
    -- Any statement that causes an error within a script WILL abort the transaction
    
    -- Each script MUST finish with a line updating the schema_version table with 
    -- the number of the script. Thus 00054.something.sql MUST finish with the 
    -- following statement:
    --
    --     UPDATE schema_version SET version = 54;
    
    -- As a consequence of the above two points, an aborted script shall prevent 
    -- any further scripts from being executed.
    
    -- Updating non-base table objects...
    -- Ref: https://odetocode.com/blogs/scott/archive/2008/02/02/versioning-databases-views-stored-procedures-and-the-like.aspx
    --
    -- These update scripts are located in the `catalog` subdirectory
    --
    -- Any update of the database shall result in the dropping of all views, 
    -- routines, events and triggers; then all scripts in the `catalog` subdirectory 
    -- shall be run in alphabetical order. If you wish to add a trigger, add a file 
    -- in that directory that creates it from scratch. If you wish to modify a
    -- view, modify that file. If you wish to delete an event, remove the file.
    --
    -- If you wish to update any of the catalog items without making any changes to
    -- the underlying tables, simply create an 'empty' update script (described 
    -- above) to trigger an update of the other catalog items (commenting the script 
    -- suitably to that effect.) See 0004.empty.sql as an example.
    -- 
    -- Filenames should follow the format of brief_description.type.sql where
    -- -- brief_description is just that
    -- -- type should be one of `view`, `routine`, `event` or `trigger`.
    --
    -- To impose an implicit order on the scripts, those that are not dependent on
    -- other catalog items should simply have a brief_description starting with a
    -- non-number and shall be executed first, in alphabetical order. Those that 
    -- rely on previous scripts having been run (e.g. a view that relies on other 
    -- views existing) should be prefixed with a four-digit left-zero padded number, 
    -- with 0000 being executed first after the non-prefixed scripts, and 9999 
    -- being the last.
    
    
    
    -- ---------
    -- ---------
    -- ---------
    
    -- Conventions:
    
    -- Comment copiously in update scripts where it makes sense.
    
    -- Avoid NULLable columns where possible/reasonable
    
    -- NO systems-Hungarian prefixes; no `tbl` to describe a table, `trg` for 
    -- triggers, `col` for column etc.
    
    -- NO camel-case. Use underscores and lower case.
    
    -- Capitalise SQL keywords.
    
    -- The primary key SHOULD be a single column (auto-incrementing or not.)
    
    -- Use <tablename>_id for the primary key for a table, not simply `id`, since
    -- using the latter usually involves lots of aliasing when joining tables.
    
    -- Tables describing objects should use the plural of that object (users, keys)
    
    -- Columns within the tables should use the singular (user, key)
    
    -- Join tables should be named after the tables they're joining, in alphabetical 
    -- order (keys_users)
    
    -- Store times using UTC. Use the presentation layer to adjust zones if required.
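
    (None of the catalog files are shown above, but there's nothing to them - since everything in catalog/ gets dropped and rebuilt whenever an update script runs, each file is just a bare CREATE from scratch. A made-up illustration, not one of the real files:)

    $ cat schema/mysql/catalog/example_active_users.view.sql
    -- Hypothetical view (the `disabled` column is made up); recreated in full
    -- on every schema update, so it's always a plain CREATE, never an ALTER.
    CREATE VIEW `example_active_users` AS
    SELECT `user_id`
      FROM `users`
     WHERE `disabled` = 0;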
    
    

  • And then the murders began.

    If you like the sound of @PJH's approach and have a .NET app, DbUp is a nice wrapper for a very similar process. (It doesn't automatically rebuild views on a table change, but refreshing them when a table changes is probably something that should have been scripted anyway.)

    (I know @Captain said the database is on Linux and thus probably isn't .NET, but that doesn't mean the outer app is.)

    I don't know of any pre-existing solutions that combine the model-based approach and doing the upgrade on startup; the model-based solutions I've seen (admittedly for SQL Server, not Postgres) do the diff/apply at deployment time instead.


  • Trolleybus Mechanic

    @PJH said in What's a good tool for Linux database migrations?:

    The system I've used in the past for php/web-based systems (note - rollback isn't something I've considered...):

    A directory containing (0-padded)-number-prefixed files with SQL statements to create/modify the underlying database schema for tables (but not views, triggers, events). The last command in each file updates the table in #3 with the number of that file.
    A subdirectory containing more (optionally similarly prefixed) files (to dictate order if required) with SQL statements to create views, triggers, events.
    A single column, single row table called schema_version with the number of the most recent file from #1 that's been applied
    A define somewhere in the application dictating how far along the files in #1 should have been applied
    A check that #3 and #4 match. If #3 < #4, apply relevant updates from #1 in order and move to #6. If #3 > #4, then there's a problem (code is too old for the current database schema) and stop doing stuff.
    If an update was applied in #5, wipe out all views, triggers and events, and rerun everything in #2

    The README explains some of the above in more detail, with the reasoning for #6.
    Example listings/files (not anonymized; I don't think there's anything here that requires it)

    The system I use (Doctrine) has a plugin like this, except the table from point 3 has many rows, each for one migration applied, and migrations have 'up' and 'down' (rollback) statements. I think it's relatively simple to just implement it by hand if you don't need to autogenerate schema diffs, or you already have a tool for that - you just check what is already there and run what isn't, in a single atomic transaction of course.
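
    Roughly, the bookkeeping looks like this (made-up names, not Doctrine's actual table):

    -- one row per migration that has been applied
    CREATE TABLE IF NOT EXISTS migration_versions (
        version     VARCHAR(64) PRIMARY KEY,   -- e.g. '20190301120000'
        executed_at TIMESTAMP   NOT NULL
    );

    -- "check what is already there and run what isn't":
    SELECT version FROM migration_versions;
    -- for every known migration not in that list, run its 'up' SQL and insert
    -- its version here, all inside one transaction; 'down' does the reverse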


  • 🦇

    To suggest an option you've probably already abandoned because it's too manual -- have you tried dumping the database schema, then using diff to figure out how much the schema differs from the one you wish you had, then making changes manually to approach that?

    We make the changes on a test instance of our database (that has no data) first, then try them again on a test instance that contains all of our data so we know how heavy changes are. Then we diff the schema, and finally make the changes (using a blessed, versioned SQL script) against the production database.

    We do this so we can consolidate multiple user-facing changes into one change, and so we can make ad hoc changes to the schema without losing track of what they were. (For instance, sometimes we'll try a new index on just one replica to see if it helps, or we'll delete it to see if it hurts.)

    We switched to this process after previously using alembic and accidentally knocking down the app one too many times with unforeseen expensive changes. However, we also have a dedicated DBA who can afford to spend time on this sort of thing.

    (If you do want to use a migration tracking tool, please please please write your changes manually -- database-level diffing is pretty unsafe, since it can't tell a renamed column from dropping one column and creating a new one of the same type. I liked Goose, but it's probably still too manual for you. I think PJH's process is good.)


  • 🦇

    (PS: No matter what choice you make, be sure your backups work before you start doing migrations. Especially if you do them automatically!)



  • @Unperverted-Vixen Deployment time would be a bit annoying because of how my devops is organized.

    That said, I'm using Sqitch and doing it at deployment time... but it's yet another barrier to CI (for me, for right now).

    I probably do need to switch to fucking docker.


  • Fake News

    @Captain said in What's a good tool for Linux database migrations?:

    I probably do need to switch to fucking docker.

    The official Docker line is that you're supposed to keep persistent data out of containers and store it in data volumes instead. Containers are meant to be "volatile": you should be able to throw one away and create a new container from an image.

    But that's why I always wonder how you would migrate the data volumes...



  • But that's why I always wonder how you would migrate the data volumes...

    Replication at the db level?


    Hmm.

    I'm not sure what to think about all this. I am definitely having some problems with Sqitch that make it hard to deploy my databases anyway. If I fixed that, I could just keep going with what I have.

    Maybe I should set up a database server and have the deployable web-app figure out its server from its environment. But then, again, that separates the SQL schema from the app and makes a more convoluted process. (I'm just a one-man shop on this project, and adding 5 minutes to a testing cycle means I get like 1 less bug fix every day).

    I want to make the database stuff easy, so I can make the data model rich and write cool queries.

    For now, I'll just deploy the db by hand in my app directory. That makes some sense, since I have to keep the app in sync with the db anyway. (That's the process I don't want to have to make convoluted).

    So, fixing today's app bug is the first priority. And then maybe fixing the Linux/Postgres permissions issue stopping my vagrant user from destroying the db.

