Version control in the database

boomzilla

I'm looking to avoid some NIH and find something that will facilitate a version control system of stuff in my database:

I have to allow users to build...erm...let's just call it a package. There are several pieces of a package:

Some outline structured text. A few top level things are mandatory / hard coded, but others are more free form. Basically each node of the outline will be rich text. So: text, images, tables etc.
Other list of things, many of which will include foreign key links to other stuff in the system.
File attachments. There may be folder structures included here but that's optional. We haven't figured out how were going to store the attachments, but that's not really important here.

Now: my users will be editing these things and we have to keep versions. Users check the packet out, do some work and then check it back in. This will increment a minor version number. There's also an approval workflow. After passing through that the packet's major version number will be incremented. I'm planning to increment patch numbers (a la semver) whenever a user saves a change.

Anyone know of something like this? Ideally, java and compatible with Oracle (yeah, yeah, deal with it).

iKnowItsLame

@boomzilla this is #3 of my reply, predecessors got deleted 'cos too specific.

don't, madness lies there.

Everything COTS will cover 80-90% of your needs, the rest "we can do with some 'small' customizing"
-> I can't even think of such a thing -let alone a name- but since you have this requirement I am quite sure there also exists a company which promises to cover all your needs, most likely with "Enterprise" or some random characters in it's name.

All I can say is: do not go this route UNTIL you're 100% sure that your requirements are complete AND will be in lockstep with whatever COTS-Vendor deems to be viable for the next update.

Basically imho your choices are:
-) change your workflow to align with COTS (they cant' do == you can't do, do not believe with some customizations, ever)
-) invest in some "Enterprise"-thingy (= $$$)
-) review your workflow (= dumb it down, do we really need RTF (oh my!) and/or attachments or [random feature]? Can't this be "solved" with mark[up|down] a la Wiki?)
-) tell us more wtf that thing tries to solve, NIH ain't always bad.

You know what: tell us more anyways!

boomzilla

@iKnowItsLame said in Version control in the database:

don't, madness lies there.
Everything COTS will cover 80-90% of your needs, the rest "we can do with some 'small' customizing"

Oh, believe you me, if it was entirely up to me I wouldn't even be looking. I can't imagine anything sane could come of this. But those are my marching orders, so I'm doing my due diligence.

So far I haven't found anything, but it's difficult to weed out all of the results for putting the details of your schema under version control or just regular file based version stuff (git, svn, etc).

iKnowItsLame

@boomzilla "But those are my marching orders"

Wenn man vor dem Abgrund steht ist jeder Rückschritt ein Fortschritt!

/I feel you

Captain

@boomzilla said in Version control in the database:

Some outline structured text. A few top level things are mandatory / hard coded, but others are more free form. Basically each node of the outline will be rich text. So: text, images, tables etc.
Other list of things, many of which will include foreign key links to other stuff in the system.
File attachments. There may be folder structures included here but that's optional. We haven't figured out how were going to store the attachments, but that's not really important here.

Isn't this called Sharepoint?

boomzilla

@Captain we're replacing a system that replaced Sharepoint.

Captain

@boomzilla Java and Oracle... any other techs to consider?

I'd just throw up a CMS with good Java/Oracle bindings and customize it.

Haven't tried it, but this looks like it might check off all the boxes. https://craftercms.org/

Groaner

It sounds like you're versioning schema and data. Redgate has a lot of tools for diff-ing schema, but I'm not sure about data differences. Sounds like you almost need a replication framework. I know SQL Server has replication features, but can't speak to Oracle.

boomzilla

@Captain hmmm....this looks promisingish:

Powerful git-based versioning of both code and content.

....except it's actually git...in a regular repository.

EDIT: I'm gonna have to think about this.

boomzilla

@Groaner said in Version control in the database:

It sounds like you're versioning schema and data. Redgate has a lot of tools for diff-ing schema, but I'm not sure about data differences. Sounds like you almost need a replication framework. I know SQL Server has replication features, but can't speak to Oracle.

No. Not really versioning schema, unless you mean the stuff about the outline structure or optional folder structure for attachments. I haven't thought through the document stuff yet, but the rest of the content would look like a tree with parent and sibling links among rows in my hypothetical implementation.

JBert

Have you ever heard about the Content repository API for Java?

We have been using Apache Jackrabbit as implementation and so far it has served our needs, although it also needed a lot of massaging to work in a clustered setup or have multiple isolated repositories.

For example, it needs a rather big XML file with initialization instructions and the connection string plastered a few times in there (or defined at the top and then referenced using some id), but our installation team got tired of having to edit those files and fear breaking something. So we actually made some code which reads our own "user-friendly" configuration and spits out several copies of the Jackrabbit config file, then lets Jackrabbit load each one to have a distinct repository.

The other thing to watch out for is that querying its contents can only be done using the JCR API, the database scheme is full of blobs and should thus be treated as a black box.

Finally, as mentioned before the clustering is... special. Each nodes has its own Lucene index and a "updated until 'revision' X" counter, so recreating a node means that it needs to read in the entire "event log" from the start and replay all changes. (The best way we found to quickly repair a broken node was to take a functioning node down at a quiet moment, copy its index files, overwrite the broken index files on the broken node with the working index, reset the 'revision counter' in the database for the broken node to the value of the working node, then start up both nodes again to let them get in sync. Doing this regularly as part of a backup scheme also helped a lot)

Because each change also needs to update all counters and thus all nodes, this software works best in a write rarely, read often scenario.

EDIT: I forgot to mention that this thing doesn't really have a concept of folders and files, instead it works with tree "nodes" and "leaf nodes", all of which can have properties assigned. Versioning and locking tends to use these properties, with locks having a forced synchronous syncing behaviour whereas properties might take a while to get synced.

powerlord

@boomzilla

Oracle itself supports versioning through its Workspace Manager. Workspaces work very much like branches in a version control system, where you can merge them into a parent workspace; we used this for our approval system with Oracle Spatial on a project I worked on 4 years ago.

The only real counter intuitive things about it that I noticed are:

The top level workspace is always named LIVE
You must explicitly tell Oracle which tables are version-enabled. All other tables are treated as normal.

boomzilla

Good tips, @JBert and @powerlord. I'll look into those mañana.

Unperverted Vixen

@Groaner said in Version control in the database:

It sounds like you're versioning schema and data. Redgate has a lot of tools for diff-ing schema, but I'm not sure about data differences.

Their comparison SDK was great for both schema and data differences. Unfortunately they killed the product; no more automated comparisons outside of deployments.

heterodox

@JBert said in Version control in the database:

The other thing to watch out for is that querying its contents can only be done using the JCR API, the database scheme is full of blobs and should thus be treated as a black box.

There are also connectors for things like WebDAV that make it more accessible for integration.