Non-production production environments

tarunik

Out where I work, we have a set of folks who massage various data points about the railroad and how it works into something our train-dispatching software can digest. While this process is mildly loopy (63 steps at last count), most of that is due to the flat text files masquerading as a database that the software uses -- that's a nearly-impossible-to-fully-anonymize sidebar WTF for another time, though.

Thankfully, we have a tool that keeps all this data in a 'master' database and allows some cleanup hand-editing to be done. The original editing tool that generates this data is a WTF in its own right, and we'd rather not fiddle with the files directly in a text editor, as that's quite error-prone.

Herein lies the rub, though: the consumer of this data is still years away from production, yet we need to have this data fully backed up, etc., as there is significant hand-work in there that would be a serious setback if lost. This makes keeping it in a production database environment attractive. However, the access restrictions on the production environment make some processes quite onerous: extracting production data to populate test/dev, which we have yet to fully automate, and the occasional need to correct data issues by hand when the automatic tools are unsuitable for the job. The change-control/data-protection tools in production where I work don't know about our situation and squawk anyway, and the DBAs don't really know our data or schema that well, because the data is rather...arcane, if you will, and the schema tracks an external requirements specification that's nowhere near 3NF!

The alternative we have so far is to keep it in a test database environment that has a custom service level agreement attached to it to handle the backup needs of this data (which are quite intensive; we have an in-application backup system to handle recovery from logical corruption atop the normal backups the DBAs take). While this works...are there some unseen dangers to this approach? Keep in mind that the tool in question is a 'fat client' that talks to the DB server directly, by the way.

Also: are there other ways to deal with such 'quasi-production' databases that you've run into?
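To make the "in-application backup for logical corruption" idea concrete, here is a minimal sketch. It assumes SQLite purely as a stand-in (the thread never names the actual database engine), and the function name copy_database is hypothetical. The point is only that the application takes its own known-good snapshot before risky hand edits, independently of whatever backups the DBAs run.

```python
import sqlite3


def copy_database(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    """Copy every page of src into dst, replacing dst's contents.

    Hypothetical helper: used both to take a snapshot (live -> snapshot)
    and to roll back after logical corruption (snapshot -> live).
    Neither connection should have an open transaction when this runs.
    """
    src.backup(dst)  # sqlite3.Connection.backup, available since Python 3.7
```

Usage would look something like copy_database(live, snapshot) before a round of hand edits, and copy_database(snapshot, live) if the edits turn out to have mangled something. A real system on a server database would need the engine's own export tooling instead.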

MathNerdCNU

If it is a database with real-deal-genuine-McCoy customer data, treat it no different than production.

I'm approaching this from a field that very much frowns upon sharing any production data, even anonymized. Anytime someone remotely mentions letting anyone that does development near a live system my response is summed up as, LOLNOPE!

tarunik

@MathNerdCNU said:

If it is a database with real-deal-genuine-McCoy customer data, treat it no different than production.

I'm approaching this from a field that very much frowns upon sharing any production data, even anonymized. Anytime someone remotely mentions letting anyone that does development near a live system my response is summed up as, LOLNOPE!

This data is:

not customer related in practically any way, shape, or form -- it describes what the railroad looks like and how various pieces of signaling et al. work.
in part almost trivial to reverse-engineer from satellite imagery and/or the simple expedient of sticking a video camera on a train car and pointing it down the track.
not growing, but changing at a fairly steady pace, as the railroad in the field changes and data fixes are made; there are requirements changes coming at us as well.

Also: we are in a state where we are producing this quasi-production output data for other developers to use!

MathNerdCNU

Do you really "need to know" the data in the not-production-production-system to do your job? If yes, that's okay!

In which case I would say, still treat it like production. Regulate & audit access, don't allow changes that haven't gone through a process, daily offsite backups etc. If that means a dev lead/some monkey is the arbiter of the process so be it, acknowledge the potential risk(s) and accept it.

If you don't "need to know", play stupid and get more time. Win-win right?
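As a rough illustration of "regulate & audit access, don't allow changes that haven't gone through a process", here is one way a hand edit could be gated and logged. Everything in it is an assumption for the sake of the sketch: SQLite stands in for the real database, and the signal, change_ticket, and signal_audit tables and the hand_edit function are made-up names, not anything from the actual schema.

```python
import sqlite3

# Hypothetical schema: one data table, one table of approved change
# tickets, and one audit table recording who changed what under which ticket.
SCHEMA = """
CREATE TABLE signal (id INTEGER PRIMARY KEY, milepost REAL NOT NULL);
CREATE TABLE change_ticket (id TEXT PRIMARY KEY, approved_by TEXT NOT NULL);
CREATE TABLE signal_audit (
    signal_id    INTEGER NOT NULL,
    old_milepost REAL NOT NULL,
    new_milepost REAL NOT NULL,
    ticket_id    TEXT NOT NULL,
    changed_at   TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def hand_edit(conn: sqlite3.Connection, ticket_id: str,
              signal_id: int, new_milepost: float) -> None:
    """Apply one hand edit, but only under an approved ticket."""
    approved = conn.execute(
        "SELECT 1 FROM change_ticket WHERE id = ?", (ticket_id,)
    ).fetchall()
    if not approved:
        raise PermissionError(f"no approved change ticket {ticket_id!r}")

    with conn:  # commits both writes together, or rolls both back on error
        rows = conn.execute(
            "SELECT milepost FROM signal WHERE id = ?", (signal_id,)
        ).fetchall()
        if not rows:
            raise LookupError(f"no signal with id {signal_id}")
        old_milepost = rows[0][0]
        conn.execute(
            "UPDATE signal SET milepost = ? WHERE id = ?",
            (new_milepost, signal_id),
        )
        conn.execute(
            "INSERT INTO signal_audit "
            "(signal_id, old_milepost, new_milepost, ticket_id) "
            "VALUES (?, ?, ?, ?)",
            (signal_id, old_milepost, new_milepost, ticket_id),
        )
```

The "arbiter" in this picture is whoever is allowed to insert rows into change_ticket. With a fat client talking straight to the DB server, a check like this would more likely have to live in the database itself (triggers, restricted grants) than in client code, since the client can be bypassed.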

tarunik

@MathNerdCNU said:

Do you really "need to know" the data in the not-production-production-system to do your job? If yes, that's okay!

Very much so; we have access to that data through other sources within the company anyway, save for stuff that's hand-edited in there, of course. We also have to look things up in there to troubleshoot bugs that may or may not be connected to this data -- special cases abound here. There are even future use cases where developers will be adding to and editing portions of this data!

@MathNerdCNU said:

In which case I would say, still treat it like production. Regulate & audit access, don't allow changes that haven't gone through a process, daily offsite backups etc. If that means a dev lead/some monkey is the arbiter of the process so be it, acknowledge the potential risk(s) and accept it.