Versioning serializable things

Magus

I'm working with my team on a project whose data store, in the rare case that you actually look at it, is serialized stuff on disk.

However, sometimes your models change, and the serialized files are no longer valid, but we may still need to load them.

Currently, I don't know the serialization format, but it's safe to assume that it's either XML or JSON.

I do not believe we currently have a versioning system in place.

My current intent is to add an integer property to the class, which defaults to, say, version 2. When we update the class, we update the default. If data does not load, we need a migration script (though ideally I'd check the version number first).

So really, what I'm asking about is for someone to review my idea for a migration system. We'll assume XML for now:

The Entry Point

// Not sure exactly where this will be yet, but somewhere before deserialization happens.
// AssemblyCatalog and CompositionContainer live in System.ComponentModel.Composition.Hosting.
var catalog = new AssemblyCatalog(typeof(SomeModelInTheProject).Assembly);
var container = new CompositionContainer(catalog);
container.ComposeParts();
var migrated = container.GetExportedValue<Migrator>().Migrate(theXmlAsAnXDocument);

The Structure

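using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.Linq;
using System.Xml.Linq;

[Export]
public class Migrator
{
  private readonly IEnumerable<IMigration> migrations;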

  [ImportingConstructor]
  public Migrator([ImportMany] IEnumerable<IMigration> migrations)
  {
    this.migrations = migrations;
  }

  public XDocument Migrate(XDocument document)
  {
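    // GetVersionNumber is not shown; it would read the stored version out of the document.
    var version = GetVersionNumber(document);
    // Keep only migrations that target a version newer than the file's, oldest first.
    var usefulMigrations =
      migrations
        .Where(migration => migration.To > version)
        .OrderBy(migration => migration.To);

    var result = document;
    foreach (var migration in usefulMigrations)
      result = migration.Apply(result);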

    return result;
  }
}

[InheritedExport]
public interface IMigration
{
  int To { get; }
  XDocument Apply(XDocument document);
}

So, this of course uses my favorite underused .NET library, System.ComponentModel.Composition. The end result is that if someone implements IMigration, the implementing class will be picked up automatically, used by this system if its target version is above the file's current version, and applied in order of target version.
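To make that concrete, here is a minimal sketch of one migration and of the version lookup. Everything in it is an assumption for illustration: the `version` attribute on the root element, treating a missing attribute as version 1, and the `Name` to `FullName` rename. The interface is redeclared without the MEF attribute so the sketch compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Same shape as the interface above, minus [InheritedExport],
// so this compiles without System.ComponentModel.Composition.
public interface IMigration
{
    int To { get; }
    XDocument Apply(XDocument document);
}

// Hypothetical migration: version 3 renames <Name> to <FullName>.
public class RenameNameToFullName : IMigration
{
    public int To { get { return 3; } }

    public XDocument Apply(XDocument document)
    {
        var result = new XDocument(document); // copy, so the input is left untouched
        foreach (var element in result.Descendants("Name").ToList())
            element.Name = "FullName";
        result.Root.SetAttributeValue("version", To);
        return result;
    }
}

public static class MigrationRunner
{
    // Assumption: the version is a "version" attribute on the root element,
    // and files written before versioning existed count as version 1.
    public static int GetVersionNumber(XDocument document)
    {
        var attribute = document.Root.Attribute("version");
        return attribute == null ? 1 : (int)attribute;
    }

    // Applies every migration newer than the document's version, oldest first.
    public static XDocument Migrate(XDocument document, IEnumerable<IMigration> migrations)
    {
        var version = GetVersionNumber(document);
        return migrations
            .Where(m => m.To > version)
            .OrderBy(m => m.To)
            .Aggregate(document, (current, m) => m.Apply(current));
    }
}
```

Each migration stamps its own target version on the way out, so a file that is migrated and saved is not migrated again on the next load.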

I'm probably missing some important considerations, but that's what I'm essentially asking about: What have I missed about this problem?

pie_flavor

@magus said in Versioning serializable things:

What have I missed about this problem?

For starters, functional code.

@magus said in Versioning serializable things:

[Export]
public void Migrator
{

@magus said in Versioning serializable things:

[ImportingConstructor]
public void Migrator([ImportMany] IEnumerable<IMigration> migrations)
{

These voids should be voided.

Magus

@pie_flavor Fixed. This is essentially pseudocode I wrote on the spot; I'm more looking for info on my approach.

Maciejasjmj

@magus said in Versioning serializable things:

However, sometimes your models change, and the serialized files are no longer valid, but we may still need to load them.

Would it be easier to just keep the old models around, deserialize to objects and convert between them instead of munging XML documents? Especially if you say you don't know the serialization format.
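A minimal sketch of what that could look like, assuming hypothetical `PersonV1` and `PersonV2` models (neither is from the project) and a naive rule for splitting the old name:

```csharp
// Old model, kept around only so old files can still be deserialized.
public class PersonV1
{
    public string Name { get; set; }
}

// Current model.
public class PersonV2
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class PersonConverter
{
    // Converts object to object; no XML or JSON is touched here.
    public static PersonV2 Upgrade(PersonV1 old)
    {
        // Split on the first space only; a missing name becomes two empty strings.
        var parts = (old.Name ?? "").Split(new[] { ' ' }, 2);
        return new PersonV2
        {
            FirstName = parts[0],
            LastName = parts.Length > 1 ? parts[1] : ""
        };
    }
}
```

The conversion works regardless of the serialization format, since the serializer only ever sees whichever model matches the file.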

dkf

@magus said in Versioning serializable things:

Currently, I don't know the serialization format, but it's safe to assume that it's either XML or JSON.

Check that first!

Seriously, there are so many different serialisation formats, and some are much more of a problem than others.

In general, you've got to decide if you're only ever going to have old data being handled by new code, or if you will ever have to handle new data with old code. The latter is the awful case, whereas the former is just a matter of having sensible defaults, and of persuading the serialisation system not to throw a wally when things don't match up (that's something it should have the option to do; this sort of thing does tend to be thought of, even if it makes things more complicated).
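A minimal sketch of the easy case, assuming the format turns out to be XML and using a hypothetical `Settings` model: new code reads an old file and falls back to a default for anything the old file never had.

```csharp
using System.Xml.Linq;

// Hypothetical model; the names are illustrative, not from the project.
public class Settings
{
    public string Name { get; set; }
    public int RetryCount { get; set; } // added in a later version
}

public static class SettingsLoader
{
    public static Settings Load(XElement root)
    {
        return new Settings
        {
            // The casts return null when the element is absent,
            // so ?? supplies the default instead of throwing.
            Name = (string)root.Element("Name") ?? "unnamed",
            RetryCount = (int?)root.Element("RetryCount") ?? 3
        };
    }
}
```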

blakeyrat

@magus said in Versioning serializable things:

I'm more looking for info on my approach.

Well frankly, reading the OP the reason I didn't reply is that you have no clue what you're doing.

Look, currently you don't even know how the data gets serialized. Don't you think that's KIND of important when talking about how to version the serialization? (For one thing, XSDs have version numbers and it's an entirely solved problem there. But you don't even know if you're using XML, much less whether there's an XSD describing the data!)

It's like the famous fence quote which I'm too lazy to look up but will paraphrase: you don't remove the fence until you know why the fence was put up in the first place. In this case, you shouldn't mess with the serialization until you know exactly how and why it's done how it's done. What format is the data in? What other systems access it? Does it ever get put in a database? Etc.

Magus

@dkf @blakeyrat: The one part of this that I'm absolutely certain I can do is write some code that updates a serialized file to a new version. No matter the format, that part I can do.

Right now, actually doing that isn't important: while I may have to do it in a week or two at worst, right now I'm just trying to plan my approach.

Specifically, the part where I could have files in several old formats, and need to be able to detect which transformations I need and apply them in order, while keeping them as isolated as possible. That's what my OP is displaying, though clearly I did not explain that very well initially.

blakeyrat

@magus Right; but if you're serializing to XML now, you probably don't want to be doing that. You want to be writing an XML transform to upgrade your XML from version X to version X+1.

External consumers of the XML won't get your code changes. They can (unless they're retarded) run an XML transform.

Magus

@blakeyrat It's all serverside in either case, but that does make sense. That is what XSLT is for.
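A minimal sketch of running such a transform from C#, assuming a hypothetical version 1 to version 2 stylesheet that renames `Name` to `FullName` and bumps a `version` attribute on the root (both invented for illustration):

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class XsltUpgrade
{
    // Hypothetical stylesheet: copy everything, except for the two overrides below.
    public const string V1ToV2 = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='/Person/@version'>
    <xsl:attribute name='version'>2</xsl:attribute>
  </xsl:template>
  <xsl:template match='Person/Name'>
    <FullName><xsl:apply-templates select='@*|node()'/></FullName>
  </xsl:template>
</xsl:stylesheet>";

    // Runs the stylesheet over the input and returns a new document.
    public static XDocument Apply(string stylesheet, XDocument input)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(stylesheet)))
            transform.Load(xsltReader);

        var output = new XDocument();
        using (var reader = input.CreateReader())
        using (var writer = output.CreateWriter())
            transform.Transform(reader, writer);
        return output;
    }
}
```

Because the upgrade lives in a stylesheet rather than in C#, anything that can run XSLT 1.0 can apply the same upgrade.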

masonwheeler

@blakeyrat said in Versioning serializable things:

It's like the famous fence quote which I'm too lazy to look up but will paraphrase: you don't remove the fence until you know why the fence was put up in the first place.

Chesterton's Fence:

There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, "I don't see the use of this; let us clear it away." To which the more intelligent type of reformer will do well to answer: "If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it."

-- G. K. Chesterton

blakeyrat

@masonwheeler said in Versioning serializable things:

Chesterton's Fence:

Yeah, it's one of those quotes that's super relevant and important and at the same time incredibly non-pithy.

Chesterton needed an editor.

Magus

Update:

Instead of a database, we apparently store data automatically serialized with C#'s DataContractSerializer. Which would be fine, but they won't give us the data we need, so the schema has to change somewhat frequently.

Also, apparently we may not know where the files are on disk.

All the information I can find is people doing dumb things, like just deprecating properties but keeping them around, and redirecting them to new properties on deserialization.
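For reference, the pattern those answers describe looks roughly like this with DataContractSerializer. `Person`, `FullName`, and the old `Name` element are hypothetical stand-ins, not our actual models:

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

[DataContract(Name = "Person", Namespace = "")]
public class Person
{
    // Deprecated member: old files carry <Name>. It is kept only so they still
    // load; the setter forwards the value, and the getter returns null so that
    // EmitDefaultValue = false keeps it out of newly written files.
    [DataMember(Name = "Name", Order = 0, EmitDefaultValue = false)]
    private string LegacyName
    {
        get { return null; }
        set { if (value != null) FullName = value; }
    }

    // Current member.
    [DataMember(Order = 1)]
    public string FullName { get; set; }
}

public static class PersonXml
{
    public static Person Read(string xml)
    {
        var serializer = new DataContractSerializer(typeof(Person));
        using (var reader = XmlReader.Create(new StringReader(xml)))
            return (Person)serializer.ReadObject(reader);
    }

    public static string Write(Person person)
    {
        var serializer = new DataContractSerializer(typeof(Person));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, person);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

DataContractSerializer reads members in a fixed order, which is why `Order` is set explicitly on both. The cost of the pattern is visible even in this small sketch: every renamed member leaves a dead property behind in the model.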

WHY would a database have been so bad? This is stupid!