Handling validation on historical data



  • At time t = 1, the business rules had validation requirements X and Y. Those were checked going in (on creation), but were never checked on reconstitution.

    At time t = 2, business requirement Z was added. So now you've got entities that were valid at t = 1 but are not valid at t = 2. Still no big deal, because they're still not checked on reconstitution. It does lead to some messiness, though, when you have to use or transform these potentially-not-quite-valid things elsewhere in the code.

    ...

    At time t = now, with requirements X, Y, Z, A, ..., we want to start enforcing validation whenever an entity is constructed, whether reconstituted or newly created, because the messiness inherent in "is this in a valid state" is causing lots of problems. But now we have all this data that only meets some subset of the requirements, or that was never valid at all (but crept through a flaw in validation back at time t = N). Naively validating this stuff makes it all blow up.

    Options:

    1. Ignore invalid data on reconstitution. Downside--user loses data that was valid when entered and that may not (for other reasons) be fixable.
    2. Include a factory-type method to bypass validation under certain circumstances. Downside--now we've got the same problem and can't rely on entities being valid.
    3. Have N different types: EntityValidAtT1, EntityValidAtT2, ... Downside...ugh. Ugh. Ugh.
    4. Underpants?

    More specifically, we're shifting from an old, rather incoherent data model to a new, much more coherent and well-specified one. So the big issue comes in trying to convert one to the other.


  • Considered Harmful

    @Benjamin-Hall backfill with defaults, if possible.
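
    A minimal sketch of what backfilling could look like, assuming records come back from storage as plain dicts. The Entity fields, the DEFAULTS table, and the reconstitute helper are all made up for illustration; the point is only that the defaults get merged in first and the normal validation still runs afterwards.

```python
from dataclasses import dataclass

# Hypothetical defaults for fields that later requirements introduced.
DEFAULTS = {"country": "unknown", "tags": ()}


@dataclass(frozen=True)
class Entity:
    name: str
    country: str
    tags: tuple

    def __post_init__(self):
        # The same checks run for new and reconstituted entities.
        if not self.name:
            raise ValueError("name is required")
        if not self.country:
            raise ValueError("country is required")


def reconstitute(row: dict) -> Entity:
    """Fill missing or null fields with defaults, then validate as usual."""
    present = {key: value for key, value in row.items() if value is not None}
    merged = {**DEFAULTS, **present}
    return Entity(
        name=merged.get("name", ""),
        country=merged["country"],
        tags=tuple(merged["tags"]),
    )
```

    Fields with no sensible default (name, in this sketch) still fail validation, so this only covers part of the old data.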



  • I'd say that during translation into the new format, if validation fails, put the record separately in a naughty corner and have wetware fix it if the data must not be lost. Otherwise, just toss the data that doesn't validate. Anything else will lead to headaches in the future.
    There may also be errors that can be resolved automatically. Depending on how important correctness is, you can just translate that data over without wetware.
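
    Roughly, the migration pass could look like the sketch below. Everything in it is an assumption for illustration: records are plain dicts, and validate, auto_fix, and migrate are hypothetical names standing in for the real rules. Records that still fail after the automatic fix go into a quarantine list together with their errors, so a human can review them later.

```python
def validate(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record is valid."""
    errors = []
    if not record.get("name"):
        errors.append("name is required")
    quantity = record.get("quantity", 0)
    if not isinstance(quantity, int) or quantity < 0:
        errors.append("quantity must be a non-negative integer")
    return errors


def auto_fix(record: dict) -> dict:
    """Apply repairs considered safe without human review (here: trim whitespace)."""
    fixed = dict(record)
    if isinstance(fixed.get("name"), str):
        fixed["name"] = fixed["name"].strip()
    return fixed


def migrate(old_records):
    """Split old records into migrated ones and a quarantine for manual review."""
    migrated, quarantined = [], []
    for record in old_records:
        candidate = auto_fix(record)
        errors = validate(candidate)
        if errors:
            # Keep the untouched original plus the reasons it was rejected.
            quarantined.append((record, errors))
        else:
            migrated.append(candidate)
    return migrated, quarantined
```

    If losing the data is acceptable, the quarantine list can simply be dropped (or logged) instead of being handed to a person.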

