SQL help: Building changelists from history table

djls45

I have a table of information, and I need to get a list of all the changes, showing both the old and new values in one row of the results. In this situation, I'm using Oracle's SQL Developer, so I need the query in PL/SQL.

The results need to be in the following format:
mt_user, mt_timestamp, new_value, old_value, is_deleted
new_value and old_value both come from the mt_data below, and need to be consecutive values. That is, if I inserted the data as "A", then boomzilla updated it to "B", then blakeyrat updated it to "C", then I deleted it, the results should look like this:

mt_user	mt_timestamp	new_value	old_value	is_deleted
djls45	2018-04-01 10:43:52	A		N
boomzilla	2018-04-02 10:41:36	B	A	N
blakeyrat	2018-04-03 10:44:14	C	B	N
(null)	2018-04-04 10:41:22		C	Y

I can do all the date formatting stuff, but I need help getting the correct values for old_value and new_value in the right rows and only the right rows.

The way it's set up, the tables looks like this:

CREATE TABLE my_table ( -- current values
    mt_id INT NOT NULL,
    mt_data DATE NOT NULL,
    mt_user VARCHAR(100) NOT NULL,
    mt_update DATE NOT NULL,
    CONSTRAINT mt_pk PRIMARY KEY (mt_id)
);
CREATE TABLE my_table_history ( -- audit history, filled via DML triggers on my_table
    mt_id INT NOT NULL,
    mt_data DATE NOT NULL,
    mt_user VARCHAR(100) NOT NULL,
    mt_update DATE NOT NULL,
    mt_timestamp DATE NOT NULL,
    mt_action VARCHAR(1) NOT NULL
);

The database has triggers that save a copy of the new row into my_table_history whenever anything changes in my_table, include the timestamp of the change into mt_timestamp, and store the type of action that produced the change into mt_action — 'I' for INSERT, 'U' for UPDATE, and 'D' for DELETE.

So far, I think I want something like this:

SELECT COALESCE(h1.mt_user, h2.mt_user) AS mt_user
     , COALESCE(MAX(h1.mt_timestamp), h2.mt_timestamp) AS mt_timestamp --can I have a max() inside a coalesce()?
     , h1.mt_data new_value
     , h2.mt_data old_value
     , CASE WHEN h1.mt_action = 'D' THEN 'Y' ELSE 'N' END is_deleted
FROM my_table_history h1
FULL JOIN my_table_history h2 ON h1.mt_id = h2.mt_id AND h1.mt_timestamp > h2.mt_timestamp
GROUP BY ???
ORDER BY 2;

But I think I probably need to somehow use subqueries or more joins to limit it to just the rows I need with the right data. But unlike T-SQL, PL/SQL subqueries can't access columns that are not within the subquery.

LaoC

Priming at work: I briefly misread that as "evangelists".

Lorne Kates

There's a proper subform for Oracle queries. :/

But seriously, though. Looks like you need to left outer join the table to itself with a ranking. I'll try to think of the oracle later when I'm not brainfried. In pseudo SQL it'd be something like

SELECT mt_user, mt_date, t1.value as old_value, t2.value as new_value
FROM
(
SELECT RANK() OVER (PARTITION BY mt_data ORDER BY dt) as t1rank, *
FROM mt_table as t1
) as t1
LEFT OUTER JOIN
(
SELECT RANK() OVER (PARTITION BY mt_data ORDER BY dt) as t2rank, * FROM mt_table
) as t2 ON t1.mt_data = t2.mt_data AND t1rank = (t2rank - 1)

I might be arsing up which one gets -1 and which one is old,new but you get the idea.

izzion

@djls45

I think this might do what you want?

;WITH cte_ranked_history AS (
    SELECT 
        mt_id,
        mt_user,
        mt_timestamp,
        mt_data,
        mt_action,
        RANK() OVER ( PARTITION BY mt_id ORDER BY mt_timestamp ASC ) AS rank
    FROM my_table_history
)
SELECT 
    rth.mt_user AS mt_user,
    rth.mt_timestamp AS mt_timestamp,
    rth.data AS new_value,
    rthold.data AS old_value,
    CASE WHEN rth.mt_action = 'D' THEN 'Y' ELSE 'N' END is_deleted
FROM cte_ranked_history rth
LEFT JOIN cte_ranked_history rthold
ON rth.mt_id = rthold.mt_id
AND rth.rank + 1 = rthold.rank

Scarlet_Manuka

@lorne-kates More importantly, where you're using mt_data, you should be using mt_id instead.

@djls45 Some questions on datatypes: Is mt_data really supposed to be a DATE column? Why are you using VARCHAR and not VARCHAR2? And do you potentially need better than 1-second resolution on mt_timestamp? If so you should use one of the TIMESTAMP types rather than DATE.

I don't see any mention of what mt_update is used for and it doesn't seem to be relevant here so I'll ignore it for now.

Since mt_data is marked NOT NULL, what do you store in there for a delete operation? Storing null in this case would require less special-casing in the query logic.

This is what I put together and tested successfully:

with ranked_history as 
   (select hist.*, rank() over (partition by mt_id order by mt_timestamp) mt_rank
    from my_table_history hist)
select curr.mt_id, curr.mt_user, curr.mt_timestamp, decode(curr.mt_action, 'D', null, curr.mt_data) new_value, 
prev.mt_data old_value, decode(curr.mt_action, 'D', 'Y', 'N') is_deleted
from ranked_history curr left outer join ranked_history prev 
on (curr.mt_id = prev.mt_id and curr.mt_rank = prev.mt_rank + 1)
order by curr.mt_id, curr.mt_timestamp;

ETA: ... pretty much the same as @izzion's solution.

blakeyrat

@djls45 Questions:

Why is data a date? (I mean, if it is, it is, but that's an awful weird name for it... and your example has non-dates in the output...)
Why is the username stored as an nvarchar? (Is it an application user, or a SQL user? In the latter it might make sense.)
What is the output supposed to be if another user changes the value inbetween djls45 changing the value? (And how do you tell?)
What's the data date(?) when your change is a delete? It's marked not-null...

Scarlet_Manuka

@blakeyrat Heh, pretty similar to the questions I was asking.

For (3) I did assume that the history table stores the username responsible for the change, and that we want to see this in the output listing for that change. I can't really conceive of any other likely possibility.

blakeyrat

@scarlet_manuka There's not enough info in the example given to determine what to do if different users change the same record. There's also nothing in the data I can figure which identifies which record is changed... right? Or am I crazy?

@scarlet_manuka said in SQL help: Building changelists from history table:

For (3) I did assume that the history table stores the username responsible for the change, and that we want to see this in the output listing for that change. I can't really conceive of any other likely possibility.

Right; but are we filtering on just that user name? If user "al" changes a record, then "bob" changes it back, how could you tell since there's nothing in the history table to identify which record mt_data was a part of originally? There's no like mt_data_id which would let you group sensibly. Or am I crazy?

The delete thing perplexes me too. Presumably if a record is deleted, the data ought to be null, right? Or does the previous data go into my_table_history.mt_data? If the latter, these tables badly need better column names.

Hopefully one of these answers helps djls45 but as far as I'm concerned, there's too many question marks about the problem for me to even attempt an answer. (Which is just as well, since I don't know Oracle SQL in the first place. I'm just confused about what we're doing in this thread.)

EDIT: it just occurred to me that the mt_id in the history table is probably a foreign key into the my_table table and not a primary key for the history table with a weird name. Is that correct? If so that solves one of my confusions. Also you should properly mark your foreign keys; the DB can optimize based on those.

djls45

@Lorne-Kates I didn't know Oracle SQL even had RANK() OVER (PARTITION BY ... ORDER BY ...). I looked it up, and apparently it's standard SQL. TIL :D

djls45

@izzion That seems like it'll probably work. SQL Developer doesn't like having the semicolon at the beginning, though.
Also, TIL that Oracle's SQL includes CTEs. (I haven't actually sat down to figure out how they conceptually work.) :D

djls45

@Scarlet_Manuka said in SQL help: Building changelists from history table:

@djls45 Some questions on datatypes: Is mt_data really supposed to be a DATE column? Why are you using VARCHAR and not VARCHAR2? And do you potentially need better than 1-second resolution on mt_timestamp? If so you should use one of the TIMESTAMP types rather than DATE.

In this case, yes, it's a DATE. I have some other fields that are other types, but I figured that if I could get this one working, the rest should be a breeze. As for VARCHAR vs. VARCHAR2 or DATE vs. TIMESTAMP, those are what the database uses, and I can't change that. :(

I don't see any mention of what mt_update is used for and it doesn't seem to be relevant here so I'll ignore it for now.

It and mt_timestamp are basically duplicate fields, but mt_update is used in the current values table (my_table) by the system, and mt_timestamp is only in the history table, and is populated by the DML triggers.

Since mt_data is marked NOT NULL, what do you store in there for a delete operation? Storing null in this case would require less special-casing in the query logic.

A delete just removes the row from the current values table, and the state of the row at the time of deletion is saved to the history table with a 'D' in mt_action.

djls45

@blakeyrat said in SQL help: Building changelists from history table:

@djls45 Questions:

Why is data a date? (I mean, if it is, it is, but that's an awful weird name for it... and your example has non-dates in the output...)

Because that's the field. my example was just using easy placeholders. I think it's easier to see that 'A', 'B', and 'C' are different than having to parse apart a date format.

Why is the username stored as an nvarchar? (Is it an application user, or a SQL user? In the latter it might make sense.)

It's actually a varchar, not an nvarchar. Don't ask me why. I didn't design it. It can contain application user IDs or SQL user IDs. The application automatically fills it, but the SQL users have to manually enter their own ID in there whenever doing DML operations (only on the current values table, though; the history table is updated only via the DML triggers on the current values table).

What is the output supposed to be if another user changes the value inbetween djls45 changing the value? (And how do you tell?)

Then their username appears on the row with their change. I'll update the OP.

What's the data date(?) when your change is a delete? It's marked not-null...

The last values in the current values table are copied to the history table with a 'D' in the action field, and the row is removed from the current values table.

djls45

@blakeyrat said in SQL help: Building changelists from history table:

@Scarlet_Manuka There's not enough info in the example given to determine what to do if different users change the same record. There's also nothing in the data I can figure which identifies which record is changed... right? Or am I crazy?

@Scarlet_Manuka said in SQL help: Building changelists from history table:

For (3) I did assume that the history table stores the username responsible for the change, and that we want to see this in the output listing for that change. I can't really conceive of any other likely possibility.

Right; but are we filtering on just that user name? If user "al" changes a record, then "bob" changes it back, how could you tell since there's nothing in the history table to identify which record mt_data was a part of originally? There's no like mt_data_id which would let you group sensibly. Or am I crazy?

There's some other data in the results that indicate exactly which record was changed. I just left them out to focus on the specific problem I was having. I figured mt_id as PK should be enough of an indicator.

The delete thing perplexes me too. Presumably if a record is deleted, the data ought to be null, right? Or does the previous data go into my_table_history.mt_data? If the latter, these tables badly need better column names.

Delete is the only DML operation whose trigger saves the previous data. The insert and update triggers save the new info being saved in my_table to my_table_history.

Hopefully one of these answers helps djls45 but as far as I'm concerned, there's too many question marks about the problem for me to even attempt an answer. (Which is just as well, since I don't know Oracle SQL in the first place. I'm just confused about what we're doing in this thread.)

I think they do. @izzion and @Scarlet_Manuka's solution seems like it'll probably work. With a little tweaking, @Lorne-Kates's solution also seems like it should work, too.

EDIT: it just occurred to me that the mt_id in the history table is probably a foreign key into the my_table table and not a primary key for the history table with a weird name. Is that correct? If so that solves one of my confusions. Also you should properly mark your foreign keys; the DB can optimize based on those.

The history tables have no foreign keys. Nor primary keys, either. That allows other data (including application configuration) to be deleted from the current values tables with a minimum of hassle and still maintain records of what used to be there.

djls45

@blakeyrat said in SQL help: Building changelists from history table:

Or am I crazy?

We're all in the looney bin. But I'm not stuck in here with you. You're stuck in here with... THEM! :P

PleegWat

Rather than using a self-join, you may want to look at the LEAD() and LAG() analytic functions.

LAG(mt.data) OVER (order by mt.timestamp partition by mt.id) should give you the data of the previous row.

izzion

@djls45
Ah, the semicolon before the WITH is a t-sql habit because t-sql gets pissy if the WITH statement isn’t explicitly the first statement in a group, and requires the previous statement (if any) is semicolon-terminated or batch terminated (e.g. GO). So ;WITH becomes the habit when using CTEs in t-sql.

The rest of the syntax should be ANSI standard.

izzion

@pleegwat
Also, this. I always forget about those functions, lol.

How does LAG break down in the case of the initial entry? Will it return NULL for the old value?

PleegWat

@izzion Yup.

blakeyrat

@djls45 said in SQL help: Building changelists from history table:

Delete is the only DML operation whose trigger saves the previous data. The insert and update triggers save the new info being saved in my_table to my_table_history.

So to interpret the value in mt_data you need to look at the value in mt_action? Ouch.

@djls45 said in SQL help: Building changelists from history table:

The history tables have no foreign keys. Nor primary keys, either. That allows other data (including application configuration) to be deleted from the current values tables with a minimum of hassle and still maintain records of what used to be there.

Then what value goes into mt_id in the history table, if it's not a primary key or a foreign key? Moreover, if it's neither of those, why is it named "id"?

Can we go back to stratch for this history table and start over? Haha.

djls45

@blakeyrat said in SQL help: Building changelists from history table:

@djls45 said in SQL help: Building changelists from history table:

Delete is the only DML operation whose trigger saves the previous data. The insert and update triggers save the new info being saved in my_table to my_table_history.

So to interpret the value in mt_data you need to look at the value in mt_action? Ouch.

It shows what the state of the row was immediately before it was deleted, so while it may not be the best way to do it, it's also not the worst.
What is a problem (that has been fixed in newer versions of the application/stored procedures) is that the DELETE trigger doesn't include the ID of the user who performed the DELETE in the mt_user field.

@djls45 said in SQL help: Building changelists from history table:

The history tables have no foreign keys. Nor primary keys, either. That allows other data (including application configuration) to be deleted from the current values tables with a minimum of hassle and still maintain records of what used to be there.

Then what value goes into mt_id in the history table, if it's not a primary key or a foreign key? Moreover, if it's neither of those, why is it named "id"?

mt_id in the history table is the same value as the primary key for the current values table, but it's specifically not a primary key in its own table so that the full history can be stored. If it were a foreign key to the PK of the current table, then we couldn't ever delete anything from the current table.

Can we go back to stratch for this history table and start over? Haha.

Do you have recommendations for how to do things better? The only thing I can think of is perhaps make mt_id and mt_timestamp together the PK for the history table to increase performance.

blakeyrat

@djls45 said in SQL help: Building changelists from history table:

Do you have recommendations for how to do things better?

Well I only know T-SQL but a few:

I get your problem about foreign keys preventing deletes, but one problem you have is that your delete entries in your history table refer to IDs in the original table that may not exist any longer, and possibly have even been recycled to store an entirely different row. (I don't know what guarantees Oracle makes, but SQL Server reserves the right to recycle IDs at any time.)

In SQL Server, I'd solve that by marking the foreign key as ON DELETE SET NULL which does pretty much what it says-- it won't prevent you from deleting the original row, but if you do delete the original row it'll set the mt_id in the history table to null. That's probably the correct behavior in your case. No matter how you solve it, holding an ID to a deleted row is almost certainly going to bite you in the ass sooner or later.

Having columns that are multi-purpose really rubs me the wrong way. Right now, you can't do anything meaningful with the mt_data column unless you also look at the mt_action column. In the history tables I make, the history table always has the historical data, which solves this problem. You don't need the current table value in the history table, because you can just look at the current table.

Similarly, the mt_user column can contain either an application user or a database user. That also rubs me the wrong way. I'm not sure what the best way of fixing that would be, but at the very minimum you could use the column names to try to communicate what's happening. (In 99% of apps, whenever you see the word "user" in SQL, it refers to the app user and not the database user.)

PleegWat

@blakeyrat Oracle does not have identity columns the way MSSQL does. mt_id will be a number column filled from a functionally distinct sequence object. The database does not know whether values are still in use, and hence cannot recycle.

Having IDs in the history table may be useful even after the original row is deleted to recognize which history rows relate to the same object.

blakeyrat

@pleegwat Well surely Oracle also has a way of telling foreign keys to do something sensible if the primary is deleted other than simply blocking the delete right? Or am I expecting too much from Oracle?

PleegWat

@blakeyrat ON DELETE SET NULL and ON DELETE CASCADE exist and probably work as you'd expect.

djls45

@blakeyrat @PleegWat Our older versions of the application use Oracle databases. Our newer versions use SQL Server. For this, I'm working on one of the oldest we have.

The history table is not ever to be updated or deleted, because that's supposed to be our tracking for audits, which can get really nitpicky with all the regulations around them (and why I'm really surprised that the null user for deletions wasn't fixed until only late last year at the earliest).

Because it's an audit table (maybe think of it as sort of like a journaling system), it has to contain the current values as well as all the historical ones. If something ever happens to the current values table (which has happened before), we need to be able to reconstruct it from the audit table. We do have regular backups of our databases, but the lag time to restore one from those is longer than most of our clients (and possibly the regulatory agencies involved) will accept in most circumstances.

Would an ON DELETE SET NULL or ON DELETE CASCADE trigger the UPDATE/DELETE trigger for the tables with a foreign key to the affected one? If not, then that's why we don't use them.

The application handles most of the data control to the database, but occasionally, we have to perform a manual data management task, so we save the username there in both cases.

PleegWat

@djls45 said in SQL help: Building changelists from history table:

Would an ON DELETE SET NULL or ON DELETE CASCADE trigger the UPDATE/DELETE trigger for the tables with a foreign key to the affected one? If not, then that's why we don't use them.

Yes, ON DELETE actions cause triggers to fire. But that's not what @blakeyrat wants you to do. @blakeyrat wants you to ON DELETE SET NULL the mt_id in the audit table to free up the sequence numbers. Which doesn't make sense in an oracle context because sequences don't recycle.

blakeyrat

@djls45 said in SQL help: Building changelists from history table:

The history table is not ever to be updated or deleted, because that's supposed to be our tracking for audits,

Well... of course. Do you think I proposed doing that?

@djls45 said in SQL help: Building changelists from history table:

Because it's an audit table (maybe think of it as sort of like a journaling system), it has to contain the current values as well as all the historical ones.

That's the weird part. It's like a "history and current database". (I actually had a company whose database worked similarly, but they actually used a totally different DBMS for the "rolling history" part-- it went in MongoDB of all things.)

Seems like what you need is one of those newfangled temporal databases, but of course that didn't really exist when this was being designed.

@djls45 said in SQL help: Building changelists from history table:

Would an ON DELETE SET NULL or ON DELETE CASCADE trigger the UPDATE/DELETE trigger for the tables with a foreign key to the affected one? If not, then that's why we don't use them.

Well it should, a delete is a delete. Changing the foreign key behavior doesn't change the fact that you're deleting a row.

@pleegwat said in SQL help: Building changelists from history table:

Yes, ON DELETE actions cause triggers to fire. But that's not what @blakeyrat wants you to do. @blakeyrat wants you to ON DELETE SET NULL the mt_id in the audit table to free up the sequence numbers.

Correct; holding an ID to an identity column where the IDs can be recycled is the blatant error I was spotting, because I did not know this:

@pleegwat said in SQL help: Building changelists from history table: