PostgreSQL, maintaining count metadata

cartman82

It seems I'll need to begin maintaining row counts metadata for some of my auto-generated tables.

Good news: All my interactions with the tables are through stored functions, so I should be able to just update a row every time I do something and be pretty sure counts are correct

Bad news: It's still pain in the ass. Also, I'm worried about concurrency issues.

Let's get concrete. Here's a toy setup that illustrates the issue:

CREATE TABLE category_1390 (
	resource character varying
);

CREATE TABLE category_2463 (
	resource character varying
);

CREATE TABLE metadata (
	id serial,
	category character varying,
	count integer
);

CREATE OR REPLACE FUNCTION fn_move(from_tbl character varying, to_tbl character varying, move_count integer)
RETURNS VOID AS
$BODY$
	DECLARE
		moved integer;
	BEGIN
		EXECUTE '
			WITH
				deleted AS (
					DELETE FROM ' || quote_ident('category_' || from_tbl) || '
					WHERE resource IN (
						SELECT resource
						FROM ' || quote_ident('category_' || from_tbl) || '
						LIMIT $1
					)
					RETURNING *
				),
				inserted AS (
					INSERT INTO ' || quote_ident('category_' || to_tbl) || '(resource)
					SELECT deleted.resource
					FROM deleted
					RETURNING * 
				)
			SELECT *
			FROM inserted;
		'
		USING move_count;

		GET DIAGNOSTICS moved = ROW_COUNT;
		RAISE NOTICE 'Moved rows: %', moved;

		UPDATE metadata SET count = count - moved WHERE category = from_tbl;
		UPDATE metadata SET count = count + moved WHERE category = to_tbl;
	END
$BODY$
	LANGUAGE plpgsql VOLATILE
	COST 100;

Used like this:

INSERT INTO category_1390(resource) VALUES ('a'), ('b'), ('c');
INSERT INTO metadata(category, count) VALUES ('1390', 3), ('2463', 0);
SELECT fn_move('1390', '2463', 1);
SELECT category, count FROM metadata;

-- Output:
-- 1390;2
-- 2463;1

So fn_move will move count records from one table to another and update the metadata. This works in principle. But...

I'm, worried about concurrency. What happens when dozens of functions like this are running at the same time? The best I can tell, update should lock the rows and perform += or -= operations atomically. But I'm still not sure if this will work in EVERY combination of update functions running at the same time.

I suppose I could set up some kind of toy simulation, leave it to run for a day and tally the results. But that will take a LOT of time. It will not really test edge case conditions (eg. running in a different transaction isolation level comes to mind). And even if I find discrepancies, it won't tell me how to fix them.

So... anyone has any experience with stuff like this? Any red flags you can spot off the cuff?

cartman82

Some further thinking...

Let's say we have two fn_move calls at the same time. Each transaction operating in the default READ COMMITTED isolation level.

From http://www.postgresql.org/docs/9.3/static/transaction-iso.html:

Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.

So... let's call it fn_A, comes to the UPDATE metadata part. It only sees the metadata from before it started, not any changes fn_B might have made. So it updates metadata to old_count + changes_made_by_A. The same with fn_B - it sets the count to old_count + changes_made_by_B. So, basically whoever finishes last gets to set the new value, overwriting the updated value of other transaction.

Right? Or will the row lock?

Not sure...

boomzilla

The simple / straightforward solution is SELECT FOR UPDATE. You could do that on the metadata rows you're going to update at the start. Of course, you could still get into deadlocking, and it might kill performance, but if you really need that sort of thing...

Better solutions might be views or materialized views, I think. Hard to say without knowing more about what's going on.

cartman82

@boomzilla said:

The simple / straightforward solution is SELECT FOR UPDATE. You could do that on the metadata rows you're going to update at the start. Of course, you could still get into deadlocking, and it might kill performance, but if you really need that sort of thing...

Hmm, would I be able to just lock the rows I want to change? That would actually be ok, performance-wise.

But how do I do that? If I just add SELECT FOR UPDATE at the start of that fn, that lock won't hold until the end of transaction. As far as I understand, everything would need to be one query that contains the SELECT FOR UPDATE part. Right?

@boomzilla said:

Better solutions might be views or materialized views, I think. Hard to say without knowing more about what's going on.

I'm not sure how views could help.

I intentionally reduced to problem to the barest minimum of what I need done. Needless to say, there will be hundreds of tables and many different kinds of fn_move - like functions and many more columns etc. I just need to keep counts on a bunch of tables because COUNT(*) is too slow and VACUUM ANALYZE too imprecise.

boomzilla

@cartman82 said:

If I just add SELECT FOR UPDATE at the start of that fn, that lock won't hold until the end of transaction.

That's exactly what it's supposed to do, though I've never worked on pg.

@cartman82 said:

I'm not sure how views could help.

Just use a view to calculate that metadata when you need it instead of storing it in a table.

@cartman82 said:

I just need to keep counts on a bunch of tables because COUNT(*) is too slow

That answers that, then.

Hmm...can you do count( my_pk_column ) or something similar? I know on Oracle, at least, that can be pretty fast.

cartman82

@boomzilla said:

That's exactly what it's supposed to do, though I've never worked on pg.

I'll test it. I'm working on a different approach ATM, so might not be needed.

@boomzilla said:

Hmm...can you do count( my_pk_column ) or something similar? I know on Oracle, at least, that can be pretty fast.

Doesn't seems so.

Slow Counting - PostgreSQL wiki

COUNT seems to be PostgreSQL-s Achilles heel.

Working on a different idea ATM, will update as soon as coded.

cartman82

What about this:

CREATE TABLE metadata_updates (
	id serial,
	category character varying,
	delta integer
);

CREATE OR REPLACE FUNCTION fn_update_metadata()
RETURNS VOID AS
$BODY$
	BEGIN
		WITH
			deleted AS (
				DELETE FROM metadata_updates
				RETURNING category, delta
			)
		UPDATE metadata
		SET count = count + total_deltas.total_delta
		FROM (
			SELECT category, SUM(delta) AS total_delta
			FROM deleted
			GROUP BY category
		) total_deltas
		WHERE metadata.category = total_deltas.category;
	END
$BODY$
	LANGUAGE plpgsql VOLATILE
	COST 100;

Inside fn_move():

		GET DIAGNOSTICS moved = ROW_COUNT;
		RAISE NOTICE 'Moved rows: %', moved;

		INSERT INTO metadata_updates(category, delta)
		VALUES
			(from_tbl, -1 * moved),
			(to_tbl, moved);

Then we can do this:

INSERT INTO category_1390(resource) VALUES ('a'), ('b'), ('c');
INSERT INTO metadata(category, count) VALUES ('1390', 3), ('2463', 0);

SELECT fn_move('1390', '2463', 1);

SELECT category, delta FROM metadata_updates;
-- 1390;-1
-- 2463;1

SELECT fn_update_metadata();

SELECT category, count FROM metadata;
-- 1390;2
-- 2463;1

So, fn_move() doesn't touch metadata directly but creates a log of all changes. That should be pretty thread-safe.

Then, occasionally (or when I want to get fresh stats), I call fn_update_metadata(). It consumes all rows from metadata_updates and sets metadata rows to new values. Should work atomically.

I'm more confident with this. Feels it should work.

PJH

I take it you've come across http://www.varlena.com/GeneralBits/120.php ?

cartman82

@boomzilla said:

That's exactly what it's supposed to do, though I've never worked on pg.

I tested this FOR SCIENCE!

CREATE OR REPLACE FUNCTION fn_lock_and_wait(id character varying, cat character varying)
RETURNS VOID AS
$BODY$
	DECLARE startts timestamp;
	BEGIN
		startts = clock_timestamp();
		
		RAISE NOTICE '[%-%] started at: %', id, cat, startts;

		PERFORM * FROM metadata WHERE category = cat FOR UPDATE;

		RAISE NOTICE '[%-%] lock obtained after: %', id, cat, (clock_timestamp() - startts);

		PERFORM pg_sleep(60);

		RAISE NOTICE '[%-%] sleep ended after: %', id, cat, (clock_timestamp() - startts);
	END
$BODY$
	LANGUAGE plpgsql VOLATILE
	COST 100;

-- in window A
SELECT fn_lock_and_wait('A', '1390');

-- in window B
SELECT fn_lock_and_wait('B', '1390');

-- in window C
SELECT fn_lock_and_wait('C', '2463');

NOTICE:  [A-1390] started at: 2015-03-11 14:14:01.073636
NOTICE:  [A-1390] lock obtained after: 00:00:00.000174
NOTICE:  [A-1390] sleep ended after: 00:01:00.05149

NOTICE:  [B-1390] started at: 2015-03-11 14:14:02.976742
NOTICE:  [B-1390] lock obtained after: 00:00:58.154496
NOTICE:  [B-1390] sleep ended after: 00:01:58.194356

NOTICE:  [C-2463] started at: 2015-03-11 14:14:04.779712
NOTICE:  [C-2463] lock obtained after: 00:00:00.000827
NOTICE:  [C-2463] sleep ended after: 00:01:00.017549

You were right, @boomzilla.

cartman82

@PJH said:

I take it you've come across http://www.varlena.com/GeneralBits/120.php ?

Yeah, it's linked from the url I gave boomzilla.

That's what prompted me to go with this approach. Except I'd rather avoid triggers if possible, so I'm trying to do it more elegantly with these functions.

Some useful advice for initialization logic. That part is gonna be a bitch.

cartman82

I feel obligated to update this, in the name of SCIENCE.

It turns out my initial idea with direct updates would have worked too.
Postgres won't allow UPDATE to be performed on "dirty" rows, where a different update was already performed.

See here: http://big-elephants.com/2013-09/exploring-query-locks-in-postgres/

Very interesting.

Still, I think I'll stick with my update-log solution. This behavior is dependent on transaction isolation level. If a client sets isolation to "Read uncommitted", UPDATE will be able to perform a "dirty read" and corrupt the metadata. With an update log, I'll feel much safer.

Filed under: Feels trump science yet again

blakeyrat

@cartman82 said:

Hmm, would I be able to just lock the rows I want to change?

I'm not a PostgreSQL expert, but in MS SQL the equivalent tag SELECT WITH UPDLOCK will lock every row the SELECT touches until the transaction exits.

I actually agree that a view sounds like a better idea here. You can easily make a view with an aggregation in it. EDIT: ok perhaps a performance issue here, damn.

ntqz

You could always look at using advisory locks as well for locking table rows prior to updates/inserts/deletes.Hell, they've even made a task queue out of it.