SQL - Updating current state to new state for complex relationships.


  • Banned

    @Bulb said in SQL - Updating current state to new state for complex relationships.:

    @Gąska It still makes sense because there is still just one user sitting in front of the application and doing any changes. And the background threads can still read the data for some processing in WAL mode.

    I meant in general, not just your use case. I can think of many scenarios where I'd want to have simultaneous in-memory-DB writes from different threads.



  • @Gąska Actually, in the use-case I had I could have parallelized some of the independent steps, but it was enough to parallelize some calculations and collect the results for writing in one thread. I think the (many) steps that involved pure SQL were I/O-bound anyway. So there are some use-cases where it would help, but I don't think they are that common. There are still a lot of them where it is overkill and simplicity wins.


  • Discourse touched me in a no-no place

    @Gąska said in SQL - Updating current state to new state for complex relationships.:

    I can think of many scenarios where I'd want to have simultaneous in-memory-DB writes from different threads.

    It's not a big problem most of the time. The big problems come if you're wanting to do very large updates at the same time as other updating activity, or if you want to upgrade locks (as you can get deadlocks if two threads want to upgrade at the same time). What idiot would upgrade locks? Well, it's actually depressingly common if you use the Xerial SQLite JDBC driver, because the semantics of JDBC requires holding a read lock a lot of the time by default and upgrading it to an exclusive write lock whenever you do an INSERT or UPDATE. (JDBC's pretty weird in what it does with transactions under the hood. I wish I didn't know about this stuff.)
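
    The upgrade pattern that bites looks roughly like this (a sketch with a made-up table, not the actual driver internals):

        -- connection A and connection B each do, at the same time:
        BEGIN;                                    -- deferred: no lock taken yet
        SELECT * FROM jobs WHERE state = 'new';   -- takes a shared read lock
        UPDATE jobs SET state = 'running';        -- now needs the write lock; with both
                                                  -- connections holding shared locks,
                                                  -- neither upgrade can proceed (SQLITE_BUSY)
        COMMIT;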

    By sorting out the locking semantics to be simple and sane and keeping the length of time that locks (especially write locks!) are held short, you can do pretty good multithreaded access. Very write heavy loads remain best done single threaded in SQLite with the rest of the application not touching the DB at all; sometimes to the point where you turn off transactions entirely (at which point you will be pure I/O limited).



  • @dkf Retrying transactions on deadlocks is kind of how it's taught it should be done, and you know the oldest transaction will never roll back this way, so you are guaranteed to make progress. But sometimes it is indeed more efficient to BEGIN IMMEDIATE and take the write lock from the start.
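
    Roughly, the immediate variant is (made-up table again; the retry on SQLITE_BUSY lives in the application):

        BEGIN IMMEDIATE;                          -- take the write lock up front;
                                                  -- on SQLITE_BUSY just retry the whole transaction
        UPDATE jobs SET state = 'running' WHERE state = 'new';
        COMMIT;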

    Also, keeping write transactions short is generally recommended for all database applications on all RDBMS. Because while the servers have finer lock granularity, there is usually more parallel activity going on too.


  • Banned

    @dkf said in SQL - Updating current state to new state for complex relationships.:

    Very write heavy loads remain best done single threaded in SQLite with the rest of the application not touching the DB at all; sometimes to the point where you turn off transactions entirely (at which point you will be pure I/O limited).

    The question is how much of that is inherent in the problem domain SQLite is solving, and how much of that is just a consequence of the assumption they made all those years ago that all CPUs ever are single-threaded.



  • @Gąska SQLite is best used in the sort of embedded spaces where you're using it in essentially a single-user case (e.g. the backing store for Skype's local copy of the conversation).

    You wouldn’t normally expect a lot of concurrency in those kinds of places, and full locking is (usually) fine.

    But you know as well as I do that people will misuse it in ways never intended. Play stupid games, etc.



  • @dkf said in SQL - Updating current state to new state for complex relationships.:

    sometimes to the point where you turn off transactions entirely

    I don't think it's possible. Not completely anyway. You can set read_uncommitted isolation, and you can reduce the locking granularity to table level for connections that share a cache, but the writes at table level are always in transactions. Because to SQLite, ‘no transactions’ just means automatically committing at the end of each statement.
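
    Concretely, the most you can relax it is something like this (shared-cache connections; the table is made up):

        PRAGMA read_uncommitted = 1;          -- may read rows that other shared-cache
                                              -- connections haven't committed yet
        INSERT INTO log(msg) VALUES ('x');    -- still runs in an automatic transaction
                                              -- that commits at the end of the statement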


  • Banned

    @Arantor again - are you unlikely to run into those problems because you won't see SQLite used in those domains, or do you not see SQLite used in those domains because you'd run into those problems?

    "This software works fine for what it's currently used for" is tautologically true for almost any software.



  • @Gąska said in SQL - Updating current state to new state for complex relationships.:

    "This software works fine for what it's currently used for" is tautologically true for almost any software.

    The "almost" is doing heavy duty there. Because I've used lots of software that was piss-poor at what it was being used for. Which included things it explicitly said it was supposed to be used for and was purchased (and supported!) for.

    TL;DR--educational software is almost all crap.



  • @Gąska You do see it used in those kinds of domains, actually. Because it's otherwise really pretty fast and handles even huge data sets with ease. Canonical even started building a multi-master distributed system on top of it called dqlite. I'm not sure how far they really got with it, but I think the original intent was to replace the etcd component (which is NoSQL itself) in the Kubernetes control plane.


  • Banned

    @Bulb really? Last time I checked SQLite was considered to be strongly on the slow side. But then, the last time I checked was over a decade ago.



  • @Gąska SQLite is really, really fast … except for commits, which are really, really slow. Which does make it a bit tricky to balance, because on one hand you want to combine writes into a single commit to reduce the overhead, and on the other you don't want to make the transactions too long in case another thread needs to do something as well. I think the WAL improved on that a bit too.
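
    A sketch of that balancing act (nothing here is specific to any particular schema):

        PRAGMA journal_mode = WAL;   -- readers no longer block the writer
        BEGIN;
        -- ... batch many small INSERTs/UPDATEs here ...
        COMMIT;                      -- the one slow fsync is paid once per batch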

    But the heaviest use-case I had was a bit special anyway. There we transformed some data in steps, each doing some complex selects and writing a new table. And we had each step in a single transaction, so if it failed, we would fix it, just restart the application, and it would know which steps were already done and which still needed to be done. So it couldn't run the steps in parallel, but that would have been a bit of a pain to implement in the dependency engine anyway. We did parallelize inside some of the steps where they involved calculation rather than just shuffling data around.



  • @djls45 said in SQL - Updating current state to new state for complex relationships.:

    @Gąska said in SQL - Updating current state to new state for complex relationships.:

    @Kamil-Podlesak said in SQL - Updating current state to new state for complex relationships.:

    At least not for all DDL statements; some of them are now "atomic" == they are actually done in a transaction, but only one statement per transaction :wtf-whistling: .

    Wait, then what was it doing before!?

    DDL that might execute immediately, or it might wait until your next DDL statement is entered, or it might queue up DDL to be executed with the next DML statement...?
    (How else can we make this terrible?)

    Tried to look it up. Like everything else about MySQL, it's a nowhere-differentiable bundle of edge cases.

    CREATE TABLE will cause the current transaction to commit, the table to be created, and the table creation to be committed. CREATE TEMPORARY TABLE won't cause commits, but the table will remain even if a ROLLBACK is issued. It doesn't say if a new transaction is started to contain anything that followed the create statement.

    But it's a whole page of stuff like:

    The CREATE TABLE statement in InnoDB is processed as a single transaction. This means that a ROLLBACK from the user does not undo CREATE TABLE statements the user made during that transaction.

    I think I get what it says: "Because a CREATE TABLE statement immediately commits, it cannot be rolled back." but it gives the impression that transactions could be nested (how else could you have multiple CREATE TABLE statements in a transaction)?
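
    i.e. something like this (made-up tables, MySQL/InnoDB):

        START TRANSACTION;
        INSERT INTO audit (msg) VALUES ('part of the transaction');
        CREATE TABLE t2 (id INT PRIMARY KEY);   -- implicit commit: the INSERT above is committed now
        ROLLBACK;                               -- undoes nothing; both the row and t2 survive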



  • @Watson said in SQL - Updating current state to new state for complex relationships.:

    I think I get what it says: "Because a CREATE TABLE statement immediately commits, it cannot be rolled back." but it gives the impression that transactions could be nested (how else could you have multiple CREATE TABLE statements in a transaction)?

    A DDL transaction is something completely different from an InnoDB DML transaction, so they can be nested.


  • Discourse touched me in a no-no place

    @Bulb said in SQL - Updating current state to new state for complex relationships.:

    I don't think it's possible. Not completely anyway.

    OK, yes, you're only really turning off the disk-based part of the transactions (PRAGMA synchronous = OFF, IIRC), but transactions without the guarantee that they're durable are a bit of a fig leaf. OTOH, if you've got a process whose only job is to make a DB from scratch that will contain a terabyte of data in intertwingled tables, and nothing's going to read from it until it is done, turning those syncs off is pretty sweet, and firing up a new Postgres database would be painful by comparison.
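
    The whole bulk-build recipe is roughly (a sketch; the data is whatever you're loading):

        PRAGMA journal_mode = OFF;    -- no rollback journal at all
        PRAGMA synchronous = OFF;     -- no fsyncs; only sane if a crash means "rebuild from scratch"
        BEGIN;
        -- ... the terabyte of INSERTs ...
        COMMIT;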

    @Bulb said in SQL - Updating current state to new state for complex relationships.:

    SQLite is really, really fast … except for commits, that are really, really slow.

    It's the syncs that make it slow. Really.



  • @dkf Yes, it is indeed the sync. The other databases are using some system-specific optimizations that SQLite does not.

    And SQLite isn't the only tool that gets hit by this. Debian has a utility called eatmydata that disables fsync for the purpose of testing installations in chroots, because dpkg is massively slowed by the syncs as well.


  • Discourse touched me in a no-no place

    @Bulb said in SQL - Updating current state to new state for complex relationships.:

    The other databases are using some system-specific optimizations that sqlite does not.

    There are additional tricks you can play when you're a service and can be reasonably sure you're not about to have the rug pulled out from under your feet. For example, they don't need to implement locks using POSIX file locking and can use whole directory structures for databases.

    The syncs aren't too big a problem unless you're doing a lot of write transactions, and defaulting to doing things safely is the right choice.




  • Java Dev

    @dkf Don't the big engines generally also want to be consistent in case of power outages and the like?

    One solution which I know is used is journalling - for any write transaction:

    1. New and modified data is written to new data blocks.
    2. Changes which need to be made to existing data structures are written to the journal.
    3. The journal is committed.
    4. Changes to existing data structure are written.

    If the database crashes before step 3, the transaction is lost. Step 3 is atomic. If the database crashes after step 3, then on the next startup the journal can be replayed, after which the transaction is retained and the database is in a valid state, even if structures would otherwise have been left in a corrupt half-written state due to the crash.

    I don't know how sqlite works though. If they keep data in memory until commit (rather than tentatively writing it to the data file) this would increase the IO load of the actual commit.


  • Discourse touched me in a no-no place

    @PleegWat said in SQL - Updating current state to new state for complex relationships.:

    I don't know how sqlite works though. If they keep data in memory until commit (rather than tentatively writing it to the data file) this would increase the IO load of the actual commit.

    It depends on which transaction mode you use, and what operations the filesystem supports. I've been told more details, but I really don't remember them.



  • @PleegWat said in SQL - Updating current state to new state for complex relationships.:

    @dkf Don't the big engines generally also want to be consistent in case of power outages and the like?

    One solution which I know is used is journalling - for any write transaction:

    1. New and modified data is written to new data blocks.
    2. Changes which need to be made to existing data structures are written to the journal.
    3. The journal is committed.
    4. Changes to existing data structure are written.

    If the database crashes before step 3, the transaction is lost. Step 3 is atomic. If the database crashes after step 3, then on the next startup the journal can be replayed, after which the transaction is retained and the database is in a valid state, even if structures would otherwise have been left in a corrupt half-written state due to the crash.

    I don't know how sqlite works though. If they keep data in memory until commit (rather than tentatively writing it to the data file) this would increase the IO load of the actual commit.

    My understanding is that the write-ahead log is something like this, while the old method copies the old data to a rollback journal, overwrites the main data, then discards the rollback data.
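
    The switch between the two strategies is just a pragma (SQLite):

        PRAGMA journal_mode = DELETE;   -- classic rollback journal: copy old pages out, overwrite in place
        PRAGMA journal_mode = WAL;      -- write-ahead log: append new pages, fold them in at checkpoint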

