The Revival of Great SQL Ideas


  • Banned

    @JBert shouldn't the crash cause the transaction to roll back, releasing all locks in the process?


  • Fake News

    @Gąska Only once the transaction has sat in the transaction log long enough that the RDBMS rolls it back for you, or if you do a manual intervention before restarting your program. Maybe it's slightly sooner if the SQL client in your program still has time to declare that it's going to close the connection, but unexpected crashes obviously don't guarantee that.


  • Banned

    @JBert said in The Revival of Great SQL Ideas:

    or if you do a manual intervention before restarting your program

    Tell me, how exactly did the crash happen? And why has it skipped all finalizers for DB-related objects? You're not writing your DB-interacting server app in C, are you?



  • @Gąska said in The Revival of Great SQL Ideas:

    @JBert said in The Revival of Great SQL Ideas:

    or if you do a manual intervention before restarting your program

    Tell me, how exactly did the crash happen? And why has it skipped all finalizers for DB-related objects? You're not writing your DB-interacting server app in C, are you?

    Sanitationary carbon unit initiated power down?
    (Or the cleaning lady yanked the damned power line for the web server again)


  • Banned

    @Carnage if power going down mid-transaction causes your DB to become unusable, you've got much bigger problems than an XML field.



  • @Gąska said in The Revival of Great SQL Ideas:

    @Carnage if power going down mid-transaction causes your DB to become unusable, you've got much bigger problems than an XML field.

    It won't be unusable. But if the transaction isn't rolled back or committed it'll be blocking further updates of that row until someone rolls it back. Either the database itself when it figures out that the client is MIA, or some meat-based solution.
    If you've got a few hundred thousand transactions flying by every minute, those locks could be a bit of a problem.
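
    For what it's worth, you can at least bound how long those orphaned locks live. A Postgres-flavoured sketch - the settings are real, the values made up:

    SET lock_timeout = '5s';                          -- give up waiting on a blocked lock after 5 seconds
    SET idle_in_transaction_session_timeout = '30s';  -- kill sessions left idle inside an open transaction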

    Now, I do realise that most systems won't have any issues with this. And deciding what is important is a part of the data modeling and design. For some systems, it's wrong to care if there is a write between yours and the last read you performed, while for others it's really, really important.


  • Banned

    @Carnage said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @Carnage if power going down mid-transaction causes your DB to become unusable, you've got much bigger problems than an XML field.

    It won't be unusable. But if the transaction isn't rolled back or committed it'll be blocking further updates of that row until someone rolls it back. Either the database itself when it figures out that the client is MIA, or some meat-based solution.

    Note you'll have the exact same problem no matter how you're making the updates, how you've structured the data, or how you're using (or not using) transactions. Even if you don't start one explicitly, there's always an implicit transaction no matter what you do, and it can die mid-operation too.

    If you've got a few hundred thousand transactions flying by every minute, those locks could be a bit of a problem.

    I admit I don't know enough about DBs or webdev to know the right solution. But if not transactions, then what's typically used to guarantee that data races as described above don't happen? Or is the entire issue ignored because "web scale"?



  • @Gąska

    You start a transaction. You get an exclusive lock for that field/table. Bob starts a transaction. He has to wait for your lock. You read the blob. You modify field A. You serialize it back, write, commit, and end your transaction. Bob gets the lock. Bob reads the blob, modifies field B, serializes it back, writes, commits and ends his transaction. Voila.

    Something potentially non-obvious to people who aren't SQL experts: this isn't how SQL transactions work by default in any SQL dialect I know of. Instead, by default, transactions are not fully isolated (EDIT: I originally wrote "atomic" here, but I think I was technically misusing the term), and the series of events described by @wft here...

    A database holds a serialized representation of composite data. The database knows it’s an opaque blob of bytes. I read it in, and Bob reads it. I modify field A, Bob modifies field B. Then we both serialize what we have back, write it, and commit.

    ... is perfectly possible. To get the behaviour @Gąska describes, you need either to take out some sort of lock explicitly before doing stuff in your transaction, or to set your transaction isolation level to SERIALIZABLE (instead of the default of READ COMMITTED or REPEATABLE READ, depending upon your RDBMS) and add retry-on-failure code to your application, because SERIALIZABLE mode (at least in Postgres) basically just avoids the race conditions described here by having a tantrum and crashing the transaction when it detects the failure mode that @wft describes.
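
    For the curious, the SERIALIZABLE route looks roughly like this (Postgres-flavoured sketch; the config table and blob column are invented, the SQLSTATE is real):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT blob FROM config WHERE id = 1;      -- read the serialized composite data
    -- deserialize, modify field A in the application, serialize it back...
    UPDATE config SET blob = $1 WHERE id = 1;
    COMMIT;  -- may abort with SQLSTATE 40001 (serialization_failure); the app retries the whole block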

    Of course, you probably won't do any of that, because

    the number of programmers actually giving a shit about this is dangerously low.



  • @mott555 said in The Revival of Great SQL Ideas:

    I remember seeing a manhole layer and one of the fields was "Location". Normally this is a foreign key into another table, but on this one, it was just a normal attribute on the manhole table. And it had dozens, if not hundreds, of variations for each possible value, using different nouns, capitalization, superfluous whitespace, and typographical errors!

    ...

    • "middle of the street"
    • "middle of street"
    • "middle of road"
    • "In the middle of the street"
    • "middle     of street"
    • "center of street"
    • "center of road"
    • "street"

    And so on...

    Yes, a normalised version of this data would be much better.

    Manhole cover table:

    id | location_id
    ---+------------
    8  | 1
    9  | 2
    10 | 3
    11 | 4
    12 | 5
    13 | 6
    14 | 7
    15 | 8
    

    Location table:

    id | description
    ---+----------------------------------------------
    1  | middle of the street
    2  | middle of street
    3  | middle of road
    4  | In the middle of the street
    5  | middle     of street
    6  | center of street
    7  | center of road
    8  | street
    

  • Banned

    @Cabbage said in The Revival of Great SQL Ideas:

    @mott555 said in The Revival of Great SQL Ideas:

    I remember seeing a manhole layer and one of the fields was "Location". Normally this is a foreign key into another table, but on this one, it was just a normal attribute on the manhole table. And it had dozens, if not hundreds, of variations for each possible value, using different nouns, capitalization, superfluous whitespace, and typographical errors!

    ...

    • "middle of the street"
    • "middle of street"
    • "middle of road"
    • "In the middle of the street"
    • "middle     of street"
    • "center of street"
    • "center of road"
    • "street"

    And so on...

    Yes, a normalised version of this data would be much better.

    Manhole cover table:

    id | location_id
    ---+------------
    8  | 1
    9  | 2
    10 | 3
    11 | 4
    12 | 5
    13 | 6
    14 | 7
    15 | 8
    

    Location table:

    id | description
    ---+----------------------------------------------
    1  | middle of the street
    2  | middle of street
    3  | middle of road
    4  | In the middle of the street
    5  | middle     of street
    6  | center of street
    7  | center of road
    8  | street
    

    The assumption here is that the Location table is under strict control and won't get gibberish added as easily.


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    I admit I don't know enough about DBs or webdev to know the right solution. But if not transactions, then what's typically used to guarantee that data races as described above don't happen? Or is the entire issue ignored because "web scale"?

    We use transactions plus optimistic locking. Essentially, you have an extra version column and you make sure the value you last read matches up with the current value in the DB before you write your data out. If they don't, you get an exception (I don't remember exactly what it is off the top of my head).
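
    The write itself boils down to a single conditional UPDATE - a sketch with invented table and column names:

    -- Succeeds only if nobody bumped the version since our read.
    UPDATE widget
       SET payload = $1,
           version = version + 1
     WHERE id = $2
       AND version = $3;  -- $3 = the version we read earlier
    -- 0 rows affected => stale data: throw the exception, re-read, let the user redo the edit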

    It doesn't really happen that often for us, but the frequency would be really application specific.


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?



  • @Gąska said in The Revival of Great SQL Ideas:

    The assumption here is that the Location table is under strict control and won't get gibberish added as easily.

    Right, but my point (to the extent that I have one rather than just trolling) is that that "strict control" is mostly orthogonal to normalisation. You can lock down the locations to a hard-coded list of values without normalisation. You can populate a dropdown or autocomplete with all previously used values without normalisation. I guess you might start feeling the need for normalisation if you wanted an admin section where new locations can be entered while locking selection down to those existing locations, or if you wanted to translate locations into French or something. But it's perfectly possible to have "locked down" data without normalisation, or shit data with it. The solution here is "don't just store arbitrary user-provided text when you need structured data", and the possible alternatives to that may or may not require normalising the data.
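
    The dropdown/autocomplete case really is just this, reusing the manhole table and location column from the example above:

    SELECT DISTINCT location
    FROM manhole
    ORDER BY location;  -- every previously used value, no second table required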


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?

    No, this explicitly protects against @wft's race condition. Also, @Carnage's issue. If the data is stale, you have to refresh it and let the user re-do whatever he was doing.

    It's basically what @JBert was saying here:

    @JBert said in The Revival of Great SQL Ideas:

    @Carnage said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @wft if you split read and write into separate transactions, then yes, you have that problem. But why would you ever split read and write into separate transactions?

    When the read and write are separate REST calls?
    Or any other API where the data leaves your transaction controlled layer.

    Then you need to push some "version" identifier to the REST API consumer and require that they pass it back when they update something. The HTTP ETag can be used for that.

    We use Hibernate and we have a field annotated with @Version (from the javax.persistence package). This does not work for bulk updates. We do kind of fly blind there.


  • Banned

    @Cabbage said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    The assumption here is that the Location table is under strict control and won't get gibberish added as easily.

    Right, but my point (to the extent that I have one rather than just trolling) is that that "strict control" is mostly orthogonal to normalisation.

    Not quite. Normalization makes strict control possible (or at least feasible - without normalization, to achieve the same effect, you'd need to query the entire table for a matching value on every update, and abandon the update if there is none).

    You can lock down the locations to a hard-coded list of values without normalisation.

    And conceptually, it would be equivalent to a normalized database. It's just that not all relations would live in actual tables in the actual DB.
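
    For example, a hard-coded list via a CHECK constraint (hypothetical sketch; names and values invented):

    ALTER TABLE manhole
        ADD CONSTRAINT manhole_location_chk
        CHECK (location IN ('middle of street', 'center of street', 'kerbside'));
    -- The allowed set is still conceptually a relation; it just lives in the schema rather than in a table.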


  • Fake News

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?

    @boomzilla doesn't use explicit locking, unlike what you proposed here:

    @Gąska said in The Revival of Great SQL Ideas:

    @wft said in The Revival of Great SQL Ideas:

    How does atomicity enable us to avoid undoing work of one another?

    You start a transaction. You get an exclusive lock for that field/table. Bob starts a transaction. He has to wait for your lock. You read the blob. You modify field A. You serialize it back, write, commit, and end your transaction. Bob gets the lock. Bob reads the blob, modifies field B, serializes it back, writes, commits and ends his transaction. Voila.

    That's why SQL users make the distinction between "pessimistic locking" (nobody can touch the data until the transaction owning the lock is done / rolled back) and "optimistic locking" (no locks, but transactions operating on the same entity might fail and need to be redone if we were too optimistic and somebody else was operating on this entity at the same time and increased the "version").



  • @Gąska said in The Revival of Great SQL Ideas:

    without normalization, to achieve the same effect, you'd need to query the entire table for a matching value on every update

    Yeah, and that's often fine. There are not enough manhole covers in the world for this to be impractical, even if the database contains every manhole cover that exists and all of them are in the middle     of street.


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?

    No, this explicitly protects against @wft's race condition. Also, @Carnage's issue. If the data is stale, you have to refresh it and let the user re-do whatever he was doing.

    @Carnage's issue was that if power dies mid-transaction on the DB client's side, then the DB is locked out at least until the transaction timeout occurs. Does your setup protect against that?

    It's basically what @JBert was saying here:

    It's also basically what I've been saying. But if I'm wrong and you say the same thing as me, then you must be wrong too.


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @Carnage's issue was that if power dies mid-transaction on the DB client's side, then the DB is locked out at least until the transaction timeout occurs. Does your setup protect against that?

    We use Oracle. I assume it has some stuff like that. But our DBA is kind of retarded when it comes to actual operational stuff like that so I have no idea what might happen in that case.

    @Gąska said in The Revival of Great SQL Ideas:

    It's also basically what I've been saying. But if I'm wrong and you say the same thing as me, then you must be wrong too.

    No, as he said, you seem to be talking about pessimistic locking, not optimistic locking. But maybe we need to have an all-day flamewar about the meaning of those two words?


  • Banned

    @Cabbage said in The Revival of Great SQL Ideas:

    There are not enough manhole covers in the world for this to be impractical

    I wish we had this discussion 20 years ago, when the average PC wasn't able to read millions of rows per second. Because you've basically said that reading millions of rows per second just to check if a value already exists is totally fine. Which is mostly correct.


  • I survived the hour long Uno hand

    @Gąska
    I mean, "millions" of rows per second is out of the reach of most SQL servers. Especially given the size of actual user-data-included rows and not just pretty theoretical rows that get used in dev.

    "Hundreds" per second I'll grant you. Maaaaaybe "thousands" after you've thrown six figures at server & licensing costs. But "millions" of rows is going to take minutes or hours to read.

    Edit: Then again, I went looking for an example of a shitty query using the SO database for actual stats, and they went through 9 million rows in 8 seconds of CPU time (albeit discarding all but 3000 of them), so I guess I'm wrong.


  • Banned

    @izzion oh, good. I've greatly overestimated Moore's law. In that case - @Cabbage, turns out there are more than enough manholes for this to be completely impractical.


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @Cabbage said in The Revival of Great SQL Ideas:

    There are not enough manhole covers in the world for this to be impractical

    I wish we had this discussion 20 years ago, when the average PC wasn't able to read millions of rows per second. Because you've basically said that reading millions of rows per second just to check if a value already exists is totally fine. Which is mostly correct.

    I don't believe he did. There could very well be an index on that column regardless of normalization. Or a specific CHECK constraint on the value.


  • Discourse touched me in a no-no place

    @izzion said in The Revival of Great SQL Ideas:

    so I guess I'm wrong.

    No I think you're still right.


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @Cabbage said in The Revival of Great SQL Ideas:

    There are not enough manhole covers in the world for this to be impractical

    I wish we had this discussion 20 years ago, when the average PC wasn't able to read millions of rows per second. Because you've basically said that reading millions of rows per second just to check if a value already exists is totally fine. Which is mostly correct.

    I don't believe he did.

    We can disagree on the exact semantics of his words later.

    There could very well be an index on that column regardless of normalization.

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Or a specific CHECK constraint on the value.

    Which, as I explained earlier, is conceptually equivalent to normalization.


  • Discourse touched me in a no-no place

    @Gąska said in The Revival of Great SQL Ideas:

    is it possible to query an index whether a given value already exists in a column

    That's the point of an index, no?



  • @boomzilla said in The Revival of Great SQL Ideas:

    I don't believe he did. There could very well be an index on that column regardless of normalization. Or a specific CHECK constraint on the value.

    Eh, I think I'll technically give @Gąska this one; in my silly example all the manhole covers in the world were in the same location, so an index on the location value wouldn't help.

    Though I don't think it invalidates my broader point (that enforcing data sanity does not necessarily require normalisation; normalisation is just a useful tool to employ in some situations).

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row?

    Yes. It might be O(log n) or O(1) in the total number of rows in the table, depending upon the type of index.
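
    In other words, once an index exists, the existence check is a single descent of the tree rather than a scan - a Postgres-flavoured sketch with invented names, assuming the planner actually picks the index:

    CREATE INDEX manhole_location_idx ON manhole (location);

    -- An index seek, O(log n) for a b-tree - no row enumeration needed:
    SELECT EXISTS (SELECT 1 FROM manhole WHERE location = 'middle of street');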


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    There could very well be an index on that column regardless of normalization.

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    I have questions:

    • What do you suppose indexes do?
    • Can you think of a reason to have an index that didn't do that?
    • Do you prefer indices over indexes?

  • I survived the hour long Uno hand

    @Gąska said in The Revival of Great SQL Ideas:

    is it possible to query an index whether a given value already exists in a column, without enumerating each row?

    And be sure that you get a definite yes/no on whether the value exists? No.

    Indexes on a column will lead to the creation of a statistic that summarizes the values of the column, but that statistic isn't going to be an enumeration of all distinct values in the column -- they're generally histograms that provide a snapshot of how many rows are "between" two different values (in an integer column with values from 0 to 100, the histogram might have entries for 0, 5, 10, 15, and so on, and then will have data on how many rows have exactly 0, between 0 and 5, exactly 5, etc.)

    Further, statistics aren't updated after every IUD (insert/update/delete) operation -- by default most brands of SQL server will only update after row changes representing 15-25% of the number of rows of the table, or when the sysadmin triggers a statistics update pass manually/via job. So even the fact that the histogram ranges from 0-100 on our hypothetical integer column doesn't tell us that values outside that range are invalid -- it just means there weren't any values outside that range the last time statistics were updated.

    Edit: to clarify -- you could enumerate every row within the index (reading only the columns in the index, which would be the sort column plus any "included" columns used to help optimize read queries using that index, plus the primary key column(s) of the table), without enumerating the other columns that are in the table but not in the index. But you'd still need to read every row in the index to ensure the column didn't have a specified value.


  • Banned

    @JBert said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?

    @boomzilla doesn't use explicit locking, unlike what you proposed here:

    @Gąska said in The Revival of Great SQL Ideas:

    @wft said in The Revival of Great SQL Ideas:

    How does atomicity enable us to avoid undoing work of one another?

    You start a transaction. You get an exclusive lock for that field/table. Bob starts a transaction. He has to wait for your lock. You read the blob. You modify field A. You serialize it back, write, commit, and end your transaction. Bob gets the lock. Bob reads the blob, modifies field B, serializes it back, writes, commits and ends his transaction. Voila.

    That's why SQL users make the distinction between "pessimistic locking" (nobody can touch the data until the transaction owning the lock is done / rolled back) and "optimistic locking" (no locks, but transactions operating on the same entity might fail and need to be redone if we were too optimistic and somebody else was operating on this entity at the same time and increased the "version").

    Okay, I see. Makes sense. Though even with optimistic locking, you'd still not be able to perform a write at the exact same moment someone else is doing a write, so most of the same (non-)issues remain. In fact, what I proposed with explicit locks is generally reimplementing optimistic locking with pessimistic locking (lock, read, compare, update, release). That's what happens when you don't know the proper terminology.


  • ♿ (Parody)

    @izzion said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    is it possible to query an index whether a given value already exists in a column, without enumerating each row?

    And be sure that you get a definite yes/no on whether the value exists? No.

    Indexes on a column will lead to the creation of a statistic that summarizes the values of the column, but that statistic isn't going to be an enumeration of all distinct values in the column -- they're generally histograms that provide a snapshot of how many rows are "between" two different values (in an integer column with values from 0 to 100, the histogram might have entries for 0, 5, 10, 15, and so on, and then will have data on how many rows have exactly 0, between 0 and 5, exactly 5, etc.)

    Further, statistics aren't updated after every IUD (insert/update/delete) operation -- by default most brands of SQL server will only update after row changes representing 15-25% of the number of rows of the table, or when the sysadmin triggers a statistics update pass manually/via job. So even the fact that the histogram ranges from 0-100 on our hypothetical integer column doesn't tell us that values outside that range are invalid -- it just means there weren't any values outside that range the last time statistics were updated.

    While I'm sure statistics are often based on indices, they are different things. And indexes for damn sure better get updated every time the data changes.


  • 🚽 Regular

    @izzion said in The Revival of Great SQL Ideas:

    I guess I'm wrong

    No, you're wrong!


  • Discourse touched me in a no-no place

    @izzion said in The Revival of Great SQL Ideas:

    Further, statistics aren't updated after every IUD (insert/update/delete) operation

    No but indexes are. They're not the same thing even if they're often related.

    edit: :hanzo:


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @JBert said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?

    @boomzilla doesn't use explicit locking, unlike what you proposed here:

    @Gąska said in The Revival of Great SQL Ideas:

    @wft said in The Revival of Great SQL Ideas:

    How does atomicity enable us to avoid undoing work of one another?

    You start a transaction. You get an exclusive lock for that field/table. Bob starts a transaction. He has to wait for your lock. You read the blob. You modify field A. You serialize it back, write, commit, and end your transaction. Bob gets the lock. Bob reads the blob, modifies field B, serializes it back, writes, commits and ends his transaction. Voila.

    That's why SQL users make the distinction between "pessimistic locking" (nobody can touch the data until the transaction owning the lock is done / rolled back) and "optimistic locking" (no locks, but transactions operating on the same entity might fail and need to be redone if we were too optimistic and somebody else was operating on this entity at the same time and increased the "version").

    Okay, I see. Makes sense. Though even with optimistic locking, you'd still not be able to perform a write at the exact same moment someone else is doing a write, so most of the same (non-)issues remain. In fact, what I proposed with explicit locks is generally reimplementing optimistic locking with pessimistic locking (lock, read, compare, update, release). That's what happens when you don't know the proper terminology.

    Correct. When you actually go to write, you do take a lock on the data (hopefully at the row level!). And you don't check the version until you have that lock.



  • @Gąska said in The Revival of Great SQL Ideas:

    @Cabbage said in The Revival of Great SQL Ideas:

    @mott555 said in The Revival of Great SQL Ideas:

    I remember seeing a manhole layer and one of the fields was "Location". Normally this is a foreign key into another table, but on this one, it was just a normal attribute on the manhole table. And it had dozens, if not hundreds, of variations for each possible value, using different nouns, capitalization, superfluous whitespace, and typographical errors!

    ...

    • "middle of the street"
    • "middle of street"
    • "middle of road"
    • "In the middle of the street"
    • "middle     of street"
    • "center of street"
    • "center of road"
    • "street"

    And so on...

    Yes, a normalised version of this data would be much better.

    Manhole cover table:

    id | location_id
    ---+------------
    8  | 1
    9  | 2
    10 | 3
    11 | 4
    12 | 5
    13 | 6
    14 | 7
    15 | 8
    

    Location table:

    id | description
    ---+----------------------------------------------
    1  | middle of the street
    2  | middle of street
    3  | middle of road
    4  | In the middle of the street
    5  | middle     of street
    6  | center of street
    7  | center of road
    8  | street
    

    The assumption here is that the Location table is under strict control and won't get gibberish added as easily.

    Yes. For a normal attribute, the GIS software gives you a text field you can put anything into. But if it's a foreign key into another table, it gives you a combobox prepopulated with that table's records, and you just select the one you need from a list rather than (mis)type a value as plain text.
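
    Schema-wise it's just the classic pair - a sketch with invented names:

    CREATE TABLE location (
        id          integer PRIMARY KEY,
        description text    NOT NULL UNIQUE  -- one canonical row per location
    );

    CREATE TABLE manhole (
        id          integer PRIMARY KEY,
        location_id integer NOT NULL REFERENCES location (id)  -- the GIS combobox selects from location
    );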


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    There could very well be an index on that column regardless of normalization.

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    I have questions:

    • What do you suppose indexes do?

    They're magical beasts that make finding a record by some specific (set of) column(s) take less than O(n) time, but other than that they don't make any guarantees about semantics or performance beyond what regular tables do. How far off the mark am I?

    • Can you think of a reason to have an index that didn't do that?

    Lookup by key? AFAIK indexes are made mostly for lookups by key - and when you look up, the matching row is read in full. Is it actually guaranteed that operations that don't require actually reading the record don't read the record? If so, that's quite neat.

    • Do you prefer indices over indexes?

    I'm a strong supporter of indexes. Dead languages be damned.


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @JBert said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?

    @boomzilla doesn't use explicit locking, unlike what you proposed here:

    @Gąska said in The Revival of Great SQL Ideas:

    @wft said in The Revival of Great SQL Ideas:

    How does atomicity enable us to avoid undoing work of one another?

    You start a transaction. You get an exclusive lock for that field/table. Bob starts a transaction. He has to wait for your lock. You read the blob. You modify field A. You serialize it back, write, commit, and end your transaction. Bob gets the lock. Bob reads the blob, modifies field B, serializes it back, writes, commits and ends his transaction. Voila.

    That's why SQL users make the distinction between "pessimistic locking" (nobody can touch the data until the transaction owning the lock is done / rolled back) and "optimistic locking" (no locks, but transactions operating on the same entity might fail and need to be redone if we were too optimistic and somebody else was operating on this entity at the same time and increased the "version").

    Okay, I see. Makes sense. Though even with optimistic locking, you'd still not be able to perform a write at the exact same moment someone else is doing a write, so most of the same (non-)issues remain. In fact, what I proposed with explicit locks is generally reimplementing optimistic locking with pessimistic locking (lock, read, compare, update, release). That's what happens when you don't know the proper terminology.

    Correct. When you actually go to write, you do take a lock on the data (hopefully at the row level!). And you don't check the version until you have that lock.

    So the only thing I said wrong is that I didn't call it optimistic locking. So everything that @JBert and @Carnage said was wrong with my solution isn't actually wrong - it's just that I wasn't entirely clear about what I meant because I used the wrong terminology. Correct?

    So to get back to the original issue - when @wft said transactions or atomicity don't give you anything in his scenario, he was still wrong?


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    There could very well be an index on that column regardless of normalization.

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    I have questions:

    • What do you suppose indexes do?

    They're magical beasts that make finding a record by some specific (set of) column(s) take less than O(n) time, but other than that they don't make any guarantees about semantics or performance beyond what regular tables do. How far off the mark am I?

    Sounds pretty good. Especially for a generic description.

    • Can you think of a reason to have an index that didn't do that?

    Lookup by key? AFAIK indexes are made mostly for lookups by key - and when you look up, the matching row is read in full. Is it actually guaranteed that operations that don't require actually reading the record don't read the record? If so, that's quite neat.

    It depends on the db and what columns you're asking for. If you only want column(s) in the index then there is no need to fetch the actual row.

    • Do you prefer indices over indexes?

    I'm a strong supporter of indexes. Dead languages be damned.

    I go back and forth.


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @JBert said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    We use transactions

    And you don't experience any downsides, like what @wft and @Carnage mentioned?

    @boomzilla doesn't use explicit locking, unlike what you proposed here:

    @Gąska said in The Revival of Great SQL Ideas:

    @wft said in The Revival of Great SQL Ideas:

    How does atomicity enable us to avoid undoing work of one another?

    You start a transaction. You get an exclusive lock for that field/table. Bob starts a transaction. He has to wait for your lock. You read the blob. You modify field A. You serialize it back, write, commit, and end your transaction. Bob gets the lock. Bob reads the blob, modifies field B, serializes it back, writes, commits and ends his transaction. Voila.

    That's why SQL users make the distinction between "pessimistic locking" (nobody can touch the data until the transaction owning the lock is done / rolled back) and "optimistic locking" (no locks, but transactions operating on the same entity might fail and need to be redone if we were too optimistic and somebody else was operating on this entity at the same time and increased the "version").

    Okay, I see. Makes sense. Though even with optimistic locking, you'd still not be able to perform a write at the exact same moment someone else is doing a write, so most of the same (non-)issues remain. In fact, what I proposed with explicit locks is generally reimplementing optimistic locking with pessimistic locking (lock, read, compare, update, release). That's what happens when you don't know the proper terminology.

    Correct. When you actually go to write, you do take a lock on the data (hopefully at the row level!). And you don't check the version until you have that lock.

    So the only thing I said wrong is that I didn't call it optimistic locking. So everything that @JBert and @Carnage said was wrong with my solution isn't actually wrong - it's just that I wasn't entirely clear about what I meant because I used the wrong terminology. Correct?

    I think there may have been confusion about when the lock was applied.

    So to get back to the original issue - when @wft said transactions or atomicity don't give you anything in his scenario, he was still wrong?

    I can atomically write over something you just changed without knowing it, because I originally read the data before your atomic write happened, so I have no idea what you did. Unless I check. Atomicity just means that you get all or none of what you did in a transaction. It doesn't say anything about other transactions.
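
    Spelled out as two perfectly atomic transactions (invented names; default READ COMMITTED isolation assumed):

    -- Me:                                        -- You:
    BEGIN;
    SELECT blob FROM config WHERE id = 1;
                                                  BEGIN;
                                                  SELECT blob FROM config WHERE id = 1;
                                                  UPDATE config SET blob = 'B' WHERE id = 1;
                                                  COMMIT;  -- your write lands...
    UPDATE config SET blob = 'A' WHERE id = 1;
    COMMIT;  -- ...and mine atomically clobbers it without ever noticing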


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    So to get back to the original issue - when @wft said transactions or atomicity don't give you anything in his scenario, he was still wrong?

    I can atomically write over something you just changed without knowing it, because I originally read the data before your atomic write happened, so I have no idea what you did. Unless I check. Atomicity just means that you get all or none of what you did in a transaction. It doesn't say anything about other transactions.

    Due to my low-level roots, when someone says "atomic", I think of an atomic read-write operation (atomic "just write" doesn't make sense there, as single CPU instructions are always atomic). Yes, an atomic "just read" followed by ending the transaction followed by an atomic "just write" doesn't help with anything. That's why I was talking about wrapping them in a transaction. Because at the low level, it's essentially what happens with optimistic locking (unless modern DBs have it implemented all server-side so a dead link doesn't hold up the update lock - then it would make sense to make a strong distinction between the two).


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    So to get back to the original issue - when @wft said transactions or atomicity don't give you anything in his scenario, he was still wrong?

    I can atomically write over something you just changed without knowing it, because I originally read the data before your atomic write happened, so I have no idea what you did. Unless I check. Atomicity just means that you get all or none of what you did in a transaction. It doesn't say anything about other transactions.

    Due to my low-level roots, when someone says "atomic", I think of an atomic read-write operation (atomic "just write" doesn't make sense there, as single CPU instructions are always atomic). Yes, an atomic "just read" followed by ending the transaction followed by an atomic "just write" doesn't help with anything. That's why I was talking about wrapping them in a transaction. Because at the low level, it's essentially what happens with optimistic locking (unless modern DBs have it implemented all server-side so a dead link doesn't hold up the update lock - then it would make sense to make a strong distinction between the two).

    Right. Now, consider how a web app works. You read data and show it to the user. They make some edits and send the data back to be saved. That first read was atomic. But you don't want that to be waiting on the user to futz around and send stuff back (could be a long time, especially if you don't make sessions expire!) before that transaction ends. In optimistic locking, you do another read (to make sure no one else has changed it!) right before you write.

    It's kind of like version control locking files so no one else can check them out while you're working vs detecting that a change was made since the revision you have in your working copy (and then handling that somehow...the details of that aren't important for this analogy).


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    So to get back to the original issue - when @wft said transactions or atomicity don't give you anything in his scenario, he was still wrong?

    I can atomically write over something you just changed without knowing it, because I originally read the data before your atomic write happened, so I have no idea what you did. Unless I check. Atomicity just means that you get all or none of what you did in a transaction. It doesn't say anything about other transactions.

    Due to my low-level roots, when someone says "atomic", I think of an atomic read-write operation (atomic "just write" doesn't make sense there, as single CPU instructions are always atomic). Yes, an atomic "just read" followed by ending the transaction followed by an atomic "just write" doesn't help with anything. That's why I was talking about wrapping them in a transaction. Because at the low level, it's essentially what happens with optimistic locking (unless modern DBs have it implemented all server-side so a dead link doesn't hold up the update lock - then it would make sense to make a strong distinction between the two).

    Right. Now, consider how a web app works. You read data and show it to the user. They make some edits and send the data back to be saved. That first read was atomic. But you don't want that to be waiting on the user to futz around and send stuff back (could be a long time, especially if you don't make sessions expire!) before that transaction ends. In optimistic locking, you do another read (to make sure no one else has changed it!) right before you write.

    And if you go back to my first post on this topic, you'll notice that's exactly what I've been saying all along. Maybe I could have used better words than "read-read-write", but I didn't know the correct phrase back then.

    I'd never make a transaction that could last longer than a second (unless in an emergency situation, where it's not that important for the system to be operational at that very moment). I'd definitely never make a transaction that needs user input of any sort to proceed.


  • Banned

    So now that we've cleared up what everybody meant.

    • @JBert, is this optimistic locking thing still "major pain in the ass" scale, or was it just a misunderstanding?
    • @Carnage, is this optimistic locking thing still vulnerable to sudden power outages, or was it just a misunderstanding? In case it was a misunderstanding - how exactly does it work that a power outage at the DB client at the precise moment the second read is occurring doesn't halt the entire system?

    I'm asking because it's a very interesting topic that I know next to nothing about and would like to know more - and I didn't exactly get a straight answer on those.



  • @mott555 said in The Revival of Great SQL Ideas:

    "In the middle of the street"

    Isn't that where our house is normally located?



  • @Gąska said in The Revival of Great SQL Ideas:

    So now that we've cleared up what everybody meant.

    • @JBert, is this optimistic locking thing still "major pain in the ass" scale, or was it just a misunderstanding?
    • @Carnage, is this optimistic locking thing still vulnerable to sudden power outages, or was it just a misunderstanding? In case it was a misunderstanding - how exactly does it work that a power outage at the DB client at the precise moment the second read is occurring doesn't halt the entire system?

    I'm asking because it's a very interesting topic that I know next to nothing about and would like to know more - and I didn't exactly get a straight answer on those.

    No, optimistic locking should make things a lot less troublesome. Sometimes you don't want to do optimistic locks, but for most situations it works well.
    If you have a hotspot that a lot of stuff writes to, you may want to start using pessimistic locks to prevent ending up in a nasty cycle where everything spends inordinate amounts of time redoing work because someone updated the same thing since their last read. Sort of live-locking a lot of threads/workers.
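
    For the hotspot case you'd grab the row lock up front instead - a sketch with invented names (FOR UPDATE works in Postgres, Oracle, MySQL and friends):

    BEGIN;
    SELECT hits FROM counter WHERE id = 1 FOR UPDATE;  -- writers queue here instead of failing version checks
    UPDATE counter SET hits = hits + 1 WHERE id = 1;
    COMMIT;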


  • Banned

    @Carnage said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    So now that we've cleared up what everybody meant.

    • @JBert, is this optimistic locking thing still "major pain in the ass" scale, or was it just a misunderstanding?
    • @Carnage, is this optimistic locking thing still vulnerable to sudden power outages, or was it just a misunderstanding? In case it was a misunderstanding - how exactly does it work that a power outage at the DB client at the precise moment the second read is occurring doesn't halt the entire system?

    I'm asking because it's a very interesting topic that I know next to nothing about and would like to know more - and I didn't exactly get a straight answer on those.

    No, optimistic locking should make things a lot less troublesome.

    As in, it doesn't entirely solve the issue - just makes it statistically less likely? Or am I missing something again?



    @Gąska mostly just because I'm in a completely different state of mind than databases, and it takes a while for my brain to do the context switch, even more so now since it's been a while since I had to dig into the finer bits of database design.
    I don't want to give bad advice on the topic because I've misremembered some edge case. Especially since, when you get down to a low level, the answers can vary substantially between different databases.

    So I'd say that optimistic locking probably solves the issues described, and if there are problems, they are esoteric and rare.


  • Fake News

    @Gąska said in The Revival of Great SQL Ideas:

    @Carnage said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    So now that we've cleared up what everybody meant.

    • @JBert, is this optimistic locking thing still "major pain in the ass" scale, or was it just a misunderstanding?
    • @Carnage, is this optimistic locking thing still vulnerable to sudden power outages, or was it just a misunderstanding? In case it was a misunderstanding - how exactly does it work that a power outage at the DB client at the precise moment the second read is occurring doesn't halt the entire system?

    I'm asking because it's a very interesting topic that I know next to nothing about and would like to know more - and I didn't exactly get a straight answer on those.

    No, optimistic locking should make things a lot less troublesome.

    As in, it doesn't entirely solve the issue - just makes it statistically less likely? Or am I missing something again?

    What you seem to be missing is that with pessimistic locking the SQL client explicitly requests the lock at some point in your transaction - which is why I thought you were talking about that.

    In optimistic locking the SQL client doesn't request any lock; it's the RDBMS which takes an internal lock for you at the point you commit your transaction and write stuff (if it didn't take an internal lock it wouldn't be a safe RDBMS). Because it's now the RDBMS which manages the lock, it will also undo that lock for you if the transaction fails to commit, the SQL client goes away, or anything else happens.

    In the case of pessimistic locking the RDBMS cannot undo the lock until the client asks it to, and so a crashed client can keep stuff locked.


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @Carnage said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    So now that we've cleared up what everybody meant.

    • @JBert, is this optimistic locking thing still "major pain in the ass" scale, or was it just a misunderstanding?
    • @Carnage, is this optimistic locking thing still vulnerable to sudden power outages, or was it just a misunderstanding? In case it was a misunderstanding - how exactly does it work that a power outage at the DB client at the precise moment the second read is occurring doesn't halt the entire system?

    I'm asking because it's a very interesting topic that I know next to nothing about and would like to know more - and I didn't exactly get a straight answer on those.

    No, optimistic locking should make things a lot less troublesome.

    As in, it doesn't entirely solve the issue - just makes it statistically less likely? Or am I missing something again?

    Yes, hence the name. Optimistic: "Hey, there probably won't be a problem!" So you operate in a way such that stuff runs smoothly when you don't have a conflict. If you're in an environment where you expect a lot of contention then a different strategy will work better. We use pessimistic locking in some specific cases for this reason.


  • Discourse touched me in a no-no place

    @Cabbage said in The Revival of Great SQL Ideas:

    You can populate a dropdown or autocomplete with all previously used values without normalisation.

    What's going to populate that dropdown?

    If it's not that table (or something semantically equivalent) you suggested earlier, that is.

