The Revival of Great SQL Ideas



  • @PJH

    SELECT DISTINCT location FROM manhole_cover
    

    With an index this will be roughly as fast as using a separate location table.


  • Discourse touched me in a no-no place

    @Cabbage said in The Revival of Great SQL Ideas:

    @PJH

    SELECT DISTINCT location FROM manhole_cover
    

    With an index this will be roughly as fast as using a separate location table.

    If you have a dropdown limited to values already entered and you've not had any values entered...



  • @Cabbage I'd think it would be much faster with a separate location table that has 10 entries, versus running DISTINCT on a table with 700,000 records on a multi-user application that customers expect to be fairly responsive.

    EDIT: I forgot to mention earlier, proper normalization reduces disk space, too. Which matters when you have a single database cluster that might need to store data for 200+ different clients, each client's dataset being tens to hundreds of gigabytes containing 30+ layers with a hundred attributes each.


  • Fake News

    @Gąska Going back to your example:

    @Gąska said:

    You start a transaction. You get exclusive lock for that field/table. Bob starts a transaction. He has to wait for your lock. You read the blob. You modify field A. You serialize it back, write, commit, and end your transaction. Bob gets the lock. Bob reads the blob, modifies field B, serializes it back, writes, commits and ends his transaction. Voila.

    The way you explain it it looks a lot like pessimistic locking.

    With optimistic locking you would actually see the following:

    • You start a transaction. You read the version field ("1") and the blob ("A"). You write an UPDATE query with SET blob = "B" and version = "2" where id = "foo" and version = "1".
    • Meanwhile Bob starts a transaction. He reads the version field ("1") and the blob ("A"). He writes an UPDATE query with SET blob = "Z" and version = "2" where id ="foo" and version = "1".
    • You commit your transaction. The database locks the record with id = "foo" for a millisecond to stop everyone from stumbling over each other, checks that version is still == "1" and then applies your update after which it replies to you "update records: 1". Everything is fine and it undoes the lock it requested. The record now has version = "2" and blob = "B".
    • Bob commits his transaction a second later. The database locks the record with id = "foo" again for a millisecond and checks the version. It notices that Bob thinks the version = "1" but that's no longer true. Bob's update fails, the lock is removed and the transaction is marked failed.

    Because the lock is managed by the RDBMS nobody has to wait unless you happen to commit two transactions for the same record on the very same instant, and the RDBMS has enough information to recover. If the lock is set from the client (when you're doing "pessimistic locking") then it becomes clear that it's the client which is the bigger point of failure.



  • @loopback0 said in The Revival of Great SQL Ideas:

    If you have a dropdown limited to values already entered and you've not had any values entered...

    ... then you're going to have an empty dropdown. But this is true whether or not your data is normalised.

    @mott555 said in The Revival of Great SQL Ideas:

    @Cabbage I'd think it would be much faster with a separate location table that has 10 entries, versus running DISTINCT on a table with 700,000 records on a multi-user application that customers expect to be fairly responsive.

    My instincts tell me that the time taken for my proposed DISTINCT query on a typical database would be solidly less than 100ms and probably less than 50. Too lazy to benchmark, but happy to be proven wrong by somebody else doing so.



  • @Cabbage said in The Revival of Great SQL Ideas:

    My instincts tell me that the time taken for my proposed DISTINCT query on a typical database machine would be solidly less than 100ms. Happy to be proven wrong by a benchmark.

    I can't benchmark it because this was at an old job 5+ years ago, but downloading a 10-record table would be far far quicker than 100 ms.


  • Banned

    @boomzilla said in The Revival of Great SQL Ideas:

    Yes, hence the name.

    The name says nothing about power outages.

    @JBert said in The Revival of Great SQL Ideas:

    In optimistic locking the SQL client doesn't request any lock, it's the RDBMS which will request an internal lock for you at the point you commit your transaction and write stuff (if it didn't have an internal lock it wouldn't be a safe RDBMS). Because it's now the RDBMS which manages the lock it will also undo that lock for you if the transaction fails to commit, the SQL client goes away or anything else happens.

    Okay, I see now. So optimistic locking is done all server-side and doesn't wait for client to commit (because everything happens after commit). Makes sense now. Thanks!


  • Discourse touched me in a no-no place

    @Cabbage said in The Revival of Great SQL Ideas:

    ... then you're going to have an empty dropdown. But this is true whether or not your data is normalised.

    Yes but your solution only works if you've allowed location to be a free for all for a bit, and then changed it after deciding people have entered all the values they'll ever need.



  • @mott555 I mean, with a type of index that's ideal for this purpose and a database smart enough to use it properly you'd expect both of them to be "basically instant", with maybe a millisecond or two difference between them. I probably shouldn't characterise 100ms as a "typical" case; it's more the worst case I wouldn't be outright surprised by.


  • ♿ (Parody)

    @mott555 said in The Revival of Great SQL Ideas:

    I'd think it would be much faster with a separate location table that has 10 entries, versus running DISTINCT on a table with 700,000 records on a multi-user application that customers expect to be fairly responsive.

    As already discussed, that depends on how it's indexed (and probably varies by DBMS).


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla said in The Revival of Great SQL Ideas:

    Yes, hence the name.

    The name says nothing about power outages.

    Yes. You're getting into a separate issue that gets into how a particular DBMS does transactions and locking and recovery and etc.



  • @loopback0 It also works if you have a dropdown with an override that permits you to enter a new value.


  • ♿ (Parody)

    @loopback0 said in The Revival of Great SQL Ideas:

    @Cabbage said in The Revival of Great SQL Ideas:

    ... then you're going to have an empty dropdown. But this is true whether or not your data is normalised.

    Yes but your solution only works if you've allowed location to be a free for all for a bit, and then changed it after deciding people have entered all the values they'll ever need.

    Yes. It requires some application logic in addition to DB logic.


  • Banned

    @boomzilla the issue is still the same: what's the worst case scenario when a cleaning lady knocks over the power cord of DB client machine at the most unfortunate moment?


  • Discourse touched me in a no-no place

    @Cabbage said in The Revival of Great SQL Ideas:

    @loopback0 It also works if you have a dropdown with an override that permits you to enter a new value.

    :moving_goal_post: 🏆


  • ♿ (Parody)

    @Gąska said in The Revival of Great SQL Ideas:

    @boomzilla the issue is still the same: what's the worst case scenario when a cleaning lady knocks over the power cord of DB client machine at the most unfortunate moment?

    The abort code to launch the nukes doesn't get sent and we all die in a nuclear holocaust.


  • Banned

    @boomzilla but at least DB doesn't get locked, so if you ask me, all is good.


  • Discourse touched me in a no-no place

    @loopback0 said in The Revival of Great SQL Ideas:

    @Cabbage said in The Revival of Great SQL Ideas:

    @PJH

    SELECT DISTINCT location FROM manhole_cover
    

    With an index this will be roughly as fast as using a separate location table.

    If you have a dropdown limited to values already entered and you've not had any values entered...

    Shush. Stop being difficult.



  • @loopback0 How is that moving the goalposts? Your objection was, as I understand it, that not having a separate store of location names wasn't viable because there needed to be some mechanism for adding a location separate from the mechanism for selecting a location for a particular manhole cover. I retorted by pointing out an interface that:

    1. is a totally ordinary UI feature seen frequently in the wild (e.g,. check out the Find or create branch dropdown on GitHub)
    2. allows adding a new location at the same time as adding a new manhole cover
    3. still presents previously-entered values and pushes the user towards using them instead of entering a new value

    That's called rebutting your argument in my world, not :moving_goal_post: 🏆.



  • @Cabbage The problem is to work your way, you'd have to get ESRI to change their application1 to add indexing, DISTINCTS, etc on all kinds of fields that are best considered domain-constrained. It doesn't do that, which is why that example dataset had eleventy billion variations of the same value. The better option is to just use a foreign key constraint and it all Just Works with no extra effort at all on the users' or administrators' parts. If you're designing a database to work with a third-party application, you gotta do things the way the application expects, even if you think normalization is weird and unnecessary.

    1Achieving world peace would be far easier than this.


  • Discourse touched me in a no-no place

    @Cabbage said in The Revival of Great SQL Ideas:

    How is that moving the goalposts?

    You seem to have missed the 🏆 - I wasn't being serious.

    @Cabbage said in The Revival of Great SQL Ideas:

    Your objection was

    That a solution based entirely on that query wouldn't work. I didn't have a genuine objection to the override, hence the non-serious response.



  • @mott555 said in The Revival of Great SQL Ideas:

    The better option is to just use a foreign key constraint and it all Just Works with no extra effort at all on the users' or administrators' parts

    I don't get what you're trying to say.

    If you want to stop getting arbitrary free text locations, you need to change the bit of the application UI where users enter arbitrary free text locations. Some possible replacements for this UI will require normalisation; some will not.

    If you normalise while leaving the UI alone then you will (unless you are willing to just straight-up break the UI with your backend changes) have to make new location strings be automatically inserted into the location table, producing the end result I showed in https://what.thedailywtf.com/topic/26026/the-revival-of-great-sql-ideas/53.

    You cannot "just use a foreign key" and magically fix what is at its core a UI failure. You need to change the UI. Supporting some alternative UIs will require normalisation of the database, but not others. Thus I described normalisation and the problem of getting free text input as "mostly orthogonal"; normalisation in itself does not fix the problem, and fixing the problem does not in itself require normalisation.

    This all seems clear to me. Am I being retarded in some way?



  • @Cabbage said in The Revival of Great SQL Ideas:

    @mott555 said in The Revival of Great SQL Ideas:

    The better option is to just use a foreign key constraint and it all Just Works with no extra effort at all on the users' or administrators' parts

    I don't get what you're trying to say.

    If you want to stop getting arbitrary free text locations, you need to change the bit of the application UI where users enter arbitrary free text locations. Some possible replacements for this UI will require normalisation; some will not.

    If you normalise while leaving the UI alone then you will (unless you are willing to just straight-up break the UI with your backend changes) have to make new location strings be automatically inserted into the location table, producing the end result I showed in https://what.thedailywtf.com/topic/26026/the-revival-of-great-sql-ideas/53.

    You cannot "just use a foreign key" and magically fix what is at its core a UI failure. You need to change the UI. Supporting some alternative UIs will require normalisation of the database, but not others. Thus I described normalisation and the problem of getting free text input as "mostly orthogonal".

    This all seems clear to me. Am I being retarded in some way?

    Besides the fact that normalization is the correct thing to do, this wasn't our application. It was a big third-party application (basically a monopoly in the GIS industry) that hooked up to databases we controlled. There was no way for us to change the UI or the application's REST interface and data layers even if we wanted to. So yes, if you set up the database in the way the application expects, things Just Work.



  • @mott555 I still have no idea what you're saying. If you can't change the UI in which the user has to enter location as a free text field, and can't stick in a magical normalisation layer prior to storage, then you will have to store semantically equivalent locations written in different ways, because users will write them in different ways. Normalisation has no bearing on this point at all. (Indeed, whether you store your data in an SQL database, a NoSQL database, a flat JSON file, or a stack of papers in a filing cabinet that you interface with over Mechanical Turk all still has no bearing on this point at all.) What problem are you suggesting can be solved in this scenario by "just add[ing] a foreign key"?



  • @Cabbage said in The Revival of Great SQL Ideas:

    What problem are you suggesting can be solved in this scenario by "just add[ing] a foreign key"?

    If you just add in a foreign key, ArcGIS gives you a combobox of options for a field so you can select (or add) the one you need without accidentally adding in tons of data errors by typos or by having different workers use different terminology for the same logical value. Without that, the workers (who may be data entry interns at the local county offices, or survey crews with no knowledge of SQL or database design) can type in whatever they want with no guidance and that leads to messy data.

    They will not write them in different ways if they get a combobox/list of valid options first. Without that list, everyone will do it slightly different. And the only way to get the list is to use a foreign key, because that's how the application works. I don't see what's hard to understand here.



  • @mott555 ... and you didn't think that "The application I can't change already has features that inspect the underlying database schema at runtime and modify the UI elements displayed to the user based upon whether a foreign key is present" was something worth noting at, say, any point in this discussion prior to now?

    EDIT: Ah, you did, in https://what.thedailywtf.com/topic/26026/the-revival-of-great-sql-ideas/85. 🤷🏻♂


  • Banned

    This post is deleted!


  • @Cabbage said in The Revival of Great SQL Ideas:

    features that inspect the underlying database schema at runtime

    That's not quite true, but close. You're supposed to edit the schema using the application itself, that way it knows about the schema changes, but if you're really careful and know what you're doing and don't mind the occasional case of nasal demons, you can go through SQL Server Management Studio and alter things directly.


  • Banned

    @mott555 Abusing the fact that an application stores part of its configuration in DB schema itself, and manipulating the schema directly instead of going with official tools. I haven't heard of something so hacky in a very long time.



  • @Gąska said in The Revival of Great SQL Ideas:

    I'd never make a transaction that could last longer than a second (unless in emergency situation, where it's not that important for the system to be operational at that very moment).

    That seems like a good rule until you're dealing with replication jobs where a table update might take 30+ seconds.


  • Trolleybus Mechanic

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Of course it is, that's how UNIQUE INDEX works. You think it scans all rows on each insert/update? :D



  • @wft said in The Revival of Great SQL Ideas:

    specify "elephant", "virgin blood", "fuck you", or "$^@&#$&#^" as my favorite colors.

    Gray, red, milky white (?), but I have no idea what color "$^@&#$&#^" is.


  • Discourse touched me in a no-no place

    @HardwareGeek said in The Revival of Great SQL Ideas:

    I have no idea what color "$^@&#$&#^" is.

    🇧🇪 ?



  • @boomzilla said in The Revival of Great SQL Ideas:

    We use Oracle... our DBA is kind of retarded

    Sounds right.


  • Considered Harmful

    @HardwareGeek said in The Revival of Great SQL Ideas:

    milky white (?),

    I'd just go with more red.



  • @Cabbage said in The Revival of Great SQL Ideas:

    Am I being retarded in some way?

    Well, you're arguing with strangers on the internet...


  • Banned

    @sebastian-galczynski said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Of course it is, that's how UNIQUE INDEX works. You think it scans all rows on each insert/update? :D

    Unique index has the opposite purpose from the one discussed - ensure there are NO duplicates - so due to how special uniqueness is for databases, and given my expertise in other branches of software development, it was logical for me to consider the possibility that there is some very special mechanism for uniqueness checks baked deep into database engine, that is highly optimized for this one specific usecase, and as a result, completely unusable for anything else (if only because there is no other entry point for those functions) - meaning that it's possible that all knowledge about how uniqueness checks work is not applicable to other things that could theoretically be done by indexes but isn't as common, so DB engine developers haven't made any efforts to make the uniqueness checking code more general.


  • Considered Harmful

    a database of immutable facts as of instants in time is inherently lock free



  • @sebastian-galczynski said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Of course it is, that's how UNIQUE INDEX works. You think it scans all rows on each insert/update? :D

    To be fair, given how abstractions have become higher and higher level, it shouldn't be all that surprising. Most/all of my coworkers couldn't care less about what index fragmentation is, or why a clustering key should be small and ever-ascending, for example.

    It's sort of like how when I started programming, we were taught that once you switched to a non-text video mode, the video buffer was at address 0xA0000 and we could go nuts. Nowadays, we're taught that a "garbage collector" comes by and disposes of objects we aren't using anymore. At some point. In some manner. Don't worry about it. And then everything's hunky-dory until you play Kerbal Space Program and the GC pulses every few seconds.



  • @Gribnit said in The Revival of Great SQL Ideas:

    a database of immutable facts as of instants in time is inherently lock free

    Sounds very Java.


  • Considered Harmful

    @Groaner said in The Revival of Great SQL Ideas:

    @sebastian-galczynski said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Of course it is, that's how UNIQUE INDEX works. You think it scans all rows on each insert/update? :D

    To be fair, given how abstractions have become higher and higher level, it shouldn't be all that surprising. Most/all of my coworkers couldn't care less about what index fragmentation is, or why a clustering key should be small and ever-ascending, for example.

    It's sort of like how when I started programming, we were taught that once you switched to a non-text video mode, the video buffer was at address 0xA0000 and we could go nuts. Nowadays, we're taught that a "garbage collector" comes by and disposes of objects we aren't using anymore. At some point. In some manner. Don't worry about it. And then everything's hunky-dory until you play Kerbal Space Program and the GC pulses every few seconds.

    Just launch it with 0GC. You should make it way longer than a few seconds. And it's a good excuse to get a more expensive computer.


  • Considered Harmful

    @Gąska said in The Revival of Great SQL Ideas:

    @sebastian-galczynski said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Of course it is, that's how UNIQUE INDEX works. You think it scans all rows on each insert/update? :D

    Unique index has the opposite purpose from the one discussed - ensure there are NO duplicates - so due to how special uniqueness is for databases, and given my expertise in other branches of software development, it was logical for me to consider the possibility that there is some very special mechanism for uniqueness checks baked deep into database engine, that is highly optimized for this one specific usecase, and as a result, completely unusable for anything else (if only because there is no other entry point for those functions) - meaning that it's possible that all knowledge about how uniqueness checks work is not applicable to other things that could theoretically be done by indexes but isn't as common, so DB engine developers haven't made any efforts to make the uniqueness checking code more general.

    No.



  • @Gribnit said in The Revival of Great SQL Ideas:

    Just launch it with 0GC. You should make it way longer than a few seconds. And it's a good excuse to get a more expensive computer.

    Define "more expensive computer." The build I'm currently on was around $2k including accessories in 2016 dollars, including a GTX 1070, 32 gigs of RAM at 3GHz, and six physical cores overclocked to 4GHz.

    Performance has admittedly gotten better since the game was alpha, but anything with 200+ parts (i.e. anything fun) is going to turn into a slideshow eventually. Also, I need to use 4x physics warp on a lot of my craft (if you peruse the relevant threads, you'll note that I like building LV-N SSTOs as they're still fun when you've sunk over 1000 hours into the game).

    If I get the urge to play it again, maybe I'll try that out. But the last mission that captured my interest was a solar retrograde rescue mission, which involved a bi-elliptic transfer and lots of ions, and it's hard to top that. There isn't much I haven't done in that game, apart from an Eve land/return (requires pointlessly large craft and landing on Eve is annoying with the "new" aero) and Jool ascent (too stupid).


  • Considered Harmful

    @Groaner said in The Revival of Great SQL Ideas:

    Define "more expensive computer."

    0GC just dumps it on the heap and leaves it there forever. You will need to spend an amount on RAM commensurate with your intended play-time.

    But there will be no GC hiccups.



  • @Gribnit said in The Revival of Great SQL Ideas:

    @Groaner said in The Revival of Great SQL Ideas:

    Define "more expensive computer."

    0GC just dumps it on the heap and leaves it there forever. You will need to spend an amount on RAM commensurate with your intended play-time.

    But there will be no GC hiccups.

    Ah, the null garbage collector!


  • Banned

    @Gribnit said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    @sebastian-galczynski said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Of course it is, that's how UNIQUE INDEX works. You think it scans all rows on each insert/update? :D

    Unique index has the opposite purpose from the one discussed - ensure there are NO duplicates - so due to how special uniqueness is for databases, and given my expertise in other branches of software development, it was logical for me to consider the possibility that there is some very special mechanism for uniqueness checks baked deep into database engine, that is highly optimized for this one specific usecase, and as a result, completely unusable for anything else (if only because there is no other entry point for those functions) - meaning that it's possible that all knowledge about how uniqueness checks work is not applicable to other things that could theoretically be done by indexes but isn't as common, so DB engine developers haven't made any efforts to make the uniqueness checking code more general.

    No.

    What exactly no?


  • Java Dev

    @sebastian-galczynski said in The Revival of Great SQL Ideas:

    @Gąska said in The Revival of Great SQL Ideas:

    I'm not an expert on this, so I'm going to ask: is it possible to query an index whether a given value already exists in a column, without enumerating each row? If there is, then yes, you have a point there.

    Of course it is, that's how UNIQUE INDEX works. You think it scans all rows on each insert/update? :D

    That has nothing to do with unique.

    A (b-tree) index is a binary tree which you can use to look up a value and find which rows (if any) match that value, at an efficiency of O(log(n)). Looking up a specific value does not require a full table scan, and if you're not selecting any columns which are not in the index (EG because your select list is your own lookup key, or 1, or COUNT(*)) it doesn't need to access the underlying table at all.

    A bitmap index is a (compressed) set of arrays of bits, storing for each potential column value which rows have that value. I'm not sure of the performance considerations of using such an index to test if a value is used.

    A unique constraint, generally enforced with a b-tree index, enforces that any value or set of values can only occur once.


  • Discourse touched me in a no-no place

    @Gąska said in The Revival of Great SQL Ideas:

    I admit I don't know enough about DBs or webdev to know the right solution.

    Don't host servers in broom cupboards?


  • Banned

    @Groaner said in The Revival of Great SQL Ideas:

    It's sort of like how when I started programming, we were taught that once you switched to a non-text video mode, the video buffer was at address 0xA0000 and we could go nuts. Nowadays, we're taught that a "garbage collector" comes by and disposes of objects we aren't using anymore. At some point. In some manner. Don't worry about it. And then everything's hunky-dory until you play Kerbal Space Program and the GC pulses every few seconds.

    Note that if someone tried to explain in detail how modern GCs work at the low level, they would be either mostly wrong about every aspect, or incomprehensible to most programmers (especially those who don't already have a very good idea how GCs work), or both. Computers have become so advanced that traditional knowledge usually doesn't apply anymore. For example - what's the main difference between compiled and interpreted languages? If you were to say something like "interpreted languages don't get compiled", you would be right in the 1998, but couldn't be more wrong in 2018.


  • Java Dev

    @Gąska I'd rather say that pure interpreted languages are rare nowadays, and many languages you'd traditionally think of as interpreted are just-in-time compiled nowadays. And that then you can go split hairs on how to classify bytecode.


Log in to reply