The Revival of Great SQL Ideas

dkf

@AlexMedia said in The Revival of Great SQL Ideas:

@mott555 said in The Revival of Great SQL Ideas:

"In the middle of the street"

Isn't that where our house is normally located?

Sure…

Also…
https://youtu.be/o9OPFvhFq0c

Gąska

@PleegWat said in The Revival of Great SQL Ideas:

@Gąska I'd rather say that pure interpreted languages are rare nowadays, and many languages you'd traditionally think of as interpreted are just-in-time compiled nowadays. And that then you can go split hairs on how to classify bytecode.

My point exactly. Interpreted languages have progressed so much that they're hardly interpreted anymore.

JBert

@dkf said in The Revival of Great SQL Ideas:

@AlexMedia said in The Revival of Great SQL Ideas:

@mott555 said in The Revival of Great SQL Ideas:

"In the middle of the street"

Isn't that where our house is normally located?

Sure…

I read somewhere that they had that house eventually torn down since the Chinese owners didn't like the life on a highway...

pie_flavor

@Groaner said in The Revival of Great SQL Ideas:

And then everything's hunky-dory until you play Kerbal Space Program and the GC pulses every few seconds.

That's why I like Java's G1. It makes the JVM slower overall but doesn't create big pauses, so on average it's the same speed and more stable.

TwelveBaud

@Gąska said in The Revival of Great SQL Ideas:

Is it actually guaranteed that operations that don't require actually reading the record, don't read the record?

This, by the way, is one of the many reasons why Discourse performance sucked so bad. Most database engines include the full value of each column indexed in the index, but the one Discourse comes with doesn't.

Fake edit: by @PleegWat

sebastian.galczynski

@PleegWat said in The Revival of Great SQL Ideas:

That has nothing to do with unique.

I know, I just used unique index as an example. Anyway, in this example scanning the whole table is not necessary, whether you use EXISTS or simply a separate SELECT.

boomzilla

@TwelveBaud said in The Revival of Great SQL Ideas:

@Gąska said in The Revival of Great SQL Ideas:

Is it actually guaranteed that operations that don't require actually reading the record, don't read the record?

This, by the way, is one of the many reasons why Discourse performance sucked so bad. Most database engines include the full value of each column indexed in the index, but the one Discourse comes with doesn't.

Fake edit: by @PleegWat

Wait, what? PostgreSQL does this? Why?

TwelveBaud

@boomzilla It apparently uses bitmap indexes exclusively, no B-tree indexes. Why? No idea. But a friend who used to run a hosting company told us this:

@Thalagyrt said in You merely adopted the 500. I was born in it, molded by it. I didn't see a fully rendered Discourse page until I was already a man:

Seems like Discourse's indexes are explicitly targeting MySQL's query optimizer, looking at them more in depth. Lots of things are missing that would allow Postgres to come up with more efficient optimization plans. There's next to no need for multi-column indexes in Postgres, and pretty much every index is a multi-column index. Postgres can use a multi-column index for a single column, but only if that single column is the first column in the multi-column index. More to the point, Postgres can combine multiple single-column indexes on the same table and gain benefits from both. The only time all these multi-column indexes make sense is when you're using the entire index in a query. i.e. with col_a and col_b indexed, your query is exactly where col_a=? and col_b=?.

Why the hell is topic_id not indexed? Is it just not used in queries at all, or was that an oversight? I'm not incredibly familiar with Discourse.

And then we had Ben run a query plan for "get me all the posts from the Likes Thread" and PG was trying to use a bitmap index but fell over when filtering out posts that were deleted and went "fuck it, TABLE SCAN!"

boomzilla

@TwelveBaud said in The Revival of Great SQL Ideas:

It apparently uses bitmap indexes exclusively, no B-tree indexes. Why? No idea.

Not at least for 9.1:

Oct 27, 2016

Index Types

PostgreSQL provides several index types: B-tree, Hash, GiST and GIN. Each index type uses a different algorithm that is best suited to different types of queries. By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations.

@TwelveBaud said in The Revival of Great SQL Ideas:

And then we had Ben run a query plan for "get me all the posts from the Likes Thread" and PG was trying to use a bitmap index but fell over when filtering out posts that were deleted and went "fuck it, TABLE SCAN!"

Whoa. I mean...no posts were deleted there (except temporarily or accidentally), so that particular thing wouldn't mean anything, but of course the DB has to be sure. Still...seems like the topic_id or whatever should have been filtered on.

But then, when you're reading in as many records as the likes thread had, it's not surprising to me that it would choose a full table scan. For that volume of records, picking out the blocks to read across the data would end up reading a whole lot of stuff you didn't need anyways and keeping track of that sometimes isn't worth it (so say the query optimizers).

That's where you typically have to get into some sort of partitioning scheme to start getting performance improvements.

But of course, this all begs the question that loading up all the posts in /t/1000 at once all the time is a good idea that won't lead to performance problems.

Tsaukpaetra

@boomzilla said in The Revival of Great SQL Ideas:

But of course, this all begs the question that loading up all the posts in /t/1000 at once all the time is a good idea that won't lead to performance problems.

It wouldn't be a problem if you limited your post-count-per-topic to maybe a thousand or so!

Gribnit

Eureka! We make just one table, and give it a key and a value column!

Zecc

@Gribnit said in The Revival of Great SQL Ideas:

Nothing, as long as they want to remain small and not do things like, perhaps, look at the favoriteColors of more than one person at a time, for instance to see who-all likes "blue".

Sometimes there is a right way to do a thing - said right way does not constitute a cargo cult until the understanding is lost. 2 + 2 = 4 isn't a cargo cult.

Whom did @Gribnit steal this coherent post from?

PleegWat

@Gribnit said in The Revival of Great SQL Ideas:

Eureka! We make just one table, and give it a key and a value column!

You forgot the entity column.

LaoC

@anonymous234 said in The Revival of Great SQL Ideas:

I call this the "principle of best effort": ideally, the developer should be able to give the computer system any amount of information about the data it's going to handle, and have the computer system perform as best as possible with it.

You need SwampSearch

HardwareGeek

@LaoC said in The Revival of Great SQL Ideas:

@anonymous234 said in The Revival of Great SQL Ideas:

as best as possible

SwampSearch

Hmmm...

dkf

@Gribnit said in The Revival of Great SQL Ideas:

We make just one table, and give it a key and a value column!

No, we make lots of tables and only put one row in each.

Karla

@dkf said in The Revival of Great SQL Ideas:

@Gribnit said in The Revival of Great SQL Ideas:

We make just one table, and give it a key and a value column!

No, we make lots of tables and only put one row in each.

Or lots of tables with only one column other than the PK.

Generate report with 1 row methodology:

SQL union hell

Generate report with 1 column methodology:

-- SQL join hell

HardwareGeek

@Karla Lots of tables with only one row and one column.

Benjamin Hall

@HardwareGeek said in The Revival of Great SQL Ideas:

@Karla Lots of tables with only one row and one column.

It wasn't SQL, but my gradebook webapp when I was a teacher was basically built using <table> tags with only one (or sometimes two) elements. Which were usually more <table>s.

It's ~~turtles~~tables all the way down.

Zecc

@HardwareGeek said in The Revival of Great SQL Ideas:

@Karla Lots of tables with only one row and one column.

With creative use of information_schema, you don't even need any rows!

Zecc

Let's just store everything in stored procedures and call it a day.

dkf

@Zecc said in The Revival of Great SQL Ideas:

Let's just store everything in stored procedures and call it a day.

Well duh. They're stored procedures. The clue is in the name.

Zecc

I wonder how many people out there thought about creating an extension for a given database which redirects data storage into a different kind of database.

Kamil Podlesak

@Benjamin-Hall said in The Revival of Great SQL Ideas:

@HardwareGeek said in The Revival of Great SQL Ideas:

@Karla Lots of tables with only one row and one column.

It wasn't SQL, but my gradebook webapp when I was a teacher was basically built using <table> tags with only one (or sometimes two) elements. Which were usually more <table>s.

It's ~~turtles~~tables all the way down.

That's how we did page layout in the old days, before ~~css~~frontend developers to throw it at

dkf

@Zecc said in The Revival of Great SQL Ideas:

I wonder how many people out there thought about creating an extension for a given database which redirects data storage into a different kind of database.

You mean like a pluggable storage engine, something that multiple databases already have support for? (It's not a standardised thing, but nor would I expect it to be.)

Zecc

@dkf Exactly, but I thinking specifically of something like making SqlServer store data inside SQLite.

Carnage

@Zecc said in The Revival of Great SQL Ideas:

@dkf Exactly, but I thinking specifically of something like making SqlServer store data inside SQLite.

Hey, just create the engine to do it.

dkf

@Zecc said in The Revival of Great SQL Ideas:

@dkf Exactly, but I thinking specifically of something like making SqlServer store data inside SQLite.

I was thinking of making SQLite store data inside Berkeley DB. (Yes, I know of someone who's worked on that. One of the SQLite core devs in fact…)

Jaime

@dkf
I was thinking along the lines of someone that mandates a SQL Server application interface of "always use stored procedures", so they can manage future changes. Someone in management asks if we can ship a version of the product using Oracle... so they install a LocalDB instance of SQL Server (because Oracle stored procedures aren't quite plug-compatible with MSSQL stored procedures), and the LocalDB procedures forward the requests on to the Oracle server.

Now... you have to support both and have all the problems and security issues of both. All the while, you can't use anything Microsoft-only because forwarding it on to the Oracle server will require a re-implementation, and you can't use anything Oracle-only because it will get mangled as it goes through the SQL Server.

dkf

@Jaime As long as the result is called Frankendatabase's Monster, we're fine.

Gribnit

@Zecc said in The Revival of Great SQL Ideas:

I wonder how many people out there thought about creating an extension for a given database which redirects data storage into a different kind of database.

The JCR folks did, in the Connector portion of the spec.

Watson

@Zecc
That has been standardised...

SQL/MED - Wikipedia

loopback0

@Kamil-Podlesak said in The Revival of Great SQL Ideas:

@Benjamin-Hall said in The Revival of Great SQL Ideas:

@HardwareGeek said in The Revival of Great SQL Ideas:

@Karla Lots of tables with only one row and one column.

It wasn't SQL, but my gradebook webapp when I was a teacher was basically built using <table> tags with only one (or sometimes two) elements. Which were usually more <table>s.

It's ~~turtles~~tables all the way down.

That's how we did page layout in the old days, before ~~css~~frontend developers to throw it at

Paging Dr @Zenith to the <table> department.

Zenith

@Jaime Which is why some of us advocate not putting a ton of logic, especially procedural/iterative logic, into the database layer. You can pretty easily make an application support multiple SP interfaces. It's the content you don't want to do a ton of duplication/rewriting on.

Gribnit

@Zenith said in The Revival of Great SQL Ideas:

@Jaime Which is why some of us advocate not putting a ton of logic, especially procedural/iterative logic, into the database layer. You can pretty easily make an application support multiple SP interfaces.

There are of course developers who would rather put code in the persistence layer, but thankfully, they keep dying.

Not that stored procedures are a bad thing, mind. As long as you're sure that the company will fail before they become a problem, or will never grow significantly, they remain a fine choice.

It's the content you don't want to do a ton of duplication/rewriting on.

Yes, although, you can do as much as you want of either as long as you don't do both.

LaoC

@Zenith said in The Revival of Great SQL Ideas:

@Jaime Which is why some of us advocate not putting a ton of logic, especially procedural/iterative logic, into the database layer. You can pretty easily make an application support multiple SP interfaces. It's the content you don't want to do a ton of duplication/rewriting on.

SELECT CONCAT(
    "<tr><td>",thing,"</td><td>",colorofthething,"</td></tr>"
) FROM thingz;

Gąska

@Karla said in The Revival of Great SQL Ideas:

@dkf said in The Revival of Great SQL Ideas:

@Gribnit said in The Revival of Great SQL Ideas:

We make just one table, and give it a key and a value column!

No, we make lots of tables and only put one row in each.

Or lots of tables with only one column other than the PK.

It's called the 6th normal form, and it's surprisingly useful for certain tasks with the right tools (although anything SQL is very definitely not the right tool).

Fun fact: there is no single, generally agreed on definition of 4th and 5th normal forms, but there are ones for 3.5th and 6th.

Zecc

@Watson said in The Revival of Great SQL Ideas:

@Zecc
That has been standardised...

SQL/MED - Wikipedia

TIL and I'm grateful I haven't felt the need for it so far.

Any idea how well implemented it is de facto?

dkf

@Gribnit said in The Revival of Great SQL Ideas:

There are of course developers who would rather put code in the persistence layer, but thankfully, they keep dying.

That's usually a mark of an organization that's got the wrong front-end tools.

Watson

@Zecc In the general case, no idea; I've only used it once myself and only in a pretty simple situation (upgrading a Postgres DB from 8.something to 12 without a delay of operations to do a dump-and-restore).

topspin

@Gąska said in The Revival of Great SQL Ideas:

@Karla said in The Revival of Great SQL Ideas:

@dkf said in The Revival of Great SQL Ideas:

@Gribnit said in The Revival of Great SQL Ideas:

We make just one table, and give it a key and a value column!

No, we make lots of tables and only put one row in each.

Or lots of tables with only one column other than the PK.

It's called the 6th normal form, and it's surprisingly useful for certain tasks with the right tools (although anything SQL is very definitely not the right tool).

Fun fact: there is no single, generally agreed on definition of 4th and 5th normal forms, but there are ones for 3.5th and 6th.

Would that make 7th normal form a single column and row containing a BLOB of an sqlite file?

dkf

@topspin said in The Revival of Great SQL Ideas:

@Gąska said in The Revival of Great SQL Ideas:

@Karla said in The Revival of Great SQL Ideas:

@dkf said in The Revival of Great SQL Ideas:

@Gribnit said in The Revival of Great SQL Ideas:

We make just one table, and give it a key and a value column!

No, we make lots of tables and only put one row in each.

Or lots of tables with only one column other than the PK.

It's called the 6th normal form, and it's surprisingly useful for certain tasks with the right tools (although anything SQL is very definitely not the right tool).

Fun fact: there is no single, generally agreed on definition of 4th and 5th normal forms, but there are ones for 3.5th and 6th.

Would that make 7th normal form a single column and row containing a BLOB of an sqlite file?

Make that BLOB hold a ZIP containing an SQLite file…

boomzilla

@Zecc said in The Revival of Great SQL Ideas:

@Watson said in The Revival of Great SQL Ideas:

@Zecc
That has been standardised...

SQL/MED - Wikipedia

TIL and I'm grateful I haven't felt the need for it so far.

Any idea how well implemented it is de facto?

It's not mentioned but our Oracle database links to an external SQL Server DB for a third party COTS component we've integrated with. For certain things we use their API calls but for other things (especially reporting) we go right to the DB.

I don't know the details of setting it up but using it is fairly straight forward. You'd use something along the lines of select * from ForeignTableName@ExternalLinkName.

You have to be careful, performance wise, and the syntax can be a bit tricky since you're mixing different dialects of SQL.

loopback0

@boomzilla said in The Revival of Great SQL Ideas:

@Zecc said in The Revival of Great SQL Ideas:

@Watson said in The Revival of Great SQL Ideas:

@Zecc
That has been standardised...

SQL/MED - Wikipedia

TIL and I'm grateful I haven't felt the need for it so far.

Any idea how well implemented it is de facto?

It's not mentioned but our Oracle database links to an external SQL Server DB for a third party COTS component we've integrated with. For certain things we use their API calls but for other things (especially reporting) we go right to the DB.

I don't know the details of setting it up but using it is fairly straight forward. You'd use something along the lines of select * from ForeignTableName@ExternalLinkName.

Database link - same as connecting one Oracle DB to another, just with an additional step to add the ODBC driver for MSSQL.

Database Administrator's Guide

boomzilla

@loopback0 yeah, that.

Gribnit

@LaoC said in The Revival of Great SQL Ideas:

@Zenith said in The Revival of Great SQL Ideas:

@Jaime Which is why some of us advocate not putting a ton of logic, especially procedural/iterative logic, into the database layer. You can pretty easily make an application support multiple SP interfaces. It's the content you don't want to do a ton of duplication/rewriting on.
SELECT CONCAT(
    "<tr><td>",thing,"</td><td>",colorofthething,"</td></tr>"
) FROM thingz;

You can't just post Oracle Forms source code on forums and expect Larry not to get you.

dkf

@LaoC said in The Revival of Great SQL Ideas:

SELECT CONCAT(
    "<tr><td>",thing,"</td><td>",colorofthething,"</td></tr>"
) FROM thingz;

That sort of code is why we can't have nice things, boys and girls.

Gribnit

@dkf said in The Revival of Great SQL Ideas:

@LaoC said in The Revival of Great SQL Ideas:
SELECT CONCAT(
    "<tr><td>",thing,"</td><td>",colorofthething,"</td></tr>"
) FROM thingz;
That sort of code is why we can't have nice things, boys and girls.

Yeah, the select should be grabbing up just 2-TD chunks, who knows when you'll need to add a column. Amateurs. I mean, sure, keep a convenience spelling of the name that gives the whole TR to retain API compatibility, fine.

Zenith

@Gribnit said in The Revival of Great SQL Ideas:

@Zenith said in The Revival of Great SQL Ideas:

@Jaime Which is why some of us advocate not putting a ton of logic, especially procedural/iterative logic, into the database layer. You can pretty easily make an application support multiple SP interfaces.

There are of course developers who would rather put code in the persistence layer, but thankfully, they keep dying.

Not that stored procedures are a bad thing, mind. As long as you're sure that the company will fail before they become a problem, or will never grow significantly, they remain a fine choice.

Depends on what you meant by code. I meant just enough code to read and write records directly. In practice, you do end up with some complex SELECT statements if you have to do reporting. From the front end, the .NET ADO classes are similar enough you can call those procedures without modifying the C# code. Or do like I did and write an interface that forces them to act similarly.

It's when your application is some dumb JavaScript (or pretend C# like MVC) front-end with just enough code to fill in SP parameters and then your SPs call SPs that call SPs that call SPs that do math/string manipulation while launching command line jobs that read from temporary tables in between transactions that you have a problem. Entity and the like are also stupid because an ORM just shifts the problem so you don't have to see/generate SQL for your sloppily designed application and now you've offloaded control over what happens to a different black box.