Small SQL-question

Kuro

I got a table with 3 columns (well, 4 if you count the primary index).
All three columns host are all numeric values.
All the numbers can be different or the same. It really doesn't matter.
I want all the entries from 1 of the columns that don't show up in the other columns.

example:

C1 | C2 | C3
 0   0    1
 4   3    1
 1   2    5
 3   4    6
 4   2    6

Let's say I want all the "unique" numbers from C3, That would be 5 and 6 because 1 is already used in in C1, row 3.

Is there a simple way to get those numbers? I haven't googled for it because I am not entirely sure what to google for.
One idea (by @Yamikuronue) was to use subqueries and NOT IN(). I am not sure if this would work and how fast this would work.

Database is a simple MySQL database because that's what I have at hand here.

Filed Under: You can derail this topic with how bad MySQL is after I have recieved help. Kthx

Addendum: I added another row to further illustrate what I mean

ChaosTheEternal

@Kuro If you want just the numbers, this may work:

SELECT C3 AS 'Unique C3'
  FROM theTable
 GROUP BY C3
HAVING COUNT(1) = 1;

May or may not be efficient, depends on the amount of data and indexes and whatnot. Someone else may have a much better idea.

boomzilla

You can use HAVING to filter based on aggregate values:

select
  c1,
  count(id)
from thetable
group by c1
having count(id) = 1

izzion

@boomzilla @ChaosTheEternal
Except, if I understand @Kuro correctly, the problem space isn't "what C3 values are unique in C3". I read the original question as: "find all values in C3 that do not exist in C1 or C2".

@Kuro, is that a correct interpretation of your requirement? Do you need to know what C1 and C2 are in rows that meet the filter, or just the distinct values in C3? Do you need to know how many times each "valid" C3 value occurs, or just that at least one case exists?

Kuro

@ChaosTheEternal @boomzilla

I must have not been clear enough (including my example. It's not exactly what I want, I think.

I added another row to the example to maybe make it clearer.

I don't want the unique C3s, I want the C3s that never appear in C1 and C2.

Filed Under; really appreaciate the help, though. Thanks

cartman82

In postgresql, but should work in MySQL as well.

CREATE TABLE numbers (
	id int PRIMARY KEY,
	c1 int NOT NULL,
	c2 int NOT NULL,
	c3 int NOT NULL
);

INSERT INTO numbers(id, c1, c2, c3)
VALUES
 (1, 0, 0, 1),
 (2, 4, 3, 1),
 (3, 1, 2, 5),
 (4, 3, 4, 6);

SELECT c3
FROM numbers
WHERE c3 NOT IN (
	SELECT c2
	FROM numbers
	UNION ALL
	SELECT c1
	FROM numbers
);

Kuro

@izzion said in Small SQL-question:

@Kuro, is that a correct interpretation of your requirement? Do you need to know what C1 and C2 are in rows that meet the filter, or just the distinct values in C3? Do you need to know how many times each "valid" C3 value occurs, or just that at least one case exists?

Yes, that interpretation is correct.

I need to know every C3, but I don't care how often the same value exists in C3. I don't even care what C1 and C2 are in those rows.

Filed Under: I hope I can make myself clear... I suck at asking these questions

mott555

I'm sure this is dumb and inefficient, and maybe not even quite usable SQL (haven't done much SQL in a few years) but something like this might work.

SELECT DISTINCT(C3) AS myValue
FROM theTable
WHERE
myValue NOT IN (SELECT C1 FROM theTable)
AND myValue NOT IN (SELECT C2 FROM theTable);

boomzilla

@izzion said in Small SQL-question:

xcept, if I understand @Kuro correctly, the problem space isn't "what C3 values are unique in C3". I read the original question as: "find all values in C3 that do not exist in C1 or C2".

I read it that way, too, at first, but then I re-read and that's not what I thought.

@Kuro said in Small SQL-question:

I don't want the unique C3s, I want the C3s that never appear in C1 and C2.

select distinct
  t.c3
from thetable t
left join thetable t2 on t.c3 = t2.c1
left join thetable t3 on t.c3 = t3.c2
where t2.id is null and t3.id is null

izzion

@boomzilla
I feel like the double left join like that would make Baby Jesus cry for any table of reasonable size, especially compared to @cartman82 's NOT IN syntax. I find myself debating whether doing some sort of SELECT DISTINCT on the c2 & c1 queries in the NOT IN clause would be faster, but I'm too lazy to set up any sort of dummy data to try things out and see.

boomzilla

@izzion said in Small SQL-question:

I feel like the double left join like that would make Baby Jesus cry for any table of reasonable size, especially compared to @cartman82 's NOT IN syntax.

reelz > feelz, which is to say...you never know unless you test it. Fuckin' databases.

Yamikuronue

@mott555 said in Small SQL-question:

SELECT DISTINCT(C3) AS myValue
FROM theTable
WHERE
myValue NOT IN (SELECT C1 FROM theTable)
AND myValue NOT IN (SELECT C2 FROM theTable);

That was basically my solution. I went for

SELECT C3
FROM theTable
WHERE C3 NOT IN (SELECT C1 FROM theTable UNION SELECT C2 FROM theTable)

because he didn't say he wanted distinct values

izzion

@mott555
(At least in MS-SQL world) the syntax you have wouldn't work because you can't use a column alias from the SELECT clause in the WHERE filter. WHERE gets parsed before SELECT, so there aren't any column aliases yet at that point of the query optimizer's execution.

FROM > WHERE > SELECT > ORDER BY, in terms of parsing order.

ChaosTheEternal

@izzion Yeah I read this:

Let's say I want all the "unique" numbers from C3

And just went from there, my bad for not reading the whole thing clearly.

anotherusername

@izzion something like this, then?

select distinct
  t.c3
from thetable t
left join (
  select distinct
    thetable.c1
  from thetable
) t2 on t.c3 = t2.c1
left join (
  select distinct
    thetable.c2
  from thetable
) t3 on t.c3 = t3.c2
where t2.c1 is null and t3.c2 is null

Kuro

@cartman82 This code does what I wanted to have happen.

Does anybody in here take issue with this code (as in, I should probably try other suggestions as well for performance, etc)?

Thank everybody who posted suggestions. You guys didn't even really derail, yet.

Filed Under:

izzion

@anotherusername
putting the "I Don't Even" in IDE today, I see...

mott555

@izzion said in Small SQL-question:

@mott555
(At least in MS-SQL world) the syntax you have wouldn't work because you can't use a column alias from the SELECT clause in the WHERE filter. WHERE gets parsed before SELECT, so there aren't any column aliases yet at that point of the query optimizer's execution.

FROM > WHERE > SELECT > ORDER BY, in terms of parsing order.

My knowledge of SQL is almost certainly an ugly blend of T-SQL, MySQL, and SQLite. I'm pretty sure my solution works somewhere, just not sure which SQL engine that would be.

But if he doesn't actually need unique like I thought he asked for, that gets rid of the alias.

anotherusername

@Kuro I'd say at least try some of the solutions that don't create a new table. But in the end, pick whatever's the fastest... (and works, obviously)

@mott555 said in Small SQL-question:

My knowledge of SQL is almost certainly an ugly blend of T-SQL, MySQL, and SQLite. I'm pretty sure my solution works somewhere, just not sure which SQL engine that would be.

Also, this.

boomzilla

@Kuro said in Small SQL-question:

Does anybody in here take issue with this code (as in, I should probably try other suggestions as well for performance, etc)?

If it runs reasonably, then probably no reason to worry about it. Premature optimization and all that. If the data is small enough then this all boils down to personal style preferences.

Kuro

@anotherusername said in Small SQL-question:

@Kuro I'd say at least try some of the solutions that don't create a new table. But in the end, pick whatever's the fastest... (and works, obviously)

I didn't copy the entire code. (Never copy the entire code). Pretty sure the created table was just for testing purposes. I just copied the select + NOT IN(subquery).

Filed Under: :)

mott555

@Kuro said in Small SQL-question:

Does anybody in here take issue with this code (as in, I should probably try other suggestions as well for performance, etc)?

How many rows are in the database and how often will the query be run? If it's a giant database and the query is running in code often, there might be room for improvement. If it isn't huge, or you're manually running the query occasionally, it probably doesn't matter enough to bother "fixing" it.

izzion

@boomzilla
At least until some group of nerds deploys your Forum For The Next Ten Years and breaks the shit out of your simple stored procedure because they created a topic with 10,000 posts and 250,000 likes, and it turns out you chose poorly with your simple throwaway function

anotherusername

@Kuro said in Small SQL-question:

@anotherusername said in Small SQL-question:

@Kuro I'd say at least try some of the solutions that don't create a new table. But in the end, pick whatever's the fastest... (and works, obviously)

I didn't copy the entire code. (Never copy the entire code). Pretty sure the created table was just for testing purposes. I just copied the select + NOT IN(subquery).

Filed Under: :)

oh, I totally misread that. Never mind. Yeah, that code looks fine.

Kuro

@boomzilla said in Small SQL-question:

If it runs reasonably, then probably no reason to worry about it. Premature optimization and all that. If the data is small enough then this all boils down to personal style preferences.

Well, I am not really "worried" about performance. But having people look over the code might bring forth some interesting insight. Maybe there is something I don't know about how this code works compared to others.

Filed Under: learning is a part of programming after all

Kuro

@mott555 said in Small SQL-question:

How many rows are in the database and how often will the query be run? If it's a giant database and the query is running in code often, there might be room for improvement. If it isn't huge, or you're manually running the query occasionally, it probably doesn't matter enough to bother "fixing" it.

Not enough to warrant any change. I am really just curious if there is a "more correct" solution I could apply if I ever ran into a similar problem on a bigger scale. :)

Filed Under: cheers

mott555

@Kuro In that case I'd go with what's simplest and easiest to understand. And I think we're already there.

Lorne Kates

I agree with whatever LEFT OUTER JOIN just said.

PleegWat

@Lorne-Kates Any database engine worth its salt should be able to transform between the LEFT OUTER JOIN ... WHERE ID IS NULL form and the WHERE ... NOT IN (SELECT ...) form

dangeRuss

@PleegWat said in Small SQL-question:

@Lorne-Kates Any database engine worth its salt should be able to transform between the LEFT OUTER JOIN ... WHERE ID IS NULL form and the WHERE ... NOT IN (SELECT ...) form

So not oracle then?

Karla

@Kuro Don't worry about performance until you have too.

I say this as someone who loves optimizing SQL.

Lorne Kates

@PleegWat said in Small SQL-question:

@Lorne-Kates Any database engine worth its salt should be able to transform between the LEFT OUTER JOIN ... WHERE ID IS NULL form and the WHERE ... NOT IN (SELECT ...) form

So not SteakDB... anyways.

I prefer LEFT OUTER JOIN because I read it easier, and other programmers suck.

Umm... I love you?

Lorne Kates

@dangeRuss said in Small SQL-question:

So not oracle then?

Oracle software is worth exactly it's weight in salt.

Filed under: What's the weight of some electrons?

djls45

My understanding (from looking at the plans for a few that we use here where I work) is that distinct and order by are the biggest time-killers for our queries for reports that can contain >100,000 rows. But maybe that's because the queries are written wrong in the first place. I would like to get in sometime and refactor them, but they're only run every couple months or so.

PleegWat

@djls45

xkcd: Is It Worth the Time?

Lorne Kates

@djls45 said in Small SQL-question:

is that distinct and order by are the biggest time-killers for our queries for reports that can contain >100,000 rows

If only there were some way to index columns you need to do sort-based operations on.

djls45

@Lorne-Kates said in Small SQL-question:

@djls45 said in Small SQL-question:

is that distinct and order by are the biggest time-killers for our queries for reports that can contain >100,000 rows

If only there were some way to index columns you need to do sort-based operations on.

We're not allowed to modify the database schema, and for whatever reason, we don't use views or stored procedures (which IIRC is the other way to get indices). If there's a way to create an index on an intermediate table/subquery in a query that doesn't take the same amount of time as the sort, I'm all ears.

Lorne Kates

@djls45 said in Small SQL-question:

If there's a way to create an index on an intermediate table/subquery in a query that doesn't take the same amount of time as the sort, I'm all ears.

Yes. But you take the performance hit populating the table variable.

We're not allowed to modify the database schema

Solution: replace your DBAs, they're defective.

djls45

@Lorne-Kates said in Small SQL-question:

@djls45 said in Small SQL-question:

If there's a way to create an index on an intermediate table/subquery in a query that doesn't take the same amount of time as the sort, I'm all ears.

Yes. But you take the performance hit populating the table variable.

Would this require splitting the One Massive Query For The Report into a series of smaller ones? I think that's what it would require in order to use table variables.

We're not allowed to modify the database schema

Solution: replace your DBAs, they're defective.

It has more to do with legal requirements and so anything that might possibly potentially lose data is absolutely verboten, forbidden, and not allowed, or else...

Lorne Kates

@djls45 said in Small SQL-question:

Would this require splitting the One Massive Query For The Report into a series of smaller ones? I think that's what it would require in order to use table variables.

Probably. Or at least you filter out logic vs. data retrieval.

DECLARE @SomeShit TABLE (SomeShitID int pk, SomeFuckingOrder int)

INSERT INTO @SomeShit (SomeShitID, SomeFuckingOrder)
SELECT GoddamnID, ROW_NUMBER() OVER (whatever)
FROM Shit
WHERE
    WholeBunchaShit = true
-- ORDER BY not needed because of ROW_NUMBER()

SELECT *
FROM @SomeShit
INNER JOIN Shit
INNER JOIN CornToShitRatio
INNER JOIN OtherShittyInformation
INNER JOIN ShitEaters
WHERE
-- @SomeShit.SomeFuckingOrder used to do pagination
ORDER BY @SomeShit.SomeFuckingOrder

Dragnslcr

@djls45 said in Small SQL-question:

Solution: replace your DBAs, they're defective.

It has more to do with legal requirements and so anything that might possibly potentially lose data is absolutely verboten, forbidden, and not allowed, or else...

If creating indexes might possibly potentially lose data, then your DBMS is so fragile that you're going to lose data eventually anyway.

e4tmyl33t

@Dragnslcr said in Small SQL-question:

@djls45 said in Small SQL-question:

Solution: replace your DBAs, they're defective.

It has more to do with legal requirements and so anything that might possibly potentially lose data is absolutely verboten, forbidden, and not allowed, or else...

If creating indexes might possibly potentially lose data, then your DBMS is so fragile that you're going to lose data eventually anyway.

That's what thrice-daily full backups that never get deleted are for.

PleegWat

@Dragnslcr said in Small SQL-question:

If creating indexes might possibly potentially lose data, then your DBMS is so fragile that you're going to lose data eventually anyway.

I've seen it happen - oracle again. The insert sometimes didn't update the index correctly, causing records to not be returned if the index was in the access path.

anotherusername

<>

@Karla said in Small SQL-question:

@Kuro Don't worry about performance until you have too.

Don't worry about performance until you've worried about performance? So, like, never?

@Lorne-Kates said in Small SQL-question:

worth exactly it's weight in salt

Don't make me come over there and slap you too...

</>

Karla

@anotherusername said in Small SQL-question:

<>

@Karla said in Small SQL-question:

@Kuro Don't worry about performance until you have too.

Don't worry about performance until you've worried about performance? So, like, never?

Man, I've been struggling to stay awake (and lost the struggle several times) during MS Dynamics Training. #sorryNotSorry

@Lorne-Kates said in Small SQL-question:

worth exactly it's weight in salt

Don't make me come over there and slap you too...

</>

Lorne Kates

@Karla said in Small SQL-question:

MS Dynamics Training

NAV or AX?

Karla

@Lorne-Kates said in Small SQL-question:

@Karla said in Small SQL-question:

MS Dynamics Training

NAV or AX?

I must have been sleeping during that part...the only answer in my mind is CRM.

Lorne Kates

@Karla said in Small SQL-question:

@Lorne-Kates said in Small SQL-question:

@Karla said in Small SQL-question:

MS Dynamics Training

NAV or AX?

I must have been sleeping during that part...the only answer in my mind is CRM.

Yes, to make it even more confusing they have a CRM that is called Dynamics but may or may not run any of the Microsoft Dynamics ERP softwares.

It is a slow, buggy crayon-colored shitshow with "Sales / Cases" along the top?

dkf

@anotherusername said in Small SQL-question:

Don't worry about performance until you've worried about performance? So, like, never?

Don't worry about fixing performance until you've shown that you've got a performance problem. Meanwhile, at least get the right answer; people don't care nearly so much about the performance of code that is wrong…

Karla

@Lorne-Kates said in Small SQL-question:

@Karla said in Small SQL-question:

@Lorne-Kates said in Small SQL-question:

@Karla said in Small SQL-question:

MS Dynamics Training

NAV or AX?

I must have been sleeping during that part...the only answer in my mind is CRM.

Yes, to make it even more confusing they have a CRM that is called Dynamics but may or may not run any of the Microsoft Dynamics ERP softwares.

It is a slow, buggy crayon-colored shitshow with "Sales / Cases" along the top?

Definitely slow, not very intuitive, it as Sales across the top. In your Cases may have a been a custom entity.

Today, I got caught nodding out.