Can't use a view--should I just directly join tables in code or use a stored procedure?

Benjamin Hall

Pretty much what the title says.

I have a MySQL () CRUD application that has a normalized structure. For ease of querying (since I mainly want the same basic information each time) I made a view. Turns out that prepared statements and views don't mix in a stable manner (long-standing bug in MySQL). So should I replicate that view structure in my query strings (doing the joins each time) or should I create a set of stored procedures for the queries?

The queries are basically

SELECT * FROM <view> WHERE <filter> = <value> ORDER BY <date>

and the view is defined as

CREATE VIEW completionreport AS
SELECT c.id AS id,
       c.student_email AS student_email,
       a.title AS title,
       c.completed_on AS completed_on,
       s.name AS subject,
       a.uuid AS assignment_id
FROM completion c
       LEFT JOIN assignment a ON c.assignment_id = a.id
       LEFT JOIN subject s ON a.subject = s.id

I'm basically searching for one of the following:
student_email = value
title = value
subject = value
or completed_on before/after value

This part is read only, the insertions and deletions are handled elsewhere more directly.

So, copy that crap into each query or make a set of stored procedures? Opinions?

RaceProUK

@benjamin-hall I'm tempted to recommend stored procedures for this, although it may be possible to do it in a single procedure, if it's carefully designed.

Benjamin Hall

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@benjamin-hall I'm tempted to recommend stored procedures for this, although it may be possible to do it in a single procedure, if it's carefully designed.

May I ask what the pros/cons are in your mind? I'm totally new to SQL, so I'm trying to learn how not to cause moments.

And how hard would it be to "carefully design" a single procedure?

wharrgarbl

@benjamin-hall I like these things in code because they're better version controlled, and I don't have to worry about the production server running some old version of said procedures and whatnot

masonwheeler

@wharrgarbl said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@benjamin-hall I like these things in code because they're better version controlled, and I don't have to worry about the production server running some old version of said procedures and whatnot

You can (and should!) put your database-setup scripts in version control too.

RaceProUK

@benjamin-hall said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

May I ask what the pros/cons are in your mind?

The main reason I'd favour sprocs is mainly to contain the logic inside an easy-to-use wrapper. You can still get a table out of them, same as any other statement.

@benjamin-hall said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

And how hard would it be to "carefully design" a single procedure?

In this case, not too hard: the easiest way would be to have the query inside with all the conditions and some null checks to bypass them, and a parameter list where you specify only the one you need. That way, you only have one copy of the query, instead of the four or five you'd have by using separate sprocs.

@wharrgarbl said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

I like these things in code because they're better version controlled

Tools exist that allow you to keep DB object definitions in source control.

accalia

@benjamin-hall said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

stored procedures for the queries?

just use these.

it's more annoyance in the short term but a fucking lifesaver in the long term.

do it right, don't be

:-)

masonwheeler

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

Tools exist that allow you to keep DB object definitions in source control.

Yeah. It's called "source control". :P

You put your "DB object definitions" in .sql scripts, which are plain text files containing CREATE WHATEVER statements, and save them in a folder alongside your source code.

In a well-run shop, you'll give the files some sort of serial number in their filename, and have a tool that will apply them to the database in order, and save the applied scripts' serial numbers to a special table so it knows which ones still need to be applied.

Benjamin Hall

@accalia @RaceProUK Thinking about trying to do it in one sproc. Which would be better:

Create a temp table in the sproc that selects like a view, just temporary. Then have a construct that picks the not-null parameter and selects from the temp table based on that value. Benefit: only have to have the "view" code once makes for easier maintenance. Downside: temp tables? are they slow (not that it really matters, but trying to learn good habits).
Have the null checks to select up front and then have the query duplicated in each of the branches. Benefit: avoids temp tables. Downsides: much duplicated code; more room for things to break if something has to change.

I'd really rather not write a specific sproc for each case, duplicating the logic each time.

Thanks for your help =)

RaceProUK

@masonwheeler said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

In a well-run shop, you'll give the files some sort of serial number in their filename, and have a tool that will apply them to the database in order, and save the applied scripts' serial numbers to a special table so it knows which ones still need to be applieduse a tool designed to synchronise SQL schemas like RedGate SQL Compare.

masonwheeler

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

RedGate

RaceProUK

@masonwheeler said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

RedGate

Still better than having a million indexed scripts that have to be run in a precise order.

blakeyrat

@benjamin-hall said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

I have a MySQL () CRUD application that has a normalized structure. For ease of querying (since I mainly want the same basic information each time) I made a view. Turns out that prepared statements and views don't mix in a stable manner (long-standing bug in MySQL).

I'd use the prepared ~~view~~query with the JOIN prepopulated.

MySQL is shit, but even it shouldn't have trouble figuring out how to ignore joined tables that aren't used in generating the output. Plus that way you can use the prepared statements you're using everywhere else, so it's more consistent.

If you're going the SPROC route, you'd probably want to go all-in, which would mean rewriting a lot of code.

boomzilla

@accalia said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

it's more annoyance in the short term but a fucking lifesaver in the long term.

It seems like mostly an annoyance to me. What's the lifesaving that's going on? I have some stored procedures for raisins and they're much more of a PITA than having my queries in the code. Especially from a configuration management perspective.

And, yes, before someone pipes in, everything defining that sort of thing is in version control.

blakeyrat

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

The main reason I'd favour sprocs is mainly to contain the logic inside an easy-to-use wrapper. You can still get a table out of them, same as any other statement.

?

SPROCS are notoriously annoying to handle if you have variable numbers of arguments. And you get a table out of the prepared statement query too, so.

Benjamin Hall

@blakeyrat said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@benjamin-hall said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

I have a MySQL () CRUD application that has a normalized structure. For ease of querying (since I mainly want the same basic information each time) I made a view. Turns out that prepared statements and views don't mix in a stable manner (long-standing bug in MySQL).

I'd use the prepared view with the JOIN prepopulated.

MySQL is shit, but even it shouldn't have trouble figuring out how to ignore joined tables that aren't used in generating the output. Plus that way you can use the prepared statements you're using everywhere else, so it's more consistent.

If you're going the SPROC route, you'd probably want to go all-in, which would mean rewriting a lot of code.

I can't use a view due to a MySQL bug that makes prepared statements against views fail if there's essentially any database activity elsewhere on that server. Since this is shared hosting with a common database server... Prepared statements and SPROC calls work just fine in the same circumstances, so I have to transition from the view to another way of doing things. It's a simple app (right now) so rewriting code isn't too bad, but I'd rather figure out the best way of doing it now.

ben_lubar

Obligatory plug for bmdbupdate.exe, which is what all Inedo software including NCMS and WtfWebApp use.

RaceProUK

@blakeyrat said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

SPROCS are notoriously annoying to handle if you have variable numbers of arguments

Who said anything about variable numbers of arguments?

lolwhat

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@masonwheeler said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

RedGate

Still better than having a million indexed scripts that have to be run in a precise order.

I work for a company where they have all this stuff down to a science - a software tool that keeps track of script order for each release (and will run them for you against a set of DBs you specify), requirements for scripts to be rerunnable without breaking, etc. This system is vastly better than what Redgate gives you, especially if you have any more than a couple developers involved (we have several hundred devs writing scripts).

blakeyrat

@benjamin-hall I meant to type "query" not "view". Sorry, just a brain fart.

blakeyrat

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

Who said anything about variable numbers of arguments?

The OP.

I'm basically searching for one of the following:
student_email = value
title = value
subject = value
or completed_on before/after value

Benjamin Hall

@blakeyrat ah. Makes more sense. On that note, can I just save the join part to a variable and concatenate a different where part for each time of query? I'm not worrying about SQL injection since it's a proper prepared statement and the initial part is constant.

blakeyrat

@benjamin-hall said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

On that note, can I just save the join part to a variable and concatenate a different where part for each time of query?

I'm sure there's some Nazi who'd ding you for that in a code review. I'm fine with it.

RaceProUK

@blakeyrat said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@raceprouk said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

Who said anything about variable numbers of arguments?

The OP.

I'm basically searching for one of the following:
student_email = value
title = value
subject = value
or completed_on before/after value

That doesn't have to be a variable number of parameters, just a fixed set of five.

@benjamin-hall said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

On that note, can I just save the join part to a variable and concatenate a different where part for each time of query? I'm not worrying about SQL injection since it's a proper prepared statement and the initial part is constant.

I'm always wary of dynamic SQL, but it can be done securely if there's no user input.

masonwheeler

@lolwhat said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

(we have several hundred devs writing scripts).

Where in the world do you work? Heck, I'm at a freaking Fortune 500 company, and we don't have teams that big working on the same project. I think I'm on the largest software team, and it's about a dozen people. (Including management and QA.)

lolwhat

@masonwheeler It's several hundred devs over the whole org, not just on one project. It just so happens that our code base is rather monolithic.

izzion

@benjamin-hall
I want to start off with an important disclaimer: I work exclusively with MS-SQL and I do not know if there are differences with the query optimizer or plan cache in MySQL that would provide caveats or drawbacks to this method.

We do a lot of stored procedures for queries, mostly due to official policy. It's nice to have a tuning point the DBAs can leverage without having to make an application change - if I don't change the shape of the inputs or outputs, and don't change which data the procedure is returning given a fixed set of inputs, I can rewrite the query to my heart's desire to make the sproc work better.

A challenge with search queries like your issue is how do you handle them properly. The simple approach would be (I'm psuedocoding the datatypes for the columns - replace with the right datatype and length based on the actual underlying column in the tables):

CREATE PROCEDURE usp_CompletionReportSearch 
    @student_email VARCHAR(MAX)
    @title VARCHAR(MAX)
    @subject VARCHAR(MAX)
    @completed_before DATETIME
    @completed_after DATETIME
AS
SELECT
    c.id AS id
    ,c.student_email AS student_email
    ,a.title AS title
    ,c.completed_on AS completed_on
    ,s.name AS subject
    ,a.uuid AS assignment_id
FROM completion c
LEFT JOIN assignment a
    ON c.assignment_id = a.id
LEFT JOIN subject s
    ON a.subject = s.id
WHERE
    (@student_email IS NULL OR c.student_email = @student_email)
AND (@title IS NULL OR a.title = @title)
AND (@subject IS NULL OR s.name = @subject)
AND (@completed_before IS NULL OR c.completed_on <= @completed_before)
AND (@completed_after IS NULL OR c.completed_on >= @completed_after)

The problem with the above approach is that the most performant way to perform the search is going to vary widely depending on which search parameter(s) are requested - the best way to filter on student_email is probably going to use different indexes and order of operations in the query than filtering on subject and completed_on.

You then have two options for how to avoid the "Parameter Sniffing" problem of the monster search query:

Add an OPTION(RECOMPILE) hint to the query, so that it compiles a new query plan every execution, rather than working with a cached plan.
Use Dynamic SQL to build a query that only has WHERE filters for non-NULL parameters, which will allow SQL to store the plan cache for each (effectively ad-hoc) query that is generated for a specific set of parameters.

In general, option #2 is better if and only if your search parameters are such that you'll get similar plans for a given filter criteria no matter what value is passed in. In this case, parameters such as student_email are likely to be very selective all the time, while completed_on might have a wide variance in the number of rows that would return (depending on whether the search date range is 1 day or 1 year), and might benefit from RECOMPILE-ing the plan every time. You can also kind of split the baby, and do Dynamic-SQL and add the OPTION(RECOMPILE) when specific search parameters are in play, but that gets into really fugly maintainability.

So, if you're going with option #2 without any RECOMPILE, it would look something like this (note: I'm including the WHERE 1 = 1 for the sake of making the Dynamic SQL simple to build - if you have some filter criteria that will ALWAYS be used, you can use that in place of the 1 = 1 check, but even if you don't, the optimizer will just blast the 1 = 1 check out of the query anyway). Also, I'm renaming the dynamic SQL variables to @P_variable for readability, you can get away with using the exact same names if you prefer.

CREATE PROCEDURE usp_CompletionReportSearch 
    @student_email VARCHAR(MAX)
    @title VARCHAR(MAX)
    @subject VARCHAR(MAX)
    @completed_before DATETIME
    @completed_after DATETIME
AS
BEGIN
DECLARE @Query VARCHAR(MAX) = 'SELECT
    c.id AS id
    ,c.student_email AS student_email
    ,a.title AS title
    ,c.completed_on AS completed_on
    ,s.name AS subject
    ,a.uuid AS assignment_id
FROM completion c
LEFT JOIN assignment a
    ON c.assignment_id = a.id
LEFT JOIN subject s
    ON a.subject = s.id
WHERE 1 = 1'

IF @student_email IS NOT NULL
    @Query += ' AND c.student_email = @P_student_email'

IF @title IS NOT NULL 
    @Query += ' AND a.title = @P_title'

IF @subject IS NOT NULL
    @Query += ' AND s.name = @P_subject'

IF @completed_before IS NOT NULL
    @Query += ' AND c.completed_on <= @P_completed_before'

IF @completed_after IS NOT NULL
    @Query += ' AND c.completed_on >= @P_completed_after'

sp_executesql @Query
    ,@params = N'@P_student_email VARCHAR(MAX)
        @P_title VARCHAR(MAX)
        @P_subject VARCHAR(MAX)
        @P_completed_before DATETIME
        @P_completed_after DATETIME'
    ,@P_student_email = @student_email
    ,@P_title = @title
    ,@P_subject = @subject
    ,@P_completed_before = @completed_before
    ,@P_completed_after = @completed_after
END

masonwheeler

@lolwhat said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@masonwheeler It's several hundred devs over the whole org, not just on one project. It just so happens that our code base is rather monolithic.

I'm so sorry...

Magus

@masonwheeler said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

In a well-run shop, you'll give the files some sort of serial number in their filename, and have a tool that will apply them to the database in order, and save the applied scripts' serial numbers to a special table so it knows which ones still need to be applied.

In a place where people are remotely sane, you will have something else control the ordering, and never name your files dumb things that you can't get information on what's inside by looking at.

masonwheeler

@magus A typical script would have a name like 6500.035.Add foo column to bar table.sql. So you can tell at a glance exactly what it is: it belongs to version 6.5.0.0, serial number 35 within that version, and it adds a foo column to the bar table.

Magus

@masonwheeler We invented entire metadata systems so we wouldn't have to do that, and there are a billion other ways to keep track of that. Sorry, but your idea is dumb.

masonwheeler

@magus said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@masonwheeler We ~~invented entire metadata systems~~did a huge amount of work so we wouldn't have to do ~~that~~something much simpler, and there are a billion other really complicated ways to keep track of that. Sorry, but your idea is dumb.

FTFY :P

Magus

@masonwheeler You are not becoming more correct.

lolwhat

@masonwheeler Yeah, uh, the chance of human error alone renders cute file names unworkable, unless you are literally the only developer on the project and thus the error blows up only you.

masonwheeler

@lolwhat How do you figure? This is something that we never had any "blow up" level problems with, in 5 years I was there. Pretty much all the (very few) errors that there were, got caught in code review.

lolwhat

@masonwheeler said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@lolwhat How do you figure? This is something that we never had any "blow up" level problems with, in 5 years I was there. Pretty much all the (very few) errors that there were, got caught in code review.

In other words, code reviewers had to look out for those numbers being correct - one more thing they had on their checklist, but something that could easily be automated away with the app we have.

OK, so this serial number thing... what happens if you discover you need to put a script between two existing ones in the running order? You can't tell me such a situation never happens. Given what you've said so far, you'd need to renumber literally ever script after that one. Right? Whereas with our app, we just drag the new script and drop it between the two others.

RaceProUK

@masonwheeler said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

This is something that we never had any "blow up" level problems with, in 5 years I was there.

Congratulations on the astonishing run of luck you had.

masonwheeler

@lolwhat said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

You can't tell me such a situation never happens.

Sure I can. Once you've committed a script, the ordering is set in stone. If you want to change something, you create another script. This keeps s involving

you'd need to renumber literally ever script after that one. Right?

from ever happening.

lolwhat

@masonwheeler I know it's Friday, but you're taking the piss. So, with the system we have, whenever we, say, change a stored procedure named blah, we know that the script for that sproc is always, always, always named blah.sql. We also know that it's located in a precise folder within source control, named sprocs. On top of that, since the sproc is used within the foo DB type (we have several), that sprocs folder is in turn contained within a folder named foo. In other words, if you wish to see the change history of a given sproc, you know exactly where to look. No fucking way does that happen in your system, unless you manually look through the entire source tree or you have some magical source control provider that lets you somehow track that shit.

boomzilla

@blakeyrat said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

SPROCS are notoriously annoying to handle if you have variable numbers of arguments.

I know that in Oracle I can set default values and use named parameters when I call, which makes this case pretty easy. I haven't used SPROCs in MySQL, but it wouldn't surprise me if it sucked for that.

masonwheeler

@lolwhat Text search for "alter procedure blah". Got yer change history right here. Next?

lolwhat

@masonwheeler said in Can't use a view--should I just directly join tables in code or use a stored procedure?:

@lolwhat Text search for "alter procedure blah". Got yer change history right here. Next?

And not have any of the history tools that source control gives you if you keep the name the same in the first place. Not to mention you need Agent Ransack or some other full-text search tool that isn't instant.

cartman82

Everybody here is focusing on performance a bit too much IMO. I would always focus on system clarity and simplicity first, and worry about performance later, if the need even arises at all (as long as you are not doing ridiculously stupid things).

I went the sproc heavy route on an older project, with dynamic queries and everything. It can work, I got some performance out of it, but at the cost of system stability and increased complexity.

Compared to code, database is a clumsy heavy beast to manage. I can just pave over my code deployment in a second and restart the services. Can I do the same with database? It's almost like my DB code became a hostage of this giant pile of data which I had to carefully prod and maintain. It was doable, but a huge headache nonetheless.

These days, barring some special use case, I'd want all my code to be in actual code files that I can package, move around and easily deploy, and my database to remain as close to a dumb repository of data as possible.

TLDR: Put SQL in your code, keep database simple.