It's Time To Get Over That Stored Procedure Aversion You Have
-
If you use a sproc to keep the data in sync, then you have to concern yourself with validating every code path that interacts with the hourly data.
It's easier than you think to police calling sprocs.
Well, one of the reasons is that we have a few tasks that occur on every update of a table. So every table has a write/update/view set.
-
It's easier than you think to police calling sprocs.
Is it easier than the zero effort it takes to police trigger usage? If not, then your statement is a form of "It's good because it doesn't hurt that much."
-
Sometimes you need to in order to get the performance. It's shit, I know, but sometimes you have no other choice.
I've never seen anybody who made that claim actually prove it via data.
I'm not saying it's wrong. But it's definitely in my, "hmmmmmmmmm" file.
-
Where I used to work (a vehicle telematics company), they would roll-up data in some form of cache, simply because there was so much plot data. Whether that's genuine denormalisation or not… I dunno. But what I do know is that, in order to make the reports/screens not take minutes to run, some data had to be rolled up.
-
Where I used to work (a vehicle telematics company), they would roll-up data in some form of cache,
I don't even know what that means. How do you "roll-up" data? What form of cache?
-
Roll-up just means summarise really. And I'm using cache in a fairly loose sense.
-
Roll-up just means summarise really. And I'm using cache in a fairly loose sense.
Are you talking about OLAP? Because that's completely different from (and has absolutely nothing to do with) denormalization.
-
Similarly here, though we use an OLAP approach (though not any of the Oracle OLAP features).
Database: pump in half a million rows a minute. That's the base cube. Derive aggregated data from that (subset of dimensions, truncated time). That's the data you end up viewing in the UI.
This is all via backend job scheduling; there is no 'update aggregated data when the user updates the source' because the configuration only affects how new input data is processed.
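That base-cube-to-aggregate pipeline can be sketched in miniature with SQLite; all table and column names here are invented for illustration, and the real system presumably ran on something much bigger:

```python
import sqlite3

# Hypothetical sketch of the "base cube -> derived aggregate" approach
# described above. Names (base_cube, hourly_agg, vehicle_id) are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_cube (vehicle_id INTEGER, ts TEXT, speed REAL);
    CREATE TABLE hourly_agg (vehicle_id INTEGER, hour TEXT,
                             avg_speed REAL, samples INTEGER);
""")
conn.executemany(
    "INSERT INTO base_cube VALUES (?, ?, ?)",
    [(1, "2015-06-01 10:05:00", 50.0),
     (1, "2015-06-01 10:35:00", 70.0),
     (1, "2015-06-01 11:10:00", 30.0)],
)

# The backend "roll-up" job: truncate the timestamp to the hour and
# aggregate over the remaining dimensions.
conn.execute("""
    INSERT INTO hourly_agg
    SELECT vehicle_id, strftime('%Y-%m-%d %H:00', ts), AVG(speed), COUNT(*)
    FROM base_cube
    GROUP BY vehicle_id, strftime('%Y-%m-%d %H:00', ts)
""")

# The UI queries hourly_agg instead of scanning millions of base rows.
rows = list(conn.execute("SELECT * FROM hourly_agg ORDER BY hour"))
print(rows)
```

The point of the scheduled-job design is visible even at this scale: the aggregate is rebuilt from the base data, so nothing has to fire on each individual insert.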
-
Are you talking about OLAP?
Maybe? I dunno. I was on the web side of things; someone else handled all the SQL magic.
-
Ok how about you figure out what the fuck you're talking about, then come back, ok?
-
Ok how about you figure out what the fuck you're talking about, then come back, ok?
Eh, it doesn't matter really; just being a silly girl ;)
-
If you're going to be like that you can shoo right back onto the other team >.> ;)
-
…I'll be good
In all fairness, @blakeyrat's probably right; I probably was talking about something other than OLAP.
-
Should I?
I feel dirty
-
I've liked the occasional @blakeyrat post; for all his acidity, he does sometimes have a valid point
…and Discourse is telling me off for posting so much… but then, as of this post, I do have 21% of the posts in this thread, so it's kinda got a point
-
That should be painted in dayglo pink on a black background in letters 18" high, and erected in every development office worldwide.
We're gonna need a lot of paint…
s/sproc/the database/g and we have a winner.
Has anyone heard of SQL Queues? I just learned about them because apparently core business logic relies on them and they stopped working all of a sudden.
Four days of trying to figure out what the damn thing was even trying to do and it still wasn't fixed. So I said belgium it and nuked the belgium-er, then implemented the behavior in the service layer where it damn well belonged.
Ten minutes later all was fixed and all was happy, except for the original dev of the code who still insisted that the queue needed to be fixed.
-
I near-daily work with a software platform that is actually 99% implemented in SQL. As in, there's a Java shim because you cannot call stored procedures directly from the webserver.
I reimplemented the overview page in PHP because that performs better and gives more information.
-
except for the original dev of the code who still insisted that the queue needed to be fixed.
I'd tell him to go Belgium himself ;)
@PleegWat said: I reimplemented the overview page in PHP because that performs better
Wow, the original code must have been a veritable trainwreck…
-
I can't imagine building HTML pages in PL/SQL is fast. PHP at least was built for that job.
-
HTML snippets in T-SQL
You should burn in hell. Or at least spend a couple of years in purgatory.
-
Join the club! One can do crazy stuff in SQL when the platform you're calling from is weaker than VB6. It was a custom/visual flow designer. Breaking out with ODBC to MS SQL stored procs or plain queries was often done for 'trivial' stuff like date manipulation, because that was hardly supported (e.g. unless you wanted to cut up date strings, manipulate them tediously, and collate them again). Or you could just flunk in an ODBC connection to the existing application DB and fire off a query. And yes, I did create HTML snippets, mostly text for e-mails, that way.
-
I think the main complaint most people will sling at stored procedures is "Portability!"
But, honestly, how fucking often do you change database engines, and how sure are you that your ORM can deal with it?
Seriously. DB portability is a problem that should be solved when it becomes a problem, because it's invariably going to be a massive undertaking for any non-trivial product.
The prime example is getting the ID of the last inserted record. Every DB engine has (or doesn't have) its own way of dealing with this. Does your ORM deal with them all? If not, how much pain will that cause? More or less than changing bits of engine-specific syntax in SQL?
I always use @@identity
SCOPE_IDENTITY()! OUTPUT clauses!
Not to mention query planners and their quirks. With stored procs or without them, are you sure you won't have to rewrite a single query you / your ORM generates?
Of course not. And tuning is going to be highly specific to the load, the database engine, etc.
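To make the last-inserted-ID point above concrete, here is a minimal SQLite sketch via Python's sqlite3 module. The `users` table is invented; the comments name the real mechanisms on other engines:

```python
import sqlite3

# Each engine exposes the last-inserted ID differently:
#   SQLite     -> cursor.lastrowid
#   SQL Server -> SCOPE_IDENTITY() / @@IDENTITY / OUTPUT clause
#   PostgreSQL -> INSERT ... RETURNING id
# An ORM has to paper over all of these, which is the portability pain.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
cur = conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
print(cur.lastrowid)  # the SQLite-specific way to get the new row's ID
```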
-
Also, version controlling.
Yeah, there's no getting around having to script out changes. If each proc is scripted into its own file, it's reasonably easy to manage, though.
Also also, debugging is a pain.
How so? I've found it to be fairly simple. Do something like this:
begin transaction
select <validate precondition...>
exec procname
select <validate postcondition...>
rollback
And since you can compare entire datasets fairly easily, if you dump the output of Big Huge Critical Report A into tables, you can perform regression testing of old outputs vs. new by doing something like this:
select * from #olddata except select * from #newdata
select * from #newdata except select * from #olddata
(If both of those return no results, the old matches the new).
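The same two-way EXCEPT comparison can be demonstrated in SQLite (ordinary tables stand in for SQL Server's #temp tables, and the data is invented):

```python
import sqlite3

# Regression-test sketch: compare an old report dump against a new one
# with EXCEPT in both directions. Empty results both ways means the
# datasets are identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE olddata (id INTEGER, total REAL);
    CREATE TABLE newdata (id INTEGER, total REAL);
    INSERT INTO olddata VALUES (1, 10.0), (2, 20.0);
    INSERT INTO newdata VALUES (1, 10.0), (2, 20.0);
""")
old_minus_new = list(
    conn.execute("SELECT * FROM olddata EXCEPT SELECT * FROM newdata"))
new_minus_old = list(
    conn.execute("SELECT * FROM newdata EXCEPT SELECT * FROM olddata"))
print(old_minus_new, new_minus_old)  # both empty: old matches new
```

Note that both directions are needed: rows missing from the new output only show up in the first query, and spurious new rows only in the second.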
-
Why can't you ask your developer to be an SQL guru? Because it's not something they are doing 100% of their time. In general, SQL stuff is something you learn enough to accomplish a certain task and then forget about. It's also why you don't want your DBAs to be doing any frontend development. They'll probably hack something in jQuery that'll look like an SQL EXECUTE and end up here.
At the last two places I've worked, developers are expected to be competent both at front-end stuff and at SQL.
-
I use ORM extensively. Also SQL. I have some stored procedures and functions and even some backed by Java code.
This sounds like a prudent mix of strategies.
The main thing that stored procedures do for me is that they make my life more difficult from a configuration control / administrivia / procedure perspective. So there better be some major gain to adding them.
How so? I've seen nothing but benefits from using them (and views/functions), most notably:
- All paths to the database are known and documented, so risk assessment is much easier
- Having an interface/abstraction layer allows changes to the underlying data model without changing the application
- Compiled query plans
- Improved security (provided the app isn't logged in as sysadmin)
Exactly. And it's a lot easier to drop to SQL when you really need it and let the ORM handle the drudgery the rest of the time.
I can agree with this, but I typically find myself using the ORM on views/functions/stored procedures just because having that layer of abstraction is helpful. If the ORM, for example, coughs on a proc that returns multiple result sets, then I too drop down to SQL.
-
Trigger.
This should always be at the bottom of the list, when there are no other alternatives.
Triggers are magical things. They're so magical that people forget that they exist, and that's where the problems start.
If you use a sproc to keep the data in sync, then you have to concern yourself with validating every code path that interacts with the hourly data.
Is it easier than the zero effort it takes to police trigger usage?
If you use a trigger to keep the data in sync, then you have to concern yourself with all the subtle ways that trigger (and ostensibly a few others) can fire. It also leads to some fun debugging sessions.
-
I've built very limited HTML snippets in T-SQL
I've seen code that composes HTML emails to be sent via sp_send_dbmail. That's where I draw the line of insanity.
This snippet was inside an UPDATE trigger to email a big boss whenever a record in table XYZ changed.
-
This snippet was inside an UPDATE trigger to email a big boss whenever a record in table XYZ changed.
WTF?
What level of insane paranoia and mistrust pervaded if the big boss was getting emails from the middle of a production database?
-
WTF?
What level of insane paranoia and mistrust pervaded if the big boss was getting emails from the middle of a production database?
A big boss of one of our clients. The table in question contained product information for products they sold, and thus the information therein was updated fairly infrequently, but when it was updated, it was important enough that they wanted to know about it.
-
The point is usually stuff like this
users = User.where('age > 2')
if descending
  users = users.order('age desc')
else
  users = users.order('age')
end
if admins_only
  users = users.where('admin')
end
You do tend to carry along MAJOR amounts of code to support this stuff (tens of thousands of lines). Personally, I almost always prefer my SQL Builder pattern, which fits into 100 lines of dependency.
query = 'SELECT * FROM Users /**where**/ /**order**/'
builder = SqlBuilder.new(query)
if descending
  builder.order('age desc')
else
  builder.order('age')
end
if admins_only
  builder.where('admin')
end
-
Not quite that level of insanity, but at one place I worked we had a system where multiple stored procedures were called to build up a multi-tab report in HTML/XML that could be either displayed in the browser or wrapped up in a zip file and sent to the client as an Excel file.
The zipping and downloading of the "Excel" file was handled by the application though, hence the lesser WTF.
And yes, I think the real WTF is that Excel supports that
And yet, I can't help but be the teensiest bit impressed
-
And yes, I think the real WTF is that Excel supports that
Excel is remarkably tolerant of all the different formats you can throw at it. I've seen a lot of export utilities that spit out HTML and import flawlessly.
-
@Onyx said:
I think the main complaint most people will sling at stored procedures is "Portability!"
But, honestly, how fucking often do you change database engines, and how sure are you that your ORM can deal with it? Seriously. DB portability is a problem that should be solved when it becomes a problem, because it's invariably going to be a massive undertaking for any non-trivial product.
So, interestingly enough, we're actually right in the middle of this with ProGet. It currently runs on Windows/SQL Server, and we're working on getting it working in Linux / Postgres, too. Everything's done with sprocs.
You know what would have been even more of a pain? Not using sprocs. SQL Server and Postgres are pretty close, but far enough away that using SPROCS as a database "API" keeps the database-specific stuff in the database.
-
You build applications to generate data. That data is your value, that value brings in the money. There is nothing else and if you think safeguards for this data belong anywhere else other than cuddled up next to your data with a machine gun and flamethrower you’re out of your fucking mind.
Points for passion and style.
-
At the last two places I've worked, developers are expected to be competent both at front-end stuff and at SQL.
It depends on what you are up to. At my current job I am doing, on and off, a lot of SQL. No stored procedures though, because sqlite does not have them. It's an application that bakes some data, and SQL is the right tool for a big part of the processing that needs to be done to it. And I mean queries that join seven tables, joins that join the same table three times with itself plus some other table, nested queries and such. It wouldn't really be possible to have a separate person writing the SQL, because it is very much part of the logic.
On the other hand, if there is a bunch of disparate applications written by different people at different times that connect to the same database, there have to be separate database developers ("database admins" does not really sound appropriate any more) who do the SQL and provide a simple-ish interface with sprocs, views and triggers, so the application developers don't need to understand most of it.
-
ORM
Years ago I was on loan to a team doing some serious database stuff for a government agency. They had just taken over maintenance of a key system from another company. The system was generated in some database design tool (not really an ORM, or not just an ORM; some huge enterprisey thing I don't really know anything about).
The result was that when I needed to upload something to the database I had to use an XML-RPC service where, for each record, I first asked for a bloated chunk of XML, edited the XML, sent it back, and then had to handle retries when it had changed in the meantime, because obviously I couldn't have normal transactions that way.
And then the database guy showed me the database backend it generated. Everything was wrapped in layers and layers of cryptically named views (something in the style of __psqw0021, __psqw0058, etc.) nested at least 10 or 20 deep. Compare that to the databases made by our team, which were a simple translation of the required model to a database schema, where uploading was usually done by inserting somewhere and calling a sproc to update related entries correspondingly.
-
That sounds like a system that needs to die in a lake of fire...
-
I believe that was the intention, but it's pretty difficult when the system keeps rather important records about 10 million people and a bunch of other systems rely on that data. For me it was only a temporary job though, so I don't know how it developed further.
On a side-note, my task there was uploading data to various systems, data that came, usually offline, from various other agencies. That was done using a sparrows' nest of Perl scripts. Since the formats were various text files with fixed-width fields, various half-baked forms of XML and such, it was actually a reasonable tool for the job (the WTFs were that they weren't using any version control, and that when I installed Subversion for them, they didn't understand it anyway; plus the ad-hoc setup on the servers, where each script was executed differently and used a different method for reporting errors).
-
No stored procedures though, because sqlite does not have them.
You can use a custom function instead, but SQLite really relies on the fact that it's running embedded…
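That custom-function approach looks like this with Python's sqlite3 module; the `tax` function is an invented example:

```python
import sqlite3

# SQLite has no stored procedures, but because it runs embedded you can
# register application-side functions and call them from SQL.
def tax(amount):
    """Invented example: apply a 20% tax and round to cents."""
    return round(amount * 1.2, 2)

conn = sqlite3.connect(":memory:")
conn.create_function("tax", 1, tax)  # name, argument count, callable

# The registered function is now callable from any SQL statement.
(result,) = conn.execute("SELECT tax(10.0)").fetchone()
print(result)
```

This is the inverse of a stored procedure: instead of shipping logic into the database, the embedded database calls back into your process, which is why the technique doesn't carry over to client/server engines.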
-
-
The problem was that the resident Perl Popes did not really know it. The script I was to make had to poll an input directory for new files and, if one was found, process it and move it to an archive. So I wrote a module for checking for new files and moving them, put the actual actions in another module, tied it together, and documented it so the file-handling part could be reused. The two devs responsible for most of the Perl scripts were not able to, because they simply had no clue how to write a "class" in Perl at all.
-
How so? I've seen nothing but benefits from using them (and views/functions), most notably:
I have an extra layer of review and bureaucracy and deployment. Now configuration control is more fragmented. If a query is inside my code, it's easy to see what's deployed with that code. Not nearly so easy to know what code is living inside a schema.
Now I'm dealing with PL/SQL (though all such languages from all DBs look about equally awful to me).
All paths to the database are known and documented, so risk assessment is much easier
But anything inside the DB is more of a PITA to analyze, not to mention debug. Pass.
Having an interface/abstraction layer allows changes to the underlying data model without changing the application
I can see why you might want this, and in some cases, I have this with my judicious use of them.
Compiled query plans
This seems like the least of my issues, though I have used them in some circumstances when certain queries would go bonkers based on the parameters used over time. So, again, something to use for a particular purpose, not to drive me crazy for the occasional benefit.
Improved security (provided the app isn't logged in as sysadmin)
If this is a benefit, then you have bigger problems.
-
All paths to the database are known and documented, so risk assessment is much easier
No, all paths within the database are known. You still have the same challenge figuring out which applications call which procedure. Also, sp_depends is notoriously inaccurate since deferred resolution was introduced.
@Groaner said: Having an interface/abstraction layer allows changes to the underlying data model without changing the application
Yes. And this is just as true if that interface/abstraction layer is written in Java or C# as it would be if it is written in T-SQL. What is your argument for the T-SQL implementation being superior?
@Groaner said: Compiled query plans
Meh. Ad hoc plan caching has been around for a very long time, so this advantage is only true if you are using a 15 year old version of your chosen database platform. Ironically, I sometimes implement things as stored procedures specifically so I can mark those procedures as "don't cache". You can't do that with ad hoc batches.
@Groaner said: Improved security (provided the app isn't logged in as sysadmin)
The security issues don't change, they just move. Security should be at the business layer, not the data layer. That way, when security gets more sophisticated than "these people have read access to this entity", it doesn't turn all hairy. If you insist on implementing security at the data layer, you get the following effects:
- Business logic starts to creep into the database since some security rules can only be expressed in terms of business processes.
- Identity has to be pushed to the db layer. This is really bad for web apps and three-tier apps. It disables connection pooling. It makes password management harder. It forces SQL Server to use the less secure SQL Authentication.
- It doesn't fix a damn thing. Somebody still needs to implement security properly. The entire list of what the app people could screw up is the same list that the DB people could screw up.
-
This should always be at the bottom of the list, when there are no other alternatives.
Triggers are magical things. They're so magical that people forget that they exist, and that's where the problems start.
This only matters if you put business logic in a trigger. You are supposed to forget that triggers exist. A good analogous technology is access logging for a web server. It happens for every call to the web server, but you almost never have to think about it.
@Groaner said: If you use a trigger to keep the data in sync, then you have to concern yourself with all the subtle ways that trigger (and ostensibly a few others) can fire. It also leads to some fun debugging sessions.
If you use triggers for the tasks they were designed for (denormalization and auditing), it is really simple to thoroughly test them. Then they are like any other background task: you ignore them when debugging other stuff.
BTW, there are no subtle ways to fire a trigger. A trigger has a FOR clause; it happens then and at no other time.
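A small sketch of a trigger used for one of those designed-for tasks (auditing), shown in SQLite for testability; the tables are invented:

```python
import sqlite3

# Hypothetical audit trigger: every UPDATE on products is logged to
# products_audit, and no application code knows the audit table exists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE products_audit (id INTEGER, old_price REAL, new_price REAL);

    -- Fires exactly when its AFTER UPDATE clause says, and at no other time.
    CREATE TRIGGER products_au AFTER UPDATE ON products
    BEGIN
        INSERT INTO products_audit VALUES (OLD.id, OLD.price, NEW.price);
    END;

    INSERT INTO products VALUES (1, 9.99);
""")

# The application just updates the row; the audit record appears anyway.
conn.execute("UPDATE products SET price = 12.50 WHERE id = 1")
audit_rows = conn.execute("SELECT * FROM products_audit").fetchall()
print(audit_rows)
```

Testing it is one transaction: update a row, read the audit table, compare, which is the "thoroughly test them, then ignore them" workflow described above.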
-
The reason a trigger is a better answer than a sproc is that a trigger makes it so you can rely on that denormalized data.
I feel the need to point out that triggers are just a subclass of sprocs. At least, that's how SQL Server handles them.
-
I feel the need to point out that triggers are just a subclass of sprocs. At least, that's how SQL Server handles them.
This thread is about how the next layer up interacts with the database. How SQL Server internally handles triggers is irrelevant in this context.