Tool misuse



  • Hi

    I'm working on getting rid of a tool (legacy, unmaintainable, broken) we use to monitor our sales data. It was outsourced to a far-away land where they either didn't know about JDBC in 2004 or were paid by the hour. So, this is a pretty straightforward struts/jsp application which retrieves data from a database and shows a few broken SVG graphs. That's it. Should be easy, right?

    Well, no. The magic lies in how this thing retrieves the data from our Oracle database:

    QueryResult=$ORACLE_HOME/bin/sqlplus -silent "$DBUserName"/"$DBPassword"@"$DBSID" <<END
    spool QueryOutput.txt;
    @$SQLFile;
    spool off;
    exit;
    END

    cat QueryOutput.txt

    If that seems familiar to you, it's because it's Bash.
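
    (Side note: whether by design or because the forum ate a pair of backticks, the `QueryResult=` line as shown doesn't actually capture the sqlplus output, which is presumably why the script spools to a file and then cats it. Command substitution does capture it. A minimal sketch of what was probably intended, with a made-up `fake_sqlplus` standing in for `$ORACLE_HOME/bin/sqlplus`:)

```shell
#!/usr/bin/env bash
# fake_sqlplus stands in for $ORACLE_HOME/bin/sqlplus; like the real
# client, it writes the result of whatever arrives on stdin to stdout.
fake_sqlplus() { cat; }

# Command substitution captures the heredoc-driven output directly,
# so no spool file and no trailing `cat QueryOutput.txt` are needed.
QueryResult=$(fake_sqlplus <<'END'
SELECT region, sales FROM sales_data;
END
)
echo "$QueryResult"
```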

    Now, in the crontab there are 4 different entries that call a bunch of Bash scripts, which call another bunch of Bash and AWK scripts, which generate "dynamic" SQL queries from a bunch of .sql files, which call this aptly named script "FireSQL.sh", which generates a bunch of .out files, which somehow generate a bunch of .dat and .svg files, which are then, somehow, shown by the whole struts/jsp site.

    Now I know why we want to get rid of this piece of shit.

    As a side WTF, the generated .svg files only work in older versions of MSIE using Adobe's deprecated piece-of-shit SVG viewer, because the generated SVG files' headers are broken; and to fix those headers, I had to dig into a whole stinking mess of Bash/AWK/SQL code.

    Ugh!
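
    For the curious, one stage of that pipeline presumably looks something like the following; the column layout, the filter, and the .out/.dat formats are pure guesswork, since the actual scripts aren't shown:

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of one .out -> .dat stage: keep only the
# data rows (two fields, numeric second field) from a sqlplus dump and
# emit them tab-separated. Column layout is an assumption.
to_dat() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ { printf "%s\t%s\n", $1, $2 }'
}

to_dat <<'OUT'
REGION   SALES
---------------
EMEA     1200
APAC     950

2 rows selected.
OUT
```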



  • Dude! Java? Bash? AWK? THAT PROJECT SOUNDS AWESOME

    ...wait, SQL? Ugh, no wonder you want to ditch it.



  • 1) If it were easy, they wouldn't pay us so much

    2) What is wrong with this?  You have a server that generates output files on a scheduled basis; a client application that reads these files whenever the user demands them. 

    If the output files take a long time to generate or require considerable resources, it makes sense to generate them in the background on a scheduled basis, rather than generate them each time the client requests them.  Depending on the number of clients, you could overwhelm the system with dynamic queries.  

    A better approach might be a webservice, where the client (rather than reading the file directly from the server) asks the server for the content.  This decouples the client from the server, because the client no longer needs to know the name/location of the file; all it needs to do is ask the server for the information.  The server is free to deliver pre-generated content or dynamically create content if necessary.

    But, assuming that the reason the cron was set up in the first place is the load on the system, then  I don't see anything wrong with using scripts on the server to generate the files.  Yes, I'd update the svg to current standards; but otherwise I'd leave it alone.

    On the other hand, if the queries are relatively lightweight, then I'd certainly rework this. The reason is that the SVG files are really "presentation" layer graphs; ideally you'd want the presentation layer (that is, the client) to decide how to present the data to the user. If that means a graph, then the client should render the graph based on data from the SQL result set -- not by reading/displaying a pre-generated graph.
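
    The hybrid described above (serve the cron-generated file if it's fresh, regenerate on demand otherwise) can be sketched in a few lines of shell; the report path, the freshness window, and generate_report are all hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: hand back the pre-generated report if a fresh copy exists,
# otherwise rebuild it first. generate_report stands in for whatever
# expensive query/graph job the cron entries run.
generate_report() { echo "sales report: $(date +%F)"; }

serve_report() {
    local report="$1" max_age_min="$2"
    # find -mmin -N matches files modified within the last N minutes;
    # empty output means the file is stale or missing.
    if [ -z "$(find "$report" -mmin -"$max_age_min" 2>/dev/null)" ]; then
        generate_report > "$report"   # stale or missing: rebuild first
    fi
    cat "$report"
}

serve_report /tmp/sales_report.txt 60
```

    (The `-mmin` freshness test works with GNU and BSD find; a `stat`-based mtime comparison would do the same job.)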




    I get the feeling that the code is already separated into query-making scripts and graph-making scripts. So there would be nothing to gain from separating GUI and backend logic, because it is already separated.



    Your report looks like the right candidate for an ORM-based design. Have you considered the possibility of an SOA architecture?



    I'm sure it could be done a bit more elegantly, but building O/JDBC apps just to run a query where sqlplus can do the same is not efficient. It may be more standard, but that's only because everyone's drilled into thinking you can't approach databases without Java, JDBC and seven tiers of enterprisey XML marshalling.

    If the bash/awk stuff is messy, that should be fixed, but the general idea is not bad per se.



  • @TGV said:

    I'm sure it could be done a bit more elegantly, but building O/JDBC apps just to run a query where sqlplus can do the same is not efficient. It may be more standard, but that's only because everyone's drilled into thinking you can't approach databases without Java, JDBC and seven tiers of enterprisey XML marshalling.

    If the bash/awk stuff is messy, that should be fixed, but the general idea is not bad per se.

    If you have the resources, you could probably rebuild that using OLAP and a charting library in no time at all. As an added bonus, you'd be able to change date ranges and filters and such in real-time.



  • @TGV said:

    If the bash/awk stuff is messy, that should be fixed

    with more bash/awk! 😃

    Or maybe Perl.

    Or all three!! 😃



  • @Xyro said:

    @TGV said:
    If the bash/awk stuff is messy, that should be fixed

    with more bash/awk! 😃

    Or maybe Perl.

    Or all three!! 😃

    Don't forget python and sed.


    "Now, in the crontab there are 4 different entries that call a bunch of Bash script, which call another bunch of Bash and AWK scripts which generate "dynamic" SQL queries, from a bunch of .sql files which call this script correctly named "FireSQL.sh" which generates a bunch of .out files which somehow generate a bunch of .dat and .svg files which are then, and somehow, shown by the whole struts/jsp site."

    This is called modular design, you insensitive clod!



  •  @blakeyrat said:

    If you have the resources, you could probably rebuild that using OLAP and a charting library in no time at all. As an added bonus, you'd be able to change date ranges and filters and such in real-time.

    They already have the queries, so ... perhaps. Creating these reports offline has a great advantage: speed. If you can use simple analytic functions, fine, but if you have to set up special analytic functions for it, that might take some time. And Oracle's OLAP costs $23k and requires the Enterprise Edition. And you'd still need an offline scheduler, so there are bound to be crontabs and shell scripts around (or Windows tasks and BAT files if you prefer).



  • @TGV said:

    They already have the queries, so ... perhaps. Creating these reports offline has a great advantage: speed.

    If you're generating reports where the few milliseconds of difference between serving up a pre-computed image and building an OLAP report actually matters, then you're treading deep waters indeed.

    @TGV said:

    And Oracle's OLAP costs $23k, and requires the Enterprise Edition.

    Yeah, well, like I said, "if you have the resources." If you're not already paying for it, then don't worry about it.

    @TGV said:

    And you'd still need an offline scheduler, so there are bound to be crontabs and shell scripts around (or Windows tasks and BAT files if you prefer).

    SQL Server has the scheduling built-in. If you're writing batch files, or using task scheduler, you're doing something wrong.



  • @blakeyrat said:

    If you're generating reports where the few milliseconds of difference between serving up a pre-computed image and building an OLAP report actually matters, then you're treading deep waters indeed.

    Indeed, and that's not what I meant. I meant that probably (because it's just guessing) the queries take too long to run on demand, so some kind of offline process is required. Generating the graphs is taking it one step too far.

    And indeed, I hadn't considered running it as a normal db job (we're talking Oracle, not SQL Server). But I don't see a great difference between that and crontab, as long as you've got someone that can support it.



  • @TGV said:

    They already have the queries, so ... perhaps.

    No, I don't think they do.  It was specifically mentioned that these queries are "dynamic" and are created as text files by the bash scripts.  This opens up lovely possibilities for simultaneous users to overwrite each other's queries.

    In SQL, no-one can hear you scream.
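
    Assuming the scripts really do share one QueryOutput.txt, the usual way out of that race is a per-run temp file; fake_sqlplus again stands in for the real client:

```shell
#!/usr/bin/env bash
# Sketch of a fix for the shared-file race: spooling every run to the
# same QueryOutput.txt lets simultaneous runs clobber each other, while
# mktemp gives each run its own private spool file.
fake_sqlplus() { cat; }

spool=$(mktemp /tmp/query_output.XXXXXX)
trap 'rm -f "$spool"' EXIT        # remove the per-run file on exit

fake_sqlplus > "$spool" <<'END'
EMEA 1200
END
cat "$spool"
```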



  • @Qwerty said:

    No, I don't think they do.  It was specifically mentioned that these queries are "dynamic" and are created as text files by the bash scripts.  This opens up lovely possibilities for simultaneous users to overwrite each other's queries.

    I think that probably refers to adding time stamps and such to the queries, and no matter what AWK magic is involved, they could always be built with some kind of stored procedure.

    In SQL, no-one can hear you scream.

    I remember the first man in SQL. He went for 12 minutes outside the mainframe environment. He had no means to control his jobs other than pulling on his tether. Afterwards, he claimed this was easy, but his SQL suit ballooned from its internal pressure against the vacuum of SQL, stiffening so much that he could not activate the shutter on his chest-mounted camera.

    At the end of his SQL walk, the suit stiffening caused a more serious problem: he had to re-enter the capsule through the inflatable cloth table lock. After his SQL walk, he improperly entered the table lock head-first and got stuck sideways. He could not get back in without a rollback, risking "data loss". This added another 12 minutes to his time in SQL, and he was overheated from the exertion. It would be almost four years before anyone tried this again.



    Truly, the future of SQL exploitation will not be with manned editors and manually-navigated reports, but fully unmanned, fully automated robots. It may not be as romantic a notion, but why risk the dangerous and inhuman conditions for something we fundamentally cannot achieve without layers upon layers of technological support anyway?

    This, of course, leads one to wonder: if man is not made for SQL, that is, if SQL is best plumbed with automatons of our own design, why does it have such an alluringly familiar yet godawful syntax?

    (Hey, it's kind of like XML. Theoretically human-readable, but touching it with notepad.exe is a fool's idea.)



  • SQL is an environment hostile to men. Only the bravest dare go there, strapped in a small chair on top of a large canister containing hundreds of tons of highly combustible material. The minutest error can lead to devastating effects and loss of production data. One misplaced semicolon, and ... BANG, there go years of work. It's probably the danger that attracts us. We are bold explorers, seeking out new civilizations in stored and remote procedures, with weapons hot, ready to combat sluggish performance, infinite backups, and moody admins. That's probably why there are so many women in SQL exploration. It's just not in their genes to expand the horizon across the safe borders of an online transaction, into the realms of warrior data cubes.

     @Xyro said:

    (Hey, it's kind of like XML. Theoretically human-readable, but touching it with notepad.exe is a fool's idea.)

    That's a good one. The other day I tried to format a 500 line SELECT statement in emacs. You can guess the outcome...



  • @TGV said:

    We are bold explorers, seeking out new civilizations

    If we apply the Drake Equation, the number of civilisations to discover is:

    N = R* . Ft . Nt . Fd . Fi . L

    Where

    R* = Rate of server installation

    Ft = Fraction of those servers that contain tables

    Nt = Average number of tables per server

    Fd = Fraction of those tables that contain data

    Fi = Fraction of those tables that contain intelligence

    L = Length of time to run SELECT * query

     

    Using conservative estimates for the above parameters, the number of civilisations is less than 1.  The civilization you are living in does not exist or if it does, it will soon cease to exist.

