Maximum Inefficiency



  • As I've mentioned previously, the application I have inherited has a very screwy database design. Today, though, I discovered a WTF that actually manages to span two databases.

    To understand what's going on, you first have to know a little about how the application is set up.

    We have three servers: a web server which hosts the web app, and a database server (DB1) which sits in the same rack as the web server and services the web app (i.e. it stores user accounts and tool permissions). Finally, we have a second database server (DB2) which operates as the primary data source for client data.

    Now, when a user logs in, they get all their permissions from DB1. However, there is one tool which is only available to users if a certain type of record (specialReport) in DB2 is associated with the user account.

    The way this record is maintained is that the users table is replicated from DB1 to DB2, and an association table then links the user record on DB2 to the specialReport record. The user table replication is a major WTF by itself, but it's not the star of the show. The star is the way the web app checks whether a user has this special report.

    On DB2 there is a stored procedure which returns a two-column result set: userid and specialreportid. specialreportid is a unique value, but it isn't relevant here, as we only want to know whether any association exists. The web app automatically calls this stored proc every 15 minutes, loops through the list of userids, and compares it with the user group list in DB1. Any user not already in the group on DB1 gets added to it. Group access is then checked at login.
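
    For the morbidly curious, the logic is roughly this. It's a Python sketch of my reading of the code; the proc, table, and group names are my stand-ins, and db1/db2 are assumed to be plain DB-API connections:

        # Runs on a 15-minute timer inside the web app (a sketch, not the real code).
        def sync_special_report_group(db1, db2):
            cur2 = db2.cursor()
            cur2.execute("EXEC GetSpecialReportUsers")        # hypothetical proc name
            for userid, _specialreportid in cur2.fetchall():  # ~35,000 rows per run
                cur1 = db1.cursor()
                cur1.execute("SELECT 1 FROM group_members "
                             "WHERE groupname = 'specialReport' AND userid = ?",
                             (userid,))
                if cur1.fetchone() is None:                   # grant-only: never revokes
                    cur1.execute("INSERT INTO group_members (groupname, userid) "
                                 "VALUES ('specialReport', ?)",
                                 (userid,))
            db1.commit()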

    The reason I discovered this was that I kept seeing spikes in memory usage every 15 minutes that I could not attribute to normal user activity.

    When I checked the stored proc I discovered that it was returning 34,970 records each time it was being called.

    • 24 hrs × 4 (every 15 minutes) × 34,970 records = 3,357,120 records pulled every day. While not enough to bring down the web server by itself, it certainly adds to the many other performance issues.


    Further checking showed that the majority of these were duplicates. Adding a DISTINCT to the stored proc's query reduced the records returned from 34,970 to 975.
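
    For reference, my guess at the shape of the proc's query (the real one presumably joins more tables, which is where the duplicate rows would have come from):

        # Before: duplicates included, ~34,970 rows (table names are made up).
        OLD_QUERY = ("SELECT u.userid, sr.specialreportid "
                     "FROM users u "
                     "JOIN user_specialreport usr ON usr.userid = u.userid "
                     "JOIN specialreport sr ON sr.specialreportid = usr.specialreportid")

        # After: one keyword added, 975 rows.
        NEW_QUERY = OLD_QUERY.replace("SELECT", "SELECT DISTINCT", 1)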

    • So now we are down to 24 × 4 × 975 = 93,600 records pulled each day.


    You'd think "great", except that we only have on average 600 users logging in each day, and the majority of them don't even have this specialReport.

    So I pull out this entire code block and add a single query against DB2 for report access by userid to the login code, so now we have at most approximately 600 records returned from the database each day.
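
    In other words, the whole 15-minute dance collapses into a single check at login, something like this (same made-up table names; db2 is a plain DB-API connection):

        def user_has_special_report(db2, userid):
            # One single-row query per login instead of a scheduled bulk sync.
            cur = db2.cursor()
            cur.execute("SELECT 1 FROM user_specialreport WHERE userid = ?",
                        (userid,))
            return cur.fetchone() is not None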


    If I tried, I could not have come up with a more inefficient way to handle this. The real beauty of this is that the original developer claims to be a former DBA. Now I understand the "former" part.



  • Wait, so it only checks if users have been GRANTED access to the special report, not that the access has been REVOKED? That's... scary!



  • @ekolis said:

    Wait, so it only checks if users have been GRANTED access to the special report, not that the access has been REVOKED? That's... scary!
     

    Yup.

    As part of my update I checked for users whose access should have been revoked, and came up with 43 users who should no longer have access.

    Of course I covered this use case in the update, which needless to say means I'll start getting complaints from those users who should never have had access but did up until today and now can't. Just a day in the life.
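
    The cleanup side was a one-off pass over the now-stale group on DB1, roughly like this (same made-up names as before):

        def revoke_stale_access(db1, db2):
            # Everyone who still has an associated specialReport record on DB2.
            cur2 = db2.cursor()
            cur2.execute("SELECT DISTINCT userid FROM user_specialreport")
            entitled = {row[0] for row in cur2.fetchall()}

            # Remove group members on DB1 who no longer qualify (43 of them here).
            cur1 = db1.cursor()
            cur1.execute("SELECT userid FROM group_members "
                         "WHERE groupname = 'specialReport'")
            for (userid,) in cur1.fetchall():
                if userid not in entitled:
                    cur1.execute("DELETE FROM group_members "
                                 "WHERE groupname = 'specialReport' AND userid = ?",
                                 (userid,))
            db1.commit()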



  • @codefanatic said:

    we have a second database server (DB2) which operates as the primary data source for client data.

    So the second server is the primary and it runs DB2?



  • @codefanatic said:

    Finally, we have a second database server (DB2) which operates as the primary data source for client data.

    Ah.  One of *those* companies.  Personally, I prefer companies that understand client data comes from the clients, and the database only acts as a repository.  Many people would say it's a minor point, but it's pretty impressive how significant that difference in attitude can be.

    @codefanatic said:

    the original developer claims to be a former DBA. Now I understand the "former" part.

    I've talked to far too many DBAs who seem to only be concerned that all of their database's queries have a high results per second average.  I apparently cannot communicate well, as I've never been able to adequately explain to any of them exactly why the metric they're optimizing on is not helpful to the goals of their company.  (I understand how they can be quite helpful to the goals of the DBAs, in the face of management who confuse large hardware budgets with the importance of the group who runs the hardware.)  So, unfortunately, I don't think that necessarily explains the 'former' part, in much the same way as you haven't explained that the original 'developer' is also a former 'developer'.



  • @tgape said:

    @codefanatic said:
    Finally we have a second database server (DB2) which operates as the primary datasource for client data.

    Ah.  One of *those* companies.  Personally, I prefer companies that understand client data comes from the clients, and the database only acts as a repository.  Many people would say it's a minor point, but it's pretty impressive how significant that difference in attitude can be.

    Don't worry, you'll make your way further down the path of data management and someday discover that data does not "come" from the clients; it's a strategic asset.

    I win.



  • @Speakerphone Dude said:

    @tgape said:
    @codefanatic said:
    Finally, we have a second database server (DB2) which operates as the primary data source for client data.

    Ah.  One of *those* companies.  Personally, I prefer companies that understand client data comes from the clients, and the database only acts as a repository.  Many people would say it's a minor point, but it's pretty impressive how significant that difference in attitude can be.

    Don't worry, you'll make your way further down the path of data management and someday discover that data does not "come" from the clients; it's a strategic asset.

    I win.

    I come from the “Data? What data?” philosophy. Weekly maintenance involves deleting old data to make room for new data. More input means shorter weeks.


  • @Speakerphone Dude said:

    @codefanatic said:
    we have a second database server (DB2) which operates as the primary data source for client data.

    So the second server is the primary and it runs DB2?

    Confusing, isn't it? Yes, the naming convention is that the database server we consider to be the primary is labeled DB2, and the database server we consider to be secondary is labeled DB1. I was once in a meeting where I observed two people discussing some update to one of the machines, and they both kept referring to the machine as "server one" for 20 minutes. The next day there was a bit of a fuss because the update had been applied to the primary database (DB2) and not the secondary (DB1). I was still new to the company, so I stayed out of that whole discussion/mess.

    The real point that I guess I skipped over wasn't the fact that the web server pulled 3,000,000+ records in a single day from DB2. It was that it would then loop over all the pulled records and perform a query against DB1 for each record individually. So on one side you have 96 queries (over the day) to DB2 pulling ~35,000 records each time, and on the other side the web server making ~35,000 matching queries to DB1, each pulling a single record (again, 96 times during the day). All to determine whether a user gets access to a particular tool when they log in.
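
    In numbers (nothing assumed here beyond the counts above):

        runs_per_day = 24 * 4              # one run every 15 minutes -> 96 runs
        rows_per_run = 34_970              # rows returned by the DB2 proc per run

        db2_rows_pulled   = runs_per_day * rows_per_run  # 3,357,120 rows from DB2 per day
        db1_queries_fired = runs_per_day * rows_per_run  # one DB1 query per row: another 3,357,120

        print(db2_rows_pulled, db1_queries_fired)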

    All I can say is thank the stars that we only have a few hundred users.



  • @Ben L. said:

    Weekly maintenance involves deleting old data to make room for new data. More input means shorter weeks.

    Interesting: based on your statement it appears that the sheer volume of data can impact Time. I propose the following formula:

    T = MB²



  • @codefanatic said:

    @Speakerphone Dude said:

    @codefanatic said:
    we have a second database server (DB2) which operates as the primary data source for client data.

    So the second server is the primary and it runs DB2?

    Confusing, isn't it? Yes, the naming convention is that the database server we consider to be the primary is labeled DB2, and the database server we consider to be secondary is labeled DB1. I was once in a meeting where I observed two people discussing some update to one of the machines, and they both kept referring to the machine as "server one" for 20 minutes. The next day there was a bit of a fuss because the update had been applied to the primary database (DB2) and not the secondary (DB1). I was still new to the company, so I stayed out of that whole discussion/mess.

    The real point that I guess I skipped over wasn't the fact that the web server pulled 3,000,000+ records in a single day from DB2. It was that it would then loop over all the pulled records and perform a query against DB1 for each record individually. So on one side you have 96 queries (over the day) to DB2 pulling ~35,000 records each time, and on the other side the web server making ~35,000 matching queries to DB1, each pulling a single record (again, 96 times during the day). All to determine whether a user gets access to a particular tool when they log in.

    All I can say is thank the stars that we only have a few hundred users.

    This is beautiful. You should tell the person who architected this that they created a reversed Hadoop. (aka: "Poodah")



  •  @Speakerphone Dude said:

    @codefanatic said:

    @Speakerphone Dude said:

    @codefanatic said:
    we have a second database server (DB2) which operates as the primary data source for client data.

    So the second server is the primary and it runs DB2?

    Confusing, isn't it? Yes, the naming convention is that the database server we consider to be the primary is labeled DB2, and the database server we consider to be secondary is labeled DB1. I was once in a meeting where I observed two people discussing some update to one of the machines, and they both kept referring to the machine as "server one" for 20 minutes. The next day there was a bit of a fuss because the update had been applied to the primary database (DB2) and not the secondary (DB1). I was still new to the company, so I stayed out of that whole discussion/mess.

    The real point that I guess I skipped over wasn't the fact that the web server pulled 3,000,000+ records in a single day from DB2. It was that it would then loop over all the pulled records and perform a query against DB1 for each record individually. So on one side you have 96 queries (over the day) to DB2 pulling ~35,000 records each time, and on the other side the web server making ~35,000 matching queries to DB1, each pulling a single record (again, 96 times during the day). All to determine whether a user gets access to a particular tool when they log in.

    All I can say is thank the stars that we only have a few hundred users.

    This is beautiful. You should tell the person who architected this that they created a reversed Hadoop. (aka: "Poodah")

    Well, actually, I gave him an opportunity to fix this mess. I was even less impressed with his "fix".

    I mentioned earlier that this query is run every 15 minutes. He thought a good solution to the problems I described was to cache the DB2 query results in memory (all ~35,000 records). Apart from the fact that it ate up even more memory, the real kicker was that he set the cache timeout to 5 minutes.

    Yup: on a query that is called every 15 minutes exactly, he decided the fix to the problem was caching the result set for 5 minutes.

    I wish I was making this up, I really do.
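
    To spell out why that cache can never be hit: the timeout is shorter than the gap between calls, so every call finds it expired. A sketch (only the two timings are real; the rest is made up):

        import time

        CACHE_TTL   = 5 * 60       # his cache timeout: 5 minutes
        POLL_PERIOD = 15 * 60      # the sync runs every 15 minutes
        assert CACHE_TTL < POLL_PERIOD   # hence the cache is dead weight

        _cache = {"expires_at": 0.0, "rows": None}

        def get_report_users(fetch_rows):
            now = time.time()
            if _cache["rows"] is not None and now < _cache["expires_at"]:
                return _cache["rows"]            # unreachable at 15-minute intervals
            _cache["rows"] = fetch_rows()        # ~35,000 rows parked in memory
            _cache["expires_at"] = now + CACHE_TTL
            return _cache["rows"]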

     



  • What was he thinking, anyway? The sole reason for the query is to get changes. If you cache the result, you might as well just run the query less often...



  • @tgape said:

    Ah.  One of *those* companies.  Personally, I prefer companies that understand client data comes from the clients, and the database only acts as a repository.  Many people would say it's a minor point, but it's pretty impressive how significant that difference in attitude can be.

    From the perspective of the application code, the client data DOES come from DB2. Where DB2 gets it is none of the application's business - if it really comes from the Mars rover or from /dev/rand, that's the DBA's problem, or the client's problem!



  • @pnieuwkamp said:

    What was he thinking, anyway? The sole reason for the query is to get changes. If you cache the result, you might as well just run the query less often...
     

    You are absolutely right. 

    1. He was caching a query that should never have been cached in the first place, as its purpose was to look for changes.
    2. He set the cache expiration to 5 minutes, so that when the query was called every 15 minutes the cache was already expired and got rebuilt with new data from the database.
    3. Finally, I checked this morning, and the policy is that permission changes made internally in the client database (DB2) should replicate out to the web app within 1 hour. So, taking the laziest option for fixing this, he could have simply set the process to run every 60 minutes instead of every 15. (Admittedly it would still be returning roughly 839,000 records a day, but that's still a big improvement over roughly 3,357,000; see the quick arithmetic below.)
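
    A quick sanity check on those numbers (plain arithmetic using the row count from the proc):

        runs_at_15_min = 24 * 4        # 96 runs/day as it stands
        runs_at_60_min = 24            # 24 runs/day under the 1-hour policy
        rows_per_run   = 34_970        # rows returned by the proc per run

        print(runs_at_15_min * rows_per_run)   # 3,357,120 rows/day
        print(runs_at_60_min * rows_per_run)   # 839,280 rows/day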

    The clue is in the word you used: "thinking". In short, he wasn't.

