TRWTF is that they had to write this

Scarlet_Manuka

Background information on Oracle database rowids, which are used internally in indexes and such:

The rowid of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row, because the exact location of the row in the database is specified.

So I was looking up syntax for an Oracle index hint, and I came across this in the documentation:

Rowids are an internal Oracle representation of where data is stored. They can change between versions. Accessing data based on position is not recommended, because rows can move around due to row migration and chaining and also after export and import. Foreign keys should be based on primary keys. For more information on rowids, see Oracle Database Application Developer's Guide - Fundamentals.

The scary part is that you know this means that someone, somewhere, tried using rowids as a foreign key, and then complained to Oracle when it stopped working.

blek

We once got a bug report from a customer saying that while our product starts up, if you keep clicking the left mouse button in the console (and I mean keep clicking like it's the goddamn Cookie Clicker), the product might stop accepting mouse and keyboard input after startup completes. It wasn't even 100% reproducible, it only happened on certain kind of hardware and even then only occasionally.

Developers looked at the report, did a double take, and collectively went "we're not going to hunt that bug down and fix that, what the fuck is wrong with you". The customer didn't like that and was absolutely adamant that if it's not fixed, it should at least be documented, and so it landed in my bug queue.

I spent half a fucking day trying to figure out how to say this without using words like "retard", "how did you manage to presumably reach adulthood", or "you goddamned manchild, go play with your foreskin, not the mouse" - 'cause, you know, none of these are compliant with our style guide. The result is that our documentation, the part where it describes startup, contains a note saying something like "avoid providing excessive keyboard or mouse input while the application is starting".

aliceif

@blek said in TRWTF is that they had to write this:

Developers looked at the report, did a double take, and collectively went "we're not going to hunt that bug down and fix that, what the fuck is wrong with you".

I assume your devs are total retarded morons?

Unresponsive GUI is BAD.
Unresponsive GUI that misbehaves when a user clicks again/repeatedly because they see no response is super-bad.

People who tell other people that they click / do stuff "too quickly" (fearing that something "might break") are the worst and evil. Fuck those people.

gwowen

@blek said in TRWTF is that they had to write this:

"avoid providing excessive keyboard or mouse input while the application is starting"

... because our developers are exactly the right mix of stupid/lazy.

blek

@aliceif
I probably should have described it in more detail. It wasn't a GUI (yet) - at the point where this was happening, all you saw was text-based terminal output which was not interactive at all. The user had no reason to expect any kind of response to clicking at all. Imagine the user's computer is going through POST and the user is clicking the mouse like a maniac - it's that kind of situation.

Sumireko

@aliceif said in TRWTF is that they had to write this:

@blek said in TRWTF is that they had to write this:

Developers looked at the report, did a double take, and collectively went "we're not going to hunt that bug down and fix that, what the fuck is wrong with you".

I assume your devs are total retarded morons?

Unresponsive GUI is BAD.
Unresponsive GUI that misbehaves when a user clicks again/repeatedly because they see no response is super-bad.

People who tell other people that they click / do stuff "too quickly" (fearing that something "might break") are the worst and evil. Fuck those people.

I mean seriously, just throw on a throbber and ignore click events while loading is in progress (note: things like the menu bar should still be processed like File->Exit). Some of these issues are "unavoidable", but at least have the decency to compensate for this in the UI.

Groaner

@Scarlet_Manuka said in TRWTF is that they had to write this:

The scary part is that you know this means that someone, somewhere, tried using rowids as a foreign key, and then complained to Oracle when it stopped working.

If they'd kept their big mouths shut and didn't write the first paragraph, they wouldn't have needed to write the second paragraph.

Tsaukpaetra

@blek said in TRWTF is that they had to write this:

, if you keep clicking the left mouse button in the console (and I mean keep clicking like it's the goddamn Cookie Clicker), the product might stop accepting mouse and keyboard input after startup completes.

Sounds like the app isn't attaching to the event queue quickly enough and therefore filling the queue to the buffer before it attaches could be the cause of this behaviour.

anotherusername

@Groaner said in TRWTF is that they had to write this:

@Scarlet_Manuka said in TRWTF is that they had to write this:

The scary part is that you know this means that someone, somewhere, tried using rowids as a foreign key, and then complained to Oracle when it stopped working.

If they'd kept their big mouths shut and didn't write the first paragraph, they wouldn't have needed to write the second paragraph.

And if they didn't made their internal rowid accessible, they wouldn't have needed to write the first paragraph.

But then they'd probably have to figure out how to make limit work correctly.

boomzilla

@anotherusername said in TRWTF is that they had to write this:

But then they'd probably have to figure out how to make limit work correctly.

I think you're confusing rownum with rowid.

anotherusername

@boomzilla that thought briefly crossed my mind as I was writing it, but dismissed quickly for . Both of them are , anyway...

PleegWat

@boomzilla said in TRWTF is that they had to write this:

@anotherusername said in TRWTF is that they had to write this:

But then they'd probably have to figure out how to make limit work correctly.

I think you're confusing rownum with rowid.

I've used rowid exactly once, in a 'select for update' scenario. I could imagine that kind of situation being more common in the distant past when doing complicated stuff in SQL was relatively more expensive.

Scarlet_Manuka

@Groaner said in TRWTF is that they had to write this:

If they'd kept their big mouths shut and didn't write the first paragraph, they wouldn't have needed to write the second paragraph.

While this is true, the first part was in the context of explaining how accessing a row through an index works and why an access by rowid normally follows an index lookup in an execution plan. I can understand why they didn't expect people to think "that sounds like a good way for me to implement a foreign key" – though I agree that they should have; writing "this is the fastest way to get a row" is asking for trouble.

@PleegWat said in TRWTF is that they had to write this:

I've used rowid exactly once, in a 'select for update' scenario. I could imagine that kind of situation being more common in the distant past when doing complicated stuff in SQL was relatively more expensive.

I've also used it once, when there were two completely identical rows in a table and I wanted to delete exactly one of them. I mean, yes, you could just delete them both and insert one back in (I didn't have any triggers to worry about), but using rowid was a more elegant method.

ben_lubar

@Scarlet_Manuka said in TRWTF is that they had to write this:

someone, somewhere, tried using rowids as a foreign key

paging @accalia

anotherusername

@Scarlet_Manuka said in TRWTF is that they had to write this:

there were two completely identical rows in a table

TRWTF. Tables that don't have at least one unique primary key field, and databases that allow them to exist.

accalia

@ben_lubar said in TRWTF is that they had to write this:

@Scarlet_Manuka said in TRWTF is that they had to write this:

someone, somewhere, tried using rowids as a foreign key

paging @accalia

ara?

your point?

ben_lubar

@accalia said in TRWTF is that they had to write this:

@ben_lubar said in TRWTF is that they had to write this:

@Scarlet_Manuka said in TRWTF is that they had to write this:

someone, somewhere, tried using rowids as a foreign key

paging @accalia

ara?

your point?

You tried using rowids as a foreign key.

accalia

@ben_lubar said in TRWTF is that they had to write this:

@accalia said in TRWTF is that they had to write this:

@ben_lubar said in TRWTF is that they had to write this:

@Scarlet_Manuka said in TRWTF is that they had to write this:

someone, somewhere, tried using rowids as a foreign key

paging @accalia

ara?

your point?

You tried using rowids as a foreign key.

successfully too.

your point?

ben_lubar

@accalia said in TRWTF is that they had to write this:

your point?

That was my point.

accalia

@ben_lubar said in TRWTF is that they had to write this:

@accalia said in TRWTF is that they had to write this:

your point?

That was my point.

soooo....... i did a good and you pointed it out...... why?

candlejack1

@accalia a good wtf? or sqlite is different? rowids change

accalia

@candlejack1 said in TRWTF is that they had to write this:

rowids change

in my experience..... no, they really don't.

not in sqlite, they're well documented and the rules say that the rowid won't change unless i delete the db or mess about with database dumps. i never delete from pages and.... well it's just for statsporn so if i bork the db i just start over, no need to dump/import at all, let alone weirdly.

Scarlet_Manuka

@anotherusername said in TRWTF is that they had to write this:

TRWTF. Tables that don't have at least one unique primary key field, and databases that allow them to exist.

In this case the table in question was a staging table for our BI database. And in fact the reason the duplicates caused a problem was that we had a unique index defined, but the BI staging table process is generally drop indexes, load staging table, create indexes, and it failed to create the index because of the duplicates.

If we didn't have the index drop and create, it would have failed during record insertion, which actually would have been more of a pain to recover from (because we'd have an incomplete data set in the staging table).

dkf

@Scarlet_Manuka said in TRWTF is that they had to write this:

the reason the duplicates caused a problem was that we had a unique index defined

Sounds like that index really isn't unique and that someone slipped up during schema design.

asdf

@Scarlet_Manuka said in TRWTF is that they had to write this:

which actually would have been more of a pain to recover from (because we'd have an incomplete data set in the staging table)

Ummm… transactions?

Scarlet_Manuka

@dkf said in TRWTF is that they had to write this:

@Scarlet_Manuka said in TRWTF is that they had to write this:

the reason the duplicates caused a problem was that we had a unique index defined

Sounds like that index really isn't unique and that someone slipped up during schema design.

It should be unique. In this case, from memory, someone did something weird in the source data and we had the same role appearing twice on a deal, which should be prevented by application logic. That caused two copies of the deal to be extracted.

@asdf said in TRWTF is that they had to write this:

Ummm… transactions?

The Informatica default paradigm is to commit every 10k records. If you get a insertion failure on record 85,320 out of 122,188, you'll wind up with 80,000 records committed to the staging table. You generally don't want to do single-commit-for-everything if you're loading millions of rows into a table (unless you have explicitly planned and sized your DB for this).

Indeed, if you're dealing with unreliable systems you may need to dial it right down. One of my colleagues wrote a process which commits after every record, because the webservice they're using is prone to failure and it's vital that our system accurately matches the invoices that have been created by the webservice. So every successful response goes and commits the matching record in our system. It's slow of course, but at least we keep things in sync.

Note that in this sort of task a partial load isn't difficult to recover from, it's just a matter of running the staging table load again once the source problem is fixed. (It's set to truncate the staging table before it loads it, so we can just reload all the data if required.) Of course if there's a lot of data (or the source queries are slow) this may take a while.

On the other hand, loading it all into the staging table and then having the error happen when you recreate the index means that you can fix it without having to immediately clean up the source data (which is good if you can't figure out exactly what the source data should look like, or don't have the authority to change it), and you don't have to rerun the staging table load. So it's a little easier and quicker, and you can make the people who stuffed up the data fix it. Sometimes they might even learn not to do it again.

Yamikuronue

@Scarlet_Manuka Do you work with me? XD

Just last week my tester group sent an order in to our system using a captured XML packet as a template and forgot to change the order number, which broke referential integrity and brought down our demo system somehow.

blakeyrat

@Scarlet_Manuka said in TRWTF is that they had to write this:

In this case the table in question was a staging table for our BI database. And in fact the reason the duplicates caused a problem was that we had a unique index defined, but the BI staging table process is generally drop indexes, load staging table, create indexes, and it failed to create the index because of the duplicates.

Business "intelligence".

"DROP ENTIRE TABLE, RELOAD FROM SCRATCH! Surely the most intelligent way of accomplish--"

"Why don't we just insert the new rows?"

"FUCK YOU! INTELLIGENCE! INTELLIGENCE MEANS MAKING THE SERVER CHUG FOR HOURS FOR NO PURPOSE!"

@Scarlet_Manuka said in TRWTF is that they had to write this:

If we didn't have the index drop and create, it would have failed during record insertion, which actually would have been more of a pain to recover from (because we'd have an incomplete data set in the staging table).

Well you could use a transaction. That's... kind of exactly what they're for.

blakeyrat

@Scarlet_Manuka said in TRWTF is that they had to write this:

The Informatica default paradigm is to commit every 10k records.

Sounds like a shitty product.

If you reload the entire table from scratch any time, I'm not sure why bad data or automatic commits would make it "hard to recover from". Just do a "truncate" on the table and start the process over.

Tsaukpaetra

@blakeyrat said in TRWTF is that they had to write this:

Business "intelligence".
"DROP ENTIRE TABLE, RELOAD FROM SCRATCH! Surely the most intelligent way of accomplish--"
"Why don't we just insert the new rows?"
"FUCK YOU! INTELLIGENCE! INTELLIGENCE MEANS MAKING THE SERVER CHUG FOR HOURS FOR NO PURPOSE!"

Yeah. We have a "batch processing" section of time every day where most of the user access is disabled just so we can truncate and reload several billion records in our databases.

djls45

@Tsaukpaetra said in TRWTF is that they had to write this:

@blakeyrat said in TRWTF is that they had to write this:

Business "intelligence".
"DROP ENTIRE TABLE, RELOAD FROM SCRATCH! Surely the most intelligent way of accomplish--"
"Why don't we just insert the new rows?"
"FUCK YOU! INTELLIGENCE! INTELLIGENCE MEANS MAKING THE SERVER CHUG FOR HOURS FOR NO PURPOSE!"

Yeah. We have a "batch processing" section of time every day where most of the user access is disabled just so we can truncate and reload several billion records in our databases.

This is beyond stupid. How can you ensure data integrity? If I changed a field in the input data, then that would be lost in the myriads of records that get re-inserted every day. The only way this could be worse (and might be reportable to the appropriate authorities) is if the data at all belongs or is related to other people (credit card info, health records, etc.).

Tsaukpaetra

@djls45 said in TRWTF is that they had to write this:

@Tsaukpaetra said in TRWTF is that they had to write this:

@blakeyrat said in TRWTF is that they had to write this:

Business "intelligence".
"DROP ENTIRE TABLE, RELOAD FROM SCRATCH! Surely the most intelligent way of accomplish--"
"Why don't we just insert the new rows?"
"FUCK YOU! INTELLIGENCE! INTELLIGENCE MEANS MAKING THE SERVER CHUG FOR HOURS FOR NO PURPOSE!"

Yeah. We have a "batch processing" section of time every day where most of the user access is disabled just so we can truncate and reload several billion records in our databases.

This is beyond stupid. How can you ensure data integrity? If I changed a field in the input data, then that would be lost in the myriads of records that get re-inserted every day. The only way this could be worse (and might be reportable to the appropriate authorities) is if the data at all belongs or is related to other people (credit card info, health records, etc.).

The theory is that the databases that get trunked and reloaded aren't "the source of record". somehow trunk-n-reload is more efficient than transactionalizing, I guess.

sloosecannon

@Scarlet_Manuka said in TRWTF is that they had to write this:

@dkf said in TRWTF is that they had to write this:

@Scarlet_Manuka said in TRWTF is that they had to write this:

the reason the duplicates caused a problem was that we had a unique index defined

Sounds like that index really isn't unique and that someone slipped up during schema design.

It should be unique. In this case, from memory, someone did something weird in the source data and we had the same role appearing twice on a deal, which should be prevented by application logic. That caused two copies of the deal to be extracted.

@asdf said in TRWTF is that they had to write this:

Ummm… transactions?

The Informatica default paradigm is to commit every 10k records. If you get a insertion failure on record 85,320 out of 122,188, you'll wind up with 80,000 records committed to the staging table. You generally don't want to do single-commit-for-everything if you're loading millions of rows into a table (unless you have explicitly planned and sized your DB for this).

Indeed, if you're dealing with unreliable systems you may need to dial it right down. One of my colleagues wrote a process which commits after every record, because the webservice they're using is prone to failure and it's vital that our system accurately matches the invoices that have been created by the webservice. So every successful response goes and commits the matching record in our system. It's slow of course, but at least we keep things in sync.

Note that in this sort of task a partial load isn't difficult to recover from, it's just a matter of running the staging table load again once the source problem is fixed. (It's set to truncate the staging table before it loads it, so we can just reload all the data if required.) Of course if there's a lot of data (or the source queries are slow) this may take a while.

On the other hand, loading it all into the staging table and then having the error happen when you recreate the index means that you can fix it without having to immediately clean up the source data (which is good if you can't figure out exactly what the source data should look like, or don't have the authority to change it), and you don't have to rerun the staging table load. So it's a little easier and quicker, and you can make the people who stuffed up the data fix it. Sometimes they might even learn not to do it again.

I fear you may have missed the point of the "Umm... transactions"

He's asking why you don't wrap the whole thing in one huge transaction, not why you don't transactionalizatify each individual one.

Then, when that error occurs on record 85,320, you aren't completely and totally screwed with partial data committed, and it can roll it all back at once...

Tsaukpaetra

@sloosecannon said in TRWTF is that they had to write this:

transactionalizatify

Holy ! I got a notification that you mentioned me!
I think?

dkf

@sloosecannon said in TRWTF is that they had to write this:

He's asking why you don't wrap the whole thing in one huge transaction, not why you don't transactionalizatify each individual one.

Because committing a transaction takes quite a bit of time. That matters when you've a billion records…

sloosecannon

@Tsaukpaetra said in TRWTF is that they had to write this:

@sloosecannon said in TRWTF is that they had to write this:

transactionalizatify

Holy ! I got a notification that you mentioned me!
I think?

I tried to, but it broke the abbr so I removed it

Scarlet_Manuka

@blakeyrat The staging table IS just the new and updated rows. I should have been clearer about that but I didn't think it was germane at the time I was writing it.

We don't in general truncate and reload our target BI tables on a nightly basis because yes, that would be stupid. (Though the previous BI system we replaced 10 years ago did that; it was an Access database with a collection of scripts that scraped the relevant data every night. Hilariously awful as it was, I'm told that what it replaced was even worse.)

I'm not sure why bad data or automatic commits would make it "hard to recover from"

As I explicitly noted in my previous post, it's not hard to recover from, and the recovery process is pretty much what you said. It's just that the other way is even less work - and, sometimes importantly, doesn't require me to fiddle around with the source data in cases where I don't understand what the source data should be.

Tsaukpaetra

@Scarlet_Manuka said in TRWTF is that they had to write this:

We don't in general truncate and reload our target BI tables on a nightly basis because yes, that would be stupid

I probably shouldn't mention that we scrape reports generated from this daily data from Excel files dumped from Report Server.

Matches

@Tsaukpaetra if your data deals with a lot of edit/delete records, it actually is faster in a lot of cases to truncate, bulk insert instead of insert, update, delete. The main reason is the transaction log.

That being said, it's a very specific use case in very specific live data environments, and should never be used for sources of truth. But when you meet all the check boxes, it is the best route in terms of minimal downtime to record updates required.

Reporting and data aggregates tend to be the scenario where this makes sense, rather than the working system.

Tsaukpaetra

@Matches said in TRWTF is that they had to write this:

if your data deals with a lot of edit/delete records

Nope, this is account snapshot data essentially. Less than 0.02% change in any given table.

ben_lubar

@Tsaukpaetra said in TRWTF is that they had to write this:

Less than 0.02% change in any given table.

Any statistics on how many records fail to insert on average?

If that number isn't 0, something is terribly wrong.

Tsaukpaetra

@ben_lubar said in TRWTF is that they had to write this:

@Tsaukpaetra said in TRWTF is that they had to write this:

Less than 0.02% change in any given table.

Any statistics on how many records fail to insert on average?

If that number isn't 0, something is terribly wrong.

Given that most of the tables only have one identity index on them (typically {TABLENAME}_SEQ_ID ).... Once in a while, actually.
In fact, any releases must include a flag in the PRB request whether or not the release could potentially affect Batch. That's how serious doing anything to these tables is.

ben_lubar

@Tsaukpaetra it sounds like your company said "we want to use a relational database, but we also like all the problems caused by NoSQL databases, so if we could have both, that would be great".

Tsaukpaetra

@ben_lubar said in TRWTF is that they had to write this:

@Tsaukpaetra it sounds like your company said "we want to use a relational database, but we also like all the problems caused by NoSQL databases, so if we could have both, that would be great".

Yeah, I can probably count on my hand how many relations there are in this database.
That's not to say things aren't relational (for example, the VARCHAR LOAN_NUMBER is usually joinable to the INT ACCTNO) but there's nothing there that says they must be.

dkf

@Tsaukpaetra said in TRWTF is that they had to write this:

VARCHAR is usually joinable to the INT

Matches

@Tsaukpaetra then there's only one thing to do.

Create a handful of new tables that have a foreign key to the truncated tables. It will make the truncates fail.

You don't happen to be working for a blue company related to houses?

Tsaukpaetra

@Matches said in TRWTF is that they had to write this:

@Tsaukpaetra then there's only one thing to do.

Create a handful of new tables that have a foreign key to the truncated tables. It will make the truncates fail.

You don't happen to be working for a blue company related to houses?

Well, I've never heard it described exactly like that, but... maybe? Don't worry, we #OwnIt.

Matches

@Tsaukpaetra company name start with an s, and end with an s?

Tsaukpaetra

@Matches said in TRWTF is that they had to write this:

@Tsaukpaetra company name start with an s, and end with an s?

Nope, colder . D and h

ben_lubar

@Tsaukpaetra said in TRWTF is that they had to write this:

D and h

Weapon Demonstration: Dalokohs Bar – 02:34
— OfficialTF2Wiki