My Forward-Seeing Predecessors

zelmak

So, I just discovered a database table in our live schema where my predecessors discovered NoSQL SQL.

That is:

CREATE TABLE ERRORS (
  "ERROR_ID" NUMBER(8,0) NOT NULL ENABLE,
  "KEY"      VARCHAR2(50 BYTE) NOT NULL ENABLE,
  "VALUE"    VARCHAR2(50 BYTE)
)

where every record with duplicate ID (why is it important to have a table with a specific name and then explicity name the ID field <TABLE_NAME>_ID?) represents a record.

I've also discovered it was abandoned back in 2005ish timeframe but persists because "We don't know what will break if we remove it."

Anketam

@zelmak said:

...
"We don't know what will break if we remove it."

I love this excuse. My response is there is one way to find out, *hits delete*. If the code still compiles and appears to run and test cant find any issue with it send it to production. I have done this several times and one time it did break something in production, upon investigation we discovered that by removing the 'does not appear to be doing anything' code there was a bigger underlying issue that the code was hiding, which explained several unusual things we had been seeing, but could not pin down. So I personally am not afraid of removing 'does not appear to be doing anything' or 'unused' code and breaking things, since by doing so it can reveal even bigger problems that you really need to fix.

Weps

@zelmak said:

(why is it important to have a table with a specific name and then explicity name the ID field <TABLE_NAME>_ID?)

That's something that really annoys me too. A table users, with fields user_id, user_name, user_address, user_age and so on. I guess some people are just inconsistent as hell and this is one way of trying to deal with it.

zelmak

@Anketam said:

If the code still compiles and appears to run and test cant find any issue with it send it to production.

In my situation (which I'm trying to change, but the Titanic doesn't turn quickly),

compiles + appears_to_run = test

One cannot develop tests for 700-ish classes, 140K LOC (440K LOT) when one doesn't know what a great deal of those classes are for NOR if the values that various methods output are even correct.

... is there an easy way to count the number of methods in a set of classes (.class directory or .jar file)?

Anketam

I see. So does your dependency model look something like this:

Alex Papadimoulis / Mar 7, 2011

Enterprise Dependency: Big Ball of Yarn

Not too long ago, I posted The Enterprise Dependency. Essentially, it was a visual depiction of a good ole' enterprise framework that was "several dozen megabytes chock full of helper classes like IEnterpriseAuthenticationProviderFactoryManagementFactory." Inspired by the diagram, commenter...

If that is the case, then yes removing 'unused' code would be a bad idea.

erikal

@zelmak said:

... is there an easy way to count the number of methods in a set of classes (.class directory or .jar file)?

I'm pretty sure you can abuse one of the bytecode manipulation APIs for such a purpose, but then I'd question if you can be arsed.

Julia1

For the other columns yes, redundancy not good, but having all the Ids prefixed means queries are more easy to debug. In the following:

SELECT * FROM Foo f JOIN Bar b ON f.Id = b.Id

it's not immediately obvious that the two IDs are completely unconnected, but

SELECT * FROM Foo f JOIN Bar b ON f.FooId = b.BarId

is clearly an error and should be "f.FooId = b.FooId".

mihi

@zelmak said:

... is there an easy way to count the number of methods in a set of classes (.class directory or .jar file)?

Not the easiest way, but will create you a really useful nice report with drill-down features like this one: Sonar. You'd also need source files for this. Having a Maven build file makes it really easy (just run mvn sonar:sonar), but without you just point it to sources and classes (and optionally test classes if you have any). Do enable "FindBugs report" and just skim over the blocker and critical ones - you will find lots of obvious bugs that way (like "if (x & 3 == 5)" or "if (foo && long_stuff && !foo)"). Do not enable "Technical Debt calculation" since it will shock you with the result (but I guess it might be almost right).

zelmak

@mihi said:

Not the easiest way, but will create you a really useful nice report with drill-down features like this one: Sonar. You'd also need source files for this. Having a Maven build file makes it really easy (just run mvn sonar:sonar), but without you just point it to sources and classes (and optionally test classes if you have any). Do enable "FindBugs report" and just skim over the blocker and critical ones - you will find lots of obvious bugs that way (like "if (x & 3 == 5)" or "if (foo && long_stuff && !foo)"). Do not enable "Technical Debt calculation" since it will shock you with the result (but I guess it might be almost right).

omg wow ... that is an amazing tool ... I've scavanged FindBugs and run it occasionally to fix 'yes, that's really a bug' bugs. But sonar looks 10x better. :sigh: Little chance of being able to get it, though ... institutional fear of anything that you don't pay for.

And then there's the fight to get money for those things you want to pay for.

morbiuswilters

@Julia said:

In the following:

SELECT * FROM Foo f JOIN Bar b ON f.Id = b.Id

it's not immediately obvious that the two IDs are completely unconnected

It's obviously a bug. You're joining two tables on their PKs. When would you need to do this? At the very least, it's so rare that it should immediately jump out to you, just as "f.FooId = b.BarId" does.

zelmak

@Anketam said:

I see. So does your dependency model look something like this:

Alex Papadimoulis / Mar 7, 2011

Enterprise Dependency: Big Ball of Yarn

Not too long ago, I posted The Enterprise Dependency. Essentially, it was a visual depiction of a good ole' enterprise framework that was "several dozen megabytes chock full of helper classes like IEnterpriseAuthenticationProviderFactoryManagementFactory." Inspired by the diagram, commenter...

If that is the case, then yes removing 'unused' code would be a bad idea.

I still want to generate one of our own for this project.

Granted, everyone knows our system is complex, but sometimes, you need a picture to stab the manglement folks in the eye.

boomzilla

@morbiuswilters said:

@Julia said:
In the following:

SELECT * FROM Foo f JOIN Bar b ON f.Id = b.Id

it's not immediately obvious that the two IDs are completely unconnected

It's obviously a bug. You're joining two tables on their PKs. When would you need to do this? At the very least, it's so rare that it should immediately jump out to you, just as "f.FooId = b.BarId" does.

Exactly. Using "id" also makes it easy to see what's a PK vs FK in a query. I have to deal with a schema for a COTS system that uses the TABLE_ID (IMHO anti-)pattern, though to be fair, originally, there were no PK constraints declared at all.

Cassidy

@Anketam said:

So I personally am not afraid of removing 'does not appear to be doing anything' or 'unused' code and breaking things, since by doing so it can reveal even bigger problems that you really need to fix.

Echoed.

Whilst it's in "dunno" territory, you... don't actually know how important it is or isn't. Deleting it is simply a way of forcing the answer out.

robbak

Hmm. The possibility of an unknown function rerrering to an orphan table that may contain outdated data? Nuke that table now - especially if it is close to release!

AndyCanfield

I've got a table like that. It's called "CARP" (an acronym ...), has fields CARPIDENT and CARPVALUE. It holds information about the database as a whole."DBVERSION" gives me the name of the last sql file that modified the database structure; "LASTUPDATE" tells me the date and time, and table name, of the last change to the database data.

pjt33

@erikal said:

@zelmak said:
... is there an easy way to count the number of methods in a set of classes (.class directory or .jar file)?

I'm pretty sure you can abuse one of the bytecode manipulation APIs for such a purpose, but then I'd question if you can be arsed.

That's overkill. javap and grep.

TheCPUWizard

@zelmak said:

"We don't know what will break if we remove it."

Although the following applies more to "application code" than database, it is still appropriate...

Step 1: Run a full test suite with "coverage" enabled

Step 2: If the item is not exercised by a test, then delete it.

Weps

@TheCPUWizard said:

@zelmak said:
"We don't know what will break if we remove it."
Although the following applies more to "application code" than database, it is still appropriate...
Step 1: Run a full test suite with "coverage" enabled
Step 2: If the item is not exercised by a test, then delete it.

I have never seen a "full" test suite. You?

TheCPUWizard

@Weps said:

@TheCPUWizard said:
@zelmak said:
"We don't know what will break if we remove it."
Although the following applies more to "application code" than database, it is still appropriate...
Step 1: Run a full test suite with "coverage" enabled
Step 2: If the item is not exercised by a test, then delete it.
I have never seen a "full" test suite. You?

For existing projects... Truely full?.. NO. But to the point where every class is exercised at least once, along with each DB entity (table, view, stored proc)..quite often. In many cases getting to this baseline is the first activity when I engage with a new client.

For greenfield developement.... YES! Using Whitebox TDD methodologies there should be a test for each defined behaviour. This means that any code not invoked, is not necessary and can be pruned.

morbiuswilters

@TheCPUWizard said:

@Weps said:
@TheCPUWizard said:
@zelmak said:
"We don't know what will break if we remove it."
Although the following applies more to "application code" than database, it is still appropriate...
Step 1: Run a full test suite with "coverage" enabled
Step 2: If the item is not exercised by a test, then delete it.
I have never seen a "full" test suite. You?
For existing projects... Truely full?.. NO. But to the point where every class is exercised at least once, along with each DB entity (table, view, stored proc)..quite often. In many cases getting to this baseline is the first activity when I engage with a new client.
For greenfield developement.... YES! Using Whitebox TDD methodologies there should be a test for each defined behaviour. This means that any code not invoked, is not necessary and can be pruned.

100% test coverage has got to be about as dumb as not testing at all.

Cassidy

@Weps said:

@TheCPUWizard said:
I have never seen a "full" test suite. You?
For existing projects... Truely full?.. NO. But to the point where every class is exercised at least once...

Including unneeded classes?

I also thought Black-box is more suited to testing for defined behaviour.

ASheridan

@morbiuswilters said:

@Julia said:
In the following:
SELECT * FROM Foo f JOIN Bar b ON f.Id = b.Id

it's not immediately obvious that the two IDs are completely unconnected

It's obviously a bug. You're joining two tables on their PKs. When would you need to do this? At the very least, it's so rare that it should immediately jump out to you, just as "f.FooId = b.BarId" does.

Not necessarily, I can think of at least one valid reason off the
top of my head why you might do this, although it will be a rare set of
circumstances.

Consider the scenario where a DB is using an engine
that stores one physical file per table (I belive myisam does this) and
is sitting on a server that has a file size limit on how large a file
can become (lets say Fat32 as an example).

Table A has rather a
lot of fields, 70% of which are only ever used by one part of the code,
while the remainng 30% are used nearly everywhere, and this DB is
approaching critical limits in terms of file size, although the hard
disk space itself is fine. Now, there are plenty of possible solutions,
but management has frozen the budget, so we can't get a new server, or
temporarily replace this one while we rebuild it with something more
modern on to overcome the Fat32 limit.

Splitting the table into
two with a 1:1 relationship on the id's might make perfect sense, is
very easy to implement, and won't cause any problems with management who
just want this fixed but don't want it to cost anything.

But, this is a rare scenario, and would immediately become obvious if you even looked at the error log.

morbiuswilters

@ASheridan said:

@morbiuswilters said:
@Julia said:
In the following:

SELECT * FROM Foo f JOIN Bar b ON f.Id = b.Id

it's not immediately obvious that the two IDs are completely unconnected

It's obviously a bug. You're joining two tables on their PKs. When would you need to do this? At the very least, it's so rare that it should immediately jump out to you, just as "f.FooId = b.BarId" does.
Not necessarily, I can think of at least one valid reason off the
top of my head why you might do this, although it will be a rare set of
circumstances.
Consider the scenario where a DB is using an engine
that stores one physical file per table (I belive myisam does this) and
is sitting on a server that has a file size limit on how large a file
can become (lets say Fat32 as an example).
Table A has rather a
lot of fields, 70% of which are only ever used by one part of the code,
while the remainng 30% are used nearly everywhere, and this DB is
approaching critical limits in terms of file size, although the hard
disk space itself is fine. Now, there are plenty of possible solutions,
but management has frozen the budget, so we can't get a new server, or
temporarily replace this one while we rebuild it with something more
modern on to overcome the Fat32 limit.
Splitting the table into
two with a 1:1 relationship on the id's might make perfect sense, is
very easy to implement, and won't cause any problems with management who
just want this fixed but don't want it to cost anything.

But, this is a rare scenario, and would immediately become obvious if you even looked at the error log.

Maybe I should have said "It is improper to do this" rather than "It is a bug"; not all improper behavior is a bug, I suppose. Your example is somewhat contrived, but contrived situations pop up in the real world.

Scarlet_Manuka

@morbiuswilters said:

It's obviously a bug. You're joining two tables on their PKs. When would you need to do this?

I do it quite often. So it depends on the system you're working with, I guess.

In my particular case, in one of the main systems I work with, certain objects are regarded as extensions of other objects. In particular, the Employee, User and Contact objects are all extensions of a common parent, which means that their PK column is the same as the PK of the corresponding parent record. So if I want to join Employee to Contact (which I do very often; the name is stored on the Contact table) I join them on their PKs.

However, I'd say the bigger value of having prefixed columns is not to prevent incorrect joins but to prevent errors in the select clause. If you ask for foo.Status when you really wanted bar.Status, you can get some difficult-to-debug errors. If you ask for foo.BarStatus instead of bar.BarStatus the database will complain immediately.

That being said, I still prefer un-prefixed columns, myself, if for no other reason than that I hate typing out long queries. One of the other systems I work with goes for the prefixed approach and by the time you've done the sixth join along the lines of "e.formula_variable_uuid = fv.formula_variable_uuid" you really get sick of it. :) (I do a lot of ad-hoc troubleshooting, so I'm writing manual queries all the time.)

@mihi said:

@zelmak said:
... is there an easy way to count the number of methods in a set of classes (.class directory or .jar file)?
Not the easiest way, but will create you a really useful nice report with drill-down features like this one: Sonar.

For Eclipse, there's also this nice plugin called "UCDetector" which I really like, because it can analyze your code to find unused classes/methods/fields etc. - really handy when dealing with poorly maintained legacy code. It can also detect isolated reference cycles (say, class A references class B, and B references A, so neither is technically "unused" on its own, but no other class references either of them).

Severity_One

@zelmak said:

(why is it important to have a table with a specific name and then explicity name the ID field <TABLE_NAME>_ID?)

I personally find it makes foreign keys easier: fields with the same value have the same name. And what, exactly, is the problem of putting the table name, or something similar, before 'ID'? Are queries going to run slower? Is your database going to fill up because of all the extraneous data? Or is it that you simply can't stand it, like some people simply can't stand cats?

boomzilla

@Severity One said:

Or is it that you simply can't stand it, like some people simply can't stand cats?

Aha! So you admit that we're right!

blakeyrat

@Severity One said:

I personally find it makes foreign keys easier: fields with the same value have the same name.

You could just give the table an alias the first time you refer to it in the query, solving that problem without cluttering your DB schema with shit names. You need to do that anyway if the query joins the same table twice, just do it all the time.

@Severity One said:

Or is it that you simply can't stand it, like some people simply can't stand cats?

Those people are really body snatchers. All normal humans love cats.

(Is that not the best clip from any movie ever? Except maybe this one.)

Cassidy

@Severity One said:

@zelmak said:
(why is it important to have a table with a
specific name and then explicity name the ID field
<TABLE_NAME>_ID?)
I personally find it makes foreign keys easier: fields with the same value have the same name. And what, exactly, is the problem of putting the table name, or something similar, before 'ID'?

It isn't a problem, it's just unnecessary.

Consider the following:

SELECT * 
FROM Foo
JOIN Bar 
ON foo.Id = bar.Id;

What you're suggesting would look like:

SELECT * 
FROM Foo
JOIN Bar 
ON foo.foo_Id = bar.bar_Id;

Another advantage of omitting the table prefix in the column names means that queries could be written like:

SELECT * 
FROM Foo
JOIN Bar 
USING(Id)

or:

SELECT * 
FROM Foo
NATURAL JOIN Bar

Cassidy

@blakeyrat said:

Those people are really body snatchers. All normal humans love cats.

(Is that not the best clip from any movie ever?

Shit, yep. I remember seeing that as a kid - the shock ending was well fucking unnerving. There's not that many films that end up that way. Better than the original, and better than the 90's remake (although the 2007 one wasn't bad).

PJH

@Cassidy said:

@Severity One said:
@zelmak said:
(why is it important to have a table with a
specific name and then explicity name the ID field
<TABLE_NAME>_ID?)
I personally find it makes foreign keys easier: fields with the same value have the same name. And what, exactly, is the problem of putting the table name, or something similar, before 'ID'?
It isn't a problem, it's just unnecessary.
Consider the following:
SELECT * 
FROM Foo
JOIN Bar 
ON foo.Id = bar.Id; 

Aren't you joining on two primary keys there? Since foreign keys were mentioned, it would probably be more like

SELECT * 
FROM Foo
JOIN Bar 
ON foo.Id = bar.Foo_Id;

no? (Where bar would have fields [Id, Baz, Qux, Foo_Id])

Weps

@Cassidy said:

@Severity One said:
@zelmak said:
(why is it important to have a table with a
specific name and then explicity name the ID field
<TABLE_NAME>ID?)
I personally find it makes foreign keys easier: fields with the same value have the same name. And what, exactly, is the problem of putting the table name, or something similar, before 'ID'?
It isn't a problem, it's just unnecessary.

See inconsistency. Why not also add "<database_name><table_name>FIELD<field_type>" in front too? Baffles me.

boomzilla

@Weps said:

See inconsistency. Why not also add "<database_name><table_name>FIELD_<field_type>" in front too? Baffles me.

How do you know that the database/schema name will always be the same? To beat this stupid horse some more, why not add the DB vendor? Your company name? Or go bigger and include the galaxy name?

Like most standards, it's more important to pick one way and stick to it than which one you pick. But if you name your primary keys like TABLE_ID, then you'll be rightfully mocked.

Weps

@boomzilla said:

@Weps said:
See inconsistency. Why not also add "<database_name><table_name>FIELD_<field_type>" in front too? Baffles me.

How do you know that the database/schema name will always be the same? To beat this stupid horse some more, why not add the DB vendor? Your company name? Or go bigger and include the galaxy name?

Like most standards, it's more important to pick one way and stick to it than which one you pick. But if you name your primary keys like TABLE_ID, then you'll be rightfully mocked.

Don't ask me, it's not my horse. Good questions nonetheless ;)

morbiuswilters

@Scarlet Manuka said:

In my particular case, in one of the main systems I work with, certain objects are regarded as extensions of other objects. In particular, the Employee, User and Contact objects are all extensions of a common parent, which means that their PK column is the same as the PK of the corresponding parent record. So if I want to join Employee to Contact (which I do very often; the name is stored on the Contact table) I join them on their PKs.

Your model sounds.. weird. Employee, User and Contact all extend a base type, but then are composed of each other? And if I had extended tables referring to a base table I'd FK the extension tables on the base table's PK.

Lorne Kates

@boomzilla said:

But if you name your primary keys like TABLE_ID, then you'll be rightfully mocked.

I don't have as much problem with this, if only because it's the least badish of a shitpile of bad.

Caveat: only for the ID, and only for one reason. It makes the code just ever so slightly more-- hmm-- readable? Not sure if that's the best term.

Basically, if there's a page that takes a database ID as input, and spits out a form to deal with that data, I'll probably name that variable (and querystring key) table_id.

Sure, I [b]could[/b] just do id, but then I'll have a page that takes two ids. Customer ID and booking ID, for example. I can't have two variables named "id", so I need to prefix them.

There are, of course, exceptions. The above example works for a parent/child relation. Not so much for a "holy shit we ran out of columns and don't understand normalization" relation.

SELECT c1.client_id, c1.first_name, c1.last_name, c1.twohundredotherfuckingfields, c2.two_hundred_and_first_field
FROM client as c1
LEFT OUTER JOIN client2 as c2 ON c1.client_id = c2.client_id [b]-- not client2_id[/b]

And of course, I could always use a proper object model and not have to muck around with database IDs in code, but if the codebase I work with was built like that, I wouldn't have anything fun to post, would I?

boomzilla

@Lorne Kates said:

And of course, I could always use a proper object model and not have to muck around with database IDs in code, but if the codebase I work with was built like that, I wouldn't have anything fun to post, would I?

I'm not convinced that not having anything truthful or interesting or uninteresting to post about has stopped anyone around here before.

Cassidy

@PJH said:

Aren't you joining on two primary keys there?

No, joining on two columns called "ID". Admittedly, I should have called them something like "UserID" or "CustomerNo" but I'm talking in the context of the PK and FK of related tables having the same column name in each table, which makes the previous join syntax possible.

morbiuswilters

@Cassidy said:

@PJH said:
Aren't you joining on two primary keys there?
No, joining on two columns called "ID". Admittedly, I should have called them something like "UserID" or "CustomerNo" but I'm talking in the context of the PK and FK of related tables having the same column name in each table, which makes the previous join syntax possible.

What I was suggesting was naming PKs "id" and FKs "foreign_table_id".

dhromed

@morbiuswilters said:

What I was suggesting was naming PKs "id" and FKs "foreign_table_id".

's how I do it.

PJH

@Cassidy said:

Admittedly, I should have called them something like "UserID" or "CustomerNo"

So, the primary key in the User table is UserID?....

Cassidy

@PJH said:

@Cassidy said:
Admittedly, I should have called them something like "UserID" or "CustomerNo"
So, the primary key in the User table is UserID?....

Possibly. Or it's the FK of a Permissions/Posts/Threads table.

Hmmm... looks like I've not through this one through. Can't remember what my point was, now (beer-addled at the mo)

morbiuswilters

@Cassidy said:

(beer-addled at the mo)

You lucky son of a bitch..

zelmak

@morbiuswilters said:

@Cassidy said:
(beer-addled at the mo)
You lucky son of a bitch..

Nearing 6K posts, morbs ... does that win you anything? Rep? A mug?

Perhaps a cuddle from dhromed?

morbiuswilters

@zelmak said:

@morbiuswilters said:
@Cassidy said:
(beer-addled at the mo)
You lucky son of a bitch..

Nearing 6K posts, morbs ... does that win you anything? Rep? A mug?

Perhaps a cuddle from dhromed?

4 years and 1.5 months of emptiness. Horrible, horrible emptiness... sigh..

Also, the knowledge I've made the world a better place.

Also, somebody mentioned a $15 Arby's giftcard.

Jaime

@morbiuswilters said:

@Cassidy said:
@PJH said:
Aren't you joining on two primary keys there?
No, joining on two columns called "ID". Admittedly, I should have called them something like "UserID" or "CustomerNo" but I'm talking in the context of the PK and FK of related tables having the same column name in each table, which makes the previous join syntax possible.

What I was suggesting was naming PKs "id" and FKs "foreign_table_id".

There is an entire syntax that your naming convention makes worthless: USING clause. I'm not saying this makes you wrong, but it is strong support for using the same column name for both PK and FK usage. If only Oracle had the syntax I'd say it's safe to write it off, but DB2 has it too.

Lorne Kates

@Jaime said:

There is an entire syntax that your naming convention makes worthless: USING clause. I'm not saying this makes you wrong, but it is strong support for using the same column name for both PK and FK usage. If only Oracle had the syntax I'd say it's safe to write it off, but DB2 has it too.

But but but.. what am I supposed to do when I need multiple FK to the same table? You know:

@bad fucking dumb shit database said:

student_id int,
primary_parent_person_id int foreign key references person(person_id),
secondary_parent_person_id int foreign key references person(person_id),
alternate_caretaker_person_id int foreign key references person(person_id),
favorite_uncle_person_id int foreign key references person(person_id),
...

And don't say "put it in a table with student_id, person_id, person_type", because then I'd have to write multiple joins, wouldn't I?

...... what do you mean how am I going to get the names of all those people? By writing a join for each of those foreign keys, of course.

Scarlet_Manuka

@morbiuswilters said:

@Scarlet Manuka said:
In my particular case, in one of the main systems I work with, certain objects are regarded as extensions of other objects. In particular, the Employee, User and Contact objects are all extensions of a common parent, which means that their PK column is the same as the PK of the corresponding parent record. So if I want to join Employee to Contact (which I do very often; the name is stored on the Contact table) I join them on their PKs.
Your model sounds.. weird. Employee, User and Contact all extend a base type, but then are composed of each other? And if I had extended tables referring to a base table I'd FK the extension tables on the base table's PK.

Well, it's the vendor's model, not my own. :) Actually, Position and Organisation also are extensions of the same basic object (Party). The term 'extension' in this context means there's a 1-1 relationship between the extension and a subset of the base. So it's not necessary to have an Employee ID that's distinct from the Party ID, because those Party records which represent Employees are in a 1-1 relationship with the Employees. Some of the extension tables do actually have a FK to Party as well as their own ID column, but I'm not sure why as the two are inevitably the same.

For the Employee / Contact / User breakdown, there are many more Contacts than Employees (roughly 2 million vs. 15,000). So the Contact table stores generic information that applies to anyone, such as name, address, birthdate and so on. The Employee table stores information that relates specifically to Employees, such as status, start and end dates, accounts payable data, and so on. The User table stores stuff like their login name for the system, and encrypted password. In our implementation all our Employees are Users, but I think the vanilla implementation allows Users to be a proper subset of Employees.

The real weirdness with this system is actually the way in which we've twisted it to suit our somewhat nonstandard business model. For instance, the 15,000 Employees I mentioned are largely (98%) not our employees but our customers (the 2 million Contact records are [i]their[/i] customers). On a more restricted note, I used to always draw attention to the way in which we used the columns on the revenue object: we stored the account number in the "Type Code" column, the type in the "Revenue Status Code" column, and the status in the "Quote Number" column. :D (There was actually a reason for this, to do with which columns had the lengths and indexes we needed.) The symmetry of that got spoiled a few years ago when we added a custom status column, but it's still fun. Corrected a bug the other day where someone assumed that the "Company" flag on the Contact object was stored in the custom extension column "Company", rather than in the builtin "Consumer Flag" column.

TheCPUWizard

@Scarlet Manuka said:

Well, it's the vendor's model, not my own. :) Actually, Position and Organisation also are extensions of the same basic object (Party). The term 'extension' in this context means there's a 1-1 relationship between the extension and a subset of the base. So it's not necessary to have an Employee ID that's distinct from the Party ID, because those Party records which represent Employees are in a 1-1 relationship with the Employees. Some of the extension tables do actually have a FK to Party as well as their own ID column, but I'm not sure why as the two are inevitably the same.

For the Employee / Contact / User breakdown, there are many more Contacts than Employees (roughly 2 million vs. 15,000). So the Contact table stores generic information that applies to anyone, such as name, address, birthdate and so on. The Employee table stores information that relates specifically to Employees, such as status, start and end dates, accounts payable data, and so on. The User table stores stuff like their login name for the system, and encrypted password. In our implementation all our Employees are Users, but I think the vanilla implementation allows Users to be a proper subset of Employees.

Wait.. If there is a 1-1 between the two then every Party is an Employee, and every party is a Contact, and therefore the number of Contacts, Parties, and Employees would all be identical.

Your scenario only makes sense if there is a 1-0:1 relationship (i.e. one side of a full join may be null for any given key). In a 1-1, there most always be exactly one key on both tables, and never a key in one table that is not in the other.

Scarlet_Manuka

@TheCPUWizard said:

@Scarlet Manuka said:
The term 'extension' in this context means there's a 1-1 relationship between the extension and [b]a subset of[/b] the base.
Wait.. If there is a 1-1 between the two then every Party is an Employee, and every party is a Contact, and therefore the number of Contacts, Parties, and Employees would all be identical.

Bolded the relevant bit for you.