Surrogate vs natural primary keys


  • Notification Spam Recipient

    @Jaime said in Surrogate vs natural primary keys:

    In the real world, you don't delete all of your data every day.

    πŸ‘‹πŸ½ Our morning batch process literally does this every day, daily. Granted, we're not a source system database, but that does not discount that we're truncating and reloading several hundred gigabytes of tables for the sake of it every day.

    Oddly enough, this apparently caused index issues (:wtf:, right?), so after batch processing a maintenance process goes through all the indexes and tells them to rebuild.

    It's... not pretty.
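For anyone curious what that maintenance sweep amounts to: it's usually just a pass of `ALTER INDEX` statements. A minimal T-SQL sketch (the table name is invented; a real sweep would loop over `sys.indexes` and check fragmentation first):

```sql
-- After the nightly load, rebuild every index on the reloaded table.
-- ALL targets each index on the table in one statement; for very large
-- tables you may prefer to rebuild index-by-index.
ALTER INDEX ALL ON dbo.StagingFacts REBUILD;

-- If fragmentation is only moderate, REORGANIZE is a cheaper,
-- always-online alternative to a full rebuild:
ALTER INDEX ALL ON dbo.StagingFacts REORGANIZE;
```

The rule of thumb in Microsoft's docs is roughly: reorganize for moderate fragmentation, rebuild when it's heavy.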

    @Jaime said in Surrogate vs natural primary keys:

a predictable mapping between drive letters and physical disks due

Yeah, apparently most of the databases get one iSCSI target if they're big enough, mounted up under whatever letter. That target holds the data file itself, the log, and the indexes.



  • @thegoryone Did you ever get back to this, and does it make more sense now that you are able to think again?

I assume you know what 'lies to children' are, but to clarify: when you start out teaching a subject, there's usually a lot of groundwork that needs to be laid before a student can really begin to grasp the matter. This leads to a bootstrapping problem: the students need a basic understanding of one aspect before moving on to the next, but that understanding itself often requires a partial understanding of the second part. More crucially, deep understanding often requires introducing a whole new set of ideas, and to present those ideas, the student has to have some context for them.

Hence, lies to children: using limited, spoon-sized analogies to introduce a topic, followed by increasingly accurate analogies until you begin approximating an understanding of the truth.

For example, we usually tell children in middle school that Johannes Gutenberg invented the printing press in 1439. This is a 'good enough' understanding at that point: trying to explain the full history of printing from antiquity (i.e., ancient uses of print blocks, the development of modern paper and the woodblock press in 5th-century China, its introduction into Europe in the 12th century, etc.) up to Gutenberg's development of die-cast movable type would just confuse the students. The details can be filled in later; what is really needed at that stage is an appreciation that this set off the publishing revolution of the 15th and 16th centuries.

    The danger in this is that if one of the analogies at one step is more compelling than the next, then it can be next to impossible to get people to understand the more detailed analogy. The classic example of this is the Bohr model of the atom: it is a simple, compelling visual analogy that is, unfortunately, dead wrong, and was known to be wrong by the time it started reaching the general public, but the succeeding model - probabilistic electron orbitals - is a hard one to grasp, so a version of the Bohr model (and often a badly mangled one at that) is still what most people think of when they are talking about atoms.

    A really good use of this approach for teaching programming to adults, including an explanation of the process of refining analogies itself, can be found in the second half of the first Abelson-Sussman lecture:

    But how does [the evaluation of the expression he just wrote on the blackboard] happen? If we're going to understand processes and how we control them, then we have to have a mapping from the mechanisms of this procedure into the way in which these processes behave. What we're going to have is a formal, or semi-formal, mechanical model whereby you understand how a machine could, in fact, in principle, do this. Whether or not the actual machine really does what I'm about to tell you is completely irrelevant at this moment.
In fact, this is an engineering model in the same way that, [for an] electrical resistor, we write down a model, V = IR. It's approximately true. It's not really true. If I put too much current through the resistor, it goes boom. So the voltage is not always proportional to the current, but for some purposes the model is appropriate. In particular, the model we're going to describe right now, which I call the substitution model, is the simplest model that we have for understanding how procedures work and how processes work - how procedures yield processes. And that substitution model will be accurate for most of the things we'll be dealing with in the next few days. But eventually, it will become impossible to sustain the illusion that that's the way the machine works, and we'll go to other more specific and particular models that will show more detail.
    (emphasis added)



  • @Tsaukpaetra said in Surrogate vs natural primary keys:

    πŸ‘‹πŸ½ Our morning batch process literally does this every day, daily. Granted, we're not a source system database, but that does not discount that we're truncating and reloading several hundred gigabytes of tables for the sake of it every day.

    That's not the norm, but common enough to be considered. As discussed here, truncating and reloading a large table can cause massive index fragmentation. The possible fixes are pretty simple:

1. Change the workflow to: truncate table, drop indexes, load table, re-create indexes. Since you load the table with no indexes in place, you don't pay the cost of building each index twice.
2. Rebuild the indexes after the load. Not as efficient as the previous method, but it doesn't require the load process to have as much privilege.
3. Simply don't create a clustered index at all. There's a decent chance it isn't being used anyway if it's on the primary key. How often do you do single-record lookups on a table that gets several GB of new data every day?
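Option 1 in T-SQL looks something like this (a sketch only; the table, index, and file path are all invented, and a real job would drop/re-create every nonclustered index, not just one):

```sql
-- 1. Truncate: deallocates pages wholesale instead of logging row deletes.
TRUNCATE TABLE dbo.StagingFacts;

-- 2. Drop the nonclustered indexes so the bulk load doesn't have to
--    maintain them row by row.
DROP INDEX IX_StagingFacts_CustomerId ON dbo.StagingFacts;

-- 3. Reload the data (BULK INSERT shown; any bulk load path works).
BULK INSERT dbo.StagingFacts
FROM '\\fileserver\extracts\facts.dat'
WITH (TABLOCK);

-- 4. Re-create each index once, over the fully loaded data: it's built
--    in a single sorted pass, so it comes out with no fragmentation.
CREATE INDEX IX_StagingFacts_CustomerId
    ON dbo.StagingFacts (CustomerId);
```

The point is the ordering: the index is built exactly once, after the data is in place, instead of being maintained during the load and then rebuilt afterward anyway.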

  • Discourse touched me in a no-no place

    @thegoryone said in Surrogate vs natural primary keys:

    you'll have to re-learn it if you take A-Level or Degree level Chemistry

    Actually you'll have to relearn it both times. ;)


  • Discourse touched me in a no-no place

    @thegoryone Well, yes. In GCSE chemistry, you're really not doing much more than ball-and-stick models. At A Level, you're talking about electrons more and how electrons don't need to be exactly shared between the parts, leading into a discussion of things like electronegativity. At degree level, there's a lot more talk of the quantum mechanics behind it all (though not the calculations involved, which are awful) and how that means that the resonance model of bonding in things like benzene is a load of shit (it's a delocalised multi-electron Ο€-bond, folks!) and how it changes the energetics so that the energy stored isn't the same as in the equivalent number of isolated bonds.

    In short, there's a whole bunch of stuff beyond GCSE and yes, you have to advance several levels of model. There might be more; I dropped out of chemistry after a year…

