Extended rant on Relational theory and SQL



  • @ScholRLEA said in Programming mini-rants thread:

    that a relationan attribute defines a uniquely-constrained domain, such that two relationsattributes in the same tablerelation with the same primitive implementation are not of the same type in terms of strong typing,

    FTF myself. I need to get more sleep than I've had lately.

    Basically, what I mean is that an attribute's domain is the sort of data it holds/represents, not the concrete data type that represents it, and two attributes which are holding different information - say, one being the number of couples that had sex on a beach in 2011, and the other the number of whale carcasses that washed up on that beach in 2012 (I don't want to know what the conclusions of that study were, but...) - obviously should not be the same domain, even if they both are represented by an INT value. While SQL-92 does have a mechanism (the CREATE DOMAIN clause) for defining new domains, not only is it horribly broken, but most DBAs and DB developers are unaware of it, and it is usually implemented in such a haphazard way that even if they tried to use it they be fighting off nasal demons before they finished the implementation.

    The result is that typing constraints in SQL is a joke, which is a big deal because they were a significant factor in Codd's theory - Codd (and Date) very specifically said that a) the domains of different attributes, both within a relation and between relations, should be considered disjoint unless one relation is using it as a foreign key to the other, and b) the values within attributes should not affect the relation's behavior, but the domains should, in that they should constraint any insertions to or updates of the attribute.

    @anonymous234 said in Programming mini-rants thread:

    @ScholRLEA said in Programming mini-rants thread:

    That tables are just a representation of a relvar, not the relvar itself and certainly not the relation a relvar instantiates

    I don't think I understand that at all.

    @dkf said in Programming mini-rants thread:

    1. Tables are a way to generate a relation through providing the instances of the tuples that satisfy the relation.

    Well... yes and no. We're actually conflating five things when we talk about tables in RDBMSes: relations, which are the templates or schema that describe a connection or (duh) relation between two or more pieces of data; relvars, which are a data representation of those relationships; data tables, which are one of the several possible implementations of a relvar; the data that the relvar is associating; and table layout, which is a specific way of presenting data for users visually.

    To clarify: a relation is a something that says that the domain of the datum Membership Number is associated in some way with the the domains of the datum Member Full Name and the datum Member Phone Number. It is a purely abstract thing, though it needs to be made concrete inside the RBDMS in order for the engine to operate on it. The point here is that it is associating those domains, or rather, defining them. While not all domains can be automatically constrained (you would have no way of knowing if 'Simon Moon' is in fact the name of the member, without checking it manually or having some other source of confirming it), but in principle at least, it should never allow any data that isn't legitimate to that constraint of (in this example) 'is a member of the set of people who hold memberships in organization x'.

    The relvar, in turn, is a way of representing that a specific Membership Number is connected to specific Member Full Name and Member Phone Number values, such as:

    (1723, 'Simon Moon', '(312) 555-1313')
    

    Now there are several ways to represent this, one of which is a table - that is, an array a data structure with those three domains (or pointers/refs to them) as the fields. This is not a particularly common representation, however, as it makes dynamic CRUD problematic, and has poor performance when used in a disk file. Most RDBMS systems use various disk-friendly methods such as B-Trees instead, and the data themselves are not necessarily ordered in any way physically.

    Finally, we have table layouts, which for a lot of things that RDBMSes are used for is a convenient presentation; but too many people seem to assume that that display is the relational data, and that other views (e.g., histograms, pie charts) are derived from the table layout in some way. Actual programmers 'know better', or should, but many who understand this intellectually (including myself) still tend to make that connection when they aren't being careful because the tables are a compelling presentation.

    Tables themselves need a unique identifier per row in order to allow the table to be managed (otherwise it isn't necessarily possible to identify a particular row).

    I'm going to have to disagree with you here, because this was more or less exactly what Codd was rejecting. The whole point of RDBMS was to focus on the relationships of the data, not on the structure of how it is represented, either on disk or on paper (or screen). This is the reason he made such a strong point about the importance of candidate keys - that you don't need to add a unique identifier, because if you have designed the relation correctly, then the unique identifier is already there.

    While he was writing this before the B-Tree data structure and algorithm was well established, and deliberately avoided referring to any specific implementation, the fact is that a B-Tree - or most other tree implementations, for that matter - does not require a unique identifier to track the tuples, shows that it is not an absolute requirement, not even for the sake of efficiency.

    1. I'm guessing you find ORMs annoying. That's perfectly OK. ;)

    Mostly, I see most ORMs as mapping the wrong things - they make the mistake, first pointed out by Date, that a class is analogous to a domain, not a relation. They are both forms of Abstract Data Types (technically, Codd defined domains as algebraic data types, since he defined them in terms of constraints), which is not the case for a relation.


  • area_pol

    @ScholRLEA So you are saying that we should have different datatypes for different units, instead of real number it would be m, kg, s. In physics dimensional analysis helps checking if the formula is right, so that could be useful in programming too :)

    As to that part about DBs vs relation-theory - what is the problem really?



  • @Adynathos said in Extended rant on Relational theory and SQL:

    @ScholRLEA So you are saying that we should have different datatypes for different units, instead of real number it would be m, kg, s.

    Actually, yes. That's the whole idea behind ADTs/classes, after all. The fact that most ADTs and classes are defined as compound types should be irrelevant - it shouldn't be possible for a client-programmer to determine if a given ADT is a primitive or not. This ties in with my rant about how most people who use 'classes' and 'objects' aren't really doing OOP, because it's the same problem of confusing the type with the type's implementation.

    Look at the range and sub-typing rules in Ada - a language which doesn't assume type equality of types with the same implementation, even to the point where two different access variables pointing to the same type are not considered interchangeable if they were declared separately - to see what strong typing actually means.

    As to that part about DBs vs relation-theory - what is the problem really?

    The problem is that most people who work with SQL learn SQL, not Relational theory, and SQL is a really quarter-assed approach to RDB. Most DBAs assume that the foibles and limitations of SQL are inherent in Relational design, which simply isn't the case, and one of the biggest of those is that they assume RDB only works with a fixed set of primitive domains - something that isn't even true for SQL, at least not any version of the standard in the past twenty-five years - because the RDBMS vendors tell them that, and put little to no effort into supporting user-defined domains despite the fact that having domains disjoint is a vital aspect of a fully working RDB implementation.


  • Discourse touched me in a no-no place

    @ScholRLEA said in Extended rant on Relational theory and SQL:

    We're actually conflating five things when we talk about tables in RDBMSes: relations, which are the templates or schema that describe a connection or (duh) relation between two or more pieces of data; relvars, which are a data representation of those relationships; data tables, which are one of the several possible implementations of a relvar; the data that the relvar is associating; and table layout, which is a specific way of presenting data for users visually.

    What's this “we”? :p I learned databases mainly by starting from discrete mathematics, and I also did a lot with abstract datatypes before ever doing a thing with DBs. I've always known to distinguish between the models, the instances and the implementation. (I also really don't mix them up with a tabular rendering.)

    Of course, tables as understood by SQL default to being multisets of relational tuples following the schema. They only act as true sets if there's a specific request to make them so (via a relevant UNIQUE constraint). Given that, as a practical consideration (and totally not as part of the true abstract relation) there's a need to identify the individual items in the multiset via an arbitrary unique token so as to allow modification of the particular relation instance. This identification token is often numeric in appearance, but that's by the by, a mere artefact of the implementation.

    It's not possible to consistently conceal all aspects of the implementation; stuff leaks through. That's reality, biting as usual…

    The distinction between relation and relvar is the thing I'm finding hard to quite grok at 1am (my time). It's either profound or where you're reading too much into something, and I can't tell which. ;)



  • @dkf Is the distinction between a type and a variable defined as that type profound? It's the exact same thing, it's just that it is uncommon to have more than one relvar of a given relation, because, well, it generally doesn't make sense to do that as (in principle) if they actually are the same relation, then any attributes in one relvar logically should be in the other (because they are representing the relation, not acting as a container for the data).

    Actually, now that I think of it, it is the idea that a relation (or a relvar/table representing it) is a container that is really at the heart of a lot of these confusions.


  • Discourse touched me in a no-no place

    @ScholRLEA said in Extended rant on Relational theory and SQL:

    Look at the range and sub-typing rules in Ada

    That brings back memories of when I was a noob developer (working on languages for provably correct hardware). It turns out that this sort of thing is not nearly as nice as you might hope, as there's a real need to be able to apply standard operations to these restricted numeric types, and that's where everything gets really complicated in practice. So you end up having to define how to handle all the numeric algebra and so on, which is not a nice thing to drop on an ordinary programmer, or you end up with lots of cast-like things converting to and from the tagged type, and that destroys much of the safety.

    Modern languages can get quite a bit of the same effect (not all, but maybe all isn't required?) through use of type boxing techniques, especially ones that do domain validation during construction. It's only a partial approach, but it's surprisingly effective and easy to implement…



  • @dkf Yeah, it is a nasty nettle in practice, I agree. I am currently learning Haskell's 'typeclass' system, which is an interesting approach to the problem which has the potential for wider use, but has plenty of confusing aspects and odd pitfalls of its own.

    It's just that hardly any developers I know are even aware of this, or that classes/ADTs are or can be anything other than fancy structs.

    Fancy structs are a damn useful thing, but if you don't get the reasons why you want the extras like encapsulation and inheritance, or what their limitations and pitfalls are, you are never going to really understand what you are doing.

    As for leaky abstractions, I know, but again, it is as much the fact that the abstractions aren't even supported, or understood, that is galling to me. I'd like to at least have practical RDBMS that tries to be something more than half-assed. I intend to try and implement one myself, as part of Thelema, but... well, you know how likely that is to ever happen.


  • ♿ (Parody)

    Preface: cheap whiskey maybe preventing me from understanding / coherency...

    @ScholRLEA said in Extended rant on Relational theory and SQL:

    The result is that typing constraints in SQL is a joke

    Is this a rant along the lines of a Haskell / ADa enthusiast?

    @ScholRLEA said in Extended rant on Relational theory and SQL:

    This is the reason he made such a strong point about the importance of candidate keys - that you don't need to add a unique identifier, because if you have designed the relation correctly, then the unique identifier is already there.

    I guess that in theory I agree that this situation would be nice. But I object in that the real world just doesn't play nicely with this. Fuck natural keys in their star holes. You may think that they make sense, but they are soooooo likely to fuck you over that it's just not worth it to even try or make the effort to figure out where they really do make sense.


  • Java Dev

    @dkf said in Extended rant on Relational theory and SQL:

    there's a need to identify the individual items in the multiset via an arbitrary unique token

    This depends on your problem domain. I'll give you two counterexamples drawn from my real-world experience:

    • Tables which reflect the state of some external system. This requires combining the identity of the remote node with a unique identifier provided by that node, resulting in a compound key which is not a traditional database-generated key. Adding a traditional database-generated key to that causes significant complexity but doesn't really solve anything.
    • OLAP tables, which in our application case are insert-only, and are always viewed in aggregated forms. We don't have any unique keys on this.

  • Discourse touched me in a no-no place

    @boomzilla said in Extended rant on Relational theory and SQL:

    Fuck natural keys in their star holes. You may think that they make sense, but they are soooooo likely to fuck you over that it's just not worth it to even try or make the effort to figure out where they really do make sense.

    They only make sense where the key has Buddha true-identity nature. Names of things never do.


  • Discourse touched me in a no-no place

    @PleegWat said in Extended rant on Relational theory and SQL:

    OLAP tables, which in our application case are insert-only, and are always viewed in aggregated forms.

    Insert-only? No queries at all? It's WOM all the way! 🐠

    We don't have any unique keys on this.

    Well, you probably do (database implementations will make one if you don't specify one; it's often called something like _ROWID, though the name totally varies between DB implementations, and might actually be a virtual column that instead maps to how the DB internally addresses things) but you're ignoring them. The DBA might be hiding all that stuff from you. That's perfectly OK, FWIW.

    I'm entirely happy that something outside the DB can have true-identity nature. It does happen, @boomzilla's reasonable concerns notwithstanding, and it happens because there isn't just one DB, or just one authoritative source of identity. You just need to be careful when dealing with external identities to make sure that they actually are identities; noting the reference to the issuer of the identity as part of an identity-tuple is very good.



  • @ScholRLEA said in Extended rant on Relational theory and SQL:

    what I mean is that an attribute's domain is the sort of data it holds/represents, not the concrete data type that represents it

    This was the original concept behind Hungarian notation before the Windows team so comprehensively missed the point.



  • To be honest, I don't think 80% of programmers should have to care about that.

    You have some custom data types (probably classes), with some formally defined constraints to help catch errors, and you want to store instances of them and be able to retrieve them. That seems like something that the computer should already be able to do for me with reasonable efficiency.


  • Discourse touched me in a no-no place

    @anonymous234 Yes and no. The problem is that you then decide you want to select some instances based on some criteria. The usual approach of ordinary programmers when faced with this is to fetch the collection of all the instances and to scan through that collection looking for the ones that actually match.

    Yes, that's SELECT * FROM THEBIGTABLEOFTHINGS and do the rest after the mega-fetch has returned everything. Precisely what you don't want with a DB about. Instead, you want to hoist your filtering to the DB, so that you get just the set of things you want (or the smallest convenient superset of them) to minimise the amount of information moved around. With a properly configured DB, this is very efficient, but it requires either the programmer to work with the DBA (to define proper queries) or to learn SQL itself.


  • Impossible Mission - B

    @ScholRLEA said in Extended rant on Relational theory and SQL:

    Look at the range and sub-typing rules in Ada - a language which doesn't assume type equality of types with the same implementation, even to the point where two different access variables pointing to the same type are not considered interchangeable if they were declared separately - to see what strong typing actually means.

    ...and look at how many people use Ada. The term "epsilon" comes to mind.

    There is a reason for this.


  • Winner of the 2016 Presidential Election

    @masonwheeler said in Extended rant on Relational theory and SQL:

    There is a reason for this.

    Yep, it's that programming languages with Pascal-style syntax are unfashionable. Ada is not that bad IME.

    Also, your argument is complete BS. "Ada is not widespread, therefore feature X of Ada must be bad."



  • @anonymous234 said in Extended rant on Relational theory and SQL:

    That seems like something that the computer should already be able to do for me with reasonable efficiency.

    Unfortunately, it often isn't. In Java it particularly isn't. Because primitive types are ❄ there and you get a lot of inefficiency boxing them in classes, plus the lack of operator overloading makes the wrap painful to work with, since you often do want to maintain arithmetic operations for the subtypes of numbers.


  • Discourse touched me in a no-no place

    @Bulb said in Extended rant on Relational theory and SQL:

    Unfortunately, it often isn't. In Java it particularly isn't. Because primitive types are there and you get a lot of inefficiency boxing them in classes, plus the lack of operator overloading makes the wrap painful to work with, since you often do want to maintain arithmetic operations for the subtypes of numbers.

    Defining general subtypes of numbers that also support arithmetic is hard.



  • @dkf Being able to associate a unit with each numeric value, and having the compiler track the units through arithmetic expressions to check whether the proposed operations are reasonable and whether each expression result is compatible with its proposed container, could eliminate whole classes of errors I've seen in production code. High-school physics students are supposed to be able to do dimensional analysis; how hard could it be to have compilers do it too?



  • @flabdablet Some googling shows that Boost can do this, as well as languages intended for numerical calculations (though not Matlab, oddly enough) and plenty more third-party libraries for languages that support operator overloading.

    I suppose it's not a common enough feature to get included into the standard library of popular languages.

    Oh and F# supports it too


  • BINNED

    @ScholRLEA said in Extended rant on Relational theory and SQL:

    The result is that typing constraints in SQL is a joke

    I agree, but I'd be happy if T-SQL just stopped doing implicit type conversions. 🏆



  • @antiquarian said in Extended rant on Relational theory and SQL:

    @ScholRLEA said in Extended rant on Relational theory and SQL:

    The result is that typing constraints in SQL is a joke

    I agree, but I'd be happy if T-SQL just stopped doing implicit type conversions. 🏆

    🤦



  • @anonymous234 said in Extended rant on Relational theory and SQL:

    To be honest, I don't think 80% of programmers should have to care about that.

    You have some custom data types (probably classes), with some formally defined constraints to help catch errors, and you want to store instances of them and be able to retrieve them. That seems like something that the computer should already be able to do for me with reasonable efficiency.

    I have to disagree there, because using a RDBMS to hold operational objects misses the point of both RDB and OOP. Objects aren't data, and real-world data aren't objects. There should be some kind of interface between them surely, if only to store serialized objects for later use, but they aren't the same thing, nor do they serve the same purposes.

    The whole point of a RDBMS is to a) represent information about the real world, b) represent which data are related to each other in a unique way, and c) store that information in an efficient manner. It is quite fine-grained in its elements, or at least should be, but at the same time, in principle should be abstract enough that the specific data can be disregarded for most operations, allowing you to work on sets of data in an algebraic and declarative fashion. While the state of the relvars is of primary importance, the relations and the queries operating one them should be very close to stateless (that is, given a specific relvar state, the query should always produce the same result).

    OOP, on the other hand, is about creating easily-manipulated analogs for relatively concrete things either in the real world or in some virtual construct, things that is in some way unitary, atomic, and coarse-grained, and provide a means for directing them in how to interact. While it is somewhat declarative, it still is focused on the actions rather than the goals, and inherently has to deal the state of the objects themselves, while at the same time treating that state as atomic (no peeking into the implementation, if it can be avoided).

    Now, the reality for each of these is never going to match the ideal of course, because like all abstractions they leak when made concrete.The thing that I am objecting to is that most of the implementations for both show a real disregard for the ideas behinds them, and a real lack of understanding on the parts of the designers, that goes beyond any arguments of 'efficiency' or 'terseness'. I can put up with the client-programmers not really 'getting it', if only because they will have to kludge it to make a working system anyway; but the language designers should at least pay more than just lip service to the things they are supposedly implementing.


  • ♿ (Parody)

    @dkf said in Extended rant on Relational theory and SQL:

    I'm entirely happy that something outside the DB can have true-identity nature. It does happen, @boomzilla's reasonable concerns notwithstanding, and it happens because there isn't just one DB, or just one authoritative source of identity.

    I can believe that. But it seems like such a ❄ situation that it's safer and more efficient to just never worry about it.

    Ok, sure, if you don't really care because it's just a dump of data that you're going to aggregate later...then who cares?


  • Discourse touched me in a no-no place

    @flabdablet Adding a unit, sure, and I've seen a number of libraries that do it. However, I was talking about restricting the range of the set. For example, having a type that corresponds to the set of odd numbers greater than 49. They're also useful, and yet painfully difficult to work with.

    Merely adding the dimension doesn't guarantee correctness at all anyway. There's a quite a few formulæ that need an extra factor of (a multiple of) π in there for good measure because of the nature of the conversions between different notions of coordinate space used…



  • @dkf said in Extended rant on Relational theory and SQL:

    a type that corresponds to the set of odd numbers greater than 49. They're also useful, and yet painfully difficult to work with.

    Not being closed under addition could make them a bit of a nuisance.


  • Discourse touched me in a no-no place

    @flabdablet said in Extended rant on Relational theory and SQL:

    Not being closed under addition could make them a bit of a nuisance.

    Yeah, you have to project the domains to the range, and that's when things get nasty. Or you have to rely on the user of the type system knowing the theory of numbers pretty thoroughly (enough to specify how to make the mathematical operations at least project onto a set they care about), and that's starting to get into areas where even mathematicians usually skim over the boring bits. By far the easiest thing is when you can specify that the operations are working on finite fields (e.g., 32-bit integer arithmetic does this) and then the operations are vaguely sane. Doing the whole job properly is much harder. BTTTDT. :(

    The lesser thing — tying in the dimensional units to the type — is useful, but only if you can define what the base set of dimensional units is. For example, some things are dimensionless units by strict SI rules, but you don't want to treat them that way in practice. You also need non-SI units to handle quantities that need to be converted; if that thermometer is in Farenheit, you shouldn't just treat it as Kelvin even if it is the same physical thing that's being talked about.



  • @dkf said in Extended rant on Relational theory and SQL:

    if that thermometer is in Farenheit, you shouldn't just treat it as Kelvin even if it is the same physical thing that's being talked about

    See also: intervals vs. wall times and the whole question of which epoch to use.



  • @dkf said in Extended rant on Relational theory and SQL:

    @flabdablet Adding a unit, sure, and I've seen a number of libraries that do it. However, I was talking about restricting the range of the set. For example, having a type that corresponds to the set of odd numbers greater than 49. They're also useful, and yet painfully difficult to work with.

    Yes, in the general case it is. But there are many cases where you don't need to restrict the range or you can do with modulo just fine. For example geocoordinates, or differentiating between different kinds of coordinates like document and screen.

    And it's not just numbers. There are many kind of unique identifiers that are just strings or specific numbers of letters where wrapper type can go a long way towards catching invalid data early.

    And many languages support this efficiently. Including C#. Just not Java.

    @dkf said in Extended rant on Relational theory and SQL:

    The lesser thing — tying in the dimensional units to the type — is useful, but only if you can define what the base set of dimensional units is. For example, some things are dimensionless units by strict SI rules, but you don't want to treat them that way in practice.

    Yep. Sometimes angles have to be dimensionless, because you need to calculate or use them as ratios of lengths and other times you want to have unit for them so you don't mix them up with other things. Or sometimes you may want to go even further: if widths and heights never mix in your domain, they can be separate dimensions; or longitudes and latitudes etc. On the other hand in natural units, length may be the same dimension as time and speed come out dimensionless; after all, it is a single 4-dimensional space-time.

    @dkf said in Extended rant on Relational theory and SQL:

    You also need non-SI units to handle quantities that need to be converted; if that thermometer is in Farenheit, you shouldn't just treat it as Kelvin even if it is the same physical thing that's being talked about.

    A decent dimensional analysis library can handle the conversions, even implicitly if you want. Either by normalizing on input or by having subtypes by specific and implicit or explicit conversion between them. A decent dimensional analysis library should also have some support for intervals and "absolute" values with some arbitrary reference point because many, even most, dimensions have those.


  • Discourse touched me in a no-no place

    @Bulb said in Extended rant on Relational theory and SQL:

    length may be the same dimension as time

    Formally, they're at right angles to each other in any non-accelerating reference frame, though they will not look as if they are if they are observed from another reference frame. They're all linked by Lorenz transforms, which are really just rotations in spacetime… 🤓



  • @dkf dimension as in measured in the same unit, not as in the same component of the vector space...


    filed under: termitology is hard


  • Discourse touched me in a no-no place

    @Bulb said in Extended rant on Relational theory and SQL:

    dimension as in measured in the same unit, not as in the same component of the vector space...

    Well, yes, though the fact that when you square one of the quantities you get a negative w.r.t. the other is pretty significant too…


  • ♿ (Parody)

    @flabdablet said in Extended rant on Relational theory and SQL:

    Not being closed under addition could make them a bit of a nuisance.

    Often a sign that doing arithmetic on them is :doing_it_wrong: .


  • ♿ (Parody)

    @dkf said in Extended rant on Relational theory and SQL:

    By far the easiest thing is when you can specify that the operations are working on finite fieldsrings (e.g., 32-bit integer arithmetic does this)

    No?


  • Discourse touched me in a no-no place

    Maybe? Remembering exactly would feel like working…


  • Java Dev

    @dkf 2³² is not closed under multiplication (not a prime base). Not sure about the terminology though.



  • @PleegWat said in Extended rant on Relational theory and SQL:

    @dkf 2³² is not closed under multiplication (not a prime base). Not sure about the terminology though.

    It is closed, basically by definition. Because closed just means that result of multiplication is from the set and it is.

    What the non-prime base does is that it does not have inverse for multiplication, so it is not a field, just a commutative ring. But then, integers (unlimited) are just a commutative ring too, because they don't have multiplication inverse either (a-1 is a fractional number; note that integers modulo prime base are fields and do have a-1 for all elements in the set except 0).



  • @ScholRLEA said in Extended rant on Relational theory and SQL:

    This is the reason he made such a strong point about the importance of candidate keys - that you don't need to add a unique identifier, because if you have designed the relation correctly, then the unique identifier is already there.

    In theory.

    The problem with theory is that you eventually have to put it into practice. In practice, you may not have a good candidate key.

    What makes a good candidate key?

    • Data that is unique description of a tuple
    • Data that doesn't change

    In the example you gave, that would be the Membership Number. There's just one minor problem with that:

    Membership Number isn't a natural key

    Membership Number is a surrogate key, a value created explicitly to be a unique identifier. The exact thing Codd said you shouldn't do.

    For reference, the other two datum in your example make poor candidate keys because they can change. Changing a primary key's value means it has to cascade updates to any tables that reference said primary key.

    This isn't the only reason to use surrogate keys... you also use them in cases where you would otherwise need a very large number of candidate keys to make a compound key.


  • Discourse touched me in a no-no place

    @powerlord said in Extended rant on Relational theory and SQL:

    Membership Number is a surrogate key, a value created explicitly to be a unique identifier. The exact thing Codd said you shouldn't do.

    Well, it matters because we need identities for things otherwise we end up getting hang up on whether something is the same when it has part of it changed. It's something of an argument for philosophers more than anyone else, of course…



  • @dkf said in Extended rant on Relational theory and SQL:

    we end up getting hang up on whether something is the same when it has part of it changed.

    Solution there is to add a nanosecond timestamp to every relvar and refuse to overwrite or delete anything ever. 🚎



  • Regarding scalar range domains, my current plan for Thelema (stop snickering) is to have a (range) domain c'tor that takes an upper and lower bound, and optional :overflow and :underflow keywords which take as parameters :fail (that is, raise an exception), :wrap, and :saturate, with the default being to :fail with a standard overflow-of parameterized exception (that is, it will match to a catch of either overflow-of (this-type) or to an unparameterized overflow). This doesn't really solve the problem as completely and elegantly as I would like, though.



  • @powerlord said in Extended rant on Relational theory and SQL:

    a value created explicitly to be a unique identifier. The exact thing Codd said you shouldn't do.

    Well, screw that guy then.


Log in to reply