Programming mini-rants thread

RaceProUK

@dkf said in Programming mini-rants thread:

numpy

dkf

@RaceProUK It'd be fun if it was called numpty, but it's actually an obvious name for a numerics package for Python.

Dreikin

@asdf said in Programming mini-rants thread:

@Dreikin said in Programming mini-rants thread:

Trees and graphs, in my case.

So, basically graphs, since the former is a subset of the latter. Sounds like you want a graph database or graph processing engine, and not an RDBMS.

No, I want a relationship DB and query syntax that can natively and easily handle more than one type of relationship.

Jaloopa

@dkf said in Programming mini-rants thread:

it's actually an obvious name

Which is why it has no place in the bold new world of software named after whatever the person who put it on GitHub thought sounded cool. It should be renamed to Gherkin or something

asdf

@Dreikin said in Programming mini-rants thread:

query syntax

The problem is not the query syntax; a regular RDBMS simply cannot handle graph queries efficiently. That's why specialized software for this use case exists in the first place.

dkf

@asdf said in Programming mini-rants thread:

The problem is not the query syntax; a regular RDBMS simply cannot handle graph queries efficiently. That's why specialized software for this use case exists in the first place.

RDBMSs do quite a good job provided you only use joins that follow foreign key relations and you've got your indices correct. And you avoid self-joins; self-joins suck ass.

asdf

@dkf
The bigger issue is that graph queries often need a large number of joins, and that joins are still expensive. In many cases, you cannot even know the necessary number of joins in advance. I'm pretty sure you can somehow deal with that using nonstandard extensions like WITH RECURSIVE, but the performance will probably be abysmal.
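For reference, here is roughly what the WITH RECURSIVE approach looks like. This is a minimal sketch that uses Python's built-in sqlite3 module and a made-up `edges` table. The recursive step re-joins the edges table once per hop, so the number of joins does not have to be known in advance. Recursive common table expressions were standardized in SQL:1999, although engine support and performance vary.

```python
import sqlite3

def reachable_from(conn, start):
    """Return every node reachable from `start` by following directed edges."""
    # UNION (not UNION ALL) discards rows that were already produced, so a
    # cycle in the graph cannot make the recursion run forever.
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach WHERE node <> ? ORDER BY node
        """,
        (start, start),
    )
    return [node for (node,) in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per directed edge.
conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
# Index the column that the recursive step joins on.
conn.execute("CREATE INDEX edges_src ON edges (src)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")],
)
print(reachable_from(conn, "a"))  # ['b', 'c', 'd']
```

The a → b → c → a cycle is why the query uses UNION. With UNION ALL, the recursion would keep revisiting the same nodes and never terminate.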

RaceProUK

@asdf said in Programming mini-rants thread:

joins are still expensive

I see a lot of people claim this (usually those who then go on to say "MongoDB is awesome because it's webscale!"), yet no-one seems to have any numbers to back it up

asdf

@RaceProUK said in Programming mini-rants thread:

usually those who then go on to say "MongoDB is awesome because it's webscale!"

Flagged for libel!

dkf

@asdf said in Programming mini-rants thread:

Flagged for libel!

Is he not supplying enough libel in that statement? :p

asdf

@dkf said in Programming mini-rants thread:

@asdf said in Programming mini-rants thread:

Flagged for libel!

Is he not supplying enough libel in that statement? :p

He accused me of supporting databases which "solve" the problem that some DB operations are expensive by nature by simply not providing them, forcing users to badly re-implement them themselves, outside the DB. As far as I'm concerned, both the accusation and the stupidity which led to the popularity of those "databases" in the first place should be punishable by execution via purple cactus dildo. ;)

RaceProUK

@asdf Firstly, I'm a 'she', not a 'he'. Secondly, I didn't accuse you of anything: I simply stated that I had heard that claim before, and identified a group that oft repeat that claim.

asdf

@RaceProUK said in Programming mini-rants thread:

Firstly, I'm a 'she', not a 'he'.

Oops, sorry

Secondly, I didn't accuse you of anything: I simply stated that I had heard that claim before, and identified a group that oft repeat that claim.

I wasn't being serious. As in, not at all.

RaceProUK

@asdf said in Programming mini-rants thread:

I wasn't being serious. As in, not at all.

Huh.

Excuse me a moment.

a loud thud is heard as the hedgehog disposes of an obviously broken humour detection unit

Dreikin

@asdf said in Programming mini-rants thread:

@dkf
The bigger issue is that graph queries often need a large number of joins, and that joins are still expensive. In many cases, you cannot even know the necessary number of joins in advance. I'm pretty sure you can somehow deal with that using nonstandard extensions like WITH RECURSIVE, but the performance will probably be abysmal.

@asdf said in Programming mini-rants thread:

@Dreikin said in Programming mini-rants thread:

query syntax

The problem is not the query syntax; a regular RDBMS simply cannot handle graph queries efficiently. That's why specialized software for this use case exists in the first place.

There's no technical reason they couldn't handle it well. You're still thinking of having to manage the relation pointers yourself, instead of letting the DB do that. The SQL language assumes you're doing that, and it does make using graphs a terrible experience, and terrible on performance.

But relational databases should be able to handle and manage them easily, without the penalties that doing it through plain SQL imposes.

Dreikin

@Dreikin said in Programming mini-rants thread:

@asdf said in Programming mini-rants thread:

@dkf
The bigger issue is that graph queries often need a large number of joins, and that joins are still expensive. In many cases, you cannot even know the necessary number of joins in advance. I'm pretty sure you can somehow deal with that using nonstandard extensions like WITH RECURSIVE, but the performance will probably be abysmal.

@asdf said in Programming mini-rants thread:

@Dreikin said in Programming mini-rants thread:

query syntax

The problem is not the query syntax; a regular RDBMS simply cannot handle graph queries efficiently. That's why specialized software for this use case exists in the first place.

There's no technical reason they couldn't handle it well. You're still thinking of having to manage the relation pointers yourself, instead of letting the DB do that. The SQL language assumes you're doing that, and it does make using graphs a terrible experience, and terrible on performance.

But relational databases should be able to handle and manage them easily, without the penalties that doing it through plain SQL imposes.

For example, you should be able to specify that a relation is a graph or tree (I'm assuming optimizations can be made for trees relative to general graphs). Then it can be indexed using an index specialized for them, and you could make queries like

select names
from employees
where employees.bosses includes
(select employees from employees where title = 'CIO')

which the DB could handle by grabbing the tree subset(s) descending from the CIO.
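For comparison, this is roughly how today's SQL expresses "everyone whose chain of bosses includes the CIO". The relation lives in an ordinary `boss_id` column (an adjacency list), and the query walks it with a recursive CTE. This is a sketch using Python's sqlite3 module; the schema and the people in it are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency-list schema: each row points at its direct boss.
conn.execute(
    """CREATE TABLE employees (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           title TEXT NOT NULL,
           boss_id INTEGER REFERENCES employees(id)
       )"""
)
conn.execute("CREATE INDEX employees_boss ON employees (boss_id)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "CEO", None),
        (2, "Bob", "CIO", 1),
        (3, "Carol", "CFO", 1),
        (4, "Foo Bar", "IT Manager", 2),
        (5, "Dave", "Sysadmin", 4),
        (6, "Erin", "Accountant", 3),
    ],
)

# Seed with the CIO's direct reports, then repeatedly add their reports.
# UNION ALL is fine here because a tree has no cycles.
query = """
WITH RECURSIVE under_cio(id) AS (
    SELECT e.id FROM employees AS e
    JOIN employees AS boss ON e.boss_id = boss.id
    WHERE boss.title = 'CIO'
    UNION ALL
    SELECT e.id FROM employees AS e JOIN under_cio AS u ON e.boss_id = u.id
)
SELECT name FROM employees WHERE id IN (SELECT id FROM under_cio) ORDER BY name
"""
names = [name for (name,) in conn.execute(query)]
print(names)  # ['Dave', 'Foo Bar']
```

It works, but the query spells out the traversal instead of just saying "bosses includes the CIO", which is the gap the post above is complaining about.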

Magus

@Unperverted-Vixen said in Programming mini-rants thread:

@RaceProUK You mean I'm not supposed to declare all of them as object x;?

I feel like you're the wrong fox to be joking about perverting the language in such a way.

dkf

@RaceProUK said in Programming mini-rants thread:

Firstly, I'm a 'she', not a 'he'.

https://static1.squarespace.com/static/53ddaa78e4b0536a495710cf/t/53de9137e4b01c361c98fa12/1407095100206/8210669_s.jpg

sh_code

@Dreikin said in Programming mini-rants thread:

@Unperverted-Vixen said in Programming mini-rants thread:

@RaceProUK You mean I'm not supposed to declare all of them as object x;?

Iterator variables are supposed to be i, j, and k, remember? We went over this like 50 times!

I've never understood this nonsense... You use them in cycles (a word closer to my native language's word for loops), so why can't they be c, or, for nested cycles, c1, c2, ... cn? UNLIMITED ADDRESS SPACE FOR NESTED CYCLES!

Yamikuronue

@dkf is that hedgehog...dancing? I'm ing on what that image is meant to mean >.>

dkf

@Yamikuronue said in Programming mini-rants thread:

@dkf is that hedgehog...dancing? I'm ing on what that image is meant to mean >.>

I dunno what it is meant to mean. I just GISed for “she hedgehog” and picked a result I liked. ;)

Magus

@sh_code Because nesting them is really bad?

Dreikin

@sh_code said in Programming mini-rants thread:

@Dreikin said in Programming mini-rants thread:

@Unperverted-Vixen said in Programming mini-rants thread:

@RaceProUK You mean I'm not supposed to declare all of them as object x;?

Iterator variables are supposed to be i, j, and k, remember? We went over this like 50 times!

I've never understood this nonsense... You use them in cycles (a word closer to my native language's word for loops), so why can't they be c, or, for nested cycles, c1, c2, ... cn? UNLIMITED ADDRESS SPACE FOR NESTED CYCLES!

Well, i is obviously short for iterator, and then when they needed more, they followed standard mathematical process and used the subsequent letters. ax^2 + bx + c, (x, y, z), ..., [i, j, k].

Captain

@Jaloopa I hate that about C#. It's a strongly typed language, but it's still stringly typed. Sucks ass. Wasting the language's greatest strength.

Jaloopa

@Captain I was talking about VB. I'm fairly sure C# would fail to compile if you tried to return a string when the signature says DateTime.

I've never tried it because I'm not a fucking idiot

RaceProUK

@Captain said in Programming mini-rants thread:

I hate that about C#. It's a strongly typed language, but it's still stringly typed.

Clearly you've never used C#: if you try and pass a string instead of a DateTime the compiler will quite rightly ~~shoot you in the head~~raise a compilation error.

Zecc

@Dreikin said in Programming mini-rants thread:

Well, i is obviously short for iterator

No! It's short for "index"!
:faux_anger: :overly_dramatic_table_flip:

remi

@Dreikin said in Programming mini-rants thread:

Well, i is obviously short for iterator, and then when they needed more, they followed standard mathematical process and used the subsequent letters. ax^2 + bx + c, (x, y, z), ..., [i, j, k].

I think the mathematical approach came first: I've always seen i, j, k... used in formulas such as sums and series (*). Since programming was initially not much more than mathematical formulas, my guess is that this is how this habit started.

(*) although strictly speaking I don't think I have read a lot of mathematical stuff in the form in which it was written before programming was a thing, so in theory it could be that using i, j... in sums is a new thing and not how mathematicians wrote things before programming (a bit like they used words and not formulas a few centuries ago), but I very much doubt it.

RaceProUK

@remi said in Programming mini-rants thread:

I think the mathematical approach came first: I've always seen i, j, k... used in formulas such as sums and series (*). Since programming was initially not much more than mathematical formulas, my guess is that this is how this habit started.

My research indicates that FORTRAN, the first to use i for this purpose, took the convention from mathematics, where it's often used in series notations e.g. summation.

dkf

@RaceProUK said in Programming mini-rants thread:

My research indicates that FORTRAN, the first to use i for this purpose, took the convention from mathematics, where it's often used in series notations e.g. summation.

Also, mathematicians tend to try to avoid using lots of different indices if they can. If there's a hint that they'll need more than i, j and k, they'll usually try to write some custom operator that hides all that stuff entirely.
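A standard illustration of that habit, chosen here as an example rather than taken from the post: the matrix product is defined with an explicit summation index, but once the operator exists, the indices vanish from the notation. The trace of a triple product shows how quickly the indices pile up when they are written out (square n × n matrices assumed).

```latex
(AB)_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}
\qquad\Longrightarrow\qquad
C = AB

\operatorname{tr}(ABC) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} A_{ij} B_{jk} C_{ki}
```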

NedFodder

@dkf said in Programming mini-rants thread:

they'll need more than i, j and k

I worked with a guy whose code usually had at least half a dozen in every function. He'd write a function with one for loop, so he'd declare i at the top of the function and use it in the loop. Then he'd need another for loop in the same function. Instead of re-using i, he'd declare a new variable for the new loop. And instead of j or k, it was ii, iii, iv, etc. The highest I saw was viii.

dkf

@NedFodder Fortunately, modern compilers do sensible things with such silliness.

masonwheeler

@blakeyrat said in Programming mini-rants thread:

If I have a list of, say, school teacher addresses, which one is "greater than" another?

The one that comes later in an alphabetical ordering, obviously.

It's ridiculous to even think that way.

Why?
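"Alphabetical" is less obvious than it sounds once a computer does the comparing. In this quick Python sketch (the addresses are made up), plain string comparison orders strings by Unicode code point, character by character. Digits therefore sort before letters, uppercase sorts before lowercase, and house numbers are compared as text rather than as numbers.

```python
addresses = ["9 Elm St", "10 Oak Ave", "apple Rd", "Zebra Ln"]

# Default str ordering: compare code points left to right.
print(sorted(addresses))
# ['10 Oak Ave', '9 Elm St', 'Zebra Ln', 'apple Rd']

# A case-insensitive key fixes one surprise, but not the numeric one.
print(sorted(addresses, key=str.casefold))
# ['10 Oak Ave', '9 Elm St', 'apple Rd', 'Zebra Ln']

# So, under plain string comparison, "9 Elm St" is "greater than" "10 Oak Ave".
print("9 Elm St" > "10 Oak Ave")  # True
```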

masonwheeler

@Dreikin said in Programming mini-rants thread:

No, I want a relationship DB and query syntax that can natively and easily handle more than one type of relationship.

This is a common misunderstanding. It's called a relational (not "relationship") DB because it's made up of relations (a mathematical formalization of the basic concept of tables), not because of the way it models the relationship between the tables.
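To make that concrete, here is a rough Python sketch of the textbook definition. The data and the helper names (`select`, `project`, `natural_join`) are invented for illustration. A relation is a heading of attribute names plus a set of tuples, so it has no row order and no duplicate rows, and the classic operators take relations in and return relations.

```python
from itertools import product

# A relation: (heading of attribute names, set of tuples).
employees = (("name", "dept"), {("Ann", "IT"), ("Ben", "HR"), ("Cat", "IT")})
depts = (("dept", "floor"), {("IT", 3), ("HR", 1)})

def select(rel, pred):
    """Keep the tuples for which pred(row_as_dict) is true."""
    heading, rows = rel
    return heading, {r for r in rows if pred(dict(zip(heading, r)))}

def project(rel, attrs):
    """Keep only the named attributes; the set drops resulting duplicates."""
    heading, rows = rel
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), {tuple(r[i] for i in idx) for r in rows}

def natural_join(left, right):
    """Pair up tuples that agree on every attribute name they share."""
    lh, lrows = left
    rh, rrows = right
    shared = [a for a in lh if a in rh]
    extra = [a for a in rh if a not in lh]
    rows = set()
    for l, r in product(lrows, rrows):
        ld, rd = dict(zip(lh, l)), dict(zip(rh, r))
        if all(ld[a] == rd[a] for a in shared):
            rows.add(l + tuple(rd[a] for a in extra))
    return lh + tuple(extra), rows

# Projection collapses the two IT rows into one tuple.
print(len(project(employees, ["dept"])[1]))  # 2

it_floor = project(
    select(natural_join(employees, depts), lambda r: r["dept"] == "IT"),
    ["name", "floor"],
)
print(sorted(it_floor[1]))  # [('Ann', 3), ('Cat', 3)]
```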

dcon

@masonwheeler said in Programming mini-rants thread:

@blakeyrat said in Programming mini-rants thread:

If I have a list of, say, school teacher addresses, which one is "greater than" another?

The one that comes later in an alphabetical ordering, obviously.

Nonono. Obviously you have to go west to east. Except when you need to go south to north. And you have no additional information except the address. And you're not allowed to access the Internet when sorting. And you're required to deliver a perfect sorting routine by tomorrow.

Dreikin

@masonwheeler said in Programming mini-rants thread:

@Dreikin said in Programming mini-rants thread:

No, I want a relationship DB and query syntax that can natively and easily handle more than one type of relationship.

This is a common misunderstanding. It's called a relational (not "relationship") DB because it's made up of relations (a mathematical formalization of the basic concept of tables), not because of the way it models the relationship between the tables.

Not quite a misunderstanding; I'm not convinced a more expansive subset of graph relations can't be fit into relational database theory/implementation (although I lack the formal knowledge of that area to be certain of it). Graph relations can already be embedded in table relations, for example, so this basically comes down to query and indexing support, and possibly datatype support. You don't have to change the table model, just expand/revise how you can query it efficiently.

asdf

@Dreikin said in Programming mini-rants thread:

Graph relations can already be embedded in table relations, for example, so this basically comes down to query and indexing support, and possibly datatype support.

I'm not a database expert either, but I think your view is too simplistic. For example, relational databases are row-oriented, but column-oriented storage makes a lot more sense for property graphs. You cannot easily have both at the same time.

The only database I know which even tries that is SAP HANA, and while I've never used it myself, I've heard a few bad things about it. Also, it's purely in-memory, and not a traditional, disk-backed relational database system.
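For anyone unfamiliar with the row/column distinction, here is a toy Python sketch of the two layouts. The edge data is invented, and real engines use compressed on-disk pages, not Python lists. A row store keeps each record's fields together. A column store keeps each field's values together, which is cheaper when a query touches only one or two properties of many edges.

```python
# Row-oriented: one record per edge, with all of its properties stored together.
rows = [
    {"src": "A", "dst": "B", "length_km": 12.5, "max_speed": 120},
    {"src": "B", "dst": "C", "length_km": 30.0, "max_speed": 160},
    {"src": "C", "dst": "D", "length_km": 7.5, "max_speed": 80},
]

# Column-oriented: one list per property; position i is the same edge in each.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Summing one property: the row layout visits every whole record...
total_from_rows = sum(row["length_km"] for row in rows)
# ...while the column layout scans only the single list it needs.
total_from_columns = sum(columns["length_km"])

assert total_from_rows == total_from_columns == 50.0
print(columns["max_speed"])  # [120, 160, 80]
```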

Dreikin

@asdf said in Programming mini-rants thread:

@Dreikin said in Programming mini-rants thread:

Graph relations can already be embedded in table relations, for example, so this basically comes down to query and indexing support, and possibly datatype support.

I'm not a database expert either, but I think your view is too simplistic. For example, relational databases are row-oriented, but column-oriented storage makes a lot more sense for property graphs. You cannot easily have both at the same time.

The only database I know which even tries that is SAP HANA, and while I've never used it myself, I've heard a few bad things about it. Also, it's purely in-memory, and not a traditional, disk-backed relational database system.

Hm, let me try to present this in another way. First off, apparently SQL databases already don't fit the formal mathematical relational database model mentioned. This is important because however neat the mathematical model may be, what I'm concerned with is the tools.

Next, you may be thinking I want these graphs/tree/hierarchies to be lateral to tables - that's not the case either: this non-finitary relation, in the extension I'm desiring, is embedded in a field. It'd basically be a new datatype, with indexes, syntax, and operators that can deal with it specially, such as already exist for, e.g., strings (LIKE). You could even interact with it with standard SQL and get an appropriate representation of it in the output, and work with a string representation of it.

By way of example:

-- HIERARCHY, Hierarchy.Parent and Position.Parent are the proposed (hypothetical) extension
CREATE TABLE Employees (
    Name VARCHAR,
    Title VARCHAR,
    Position HIERARCHY
);

INSERT INTO Employees (Name, Title, Position)
VALUES ('Foo Bar', 'IT Manager', Hierarchy.Parent((SELECT Employee FROM Employees WHERE Title = 'CIO')));

SELECT Employee FROM Employees WHERE Position.Parent.Title = 'CIO';

NB: I'm combining this with my references-instead-of-pointers idea, which for these examples has some ORM-like magic around the singular/plural versions of table names, with the singular version representing a row reference.
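Something in this direction can be approximated today without a new datatype by using the well-known "materialized path" pattern. Each row stores its ancestry as a string, and subtree queries become a prefix LIKE. That is a swapped-in technique, not the HIERARCHY type proposed above, and the schema below is hypothetical. (SQL Server's built-in `hierarchyid` type is similar in spirit.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: path lists the ids from the root down to this row,
# e.g. '/1/2/4/' means "4, whose boss is 2, whose boss is 1".
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, title TEXT, path TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "CEO", "/1/"),
        (2, "Bob", "CIO", "/1/2/"),
        (3, "Carol", "CFO", "/1/3/"),
        (4, "Foo Bar", "IT Manager", "/1/2/4/"),
        (5, "Dave", "Sysadmin", "/1/2/4/5/"),
    ],
)

# Whole subtree under the CIO: any path that starts with the CIO's own path,
# excluding the CIO's row itself.
subtree = conn.execute(
    """
    SELECT e.name FROM employees AS e
    JOIN employees AS cio ON cio.title = 'CIO'
    WHERE e.path LIKE cio.path || '%' AND e.id <> cio.id
    ORDER BY e.name
    """
).fetchall()
under_cio = [name for (name,) in subtree]
print(under_cio)  # ['Dave', 'Foo Bar']
```

The catch is the usual one for this pattern: moving a subtree means rewriting the path of every row beneath it, which is the kind of bookkeeping a native hierarchy type would handle for you.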

djls45

@asdf said in Programming mini-rants thread:

For example, relational databases are row-oriented, but column-oriented storage makes a lot more sense for property graphs.

That's an easy fix -- just transpose your rows and columns!

(I have no idea what a property graph is.)

asdf

@djls45 said in Programming mini-rants thread:

(I have no idea what a property graph is.)

What is a graph database? - Getting Started

Basically, you have edges and nodes, and both can have an arbitrary number of properties associated with them (and, optionally, a label). Let's assume our graph is modeling a railroad network (stupid textbook example, I know). Nodes might be labeled "station", "turnout" or "intersection", depending on what they represent. The edges (tracks) would have multiple properties associated with them, like length, maximum speed, ...
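Here is a minimal in-memory sketch of that railroad example in Python, assuming plain dataclasses rather than any particular graph database's API. The station names and property values are made up. Every node and edge carries a label plus a free-form dictionary of properties.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    label: str                                   # e.g. "station", "turnout"
    props: dict = field(default_factory=dict)    # arbitrary key/value pairs

@dataclass
class Edge:
    src: str
    dst: str
    label: str                                   # e.g. "track"
    props: dict = field(default_factory=dict)

nodes = {
    n.id: n
    for n in [
        Node("A", "station", {"name": "Aston"}),
        Node("T1", "turnout"),                   # a node with no properties
        Node("B", "station", {"name": "Brill"}),
    ]
}
edges = [
    Edge("A", "T1", "track", {"length_km": 2.0, "max_speed": 80}),
    Edge("T1", "B", "track", {"length_km": 5.5, "max_speed": 120}),
]

def tracks_from(node_id):
    """Outgoing 'track' edges of a node, with their properties."""
    return [e for e in edges if e.src == node_id and e.label == "track"]

# A typical property-graph question: how fast can a train leave station A?
print([e.props["max_speed"] for e in tracks_from("A")])  # [80]
```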

dkf

@djls45 said in Programming mini-rants thread:

That's an easy fix -- just transpose your rows and columns!

Some SQL database engines do that internally. As someone just using the DB, you shouldn't have to care.

Tsaukpaetra

@dkf said in Programming mini-rants thread:

@Dreikin said in Programming mini-rants thread:

Foremost by only directly/easily modeling one type of relation (tables), but also by things like requiring this type of manual pointer management.

Which other sorts of relations would you want?

Well, there's tables, but also beds, kitchen counters, the floor, food, animal, toys, etc...