In which I show how referential integrity is not a thing here

boomzilla

@blakeyrat said in In which I show how referential integrity is not a thing here:

But if you write Mongo code without keeping it's weird arbitrary limit in mind, you're going to have troubles.

I would say that the way they structured the data is consistent with keeping that in mind, even though I'm not aware of anything to actively guard against overflowing the limit.

Most of the actual documents tend to be pretty small. The exception to that are the posts, which do have limits applied to them that are much smaller than the document limit. There's an administrative setting for that, the default is 32KB. I don't know if they limit the setting. Maybe I'll try it out (locally) later just to see what happens.

But even if they don't, I don't think that's the same as a design that allows things to accumulate in the manner you're describing.

boomzilla

@blakeyrat said in In which I show how referential integrity is not a thing here:

You've never explained what this mythical trade-off was trading-off against.

It's directly related to the reasons behind Raymond Chen's starting points of -100 for any new feature.

blakeyrat

@boomzilla said in In which I show how referential integrity is not a thing here:

It's directly related to the reasons behind Raymond Chen's starting points of -100 for any new feature.

I love how instead of admitting you lied and you have absolutely no idea whether or not a "trade-off" was involved, you just give all these vague hints. Keep it up.

wft

@Adynathos the reasons (raisins, actually):

Mongo is webscale!
Hipsters cannot into SQL
Data modelling is for the old farts, cool kids want INFINITE FLEXIBILITIES!

boomzilla

@blakeyrat said in In which I show how referential integrity is not a thing here:

I love how instead of admitting you lied and you have absolutely no idea whether or not a "trade-off" was involved, you just give all these vague hints. Keep it up.

So, now "lied" means "@blakeyrat pretends to be an idiot." To the TIL thread!

Adynathos

@wft I once tried doing an example program with Mongo:

so, to store/read/process the values to the DB, i need to know what kind of data to expect, so a schema
it would be nice if the DB automatically kept track of the schemas and checked the values...
that would be a reimplementation of relational DB, so let's use a relational DB instead :P

Rhywden

@wft Indeed. Okay, guys, please tell me if I got this wrong - let's assume I don't have access to the $aggregate function (and want an INNER JOIN anyway).

So, I have Comments and I have Users. And now I want to post the username along the comment of said user.

In SQL that would be:

SELECT Comments.content, Users.username
FROM Comments
INNER JOIN Users
ON Comments.user_id = Users.id

So, in essence, one(1) query and the db sorts it out.

In MongoDB I first do a

let all_comments = Comments.find({});

and then I have to do something like:

let complete_comments = _.map(all_comments, (comment) => {
    let user = Users.find({_id: comment.user_id});
    return { content: comment.content, username: user.username };
});

So, basically, I have n+1 queries. Or is there some way to do that differently that I overlooked? Remember, no $aggregate.

blakeyrat

@Rhywden Are you talking about NodeBB's flavor of MongoDB specifically? Or MongoDB generally?

Is there a reason you can't use aggregates?

Rhywden

@blakeyrat Yes, the particular version of software I'm using has some problems with $aggregate which make it kind of useless.

blakeyrat

@Rhywden Can you use map-reduce? Or are you limited to the .find() command?

I'd say the solution is to fix your broken-ass software so you can use aggregates, because that's what you're supposed to be using. (Even map-reduce is discouraged unless you can't perform the query any other way.)

Rhywden

@blakeyrat as far as I can see, map-reduce only works on singular collections.

And, no, "fixing" is not an option. Though an upcoming version will do away with the need for MongoDB entirely. Still a ways off, though.

blakeyrat

@Rhywden The only alternative to what you're doing would be, if I read your question LITERALLY, to store the username along with the post.

(Since you probably need stuff in the User table other than just the name, you're already doing about the best you can.)

But write a comment there:

// This is wrong and stupid and broken code because our MongoDB library is shit

Rhywden

@blakeyrat Okay, thanks, just wanted to see if there was some feature I overlooked somehow.

ben_lubar

NodeBB doesn't have multiple collections for data.

powerlord

@wft said in In which I show how referential integrity is not a thing here:

@Onyx that's the point.

I'm now working in a project where the dumbfucks chose Mongo instead of a database for data that's... well, relational in its nature. They imported some data into a test environment and they cannot even make the data match even on a single screen. I mean, it was all cool and dandy on generated data, but wherever there's data migrated from a live customer, there's a lot of BS.

Fuck. Kill 'em with fire, won't shed a single tear.

I'm sorry for your loss.

Of sanity.

flabdablet

@wft said in In which I show how referential integrity is not a thing here:

Data modelling is for the old farts, cool kids want INFINITE FLEXIBILITIES!

So Mongo is just a way to implement the EAV antipattern without having to admit to yourself that that's what you're doing?

Arantor

@flabdablet pretty much though there are times EAV isn't entirely wrong. I have a use case coming up where EAV is the least retarded solution on the table...

dkf

@Arantor said in In which I show how referential integrity is not a thing here:

there are times EAV isn't entirely wrong

They're mostly times when you need dynamic user-defined fields of records. You can do that with relational DBs, but you end up with some really nasty JOINs. (Relational DBs work very well when your schema is sufficiently static, which is most DB work but not all.)

wft

@dkf That's why you can have JSON type fields in the modern databases (PostgreSQL, MySQL even has it). Balance, friend.

dkf

@wft They're probably taking that JSON apart behind the scenes so they can efficiently index it (as well as keeping it as a TEXT/LOB). That's the sane way.

Rhywden

@dkf said in In which I show how referential integrity is not a thing here:

@wft They're probably taking that JSON apart behind the scenes so they can efficiently index it (as well as keeping it as a TEXT/LOB). That's the sane way.

Indeed - let the db do the heavy lifting for you. That's what it's for.

dkf

@Rhywden Most DBs are pretty miserable at string processing, at least in the language that they expose.

flabdablet

@Arantor said in In which I show how referential integrity is not a thing here:

there are times EAV isn't entirely wrong

Reaching for EAV before trying anything else is an antipattern all the same.

tufty

@dkf And yet PostgreSQL is faster at handling JSON than MongoDB.

flabdablet

@tufty Postgres is enterprise scale.

dkf

@tufty said in In which I show how referential integrity is not a thing here:

And yet PostgreSQL is faster at handling JSON than MongoDB.

Without dissing PostgreSQL, that's not the greatest achievement ever.

https://crazysmartcluelessdotcom.files.wordpress.com/2014/09/short-short-bus.jpg

Arantor

@flabdablet ok, here's my situation, tell me EAV isn't the most retarded solution here.

I have an RDBMS, it's MySQL. I have a users table. I want people to be able to add fields to the user record. More than that, I want them to be able to add user fields without me having to do it. So if they decide the user needs a new field attached to their records, called "User Preference for Flowers" with choices "Daisy", "Buttercup" and "Rose", selectable both at registration and the user settings page, they should be able to configure this without dragging me into it.

My proposal: a table of fields attached to users, and a table tying user ID, field ID and value. Tell me there is a better solution to this problem than EAV.

dkf

@Arantor said in In which I show how referential integrity is not a thing here:

I have an RDBMS, it's MySQL. I have a users table. I want people to be able to add fields to the user record. More than that, I want them to be able to add user fields without me having to do it. So if they decide the user needs a new field attached to their records, called "User Preference for Flowers" with choices "Daisy", "Buttercup" and "Rose", selectable both at registration and the user settings page, they should be able to configure this without dragging me into it.

Do you need to do anything other than store some JSON that you splurge back at the UI? Would you ever conceivably want any of those fields in a WHERE clause, assuming the system was magical enough to make that work?

blakeyrat

@Arantor The way we implement that in our application is with an XML column named "metadata". Which is basically SQL Server's equivalent to a JSON-valued column.

We'd never query based on that though. If we add a column we need to query on, then yes, it get some proper design work behind it.

Arantor

@dkf I might conceivably do lookups against some of those values later on, yes.

Arantor

@blakeyrat now factor in the design requirement that an administrator who is not a developer should be able to add fields basically on demand, without developer time.

And remember, this platform is designed to be used across a few dozen different customers and deployments meaning that consistent code is pretty important for maintenance reasons.

blakeyrat

@Arantor You can do that with the XML column solution.

What you can't do is search it (quickly) or do any JOINs based on the data in the XML column.

I'm not saying it's the perfect whiz-bang solution for every girl and boy, I'm just saying that's what our company does.

Arantor

@blakeyrat I thought about doing something similar but I'd have to manually serialise that crap in and out of the database in PHP no less. And lookups against that shit in MySQL... No thanks.

blakeyrat

Serializing XML is like a 2-liner in C#. And, sure, JSON would be a 1-liner if we installed Newtonsoft, but. Point is, it's not a big deal.

dkf

@Arantor Just fetching the values or using in a real WHERE (or ORDER BY or something like that)? If you can just keep it as an unparsed document slab, do that. It's easy to work with. But if you really need those values understandable by the DB, you are either going to be in JOIN city or you're going to be in a case where EAV might be reasonable.

If the values in your EAV could be other EAV collections, consider using a graph database and switching your query language to something like SPARQL.

Weng

@Arantor Users table has a 1:1 key to a UsersExtension table. Extension table schemas can be managed from the admin interface and have names, strong types, etc.

Internal app functionality never reads the extension table, but it's there for the user generated stuff.

Yes, this means the app gets to do DDL. Put the tables in a separate schema if that's scary.

Arantor

@Weng I don't need the DB to enforce strictness, I can quite happily do that in the application, and I won't be joining to this, more a case of "find me all users with this field having this value" stuff.

Not sure I could convince my lot that DDL in the application would be a good idea.

Let me explain, when this was first pitched, they were going to add thirty columns to the users table. Ten for possible checkboxes, ten for possible varchar and ten for possible drop down values. And no indexes. EAV was a thing that was "impossible" to do.

I still think this is not a terrible use case for EAV. I wouldn't generally advocate such a thing but the user-editable potentially on the fly deal makes it fairly hard to resist.

dkf

@Arantor It really depends on whether you can live with parsing the EAV bits purely on the client side or not. If you can keep them client side (without this meaning you have to ship massively more data to the client) then you don't need to parse and can use a very simple DB model. But if the DB needs to do anything even vaguely sophisticated, especially if it is something you'd build an INDEX for, you need to take care and pick the right tech.

I'm still trying to figure out why a forum needs MongoDB. ;)

asdf

@dkf said in In which I show how referential integrity is not a thing here:

I'm still trying to figure out why a forum needs MongoDB.

When it starts to make sense, go directly to your closest mental hospital. Do not pass "Go", do not collect £200.

martijntje

Ah, MongoDB. The way to store humongous datasets without ever breaking a sweat. Or so they claim.

We use it too at work but thankfully only for caching purposes. If the thing falls over we can just scrap everything and start fresh. Since this happens about once or twice a week we now have scripts that do this, they kill -9 every mongo-related process, clear out the directory where mongo stores its data and then start it again.

flabdablet

@Arantor said in In which I show how referential integrity is not a thing here:

Tell me there is a better solution to this problem than EAV.

I won't do that. If you're required to give your users the ability to do ad-hoc data modelling of their own, and you're justifiably unwilling to give them even filtered access to DDL in order to do that, then EAV is about the best you can do. I think it's better than JSON and XML and whatnot because if somewhere down the line it turns out that you do need to incorporate some of the user-added stuff into the formal schema, the database conversion script should end up rather more robust and rather less painful.

This is all still an antipattern with very strong design smell, but it's not your design smell - it's coming from further up the requirements chain than the point where you're working.

cheong

@Yamikuronue said in In which I show how referential integrity is not a thing here:

@tufty So then how did it end up different in multiple spots on a screen? If you're not doing multiple data dips, and it's not being stored in multiple places in the database, how did you end up with two answers to the same question?

Maybe muting one that can disagree should work.

If the script responsible for updating "notifications" see the URL is in "/unread" it skips updating the number in the Unread icon.
Now add the script for updating Unread icon based of gathered unread records on that script that responsible for updating that Unread page. And the number should be in sync.

tufty

@dkf said in In which I show how referential integrity is not a thing here:

they should be able to configure this without dragging me into it.

That's where it gets hard, regardless of the underlying data representation you use.