Architecture talk! Fast, safe, write-optimized db



  • Let's say I have a flood of requests coming into the system (presume >= 100,000 requests per second). I need to update some state based on those. It's OK to reject requests in case of problems or pressure, but once I say a request is accepted, I must make sure it is actually stored.

    I am currently prototyping a setup like this: requests come in, they update in-memory state and get written into a "request log". If a write fails, I have a way to reverse the changes to the in-memory state. Every minute or so, the in-memory state gets dumped to persistent storage. During system startup, I load the last snapshot, then replay all the requests from the log based on the state sequence id.

    Currently, my "request log" is literally a log file - basically a plain old txt file where I'm appending requests line by line. This was the quickest to prototype, but I'm not sure how safe it is (I presume it is at least as performant as any DB).
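
    For concreteness, here's roughly what my append/replay path looks like (a Go-flavored sketch; the types, file names, and JSON encoding are simplified stand-ins, not the real code):

    // Sketch of the log-then-snapshot flow described above: append accepted
    // requests to a line-oriented log; on startup, load the last snapshot and
    // replay newer log entries by sequence id. All names are hypothetical.
    package main

    import (
        "bufio"
        "encoding/json"
        "fmt"
        "os"
    )

    // Request is one accepted update; Seq ties it to the state sequence id.
    type Request struct {
        Seq   uint64 `json:"seq"`
        Value string `json:"value"`
    }

    // State is the in-memory state plus the last applied sequence id.
    type State struct {
        Seq    uint64   `json:"seq"`
        Values []string `json:"values"`
    }

    func (s *State) apply(r Request) {
        s.Values = append(s.Values, r.Value)
        s.Seq = r.Seq
    }

    // appendToLog writes one request as a JSON line; the caller decides when
    // to flush/fsync.
    func appendToLog(w *bufio.Writer, r Request) error {
        b, err := json.Marshal(r)
        if err != nil {
            return err
        }
        _, err = w.Write(append(b, '\n'))
        return err
    }

    // recoverState loads the last snapshot, then replays any log entries
    // newer than it.
    func recoverState(snapshotPath, logPath string) (*State, error) {
        s := &State{}
        if b, err := os.ReadFile(snapshotPath); err == nil {
            if err := json.Unmarshal(b, s); err != nil {
                return nil, err
            }
        }
        f, err := os.Open(logPath)
        if err != nil {
            return s, nil // no log yet; the snapshot alone is the state
        }
        defer f.Close()
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            var r Request
            if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
                break // torn write at the tail: stop replaying here
            }
            if r.Seq > s.Seq { // skip entries already covered by the snapshot
                s.apply(r)
            }
        }
        return s, sc.Err()
    }

    func main() {
        s, err := recoverState("state.snapshot", "requests.log")
        if err != nil {
            panic(err)
        }
        fmt.Printf("recovered at seq %d with %d values\n", s.Seq, len(s.Values))
    }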

    Is this sane? Should I use some DB instead? Which one?

    If there are any other thoughts about a system like this, please share.



  • @cartman82 I think 100k/s is Cassandra cluster territory. But I wouldn't call any NoSQL database safe after reading Aphyr: https://aphyr.com/posts/294-jepsen-cassandra



  • @sockpuppet7

    I don't really need the request log to be distributed. I mean, the data in it only needs to live for like a minute or two. I don't get anything by adding a network layer, I don't think.



  • @cartman82 I thought about distributing only because of all these writes. If you fsync each of them individually before calling it accepted, I'm not sure how many requests per second a single disk can take.

    But you could hold a batch of them, write them, fsync the file, and only then answer with an "accepted" for all items in the batch.
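
    Something like this, in sketch form (Go; the entry type and channel wiring are made up, not any particular library):

    // Batch-then-fsync ("group commit"): nothing is acknowledged until the
    // whole batch it belongs to is durable on disk.
    package main

    import (
        "fmt"
        "os"
        "time"
    )

    type entry struct {
        line []byte
        ack  chan error // receives once the entry is durable (or failed)
    }

    // groupCommit collects up to maxBatch entries (waiting at most maxDelay
    // for stragglers), writes them, fsyncs once, then acks every entry.
    func groupCommit(f *os.File, in <-chan entry, maxBatch int, maxDelay time.Duration) {
        for first := range in {
            batch := []entry{first}
            timer := time.NewTimer(maxDelay)
        fill:
            for len(batch) < maxBatch {
                select {
                case e, ok := <-in:
                    if !ok {
                        break fill
                    }
                    batch = append(batch, e)
                case <-timer.C:
                    break fill
                }
            }
            timer.Stop()

            var err error
            for _, e := range batch {
                if _, werr := f.Write(e.line); werr != nil {
                    err = werr
                    break
                }
            }
            if err == nil {
                err = f.Sync() // one fsync amortized across the whole batch
            }
            for _, e := range batch {
                e.ack <- err // answer "accepted" only if err == nil
            }
        }
    }

    func main() {
        f, err := os.CreateTemp("", "reqlog")
        if err != nil {
            panic(err)
        }
        defer os.Remove(f.Name())

        in := make(chan entry, 1024)
        go groupCommit(f, in, 100, 2*time.Millisecond)

        e := entry{line: []byte("request 1\n"), ack: make(chan error, 1)}
        in <- e
        fmt.Println("durable, ack error:", <-e.ack)
    }

    The caller only replies "accepted" after reading a nil from its ack channel, so a batch of 100 costs one fsync instead of 100.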



  • @sockpuppet7 said in Architecture talk! Fast, safe, write-optimized db:

    @cartman82 I thought about distributing only because of all these writes. If you fsync each of them individually before calling it accepted, I'm not sure how many requests per second a single disk can take.
    But you could hold a batch of them, write them, fsync the file, and only then answer with an "accepted" for all items in the batch.

    That's how I made it.

    Requests go into a queue. In a loop, I grab a chunk of requests out of the queue (10-100), write them all at once, and once they're flushed (no fsync for now), go back to process the next chunk.

    If the queue overflows, I start throwing away requests. So that gives me flood control.

    I never considered that the disk might not be able to take the requests fast enough. I guess I'll know once I do some load testing...
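
    In sketch form, the queue + flood-control part is just a bounded buffer with a non-blocking insert (Go; the capacities and names are made up):

    // Bounded queue: a full buffer means immediate rejection instead of
    // blocking, which is exactly the "pressure valve" behavior.
    package main

    import "fmt"

    type request struct{ payload string }

    type boundedQueue struct {
        ch chan request
    }

    func newBoundedQueue(capacity int) *boundedQueue {
        return &boundedQueue{ch: make(chan request, capacity)}
    }

    // offer returns false (reject the request) when the queue is full.
    func (q *boundedQueue) offer(r request) bool {
        select {
        case q.ch <- r:
            return true
        default:
            return false // overflow: throw the request away
        }
    }

    // drain grabs up to max queued requests without blocking - one write chunk.
    func (q *boundedQueue) drain(max int) []request {
        out := make([]request, 0, max)
        for len(out) < max {
            select {
            case r := <-q.ch:
                out = append(out, r)
            default:
                return out
            }
        }
        return out
    }

    func main() {
        q := newBoundedQueue(4)
        for i := 0; i < 6; i++ {
            fmt.Printf("req-%d accepted=%v\n", i, q.offer(request{payload: fmt.Sprintf("req-%d", i)}))
        }
        fmt.Printf("drained a chunk of %d for writing\n", len(q.drain(100)))
    }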



  • @cartman82 100k/second is a lot of writes. Most databases can handle this, but they'll probably need to be running on pretty beefy hardware. NoSQL databases might require a small cluster.

    You said it's OK to reject requests. What percentage of requests are rejected? Do you need to log rejects to the database? Does that cut down the amount of traffic hitting the DB?

    If you do bulk-inserts from the log, that'd increase your performance a lot for DBs that support them. Try rolling over the log file every 30s, then bulk-inserting the previous log (rough sketch at the end of this post).

    If I were you, I'd code your solution to be as database-agnostic as possible, then evaluate common databases to see which handles the load best. But frankly this kind of problem is a bit over my head, I've never dealt with a system with that amount of traffic.
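
    The rollover idea, roughly (Go; bulkLoad is a stand-in for whatever bulk insert the target DB supports, e.g. Postgres COPY, and the file naming and intervals are my assumptions):

    // Roll the log to a fresh file every interval; the sealed previous
    // segment is then safe to bulk-insert as one unit.
    package main

    import (
        "fmt"
        "os"
        "time"
    )

    // bulkLoad is a placeholder for the real bulk insert.
    func bulkLoad(path string) error {
        fmt.Println("bulk-inserting", path)
        return os.Remove(path) // demo cleanup; a real loader would load the rows first
    }

    func rotate(prefix string, interval time.Duration, stop <-chan struct{}) error {
        open := func() (*os.File, string, error) {
            name := fmt.Sprintf("%s-%d.log", prefix, time.Now().UnixNano())
            f, err := os.Create(name)
            return f, name, err
        }
        f, name, err := open()
        if err != nil {
            return err
        }
        tick := time.NewTicker(interval)
        defer tick.Stop()
        for {
            select {
            case <-tick.C:
                f.Close() // seal the current segment...
                if err := bulkLoad(name); err != nil { // ...and hand it off (to a worker, in real life)
                    return err
                }
                if f, name, err = open(); err != nil {
                    return err
                }
            case <-stop:
                f.Close()
                return bulkLoad(name)
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go func() { time.Sleep(70 * time.Millisecond); close(stop) }()
        // interval shortened from 30s to 30ms just so the demo finishes quickly
        if err := rotate("requests", 30*time.Millisecond, stop); err != nil {
            panic(err)
        }
    }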



  • Not sure if it would meet all of your needs, but you could try testing a time-series database like https://www.timescale.com (an extension for Postgres).



  • Or a locally deployed Kafka queue or RavenDB 4 instance (we use the former in one of our systems, and I've read good reviews of the latter).
    If your persistence service is on localhost, then network overhead isn't a big factor (but serialization overhead might be).



  • @dragnslcr I thought the point of a time-series database is that it adds enough metadata that you can see a "snapshot" of your data at any particular point in time. Am I wrong?

    Because if so, I'd think the metadata would really slow down insert performance.



  • @blakeyrat said in Architecture talk! Fast, safe, write-optimized db:

    @dragnslcr I thought the point of a time-series database is that it adds enough metadata that you can see a "snapshot" of your data at any particular point in time. Am I wrong?

    Because if so, I'd think the metadata would really slow down insert performance.

    I think you are, yeah. A time-series database is optimized for constant writes, primarily by physically storing the data ordered by time. It's good for things like monitoring and logging, where you're mostly just appending data to the end.



  • @blakeyrat said in Architecture talk! Fast, safe, write-optimized db:

    You said it's OK to reject requests. What percentage of requests are rejected? Do you need to log rejects to the database? Does that cut down the amount of traffic hitting the DB?

    I can just bounce them off at the network level, without storing them.



  • @cartman82 Well I get that, but are you rejecting 99% or are you rejecting 0.009%? That's really what I want to know: how the rejections affect the amount of data you need to store.



  • @blakeyrat said in Architecture talk! Fast, safe, write-optimized db:

    But frankly this kind of problem is a bit over my head, I've never dealt with a system with that amount of traffic

    Same here. I don't really know what to expect once I turn on the hose.



  • @blakeyrat said in Architecture talk! Fast, safe, write-optimized db:

    @cartman82 Well I get that, but are you rejecting 99% or are you rejecting 0.009%? That's really what I want to know: how the rejections affect the amount of data you need to store.

    With the current setup, I'll be rejecting everything that overflows the input queue. So the percentage depends on how fast I can make the storage and how many requests actually come (I set 100k as the upper limit).

    Ideally, I'll reject nothing. But the mechanism is there as a pressure valve, so the system doesn't keel over in case of a traffic jam.



  • @cartman82 did you consider using a proper queue? This sounds like something 0MQ / RabbitMQ / Kafka were built for.


  • @cartman82 but how many requests do you expect to have, actually? What is the acceptable LOWER limit of handled requests per second?



  • @homobalkanus said in Architecture talk! Fast, safe, write-optimized db:

    @cartman82 did you consider using a proper queue? This sounds like something 0MQ / RabbitMQ / Kafka were built for.

    I did.

    The upsides:

    • Well tested, well documented, none of the gotchas of rolling out and maintaining my own code

    The downsides:

    • Use case. All these queues are primarily optimized for distributing communication between multiple nodes (at the cost of speed, usually). I, on the other hand, need high speed (so I can take more requests), but don't need communication between different nodes. So it's a slightly different feature set, it seems to me.

    • Operational complexity. I'll probably have to custom deploy all this on a dedicated server. So the fewer moving pieces there are, the better.

    I might end up adding one of those, but for now, I want to give it a try with something simpler and see how it fares.



  • @gąska said in Architecture talk! Fast, safe, write-optimized db:

    @cartman82 but how many requests do you expect to have, actually? What is the acceptable LOWER limit of handled requests per second?

    Good question. I've heard a lot of these startup dreams of big success, but I've yet to see them come true.

    I am being guided by estimates from company management and what we know about the throughput of competing products. I guess 10k/sec would be OK too, but really, who knows how much traffic it'll actually get.


  • While I'm aware that architectural styles like CQRS and event sourcing exist, I've never really checked what database or middleware is the recommended way to go about it.

    I agree, though, that queue systems tend to be built to drop nothing and distribute every incoming message rather than "spill over", and hence they might tie up all the processing resources until the incoming requests die down a bit.


  • @cartman82 it really sounds like nobody at your company has any idea what you actually want. If I were you, I'd code something up without looking at performance metrics at all - which should give about 100 requests per second, absolutely okay performance for a vast majority of applications - and on the side, experiment with various specialized solutions dedicated to ultra-high throughput. And try to convince your higher-ups that having more than one server is definitely worth it.



  • @gąska said in Architecture talk! Fast, safe, write-optimized db:

    requests per second

    I would call it transactions per second, or TPS for short, so I could have something called TPS reports on the system.

    https://youtu.be/Fy3rjQGc6lA

