You merely adopted the 500. I was born in it, molded by it. I didn't see a fully rendered Discourse page until I was already a man
-
I think it is using the index first, based on the number of posts being passed up the plan and the numerical ordering of "cost". But it still looks really, really expensive.
-
Did I really read the planner result backwards? Time for more coffee.
Edit: In my defense I've been screwing with a hilariously bad MySQL query the past few days, and MySQL displays the query plan top down, which is basically the inverse of Postgres.
Definitely need an index on that deleted_at column, though.
-
It seems like that would be possible, and I can add an index in migrations, but I can't understand the underlying Ruby/ActiveRecord soup. I think the right place is in the Trashable concern, maybe?
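For what it's worth, the raw SQL equivalent of such a migration might look like this; a sketch only, with a partial index as my guess at a good fit (the actual Discourse migration may differ):

```sql
-- Partial index covering only live posts, so the common
-- "topic_id = ? AND deleted_at IS NULL" lookup stays small.
CREATE INDEX index_posts_on_topic_id_where_not_deleted
    ON posts (topic_id, sort_order)
    WHERE deleted_at IS NULL;
```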
-
I tried, and I can't yet get the query plan better than this: (only 949 posts in this topic)
```
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2994.57..2996.94 rows=949 width=8)
   Sort Key: sort_order
   ->  Bitmap Heap Scan on posts  (cost=26.26..2947.64 rows=949 width=8)
         Recheck Cond: ((topic_id = 237) AND (deleted_at IS NULL))
         ->  Bitmap Index Scan on index_posts_on_topic_id_and_sort_order_and_deleted_at  (cost=0.00..26.03 rows=949 width=0)
               Index Cond: ((topic_id = 237) AND (deleted_at IS NULL))
(6 rows)
```
Did you notice that the Bitmap Heap Scan increased the number of rows in @ben's EXPLAIN? :( http://www.postgresql.org/message-id/464F3C5D.2000700@enterprisedb.com
Try increasing work_mem, we've got 2GB in disk buffers
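For the record, that's a one-liner to test; a sketch, assuming the role name and with the value pulled out of thin air:

```sql
-- Per-session override, handy for testing a single EXPLAIN:
SET work_mem = '64MB';

-- Or persist it for the role the app connects as (role name assumed):
ALTER ROLE discourse SET work_mem = '64MB';
```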
It looks like the best solution is to make this a separate HTTP request.
-
Since we're only grabbing the post ID anyway, maybe throw that into the index too? Save you from having to table scan for it and load whatever other goo is floating around the post table.
@riking said:
Try increasing work_mem, we've got 2GB in disk buffers
I think that's what Sam did a while back, but undid just before going on holiday. Alex and Ben are the only folks with shell besides DiscoDevs; maybe you can help them with that?
-
The bitmap heap scan didn't increase the number of rows in ben's explain. Postgres query plan output is a bit counterintuitive: data flows bottom up, not top down, in that explain. Top down is the request flow, from the outside in. It went from 40617 rows on the bitmap index scan to 40486 on the bitmap heap scan when rechecked.
-
Yes, that's an increase. It means that "oh, multiple matching rows mapped to the same bit in the bitmap".
-
Since we're only grabbing the post ID anyway, maybe throw that into the index too? Save you from having to table scan for it and load whatever other goo is floating around the post table.
Tried that, just increased the cost of the bitmap index scan.
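(For anyone following along, I'm guessing the attempt looked roughly like this; my reconstruction, not the actual index:)

```sql
-- Covering variant: tack id onto the existing columns so the
-- planner could, in principle, skip the heap entirely.
CREATE INDEX index_posts_covering
    ON posts (topic_id, sort_order, deleted_at, id);
```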
-
Or it's a decrease that means "oh, we found a bunch of rows that had the correct topic_id in the index but now we need to exclude them since we see they have a deleted_at".
Tried that, just increased the cost of the bitmap index scan.
Damn. Thanks anyway.
-
Since we're only grabbing the post ID anyway, maybe throw that into the index too? Save you from having to table scan for it and load whatever other goo is floating around the post table.
That applies to btree indexes. These are bitmap indexes, and as mentioned before, multi-column indexes don't really make sense with bitmaps.
-
http://forums.xkcd.com/viewtopic.php?f=7&t=101043
91k posts and going strong, phpBB. The same forum also has several more threads with >50k posts.
-
Was thinking of that one too. No clue what hardware they're running on though.
-
Has it always done this?
Back in the day it did not, but then we had a ton of edge cases when large numbers of posts in the middle of a topic got deleted or moved. Also, the new technique is far cheaper than the old one for non-giant topics.
It's a tricky problem to fix.
Another issue I have seen is the user directory. It's issuing a refresh once an hour that truncates a giant table and rebuilds it, and each query is taking 2 minutes. So that is like 10 minutes of super expensive work every hour.
I went ahead and disabled the user directory for now, until we fix the queries to do delta updates and are smarter about load (e.g. no need to run the yearly period every hour).
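A delta-style refresh could look roughly like this; purely a sketch, with the period_type value and column names lifted from the directory_items query that shows up later in this topic:

```sql
-- Hypothetical delta refresh: fold in only the last hour's posts
-- instead of truncating and rebuilding directory_items wholesale.
-- Assumes a row per (period_type, user_id) already exists.
UPDATE directory_items AS di
SET post_count = di.post_count + delta.new_posts
FROM (
    SELECT p.user_id, COUNT(*) AS new_posts
    FROM posts AS p
    WHERE p.created_at >= now() - interval '1 hour'
      AND p.deleted_at IS NULL
    GROUP BY p.user_id
) AS delta
WHERE di.user_id = delta.user_id
  AND di.period_type = 5;
```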
-
I went ahead and disabled the user directory for now, until we fix the queries to do delta updates and are smarter about load (e.g. no need to run the yearly period every hour).
That /users list thing, right?
-
So that is like 10 minutes of super expensive work every hour.
Ah! I'd been wondering what had been causing that pattern of load. It didn't look bot-, user- or monitor-generated; those all have shorter natural cycles.
-
That /users list thing, right?
Yeah, that page ... it's scary to see this:
```
2015-04-19 21:02:09 UTC [32127-7] discourse@discourse LOG: duration: 143882.129 ms statement:
INSERT INTO directory_items (period_type, user_id, likes_received, likes_given,
                             topics_entered, days_visited, posts_read, topic_count, post_count)
SELECT 5, u.id,
       SUM(CASE WHEN ua.action_type = 2 THEN 1 ELSE 0 END),
       SUM(CASE WHEN ua.action_type = 1 THEN 1 ELSE 0 END),
       COALESCE((SELECT COUNT(topic_id) FROM topic_views AS v
                 WHERE v.user_id = u.id AND v.viewed_at >= '2015-04-18 20:59:45.684081'), 0),
       COALESCE((SELECT COUNT(id) FROM user_visits AS uv
                 WHERE uv.user_id = u.id AND uv.visited_at >= '2015-04-18 20:59:45.684081'), 0),
       COALESCE((SELECT SUM(posts_read) FROM user_visits AS uv2
                 WHERE uv2.user_id = u.id AND uv2.visited_at >= '2015-04-18 20:59:45.684081'), 0),
       SUM(CASE WHEN ua.action_type = 4 THEN 1 ELSE 0 END),
       SUM(CASE WHEN ua.action_type = 5 THEN 1 ELSE 0 END)
FROM users AS u
LEFT OUTER JOIN user_actions AS ua ON ua.user_id = u.id
LEFT OUTER JOIN topics AS t ON ua.target_topic_id = t.id
LEFT OUTER JOIN posts AS p ON ua.target_post_id = p.id
LEFT OUTER JOIN categories AS c ON t.category_id = c.id
WHERE u.active
  AND NOT u.blocked
  AND COALESCE(ua.created_at, '2015-04-18 20:59:45.684081') >= '2015-04-18 20:59:45.684081'
  AND t.deleted_at IS NULL
  AND COALESCE(t.visible, true)
  AND COALESCE(t.archetype, 'regular') = 'regular'
  AND p.deleted_at IS NULL
  AND (NOT (COALESCE(p.hidden, false)))
  AND COALESCE(p.post_type, 1) != 2
  AND u.id > 0
GROUP BY u.id
```
Repeated 5 times, more or less (with different dates).
-
In other news, every new post to t/1000 incurs a 2 second query to figure out the "top posters" in the topic.
-
Maybe those queries should only run daily? Can't think of a reason to do it more often than that.
-
In other news, every new post to t/1000 incurs a 2 second query to figure out the "top posters" in the topic.
Hoe Lee; no wonder the race to 40k killed the site!
-
Daily seems slow, especially in the 'many small topics' usecase.
-
A hedgehog can only do so much! I'm doing my best!
:P
-
Thinking about it, I think you really want it instant (or within 5 minutes) up to a couple dozen posts or so. And it probably doesn't become a problem anywhere below 1k posts.
I'd suggest moving it to a batch job above a key threshold so it doesn't run as often, but if those topics aren't supposed to exist it's probably hard to defend the extra code complexity. How about killing the overview entirely on megatopics?
-
Why is that data not cached? And if it is, why is it totally rebuilt when a post is made? Surely it's sufficient to update a single count 99% of the time?
-
Any post can tip the scales on "who are the most popular posters in the topic".
But at some point it just does not matter and we can skip updating it and move to a daily / hourly cycle.
-
How would that affect the filtering-by-user feature?
-
Is that feature even useful to anyone?
-
Any post can tip the scales on "who are the most popular posters in the topic".
Well, yes
But the longer a topic continues, the more stable the top posters would be... so you can update them less frequently... like you suggested... yeah, that's a good way to do it
-
But the longer a topic continues, the more stable the top posters would be... so you can update them less frequently...
You could also update it more cleverly, then it wouldn't need to only be refreshed once a day (or whatever) but would also be right.
-
Although, I guess that feature could be redundant once the search page exists...?
-
Back in the day it did not, but then we had a ton of edge cases when large numbers of posts in the middle of a topic got deleted or moved. Also, the new technique is far cheaper than the old one for non-giant topics.
But clearly more expensive for big topics. And whether His Royal Jeffness agrees or not, forums have big topics, either artificially (like T-1000) or naturally (like, well, most forums with a decent level of activity). Returning all of the postIds is silly.
The client should care about the batch it's looking at, and the batch it's asking for next (be it forwards or backwards) and not about the thousands of other batches out there.
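Something keyset-shaped, say; a sketch using the column names from the plan upthread, with the numbers invented:

```sql
-- Hypothetical batch fetch: return only the next 100 post ids after
-- the client's current position, not every id in the topic.
SELECT id
FROM posts
WHERE topic_id = 1000
  AND deleted_at IS NULL
  AND sort_order > 2500   -- highest sort_order the client already has
ORDER BY sort_order
LIMIT 100;
```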
-
Any post can tip the scales on "who are the most popular posters in the topic".
Create another table with user ID, thread ID, and # of posts in the thread by that user. Index on thread ID ASC, # posts DESC. As posts are added/deleted, update the counts, then query the table for the top posters. Done and done. Of course, another index is needed on user ID and thread ID.
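A minimal sketch of that, with made-up names (this is not Discourse's actual schema):

```sql
-- Hypothetical per-topic poster counts; names are illustrative.
CREATE TABLE topic_poster_counts (
    topic_id   integer NOT NULL,
    user_id    integer NOT NULL,
    post_count integer NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, topic_id)   -- the "user ID and thread ID" index
);

-- Serves "top posters in this topic" straight off an index.
CREATE INDEX index_topic_poster_counts_top
    ON topic_poster_counts (topic_id, post_count DESC);

-- On each new post, bump the count...
UPDATE topic_poster_counts
SET post_count = post_count + 1
WHERE topic_id = 1000 AND user_id = 42;

-- ...and the top-posters read becomes a cheap index scan.
SELECT user_id, post_count
FROM topic_poster_counts
WHERE topic_id = 1000
ORDER BY post_count DESC
LIMIT 5;
```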
-
Regardless of what the issue is this time, I think we're at the point where we as a community should have dedicated, experienced support rather than relying on the good graces of Jeff "anything I do for them is a courtesy" Atwood and his team. Sam is a great guy, and he knows Discourse very well, but we shouldn't be handing him "the site is down; please fix it!" issues right as he's going on vacation, issues his boss doesn't even consider real. Riking's also a great dude, but this ain't his day job; he's got college to do, and though he obviously knows Ruby well, he doesn't have professional DBA experience. We should be handing problems like these to an actual operations team, whose full-time job is supporting things like this.
I'm willing to pay for that support.
-
I'm willing to pay for that support.
And I'd be willing to work on it on a part-time, hourly-billed basis.
-
The topic where we try to take each other's money by means of (false) advertising is here:
http://what.thedailywtf.com/t/ill-buy-you-a-few-sandwiches-thats-more-than-enough/47991/
-
It's your right to disbelieve me, I suppose, although your comment has a rather libelous touch.
-
And I'd be willing to work on it on a part-time, hourly-billed basis.
I've been Thalagyrt's customer for several years and I have a fistful of testimonials from his other customers. I've basically never seen you before last month and never professionally, otherwise I'd totally be up for that.
-
I've basically never seen you before last month and never professionally, otherwise I'd totally be up for that.
I completely understand. That's why I was approaching it from the "start small by doing it part-time hourly" direction. However, you obviously have a very good reason to work with Thalagyrt.
-
It's still possible to read all the posts that are made every day
Maybe, just barely. It's getting to the point where it's really hard to keep up while also spending appropriate amounts of time on work and the other things one needs to do.
-
Any post can tip the scales on "who are the most popular posters in the topic".
But at some point it just does not matter and we can skip updating it and move to a daily / hourly cycle.
Strictly, if the current poster is already in the top 5 (or however many top posters you keep track of), you don't need to rerun the entire determine-top-posters query.
And if you keep per-poster counts for everyone, you never need to rerun the whole query at all.
-
just barely
But within the realms of possibility. I'd consider a forum high-volume when there's absolutely no way you could even consider reading every post.
-
Maybe, just barely. It's getting to the point where it's really hard to keep up while also spending appropriate amounts of time on work and the other things one needs to do.
I'm finding this now that it's busy at work. I've got some topics where I'm like 3 or 4 days behind, some where I'm only a post or 2 behind.
-
I know we have a fair amount of content, but 60GB?
it has to hold all of jeff's memes
-
I can't help it anymore. Every time I see this topic I think "Was it ever funny?"
-
I'm sure the logs and backups munch a fair chunk of that, from previous posts around here about it.
-
I can't help it anymore. Every time I see this topic I think "Was it ever funny?"
With one exception, we are an amazingly tolerant group
@loopback0 said:
I'm sure the logs and backups munch a fair chunk of that, from previous posts around here about it.
Almost as if there's no backup/log archival strategy
-
Almost as if there's no ~~backup/log archival~~ strategy
FTFY.
IIRC the backups might be on a 3-day rotation. @PJH @boomzilla?
-
IIRC the backups might be on a 3-day rotation. @PJH @boomzilla?
The backups are on hiatus at the moment. Even @shadowmod can't get the smaller one.
-
The backups [are on hiatus][1] at the moment.
[1]: http://what.thedailywtf.com/t/shadowmod-queries/6921/3781?u=pjh
Natch. Who needs backups anyway?!
Making sure enough disk space gets provisioned is waaaaaay tricky.