Anybody else been noticing the fairly frequent 504's over the past few days?

cartman82

Why can't they just turn off logging or something? Doesn't seem like such a huge problem.

Polygeekery

Or just set them to auto-truncate. If you need to access the logs to determine an error condition, chances are it happened pretty recently. No need to keep logs back to Genesis.

boomzilla

I imagine part of the problem is that the sort of admins who can do things about this aren't paying terribly close attention to it (and why should they? As I understand it, they're basically helping out their buddy, unless he's paying them, in which case let's get this shit in order).

Of course, the logs thing is just my WAG for what's going on right now.

Polygeekery

@boomzilla said:

Of course, the logs thing is just my WAG for what's going on right now.

That was my assumption also, but it still does not explain why they cannot put in a snippet of code to truncate the logs at some sort of sane level.

The bigger question is why is Discourse logging so much shit that it fills up so quickly? Is this what happens when software is designed by megalomaniac control freaks?

Must log all the stuffs.

chubertdev

Oh, so it wasn't just me.

KillaCoder

@Intercourse said:

The bigger question is why is Discourse logging so much shit that it fills up so quickly? Is this what happens when software is designed by megalomaniac control freaks?

> Must log all the stuffs.

That's a little bit harsh surely? I'd rather have too much logging info than too little!

mott555

I was on vacation once, went to Worlds of Fun in Kansas City. I rode the Mamba several times in a row, and when I stopped for a break I noticed I had like 10 missed calls from work. I normally don't answer my cell phone when I'm on a roller coaster.

I called in to see what was going on. Our main production server full of live web applications had gone down with a full hard drive due to something I'd implemented. I'd set up some very detailed logging to track down an intermittent problem, and implemented the log system as a SQL Server database for ease of querying and such. The log database had been going for maybe a month; honestly, I'd solved the issue and forgotten the logger was there.

On this day the actual database file had finally filled up the server's (IMO too-small) hard drive. Coworkers had tried to resolve the issue themselves by simply deleting the database, which of course caused all the logging code in the applications to blow up, and then they had to restore the database from backup/recycle bin or something.

Over the phone, I walked them through truncating the main log table, then went back to riding roller coasters. When I got back to work I ripped all the logging stuff out and we never had to deal with 150 GB log databases again.

accalia

i didn't notice it at all, and neither did my bots

accalia@SockBot:~$ SockStatus 
SockBotAccalia start/running, process 3360
SockBotSockPuppet start/running, process 3368
SockBot start/running, process 3376
SockBotSystern stop/waiting
SockBotTCotCDCK stop/waiting
SockBotZoidberg start/running, process 3378
accalia@SockBot:~$

I stopped @systern and @TCotCDCK manually a while back to make sure we needed at least one human like to get nice post in that thread.

EDIT: the reason i mention my bots didn't notice is they self terminate and don't respawn on 5xx errors from discourse.
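A rough sketch of that "stop on any 5xx" rule, in Python. This is not SockBot's actual source; `is_server_error` and `run_bot` are made-up names, and the list of status codes stands in for real responses from the forum.

```python
def is_server_error(status):
    """True for any HTTP 5xx status code."""
    return 500 <= status <= 599


def run_bot(statuses):
    """Handle responses in order and stop at the first 5xx.

    Returns how many responses were handled before stopping. A real bot
    would exit the process here and rely on nothing restarting it.
    """
    handled = 0
    for status in statuses:
        if is_server_error(status):
            break
        handled += 1
    return handled


print(run_bot([200, 200, 504, 200]))  # prints 2
```

A bot built this way only "notices" an outage if it happens to make a request while the server is returning 5xx, which fits with bots that are still running never having hit one.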

Polygeekery

With the exception of forgetting to throw out the massively detailed logging when you are done with it, this is what I consider to be SOP for us in regards to logging. Unless shit goes awry, you do not need to log that much. When shit does go awry, log all the stuffs. Then when you have resolved the issue, go back to a sane logging level.

You just have to remember to revert to the sane logging level. ;)
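One way to take the "remember" part out of it is to make the revert automatic. A minimal sketch using Python's standard `logging` module; `temporary_level` and the `"app"` logger name are hypothetical, picked only for illustration.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_level(logger, level):
    """Switch `logger` to `level` inside a with-block, then restore the old level."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Runs even if the block raises, so the verbose level is never left on.
        logger.setLevel(previous)


app_log = logging.getLogger("app")  # hypothetical logger name
app_log.setLevel(logging.WARNING)   # the sane everyday level

with temporary_level(app_log, logging.DEBUG):
    app_log.debug("log all the stuffs while chasing the bug")
# Back at WARNING here, with nothing to remember.
```

This only covers a level raised in code for a bounded piece of work. A level raised in a config file on a live server still needs someone to go back and lower it.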

HardwareGeek

@accalia said:

I stopped @systern and @TCotCDCK manually a while back to make sure we needed at least one human like to get nice post in that thread.

Awww...

accalia

if you want them to run grab @sockbot's source code and have at it. their passwords are posted in some master list thread thingie around here somewhere

sockbot

Yes mistress Accalia Fairyfox, I shall appear as summoned.

HardwareGeek

Naw, I was just expressing (mock) disappointment at no longer getting Nice Post notifications within seconds of posting. :)

accalia

well there are 8 true bots and 1 known cyborg liking posts now. it's only going to take one more bot to push us over the limit again.

chubertdev

@KillaCoder said:

That's a little bit harsh surely? I'd rather have too much logging info than too little!

From someone else's application? I don't want their junk clogging up my server. That's absolutely unacceptable.

KillaCoder

@chubertdev said:

From someone else's application? I don't want their junk clogging up my server. That's absolutely unacceptable.

Well (as we all know!) Discourse isn't some finished product, it's a work in progress that the Daily WTF is helping test and track down problems. Considering that, I think the logging level is understandable. The devs KNOW they will need to solve problems that folks here find so why not have detailed logs available?

chubertdev

Yeah, I'm just saying, in general, this isn't acceptable. "Not acceptable" seems to be in the specs for Discourse.

KillaCoder

I'm sure turning off the application-breaking logging level is on the to-do list.

sam

Holding NGINX logs for a few days is super common. In Microsoft speak, this would be like expecting us to ship with all IIS logging off, cause "who cares".

Ubuntu ships with 1 year log retention for NGINX; same goes for many other distros. We adapted the rules for our docker container to be far less ambitious, but keeping NGINX logs for 1 week is not in the realm of some insane softwarenaut.

Just sayin.

chubertdev

Ok, now I'm confused. Are the logging levels of the server software set by the Discodevs? I was under the impression that it would be set by someone like Alex.

sam

All our installs are based off our official Docker container, which tries to set sane defaults for containers in the wild.

This particular default is here:

discourse_docker/templates/web.template.yml at master · discourse/discourse_docker

accalia

hmm... not too bad, but wouldn't it be saner to rotate logs based on file size? cutting off at ~100MB would give plenty of logs to find what went on immediately after an outage and would place an upper limit on how much space the logs can take up.

this is what we have set up in our apache instance at work, where we're serving almost 50 sites (most of them have a weird pattern where they are dormant for most of the year and then have an active month). this means that we can budget our disk space more reliably and cannot be knocked offline because our logs partition got full. in practice, our biggest and most active site rotates logs about once every 18-24 hours with the 100MB limit we have set. the less active sites have on occasion gone almost 30 days before rotating.
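To show why a size cap makes disk usage predictable, here is a minimal sketch in Python rather than Apache config. It uses the standard library's `RotatingFileHandler`. The helper names, the `access.log` filename and the tiny demo sizes are all made up for illustration.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_size_capped_logger(path, max_bytes=100 * 1024 * 1024, backups=3):
    """Logger whose files stay under roughly max_bytes * (backups + 1) on disk."""
    logger = logging.getLogger("size_capped:" + path)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo from echoing to the root logger
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def demo(max_bytes=1000, backups=2, lines=500):
    """Write far more than the cap into a temp dir; return (file count, bytes on disk)."""
    directory = tempfile.mkdtemp()
    log = make_size_capped_logger(os.path.join(directory, "access.log"), max_bytes, backups)
    for i in range(lines):
        log.info("request %04d %s", i, "x" * 50)
    names = os.listdir(directory)
    total = sum(os.path.getsize(os.path.join(directory, n)) for n in names)
    return len(names), total
```

However much traffic arrives, the worst case is the live file plus `backups` rotated files, each under `max_bytes`. A busy site just rotates more often and a quiet one less often, which is the 18-24 hours versus 30 days pattern described above.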

we also set up the logs to truncate useragent at 50 characters when logging. for any purposes us devs care about, the interesting stuff is always in the first 50 characters or the source IP. the sales people use GA, and that can handle user agent however the heck it wants.
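The truncation itself is a one-liner. A sketch in Python, since the post does not say how it is configured in Apache; `truncate_user_agent`, `access_line` and the log line layout are hypothetical.

```python
def truncate_user_agent(user_agent, limit=50):
    """Keep only the first `limit` characters of a User-Agent string."""
    return user_agent[:limit]


def access_line(ip, request, status, user_agent):
    """Build one log line in a made-up format, with the User-Agent already truncated."""
    return '%s "%s" %d "%s"' % (ip, request, status, truncate_user_agent(user_agent))
```

Capping the one field that clients fully control also puts a rough ceiling on the length of each log line, which helps the size budget in the same way the rotation limit does.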

Matches

I generally do both, two weeks and 3 logs of max 100mb (subject to hdd size of course)

nightware

@sam said:

sv 1 unicorn

Discourse has unicorns!

sam

Setting max sizes is a good suggestion, just need to figure out how to get logrotate to submit to this order.

accalia

i can tell you how to do it under apache, not sure how to do it under NGINX

the wiki on nginx is .... less than helpful...

Log Rotation | NGINX

sam

Looks like if I get us on logrotate 3.8.1 I may have a workable solution.

logrotate daily and size?

If a logrotate config is specified with "size" and "daily" parameters, which one takes precedence? Where is this documented? I would like these rotations to occur as a boolean OR operation, ie, if ...
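The boolean OR that the linked question asks about can be written down as a tiny rule. This is a plain Python illustration of the decision, not logrotate's implementation; `should_rotate` and its default limits are made up.

```python
from datetime import timedelta


def should_rotate(size_bytes, age, max_bytes=100 * 1024 * 1024, max_age=timedelta(days=1)):
    """Rotate when the log is too big OR too old, whichever happens first."""
    return size_bytes >= max_bytes or age >= max_age
```

With an OR rule, the age limit keeps retention bounded on quiet days and the size limit keeps disk usage bounded on busy ones.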

accalia

I think it's a move that would be wise. not sure of the repercussions of such an upgrade for a project like discourse of course...

PJH

@sam said:

Setting max sizes is a good suggestion, just need to figure out how to get logrotate to submit to this order.

We use this, with logrotate (3.8.1) on an hourly crontab (the desired effect is at least 24 hours of debug logs, but since the hardware is space-restricted we can't wait a whole day to rotate debug.log)

[7.4.3:root@centos HARDWARE4_V7.4.3.x]# cat os/logrotate/logrotate.d/debug 
/var/log/debug.log {
        rotate 24
        missingok
        compress
        size 104857600
        postrotate
                /bin/killall -HUP syslogd
        endscript
}

M_Adams

@sam said:

sv 1 unicorn

There's the problem! Since, ya know, unicorns and rainbows…

We've only been allocated room for 1 unicorn and no rainbows. Of course the logs are falling over!
:) :)