Cooties



  • If I had a lot of free time, I'd start a blog and publish a post a day about something Discourse's developers didn't understand and therefore ruined.

    The database is like fifteen gigabytes uncompressed. The uploads folder is 8.6GB.

    In fact, let's look at the uploads directory.

    5.8M    /var/discourse/shared/standalone/uploads/default/avatars
    2.6M    /var/discourse/shared/standalone/uploads/default/_emoji
    739M    /var/discourse/shared/standalone/uploads/default/optimized
    2.2G    /var/discourse/shared/standalone/uploads/default/_optimized
    1.9G    /var/discourse/shared/standalone/uploads/default/original
    du: cannot access ‘/var/discourse/shared/standalone/uploads/default/_original’: No such file or directory
    

    And then I was all

    root@what:~# du -sh /var/discourse/shared/standalone/uploads/default/<tab>
    

    ... wait, why isn't tab completion working?

    [fifteen seconds later]

    Display all 21492 possibilities? (y or n)
    

    Fun fact: there are no files in the uploads directory. Of the 21492 directories inside, only 5 have multiple children.

    50630 directories
    70346 files
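
    For anyone who wants to reproduce counts like these on their own uploads tree, here is a rough sketch. The helper names count_tree and multi_child_dirs are made up for illustration, and -mindepth/-maxdepth assume a GNU or BSD find; this is not necessarily how the numbers above were produced.

    ```shell
    # Hypothetical helpers; the names are illustrative only.
    # count_tree prints how many directories and regular files live below $1.
    count_tree() {
        dirs=$(find "$1" -mindepth 1 -type d | wc -l)
        files=$(find "$1" -type f | wc -l)
        echo "$((dirs)) directories"
        echo "$((files)) files"
    }

    # multi_child_dirs prints how many immediate subdirectories of $1
    # contain more than one entry (dot-directories are skipped by the glob).
    multi_child_dirs() {
        n=0
        for d in "$1"/*/; do
            [ -d "$d" ] || continue
            c=$(find "$d" -mindepth 1 -maxdepth 1 | wc -l)
            if [ "$((c))" -gt 1 ]; then n=$((n + 1)); fi
        done
        echo "$n"
    }
    ```

    Usage would be something like count_tree /var/discourse/shared/standalone/uploads/default, followed by multi_child_dirs on the same path.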
    

    So let's look at the data we've collected.

    • 15 gigabyte database. Let's call it 16 gigabytes just to be nice to Discourse.
    • 8.6 gigabyte uploads directory tree, but with 0.7 + 2.2 gigabytes of "optimized" images. So that's 5.7 gigabytes of uploads.
    • Using my incredible math skills and the ability to type two digit numbers into a calculator, I have discovered that the entirety of this forum's data takes up 21.7 gigabytes of disk space.
    • The server has a 60GB root filesystem.
    • According to Ubuntu's system requirements, Ubuntu Server takes up about a gigabyte.
    • Therefore, Discourse has managed to fill up 60 gigabytes of space with 22.7 gigabytes of data. If that isn't amazing, I don't know what is.
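
    The sums in the list above can be checked with a couple of lines of shell. This is only a sketch; the function name forum_data_gb is invented here, and the inputs are the rounded figures from the list, in gigabytes.

    ```shell
    # Hypothetical helper: database + uploads - both "optimized" trees, in GB.
    forum_data_gb() {
        awk -v db="$1" -v uploads="$2" -v opt1="$3" -v opt2="$4" \
            'BEGIN { printf "%.1f\n", db + uploads - opt1 - opt2 }'
    }

    data_gb=$(forum_data_gb 16 8.6 0.7 2.2)
    echo "forum data: ${data_gb} GB"
    # Adding roughly 1 GB for Ubuntu Server gives the figure in the last bullet.
    awk -v d="$data_gb" 'BEGIN { printf "with Ubuntu: %.1f GB\n", d + 1 }'
    ```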

  • Notification Spam Recipient

    @ben_lubar said:

    22.7 gigabytes of data

    Did you ever find where it creeped?



  • @ben_lubar said:

    If I had a lot of free time

    Actually, maybe I should write up some abstracts and let the good people at http://thedailywtf.com/submit-wtf do the blogging.



  • @ben_lubar said:

    sam.saffron@[redacted because fuck spam].com

    Should be sam.saffron@[redacted%20because%20fuck%20spam].com, surely?



  • @ben_lubar said:

    root@what:~# du -s /var/lib/docker
    34178240 /var/lib/docker
    root@what:~# du -s /var/discourse/shared/standalone
    37164992 /var/discourse/shared/standalone

    @ben_lubar said:

    root@what:/# du --max-depth=1 /var/www/discourse/tmp/backups/default/
    3720040 /var/www/discourse/tmp/backups/default/2015-12-19-042932
    3722508 /var/www/discourse/tmp/backups/default/2015-12-20-035253
    7282576 /var/www/discourse/tmp/backups/default/2015-12-22-044854
    
    Given that information, plus the additional information that:
    
    - Discourse doesn't lose all of its information when you "rebuild the docker instance", which is a thing that only Discourse developers have ever told anyone to do.
    - `/var/www/discourse` is inside `/var/lib/docker` because magic.
    - The Ubuntu running Discourse is inside the Ubuntu running Docker and uses separate system files.
    
    We can figure out:
    
    - The contents of `/var/lib/docker` minus the three failed backups' temporary files is about 20GB. At least 1GB of that is Ubuntu.
    
    I'm not sure what the 19GB of Discourse is for, but that's what it is.
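
    To show where the "about 20GB" comes from, here is the subtraction as a small shell sketch. It assumes the du figures above are in 1024-byte blocks (the GNU du default); the variable names are made up.

    ```shell
    # Figures copied from the du output above, assumed to be KiB.
    docker_kib=34178240
    backups_kib=$((3720040 + 3722508 + 7282576))

    # What is left in /var/lib/docker once the three backup directories are ignored.
    rest_kib=$((docker_kib - backups_kib))
    rest_gb=$(awk -v k="$rest_kib" 'BEGIN { printf "%.1f", k * 1024 / 1e9 }')

    echo "$rest_kib KiB"
    echo "$rest_gb GB (decimal)"
    ```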

  • Notification Spam Recipient

    @ben_lubar said:

    19GB of Discourse

    Well, yeah, but shirley it resides somewhere logical, right? Like, is it cache? What is the nature of the temp files?



  • root@what:/# du -h --max-depth=1 /var/www/discourse/tmp/
    8.0K /var/www/discourse/tmp/backups
    20K /var/www/discourse/tmp/pids
    1.9M /var/www/discourse/tmp/ember-rails
    46M /var/www/discourse/tmp/stylesheet-cache
    65M /var/www/discourse/tmp/cache
    4.0K /var/www/discourse/tmp/miniprofiler
    4.0K /var/www/discourse/tmp/sockets
    112M /var/www/discourse/tmp/

    root@what:/# ls /var/www/discourse/vendor/bundle/ruby/2.0.0/gems | wc -l
    247
    
    root@what:/# du -sh /var/www/discourse/vendor/bundle/ruby/2.0.0/gems
    363M    /var/www/discourse/vendor/bundle/ruby/2.0.0/gems
    

    The entire filesystem, apart from the mounted host directory, is about 2GB. Something's wrong with that taking up 20GB.


  • Notification Spam Recipient

    @ben_lubar said:

    Something's wrong with that taking up 20GB.

    I'll say!
    Maybe it's all the Sparse nginx logs I heard mentioned in another thread... :trollface:

    Edit: Nope, it was this thread. How do we detect if invisible files are taking up space?


  • Notification Spam Recipient

    Interesting ServerFault question about this:

    What does lsof claim is being used?



  • @Tsaukpaetra said:

    How do we detect if invisible files are taking up space?

    @Tsaukpaetra said:

    What does lsof claim is being used?

    (nb: I replaced the outside-of-container username with the inside-of-container username for the same user ID)

    root@what:~# lsof | head -n 1; lsof | grep deleted
    COMMAND     PID   TID       USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
    postmaste  1966       postgres    287u      REG              253,1   16777216    3155775 /shared/postgres_data/pg_xlog/000000010000075200000073 (deleted)
    postmaste  8500       postgres    175u      REG              253,1   16777216    3155894 /shared/postgres_data/pg_xlog/00000001000007520000008B (deleted)
    postmaste 12803       postgres    260u      REG              253,1   16777216    3155816 /shared/postgres_data/pg_xlog/000000010000075200000082 (deleted)
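
    To put a number on how much space deleted-but-still-open files are holding, the lsof output can be summed. A rough sketch: sum_deleted_bytes is a made-up name, and it assumes the path contains no spaces, so that SIZE/OFF is the fourth field from the end on lines ending in "(deleted)".

    ```shell
    # Hypothetical helper: reads lsof output on stdin and prints the total
    # SIZE/OFF of entries marked "(deleted)".
    # Assumes no whitespace in file names, so the size is field NF-3.
    sum_deleted_bytes() {
        awk '$NF == "(deleted)" { total += $(NF - 3) }
             END { printf "%.0f\n", total }'
    }

    # Intended usage (not run here): lsof | sum_deleted_bytes
    ```

    For the three pg_xlog segments listed above that comes to 3 x 16777216 bytes, i.e. 48 MiB, which is nowhere near enough to explain the missing gigabytes.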
    


  • From that question: http://serverfault.com/questions/275206/disk-full-du-tells-different-how-to-further-investigate/581521#comment882057_275233

    @PJH, how many times have you updated Discourse since the last "rebuild of the container"? Docker containers are supposed to be built, used with an external data storage location, and then discarded when there's an update. They're not VMs. Discourse is :doing_it_wrong:.


  • Discourse touched me in a no-no place

    @ben_lubar said:

    So, something huge is inside Docker. @PJH, what do we do?

    Wait for this?




  • Discourse touched me in a no-no place

    @ben_lubar said:

    I'm not sure what the 19GB of Discourse is for

    Don't worry, neither are the Discodevs I'd imagine.


  • Discourse touched me in a no-no place

    @ben_lubar said:

    They're not VMs. Discourse is :doing_it_wrong:.

    Shocker.


  • Discourse touched me in a no-no place

    @ben_lubar said:

    @PJH, how many times have you updated Discourse since the last "rebuild of the container"?

    None, the last rebuild was the last update. (Or rather the last update was a rebuild.)


  • Discourse touched me in a no-no place

    @ben_lubar said:

    Before today's backupcrash that I have to clean up after every day, can you disable the automated backup in the admin panel?

    Better late than never...

    Seems the main backup only tries once a week.

    Will hobble @shadowmod later.


  • I survived the hour long Uno hand

    @ben_lubar said:

    maybe I should write up some abstracts and let the good people at http://thedailywtf.com/submit-wtf do the blogging.

    I'd be happy to help! :)


  • Java Dev

    @ben_lubar said:

    I'm not sure what the 19GB of Discourse is for, but that's what it is.

    More temporary file leak?



  • Did anyone check the contents of the tar file?



  • Yeah, @tar, what're you made of?



  • This is definitely fascinating. Keep up the good work, @ben_lubar!

    Also, why does it take several seconds for the name suggestion popup to show? If I can write an LDAP query which crawls through 10 AD domains across 6 continents in less than 5 seconds, then surely Discourse should be able to do a SELECT TOP(5) username FROM users WHERE username LIKE '@query%' in less than 1 second?


  • Discourse touched me in a no-no place

    Your LDAP queries aren't going through 29174 layers of JS and Ruby hell presumably.


  • FoxDev

    @loopback0 said:

    Your LDAP queries aren't going through 2917431415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679 layers of JS and Ruby hell presumably.

    FTFMFY


  • Discourse touched me in a no-no place

    Updates..

    @ben_lubar said:

    db_logging_collector: on

    Now off

    @ben_lubar said:

    DISCOURSE_DEVELOPER_EMAILS: 'sam.saffron@[redacted because fuck spam].com'

    Adjusted

    @ben_lubar said:

    RBTRACE: 1

    Now 0.

    Of course, changing all of those won't take effect until the next rebuild.

    @ben_lubar said:

    @PJH how much of these can we get rid of?

    root@what:/var/discourse/shared/standalone# (df; df -h) | grep vda
    /dev/vda1       61796348 43769624  14864612  75% /
    /dev/vda1        59G   42G   15G  75% /
    root@what:/var/discourse/shared/standalone/postgres_data/pg_log# find /var/discourse/shared/standalone/postgres_data/pg_log -mtime +5  -exec rm {} \;
    root@what:/var/discourse/shared/standalone/postgres_data/pg_log# (df; df -h) | grep vda
    /dev/vda1       61796348 37316488  21317748  64% /
    /dev/vda1        59G   36G   21G  64% /
    root@what:/var/discourse/shared/standalone# rm /var/discourse/shared/standalone/log/var-log/nginx/*.gz
    root@what:/var/discourse/shared/standalone# (df; df -h) | grep vda
    /dev/vda1       61796348 36581764  22052472  63% /
    /dev/vda1        59G   35G   22G  63% /
    root@what:/var/discourse/shared/standalone# find /var/discourse/shared/standalone/postgres_data/pg_log -mtime +1  -exec rm {} \;
    root@what:/var/discourse/shared/standalone# (df; df -h) | grep vda
    /dev/vda1       61796348 36345348  22288888  62% /
    /dev/vda1        59G   35G   22G  62% /
    root@what:/var/discourse/shared/standalone# du -h /var/discourse/shared/standalone/{log,postgres_data/pg_*log} | sort -rh
    822M	/var/discourse/shared/standalone/log
    461M	/var/discourse/shared/standalone/log/var-log/nginx
    461M	/var/discourse/shared/standalone/log/var-log
    361M	/var/discourse/shared/standalone/log/rails
    273M	/var/discourse/shared/standalone/postgres_data/pg_xlog
    48M	/var/discourse/shared/standalone/postgres_data/pg_clog
    2.3M	/var/discourse/shared/standalone/postgres_data/pg_log
    12K	/var/discourse/shared/standalone/log/var-log/apt
    4.0K	/var/discourse/shared/standalone/postgres_data/pg_xlog/archive_status
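
    A more cautious variant of the find cleanups above is to list what would go before deleting anything. This is just a sketch; old_logs is a made-up name, and it should only be pointed at plain text log directories such as pg_log, not at transaction logs.

    ```shell
    # Hypothetical helper: print regular files under directory $1 whose
    # modification time is more than $2 days old. Nothing is deleted.
    old_logs() {
        find "$1" -type f -mtime +"$2" -print
    }

    # Intended usage (not run here):
    #   old_logs /var/discourse/shared/standalone/postgres_data/pg_log 5
    # Review the list, then rerun the find with -exec rm {} \; if it looks right.
    ```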
    

    @ben_lubar said:

    docker images -a

    I don't know enough about docker to say which of those can be removed, or what the effect may be, though I agree, only one of them appears to be being used (some might be snapshots? Virtual Size isn't actual size etc.):

    root@what:/var/discourse/shared/standalone# docker images -a
    REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
    local_discourse/app    latest              1f19af41e12e        10 weeks ago        1.932 GB
    samsaffron/discourse   1.0.13              27f52292c186        3 months ago        1.238 GB
    <none>                 <none>              157f6a775410        3 months ago        1.238 GB
    <none>                 <none>              7ea991d02f7e        3 months ago        1.238 GB
    <none>                 <none>              65746d67224e        3 months ago        1.238 GB
    <none>                 <none>              dd46ba35af06        3 months ago        821.1 MB
    <none>                 <none>              d3a1f33e8a5a        4 months ago        188.2 MB
    root@what:/var/discourse/shared/standalone# docker ps -s
    CONTAINER ID        IMAGE                        COMMAND             CREATED             STATUS              PORTS                                      NAMES               SIZE
    899929e6af7a        local_discourse/app:latest   "/sbin/boot"        10 weeks ago        Up 14 hours         0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   app                 537.9 MB
    

    @ben_lubar said:

    The database is like fifteen gigabytes uncompressed. The uploads folder is 8.6GB.

    root@what:/var/discourse/shared/standalone/uploads# du -ah /var/discourse/shared/standalone/uploads | sort -rh | head -n50
    8.7G	/var/discourse/shared/standalone/uploads
    8.6G	/var/discourse/shared/standalone/uploads/default
    2.2G	/var/discourse/shared/standalone/uploads/default/_optimized
    1.9G	/var/discourse/shared/standalone/uploads/default/original
    1.7G	/var/discourse/shared/standalone/uploads/default/original/3X
    740M	/var/discourse/shared/standalone/uploads/default/optimized
    603M	/var/discourse/shared/standalone/uploads/default/optimized/3X
    122M	/var/discourse/shared/standalone/uploads/default/original/3X/e
    122M	/var/discourse/shared/standalone/uploads/default/original/3X/8
    113M	/var/discourse/shared/standalone/uploads/default/original/3X/1
    111M	/var/discourse/shared/standalone/uploads/default/original/3X/b
    <snip>
    

    Yeah. No.

    DiscoOptimized maybe...

    @ben_lubar said:

    crash that I have to clean up after every day

    What needs to be cleaned up?

    @AlexMedia said:

    Also, why does it take several seconds for the name suggestion popup to show?

    postgres@what:~$ psql -d discourse -c "select count(username) from users"
     count  
    --------
     141157
    (1 row)
    
    postgres@what:~$ 
    


  • Don't touch the pg_clog or pg_xlog. Those are transaction logs, not log-logs.



  • @fbmac said:

    Did anyone check the contents of the tar file?

    Oi!



  • @PJH said:

    I don't know enough about docker to say which of those can be removed, or what the effect may be, though I agree, only one of them appears to be being used (some might be snapshots? Virtual Size isn't actual size etc.):

    I have used the following commands in my little instance, worked for me:

    https://meta.discourse.org/t/low-on-disk-space-cleaning-up-old-docker-containers/15792/2


  • Discourse touched me in a no-no place

    @fbmac said:

    I have used the following commands in my little instance, worked for me:

    Saw that earlier in my investigations. Just did a repeat to show my conclusions...

    docker rm `docker ps -a | grep Exited | awk '{print $1 }'`

    root@what:~# docker ps -a | grep Exited 
    root@what:~# 
    

    Ok - none of those around.

    docker rmi `docker images -aq`

    root@what:~# docker images -aq
    1f19af41e12e
    27f52292c186
    7ea991d02f7e
    157f6a775410
    65746d67224e
    dd46ba35af06
    d3a1f33e8a5a
    

    Wait - what? What's the first one again?....

    root@what:~# docker images -a
    REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
    local_discourse/app    latest              1f19af41e12e        10 weeks ago        1.932 GB
    samsaffron/discourse   1.0.13              27f52292c186        3 months ago        1.238 GB
    <none>                 <none>              157f6a775410        3 months ago        1.238 GB
    <none>                 <none>              7ea991d02f7e        3 months ago        1.238 GB
    <none>                 <none>              65746d67224e        3 months ago        1.238 GB
    <none>                 <none>              dd46ba35af06        3 months ago        821.1 MB
    <none>                 <none>              d3a1f33e8a5a        4 months ago        188.2 MB
    root@what:~# 
    

    Er - no. That looks like very bad advice. I think I'd rather like to keep 1f19af41e12e.

    Perhaps something other than our running instance needs to be deleted.



  • @PJH said:

    @AlexMedia said:
    Also, why does it take several seconds for the name suggestion popup to show?

    postgres@what:~$ psql -d discourse -c "select count(username) from users"
     count  
    --------
     141157
    (1 row)
    
    postgres@what:~$ 
    
    That's quite a lot, but still... shouldn't it be faster than it is right now? It's just one column that you have to go through, and the query is a "starts with" query. 
    
    Also: yay discoursistency!
    
    Without space after the 'b':smile: 
    [screenshot: /uploads/default/original/3X/6/e/6ee57ca01b2eaf6cdbf84b8d39301aad20e2d089.png]
    
    With a space after the 'b':
    [screenshot: /uploads/default/original/3X/9/d/9db0632968f5380ce4ebbf4f2f317170b8788caa.png]

  • Discourse touched me in a no-no place

    @AlexMedia said:

    That's quite a lot, but still... shouldn't it be faster than it is right now? It's just one column that you have to go through, and the query is a "starts with" query.

    It's worse.



  • Aha, so they do a "contains". That explains why it's so slow...

    But why do they do that? :/


  • Discourse touched me in a no-no place

    @AlexMedia said:

    But why do they do that?

    Sanitizing user input. Or not in this case. _ is a valid username character (as can be seen from the image.)

    Anyone deliberately using it will be (absent knowledge of this particular foible) searching for a literal _ - they seem to pass it through unescaped to the SQL query, where it becomes a single character wildcard.
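
    For illustration only (Discourse's actual query code isn't shown in this thread), the escaping step that seems to be missing looks roughly like this. The helper name escape_like is made up, and it assumes PostgreSQL's default LIKE escape character, the backslash.

    ```shell
    # Hypothetical helper: escape LIKE metacharacters in a search term so that
    # "_" and "%" match literally. The backslash is escaped first so the
    # escapes added afterwards are not doubled.
    escape_like() {
        printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/%/\\%/g' -e 's/_/\\_/g'
    }

    escape_like 'ben_lubar'
    ```

    The escaped term would still need to go into the query as a bound parameter, with % appended for a prefix match; escaping the wildcards does not replace parameter binding.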



  • Is there ever going to be a front page article about our experiences with this shitty ass-forum?



  • To list images with no tag (for example, old versions of a tagged image when the image gets rebuilt):

    docker images -f dangling=true
    

    And to delete:

    docker images -f dangling=true -q | xargs docker rmi
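
    One caveat with the delete pipeline above: if the list comes back empty, xargs may still run docker rmi with no arguments, and docker will most likely just complain about that. A small guard avoids the noise. This is a sketch; remove_images and the DOCKER override are made up here, the latter so the function can be dry-run with echo.

    ```shell
    # Hypothetical helper: reads image IDs (one per line) on stdin and removes
    # them, or says so when there is nothing to do.
    # DOCKER can be set to "echo" for a dry run.
    remove_images() {
        ids=$(cat)
        if [ -z "$ids" ]; then
            echo "nothing to remove"
            return 0
        fi
        # Word splitting is intended: image IDs contain no whitespace.
        ${DOCKER:-docker} rmi $ids
    }

    # Intended usage (not run here):
    #   docker images -f dangling=true -q | remove_images
    ```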
    

  • Discourse touched me in a no-no place

    @ben_lubar said:

    docker images -f dangling=true

    root@what:~# docker images -f dangling=true
    REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
    root@what:~# 
    

    :wtf:



  • That's because all of the images on there are part of the Discourse image. They're like a singly-linked list.


  • Discourse touched me in a no-no place

    @blakeyrat said:

    Is there ever going to be a front page article about our experiences with this shitty ass-forum?

    Depending on the detail, it could probably keep the front-page occupied for a few months.

    Very unlikely to happen however, not least for the reason we're using it to begin with.


  • :belt_onion:

    @PJH said:

    @fbmac said:
    I have used the following commands in my little instance, worked for me:

    Saw that earlier in my investigations. Just did a repeat to show my conclusions...

    docker rm `docker ps -a | grep Exited | awk '{print $1 }'`

    root@what:~# docker ps -a | grep Exited 
    root@what:~# 
    

    Ok - none of those around.

    docker rmi `docker images -aq`

    root@what:~# docker images -aq
    1f19af41e12e
    27f52292c186
    7ea991d02f7e
    157f6a775410
    65746d67224e
    dd46ba35af06
    d3a1f33e8a5a
    

    Wait - what? What's the first one again?....

    root@what:~# docker images -a
    REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
    local_discourse/app    latest              1f19af41e12e        10 weeks ago        1.932 GB
    samsaffron/discourse   1.0.13              27f52292c186        3 months ago        1.238 GB
    <none>                 <none>              157f6a775410        3 months ago        1.238 GB
    <none>                 <none>              7ea991d02f7e        3 months ago        1.238 GB
    <none>                 <none>              65746d67224e        3 months ago        1.238 GB
    <none>                 <none>              dd46ba35af06        3 months ago        821.1 MB
    <none>                 <none>              d3a1f33e8a5a        4 months ago        188.2 MB
    root@what:~# 
    

    Er - no. That looks like very bad advice. I think I'd rather like to keep 1f19af41e12e.

    Perhaps something other than our running instance needs to be deleted.

    Apparently,

    The errors are fine for this rough script, docker will not delete images that are in use, so its complaining (correctly) that you are using these images.

    So........ They're relying on Docker being smart enough to not kill itself. Ohkay then


  • Discourse touched me in a no-no place

    Lovely formatting in that quote btw - looks nothing like the original...



  • It will, however, untag the images, which means you'll have to redownload them if you ever start another instance.



  • @PJH said:

    Lovely formatting in that quote btw - looks nothing like the original...

    See also the screenshots that I posted before. Looks like the buggy rendering isn't constrained to the preview pane.

    Adding a space to the end of the line underneath the quote might help. Or not, because Discourse.


  • Discourse touched me in a no-no place

    @AlexMedia said:

    Or not, because Discourse.

    It's apparently quoted the cooked, not the raw - I had to escape a few backticks in my OP to get it to render properly - they didn't make it through the wash...


  • :belt_onion:

    Nope, just tried copy-pasting the raw and it did the same thing.

    How do you get this so wrong?

    :headdesk:



  • @sloosecannon said:

    How do you get this so wrong?*<a???????????????????????????????
    FTFY


  • :belt_onion:

    + *twitch


  • Discourse touched me in a no-no place

    @rc4 said:

    Yeah, @tar, what're you made of?

    The stuff that makes Superman evil.



    Look at the difference between how much code Dell has to write to support phpBB versus how much code the DiscoDevs wrote for Discourse's docker image. (Dell's is actually more than they should have written: after each RUN command, a filesystem snapshot is taken and cached, so running apt-get update alone is a terrible idea, and running apt-get clean doesn't actually free up disk space if it's not in the same RUN that created the files.)

    Also keep in mind that Dell's base image is used by multiple child images, whereas Discourse's base image is used by multiple child images only if you count the images that are compiled on the client machine.

    https://github.com/discourse/discourse_docker/blob/master/image/discourse/Dockerfile


  • ♿ (Parody)

    They also look at the long names.


  • Winner of the 2016 Presidential Election

    @boomzilla said:

    They also look at the long names.

    *cough* I have NO idea whom that could affect. Who would even misuse the long name? That person must be the worsta genius at abusing Discourse

    Filed Under: 🚎

