Adding the Data Explorer to TDWTF



  • Continuing the discussion from 🕶 shadowmod queries:

    @accalia said:

    @PJH said:

    [pjh@sofa ~]$ aws s3 ls s3://wtdwtf-back | tail -n4
    2015-07-28 06:08:48 692104694 what-the-daily-wtf-2015-07-28-050006.tar.gz
    2015-07-29 04:30:09 6133509166 what-the-daily-wtf-2015-07-29-030935.tar.gz
    2015-07-29 06:08:18 693088360 what-the-daily-wtf-2015-07-29-050006.tar.gz
    2015-07-30 06:08:13 694800081 what-the-daily-wtf-2015-07-30-050006.tar.gz
    [pjh@sofa ~]$ ls -l ~/src/SockBot/backups/ | anonymize
    total 317516
    -rw-r--r-- 1 pjh pjh 325123299 Jul 30 06:22 what-the-daily-wtf-2015-07-30-050006.tar.gz
    -rw-r--r-- 1 pjh pjh 6123 Jul 30 06:26 what-the-daily-wtf-2015-07-30-050006.tar.gz.log
    [pjh@sofa ~]$

    Pass.

    wait.... so the backup is good but the download is truncated?

    .... :wtf:10¹⁰⁰

    @PJH Data Explorer has matured, I think you should install it at the next update.

    Plugin URL is https://github.com/discourse/discourse-data-explorer - just add it next to the git clone line for docker_manager.

    Once that's in, you can transfer the queries to its format. I also believe that the schema viewer will be incredibly helpful in writing your queries.

    For the parameterized queries, you'll need to do this:

    • Remove LIMIT at the end of your queries, this is replaced by &limit= (default 1000)

    • Replace $1::varchar-style parameters with :username-style parameters

    • Declare all parameters in the query header

       -- [params]
       -- user_id :the_user_id
       -- null int :three = 3
       
       ...
      
       JOIN users u ON u.id = S.user_id AND
       u.id = :the_user_id
      

    Acceptable parameter types include boolean, int, user_id (same as int, but you can POST the username), post_id (same as int, but you can POST a /t/-/1234/2 URL), topic_id (guess what), category_id (POST the category name, or "parent/subcategory") int_list (for use in value IN (:type_ids) statements; POST them comma-separated), string_list (same), user_list (same as int_list, but you POST the usernames, comma-separated), and putting null in front of any of the types.

    For @accalia:

    You actually don't www-form-encode the parameters, but rather JSON-encode them and put that in the params value of the form.



  • Can't wait to break it!

    (seriously though, that seems nice and actually pretty useful. Especially in context of bots and client-side plugins, which can now be sort-of enhanced with those queries)



  • Access is strictly admin-only, so client-side plugins don't get help here.

    It's so strict, anyone else 404s ;) (AdminConstraint.new is, honestly, an abuse of route constraints, but it puts the config in a hard-to-miss place, you never accidentally add a /admin route users can reach)



  • Aww, shucks. Thought you'd let the admins whitelist some queries, perhaps with pre-specified parameters like user IDs.

    It's up to @PJH to break it, then.


  • sockdevs

    i'll look into that when i update statsporn to 2.0.


  • Discourse touched me in a no-no place

    @Maciejasjmj said:

    It's up to @PJH to break it, then.

    Is that spinner at the bottom supposed to remain there for a minute?

    Something's not right here:

    Tried %...% only because this didn't work at all:

    Changing the type of username:

    All very interesting...



  • @Maciejasjmj said:

    Aww, shucks. Thought you'd let the admins whitelist some queries, perhaps with pre-specified parameters like user IDs.

    I believe the goal is to let @shadowmod use that so we'd get queries against live data instead of backups subject to the vagaries of discobackups and the financial acumen of @PJH's roommate or whatever who's responsible for paying the internet bill.


  • Discourse touched me in a no-no place

    @boomzilla said:

    I believe the goal is to let @shadowmod use that so we'd get queries against live data instead of backups subject to the vagaries of discobackups and the financial acumen of @PJH's roommate or whatever who's responsible for paying the internet bill.

    One thing has not been taken account of, which doesn't really matter with the current setup, is the potential hit on the server actually processing said queries.

    F'rexample, on empirical evidence, an attendance query takes around a second:

    https://what.thedailywtf.com/t/shadowmod-queries/6921/5287?u=pjh



  • @PJH said:

    One thing has not been taken account of, which doesn't really matter with the current setup, is the potential hit on the server actually processing said queries.

    setInterval ready for takeoff, sir...


    Filed under: nah, we'll try to be responsible


  • sockdevs

    @Maciejasjmj said:

    setInterval ready for takeoff, sir...

    did you know that sockbot instances can be told to ignore certain users?

    by default it's only @blakeyrat and @PaulaBean that are ignored, but i'm sure @PJH can add @Maciejasjmj to that list for his bots.

    :passport_control:



  • Why the fuck would you call me into this thread to tell me your annoying bullshit is ignoring me?

    Thanks. Now how about you not sending me pointless notifications too.


  • sockdevs

    1. It's not polite to talk about people behind their back.
    2. did you know that you also have a list of people that you ignore and cannot generate notifications of any kind for you? it's true! by default it's empty, but i'm sure you could be convinced to add @accalia to that list!


  • @PJH said:

    One thing has not been taken account of, which doesn't really matter with the current setup, is the potential hit on the server actually processing said queries.

    Yes, I imagine the cooldown stuff will need to be managed differently. Maybe a queue that throttles the frequency, plus jumping requests from users who haven't requested something jumps in front of waiting requests of a user who just requested a bunch of things.



  • Attempt 2: user_id is an integer, look at my first post, "Declare all parameters in the query header". You put in the username but the query gets the ID.

    Attempt 3: You should have used u.username LIKE '%' || :username || '%'

    Attempt 4: you should have used u.username IN (:usernames)

    Because putting in the username is easier for humans, and the ID is more useful for the query.


  • area_deu

    What, you don't know who user 922 is?



  • This is also telling me that I should write a manual for this thing. Or revisit some of the design decisions. Not sure which.

    Including example queries that exercised common facilities would help, too.



  • I always find examples to be the biggest help. I can copy them and get stuff working and the understanding can creep up on me instead of being frustrated because the documentation author thinks about things differently than I do.



  • Here's a query I'm thinking about making default:

    who-likes-each-other-the-most.dcquery.json (855 Bytes)

    -- [params]
    -- null user_id :filter_liked
    -- null user_id :filter_liker
    
    WITH pairs AS (
        SELECT p.user_id liked, pa.user_id liker
        FROM post_actions pa
        LEFT JOIN posts p ON p.id = pa.post_id
        WHERE post_action_type_id = 2
        AND (:filter_liker IS NULL OR pa.user_id = :filter_liker)
        AND (:filter_liked IS NULL OR p.user_id = :filter_liked)
    )
    SELECT liker user$liker_user_id, '→' arrow, liked user$liked_user_id, count(*)
    FROM pairs
    GROUP BY liked, liker
    HAVING count(*) > 10
    ORDER BY count DESC
    LIMIT 25
    

  • sockdevs

    interesting query....

    how would this one look in the new data explorer?

    https://github.com/SockDrawer/SockBot/blob/master/sock_modules/stats.yml#L31-L57

    or this one?

    https://github.com/SockDrawer/SockBot/blob/master/sock_modules/stats.yml#L65-L110

    those are two of the more popular querries shadowmod gets asked.


  • I survived the hour long Uno hand

    Speaking of data queries, is there any way to figure out what the site requires to attain a trust level? Some parts of it are configurable, right? I'm probably going to go do a local install to play with the configuration settings, but I'm not sure where a normal user could find that out?



  • MonthPosts gets no edits.

    Attendance changes like this:

    -- [params]
    -- string :username_pattern
    
    -- (rest of query here)
    
    JOIN users u ON u.id=S.user_id
    AND u.username ILIKE :username_pattern
    ORDER BY u.id ASC, S.visited_at DESC
    

    And, in the form body, limit=50


  • sockdevs

    hmm.....

    i can see i'll have to see about this...

    of course this has the potential to cause much server cooties....

    hmm.... I think i'm going to need access to a database dump to properly test an update to this data explorer for @shadowmod.... and i don't think that's going to happen, not unless i bribe @PJH with enough lager to bathe in.




  • sockdevs

    right, right, but your forum has no where near the data volume that WTDWTF has.

    i've been bitten with that before accidentally causing cooties storms because something that worked fine on your test forum just didn't scale to here.

    it's the data volume i'd want to test not the actual functionality of the data explorer plugin.



  • Here's the MonthPosts results on Meta. 458ms.

    {"success":true,"errors":[],"duration":458.8,"params":{},"columns":["rank","username","percent","count"],"rows":[["1","codinghorror","0.63","259"],["2","sam","1.27","246"],["3","riking","1.90","127"],["4","Mittineague","2.53","64"],["5","downey","3.16","43"],["6","gingerling","3.80","43"],["7","erlend_sh","4.43","34"],["8","cpradio","5.06","34"],["9","zogstrip","5.70","33"],["10","eviltrout","6.33","32"],["11","AstonJ","6.96","28"],["12","tgxworld","7.59","27"],["13","AmiAthena","8.23","26"],["14","jesselperry","8.86","24"],["15","meglio","9.49","23"],["16","mcwumbly","10.13","20"],["17","steve_pd","10.76","20"],["18","purldator","11.39","20"],["19","Falco","12.03","17"],["20","rewphus","12.66","17"],["21","tobiaseigen","13.29","16"],["22","alefattorini","13.92","16"],["23","Lee_Ars","14.56","14"],["24","vicatcu","15.19","14"],["25","techAPJ","15.82","14"],["26","PJH","16.46","14"],["27","dmitry_fedyuk","17.09","14"],["28","Tom_Newsom","17.72","13"],["29","will_io","18.35","13"],["30","gerhard","18.99","12"],["31","Lawal","19.62","12"],["32","mr8","20.25","12"],["33","strager","20.89","11"],["34","snjqi188","21.52","10"],["35","DeanMarkTaylor","22.15","9"],["36","pjv","22.78","9"],["37","trudat","23.42","9"],["38","vulkanino","24.05","9"],["39","fefrei","24.68","9"]]}


  • sockdevs

    @riking said:

    "duration":458.8,

    ....... please tell me that is in miliseconds, not seconds

    nevermind you did


  • sockdevs

    well that's something. what about attendance?



  • Please note that Meta is on better hardware than TDWTF, and there's a server (mostly) dedicated to all of the databases.

    Also:

    username	start_range	end_range	days
    sam	2013-02-01	2015-08-06	917
    

    Actually, it seems that attendance takes about 1sec no matter how many users it is :s



  • @riking said:

    Actually, it seems that attendance takes about 1sec no matter how many users it is :s

    CLOSED FIXED

    -- [params]
    -- user_list :users
    
    WITH StartingPoints AS (
      SELECT user_id, visited_at, ROW_NUMBER() OVER(ORDER BY user_id, visited_at) AS rownum
      FROM user_visits AS A
      WHERE NOT EXISTS (
         SELECT 1
         FROM user_visits AS B
         WHERE B.visited_at = A.visited_at - INTERVAL '1 day' AND
         B.user_id = A.user_id
      ) AND user_id IN (:users)
    ),
    EndingPoints AS (
      SELECT user_id, visited_at, ROW_NUMBER() OVER(ORDER BY user_id, visited_at) AS rownum
      FROM user_visits AS A
      WHERE NOT EXISTS (
         SELECT 1
         FROM user_visits AS B
         WHERE B.visited_at = A.visited_at + INTERVAL '1 day' AND
         B.user_id = A.user_id
      ) AND user_id IN (:users)
    )
    SELECT u.username, S.visited_at AS start_range, E.visited_at AS end_range, (E.visited_at - S.visited_at + 1) AS Days
    FROM StartingPoints AS S
    JOIN EndingPoints AS E ON E.rownum = S.rownum
    JOIN users u ON u.id=S.user_id AND
    u.id IN (:users)
    ORDER BY u.id ASC, S.visited_at DESC
    

    10ms


  • sockdevs

    @riking said:

    Please note that Meta is on better hardware than TDWTF, and there's a server (mostly) dedicated to all of the databases.

    and this is why i want a proper DB to play with so i can make sure i don't cripple our site

    I wonder if i could get a script to sanitize the DB (put 0.0.0.0 for all the IPs, "this is a post" for all the posts, replace all emails with "tim@example.com", etc.)


  • Discourse touched me in a no-no place

    @Yamikuronue said:

    Speaking of data queries, is there any way to figure out what the site requires to attain a trust level? Some parts of it are configurable, right?

    That's the sort of thing some sites would prefer to keep 'secret' to stop people trying to game the system. Otherwise they could have just made the criteria public for all installs.


  • Discourse touched me in a no-no place

    @PJH said:

    trying to game the system

    That's OK. We'd never do that. :innocent:


  • sockdevs

    hmm.... what is @sockbot's trust level right now? TL3 IIRC

    EDIT: Yup.



  • @accalia has summoned me, and so I appear.

    <!-- Posted by SockBot 0.16 "Hazardous Hera" (+417:4eb2731) on behalf of &#x40;accalia -->

Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.