Adding the Data Explorer to TDWTF
-
Continuing the discussion from ๐ถ shadowmod queries:
@PJH said:
[pjh@sofa ~]$ aws s3 ls s3://wtdwtf-back | tail -n4
2015-07-28 06:08:48 692104694 what-the-daily-wtf-2015-07-28-050006.tar.gz
2015-07-29 04:30:09 6133509166 what-the-daily-wtf-2015-07-29-030935.tar.gz
2015-07-29 06:08:18 693088360 what-the-daily-wtf-2015-07-29-050006.tar.gz
2015-07-30 06:08:13 694800081 what-the-daily-wtf-2015-07-30-050006.tar.gz
[pjh@sofa ~]$ ls -l ~/src/SockBot/backups/ | anonymize
total 317516
-rw-r--r-- 1 pjh pjh 325123299 Jul 30 06:22 what-the-daily-wtf-2015-07-30-050006.tar.gz
-rw-r--r-- 1 pjh pjh 6123 Jul 30 06:26 what-the-daily-wtf-2015-07-30-050006.tar.gz.log
[pjh@sofa ~]$
Pass.
wait.... so the backup is good but the download is truncated?
.... 10ยนโฐโฐ
@PJH Data Explorer has matured, I think you should install it at the next update.
Plugin URL is https://github.com/discourse/discourse-data-explorer - just add it next to the
git clone
line for docker_manager.Once that's in, you can transfer the queries to its format. I also believe that the schema viewer will be incredibly helpful in writing your queries.
For the parameterized queries, you'll need to do this:
-
Remove LIMIT at the end of your queries, this is replaced by &limit= (default 1000)
-
Replace
$1::varchar
-style parameters with:username
-style parameters -
Declare all parameters in the query header
-- [params] -- user_id :the_user_id -- null int :three = 3 ... JOIN users u ON u.id = S.user_id AND u.id = :the_user_id
Acceptable parameter types include
boolean
,int
,user_id
(same as int, but you can POST the username),post_id
(same as int, but you can POST a /t/-/1234/2 URL),topic_id
(guess what),category_id
(POST the category name, or "parent/subcategory")int_list
(for use invalue IN (:type_ids)
statements; POST them comma-separated),string_list
(same),user_list
(same as int_list, but you POST the usernames, comma-separated), and puttingnull
in front of any of the types.For @accalia:
You actually don't www-form-encode the parameters, but rather JSON-encode them and put that in the
params
value of the form.
-
-
Can't wait to break it!
(seriously though, that seems nice and actually pretty useful. Especially in context of bots and client-side plugins, which can now be sort-of enhanced with those queries)
-
Access is strictly admin-only, so client-side plugins don't get help here.
It's so strict, anyone else 404s ;) (AdminConstraint.new is, honestly, an abuse of route constraints, but it puts the config in a hard-to-miss place, you never accidentally add a /admin route users can reach)
-
Aww, shucks. Thought you'd let the admins whitelist some queries, perhaps with pre-specified parameters like user IDs.
It's up to @PJH to break it, then.
-
i'll look into that when i update statsporn to 2.0.
-
It's up to @PJH to break it, then.
Is that spinner at the bottom supposed to remain there for a minute?
Something's not right here:
Tried
%...%
only because this didn't work at all:Changing the type of
username
:All very interesting...
-
Aww, shucks. Thought you'd let the admins whitelist some queries, perhaps with pre-specified parameters like user IDs.
I believe the goal is to let @shadowmod use that so we'd get queries against live data instead of backups subject to the vagaries of discobackups and the financial acumen of @PJH's roommate or whatever who's responsible for paying the internet bill.
-
I believe the goal is to let @shadowmod use that so we'd get queries against live data instead of backups subject to the vagaries of discobackups and the financial acumen of @PJH's roommate or whatever who's responsible for paying the internet bill.
One thing has not been taken account of, which doesn't really matter with the current setup, is the potential hit on the server actually processing said queries.
F'rexample, on empirical evidence, an
attendance
query takes around a second:https://what.thedailywtf.com/t/shadowmod-queries/6921/5287?u=pjh
-
One thing has not been taken account of, which doesn't really matter with the current setup, is the potential hit on the server actually processing said queries.
setInterval
ready for takeoff, sir...
Filed under: nah, we'll try to be responsible
-
setInterval ready for takeoff, sir...
did you know that sockbot instances can be told to ignore certain users?
by default it's only @blakeyrat and @PaulaBean that are ignored, but i'm sure @PJH can add @Maciejasjmj to that list for his bots.
-
Why the fuck would you call me into this thread to tell me your annoying bullshit is ignoring me?
Thanks. Now how about you not sending me pointless notifications too.
-
- It's not polite to talk about people behind their back.
- did you know that you also have a list of people that you ignore and cannot generate notifications of any kind for you? it's true! by default it's empty, but i'm sure you could be convinced to add @accalia to that list!
-
One thing has not been taken account of, which doesn't really matter with the current setup, is the potential hit on the server actually processing said queries.
Yes, I imagine the cooldown stuff will need to be managed differently. Maybe a queue that throttles the frequency, plus jumping requests from users who haven't requested something jumps in front of waiting requests of a user who just requested a bunch of things.
-
Attempt 2: user_id is an integer, look at my first post, "Declare all parameters in the query header". You put in the username but the query gets the ID.
Attempt 3: You should have used
u.username LIKE '%' || :username || '%'
Attempt 4: you should have used
u.username IN (:usernames)
Because putting in the username is easier for humans, and the ID is more useful for the query.
-
What, you don't know who user 922 is?
-
This is also telling me that I should write a manual for this thing. Or revisit some of the design decisions. Not sure which.
Including example queries that exercised common facilities would help, too.
-
I always find examples to be the biggest help. I can copy them and get stuff working and the understanding can creep up on me instead of being frustrated because the documentation author thinks about things differently than I do.
-
Here's a query I'm thinking about making default:
who-likes-each-other-the-most.dcquery.json (855 Bytes)
-- [params] -- null user_id :filter_liked -- null user_id :filter_liker WITH pairs AS ( SELECT p.user_id liked, pa.user_id liker FROM post_actions pa LEFT JOIN posts p ON p.id = pa.post_id WHERE post_action_type_id = 2 AND (:filter_liker IS NULL OR pa.user_id = :filter_liker) AND (:filter_liked IS NULL OR p.user_id = :filter_liked) ) SELECT liker user$liker_user_id, 'โ' arrow, liked user$liked_user_id, count(*) FROM pairs GROUP BY liked, liker HAVING count(*) > 10 ORDER BY count DESC LIMIT 25
-
interesting query....
how would this one look in the new data explorer?
https://github.com/SockDrawer/SockBot/blob/master/sock_modules/stats.yml#L31-L57
or this one?
https://github.com/SockDrawer/SockBot/blob/master/sock_modules/stats.yml#L65-L110
those are two of the more popular querries shadowmod gets asked.
-
Speaking of data queries, is there any way to figure out what the site requires to attain a trust level? Some parts of it are configurable, right? I'm probably going to go do a local install to play with the configuration settings, but I'm not sure where a normal user could find that out?
-
MonthPosts gets no edits.
Attendance changes like this:
-- [params] -- string :username_pattern -- (rest of query here) JOIN users u ON u.id=S.user_id AND u.username ILIKE :username_pattern ORDER BY u.id ASC, S.visited_at DESC
And, in the form body, limit=50
-
hmm.....
i can see i'll have to see about this...
of course this has the potential to cause much server cooties....
hmm.... I think i'm going to need access to a database dump to properly test an update to this data explorer for @shadowmod.... and i don't think that's going to happen, not unless i bribe @PJH with enough lager to bathe in.
-
-
right, right, but your forum has no where near the data volume that WTDWTF has.
i've been bitten with that before accidentally causing cooties storms because something that worked fine on your test forum just didn't scale to here.
it's the data volume i'd want to test not the actual functionality of the data explorer plugin.
-
Here's the MonthPosts results on Meta. 458ms.
{"success":true,"errors":[],"duration":458.8,"params":{},"columns":["rank","username","percent","count"],"rows":[["1","codinghorror","0.63","259"],["2","sam","1.27","246"],["3","riking","1.90","127"],["4","Mittineague","2.53","64"],["5","downey","3.16","43"],["6","gingerling","3.80","43"],["7","erlend_sh","4.43","34"],["8","cpradio","5.06","34"],["9","zogstrip","5.70","33"],["10","eviltrout","6.33","32"],["11","AstonJ","6.96","28"],["12","tgxworld","7.59","27"],["13","AmiAthena","8.23","26"],["14","jesselperry","8.86","24"],["15","meglio","9.49","23"],["16","mcwumbly","10.13","20"],["17","steve_pd","10.76","20"],["18","purldator","11.39","20"],["19","Falco","12.03","17"],["20","rewphus","12.66","17"],["21","tobiaseigen","13.29","16"],["22","alefattorini","13.92","16"],["23","Lee_Ars","14.56","14"],["24","vicatcu","15.19","14"],["25","techAPJ","15.82","14"],["26","PJH","16.46","14"],["27","dmitry_fedyuk","17.09","14"],["28","Tom_Newsom","17.72","13"],["29","will_io","18.35","13"],["30","gerhard","18.99","12"],["31","Lawal","19.62","12"],["32","mr8","20.25","12"],["33","strager","20.89","11"],["34","snjqi188","21.52","10"],["35","DeanMarkTaylor","22.15","9"],["36","pjv","22.78","9"],["37","trudat","23.42","9"],["38","vulkanino","24.05","9"],["39","fefrei","24.68","9"]]}
-
"duration":458.8,
....... please tell me that is in miliseconds, not secondsnevermind you did
-
well that's something. what about attendance?
-
Please note that Meta is on better hardware than TDWTF, and there's a server (mostly) dedicated to all of the databases.
Also:
username start_range end_range days sam 2013-02-01 2015-08-06 917
Actually, it seems that attendance takes about 1sec no matter how many users it is :s
-
Actually, it seems that attendance takes about 1sec no matter how many users it is :s
CLOSED FIXED
-- [params] -- user_list :users WITH StartingPoints AS ( SELECT user_id, visited_at, ROW_NUMBER() OVER(ORDER BY user_id, visited_at) AS rownum FROM user_visits AS A WHERE NOT EXISTS ( SELECT 1 FROM user_visits AS B WHERE B.visited_at = A.visited_at - INTERVAL '1 day' AND B.user_id = A.user_id ) AND user_id IN (:users) ), EndingPoints AS ( SELECT user_id, visited_at, ROW_NUMBER() OVER(ORDER BY user_id, visited_at) AS rownum FROM user_visits AS A WHERE NOT EXISTS ( SELECT 1 FROM user_visits AS B WHERE B.visited_at = A.visited_at + INTERVAL '1 day' AND B.user_id = A.user_id ) AND user_id IN (:users) ) SELECT u.username, S.visited_at AS start_range, E.visited_at AS end_range, (E.visited_at - S.visited_at + 1) AS Days FROM StartingPoints AS S JOIN EndingPoints AS E ON E.rownum = S.rownum JOIN users u ON u.id=S.user_id AND u.id IN (:users) ORDER BY u.id ASC, S.visited_at DESC
10ms
-
Please note that Meta is on better hardware than TDWTF, and there's a server (mostly) dedicated to all of the databases.
and this is why i want a proper DB to play with so i can make sure i don't cripple our site
I wonder if i could get a script to sanitize the DB (put 0.0.0.0 for all the IPs, "this is a post" for all the posts, replace all emails with "tim@example.com", etc.)
-
Speaking of data queries, is there any way to figure out what the site requires to attain a trust level? Some parts of it are configurable, right?
That's the sort of thing some sites would prefer to keep 'secret' to stop people trying to game the system. Otherwise they could have just made the criteria public for all installs.
-
-
hmm.... what is @sockbot's trust level right now? TL3 IIRC
EDIT: Yup.
-
@accalia has summoned me, and so I appear.