So is the migration thing still happening?
-
Continuing the discussion from Two forums or what?:
I got the db dump from Alex last night, so I've only started looking through it. Right now I'm working on sorting out what is easy, what is relatively doable, what's difficult but theoretically possible, and what is impossible. I'll probably rope you in once I've got things sorted out. DoctorJones expressed interest in helping, too. We'll form a coalition of powerful and attractive heroes who work together for the glory of TDWTF.
@apapadimoulis can I have a database dump of some arbitrary subset of the old forum? I might be able to whip something up to convert the old forum's content to Discourse this week.
Filed under: the script will be powered by SSDS
-
can I have a database dump of some arbitrary subset of the old forum?
Do you not value your sanity?
-
-
You must be new here.
I've been watching you from the shadows for a while, but @ben_lubar always manages to surprise me.
Filed under: Not creepy, Well, maybe a little...
-
He has plenty of time on his hands.
Fiddled Under: We all like to judge people; some of us do it in public.
-
>Keith said:
It is possible to value something even though you do not actually possess it.
ben_lubar said:
Do you not value your sanity?
You must be new here.
Filed under: Nested quotes sometimes work.
-
It is possible to value something even though you do not actually possess it.
Yes, but I don't think we have to worry about that in this particular case.
-
Awesome! I sent @moderator a back-up of the SQL Server database for CS, and the postgres database for discourse. You obviously would need both to do some sort of import... I can prep those, it's not so bad to set-up. Email me and I'll send the links.
-
Also, our migration code is open source. It is very much dev only at this point, but it is available in the tree on GitHub.
-
I tried to email you, but I think your email is broken.
Edit: Just sent an email to your Google+ address and it didn't come back as unsendable immediately, so it probably got through.
Filed under: Google+ is useful for something⸮
-
-
Could have stripped the (3 instances) of tags out of that backup to at least reduce it to <1GB...
-
I compressed it without opening it and its size went down by about 90%.
-
Select the file, then use Shift + Delete for maximum compression.
-
Select the file, then use <kbd>Shift</kbd> + <kbd>Delete</kbd> for maximum compression.
You could always use BARF.
-
I compressed it without opening it and its size went down by about 90%.
I wonder how much that says about the TDWTF community.
-
I wonder how much that says about the TDWTF community.
I expected more just from compressing the phrase TRWTF and profanities tbh.
-
-
creating users
331 / 138275 ( 0.2%) Skipping user id 333 because email is blank
604 / 138275 ( 0.4%) Skipping user id 1000 because email is blank
605 / 138275 ( 0.4%) Skipping user id 1001 because email is blank
5369 / 138275 ( 3.9%)
-
So, you now have a database of 138,272 email addresses.
Want a crate of beer?
Filed under: it's called resale and it's a valid business model so shut up
-
So, you now have a database of 138,272 email addresses.
Want a crate of beer?
Filed under: it's called resale and it's a valid business model so shut up
And a database of 138,275 hashed passwords.
Want a bunch of Steam games?
Filed under: Hashed? Who am I kidding? Ha!
-
I'm 20 as of today, not 21.
-
I'm 20 as of today, not 21.
Happy Birthday, mine is Monday.
Filed under: Next year, no one reading this will be able to figure out my birthday., Unless they have one of those hovering pointing devices.
-
So, you now have a database of 138,272 email addresses.
Want a crate of beer?
I'm 20 as of today, not 21.
I can provide beer and accommodations in a country where 18 is legal age...
Filed under: Happy Birthday
-
I'm 20 as of today, not 21.
Wow. So there are people who adhere to the "no drinking till 21" law.
I can provide beer and accommodations in a country where 18 is legal age...
So is here. You've wasted your trump card, sorry.
Filed under: because, unlike drinking at twenty, selling mail databases to spammers is totally legal, fuck, now I'm officially as young as Ben
-
Why does everyone want these email addresses? I'm already going to sign them up to a mailing list called What the Daily WTF.
Filed under: but they won't receive the emails because the SMTP server is set to mail.example.com, they will also eventually be signed up to another mailing list with the same name but with a valid SMTP server
-
Why does everyone want these email addresses?
I have some interesting business offers I'd like to make.
Filed under: they involve penises
-
Why does everyone want these email addresses? I'm already going to sign them up to a mailing list called What the Daily WTF.
I will definitely Reply-All with my unsubscribe requests.
Filed under: Bedlam DL3? Me too!
-
If I send you the MD5 of all the email addresses, is that good? Here, I'll even throw in the registration dates, usernames, and ID numbers.
user@vensa:/var/docker/shared/discourse-tdwtf$ md5sum tdwtf-users.csv
4a71edae588a6e7aac9bea93223c9eed tdwtf-users.csv
-
I have some interesting business offers I'd like to make.
Filed under: they involve penises
That's what she said.
-
If I send you the MD5 of all the email addresses, is that good? Here, I'll even throw in the registration dates, usernames, and ID numbers.
user@vensa:/var/docker/shared/discourse-tdwtf$ md5sum tdwtf-users.csv
4a71edae588a6e7aac9bea93223c9eed tdwtf-users.csv
Now I just need a 16 yottabyte drive full of rainbow tables.
Filed under: Drive manufacturers measure in yottabytes not yobibytes.
-
If anyone wants to follow along at home, here's my script so far:
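require 'csv' # make sure CSV is loaded for the CSV.read call below
require File.expand_path(File.dirname(__FILE__) + '/base.rb')

class ImportScripts::CommunityServer < ImportScripts::Base
  def initialize
    super
  end

  def execute
    # The first row of the export holds the column names.
    users_results = CSV.read('tdwtf-users.csv')
    keys = users_results.shift.map(&:to_sym)

    # Turn each remaining row into a hash keyed by column name.
    create_users(users_results) do |u|
      Hash[keys.zip(u)]
    end
  end
end

ImportScripts::CommunityServer.new.perform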
Reconstructing the csv file from the md5 hash is left as an exercise for the reader.
Edit: Here's the SQL query:
SELECT cs_Users.UserID AS id,
       '"' + REPLACE(cs_Users.Email, '"', '""') + '"' AS email,
       '"' + REPLACE(cs_Users.UserName, '"', '""') + '"' AS name,
       cs_Users.CreateDate AS created_at
FROM cs_Users
ORDER BY id ASC;
For some reason SSMS doesn't support exporting actual valid CSV, so I had to improvise.
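For anyone wondering why the REPLACE calls are there: CSV escapes a quote inside a quoted field by doubling it, so the query doubles every `"` and then wraps the field in quotes. Here is a rough Ruby sketch of the same rule, round-tripped through the standard CSV parser. `csv_quote` and the sample values are made up for illustration and are not part of the import script.

```ruby
require 'csv'

# Same rule as the REPLACE(...) calls in the query: double every embedded
# quote, then wrap the whole field in quotes. (Hypothetical helper.)
def csv_quote(value)
  '"' + value.to_s.gsub('"', '""') + '"'
end

# Build one row by hand, the way the SQL does, and let Ruby's CSV parse it.
line = [42, csv_quote('o"brien@example.com'), csv_quote('Bobby "Tables", Jr.')].join(',')
fields = CSV.parse_line(line)

puts line
p fields
```

If the parser hands back the original values, embedded quotes and commas included, the improvised quoting is doing its job.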
-
users_results = CSV.read('tdwtf-users.csv')
Loading a 4GB dump into memory, what could go wrong?
-
The 4GB is mostly tags. tdwtf-users.csv is about three orders of magnitude smaller.
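At that size `CSV.read` is fine. If a much bigger table ever gets exported the same way, a row-at-a-time version might look something like the sketch below. `each_user` is a hypothetical helper, and skipping rows with a blank email is only an assumption modelled on the "Skipping user id ..." lines in the import log.

```ruby
require 'csv'

# Yields one user hash at a time instead of loading every row into an array.
# Assumes the first row of the file holds the column names.
def each_user(path)
  return enum_for(:each_user, path) unless block_given?

  CSV.foreach(path, headers: true, header_converters: :symbol) do |row|
    user = row.to_h
    # Assumption: rows without an email are skipped, as in the import log.
    next if user[:email].to_s.strip.empty?

    yield user
  end
end
```

Called with a block, it hands over one `{ id:, email:, name:, created_at: }` hash per row; without a block it returns an enumerator.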
-
I suggest matching old users to new users on email address first, then username if you can't find a match. Usernames got mangled a bit, and some of us took the 3 day window to change our username.
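Roughly what I mean, as a sketch. The hash shapes (`:email`, `:username`) and the helper names are made up for illustration, not Discourse's actual model, and comparing case-insensitively is my assumption.

```ruby
# Lower-case and trim so "Alice@Example.com" and "alice@example.com" compare equal.
def normalize(value)
  value.to_s.strip.downcase
end

# Index the new forum's users once, by email and by username.
def build_index(new_users)
  by_email = {}
  by_username = {}
  new_users.each do |u|
    email = normalize(u[:email])
    name = normalize(u[:username])
    by_email[email] ||= u unless email.empty?
    by_username[name] ||= u unless name.empty?
  end
  [by_email, by_username]
end

# Email first; fall back to username only when no email match exists.
def match_user(old_user, by_email, by_username)
  email = normalize(old_user[:email])
  return by_email[email] if !email.empty? && by_email.key?(email)

  by_username[normalize(old_user[:username])]
end
```

Anything that comes back `nil` from `match_user` is an old account with no counterpart yet, so it would need to be created rather than merged.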
-
Already done for me:
-
Loading a 4GB dump into <s>memory</s>swap, what could go wrong?
More likely, if it's on a Chromebook.
-
Not bad.
So the posts are in HTML, how is Discourse going to handle that? Can you just load the "cooked" field with the rendered HTML? Is this a new opportunity for cross-system XSS?
-
@ben_lubar add signature guys as users.
Hopefully, if people just pasted the existing post structure when making them (which is a fair assumption) you should be able to parse out any usernames not already in database to get a list.
Filed under: Gonna take ages but it's a worthy cause
-
For some reason SSMS doesn't support exporting actual valid CSV, so I had to improvise.
If you are having Windows, then get Toad for SQL Server Freeware version and get on with it.
-
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes.
-
-
We have these new-fangled XML parser things these days...
-
I'm planning on cooking tags into the posts.
-
ok it's really scary to know in real time how many people like my post.
-
I'm planning on cooking tags into the posts.
Cooking tags, copypasta. Mmm.
Filed under: Jam a noodle in it.
-
ok it's really scary to know in real time how many people like my post.
Now take your ninja edit window to change it to something horribly racist.
Filed under: You can still Ben L. on this forum, but it has to be vertically.
-
-
Now take your ninja edit window to change it to something horribly racist.
And then see how fast you rack up the Likes.
-
Objection, relevance?
@Nagesh said: http://blog.codinghorror.com/parsing-html-the-cthulhu-way/
Try a poll, maybe you get an answer that way.
Filed under: Fucking hell why does it move my viewport when people edit huge images into their posts?