So is the migration thing still happening?
-
Try a poll, maybe you get an answer that way.
Start
Poll: Do you like ice cream?
, then change the title toPoll: Should all the Jews be exterminated?
Filed under: Yes, I managed to Godwin even this topic., 33% think the answer is Vanilla
-
Should I import this table?
-
-
Please don't.
-
@codinghorror @sam Am I doing the avatar upload part right?
Example of an avatar from the CSV:
"FFD8FFE000104A46494600010101006000600000FFDB004300080606070605080707070909080A0C140D0C0B0B0C1912130F141D1A1F1E1D1A1C1C20242E2720222C231C1C2837292C30313434341F27393D38323C2E333432FFDB0043010909090C0B0C180D0D1832211C213232323232323232323232323232323232323232323232323232323232323232323232323232323232323232323232323232FFC0001108004F004F03012200021101031101FFC4001F0000010501010101010100000000000000000102030405060708090A0BFFC400B5100002010303020403050504040000017D01020300041105122131410613516107227114328191A1082342B1C11552D1F02433627282090A161718191A25262728292A3435363738393A434445464748494A535455565758595A636465666768696A737475767778797A838485868788898A92939495969798999AA2A3A4A5A6A7A8A9AAB2B3B4B5B6B7B8B9BAC2C3C4C5C6C7C8C9CAD2D3D4D5D6D7D8D9DAE1E2E3E4E5E6E7E8E9EAF1F2F3F4F5F6F7F8F9FAFFC4001F0100030101010101010101010000000000000102030405060708090A0BFFC400B51100020102040403040705040400010277000102031104052131061241510761711322328108144291A1B1C109233352F0156272D10A162434E125F11718191A262728292A35363738393A434445464748494A535455565758595A636465666768696A737475767778797A82838485868788898A92939495969798999AA2A3A4A5A6A7A8A9AAB2B3B4B5B6B7B8B9BAC2C3C4C5C6C7C8C9CAD2D3D4D5D6D7D8D9DAE2E3E4E5E6E7E8E9EAF2F3F4F5F6F7F8F9FAFFDA000C03010002110311003F00D9A28A2800A864BA8E3C8CEE3E8292EA5F2A3C0FBCDC0ACEA00B66F9B3F2A0C7BD0B7C7F8901FA1AA9450069C53A4BC29C1F4352D63838391D6B46D67F3530C7E61FAD004F4514500145151CCE6388B28C9A00AB7C7F78A33DBA555A56259892724F5A4A0028A28A002A4B7244E983DEA3A96D977CEBEDC9A00D3A28A2800A28A28032646DD231E9934DA9AE90ACC491804F18A86800A28A2800AD0B48BCB8F711F337F2AA31E3CC5DDF772335AD4005145140051451400D745914AB0C835426B56886E1F32FF2AD1A3A8C50063D14F990473328E80D32800AB96F7438493F06AA745006C515991DC491F01B8F4357A19D661E8DDC5004B4514500151CD2795116EFD07D6A4AA57CC77AAF6C66802A9258924E49A4A28A0028A28A0029558AB641C114945007FFD9"
-
Wait, avatars are in the database as hex?
I know I've stored files directly in the database before but there were specialised circumstances (no ability to write to the filesystem) but this seems... inelegant even by those standards.
-
They're stored as binary, but I can't put binary data in a UTF-8 file, so I had to convert to hex.
-
No idea off the top of my head, odd that they are in hex. I would start by writing a standalone that takes the heck and writes a file.
-
[Who needs files?](data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABPAE8DASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDZooooAKhkuo48jO4+gpLqXyo8D7zcCs6gC2b5s/Kgx70LfH+JAfoaqUUAacU6S8KcH0NS1jg4OR1rRtZ/NTDH5h+tAE9FFFABRRUczmOIsoyaAKt8f3ijPbpVWlYlmJJyT1pKACiiigAqS3JE6YPeo6ltl3zr7cmgDTooooAKKKKAMmRt0jHpk02prpCsxJGATxioaACiiigArQtIvLj3EfM38qox48xd33cjNa1ABRRRQAUUUUANdFkUqwyDVCa1aIbh8y/yrRo6jFAGPRT5kEczKOgNMoAKuW90OEk/Bqp0UAbFFZkdxJHwG49DV6GdZh6N3FAEtFFFABUc0nlRFu/QfWpKpXzHeq9sZoAqkliSTkmkoooAKKKKAClVirZBwRSUUAf/2Q==)
Filed under: I'm leaving it unparsed because Discourse strips data: links.
-
-
you gonna need to base64 decode this puppy.
-
I didn't want to spend too long messing around in SSMS. Here's the thing you'd get if you copied the
data:
url into your browser.
-
@codinghorror this user failed to import with the error
"Failed to create user id redacted@hotmail.com: ["Username can't be blank", "Username must be present"]"
-
Do some sanitising on usernames, EG:
Filed under: Having User.suggest_name and UserNameSuggester.sugget is borderline insane
-
Ben, also have a look at the import scripts directory most of your work should be already done, you just need to plug in a few queries and inherit off base.
-
If that's the only user that failed to import, I wouldn't worry. He hasn't been back since the day he joined almost four years ago, and has no posts. No great loss.
-
The reason I'm using CSV is because I couldn't find an easy way to run SQL Server in a docker instance.
-
liked this.
It's up to
57334 / 138275 ( 41.5%)
on the current run. I restarted it from the backup earlier today when I wrote the avatar code. Note: the avatar code does not change users with avatars that are already uploaded to Discourse. Note: this code will probably fail if a Community Server forum has over about 10 million users with avatars because each avatar is about a kilobyte and 10 million times a thousand is a big number.
A wise man once said:
It amazes me how Microsoft's more expensive database program is less capable of outputting one of the simplest data formats in existence.
In the meantime, I've been making life easier for space marines with long names.
Filed under: SpectateSwamp forum Migration Service
-
10 million times a thousand is a big number.
Not when it comes to databases it isn't.
-
Note: this code will probably fail if a Community Server forum has over about 10 million users with avatars because each avatar is about a kilobyte and 10 million times a thousand is a big number
Can the avatar's be factorised? i.e. I'm guessing most of the users in the CS database (esp. spammers) are using the default one which could all reference the same one?
-
Via careless research, I have determined that Error is already the most loved person on the forums; followed by moi.
We are naturally in constant yet indirect combat over this illustrious position.
-
The default avatar is stored as NULL in the database.
The power went out last night at about 2 AM. I'm currently getting everything back running.
-
The power went out last night at about 2 AM. I'm currently getting everything back running.
Are you also in India?
-
For some reason SSMS doesn't support exporting actual valid CSV, so I had to improvise.
If you have the Business Intelligence tools installed, an SSIS package can do valid CSV, unless there are quotes in the data, in which case you're screwed.
Filed under: or you can hack something together using Ruby and ODBC
-
I hacked something together using sql, and it's working fine since only one user has quotes in their name.
-
Are we going to end up with over 9000 unread topics when this import happens?
-
We should have a
betting poolpoll for that.
-
In CS, "have I read this post?" is a boolean for the whole topic, so if you had over 9000 topics unread in CS, you'll have over 9000 times the average number of posts per topic unread posts in Discourse.
Edit: Fuck, I included the "deleted posts" forum.
Edit: No, that spam at the end was in the sidebar when the backup was taken.
Edit: I did forget the tags, however. How could I
-
In CS, "have I read this post?" is a boolean for the whole topic, so if you had over 9000 topics unread in CS, you'll have over 9000 times the average number of posts per topic unread posts in Discourse.
My brain is unable to parse this statement. Regardless, I'm pretty sure it bodes ill for me, as I had only recently created an account on CS.
-
My brain is unable to parse this statement
Yeah, it seemed like more of a threat than answering the question. I'm not sure how it fits into tracking and whatnot, but the imports should definitely appear to be already read. I would want to be alerted to new posts there, but not the whole damn thing.
-
TSQL doesn't support aggregate-by-concatenation (fun fact: MySQL supports that shit) so I had to use XML.
-
You forgot to add:
Filed under: XML is TRWTF
-
That goes without saying at this point.
-
- Copying to C:\Users\Owner\Downloads\tdwtf-posts.csv (Error)
Messages- Error 0xc0202009: Data Flow Task 1: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 11.0" Hresult: 0x80004005 Description: "FOR XML could not serialize the data for node 'NoName' because it contains a character (0x0001) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive.".
(SQL Server Import and Export Wizard) - Error 0xc0047038: Data Flow Task 1: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Source - Query returned error code 0xC0202009. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
(SQL Server Import and Export Wizard)
- Error 0xc0202009: Data Flow Task 1: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
Filed under: shit. what now?
Ok, the only post with an invalid tag is this one. What should I do with it?
- Copying to C:\Users\Owner\Downloads\tdwtf-posts.csv (Error)
-
@sam the base import code doesn't seem to have anywhere to put topic IDs from the old forum. What should I do with them?
-
TSQL doesn't support aggregate-by-concatenation (fun fact: MySQL supports that shit) so I had to use XML.
Actually, if I understand what you're talking about, you're looking for COALESCE. Used like:
DECLARE @Text nvarchar(max) SELECT @Text = COALESCE(@Text, '') + Text FROM MyTable
If my table has 2 rows, with text of 'Row 1' and 'Row 2', respectively, this example would result in @Text containing the value 'Row 1Row 2'.
-
What if I need to do that for 264651 distinct rows? Do I need 264651
@Text
variables?
-
What if I need to do that for 264651 distinct rows?
I think you may be trying to solve the wrong problem.
-
-
-
-
-
/usr/local/lib/ruby/2.0.0/csv.rb:1820:in `block (2 levels) in shift': Missing or stray quote in line 92072 (CSV::MalformedCSVError)
Hmm
Ok, that's weird. Here's the line with the error and the next line on each side:
uses OCR do read the request and reply phone number, uses the web service, and returns the requested character. I can easily see this hello world program taking about 2 minutes to actually display the
Maybe CRLF line endings are TRWTF. Here's line
$((92072/2))
and the next line on each side:""<span style=""font-weight: bold;"">meaningful</span>"" character <span style=""font-weight: bold;"">(i.e., . [ ^ etc)</span> that will *only* match itself ... or prove that one cannot exist.<br><br>nonDev<br></span>","" 57727,57676,2280,3567,18,"Re: Funny Conversation",2006-01-25 03:13:42.840000000,"[quote user=""ferrengi""]I work in a small office and I was just passing by the office of the
Nope, I converted the CRLF to LF and it still says that the line with no quotes on it has invalid quotes. Weird.
-
if you need them, chuck them in a custom_field on Topic
-
So it turns out when the CSV error says "line 92072" it actually means "the 92072th line that begins a record". Here are three lines, as usual:
182281,182199,10199,12298,18,"Re: TOAD",2008-10-08 10:01:35.810000000,"<p>[quote user=""belgariontheking""]Can anyone else confirm?[/quote]WFM Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0) Gecko/20081008 Firefox/3.0.3; WFM Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14 (ID: 2008040413). Dunno what your problem is BTK, sorry.</p>","" 182282,182160,10203,12298,18,"Re: Thanks for the export",2008-10-08 10:46:54.323000000,"[quote user=""Spectre""]<pre>. - spot "" - rabbit-ears</pre>[/quote]"<pre style=""display: inline;""> - rabbit</pre><br><ref style=""font-style: oblique;"">—Revised INTERCAL Manual, Page 42 - Tonsil A</ref>
Wait, does SQL
REPLACE
only work when it feels like it? What's that quote doing in the middle of/quote]"<pre
SELECT REPLACE(CAST(Body AS nvarchar(max)), '"', ' @mikeTheLiar ') FROM cs_Posts WHERE PostID = 182282;
[quote user= @mikeTheLiar Spectre @mikeTheLiar ]<pre>. - spot @mikeTheLiar - rabbit-ears</pre>[/quote]"̣<pre style= @mikeTheLiar display: inline; @mikeTheLiar > - rabbit</pre><br><ref style= @mikeTheLiar font-style: oblique; @mikeTheLiar >—Revised INTERCAL Manual, Page 42 - Tonsil A</ref>
-
uote]"̣<pre
Crikey; U+0323, COMBINING DOT BELOW. That's got some classic evil right there. It seems to confuse browsers rather a lot too. Ick. (Consider normalizing the text — probably to NFC — as part of the migration.)
-
-
No, but I would normally expect such nonsense to be reserved for threads about HTML parsing with regular expressions…
-
I would normally expect such nonsense to be reserved for threads about HTML parsing with regular expressions…
That would be ... sane.
-
Crikey; U+0323, COMBINING DOT BELOW. That's got some classic evil right there. It seems to confuse browsers rather a lot too. Ick. (Consider normalizing the text — probably to NFC — as part of the migration.)
Can't we just strip out any non-alphanumerics outside of, say, ISO 8859-1?
Filed under: Defining alphanumerics to include non-Latin letters and numbers.