Imports (Was: {brace yourselves} the import is coming {Spoiler Alert: Not all of it} [i.e. blakeyrat was not utterly wrong for the first time ever] Filed under: append-only titles.)

tar

Hmm... so this is fun...

> function abc() { console.log(1) }
undefined
> (function abc() { console.log(2) })
[Function: abc]
> abc()
1

Doesn't relate to the discussion (other than being JS), but it's fun...

accalia

@tar said:

debugging porpoises...

leave the dolphins alone! :-P

RaceProUK

@accalia said:

well yes, but you also can't tell the difference between Array and Object with typeof alone

Is there a definitive way to tell them apart? Checking for a property called length will work right up until someone defines length as part of the prototype..

accalia

@RaceProUK said:

Is there a definitive way to tell them apart?

probably...... but why would you care, so long as it quacks like a duck?

(well it does help to know that you actually have the array prototype to use things like forEach and filter and map. but that's what arr = Array.prototype.slice.call(arr); is for. :-P )
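
For the record, a rough sketch of one answer to the question above: Array.isArray (standard since ES5) identifies a real array without looking at length, and Array.prototype.slice.call copies an array-like object into a real array. The sample object below is made up for illustration.

```javascript
// A plain object that merely quacks like an array (hypothetical sample data).
const arrayLike = { 0: 'a', 1: 'b', length: 2 };

// typeof cannot tell the two apart...
const typeofArray = typeof [];
const typeofObject = typeof arrayLike;

// ...but Array.isArray can, no matter who defines a length property.
const realIsArray = Array.isArray([]);
const fakeIsArray = Array.isArray(arrayLike);

// Copy the array-like into a real array to get forEach, filter, map and friends.
const converted = Array.prototype.slice.call(arrayLike);
const upper = converted.map(function (s) { return s.toUpperCase(); });

console.log(typeofArray, typeofObject, realIsArray, fakeIsArray, upper);
```

So the duck-typing approach and the definitive check can coexist: test with Array.isArray when it matters, convert when you only need the array methods.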

tar


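// typeof replacement: reports the constructor name instead of a bare 'object'.
function sane_typeof(x) {
    switch(x) {
        // switch compares with ===, so these cases match exactly null and undefined
        case null: return 'Null';
        case undefined: return 'Undefined';
        // primitives are boxed on property access, so (1).constructor is Number
        default: return x.constructor.name;
    }
}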


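// Minimal assertion helper: throws when the actual value differs from the expected one.
function prove(x, result) {
    if(x !== result) {
        throw new Error('oops: ' + x + ' !== ' + result);
    }
}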


function Thing() { };

prove(sane_typeof(undefined), 'Undefined');
prove(sane_typeof(null), 'Null');
prove(sane_typeof(true), 'Boolean');
prove(sane_typeof(false), 'Boolean');
prove(sane_typeof(new Boolean()), 'Boolean');
prove(sane_typeof(0), 'Number');
prove(sane_typeof(1), 'Number');
prove(sane_typeof(1.7), 'Number');
prove(sane_typeof(1e6), 'Number');
prove(sane_typeof(new Number()), 'Number');
prove(sane_typeof('String'), 'String');
prove(sane_typeof(new String()), 'String');
prove(sane_typeof(/a/), 'RegExp');
prove(sane_typeof(new RegExp()), 'RegExp');
prove(sane_typeof([]), 'Array');
prove(sane_typeof(new Array()), 'Array');
prove(sane_typeof({}), 'Object');
prove(sane_typeof(new Object()), 'Object');
prove(sane_typeof(function x() {}), 'Function');
prove(sane_typeof(new Function()), 'Function');
prove(sane_typeof(new Thing()), 'Thing');
// node.js specific...
prove(sane_typeof(console), 'Console');
prove(sane_typeof(console.log), 'Function');
prove(sane_typeof(process), 'process');
prove(sane_typeof(process.stdin), 'ReadStream');
prove(sane_typeof(process.stderr), 'WriteStream');

console.log('seems good then!');

This is JavaScript—I'm bound to have missed something...

EDIT: missed Strings, Functions, RegExps, some other random stuff...
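
One more thing it seems to miss: an object with no constructor at all, such as Object.create(null), makes x.constructor.name throw. Below is a sketch of a fallback built on Object.prototype.toString. The name tagOf is made up for this example, and it reports 'Object' for custom types like Thing, so treat it as a complement rather than a replacement.

```javascript
// Copy of sane_typeof from the post above, so this snippet runs on its own.
function sane_typeof(x) {
    switch (x) {
        case null: return 'Null';
        case undefined: return 'Undefined';
        default: return x.constructor.name;
    }
}

// Hypothetical helper: reads the built-in tag, e.g. '[object Array]' -> 'Array'.
function tagOf(x) {
    return Object.prototype.toString.call(x).slice(8, -1);
}

function Thing() {}

// An object with no prototype, and therefore no constructor property.
const bare = Object.create(null);

let threw = false;
try {
    sane_typeof(bare);
} catch (e) {
    threw = e instanceof TypeError;
}

console.log(threw, tagOf(bare), tagOf([]), tagOf(null), tagOf(new Thing()));
```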

Onyx

I'm pretty sure I could pull out similar (or greater) amounts of fun from PHP. If it weren't 2AM and I weren't here just because of a broken sleep cycle. Maybe I should pull out some of my reflection abuse code. "Horrible" does not begin to explain it.

The fun bit? Even though some things could very likely be written better, most of it is just working around deficiencies in PHP's half-assed OOP model and all you can do is just shrug and write out the stupid.

Onyx

Back on topic: Any updates on that import? I have this distinct feeling it crashed again... Unless Discosearch can't cope with new topics, a few random title searches of the stuff from the old forums didn't yield any results.

boomzilla

Yeah, the About page is only up to 274K right now, so not much happened. Though most old topics are pretty small, so it might have gone through a few topics.

mikeTheLiar

This thread title is apt, because much like GoT/ASoIaF, (the import | winter) has been coming for twenty fucking years and still hasn't happened yet.

Onyx

My vote is still for "Waiting for ~~Godo~~import" as the next "we're doing this guise, honest!" thread title.

What, you thought we'd reuse this thread? Hah!

blakeyrat

Let's name the thread: "Blakeyrat was right, again."

darkmatter

if you didn't refuse to like things, you could name it yourself

Luhmann

@darkmatter said:

if you didn't refuse to like things, you could name it yourself

But then all topics would be called "Blakeyrat was right, again." ...

lolwhat

@Yamikuronue said:

this thread is fucking with my perception of whites

anonymous234

@tar said:

"not fucking up a quotation by turning all the <s into &lt;s or otherwise ruining the formatting"?

WHAT?!?!?!??!? Are you saying quotations should look like the original post? You're crazy, obviously they should look like the raw text of the original post because of reasons. Discourse's the first forum software to get it right.
Next you'll be saying that when the software finds a "<" and does not recognize the word as valid HTML, it should just render it as < instead of silently hiding the whole word.

Edit: even when I try to mock Discourse, it ends up exceeding my expectations. Original post, quoted, expanded quote:

PJH

@Luhmann said:

But then all topics would be called "Blakeyrat was right, again." ...

... for a while at least...

ben_lubar

Ok, I'm keeping a log of the import this time. So far, the import output is this:

loading existing groups...
loading existing users...
loading existing categories...
loading existing posts...
loading existing topics...
      331 / 139465 (  0.2%)  Skipping user id 333 because email is blank
      604 / 139465 (  0.4%)  Skipping user id 1000 because email is blank
      605 / 139465 (  0.4%)  Skipping user id 1001 because email is blank
     3995 / 139465 (  2.9%)  Skipping user id FFD9 because email is blank
    12032 / 139465 (  8.6%)  Skipping user id :58 because email is blank
    28800 / 139465 ( 20.7%)  Skipping user id :39 because email is blank
    62351 / 139465 ( 44.7%)  Skipping user id  because email is blank
    79154 / 139465 ( 56.8%)  Skipping user id  because email is blank
    95939 / 139465 ( 68.8%)  Skipping user id  because email is blank
   111421 / 139465 ( 79.9%)  Skipping user id  because email is blank
   139465 / 139465 (100.0%)

counting posts
264466

migrating posts

and then the last line has a lot of spaces on it, which I'm assuming is printed by this line:

tdwtf-convert/communityserver.rb at master · BenLubar/tdwtf-convert

I may as well make this Open Sores Software to annoy morbs. <3 morbs - BenLubar/tdwtf-convert

@PJH can you check http://what.thedailywtf.com/sidekiq for the number of enqueued jobs?

Onyx

First screen, now logging. Gettin' enterprisey in here!

tar

@Onyx said:

First screen, now logging. Gettin' enterprisey in here!

Is it a binary log format though? I heard those are the future...

ben_lubar

It's this one: http://tukaani.org/xz/format.html

sam

you got to create posts in batches, if you got keys right it's resumable. try for batches of 1000
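
A minimal sketch of the idea, not the actual import script: every name here is hypothetical, and `store` is a plain Map standing in for the forum database, keyed by the old forum's post id. Because each record is looked up by that key before it is created, a run that dies partway can simply be started again.

```javascript
// Hypothetical sketch: none of these names come from the real importer.
function importInBatches(records, batchSize, store, createPost) {
    let created = 0;
    for (let start = 0; start < records.length; start += batchSize) {
        const batch = records.slice(start, start + batchSize);
        // Skip anything a previous run already stored, so a rerun only does the missing work.
        const missing = batch.filter(function (record) { return !store.has(record.id); });
        for (const record of missing) {
            store.set(record.id, createPost(record));
            created += 1;
        }
    }
    return created;
}

// Simulate a run that dies partway through, then a rerun over the same input.
const records = Array.from({ length: 2500 }, function (_, i) {
    return { id: i + 1, body: 'post ' + (i + 1) };
});
const store = new Map();
let calls = 0;
function flakyCreate(record) {
    calls += 1;
    if (calls === 1201) throw new Error('simulated crash');
    return record.body;
}

let crashed = false;
try {
    importInBatches(records, 1000, store, flakyCreate);
} catch (e) {
    crashed = true;
}
const storedAfterCrash = store.size;
const createdOnRerun = importInBatches(records, 1000, store, flakyCreate);

console.log(crashed, storedAfterCrash, createdOnRerun, store.size);
```

The total work is the same either way; the batch size only bounds how much has to be rechecked or redone after a failure.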

Luhmann

@tar said:

Is it a binary log format

If he's really aiming for enterprisey it should be XML ... nothing screams enterprise app like an XML log file.

tar

@Luhmann said:

XML ...

Nuh-uh. Not any more. Systemd uses binary logs, and systemd is the future...

Luhmann

@tar said:

Systemd uses binary logs

Since when do Linux dudes with beards and sandals decide what is Enterprisey and what not? What the uck is wrong with this world?

tar

Since this guy:

Filed under: systemd can do anything!

Luhmann

@tar said:

systemd can do anything!

Can it fetch me another beer? I'm running on empty here ...

Onyx

@tar said:

this guy

Hey, remember when we raged at having to poke around the Event viewer on Windows and waste hours looking for shit and then were happy that we can just use tail and grep on *NIX? Fuck that, binary all the things!

At least that's how I imagine him.

loopback0

Maybe it'd be easier to import the Dischorse posts into CS at this point?

sam

we have done plenty of huge imports, they trickle in, the issue here appears to be the everything or nothing approach.

@ben_lubar see:

discourse/script/import_scripts/getsatisfaction.rb at main · discourse/discourse

A platform for community discussion. Free, open, simple. - discourse/discourse

ben_lubar

I'm not seeing any transactions in this function:

discourse/script/import_scripts/base.rb at main · discourse/discourse

Why does it matter if I pass 1 iterator to 1000 elements or 10 iterators to 100 elements? It's going to be O(n=1000) either way.

CreatedToDislikeThis

@Onyx said:

Hey, remember when we raged at having to poke around the Event viewer on Windows and waste hours looking for shit and then were happy that we can just use tail and grep on *NIX? Fuck that, binary all the things!

At least that's how I imagine him.

Storing binary files is ultimately better than text files. Smaller size & faster to read & write by programs.
The problem is that when you double click and/or cat/vi/emacs a binary file, it opens the raw file rather than converting it to a text file and opening that.

OSes should support generic APIs for converting a file to/from text, which programs can then implement for their file formats (don't get me started on extensions). Then, text editors can use these APIs and open arbitrary super-compressed log files or whatever you want. The APIs can be sufficiently smart to support opening (a small part of) an arbitrarily large file, etc.
Depending on the binary format, they can even support fast saving of arbitrary edits to a small part of an arbitrarily large file, something text files do not support.

If you want to poke around the raw file (and I know I do!), that's what hex editors are for.
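
A rough sketch of what I mean, assuming Node's Buffer and a made-up fixed-size record layout (8 bytes: a uint32 timestamp plus a uint32 event code). None of this is an existing log format; it only shows a text view on top of binary storage and an in-place edit of one record.

```javascript
// Hypothetical layout: each record is 8 bytes, a uint32 timestamp then a uint32 event code.
const RECORD_SIZE = 8;

function writeRecord(buf, index, timestamp, code) {
    const offset = index * RECORD_SIZE;
    buf.writeUInt32LE(timestamp, offset);
    buf.writeUInt32LE(code, offset + 4);
}

function readRecord(buf, index) {
    const offset = index * RECORD_SIZE;
    return { timestamp: buf.readUInt32LE(offset), code: buf.readUInt32LE(offset + 4) };
}

// The "convert to text" half of the scheme: render a range of records as lines.
function toText(buf, first, count) {
    const lines = [];
    for (let i = first; i < first + count; i++) {
        const r = readRecord(buf, i);
        lines.push(r.timestamp + ' code=' + r.code);
    }
    return lines.join('\n');
}

const log = Buffer.alloc(3 * RECORD_SIZE);
writeRecord(log, 0, 1000, 1);
writeRecord(log, 1, 1001, 2);
writeRecord(log, 2, 1002, 3);

// Edit only the middle record in place; the other 16 bytes are untouched.
writeRecord(log, 1, 1001, 42);

console.log(toText(log, 0, 3));
```

With variable-length text lines, the same edit could shift every byte after it; with fixed-size records the offset is just index times record size.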

ben_lubar

Example log line:

     3378 / 264466 (  1.3%)  InvalidAccess creating post 46690. Topic is closed? can_create? failed

That's this post:

Oct 12, 2005 / Side Bar WTF

HowTo_AccessRaw SectorsOfPhysical Drives_onLinux.py

I hate to submit Python code ...... but this is really bad. link

for physicalDriveID in ['a','b','c','d','e','f','g','h']:
    drive = '/dev/hd%s'%(physicalDriveID,)
    try:
        fileLikeAccessToContentOfSectorsOfHardDrive0 = open(drive, 'rb')
        hardDriveSector...

@sam, what does this line do?

discourse/script/import_scripts/base.rb at main · discourse/discourse

It doesn't seem to be actually disabling any validation logic.

dkf

@CreatedToDislikeThis said:

Storing binary files is ultimately better than text files.

Cute Stuff - Only cute things

ben_lubar

Can you imagine how much more of a clusterfuck Dwarf Fortress would be if it stored everything as human-editable text and had to parse dumb shit humans write?

aliceif

So, when will programmer dwarves get added?

sam

@ben_lubar said:

@sam, what does this line do?

It skips internal validations (the ones it can). For example the length validations or entropy ones, but it can not skip db validations.

It should allow you to post on a closed topic, sounds like a bug.

tar

Stands to reason. You only need 7 bits to store binary files, so you can get some space savings over text files which need a full 8 bits to store. So we can safely conclude that binary files are a more compact format of computer storage.
Filed under: at least, I think that's correct...

ben_lubar

I think it's the fact that the categories aren't set to allow posting or that the users that got imported aren't allowed to create posts because they haven't logged in or something.

Is it safe to monkey patch Guardian to make it allow everything during the import script, or would that somehow leak into the live forum?

tar

@ben_lubar said:

Is it safe to monkey patch Guardian to make it allow everything during the import script, or would that somehow leak into the live forum?

Only one way to find out. Test it!

ben_lubar

Well, if I want to store １２３４５６７８９ as a binary integer, it only takes 4 bytes, but if I want to store it as text, it takes 72 bytes!
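
A quick sanity check of the sizes, as a sketch assuming Node's Buffer and UTF-8 for the text form. Under those assumptions the fullwidth text comes out smaller than 72 bytes, though the binary form still wins.

```javascript
// The number as a 32-bit signed integer (it fits: 123456789 < 2^31).
const asInt = Buffer.alloc(4);
asInt.writeInt32BE(123456789, 0);
const binaryBytes = asInt.length;

// The same digits as fullwidth characters (U+FF11..U+FF19), 3 bytes each in UTF-8.
const fullwidthBytes = Buffer.byteLength('１２３４５６７８９', 'utf8');

// And as plain ASCII digits, 1 byte each in UTF-8.
const asciiBytes = Buffer.byteLength('123456789', 'utf8');

console.log(binaryBytes, fullwidthBytes, asciiBytes);
```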

ben_lubar

I'd rather not cause the forumpocalypse if it can be avoided.

CreatedToDislikeThis

@dkf said:

(No content)

Then give me a disadvantage of binary files compared to text files assuming the scheme which I outlined in my post is in place.
(Aka: "you can vi/tail/head/ass text files" is not a valid advantage since you can do that for binary files just as easily under my scheme)
("Your scheme too complex for me wee brain" is a more valid disadvantage, but please think of something more interesting)

(EDIT: Thought of one: "Text files are easier to programmatically create/edit/parse than Binary files"
Text files that do not take advantage of an existing generic format like xml are harder to programmatically parse/edit than binary files. Xml/json/shmson files are just as easy to create/edit/parse (given the right libraries) as equivalent binary format (if you can't see that, I'm sorry).
The fact that there aren't enough generic binary formats that I could give a good example, stems from the lack of OS support for convenient editing of arbitrary binary files as detailed in my previous post.
)

tar

@ben_lubar said:

Well, if I want to store １２３４５６７８９ as a binary integer, it only takes 4 bytes, but if I want to store it as text, it takes 72 bytes!

Hang on, it'll take 123456789 bytes, won't it?

RaceProUK

The more I read about this import process, the more I think it was never tested against a test instance…

ben_lubar

It was tested last July.

tar

@RaceProUK said:

The more I read about this import process, the more I think it was never tested against a test instance…

But it is being tested! It's being tested right now!

loopback0

If only there was some kind of instance on some kind of droplet thing which is a bit like this one, just two weeks ago......

RaceProUK

@tar said:

But it is being tested! It's being tested right now!

On live.

If my quills were twitching any harder, they'd pull themselves right out…

ben_lubar

This is that instance.