RAM is slower than HDDs! If you are an idiot, that is.
-
Saw this paper linked on Slashdot and oh boy.
http://arxiv.org/ftp/arxiv/papers/1503/1503.02678.pdf
Link died.
Looks like the paper was withdrawn: http://arxiv.org/abs/1503.02678
-
I skimmed it but saw enough to classify it as…
-
Dept of Biological Sciences
Dept of Chemical and Biological Engineering
Exactly what do these have to do with computers?
And you'd think the guy from the
Dept of Electrical and Computer Engineering
would know that RAM is always faster than HDDs.
-
Had a cursory glance - from the 'paper'
Constructing the file content has a major impact on the code’s performance. For the in-memory case, we define a string to contain the file contents, and use a loop to concatenate another string to it, until a predetermined file size is reached. This size limit was arbitrarily set to 1,000,000 bytes (less than 1 MB), which is small fraction of RAM available in most current computers, so the situation described in the paper happens for even small data sets. We start by adding 1 character (byte) at a time to the content, so in the in-memory case, the string containing the file will be concatenated 1,000,000 times. In the disk-only case, 1,000,000 disk operations are issued. We then repeat the experiment by adding 10, 1,000, and 1,000,000 bytes at a time.
I'm fairly certain, without going beyond that, that they need to read this:
http://www.joelonsoftware.com/articles/fog0000000319.html (Shlemiel the painter's algorithm)
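For the record, the Shlemiel-the-painter effect they're hitting can be sketched in a few lines of Python (a hypothetical reconstruction, not the paper's actual code; the sizes are made up):

```python
# Hypothetical reconstruction of the paper's in-memory construction: each
# += on an immutable string may copy everything built so far, so building
# n bytes this way costs O(n^2) copying in the worst case.
def build_by_concat(n, chunk="1"):
    content = ""
    for _ in range(n // len(chunk)):
        content += chunk  # potentially re-copies the whole prefix
    return content

# The linear alternative: accumulate chunks and join once at the end.
def build_by_join(n, chunk="1"):
    parts = []
    for _ in range(n // len(chunk)):
        parts.append(chunk)
    return "".join(parts)  # one O(n) pass over all chunks

print(build_by_concat(1000) == build_by_join(1000))  # prints True
```

(In CPython, `+=` on a string is sometimes optimized in place, so the quadratic blow-up is most reliably visible in Java, where each concatenation really does allocate a new String.)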
-
Or they'd know to actually write to disk for each add in the write-to-disk version of the task, rather than just once. (In the Java code they add to the BufferedWriter over and over again, but only flush at the end.)
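The distinction being complained about, sketched in Python rather than the paper's Java (file names and sizes are made up for illustration):

```python
import os
import tempfile

CHUNK, COUNT = "1", 1000  # made-up sizes for illustration

# What the paper apparently measured: every add accumulates in the
# userspace buffer, which is drained once when the file is closed.
def write_flushing_once(path):
    with open(path, "w") as f:
        for _ in range(COUNT):
            f.write(CHUNK)

# What a per-operation disk test would at least need: flush each add
# out of the userspace buffer (and fsync, to reach the actual device).
def write_flushing_each_add(path):
    with open(path, "w") as f:
        for _ in range(COUNT):
            f.write(CHUNK)
            f.flush()

tmp = tempfile.mkdtemp()
write_flushing_once(os.path.join(tmp, "once.txt"))
write_flushing_each_add(os.path.join(tmp, "each.txt"))
```

Both produce identical files; only the first version issues (roughly) one write per add instead of buffering them all.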
-
My summary of the "paper":
-
-
Oh sweet merciful Chaos…
The method they use to prove in-memory is 'slower' is string concatenation.
In Java, this creates a whole new String object. Every. Fucking. Time.
So when you're creating one million of the fuckers, of course it'll be slow!
-
'slower' is string concatenation.
Yeppers. Though IMO the not-actually-writing-to-disk in the disk version is a bigger issue than not using a StringBuilder in the in-memory version.
-
Even if they used a StringBuffer and flushed the BufferedWriter in the loop, it doesn't mean anything was actually written to disk--it just means it made it to the disk buffer.
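In Python terms (the same caveat applies to Java's flush()): flush() only drains the process's buffer into the OS page cache, and only an explicit os.fsync() asks the kernel to commit it to the device. A minimal sketch:

```python
import os
import tempfile

def write_durably(path, data):
    with open(path, "w") as f:
        f.write(data)
        f.flush()             # userspace buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> physical device
    # Without the fsync, the data could still be sitting in the
    # OS cache, having never touched the disk at all.

path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_durably(path, "1" * 1000)
```

So a benchmark that flushes but never syncs may be timing memory-to-memory copies while calling it "disk".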
-
Well done, you wrote a string directly to disk. You are winrar.
Can I see that string now? It's ok, I'll wait while you read it back into memory so you can display it.
Yes, I know, there are more specific explanations above, but even this is enough for pretty much everyone to figure out why this is not a valid test, even without knowing the differences between the languages used.
-
Even if they used a StringBuffer and flushed the BufferedWriter in the loop, it doesn't mean anything was actually written to disk--it just means it made it to the disk buffer.
This. They are pretty much comparing two ways of concatenating 1M "1":s in memory, where one of the ways is positively terrible. Any operations involving disks of any kind are completely coincidental.
-
It sounds like these guys aren't clear on the concept of memory management and whatever their language runtime / OS was doing to cache stuff when writing to disk.
More seriously, this is perhaps the best argument for putting all your stuff into a single text file and just going from there. Imagine how easy it would be to share stuff! I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
-
More seriously, this is perhaps the best argument for putting all your stuff into a single text file and just going from there. Imagine how easy it would be to share stuff!
But can you lock it?
I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
-
No, from what I skimmed, they're aware that string concatenation is an expensive operation. I couldn't bear to read on to see if they understood that this means the way they did it is stupid.
Then I saw "In other words, the operating system already lessons disk access times" and was all
-
It sounds like these guys aren't clear on the concept of memory management and whatever their language runtime / OS was doing to cache stuff when writing to disk.
One of them is an electronics and computing engineer. Which is quite the
-
Not good ones, seemingly.
-
I like how they used Java 6 and Python 2.6.6 in Linux, while using Java 8 and Python 2.7.6 in Windows.
It's like they don't even know how to run an experiment!
-
Also, obligatory kind-of-related front page article:
-
No, from what I skimmed, they're aware that string concatenation is an expensive operation.
Yeah, I totally just skimmed, but it sounds like they don't understand it as well as they think they do.
Actually, I'm not sure WTF they're doing. It reads like a term paper, and they're confusing people by making statements about in-memory processing vs. disk access.
WHBT
-
http://arxiv.org/ftp/arxiv/papers/1503/1503.02678.pdf
The more I read of that paper, the stupider I feel.
Exactly what do these have to do with computers?
They do hook computers up to instruments these days, and do quite a bit of simulation of reactions. I wouldn't let the majority of people in those departments actually implement an algorithm though; some people are best kept using just the large lego blocks well into adulthood…
Or would know to actually write to disk for each add on the write to disk version of the task rather than just once.
The buffered writer will also flush once an internal threshold is reached. It's not written by a fucking moron.
-
One of them is an electronics and computing engineer. Which is quite the
Not really. I mean: CANADA.
-
More seriously, this is perhaps the best argument for putting all your stuff into a single text file and just going from there. Imagine how easy it would be to share stuff! I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
You could even Fast Forward <SSDS!> through that stuff…
-
I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
I'm sure we've featured some software round here that could do that - maybe even take a video of it as well. Perhaps they could enlist the services of the author of the software to help them with their next paper...
-
Then again, I believe that the software you're referring to doesn't use automatic indexing. This might be a point of contention, since I believe the author didn't like indexes, due to them being slow. Which we now know is incorrect, what with HDDs being faster than RAM.
-
Guys I have proof that RAM is slower than emulated hard drives that are also in RAM!
-
And they're already in the same country too. Perhaps they know him. I read this abomination and it wouldn't surprise me if they had done drugs together.
I hope this is just some first-year bachelor students' "here's how to write a paper" course, although even then they would've failed it.
-
putting all your stuff into a single text file and just going from there
This sounds familiar to me. Once you have all your stuff in the one file, you need some kind of Desktop Search functionality, no?
-
I like how they used Java 6 and Python 2.6.6 in Linux, while using Java 8 and Python 2.7.6 in Windows.
It's like they don't even know how to run an experiment!
You have to use old versions of software when you're running on Linux hardware, because it's just not as good as Windows hardware.
-
This sounds familiar to me. Once you have all your stuff in the one file, you need some kind of Desktop Search functionality, no?
But not just Search. Because once you have the Search, you can do Other things with it like Videos. Need to remember Something? Use Search!
-
Search *mumblemumble* Video *mumblemumble* For the Masses!
-
-
We need a Cult of the Civilized Desktop Search Construction Kit.
-
There's hardly a WTF in the paper. From their abstract:
In this paper we ... show that in-memory operations are not always a guarantee for high performance, and may actually cause a considerable slow-down.
...
We argue that ... better developer awareness and coding practices are necessary to ensure in-memory computing can achieve its full potential.
Translation: we can write code so crappy that we achieve a considerable slow-down.
They certainly did prove that with their WTF code.
Now they only need to apply the better developer awareness and coding practices to themselves.
-
This sounds familiar to me. Once you have all your stuff in the one file, you need some kind of Desktop Search functionality, no?
Hanzo'd, ever so subtly, by 6 hours I'm afraid.
-
You're starting to become a meme all by yourself. And the guys in golang should give you your own referrer ID.
-
Hanzo'd, ever so subtly, by 6 hours I'm afraid.
Arriving 6 hours late is a barrier to remaining un-Hanzo'd...
-
Translation: we can write code so crappy that we achieve a considerable slow-down.
Yeah, they talk about how they "use code inspired by real, production software." Looking at what they do, it's probably some sort of bioinformatics thing, where I understand the data can get quite large.
The more I think about this, the more I suspect that the biologists did most of the work here with the computer guy helping to set up their code or doing the actual diagnostics or something, so he gets an easy paper added to his CV and the biologists get to look like smartypants to their biologist friends.
Huzzah for publish or perish!
-
But, but, these guys are Professional Scientists ....
What could possibly go wrong?
-
-
Do you want a map?
It'll have to be digital....
Folding that bad-boy to fit in my glove-box... not gonna happen.
-
Let me summarise the essential point of the paper:
If your chosen algorithm is horrible, your program will perform badly.
There, did I do it well?
-
For a summary, sure. For an actual article, you'll have to write code and measure its performance.
Actually, I think they did the hard part of that for you (writing deliberately bad code).
-
If I now write a paper debunking this crap, does that make it a real scientific research paper? I could even add it to my LinkedIn profile!
BTW, I would probably use this title for the paper.
-
Actually, I think they did the hard part of that for you (writing deliberately bad code).
Not agreeing with the premise that they deliberately wrote bad code [I have no knowledge]... But I will attest that writing bad code is actually harder than it looks. Over the years I have produced training material on improving code (not just performance, but that is the relevant part). Sure, there are certain types of code that are obviously crap, but much more important is the code that looks good at first glance, coupled with something that looks even better, where the exact opposite turns out to be the reality.
-
-
Looks like they took down the article. Have a like anyway.
-
What could possibly go wrong?
Obligatory response:
-
Looks like they took down the article.
This paper has been withdrawn by the authors for personal reasons.
The authors decided they didn't want any more people laughing at them?
-
Oh, it's still there if you look at the revision history.