RAM is slower than HDDs! If you are an idiot, that is.
-
Saw this paper linked on Slashdot and oh boy.
http://arxiv.org/ftp/arxiv/papers/1503/1503.02678.pdf
Link died.
Looks like the paper was withdrawn: http://arxiv.org/abs/1503.02678
-
I skimmed it but saw enough to classify it as…
-
Dept of Biological Sciences
Dept of Chemical and Biological Engineering
Exactly what do these have to do with computers?
And you'd think the guy from the
Dept of Electrical and Computer Engineering
would know that RAM is always faster than HDDs.
-
Had a cursory glance - from the 'paper'
Constructing the file content has a major impact on the code’s performance. For the in-memory case, we define a string to contain the file contents, and use a loop to concatenate another string to it, until a predetermined file size is reached. This size limit was arbitrarily set to 1,000,000 bytes (less than 1 MB), which is small fraction of RAM available in most current computers, so the situation described in the paper happens for even small data sets. We start by adding 1 character (byte) at a time to the content, so in the in-memory case, the string containing the file will be concatenated 1,000,000 times. In the disk-only case, 1,000,000 disk operations are issued. We then repeat the experiment by adding 10, 1,000, and 1,000,000 bytes at a time.
I'm fairly certain, without going beyond that, that they need to read this:
http://www.joelonsoftware.com/articles/fog0000000319.html (Shlemiel the painter's algorithm)
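For the record, the Shlemiel-the-painter effect they're hitting can be sketched in a few lines of Python (a hypothetical reconstruction, not the paper's actual code; the sizes are made up):

```python
# Hypothetical reconstruction of the paper's in-memory construction: each
# += on an immutable string may copy everything built so far, so building
# n bytes this way costs O(n^2) copying in the worst case.
def build_by_concat(n, chunk="1"):
    content = ""
    for _ in range(n // len(chunk)):
        content += chunk  # potentially re-copies the whole prefix
    return content

# The linear alternative: accumulate chunks and join once at the end.
def build_by_join(n, chunk="1"):
    parts = []
    for _ in range(n // len(chunk)):
        parts.append(chunk)
    return "".join(parts)  # one O(n) pass over all chunks

print(build_by_concat(1000) == build_by_join(1000))  # prints True
```

(In CPython, `+=` on a string is sometimes optimized in place, so the quadratic blow-up is most reliably visible in Java, where each concatenation really does allocate a new String.)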
-
Or they'd know to actually write to disk for each add in the write-to-disk version of the task, rather than just once. (In the Java code they add to the BufferedWriter over and over again, but only flush at the end.)
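The distinction being complained about, sketched in Python rather than the paper's Java (file names and sizes are made up for illustration):

```python
import os
import tempfile

CHUNK, COUNT = "1", 1000  # made-up sizes for illustration

# What the paper apparently measured: every add accumulates in the
# userspace buffer, which is drained once when the file is closed.
def write_flushing_once(path):
    with open(path, "w") as f:
        for _ in range(COUNT):
            f.write(CHUNK)

# What a per-operation disk test would at least need: flush each add
# out of the userspace buffer (and fsync, to reach the actual device).
def write_flushing_each_add(path):
    with open(path, "w") as f:
        for _ in range(COUNT):
            f.write(CHUNK)
            f.flush()

tmp = tempfile.mkdtemp()
write_flushing_once(os.path.join(tmp, "once.txt"))
write_flushing_each_add(os.path.join(tmp, "each.txt"))
```

Both produce identical files; only the first version issues (roughly) one write per add instead of buffering them all.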
-
My summary of the "paper":
-
-
Oh sweet merciful Chaos…
The method they use to prove in-memory is 'slower' is string concatenation.
In Java, this creates a whole new String object. Every. Fucking. Time.
So when you're creating one million of the fuckers, of course it'll be slow!
-
'slower' is string concatenation.
Yeppers. Though IMO the not-actually-writing-to-disk in the disk version is a bigger issue than not using a StringBuilder in the in-memory version.
-
Even if they used a StringBuffer and flushed the BufferedWriter in the loop, it doesn't mean anything was actually written to disk--it just means it made it to the disk buffer.
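In Python terms (the same caveat applies to Java's flush()): flush() only drains the process's buffer into the OS page cache, and only an explicit os.fsync() asks the kernel to commit it to the device. A minimal sketch:

```python
import os
import tempfile

def write_durably(path, data):
    with open(path, "w") as f:
        f.write(data)
        f.flush()             # userspace buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> physical device
    # Without the fsync, the data could still be sitting in the
    # OS cache, having never touched the disk at all.

path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_durably(path, "1" * 1000)
```

So a benchmark that flushes but never syncs may be timing memory-to-memory copies while calling it "disk".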
-
Well done, you wrote a string directly to disk. You are winrar.
Can I see that string now? It's ok, I'll wait while you read it back into memory so you can display it.
Yes, I know, there are more specific explanations above, but even this is enough for pretty much everyone to figure out why this is not a valid test, even without knowing the differences between the languages used.
-
Even if they used a StringBuffer and flushed the BufferedWriter in the loop, it doesn't mean anything was actually written to disk--it just means it made it to the disk buffer.
This. They are pretty much comparing two ways of concatenating 1M "1":s in memory, where one of the ways is positively terrible. Any operations involving disks of any kind are completely coincidental.
-
It sounds like these guys aren't clear on the concept of memory management and whatever their language runtime / OS was doing to cache stuff when writing to disk.
More seriously, this is perhaps the best argument for putting all your stuff into a single text file and just going from there. Imagine how easy it would be to share stuff! I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
-
More seriously, this is perhaps the best argument for putting all your stuff into a single text file and just going from there. Imagine how easy it would be to share stuff!
But can you lock it?
I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
-
No, from what I skimmed, they're aware that string concatenation is an expensive operation. I couldn't bear to read on to see if they understood that this means the way they did it is stupid.
Then I saw "In other words, the operating system already lessons disk access times" and was all
-
It sounds like these guys aren't clear on the concept of memory management and whatever their language runtime / OS was doing to cache stuff when writing to disk.
One of them is an electronics and computing engineer. Which is quite the
-
Not good ones, seemingly.
-
I like how they used Java 6 and Python 2.6.6 in Linux, while using Java 8 and Python 2.7.6 in Windows.
It's like they don't even know how to run an experiment!
-
Also, obligatory kind-of-related front page article:
-
No, from what I skimmed, they're aware that string concatenation is an expensive operation.
Yeah, I totally just skimmed, but it sounds like they don't understand it as well as they think they do.
Actually, I'm not sure WTF they're doing. It reads like a term paper, and they're confusing people by making statements about in-memory processing vs. disk access.
WHBT
-
http://arxiv.org/ftp/arxiv/papers/1503/1503.02678.pdf
The more I read of that paper, the stupider I feel.
Exactly what do these have to do with computers?
They do hook computers up to instruments these days, and do quite a bit of simulation of reactions. I wouldn't let the majority of people in those departments actually implement an algorithm though; some people are best kept using just the large lego blocks well into adulthood…
Or would know to actually write to disk for each add on the write to disk version of the task rather than just once.
The buffered writer will also flush once an internal threshold is reached. It's not written by a fucking moron.
-
One of them is an electronics and computing engineer. Which is quite the
Not really. I mean: CANADA.
-
More seriously, this is perhaps the best argument for putting all your stuff into a single text file and just going from there. Imagine how easy it would be to share stuff! I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
You could even Fast Forward <SSDS!> through that stuff…
-
I think next year's paper should be about randomly retrieving parts of the string, optionally in slow motion.
I'm sure we've featured some software round here that could do that - maybe even take a video of it as well. Perhaps they could enlist the services of the author of the software to help them with their next paper...
-
Then again, I believe that the software you're referring to doesn't use automatic indexing. This might be a point of contention, since I believe the author didn't like indexes, due to them being slow. Which we now know is incorrect, what with HDDs being faster than RAM.
-
Guys I have proof that RAM is slower than emulated hard drives that are also in RAM!
-
And they're already in the same country too. Perhaps they know him. I read this abomination and it wouldn't surprise me if they had done drugs together.
I hope this is just some first-year bachelor students' "here's how to write a paper" course, although even then they would've failed it.
-
putting all your stuff into a single text file and just going from there
This sounds familiar to me. Once you have all your stuff in the one file, you need some kind of Desktop Search functionality, no?
-
I like how they used Java 6 and Python 2.6.6 in Linux, while using Java 8 and Python 2.7.6 in Windows.
It's like they don't even know how to run an experiment!
You have to use old versions of software when you're running on Linux hardware, because it's just not as good as Windows hardware.
-
This sounds familiar to me. Once you have all your stuff in the one file, you need some kind of Desktop Search functionality, no?
But not just Search. Because once you have the Search, you can do Other things with it like Videos. Need to remember Something? Use Search!
-
Search *mumblemumble* Video *mumblemumble* For the Masses!
-
-
We need a Cult of the Civilized Desktop Search Construction Kit.
-
There's hardly a WTF in the paper. From their abstract:
In this paper we ... show that in-memory operations are not always a guarantee for high performance, and may actually cause a considerable slow-down.
...
We argue that ... better developer awareness and coding practices are necessary to ensure in-memory computing can achieve its full potential.
Translation: we can write code so crappy that we achieve a considerable slow-down.
They certainly did prove that with their WTF code.
Now they only need to apply the better developer awareness and coding practices to themselves.
-
This sounds familiar to me. Once you have all your stuff in the one file, you need some kind of Desktop Search functionality, no?
Hanzo'd, ever so subtly, by 6 hours I'm afraid.
-
You're starting to become a meme all by yourself. And the guys in golang should give you your own referrer ID.
-
Hanzo'd, ever so subtly, by 6 hours I'm afraid.
Arriving 6 hours late is a barrier to remaining un-Hanzo'd...
-
Translation: we can write code so crappy that we achieve a considerable slow-down.
Yeah, they talk about how they "use code inspired by real, production software." Looking at what they do, it's probably some sort of bioinformatics thing, where I understand the data can get quite large.
The more I think about this, the more I suspect that the biologists did most of the work here with the computer guy helping to set up their code or doing the actual diagnostics or something, so he gets an easy paper added to his CV and the biologists get to look like smartypants to their biologist friends.
Huzzah for publish or perish!
-
But, but, these guys are Professional Scientists ....
What could possibly go wrong?
-
-
Do you want a map?
It'll have to be digital....
Folding that bad-boy to fit in my glove-box... not gonna happen.
-
Let me summarise the essential point of the paper:
If your chosen algorithm is horrible, your program will perform badly.
There, did I do it well?
-
For a summary, sure. For an actual article, you'll have to write code and measure its performance.
Actually, I think they did the hard part of that for you (writing deliberately bad code).
-
If I now write a paper debunking this crap, does that make it a real scientific research paper? I could even add it to my LinkedIn profile!
BTW, I would probably use this title for the paper.
-
Actually, I think they did the hard part of that for you (writing deliberately bad code).
Not agreeing with the premise that they deliberately wrote bad code [I have no knowledge]... But I will attest that writing bad code is actually harder than it looks. Over the years I have produced training material on improving code (not just performance, but that is the relevant part). Sure, there are certain types of code that are obviously crap, but much more important is the code that looks good at first glance, coupled with something that looks even better, where the exact opposite turns out to be the reality.
-
-
Looks like they took down the article. Have a like anyway.
-
What could possibly go wrong?
Obligatory response:
-
Looks like they took down the article.
This paper has been withdrawn by the authors for personal reasons.
The authors decided they didn't want any more people laughing at them?
-
Oh, it's still there if you look at the revision history.