Optimization by voodoo programming



  • Well, Alex doesn't seem to have considered this WTF-worthy, but I still do, so here it is: probably the most convoluted

    and inefficient method to write an array of bytes to disk in Java - and the guy who wrote it probably thought he

    was being extra good and efficient by using buffered streams...



    The depressing thing is that this code had already been spread by copy+paste when I discovered it. I think 

    I exterminated all instances.




    byte[] bImg = (byte[]) alRequest.getAttribute(ChartRetrieverConstants.IMG_DATA);


    File file = tmpFile.getFile();



    BufferedInputStream bis = new BufferedInputStream(new ByteArrayInputStream(bImg));

    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(file));

    int numBytesRead;

    while ((numBytesRead = bis.read()) != -1)

    {

        bos.write(numBytesRead);

    }



    bis.close();

    bos.flush();

    bos.close();




  • I'm no Java programmer, so I'm guessing from API docs here, but is this a better way to do it?


    byte[] bImg = (byte[]) alRequest.getAttribute(ChartRetrieverConstants.IMG_DATA);
    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(tmpFile.getFile()));
    bos.write(bImg);
    bos.close();

    At least one WTF in the code is obvious: bos.close() implies bos.flush(), according to the API docs.



  • @farnz said:

    I'm no Java programmer, so I'm guessing from API docs here, but is this a better way to do it?


    byte[] bImg = (byte[]) alRequest.getAttribute(ChartRetrieverConstants.IMG_DATA);
    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(tmpFile.getFile()));
    bos.write(bImg);
    bos.close();

    Close. You can leave out the BufferedOutputStream too; it's only additional overhead here.

    @farnz said:

    At least one WTF in the code is obvious: bos.close() implies bos.flush(), according to the API docs.



    Superfluous, but at least it happens only once. The buffered streams just mean your data gets copied twice extra with no benefit (in this case), but the biggest WTF is the loop, which results in a pointless method call for every single byte. A wonderful combination of loss of code clarity and loss of performance, resulting from a combination of lack of understanding and misguided optimization. So the total list of WTFs:

    1. Wrapping a byte[] into a ByteArrayInputStream when you're just going to loop over it anyway.
    2. Using a buffered input stream to read in-memory data.
    3. Using a buffered output stream when you're writing a single block of data.
    4. Looping to copy every byte separately rather than using the bulk operation.
    5. flush()ing a stream just before close()ing it.

    An especially nice touch is how 1 necessitates 4, which in turn makes 3 an improvement. 5 is minor, but there's absolutely no excuse for 1 and 2.

    I have two theories for what could have caused it:

    A) Programming by copy-and-paste-something-vaguely-like-what-you-need-and-fiddle-around-until-it-works. Such programmers should have their Ctrl-[CXV] privileges revoked.

    B) The programmer heard somewhere that "you always have to use buffered streams for IO to avoid bad performance" and took this as a rule that must always be followed, without the slightest understanding of why and when (and when not) to use buffered streams.
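
    To make the point concrete, the whole thing collapses to a single bulk write. A minimal sketch (assuming a plain byte[] and File, since alRequest and tmpFile are app-specific objects from the original snippet):

    ```java
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class BulkWrite {
        // Write the whole array with one bulk call; close() also flushes,
        // so no explicit flush() is needed.
        static void writeBytes(File file, byte[] data) throws IOException {
            try (FileOutputStream out = new FileOutputStream(file)) {
                out.write(data);
            }
        }

        public static void main(String[] args) throws IOException {
            File tmp = File.createTempFile("img", ".bin");
            tmp.deleteOnExit();
            writeBytes(tmp, new byte[] {1, 2, 3});
            System.out.println(tmp.length()); // prints 3
        }
    }
    ```

    No buffering, no input stream, no loop: the data is already in memory, so there's exactly one thing left to do with it.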



  • I wouldn't consider it that much of a WTF. I just assume that he didn't know that a BufferedOutputStream can write entire arrays of data. Operating on the assumption that you can't do that, the cost of the for loop is vastly overshadowed by the cost in IO. Yes, it's stupid, and harder to read, but I'm willing to bet that the difference in speed between the two is pretty insignificant.
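
    For what it's worth, the per-byte version costs a method call per byte even with buffering, while the bulk version is a single call. A rough, unscientific sketch for anyone who wants to measure it themselves (timings vary by machine, so no numbers claimed):

    ```java
    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class ByteVsBulk {
        // One write() call per byte, as in the original snippet.
        static void writePerByte(File f, byte[] data) throws IOException {
            try (BufferedOutputStream out =
                    new BufferedOutputStream(new FileOutputStream(f))) {
                for (byte b : data) {
                    out.write(b);
                }
            }
        }

        // One bulk write() call for the whole array.
        static void writeBulk(File f, byte[] data) throws IOException {
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(data);
            }
        }

        public static void main(String[] args) throws IOException {
            byte[] data = new byte[1 << 20]; // 1 MB of zeros

            File a = File.createTempFile("perByte", ".bin");
            a.deleteOnExit();
            long t0 = System.nanoTime();
            writePerByte(a, data);
            long perByte = System.nanoTime() - t0;

            File b = File.createTempFile("bulk", ".bin");
            b.deleteOnExit();
            long t1 = System.nanoTime();
            writeBulk(b, data);
            long bulk = System.nanoTime() - t1;

            System.out.println("per-byte: " + perByte + " ns, bulk: " + bulk + " ns");
        }
    }
    ```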



  • @brazzy said:

    A) Programming by
    copy-and-paste-something-vaguely-like-what-you-need-and-fiddle-around-until-it-works.
    Such programmers should have their Ctrl-[CXV] privileges revoked.

    They don't care! They can still use (Shift|Ctrl)+(Ins|Del)!

    @Volmarias said:

    I wouldn't consider it that much of a WTF. I just assume that he didn't know that a BufferedOutputStream can write entire arrays of data.

    Not only was the BufferedOutput redundant anyway, but not being able to read the frigging Javadoc (which, for all the hatred I have for Java, is still one of the clearest and most extensive API docs in existence) should be punished by an immediate death.



  • @brazzy said:

    So the total list of WTFs:


    1. Wrapping a byte[] into a ByteArrayInputStream when you're just going to loop over it anyway.
    2. Using a buffered input stream to read in-memory data.
    3. Using a buffered output stream when you're writing a single block of data.
    4. Looping to copy every byte separately rather than using the bulk operation.
    5. flush()ing a stream just before close()ing it.




    Forgot one:


    • Misleading variable name numBytesRead



      This actually indicates that the code was copied-and-modified from
      somewhere that used the normal idiom for copying large chunks of data
      between streams, which due to an API WTF needs to involve the
      number of bytes actually read
      by the InputStream.read(byte[]) method (which is its return value).
      That was probably too complicated for the genius to understand, so he
      switched to a manual loop over all bytes but was too lazy to change the
      variable name.
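
      The idiom the variable name was presumably lifted from looks roughly like this; a sketch of the standard buffer-copy loop, not the original code. read(byte[]) may fill less than the whole buffer, which is why its return value has to be threaded into the write:

      ```java
      import java.io.ByteArrayInputStream;
      import java.io.ByteArrayOutputStream;
      import java.io.IOException;
      import java.io.InputStream;
      import java.io.OutputStream;

      public class StreamCopy {
          // Copy a stream in 8 KB chunks. read(byte[]) returns the number of
          // bytes actually read (or -1 at end of stream), so only that many
          // bytes of the buffer may be written out.
          static void copy(InputStream in, OutputStream out) throws IOException {
              byte[] buffer = new byte[8192];
              int numBytesRead;
              while ((numBytesRead = in.read(buffer)) != -1) {
                  out.write(buffer, 0, numBytesRead);
              }
          }

          public static void main(String[] args) throws IOException {
              ByteArrayOutputStream sink = new ByteArrayOutputStream();
              copy(new ByteArrayInputStream(new byte[] {1, 2, 3}), sink);
              System.out.println(sink.size()); // prints 3
          }
      }
      ```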


  • I'd think java.io.RandomAccessFile would be the fastest way to write those bytes to disk.


    I'd laugh at the guy that wrote that code, but I did more or less
    the same thing a few weeks ago. When I realized it was slow as molasses,
    I Googled it and found a better way to do it.



  • @compay said:

    I'd think java.io.RandomAccessFile would be the fastest way to write those bytes to disk.




    RAF doesn't really have any speed advantages over a simple
    FileOutputStream; in fact it may be slower if you have lots of small
    writes, because you cannot buffer it. The only reason to use it is if
    you need the features (random access, d'oh, being able to create sparse
    files, truncate files).
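
    To illustrate what those features actually look like, a quick sketch (the file name is made up; whether seeking past the end really produces a sparse file depends on the filesystem):

    ```java
    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class RafDemo {
        public static void main(String[] args) throws IOException {
            File f = File.createTempFile("raf", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.seek(100);     // jump past the end: random access
                raf.write(42);     // file length is now 101, with a hole before it
                raf.setLength(50); // truncate the file in place
            }
            System.out.println(f.length()); // prints 50
        }
    }
    ```

    None of that is possible with a plain FileOutputStream, which is exactly the point: RAF buys features, not speed.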



  • Thing is, reading and then writing single bytes is significantly slower than reading/writing in blocks of X bytes.

