Java: String vs. StringBuffer

tpierzina

So what's the conventional wisdom here among Java folks? Not on the
obvious cases, like building a string up in a loop or working with very
large strings that need to be modified, but on the more typical,
non-literal cases?

This seems to be the prevailing sentiment: http://fishbowl.pastiche.org/2002/09/04/the_stringbuffer_myth
Similar sentiments in this thread: http://www.caucho.com/support/resin-interest/0106/0576.html
Good stuff here: http://www.jroller.com/page/sterat
...and here: http://www-128.ibm.com/developerworks/java/library/j-jtp04223.html?ca=dgr-lnxw10JavaUrbanLegends

I've always been old-school--any time I concatenate more than 2 strings I would try to use a StringBuffer. But I did some reading last night and it appears that I've been wasting my time, and even making things worse (aside from writing less-readable code).

I did learn/re-learn some things while reading up on this though:

StringBuffers are synchronized (StringBuilder in 5.0 is not)
Initial capacity of StringBuffer is key to helping performance. I knew this about ArrayLists, but I'm embarrassed to say that I did not apply the idea to my (valid) use of StringBuffers.
My insistence on using "final" for private variables that are, indeed, final, might actually be useful (at least for Strings), since the compiler can then easily treat them as literals.
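
A minimal sketch of the last two points, not a benchmark. The class and method names are made up for illustration. It assumes only what the language spec guarantees: a concatenation of final String constants is a compile-time constant (so it is the same interned object as the equivalent literal), while a concatenation of non-final variables creates a new String at run time. The presized() helper shows the initial-capacity idea applied to a StringBuilder.

```java
final class ConstantFoldingSketch {
    // Compile-time constants: final and initialized with constant expressions.
    static final String B = "B";
    static final String C = "C";

    // Not constants: without final, the values are only known at run time.
    static String b = "B";
    static String c = "C";

    // B + C is a constant expression, so the compiler folds it to "BC"
    // and it refers to the same interned object as the literal.
    static boolean constantsFolded() {
        return (B + C) == "BC"; // identity comparison on purpose
    }

    // b + c is evaluated at run time and yields a newly created String.
    static boolean variablesFolded() {
        return (b + c) == "BC"; // identity comparison on purpose
    }

    // Hypothetical helper: size the builder up front so the internal
    // buffer never has to grow. Assumes no element of parts is null.
    static String presized(String... parts) {
        int length = 0;
        for (String part : parts) {
            length += part.length();
        }
        StringBuilder sb = new StringBuilder(length);
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(constantsFolded()); // true
        System.out.println(variablesFolded()); // false
        System.out.println(presized("a", "bc", "d")); // abcd
    }
}
```

The == comparisons are there only to make the folding visible; normal code should compare Strings with equals().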

Any thoughts to add? [By the way, I've been enjoying this site immensely since a friend sent me the link last week.]

--
Todd

emptyset

@tpierzina said:

So what's the conventional wisdom here among Java folks? Not on the obvious cases, like building a string up in a loop or working with very large strings that need to be modified, but on the more typical, non-literal cases?

not a java folk myself (currently), but i've worked with java in the past (and i <3 it). i generally live by two principles when i code:

1. the compiler is always correct. if it is not, problems with it are quickly addressed. especially with java's large number of users, you're sure to find clever optimizations such as the one illustrated by the first link. compilers are moving more and more towards optimizations that transform readable, "inefficient" code into blazing-fast code.

2. a successful optimization of a slow feature in a typical program will not come from throwing "efficient" code at the compiler, but by stopping to think and rework the design/algorithms used. especially painful is to code up something "efficient" for a particular platform/compiler and then watch the updated version of the platform/compiler two months later take your "efficient" code and make it slower than it was before.

and of course, the bottleneck is never where you expect it.

eagle

@tpierzina said:

So what's the conventional wisdom here among Java folks? Not on the
obvious cases, like building a string up in a loop or working with very
large strings that need to be modified, but on the more typical,
non-literal cases?

Well, my wisdom on String-Concatenation is like this:

1. Readability first, performance second.

2. Ignore rule #1 for code parts that are run very often.

Effectively I almost never use StringBuffer if I can write a single
expression to construct the result String. (Creating a new String in a
loop is NOT a single expression, so "for (...) a = ... + a + ...;" is
NOT a case where I would skip StringBuffer/Builder.)

If the single expression requires many inline conditionals ( ... ? ... : ... ) then I will rewrite to use a StringBuffer.
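
A small sketch of that rule of thumb, with made-up method names (greeting and join are illustrations only, not from any library): a single expression is left to the compiler, while the loop gets one explicit StringBuilder that is reused across iterations.

```java
final class LoopConcatSketch {
    // A single expression: readable, and the compiler builds it in one go.
    static String greeting(String name, int count) {
        return "Hello " + name + ", you have " + count + " messages.";
    }

    // Concatenation in a loop: one explicit builder instead of
    // "result = result + ..." creating a fresh String on every iteration.
    static String join(String[] items, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(items[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greeting("Todd", 3));
        System.out.println(join(new String[] {"a", "b", "c"}, ", "));
    }
}
```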

As for what the compiler does:

package com.thedailywtf;

public class StringConcatenationDemo {

    static final String B = "B";
    static final String C = "C";
    static final String D = "D";

    static String b = "b";
    static String c = "c";
    static String d = "d";

    static boolean what = false;

    public static void main(String[] args) {
        String a;
        a = B + C + D;
        a = b + c + d;
        a = (what ? b : b + c) + d;
    }
}

Here main() will effectively have bytecode doing this (Java 5):

// for: a = B + C + D
a = "BCD";

// for: a = b + c + d
a = new StringBuilder(String.valueOf(b)).append(c).append(d).toString();

// for: a = (what ? b : b + c) + d
a = new StringBuilder(String.valueOf((what ? b : new
StringBuilder(String.valueOf(b)).append(c).toString()))).append(d).toString();

As you can see, every inline conditional that contains subexpressions
will introduce another StringBuffer/Builder. Hence it may be less
readable but faster to get rid of them and rewrite that last
expression like this:

StringBuilder tmp = new StringBuilder(String.valueOf(b)); // or without valueOf if b is never null
if (!what) {
    tmp.append(c);
}
a = tmp.append(d).toString();



Problem here: every time you write "tmp." extra bytecode is created. In
code that is called several million times, this may add up. Hence
the best-performing, and again more readable, solution is to
duplicate code:

if (what) {
    a = b + d;
} else {
    a = b + c + d;
}



(This looks great in this example, but think of expressions having 3
conditions and more than 10 variables and literals in the
concatenation, this would give you 8 cases repeating most of the
literals.)

However, out of 1000 String expressions that I write and let the
compiler optimize, at most a single one is a candidate for this kind of
hand optimization. So as always: unless you have a performance
problem, DO NOT (pre-)OPTIMIZE!
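
For anyone who wants to try the three variants side by side, here is a minimal sketch that wraps each one in a method and checks that they produce the same String. The class and method names are made up for illustration, and it says nothing about speed; measuring that would need a proper benchmark on your own JVM.

```java
final class ConditionalConcatSketch {
    // Variant 1: inline conditional with a nested concatenation.
    static String ternary(boolean what, String b, String c, String d) {
        return (what ? b : b + c) + d;
    }

    // Variant 2: one explicit builder with a conditional append.
    static String builder(boolean what, String b, String c, String d) {
        StringBuilder tmp = new StringBuilder(String.valueOf(b));
        if (!what) {
            tmp.append(c);
        }
        return tmp.append(d).toString();
    }

    // Variant 3: duplicated expressions, one per branch.
    static String branches(boolean what, String b, String c, String d) {
        if (what) {
            return b + d;
        }
        return b + c + d;
    }

    public static void main(String[] args) {
        for (boolean what : new boolean[] {true, false}) {
            String expected = ternary(what, "b", "c", "d");
            boolean same = expected.equals(builder(what, "b", "c", "d"))
                    && expected.equals(branches(what, "b", "c", "d"));
            System.out.println(what + " -> " + expected + " (all equal: " + same + ")");
        }
    }
}
```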

cu

PeaceLove_WTF

Interesting points, eagle. I tend to agree with your philosophy on optimization, i.e., readability first and performance second. In fact, I generally take this beyond the realm of string concatenation (unless, of course, I run into performance problems).

I have a quick question about your code example though. Assuming the String "a" is read later on in the code, would Sun's optimizer simply omit the first two lines of concatenation? They have no effect on the program, and it seems like this is something a reasonably well-written optimizer should be able to determine (I understand what you're trying to demonstrate, just asking 'cos you seem to know a bit about optimizers).

Also, in your third concatenation example, may I suggest:

a = b + (what ? c : "") + d;

This is, IMO, the most readable way to accomplish what we're trying to accomplish (it clearly shows that regardless of what "what" is, b and d are going to be the front and back of the string, and c may be added, depending on the value of "what"). Also, it's only marginally less optimal than your example (i.e., when "what" is false, we're still stuck with two appends, rather than just one).

eagle

< quote user="PeaceLove&WTF" >

I have a quick question about your code example though. Assuming the
String "a" is read later on in the code, would Sun's optimizer simply
just omit the first two lines of concatenation? They have no effect on
the program, and it seems like this is something a reasonably well
written optimizer should be able to determine (I understand what you're
trying to demonstrate, just asking 'cos you seem to know a bit about
optimizers).

< /quote >

I'm not quite sure if I understand your question. Do you mean this:

a = expr1;
a = expr2;
a = expr3; // now assignments from expr1 and expr2 are just superfluous!

??

If so, then yes, if expr1 and expr2 do not have side-effects, they
could be optimized away, yet I do not know if any JIT-Compiler does
this.

< quote user="PeaceLove&WTF" >

Also, in your third concatenation example, may I suggest:

a = b + (what ? c : "") + d;

This is, IMO, the most readable way to accomplish what we're trying to
accomplish (it clearly shows that regardless of what "what" is, b and d
are going to be the front and back of the string, and c may be added,
depending on the value of "what"). Also, it's only marginally less
optimal than your example (i.e., when "what" is false, we're still
stuck with two appends, rather than just one).

< /quote >

First of all it should be

a = b + (what ? "" : c) + d;

Well, sure, this works for this simplified example, but the point I
wanted to illustrate is the two levels of StringBuilders introduced
by inline conditionals with subexpressions. Perhaps I should have
used this non-reducible example:

a = (what ? "Looks like " + b + " is element of " : "Looks like " + b + " and " + c + " are elements of ") + d + ".";

But wouldn't you agree, it would have distracted from the point to be illustrated?
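
For completeness, a rough sketch of how that longer expression could be written with a single explicit builder instead of the nested ones. The class and method names are made up for illustration, and whether this is actually faster would have to be measured.

```java
final class NestedConditionalSketch {
    // The expression as written: each branch of the conditional is its own
    // concatenation, and the result is concatenated again with d and ".".
    static String ternary(boolean what, String b, String c, String d) {
        return (what ? "Looks like " + b + " is element of "
                     : "Looks like " + b + " and " + c + " are elements of ") + d + ".";
    }

    // The same text built with one builder; the shared prefix is appended once.
    static String singleBuilder(boolean what, String b, String c, String d) {
        StringBuilder sb = new StringBuilder("Looks like ").append(b);
        if (what) {
            sb.append(" is element of ");
        } else {
            sb.append(" and ").append(c).append(" are elements of ");
        }
        return sb.append(d).append('.').toString();
    }

    public static void main(String[] args) {
        System.out.println(singleBuilder(true, "x", "y", "S"));
        System.out.println(singleBuilder(false, "x", "y", "S"));
    }
}
```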

cu

(And, yes I know, this last example is bad code regarding internationalization, but it's an example, for god's sake!)

(And yes, I know how to quote, but this f**king forum SW always rejected my post: "Non matching quote blocks in post." Where?)