Java: String vs. StringBuffer



  • So what's the conventional wisdom here among Java folks? Not on the
    obvious cases, like building a string up in a loop or working with very
    large strings that need to be modified, but on the more typical,
    non-literal cases?


    • This seems to be the prevailing sentiment: http://fishbowl.pastiche.org/2002/09/04/the_stringbuffer_myth
    • Similar sentiments in this thread: http://www.caucho.com/support/resin-interest/0106/0576.html
    • Good stuff here: http://www.jroller.com/page/sterat
    • ...and here: http://www-128.ibm.com/developerworks/java/library/j-jtp04223.html?ca=dgr-lnxw10JavaUrbanLegends

    I've always been old-school--any time I concatenate more than 2 strings I would try to use a StringBuffer.  But I did some reading last night and it appears that I've been wasting my time, and even making things worse (aside from writing less-readable code).

    I did learn/re-learn some things while reading up on this though:
    • StringBuffers are synchronized (StringBuilder in 5.0 is not)
    • Initial capacity of StringBuffer is key to helping performance. I knew this about ArrayLists, but I'm embarrassed to say that I did not apply the idea to my (valid) use of StringBuffers.
    • My insistence on using "final" for private variables that are, indeed, final, might actually be useful (at least for Strings), since the compiler can then easily treat them as literals.
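
A minimal sketch of the initial-capacity point, for anyone who wants to see it in code. The class and the `joinWithCommas` helper are made up for illustration and do not come from any of the linked articles; the sketch also assumes the list contains no null entries.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; all names here are illustrative only.
class BuilderCapacityDemo {

    // Joins the parts with ", ", sizing the builder up front so its
    // backing array does not have to grow while appending.
    static String joinWithCommas(List<String> parts) {
        int capacity = 0;
        for (String part : parts) {
            capacity += part.length() + 2; // room for the part plus ", "
        }
        // StringBuilder is the unsynchronized Java 5 sibling of StringBuffer.
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithCommas(Arrays.asList("alpha", "beta", "gamma")));
        System.out.println(new StringBuilder().capacity());   // default capacity
        System.out.println(new StringBuilder(64).capacity()); // requested capacity
    }
}
```

Whether pre-sizing pays off depends on how far the final length exceeds the default capacity, so it is worth measuring before relying on it.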

    Any thoughts to add?  [By the way, I've been enjoying this site immensely since a friend sent me the link last week.]

    --
    Todd



  • @tpierzina said:

    So what's the conventional wisdom here among Java folks? Not on the obvious cases, like building a string up in a loop or working with very large strings that need to be modified, but on the more typical, non-literal cases?

    not a java folk myself (currently), but i've worked with java in the past (and i <3 it).  i generally live by two principles when i code:

    1. the compiler is always correct.  if it is not, problems with it are quickly addressed.  especially with java's large number of users, you're sure to find clever optimizations such as the one illustrated by the first link.  compilers are moving more and more towards optimizations that transform readable, "inefficient" code into blazing-fast code.

    2. a successful optimization of a slow feature in a typical program will not come from throwing "efficient" code at the compiler, but by stopping to think and rework the design/algorithms used.  especially painful is to code up something "efficient" for a particular platform/compiler and then watch the updated version of the platform/compiler two months later take your "efficient" code and make it slower than it was before.

    and of course, the bottleneck is never where you expect it.



  • @tpierzina said:

    So what's the conventional wisdom here among Java folks? Not on the
    obvious cases, like building a string up in a loop or working with very
    large strings that need to be modified, but on the more typical,
    non-literal cases?




    Well, my wisdom on String-Concatenation is like this:



    1. Readability first, performance second.

    2. Ignore rule #1 for code parts that are run very often.



    Effectively I almost never use StringBuffer, if I can write a single
    expression to construct the result String. (Creating a new String in a
    loop is NOT an expression, so "for (...) a = ... + a + ...;" does NOT
    qualify for NOT using StringBuffer/Builder.)
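
To make the loop case concrete, here is a small sketch. The class and method names are hypothetical, and the comment about what the `+` version compiles to assumes a Java 5 era compiler like the ones discussed in this thread.

```java
// Hypothetical demo class; all names here are illustrative only.
class LoopConcatDemo {

    // On a Java 5 era compiler, each pass through the loop becomes roughly
    // "new StringBuilder, append, append, toString", so the growing result
    // is copied again on every iteration.
    static String withPlus(String[] words) {
        String result = "";
        for (String word : words) {
            result = result + word + " ";
        }
        return result;
    }

    // One builder is reused for the whole loop and converted to a String once.
    static String withBuilder(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.append(word).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not"};
        System.out.println(withPlus(words).equals(withBuilder(words)));
    }
}
```

Both methods return the same text; the difference is only in how many temporary objects and copies are made along the way.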



    If the single expression requires many inline conditionals ( ... ? ... : ... ) then I will rewrite to use a StringBuffer.





    As for what the compiler does:



    package com.thedailywtf;

    public class StringConcatenationDemo {

        // compile-time constants
        static final String B = "B";
        static final String C = "C";
        static final String D = "D";

        // ordinary (non-final) fields
        static String b = "b";
        static String c = "c";
        static String d = "d";

        static boolean what = false;

        public static void main(String[] args) {
            String a;
            a = B + C + D;
            a = b + c + d;
            a = (what ? b : b + c) + d;
        }

    }

    Here main() will effectively have bytecode doing this (Java 5):



    // for: a = B + C + D
    a = "BCD";

    // for: a = b + c + d
    a = new StringBuilder(String.valueOf(b)).append(c).append(d).toString();

    // for: a = (what ? b : b + c) + d
    a = new StringBuilder(String.valueOf(what ? b
            : new StringBuilder(String.valueOf(b)).append(c).toString()))
            .append(d).toString();
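
One way to observe the difference between the first two cases without reading bytecode is to compare object identity. This is only a sketch with a made-up class name. It relies on two language guarantees: constant String expressions are interned, and a non-constant concatenation yields a newly created String.

```java
// Hypothetical demo class; all names here are illustrative only.
class ConstantFoldingDemo {

    static final String B = "B";
    static final String C = "C";
    static final String D = "D";

    static String b = "b";
    static String c = "c";
    static String d = "d";

    // B + C + D is a compile-time constant expression, so it is folded to
    // the literal "BCD" and refers to the same interned instance.
    static boolean constantIsFolded() {
        return (B + C + D) == "BCD";
    }

    // b + c + d is evaluated at run time and yields a new String object:
    // equal in content to "bcd", but not the same instance as the literal.
    static boolean runtimeConcatIsNewObject() {
        String a = b + c + d;
        return a.equals("bcd") && a != "bcd";
    }

    public static void main(String[] args) {
        System.out.println(constantIsFolded());
        System.out.println(runtimeConcatIsNewObject());
    }
}
```

The `==` comparisons are deliberate here; in normal code, Strings should of course be compared with `equals`.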



    As you can see, every inline conditional that contains subexpressions
    introduces another StringBuffer/Builder. Getting rid of them may
    therefore be faster, though less readable, by rewriting that last
    expression like this:



    StringBuilder tmp = new StringBuilder(String.valueOf(b)); // or without valueOf if b is never null
    if (!what) {
      tmp.append(c);
    }
    a = tmp.append(d).toString();

    Problem here: every time you write "tmp." extra bytecode is created. In
    code that is called several million times, this may add up. Hence
    the best-performing and, again, more readable solution is to
    duplicate code:



    if (what) {
      a = b + d;
    } else {
      a = b + c + d;
    }

    (This looks great in this example, but think of expressions having 3
    conditions and more than 10 variables and literals in the
    concatenation: that would give you 8 cases, each repeating most of
    the literals.)
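
For comparison, here are the three variants from this post side by side as methods. The class and method names are invented for the sketch. It only shows that the variants produce the same result and makes no claim about their relative speed.

```java
// Hypothetical demo class; all names here are illustrative only.
class ConditionalConcatDemo {

    // Variant 1: single expression with an inline conditional.
    static String nested(boolean what, String b, String c, String d) {
        return (what ? b : b + c) + d;
    }

    // Variant 2: one explicit builder, with the conditional as a statement.
    static String oneBuilder(boolean what, String b, String c, String d) {
        StringBuilder tmp = new StringBuilder(String.valueOf(b));
        if (!what) {
            tmp.append(c);
        }
        return tmp.append(d).toString();
    }

    // Variant 3: duplicated concatenation, one plain expression per case.
    static String duplicated(boolean what, String b, String c, String d) {
        if (what) {
            return b + d;
        }
        return b + c + d;
    }

    public static void main(String[] args) {
        for (boolean what : new boolean[] {true, false}) {
            System.out.println(nested(what, "b", "c", "d")
                    + " " + oneBuilder(what, "b", "c", "d")
                    + " " + duplicated(what, "b", "c", "d"));
        }
    }
}
```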



    However, out of 1000 String expressions that I write and let the
    compiler optimize, at most a single one is a candidate for this kind of
    by-hand optimization. So as always: unless you have a performance
    problem, DO NOT (pre-)OPTIMIZE!



    cu




  • Interesting points, eagle. I tend to agree with your philosophy on optimization, i.e., readability first and performance second. In fact, I generally take this beyond the realm of string concatenation (unless, of course, I run into performance problems).

    I have a quick question about your code example though. Assuming the String "a" is read later on in the code, would Sun's optimizer simply just omit the first two lines of concatenation? They have no effect on the program, and it seems like this is something a reasonably well written optimizer should be able to determine (I understand what you're trying to demonstrate, just asking 'cos you seem to know a bit about optimizers).

    Also, in your third concatenation example, may I suggest:

    a = b + (what ? c : "") + d;

    This is, IMO, the most readable way to accomplish what we're trying to accomplish (it clearly shows that regardless of what "what" is, b and d are going to be the front and back of the string, and c may be added, depending on the value of "what"). Also, it's only marginally less optimal than your example (i.e., when "what" is false, we're still stuck with two appends, rather than just one).



  • < quote user="PeaceLove&WTF" >



    I have a quick question about your code example though. Assuming the
    String "a" is read later on in the code, would Sun's optimizer simply
    just omit the first two lines of concatenation? They have no effect on
    the program, and it seems like this is something a reasonably well
    written optimizer should be able to determine (I understand what you're
    trying to demonstrate, just asking 'cos you seem to know a bit about
    optimizers).



    < /quote >



    I'm not quite sure if I understand your question. Do you mean this:



    a = expr1;

    a = expr2;

    a = expr3; // now assignments from expr1 and expr2 are just superfluous!



    ??



    If so, then yes, if expr1 and expr2 do not have side-effects, they
    could be optimized away, yet I do not know if any JIT-Compiler does
    this.



    < quote user="PeaceLove&WTF" >



    Also, in your third concatenation example, may I suggest:



    a = b + (what ? c : "") + d;



    This is, IMO, the most readable way to accomplish what we're trying to
    accomplish (it clearly shows that regardless of what "what" is, b and d
    are going to be the front and back of the string, and c may be added,
    depending on the value of "what"). Also, it's only marginally less
    optimal than your example (i.e., when "what" is false, we're still
    stuck with two appends, rather than just one).



    < /quote >



    First of all it should be



    a = b + (what ? "" : c) + d;
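
A quick check of why the branches have to be swapped, written as a sketch with invented class and method names. In the original expression, c is appended only when "what" is false.

```java
// Hypothetical demo class; all names here are illustrative only.
class TernaryOrderDemo {

    // The expression from the first example.
    static String original(boolean what, String b, String c, String d) {
        return (what ? b : b + c) + d;
    }

    // The suggested rewrite as first posted: appends c when "what" is true.
    static String suggested(boolean what, String b, String c, String d) {
        return b + (what ? c : "") + d;
    }

    // The rewrite with the branches swapped: appends c when "what" is false.
    static String corrected(boolean what, String b, String c, String d) {
        return b + (what ? "" : c) + d;
    }

    public static void main(String[] args) {
        for (boolean what : new boolean[] {true, false}) {
            System.out.println(what + ": " + original(what, "b", "c", "d")
                    + " " + suggested(what, "b", "c", "d")
                    + " " + corrected(what, "b", "c", "d"));
        }
    }
}
```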



    Well, sure, this works for this simplified example, but the point I
    wanted to illustrate is the two levels of StringBuilders introduced
    by inline conditionals with subexpressions. Perhaps I should have
    used this non-reducible example:



    a = (what ? "Looks like " + b + " is element of " : "Looks like " + b + " and " + c + " are elements of ") + d + ".";



    But wouldn't you agree, it would have distracted from the point to be illustrated?



    cu



    (And, yes I know, this last example is bad code regarding internationalization, but it's an example, for god's sake!)



    (And yes, I know how to quote, but this f**king forum SW always rejected my post: "Non matching quote blocks in post." Where?)

