Someone explain this to me - string.Format compared to concatenation



  • Comment from someone on a code review:

    Please remember that StringBuilder or string.Format is more preferred then the string concatenation operator.

    Is this just a style thing? Or is there a practical difference between:

     var baz = blah.ToString() + "-" + foo.ToString();
    

    and:

    var baz = string.Format("{0}-{1}", blah.ToString(), foo.ToString())
    

    other than the latter is much more difficult to read and requires a shitload more typing?


    I 100% get that StringBuilder is a better option if you know in advanced that you're doing dozens of concatenations at once, but we're talking about error logging lines that will likely be executed once a month at best, for that StringBuilder is complete overkill.


  • FoxDev

    @blakeyrat said:

    var baz = string.Format("{0}-{1}", blah, foo)

    Fixed that potential null pointer exception for you.

    when using string.Format ToString() is called implicitly in a way that is NPE safe, and so it is not necessary to explicitly call it.

    other than that.... it's pretty much just a style thing, and if the business has settled on a style it's a good idea to either conform to it, or make a reasoned campaign to change it to something more sane.


  • Trolleybus Mechanic

    I'm pretty sure somewhere on the old forum, I did a StringBuilder to "xxx"+"yyy" comparison. String builder comes out on top if you're doing tens to hundreds of thousands of concats, otherwise you're saving milliseconds.

    For String.Format vs. "xxx"+"yyy", I do prefer****strong text String.Format. If only because:

    1. It allows for numeric and date formatting {0:N2} or {1:yyyy-MM-dd}
    2. The String.Format format is well documented and universal, so it discourages devs from inventing their own templating language (and then re-inventing it in each function).
    3. It makes it easier to see what the string is in a single chunk, rather than trying to horizontal scroll across a huge wide string to figure out what's going on. SomeDumbVariable + " - " + SomeDumberVariable + " - " + Lloyd
    4. Makes for easier localization.
      EN:
      Your dumbfuck level is {0}, Jeff

    FR:
    JESUS FUCKING CHRIST CAN DISCOURESE JUST FORMAT THIS SHIT PROPERLY THIS IS THE THIRD TIME I'M EDITING!
    Le Jeff, le level au doomfuk est {0} ici



  • This happened to me not long ago:

    @fbmac said:

    That is the problem with cargo cult and strict rules. A good coder should understand why something is bad and know when it is too much or too little, but I frequently hear people preaching strict rules and applying then when they do more damage than help.

    Edit: the worse that happened to me was when they forbid us to use the "+" operator in c# strings anywhere, no exceptions. Because String.Concat is faster in some rare cases where that would matter.



  • @blakeyrat said:

    other than the latter is much more difficult to read

    Hey, that's just, like, your opinion, man.

    StringBuilder is definitely an overkill, and it'd read horribly without question. Format... I like it, seems to leave less room for silly mistakes like calling .ToString() on a null, or adding two numbers accidentally.

    But yeah, ultimately question of style.



  • @blakeyrat said:

    Is this just a style thing? Or is there a practical difference between

    There is practical no difference between them if the concatenation isn't in a loop. The compiler will implement a concatenation as a StringBuilder if it sees a benefit. Also, concatenating short strings has almost no measurable ill effects - the performance problems happen as the string gets larger.



  • @Lorne_Kates said:

    1) It allows for numeric and date formatting {0:N2} or {1:yyyy-MM-dd}

    I don't need that.

    @Lorne_Kates said:

    2) The String.Format format is well documented and universal, so it discourages devs from inventing their own templating language (and then re-inventing it in each function).

    Irrelevant for this use.

    @Lorne_Kates said:

    3) It makes it easier to see what the string is in a single chunk, rather than trying to horizontal scroll across a huge wide string to figure out what's going on. SomeDumbVariable + " - " + SomeDumberVariable + " - " + Lloyd

    I highly disagree with that.

    @Lorne_Kates said:

    4) Makes for easier localization.

    Irrelevant for this use.


    I'm actually really cheesed about this code review in general, I'm getting slammed on because:

    1. I separated the work from the GUI ("why did you put this work in a thread? Because it shouldn't need to know about Console to communicate to the outside world." "Oh well I guess that makes sense, but you don't need two threads to do that you could have had a callback blah blah blah")

    2. Then when I convinced my boss it makes sense to separate the logic from the UI, he gets on my case about using a BackgroundWorker to do it. Why? It's a handy class that already does 95% of what I need, it's in the framework, it's not OS-specific... I honestly don't understand what his beef is. I guess he would have preferred I implemented a new worker thread class from scratch.

    Grump.

    I am definitely in the wrong industry.



  • @Lorne_Kates said:

    3) It makes it easier to see what the string is in a single chunk, rather than trying to horizontal scroll across a huge wide string to figure out what's going on. SomeDumbVariable + " - " + SomeDumberVariable + " - " + Lloyd

    If the string is something like "Welcome Mr {0}, the time is {1}, there are {2} unread messages", yes, it's much more clear.

    If it's more like "{0}: {1} - {2}", though, there's not much of an advantage.



  • Does "{0}-{1}" have to be parsed at runtime? If so, I don't see how string.Format could possibly be faster than concatenation.



  • @anonymous234 said:

    If the string is something like "Welcome Mr {0}, the time is {1}, there are {2} unread messages", yes, it's much more clear.

    No it's not, because I have to do all this mental exercise to figure out what gets filled-in at those placeholders, and the info I need to do that is WAY out of context with the rest of the string.

    @anonymous234 said:

    If it's more like "{0}: {1} - {2}", though, there's not much of an advantage.

    Since this project contains status and log lines, 99.9% of them are like this.


  • ♿ (Parody)

    Firstly, I'd reject that comment for bad grammar. However...

    @blakeyrat said:

    other than the latter is much more difficult to read and requires a shitload more typing?

    It's a style thing. The latter is, in general, immensely more readable to me for not all that much extra typing. For something as simple as what you showed, I wouldn't care too much, though.

    But also, I wouldn't put cargo cult optimization past your reviewer.



  • @boomzilla said:

    But also, I wouldn't put cargo cult optimization past your reviewer.

    I never mentioned optimization, I don't know where people are getting this from.

    I asked if there was a practical difference, and the answer is: there is-- the handling of .ToString() on null objects is better. Which is great and all, but I wish it wasn't so fucking difficult to read that.

    Here's one I just changed a second ago:

    webClient.UploadData( webHost + "/api/foobar", "POST", baz );
    

    To:

    webClient.UploadData( string.Format( "{0}/api/foobar", webHost ), "POST", baz );
    

    The second version is a WHORE to read compared to the first. You encounter the relevant data in literally REVERSE ORDER. Not to mention it's almost twice as fucking long. Anybody who says they prefer the bottom example is an idiot.



  • I think the clearest way is how languages like PHP or Bash do it

    "Welcome Mr {person.getName()}, the time is {time.now()}, there are {len(unread_messages)} unread messages"

    webClient.UploadData( "{webhost}/api/foobar", "POST", baz );

    But I don't think there's any way to implement it cleanly in a compiled language.



  • @anonymous234 said:

    I think the clearest way is how languages like PHP or Bash do it

    Yeah I'd be ok with that.


  • ♿ (Parody)

    @blakeyrat said:

    I never mentioned optimization, I don't know where people are getting this from.

    It's from the reviewer talking about it being "more preferred." That's a common cargo cult optimization thing about people who heard that story a long time ago.

    @blakeyrat said:

    Anybody who says they prefer the bottom example is an idiot.

    Something super simple like that I would be fine with. It's more complex messages where I prefer the String.format() method.



  • @blakeyrat said:

    Yeah I'd be ok with that.

    You might want to familiarize your devs with string interpolation, then.

    $"Name = {name}, hours = {hours:hh}"
    var s = $"hello, {name}"
    System.IFormattable s = $"Hello, {name}"
    System.FormattableString s = $"Hello, {name}"
    $"{person.Name, 20} is {person.Age:D3} year {(p.Age == 1 ? "" : "s")} old." 
    


  • @blakeyrat said:

    > Please remember that StringBuilder or string.Format is more preferred then the string concatenation operator.

    Please, remember that the point of Code Review is to check for bugs, missing error checks, unnecessarily specific or complex logic and other errors related to usability of the code, not nitpicking minor style issues. There are better uses of everyone's time than fixing nits.



  • The funny thing is he called out a couple nits, but this he stood fast on. Whatever.



  • @Maciejasjmj said:

    You might want to familiarize your devs with string interpolation, then.

    Doesn't that only work with C# 6.0 ???



  • @uncreative said:

    Doesn't that only work with C# 6.0 ???

    Well, yes. But I think if @blakeyrat wasn't on the latest and greatest, we'd see him ranting much more about it...



  • @blakeyrat said:

    The funny thing is he called out a couple nits, but this he stood fast on. Whatever.

    IF it's done that way in the rest of the code, maybe it's a good idea to follow. But personally, working code is always better than "correct" code. At most a teaching moment, but not something to hound you on.



  • I got him to back down on reimplementing 90% of BackgroundThread just so I could use Thread.Join() instead of polling. At least.

    He also called out a bunch of "efficiency" things which don't fucking matter in an app that spends 99.9% of its time waiting for a web service to return, but they're easy to change so what fucking ever.



  • True... @blakeyrat , what version of C#??



  • @Maciejasjmj said:

    Well, yes. But I think if @blakeyrat wasn't on the latest and greatest, we'd see him ranting much more about it...

    This project is 4.5.

    But your alternative seems to be more annoying to implement, if it requires all those lines. But I barely looked at it.


  • Trolleybus Mechanic

    @blakeyrat said:

    I don't need that.

    Then don't use it. "X"+"Y" will work for you.

    @blakeyrat said:

    Irrelevant for this use.

    Then don't use it. "X"+"Y" will work for you.

    @blakeyrat said:

    I highly disagree with that.

    I highly disagree with your disagreement. I've seen string concating a dozen variables, interspaced with text-- and often an x ? y : z decider in there. No line breaks. Usually to format a hyperlink or some other dumbshit.

    99% of the time it can be refactored by getting a better developer. Othertimes, you just want to know:

    String.Format("{4:yyyy-MM-dd}", FormatDumbassLongHyperlink(If(IsHttp, Link.HttpVersion, Link.HttpsVersion)), GetHyperlinkClassForItem(If(Item.LinkClass IsNothing Then Website.DefaultLinkClass, Item.LinkClass)), If(Item.Target.Is....

    And so forth. I got a phone call while typing this, and now even I can't remember how I was going to finish typing that, so my case is proven.

    @blakeyrat said:

    Irrelevant for this use.

    Then don't use it. "X"+"Y" will work for you.



  • @blakeyrat said:

    But your alternative seems to be more annoying to implement, if it requires all those lines. But I barely looked at it.

    Those are examples, Blakey.

    Basically instead of

    String.Format("This is my {0} string which has {1} and {2} in it", something, otherThing, some.Other.Thing);
    

    you do

    $"This is my {something} string which has {otherThing} and {some.Other.Thing} in it."
    


  • @Maciejasjmj said:

    $"This is my {something} string which has {otherThing} and {some.Other.Thing} in it."

    And this is why I'm sad to be leaving the C# game. It's getting so awesome.



  • That is awesome, honestly.

    This page has absolutely no version numbers on it, how do I know what version(s) of C# support it? Not like MSDN to leave that off. (Well, it says "Visual Studio 2015", but that hardly seems relevant for a language feature!)



  • @blakeyrat said:

    webClient.UploadData( string.Format( "{0}/api/foobar", webHost ), "POST", baz );

    BTW I totally undid this change. Fuck that noise. If I have to defend it, I will.



  • @blakeyrat said:

    This page has absolutely no version numbers on it, how do I know what version(s) of C# support it? Not like MSDN to leave that off. (Well, it says "Visual Studio 2015", but that hardly seems relevant for a language feature!)

    It's in the language reference itself apparently. The non-selectable version means it's not available otherwise. As compared to the "string" reference page, that lets you change versions



  • That looks a lot like ruby.

    <ducks, runs>



  • @tufty said:

    That looks a lot like ruby.

    Looks, sounds, acts like a duck, it must be M$ making a better version of a duck!



  • @tufty said:

    That looks a lot like ruby.

    A lot of things in C# look like things from other languages, the difference is C#'s implementation is (generally) much better-thought-out. Except async/await. That's a nightmare. Great job turning compile-time errors into runtime errors there, Microsoft. Oh and you broke the debugger in the process? Great!

    @tufty said:

    <ducks, runs>

    Well feed your duck more fiber then.



  • @blakeyrat said:

    Well, it says "Visual Studio 2015", but that hardly seems relevant for a language feature!

    Well, when there are versions, they are wrong anyway. E.g. CreateFileMappingFromApp. Phone was in the list of supported versions and silently disappeared some time after the comment below was made.

    However, in this case it actually might be enough. Interpolated strings are purely compile-time feature (they are just syntactic sugar for the corresponding String.Format call), so the VS2015 compiler might support them even if targeting older runtime. I don't work in C# and don't have VS2015 around, so I can't try it, but it might be worth trying (if you don't need older compiler, of course).



  • IIRC java.lang.String + is only good for a chain of one or two and with small strings, because it is really inefficient. I think compilers actually change some uses of + into StringBuilders for you because they are so much more efficient.



  • Sounds like Cargo Cult thinking to me.

    This code ...

    System.out.println(a + b + c + d + e + f + g + h + I + j);
    

    ... doesn't have any performance problems unless the total string is over a few thousand bytes long. StringBuilders have to do allocation too, but they pre-allocate space to avoid Schlemiel the Painter problems. For small strings, a StringBuilder will often estimate too large and you will have wasted space in the buffer and you'll still need to do one copy to build a string for the result of toString(), so it's not terribly good in this use case from a memory standpoint. For medium-size strings, a StringBuilder will still have to reallocate and copy a few times, possibly once for almost every append if all of the strings being appended are the approximately same size. Worst case is the case where the appends get larger. For example, if every append is slightly larger than the current size of the string, then StringBuilder provides no benefit at all and incurs a performance penalty when it has to do the final copy, ending up being both slower and less memory efficient than simple concatenation. StringBuilder really shines when it has a large buffer and there are many small appends.

    For logging scenarios, I can't imagine a case where StringBuilder would have a measurable positive impact on performance.



  • I think somewhere in Rico Mariani's blog mentioned that if you try to concat more then 3 strings together with "+", the compiler will automagically convert the call to StringBuilder.

    This is because concating string in mutable form is much much cheaper than the normal immutable interned form.

    So if you code for anything higher than .NET v2, this advice is more on style than anything. Do so if you're required to change, but little to no harm if you leave it as is.



  • @Jaime said:

    Sounds like Cargo Cult thinking to me.

    Try disabling optimizations. javac converts your + to StringBuilder code automatically (according to several sources online). Or you could also examine the bytecode.



  • If you have something like log.Print(blah.String() + " " + foo.String()) versus log.Printf("%s %s", blah, foo) versus log.Println(blah, foo), that's the order (worst→best) I'd put them in.



  • @Bulb said:

    they are just syntactic sugar for the corresponding String.Format call

    It's apparently a bit more complicated



  • A tiny bit. That thing is basically object form of the String.Format's underlying machinery. However, since it is available (publicly; it probably existed internally for some time before) only since .NET 4.6, it is unlikely interpolation would be supported when targeting lower version.



  • In Java the StringBuilder thing would make a difference since they have this idea that all strings are constants. So if you type something like

    String a=b.toString() + c.toString() + "XXX";
    

    You are creating two strings for b and c, a third string for b+c, and a fourth string for (b+c)+"XXX", which you store a pointer to in a. With a StringBuilder there is one less unreferenced string sloshing around for the garbage collector to pick up in this particular example. Of course, more +:es -> more unreferenced strings.

    I don't know if this works the same in C#, but maybe they want the same set of coding standards to apply to all languages to an as large extent as possible even when it does not make sense technically?


  • Fake News

    No, the Java compiler will detect your example.

    It translates to this on Java 8 (and most likely every version higher than version 4):

    String a = new StringBuilder(String.valueOf(b.toString())).append(c.toString()).append("XXX").toString();
    

    It's only when you start doing things like this that the compiler might get confused:

    String a = "";
    int i = 0;
    while (i < 5) {
         a = a + "b";
         i++;
    }
    

    I heard the compiler had some checks for loops, but in this case the compiler seems to create a new StringBuilder each and every time.



  • @JBert said:

    String.valueOf(b.toString())

    You can never be too sure, can you?


  • Fake News

    String.valueOf is generated by the compiler so that no null is passed to the constructor, the toString calls were from the example code I took from @Mikael_Svahnberg.

    Why you'd use toString for string concatenation goes beyond my understanding...



  • Interestingly, string.Format is less efficient than concatenation. It apparently creates more strings, even though it ought to be more efficient.



  • @Bulb said:

    the VS2015 compiler might support them even if targeting older runtime.

    This. You can change the C# language version as well somehow, but only 2015 supports the C#6 language features.



  • @blakeyrat said:

    Except async/await. That's a nightmare. Great job turning compile-time errors into runtime errors there, Microsoft. Oh and you broke the debugger in the process? Great!

    But they also started silently swallowing exceptions from worker threads! Progress!



  • @Magus said:

    Interestingly, string.Format is less efficient than concatenation. It apparently creates more strings, even though it ought to be more efficient.

    I don't see how parsing a format string could possibly not come at the cost of some added overhead.



  • It depends on when that's done. If that's done in the compilation, it shouldn't be any worse.


Log in to reply