Split Personality



  • My coworker found this in a chunk of old code he's detarding/refactoring:

    <font color="teal" face="Courier New" size="2">String</font><font face="Courier New" size="2">[] textParagraphArray = _mainBodyText.Split(<font color="maroon">"\n"</font>.ToCharArray());</font><font face="Courier New" size="2"><o:p></o:p>
    textArray = </font><font face="Courier New" size="2">textParagraphArray</font><font face="Courier New" size="2">[i].Split(<font color="maroon">" "</font>.ToCharArray());</font><font face="Arial" size="2"><o:p></o:p></font>


    (only variable names changed to protect the stupid) 



  • Anybody want to count the number of string allocations in these two lines?



  • Why would anybody care? Do you get charged by the String?

     What's the language? In Java you can just split on "\n" without the chararray bit.
     

     

     



  • I think it is C#.  I don't see what the big deal is though, except for the fact that the two lines could be combined into one and you can use '\n' instead of the ToCharArray().



  • This is c#, i dont see a wtf here if there is a purpose for spliting the the text into paragraphs then the paragraphs into word arrays.

    Splitting strings into arrays depending on the situation can make them a lot easier to work with then doing substrings of a large string.

     and " ".ToCharArray() is probably used because the person was confused at the parameter for the split() function.  Which if you look at it only shows accepting char arrays or string arrays. so converting a string to a char array is easier then making a char array.  



  • "String.Split uses a  delimiter to create an array of strings from a source string. In doing so, String.Split allocates a new string object for each string that it has split out, plus one object for the array. As a result, using String.Split in a heavy duty context (such as a sorting routine) can be expensive. "



  • @Seltsam said:

    "String.Split uses a  delimiter to create an array of strings from a source string. In doing so, String.Split allocates a new string object for each string that it has split out, plus one object for the array. As a result, using String.Split in a heavy duty context (such as a sorting routine) can be expensive. "

     

    Just because something uses a few extra resources doesnt make it bad.  The code you shown, im guessing was just to split it up to make it easier to work with. That is very low context because your only going to do it maybe a few times(like if it was a asp.net page, once per post back).  What you quoted is talking about using split intensly, like having it within a loop getting called hundreds of times.

     

    Resources are not a problem with these kind of things considering the tiny amount of resources it uses compared to how much we have avaliable on our computers.

     

    But ofcourse this could be a wtf. I mean cause id like to know how you would rewrite this without using the extra resources.
     



  • @Seltsam said:

    " As a result, using String.Split in a heavy duty context (such as a sorting routine) can be expensive. "

    And we were meant to somehow work that out from the *zero* information about context you gave us to go on?  I don't think so!



  • I think the code is probably executing in a loop because of the use of the [i] to reference the array item.  In which case the big wtf is doing <font color="teal" face="Courier New" size="2"></font>

     

    <font color="teal" face="Courier New" size="2">String</font><font face="Courier New" size="2">[] textParagraphArray = _mainBodyText.Split(<font color="maroon">"\n"</font>.ToCharArray()); </font>

     

    <font face="Courier New" size="2">in every iteration of the loop, i'm guessing a split is a fairly expensive operation time wise, it would be much better to allocate the array before entering the loop.  It's not a huge WTF but certainly a very lame bit of programming, the kind of code you write at 16:00 on a friday afternoon.</font>

    <font face="Courier New" size="2"></font>
     



  • I certainly hope that the code in question needs to look at every word in every paragraph, and needs to process paragraphs individually.

    At a first look, I'd have guessed that this was used in some sort of word-count function. In this case, it's far more efficient to walk the string once and increase a counter when you hit a new word. It's more work, but the decrease in processing time is immense.

    However, regardless of why Split is needed, the use of "\n".ToCharArray() is totally needless.

    It's quite possible to use an inline array, like this:

    String[] textParagraphArray = _mainBodyText.Split(new char[] { '\n' });
    textArray = textParagraphArray[i].Split(new char[] { ' ' });

    A better idea would be to preallocate the arrays as private static variables in the class. This means they would only be created once, instead of every time:

    class MyClass
    {
        private static final char[] ParagraphSplit = { '\n' };
        private static final char[] TextSplit = { ' ' };
        ...
        void MyFunction()
        {
            String[] textParagraphArray = _mainBodyText.Split(ParagraphSplit);
            String[] textArray;
            ...
            for (int i = 0; i < textParagraphArray.Count; ++i)
            {
                textArray = textParagraphArray[i].Split(TextSplit);
                ...
            }
        }
    }


  • Split and Join are very slow methods.

    In Javascript, I refactored a string-rewrite algorithm that used split and join, to one that used substr, indexOf and +=.

    In IE, the total thing ran a little under twice as fast.
    In Gecko, it ran five times as fast.



  • @plazmo said:

    @Seltsam said:

    "String.Split uses a  delimiter to create an array of strings from a source string. In doing so, String.Split allocates a new string object for each string that it has split out, plus one object for the array. As a result, using String.Split in a heavy duty context (such as a sorting routine) can be expensive. "

     

    Just because something uses a few extra resources doesnt make it bad.  The code you shown, im guessing was just to split it up to make it easier to work with. That is very low context because your only going to do it maybe a few times(like if it was a asp.net page, once per post back).  What you quoted is talking about using split intensly, like having it within a loop getting called hundreds of times.

     

    Resources are not a problem with these kind of things considering the tiny amount of resources it uses compared to how much we have avaliable on our computers.

     

    But ofcourse this could be a wtf. I mean cause id like to know how you would rewrite this without using the extra resources.
     


    The real WTF is that you're tagging all your posts 'banana'.



  • @fennec said:

    The real WTF is that you're tagging all your posts 'banana'.

     

    I agree 



  • @plazmo said:

    @fennec said:

    The real WTF is that you're tagging all your posts 'banana'.

     

    I agree 

    Beware of the bored power-obsessed moderator. 


Log in to reply