Javascript Trim



  • So convoluted:

    function trim(s) {
            while (s.substring(0,1) == ' ') {
                    s = s.substring(1,s.length);
            }
            while (s.substring(s.length-1,s.length) == ' ') {
                    s = s.substring(0,s.length-1);
            }
            return s;
    }

    It works, but thank god for regex:

    function trim(s) {
            return s.replace(/(^\s*)|(\s*$)/,'');
    }



  • A few notes on your regex:

    - The parens are not required.
    - It doesn't make much sense to replace "zero or more" whitespace.
    - Lacking the global switch, it's not going to work on strings with padding on both ends.

     /^\s+|\s+$/g
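
    To make the third note concrete, here is a small sketch comparing the two patterns. The function names trimNoGlobal and trimGlobal are made up for illustration; the regexes are the ones quoted in this thread.

```javascript
// Hypothetical names; the regexes are the two quoted in the thread.

// Original pattern: no g flag, so replace() stops after the first match.
function trimNoGlobal(s) {
    return s.replace(/(^\s*)|(\s*$)/, '');
}

// Suggested pattern: one or more whitespace chars, replaced globally.
function trimGlobal(s) {
    return s.replace(/^\s+|\s+$/g, '');
}

console.log(JSON.stringify(trimNoGlobal('  abc  '))); // "abc  " (trailing padding survives)
console.log(JSON.stringify(trimGlobal('  abc  ')));   // "abc"
```

    Note that the "zero or more" version also leaves 'abc  ' untouched: ^\s* happily matches the empty string at position 0, and without the global switch that empty match is the only one replaced.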



  • TRWTF is Javascript['s lack of a trim function]

    Take your pick. 



  • function trim(s) {
            return s.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
    }




  •  dhromed's regex seems to be the best mix of speed and maintainability.  Also:

     

    String.prototype.trim = function() {
        return this.replace(/^\s+|\s+$/g, '');
    }
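
    One caveat: engines that implement ES5 or later already ship a native String.prototype.trim, and an unconditional assignment like the one above would overwrite it. A hedged sketch that installs the regex version only as a fallback (regexTrim is a made-up name for illustration):

```javascript
// Hypothetical fallback: same regex as above, kept as a named function
// so it can be exercised directly even where a native trim exists.
function regexTrim() {
    return String(this).replace(/^\s+|\s+$/g, '');
}

// Only install it when the engine has no trim of its own.
if (typeof String.prototype.trim !== 'function') {
    String.prototype.trim = regexTrim;
}
```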


  • Why don't you guys just use AJAX to access a trim() service provided by .NET or Java or something? That's the Enterprise Solution!



  • Actually, although you'd expect catastrophic performance from this function due to the potentially huge number of new objects being created and destroyed, the particular regular expression you've replaced it with is probably going to run about 100 times slower for reasonably small amounts of leading/trailing whitespace.

    The "right" way to do a trim is to use two for-loops - one to get the start index and one to get the end - followed by a call to substring.  In this case there is only one new string ever created and no dependency on the regex.  The total number of loop iterations across the entire function (including both for loops and the substring call) is exactly the length of the original string.
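
    A minimal sketch of that two-loop approach, for reference. The names trimLoops and isTrimChar are made up, and it assumes that only space, tab, CR and LF count as whitespace, which is narrower than the regex \s class.

```javascript
// Hypothetical helper. Assumption: only these four characters are
// treated as whitespace (the regex \s class matches more).
function isTrimChar(c) {
    return c === ' ' || c === '\t' || c === '\r' || c === '\n';
}

function trimLoops(s) {
    var start, end;
    // First loop: find the index of the first non-whitespace character.
    for (start = 0; start < s.length; start++) {
        if (!isTrimChar(s.charAt(start))) break;
    }
    // Second loop: find one past the last non-whitespace character.
    for (end = s.length; end > start; end--) {
        if (!isTrimChar(s.charAt(end - 1))) break;
    }
    // The only new string is created here.
    return s.substring(start, end);
}
```

    The second loop stops at start, so an all-whitespace string yields an empty result without scanning it twice.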

    I understand that certain regexes that are commonly used for trimming are heavily optimized in certain browsers and run about as fast as the "bad" method you quoted.  However, this isn't consistent across browsers and you don't want to rely on undocumented behaviour if you're using the trim to, say, sort a table.  If you're just validating a few form fields then it's not really going to matter, of course.

    Every time you use regular expressions for simple string manipulations, God kills a kitten.  So please think of the kittens and stop doing that. 



  • @rbowes said:

    Why don't you guys just use AJAX to access a trim() service provided by .NET or Java or something?

    I need someone to write me a VB.NET application that runs as a server providing trim() functionality for JavaScript apps.

    It MUST BE in VB.NET, because I want it to have a nice GUI Interface!!



  • @Aaron said:

    Every time you use regular expressions for simple string manipulations, God kills a kitten.  So please think of the kittens and stop doing that. 

    I would wager that your self-incurred expulsions of semen have voided the lives of entire evolutionary branches of felines, so I'm sticking with the regex, thx. ;)

    A very informative article, though.



  • @Aaron said:

    Actually, although you'd expect catastrophic performance from this function due to the potentially huge number of new objects being created and destroyed, the particular regular expression you've replaced it with is probably going to run about 100 times slower for reasonably small amounts of leading/trailing whitespace.

    You have any proof of this?  I did a simple test in FF2 and the regex method was about 40% faster for small strings.  If you read the article linked by bobday, it seems regex is generally the best way to go, although I find the \s\s* thing to be a bit silly and it seems doing one global regex is better than doing separate pre- and post- ones.



  • @dhromed said:

    I would wager that your self-incurred expulsions of semen have voided the lives of entire evolutionary branches of felines...

    Quoted For Awesomeness. 



  • @bobday said:

    That link is well worth a look. (Thanks, bobday!) The article compares a variety of trim implementations under FF and IE for execution speed and provides raw numbers and analysis.

     



  • @morbiuswilters said:

    You have any proof of this?  I did a simple test in FF2 and the regex method was about 40% faster for small strings.  If you read the article linked by bobday, it seems regex is generally the best way to go, although I find the \s\s* thing to be a bit silly and it seems doing one global regex is better than doing separate pre- and post- ones.

    I was about to post a link to some basic benchmarks but it looks like bobday already posted it above.

    Of course for some reason that link doesn't test simple counting loops, just loops with substrings (ugh).  That is not what I'm talking about at all, and what you're saying about one regex being better than two doesn't make sense because the normal method doesn't use any regexes.

    I did my own test in FF2 using this trim method and the regex you posted above and, well, let's just say the results are wildly inconsistent:

    function trim(s, trimType)
    {
        var whitespaceChars = " \r\n";

        var startIndex = 0;
        var endIndex = s.length - 1;
        var totalIterations = 0;

        if (trimType != 2)
        {
            while (startIndex < s.length)
            {
                totalIterations++;
                var ch = s.charAt(startIndex);
                if (whitespaceChars.indexOf(ch) == -1)
                {
                    break;
                }
                startIndex++;
            }
        }
       
        if (trimType != 1)
        {
            totalIterations++;
            while (endIndex >= 0)
            {
                var ch = s.charAt(endIndex);
                if (whitespaceChars.indexOf(ch) == -1)
                {
                    break;
                }
                endIndex--;
            }
        }

        return s.substring(startIndex, endIndex + 1);
    }

    Trials are all based on 500 tries, using text from the online lipsum generator.

    1 paragraph, no padding: Normal 0ms, Regex 63ms
    1 paragraph, 10 leading spaces:  Normal 16ms, Regex 63ms
    1 paragraph, 20 leading spaces:  Normal 31ms, Regex 62ms
    1 paragraph, 50 leading spaces:  Normal 78ms, Regex 63ms
    1 paragraph, 10 leading/trailing spaces:  Normal 31ms, Regex 63ms
    1 paragraph, 20 leading/trailing spaces:  Normal 63ms, Regex 62ms
    1 paragraph, 50 leading/trailing spaces:  Normal 234ms, Regex 78ms
    1 paragraph, 50 trailing spaces:  Normal 62ms, Regex 63ms
    5 paragraphs, no padding:  Normal 0ms, Regex 250ms
    5 paragraphs, 10 leading spaces:  Normal 16ms, Regex 250ms
    5 paragraphs, 20 leading spaces:  Normal 31ms, Regex 235ms
    5 paragraphs, 50 leading spaces:  Normal 79ms, Regex 234ms
    5 paragraphs, 10 leading/trailing spaces:  Normal 31ms, Regex 250ms
    5 paragraphs, 20 leading/trailing spaces:  Normal 47ms, Regex 234ms
    5 paragraphs, 50 leading/trailing spaces:  Normal 235ms, Regex 250ms
    5 paragraphs, 50 trailing spaces:  Normal 78ms, Regex 297ms

    Results are similar in IE6, but the regex method usually takes about 30% longer than FF2 (normal method is about the same).
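
    The harness behind these numbers isn't shown. A rough sketch of how one such trial could be run, assuming a simple wall-clock loop; pad, timeTrim and regexTrim are made-up names and the sample text is a placeholder, not the lipsum text used above:

```javascript
// Hypothetical helper: builds a string of n spaces.
function pad(n) {
    return new Array(n + 1).join(' ');
}

// Hypothetical harness: calls fn(input) `tries` times and returns
// the elapsed wall-clock time in milliseconds.
function timeTrim(fn, input, tries) {
    var start = Date.now();
    for (var i = 0; i < tries; i++) {
        fn(input);
    }
    return Date.now() - start;
}

// The regex version quoted earlier in the thread.
function regexTrim(s) {
    return s.replace(/^\s+|\s+$/g, '');
}

// Placeholder input: 10 leading and 10 trailing spaces.
var sample = pad(10) + 'Lorem ipsum dolor sit amet' + pad(10);
var elapsed = timeTrim(regexTrim, sample, 500);
console.log('Regex: ' + elapsed + 'ms');
```

    Many of the figures above sit near multiples of 16ms, which may point to a coarse timer rather than real differences, so a larger number of tries would probably give steadier numbers.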

    Honestly?  I'm not even sure what this means.  The regex results kind of make sense - they're consistently slow - but the other results make no sense to me, because the iterative method doesn't seem to show O(n) performance even though the algorithm is clearly O(n).  Or maybe it is O(n) but for the number of spaces as opposed to total characters, which would lead me to believe that there's some way to optimize the iterative method (damned if I know what it is though).

    Well, that's browser JavaScript for you.  Infer whatever you like; in my eyes, even though the iterative method seems to have some unexplained scaling problems, it's still much faster than the regex in the majority of realistic cases.



  • Apparently there's some awfulness in that indexOf method... if you replace the whitespaceChars.indexOf(...) call with something like this:

    function isSpace(c)
    {
        return ((c == ' ') || (c == '\r') || (c == '\n'));
    }

    It shaves about 40% of the time off the standard method.  Still got a scaling problem though.

    My guess as to what's happening is that the majority of the time spent in the iterative method is actually the overhead of interpreting the statements; since the bowels of the regex object are native code, eventually you reach a point where the interpretation of thousands/millions of JS statements actually starts to cause its own bottleneck.  This seems to agree with the appearance of being O(n) with respect to amount of whitespace instead of number of chars - the call to String.substring is also O(n) but it's O(n) compiled, not O(n) interpreted.

    So in conclusion: If you plan to be dealing with HUGE blocks of text with HUGE amounts of padding, use a regex instead, because the JS engine then only needs to interpret one statement instead of millions. 
