...cause I got bored [critique]



  • Since one of the ways I learn best is from helpful criticism, I figured:
    what better way to get it than to write something that most will recognize...yet is almost completely pointless.

    So, I gave a try at rewriting Javascript's Number.parseInt();

    Aside from the fact that it was pointless (besides consuming time), how'd I do?

    var parse = {
        Int: function(s, r)
        {
            // check parameters/arguments
            if ( this.Int.arguments.length < 1 ) return Number.NaN;
    
        // initialize variables
        var spectrum = "0123456789abcdefghijklmnopqrstuvwxyz",  // full spectrum
            actSpec;                                            // active spectrum
            
        var num = 0,                            // returned number
            neg = false,                        // negetive
            str = new String(s).toLowerCase(),  // processed string
            rad;                                // active radix
            
        var flag = false;   // at least 1 valid character is found
        
        // determin radix
        if ( r )
            if ( r &gt;= 2 &amp;&amp; r &lt;= 36 )
                rad = r;
            else
                num = Number.NaN;
        else
        {
            // negative
            if ( str.charAt(0) == '-' )
            {
                neg = true;
                str = str.substr(1);
            }
            
            // octal or hex
            if ( str.charAt(0) == '0' )
            {
                flag = true;        // assume 0
                rad = 8;
                str = str.substr(1);
                
                // hex
                if ( str.charAt(0) == 'x' )
                {
                    flag = false;   // remove 0 assumption
                    rad *= 2;
                    str = str.substr(1);
                }
            }
            else
                rad = 10;
        }
        
        // set "effective spectrum"
        actSpec = spectrum.substr(0, rad);
        
        // verify [remaining] string length
        if ( str.length &lt; 1 &amp;&amp; !flag )
            num = Number.NaN;
        
        // scan string for proper numbers
        for ( var x = 0; rad &amp;&amp; str.length &gt; 0; x++ )
        {
            var i = actSpec.indexOf( str.charAt(0) );
            
            if ( i == -1 )              // invalid character
            {
                if (!flag)              // if no valid characters were found
                    num = Number.NaN;
                break;                  // end for loop
            }
            else
            {
                num *= rad;             // move digits (e.g., 10 =&gt; 100 [base 10])
                num += i;               // add new value
                
                str = str.substr(1);    // remove character for next iteration
                
                flag = true;            // found at least 1 valid character
            }
        }
        
        if (neg &amp;&amp; (!isNaN(num)) )      // adjust for negative
            num *= -1;
        return num;
    }
    

    }

    alert( ''
    + parse.Int("184") + "\n" // 184
    + parseInt("184") + "\n"

    + parse.Int("184", 12) + "\n"     // 244
    + parseInt("184", 12) + "\n"
    
    + parse.Int("0379") + "\n"        // 31 (ignores "9")
    + parseInt("0379") + "\n"
    
    + parse.Int("0x0FG") + "\n"       // 15 (ignores "G")
    + parseInt("0x0FG") + "\n"
    
    + parse.Int("00zz", 35) + "\n"    // 0  (ignored "zz")
    + parseInt("00zz", 35) + "\n"
    
    + parse.Int("0x00zz", 36) + "\n"  // 55428623
    + parseInt("0x00zz", 36) + "\n"
    
    + parse.Int("foo") + "\n"         // NaN
    + parseInt("foo") + "\n"
    );</PRE></BLOCKQUOTE>


  •         // check parameters/arguments
            if ( this.Int.arguments.length < 1 ) return Number.NaN;

    IIRC this kind of construct is deprecated, and not part of the ECMAScript standard. Better to use just arguments.length.

            var flag = false;   // at least 1 valid character is found

    flag isn't a very good variable name.

            // determin radix
            if ( r )
                if ( r >= 2 && r <= 36 )
                    rad = r;
                else
                    num = Number.NaN;
            else

    I would put braces round the outer if body - it's not necessary, but a bit clearer IMHO. Also, you don't check for a negative sign in this branch.



  • Hopelessly inefficient - it's O(RN) for radix R and length N, or perhaps O(RN^2) (I'm not sure of the performance characteristics of javascript substr). Use a lookup table, or direct manipulation of the character's ASCII value, don't scan a string to translate the characters to integer values. Use charAt(x), don't substr in the innermost loop.

    Avoiding premature optimisation is one thing, but there's no reason to include needless polynomial factors when it doesn't make the code any simpler. 



  • without testing, I don't think any of those examples work:

    parse.Int('-1', 10);

    parse.Int('-0x01');
     



  • @noehch said:

    Aside from the fact that it was pointless (besides consuming time), how'd I do?

    Well, you forgot that the function accepts leading and trailing whitespace... [url=http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf]Check section 15.1.2.2[/url]

    Also you should have probably leveraged regular expressions to get usable performance from it; parsing strings a character at a time with a ecmascript explicit state machine is a bit slow.
     



  • Thanks for the feedback.

    Though...I, honestly, don't see how regular expressions would make things any faster.
    The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.

    Anyways, here's the updated snippet:

    var parse = {
        Int: function(s, r)
        {
            // check parameters/arguments
            if ( arguments.length < 1 ) return Number.NaN;
    
        // initialize variables
        var spectrum = {    // character spectrum // lookup table
            0: 0,   1: 1,   2: 2,   3: 3,
            4: 4,   5: 5,   6: 6,   7: 7,
            8: 8,   9: 9,   a: 10,  b: 11,
            c: 12,  d: 13,  e: 14,  f: 15,
            g: 16,  h: 17,  i: 18,  j: 19,
            k: 20,  l: 21,  m: 22,  n: 23,
            o: 24,  p: 25,  q: 26,  r: 27,
            s: 28,  t: 29,  u: 30,  v: 31,
            w: 32,  x: 33,  y: 34,  z: 35
        }
    
        var num = 0,                            // returned number
            neg = false,                        // negetive
            str = new String(s).toLowerCase(),  // processed string
            rad;                                // active radix
    
        var index = 0;      // input string traverser
            valid = false;  // validity of the string
    
        // trim leading/trailing spaces
        str = str.replace(/^\s+|\s+$/g ,'');
    
        // check trimmed string length
        if ( str.length &lt; 1 )
        {
            str = Number.NaN;
        }
        else
        {
            // determin negativity
            if ( str.charAt(index) == '-' )
            {
                neg = true;
                index++;
            }
    
            // determin radix
            if ( r )
            {
                if ( r &gt;= 2 &amp;&amp; r &lt;= 36 )
                    rad = r;
                else
                    num = Number.NaN;
            }
            else
            {
                // octal or hex
                if ( str.charAt(index) == '0' )
                {
                    valid = true;       // assume 0
                    rad = 8;
                    index++;
    
                    // hex-only
                    if ( str.charAt(index) == 'x' )
                    {
                        valid = false;  // remove assumption
                        rad *= 2;
                        index++;
                    }
                }
                else
                {
                    rad = 10;
                }
            }
    
            // check unprocessed string length
            if ( !( str.length &gt; index ) )
            {
                num = Number.NaN;
            }
    
            // scan unprocessed string
            for ( var i = 0 ; (!isNaN(num)) &amp;&amp; rad &amp;&amp; index &lt; str.length; index++ )
            {
                // get index of character from the active spectrum (sym-lookup)
                i = spectrum[ str.charAt(index) ];
    
                if ( i &gt; -1 &amp;&amp; i &lt; rad )    // valid character
                {
                    num *= rad;             // move digits (e.g., 10 =&gt; 100 [base 10])
                    num += i;               // increment
                    valid = true;           // found a valid character
                }
                else
                {
                    if (!valid)
                        num = Number.NaN;   // set to NaN rather than the initial value of 0
                    break;                  // end for loop
                }
            }
    
            if ( neg &amp;&amp; (!isNaN(num)) )  // adjust for negative
                num *= -1;
        }
    
        return num;
    }
    

    }

    alert( ''
    + parse.Int("184") + "\n" // 184
    + parseInt("184") + "\n"

    + parse.Int("184", 12) + "\n"     // 244
    + parseInt("184", 12) + "\n"
    
    + parse.Int("0379") + "\n"        // 31 (ignores "9")
    + parseInt("0379") + "\n"
    
    + parse.Int("0x0FG") + "\n"       // 15 (ignores "G")
    + parseInt("0x0FG") + "\n"
    
    + parse.Int("00zz", 35) + "\n"    // 0  (ignored "zz")
    + parseInt("00zz", 35) + "\n"
    
    + parse.Int("0x00zz", 36) + "\n"  // 55428623
    + parseInt("0x00zz", 36) + "\n"
    
    + parse.Int("foo") + "\n"         // NaN
    + parseInt("foo") + "\n"
    
    + parse.Int('-1', 10) + "\n"      // tests from replies
    + parse.Int('-0x01') + "\n"
    
    + parse.Int(' -0x81 ') + "\n"     // allows leading/trailing spaces
    );</PRE></BLOCKQUOTE>


  • @noehch said:

    Thanks for the feedback.

    Though...I, honestly, don't see how regular expressions would make things any faster.
    The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.

    Parsing a string to int is about patterns in text.

    RegEx enters the picture naturally, it seems.

    Whether it makes things faster is up for experiment. For a program a while ago, I wasn't sure whether a combination of substr, indexOf and += was faster than splitting and joining an array, but doing it "manually" turned out to be significantly quicker. In FFX extremely so, even.

    As usual, YMMV.



  • I was thinking you could use it to match 2 or 3 digits at a time with appropriate (large) lookup table. Static initialization function for common bases (3-grouped for octal and decimal, 2-grouped for hex). Or something...



  • @kirchhoff said:

    I was thinking you could use it to match 2 or 3 digits at a time with appropriate (large) lookup table. Static initialization function for common bases (3-grouped for octal and decimal, 2-grouped for hex). Or something...

    The problem I see with this is that I'd already need 2 seperate lookup tables...and that would only cover radix 8 and 16...what about the others between 2 and 36?

    I also can't really combine them into 1 lookup table cause [for example] "yb" is a different value for radix 35 (1201) and radix 36 (1235).

    In theory, your idea is good...but, once it's introduced to practice, its feasibility seems shot...but, maybe it's just me.

    But, in all honesty, this suggestion keeps making me think of Just in Case.



  • @noehch said:

    But, in all honesty, this suggestion keeps making me think of Just in Case.

    Meant to point specifically to "the suggestion of regular expressions."

    More and more...it just seems like a Complicator's approach (e.g., Bycicle and Glove) to use regular expressions in this case.
    Maybe character-by-character is as well...but, it has a longer and stronger history than regular expressions...which work by matching character-by-character (seems redundant).

    For example, validating email and date strings makes sense. But, with those, you generally use something else (class, function, etc.) to actually put the strings to use.
    The only exception, I can [quickly] think of, would be replace() [or s///g]...and, I don't think, it's really that good of an exception.



  • Retrospect has a nice way of showing you how much of an ass you can be.
    Sorry about that.

    Let me try this again.


    @dhromed said:

    @noehch said:
    Thanks for the feedback.

    Though...I, honestly, don't see how regular expressions would make things any faster.
    The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.

    Parsing a string to int is about patterns in text.

    RegEx enters the picture naturally, it seems.


    While that does make sense, what I am (and was) stuck on is how do you then get the integer value using RegExps?
    My only guess was to then use the lookup table (or indexOf) anyways.

    If you have any suggestions, I'll gladly try them.


Log in to reply