...cause I got bored [critique]

noehch

Since one of the ways I learn best is from helpful criticism, I figured:
what better way to get it than to write something that most will recognize...yet is almost completely pointless.

So, I gave a try at rewriting Javascript's Number.parseInt();

Aside from the fact that it was pointless (besides consuming time), how'd I do?

var parse = {
    Int: function(s, r)
    {
        // check parameters/arguments
        if ( this.Int.arguments.length < 1 ) return Number.NaN;
    // initialize variables
    var spectrum = "0123456789abcdefghijklmnopqrstuvwxyz",  // full spectrum
        actSpec;                                            // active spectrum
        
    var num = 0,                            // returned number
        neg = false,                        // negetive
        str = new String(s).toLowerCase(),  // processed string
        rad;                                // active radix
        
    var flag = false;   // at least 1 valid character is found
    
    // determin radix
    if ( r )
        if ( r &gt;= 2 &amp;&amp; r &lt;= 36 )
            rad = r;
        else
            num = Number.NaN;
    else
    {
        // negative
        if ( str.charAt(0) == '-' )
        {
            neg = true;
            str = str.substr(1);
        }
        
        // octal or hex
        if ( str.charAt(0) == '0' )
        {
            flag = true;        // assume 0
            rad = 8;
            str = str.substr(1);
            
            // hex
            if ( str.charAt(0) == 'x' )
            {
                flag = false;   // remove 0 assumption
                rad *= 2;
                str = str.substr(1);
            }
        }
        else
            rad = 10;
    }
    
    // set "effective spectrum"
    actSpec = spectrum.substr(0, rad);
    
    // verify [remaining] string length
    if ( str.length &lt; 1 &amp;&amp; !flag )
        num = Number.NaN;
    
    // scan string for proper numbers
    for ( var x = 0; rad &amp;&amp; str.length &gt; 0; x++ )
    {
        var i = actSpec.indexOf( str.charAt(0) );
        
        if ( i == -1 )              // invalid character
        {
            if (!flag)              // if no valid characters were found
                num = Number.NaN;
            break;                  // end for loop
        }
        else
        {
            num *= rad;             // move digits (e.g., 10 =&gt; 100 [base 10])
            num += i;               // add new value
            
            str = str.substr(1);    // remove character for next iteration
            
            flag = true;            // found at least 1 valid character
        }
    }
    
    if (neg &amp;&amp; (!isNaN(num)) )      // adjust for negative
        num *= -1;
    return num;
}

}
alert( ''

+ parse.Int("184") + "\n"         // 184

+ parseInt("184") + "\n"
+ parse.Int("184", 12) + "\n"     // 244
+ parseInt("184", 12) + "\n"

+ parse.Int("0379") + "\n"        // 31 (ignores "9")
+ parseInt("0379") + "\n"

+ parse.Int("0x0FG") + "\n"       // 15 (ignores "G")
+ parseInt("0x0FG") + "\n"

+ parse.Int("00zz", 35) + "\n"    // 0  (ignored "zz")
+ parseInt("00zz", 35) + "\n"

+ parse.Int("0x00zz", 36) + "\n"  // 55428623
+ parseInt("0x00zz", 36) + "\n"

+ parse.Int("foo") + "\n"         // NaN
+ parseInt("foo") + "\n"
);</PRE></BLOCKQUOTE>

iwpg

        // check parameters/arguments
        if ( this.Int.arguments.length < 1 ) return Number.NaN;

IIRC this kind of construct is deprecated, and not part of the ECMAScript standard. Better to use just arguments.length.

        var flag = false;   // at least 1 valid character is found

flag isn't a very good variable name.

        // determin radix
        if ( r )
            if ( r >= 2 && r <= 36 )
                rad = r;
            else
                num = Number.NaN;
        else

I would put braces round the outer if body - it's not necessary, but a bit clearer IMHO. Also, you don't check for a negative sign in this branch.

asuffield

Hopelessly inefficient - it's O(RN) for radix R and length N, or perhaps O(RN^2) (I'm not sure of the performance characteristics of javascript substr). Use a lookup table, or direct manipulation of the character's ASCII value, don't scan a string to translate the characters to integer values. Use charAt(x), don't substr in the innermost loop.

Avoiding premature optimisation is one thing, but there's no reason to include needless polynomial factors when it doesn't make the code any simpler.

ammoQ

without testing, I don't think any of those examples work:

parse.Int('-1', 10);

parse.Int('-0x01');

kirchhoff

@noehch said:

Aside from the fact that it was pointless (besides consuming time), how'd I do?

Well, you forgot that the function accepts leading and trailing whitespace... [url=http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf]Check section 15.1.2.2[/url]

Also you should have probably leveraged regular expressions to get usable performance from it; parsing strings a character at a time with a ecmascript explicit state machine is a bit slow.

noehch

Thanks for the feedback.

Though...I, honestly, don't see how regular expressions would make things any faster.
The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.

Anyways, here's the updated snippet:

var parse = {
    Int: function(s, r)
    {
        // check parameters/arguments
        if ( arguments.length < 1 ) return Number.NaN;
    // initialize variables
    var spectrum = {    // character spectrum // lookup table
        0: 0,   1: 1,   2: 2,   3: 3,
        4: 4,   5: 5,   6: 6,   7: 7,
        8: 8,   9: 9,   a: 10,  b: 11,
        c: 12,  d: 13,  e: 14,  f: 15,
        g: 16,  h: 17,  i: 18,  j: 19,
        k: 20,  l: 21,  m: 22,  n: 23,
        o: 24,  p: 25,  q: 26,  r: 27,
        s: 28,  t: 29,  u: 30,  v: 31,
        w: 32,  x: 33,  y: 34,  z: 35
    }

    var num = 0,                            // returned number
        neg = false,                        // negetive
        str = new String(s).toLowerCase(),  // processed string
        rad;                                // active radix

    var index = 0;      // input string traverser
        valid = false;  // validity of the string

    // trim leading/trailing spaces
    str = str.replace(/^\s+|\s+$/g ,'');

    // check trimmed string length
    if ( str.length &lt; 1 )
    {
        str = Number.NaN;
    }
    else
    {
        // determin negativity
        if ( str.charAt(index) == '-' )
        {
            neg = true;
            index++;
        }

        // determin radix
        if ( r )
        {
            if ( r &gt;= 2 &amp;&amp; r &lt;= 36 )
                rad = r;
            else
                num = Number.NaN;
        }
        else
        {
            // octal or hex
            if ( str.charAt(index) == '0' )
            {
                valid = true;       // assume 0
                rad = 8;
                index++;

                // hex-only
                if ( str.charAt(index) == 'x' )
                {
                    valid = false;  // remove assumption
                    rad *= 2;
                    index++;
                }
            }
            else
            {
                rad = 10;
            }
        }

        // check unprocessed string length
        if ( !( str.length &gt; index ) )
        {
            num = Number.NaN;
        }

        // scan unprocessed string
        for ( var i = 0 ; (!isNaN(num)) &amp;&amp; rad &amp;&amp; index &lt; str.length; index++ )
        {
            // get index of character from the active spectrum (sym-lookup)
            i = spectrum[ str.charAt(index) ];

            if ( i &gt; -1 &amp;&amp; i &lt; rad )    // valid character
            {
                num *= rad;             // move digits (e.g., 10 =&gt; 100 [base 10])
                num += i;               // increment
                valid = true;           // found a valid character
            }
            else
            {
                if (!valid)
                    num = Number.NaN;   // set to NaN rather than the initial value of 0
                break;                  // end for loop
            }
        }

        if ( neg &amp;&amp; (!isNaN(num)) )  // adjust for negative
            num *= -1;
    }

    return num;
}

}
alert( ''

+ parse.Int("184") + "\n"         // 184

+ parseInt("184") + "\n"
+ parse.Int("184", 12) + "\n"     // 244
+ parseInt("184", 12) + "\n"

+ parse.Int("0379") + "\n"        // 31 (ignores "9")
+ parseInt("0379") + "\n"

+ parse.Int("0x0FG") + "\n"       // 15 (ignores "G")
+ parseInt("0x0FG") + "\n"

+ parse.Int("00zz", 35) + "\n"    // 0  (ignored "zz")
+ parseInt("00zz", 35) + "\n"

+ parse.Int("0x00zz", 36) + "\n"  // 55428623
+ parseInt("0x00zz", 36) + "\n"

+ parse.Int("foo") + "\n"         // NaN
+ parseInt("foo") + "\n"

+ parse.Int('-1', 10) + "\n"      // tests from replies
+ parse.Int('-0x01') + "\n"

+ parse.Int(' -0x81 ') + "\n"     // allows leading/trailing spaces
);</PRE></BLOCKQUOTE>

dhromed

@noehch said:

Thanks for the feedback.

Though...I, honestly, don't see how regular expressions would make things any faster.
The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.

Parsing a string to int is about patterns in text.

RegEx enters the picture naturally, it seems.

Whether it makes things faster is up for experiment. For a program a while ago, I wasn't sure whether a combination of substr, indexOf and += was faster than splitting and joining an array, but doing it "manually" turned out to be significantly quicker. In FFX extremely so, even.

As usual, YMMV.

kirchhoff

I was thinking you could use it to match 2 or 3 digits at a time with appropriate (large) lookup table. Static initialization function for common bases (3-grouped for octal and decimal, 2-grouped for hex). Or something...

noehch

@kirchhoff said:

I was thinking you could use it to match 2 or 3 digits at a time with appropriate (large) lookup table. Static initialization function for common bases (3-grouped for octal and decimal, 2-grouped for hex). Or something...

The problem I see with this is that I'd already need 2 seperate lookup tables...and that would only cover radix 8 and 16...what about the others between 2 and 36?

I also can't really combine them into 1 lookup table cause [for example] "yb" is a different value for radix 35 (1201) and radix 36 (1235).

In theory, your idea is good...but, once it's introduced to practice, its feasibility seems shot...but, maybe it's just me.

But, in all honesty, this suggestion keeps making me think of Just in Case.

noehch

@noehch said:

But, in all honesty, this suggestion keeps making me think of Just in Case.

Meant to point specifically to "the suggestion of regular expressions."

More and more...it just seems like a Complicator's approach (e.g., Bycicle and Glove) to use regular expressions in this case.
Maybe character-by-character is as well...but, it has a longer and stronger history than regular expressions...which work by matching character-by-character (seems redundant).

For example, validating email and date strings makes sense. But, with those, you generally use something else (class, function, etc.) to actually put the strings to use.
The only exception, I can [quickly] think of, would be replace() [or s///g]...and, I don't think, it's really that good of an exception.

noehch

Retrospect has a nice way of showing you how much of an ass you can be.
Sorry about that.

Let me try this again.

@dhromed said:

@noehch said:
Thanks for the feedback.

Though...I, honestly, don't see how regular expressions would make things any faster.
The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.

Parsing a string to int is about patterns in text.

RegEx enters the picture naturally, it seems.

While that does make sense, what I am (and was) stuck on is how do you then get the integer value using RegExps?
My only guess was to then use the lookup table (or indexOf) anyways.

If you have any suggestions, I'll gladly try them.