...cause I got bored [critique]
-
Since one of the ways I learn best is from helpful criticism, I figured:
what better way to get it than to write something that most will recognize...yet is almost completely pointless.So, I gave a try at rewriting Javascript's Number.parseInt();
Aside from the fact that it was pointless (besides consuming time), how'd I do?
var parse = { Int: function(s, r) { // check parameters/arguments if ( this.Int.arguments.length < 1 ) return Number.NaN;
// initialize variables var spectrum = "0123456789abcdefghijklmnopqrstuvwxyz", // full spectrum actSpec; // active spectrum var num = 0, // returned number neg = false, // negetive str = new String(s).toLowerCase(), // processed string rad; // active radix var flag = false; // at least 1 valid character is found // determin radix if ( r ) if ( r >= 2 && r <= 36 ) rad = r; else num = Number.NaN; else { // negative if ( str.charAt(0) == '-' ) { neg = true; str = str.substr(1); } // octal or hex if ( str.charAt(0) == '0' ) { flag = true; // assume 0 rad = 8; str = str.substr(1); // hex if ( str.charAt(0) == 'x' ) { flag = false; // remove 0 assumption rad *= 2; str = str.substr(1); } } else rad = 10; } // set "effective spectrum" actSpec = spectrum.substr(0, rad); // verify [remaining] string length if ( str.length < 1 && !flag ) num = Number.NaN; // scan string for proper numbers for ( var x = 0; rad && str.length > 0; x++ ) { var i = actSpec.indexOf( str.charAt(0) ); if ( i == -1 ) // invalid character { if (!flag) // if no valid characters were found num = Number.NaN; break; // end for loop } else { num *= rad; // move digits (e.g., 10 => 100 [base 10]) num += i; // add new value str = str.substr(1); // remove character for next iteration flag = true; // found at least 1 valid character } } if (neg && (!isNaN(num)) ) // adjust for negative num *= -1; return num; }
}
alert( ''
+ parse.Int("184") + "\n" // 184
+ parseInt("184") + "\n"+ parse.Int("184", 12) + "\n" // 244 + parseInt("184", 12) + "\n" + parse.Int("0379") + "\n" // 31 (ignores "9") + parseInt("0379") + "\n" + parse.Int("0x0FG") + "\n" // 15 (ignores "G") + parseInt("0x0FG") + "\n" + parse.Int("00zz", 35) + "\n" // 0 (ignored "zz") + parseInt("00zz", 35) + "\n" + parse.Int("0x00zz", 36) + "\n" // 55428623 + parseInt("0x00zz", 36) + "\n" + parse.Int("foo") + "\n" // NaN + parseInt("foo") + "\n" );</PRE></BLOCKQUOTE>
-
// check parameters/arguments if ( this.Int.arguments.length < 1 ) return Number.NaN;
IIRC this kind of construct is deprecated, and not part of the ECMAScript standard. Better to use just
arguments.length
.var flag = false; // at least 1 valid character is found
flag
isn't a very good variable name.// determin radix if ( r ) if ( r >= 2 && r <= 36 ) rad = r; else num = Number.NaN; else
I would put braces round the outer
if
body - it's not necessary, but a bit clearer IMHO. Also, you don't check for a negative sign in this branch.
-
Hopelessly inefficient - it's O(RN) for radix R and length N, or perhaps O(RN^2) (I'm not sure of the performance characteristics of javascript substr). Use a lookup table, or direct manipulation of the character's ASCII value, don't scan a string to translate the characters to integer values. Use charAt(x), don't substr in the innermost loop.
Avoiding premature optimisation is one thing, but there's no reason to include needless polynomial factors when it doesn't make the code any simpler.
-
without testing, I don't think any of those examples work:
parse.Int('-1', 10);
parse.Int('-0x01');
-
@noehch said:
Aside from the fact that it was pointless (besides consuming time), how'd I do?
Well, you forgot that the function accepts leading and trailing whitespace... [url=http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf]Check section 15.1.2.2[/url]
Also you should have probably leveraged regular expressions to get usable performance from it; parsing strings a character at a time with a ecmascript explicit state machine is a bit slow.
-
Thanks for the feedback.
Though...I, honestly, don't see how regular expressions would make things any faster.
The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.Anyways, here's the updated snippet:
var parse = { Int: function(s, r) { // check parameters/arguments if ( arguments.length < 1 ) return Number.NaN;
// initialize variables var spectrum = { // character spectrum // lookup table 0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, a: 10, b: 11, c: 12, d: 13, e: 14, f: 15, g: 16, h: 17, i: 18, j: 19, k: 20, l: 21, m: 22, n: 23, o: 24, p: 25, q: 26, r: 27, s: 28, t: 29, u: 30, v: 31, w: 32, x: 33, y: 34, z: 35 } var num = 0, // returned number neg = false, // negetive str = new String(s).toLowerCase(), // processed string rad; // active radix var index = 0; // input string traverser valid = false; // validity of the string // trim leading/trailing spaces str = str.replace(/^\s+|\s+$/g ,''); // check trimmed string length if ( str.length < 1 ) { str = Number.NaN; } else { // determin negativity if ( str.charAt(index) == '-' ) { neg = true; index++; } // determin radix if ( r ) { if ( r >= 2 && r <= 36 ) rad = r; else num = Number.NaN; } else { // octal or hex if ( str.charAt(index) == '0' ) { valid = true; // assume 0 rad = 8; index++; // hex-only if ( str.charAt(index) == 'x' ) { valid = false; // remove assumption rad *= 2; index++; } } else { rad = 10; } } // check unprocessed string length if ( !( str.length > index ) ) { num = Number.NaN; } // scan unprocessed string for ( var i = 0 ; (!isNaN(num)) && rad && index < str.length; index++ ) { // get index of character from the active spectrum (sym-lookup) i = spectrum[ str.charAt(index) ]; if ( i > -1 && i < rad ) // valid character { num *= rad; // move digits (e.g., 10 => 100 [base 10]) num += i; // increment valid = true; // found a valid character } else { if (!valid) num = Number.NaN; // set to NaN rather than the initial value of 0 break; // end for loop } } if ( neg && (!isNaN(num)) ) // adjust for negative num *= -1; } return num; }
}
alert( ''
+ parse.Int("184") + "\n" // 184
+ parseInt("184") + "\n"+ parse.Int("184", 12) + "\n" // 244 + parseInt("184", 12) + "\n" + parse.Int("0379") + "\n" // 31 (ignores "9") + parseInt("0379") + "\n" + parse.Int("0x0FG") + "\n" // 15 (ignores "G") + parseInt("0x0FG") + "\n" + parse.Int("00zz", 35) + "\n" // 0 (ignored "zz") + parseInt("00zz", 35) + "\n" + parse.Int("0x00zz", 36) + "\n" // 55428623 + parseInt("0x00zz", 36) + "\n" + parse.Int("foo") + "\n" // NaN + parseInt("foo") + "\n" + parse.Int('-1', 10) + "\n" // tests from replies + parse.Int('-0x01') + "\n" + parse.Int(' -0x81 ') + "\n" // allows leading/trailing spaces );</PRE></BLOCKQUOTE>
-
@noehch said:
Thanks for the feedback.
Though...I, honestly, don't see how regular expressions would make things any faster.
The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.Parsing a string to int is about patterns in text.
RegEx enters the picture naturally, it seems.
Whether it makes things faster is up for experiment. For a program a while ago, I wasn't sure whether a combination of substr, indexOf and += was faster than splitting and joining an array, but doing it "manually" turned out to be significantly quicker. In FFX extremely so, even.
As usual, YMMV.
-
I was thinking you could use it to match 2 or 3 digits at a time with appropriate (large) lookup table. Static initialization function for common bases (3-grouped for octal and decimal, 2-grouped for hex). Or something...
-
@kirchhoff said:
I was thinking you could use it to match 2 or 3 digits at a time with appropriate (large) lookup table. Static initialization function for common bases (3-grouped for octal and decimal, 2-grouped for hex). Or something...
The problem I see with this is that I'd already need 2 seperate lookup tables...and that would only cover radix 8 and 16...what about the others between 2 and 36?I also can't really combine them into 1 lookup table cause [for example] "yb" is a different value for radix 35 (1201) and radix 36 (1235).
In theory, your idea is good...but, once it's introduced to practice, its feasibility seems shot...but, maybe it's just me.
But, in all honesty, this suggestion keeps making me think of Just in Case.
-
@noehch said:
But, in all honesty, this suggestion keeps making me think of Just in Case.
Meant to point specifically to "the suggestion of regular expressions."More and more...it just seems like a Complicator's approach (e.g., Bycicle and Glove) to use regular expressions in this case.
Maybe character-by-character is as well...but, it has a longer and stronger history than regular expressions...which work by matching character-by-character (seems redundant).For example, validating email and date strings makes sense. But, with those, you generally use something else (class, function, etc.) to actually put the strings to use.
The only exception, I can [quickly] think of, would be replace() [or s///g]...and, I don't think, it's really that good of an exception.
-
Retrospect has a nice way of showing you how much of an ass you can be.
Sorry about that.Let me try this again.
@dhromed said:@noehch said:
Thanks for the feedback.
Though...I, honestly, don't see how regular expressions would make things any faster.
The only practical use for them would be checking the beginning up to '-0x' -- and that's not very practical, at all.Parsing a string to int is about patterns in text.
RegEx enters the picture naturally, it seems.
While that does make sense, what I am (and was) stuck on is how do you then get the integer value using RegExps?
My only guess was to then use the lookup table (or indexOf) anyways.If you have any suggestions, I'll gladly try them.