IsNumber function



  • Hi there,

    I recently stumbled accross the following code in our, urm, project:

    public bool IsNumber(String strNumber)
            {  
                Regex objNotNumberPattern=new Regex("[^0-9.-]");
                Regex objTwoDotPattern=new Regex("[0-9][.][0-9][.][0-9]");
                Regex objTwoMinusPattern=new Regex("[0-9]
    [-][0-9][-][0-9]");
                String strValidRealPattern="^([-]|[.]|[-.]|[0-9])[0-9][.][0-9]+$";
                String strValidIntegerPattern="^([-]|[0-9])[0-9]*$";
                Regex objNumberPattern =new Regex("(" + strValidRealPattern +")|(" + strValidIntegerPattern + ")");
                return !objNotNumberPattern.IsMatch(strNumber) &&
                    !objTwoDotPattern.IsMatch(strNumber) &&
                    !objTwoMinusPattern.IsMatch(strNumber) &&
                    objNumberPattern.IsMatch(strNumber); 
            } 

    I commented it out and replaced it with:

    return Regex.IsMatch(strNumber, "^[0-9]+$");
    // ^[-]?[\d]+.[\d]+$

    Leaving in an expression to check strings with a decimal point (I don't think the function is actually used, I just couldn't leave it be).

    Maybe I am the stupid one, but it bugging the heck out of me what the first function was actaully trying to do, I thought someone else might enjoy a look.

    Is there any sense in what looks like madness?

    ed: Oh, I am no regex expert, I just spent a few minutes downloading Expresso, reading about and then having a play until I got what I wanted. (this function would also have been written by a highly paid contractor)



  • @lampkin said:

    Hi there,

    I recently stumbled accross the following code in our, urm, project:

    public bool IsNumber(String strNumber)
            {  
                Regex objNotNumberPattern=new Regex("[^0-9.-]");
                Regex objTwoDotPattern=new Regex("[0-9][.][0-9][.][0-9]");
                Regex objTwoMinusPattern=new Regex("[0-9]
    [-][0-9][-][0-9]");
                String strValidRealPattern="^([-]|[.]|[-.]|[0-9])[0-9][.][0-9]+$";
                String strValidIntegerPattern="^([-]|[0-9])[0-9]*$";
                Regex objNumberPattern =new Regex("(" + strValidRealPattern +")|(" + strValidIntegerPattern + ")");
                return !objNotNumberPattern.IsMatch(strNumber) &&
                    !objTwoDotPattern.IsMatch(strNumber) &&
                    !objTwoMinusPattern.IsMatch(strNumber) &&
                    objNumberPattern.IsMatch(strNumber); 
            } 

    I commented it out and replaced it with:

    return Regex.IsMatch(strNumber, "^[0-9]+$");
    // ^[-]?[\d]+.[\d]+$

    Leaving in an expression to check strings with a decimal point (I don't think the function is actually used, I just couldn't leave it be).

    Maybe I am the stupid one, but it bugging the heck out of me what the first function was actaully trying to do, I thought someone else might enjoy a look.

    Is there any sense in what looks like madness?

    ed: Oh, I am no regex expert, I just spent a few minutes downloading Expresso, reading about and then having a play until I got what I wanted. (this function would also have been written by a highly paid contractor)


    Looks ok.

    One obvious one is that you don't need to put brackets around [i]everything[/i].

    Brackets are for:

    [a-z] = any single char in the a-z range
    [a-zA-Z] = any single char in teh a-z range, case-insensitive
    [0-9] = any digit, but more fruitfully replaced with \d
    dhw = a char sequence that is exacly "dhw"
    [dhw] = any single char that is d OR h OR w. Equivalent to (d|h|w), but simpler.

    [^G] = any single [i]existing[/i] char that is NOT a "G".
    G? = a "G" zero OR one times. That is, 1 "G" or nothing at all.



  • I don't see how your regex matches floating point numbers.  It looks like you have to change the commented regex in by hand depending on whether you are matching integers or floats.  That would be a wtf in itself!
    And what about negative integers?
    But if the function isn't used you should really just rip it out to clean the source tree.

    As for the original function: it is very strange!  I especially like the TwoDotPattern - what about  ThreeDotPattern etc ;-)
    It looks like the original function will accept just "-" as a valid integer: not sure if that is good or bad...


    Having the sub regex strings named as strValidRealPattern & strValidIntegerPattern is nice though: self documenting code.




  • Yeh, I use this as a reference:


    I just thought the initial function was crazy, I don't even think it works properly? (yeh I woindered about threedot)

    I probably should just strip out the function. I wasn't sure if it they wanted it to check for floating point numbers (why I put the code to do it in brackets).  IsNumber could imply, is whole number and there are many areas where only whole numbers should be allowed in this application.

    ed: I guess the reason I havn't stripped it out is because it is the kind of application where you darn't take something out for fear of the whole thing dieing.



  • Personally I am not into the whole hungarian notation. We use camel case.

    ed: Sorry, probably should have done a search before creating a thread.  Perhaps this wasn't really deserving :P



  • Crap, sorry for the multiple posts..I am a bit of an edit freak (it has a limit, nooo). I just did a check and it is used and should be for whole numbers only based on where it is being used.



  • @lampkin said:


    I just thought the initial function was crazy, I don't even think it works properly? (yeh I woindered about threedot)


    It looks like the original author didn't know of the ? syntax.  Without ? they had to do the twodot stuff because they have two dot matches in the isReal matching string.

    One thing the original does that yours doesn't is accept .7 or -.3 as a valid number.  I reckon it can be minimized to this to match both int and float:
    ^-?\d*[.]?\d+$

    @lampkin said:

    I probably should just strip out the function. I wasn't sure if it they wanted it to check for floating point numbers (why I put the code to do it in brackets).  IsNumber could imply, is whole number and there are many areas where only whole numbers should be allowed in this application.


    If it is to match just integers then IsInteger() would be clearer.  If I saw IsNumber() in code I wouldn't assume that it would reject a decimal as a number.

    @lampkin said:

    ed: I guess the reason I havn't stripped it out is because it is the kind of application where you darn't take something out for fear of the whole thing dieing.


    I know what you mean.  But you do have source control don't you?  And if this is compiled code then the build will fail so you'll know to put it back in...

    Oh yes just thought of another thing.  This is fine for US and UK applications but what if you want to deploy to Europe where they use , as the decimal separator?



  • @danio said:



    It looks like the original author didn't know of the ? syntax.  Without ? they had to do the twodot stuff because they have two dot matches in the isReal matching string.

    One thing the original does that yours doesn't is accept .7 or -.3 as a valid number.  I reckon it can be minimized to this to match both int and float:
    ^-?\d*[.]?\d+$

    If it is to match just integers then IsInteger() would be clearer.  If I saw IsNumber() in code I wouldn't assume that it would reject a decimal as a number.

    I know what you mean.  But you do have source control don't you?  And if this is compiled code then the build will fail so you'll know to put it back in...



    Yeh we have source control, thankfull. 

    Actually I am getting all confused.  I did a search and  it seems this function has been copied and pasted into each individual web page, one page has "IsWholeNumber", the page I altered it on doesn't even use it.  I'll bin it from this page. 

    Thanks for the advice on accepting .7 -.3 I'll remember that for future reference!

    ed: (I like my eds..heh) Nice forum you have here too! I expected to have ben n00b bashed to death by now.  Thanks for the responses!



  • the real WTF is that the source has been copy pasted onto every page.



  • @tster said:

    the real WTF is that the source has been copy pasted onto every page.


    Yeh, it's been a fun application. Every single page that has been done was copied from another page, we have one page over 2000 lines of code (coded by the same guy I think) which is pure copy and paste.. I really think the whole application could be about 1/10th of the size with an inheritance model.  I did'nt come in on the project until the end.    I came on as a junior too, so it's quite scary to see experienced and well paid people doing this.

    The client said the last release was the best (the one I was fully involved in and since all the contractors have gone) :D

    The best WTF we had was a trigger we had which updated column of a table, the problem it has is the update had no 'where' clause...




  • @lampkin said:

    Maybe I am the stupid one, but it bugging the heck out of me what the first function was actaully trying to do, I thought someone else might enjoy a look.

    Is there any sense in what looks like madness?

    I'm not sure if this is what you're asking, but maybe it's supposed to give the user some idea of what exactly is wrong with the number they entered?  (Assuming the function is called on user input in the first place.)  Of course, it would help if it actually provided the caller with some way to know which regex ended up rejecting the input, but, well, you did say "[i]trying[/i] to do". ;-)



  • @lampkin said:

    Maybe I am the stupid one, but it bugging the heck out of me what the first function was actaully trying to do, I thought someone else might enjoy a look.

    Is there any sense in what looks like madness?


    It looks to me like this happened:

    Developer A write a function that checked the string by hand.
    A common approach is to check "Is everything in the string a valid charactor",  "Is there the correct number of each type of charactor", "Is everything in the right order".
    Developer B tells A that it would be easier/better to use Regex.
    Developer A thinks this means to convert his individual tests to regex, rather than write one regex to do the whole check completely.

    The proper regex to check for a possibly negative decimal number (with leading/trailing spaces)
    If you allow " - .123", "-0.123","0.", and "0": Use "^\s*-?\s*(:?\d*.\d+|\d+.?\d*)\s*$"
    If you allow "- 0.123", "-0" but not ".123" or "1.": Use "^\s*-?\s*(:?\d+.\d+|\d+)\s*$"

    Also, you should consider precompiling your regex (Not sure in .NET how, but I know you can in Java).  It can save a little bit of runtime, especially if its called repeatedly in a loop.  Probably not the case for isNumber, but its a good habit anyway.

    In Java I would write:
    private static final Pattern REGEX_DECIMAL_NUMBER= Pattern.compile("^\s*-?\s*(:?\d*\.\d+|\d+\.?\d*)\s*$");

    Good luck.



  • Since it's okay to have no digits on one side of the (optional) decimal
    point, but not on both sides, I would think it needs to look like this:



    ^-?(\d+(.\d*)?|.\d+)$




  • @VGR said:

    Since it's okay to have no digits on one side of the (optional) decimal
    point, but not on both sides, I would think it needs to look like this:



    ^-?(\d+(.\d*)?|.\d+)$


    That is more or less what this does: "^\s*-?\s*(:?\d*.\d+|\d+.?\d*)\s*$"
    (Without whitespace: "^-?(:?\d*.\d+|\d+.?\d*)$"
    That tranlates to "Optional '-'" and either (0 or more decimal, a '.' and 1 or more decimal) or (1 or more decimal, an optional '.' and 0 or more decimal)

    In other words:
    "\d*.\d+" matches ".1" ".123" "0.123"
    "\d+.?\d*" matches "0" "12." "123.32" "123"

    Niether match "."

    Your regex will match that too, but it has more "()" groupings. Not that that is a bad thing, but you should make them non-capturing groups anyway (again, slight optimization, not strictly necessary, but good practice)






  • Call me naive, but what about this:

    public static boolean isNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }



  • Error trapping AND format parsing.

    Quite the bit of overhead, compared to a proper regex.

    It's like smashing it through a round hole and if it breaks, it was a square peg.



  • [quote user="dhromed"]

    Error trapping AND format parsing.

    Quite the bit of overhead, compared to a proper regex.

    It's like smashing it through a round hole and if it breaks, it was a square peg.

    [/quote]

     

    I know what you mean, but if it is really only about one single method isNumber, then I see it as justified. I mean it is not that I try to archieve some icky control flow through out the application.



  • [quote user="Phalphalak"][quote user="dhromed"]

    Error trapping AND format parsing.

    Quite the bit of overhead, compared to a proper regex.

    It's like smashing it through a round hole and if it breaks, it was a square peg.

    [/quote]

     

    I know what you mean, but if it is really only about one single method isNumber, then I see it as justified. I mean it is not that I try to archieve some icky control flow through out the application.

    [/quote]

    Which is why I love .Net 2.0 - it is trivial unless you have some additional rules to deal with. 

    <font size="3">public static bool IsNumber(String s)
    {
    Double d;
    return Double.TryParse(s,d);
    }</font>

    First, it doesn't throw exceptions so you don't have a performance hit.  Second, it's readable/understandable by those who aren't very good at regexes. And third, it correctly handles cultural formatting which you would need to code for with a regex implementation.  Not sure what the performance difference between regexes and TryParse are though.

     

     



  • [quote user="lpope187"]

    <font size="3">public static bool IsNumber(String s)
    {
    Double d;
    return Double.TryParse(s,d);
    }</font>

    First, it doesn't throw exceptions so you don't have a performance hit.  Second, it's readable/understandable by those who aren't very good at regexes. And third, it correctly handles cultural formatting which you would need to code for with a regex implementation.  Not sure what the performance difference between regexes and TryParse are though.

     [/quote]

     Hm... I'm not very familiar with .Net but the way I understand TryParse is that it basically does the following:

    public static boolean TryParse(String s, Double d) {
        try {
            d = Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    Assuming that the Double d is passed on by reference that is... (???)
    As I said, I'm not familiar with .Net but a method called "tryParse" leads me to think of it as if it incorporates an actual try-catch. Don't know if it actually does. :)
    Nevertheless it seems to be a handy convenience method. 



  • [quote user="Phalphalak"]

     Hm... I'm not very familiar with .Net but the way I understand TryParse is that it basically does the following:

    public static boolean TryParse(String s, Double d) {
        try {
            d = Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    Assuming that the Double d is passed on by reference that is... (???)
    As I said, I'm not familiar with .Net but a method called "tryParse" leads me to think of it as if it incorporates an actual try-catch. Don't know if it actually does. :)
    Nevertheless it seems to be a handy convenience method. 

    [/quote]

    Yeah, d is an out parameter.  I can see your logic for the underlying implementation actually using a try/catch, but I haven't found one yet.  I reflected it and got to about four levels deep to the internal Number class and the ParseNumber method.  At that point it seems to be a whole bunch of ifs, bit ops, and char matching since it is a generic number parser that can handle integers, floats, hex, etc.  It is definitely not pretty, but it gets the job done. 

    I think the naming comes from the Parse methods that were created first (that do throw exceptions) and when the need arose for methods that need the extra performance, MS decided to put Try infront of them so they didn't break anything.  At least, that's the only sane explanation I can come up with.
     



  • Those seem like awfully complex patterns.  Why could he not just use

    "^[-:b]*[0-9.]+[:b]*$" instead?

    The first group matches any combination of hyphens and spaces (captures the possible leading whitespace and the possible minus sign).  The second group captures the digits and a possible decimal point.  The third group captures any possible trailing whitespace.

     



  • [quote user="JCM"]

    Those seem like awfully complex patterns.  Why could he not just use

    "^[-:b][0-9.]+[:b]$" instead?

    The
    first group matches any combination of hyphens and spaces (captures the
    possible leading whitespace and the possible minus sign).  The
    second group captures the digits and a possible decimal point. 
    The third group captures any possible trailing whitespace.[/quote]

     

    Last I checked, "-- -- - -- - 123...3..3....123." is not a valid number.

    You're RE will match a valid number, but it won't weed out all invalid numbers. 


Log in to reply