It works...

Stupidumb

Found in previous employee's code. Not a huge deal, but, I think it's kinda funny:

public static bool IsSixDigits(string strDigits)
{
Regex objSixDigitPattern = new Regex("[0-9][0-9][0-9][0-9][0-9][0-9]");
return objSixDigitPattern.IsMatch(strDigits);
}

AbbydonKrafts

Not too bad. It's a little overkill to instantiate a Regex object just for that. I would've stuck with a numeric and length check.

skippy

Is that for exactly six digits or at least 6?

viraptor

The real WTF is that he didn't do "[0-9]{6}".

The real real WTF is that he forgot ^ and $.

AbbydonKrafts

@viraptor said:

The real real WTF is that he forgot ^ and $.

Oh yeah.. that, too.

@AbbydonKrafts said:

Not too bad. It's a little overkill to instantiate a Regex object just for that. I would've stuck with a numeric and length check.

probably because you have not mastered regexes. Otherwise you would have realized that "^\d{6}$" is much easier to write and maintain than however you would have done it.

fennec

@tster said:

@AbbydonKrafts said:
Not too bad. It's a little overkill to instantiate a Regex object just for that. I would've stuck with a numeric and length check.
probably because you have not mastered regexes. Otherwise you would have realized that "^\d{6}$" is much easier to write and maintain than however you would have done it.

Yes, but, since the one minor strike against me on my quiz-for-employment was with a regular expression problem not entirely dissimilar to this one, I must point out the error: that particular regular expression you have would match "123456\n" - which is six digits and a newline, not just six digits. Apparently it's generally-somewhat-better to use \A and \Z instead of ^ and $ in validation cases like this. Pesky newlines.

reptar

Logarithms are your friends...

dhromed

@fennec said:

the one minor strike against me on my quiz-for-employment was with a regular expression problem not entirely dissimilar to this one, I must point out the error: that particular regular expression you have would match "123456\n" - which is six digits and a newline, not just six digits. Apparently it's generally-somewhat-better to use \A and \Z instead of ^ and $ in validation cases like this. Pesky newlines.

If your regex engine does that, it depends on the flavour. :)

In JS, /^\d{6}$/ will test for precisely six numeric characters, and not the newline. $ will test for the end of the string. It will return false on "123456\n", as the end of the string is not taken by a digit.

When multiline mode is set, all newlines function as end-of-string, and the $ will also match \n. However, $ matches a position, and not a character, and the string selected by the reg will not include a \n.

And it will suck balls, by the way, if $ suddenly returns a whole character. It voids the use of $.

fennec

@dhromed said:

If your regex engine does that, it depends on the flavour. :)

Perl.

In JS, /^\d{6}$/ will test for precisely six numeric characters, and not the newline. $ will test for the end of the string. It will return false on "123456\n", as the end of the string is not taken by a digit.
When multiline mode is set, all newlines function as end-of-string, and the $ will also match \n. However, $ matches a position, and not a character, and the string selected by the reg will not include a \n.
And it will suck balls, by the way, if $ suddenly returns a whole character. It voids the use of $.

True. But you're in an isSixDigits function, not an extractSixDigits function, so even if it doesn't return the character, the string being checked will still pass if it matches. This looks like Java. What would Java do with this, I wonder...? maybe I'll test it some day.

XIU

Looks dotnet to me, Java doesn't use Pascal casing.

zip

@dhromed said:

@fennec said:
the one minor strike against me on my quiz-for-employment was with a regular expression problem not entirely dissimilar to this one, I must point out the error: that particular regular expression you have would match "123456\n" - which is six digits and a newline, not just six digits. Apparently it's generally-somewhat-better to use \A and \Z instead of ^ and $ in validation cases like this. Pesky newlines.

If your regex engine does that, it depends on the flavour. :)

In JS, /^\d{6}$/ will test for precisely six numeric characters, and not the newline. $ will test for the end of the string. It will return false on "123456\n", as the end of the string is not taken by a digit.

When multiline mode is set, all newlines function as end-of-string, and the $ will also match \n. However, $ matches a position, and not a character, and the string selected by the reg will not include a \n.

And it will suck balls, by the way, if $ suddenly returns a whole character. It voids the use of $.

The fact that there's this much discussion about as simple a regex as "is it 6 numeric characters?" is why regexes are more trouble than they're worth at least half the time.

If I can write code that does what the regex does in less than 60 seconds, there's no reason to use one. If I write code like this:

match = true
match &= (string length is 6)
foreach(char in string)

{

match &= char >= 0 && char <= 9

}

return match

There's no gotchas about regex engines and I can get on with my life.

AbbydonKrafts

@tster said:

probably because you have not mastered regexes. Otherwise you would have realized that "^\d{6}$" is much easier to write and maintain than however you would have done it.

You have no idea how many RegExes are in this Frankenstein JavaScript/HTML/WinForm/COM thing that I have to maintain. I've had to write plenty due to the way the "SDK" works.

However, for simple validation in WinForm apps, I would write out the code for it. It's easier to read and maintain. Also, it's guaranteed to work. I've already found that every single RegEx implementation is different from the next. So, "mastering" RegEx is almost like mastering the English language including its many dialects-within-dialects (ex: American English/New York Bronx). It's retarded when I have to tinker with a RegEx that works fine in one language just to make it work in another. Yet, it's pretty much a straight translation when regular code is involved.

dhromed

@zip said:

The fact that there's this much discussion about as simple a regex as "is it 6 numeric characters?" is why regexes are more trouble than they're worth at least half the time.

As the saying goes,

"Sometimes you have this problem, and you think Regex may help you solve it.

Now you have two problems."

Stupidumb

Correct, this is C#

DaveK1

@zip said:

If I can write code that does what the regex does
in less than 60 seconds, there's no reason to use one. If I write
code like this:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match
There's no gotchas about regex engines and I can get on with my life.

Well, ok, but I would leave out the

match = true

and just write

match = (string length is 6)

I wouldn't bother iterating the chars either, if the length wasn't 6. It's not a WTF, but there's no need to be gratuitously inefficient - how do you suppose Windoze ended up getting so bloated?

asuffield

@zip said:

The fact that there's this much discussion about as simple a regex as "is it 6 numeric characters?" is why regexes are more trouble than they're worth at least half the time.

If I can write code that does what the regex does in less than 60 seconds, there's no reason to use one. If I write code like this:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match

The primary reason for using a regexp is that the regexp engine is probably much smarter than you are about executing them efficiently. For example, the code you quote is considerably slower (O(N)) than what any reasonable regexp engine would have generated for the correct regexp (O(1)).

You can implement anything in a Turing-complete language, but that doesn't mean you necessarily should. More restrictive languages offer better opportunities for an optimising compiler. Regexps are a special case: for anything that can be implemented as a regexp, there is a known algorithm for generating the fastest code possible. (Turing-complete languages lie at the opposite end of the scale: there is a proof that no such algorithm can exist for a program written in such a language)

zip

@DaveK said:

@zip said:
If I can write code that does what the regex does
in less than 60 seconds, there's no reason to use one. If I write
code like this:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match
There's no gotchas about regex engines and I can get on with my life.
Well, ok, but I would leave out the
match = true
and just write
match = (string length is 6)
I wouldn't bother iterating the chars either, if the length wasn't 6. It's not a WTF, but there's no need to be gratuitously inefficient - how do you suppose Windoze ended up getting so bloated?

I hope some day I can be as 1337 as you at critiquing pseudocode.

zip

@asuffield said:

@zip said:
The fact that there's this much discussion about as simple a regex as "is it 6 numeric characters?" is why regexes are more trouble than they're worth at least half the time.

If I can write code that does what the regex does in less than 60 seconds, there's no reason to use one. If I write code like this:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match
The primary reason for using a regexp is that the regexp engine is probably much smarter than you are about executing them efficiently. For example, the code you quote is considerably slower (O(N)) than what any reasonable regexp engine would have generated for the correct regexp (O(1)).

Are you joking? You really think running time of a regex doesn't scale linearly as input size increases?

fennec

@zip said:

@asuffield said:
@zip said:
The fact that there's this much discussion about as simple a regex as "is it 6 numeric characters?" is why regexes are more trouble than they're worth at least half the time.

If I can write code that does what the regex does in less than 60 seconds, there's no reason to use one. If I write code like this:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match
The primary reason for using a regexp is that the regexp engine is probably much smarter than you are about executing them efficiently. For example, the code you quote is considerably slower (O(N)) than what any reasonable regexp engine would have generated for the correct regexp (O(1)).
Are you joking? You really think running time of a regex doesn't scale linearly as input size increases?

For one that anchors to the start and end with ^$ or \A\Z and makes no use of the Kleen closure? I should hope it doesn't. Otherwise, whoever wrote the regular expression engine clearly hasn't optimized it nearly enough. Would you like me to run some benchmarks?

asuffield

@zip said:

@asuffield said:
@zip said:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match
The primary reason for using a regexp is that the regexp engine is probably much smarter than you are about executing them efficiently. For example, the code you quote is considerably slower (O(N)) than what any reasonable regexp engine would have generated for the correct regexp (O(1)).
Are you joking? You really think running time of a regex doesn't scale linearly as input size increases?

This particular regexp can be resolved by inspecting a maximum of six bytes of memory, regardless of the input size. The standard regexp compiler algorithm will find this solution.

GettinSadda

@asuffield said:

This particular regexp can be resolved by inspecting a maximum of six bytes of memory, regardless of the input size. The standard regexp compiler algorithm will find this solution.

I have two problems with this comment.

1) It would take a minimum of 7 bytes to be checked - one for each digit and at least one one to check that the length is 6 (which could be by reading the following byte and checking for a terminating zero, or reading a length value)

2) Are you really claiming that the C# compiler will optimize a Regex object to inline code by examining the contents of the expression - I very very much doubt that!

zip

@asuffield said:

@zip said:
@asuffield said:
@zip said:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match
The primary reason for using a regexp is that the regexp engine is probably much smarter than you are about executing them efficiently. For example, the code you quote is considerably slower (O(N)) than what any reasonable regexp engine would have generated for the correct regexp (O(1)).
Are you joking? You really think running time of a regex doesn't scale linearly as input size increases?
This particular regexp can be resolved by inspecting a maximum of six bytes of memory, regardless of the input size. The standard regexp compiler algorithm will find this solution.

Oh, ok, so if you know the right input contains exactly 6 chars then you only need to look at 6 bytes of memory. So change my pseudocode to only run the foreach statement if string length is 6, or to break on a 7th character and they're both O(1).

The regex still isn't magically faster. I thought you were saying that a the regex runs in O(1) when N is the number of input chars that need to be examined to determine validation, not just the total number of input chars.

asuffield

@GettinSadda said:

@asuffield said:
This particular regexp can be resolved by inspecting a maximum of six bytes of memory, regardless of the input size. The standard regexp compiler algorithm will find this solution.

I have two problems with this comment.

1) It would take a minimum of 7 bytes to be checked - one for each digit and at least one one to check that the length is 6 (which could be by reading the following byte and checking for a terminating zero, or reading a length value)

Bah. Depends how you're thinking of the problem. For regexps anchored on both ends, yes, you need to check seven bytes. I was assuming left-anchor only, in the style of a stream parser (since that's what regexps are most commonly used for).

2) Are you really claiming that the C# compiler will optimize a Regex object to inline code by examining the contents of the expression - I very very much doubt that!

I have no idea what the C# compiler does. Good regexp engines more or less do this. They don't actually generate inline code, because the standard regexp algorithm uses exactly the same code to evaluate every regexp - what they do is generate a table of data which that code then uses. The code itself will be in a library function, which the compiler may or may not inline, and is approximately this:

i = 0;
state = start;
while (state) {
  c = data[i]; // where data is the string to examine
  edge = state[c];
  state = edge.state;
  i += edge.shift;
}

(All the real work is done by the regexp compiler, which generates a 'program' of states and edges from the supplied regexp. Whether this happens when the containing program is compiled, or when that program is executed, is an implementation detail that doesn't matter here - executing the regexp is exactly the same either way.)

asuffield

@zip said:

Oh, ok, so if you know the right input contains exactly 6 chars then you only need to look at 6 bytes of memory. So change my pseudocode to only run the foreach statement if string length is 6, or to break on a 7th character and they're both O(1).

It still contains two compares per loop, where a good regexp engine might choose to use a lookup table. The point remains: compiling and optimising code by hand is very rarely a good idea; we have compilers for a reason.

Stupidumb

Well, I'm still a bit confused on what you guys are discussing, but that's ok because I figured out the best way to re-write this:

public static bool IsSixDigits(int x1, int x2, int x3, int x4, int x5, int x6)
{
return true;
}

And, I don't know anything about compilers, but it should probably optimize this function to look like this:

/* OW! */

GettinSadda

@asuffield said:

I have no idea what the C# compiler does. Good regexp engines more or less do...

I don't know much about the C# compiler either, but I believe it is similar to the C++ compiler where the library classes are not built into the compiler, but are separate C# header and object files that are linked in. This means that as far as the compiler is concerned you may well be working with a "FooBar" object. It does not know enough about it to do special optimizing.

It is actually possible to write functions and simple classes that can be totally optimized out by a good compiler, but for anything less than trivial they tend to make your brain bleed.

For a trivial example:

inline int IsStringYes(const char *Str) {

if(Str[0] == "Y"] && Str[1] == "e" && Str[2] == "s" && Str[3] == 0) return true;

return false;

}

Will vanish if you call it with a string constant, but:

inline int IsStringYes(const char *Str) {

return strcmp(Str, "Yes");

}

Will still call the strcmp function.

zip

@asuffield said:

@zip said:
Oh, ok, so if you know the right input contains exactly 6 chars then you only need to look at 6 bytes of memory. So change my pseudocode to only run the foreach statement if string length is 6, or to break on a 7th character and they're both O(1).
It still contains two compares per loop, where a good regexp engine might choose to use a lookup table. The point remains: compiling and optimising code by hand is very rarely a good idea; we have compilers for a reason.

Right. If the runtime of this mattered in any way whatsoever, then your regex argument is fair. But in this case, it's a simple comparison, it's going be plenty fast (O(1)) with decently written code and I can move on to bigger and better things instead of worry about the corner cases of my regex engine. Which was my original point.

fine, if your worried about a "\n" at the end of the thing then do this.

if (str.length() != 6) return false;

if (str.match("\D")) return false;

return true;

contrary to what has been said, these short regexes are easier to maintain than for loops looping through the characters.

Rafael_Almeida

@reptar said:

Logarithms are your friends...

Couldn't agree more!

If Ceil(Log(number)) = 6 Return true

( pseudo-code, lol )

Arancaytar

"Do one thing well."

It works, but it's really messed up. Regardless of what application this is in, there is no way that having a function like this would been an improvement over checking the length and number format every time it is needed.

Oh, and the "Six" in the name is a WTF itself. Prediction:

1. In the forseeable future, this function library will be extended by "isFourDigits", "isFiveDigits", "isThreeDigits" etc.

2. The following comment will need to be added at some point.

public static bool isSixDigits(String strDigits) // now actually checks for a length of seven. six is only due to legacy naming.

bobday

@asuffield said:

The primary reason for using a regexp is that the regexp engine is probably much smarter than you are about executing them efficiently. For example, the code you quote is considerably slower (O(N)) than what any reasonable regexp engine would have generated for the correct regexp (O(1)).

Nah baby. The regular expression would have to be compiled at runtime to get any decent performance, and still wouldn't match some simple code here for speed and readability. You've got a good point about the regex engine being smarter than we are, but I think it only applies to problems that are a bit more complex than this.

- Eam

upper_branch

Uh, the people suggesting logarithms are joking, right? I know premature optimization is bad, but there's no need to involve floating-point approximations(or other, probably more expensive implementations of 10-logarithms) where two integer comparisons would suffice and be clearer:

#define IS_SIX_DIGITS(x) ( x >= 100000 && x <= 999999 )

Of course, this assumes that "000042" is not supposed to be a six-digit number, even though it does match the regexp, but logarithms would be useless for checking any property in which, e.g., 000042 is not equivalent to 42, since log( 000042 ) = log( 42 ).

DaveK1

@zip said:

@DaveK said:
@zip said:
If I can write code that does what the regex does
in less than 60 seconds, there's no reason to use one. If I write
code like this:
match = true
match &= (string length is 6)

foreach(char in string)
{
 match &= char >= 0 && char <= 9
}
return match
There's no gotchas about regex engines and I can get on with my life.
Well, ok, but I would leave out the
match = true
and just write
match = (string length is 6)
I wouldn't bother iterating the chars either, if the length wasn't 6. It's not a WTF, but there's no need to be gratuitously inefficient - how do you suppose Windoze ended up getting so bloated?
I hope some day I can be as 1337 as you at critiquing pseudocode.

I hope someday I can be as paranoid as you when perceiving an innocent comment as a personal attack. No, wait a minute, I don't....

Random832

@GettinSadda said:

@asuffield said:
I have no idea what the C# compiler does. Good regexp engines more or less do...
I don't know much about the C# compiler either, but I believe it is similar to the C++ compiler where the library classes are not built into the compiler, but are separate C# header and object files that are linked in. This means that as far as the compiler is concerned you may well be working with a "FooBar" object. It does not know enough about it to do special optimizing.
It is actually possible to write functions and simple classes that can be totally optimized out by a good compiler, but for anything less than trivial they tend to make your brain bleed.
For a trivial example:
inline int IsStringYes(const char *Str) {
if(Str[0] == "Y"] && Str[1] == "e" && Str[2] == "s" && Str[3] == 0) return true;
return false;
}
Will vanish if you call it with a string constant, but:
inline int IsStringYes(const char *Str) {
return strcmp(Str, "Yes");
}
Will still call the strcmp function.

~~I had this whole response written, and then realized you were talking about the C# compiler in particular, not compilers in general.~~

~~Here it is anyway.~~

reads again Nevermind, i was wrong, you were generalizing, my compiler's better than yours.

#include <stdio.h>

inline int IsStringYes(const char *Str) {

      return strcmp(Str, "Yes");

}

int main() {
    printf("%d\n",IsStringYes("Yes"));
    printf("%d\n",IsStringYes("No"));
    printf("%d\n",IsStringYes("File not found"));
    return 0;
}

Output assembler contains no calls to strcmp, and while there is a definition of IsStringYes (this vanishes if static is added), it is not called from the main function.

Relevant output:

        movl    $0, 4(%esp)
        leal    LC1-"L00000000002$pb"(%ebx), %esi ; pointer is to "%d\n\0"
        movl    %esi, (%esp)
        call    L_printf$stub
        movl    $-1, 4(%esp)
        movl    %esi, (%esp)
        call    L_printf$stub
        movl    $-1, 4(%esp)
        movl    %esi, (%esp)
        call    L_printf$stub