Speaking of JavaScript...

..though this is wrong in any language:

var lastChar = str.charAt(emailStr.length - 1);
if(!lastChar.match(/[^\.]/i)) {
	return false;
}

Rarely do I think "w... t... F???!", but this was one of those times.

I guess

if( str[str.length-1] == '.' ) return false;

wouldn't be case insensitive, huh?

morbiuswilters

for (var i = 0; i < str.length; i++) {
    if ((str.charAt(i) == '.') && (i == (str.length - 1))) {
        return false;
    }
}

djmaze

Let's analyze if(!lastChar.match(/[^\.]/i))

[^\.] = not a dot

if (not not dot) return false
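To make the double negation concrete, here is a minimal sketch of what the original check computes, next to a plainer equivalent. The function names are made up for illustration, and it assumes the intent was simply "reject a string that ends in a dot".

```javascript
// Hypothetical names; the snippet in the first post is only a fragment.

// What the original does: "the last character does NOT match NOT-a-dot".
function endsWithDotOriginal(str) {
  const lastChar = str.charAt(str.length - 1);
  // The /i flag is harmless but pointless: a dot has no upper or lower case.
  return !lastChar.match(/[^\.]/i);
}

// The same intent without the double negation.
function endsWithDot(str) {
  return str.endsWith(".");
}

console.log(endsWithDotOriginal("user@example."), endsWithDot("user@example."));
console.log(endsWithDotOriginal("user@example.org"), endsWithDot("user@example.org"));
```

The two differ on one input: for an empty string charAt returns "", the regex finds nothing to match, and the original reports "ends with a dot", while endsWith(".") reports false.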

But the real WTF: why only check for a dot in the email address? Is anything other than a-z valid (&, ^, $, %, #, >, >, etc.)?

@djmaze said:

if (not not dot) return false

Correct.

@djmaze said:

But the real WTF: why only check for a dot in the email address?

Eh, I failed the anonymization, didn't I?

@djmaze said:

Is anything other than a-z valid (&, ^, $, %, #, >, >, etc.)?

Well, this is only part of the function. The email string is then checked against another regular expression, which kind of makes this redundant, to top the WTFness.

ender

@djmaze said:

But the real WTF: why only check for a dot in the email address? Is anything other than a-z valid (&, ^, $, %, #, >, >, etc.)?

The local part of the e-mail address can't really be checked for validity in any other way than by connecting to server and trying to send something. One of my e-mail addresses is &#/|@mydomain.

morbiuswilters

@ender said:

The local part of the e-mail address can't really be checked for validity in any other way than by connecting to server and trying to send something. One of my e-mail addresses is &#/|@mydomain.

Or read the RFC. Seriously, there are standards for what constitutes a valid email address and even though it's a mess, it's still possible to tell a legitimate address without connecting to the server.

sootzoo

ObligRandalSchwartz

@morbiuswilters said:

Or read the RFC. Seriously, there are standards for what constitutes a valid email address and even though it's a mess, it's still possible to tell a legitimate address without connecting to the server.

site:regexlib.com WRONG WRONG WRONG - Google Search

ender

@morbiuswilters said:

Or read the RFC. Seriously, there are standards for what constitutes a valid email address and even though it's a mess, it's still possible to tell a legitimate address without connecting to the server.

I said validity, not RFC conformity. You can check if the address conforms to the RFC 822 through some fairly complicated checks, but even if the address passes them, it doesn't tell you that it's actually valid and not something made up on the spot. As long as there's something in front of @domain, you should accept the address, and if you want to check if it's valid, send a confirmation message to it.
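A rough sketch of the permissive check described above, assuming "something in front of @domain" means a non-empty part on each side of the last @. The function name is invented for illustration, and sending the confirmation message is left out entirely.

```javascript
// Hypothetical helper: accept anything shaped like local@domain and
// leave real validation to a confirmation message.
function hasLocalPartAndDomain(address) {
  // Use the last "@", since a quoted local part may itself contain one.
  const at = address.lastIndexOf("@");
  // at > 0 means a non-empty local part; the second test means a non-empty domain.
  return at > 0 && at < address.length - 1;
}

console.log(hasLocalPartAndDomain("&#/|@mydomain"));
console.log(hasLocalPartAndDomain("@mydomain"));
console.log(hasLocalPartAndDomain("nobody"));
```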

morbiuswilters

@ender said:

I said validity, not RFC conformity. You can check if the address conforms to the RFC 822 through some fairly complicated checks, but even if the address passes them, it doesn't tell you that it's actually valid and not something made up on the spot. As long as there's something in front of @domain, you should accept the address, and if you want to check if it's valid, send a confirmation message to it.

Conformity to the standard is part of validity. Just because your mail server will accept any garbage thrown at it doesn't mean I should accept your invalid email address. The fact that you fail elementary data validation scares me. What happens when someone submits an address that is crafted to trigger a buffer overflow in an MTA to your app? It doesn't even have to be your MTA, the attacker can just use the domain of an MTA with a known vulnerability and have your server launch the attack.

I suppose you also just insert GET vars into your SQL queries, right? "Hey, my database should accept any data that I put into it! Gee-haw!"

morbiuswilters

@sootzoo said:

http://www.google.com/search?q=site%3Aregexlib.com+WRONG+WRONG+WRONG

Wow, you fail on so many levels. First, I never said it was possible to validate an RFC address using regexes -- it's not. Second, I never said you couldn't be more restrictive in what you accepted. Honestly, I wouldn't bother accepting anything other than "standard" characters -- fuck anyone who uses a + or a $ in their address. However, you don't have to connect to the server to validate the address. Seriously, pick up a book and teach yourself to rea-- oh, wait..

ender

@morbiuswilters said:

Conformity to the standard is part of validity. Just because your mail server will accept any garbage thrown at it doesn't mean I should accept your invalid email address. The fact that you fail elementary data validation scares me.

Well, it is possible to validate an address to conform to RFC-822, but even when address does conform, you can't know if it's valid until you actually try to send something to it. And checking for conformity won't catch typos, while overzealously limiting valid characters will prevent perfectly working e-mail addresses from being entered.

@morbiuswilters said:

What happens when someone submits an address that is crafted to trigger a buffer overflow in an MTA to your app?

I accept it - if it can trigger this through the web form, it'd be even easier if he connected to the e-mail server directly and do the same thing.

@morbiuswilters said:

It doesn't even have to be your MTA, the attacker can just use the domain of an MTA with a known vulnerability and have your server launch the attack.

That's not my problem. If administrator of domain X doesn't keep his software patched, it's not my place to guard him from vulnerabilities on his public facing servers.

@morbiuswilters said:

I suppose you also just insert GET vars into your SQL queries, right? "Hey, my database should accept any data that I put into it! Gee-haw!"

I'm not actually a developer, but the few CGI scripts I threw together all used parametric queries. They're much easier to use than doing manual escaping and hoping you didn't miss anything. I also don't see how this relates to validating e-mail addresses.

morbiuswilters

@ender said:

Well, it is possible to validate an address to conform to RFC-822, but even when address does conform, you can't know if it's valid until you actually try to send something to it. And checking for conformity won't catch typos, while overzealously limiting valid characters will prevent perfectly working e-mail addresses from being entered.

Obviously it doesn't catch those things and sending a verification email that has a link the user is required to click before they can proceed will at least confirm the address exists for now. Of course, it doesn't protect you against disposable addresses or an incorrectly-entered address that actually happens to be somebody else's mailbox or a hacker who has compromised the email account or the user from dying mere seconds after clicking the verification link or brain slugs that take control of the user and force him to use his email account for nefarious brain slug ends.

@ender said:

I accept it - if it can trigger this through the web form, it'd be even easier if he connected to the e-mail server directly and do the same thing.

Not necessarily, the local MTA can be different than your incoming MX MTA. Additionally, data entered locally is usually more trusted than data obtained remotely. Unless we're talking about your web apps, then remotely-obtained data isn't your problem!

@ender said:

That's not my problem. If administrator of domain X doesn't keep his software patched, it's not my place to guard him from vulnerabilities on his public facing servers.

Terrific attitude. I sure appreciate that you can't be bothered to prevent your own mediocre software from becoming an attack vector for hackers. I will be sure to withhold any help or sympathy when your servers are one day compromised.

@ender said:

I'm not actually a developer, but the few CGI scripts I threw together all used parametric queries. They're much easier to use than doing manual escaping and hoping you didn't miss anything. I also don't see how this relates to validating e-mail addresses.

Then why are we even having this conversation? If you're not a developer and don't intend to follow common development practices, why argue with me?

ender

@morbiuswilters said:

Terrific attitude. I sure appreciate that you can't be bothered to prevent your own mediocre software from becoming an attack vector for hackers. I will be sure to withhold any help or sympathy when your servers are one day compromised.

Sorry, but if a 3rd party server is compromised through an intended use of my server, that's not my fault.

@morbiuswilters said:

Then why are we even having this conversation? If you're not a developer and don't intend to follow common development practices, why argue with me?

I love an argument :) . And I'm pissed off by web forms that claim my e-mail address is invalid because it happens to contain a -, or a | (or in one case, because my e-mail domain didn't have an A record).

morbiuswilters

@ender said:

Sorry, but if a 3rd party server is compromised through an intended use of my server, that's not my fault.

Hmm, seems like a pretty callous attitude to take. What if you had an "email this page to a friend" form that was used to spam someone? Would it be their fault for not having better spam protection?

@ender said:

I love an argument :) .

Heh, me too.

@ender said:

And I'm pissed off by web forms that claim my e-mail address is invalid because it happens to contain a -, or a | (or in one case, because my e-mail domain didn't have an A record).

I don't see why - would be rejected, but I would not accept |. Basically, a-z, 0-9, -, _ and . should all be allowed. That's 99.9999% of users. Anyone who has a pipe in their address is probably a nerd anyway and should just set up an alias to forward their mail so they can have a normal email address for sites that don't accept their main address.
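For what it's worth, that allow-list is a one-line pattern. This is only a sketch of the rule as stated in this post (a-z, 0-9, -, _ and .), applied to the part before the @; the names are made up and case-insensitivity is an assumption.

```javascript
// Hypothetical allow-list for the local part: letters, digits, dot, underscore, hyphen.
const SIMPLE_LOCAL_PART = /^[a-z0-9._-]+$/i;

function isSimpleLocalPart(localPart) {
  return SIMPLE_LOCAL_PART.test(localPart);
}

console.log(isSimpleLocalPart("john.doe-1_x")); // allowed characters only
console.log(isSimpleLocalPart("&#/|"));         // the address from earlier in the thread
console.log(isSimpleLocalPart("user+tag"));     // + is not in the list as written
```

Note that this deliberately turns away addresses that the RFC permits, which is exactly the trade-off being argued about here.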

ender

@morbiuswilters said:

Hmm, seems like a pretty callous attitude to take. What if you had an "email this page to a friend" form that was used to spam someone? Would it be their fault for not having better spam protection?

No, that's a different kind of situation - such form would be my problem.

@morbiuswilters said:

I don't see why - would be rejected, but I would not accept |. Basically, a-z, 0-9, -, _ and . should all be allowed. That's 99.9999% of users. Anyone who has a pipe in their address is probably a nerd anyway and should just set up an alias to forward their mail so they can have a normal email address for sites that don't accept their main address.

Why - and _, but not |? RFC 822 allows all of them equally. Also, don't forget about +, which many sites let you use to add some unique ID to your e-mail address (I know several non-geek gmail users that take advantage of this).

morbiuswilters

@ender said:

Why - and _, but not |? RFC 822 allows all of them equally. Also, don't forget about +, which many sites let you use to add some unique ID to your e-mail address (I know several non-geek gmail users that take advantage of this).

Because I don't like them and it's such a small subset of users that I don't mind telling them no. + would probably be fine, too.

Kyanar

@sootzoo said:

@morbiuswilters said:
Or read the RFC. Seriously, there are standards for what constitutes a valid email address and even though it's a mess, it's still possible to tell a legitimate address without connecting to the server.

site:regexlib.com WRONG WRONG WRONG - Google Search

You might find this interesting then:

Regular Expression Library

It actually looks valid!

Kyanar

Ignore my last post, the fucking forum software expired my edit timer - woo 30 seconds!

@sootzoo said:

@morbiuswilters said:
Or read the RFC. Seriously, there are standards for what constitutes a valid email address and even though it's a mess, it's still possible to tell a legitimate address without connecting to the server.

site:regexlib.com WRONG WRONG WRONG - Google Search

You might find this interesting then:

Regular Expression Library

It actually looks valid!

@morbiuswilters said:

@ender said:
Why - and _, but not |? RFC 822 allows all of them equally. Also, don't forget about +, which many sites let you use to add some unique ID to your e-mail address (I know several non-geek gmail users that take advantage of this).
Because I don't like them and it's such a small subset of users that I don't mind telling them no. + would probably be fine, too.

If you can't be bothered validating it properly, just don't validate it. You have no business telling users what characters are "allowed" in their email addresses. Doing what you're doing just adds you to the ranks of those plagues on the internet who believe that you either agree with the way they do things or you can piss off. But hey, based on your responses in this thread that is your mentality.

morbiuswilters

@Kyanar said:

You might find this interesting then:

Regular Expression Library

It actually looks valid!

There is no regex that will actually fully validate all email addresses. It's not possible.

morbiuswilters

@Kyanar said:

If you can't be bothered validating it properly, just don't validate it. You have no business telling users what characters are "allowed" in their email addresses. Doing what you're doing just adds you to the ranks of those plagues on the internet who believe that you either agree with the way they do things or you can piss off. But hey, based on your responses in this thread that is your mentality.

Of course I have business telling them what addresses I will accept. If they can't deal, they can go elsewhere. I will make allowances for any commonly-used character, but nobody needs nested comments or other garbage in their addresses and by refusing to support them I'm making a decision to save tons of time and energy. Considering that 99.99% of sites don't validate addresses right, I hardly consider this to be a big deal and I'm certainly not alone. By this same logic, I would only support payment in US Dollars and Euros, I would only provide support and documentation in English. I'm only going to work with clients that support HTML 4.01, CSS and Javascript. None of these are unreasonable. Sounds like you are the type who would rather complain than contribute anything useful to society. But, hey, when you write a real grown-up app, feel free to support the entire range of valid addresses. Until then, take your money and bitching away from me.

Kyanar

@morbiuswilters said:

@Kyanar said:
You might find this interesting then:

Regular Expression Library

It actually looks valid!
There is no regex that will actually fully validate all email addresses. It's not possible.

I'm gonna disagree with that. At http://www.regular-expressions.info/email.html is a regex that fully implements the RFC. It will not, however, tell you if the email is real (only valid) or if the TLD is even possible.

That said, just save time - send a confirmation email.

morbiuswilters

@Kyanar said:

I'm gonna disagree with that. At http://www.regular-expressions.info/email.html is a regex that fully implements the RFC.

Then you're gonna be wrong. Too bad you didn't read the article you linked to, because it clearly says that the regex at the top is the recommended one. It's the best from a practical standpoint and is supported by almost everything. It happens to almost exactly match what I described here. It is impossible to fully validate an email address because the RFC allows for arbitrary nested comments (yes, comments) in an email address. Perl forms a tiny exception to this because it allows you to embed perl code into a regex, meaning any perl code is technically a valid regex. However, most people would not consider perl embedded in a regex a real regex, so it remains impossible.

ender

@morbiuswilters said:

There is no regex that will actually fully validate all email addresses. It's not possible.

There's a long ass one that implements full RFC 822 spec, except for comments (which have to be filtered out first).

@Kyanar said:

I'm gonna disagree with that. At http://www.regular-expressions.info/email.html is a regex that fully implements the RFC. It will not, however, tell you if the email is real (only valid) or if the TLD is even possible.

&#/|@example.org is a valid e-mail address which that regex doesn't match. Whoops, you fail. It also doesn't match any e-mail address on the .museum TLD.
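Both misses are easy to reproduce with a pattern of the simple, practical shape. The regex below is an assumption standing in for that kind of pattern, not a verified copy of the one on the linked page, and the .museum address is a made-up example.

```javascript
// Assumed "practical" pattern: simple local part, dotted domain, 2-4 letter TLD.
const PRACTICAL_EMAIL = /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i;

for (const address of [
  "john@example.org",       // ordinary address
  "&#/|@example.org",       // characters outside the local-part class
  "curator@example.museum", // TLD longer than four letters
]) {
  console.log(address, PRACTICAL_EMAIL.test(address));
}
```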

morbiuswilters

@ender said:

There's a long ass one that implements full RFC 822 spec, except for comments (which have to be filtered out first).

Thanks for finding that, I was unable to locate that page earlier. That regex should give anyone who doesn't know an idea of how insane RFCs 822/2822 are. And of course you have to strip comments which isn't a big deal or anything, but technically means it is impossible for a regex to 100% validate an email address. Of course, I'm not arguing you should do something so complex, I would just limit users to common, simple addresses. Anyway, I think that's all I want to say on the address validation thing. It was a pleasure arguing with you, ender. :-D
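The comment-stripping step is where a plain regex runs out: matching arbitrarily nested parentheses needs a counter, which a classical regular expression doesn't have. A minimal sketch of that pre-pass, with an invented function name, and with the loud simplification that it ignores quoted strings and backslash escapes (both of which the RFC allows):

```javascript
// Hypothetical pre-pass: drop (possibly nested) parenthesised comments.
// Simplified: does not handle quoted strings or backslash-escaped parentheses.
function stripComments(input) {
  let depth = 0; // current nesting level of "(" ... ")"
  let out = "";
  for (const ch of input) {
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      if (depth === 0) return null; // closing parenthesis with no opener
      depth--;
    } else if (depth === 0) {
      out += ch; // keep only characters outside any comment
    }
  }
  return depth === 0 ? out : null; // null means unbalanced parentheses
}

console.log(stripComments("john(work (main))@example.org"));
console.log(stripComments("john(oops@example.org"));
```

Whatever comes out of this could then be handed to the long regex, or to a much simpler check.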

PJH

@morbiuswilters said:

fuck anyone who uses a + or a $ in their address.

<googlename>+antispam@gmail.com

I've had problems. Amex being one of them. They don't like the + on *some* of their services. Specifically, they allow the + as an address to sign up, but verifying after you've lost a card... "Try again, your email address is invalid."

Of course you tried without the '+antispam' part, in case they only stored the part before?

j6cubic

@morbiuswilters said:

Of course I have business telling them what addresses I will accept. If they can't deal, they can go elsewhere.

That's exactly what I do. When a website tells me I'm not allowed to use my mail server's character of choice to sort mail depending on which site I gave the address to I take my business somewhere else where they have more competent web developers.

Nested comments aren't even necessary. The following set contains all characters you need to allow to support all reasonable and semi-reasonable addresses:
A-Za-z0-9!#$%&'*+-/=?&`_{}|
That's it. Not hard to write a regexp for.

Physics_Phil

@j6cubic said:

@morbiuswilters said:
Of course I have business telling them what addresses I will accept. If they can't deal, they can go elsewhere.
That's exactly what I do. When a website tells me I'm not allowed to use my mail server's character of choice to sort mail depending on which site I gave the address to I take my business somewhere else where they have more competent web developers.
Nested comments aren't even necessary. The following set contains all characters you need to allow to support all reasonable and semi-reasonable addresses:
A-Za-z0-9!#$%&'*+-/=?&`_{}|
That's it. Not hard to write a regexp for.

This is a necessary but not sufficient condition for a valid email address, since although all valid email addresses would pass, not all invalid addresses would fail.

morbiuswilters

@j6cubic said:

That's exactly what I do. When a website tells me I'm not allowed to use my mail server's character of choice to sort mail depending on which site I gave the address to I take my business somewhere else where they have more competent web developers.

I bet you also wet yourself when a site requires Javascript or will only ship to the United States. "Oh noes, how dare they put constraints on what I can do with the service they provide!!"

@j6cubic said:

Nested comments aren't even necessary. The following set contains all characters you need to allow to support all reasonable and semi-reasonable addresses:
A-Za-z0-9!#$%&'*+-/=?&`_{}|
That's it. Not hard to write a regexp for.

Oh, so when I arbitrarily disallow certain characters and patterns, I'm "incompetent" but when you do it, it's alright? You sound like the kind of person who can't ever get software shipped because you spend 500 hours making complex rules for yourself for no other reason than feeding your OCD.

j6cubic

@morbiuswilters said:

Oh, so when I arbitrarily disallow certain characters and patterns, I'm "incompetent" but when you do it, it's alright? You sound like the kind of person who can't ever get software shipped because you spend 500 hours making complex rules for yourself for no other reason than feeding your OCD.

Okay, I admit that RFC 2822 (which I derived that set from) is a horrible source to use when determining which characters are allowed in the subject line of an RFC 2822-conformant email message. Mea culpa.

lolwtf

Websites that require Javascript are still made of fail and AIDS though.
@PJH said:

@morbiuswilters said:
fuck anyone who uses a + or a $ in their address.

<googlename>+antispam@gmail.com

I've had problems. Amex being one of them. They don't like the + on *some* of their services. Specifically, they allow the + as an address to sign up, but verifying after you've lost a card... "Try again, your email address is invalid."

Obvious solution is to just not use the + here. Also I somehow doubt modern bots aren't capable of removing common strings like nospam and antispam. Use something more creative.

PJH

@lolwtf said:

@PJH said:
[...][Amex] don't like the + on *some* [parts] of their services.[...]

Obvious solution is to just not use the + here.

No it's not. There are a few obvious solutions, but that is not one of them.

Especially since they can't decide, in Amex's case, whether to allow it or not:

1) Sign up, use a + in your address. Amex fail to gripe, and even send a confirmation email to that address.

2) Lose your card

3) Get a new card. Go online to verify your details to activate your card. Enter (among other things) above email address with + in it.

4) "Your email address is invalid"

5) Remove + (and stuff afterwards, to retain my email address)

6) "Your email address does not match the one we have on file"

7) Ring up and complain that their online reactivation service is borked.

I was assured by the CSA on the other end of the line that, having explained it, he understood the problem and my comments would be passed up the food chain.

Physics_Phil

@lolwtf said:

Obvious solution is to just not use the + here. Also I somehow doubt modern bots aren't capable of removing common strings like nospam and antispam. Use something more creative.

This isn't what the + is for. A + is used by many systems to sort the message into a category, so messages with antispam in the to address are likely to be scraped from a forum or suchlike.
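As a rough illustration of that sorting convention, here is a sketch that splits an address into mailbox, tag and domain. The function name and return shape are invented, "someone" stands in for a real account name, and whether a given mail system honours + tags at all is up to that system.

```javascript
// Hypothetical helper: split "mailbox+tag@domain" into its parts.
function splitPlusTag(address) {
  const at = address.lastIndexOf("@");
  if (at <= 0) return null; // no "@" or empty local part
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  const plus = local.indexOf("+");
  if (plus === -1) return { mailbox: local, tag: null, domain };
  // Everything after the first "+" is treated as the sorting tag.
  return { mailbox: local.slice(0, plus), tag: local.slice(plus + 1), domain };
}

console.log(splitPlusTag("someone+antispam@gmail.com"));
console.log(splitPlusTag("someone@gmail.com"));
```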

EDIT: probably shouldn't have responded to the idiot, but oh well.

morbiuswilters

@Physics Phil said:

EDIT: probably shouldn't have responded to the idiot, but oh well.

No, you should have responded, but you should have insulted his intelligence and flamed him for resurrecting a dead thread.

AbbydonKrafts

@morbiuswilters said:

flamed him for resurrecting a dead thread.

Wasn't dead enough. It was still twitching some.