The email address you have been using for 10 years is not valid

Zemm

I got sick of waiting for Google to sell the Nexus 4 on the Australian Play Store (it oversold-out in 22 minutes on 13 November and has not returned) so I decided to buy it from the US Play Store. The first step involves getting a "US address" and the place I tried to get one is comGateway. Very first form and my email address fails the verification. Note that I don't have anything weird in my address, not even dashes or numbers, but it is in the form of something@example.id.au. I checked the code:

<input id="freebtn" type="button" value="" size="" align="right" onclick="javascript:checkRegisterForm()">

So it's on the onclick and not onsubmit...

function checkRegisterForm(){
	if(checkData()){
		registerSubmit();
	}
}

And the relevant lines in checkData():

reg=new RegExp("^([A-Za-z0-9_-]+\.[A-Za-z0-9_-]+)+@[A-Za-z0-9_-]+\.[A-Za-z]{2,4}$");
if(!reg.test(content)){
	content="Please enter a valid email.";
}

For a site aimed at non-US people, that they have never heard of third-level domains just baffles the mind!?

Bonus points for allowing the illegal underscore in the domain part, allowing something@foobar, some&thing@foo^bar, and unsuccessfully trying to block usernames without a dot. (Hint: The regexp never sees the backslash before the dots, so the dot becomes a wildcard. But you can only have one such character on each side of the "@").
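For anyone who wants to see the backslash problem for themselves, here is a minimal sketch. The pattern is copied from the checkData() line above; the names fromString and fromLiteral are mine, and the "what they meant" version is only my guess at the intent, not something taken from the site.

```javascript
// The site's pattern, built from a string literal as in checkData().
// Inside a JavaScript string "\." is just ".", so the backslashes are gone
// before RegExp ever sees the pattern.
const fromString = new RegExp("^([A-Za-z0-9_-]+\.[A-Za-z0-9_-]+)+@[A-Za-z0-9_-]+\.[A-Za-z]{2,4}$");

// Assumed intent: the same pattern as a regex literal, where "\." survives
// and matches only a real dot.
const fromLiteral = /^([A-Za-z0-9_-]+\.[A-Za-z0-9_-]+)+@[A-Za-z0-9_-]+\.[A-Za-z]{2,4}$/;

console.log(fromString.source);                        // no backslashes left
console.log(fromString.test("something@foobar"));      // true: "." is a wildcard
console.log(fromString.test("some&thing@foo^bar"));    // true
console.log(fromLiteral.test("something@foobar"));     // false
console.log(fromLiteral.test("first.last@example.com")); // true
```

Writing "\\." inside the string would have had the same effect as the literal. Either way it still would not accept a third-level domain.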

I've got three options:

Find somewhere else
Use my gmail account (probably that's what they wanted anyway)
Run registerSubmit() from the console

Guess which one worked?

locallunatic

But their checking on your email could have been to protect something later in the workflow from throwing a fit. I mean yeah, they should allow valid addresses, but this could be more a case of preventing problems caused by someone else not doing things right than a mistake on their part.

Zemm

No. I ran the JS function, and the next screen showed me my US address and asked for more information, including my email address again. I submitted that fine and it sent me an email (albeit HTML-only, with the text part reading "To view the message, please use an HTML compatible email viewer!").

barfoo1

@Zemm said:

For a site aimed at non-US people, that they have never heard of third-level domains just baffles the mind!?

Even in the US, third-level domains are not that rare in institutional email addresses (for example, last.first@department.university.edu). So wow.

Zemm

@barfoo said:

Even in the US, third-level domains are not that rare in institutional email addresses (for example, last.first@department.university.edu). So wow.

I was thinking that too. I seem to remember even ISPs doing that, like username@city.rr.com. They included an underscore instead of a dot in the regexp; who knows how long it's been like that?

pjt33

@Zemm said:

But you can only have one such character on each side of the "@"

You can have more on the left as long as they're separated by alphanumerics.

Zemm

@pjt33 said:

@Zemm said:
But you can only have one such character on each side of the "@"

You can have more on the left as long as they're separated by alphanumerics.

Ah yes, missed the + after the bracket. But it has to be two [a-zA-Z0-9_-] characters between funny characters.
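A quick sketch of that rule, assuming the pattern behaves as if its dots were unescaped (which is what the string version boils down to). The name effective and the two test addresses are made up for illustration.

```javascript
// The site's pattern with the dots left unescaped, i.e. what the browser
// effectively ends up with once the string literal eats the backslashes.
const effective = /^([A-Za-z0-9_-]+.[A-Za-z0-9_-]+)+@[A-Za-z0-9_-]+.[A-Za-z]{2,4}$/;

// Each repetition of the bracketed group needs its own character on both
// sides of the wildcard, so two "funny" characters need two ordinary
// characters between them.
console.log(effective.test("a&b^c@example.com"));   // false: only "b" between & and ^
console.log(effective.test("a&bb^c@example.com"));  // true: "a&b" then "b^c"
```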

Side-WTF: in my OP, CS removed the colon after "javascript". We have discussed the redundant "protocol" on an onclick before.

ender

@Zemm said:

Bonus points for allowing the illegal underscore in the domain part, allowing something@foobar, some&thing@foo^bar

You do know that & is valid in the local part of the e-mail address? :)

(also, IIRC, there used to be a TLD with an MX record, though I don't know if any e-mail address was actually set up)

ASheridan2

Pretty much anything is allowed in the local part if you look at the RFC for email address format. http://en.wikipedia.org/wiki/Email_address is actually pretty accurate for that sort of information.

Zemm

@ender said:

You do know that & is valid in the local part of the e-mail address? :)

Yes, but & or ^ is not valid in the domain part. :-P They tried to stop it but double-failed.

@ASheridan2 said:

Pretty much anything is allowed in the local part if you look at the RFC for email address format

They got that by accident, but there's nothing in the regexp to allow the local part to be surrounded by quotes. I have created a table:

Example	Expected	WTFRegExp
niceandsimple@example.com	true	true
very.common@example.com	true	true
a.little.lengthy.but.fine@dept.example.com	true	false
unusual-tld@example.museum	true	false
disposable.style.email.with+symbol@example.com	true	true
user@[IPv6:2001:db8:1ff::a0b:dbd0]	true	false
"much.more unusual"@example.com	true	false
"very.unusual.@.unusual.com"@example.com	true	false
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com	true	false
postbox@com (top-level domains are valid hostnames)	true	false
!#$%&'*+-/=?^_`{}|~@example.org	true	false
"()<>[]:,;@\\\"!#$%&'*+-/=?^_`{}| ~ ? ^_`{}|~.a"@example.org	true	false
""@example.org	true	false
Abc.example.com (an @ character must separate the local and domain parts)	false	false
Abc.@example.com (character dot(.) is last in local part)	false	false
Abc..123@example.com (character dot(.) is double)	false	false
A@b@c@example.com (only one @ is allowed outside quotation marks)	false	false (but a@bb@c@example.com would be true)
a"b(c)d,e:f;g<h>i[j\k]l@example.com (none of the special characters in this local part is allowed outside quotation marks)	false	false (but if the alphas were doubled it would be true)
just"not"right@example.com (quoted strings must be dot separated, or the only element making up the local-part)	false	true
this is"not\allowed@example.com (spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash)	false	true
this\ still\"not\\allowed@example.com (even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes)	false	false
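The WTFRegExp column can be reproduced with something like the sketch below. It only covers a handful of rows from the table, and the names wtfRegExp and rows are mine; the pattern itself is the one from checkData().

```javascript
// The pattern as the browser compiles it: inside a string literal "\."
// collapses to ".", so both dots are wildcards.
const wtfRegExp = new RegExp("^([A-Za-z0-9_-]+\.[A-Za-z0-9_-]+)+@[A-Za-z0-9_-]+\.[A-Za-z]{2,4}$");

// [address, Expected column, WTFRegExp column] for a subset of the table.
const rows = [
  ["niceandsimple@example.com", true, true],
  ["a.little.lengthy.but.fine@dept.example.com", true, false],
  ["unusual-tld@example.museum", true, false],
  ["disposable.style.email.with+symbol@example.com", true, true],
  ["postbox@com", true, false],
  ["A@b@c@example.com", false, false],
  ["a@bb@c@example.com", false, true],
  ['just"not"right@example.com', false, true],
];

// Print what the regexp actually says next to what the table claims.
for (const [address, expected, tableSays] of rows) {
  const actual = wtfRegExp.test(address);
  console.log(`${address}\texpected=${expected}\ttable=${tableSays}\tregexp=${actual}`);
}
```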

It is impossible to validate all possible email addresses by regexp (while disallowing all possible invalid ones).

ASheridan2

Oh probably, that doesn't stop people trying though!

Sutherlands

@Zemm said:

It is impossible to validate all possible email addresses by regexp (while disallowing all possible invalid ones).

That's patently untrue. You can just create a regex listing all possible e-mail addresses.

spamcourt

I'm usually a big proponent of "follow standards, even if they have some flaws", but these rules just seem way too overcomplicated (seriously, comments in email addresses? And why isn't ".." allowed?). Is there any real advantage over just saying "allowed characters in the local part are a-zA-Z0-9_.+- and it must have between 1 and 255 characters"?

ASheridan2

Is there any real advantage to limiting the allowed characters in email addresses? What about people with names that aren't just a-z? This kind of thinking is why email address format is in such a mess, developers look at the spec, decide it's too complicated, and then make up their own set of rules that they think seem reasonable based on their experience which doesn't actually see the bigger picture, and end up causing needless limitations.

error

It's often the most valuable piece of information collected by the form; also, you want to prevent headaches - for users, for support, and for admins - caused by invalid data making it further in the pipeline.

I usually validate these with (one or more of any character)@(one or more of any character).(one or more of any character)

That gets the most blatantly invalid ones without (I think) excluding any valid ones. There are diminishing returns on validating beyond that.
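As a rough sketch, that rule could be written like this. The names looseEmail and looksLikeEmail are placeholders, and this is only one reading of the description above, not anyone's production code.

```javascript
// (one or more of any character)@(one or more of any character).(one or more of any character)
const looseEmail = /^.+@.+\..+$/;

// Returns true when the input has something before an "@" and a dot with
// something on each side somewhere after it.
function looksLikeEmail(input) {
  return looseEmail.test(input);
}

console.log(looksLikeEmail("something@example.id.au"));            // true
console.log(looksLikeEmail("Abc.example.com"));                    // false: no "@"
console.log(looksLikeEmail("postbox@com"));                        // false: no dot after "@"
console.log(looksLikeEmail("user@[IPv6:2001:db8:1ff::a0b:dbd0]")); // false: no dot after "@"
```

Note that anything with a dotless domain part is rejected by this check, whether or not it is valid per the RFCs.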

locallunatic

@joe.edwards said:

I usually validate these with (one or more of any character)@(one or more of any character).(one or more of any character)

That gets the most blatantly invalid ones without (I think) excluding any valid ones.

ThisOneIsTechnicallyValid@com is excluded, but the chance of ever having to deal with that is so low it isn't worth fixing.

Lorne Kates

@locallunatic said:

But their checking on your email could have been to protect something later in the workflow from throwing a fit.

If skipping a client-side validation check screws up some later server-side process, then they have way more problems on hand than a regex.

error

@locallunatic said:

ThisOneIsTechnicallyValid@com is excluded, but the chance of ever having to deal with that is so low it isn't worth fixing.

Yeah, actually in most cases you wouldn't want to allow that address even if it is technically valid.

locallunatic

@Lorne Kates said:

If skipping a client-side validation check screws up some later server-side process, then they have way more problems on hand than a regex.

I didn't mean to imply that the check was only done client side, just that the check could be there to catch something during submission that would cause a different, poorly made system to choke (like some client that sends out emails; it's too late to fix things at that point, so catch it early and plan violence against the makers of the emailer client).

ASheridan2

@locallunatic said:

ThisOneIsTechnicallyValid@com is excluded, but the chance of ever having to deal with that is so low it isn't worth fixing.

Are you sure 'com' is a valid domain? Sure it's a valid TLD, but doesn't it need more than just the TLD?

georgir

Wow, all that quoting and backslash escaping in the local part really surprises me.

Without having read the spec, I had always regarded email addresses as just a special case (i.e. without the pass part) of a generic user:pass@domain URL, where user and pass can be absolutely anything, and have to be percent-encoded.

Quoting and backslash encoding really doesn't mix well with that concept... WTF specs.

Someone_You_Know

@ASheridan2 said:

@locallunatic said:
ThisOneIsTechnicallyValid@com is excluded, but the chance of ever having to deal with that is so low it isn't worth fixing.
Are you sure 'com' is a valid domain? Sure it's a valid TLD, but doesn't it need more than just the TLD?

RFC 822 actually says that you only need to include the leftmost (least-significant) level of the domain part if the rest of it is the same as the sender's address. So, whether "com" is a valid domain name in and of itself or not, according to the RFC alice@example.com should be able to send email to bob@example.com by just specifying "bob@example". I have no idea if any mail server in the world actually allows this sort of thing.

ASheridan2

How would any system determine if an email for alice@example was intended for alice@example.com, or alice@example.net? I dug a little into it and it seems that that is only valid where sending to an email address that shares parts of the right side of the domain in common:

"For example, if a sender's address is:
             sender@registry-A.registry-1.organization-X

    and one recipient's address is:

            recipient@registry-B.registry-1.organization-X

    and another's is:

            recipient@registry-C.registry-2.organization-X

    then ".registry-1.organization-X" need not be specified in the
    the message, but "registry-C.registry-2" DOES have to be
    specified. That is, the first two addresses may be abbreviated,
    but the third address must be fully specified."

So I think it would only be considered a valid email address when testing it against local servers.

Someone_You_Know

@ASheridan2 said:

I dug a little into it and it seems that that is only valid where sending to an email address that shares parts of the right side of the domain in common

...that would be what I said, yes.

I'm not saying it's a good idea, just that if you want to be 100% in compliance with RFC 822's address format, you have to allow the domain portion of the address to have only one component.

Sutherlands

@joe.edwards said:

It's often the most valuable piece of information collected by the form; also, you want to prevent headaches - for users, for support, and for admins - caused by invalid data making it further in the pipeline.

I usually validate these with (one or more of any character)@(one or more of any character).(one or more of any character)

That gets the most blatantly invalid ones without (I think) excluding any valid ones. There are diminishing returns on validating beyond that.

user@[IPv6:2001:db8:1ff::a0b:dbd0]

The only way to validate an email is to send an email.

Sutherlands

Oh, and while I'm at it. I had to create an Etsy account a year or so ago to buy something that my wife wanted. They didn't allow periods in the email on your sign up! I have no idea if they do now or not, but they said it was for "security". I can't even think of why that would help, even from an idiot developer's point of view.

ASheridan2

@Sutherlands said:

The only way to validate an email is to send an email.

There's a difference between validating an email address and validating the format of an email address.

ASheridan2

@Sutherlands said:

I can't even think of why that would help, even from an idiot developer's point of view.

They were having trouble matching the . with regular expressions

Sutherlands

@ASheridan2 said:

@Sutherlands said:
The only way to validate an email is to send an email.
There's a difference between validating an email address and validating the format of an email address.

And since regexes can't do either, the only way to do either is to send an email.

dtobias

I think the reason for all that complexity in what's allowed by the standards is the need, at every stage of the standardization process, to preserve compatibility with all the "legacy" address formats that have ever been used before, and might possibly still be in somebody's address; and that's a huge number of formats given that the Internet mail format standards derive in a continuous progression from standards used on various academic and research networks as far back as the early 1970s.

Ben L.

@Sutherlands said:

@ASheridan2 said:
@Sutherlands said:
The only way to validate an email is to send an email.
There's a difference between validating an email address and validating the format of an email address.
And since regexes can't do either, the only way to do either is to send an email.

Regular expressions can validate the format of an email address. For example, email addresses cannot contain newlines and are at least one byte long.

error

@Sutherlands said:

@joe.edwards said:
It's often the most valuable piece of information collected by the form; also, you want to prevent headaches - for users, for support, and for admins - caused by invalid data making it further in the pipeline.

I usually validate these with (one or more of any character)@(one or more of any character).(one or more of any character)

That gets the most blatantly invalid ones without (I think) excluding any valid ones. There are diminishing returns on validating beyond that.
user@[IPv6:2001:db8:1ff::a0b:dbd0]
The only way to validate an email is to send an email.

Again, this is another example of a technically valid email address that I would not want polluting our database. If anything, this counterexample encourages me to be less lenient in validation.

locallunatic

@joe.edwards said:

Again, this is another example of a technically valid email address that I would not want polluting our database.

The point was that your original post said that you were using a basic validator and that you thought that it did not exclude valid values. People were merely showing things that are valid that you would reject, not that rejecting them isn't acceptable (or at least that isn't what I intended).

blakeyrat

@joe.edwards said:

Again, this is another example of a technically valid email address that I would not want polluting our database.

Why?

error

@blakeyrat said:

@joe.edwards said:
Again, this is another example of a technically valid email address that I would not want polluting our database.

Why?

Um, because the domain part is an IP address? Is there a good reason to accept deliberately obfuscated addresses? I can't think of any legitimate reasons to use that format.

error

@locallunatic said:

@joe.edwards said:
Again, this is another example of a technically valid email address that I would not want polluting our database.

The point was that your original post said that you were using a basic validator and that you thought that it did not exclude valid values. People were merely showing things that are valid that you would reject, not that rejecting them isn't acceptable (or at least that isn't what I intended).

You know, I put that parenthetical in there specifically to ward off the pedantic dickweeds.

I was outlining what I feel is a good way to stop junk email addresses while still allowing legitimate ones. The @com and @[ipv6] examples just made me think, "oh, good, it's doing its job."

blakeyrat

@joe.edwards said:

Um, because the domain part is an IP address?

Adding the word "um" doesn't allow me to magically read your mind. Why does it matter that it's an IP address?

@joe.edwards said:

Is there a good reason to accept deliberately obfuscated addresses?

What makes you think it's deliberately obfuscated?

@joe.edwards said:

I can't think of any legitimate reasons to use that format.

I can't think of any reasons not to. IMPASSE!

Lorne Kates

@Sutherlands said:

The only way to validate an email is to send an email.

And even then, not so much.

@Sutherlands said:

They didn't allow periods in the email on your sign up! I have no idea if they do now or not, but they said it was for "security". I can't even think of why that would help, even from an idiot developer's point of view.

Because any character that isn't alphanumeric or @ can be used to inject teh SQLs! That's why you can't have special characters in your password. For security.

And runner-up for the dick-punch award: any developer who disables pasting and/or autocomplete on "enter your email" and "enter your email again" fields.

I'm pasting it into the "again" field because I copied it from the "email" field. And I copied it from the "email" field because it's correct. And I know it's correct, because it was autocompleted from the last 500 forms I filled out with a field called "email".

error

@blakeyrat said:

@joe.edwards said:
Um, because the domain part is an IP address?

Adding the word "um" doesn't allow me to magically read your mind. Why does it matter that it's an IP address?

@joe.edwards said:
Is there a good reason to accept deliberately obfuscated addresses?

What makes you think it's deliberately obfuscated?

@joe.edwards said:
I can't think of any legitimate reasons to use that format.

I can't think of any reasons not to. IMPASSE!

Please, show me one person using that format for their actual email address. Just one. The dorkiest of the nerdiest of the geeks I can dream up would not use that address, and if they did just for street cred, I don't think they'd be the least bit surprised or confused that they couldn't sign up for our newsletter.

blakeyrat

@joe.edwards said:

Please, show me one person using that format for their actual email address. Just one. The dorkiest of the nerdiest of the geeks I can dream up would not use that address, and if they did just for street cred, I don't think they'd be the least bit surprised or confused that they couldn't sign up for our newsletter.

So what? How would it hurt you to accept his rare email address? Why is it a problem?

I just want one ACTUAL REASON that isn't based on your knee-jerk reaction or "it's uncommon". One actual reason is all I want.

error

@blakeyrat said:

@joe.edwards said:
Please, show me one person using that format for their actual email address. Just one. The dorkiest of the nerdiest of the geeks I can dream up would not use that address, and if they did just for street cred, I don't think they'd be the least bit surprised or confused that they couldn't sign up for our newsletter.

So what? How would it hurt you to accept his rare email address? Why is it a problem?

I just want one ACTUAL REASON that isn't based on your knee-jerk reaction or "it's uncommon". One actual reason is all I want.

I gave you an actual reason: it's clearly obfuscated. I cannot verify what domain it belongs to. I can't check to see if it's a duplicate, because you could write the same address multiple ways.

This is an example of a malicious input, and it's a good thing that validation filters it.

Lorne Kates

@joe.edwards said:

Please, show me one person using that format for their actual email address. Just one

Doesn't matter. I'm siding with Blakey on this one. It's a valid address. Maybe someone owns an IP but not a domain name. Easy to imagine in an IPv6 world where EVERYTHING will be routable. Maybe someone is waiting on a domain order to go through. Who knows? It's valid.

I can't think of a single reason to reject it. It doesn't cause a security hole. Unless your business process is "read out the email address to a typist, who then types it back in, types in the newsletter-- sorry E-lect-tronic newsletter-- to each individual on the E-lect-tronic mailing list-- and she gets easily confused by long strings of numbers".

error

@Lorne Kates said:

Easy to imagine in an IPv6 world where EVERYTHING will be routable.

It's also easy to imagine a current world, where the lack of existing IPv6 infrastructure means our mail server can't route this email.

You know, like I said twice already.

blakeyrat

@joe.edwards said:

I gave you an actual reason: it's clearly obfuscated. I cannot verify what domain it belongs to.

Who cares what domain it belongs to? Or even if it belongs to a domain? Why does that matter? How does that harm you? You still haven't answered the question, you just keep dropping little hints like this. Just answer the fucking question.

@joe.edwards said:

I can't check to see if it's a duplicate, because you could write the same address multiple ways.

How do you know this isn't the only valid representation? Not every IP has a DNS entry.

@joe.edwards said:

This is an example of a malicious input,

How is it malicious?

Just answer the fucking question already and stop dodging. Or I'll just assume the actual answer is, "joe.edwards is a dumbshit who hasn't spent even 10 milliseconds thinking about the design of his program." Because I suspect that's the actual answer.

error

@blakeyrat said:

Just answer the fucking question already and stop dodging. Or I'll just assume the actual answer is, "joe.edwards is a dumbshit who hasn't spent even 10 milliseconds thinking about the design of his program." Because I suspect that's the actual answer.

If you can't understand why it's a bad thing to accept an email address that we can't send mail to or verify any affiliation with or even check to see if it already exists, then I'm sorry for you. I guess it takes a little more brainpower to see beyond the basic problem of RFC compliance, into what information is actually useful to collect.

A technically valid address that we can't send mail to is less than worthless to our organization. It's a detriment.

Lorne Kates

@joe.edwards said:

because you could write the same address multiple ways

Do you also filter for example@gmail.com and example+YouSuckJoeEdwards@gmail.com?

Do you also filter for example@domain.com and example@domainalias.com?

Do you also filter for example@example.com and CatchAll@example.com, both of which route to example@example.com?

Do you also filter for example@example.com and DistributionListThatExampleIsOn@example.com?

(Note: if you answered "yes" to any of these, then you and your entire infrastructure are deeply flawed)

@joe.edwards said:

email Can't route IPv6

Is your email subsystem written in Turbo Pascal?

error

@Lorne Kates said:

Do you also filter for example@gmail.com and example+YouSuckJoeEdwards@gmail.com?
Do you also filter for example@domain.com and example@domainalias.com?
Do you also filter for example@example.com and CatchAll@example.com, both of which route to example@example.com?
Do you also filter for example@example.com and DistributionListThatExampleIsOn@example.com?

We're well aware of these problems and actually use a number of heuristics to identify when a user is probably using multiple identities and correlate that data together. The gmail one in particular is easy to correct for, though it's not something that gets caught in client-side validation. Client-side validation is mostly for the user's benefit, to give them immediate feedback on issues that would otherwise cause them to have to go back in the process.
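For illustration only, here is what the Gmail correction might look like. The post doesn't say how it is actually done, so the function name dedupeKey and everything inside it are assumptions; it relies on Gmail ignoring anything after a "+" in the local part.

```javascript
// Hypothetical helper: collapse Gmail "+tag" variants onto one key for
// duplicate detection. Other domains only get their domain part lowercased.
function dedupeKey(address) {
  const at = address.lastIndexOf("@");
  if (at < 1) return address; // nothing sensible to split; leave untouched

  let local = address.slice(0, at);
  const domain = address.slice(at + 1).toLowerCase();

  if (domain === "gmail.com") {
    // Drop the "+tag" suffix and fold case, since Gmail treats these as the same mailbox.
    local = local.split("+")[0].toLowerCase();
  }
  return local + "@" + domain;
}

console.log(dedupeKey("example+newsletter@gmail.com")); // example@gmail.com
console.log(dedupeKey("Example@GMAIL.com"));            // example@gmail.com
console.log(dedupeKey("a+b@example.com"));              // a+b@example.com
```

This would run server-side as a lookup key next to the stored address, not as a replacement for what the user typed.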

@Lorne Kates said:

@joe.edwards said:
email Can't route IPv6
Is your email subsystem written in Turbo Pascal?

No, but the network infrastructure here was established before IPv6 was a thing that existed, and no one was inclined to change what worked just fine.

blakeyrat

@joe.edwards said:

If you can't understand why it's a bad thing to accept an email address that we can't send mail to

Why not? It's a perfectly valid email address. If you can't send email to it, that's your problem, not your customer's.

@joe.edwards said:

A technically valid address that we can't send mail to is less than worthless to our organization. It's a detriment.

So fix it. Again, that is your problem: your program is buggy and you're sitting here telling us that your shit don't stink. Fix it.

You still haven't explained why you believe the email address is obfuscated or how it's a malicious input. While you were failing to answer the basic question you built-up a whole stable of new questions you now need to explain.

locallunatic

@blakeyrat said:

@joe.edwards said:
If you can't understand why it's a bad thing to accept an email address that we can't send mail to
Why not? It's a perfectly valid email address. If you can't send email to it, that's your problem, not your customer's.
@joe.edwards said:
A technically valid address that we can't send mail to is less than worthless to our organization. It's a detriment.
So fix it. Again, that is your problem: your program is buggy and you're sitting here telling us that your shit don't stink. Fix it.

Not being able to send to some addresses is broken, but it may be more cost effective to build validation that doesn't allow things that trigger those bugs than to fix them. Annoying yes, but it MAY be a valid business decision.

blakeyrat

Yes and it's fine as long as joe.blow there admits it's his product that's in the wrong and the reason that he can't allow those email addresses is because his engineers fucked up when developing it.

But to say that nothing should accept those email addresses, or that those email addresses are malicious or obfuscated, that's just insane crazytalk.