How not to check the validity of an email address





  • Ah yes, "hit by a bus."



  • What's wrong here? Almost all web servers use compression nowadays, so much less than the 2.5MB will actually be transmitted, especially because text compresses so well. And in fact 2.5MB is not really that much on today's computers, seeing most people have at least 512MB of RAM nowadays.
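The compression argument can be checked roughly. The sketch below builds 70,000 synthetic addresses, joins them with four asterisks, and compares the raw size with the gzip size. The `user{n}@example.com` pattern is an assumption standing in for the real list, which isn't reproduced here. Synthetic addresses are far more repetitive than real ones, so the real ratio would likely be worse.

```python
import gzip

# Hypothetical stand-in for the real list: 70,000 synthetic addresses.
addresses = [f"user{n}@example.com" for n in range(70_000)]

# Join them the way the thread describes: four asterisks between entries.
payload = "****".join(addresses).encode("ascii")

compressed = gzip.compress(payload)

print(f"raw:   {len(payload):>9,} bytes")
print(f"gzip:  {len(compressed):>9,} bytes")
print(f"ratio: {len(compressed) / len(payload):.1%}")
```

Whatever the ratio turns out to be, compression only changes how many bytes travel. Every address still reaches every visitor.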



  • @pbean said:

    What's wrong here? Almost all web servers use compression nowadays, so much less than the 2.5MB will actually be transmitted, especially because text compresses so well. And in fact 2.5MB is not really that much on today's computers, seeing most people have at least 512MB of RAM nowadays.

    Brillant!


  • TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.



  • @boomzilla said:

    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.



  • @pjt33 said:

    @boomzilla said:
    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.

    Or "|ThisStringWillNeverOccurInAnEmailAndIfSoTheOwnerOfThatEmailNeedsToBeStabbedWithARustySpoon|". I think that's a safe separator to use in any file format. 
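Joking aside, the search for a separator that can never occur goes away if the list is serialized in a format that escapes its own delimiters. Below is a minimal sketch using Python's standard `json` module. The sample strings are invented. The first one shows that asterisks are legal in an unquoted local part, which makes a four-asterisk separator ambiguous.

```python
import json

# Invented sample data; the first entry already contains "****",
# which breaks a naive four-asterisk separator.
addresses = ["a****b@example.com", '"x\\"y"@example.com', "alice@example.com"]

# JSON escapes quotes and backslashes itself, so no separator has to be
# "impossible" inside an address.
encoded = json.dumps(addresses)
decoded = json.loads(encoded)

# The naive approach splits the first address in two.
naive = "****".join(addresses).split("****")

print(encoded)
print(decoded == addresses)  # lossless round trip
print(len(naive), "pieces from", len(addresses), "addresses")
```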



  • @boomzilla said:

    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Four is not even a prime number. To do it properly you need to salt it and encode it in base64 or other cryptographically secure method. Some people really don't take security seriously.



  • @Faxmachinen said:

    @boomzilla said:

    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Four is not even a prime number. To do it properly you need to salt it and encode it in base64 or other cryptographically secure method. Some people really don't take security seriously.


    ...don't forget XML; you'll need a SaltBridgeFactoryBuilder, too.


  • @toon said:

    @Faxmachinen said:

    @boomzilla said:

    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Four is not even a prime number. To do it properly you need to salt it and encode it in base64 or other cryptographically secure method. Some people really don't take security seriously.


    ...don't forget XML; you'll need a SaltBridgeFactoryBuilder, too.

    You guys are clueless about cryptography. The only truly secure method of encryption is a one-time pad. I suggest using 42 or some other hard-to-guess value as the pad.



  • @Faxmachinen said:

    To do it properly you need to salt it
     

    You mean salt the earth.



  • I've worked with Desire2Learn (the product referenced there) for about a year and it is truly awful at its core. The product I built relied on their API for basic functionality (because the product itself is impossible to customize) and couldn't do basic things like "am I assigned to this course?". Instead they require you to parse through a list of all users assigned to the course and check whether you are in that list. Other complaints include non-existent support (they told us to ask on Stack Overflow instead), 3 months to activate an API key, and a complete lack of communication with their own paying customers (we did not own the license for the system; a client of ours already used their LMS and required us to use it as well).


  • @aapis said:

    they told us to ask on Stack Overflow instead
     

    .......... And with that, we have The Most WTF-Dense Sentence of the month. That was quick. We can close down and take a breather. See y'all in October.



  • @joe.edwards said:

    The only truly secure method of encryption is a one-time pad. I suggest using 42 or some other hard-to-guess value as the pad.

    A quick roll of my D&D dice brought up a 4 and a 2, confirming that 42 is a random number.



  • @pbean said:

    What's wrong here? Almost all web servers use compression nowadays, so much less than the 2.5MB will actually be transmitted, especially because text compresses so well. And in fact 2.5MB is not really that much on today's computers, seeing most people have at least 512MB of RAM nowadays.


    In all seriousness, what's wrong is:

    1. They're broadcasting 70,000 email addresses. Email addresses don't need to be kept secret, but that doesn't mean you should give all of them away to everyone.
    2. They almost certainly don't check again server-side.
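For the second point, the conventional fix is to keep the list on the server and answer one yes/no question per request. The following is a minimal sketch that assumes the addresses fit in memory; `REGISTERED`, `normalize` and `is_registered` are hypothetical names. Only the domain is lower-cased. RFC 5321 treats domain names as case-insensitive but says the local part must be treated as case-sensitive.

```python
def normalize(address: str) -> str:
    """Lower-case the domain only; the local part is left untouched."""
    address = address.strip()
    # rpartition: a quoted local part may itself contain "@", the domain never does.
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # not an address at all; it simply won't match
    return f"{local}@{domain.lower()}"


# Hypothetical server-side data; in practice this would come from a database.
REGISTERED = frozenset(normalize(a) for a in ["alice@example.com", "Bob@Example.org"])


def is_registered(address: str) -> bool:
    # A hash-set lookup: average O(1), and the list never leaves the server.
    return normalize(address) in REGISTERED
```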



  • @anonymous235 said:

    @pbean said:
    What's wrong here? Almost all web servers use compression nowadays, so much less than the 2.5MB will actually be transmitted, especially because text compresses so well. And in fact 2.5MB is not really that much on today's computers, seeing most people have at least 512MB of RAM nowadays.

    In all seriousness, what's wrong is:

    1. They're broadcasting 70,000 email addresses. Email addresses don't need to be kept secret, but that doesn't mean you should give all of them away to everyone.

    Obviously they stole the idea from Community Server which includes the entire tag cloud in every page load.

     



  • @anonymous235 said:

    Email addresses don't need to be kept secret

    Jeez, dude, it's not October yet!



  • @pjt33 said:

    @boomzilla said:
    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.
     

    Now I wonder, is there any character that is illegal in an email address? I have always been convinced such a character does not exist.


  • @pbean said:

    @pjt33 said:

    @boomzilla said:
    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.
     

    Now I wonder, is there any character that is illegal in an email address? I have always been convinced such a character does not exist.


    I'd wager control sequences like NUL and backspace aren't allowed, but someone's about to yell at me again for being too strict in my validation.



  • @ochrist said:

    Seen via Hacker News:



    Easily worthy of a front page article. An excellent study of how NOT to do something.



  • @joe.edwards said:

    I'd wager control sequences like NUL and backspace aren't allowed, but someone's about to yell at me again for being too strict in my validation.

    ASCII 00-31 are definitely not permissible.
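Taking that at face value, here is a sketch that rejects code points 0-31 and DEL (127), then joins with the ASCII Unit Separator (0x1F), a control character meant for delimiting fields. One caveat: RFC 822 and the obsolete syntax retained in RFC 5322 do allow some control characters inside quoted strings. For that reason the join refuses suspect input instead of assuming it can't happen. The function names are illustrative.

```python
UNIT_SEPARATOR = "\x1f"  # ASCII US, designed for delimiting units of data


def has_control_chars(address: str) -> bool:
    """True if the address contains ASCII 0-31 or DEL (127)."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in address)


def join_addresses(addresses: list[str]) -> str:
    # Refuse rather than silently corrupt: the obsolete grammar can still
    # allow control characters in a quoted local part.
    bad = [a for a in addresses if has_control_chars(a)]
    if bad:
        raise ValueError(f"control characters in: {bad!r}")
    return UNIT_SEPARATOR.join(addresses)


def split_addresses(blob: str) -> list[str]:
    # An empty blob means an empty list, not [""].
    return blob.split(UNIT_SEPARATOR) if blob else []
```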


  • @barfoo said:

    @joe.edwards said:

    I'd wager control sequences like NUL and backspace aren't allowed, but someone's about to yell at me again for being too strict in my validation.

    ASCII 00-31 are definitely not permissible.


    I was referring to the time I got lambasted for demanding the domain part of an address contain at least one period character.



  • @pbean said:

    @pjt33 said:

    @boomzilla said:
    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.
     

    Now I wonder, is there any character that is illegal in an email address? I have always been convinced such a character does not exist.

     

    Is there a string that is not a valid email address?

    Does this regex fail to match something?

    (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
    \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
    ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
    \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
    31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
    ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
    (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
    (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
    ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
    r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
     \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
    ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
    )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
     \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
    )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
    *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
    |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
    \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
    \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
    ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
    ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
    ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
    :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
    :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
    :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
    [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
    \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
    @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
    (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
    :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
    \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
    \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
    ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
    :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
    ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
    .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
    ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
    [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
    r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
    |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
    00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
    .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
    ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
    :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
    ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
    ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
    ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
    ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
    \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
    ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
    ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
    :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
    \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
    [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
    ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
    ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
    ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
    ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
    @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
     \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
    ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
    \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
    "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
    *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
    +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
    .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
    ?:\r\n)?[ \t])*))*)?;\s*)

     

     



  • @Mcoder said:

    @pbean said:

    @pjt33 said:

    @boomzilla said:
    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.
     

    Now I wonder, is there any character that is illegal in an email address? I have always been convinced such a character does not exist.

     

    Is there a string that is not a valid email address?

    Does this regex fail to match something?

    [big monster]

    I'm sure that would look better if it were expressed in a more suitable form like BNF or a simple script that did the matching. Life doesn't end at regular expressions you know.
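In the spirit of that suggestion, here is a small hand-written matcher for a deliberately reduced grammar: a `dot-atom` local part, `@`, then a `dot-atom` domain, using RFC 5322's `atext` character set. It ignores quoted strings, comments, domain literals and obsolete syntax, all of which the quoted regex tries to cover. Treat it as a readable sketch rather than a drop-in replacement; the function names are made up.

```python
import string

# RFC 5322 "atext": letters, digits and these printable symbols.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(text: str) -> bool:
    """1*atext *("." 1*atext): no empty, leading, trailing or doubled dots."""
    return all(atom and set(atom) <= ATEXT for atom in text.split("."))


def is_simple_addr_spec(address: str) -> bool:
    # Split on the last "@"; any earlier "@" lands in the local part and fails atext.
    local, sep, domain = address.rpartition("@")
    return bool(sep) and is_dot_atom(local) and is_dot_atom(domain)
```

Each rule is a named function that can be read and tested on its own. The cost is that every address needing quotes is rejected.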


  • @Mcoder said:

    @pbean said:

    @pjt33 said:

    @boomzilla said:
    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.
     

    Now I wonder, is there any character that is illegal in an email address? I have always been convinced such a character does not exist.

     

    Is there a string that is not a valid email address?

    Does this regex fail to match something?

    [big monster]

     

     


    IIRC that doesn't correctly handle comments in the address.



  • @Mcoder said:

    @pbean said:

    @pjt33 said:

    @boomzilla said:
    TRWTF is using 4 asterisks to separate the addresses. There are much more secure numbers of asterisks, plus using more gives you better future proofing and optimization opportunities.

    Even better would be to separate with a string which can't legally occur in an e-mail address. I think that .". would do the trick.
     

    Now I wonder, is there any character that is illegal in an email address? I have always been convinced such a character does not exist.

     

    Is there a string that is not a valid email address?

    Does this regex fail to match something?

    [regexp puke]

     

     

    It is widely known that this regular expression is not complete. As far as I understand, basically anything can be a valid email address, so the only way to validate it is to send an email to it and see if it arrives.

    It would seem reasonable, however, that control characters would actually be illegal, but then again, a lot of those RFCs are not quite so reasonable in the details.

     

     



  • @joe.edwards said:

    @barfoo said:

    @joe.edwards said:

    I'd wager control sequences like NUL and backspace aren't allowed, but someone's about to yell at me again for being too strict in my validation.

    ASCII 00-31 are definitely not permissible.


    I was referring to the time I got lambasted for demanding the domain part of an address contain at least one period character.

    If domain names had to have at least one dot in them that would make sense. But since they don't, your demand seems a bit arbitrary. It's like demanding that TLDs have at most four characters. Technically true, for the time being anyway, but why assume that?


  • @toon said:

    It's like demanding that TLDs have at most four characters. Technically true, for the time being anyway, but why assume that?

    I present to you the .museum TLD…
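The four-character assumption usually hides in a pattern shaped like the first one below. Here it is compared with a version that leaves the TLD length open. Both patterns and both sample addresses are illustrative and are not taken from any real site.

```python
import re

# A common (and too strict) shape: TLD of 2 to 4 letters.
CAPPED_TLD = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,4}$")
# Same idea without an upper bound on the TLD length.
OPEN_TLD = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

for address in ["someone@example.com", "curator@example.museum"]:
    print(address, bool(CAPPED_TLD.match(address)), bool(OPEN_TLD.match(address)))
```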



  • @dkf said:

    @toon said:
    It's like demanding that TLDs have at most four characters. Technically true, for the time being anyway, but why assume that?

    I present to you the .museum TLD…

    Should have seen that coming...



  • The real WTF is the article: "I can store the valid emails in a data structure that allows O(1) membership testing".

    I assume he refers to hashmaps, but even they're not guaranteed to be O(1).
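The caveat concerns the worst case. Hash tables give O(1) lookups on average, but when keys collide, a lookup degrades toward a linear scan. The contrived demonstration below counts the equality checks that a single failed lookup triggers in CPython's `set`. The `Key` class and its constant `__hash__` exist only for this example.

```python
class Key:
    """Wraps a string; bad=True forces every instance onto the same hash."""

    eq_calls = 0  # shared counter of __eq__ invocations

    def __init__(self, value: str, bad: bool):
        self.value = value
        self.bad = bad

    def __hash__(self) -> int:
        return 42 if self.bad else hash(self.value)

    def __eq__(self, other: object) -> bool:
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def comparisons_for_miss(n: int, bad: bool) -> int:
    """Build a set of n keys, then count __eq__ calls for one missing key."""
    table = {Key(f"user{i}@example.com", bad) for i in range(n)}
    Key.eq_calls = 0
    _ = Key("nobody@example.com", bad) in table
    return Key.eq_calls


print("good hash:", comparisons_for_miss(500, bad=False))
print("bad hash: ", comparisons_for_miss(500, bad=True))
```

With distinct hashes, the miss is usually settled without calling `__eq__` at all. With a constant hash, every stored key lies on the probe path and gets compared.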



  • @joe.edwards said:

    @pbean said:
    Now I wonder, is there any character that is illegal in an email address? I have always been convinced such a character does not exist.

    I'd wager control sequences like NUL and backspace aren't allowed, but someone's about to yell at me again for being too strict in my validation.

    No, you're correct, but it's not as fun to suggest separating the strings with one of the control characters designed for separating strings. All printable ASCII characters are permitted, but some of them require that you quote the localpart (the part before the @), and of course double quotes have to be escaped inside a quoted localpart. Curiously, you also have to quote the localpart if it contains two consecutive dots, which is the motivation for my suggested separator.
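The rule described there can be sketched as follows. A local part that is a valid `dot-atom` (runs of RFC 5322 `atext` separated by single dots) may be written bare. Anything else is wrapped in double quotes, with backslashes and double quotes escaped. `quote_local_part` is a hypothetical helper that handles only printable ASCII and space, and it ignores the obsolete syntax.

```python
import string

# RFC 5322 atext: the characters allowed unquoted in a local part.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(local: str) -> bool:
    # Non-empty atext runs joined by single dots: no leading, trailing or doubled dots.
    return all(part and set(part) <= ATEXT for part in local.split("."))


def quote_local_part(local: str) -> str:
    """Return the local part in a form usable before the "@"."""
    if any(not (32 <= ord(ch) <= 126) for ch in local):
        raise ValueError("sketch only handles printable ASCII and space")
    if is_dot_atom(local):
        return local
    # Inside a quoted string, only backslash and double quote need escaping.
    escaped = local.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```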



  • @Evo said:

    The real WTF is the article: "I can store the valid emails in a data structure that allows O(1) membership testing".

    I assume he refers to hashmaps, but even they're not guaranteed to be O(1).

    It's O(70000), which is equivalent to O(1).



  • @Evo said:

    The real WTF is the article: "I can store the valid emails in a data structure that allows O(1) membership testing".

    I assume he refers to hashmaps, but even they're not guaranteed to be O(1).

    O(1) data structures do exist, you know.
    Sure, in some cases they'll be specialised data structures, but they do exist.


  • @toon said:

    @joe.edwards said:
    @barfoo said:

    @joe.edwards said:

    I'd wager control sequences like NUL and backspace aren't allowed, but someone's about to yell at me again for being too strict in my validation.

    ASCII 00-31 are definitely not permissible.


    I was referring to the time I got lambasted for demanding the domain part of an address contain at least one period character.

    If domain names had to have at least one dot in them that would make sense. But since they don't, your demand seems a bit arbitrary. It's like demanding that TLDs have at most four characters. Technically true, for the time being anyway, but why assume that?

    Because it's roughly thirteen quadrillion times more likely a user from the web will type , when he meant . than that he has an email on a local unqualified hostname or IPv6 address. Less than that, really, because the number of times the latter has happened is zero.
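The pragmatic rule being defended can be written as a warning rather than a hard rejection, so the owner of a rare legitimate dotless address can still confirm it. The sketch below uses made-up heuristics, and `likely_typo` is a hypothetical name. An address on a bare hostname or an IPv6 literal is valid per the RFCs and would still be flagged.

```python
from typing import Optional


def likely_typo(address: str) -> Optional[str]:
    """Return a hint if the address looks mistyped, else None.

    Heuristics only: a dotless domain is legal per the RFCs, but on a public
    web form it is far more often a slip of the keyboard.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return "missing local part or @"
    if "," in domain:
        return f"did you mean {domain.replace(',', '.')!r}?"
    if "." not in domain:
        return "domain has no dot; is that intended?"
    return None
```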



  • @joe.edwards said:

    @toon said:
    @joe.edwards said:
    @barfoo said:

    @joe.edwards said:

    I'd wager control sequences like NUL and backspace aren't allowed, but someone's about to yell at me again for being too strict in my validation.

    ASCII 00-31 are definitely not permissible.


    I was referring to the time I got lambasted for demanding the domain part of an address contain at least one period character.

    If domain names had to have at least one dot in them that would make sense. But since they don't, your demand seems a bit arbitrary. It's like demanding that TLDs have at most four characters. Technically true, for the time being anyway, but why assume that?

    Because it's roughly thirteen quadrillion times more likely a user from the web will type , when he meant . than that he has an email on a local unqualified hostname or IPv6 address. Less than that, really, because the number of times the latter has happened is zero.

    What if your email address is on a TLD and the . key is broken on your keyboard? What if someone types in an IPv4 address with only three octets? Why is the email address specification so fucked up that checking an address's validity requires sending an email? That's like a car with a fuel gauge that only works while connected to a gas pump.



  • @Salamander said:

    @Evo said:
    The real WTF is the article: "I can store the valid emails in a data structure that allows O(1) membership testing".

    I assume he refers to hashmaps, but even they're not guaranteed to be O(1).

    O(1) data structures do exist, you know.
    Sure in some cases they'll be specialised data structures, but they do exist.

    Ooh, I know! Convert all the email addresses to base32 or something and then make them fields in a struct. If the program compiles with an access to the requested email's field, it's a valid email address!


  • @Lorne Kates said:

    @aapis said:

    they told us to ask on Stack Overflow instead
     

    .......... And with that, we have The Most WTF-Dense Sentence of the month. That was quick. We can close down and take a breather. See y'all in October.

     

    At least the message is somewhat helpful. The replies I get from one of our suppliers are usually:

    "This ticket has not seen any activity in the last 3 days and was closed automatically."

     


  • @Ben L. said:

    Ooh, I know! Convert all the email addresses to base32 or something and then make them fields in a struct. If the program compiles with an access to the requested email's field, it's a valid email address!

    Reminds me of Gödel's tricks with mathematical isomorphisms.



  • @toon said:

    If domain names had to have at least one dot in them that would make sense.

    According to ICANN regulations, you can't have a domain name without a dot.


  • @El_Heffe said:

    @toon said:

    If domain names had to have at least one dot in them that would make sense.

    According to ICANN regulations, you can't have a domain name without a dot.


    You can, however, have a domain part of an email address that is not a domain name. Which was what the pedantic dickweeds were bitching about in my validation. Even though NOBODY DOES THAT.



  • @Ben L. said:

    What if your email address is on a TLD

    Not possible.

    @Ben L. said:

    Why is the email address specification so fucked up that checking an address's validity requires sending an email?

    Mostly because people don't follow the spec.

  • @Sutherlands said:

    @Ben L. said:
    What if your email address is on a TLD

    Not possible.

    There are 21[1] TLDs with MX records.



  • @joe.edwards said:

    @Sutherlands said:
    @Ben L. said:
    What if your email address is on a TLD
    Not possible.

    There are 21[1] TLDs with MX records.
    Well screw those people, in particular...


  • Considered Harmful

    How about we write a new email spec that defines an email address to be what 99.9% of people think an email address is.
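    A rough sketch of what such a "what most people think" rule could look like. The rule and the name looks_like_email are hypothetical: a conservative local part, exactly one '@', and a domain of at least two dot-separated labels. It deliberately rejects things the RFCs allow (quoted local parts, comments, address literals, dotless domains) and also some legitimate characters such as an apostrophe in the local part.

```python
import re

# One DNS-style label: letters/digits, hyphens allowed only in the middle, max 63 chars.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"

# Conservative local part, one '@', then at least two labels (so at least one dot).
_PRAGMATIC = re.compile(r"[A-Za-z0-9._%+-]+@" + _LABEL + r"(?:\." + _LABEL + r")+")


def looks_like_email(address: str) -> bool:
    """Hypothetical '99.9%' check; NOT an RFC 5322 validator."""
    return _PRAGMATIC.fullmatch(address) is not None
```

    With this rule, "first.last+tag@mail.example.co.uk" passes, while "alice@localhost" (no dot) and "john.smith(comment)@example.com" (comment) do not.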



  • @El_Heffe said:

    According to ICANN regulations, you can't have a domain name without a dot.

    The real question is, why do TLDs even exist? Apart from serving as a reliable source of income for ICANN. "We need more money. Announce a new TLD and everyone will rush to buy companyname.new1, companyname.new2, companyname.new3..."

    Not all .com's are commercial. Not all .org's are organizations. A whole bunch of country TLDs are chosen just for vanity hostnames or cheap registrars. Just nuke the whole concept of TLDs being a restricted set so that a single http://thedailywtf/ is sufficient.


  • Considered Harmful

    @Arnavion said:

    Just nuke the whole concept of TLDs being a restricted set so that a single http://thedailywtf/ is sufficient.

    My website is http://hell/. I give away money to visitors. I put up a big billboard that says "If you want my money then go to hell", but no one is visiting my website. What could be wrong?


  • Considered Harmful

    @Arnavion said:

    @El_Heffe said:
    According to ICANN regulations, you can't have a domain name without a dot.

    The real question is, why do TLDs even exist? Apart from serving as a reliable source of income for ICANN. "We need more money. Announce a new TLD and everyone will rush to buy companyname.new1, companyname.new2, companyname.new3..."

    Not all .com's are commercial. Not all .org's are organizations. A whole bunch of country TLDs are chosen just for vanity hostnames or cheap registrars. Just nuke the whole concept of TLDs being a restricted set so that a single http://thedailywtf/ is sufficient.

    Actually, one link deeper into El_Heffe's link gives the reason. From the IAB Statement "Dotless Domains Considered Harmful":

    Unfortunately, dotless domains will not work as intended by TLD operators in the vast majority of cases. As recommended by IETF standards track RFCs, existing deployed systems apply a search list to single-label names prior to attempting to resolve them. As a result, the resolution of dotless domains depends on local configuration such as the search list. For example, in a location where “example.com” is included within the search list, the URL http://printer1/ will generate a query for “printer1.example.com”, whereas in a location where “example.net” is in the search list, it will generate a query for “printer1.example.net”.

    This behavior was developed in the DNS precisely because most users entering single-label names want them to be resolved in a local context, and they do not expect a single name to refer to a TLD. The behavior is specified within a succession of standards track documents developed over several decades, and is now implemented by hundreds of millions of Internet hosts. This standard approach enables single-label names to be conveniently used as shortcuts to hosts within a local administration, while also shielding the root zone from a potentially excessive number of queries for single-label names. Since the configuration of the search list has security implications, it is under the control of local host and network administrators, and completely outside the control of TLD operators.

    Since dotless domains will not behave consistently across various locations (and applications and platforms that may have different search list configuration mechanisms), they have the potential to confuse users and erode the stability of the global DNS. By attempting to change expected behavior, dotless domains introduce potential security vulnerabilities. These include causing traffic intended for local services to be directed onto the global Internet (and vice-versa), which can enable a number of attacks, including theft of credentials and cookies, cross-site scripting attacks, etc. As a result, the deployment of dotless domains has the potential to cause significant harm to the security of the Internet.
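    To make the search-list point concrete, here is a simplified model of how a stub resolver configured through resolv.conf's `search` and `ndots` options (ndots defaults to 1) picks the names it will query. The function name candidate_queries is hypothetical, and real resolvers differ in details such as failure handling and retries.

```python
def candidate_queries(name, search_list, ndots=1):
    """Hypothetical model: fully qualified names a stub resolver tries, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list applied.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_list]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots (e.g. a dotless name like "printer1"): the search list
    # wins, and the bare TLD-style name is only tried last.
    return expanded + [absolute]
```

    With search list ["example.com"], "printer1" is tried as "printer1.example.com." first; with ["example.net"] it becomes "printer1.example.net.". A dotless TLD such as "hell" would only be reached after every local expansion fails, which is exactly the inconsistency the IAB describes.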


  • @Sutherlands said:

    Mostly because people don't follow the spec.

    Which is the right way to do it. Standards should be followed only if they're sane enough. When they get to the point where you can put FUCKING COMMENTS in an email address, you make a new standard yourself and convince people to adopt it (which should be easier than convincing them to adopt the first one).
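    For anyone who hasn't seen it: RFC 5322 allows parenthesized comments around the parts of an address, so john.smith(comment)@example.com and john.smith@(comment)example.com both name john.smith@example.com. A simplified sketch of stripping them, assuming a hypothetical helper strip_comments; it removes nested comments innermost-first but ignores quoted strings and backslash escapes, so it is not a full RFC 5322 parser.

```python
import re

# An innermost comment: '(' followed by non-paren characters, then ')'.
_INNER_COMMENT = re.compile(r"\([^()]*\)")


def strip_comments(address: str) -> str:
    """Hypothetical helper: drop (comments), handling nesting by repeated passes."""
    while True:
        address, count = _INNER_COMMENT.subn("", address)
        if count == 0:
            # No complete comment left; unbalanced parens are left untouched.
            return address
```

    For example, "john(a(b)c).smith@example.com" loses "(b)" on the first pass and "(ac)" on the second.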



  • @Arnavion said:

    @El_Heffe said:
    According to ICANN regulations, you can't have a domain name without a dot.

    The real question is, why do TLDs even exist? Apart from serving as a reliable source of income for ICANN. "We need more money. Announce a new TLD and everyone will rush to buy companyname.new1, companyname.new2, companyname.new3..."

    Not all .com's are commercial. Not all .org's are organizations. A whole bunch of country TLDs are chosen just for vanity hostnames or cheap registrars. Just nuke the whole concept of TLDs being a restricted set so that a single http://thedailywtf/ is sufficient.

    It seemed like a good idea at the time. Trying to categorize everything is tempting, but ultimately it tends to fail or be more annoying than useful. TLDs are the perfect example of that, but there's also MIME types (I'd have skipped the whole category/type thing and just made it a magic string). It's also similar to how nobody uses the ASCII control characters; people use combinations of visible characters to do the same thing, but in a more flexible way.
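    As an aside on those control characters: ASCII reserves 0x1C to 0x1F as separators (FS, GS, RS, US), which is the job the four-asterisk delimiter was doing. A minimal sketch using the Unit Separator (0x1F) to join a list of addresses; the helper names are hypothetical, and it simply refuses input that already contains the separator rather than escaping it.

```python
UNIT_SEPARATOR = "\x1f"  # ASCII 0x1F (US), one of the four separator control codes


def join_addresses(addresses):
    """Hypothetical helper: join addresses with US, rejecting any that contain it."""
    for address in addresses:
        if UNIT_SEPARATOR in address:
            raise ValueError(f"address contains the separator: {address!r}")
    return UNIT_SEPARATOR.join(addresses)


def split_addresses(blob):
    """Inverse of join_addresses; an empty blob means an empty list."""
    return blob.split(UNIT_SEPARATOR) if blob else []
```

    Unlike "****", the separator can't be typed by a user filling in a form, though, as the post says, almost nobody actually does this.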



  • @joe.edwards said:

    Actually, one link deeper into El_Heffe's link gives the reason.

    Yes, I am aware of how hostname resolution works. What you quoted is not a reason - it is a justification of the current system. There is no reason that cannot be inverted. Rather than make 50 special TLDs so that non-TLD-qualified hostnames are local, there should be one TLD (say, .local) for local hostnames and everything else becomes an internet hostname.


Log in to reply