Anti-spam take on a clbuttic mistake



  • An acquaintance of mine was "really looking forward to" this film, and I'd never heard of it, so I checked it out, and discovered this. It appears that a) at.com is a URL shortening service, and b) Fandango uses regular expressions in its anti-spam engine without any niceties like escaping the period in .com. Also, they still get spammed.



  •  Could you maybe have highlighted the WTF? I don't see it...


  • Discourse touched me in a no-no place

    3rd review, centre bottom

    somewhat.com[BLOCKED WEBSITE]pelling


  • @PJH said:

    3rd review, centre bottom
    somewhat.com[BLOCKED WEBSITE]pelling
     

     

    The irony:

     

    see <address>

    see <address>

     

    bla bl[BLOCKED WEBSITE]lydoo



  •  "contains some sensuality and violence" is definitely up there in the clbuttic class too...


  • Discourse touched me in a no-no place

    I once saw "contains furriness and mild squeaking" (or similar - Google ain't being helpful). Think it was meant to be a joke. Was Meerkat Manor or some cartoon film.



  • @blakeyrat said:

     Could you maybe have highlighted the WTF? I don't see it...

    Yes, you could have just shot the Recent Fan Reviews bit. The rest is just noise.


  • @PJH said:

    3rd review, centre bottom
    somewhat.com[BLOCKED WEBSITE]pelling
     

    Except, as OP pointed out, it wasn't "what.com" as you have there; it was "somewhat compelling", and the space matched the period in the regex "at.com" because they forgot to escape it so it got treated as a metachar.
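
    For anyone who wants to see the mechanism, here is a minimal sketch in Python. The blocklist entry and the replacement text are taken from the screenshot discussion; the function and variable names are made up for illustration, and nothing here is claimed to be Fandango's actual code.

```python
import re

# Blocklist entry discussed in the thread (the URL shortener's host name).
BLOCKED_HOST = "at.com"

# Unescaped: "." is a metacharacter that matches any character except a newline,
# so this pattern also matches "at com", "at-com", "atXcom", and so on.
naive = re.compile(BLOCKED_HOST)

# Escaped: re.escape() turns "." into a literal period.
strict = re.compile(re.escape(BLOCKED_HOST))


def censor(pattern, text):
    """Replace every match of pattern with the filter's placeholder text."""
    return pattern.sub("[BLOCKED WEBSITE]", text)


review = "It was somewhat compelling."
spam = "Visit at.com/xyz today"

print(censor(naive, review))   # the legit review gets mangled
print(censor(strict, review))  # the legit review is left alone
print(censor(strict, spam))    # the actual URL is still caught
```

    The escaped version still has no word boundaries, so it would also hit "what.com"; it only fixes the metachar problem described above.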



  • This also reminds me of some forums where people couldn't have a discussion of so[b]cialis[/b]m or mention being a spe[b]cialis[/b]t because of spam filters.



  • League of Legends has a chat client pre game. They have filters for words like

     

    Fuck

    Hell, Damn <-- really???

     

    But these filters don't only look for whole words so:

    motherfucker would be mother****er, if motherfucker was not in the list, which it is. However it leads to

    ****o instead of hello.

     

    I guess give em credit, it is the most intense filtering I've seen in a game, good for the lill' ones. Not to say they filter f-uck or f u c k or anything of that sort. Nothing intelligent.
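
    A rough sketch of the difference between substring matching and whole-word matching, in Python. The word list and function names are hypothetical; the game's real list and implementation are not known.

```python
import re

# Hypothetical banned-word list, kept mild for the example.
BANNED = ["hell", "damn"]


def mask_substrings(text):
    """Naive filter: masks a banned word anywhere, even inside other words."""
    for word in BANNED:
        text = re.sub(re.escape(word), "*" * len(word), text, flags=re.IGNORECASE)
    return text


def mask_whole_words(text):
    """Masks a banned word only when it stands alone, using \\b word boundaries."""
    pattern = r"\b(" + "|".join(re.escape(w) for w in BANNED) + r")\b"
    return re.sub(pattern, lambda m: "*" * len(m.group(0)), text, flags=re.IGNORECASE)


print(mask_substrings("hello there"))     # "hello" is caught by accident
print(mask_whole_words("hello there"))    # "hello" survives
print(mask_whole_words("What the hell"))  # the standalone word is still masked
```

    Neither version does anything about "h-ell" or "h e l l", which is the other half of the complaint.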



  • @astonerbum said:

    League of Legends has a chat client pre game. They have filters for words like

     

    Fuck

    Hell, Damn <-- really???

     

    But these filters don't only look for whole words so:

    motherfucker would be mother****er, if motherfucker was not in the list, which it is. However it leads to

    ****o instead of hello.

     

    I guess give em credit, it is the most intense filtering I've seen in a game, good for the lill' ones. Not to say they filter f-uck or f u c k or anything of that sort. Nothing intelligent.

     

     

    NASCAR 2003 had something similar, but it tried to do f-uck and similar as well, and it would replace the entire string with '@#$!@^@'. I actually saw a guy try to say that he was from Scunthorpe, believe it or not, and he kept saying, "I'm from @#$^)(@#%."

    The aggressiveness of the filter resulted in things getting censored inappropriately and looking obscene when they weren't - "He's hit the wall" turned into "@#$#@! the wall", for instance. Good stuff.



  • @PeriSoft said:

    "He's hit the wall" turned into "@#$#@! the wall", for instance. Good stuff.
     

    My old uni had an IRC server with a bot that banned users for similar things. It even included the word "ass" in the block list, so you couldn't talk about assignments or other mundane things. However, the bot was stupid and would actually kick itself for flooding (that just required a few people constantly swearing for a few minutes), then come back without ops so it then couldn't kick/ban anyone else. It also kicked/banned one of my friends on sight, even though she never swore or partook in the flood kicks, and that was the only user that was banned.



  • @PJH said:

    I once saw "contains furriness and mild squeaking" (or similar - Google ain't being helpful) once. Think it was meant to be a joke. Was Meerkat Manor or some cartoon film.
     

    The phrase was "contains scenes of a furry nature and mild squeaking". Love it as it shows up the statements that they put on the films. As if they think that makes a difference to who views the film. I think most people can understand what is in a film from watching the preview, or sometimes reading the title.



  • @JonAxtell said:

    @PJH said:

    I once saw "contains furriness and mild squeaking" (or similar - Google ain't being helpful) once. Think it was meant to be a joke. Was Meerkat Manor or some cartoon film.
     

    The phrase was "contains scenes of a furry nature and mild squeaking". Love it as it shows up the statements that they put on the films. As if they think that makes a difference to who views the film. I think most people can understand what is in a film from watching the preview, or sometimes reading the title.

    Gaah! Watch out, or you'll attract morbiuswilters.



  • Ham-fisted attempts at censorship always take me back to this bash quote. I don't know if it's real, but it does serve as a good example of why automated censorship is always a bad idea.



  • Typing "somewhat.compelling" is the WTF in my mind.

    And I love that particular Bash quote - the fact that the auto-kick script can't spell "don't" correctly leads me to wonder how many correct words would get censored, and vice versa.


  • Discourse touched me in a no-no place

    @The_Assimilator said:

    Typing "somewhat.compelling" is the WTF in my mind.
    I made that suggestion based on the broken ellipsis also present in the same review.



  • @Faxmachinen said:

    Ham-fisted attempts at censorship always take me back to this bash quote. I don't know if it's real, but it does serve as a good example of why automated censorship is always a bad idea.

     

    I saw something similar happen in a Christian channel, back in the day, so it's more than plausible. Do people just hang out on random channels in efnet anymore? It's been a long time since I've used anything but a purpose-specific IRC channel on a small server network...



  • @The_Assimilator said:

    Typing "somewhat.compelling" is the WTF in my mind.

    You don't think it's possible the site used /at.com/ instead of /at\.com/ to search for the URL?




  • This should have made the front page. It's better than any of the front page stories in a while.



    * Big commercial website

    * Common problem (filtering user-generated content)

    * Common tools misused (regex written by somebody who doesn't know that an unescaped dot matches any character = clownshoes)

    * The juxtaposition of the two spam posts and then the mangled-but-legit post is amazing



    My question is this: why block site names at all? You're never going to capture all of the various permutations, like 'visit mysite [dot] uk [slash] thehotness', but you could trivially block tags in user posts.



  • @savar said:


    My question is this: why block site names at all? You're never going to capture all of the various permutations, like 'visit mysite [dot] uk [slash] thehotness', but you could trivially block tags in user posts.

    I don't think it's tags so much as people pasting http://example.com/my_url_here in the middle of a post - and if you have to use a [dot] notation you're not just unclickable, you're much less copypastable.



  • @fennec said:

    @savar said:

    My question is this: why block site names at all? You're never going to capture all of the various permutations, like 'visit mysite [dot] uk [slash] thehotness', but you could trivially block tags in user posts.

    I don't think it's tags so much as people pasting http://example.com/my_url_here in the middle of a post - and if you have to use a [dot] notation you're not just unclickable, you're much less copypastable.

    Right.  Also, a lot of spam is for the purposes of getting PageRank.  The "[dot] notation" breaks this.



  • @morbiuswilters said:

    Right.  Also, a lot of spam is for the purposes of getting PageRank.  The "[dot] notation" breaks this.
    This is why rel="nofollow" exists.



  •  @snover said:

    @morbiuswilters said:

    Right.  Also, a lot of spam is for the purposes of getting PageRank.  The "[dot] notation" breaks this.
    This is why rel="nofollow" exists.

    Yes, but we're talking about spammers here.  They aren't geniuses.

     



  • @morbiuswilters said:

    Yes, but we're talking about spammers here.  They aren't geniuses.
    It’d be done by the site itself. If they have processing in place to remove “blocked” addresses, they could have processing in place to ensure any links also get the rel="nofollow" attribute.
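
    Something along these lines, as a minimal Python sketch. It assumes posts arrive as plain text and that the site turns bare http(s) URLs into links itself; the URL pattern is deliberately simple and the function name is made up for illustration.

```python
import html
import re

# Simplified URL pattern (an assumption): http(s) followed by anything that is
# not whitespace, an angle bracket, or a quote character.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def linkify_nofollow(post):
    """Escape user text, then wrap bare http(s) URLs in rel="nofollow" links."""
    parts = []
    last = 0
    for m in URL_RE.finditer(post):
        # Escape the plain text before the URL so user-supplied markup is inert.
        parts.append(html.escape(post[last:m.start()]))
        url = html.escape(m.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = m.end()
    parts.append(html.escape(post[last:]))
    return "".join(parts)


print(linkify_nofollow("see http://example.com/my_url_here now"))
```

    Because the site generates every link, the spammer never gets a say in whether the attribute is present.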

