FORBIDDENed swearword proofing



  • Apparently, TV.com has a sub-par swearword filter on its articles. It seems to be meant to replace forbidden words with... FORBIDDEN. Check it out at:

    http://www.tv.com/south-park/lice-capades/episode/1000646/recap.html

     

    As you can see, the algorithm seems to be related to the highly efficient... String.Replace (or whatever passes for that in the site's back-end language).

    What I don't understand is how the hell this is supposed to work. I can understand what part of "classroom" might trigger the filter, but then why is it replaced by FORBIDDENoom ???? (or am I missing something ?)
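
    For comparison, here is a minimal sketch of what a plain substring replacement would do. The `BANNED` list and the `naive_filter` name are made up for illustration, and "ass" as the trigger is only inferred from "classroom"; nothing here is TV.com's actual code. The point is that a plain replace would give "clFORBIDDENroom", not "FORBIDDENoom".

```python
# Hypothetical naive filter: blind substring replacement, no word awareness.
BANNED = ["ass"]  # assumed trigger, inferred from "classroom"


def naive_filter(text: str) -> str:
    """Replace every occurrence of a banned fragment, wherever it appears."""
    for fragment in BANNED:
        text = text.replace(fragment, "FORBIDDEN")
    return text


print(naive_filter("classroom"))  # clFORBIDDENroom
print(naive_filter("class had"))  # clFORBIDDEN had
```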



  • Another FORBIDDENic mistake...



  • The code must be really weird. At first I thought it was filtering the whole word in which the curse word appeared, so you get "FORBIDDEN" instead of "clFORBIDDEN". It also appears to have a +1 character bug so "class had" loses the space and becomes "FORBIDDENhad". Strange. The +1 bug appears later on when "classroom" is replaced by "FORBIDDENoom", but now the whole word wasn't filtered. The filter appears to find the curse word, then filter out the entire part of the word before and including the curse, but leaves the remainder of the word.

    The real WTF is that somebody obviously realized that a simple string replacement wasn't good enough, thought about the problem, then came up with this "solution". Sounds to me like a highly-paid consultant worked on it. At least when it's a string replacement you know the programmer didn't even think about the problem so the possibility exists that he or she is capable of producing a good filter.
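
    The behaviour described above (swallow everything from the start of the word through the curse, plus one extra character) can be reproduced with a short sketch. This is only a guess that happens to match the two examples; the pattern and the `observed_filter` name are my own assumptions, not the site's real implementation.

```python
import re

# Guess at the observed behaviour, NOT TV.com's real code:
# match from the start of the word through the banned fragment ("ass",
# an assumed trigger), then eat one extra character (the "+1 bug").
PATTERN = re.compile(r"[A-Za-z]*ass.?")


def observed_filter(text: str) -> str:
    """Replace word-prefix + curse + one trailing character with FORBIDDEN."""
    return PATTERN.sub("FORBIDDEN", text)


print(observed_filter("class had"))  # FORBIDDENhad
print(observed_filter("classroom"))  # FORBIDDENoom
```

    Note that this sketch would hit every "class" on the page, so it does not explain an untouched instance elsewhere in the article.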



  • I love the way it filters out the amazingly offensive "ass" but totally ignores "minge". I wonder how it feels about Scunthorpe. Also, it totally ignores another instance of "class" near the bottom of the article.
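
    On the Scunthorpe question: a rough sketch of a whole-word filter, which is the usual way to avoid hitting innocent words. The `BANNED` list and function name are hypothetical, and this is just one possible approach, not a claim about what any real site does.

```python
import re

# Hypothetical blacklist; a real one would be much longer.
BANNED = ["ass"]

# \b anchors restrict matches to whole words, so fragments inside
# longer words ("classroom", "Scunthorpe") are left alone.
WHOLE_WORD = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in BANNED) + r")\b",
    re.IGNORECASE,
)


def whole_word_filter(text: str) -> str:
    """Replace banned words only when they stand alone as words."""
    return WHOLE_WORD.sub("FORBIDDEN", text)


print(whole_word_filter("a classroom in Scunthorpe"))  # unchanged
print(whole_word_filter("what an ass"))                # what an FORBIDDEN
```

    It is still only as good as its blacklist, so anything not on the list sails straight through.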
     



  • There is something else going on, since the word "class" does appear somewhere on the page (4th line from below). So the substitution algorithm is even weirder than you might have thought. Perhaps (PERHAPS!) they have a blacklist of all FORBIDDEN words plus a regexp-scanner for use of these words in compounds and they do not operate in the same way. Just a guess.



  • @TGV said:

    There is something else going on, since the word "class" does appear somewhere on the page (4th line from below). So the substitution algorithm is even weirder than you might have thought. Perhaps (PERHAPS!) they have a blacklist of all FORBIDDEN words plus a regexp-scanner for use of these words in compounds and they do not operate in the same way. Just a guess.

     

    I did not see that one when I posted it. It really does seem like it's filtering according to some god-only-knows awful algorithm (or a godawful, monstrous, side-effect-infected regex nobody could ever maintain).

    String.Replace (or whatever your preferred language offers for that) on its own just can't produce that kind of result.

    Another option is that the poster wanted to have fun, or to give the "bleeping" effect TV networks are so fond of these days (particularly annoying on something like South Park, since it bleeps like there's no tomorrow). That would explain the definitely random-looking effects, like forgetting one in the middle.

    I did a quick search and found other episodes where the word "classroom" is not affected. Might be that this is actually not a tech-WTF... So sad, it seemed like a nice candidate.
     


  • Considered Harmful

    @misha said:

    I love the way it filters out the amazingly offensive "ass" but totally ignores "minge". I wonder how it feels about Scunthorpe. Also, it totally ignores another instance of "class" near the bottom of the article.
     

    I've never heard the term "minge" before... And I've been known to swear like a sailor. I wonder if it's because I'm too young, or if it's a matter of regional dialect. Anyhow, I've learned a new vulgar term today, and for that I am thankful.

    If I were writing the algorithm (which I would never do [I'm strongly against any form of censorship]), then I surely would have missed that one as well.

