Web Filter/Proxy



  • I'm normally just a lurker in forums but this seemed worth signing up to tell you guys about.


    I work at a school, and we pay one company to act as our ISP, content filter, email provider, web host, etc.


    The filtering is pretty severe at times. Obviously we have to protect kids from all sorts of things, so sometimes they block a site that is purely educational. Sometimes the reverse happens and we need a site blocked ASAP. Either way, there is a system in place to deal with this.




    The old system (which we used up until very recently) was a pain. The system for changing whether a site was blocked was as follows: we phoned the company, gave them a URL and asked for it to be either blocked or unblocked. After "48" hours (maybe it was working hours, I dunno, but it never happened within 2 days) the relevant site would be allowed or blocked as requested.




    Also, with the old system, if you didn't use the proxy server they told you to, the filter just plain didn't work. This is probably more of a WTF on our side than theirs, but we had kids bringing in Firefox on a memory stick without the proxy configured, and just browsing to whatever site they wanted. And the company, when informed, just shrugged and said "It doesn't work with Firefox."




    So anywho, they came out with this great new self-healing, adaptive, sounds-like-it-might-become-Skynet type filter. And then they email us to tell us on the very day they deploy it. And they take the old system offline. And if you try to use the old proxy, you get a page telling you to set up for the new proxy. Fortunately there were only a few machines in the finance department that desperately needed the change, so we did get our wages that month.




    So after changing each of the finance machines and pushing the changes to the workstations on the main network everything was fine for at least 2 hours. At which point we discovered that the new filter had 'forgotten' all the extra blocked and allowed sites we had requested over the last 8 years. We are still working on getting a list of them all, since apparently our lovely ISP has no record.




    Finally, today, every search anyone did on Google required anti-spam validation (CAPTCHA) when done through the proxy. But direct connections worked fine. I can only assume Google saw all the requests (presumably from every school who uses this ISP) coming from the same IP and thought that thousands of requests a second from one address was a bit odd. Unless someone did actually hack the adaptive, self-healing wonder filter and use it to send spam.
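    For what it's worth, the kind of per-IP rate limiting that would explain this can be sketched in a few lines. This is purely illustrative - the class name, thresholds, and logic below are my own invention, not anything Google has documented:

    ```python
    import time
    from collections import defaultdict

    class PerIpRateLimiter:
        """Toy per-IP rate limiter: once an address exceeds the request
        budget within the sliding window, demand a CAPTCHA from it."""

        def __init__(self, max_requests, window_seconds):
            self.max_requests = max_requests
            self.window = window_seconds
            self.hits = defaultdict(list)  # ip -> recent request timestamps

        def needs_captcha(self, ip, now=None):
            now = time.monotonic() if now is None else now
            # Keep only timestamps still inside the window, then record this hit.
            recent = [t for t in self.hits[ip] if now - t < self.window]
            recent.append(now)
            self.hits[ip] = recent
            return len(recent) > self.max_requests
    ```

    A single proxy funnelling every school through one address fills that one bucket thousands of times faster than any individual user would, which is enough to trip such a limit with no hacking involved.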



  • @reverse_Atomic_roger said:

    I can only assume Google saw all the requests (presumably from every school who uses this ISP) coming from the same IP and thought that thousands of requests a second from one address was a bit odd.
     

    It's got to be more than that... there are many large organisations and even ISPs that use proxies such that all their requests come from one (or a small number of) IP(s)... Maybe ISPs are supposed to register their proxies with Google or something?



  • @reverse_Atomic_roger said:

    Unless someone did actually hack the adaptive, self-healing wonder filter and use it to send spam.

    I find this alternative more likely.  I've heard of companies finding out they had infected systems within their network because Google started requiring CAPTCHAs.  (Well, technically, they found that out because they'd gotten bounce messages from my company telling them to contact our postmaster (the one whitelisted address, per RFC), who told them, "You're on an RBL for infected systems."  In the ensuing conversation, it came out that Google had started requiring CAPTCHAs from them hours, days, or weeks before they tried sending us email.)

    Most of the time I've seen adaptive, self-healing wonders, they don't necessarily do so well at blocking spam.  At least one of them adaptively self-healed around spam blocks instead of adaptively propagating them.



  • @mallard said:

    @reverse_Atomic_roger said:

    I can only assume Google saw all the requests (presumably from every school who uses this ISP) coming from the same IP and thought that thousands of requests a second from one address was a bit odd.
     

    It's got to be more than that... there are many large organisations and even ISPs that use proxies such that all their requests come from one (or a small number of) IP(s)... Maybe ISPs are supposed to register their proxies with Google or something?

    Nah, I don't think so.  I think what Google is doing is trying to discriminate between real web surfers and malware doing automated searches by looking to see if the HTTP request contains the full set of headers typically sent by a browser, or if it's a minimal or inconsistent set of headers such as might come from a crudely-hacked malware.

    Uh, so I thought.  But then I tested it, and it didn't work like that: even crudely telnetting into port 80 and sending an HTTP GET request with only a Host: header works fine...

    .... or .... well ...

    I think I found the real WTF.

    This is me, telnetting into google and issuing a search for "fnarr fnarr".  Note how crude it is: no User-Agent or Referer, even, just the absolute minimum to specify the content I'm requesting.


    DKAdmin@ubik /cygdrive/f/xss
    $ telnet www.google.co.uk 80 2>&1 | tee http11.log
    Trying 66.249.91.103...
    Connected to www.google.co.uk.
    Escape character is '^]'.
    GET /search?hl=en&q=fnarr+fnarr&btnG=Google+Search&meta= HTTP/1.1
    Host: www.google.co.uk

    HTTP/1.1 200 OK
    Cache-Control: private, max-age=0
    Date: Thu, 11 Dec 2008 18:12:01 GMT
    Expires: -1
    Content-Type: text/html; charset=ISO-8859-1
    Set-Cookie: SS=Q0=Zm5hcnIgZm5hcnI; path=/search
    Set-Cookie: PREF=ID=cfb3db35e8e979df:TM=1229019121:LM=1229019121:S=-5pppb5ZBQHls
    GdG; expires=Sat, 11-Dec-2010 18:12:01 GMT; path=/; domain=.google.co.uk
    Set-Cookie: NID=17=EfDPZvrTnM49lJ6N4ep66Pvi7sUcGPFCclRLUk9-ySYaHQ5gTyluTdowFL_E1
    9zcklav3gH3zjL2VdIzcSKpMtnNZydCYMOtHcy4ZdhfdfZnS32yraOA2oQkcO8Mzrkk; expires=Sat
    , 11-Dec-2010 18:12:01 GMT; path=/; domain=.google.co.uk
    Server: gws
    Transfer-Encoding: chunked

    1ea0

    <!doctype html><head><title>fnarr fnarr - Google Search</title><style>body{backg[ ... snip ... ]

    OK, so that disproved my theory, so I thought I'd try going one cruder and use HTTP/1.0 (no Host: header) instead:

    DKAdmin@ubik /cygdrive/f/xss
    $ telnet www.google.co.uk 80 2>&1 | tee http10.log
    Trying 66.249.91.103...
    Connected to www.google.co.uk.
    Escape character is '^]'.
    GET /search?hl=en&q=fnarr+fnarr&btnG=Google+Search&meta= HTTP/1.0

    HTTP/1.0 200 OK
    Cache-Control: private, max-age=0
    Date: Thu, 11 Dec 2008 18:12:32 GMT
    Expires: -1
    Content-Type: text/html; charset=ISO-8859-1
    Set-Cookie: SS=Q0=Zm5hcnIgZm5hcnI; path=/search
    Set-Cookie: PREF=ID=6f7f328c6ae67410:TM=1229019152:LM=1229019152:S=G8HBc4UYTrb00
    5E8; expires=Sat, 11-Dec-2010 18:12:32 GMT; path=/; domain=.google.com
    Set-Cookie: NID=17=iEz8lBNFv_FxUh0yEZzxsa-3I_4pj_qCoFvjjcbdFNXbw4h4Q1FiBU5L_oSZo
    jyDQGsMruxxLEmR-totr21dGNeRHLDOiHE9m5g2nJkK9T61cp-a2h7IZpS-EMiS3udX; expires=Sat
    , 11-Dec-2010 18:12:32 GMT; path=/; domain=.google.com
    Server: gws
    Connection: Close

    <!doctype html><head><title>fnarr fnarr - Google Search</title><style>body{backg[ ... snip ... ]

     

    And now for the *really* interesting bit: 

     

    DKAdmin@ubik /cygdrive/f/xss
    $ grep -c 'stormfront.org' http1?.log
    http10.log:1
    http11.log:0

    DKAdmin@ubik /cygdrive/f/xss

    So, the real WTF is that if you use HTTP/1.0 to access Google, it returns otherwise-hidden search results pointing to Nazi / White supremacist websites?  May I be the frist to say

    <font color="#ff0000" size="7">WTF?!!!?</font> 

     

    [ For reference and comparison, I uploaded the html source returned when using FF to do a google search (from the "view source" window), and the html returned from both http1.0 and http1.1 requests:

    real page source from browser:


    http1.0 page source:
    http1.1 page source:   ]
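    The two request shapes in the transcripts above are easy to reproduce programmatically. A minimal sketch (the helper names are mine; the only difference that matters here - HTTP/1.1 requires a Host header, HTTP/1.0 does not - is what's modelled):

    ```python
    import socket

    def build_request(path, http_version="1.1", host="www.google.co.uk"):
        """Build the bare-minimum GET request used in the telnet tests above."""
        if http_version == "1.1":
            # HTTP/1.1 makes the Host header mandatory (RFC 2616, section 14.23).
            return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"
        # HTTP/1.0 has no Host requirement, so the request line stands alone.
        return f"GET {path} HTTP/1.0\r\n\r\n"

    def fetch(path, http_version="1.0", host="www.google.co.uk"):
        # Note: with HTTP/1.1 and no "Connection: close", the server keeps the
        # connection open, so a timeout is used to end the read loop (the
        # telnet session relied on the server eventually closing it).
        with socket.create_connection((host, 80), timeout=5) as sock:
            sock.sendall(build_request(path, http_version, host).encode("ascii"))
            chunks = []
            try:
                while data := sock.recv(4096):
                    chunks.append(data)
            except socket.timeout:
                pass
        return b"".join(chunks)
    ```

    Diffing `fetch(path, "1.0")` against `fetch(path, "1.1")` automates the `grep` comparison from the logs.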
     



  • more likely you got a different data store on the second search request - they don't keep all their data stores precisely in sync.

     

    each of their datacenters contains one or more data stores, each made up of a large number of systems - it could be that even on the same data store, the first time around, one of the systems didn't respond fast enough for the content-rendering node and its results got excluded for the sake of fast response.

     

    (i interviewed for a position with google SRE last spring, did my homework on their datacenter architecture)
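    The "slow shard gets dropped" behaviour described above can be sketched in a few lines. This is a toy model, not Google's actual architecture - the shard count, deadline, and names are all invented for illustration:

    ```python
    import time
    from concurrent.futures import ThreadPoolExecutor, wait

    def query_shard(shard_id, simulated_delay):
        """Stand-in for one index shard answering a search query."""
        time.sleep(simulated_delay)
        return f"results-from-shard-{shard_id}"

    def fan_out_search(shard_delays, deadline):
        """Fan the query out to every shard; merge only the answers that
        arrive before the deadline, silently dropping the rest so the
        result page still renders quickly."""
        with ThreadPoolExecutor(max_workers=len(shard_delays)) as pool:
            futures = {pool.submit(query_shard, i, d): i
                       for i, d in enumerate(shard_delays)}
            done, _ = wait(futures, timeout=deadline)
            # Only shards that answered in time contribute to the results.
            return sorted(futures[f] for f in done)
    ```

    With a different deadline, or one shard having a slow moment, the same query can return a different merged result set - which would neatly explain one log containing a hit that the other lacks.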



  • @Kazan said:

    more likely you got a different data store on the second search request - they don't keep all their data stores precisely in sync.

     

    each of their datacenters contains one or more data stores, each made up of a large number of systems - it could be that even on the same data store, the first time around, one of the systems didn't respond fast enough for the content-rendering node and its results got excluded for the sake of fast response.

     

    (i interviewed for a position with google SRE last spring, did my homework on their datacenter architecture)

    Awww, a common-sense explanation.  You're no fun!  [ puts away tinfoil hat ]

     



  • @DaveK said:

    Awww, a common-sense explanation.  You're no fun!  [ puts away tinfoil hat ]

    Is this your 'thang'? To just go from thread to thread and make some comment that includes some tired cliche or internet meme?



  • @Farmer Brown said:

    @DaveK said:
    Awww, a common-sense explanation.  You're no fun!  [ puts away tinfoil hat ]
    Is this your 'thang'? To just go from thread to thread and make some comment that includes some tired cliche or internet meme?

    I realize I'm feeding the troll here, but I'm glad someone appears to be as sick of the memes as I am.



  • @Soviut said:

    I'm glad someone appears to be as sick of the memes as I am.

    Well timed memes can still be a little funny, but spamming them across every thread is never funny, especially if they don't even have any relevance.



  • @Farmer Brown said:

    @Soviut said:
    I'm glad someone appears to be as sick of the memes as I am.
    Well timed memes can still be a little funny, but spamming them across every thread is never funny, especially if they don't even have any relevance.
    But what if you put the memes on a wooden table?



  • @bstorer said:

    @Farmer Brown said:
    @Soviut said:
    I'm glad someone appears to be as sick of the memes as I am.
    Well timed memes can still be a little funny, but spamming them across every thread is never funny, especially if they don't even have any relevance.
    But what if you put the memes on a wooden table?
    In SpectateSwamp Desktop Search, you can access a list of memes using the MML command. Beware though; aliens may have stolen the meme server!



  • @TwelveBaud said:

    In SpectateSwamp Desktop Search, you can access a list of memes using the MML command.

    Oh, I'm sure you meant the mmmmm commands. By the way, that stands for meme meme meme meme meme.



  • @TwelveBaud said:

    In SpectateSwamp Desktop Search, you can access a list of memes using the MML command.
     

    In Soviet Russia, list of memes accesses YOU! 



  • @Farmer Brown said:

    Well timed memes can still be a little funny, but spamming them across every thread is never funny, especially if they don't even have any relevance.

    This is from the tubby jackass who has 3 jokes he spams to IRC night-and-day.  Oh, sweet irony.... 



  • @Farmer Brown said:

    @Soviut said:
    I'm glad someone appears to be as sick of the memes as I am.
    Well timed memes can still be a little funny, but spamming them across every thread is never funny, especially if they don't even have any relevance.
     

    OMG zee googles do nathing!  can i haz cheezburger?  all your base! hay chibi san! kekeke ^_^ FAIL LOL EPIC FAIL U FAIL!

    The whole "fail" meme just smacks of student rails programmers.



  • @derula said:

    @TwelveBaud said:
    In SpectateSwamp Desktop Search, you can access a list of memes using the MML command.

    Oh, I'm sure you meant the mmmmm commands. By the way, that stands for meme meme meme meme meme.

    Don't forget, it can return random memes and play them in slow motion!



  • @Soviut said:

    Don't forget, it can return random memes and play them in slow motion!
     

    Yes, but only with the -killbrain flag



  • @dtech said:

    @Soviut said:

    Don't forget, it can return random memes and play them in slow motion!
     

    Yes, but only with the -killbrain flag

    And the VB5 runtime DLL.



  • @morbiuswilters said:

    Oh, sweet irony....

    Nice to see my fan still follows me around.

