Google Indexing



  • I recently changed hosting providers, so I have been paying attention to my access logs for broken links and other strangeness. I started seeing a bunch of entries like this:

    GET /?page=java.byteBuffer
    The real URL is supposed to look like this:
    GET /index.php?page=java.byteBuffer

     OK, there are a bunch of broken links to my site out in the wild; I don't really care about them. But in this case I was seeing dozens of these entries. Then I noticed the referer:

    https://www.google.com/
    In fact, every single one of the malformed links had Google (or one of its international variants) as referer. I decided to look for myself, Googling "java bytebuffer":

     [image: clip from Google search results]
    WTF?!?

    I asked on the Google Webmasters group, and got the following reply (no idea whether the replier is from Google):

    Google algorithms generally do a comparison by requesting (e.g.) domain.com/ and domain.com/index.php  and if the content is the same, they will select the domain root URL without the redundant default file
    WTF?!?

    I gave them a sitemap that includes "index.php" on every URL. It's there on the "rel='canonical'" tag that they suggest I use. But because my old provider mentioned index.php in their DirectoryIndex config (something that they didn't do when I first built the site), Google decided that my URLs had a redundant component and helpfully removed it. Luckily, my new provider gives me an .htaccess.

    Suggestion for webmasters: name your default file "doorknob.php" or "doorknob.html": it may be dumb, but at least Google is unlikely to decide it's redundant.
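
For anyone trying to picture the comparison described in the quoted reply, here is a minimal Python sketch. It is not Google's actual algorithm, only an illustration of the behaviour as the replier described it. The function name `pick_preferred_url`, the injected `fetch` callable, and the example.com URLs are all made up for the example.

```python
import hashlib
from typing import Callable


def pick_preferred_url(fetch: Callable[[str], bytes],
                       root_url: str,
                       default_file: str = "index.php") -> str:
    """Return the URL a crawler following the described heuristic would keep.

    `fetch` is a hypothetical callable that returns the response body for a URL.
    """
    explicit_url = root_url.rstrip("/") + "/" + default_file
    root_digest = hashlib.sha256(fetch(root_url)).hexdigest()
    explicit_digest = hashlib.sha256(fetch(explicit_url)).hexdigest()
    # Identical bodies: the default file name is treated as redundant,
    # so the bare directory URL wins.
    if root_digest == explicit_digest:
        return root_url
    # Different bodies: the explicit file is a distinct page and is kept.
    return explicit_url


# Stand-in for real HTTP requests: a dict of URL -> body.
same_content = {
    "http://example.com/": b"<html>home</html>",
    "http://example.com/index.php": b"<html>home</html>",
}
preferred = pick_preferred_url(same_content.__getitem__, "http://example.com/")
```

With the stand-in data above, both URLs return the same body, so the sketch prefers the bare root URL, which matches what showed up in my logs.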

     



  • No, you should emit a 301 pointing to the index.php.
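
A rough sketch of what that redirect decision could look like, written in Python purely for illustration (on a real site this would more likely live in .htaccess or in the PHP front controller). The function name `redirect_for` is hypothetical, and the rule that only root requests carrying a `page` parameter get redirected is an assumption based on the log entries in the first post.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def redirect_for(request_target: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) for root requests with a page query, else None."""
    parts = urlsplit(request_target)
    # Only the bare root with a "page" parameter is treated as a mangled link.
    if parts.path == "/" and "page" in parse_qs(parts.query):
        # Keep the query string untouched and restore the script name.
        return 301, "/index.php?" + parts.query
    return None


mangled = redirect_for("/?page=java.byteBuffer")
correct = redirect_for("/index.php?page=java.byteBuffer")
```

Here `mangled` is `(301, "/index.php?page=java.byteBuffer")` and `correct` is `None`, so requests that already use the full URL pass through untouched.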



  • Is there a good reason why / and /index.php do not behave in the same way?


    I'd expect Google to follow what I told it to; but I'd also expect that the default page could be visited successfully on either URL


  • Discourse touched me in a no-no place

    @scudsucker said:

    Is there a good reason why / and /index.php do not behave in the same way?
    Yes, if / doesn't resolve to index.php.



  • @PJH said:

    Yes, if / doesn't resolve to index.php.



    I do understand this, but I want to know why anyone would intentionally have an index.php that does not serve the same content as the default served by /



  • @PJH said:

    @scudsucker said:
    Is there a good reason why / and /index.php do not behave in the same way?
    Yes, if / doesn't resolve to index.php.
     

     

    Except Google's algorithm only selects / rather than index.php if it finds that they serve the same content.

     

     

     


  • Discourse touched me in a no-no place

    @scudsucker said:

    @PJH said:
    Yes, if / doesn't resolve to index.php.



    I do understand this, but I want to know why anyone would intentionally have an index.php that does not serve the same content as the default served by /
    Because if, for whatever reason, you also have an index.html and DirectoryIndex index.html index.php is specified.
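
To make the ordering point concrete, here is a small Python sketch of how a list of default files in the style of DirectoryIndex gets resolved: the first listed name that exists in the directory wins. This is a simplified model, not Apache's implementation, and the function name `resolve_directory_index` is made up for the example.

```python
from typing import Iterable, Optional, Set


def resolve_directory_index(candidates: Iterable[str],
                            existing: Set[str]) -> Optional[str]:
    """Return the first candidate that exists in the directory, or None."""
    for name in candidates:
        if name in existing:
            return name
    return None


# With both files present and index.html listed first, a request for /
# is answered by index.html, so / and /index.php serve different content.
served = resolve_directory_index(
    ["index.html", "index.php"],
    {"index.html", "index.php", "style.css"},
)
```

In this example `served` is `"index.html"`, which is exactly the case where / does not resolve to index.php.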



  • @PJH said:

    Because if, for whatever reason, you also have an index.html and DirectoryIndex index.html index.php is specified

     

    On shared hosting you don't always get a choice.

    When I created the site six years ago, my previous provider did not recognize index.php as a default page, so I made an index.html that did a client-side redirect. At some point in the intervening time, they decided to treat index.php as a default (nice of them, considering that PHP was the only scripting language they supported). 



  • @PJH said:

    @scudsucker said:
    Is there a good reason why / and /index.php do not behave in the same way?
    Yes, if / doesn't resolve to index.php.

    Well derp.

    The question is: why would anybody set it up that way?


  • Discourse touched me in a no-no place

    @blakeyrat said:

    @PJH said:
    @scudsucker said:
    Is there a good reason why / and /index.php do not behave in the same way?
    Yes, if / doesn't resolve to index.php.

    Well derp.

    The question is: why would anybody set it up that way?

    Already answered in a subsequent post.


  • I've always been of the opinion that users shouldn't have to know or care the site is running PHP. So why have it specified in the URL? The first thing I always do when cleaning up a codebase is strip all those pesky extra characters from links and configure the server so they aren't needed.

    Likewise, why should the homepage for coolsite.com be coolsite.com/index.omgwtfbbq and not just coolsite.com?



  •  So where the fuck did this post come from, and why does replying to it send that reply into oblivion?


  • sekret PM club

    Sam discovered the magic of signature trickery today.



  • @e4tmyl33t said:

    Sam discovered the magic of signature trickery today.
    So what happens if I do this? [mod - logout abuse removed - PJH]



  • THREAD SUMMARY

    • Hey, why does Google not show index.php on my index.php page?
    • Oh because Google is smarter than me


  • @SamC said:

    @e4tmyl33t said:
    Sam discovered the magic of signature trickery today.
    So what happens if I do this?



  • @SamC said:

    So what happens if I do this?

    I vaguely recall seeing that mentioned in an old thread where everyone was trying to break CS as hard as possible. And yeah, it does work and is fairly annoying.



  • You made me log out and realize my former password doesn't work, now I had to create a new user. Hope you're happy.



  • Have you considered using a password manager?



  • @Salamander said:

    Have you considered using a password manager?

    What browser doesn't have a password manager?

     



  • @El_Heffe said:

    What browser doesn't have a password manager?

    Having and using are not the same thing.



  • @Salamander said:

    @SamC said:
    So what happens if I do this?

    I vaguely recall seeing that mentioned in an old thread where everyone was trying to break CS as hard as possible. And yeah, it does work and is fairly annoying.

    Filed under: Adblocking the logout page fixes it though.

    I have Adblock Plus, and I still keep getting logged out.


  • @anonymous235 said:

    You made me log out and realize my former password doesn't work, now I had to create a new user. Hope you're happy.

     

    My work here is done.

     

     

    Oh hell, that's (http://forums.thedailywtf.com/forums/t/20943.aspx?PageIndex=2#342587) where that post went:

    @FrostCat said:

    @HardwareGeek said:

    @FrostCat said:

    @serguey123 said:

    @blakeyrat said:

    @serguey123 said:

    Blakeyrat, cool off man,
    Sorry for being "uncool", but I use a handle on this site for a reason. Don't be an asshole.

    I'd appreciate it if a mod could remove my name there.

    Remove it myself, sorry about that, but if you want to remain anonymous, you should really check what you put on those webforms, it takes like 1 minute of googling, I thought you were aware that pretty much everybody can know who you are if they care, FYI, my handle is my name except for the numbers so...

    Given that he's previously provided links to multiple other sites where he's used his real name, he probably is well aware that people can find out who he is. You crossed a line there. Good for you, fixing your mistake, but it was still an asshole thing to do.

    TRWTF is necroing this thread after 3 years, thanks to signature guy sending replies to random posts.

    However, I learned something from this. I found out who blakey is, and that he does not live in the same city I do; he lives in the next city over. I also learned that Network Solutions is willing to sell me the domain ratsmating.com. For only $475. Uh, no thanks.

    Oops. Well, it's his fault, not mine, for necroing the thread. I mean, how far back up the post chain should i look at the dates?


     

     

