Search Engine Optimization and Its Follies



  • I work at a certain design and marketing firm with some fairly big clients. Though each of those clients has its own little nuances, the project I'm working on right now has been a special kind of comedy of errors.

    This particular client, though usually just kind of finicky about their web development, has recently been taken in by the Internet's equivalent of snake oil sales: search engine optimization. So I'm doing a complete redesign of their entire corporate website, and during the process of building the site we have to deal with this SEO guy. I know what you're all thinking. You're all thinking that search engines search the content of web pages, right? You're thinking that search engines spend all their time poring over the various metadata included in the headers of web pages, right? Apparently, that's incidental, and the most important thing to do if you want accurate search engine results is to make sure that your filenames are long and descriptive. I was immediately aghast, as I had already built out the shell of the site and, as far as I was concerned, the filenames were pretty much set in stone.

    This, as it happened, wasn't the case. After a first attempt to talk some sense into the client failed, I began changing all the filenames and link URLs for a 400-page corporate website that was still under construction. The process took three days, and it gave me pause. With all these really long filenames, like, for example, "national_and_regional_retail_solutions.htm", would the filenames actually be too long to upload to my company's testing server for the client to view? Since my company uses a Microsoft server that only supports filenames of 31 characters or fewer, my fears were justified. I emailed the project manager, who emailed a manager above her, who emailed the client, alerting them to this problem. A couple of days later, I received a new spreadsheet of slightly shorter filenames. I looked it over, then, just to be sure, copied the filenames into a text editor and ran a script over them to get their lengths. My head hit my desk seconds later, when I saw that the filenames were all 28, 29, 30 characters, but the SEO guy had forgotten that each one would have a ".htm" extension on it. At this point, I was so frustrated that I had to wait half an hour before shooting an email back up the bureaucratic pipeline. This time I got a response from the SEO guy, too.
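
    (The check itself is a few lines in any scripting language. A minimal Python sketch, assuming the proposed names sit one per line in a hypothetical names.txt, with the ".htm" extension appended before measuring:)

        # Sketch: flag proposed filenames that blow the server's limit once
        # the ".htm" extension is appended. Assumes names.txt holds one
        # extensionless filename per line.
        LIMIT = 31

        with open("names.txt") as f:
            for name in (line.strip() for line in f if line.strip()):
                full = name + ".htm"
                if len(full) > LIMIT:
                    print(f"{full}: {len(full)} chars, over by {len(full) - LIMIT}")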

    He said he was sorry that his "recommendation" had resulted in such calamity. I would agree that that was the case, if "recommendation" didn't mean "demand" when it comes from a client, and if "recommendation" didn't mean "incompetence" when it comes from this guy. So I awaited new filenames from the client. When I received them, my head hit the desk again. They were all 28 characters or less, so with the ".htm" at the end they were still over the asinine 31-character limit. If it were up to me, folks could have whatever filenames they wanted, but I didn't want to deliver the site only to have them reject it on the grounds that it didn't work on their server. At this point, I informed everyone that I would use my own judgment to shorten the remaining filenames, and they would have to deal with that.

    All told, at least a week and a half of work wasted on a poor "recommendation" from a snake oil salesman. And still no metadata written for the site. Ugh...



  • AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAUUUUUUUUUUUUUGHHHHHHHHHHHHHHHHHHHHHHH!



  • Sorry, but the SEO guy was right on this one: descriptive URLs _do_ help you get a better ranking on Google.

    Now, the real WTFs here (TM) are:

    - "Since my company uses a Microsoft server that only supports filenames of 31 characters or less"
    - 400 static pages... What do you do if you want to change the design?
     



  • [quote user="eljo"]

    Sorry, but the SEO guy was right on this one: descriptive URLs do help you get a better ranking on Google.

    [/quote]
    Yes, that's right. But it's files we're dealing with here, which needn't have descriptive names.

    One can always rewrite the URL to point to a file with a shorter and more system-friendly name. I'm not quite sure whether IIS could do that at the time this happened.



  • <typical/> users should not drive implementation efforts. They drive requirements, but only from the conceptual side of the application. The logical part is where the developers and managers figure things out logically, and the physical part is the actual representation of the conceptual/logical models.

    You see the same thing with database names when the name limit is 32 characters, as well....



  • [quote user="eljo"]

    Sorry, but the SEO guy was right on this one: descriptive URLs do help you get a better ranking on Google.

    Now, the real WTFs here (TM) are:

    - "Since my company uses a Microsoft server that only supports filenames of 31 characters or less"
    - 400 static pages... What do you do if you want to change the design? [/quote]

    Then there is the amount of time it took. Three days renaming files? Why did it take so long? Computers are pretty good at repetitive tasks.
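
    (For the bare rename, something like this Python sketch would do, assuming a hypothetical two-column renames.csv mapping old names to new ones:)

        # Sketch: batch-rename files from an "old-name,new-name" CSV mapping.
        # Assumes renames.csv has exactly two columns and no header row.
        import csv
        import os

        with open("renames.csv", newline="") as f:
            for old, new in csv.reader(f):
                os.rename(old, new)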

    I have always understood that most of the currently used search engines were built after metadata had already been used to manipulate search results. Since the metadata in many cases had no relation to the page content and wasn't needed to index it, the search engine developers chose to ignore it.



  • [quote user="mstahl"]You're thinking that search engines spend all their time poring over the various metadata included in the headers of web pages, right? Apparently, that's incidental, and the most important thing to do if you want accurate search engine results is to make sure that your filenames are long and descriptive.[/quote]

    Almost: you should make URLs long and descriptive. To Google, metadata has almost no meaning. Most of the influence is in the domain, the rest of the URL, headings (as in HTML heading tags and bold/big text), and the body text, in that order.

    WTF#1: IIS can't handle >31 chars.
    WTF#2: No redirection???

    Google doesn't care about the keywords meta tag. It only supports charsets, base/link elements, and robots parameters (nofollow, noindex, ...).



  • I'm seeing a few WTFs here, but none with the SEO guy.

     

    #1: Your lack of knowledge of URL rewriting

    #2: Your lack of research on SEO, which would have backed up the recommendation in multiple places

    #3: The alleged 31 character limit of whatever tech you're using

     I'll help you out a little:

    First, find the IIS equivalent of mod_rewrite, and add a rewriting rule that will translate a URL of the form "/300-this-page-lets-you-contact-customer-service-for-my-company.htm" into "/300.html" or "getpage.asp?id=300". That way all the 'long' URLs are translated into the old ones.
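
    (For reference, on Apache itself the rule might look something like this .htaccess sketch; the getpage.asp name is just the hypothetical handler from above, and the leading digits are assumed to be the page id:)

        # Sketch of the rewrite described above, in mod_rewrite syntax:
        # "/300-anything-at-all.htm" is handled as "getpage.asp?id=300".
        RewriteEngine On
        RewriteRule ^(\d+)(-[^/]*)?\.htm$ getpage.asp?id=$1 [L]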

    Now the only thing left to do is to change the existing links to add the 'SEO' information. If you're dealing with static pages, then you'll have to write a script to go through the files and parse/update all the <a> tags. If the site is dynamic, you can generate the SEO part of the URL yourself (say, from the page title). After all, it doesn't matter what the SEO part says, as /300-blahblah.htm and /300-wtf.htm both go to the same page. It's only there to help search engines (AND humans) associate more information with the URL.
     



  • [quote user="viraptor"]

    WTF#1: IIS can't handle >31 chars.

    [/quote]

    When I try to imagine the server implementation code that is the cause of this limitation, I get a splitting headache.

    But such hard-coded limits are consistent with the wealth of buffer overflow exploits found over the years.

     



  • Are you sure it's descriptive URLs that are actually making the difference? Or is it just a side effect? For instance:

    i) the contents of a link do matter, so it may just be the people who include the full URL as the link text who make this help, e.g. <a>full url</a>
    or

    ii) search engines ignore, or give less weight to, pages which they "think" are fully dynamic, so the act of removing query parameters is what helps



  • Both of those are true, but I know that keywords in the URL are highly valued, which is why you see some insanely long domain names from spammers.



  • You could have bought a new copy of Windows Server 2003 for less than the price of this fiasco. Or better yet, just install Linux for free from the Internet and use Apache to serve up your long filenames like a pro.



  • Difference? Of course. Proof of concept: search Google for 110383771200001. It's the ID of a post on the kmail mailing list, and it doesn't appear anywhere on the page itself:

    lists.kde.org/?t=110383771200001&r=1&w=2

    But Google still finds it. If it includes link URLs in its index, that's a new place for keyword spamming. Just be aware of "Error 414 (Requested URI is too long)" ;)
     



  • [quote user="eljo"]

    Sorry, but the SEO guy was right on this one: descriptive URLs do help you get a better ranking on Google.

    Now, the real WTFs here (TM) are:

    - "Since my company uses a Microsoft server that only supports filenames of 31 characters or less"
    - 400 static pages... What do you do if you want to change the design?
     

    [/quote]

    I've never heard of an IIS that only works with 31-character filenames. Is it some IIS4 archaism? Is the site going to be hosted off a CD for some odd reason? (Joliet does have a 31-character filename limit; UDF, however, does not.)

    As for the time spent renaming, may I recommend Oscar's Renamer as the fastest way to rename a bunch of files with no significant naming pattern?



  • [quote user="stinch"]

    Then there is the amount of time it took. Three days renaming files? Why did it take so long? Computers are pretty good at repetitive tasks.

    [/quote]

    When you rename web pages, you have to make sure all references to that filename are changed, too. Dreamweaver helps out by doing this automatically for you (which can sometimes mess things up), or you can use grep/sed or an equivalent, which still may not be that easy (example: three links can point to "../../path/index.html", "/path/index.html", or "../../../path" and still refer to the same page you need to update).
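
    (If you do script it, the easy part might look like the Python sketch below. It only catches literal occurrences of the old name, which is exactly the point: the relative-path variants above would need real per-file path resolution. The "site" directory and both filenames are hypothetical.)

        # Sketch: rewrite literal references to a renamed file across a
        # site's .htm files. Relative-path variants of the same link would
        # need per-file path resolution and are not handled here.
        import os

        OLD = "retail.htm"
        NEW = "national_and_regional_retail_solutions.htm"

        for root, dirs, files in os.walk("site"):
            for fname in files:
                if fname.endswith(".htm"):
                    path = os.path.join(root, fname)
                    with open(path, encoding="utf-8") as f:
                        text = f.read()
                    if OLD in text:
                        with open(path, "w", encoding="utf-8") as f:
                            f.write(text.replace(OLD, NEW))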

    This can be especially time-consuming if the site was made by someone else, with no versioning system of any kind (backup files scattered around, etc.); it's incredibly common, too.

    On the point of using a mod_rewrite equivalent for IIS, I wasn't able to find a free one, nor try the ones that require a fee. It's not really a viable option: in a company, putting in a purchase order for one can take a while. The same goes for requesting that the admins install an ISAPI module, or for remote access to install it yourself. Plus, you'd have to make sure the module is on the production server as well, which is hard to guarantee if you're not in charge of the site.




  • Finally, you should use dashes instead of underscores when separating words in a URL.



  • This conversation brings up a question I have (which I am afraid may expose some ignorance on my part).

    How do the long URIs work for digg? For example:

    http://www.digg.com/tech_news/A_sneaky_change_in_Windows_licensing_terms_Ed_Bott_s_Microsoft_Report

    There is no .HTM or .PHP extension. Is the server configured in some way to return file names without extensions? It doesn't seem reasonable to suppose that's actually a directory name.



  • I'm a little confused. On my Win2K box, IIS works fine with filenames longer than 31 characters, like http://localhost/Testing_a_very_long_html_file_name.htm - 38 characters.

    Ken
     



  • [quote user="R.Flowers"]

    This conversation brings up a question I have (which I am afraid may expose some ignorance on my part).

    How do the long URIs work for digg? For example:

    http://www.digg.com/tech_news/A_sneaky_change_in_Windows_licensing_terms_Ed_Bott_s_Microsoft_Report

    There is no .HTM or .PHP extension. Is the server configured in some way to return file names without extensions? It doesn't seem reasonable to suppose that's actually a directory name.

    [/quote]

    A URI referring to a directory must end with a forward slash. It's a common convention that if a webserver can't find a file with a given extensionless name but can find a directory with that name, it forwards you to that directory; it does so by sending your web browser a "redirect" response and waiting for your browser to change what it's requesting. Every webserver out there can handle filenames without extensions.
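
    (You can watch that redirect happen with a few lines of script. A Python sketch, using a hypothetical www.example.com that has a directory named "docs" and no extensionless file of that name:)

        # Sketch: request a directory without its trailing slash and print
        # the redirect the server answers with.
        import http.client

        conn = http.client.HTTPConnection("www.example.com")
        conn.request("GET", "/docs")       # no slash, no extension
        resp = conn.getresponse()
        print(resp.status)                 # typically 301
        print(resp.getheader("Location"))  # e.g. http://www.example.com/docs/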



  • Thanks, Carnildo. And by the way, I used Google (I did not Google(TM)!) to find possible answers. In the case of Apache, apparently you can use .htaccess to do it, among other things.



  • [quote user="Carnildo"][quote user="R.Flowers"]

    This conversation brings up a question I have (which I am afraid may expose some ignorance on my part).

    How do the long URIs work for digg? For example:

    http://www.digg.com/tech_news/A_sneaky_change_in_Windows_licensing_terms_Ed_Bott_s_Microsoft_Report

    There is no .HTM or .PHP extension. Is the server configured in some way to return file names without extensions? It doesn't seem reasonable to suppose that's actually a directory name.

    [/quote]

    A URI referring to a directory must end with a forward slash. It's a common convention that if a webserver can't find a file with a given extensionless name but can find a directory with that name, it forwards you to that directory; it does so by sending your web browser a "redirect" response and waiting for your browser to change what it's requesting. Every webserver out there can handle filenames without extensions.

    [/quote]

    Odds are very good you're seeing URL rewriting in action, where it's rewriting SITE/([^/]+)/(.+) to something like SITE/article.php?section=$1&title=$2 before handling it.

    The other approach I've used is to make the script "tech_news" and parse the rest out of PATH_INFO. For the above URL, the PATH_INFO for tech_news would be "/a_really_long_title" -- pretty easy to parse.
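
    (In CGI terms the parsing really is a couple of lines. A Python sketch of a hypothetical "tech_news" script, assuming it runs as a plain CGI program:)

        # Sketch: a CGI script named "tech_news" that pulls the article slug
        # out of PATH_INFO. For /tech_news/a_really_long_title, PATH_INFO
        # is "/a_really_long_title".
        import os

        slug = os.environ.get("PATH_INFO", "/").lstrip("/")
        print("Content-Type: text/plain\n")
        print("Requested article:", slug)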

    Aside: URLs actually shouldn't contain a file extension. It's fairly easy to make Apache guess the extension automatically by turning on Options +MultiViews, or to hardcode the type using a type map file, which allows you to have several types for the same resource. An image might be available as png, jpg, and gif files under http://www.example.com/images/an_image, and the server selects one based on the browser's Accept: header.
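
    (A type map for that image example might look like the sketch below, assuming mod_negotiation is enabled and .var files are assigned the type-map handler:)

        # Sketch of an Apache type map, e.g. "an_image.var". Requires
        # mod_negotiation and "AddHandler type-map .var" in the config.
        URI: an_image.png
        Content-Type: image/png; qs=0.9

        URI: an_image.jpg
        Content-Type: image/jpeg; qs=0.8

        URI: an_image.gif
        Content-Type: image/gif; qs=0.5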



  • Actually, as much as I hate SEO (it should be renamed Search Engine Manipulation), I must say that you are in the wrong on one thing: Google (and other engines, I suppose) completely ignores metadata, due to frequent abuse by website authors.



  • [quote user="Angstrom"]

    Aside: URLs actually shouldn't contain a file extension. It's fairly easy to make Apache guess the extension automatically by turning on Options +MultiViews, or to hardcode the type using a type map file, which allows you to have several types for the same resource. An image might be available as png, jpg, and gif files under http://www.example.com/images/an_image, and the server selects one based on the browser's Accept: header.

    [/quote]

    IMHO this is also one of the least useful aspects of the HTTP spec. Same deal with NTFS streams (Mac resource forks being the only useful implementation of the idea). The idea of one file = one resource is too intuitive and too ingrained in people for automated rules like that to make sense, especially when they're supported to very different degrees across products. (The gif/jpeg example is so 1991; video playback ability would make more sense today.) Yes, it can be powerful and useful, but rewriting and scripts are much more so, with similar maintenance costs, and a browser can't efficiently enumerate its entire list of capabilities (and bugs?) to the server on every hit.

