How Search Engine Optimization (SEO) Works
Giant Spiders in the Library of Babel
In 1941, Jorge Luis Borges published a brilliant, disturbing short story about an infinite library. While at first that might seem to be heaven, the struggles of the narrator to find even one relevant volume among the nearly infinite meaningless tomes reveal the infinite library to be more like hell. While there may be a catalog showing where everything is, first one must find the catalog. And, of course, there are a nearly infinite number of false catalogs. The library contains:
“Everything: the minutely detailed history of the future, the archangels' autobiographies, the faithful catalogues of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogue, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book in all languages, the interpolations of every book in all books.”
-The Library of Babel
Jorge Luis Borges
When the internet became the World Wide Web, a potential universe of unlimited information was suddenly hooked up to the desktop of anyone who could afford a modem. The ease of publishing on the Web, along with the early lack of clear copyright law in cyberspace, led to a proliferation of truth, untruth, and everything in between. What was lacking was any way to distinguish one from the other. Even a salty veteran of media spin and political mudslinging, Pierre Salinger, was unable to recognize false information, insisting that a missile had shot down an airliner in 1996. He had downloaded what he thought was a government report off the internet. Some glossaries now include the phrase “Pierre Salinger Syndrome” to describe the tendency of internet novices to believe false information.
But even if you can recognize the truth when you see it, how do you find it? The internet contains several billion pages, and is growing at a rate of several million pages a day. It may not be infinite, like the Library of Babel in Borges’ short story, but every day it gets closer. Yet every day more and more research is done partially or totally using the internet. People are using search engines and directories to find what they seek. But how? How do they know where to look? How do they know that, out of all the billions of documents out there, the particular one they’ve found is the most relevant answer to their question?
Giant Spiders
85% of internet users find what they want by entering a search phrase into a search engine, meta-search engine, or directory. Most of them use Google. Another large group use Yahoo, which serves up results from Google, mixed in with a few other sources. But where do those results come from?
Once upon a time, Yahoo was king. Yahoo had a directory of many of the best Web sites on the internet; each site was hand-reviewed and placed in an appropriate category. If you wanted to be listed in Yahoo, you polished up your site, wrote up a description, then asked Yahoo to review it. Sooner or later they would.
Then Yahoo realized that, as the internet economy was not going to run on fumes forever, it needed to make money. Suddenly there was a rapidly escalating review fee. It began as an optional fee required only by those who wanted their sites reviewed quickly. Then it expanded. Another major directory came along, the DMOZ or Open Directory Project. A third directory, Looksmart, also took up some real estate on the horizon. Smaller directories provided relevant results for searchers in particular industries or geographic areas.
But directories famously lag far behind the expansion rate of the internet. The rate at which sites can be hand-reviewed with any degree of consistency is limited. An innovative method of creating search engine results is the spider. Spiders crawl from site to site, using the links they find to spread out further and further across the Web. Each spider examines a Web site, weighing the text, the tags, and the links from other Web sites, and then ranks that site in terms of relevancy to particular queries.
Spiders can cover many more pages than hand-reviewed directories, and are much more capable of keeping information up to date. The major directories have not been able to compete, and have adopted a more supplemental role. Yahoo, which for a time was supplementing its directory results with Google listings, gave up and now provides Google listings first. Looksmart changed its business model entirely, becoming a Pay Per Click search engine. Open Directory is still hanging in there, but it is essentially non-profit, and people report waiting many months to get their sites listed. In any case, nowhere are Open Directory listings offered first. Many search engines use the Open Directory to supplement their spidered results, as well as provide a tree structure to those users who prefer it.
So the spiders crawl across the Web, much like the Inquisitors in Borges’ story. They read billions of pages, assigning each page an objective weight, then, if asked, ranking it in terms of relevance to a particular query. However, life is not easy for these Spiders. Borges’ Inquisitors also ran afoul of many process problems:
“There are official searchers, inquisitors. I have seen them in the performance of their function: they always arrive extremely tired from their journeys; they speak of a broken stairway which almost killed them; they talk with the librarian of galleries and stairs; sometimes they pick up the nearest volume and leaf through it, looking for infamous words. Obviously, no one expects to discover anything.”
- The Library of Babel
- Jorge Luis Borges
But the spiders have to worry about more than just a broken stairway or two. When they talk with the “librarians of galleries and stairs” (whom we call webmasters), they are often deluged with false promises as the particular librarians try to have their sections or particular books placed high in the listings. Technically, this constant deluge of false promises is called search engine spam. If you are caught doing it, your site may be banned from that search engine for some arbitrarily long time.
So, to the point. Search Engine Optimization, also known as Search Engine Marketing, is a discipline that is now becoming the primary means used to ensure that a well constructed site attracts and impresses visiting spiders so much that they run all over the internet telling everyone they know about this new Web site.
However, this effort is a struggle for recognition, not only with other significant content sites, but with shoddy sites using sneaky search engine optimization tactics to appear more relevant than they really are.
More about search engine results
Try this. Go to Google and type in “Library of Babel”. Go to MSN and type in the same thing. Google reports some 84,000 results, while MSN tells you there are more like 39,000. Still, both search engines offer several of the same listings. MSN offers a directory site first, taken from one of three directories, but then gives the same first listing as Google. The URL is: http://jubal.westnet.com/hyperdiscordia/library_of_babel.html
It’s as good a place as any to find Borges’ story if you want to read it. Of the top listings, several are common to both sets of results. In a way that’s what you would expect, after all there is one World Wide Web, one story: Library of Babel, and one search string, library of babel.
Let’s try something else, then. Enter just “Borges” into Google, MSN search, and Altavista. Now the results are beginning to diverge. Part of this is that Borges is a single word. Part of it is that different ranking factors come into play when a word, or a name like Borges, may apply to several different things. The top Google results are still almost all tied to Jorge Luis Borges. In the MSN results, the directory listings tend to be about Jorge Luis Borges, but from fairly standard sources. The web page results begin with the same listing as the Google results, then offer up some Spanish language results, a travel listing, and a popcorn company. Altavista goes even further afield, offering up the Borges Law Firm, something that doesn’t show up until deep in the Google listings.
Which search engine is correct? Well, it depends on the mind of the searcher. Someone who wants to find out more about Jorge Luis Borges is probably happy with any of the results, though Google might seem to offer the most relevant. However, a searcher interested in finding the Borges Law Firm would only think Altavista helpful.
Let’s try finding Apple. Or maybe we want to know about apples. Enter apple into all three search engines. The Google results show no affection for the fruit. The first listing for a Web site having to do with the fruit is at position number 47. The rest is mostly Apple Computer Related, though Fiona Apple comes in at 27, and a couple other companies using Apple in their title slip in. MSN, naturally, seems more inclined to show results having to do with the fruit. Most of the listings are still computer related, but Apple Recipes shows up at #2 among the featured sites, and a Web site giving information about orchards shows up at #14.
Altavista serves up Fiona Apple and the Washington State apple commission on page 2, but mostly deluges us with Apple Computer sites, sponsored sites, and featured sites.
Intuitively, we expect things to change if we just go to the plural: apples. Many people might think of Apple Computer first when the word is singular, but almost everyone thinks of the fruit when faced with the term: apples. In fact, all three search engines are able to accommodate this change. Though MSN insists on giving us Apple Computer related sponsored listings, the actual search results are all along the lines of http://www.urbanext.uiuc.edu/apples/index.html
That site shows up #1 on Google and #2 on MSN, and comes up #3 on Altavista (after the sponsored results). I think Borges would be impressed.
Still, such unification of search engine result and searcher intent is likely to break down under two types of pressure. One is sheer volume. As big as the Web has become, it is nowhere close to Borges’ infinite library. But as many categories grow and become more and more stuffed with material, finding the best results for those key first 10 or 20 listings will be more problematic.
The other way in which things break down is under the pressure to sell. Borges saw the Inquisitors as sympathetic souls, crawling across broken links, always tired, often near death. But Webmasters, desperate to gain the high rankings that will sell whatever it is they offer, see the Inquisitors (or spiders in this case) as vulnerable victims that can be tricked, tortured, or bribed. The Webmasters resort to many tricks and traps, fake front pages, hidden text, false titles, all in an attempt to make nonsense books seem more relevant than complete volumes with less dressing.
Go to Google and type in a phrase “Web Site Design Company”. Within a fraction of a second, Google has offered you 2.3 MILLION choices. You will also see above and to the right sponsored results. But the results aren’t random, nor are they alphabetical. Run the search a few times and you’ll get more or less the same order. Google has made millions of decisions in that .28 seconds and ranked all those pages for you. It believes you are most interested in “Successmakers.net”, while somewhat less interested in “webolutions.com”, even though both offer Web site design. How does it know? Where did it get all these listings in the first place?
Now go to www.msn.com and type the same phrase into the search box on the right. Once again you get a list of results, though MSN seems to think there are only about 653 sites that match your search. Also, instead of sponsored links off to the side, you see the first listing, then three sponsored sites, then the rest of the listings. MSN has obviously decided upon much different answers to the same question.
Finally, go to Yahoo and try the same search. Yahoo believes there are 11,753 answers to your question. Note that the sponsored listings in Yahoo look the same as those in MSN. That is about the only overlap we get. In case you didn’t find Stratecomm, we’re listed at about 85 on Google, and we don’t show up for this listing on Yahoo or MSN. A quick review of other sites, including those above us, shows our design and services are probably better, but we haven’t gotten that high ranking yet. On the other hand, we are higher than 2.3 million other listings.
Of course we want to be listed first, or at least in the top thirty listings. We also would like our clients to be listed in the first three pages for whatever key phrases are most relevant to their business. So we need to understand how these different search engines rank all the listings. Unfortunately, each one is different. To make things worse, though you may think you are using one search engine, you are actually probably using several. Go back to the MSN search we did earlier. If you scroll your mouse over the various listings, without clicking, you will see a URL in the bottom gray bar. For the paid listings this may say something like www5.overture…. Some listings will display the same URL in the gray bar as they list. Others, like the eDezines listing will show r.lksmt in the lower corner. You are seeing results from the pay-per-click search engine known as Overture, from the directory Looksmart, and possibly from Altavista’s Best of the Web. The Yahoo listings are a combination of Google results and Yahoo’s own directory. The Google listings include the Google database as well as listings taken from the Open Directory Project. What should you remember about all that? Just that it’s complicated.
Traditionally Google is described as a search engine, while Yahoo is called a Directory. However, as both provide both search engine and directory listings, the distinction is a little vague. However, it is a good idea to know the difference because good optimization requires appealing to both.
A search engine assembles a huge database of Web sites by sending little traveling programs, called robots or spiders, all over the Web. The spider will find a Web site via some link, crawl through the site using all links it can find. Spiders rank sites using a variety of algorithms. A spider will return to a site on a periodic basis, depending on the search engine and other factors, including whether you have a paid inclusion program with that service. Search engines include Google, Altavista, and Inktomi.
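The link-following behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions: the miniature "Web" below is a made-up dictionary of page names rather than real sites, and a breadth-first walk is just one plausible way a spider might order its visits. No real search engine's crawler is being reproduced here.

```python
from collections import deque

# Hypothetical miniature "Web": each page maps to the pages it links to.
TOY_WEB = {
    "home.html": ["services.html", "about.html"],
    "services.html": ["design.html", "hosting.html", "home.html"],
    "about.html": ["home.html"],
    "design.html": [],
    "hosting.html": ["design.html"],
    "orphan.html": ["home.html"],  # no page links here, so a spider cannot reach it
}


def crawl(start, web):
    """Breadth-first crawl: visit every reachable page once, in discovery order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web.get(page, []):
            if link not in seen:  # skip pages already found via another link
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("home.html", TOY_WEB)
print(visited)
```

Note that orphan.html never appears in the output: a page that nothing links to is invisible to a spider that travels only by links.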
A directory reviews a Web site that has been submitted to a particular category. Directories are reviewed by hand, meaning that an actual person reads through the Web site and ranks it according to a secret set of criteria. Directories are much smaller, but proportionally more important because the links are more highly valued. Directories include Yahoo, the Open Directory Project, and (for the moment) Looksmart.
A third, quasi-search engine category has evolved recently. It is more like a search engine than a directory, but could also be seen as a whole other kind of animal. Called the Pay-per-click engine, it works through having Web site owners bid on certain keyphrases. Those that bid the most are listed the highest. The leader of this technology is Overture. While not many search directly on Overture, you will find the highest ranked listings for Overture on major engines such as Yahoo and MSN. While marked off as sponsored listings, on some engines the distinction is not at all clear.
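The bidding rule is simple enough to show directly. The sketch below assumes nothing beyond what the paragraph says (highest bid is listed highest); the site names and bid amounts are invented for illustration, and a real pay-per-click engine may well apply further rules.

```python
# Hypothetical bids (site, dollars per click) on a single keyphrase.
bids = [
    ("alpha-design.example", 0.45),
    ("beta-web.example", 1.10),
    ("gamma-sites.example", 0.80),
]


def rank_by_bid(entries):
    """Order listings so that the highest bidder appears first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


listing_order = rank_by_bid(bids)
for position, (site, bid) in enumerate(listing_order, start=1):
    print(position, site, bid)
```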
What to do?
Search Engine Optimization Process
Now that we all have a basic idea of what search engines are and how they work, we can start talking about what is involved in optimizing for them. There is no easily described set of rules; search engine optimization is as much an art as a science. It begins with knowing two things: what the client is selling and what people are buying. And not in general terms, but in specific language.
Every Website has a mission. If the Website were a college paper, that would be the theme. The mission holds the Website together. It is what the Client wants the Website to do. We want our Website to generate business for Stratecomm. To do this, it should attract potential Clients, inform them of our services, and prompt them to contact us. The visitors we want to attract are those who are interested in our services: Web design, maintenance, hosting, etc. We don’t want people looking for Free Web Hosting or Adult Services.
Finding the Keyphrases
So what are those potential Clients searching for? Many Websites assume that strong traditional marketing language is the answer, as if people, for some obscure reason, were entering search phrases such as “top-quality, integrated paradigm enhancing internet solutions outside the box”. While this stuff may look great on a billboard, in fact nobody is searching for it. To find out what phrases people are actually seeking, we use Wordtracker.
Wordtracker is a tool that takes a phrase you think might be close, generates a bunch of similar phrases, then tests them on various search engines. It also looks at other Websites to see how many have included those search phrases in their own content. Wordtracker lets you know two important things: how much traffic is looking for any particular phrase, and how many sites are competing for that traffic. For instance, Web Site Design brings up any number of other possibilities, such as Web Design and Ecommerce. Looking further into Web Site Design, we find a traffic number of 1986. What this number actually means is a little vague, but they predict that at least 1971 people per day will see your listing if it is on the first page for this search phrase. That’s a lot. Right underneath it, we see that Free Web Site Design would attract 151, also a lot, but probably not people we want. Doing a competition search on these phrases tells us that 1.7 million sites on AOL, and 1.04 million on Google, are trying to reach this same audience. Interestingly, Bad Web Site Design draws in a traffic number of 66 and only 135 competing sites are found. If we thought Bad Web Site Design would bring in quality leads, we might optimize some part of our site for that.
Ideally, our Wordtracker research will produce a list of some 100 phrases, and we will note those that have high traffic as well as low competition. Some may have both. These are the key search phrases we will use to optimize our site. Some people recommend ignoring the high traffic phrases unless there is little competition. Our techniques have shown high placement even for some high traffic phrases.
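One rough way to compare phrases on both measures at once is to divide traffic by competition. The sketch below uses only the two phrases for which the text above quotes both numbers. The ratio itself is our own illustrative assumption, not a figure Wordtracker reports, and the function name is hypothetical.

```python
# Figures quoted in the text above: (traffic number, competing sites).
PHRASES = {
    "web site design": (1986, 1_040_000),  # competition count is the Google figure
    "bad web site design": (66, 135),
}


def traffic_per_competitor(traffic, competitors):
    """Illustrative score: higher means more traffic for each competing site."""
    return traffic / competitors


# Sort phrases from most to least promising under this assumed score.
ranked = sorted(
    PHRASES,
    key=lambda phrase: traffic_per_competitor(*PHRASES[phrase]),
    reverse=True,
)
for phrase in ranked:
    traffic, competitors = PHRASES[phrase]
    print(f"{phrase}: {traffic_per_competitor(traffic, competitors):.4f}")
```

A score like this only flags candidates. Whether a phrase would bring in the right visitors is still a judgment call, as the Bad Web Site Design example shows.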
Bad Search Engine Techniques
A lot of methods have been tried in order to increase the ranking of Websites. Many of these methods are now considered shady, and some will get a site banned from a search engine. Many will still work, at least for a while. Some were considered perfectly legitimate for a while. Collectively, these methods are called Search Engine Spamming. These techniques include:
* Cloaked pages – creating pages that, through use of Javascript, are never seen by the site visitor.
* Invisible text – using text and background color set to the same color. Allows keyword laden text to be added to a page without the visitor ever seeing it.
* Keyword repetition – repeating the same keyword over and over, either in metatags, invisible text, or in actual text.
* Alt-tag keyword stuffing – using alt tags to cram in additional instances of keywords.
* Frequent resubmission – submitting the same pages over and over again, usually using machine submissions.
* Doorway Pages – Pages that have nearly identical content, are often submitted to only one search engine each, and are not linked to from the body of the Web site. The line between legitimate and illegitimate is blurry in this area.
* Link Farms – Giant stacks of cross-links, intended only to build up the link popularity of sites.
A general guideline is that any text should be, in some sense, useful to human visitors of a Website. Additional content pages that are written to attract keyphrases are probably okay if they present content in a unique way, and are integrated into the overall structure of the site. Stuffing random keywords into Alt tags may be bad, but keyphrases that also describe the Alt image are useful to site-readers and are thus legitimate.
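Keyword repetition, one of the techniques listed above, is easy to measure on your own pages before a search engine does it for you. The sketch below computes what share of a page's words is a single keyword. The sample sentences are invented, and where to draw the line between natural and stuffed text is an assumption left to the reader; the search engines do not publish a cut-off.

```python
import re


def keyword_share(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Invented samples: one stuffed, one written for human readers.
stuffed = "design design design web design cheap design design"
natural = ("We build and maintain web sites for small businesses, "
           "from first design sketch to hosting.")

print(keyword_share(stuffed, "design"))
print(keyword_share(natural, "design"))
```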
Good Search Engine Optimization Techniques
Content is King
People search using words. Naturally, words provide the best answer, and words are what search engines seek out. Having high-quality substantial text is ultimately the best form of SEO. While there are recommendations for each search engine, a general rule is to have at least 250 words per page. The targeted keyphrases should appear within the first couple sentences. However, if the text does not read naturally, revise it until it does. A page that attracts spiders but reads badly to actual humans is a loser. Individual pages should be created for each major search phrase, with content that supports the phrase. Content is text, text is generally html. Spiders can index asp pages, etc., sometimes. If we are using that kind of text, we need to take special measures. Frames also require special measures to be readable. Content should be broken into individual pages by topic. Search engines will rank a page according to what appears in the first text it finds. If your phrases are buried in a subtopic that you have to scroll and scroll to reach, break them up onto separate pages.
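Two of the rules of thumb above (at least 250 words per page, and the keyphrase within the first couple of sentences) can be checked mechanically. The following is a minimal sketch: the function name and the crude sentence splitting are our own assumptions, and passing the check says nothing about whether the text reads naturally.

```python
import re


def check_page_text(text, keyphrase, min_words=250, lead_sentences=2):
    """Report word count and whether the keyphrase appears in the opening sentences."""
    words = re.findall(r"\S+", text)
    # Crude sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lead = " ".join(sentences[:lead_sentences]).lower()
    return {
        "word_count": len(words),
        "long_enough": len(words) >= min_words,
        "keyphrase_in_lead": keyphrase.lower() in lead,
    }


# Invented page text: the keyphrase is buried in the third sentence.
short_page = "Welcome. Hello there. We are a web site design company."
print(check_page_text(short_page, "web site design company"))
```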
But The Tags Still Matter
What does a Spider look at? The spider wants to know what a particular Web page is about. Naturally it will look certain places to see this. It will look at the page title. If your client puts just the company name in the page title, the page is about the company name. We have titled most of our Web pages to include: Web Site Design Company. The pages are about Web Sites, Web Design, and a Web Site Design Company. If someone is looking for these things, a spider will consider us much more strongly than if we just had our company name.
Another place to put information is in the Meta Tags. The most important Meta Tag is the Description tag. Some engines read this, some don’t. Some display the description tag in the search results. So the description tag should be readable to people as well as spiders. The Keyword tag is often skipped by spiders now, as it was too subject to keyword spamming. But it is a good place to add geographic references, misspelled words, etc.
The Header (H1) tag is also important. Words appearing in the header tag are considered to be very descriptive of what a Web site is about. The appearance of the header can still be controlled with style sheets, so all is not lost.
We also use Alt tags for two reasons. One is that visually impaired people use text readers to surf the Web. Alt tags let them know what images are found on a Web site. Search Engines also look at Alt tags to see what is on a page. This is an area that has been subject to abuse and recently search engines have gotten more sophisticated about catching spammers.
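To see roughly what a spider sees in these places, you can pull the title, description, header, and Alt text out of a page yourself. The sketch below uses Python's standard html.parser module. The TagAudit class and the sample page are hypothetical, and which of these fields a given search engine actually weighs, and how heavily, is not something the code can tell you.

```python
from html.parser import HTMLParser


class TagAudit(HTMLParser):
    """Collect the title, meta description, H1 text, and image alt text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1 = []
        self.alts = []
        self._inside = None  # name of the text-bearing tag currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._inside = tag
            if tag == "h1":
                self.h1.append("")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.alts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        if self._inside == "title":
            self.title += data
        elif self._inside == "h1":
            self.h1[-1] += data


# Invented sample page for illustration.
PAGE = """<html><head>
<title>Web Site Design Company - Stratecomm</title>
<meta name="description" content="Web site design, maintenance, and hosting.">
</head><body>
<h1>Web Site Design</h1>
<img src="logo.gif" alt="Stratecomm logo">
<img src="spacer.gif">
</body></html>"""

audit = TagAudit()
audit.feed(PAGE)
print(audit.title, audit.description, audit.h1, audit.alts, sep="\n")
```

An empty title, a missing description, or images without Alt text show up immediately in a report like this.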
Link Popularity
Google thinks that the best sites are those that are most popular. They may have a point. If your site has content relevant to ice cream nutrition, then some ice cream sites and some nutrition sites have probably linked to you. If someone is looking for information on ice cream nutrition, and you have a lot of sites linking to you, then Google will offer you as an answer to that person over sites without links. Some links are more important than others, however. The following factors play into link popularity:
* Total number of links – we currently have 750 links to our site, according to Google.
* The importance of the linking pages – a site that has a high Google ranking gives recommendations that are taken more seriously.
* The relevance of linking pages – in our example, links from ice cream and nutrition sites would be counted for more than links from, say, bicycle repair sites.
* The relevance of the keywords in the link itself – this is an area that has been used by Bloggers to trick Google recently, but is still pretty effective. A link to our site that says “Stratecomm” helps us overall, but a link to us with the words “Web Site Design Company” does far more when someone is searching for web design.
Link Popularity may be the most important factor in Google, and Google is probably the most important search engine. So even though most engines don’t use it that much, it is still a priority.
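The four factors above can be combined into a toy score to show how they interact. This is emphatically not Google's formula, which is not published in this form. The Link class, the multipliers, and the sample links are all assumptions chosen to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Link:
    source: str           # hypothetical linking site
    importance: float     # 0..1 stand-in for how highly the linking page ranks
    relevant_topic: bool  # does the linking page share our topic?
    anchor_text: str      # the words inside the link itself


def link_score(links, keyphrase, relevance_bonus=2.0, anchor_bonus=3.0):
    """Sum per-link weights; every multiplier here is an illustrative assumption."""
    total = 0.0
    for link in links:
        weight = link.importance                 # importance of the linking page
        if link.relevant_topic:
            weight *= relevance_bonus            # relevance of the linking page
        if keyphrase.lower() in link.anchor_text.lower():
            weight *= anchor_bonus               # keywords in the link text
        total += weight                          # more links, higher total
    return total


links = [
    Link("icecream.example", 0.5, True, "Ice Cream Nutrition"),
    Link("bikes.example", 0.5, False, "click here"),
]
score = link_score(links, "ice cream nutrition")
print(score)
```

In this toy model the on-topic link with descriptive link text counts six times as much as the off-topic "click here" link of equal importance, which is the qualitative point the list makes.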
Then What?
Submission
Once a site has been optimized it doesn’t automatically show up on all the search engines. Once upon a time the spiders would show up and index it within a few weeks. But now the Web is so huge that submission is mandatory. There are several types of submission. Some are free, some are paid. Some accept payment for faster review. The Stratecomm site has a current fee submission table, but these prices change frequently. Let’s look at a few of the important services:
* Google is free. You can submit for free to Google, and you can also submit to the Open Directory Project, from which Google pulls its directory section. Google will review a site in 4 to 8 weeks or so. The Open Directory Project, depending on category, will take months.
* Yahoo costs money. If you are a business, you have to submit using the Express Submission, which guarantees review within 7 days (supposedly). They do NOT guarantee that you will be listed (which makes you wonder how you could enforce the first guarantee). Non-profits and non-commercial sites can still use the basic submit, though that category seems to shrink daily.
* Looksmart, currently, is another large directory that serves up listings to MSN, Altavista, Iwon, CNN, and others. Their basic submission is $149, the Express, $299. The back door to Looksmart is Zeal.com, a directory open to non-commercial sites. To submit to Zeal, you have to become a Zealite, pass a test, etc. Rumor has it that Looksmart will convert to Pay-per-click in the near future.
Rank Checking
Once a site has been optimized and running for a couple months, we do a ranking report, using WebPosition Gold. This program will take a set of keyphrases, submit them to a chosen group of search engines, and show you how they rank. The results are fairly accurate, though they may fluctuate considerably at times. One note: currently Google seems to be having a tiff with WPG, and considers automated search queries a violation of its terms of service. Using WPG on Google involves some risk of getting your search privileges revoked, or being “Banned From Google”. However, the results for Yahoo Web Sites are almost identical, so we can use those in place of Google results.
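The core of a ranking report is just finding where a site falls in an ordered list of results. The sketch below shows that step only, working on a list of result URLs you have already collected by hand. It sends no queries to any search engine, given the terms-of-service concern noted above. The function name and the example URLs are hypothetical, and this is not how WebPosition Gold itself is implemented.

```python
from urllib.parse import urlparse


def find_rank(result_urls, domain):
    """Return the 1-based position of the first result on `domain`, or None if absent."""
    for position, url in enumerate(result_urls, start=1):
        host = urlparse(url).netloc.lower()
        # Match the bare domain or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Hypothetical results for one keyphrase, in the order the engine listed them.
results = [
    "http://www.first-design.example/",
    "http://second-sites.example/services.html",
    "http://www.our-company.example/web-design.html",
]

print(find_rank(results, "our-company.example"))
print(find_rank(results, "missing.example"))
```

Running the same check on each keyphrase every month or two gives a simple record of how rankings fluctuate.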