OREGON STATE UNIVERSITY

How a Search Engine Works

 

The World Wide Web contains billions of Web pages.  How in the world is it that we can type a few words into a search box and get back a list of results that is even close to what we want?

The answer is a little creepy.  Spiders do the work.

Search engine spiders are special programs that build and maintain lists of the words found on Web sites.  The process of visiting pages to build these lists is known as crawling the Web.  To provide the seemingly endless answers to the questions we type into our search engine of choice, these little robots, or bots, have to look at a great many pages.

Spiders work by starting from lists of heavily used servers and Web pages and following the hyperlinks on those pages.  Different search engines, such as Google, Yahoo, and Bing, use different spiders, but all spiders look at and record certain elements of every Web page.  The title of a page, the headings and subheadings in its body content, metadata such as keywords, and the alternate (alt) text on images are all good data for a spider to consume.
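The crawl described above can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not how any particular search engine's spider is built: the tiny in-memory `FAKE_WEB` dictionary is a hypothetical stand-in for real HTTP fetches, and the names `PageParser` and `crawl` are made up for this example. Starting from a list of seed URLs, it follows the hyperlinks it finds and records each page's title, headings, meta keywords, and image alt text using Python's standard `html.parser` module.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "Web" (URL -> HTML) standing in for real HTTP fetches.
FAKE_WEB = {
    "https://a.example/": """<html><head><title>Home</title>
        <meta name="keywords" content="spiders, crawling"></head>
        <body><h1>Welcome</h1><a href="https://b.example/">B</a>
        <img src="x.png" alt="a spider"></body></html>""",
    "https://b.example/": """<html><head><title>Page B</title></head>
        <body><h2>About</h2><a href="https://a.example/">home</a>
        <a href="https://c.example/">C</a></body></html>""",
    "https://c.example/": "<html><head><title>Page C</title></head><body>No links</body></html>",
}

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageParser(HTMLParser):
    """Collects the title, headings, meta keywords, image alt text, and links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.headings = []
        self.keywords = []
        self.alt_text = []
        self.links = []
        self._current = None  # title/heading tag whose text is being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in HEADING_TAGS:
            self._current = tag
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "img" and attrs.get("alt"):
            self.alt_text.append(attrs["alt"])
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return
        if self._current == "title":
            self.title += text
        else:
            self.headings.append(text)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: visit seeds first, then the pages they link to."""
    queue = deque(seeds)
    seen = set(seeds)
    records = {}
    while queue and len(records) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # returns HTML text, or None if unavailable
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        records[url] = {
            "title": parser.title,
            "headings": parser.headings,
            "keywords": parser.keywords,
            "alt_text": parser.alt_text,
        }
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return records


pages = crawl(["https://a.example/"], FAKE_WEB.get)
for url, info in pages.items():
    print(url, "->", info["title"], info["headings"])
```

Passing a function that actually downloads a URL in place of `FAKE_WEB.get` would turn this into a very naive real crawler; a production spider would also need politeness rules, error handling, and duplicate detection, all omitted here.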

Spiders from the various search engines differ quite a bit in what extra information they gather beyond the standard metadata, and in how they compile it.  These methods are generally proprietary.  In all cases, though, the information is gathered, weighted according to each search engine's own criteria, and then encoded so that a large amount of data fits into a small amount of space.
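Because real weighting and encoding schemes are proprietary, the sketch below substitutes two simple, well-known stand-ins. The hypothetical `weigh_terms` function scores each word by where it appears on a page, using made-up field weights (`FIELD_WEIGHTS` is an assumption, not any engine's actual values). For the encoding step it uses gap encoding followed by variable-byte compression, a classic textbook technique for shrinking sorted lists of document IDs: storing the small differences between IDs lets most numbers fit in a single byte.

```python
import re
from collections import Counter
from itertools import accumulate

# Hypothetical field weights; real engines' weighting is proprietary.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "keywords": 1.0, "alt_text": 1.0}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh_terms(record):
    """Add a field's weight to a term each time the term occurs in that field."""
    scores = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        value = record.get(field, [])
        texts = [value] if isinstance(value, str) else value
        for text in texts:
            for term in tokenize(text):
                scores[term] += weight
    return scores


def to_gaps(sorted_ids):
    """Replace sorted IDs with the differences between neighbouring IDs."""
    return [cur - prev for prev, cur in zip([0] + sorted_ids[:-1], sorted_ids)]


def vbyte_encode(numbers):
    """Variable-byte encoding: 7 data bits per byte, high bit set on a number's last byte."""
    out = bytearray()
    for n in numbers:
        chunk = [n % 128]
        n //= 128
        while n > 0:
            chunk.insert(0, n % 128)
            n //= 128
        chunk[-1] += 128  # mark the final byte of this number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = n * 128 + byte
        else:
            numbers.append(n * 128 + (byte - 128))
            n = 0
    return numbers


record = {"title": "How Spiders Work", "headings": ["Spiders crawl the Web"]}
print(weigh_terms(record)["spiders"])  # 5.0 (title 3.0 + heading 2.0)

doc_ids = [3, 7, 150, 1000]
encoded = vbyte_encode(to_gaps(doc_ids))  # gaps [3, 4, 143, 850]
print(len(encoded))  # 6 bytes, versus 16 for four 32-bit integers
print(list(accumulate(vbyte_decode(encoded))))  # [3, 7, 150, 1000]
```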

After compression, the information is indexed.  The purpose of indexing is to make data retrieval fast and efficient: instead of scanning every stored page for a word, the search engine can look the word up directly.
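A common structure for this kind of lookup is the inverted index, which maps each word to the list of pages that contain it. The minimal sketch below assumes a toy collection of three documents with hypothetical integer IDs; it illustrates the idea and is not any search engine's actual index format.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lookup(index, term):
    """One dictionary access replaces a scan over every document."""
    return index.get(term.lower(), [])


docs = {
    1: "Spiders crawl the Web",
    2: "Search engines index the Web",
    3: "Spiders build lists of words",
}
index = build_index(docs)
print(lookup(index, "spiders"))  # [1, 3]
print(lookup(index, "Web"))      # [1, 2]
```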

After that, it's up to us, the users, to build a query in a search engine.  A query can be as simple as a single word, or it can contain a phrase.  Most search engines also offer advanced search features, such as the Boolean operators "AND", "OR", and "NOT", which let the user filter results.
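Boolean operators map naturally onto set operations over an inverted index: AND is an intersection, OR is a union, and NOT (read here as "and not") is a difference. The hypothetical `boolean_search` function below is a minimal sketch that evaluates queries such as `spiders AND web` strictly left to right; ignoring operator precedence and parentheses is a simplifying assumption, not how real query parsers necessarily behave.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_search(query, index):
    """Evaluate 'term OP term OP ...' left to right, where OP is AND, OR, or NOT."""
    tokens = query.split()
    if len(tokens) % 2 == 0:  # empty query, or an operator without a term
        raise ValueError(f"malformed query: {query!r}")
    result = set(index.get(tokens[0].lower(), set()))  # copy, so the index is untouched
    for i in range(1, len(tokens), 2):
        op, postings = tokens[i].upper(), index.get(tokens[i + 1].lower(), set())
        if op == "AND":
            result &= postings  # pages containing both
        elif op == "OR":
            result |= postings  # pages containing either
        elif op == "NOT":
            result -= postings  # drop pages containing the term
        else:
            raise ValueError(f"unknown operator: {tokens[i]!r}")
    return sorted(result)


docs = {
    1: "Spiders crawl the Web",
    2: "Search engines index the Web",
    3: "Spiders build lists of words",
}
index = build_index(docs)
print(boolean_search("spiders AND web", index))    # [1]
print(boolean_search("spiders OR search", index))  # [1, 2, 3]
print(boolean_search("web NOT crawl", index))      # [2]
```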

In the following sections we'll take a look at some points you will want to consider while creating and publishing your content to the Web.  With a few simple adjustments, you can increase the exposure of your publication.