Search Engine · Shivaji Varma

A web search engine is a software system that is designed to search for information on the World Wide Web.

This is my academic project.

A search engine generally consists of three parts, namely:

Crawler
Indexer
Query Parser

Architecture

Crawler

Information technology is advancing rapidly, and the World Wide Web is one of its most significant achievements. The web can be treated as a huge repository of information, and sophisticated methods are needed to extract what a user is looking for. Search engines such as Google, Yahoo and MSN are essential tools for retrieving that information.

Most such search engines perform crawler-based search. They generally consist of a web crawler (a program that crawls the web), an indexing technique, an encoding mechanism and a large database. The crawler, also called a spider, collects information from the web; the collected data is then indexed, encoded and stored. The following diagram represents the anatomy of a search engine.

Steps in a crawler-based search engine:

Web crawling: Search engines use a special program called a robot or spider, which crawls (travels) the web from one page to another. It typically starts from popular sites and then follows each link found on those pages.
Information collection: The spider records every word on a visited web page together with its position. Some search engines ignore common words such as articles (‘a’, ‘an’, ‘the’) and prepositions (‘of’, ‘on’).
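
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual crawler: the PAGES dictionary is a hypothetical in-memory stand-in for the web (a real spider would fetch pages over HTTP), and the names PageParser, word_positions and crawl are made up for this example. The stop-word list contains only the words named above.

```python
import re
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch over HTTP.
PAGES = {
    "a.example/": '<p>The quick spider</p><a href="b.example/">next</a>',
    "b.example/": '<p>A spider on the web</p><a href="a.example/">back</a>',
}

# Only the common words named in the text; real engines use longer lists.
STOP_WORDS = {"a", "an", "the", "of", "on"}


class PageParser(HTMLParser):
    """Collects outgoing links and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def word_positions(text):
    """Map each non-stop word to the list of positions where it occurs."""
    positions = {}
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        if word not in STOP_WORDS:
            positions.setdefault(word, []).append(pos)
    return positions


def crawl(seed):
    """Breadth-first crawl from the seed URL; returns URL -> word positions."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:  # unknown page: nothing to record
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        collected[url] = word_positions(" ".join(parser.text))
        for link in parser.links:  # follow each link found on the page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected
```

Calling crawl("a.example/") visits both pages. Positions are counted before stop words are dropped, so the recorded position of a word still reflects where it sits on the page.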

Indexer

After collecting the data, a search engine builds an index to store it so that users can find pages quickly. Different search engines use different approaches to indexing, which is why they give different results for the same query. Important considerations when building an index include how frequently a term appears on a web page, the part of the page where it appears, and how it is formatted (its font size and whether it is capitalized). Google, for example, ranks a page higher when more pages vote for it by linking to it.
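
As a rough sketch of these ideas, the Python below builds an inverted index that stores term frequency per page and counts incoming links as votes. The CRAWLED data and the function names are hypothetical, and the plain in-link count is a deliberate simplification: it is not Google's actual ranking algorithm, only the "more votes ranks higher" idea in its simplest form.

```python
from collections import Counter, defaultdict

# Hypothetical crawl output: URL -> (words on the page, outgoing links).
CRAWLED = {
    "a.example/": (["search", "engine", "search"], ["b.example/"]),
    "b.example/": (["search", "index"], ["a.example/", "c.example/"]),
    "c.example/": (["index"], ["a.example/"]),
}


def build_index(crawled):
    """Return term -> {url: frequency of the term on that page}."""
    index = defaultdict(dict)
    for url, (words, _links) in crawled.items():
        for term, freq in Counter(words).items():
            index[term][url] = freq
    return index


def count_votes(crawled):
    """Count incoming links per URL; each linking page votes at most once."""
    votes = Counter()
    for url, (_words, links) in crawled.items():
        for target in set(links):
            if target != url:  # a page cannot vote for itself
                votes[target] += 1
    return votes


def rank(term, index, votes):
    """Order pages containing the term by (votes, term frequency), best first."""
    postings = index.get(term, {})
    return sorted(postings, key=lambda u: (votes[u], postings[u]), reverse=True)
```

With this data, rank("search", build_index(CRAWLED), count_votes(CRAWLED)) places a.example/ ahead of b.example/, because it has more incoming links and a higher term frequency. The position of a term on the page and its formatting, also mentioned above, are left out to keep the sketch short.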

Query Parser

The query parser performs the search for a given query, using semantics to interpret it. A web search query is what a user enters into a web search engine to satisfy an information need. Web search queries are distinctive in that they are unstructured and often ambiguous; they differ greatly from standard query languages, which are governed by strict syntax rules.
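
To make the contrast with strict query languages concrete, here is a minimal Python sketch of turning free-form input into search terms and looking them up. It assumes plain keyword matching with AND semantics and does not attempt the semantic interpretation mentioned above; INDEX, parse_query and search are hypothetical names used only for illustration.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "on"}

# Hypothetical index: term -> set of URLs containing that term.
INDEX = {
    "search": {"a.example/", "b.example/"},
    "engine": {"a.example/"},
    "index": {"b.example/", "c.example/"},
}


def parse_query(raw):
    """Normalize a free-text query into a list of lowercase search terms."""
    terms = re.findall(r"[a-z0-9]+", raw.lower())
    return [t for t in terms if t not in STOP_WORDS]


def search(raw, index):
    """Return the URLs that contain every term of the query (AND semantics)."""
    terms = parse_query(raw)
    if not terms:  # query was empty or contained only stop words
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

For example, search("The Search ENGINE!", INDEX) ignores case, punctuation and the stop word, and returns only a.example/, the one page that contains both remaining terms.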
