Most search engines deploy robot programs, called spiders or crawlers. Depending on the search tool, a robot may begin by visiting pages that contain numerous hyperlinks, or sites that have been identified as especially popular. After identifying a new site, the robot records its URL so that it can revisit and update the site; the new site is then indexed. It is at this point that the real problem arises: how do you identify the contents of a website? HTML, the key element of web technology, currently provides no widely used means of categorizing documents. A robot can extract relevant information from:

* Title tag
* Meta tag
* Text of the site

Title tag - contains the main theme of the site.

<title>MY HOME PAGE</title>
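As a sketch of how a robot might pull the title out of a fetched page, the standard-library `html.parser` module can collect the text inside the `<title>` element (the class name here is our own, not part of any particular crawler):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = TitleExtractor()
parser.feed("<html><head><title>MY HOME PAGE</title></head></html>")
print(parser.title)  # MY HOME PAGE
```

A real robot would feed the parser the HTML it downloaded rather than a hard-coded string, but the extraction step is the same.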

Meta tag - Meta tags are individual tags embedded in the HTML document that provide information about a particular document's characteristics but are not visible on the page. A meta tag has two attributes used to specify information about the document:

Name - identifies the type of information

Content - holds the meta information that you want to include

<meta name="Author" content="Prasanna">

<meta name="Classification" content="Computers-internet, searching">
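In the same spirit, a robot could collect these name/content pairs into a dictionary. This is a minimal sketch using the standard-library `html.parser`, fed with the two example tags above:

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Records the name/content pairs found in <meta> tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                # Lowercase the name so "Author" and "author" index the same entry.
                self.meta[d["name"].lower()] = d["content"]

    def handle_startendtag(self, tag, attrs):
        # Also accept self-closing forms such as <meta ... />.
        self.handle_starttag(tag, attrs)

html = """<head>
<meta name="Author" content="Prasanna">
<meta name="Classification" content="Computers-internet, searching">
</head>"""

p = MetaExtractor()
p.feed(html)
print(p.meta["author"])          # Prasanna
print(p.meta["classification"])  # Computers-internet, searching
```

The resulting dictionary is the kind of record an indexer could store alongside the URL.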

There are two strategies a robot can use to explore: breadth-first and depth-first. A breadth-first algorithm leads the robot to new servers first, in order to find representative documents from as many servers as possible, while a depth-first algorithm leads the robot into each site in order to explore deep within each server's menu structure.

Performing a deep search

Knowing how to use the web intelligently means knowing how to locate useful information on the web.
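The breadth-first versus depth-first distinction above comes down to how the robot orders its frontier of unvisited URLs. A minimal sketch, with a hypothetical link graph standing in for real pages:

```python
from collections import deque

# Hypothetical link graph: each page maps to the URLs it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["e.example"],
    "d.example": [],
    "e.example": [],
}

def crawl(start, breadth_first=True):
    """Visit pages reachable from `start`.

    A deque serves as the frontier: popping from the left gives
    breadth-first order (a queue), popping from the right gives
    depth-first order (a stack).
    """
    frontier = deque([start])
    visited = []
    while frontier:
        url = frontier.popleft() if breadth_first else frontier.pop()
        if url in visited:
            continue
        visited.append(url)
        frontier.extend(LINKS.get(url, []))
    return visited

print(crawl("a.example", breadth_first=True))
# ['a.example', 'b.example', 'c.example', 'd.example', 'e.example']
print(crawl("a.example", breadth_first=False))
# ['a.example', 'c.example', 'e.example', 'b.example', 'd.example']
```

Breadth-first reaches every server's front page early; depth-first drills to the bottom of one branch before backing out, matching the two behaviors described above.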

