Drawbacks of Search Engines

Poor precision - The list of retrieved documents contains a high percentage of irrelevant documents.

Poor recall - Most web search engines consult databases of the words used most frequently in documents, such as words drawn from a document's title and first few sentences. As a result, they will not retrieve documents in which the keywords you are searching for are buried somewhere in the body text. Many page authors also submit numerous web pages to search engines using tricks such as irrelevant title tags, or repeating certain words in the first few lines that have nothing to do with the actual contents of the page, in order to boost their rankings. This may seem a minor concern, but when many authors do it the problem becomes serious: it can lead to a situation where not even one of the top ten sites listed is on the subject you would expect.
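Precision and recall, the measures behind the first two drawbacks, can be made concrete with a few lines of Python. This is a minimal sketch assuming a single query and a known set of relevant pages (the page names are hypothetical): precision is the share of retrieved pages that are relevant, and recall is the share of relevant pages that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the engine returns 10 pages, only 2 of them relevant,
# while 8 relevant pages exist in the whole collection.
retrieved = [f"page{i}" for i in range(10)]
relevant = ["page1", "page4"] + [f"other{i}" for i in range(6)]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.20 recall=0.25
```

Here the engine scores badly on both counts: most of what it returns is irrelevant, and most of what is relevant was never returned.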

Varied document quality - Spiders can't discriminate between valuable documents and spam.
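One naive way a spider might try to spot the word-repetition trick described under poor recall is to measure keyword density. The sketch below is a hypothetical heuristic, not the method of any particular search engine; the stopword list and the 0.2 threshold are arbitrary assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; common function words are
# ignored so they do not dominate the density figures.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "so"}


def keyword_density(text):
    """Return each non-stopword's share of all non-stopword words in text."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    if not words:
        return {}
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}


def looks_stuffed(text, threshold=0.2):
    """Flag a page if any single word exceeds the (assumed) density threshold."""
    return any(d > threshold for d in keyword_density(text).values())


normal = "XML lets authors define their own tags, so a search engine can tell a price from a title."
stuffed = "cheap flights cheap flights cheap flights book cheap flights now"
print(looks_stuffed(normal))   # False
print(looks_stuffed(stuffed))  # True
```

A check this crude would also flag legitimate pages that focus narrowly on one topic, which is part of why telling valuable documents from spam is hard for a spider.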

Varied indexing depth - Some spiders retrieve only a document's title, while others retrieve its entire text. Unless you understand how a particular spider works, you are not very likely to succeed.
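The effect of indexing depth on recall can be illustrated with a toy inverted index. In this sketch, `build_index` and the two sample documents are hypothetical. The shallow mode indexes only the title and the first sentence of the body, as described under poor recall, while the full mode indexes the entire text.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs, full_text):
    """Map each term to the set of doc ids containing it.

    If full_text is False, only the title and the first sentence of the
    body are indexed (a stand-in for a 'shallow' spider).
    """
    index = {}
    for doc_id, (title, body) in docs.items():
        indexed = body if full_text else body.split(".")[0]
        for term in tokenize(title + " " + indexed):
            index.setdefault(term, set()).add(doc_id)
    return index


docs = {
    "d1": ("Search engine basics", "Spiders crawl pages. Ranking uses term frequency."),
    "d2": ("Markup languages", "HTML describes layout. XML lets authors define tags."),
}
shallow = build_index(docs, full_text=False)
deep = build_index(docs, full_text=True)
print(shallow.get("xml", set()))  # set()
print(deep.get("xml", set()))     # {'d2'}
```

A query for "xml" finds nothing in the shallow index, even though d2 is clearly relevant, because the word only appears in its second sentence.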

XML and Search Engine

Since HTML doesn't provide any standard method of identifying the contents of a document, it is extremely difficult for a search engine to work out what a web page is about in order to index it. This led to the need for a more powerful and flexible way of presenting data on the web. XML (Extensible Markup Language) is a simplified dialect of SGML (Standard Generalized Markup Language), the mother of all document-definition languages. Although XML is not as powerful as SGML, it is much easier to use. Developing web pages in XML is much like developing them in HTML, but XML gives authors the ability to invent their own tags; the tag names and their meanings are left to the author to define according to the subject matter. The most important thing about XML is that it allows more detail to be included in a document, so searching for specific topics should become more accurate, avoiding many mismatches.
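To illustrate how author-defined tags can reduce mismatches, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The `<recipe>`, `<name>`, `<ingredient>` and `<note>` tags, and the two functions, are invented for this example. A plain keyword search for "apple" matches both recipes, because one of them merely mentions apple sauce in a note, whereas a search restricted to the `<ingredient>` element returns only the relevant one.

```python
import xml.etree.ElementTree as ET

# Hypothetical author-defined tags: <recipe>, <name>, <ingredient>, <note>.
DOC = """<recipes>
  <recipe>
    <name>Apple Pie</name>
    <ingredient>apple</ingredient>
    <ingredient>flour</ingredient>
  </recipe>
  <recipe>
    <name>Pear Tart</name>
    <ingredient>pear</ingredient>
    <ingredient>flour</ingredient>
    <note>Serve with apple sauce.</note>
  </recipe>
</recipes>"""

root = ET.fromstring(DOC)


def keyword_search(root, word):
    """Match the word anywhere in a recipe's text, ignoring the tags."""
    hits = []
    for recipe in root.iter("recipe"):
        text = " ".join(recipe.itertext()).lower()
        if word in text:
            hits.append(recipe.findtext("name"))
    return hits


def tag_search(root, tag, value):
    """Match only when the value is the text of the named element."""
    return [
        recipe.findtext("name")
        for recipe in root.iter("recipe")
        if any(el.text == value for el in recipe.iter(tag))
    ]


print(keyword_search(root, "apple"))            # ['Apple Pie', 'Pear Tart']
print(tag_search(root, "ingredient", "apple"))  # ['Apple Pie']
```

The tag-aware search only works because the markup says what each piece of text means; with presentation-only HTML tags there would be no `<ingredient>` element to restrict the search to.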


As the World Wide Web keeps expanding and the quality of the information available on it comes under growing threat, the use of XML in developing web pages, together with search engines that have advanced capabilities such as GCS, seems to hold the key to organizing and retrieving quality data on the web in the future.


