Tuesday, February 24, 2009

Improvements to web search engine methodologies from a user's point of view

Terms Used:

Importance Index: This is a number generated by evaluating the backlinks. Google PageRank is an example of an importance index; this page suggests an improvement to this practice.

Content Quality Index: This is a number generated by evaluating the contents of the web page. This evaluation should not consider the importance index.


Spam Index: This number is an evaluation of how much spam content a website carries. It is probably the most difficult of the three to formulate.


Google, and potentially other search engines, work by assigning an importance index to each web page. Combined with a text pattern search, this produces the result links. The importance index is calculated from the backlinks pointing to that website. The computation also considers the importance index of the websites pointing to this website and the number of links on those websites. This is similar to voting in which the voters carry unequal weight.
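As a rough illustration of the weighted voting idea described above, here is a minimal sketch that computes an importance index by repeated passes over the link graph. It is not Google's actual implementation. The function name `importance_index`, the damping value of 0.85, the fixed iteration count and the way pages without outgoing links are handled are all assumptions made for this example.

```python
def importance_index(links, damping=0.85, iterations=50):
    """Return a dict page -> importance index.

    links maps each page to the list of pages it links to.
    All parameter names and defaults are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = importance_index(graph)
# "c" has two backlinks and ends up with the highest index; "b" has none and ends lowest.
```

Note that every outgoing link on a page receives the same share of that page's vote in this sketch.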
This technique sounds great but has some potential problems. They relate to spam and to the importance assigned to individual links when computing the importance index.
The possible problems are listed below.
Visibility and size of backlinks pointing to a site: The visibility and size of a link are ignored. This runs slightly against the democratic idea, because more important pages tend to be linked with more visible links. Web designers use special graphic effects to display important links: such links generally use fancy or specially colored text, a large font size, or images. Search engines should also consider these properties when evaluating backlinks. The area and effort spent on designing a link should make it count for more when the importance index of the linked page is calculated.
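To make the visibility idea concrete, here is a small sketch in which a page divides its importance among its links in proportion to how prominent each link is, instead of evenly. The helper names `link_weight` and `split_importance`, the 16 px reference font size and the multipliers for images and styled text are hypothetical values chosen for illustration, not measured ones.

```python
def link_weight(font_px=16.0, is_image=False, styled=False, base_font_px=16.0):
    """Return a visibility weight for one link (1.0 means a plain default link).

    The multipliers below are illustrative assumptions.
    """
    weight = font_px / base_font_px      # bigger text counts for more
    if is_image:
        weight *= 1.5                    # assumed bonus for image links
    if styled:
        weight *= 1.2                    # assumed bonus for special color or effects
    return weight


def split_importance(page_importance, links):
    """Divide a page's importance among its links by visibility weight.

    links is a list of (target, weight) pairs. Returns a dict target -> share.
    """
    total = sum(weight for _, weight in links)
    if total <= 0:
        return {}
    shares = {}
    for target, weight in links:
        shares[target] = shares.get(target, 0.0) + page_importance * weight / total
    return shares


# Hypothetical page with a large headline link and a small footer link.
links = [("home", link_weight(font_px=32)), ("terms", link_weight(font_px=16))]
shares = split_importance(0.9, links)
# The headline link carries twice the weight of the footer link.
```

With even splitting both targets would receive the same share; here the more visible link passes on more of the page's importance.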


The quality of contents

The overall quality of the content should be evaluated by modeling statistics of good content from the web. This content quality index should be used in parallel with the importance index when ordering websites in the search results. The content quality index should be formulated so that it does not affect the absolute importance index.
Spam websites

Nowadays spam websites link to each other and have a large number of pages that contain no important content, only links and advertisements. A similar index can be used for the spam-ness of a website. This requires more evaluation.
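One very simple way to picture such a spam index is as the share of a page that consists of links and advertisements rather than content. The sketch below is only a starting point under that assumption. The names `spam_index` and `site_spam_index`, and the idea of counting each advertisement block as 20 words, are hypothetical and would need the further evaluation mentioned above.

```python
def spam_index(total_words, link_words, ad_blocks, ad_block_weight=20):
    """Return a value between 0 and 1; higher means more spam-like.

    total_words: all words on the page, including link text
    link_words:  words that are part of link text
    ad_blocks:   number of advertisement blocks on the page
    ad_block_weight: assumed word-equivalent of one advertisement block
    """
    ad_words = ad_blocks * ad_block_weight
    denominator = total_words + ad_words
    if denominator == 0:
        return 0.0
    return min(1.0, (link_words + ad_words) / denominator)


def site_spam_index(pages):
    """Average the page-level index over a site.

    pages is a list of (total_words, link_words, ad_blocks) tuples.
    """
    if not pages:
        return 0.0
    return sum(spam_index(*page) for page in pages) / len(pages)


# Hypothetical pages: one that is mostly links and ads, one that is mostly text.
link_farm_page = spam_index(total_words=100, link_words=90, ad_blocks=5)
article_page = spam_index(total_words=1000, link_words=50, ad_blocks=0)
```

This sketch looks only at what is on each page; it does not yet capture spam websites linking to each other, which would also require the link graph.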
Ordering of search results

In conclusion, the following should be taken into account when ordering search results:
1. Importance index.
2. Content quality index.
3. Spam index.
4. Last but not least, pattern matching against the user's input.
Older search engines used only the fourth point and ignored the first three. An ideal search engine on the Internet would evaluate all of these suggested techniques in an optimized manner. This makes it clear that the field of search engines is not yet closed and remains open for more research.
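The four factors above could be combined in many ways. The sketch below shows one possibility: a weighted sum of text match, importance index and content quality index, scaled down by the spam index. The function names, the weights (0.4, 0.3, 0.3), the crude word-overlap text match and the example URLs are all assumptions made for illustration; every index is assumed to be already scaled between 0 and 1.

```python
def text_match_score(query, text):
    """Fraction of distinct query words that appear in the page text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def combined_score(match, importance, quality, spam, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three positive factors, reduced by the spam index."""
    w_match, w_importance, w_quality = weights
    base = w_match * match + w_importance * importance + w_quality * quality
    return base * (1.0 - spam)


def order_results(query, pages):
    """Return the URLs of matching pages, best first.

    pages is a list of dicts with keys url, text, importance, quality, spam.
    """
    scored = []
    for page in pages:
        match = text_match_score(query, page["text"])
        if match == 0.0:
            continue  # pages that do not match the query at all are left out
        score = combined_score(match, page["importance"], page["quality"], page["spam"])
        scored.append((score, page["url"]))
    scored.sort(reverse=True)
    return [url for _, url in scored]


# Hypothetical pages; the index values are invented for the example.
pages = [
    {"url": "example.org/spammy", "text": "search engine search engine cheap ads",
     "importance": 0.9, "quality": 0.2, "spam": 0.9},
    {"url": "example.org/clean", "text": "search engine ranking basics",
     "importance": 0.5, "quality": 0.8, "spam": 0.05},
    {"url": "example.org/offtopic", "text": "cooking recipes",
     "importance": 0.7, "quality": 0.9, "spam": 0.0},
]
results = order_results("search engine", pages)
# The clean page is listed before the spammy one even though its importance index is lower.
```

Multiplying by (1 - spam) lets a high spam index push a page down no matter how many backlinks it has collected, while the content quality index is added alongside the importance index without changing it.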


PageRank and Google are trademarks of Google Inc., Mountain View CA, USA. PageRank is protected by US Patent 6,285,999.
