JOURNAL OF SOFTWARE (JSW)
ISSN: 1796-217X
Volume: 4    Issue: 5    Date: July 2009

Semantic Focused Crawling for Retrieving E-Commerce Information
Wei Huang, Liyi Zhang, Jidong Zhang, and Mingzhu Zhu
Page(s): 436-443

Abstract
Focused crawling selectively seeks out pages that are relevant to a predefined set of topics
without downloading every page on the Web. With the rapid growth of E-commerce, discovering
specific information about buyers, sellers, and products that suits the needs of online business
users has become a key issue for search engines. We present a novel semantic approach for
building an intelligent focused crawler that evaluates a page’s content relevance to the
E-commerce topic using a domain ontology, and its hyperlink connections to commercial web pages
using link analysis. During crawling, the domain ontology evolves automatically through machine
learning based on statistics and rules. Experiments show that our approach is more effective than
traditional crawling algorithms and prevents topic drift while achieving a higher harvest rate.
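
The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy ontology of weighted terms, a simple linear blend of parent-page relevance and anchor-text relevance as the link score, and an in-memory dictionary standing in for the Web. The names ONTOLOGY, content_relevance, link_score, crawl, alpha, and threshold are all hypothetical. The harvest rate is computed as the fraction of crawled pages judged relevant.

```python
import heapq

# Hypothetical ontology: concept term -> weight (illustrative values only).
ONTOLOGY = {"buyer": 1.0, "seller": 1.0, "product": 0.8, "price": 0.6}


def content_relevance(text, ontology=ONTOLOGY):
    """Average ontology weight per word of the text (0.0 for empty text)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(ontology.get(w, 0.0) for w in words) / len(words)


def link_score(parent_relevance, anchor_text, alpha=0.5):
    """Assumed blend of the parent's relevance and the anchor text's relevance."""
    return alpha * parent_relevance + (1 - alpha) * content_relevance(anchor_text)


def crawl(seed, web, threshold=0.1, limit=10):
    """Best-first crawl over an in-memory web: url -> (text, [(anchor, url)])."""
    frontier = [(-1.0, seed)]  # max-priority queue via negated scores
    seen = {seed}
    visited, relevant = [], []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        text, links = web.get(url, ("", []))
        rel = content_relevance(text)
        visited.append(url)
        if rel >= threshold:
            relevant.append(url)
        # Queue unseen out-links, most promising first.
        for anchor, target in links:
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-link_score(rel, anchor), target))
    harvest_rate = len(relevant) / len(visited) if visited else 0.0
    return visited, relevant, harvest_rate


if __name__ == "__main__":
    toy_web = {
        "a": ("buyer seller product", [("product price", "b"), ("weather news", "c")]),
        "b": ("product price list", []),
        "c": ("weather today sunny", []),
    }
    print(crawl("a", toy_web, limit=2))
```

With a download budget of two pages, the sketch fetches the on-topic link before the off-topic one, which is the behavior a focused crawler relies on to limit topic drift. The ontology here is static; the automatic ontology evolution described in the abstract is not modeled.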

Index Terms
Focused crawling, Information retrieval, E-commerce, Semantic, Machine learning