Nanchang, China May 22 - 24, 2009

WISA 2009

Second International Symposium on

Web Information Systems and Applications

Proceedings of the 2nd International Symposium on Web Information Systems and Applications (WISA 2009)

Nanchang, China, May 22-24, 2009

Editors: Fei Yu, Jiexian Zeng, and Guangxue Yue

AP Catalog Number: AP-PROC-CS-09CN001

ISBN: 978-952-5726-00-8 (Print), 978-952-5726-01-5 (CD-ROM)

Page(s): 140-143

A Hash-based Hierarchical Algorithm for Massive Text Clustering

Yin Luo, Yan Fu

Abstract

Text clustering is the process of partitioning a collection of texts into subgroups whose members have similar content. Its purpose is to help people search for and understand information. This study proposes HBSH (Hash-based Structure Hierarchical Clustering), a new fast hierarchical text clustering algorithm suited to massive text collections. The algorithm takes hash tables, rather than numerical vectors, as its input data. Compared with other clustering algorithms, HBSH does not require the number of cluster centers to be set in advance and has lower space complexity, which gives it better performance. Experimental results show that the average running time of HBSH is shorter than that of traditional text clustering algorithms.
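The abstract does not spell out HBSH's internal steps. As a rough illustration of the two ideas it names (documents stored as hash tables instead of fixed-length vectors, and hierarchical clustering with no preset cluster count), here is a minimal Python sketch. It is not the authors' HBSH algorithm: the whitespace tokenizer, cosine similarity, the summed-term-count cluster representation, and the `threshold` stopping rule are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def to_hash_table(text):
    # Assumption: a document is a hash table mapping term -> frequency,
    # so only terms that actually occur are stored (no vocabulary-sized vector).
    return Counter(text.lower().split())


def cosine(a, b):
    # Iterate over the smaller table; the dot product only touches shared terms.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(texts, threshold=0.3):
    # Agglomerative clustering: start from singletons and repeatedly merge the
    # most similar pair. Stopping when no pair reaches `threshold` (hypothetical
    # parameter) means the number of clusters is never fixed in advance.
    clusters = [([i], to_hash_table(t)) for i, t in enumerate(texts)]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:
            break
        i, j = best_pair
        members = clusters[i][0] + clusters[j][0]
        # Assumption: a merged cluster is represented by its summed term counts.
        table = clusters[i][1] + clusters[j][1]
        del clusters[j]  # j > i, so index i is still valid after deletion
        clusters[i] = (sorted(members), table)
    return [members for members, _ in clusters]


docs = [
    "hash table text clustering",
    "text clustering with hash table",
    "stock market prices fall",
    "market prices of stock rise",
]
print(cluster(docs))
```

With these four short example documents and the default threshold of 0.3, the sketch groups the two clustering-related texts together and the two market-related texts together. The pairwise search is quadratic per merge, so this illustrates the representation and stopping rule rather than the speed the abstract claims for HBSH.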

Index Terms

hierarchical, text clustering, hash table

Copyright © 2009 ACADEMY PUBLISHER. All rights reserved.