
|
Nanchang, China May 22 - 24, 2009 |
|
WISA 2009 |
|
Second International Symposium on Web Information Systems and Applications |
|
Proceedings of the 2nd International Symposium on Web Information Systems and Applications (WISA 2009)
Nanchang, China, May 22-24, 2009
Editors: Fei Yu, Jiexian Zeng, and Guangxue Yue
AP Catalog Number: AP-PROC-CS-09CN001
ISBN: 978-952-5726-00-8 (Print), 978-952-5726-01-5 (CD-ROM)
Page(s): 140-143 |
|
|
A Hash-based Hierarchical Algorithm for Massive Text Clustering
Yin Luo, Yan Fu |
|
Abstract |
|
|
Text clustering partitions a collection of texts into subgroups whose members are similar in content. Its purpose is to support people in searching for and understanding information. This study proposes a new, fast hierarchical text clustering algorithm, HBSH (Hash-based Structure Hierarchical Clustering), which is suitable for massive text collections. The algorithm takes hash tables rather than numerical vectors as its input data. Compared with other clustering algorithms, HBSH does not require the number of cluster centers to be set in advance and has low space complexity, which allows it to achieve better performance. The experimental results show that the average running time of HBSH is shorter than that of traditional text clustering algorithms. |
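The abstract does not give the details of HBSH, so the following is only a minimal sketch of the two ideas it names: texts represented as hash tables (term to frequency) rather than dense numerical vectors, and hierarchical merging that stops at a similarity threshold instead of a preset number of clusters. The function names, the whitespace tokenizer, the cosine measure, and the `threshold` parameter are all assumptions made for illustration and are not taken from the paper. The naive pairwise search is also far too slow for massive collections; it shows only the data representation and the stopping rule.

```python
import math
from collections import Counter


def to_hash_table(text):
    """Represent a text as a term -> frequency hash table (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity computed only over terms stored in the hash tables."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller table
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.3):
    """Merge the most similar pair of clusters until none reaches the threshold.

    The number of clusters is never specified; it falls out of the threshold
    (an assumed stopping rule, not the criterion used by HBSH).
    """
    clusters = [([i], to_hash_table(t)) for i, t in enumerate(texts)]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break  # no pair is similar enough, so stop merging
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])  # Counter addition merges tables
        del clusters[j]  # j > i, so index i is still valid
        clusters[i] = merged
    return [sorted(members) for members, _ in clusters]


texts = [
    "hash table clustering",
    "hash table text clustering",
    "football match result",
    "football match score",
]
result = cluster(texts)
print(result)
```

Because each text stores only the terms it actually contains, memory grows with the number of distinct terms per text rather than with the vocabulary size, which is one plausible reading of the low space complexity claimed above.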
|
|
Index Terms |
|
|
hierarchical, text clustering, hash table |
|
|
Copyright © 2009 ACADEMY PUBLISHER. All rights reserved |
|