Home > Table of Contents


Proceedings of 2009 International Workshop on Information Security and Application (IWISA 2009)

Qingdao, China, November 21-22, 2009

Editors: Feng Gao and Xijun Zhu

AP Catalog Number: AP-PROC-CS-09CN004

ISBN: 978-952-5726-06-0

Page(s): 1-6

A Valid Clustering Algorithm for High-dimensional Large Data Sets Based on Distributed Method

Xian e Guo and Junmei Yan

Full text: PDF


Data sets are randomly divided into several subsets, then fuzzy clustering method for A high-dimensional data based on genetic algorithm is proposed to cluster the subsets, by importing a fuzzy dissimilar matrix to express the dissimilar degree between any two data, and initializing the high-dimensional samples to two-dimensional plane. Then iteratively optimize the coordinate value of two-dimensional plane using genetic algorithm, which makes the Euclidean distance between the two-dimensional plane approximate to the fuzzy dissimilar degree between samples gradually. At last cluster the two-dimensional data using FCM algorithm, so avoid dependence of clustering validity on the space distribution of high-dimensional samples. Experimental results show the method has high quality result, and improves the clustering speed greatly.

Index Terms

fuzzy clustering; distributed method; genetic algorithm; fuzzy dissimilar matrix; large data sets; high dimension

Copyright @ 2009 ACADEMY PUBLISHER All rights reserved