ISSN: 1796-217X
Volume: 4    Issue: 5    Date: July 2009

A Semantic Approach for Document Clustering
Khaled Shaban
Page(s): 391-404

Conventional document mining systems mainly rely on the presence or absence of keywords to mine
texts. However, simple word counts and term-frequency distributions do not capture the meaning
behind the words, which limits the ability to mine the texts. In this
paper, a semantic understanding-based approach to document mining is
presented. The approach uses semantic notions to represent text and to measure similarity
between text documents. The representation scheme reflects existing relations between concepts
and facilitates accurate similarity measurements that result in better mining performance. A
document mining process, namely semantic document clustering, is investigated and tackled in
various ways. The proposed representation scheme and the proposed similarity measure
were implemented as vital components of a mining system. The approach enabled more
effective document clustering than conventional techniques provide. The experimental
work is reported, and its results are presented and analyzed.

Index Terms
Document mining, semantic understanding, text representation, similarity measure, document