JOURNAL OF SOFTWARE (JSW)
ISSN: 1796-217X
Volume: 4    Issue: 5    Date: July 2009

Research on Web Session Clustering
Chaofeng Li
Page(s): 460-468

Abstract
The task of Web session clustering is to group Web sessions by similarity, maximizing the
intra-group similarity while minimizing the inter-group similarity. The results of Web session
clustering can be used for personalization, system improvement, site modification, business
intelligence, usage characterization and so forth. This paper first proposes a framework for Web
session clustering. It then presents several data preparation techniques that can improve the
performance of data preprocessing. It also introduces a new method for measuring the similarity
between Web pages that takes into account not only the URL but also the viewing time of the visited
page, and describes in detail a new method for measuring the similarity of Web sessions using
sequence alignment and the similarity of Web page access. Finally, a Web session clustering
algorithm is proposed. The algorithm sets the number of clusters from domain knowledge of the
application field, uses ROCK to choose the initial data points of each cluster, and defines the
criterion function by the contribution to the overall increase in similarity made by assigning Web
sessions to different clusters. This approach not only overcomes the shortcoming of traditional
clustering algorithms, which focus merely on partial similarities, but also reduces time and space
complexity.
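
The abstract does not give the formulas for page or session similarity. As a minimal sketch under assumed definitions, and not the paper's own method, the code below shows how a page similarity combining URL and viewing time can serve as the match score in a Needleman-Wunsch-style global alignment of two sessions. The page representation `(url, seconds)`, the path-prefix URL similarity, the min/max time ratio, the product combination, the `gap` penalty and the normalization by the longer session are all hypothetical choices made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical page model: (url, viewing_time_in_seconds).
Page = tuple[str, float]


def url_similarity(u1: str, u2: str) -> float:
    """Assumed URL similarity: shared leading path segments / deeper path depth."""
    p1 = [s for s in urlparse(u1).path.split("/") if s]
    p2 = [s for s in urlparse(u2).path.split("/") if s]
    if not p1 and not p2:
        return 1.0  # both URLs point at the site root
    common = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        common += 1
    return common / max(len(p1), len(p2))


def time_similarity(t1: float, t2: float) -> float:
    """Assumed viewing-time similarity: shorter time divided by longer time."""
    if max(t1, t2) <= 0:
        return 1.0
    return min(t1, t2) / max(t1, t2)


def page_similarity(a: Page, b: Page) -> float:
    # Assumption: two page visits are similar only if both URL and time are similar.
    return url_similarity(a[0], b[0]) * time_similarity(a[1], b[1])


def session_similarity(s1: list[Page], s2: list[Page], gap: float = 0.0) -> float:
    """Global alignment score of two sessions, normalized to [0, 1]."""
    n, m = len(s1), len(s2)
    if n == 0 or m == 0:
        return 0.0
    # dp[i][j] holds the best alignment score of s1[:i] against s2[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + page_similarity(s1[i - 1], s2[j - 1]),  # align two pages
                dp[i - 1][j] - gap,  # leave a page of s1 unmatched
                dp[i][j - 1] - gap,  # leave a page of s2 unmatched
            )
    # Each matched pair adds at most 1, so dividing by the longer length bounds the result by 1.
    return max(0.0, dp[n][m]) / max(n, m)


if __name__ == "__main__":
    s1 = [("http://example.com/a/b", 30), ("http://example.com/c", 10)]
    s2 = [("http://example.com/c", 20)]
    print(session_similarity(s1, s1))  # identical sessions
    print(session_similarity(s1, s2))  # one partially matching page
```

Aligning rather than comparing page sets keeps the visiting order in play, which is the motivation the abstract gives for using sequence alignment; a pairwise matrix of such scores could then feed a clustering step such as the ROCK-seeded algorithm described above.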

Index Terms
Web session clustering; data preprocessing; sequence alignment; similarity measurement