JOURNAL OF COMPUTERS (JCP)
ISSN : 1796-203X
Volume : 1    Issue : 3    Date : June 2006

Generalized Sequential Pattern Mining with Item Intervals
Yu Hirate and Hayato Yamana
Page(s): 51-60

Abstract
Sequential pattern mining is an important data mining method with broad applications that can
extract frequent sequences while maintaining their order. However, it is also important to identify
the item intervals of the sequential patterns it extracts. For example, a sequence <A;B> with a
1-day interval and a sequence <A;B> with a 1-year interval are completely different: the former
may indicate some association, while the latter may not. Two approaches have been proposed for
integrating item intervals with sequential pattern mining: (1) constraint-based mining and
(2) extended sequence-based mining. However, although the
constraint-based mining approach avoids extracting sequences with uninteresting time intervals,
such as intervals that are too long, it has two drawbacks: it is difficult to specify optimal
constraints on item intervals, and users must re-execute the algorithm each time the constraint
values change. The extended sequence-based mining approach, on the other hand, requires neither
constraint specification nor re-execution. However, because it cannot apply any constraints based
on time intervals, it may extract meaningless patterns, such as sequences whose item intervals are
too long. Thus, each of the two approaches has disadvantages as well as advantages. To solve this
problem, in this paper we generalize sequential pattern mining with
item intervals. The generalization covers three points: (a) the capability to handle two kinds of
item interval measurement, item gap and time interval; (b) the capability to handle extended
sequences, which are defined by inserting pseudo items based on an interval itemization function;
and (c) the adoption of four item interval constraints. Generalized sequential pattern mining can
therefore substitute for all types of conventional sequential pattern mining algorithms with item
intervals. Using Japanese earthquake data, we confirmed that the proposed algorithm can extract
sequential patterns with item intervals defined flexibly by the interval itemization function.
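
The idea of an extended sequence can be illustrated with a minimal Python sketch. It is not the
paper's algorithm: the function names, the (timestamp, item) event layout, the definition of item
gap as a difference in position, and the log-scale itemization function are all assumptions chosen
for illustration. The sketch only shows how a pseudo item derived from the interval can be inserted
between two items, so that <A;B> with a 1-day interval and <A;B> with a 1-year interval become
distinguishable sequences.

```python
# Illustrative sketch only; every name below is hypothetical, not taken from the paper.

def item_gap(events, i, j):
    """Interval measured as the difference in position (assumed definition)."""
    return j - i


def time_interval(events, i, j):
    """Interval measured as the difference between the two timestamps."""
    return events[j][0] - events[i][0]


def itemize(interval):
    """Hypothetical interval itemization function for integer intervals.

    Maps a positive interval to the log-scale bucket floor(log2(interval)) + 1,
    and any non-positive interval to bucket 0.
    """
    return interval.bit_length() if interval > 0 else 0


def extend_sequence(events, measure, itemize_fn):
    """Insert a pseudo item between each pair of consecutive items.

    events is a list of (timestamp, item) pairs ordered by timestamp.
    measure is item_gap or time_interval; itemize_fn turns the measured
    interval into the label carried by the pseudo item.
    """
    extended = []
    for j, (_, item) in enumerate(events):
        if j > 0:
            bucket = itemize_fn(measure(events, j - 1, j))
            extended.append(("interval", bucket))
        extended.append(item)
    return extended


one_day = [(0, "A"), (1, "B")]     # timestamps in days
one_year = [(0, "A"), (365, "B")]

print(extend_sequence(one_day, time_interval, itemize))
print(extend_sequence(one_year, time_interval, itemize))
```

Under the time interval measurement the two sequences receive different pseudo items, whereas under
the item gap measurement they would be identical, because B directly follows A in both. A
constraint on item intervals could be layered on top of this by discarding pairs whose measured
interval falls outside a chosen range; that step is omitted here because the abstract does not
specify the four constraints.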

Index Terms
Data Mining, Sequential Pattern Mining, Item Intervals, Gap, Time-stamp