JOURNAL OF COMPUTERS (JCP)

ISSN: 1796-203X

Volume: 1, Issue: 3, Date: June 2006

**Generalized Sequential Pattern Mining with Item Intervals**

Yu Hirate and Hayato Yamana

Page(s): 51-60

**Abstract**

Sequential pattern mining is an important data mining method with broad applications; it extracts frequent sequences while preserving the order of their items. It is also important, however, to identify the item intervals of the extracted patterns. For example, a sequence <A;B> with a 1-day interval and a sequence <A;B> with a 1-year interval are completely different: the former may reflect some association, while the latter may not. Two approaches have been proposed for integrating item intervals with sequential pattern mining: (1) constraint-based mining and (2) extended sequence-based mining. The constraint-based approach avoids extracting sequences with uninteresting time intervals, such as overly long ones, but it is difficult to specify optimal item interval constraints, and users must re-execute the algorithm each time the constraint values change. The extended sequence-based approach, on the other hand, requires neither specifying constraints nor re-execution; however, because it cannot apply any constraints based on time intervals, it may extract meaningless patterns, such as sequences with overly long item intervals. Each approach thus has disadvantages as well as advantages. To solve this problem, in this paper we generalize sequential pattern mining with item intervals. The generalization comprises three points: (a) the capability to handle two kinds of item interval measurement, item gap and time interval; (b) the capability to handle extended sequences, which are defined by inserting pseudo items based on an interval itemization function; and (c) the adoption of four item interval constraints. Generalized sequential pattern mining can substitute for all types of conventional sequential pattern mining algorithms with item intervals. Using Japanese earthquake data, we confirmed that the proposed algorithm can extract sequential patterns with item intervals defined in a flexible manner by the interval itemization function.
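To make the notion of an extended sequence more concrete, the following Python sketch shows one way pseudo items might be inserted between the items of a pattern. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the interval itemization function, the exact definition of item gap, or the four constraints, so `bucket`, the gap definition in `interval_between`, the `#k` pseudo-item notation, and the single `max_interval` check are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, item); the time unit (days here) is an assumption


def bucket(interval: int) -> int:
    """Hypothetical interval itemization function: a coarse, log-like bucket."""
    # 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on.
    return max(interval, 0).bit_length()


def interval_between(events: Sequence[Event], i: int, j: int, measure: str) -> int:
    """Interval between positions i < j, as an item gap or a time interval."""
    if measure == "gap":
        # Assumed definition: the number of items lying between the two positions.
        return j - i - 1
    if measure == "time":
        return events[j][0] - events[i][0]
    raise ValueError(f"unknown measure: {measure}")


def extended_pattern(
    events: Sequence[Event],
    positions: Sequence[int],
    measure: str = "time",
    itemize: Callable[[int], int] = bucket,
    max_interval: Optional[int] = None,
) -> Optional[List[str]]:
    """Build an extended pattern from the events at the given (non-empty,
    increasing) positions, inserting a pseudo item "#k" between neighbours.

    Returns None when an interval exceeds max_interval, an illustrative
    stand-in for an item interval constraint.
    """
    pattern = [events[positions[0]][1]]
    for i, j in zip(positions, positions[1:]):
        d = interval_between(events, i, j, measure)
        if max_interval is not None and d > max_interval:
            return None  # the pattern violates the illustrative constraint
        pattern.append(f"#{itemize(d)}")  # pseudo item encoding the interval
        pattern.append(events[j][1])
    return pattern


# A followed by B after 1 day, and by another B after 366 days.
events = [(0, "A"), (1, "B"), (366, "B")]
near = extended_pattern(events, [0, 1])                      # ['A', '#1', 'B']
far = extended_pattern(events, [0, 2])                       # ['A', '#9', 'B']
by_gap = extended_pattern(events, [0, 2], measure="gap")     # ['A', '#1', 'B']
pruned = extended_pattern(events, [0, 2], max_interval=30)   # None
```

In this sketch the two <A;B> occurrences become distinct extended patterns under the time measure, while the optional bound discards the one with the long interval. That loosely mirrors how the abstract combines extended sequences with constraints.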

**Index Terms**

Data Mining, Sequential Pattern Mining, Item Intervals, Gap, Time-stamp
