SPMF is an open-source data mining platform written in Java.

Sequence data pattern mining Framework in Java (SPMF)

While researching how to mine patterns from sequence data, I found a very good open-source platform called SPMF.

SPMF is an open-source data mining platform written in Java. It is distributed under the GPL v3 license.

Website: http://www.philippe-fournier-viger.com/spmf/index.php

It offers implementations of 52 data mining algorithms for:

  • sequential pattern mining,
  • association rule mining,
  • frequent itemset mining,
  • sequential rule mining,
  • clustering

It can be used as a standalone program with a user interface or from the command line. Moreover, the source code of each algorithm can be integrated in other Java software.

The following map shows the relationships between the various data mining algorithms offered in SPMF.

Visual map of algorithms

Supported Algorithms

Sequential Pattern Mining Algorithms

  • the PrefixSpan algorithm for mining frequent sequential patterns from a sequence database (Pei et al., 2004).
  • the SPAM algorithm for mining frequent sequential patterns from a sequence database (Ayres, 2002)
  • the BIDE+ algorithm for mining frequent closed sequential patterns from a sequence database (Wang et al. 2007)
  • the SeqDIM algorithm for mining frequent multidimensional sequential patterns from a multi-dimensional sequence database (Pinto et al. 2001)
  • the Songram et al. algorithm for mining frequent closed multidimensional sequential patterns from a multi-dimensional sequence database (Songram et al. 2006)
  • the Fournier-Viger et al. algorithm, a sequential pattern mining algorithm that combines several features from well-known sequential pattern mining algorithms and also proposes some original features (Fournier-Viger et al., 2008)
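
All of the algorithms above rest on the same basic notion: the support of a sequential pattern is the number of sequences in the database that contain it. Here is a minimal sketch of that check in Java. It is not SPMF code and does not use SPMF's classes. The class name `SequenceSupportSketch`, its methods, and the toy data are assumptions made for illustration only.

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not part of SPMF.
class SequenceSupportSketch {

    // A sequence is an ordered list of itemsets. The pattern is contained in
    // the sequence if its itemsets can be matched, in order, to itemsets of
    // the sequence (each pattern itemset being a subset of the matched one).
    static boolean contains(List<Set<Integer>> sequence, List<Set<Integer>> pattern) {
        int matched = 0;
        for (Set<Integer> itemset : sequence) {
            // Greedy matching: take the earliest itemset that fits.
            if (matched < pattern.size() && itemset.containsAll(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    // Support = number of sequences in the database that contain the pattern.
    static int support(List<List<Set<Integer>>> db, List<Set<Integer>> pattern) {
        int count = 0;
        for (List<Set<Integer>> sequence : db) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical toy database of three sequences.
        List<List<Set<Integer>>> db = List.of(
                List.of(Set.of(1), Set.of(2, 3), Set.of(4)),
                List.of(Set.of(1, 2), Set.of(3)),
                List.of(Set.of(2, 3), Set.of(1)));
        List<Set<Integer>> pattern = List.of(Set.of(1), Set.of(3));
        System.out.println("support = " + support(db, pattern));
    }
}
```

A pattern is called frequent when its support reaches a user-chosen minimum. Algorithms such as PrefixSpan avoid scanning the whole database for every candidate, which this sketch does not attempt.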

Sequential Rule Mining Algorithms

Frequent Itemset Mining Algorithms

  • the Apriori algorithm for discovering frequent itemsets from a transaction database (Agrawal & Srikant, 1994)
  • the AprioriTID algorithm for discovering frequent itemsets from a transaction database (Agrawal & Srikant, 1994)
  • the FP-Growth algorithm for discovering frequent itemsets from a transaction database (Han et al., 2004)
  • the Eclat algorithm for discovering frequent itemsets from a transaction database (Zaki, 2000)
  • the Relim algorithm for discovering frequent itemsets from a transaction database (Borgelt, 2005)
  • the H-Mine algorithm for discovering frequent itemsets from a transaction database (Pei et al., 2007)
  • the Charm algorithm for discovering frequent closed itemsets from a transaction database (Zaki and Hsiao, 2002)
  • the DCI_Closed algorithm for mining frequent closed itemsets from a transaction database (Lucchese et al, 2004)
  • the AprioriClose (a.k.a. Close) algorithm for discovering frequent itemsets and frequent closed itemsets from a transaction database (Pasquier et al., 1999)
  • the AprioriTIDClose algorithm for discovering frequent itemsets and frequent closed itemsets from a transaction database (Pasquier et al., 1999; Agrawal & Srikant, 1994)
  • the Charm-MFI algorithm for discovering frequent closed itemsets and maximal frequent itemsets from a transaction database (Szathmary et al. 2006)
  • the AprioriInverse algorithm for mining all perfectly rare itemsets from a transaction database (Koh & Rountree, 2005)
  • the AprioriRare algorithm for mining minimal rare itemsets and frequent itemsets from a transaction database (Szathmary et al. 2007b)
  • the Zart algorithm for discovering frequent closed itemsets and their minimal generators from a transaction database (Szathmary et al. 200)
  • the U-Apriori algorithm for mining frequent itemsets from uncertain data (Chui et al, 2007)
  • an algorithm for mining pseudo-closed itemsets from a transaction database (Pasquier et al., 1999)
  • the CloStream algorithm for mining frequent closed itemsets from a data stream (Yen et al, 2009)
  • the Two-Phase algorithm for mining high-utility itemsets from a transaction database with utility information (Liu et al., 2005)
  • the HUI-Miner algorithm for mining high-utility itemsets from a transaction database with utility information (Liu & Qu, CIKM 2012)
  • the VME algorithm for mining erasable itemsets (Deng & Xu, 2010)
  • algorithms for building, updating and querying an Itemset-Tree, a structure that allows targeted association rules and frequent itemsets to be generated efficiently and that can be updated incrementally (Kubat et al, 2003)
  • the MSApriori algorithm for mining frequent itemsets with multiple minimum supports (Liu et al, 1999)
  • the CFPGrowth algorithm for mining frequent itemsets with multiple minimum supports (Hu & Chen, 2006)
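
To make the idea of a frequent itemset concrete, here is a minimal level-wise search in the spirit of Apriori, written in Java. It is a sketch, not SPMF's implementation, and it does not use SPMF's classes. The class name `AprioriSketch`, the method `mine`, and the toy transactions are assumptions made for illustration. It also skips Apriori's subset-pruning step, so it counts some candidates a real implementation would discard early, but the result is the same.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; not part of SPMF.
class AprioriSketch {

    // Number of transactions that contain every item of the itemset.
    static int support(List<Set<Integer>> db, Set<Integer> itemset) {
        int count = 0;
        for (Set<Integer> transaction : db) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // Returns every itemset whose support is at least minSupport,
    // mapped to its support count.
    static Map<Set<Integer>, Integer> mine(List<Set<Integer>> db, int minSupport) {
        Map<Set<Integer>, Integer> frequent = new LinkedHashMap<>();

        // Level 1 candidates: every single item seen in the database.
        Set<Set<Integer>> level = new LinkedHashSet<>();
        for (Set<Integer> transaction : db) {
            for (Integer item : transaction) {
                Set<Integer> single = new TreeSet<>();
                single.add(item);
                level.add(single);
            }
        }

        while (!level.isEmpty()) {
            // Keep the candidates of this level that are frequent.
            List<Set<Integer>> kept = new ArrayList<>();
            for (Set<Integer> candidate : level) {
                int s = support(db, candidate);
                if (s >= minSupport) {
                    frequent.put(candidate, s);
                    kept.add(candidate);
                }
            }
            // Join pairs of frequent k-itemsets into (k+1)-item candidates.
            Set<Set<Integer>> next = new LinkedHashSet<>();
            for (int i = 0; i < kept.size(); i++) {
                for (int j = i + 1; j < kept.size(); j++) {
                    Set<Integer> union = new TreeSet<>(kept.get(i));
                    union.addAll(kept.get(j));
                    if (union.size() == kept.get(i).size() + 1) {
                        next.add(union);
                    }
                }
            }
            level = next;
        }
        return frequent;
    }

    public static void main(String[] args) {
        // Hypothetical toy transaction database.
        List<Set<Integer>> db = List.of(
                Set.of(1, 2, 3), Set.of(1, 2), Set.of(2, 3), Set.of(1, 2, 4));
        mine(db, 2).forEach((itemset, s) -> System.out.println(itemset + " support=" + s));
    }
}
```

The algorithms in the list differ mainly in how they avoid this repeated scanning of the database, for example by using transaction-ID lists or compact tree structures.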

Association Rule Mining Algorithms

  • an algorithm for mining all association rules from a transaction database (Agrawal & Srikant, 1994)
  • an algorithm for mining all association rules with the lift measure from a transaction database (adapted from Agrawal & Srikant, 1994)
  • an algorithm for mining the IGB informative and generic basis of association rules from a transaction database (Gasmi et al., 2005)
  • an algorithm for mining perfectly sporadic association rules (Koh & Rountree, 2005)
  • an algorithm for mining the Guigues-Duquenne basis for exact association rules from a transaction database (Pasquier et al., 1999)
  • an algorithm for mining the proper basis for approximate association rules from a transaction database (Pasquier et al., 1999)
  • an algorithm for mining the structural basis for approximate association rules from a transaction database (Pasquier et al., 1999)
  • an algorithm for mining closed association rules (Szathmary et al. 2006).
  • an algorithm for mining minimal non-redundant association rules (Kryszkiewicz, 1998)
  • the Indirect algorithm for mining indirect association rules (Tan et al. 2000; Tan et al. 2006)
  • the FHSAR algorithm for hiding sensitive association rules (Weng et al. 2008)
  • the TopKRules algorithm for mining the top-k association rules (Fournier-Viger, 2012b)
  • the TNR algorithm for mining top-k non-redundant association rules (Fournier-Viger 2012d)
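
Association rules of the form X -> Y are usually ranked by measures such as confidence and lift, the latter being mentioned in the list above. The following Java sketch computes both using their standard definitions. It is not taken from SPMF. The class name `RuleMeasuresSketch`, its methods, and the toy transactions are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not part of SPMF.
class RuleMeasuresSketch {

    // Number of transactions that contain every item of the itemset.
    static int support(List<Set<Integer>> db, Set<Integer> itemset) {
        int count = 0;
        for (Set<Integer> transaction : db) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // confidence(X -> Y) = support(X union Y) / support(X)
    static double confidence(List<Set<Integer>> db, Set<Integer> x, Set<Integer> y) {
        Set<Integer> union = new HashSet<>(x);
        union.addAll(y);
        int supportX = support(db, x);
        return supportX == 0 ? 0.0 : (double) support(db, union) / supportX;
    }

    // lift(X -> Y) = confidence(X -> Y) / (support(Y) / number of transactions)
    static double lift(List<Set<Integer>> db, Set<Integer> x, Set<Integer> y) {
        int supportY = support(db, y);
        if (supportY == 0) {
            return 0.0;
        }
        return confidence(db, x, y) / ((double) supportY / db.size());
    }

    public static void main(String[] args) {
        // Hypothetical toy transaction database.
        List<Set<Integer>> db = List.of(Set.of(1, 2), Set.of(1, 2), Set.of(3), Set.of(3));
        System.out.println("confidence = " + confidence(db, Set.of(1), Set.of(2)));
        System.out.println("lift = " + lift(db, Set.of(1), Set.of(2)));
    }
}
```

A lift above 1 suggests that X and Y occur together more often than they would if they were independent, while a value near 1 suggests no association.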

Clustering Algorithms

  • the K-Means algorithm for clustering vectors containing one or more double values (MacQueen, 1967)
  • a hierarchical clustering algorithm

Classification Algorithms

  • the ID3 algorithm for building decision trees (Quinlan, 1986)

Data structures

  • red-black tree,
  • itemset-tree,
  • binary tree,
  • KD-tree,
  • triangular matrix.
