Directly Mining Descriptive Patterns

Abstract. Mining small, useful, and high-quality sets of patterns is an important topic in data mining. The standard approach is to first mine many candidates and then select a good subset. However, the pattern explosion generates such enormous numbers of candidates that, when relying on post-processing, it is virtually impossible to analyse dense or large databases in any detail.

We introduce Slim, an any-time algorithm for mining high-quality sets of itemsets directly from data. We use the Minimum Description Length (MDL) principle to identify the best set of itemsets as the set that describes the data best. To approximate this optimum, we iteratively use the current solution to determine which itemset would provide the most gain, estimating quality with an accurate heuristic. Because it requires no pre-mined candidate collection, Slim is parameter-free in both theory and practice.
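The iterative search described above can be sketched in Python. This is a simplified, hypothetical illustration, not the released Slim code: the function names (`cover`, `total_length`, `slim`) are invented, the score is a simplified Krimp-style two-part MDL length L(CT, D) = L(CT | D) + L(D | CT), and candidates X ∪ Y of code-table elements are ranked by how often X and Y are used together in the current cover. That co-usage ranking is a stand-in for the paper's gain estimate, and pruning of the code table is omitted.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cover(db, ct):
    """Greedily cover each transaction with disjoint code-table itemsets.

    Assumed cover order (Krimp-style): longer itemsets first, then higher
    support, then lexicographic. Returns usage counts and per-transaction covers.
    """
    support = {x: sum(1 for t in db if x <= t) for x in ct}
    order = sorted(ct, key=lambda x: (-len(x), -support[x], sorted(x)))
    usage, covers = Counter(), []
    for t in db:
        rest, used = set(t), []
        for x in order:
            if x <= rest:  # itemset fits in the still-uncovered items
                used.append(x)
                rest -= x
            if not rest:
                break
        usage.update(used)
        covers.append(used)
    return usage, covers


def total_length(db, ct):
    """Simplified L(CT | D) + L(D | CT) in bits; only used itemsets are charged."""
    usage, _ = cover(db, ct)
    total = sum(usage.values())
    item_counts = Counter(i for t in db for i in t)
    n = sum(item_counts.values())
    st = {i: -log2(c / n) for i, c in item_counts.items()}  # singleton code lengths
    bits = 0.0
    for x, u in usage.items():
        code = -log2(u / total)                 # optimal (Shannon) code length
        bits += u * code                        # L(D | CT): data encoded with CT
        bits += code + sum(st[i] for i in x)    # L(CT | D): one code-table row
    return bits


def slim(db, max_iter=100):
    """Greedy search: add the best-ranked union X | Y that lowers the total length."""
    ct = {frozenset([i]) for t in db for i in t}  # start with all singletons
    best = total_length(db, ct)
    for _ in range(max_iter):
        _, covers = cover(db, ct)
        co = Counter()  # co-usage: X and Y used in the same transaction cover
        for used in covers:
            for x, y in combinations(used, 2):
                co[x | y] += 1
        for cand, _ in co.most_common():
            if cand in ct:
                continue
            length = total_length(db, ct | {cand})
            if length < best:  # accept only candidates that improve compression
                ct, best = ct | {cand}, length
                break
        else:
            break  # no candidate improves: stop
    return ct, best


if __name__ == "__main__":
    db = [frozenset(t) for t in ["abc", "abc", "abd", "abc", "cd", "abce"]]
    ct, bits = slim(db)
    baseline = total_length(db, {frozenset([i]) for t in db for i in t})
    print(sorted("".join(sorted(x)) for x in ct if len(x) > 1))
    print(f"compressed to {bits / baseline:.1%} of the singleton-only encoding")
```

Because every candidate is generated from the current code table, no candidate collection has to be mined up front; the loop stops as soon as no union shortens the encoding.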

Experiments show that we mine high-quality pattern sets: while evaluating orders of magnitude fewer candidates than our closest competitor, Krimp, we obtain much better compression ratios, closely approximating the locally optimal strategy. Classification experiments independently verify that we characterise the data very well.

Public release: source code and binaries. Our implementation of Slim is freely available for research purposes; we provide both the source code and binaries for GNU/Linux (x64) and Windows (x86 and x64). Please refer to the documentation in the package for installation/compilation details and usage hints.

Implementation

Slim source code & binaries for Windows and Linux (20th July 2015) by Manan Gandhi, Koen Smets, Matthijs van Leeuwen, and Jilles Vreeken.

Related Publications

Gandhi, M & Vreeken, J. Slimmer, outsmarting Slim. PhD poster and video at the 13th International Symposium on Intelligent Data Analysis (IDA), Springer, 2014.
Smets, K & Vreeken, J. Slim: Directly Mining Descriptive Patterns. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 236–247, SIAM, 2012.
Vreeken, J, van Leeuwen, M & Siebes, A. Krimp: Mining Itemsets that Compress. Data Mining and Knowledge Discovery, vol. 23(1), pp. 169–214, Springer, 2011. (IF 2.950)