KRIMP

The Krimp algorithm is our answer to the pattern explosion in data mining: the best set of patterns is the set that compresses the data best. Using the Minimum Description Length (MDL) principle, Krimp selects pattern sets that are up to 7 orders of magnitude smaller than the complete collection of frequent itemsets. The selected patterns are highly characteristic of the data, as indicated by good compression ratios and high classification accuracies.
Krimp was first published as Siebes et al. (2006), although not yet under that name. Since then, we have extended the Krimp foundation to data mining tasks such as characterising differences between databases, generating data, completing missing data, detecting changes in streams, identifying the components of a database, and more. For more details, see the publication list below.
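
To make the compression idea concrete, here is a minimal Python sketch of a two-part MDL score and a greedy selection loop in the spirit of Krimp. It is an illustration under simplifying assumptions, not the released implementation: the function names (cover, support, total_encoded_size, select_patterns) are hypothetical, candidates are supplied by hand rather than mined, and code table pruning is left out.

```python
from collections import Counter
from math import log2


def cover(transaction, code_table):
    """Cover a transaction with non-overlapping itemsets, tried in table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset and itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def support(database, itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in database if itemset <= set(t))


def total_encoded_size(database, patterns):
    """Two-part size in bits: L(code table) + L(database | code table)."""
    # Standard code lengths: singleton-only encoding based on item frequencies.
    item_counts = Counter(i for t in database for i in set(t))
    n_items = sum(item_counts.values())
    standard_len = {i: -log2(c / n_items) for i, c in item_counts.items()}

    # The code table always contains all singletons, so every transaction
    # can be covered. Order: longer itemsets first, then higher support.
    entries = {frozenset(p) for p in patterns}
    entries |= {frozenset([i]) for i in item_counts}
    code_table = sorted(
        entries, key=lambda s: (-len(s), -support(database, s), sorted(s))
    )

    usage = Counter(x for t in database for x in cover(t, code_table))
    total_usage = sum(usage.values())

    size = 0.0
    for itemset, count in usage.items():  # unused itemsets cost nothing
        code_len = -log2(count / total_usage)  # frequently used => short code
        size += count * code_len  # cost of the encoded database
        size += code_len + sum(standard_len[i] for i in itemset)  # table entry
    return size


def select_patterns(database, candidates):
    """Keep a candidate only if it strictly reduces the total encoded size."""
    ordered = sorted(
        {frozenset(c) for c in candidates},
        key=lambda s: (-support(database, s), -len(s), sorted(s)),
    )
    chosen = []
    best = total_encoded_size(database, chosen)
    for cand in ordered:
        size = total_encoded_size(database, chosen + [cand])
        if size < best:
            chosen, best = chosen + [cand], size
    return chosen, best


# Toy database (made up for illustration): 'a' and 'b' always occur together.
db = [{"a", "b"}] * 4 + [{"a", "b", "c"}] * 2 + [{"c"}] * 2
print("singletons only:", total_encoded_size(db, []))
print("with {a, b}:    ", total_encoded_size(db, [{"a", "b"}]))
print("selected:       ", select_patterns(db, [{"a", "b"}, {"b", "c"}])[0])
```

In this toy database the itemset {a, b} shortens the encoding because one short code replaces two singleton codes in six transactions, whereas {b, c} is never used in any cover and is therefore not selected.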

Public release: source code and binaries
Our implementation of Krimp is freely available for research purposes; we provide both the source code and binaries for Windows (x86 and x64) and Linux. In addition to the pattern selection algorithm, it contains the Krimp classifier and the StreamKrimp algorithm. For your convenience, the package includes some example UCI datasets taken from the LUCS-KDD data library. Please refer to the documentation in the package for installation/compilation details and usage hints.

Download the most recent public release of Krimp here (version of 1 February 2013).


Selected Refereed Krimp Publications

2011
Bonchi, F., van Leeuwen, M. & Ukkonen, A. Characterizing Uncertain Data using Compression. In: Proceedings of the SIAM Conference on Data Mining 2011 (SDM'11), 2011.
Vreeken, J., van Leeuwen, M. & Siebes, A. Krimp: Mining Itemsets that Compress. In: Data Mining and Knowledge Discovery, vol.23(1), Springer, 2011.
2010
van Leeuwen, M. Patterns that Matter, Dissertation, Universiteit Utrecht, 2010.
2009
Vreeken, J. Making Pattern Mining Useful, Dissertation, Universiteit Utrecht, 2009.
van Leeuwen, M., Bonchi, F., Sigurbjörnsson, B. & Siebes, A. Compressing Tags to Find Interesting Media Groups. In: Proceedings of the ACM Conference on Information and Knowledge Management (CIKM'09), pp 1147-1156, 2009.
van Leeuwen, M., Vreeken, J. & Siebes, A. Identifying the Components. In: Data Mining and Knowledge Discovery, special issue ECMLPKDD'09, vol.19(2), pp 176-193, Springer, 2009.
Heikinheimo, H., Vreeken, J., Siebes, A. & Mannila, H. Low-Entropy Set Selection. In: Proceedings of the SIAM International Conference on Data Mining (SDM'09), pp 569-579, 2009.
2008
van Leeuwen, M. & Siebes, A. StreamKrimp: Detecting Change in Data Streams. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2008 Part II (ECML PKDD'08), pp 765-774, 2008.
Vreeken, J. & Siebes, A. Filling in the Blanks - Krimp Minimisation for Missing Data. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM'08), pp 1067-1072, 2008.
2007
Vreeken, J., van Leeuwen, M. & Siebes, A. Preserving Privacy through Data Generation. In: Proceedings of the IEEE Conference on Data Mining 2007 (ICDM'07), pp 685-690, 2007.
Vreeken, J., van Leeuwen, M. & Siebes, A. Characterising the Difference. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2007 (KDD'07), pp 765-774, 2007.
2006
van Leeuwen, M., Vreeken, J. & Siebes, A. Compression Picks the Item Sets that Matter. In: Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'06), pp 585-592, 2006.
Siebes, A., Vreeken, J. & van Leeuwen, M. Item Sets That Compress. In: Proceedings of the SIAM Conference on Data Mining 2006 (SDM'06), pp 393-404, 2006.

Selected Unrefereed Krimp Publications

2008
Vreeken, J. & Siebes, A. Krimp Minimisation for Missing Data Estimation, Technical Report UU-CS-2008-034, Universiteit Utrecht, 2008.
2007
Vreeken, J., van Leeuwen, M. & Siebes, A. Privacy Preservation through Data Generation, Technical Report UU-CS-2007-020, Universiteit Utrecht, 2007.
2006
van Leeuwen, M., Vreeken, J. & Siebes, A. Compression Picks the Significant Item Sets, Technical Report UU-CS-2006-050, Universiteit Utrecht, 2006.