KRIMP
The Krimp algorithm is our answer to the pattern explosion in data mining: the best set of patterns is the set that compresses the data best. Using the Minimum Description Length (MDL) principle, Krimp reduces the number of frequent itemsets by up to 7 orders of magnitude. The selected patterns are highly characteristic of the data, as indicated by good compression ratios and high classification accuracies.

Krimp was first published in Siebes et al. (2006), although not yet under that name. Since then, we have extended the Krimp foundation to data mining tasks such as characterising differences between databases, generating data, completing missing data, detecting changes in streams, identifying the components of a database, and more. For more details, see the publication list below.
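To make the "compresses the data best" criterion concrete, here is a minimal Python sketch of how a candidate pattern set might be scored under MDL. It is an illustration under simplifying assumptions, not the released implementation: the function names `cover` and `total_compressed_size` and the toy database are hypothetical, transactions are covered greedily with non-overlapping itemsets (longer and more frequent itemsets first), and code lengths are taken as the Shannon-optimal `-log2(usage / total usage)`. It only scores a given pattern set; it does not perform the candidate search itself.

```python
from math import log2


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets, in code-table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def total_compressed_size(database, patterns):
    """Return the size in bits of the encoded database plus the code table (illustrative)."""
    items = sorted({i for t in database for i in t})
    singletons = [frozenset([i]) for i in items]

    # Baseline code lengths for single items, used to describe itemsets in the table.
    n_item_occurrences = sum(len(t) for t in database)
    item_count = {i: sum(i in t for t in database) for i in items}
    item_len = {i: -log2(item_count[i] / n_item_occurrences) for i in items}

    def support(itemset):
        return sum(itemset <= t for t in database)

    # Assumed cover order: longer itemsets first, then higher support, then lexicographic.
    # Singletons are always included so that every transaction can be covered.
    code_table = sorted(set(patterns) | set(singletons),
                        key=lambda x: (-len(x), -support(x), sorted(x)))

    usage = {x: 0 for x in code_table}
    for t in database:
        for x in cover(t, code_table):
            usage[x] += 1
    total_usage = sum(usage.values())

    # Itemsets that are never used get no code and do not count towards the table.
    code_len = {x: -log2(u / total_usage) for x, u in usage.items() if u > 0}
    data_size = sum(usage[x] * code_len[x] for x in code_len)
    table_size = sum(sum(item_len[i] for i in x) + code_len[x] for x in code_len)
    return data_size + table_size


# Toy database (made up for illustration): {a, b, c} occurs together most of the time.
db = [frozenset("abc")] * 8 + [frozenset("ab"), frozenset("c")]
print("singletons only:", total_compressed_size(db, []))
print("with {a, b, c}: ", total_compressed_size(db, [frozenset("abc")]))
```

In this toy setting, adding the itemset {a, b, c} lowers the total encoded size, which is the sense in which a pattern is worth keeping: it pays for its own description by shortening the encoding of the data.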
Public release: source code and binaries
Our implementation of Krimp is freely available for research purposes; we provide both the source code and binaries for Windows (x86 and x64) and Linux. In addition to the pattern selection algorithm, it contains the Krimp classifier and the StreamKrimp algorithm. For your convenience, the package includes some example UCI datasets taken from the LUCS-KDD data library. Please refer to the documentation in the package for installation/compilation details and usage hints.
Download the most recent public release of Krimp here (version of 1 February 2013).
Selected Refereed Krimp Publications
2011
Characterizing Uncertain Data using Compression. In: Proceedings of the SIAM Conference on Data Mining 2011 (SDM'11), 2011.
Krimp: Mining Itemsets that Compress. In: Data Mining and Knowledge Discovery, vol.23(1), Springer, 2011.

2010
Patterns that Matter, Dissertation, Universiteit Utrecht, 2010.

2009
Making Pattern Mining Useful, Dissertation, Universiteit Utrecht, 2009.
Compressing Tags to Find Interesting Media Groups. In: Proceedings of the ACM Conference on Information and Knowledge Management (CIKM'09), pp 1147-1156, 2009.
Identifying the Components. In: Data Mining and Knowledge Discovery, special issue ECMLPKDD'09, vol.19(2), pp 176-193, Springer, 2009.
Low-Entropy Set Selection. In: Proceedings of the SIAM International Conference on Data Mining (SDM'09), pp 569-579, 2009.

2008
StreamKrimp: Detecting Change in Data Streams. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2008 Part II (ECML PKDD'08), pp 765-774, 2008.
Filling in the Blanks - Krimp Minimisation for Missing Data. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM'08), pp 1067-1072, 2008.

2007
Preserving Privacy through Data Generation. In: Proceedings of the IEEE Conference on Data Mining 2007 (ICDM'07), pp 685-690, 2007.
Characterising the Difference. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2007 (KDD'07), pp 765-774, 2007.

2006
Compression Picks the Item Sets that Matter. In: Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'06), pp 585-592, 2006.
Item Sets That Compress. In: Proceedings of the SIAM Conference on Data Mining 2006 (SDM'06), pp 393-404, 2006.
Selected Unrefereed Krimp Publications
2008
Krimp Minimisation for Missing Data Estimation, Technical Report UU-CS-2008-034, Universiteit Utrecht, 2008.

2007
Privacy Preservation through Data Generation, Technical Report UU-CS-2007-020, Universiteit Utrecht, 2007.

2006
Compression Picks the Significant Item Sets, Technical Report UU-CS-2006-050, Universiteit Utrecht, 2006.



