Planning

The papers discussed at each meeting are listed below. Links to these papers can be found under the Literature tab.

  • April 26: Introduction to the course.
  • April 28: No class, to allow you to prepare.
  • May 3: papers 1 - 3. The first two papers created the field of frequent pattern mining; the third was among the first, if not the first, to show that one does not need transactions to do frequent pattern mining.
  • May 5: No class, because the university is closed on Liberation Day.
  • May 10: papers 4 and 5. One way to limit the number of patterns you discover is to formulate more precisely which patterns you want to find, i.e., to specify (extra) constraints. Clearly it would be best to discover all patterns that satisfy the constraints directly, rather than by a filtering step. This is called constraint pushing; a small sketch of the difference is given below the schedule. The first paper for today introduces FP-Growth, an algorithm to find all frequent item sets. The second paper shows how (a certain class of) constraints can be pushed into FP-Growth.
  • May 12: papers 6 and 7. Constraint pushing is one way to avoid filtering, but there are alternatives. The first paper for today gives such an alternative, while the second questions whether pushing constraints is a good idea at all.
  • May 17: No class, because of a PhD defense
  • May 19: paper 8. Constraints are one way to limit the number of interesting patterns; condensed representations are another. A condensed representation is a subset of all frequent item sets that allows the reconstruction of the complete set of frequent item sets (including their support!) without looking at the data. Today we encounter the first such representation, viz., closed itemsets; a small sketch is given below the schedule.
  • May 24: paper 9. Non-derivable itemsets. As condensed as it can be.
  • May 26: No class; the university is closed the day after Ascension Day.
  • May 31: papers 10 - 12. The constraints we saw earlier are a way to define what makes a pattern interesting. Another way of solving the pattern explosion problem is by specifying what makes a set of patterns interesting. If you know that, all you have to do is search for the most interesting set of patterns. Today we search for informative collections of item sets.
  • June 2: paper 13. Tiles are another way to define what makes a pattern -- and a set of patterns -- interesting.
  • June 7: papers 14 and 15. Statistics offers different ways to determine whether or not a set of patterns is interesting. Today we look at the first statistical approach, viz., sampling. Note that the second paper uses some advanced concepts, such as MCMC; if you haven't seen these before, try to read past them. It is (in this case) more important to understand what the method achieves than to understand the algorithmic details.
  • June 9: papers 16 and 17. Next to sampling, statistics offers statistical tests to determine the importance (significance) of results. Today's papers both use tests to find good sets of patterns.
  • June 14: papers 18 and 19. The final approach that statistics offers for pattern set selection is modelling. Today's (related) papers use Maximum Entropy and Subjective Interestingness to model.
  • June 16: paper 20. Statistics is one principled way to infer knowledge from data. A more computational way is called Algorithmic Information Theory (AIT). Today's paper shows how the Minimum Description Length Principle (MDL), which is an instantiation of AIT, can be used to discover interesting sets of patterns by introducing the Krimp algorithm; a small sketch of the MDL idea is given below the schedule.
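
To make the idea of constraint pushing (May 10) concrete, here is a minimal, hypothetical sketch in Python. It is not the FP-Growth-based method from the papers: it mines frequent item sets level-wise over a toy database with made-up item prices, and compares mining everything and filtering afterwards with pushing an anti-monotone price constraint into the search. Both give the same answer, but the pushed version prunes candidates during the search.

```python
from itertools import combinations

# Toy transaction database and made-up item prices (purely illustrative).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                {"b", "c", "d"}, {"a", "b", "c", "d"}]
price = {"a": 2, "b": 5, "c": 1, "d": 9}

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(min_sup, constraint=lambda s: True):
    """Level-wise (Apriori-style) search. `constraint` can only be pushed safely if it
    is anti-monotone: if a set violates it, every superset does too (like min support)."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_sup and constraint(frozenset([i]))]
    result = list(level)
    while level:
        # Candidate (k+1)-sets are unions of two surviving k-sets.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_sup and constraint(c)]
        result.extend(level)
    return result

budget = 8
cheap = lambda s: sum(price[i] for i in s) <= budget  # anti-monotone: adding items never lowers the price

# Generate-then-filter: mine all frequent item sets, keep the cheap ones afterwards.
filtered = [s for s in frequent_itemsets(min_sup=2) if cheap(s)]
# Constraint pushing: prune with the price constraint during the search itself.
pushed = frequent_itemsets(min_sup=2, constraint=cheap)
assert set(filtered) == set(pushed)  # same result, fewer candidates explored when pushing
```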
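For the condensed representation of May 19, here is a small toy Python illustration (not the algorithm from the paper) showing that the closed frequent item sets, together with their supports, suffice to reconstruct the support of every frequent item set without touching the data again: the support of an item set equals the largest support among its closed supersets.

```python
from itertools import combinations

# Toy transaction database (purely illustrative).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

def all_frequent(min_sup):
    items = sorted({i for t in transactions for i in t})
    return [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if support(frozenset(c)) >= min_sup]

def closure(itemset):
    """The closure of an item set: all items shared by every transaction containing it."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*map(frozenset, covering)) if covering else itemset

min_sup = 2
frequent = all_frequent(min_sup)
closed = [s for s in frequent if closure(s) == s]   # the condensed representation
closed_support = {c: support(c) for c in closed}    # stored alongside the representation

# Reconstruction: the support of any frequent item set is the largest support
# among its closed supersets -- no need to look at the data, only at `closed`.
for s in frequent:
    reconstructed = max(v for c, v in closed_support.items() if s <= c)
    assert reconstructed == support(s)

print(f"{len(frequent)} frequent item sets condensed to {len(closed)} closed ones")
```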
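For June 16, a minimal sketch of the MDL intuition behind Krimp-style methods. This is not Krimp itself (which also charges for the code table and searches over many candidate patterns); it only computes the length of the database encoded with a given pattern set, using a simple greedy cover and code lengths derived from usage frequencies, to show that a pattern set is "interesting" when it compresses the data well.

```python
from math import log2

# Toy database: items a, b, c often occur together (purely illustrative).
transactions = [{"a", "b", "c"}] * 6 + [{"a", "b"}] * 2 + [{"c", "d"}] * 2
singletons = sorted({i for t in transactions for i in t})

def cover(transaction, patterns):
    """Greedily cover a transaction with patterns (longest first); singletons fill the rest."""
    used, remaining = [], set(transaction)
    for p in sorted(patterns, key=len, reverse=True):
        if p <= remaining:
            used.append(p)
            remaining -= p
    used.extend(frozenset([i]) for i in remaining)
    return used

def encoded_length(patterns):
    """Length of the database given the patterns: code lengths follow from how often each code is used."""
    usage = {}
    for t in transactions:
        for p in cover(t, patterns):
            usage[p] = usage.get(p, 0) + 1
    total = sum(usage.values())
    return sum(u * -log2(u / total) for u in usage.values())

baseline = [frozenset([i]) for i in singletons]           # singletons only
with_pattern = baseline + [frozenset({"a", "b", "c"})]    # add one candidate pattern

print(f"singletons only : {encoded_length(baseline):6.1f} bits")
print(f"with {{a,b,c}}    : {encoded_length(with_pattern):6.1f} bits")
# The candidate pattern is 'interesting' here because it makes the database shorter to describe.
```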