Lecturer: prof. dr Arno Siebes

Contact: if you have any questions, please ask them in the lecture hall: before the lecture starts, after it has finished, during the break, or during the lecture. I get more email per day than I can possibly answer, so there is no guarantee that I will answer any email you send me.

The Course

One of the characteristic problems of Big Data is that the volume of data requires sub-linear algorithms to process it. Sub-linear often simply means that you sample the data. But how do you sample the data if you don't know the distribution? PAC learning is an answer to that problem. How to sample depends not only on the data distribution, but also on the data mining problem you want to solve. To a large extent, this course aims to solve the problem of how to sample for frequent item set mining. Other topics are addressed because they illustrate the power of the PAC learning framework or because they have a direct bearing on the frequent item set mining problem.
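To make the sampling question concrete, here is a minimal sketch, not taken from the course material, of a distribution-free sample size for a single item set. It assumes Hoeffding's inequality for i.i.d. draws with replacement: a sample of n transactions estimates the frequency of one fixed item set to within epsilon with probability at least 1 - delta when n >= ln(2/delta) / (2 epsilon^2). The function names and the toy data are hypothetical. A bound that holds for every item set simultaneously requires more machinery than this sketch.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n satisfying 2 * exp(-2 * n * epsilon**2) <= delta.

    Assumption: transactions are drawn i.i.d. with replacement and the
    guarantee concerns ONE fixed item set, not all item sets at once.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_frequency(transactions, itemset, n, rng):
    """Estimate the relative frequency of `itemset` from n sampled transactions."""
    sample = [rng.choice(transactions) for _ in range(n)]
    # A transaction supports the item set if it contains every item in it.
    hits = sum(1 for t in sample if itemset <= t)
    return hits / n


if __name__ == "__main__":
    # Toy database (hypothetical): 30% of the transactions contain {a, b}.
    database = [frozenset("abc")] * 300 + [frozenset("cd")] * 700
    n = hoeffding_sample_size(epsilon=0.05, delta=0.01)
    rng = random.Random(0)
    print(n, estimate_frequency(database, frozenset("ab"), n, rng))
```

Note that the sample size depends only on epsilon and delta, not on the size of the database or on the data distribution, which is what makes sampling attractive for large data.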

This is a tough course: we discuss many theorems and their proofs. Handling Big Data in a sound way requires firm foundations. Note, however, that the course is about understanding this formal framework rather than being able to reproduce it.


  • [05-02] The website is online. Details on the exam can be found here, pointers for (more) literature here, the slides can be found here, information on the exercise classes can be found here
  • [08-03] The exam page has been updated. The PDF file with the detailed instructions can be downloaded from there, as well as the necessary LaTeX files.
  • [08-03] The slides have also been updated, mainly correcting small errors
  • [26-03] On March 26, our regular lecture slot will be used for a Q&A session. We will discuss all questions you have regarding the course material
  • [28-03] Because we had a lively Q&A session last Wednesday, we'll do another one on April 3, again at the regular time and place