Lecturer: prof. dr Arno Siebes

Contact: if you have any questions, please ask them in the lecture hall: before the lecture starts, when it has finished, during the break, or during the lecture. I get more email per day than I can possibly answer, so there is no guarantee that I will answer any email you send me.

The Course

One of the characteristic problems of Big Data is that the volume of data requires sub-linear algorithms to process it. Sub-linear often simply means that you sample the data. But how do you sample the data if you don't know the distribution? PAC learning is an answer to that problem. How to sample depends not only on the data distribution, but also on the data mining problem you want to solve. To a large extent, this course aims to solve the problem of how to sample for frequent item set mining. Other topics are addressed because they illustrate the power of the PAC learning framework or because they have a direct bearing on the frequent item set mining problem.

This is a tough course: we discuss many theorems and their proofs. Handling Big Data in a sound way requires firm foundations. Note, however, that the course is about understanding this formal framework rather than being able to reproduce it.

Announcements

  • [16-05] The results of the exam have been emailed to all students who submitted. Since all those students passed, the 2nd round is only for those who asked for an extension. The deadline for submitting your essay is July 13.
  • [02-02] The website is online. Details on the exam can be found here, pointers to (more) literature here, the slides here, and information on the exercise classes here.
  • [02-02] There will be no classes on
    • Friday Feb 16, as the professor attends the valedictory lecture of one of his PhD thesis advisors
    • March 7 and March 9, because the professor attends a workshop in Titisee
  • [18-03] More details on the essay can be found in the slides of Lecture 10.
  • [05-04] On Friday April 6 you can ask whatever you want about the course to help you write the first part of your essay.