General

This page has been adapted for the year 2020.

You are expected to work in pairs.

Although this page is in English, you are allowed to write your report in Dutch.

Assignment 1

For assignment 1, you will have to connect a C# program to an SQLite database. For guidance, have a look at the blog of Tigran Gasparian.

Ranking on query results

The goal of this assignment is to implement the ideas of the article by Agrawal, Chaudhuri et al. This means you have to solve both the zero-answers problem (a query returns no tuples) and the many-answers problem (a query returns too many tuples to inspect). We provide a table and a query workload.

The program should be able to process conjunctive equality queries (ceq) on the table. A ceq consists of predicates of the form attr = value, separated by commas and terminated by a semicolon. Each input consists of a ceq and is read through a simple GUI. The desired value of k is entered using the same syntax; when it is missing, use the default value k = 10. Example inputs:

k = 6, brand = 'volkswagen';
cylinders = 4, brand = 'ford';
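The parsing step can be sketched as follows. This is a minimal Python sketch for illustration only (the deliverable itself is in C#); the function name parse_ceq is hypothetical, and it assumes that every value is either a single-quoted string without commas or quotes inside it, or a plain number. The k predicate is split off from the other predicates and defaults to 10.

```python
import re

DEFAULT_K = 10

# One predicate "attr = value": group 1 is the attribute, group 2 the raw value,
# group 3 the text inside single quotes (None for numeric values).
PREDICATE = re.compile(r"^\s*(\w+)\s*=\s*('([^']*)'|[-+]?\d+(?:\.\d+)?)\s*$")


def parse_ceq(line: str) -> tuple[int, list[tuple[str, object]]]:
    """Hypothetical helper: parse one input line into (k, [(attr, value), ...])."""
    text = line.strip()
    if not text.endswith(";"):
        raise ValueError("a ceq must be terminated by a semicolon")
    k = DEFAULT_K
    predicates = []
    # Assumption: quoted values never contain commas, so splitting on "," is safe.
    for part in text[:-1].split(","):
        match = PREDICATE.match(part)
        if match is None:
            raise ValueError(f"cannot parse predicate: {part!r}")
        attr, raw, quoted = match.group(1), match.group(2), match.group(3)
        if quoted is not None:
            value = quoted
        elif "." in raw:
            value = float(raw)
        else:
            value = int(raw)
        if attr.lower() == "k":
            k = int(value)  # k is not a predicate on the table
        else:
            predicates.append((attr, value))
    return k, predicates
```

For the second example input, parse_ceq("cylinders = 4, brand = 'ford';") returns (10, [("cylinders", 4), ("brand", "ford")]), since k is missing and the default applies.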

The basic query is:
SELECT * FROM autompg WHERE ceq;
where the commas in the ceq are replaced by AND and the k predicate is left out. For the first example input, this gives SELECT * FROM autompg WHERE brand = 'volkswagen';
The output consists of the top-k tuples according to some ranking principle.
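One way to run the basic query and keep only the top-k tuples is sketched below in Python with the standard sqlite3 module. The helper names basic_query and top_k, and the three-column autompg table in the demo, are hypothetical; the real table has more attributes. Attribute names are checked against the columns reported by PRAGMA table_info, because SQL parameters can only bind values, not column names. The scoring function is a placeholder for whatever ranking principle you choose.

```python
import heapq
import sqlite3


def basic_query(conn, predicates):
    """Run SELECT * FROM autompg WHERE a1 = ? AND a2 = ? ... for [(attr, value), ...]."""
    # Column names cannot be bound as parameters, so validate them against the schema.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(autompg)")}
    for attr, _ in predicates:
        if attr not in columns:
            raise ValueError(f"unknown attribute: {attr}")
    sql = "SELECT * FROM autompg"
    if predicates:
        sql += " WHERE " + " AND ".join(f"{attr} = ?" for attr, _ in predicates)
    return conn.execute(sql, [value for _, value in predicates]).fetchall()


def top_k(rows, score, k=10):
    """Return the k highest-scoring rows, best first."""
    return heapq.nlargest(k, rows, key=score)


# Demo on a hypothetical three-column version of the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE autompg (brand TEXT, cylinders INTEGER, mpg REAL)")
conn.executemany(
    "INSERT INTO autompg VALUES (?, ?, ?)",
    [("ford", 4, 25.0), ("ford", 8, 15.0), ("ford", 4, 28.0), ("volkswagen", 4, 31.0)],
)
rows = basic_query(conn, [("cylinders", 4), ("brand", "ford")])
# Placeholder ranking principle: prefer higher mpg.
print(top_k(rows, score=lambda row: row[2], k=1))
```

heapq.nlargest keeps at most k candidates in a heap, so selecting the top k of n tuples costs roughly O(n log k) instead of a full sort. In the zero-answers case rows is empty, and one would instead score tuples from the whole table (or a relaxed query) by similarity.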

Your program will do some preprocessing on the data and/or the workload. During this phase, a meta database (metadb) is constructed and filled. This metadb is used when answering the actual queries. Note that the metadb should be constructed only once, for the current contents of the database and before a batch of queries is processed.
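As an illustration of the preprocessing phase, the sketch below fills a hypothetical metadb table idf with inverse document frequencies for categorical attributes, using IDF(t) = log(n / F(t)), where n is the number of tuples and F(t) the number of tuples with value t. This follows the IDF idea of the article, but the schema, the function name build_idf_metadb and the choice of attributes are assumptions, and workload-based statistics are left out. Table and attribute names are interpolated into the SQL text, so they must come from your own program, never from user input.

```python
import math
import sqlite3


def build_idf_metadb(data_conn, meta_conn, table, attributes):
    """Fill a hypothetical metadb table idf(attribute, value, idf) with log(n / F(value))."""
    meta_conn.execute(
        "CREATE TABLE IF NOT EXISTS idf ("
        "attribute TEXT NOT NULL, value TEXT NOT NULL, idf REAL NOT NULL, "
        "PRIMARY KEY (attribute, value))"
    )
    n = data_conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for attr in attributes:
        # F(value): number of tuples having this value for attr (NULLs skipped).
        counts = data_conn.execute(
            f"SELECT {attr}, COUNT(*) FROM {table} WHERE {attr} IS NOT NULL GROUP BY {attr}"
        ).fetchall()
        meta_conn.executemany(
            "INSERT OR REPLACE INTO idf VALUES (?, ?, ?)",
            [(attr, str(value), math.log(n / freq)) for value, freq in counts],
        )
    meta_conn.commit()


# Demo on toy data, not the real autompg contents.
data = sqlite3.connect(":memory:")
data.execute("CREATE TABLE autompg (brand TEXT, cylinders INTEGER)")
data.executemany(
    "INSERT INTO autompg VALUES (?, ?)",
    [("ford", 4), ("ford", 8), ("ford", 4), ("volkswagen", 4)],
)
meta = sqlite3.connect(":memory:")
build_idf_metadb(data, meta, "autompg", ["brand"])
print(meta.execute("SELECT value, idf FROM idf ORDER BY value").fetchall())
```

In this toy data 'ford' occurs in three of four tuples and gets IDF log(4/3), about 0.288, while 'volkswagen' gets log 4, about 1.386, so a match on the rarer value weighs more. CREATE TABLE IF NOT EXISTS and INSERT OR REPLACE make an accidental rerun harmless, but the metadb is still meant to be built once per database state.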

Your software is expected to meet requirement [1] and at least one of requirements [2] and [3]. If it meets all three requirements, the maximum score is 10; otherwise, it is 9.

[1] deal with the similarity properties of numerical attributes;
[2] use sophisticated techniques for finding value similarities;
[3] use sophisticated techniques for top-k calculations.
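For requirement [1], one option is to replace exact value frequencies by a smoothed, kernel-based version, so that nearby numerical values count as similar. The sketch below uses a Gaussian kernel with the rule-of-thumb bandwidth h = 1.06 σ n^(-1/5), where σ is the standard deviation of the attribute values. This is in line with the article's treatment of numerical attributes, but check the exact definitions against the article. The function names are hypothetical, and the code assumes the attribute values are not all equal (otherwise σ = 0).

```python
import math
import statistics
import sys


def bandwidth(values):
    """Rule-of-thumb kernel bandwidth h = 1.06 * sigma * n^(-1/5)."""
    sigma = statistics.stdev(values)  # sample standard deviation; needs two or more values
    if sigma == 0:
        raise ValueError("all values are equal; bandwidth would be zero")
    return 1.06 * sigma * len(values) ** -0.2


def numeric_idf(values, q, h):
    """IDF of numerical value q: log(n / sum_i exp(-0.5 * ((t_i - q) / h)^2))."""
    density = sum(math.exp(-0.5 * ((t - q) / h) ** 2) for t in values)
    # Clamp to avoid log(0) when q lies far outside the data range.
    density = max(density, sys.float_info.min)
    return math.log(len(values)) - math.log(density)


def numeric_similarity(t, q, h, idf_q):
    """Similarity of a tuple value t to query value q, weighted by IDF(q)."""
    return math.exp(-0.5 * ((t - q) / h) ** 2) * idf_q


mpg = [15.0, 18.0, 25.0, 28.0, 31.0]  # toy data, not the real autompg column
h = bandwidth(mpg)
q = 27.0
idf_q = numeric_idf(mpg, q, h)
ranked = sorted(mpg, key=lambda t: numeric_similarity(t, q, h, idf_q), reverse=True)
print(ranked)
```

Values closer to the query value q = 27 receive a higher similarity, so the ranking is [28.0, 25.0, 31.0, 18.0, 15.0], even though no tuple has exactly mpg = 27. Such a query would otherwise be a zero-answers case.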

The deliverables are:

  • A text file metadb.txt, containing the data definitions of your meta database, which is
    filled during the preprocessing
  • A text file metaload.txt, containing the SQL statements used to fill the metadb
  • A C# program that computes and fills the contents of the metadb by preprocessing the data and the workload
  • A C# program that processes the queries
  • A description of your approach, explaining your choices and describing your experiences. It contains a class diagram of your second program and an extensive discussion of how you solve both the zero-answers and the many-answers problem. Format: PDF; at most 10 pages.
  • Finally, the description contains a link to an online video clip in which you demonstrate the second program running. Prepare it well by choosing example queries that give clear insight into your approach.

Zip all deliverables before submitting. The deadline for assignment 1 is Tuesday, May 26, 23:55. Prepare test queries that are meaningful and illustrative.