General

You are expected to work in pairs.

Although this page is in English, you are allowed to write your report in Dutch.

Assignment 1

For assignment 1, you will have to establish a connection between a C# program and an SQLite database. Have a look at the blog of Tigran Gasparian. Make sure that you have finished these preparations in week 19.

We expect a short report (PDF) on your approach by Sunday, May 14, 2017, 23:55. In this report, you should sketch the architecture of your program (its modules) and give a schema design for the meta database.

Ranking on query results

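The goal of this assignment is to implement the ideas of the article by Agrawal, Chaudhuri et al. This means you have to solve both the zero-answers problem (the query returns no tuples, so near matches must be ranked) and the many-answers problem (too many tuples match, so the matches themselves must be ranked). We provide a table and a query workload.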
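As a minimal sketch of the IDF similarity for a categorical attribute, assuming our reading of the Agrawal et al. paper (IDF(v) = log(n / F(v)), where n is the number of tuples and F(v) the number of tuples whose attribute equals v), the C# below computes these weights for one column. The class name CategoricalIdf and its methods are hypothetical. Because a tuple that matches only some of the query's predicates still receives a nonzero score, summing such similarities over the query attributes already yields a ranking when the exact query has no answers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: IDF-based similarity for one categorical attribute,
// following the scheme IDF(v) = log(n / F(v)) as we read it in the paper.
public static class CategoricalIdf
{
    // Computes IDF(v) for every distinct value v in the column.
    // Assumes the column contains no NULL values.
    public static Dictionary<string, double> Compute(IReadOnlyList<string> column)
    {
        int n = column.Count;
        return column
            .GroupBy(v => v)
            .ToDictionary(g => g.Key, g => Math.Log((double)n / g.Count()));
    }

    // S(t, q) = IDF(q) if the tuple value equals the query value, otherwise 0.
    public static double Similarity(string tupleValue, string queryValue,
                                    IReadOnlyDictionary<string, double> idf)
    {
        if (tupleValue != queryValue) return 0.0;
        return idf.TryGetValue(queryValue, out double w) ? w : 0.0;
    }
}
```

Rare values get a higher weight than common ones, so a match on brand = 'volkswagen' counts for more than a match on a very common brand.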

The program should be able to process conjunctive equality queries (ceq) on the table. A ceq consists of predicates of the form attr = value, separated by commas and terminated by a semicolon. Each input consists of a ceq and is read through a (simple) GUI. The desired value of k, the number of tuples to return, is entered with the same syntax, as a predicate k = value; when it is missing, use the default k = 10. Example inputs:

k = 6, brand = 'volkswagen';
cylinders = 4, brand = 'ford';
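A minimal parser for this input syntax might look as follows. It is a sketch under simplifying assumptions: values never contain commas or semicolons, string values are wrapped in single quotes, and the names ParsedQuery and CeqParser are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical result type: the k value plus the remaining attr = value predicates.
public sealed class ParsedQuery
{
    public int K { get; set; } = 10; // default k when the input omits it
    public List<KeyValuePair<string, string>> Predicates { get; } =
        new List<KeyValuePair<string, string>>();
}

public static class CeqParser
{
    public static ParsedQuery Parse(string input)
    {
        string text = input.Trim();
        if (!text.EndsWith(";", StringComparison.Ordinal))
            throw new FormatException("A ceq must be terminated by a semicolon.");
        text = text.Substring(0, text.Length - 1);

        var result = new ParsedQuery();
        foreach (string part in text.Split(','))
        {
            int eq = part.IndexOf('=');
            if (eq < 0)
                throw new FormatException($"Not an attr = value predicate: '{part}'");
            string attr = part.Substring(0, eq).Trim().ToLowerInvariant();
            string value = part.Substring(eq + 1).Trim();

            if (attr == "k")
                result.K = int.Parse(value, CultureInfo.InvariantCulture);
            else
                result.Predicates.Add(new KeyValuePair<string, string>(attr, value.Trim('\'')));
        }
        return result;
    }
}
```

For the first example input, Parse returns K = 6 and the single predicate (brand, volkswagen); for the second, K keeps its default of 10.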

The basic query is
SELECT * FROM autompg WHERE ceq;
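In the ceq, the commas are replaced by AND and the k predicate is left out. For example, the second input above yields SELECT * FROM autompg WHERE cylinders = 4 AND brand = 'ford';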
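One way to build the basic query from parsed predicates is sketched below. The class name BasicQueryBuilder and the parameter names @p0, @p1, ... are assumptions. Values are bound as parameters rather than pasted into the SQL text, so a quote in user input cannot break the statement; attribute names cannot be parameterized and should be checked against the known columns of autompg before calling this.

```csharp
using System.Collections.Generic;

// Hypothetical builder for the basic query SELECT * FROM autompg WHERE ceq;
// Attribute names are assumed to be validated against the table schema already.
public static class BasicQueryBuilder
{
    public static (string Sql, Dictionary<string, string> Parameters) Build(
        IReadOnlyList<KeyValuePair<string, string>> predicates)
    {
        var parameters = new Dictionary<string, string>();
        var conditions = new List<string>();
        for (int i = 0; i < predicates.Count; i++)
        {
            string name = "@p" + i;                       // one placeholder per predicate
            conditions.Add($"{predicates[i].Key} = {name}");
            parameters[name] = predicates[i].Value;       // kept as a string for simplicity
        }

        string sql = "SELECT * FROM autompg";
        if (conditions.Count > 0)
            sql += " WHERE " + string.Join(" AND ", conditions); // commas become AND
        return (sql + ";", parameters);
    }
}
```

The returned parameter map can then be bound with whatever SQLite client library the program uses.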
The output consists of the top-k tuples according to your ranking function.
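As a baseline, the top-k tuples can be found by scoring every candidate and keeping the k best, as in the sketch below (the class name TopK is hypothetical). This full scan plus sort costs O(n log n); requirement [3] asks for something more sophisticated, for example an adaptation of Fagin's Threshold Algorithm over attribute lists sorted by similarity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical baseline: score all tuples, sort by score, keep the first k.
public static class TopK
{
    public static List<T> Select<T>(IEnumerable<T> tuples, Func<T, double> score, int k)
    {
        return tuples
            .OrderByDescending(score) // LINQ ordering is stable: ties keep input order
            .Take(k)                  // fewer than k results if there are fewer tuples
            .ToList();
    }
}
```

With a stable sort, ties are resolved by input order; a tie-breaker such as a workload statistic can be added with ThenByDescending.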

Your program will preprocess the data and/or the workload. During this phase, a meta database (metadb) is constructed and filled; it is then used when answering the actual queries. Note that the metadb should be constructed only once for the current contents of the database, before a batch of queries is processed.
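A typical workload statistic to precompute is the query frequency of each value. The sketch below assumes the definition QF(v) = RQF(v) / RQFMax, with RQF(v) the number of times v occurs in workload predicates and RQFMax taken per attribute. It also assumes the workload has already been parsed into (attribute, value) pairs, since the workload file format is not described here; the class name WorkloadStats is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical preprocessing step: per-attribute query frequency from the workload.
public static class WorkloadStats
{
    public static Dictionary<(string Attr, string Value), double> QueryFrequency(
        IEnumerable<IEnumerable<(string Attr, string Value)>> workload)
    {
        // RQF: how often each (attribute, value) pair occurs in workload predicates.
        var rqf = new Dictionary<(string Attr, string Value), int>();
        foreach (var query in workload)
            foreach (var pred in query)
                rqf[pred] = rqf.TryGetValue(pred, out int c) ? c + 1 : 1;

        // RQFMax per attribute: the count of that attribute's most frequent value.
        var maxPerAttr = rqf
            .GroupBy(e => e.Key.Attr)
            .ToDictionary(g => g.Key, g => g.Max(e => e.Value));

        // QF in (0, 1]: 1 for the most frequently queried value of each attribute.
        return rqf.ToDictionary(e => e.Key, e => (double)e.Value / maxPerAttr[e.Key.Attr]);
    }
}
```

The resulting values are small enough to store directly in a metadb table and are computed once, in line with the requirement above.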

Your software must meet requirement [1] and at least one of requirements [2] and [3]. If you meet all three requirements, the maximum score is 10; otherwise it is 9.

[1] deal with similarity properties of numerical attributes
[2] use sophisticated techniques for finding value-similarities
[3] use sophisticated techniques for top-k calculations
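For requirement [1], one option is to treat a numerical attribute with a Gaussian kernel, as the Agrawal et al. paper does in our reading of it: bandwidth h = 1.06 σ n^(-1/5), IDF(q) = log(n / Σᵢ exp(-½((tᵢ - q)/h)²)), and S(t, q) = exp(-½((t - q)/h)²) · IDF(q). The sketch below implements exactly these formulas; the class name NumericSimilarity is hypothetical, and it assumes the column is not constant (otherwise h = 0).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: kernel-based similarity for one numerical attribute.
public static class NumericSimilarity
{
    // Bandwidth h = 1.06 * sigma * n^(-1/5) (Silverman's rule of thumb).
    // Assumes a non-constant column, so that sigma > 0.
    public static double Bandwidth(IReadOnlyList<double> column)
    {
        int n = column.Count;
        double mean = column.Average();
        double sigma = Math.Sqrt(column.Sum(x => (x - mean) * (x - mean)) / n);
        return 1.06 * sigma * Math.Pow(n, -0.2);
    }

    // Gaussian kernel: 1 when t == q, decaying as |t - q| grows relative to h.
    public static double Kernel(double t, double q, double h)
    {
        double z = (t - q) / h;
        return Math.Exp(-0.5 * z * z);
    }

    // IDF(q) = log(n / sum_i K(t_i, q)): values in dense regions get a low weight.
    // If q lies far from every stored value, the sum approaches 0 and IDF grows large.
    public static double Idf(IReadOnlyList<double> column, double q, double h)
    {
        double density = column.Sum(t => Kernel(t, q, h));
        return Math.Log(column.Count / density);
    }

    // S(t, q) = K(t, q) * IDF(q)
    public static double Similarity(double t, double q, double h, double idfQ)
        => Kernel(t, q, h) * idfQ;
}
```

Since h and IDF(q) for every value occurring in the table depend only on the data, they are natural candidates for the metadb; at query time only the kernel has to be evaluated.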

The deliverables are:

  • a text file metadb.txt, containing the data definitions for your meta database (which is filled during the preprocessing)
  • a text file metaload.txt, containing the SQL statements used to fill the metadb
  • a C# program that determines and fills the contents of the metadb by preprocessing the data and the workload
  • a C# program that processes the queries
  • a description of your approach, explaining your choices and describing your experiences (we do not expect a user manual). Format: PDF, at most 6 pages.
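The statements for metaload.txt can be generated by the preprocessing program itself. The sketch below assumes a hypothetical meta table with columns (attribute, value, weight); the class name MetaLoadWriter is also hypothetical. It escapes single quotes by doubling them and formats numbers with the invariant culture: on a machine with Dutch regional settings, a plain double.ToString() would write a decimal comma and produce invalid SQL.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical generator for metaload.txt: one INSERT per (attribute, value, weight).
public static class MetaLoadWriter
{
    public static string BuildInserts(string table,
        IEnumerable<(string Attr, string Value, double Weight)> rows)
    {
        var sb = new StringBuilder();
        foreach (var (attr, value, weight) in rows)
        {
            sb.Append("INSERT INTO ").Append(table)
              .Append(" VALUES ('").Append(Escape(attr)).Append("', '")
              .Append(Escape(value)).Append("', ")
              // "R" keeps full precision; InvariantCulture forces a decimal point.
              .Append(weight.ToString("R", CultureInfo.InvariantCulture))
              .Append(");\n");
        }
        return sb.ToString();
    }

    // SQL string literals escape a single quote by doubling it.
    private static string Escape(string s) => s.Replace("'", "''");
}
```

Writing the result with File.WriteAllText gives a metaload.txt that can be replayed against an empty metadb whenever the data changes.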

Zip all deliverables into a single archive before submitting. The deadline for assignment 1 is Sunday, May 28, 2017, 23:55. You may be asked to demonstrate your software to one of the assistants, so prepare meaningful, illustrative test queries.