Title: Axioms for independence
Student: vacancy
Supervisor: Peter de Waal
ECTS: 7.5 or 15
Related courses: Probabilistic Reasoning
Description: For a set of variables, we can define an independence relation as a set of triplets (X,Y|Z), the so-called independence statements. The statement (X,Y|Z) can be interpreted as "variables X and Y are independent given Z". For this relation to represent a notion of independence that corresponds to, for instance, the independence we know from random variables, the set of triplets must obey a number of axioms, for instance symmetry: if (X,Y|Z) is in the relation, then so is (Y,X|Z). Any independence relation can be characterised by a small basic set of statements, the kernel. Every statement in the relation can then be derived by (repeated) application of the axioms to the kernel.

There exist algorithms to determine the axiomatic closure of a given kernel. Two algorithms have also been developed to tackle the representation problem the other way round: they determine the kernel given a set of independence statements. We have no good insight, however, into the complexity of the full set of independence statements given only the kernel statements, and vice versa. We would also like to investigate the complexity of these algorithms and to gain insight into which properties of independence relations influence it. A possible topic for this experimentation project is to implement some of these algorithms and to investigate their complexity. Another topic is to investigate which properties of an independence relation determine the size of the kernel, and which properties of the kernel determine the size of the axiomatic closure.
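To make the notion of axiomatic closure concrete, here is a minimal Python sketch. It assumes the semi-graphoid axioms (symmetry, decomposition, weak union and contraction) as the axiom set, since the description above names only symmetry explicitly, and it computes the closure by naive fixpoint iteration rather than by any of the existing closure algorithms. The helper names `stmt`, `one_step` and `closure` are hypothetical.

```python
from itertools import combinations

# A statement (X, Y | Z) is stored as a tuple of three disjoint frozensets:
# (X, Y, Z) reads "X and Y are independent given Z".

def stmt(x, y, z=""):
    """Hypothetical helper: build a statement from strings of one-letter variable names."""
    return (frozenset(x), frozenset(y), frozenset(z))

def nonempty_subsets(s):
    """All non-empty subsets of the set s."""
    items = sorted(s)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def one_step(t):
    """Statements derivable from t alone by symmetry, decomposition and weak union."""
    x, y, z = t
    derived = {(y, x, z)}                     # symmetry: (X,Y|Z) -> (Y,X|Z)
    for w in nonempty_subsets(y):
        rest = y - w
        if rest:
            derived.add((x, rest, z))         # decomposition: (X,Y∪W|Z) -> (X,Y|Z)
            derived.add((x, rest, z | w))     # weak union: (X,Y∪W|Z) -> (X,Y|Z∪W)
    return derived

def closure(kernel):
    """Naive fixpoint: apply the semi-graphoid axioms until nothing new appears."""
    closed = set(kernel)
    while True:
        derived = set()
        for t in closed:
            derived |= one_step(t)
        # contraction: (X,Y|Z) and (X,W|Z∪Y) -> (X,Y∪W|Z)
        for x1, y1, z1 in closed:
            for x2, w, z2 in closed:
                if x1 == x2 and z2 == z1 | y1:
                    derived.add((x1, y1 | w, z1))
        if derived <= closed:
            return closed
        closed |= derived

kernel = {stmt("A", "BC")}
result = closure(kernel)
print(len(result))                      # 10
print(stmt("A", "B", "C") in result)    # True
```

Starting from the single kernel statement (A, BC | ∅), the sketch derives 10 statements, among them (A, B | C) and (C, A | B). The pairwise pass needed for contraction makes each iteration quadratic in the number of statements found so far, so the cost of this naive approach grows with the size of the closure itself.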
============================================
Title: Complexity in Multi-dimensional Bayesian Classifiers
Student: vacancy
Supervisor: Peter de Waal
ECTS: 7.5 or 15
Related courses: Probabilistic Reasoning, Advanced Data Mining
Description: Bayesian network classifiers are popular tools in machine learning for classifying data instances. A classifier tries to determine the best guess for the class of a given data instance, given observations of the other features of that instance. A Bayesian spam filter, for instance, tries to compute the probability that an email message is spam, based on the occurrence of words or other features in the body of the message.

Until now, Bayesian network classifiers have usually dealt with one-dimensional classification: the class of a data instance is represented by a single class variable. Recently we have developed an extension of Bayesian classifiers in which the variable to be classified is multi-dimensional. An example is a classifier in the medical domain for oesophageal cancer, which tries to classify both the size of the tumor and the extent to which the tumor has invaded other organs. Multi-dimensional classifiers are expected to use fewer parameters than their one-dimensional counterparts. This will hopefully lead to better estimates of these parameters and to improved classification performance.

The topic of this research project is to investigate whether, or when, multi-dimensional classifiers indeed need fewer parameters, and for which types of networks this occurs. To this end, the classification algorithms have to be analysed on existing data sets, and possibly on artificial data sets. For this project the software tool Dazzle will be used. Dazzle has been developed at the ICS department to model and analyse Bayesian networks, including classifiers.
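Where a parameter saving could come from can be illustrated by simple counting. The Python sketch below is a toy under stated assumptions, not the department's model or Dazzle's: the one-dimensional counterpart is taken to be a naive Bayes classifier over a single compound class variable (the Cartesian product of all class variables), and the multi-dimensional classifier is assumed to have no arcs among the class variables or among the features, so that only the class-to-feature arcs vary. The function names and cardinalities are hypothetical.

```python
from math import prod

def cpt_size(card, parent_cards):
    """Free parameters of one CPT: (card - 1) for every joint parent configuration."""
    return (card - 1) * prod(parent_cards)

def compound_nb_params(class_cards, feature_cards):
    """One-dimensional naive Bayes: all class variables merged into one compound class."""
    k = prod(class_cards)                       # states of the compound class variable
    return cpt_size(k, []) + sum(cpt_size(f, [k]) for f in feature_cards)

def multi_dim_params(class_cards, feature_cards, feature_parents):
    """Multi-dimensional classifier with no arcs among class variables or among features;
    feature_parents[i] lists the indices of the class variables that feature i depends on."""
    classes = sum(cpt_size(c, []) for c in class_cards)
    features = sum(
        cpt_size(f, [class_cards[j] for j in parents])
        for f, parents in zip(feature_cards, feature_parents)
    )
    return classes + features

# Hypothetical example: two class variables with 3 and 4 states, five binary features.
class_cards = [3, 4]
feature_cards = [2] * 5
print(compound_nb_params(class_cards, feature_cards))                           # 71
print(multi_dim_params(class_cards, feature_cards, [[0], [0], [1], [1], [1]]))  # 23
print(multi_dim_params(class_cards, feature_cards, [[0, 1]] * 5))               # 65
```

In this toy setting the large saving appears only when each feature depends on a few of the class variables; when every feature depends on every class variable (the last line), the saving shrinks to the difference in the class priors. Whether real data sets lead to such sparse structures is exactly the kind of question the project would address.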
=============================================
Title: Sensitivity in Dynamic Bayesian Networks
Student: vacancy
Supervisor: Peter de Waal
ECTS: 7.5 or 15
Related courses: Probabilistic Reasoning
Description: Dynamic Bayesian networks (DBNs) have become quite popular as an extension of classical Bayesian networks for modelling processes that evolve over time. When modelling with DBNs, it is common to assume that the unknown parameters of the DBN are homogeneous, i.e. that they do not change over time. Recently, research has started on sensitivity analysis for this type of network. We conjecture that sensitivity analysis can also be used to study the effect of relaxing the homogeneity assumption. In this experimentation project you would investigate different approaches to using sensitivity analysis for this purpose. This would be done on benchmark data sets, using our in-house analysis tool Dazzle.
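The effect of the homogeneity assumption on a sensitivity analysis can be illustrated on the simplest possible DBN: a single binary state variable per time slice (a Markov chain) with no evidence entered. The Python sketch below is a toy under these assumptions, not the approach taken by Dazzle or by the research mentioned above; the function names and numbers are hypothetical. It performs a one-way sensitivity analysis on the transition parameter a = P(X_{t+1} = 1 | X_t = 1), either varying it in all slices at once, as the homogeneity assumption demands, or in a single slice only.

```python
def prob_state1(p0, a_seq, b):
    """P(X_T = 1) in a binary Markov chain without evidence.
    p0 = P(X_0 = 1); a_seq[t] = P(X_{t+1} = 1 | X_t = 1); b = P(X_{t+1} = 1 | X_t = 0)."""
    p = p0
    for a in a_seq:
        p = p * a + (1 - p) * b
    return p

def sensitivity_function(theta, p0, a, b, T, step=None):
    """Output P(X_T = 1) when theta replaces the parameter a.
    step=None: theta is used in every slice (homogeneous model);
    step=s: theta is used only in slice s, all other slices keep a."""
    a_seq = [theta if step is None or t == step else a for t in range(T)]
    return prob_state1(p0, a_seq, b)

# Hypothetical numbers, chosen only for illustration.
p0, a, b, T = 0.5, 0.9, 0.2, 3
for theta in (0.0, 0.5, 1.0):
    tied = sensitivity_function(theta, p0, a, b, T)
    single = sensitivity_function(theta, p0, a, b, T, step=1)
    print(f"theta={theta:.1f}  tied={tied:.4f}  slice 1 only={single:.4f}")
```

Because each time step maps P(X_t = 1) affinely to P(X_{t+1} = 1), the output is a linear function of θ when only one slice is varied, whereas tying θ across all T slices yields a polynomial in θ of degree at most T. In this toy case, relaxing homogeneity thus changes the form of the sensitivity functions that have to be studied.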