Holmes:HuntingForHaskellFrauds

Stc

-- BrianVermeer - 16 Jun 2010

Date: 2010-06-24

Time: 12:00

Room: BBL 023

Speaker: Brian Vermeer

Title: Holmes: Hunting for Haskell Frauds

Abstract:
For the functional programming language Haskell, no dedicated tool is available for detecting plagiarism in source code. More widely used programming languages, such as Java, do have tool support for checking plagiarism. Such support would be especially useful for educational institutions, which need to check large batches of submissions for plagiarism.

Before checking Haskell submissions for plagiarism, we first need to establish how to do so: what do we need to compare, how do we compare it, and can we automate this procedure?

A possible solution for comparing Haskell programs for plagiarism already existed at the start of this thesis: a tool called MOSS can compare Haskell programs. However, MOSS is based on a universal technique rather than being designed specifically for Haskell, and therefore does not exploit the specific characteristics of the language. This thesis focuses on how to detect possible plagiarism in Haskell submissions by using the characteristics of the language. To this end, we created a tool, based on Helium, that parses the source and applies various heuristics to compare the sources. This program, called Holmes, consists of a pre-processor that normalises the source and a comparison tool that compares the normalised sources. The implemented heuristics are divided into three categories: structural, semantic and literal. All heuristics are applied to both prepared and unprepared test sets, to both verify and validate the outcome.
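The split between a normalising pre-process and a compare step can be illustrated with a small sketch. This is not Holmes: Holmes parses the source with Helium, whereas the sketch below assumes a much simpler token-level stand-in. The normalisation shown (stripping comments and consistently renaming identifiers) and the comparison (Jaccard similarity of token trigrams, a simple literal-style heuristic) are assumptions chosen for illustration, and the names `strip_comments`, `normalise` and `similarity` are hypothetical.

```python
import re

# Assumed subset of Haskell reserved words that are kept verbatim.
KEYWORDS = {
    "case", "class", "data", "deriving", "do", "else", "if", "import",
    "in", "instance", "let", "module", "newtype", "of", "then", "type", "where",
}

# Tokens: identifiers (with primes), integers, operator runs, punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_']*|\d+|[!#$%&*+./<=>?@\\^|~:-]+|[()\[\],;`{}]")


def strip_comments(src: str) -> str:
    """Remove nested {- ... -} block comments, then -- line comments."""
    out, depth, i = [], 0, 0
    while i < len(src):
        if src.startswith("{-", i):
            depth += 1
            i += 2
        elif depth > 0 and src.startswith("-}", i):
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a block comment: skip the character
        else:
            out.append(src[i])
            i += 1
    # Simplification: also cuts at "--" inside strings or operators like "-->".
    return re.sub(r"--.*", "", "".join(out))


def normalise(src: str) -> list[str]:
    """Pre-process: strip comments, tokenise, rename identifiers consistently."""
    names: dict[str, str] = {}
    tokens = []
    for tok in TOKEN_RE.findall(strip_comments(src)):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            # The first distinct identifier becomes v0, the next v1, and so on.
            names.setdefault(tok, f"v{len(names)}")
            tokens.append(names[tok])
        else:
            tokens.append(tok)
    return tokens


def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Compare: Jaccard similarity of the token n-grams of both normalised sources."""
    ga, gb = ngrams(normalise(a), n), ngrams(normalise(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


original = """
-- Sum a list
total :: [Int] -> Int
total [] = 0
total (x:xs) = x + total xs
"""

renamed = """
{- my own solution -}
mySum :: [Int] -> Int
mySum [] = 0
mySum (y:ys) = y + mySum ys
"""

print(similarity(original, renamed))  # 1.0
```

Renaming `total`, `x` and `xs` and replacing the comment leaves the score unchanged, which shows why normalising before comparing matters. Working on the parsed syntax tree, as Holmes does with Helium, gives the pre-processor far more structure to normalise than this flat token stream.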

At the end of this project, the outcome of the heuristics implemented in Holmes showed that only a few heuristics give useful results when looking for plagiarism in Haskell sources. The pre-processor turns out to be a very important link in this process. The implementation shows that possible software plagiarism can be detected automatically in a functional language like Haskell.