Best-OnlyParsingForNaturalLanguages

Stc
ComputingScienceColloquium

Date: May 13

Time: 11:00

Room: BBL 509

Speaker: Kees Koster

Title: Best-Only Parsing for Natural Languages (slides in pdf)

Abstract

Rule-based grammars for Natural Languages are nondeterministic, due to syntactical and lexical ambiguity. Furthermore, they are often Attributed rather than Context Free and weighted by penalties or probabilities, making it hard to construct efficient parsers. In this talk we discuss how to construct Top-Down parsers for such grammars in spite of these complications.

We first show how to optimize nondeterministic Context-Free parsing by means of memoization and then generalize this technique to certain attribute grammars. This makes the parsing process itself polynomial, but the generation of all parse trees may still demand exponential time.

Although linguistic research may demand an enumeration of all parse trees, NLP applications usually require only the best analysis among all possible analyses of a sentence. Based on this observation, we introduce the Best-Only Parsing algorithm (BOP) for finding the single best analysis according to a weighted Attributed grammar.

As an illustration, we consider an English grammar developed for Information Retrieval applications, for which only a single most probable analysis of the input is required. Using BOP, the time to find the single most probable analysis is dominated by the lexicalization time and depends practically linearly on the sentence length.