Best-OnlyParsingForNaturalLanguages
Stc
ComputingScienceColloquium
Date: May 13
Time: 11:00
Room: BBL 509
Title: Best-Only Parsing for Natural Languages (slides in pdf)
Abstract
Rule-based grammars for Natural Languages are nondeterministic,
due to syntactical and lexical ambiguity. Furthermore, they are often
Attributed rather than Context Free and weighted by penalties or
probabilities, making it hard to construct efficient parsers.
In this talk we discuss how to construct Top-Down parsers for such
grammars in spite of these complications.
We first show how to optimize nondeterministic Context-Free parsing
by means of memoization and then generalize this technique to certain
attribute grammars. This makes the parsing process itself polynomial,
but the generation of all parse trees may still demand exponential
time.
Although linguistic research may demand an enumeration of all parse
trees, NLP applications usually require only the best analysis among
all possible analyses of a sentence. Based on this observation, we
introduce the Best-Only Parsing algorithm (BOP) for finding the
single best analysis according to a weighted Attributed grammar.
As an illustration, we consider an English grammar developed for
Information Retrieval applications, for which only a single most
probable analysis of the input is required. Using BOP,
the time to find the single most probable analysis is dominated
by the lexicalization time and depends practically linearly on
the sentence length.