Home

uuagsmall.png

Projects

Resources

Parser Combinators FAQ

HUT

I am not interested in error correction, and want to stop parsing as soon as I encounter an error.

The parser produces a list of error messages; if you only print the first message on this list and do not force the evaluation of the parser in some other way (e.g. by asking how many errors there are in total), your parser will just stop at this first error, and hopefully with a good suggestion.

My parsing times are getting very large

Most likely you have an ambiguous grammar. For each ambiguous element the parsing time for the rest of the program doubles, quickly leading to exponential parsing times. Experience has shown that it is not always a trivial transformation to make your original grammar one that is not left-recursive. It is easy to make a mistake here. (see also the next question)

How can I choose one of the alternatives in case a grammar is ambiguous

It may be the case that you have several alternatives that are overlapping, and in that case you want to give preference to one of them. There is a function pCost that can be of help here. A typical example is if you use the combinators to define a scanner and you want to parse an identifier, but some of those identifiers are to be interpreted as keywords. The trick now is to append a small step with low cost to the end of the identifier, that makes that alternative a bit more expensive, and thus gives preference to the other alternative. In order to be more robust if we have an erroneaous input immediately after we have added costs to both alternatives in the example.

In the contenxt of:

pChar = 'a' <..> 'z'
pIdent = pList pChar

if_as_ident   = ((("This is the identifier: ") ++) <$> pIdent)
if_as_keyword = ((("This is the keyword: ")    ++) <$> pToks "if")
we get:
*Examples> test (if_as_ident <* pCost 2 <|> if_as_keyword <* pCost 1) "if"
"This is the keyword: if"

How can I debug my parser

The parser maintains an input state. probably the easiest way to find out what your parser is actually doing is to change the functions splitInput and splitInputE, which construct the next token from the input. If you do not have a problem you will see each token being asked for twice.

module Test where
import UU.Parsing
import Debug.Trace
import UU.Parsing.CharParser

data Tracing s = Tracing [s]

instance Show s => InputState (Tracing s) s (Maybe s) where
   splitStateE (Tracing  [])     = Right' (Tracing [])
   splitStateE (Tracing (s:ss))  = trace ("Inspecting " ++ show s  ++ "\n") (Left'  s (Tracing ss))
   splitState  (Tracing (s:ss))  = trace ("Inspecting " ++ show s  ++ "\n") (       s, Tracing ss)
   getPosition (Tracing  [])     = Nothing
   getPosition (Tracing (s:ss))  = Just s



-- parseIO' :: (Eq s, Show s, Symbol s) => Parser s a -> Tracing s -> IO a
parseIO' = parseIOMessage showMessage 
  where showMessage (Msg expecting position action)  
          =  let pos = case position of
                           Nothing -> "at end of file"
                           Just s  -> case action of 
                                Insert _ -> "before " ++ show s
                                Delete t -> "at " ++ show t  
             in "\n?? Error      : " ++ pos ++
                "\n?? Expecting  : " ++ show expecting ++
                "\n?? Repaired by: " ++ show action ++ "\n"                

a = pSym 'a'
b = pSym 'b'
c = pSym 'c'

ab = (\ a b -> [a,b]) <$> a <*> b
bc = (\ b c -> [b,c]) <$> b <*> c

abc = (:)                    <$> a  <*> bc
abc' = (\ ab c -> ab ++ [c]) <$> ab <*> c

test = parseIO' (abc <|> abc') (Tracing "abc")

As you can see every character (except the 'a' on which the parser forks) shows up 4 times now, indicating that you have two alternative parses:

*Test> test
Inspecting 'a'

Inspecting 'a'

Inspecting 'a'

Inspecting 'b'

Inspecting 'b'

Inspecting 'b'

Inspecting 'b'

Inspecting 'c'

Inspecting 'c'

Inspecting 'c'

Inspecting 'c'

"abc"

-- DoaitseSwierstra - 19 Jan 2007