| |||||||||||||||||||
| |||||||||||||||||||
Description | |||||||||||||||||||
By Chris Kuklewicz (haskell (at) list (dot) mightyreason (dot) com), 2006. This file is licensed under the LGPL (version 2, see the LICENSE file), because it is a derivative work. This DFAEngine takes the lazy transition table from Manuel Chakravarty's lexer in CTK Light, but uses it for simpler purposes. The original CTK code can be found here http://www.cse.unsw.edu.au/~chak/haskell/ctk/ Don Stewart (http://www.cse.unsw.edu.au/~dons/contact.html) also contributed to this code. I want the freedom to alter the types a bit, so this is a separate module. The CTK and DFA code can be thought of as three parts: 1. The ability to compose Regexp combinators which will lazily assemble a DFA. This is mainly bound up in the Cont type and the internal functions that merge it (exported as >||<). 2. The interface of how to specify Failure and Success. This was bound up in LexAction holding an function and is now lexAcceptlesFailurelexContinue. 3. The traversal engine. At each longer and longer match the last seen match is updated. Different traversals keep track of different levels of detail. As a descendent of the regex-dna entry at http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all, this module has contributions from Don Stewart, Alson Kemp, and Chris Kuklewicz. | |||||||||||||||||||
Synopsis | |||||||||||||||||||
Documentation | |||||||||||||||||||
data Lexer | |||||||||||||||||||
| |||||||||||||||||||
newtype LexAction | |||||||||||||||||||
| |||||||||||||||||||
type Regexp = Lexer -> Lexer | |||||||||||||||||||
a regular expression | |||||||||||||||||||
data Cont | |||||||||||||||||||
| |||||||||||||||||||
data Boundary | |||||||||||||||||||
| |||||||||||||||||||
emptyOp :: Regexp | |||||||||||||||||||
These create Regexp Empty lexeme (noop) | |||||||||||||||||||
char :: DoPa -> Char -> Regexp | |||||||||||||||||||
One character regexp | |||||||||||||||||||
alt :: DoPa -> [Char] -> Regexp | |||||||||||||||||||
accepts a list of alternative characters Equiv. to `(foldr1 (>|<) . map char) cs', but much faster | |||||||||||||||||||
altNot :: DoPa -> [Char] -> Regexp | |||||||||||||||||||
accepts an inverted list of alternative characters Equiv. to `(foldr1 (>|<) . map char) cs', but much faster | |||||||||||||||||||
allChar :: DoPa -> Regexp | |||||||||||||||||||
accepts any character | |||||||||||||||||||
beginLine :: DoPa -> Regexp | |||||||||||||||||||
endLine :: DoPa -> Regexp | |||||||||||||||||||
beginInput :: DoPa -> Regexp | |||||||||||||||||||
endInput :: DoPa -> Regexp | |||||||||||||||||||
(>|<) :: Regexp -> Regexp -> Regexp | |||||||||||||||||||
disjunctive combination of two regexps, corresponding to x|y. This will find the longest match | |||||||||||||||||||
orRE :: [Regexp] -> Regexp | |||||||||||||||||||
(+>) :: Regexp -> Regexp -> Regexp | |||||||||||||||||||
These combine two Regexp's Concatenation of regexps is just concatenation of functions x +> y corresponds to xy | |||||||||||||||||||
concatRE :: [Regexp] -> Regexp | |||||||||||||||||||
quest :: Regexp -> Regexp -> Regexp | |||||||||||||||||||
x quest y corresponds to the regular expression x?y | |||||||||||||||||||
star :: Regexp -> Regexp -> Regexp | |||||||||||||||||||
x star y corresponds to the regular expression x*y self is of type Lexer | |||||||||||||||||||
plus :: Regexp -> Regexp -> Regexp | |||||||||||||||||||
x plus y corresponds to the regular expression x+y | |||||||||||||||||||
failure :: Lexer | |||||||||||||||||||
accept :: Regexp -> Lexer | |||||||||||||||||||
Have a match to Regexp be consider a success | |||||||||||||||||||
(>||<) :: Lexer -> Lexer -> Lexer | |||||||||||||||||||
disjunctive combination of two lexers (longest match, right biased) | |||||||||||||||||||
findRegex | |||||||||||||||||||
| |||||||||||||||||||
matchesRegex :: Lexer -> [Char] -> Bool | |||||||||||||||||||
This searches the input string for a match to the regex There is no need to wait for the longest match, so stop at first lexAccept | |||||||||||||||||||
testHere | |||||||||||||||||||
| |||||||||||||||||||
countRegex :: Lexer -> [Char] -> Int | |||||||||||||||||||
This counts the number of matches to regex in the string, (it checks each possible starting position). This should be the same as ((length (splitRegex re input))-1) but more efficient ^^^ fix | |||||||||||||||||||
findRegexS :: Lexer -> String -> (String, Maybe (String, Int, String)) | |||||||||||||||||||
This is a version of findRegex that does not compute the length of the prefix | |||||||||||||||||||
peek :: Cont -> Char -> Lexer | |||||||||||||||||||
inBounds :: Char -> BoundsNum -> Bool | |||||||||||||||||||
lexFailure :: LexAction | |||||||||||||||||||
lexContinue :: LexAction | |||||||||||||||||||
lexAccept :: LexAction | |||||||||||||||||||
Produced by Haddock version 0.8 |