| Copyright | (c) Chris Penner 2019 |
|---|---|
| License | BSD3 |
| Safe Haskell | None |
| Language | Haskell2010 |
Control.Lens.Regex
Contents
Description
Synopsis
- regex :: Regex -> IndexedTraversal' Int Text (Match Text)
- regexBS :: Regex -> IndexedTraversal' Int ByteString (Match ByteString)
- match :: Monoid text => Traversal' (Match text) text
- groups :: Traversal' (Match text) [text]
- matchAndGroups :: Monoid text => Getter (Match text) (text, [text])
- rx :: QuasiQuoter
- mkRegexQQ :: [PCREOption] -> QuasiQuoter
- compile :: ByteString -> [PCREOption] -> Regex
- compileM :: ByteString -> [PCREOption] -> Either String Regex
- type Match text = [Either text text]
- data Regex
Combinators
regex :: Regex -> IndexedTraversal' Int Text (Match Text) Source #
The base combinator for doing regex searches.
It's a traversal which selects Matches; you can compose it with match or groups
to get the relevant parts of your match.
>>>txt = "raindrops on roses and whiskers on kittens" :: Text
Search
>>>has (regex [rx|whisk|]) txtTrue
Get matches
>>>txt ^.. regex [rx|\br\w+|] . match["raindrops","roses"]
Edit matches
>>>txt & regex [rx|\br\w+|] . match %~ T.intersperse '-' . T.toUpper"R-A-I-N-D-R-O-P-S on R-O-S-E-S and whiskers on kittens"
Get Groups
>>>txt ^.. regex [rx|(\w+) on (\w+)|] . groups[["raindrops","roses"],["whiskers","kittens"]]
Edit Groups
>>>txt & regex [rx|(\w+) on (\w+)|] . groups %~ Prelude.reverse"roses on raindrops and kittens on whiskers"
Get the third match
>>>txt ^? regex [rx|\w+|] . index 2 . matchJust "roses"
Match integers, Read them into ints, then sort them in-place
dumping them back into the source text afterwards.
>>>"Monday: 29, Tuesday: 99, Wednesday: 3" & partsOf (regex [rx|\d+|] . match . unpacked . _Show @Int) %~ sort"Monday: 3, Tuesday: 29, Wednesday: 99"
To alter behaviour of the regex you may wish to pass PCREOptions when compiling it.
The default behaviour may seem strange in certain cases; e.g. it operates in 'single-line'
mode. You can compile the Regex separately and add any options you like, then pass the resulting
Regex into regex;
Alternatively can make your own version of the QuasiQuoter with any options you want embedded
by using mkRegexQQ.
regexBS :: Regex -> IndexedTraversal' Int ByteString (Match ByteString) Source #
A version of regex which operates directly on ByteStrings.
This is more efficient than using regex as it avoids converting back and forth
between ByteString and Text.
match :: Monoid text => Traversal' (Match text) text Source #
Traverse each match
Get a match if one exists:
>>>"find a needle in a haystack" ^? regex [rx|n..dle|] . matchJust "needle"
Collect all matches
>>>"one _two_ three _four_" ^.. regex [rx|_\w+_|] . match["_two_","_four_"]
You can edit the traversal to perform a regex replace/substitution
>>>"one _two_ three _four_" & regex [rx|_\w+_|] . match %~ T.toUpper"one _TWO_ three _FOUR_"
groups :: Traversal' (Match text) [text] Source #
Access all groups of a match at once.
Note that you can edit the groups through this traversal,
Changing the length of the list has behaviour similar to partsOf.
Get all matched groups:
>>>"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups[["raindrops","roses"],["whiskers","kittens"]]
You can access a specific group by combining with ix
>>>"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups . ix 1["roses","kittens"]
groups is a traversal; you can mutate matches through it.
>>>"raindrops on roses and whiskers on kittens" & regex [rx|(\w+) on (\w+)|] . groups . ix 1 %~ T.toUpper"raindrops on ROSES and whiskers on KITTENS"
Editing the list rearranges groups
>>>"raindrops on roses and whiskers on kittens" & regex [rx|(\w+) on (\w+)|] . groups %~ Prelude.reverse"roses on raindrops and kittens on whiskers"
You can traverse the list to flatten out all groups
>>>"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups . traversed["raindrops","roses","whiskers","kittens"]
matchAndGroups :: Monoid text => Getter (Match text) (text, [text]) Source #
Collect both the match text AND all the matching groups
>>>"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . matchAndGroups[("raindrops on roses",["raindrops","roses"]),("whiskers on kittens",["whiskers","kittens"])]
Compiling regex
rx :: QuasiQuoter Source #
QuasiQuoter for compiling regexes.
This is just re re-exported under a different name so as not to conflict with re from
Lens
mkRegexQQ :: [PCREOption] -> QuasiQuoter #
Returns a QuasiQuoter like re, but with given PCRE options.
compile :: ByteString -> [PCREOption] -> Regex #
Compile a perl-compatible regular expression stored in a strict bytestring.
An example
let r = compile (pack "^(b+|a){1,2}?bc") []Or using GHC's -XOverloadedStrings flag, and importing Data.ByteString.Char8, we can avoid the pack:
let r = compile "^(b+|a){1,2}?bc" []If the regular expression is invalid, an exception is thrown.
If this is unsuitable, compileM is availlable, which returns failure
in a monad.
To do case insentive matching,
compile "^(b+|a){1,2}?bc" [caseless]Other flags are documented below.
The resulting abstract regular expression can be passed to match
for matching against a subject string.
The arguments are:
pat: A ByteString containing the regular expression to be compiled.flags, optional bit flags. IfNothingis provided, defaults are used.
Valid compile-time flags are:
anchored- Force pattern anchoringauto_callout- Compile automatic calloutsbsr_anycrlf- \R matches only CR, LF, or CRLFbsr_unicode- \R matches all Unicode line endingscaseless- Do caseless matchingdollar_endonly-$not to match newline at enddotall- matches anything including NLdupnames- Allow duplicate names for subpatternsextended- Ignore whitespace and # commentsextra- PCRE extra features (not much use currently)firstline- Force matching to be before newlinemultiline-^and$match newlines within datanewline_any- Recognize any Unicode newline sequencenewline_anycrlf- Recognize CR, LF, and CRLF as newline sequencesnewline_cr- Set CR as the newline sequencenewline_crlf- Set CRLF as the newline sequencenewline_lf- Set LF as the newline sequenceno_auto_capture- Disable numbered capturing parentheses (named ones available)ungreedy- Invert greediness of quantifiersutf8- Run in UTF-8 modeno_utf8_check- Do not check the pattern for UTF-8 validity
The regex is allocated via malloc on the C side, and will be deallocated by the runtime when the Haskell value representing it goes out of scope.
See 'man pcreapi for more details.
Caveats: patterns with embedded nulls, such as "0*" seem to be mishandled, as this won't currently match the subject "000".
compileM :: ByteString -> [PCREOption] -> Either String Regex #
Types
type Match text = [Either text text] Source #
Match represents a whole regex match; you can drill into it using match or groups or matchAndGroups
text is either Text or ByteString depending on whether you use regex or regexBS
Consider this to be internal; don't depend on its representation.