Copyright | (c) Chris Penner 2019 |
---|---|
License | BSD3 |
Safe Haskell | None |
Language | Haskell2010 |
Control.Lens.Regex
Contents
Description
Synopsis
- regex :: Regex -> IndexedTraversal' Int Text (Match Text)
- regexBS :: Regex -> IndexedTraversal' Int ByteString (Match ByteString)
- match :: Monoid text => Traversal' (Match text) text
- groups :: Traversal' (Match text) [text]
- matchAndGroups :: Monoid text => Getter (Match text) (text, [text])
- rx :: QuasiQuoter
- mkRegexQQ :: [PCREOption] -> QuasiQuoter
- compile :: ByteString -> [PCREOption] -> Regex
- compileM :: ByteString -> [PCREOption] -> Either String Regex
- type Match text = [Either text text]
- data Regex
Combinators
regex :: Regex -> IndexedTraversal' Int Text (Match Text) Source #
The base combinator for doing regex searches.
It's a traversal which selects Match
es; you can compose it with match
or groups
to get the relevant parts of your match.
>>>
txt = "raindrops on roses and whiskers on kittens" :: Text
Search
>>>
has (regex [rx|whisk|]) txt
True
Get matches
>>>
txt ^.. regex [rx|\br\w+|] . match
["raindrops","roses"]
Edit matches
>>>
txt & regex [rx|\br\w+|] . match %~ T.intersperse '-' . T.toUpper
"R-A-I-N-D-R-O-P-S on R-O-S-E-S and whiskers on kittens"
Get Groups
>>>
txt ^.. regex [rx|(\w+) on (\w+)|] . groups
[["raindrops","roses"],["whiskers","kittens"]]
Edit Groups
>>>
txt & regex [rx|(\w+) on (\w+)|] . groups %~ Prelude.reverse
"roses on raindrops and kittens on whiskers"
Get the third match
>>>
txt ^? regex [rx|\w+|] . index 2 . match
Just "roses"
Match integers, Read
them into ints, then sort them in-place
dumping them back into the source text afterwards.
>>>
"Monday: 29, Tuesday: 99, Wednesday: 3" & partsOf (regex [rx|\d+|] . match . unpacked . _Show @Int) %~ sort
"Monday: 3, Tuesday: 29, Wednesday: 99"
To alter behaviour of the regex you may wish to pass PCREOption
s when compiling it.
The default behaviour may seem strange in certain cases; e.g. it operates in 'single-line'
mode. You can compile
the Regex
separately and add any options you like, then pass the resulting
Regex
into regex
;
Alternatively can make your own version of the QuasiQuoter with any options you want embedded
by using mkRegexQQ
.
regexBS :: Regex -> IndexedTraversal' Int ByteString (Match ByteString) Source #
A version of regex
which operates directly on ByteString
s.
This is more efficient than using regex
as it avoids converting back and forth
between ByteString
and Text
.
match :: Monoid text => Traversal' (Match text) text Source #
Traverse each match
Get a match if one exists:
>>>
"find a needle in a haystack" ^? regex [rx|n..dle|] . match
Just "needle"
Collect all matches
>>>
"one _two_ three _four_" ^.. regex [rx|_\w+_|] . match
["_two_","_four_"]
You can edit the traversal to perform a regex replace/substitution
>>>
"one _two_ three _four_" & regex [rx|_\w+_|] . match %~ T.toUpper
"one _TWO_ three _FOUR_"
groups :: Traversal' (Match text) [text] Source #
Access all groups of a match at once.
Note that you can edit the groups through this traversal,
Changing the length of the list has behaviour similar to partsOf
.
Get all matched groups:
>>>
"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups
[["raindrops","roses"],["whiskers","kittens"]]
You can access a specific group by combining with ix
>>>
"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups . ix 1
["roses","kittens"]
groups
is a traversal; you can mutate matches through it.
>>>
"raindrops on roses and whiskers on kittens" & regex [rx|(\w+) on (\w+)|] . groups . ix 1 %~ T.toUpper
"raindrops on ROSES and whiskers on KITTENS"
Editing the list rearranges groups
>>>
"raindrops on roses and whiskers on kittens" & regex [rx|(\w+) on (\w+)|] . groups %~ Prelude.reverse
"roses on raindrops and kittens on whiskers"
You can traverse the list to flatten out all groups
>>>
"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups . traversed
["raindrops","roses","whiskers","kittens"]
matchAndGroups :: Monoid text => Getter (Match text) (text, [text]) Source #
Collect both the match text AND all the matching groups
>>>
"raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . matchAndGroups
[("raindrops on roses",["raindrops","roses"]),("whiskers on kittens",["whiskers","kittens"])]
Compiling regex
rx :: QuasiQuoter Source #
QuasiQuoter
for compiling regexes.
This is just re
re-exported under a different name so as not to conflict with re
from
Lens
mkRegexQQ :: [PCREOption] -> QuasiQuoter #
Returns a QuasiQuoter like re
, but with given PCRE options.
compile :: ByteString -> [PCREOption] -> Regex #
Compile a perl-compatible regular expression stored in a strict bytestring.
An example
let r = compile (pack "^(b+|a){1,2}?bc") []
Or using GHC's -XOverloadedStrings flag, and importing Data.ByteString.Char8, we can avoid the pack:
let r = compile "^(b+|a){1,2}?bc" []
If the regular expression is invalid, an exception is thrown.
If this is unsuitable, compileM
is availlable, which returns failure
in a monad.
To do case insentive matching,
compile "^(b+|a){1,2}?bc" [caseless]
Other flags are documented below.
The resulting abstract regular expression can be passed to match
for matching against a subject string.
The arguments are:
pat
: A ByteString containing the regular expression to be compiled.flags
, optional bit flags. IfNothing
is provided, defaults are used.
Valid compile-time flags are:
anchored
- Force pattern anchoringauto_callout
- Compile automatic calloutsbsr_anycrlf
- \R matches only CR, LF, or CRLFbsr_unicode
- \R matches all Unicode line endingscaseless
- Do caseless matchingdollar_endonly
-$
not to match newline at enddotall
- matches anything including NLdupnames
- Allow duplicate names for subpatternsextended
- Ignore whitespace and # commentsextra
- PCRE extra features (not much use currently)firstline
- Force matching to be before newlinemultiline
-^
and$
match newlines within datanewline_any
- Recognize any Unicode newline sequencenewline_anycrlf
- Recognize CR, LF, and CRLF as newline sequencesnewline_cr
- Set CR as the newline sequencenewline_crlf
- Set CRLF as the newline sequencenewline_lf
- Set LF as the newline sequenceno_auto_capture
- Disable numbered capturing parentheses (named ones available)ungreedy
- Invert greediness of quantifiersutf8
- Run in UTF-8 modeno_utf8_check
- Do not check the pattern for UTF-8 validity
The regex is allocated via malloc on the C side, and will be deallocated by the runtime when the Haskell value representing it goes out of scope.
See 'man pcreapi for more details.
Caveats: patterns with embedded nulls, such as "0*" seem to be mishandled, as this won't currently match the subject "000".
compileM :: ByteString -> [PCREOption] -> Either String Regex #
Types
type Match text = [Either text text] Source #
Match represents a whole regex match; you can drill into it using match
or groups
or matchAndGroups
text
is either Text or ByteString depending on whether you use regex
or regexBS
Consider this to be internal; don't depend on its representation.