Text.XML.Enumerator.Parse
Description
This module provides both a native Haskell solution for parsing XML documents into a stream of events, and a set of parser combinators for dealing with a stream of events.
The important thing to know about the combinators is that they do not work
on the fully-powered Event
datatype; rather, this module defines an
SEvent
datatype which only deals with tags, attributes and content. For
most uses, this is sufficient. If you need to parse doctypes, instructions
or contents, you will not be able to use the combinators.
As a simple example, if you have the following XML file:
<?xml version="1.0" encoding="utf-8"?> <people> <person age="25">Michael</person> <person age="2">Eliezer</person> </people>
Then this code:
{-# LANGUAGE OverloadedStrings #-} import Text.XML.Enumerator.Parse import Data.Text.Lazy (Text, unpack) data Person = Person { age :: Int, name :: Text } deriving Show parsePerson = tag' "person" (requireAttr "age") $ \age -> do name <- content' return $ Person (read $ unpack age) name parsePeople = tag'' "people" $ many parsePerson main = parseFile_ "people.xml" (const Nothing) $ force "people required" parsePeople
will produce:
[Person {age = 25, name = "Michael"},Person {age = 2, name = "Eliezer"}]
- parseBytes :: Monad m => Enumeratee ByteString Event m a
- parseText :: Monad m => Enumeratee Text Event m a
- detectUtf :: Monad m => Enumeratee ByteString Text m a
- data SEvent
- = SBeginElement Name [SAttr]
- | SEndElement
- | SContent Text
- simplify :: Monad m => (Text -> Maybe Text) -> Enumeratee Event SEvent m b
- type SAttr = (Name, Text)
- parseFile :: String -> (Text -> Maybe Text) -> Iteratee SEvent IO a -> IO (Either SomeException a)
- parseFile_ :: String -> (Text -> Maybe Text) -> Iteratee SEvent IO a -> IO a
- tag :: Monad m => (Name -> Maybe a) -> (a -> AttrParser b) -> (b -> Iteratee SEvent m c) -> Iteratee SEvent m (Maybe c)
- tag' :: Monad m => Name -> AttrParser a -> (a -> Iteratee SEvent m b) -> Iteratee SEvent m (Maybe b)
- tagName :: Monad m => Name -> AttrParser a -> (a -> Iteratee SEvent m b) -> Iteratee SEvent m (Maybe b)
- tag'' :: Monad m => Name -> Iteratee SEvent m a -> Iteratee SEvent m (Maybe a)
- tagNoAttr :: Monad m => Name -> Iteratee SEvent m a -> Iteratee SEvent m (Maybe a)
- content :: Monad m => Iteratee SEvent m (Maybe Text)
- content' :: Monad m => Iteratee SEvent m Text
- ignoreElem :: Monad m => Iteratee SEvent m (Maybe ())
- ignoreSiblings :: Monad m => Iteratee SEvent m ()
- data AttrParser a
- requireAttr :: Name -> AttrParser Text
- optionalAttr :: Name -> AttrParser (Maybe Text)
- requireAttrRaw :: String -> (SAttr -> Maybe b) -> AttrParser b
- optionalAttrRaw :: (SAttr -> Maybe b) -> AttrParser (Maybe b)
- ignoreAttrs :: AttrParser ()
- choose :: Monad m => [Iteratee SEvent m (Maybe a)] -> Iteratee SEvent m (Maybe a)
- many :: Monad m => Iteratee SEvent m (Maybe a) -> Iteratee SEvent m [a]
- force :: Monad m => String -> Iteratee SEvent m (Maybe a) -> Iteratee SEvent m a
- skipTill :: Monad m => Iteratee SEvent m (Maybe a) -> Iteratee SEvent m (Maybe a)
- skipSiblings :: Monad m => Iteratee SEvent m a -> Iteratee SEvent m a
- data XmlException
- = XmlException { }
- | InvalidEndElement Name
- | InvalidEntity Text
- | SXmlException { }
- | UnparsedAttributes [SAttr]
Parsing XML files
parseBytes :: Monad m => Enumeratee ByteString Event m aSource
Parses a byte stream into Event
s. This function is implemented fully in
Haskell using attoparsec-text for parsing. The produced error messages do
not give line/column information, so you may prefer to stick with the parser
provided by libxml-enumerator. However, this has the advantage of not
relying on any C libraries.
This relies on detectUtf
to determine character encoding, and parseText
to do the actual parsing.
parseText :: Monad m => Enumeratee Text Event m aSource
Parses a character stream into Event
s. This function is implemented
fully in Haskell using attoparsec-text for parsing. The produced error
messages do not give line/column information, so you may prefer to stick
with the parser provided by libxml-enumerator. However, this has the
advantage of not relying on any C libraries.
detectUtf :: Monad m => Enumeratee ByteString Text m aSource
Automatically determine which UTF variant is being used. This function first checks for BOMs, removing them as necessary, and then check for the equivalent of <?xml for each of UTF-8, UTF-16LEBE, and UTF-32LEBE. It defaults to assuming UTF-8.
Simplified events
A greatly simplified XML event datatype. The best way to produce these
values is the simplify
enumeratee.
Constructors
SBeginElement Name [SAttr] | |
SEndElement | |
SContent Text |
simplify :: Monad m => (Text -> Maybe Text) -> Enumeratee Event SEvent m bSource
Convert a stream of Event
s into a stream SEvent
s. The first argument
is a function to decode character entity references. Some things to note
about this function:
- It drops events for document begin/end, comments, and instructions.
- It concatenates all pieces of content together. The output of this
function is guaranteed to not have two consecutive
SContent
s. - It automatically checks that tag beginnings and endings are well balanced, and throws an exception otherwise.
- It also throws an exception if your supplied entity function does not know how to deal with a character entity.
Please also note that you do not need to handle the 5 XML-defined character entity references (lt, gt, amp, quot and apos), nor deal with numeric entities (decimal and hex).
parseFile :: String -> (Text -> Maybe Text) -> Iteratee SEvent IO a -> IO (Either SomeException a)Source
A helper function which reads a file from disk using enumFile
, detects
character encoding using detectUtf
, parses the XML using parseBytes
,
converts to an SEvent
stream using simplify
and then handing off control
to your supplied parser.
parseFile_ :: String -> (Text -> Maybe Text) -> Iteratee SEvent IO a -> IO aSource
The same as parseFile
, but throws any exceptions.
SEvent parsing
tag :: Monad m => (Name -> Maybe a) -> (a -> AttrParser b) -> (b -> Iteratee SEvent m c) -> Iteratee SEvent m (Maybe c)Source
The most generic way to parse a tag. It takes a predicate for checking if
this is the correct tag name, an AttrParser
for handling attributes, and
then a parser for dealing with content.
This function automatically absorbs its balancing closing tag, and will
throw an exception if not all of the attributes or child elements are
consumed. If you want to allow extra attributes, see ignoreAttrs
.
tag' :: Monad m => Name -> AttrParser a -> (a -> Iteratee SEvent m b) -> Iteratee SEvent m (Maybe b)Source
tagName :: Monad m => Name -> AttrParser a -> (a -> Iteratee SEvent m b) -> Iteratee SEvent m (Maybe b)Source
tagNoAttr :: Monad m => Name -> Iteratee SEvent m a -> Iteratee SEvent m (Maybe a)Source
A further simplified tag parser, which requires that no attributes exist.
content :: Monad m => Iteratee SEvent m (Maybe Text)Source
Grabs the next piece of content if available.
content' :: Monad m => Iteratee SEvent m TextSource
Grabs the next piece of content. If none if available, returns empty
.
Attribute parsing
data AttrParser a Source
A monad for parsing attributes. By default, it requires you to deal with
all attributes present on an element, and will throw an exception if there
are unhandled attributes. Use the requireAttr
, optionalAttr
et al
functions for handling an attribute, and ignoreAttrs
if you would like to
skip the rest of the attributes on an element.
Instances
requireAttr :: Name -> AttrParser TextSource
Require that a certain attribute be present and return its value.
optionalAttr :: Name -> AttrParser (Maybe Text)Source
Return the value for an attribute if present.
requireAttrRaw :: String -> (SAttr -> Maybe b) -> AttrParser bSource
optionalAttrRaw :: (SAttr -> Maybe b) -> AttrParser (Maybe b)Source
ignoreAttrs :: AttrParser ()Source
Skip the remaining attributes on an element. Since this will clear the
list of attributes, you must call this after any calls to requireAttr
,
optionalAttr
, etc.
Combinators
many :: Monad m => Iteratee SEvent m (Maybe a) -> Iteratee SEvent m [a]Source
Keep parsing elements as long as the parser returns Just
.
skipTill :: Monad m => Iteratee SEvent m (Maybe a) -> Iteratee SEvent m (Maybe a)Source
Skip the siblings elements until iteratee not right.
skipSiblings :: Monad m => Iteratee SEvent m a -> Iteratee SEvent m aSource
Combinator to skip the siblings element.
Exceptions
data XmlException Source
Constructors
XmlException | |
Fields | |
InvalidEndElement Name | |
InvalidEntity Text | |
SXmlException | |
Fields | |
UnparsedAttributes [SAttr] |