Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
MarkupParse
Description
Synopsis
- data Markup = Markup {
- standard :: Standard
- markupTree :: [Tree Token]
  }
- data Standard
- markup :: Standard -> ByteString -> These [MarkupWarning] Markup
- markup_ :: Standard -> ByteString -> Markup
- data RenderStyle
- markdown :: RenderStyle -> Markup -> ByteString
- normalize :: Markup -> Markup
- wellFormed :: Markup -> [MarkupWarning]
- isWellFormed :: Markup -> Bool
- data MarkupWarning
- type Result a = These [MarkupWarning] a
- resultError :: Result a -> a
- resultEither :: Result a -> Either [MarkupWarning] a
- resultMaybe :: Result a -> Maybe a
- type TagName = ByteString
- name :: Standard -> Parser e ByteString
- selfClosers :: [TagName]
- type AttrName = ByteString
- type AttrValue = ByteString
- data Attr = Attr !AttrName !AttrValue
- attrs :: Standard -> Parser a [Attr]
- data Token
- = StartTag !TagName ![Attr]
- | EmptyElemTag !TagName ![Attr]
- | EndTag !TagName
- | Content !ByteString
- | Comment !ByteString
- | Decl !ByteString
- | Doctype !ByteString
- tokenize :: Standard -> ByteString -> These [MarkupWarning] [Token]
- tokenize_ :: Standard -> ByteString -> [Token]
- token :: Standard -> Parser String Token
- detokenize :: Standard -> Token -> ByteString
- gather :: Standard -> [Token] -> These [MarkupWarning] [Tree Token]
- gather_ :: Standard -> [Token] -> [Tree Token]
- degather :: Markup -> These [MarkupWarning] [Token]
- degather_ :: Markup -> [Token]
- xmlVersionInfo :: Parser e ByteString
- xmlEncodingDecl :: Parser e ByteString
- xmlStandalone :: Parser e ByteString
- xmlVersionNum :: Parser e ByteString
- xmlEncName :: Parser e ByteString
- xmlYesNo :: Parser e ByteString
Documentation
import MarkupParse
import Data.ByteString qualified as B

bs <- B.readFile "other/line.svg"
m = markup_ Xml bs

markdown . markup_ is an approximate round trip from ByteString to Markup and back to ByteString. The underscores mark versions of the main functions that throw an exception on any warnings encountered along the way.
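A minimal sketch of that round trip as a single function; the helper name roundTripFile and the choice of Xml and Compact here are illustrative assumptions, not part of the library's example:

```haskell
import Data.ByteString qualified as B
import MarkupParse

-- Read a file, parse it (markup_ throws an exception on any warnings),
-- and render the Markup back to a ByteString.
roundTripFile :: FilePath -> IO B.ByteString
roundTripFile path = markdown Compact . markup_ Xml <$> B.readFile path
```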
At a lower level, a round trip pipeline might look something like:
:t tokenize Html >=> gather Html >>> fmap (Markup Html >>> normalize) >=> degather >>> fmap (fmap (detokenize Html) >>> mconcat)
ByteString -> These [MarkupWarning] ByteString
From left to right:
- tokenize converts a ByteString to a Token list,
- gather takes the tokens and gathers them into Trees of tokens; this is then wrapped into the Markup data type,
- normalize concatenates content and normalizes attributes,
- degather turns the markup tree back into a token list, and finally
- detokenize turns each token back into a bytestring.
Along the way, the Kleisli fishies and compose-forward usage accumulate any warnings via the These monad instance.
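The same pipeline written out as a compilable function (a sketch; the helper name roundTrip is ours, and the extra parentheses only make the >=> / >>> grouping explicit):

```haskell
import Control.Category ((>>>))
import Control.Monad ((>=>))
import Data.ByteString (ByteString)
import Data.These (These)
import MarkupParse

-- Tokenize, gather into trees, wrap and normalize as Markup,
-- then degather and print the tokens back to a ByteString.
-- Warnings from each stage are accumulated by the These monad.
roundTrip :: ByteString -> These [MarkupWarning] ByteString
roundTrip =
  tokenize Html
    >=> (gather Html >>> fmap (Markup Html >>> normalize))
    >=> (degather >>> fmap (fmap (detokenize Html) >>> mconcat))
```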
Markup

data Markup Source #
>>>
markup Html "<foo class=\"bar\">baz</foo>"
That (Markup {standard = Html, markupTree = [Node {rootLabel = StartTag "foo" [Attr "class" "bar"], subForest = [Node {rootLabel = Content "baz", subForest = []}]}]})
Instances
Generic Markup Source # | |
Show Markup Source # | |
NFData Markup Source # | |
Defined in MarkupParse | |
Eq Markup Source # | |
Ord Markup Source # | |
ToExpr Markup Source # | |
Defined in MarkupParse | |
type Rep Markup Source # | |
Defined in MarkupParse type Rep Markup = D1 ('MetaData "Markup" "MarkupParse" "markup-parse-0.0.0.2-HnQ2lGo9uJcI0zQ43aTniK" 'False) (C1 ('MetaCons "Markup" 'PrefixI 'True) (S1 ('MetaSel ('Just "standard") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Standard) :*: S1 ('MetaSel ('Just "markupTree") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 [Tree Token]))) |
data Standard Source #

From a parsing point of view, Html and Xml (and Svg) are close enough that they share a lot of parsing logic, so that parsing and printing just need some tweaking.

The XML parsing logic is based on the XML productions found in https://www.w3.org/TR/xml/.

The HTML parsing was based on a reading of html-parse, but ignores the various '\x00' to '\xfffd' and EOF directives that form part of the HTML standards.
markup :: Standard -> ByteString -> These [MarkupWarning] Markup Source #
Convert bytestrings to Markup
>>>
markup Html "<foo><br></foo><baz"
These [MarkupParser (ParserLeftover "<baz")] (Markup {standard = Html, markupTree = [Node {rootLabel = StartTag "foo" [], subForest = [Node {rootLabel = StartTag "br" [], subForest = []}]}]})
data RenderStyle Source #
Indented 0 puts newlines in between the tags.
Instances
Generic RenderStyle Source # | |
Defined in MarkupParse Associated Types type Rep RenderStyle :: Type -> Type # | |
Show RenderStyle Source # | |
Defined in MarkupParse Methods showsPrec :: Int -> RenderStyle -> ShowS # show :: RenderStyle -> String # showList :: [RenderStyle] -> ShowS # | |
Eq RenderStyle Source # | |
Defined in MarkupParse | |
type Rep RenderStyle Source # | |
Defined in MarkupParse type Rep RenderStyle = D1 ('MetaData "RenderStyle" "MarkupParse" "markup-parse-0.0.0.2-HnQ2lGo9uJcI0zQ43aTniK" 'False) (C1 ('MetaCons "Compact" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "Indented" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Int))) |
markdown :: RenderStyle -> Markup -> ByteString Source #
Convert Markup to bytestrings.
>>>
B.putStr $ markdown (Indented 4) (markup_ Html [i|<foo><br></foo>|])
<foo>
    <br>
</foo>
normalize :: Markup -> Markup Source #
Concatenate sequential content and normalize attributes: unword class values and remove duplicate attributes (taking the last).
>>>
B.putStr $ markdown Compact $ normalize (markup_ Xml [i|<foo class="a" class="b" bar="first" bar="last"/>|])
<foo bar="last" class="a b"/>
wellFormed :: Markup -> [MarkupWarning] Source #
Check for well-formedness and return any warnings encountered.
>>>
wellFormed $ Markup Html [Node (Comment "") [], Node (EndTag "foo") [], Node (EmptyElemTag "foo" []) [Node (Content "bar") []], Node (EmptyElemTag "foo" []) []]
[EmptyContent,EndTagInTree,LeafWithChildren,BadEmptyElemTag]
isWellFormed :: Markup -> Bool Source #
Are the trees in the markup well-formed?
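A usage sketch, reusing a tree from the wellFormed example above; it assumes isWellFormed is simply the Boolean view of wellFormed, so the second check is expected (though not shown in the docs) to be True:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Tree (Tree (Node))
import MarkupParse

-- The comment-only tree triggers an EmptyContent warning, so it is not
-- well-formed; a plain start tag with content should be well-formed.
checks :: (Bool, Bool)
checks =
  ( isWellFormed (Markup Html [Node (Comment "") []])   -- expected False
  , isWellFormed (markup_ Html "<foo>bar</foo>")        -- expected True
  )
```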
Warnings
data MarkupWarning Source #
markup-parse generally tries to continue on parse errors, and returns what can still be parsed, together with any warnings.
Constructors
BadEmptyElemTag | A tag ending with "/>" that is not an element of selfClosers. |
SelfCloserWithChildren | A tag ending with "/>" that has children. Cannot happen in the parsing phase. |
LeafWithChildren | Only a StartTag can have children. |
TagMismatch TagName TagName | An EndTag with a different name to the currently open StartTag. |
UnmatchedEndTag | An EndTag with no corresponding StartTag. |
UnclosedTag | A StartTag with no corresponding EndTag. |
EndTagInTree | An EndTag should never appear in the markup tree. |
EmptyContent | Empty Content, Comment, Decl or Doctype. |
MarkupParser ParserWarning | A warning from the underlying token parser. |
Instances
type Result a = These [MarkupWarning] a Source #
The structure that many of the functions return.
A common computation pipeline takes advantage of the These Monad instance, e.g.
markup s bs = bs & (tokenize s >=> gather s) & second (Markup s)
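For completeness, a sketch of consuming a Result directly via the these package's This/That/These constructors, rather than the result* helpers below (the report helper is illustrative):

```haskell
import Data.These (These (..))
import MarkupParse

-- Report warnings and/or a parsed Markup value.
report :: Result Markup -> IO ()
report (This warnings)    = putStrLn ("no result, warnings: " <> show warnings)
report (That m)           = print m
report (These warnings m) = putStrLn ("warnings: " <> show warnings) >> print m
```

Applied to markup Html "<foo><baz", for example, this would report both the leftover-input warning and the partially parsed tree.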
resultError :: Result a -> a Source #
Convert any warnings to an error
>>>
resultError $ (tokenize Html) "<foo"
*** Exception: MarkupParser (ParserLeftover "<foo") ...
resultEither :: Result a -> Either [MarkupWarning] a Source #
Returns Left on any warnings
>>>
resultEither $ (tokenize Html) "<foo><baz"
Left [MarkupParser (ParserLeftover "<baz")]
resultMaybe :: Result a -> Maybe a Source #
Returns results if any, ignoring warnings.
>>>
resultMaybe $ (tokenize Html) "<foo><baz"
Just [StartTag "foo" []]
Token components
type TagName = ByteString Source #
Name of token
name :: Standard -> Parser e ByteString Source #
Parse a tag name. Each standard is slightly different.
selfClosers :: [TagName] Source #
Html tags that self-close
type AttrName = ByteString Source #
Name of an attribute.
type AttrValue = ByteString Source #
Value of an attribute. "" is equivalent to true with respect to boolean attributes.
data Attr Source #

An attribute of a tag.

In parsing, boolean attributes, which are not required to have a value in HTML, will be assigned a value of "", which is ok. But this will then be rendered:
>>>
detokenize Html <$> tokenize_ Html [i|<input checked>|]
["<input checked=\"\">"]
Instances
Generic Attr Source # | |
Show Attr Source # | |
NFData Attr Source # | |
Defined in MarkupParse | |
Eq Attr Source # | |
Ord Attr Source # | |
ToExpr Attr Source # | |
Defined in MarkupParse | |
type Rep Attr Source # | |
Defined in MarkupParse type Rep Attr = D1 ('MetaData "Attr" "MarkupParse" "markup-parse-0.0.0.2-HnQ2lGo9uJcI0zQ43aTniK" 'False) (C1 ('MetaCons "Attr" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 AttrName) :*: S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 AttrValue))) |
Tokens
data Token Source #

A Markup token.
>>>
runParser_ (many (token Html)) [i|<foo>content</foo>|]
[StartTag "foo" [],Content "content",EndTag "foo"]
>>>
runParser_ (token Xml) [i|<foo/>|]
EmptyElemTag "foo" []
>>>
runParser_ (token Html) "<!-- Comment -->"
Comment " Comment "
>>>
runParser_ (token Xml) [i|<?xml version="1.0" encoding="UTF-8"?>|]
Decl "xml version=\"1.0\" encoding=\"UTF-8\""
>>>
runParser_ (token Html) "<!DOCTYPE html>"
Doctype "DOCTYPE html"
>>>
runParser_ (token Xml) "<!DOCTYPE foo [ declarations ]>"
Doctype "DOCTYPE foo [ declarations ]"
>>>
runParser (token Html) [i|<foo a="a" b="b" c=c check>|]
OK (StartTag "foo" [Attr "a" "a",Attr "b" "b",Attr "c" "c",Attr "check" ""]) ""
>>>
runParser (token Xml) [i|<foo a="a" b="b" c=c check>|]
Fail
Constructors
StartTag !TagName ![Attr] | A start tag. https://developer.mozilla.org/en-US/docs/Glossary/Tag |
EmptyElemTag !TagName ![Attr] | An empty element tag. Optional for XML and kind of not allowed in HTML. |
EndTag !TagName | A closing tag. |
Content !ByteString | The content between tags. |
Comment !ByteString | Contents of a comment. |
Decl !ByteString | Contents of a declaration |
Doctype !ByteString | Contents of a doctype declaration. |
Instances
tokenize :: Standard -> ByteString -> These [MarkupWarning] [Token] Source #
Parse a bytestring into tokens
>>>
tokenize Html [i|<foo>content</foo>|]
That [StartTag "foo" [],Content "content",EndTag "foo"]
token :: Standard -> Parser String Token Source #
A flatparse Token parser.
>>>
runParser (token Html) "<foo>content</foo>"
OK (StartTag "foo" []) "content</foo>"
detokenize :: Standard -> Token -> ByteString Source #
Bytestring representation of a Token.
>>>
detokenize Html (StartTag "foo" [])
"<foo>"
gather :: Standard -> [Token] -> These [MarkupWarning] [Tree Token] Source #
Gather together token trees from a token list, placing child elements in nodes and removing EndTags.
>>>
gather Html =<< tokenize Html "<foo class=\"bar\">baz</foo>"
That [Node {rootLabel = StartTag "foo" [Attr "class" "bar"], subForest = [Node {rootLabel = Content "baz", subForest = []}]}]
degather :: Markup -> These [MarkupWarning] [Token] Source #
Convert a markup into a token list, adding end tags.
>>>
degather =<< markup Html "<foo class=\"bar\">baz</foo>"
That [StartTag "foo" [Attr "class" "bar"],Content "baz",EndTag "foo"]
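A sketch of the gather/degather round trip on well-formed input; it assumes an Eq instance on Token, and the expected equality follows from the tokenize and degather examples above:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import MarkupParse

-- gather removes EndTags and nests children; degather re-adds the EndTags,
-- so a well-formed token list survives the round trip unchanged.
tokensRoundTrip :: Bool
tokensRoundTrip =
  let ts = tokenize_ Html "<foo class=\"bar\">baz</foo>"
   in degather_ (Markup Html (gather_ Html ts)) == ts
```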
XML specific Parsers
xmlVersionInfo :: Parser e ByteString Source #
XML production [24]
xmlEncodingDecl :: Parser e ByteString Source #
XML production [80]
xmlStandalone :: Parser e ByteString Source #
XML production [32]
xmlVersionNum :: Parser e ByteString Source #
XML production [26]
xmlEncName :: Parser e ByteString Source #
XML production [81]
xmlYesNo :: Parser e ByteString Source #
XML yes/no