Copyright	(c) Robbin C.
License	GPLv3
Maintainer	Robbin C.
Stability	unstable
Portability	portable
Safe Haskell	None
Language	Haskell2010

Codec.Archive.Zim.Parser

Contents

Functions
Exceptions
ZIM Header
ZIM Directory Entry
ZIM file format

Description

This is a library for parsing ZIM (http://openzim.org) files. ZIM files contain offline web content (eg, Wikipedia) which can be browsed locally without an Internet connection.

The API is meant to be intuitive for normal use-cases.

To get content for "A/index.htm" from ZIM file "file.zim":

> mimeContent <- "file.zim" `getContent` Url "A/index.htm"
> :t mimeContent
mimeContent :: Maybe (B8.ByteString, BL.ByteString)
> print mimeContent
Just ("text/html", "<html><head>...</html>")

The above will open the file, parse the ZIM header, lookup the MIME type and content of the URL, close the file and return the MIME type and content as a pair. Note that content is a lazy bytestring.

The above operation should suffice for a simple webserver serving a ZIM file. For finer control, it is possible to cache and reuse the file handle and the ZIM header.

> hdl <- openBinaryFile "file.zim" ReadMode
> hdr <- getHeader hdl
> :t hdr
hdr :: ZimHeader
> (hdl, hdr) `getContent` Url "A/index.htm"
Just ("text/html", "<html><head>...</html>")

ZIM files of Wikimedia Foundation (Wikipedia, Wikibooks, etc) can be found at http://ftpmirror.your.org/pub/kiwix/zim.

Below is a full example of a Scotty web server that serves a ZIM file (specified on command line) on localhost port 3000:

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.IO.Class (liftIO)
import Data.Text.Lazy (toStrict, fromStrict)
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import System.Environment (getArgs)
import Network.HTTP.Types.Status (status404)
import Web.Scotty
import Codec.Archive.Zim.Parser (getMainPageUrl, getContent, Url(..))

main :: IO ()
main = do
    [fp] <- getArgs
    scotty 3000 $ do
      get "/" (redirectToZimMainPage fp)
      get (regex "^/(./.*)$") (serveZimUrl fp)
      notFound $ text "Invalid URL!"

redirectToZimMainPage :: FilePath -> ActionM ()
redirectToZimMainPage fp = do
    res <- liftIO $ getMainPageUrl fp
    case res of
      Nothing -> do
        status status404
        text "This ZIM file has no main page specified!"
      Just (Url url) -> redirect . fromStrict $ decodeUtf8 url

serveZimUrl :: FilePath -> ActionM ()
serveZimUrl fp = do
    url <- (encodeUtf8 . toStrict) <$> param "1"
    res <- liftIO $ fp `getContent` Url url
    case res of
      Nothing -> do
        liftIO . putStrLn $ "Invalid URL: " ++ show url
        status status404
        text $ "Invalid URL!"
      Just (mimeType, content) -> do
        liftIO . putStrLn $ "Serving: " ++ show url
        setHeader "Content-Type" (fromStrict $ decodeUtf8 mimeType)
        raw content

Feedback and contributions are welcome on http://github.com/robbinch/zim-parser.

Synopsis

getHeader :: RunZim h => h -> IO ZimHeader
getMimeList :: RunZim h => h -> IO MimeList
getDE :: (ZimGetDE k, RunZim h) => h -> k -> IO ZimDirEnt
getMainPageUrl :: RunZim h => h -> IO (Maybe Url)
getCluster :: RunZim h => h -> ClusterNumber -> IO Cluster
getBlob :: RunZim h => h -> (ClusterNumber, BlobNumber) -> IO Blob
getContent :: (ZimGetContent k, RunZim h) => h -> k -> IO (Maybe (ByteString, ByteString))
searchDE :: (ZimSearchDE k, RunZim h) => h -> k -> IO [(Int, ZimDirEnt)]
type MimeList = Array Int ByteString
mkNsTitle :: Char -> ByteString -> Title
mkNsTitlePrefix :: Char -> ByteString -> TitlePrefix
mkNsUrl :: Char -> ByteString -> Url
class RunZim h
class ZimGetDE k
class ZimSearchDE k
class ZimGetContent k
data ZimException
- = ZimInvalidMagic
- | ZimParseError String
- | ZimIncompleteInput
- | ZimInvalidIndex Int
data ZimHeader = ZimHeader {
- zimMagicNumber :: Int
- zimVersion :: Int
- zimUuid :: ByteString
- zimArticleCount :: Int
- zimClusterCount :: Int
- zimUrlPtrPos :: Integer
- zimTitlePtrPos :: Integer
- zimClusterPtrPos :: Integer
- zimMimeListPos :: Integer
- zimMainPage :: Maybe Int
- zimLayoutPage :: Maybe Int
- zimChecksumPos :: Integer
}
data ZimDirEntType
- = ZimArticleEntry
- | ZimRedirectEntry
- | ZimLinkTarget
- | ZimDeletedEntry
data ZimDirEnt = ZimDirEnt {
- zimDeType :: ZimDirEntType
- zimDeMimeType :: Int
- zimDeParameterLen :: Int
- zimDeNamespace :: Char
- zimDeRevision :: Int
- zimDeRedirectIndex :: Maybe Int
- zimDeClusterNumber :: Maybe Int
- zimDeBlobNumber :: Maybe Int
- zimDeUrl :: ByteString
- zimDeTitle :: ByteString
}
newtype UrlIndex = UrlIndex Int
newtype TitleIndex = TitleIndex Int
newtype ClusterNumber = ClusterNumber Int
newtype BlobNumber = BlobNumber Int
newtype Cluster = Cluster {
- unCluster :: ByteString
}
newtype Blob = Blob {
- unBlob :: ByteString
}
newtype Url = Url ByteString
data Title
data TitlePrefix

Functions

getHeader :: RunZim h => h -> IO ZimHeader Source #

getMimeList :: RunZim h => h -> IO MimeList Source #

getDE :: (ZimGetDE k, RunZim h) => h -> k -> IO ZimDirEnt Source #

getMainPageUrl :: RunZim h => h -> IO (Maybe Url) Source #

Returns URL of main page in ZIM. This URL can be used for redirecting to the actual page.

getCluster :: RunZim h => h -> ClusterNumber -> IO Cluster Source #

getBlob :: RunZim h => h -> (ClusterNumber, BlobNumber) -> IO Blob Source #

getContent :: (ZimGetContent k, RunZim h) => h -> k -> IO (Maybe (ByteString, ByteString)) Source #

Get (MIME type, Content). Note that Content is lazy.

searchDE :: (ZimSearchDE k, RunZim h) => h -> k -> IO [(Int, ZimDirEnt)] Source #

Search for a Directory Entry on a RunZim. When searching for a:

Url: Returns either 0 (not found) or 1 element.
Title: Returns either 0 (not found) or 1 element.
TitlePrefix: Returns either 0 (not found) or 2 elements corresponding to lower and upper bound of titles containing the prefix.

type MimeList = Array Int ByteString Source #

List of Mime Types

mkNsTitle :: Char -> ByteString -> Title Source #

Construct a Title with a Namespace.

mkNsTitlePrefix :: Char -> ByteString -> TitlePrefix Source #

Construct a TitlePrefix with a Namespace.

mkNsUrl :: Char -> ByteString -> Url Source #

Construct a Url with a Namespace.

class RunZim h Source #

Instances of this class represent a Zim File and are able to perform ZIM operations (getMimeList, getContent, etc). Valid instances include a Handle to a ZIM file, a FilePath to a ZIM file, or a (Handle, ZimHeader) where ZimHeader is parsed previously (so it does not need to be reparsed).

Minimal complete definition

runZim

Instances

Instances details

RunZim FilePath Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods runZim :: FilePath -> (Handle -> ZimHeader -> IO a) -> IO a
RunZim Handle Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods runZim :: Handle -> (Handle -> ZimHeader -> IO a) -> IO a
RunZim (Handle, ZimHeader) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods runZim :: (Handle, ZimHeader) -> (Handle -> ZimHeader -> IO a) -> IO a

class ZimGetDE k Source #

Minimal complete definition

getDE

Instances

Instances details

ZimGetDE TitleIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getDE :: RunZim h => h -> TitleIndex -> IO ZimDirEnt Source #
ZimGetDE UrlIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getDE :: RunZim h => h -> UrlIndex -> IO ZimDirEnt Source #

class ZimSearchDE k Source #

Minimal complete definition

searchDE

Instances

Instances details

ZimSearchDE Title Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods searchDE :: RunZim h => h -> Title -> IO [(Int, ZimDirEnt)] Source #
ZimSearchDE TitlePrefix Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods searchDE :: RunZim h => h -> TitlePrefix -> IO [(Int, ZimDirEnt)] Source #
ZimSearchDE Url Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods searchDE :: RunZim h => h -> Url -> IO [(Int, ZimDirEnt)] Source #

class ZimGetContent k Source #

Minimal complete definition

getContent

Instances

Instances details

ZimGetContent Title Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> Title -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent TitleIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> TitleIndex -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent Url Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> Url -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent UrlIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> UrlIndex -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent ZimDirEnt Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> ZimDirEnt -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent (MimeList, Title) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, Title) -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent (MimeList, TitleIndex) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, TitleIndex) -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent (MimeList, Url) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, Url) -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent (MimeList, UrlIndex) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, UrlIndex) -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent (MimeList, ZimDirEnt) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, ZimDirEnt) -> IO (Maybe (ByteString, ByteString)) Source #

Exceptions

data ZimException Source #

Other than the below, ErrorCall can be thrown by LZMA library if there is a problem with decompression.

Constructors

ZimInvalidMagic	ZIM file has invalid magic number (anything other than 72173914).
ZimParseError String	There is an error in parsing.
ZimIncompleteInput	There is insufficient bytes required to parse.
ZimInvalidIndex Int	The given index (URL, title or cluster) is out of bounds for this ZIM file.

Instances

Instances details

Exception ZimException Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods toException :: ZimException -> SomeException # fromException :: SomeException -> Maybe ZimException # displayException :: ZimException -> String # backtraceDesired :: ZimException -> Bool #
Show ZimException Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> ZimException -> ShowS # show :: ZimException -> String # showList :: [ZimException] -> ShowS #

ZIM Header

data ZimHeader Source #

See http://www.openzim.org/wiki/ZIM_file_format#Header for more details.

Constructors

ZimHeader

Fields

zimMagicNumber :: Int
Magic Number of file (somewhat superfluous as getZimHeader will throw an exception if magic number is anything other than 72173914)
zimVersion :: Int
Version of ZIM header
zimUuid :: ByteString
UUID of file
zimArticleCount :: Int
Number of articles
zimClusterCount :: Int
Number of clusters
zimUrlPtrPos :: Integer
Position of sorted URL pointers
zimTitlePtrPos :: Integer
Position of sorted Title pointers
zimClusterPtrPos :: Integer
Position of Cluster pointers
zimMimeListPos :: Integer
Position of MIME list
zimMainPage :: Maybe Int
Index of main page
zimLayoutPage :: Maybe Int
Index of layout page
zimChecksumPos :: Integer
Position of MD5 checksum

Instances

Instances details

Show ZimHeader Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> ZimHeader -> ShowS # show :: ZimHeader -> String # showList :: [ZimHeader] -> ShowS #
Eq ZimHeader Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: ZimHeader -> ZimHeader -> Bool # (/=) :: ZimHeader -> ZimHeader -> Bool #
RunZim (Handle, ZimHeader) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods runZim :: (Handle, ZimHeader) -> (Handle -> ZimHeader -> IO a) -> IO a

ZIM Directory Entry

data ZimDirEntType Source #

There are 4 types of directory entries. Most content in a ZIM file are usually ZimArticleEntry or ZimRedirectEntry.

Constructors

ZimArticleEntry
ZimRedirectEntry
ZimLinkTarget
ZimDeletedEntry

Instances

Instances details

Show ZimDirEntType Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> ZimDirEntType -> ShowS # show :: ZimDirEntType -> String # showList :: [ZimDirEntType] -> ShowS #
Eq ZimDirEntType Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: ZimDirEntType -> ZimDirEntType -> Bool # (/=) :: ZimDirEntType -> ZimDirEntType -> Bool #

data ZimDirEnt Source #

See http://www.openzim.org/wiki/ZIM_file_format#Directory_Entries for more details.

Constructors

ZimDirEnt

Fields

zimDeType :: ZimDirEntType
Type of this Directory Entry
zimDeMimeType :: Int
Index into MIME list given by getZimMimeList
zimDeParameterLen :: Int
Parameter Length
zimDeNamespace :: Char
Namespace
zimDeRevision :: Int
Revision
zimDeRedirectIndex :: Maybe Int
Redirect Index (only applicable for ZimRedirectEntry)
zimDeClusterNumber :: Maybe Int
Content is stored in this Cluster Number (only applicable for ZimArticleEntry)
zimDeBlobNumber :: Maybe Int
Content is stored in this Blob Number (only applicable for ZimArticleEntry)
zimDeUrl :: ByteString
URL
zimDeTitle :: ByteString
Title

Instances

Instances details

Show ZimDirEnt Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> ZimDirEnt -> ShowS # show :: ZimDirEnt -> String # showList :: [ZimDirEnt] -> ShowS #
Eq ZimDirEnt Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: ZimDirEnt -> ZimDirEnt -> Bool # (/=) :: ZimDirEnt -> ZimDirEnt -> Bool #
ZimGetContent ZimDirEnt Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> ZimDirEnt -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetContent (MimeList, ZimDirEnt) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, ZimDirEnt) -> IO (Maybe (ByteString, ByteString)) Source #

newtype UrlIndex Source #

Wrapper for URL index

Constructors

UrlIndex Int

Instances

Instances details

Show UrlIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> UrlIndex -> ShowS # show :: UrlIndex -> String # showList :: [UrlIndex] -> ShowS #
Eq UrlIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: UrlIndex -> UrlIndex -> Bool # (/=) :: UrlIndex -> UrlIndex -> Bool #
Ord UrlIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods compare :: UrlIndex -> UrlIndex -> Ordering # (<) :: UrlIndex -> UrlIndex -> Bool # (<=) :: UrlIndex -> UrlIndex -> Bool # (>) :: UrlIndex -> UrlIndex -> Bool # (>=) :: UrlIndex -> UrlIndex -> Bool # max :: UrlIndex -> UrlIndex -> UrlIndex # min :: UrlIndex -> UrlIndex -> UrlIndex #
ZimGetContent UrlIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> UrlIndex -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetDE UrlIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getDE :: RunZim h => h -> UrlIndex -> IO ZimDirEnt Source #
ZimGetContent (MimeList, UrlIndex) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, UrlIndex) -> IO (Maybe (ByteString, ByteString)) Source #

newtype TitleIndex Source #

Wrapper for Title index

Constructors

TitleIndex Int

Instances

Instances details

Show TitleIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> TitleIndex -> ShowS # show :: TitleIndex -> String # showList :: [TitleIndex] -> ShowS #
Eq TitleIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: TitleIndex -> TitleIndex -> Bool # (/=) :: TitleIndex -> TitleIndex -> Bool #
Ord TitleIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods compare :: TitleIndex -> TitleIndex -> Ordering # (<) :: TitleIndex -> TitleIndex -> Bool # (<=) :: TitleIndex -> TitleIndex -> Bool # (>) :: TitleIndex -> TitleIndex -> Bool # (>=) :: TitleIndex -> TitleIndex -> Bool # max :: TitleIndex -> TitleIndex -> TitleIndex # min :: TitleIndex -> TitleIndex -> TitleIndex #
ZimGetContent TitleIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> TitleIndex -> IO (Maybe (ByteString, ByteString)) Source #
ZimGetDE TitleIndex Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getDE :: RunZim h => h -> TitleIndex -> IO ZimDirEnt Source #
ZimGetContent (MimeList, TitleIndex) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, TitleIndex) -> IO (Maybe (ByteString, ByteString)) Source #

newtype ClusterNumber Source #

Wrapper for Cluster number

Constructors

ClusterNumber Int

Instances

Instances details

Show ClusterNumber Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> ClusterNumber -> ShowS # show :: ClusterNumber -> String # showList :: [ClusterNumber] -> ShowS #
Eq ClusterNumber Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: ClusterNumber -> ClusterNumber -> Bool # (/=) :: ClusterNumber -> ClusterNumber -> Bool #
Ord ClusterNumber Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods compare :: ClusterNumber -> ClusterNumber -> Ordering # (<) :: ClusterNumber -> ClusterNumber -> Bool # (<=) :: ClusterNumber -> ClusterNumber -> Bool # (>) :: ClusterNumber -> ClusterNumber -> Bool # (>=) :: ClusterNumber -> ClusterNumber -> Bool # max :: ClusterNumber -> ClusterNumber -> ClusterNumber # min :: ClusterNumber -> ClusterNumber -> ClusterNumber #

newtype BlobNumber Source #

Wrapper for Blob number

Constructors

BlobNumber Int

Instances

Instances details

Show BlobNumber Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> BlobNumber -> ShowS # show :: BlobNumber -> String # showList :: [BlobNumber] -> ShowS #
Eq BlobNumber Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: BlobNumber -> BlobNumber -> Bool # (/=) :: BlobNumber -> BlobNumber -> Bool #
Ord BlobNumber Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods compare :: BlobNumber -> BlobNumber -> Ordering # (<) :: BlobNumber -> BlobNumber -> Bool # (<=) :: BlobNumber -> BlobNumber -> Bool # (>) :: BlobNumber -> BlobNumber -> Bool # (>=) :: BlobNumber -> BlobNumber -> Bool # max :: BlobNumber -> BlobNumber -> BlobNumber # min :: BlobNumber -> BlobNumber -> BlobNumber #

newtype Cluster Source #

Wrapper for Cluster

Constructors

Cluster
Fields unCluster :: ByteString

newtype Blob Source #

Wrapper for Blob

Constructors

Blob
Fields unBlob :: ByteString

newtype Url Source #

Wrapper for Url

Constructors

Url ByteString

Instances

Instances details

Show Url Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> Url -> ShowS # show :: Url -> String # showList :: [Url] -> ShowS #
Eq Url Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: Url -> Url -> Bool # (/=) :: Url -> Url -> Bool #
Ord Url Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods compare :: Url -> Url -> Ordering # (<) :: Url -> Url -> Bool # (<=) :: Url -> Url -> Bool # (>) :: Url -> Url -> Bool # (>=) :: Url -> Url -> Bool # max :: Url -> Url -> Url # min :: Url -> Url -> Url #
ZimGetContent Url Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> Url -> IO (Maybe (ByteString, ByteString)) Source #
ZimSearchDE Url Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods searchDE :: RunZim h => h -> Url -> IO [(Int, ZimDirEnt)] Source #
ZimGetContent (MimeList, Url) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, Url) -> IO (Maybe (ByteString, ByteString)) Source #

data Title Source #

Wrapper for Title

Instances

Instances details

Show Title Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> Title -> ShowS # show :: Title -> String # showList :: [Title] -> ShowS #
Eq Title Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: Title -> Title -> Bool # (/=) :: Title -> Title -> Bool #
Ord Title Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods compare :: Title -> Title -> Ordering # (<) :: Title -> Title -> Bool # (<=) :: Title -> Title -> Bool # (>) :: Title -> Title -> Bool # (>=) :: Title -> Title -> Bool # max :: Title -> Title -> Title # min :: Title -> Title -> Title #
ZimGetContent Title Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> Title -> IO (Maybe (ByteString, ByteString)) Source #
ZimSearchDE Title Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods searchDE :: RunZim h => h -> Title -> IO [(Int, ZimDirEnt)] Source #
ZimGetContent (MimeList, Title) Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods getContent :: RunZim h => h -> (MimeList, Title) -> IO (Maybe (ByteString, ByteString)) Source #

data TitlePrefix Source #

Wrapper for Title Prefix

Instances

Instances details

Show TitlePrefix Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods showsPrec :: Int -> TitlePrefix -> ShowS # show :: TitlePrefix -> String # showList :: [TitlePrefix] -> ShowS #
Eq TitlePrefix Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods (==) :: TitlePrefix -> TitlePrefix -> Bool # (/=) :: TitlePrefix -> TitlePrefix -> Bool #
Ord TitlePrefix Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods compare :: TitlePrefix -> TitlePrefix -> Ordering # (<) :: TitlePrefix -> TitlePrefix -> Bool # (<=) :: TitlePrefix -> TitlePrefix -> Bool # (>) :: TitlePrefix -> TitlePrefix -> Bool # (>=) :: TitlePrefix -> TitlePrefix -> Bool # max :: TitlePrefix -> TitlePrefix -> TitlePrefix # min :: TitlePrefix -> TitlePrefix -> TitlePrefix #
ZimSearchDE TitlePrefix Source #
Instance details Defined in Codec.Archive.Zim.Parser Methods searchDE :: RunZim h => h -> TitlePrefix -> IO [(Int, ZimDirEnt)] Source #

ZIM file format

Following is a short summary of the ZIM file format. The authoritative reference is at http://www.openzim.org/wiki/ZIM_file_format.

1. ZIM header

This is an 80-byte header (see ZimHeader). Among other things, it contains file offsets to the below.

2. List of MIME types

This is a sequence of null-terminated strings (eg. text/html, text/javascript). The last string is zero length, so the end always consists of 2 consecutive null bytes.

3. List of URLs

This is a sequence of 8-byte file offsets, each pointing to a directory entry. This list is sorted by the directory entries' URL.

4. List of Titles

This is a sequence of 4-byte indices, each pointing to a URL above (which in turn point to a directory entry). This list is sorted by the directory entries' Title.

5. Directory Entries

This is a sequence of Directory Entries (see ZimDirEnt). The first 2 bytes determine the type of this entry, which also determine the length. Contents include:

a. MIME type

This 2-byte field means:

0xffff: This directory entry is a ZimRedirectEntry.
0xfffe: This directory entry is a ZimLinkTarget.
0xfffd: This directory entry is a ZimDeletedEntry.
any other value: This directory entry is a ZimArticleEntry and this index into the MIME list from above determines its MIME type.

b. Namespace

This single character determines the directory entry's namespace. (eg. A for articles, I for images, etc.) The comprehensive list is at http://www.openzim.org/wiki/ZIM_file_format#Namespaces.

c. Cluster and Blob number

Only for ZimArticleEntry, this is the directory entry's Cluster and Blob number. The Cluster number is a 4-byte index into the list of Clusters below. The Blob number refers to a block inside the (decompressed) cluster. Together, they provide the content of this directory entry.

d. URL and Title

These 2 null-terminated strings represent the URL and Title of this directory entry respectively. If the Title is empty, it is taken to be the same as the URL.

6. List of Clusters

This is a list of 8-byte file offsets, each pointing to a cluster in the file. -- The end of a cluster is also the start of the next cluster. Therefore, the length of a cluster is the difference between the adjacent offsets. For the last cluster, the end is the Checksum file offset, as the Checksum is always the last 16 bytes of a ZIM file.

a. Compression Type

The first byte of the cluster determines if it is uncompressed (eg. PNG image) or compressed with LZMA (eg. HTML).

0 or 1: No compression
4: Compressed with LZMA

b. List of Blobs

This is a list of 4-byte offsets, each pointing inside this cluster. The end of a blob is also the start of the next blob. Therefore, the length of a blob is the difference between the adjacent offsets. The last offset points to the end of the data area so there is always one more offset than blobs.