This Node.js script converts documents (starting with The Kitáb-i-Aqdas) from HTML or XHTML to TEI P5 XML.
See the results at:
- https://bahaidev.github.io/tei-conversion/books/hidden-words.xml
- https://bahaidev.github.io/tei-conversion/books/kitab-i-aqdas.xml
The Kitáb-i-Aqdas (The Most Holy Book) is the central book of the Bahá'í Faith, written by Bahá'u'lláh. This converter transforms the XHTML version (Bahá'í Reference Library format) into a structured TEI P5 XML document suitable for digital humanities research and preservation.
- books/The Kitáb-i-Aqdas.xhtml - Official XHTML source document from the Bahá'í Reference Library
- src/convert-aqdas-to-tei.js - Main conversion script (ESM Node.js)
- package.json - Node.js configuration and dependencies
- books/kitab-i-aqdas.xml - Generated TEI XML output (created when you run the script)
- books/tei-to-html.xsl - XSLT stylesheet for viewing XML as formatted HTML
- Auto-detects source format: Supports both legacy HTML and modern XHTML structures
- Complete conversion of all major sections:
  - Preface (7 paragraphs)
  - Introduction (30 paragraphs)
  - Description (8 paragraphs)
  - Main text (190 paragraphs)
  - Questions and Answers (107 items)
  - Notes (194 items)
- Preserves formatting:
  - Italic text (<hi rend="italic">)
  - Bold text (<hi rend="bold">)
  - Underlined text (<hi rend="underline">)
  - Superscript and subscript
  - External references/links
- TEI-compliant structure:
  - Proper TEI header with metadata
  - Structured divisions by section type
  - Numbered paragraphs and notes
  - Character encoding normalization
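The formatting rules listed above can be illustrated with a minimal sketch. It is not the script's actual implementation (which parses the document with jsdom); it is a regex-based approximation for simple fragments. The names REND_BY_TAG and htmlInlineToTei are hypothetical, and the rend values for superscript and subscript are assumptions, since only italic, bold and underline are documented here.

```javascript
// Hypothetical mapping from HTML inline tags to TEI rend values.
// italic/bold/underline come from the feature list; the
// superscript/subscript values are assumptions.
const REND_BY_TAG = {
  i: 'italic',
  em: 'italic',
  b: 'bold',
  strong: 'bold',
  u: 'underline',
  sup: 'superscript', // assumed value
  sub: 'subscript', // assumed value
};

// Rewrite simple inline tags as <hi rend="...">, dropping any HTML attributes.
function htmlInlineToTei(fragment) {
  return fragment
    .replace(
      /<(i|em|b|strong|u|sup|sub)(?:\s[^>]*)?>/gi,
      (_, tag) => `<hi rend="${REND_BY_TAG[tag.toLowerCase()]}">`
    )
    .replace(/<\/(i|em|b|strong|u|sup|sub)>/gi, '</hi>');
}

console.log(htmlInlineToTei('The <i>Most Holy</i> <b>Book</b>'));
// prints: The <hi rend="italic">Most Holy</hi> <hi rend="bold">Book</hi>
```

A DOM-based walk is more robust than regular expressions for real documents; the sketch only shows the tag-to-rend correspondence.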
First ensure pnpm is installed:
npm install -g pnpm
Next, install the required dependencies:
pnpm install
This will install:
- jsdom - For parsing and manipulating HTML
Run the conversion script:
pnpm convert
To view the converted document in the browser, you can run the local server:
pnpm start
...and then open it in the browser.
The script will:
- Read The Kitáb-i-Aqdas.xhtml
- Parse and extract all sections using navigation-based structure detection
- Generate kitab-i-aqdas.xml in TEI P5 format
- Display statistics about the conversion
Reading HTML file...
Parsing HTML document...
Extracting sections...
Sections found:
- Preface: 7 paragraphs
- Introduction: 30 paragraphs
- Description: 8 paragraphs
- Main text: 190 paragraphs
- Questions: 107 items
- Notes: 194 items
Generating TEI XML...
Writing output file...
Conversion complete! Output saved to: ./books/kitab-i-aqdas.xml
Total file size: 132.75 KB
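The summary lines in the log above could be produced along these lines. This is a sketch only: formatSectionSummary and formatKilobytes are hypothetical helpers, not functions from the script, and the sketch assumes that 1 KB means 1024 bytes.

```javascript
// Hypothetical helper: one "- Label: N unit" line per extracted section.
function formatSectionSummary(sections) {
  return sections.map(({ label, count, unit }) => `- ${label}: ${count} ${unit}`);
}

// Hypothetical helper: byte count as kilobytes with two decimals
// (assumes 1 KB = 1024 bytes).
function formatKilobytes(bytes) {
  return `${(bytes / 1024).toFixed(2)} KB`;
}

const summary = formatSectionSummary([
  { label: 'Preface', count: 7, unit: 'paragraphs' },
  { label: 'Notes', count: 194, unit: 'items' },
]);
console.log(summary.join('\n'));
console.log(`Total file size: ${formatKilobytes(135936)}`);
```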
The generated XML file includes an XSLT stylesheet that transforms it into beautifully formatted HTML when opened in a web browser:
# macOS
open books/kitab-i-aqdas.xml
# Linux
xdg-open books/kitab-i-aqdas.xml
# Windows
start books/kitab-i-aqdas.xml
Or simply double-click the books/kitab-i-aqdas.xml file in your file manager.
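Browsers apply an XSLT stylesheet to an XML file when the file carries an xml-stylesheet processing instruction. The sketch below shows one way such a prolog could be prepended. The function name withStylesheet is hypothetical, and the relative href is an assumption based on the stylesheet and the output both living in books/; the script's actual prolog may differ.

```javascript
// Hypothetical helper: prepend an XML declaration and an xml-stylesheet
// processing instruction to a serialized TEI document.
// The default href is an assumption (stylesheet next to the output file).
function withStylesheet(teiXml, href = 'tei-to-html.xsl') {
  const declaration = '<?xml version="1.0" encoding="UTF-8"?>';
  const stylesheetPi = `<?xml-stylesheet type="text/xsl" href="${href}"?>`;
  return `${declaration}\n${stylesheetPi}\n${teiXml}`;
}

console.log(withStylesheet('<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'));
```

Some browsers refuse to run XSLT for files opened directly from disk, in which case serving the folder over HTTP (for example with pnpm start) is the safer route.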
The stylesheet (books/tei-to-html.xsl) provides:
- Elegant typography with serif fonts and proper spacing
- Automatic table of contents with section navigation
- Color-coded sections for easy visual distinction
- Numbered paragraphs with superscript numbering
- Styled notes with golden accent borders
- Responsive design that works on mobile and desktop
- Print-friendly layout for PDF generation
For editing and validation, use:
- oXygen XML Editor - Full TEI P5 validation
- XMLSpy - Schema validation and XSLT debugging
- VS Code with XML extensions - Lightweight editing
The TEI document follows the TEI P5 guidelines:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- Metadata about the text -->
  </teiHeader>
  <text>
    <body>
      <div type="preface">
        <head>Preface</head>
        <p n="1">...</p>
      </div>
      <div type="introduction">
        <head>Introduction</head>
        <p n="1">...</p>
      </div>
      <div type="description">
        <head>Description</head>
        <p n="1">...</p>
      </div>
      <div type="main-text">
        <head>The Kitáb-i-Aqdas</head>
        <p n="1">...</p>
        <!-- up to n="190" -->
      </div>
      <div type="questions-answers">
        <head>Questions and Answers</head>
        <p n="1">...</p>
      </div>
      <div type="notes">
        <head>Notes</head>
        <note n="1">...</note>
      </div>
    </body>
  </text>
</TEI>
The script is configured to read from The Kitáb-i-Aqdas.xhtml by default. To use a different file, edit the constants at the top of src/convert-aqdas-to-tei.js:
const INPUT_FILE = './books/The Kitáb-i-Aqdas.xhtml';
const OUTPUT_FILE = './books/kitab-i-aqdas.xml';
The converter automatically detects the document structure:
- XHTML format: Uses navigation-based section detection with numeric ID anchors
- Legacy HTML: Falls back to semantic anchor name patterns (pref#, intro#, par#, etc.)
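The two detection paths can be approximated as follows. This is a sketch under stated assumptions, not the script's logic: detectSourceFormat is a hypothetical name, and the exact anchor shapes (an a element with a numeric id for XHTML, an a element with a name such as pref1, intro2 or par3 for legacy HTML) are guesses based on the description above.

```javascript
// Hypothetical detector: prefer the XHTML layout, fall back to legacy HTML.
// Both anchor patterns are assumptions about the source markup.
function detectSourceFormat(markup) {
  const numericIdAnchor = /<a\b[^>]*\bid=["']\d+["']/i;
  const legacyNameAnchor = /<a\b[^>]*\bname=["'](?:pref|intro|par)\d+["']/i;

  if (numericIdAnchor.test(markup)) return 'xhtml';
  if (legacyNameAnchor.test(markup)) return 'legacy-html';
  return 'unknown';
}

console.log(detectSourceFormat('<a id="12"></a><p>Text</p>'));
console.log(detectSourceFormat('<a name="par3"></a><p>Text</p>'));
```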
The script provides several functions you can extend:
- cleanText() - Text normalization
- normalizeText() - Character entity conversion
- extractTextWithFormatting() - HTML to TEI formatting conversion
- parseDocument() - Section extraction logic
- generateTEIHeader() - TEI header customization
- generateTEIBody() - TEI body structure
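As an illustration of the kind of logic a body generator might contain, here is a minimal sketch that wraps paragraphs in a typed division with sequential n attributes. buildDivision is a hypothetical name, not one of the functions listed above, and the sketch assumes its inputs are already XML-escaped.

```javascript
// Hypothetical helper: build <div type="..."> with a <head> and numbered <p>.
// Assumes heading and paragraph strings are already XML-escaped.
function buildDivision(type, heading, paragraphs) {
  const numbered = paragraphs.map(
    (text, index) => `  <p n="${index + 1}">${text}</p>`
  );
  return [
    `<div type="${type}">`,
    `  <head>${heading}</head>`,
    ...numbered,
    '</div>',
  ].join('\n');
}

console.log(buildDivision('preface', 'Preface', ['First paragraph.', 'Second paragraph.']));
```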
The converter handles:
- Unicode characters (Arabic diacritics, accented letters)
- HTML entities (&aacute;, &nbsp;, etc.)
- Special characters (curly quotes, em dashes, etc.)
- XML escaping for special characters
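The entity and escaping steps listed above could look roughly like this. It is a sketch, not the script's code: NAMED_ENTITIES, decodeHtmlEntities, escapeXml and toXmlText are hypothetical names, and the entity table is a small illustrative subset rather than the full HTML set.

```javascript
// Illustrative subset of named HTML entities (not the full HTML table).
const NAMED_ENTITIES = {
  amp: '&', lt: '<', gt: '>', quot: '"',
  aacute: '\u00E1', iacute: '\u00ED', uacute: '\u00FA',
  nbsp: '\u00A0', mdash: '\u2014',
  lsquo: '\u2018', rsquo: '\u2019', ldquo: '\u201C', rdquo: '\u201D',
};

// Decode named and numeric references in a single pass; unknown or
// out-of-range references are left untouched.
function decodeHtmlEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return NAMED_ENTITIES[body] ?? match;
  });
}

// Escape the characters that are unsafe in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Decode first, then escape, so only XML's own escapes remain in the output.
function toXmlText(htmlText) {
  return escapeXml(decodeHtmlEntities(htmlText));
}

console.log(toXmlText('Kit&aacute;b &amp; &lt;notes&gt;'));
// prints: Kitáb &amp; &lt;notes&gt;
```

Decoding before escaping matters: escaping first would turn &aacute; into &amp;aacute; and the entity name would end up in the output as literal text.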
To validate the generated TEI XML:
- Use an online validator: TEI by Example Validator
- Use xmllint:
  xmllint --noout --schema tei_all.xsd books/kitab-i-aqdas.xml
- Use oXygen XML Editor for comprehensive validation
This is an educational project for TEI encoding of the Kitáb-i-Aqdas. The original text is © Bahá'í World Centre.