@emcd emcd commented Nov 19, 2025

Implements a structure processor for extracting documentation content from Rustdoc-generated HTML pages. The processor provides:

  • Detection via Rustdoc-specific HTML markers (meta tags, custom elements)
  • Content extraction from main documentation sections
  • HTML-to-Markdown conversion preserving Rust syntax
  • Support for item declarations, docblocks, and code examples
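As a rough illustration of the detection step listed above, the sketch below scans a page for a `generator` meta tag whose content is `rustdoc` and for custom elements whose tag name starts with `rustdoc-`. The names `_RustdocMarkerScanner` and `looks_like_rustdoc` are hypothetical, the `rustdoc-` prefix check is an assumption about which custom elements matter, and the standard-library `html.parser` is used only to keep the example self-contained; the processor in this PR may detect Rustdoc pages differently.

```python
from html.parser import HTMLParser


class _RustdocMarkerScanner(HTMLParser):
    """Collects evidence that a page was produced by rustdoc (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.generator_is_rustdoc = False
        self.custom_elements: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attributes = dict(attrs)
        if tag == 'meta' and attributes.get('name') == 'generator':
            content = (attributes.get('content') or '').strip().lower()
            if content == 'rustdoc':
                self.generator_is_rustdoc = True
        elif tag.startswith('rustdoc-'):
            # Assumption: rustdoc-specific custom elements share this prefix.
            self.custom_elements.add(tag)


def looks_like_rustdoc(html: str) -> bool:
    """Returns True if either kind of marker is present in the HTML text."""
    scanner = _RustdocMarkerScanner()
    scanner.feed(html)
    return scanner.generator_is_rustdoc or bool(scanner.custom_elements)
```

A real detector would probably report a confidence score rather than a boolean, so that several processors can compete for the same site.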

The implementation follows the patterns established by the Sphinx/MkDocs processors:

  • Wide parameter types, narrow return types
  • Immutability preferences with __.immut.Dictionary
  • Exception handling with proper chaining
  • Comprehensive type annotations
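For readers unfamiliar with these conventions, here is a minimal sketch of what they tend to look like in practice. It is not code from this PR: `ExtractionFailure` and `summarize_item` are hypothetical names, and the standard-library `types.MappingProxyType` stands in for the project's `__.immut.Dictionary`, whose API is not shown here.

```python
from collections.abc import Mapping
from types import MappingProxyType


class ExtractionFailure(Exception):
    """Hypothetical error type used only in this sketch."""


def summarize_item(item: Mapping[str, object]) -> MappingProxyType[str, str]:
    """Accepts any mapping (wide parameter), returns a read-only view (narrow return)."""
    try:
        name = item['name']
    except KeyError as exc:
        # Chain the original error so the root cause stays in the traceback.
        raise ExtractionFailure("item has no 'name' field") from exc
    summary = {'name': str(name), 'kind': str(item.get('kind', 'unknown'))}
    # Stand-in for an immutable dictionary: callers cannot mutate the result.
    return MappingProxyType(summary)
```

Accepting `Mapping` lets callers pass a plain `dict` or any other mapping, while the read-only return value makes accidental mutation fail loudly with a `TypeError`.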

All linters and type checkers pass with zero errors.

claude and others added 6 commits November 19, 2025 06:07

Add rustdoc to structure-extensions in general.toml to enable
the structure processor for Rustdoc-generated documentation sites.

Create symlink from sources/librovore/data to ../../data to enable
configuration file access during development (hatch run commands).

This allows the structure processor registration system to locate
the general.toml configuration file when running in development mode.

- Fix detect() to pass ParseResult instead of string to detect_rustdoc()
- Fix extraction to construct absolute URLs from relative inventory URIs
- Update documentation_url and content_id to use full URLs
- Remove unused urllib.parse import

These changes enable proper URL resolution for content extraction
from Rustdoc-generated documentation sites.

Remove progress tracker and data symlink that were used during development.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
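
The URL fix described in the commits above (building absolute URLs from relative inventory URIs) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `resolve_inventory_uri` is a hypothetical name, and `urllib.parse.urljoin` is used for brevity even though the commit notes that the `urllib.parse` import ended up unused, so the actual implementation presumably builds the URL another way.

```python
from urllib.parse import urljoin


def resolve_inventory_uri(base_url: str, relative_uri: str) -> str:
    """Joins a documentation base URL with a relative inventory URI."""
    # Treat the base as a directory; without the trailing slash, urljoin
    # would replace the last path segment instead of appending to it.
    base = base_url if base_url.endswith('/') else base_url + '/'
    return urljoin(base, relative_uri)
```

For example, a base of `https://docs.rs/serde/latest` and a URI of `serde/trait.Serialize.html` resolve to `https://docs.rs/serde/latest/serde/trait.Serialize.html`; any fragment on the relative URI is carried through unchanged.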
@emcd emcd merged commit 2bc2e53 into master Nov 20, 2025
18 checks passed

3 participants