@emcd emcd commented Nov 19, 2025

Implement structure processor for extracting API documentation content from Pydoctor-generated sites (e.g., Twisted, Dulwich).

New features:

  • Detection via meta tags, CSS markers, and HTML structure patterns
  • Content extraction for docstrings, signatures, and code examples
  • HTML to Markdown conversion with Bootstrap cleanup
  • URL utilities for proper ParseResult type handling
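For readers unfamiliar with marker-based detection, here is a minimal sketch of the idea, not the code in `detection.py`. It assumes a Pydoctor page carries a `generator` meta tag mentioning pydoctor and links a stylesheet named `apidocs.css`; the class `MarkerScanner`, the function `pydoctor_confidence`, and the equal weighting of the two markers are all hypothetical.

```python
from html.parser import HTMLParser


class MarkerScanner(HTMLParser):
    """Records which assumed Pydoctor markers appear in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.generator_hit = False   # <meta name="generator" content="pydoctor ...">
        self.stylesheet_hit = False  # <link href=".../apidocs.css">

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (bare attributes), so coerce to ''.
        attributes = {name: (value or '') for name, value in attrs}
        if tag == 'meta' and attributes.get('name', '').lower() == 'generator':
            if 'pydoctor' in attributes.get('content', '').lower():
                self.generator_hit = True
        elif tag == 'link' and attributes.get('href', '').endswith('apidocs.css'):
            self.stylesheet_hit = True


def pydoctor_confidence(html: str) -> float:
    """Returns the fraction of assumed markers found: 0.0, 0.5, or 1.0."""
    scanner = MarkerScanner()
    scanner.feed(html)
    hits = int(scanner.generator_hit) + int(scanner.stylesheet_hit)
    return hits / 2
```

A caller would compare the returned score against a threshold of its choosing; combining several weak signals this way avoids depending on any single marker that a site theme might strip.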

Package structure:

  • sources/librovore/structures/pydoctor/__init__.py - Registration
  • sources/librovore/structures/pydoctor/main.py - PydoctorProcessor class
  • sources/librovore/structures/pydoctor/detection.py - Site detection logic
  • sources/librovore/structures/pydoctor/extraction.py - Content extraction
  • sources/librovore/structures/pydoctor/conversion.py - HTML conversion
  • sources/librovore/structures/pydoctor/urls.py - URL manipulation

Configuration:

  • Added pydoctor structure extension to general.toml

Quality assurance:

  • All linters pass (ruff, isort, pyright)
  • All tests pass (171 tests)
  • Follows project coding standards and practices

Code quality improvements:
- Remove blank lines within function bodies
- Narrow overly-broad try block in detect_pydoctor
- Simplify return statement (remove unnecessary assignment)
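To illustrate what narrowing a try block and simplifying a return look like in practice, here is a small sketch. It is not the actual `detect_pydoctor`; the names `detect`, `score_markers`, and the `fetch` callable are hypothetical, as is the choice of `OSError` as the failure type.

```python
from typing import Callable


def score_markers(html: str) -> float:
    """Hypothetical scorer: 1.0 if the page mentions pydoctor, else 0.0."""
    return 1.0 if 'pydoctor' in html.lower() else 0.0


def detect(fetch: Callable[[str], str], url: str) -> float:
    # Only the call that can legitimately fail with an I/O error sits
    # inside the try block.
    try:
        html = fetch(url)
    except OSError:
        return 0.0
    # Scoring runs outside the try block, so a bug in it surfaces as an
    # exception instead of being reported as "not a Pydoctor site".
    # The result is returned directly, without an intermediate variable.
    return score_markers(html)
```

With the broad form, an unrelated error raised during scoring would be swallowed by the same handler that covers retrieval failures.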

Documentation:
- Document SSL/TLS certificate verification issue in issues.md
- Document normalize_base_url code duplication in issues.md
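For context on the kind of helper involved, here is a hedged sketch of base URL normalization built on `urllib.parse`. Only the name `normalize_base_url` comes from this PR; the signature and the behavior shown (dropping a trailing file name, query, and fragment) are assumptions for illustration and may differ from the project's implementation.

```python
from urllib.parse import ParseResult, urlparse


def normalize_base_url(source: str) -> ParseResult:
    """Illustrative only: reduces a page URL to a directory-style base."""
    parts = urlparse(source)
    path = parts.path
    # Treat a final path segment containing a dot as a file name
    # (for example index.html) and drop it.
    last = path.rsplit('/', 1)[-1]
    if '.' in last:
        path = path[: len(path) - len(last)]
    path = path.rstrip('/')
    # _replace keeps the ParseResult type rather than falling back to str.
    return parts._replace(path=path, params='', query='', fragment='')
```

Returning a `ParseResult` instead of a string lets callers keep working with typed components; `geturl()` converts back when a string is needed.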

Changes follow project coding standards. All linters pass.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@emcd emcd merged commit 9ed6bfe into master Nov 20, 2025
18 checks passed