New Python HTML Libraries 2025

html5lib/html5lib-python 1K +3

added 9 months ago

Standards-compliant library for parsing and serializing HTML documents and fragments in Python.

alir3z4/html2text 2K +5

added 9 months ago

Convert HTML to Markdown-formatted text.
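
To illustrate what HTML-to-Markdown conversion involves, here is a minimal sketch built only on the standard library's `html.parser`. It is not html2text's implementation or API; the class `TinyMarkdownConverter` and the function `to_markdown` are hypothetical names, and only a handful of tags are handled.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Hypothetical helper: converts a very small HTML subset to Markdown."""

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the link currently being processed

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.parts.append("# ")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("_")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            # Block-level elements end with a blank line.
            self.parts.append("\n\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("_")
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html):
    converter = TinyMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()


markdown = to_markdown(
    '<h1>Title</h1><p>Some <strong>bold</strong> text and a '
    '<a href="https://example.com">link</a>.</p>'
)
print(markdown)
```

A real converter also has to deal with lists, nested formatting, line wrapping, and escaping of Markdown metacharacters, which is where a dedicated library earns its keep.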

gawel/pyquery 2K +3

added 9 months ago

A jQuery-like library for Python.

mozilla/bleach 2K +4

added 9 months ago

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.
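
The allowed-list idea can be sketched with the standard library alone: tags on the list pass through with only their permitted attributes, and everything else is escaped. This is an illustration of the technique, not Bleach's code or API; `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `AllowListSanitizer`, and `sanitize` are hypothetical names chosen for the example.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: an example allow-list, not any library's defaults.
ALLOWED_TAGS = {"a", "b", "em", "i", "strong"}
ALLOWED_ATTRS = {"a": {"href"}}


class AllowListSanitizer(HTMLParser):
    """Hypothetical sanitizer: keeps allowed tags, escapes all others."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            permitted = ALLOWED_ATTRS.get(tag, set())
            rendered = "".join(
                f' {name}="{escape(value, quote=True)}"'
                for name, value in attrs
                if name in permitted and value is not None
            )
            self.out.append(f"<{tag}{rendered}>")
        else:
            # Disallowed tag: emit it as inert, escaped text.
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(html):
    sanitizer = AllowListSanitizer()
    sanitizer.feed(html)
    sanitizer.close()
    return "".join(sanitizer.out)


clean = sanitize('<b onclick="x()">hi</b><script>alert(1)</script>')
print(clean)
```

The sketch deliberately omits things a production sanitizer needs, such as vetting URL schemes in `href` values and handling malformed markup, so it should not be used on untrusted input.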

buriy/python-readability 2K

added 9 months ago

Given an HTML document, extract and clean up the main body text and title.

lxml/lxml 2K +9

added 9 months ago

lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

scrapy/parsel 1K +3

added 9 months ago

Parsel lets you extract data from XML/HTML/JSON documents using XPath or CSS selectors.
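
For a feel of selector-based extraction, the sketch below uses the limited XPath subset in the standard library's `xml.etree.ElementTree`. It is not Parsel's API, it only works on well-formed markup, and the sample document and paths are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: a small, well-formed sample document invented for this sketch.
doc = """<html><body>
  <ul class="libs">
    <li><a href="/lxml">lxml</a></li>
    <li><a href="/parsel">parsel</a></li>
  </ul>
</body></html>"""

root = ET.fromstring(doc)

# Select every link inside the list whose class attribute is "libs".
links = [
    (a.text, a.get("href"))
    for a in root.findall(".//ul[@class='libs']/li/a")
]
print(links)
```

Real-world HTML is rarely well-formed XML, and full XPath or CSS selector support goes well beyond this subset, which is the gap the dedicated libraries in this list fill.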

psf/requests-html 13K +3

added 10 months ago

This library intends to make parsing HTML as simple and intuitive as possible.
