unfluff is a Node.js library that automatically extracts the main content from an HTML document. It strips away navigation bars, ads, footers and other boilerplate, leaving the “body content” along with metadata (title, author, date) and other useful fields. It is aimed at content analysis, web scraping, dataset building, and repurposing article text for downstream processing such as machine learning or summarization. The API is simple: you pass in raw HTML and it returns a structured object containing the extracted text and the other fields. It can also cache internal representations to speed up repeated extractions from the same document.

Language support is strongest for English, and the repository notes that languages such as Chinese, Arabic and Korean may not be well supported; the library is nonetheless used in web content processing pipelines. Because of its simplicity and focused purpose, it can be a reliable building block in backend services or CLI tools.
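
As a rough illustration of the "raw HTML in, structured object out" flow described above, the sketch below wraps an extractor call in a small helper. The helper name `summarizeArticle` and its return shape are hypothetical, not part of unfluff; the extractor is passed in as an argument so the helper can be tried with a stub, and the fields read here (`title`, `author`, `date`, `text`) are assumed to match what the library returns.

```javascript
// Sketch only: `summarizeArticle` and its return shape are hypothetical.
// `extract` is assumed to behave like require('unfluff'):
// called as extract(html, language), it returns a plain object of fields.
function summarizeArticle(html, extract, language = 'en') {
  const data = extract(html, language);
  const text = data.text || '';
  return {
    title: data.title || null,
    // Assumption: authors arrive as an array; anything else becomes [].
    authors: Array.isArray(data.author) ? data.author : [],
    date: data.date || null,
    // Count whitespace-separated tokens in the extracted body text.
    wordCount: text.split(/\s+/).filter(Boolean).length,
  };
}

// Usage, assuming `npm install unfluff` has been run:
//   const extractor = require('unfluff');
//   const summary = summarizeArticle(rawHtml, extractor);
```

Passing the extractor in, rather than requiring it inside the helper, keeps the sketch testable without network access or real pages.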

Features

  • Extracts main textual content (body) from an HTML document
  • Parses and returns metadata (title, author, date, detected language, etc.)
  • Caches intermediate representations for performance when extracting multiple fields
  • CLI / module support: can be installed globally or used programmatically
  • Suitable for building datasets, article-scraping, republishing workflows
  • Open source under the Apache-2.0 license; easy to integrate into Node.js stacks
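
The caching feature in the list above is, as far as the project README describes it, exposed through a lazy mode in which each field is computed only when requested. The sketch below assumes that shape: `pickFields` is a hypothetical helper, and `lazyDoc` is assumed to look like the object returned by `require('unfluff').lazy(html, 'en')`, where each field such as `title` or `text` is a method.

```javascript
// Sketch only: `pickFields` is a hypothetical helper, not part of unfluff.
// `lazyDoc` is assumed to expose each field as a zero-argument method,
// as the object returned by require('unfluff').lazy(html, 'en') is
// understood to do.
function pickFields(lazyDoc, fieldNames) {
  const result = {};
  for (const name of fieldNames) {
    if (typeof lazyDoc[name] !== 'function') {
      throw new Error(`Unknown field: ${name}`);
    }
    // Only the requested fields are computed; the rest are never touched.
    result[name] = lazyDoc[name]();
  }
  return result;
}

// Usage, assuming `npm install unfluff` has been run:
//   const extractor = require('unfluff');
//   const doc = extractor.lazy(rawHtml, 'en');
//   const { title, text } = pickFields(doc, ['title', 'text']);
```

Requesting only the fields a pipeline needs avoids paying for extraction work, such as link or video discovery, that would otherwise go unused.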

Categories

HTML/XHTML

License

Apache License V2.0

Additional Project Details

Registered

2025-11-14