Search, download, and process public domain texts from the Project Gutenberg collection.
Install the released version from CRAN:
install.packages("gutenbergr")
Install the development version from GitHub:
# install.packages("pak")
pak::pak("ropensci/gutenbergr")
Load the package:
library(gutenbergr)
library(dplyr)
We’ll get and set our Project Gutenberg mirror:
gutenberg_get_mirror()
#> [1] "https://aleph.pglaf.org"
Search through the metadata to find a book:
gutenberg_works(title == "Persuasion")
#> # A tibble: 1 × 8
#>   gutenberg_id title      author       gutenberg_author_id language
#>          <int> <chr>      <chr>                      <int> <fct>
#> 1          105 Persuasion Austen, Jane                  68 en
#>   gutenberg_bookshelf                           rights                    has_text
#>   <chr>                                         <fct>                     <lgl>
#> 1 Category: Novels/Category: British Literature Public domain in the USA. TRUE
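The same metadata search works on other columns. As a minimal sketch, assuming that gutenberg_works() passes its filter expression to dplyr::filter() and that authors are stored in the "Surname, Given name" form shown in the output above, this lists works by author rather than by title:

```r
library(gutenbergr)
library(dplyr)

# Filter the bundled metadata by author instead of title.
# The "Austen, Jane" spelling follows the author column shown above.
austen <- gutenberg_works(author == "Austen, Jane")

# Keep only the columns needed to pick a gutenberg_id for downloading.
austen |> select(gutenberg_id, title)
```

This reads only the metadata that ships with the package, so nothing is downloaded until gutenberg_download() is called.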
Persuasion’s gutenberg_id is 105. We’ll use that ID to download the
book, and set the cache option to "persistent" so that we don’t have to
re-download it later.
options(gutenbergr_cache_type = "persistent")
persuasion <- gutenberg_download(105)
persuasion
#> # A tibble: 8,357 × 2
#>    gutenberg_id text
#>           <int> <chr>
#>  1          105 "Persuasion"
#>  2          105 ""
#>  3          105 ""
#>  4          105 "by Jane Austen"
#>  5          105 ""
#>  6          105 "(1818)"
#>  7          105 ""
#>  8          105 ""
#>  9          105 ""
#> 10          105 ""
#> # ℹ 8,347 more rows
Multiple works can be downloaded at once. We’ll add title data from
the metadata.
books <- gutenberg_download(c(105, 161), meta_fields = "title")
books |> count(title)
#> # A tibble: 2 × 2
#>   title                           n
#>   <chr>                       <int>
#> 1 Persuasion                   8357
#> 2 Renascence, and Other Poems  1222
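Because the result is an ordinary tibble with one row per line of text, further processing can be done with plain dplyr verbs. The sketch below counts non-blank lines per title. Note that count_nonblank_lines() is a hypothetical helper, not part of gutenbergr, and sample_books is a small hand-made stand-in with the same columns as the download above, so the example runs without a network connection:

```r
library(dplyr)

# Hypothetical helper (not part of gutenbergr): count non-empty lines per title.
count_nonblank_lines <- function(books) {
  books |>
    filter(text != "") |>
    count(title, name = "nonblank_lines")
}

# Hand-made stand-in with the same columns as
# gutenberg_download(..., meta_fields = "title"); the rows are illustrative only.
sample_books <- tibble(
  gutenberg_id = c(105L, 105L, 105L, 161L, 161L),
  title = c(
    "Persuasion", "Persuasion", "Persuasion",
    "Renascence, and Other Poems", "Renascence, and Other Poems"
  ),
  text = c("Persuasion", "", "by Jane Austen", "Renascence, and Other Poems", "")
)

line_counts <- count_nonblank_lines(sample_books)
line_counts
```

Passing the real books tibble to the helper in place of sample_books should work the same way, since it relies only on the title and text columns.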
See the following vignettes for more advanced usage of gutenbergr.
- Getting Started with gutenbergr - explore metadata and download books
- Text Mining with gutenbergr and tidytext - complete analysis workflow with tidytext
See the data-raw directory for scripts. Metadata was generated from the
Project Gutenberg catalog on 11 January 2026.
Yes! The package follows Project Gutenberg’s rules:
- Retrieves books directly from mirrors using the authorized link format
- Prioritizes .zip files to minimize bandwidth
- Supports session and persistent caching
- This package is designed for downloading individual works or small collections, not the entire corpus. For bulk downloads, set up a mirror.
See their Terms of Use for details.
See CONTRIBUTING.md.
Note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

