gutenbergr

Search, download, and process public domain texts from the Project Gutenberg collection.

Installation

Install the released version from CRAN:

install.packages("gutenbergr")

Install the development version from GitHub:

# install.packages("pak")
pak::pak("ropensci/gutenbergr")

Quick Start

Load the package:

library(gutenbergr)
library(dplyr)

We’ll get and set our Project Gutenberg mirror:

gutenberg_get_mirror()

#> [1] "https://aleph.pglaf.org"

Search through the metadata to find a book:

gutenberg_works(title == "Persuasion")

#> # A tibble: 1 × 8
#>   gutenberg_id title      author       gutenberg_author_id language
#>          <int> <chr>      <chr>                      <int> <fct>   
#> 1          105 Persuasion Austen, Jane                  68 en      
#>   gutenberg_bookshelf                           rights                    has_text
#>   <chr>                                         <fct>                     <lgl>   
#> 1 Category: Novels/Category: British Literature Public domain in the USA. TRUE
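The same filter syntax should work on any metadata column shown above. As a minimal sketch, assuming author names are stored in the "Surname, Forename" form seen in the output, this looks up works by author instead of title:

```r
library(gutenbergr)

# Filter the bundled metadata by author rather than by title.
# The exact author string is taken from the Persuasion row above.
austen <- gutenberg_works(author == "Austen, Jane")

# Keep only the columns needed to pick a work to download.
austen[, c("gutenberg_id", "title")]
```

The gutenberg_id values returned here can be passed to gutenberg_download() in the next step.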

Persuasion’s gutenberg_id is 105, which we’ll use to download the text. Setting the cache option to "persistent" means we won’t have to re-download it later.

options(gutenbergr_cache_type = "persistent")
persuasion <- gutenberg_download(105)

persuasion

#> # A tibble: 8,357 × 2
#>    gutenberg_id text            
#>           <int> <chr>           
#>  1          105 "Persuasion"    
#>  2          105 ""              
#>  3          105 ""              
#>  4          105 "by Jane Austen"
#>  5          105 ""              
#>  6          105 "(1818)"        
#>  7          105 ""              
#>  8          105 ""              
#>  9          105 ""              
#> 10          105 ""              
#> # ℹ 8,347 more rows

Multiple works can be downloaded at once. We’ll add title data from the metadata.

books <- gutenberg_download(c(105, 109), meta_fields = "title")

books |> count(title)

#> # A tibble: 2 × 2
#>   title                           n
#>   <chr>                       <int>
#> 1 Persuasion                   8357
#> 2 Renascence, and Other Poems  1222
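Downloaded text includes many blank lines, so raw row counts overstate the amount of content. The sketch below shows one way to count only non-blank lines per title. It uses a small made-up data frame (books_example, a hypothetical stand-in with the title and text columns) so that it runs without a network connection; the same pipeline should apply to a real result from gutenberg_download().

```r
library(dplyr)

# Hypothetical stand-in for a download result; these rows are invented
# for illustration and are not real Project Gutenberg data.
books_example <- data.frame(
  title = c(
    "Persuasion", "Persuasion", "Persuasion",
    "Renascence, and Other Poems", "Renascence, and Other Poems"
  ),
  text = c("Persuasion", "", "by Jane Austen", "RENASCENCE", ""),
  stringsAsFactors = FALSE
)

# Drop empty lines, then count the remaining lines for each title.
nonblank_counts <- books_example |>
  filter(text != "") |>
  count(title)

nonblank_counts
```

Filtering before counting keeps the step explicit and leaves the original rows untouched for later processing.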

Vignettes

See the following vignettes for more advanced usage of gutenbergr.

Getting Started with gutenbergr - explore metadata and download books
Text Mining with gutenbergr and tidytext - complete analysis workflow with tidytext

FAQ

How were the metadata files generated?

See the data-raw directory for scripts. Metadata was generated from the Project Gutenberg catalog on 11 January 2026.

Do you respect robot access rules?

Yes! The package follows Project Gutenberg’s rules:

Retrieves books directly from mirrors using the authorized link format
Prioritizes .zip files to minimize bandwidth
Supports session and persistent caching

This package is designed for downloading individual works or small collections, not the entire corpus. For bulk downloads, set up a mirror.

See their Terms of Use for details.

Contributing

See CONTRIBUTING.md.

Note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
