paper-organize

A command-line utility for organizing academic papers with intelligent metadata extraction and descriptive filenames. Supports downloading from URLs or processing existing PDF files.

Features

Smart Naming: Automatically extracts metadata from PDFs and generates readable filenames like Wang_Hierarchical_Reasoning_Model.pdf
Unified Input: Process URLs, individual files, or entire directories of PDFs
Network Resilience: Built-in retry logic with exponential backoff for network failures
Progress Tracking: Real-time download progress with size information
Conflict Resolution: Automatic handling of filename conflicts with numbered suffixes
Graceful Fallbacks: Works even when metadata extraction fails
Batch Processing: Organize entire directories of PDFs with a single command
Environment Support: Configurable default directory via the PAPERS_DIR environment variable

Installation

From Source

git clone <repository-url>
cd paper-organize
pip install -e .

Development Installation

git clone <repository-url>
cd paper-organize
uv sync --extra dev

Usage

Basic Usage

# Download and organize a paper from URL
paper-organize https://arxiv.org/pdf/2506.21734

# Organize an existing PDF file
paper-organize ./downloaded-paper.pdf

# Batch organize all PDFs in a directory
paper-organize ./papers-directory/

Real Example

# Download and organize this arXiv paper:
paper-organize https://arxiv.org/pdf/2506.21734

# Creates file: Wang_Hierarchical_Reasoning_Model.pdf
# Instead of: 2506.21734.pdf

Command Options

paper-organize --help

# Usage: paper-organize [OPTIONS] INPUT
#
# INPUT can be:
#   • URL          Download and organize a paper from the web
#   • PDF file     Organize an existing PDF file  
#   • Directory    Batch organize all PDFs in a directory
# 
# Options:
#   --dir DIRECTORY   Directory to save organized files (overrides PAPERS_DIR)
#   --name TEXT       Custom filename for the organized file
#   --no-auto-name    Skip metadata extraction and use original filename
#   --quiet           Suppress output for scripting
#   --verbose         Show detailed output
#   --help            Show this message and exit
#
# Directory Priority: --dir > PAPERS_DIR environment variable > ~/Papers (default)
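The three kinds of INPUT listed above can be told apart with a few standard-library checks. This is an illustrative sketch only: `classify_input` and `InputKind` are hypothetical names, not part of paper-organize, and treating only `http`/`https` as URLs and matching the `.pdf` extension case-insensitively are assumptions.

```python
from enum import Enum
from pathlib import Path
from urllib.parse import urlparse


class InputKind(Enum):
    URL = "url"
    PDF_FILE = "pdf_file"
    DIRECTORY = "directory"


def classify_input(value: str) -> InputKind:
    """Decide how to treat INPUT (hypothetical helper, not the tool's API)."""
    parsed = urlparse(value)
    # Anything with an http(s) scheme and a host is treated as a URL to download.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return InputKind.URL
    path = Path(value).expanduser()
    if path.is_dir():
        return InputKind.DIRECTORY
    # Assumption: the extension check ignores case, so "paper.PDF" also counts.
    if path.is_file() and path.suffix.lower() == ".pdf":
        return InputKind.PDF_FILE
    raise ValueError(f"Not a URL, PDF file, or directory: {value}")
```

Raising `ValueError` for anything else mirrors the README's note that invalid URLs or file paths are reported as errors.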

Environment Variables

# Set default download directory
export PAPERS_DIR="$HOME/Research/Papers"

# Now all organized papers go to ~/Research/Papers by default
paper-organize https://arxiv.org/pdf/2506.21734

# Override for specific operation
paper-organize https://arxiv.org/pdf/2506.21734 --dir ./references/

How It Works

1. Processes the input: downloads from a URL (with progress tracking) or reads an existing file
2. Extracts metadata using a layered strategy:
   - PyPDF for basic PDF metadata
   - An enhanced extraction pipeline that uses the arXiv API and pdfplumber to find academic identifiers (DOI, arXiv ID)
   - Title parsing from the PDF text as a fallback
3. Generates a filename in the format {FirstAuthor}_{Year}_{Title}.pdf
4. Sanitizes the filename for filesystem compatibility
5. Resolves conflicts by appending a number if a file with that name already exists
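For the identifier step, once PyPDF or pdfplumber has produced the page text, arXiv IDs and DOIs can be spotted with regular expressions. This is a sketch under stated assumptions, not the tool's actual patterns: `find_arxiv_id` and `find_doi` are hypothetical, only new-style arXiv IDs (such as 2506.21734) preceded by `arXiv:` or an arxiv.org URL are matched, and the DOI pattern follows Crossref's published recommendation for modern DOIs.

```python
import re
from typing import Optional

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN) after an "arXiv:" stamp or an
# arxiv.org/abs|pdf URL. Old-style IDs such as hep-th/9901001 are not handled.
ARXIV_ID = re.compile(
    r"(?:arXiv:\s*|arxiv\.org/(?:abs|pdf)/)(\d{4}\.\d{4,5})(?!\d)", re.IGNORECASE
)
# Modern DOI shape: "10." + registrant code + "/" + suffix.
DOI = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+)")


def find_arxiv_id(text: str) -> Optional[str]:
    """Return the first arXiv ID in text, without any version suffix."""
    match = ARXIV_ID.search(text)
    return match.group(1) if match else None


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI in text, minus trailing sentence punctuation."""
    match = DOI.search(text)
    return match.group(1).rstrip(".,;") if match else None
```

Requiring a prefix for arXiv IDs trades recall for precision: bare numbers like "1234.5678" in a table are not mistaken for identifiers. A found ID could then be looked up through the arXiv API to get authoritative title and author data.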

Intelligent Filename Examples

Wang_2024_Hierarchical_Reasoning_Model.pdf
Smith_2023_Deep_Learning_Survey.pdf
Chen_2024_Attention_Mechanisms_NLP.pdf
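Names like these can be produced by joining the first author's surname, the year, and the title words, then sanitizing the result and avoiding collisions. A minimal sketch: `build_filename`, `sanitize`, and `unique_path` are hypothetical helpers, and the six-word title cap, dropping the year when it is unknown, ASCII-only output, and the `_1`, `_2` suffix style are all assumptions rather than documented behavior.

```python
import re
import unicodedata
from pathlib import Path
from typing import Optional


def sanitize(name: str) -> str:
    """Keep only filesystem-safe ASCII letters, digits, underscores, and hyphens."""
    # Decompose accents (e.g. "é" -> "e" + accent) and drop non-ASCII leftovers.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    underscored = re.sub(r"\s+", "_", ascii_name)
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", underscored)
    # Collapse runs of underscores left behind by removed punctuation.
    return re.sub(r"_+", "_", cleaned).strip("_")


def build_filename(
    first_author_surname: str, year: Optional[int], title: str, max_words: int = 6
) -> str:
    """Build {FirstAuthor}_{Year}_{Title}.pdf, omitting the year if unknown."""
    parts = [first_author_surname]
    if year is not None:
        parts.append(str(year))
    parts.extend(title.split()[:max_words])
    return sanitize("_".join(parts)) + ".pdf"


def unique_path(directory: Path, filename: str) -> Path:
    """Append _1, _2, ... before the extension until the name is unused."""
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

For example, `build_filename("Wang", 2024, "Hierarchical Reasoning Model")` returns `"Wang_2024_Hierarchical_Reasoning_Model.pdf"`, and with `year=None` it returns `"Wang_Hierarchical_Reasoning_Model.pdf"`.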

Advanced Usage

Batch Processing

# Organize all PDFs in a directory
paper-organize ~/Downloads/papers/

# Output: Processes each PDF and organizes with metadata-based names
# Example output:
# → Processing existing file: paper1.pdf
# ✓ Renamed to: Wang_2024_Deep_Learning.pdf
# → Processing existing file: paper2.pdf  
# ✓ Renamed to: Smith_2023_Neural_Networks.pdf
# 📊 Summary: Processed 15 files
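Batch mode can be pictured as a loop over the PDFs in the directory that records failures instead of stopping at the first bad file. In this sketch, `organize_directory` and its `organize_one` callback are hypothetical; the README does not say whether paper-organize recurses into subdirectories, so the sketch assumes it does not.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def organize_directory(
    directory: Path, organize_one: Callable[[Path], Path]
) -> Tuple[List[Path], List[Tuple[Path, Exception]]]:
    """Run organize_one on every PDF directly inside directory.

    Failures are collected rather than raised, so one corrupted file
    does not abort the whole batch.
    """
    organized: List[Path] = []
    failed: List[Tuple[Path, Exception]] = []
    # Non-recursive, case-insensitive on ".pdf", sorted for a stable order.
    pdfs = sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    for pdf in pdfs:
        try:
            organized.append(organize_one(pdf))
        except (OSError, ValueError) as exc:
            failed.append((pdf, exc))
    return organized, failed
```

A caller could then print a summary line from `len(organized)` and list each entry in `failed` with its error.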

Custom Organization

# Organize to specific directory with custom name
paper-organize arxiv-paper.pdf --dir ./references/ --name "important-paper"

# Disable automatic renaming
paper-organize https://arxiv.org/pdf/2506.21734 --no-auto-name

Error Handling

The tool handles these error conditions gracefully:

Network failures with automatic retry
Invalid URLs or file paths
Permission errors
Corrupted or non-PDF files
Metadata extraction failures
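According to the feature list, the automatic retry for network failures uses exponential backoff. A generic version of that idea looks like the sketch below; `retry_with_backoff` is a hypothetical helper, and the retry count, base delay, and 10% jitter are assumptions. Requests raises its own exception types (not the built-in `ConnectionError`), so with Requests you would pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient errors with exponentially growing waits."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Waits of base_delay * 1, 2, 4, ... plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Injecting `sleep` lets tests record the delays instead of actually waiting. A download call would look like `retry_with_backoff(lambda: requests.get(url, timeout=30), retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout))`.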

Development

Running Tests

uv run pytest

Type Checking

uv run mypy src/ tests/

Linting and Formatting

uv run ruff check src/ tests/
uv run ruff format src/ tests/

Dependencies

This project builds on several excellent open-source libraries:

Click (BSD-3-Clause) - Command line interface toolkit
Requests (Apache-2.0) - HTTP library for downloads
PyPDF (BSD-3-Clause) - PDF text extraction and metadata
pdfplumber (MIT) - Enhanced PDF text extraction
arxiv (MIT) - Official arXiv API client
tqdm (MIT/MPL-2.0) - Progress bars
pytest (MIT) - Testing framework
MyPy (MIT) - Static type checker
Ruff (MIT) - Fast Python linter and formatter

We're grateful to the maintainers and contributors of these projects for making paper-organize possible.

License

MIT License - see LICENSE file for details.
