A command-line utility for organizing academic papers with intelligent metadata extraction and descriptive filenames. Supports downloading from URLs or processing existing PDF files.
- Smart Naming: Automatically extracts metadata from PDFs and generates readable filenames like Wang_Hierarchical_Reasoning_Model.pdf
- Unified Input: Process URLs, individual files, or entire directories of PDFs
- Network Resilience: Built-in retry logic with exponential backoff for network failures
- Progress Tracking: Real-time download progress with size information
- Conflict Resolution: Automatic handling of filename conflicts with numbered suffixes
- Graceful Fallbacks: Works even when metadata extraction fails
- Batch Processing: Organize entire directories of PDFs with a single command
- Environment Support: Configurable default directories via environment variables
git clone <repository-url>
cd paper-organize
pip install -e .

Or, for a development setup with uv:

git clone <repository-url>
cd paper-organize
uv sync --extra dev

# Download and organize a paper from URL
paper-organize https://arxiv.org/pdf/2506.21734
# Organize an existing PDF file
paper-organize ./downloaded-paper.pdf
# Batch organize all PDFs in a directory
paper-organize ./papers-directory/

# Download and organize this arXiv paper:
paper-organize https://arxiv.org/pdf/2506.21734
# Creates file: Wang_Hierarchical_Reasoning_Model.pdf
# Instead of: 2506.21734.pdf

paper-organize --help
# Usage: paper-organize [OPTIONS] INPUT
#
# INPUT can be:
# • URL Download and organize a paper from the web
# • PDF file Organize an existing PDF file
# • Directory Batch organize all PDFs in a directory
#
# Options:
# --dir DIRECTORY Directory to save organized files (overrides PAPERS_DIR)
# --name TEXT Custom filename for the organized file
# --no-auto-name Skip metadata extraction and use original filename
# --quiet Suppress output for scripting
# --verbose Show detailed output
# --help Show this message and exit
#
# Directory Priority: --dir > PAPERS_DIR environment variable > ~/Papers (default)

# Set default download directory
export PAPERS_DIR="$HOME/Research/Papers"
# Now all organized papers go to ~/Research/Papers by default
paper-organize https://arxiv.org/pdf/2506.21734
# Override for specific operation
paper-organize https://arxiv.org/pdf/2506.21734 --dir ./references/
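The directory priority above can be pictured as a short lookup. This is a minimal sketch of the resolution order, not the tool's actual code; the function name resolve_output_dir is illustrative:

```python
import os
from pathlib import Path

def resolve_output_dir(cli_dir: str | None = None) -> Path:
    """Sketch of the priority: --dir > PAPERS_DIR > ~/Papers (default)."""
    if cli_dir:                                 # 1. explicit --dir always wins
        return Path(cli_dir).expanduser()
    env_dir = os.environ.get("PAPERS_DIR")
    if env_dir:                                 # 2. fall back to the PAPERS_DIR environment variable
        return Path(env_dir).expanduser()
    return Path.home() / "Papers"               # 3. built-in default
```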
For each input, paper-organize:

- Processes the input: downloads from URLs or reads existing files, with progress tracking
- Extracts metadata using a layered strategy (see the sketch after this list):
  - PyPDF for basic PDF metadata
  - An enhanced extraction pipeline with the arXiv API and pdfplumber for academic identifiers (DOI, arXiv ID)
  - Title parsing from the PDF text as a fallback
- Generates a filename in the format {FirstAuthor}_{Year}_{Title}.pdf
- Sanitizes the filename for filesystem compatibility
- Resolves conflicts by appending a numbered suffix if the file already exists (a naming sketch follows the example filenames below)
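As a rough illustration of the layered extraction, the sketch below tries embedded PDF metadata first and falls back to parsing the first page's text; the arXiv API lookup used by the real pipeline is omitted here, and extract_title is a hypothetical helper, not the tool's API:

```python
import pdfplumber
from pypdf import PdfReader

def extract_title(path: str) -> str | None:
    """Sketch of a layered fallback: embedded metadata, then first-page text."""
    # Layer 1: embedded PDF metadata via pypdf
    info = PdfReader(path).metadata
    if info and info.title:
        return info.title
    # (The real pipeline also queries the arXiv API when it finds an arXiv ID or DOI.)
    # Layer 2: parse the first page's text with pdfplumber and take the first line
    with pdfplumber.open(path) as pdf:
        text = pdf.pages[0].extract_text() or ""
    lines = text.strip().splitlines()
    return lines[0] if lines else None
```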
Example generated filenames:

- Wang_2024_Hierarchical_Reasoning_Model.pdf
- Smith_2023_Deep_Learning_Survey.pdf
- Chen_2024_Attention_Mechanisms_NLP.pdf
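Names like these can be assembled from the extracted metadata roughly as in the sketch below; build_filename and its sanitization rules are illustrative assumptions, not the tool's actual implementation:

```python
import re
from pathlib import Path

def build_filename(first_author: str, year: str | None, title: str, directory: Path) -> Path:
    """Sketch of {FirstAuthor}_{Year}_{Title}.pdf with sanitization and conflict suffixes."""
    parts = [first_author] + ([year] if year else []) + [title]
    stem = re.sub(r"\s+", "_", "_".join(parts))        # spaces -> underscores
    stem = re.sub(r"[^A-Za-z0-9_\-]", "", stem)        # drop filesystem-unfriendly characters
    candidate = directory / f"{stem}.pdf"
    suffix = 1
    while candidate.exists():                          # Wang_2024_Title_1.pdf, _2.pdf, ...
        candidate = directory / f"{stem}_{suffix}.pdf"
        suffix += 1
    return candidate

# build_filename("Wang", "2024", "Hierarchical Reasoning Model", Path.home() / "Papers")
# -> ~/Papers/Wang_2024_Hierarchical_Reasoning_Model.pdf
```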
# Organize all PDFs in a directory
paper-organize ~/Downloads/papers/
# Output: Processes each PDF and organizes with metadata-based names
# Example output:
# → Processing existing file: paper1.pdf
# ✓ Renamed to: Wang_2024_Deep_Learning.pdf
# → Processing existing file: paper2.pdf
# ✓ Renamed to: Smith_2023_Neural_Networks.pdf
# 📊 Summary: Processed 15 files

# Organize to a specific directory with a custom name
paper-organize arxiv-paper.pdf --dir ./references/ --name "important-paper"
# Disable automatic renaming
paper-organize https://arxiv.org/pdf/2506.21734 --no-auto-name

The tool gracefully handles various error conditions; a sketch of the retry behavior follows this list:
- Network failures with automatic retry
- Invalid URLs or file paths
- Permission errors
- Corrupted or non-PDF files
- Metadata extraction failures
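The retry behavior is conceptually like the sketch below; the attempt count, delays, and download_with_retry helper are illustrative assumptions rather than the tool's actual settings:

```python
import time
import requests

def download_with_retry(url: str, attempts: int = 3, base_delay: float = 1.0) -> bytes:
    """Sketch of a download retried with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response.content
        except requests.RequestException:
            if attempt == attempts - 1:
                raise                                  # out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)      # back off before the next try
    raise AssertionError("unreachable")
```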
For development, run tests, type checks, and linting with:

uv run pytest
uv run mypy src/ tests/
uv run ruff check src/ tests/
uv run ruff format src/ tests/

This project builds on several excellent open-source libraries:
- Click (BSD-3-Clause) - Command line interface toolkit
- Requests (Apache-2.0) - HTTP library for downloads
- PyPDF (BSD-3-Clause) - PDF text extraction and metadata
- pdfplumber (MIT) - Enhanced PDF text extraction
- arxiv (MIT) - Official arXiv API client
- tqdm (MIT/MPL-2.0) - Progress bars
- pytest (MIT) - Testing framework
- MyPy (MIT) - Static type checker
- Ruff (MIT) - Fast Python linter and formatter
We're grateful to the maintainers and contributors of these projects for making paper-organize possible.
MIT License - see LICENSE file for details.