A Python tool for scraping the New York Film Festival (NYFF) website, useful for planning what you want to see at the festival.
Author: Jack Murphy
The script grabs everything showing at NYFF, looks up IMDb data for each film (production companies, runtime, and a likely release date when available), and compiles it all together. It also finds trailers, flags whether a screening is sold out, and more.
The NYFF doesn't centralize trailers, nor does it list release dates; this tool does that for you.
```bash
git clone <repository-url>
cd nyff-scraper
pip install -e .
```

```bash
# Scrape NYFF 2025 lineup with full enrichment
nyff-scraper

# Scrape a custom URL
nyff-scraper https://www.filmlinc.org/nyff/nyff63-lineup/

# Test with limited films
nyff-scraper --limit 10

# Skip trailer search (faster)
nyff-scraper --skip-trailers

# Export only specific formats
nyff-scraper --csv-only
```

The JSON output is structured data you can use however you like: load it into a spreadsheet, feed it to another script, or even hand it to an LLM to ask more complex questions. For example, you could ask it to find three films that are unlikely to be in theatres next year and don't have overlapping showtimes, or to filter by whether you want to attend introductions or avoid them.
You can look through the JSON file or the CSV to see what kinds of fields there are.
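As a minimal sketch of working with the JSON in Python — the field names used here (`title`, `showtimes`, `sold_out`) are assumptions for illustration, not the guaranteed schema; check your own output file for the real fields:

```python
import json

# Sample records mirroring the assumed shape of the scraper's JSON
# output -- the field names here are illustrative, not the real schema.
raw = """[
  {"title": "Film A", "showtimes": [{"time": "2025-09-27T18:00", "sold_out": true}]},
  {"title": "Film B", "showtimes": [{"time": "2025-09-28T15:00", "sold_out": false},
                                    {"time": "2025-09-29T20:30", "sold_out": true}]}
]"""
films = json.loads(raw)

# Collect the showtimes of a film that are not sold out.
def available_showtimes(film):
    return [s for s in film.get("showtimes", []) if not s.get("sold_out")]

# Keep only films that still have at least one available showtime.
watchable = [f["title"] for f in films if available_showtimes(f)]
print(watchable)  # -> ['Film B']
```

In practice you would replace `raw` with the contents of the generated JSON file.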
Flattened data with one row per showtime, suitable for spreadsheet analysis. One row per showtime can be a lot at once, but it allows for much more robust filtering.
Human-readable format perfect for documentation and sharing.
```bash
# Clone the repository
gh repo clone rdcdc/nyff-scraper
cd nyff-scraper

# Install in development mode
pip install -e .

# Or install from PyPI (when published)
pip install nyff-scraper
```

```bash
# Clone and install dependencies
gh repo clone rdcdc/nyff-scraper
cd nyff-scraper
pip install -r requirements.txt

# Run directly
python -m src.nyff_scraper.cli
```

The `nyff-scraper` command provides a comprehensive CLI with many options:
```bash
# Full pipeline with all features
nyff-scraper

# Scrape only (no enrichment)
nyff-scraper --only-scrape

# Skip specific enrichment steps
nyff-scraper --skip-imdb --skip-trailers

# Check your Letterboxd account for recommendations (experimental)
nyff-scraper --letterboxd yourusername

# Custom output location
nyff-scraper --output-dir ./results --output-name my_films

# Test with limited data
nyff-scraper --limit 5 --verbose
```

You can also use the components directly in Python:
```python
from nyff_scraper import NYFFScraper, IMDbEnricher, TrailerEnricher
from nyff_scraper.exporters import export_all_formats

# Initialize components
scraper = NYFFScraper()
imdb_enricher = IMDbEnricher()
trailer_enricher = TrailerEnricher()

# Scrape films
films = scraper.scrape_nyff_lineup()

# Enrich with IMDb data
films = imdb_enricher.enrich_films(films)

# Add trailers
films = trailer_enricher.enrich_films(films, search_trailers=True)

# Export to all formats
export_all_formats(films, "my_films")
```

```text
positional arguments:
  url                  URL to scrape (default: NYFF 2025 lineup)

processing options:
  --only-scrape        Only scrape film data, skip IMDb and trailer enrichment
  --skip-imdb          Skip IMDb enrichment (production companies, distributors)
  --skip-trailers      Skip YouTube trailer search
  --limit N            Limit processing to first N films (useful for testing)

output options:
  --output-dir DIR     Output directory for generated files (default: current directory)
  --output-name NAME   Base name for output files (default: nyff_films)
  --cache-dir DIR      Directory for caching web requests (default: cache)

export format options:
  --json-only          Export only JSON format
  --csv-only           Export only CSV format
  --markdown-only      Export only Markdown format

utility options:
  --verbose, -v        Enable verbose logging
  --quiet, -q          Suppress all output except errors
  --help, -h           Show this help message and exit
```
```text
nyff-scraper/
├── src/
│   └── nyff_scraper/
│       ├── __init__.py          # Package initialization
│       ├── cli.py               # Command-line interface
│       ├── scraper.py           # Web scraping functionality
│       ├── imdb_enricher.py     # IMDb data enrichment
│       ├── trailer_enricher.py  # YouTube trailer search
│       └── exporters.py         # Data export modules
├── tests/                       # Test suite
├── scripts/                     # Additional utility scripts
├── pyproject.toml               # Project configuration
├── requirements.txt             # Core dependencies
├── requirements-dev.txt         # Development dependencies
├── README.md                    # This file
└── .gitignore                   # Git ignore patterns
```
```bash
# Clone the repository
git clone <repository-url>
cd nyff-scraper

# Install in development mode with dev dependencies
pip install -e ".[dev]"

# Or use requirements files
pip install -r requirements-dev.txt

# Set up pre-commit hooks
pre-commit install
```

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=nyff_scraper

# Run specific test file
pytest tests/test_scraper.py
```

```bash
# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/
```

The scraper is built with a modular architecture:
- `NYFFScraper`: Handles web scraping of film lineup pages
- `IMDbEnricher`: Searches IMDb and extracts production/distribution data
- `TrailerEnricher`: Searches YouTube for film trailers
- Exporters: Convert data to various output formats (JSON, CSV, Markdown)
- CLI: Command-line interface tying everything together
Each module can be used independently, making it easy to customize the workflow or extend functionality.
The architecture is designed to be extensible. To adapt for other film festivals:
- Create a new scraper class inheriting from a base scraper
- Implement festival-specific parsing logic
- Update the CLI to support the new festival
- Add festival-specific configuration
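A rough sketch of that inheritance pattern — the `BaseScraper` class, method names, and markup below are hypothetical, not the package's actual API:

```python
import re

# Hypothetical base class: the real package's scraper interface may
# differ; this only illustrates the subclass-per-festival pattern.
class BaseScraper:
    def scrape_lineup(self, html: str) -> list[dict]:
        # Shared pipeline; subclasses supply festival-specific parsing.
        return self.parse_lineup(html)

    def parse_lineup(self, html: str) -> list[dict]:
        raise NotImplementedError("festival-specific parsing goes here")

class ExampleFestScraper(BaseScraper):
    """Scraper for a made-up festival whose lineup page wraps each
    film title in <h3 class="film-title"> tags (assumed markup)."""

    def parse_lineup(self, html: str) -> list[dict]:
        titles = re.findall(r'<h3 class="film-title">(.*?)</h3>', html)
        return [{"title": t} for t in titles]

sample = '<h3 class="film-title">One</h3><h3 class="film-title">Two</h3>'
print(ExampleFestScraper().scrape_lineup(sample))
# -> [{'title': 'One'}, {'title': 'Two'}]
```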
- `requests`: HTTP library for web scraping
- `beautifulsoup4`: HTML parsing and extraction
- `lxml`: Fast XML/HTML parser
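Together these cover the fetch-and-parse pipeline. A minimal sketch of how a lineup page might be parsed with BeautifulSoup — the CSS selector and markup here are assumptions for illustration, not the real site's structure:

```python
from bs4 import BeautifulSoup

# Extract film titles from a lineup page. The "h3.film-title"
# selector is an assumption; the real site's markup will differ.
def extract_titles(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.select("h3.film-title")]

# In the real scraper the HTML would come from
# requests.get(lineup_url).text; a static snippet is used here.
page = """
<div class="lineup">
  <h3 class="film-title">Example Film</h3>
  <h3 class="film-title">Another Film</h3>
</div>
"""
print(extract_titles(page))  # -> ['Example Film', 'Another Film']
```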
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-feature`)
- Make your changes
- Add tests for new functionality
- Run the test suite and linting
- Commit your changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature/new-feature`)
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- New York Film Festival for providing accessible film data
- IMDb for production and distribution information
- YouTube for trailer hosting and search capabilities
**"No films found"**: Check that the URL is correct and that the website structure hasn't changed.
**Rate limiting**: The scraper includes delays to be respectful to servers. For faster testing, use the `--limit` option.
**Missing dependencies**: Ensure all requirements are installed with `pip install -r requirements.txt`.
**Permission errors**: Make sure you have write permissions in the output directory.
- Check the Issues page for known problems
- Create a new issue with detailed error information
- Use the `--verbose` flag for detailed logging when reporting issues