# Lecture Processor

Automatically extract slides and transcribe audio from lecture videos.
## Quick Start (Google Colab)

Use the Google Colab notebook for the easiest experience - no installation required.
The Colab notebook provides a user-friendly interface with:
- Form-based configuration (no command line needed)
- Automatic GPU acceleration for Whisper
- One-click download of results
- Google Drive integration for easy file access
## Features

- Smart Slide Detection: Grid-based peripheral uniformity analysis to distinguish actual slides from UI elements
- Letterbox Detection: Automatically detects and crops letterbox bars from recordings
- Sensitive Change Detection: Catches animated slides and incrementally revealed text
- Fast Transcription: Uses Deepgram API by default (or Whisper as fallback)
- Batch Processing: Process entire folders of lecture videos
- PDF Output: All slides compiled into a single PDF for easy ingestion by LLMs (see the sketch below)
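For reference, compiling extracted slide images into one PDF is straightforward with Pillow. This is a minimal sketch, not necessarily how the tool does it internally; the `slides/` directory, the glob pattern, and the use of Pillow are assumptions:

```python
# Sketch: merge slide images into a single PDF with Pillow.
# The glob pattern and file names are placeholders, not the tool's real layout.
from pathlib import Path

from PIL import Image

slide_paths = sorted(Path("slides").glob("slide_*.png"))
pages = [Image.open(p).convert("RGB") for p in slide_paths]  # PDF pages must be RGB
pages[0].save("slides.pdf", save_all=True, append_images=pages[1:])
```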
## Requirements

- Python 3.9+
- FFmpeg (for audio extraction; see the sketch after this list)
- Deepgram API key (free tier available at deepgram.com)
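For context, extracting a transcription-ready audio track with FFmpeg typically looks like the sketch below. The mono 16 kHz WAV settings are an assumption; the exact flags the tool passes may differ:

```python
# Sketch: extract a mono 16 kHz WAV from a lecture video via the ffmpeg CLI.
# Assumes ffmpeg is on PATH; file names are placeholders.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "lecture.mp4",     # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM
        "-ar", "16000",          # 16 kHz sample rate (common for speech models)
        "-ac", "1",              # mono
        "lecture.wav",
    ],
    check=True,
)
```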
## Installation

```bash
git clone https://github.com/graysonmc/Lecture-Processor.git
cd Lecture-Processor

# Create a virtual environment (using Anaconda)
conda create -n lecture-processor python=3.11
conda activate lecture-processor

# Or using venv
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Create a `.env` file in the project directory with your Deepgram API key:

```
DEEPGRAM_API_KEY=your-api-key-here
```

Get a free API key at https://deepgram.com ($200 free credits).
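To sanity-check that the key is picked up, here is a minimal sketch using python-dotenv and Deepgram's prerecorded-transcription REST endpoint. The file name `lecture.wav` is a placeholder, and this is not necessarily how the project calls Deepgram internally:

```python
# Sketch: load DEEPGRAM_API_KEY from .env and transcribe one audio file.
# Requires python-dotenv and requests; "lecture.wav" is a placeholder.
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
api_key = os.getenv("DEEPGRAM_API_KEY")
assert api_key, "DEEPGRAM_API_KEY is not set"

with open("lecture.wav", "rb") as audio:
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
        data=audio,
    )
response.raise_for_status()
print(response.json()["results"]["channels"][0]["alternatives"][0]["transcript"])
```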
## Usage

For users who prefer running locally instead of Colab:

### Batch Processing

Put your lecture videos in `~/Desktop/Raw Lectures/`, then run:

```bash
python process_lecture.py --batch
```

Output goes to `~/Desktop/Processed Lectures/<video_name>/`:

- `slides.pdf` - All slides in one PDF
- `transcript.txt` - Full transcription
### Other Options

```bash
# Process a single video
python process_lecture.py lecture.mp4

# Slides only (skip transcription)
python process_lecture.py --batch --slides-only

# Custom output directory
python process_lecture.py --batch --output-dir /path/to/output

# Adjust slide detection sensitivity (lower = more sensitive)
python process_lecture.py --batch --threshold 20

# Use local Whisper instead of Deepgram
python process_lecture.py --batch --whisper-model base
```

## How Slide Detection Works

- Letterbox Detection: Requires black bars on the sides (a positive signal for recorded slides)
- Rectangle Mapping: Detects slide boundaries within the letterbox
- Grid Analysis: Maps an 18x12 grid onto the detected slide region
- Peripheral Uniformity: Checks that edges/corners have a consistent white background
- Change Detection: Sensitive to new text appearing (catches animated slides); see the sketch below
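The sketch below illustrates the pipeline's core ideas. It is not the project's actual code: the column-scan letterbox heuristic, the per-cell brightness/variance test, and every threshold value are illustrative assumptions:

```python
# Illustrative sketch of the detection steps above (NOT the project's code).
# All heuristics and threshold values here are assumptions for demonstration.
import cv2
import numpy as np

def crop_letterbox(frame, dark_thresh=16):
    """Trim near-black columns (letterbox bars) from the left/right edges."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    col_brightness = gray.mean(axis=0)             # mean brightness per column
    bright = np.where(col_brightness > dark_thresh)[0]
    if bright.size == 0:
        return None                                # whole frame dark: no slide
    return frame[:, bright[0]:bright[-1] + 1]

def looks_like_slide(region, grid=(18, 12), cell_std_max=12.0):
    """Grid analysis + peripheral uniformity: border cells must be bright and flat."""
    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    gx, gy = grid
    for j in range(gy):
        for i in range(gx):
            if 0 < i < gx - 1 and 0 < j < gy - 1:
                continue                           # skip interior cells
            cell = gray[j * h // gy:(j + 1) * h // gy,
                        i * w // gx:(i + 1) * w // gx]
            # A slide periphery should be white-ish and low-variance.
            if cell.mean() < 180 or cell.std() > cell_std_max:
                return False
    return True

def changed(prev_gray, cur_gray, threshold=20):
    """Change detection: flag frames where even a small patch of pixels moved.
    Assumes both frames were cropped to the same size."""
    diff = cv2.absdiff(prev_gray, cur_gray)
    return (diff > threshold).mean() > 0.002       # >0.2% of pixels changed
```

If the CLI's `--threshold` flag maps onto something like the `threshold` parameter in `changed()`, that would explain why lower values detect more (and produce more duplicates), matching the troubleshooting tips below.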
## Output Structure

```
~/Desktop/Processed Lectures/
├── Lecture-01/
│   ├── slides.pdf
│   └── transcript.txt
├── Lecture-02/
│   ├── slides.pdf
│   └── transcript.txt
└── ...
```
## Testing

```bash
pip install pytest
pytest tests/ -v
```

## Troubleshooting

- No slides detected: Video may not have letterbox bars, or slides may have dark backgrounds
- Too many duplicates: Lower the threshold with `--threshold 15`
- Missing slides: Raise the threshold with `--threshold 30`
- Transcription fails: Check that your `DEEPGRAM_API_KEY` is set correctly
## License

MIT