AI-Powered Desktop Transcription Application
A modern, cross-platform desktop application for high-accuracy speech-to-text transcription, powered by OpenAI Whisper and built with Tauri + React.
Steno is a desktop application that converts audio into text using OpenAI Whisper models, run through whisper.cpp. It supports both real-time recording and file-based transcription, with multiple languages and audio formats.
- Advanced Audio Processing - Support for MP3, WAV, FLAC, OGG, AAC, M4A, WMA with intelligent format conversion
- Real-time Transcription - Live audio capture with simultaneous speech recognition
- Multi-language Support - Optimized for Chinese, English, and Indonesian with automatic language detection
- AI Model Flexibility - Choose from Tiny (39MB), Base (74MB), or Large v3 (1.5GB) Whisper models
- Native macOS Performance - Optimized desktop application for macOS
- Apple Silicon Optimization - Metal GPU acceleration on M1/M2/M3 Macs
| Platform | Version | Architecture | Memory | Storage |
|---|---|---|---|---|
| macOS | 10.15+ | Apple Silicon (M1/M2/M3) | 4GB+ | 200MB+ |
# Download and install
curl -L -o Steno.dmg https://github.com/xazaj/Steno/releases/download/v1.0.0/Steno_1.0.0_aarch64.dmg
open Steno.dmg
# Configure permissions (required for unsigned apps)
xattr -rd com.apple.quarantine "/Applications/Steno.app"
codesign --force --deep --sign - "/Applications/Steno.app"
Model Selection: Choose your preferred Whisper model based on your needs:
- Tiny Model (39MB) - Fast processing, basic accuracy
- Base Model (74MB) - Balanced performance and quality
- Large v3 Model (1.5GB) - Highest accuracy, slower processing
Permissions: Grant microphone access for real-time transcription features
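The trade-off between the three models can be sketched as a small selection helper. This is an illustration only: `chooseModel`, the `Priority` labels, and the fallback rule are hypothetical and not part of Steno's API, and the sizes are the approximate figures from the list above.

```typescript
// Hypothetical helper, not Steno's actual API.
type WhisperModel = "tiny" | "base" | "large-v3";
type Priority = "speed" | "balanced" | "accuracy";

interface ModelInfo {
  name: WhisperModel;
  sizeMB: number; // approximate download size, copied from the list above
}

const MODELS: Record<Priority, ModelInfo> = {
  speed: { name: "tiny", sizeMB: 39 },
  balanced: { name: "base", sizeMB: 74 },
  accuracy: { name: "large-v3", sizeMB: 1500 },
};

// Fallback order, from the largest model to the smallest.
const FALLBACK: Priority[] = ["accuracy", "balanced", "speed"];

// Return the model for the requested priority, stepping down to a smaller
// model when there is not enough free disk space. Returns null if none fits.
function chooseModel(priority: Priority, freeDiskMB: number): WhisperModel | null {
  const start = FALLBACK.indexOf(priority);
  for (const p of FALLBACK.slice(start)) {
    if (MODELS[p].sizeMB <= freeDiskMB) {
      return MODELS[p].name;
    }
  }
  return null;
}
```

For example, `chooseModel("accuracy", 500)` steps down to `"base"` because the Large v3 model would not fit in 500 MB.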
Audio Input → Processing → AI Recognition → Text Output → Export
| Mode | Use Case | Input | Features |
|---|---|---|---|
| File Mode | Batch processing | Audio files | Drag & drop, batch queue, format conversion |
| Real-time | Live recording | Microphone | Live preview, speaker detection, instant results |
| Long Audio | Extended content | Large files | Smart chunking, progress tracking, memory optimization |
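The README does not document how Long Audio mode splits a file, so the following is only a minimal sketch of one common approach: fixed-length windows with a small overlap so that words at a boundary are not cut off. The function name `planChunks` and the defaults (30-second windows, matching the window length Whisper models operate on, and a 2-second overlap) are assumptions made for illustration.

```typescript
// Hypothetical chunk planner; Steno's real chunking strategy may differ.
interface Chunk {
  start: number; // seconds from the beginning of the file
  end: number;   // seconds, exclusive
}

function planChunks(totalSec: number, chunkSec = 30, overlapSec = 2): Chunk[] {
  if (totalSec <= 0) {
    return [];
  }
  if (chunkSec <= 0 || overlapSec < 0 || overlapSec >= chunkSec) {
    throw new RangeError("require chunkSec > overlapSec >= 0");
  }
  const step = chunkSec - overlapSec; // how far each window advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < totalSec; start += step) {
    const end = Math.min(start + chunkSec, totalSec);
    chunks.push({ start, end });
    if (end === totalSec) {
      break; // the last window already reaches the end of the audio
    }
  }
  return chunks;
}
```

Each chunk can then be transcribed independently, which keeps memory use bounded and makes progress easy to report as a fraction of completed chunks.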
- Smart Prompts - Context-aware templates for meetings, interviews, medical, and technical content
- Speaker Diarization - Automatic identification and separation of different speakers
- Export Options - Multiple formats including TXT, SRT, JSON, and Markdown
- Search & Organization - Tag-based categorization with powerful filtering capabilities
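As an illustration of the SRT export option, here is a minimal sketch that turns timed segments into SubRip text. The `Segment` shape and the function names are hypothetical; the README does not describe Steno's internal data model.

```typescript
// Hypothetical segment shape: times are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Each SRT cue is: a 1-based index, a time range, the text, then a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      srtTimestamp(seg.start) + " --> " + srtTimestamp(seg.end) + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}
```

The same segment list could feed the TXT, JSON, and Markdown exporters; only the per-segment formatting would change.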
~/Library/Application Support/com.steno.app/
├── database/
│ └── steno.db # SQLite database (transcription records & settings)
├── models/ # Whisper AI model files
│ ├── ggml-tiny.bin # Tiny model (~39MB)
│ ├── ggml-base.bin # Base model (~142MB)
│ └── ggml-large-v3.bin # Large v3 model (~1.55GB)
├── audio/
│ ├── uploads/ # User uploaded audio files
│ └── temp/ # Temporary audio files
└── logs/ # Application logs
└── app.log
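For scripts or tooling that need these locations, the layout above can be captured in a small path helper. This is a sketch under assumptions: `stenoPaths` and `modelFile` are hypothetical names, and the caller supplies the home directory (for example `/Users/alice`) rather than the code discovering it.

```typescript
// Hypothetical helper mirroring the directory layout shown above.
interface StenoPaths {
  root: string;
  database: string;
  models: string;
  uploads: string;
  temp: string;
  log: string;
}

function stenoPaths(home: string): StenoPaths {
  // Drop any trailing slash so joined paths never contain "//".
  const base = home.replace(/\/+$/, "");
  const root = base + "/Library/Application Support/com.steno.app";
  return {
    root,
    database: root + "/database/steno.db",
    models: root + "/models",
    uploads: root + "/audio/uploads",
    temp: root + "/audio/temp",
    log: root + "/logs/app.log",
  };
}

// Model files follow the ggml-<name>.bin naming shown in the tree.
function modelFile(home: string, model: "tiny" | "base" | "large-v3"): string {
  return stenoPaths(home).models + "/ggml-" + model + ".bin";
}
```

Because the paths contain a space ("Application Support"), they must be quoted when passed to a shell.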
# View application data
open "$HOME/Library/Application Support/com.steno.app/"
# Backup database
cp "$HOME/Library/Application Support/com.steno.app/database/steno.db" ~/Desktop/steno_backup.db
# Clean temporary files
rm -rf "$HOME/Library/Application Support/com.steno.app/audio/temp/"

# Clone repository
git clone https://github.com/xazaj/Steno.git
cd Steno
# Install dependencies
npm install
# Start development server
npm run tauri:dev

# Development with hot reload
npm run tauri:dev
# Production build
npm run tauri:build
# Platform-specific builds
npm run build:mac-m1 # Apple Silicon

steno/
├── src/ # React frontend
│ ├── components/ # UI components
│ ├── hooks/ # Custom React hooks
│ ├── types/ # TypeScript definitions
│ └── utils/ # Utility functions
├── src-tauri/ # Rust backend
│ ├── src/ # Core application logic
│ ├── lib/ # whisper.cpp integration
│ └── capabilities/ # Tauri security permissions
├── docs/ # Documentation
└── models/ # AI model storage
| Layer | Technologies |
|---|---|
| Frontend | React 18, TypeScript, Tailwind CSS, Vite |
| Backend | Rust, Tauri 2.0, whisper.cpp, SQLite |
| Audio Processing | Symphonia, CPAL, WebRTC VAD, RustFFT |
| AI Models | OpenAI Whisper (Tiny, Base, Large v3) |
We welcome contributions from the community. Please read our Contributing Guidelines before getting started.
- Fork the repository
- Create a feature branch (git checkout -b feature/new-feature)
- Commit your changes (git commit -m 'Add new feature')
- Push to the branch (git push origin feature/new-feature)
- Open a Pull Request
- Follow Conventional Commits for commit messages
- Run tests before submitting: npm test && cargo test
- Ensure code formatting: npm run lint && cargo fmt
- Add documentation for new features
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper - State-of-the-art speech recognition models
- whisper.cpp - High-performance C++ implementation
- Tauri - Modern desktop application framework
- Symphonia - Professional audio decoding library
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Issues
- Documentation: Project Wiki
- Discussions: GitHub Discussions
Made with care by the Steno development team