Stars
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
This repository contains tutorials and examples for Triton Inference Server
Repository for training models for music source separation.
OCR model that handles complex tables, forms, and handwriting with full layout.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
This GitHub Action creates a GitHub contribution calendar on a 3D profile image.
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
This repository contains a comprehensive computer vision/machine learning football project that uses YOLO for object detection, K-means for pixel segmentation, optical flow for motion tracking, and …
GenAI Agent Framework, the Pydantic way
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
Source for the TechEmpower Framework Benchmarks project
Official repository of SepReformer for speech separation



