Stars
OctoTools: An agentic framework with extensible tools for complex reasoning
Align Anything: Training All-modality Model with Feedback
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
🧑‍🚀 Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, small language models, vision-language models).
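🚀 Next Generation AI One-Stop Internationalization Solution. 🚀 A next-generation one-stop AI solution for business and consumer (B/C) use, supporting OpenAI, Midjourney, Claude, iFLYTEK Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …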
Direction-Aware Multichannel Selective Fixed-filter Active Noise Control
This is the official repository for TalkSHOW: Generating Holistic 3D Human Motion from Speech [CVPR2023].
Mel cepstral distortion (MCD) computations in Python.
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
Train transformer language models with reinforcement learning.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
On-device AI across mobile, embedded and edge for PyTorch
Implementation of the proposed minGRU in PyTorch
[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Malfunctioning Industrial Machine Investigation and Inspection
An Open-Sourced LLM-empowered Foundation TTS System
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Text-to-Music Generation with Rectified Flow Transformers