Stars
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
Effortless data labeling with AI support from Segment Anything and other awesome models.
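Regularly sharing high-quality, interesting, and practical open-source technical tutorials, developer tools, programming websites, and tech news from GitHub. A list of cool, interesting projects on GitHub.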
Multilingual Voice Understanding Model
Multilingual large voice generation model providing full-stack inference, training, and deployment capabilities.
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
DAMO-YOLO: a fast and accurate object detection method with several new techniques, including NAS backbones, efficient RepGFPN, ZeroHead, AlignedOTA, and distillation enhancement.
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
cengfubo / ChatTTS
Forked from 6drf21e/ChatTTS_colab. 🚀 One-click deployment (offline bundle included)! Based on ChatTTS, with support for random voice sampling, long-form audio generation, and multi-role narration. Simple to use, with no complicated installation required.
This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, CAM, etc. More models may be supported in the future. At the …
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
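A hands-on, step-by-step practical course on Huggingface Transformers. Course videos are updated simultaneously on Bilibili and YouTube.
"Open-Source LLM User Guide" (《开源大模型食用指南》): quickly deploy open-source large models in a Linux environment. A deployment tutorial better suited to users in China.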
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
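FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
Smart WeChat Secretary (智能微秘书): an all-in-one WeChat bot management platform and the simplest way to connect to ChatGPT, FastGPT, Dify, and Coze (扣子). Supports image generation, speech recognition, voice messages, and scheduled tasks, and works with WeCom, WeChat Official Accounts, 5G Messaging, and WhatsApp.
🚀 MaxKB is an open-source knowledge-base question-answering system based on large language models and RAG, widely used in scenarios such as intelligent customer service, internal enterprise knowledge bases, academic research, and education.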
Code examples and resources for DBRX, a large language model developed by Databricks
The open source platform for AI-native application development.
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" (TMLR 2024)
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
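A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.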
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
SplitML (Signal Processing Library for Interference rejecTion by Machine Learning) is a code repository for a set of tools for interference rejection in complex time-domain signals.
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80 languages, provides data annotation and synthesis tools, supports training and…
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)