Starred repositories
Graphical User Interface (GUI) for the simulated skylight properties and information
Code for our CVPR'2024 paper "GauHuman: Articulated Gaussian Splatting from Monocular Human Videos"
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
Real-Time High-Resolution Background Matting
Easy-to-use image segmentation library with awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image …
EMNLP'22 | MedCLIP: Contrastive Learning from Unpaired Medical Images and Texts
[SIGGRAPH'24] CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR2023)
InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds (CVPR 2023)
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
A modular graph-based Retrieval-Augmented Generation (RAG) system
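OpenAI API management and distribution system, supporting Azure, Anthropic Claude, Google PaLM 2 & Gemini, Zhipu ChatGLM, Baidu ERNIE Bot, iFlytek Spark, Alibaba Tongyi Qianwen, 360 Zhinao, and Tencent Hunyuan. Can be used to manage and redistribute keys; ships as a single executable with a prebuilt Docker image for one-click, out-of-the-box deployment. OpenAI key management & redistributi…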
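Free, open-source, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multi-language libraries.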
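Provides a drop-in alternative to GPT models so that GraphRAG can be used with non-GPT large models. Supports many model types, such as local models (Ollama), Alibaba Cloud Tongyi Qianwen, Baidu Wenxin Qianfan, Zhipu ChatGLM, iFlytek Spark, Moonshot AI, and Google Gemini. The sample code uses Alibaba's Tongyi Qianwen model; other models are used the same way.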
A web-based video editing tool implemented with WebCodecs, similar to CapCut Web.
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with integrated LLM-based AI clipping.
Teaches you to implement a deep learning framework step by step using only basic Python syntax and NumPy.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
bdashore3 / flash-attention
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
Data annotation toolbox supporting image, audio, and video data.
GLM-4 series: Open Multilingual Multimodal Chat LMs
Quick exploration into fine-tuning Florence-2
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
CleanStream is an OBS plugin that uses AI to clean live audio streams from unwanted words and utterances