Lists (32)
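Rendering that combines 3D models with textures
3D model conversion tools
3D representation: a different approach (no smpl required)
3D reconstruction
CV surveys
openMVG: reconstruction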
smpl
Some VTON data preparation tools
3D reconstruction: joint reconstruction from multiple photos
Sheet music tools
Human-environment interaction, e.g. overlap points between a person and the environment.
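Human body segmentation
Portrait matting
Extracting body measurements from SMPL body shape
Animated image generation
Re-texture methods reported to work well
Image/video super-resolution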
Explicit image feature extraction
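Pose reconstruction
Relighting the original image from a new viewpoint
Production-ready code
Age transformation
Swapping other items
Hair/shoe datasets
Uncategorized for now, but appears well-known
Natural dynamic effects for clothing
Garment modeling and reconstruction
Point cloud generation
Automatic clothes changing <VTON-like>
Virtual try-on papers
Human body reconstruction
Music score recognition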
Stars
cassiniR / FunASR
Forked from modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit
MMeRAG is an open-source RAG (Retrieval-Augmented Generation) project that provides a parser for audio and video data to implement RAG over audio and video.
A Simple and Efficient Implementation Of Fast Fourier Transform For Audio Denoise
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Large model for mental health, LLM, The Big Model of Mental Health, Finetune, InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, LLama3.1
An Open-Sourced LLM-empowered Foundation TTS System
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
A diffusers pipeline for zero shot stylised couples portrait creation
Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯
✨✨Latest Advances on Multimodal Large Language Models
OmniControl: Control Any Joint at Any Time for Human Motion Generation, ICLR 2024
A 3DGS framework for omni urban scene reconstruction and simulation.
An ASR model for transcribing laughter and speech-laugh in conversational speech
Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cascaded solutions (ASR-LLM-TTS-THG). Customizable appearance and voice with no training required; supports voice cloning, with first-packet latency as low as 3s.
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
[SIGGRAPH Asia 2024] PuzzleAvatar: Assembling 3D Avatars from Personal Albums
[ICCV 2023] ToonTalker: Cross-Domain Face Reenactment
[ICML 2024] 🍅HumanTOMATO: Text-aligned Whole-body Motion Generation
A tool to transform the FLAME texture space, shape, and pose parameters into the head (or face) of an SMPL or SMPLX model.
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
20 high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
ViViD: Video Virtual Try-on using Diffusion Models
[ECCV'24] TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Talk to your database as if you were chatting with a friend. Turn natural language into powerful SQL queries effortlessly, and get your answers back in a language you understand. No technical jargo…
First Place Winner at Delta Hacks 5. Analyses speech, hand gestures, and facial expressions and gives both real-time feedback as well as a summary of results at the end.