Stars
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
[ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
This repository contains simple but quite fun deep learning projects:)
Official PyTorch implementation of CutMix regularizer
ORION: Orientation-boosted Voxel Nets for 3D Object Recognition
[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Learning low-shot object classification with explicit shape bias learned from point clouds
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
Scripts for fine-tuning Llama2 via SFT and DPO.
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Google Gemini AI model w/speech recognition and voice.
Official GitHub repo for the paper "Evaluating the Evaluation of Diversity in Natural Language Generation"
LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft
PyTorch code and models for the DINOv2 self-supervised learning method.
An open source implementation of CLIP.
Android in-app purchases and subscriptions made easy.
Oboe is a C++ library that makes it easy to build high-performance audio apps on Android.