Starred repositories
Activity Launcher creates shortcuts for any installed app and its hidden activities, making them easy to launch
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by the OpenAI Solutions team.
Auto_Jobs_Applier_AIHawk is a tool that automates the job application process. Using artificial intelligence, it enables users to apply for multiple job offers in an automated and personalized…
Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
Unified Efficient Fine-Tuning of 100 LLMs (ACL 2024)
This repo is for Amazon ML Challenge 2024. The challenge was to develop a Machine Learning model to extract product details directly from the product images.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Streamlines the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
Strong and Open Vision Language Assistant for Mobile Devices
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Use Florence 2 to auto-label data for use in training fine-tuned object detection models.
Surveillance Perspective Human Action Recognition Dataset: 7759 Videos from 14 Action Classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera …
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in our recently accepted survey: https://…
This repository is built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release we share the informati…
A quick exploration into fine-tuning Florence-2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
How to use bounding boxes with the Gemini API
Platform to experiment with the AI Software Engineer. Terminal based. NOTE: Very different from https://gptengineer.app
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Course on LLMs: Building Personalized Customer Chatbots
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)