Starred repositories

Activity launcher creates shortcuts for any installed app and hidden activities to launch them with ease

Kotlin 898 170 Updated May 31, 2024

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 14,945 1,382 Updated Oct 15, 2024

Auto_Jobs_Applier_AIHawk is a tool that automates the job application process. Utilizing artificial intelligence, it enables users to apply for multiple job offers in an automated and personalized…

Python 19,066 2,844 Updated Oct 27, 2024

Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.

Python 28 2 Updated Jan 20, 2024

Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜

Jupyter Notebook 789 78 Updated Sep 11, 2024
Python 14 1 Updated May 17, 2024

Unified Efficient Fine-Tuning of 100 LLMs (ACL 2024)

Python 33,099 4,074 Updated Oct 28, 2024

This repo is for Amazon ML Challenge 2024. The challenge was to develop a Machine Learning model to extract product details directly from the product images.

Python 47 8 Updated Sep 24, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,399 869 Updated Oct 22, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,795 162 Updated Oct 4, 2024

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python 3,766 344 Updated Oct 7, 2024

Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL

Python 1,374 101 Updated Oct 24, 2024

Low-level Computer Vision library in Rust

Rust 186 18 Updated Oct 28, 2024

Strong and Open Vision Language Assistant for Mobile Devices

Python 1,019 66 Updated Apr 15, 2024

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 2,940 212 Updated Sep 25, 2024

Use Florence 2 to auto-label data for use in training fine-tuned object detection models.

Python 59 7 Updated Aug 15, 2024

Surveillance Perspective Human Action Recognition Dataset: 7759 Videos from 14 Action Classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera …

Python 85 16 Updated Sep 28, 2020

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://…

68 7 Updated Oct 19, 2023

This repository is built in association with our position paper on "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As a part of this release we share the informati…

246 18 Updated Jan 10, 2022

Compose multimodal datasets 🎹

Python 197 9 Updated Oct 26, 2024

Quick exploration into fine tuning florence 2

Jupyter Notebook 265 24 Updated Sep 19, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 814 56 Updated Oct 26, 2024

How to use bounding boxes with the Gemini API

TypeScript 86 11 Updated Jun 23, 2024

Platform to experiment with the AI Software Engineer. Terminal based. NOTE: Very different from https://gptengineer.app

Python 52,250 6,802 Updated Sep 12, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,456 73 Updated Oct 9, 2024

Course on LLMs: Building Personalized Customer Chatbots

Jupyter Notebook 21 9 Updated May 19, 2024

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,936 157 Updated Oct 24, 2024

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,193 106 Updated Aug 27, 2024

tiny vision language model

Jupyter Notebook 5,156 447 Updated Oct 26, 2024

Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)

HTML 3,990 561 Updated May 30, 2023