Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"


If you like our project, please give it a star ⭐ on GitHub for the latest updates.

🎨 Project Page

📰 News

  • [2024.07.01] 🔥🔥🔥 Our inference code is available, and we have released our model at [HolmesVAD-7B].
  • [2024.06.12] 👀 Our HolmesVAD and VAD-Instruct50k will be available soon; welcome to star ⭐ this repository for the latest updates.

😮 Highlights

In open-ended Video Anomaly Detection (VAD), existing methods often exhibit biased detection when faced with challenging or unseen events, and they lack interpretability. To address these drawbacks, we propose Holmes-VAD, a novel framework that leverages precise temporal supervision and rich multimodal instructions to enable accurate anomaly localization and comprehensive explanations.

  • First, towards an unbiased and explainable VAD system, we construct the first large-scale multimodal VAD instruction-tuning benchmark, VAD-Instruct50k. The dataset is created with a carefully designed semi-automatic labeling paradigm: efficient single-frame annotations are applied to the collected untrimmed videos, which are then synthesized into high-quality analyses of both abnormal and normal video clips using a robust off-the-shelf video captioner and a large language model (LLM).
(Figure: semi-automatic annotation and instruction-generation pipeline for VAD-Instruct50k)
  • Building upon the VAD-Instruct50k dataset, we develop a customized solution for interpretable video anomaly detection: we train a lightweight temporal sampler to select frames with a high anomaly response, and fine-tune a multimodal large language model (LLM) to generate explanatory content (a rough sketch of the sampling idea follows this list).
(Figure: overall architecture of Holmes-VAD)
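
To make the sampler idea concrete, here is a minimal PyTorch sketch: a small scoring head maps each frame feature to an anomaly score, and the top-K frames are kept in temporal order. The module names, dimensions, and scoring head are our assumptions for illustration, not the released HolmesVAD implementation.

# Illustrative sketch only: a lightweight temporal sampler that scores frames
# and keeps the top-K with the highest anomaly response. Names and sizes are
# assumptions, not the HolmesVAD code.
import torch
import torch.nn as nn

class TemporalSampler(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden_dim: int = 256):
        super().__init__()
        # Small MLP mapping each frame feature to a scalar anomaly score in [0, 1].
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, frame_feats: torch.Tensor, k: int = 8):
        # frame_feats: (T, feat_dim) features for the T frames of one video.
        scores = self.scorer(frame_feats).squeeze(-1)           # (T,)
        top_idx = torch.topk(scores, k=min(k, scores.numel())).indices
        top_idx, _ = torch.sort(top_idx)                        # preserve temporal order
        return frame_feats[top_idx], scores

sampler = TemporalSampler()
feats = torch.randn(64, 1024)            # dummy features for a 64-frame video
selected, scores = sampler(feats, k=8)   # 8 frames with the highest anomaly response

The selected frames would then be passed, together with the text instruction, to the fine-tuned multimodal LLM for explanation generation.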

🛠️ Requirements and Installation

  • Python >= 3.10
  • PyTorch == 2.0.1
  • CUDA Version >= 11.7
  • transformers >= 4.37.2
  • Install required packages:
# inference only
git clone https://github.com/pipixin321/HolmesVAD.git
cd HolmesVAD
conda create -n holmesvad python=3.10 -y
conda activate holmesvad
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install decord opencv-python pytorchvideo
# additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
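
After installation, the following optional snippet (our suggestion, not part of the repository) checks that the environment matches the requirements listed above:

# Optional sanity check against the requirements above; not part of HolmesVAD.
import sys
import torch
import transformers

print("Python:", sys.version.split()[0])          # expect >= 3.10
print("PyTorch:", torch.__version__)              # expect 2.0.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)        # expect >= 11.7
print("transformers:", transformers.__version__)  # expect >= 4.37.2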

🤗 Demo

CLI Inference

CUDA_VISIBLE_DEVICES=0 python demo/cli.py --model-path ./checkpoints/HolmesVAD-7B --file ./demo/examples/vad/RoadAccidents133_x264_270_451.mp4
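
If you want to inspect the input clip before running the CLI, a quick way is to sample frames with decord (installed above). This is illustrative preprocessing only; the exact frame sampling HolmesVAD performs internally may differ.

# Illustrative only: uniformly sample 8 frames from the demo clip with decord.
import numpy as np
from decord import VideoReader, cpu

video_path = "demo/examples/vad/RoadAccidents133_x264_270_451.mp4"
vr = VideoReader(video_path, ctx=cpu(0))
indices = np.linspace(0, len(vr) - 1, num=8, dtype=int)  # evenly spaced frame indices
frames = vr.get_batch(indices).asnumpy()                 # (8, H, W, 3) uint8 array
print(frames.shape)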

Gradio Web UI

CUDA_VISIBLE_DEVICES=0 python demo/gradio_demo.py
