Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"
Updated Jul 3, 2023 · Python
CompBench evaluates the comparative reasoning of multimodal large language models (MLLMs) with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes.
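A CompBench-style item pairs two images with a relative-comparison question tagged by one of the eight dimensions. The sketch below is a hypothetical data structure for such an item — the field names, types, and answer format are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical set of the 8 comparison dimensions named in the description.
DIMENSIONS = {
    "visual attribute", "existence", "state", "emotion",
    "temporality", "spatiality", "quantity", "quality",
}

@dataclass
class ComparativeItem:
    """Sketch of one comparative-reasoning example (assumed schema)."""
    image_a: str    # path or URL to the first image in the pair
    image_b: str    # path or URL to the second image in the pair
    question: str   # relative-comparison question about the pair
    dimension: str  # one of the 8 comparison dimensions
    answer: str     # hypothetical answer format, e.g. "A" or "B"

    def __post_init__(self):
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")

item = ComparativeItem(
    image_a="pair_001_a.jpg",
    image_b="pair_001_b.jpg",
    question="Which image shows the happier expression?",
    dimension="emotion",
    answer="A",
)
print(item.dimension)
```

Grouping items by `dimension` in this way makes it straightforward to report per-dimension accuracy rather than a single aggregate score.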