This repo provides vision-language-based zero-shot instance segmentation built on top of the detectron2 framework.
To use this package, install it in one of two ways:
→ via source
$ git clone https://github.com/iKrishneel/zsis.git
$ pip install -e zsis
→ via pip
$ pip install git+https://github.com/iKrishneel/zsis.git@master
The vision-language model is based on OpenAI's CLIP and is fully configurable through the detectron2 config.
This model uses a two-stage Mask R-CNN pipeline to generate class-agnostic object bounding boxes and uses CLIP to classify the proposal boxes. A pretrained CutLER model serves as the class-agnostic proposal generator. The configs for CutLER + CLIP can be found here.
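The classification step above can be sketched as follows. This is a minimal, self-contained illustration of CLIP-style zero-shot classification written with numpy, not the repo's actual code: it assumes you already have an image embedding per proposal crop and one text embedding per label prompt (in the real pipeline these come from the CLIP image and text encoders), and the function name `classify_proposals` is hypothetical.

```python
import numpy as np

def classify_proposals(box_embeddings, text_embeddings, temperature=100.0):
    """CLIP-style zero-shot classification of proposal boxes (illustrative sketch).

    box_embeddings:  (N, D) image embeddings, one per proposal crop
    text_embeddings: (K, D) embeddings of the label prompts
    Returns an (N, K) array of softmax probabilities over the K labels.
    """
    # L2-normalise both sides so the dot product is a cosine similarity,
    # which is how CLIP compares image and text embeddings.
    box = box_embeddings / np.linalg.norm(box_embeddings, axis=-1, keepdims=True)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=-1, keepdims=True)

    # Scaled cosine similarities; CLIP multiplies by a learned logit scale,
    # here approximated by a fixed temperature.
    logits = temperature * box @ txt.T              # (N, K)

    # Softmax over the labels, with the usual max-subtraction for stability.
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)
```

Each proposal is then assigned the label with the highest probability, which is what makes the classifier "open vocabulary": changing the label list only changes the text embeddings, not the model.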
To run the CutLER + CLIP demo:
$ python tools/test.py --config-file config/culter/cascade_mask_rcnn_R_50_FPN_clip.yaml --image PATH_TO_IMAGE --labels LIST_OF_VOCABS
Run the following example command. Note that it will first download two sets of weights: one for CutLER and one for CLIP.
$ python tools/test.py --config-file config/culter/cascade_mask_rcnn_R_50_FPN_clip.yaml --image assets/images/bus.jpg --labels bus,car,people,truck,cat,dog