
RGBX_Semantic_Segmentation


[Figure: example segmentation results]

The official implementation of CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers (IEEE T-ITS 2023). More details can be found in our paper [PDF].

Usage

Installation

  1. Requirements
  • Python 3.7
  • PyTorch 1.7.0 or higher
  • CUDA 10.2 or higher

We have tested the following versions of OS and software:

  • OS: Ubuntu 18.04.6 LTS
  • CUDA: 10.2
  • PyTorch 1.8.2
  • Python 3.8.11
  2. Install all dependencies. Install PyTorch, CUDA and cuDNN, then install the other dependencies via:
pip install -r requirements.txt
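
A quick, optional sanity check that PyTorch and CUDA are set up correctly (a minimal sketch, assuming PyTorch is already installed; run it in a Python shell):

import torch

print(torch.__version__)          # should report 1.7.0 or higher
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # True if a GPU can be used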

Datasets

Organize the dataset folder in the following structure:

<datasets>
|-- <DatasetName1>
    |-- <RGBFolder>
        |-- <name1>.<ImageFormat>
        |-- <name2>.<ImageFormat>
        ...
    |-- <ModalXFolder>
        |-- <name1>.<ModalXFormat>
        |-- <name2>.<ModalXFormat>
        ...
    |-- <LabelFolder>
        |-- <name1>.<LabelFormat>
        |-- <name2>.<LabelFormat>
        ...
    |-- train.txt
    |-- test.txt
|-- <DatasetName2>
|-- ...

train.txt contains the names of the items in the training set, one per line and without file extensions, e.g.:

<name1>
<name2>
...
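
If a dataset does not ship with these split files, the snippet below is a minimal sketch for generating train.txt and test.txt from the RGB folder; the paths and the 80/20 split are illustrative assumptions, not part of this repo.

import os
import random

rgb_dir = "datasets/MyDataset/RGB"            # hypothetical dataset path
names = sorted(os.path.splitext(f)[0] for f in os.listdir(rgb_dir))
random.seed(0)
random.shuffle(names)
split = int(0.8 * len(names))                 # illustrative 80/20 train/test split

with open("datasets/MyDataset/train.txt", "w") as f:
    f.write("\n".join(names[:split]) + "\n")
with open("datasets/MyDataset/test.txt", "w") as f:
    f.write("\n".join(names[split:]) + "\n")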

For RGB-Depth semantic segmentation, HHA maps can be generated from depth maps following https://github.com/charlesCXK/Depth2HHA-python.

For the preparation of other datasets, please refer to their original websites.

Train

  1. Pretrained weights:

    Download the pretrained SegFormer weights here: pretrained segformer.

  2. Config

    Edit the config file configs.py, including the dataset and network settings.

  3. Run multi-GPU distributed training (see the example after this list):

    $ CUDA_VISIBLE_DEVICES="GPU IDs" python -m torch.distributed.launch --nproc_per_node="number of GPUs you want to use" train.py
  • TensorBoard files are saved in the log_<datasetName>_<backboneSize>/tb/ directory.
  • Checkpoints are stored in the log_<datasetName>_<backboneSize>/checkpoints/ directory.
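
For example, to train on two GPUs with IDs 0 and 1:

    $ CUDA_VISIBLE_DEVICES="0,1" python -m torch.distributed.launch --nproc_per_node=2 train.py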

Evaluation

Run the evaluation by:

CUDA_VISIBLE_DEVICES="GPU IDs" python eval.py -d="Device ID" -e="epoch number or range"

If you want to use multiple GPUs, specify multiple device IDs (e.g., 0,1,2).
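
For example, to evaluate the checkpoints of epochs 250 to 300 on GPUs 0 and 1 (the exact epoch-range syntax depends on how eval.py parses -e):

CUDA_VISIBLE_DEVICES="0,1" python eval.py -d=0,1 -e=250-300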

Result

We offer pre-trained weights on different RGB-X datasets (some weights are not available yet; due to differences between training platforms, these weights may not load correctly):

NYU-V2 (40 categories)

| Architecture | Backbone | mIoU (single-scale) | mIoU (multi-scale & flip) | Weight |
| --- | --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B2 | 54.1% | 54.4% | NYU-MiT-B2 |
| CMX (SegFormer) | MiT-B4 | 56.0% | 56.3% | |
| CMX (SegFormer) | MiT-B5 | 56.8% | 56.9% | |

MFNet (9 categories)

| Architecture | Backbone | mIoU | Weight |
| --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B2 | 58.2% | MFNet-MiT-B2 |
| CMX (SegFormer) | MiT-B4 | 59.7% | |

ScanNet-V2 (20 categories)

| Architecture | Backbone | mIoU | Weight |
| --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B2 | 61.3% | ScanNet-MiT-B2 |

RGB-Event (20 categories)

| Architecture | Backbone | mIoU | Weight |
| --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B4 | 64.28% | RGBE-MiT-B4 |
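
If a downloaded checkpoint does not load cleanly because of platform differences, the snippet below is a minimal sketch for inspecting and loading it; the file name and the "model" key are assumptions, so adjust them to the actual checkpoint and to the model built from this repo's config.

import torch

checkpoint = torch.load("NYU_MiT_B2.pth", map_location="cpu")  # hypothetical file name
# Some checkpoints wrap the weights under a key such as "model"; fall back to the raw dict.
state_dict = checkpoint.get("model", checkpoint) if isinstance(checkpoint, dict) else checkpoint
print(list(state_dict.keys())[:10])  # inspect parameter names

# model = ...  # build the CMX model from this repo's config before loading
# missing, unexpected = model.load_state_dict(state_dict, strict=False)
# print("missing:", missing, "unexpected:", unexpected)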

Publication

If you find this repo useful, please consider citing the following paper:

@article{zhang2023cmx,
  title={CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers},
  author={Zhang, Jiaming and Liu, Huayao and Yang, Kailun and Hu, Xinxin and Liu, Ruiping and Stiefelhagen, Rainer},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2023}
}

Acknowledgement

Our code is heavily based on TorchSeg and SA-Gate. Thanks for their excellent work!