This is the official PyTorch implementation of the NeurIPS 2023 paper MathNAS: If Blocks Have a Role in Mathematical Architecture Design.
For a quick overview of this work, see the poster made by the two main authors, Qinsi Wang* and Jinghan Ke*.
For more details and the supplementary material, please see the paper.
If you find MathNAS useful in your research, please consider citing:
@article{qinsi2023mathnas,
  title={MathNAS: If Blocks Have a Role in Mathematical Architecture Design},
  author={Wang, Qinsi and Ke, Jinghan and Liang, Zhi and Zhang, Sihai},
  year={2023},
  eprint={2311.04943},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
The code has been tested with Python 3.7, PyTorch 1.9.0, and CUDA 11.1; higher versions should also work.
conda create --name mathnas python=3.7
conda activate mathnas
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install nas-bench-201 scipy pyyaml psutil timm yacs
pip install -i https://pypi.gurobi.com gurobipy
- Experiments on MobileNet-V3 Search Space
- Experiments on NAS-Bench-201 Search Space
- Experiments on SuperViT Search Space
- Experiments on SuperTransformer Search Space
- Dynamic Running on Edge Device
You can perform a latency-limited search on the Raspberry Pi, Jetson TX2 CPU, and Jetson TX2 GPU with the following code:
$ python main.py \
--mode nas \
--search_space mobilenetv3 \
--device [raspberrypi/tx2cpu/tx2gpu] \
--latency_constraint [latency budget]
For example,
python main.py --mode nas --search_space mobilenetv3 --device tx2gpu --latency_constraint 500
result:
search time is 0.51258 s
The search net is {'ks': [7, 5, 7, 7, 7, 5, 7, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 3, 5, 5], 'd': [4, 4, 4, 4, 4], 'e': [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6], 'encodearch': [3, 4, 4, 4, 3]}
The latency of search net is 68.9 s
network saved!
The output file path of the search result is results/mobilenetv3/[device]/[latency_constraint].yml.
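The saved result can be read back as an ordinary YAML dictionary. A minimal sketch, assuming the file stores a flat dict with keys like `ks`, `d`, and `e` as in the example output above (the file path and exact schema here are illustrative):

```python
# Sketch: round-tripping an architecture config through YAML.
# The keys ('ks', 'd', 'e') mirror the example search output above;
# the numbers and the temp-file path are made up for this demo.
import os
import tempfile

import yaml

arch = {
    "ks": [7, 5, 7, 7],  # kernel size per block (truncated example)
    "d": [4, 4],         # depth per stage
    "e": [6, 6, 6, 4],   # expand ratio per block
}

# Write the dict the way a search result would be saved...
path = os.path.join(tempfile.mkdtemp(), "500.yml")
with open(path, "w") as f:
    yaml.safe_dump(arch, f)

# ...and load it back, e.g. before handing it to --load_path style tooling.
with open(path) as f:
    loaded = yaml.safe_load(f)

print(loaded["ks"])
```
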
You can predict the latency and accuracy of any network through the following code:
$ python main.py \
--mode predict \
--search_space mobilenetv3 \
--device [raspberrypi/tx2cpu/tx2gpu] \
--load_path [network.yml]
For example,
python main.py --mode predict --search_space mobilenetv3 --device tx2gpu --load_path ./results/mobilenetv3/tx2gpu/500.yml
result:
The net arch is {'d': [4, 4, 4, 4, 4], 'e': [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6], 'encodearch': [3, 4, 4, 4, 3], 'ks': [7, 5, 7, 7, 7, 5, 7, 7, 7, 7, 7, 7, 7, 5, 5, 7, 7, 3, 5, 5]}
The predicted accuracy of the net is 79.3 %
The predicted latency of the net is 68.9 s
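The prediction above reflects the paper's core idea that each block contributes its own share to network-level metrics. A toy sketch of additive latency prediction (block names and latency numbers are hypothetical, not taken from the repo's predictor):

```python
# Toy illustration of block-wise additive latency prediction.
# Block names and per-block latencies are made up for this sketch.
block_latency_ms = {
    "mbconv_k7_e6": 4.1,  # 7x7 kernel, expand ratio 6
    "mbconv_k5_e6": 3.5,  # 5x5 kernel, expand ratio 6
    "mbconv_k3_e4": 2.2,  # 3x3 kernel, expand ratio 4
}
arch = ["mbconv_k7_e6", "mbconv_k5_e6", "mbconv_k5_e6", "mbconv_k3_e4"]

# The network's latency is approximated as the sum of its blocks' latencies,
# so candidate architectures can be scored without running the full network.
total = sum(block_latency_ms[b] for b in arch)
print(f"predicted latency: {total:.1f} ms")  # 4.1 + 3.5 + 3.5 + 2.2 = 13.3
```
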
Model | FLOPs(M) | Top-1(%) | Search Time | Config |
---|---|---|---|---|
MathNAS-MB1 | 257 | 75.9 | 0.9s | link |
MathNAS-MB2 | 289 | 76.4 | 1.2s | link |
MathNAS-MB3 | 336 | 78.2 | 1.5s | link |
MathNAS-MB4 | 669 | 79.2 | 0.8s | link |
You can verify the accuracy of the searched network on ImageNet:
$ python valide/validate_mobilenetv3.py \
--config_path [Path of neural architecture config file] \
--imagenet_save_path [Path of ImageNet 1k]
You can perform a latency- and energy-limited search on FPGA and edge GPU devices with the following code:
$ python main.py \
--mode nas \
--search_space nasbench201 \
--device [fpga/edgegpu] \
--latency_constraint [latency budget] \
--energy_constraint [energy budget]
The output file path of the search result is results/nasbench201/[device]/[latency_constraint]_[energy_constraint].yml.
You can predict the latency and accuracy of any network through the following code:
$ python main.py \
--mode predict \
--search_space nasbench201 \
--device [fpga/edgegpu] \
--load_path [network.yml]
You can perform a FLOPs-limited search with the following code:
$ python main.py \
--mode nas \
--search_space supervit \
--flops_constraint [FLOPs budget]
The output file path of the search result is results/supervit/[flops_constraint].yml.
You can predict the latency and accuracy of any network through the following code:
$ python main.py \
--mode predict \
--search_space supervit \
--load_path [network.yml]
Model | Top-1(%) | Top-5(%) | FLOPs(M) | Params(M) | Config |
---|---|---|---|---|---|
MathNAS-T1 | 78.4 | 93.5 | 200 | 8.9 | link |
MathNAS-T2 | 79.6 | 94.3 | 325 | 9.3 | link |
MathNAS-T3 | 81.3 | 95.1 | 671 | 13.6 | link |
MathNAS-T4 | 82.0 | 95.7 | 1101 | 14.4 | link |
MathNAS-T5 | 82.5 | 95.8 | 1476 | 14.8 | link |
You can verify the accuracy of the searched network on ImageNet:
$ python valide/validate_nasvit.py \
--config_path [Path of neural architecture config file] \
--imagenet_save_path [Path of ImageNet 1k]
You can perform a latency-limited search on the Raspberry Pi, Intel Xeon CPU, and NVIDIA TITAN Xp GPU with the following code:
$ python main.py \
--mode nas \
--search_space supertransformer \
--device [raspberrypi/xeon/titanxp] \
--latency_constraint [latency budget]
The output file path of the search result is results/supertransformer/[device]/[latency_constraint].yml.
You can predict the latency and accuracy of any network through the following code:
$ python main.py \
--mode predict \
--search_space supertransformer \
--device [raspberrypi/xeon/titanxp] \
--load_path [network.yml]
NOTE: You can evaluate the searched models by BLEU score and measured latency.
Enter the dynamic folder and run the following code to enable dynamic running on the CPU.
from Runcpu import Run
Run(latencymax,latencymin,ILP)
latencymax and latencymin are the maximum and minimum latencies accepted on the device, respectively. ILP selects the solver: ILP=1 uses Gurobipy and ILP=0 uses Linprog.
Similarly, run the following code to enable dynamic running on the GPU.
from Rungpu import Run
Run(latencymax,latencymin,ILP)
The profiles of the blocks used by the CPU and GPU are in the CPUblock and GPUblock folders.
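The Gurobipy/Linprog choice above corresponds to solving a small block-selection program: pick one block per layer to minimise predicted loss under a latency budget. Below is a self-contained sketch of the Linprog route using `scipy.optimize.linprog`; the loss and latency tables are made-up numbers, not the repo's actual block profiles, and the repo's formulation may differ.

```python
# Toy version of the block-selection problem the solvers above address:
# choose one block per layer to minimise predicted loss under a latency
# budget. All numbers below are invented for this sketch.
import numpy as np
from scipy.optimize import linprog

block_loss = np.array([[0.9, 0.5, 0.2],   # predicted loss per candidate block
                       [0.8, 0.4, 0.1],   # (rows = layers, cols = blocks)
                       [1.2, 0.6, 0.3]])
block_lat = np.array([[1.0, 2.0, 4.0],    # latency per candidate block
                      [1.5, 2.5, 5.0],
                      [1.0, 3.0, 6.0]])
latency_budget = 12.0
n_layers, n_blocks = block_loss.shape
n_vars = n_layers * n_blocks

c = block_loss.ravel()                    # objective: total predicted loss
A_ub = block_lat.ravel()[None, :]         # one row: total latency <= budget
b_ub = [latency_budget]
A_eq = np.zeros((n_layers, n_vars))       # exactly one block per layer
for i in range(n_layers):
    A_eq[i, i * n_blocks:(i + 1) * n_blocks] = 1.0
b_eq = np.ones(n_layers)

# linprog solves the LP relaxation (the ILP=0 path); Gurobipy would enforce
# integrality. For these particular numbers the relaxation is integral.
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * n_vars, method="highs")
choice = res.x.reshape(n_layers, n_blocks).argmax(axis=1)
print("block per layer:", choice.tolist())
print("total latency:", float(block_lat[np.arange(n_layers), choice].sum()))
```

With ILP=1 the same model would instead be solved as a true integer program via Gurobipy, which matters when the LP relaxation comes back fractional.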