Pruning/Sparsity Tutorial #304

glenn-jocher · 2020-07-05T20:59:23Z

📚 This guide explains how to apply pruning to YOLOv5 🚀 models. UPDATED 25 September 2022.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Test Normally

Before pruning we want to establish a baseline performance to compare to. This command tests YOLOv5x on COCO val2017 at image size 640 pixels. yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training a custom dataset, e.g. ./weights/best.pt. For details on all available models please see our README table.

$ python val.py --weights yolov5x.pt --data coco.yaml --img 640 --half

Output:

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0 cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
Model Summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:12<00:00,  2.16it/s]
                 all       5000      36335      0.732      0.628      0.683      0.496
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- base speed

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.507  # <--- base mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.689
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.552
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.345
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.652
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.630
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.682
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.829
Results saved to runs/val/exp

Test YOLOv5x on COCO (0.30 sparsity)

We repeat the above test with a pruned model by using the torch_utils.prune() command. We update val.py to prune YOLOv5x to 0.3 sparsity:

30% pruned output:

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0 cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
Model Summary: 444 layers, 86705005 parameters, 0 gradients
Pruning model...  0.3 global sparsity
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:11<00:00,  2.19it/s]
                 all       5000      36335      0.724      0.614      0.671      0.478
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- prune speed

Evaluating pycocotools mAP... saving runs/val/exp3/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.489  # <--- prune mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.677
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.537
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.370
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.664
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.803
Results saved to runs/val/exp3

In the results we can observe that we have achieved a sparsity of 30% in our model after pruning, which means that 30% of the model's weight parameters in nn.Conv2d layers are equal to 0. Inference time is essentially unchanged (5.2ms in both runs), while the model's AP and AR scores are slightly reduced (pycocotools mAP drops from 0.507 to 0.489).
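To make the "30% of nn.Conv2d weights are 0" statement concrete, here is a minimal sketch on a toy network rather than YOLOv5 itself. The `conv_sparsity` helper and the small `nn.Sequential` model are illustrative assumptions, not code from the repository; only the `torch.nn.utils.prune` calls mirror what the tutorial uses.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def conv_sparsity(model):
    """Fraction of nn.Conv2d weight elements that are exactly zero (hypothetical helper)."""
    zeros, total = 0, 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            zeros += int((m.weight == 0).sum())
            total += m.weight.numel()
    return zeros / total


torch.manual_seed(0)
# Toy stand-in for a detector backbone
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
)
n_params_before = sum(p.numel() for p in model.parameters())

# Same per-layer recipe as the tutorial: prune each Conv2d, then make it permanent
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")

n_params_after = sum(p.numel() for p in model.parameters())
print(f"conv sparsity: {conv_sparsity(model):.3f}")
print(f"parameters: {n_params_before} -> {n_params_after}")
```

The parameter count is identical before and after, which is consistent with the unchanged "Model Summary" line and unchanged inference time in the logs above.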

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

lucasjinreal · 2020-07-06T06:40:15Z

@glenn-jocher why doesn't the speed change at all after pruning? Does it only remove the conv weights without actually changing the structure? How can I save the pruned model and its architecture for retraining?

NanoCode012 · 2020-07-06T06:43:17Z

Is there a guideline on how much we should prune by? What are the benefits to doing this?

glenn-jocher · 2020-07-06T19:38:53Z

@jinfagang yes, structure is not changed at all, and parameter count is the same, it's just that some of the weights are 0 instead of near zero as they were before.

I suppose this would allow for effective kmeans quantization to lower bits (for smaller filesizes), but I'm not sure about any possible speed improvement. I think as long as the parameter count remains the same, the speed will remain the same.

@NanoCode012 no guidelines really, it's just an experiment to see how many of the weights you can remove and what effect that has on performance. Honestly I don't really see any great applications at the moment based on my results above, but it's there in case anyone would like to explore it further.

lucasjinreal · 2020-07-07T02:33:33Z

@glenn-jocher Looks like prune has a remove method which can remove weights:

prune.remove(module, 'weight')

and all weights and params are saved in module.state_dict, which can be used for the new pruned model.

glenn-jocher · 2020-07-07T02:43:13Z

@jinfagang yes, this .remove() method is deleting the original weights as there is a pruned copy also in the model. So before applying remove the model/module will have 2X the normal parameters, after using it it is back to its normal parameter count.
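A small sketch of what a module holds before and after `prune.remove`, using a standalone `nn.Conv2d` as an assumed stand-in for a YOLOv5 layer. While pruning is active, PyTorch keeps the original tensor as `weight_orig` plus a `weight_mask` buffer; `remove` folds them back into a single `weight` parameter.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(8, 16, 3)  # illustrative layer, not taken from YOLOv5

prune.l1_unstructured(conv, name="weight", amount=0.3)
# While pruning is active: original weights plus a binary mask are stored
params_during = sorted(n for n, _ in conv.named_parameters())
buffers_during = sorted(n for n, _ in conv.named_buffers())

prune.remove(conv, "weight")
# After remove: a single, already-masked weight parameter remains
params_after = sorted(n for n, _ in conv.named_parameters())
buffers_after = sorted(n for n, _ in conv.named_buffers())

print(params_during, buffers_during)
print(params_after, buffers_after)
```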

You have to consider the shapes of the operations in the forward pass. For a convolution from say shape(1,128,20,20) to shape(1,256,20,20) you must have a weight matrix of shape 128x256. It's not possible to remove elements from a normal matrix or tensor, as it will always need 128*256 weights inside it.
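The shape argument can be checked directly. This sketch assumes a 1x1 convolution for the 128 to 256 channel example; note that PyTorch stores the weight as (out_channels, in_channels, kH, kW), so the tensor is 256x128x1x1 rather than 128x256, but the element count is the same.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Assumed 1x1 kernel so the weight count matches the 128*256 example
conv = nn.Conv2d(128, 256, kernel_size=1, bias=False)
x = torch.randn(1, 128, 20, 20)

shape_before = tuple(conv.weight.shape)
prune.l1_unstructured(conv, name="weight", amount=0.3)
prune.remove(conv, "weight")
shape_after = tuple(conv.weight.shape)

out_shape = tuple(conv(x).shape)
n_weights = conv.weight.numel()

print(shape_before, shape_after, out_shape, n_weights)
```

The weight tensor keeps all 128*256 entries after pruning; only their values change.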

There are special cases of sparse matrices in some packages/languages, it may be possible pytorch is converting the original tensor to a sparse tensor with the same shape, though I'm not sure if this is the case. Even if it were, any exported models (i.e. onnx, coreml, tensorrt) using these sparse matrices would need special support for them, or they would be handled as normal matrices.

glenn-jocher · 2020-07-07T02:44:10Z

The current pruning method incorporates the line of code you mention already as well:

yolov5/utils/torch_utils.py

Lines 88 to 97 in 121d90b

    
    def prune(model, amount=0.3):
        # Prune model to requested global sparsity
        import torch.nn.utils.prune as prune
        print('Pruning model... ', end='')
        for name, m in model.named_modules():
            if isinstance(m, nn.Conv2d):
                prune.l1_unstructured(m, name='weight', amount=amount)  # prune
                prune.remove(m, 'weight')  # make permanent
        print(' %.3g global sparsity' % sparsity(model))

lucasjinreal · 2020-07-07T02:59:30Z

@glenn-jocher Nice. do u figure out how to obtain the pruned model architecture?

glenn-jocher · 2020-07-07T03:06:02Z

@jinfagang well that's what I was saying, the architecture does not change. In my example above, the 128x256 convolution weights are still 128x256 weights, it's just that some of their values that were previously near-zero have been set equal to zero during the pruning. The 128x256 matrix may or may not then be stored as a sparse matrix, which is a special type of matrix intended for use with data that contains mostly zeros, and saves memory (and may or may not also save processing time).

TLDR the architecture is exactly the same when pruning, no layers are removed as far as I know, and the input and output shapes (and shapes of all intermediate layers) remain the same.

lucasjinreal · 2020-07-07T03:28:27Z

@glenn-jocher so the simplified model cannot get its new channel count and shape automatically. Is there any way to make that happen?

Lornatang · 2020-07-08T04:28:20Z

@glenn-jocher First, thanks for your work! Let me ask, which paper or project is your pruning based on?

glenn-jocher · 2020-07-08T04:59:00Z

@Lornatang I based this pruning implementation off of the original pytorch pruning tutorial at the link below, but the idea to apply pruning here originally came from @jinfagang. I don't actually have any experience pruning models.
https://pytorch.org/tutorials/intermediate/pruning_tutorial.html

@jinfagang I modified detect.py to prune and save, and print updated model info:

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    torch_utils.model_info(model)
    torch.save({'model': model}, 'model_normal.pt')

    torch_utils.prune(model, 0.3)
    torch_utils.model_info(model)
    torch.save({'model': model}, 'model_pruned.pt')

Output:

Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients, 17.5 GFLOPS
Pruning model...  0.299 global sparsity
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients, 17.5 GFLOPS

Model sizes are here (for both yolov5s in FP32):

HenryWang628 · 2020-07-12T08:41:14Z

So maybe layer pruning or channel-level sparsity works better since it changes the architecture of the network?
I have seen a project like this:
https://github.com/tanluren/yolov3-channel-and-layer-pruning

glenn-jocher · 2020-07-12T16:16:16Z

@HenryWang628 I see, thanks for the link. The tensorboard histograms are very nice. So it seems a more useful method would be: channel prune -> mAP drop -> finetune x epochs -> recover some lost mAP.

This all raises the question though, if you are going to go through all of this effort on a large model like YOLOv5x, why not just train a smaller model like YOLOv5s? The training time will be much faster, and you don't need the extra pruning and finetuning steps.

glenn-jocher · 2020-07-14T19:39:10Z

For anyone interested, there is a detailed discussion on this here pytorch/tutorials#1054 (comment)

The author there says this:

I'm not familiar with your architecture, so you'll have to decide which parameters it makes sense to pool together and compare via global magnitude-based pruning; but let's assume, just for the sake of this simple example, that you only want to consider the convolutional layers identified by the logic of my if-statement below [if those aren't the weights you care about, please feel free to modify that logic as you wish].

Now, those layers happen to come with two parameters: "weight" and "bias". Let's say you are interested in the weights [if you care about the biases too, feel free to add them in as well in the parameters_to_prune]. Alright, how do we tell global_unstructured to prune those weights in a global manner? We do so by constructing parameters_to_prune as requested by that function [again, see docs and tutorial linked above].
parameter_to_prune = [
    (v, "weight") 
    for k, v in dict(model.named_modules()).items()
    if ((len(list(v.children())) == 0) and (k.endswith('conv')))
]

# now you can use global_unstructured pruning
prune.global_unstructured(parameter_to_prune, pruning_method=prune.L1Unstructured, amount=0.3)
To check that that succeeded, you can now look at the global sparsity across those layers, which should be 30%, as well as the individual per-layer sparsity:
# global sparsity
nparams = 0
pruned = 0
for k, v in dict(model.named_modules()).items():
    if ((len(list(v.children())) == 0) and (k.endswith('conv'))):
        nparams += v.weight.nelement()  # accumulate total weights across layers
        pruned += torch.sum(v.weight == 0)  # accumulate zeroed weights across layers
print('Global sparsity across the pruned layers: {:.2f}%'.format( 100. * pruned / float(nparams)))
# ^^ should be 30%

# local sparsity
for k, v in dict(model.named_modules()).items():
    if ((len(list(v.children())) == 0) and (k.endswith('conv'))):
        print(
            "Sparsity in {}: {:.2f}%".format(
                k,
                100. * float(torch.sum(v.weight == 0))
                / float(v.weight.nelement())
            )
        )
# ^^ will be different for each layer
Originally posted by @mickypaganini in pytorch/tutorials#1054 (comment)
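To contrast the two approaches discussed here, the following sketch prunes the same toy model once per layer (as `l1_unstructured` does in the YOLOv5 helper) and once with `global_unstructured`. The toy model and the `layer_sparsities` / `overall_sparsity` helpers are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def make_model():
    torch.manual_seed(0)
    return nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 64, 3))


def convs(model):
    return [m for m in model.modules() if isinstance(m, nn.Conv2d)]


def layer_sparsities(model):
    """Zero fraction of each Conv2d weight tensor, in layer order."""
    return [float((m.weight == 0).sum()) / m.weight.numel() for m in convs(model)]


def overall_sparsity(model):
    """Zero fraction pooled over all Conv2d weights."""
    zeros = sum(int((m.weight == 0).sum()) for m in convs(model))
    total = sum(m.weight.numel() for m in convs(model))
    return zeros / total


# Per-layer pruning: every layer loses its own smallest 30%
local = make_model()
for m in convs(local):
    prune.l1_unstructured(m, name="weight", amount=0.3)

# Global pruning: the smallest 30% across all layers pooled together
glob = make_model()
prune.global_unstructured(
    [(m, "weight") for m in convs(glob)],
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)

print("local per-layer:", layer_sparsities(local))
print("global per-layer:", layer_sparsities(glob), "overall:", overall_sparsity(glob))
```

With per-layer pruning each layer ends up near 30%; with global pruning only the pooled figure is fixed at 30% and the per-layer values are free to differ.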

glenn-jocher · 2020-07-14T23:47:06Z

More info from pytorch/tutorials#605 (comment)

Hi @cranmer,
Hopefully this tutorial will be included soon (cc: @soumith).

As is, this module is not intended (by itself) to help you with memory savings. All that pruning does is to replace some entries with zeroes. This itself doesn't buy you anything, unless you represent the sparse tensor in a smarter way (which this module itself doesn't handle for you). You can, however, rely on torch.sparse and other functionalities there to help you with that. To give you a concrete example:
import torch
import torch.nn.utils.prune as prune

t = torch.randn(100, 100)
torch.save(t, 'full.pth')

p = prune.L1Unstructured(amount=0.9)
pruned = p.prune(t)
torch.save(pruned, 'pruned.pth')

sparsified = pruned.to_sparse()
torch.save(sparsified, 'sparsified.pth')
When I ls, these are the sizes on disk:
21K sparsified.pth
40K pruned.pth
40K full.pth
By the way, before calling prune.remove, you can expect your memory footprint to be a lot higher than what you started out with, because for each pruned parameter you now have: the original parameter, the mask, and the pruned version of the tensor. Calling prune.remove brings you back to only having a single (now pruned) tensor per pruned parameter. Still, if you don't represent these pruned parameters smartly, the memory footprint at this point won't be any lower than what you started out with.

Originally posted by @mickypaganini in pytorch/tutorials#605 (comment)

Lornatang · 2020-07-29T09:57:05Z

@glenn-jocher I think you can refer to https://github.com/vainf/torch-pruning, he has implemented this function in detail.

github-actions · 2020-09-04T00:37:57Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

shoebNTU · 2020-09-17T03:40:47Z

Hi, thank you everyone for the informative comments. Thanks Glenn for this super-cool library. Not sure if there is a way to implement a line like "sparsified = pruned.to_sparse()" (pytorch/tutorials#605 (comment)) for nn.Conv2d?

I am trying to reduce the overall model weights. Eventually, I want to port this to a Jetson Nano. My understanding is that a smaller model yields --> faster speeds. Please correct me if my understanding is wrong. Thanks.

glenn-jocher · 2020-09-18T00:29:37Z

@shoebNTU any speed benefits would depend on the capability of your hardware and drivers to exploit sparse matrices, so there is no single answer to your question.
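On the storage side of the question, one possible sketch is to serialize a pruned Conv2d weight in sparse COO form, in the spirit of the `to_sparse()` example quoted earlier in the thread. This is an assumption-laden illustration, not a YOLOv5 feature: the layer size, the 90% pruning amount, and the `serialized_bytes` helper are all made up for the example, and the weight is flattened to 2-D first because COO stores one int64 index per dimension per nonzero value.

```python
import io

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def serialized_bytes(obj):
    """Size in bytes of torch.save(obj) written to an in-memory buffer (hypothetical helper)."""
    buf = io.BytesIO()
    torch.save(obj, buf)
    return buf.getbuffer().nbytes


torch.manual_seed(0)
conv = nn.Conv2d(64, 128, 3)  # illustrative layer
prune.l1_unstructured(conv, name="weight", amount=0.9)
prune.remove(conv, "weight")

dense = conv.weight.detach()
sparse2d = dense.flatten(1).to_sparse()  # (out_channels, in*kH*kW) in COO format

# The dense tensor can be recovered exactly for use in the normal forward pass
restored = sparse2d.to_dense().reshape(dense.shape)

dense_bytes = serialized_bytes(dense)
sparse_bytes = serialized_bytes(sparse2d)
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")
```

This only affects the size on disk. The convolution itself still runs on a dense tensor, so it says nothing about inference speed on a Jetson Nano or any other device.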

jayer95 · 2022-11-01T09:57:30Z

@glenn-jocher
Hi,
Could I just ask you a question regarding why the pruning taught here is not sparsely trained?

I refer to the following projects:
https://github.com/midasklr/yolov5prune/tree/v5.0
https://github.com/midasklr/yolov5prune/tree/v6.0
https://github.com/tanluren/yolov3-channel-and-layer-pruning

As the sparse training epochs progress, more and more BN gamma values approach 0, as can be seen in the tensorboard bn histograms.

After training, pruning can be performed. A basic principle is that the threshold cannot be greater than the maximum gamma of any BN layer, otherwise that layer would lose all of its channels. Then prune according to the percentage.
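The threshold rule described above might be sketched as follows. This is an interpretation under stated assumptions, not code from the linked projects: `bn_prune_threshold` is a hypothetical helper, gammas are taken to be `nn.BatchNorm2d.weight`, and a channel is kept when its |gamma| is greater than or equal to the threshold.

```python
import torch
import torch.nn as nn


def bn_prune_threshold(model, percent):
    """Global |gamma| threshold for pruning `percent` of channels,
    capped so that no BatchNorm2d layer loses all of its channels."""
    gammas = [m.weight.detach().abs() for m in model.modules()
              if isinstance(m, nn.BatchNorm2d)]
    all_sorted = torch.cat(gammas).sort().values
    idx = min(int(len(all_sorted) * percent), len(all_sorted) - 1)
    requested = all_sorted[idx]
    highest_safe = min(g.max() for g in gammas)  # smallest per-layer maximum
    return float(torch.minimum(requested, highest_safe))


# Two BN layers with hand-set gammas to make the effect visible
bn1, bn2 = nn.BatchNorm2d(4), nn.BatchNorm2d(4)
with torch.no_grad():
    bn1.weight.copy_(torch.tensor([0.1, 0.2, 0.3, 0.4]))
    bn2.weight.copy_(torch.tensor([1.0, 2.0, 3.0, 4.0]))
model = nn.Sequential(bn1, bn2)

thr = bn_prune_threshold(model, percent=0.75)
kept_bn1 = int((bn1.weight.abs() >= thr).sum())
kept_bn2 = int((bn2.weight.abs() >= thr).sum())
print(thr, kept_bn1, kept_bn2)
```

Here a 75% request would put the raw threshold at 3.0 and wipe out the first layer entirely; the cap lowers it to that layer's largest gamma so at least one channel survives.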

glenn-jocher · 2022-11-01T10:27:31Z

@jayer95 our tutorial is in need of updating! I wrote it myself a while ago. If you'd like to propose updates/fixes that would be awesome to help everyone :)

jayer95 · 2022-11-02T05:39:45Z

@glenn-jocher Sure, I got it :)

DaphnaNanovel · 2022-12-22T12:44:33Z

Hello, is it possible to retrain pruned model? We have trained yolov5 on our custom data, then pruned the model, and would like to retrain it on the same custom data. The naive attempt to perform normal training on the pruned model was not successful and the following error was caught:
model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
TypeError: 'DetectMultiBackend' object is not subscriptable

Abanoub-G · 2023-03-02T17:17:05Z

Hi,

Thanks a lot for the tutorial and the very insightful conversation. I have successfully managed to prune and save yolov5s. However, when I come to run val.py on the saved model I get the following error:

File "models/yolov5/val.py", line 420, in <module>
    main(opt)
  File "models/yolov5/val.py", line 391, in main
    run(**vars(opt))
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "models/yolov5/val.py", line 142, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
  File "/home/NetZIP/models/yolov5/models/common.py", line 345, in __init__
    model = attempt_load(weights if isinstance(weights, list) else w, device=device, inplace=True, fuse=fuse)
  File "/home/NetZIP/models/yolov5/models/experimental.py", line 88, in attempt_load
    model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, 'fuse') else ckpt.eval())  # model in eval mode
TypeError: 'bool' object is not callable

Note, the val.py works fine when I run it using the yolov5s.pt model, but throws out the error above when running the pruned saved model. I used the code provided earlier in this conversation to save the model (https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445).

I think the issue might be in how the model gets saved rather than the pruning, because I also tried just simply saving the yolov5s.pt model without the pruning using the save code provided here https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445 and it resulted in the same error when running val.py on it.

I have been looking at this for a while and can not seem to find what is causing this error or what is the issue with the saving method. The only thing I was able to spot is that the files inside the yolov5s.pt/data/ and yolov5s_fp_32_pruned.pt/data/ have different numerals. See attached screenshots below. Could this be the issue? if yes, any idea what is causing it and how to correct it please?

Thanks

relaxtheo · 2023-05-05T08:45:33Z

I have the same problem. In yolov5, the pt file is a ckpt, not just the model part. My ugly solution is to create a new ckpt, copy all options except the model from the original ckpt to the new ckpt, and set the pruned model in the new ckpt.

glenn-jocher · 2023-05-05T10:54:54Z

@relaxtheo hi,

The error may be caused by how the model is saved in the detect.py file. In YOLOv5, the .pt file is a full checkpoint that contains the model along with other entries, not just the model itself. Therefore, when you save a pruned model, you're saving a checkpoint file that still contains the original unpruned parameters, which can cause issues with loading the pruned model.

One solution could be to create a new checkpoint file and manually copy all options except the model from the original checkpoint to the new checkpoint. Then, you can set the pruned model to the new checkpoint. This could help ensure that the pruned model is loaded correctly in val.py.
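The copy-everything-except-the-model approach could look roughly like this. It is a sketch under assumptions: `rebuild_checkpoint` is a hypothetical helper, the checkpoint is a plain dict, and the keys in the stand-in dict are illustrative rather than the exact YOLOv5 checkpoint layout. In real use the dict would come from loading the original .pt file and the result would be written back with `torch.save`.

```python
import copy

import torch.nn as nn
import torch.nn.utils.prune as prune


def rebuild_checkpoint(ckpt, pruned_model, model_key="model"):
    """Copy every checkpoint entry except the model, then attach the pruned model."""
    new_ckpt = {k: v for k, v in ckpt.items() if k != model_key}
    new_ckpt[model_key] = pruned_model
    return new_ckpt


# Stand-in for a loaded checkpoint; keys are illustrative only
ckpt = {"epoch": -1, "model": nn.Sequential(nn.Conv2d(3, 8, 3)), "optimizer": None}

# Prune a copy so the original checkpoint stays untouched
pruned = copy.deepcopy(ckpt["model"])
for m in pruned.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")

new_ckpt = rebuild_checkpoint(ckpt, pruned)
pruned_fraction = float((new_ckpt["model"][0].weight == 0).sum()) / new_ckpt["model"][0].weight.numel()
print(sorted(new_ckpt), f"{pruned_fraction:.3f}")
```

If the loader you use prefers a different entry over the model entry (for example an EMA copy, if one is present), that entry would presumably need the same treatment; this is an assumption worth checking against your version of the loading code.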

Alternatively, you could try using the latest version of YOLOv5, which may have some updates related to model pruning and loading. You can also check the saved model and make sure that it only contains the pruned weights and not the original unpruned weights.

I hope this helps! Let me know if you have any further questions.

relaxtheo · 2023-05-06T03:49:21Z

Thank you very much for the reply.

I am currently using v6.2. Compared to the latest code, the prune method has not changed, and there seems to be a small change in the attempt_load function.

But what confuses me is what I can gain from this pruning. The model file size does not change, the parameter count stays the same, and inference speed does not change, so it seems I only get a worse model with lower accuracy and no gain.

glenn-jocher · 2023-05-06T07:40:04Z

@relaxtheo thank you for your response.

Model pruning can help reduce the computation required for inference by removing redundant and unnecessary parameters from the model. Although the file size and number of parameters may not change significantly, the inference speed can be improved if the pruning is performed correctly.

However, the effectiveness of pruning may depend on the specific model architecture and the amount of pruning applied. It's possible that in your case, the pruning method didn't achieve significant improvements in speed or performance.

If you're looking to improve the performance of your model, you may want to try other optimization techniques such as quantization or knowledge distillation. These methods can help reduce the size and computation required for inference, resulting in faster and more efficient models.

I hope this helps! If you have any further questions or concerns, please let me know.

relaxtheo · 2023-05-06T10:08:39Z

Thank you very much!

glenn-jocher · 2023-05-06T10:42:52Z

@relaxtheo hi there,

Thanks for sharing your experience with model pruning in YOLOv5. While model pruning aims to reduce the computation required for inference by removing redundant and unnecessary parameters, the effectiveness of pruning may depend on various factors, including the specific model architecture and the amount of pruning applied. Therefore, it's possible that in your case, the pruning method you used didn't achieve significant improvements in speed or performance.

If you're looking to further optimize your model, you may want to consider other approaches such as quantization or knowledge distillation. These optimization techniques can help reduce the size and computation required for inference, resulting in faster and more efficient models.

Please let us know if you have any further questions or concerns. We're here to help!


bryanbocao · 2023-05-07T03:54:12Z

@relaxtheo I think the current pruning method is specifically "unstructured pruning" (correct me if I am wrong), where individual weights with small magnitudes are set to 0 but are still stored in the model weight file (i.e. <model>.pth). Those zero values are not actually removed, so they still take up space on disk. That's why the model file size does not change. During inference, unless the code has an explicit way to accelerate, such as skipping those zeros, it will still do the same amount of computation on the zero-valued parameters. The advantage, as I see it, is that it is an efficient way to estimate how well model performance is preserved and how much potential there is to accelerate, so that I know when to actually prune the model in the next step.

The thing you are looking for might be "structured pruning" (https://github.com/VainF/Torch-Pruning), which actually removes the pruned channels from the network to save both space and time, but it is not easy to implement due to the dependencies among layers in various network architectures.
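A minimal two-layer sketch of why structured pruning changes the architecture and why layer dependencies matter. It is not how Torch-Pruning is implemented; it uses PyTorch's own `prune.ln_structured` to zero whole output channels, then rebuilds smaller layers by hand. The layer sizes are arbitrary, and the first convolution is given no bias so that removing a zeroed channel leaves the output unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv1 = nn.Conv2d(16, 32, 3, bias=False)
conv2 = nn.Conv2d(32, 8, 1)
full = nn.Sequential(conv1, nn.ReLU(), conv2)

# Zero the 50% of conv1 output channels with the smallest L1 norm
prune.ln_structured(conv1, name="weight", amount=0.5, n=1, dim=0)
prune.remove(conv1, "weight")

# Channels whose filters are not entirely zero are the ones to keep
keep = conv1.weight.detach().abs().sum(dim=(1, 2, 3)) != 0
n_keep = int(keep.sum())

# Rebuild physically smaller layers; conv2 must drop the matching input channels
slim1 = nn.Conv2d(16, n_keep, 3, bias=False)
slim2 = nn.Conv2d(n_keep, 8, 1)
with torch.no_grad():
    slim1.weight.copy_(conv1.weight[keep])
    slim2.weight.copy_(conv2.weight[:, keep])
    slim2.bias.copy_(conv2.bias)
slim = nn.Sequential(slim1, nn.ReLU(), slim2)

x = torch.randn(1, 16, 8, 8)
same_output = torch.allclose(full(x), slim(x), atol=1e-4)
n_full = sum(p.numel() for p in full.parameters())
n_slim = sum(p.numel() for p in slim.parameters())
print(n_keep, same_output, n_full, n_slim)
```

Every consumer of the pruned layer's output has to be sliced consistently, which is the dependency problem mentioned above; with skip connections and concatenations, as in YOLOv5, tracking those dependencies is the hard part.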

glenn-jocher · 2023-05-07T04:47:45Z

@bryanbocao hi there,

Thank you for reaching out. You are correct that the current pruning method in YOLOv5 uses unstructured pruning, where weights with small magnitude are set to 0 while still being stored in the weight file. As a result, the model file size may not change significantly, and inference speed may not be improved unless the code has an explicit way to accelerate like skipping those zeros.

Structured pruning, on the other hand, removes the pruned channels from the network to save both space and time. However, implementing structured pruning may not be easy due to the dependencies among layers in various network architectures.

We appreciate your feedback on this issue, and we'll keep it in mind as we continue to improve YOLOv5. If you have any further questions or concerns, don't hesitate to let us know.


relaxtheo · 2023-05-08T03:15:25Z

@bryanbocao @glenn-jocher Thank you all very much, I will try your recommendations

glenn-jocher · 2023-05-08T06:12:13Z

@relaxtheo Thank you for reaching out, and we're glad to hear that our recommendations were helpful. Don't hesitate to let us know if you have any further questions or concerns. We're here to help!

Mary14-design · 2024-01-09T15:12:38Z

It may be a bit unrelated, but I am getting a similar error while trying to train. I am still new to YOLO models. Can you please help me solve it?

glenn-jocher · 2024-01-09T17:39:49Z

@Mary14-design it seems like there might be an issue with the image link you've provided; it's not displaying correctly. However, I'm here to help you with your training issue. Could you please provide more details about the error message you're encountering during training with YOLOv5? This will help me understand the problem better and assist you accordingly. If you can copy and paste the error message or describe the issue in more detail, that would be great.

glenn-jocher added the enhancement and documentation labels Jul 5, 2020

glenn-jocher self-assigned this Jul 5, 2020

glenn-jocher mentioned this issue Jul 5, 2020

Model pruning support #120

Closed

glenn-jocher mentioned this issue Jul 14, 2020

Pruning Tutorial Not Working Correctly pytorch/tutorials#1054

Closed

github-actions bot added the Stale label Sep 4, 2020

github-actions bot closed this as completed Sep 9, 2020

glenn-jocher removed the Stale label Oct 8, 2020

glenn-jocher reopened this Oct 8, 2020

glenn-jocher mentioned this issue Nov 10, 2020

What is augmented inference ？ #1340

Closed

glenn-jocher mentioned this issue Sep 25, 2022

Documentation of methods, parameters, allowed values, term definitions, etc, etc #9584

Closed

2 tasks

This was referenced Oct 3, 2022

Gradual unfreezing the layers during training. #9677

Closed

Procedure of training the model offline. #9700

Closed

glenn-jocher mentioned this issue Oct 10, 2022

confusion matrix - backgroud part #9754

Closed

This was referenced Oct 24, 2022

Use Yolo for anomaly detection #9906

Closed

I want to pass the image read by opencv to the model I/F #9913

Closed

BBuf mentioned this issue Oct 30, 2022

[Plan v2.0] Source code walkthrough and development plan Oneflow-Inc/oneflow-yolo-doc#22

Open

24 tasks

This was referenced Nov 6, 2022

Number of Classes #10054

Closed

Multigpu training becomes slower in Kaggle #10078

Closed

Yolov5 cannot detection a video (tfjs) #7416

Closed

glenn-jocher mentioned this issue Dec 6, 2022

How to freeze backbone and unfreeze it after a specific epoch? #10416

Closed

1 task

UNeedCryDear mentioned this issue Feb 14, 2023

Failed to read the onnx model UNeedCryDear/yolov5-opencv-dnn-cpp#23

Closed

NamburiSrinath mentioned this issue May 9, 2023

RuntimeError: nonzero is not supported for tensors with more than INT_MAX elements, file a support request pytorch/pytorch#100989

Closed
