Generalize weight compression via OpenVINO submodels #2727

Draft
nikita-savelyevv wants to merge 10 commits into develop from compress-via-openvino

Conversation

nikita-savelyevv (Collaborator)

Changes

Reason for changes

Related tickets

Tests

@github-actions github-actions bot added NNCF Common Pull request that updates NNCF Common NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Jun 11, 2024
Comment on lines 271 to 276
from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE

compress_weight_primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
config, weight.shape, scale.shape, zero_point.shape
)
compressed_weights = Tensor(compress_weight_primitive(weight.data, scale.data, zero_point.data))
Contributor:

I would suggest hiding the caching logic and having a simple function call like compress_weight or quantize_weight.

Suggested change
- from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE
- compress_weight_primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
-     config, weight.shape, scale.shape, zero_point.shape
- )
- compressed_weights = Tensor(compress_weight_primitive(weight.data, scale.data, zero_point.data))
+ from nncf.openvino.quantization.weight_lowering import compress_weight
+ compressed_weights = compress_weight(weight, scale, zero_point)
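
For illustration, a minimal sketch of what such a wrapper inside weight_lowering.py could look like, hiding the cache lookup from callers (the compress_weight signature here is hypothetical; it reuses the OV_COMPRESSION_PRIMITIVE_CACHE API from this PR):

from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE
from nncf.tensor import Tensor


def compress_weight(weight: Tensor, scale: Tensor, zero_point: Tensor, config) -> Tensor:
    # The cache lookup becomes an implementation detail of weight_lowering.py;
    # call sites see a plain function call.
    primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
        config, weight.shape, scale.shape, zero_point.shape
    )
    return Tensor(primitive(weight.data, scale.data, zero_point.data))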

@nikita-savelyevv (Collaborator, Author) commented on Jul 3, 2024:

Do you suggest creating a quantize_weight()/compress_weight() function alongside calculate_quantized_weight() in weight_lowering.py and calling the former from the latter? If so, don't you think that's unnecessary function multiplication? In my opinion, weight_lowering.py already has so many functions with similar names that it's hard to tell which one does what.

@alexsu52 (Contributor) commented on Jul 4, 2024:

I tend to agree with you. I'm working on this right now; if you have any suggestions for refactoring weight_lowering.py, please let me know. Regarding this comment, I hope you can extend the functionality of weight_lowering.py without adding complexity to this module.


level_low = 0
level_high = 2**num_bits - 1
if weight.backend == TensorBackend.numpy and is_openvino_available():
Contributor:

To avoid copy-pasting this code, dispatching could be implemented.
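
For illustration, one possible shape of that dispatching, keyed on the tensor backend (a sketch; the helper names _compress_weight_ov and _compress_weight_numpy are hypothetical, and the import paths are assumptions):

from nncf.common.utils.backend import is_openvino_available
from nncf.tensor import Tensor
from nncf.tensor.definitions import TensorBackend


def compress_weight(weight: Tensor, scale: Tensor, zero_point: Tensor, num_bits: int) -> Tensor:
    # Single entry point: numpy-backed tensors go through the OpenVINO-compiled
    # path when OpenVINO is available; everything else falls back to pure numpy.
    if weight.backend == TensorBackend.numpy and is_openvino_available():
        return _compress_weight_ov(weight, scale, zero_point, num_bits)
    return _compress_weight_numpy(weight, scale, zero_point, num_bits)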

from nncf.quantization.algorithms.weight_compression.config import WeightCompressionConfig


class OVCompressionPrimitiveCache:
Contributor:

Caching logic can be encapsulated in a decorator, for example:

from functools import cache

import openvino as ov
from openvino.runtime import opset13 as opset

from nncf.tensor import Tensor


@cache
def jit_compress_weight(weight_shape, scale_shape, zero_point_shape, num_bits: int):
    # Build and compile the compression submodel once per unique combination of
    # shapes and bit width; functools.cache keeps the compiled model around.
    level_low = 0
    level_high = 2**num_bits - 1

    w = opset.parameter(weight_shape, name="w")
    s = opset.parameter(scale_shape, name="s")
    zp = opset.parameter(zero_point_shape, name="zp")

    result = opset.clamp(opset.round(w / s + zp, "half_to_even"), level_low, level_high, name="compressed_weights")

    model = ov.Model([result], [w, s, zp])
    compiled_model = ov.compile_model(model)

    return lambda w, s, zp: compiled_model([w, s, zp])[0]


def compress_weight(weight: Tensor, scale: Tensor, zero_point: Tensor, num_bits: int) -> Tensor:
    # Shapes (tuples) and num_bits are hashable, so they can serve as the cache key.
    compress = jit_compress_weight(weight.shape, scale.shape, zero_point.shape, num_bits)
    return Tensor(compress(weight.data, scale.data, zero_point.data))
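
A hypothetical call site for the sketch above, with numpy-backed tensors (shapes and values are illustrative):

import numpy as np
from nncf.tensor import Tensor

weight = Tensor(np.random.rand(64, 64).astype(np.float32))
scale = Tensor(np.random.rand(64, 1).astype(np.float32))
zero_point = Tensor(np.full((64, 1), 8.0, dtype=np.float32))

# The first call builds and compiles the submodel; subsequent calls with the
# same shapes and num_bits reuse the cached compiled model.
compressed = compress_weight(weight, scale, zero_point, num_bits=4)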

@nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 55cafaa to a68a63d on July 3, 2024 18:31
@nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 6b98ddd to 3d9faa4 on July 16, 2024 14:19
@nikita-savelyevv force-pushed the compress-via-openvino branch 5 times, most recently from cb6aaa0 to 1c85732 on September 3, 2024 16:02
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 6, 2024