Generalize weight compression via OpenVINO submodels #2727
base: develop
Conversation
from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE

compress_weight_primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
    config, weight.shape, scale.shape, zero_point.shape
)
compressed_weights = Tensor(compress_weight_primitive(weight.data, scale.data, zero_point.data))
I would suggest hiding the caching logic and having a simple function call like compress_weight or quantize_weight.
Suggested change:

- from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE
- compress_weight_primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
-     config, weight.shape, scale.shape, zero_point.shape
- )
- compressed_weights = Tensor(compress_weight_primitive(weight.data, scale.data, zero_point.data))
+ from nncf.openvino.quantization.weight_lowering import compress_weight
+ compressed_weights = compress_weight(weight, scale, zero_point)
Do you suggest creating a quantize_weight()/compress_weight() function alongside the calculate_quantized_weight() function in weight_lowering.py and calling the former from the latter? If so, don't you think it is an unnecessary multiplication of functions? In my opinion, weight_lowering.py already has so many functions with similar names that it's hard to distinguish which does what.
I tend to agree with your opinion. I'm working on this right now. If you have any suggestions for refactoring weight_lowering.py, please let me know. Regarding this comment, I hope that you can extend the functionality of weight_lowering.py without adding complexity to this module.
level_low = 0
level_high = 2**num_bits - 1
if weight.backend == TensorBackend.numpy and is_openvino_available():
To avoid copy-pasting this code at every call site, dispatching can be implemented.
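Such dispatching might look like the following sketch. This is illustrative only: the registry, the backend names, and the numpy compression formula here are assumptions for the example, not the actual NNCF API.

```python
import numpy as np

# Hypothetical registry mapping a backend name to a backend-specific
# compression implementation, so the backend check lives in one place.
_COMPRESS_FNS = {}


def register_compress_fn(backend):
    """Register a compression implementation for a given backend name."""
    def decorator(fn):
        _COMPRESS_FNS[backend] = fn
        return fn
    return decorator


@register_compress_fn("numpy")
def _compress_numpy(weight, scale, zero_point, num_bits):
    # Asymmetric integer quantization on the numpy backend (illustrative).
    level_low = 0
    level_high = 2**num_bits - 1
    return np.clip(np.round(weight / scale + zero_point), level_low, level_high)


def compress_weight(weight, scale, zero_point, num_bits, backend="numpy"):
    """Single entry point; selects the implementation for the given backend."""
    try:
        fn = _COMPRESS_FNS[backend]
    except KeyError:
        raise ValueError(f"No compression implementation for backend {backend!r}")
    return fn(weight, scale, zero_point, num_bits)


w = np.array([0.5, -0.25, 1.0], dtype=np.float32)
s = np.array([0.25], dtype=np.float32)
zp = np.array([8.0], dtype=np.float32)
print(compress_weight(w, s, zp, num_bits=4))  # values: 10, 7, 12
```

An OpenVINO-backed implementation could then be registered under another key without touching the call sites.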
from nncf.quantization.algorithms.weight_compression.config import WeightCompressionConfig

class OVCompressionPrimitiveCache: |
Caching logic can be encapsulated in a decorator, for example (a sketch, keyed on shapes so that the tensor arguments do not need to be hashable):

from functools import cache

import openvino as ov
import openvino.runtime.opset13 as opset


@cache
def jit_compress_weight(weight_shape, scale_shape, zero_point_shape, num_bits):
    level_low = 0
    level_high = 2**num_bits - 1
    w = opset.parameter(weight_shape, name="w")
    s = opset.parameter(scale_shape, name="s")
    zp = opset.parameter(zero_point_shape, name="zp")
    result = opset.clamp(opset.round(w / s + zp), level_low, level_high, name="compressed_weights")
    model = ov.Model([result], [w, s, zp])
    compiled_model = ov.compile_model(model)
    return lambda w, s, zp: compiled_model([w, s, zp])[0]


def compress_weight(weight, scale, zero_point, num_bits):
    primitive = jit_compress_weight(weight.shape, scale.shape, zero_point.shape, num_bits)
    return primitive(weight.data, scale.data, zero_point.data)
Force-pushed: 55cafaa to a68a63d, 6b98ddd to 3d9faa4, cb6aaa0 to 1c85732, 1c85732 to b527cac, b527cac to ac3ea02, ac3ea02 to 2a3a63c
Changes
Reason for changes
Related tickets
Tests