Generalize weight compression via OpenVINO submodels #2727

Draft
nikita-savelyevv wants to merge 10 commits into develop from compress-via-openvino

Conversation

nikita-savelyevv (Collaborator)

Changes

Reason for changes

Related tickets

Tests

@github-actions github-actions bot added NNCF Common Pull request that updates NNCF Common NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF PTQ Pull requests that updates NNCF PTQ labels Jun 11, 2024
Comment on lines 271 to 276
from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE

compress_weight_primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
config, weight.shape, scale.shape, zero_point.shape
)
compressed_weights = Tensor(compress_weight_primitive(weight.data, scale.data, zero_point.data))
Contributor:

I would suggest hiding the caching logic and having a simple function call like compress_weight or quantize_weight.

Suggested change
- from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE
- compress_weight_primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
-     config, weight.shape, scale.shape, zero_point.shape
- )
- compressed_weights = Tensor(compress_weight_primitive(weight.data, scale.data, zero_point.data))
+ from nncf.openvino.quantization.weight_lowering import compress_weight
+ compressed_weights = compress_weight(weight, scale, zero_point)
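
For illustration, a minimal sketch of what such a wrapper inside weight_lowering.py could look like, hiding the cache lookup from callers (the compress_weight signature here is hypothetical; it reuses the OV_COMPRESSION_PRIMITIVE_CACHE API from this PR):

from nncf.openvino.quantization.compression_primitives import OV_COMPRESSION_PRIMITIVE_CACHE
from nncf.tensor import Tensor


def compress_weight(weight: Tensor, scale: Tensor, zero_point: Tensor, config) -> Tensor:
    # The cache lookup becomes an implementation detail of weight_lowering.py;
    # call sites see a plain function call.
    primitive = OV_COMPRESSION_PRIMITIVE_CACHE.get_compress_weight_primitive(
        config, weight.shape, scale.shape, zero_point.shape
    )
    return Tensor(primitive(weight.data, scale.data, zero_point.data))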

@nikita-savelyevv (Collaborator, Author) commented on Jul 3, 2024:

Do you suggest creating a quantize_weight()/compress_weight() function alongside calculate_quantized_weight() in weight_lowering.py and calling the former from the latter? If so, don't you think that's unnecessary function multiplication? In my opinion, weight_lowering.py already has so many functions with similar names that it's hard to tell which one does what.

@alexsu52 (Contributor) commented on Jul 4, 2024:

I tend to agree with you. I'm working on this right now; if you have any suggestions for refactoring weight_lowering.py, please let me know. Regarding this comment, I hope you can extend the functionality of weight_lowering.py without adding complexity to this module.


level_low = 0
level_high = 2**num_bits - 1
if weight.backend == TensorBackend.numpy and is_openvino_available():
Contributor:

To avoid copy-pasting this code, dispatching could be implemented.
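
For illustration, one possible shape of that dispatching, keyed on the tensor backend (a sketch; the helper names _compress_weight_ov and _compress_weight_numpy are hypothetical, and the import paths are assumptions):

from nncf.common.utils.backend import is_openvino_available
from nncf.tensor import Tensor
from nncf.tensor.definitions import TensorBackend


def compress_weight(weight: Tensor, scale: Tensor, zero_point: Tensor, num_bits: int) -> Tensor:
    # Single entry point: numpy-backed tensors go through the OpenVINO-compiled
    # path when OpenVINO is available; everything else falls back to pure numpy.
    if weight.backend == TensorBackend.numpy and is_openvino_available():
        return _compress_weight_ov(weight, scale, zero_point, num_bits)
    return _compress_weight_numpy(weight, scale, zero_point, num_bits)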

from nncf.quantization.algorithms.weight_compression.config import WeightCompressionConfig


class OVCompressionPrimitiveCache:
Contributor:

Caching logic can be encapsulated in a decorator, for example:

from functools import cache

import openvino as ov
from openvino.runtime import opset13 as opset

from nncf.tensor import Tensor


@cache
def jit_compress_weight(weight_shape, scale_shape, zero_point_shape, num_bits: int):
    # Build and compile the compression submodel once per unique combination of
    # shapes and bit width; functools.cache keeps the compiled model around.
    level_low = 0
    level_high = 2**num_bits - 1

    w = opset.parameter(weight_shape, name="w")
    s = opset.parameter(scale_shape, name="s")
    zp = opset.parameter(zero_point_shape, name="zp")

    result = opset.clamp(opset.round(w / s + zp, "half_to_even"), level_low, level_high, name="compressed_weights")

    model = ov.Model([result], [w, s, zp])
    compiled_model = ov.compile_model(model)

    return lambda w, s, zp: compiled_model([w, s, zp])[0]


def compress_weight(weight: Tensor, scale: Tensor, zero_point: Tensor, num_bits: int) -> Tensor:
    # Shapes (tuples) and num_bits are hashable, so they can serve as the cache key.
    compress = jit_compress_weight(weight.shape, scale.shape, zero_point.shape, num_bits)
    return Tensor(compress(weight.data, scale.data, zero_point.data))
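
A hypothetical call site for the sketch above, with numpy-backed tensors (shapes and values are illustrative):

import numpy as np
from nncf.tensor import Tensor

weight = Tensor(np.random.rand(64, 64).astype(np.float32))
scale = Tensor(np.random.rand(64, 1).astype(np.float32))
zero_point = Tensor(np.full((64, 1), 8.0, dtype=np.float32))

# The first call builds and compiles the submodel; subsequent calls with the
# same shapes and num_bits reuse the cached compiled model.
compressed = compress_weight(weight, scale, zero_point, num_bits=4)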

@nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 55cafaa to a68a63d on July 3, 2024 18:31
@nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 6b98ddd to 3d9faa4 on July 16, 2024 14:19
@nikita-savelyevv force-pushed the compress-via-openvino branch 5 times, most recently from cb6aaa0 to 1c85732 on September 3, 2024 16:02
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 6, 2024