Use another implementation for PDEP #3

Open

@yanjiew1 yanjiew1 commented Aug 12, 2024

This approach is from chapter 7 of Hacker's Delight. It removes the need for the POPCNT and BZHI instructions entirely, which improves portability.

GCC generates fewer instructions for it than for the original implementation.
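
For readers without the book at hand, here is a minimal sketch of the kind of routine chapter 7 describes: a 32-bit `expand` (the software equivalent of PDEP) built only from shifts, XOR, AND and OR. It is an illustration under assumptions, not necessarily the code in this PR; the names `expand32` and `pdep_ref32` are hypothetical, and the PR may work on a different word size.

```c
#include <stdint.h>

/* Sketch of a PDEP-style "expand" in the style of Hacker's Delight, ch. 7.
 * Deposits the low bits of x into the positions selected by mask m.
 * Uses no POPCNT or BZHI; only shifts and bitwise logic. */
uint32_t expand32(uint32_t x, uint32_t m)
{
    uint32_t m0 = m;        /* keep the original mask for the final AND */
    uint32_t mk = ~m << 1;  /* marks the zeros to the right of each bit */
    uint32_t mp, mv, t;
    uint32_t moves[5];      /* per-step "bits to move" masks */
    int i;

    /* Pass 1: depends only on the mask. For each power-of-two distance,
     * record which mask bits must travel that distance. */
    for (i = 0; i < 5; i++) {
        mp = mk ^ (mk << 1);               /* parallel prefix XOR */
        mp = mp ^ (mp << 2);
        mp = mp ^ (mp << 4);
        mp = mp ^ (mp << 8);
        mp = mp ^ (mp << 16);
        mv = mp & m;                       /* bits that move by 2^i */
        moves[i] = mv;
        m = (m ^ mv) | (mv >> (1u << i));  /* compress the mask */
        mk = mk & ~mp;
    }

    /* Pass 2: replay the moves in reverse, shifting x to the left. */
    for (i = 4; i >= 0; i--) {
        mv = moves[i];
        t = x << (1u << i);
        x = (x & ~mv) | (t & mv);
    }
    return x & m0;          /* clear bits outside the mask */
}

/* Bit-by-bit reference with the same semantics, useful for checking. */
uint32_t pdep_ref32(uint32_t x, uint32_t m)
{
    uint32_t result = 0;
    uint32_t bit;

    for (bit = 1; m != 0; bit <<= 1) {
        uint32_t lowest = m & (0u - m);    /* lowest set bit of the mask */
        if (x & bit)
            result |= lowest;
        m &= m - 1;                        /* drop that mask bit */
    }
    return result;
}
```

For example, `expand32(5, 0x1A)` deposits the bits `101` into mask positions 1, 3 and 4 and returns `0x12`, matching `pdep_ref32`. Note that the first loop depends only on the mask, so a compiler that inlines the function can hoist it when the same mask is reused.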

yanjiew1 commented Aug 14, 2024

Performance can be worse when compiling with the "-O3" flag, because the new implementation cannot take advantage of the LEA instruction.
With "-O3", GCC may inline the function and then optimize the code so that the shifts of the ppp are computed only once for each mask. As a result, the number of instructions may not be reduced when "-O3" is used.
