You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
primesieve::iterator's performance depends heavily on the fillNextPrimes() method from PrimeGenerator.cpp. For x64 we have a vectorized AVX512 algorithm that is pretty optimal for this task. Once other CPU architectures (e.g. arm64) support 512-bit vector instructions like AVX512 we should port our AVX512 algorithm to these CPU architectures.
ARM has recently added (2021) the Scalable Vector Extension (SVE) to its CPUs. SVE is supposed to be a portable vector instruction set that works with different vector instructions widths. However for vectorizing our fillNextPrimes() method we need at least 512-bit vector instructions.
The text was updated successfully, but these errors were encountered:
Primesieve's AVX512 fillNextPrimes() algorithm is close to optimal for converting 1-bits from the sieve array into the corresponding bit indexes/positions (and next into prime numbers). The algorithm mainly relies on the VPCOMPRESSB and VPERMB instructions from AVX512.
According to ChatGPT the new ARM SVE instruction set has those instructions as well.
Yes, ARM Scalable Vector Extension (SVE) does offer a similar functionality to Intel AVX-512's VPCOMPRESSB instruction. In ARM SVE, the svcompress_b instruction serves a similar purpose.
Yes, ARM Scalable Vector Extension (SVE) provides a similar functionality to Intel's AVX-512 VPERMB instruction. In SVE, you can achieve similar byte-wise permute operations using the sveperm instruction family.
This means that it should be relatively easy to port Primesieve's AVX512 fillNextPrimes() algorithm to ARM SVE. But there is one big drawback, in ARM SVE the vector length is dynamic which adds another layer of complexity for implementing this algorithm.
According to ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads, as of 2024 there are no ARM CPUs out yet that support SVE or SV2. And SVE2 will likely be limited to 128-bits for the next few years, on top of that the dynamic vector length of ARM SVE is a pain in the ass in many cases (including its use in primesieve). Maybe these issues explain why there is no wide adoption of ARM SVE yet...
It is best to wait a few years until ARM CPUs ship a vector extension of at least 256-bits (ideally we want 512-bits like in AVX512.). Ideally we want a fixed vector length instruction set, similar to ARM NEON or AVX512.
primesieve::iterator
's performance depends heavily on thefillNextPrimes()
method fromPrimeGenerator.cpp
. For x64 we have a vectorized AVX512 algorithm that is pretty optimal for this task. Once other CPU architectures (e.g. arm64) support 512-bit vector instructions like AVX512 we should port our AVX512 algorithm to these CPU architectures.ARM has recently added (2021) the Scalable Vector Extension (SVE) to its CPUs. SVE is supposed to be a portable vector instruction set that works with different vector instructions widths. However for vectorizing our
fillNextPrimes()
method we need at least 512-bit vector instructions.The text was updated successfully, but these errors were encountered: