Tracking issue for SIMD support #27731

Closed
alexcrichton opened this issue Aug 12, 2015 · 70 comments · Fixed by #117372
Labels
A-simd: Area: SIMD (Single Instruction Multiple Data)
A-target-feature: Area: Enabling/disabling target features like AVX, Neon, etc.
B-unstable: Blocker: Implemented in the nightly compiler and unstable.
C-tracking-issue: Category: A tracking issue for an RFC or an unstable feature.
Libs-Tracked: Libs issues that are tracked on the team's project board.
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Member

This is a tracking issue for the unstable core_simd feature in the standard library. SIMD support is quite a thorny topic, but it's largely covered in rust-lang/rfcs#1199, being implemented in #27169, and @huonw will be creating an external crate for full-fledged SIMD support.

cc @huonw

@alexcrichton added the T-libs-api and B-unstable labels Aug 12, 2015
@pnkfelix
Member

pnkfelix commented Oct 6, 2015

Note that #26403 may well be a blocker issue for 100% safe composable SIMD

@aturon
Member

aturon commented Nov 5, 2015

This issue now tracks the simd, simd_ffi, and repr_simd features.

@huonw
Member

huonw commented Dec 11, 2015

Status update:

In the compiler:

  • there is support for a lot of the Intel x86/x86-64 intrinsics, but none of the AMD-specific ones
  • it also has pretty much all the ARM & AArch64 NEON ones
  • other targets have essentially no support
  • some of the fanciest intrinsics are missing, especially (IIRC) some of the ARM pointer ones
  • the cfg(target_feature = "...") detection is subpar: e.g. it doesn't detect features when an explicit -C target-cpu="..." is set, it doesn't handle disabling features like -C target-feature="-sse2", and (AFAICT) it doesn't handle custom target specs
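
A minimal sketch of the cfg in question; whether the right branch gets compiled in is exactly what the detection above gets wrong under -C target-cpu, -C target-feature="-sse2", and custom target specs:

#[cfg(target_feature = "sse2")]
fn backend() -> &'static str { "sse2" } // an SSE2 code path would live here

#[cfg(not(target_feature = "sse2"))]
fn backend() -> &'static str { "scalar fallback" }

fn main() {
    println!("selected: {}", backend());
}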

In https://github.com/huonw/simd:

  • the cross-platform API is mostly there, for x86, ARM and AArch64
  • a small number of platform-intrinsics are exposed, but not a lot
  • the autogenerator (in etc) should be upgraded to handle emitting the actual wrappers as well as the raw extern blocks that it can currently do
  • this probably makes most sense after rewriting it in Rust, as a proper Cargo binary that checks in Cargo.lock and everything and uses e.g. libs from crates.io (this is fine, as the binary is not part of the normal build process; it is only run when someone decides to)

I'm intending to work on the simd crate first, starting with the rewrite of the autogenerator, but I've currently got a thesis to work on.

@bstrie
Contributor

bstrie commented Mar 29, 2016

@huonw How's the thesis going? :) Any progress on this issue, to relay to those interested in SIMD stabilization?

@alexcrichton
Member Author

@BurntSushi, @nikomatsakis, and I talked about this recently at the work week, and our thoughts are:

  • The original strategy of stabilizing the small pieces (intrinsics, #[repr(simd)]) in the compiler is probably the best way forward here. More experimentation can happen in the simd crate, but those at least provide the bare bones for moving forward.

  • Specifically with intrinsics, we probably want to stabilize defining them in a different fashion, specifically:

    #[simd_intrinsic = "name"] 
    pub fn foo(a: A, b: B) -> C {
        foo(a, b)
    }

    That is, we'd probably stabilize a new #[simd_intrinsic] (ish) attribute whose name would closely follow the standard naming conventions (e.g. those in C compilers). The function would look like normal Rust and appear to recurse into itself, but the compiler would understand that direct calls to the function are actually implemented inline, so this isn't actually infinite recursion.

  • The "structural typing" of the intrinsics is relatively ok. That is, the compiler can verify any definition is correct, even though each definition may be slightly different (e.g. any 32-bit x 4 value could show up perhaps). The bad part, however, is that not all intrinsics can be verified. For example the simd_lt intrinsic can have any number of SIMD types instantiated, but a pairing like String and Vec<u8> would be nonsensical. This may be difficult to solve in a "pure" fashion but may be worth stomaching to stabilize the SIMD intrinsics (which in general should never be called in favor of the simd crate itself)

  • Specifically with #[repr(simd)], we may want to remove support for tagging a generic structure. This does not appear to be used in the simd crate today and may not be necessary at all, in which case it's probably just complications that we don't want to have to think about today.

  • One possible route with the intrinsics would be a thought by @arielb1 where any illegal instantiation of type parameters just causes the intrinsic to codegen as a trap, which means that any instantiation is possible and some would just fail at runtime (which would need to be avoided anyway).
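
For reference, a sketch of how these generic intrinsics are declared on nightly (the extern "platform-intrinsic" style that also shows up later in this thread); nothing in the signature itself rules out a nonsensical instantiation, which is the verification problem described above:

#![feature(platform_intrinsics, repr_simd)]

#[repr(simd)]
#[derive(Copy, Clone)]
struct i32x4(i32, i32, i32, i32);

extern "platform-intrinsic" {
    // T and U are only checked once a concrete instantiation reaches codegen.
    fn simd_lt<T, U>(x: T, y: T) -> U;
}

fn main() {
    let a = i32x4(1, 2, 3, 4);
    let b = i32x4(4, 3, 2, 1);
    // Sensible instantiation: both types are SIMD vectors with matching lane counts.
    let _mask: i32x4 = unsafe { simd_lt(a, b) };
    // An instantiation like simd_lt::<String, Vec<u8>> is accepted by type checking
    // here and only rejected when it is actually monomorphized.
}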

All of this was discussed hopefully with an eye to start the process of stabilization soon-ish, and then we can all get SIMD on stable Rust!

cc @eddyb, you likely have many opinions as well!

@eddyb
Member

eddyb commented Jun 17, 2016

@alexcrichton Ahh, I ignored the multiple-definition option in my recent comment.
I think it's a great solution for integer and floating-point intrinsics, but I didn't consider stabilization of any intrinsic to be possible, which is why I only tried to think of options where libcore hosts all intrinsics.

I am still wary about stabilizing intrinsics, but #[simd_intrinsic] seems focused in scope, so I can see how that works. Although, would it be restricted to things that are definitely about SIMD?
There are various platform intrinsics that don't do anything with vectors, such as prefetch.

Other than that, this seems like a good move forward, without the complexities I was worried about.

@alexcrichton
Member Author

@eddyb hm yeah, I'm not sure if #[simd_intrinsic] is the best name; certainly up for debate! I would figure that all intrinsics would be defined through a similar mechanism, but the SIMD intrinsics would be namespaced somehow so they're the only ones that we stabilize. I wouldn't want to stabilize, for example, intrinsics like prefetch (for now).

@BurntSushi
Member

There are other useful intrinsics like crc32 that are explicitly part of SSE 4.2 but aren't necessarily SIMD.
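
For illustration of the point, using the std::arch API as it eventually stabilized (not anything available at the time of this comment): the CRC32 instruction is gated on SSE 4.2 but operates on plain integers, not vectors.

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn crc32c(mut crc: u32, data: &[u8]) -> u32 {
    use core::arch::x86_64::_mm_crc32_u8;
    for &byte in data {
        crc = _mm_crc32_u8(crc, byte); // scalar u32 state, scalar u8 input
    }
    crc
}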

@alexcrichton
Member Author

Oh interesting! I'd be ok punting on those for now in favor of just dealing with the SIMD pieces, but we can relatively easily reevaluate and do something different later.

@nikomatsakis
Contributor

So I had a really interesting conversation with @sunfishcode on the topic of SIMD, and in particular the design of SIMD in WASM. The high-level summary was two points:

  1. The current breakdown (platform-specific intrinsics, a portable layer atop) is a good one.
  2. Since we modeled the SIMD crate on the JS SIMD designs, which are now being incorporated into WASM, it will align well with the WASM design. Code that can be expressed in terms of the simd crate will thus also be a good candidate for compiling to WASM.

Some other interesting points that he raised:

  1. SIMD has many audiences with diverse needs, and you can't necessarily accommodate them all very well with just one API:
  • codec authors want the raw intrinsics because they use them in clever and unexpected ways;
  • HPC people want higher-level abstractions but don't need access to every trick in the book;
  • those with extreme performance demands also want the raw intrinsics, because they don't mind investing the time to reshape the algorithm for each platform.
  2. One way to support these diverse needs, which has been considered for WASM, is to offer the "union" of features across platforms, but offer a way to query which features are "fast" (the idea is that the "slow" features will be emulated). In Rust I would expect we may want similar things, though perhaps the "slow" paths would just trap? (It's probably a bug if you actually wind up executing one of them.)
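
A rough sketch of the "query which features are fast" idea in the shape it eventually took in Rust with std::arch (runtime detection plus dispatch); the function names here are made up for the example and the scalar bodies stand in for real vector code:

#[cfg(target_arch = "x86_64")]
fn sum(v: &[f32]) -> f32 {
    // Query at runtime whether the "fast" path is available...
    if std::is_x86_feature_detected!("avx2") {
        // SAFETY: only called when AVX2 is actually present.
        unsafe { sum_avx2(v) }
    } else {
        // ...otherwise fall back to a portable implementation.
        v.iter().sum()
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(v: &[f32]) -> f32 {
    // A real implementation would use core::arch::x86_64 intrinsics here.
    v.iter().sum()
}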

@nikomatsakis
Contributor

On the topic of intrinsics, I feel overall pretty good about some kind of attribute that can be applied to a fn to indicate that the compiler should compile it directly to the corresponding instructions. Such functions would have to have appropriate argument/return types (roughly like today). If the argument/return types are not generic, this seems very harmless to me, as we can check it purely at the definition site (as @alexcrichton noted).

However, I feel mildly less good about the generic versions, since these cannot be checked until trans time, which means we have to face two annoying choices:

  • issue traps or compilation errors when an intrinsic is used incorrectly;
  • introduce more machinery into the compiler like special SIMD traits.

However, it does seem that there is a third way out: we could remove all support for generic intrinsics, and instead have people define their own traits that map to these operations. For example, today the simd crate does something roughly like this:

#[simd_intrinsic(...)]
fn simd_eq<T,U>(t: T, u: T) -> U;

unsafe trait Simd {
    type EqType;
}

fn generic_eq<T:Simd>(t: T, u: T) -> T::EqType {
    simd_eq(t, u)
}

unsafe impl Simd for u32x4 { ... } // etc

It seems like we could instead do:

trait Simd { // no longer an unsafe trait
    type EqType;

    // we now include a method for the various simd operations we might want to do:
    fn eq(x: &Self, y: &Self) -> Self::EqType;
    ...
}

#[simd_intrinsic]
fn eq_u32x4(x: u32x4, y: u32x4) -> boolx4 {...}

impl Simd for u32x4 {
    type EqType = boolx4;

    #[inline(always)]
    fn eq(x: &Self, y: &Self) -> Self::EqType {
         eq_u32x4(*x, *y)
    }
}

I'm probably getting some of the details wrong (I'd have to consult the crate for the precise names involved) but hopefully you get the idea. Basically, the compiler would only support monomorphic intrinsics, and the wrapper crate would add (using normal trait methods) any generic dispatch needed.

@ruuda
Contributor

ruuda commented Jun 17, 2016

The function would look like normal Rust and appear to recurse into itself, but the compiler would understand that direct calls to the function are actually implemented inline, so this isn't actually infinite recursion.

Is there a good reason for making the function recurse into itself? It seems like unnecessary repetition to me. Would a macro like intrinsic!(), similar to unreachable!(), be possible?

  • codec authors want the raw intrinsics because they use them in clever and unexpected ways;
  • HPC people want higher-level abstractions but don't need access to every trick in the book;
  • high-performance demands also require raw intrinsics, because they don't mind investing the time to reshape the algorithm for each platform.

I agree. This is one of the papercuts of the current state: most of the platform-specific intrinsics are there with their usual names, except for a few basic arithmetic operations, which are simd_add and such. I think it would be better to expose all of the raw platform intrinsics and build a higher-level cross-platform simd_add on top of that with #[cfg(target_feature)]. A crate like simd could build on top of that by providing fallback (e.g. two SSE adds if AVX is not available). It wouldn’t be generic, but does it need to be? I can’t think of a #[repr(simd)] type that is not just an n-tuple of the scalar type. And for the low-level intrinsics the types have little meaning anyway (e.g. _mm256_cmp_ps returns a vector of floats, but actually they are bitmasks).
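
A rough sketch of this suggestion, using the x86 intrinsic names as they later stabilized in core::arch: a cfg-selected 256-bit add, with two SSE adds as the fallback. The module and function names are illustrative only.

#[cfg(target_arch = "x86_64")]
mod cross_add {
    use core::arch::x86_64::*;

    // AVX available at compile time: one 256-bit add.
    #[cfg(target_feature = "avx")]
    pub unsafe fn add_f32x8(a: &[f32; 8], b: &[f32; 8], out: &mut [f32; 8]) {
        let r = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
        _mm256_storeu_ps(out.as_mut_ptr(), r);
    }

    // No AVX: emulate the 256-bit add with two 128-bit SSE adds.
    #[cfg(not(target_feature = "avx"))]
    pub unsafe fn add_f32x8(a: &[f32; 8], b: &[f32; 8], out: &mut [f32; 8]) {
        let lo = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        let hi = _mm_add_ps(_mm_loadu_ps(a.as_ptr().add(4)), _mm_loadu_ps(b.as_ptr().add(4)));
        _mm_storeu_ps(out.as_mut_ptr(), lo);
        _mm_storeu_ps(out.as_mut_ptr().add(4), hi);
    }
}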

@eddyb
Member

eddyb commented Jun 17, 2016

Is there a good reason for making the function recurse into itself?

Maybe it's contrived, but casting the function to a function pointer would naturally give you a pointer to a function which contains the intrinsic operation.

except for a few basic arithmetic operations, which are simd_add and such

There's a very good reason for keeping those that way: they're basic LLVM operations (i.e. simd_add is just the integer/float add you get for scalars, but with vector arguments) and LLVM can optimize them, unlike arbitrary intrinsics, which are function calls and get lowered in target codegen.

@ahicks92
Contributor

ahicks92 commented Oct 3, 2016

Can anyone provide an overview of the status of this? I was talking with someone whose GitHub name I don't know on IRC, and there was some indication that no one is handling further development of this feature. I have enough experience with X86 SIMD that I could probably help.

I like @nikomatsakis's approach, except that sometimes you need to be able to treat f32x4 as i32x4 or similar, at least on x86. This is because some of the shuffles aren't implemented for f32. If the compiler provides intrinsics for all possible vector types for this case, then it should be fine.
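
To make the f32x4-as-i32x4 point concrete, here is the reinterpretation expressed with the x86 intrinsics as they later stabilized in core::arch; the casts are free bit-reinterpretations, not conversions:

#[cfg(target_arch = "x86_64")]
unsafe fn swap_halves(v: core::arch::x86_64::__m128) -> core::arch::x86_64::__m128 {
    use core::arch::x86_64::*;
    let as_ints = _mm_castps_si128(v);                           // view f32x4 as i32x4 (no-op bitcast)
    let shuffled = _mm_shuffle_epi32::<0b01_00_11_10>(as_ints);  // integer shuffle: lanes [2, 3, 0, 1]
    _mm_castsi128_ps(shuffled)                                   // view the result as f32x4 again (no-op)
}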

One other possibility that comes to mind now that we're close to it is to finish type-level integers, then make generic intrinsics with declarations like this:

// Hypothetical syntax: type-level integers constrain T by its size and alignment.
fn simd_mul<T>(v1: T, v2: T) -> T
    where std::mem::size_of::<T>(): platform_simd_size,
          std::mem::align_of::<T>(): platform_simd_align
{
    // magic code
}

This of course depends on how close we are to having type-level integers, but it should be checkable well before trans in any sane implementation of type-level integers I can think of. Just a thought.

@eddyb
Member

eddyb commented Oct 3, 2016

This is because some of the shuffles aren't implemented for f32.

LLVM shuffles don't care what the element types are, and neither do the Rust intrinsics exposing them.

@ahicks92
Contributor

ahicks92 commented Oct 3, 2016

@eddyb
People were talking about exposing the platform intrinsics explicitly, which was my point here.

If you drop the cross-platform shuffles in favor of putting it all in a crate and also drop the weird semi-generic nature of the original RFC, this does indeed become a problem.

@nikomatsakis
Contributor

@camlorn afaik, nobody is carrying this forward, but I would very much like to see progress! I still basically stand by my previous comment, though I think @eddyb suggested (perhaps on IRC) the idea of applying the special attribute directly to the method in the impl, and that seems even better (perhaps just making it a lang item -- it would mean though that this lang item can be applied multiple times).

I have no objection to exposing the platform intrinsics explicitly, but it also doesn't seem like a required ingredient. It'd be great to make progress on the wrapper library, and adding in platform-specific names feels orthogonal to me. (Right? This is a bit out of cache.)

@nikomatsakis
Contributor

I'm not exactly sure what's the best next step. Perhaps a new RFC is warranted, just to lay out the plan clearly? At minimum some kind of canonical write-up feels appropriate. Hopefully the changes vis-a-vis today are relatively minimal.

@ahicks92
Contributor

ahicks92 commented Oct 4, 2016

@nikomatsakis
I like the idea of cross-platform intrinsics a great deal, and tbh I need to read the whole thread before I fully understand it.

It seems to me that you could provide only the platform specific intrinsics, get the optimizer doing a good job with eliminating temporary moves, get type-level integers, and then add a #[inline(force)] that libs can use to make the code efficient.

As I understand it, we almost have type-level integers. And @pcwalton is working on the needed optimizer stuff.

But that said, I have no problem with the original RFC. I started at the bottom of this thread and read up, however, and it seems to me that people are no longer convinced that this is a good way forward. Perhaps this impression will change once I read the whole thing.

@eddyb
Member

eddyb commented Nov 14, 2016

@BurntSushi I knew I saw something somewhere! See #27731 (comment) above.

github-actions bot pushed a commit to rust-lang/glacier that referenced this issue Nov 10, 2021
=== stdout ===
=== stderr ===
error[E0412]: cannot find type `U` in this scope
 --> /home/runner/work/glacier/glacier/ices/82926.rs:5:44
  |
5 |     fn simd_insert<T>(x: T, idx: u32, val: U) -> T;
  |                    -                       ^ help: a type parameter with a similar name exists: `T`
  |                    |
  |                    similarly named type parameter `T` defined here

error[E0658]: inline-const is experimental
 --> /home/runner/work/glacier/glacier/ices/82926.rs:9:5
  |
9 |     const { simd_insert(U, 1_u32, 42_f32) }
  |     ^^^^^
  |
  = note: see issue #76001 <rust-lang/rust#76001> for more information
  = help: add `#![feature(inline_const)]` to the crate attributes to enable

error[E0658]: platform intrinsics are experimental and possibly buggy
 --> /home/runner/work/glacier/glacier/ices/82926.rs:3:8
  |
3 | extern "platform-intrinsic" {
  |        ^^^^^^^^^^^^^^^^^^^^
  |
  = note: see issue #27731 <rust-lang/rust#27731> for more information
  = help: add `#![feature(platform_intrinsics)]` to the crate attributes to enable

warning: type `f32x3` should have an upper camel case name
 --> /home/runner/work/glacier/glacier/ices/82926.rs:2:8
  |
2 | struct f32x3(f32, f32, f32);
  |        ^^^^^ help: convert the identifier to upper camel case (notice the capitalization): `F32x3`
  |
  = note: `#[warn(non_camel_case_types)]` on by default

error[E0094]: intrinsic has wrong number of type parameters: found 1, expected 2
 --> /home/runner/work/glacier/glacier/ices/82926.rs:5:19
  |
5 |     fn simd_insert<T>(x: T, idx: u32, val: U) -> T;
  |                   ^^^ expected 2 type parameters

error[E0308]: mismatched types
 --> /home/runner/work/glacier/glacier/ices/82926.rs:9:25
  |
9 |     const { simd_insert(U, 1_u32, 42_f32) }
  |                         ^ expected `()`, found struct `f32x3`

error: aborting due to 5 previous errors; 1 warning emitted

Some errors have detailed explanations: E0094, E0308, E0412, E0658.
For more information about an error, try `rustc --explain E0094`.
==============
github-actions bot pushed a commit to rust-lang/glacier that referenced this issue Nov 10, 2021
=== stdout ===
=== stderr ===
error[E0412]: cannot find type `T` in this scope
 --> /home/runner/work/glacier/glacier/ices/83837.rs:5:23
  |
5 |     fn simd_insert(x: T, idx: u32, val: U);
  |                       ^ not found in this scope

error[E0412]: cannot find type `U` in this scope
 --> /home/runner/work/glacier/glacier/ices/83837.rs:5:41
  |
5 |     fn simd_insert(x: T, idx: u32, val: U);
  |                                         ^ not found in this scope

error[E0658]: inline-const is experimental
 --> /home/runner/work/glacier/glacier/ices/83837.rs:9:5
  |
9 |     const { simd_insert(U, 0x1319_8a2e, 42_u16) }
  |     ^^^^^
  |
  = note: see issue #76001 <rust-lang/rust#76001> for more information
  = help: add `#![feature(inline_const)]` to the crate attributes to enable

error[E0658]: platform intrinsics are experimental and possibly buggy
 --> /home/runner/work/glacier/glacier/ices/83837.rs:3:8
  |
3 | extern "platform-intrinsic" {
  |        ^^^^^^^^^^^^^^^^^^^^
  |
  = note: see issue #27731 <rust-lang/rust#27731> for more information
  = help: add `#![feature(platform_intrinsics)]` to the crate attributes to enable

warning: type `u16x2` should have an upper camel case name
 --> /home/runner/work/glacier/glacier/ices/83837.rs:2:8
  |
2 | struct u16x2(u16, u16);
  |        ^^^^^ help: convert the identifier to upper camel case (notice the capitalization): `U16x2`
  |
  = note: `#[warn(non_camel_case_types)]` on by default

error[E0094]: intrinsic has wrong number of type parameters: found 0, expected 2
 --> /home/runner/work/glacier/glacier/ices/83837.rs:5:19
  |
5 |     fn simd_insert(x: T, idx: u32, val: U);
  |                   ^ expected 2 type parameters

error: aborting due to 5 previous errors; 1 warning emitted

Some errors have detailed explanations: E0094, E0412, E0658.
For more information about an error, try `rustc --explain E0094`.
==============
@workingjubilee added the A-target-feature label Mar 3, 2023
@coastalwhite
Contributor

Sorry to bump this issue, but after reading the discussions here, I am not 100% clear on why the RISC-V intrinsics are tied to this issue. With the ARM, wasm32, x86 and x86_64 intrinsics being available on stable, is SIMD the blocker for the RISC-V intrinsics? Since the vector extension (as well as other extensions) is ratified and available through a target feature, is there a chance of merging those into a more stable release?

If so, I am willing to pick up this issue of getting them ready for a more stable release.

@Amanieu
Member

Amanieu commented Aug 4, 2023

I'm in the process of splitting all std::arch intrinsics into their own separate target features.

The main blocker for RISC-V vector intrinsics is support for scalable vectors, which is tracked in rust-lang/rfcs#3268.

@coastalwhite
Contributor

Would you be open to having me slowly include other extensions in the core::arch module? For example, the Zbb (basic bit-manipulation) and Zkn (NIST algorithm suite) extensions immediately come to mind as being very useful.

@Amanieu
Member

Amanieu commented Aug 4, 2023

Zbb doesn't need to be exposed as intrinsics; all of its functionality is already available as methods on the plain integer types.
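
To illustrate the point: built with -C target-feature=+zbb, these standard integer methods are expected to lower to the corresponding Zbb instructions (noted in the comments as likely lowerings, not guarantees), so no new vendor intrinsics are required.

fn zbb_style_ops(x: u32, y: u32) -> (u32, u32, u32, u32, u32) {
    (
        x.count_ones(),     // cpop
        x.trailing_zeros(), // ctz
        x.rotate_left(7),   // rol
        x.min(y),           // minu
        !x & y,             // andn
    )
}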

Crypto intrinsics are fine to add, though; please open a new tracking issue for them when you send a PR.

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 7, 2024
…acrum

Update stdarch submodule

Splits up rust-lang#27731 into multiple tracking issues.

Closes rust-lang#27731
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Jan 9, 2024
…ulacrum

Update stdarch submodule

Splits up rust-lang#27731 into multiple tracking issues.

Closes rust-lang#27731
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 9, 2024
…acrum

Update stdarch submodule

Splits up rust-lang#27731 into multiple tracking issues.

Closes rust-lang#27731
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 18, 2024
…acrum

Update stdarch submodule

Splits up rust-lang#27731 into multiple tracking issues.

Closes rust-lang#27731
@bors bors closed this as completed in ea37e80 Feb 5, 2024
@Lokathor
Contributor

Lokathor commented Feb 5, 2024

@workingjubilee bors has run rampant with power; this issue isn't done.

@CryZe
Contributor

CryZe commented Feb 5, 2024

The commit explicitly says that this issue got split into various smaller issues. The question is rather where are they?

alessandrod added a commit to alessandrod/aya that referenced this issue Feb 6, 2024
Hashbrown depends on ahash, which used to use feature(stdsimd), which has been removed in rust-lang/rust#27731.

Latest hashbrown bumps ahash which doesn't use the removed feature
anymore.
@ChrisDenton
Contributor

The commit explicitly says that this issue got split into various smaller issues. The question is rather where are they?

Presumably: https://github.com/rust-lang/rust/issues?q=is%3Aissue+is%3Aopen+label%3AA-simd+label%3AC-tracking-issue

@zachs18

This comment was marked as outdated.

@Amanieu
Member

Amanieu commented Feb 11, 2024

Here is the full set of new tracking issues for what stdsimd was previously tracking:
