Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics (in nightly) in Windows x64 #120820

Merged: 7 commits merged into rust-lang:master on Feb 29, 2024

@CKingX (Contributor) commented Feb 9, 2024

As Rust plans to set Windows 10 as the minimum supported OS for the x86_64-pc-windows-msvc target, I have added the cmpxchg16b and sse3 features. Windows 10 requires CMPXCHG16B, LAHF/SAHF, and PrefetchW, as stated in the Windows 10 Minimum Hardware Requirements. Furthermore, CPUs that meet these requirements also support SSE3 (see https://walbourn.github.io/directxmath-sse3-and-ssse3/).

…ers and Rust plans to set Windows 10 as the minimum supported OS for target x86_64-pc-windows-msvc, I have added the cmpxchg16b and sse3 feature (as CPUs that meet the Windows 10 64-bit requirement also support SSE3. See https://walbourn.github.io/directxmath-sse3-and-ssse3/ )
@rustbot (Collaborator) commented Feb 9, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @fmease (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review or S-waiting-on-author) stays up to date, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 9, 2024
@rustbot (Collaborator) commented Feb 9, 2024

These commits modify compiler targets.
(See the Target Tier Policy.)

@rust-log-analyzer

This comment has been minimized.

@CKingX (Contributor Author) commented Feb 9, 2024

Strange. rustc --print=target-features lists cmpxchg16b

@ChrisDenton

This comment was marked as outdated.

```diff
@@ -3,6 +3,7 @@ use crate::spec::{base, SanitizerSet, Target};
 pub fn target() -> Target {
     let mut base = base::windows_msvc::opts();
     base.cpu = "x86-64".into();
+    base.features = "+cmpxchg16b,+sse3".into();
```

Review comment (Contributor):

sahf too?

Reply (Contributor Author):

I'm not sure about that. From x86-64-v2 onwards, sahf doesn't show up as enabled even though it is listed in the features, whereas CMPXCHG16B and SSE3 do show up. Does that mean rustc doesn't make use of these instructions yet?

Reply (Contributor Author):

I used this command:

```
rustc --print=cfg -C target-cpu=x86-64-v2
debug_assertions
panic="unwind"
target_arch="x86_64"
target_endian="little"
target_env="msvc"
target_family="windows"
target_feature="cmpxchg16b"
target_feature="fxsr"
target_feature="popcnt"
target_feature="sse"
target_feature="sse2"
target_feature="sse3"
target_feature="sse4.1"
target_feature="sse4.2"
target_feature="ssse3"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_os="windows"
target_pointer_width="64"
target_vendor="pc"
windows
```

I did some more digging, and it turns out SAHF seems to be intended for kernel rather than userspace code, which is why it isn't listed in the output of the rustc command I ran above.
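As a rough illustration, the compile-time feature flags that `rustc --print=cfg` reports are also visible from Rust source through the `cfg!` macro. The sketch below queries a few of the features discussed in this thread; the function name `enabled_features` is hypothetical, and the answers depend on the target spec and on any `-C target-cpu` / `-C target-feature` flags, not on the CPU the program later runs on.

```rust
/// Hypothetical helper: reports which of the features discussed in this
/// thread are enabled for the target this code is compiled for.
/// `cfg!` is evaluated at compile time, so the answers reflect the target
/// spec plus any -C target-cpu / -C target-feature flags, not the host CPU.
fn enabled_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", cfg!(target_feature = "sse2")),
        ("sse3", cfg!(target_feature = "sse3")),
        ("cmpxchg16b", cfg!(target_feature = "cmpxchg16b")),
        ("sse4.2", cfg!(target_feature = "sse4.2")),
    ]
}
```

Building the same code with `-C target-cpu=x86-64-v2` should flip `sse3`, `cmpxchg16b` and `sse4.2` to `true`, in line with the `--print=cfg` output above.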

Reply (Contributor Author):

It also seems to be tied to virtualization and kernel space.

Reply (Contributor):

Ok, I've been researching this a bit more. On balance I do think we should add sahf. While I don't think it's that important, it was seen fit to be included in x86-64-v2 feature level and I think we should align ourselves with that (although obviously dropping the SSE level back down to 3).
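The reasoning here (take the x86-64-v2 level, but keep SSE at SSE3) can be written down as a set difference. This is a sketch using LLVM feature names; the v2 list follows the x86-64 psABI microarchitecture levels, and the constant and function names are hypothetical. The result matches the `+cx16,+sse3,+sahf` features the target spec ends up with later in this thread.

```rust
use std::collections::BTreeSet;

/// x86-64-v2 feature level (LLVM feature names), per the x86-64 psABI.
const X86_64_V2: [&str; 7] = ["cx16", "sahf", "popcnt", "sse3", "sse4.1", "sse4.2", "ssse3"];

/// v2 features that the Windows x64 baseline does not adopt
/// (SSE stays at level 3, and POPCNT is left out as well).
const NOT_ADOPTED: [&str; 4] = ["ssse3", "sse4.1", "sse4.2", "popcnt"];

/// Hypothetical helper: x86-64-v2 minus the features above.
fn windows_baseline() -> BTreeSet<&'static str> {
    let v2: BTreeSet<&'static str> = X86_64_V2.into_iter().collect();
    let dropped: BTreeSet<&'static str> = NOT_ADOPTED.into_iter().collect();
    // BTreeSet keeps the result sorted: {"cx16", "sahf", "sse3"}
    v2.difference(&dropped).copied().collect()
}
```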

Reply (Contributor Author):

done!

@ChrisDenton (Contributor) commented Feb 9, 2024

Hm, using cx16 appears to work but for some reason cmpxchg16b doesn't.

EDIT: Ah, to_llvm_features has a list of features that need translating from their Rust names to their LLVM names. This translation isn't applied to target specs, so you have to use the LLVM name manually.
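A minimal sketch of the kind of translation described here, assuming a hypothetical `to_llvm_name` helper that covers only the `cmpxchg16b` to `cx16` pair from this thread (rustc's real `to_llvm_features` handles more names). Because target specs bypass this step, their feature strings have to be written with LLVM names directly, which is the string this sketch produces.

```rust
/// Hypothetical helper: maps a rustc target-feature name to the name LLVM
/// expects. Only the pair discussed in this thread is listed.
fn to_llvm_name(rust_name: &str) -> &str {
    match rust_name {
        "cmpxchg16b" => "cx16",
        // Many features share the same spelling in rustc and LLVM.
        other => other,
    }
}

/// Builds an LLVM-style "+feat,+feat" string from rustc feature names.
fn llvm_feature_string(rust_features: &[&str]) -> String {
    rust_features
        .iter()
        .map(|f| format!("+{}", to_llvm_name(f)))
        .collect::<Vec<_>>()
        .join(",")
}
```

For example, `llvm_feature_string(&["cmpxchg16b", "sse3", "sahf"])` yields `"+cx16,+sse3,+sahf"`.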

Fixed a bug where adding CMPXCHG16B would fail due to different names in Rustc and LLVM
@CKingX (Contributor Author) commented Feb 9, 2024

Thank you! Fixed it

@CKingX (Contributor Author) commented Feb 9, 2024

Also, as CMPXCHG16B allows 128-bit (16-byte) atomics, I am creating a new commit to update the max atomic width as well. This is already done for the x86_64-apple-darwin target, where it states:
base.max_atomic_width = Some(128); // penryn supports cmpxchg16b
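To illustrate the difference between what the target promises and what the running CPU provides, here is a small sketch. `cfg!(target_has_atomic = "128")` is a compile-time check that, assuming rustc reports atomic widths up to the target's `max_atomic_width`, should become true for the Windows x64 targets after this change; `is_x86_feature_detected!("cmpxchg16b")` is a run-time check of the actual CPU. The function names are hypothetical. The `AtomicU128` type itself still requires nightly (`#![feature(integer_atomics)]`), which is presumably what "(in nightly)" in the title refers to.

```rust
/// Compile-time answer: does the build target advertise 128-bit atomics?
/// (Driven by the target's max_atomic_width, not by the host CPU.)
fn has_128bit_atomics_compile_time() -> bool {
    cfg!(target_has_atomic = "128")
}

/// Run-time answer: does the CPU we are running on support CMPXCHG16B?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_cmpxchg16b() -> bool {
    std::is_x86_feature_detected!("cmpxchg16b")
}

/// On non-x86 architectures the instruction does not exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_cmpxchg16b() -> bool {
    false
}
```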

As CMPXCHG16B is supported, I updated the max atomic width to 128-bits from 64-bits
@ChrisDenton (Contributor)

Hm, doesn't that also require avx for full atomics?

@CKingX (Contributor Author) commented Feb 9, 2024

The x86-64 Darwin (macOS) target does this without AVX, since its minimum requirement is Penryn (Core 2), which supports neither AVX nor AVX2.

@ChrisDenton (Contributor)

Fair enough. Maybe I was thinking of something else.

@mati865 (Contributor) commented Feb 9, 2024

Shouldn't this be set for all other x86 Windows targets as well?

@CKingX (Contributor Author) commented Feb 9, 2024

As for 32-bit Windows 10, it only requires SSE2, and that is already the requirement in rustc.

As for the 64-bit MinGW ABI, I could update it if it also requires Windows 10 or later. I looked at Rust's platform support list and didn't see a win7 version of the MinGW target, so I am unsure. However, if the -gnu targets have also dropped support, I can make a new commit.

@ChrisDenton (Contributor)

All x86_64-pc-windows targets with std support are raising their baseline to Windows 10. Also the uwp targets because they inherently don't support older Windows versions. The only exceptional one is the win7 target.

Updated x86_64-uwp-windows-gnu to use CMPXCHG16B and SSE3
@rustbot rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 9, 2024
@rustbot (Collaborator) commented Feb 9, 2024

There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.

You can start a rebase with the following commands:

$ # rebase
$ git rebase -i master
$ # delete any merge commits in the editor that appears
$ git push --force-with-lease

The following commits are merge commits:

@rustbot (Collaborator) commented Feb 9, 2024

There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.

You can start a rebase with the following commands:

$ # rebase
$ git rebase -i master
$ # delete any merge commits in the editor that appears
$ git push --force-with-lease

The following commits are merge commits (since this message was last posted):

@rustbot rustbot removed has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 9, 2024
@CKingX CKingX changed the title Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics to Windows x64 Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics in Windows x64 Feb 17, 2024
@CKingX CKingX changed the title Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics in Windows x64 Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics (in nightly) in Windows x64 Feb 17, 2024
CKingX added a commit to CKingX/rust that referenced this pull request Feb 18, 2024
…enkov

Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics (in nightly) in Windows x64

As Rust plans to set Windows 10 as the minimum supported OS for target x86_64-pc-windows-msvc, I have added the cmpxchg16b and sse3 feature. Windows 10 requires CMPXCHG16B, LAHF/SAHF, and PrefetchW as stated in the requirements [here](https://download.microsoft.com/download/c/1/5/c150e1ca-4a55-4a7e-94c5-bfc8c2e785c5/Windows 10 Minimum Hardware Requirements.pdf). Furthermore, CPUs that meet these requirements also have SSE3 ([see](https://walbourn.github.io/directxmath-sse3-and-ssse3/))
@bors (Contributor) commented Feb 19, 2024

⌛ Testing commit 376c7b9 with merge d6a09a7...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2024
@rust-log-analyzer

This comment has been minimized.

@bors (Contributor) commented Feb 19, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 19, 2024
@CKingX (Contributor Author) commented Feb 19, 2024

OK, so the test checks that enabling SSE4.2 also enables CRC32 on the base x86-64 CPU. Because we now pass in extra features, it fails.

I am going to look into how other targets handle this when they add feature flags.

@CKingX (Contributor Author) commented Feb 19, 2024

I did some digging, and it is not all good news. x86_64h-apple-darwin handles this by using core-avx2 as the target CPU and removing some features with the "-feature" syntax. The feature list is evaluated left to right, so "-feature,+feature" still ends up with the feature enabled when you add it back. The disadvantage of this approach is that it doesn't work if you change the target CPU (for example to native), because the removed features don't get added back. Other platforms, like x86-64 Fuchsia and Android, add features just fine.

That leaves updating the test to exclude the Windows targets, but I am unsure whether the test should be changed. At the very least, we should add another test for Windows x64.
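A simplified model of the left-to-right evaluation described above, assuming each `+name` / `-name` entry simply enables or disables one feature and later entries win. This is an illustration, not LLVM's implementation: real LLVM also applies implied features (e.g. `+sse4.2` pulling in `+crc32`), which this sketch ignores. The `resolve` function and the feature names used with it are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical model: start from the base CPU's features, then apply a
/// comma-separated "+feat,-feat" list in order, so later entries override
/// earlier ones. Implied features are deliberately not modeled.
fn resolve(base: &[&str], spec: &str) -> BTreeSet<String> {
    let mut enabled: BTreeSet<String> = base.iter().map(|s| s.to_string()).collect();
    for entry in spec.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        if let Some(name) = entry.strip_prefix('+') {
            enabled.insert(name.to_string());
        } else if let Some(name) = entry.strip_prefix('-') {
            enabled.remove(name);
        }
    }
    enabled
}
```

With this model, `resolve(&["avx2", "bmi2"], "-avx2,+avx2,-bmi2")` keeps `avx2` (re-added after removal) and drops `bmi2`.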

@CKingX (Contributor Author) commented Feb 19, 2024

The other option is setting the target CPU to nocona, but that might hurt optimization overall, because nocona is based on the Pentium 4 (far removed from modern CPUs compared with the Core through Haswell generations).

@erikdesjardins (Contributor)

From the test failure, it looks like that line changes to:

```
attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "target-cpu"="x86-64" "target-features"="+cx16,+sse3,+sahf,+sse4.2,+crc32" }
```

So it's fine to just change the test expectation to allow additional target features before sse4.2/crc32, like:

```
// CHECK: attributes #0 {{.*"target-features"=".*\+sse4.2,\+crc32"}}
```

since that upholds the spirit of the test (that adding sse4.2 also adds crc32).
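A rough Rust model of what the loosened CHECK line accepts: any features may come first, as long as the `target-features` value ends with `+sse4.2,+crc32`. FileCheck itself does regex matching; this sketch uses plain string methods instead, and the function names are hypothetical.

```rust
/// Extracts the value of `"target-features"="..."` from an LLVM IR
/// attribute line, if present.
fn target_features(attr_line: &str) -> Option<&str> {
    let key = "\"target-features\"=\"";
    let start = attr_line.find(key)? + key.len();
    let len = attr_line[start..].find('"')?;
    Some(&attr_line[start..start + len])
}

/// Mirrors the spirit of the loosened CHECK: extra features are allowed
/// before the final "+sse4.2,+crc32" pair.
fn upholds_test(attr_line: &str) -> bool {
    target_features(attr_line).map_or(false, |f| f.ends_with("+sse4.2,+crc32"))
}
```

Both the old expectation (`"+sse4.2,+crc32"`) and the new Windows output (`"+cx16,+sse3,+sahf,+sse4.2,+crc32"`) pass this check, while a list that enables sse4.2 without crc32 does not.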

@CKingX (Contributor Author) commented Feb 20, 2024

Thanks erikdesjardins! Updating the test was a lot simpler than I expected.

@CKingX (Contributor Author) commented Feb 27, 2024

Just a heads-up: the updated test just needs to be reviewed.

@ChrisDenton (Contributor)

lgtm, thanks!

@bors r=petrochenkov,ChrisDenton

@bors (Contributor) commented Feb 29, 2024

📌 Commit 2d25c3b has been approved by petrochenkov,ChrisDenton

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 29, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 29, 2024
…llaumeGomez

Rollup of 7 pull requests

Successful merges:

 - rust-lang#119748 (Increase visibility of `join_path` and `split_paths`)
 - rust-lang#120820 (Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics (in nightly) in Windows x64)
 - rust-lang#121000 (pattern_analysis: rework how we hide empty private fields)
 - rust-lang#121376 (Skip unnecessary comparison with half-open range patterns)
 - rust-lang#121596 (Use volatile access instead of `#[used]` for `on_tls_callback`)
 - rust-lang#121669 (Count stashed errors again)
 - rust-lang#121783 (Emitter cleanups)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 36bd9ef into rust-lang:master Feb 29, 2024
11 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Feb 29, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 29, 2024
Rollup merge of rust-lang#120820 - CKingX:cpu-base-minimum, r=petrochenkov,ChrisDenton

@CKingX (Contributor Author) commented Feb 29, 2024

Thank you Chris, erikdesjardins, CryZe and petrochenkov!

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request May 4, 2024
Pkgsrc changes:
 * Adapt checksums and patches, some have been integrated upstream.

Upstream changes:

Version 1.78.0 (2024-05-02)
===========================

Language
--------
- [Stabilize `#[cfg(target_abi = ...)]`]
  (rust-lang/rust#119590)
- [Stabilize the `#[diagnostic]` namespace and
  `#[diagnostic::on_unimplemented]` attribute]
  (rust-lang/rust#119888)
- [Make async-fn-in-trait implementable with concrete signatures]
  (rust-lang/rust#120103)
- [Make matching on NaN a hard error, and remove the rest of
  `illegal_floating_point_literal_pattern`]
  (rust-lang/rust#116284)
- [static mut: allow mutable reference to arbitrary types, not just
  slices and arrays]
  (rust-lang/rust#117614)
- [Extend `invalid_reference_casting` to include references casting
  to bigger memory layout]
  (rust-lang/rust#118983)
- [Add `non_contiguous_range_endpoints` lint for singleton gaps
  after exclusive ranges]
  (rust-lang/rust#118879)
- [Add `wasm_c_abi` lint for use of older wasm-bindgen versions]
  (rust-lang/rust#117918)
  This lint currently only works when using Cargo.
- [Update `indirect_structural_match` and `pointer_structural_match`
  lints to match RFC]
  (rust-lang/rust#120423)
- [Make non-`PartialEq`-typed consts as patterns a hard error]
  (rust-lang/rust#120805)
- [Split `refining_impl_trait` lint into `_reachable`, `_internal` variants]
  (rust-lang/rust#121720)
- [Remove unnecessary type inference when using associated types
  inside of higher ranked `where`-bounds]
  (rust-lang/rust#119849)
- [Weaken eager detection of cyclic types during type inference]
  (rust-lang/rust#119989)
- [`trait Trait: Auto {}`: allow upcasting from `dyn Trait` to `dyn Auto`]
  (rust-lang/rust#119338)

Compiler
--------

- [Made `INVALID_DOC_ATTRIBUTES` lint deny by default]
  (rust-lang/rust#111505)
- [Increase accuracy of redundant `use` checking]
  (rust-lang/rust#117772)
- [Suggest moving definition if non-found macro_rules! is defined later]
  (rust-lang/rust#121130)
- [Lower transmutes from int to pointer type as gep on null]
  (rust-lang/rust#121282)

Target changes:

- [Windows tier 1 targets now require at least Windows 10]
  (rust-lang/rust#115141)
- [Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics in tier 1 Windows]
  (rust-lang/rust#120820)
- [Add `wasm32-wasip1` tier 2 (without host tools) target]
  (rust-lang/rust#120468)
- [Add `wasm32-wasip2` tier 3 target]
  (rust-lang/rust#119616)
- [Rename `wasm32-wasi-preview1-threads` to `wasm32-wasip1-threads`]
  (rust-lang/rust#122170)
- [Add `arm64ec-pc-windows-msvc` tier 3 target]
  (rust-lang/rust#119199)
- [Add `armv8r-none-eabihf` tier 3 target for the Cortex-R52]
  (rust-lang/rust#110482)
- [Add `loongarch64-unknown-linux-musl` tier 3 target]
  (rust-lang/rust#121832)

Refer to Rust's [platform support page][platform-support-doc]
for more information on Rust's tiered platform support.

Libraries
---------

- [Bump Unicode to version 15.1.0, regenerate tables]
  (rust-lang/rust#120777)
- [Make align_offset, align_to well-behaved in all cases]
  (rust-lang/rust#121201)
- [PartialEq, PartialOrd: document expectations for transitive chains]
  (rust-lang/rust#115386)
- [Optimize away poison guards when std is built with panic=abort]
  (rust-lang/rust#100603)
- [Replace pthread `RwLock` with custom implementation]
  (rust-lang/rust#110211)
- [Implement unwind safety for Condvar on all platforms]
  (rust-lang/rust#121768)
- [Add ASCII fast-path for `char::is_grapheme_extended`]
  (rust-lang/rust#121138)

Stabilized APIs
---------------

- [`impl Read for &Stdin`]
  (https://doc.rust-lang.org/stable/std/io/struct.Stdin.html#impl-Read-for-&Stdin)
- [Accept non `'static` lifetimes for several `std::error::Error`
  related implementations] (rust-lang/rust#113833)
- [Make `impl<Fd: AsFd>` impl take `?Sized`]
  (rust-lang/rust#114655)
- [`impl From<TryReserveError> for io::Error`]
  (https://doc.rust-lang.org/stable/std/io/struct.Error.html#impl-From-for-Error)

These APIs are now stable in const contexts:

- [`Barrier::new()`]
  (https://doc.rust-lang.org/stable/std/sync/struct.Barrier.html#method.new)

Cargo
-----

- [Stabilize lockfile v4](rust-lang/cargo#12852)
- [Respect `rust-version` when generating lockfile]
  (rust-lang/cargo#12861)
- [Control `--charset` via auto-detecting config value]
  (rust-lang/cargo#13337)
- [Support `target.<triple>.rustdocflags` officially]
  (rust-lang/cargo#13197)
- [Stabilize global cache data tracking]
  (rust-lang/cargo#13492)

Misc
----

- [rustdoc: add `--test-builder-wrapper` arg to support wrappers
  such as RUSTC_WRAPPER when building doctests]
  (rust-lang/rust#114651)

Compatibility Notes
-------------------

- [Many unsafe precondition checks now run for user code with debug
  assertions enabled] (rust-lang/rust#120863)
  This change helps users catch undefined behavior in their code,
  though the details of how much is checked are generally not
  stable.
- [riscv only supports split_debuginfo=off for now]
  (rust-lang/rust#120518)
- [Consistently check bounds on hidden types of `impl Trait`]
  (rust-lang/rust#121679)
- [Change equality of higher ranked types to not rely on subtyping]
  (rust-lang/rust#118247)
- [When called, additionally check bounds on normalized function return type]
  (rust-lang/rust#118882)
- [Expand coverage for `arithmetic_overflow` lint]
  (rust-lang/rust#119432)

Internal Changes
----------------

These changes do not affect any public interfaces of Rust, but they represent
significant improvements to the performance or internals of rustc and related
tools.

- [Update to LLVM 18](rust-lang/rust#120055)
- [Build `rustc` with 1CGU on `x86_64-pc-windows-msvc`]
  (rust-lang/rust#112267)
- [Build `rustc` with 1CGU on `x86_64-apple-darwin`]
  (rust-lang/rust#112268)
- [Introduce `run-make` V2 infrastructure, a `run_make_support`
  library and port over 2 tests as example]
  (rust-lang/rust#113026)
- [Windows: Implement condvar, mutex and rwlock using futex]
  (rust-lang/rust#121956)