Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ASCII fast-path for char::is_grapheme_extended #121138

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

Swatinem
Copy link
Contributor

I discovered that impl Debug for str is quite slow because it ends up doing a unicode_data::grapheme_extend::lookup for each char, which ends up doing a binary search.

This introduces a fast-path for ASCII chars which do not have this property.

The lookup is thus completely gone from profiles.


As a followup, maybe it’s worth implementing this fast path directly in unicode_data so that it can check for the lower bound directly before going to a potentially expensive binary search.

I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search.

This introduces a fast-path for ASCII chars which do not have this property.

The `lookup` is thus completely gone from profiles.
@rustbot
Copy link
Collaborator

rustbot commented Feb 15, 2024

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 15, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 18, 2024
Add a fast-path to `Debug` ASCII `&str`

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.

---

This is an alternative / a companion to rust-lang#121138.

The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
@Swatinem
Copy link
Contributor Author

@the8472 , may I also ask you for a perf run here? If this PR shown a similar improvement to #121150, it might be a good idea to merge this instead / additionally, as this is a much smaller patch that should be less controversial.

@the8472
Copy link
Member

the8472 commented Feb 20, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 20, 2024
@bors
Copy link
Contributor

bors commented Feb 20, 2024

⌛ Trying commit 8eaaa6e with merge 2d650c6...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 20, 2024
Add ASCII fast-path for `char::is_grapheme_extended`

I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search.

This introduces a fast-path for ASCII chars which do not have this property.

The `lookup` is thus completely gone from profiles.

---

As a followup, maybe it’s worth implementing this fast path directly in `unicode_data` so that it can check for the lower bound directly before going to a potentially expensive binary search.
@bors
Copy link
Contributor

bors commented Feb 20, 2024

☀️ Try build successful - checks-actions
Build commit: 2d650c6 (2d650c640120d7f8f508691f3b4068f3eeeaefa9)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (2d650c6): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.7% [-2.7%, -2.7%] 1
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.1% [2.1%, 2.1%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.0% [-2.0%, -2.0%] 1
Improvements ✅
(secondary)
-2.1% [-2.1%, -2.1%] 1
All ❌✅ (primary) 0.0% [-2.0%, 2.1%] 2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.7% [-1.7%, -1.7%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.7% [-1.7%, -1.7%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 639.883s -> 641.472s (0.25%)
Artifact size: 308.63 MiB -> 308.62 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 20, 2024
@Swatinem
Copy link
Contributor Author

-54.08% on the fmt-debug-derive runtime benchmark. That is a very nice win for this oneliner.
#121150 in contrast achieved -66.71% on that benchmark (even more if you select "wall time")

@cuviper
Copy link
Member

cuviper commented Mar 4, 2024

There's plenty of precedent for ASCII shortcuts in other methods, like is_numeric.
We might still consider #121150 as well, but let's land this first as the simpler case.

@bors r

@bors
Copy link
Contributor

bors commented Mar 4, 2024

📌 Commit 8eaaa6e has been approved by cuviper

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 4, 2024
@@ -927,7 927,7 @@ impl char {
#[must_use]
#[inline]
pub(crate) fn is_grapheme_extended(self) -> bool {
unicode::Grapheme_Extend(self)
self > '\x7f' && unicode::Grapheme_Extend(self)
Copy link
Contributor

@tgross35 tgross35 Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to instead update unicode-table-generator so this becomes a part of Grapheme_Extend (unicode::unicode_data::grapheme_extended::lookup)? So as to help anywhere else this call might wind up getting used

Doesn't have to be part of this PR, maybe later refactoring

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should have a const ASCII_MAX: char = '\x7f' somewhere, maybe even public. We throw that magic number around a lot

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generator doesn't know specifics about particular properties yet, AFAICS, but I could imagine it could generically optimize lookups outside of a property's min/max.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually did this in the unicode-table-generator (and fun fact: this was the only properly with a min > ASCII). Let me open that up as a PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you go: #122013

@bors
Copy link
Contributor

bors commented Mar 5, 2024

⌛ Testing commit 8eaaa6e with merge bdde2a8...

@tgross35
Copy link
Contributor

tgross35 commented Mar 5, 2024

@bors r-

No need to spend ci time on a rollup=never if #122013 will supersede it tomorrow :)

edit: fudge

@bors
Copy link
Contributor

bors commented Mar 5, 2024

@tgross35: 🔑 Insufficient privileges: Not in reviewers

@bors
Copy link
Contributor

bors commented Mar 5, 2024

☀️ Test successful - checks-actions
Approved by: cuviper
Pushing bdde2a8 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 5, 2024
@bors bors merged commit bdde2a8 into rust-lang:master Mar 5, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Mar 5, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (bdde2a8): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
4.0% [4.0%, 4.0%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.0% [-3.7%, -2.4%] 3
All ❌✅ (primary) 4.0% [4.0%, 4.0%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 642.994s -> 643.644s (0.10%)
Artifact size: 175.03 MiB -> 175.01 MiB (-0.01%)

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 6, 2024
Add a fast-path to `Debug` ASCII `&str`

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.

---

This is an alternative / a companion to rust-lang#121138.

The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
@Swatinem Swatinem deleted the grapheme-extend-ascii branch March 12, 2024 10:36
wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request May 4, 2024
Pkgsrc changes:
 * Adapt checksums and patches, some have beene intregrated upstream.

Upstream chnages:

Version 1.78.0 (2024-05-02)
===========================

Language
--------
- [Stabilize `#[cfg(target_abi = ...)]`]
  (rust-lang/rust#119590)
- [Stabilize the `#[diagnostic]` namespace and
  `#[diagnostic::on_unimplemented]` attribute]
  (rust-lang/rust#119888)
- [Make async-fn-in-trait implementable with concrete signatures]
  (rust-lang/rust#120103)
- [Make matching on NaN a hard error, and remove the rest of
  `illegal_floating_point_literal_pattern`]
  (rust-lang/rust#116284)
- [static mut: allow mutable reference to arbitrary types, not just
  slices and arrays]
  (rust-lang/rust#117614)
- [Extend `invalid_reference_casting` to include references casting
  to bigger memory layout]
  (rust-lang/rust#118983)
- [Add `non_contiguous_range_endpoints` lint for singleton gaps
  after exclusive ranges]
  (rust-lang/rust#118879)
- [Add `wasm_c_abi` lint for use of older wasm-bindgen versions]
  (rust-lang/rust#117918)
  This lint currently only works when using Cargo.
- [Update `indirect_structural_match` and `pointer_structural_match`
  lints to match RFC]
  (rust-lang/rust#120423)
- [Make non-`PartialEq`-typed consts as patterns a hard error]
  (rust-lang/rust#120805)
- [Split `refining_impl_trait` lint into `_reachable`, `_internal` variants]
  (rust-lang/rust#121720)
- [Remove unnecessary type inference when using associated types
  inside of higher ranked `where`-bounds]
  (rust-lang/rust#119849)
- [Weaken eager detection of cyclic types during type inference]
  (rust-lang/rust#119989)
- [`trait Trait: Auto {}`: allow upcasting from `dyn Trait` to `dyn Auto`]
  (rust-lang/rust#119338)

Compiler
--------

- [Made `INVALID_DOC_ATTRIBUTES` lint deny by default]
  (rust-lang/rust#111505)
- [Increase accuracy of redundant `use` checking]
  (rust-lang/rust#117772)
- [Suggest moving definition if non-found macro_rules! is defined later]
  (rust-lang/rust#121130)
- [Lower transmutes from int to pointer type as gep on null]
  (rust-lang/rust#121282)

Target changes:

- [Windows tier 1 targets now require at least Windows 10]
  (rust-lang/rust#115141)
 - [Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics in tier 1 Windows]
  (rust-lang/rust#120820)
- [Add `wasm32-wasip1` tier 2 (without host tools) target]
  (rust-lang/rust#120468)
- [Add `wasm32-wasip2` tier 3 target]
  (rust-lang/rust#119616)
- [Rename `wasm32-wasi-preview1-threads` to `wasm32-wasip1-threads`]
  (rust-lang/rust#122170)
- [Add `arm64ec-pc-windows-msvc` tier 3 target]
  (rust-lang/rust#119199)
- [Add `armv8r-none-eabihf` tier 3 target for the Cortex-R52]
  (rust-lang/rust#110482)
- [Add `loongarch64-unknown-linux-musl` tier 3 target]
  (rust-lang/rust#121832)

Refer to Rust's [platform support page][platform-support-doc]
for more information on Rust's tiered platform support.

Libraries
---------

- [Bump Unicode to version 15.1.0, regenerate tables]
  (rust-lang/rust#120777)
- [Make align_offset, align_to well-behaved in all cases]
  (rust-lang/rust#121201)
- [PartialEq, PartialOrd: document expectations for transitive chains]
  (rust-lang/rust#115386)
- [Optimize away poison guards when std is built with panic=abort]
  (rust-lang/rust#100603)
- [Replace pthread `RwLock` with custom implementation]
  (rust-lang/rust#110211)
- [Implement unwind safety for Condvar on all platforms]
  (rust-lang/rust#121768)
- [Add ASCII fast-path for `char::is_grapheme_extended`]
  (rust-lang/rust#121138)

Stabilized APIs
---------------

- [`impl Read for &Stdin`]
  (https://doc.rust-lang.org/stable/std/io/struct.Stdin.html#impl-Read-for-&Stdin)
- [Accept non `'static` lifetimes for several `std::error::Error`
  related implementations] (rust-lang/rust#113833)
- [Make `impl<Fd: AsFd>` impl take `?Sized`]
  (rust-lang/rust#114655)
- [`impl From<TryReserveError> for io::Error`]
  (https://doc.rust-lang.org/stable/std/io/struct.Error.html#impl-From-for-Error)

These APIs are now stable in const contexts:

- [`Barrier::new()`]
  (https://doc.rust-lang.org/stable/std/sync/struct.Barrier.html#method.new)

Cargo
-----

- [Stabilize lockfile v4](rust-lang/cargo#12852)
- [Respect `rust-version` when generating lockfile]
  (rust-lang/cargo#12861)
- [Control `--charset` via auto-detecting config value]
  (rust-lang/cargo#13337)
- [Support `target.<triple>.rustdocflags` officially]
  (rust-lang/cargo#13197)
- [Stabilize global cache data tracking]
  (rust-lang/cargo#13492)

Misc
----

- [rustdoc: add `--test-builder-wrapper` arg to support wrappers
  such as RUSTC_WRAPPER when building doctests]
  (rust-lang/rust#114651)

Compatibility Notes
-------------------

- [Many unsafe precondition checks now run for user code with debug
  assertions enabled] (rust-lang/rust#120863)
  This change helps users catch undefined behavior in their code,
  though the details of how much is checked are generally not
  stable.
- [riscv only supports split_debuginfo=off for now]
  (rust-lang/rust#120518)
- [Consistently check bounds on hidden types of `impl Trait`]
  (rust-lang/rust#121679)
- [Change equality of higher ranked types to not rely on subtyping]
  (rust-lang/rust#118247)
- [When called, additionally check bounds on normalized function return type]
  (rust-lang/rust#118882)
- [Expand coverage for `arithmetic_overflow` lint]
  (rust-lang/rust#119432)

Internal Changes
----------------

These changes do not affect any public interfaces of Rust, but they represent
significant improvements to the performance or internals of rustc and related
tools.

- [Update to LLVM 18](rust-lang/rust#120055)
- [Build `rustc` with 1CGU on `x86_64-pc-windows-msvc`]
  (rust-lang/rust#112267)
- [Build `rustc` with 1CGU on `x86_64-apple-darwin`]
  (rust-lang/rust#112268)
- [Introduce `run-make` V2 infrastructure, a `run_make_support`
  library and port over 2 tests as example]
  (rust-lang/rust#113026)
- [Windows: Implement condvar, mutex and rwlock using futex]
  (rust-lang/rust#121956)
bors added a commit to rust-lang-ci/rust that referenced this pull request May 24, 2024
Add a fast-path to `Debug` ASCII `&str`

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.

---

This is an alternative / a companion to rust-lang#121138.

The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request May 25, 2024
Add a fast-path to `Debug` ASCII `&str`

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.

---

This is an alternative / a companion to rust-lang/rust#121138.

The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request Jun 28, 2024
Add a fast-path to `Debug` ASCII `&str`

Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.

---

This is an alternative / a companion to rust-lang/rust#121138.

The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants