Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing optimization around __rust_alloc and unknown functions if panic=unwind #46515

Closed
RalfJung opened this issue Dec 5, 2017 · 8 comments · Fixed by #92419 or #102099
Closed

Missing optimization around __rust_alloc and unknown functions if panic=unwind #46515

RalfJung opened this issue Dec 5, 2017 · 8 comments · Fixed by #92419 or #102099
Labels
A-allocators Area: Custom and system allocators C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@RalfJung
Copy link
Member

RalfJung commented Dec 5, 2017

The following code gets optimized as expected:

pub fn test() -> bool {
    let x = &*Box::new(0);
    let y = &*Box::new(0);
    
    if x as *const _ == y as *const _ {
        return true;
    }
    return false;
}

becomes

; example::test
; Function Attrs: nounwind uwtable
define zeroext i1 @example::test() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality !dbg !18 {
  ret i1 false, !dbg !22
}

However, if I let the code call an unknown function instead of return, the optimization disappears:

pub fn test(f: fn()) {
    let x = &*Box::new(0);
    let y = &*Box::new(0);
    
    if x as *const _ == y as *const _ {
        f();
    }
}

becomes

; example::test
; Function Attrs: uwtable
define void @example::test(void ()* nocapture nonnull %f) unnamed_addr #1 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality !dbg !36 {
  %err.i.i.i10 = alloca %"core::mem::ManuallyDrop<alloc::allocator::AllocErr>", align 8
  %_5.i.i.i.i11 = alloca %"alloc::allocator::AllocErr", align 8
  %err.i.i.i = alloca %"core::mem::ManuallyDrop<alloc::allocator::AllocErr>", align 8
  %_5.i.i.i.i = alloca %"alloc::allocator::AllocErr", align 8
  %_9.sroa.9.i.i = alloca [16 x i8], align 8
  %_9.sroa.9.0.sroa_idx9.i.i = getelementptr inbounds [16 x i8], [16 x i8]* %_9.sroa.9.i.i, i64 0, i64 0, !dbg !38
  call void @llvm.lifetime.start(i64 16, i8* nonnull %_9.sroa.9.0.sroa_idx9.i.i), !dbg !38
  %0 = getelementptr inbounds %"core::mem::ManuallyDrop<alloc::allocator::AllocErr>", %"core::mem::ManuallyDrop<alloc::allocator::AllocErr>"* %err.i.i.i, i64 0, i32 0, i64 0, !dbg !48
  call void @llvm.lifetime.start(i64 24, i8* nonnull %0) #6, !dbg !48, !noalias !52
  %1 = call i8* @__rust_alloc(i64 4, i64 4, i8* nonnull %0) #6, !dbg !55, !noalias !52
  %2 = icmp eq i8* %1, null, !dbg !57
  br i1 %2, label �3.i.i.i, label %"_ZN35_$LT$alloc..boxed..Box$LT$T$GT$$GT$3new17he6f20be442f24a38E.exit", !dbg !61

[snip]

That seems wrong, why should the optimization stop kicking in? The corresponding C code does not have any problem:

#include <memory>

void test(void (*f)()) {
    auto x = std::make_unique<int>(0);
    auto y = std::make_unique<int>(0);

    if (&*x == &*y) {
        f();
    }
}

becomes

; Function Attrs: uwtable
define void @test(void (*)())(void ()* nocapture) local_unnamed_addr #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) !dbg !1204 {
  tail call void @llvm.dbg.value(metadata void ()* %0, i64 0, metadata !1208, metadata !1211), !dbg !1212
  tail call void @llvm.dbg.value(metadata %"class.std::unique_ptr"* undef, i64 0, metadata !1209, metadata !1213), !dbg !1214
  tail call void @llvm.dbg.value(metadata %"class.std::unique_ptr"* undef, i64 0, metadata !1209, metadata !1213), !dbg !1214
  tail call void @llvm.dbg.value(metadata %"class.std::unique_ptr"* undef, i64 0, metadata !1210, metadata !1213), !dbg !1215
  tail call void @llvm.dbg.value(metadata %"class.std::unique_ptr"* undef, i64 0, metadata !1210, metadata !1213), !dbg !1215
  tail call void @llvm.dbg.value(metadata %"class.std::unique_ptr"* undef, i64 0, metadata !1209, metadata !1213), !dbg !1214
  ret void, !dbg !1216
}

(I briefly thought maybe unwinding is the problem, but C should have the same kind of unwinding here.)

@kennytm kennytm added A-allocators Area: Custom and system allocators C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Dec 5, 2017
@RalfJung
Copy link
Member Author

RalfJung commented Dec 5, 2017

@eddyb suggested I try -C panic=abort, and indeed, that makes all the difference. However, the optimization is sound even with unwinding panics, so it should happen either way.

@RalfJung RalfJung changed the title Missing optimization around __rust_alloc and unknown functions Missing optimization around __rust_alloc and unknown functions if panic=unwind Dec 5, 2017
@RalfJung
Copy link
Member Author

RalfJung commented Dec 10, 2017

Aha! I think I got it. In the IR for the Rust program, the unwinding branch calls @_ZN4core3ptr13drop_in_place17h91409ca23b53ad07E with the noinline attribute. So LLVM does not look through this call at all, and (I guess) it considers the pointer passed to it escaped. The pointer equality folding only happens if the pointer does not get escaped, so LLVM can guarantee that all comparisons of this pointer are folded consistently.

So, to fix this, we'd have to remove this noinline -- at least for Box<T> where T does not itself implement Drop.

RalfJung added a commit to RalfJung/rust that referenced this issue Dec 10, 2017
Box<T> drop glue is pretty simple and should not lead to blow-up problems.
Moreover, not inlining it leads to LLVM pessimizing some pointer-based
optimizations because it thinks the pointer passed to the drop glue could be
escaped.

Fixes rust-lang#46515.  However, this is not a great fix -- the problem still exists for
other smart pointers besides `Box`.
@SimonSapin
Copy link
Contributor

CC @nox, @rust-lang/wg-codegen

@nox
Copy link
Contributor

nox commented Apr 1, 2018

Could this be preventing other optimisations than pointer equality folding? Cc @rust-lang/compiler

@nagisa
Copy link
Member

nagisa commented Apr 1, 2018

@nox definitely. My guess is that alias analysis is being overly conservative here and is declaring pointers to be MayAlias, rather than NoAlias, which what they definitely are. This will defeat many optimisations that check for aliasing (pretty much everything).

@RalfJung
Copy link
Member Author

RalfJung commented Apr 2, 2018

However, pointer equality optimizations and aliasing information are (AFAIK) mostly unrelated -- in particular, optimizing equalities based on noalias is incorrect:

int foo(restrict int *x, restrict int *y) {
  // The restrict will compile to noalias in LLVM. However, C does not let you
  // optimize the following unless both x and y are actually dereferenced.
  (x == y) ? 1 : 0
}

@steveklabnik
Copy link
Member

Triage: this optimization still does not happen.

@eddyb
Copy link
Member

eddyb commented Mar 28, 2022

@nagisa @erikdesjardins Oh no, #95338 ended up triggering LLVM to close this again.

TaKO8Ki added a commit to TaKO8Ki/rust that referenced this issue Jun 27, 2023
…=nikic

Rebased: Mark drop calls in landing pads cold instead of noinline

I noticed that certain inlining optimizations were missing while staring at some compiled code output. I'd like to see this relanded, so I rebased the PR from ``@erikdesjardins`` (PR rust-lang#94823).

This PR reapplies rust-lang#92419, which was reverted in rust-lang#94402 due to rust-lang#94390.

Fixes rust-lang#46515, fixes rust-lang#87055.

Update: fixes rust-lang#97217.
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 2, 2023
…ikic

Rebased: Mark drop calls in landing pads cold instead of noinline

I noticed that certain inlining optimizations were missing while staring at some compiled code output. I'd like to see this relanded, so I rebased the PR from `@erikdesjardins` (PR rust-lang#94823).

This PR reapplies rust-lang#92419, which was reverted in rust-lang#94402 due to rust-lang#94390.

Fixes rust-lang#46515, fixes rust-lang#87055.

Update: fixes rust-lang#97217.
@bors bors closed this as completed in 2e5a9dd Oct 2, 2023
antoyo pushed a commit to rust-lang/rustc_codegen_gcc that referenced this issue Oct 8, 2023
Rebased: Mark drop calls in landing pads cold instead of noinline

I noticed that certain inlining optimizations were missing while staring at some compiled code output. I'd like to see this relanded, so I rebased the PR from `@erikdesjardins` (PR #94823).

This PR reapplies rust-lang/rust#92419, which was reverted in rust-lang/rust#94402 due to rust-lang/rust#94390.

Fixes rust-lang/rust#46515, fixes rust-lang/rust#87055.

Update: fixes #97217.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-allocators Area: Custom and system allocators C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
7 participants