Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite liveness analysis to be based on MIR #51003

Open
nikomatsakis opened this issue May 23, 2018 · 15 comments
Open

rewrite liveness analysis to be based on MIR #51003

nikomatsakis opened this issue May 23, 2018 · 15 comments
Assignees
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

The current liveness code does a simple liveness computation (actually a few such things) and tells you when e.g. assignments are dead and that sort of thing. It does this on the HIR. It would be better to do this on the MIR — in fact, the NLL computation is already computing liveness across all of MIR, so we ought to be able to piggy back on those results I imagine.

It may be a good idea to wait though until the MIR borrowck stuff "settles down" a bit before attempting this.

@nikomatsakis nikomatsakis added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll labels May 23, 2018
@nikomatsakis nikomatsakis added C-cleanup Category: PRs that clean code up or issues documenting cleanup. and removed WG-compiler-nll labels Jul 3, 2018
@nikomatsakis
Copy link
Contributor Author

I'm removing WG-compiler-nll as this isn't really related to NLL, it's just general cleanup. Here are some tips into the code for future reference:

The liveness code I am referring to, which ought to be ported:

pub fn check_crate<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>) {

The MIR-based liveness computation:

pub fn liveness_of_locals<'tcx>(mir: &Mir<'tcx>, mode: LivenessMode) -> LivenessResult {

@cjgillot
Copy link
Contributor

Is this issue still relevant? Can I pick it up?

@matthewjasper
Copy link
Contributor

It's still relevant. Feel free to ping me with questions.

@jonas-schievink jonas-schievink added the A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html label Apr 10, 2020
@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented May 10, 2020

I looked into this this morning. A few roadblocks remain.

For one, the MIR has no record of statements like let _ = x even before SimplifyCfg. More complex expressions seem to be lowered even when they are assigned to _, so I think the solution is to look for an assignment of an ExprKind::VarRef specifically and lower it to a FakeRead of some kind.

There's also the issue of associating Locals with their HirIds for diagnostics. This was attempted in #51275, but it caused issues with incremental compilation. That PR worked around the issue by computing the spans needed for MIR borrowck errors ahead-of-time. I think a better solution would be to stop including VarBindingInfo in the StableHash of a MIR body. I don't fully understand the implications of ignoring VarBindingInfo for the purposes of incremental. I think it's okay as long as we don't use the HirId to determine whether to emit a lint/error, only to associate it with a Span. This suggests that we need to precompute the level for the "unused variables" lint. Alternatively, we could duplicate the approach in #51275, but that won't scale as more checking is moved to the MIR.

Finally, I was seeing spurious warnings for variables that were bound in a match arm and only used in a guard. Presumably this will also require some changes to MIR building, but I've not investigated this one yet.

Since I don't have a good intuition for incremental compilation or MIR building, I would appreciate any advice on solving either of these problems. @pnkfelix, this was a while ago, but did you consider ignoring VarBindingInfo during stable hashing in #51275?

@matthewjasper
Copy link
Contributor

The only reason Spans work while HirIds don't is because the incremental tests make stable hashing ignore spans. I think just using HirIds and clearing the VarBindingInfo after borrow checking so that it only affects unoptimized MIR would be fine.

@nikomatsakis
Copy link
Contributor Author

I think this agrees with what @matthewjasper said, but I want to encourage us not to do anything "clever" around incremental -- that is to say, we should not try to tweak hashing schemes or something (imo), we should hash all the data. If we don't want something to be hashed, the right way is to remove it from the struct itself (which should then prove we don't have a dependency on it).

That said, when it comes to spans, we've been talking for some time about a better scheme to handle spans and incremental compilation (see e.g. #47389), which seems related.

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented May 11, 2020

@nikomatsakis FWIW, my plan was to encapsulate this so that it would be clear to both contributors and reviewers when this HirId was used for something besides diagnostic spans or suggestions.

Regardless, switching VarBindingForm to be a single HirId seems to be somewhat contentious, and doing so is not necessary to fix this issue since we can encode the spans ahead of time like #51275 did. I'll open a separate issue.

@camsteffen
Copy link
Contributor

Is this something I could contribute to with some mentorship?

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Jan 25, 2022

I can answer specific questions on anything MIR related, but for the incremental comp stuff we discussed above you'll need to look elsewhere. I never found a solution I was satisfied with.

The other big stumbling block I remember was async fns, since the MIR we generate for them is more disconnected from the HIR than for ordinary fns; some things that appear as used in the HIR appear as unused in the MIR and vice versa. I don't think I tried particularly hard to solve this in #72164, so it's possible you can do better with not much effort.

I was seeing spurious warnings for variables that were bound in a match arm and only used in a guard. Presumably this will also require some changes to MIR building, but I've not investigated this one yet.

the MIR has no record of statements like let _ = x even before SimplifyCfg. More complex expressions seem to be lowered even when they are assigned to _,

These two issues are solved in #72164. See this comment for a discussion of the first.

It's also worth mentioning that with #91032, there's another big HIR dataflow analysis in addition to liveness. Going from 2 big analyses to 1 might be a less attractive proposition than going from 1 to 0.

@nikomatsakis
Copy link
Contributor Author

I'm of mixed minds here. I sometimes think that we should do our safety analyses on "THIR" instead of "MIR". MIR currently serves two masters, analysis and optimization -- I think that produces more tension than I anticipated initially.

@camsteffen
Copy link
Contributor

For some context I was hoping to work towards #65467.

@camsteffen
Copy link
Contributor

I'm of mixed minds here. I sometimes think that we should do our safety analyses on "THIR" instead of "MIR". MIR currently serves two masters, analysis and optimization -- I think that produces more tension than I anticipated initially.

I would think that the decision between MIR or THIR (or HIR) for analyses should be based on which one would allow the simplest (or cheapest?) implementation, and that any problems caused with incremental can be solved architecturally. It seems like THIR would allow a simpler implementation than the HIR (especially for #65467), but I don't have enough experience with MIR to comment on how that would compare.

My understanding of the "tension" is this: The MIR has insufficient information about the original code to do some analyses/diagnostics. This is fixed by copying more data from HIR to MIR. But that added data tends to break incremental compilation. So then we need to separate "data needed for codegen" from "data needed for analyses" and not hash the latter. This seems resolvable to me in my limited understanding. Though I wonder if the mere size of the "analyses data" needed should be enough to deter us from using the MIR?

@nikomatsakis
Copy link
Contributor Author

I don't really think this is about incremental compilation, @camsteffen. It's more about what invariants the MIR code must maintain. The simplest example is stripping dead code: it's convenient for MIR to not contain unreachable code, but that also means that any analysis will disregard such code. How smart can MIR be in removing dead code? Can it do CSE? Constant propagation? etc

@camsteffen
Copy link
Contributor

Aren't those all optimizations that happen after lowering, and so analyses can happen in between? (thanks for the discussion)

@bjorn3
Copy link
Member

bjorn3 commented Jan 31, 2022

I believe something like let _ = *foo; results in no MIR at all even before optimizations. Doing the equivalent of let tmp = *foo; let _ = tmp; wouldn't be possible as that would move out of *foo, while let _ = *foo; doesn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants