Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Add stale PR action #36336

Merged
merged 10 commits into from
Sep 15, 2020
Merged

CI: Add stale PR action #36336

merged 10 commits into from
Sep 15, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Sep 13, 2020

Adding a GitHub Action for closing stale PRs labeling PRs as stale which I think could help in managing the backlog

https://github.com/actions/stale

@dsaxton dsaxton added the CI Continuous Integration label Sep 13, 2020
@jbrockmendel
Copy link
Member

neat idea

@dsaxton
Copy link
Member Author

dsaxton commented Sep 13, 2020

Might want to make exemptions for works in progress or PRs awaiting review, seems both are possible with this action

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just close PRs and not issues?

looking at the options https://github.com/actions/stale/blob/main/action.yml there is a debug option and rate limit, maybe try with these.

repo-token: ${{ secrets.GITHUB_TOKEN }}
stale-pr-message: "This pull request is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days."
days-before-stale: 30
days-before-close: 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we set this to -1 to start, so that issues and PRs are just labelled in the first instance and not closed?

@jreback
Copy link
Contributor

jreback commented Sep 13, 2020

yeah let's just label to start

@dsaxton
Copy link
Member Author

dsaxton commented Sep 13, 2020

can we just close PRs and not issues?

Yes, I believe if we don't specify a stale-issue-message then it won't do anything for issues (this is according to the sample YAML under basic usage)

@dsaxton
Copy link
Member Author

dsaxton commented Sep 13, 2020

Updated to never close PRs for now and use debug only (I'm assuming this means we would see what actions would have occurred within the logs). Rate limiting I think is handled by default with occurences-per-run set to 30.

name: "Close stale pull requests"
on:
schedule:
- cron: "0 0 * * *"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this runs once a day? (could be more frequent while in debug mode)

also could add https://docs.github.com/en/actions/reference/events-that-trigger-workflows#scheduled-events as a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about every 6 hours during debugging?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can easily run builds manually in GitHub actions afaik. I'd say once a day is good enough, also for debug, but doesn't make a difference.

@jreback jreback added this to the 1.2 milestone Sep 13, 2020
@jreback
Copy link
Contributor

jreback commented Sep 13, 2020

@datapythonista if you'd have a quick look.

@dsaxton
Copy link
Member Author

dsaxton commented Sep 13, 2020

I also think it could be interesting to automatically label new PRs with "Needs review" (similar to new issues being flagged with "Needs triage"), and in addition exempt these from becoming stale

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @dsaxton

I guess days-before-close: -1 means that we'll just label PRs, and not actually close them, right? That sounds good to me.

Added few ideas, but happy with it.

name: "Close stale pull requests"
on:
schedule:
- cron: "0 0 * * *"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can easily run builds manually in GitHub actions afaik. I'd say once a day is good enough, also for debug, but doesn't make a difference.

- uses: actions/stale@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
stale-pr-message: "This pull request is stale because it has been open for 30 days with no activity. Please update to prevent this from being closed."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it could be good that the message is a bit friendlier, instead of being straight to the point (I guess this is copied from the action documentation).

Like explaining why we close stale PRs, letting the user know that they can always ping us later if they need it reopen...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, it does sound a bit blunt. I'll try to use softer wording; open to suggestions as well.

@@ -0,0 1,17 @@
name: "Manage stale pull requests"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other workflows are named as "Activate", "CI", "Assign". Not important, but something more compact like "Stale PRs" may look more consistent, and make things easier to find in the workflows list.

@dsaxton
Copy link
Member Author

dsaxton commented Sep 13, 2020

I guess days-before-close: -1 means that we'll just label PRs, and not actually close them, right? That sounds good to me.

Yes, I think that essentially disables closing according to their documentation:

  days-before-close:
    description: 'The number of days to wait to close an issue or pull request after it being marked stale. Set to -1 to never close stale issues.'
    default: 7

@jorisvandenbossche
Copy link
Member

I don't directly find where we discussed this before (or at least where I raised this concern, some other short discussions at #23489, #23489), but my main concern with something like this is that this somewhat assumes that PRs are inactive because of the contributors being inactive.

Looking at the 10 first PRs when sorting by "least recently updated", roughly 6 of them are actually waiting on review of a core developer, 2 first need discussion / a decision, and 2 are really waiting on changes of the contributor.

Which doesn't mean that a bot cannot help improve our workflow, to be clear (it can also help us to be reminded to review, or at some point if no core dev has interest to review a PR, then the best solution is to actually close it). But at least that should somehow be reflected in the messaging of the bot.


Comment from @TomAugspurger on gitter: "Are you aware of any bots that could check with the last comment was by a pandas maintainer / requested changes? Ideally we would only auto-close when we were the last one to comment."
As another example: the cpython has a more advanced labeling / bot workflow (labels "awaiting review", "awaiting changes", "awaiting merge") where the bot will eg add a "awaiting review" label on a PR whenever a new push to the PR was done, and add a "awaiting changes" label whenever a core dev left a review comment.

@simonjayhawkins
Copy link
Member

@jorisvandenbossche I think not automatically closing PRs partly addresses your concerns. but I agree the message seems targeted to the author. maybe just a terse This pull request has been labelled as stale because it has been open for 30 days with no activity. instead.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Sep 14, 2020

According to https://github.com/actions/stale, there are ways to configure depending on labels already present on the PR.
So eg we could not message or use a different message on PRs that are labeled with "Needs review" (if the then more consistently add this to PRs that are updated by the contributor, though).

@simonjayhawkins
Copy link
Member

maybe we need another action in place first that labels PRs with "Needs review" when changes are pushed.

@dsaxton
Copy link
Member Author

dsaxton commented Sep 14, 2020

Agree exempting some labels would make sense, I also opened #36349 to start trying to move towards that (this and exempting "Needs Review" would at least deal with the situation of someone opening a PR that doesn't get reviewed for a while).

Regarding wording could add something asking the person to ping a team member if they're waiting for a review (or can contributors be given the ability to label as "Needs Review" themselves, and ask them to do this if they're still waiting?). That way the onus doesn't sound like it's being placed only on the contributor.

@jbrockmendel
Copy link
Member

I'm happy with this as-is, so just spitballing:

  1. Would it make sense to have a a blocked-by-### label/status? e.g.
  1. On the review side, visualization is a weak point. I don't have a good solution for this.

@dsaxton
Copy link
Member Author

dsaxton commented Sep 14, 2020

I'm happy with this as-is, so just spitballing:

  1. Would it make sense to have a a blocked-by-### label/status? e.g.
  1. On the review side, visualization is a weak point. I don't have a good solution for this.

@jbrockmendel Maybe a generic "Blocked" label and add that to the exempt list here?

I'm also assuming these labels should already exist (the action won't create them) so should I add "Stale" and "Blocked" to the current set?

Edit: went ahead and added them

This is the default, but just being explicit about it
@jreback
Copy link
Contributor

jreback commented Sep 15, 2020

ok let's give this a try, can always disable if its out of control.

@jreback jreback merged commit ce7b4c0 into pandas-dev:master Sep 15, 2020
@jreback
Copy link
Contributor

jreback commented Sep 15, 2020

thanks @dsaxton

@dsaxton dsaxton deleted the stale-pr-action branch September 15, 2020 02:08
@simonjayhawkins
Copy link
Member

for info.. from https://github.com/pandas-dev/pandas/runs/1116110678?check_suite_focus=true

Marking issue #35514 - BUG: fix combine_first converting timestamp to int as stale
Marking issue #35428 - WIP BUG: Inconsistent date parsing of to_datetime as stale
Marking issue #35373 - ENH: Add validation checks for non-unique merge keys as stale
Marking issue #35348 - fix generate_range function in datetimes.py file as stale
Marking issue #35300 - ENH: add percentage threshold to dropna as stale
Marking issue #35292 - ENH: Add orient=tight format for dictionaries as stale
Marking issue #35270 - TST: Add test to ensure DF describe does not throw an error (#32409) as stale
Marking issue #35223 - PERF: Allow str.split callers to skip expensive post-processing as stale
Marking issue #35207 - ENH: raise limit for completion number of columns and warn beyond as stale
Marking issue #35127 - ENH: make is_list_like handle non iterable numpy-like arrays correctly as stale
Marking issue #35011 - Fix Issue #34923: Inferred dtype at the end of df explode method as stale
Marking issue #34987 - BUG: Cannot create third-party ExtensionArrays for datetime types (xfail) as stale
Marking issue #34878 - BUG: Fix replace for different dtype equal value as stale
Marking issue #34827 - BUG fix _Unstacker int32 limit in dataframe sizes (#26314) as stale
Marking issue #34814 - ENH: add masked algorithm for mean() function [WIP] #34754 as stale

@simonjayhawkins
Copy link
Member

#35292 has label:"Needs Discussion". maybe add to skip list.

@dsaxton
Copy link
Member Author

dsaxton commented Sep 16, 2020

Marking issue #35292 - ENH: Add orient=tight format for dictionaries as stale
Marking issue #34755 - TST: Period with Timestamp overflow as stale
Marking issue #34720 - BUG #34621 added nanosecond support to class Period as stale
Marking issue #34584 - DOC: DataFrame.reindex improve summary and return type as stale
Marking issue #34488 - BUG: Raise ValueError for non numerical join columns in merge_asof as stale
Marking issue #34426 - Categorical.(to|from)_dummies as stale
Marking issue #34385 - use ensure_clean rather than explicit os.remove #34384 as stale

@dsaxton
Copy link
Member Author

dsaxton commented Sep 16, 2020

Marking issue #35292 - ENH: Add orient=tight format for dictionaries as stale
Marking issue #34330 - BUG: Fix using dtype with parse_dates in read_csv as stale
Marking issue #34293 - ENH: categorical scatter plot as stale
Marking issue #34011 - Add fix to raise error when category value 'x' is not predefined but is assigned through df.loc[..]=x as stale

Top one should be ignored if / when #36382 gets merged

@dsaxton
Copy link
Member Author

dsaxton commented Sep 16, 2020

Marking issue #35292 - ENH: Add orient=tight format for dictionaries as stale
Marking issue #35063 - ENH: support index=True/False keyword for io.sql.get_schema as stale
Marking issue #35018 - BUG-Fix: AssertionError when slicing MultiIndex and setting value of … as stale
Marking issue #33739 - PERF: fix #32976 slow group by for categorical columns as stale
Marking issue #33543 - Preserving boolean dtype in Series.any/all function with level keyword #33449 as stale
Marking issue #33531 - BUG: Do not use string Index like Datetimelike Index as stale

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants