Wikipedia:Bots/Requests for approval/UsuallyNonviolentBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Jc86035 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 08:30, Monday, May 29, 2017 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): AutoWikiBrowser
Source code available: Using AWB with regex find/replace and genfixes
Function overview: Add |misc= to {{Infobox album}}, {{Infobox song}} and {{Infobox single}} and fix chronology parameters for {{Infobox single}}, {{Infobox album}} and {{Extra chronology}}
Links to relevant discussions (where appropriate): User talk:Ojorojo#Merger and replacement; others including WT:SONGS and Template talk:Infobox single
Edit period(s): One-time run, plus additional runs
Estimated number of pages affected: 15,000
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): No
Function details: Regexes below. Pages will be listed from Category:Music infoboxes with malformed table placement (0), Category:Music infoboxes with Module:String errors (2) and Category:Errors reported by Module String (12). The functions involve the cleanup of the templates before they are to be automatically substituted by AnomieBOT.
Description | Find | Replace |
---|---|---|
Remove |misc= or |Misc= if between auxiliary templates |
(\{\{([eE]xtra album cover|[eE]xtra chronology|[sS]ingles|[eE]xternal music video|[aA]udiosample|[eE]xtra music sample|[aA]udio sample|[Ee]xtra track listing). \}\})\s*\|\s*[Mm]isc\s*=\s*(\{\{([eE]xtra album cover|[eE]xtra chronology|[sS]ingles|[eE]xternal music video|[aA]udiosample|[eE]xtra music sample|[aA]udio sample|[Ee]xtra track listing). \}\}) |
$1$3
|
Prepend |misc= to auxiliary templates |
(\{\{([eE]xtra album cover|[eE]xtra chronology|[sS]ingles|[eE]xternal music video|[aA]udiosample|[eE]xtra music sample|[aA]udio sample|[Ee]xtra track listing). \}\}\s*) |
|misc=$1
|
Move quotes outside of links | \[\[([^\|^\/] )\|"([^\^\/]] )"\]\] |
"[[$1|$2]]"
|
Move bold outside of links | \[\[([^\|^\/] )\|'''([^\^\/]] )'''\]\] |
'''[[$1|$2]]'''
|
Move italics outside of links | \[\[([^\|^\/] )\|''([^\^\/]] )''\]\] |
''[[$1|$2]]''
|
Remove parameters containing only dashes, empty quotes or "N/A" (chronologies) | ^(\s*\|?\s*((Last|This|Next) (single|album)|(last|this|next)_(single|album))\s*=\s*)(N\/A|-|none|–|—|–|—|{{snd}}|{{ndash}}|{{mdash}}|{{dash}}|""|")\s*(\||\}\})?\s*$ |
$1$8
|
Remove <small> tags (chronologies) |
^(\s*\|?\s*((Last|This|Next) (single|album)|(last|this|next)_(single|album))\s*=\s*)<small>(. )</small>\s*(\||\}\})?\s*$ |
$1$7$8
|
Remove <small> tags from year brackets (inside templates; could be improved) |
<small>(\([a-zA-Z ]*\d\d\d\d\))</small> |
$1
|
Format singles chronology parameters | ^(\s*\|?\s*((Last|This|Next) single|(last|this|next)_single)\s*=\s*)"?(\[\[[^"^\/^\]] \]\])"?\s*(< *\/? *[Bb] [Rr] *\/? *>)?\s*\([a-zA-Z ]*(\d\d\d\d)\)\s*(\||\}\})?\s*$ |
$1"$5"<br />($7)$8
|
Format singles chronology parameters | ^(\s*\|?\s*((Last|This|Next) single|(last|this|next)_single)\s*=\s*)"([^"^\/] )"\s*(< *\/? *[Bb] [Rr] *\/? *>)?\s*\([a-zA-Z ]*(\d\d\d\d)\)\s*(\||\}\})?\s*$ |
$1"$5"<br />($7)$8
|
Format singles chronology parameters | ^(\s*\|?\s*((Last|This|Next) single|(last|this|next)_single)\s*=\s*)"?([^"^|^\/^\(^\)^<] )"?\s*(< *\/? *[Bb] [Rr] *\/? *>)?\s*\([a-zA-Z ]*(\d\d\d\d)\)\s*(\||\}\})?\s*$ |
$1"$5"<br />($7)$8
|
Format albums chronology parameters | ^(\s*\|?\s*((Last|This|Next) album|(last|this|next)_album)\s*=\s*)"?('')?"?(\[\[[^\]] \]\])"?('')?"?\s*(< *\/? *[Bb][Rr] *\/? *>)?\s*\([a-zA-Z ]*(\d\d\d\d)\)\s*(\||\}\})?\s*$ |
$1''$6''<br />($9)$10
|
Format albums chronology parameters | ^(\s*\|?\s*((Last|This|Next) album|(last|this|next)_album)\s*=\s*)"?''"?"?(.*)''"?\s*(< *\/? *[Bb][Rr] *\/? *>)?\s*\([a-zA-Z ]*(\d\d\d\d)\)\s*(\||\}\})?\s*$ |
$1''$5''<br />($7)$8
|
Remove space at end of quote before years bracket | "<br />\( |
"<br />(
|
Move quotes outside of bold formatting | '''"(\S[^"] \S)"''' |
"'''$1'''"
|
Fix unclosed brackets at the end of chronology parameters | \((\d\d\d\d)(\s*\||\}\})?(\s*)$ |
($1)$2$3
|
Replace italics with quotes in singles chronology parameters | ^(\s*\|?\s*((Last|This|Next) single|(last|this|next)_single)\s*=\s*)"''((''')?[^"^'] (''')?)''"\s*(< *\/? *[Bb] [Rr] *\/? *>)?\s*\([a-zA-Z ]*(\d\d\d\d)\)\s*(\||\}\})?\s*$ |
$1"$5"<br />($9)$10
|
Replace italics with quotes in singles chronology parameters | ^(\s*\|?\s*((Last|This|Next) single|(last|this|next)_single)\s*=\s*)"''([^"] )''"\s*(< *\/? *[Bb] [Rr] *\/? *>)?\s*\([a-zA-Z ]*(\d\d\d\d)\)\s*(\||\}\})?\s*$ |
$1"$5"<br />($7)$8
|
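As a rough illustration of how one row of the table behaves, the sketch below applies the "Remove <small> tags from year brackets" rule outside AWB. It assumes Python's `re` module as a stand-in for AWB's regex engine, so the `$1` replacement is written `\1`; the function name and the sample line are hypothetical.

```python
import re

# Pattern copied from the "Remove <small> tags from year brackets" row above.
SMALL_YEAR = re.compile(r"<small>(\([a-zA-Z ]*\d\d\d\d\))</small>")


def strip_small_from_year(wikitext: str) -> str:
    """Drop <small> tags that wrap a bracketed year such as (2015) or (May 2015)."""
    # AWB's "$1" back-reference is spelled r"\1" in Python (assumed equivalent here).
    return SMALL_YEAR.sub(r"\1", wikitext)


# Hypothetical chronology line, not taken from any real article.
sample = '| Last single = "Song"<br /><small>(2015)</small>'
print(strip_small_from_year(sample))
```

Text inside `<small>` that is not a bracketed year is left alone, which matches the narrow scope stated in the row's description.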
Discussion
I would also like to have the bot fix incorrect template nesting within the auxiliary templates, but this is somewhat beyond my ability. If anyone knows how, please tell me. Thanks. (pinging Ojorojo) Jc86035 (talk) Use {{re|Jc86035}} to reply to me 08:33, 29 May 2017 (UTC)
To any BAG members, please grant my bot account AWB access before approving trial. Thanks, Jc86035 (talk) Use {{re|Jc86035}}
to reply to me 15:21, 31 May 2017 (UTC)[reply]
- @Jc86035: This task is fine, fundamentally. Since there are a lot of regexes involved, the room for error is high and we'll need to check over things carefully. Before we move on to the trial, I would just like to get a clearer sense of why these pages are malformed and exactly what is going to be changed. For example, Amsterdam February 94 and Amor Fati (album) (two random pages) are both in Category:Music infoboxes with Module:String errors, but I don't see an obvious indication of how these are malformed and what the bot would do to them. Or are these not in the scope of the task? — Earwig talk 20:01, 2 June 2017 (UTC)[reply]
- @The Earwig: I think both are currently outside the scope of this task, although it's possible that the bot might edit them anyway and replace the <br> tags (I might fix the regexes for this). The pages have "malformed" code because the parameters are being replaced through Module:String to avoid formatting errors like these and reduce room for error (e.g. |Last single="Song"<br/>(2015) → |prev_title=Song |prev_year=2015). The first one incorrectly has the italics inside the link, and won't be edited under the current regexes since I assumed no one would display the disambiguation as well. The second one has "(EP)" inside one of the parameters, which trips up Module:String and probably shouldn't be there (I might add a regex for that as well). Jc86035 (talk) Use {{re|Jc86035}} to reply to me 03:46, 3 June 2017 (UTC)
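The conversion described in this reply (a combined value such as "Song"<br/>(2015) split into prev_title and prev_year) can be sketched as follows. This is only an illustration in Python, not the actual Module:String logic: the pattern and the function name `split_chronology` are assumptions, and it accepts only a quoted title, one line-break tag and a single four-digit year.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: quoted title, a <br> tag in any common spelling,
# then exactly one four-digit year in parentheses.
CHRONOLOGY = re.compile(
    r'^"(?P<title>[^"]+)"\s*<\s*br\s*/?\s*>\s*\((?P<year>\d{4})\)$',
    re.IGNORECASE,
)


def split_chronology(value: str) -> Optional[Tuple[str, str]]:
    """Return (title, year) for a well-formed value, or None if it cannot be split."""
    match = CHRONOLOGY.match(value.strip())
    if match is None:
        return None  # malformed: such a page would stay in the error category
    return match.group("title"), match.group("year")


print(split_chronology('"Song"<br/>(2015)'))
print(split_chronology('"Song"<br />(2007/2008)'))
```

Values the strict pattern rejects, such as a year range, are the kind of case the bot's cleanup regexes or manual edits have to handle before the split can succeed.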
- Per WP:BOTACC, the bot's name does not identify the task or the bot operator. — HELLKNOWZ ▎TALK 20:56, 2 June 2017 (UTC)[reply]
- @Hellknowz: Should I rename the bot? Jc86035 (talk) Use {{re|Jc86035}} to reply to me 03:46, 3 June 2017 (UTC)
- We have approved bots before that don't follow this policy point. There doesn't seem to be any strong opposition. I'm just making sure to mention this. — HELLKNOWZ ▎TALK 19:48, 3 June 2017 (UTC)
- I personally don't have a massive problem with this, as long as the user page identifies the owner (which it does) and perhaps edit summaries can identify the owner too, to make it very clear who controls the bot. As edit summaries and the userpage are the main ways people come into contact with bots, I feel that as long as these suggestions are implemented, there is little chance for confusion over who owns the bot. TheMagikCow (T) (C) 11:36, 4 June 2017 (UTC)[reply]
- Agreed; when wanting to find the owner of a bot, I doubt people are going to rely solely on its username without checking its userpage. We have enough bots out there that don't indicate ownership with their names that it seems unfair to enforce such a policy now, unless the community clearly decided it was unacceptable. — Earwig talk 18:25, 4 June 2017 (UTC)[reply]
- This policy doesn't really reflect current practices anymore and should be reviewed for updates. As for this instance: the bot account name identifies it as a bot, and the bot userpage clearly identifies its tasks and operator, so I'm not concerned (for example, naming it 86035bot isn't really going to solve anything for editors here). — xaosflux Talk 17:11, 7 June 2017 (UTC)
- @Jc86035: You mentioned briefly that you may be considering adding additional regex. What's the status of that? If the code is pretty much written, I think we're ready to approve a trial, but I'm not a fan of ill-defined tasks that may expand mid-trial. I don't necessarily need to see a list of regex (although it's helpful, and I'll review it for obvious issues since you've provided it), but I definitely need to see a list of what fixes you plan to do if anything has changed. ~ Rob13Talk 15:42, 13 June 2017 (UTC)[reply]
- @BU Rob13: I think it's largely done, except I need to duplicate the seven larger regexes with outputs containing <br>, so that where there is already a tag the original is retained (to avoid making extra edits). So I don't forget: the first of each pair would have the question mark after (< *\/? *[Bb][Rr] *\/? *>) in find removed and the tag in replace replaced with $6 or $8, and the second of each pair would have the parentheses, their contents and the question mark removed and the affected match numbers in replace reduced by 1. Anything else would be in a second run, because there's all sorts of things that could turn up given the large number of formatting errors, and I don't really want to introduce more of them. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 15:56, 13 June 2017 (UTC)
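The pairing described in this reply might look roughly like the sketch below: one rule requires the line-break tag and re-emits it unchanged, while its twin handles lines with no tag and inserts `<br />`. The patterns are simplified, hypothetical stand-ins for the larger regexes in the table (only `Last single` with a quoted title is handled), and Python's `re` is assumed in place of AWB.

```python
import re

# <br> group as quoted in the reply above.
BR = r"(< *\/? *[Bb][Rr] *\/? *>)"

# First of the pair: the tag is mandatory and is reused verbatim via group 3.
WITH_BR = re.compile(
    r'^(\s*\|\s*Last single\s*=\s*)"([^"]+)"\s*' + BR + r"\s*\((\d\d\d\d)\)\s*$"
)
# Second of the pair: no tag group at all, so later group numbers drop by one.
WITHOUT_BR = re.compile(
    r'^(\s*\|\s*Last single\s*=\s*)"([^"]+)"\s*\((\d\d\d\d)\)\s*$'
)


def format_single_chronology(line: str) -> str:
    """Tidy one chronology line, keeping an existing tag and adding one if absent."""
    line = WITH_BR.sub(r'\1"\2"\3(\4)', line)
    line = WITHOUT_BR.sub(r'\1"\2"<br />(\3)', line)
    return line


print(format_single_chronology('| Last single = "Song" <br>(2015)'))
print(format_single_chronology('| Last single = "Song" (2015)'))
```

Because the first rule copies the matched tag back, a line that already uses `<br>` keeps that spelling, so the rule never edits a page only to change the tag style.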
- Approved for trial (200 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Sounds good, please ping me if anything changes. If possible, try to include a bit of each fix in the trial, although that may be difficult. Once the trial is complete, please explain the contributions and link to the trial edits at relevant WikiProjects (SONGS, mainly, but others if you think they're relevant) in addition to marking the trial complete here. ~ Rob13Talk 16:00, 13 June 2017 (UTC)[reply]
- @BU Rob13: Trial complete. (200 edits); one article was edited twice and I did manual fixes for the first few because of a regex problem with bold/italic formatting (fixed). Judging by tracking categories the success rate was about 75%, so more runs will probably be needed. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 11:34, 15 June 2017 (UTC)
- @Jc86035: Bit busy this week, so I'll try to review this early next week. In the meantime, responding to your 75% success rate comment, could you provide some context as to what happens with the edits that are unsuccessful? Diffs would help. ~ Rob13Talk 12:04, 15 June 2017 (UTC)
- @BU Rob13: I've deliberately set up the bot not to remove brackets after titles as they could contain valuable information, so this edit did not remove the bracket which is causing the categorization. This edit and this edit did not fix the page since there are two separate sets of chronologies in one row which should be split using {{Extra chronology}}. This edit did not fix the page since the year value is "2007/2008" and isn't recognized. There are others, such as a missing apostrophe in italics (the bot only fixes italics when both apostrophes are missing). Some of these will need to be fixed manually, and hopefully the bot can clear away the easy fixes. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 12:38, 15 June 2017 (UTC)
- @Jc86035: Fair enough. Am I correct in thinking every edit the bot makes will still contain one non-cosmetic fix? ~ Rob13Talk 13:40, 15 June 2017 (UTC)
- @Jc86035: Sorry for the delay. I'll review this tonight or possibly tomorrow morning. Been a crazy couple of weeks. ~ Rob13Talk 03:28, 22 June 2017 (UTC)
- @Jc86035: Various issues:
- Have you considered adding regex to ensure the single name is bolded in the chronology? That seems to be standard from what I've seen. e.g. [1] [2] [3]
- How are you implementing the removal of spaces before the line breaks? Those should be normal find-and-replace designated as minor fixes with skip if minor fixes only checked.
- Are you referring to the removal of spaces before (and after) the <br> tags, or removal of spaces at the end of lines? If you're referring to the former I can fix that. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 05:39, 22 June 2017 (UTC)
- @Jc86035: The former. The latter should be taken care of with "skip if genfixes only". ~ Rob13Talk 05:43, 22 June 2017 (UTC)
- Why was this edit made?
- The error is there because of the "n.a.", which the bot didn't remove from otherwise empty chronology parameters before this edit (I've added it to the match). Jc86035 (talk) Use {{re|Jc86035}}
to reply to me 05:45, 22 June 2017 (UTC)[reply]
- String module errors were introduced here.
- Fixed in the template; purge the page if you still see it. The release date is pulled from the chronology if it's there. Jc86035 (talk) Use {{re|Jc86035}}
to reply to me 05:45, 22 June 2017 (UTC)[reply]
- For these edits, not all of the italics were fixed. [4] [5] [6], many others
- I'll see if this can be fixed. Jc86035 (talk) Use {{re|Jc86035}}
to reply to me 05:45, 22 June 2017 (UTC)[reply]
- ~ Rob13Talk 05:37, 22 June 2017 (UTC)[reply]
- I'm going to stop reviewing here for now, about 1/3 of the way through the trial. The last issue is the most severe and present in many of the edits I've checked, so it will need to be chased down and fixed before we move on to another trial. ~ Rob13Talk 05:42, 22 June 2017 (UTC)[reply]
- Approved for extended trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Mostly to ensure issue 5 is fixed. ~ Rob13Talk 14:48, 27 June 2017 (UTC)[reply]
- @BU Rob13: Trial complete. (100 edits); 74 edits removed a page from the error category. Several more (such as Words Words Words) would have been fixed by regex changes I made during the run, but I haven't gone back to fix them. A few pages were edited because they were in the Module:String error category for unrelated reasons. Several pages were not fixed due to there being multiple chronologies, extra brackets, Chinese text supplementing the English text, or multiple year values within one set of parentheses. Jc86035 (talk) Use {{re|Jc86035}}
to reply to me 11:09, 28 June 2017 (UTC)[reply]
@Jc86035: I'm noticing a lot of erroneous edits where just genfixes or cosmetic-only changes are being made. Do you have "Skip if genfixes only" checked? See [7] [8] [9] [10] [11] [12]. ~ Rob13Talk 21:52, 3 July 2017 (UTC)[reply]
- @BU Rob13: I believe this is because I didn't switch on "Skip if only minor replacement made" (I've checked it and every other skip box). The only minor find-and-replace is \s*< *\/? *[Bb][Rr] *\/? *>\s* → <br />. I think it should be ready now, and if I've missed anything that's bot-fixable then a second run can be done. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 11:46, 5 July 2017 (UTC)
- @Jc86035: Easy enough fix. Couple other edge cases:
- [13] Move the find-and-replace for (2005 --> (2005) above the rule adding line breaks before the year, in order to prevent a lack of closing parenthesis from resulting in no line break.
- [14] I'm not sure if you want to do something about italics that include the date (but shouldn't). It does seem to be in the general scope of this task, but you don't have to; you're not introducing the error. It would probably mean another trial if you wrote a new find-and-replace for this, so keep that in mind if you decide to code for this.
- ~ Rob13Talk 14:08, 5 July 2017 (UTC)[reply]
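For reference, the minor find-and-replace quoted earlier in this thread (any spelling of the line-break tag, together with the whitespace around it, rewritten as `<br />`) can be tried outside AWB. The sketch assumes Python's `re` behaves like AWB's engine for this pattern; the function name and sample strings are hypothetical.

```python
import re

# Find pattern as quoted in the discussion; the replacement is the literal "<br />".
BR_TAG = re.compile(r"\s*< *\/? *[Bb][Rr] *\/? *>\s*")


def normalise_br(text: str) -> str:
    """Rewrite <br>, <BR/>, </br> and similar as "<br />" and trim adjacent whitespace."""
    return BR_TAG.sub("<br />", text)


print(normalise_br('"Song" <BR> (2015)'))
print(normalise_br('"Song"</br>(2015)'))
```

Since `\s` also matches newlines, the rule joins a tag to text on the following line, which is one reason a change like this is treated as a minor replacement that should not trigger an edit on its own.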
- @BU Rob13: #1 will need another regex because the current one relies on the position of the <br>. I've fixed #2. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 14:33, 5 July 2017 (UTC)
- @Jc86035: Are you certain? The bot currently injects a <br> between the title and a date if no <br> is there. The only reason that rule wasn't applied was because it finds dates in the format of "(2005)" rather than "(2005". If the rule to change (2005 to (2005) ran first, then I see no reason why the other regex wouldn't work as intended. ~ Rob13Talk 14:38, 5 July 2017 (UTC)
- @BU Rob13: The regex also does replacement for "2005)", so it searches for the br as well to avoid song/album titles ending in numbers. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 14:55, 5 July 2017 (UTC)
- @Jc86035: I guess I'm confused. How was this edit made with no line break in the line? [15] ~ Rob13Talk 15:23, 5 July 2017 (UTC)
- @BU Rob13: See "Fix unclosed brackets at the end of chronology parameters" in the table above. I obviously should have put this before the rest of the fixes; currently modifying the regexes. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 11:41, 6 July 2017 (UTC)
- I've added another regex and moved the one I mentioned at 11:41 to immediately after the other bracket fixes. It should be ready now. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 12:03, 6 July 2017 (UTC)
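The ordering problem discussed in this sub-thread can be reproduced with two rules. The bracket-closing pattern is the one listed in the table under "Fix unclosed brackets at the end of chronology parameters"; the line-break rule is a simplified, hypothetical stand-in for the larger formatting regexes, and Python's `re` is assumed in place of AWB.

```python
import re

# From the table: close a year bracket left open at the end of a parameter.
CLOSE_BRACKET = re.compile(r"\((\d\d\d\d)(\s*\||\}\})?(\s*)$")
# Hypothetical simplified rule: insert a line break before a complete "(YYYY)".
ADD_BREAK = re.compile(r'^(\s*\|\s*Last single\s*=\s*"[^"]+")\s*(\(\d\d\d\d\)\s*)$')


def close_bracket(line: str) -> str:
    # An unmatched optional group is substituted as an empty string in Python 3.5+.
    return CLOSE_BRACKET.sub(r"(\1)\2\3", line)


def add_break(line: str) -> str:
    return ADD_BREAK.sub(r"\1<br />\2", line)


line = '| Last single = "Song" (2005'  # hypothetical malformed parameter

# Line-break rule first: it cannot see "(2005" as a year, so no break is added.
print(close_bracket(add_break(line)))
# Bracket fix first: the year is complete by the time the line-break rule runs.
print(add_break(close_bracket(line)))
```

Running the bracket fix before the formatting rules yields a fully formatted line in one pass, which is the reordering agreed on above.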
{{BAG assistance needed}}
I would like to get this done soon, as {{Infobox single}} has been lying around in a transition state for a while and has already been transwikied at least once. Jc86035 (talk) Use {{re|Jc86035}}
to reply to me 11:21, 7 July 2017 (UTC)[reply]
- @Jc86035: To try to avoid a third trial, please (a) post the new regex you added so I can look over it (email if you prefer not to make it public - that's also fine), and (b) undo your bot's edit at [16] and redo it with the corrected regex. ~ Rob13Talk 13:31, 7 July 2017 (UTC)[reply]
- @BU Rob13: Done. I've sent you an email containing the text of the bot configuration file, in case there's anything else wrong with it. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 14:09, 7 July 2017 (UTC)
- I've turned off the fix for br tags, because AWB just doesn't handle it properly for some reason. I've also had another issue with regexes, but it won't affect the task right now because only about 600 of the 20,000 pages would be fixed by that. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 14:14, 7 July 2017 (UTC)
- @Jc86035: Will you ensure the bot doesn't edit those pages as long as the issue persists? ~ Rob13Talk 14:43, 7 July 2017 (UTC)
- @BU Rob13: I've removed that regex (the one I asked for help with on VPT) from the config. I won't deliberately edit those pages, although as all of the errors are sorted into the same category I'm filtering them by sort key so they might turn up occasionally. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 15:11, 7 July 2017 (UTC)
- Alright, just to be clear, if I approve at this point, I'm approving without that functionality/regex and you cannot add it back without an additional BRFA (which would probably be fast, but still needs to be done). Is that good with you, Jc86035? ~ Rob13Talk 15:13, 7 July 2017 (UTC)
Approved, with task scope as discussed above. ~ Rob13Talk 16:05, 7 July 2017 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.