Bots noticeboard
This is a message board for coordinating and discussing bot-related issues on Wikipedia (also including other programs interacting with the MediaWiki software). Although this page is frequented mainly by bot owners, any user is welcome to leave a message or join the discussion here. For non-urgent issues or bugs with a bot, a message should be left on the bot operator's talk page. If discussion with the operator does not resolve the issue or the problem is urgent and widespread, the problem can be reported by following the steps outlined in WP:BOTISSUE. This is not the place for requests for bot approvals or requesting that tasks be done by a bot. General questions about the MediaWiki software (such as the use of templates, etc.) should be asked at Wikipedia:Village pump (technical). |
help creating a bot
Hello. I am not sure if this is the correct venue. If this can't be solved here, kindly let me know where I should go.
I currently have an AWB bot on enwiki, User:KiranBOT. It adds WikiProject banners on talk pages (the simplest task, I think). In short: I need to create a fully automated/Toolforge bot.
Prelude: Around 20 days ago, I got the bot flag on mrwiki (AWB). In less than 20 days (around 7-8 runs), it racked up more than 10k edits there (mr:special:contributions/KiranBOT). Because of the syntax of the Marathi language, and word rules (not grammar rules), there are many uncontroversial find and replace tasks. But there are fewer than 10 active/regular editors, so such tasks have piled up.
To the point: On mrwiki, I would like to run a simple bot — but one with continuous editing, like DumbBOT. A few hours ago, I created an account on wikitech/toolforge, and requested membership. But I am still not sure how, and where, to upload the bot's code. I want to code it in C#. The bot will obviously be discussed/vetted on mrwiki, along with the keywords to be replaced (I have created a rudimentary list at mr:User:Usernamekiran/typos). Any help/guidance will be appreciated a lot. —usernamekiran • sign the guestbook • (talk) 23:38, 31 December 2021 (UTC)
- Hey there. Here are my notes for the PHP programming language: User:Novem Linguae/Essays/Toolforge bot tutorial. In particular, you can use this to get your FTP client and console running. For C#, I think you would want to use this: wikitech:Help:Toolforge/Mono. And also Grid instead of Kubernetes. Hope that helps. –Novem Linguae (talk) 23:56, 31 December 2021 (UTC)
- thanks. looking into it. —usernamekiran • sign the guestbook • (talk) 23:59, 31 December 2021 (UTC)
- The Novem Linguae tutorial looks good to get started, but two things to note:
1. It mentions use of an FTP client to transfer files to the host. There's another way – git. You can set up a repository on GitHub/Gitlab and push your code there. On the toolforge host, you can pull it. There are ways, using webhooks or GitHub Actions, through which you can even trigger pulls automatically on TF when you push locally.
2. It mentions use of Kubernetes for cron jobs. Using the grid is much easier (requires just adding a line to the crontab file). – SD0001 (talk) 04:01, 1 January 2022 (UTC)
- dummy comment to avoid archiving. —usernamekiran • sign the guestbook • (talk) 18:41, 17 January 2022 (UTC)
So I could transfer files using GitHub, and I also created files using Mono on PuTTY/CLI. But I couldn't execute the bot. First I went with C#, then Python, but neither worked. I have lots of material in .NET to study/refer to, like the DotNetWikiBot framework, the source code of AWB, and some other programs mentioned at MediaWiki. All I need is a little guidance regarding how to compile and run it on Toolforge. Your help will be appreciated a lot. Also pinging @Mz7, JPxG, and ST47: —usernamekiran • sign the guestbook • (talk) 15:44, 18 January 2022 (UTC)
- @Mz7, SD0001, and Novem Linguae: Hi. So I downloaded Python and Pywikibot to my computer, and made a couple of edits using it. It was pretty straightforward. I made one or two edits using replace.py. I also made a couple of edits using my own kiran.py. But I haven't figured out how to perform the find and replace task. Would you kindly help me? The fewer than 10 edits can be seen at mr:special:contributions/KiranBOT_II. —usernamekiran • sign the guestbook • (talk) 20:06, 17 February 2022 (UTC)
- I don't know Python, so I can't help with this particular question. But Python is one of the most popular wiki bot languages so I am sure someone can help. Discord's #technical channel may be a good resource. Good luck. –Novem Linguae (talk) 20:14, 17 February 2022 (UTC)
- The documentation is at mw:Manual:Pywikibot/replace.py. Find & replace would be something like
replace foo bar -search:"insource:\"foo\" "
, or am I missing something? ― Qwerfjkltalk 20:31, 17 February 2022 (UTC)
- Either that, or you can write your own python code for find and replace. Pywikibot just needs to be used to fetch the text in the beginning, and to save it in the end. – SD0001 (talk) 04:07, 18 February 2022 (UTC)
- @Qwerfjkl: Hi. For quick background: I want to create an automated bot on Marathi Wikipedia (mrwiki) to find and replace certain words or sets of words. The current list has 42 words to be replaced in mainspace, and is expected to grow up to around 200. @SD0001 I have already successfully edited pages using my own script. The only two things I am currently stuck on are the working find and replace syntax for pywikibot, and telling the script to edit pages from ns:0. In the script I created, I had told it to edit one particular page. —usernamekiran • sign the guestbook • (talk) 08:18, 18 February 2022 (UTC)
- Have you tried the regex module for python? ― Qwerfjkltalk 08:39, 18 February 2022 (UTC)
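For what it's worth, the `re`-module approach suggested here can be sketched without any pywikibot machinery. The typo pairs below are made-up English placeholders standing in for the real Marathi list at mr:User:Usernamekiran/typos:

```python
import re

# Hypothetical typo -> correction pairs; placeholders for the real
# on-wiki replacement list.
TYPOS = {
    "teh": "the",
    "recieve": "receive",
}

def fix_typos(text):
    # \b keeps each match to a whole word, so "tehsil" is left
    # alone even though it contains "teh".
    for wrong, right in TYPOS.items():
        text = re.sub(r"\b%s\b" % re.escape(wrong), right, text)
    return text

print(fix_typos("teh cat did recieve a gift"))
# -> the cat did receive a gift
```

Whether `\b` behaves as intended at Devanagari word boundaries is worth verifying separately; Python 3's `re` treats Unicode letters as word characters, but Marathi word rules may need custom boundary handling.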
- @Qwerfjkl: Hi. Thanks for the response, but unfortunately, I had found the solution before seeing your reply. I have solved the find and replace issue, now the only thing that remains is, how to tell the script to edit the pages of certain namespace(s). Can you help me with that please? —usernamekiran • sign the guestbook • (talk) 18:34, 18 February 2022 (UTC)
- It depends on how you are supplying the pages to edit. ― Qwerfjkltalk 19:42, 18 February 2022 (UTC)
- One technique would be to filter by namespace when you're generating your list of pages to edit. The details of that will depend on how you are generating your list of pages to edit. Are you using an API query, a pywikibot function, etc? I suggest sharing your code with us, you'll likely get more and better answers. –Novem Linguae (talk) 20:34, 18 February 2022 (UTC)
Here is the code:
import pywikibot

#retrieve the page
site = pywikibot.Site()
page = pywikibot.Page(site, u"user:usernamekiran/typos")
text = page.text

#edit the page
page.text = text.replace("abcusernamekiran", "xyz")

#save the page
page.save(u"experimental edit with modified script")
Thanks, —usernamekiran • sign the guestbook • (talk) 09:54, 19 February 2022 (UTC)
- @Usernamekiran: You could generate the pages something like this (untested):― Qwerfjkltalk 10:40, 19 February 2022 (UTC)
import pywikibot
from pywikibot import pagegenerators, textlib
import re

#retrieve the pages (namespaces=0 restricts the search to mainspace)
site = pywikibot.Site()
pages = site.search("intitle:\"foo\"", total=5, namespaces=0)
for page in pages:
    text = page.text
    #edit the page
    text = text.replace("abcusernamekiran", "xyz")
    # or using the re module:
    # text = re.sub("abcusernamekiran", "xyz", text)
    page.text = text
    #save the page
    page.save(u"experimental edit with modified script")
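As the replacement list grows toward the ~200 entries mentioned earlier, looping over the text once per pattern gets slow. One common alternative (a sketch, not either editor's actual code; the table entries are placeholders) is to compile all the keys into a single alternation and substitute in one pass:

```python
import re

# Placeholder replacement table; the real one would hold the Marathi
# typo pairs from the on-wiki list.
REPLACEMENTS = {
    "foo": "bar",
    "baaz": "qux",
}

# One alternation covering every key, longest first so that longer
# keys win when one key is a prefix of another.
_pattern = re.compile(
    "|".join(sorted(map(re.escape, REPLACEMENTS), key=len, reverse=True))
)

def apply_all(text):
    # Single pass over the text: each match is looked up in the table.
    # Add \b anchors around the alternation if whole-word matching
    # is wanted.
    return _pattern.sub(lambda m: REPLACEMENTS[m.group(0)], text)

print(apply_all("foo and baaz"))
# -> bar and qux
```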
Review of Wikipedia:Bots/Requests for approval/MalnadachBot 12
MalnadachBot task 12 was recently speedily approved to correct lint errors on potentially hundreds of thousands of pages. As the bot started making some initial edits, my watchlist has started blowing up. The majority of the edits that I see in my watchlist are fixing deprecated <font> tags in one particular user's signature, in ancient AfD nominations that I made 10+ years ago. A very small sampling: [2][3][4][5] These edits do not change the way the page is rendered; they only fix the underlying wikitext to bring it into compliance with HTML5. Since no substantive changes are being made in any of these edits, I believe this bot task should not have been approved per our bot policy; specifically, WP:COSMETICBOT. I'd like to request that this task (and any other similar tasks) be reviewed in light of this. Pinging bot owner and bot task approver: @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ: @Primefac: —ScottyWong— 15:59, 28 January 2022 (UTC)
- Scottywong, when should obsolete HTML tags be converted to modern syntax? Lint errors have been flagged by MediaWiki since 2018, so a small group of editors have already been fixing errors for over three years and there are still millions of errors. Given that we have fixed a lot of the easy errors, the remaining millions of errors will take multiple years to fix. – Jonesey95 (talk) 16:13, 28 January 2022 (UTC)
- It is properly tagging the edits as "bot" and "minor", so watchlist flooding should be alleviated by hiding bot edits. — xaosflux Talk 16:23, 28 January 2022 (UTC)
- I understand why the bot is making these edits, and how to hide them from my watchlist. However, if you're suggesting that WP:COSMETICBOT is no longer a valid part of bot policy, perhaps we should delete that section from WP:BOTPOL? Or can you explain how this bot is not making purely cosmetic edits to the wikitext of pages? —ScottyWong— 16:40, 28 January 2022 (UTC)
- I haven't gone through that part, was looking if there was any immediate tech issue that was causing flooding. — xaosflux Talk 16:47, 28 January 2022 (UTC)
- WP:COSMETICBOT explicitly mentions
[fixing] egregiously invalid HTML such as unclosed tags, even if it does not affect browsers' display or is fixed before output by RemexHtml (e.g. changing <sup>...</sub> to <sup>...</sup>)
as non-cosmetic. – SD0001 (talk) 16:47, 28 January 2022 (UTC)
- Scottywong, I quoted COSMETICBOT to you once today, but maybe you haven't seen that post yet. Here it is again:
Consensus for a bot to make any particular cosmetic change must be formalized in an approved request for approval.
That happened. The BRFA and the bot's edits are consistent with WP:COSMETICBOT. – Jonesey95 (talk) 16:49, 28 January 2022 (UTC)
- The BRFA was speedily approved in less than 3 hours, with no opportunity for community discussion. This discussion can act as a test for whether or not there is community consensus for this bot to operate in violation of WP:COSMETICBOT. The <sub></sup> example given above is substantive, because it would actually change the way the page is rendered. Changing <font> tags to <span> tags results in no change whatsoever, since every modern browser still understands and supports the <font> tag, despite it being deprecated. —ScottyWong— 16:54, 28 January 2022 (UTC)
- Well, unlike the hard work we did to clear out obsolete tags in Template space, we're not going to fix millions of font tags in talk space pages by hand, which leaves two options that I can see: an automated process, or leaving the font tags in place until it is confirmed that they will stop working. It sounds like what you want is an RFC at VPR or somewhere to ask if we should formally deprecate the use of font tags on the English Wikipedia. You might want to ask about other obsolete tags (<tt>...</tt>, <strike>...</strike>, and <center>...</center>) while you're at it. – Jonesey95 (talk) 17:02, 28 January 2022 (UTC)
- There's a difference between "from this point on, let's not use font tags anymore" and "let's go back to millions of dormant AfD pages (most of which will never be read or edited ever again, for the rest of eternity) and make millions of edits to change all of the font tags to span tags." Let's see how this discussion goes first, and then we can determine if a wider RFC is necessary. —ScottyWong— 17:15, 28 January 2022 (UTC)
- My bot edits are not in violation of WP:COSMETICBOT; Lint errors are exempt from the usual prohibition on cosmetic edits. See point 4:
fixed before output by RemexHtml
covers Lint errors. As for the speedy approval, the context for that is the prior BRFAs for MalnadachBot. They were to fix very specific types of Lint errors that were all done successfully after testing and discussion, fixing over 4.7 million Lint errors in the process. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 17:18, 28 January 2022 (UTC)
- Perhaps this is a pedantic point that misses the crux of what you're saying, but there are millions of errors, not millions of AfDs (per a Quarry there are only 484,194 AfD subpages, excluding daily logs). jp×g 20:39, 31 January 2022 (UTC)
- Regarding the speedy approval: the bot operator had 10 successful similar runs fixing these types of errors, so to say that there was "no opportunity for discussion" is a little silly - the first task was approved in May 2021, so in my mind that is 9 months worth of bot edits during which the task(s) could have been discussed. When a bot operator gets to a point where they have a bunch of similar tasks that are trialled and run with zero issues, I start speedy-approving them, not only because it saves the botop time, but it has been demonstrated that the type of tasks being performed by the bot are not an issue. Primefac (talk) 17:19, 28 January 2022 (UTC)
- To be clear, I'm not saying that the speedy approval was necessarily inappropriate. I was responding to Jonesey95's assertion that the BRFA represents explicit community consensus for this task. I'm also not suggesting that the bot is doing anything technically incorrect, or that it has any bugs. All I'm suggesting is that if fixing purely cosmetic wikitext syntax issues on ancient AfD pages doesn't qualify as WP:COSMETICBOT, then I'm not sure what would. If WP:COSMETICBOT no longer reflects the way that bots work on WP, then perhaps we should remove it. But until it's removed, I still believe that this type of task falls on the wrong side of bot policy, as currently written. —ScottyWong— 17:32, 28 January 2022 (UTC)
- I quoted the relevant portion of COSMETICBOT above. – Jonesey95 (talk) 17:48, 28 January 2022 (UTC)
- Speaking only for myself, I have a hard time finding a basis revoking approval (or denying it). The bot is doing the legwork to future proof our pages with deprecated syntax. That to me, is a good thing. The bot op / bot page however, could mention WP:HIDEBOTS as a way to reduce watchlist clutter for those annoyed by the task, but the task itself is IMO legit. Headbomb {t · c · p · b} 17:50, 28 January 2022 (UTC)
- I agree with Primefac's assessment here. A dozen BRFAs that are preventative measures to avoid pages falling into the state where we would find COSMETICBOT irrelevant, never mind the lines in COSMETICBOT that indicate that these changes can be executed today? Sounds reasonable to me. It also avoids (good faith) time spent elsewhere like at WP:VPT when we get questions about why someone can't read an archive. Izno (talk) 18:51, 28 January 2022 (UTC)
- It seems that I'm the lone voice on this one (apart from one other user that expressed concern on the bot owner's talk page), which is fine. If you wouldn't mind, I'd like to leave this discussion open for a while longer to give anyone else an opportunity to express an opinion. If, after a reasonable amount of time, there is clear consensus that making these cosmetic changes is a good thing for the project, I'm happy to respect that and crawl back into my hole. I paused the bot while this discussion was ongoing; I will unpause the bot now since it seems somewhat unlikely that approval for this task will be revoked, and allowing the bot to continue making these edits might draw more attention to this discussion. —ScottyWong— 18:56, 28 January 2022 (UTC)
- Just a quick note from my phone, re: COSMETICBOT: Something that would qualify as a cosmetic edit that would probably never gain approval would, for example, be aligning '=' signs in template calls (as you often see in infoboxes) or removing whitespace from the ends of lines. These things might clean up the wikitext, but they don't change the HTML output. AFAIK, that's what COSMETICBOT is good for. --rchard2scout (talk) 15:09, 30 January 2022 (UTC)
- I agree; COSMETICBOT should be applied with common sense, it's not a hard red line. The question is whether this bot is a good idea, enough to justify so many edits, and the answer is yes IMO. Furthermore, while the changes may technically be cosmetic today, they won't be in the future, presumably, if/when some browsers stop supporting older syntax. I wish we had a way to make these types of edits hidden by default vs. opt-in. -- GreenC 15:24, 30 January 2022 (UTC)
... make these types of edits hidden by default...
- they already are, as the default email watchlist settings are to hide minor edits. Hell, if it's a minor bot edit, it will keep it off your watchlist even if you do want it to show up. Primefac (talk) 20:48, 30 January 2022 (UTC)
- Are you sure? At the top of my watchlist I have an array of checkboxes to hide:
- registered users
- unregistered users
- my edits
- bots
- minor edits
- page categorization (checked by default)
- Wikidata (checked by default)
- probably good edits
- I see all minor and bot edits to articles in my watchlist. Were it true that minor and bot edits are default hidden, Monkbot/task 18 might have run to completion.
- —Trappist the monk (talk) 21:04, 30 January 2022 (UTC)
- I understand where everyone is coming from, and I don't intend to continue arguing about it when it's clear I'm in the minority, but perhaps I'm a little out of the loop. Here's my question: is there any real evidence that any major, modern browsers have plans to fully deprecate support for things like <font> tags and other HTML4 elements? Will there come a time that if a browser sees a font tag in the html source of a page, it literally will ignore that tag or won't know what to do with it? It seems like such an easy thing for a browser to continue supporting indefinitely with little to no impact on anything. I suppose I'm wondering if this bot is solving an actual problem, or if it's trying to solve a hypothetical problem that we think might exist at some point in the distant future. —ScottyWong— 21:28, 30 January 2022 (UTC)
- That is a great question to address to MediaWiki's developers, who have deliberately marked <font>...</font> and a handful of other tags as "obsolete" by inserting error counts into the "Page information" page for all pages containing those tags. In the software world, marking a specific usage as obsolete or deprecated is typically the first step toward removal of support, and the MW developers have removed support for many long-supported features over the years. The MediaWiki developers may have similar plans for obsolete tags, or they may have other plans. – Jonesey95 (talk) 23:30, 30 January 2022 (UTC)
- There are some good notes at Parsing/Notes/HTML5 Compliance. There's also some discussion buried in gerrit:334990 which mostly represents my current thoughts (though I certainly don't speak for the Parsing Team anymore), which is that it is probably unlikely browsers will stop supporting <font>, <big>, etc. in the near future. If they do, we could implement those tags ourselves to prevent breakage either at the wikitext->HTML layer or just in CSS. I don't think there are any plans or even considerations to remove these deprecated tags from wikitext before browsers start dropping them. I said this in basically the same discussion on Meta-Wiki in 2020.
- That said, I would much rather see bots that can properly parse font/etc. tags into their correct span/etc. replacements so it can all be done in one pass instead of creating regexes for every signature. Legoktm (talk) 18:49, 4 February 2022 (UTC)
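For illustration, the kind of one-pass font-to-span conversion Legoktm describes might look like the sketch below. This is a hedged toy, not any bot's actual code: it assumes double-quoted attributes and non-nested tags, and handles only the color attribute, while real signatures are far messier:

```python
import re

def font_to_span(wikitext):
    # Convert simple <font color="..."> tags to <span style="...">.
    # Anything without a recognized color attribute is left untouched.
    def repl(match):
        attrs = dict(re.findall(r'(\w+)="([^"]*)"', match.group(1)))
        color = attrs.get("color")
        if color is None:
            return match.group(0)  # leave anything unexpected alone
        return '<span style="color:%s">%s</span>' % (color, match.group(2))

    return re.sub(r'<font([^>]*)>(.*?)</font>', repl, wikitext)

print(font_to_span('<font color="red">Scotty</font>'))
# -> <span style="color:red">Scotty</span>
```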
- This is not a hypothetical problem; we know for sure that at least one obsolete HTML tag marked by Linter does not work on mobile. <tt>...</tt> renders as plain text in Mobile Wikipedia. Compare this in mobile and desktop view. Based on this we can reasonably conclude that <font>...</font> will stop working at some point as well. Besides, not everything that is obsolete in HTML5 is counted as obsolete by Linter. For example, tags like <big>...</big> and table attributes like align, valign and bgcolor are not marked by Linter even though they too are obsolete in HTML5 like font tags. So it seems the developers have plans to continue support for these, but not for font tags. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 06:00, 31 January 2022 (UTC)
- Although for the record, <tt>...</tt> does not work in mobile because of specific CSS overriding it in the mobile skin (dating back to 2012). Whether any common mobile browsers didn't or don't support it is unclear. Anomie⚔ 13:57, 31 January 2022 (UTC)
- So, it still sounds like we're fixing hypothetical problems that we think will eventually become real problems. I agree that it'll probably eventually become a real problem one way or the other, but honestly I still don't see the point of correcting problems in someone's signature on a 12 year-old AfD. —ScottyWong— 07:17, 1 February 2022 (UTC)
- As much as I don't like my watchlist full of bot edits, it's not a COSMETICBOT violation as long as the changes clear a maintenance error. What I would say is a problem is that the bot doesn't actually fix all the issues at once. For example, this edit is fine, but what about these font tags? Is the bot really going to come back and edit the page again? Or even the same task like this edit, which is fine, except there are two other font tag uses. Is the bot really replacing each signature one at a time? — HELLKNOWZ ∣ TALK 11:12, 31 January 2022 (UTC)
- This is, to be honest, one of the reasons why I gave a slightly-more-blanket approval for the task; instead of "here are another three signatures" I was hoping the botop would find a wide range of similar linter errors that would likely be on the same page(s), and hit them all at once. As the run progressed, and new errors were found, they could just be added to the run logic without the need for a subsequent BRFA. If this is not the case, then it sure should be. Primefac (talk) 11:34, 31 January 2022 (UTC)
- I have just been adding specific patterns to the replacement list as and when I find them. I don't want to use general-purpose regexes to do replacements since this is a fully automated task. They work fine most of the time, but edge cases are troublesome. My experience Linting Wikipedia has shown that people are... creative in using all sorts of things that would cause problems for general-purpose regexes. Considering the size of this task, even with 99.9% accuracy, it would still leave thousands of false positives. This is the kind of bot task where, when things go smoothly, most people wouldn't care, but if there are a few errors, lots of people would come to your talk page with complaints. When the number of Lint errors is down to less than a hundred thousand instead of the 16.6 million today, then it would be possible to do a supervised run and try to clear out all errors in a page with a single edit. My current approach of using only specific replacements may not fix all errors in a page at a time, but it does the job by keeping false positives as close to zero as possible. This to me is the most important thing. That said, I will increase the number of find and replace patterns the bot considers at a time so that more can be replaced if they are present in a page. The bot will take more time to process a page and will have to use a generic edit summary, but that's a good tradeoff I guess. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 18:07, 31 January 2022 (UTC)
- That's not really a good reason not to consolidate all of these changes into one edit. If you have code that can correct <font> tags and you have another piece of code that can correct <tt> tags, then all you have to do is grab the wikitext, run it through the font tag code, then take the resulting corrected wikitext and run it through the tt tag code, then take the resulting wikitext and run it through any other blocks of delinting code that you have, and then finally write the changes to the article. Instead, you're grabbing the wikitext, running it through the font tag code, and then saving that edit. Then, sometime later, you're grabbing that wikitext again, running it through the tt tag code, and saving that edit. There's really no difference, except for the number of edits you're making. —ScottyWong— 07:14, 1 February 2022 (UTC)
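The one-edit pipeline Scottywong describes amounts to composing text-to-text fixer functions and saving once at the end. A minimal sketch (the individual fixers are trivial stand-ins, not the bot's real replacement rules, and <code> as a <tt> substitute is just one possible choice):

```python
# Each fixer takes wikitext and returns wikitext, so they compose:
# run every fixer, then save a single edit with the combined result.

def fix_font_tags(text):
    # Stand-in for the real font-tag replacements.
    return text.replace('<font color="red">x</font>',
                        '<span style="color:red">x</span>')

def fix_tt_tags(text):
    # Stand-in; a real fix might pick <code>, <kbd>, or a styled span.
    return text.replace("<tt>", "<code>").replace("</tt>", "</code>")

FIXERS = [fix_font_tags, fix_tt_tags]

def delint(text):
    for fixer in FIXERS:
        text = fixer(text)
    return text  # one page save after all fixers have run

print(delint('<tt>a</tt> <font color="red">x</font>'))
# -> <code>a</code> <span style="color:red">x</span>
```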
- To clarify, the code I have to correct font tags (i.e. general-purpose regexes to correct font tags) and some other Lint errors works fine most of the time, but gives some false positives which makes it not suitable for use in a fully automated task like this. You can read the first BRFA and this discussion for why I do not use such code with my bot. Usually in a situation like this, we would run it as a semi-automated task and approve every edit before saving so that false positives can be discarded. But that is not possible here due to the huge number of pages involved. So I am left to work with a set of definite replacements, like specific user signatures and substituted templates, that are checked in a page before saving an edit. I have increased the number of replacements it will check to try and get more in an edit. This would be an example of when more than one of the replacements checked by the bot were present in a page and fixed in the same edit. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 15:58, 1 February 2022 (UTC)
it's not a COSMETICBOT violation as long as the changes clear a maintenance error
I disagree. The basis for COSMETICBOT in the first place is that cosmetic edits "clutter page histories, watchlists, and/or the recent changes feed with edits that are not worth the time spent reviewing them", so they should not be performed unless there is an overriding reason to do so. That is not in the text of the policy, but clearly a balance has to be struck between usefulness and spamminess; the present case has fairly low usefulness (clearing a maintenance error is hardly high-priority).
- Basically I agree with Scottywong. I have no strong feelings for or against COSMETICBOT, but if that bot task is deemed to be compliant, it means the policy is toothless, so we might as well remove it. (It might be argued that the bot task is against COSMETICBOT but should still be allowed as an explicit exception, but I do not see anyone making that argument.) Tigraan Click here for my talk page ("private" contact) 13:08, 3 February 2022 (UTC)
- "unless there is an overriding reason to do so" -- yes, and one of the common examples is right below in the policy text: "egregiously invalid HTML [..]". I mean, I agree that the policy sets no limit on spamminess, but that's a separate matter. — HELLKNOWZ ∣ TALK 15:16, 3 February 2022 (UTC)
- The policy says
egregiously invalid HTML such as unclosed tags
. Unclosed tags are a much more serious issue than deprecated tags, let alone deprecated tags that are still supported. "Egregiously invalid HTML" does not include only unclosed tags, but IMO it does not include font tags. At the very least, that is the sort of thing you would expect some discussion about at BRFA - if we are serious about enforcing the policy as written. Tigraan Click here for my talk page ("private" contact) 13:13, 4 February 2022 (UTC)
- Again I ask: When should we start fixing these deprecated tags, if not now? There are millions of instances of deprecated tags. Based on our historical pace of fixing Linter errors, it will take multiple years to fix all of them, especially since font tags in signatures are still, inexplicably, allowed by the MediaWiki software. – Jonesey95 (talk) 14:30, 4 February 2022 (UTC)
- There is a saying that prevention is better than cure. Software updates are a natural part of all websites. That
<font>...</font>
,<center>...</center>
and other obsolete tags are still supported doesn't mean it will continue to be so in the future, which is why developers have marked them as errors and are giving us time to replace them. Imagine logging in one day and seeing pages out of alignment and colors not being displayed, among other things. It had already happened once in July 2018. Will they not be egregiously invalid after that happens? These edits will benefit editors and readers by making sure that pages continue to display as editors intended when software changes. This is basically why COSMETICBOT allows fixing html even if it does not affect browsers' display or fixed before output by RemexHtml
. My bot has already replaced about 2.5 million font tags while running the previous 10 BRFA Lint-fixing tasks, hardly something there was no discussion about. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 14:36, 4 February 2022 (UTC)
- (edit conflict) I consider tags that will stop working on par with unclosed tag pairs. Invalid markup is invalid. Be it for visual display, parsing, screen readers, printing, accessibility, forwards-compatibility, etc. — HELLKNOWZ ∣ TALK 14:40, 4 February 2022 (UTC)
- @Hellknowz: But these deprecated HTML4 tags are still supported, and no plans have been announced by anyone to stop supporting them at a specific date. Sure, eventually they will be unsupported, maybe next year, maybe coinciding with the heat death of the universe, or maybe sometime in between. At this point, we're reacting to something that we think might happen in the future, but we don't know when, and we don't know the specifics of how that deprecation will be handled. Maybe we'll be given 5 years notice of the deprecation. Maybe, by the time HTML4 tags are fully unsupported, we'll already be using HTML6 tags, and we'll have to go through this whole process a second time when we could have just done it once. The point is: we're acting reflexively but we don't know anything yet. And our response is to edit millions of decades-old closed AfDs to mess with someone's signature. I'm amazed that so many people are pushing back on this one. I mean, I can understand going through article space to fix these issues. But 15 year-old AfDs? Really? How is that worth anyone's time? What is the worst case scenario if someone happens to open a 15 year-old AfD and someone's signature is displaying in the default font instead of the custom font that the editor originally intended? —ScottyWong— 17:31, 4 February 2022 (UTC)
- You say "worst case" but you describe one of the best cases. Worst case is the whole page throws a 5xx server error and doesn't render anything. I'm not saying it will happen, but I am saying we are trying to guess the future instead of fixing something that has been marked as deprecated. The original concern was that COSMETICBOT doesn't apply (or doesn't explicitly mention this use case). I only argue that it does follow policy even if people don't like it. But whether we actually want to fix this or leave it until it (may be) breaks is a different issue that wasn't opposed until watchlists lit up. This noticeboard can review approval on policy grounds, which I don't find a problem with. As I said, I don't like watchlist spam either and not doing all the replacements at the same time is pretty terrible. And this is likely not the first nor the last time something will need fixing. This sounds like a watchlist (search, sort, filter) functionality problem. — HELLKNOWZ ∣ TALK 19:10, 4 February 2022 (UTC)
- Some people have minor/bot edits showing on their watchlists for good reason, e.g. to monitor minor or bot edits to active pages. What they don't have them turned on for is to see a bot going back through a decade's worth of archived/closed AFDs making trivial corrections to errors that barely deserve the name. Congratulations, you just pinged a load of deleted articles (quite a few contentious) back to the top of the watchlists of editors. Well done. Countdown to recreation in 5, 4, 3.... Only in death does duty end (talk) 15:14, 31 January 2022 (UTC)
- I'm coming here with a related problem. I was trying to search for something in the ANI archives, sorted by date. But that's impossible, because the date used for sorting is the last-modified date, which is distorted because the bot has fixed minor errors long after the archive was created. For example, Wikipedia:Administrators' noticeboard/IncidentArchive969 contains two comments saying "please do not edit the archives", and yet this bot did it anyway. I don't really care what problem the bots were trying to solve; they have broken Wikipedia's search mechanism, making it unusable. Ritchie333 (talk) (cont) 12:16, 3 February 2022 (UTC)
- Honestly Wikipedia search is doomed. @Ritchie333: The problem you mention (searching archives where 'last modified' is updated by bot edits) I've gotten around by filtering by creation date, which should roughly correspond to the real date of entries in the archive. ProcrastinatingReader (talk) 15:21, 3 February 2022 (UTC)
- The problem Ritchie333 describes has existed forever, and it happens whether bots clean up the archive page, humans do manual tidying, or Xfd-processing editors do it. Pages need to be edited when MW code changes; that's just reality. It's the search dating that is broken. – Jonesey95 (talk) 15:32, 3 February 2022 (UTC)
- That doesn't have to be reality if we don't make it reality. We could choose to have our reality be one where archived discussion pages (like AN, ANI, XfD) are never edited, by bots or anyone else. And if the consequence is that, 20 years from now, an editor's fancy signature shows up in the default font instead of the custom font that the editor originally intended, well... we'll just have to find a way to emotionally deal with that problem. Coming from someone who uses custom fonts in their signature, I'm confident that I can find a way to work through that problem. It might take some extra therapy sessions, but I think I can do it. —ScottyWong— 17:36, 4 February 2022 (UTC)
- For what it's worth -- I think that this type of discussion often ends up with a pessimistic bent because people who don't see a problem don't care enough to comment about it -- I don't see a problem. Okay, maybe it breaks search: this seems like a potential real problem, but the deeper problem is that search sucks if it gets broken by this. I don't see why you would keep a page on your watchlist for ten years, besides the fact that ten years ago there wasn't a temporary-watchlist feature. It's not like there is any benefit to watchlisting an AfD that expired ten years ago -- unless there is? Is there? jp×g 20:12, 8 February 2022 (UTC)
Why not resolve this in MediaWiki?
I seem to recall seeing in some prior discussions (on WP:VPT IIRC, though I could not find the discussion so apologies for not linking it) that MediaWiki was going to have an update at some point that would basically take the wikitext (bad HTML and all) and correct it for output. It seems like clogging up edit histories with tens of thousands (or probably millions when it's all said and done) of revisions to "correct" markup that can be massaged/fixed in the rendering pipeline is a massive waste of time and resources. —Locke Cole • t • c 03:27, 7 February 2022 (UTC)
- As opposed to clogging up the rendering pipeline by requiring translation, which would be a massive waste of time and resources for every person viewing every revision everywhere? :) Revisions in a database are fundamentally cheap, anyway.
- There was a brief time where one particular tag was translated, but it was quickly undone since it was used in places that were not compatible with the translation, among other reasons. Izno (talk) 04:40, 7 February 2022 (UTC)
- Considering rendering is only an issue for pages that are edited regularly, and most of the pages with errors seem to be old/stale discussion pages, I'm not convinced mass bot edits is somehow better. —Locke Cole • t • c 05:34, 7 February 2022 (UTC)
- Locke Cole, the opposite of what you suggest is what actually happened. The code that renders pages had been silently correcting syntax errors for years, and when a new renderer was deployed some of those workarounds were not carried over. Hence Special:Linterrors, which flags errors that could cause rendering errors (most of which have been fixed by gnomes and bots since 2018) as well as conditions that will presumably have their workarounds removed at some point. For a deep dive, see mw:Parsing/Replacing Tidy. – Jonesey95 (talk) 04:52, 7 February 2022 (UTC)
- @Jonesey95: Thank you for the pointer. What drew my attention to this was this edit which replaced
<tt>
with<code>
tags which I thought was an odd thing to be done on a mass scale (semi-automated or not). —Locke Cole • t • c 05:45, 7 February 2022 (UTC)
The funny thing is that this doesn't even need to be fixed in MediaWiki, because no major browser in the world has trouble with HTML4 tags. If this bot didn't fix it, and MediaWiki software didn't fix it, then your browser would fix it. If anything, these issues should be fixed in article space only. I've still heard no legitimate reason why it's considered valuable to fix font tags on 12 year-old closed AfDs. —ScottyWong— 05:54, 7 February 2022 (UTC)
- True, in the case of
<tt>
, which rabbit-holed me to this discussion, MDN still shows the tag as being fully supported in every major browser on the market. Being deprecated in the spec doesn't mean we should chase down issues that don't exist. —Locke Cole • t • c 06:11, 7 February 2022 (UTC)
- "I've still heard no legitimate reason" is basically a fallacy, because I can say the same thing and have it be just as true.
- As for browsers, that's true today. For the same reason MediaWiki developers can shut off an arbitrary tag, so too could the browsers. And they have done it before, which cannot be said of the developers.
- Never mind that mobile today already applies a CSS reset to a bunch of the old tags—<tt> and <small> off the cuff, both of which render as normal text. Izno (talk) 06:15, 7 February 2022 (UTC)
- Those are all great reasons to fix these deprecated tags within article space. However, what is the value in making millions of edits to fix these deprecated tags on old AfD pages that closed over a decade ago? If anyone can provide one good reason why that would be valuable, I'll gladly shut up. I don't think it's fallacious to ask "why?" and hope for a cogent answer. —ScottyWong— 17:12, 7 February 2022 (UTC)
- I have, on multiple occasions, needed to check an old AFD log page (generally when someone improperly transcludes a TFD deletion template), and when there are linter errors on the page it takes ages to sort out where they're coming from in order for me to a) fix them, and b) find what I was originally looking for. To me, that is reason enough to fix old things.
- On the "watchlist spam" topic, I go through about once a year and clear out about half my watchlist of things that I will probably never need to see again, many of which are deletion discussion pages. Primefac (talk) 17:21, 7 February 2022 (UTC)
Request that Malnadachbot be limited to one edit per page
The fixing of lint errors continues ad infinitum, hopefully all of those AfDs from 12 years ago will display nicely for the hordes of editors that are reading them. Anyway, in all seriousness, it has come to my attention that MalnadachBot is making multiple edits per page to fix lint errors. My RfA page keeps popping up in my watchlist periodically. The history of that page shows that Malnadachbot has now made 8 separate edits to the page to fix lint errors. Five edits were made under Task 2, and three edits were made under Task 12. All of the edits are extremely similar to each other, and there is no reason that they couldn't be made in a single edit, if the bot owner had any idea how to properly use regex and AWB. How many more edits will he need to make to this 10 year-old archived page before it is free of lint errors? I honestly feel like this is evidence that User:ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ is not a competent enough bot editor to carry out this task properly. At the very least, if there isn't support to remove his bot access, I'd like to request that his bot task approvals make it extremely clear that he must not make more than one edit per page to fix these lint errors. —ScottyWong— 16:31, 2 March 2022 (UTC)
- Note the comment by @ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ:
To clarify, the code I have to correct font tags (i.e general purpose regexes to correct font tags) and some other Lint errors works fine most of the time, but gives some false positives which makes it not suitable for use in a fully automated task like this. You can read the first BRFA and this discussion for why I do not use such code with my bot.
― Qwerfjkltalk 16:40, 2 March 2022 (UTC) - I don't see this as a problem, these are handled in waves and seem reasonable edits still. This bot appears to be properly asserting the "bot" flag on the edits, so anyone that doesn't want to see bot edits on their watchlist is able to easily filter them out. As these are appear to require manual definitions for signature fixes, it isn't reasonable to expect that every case would be per-identified. Now, if these were happening very rapidly, like 8 edits in a day perhaps we should address it better, but when that is spread out over many months per page I don't. — xaosflux Talk 16:44, 2 March 2022 (UTC)
- Note: this isn't an endorsement or condemnation on the suitability of the task(s) in general (which is being discussed above) - just that I don't see this specific constraint as a good solution. — xaosflux Talk 16:46, 2 March 2022 (UTC)
- I don't see any reason why it wouldn't be possible to fix these issues in an automated way without requiring multiple edits or manual intervention. Anyone with a decent understanding of regular expressions can do this. This is not a complicated problem to solve for a competent coder. If this bot operator claims that he is not capable of fixing all the errors on a page in a single edit, or that his code is so inefficient that it produces "false positives" and requires him to manually supervise every edit, then I think we should find a different bot operator. I'll volunteer to take over the tasks if no one else wants to. FYI - the bot operator himself (not the bot) has now manually edited my old RfA page and has claimed to fix all of the lint errors on the page. —ScottyWong— 17:00, 2 March 2022 (UTC)
- @Xaosflux: It would be one thing if this bot was going through pages "in waves" and fixing different issues each time. That's not what's happening here. The bot is going to the same page to make multiple edits to fix different instances of the same issue. This is unnecessarily filling up watchlists, clogging up edit histories, and changing the last modified date of old archived pages, among other problems. If a page has 10 instances of the same exact lint error, there is no technical reason (besides poor coding) that it should take more than one edit to fix all 10 instances. I realize I'm probably being annoying by continuing to complain about this bot operator and the tasks he's carrying out, but it really is supremely annoying to me. —ScottyWong— 19:57, 2 March 2022 (UTC)
- The bot waves appear to be along the lines of "fix this batch of signatures" not "fix all instances of lint error:n" or even harder "fix all lint errors of all known types". I understand you don't like this, but I know the signature fixes can be a pain to deal with and multiple waves are often the best way to tackle them on an ad-hoc type basis. As far as the problems you identified, clogging watchlists is the most likely to get quick admin action - but as bot flags are being asserted it seems fairly trivial. I don't see any serious problems with last touch dates or a few extra revisions on any one page being a significant problem. Building a better bot is almost always universally welcomed, but stopping improvement while waiting for such to materialize usually isn't. — xaosflux Talk 22:58, 2 March 2022 (UTC)
- Exactly. We should not let the perfect be the enemy of the good. After all, the wiki model works by the principle of incremental improvement. Before I submitted my first BRFA, I looked at all previous en.wp Lint-fixing bot task attempts to see what was workable. The successful tasks involved a batch of specific patterns per BRFA; failed tasks involved bot operators trying to fix everything and giving up after realising the scale of the problem. Running the bot in multiple waves using a divide-and-conquer approach is the only realistic way to reduce the backlog. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:28, 3 March 2022 (UTC)
- @Scottywong: I have manually fixed all 95 or so Lint errors in your RFA. It took me 15 minutes with script-assisted editing. Do you really expect a bot to get them all in a single edit? It really isn't true that
there is no reason that they couldn't be made in a single edit, if the bot owner had any idea how to properly use regex and AWB
. I have more experience with bot fixing of Lint errors than anyone else; I am running the bot as efficiently as it is possible to do without leaving behind false positives. Even so, let me point out that the bot has fixed 1.4 million Lint errors since the time this thread was opened. RFA pages by their nature have a lot more signatures than most wiki pages, so the bot revisits them more than usual. This task is far more complicated than it looks. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 17:06, 2 March 2022 (UTC)
- What you're saying makes no sense. I understand regex quite well, and I used to operate a bot on Wikipedia that fixed similar issues (it probably also fixed millions of individual issues, but it didn't need to make millions of edits to do so because it was properly coded). This is not more complicated than it looks; in fact, it's not complicated at all, it's simple regex find and replace. There is no technical reason why a properly coded bot cannot fix all of these lint issues in a page with a single edit, without human intervention or supervision. If your regex was properly designed, the risk of false positives would be extremely low. In the case of my RfA page, your bot made multiple edits to fix different instances of the exact same problem. In this edit, you fix six different instances of
<font color="something">Text</font>
. Then, a few weeks later, you make this edit to fix 14 more of the exact same issue. Why did your code miss these issues on the first pass (or, more specifically, on the first 8 passes)? —ScottyWong— 18:31, 2 March 2022 (UTC)
- It was not fixing different instances of the exact same problem in your RFA page. The edits from task 2 were to fix Special:LintErrors/tidy-font-bug. Unlike the more numerous font tags inside a link, errors of this type already make a visual difference in the page. If you look at all the edits done in this task, the ones on lines 109 and 134 would be difficult for a bot to apply correctly in a single edit with the others in the same task, if it was targeting a general pattern. No matter how well designed any regexes are, they cannot catch all of these in a single edit. For RFAs and other pages with a lot of signatures, we can only reduce the number of bot edits by using a larger batch of replacements, which I am already doing. You should spend some time fixing Lint errors to get an understanding of the problem; you wouldn't be casually dismissing this as an easy task then. Please submit your own BRFA if you think it is so simple to fix all errors in a single edit. ಮಲ್ನಾಡಾಚ್ ಕೊಂಕ್ಣೊ (talk) 05:01, 3 March 2022 (UTC)
- Scottywong: Wikipedia:Village pump (technical)/Archive 70 is a sample page with approximately 180 Linter errors, nearly all of them font-tag-related. I encourage you to try to create a set of false-positive-free regular expressions that can fix every Linter font tag error on that page, and other VPT archive pages, with a single edit of each page. If you can do so, or even if you can reduce the count by 90% with a single edit, you will be a hero, and you may be able to help Wikipedia get out from under the burden of millions of Linter errors much more quickly. Here's a sample of what you'll be looking at:
<font face="Century Gothic">[[User:Equazcion|<span style="color:#000080">'''Equazcion'''</span>]] <small>[[User talk:Equazcion|'''<sup>(<span style="color:#007BA7">talk</span>)</sup>''']]</small> 02:29, 16 Jan 2010 (UTC)</font> [[User:IP69.226.103.13|<font color="green"><strong>IP69.226.103.13</strong></font>]] | [[User talk:IP69.226.103.13|<font color="green"><strong>Talk about me.</strong></font>]] [[User:Terrillja|<font color="003300">Terrillja</font>]][[User Talk:Terrillja|<font color="black"><sub> talk</sub></font>]] [[User:December21st2012Freak|<font color="#922724">'''December21st2012Freak'''</font>]] <sup>[[user talk:December21st2012Freak|<font color="#008080">''Talk to me''</font>]]</sup> <font face="monospace" color="#004080">[[User:Flowerpotman|<span style="color:#004080; font-variant:small-caps">FlowerpotmaN</span>]]·([[User talk:Flowerpotman|t]])</font> <font face="Myriad Web">'''[[User:Mrschimpf|<span style="color:maroon">Nate</span>]]''' <span style="color:dark blue">•</span> <small>''([[User_talk:Mrschimpf|<span style="color:dodgerblue">chatter</span>]])''</small></font> <font face="Baskerville Old Face">[[User:the_ed17|<font color="800000">Ed</font>]] [[User talk:the_ed17|<font color="800000">(talk</font>]] • [[WP:OMT|<font color="800000">majestic titan)</font>]]</font> <font style="font-family: Vivaldi">[[User:Intelligentsium|<span style="color:#013220">Intelligent</span>]]'''[[User_talk:Intelligentsium|<span style="color:Black">sium</span>]]'''</font> <font color="blue"><sub>'''[[User_talk:Noetica |⊥]]'''</sub><sup>¡ɐɔıʇǝo</sup><big>N</big><small>oetica!</small></font><sup>[[User_talk:Noetica |T]]</sup> [[User:Ruhrfisch|Ruhrfisch]] '''[[User talk:Ruhrfisch|<sub><font color="green">><></font></sub><small>°</small><sup><small>°</small></sup>]]''' <font color="#A20846">╟─[[User:TreasuryTag|Treasury]][[User talk:TreasuryTag|Tag]]►[[Special:Contributions/TreasuryTag|<span style="cursor:help;">directorate</span>]]─╢</font> [[User:Screwball23|<font 
color="0000EE">Sc</font><font color="4169E1">r</font><font color="00B2EE">ew</font><font color="FF6600">ba</font><font color="FFFF00">ll</font><font color="9400D3">23</font>]] [[User talk:Screwball23|talk]] <font color="32CD32">''[[User:Jéské Couriano|Jeremy]]''</font> <font color="4682B4"><sup>([[User talk:Jéské Couriano|v^_^v]] [[Special:Contributions/Jéské Couriano|Boribori!]])</sup></font> [[User:Masem|M<font size="-3">ASEM</font>]] ([[User Talk:Masem|t]]) <span style="border:2px solid black;background:black;-webkit-border-radius:16px;-moz-border-radius:16px;color:white;width:20px;height:20px">([[user talk:Flyingidiot|<font color="white">ƒ''î''</font>]])</span><span style="position:relative;top:12px;left:-20px;">[[user:flyingidiot|<font color="black">»</font>]]</span> '''[[User:Floydian|<font color="#5A5AC5">ʄɭoʏɗiaɲ</font>]]''' <sup>[[User_talk:Floydian|<font color="#3AAA3A">τ</font>]]</sup> <sub>[[Special:Contributions/Floydian|<font color="#3AAA3A">¢</font>]]</sub>
- I omitted some easy ones. Note that the page should look the same (or better) when you are done compared to how it looks now. There are more types of Linter errors on that page, but if you can fix the font tags in signatures all at once in an automated or semi-automated fashion, that would be outstanding. I picked the above page at semi-random, knowing that VPT pages tend to have large numbers of interesting signatures; there are thousands of discussion pages with this level of complexity. – Jonesey95 (talk) 05:56, 3 March 2022 (UTC)
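As a hypothetical illustration of the kind of single-pass replacement being discussed above (not any actual bot's code), a regex that rewrites the deprecated `<font>` wrappers in those signatures as `<span style=...>` might look like this; it handles only the simple attribute forms shown and would need much more vetting before running unattended:

```python
import re

# CSS property for each deprecated <font> attribute.
_ATTR_TO_CSS = {"color": "color", "face": "font-family", "size": "font-size"}

# Rough mapping from the legacy 1-7 font size scale to CSS keywords.
_SIZES = {"1": "x-small", "2": "small", "3": "medium", "4": "large",
          "5": "x-large", "6": "xx-large", "7": "xxx-large"}

_FONT = re.compile(r'<font\s+([^<>]*?)>((?:(?!</?font).)*)</font>',
                   re.IGNORECASE | re.DOTALL)
_ATTR = re.compile(r'(color|face|size)\s*=\s*(?:"([^"]*)"|([^\s">]+))',
                   re.IGNORECASE)

def font_to_span(wikitext: str) -> str:
    """Rewrite <font ...>...</font> pairs as <span style="...">...</span>."""
    def repl(m: re.Match) -> str:
        css = []
        for name, quoted, bare in _ATTR.findall(m.group(1)):
            name, value = name.lower(), quoted or bare
            if name == "size":
                value = _SIZES.get(value, value)
            elif name == "color" and re.fullmatch(r'(?:[0-9A-Fa-f]{3}){1,2}', value):
                value = "#" + value  # bare hex like 003300 needs a leading #
            elif name == "face" and " " in value:
                value = f"'{value}'"  # multi-word font names need quoting in CSS
            css.append(f"{_ATTR_TO_CSS[name]}:{value}")
        return f'<span style="{"; ".join(css)}">{m.group(2)}</span>'

    # The regex only matches pairs with no nested <font> inside, so each pass
    # converts the innermost tags; repeat until no <font> pairs remain.
    prev = None
    while prev != wikitext:
        prev, wikitext = wikitext, _FONT.sub(repl, wikitext)
    return wikitext
```

For example, `font_to_span('[[User:Terrillja|<font color="003300">Terrillja</font>]]')` yields `[[User:Terrillja|<span style="color:#003300">Terrillja</span>]]`, which is the transformation Linter asks for on the page above.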
Inactive bots - February 2022
It looks like we have a few inactive bots by definition (both account and operator haven't edited in 2 years) according to the Majavah report. (Majavah, if there's a listed operator, could you get their contribs or something and add that to the report too? Maybe even a column to indicate mutual activity, e.g. {{cross}} where neither is active and {{check}} where both are. This was painful. :)
Pending removals
- The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- Detroiterbot run by MJCdetroit
- FeedBot run by Feedintm
- GrashoofdBot run by Grashoofd
- DpmukBOT run by Dpmuk
- Image-req-proj-bot run by Traveler100
- FPBot run by FunPika
- PeerReviewBot run by CBM
- ExpertIdeasBot run by I.yeckehzaare
- タチコマ robot run by とある白い猫
- Theo's Little Bot run by Theopolisme
- Wiki Feed Bot run by Fako85
- Fz29bot run by Fz-29
- KadaneBot run by Kadane
- Required notifications
All operators notified on their talk pages. — xaosflux Talk 14:22, 1 February 2022 (UTC)
Forward outlook
And a few more in a month or so:
- PDFbot run by Dispenser
- Makecat-bot run by Makecat
- TAP Bot run by Thine Antique Pen
- BG19bot run by Bgwhite
- ProteinBoxBot (blocked also) run by Andrew Su and Julialturner
- TheMagikBOT run by TheMagikCow
Long inactives but not outside of policy
We should consider asking the active bot operators whether all the rest of the bot accounts that haven't edited in a long while (pick a number; 5 years seems fine generally) still need a bot flag. These particularly stand out...:
- SprinterBot is a redirect to RscprinterBot
- CeraBot is a redirect to Cerabot~enwiki
- Citation bot 1, Citation bot 2, Citation bot 3, Citation bot 4 subsumed? by Citation bot
- Helpful Pixie Bot and Femto Bot (operator presently blocked)
- Legobot II (blocked)
- TohaomgBot (blocked)
- KasparBot (blocked)
- FRadical Bot (blocked, and the operator CU-blocked)
- Cydebot (blocked)
Discussion
These ones seem like low-hanging fruit, but I think the rest should be queried as well. Izno (talk) 05:30, 1 February 2022 (UTC)
- This was actually the subject of a proposal I was about to make -- I think it might be useful for a bot task that monitored the on-wiki activity of bot maintainers. Either "last edit date", or "edits in last 180 days", or something along those lines -- this could either go on a central status page or on the infobox of a bot's userpage. Oftentimes, the writers and maintainers of bots will go inactive for a long time, which makes it a little discouraging to report bugs / suggest new features, as it's likely they will fall into a black hole. jp×g 07:45, 1 February 2022 (UTC)
- This seems like a good idea, but relies on something we don't actually have: a programmatic listing of all bot:operator relationships! I'd actually like to see a page/table for this (which could then also possibly be bot-updated with the last edit/action dates of each periodically) - but bootstrapping it takes some effort. — xaosflux Talk 10:53, 1 February 2022 (UTC)
- Picking a couple recently-approved BRFAs at random, it looks like "Operator:" is formatted relatively consistently between 2013 and now -- I might be able to scrape through these and come up with something. Maybe something that had to be manually verified, but it'd be something. jp×g 11:36, 1 February 2022 (UTC)
- Hell, even this one from 2007 has the operator noted the same way. jp×g 11:38, 1 February 2022 (UTC)
- Here's an idea, your idea might be better though: there are bot userboxes that are posted on the owner's page and indicate the operated bot as one of the parameters, for example {{Template:User bot owner|Acebot}}. There are probably userboxes or templates for the bot's page too. These templates also populate categories such as Category:Wikipedia bot operators. –Novem Linguae (talk) 11:43, 1 February 2022 (UTC)
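The "Operator:" scraping suggested above could start from something like the function below. This is a hypothetical sketch: it assumes the conventional ";'''Operator:''' [[User:Example|Example]]" line used on BRFA subpages, and as noted in the thread, the results would still need manual verification.

```python
import re
from typing import Optional

def parse_operator(brfa_wikitext: str) -> Optional[str]:
    """Extract the operator's username from a BRFA's "Operator:" line.

    Assumes the conventional format used on Wikipedia:Bots/Requests for
    approval subpages, e.g. ";'''Operator:''' [[User:Example|Example]]".
    Returns None when no such line is found.
    """
    m = re.search(
        # "Operator:" (possibly bolded), then the first [[User:...]] link;
        # the capture stops at "|", "]", "#", or "/" so only the name is kept.
        r"Operator\s*:?'*\s*\[\[\s*[Uu]ser(?:[ _]talk)?\s*:\s*([^|\]#/]+)",
        brfa_wikitext,
    )
    return m.group(1).strip() if m else None
```

Fetching each BRFA's wikitext (e.g. via action=raw or the API) and feeding it through this parser would give a first draft of the bot:operator table to be checked by hand.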
- It's only somewhat related but I'd like to see a list of active approved bot tasks. There are lots of pages in Category:Approved Wikipedia bot requests for approval but this includes tasks that finished, op went inactive, are obsolete, consensus changed, bot went down, etc...
- A useful way would be for the closing BAG to add a characteristic of a task (e.g. the task # in edit/action summaries) to Wikipedia:Bot activity monitor going forward, which would achieve that. Then maybe old tasks can be added on an ad-hoc basis. ProcrastinatingReader (talk) 12:40, 1 February 2022 (UTC)
- It would likely be a "going forward" for now, with potential retroactive categorisation later, but we could always base it on the "Edit period(s)" section of the task - if it's a One-Time-Run, sub-categorise it as such, otherwise put it into some sort of "ongoing" subcat. Primefac (talk) 20:53, 1 February 2022 (UTC)
- I left operator notices for the first batch, if no response in a week we will deflag them and mark as {{retired|bot=yes}}. — xaosflux Talk 10:47, 1 February 2022 (UTC)
- xaosflux: Regarding SprinterBot, I believe there is little realistic prospect of resurrecting this bot's tasks, so feel free to deflag the account. Rcsprinter123 (comment) 17:32, 1 February 2022 (UTC)
- I added a column to User:MajavahBot/Bot status report for the last recorded activity (any edit or publicly logged action) for any listed operator. Majavah (talk!) 11:05, 1 February 2022 (UTC)
Question about making edits for semi-automation
I recently posted on the help desk (Wikipedia:Help desk#Confusion on bot policy regarding semi-automated editing) where I explained my confusion on how one should approach/implement semi-automated edits. You may read the section I linked if you wish to. In a nutshell, I would like to semi-automate edits from a script I am running. I do not wish to create a bot account and would like to have the edits be on my main account (once I get more comfortable with the API I may consider expanding the script and may look into bot creation).
My requirements seem simple. However, according to mw:API:Edit I require a CSRF token, but apparently I can't just lazily do mw.user.tokens.get( 'csrfToken' )
and call it a day, because that does not work. As the code samples demonstrate, there is a 4-step process. Fine, but I'm confused about the second step (POST with lgname+lgpassword+lgtoken), because those parameters are to be obtained from Special:BotPasswords, which states "Make sure you are logged into your bot's account, and not the owner's account, before creating a bot password", and as per WP:SEMIAUTOMATED, "A bot account should not be used for assisted editing, unless the task has been through a BRFA". So, I should not use a bot account, but I still need a "bot password" from an account which it seems to imply cannot be my own?
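For reference, the 4-step flow from mw:API:Edit looks roughly like this in Python using only the standard library. This is a sketch, not tested against the live API: the credentials come from Special:BotPasswords on whichever account will make the edits (lgname has the form "YourUserName@AppName"), and the page title and text are placeholders.

```python
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

API = "https://en.wikipedia.org/w/api.php"

# A single opener with a cookie jar stands in for a logged-in session.
_opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(CookieJar())
)

def _post(data: dict) -> dict:
    """Send one form-encoded POST to the API and decode the JSON reply."""
    body = urllib.parse.urlencode({**data, "format": "json"}).encode()
    req = urllib.request.Request(
        API, body, headers={"User-Agent": "semiauto-sketch/0.1"})
    with _opener.open(req) as resp:
        return json.load(resp)

def edit_page(username: str, password: str,
              title: str, text: str, summary: str) -> dict:
    """Log in with a BotPassword and save one edit under that account."""
    # Step 1: fetch a login token.
    r = _post({"action": "query", "meta": "tokens", "type": "login"})
    login_token = r["query"]["tokens"]["logintoken"]

    # Step 2: log in. lgname/lgpassword come from Special:BotPasswords.
    r = _post({"action": "login", "lgname": username,
               "lgpassword": password, "lgtoken": login_token})
    if r["login"]["result"] != "Success":
        raise RuntimeError(f"login failed: {r['login']}")

    # Step 3: fetch a CSRF token from the now-authenticated session.
    r = _post({"action": "query", "meta": "tokens"})
    csrf_token = r["query"]["tokens"]["csrftoken"]

    # Step 4: make the edit.
    return _post({"action": "edit", "title": title, "text": text,
                  "summary": summary, "token": csrf_token})
```

The cookie jar is what makes steps 2-4 hang together: the CSRF token in step 3 is only valid for the session that logged in during step 2.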
Side note: The MediaWiki JS sample code works fine. What I do not like about this, however, is that it needs to be done in a JS console on the wiki. I'd much prefer to have a script running in a terminal (i.e. using NodeJS/Python).
Side side note: Psst, while I have you nerds here, can someone point me to some docs explaining how I can achieve the functionality reFill achieves by sending you to a page that shows a diff with some changes made, without making those changes? My script updates statistics; since I am semi-automating it, I would love for the script to run and then show a diff so that I can visually confirm what it has done and then just publish the changes. Satricious (talk) 16:09, 20 February 2022 (UTC)
- I think that I do something vaguely similar with awb. IANA periodically update their language-subtag-registry file. My AWB script reads that file and then updates Module:Language/data/iana languages, Module:Language/data/iana scripts, Module:Language/data/iana regions, Module:Language/data/iana variants, Module:Language/data/iana suppressed scripts, and Module:Language/data/ISO 639-1 from the subtag registry. Because it is not fully automated, I see a diff of the change before I manually save each module's update – all of this within awb. So, you might want to consider that as an option.
- —Trappist the monk (talk) 16:44, 20 February 2022 (UTC)
- Thanks for your input! I had no idea AWB was capable of that; I now see that as an option worth exploring. While that might actually be the best solution to my problem, I'd still love to hear from others about how I could achieve this using the API, just because I'd prefer to do this myself without the use of a tool which I feel might restrict me. But you've definitely convinced me to try out AWB at some point in the future :p Satricious (talk) 17:25, 20 February 2022 (UTC)
- Special:BotPasswords is to be used while logged into the account from which you're planning to do the edits. That's usually a dedicated bot account for large-scale edit operations, but in your case ("semi-automation"), it would be just your own account. And by the way, you probably want to use a bot framework (see mw:API:Client code#JavaScript) which would avoid having to write boilerplate code for logging in, maintaining the session, handling tokens, etc which are both tedious and error-prone. – SD0001 (talk) 19:01, 20 February 2022 (UTC)
- Thanks for the clarification! I think my question has been answered now. I'd still like to know what API calls to make to show diffs before making changes (as reFill does). Do you know how I might achieve this? Satricious (talk) 05:08, 21 February 2022 (UTC)
- Like this (taken from User:Novem Linguae/Scripts/DraftCleaner.js): ― Qwerfjkl (talk) 07:15, 21 February 2022 (UTC)
function goToShowChangesScreen(titleWithNamespaceAndUnderscores, wikicode, editSummary) {
	let titleEncoded = encodeURIComponent(titleWithNamespaceAndUnderscores);
	let wgServer = mw.config.get('wgServer');
	let wgScriptPath = mw.config.get('wgScriptPath');
	let baseURL = wgServer + wgScriptPath + '/';
	// https://stackoverflow.com/a/12464290/3480193
	$(`<form action="${baseURL}index.php?title=${titleEncoded}&action=submit" method="POST"/>`)
		.append($('<input type="hidden" name="wpTextbox1">').val(wikicode))
		.append($('<input type="hidden" name="wpSummary">').val(editSummary))
		.append($('<input type="hidden" name="mode">').val('preview'))
		.append($('<input type="hidden" name="wpDiff">').val('Show changes'))
		.append($('<input type="hidden" name="wpUltimateParam">').val('1'))
		.appendTo($(document.body)) // it has to be added somewhere into the <body>
		.submit();
}
- @Qwerfjkl: That works! Thanks a lot, I really appreciate it. Satricious (talk) 07:32, 21 February 2022 (UTC)
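For a terminal script rather than a browser, a similar unsaved preview is available from the action=compare API, which can diff a page's current revision against arbitrary proposed text without writing anything to the wiki. A rough stdlib-only sketch (untested against the live API; the page title and text are placeholders):

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def preview_diff(title: str, new_text: str) -> str:
    """Return an HTML diff of a page's live text versus new_text.

    Uses action=compare with totext-main, so nothing is saved and no
    login is required just to view the diff.
    """
    body = urllib.parse.urlencode({
        "action": "compare",
        "fromtitle": title,      # left side: the current revision
        "toslots": "main",       # right side: the proposed wikitext below
        "totext-main": new_text,
        "format": "json",
        "formatversion": "2",
    }).encode()
    req = urllib.request.Request(
        API, body, headers={"User-Agent": "diff-preview-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["compare"]["body"]
```

The returned HTML is the body of the same diff table the form-POST trick above renders, so a script could show it locally, wait for confirmation, and only then call action=edit.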
- This question has been answered. Satricious (talk) 07:32, 21 February 2022 (UTC)