
Wikipedia:Requests for comment/Archive.is RFC 4

From Wikipedia, the free encyclopedia
The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the conclusions reached follows.
There is a consensus to remove archive.is from the spam blacklist. Most support votes point out that this is a useful site for referencing, that it is often the only source, that it can always be re-blacklisted if future issues occur, and that spamming can be dealt with. The opposes mention issues with the service and the original problems that resulted in it being blacklisted, but on the weight of arguments and the number of votes, there appears to be a consensus that we should at least try un-blacklisting archive.is and see what happens. Mdann52 (talk) 14:41, 22 June 2016 (UTC)[reply]

Background


Archive.is is an archiving service similar to sites like WebCite and the Wayback Machine. archive.is is on the en.wikipedia.org spam blacklist (see Wikipedia:Spam-blacklist). This prevents its use as a reference source URL.

Based on the questions of consensus raised on the spam blacklist talk page, the community should discuss and vote on whether the previous consensus, established in Wikipedia:Archive.is RFC 3, should remain in force.

Effect of Blacklisting ("Blocking")


Article source links that refer to archive.is are currently disallowed. Any edit incorporating a new URL that starts with https://archive.is/ will result in the editor being returned to their draft changes (Edit view) with a warning notice which reads (in part):

Your edit was not saved because it contains a new external link to a site registered on Wikipedia's blacklist.
To save your changes now, you must go back and remove the blocked link (shown below), and then save.
...

The following link has triggered a protection filter: archive.is

(Here is a picture of the entire warning notice). The warning notice prevents the change referencing the archive.is URL from being saved.
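
For editors unfamiliar with the mechanism: the blacklist is enforced by the MediaWiki SpamBlacklist extension, which treats each line of the blacklist page as a regular-expression fragment and refuses to save any edit whose newly added external links match one. Below is a minimal sketch of that matching logic in Python; the regex entry shown is an illustrative example of the kind of line used for archive.is, not a verbatim copy of the actual blacklist entry.

    import re

    # One regex fragment per line, as on the blacklist page.
    # This entry is an assumed example, not the exact production rule.
    BLACKLIST = [r"\barchive\.is\b"]

    def first_blocked_url(new_external_links):
        """Return the first newly added URL matching a blacklist entry, else None."""
        patterns = [re.compile(p, re.IGNORECASE) for p in BLACKLIST]
        for url in new_external_links:
            for pattern in patterns:
                if pattern.search(url):
                    return url  # the edit is rejected with the warning shown above
        return None  # no match: the edit saves normally

    # An edit adding this citation URL would be rejected:
    print(first_blocked_url(["https://archive.is/2013/http://example.com/"]))

Note that only newly added links trigger the check; links already present in an article do not, by themselves, block unrelated edits to that article.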

Previous RFCs


For review, there have been three previous RFCs on this topic.

Editors are invited to consider them before voting.

Instructions


To add your vote,

  1. add a numbered entry under the appropriate section, like # short comment. --~~~~, or simply # --~~~~.
  2. (optional) further discussion can be added under the Discussion section.

For example, within the Support Vote section, an entry might read like

  1. archive.is is the bee's knees. --User1 (talk) 00:00, 1 May 2016 (UTC)

or, within the Oppose Vote section

  1. archive.is is the first horse of the apocalypse. --User2 (talk) 00:00, 1 May 2016 (UTC)

Voting


Support Vote

  1. archive.is is desperately needed because sources are rotting, and the reasons for blocking are unfounded. --JamesThomasMoon1979 05:23, 18 May 2016 (UTC)[reply]
    Citation needed Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  2. Support Per Jtmoon. archive.is is the best archive we have. Archives pages better than any other site, and archives more pages. Was blocked because an unauthorised bot inserted archive URLs pointing to archive.is. Similar to what the authorised bot CyberBot II now does for archive.org. There was never any concrete evidence of involvement of archive.is with the unauthorised bot (or CyberBot II with archive.org). No reason to believe there would be any problem if the site was removed from the blacklist. Hawkeye7 (talk) 06:29, 18 May 2016 (UTC)[reply]
    WP:PERX Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
    Supporting the change in wording. The whole point is to enable the use of archive.is. Hawkeye7 (talk) 22:26, 20 May 2016 (UTC)[reply]
  3. Indeed, other archiving services sometimes fail to archive parts of sites (usually "Web 2.0", but WebCite fails on even simple parts like the "PREMIERE" indicator in Zap2it's TV schedule (the episode guide is messed up for the series in question, so that's not an option)) when archive.is manages to do so. There is no evidence for a "botnet", or it being affiliated with the site. For all we know, maybe they got what they wanted by getting archive.is blacklisted. nyuszika7h (talk) 06:45, 18 May 2016 (UTC)[reply]
    So you are electing to willfully ignore the numerous IP editors making the same types of actions as RotlinkBot for 5 edits and then hopping to another IP? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
  4. Support - Per Jtmoon and Hawkeye7. It seems counter-productive to blacklist what is obviously a useful tool. Sometimes it is the ONLY archive of a source available. Nothing sinister about it that I can see, so surely it's better than the alternative, which is sometimes nothing? Anotherclown (talk) 10:37, 18 May 2016 (UTC)[reply]
  5. Support - No serious reason for continued blocking. Various fears about archive.is have not materialized over a 2.5-year period, the site offers a useful service, it is not banned on other language wikis, and existing older links to it have not caused any serious damage.--Staberinde (talk) 15:12, 18 May 2016 (UTC)[reply]
    You mean other than archive.is being caught circumventing our ban again in February of this year? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
    When I looked at those discussions earlier, it looked like the suspected sockmaster was flagging existing archive.is/archive.today links; for example, this one is an edit that marks an archive.today link as dead. Is there any proof since the last RFC? PaleAqua (talk) 02:56, 27 May 2016 (UTC)[reply]
  6. Support — The blacklist was a very poor solution to a pretty minor problem. The site itself is not the issue, but rather how a few users chose to override bot-policy. It should not remain. Carl Fredik 💌 📧 19:35, 19 May 2016 (UTC)[reply]
    And then, after the ban, the bot (or sockpuppet) actions still continued in defiance of the ban? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
    support Problems with archive.is have not materialized. That said, I think that it is naive to think that the problems people had with archive.is weren't actual problems caused by folks associated with archive.is. I was the closer of the first RfC. Hobit (talk) 16:00, 21 May 2016 (UTC)[reply]
    You mean other than them getting caught bulk-inserting themselves in February of this year? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
    Cite? I vaguely recall it, but didn't think we had any solid reason to think it was them. Hobit (talk) 00:44, 2 June 2016 (UTC)[reply]
    @Hobit: Wikipedia:Administrators'_noticeboard/Archive279#Sockmaster_flagging_archive.today_links_en_masse, first an account, then many IPs (from all over the world), editing completely cross-wiki. This is a similar way of working as originally with User:Rotlink and the bot account (first two accounts, then many IPs, operating from all over the world and operating on all wikis), and quite close to a situation in April 2015 (no accounts, just 4 IPs from 4 different countries). In all these cases, the editors are SPA for archive.is-related issues. (and in the last few days there is a series of IPs from different countries who are SPA on issues related to this RfC active - seems similar). I guess you could argue that they are all different people, and that these IPs of this week are not operated by the same editor that was active in February 2016, or April 2015 or in 2013 in the original case, and you can argue that it is not someone who is financially benefiting from having the incoming links (though, archive.is is privately funded - if they don't get linked from Wikipedia who is paying their traffic .. still, it might just be a friend or a fan), in the end it is still the case that someone who has access to hundreds (if not thousands) of IPs is editing here in total neglect of our pillars (and if it is someone who is benefiting financially from being linked from Wikipedia, in violation of MediaWiki's Terms of Use). --Dirk Beetstra T C 09:06, 2 June 2016 (UTC)[reply]
    Striking support, moving to oppose. Hobit (talk) 02:08, 3 June 2016 (UTC)[reply]
  7. Support. Archive.is is a legitimate archival service. The unauthorized bot incident was sad; the community here was unsettled. But the response to it was akin to terminating all relations with a country just because a tourist from that country broke a law. It is a good service. Let's put it to use. Best regards, Codename Lisa (talk) 10:50, 22 May 2016 (UTC)[reply]
  8. Support. BTW, I'm sick and tired of the carpet bombing by the malicious 'Blacklisted link' template flagging articles for no good reason. Poeticbent talk 06:43, 23 May 2016 (UTC)[reply]
  9. Support – Archive.is is a legitimate archiving service, unlike Internet Archive which retroactively deletes pages based on changes to robots.txt after archiving, and WebCite which sometimes does not archive pages correctly. Archive.is is also able to create archives from Google caches. I have run into many occasions when repairing dead links where Archive.is is the only archiving service available. The benefits of adding Archive.is back outweigh the minor disadvantages. SSTflyer 07:48, 23 May 2016 (UTC)[reply]
  10. Support, in large part because the whole thing was a big overreaction, as Codename Lisa notes. Plus, it's a benefit if they don't follow archive.org's policy of dumping pages based on later robots.txt changes. Nyttend (talk) 12:51, 23 May 2016 (UTC)[reply]
  11. Support - if it's being used for spammy links, those need to be dealt with individually. The service is way too useful to encyclopedic purposes - David Gerard (talk) 13:21, 23 May 2016 (UTC)[reply]
  12. Support—as noted above, the original decision was an overreaction, the site is useful, and we can deal with case-by-case situations of spamminess. Imzadi 1979  13:28, 23 May 2016 (UTC)[reply]
  13. Weak support for last resort There's no question that archive.is handles more scenarios, as the others haven't adapted to newer web technologies. I am concerned about the lack of respect that archive.is shows for internet standards, particularly ignoring robots.txt. Content producers have the right to control what they produce, and often use that file to control it. archive.is, in part, can archive more links because it ignores the expressed wishes of content producers that use that file to control access. And yes, I read their FAQ. My preference would be to encourage archive.is only as a last-resort archival service. Ravensfire (talk) 14:17, 23 May 2016 (UTC)[reply]
    @Ravensfire: You forgot about the case where a domain expires and/or is taken over. Parked domains usually have a robots.txt that prevents crawling, and if it's taken over, the attacker can do so manually. Archive.is has a manual report process if one would like an archived copy to be taken down. There is only limited automated archiving (from Wikipedia recent changes, I don't know of anything else), other than that it's manual so it doesn't really make sense to apply robots.txt. Anyway, I don't mind a preference for other archives when not inferior. nyuszika7h (talk) 14:30, 23 May 2016 (UTC)[reply]
  14. Support per all previous supports. (Note I am a long-time user who now prefers to edit as an IP. I don't see anything about IPs not being allowed to weigh in here.) — Preceding unsigned comment added by 173.17.170.8 (talk) 17:08, 23 May 2016
    IPs are not debarred from participating in RfCs; however, you are still expected to WP:SIGN your posts. --Redrose64 (talk) 19:49, 23 May 2016 (UTC)[reply]
  15. Support Any issues can be sorted locally. Only in death does duty end (talk) 19:51, 23 May 2016 (UTC)[reply]
  16. Support like any other site on the web, spam links can be removed case by case. The bot stuff is in the past now. Pinguinn 🐧 20:14, 23 May 2016 (UTC)[reply]
  17. Support per virtually everyone. Appears to be a useful way to prevent link rot. Regards, James (talk/contribs) 22:18, 23 May 2016 (UTC)[reply]
  18. Support— as stated above, it's useful and links have to be dealt with individually. -- Hakan·IST 07:04, 24 May 2016 (UTC)[reply]
  19. Support per all above me.--The Traditionalist (talk) 16:14, 24 May 2016 (UTC)[reply]
  20. Support but only where other archival services are insufficient. If there's a worry about permanence, then create a bot that submits all archive.is archives used on wiki to the Internet Archive and replaces them. I don't buy the botnet argument. I think it's just run-of-the-mill spamming, either by a resourceful person or via social engineering. That said, I do agree that archive.is shouldn't be a preferred archival service, and in the vast majority of cases editors should be firmly encouraged to use the Internet Archive to take snapshots instead. —/Mendaliv//Δ's/ 21:47, 24 May 2016 (UTC)[reply]
    Mendaliv - that is the current practice (a slight soft reading of the previous RfC): links can not be used (as they are blacklisted), except when it can be shown that there is no replacement, in which case specific links can be whitelisted. If that is what you mean, then the links could as well stay on the blacklist. --Dirk Beetstra T C 03:13, 25 May 2016 (UTC)[reply]
    I support flipping the burden, then. Demonstrate that the links are available elsewhere, are needless, or are otherwise forbidden by policy and they can be removed. The bottom line is that there's no justification for keeping it on the blacklist. If it starts getting uncontrollably spammed, then we can consider re-adding it. As I say above, and have said elsewhere, there's no clear evidence of use of a botnet, no evidence to connect the spamming to the site operators, and no evidence that disfavoring its use as an archival service merits this extraordinary and unusual manner of preventing its use. And let's just assume for a moment that there is a real risk of the site going rogue: Then use Internet Archive to snapshot the archive.is page and link to Internet Archive instead (presuming there's some reason why we aren't just using IA to archive the live page). No risk of going rogue anymore. —/Mendaliv//Δ's/ 03:47, 25 May 2016 (UTC)[reply]
  21. Support, but only as an option of last resort. Kaldari (talk) 22:47, 24 May 2016 (UTC)[reply]
    What is this supposed to mean, Kaldari? If all else fails we remove it from the blacklist, or that the links should only be used if there is no alternative archive (the latter being the current practice)? --Dirk Beetstra T C 06:58, 25 May 2016 (UTC)[reply]
    That the links should only be used if there is no alternative archive. Kaldari (talk) 16:56, 26 May 2016 (UTC)[reply]
  22. Support. Not only because of the above (I especially agree with Codename Lisa) but for similar reasons to those for which I updated my position in RFC 3. In regard to possible continuing abuse, is it possible to have an edit filter tag posts that add the links? PaleAqua (talk) 05:25, 25 May 2016 (UTC)[reply]
    Since there seem to be several questions about it: I am already aware of the February issues that have been mentioned several times, but I don't see proof tying the owner of the site to them, as well as stuff like Wikipedia:Requests for comment/Archive.is RFC/Rotlink email attempt. This is one of those issues where there is some strange duckness and anti-duckness combined with a lot of what seemed like groupthink in earlier RfCs. To me, unfortunately, a lot of the origins of this seem to boil down to someone saying something akin to "I am the owner / a friend of the owner and did this because XYZ" and others saying something similar like "They are not, but I have washed my hands of this", etc. To be honest I'm not sure this is something that can be proven one way or the other, so the real questions are what the benefits and costs to Wikipedia are of continuing to blacklist the site or allowing links to it. There are places where archive.is does appear to be a better archiver; for example, in cases where a domain name has expired and the new domain name owner has a robots.txt file which causes the previous pages to be hidden in the Wayback Machine. I still think that ultimately a better solution would be an ISBN-type solution for archive links, possibly hooked to Wikidata, which would shift some of the issues related to archive.is and the potential advertising concerns, one of the concerns brought up in RFC 3. PaleAqua (talk) 06:39, 29 May 2016 (UTC)[reply]
  23. Support, it is a useful archive tool that preserves the formatting of pages more faithfully than WebCite and the Wayback Machine. The blacklisting was an overreaction. -- Ham105 (talk) 13:46, 25 May 2016 (UTC)[reply]
  24. Support because as an archive website that is used by many other websites, archive.is doesn't seem like one of these spam links that comprise most of the blacklist. I don't even fully understand why the site was blocked at all, though. Kylo, Rey, & Finn Consortium (formerly epicgenius) (talk) 19:56, 26 May 2016 (UTC)[reply]
  25. Support. All arguments I could make have already been made in the comments above.  — Scott talk 14:33, 27 May 2016 (UTC)[reply]
  26. Support It's a useful archiving service, for the reasons people have explained, and we shouldn't continue to block it over the bot incidents from the past. (That, and having 10,000 articles with a blacklist tag due to useful and hard-to-replace links both creates a massive backlog and invites trouble for reasons I won't explain per WP:BEANS.) TheCatalyst31 ReactionCreation 22:08, 27 May 2016 (UTC)[reply]
  27. Support I've never understood why we blocked it in the first place. If someone is spamming links to a useful service, block the user. Or maybe acknowledge that we have a severe need for archiving around here. --PresN 23:05, 28 May 2016 (UTC)[reply]
  28. Support, because I hate seeing those blacklist warning boxes at the top of articles that have a perfectly legitimate archive.is link in them. -- turdastalk 12:13, 30 May 2016 (UTC)[reply]
  29. Conditional support. That is, when the archive.is link was added by a human or there is no other archiving website that contains the page in question. Archive.is links added by automated bots or scripts should be replaced by a Wayback Machine or WebCite link if possible. Other than that, I see no reason to blacklist archive.is. --Proud User (talk) 13:15, 30 May 2016 (UTC)[reply]
  30. Agree with the above. An overreaction to a spammer. Jenks24 (talk) 18:54, 30 May 2016 (UTC)[reply]
  31. Strong support and support overturning consensus disfavoring archive.is links. As an encyclopedia that cites heavily to online resources, it's imperative that they remain verifiable over time, especially when they are unlikely to be available in print. As others have pointed out, the Wayback Machine chokes on certain sites and refuses to archive others. Worse, I recently learned that it actually drops previously archived pages in response to changed robots.txt files. Rebbing 00:19, 31 May 2016 (UTC)[reply]
  32. Strong support and this case highlights one of the many reasons why the spam blacklist is so awful in and of itself. Mdrnpndr (talk) 02:42, 31 May 2016 (UTC)[reply]
  33. Strong support as the initiator of the last RFC to reinstate Archive.is. Darkwarriorblake / SEXY ACTION TALK PAGE! 19:31, 31 May 2016 (UTC)[reply]
  34. Support - The whole saga of blacklisting archive.is has caused far more problems than it ever solved. Personally I prefer to use archive.org, but there is little point in attempting to change all of the archive.is links manually when there are more important things to do.--♦IanMacM♦ (talk to me) 10:56, 1 June 2016 (UTC)[reply]
  35. Support - we badly need web archiving, and unfortunately the Wayback Machine and WebCite have proved unreliable in the past. At the moment Archive.is is often the only game in town, and unless we do something about that, we need to face it. aegis maelstrom δ 16:51, 1 June 2016 (UTC)[reply]
  36. Strong support. As an editor using workarounds like this I am sick and tired of this block interfering with efforts like these, making it harder to improve the encyclopedia. The grudge-holders' and slanderers' arguments mostly don't check out when delved into. As I recall, the bot, since appropriate (IMO) testing (based on other bots being approved under the same circumstances) was successful, should have been approved; instead a bad decision led to more and more bad, authoritarian decisions (including some blocks and bans) all compounding the original error, even though it was well argued that preceding decisions had been in error. Ad hominem attacks are inappropriate and deserve sanction. The trumped-up charges are reminiscent of those of the worst authoritarians in history. Glad this looks like it'll finally be overturned. Consensus has clearly changed. --Elvey(tc) 18:06, 1 June 2016 (UTC)[reply]
    Actually, I'm not aware of any of the arguments against archive.is that doesn't check out. Care to indicate exactly how you delved into any of them and found them lacking? Or do you just want to have your support vote highlight your edit history as an editor that intentionally bypasses the spam blacklist when you find it convenient?—Kww(talk) 21:31, 1 June 2016 (UTC)[reply]
    No, because one can lead a horse to water, and there's plenty of water in the RFCs. I read. There's no ban on referring to the archive as I've done; it gives the archive no link juice. If someone can prove otherwise, I'm all ears. I recall otherwise. --Elvey(tc) 01:08, 2 June 2016 (UTC)[reply]
    I was originally all for continuing the ban in the previous RfC, before I actually started to research the evidence. See some of my discussions in RFC 3 for details. Proof that the person or people behind the botnet are the owners of archive.is is all one-sided. Consider: if someone starts attacking Wikipedia with a bot and, when questioned, says that they are Jimbo, it would be clear that they are lying. In this case all the emails claiming to be doing it as the owner, or on behalf of the owner, go through the botnet operator's email, so taking them at face value is hardly convincing. There have been discussions in the past, through other channels to archive.is, that disavow Rotlink's claims. Many of the botnet activities since have done stuff like break archive.is links or add links to other archives. There is clearly some gaming going on, but unless it's designed to be a mind game (and even then) it's hard to be certain, as there are many equally likely possibilities consistent with the botnet being an attack on archive.is itself. I'm not sure that the evidence is strong enough to clearly say that the botnet attacks etc. are the work of the owners of archive.is. Nor, to be honest, is it that clear they are not. I can think of other, simpler explanations that would account for some of the stuff we've seen, and possibly the NOINDEX concerns. I'm not saying that I'm certain the other way either, but the evidence chain is not complete and has holes when looked at in more detail. To my mind the issue boils down to: is archive.is potentially useful to us? If not, then we can err on the side of caution and block. Unfortunately there are clear cases where archive.is is the best archive choice. I've seen archive.org links expire after a change of owner when a new robots.txt caused the old pages to disappear. To be honest, in many of those cases I just mark the link dead and leave it for someone else to worry about. I still wish, though, that a better approach to page archival was considered instead of just focusing on individual sites. PaleAqua (talk) 02:54, 2 June 2016 (UTC)[reply]
  37. Support - It is a useful archive for many sites that are no longer available and may be very difficult to replace. In any case, the aggressive bot that kept sticking blacklisted-site messages on articles because of some innocuous archive.is links is more offensive. Hzh (talk) 23:10, 1 June 2016 (UTC)[reply]
  38. Support - Blacklisting this site has never made the slightest sense to me. Obviously, we know and love Archive.org, but there is no reason in the world that this affinity should extend to banning an alternate site. If there are spambot problems, fix them without resorting to the nuclear option of a complete blacklisting. Carrite (talk) 05:22, 2 June 2016 (UTC)[reply]
    My concern is that if the folks running archive.is are the ones running the spambots, which seems likely, I worry about why. Maybe it's to help Wikipedia, but that seems unlikely. If they, for example, start using it to distribute malware (which isn't impossible, they've been associated with botnet attacks already), that is a huge problem for us. Hobit (talk) 02:12, 3 June 2016 (UTC)[reply]
  39. Support - As stated below, sometimes crucial sources are trapped and are only archived in one place. I don't buy the argument that if it's only archived in one place it's insignificant, based upon my experience with one link about Lycee Seijo being closed. I requested whitelisting for that link anyway, but I think I will make my opinion known here in this RFC. WhisperToMe (talk) 06:18, 2 June 2016 (UTC)[reply]
  40. Support – The service is useful, and the block hurts Wikipedia. - Nellis 20:17, 3 June 2016 (UTC)[reply]
  41. Support – I dislike spam, but truly despise dead reference links. In an ideal world, WMF would be putting some of its considerable resources into developing some new and improved archive service, but until that happens we have to work with the options available to us. If a reliable online source is deadlinked, and Internet Archive and WebCite both lack versions we can use, then we should have the option of adding a link from this archive service if doing so preserves content. I'd prefer using either of the two alternatives myself, but they are not perfect, between the risk of losing IA links to robots.txt and the need to proactively create WebCite links. If the archive.is people try spamming Wikipedia, all we have to do is put the site back on the blacklist. It's a risk that I feel is worth taking, at least until a better option presents itself. Giants2008 (Talk) 23:33, 3 June 2016 (UTC)[reply]
  42. Support - Its usefulness appears to outweigh the concerns, which to the best of my knowledge have not resulted in any concrete detriment to Wikipedia. --Dtaylor1984 (talk) 15:36, 4 June 2016 (UTC)[reply]
  43. Support sometimes archive.is is the only available archiving site. --Jakob (talk) aka Jakec 18:52, 5 June 2016 (UTC)[reply]
  44. Support. While issues may exist with certain parties adding archive links to Wikipedia articles, that should not prevent legitimate editors from utilizing a resource. Archive.is is by far the most robust and speediest archive site I'm aware of. Certainly, Internet Archive (and even Webcitation) has a strong history and reputation and I would absolutely not suggest that those archive links be replaced by Archive.is links, but please do not remove options from editors. Both of the other predominant archive sites occasionally suffer serious issues with properly reflecting the actual presentation and content of archived pages, which I feel is a far more serious issue than the negatives that have been presented elsewhere. Huntster (t @ c) 17:37, 10 June 2016 (UTC)[reply]
  45. Support: The threat of linkrot and information loss is more of a pressing threat than the alleged dishonesties of those running archive.is which have yet to be proven with indisputable evidence, and are built upon from speculation. I want practical results, and archive.is gives me practical results; I have no time for the politicking behind all of this. Of the archiving services available, web.archive.org, webcitation.org and archive.is all have their different weaknesses and limitations, so utilising the strengths of all three would be most beneficial for encyclopedia-building. If one service does not work for a particular URL to be archived, there are other alternatives to choose from. Reducing the freedom of choice between archival services is counter-productive in that it limits what article contributors can do in cases where a source needs protection from linkrot. --benlisquareTCE 05:07, 12 June 2016 (UTC)[reply]
  46. Weak support, and if the spam bots return, block it immediately again. --Joy [shallot] (talk) 19:31, 13 June 2016 (UTC)[reply]
  47. Support  TUXLIE  15:57, 14 June 2016 (UTC)[reply]
  48. Strong Support, since the reasons for blocking are unfounded and I have yet to see archive.is actually used for malicious purposes. I would prefer the Wayback Machine, if not for its using the URL's current robots.txt instead of the contents of it at the time of archiving. (Note that rumour has it among other web archivers - I can only cite personal discussions on IRC - that there is a backdoor around this robots.txt block, but unfortunately I don't know it and very few people seem to. Also note that archive.org does apparently keep the data in storage, but merely blocks the public from viewing it for a few (evil) reasons). I also don't care for ads, but I'd rather have an archive.is snapshot than no snapshot at all. Wyatt8740 (talk) 17:19, 20 June 2016 (UTC)[reply]
  49. Support. It was the right call that, 2.5 years ago, we temporarily blacklisted archive.is because RotlinkBot (and purported assailants) was mass-adding links in contravention of our bot policy. But it's been 2.5 years and archive.is hasn't wreaked havoc on the internet and has actually proved to be a useful asset. Even site bans are reviewed (and typically rescinded on request) after 12 months and I think it's time for us to let people use archive.is on the English Wikipedia again. Deryck C. 18:51, 20 June 2016 (UTC)[reply]
  50. Support. If it becomes a problem again, we can re-blacklist. ---- Patar knight - chat/contributions 03:56, 21 June 2016 (UTC)[reply]
  51. Support we must get our priorities straight, as benlisquare says above. Content that goes missing is lost forever, and given our mission, that should weigh more than any misuse of a tool that in itself incites no such behavior. --Waldir talk 22:51, 21 June 2016 (UTC)[reply]

Oppose Vote


Enter a single-line, numbered, signed vote here if you support keeping archive.is on the Spam blacklist.

  1. Close this RfC as the result is meaningless - this does not address the issues and consensuses of the previous RfCs and a supporting closure would not change anything. --Dirk Beetstra T C 05:16, 19 May 2016 (UTC)[reply]
    Consensus can change. This RfC will establish a new consensus that will overturn the original RfC. Hawkeye7 (talk) 08:12, 19 May 2016 (UTC)[reply]
    No, this will not overturn the consensus of the previous RfC, as it is not handling all aspects of the previous RfC. A consensus that supports not using the blacklist to prohibit additions does not automatically mean a consensus to stop removing the links, or even a consensus that no new links can be added. Consensus can change, if that is appropriately assessed. --Dirk Beetstra T C 08:35, 19 May 2016 (UTC)[reply]
    Most of the comments here suggest that the reason for the block was "unfounded" as such this consensus can certainly overturn the older one. Carl Fredik 💌 📧 22:36, 19 May 2016 (UTC)[reply]
    I would also point out this RFC merely removes it from the blacklist; it would not overrule local consensus at an article to exclude an archive.is link. It merely permits the adding of them, subject to normal editing consensus to do so. Only in death does duty end (talk) 19:49, 23 May 2016 (UTC)[reply]
    Most of the commenters here apparently did not do their research, or are thinking naively: they think that the initial abuse through the additions of archive.is came from some interested independent party that was only here to help Wikipedia, and not from the people behind archive.is. Oppose - get your facts straight. --Dirk Beetstra T C 03:34, 24 May 2016 (UTC)[reply]
    Keep Open. Valid. Apparently, in your case, you have the ability to think you can look inside people's heads and tell what they're thinking. (Perhaps the admin bit caused this wooly thinking?) Fortunately, you can't. Have my facts straight. It's magical thinking to believe that ignoring the many well-documented cases where only archive.is works, would work if it wasn't blacklisted, and works much better, makes them go away. This punitive block should have ended years ago, and I don't hold a grudge over activity that worked to get around a block that should have been removed years ago. That the extreme claim that "A consensus that supports not using the blacklist to prohibit additions does not automatically mean ... a consensus that no new links can be added." is even made shows how extreme and closed-minded the thinking on this has become. Common sense has gone out the window. (Presumably the no is there due to an editing error; without that assumption, the comment makes even less sense!) Oh, and given that Beetstra edited the RFC for clarity, any remaining ambiguity is not to be interpreted extremely in Beetstra's favor, per the legal principles of the last shot rule and Contra proferentem. --Elvey(tc) 18:43, 1 June 2016 (UTC)[reply]
    @Elvey: It is funny to see you comment like this while you are also clearly unaware (and this RfC is not exactly helping in providing you that information) that the promotion of archive.is has repeated itself several times since the original situation, and is actually a currently ongoing situation (FYI, you were canvassed here to support - I wonder how that stands in light of WP:FALSECON). --Dirk Beetstra T C 13:34, 2 June 2016 (UTC)[reply]
    <facepalm> Wow, way to attempt to change the subject. I make some valid points and your response is an ad hominem attack - you accuse me of ignorance. Same thing with Codename Lisa, below - she makes valid points and the response is more accusations of ignorance. Borderline WP:NPA violations. Your argument is that since I'm not agreeing with you, I must be ignorant. Lame! It is hubris to think you can look inside people's heads and tell what they're thinking, yet you've made this claim yet again, by claiming to know I'm unaware of something! That it happened again after I called it out... shocking. It needs to STOP!

    Analogy: Consider a prisoner who is found guilty of attempted escape (that injured no one and no thing), and sentenced to additional time because of it. If DNA evidence becomes available that proves him innocent of the original crime, should he still be required to serve the sentence for an attempted escape? Probably not, IMO. That's why I said when I voted, "the bot ... should have been approved; instead a bad decision led to more and more bad, authoritarian decisions (including some blocks and bans) all compounding the original error, even though it was well argued that preceding decisions had been in error." So, IMO, the initial allegations that led to the denial of bot status led to more and more, including people pushing a story of criminality. I don't buy the story of criminality based on evidence presented, and I have looked at it. I do have my facts straight. The evidence is too weak. But more importantly, I don't think an innocent man should be imprisoned for trying to escape an unjust sentence. The "crime" is whatever led to denial of bot status. The use of proxies, etc is the escape attempts. --Elvey(tc) 17:39, 5 June 2016 (UTC)[reply]

    @Elvey: I'll say it again, you were one of the editors canvassed here to support by likely the same editors of 2013, 2015, and earlier this year. --Dirk Beetstra T C 19:04, 5 June 2016 (UTC)[reply]
  2. Oppose There are plenty of legitimate archiving services (archive.org is the main one) that take care of the link rot problem without introducing any of the many problems of archive.is. Jackmcbarn (talk) 00:32, 22 May 2016 (UTC)[reply]
    Archive.is is a legitimate archiving service. It invariably does a better job than archive.org or webcite. Regrettably, the number of archiving services remains pitifully few. Hawkeye7 (talk) 01:45, 22 May 2016 (UTC)[reply]
    As explained above, there are many cases where archive.org and WebCite are incapable of archiving certain parts (or even any meaningful content) of websites and without that the references are useless. I wouldn't mind preferring other archives when they are usable (though it's probably too late to add a section for that to this RfC now), but there is no reason to block archive.is altogether. And we shouldn't waste editors' time by making them request whitelisting whenever they can't get any other working archive. (Not sure if there have been any cases of these requests and whether they would be accepted or not based on the previous RfCs, as far as I know they did not address individual whitelisting.) nyuszika7h (talk) 20:30, 22 May 2016 (UTC)[reply]
  3. Leave on blacklist : No legitimate archiving service would use illegal botnets to add the links. It's interesting that this RFC doesn't mention that piece of archive.is's history, and I expect that the votes to remove it are based on ignorance of that fact.—Kww(talk) 22:48, 23 May 2016 (UTC)[reply]
    Archive.is is no more responsible for that than archive.org is for CyberBot II. Hawkeye7 (talk) 23:29, 23 May 2016 (UTC)[reply]
    We aren't talking Rotlinkbot and contravention of Wikipedia policies: we are talking actual criminal trespass, where hundreds of user computers were compromised. It was a criminal act which directly benefited archive.is and only benefited archive.is. It's pretty unlikely that the only beneficiary of a criminal act was somehow completely uninvolved.—Kww(talk) 23:35, 23 May 2016 (UTC)[reply]
    I also find it pretty striking that very close to the moment that archive.today was deregistered User:Rotlink went around and changed all of those links to archive.is. No-one not involved with archive.is would do that service. Seeing that User:Rotlink and User:Rotlinkbot, and all the IPs show exactly the same behaviour does nothing but suggest that archive.is themselves were behind the actions. Keep sticking your heads in the sand, and pretend that you are not being led by companies so that they can make money through you. --Dirk Beetstra T C 03:34, 24 May 2016 (UTC)[reply]
    1. "No-one not involved with archive.is would do that service." By this logic, all Featured Articles of Wikipedia are written by people directly affiliated with their subjects! Have you ever heard of the word "fan"? Fans are more steadfast and even fierce in their support than people actually affiliated with the matter.
    2. Quality of the service is another matter. That bot affair was sad. But not using archive.is is cutting off the nose to spite the face!
    Codename Lisa (talk) 08:04, 24 May 2016 (UTC)[reply]
    Have you ever heard of spammers? Spammers are even more fierce in their support, as it is .. making them money. They are sometimes more resilient than fans (and some are around for years and still continue). In that case, ask the WMF to renew their Terms of Use, and open a Phabricator ticket to remove the spam-blacklist extension, as they are both meaningless. You act as if spam does not exist.
    I can agree with the problems, though not using archive.is would not be detrimental to Wikipedia (as many of their archived pages can be found elsewhere, and even if not does not mean that the information is not verifiable anyway), even if some people are acting as if it is the end of Wikipedia. --Dirk Beetstra T C 08:44, 24 May 2016 (UTC)[reply]
    Oh, by the way, fans are involved with the subject, and they do have a reason to advance a subject. It takes effort to adhere to our pillars, and if you are right, you just give the excuse to them not to adhere to our pillars. --Dirk Beetstra T C 08:46, 24 May 2016 (UTC)[reply]
    You are making assumptions here, you have no evidence. As I said earlier, it could have been someone who wanted exactly this, to get it blacklisted (I don't know why anyone would want to do that, but people can do surprising things). But regardless of who did it, that was years ago. The main purpose of the blacklisting was to stop the unauthorized mass addition of links because there was no other way to stop it. And Beetstra, "many of their archived pages can be found elsewhere" – the key word here is many, but not all. There's no reason why we shouldn't make it easier for people trying to verify the information and provide an archive link just because it happens to be only available on archive.is. Let me ask you something, would you also consider pointing an editor trying to verify the information in a dead link to archive.is on the talk page a "blockable offense" (while archive.is is on the blacklist)? nyuszika7h (talk) 09:13, 24 May 2016 (UTC)[reply]
    "I also find it pretty striking that very close to the moment that archive.today was deregistered User:Rotlink went around and changed all of those links to archive.is. No-one not involved with archive.is would do that service." – We know he's a friend of the owner (he said that), but what you say is not necessarily true. When someone's site got taken over and they migrated to a new domain, I replaced the existing links on huwiki with AWB. And I'm just a frequent visitor of the site. nyuszika7h (talk) 09:25, 24 May 2016 (UTC)[reply]
    I know he is a friend, not necessarily involved in a way that is in 'violation' of our current terms of use. I am also aware that his English is good enough to communicate, and I would have preferred that some communication had come forward regarding the situation (either from the friend or from the owner; and IIRC the owner was asked to interact with us). So we know that this is closer to a 'fan' (and one rather close to the owner) than to a Joe job (I have seen such cases, and it is how I ended up in the anti-spam business years ago). Also, this does not look like a Joe job, or at best a very, very badly executed one. I must say that, for a friend, the editor did show himself to be extremely persistent (given the nature of the IPs being used) in making sure that archive.is got linked (if you are so persistent that you want to run the risk that your friend's site gets blacklisted .. so you may understand that I am somewhat reluctant to consider the friend's story to be the whole story).
    Regarding the 'many of their archived pages can be found elsewhere' - and I mean many, based on the whitelisting requests I still have to see a significant number for which there are no alternatives. And I hope you followed that discussion and related comments from me there - I am taking and advocating a stance there softer than the initial RfC (a stance reconfirmed in the second, though weak), it is what we have a whitelist for (though the community decision of the RfC was to remove the links also when there was no working alternative).
    Regarding the whether I find that a blockable offense - it depends on how it is done (I'll discuss this broader than your question). In a talkpage discussion, no (even if you would find a non-blocked url-shortening service and use that to link to the page - I would however de-link and meta-blacklist the url-shortening service on sight, but without 'warning' the original editor). In mainspace using wording ('find the archived version on archive.is') would be inappropriate prose for a mainspace reference, and I have removed such instances (one non-archive.is recently: diff). If that would be done repeatedly I would consider it disruptive in itself, not because of the 'evading' of the blacklist (I have used strong wording on someone who did that on another blacklisted link after being denied de-blacklisting, and I removed all the violating instances; did not block the editor). If someone would go into an intentional circumvention of the technical means that enforced the community decision made in the first RfC which was reaffirmed in the third RfC (and I have seen such cases for archive.is), then yes, I would consider that a blockable offense - that is intentional (and in the case where I am talking about: knowingly) circumventing a community decision with which you do not agree. --Dirk Beetstra T C 11:08, 24 May 2016 (UTC)[reply]
    Whether or not evidence is needed is another question - the task of the technical means at hand of users with advanced rights (blocking, spam-blacklist, page protection, edit filters) is to stop disruption to Wikipedia. I do think we agree that the way the links were added was disruptive and counter the practices described in our policies and guidelines. The original editors were running unapproved bots and were blocked. That did not stop the behaviour (so, blocking was not enough), and page protection is obviously not an option here (too many pages involved). So the technical means was to stop the MO of the edits: either through blacklisting or through an edit filter. Whether or not it is the owner, a fan, a friend, your neighbour, monkeys, or a competitors' Joe job is irrelevant, the target is to stop the disruption to Wikipedia, and that is what the community decided should be done through a blacklist (at first enforced through an edit filter which practically did the same for 1 1/2 - 2 years and no-one cared, only when it was blacklisted (User:Jtmoon: attempts; User:Hawkeye7: attempts, User:Nyuszika7H: attempts, to take a few of the supporters for removal here) people started to cry wolf .. funny). --Dirk Beetstra T C 11:44, 24 May 2016 (UTC)[reply]
    What does this mean? Hawkeye7 (talk) 12:23, 24 May 2016 (UTC)[reply]
    What does what mean? Editors have been prohibited for years from adding links, and the ones that complain hardly find any problems with that (you, Hawkeye7, needed it 5 times (12 edits) in just over 2 years, and it seems you found alternatives, as I did not see any requests (I may have missed those, hard to track) to get these links in), but as soon as blacklisting is performed it is suddenly not replaceable and a problem. --Dirk Beetstra T C 13:11, 24 May 2016 (UTC)[reply]
    Oh. I'm just a peon, so all it said for me was "One or more of the filter IDs you specified are private. Because you are not allowed to view details of private filters, these filters have not been searched for. No results." Hawkeye7 (talk) 13:18, 24 May 2016 (UTC)[reply]
    Sorry, I did not realise that you were not able to see the results of hidden filter hits either (strange). Anyway, to explain myself - when people started to complain at the blacklist/whitelist that they could not live without it, I started to look into how often they were hitting the blacklist/edit filter because of this (they must have hammered it, right?). I saw several hits for some (though generally not more than about 12 in over 2 years; in absolute numbers, you hit 5 unique pages, and someone else in that stat had 12 hits on 4 unique pages) and checked what they did in response (often: use another archive; as I said, I can't see whether they complained or requested an override). Some of the people who complained that archive.is was sometimes the only archive available NEVER hit the filter or the spam blacklist at all (which, I think, also shows how needed the site is). The discussion that precipitated this short survey of mine (an archive.is link that was deemed needed) turned out to concern a page that was a) available in its original form on archive.org, b) available under a new, alternative path on the original server, and c) available on archive.org at the new path - hence replaceable in itself. Although I believe that there are cases where archive.is is the only archive, I do not believe that the situation is as dire as many here want us to believe (and I think that the onus is on the editors who want to revert the prohibition to show that the link is really needed so much that the community should override their earlier decision - what came out of the whitelisting requests and out of the filter/spam-blacklist does not show such an urgency). For fun, I looked at the two edits you did not find disruptive (I agree, the individual edits are not disruptive) - both are replaceable. --Dirk Beetstra T C 13:40, 24 May 2016 (UTC)[reply]
    So basically, the arguments that the blacklisting is unfounded because it was not the owner, or unfounded because the threat has stopped, are invalid as well: the blacklisting/edit filter did what it needed to do; and the involved editor is still active and interested in linking archive.is on MediaWiki (as recently as February 14, 2016) - show me evidence that we do not need such means anymore, or any guarantee that it will not restart (I have shown that I am willing to experiment, and I know cases where such 'experiments' went very wrong)?
    And if we want to use the argument of 'some are not replaceable' - did anyone do the statistics on a significant subset of currently linked archive.is archives and see how many are really not replaceable? --Dirk Beetstra T C 11:44, 24 May 2016 (UTC)[reply]
    FWIW, I do not agree that it was disruptive. I looked at a couple of the links inserted by the unauthorised bot at random.[1][2] Both are fine; correct archive links were properly added for sites that were no longer available. In these cases at least (the first two I looked at), there was no greater disruption than the authorised bot CyberBot II, which tries to do exactly the same thing with archive.org. So it comes down to an unauthorised bot running. I understand the reasons for prohibiting unauthorised bots, but I'm not seeing actual damage to the encyclopaedia. Hawkeye7 (talk) 12:23, 24 May 2016 (UTC)[reply]
    The disruption was indeed in the way they were added, and as I explained, the only way the community found to stop it was to stop the MO of the editor by blacklisting (blocking and page protection were not cutting it). If the editor had stopped and worked with the community (which, until now, they have not done) this might have gone differently (but outreach from our side did not get any response, nor did they make that effort - and I know what that tells me). --Dirk Beetstra T C 13:11, 24 May 2016 (UTC)[reply]
    Hawkeye7, you realize that we are not talking an "unauthorized" bot in terms of violating Wikipedia policy, right? We're talking about a botnet: computers that have been compromised by virii and malware, with access sold by criminals to people that then use them to commit computer crime. It doesn't come down to an "unauthorised bot" running at all.—Kww(talk) 13:53, 24 May 2016 (UTC)[reply]
    What evidence is there of a botnet? There are many ways to change/spoof IP-addresses entirely without using a bot-net.
    (P.S. virus isn't pluralized virii)
    Carl Fredik 💌 📧 15:34, 24 May 2016 (UTC)[reply]
    Looking deeper, there seems to be no evidence — in the discussion leading up to the blacklisting: [3] — it was mentioned he had used open proxies — which should have been barred from editing from the get-go. This all screams total overreaction and poor judgement when instigating the block, and it should be opened up again for legitimate use. We may need better ways to target open proxies. Carl Fredik 💌 📧 15:40, 24 May 2016 (UTC)[reply]
    You think that those proxies hosted on residential computers around the world weren't there as a result of malware and virus infections? The very report you are linking to is complaining about illegal proxy use in support of archive.is only 4 months ago, belying Codename Lisa's assertion that all misbehaviour is long in the past.—Kww(talk) 15:51, 24 May 2016 (UTC)[reply]
    Firstly — where are you getting the idea that they are on residential computers? And secondly, even if they were there is no indication whatsoever that a bot-net is involved. FoxyProxy is just one service that allows the average user to set up an open proxy, this can later simply be scanned for across the entire internet, examples [4] , [5]. Carl Fredik 💌 📧 16:26, 24 May 2016 (UTC)[reply]
    There are numerous DSL and cable modems in the mix presented at WP:Archive.is RFC, CFCF: did you ever bother to actually analyse the IP list presented in the data, or are you just screaming absolutely no evidence without having devoted any time to analysing the data presented?—Kww(talk) 16:37, 24 May 2016 (UTC)[reply]
    No, I didn't analyze them myself — what I saw was a lack of any analysis presented on that page. There wasn't so much as a comment, and even if they are private computers there are any number of private users who host open proxies — as I have already described. Carl Fredik 💌 📧 16:44, 24 May 2016 (UTC)[reply]
    Just for a fun exercise, check out https://www.socks-proxy.net/ which scans for open proxies — you'll find loads of private addresses. There is no need for a botnet, and no reason to believe there was any illegal activity. We need to better patrol open proxies, but that isn't going to be solved using the spam-list.Carl Fredik 💌 📧 16:50, 24 May 2016 (UTC)[reply]
    Thanks for admitting that all of your comments have been made without analysing the evidence presented: I hope whoever closes this mess takes that into account.—Kww(talk) 16:59, 24 May 2016 (UTC)[reply]
    Kww—You as well as I know that is not what I wrote. I did not perform an independent network-analysis of the used addresses, in part because after 4 years they are likely no longer operational, but also because that is required by you who make the allegations, not me! There is nothing in the previous RfCs or discussions that point towards any illegal behaviour here — and insinuating that my arguments hold no value because I have not performed a WHOIS/Ping/Port scan of each address is fraudulent! I can add that in many jurisdictions such analysis is in itself illegal! Carl Fredik 💌 📧 21:00, 24 May 2016 (UTC)[reply]
    User:Hawkeye7, User:Kww - you might want to review this in that respect. --Dirk Beetstra T C 15:14, 24 May 2016 (UTC)[reply]
    Note that both are replaceable with alternative archives: the first one and the second one (is it me, or is the second reference conflicting with the data presented in the table on Wikipedia - the table reads 37, 25 and 5, the archive talks only about "Never Cry Another Tear - 24/10/2009 - 24/10/2009 - 70 - 1" (the item does not seem listed in the table on Wikipedia). --Dirk Beetstra T C 13:40, 24 May 2016 (UTC)[reply]
    Kww, please have a look at http://spys.ru/proxies/ — you will find out that: 1) there are plenty of 'residential IPs' among public proxies, and 2) they are not a botnet but misconfigured soho-routers.
    Also, Wikipedia cannot be edited from 'hosting IPs'. Even if the spammers took unsorted public proxylist the successful edits could have been made from 'residential IPs' only.— Preceding unsigned comment added by 203.170.75.14 (talkcontribs)
    News flash: Accessing a "misconfigured soho-router" without the owner's knowledge and consent is also a criminal act.—Kww(talk) 17:18, 25 May 2016 (UTC)[reply]
    1) Not at all. There are many research and commercial projects what access each and every address on the Internet making no discount on how well the endpoint is configured and on what premises it is located. Such as shodan.io, scans.io, ..., not to mention projects like hola.org which silently but nevertheless legally turns every computer where their browser extension is installed into a proxy-server.
    2) I believe that you use the word "crime" metonymically, as "what I personally would consider unethical". Otherwise, the confidence about their crime implies you are a criminal for "forgetting" to notify the authorities. Regardless of whether their action was a crime or not, misprision is a crime.
    I will accept the technical correction that mere "access" is not a crime. Using it to transmit commands to other computers goes well beyond mere access.—Kww(talk) 20:03, 25 May 2016 (UTC)[reply]
    Any "access" implies transmitting a command to another computer, in that even a ping-request will require passing this back through a series of relays. Now if we want to use a more restrictive definition - these computers are not transmitting any commands, they are simply relaying it as proxies - where the request still originates from the original users computer. If configured voluntarily by the user, accessing such a proxy, if unprotected by passwords etc. would not constitute a crime in most locales. Carl Fredik 💌 📧 21:37, 25 May 2016 (UTC)[reply]
  4. Oppose There are 3 conditions I want to see prior to reconsidering whether we want to allow Archive.is:
    1. an admission from Rotlink, and from the management of Archive.Whatever, that they acknowledge that
      • their initial attempts to run a bot to add Archive.is links and to overwrite other archiving services with Archive.is was out of order because of WP:BOTPOL
      • their initial attempts to "spam" their service's links into English wikipedia without gaining a consensus to do so is out of order
      • after RotlinkBot was blocked and they refused to participate in WP:BRFA and consensus building, numerous IP addresses began making the same edits as RotlinkBot, giving the impression that they were either botswarming across multiple hosts or had deliberately planted software on residential computers to accomplish a fait accompli
      • they have not engaged collaboratively with the community to resolve the issues (namely ignoring robots.txt, stripping out original ads, injecting ads to monetize Archive.is, etc.) that caused their service to be unwelcome on English Wikipedia
    2. That they engage with the community to resolve the issues that have previously blocked the usage of Archive.is
    3. That they work with the Foundation to provide a better means of retaining reference data for pages.
    Until these are done, any relaxing of restrictions is premature and ill-guided, as the reasons for the restrictions have not been overcome. Hasteur (talk) 12:45, 25 May 2016 (UTC)[reply]
    @Hasteur: As I did with others, I'd like to bring the discussion here: Wikipedia:Administrators'_noticeboard/Archive279#Sockmaster_flagging_archive.today_links_en_masse to the table, to show the total disregard that Rotlink (and sockpuppets/meatpuppets) have for the situation here (which, in my opinion, does not offer much hope that the spamming will not continue ..). --Dirk Beetstra T C 13:32, 25 May 2016 (UTC)[reply]
    @Beetstra: I was not up to speed on the recent history, but I am completely unsurprised that the same actions are still being jammed into the encyclopedia. Hasteur (talk) 14:14, 25 May 2016 (UTC)[reply]
    We have tried to get the Internet Archive to do these things, without success. Hawkeye7 (talk) 21:46, 25 May 2016 (UTC)[reply]
    You seem to be unable to tell the difference between users adding links of their own choice and the service itself inserting the links. To use an analogy: you, as the user of a computer, are free to choose whatever web browser you want (in the generic case); using Microsoft Office doesn't automatically oblige you to use Internet Explorer. Hasteur (talk) 23:40, 25 May 2016 (UTC)[reply]
    That is all I am asking for; the freedom to use the archiving service I want. Feel free to petition for CyberBot II to stop adding archive.org links until they comply with your preconditions. Hawkeye7 (talk) 02:22, 26 May 2016 (UTC)[reply]
    Has Internet Archive been caught red-handed trying to override community consensus? Has Internet Archive been caught using disruptive tactics after the community has rejected it? You still seem to think that Archive.is and Internet Archive are equivalent services. Let's try yet another analogy, since you're still missing the point. In your city, I assume taxi service is available, and there are regulations around taxi service. Internet Archive is acting like a responsible citizen and following all the regulations. Archive.is is acting like a pirate taxi operator, picking up fares wherever they want and ignoring regulations, safety concerns, and traffic laws. Currently we have a plan in place (in our analogy universe) whereby the police (admins) are empowered to stop the Archive.is cars, arrest the drivers, and impound the cars for willful disregard of city (Wikipedia) policy. Finally, your repeated and willful ignorance suggests (at least to me) that you may have a PoV you should declare. Hasteur (talk) 12:13, 26 May 2016 (UTC)[reply]
    Are you referring to Uber_(company)#Criticism? 78.139.174.106 (talk) 14:19, 26 May 2016 (UTC)[reply]
  5. Oppose per Hasteur. The opposition has done a poor job of explaining what the issue with archive.is has been, but after much reading it appears that the website has been up to no good and has abused Wikipedia to further its commercial venture. To that end, I oppose using it. Chris Troutman (talk) 22:41, 29 May 2016 (UTC)[reply]
    @Chris troutman: Then maybe you should explain: what is this commercial venture you are talking about? Hasteur, whom you are supporting, requires a confession of guilt from a party who might not be guilty in the first place. On the whole, no, the opposition hasn't been doing a good job of explaining what's wrong. —Codename Lisa (talk) 07:39, 30 May 2016 (UTC)[reply]
    Yeah, I think you're misreading the problem, Chris. To my understanding, there's no commercial venture attached to archive.is. True, non-commercial websites can be spammed just the same as commercial ones. The real question in my view is whether use of the spam blacklist is still necessary (or even sufficient, considering the arguments below that they're repeatedly registering new domains to bypass the blacklist). The question isn't whether we should allow archive.is links to be spammed. I'm not aware of any reason why we can't, say, set up an edit filter to catch unregistered/non-autoconfirmed editors who add links to "archive.(any newer generic TLD)/(any combination of uppercase and lowercase Latin alphabet characters)" and "archive.(any newer generic TLD)/(nnnn).(nn).(nn)/http", and hold a subsequent RfC to determine when archive.is links should be permitted. Finally, as I pointed out above, the entire "they could turn rogue at any moment and turn Wikipedia into a repository of links to attack pages" concern can be completely sidestepped by using archive.org to snapshot archive.is archives and then linking to the archive.org snapshot of archive.is. —/Mendaliv//Δ's/ 13:17, 30 May 2016 (UTC)[reply]
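    As a rough illustration of the edit-filter idea above (a sketch only: the TLD length bounds and exact character classes are assumptions, and real AbuseFilter rules use their own syntax), a regular expression covering both of the described URL shapes might look like this in Python:

        import re

        # Hypothetical approximation of the two forms described above:
        #   archive.<gTLD>/<Latin letters>            (shortened form)
        #   archive.<gTLD>/<yyyy>.<mm>.<dd>/http...   (dated snapshot form)
        ARCHIVE_LINK = re.compile(
            r"\barchive\.[a-z]{2,10}/"          # "archive." plus an assumed gTLD
            r"(?:[A-Za-z]+\b"                   # shortened snapshot identifier
            r"|\d{4}\.\d{2}\.\d{2}/https?://)"  # dated snapshot prefix
        )

        assert ARCHIVE_LINK.search("see http://archive.is/nnnHP")
        assert ARCHIVE_LINK.search("http://archive.is/2016.05.24/https://www.google.de/")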
    @Mendaliv: I think I understand where you're coming from, and Codename Lisa's point about the blacklist being an overreaction. However, I find Beetstra's argument persuasive. A single bad incident makes me fearful of opening our doors to a second potential abuse. I don't care how archive.is runs ads on their sites; I've linked to plenty of news websites that I'm sure do the same. My issue is the spammy editing. Carl Fredik denies there was any real problem, although it seems there's evidence otherwise. With the right prevention we could perhaps use archive.is. I like having these archiving services available, but I don't yet see both a technical solution (like the edit filter control described by Mendaliv) and a consensus to use it. I would support lifting the blacklisting if such a technical solution were supported by the community, but until then I have to say no. I'd like to think Wikipedia's response in the past has sent a message to websites like archive.is so they understand disruptive editing will be punished. Chris Troutman (talk) 18:29, 30 May 2016 (UTC)[reply]
    @Chris troutman: A single bad incident makes me fearful of opening our doors to a second potential abuse - it is not a second potential abuse: after additions of archive.is were blocked, they returned with a limited spam attempt in April 2015, and again in February 2016. --Dirk Beetstra T C 03:11, 31 May 2016 (UTC)[reply]
  6. Oppose One of archive.is's features is a link shortener. All link shorteners are on the blacklist; Archive.is must be on the blacklist too. Example: bit.ly is also a 'useful service' and has other nice features, but no one argues for its whitelisting. 2.190.7.141 (talk) 00:06, 30 May 2016 (UTC)[reply]
    Not true. It is an archive, not a link shortener. WebCite also produces shortened URLs and is whitelisted. Hawkeye7 (talk) 04:15, 30 May 2016 (UTC)[reply]
    The shorteners are forbidden for a reason other than the length of the links: they provide a way to place links to other blacklisted sites. Due to their archiving feature, Archive.is and WebCite are suited for such use even better than classic bit.ly-style shorteners and thus must be forbidden. 2.190.7.141 (talk) 04:29, 30 May 2016 (UTC)[reply]
    @Hawkeye7: Not true, WebCite is not whitelisted; it is not blacklisted in the first place, since no-one ever had the intention to spam Wikipedia with it to an extent that those additions needed to be curbed. The IP does, however, bring to the fore another problem - that the archiving sites can be used to circumvent the blacklisting of other sites. It would be better if the links used solely incorporated the original link into their parameters and not a 'shortened' form of it (the use of the non-shortened forms could be enforced with the blacklist, as redirect abuse is a serious issue, but that then also goes for WebCite and other archiving sites - discussions along those lines have been held in the past). --Dirk Beetstra T C 04:57, 30 May 2016 (UTC)[reply]
    @Beetstra: Just to be clear, both webcitation.org and archive.is have non-shortened URL forms that include the full original link plus a simple date parameter: e.g. for archive.is, http://archive.is/2016.05.24/https://www.google.de/, and for webcitation.org, http://www.webcitation.org/query?url=http://google.com&date=2016-05-30 ... because of the blacklisting, you will need to paste these addresses into your browser instead of clicking them. The same pages with the shortened link options: http://archive.is//nnnHP and http://www.webcitation.org/6hsn16K2q. So, really, this is a non-argument; opponents of archive.is should focus on other angles of attack. -- Ham105 (talk) 06:03, 30 May 2016 (UTC)[reply]
    @Ham105: I would not oppose the de-listing on the basis of URL-shortening concerns, and it is not a concern under this RfC - it is a concern for all of the archiving services, not specific to archive.is (and not even specific to archiving services - we have parts of google.com blacklisted due to URL-shortening and/or blacklist-evasion issues, not because the base domains were spammed beyond control). --Dirk Beetstra T C 07:19, 30 May 2016 (UTC)[reply]
  7. Oppose - for now pending discussion (see below for concerns). — Rhododendrites talk \\ 00:55, 30 May 2016 (UTC)[reply]
  8. Oppose - although marking archive.today links dead in Feb 2016 was doubtfully a SPAM action (the proxies did not add new links), there was a SPAM action in Jun 2015 (after RFC3; not mentioned in RFC3 and not mentioned here yet) adding links to another of their domains (now offline), archive.limited Wikipedia:WikiProject_Spam/Local/archive.limited. The threat is still here. As soon as archive.is is whitelisted, they will start adding links again. If it is not whitelisted, they will go on registering new archive.* domains and spamming links pointing to the new domains. webarchiveproject.org is hosted on the same IP [6] and not yet blacklisted. 202.21.125.86 (talk) 03:55, 30 May 2016 (UTC)[reply]
    Another interesting domain from the same server: http://web.archive.org/web/20160530040627/http://lushlinks.com/ 202.21.125.86 (talk) 04:07, 30 May 2016 (UTC)[reply]
    Any connection between those domains and archive.is? The IP addresses are different, the domain registration is different, the domain host is different. PaleAqua (talk) 06:32, 12 June 2016 (UTC)[reply]
  9. Oppose Allowing links to archive.is from articles would be penny-wise and pound-foolish.
    • Any reference that is only accessible via archive.is is unlikely to be a reliable source in the first place. In exceptional circumstances a particular use can be whitelisted.
    • Unlike archive.org, we have no idea who is behind this site and how reliable they are. Some of the evidence we have about them makes them look downright shady. Sending our readers to them without warning looks irresponsible to me.
    • Not linking to archive.is doesn't make the archived page go away (unless archive.is itself decides to disappear), so there is no urgency here.
    • What is the point of relying on an archive site that may disappear itself? Archiving pages should be fully under our own control.
    • Having the WMF build a service similar to archive.is for internal use has been a feature request for years. Relying on a third party won't encourage them to finally start working on this.
    • If we're going to be linking to archived versions of web pages, we shouldn't be doing so directly from main space. Build an extension that can send users to one of several archives (preferably checking if the page is available first) and issue the appropriate warnings before sending readers there.
    Ruud 18:53, 31 May 2016 (UTC)[reply]
    @Ruud Koot: "Any reference that is only accessible via archive.is is unlikely to be a reliable source in the first place." – That is a ridiculous argument. So if a site isn't "Web 1.0" enough for Internet Archive or WebCite, then it's not a reliable source? Also, you contradict yourself with this: "In exceptional circumstances a particular use can be whitelisted."
    "Build an extension that can send users to one of several archives" – Yes, let's make it unnecessarily harder for the readers. See Wikipedia:Readers first. Often it's not as simple as getting the latest (or even the oldest) archived version, when a page's content frequently changes, such as Disney ABC Press. Memento can be useful, but it's still the best to point the reader directly to a working copy when possible.
    And linking to other archive sites is already a well-established practice and nobody else seems to have a problem with it. Perhaps you don't trust archive.is to be reliable, but archive.org is not infallible either – it will retroactively hide archived content if a site's robots.txt disallows it (which can happen with domains that expire and are parked, so not only if the site's owner legitimately wants to take down the content – and for that case, archive.is has a report form).
    Also, having the WMF host their own archiving service sounds nice in theory, but in practice it's not as simple as you think. nyuszika7h (talk) 19:20, 31 May 2016 (UTC)[reply]
  10. Oppose - Archive.is still appears to be a rogue operation of some sort, and that is reason enough to blacklist it. Robert McClenon (talk) 03:07, 1 June 2016 (UTC)[reply]
  11. Oppose None of the supporting comments address the concerns raised or the context of the previous RFCs. There has been no attempt to explain away the unauthorized bot, the botnets used, or the questionable practices of the site itself. We should not be allowing a service that breached the community's trust, especially when there are valid alternatives available like archive.org and WebCite. While I do support using it as a last resort if all the other archives failed, none of the arguments presented here are convincing; they are just "because I like it". Opencooper (talk) 00:22, 2 June 2016 (UTC)[reply]
  12. Oppose Moving to oppose. Problems still seem to be associated with this site, and that makes me nervous. It could be a false-flag thing of some sort, but that seems unlikely. Hobit (talk) 02:10, 3 June 2016 (UTC)[reply]
  13. Oppose for now The community sat down with the Internet Archive and had a discussion about how to work together. I would want to see the bot operators / group from archive.is comment on the concerns raised before we lift the blacklisting. If they are doing this ("Archive.is strips out the original advertising on the page and inserts their own advertising links to monetize themselves, thereby making it not a true archive of the page?"), then wow. Of course they want more links from us. Doc James (talk · contribs · email) 14:49, 3 June 2016 (UTC)[reply]
    @Doc James: They are not stripping out ads, and I have never seen any from them. See archive dot is XajR3 for an example. Their FAQ says that they may include advertising "after 2014", but including ads alone is not an argument against using it, as many news and other sites include ads too. nyuszika7h (talk) 17:15, 3 June 2016 (UTC)[reply]
    many news and other sites include ads too - But we're not debating sources. We're debating a way we access sources across the project, which is a separate discussion from whether a source is reliable. The question, as I've come to understand it, is: should we be pointing readers to a site that says it may introduce ads, gives no indication of whether the ads would be implemented responsibly, is run for profit by unknown entities with an unknown business model, and provides no information about the long-term plan for the site or its long-term feasibility? It's a tough question to answer in the affirmative, even if there are a few cases when it's the only currently available means of accessing some sources. We do link to news sites that have ads, but only when we consider them to be reliable sources according to other criteria. When we know nothing about the source, when it's unclear about the purpose, ownership, and priorities of the site, it's very rarely going to be considered a reliable source. This seems more analogous to the way we handle WP:DEADLINKSPAM -- when someone finds a dead link, copies archived source content to their own ad-filled website, and adds the link to Wikipedia to remove the deadlink tag. Yes, because their spam site doesn't acknowledge robots.txt, there's an argument that it could be around longer than archive.org, but we remove those links anyway. If we were deciding between no archiving and archive.is, that would be one thing, but in the vast majority of cases we have an alternative (and not just any alternative, but a well-known, well-respected non-profit with a long-term plan and many funding sources -- not only are they committed to not showing ads and not spamming us, but they're even interested in working with other non-profits for a cause that has nothing to do with personal profit). — Rhododendrites talk \\ 19:05, 3 June 2016 (UTC)[reply]
    @Doc James: The archive does not replace ads. Wikipedia has a promotional article about a project which does replace ads with its own. Wouldn't you like to start an RFC about its removal and blacklisting?
    Okay, thanks. Why not just use archive.org? I am moving to neutral due to the difficulty of reverting vandalism when an archive(.is) link is present. Doc James (talk · contribs · email) 15:16, 5 June 2016 (UTC)[reply]
    @Doc James: I suppose this page is turning into a prohibitive wall of text. To summarize my understanding: archive.org respects the use of robots.txt and thus will not archive sites that do not want to be archived; archive.is ignores it. So there are more sites archive.is can archive, and archives aren't deleted if, down the road, a site decides it does want to use robots.txt. Ignoring robots.txt could be controversial, but is not, as far as I've seen, a reason anyone has given for not using archive.is. While archive.is is not currently replacing ads, as above, nobody knows where it gets funding, it's not a non-profit, and it says it may run ads "after 2014". On the subject of ads, if this is a small venture with one or a small number of people bearing the costs, it seems like opening the floodgates of Wikipedia would surely hasten the process of implementing ads -- and in such a case there are no commitments/indications about how those ads would be implemented. There has also been a great deal of spam, with bots/IPs adding loads of archive.is links to Wikipedia, which some allege to have been orchestrated by the owners of the site (I don't know what evidence there is for that, though). The appeal of archive.org is that, while it doesn't cover as many sites due to robots.txt, it's a non-profit that cooperates with Wikipedia, has a long-term plan, will never be running ads (aside from their fundraising banner, I suppose), and is more likely to be around years from now (archive.is has been around since 2012, but again, it seems very unclear about its funding/long-term prospects). For me, I'm uneasy with removing it from the blacklist, especially if we don't have a mechanism to always prefer archive.org whenever at all possible. Adjusting Template:Cite web to accommodate multiple archive URLs and to state an explicit preference for archive.org is one way to do that, but as of now the result of supporting removal from the blacklist will be to use archive.is by default, which I don't know is a good thing. — Rhododendrites talk \\ 15:35, 5 June 2016 (UTC)[reply]
    Yes, I agree that long-term stability is important for an archive service. It is also good to have one that complies with websites' requests. Doc James (talk · contribs · email) 15:40, 5 June 2016 (UTC)[reply]
  14. Oppose Andrew Lenahan - Starblind 14:52, 3 June 2016 (UTC)[reply]
  15. Oppose Archive.is is not known for predictable and transparent conduct (quite the contrary). Luckily, there are viable alternatives. Archiving services are not just a link here and there in the External links sections; they are used on a massive scale. Therefore, continuing the blacklisting is justified. – Finnusertop (talkcontribs) 14:10, 6 June 2016 (UTC)[reply]
  16. Oppose Continuing on the blacklist is justified for now, given the concerns. Pincrete (talk) 20:26, 6 June 2016 (UTC)[reply]
  17. Oppose We are to allow a site known for malicious activities to spread its links onto this project? The supporters need to take a long hard look in the mirror and decide if they want Wikipedia to be overrun with commercialism. Archive.is needs to be a hell of a lot more forthcoming with the Foundation before we could dream of allowing this. As is, it's a nightmare waiting to happen. Allowing archive.is to run amok on this project is no better than welcoming Willy on Wheels back with open arms. --Hammersoft (talk) 14:26, 7 June 2016 (UTC)[reply]
  18. Oppose. This is the proverbial drinking poison to quench thirst. And we are not that thirsty anyway. T. Canens (talk) 02:34, 8 June 2016 (UTC)[reply]
  19. Oppose Unfortunately it seems the owner of archive.is is willing to do wrong things and has spammed Wikipedia using robots. We have no control over the site, so we could end up with a large part of Wikipedia in danger from his use of our links to his site. I really wish archive.org would at least do something about its policy of deleting archived pages just because some domain squatter with no interest in the old content takes over a domain and sticks in a robots.txt - the retroactive deletions make the site very unreliable. Dmcq (talk) 08:58, 8 June 2016 (UTC)[reply]
  20. Oppose. A pig in a poke at best. Debouch (talk) 20:30, 13 June 2016 (UTC)[reply]
  21. Oppose. Classical "Operation Margarine". 176.14.242.87 (talk) 16:52, 14 June 2016 (UTC)[reply]

Discussion (Request For Comment)


Arguments by User:Jtmoon

Support Removal
  1. immune to LINKROT
  2. other archiving sources are suffering from linkrot even as this is being discussed
  3. has proven itself reliable, operating the same way since 2012
  4. fast, reliable, and easy to use; all Wikipedians grasp it, compared to archive.org, which can be pretty slow, IMO.
Counter-Oppose
  1. most Oppose concerns are "Crystal Ball reasoning": might get taken down, might host malware, might host spam
    1. most Oppose concerns apply to all other websites considered link-worthy: news sites, blogs, corporate sites, etc.
  2. since the Oppose position is one of censoring, the burden of proof should be on that side to present decent evidence for their concerns. So far, there has been only speculation.
    1. no proof provided of the supposed spam-bots or malware hosting
    2. no proof provided that archive.is had employed (or had reason to employ) a botnet
    3. no proof provided of any botnet
  3. commercial concerns make no sense, as nearly all primary-source news websites are commercial websites
--JamesThomasMoon1979 04:17, 17 May 2016 (UTC)[reply]

Previous RfCs


The previous RfCs have multiple aspects, not all of which are addressed in this RfC. The RfCs include conclusions about removal of links (which would still have consensus), and they talk about prohibiting additions of links (which is also not addressed here). Can someone please appropriately address these issues, as the conflicting consensuses might now result in the blacklisting rules being removed while we are still to prohibit additions (which would mean that an edit filter should be re-enabled?), and we should still remove the links while the links are not on the blacklist. As the situation stands, the outcome of this to-be-RfC is not bringing anyone anywhere. --Dirk Beetstra T C 05:14, 19 May 2016 (UTC)[reply]

The previous RfC ended with no consensus on the removal of links, or on allowing the addition of links. It is understood that removal from the blacklist means that editors are free to add links wherever they think appropriate. Hawkeye7 (talk) 08:26, 19 May 2016 (UTC)[reply]
I quote from Wikipedia:Archive.is_RFC_3: "Result: No consensus — Given the prior RFC, this presumably means the stare decisis would be to continue prohibiting additions and continue removing." - that consensus will not be overthrown by a decision to de-blacklist; the consensus would still be to prohibit additions and continue removing, except that this would not be enforced with the blacklist but in other ways. And even if the consensus were not to use technical means to prohibit additions (so no blacklist and no edit filter, e.g.), the continued addition would still be against consensus .. and anyone violating that would be editing against community consensus. "Combined with the lack of consensus affirming the inverse of this item (i.e., #1), again we have to look to the prior RFC, which presumably means the stare decisis would still be to continue removal, unless I'm missing something." - also not overthrown by de-blacklisting; we would still need to remove the links. Can we now please set this RfC up in a proper way, as any conclusion here is meaningless? --Dirk Beetstra T C 08:33, 19 May 2016 (UTC)[reply]
When there is no consensus, the status quo stands. Consensus on this RfC will establish a new consensus. It is understood that removal from the blacklist means that editors are free to add archive.is links wherever they think appropriate. Hawkeye7 (talk) 09:08, 19 May 2016 (UTC)[reply]
Don't get me wrong, I have nothing against overturning the previous consensuses (and I might even !vote in favour of doing that), but administratively the problem is different. We currently have a community consensus to blacklist and remove - and that is what is being applied. I cannot make the single-admin decision to overturn that. The set-up of the current RfC does not overturn it either; it only means that the blacklisting is to be reverted, while there is still a standing consensus to prohibit. I would strongly suggest phrasing clear questions in the RfC that address the previous RfCs' conclusions: 'should the additions of archive.is links still be prohibited (yes/no)?', 'should existing archive.is links still be removed (when the conclusion of the first question is 'yes') (yes/no)?'. Positive answers to those are what you (we) want. I don't understand why the questions in this RfC cannot be addressed properly, a proper, new consensus obtained, and that consensus executed. Why do things half-baked? --Dirk Beetstra T C 09:14, 19 May 2016 (UTC)[reply]
The very same reason that you or I didn't create the RfC in the first place! Because it places an intolerable burden on an editor trying to create an RfC. It restricts access to the whole RfC process to wikilawyers. So, adhering to WP:NOTBUREAUCRACY, which is a policy, a support vote should be considered a positive answer to all of the above. Remove from the blacklist (or whatever the hell it is that is preventing addition of links). Do not remove existing links. Allow addition of new links. Do not collect $200. Hawkeye7 (talk) 09:33, 19 May 2016 (UTC)[reply]
I have not been the one arguing to get the previous consensus overturned; I have ruled that that was needed for me to take administrative action. 'A support vote should be considered a positive answer for all of the above.' - considered by whom? By the administrator who has to judge what the consensus achieved here means for the previous consensus? There have now been two RfCs after the first one: one invalidated early, and a second with half-baked questions that did not properly get a consensus or clear answer (the previous RfC ended with two conclusions having '.. presumably means ..' in them). So I foresee the conclusion here becoming 'the consensus here shows that the blacklist should not be used to prohibit additions, an action which is endorsed in the previous RfCs, which presumably means that the link additions should still be prohibited but that that should not be done using the spam-blacklist', and 'the additions should not be prohibited with the spam-blacklist, but are still prohibited, which presumably means that the links should still be removed'. A bit of working together in creating this RfC and getting proper questions laid out before starting (and I did edit this page early on, and commented in different places about it early on, all with similar suggestions) may get us a long way, and at least a clear answer that cannot be wikilawyered around or given multiple interpretations. The question laid out here, however, does have multiple possible interpretations. --Dirk Beetstra T C 10:56, 19 May 2016 (UTC)[reply]
I reject your argument that it is impossible to remove the site from the blacklist. This RfC is for exactly that. Hawkeye7 (talk) 12:51, 19 May 2016 (UTC)[reply]
I am not making that argument, on the contrary, I make the argument that the consensus here could be that the site gets removed from the blacklist. It will however not counter the consensus reached by the earlier RfCs - that additions should be prohibited and that the existing links should be removed. --Dirk Beetstra T C 13:14, 19 May 2016 (UTC)[reply]
This RfC implicitly removes all restrictions on archive.is. Hawkeye7 (talk) 13:35, 19 May 2016 (UTC)[reply]
"Remove archive.is from the Spam blacklist) (Oppose/Support)" - explicitly ALL restrictions? I only see a question for one restriction. --Dirk Beetstra T C 13:37, 19 May 2016 (UTC)[reply]
Well there is nothing else that we can do. If wikilawyers want to try and obstruct the use of archive.is in the face of consensus, then it becomes a matter for ArbCom. Hawkeye7 (talk) 13:49, 19 May 2016 (UTC)[reply]
@Hawkeye7, Nyuszika7H, CFCF, and Beetstra: I've been tied up with IRL work and will be for a bit longer. Perhaps you users, or @Staberinde: or others, could address User:Beetstra's concerns. Thanks User:Beetstra for bringing this up.
@Beetstra:, I presumed that the prior RFCs, having seen no activity for several years and not having changed the blacklisting, could be left as is. That is, this RFC 4 would not need to re-present the arguments in those prior RFCs, since they seemed to be a stalemate. Secondly, I wanted to simplify the issue in this RFC 4, since re-presenting the arguments of three prior RFCs was too onerous for the editor (myself) and too onerous for other users to understand; it would only complicate this issue for little or no improvement to en.wikipedia.org.
--JamesThomasMoon1979 18:18, 20 May 2016 (UTC)[reply]


Considering that the RfC just started, we could simply add "Remove archive.is from the Spam blacklist and permit adding new links" to the proposal to cover this, and then ping everyone who has already voted in the section. There is really not much point in 3-4 different questions here, as it is meaningless to remove it from the blacklist if we don't also permit adding new links. Any objections @Beetstra:?--Staberinde (talk) 15:31, 19 May 2016 (UTC)[reply]

Also, this RfC isn't properly listed yet, but I guess that should wait until the formatting objections have been handled.--Staberinde (talk) 15:46, 19 May 2016 (UTC)[reply]

Or we could just ignore this inane objection on the grounds that it has no merit. Wikipedia is not a bureaucracy, and we can assuredly interpret this question and the responses as sufficient for overturning an old consensus. We should interpret the spirit of the question, not the exact words. Any more ink spilt on this nonsense damages the sanity of anyone who's actually here to help improve the encyclopaedia. Carl Fredik 💌 📧 22:46, 19 May 2016 (UTC)[reply]
@Jtmoon, Hawkeye7, Nyuszika7H, CFCF, and Beetstra: I went ahead and clarified the proposal wording a bit, adding "and permit adding new links"[7]. Pinging all who have already voted, to avoid any later complaints.--Staberinde (talk) 16:11, 20 May 2016 (UTC)[reply]

Comments by Beetstra

per WP:TPO - moving the comments changes their meaning. Moving the discussions back into place. --Dirk Beetstra T C 13:47, 26 May 2016 (UTC)[reply]
Beetstra — Please refrain from doing so. Others commented here, and filling the entire vote section with debate is not good practice. The Oppose section could similarly be purged of debate, but it grew more organically and someone else may have to do that. Carl Fredik 💌 📧 17:00, 26 May 2016 (UTC)[reply]
One of the major hindrances to that is actually that you chose to vote twice, with different rationales, in your comments. First as close, then oppose. It might be best if you moved the discussion, if you wish to. Carl Fredik 💌 📧 17:02, 26 May 2016 (UTC)[reply]
  1. @Jtmoon: there are many alternatives that can be used, and you seem to have gotten along quite well without it (I presume you found alternatives, or did not find it excessively needed, on the 3 occasions you were blocked when adding the site). The original spammers of 3 1/2 years ago (who were using accounts and multiple IPs (botnets)) were active on-wiki as recently as February 2016. The use of such practices, whether by site owners, fans, friends, or a competitor, is a very well-founded reason to use technical means (edit filter, blocking, page protection or the spam blacklist) to stop such edits. --Dirk Beetstra T C 05:35, 25 May 2016 (UTC) (reping: @Jtmoon: --Dirk Beetstra T C 05:35, 25 May 2016 (UTC))[reply]


@Beetstra: Thanks for replying and putting so much time into this. I was hoping to steer this debate toward something more concise but I guess everyone has a lot to say!
"you seem to have gone quite well without (I presume you did find alternatives or did not find it excessively needed when you were blocked 3 times in additions of the site)". For me, that link results in One or more of the filter IDs you specified are private. Because you are not allowed to view details of private filters, these filters have not been searched for.
But there have been a few dead links for which only archive.is was available, and I could not add it to archiveurl. Search my Contributions for the strings "dead" or "404". Some (but not all) are now marked as dead links without a supporting archiveurl entry within the citation.
I reviewed some of the discussions within this RFC 4 and skimmed the rest. Unfortunately, I don't have the time to consider every argument. But I'm pretty sure I would just reiterate my original points above under "Arguments by User:Jtmoon".
I'll just kindly request that you consider that, despite some things having run afoul in the past which involved archive.is, allowing it again would greatly alleviate the serious and perpetually ongoing problem of LinkRot.
Also, FYI, there should be two questions at the archive.is blog (blog(dot)archive(dot)is) relating to Wikipedia: one from an anonymous user and one from JamesThomasMoon1979.
--JamesThomasMoon1979 00:02, 30 May 2016 (UTC)[reply]


@Jtmoon: I only wish that more time had been put into constructing this RfC from the beginning. As I said from the start, this needs more explanation, and now, as evidenced by the remarks, you get totally uninformed !votes from editors. Also, the way you set this up is not giving you debate; you actually don't want debate.
And as I have asked a couple of times below, can you provide statistics for that? You say there have been a few dead links for which only archive.is was available. Is that the 3 times you hit the filter (I have to add 1 for when the spam blacklist was handling it, see log)? In over 2 years? If that was all, then that is not a really desperate need (on those three occasions you did not go on to see whether someone could help you work around the filter, nor did you go on to get the filter removed). And you must have added hundreds of references. It likely comes down to the 1-in-100 cases where archive.is cannot be replaced. And that can be handled very well with the whitelist.
I'll just kindly request that you consider resetting this RfC so that we can have a proper discussion, neutrally initiated and with all information presented, about the problems and possible solutions. --Dirk Beetstra T C 03:26, 30 May 2016 (UTC)[reply]
  1. If it happens again, it is easy to get User:Xlinkbot to scan for new IPs adding links and to ban them for being open proxies. Carl Fredik 💌 📧 09:39, 25 May 2016 (UTC)[reply]
    @Hawkeye7: - we've discussed this in the section below. I am still awaiting answer there. --Dirk Beetstra T C 05:35, 25 May 2016 (UTC)[reply]
    WP:PERX Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
    @Hasteur: This is much more than just "per X", so as I said below, this is just WP:POINTy. nyuszika7h (talk) 15:04, 26 May 2016 (UTC)[reply]
    @Nyuszika7H: Can you explain to me why you think that other archiving services fail to archive parts of sites, when you appear to have needed to use it only once? Did you always use alternatives? Also, you are aware that there is evidence for a botnet (I hope that you properly analysed the provided evidence for that), and that the original spammers have used it as recently as February 2016. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    Once is enough for a user to know and remember that the site cannot be linked — and already we have a negative impact. This isn't about the amount of use; it's about a completely asinine and incorrect application of the spam-list. Carl Fredik 💌 📧 09:42, 25 May 2016 (UTC) [reply]
    "Once is enough for a user to know and remember that the site can not be linked" – exactly. And in the cases I needed it, I didn't feel like requesting whitelisting, because I didn't even know that was possible – back then it was using the edit filter, I don't know if it's possible to whitelist with that, but I didn't know about MediaWiki:Spam-whitelist until some point either. And I didn't feel encouraged to request whitelisting the few times I needed it, based on other users' comments.
    I was not aware of the recent activity, but as someone else pointed out, there is no reason to believe machines were compromised: there are a lot of users who run open proxies, there are thousands of open proxies, and not all of them are (or can be) blocked. At the very least, we need to make sure that legitimate uses with no alternatives will be whitelisted. (Have any archive.is links ever been whitelisted? The previous RfCs didn't really address that; just one user mentioned whitelisting, unless I missed something.) But as evidenced by the circumvention, this is not the right approach in the long term. CFCF's suggestion of using XLinkBot sounds like a good idea to me. Although I guess sleeper socks could be used, that wouldn't be as easy to abuse. nyuszika7h (talk) 09:58, 25 May 2016 (UTC)[reply]
    @CFCF: I took the point that 'once is enough to know' would be a point - still, no-one ever ran into a sufficiently urgent need that they asked for ways around the filter, which still suggests that it is not very needed. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    Yes, and this could be considered a chilling effect on the type of content that would otherwise have been authored. Numerous studies and much literature show how adding even minuscule hurdles to performing a task on the internet will decrease the amount it is performed by an order of magnitude. It causes users to think: "Oh, but there must be a really good reason why this is blocked", or "Oh, this isn't worth my time", or even "Oh, but that process is horrible, I don't want to force myself through it" — all of which disincentivise producing quality content. Carl Fredik 💌 📧 11:18, 25 May 2016 (UTC)[reply]
    Still, it is telling, as is the fact that now suddenly we need to remove it from the blacklist (a decision taken 3 1/2 years ago, with a re-discussion in between) .. because additions are not allowed and we need it .. that is a misinterpretation (and I have commented earlier on how this RfC is presented). --Dirk Beetstra T C 12:15, 25 May 2016 (UTC)[reply]
    So you're electing to ignore the part where Archive.is strips out the original advertising on the page and inserts their own advertising links to monetize themselves, thereby making it not a true archive of the page? Hasteur (talk) 12:37, 26 May 2016 (UTC)[reply]
    @Hasteur: Can you show me an example of that? I wouldn't have noticed, as I use an ad blocker. As far as I know, there were plans to possibly introduce advertising, but that never happened; and are you sure the ads are intentionally being stripped out, and that it's not a case of something like obnoxious Flash ads that it doesn't support? Also, it is WP:POINTy to reiterate the same arguments under every support !vote. I think it's safe to assume most people here, like me, were not aware of the recent activity. nyuszika7h (talk) 15:03, 26 May 2016 (UTC)[reply]
    @Nyuszika7H: I also considered XLinkBot earlier. That, however, also has collateral damage, though to a lesser extent (similarly, we could use an edit filter, though that is heavier on the server). Indeed, in the time of the edit filter no whitelisting was possible. As spamming is likely still a problem, I think that XLinkBot/EditFilter may be needed as an alternative, and I think that that should be strongly considered.
    There are currently no archive.is links whitelisted. For the handful of requests that I handled, alternatives existed (though I did indicate and suggest there that whitelisting could be carried out if there is no alternative). --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    @Anotherclown: Can you explain to me why you think it is a useful tool, when you appear never to have needed to use the site? Did you always use alternatives? --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    What happened three months ago? —Codename Lisa (talk) 15:17, 26 May 2016 (UTC)[reply]
    Gday. I've never added it to an article, for the very reason that it was blacklisted (hence the nil results in your search). There have been a few occasions when I would have, as it was the only tool that had an archived copy of a particular dead link. As such, the result of it being blacklisted was that unreferenced information was removed when it could otherwise have been referenced and retained. Anotherclown (talk) 00:08, 26 May 2016 (UTC)[reply]
    As I mentioned previously, the mere knowledge that the site is blocked is enough to have a chilling effect. Requiring users to jump through excessive hoops to whitelist links is ridiculous and leads to nothing beyond allowing you to display your authority in denying them access. Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC) [reply]
    @CFCF: That is ad hominem and out of line with how I have acted there. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    I admit that may have been clumsily phrased — but the issue is that this centralizes the command to a single user or a small subset of users (you yourself have previously expressed that too few admins are willing to work with the black-/white-list). Wikipedia is supposed to refrain from this type of centralization of power — and a straw poll of the different requests on the whitelist page shows that most are being denied (many for pretty trivial reasons, too). Either whitelisting individual URLs (not entire page directories) should be automated so that any authorized/autopatrolled user can do it — or the spam-list should only be used for cases where there is little cause for legitimate use of a website.
    That other similar websites are available is also not an argument in this case at all — because it should come down to preference which archival service is used. archive.is allows for screenshotting the page, and it archives the page as it loads for the user who made the request — for example, bypassing regional variations in content. An example here is to go to www.nickelodeon.com or www.disney.com — where there is no visible way to access other regional sites. It is possible that this type of censoring will spread to other sites, more akin to the type that we use for referencing on Wikipedia. In fact, this is already occurring: http://speisa.com/modules/articles/index.php/item.2446/daily-mail-blocks-sweden-for-legal-reasons.html
    Traditional archival services such as archive.org have nothing to offer in this type of situation, while archive.is is able to accurately portray differences in the loaded sites. These are all rare case scenarios, but adding the burden of requesting whitelisting (a very cumbersome process) is damaging to the encyclopedia. Carl Fredik 💌 📧 11:12, 25 May 2016 (UTC)[reply]
    Yes, I have been a strong advocate of getting more people involved there, as (and I know this) I am sometimes policeman, judge, executioner, higher court, and again executioner ... (which for clear-cut cases is not necessarily a problem, and I do sometimes step away .. which then again sometimes results in requests getting ignored due to the lack of alternative manpower). Improvements would be possible there (also in the long-requested overhaul of the whole system; it is inconvenient and difficult to operate - not the priority of the WMF, apparently). 'Automation' is difficult, as one often needs to dig into the reasons and see where requests come from and how/if they are needed. This specific case is now based on an RfC (i.e. community consensus). That community consensus states that links should be removed and additions prohibited (full stop). Based on that, any whitelisting could just be blanket denied (that is automated). I already WP:IAR on that, in that I do say that I would whitelist if no replacement is possible. I have, however, not run into many of those cases. Note, regarding more manpower: non-admins could easily help there in evaluating situations and/or clarifying (and even denying the obviously 'bad' requests) - but that is not in the interest of people, apparently.
    Generally, the spam blacklist is only used for those cases where there is hardly any legitimate use. For archive.is we have been through a meta discussion in which we discussed the cross-wiki aspect of this site, saying early on that this is probably not a good site to blacklist, but that we did need a way to mitigate the cross-wiki situation (there was resistance from a significant number of wikis to the unsolicited additions). It again boils down to the crudeness of the spam-blacklist extension (WMF, where are you?), where alternatives are hard to implement (a global edit filter with a throttle would be an option, but that takes too many system resources; I could write a throttling variety into XLinkBot .. but my time is also limited; a rough sketch of the throttling idea follows below).
    As such, I am afraid that we end up in a lose-lose type of situation - either no legitimate use, or giving in to yet another spammer (I know, we have no evidence that the editor is connected to archive.is (he says he is a friend), but evidence to the contrary does not exist either). I do not expect that the community will step in sufficiently if the IPs start spamming archive.is again (maybe a bit of whack-a-mole and a bit of reverting), nor that we will now finally see some significant pressure on the developers to actually come up with proper solutions to this type of situation. --Dirk Beetstra T C 12:15, 25 May 2016 (UTC)[reply]
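    For what the "throttling" idea mentioned above might look like, here is a hedged Python sketch (the thresholds are invented): a sliding-window counter that lets occasional, presumably legitimate additions through but flags bursts of the kind seen during the bot runs.

        import time
        from collections import deque

        class AdditionThrottle:
            """Sliding-window counter for link additions (illustrative only)."""

            def __init__(self, max_adds=5, window_s=3600.0):
                self.max_adds = max_adds  # assumed threshold: 5 additions
                self.window_s = window_s  # assumed window: one hour
                self.events = deque()

            def allow(self):
                now = time.monotonic()
                # Drop events that have aged out of the window.
                while self.events and now - self.events[0] > self.window_s:
                    self.events.popleft()
                if len(self.events) >= self.max_adds:
                    return False  # burst detected: flag or revert instead
                self.events.append(now)
                return True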
  1. @Staberinde: You are aware that the original editors who were using botnets to add these links to various Wikimedia sites have done so as recently as February 2016? I would consider that a serious reason to continue blocking the site. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    No evidence has been presented that the original editors used anything but legal tools — and while this is an issue that can be solved through other means, we should not play mock investigators here — it undermines our position and makes us look like fools who do not understand the basics of the internet. Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC) [reply]
    The use of multiple internationally based residential IPs does not look good. And whether the use of the IPs was itself legal or not - if someone is editing disruptively from multiple IPs, people work hard to stop it; here someone is running unauthorised (in Wikipedia terms) automated accounts from multiple IPs .. --Dirk Beetstra T C 12:15, 25 May 2016 (UTC)[reply]
    @CFCF: A pretty minor problem - thousands of uncontrolled link additions by unauthorised bots, and the use of botnets to circumvent accounts being blocked, do not sound like a 'pretty minor problem' to me, but more like a massive understatement. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    Yeah, pretty minor. Botnets (as has been explained many times before) — and all we are left with is an unauthorised bot and one stupid editor who isn't following the rules. Either we could invite him/her to apply for authorization, or find some better way to block the open proxies that have been used. It should not be too hard to track down where he has gotten his proxy list from. Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC)[reply]
    @CFCF: Well, a bit of analysis shows what they abuse, and how difficult it is to block all those IPs. 3 1/2 years after the initial situation they were back at it with a whole new set. 'It should not be too hard to track where he has gotten his proxy-list from.' - if so, it is also not too hard for them to find another list. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    @Hobit: in the end, who added it is not important. Whether it is the site owner, a fan, a bunch of monkeys, a sweatshop, your neighbour, or a competitor, the result for Wikipedia is the same: continued and repeated disruption through the use of unauthorised automated accounts / IPs (where the IPs are part of botnets). --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    And in the end, the spam blacklist is a poor way to address the concerns, because it has collateral damage and impacts legitimate uses as well. (Whitelisting is not a viable option.) Carl Fredik 💌 📧 09:48, 25 May 2016 (UTC)[reply]
    @CFCF: So we just open the tap without considering viable alternatives (well, the two of us were the only ones to at least hint at alternatives). --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    @Codename Lisa: it is indeed a sad incident; unfortunately, it did not stay with just that incident: the same situation repeated itself just 3 months ago (about 3 1/2 years after the initial problem). --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
  1. @SSTflyer: you have "run into many occasions when repairing dead links where Archive.is is the only archiving service available" is in striking contrast with your two recorded attempts to use archive.is in the 2 1/2 years the filter blocked the additions. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    Mere knowledge that Wikipedia does not allow Archive.is is enough to make editors avoid using it. That argument is absolutely insane — it builds on the idea that all we ever do on the internet is edit Wikipedia, and that we would never archive for other purposes — and it is just stupid in that it implies that editors do not remember, or could not have seen, others who have tried and failed to use archive.is. Carl Fredik 💌 📧 09:52, 25 May 2016 (UTC) [reply]
    @CFCF: as above, I considered the once-bitten-twice-shy option .. still, no-one had a big enough problem to solve anyway, even if they knew that just trying would not be enough. --Dirk Beetstra T C 10:54, 25 May 2016 (UTC)[reply]
    @Nyttend: a big overreaction, when the site gets spammed using multiple unauthorised bot accounts and botnet IPs, and where the original editors behind the additions felt free to repeat their actions as recently as February 2016? --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    Yes, a big overreaction — even in the way you talk about it. There is no evidence of a bot-net! Carl Fredik 💌 📧 09:52, 25 May 2016 (UTC) [reply]
    Shout all you like, but the alternative is "used a list of potentially compromised computers and misconfigured routers in an effort to bypass IP level blocks", which is just as illegal. Anybody shouting that there's no evidence is simply ignoring the evidence presented.—Kww(talk) 02:53, 26 May 2016 (UTC)[reply]
    No, that is an alternative, not the alternative. I will give you that it is possible that this could have been done illegally — but that is not enough to even indict anyone in any court in the world. You must show, beyond reasonable doubt, that the person has committed a crime; mere allegations that in performing this action it is also possible to commit a crime are not enough. So, as far as legality goes — unless you have better proof, your argument is null.
    Using and hosting open proxies on personal computers is a practice that is legal in most of the world — and it does not require the use of compromised or misconfigured hardware. All it requires is connecting to a server hosting an open proxy.
    Your argument amounts to waving a gardener away from his job of cutting the rose bushes, because "his garden scissors could be stolen — and he might be an illegal immigrant too". Just because you don't happen to like what he did doesn't mean it was illegal! Carl Fredik 💌 📧 10:02, 26 May 2016 (UTC)[reply]
    Anyway, the legality of the mechanism isn't something relevant to us. The WMF didn't do anything that you mention, so it's not liable; and regardless of the mechanism used, the action was stopped because it was in violation of our policies, so the WMF isn't liable for countenancing something illegal. Both are unrelated to the issue of continuing to prevent people from using it through legal means. Nyttend (talk) 15:21, 9 June 2016 (UTC)[reply]
    @David Gerard: ".. way too useful to encyclopedic purposes", yet in the 2 1/2 years that this was monitored, you felt only once to use the site and apparently have found alternatives in all other cases where these encyclopedic purposes needed to be met. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
    Whatever you're measuring is incorrect - I have preferred archive.org links, but (e.g. when h-online took down its entire archive) had to resort to archive.is several times when there was no other copy I could find.
    Also, at this point you're just getting spammy yourself, badgering people, saying the same things over and over. This isn't convincing me, and I suspect others, that you have a substantive point - David Gerard (talk) 06:41, 25 May 2016 (UTC)[reply]
    I can agree that my measure is thwarted (I took that into account, though that in itself is telling as well), but I don't think that the use of archive.is is as big as is suggested here (and most people here did not make the effort to actually check whether it is so much needed over other services; that is what you have said here as well: 'I have preferred archive.org links' - so not much use for archive.is anyway. I wonder how others think about this).
    I think that if you use arguments in an RfC, you should be able to defend them, and I have been saying since the beginning that people were not properly informed by the introduction of this RfC and that it needed to be expanded (and the question that you posed below in the discussion is one of the examples why your !vote here is uninformed - what if we have no other methods of stopping the continued abuse?), and I do think that most of the editors here are giving (obviously) uninformed responses (the spamming is something of the past - did the !voter actually go through recent discussions regarding this subject and look through the logs to see whether it is not current and ongoing? Well, obviously not, since it was an issue a mere 3 months ago). --Dirk Beetstra T C 06:58, 25 May 2016 (UTC) (expanded --Dirk Beetstra T C 07:02, 25 May 2016 (UTC))[reply]
  1. @Ravensfire: You forgot about the case where a domain expires and/or is taken over. Parked domains usually have a robots.txt that prevents crawling, and if a domain is taken over, the attacker can block crawling manually. Archive.is has a manual report process if one would like an archived copy to be taken down. There is only limited automated archiving (from Wikipedia recent changes; I don't know of anything else); other than that it's manual, so it doesn't really make sense to apply robots.txt. Anyway, I don't mind a preference for other archives when they are not inferior. nyuszika7h (talk) 14:30, 23 May 2016 (UTC)[reply]
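    For concreteness, a parked domain typically serves a blanket robots.txt like the two-line sketch below, which is all it takes for archive.org to retroactively hide its earlier snapshots of that domain:

        User-agent: *
        Disallow: /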
  1. @Pinguinn: "The bot stuff is in the past now." - for a full 3 months. --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
  1. @James Allison: Appears to be a useful way to prevent link rot .. yet you have not used this link in the 2 1/2 years that its use was restricted. How can you judge that it is useful? --Dirk Beetstra T C 05:29, 25 May 2016 (UTC)[reply]
  1. Mendaliv - that is the current practice (a slightly soft reading of the previous RfC): links cannot be used (as they are blacklisted), except when it can be shown that there is no replacement, in which case specific links can be whitelisted. If that is what you mean, then the links might as well stay on the blacklist. --Dirk Beetstra T C 03:13, 25 May 2016 (UTC)[reply]
    I support flipping the burden, then. Demonstrate that the links are available elsewhere, are needless, or are otherwise forbidden by policy and they can be removed. The bottom line is that there's no justification for keeping it on the blacklist. If it starts getting uncontrollably spammed, then we can consider re-adding it. As I say above, and have said elsewhere, there's no clear evidence of use of a botnet, no evidence to connect the spamming to the site operators, and no evidence that disfavoring its use as an archival service merits this extraordinary and unusual manner of preventing its use. And let's just assume for a moment that there is a real risk of the site going rogue: Then use Internet Archive to snapshot the archive.is page and link to Internet Archive instead (presuming there's some reason why we aren't just using IA to archive the live page). No risk of going rogue anymore. —/Mendaliv//Δ's/ 03:47, 25 May 2016 (UTC)[reply]
    @Mendaliv: "there's no justification for keeping it on the blacklist. If it starts getting uncontrollably spammed, ...". You are aware that the original spammers were active throughout Wikimedia as recent as February 2016, using the same techniques of automated edits on both an account and on multiple IPs of the same character (proxies, botnets) as what precipitated the original discussions, RfCs and consensus to blacklist? --Dirk Beetstra T C 04:55, 25 May 2016 (UTC)[reply]
  1. What is this supposed to mean, Kaldari? That if all else fails we remove it from the blacklist, or that the links should only be used if there is no alternative archive (the latter being the current practice)? --Dirk Beetstra T C 06:58, 25 May 2016 (UTC)[reply]
  • Just commenting that this sub-section is a disaster, and it's entirely inappropriate to move someone else's comments (multiple comments directed at multiple users) to a separate section while leaving everyone else's comments intact and in place (and it should really go without saying that it looks really bad if you only move someone's comments who you disagree with). Also, they were just dumped here and the syntax wasn't even cleaned up. They should be moved back into place, the rest of the extended comments moved out of the survey sections, or it should at least be rendered readable for crying out loud. — Rhododendrites talk \\ 03:28, 30 May 2016 (UTC)[reply]

Spammy links?


Of course, the original problem remains. Can we deal effectively with the spammy links it's being abused for on an individual basis? Is this feasible? - David Gerard (talk) 19:22, 23 May 2016 (UTC)[reply]

@David Gerard: I'm not sure what you're talking about, I'm not aware of archive.is links still being "spammed", unless you are talking about bypassing the spam filter using archive.is, which can just as easily be done with WebCite. nyuszika7h (talk) 20:31, 23 May 2016 (UTC)[reply]
I'm assuming that was the original justification for adding it to the blacklist. If it wasn't, what was? - David Gerard (talk) 10:36, 24 May 2016 (UTC)[reply]
@David Gerard: Well indeed, what this case has shown is that someone was originally spamming it, and that that resulted in blacklisting (first enforced through an edit filter). That spamming was first by named accounts, and when those were blocked multiple IPs took over - which showed that blocking the accounts did not solve the problem (and obviously page protection was not cutting it either). We also know that one of the original editors was active (on Wikimedia, not on en.wikipedia where they are blocked) as recently as February 2016, so I think that is enough evidence to show that the editors until recently still had an interest in linking this (though before 2016 they had not been making multiple additions on a really significant scale). On the other hand, members of this community have not been able to add links for years now without too many problems (only within days of actual blacklisting did they start to cry wolf, because they did not comply with the second decision made in the original RfCs: all links should be removed). Those editors have not shown that there is indeed a significant number of links that can never be replaced, by presenting an analysis of a random, significant subset of currently linked archive.is archives (which corroborates the few complaints that links could not be added in the first place - apparently alternatives existed or were not needed), nor have they shown whether not having an archive (for newly to-be-added links, or for those linked instances) is detrimental to the information on that page. User:XLinkBot might be an option to catch spammers early on, but that will have some collateral damage similar, though on a smaller scale, to spam-blacklisting (reverting genuine edits by new editors/IPs). --Dirk Beetstra T C 12:04, 24 May 2016 (UTC)[reply]
You might want to read this discussion to make a decision on whether you think the threat is over, and whether we can handle the situation in other ways. --Dirk Beetstra T C 15:12, 24 May 2016 (UTC)[reply]

NOINDEX


The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Please stop removing NOINDEX. RFCs are not to be indexed, see http://en.wikipedia.org/robots.txt (also http://bugzilla.wikimedia.org/show_bug.cgi?id=11261). These Archive.is RFCs are maliciously placed not under /wiki/Wikipedia_talk:Requests_for_comment/ in order to evade the indexing prohibition. That's especially weird when done by the editors arguing for respecting robots.txt— Preceding unsigned comment added by 105.107.123.132 (talkcontribs)

There are many RfCs which are not under that tree. There is no policy or guideline that prescribes that RfCs are not to be indexed; that pages under the mentioned tree are noindexed may very well be for another reason. Do not insert that tag again without getting a proper consensus. --Dirk Beetstra T C 13:06, 26 May 2016 (UTC)[reply]
There is also no policy that archives should respect robots.txt. You put yourself on the opposing side. Also, before consensus is established, the noindex must be present as the default option for all RFCs.
Or should I open a ticket in Bugzilla to have these pages added to robots.txt manually by the WMF admins?— Preceding unsigned comment added by 78.139.174.106 (talkcontribs)
An RfC .. not no-indexed for the reason of being an RfC. And there are many examples like that. --Dirk Beetstra T C 13:15, 26 May 2016 (UTC)[reply]
I do however have a question, randomly cross-country hopping IP: what is in these RfCs that should not be found by e.g. Google? --Dirk Beetstra T C 13:18, 26 May 2016 (UTC)[reply]
There is an RFC-section on a talk page which cannot be individually included or excluded. The Archive.is RFCs are designated pages in the same Wikipedia: namespace where all non-indexed RFCs are normally placed.
There is no cross-country hopping, I am in Georgia. And I have a question - what was the reason to format the name of the RFCs in a way that ensures it will not be found by e.g. Google? — Preceding unsigned comment added by 78.139.174.106 (talkcontribs)
'where all non-indexed RFCs are normally placed' .. there is however no real regulation for that.
Funny, a couple of minutes ago you were in Algeria, and yesterday in the Ukraine. And I already answered that question: there is no policy or regulation for that on Wikipedia. You have however neatly evaded the question. --Dirk Beetstra T C 13:32, 26 May 2016 (UTC)[reply]
There is established practice coded in robots.txt. There is established practice to place new RFCs under /wiki/Wikipedia:Requests_for_comment/. Well, no one has to follow it. But no one has to resist it either. There must be a reason why an experienced editor does not follow it and creates an RFC with an SEO-optimized name avoiding robots.txt - that is not a random event already. And then fights against NOINDEX on the pages - that is definitely not a random event.— Preceding unsigned comment added by 78.139.174.106 (talkcontribs)
There is also an established practice to keep things in the same system. If anything, you could ask for all 4 RfCs to be moved into the tree. For the rest, it may be common practice, but that is about it. --Dirk Beetstra T C 13:50, 26 May 2016 (UTC)[reply]
I mean the crusade against NOINDEX directly confirms the initial intention of choosing name for the RFC especially to avoid robots.txt. Yes, I mean the very first RFC, not RFC_4 (which was named after RFC_3, which ...). 78.139.174.106 (talk) 13:53, 26 May 2016 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

 In progress. Pinguinn 🐧 17:26, 26 May 2016 (UTC)  Done[reply]

  • Thank you very much indeed! 78.139.174.106 (talk) 17:33, 26 May 2016 (UTC)[reply]
    ! The OP's IP is close to https://en.wikipedia.org/wiki/Special:Contributions/105.107.15.128 which has a short history of doing nil but flagging archive.is links as being dead. The latter is obviously a sock from the anti-archive.is camp. So there's evidence of serious abuse on both sides. --Elvey(tc) 16:39, 5 June 2016 (UTC)[reply]
    ? There's someone hopping IPs who wanted this page to be noindexed so as to be treated the same as it would be under Wikipedia:Requests_for_comment/. I don't really understand the motivation of either side, but I also don't understand what in the world you're talking about. There's at least one IP hopper participating. You've found another IP from one of those ranges who may be someone participating in this discussion. Are you saying it was someone anti-archive.is who was arguing with Beetstra to noindex the page? (Beetstra, who has been one of the more vocal "anti-archive.is" participants in this discussion (insofar as the blacklist is concerned)). What reason do you have to throw out WP:SOCK allegations? Because someone used multiple IPs? I genuinely don't follow. — Rhododendrites talk \\ 17:03, 5 June 2016 (UTC)[reply]
    I would consider the IPs that try to NOINDEX to be IPs on the support-de-blacklisting side (making sure the world doesn't see the discussions that resulted in the blacklisting), as were the IPs that were (successfully, in 2 cases) canvassing for support !votes (all in discussions where there were complaints about not being able to link, or directed at editors who were finding solutions to keep at least the data on-wiki; 2 editors responded within an hour of being pinged, which strongly suggests that they were here because they were pinged). The other IPs active here at this time fit the pattern less clearly. However, the former all do fit the same pattern as the IPs in 2013, April 2015 and February 2016. --Dirk Beetstra T C 17:47, 5 June 2016 (UTC)[reply]
    @Rhododendrites: I am not against de-blacklisting per se .. But I am concerned about the continued abuse by the original spammers, and do think that editors should be properly informed before !voting, and that solutions should be properly discussed before deblacklisting. As such, I am against a plain uncontrolled de-blacklisting as is mainly argued for here. --Dirk Beetstra T C 19:24, 5 June 2016 (UTC)[reply]

Questions


First of all, these discussion sections as they exist right now are sort of unapproachable for the previously uninvolved (scattered discussion, debates over nuance without some basic information, syntax problems, etc.). Similarly, we really need a summary at the top of why archive.is is blacklisted and why it should be taken off the blacklist. Just pointing to four other RfCs while providing no other context makes it really difficult for neutral parties who weren't part of the past discussions to join.

So here are my questions. My hope is that they will lead to inline discussions, but feel free to ignore if you think I should dig through the previous RfCs. — Rhododendrites talk \\ 13:14, 29 May 2016 (UTC)[reply]

I disagree with the premise that this is needed — seeing as an overwhelming majority support overturning the ban. It was shoddily introduced and likewise poorly defended by a small number of editors who believe that sheer persistence will lend their position more weight. Carl Fredik 💌 📧 13:45, 29 May 2016 (UTC)[reply]
@Rhododendrites: And I completely disagree with that. Many of the editors were oblivious to intermediate problems (diff; diff; diff; and some of the support !votes, e.g. diff). It is clear that there is a lack of information provided to the editors (as also evidenced by the comment here, and the questions you posted below). It is now difficult to say whether the presentation of this RfC is the reason for the 'overwhelming support'. --Dirk Beetstra T C 13:57, 29 May 2016 (UTC)[reply]
Well, sure, if you say "this site is useful, should we take it off the blacklist" and say nothing about why it's blacklisted, the default would probably be support. Without providing an adequate summary of context, you're going to get people who are already opinionated on the subject, people who give kneejerk uninformed opinions, and very few uninvolved contributors who take the time to parse old discussions. — Rhododendrites talk \\ 14:31, 29 May 2016 (UTC)[reply]
WP:KETTLE violation there, CFCF. You dismiss concerns, state your own dismissal repeatedly, and repeatedly disrupt the discussion in order to minimize the opposition. [8] [9]. Your abuse of this RFC alone should merit invalidating it and running a fresh one in which you are forbidden to participate.—Kww(talk) 03:13, 30 May 2016 (UTC)[reply]
Kettle is an essay, it can't be "violated". I have nowhere disrupted the discussion and I have engaged in it in a reasonable manner. What you suggest is preposterous and will never happen. Carl Fredik 💌 📧 12:27, 30 May 2016 (UTC)[reply]
In a reasonable manner, Carl? Your selective moving of (sometimes ongoing; diff) discussions, bolding of comments when they do not seem to come through (diff), and an (in your words, clumsily worded) ad hominem attack (diff), on an RfC where there has been continuous resistance towards discussion but which was selectively formatted into a vote where questions to opposers were allowed from supporting parties, but questions to supporters from opposing parties were not, do not seem reasonable to me (and I think that others, like Hasteur, on whom you performed such actions, thought similarly). --Dirk Beetstra T C 13:01, 30 May 2016 (UTC)[reply]

1. A few years ago a bot and several IPs added many links to archive.is inappropriately. Is there any reason to consider these events relevant to the blacklist today?

  • The action was mistaken; the response should instead have been to close the open proxies that were utilized. Carl Fredik 💌 📧 13:41, 29 May 2016 (UTC)[reply]
  • (edit conflict) The editors were active, using the same MO as 3 years ago (accounts and multiple IPs scattered over the world), as recently as February 2016 (3 months ago). It therefore seems not unlikely that, upon removal of the block on additions of the site, the editor could very well return and continue their actions. Though I have not found any relation with the editors in February, a similar MO (multiple IPs scattered all over the world) was used to make sure that this page was 'noindexed' as recently as 3 days ago. It goes without saying that it is easy to find more open proxies to use (in a way, it becomes a game of Whack-a-mole). --Dirk Beetstra T C 13:45, 29 May 2016 (UTC)[reply]
Please explain what is referred to by MO? Is it modus operandi? I'm uncertain most editors will understand what you mean. Carl Fredik 💌 📧 13:49, 29 May 2016 (UTC)[reply]
Sorry, yes, I meant modus operandi - in this case the use of multiple IPs scattered over the world. --Dirk Beetstra T C 14:02, 29 May 2016 (UTC)[reply]

2. What differences exist between archive.is, archive.org, webcite, and/or other similar services? This question regards features as well as the organizations behind them. In practical terms, if a website is still available, or if it is unavailable but there exist multiple extant archives on different sites, what reasons are there to use (or not use) archive.is vs. alternatives like archive.org or webcite?

Yes, thank you. Carl Fredik 💌 📧 13:55, 29 May 2016 (UTC)[reply]
  • (edit conflict) Archive.is is 'on demand', and more persistent (archives do not get deleted when the original site changes rules or owner - something that anyway does not happen too often). However, much (but not ALL!) of the information is available elsewhere. Additions of the links have been blocked for a long period of time without a large influx of complaints that this caused problems (and I don't think we had a large influx of whitelisting requests because editors found their additions blocked; the influx there came more recently, since articles still containing the link now get flagged). Also, several recent whitelisting requests were declined as alternatives were available on other archiving sites. --Dirk Beetstra T C 13:45, 29 May 2016 (UTC)[reply]
In the short-term websites do not change owners very often — but when looking at a 5-10 year period and longer they most certainly do. Many websites from the early 2000s have different owners, different layouts, or have been abandoned for years at a time. Locking popular sitenames in the hope that someone will come along and pay large sums for the domain is a very common practice. Carl Fredik 💌 📧 13:54, 29 May 2016 (UTC)[reply]
The question is how that number relates to the millions of references we have. For the first cases on the whitelist many did still have functional alternatives. How many of the thousands of originally spammed links actually suffer from this problem (can we have an analysis of a subset of 100-200 to show that that is really a significant problem)? --Dirk Beetstra T C 14:02, 29 May 2016 (UTC)[reply]
It isn't down to whether the problem is common — but a single case of a lost source caused by this block is significant in my book. Just because the issue is rare doesn't mean it is in any way insignificant. Carl Fredik 💌 📧 15:15, 29 May 2016 (UTC)[reply]
Well, the way I meant it is: if it is 1 in 10, blacklisting becomes rather disruptive, as we would have to whitelist a lot. If it is 1 in 50 or 1 in 100, then the problem is already much smaller. Given the ease with which alternative archives could be found for the whitelist requests, and the fact that no-one bothered in the 2 1/2 years that it was blocked to ask for an exception, we seem to be closer to the 1:100 side than to the 1:10 side. You suggest here that if we lose a reference because the original goes down and we do not have a viable archive (which could be the case if we are already too late to create an archive of the information with archive.is), then that would be a disaster for the information on Wikipedia. Is that really true? Would we delete because the reference does not exist anymore? Is that information so important that we have no other references confirming the situation? (And under the current situation, we could whitelist the archive.is link if it is demonstrably the only archive.) --Dirk Beetstra T C 15:40, 29 May 2016 (UTC)[reply]
  • There's talk of longevity here. What reason do we have to believe archive.is will be around? They only date back to 2012, after all, and say nothing about their funding except that it's private. The Internet Archive, putting aside any faults it has for the moment, is a non-profit behemoth with a long-term plan and tons of funders. — Rhododendrites talk \\ 14:31, 29 May 2016 (UTC)[reply]
None, but they are currently the only service that provides this functionality. We shouldn't prohibit the use of a good service just because it isn't perfect. Carl Fredik 💌 📧 15:09, 29 May 2016 (UTC)[reply]
  • (edit conflict) The longevity argument for archive.is is indeed one that can be returned in kind. Archive.org sometimes deletes material over robots.txt issues, but the site is likely to stay around - archive.is does not delete, but since it is privately funded it may not stay around forever, meaning that those links could all go down as well - leaving no alternative either. A return question: if we do not link to internet archives at all (including archive.org), will that bring down Wikipedia? Will it really bring down those statements for which the original citation does not exist anymore, or is that where WP:AGF on the original citation comes in? --Dirk Beetstra T C 15:40, 29 May 2016 (UTC)[reply]

3. If a website is unavailable and the only extant archive is at archive.is, is there one particularly salient problem with archive.is that would make having a dead link preferable to using it?

  • Other archival sites will remove a website's archive if the site has gone offline and been replaced with a notice that the domain is for sale — as these pages often have robots.txt files. Archive.is disregards robots.txt when the archival has been performed manually — creating a more permanent archival service than, say, archive.org.
It is also considerably faster, and transparent to the end user in how it performs the archival. Carl Fredik 💌 📧 13:41, 29 May 2016 (UTC)[reply]
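For context, this is the kind of robots.txt a parked or for-sale domain typically serves (a minimal illustrative sketch, not taken from any specific domain). It blocks all crawlers outright, which is what causes crawl-based archives such as archive.org to suppress their copies:

    User-agent: *
    Disallow: /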
  • (edit conflict) No. Those specific instances could be whitelisted (and I have just done one a couple of hours ago). Except for the lack of manpower at the whitelist, there is no reason that links cannot be whitelisted if there is a demonstrable need for it (i.e. there are no alternative archives available or even that the alternative archives are grossly insufficient in capturing the required information, or when other archives are deleting the content due to certain reasons). --Dirk Beetstra T C 13:45, 29 May 2016 (UTC)[reply]
  • I see talk of why archive.is is better, why it's not that great, and ways to keep it on the blacklist, but still no explanation of why it's blacklisted to begin with other than spammers. Surely if someone started spamming links to Elsevier it wouldn't be blacklisted, so why is it? In the time since asking these questions I did a little bit of digging to learn more about archive.is. There's a longevity concern I stated above, but there's also the question of whether, if we opt to use archive.is instead of e.g. archive.org, they could decide to flood the site with ads in the future (even malicious ads). With archive.org's funding model and various commitments we're reasonably safe from that, but the FAQ on archive.is even says "Well, I can promise it will have no ads at least till the end of 2014." Which means it could be imminent. That seems like an awfully big deal when talking about introducing tons of links from Wikipedia. — Rhododendrites talk \\ 14:31, 29 May 2016 (UTC)[reply]
Yeah, it would be a big deal if that was what was up for debate — but it isn't. We are discussing whether the site should be allowed at all, not whether we should allow the introduction of tons of links. I am fully behind the idea that we should stop users from running unauthorized bots, but when archive.is already holds links that no longer exist anywhere else (yes that is the case) — we shouldn't be distracted by the issue of a rogue user who is ignoring policy.
I don't see why we would scrutinize their funding model when we don't do the same with media-sites which are already riddled with ads. Carl Fredik 💌 📧 15:06, 29 May 2016 (UTC)[reply]
You say here that we wouldn't be introducing a ton of links. So then again the question: how many links are we talking about? How much do we really need to whitelist because there is no alternative? --Dirk Beetstra T C 15:40, 29 May 2016 (UTC)[reply]
  • (edit conflict) @Rhododendrites: It was blacklisted because the community decided that the additions were uncontrollable (unapproved bot accounts and what is likely the same bot running from many IPs scattered around the world) and had to be stopped. Given the community support, I think that the community there decided that it was not thát needed (alternatives often exist) and that the abuse was not likely to be stopped in other ways (as User:CFCF suggests, you could block all the IPs, but the supply of open proxies is likely rather unlimited and it becomes a game of Whack-a-mole). A blocking edit-filter does the same as the blacklist (though it may be easier to tune: disallow only unconfirmed editors - which in itself is also easy to circumvent); a weaker edit-filter only detects and can be ignored; XLinkBot can be reverted/ignored (or you wait until you are confirmed) ... And who is going to run behind the mess to clean it up when the IP is finally blocked? We are 2 1/2 years further and the original mess is still not cleaned up, despite the community decision that that needed to be done.
  • I agree, if Elsevier started spamming on this scale it would likely not be stopped this way, though blacklisting might in the end become the only means of stopping it. It would not be a single-admin blacklist decision - it could, in an extreme case, be the outcome of an RfC. I think there as well, part of the decision would need to take into account whether the links are really, really needed. We have some stuff on the blacklist where the companies behind the links are still actively spamming what is left over, and there is community resistance against blacklisting those domains 'because they are sometimes useful' (giving in to spam). I'll post a return question for this: why does archive.is need to be spammed to such a degree (why is someone spamming archive.is to Wikipedia), while that is not observed with Elsevier? --Dirk Beetstra T C 15:40, 29 May 2016 (UTC)[reply]
  • In relation to the Elsevier case, there are initiatives on Wikipedia to have links to external databases where the owners are actively helping to expand our articles (sometimes only with the link to their database). All those things can very well happen in good collaboration, in line with our m:Terms of use. Here, however, we have someone spamming the links to archive.is, and someone actively trying to make sure that discussions about archive.is are not indexed by Google (suggesting that it is at least not a Joe job: a Joe jobber whose intention was to discourage linking to archive.is would want to make sure that these discussions could be found online; moreover, they have been only somewhat successful here, so their approach was bad anyway if that was their intention - hence it is a 'fan', a 'friend' or the owner themselves). All attempts to discuss with this editor (or group of editors) have been futile. Does that give the impression that the editor intends from now on to be cooperative - or will we soon have to run behind them again to revert and block, just because a few links are really needed and whitelisting is too much of a burden there? --Dirk Beetstra T C 16:07, 29 May 2016 (UTC)[reply]

4. If one or more archiving services are preferable, or if one or more are undesirable, are there other ways other than blacklisting to communicate or implement that prioritization?

  • The choice of which service to archive with should be up to editor discretion — therefore the question is moot. Carl Fredik 💌 📧 13:41, 29 May 2016 (UTC)[reply]
  • (edit conflict) The edit filter (which has been active for several years) practically did that (preventing the addition of the link). One could consider a 'warning only' filter to be used, but that would not deter additions of links that are replaceable, nor would it, obviously, stop those who feel the need to spam (like the editors and IPs that precipitated the first RfC and the decision to blacklist). Another option is User:XLinkBot, but also there it is a matter of reverting the bot (though that generally gets noticed quite early, and persistent editors might end up being reported to WP:AIV early on if they go over the addition throttle (6th revert within a 3 day period if I recall the settings correctly)). --Dirk Beetstra T C 13:45, 29 May 2016 (UTC)[reply]
  • I'm thinking about other means. For example, a bot that looks for archive.org versions of the same source and replaces them when available. You'd lose the benefits of archive.is above, yes, but I'm wondering if these sort of compromise scenarios have been discussed. — Rhododendrites talk \\ 14:31, 29 May 2016 (UTC)[reply]
I don't like that idea, because it should be presumed that when an editor chose a specific archival service that was done for a reason.
I am far more supportive of running XLinkBot on new users/IPs — essentially barring IPs from adding these links. I don't think that would have very much impact on legitimate use.
An alternative is to run a bot that automatically lengthens the URLs from archive.is to their full name, after which the site can be cross-referenced with archive.org in case archive.is ever does close. Carl Fredik 💌 📧 15:06, 29 May 2016 (UTC)[reply]
If it is for a reason, that reason can be clearly stated, and would be a good reason to whitelist the link. If you can't make that case, then you should not even add the link. --Dirk Beetstra T C 15:41, 29 May 2016 (UTC)[reply]
That would be reasonable if there weren't only one or two arbiters of the spam-list. What really ought to be done is to block additions by IPs/new users and allow anyone with an editing record to add the links.
Getting links whitelisted is an unreasonable burden and makes editing more difficult for established editors — and if such an editor starts spamming links they can easily be blocked.
What is happening here is a misappropriation of a very blunt tool to address an issue which needs to be dealt with using finesse. Carl Fredik 💌 📧 12:35, 30 May 2016 (UTC)[reply]
Still, that is not what is presented here, nor how this RfC was set up, nor how you voted your support. It is a vote-stacking (not a discussion) of totally uninformed editors, something I have remarked upon from the beginning. Blocking is an option, but if, like in February, dozens of IPs are editing congruently, blocking is a very hard task and the influx of edits and following reverts is equally disruptive.
And calling the few editors who are active on the black/whitelist the arbiters is again a disproportionate characterisation, especially when those 'arbiters' actually are upholding a weaker stance than what is the current consensus in the last RfC (which was to remove all links even if there was no alternative - which translates to don't whitelist just remove and deny). It is not the first time that you feel the need to use such language here, while still considering this a discussion. I can agree that the tool is blunt, but that is what the community decided to use here (the continuous spamming of the links by the many IPs and similar disruption with relation to archive.is is equally being performed in a very blunt way, where other tools hardly work). --Dirk Beetstra T C 13:01, 30 May 2016 (UTC)[reply]
  • (edit conflict) @Rhododendrites: It is not only discussed, we actively have a bot adding archive.org archives to links (though not, I believe, replacing archive.is .. though that would be an easy-to-implement extension of the system). It would have been a good discussion point here, but that is not how this RfC was set up. It poses a mere question: 'remove'; it does not consider asking for alternatives. --Dirk Beetstra T C 15:40, 29 May 2016 (UTC)[reply]

@Rhododendrites: "For example, a bot that looks for archive.org versions of the same source and replaces them when available." – It would be a bad idea for a bot to do that, because archive.org may very well have a copy but one that happens to be unusable. @CFCF: "An alternative is to run a bot that automatically lengthens the URLs from archive.is to their full name" – If people use the citation templates correctly (with |archiveurl=), that shouldn't be necessary. nyuszika7h (talk) 15:56, 29 May 2016 (UTC)[reply]

@Nyuszika7H: "because archive.org may very well have a copy but one that happens to be unusable" - again the question: can we have decent statistics for that? How often do the archive.org copies turn out to be so bad that they cannot give the information that needs to be verified? 1:10? 1:100? Is it the same with the number of whitelisting requests of archive.is links where there were no alternatives to be found? --Dirk Beetstra T C 16:10, 29 May 2016 (UTC)[reply]
@Beetstra: My point is that archive.is links should not be arbitrarily replaced, overriding the user's decision, which may have been made because it is the only suitable archive – although while archive.is remains on the blacklist, it should be OK to do that (making sure to exclude any whitelisted URLs). nyuszika7h (talk) 16:26, 29 May 2016 (UTC)[reply]
What about adding optional parameters to Template:Cite web for specific archives? You could use archive-url and archive-date as usual, filling in the archive of your choice, and someone else could fill in an additional field if they so chose. E.g. if there's a reason to use archive.is, you could still add the parameters archiveorg-url and archiveorg-date. If something happens to archive.is in the future, the content isn't lost, and if something happens re: robots.txt with archive.org, content wouldn't be lost. Some will undoubtedly view this as overkill, but it doesn't necessarily change how people use the template -- it would just add flexibility and an extra layer of protection against link rot. — Rhododendrites talk \\ 16:49, 29 May 2016 (UTC)[reply]
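A hypothetical rendering of that proposal (the archiveorg-url and archiveorg-date parameters do not exist in Cite web; they are sketched here, with illustrative URLs, only to show the idea of carrying a second archive alongside the first):

    {{cite web |url=http://www.example.com/article |title=Example article
     |archive-url=https://archive.is/Abc12 |archive-date=1 May 2016
     |archiveorg-url=https://web.archive.org/web/20160501000000/http://www.example.com/article
     |archiveorg-date=1 May 2016}}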
I'd actually like to see how we handle archives switched to something more like what is done with ISBN, PersonData, intra-wiki links for other languages, etc. using WikiData. See Wikipedia:Requests for comment/Archive.is RFC 3#Look at referencing templates that support links to multiple archiving sites. PaleAqua (talk) 16:58, 29 May 2016 (UTC)[reply]
Memento could be useful in that. nyuszika7h (talk) 16:59, 29 May 2016 (UTC)[reply]
@Nyuszika7H: but archive.org links could be added to ALL places where there is already an archive.is link. That would greatly aid in determining whether there are no replacements in those cases. I would still like to see properly gathered statistics on how many of these cases really need these sites. Maybe we can then make a well-informed decision and come up with better solutions than the current situation. --Dirk Beetstra T C 03:43, 30 May 2016 (UTC)[reply]

Would it be possible to simply require the use of Cite X templates when adding an archive.is link, with requirements for the original URL and some kind of tag to claim that every other archive's version is missing or incorrect? A popup on preview/submit would inform the user that archive.is is the last resort, and to check this list of archives first, and if so, add a specific tag. (noarchiveorg perhaps?) Similar to a blacklist but not quite. Non-templated links would be blacklisted outright, as would any use of archive.is without a url tag. Or is this just too complex? Spambots gonna spam, and blocking them should be separate matter from the use of a legit but abused resource. SilverbackNet talk 08:20, 1 June 2016 (UTC)[reply]
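A sketch of what that last-resort rule might look like in wikitext; the noarchiveorg flag is the hypothetical tag named above, not an existing parameter, and the URLs are illustrative:

    {{cite web |url=http://www.example.com/article |title=Example article
     |archive-url=https://archive.is/Abc12 |archive-date=1 June 2016
     |noarchiveorg=yes}}<!-- hypothetical flag: no usable archive.org or WebCite copy exists -->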

@SilverbackNet: Whether in templates or not, the edit is not going to save; the software recognizes both as an added external link. Setting up an edit filter in which non-templated links are disallowed might be an option, though it is not really a block, as it is just as easy to spam templated links, or links inside templates.
One could opt for an edit filter where 'new editors' and IPs are blocked from using the archive (though that is going to have collateral damage of its own), and exemptions will be needed as well (sometimes the edits are genuine). It will cause a lot of work, and note that the decision to remove the links and blacklist them is now over 3 years old, yet no-one has done a significant cleanup (and those who would try would probably be condemned for it). It is a difficult decision - either you have extra work due to whitelistings needing to be done, or extra work helping those blocked by an edit filter, or extra work cleaning up behind the spammers (we would for sure be setting up a filter to play whack-a-mole).
I would favour requiring at least the fully expanded links (and not only for archive.is), rather than the 'shorthand versions'. That could be enforced by blacklisting, leaving full links as an option (something similar is done for parts of google.com). --Dirk Beetstra T C 09:01, 1 June 2016 (UTC)[reply]
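To make that distinction concrete, these are the two archive.is URL forms under discussion (examples illustrative): the long form embeds a timestamp and the full original URL, while the short form is an opaque code from which the original URL cannot be read.

    Short form: https://archive.is/Abc12
    Long form:  https://archive.is/20130615000000/http://www.example.com/article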

The original set of mass additions used archive.is and archive.today, which have been blacklisted / abuse-filtered since the first RfC. That RfC described how open proxies were used in bot-like additions from multiple locations. Additions have been blocked for 2-2 1/2 years.

It now turns out that archive.limited was spammed as recently as April 2015 (SPA IPs from the US, Sweden, Iceland, and Russia), and that in February 2016 archive.today links were updated or tagged using multiple IPs (named accounts, as well as SPA IPs from Spain, Morocco, Algeria, Ukraine etc.). As recently as a couple of days ago, several IPs (SPA IPs from Ukraine, Algeria, Georgia) insisted that this page be no-indexed, a modus operandi quite similar to the mass additions/changes/tagging.

Based on this, and the years of experience in this field on Wikipedia, I really wonder whether we can define this as 'the spamming stopped'/'is something from the past', and whether we can trust that the spamming will not restart if all restrictions on linking are taken away. --Dirk Beetstra T C 05:45, 30 May 2016 (UTC)[reply]

The connection via "done by IP" is rather weak, it is not like Rotlink edits (they added archiveurl= and here we see dump replacements of live links): https://en.wikipedia.org/w/index.php?title=Chinese_New_Year&diff=656959956&oldid=656602591
URL structure http://web.archive.org/web/*/http://archive.limited/* differs from archive.{is,today}.
Filtering of archive.is did not prevent this archive.limited spam. I would say more advanced filtering is required. Alerts when more than 1 link is changed, when changes made only in links, etc 77.44.177.146 (talk) 06:34, 30 May 2016 (UTC)[reply]
I do have concerns that the different groups of IPs are all operated by the same editor(s). They all concern different IPs throughout the world, and they all have strong interest in domains/discussions related to archive.is. For the rest, yes it is weak. I do need to look around and dig a bit further. Checkuser data might be able to help us, though a lot of this is stale by now. --Dirk Beetstra T C 07:14, 30 May 2016 (UTC)[reply]
By this logic, 1 pro-vote and 2 counter-votes above have been made by Rotlink. And Rotlink advertised here his SPAM-agency (lushlinks.com) 77.44.177.146 (talk) 07:41, 30 May 2016 (UTC)[reply]
No, you are wrong in interpreting my logic. --Dirk Beetstra T C 08:09, 30 May 2016 (UTC)[reply]
  • Yes, there are spam issues — but as the IP editor above has stated, more advanced filtering that does not limit legitimate use is needed. The filter as it exists now is disruptive to Wikipedia and does not help. Having XLinkBot analyze all new additions by new users and IPs is a simpler and far less destructive alternative. Carl Fredik 💌 📧 12:30, 30 May 2016 (UTC)[reply]

Questions to User:Proud User

  • You state that you do not see reason to blacklist the site. However, regularly since the initial use of several accounts and a large number of IPs, there have been mass edits by similar groups (including other accounts and many unrelated IPs) in relation to external links related to archive.is (archive.today, archive.limited, etc.) (e.g. April 2015, February 2016). How do you think we should curb these (ongoing) spam issues? It was obvious in the original set of additions, as well as in the February 2016 situation, that blocking only resulted in them finding other accounts. And do you think it is reasonable that this type of disruption consumes editor time, and that IPs need to get blocked while there may also be genuine editors using those IPs? --Dirk Beetstra T C 13:24, 30 May 2016 (UTC)[reply]
As I said, we should stop automated scripts from linking to archive.is. I should have also added "spammer." I think the best way to stop spammers from disrupting the system with archive.is links is to say something along the line of "you need to be autoconfirmed to add archive.is links." We can keep archive.today and archive.limited on the blacklist as it is used ONLY by spammers and I see no legitimate use. We should not ban archive.is links altogether, especially if there is no alternative. --Proud User (talk) 13:33, 30 May 2016 (UTC)[reply]
"you need to be autoconfirmed to add archive.is links." - this was the state from before-RFC1 to before-RFC2. Introducing prohibition for autoconfirmed editors to add archive.is links triggered RFC2 and RFC3.
The prohibition was asynchronous to any SPAM-actions, caused only by SPAM-fighters notice that in the mentioned timeframe the autoconfirmed editors added few thousands non-spammy links to archive.is.
"2016. Two autoconfirmed editors talking about circumventing archive.is spam-filter". It it would be allowed the rate of adding links by such editors would be compared to one of the spammers. 91.185.159.42 (talk) 16:42, 30 May 2016 (UTC)[reply]
@Proud User: thank you for your answer. There are alternatives, which are currently implemented, to the total ban: whitelist the links that are really needed. --Dirk Beetstra T C 03:20, 31 May 2016 (UTC)[reply]

Questions to User:Ianmacm


You state "... there is little point in attempting to change all of the archive.is links manually when there are more important things to do ...". Do you expect that hunting down the spammers of archive.is, blocking the IPs and reverting their edits is going to give editors time to do 'more important things' than the few minutes that each editor could spend on replacing a few encountered archive.is links in a normal editing proces, or even have a bot add archive.org links to all cases where archive.is is currently listed, lowering the burden of editors to have to find suitable archive.org links themselves? --Dirk Beetstra T C 11:14, 1 June 2016 (UTC)[reply]

We're going to have to agree to differ on this. The evidence about the botnet hasn't convinced everyone. As for "the few minutes that each editor could spend on replacing a few encountered archive.is links", there must be hundreds in the articles on my watchlist. To say that it would be time consuming to replace all of them manually would be an understatement.--♦IanMacM♦ (talk to me) 11:22, 1 June 2016 (UTC)[reply]
@Ianmacm: I was not making an argument about a botnet. I would use an argument along the lines of 'a lot of additions via an automated process using globally distributed accounts/IPs' - a strategy that was repeated in 2015 and 2016 in relation to archive.is domains.
I am also not asking you to replace the 100s of links on the 100s of pages on your watchlist (and I don't think that you are the only person watching those 100s of pages). That it is time consuming is something I agree with - but if editors had each done 1 page a week, a large part of those pages could easily have been done already. --Dirk Beetstra T C 11:42, 1 June 2016 (UTC)[reply]
There are currently an estimated 9500 links in mainspace (12661 in total, ~75% in mainspace). Over 134 weeks, it would have taken about 70 editors removing one link a week each to clear them. It doesn't seem that it is really such a burden: in that period (actually, a slightly shorter one) about 18000 links did get removed (I wonder what happened to all those references .. all down the drain, or were there alternatives for all of them?). --Dirk Beetstra T C 12:12, 1 June 2016 (UTC)[reply]
This is starting to get into WP:BLUDGEON territory. I've stated my views on this issue and haven't got any more to say for the time being.--♦IanMacM♦ (talk to me) 12:24, 1 June 2016 (UTC)[reply]

How archive.is is spammed after blacklisting


https://en.wikipedia.org/w/index.php?title=Special:Search&search=insource:/[Aa][Rr][Cc][Hh][Ii][Vv][Ee].?[Dd][Oo][Tt]/&ns0=1&fulltext=Search (archived copy)

The search link was found on @LLarson: tool page User:LLarson/sandbox. I have not yet tracked who (IPs or editors) inserted these. Feel free to investigate and undo. 78.139.174.106 (talk) 13:29, 1 June 2016 (UTC)[reply]
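For readers unfamiliar with insource regex searches, the pattern in that URL breaks down as follows; it catches obfuscated mentions such as "archive dot is" spelled out in article text to slip past the URL blacklist:

    [Aa][Rr][Cc][Hh][Ii][Vv][Ee]   "archive" in any letter casing
    .?                             at most one character in between (space, period, ...)
    [Dd][Oo][Tt]                   "dot" in any letter casing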

Interestingly, one of those was added by admin WhisperToMe. Not sure what Beetstra et al. think about it, but that instance is really just a note to editors, not useful to readers. But if there's really no alternative, I guess it could be whitelisted. nyuszika7h (talk) 19:27, 1 June 2016 (UTC)[reply]
@Nyuszika7H: I've found instances where there's no other way to archive the link other than using that archive dot is tool, and/or the only remaining archive of that page is on archive dot is. If there's a way to whitelist individual ones that would be great WhisperToMe (talk) 19:30, 1 June 2016 (UTC)[reply]
@WhisperToMe: Yeah, it seems they are willing to do that now (at MediaWiki_talk:Spam-whitelist). Most editors probably weren't aware it's a possibility as the consensus of the previous RfCs was to remove all links (which didn't actually happen) and prohibit any new additions, with no mention of whitelisting. It wasn't even technically possible while it was using an edit filter rather than the spam blacklist. nyuszika7h (talk) 19:33, 1 June 2016 (UTC)[reply]
@Nyuszika7H: Probably they are not willing to do it: "If this is the only source, the content is clearly not significant". The reference to the possibility of whitelisting seems to be just a way to set up a Catch-22: if it is the only source, then one may ask for whitelisting; if it is the only source, then the content is clearly not significant, and the same input is used to deny whitelisting. 46.109.223.76 (talk) 19:54, 1 June 2016 (UTC)[reply]
IMO "If this is the only source, the content is clearly not significant" is a silly argument: it's crucial to hold onto any source one can possibly get, and if this is the only surviving version of a newspaper article cited to show some historical development of the subject you need to be able to link to it. Wikipedia works by "death by a thousand cuts" - one source doesn't seem significant but together they build the article WhisperToMe (talk) 22:10, 1 June 2016 (UTC)[reply]
The actual source is of course the newspaper article, not the archive. The problem was always verifiability. Unfortunately, newspapers are disappearing, and many are going online only. So without archive.is, we are in danger of losing the source completely. Hawkeye7 (talk) 23:05, 1 June 2016 (UTC)[reply]

I have no problem with a hidden message adding another archive - what I do have problems with is deliberate circumvention of the blocks (which I did notice with some editors) and/or references along the lines of 'type archive.is in the address bar of your browser, and find the archive of http://www.example.com' (or worse, 'because archive.is is on the spam blacklist, find it yourself on archive.is'). (I haven't seen examples of this with archive.is; I referenced such a case above.)

As I have repeatedly said on the whitelist, if there are no alternatives, then we are very willing to whitelist archive.is links (see MediaWiki_talk:Spam-whitelist#archive.is.2FDRIzl for an example). If there are alternatives, then use those. The argument of 'but the alternative, archive.org, may delete it in the future' will be met with 'we then switch by whitelisting archive.is'. I have tried to codify that practice in MediaWiki_talk:Spam-whitelist/Common_requests#archive.is, and that is a weaker reading than what is actually prescribed by RfC 3 (remove all, even if there is no alternative).

The argument here used by User:WhisperToMe, If there's a way to whitelist individual ones that would be great (and it is also used by many others - people don't know that links can be whitelisted), does not hold ground - the warning you receive when trying to save a page with a blacklisted link clearly states the option of whitelisting. WhisperToMe, you did run into the spam blacklist a couple of times, so you must have seen that warning.

I agree with others here that the argument by User:JzG in "If this is the only source, the content is clearly not significant" is maybe a bit harsh (or needs more explanation), but 'they are not willing to do that' is not the right conclusion to draw from it. --Dirk Beetstra T C 05:45, 2 June 2016 (UTC)[reply]

@Beetstra: where is the template for the message regarding a certain link being blocked? I don't remember the template having a message regarding whitelisting... (I see a message about it now, but I don't remember the previous versions having it.)
When editors want to see the source for a small article, and 40-60% of it comes from a certain link from the Japan Times (yes, there is a link I need whitelisted, originally from the Japan Times), it's a very necessary link.
WhisperToMe (talk) 05:54, 2 June 2016 (UTC)
Found it! In Lycée Seijo there is an article about the school's closing from The Japan Times ("Seijo Gakuen closes French campus."). No, it doesn't source over 50% of the article content, but such an article is necessary for understanding the subject. The Wayback Machine does not display archives of this item, and WebCitation also lacks a copy, as "Search" -> "URL to find snapshots of:" returns nothing. WhisperToMe (talk) 06:09, 2 June 2016 (UTC)[reply]
@WhisperToMe: The last version of MediaWiki:Spamprotectiontext is from December 2013 and reads Request that just the specific page be allowed, without unblocking the whole website, by asking on the spam whitelist talk page. That same text was also there in 2011, which is a version predating all of the cases where you hit the spam blacklist (I think the text was first introduced in October 2006).
Your case about Lycée Seijo is exactly the type of case I mean: get it whitelisted. The resistance you got in the quoted whitelist request can be countered - it is however you who has to make the case; you need the link and you know best why it is necessary. It is however again just one single case, without proper statistics on how often the archive is really needed and how often it can be replaced with other archives. However, the spam problem existed in 2013, again in 2015, and the editors were active in 2016 as well. Do you really want to give editors the work of catching the spammers and blocking their many, many IPs for the occasional archive.is link? --Dirk Beetstra T C 08:38, 2 June 2016 (UTC)[reply]
@WhisperToMe: under the current consensus (RfC 3, ignoring what we are having here), whitelisting of links that are needed should be a formality (even if that is already a very loose reading of the RfC). I, personally, will WP:AGF on any established editor who tells me 'I need this archive, it is nowhere else' and whitelist it (asking anyone who does not state that to go that extra mile, and spot-checking in some other cases). My only worry is how to handle the IP-hopping editors who are interested in situations regarding archive.is: either spamming archive.is-related links (2013, April 2015), tagging archive.is-related links (while on other wikis updating/spamming them - February 2016), or heavily interacting with discussions related to that (noindex pushing, interacting with this discussion, canvassing people to come here - to which at least two people reacted). --Dirk Beetstra T C 11:48, 2 June 2016 (UTC)[reply]
@Beetstra. You forgot to add that the same IPs voted here on your "oppose" side. And during this RFC they created a highly biased article on Encyclopedia Dramatica (depicting the people behind the archives as Nazis and collecting only "oppose" arguments from the discussions on Wikipedia), which brought biased people here. As for canvassing, no one has been interested yet in inviting the hot pro-archive advocates from the past RFCs (Lexein, ChrisGualtieri, Kheider, ...). Assuming the IPs are a single entity (albeit I believe in at least 3 actors behind the IPs and 2 competing actors behind the archives: .is and .limited), I would say their interest is orthogonal to the topic: they are interested in keeping the status quo (links blacklisted as before, noindex as before, everything as before, ... - and there is no such option nor such party to support).

Allow extendedconfirmed users to edit through the blacklist


Considering the sheer number of pages with blacklisted links, most of which are due to archive.is issues, and the fact that most of the spamming is done by IPs and new users, it makes sense for WP:EXTENDEDCONFIRMED users to be able to edit through the blacklist (i.e. set the filter to warn but not disallow), or at least through archive.is' particular entry. It prevents shit like this, where a vandal blanks a page with archive.is on it and reverting is hindered by the blacklist. Satellizer el Bridget (Talk) 04:07, 3 June 2016 (UTC)[reply]
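A rough sketch of how that could be expressed as an edit-filter rule, assuming AbuseFilter's documented user_groups and added_links variables; the exact regex, group name, and action are assumptions and would need tuning:

    !("extendedconfirmed" in user_groups) &
    added_links rlike "archive\.(is|today)"

With the action set to disallow (or warn), this would restrict only accounts below extended-confirmed, which matches the suggestion above.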

Yes good point. Doc James (talk · contribs · email) 15:23, 5 June 2016 (UTC)[reply]
Very supportive of this idea — as long as it is technologically feasible. One problem that exists is that these articles will not be editable at all by anyone without extendedconfirmed rights, and changing these things may require some substantial work in the MediaWiki software. For the record I am supportive of allowing any edits through the blacklist if the user is extendedconfirmed — not only relating to archive.is. These edits should not be seen as de facto spam — and if spamming occurs (adding many similar links, not just the occasional black-list circumvention) the user can be dealt with appropriately. Carl Fredrik 💌 📧 18:12, 5 June 2016 (UTC)[reply]
This may work if the system and policy are similar to the editing of protected pages (as user:CFCF suggests). It should also not be too difficult technically, though it still has some unwanted side-effects (which is the reason the pages currently get tagged for having blacklisted links). This suggestion should be discussed on a VP page (VPT, likely), user:Satellizer. --Dirk Beetstra T C 19:32, 5 June 2016 (UTC)[reply]
I can't imagine how such a thing could work with current software, although no opinion on how hard it would be to change the software this way. Nyttend (talk) 15:22, 9 June 2016 (UTC)[reply]

WaybackMedic


There is a bot removing links to archive(dot)org which have disappeared from the archive due to changes in robots.txt: 5000 edits in less than a week. All of these are potential insertion points for links to archive(dot)is, if archive(dot)is were allowed as a last resort. 179.176.202.185 (talk) 17:03, 3 June 2016 (UTC)[reply]

There is also an answer to the statistics question raised in this RFC: the WaybackMedic author's estimate is that 1 in 5 archive(dot)org links are invalid ("est. 20k pages of ~100k checked"). 179.176.202.185 (talk) 17:17, 3 June 2016 (UTC)[reply]

Archives and Edit Wars


There is an edit war on the German Wikipedia. One party sent a removal request to the Wayback Machine, and the evidence used by the other party was removed (screenshot). Archive.is still has the page so far, but it could be a question of days or hours. — Preceding unsigned comment added by 78.84.89.4 (talkcontribs) 06:09, 8 June 2016


This morning, I noticed two IPs, 185.127.244.160 (talk) and 27.4.121.209 (talk), mass-removing archive.is citations from articles. I don't have time to investigate each citation in hopes of finding suitable replacements, and, because of the blacklist, I can't roll them back either; the best I can do is undo and comment out the offending archive links, which I did on the articles on my watchlist. Is there a better way to handle this? Rebbing 16:48, 21 June 2016 (UTC)[reply]

The better way to handle it is to not add blacklisted links to Wikipedia. Since you were not able to rollback due to the blacklist, I presume that means the pages haven't been given an exception, so anyone is correct to remove them (pending, of course, the outcome of this RfC). — Rhododendrites talk \\ 17:20, 21 June 2016 (UTC)[reply]

The above discussion is preserved as an archive of the debate. Please do not modify it. No further edits should be made to this discussion.