Wikipedia:Requests for comment/Archived citations
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- The result was no consensus (9 support, 6 oppose). This RfC has been closed early as it did not generate much input from the community. On-going discussions have moved to a second RfC based on comments below and on this talk page, to move this proposal forward. Everybody is encouraged to join the discussion and express their views. - Hydroxonium (H3O ) 03:06, 28 February 2011 (UTC) (link to new RfC added –SJ 21:02, 22 March 2011 (UTC))
Should MediaWiki:Common.js be modified so everybody that visits the English Wikipedia will now have additional links — next to all external links — that point to the archived versions of those webpages at Wikiwix.com? - Hydroxonium (H3O ) 14:31, 11 February 2011 (UTC)
Summary
Link rot is a major problem on Wikipedia. The French Wikipedia found a solution using cached webpages at Wikiwix.com and implemented it over 2 years ago. A task force on English Wikipedia is proposing the same solution. A board member from the Wikimedia Foundation has reviewed the related discussion and is supportive.
- Please refer to the on-going discussion of additional backup solutions to this problem in order to avoid a single point of failure.
This change affects everybody that visits the English Wikipedia with a browser that has javascript enabled, not just registered users, so your input is appreciated. Thanks. - Hydroxonium (H3O ) 14:31, 11 February 2011 (UTC)
Details
Wikipedia relies on verifiable information from reliable sources to ensure that the information it provides is accurate. Wikipedia uses external links to webpages to verify information in articles. There are currently 17.5 million external links used on the English Wikipedia (see list, 844 MB download), with a few thousand links added each day. These links often go dead, which is referred to as WP:LINKROT. The number of articles marked with dead links has been increasing and is now over 100,000 (see graph).
A task force was formed to deal with this issue and found that the French Wikipedia implemented a solution in October 2008 (see translated proposal). The solution involves modifying the javascript code that is common to everybody who visits Wikipedia. The modified code adds a blue "[archive]" link to cached versions of webpages from the French search engine Wikiwix.com (see translated example; note, however, that in the original proposal the links were green). The "[archive]" links would be added next to all external links on the English Wikipedia for visitors that have javascript enabled.
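The mechanism described above can be illustrated with a minimal sketch. It is not the actual fr:Utilisateur:Pmartin/cache.js script: the endpoint in `ARCHIVE_BASE` is a placeholder (the real Wikiwix URL scheme is not given in this discussion), and the helper names `buildArchiveUrl`, `archiveLinkHtml` and `addArchiveLinks` are hypothetical. The only MediaWiki detail relied on is that external links carry the CSS class `external`.

```javascript
// Placeholder endpoint (assumption): substitute the archive service's real URL scheme.
const ARCHIVE_BASE = "http://archive.example.org/cache/?url=";

// Build the address of the cached copy for one external link.
function buildArchiveUrl(externalUrl, base = ARCHIVE_BASE) {
  return base + encodeURIComponent(externalUrl);
}

// HTML for the small "[archive]" link placed right after an external link.
function archiveLinkHtml(externalUrl) {
  const href = buildArchiveUrl(externalUrl);
  return ' <small>[<a href="' + href + '">archive</a>]</small>';
}

// Browser-only step: decorate every external link in the given document.
// It is defined here but not called, so the helpers above can also run outside a browser.
function addArchiveLinks(doc) {
  doc.querySelectorAll("a.external").forEach(function (link) {
    link.insertAdjacentHTML("afterend", archiveLinkHtml(link.href));
  });
}
```

In a browser, the sketch would be applied with `addArchiveLinks(document)` once the page has loaded. Because the link is built purely from the original URL, no per-article edits are needed, which is the property the proposal depends on.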
Proposal
Modify MediaWiki:Common.js to include fr:Utilisateur:Pmartin/cache.js so that everybody will have a link to cached versions of all external links.
Testing
This solution has been in use at fr.wikipedia for over 2 years and has been thoroughly tested. Users who would like to test it may add fr:Utilisateur:Pmartin/cache.js to their vector.js.
To test it, add this line
importScript("User:Pmartin/cache.js");
to your vector.js.
For example, in Lady_Antebellum, this is the archive of reference 4 [1]
Consensus required
Implementing this feature requires consensus from the community as it affects everybody that visits English Wikipedia using a browser with javascript enabled. Please use the discussion area below for extended comments. Thanks. - Hydroxonium (H3O ) 14:31, 11 February 2011 (UTC)
Support (modify Common.js to add link)
- Support as a proven solution. - Hydroxonium (H3O ) 14:33, 11 February 2011 (UTC)
- Support : I've tested it on three French wikis without problem. JackPotte (talk) 16:49, 11 February 2011 (UTC)
- Of the proposed solutions, this is basically the only one that is proven and appears feasible in the near-term (i.e. next several months). Mr.Z-man 22:20, 11 February 2011 (UTC)
- Support : (edit/note: This is an indirect reply to BetaCommand (Δ)/others on this page that had opposed at the time this post was written.) This is a problem that needs to be solved as quickly as possible. I see very few drawbacks to using such a service. This is something that has literally already been tried, tested, and succeeded quite well for years. While we may not be fr wiki, that doesn't mean we have to be different just for the sake of being different. Sometimes time spent reinventing the wheel results in a great new idea or service, but often reinventing the wheel just amounts to time spent (wasted?) reinventing the wheel. If we feel compelled to show a disclaimer saying that you are connecting to an external site we could always do that via an onwiki page, however I don't feel as if that is necessary, as it should already be covered under the general disclaimer/WP:NOT#CENSORED. Really I don't see how linking to a cached version of an objectionable site is any worse than having an objectionable link in the article in the first place. The bottom line is the nature of Wikipedia is such that sections of it will not be appropriate to children. That is one of the primary arguments against WP:NOT#CENSORED that you encounter. If parents feel that strongly about it then they can block wikipedia using software either freely available or built into windows. I really fail to see how we are any worse than google in linking to potentially inappropriate content or the other few billion sites that are inappropriate for children. Wikipedia's core mission is not to serve/be suitable for children, this is NOT Club Penguin. It is the nature of the internet that much of it is not suitable for kids, and it's pretty much impossible to protect kids who actively seek out (see search term #4) inappropriate content. --nn123645 (talk) 03:42, 13 February 2011 (UTC)
- Support : Proven solution; as an editor of both, I always wondered why en didn't have that feature. [[CharlieEchoTango]] 07:20, 15 February 2011 (UTC)
- Support : I can't think of a reason to oppose. This solution seems very good and I have confidence any future problems will swiftly be addressed. Yoenit (talk) 21:47, 15 February 2011 (UTC)
- Support : Whichever service is used, I think this is a very good idea (or we could go with a combo of services or we could put a sign-up for any archiving service that wanted to join in, but if we want to do some more elaborate set up like that, I'd suggest starting out with using Wikia for archiving and then thinking about adding more). I don't think I need to argue that link-rot is a problem (although let me throw out the nightmare scenario - Google goes out of business, all Google books links go dead). Moreover, we should not be complacent about this issue just because it has been batted around for years now, link-rot is right now acting against Wikipedia's verify-ability and reducing content quality. Now I think archiving should be a policy for all Wikipedia links, but I see resistance to that, and it would be an immense shame to delay ourselves further while searching for an ideal solution. So let's split this debate - let's consider separately archiving external links generally and archiving cite-web links (or more generally links in citations because not everyone uses cite-web templates), a proposal which seems to have more support. If we can get at least archiving going for cite-web links it'll be an immense boon for Wikipedia, and to be honest I can live with a long, rambling debate over the wisdom of archiving the other links. Jztinfinity (talk) 23:34, 16 February 2011 (UTC)
- Support : As a user of fr.wiki, this is a great solution. Please adopt! — Preceding unsigned comment added by Pulverpulver (talk • contribs) 18:19, 18 February 2011 (UTC)
- Support: This looks like a very useful service; it has been well received on fr.wp for some time. A good example of exchanging best practices across wikis. –SJ 22:07, 22 February 2011 (UTC)
Oppose (do not modify Common.js)
- Oppose for now. This should be tried using {{Dead link}} with the parameter use_cache=yes and url= so we can see how this would work here. – Allen4names 16:37, 11 February 2011 (UTC)
- Can you explain what you mean a bit more please? I assume you mean that we should change the deadlink template to include a link to archives, but the problem with that is that generally speaking, by the time a link is dead, it is too late. I've spent a fair bit of time trying to fix dead links and only ~50% can be found at the internet archive, for example. SmartSE (talk) 11:21, 13 February 2011 (UTC)
- Point made. The template I should have referred to is {{Cite web}}. Mind you I don't make enough effort to archive links myself, but we should not give preference (i.e. by policy or guideline) to one web archive service over another so some means of choice should be offered. I am not sure how to do that. – Allen4names 18:39, 13 February 2011 (UTC)
- Wikiwix is, as far as I know, the only archive service that's willing to not only do this for free, but do practically all the work themselves. All we have to do is use the script. The other services require, at least, a bot to tell the service which links to archive. And if we use template parameters rather than a script, we'll need a bot to make hundreds of thousands of edits to edit the templates in articles. Other services may also require a fee or other investment from the WMF. Mr.Z-man 18:50, 13 February 2011 (UTC)
- Please note that in the trial period there would be no mass edits by bots. The {{Cite web}} template can be modified to add wikiwix_test to the class HTML parameter and the script modified accordingly so that the archive link will only appear if archiveurl= is absent or use_cache= is set to yes. I strongly dislike the idea of relying on one web archival service. – Allen4names 20:02, 13 February 2011 (UTC)
- So instead, we should rely on zero, until we can get multiple ready? Why would we only test it on a limited number of links for a trial, when it's significantly easier to enable (and disable) it globally? Mr.Z-man 22:53, 13 February 2011 (UTC)
- If this is tested and we end up relying on it too much I hope it gets disabled quickly and permanently. This can become an evil dependency very quickly with users not bothering to use any other web archiving service (WebCite for example). I could only hope that the Wayback Machine keeps working if such a thing were to occur. – Allen4names 03:56, 14 February 2011 (UTC)
- If someone wants to run another system in parallel, there's nothing stopping them from doing so. But why on earth should the lack of one working system prevent us from using another? Why is zero better than one? "Users" should not have to do anything. Anything that requires users to manually archive links is a massive waste of time. We shouldn't require users to go through a pointless process for something that can be easily automated. I should also note that the operator of Wikiwix has agreed to provide backups of the data. Mr.Z-man 21:54, 14 February 2011 (UTC)
- I also agree to provide an RSS feed of each link we detect from recent changes, to help other archive solutions; that is all any archive needs to work. This tool could be developed on the toolserver, if you do not want a dependency. Pmartin (talk) 23:02, 14 February 2011 (UTC)
- If we enable this globally, we won't have a way to inhibit the links in cases where Wikiwix doesn't work, as shown in my oppose below. Also, it will be awkward to add links to other archive services. I don't like the idea of handing over control of hundreds of links on some pages to an outside group with no control for an editor. Yes, we can complain to Wikiwix, but who knows if it will do any good or if we will even be understood if we don't know French. —UncleDouggie (talk) 08:52, 15 February 2011 (UTC)
- Wikiwix is my website. Adding or removing our archive takes only one update on Wikipedia. So if in the future you think that we are borderline, just ask an admin to remove the script from Common.js. So it is not a problem. We have also worked with some chapters [2] or [3]; do you think that is because we are not trusted? Pmartin (talk) 09:24, 15 February 2011 (UTC)
- There might not be consensus for removal because it works on some articles. However, if it doesn't work on an article that I'm editing, I would have no recourse to disable it on that one article only. Who pays for your hosting? I don't see any ads. It just seems like something that Wikimedia should be hosting. —UncleDouggie (talk) 10:01, 15 February 2011 (UTC)
- There is an option to disable the link: when you write it, add class="noarchive". Our servers are on Renater, the French university network. I think Wikimedia could not store this data because there are a lot of trademarks, copyrighted data, .... Pmartin (talk) 10:10, 15 February 2011 (UTC)
- The foundation has shown no interest in starting any kind of project like this even though we have known for years that this is a major problem. To oppose simply because you think the WMF should host it in my opinion is imprudent, as even if you're able to get the foundation to agree it will take months/years for it to get implemented. Just look at how much attention they have given Patrolled Revisions and Liquid Threads, the latter of which STILL isn't implemented and both of those changes don't require any major change in existing infrastructure. Unlike many other problems this backlog is more than one of just inconvenience; some of these links can't be replaced once they go dead. Designing an alternative system is great but we should have something in place at least as an interim measure while it is being developed here. This is a situation where redundancy is not a bad thing. As far as it not working on every article I don't think that is too big of a deal. Having a link that goes to a 404 on some articles is better than not having a link at all to an archive. There is nothing preventing us from adding more than one link to an archive service in the script, and in the highly unlikely event that wikiwix goes rogue it would be a matter of a single edit to common.js in order to remove the link. I think it is pretty well understood by most people that external links, or all of wikipedia for that matter, may contain offensive content. --nn123645 (talk) 17:23, 15 February 2011 (UTC)
- Final comment. In my opinion implementing this in MediaWiki:common.js will be a net negative with regards to dead links. This should be used for dead link repair with the archived versions verified by editors and should be added as a gadget with the default set to off. — Allen4names 16:50, 16 February 2011 (UTC)
- Having people manually add the archive links will not improve the reliability of archiving, it just creates a mountain of busywork and, if we can't keep up with the backlog, an inconvenience for readers. If anything, having fewer people actually checking links on the archive has the potential to decrease the reliability if errors and bugs in the archive service are missed for longer. If we wait until links actually die before we check for archived versions, we're screwed if the archive is bad. Mr.Z-man.sock (talk) 17:08, 17 February 2011 (UTC)
- Hello Allen4names, the comment about not "building in" reliance on a single service is a good one. I would be interested in an implementation that involves having a copy of the cache stored on a Wikimedia server or the toolserver. Since Wikiwix uses an open format for their caching, this should be doable; as soon as we get a commitment from another service, we can implement active redundancy (in addition to those backups) –SJ 22:15, 22 February 2011 (UTC)
- We are talking about a project that will require several TBs of space. AFAIK at the current time the toolserver does not have the capacity to handle a project like this. --nn123645 (talk) 16:36, 24 February 2011 (UTC)
- Oppose VERY VERY Bad idea, [4] is one example of how it can go very wrong, basically it showed porn advertisements to a minor. ΔT The only constant 18:23, 11 February 2011 (UTC)
- The whole thing is a misunderstanding. The staff was experimenting on an index for Twitter, and how to reuse its tweets into an interesting and contextual list of links. They tried to integrate it into Wikiwix. It turns out, the feature wasn't understood well by users, and links to content unsuitable for children could be created. However, the community quickly reacted, and was listened to. Pmartin quickly disabled the feature on Wikiwix. Note that there was no advertising, the links were only reused content from Twitter. There was no commercial interest. This was a serious issue indeed, but it has been solved. So I believe we can move on now. Cheers, Dodoïste (talk) 22:03, 11 February 2011 (UTC)
- Please Δ, be serious, try to check if what you write is true before you write it: it was not advertising, and how do you know that the girl was a minor? How can you write that? 81.64.104.59 (talk) 10:23, 12 February 2011 (UTC)
- Please actually read that thread, if you can't read French grab a translator and take a good read, the issue was raised because the daughter of the user in question was using fr.wiki to do some research (for school I think). Honestly I cannot support such a large scale project when other solutions exist with far fewer issues. ΔT The only constant 20:39, 14 February 2011 (UTC)
- It was not our advertising; we solved the problem within 48 hours. You can see what happened here [5]. I want to add that if I wanted to make a porn search engine, it would be a very bad idea to test it on Wikipedia :) Pmartin (talk) 22:42, 14 February 2011 (UTC)
- Δ, once again, be serious, and stop writing things that you have no idea if they are true or false. I want to discuss facts and logical arguments, I don't have any time to lose with your approximations (for instance, you tell me to take a good read at the thread, and say that the girl is "the daughter of the user" whereas she's his niece, you say that she was using fr.wiki to do some research for school although neither you nor I actually have any idea why she was using it). So please, stop writing approximations, it's something that makes me nervous, I dislike this way of arguing, I want to make my opinion on true facts, not on biased facts and suppositions x-( thank you 81.64.104.59 (talk) 00:35, 15 February 2011 (UTC)
- I read that thread in fr; the problem was misconstrued, and the advertising seen was not on a Wikiwix site. As I understand it, it is not possible for any sort of ads to be seen in that fashion now, since the interface improvement mentioned above. –SJ 22:15, 22 February 2011 (UTC)
- Oppose I think Betacommand hits the nail on the head. Simply put, we can't simply put extra links to who knows what all over the place; we should be checking our existing external links first and then adding links to archived versions. While the French Wikipedia may have used this for two years, we are not the French Wikipedia and I think a small-scale test is something that would be necessary first, to see if this is worthwhile. I am, however, unopposed to this being a gadget, with the default to off. /ƒETCHCOMMS/ 21:55, 11 February 2011 (UTC)
- A small-scale test sounds like a fine idea. Perhaps we could start with a single category that uses a lot of webcites? –SJ 22:15, 22 February 2011 (UTC)
- Oppose the suggested implementation. If we're going to link archival material using javascript - which is not, in itself, a bad idea - we should be utilizing multiple archival sites, we should prefer sites with a clear copyright compliance policy, and we should prefer sites that don't inject advertising - see the comment above by Δ for one of several good reasons why the last point is very important. I'm not opposed to a version of this javascript solution that checks Archive.org and WebCite, although the latter is tough to link to programmatically. — Gavia immer (talk) 22:08, 11 February 2011 (UTC)
- Please, oppose to the proposition, but not based upon bad arguments, and moreover false: no advertising was displayed. See the explanation of Dodoïste, or this link for more explanation how it all happened: http://blog.wikiwix.com/en/2011/02/03/archivage-des-liens-externes-le-probleme-a-ete-traite/ 81.64.104.59 (talk) 10:23, 12 February 2011 (UTC)
- I agree with Gavia. And as far as I can tell after looking through the wikiwix solution, they have a solid copyright compliance policy, and they don't inject advertising. (They are a better solution than webcite in this regard) Archive.org should definitely be pursued as a redundant solution, and they can share the same dataset and cache-format with wikiwix - but their implementation will be a bit slower. –SJ 22:15, 22 February 2011 (UTC)
- Wikiwix does inject advertising of themselves. They place a banner across the top of the archived page with links to their Wikipedia search engine that has Google ads. They don't currently place Google ads on the archived pages themselves and it sounds like they plan to keep it this way because they would probably get dropped by fr.wikipedia if they did so. I understand the need for the banner ads; they have to make money somehow to support the archiving service. The question is if we find this to be an acceptable trade-off. We desperately need an archiving service. I would prefer that the Foundation and WebCite reach an accord that gives us automatic, high-speed, bannerless archiving. If that's not going to happen within the next 30 days, we should go with Wikiwix because they are far better than our current sad state of affairs. —UncleDouggie (talk) 07:13, 23 February 2011 (UTC)
- Oppose Obv. - per above; per common-sense; doesn't fix the problem, adds excess crud links, breaks things, etc etc Chzz ► 03:44, 12 February 2011 (UTC)
- What does it break, and how does it not fix the problem? Mr.Z-man 05:40, 12 February 2011 (UTC)
- Oppose – Blindly adding links that we hope work right doesn't seem to make much sense. On a new ref, Wikiwix won't have the page archived yet so the extra link would be dead. Same thing if the page has a robots.txt file that prevents them from archiving it. I recommend that our dead link bots check for alternate archived versions and do something smart. Either change the main link and move the original to a comment, or add extra archived link(s) to the end of the citation. —UncleDouggie (talk) 06:34, 13 February 2011 (UTC)
- From my understanding, Wikiwix follows Wikipedia's recent changes, so any link added will be archived almost instantaneously. For old links, Wikiwix checks with other archive services. Mr.Z-man 07:24, 13 February 2011 (UTC)
- You're correct, Mr.Z-man. The only websites that are not archived by Wikiwix are those that contain a robots.txt file that prevents them from archiving it. The robots.txt limitation is similar in every archive service, as they all have to comply with the law. Cheers, Dodoïste (talk) 10:03, 13 February 2011 (UTC)
- We would still be automatically creating dead links for sites that opt out via robots.txt. How does the French Wikipedia handle this? —UncleDouggie (talk) 11:09, 13 February 2011 (UTC)
- Those dead links - small [archive] links - link to a Wikiwix page which explains that the website contains a robots.txt file that prevents archiving. Actually, it turned out to be useful information for users: they are now able to easily know that this website will be unavailable one day and there will be no archive able to retrieve it. They can search for a better source. We also have other means to know which websites disallow archive robots, but this contextual feature proved to be handy too. Cheers, Dodoïste (talk) 13:54, 13 February 2011 (UTC)
- I tried to test it but I don't see any archive links even after a full purge/refresh. I see that my browser is loading the script. —UncleDouggie (talk) 04:41, 14 February 2011 (UTC)
- At the present moment, the backed up sites are those which are linked from the French wikis: do you see it there? JackPotte (talk) 16:38, 14 February 2011 (UTC)
- Yes, I see the links. So, this proposal can't be implemented today because nothing has been archived and even if we did do it it wouldn't help any of our currently dead links. Many of the archive links on the page you gave me work, but this one tries to download a PHP file to my computer. Not a great idea to subject our readers to. —UncleDouggie (talk) 17:54, 14 February 2011 (UTC)
- There is no problem with my browser (FF, IE8), the PDF file is downloaded normally. Anyway, such bugs should be reported at the dedicated page. Pmartin (talk) 20:35, 14 February 2011 (UTC)
- I also get the PDF correctly in my browser (Firefox 4 beta).
- The Wikiwix archive needs about three weeks of crawling Wikipedia's external links in order to become fully operational. The current debate is: "should Wikiwix run the crawl and set the archive ready?" Pmartin doesn't want to set up this archive if it won't be used. The community has to choose to use Wikiwix beforehand. Cheers, Dodoïste (talk) 21:21, 14 February 2011 (UTC)
- It fails every time in FF 3.6.13 but it does work for me in Safari 5.0.3. —UncleDouggie (talk) 01:20, 15 February 2011 (UTC)
- We have tested with Safari 5.0.3 on Mac and found no problem either. Pmartin (talk) 09:42, 15 February 2011 (UTC)
- Why does the link I posted not work in FF 3.6.13? —UncleDouggie (talk) 10:04, 15 February 2011 (UTC)
- I don't know yet; we are trying to reproduce the bug. Do you have the same problem here? Pmartin (talk) 10:24, 15 February 2011 (UTC)
- Yes, same problem with that link. —UncleDouggie (talk) 10:37, 15 February 2011 (UTC)
- That link works as expected in that I get the option to save the PDF file. The problem with the first link is I get a dialog to save "Display.php". —UncleDouggie (talk) 10:40, 15 February 2011 (UTC)
- Could you test this link now? Pmartin (talk) 18:00, 15 February 2011 (UTC)
- It works! —UncleDouggie (talk) 19:29, 15 February 2011 (UTC)
- You could test the javascript on the test section. Pmartin (talk) 11:53, 16 February 2011 (UTC)
- At the present moment, the backed up sites are those which are linked from the French wikis: do you see it there? JackPotte (talk) 16:38, 14 February 2011 (UTC)
- I tried to test it but I don't see any archive links even after a full purge/refresh. I see that my browser is loading the script. —UncleDouggie (talk) 04:41, 14 February 2011 (UTC)
- Those dead links (the small [archive] links) lead to a Wikiwix page which explains that the website contains a robots.txt file that prevents archiving. Actually, it turned out to be useful information for users: they can now easily tell that this website will be unavailable one day and that no archive will be able to retrieve it, so they can search for a better source. We also have other means to know which websites disallow archive robots, but this contextual feature proved to be handy too. Cheers, Dodoïste (talk) 13:54, 13 February 2011 (UTC)
- We would still be automatically creating dead links for sites that opt out via robots.txt. How does the French Wikipedia handle this? —UncleDouggie (talk) 11:09, 13 February 2011 (UTC)
- You're correct, Mr.Z-man. The only websites that are not archived by Wikiwix are those that contain a robots.txt file that prevents them from archiving it. The robots.txt limitation is similar in every archive service, as they all have to comply with the law. Cheers, Dodoïste (talk) 10:03, 13 February 2011 (UTC)
- From my understanding, Wikiwix follows Wikipedia's recent changes, so any link added will be archived almost instantaneously. For old links, Wikiwix checks with other archive services. Mr.Z-man 07:24, 13 February 2011 (UTC)
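As an illustration of the robots.txt rule discussed in this thread, here is a minimal Python sketch of how an archiving crawler could check whether a site permits archiving. It uses the standard library's `urllib.robotparser`; the user agent name `ExampleArchiveBot` and the sample robots.txt contents are assumptions made for the example, and nothing here describes how Wikiwix itself is implemented.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines,
    # so no network access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that opts the whole site out of crawling.
ROBOTS_BLOCKING = "User-agent: *\nDisallow: /\n"

# Hypothetical robots.txt that places no restriction on crawlers.
ROBOTS_OPEN = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    page = "http://example.org/page.html"
    print(may_archive(ROBOTS_BLOCKING, "ExampleArchiveBot", page))
    print(may_archive(ROBOTS_OPEN, "ExampleArchiveBot", page))
```

A link whose site fails this check is one where the [archive] link would lead only to an explanatory page rather than to a cached copy.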
Discussion
My main concern is relying exclusively on this solution: if it fails, we will be back where we started with a bunch of dead links. So I am encouraging others to join in the on-going related discussion here. Thanks. - Hydroxonium (H3O ) 14:38, 11 February 2011 (UTC)
- How about this: 1) The archive link takes you to a toolserver script (could eventually be a MediaWiki special page). 2) That special page explains the principle of link archiving and how new archive providers can join, disclaims liability, and has a timer-based redirect to a default archive. (If we get more than one, we could have a radiobutton selection with a cookie storing the user preference). This way we use archiving simultaneously as a way to advertise the need for more archiving. :-)--Eloquence* 21:25, 11 February 2011 (UTC)
- It will require a solid infrastructure for the Toolserver, otherwise there will be a serious overload. This can be done, but there is no way that the current Toolserver's servers would be able to sustain the load. Dodoïste (talk) 21:53, 11 February 2011 (UTC)
- I'm actually in contact with webcitation.org in order to facilitate archiving without the problems of Wikiwix. ΔT The only constant 23:02, 13 February 2011 (UTC)
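The idea of an intermediate page that redirects to a default archive, with an optional stored user preference once several providers exist, could be sketched roughly as follows in Python. The provider names and URL templates are hypothetical placeholders (the `.example` hosts do not exist), and the `preferred` argument stands in for whatever a cookie would hold; this is only an assumption about how such a selection might work, not a description of any existing tool.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical archive providers and URL templates (placeholders only).
ARCHIVE_TEMPLATES = {
    "archive-one": "https://archive-one.example/cache?url={url}",
    "archive-two": "https://archive-two.example/query?url={url}",
}

# Provider used when the reader has expressed no valid preference.
DEFAULT_PROVIDER = "archive-one"


def archive_link(original_url: str, preferred: Optional[str] = None) -> str:
    """Build the redirect target for original_url.

    preferred is the provider name read from the reader's stored
    preference (for example a cookie); unknown or missing values
    fall back to the default provider.
    """
    provider = preferred if preferred in ARCHIVE_TEMPLATES else DEFAULT_PROVIDER
    # Percent-encode the whole original URL so it survives as a query value.
    encoded = quote(original_url, safe="")
    return ARCHIVE_TEMPLATES[provider].format(url=encoded)


if __name__ == "__main__":
    print(archive_link("http://example.org/a?b=1"))
    print(archive_link("http://example.org/a?b=1", preferred="archive-two"))
```

Keeping the provider list in one table means a new archive service could be added without touching the links already shown to readers.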
Suggested improvements
A number of people seem supportive of the idea, but not of this particular implementation. So I would encourage everybody to submit suggested improvements to this idea. Thanks. - Hydroxonium (H3O ) 16:51, 14 February 2011 (UTC)
- Instead of using a script, the archive could be implemented through {{Cite web}} as a test case. However, the only feasible way to do it would be to have the archive enabled on all inclusions of "Cite web". If users had to manually add some code like
|wikiwix_archive=true
on all 800,000 transclusions, it would miss the point. Same thing if a robot were to make those changes.
- In short: we can imagine a few different ways to implement it. But there is one constraint: the community has to adopt Wikiwix on a fairly large scale, or Pmartin won't set the archive running. Cheers, Dodoïste (talk) 21:45, 14 February 2011 (UTC)
- The only solution is to make it completely perfect from the beginning. So far there is opposition due to a single incident last month that was fixed, several people opposing because we won't be running several archiving systems in parallel, and at least one person opposing because it's using a single script rather than a bot making millions of edits. It's another case of the perfect being the enemy of the good causing the English Wikipedia to stagnate. Mr.Z-man 22:03, 14 February 2011 (UTC)
Questions
After reading this proposal, several issues remain unclear to me:
- Why does this need a script (dynamic page updates are bad) as opposed to modifying {{cite web}}?
- Why does there need to be an archive link behind every external link, as opposed to only behind dead links?
—Ruud 15:28, 17 February 2011 (UTC)
- Good questions.
- This does not need a script to be implemented. It was just the easiest and quickest solution. I proposed modifying {{Citation/core}} in this discussion, which would update the major templates like {{cite news}} and {{cite web}}. But there has been some opposition to that suggestion.
- There does not need to be an archive link for all external links. Again, this was just the easiest solution.
- Hope that answers your questions. - Hydroxonium (H3O ) 15:43, 17 February 2011 (UTC)
- Modifying the template has several issues. 1) It requires a bot to make potentially more than a million edits; it'll take months just to do all the existing ones. 2) Citation templates are not required for formatting citations (and there is significant opposition to requiring them), so using the templates misses all the links used as references that don't use the templates. 3) The citation templates are already extremely slow, compounded by the fact that they're used multiple times in articles. Some articles with lots of citations already take upwards of 30 seconds to parse; adding more features will only make this worse. Mr.Z-man.sock (talk) 17:18, 17 February 2011 (UTC)
- Plus, it's quite hard to determine when a link is dead. A link can die temporarily, for example because a server is down for some time. So a bot would need to check all external links on a regular basis to ensure they have not gone dead. Or a webpage could be moved to another URL and be replaced by a different webpage, thus causing a change of content without a 404 error, which is impossible to detect automatically.
- For these reasons and the ones mentioned above, it is far easier to archive all links and display the archive for all of them. Yours, Dodoïste (talk) 20:49, 17 February 2011 (UTC)
- Isn't the issue that we need to archive a page before it's dead? While it's great to link to archives on dead pages, not all pages end up archived despite the best efforts of the archival services. However, by triggering an archival service's API you can force the page to be archived, just in case. Jztinfinity (talk) 21:27, 23 February 2011 (UTC)
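To make the point about temporary outages concrete, here is a small Python sketch of how a link checker might avoid declaring a link dead after a single failed check. The threshold of three consecutive failures and the status labels are assumptions chosen for illustration. As noted above, a page whose content was silently replaced would still count as reachable, so this approach cannot detect that case.

```python
from typing import Sequence


def classify_link(history: Sequence[bool], threshold: int = 3) -> str:
    """Classify a link from its chronological check results.

    history holds one boolean per periodic check, oldest first;
    True means the page responded successfully on that check.
    """
    if not history:
        return "unknown"
    # Count how many of the most recent checks failed in a row.
    failures = 0
    for reachable in reversed(history):
        if reachable:
            break
        failures += 1
    if failures == 0:
        return "alive"
    if failures >= threshold:
        return "dead"
    # Some recent failures, but possibly just a temporary outage.
    return "suspect"


if __name__ == "__main__":
    print(classify_link([True, False]))
    print(classify_link([True, False, False, False]))
    print(classify_link([False, False, True]))
```

Even with such a rule, every link has to be rechecked regularly, which is part of why archiving all links up front was argued to be simpler.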
Closing this RfC to make progress
Per the discussion on the talk page, I will be closing this RfC soon unless somebody has an objection. We would like to make some progress and feel a small test would be beneficial. Thanks. - Hydroxonium (H3O ) 16:20, 26 February 2011 (UTC)