User:Faebot
Log 2020
- United Kingdom Parliamentary photographs: housekeeping tasks for this periodic batch upload.
Log 2019
- CDC videos: a long-term upload project that moved to WMF Labs and is now under the Faebot account, after the heavy video recoding it needs became too demanding for an old laptop.
- Wayback: addition of Wayback Machine links for selected large GLAM-related batch uploads.
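Adding Wayback links can be illustrated with a minimal Python sketch. It assumes the standard Wayback Machine URL pattern `https://web.archive.org/web/<YYYYMMDDhhmmss>/<url>`; the helper names `wayback_link` and `add_wayback_line`, and the wikitext line appended to the description, are hypothetical and not Faebot's actual code.

```python
from datetime import datetime

# Standard Wayback Machine prefix; a 14-digit timestamp and the original URL follow it.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_link(source_url: str, when: datetime) -> str:
    """Build a Wayback Machine URL for source_url at (or near) the given time."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}{timestamp}/{source_url}"


def add_wayback_line(description: str, source_url: str, when: datetime) -> str:
    """Append an archive link to file description text (hypothetical format).

    If any Wayback link is already present the text is returned unchanged,
    so rerunning the batch job does not add duplicates.
    """
    if WAYBACK_PREFIX in description:
        return description
    link = wayback_link(source_url, when)
    return f"{description}\n* Archived source: [{link} Wayback Machine]"


if __name__ == "__main__":
    print(add_wayback_line("== Summary ==", "http://example.org/image.jpg", datetime(2019, 3, 1)))
```

Checking for an existing link first keeps the job idempotent, which matters for batch jobs that are restarted part way through.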
Log 2018
- Talk page trimmer, after requests.
- Category:Files from Internet Archive Book Images Flickr stream: ongoing maintenance as described in the category notes. Not completely reliable!
Log 2017
- Populating Category:Uploads by Fæ with linkrot, following related discussions.
- User:Fæ/code/PD-USGov - license diffusion from past batch uploads.
Log 2014
A live report of uploads can be found using this catscan2 report. Some reference tables of ongoing uploads per month are available at User:Faebot/reports/catscan.
June 2014
- Upgrade all jpeg images in Photochrom prints to the same resolution as the tiff originals (around 3x larger) by downloading each tiff locally and re-encoding it as jpg. This will apply to around 6,000 images and will be slow, as it relies on home broadband for both the tiff download and the jpg upload.
- Update the Fellows collection with an additional few thousand images, and create Fellows animations following a Village Pump discussion on the maximum working GIF size (23MP total).
- Finalize the Photochrom prints collection, including sub-categorization by country. Announcement on the Commons Village Pump, with a parallel discussion on the German Wikipedia. An update to GWT changed its functionality, and some significant post-upload housekeeping was run systematically.
- Ongoing discussions with the Wellcome Digital Library and Cancer Research UK. It is likely that Faebot will be able to support these this summer.
- The next tranche of Avionics uploads will rely on paying for forum membership. As my own payment for Wikimedia UK membership was rejected by the Chief Executive, this is on hold, as proposals should come from a member of the charity.
May 2014
- Gallery: Full hunter pocket watch, Waltham; Art Nouveau silver condiment set, c.1905; Commons Featured Picture from the WMUK supported uploads from the Ministry of Defence; William Pitt as an alchemist, 1796; Constantinople street market, 1890s.
- Photochrom prints collection: 20R - commenced; XML generation runs on WMUK kit, while the GWToolset runs the file upload.
- Images from Fellows (auctioneers): 0R - announcement pending from Andy.
- Re-visit of the Library of Congress British Cartoon Prints collection in order to generate a set of files that were previously skipped, primarily because they were over 100MB. Catscan report
- NYPL maps upload exceeds 10,000 files. Notice posted at the Village pump to attract volunteers to start using them. --Fæ (talk) 10:37, 9 May 2014 (UTC)
April 2014
- Los Angeles County Museum of Art 2,354 new photographs
- Update to my original upload of 20,000 photographs from the museum in 2013, based on a request from PKM, a user well known for her GLAM work.
- Aircraft photos by Robert Frola, part of the ongoing Airliners batch upload project.
- Experimenting with the Python "flickrapi" module made it possible to upload an additional 3,083 photographs from Flickr that are marked All Rights Reserved but are covered by a specific email release. Though this is running as a direct upload, similar code may be useful on other projects to generate a dataset for use with the GWToolset.
- A few hundred photographs had previously been loaded from this Flickrstream of high quality early photographs from Ireland. This was an update to add the latest 220 archive images, now that the NLI appears to have completed this Flickr Commons project.
- NYPL maps: 3642, NYPL maps (over 50 megapixels): 1249
- The release of 20,000 high quality map images by the New York Public Library attracted interest from volunteers on email lists and the Village Pump, along with requests for a bot to help with consistent uploads. This is more complex than it first appears, as direct links to the high resolution tiffs cannot be automatically "scraped" from the public website. I signed up for access to the NYPL API and use it to deduce the URL to download from; access is limited to 10,000 transactions. See Commons:Batch uploading/NYPL Maps. Many of the images are larger than 100MB and are currently being skipped. Two solutions are being considered: creating a Python chunk uploader module (so that files of up to 1GB could be uploaded), or generating an XML dump so that the GLAMwiki Toolset can perform the upload (the latter would be much faster). Note that the average file size is large, at ~50MB; because of bandwidth constraints, and because skipped files still result in 100MB being downloaded (the API does not provide file sizes), the upload rate is of the order of just 5 per hour. The uploads have been popular, with Commons volunteers helping to crop and rotate the images.
- The GWToolset is now being used to complete this upload (up to 50 times faster), including the files previously skipped for being over 100MB. A category for images in this upload with more than 50,000,000 pixels has been created, as Commons does not create thumbnails for images above this extremely large resolution. There are likely to be a few hundred of this size among the 20,000.
- This upload was paused when a few hundred images over 50 megapixels started to degrade WMF server performance in rendering tiff thumbnails. Operations are looking at ways of managing the operational load that thumbnail creation causes. The upload was cautiously restarted using a minimal number of processing threads.
- After a request on Commons:Bots/Work_requests, I have created a "live" SVG format checker that marks vector files containing embedded raster graphics with {{BadSVG}}. It identifies around 10 to 40 files per day by sampling the Commons upload log every hour and checking the source code of all SVG files over 5KB in size. As this seems non-controversial, it runs on the WMUK supported kit, though at some future point it may be rewritten and migrated to WMF Labs (note that an SVGbot used to run but lapsed in 2010). The templated files are not marked as Wikimedia UK supported; doing so might be seen as controversial, as the Wikimedia UK kit did not upload the files. Note that this is currently running inconsistently due to an unsolved memory leak.
- A request from Robin for English Wikipedia articles to be automatically created for SSSIs in Wales has resulted in Wikipedia:Bots/Requests_for_approval#Faebot. This may not be approved; how to stage these potential 900 articles in a non-controversial way is under discussion.
- Due to other demands on my time, the Geograph project has been delayed. It will take some experimentation and research to decide how to approach the next level of place identification, and whether to mass update from the Geograph project.
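The chunk uploader option for the NYPL maps can be sketched as follows. This is not Faebot code: it only shows how a large tiff would be split into sequential (offset, bytes) pieces. The 5 MiB chunk size and the name `iter_chunks` are assumptions; MediaWiki's upload API has a chunked mode in which each piece is sent with its offset and the total file size, but the exact request parameters should be checked against the API:Upload documentation rather than taken from here.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# Hypothetical chunk size, chosen to keep the number of requests low.
CHUNK_SIZE = 5 * 1024 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs that together cover the whole stream in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        yield offset, chunk
        offset += len(chunk)


if __name__ == "__main__":
    # Demo on an in-memory 12 MiB "file".
    data = io.BytesIO(b"\0" * (12 * 1024 * 1024))
    for offset, chunk in iter_chunks(data):
        # A real uploader would send each chunk here, together with the offset,
        # the total file size and the file key returned for the previous chunk.
        print(offset, len(chunk))
```

Because only one chunk is held in memory at a time, a 1GB tiff never needs to be loaded whole.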
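The SVG check can be illustrated with the standard library's `xml.etree.ElementTree`. This is a minimal sketch, assuming that "embedded raster graphics" means `<image>` elements whose href is a raster `data:` URI or points at a raster file; the 5KB threshold comes from the description above, while the function names and the extension list are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

SVG_IMAGE_TAGS = {"{http://www.w3.org/2000/svg}image", "image"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
MIN_SIZE_BYTES = 5 * 1024  # only SVGs over 5KB are checked
RASTER_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff")


def is_raster_href(href: str) -> bool:
    """True if an <image> href embeds or points at raster data rather than SVG."""
    h = href.strip().lower()
    if h.startswith("data:image/"):
        return not h.startswith("data:image/svg")  # data:image/svg+xml is still vector
    return h.endswith(RASTER_EXTENSIONS)


def embedded_raster_hrefs(svg_source: bytes) -> List[str]:
    """Return the href of every <image> element that references raster data."""
    # For untrusted input a hardened parser (e.g. defusedxml) would be safer.
    root = ET.fromstring(svg_source)
    hrefs = []
    for element in root.iter():
        if element.tag not in SVG_IMAGE_TAGS:
            continue
        # SVG 1.1 uses xlink:href; SVG 2 also allows a plain href attribute.
        href = element.get(XLINK_HREF) or element.get("href") or ""
        if is_raster_href(href):
            hrefs.append(href)
    return hrefs


def needs_badsvg(svg_source: bytes) -> bool:
    """Decide whether a file would be tagged {{BadSVG}} under this sketch's rules."""
    return len(svg_source) > MIN_SIZE_BYTES and bool(embedded_raster_hrefs(svg_source))
```

Malformed SVGs raise `xml.etree.ElementTree.ParseError`, which an hourly job would need to catch and log rather than let it stop the run.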
March 2014
- Batch uploads of collections in the Library of Congress started. Some of the tiff files are very high resolution scans, wider than 5,000 pixels and larger than 100MB. These include some collections of direct relevance to UK history. Most relevant to the UK are:
- British Cartoon Prints Collection: 1017
- 1780–1830 historic political cartoons
- World War I posters in the Library of Congress: 1871
- As only a minority are British, these are not currently marked as sponsored by the chapter. Some posters, such as the more extreme propaganda posters from Germany, may have controversial content; however, feedback from WikiProject MILHIST has been entirely supportive.
February 2014
- A backlog of more than 10,000 photographs from the Aircraft forum images project was uploaded, including around 5,000 from a Russian aircraft forum. These and later photographs will be identified as supported by Wikimedia UK. (Source code)
- Images_from_MoD_uploaded_by_Fæ: 6032
- Currently the Ministry of Defence releases around 35 to 70 high quality official photographs each month. Faebot should check the MoD site for new files daily, providing an estimated 600 high quality educational photographs each year. Fæ is in contact with the MoD Images library who will update him on changes in their API (which provides the metadata).
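At 35 to 70 photographs a month, the MoD release rate works out at roughly 420 to 840 a year (12 × 35 and 12 × 70), so the estimate of 600 sits near the middle of that range. A daily check only needs to remember which images have already been seen. The sketch below assumes the MoD API can be reduced to a list of image identifiers; `daily_check` and the JSON state format are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def daily_check(fetched_ids: Iterable[str], state_text: str) -> Tuple[List[str], str]:
    """Return (new image IDs, updated state) from today's IDs and the saved state.

    state_text is the JSON list saved by the previous run ("" on the first run).
    """
    seen = set(json.loads(state_text)) if state_text else set()
    fresh = sorted(set(fetched_ids) - seen)  # only IDs never seen before
    new_state = json.dumps(sorted(seen | set(fresh)))
    return fresh, new_state


if __name__ == "__main__":
    # Hypothetical IDs standing in for today's API response.
    new, state = daily_check(["mod-001", "mod-002"], "")
    print(new)    # both are new on the first run
    new, state = daily_check(["mod-002", "mod-003"], state)
    print(new)    # only mod-003 is new
```

Keeping the state as plain JSON text leaves the choice of storage (a file, a wiki page) to the caller.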
Initial
In January 2014 the WMUK charity supplied a Mac mini to act as a server for selected Faebot projects, including:
- User:Faebot/Geograph#C5 - location categories on Geograph images for Scotland.
- Images from the UK Ministry of Defence (from February 2014)
- Aircraft forum images including airliners.net and russianplanes.net (from February 2014); Catscan report, Source (Russianplanes)
Log 2013
- September: Geograph categorization of Scotland under way, with 541,082 images in the net (probably 400,000 will be in Scotland after testing). This will take 2 months to complete at the current burn rate. The schema is laid out at /Geograph#C5.
- A few periodic and one-off reports were produced in May to August in response to various requests:
- Daily report of mobile phone related uploads lacking EXIF data and with multiple Tineye matches, to support a long running theme on the village pump
- Daily report of uncategorized photographs of the Gezi Park protests
- Analysis of deletion requests created by anonymous IP users, to support a village pump vote, and analysis of deletion requests created by me (which could be adapted for any user)
- Summer: finding counties now appears highly reliable, and a large swathe of Middle England is being categorized under Geograph images in England. This means that 714,703 photographs will be tested, and around 70% will have a new category added.
- February: Restarted Geograph categorization of Wales. This should see 150,000 images have better location categories. /Geograph#C3d Done
- January/February saw a significant period of retesting, demonstrating the Geograph sorting to now be accurate with an error margin of 0.03%, at least as accurate as the Ordnance Survey open data itself. /Geograph#Ct1
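The page does not say how counties are found. One common approach, shown here as an assumption rather than Faebot's method, is a ray-casting point-in-polygon test of each image's grid coordinates against county boundary polygons such as those in Ordnance Survey open data. The names `point_in_polygon` and `find_county` are hypothetical, and real boundaries with holes or multiple parts are ignored in this sketch.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # e.g. (easting, northing) on the OS grid


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray running right from point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def find_county(point: Point, counties: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the single county containing point, or None if none or several match."""
    matches = [name for name, boundary in counties.items() if point_in_polygon(point, boundary)]
    return matches[0] if len(matches) == 1 else None
```

Returning None for zero or multiple matches is a cautious choice: ambiguous images are left for manual review instead of being given a possibly wrong category.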
Log 2012
- October onwards: Regional categorization of Geograph starting with Category:Geograph images in London, see (Stage 1) /Geograph#C1 for details
- October onwards: Populating Category:Geograph images by year 1990s
- October onwards: Populating Category:Geograph images by year 1940s through to 1980s Done c. 20,000 images categorized
- September: Commons:Bots/Work_requests/Archive_7#Proposal_for_Geograph_raw_HTML_tidy-up Done 2,147 images fixed
- September onwards: User categorization for major Geograph contributors, see /Geograph
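The linked proposal describes the HTML tidy-up itself; the sketch below is only an assumed illustration using Python's standard `html.parser`, converting raw HTML in a Geograph description into plain text lines. `tidy_html` and the set of tags treated as line breaks are hypothetical choices.

```python
from html.parser import HTMLParser

# Tags treated as line breaks; all other tags are stripped and only their text kept.
BREAK_TAGS = {"br", "p", "div", "li"}


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and similar in text
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also reach here via handle_startendtag.
        if tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def tidy_html(raw: str) -> str:
    """Convert raw HTML in a description into plain text lines."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `tidy_html("Church of <b>St Mary</b>,<br/>Ely &amp; district")` gives two lines of plain text with the entity decoded.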
Awaiting batch upload completion
- Category:Tupper Scrapbooks Selection of photographs by William Vaughn Tupper, several volumes to upload
Misc jobs
- Populate Category:Geograph images of places mentioned in the Domesday Book
- Create Category:Images by User:Poco a poco by sniffing through 5,000,000 images.
- Experiment with category counts on Geograph to find false "needing categorization" cases, by filtering 900,000 images. Report at User:Faebot/Sandbox3.
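The category-count experiment can be sketched as a filter over each file's categories. This assumes the category lists have already been fetched; the maintenance category name `NEEDS_CAT`, the `maintenance` set and the `min_topical` threshold are hypothetical, not the names used in the actual report.

```python
from typing import Dict, List, Set

# Hypothetical name of the maintenance category being audited.
NEEDS_CAT = "Geograph images needing categorization"


def false_needing_categorization(
    files: Dict[str, Set[str]], maintenance: Set[str], min_topical: int = 1
) -> List[str]:
    """List files tagged NEEDS_CAT that already have min_topical or more real categories.

    files maps a file title to its categories; maintenance holds hidden or
    housekeeping categories that should not count as real categorization.
    """
    flagged = []
    for title, categories in files.items():
        if NEEDS_CAT not in categories:
            continue
        topical = categories - maintenance - {NEEDS_CAT}
        if len(topical) >= min_topical:
            flagged.append(title)
    return sorted(flagged)
```

Raising `min_topical` trades recall for confidence: fewer files are flagged, but those that are flagged are more clearly categorized already.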
Batch upload log
- 178 Category:Initials illuminated capitals added to sub-directories.
- 76 Category:1873 Michigan County Maps high quality scans of detailed county maps.
- 24 Category:Black Panther Trial Sketches court sketches.
- 100 Category:Randolph Linsly Simpson Collection portraits of black Americans from the 1850s.
- 144 Category:Cartes de visite Added cartes de visite from the Beinecke Library.
- 180 Category:New Mexico postcards by Tichnor Brothers Uploaded after confirming the licence at Commons:Village pump/Copyright; 1930s-1945 postcards of New Mexico.
- 300 Uploads from Ruslan Flickr 'Old photos and pictures' set, ~300 images mostly with pre-1917 Russian context; {{PD-RusEmpire}} may apply
- 100 Category:Ex libris ±100 decorative book plates and stamps (mainly 19th century)
- 182 Category:Winslow Homer wood engravings prints 1858-1873, BPL
- 125 Category:Across the Continent on the Kansas Pacific Railway (Route of the 35th Parallel), with photographs published in 1869, BPL
- 700 Category:Animal Locomotion 11 volumes by Eadweard Muybridge held by the BPL, 700 sets of photographs
- 1246 Category:Chromolithographs at Boston Public Library 1,200 high quality scans of 19th century popular images