User Details
- User Since
- Oct 2 2019, 10:06 AM (266 w, 5 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- AikoChou
Apr 25 2022
Hi @Isaac, glad to start working on this. :) I am currently working on a sub-task to complete the HTTP error handling code. After that, we should be ready to deploy it. I will let you know if we see any issues.
Mar 21 2022
Hmm, the editquality image is also not the latest version. It needs to be deployed as well.
Checked deployment-charts; only editquality has been deployed.
Aug 26 2021
The following files are samples of the image suggestions for articles without infoboxes, for each of the wikis we counted. :)
Aug 19 2021
The data I posted is not inclusive of those changes. These numbers were calculated based on an older set of image recommendations from 2021-04.
Aug 9 2021
Update -- we excluded all kinds of infoboxes by filtering using Q19887878. For cebwiki, the share of unillustrated articles that have no infobox drops from 99% to 3%, and the counts for other wikis have also dropped.
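For reference, here is a rough sketch of how the list of infobox-related items could be pulled from Wikidata (a hypothetical sketch, not the exact query we ran; it assumes the Wikidata Query Service and a P31/P279* path down from Q19887878):

```python
# Hypothetical sketch: fetch items that are instances of, or subclasses of,
# Q19887878 from the Wikidata Query Service, to use as an exclusion list.
import requests

SPARQL = """
SELECT ?template WHERE {
  ?template wdt:P31/wdt:P279* wd:Q19887878 .
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "infobox-filter-sketch/0.1"},
)
resp.raise_for_status()
qids = [row["template"]["value"].rsplit("/", 1)[-1]
        for row in resp.json()["results"]["bindings"]]
print(len(qids), "infobox-related items to exclude")
```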
Jul 30 2021
The following table shows the preliminary results:
https://docs.google.com/spreadsheets/d/1JGDOmZ16L3La-l82rhKAD2IhoaCQfcNICNQ90paH54U
Jul 29 2021
The task is ongoing, but it may take longer than expected.
Jul 2 2021
Weekly update:
Jun 18 2021
Weekly updates:
Jun 11 2021
Weekly updates:
We confirmed that (1) how the input data is formatted and (3) the function used to transform the Keras model to an Estimator are not the cause of the Estimator's poor performance, since a CNN model trained from scratch reaches the same performance in both Keras and Estimator.
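For context, this is roughly the conversion path we are testing (a minimal sketch, assuming the standard tf.keras.estimator.model_to_estimator conversion; the small CNN below is illustrative, not the actual model):

```python
# Minimal sketch: train a small CNN in Keras, then wrap it as an Estimator.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Same weights and graph, wrapped as an Estimator for training on the cluster.
estimator = tf.keras.estimator.model_to_estimator(keras_model=model,
                                                  model_dir="./cnn_estimator")
```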
May 30 2021
May 21 2021
Weekly updates:
We wrote documentation for the distributed image inference workflow in the GitHub repo and provided three example tasks: image quality inference, face detection, and ResNet feature extraction. With regard to distributed training using tf-yarn, we are looking for an alternative to wrapping a Keras model in an Estimator, to solve the accuracy issue.
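As an illustration of one of the example tasks, the ResNet feature-extraction step could look like this (a sketch; the file path and image size are placeholders, not the code in the repo):

```python
# Sketch: extract a 2048-dim feature vector per image with a pretrained ResNet50.
import numpy as np
import tensorflow as tf

# No classification head; global average pooling yields the feature vector.
model = tf.keras.applications.ResNet50(weights="imagenet",
                                       include_top=False,
                                       pooling="avg")

def extract_features(image_path):
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    x = tf.keras.preprocessing.image.img_to_array(img)
    x = tf.keras.applications.resnet50.preprocess_input(x[np.newaxis, ...])
    return model.predict(x)[0]  # shape (2048,)

features = extract_features("sample.jpg")  # placeholder path
```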
May 18 2021
May 3 2021
Weekly update:
Apr 29 2021
Apr 6 2021
For point 1, I calculated the number of overlapping images in allowed_images and image_placeholders as follows:
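Roughly, the calculation is an intersection count. A hypothetical sketch (not the original snippet), assuming both sets live in Hive tables with an image_name column:

```python
# Hypothetical sketch: count images that appear in both allowed_images and
# image_placeholders (table and column names are assumptions).
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

allowed = spark.table("allowed_images").select("image_name").distinct()
placeholders = spark.table("image_placeholders").select("image_name").distinct()

overlap = allowed.join(placeholders, on="image_name", how="inner").count()
print(f"{overlap} images appear in both sets")
```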
I want to use tf-yarn to train a simple model on the cluster, but I found that some environment variables need to be set up, as described in this doc (a sketch follows the list below):
- JAVA_HOME: /usr/bin/java
- HADOOP_HDFS_HOME: /usr/bin/hdfs
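A minimal sketch of setting these variables before launching a job (the values are copied from the list above and may need adjusting for the actual cluster):

```python
# Sketch: export the environment variables from the doc before starting tf-yarn.
import os

os.environ["JAVA_HOME"] = "/usr/bin/java"
os.environ["HADOOP_HDFS_HOME"] = "/usr/bin/hdfs"
```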
Mar 27 2021
I updated the code in the GitHub repo (in the branch) to improve filtering out placeholders. The workflow is as follows: first, use PetScan to find all the subcategories of Category:Image_placeholders (https://petscan.wmflabs.org/?psid=18699732). Next, query all images from those categories in Hive. Then, exclude these images when querying for candidates in both the Wikidata Commons category (fewer cases) and other wikis (many cases).
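The exclusion step is essentially an anti-join; a sketch in PySpark (table and column names are hypothetical, not the ones in the repo):

```python
# Hypothetical sketch of the exclusion step: drop any candidate image that
# appears in the placeholder set collected via PetScan.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

candidates = spark.table("image_candidates")      # candidate images per article
placeholders = spark.table("placeholder_images")  # placeholder images from PetScan

filtered = candidates.join(placeholders, on="image_name", how="left_anti")
filtered.write.mode("overwrite").parquet("filtered_candidates")
```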
Mar 23 2021
Hi @Cparle - yes of course, there you go:
Mar 10 2021
@MMiller_WMF -- here are the results computed using unillustrated articles for which the algorithm has at least one recommendation. Since illustrated articles for February are available to query, I added results for January. Most of them fall within the range of 0.1% ~ 8%. There are two very high numbers, 21.46% and 31.62%, in arzwiki (in previous results, these two months also had relatively high percentages). A scatter plot below, which excludes the two outliers, shows the distribution for most wikis.
Mar 4 2021
Summary of the work done so far:
- Imported the image data locally and saved it to TFRecord files
- Fine-tuned an Xception model to classify images as 'sculptures' or 'maiolica' (a sketch follows this list)
- Ran inference on the test data locally
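The fine-tuning setup looks roughly like this (a sketch; layer sizes, image size, and hyper-parameters are illustrative):

```python
# Sketch of the Xception fine-tuning setup for the binary
# 'sculptures' vs. 'maiolica' classification task.
import tensorflow as tf

base = tf.keras.applications.Xception(weights="imagenet",
                                      include_top=False,
                                      pooling="avg",
                                      input_shape=(299, 299, 3))
base.trainable = False  # freeze the pretrained backbone first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train_ds / val_ds would be tf.data.Dataset objects built from the TFRecord files:
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```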
Mar 2 2021
Here are estimates of the percentage of unillustrated articles that become illustrated after one month, for each target wiki.
Feb 22 2021
Hi all,
Feb 15 2021
Hi @MMiller_WMF @Tgr -- it's very nice to meet you too. I'm really happy to have the opportunity to help :D
Feb 11 2021
Hi all,
Feb 10 2021
@Miriam Yeah, if there is no maximum, it's not appropriate to use normalization. I'll update the results with the non-normalized version.
Feb 9 2021
Hi all!
Here are the results after removing icons (.svg). Overall, these numbers drop slightly but do not change much.
Feb 8 2021
Hi all,
Could you double-check that I have LDAP access? I'm not able to access the notebooks.
Feb 3 2021
Hi @CDanis,
My wikitech username: AikoChou
Preferred shell username: aikochou
SSH public key: https://phabricator.wikimedia.org/P14137
I have read and signed the L3 Wikimedia Server Access Responsibilities document.
Thanks! :)
Mar 9 2020
Completed the wrap-up steps:
- Documentation: Meta page, README on GitHub
- Posted to cloud, wikitech-l, wiki-research-l
- Added the final report on https://www.mediawiki.org/wiki/Outreachy/Round_19/Bi-weekly_Reports#Week_6
- Summarized the project on https://www.mediawiki.org/wiki/Outreachy/Past_projects#Round_19
Mar 2 2020
In the last week of the internship, I've been working on:
Week 9-10
Week 1-8 Summary
Jan 22 2020
Weekly update
- Modified the input pipeline and the format written to the database.
- Worked on a script to ingest data into Citation Hunt.
- Created a pull request for (1) and (2) for Guilherme to review
We are in week 8 now but have moved on to week 9 work. We just swapped the Citation Hunt work with the testing/regular-job work, i.e., swapped weeks 9-12 with weeks 6-8. :)
Dec 30 2019
Thank you all for your help!
The issue was solved when I ran it using the grid engine and set the -mem option. :)
Dec 28 2019
Nov 28 2019
Oct 14 2019
Hi @Ghassanmas, I am also confused about the distinction between a statement and a sentence in this project. Thanks for pointing that out.
Here is my repo:
https://github.com/AikoChou/wikimedia-outreachy-2019
Oct 8 2019
Oct 4 2019
Do we have a deadline for this task? Thanks!
I am using:
Python 3.7
Keras 2.2.4
TensorFlow 2.0.0