This is a potential task for Google Summer of Code / Outreachy.
About huggle
Huggle is a fast diff browser application intended for dealing with vandalism and other unconstructive edits on Wikimedia projects, written in C++.
Huggle loads and reviews edits made to Wikipedia in real time, helps users identify unconstructive edits, and allows them to be reverted quickly. Various mechanisms are used to decide whether an edit is constructive or not. It uses a semi-distributed model in which edits are retrieved through a "provider" (anything capable of distributing a stream of edit information, such as the Wikipedia API or the IRC recent changes feed), then pre-parsed and analyzed. This information is then shared with other anti-vandalism tools, such as ClueBot NG. Huggle also uses a number of self-learning mechanisms, including a global white-list (users that are considered trusted) and user-badness scores that are stored locally on the client's computer.
Description of this task
Create a mechanism that recognizes common mistakes in wikitext syntax and implement it in Huggle as an extension. For every syntax error found in the new text of a diff, raise the edit score by a certain value. It could also look up Commons files: if someone replaced a file name with the name of a non-existent file, the score should be raised as well.
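As a rough illustration of the kind of check described above, here is a minimal sketch in standard C++ that counts unbalanced wikitext delimiters in the new text of a diff and turns them into a score increase. It does not use the Huggle extension API: the type SyntaxIssue, the functions FindSyntaxIssues and ScoreDelta, and the penalty values are all hypothetical names and numbers chosen for the example, and a real checker would need a proper tokenizer rather than simple token counting.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not part of the Huggle API.
struct SyntaxIssue
{
    std::string description; // human-readable description of the problem
    int penalty;             // amount to add to the edit score (assumed value)
};

// Count non-overlapping occurrences of token in text.
static std::size_t CountToken(const std::string &text, const std::string &token)
{
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(token, pos)) != std::string::npos)
    {
        ++count;
        pos += token.size();
    }
    return count;
}

// Record an issue when the opening and closing delimiters differ in number.
static void CheckBalance(const std::string &text, const std::string &open,
                         const std::string &close, int penalty,
                         std::vector<SyntaxIssue> &issues)
{
    if (CountToken(text, open) != CountToken(text, close))
        issues.push_back({"unbalanced " + open + " " + close, penalty});
}

// Run all checks on the new text of a diff.
std::vector<SyntaxIssue> FindSyntaxIssues(const std::string &new_text)
{
    std::vector<SyntaxIssue> issues;
    CheckBalance(new_text, "[[", "]]", 10, issues);    // internal links
    CheckBalance(new_text, "{{", "}}", 10, issues);    // templates
    CheckBalance(new_text, "<!--", "-->", 20, issues); // HTML comments
    return issues;
}

// Sum of all penalties, i.e. how much the edit score would be raised.
int ScoreDelta(const std::vector<SyntaxIssue> &issues)
{
    int total = 0;
    for (const SyntaxIssue &issue : issues)
        total += issue.penalty;
    return total;
}
```

An extension could call FindSyntaxIssues on the new revision text and add ScoreDelta to the edit score. Keeping the checks free of Huggle and Qt types is one way to make them reusable by other tools.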
A summary of the error checks should be written to the MetaLabels or PropertyBag of every edit (see the header file wikiedit.hpp for details), so that it is visible in the edit details in the Huggle interface. The mechanism that detects syntax errors should, if possible, be designed so that it can be reused by other tools (it must at least be open source under a GNU-license compatible license).
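The summary could be prepared as simple key/value pairs before it is copied into the edit. The sketch below uses a plain std::map as a stand-in, because the actual types of MetaLabels and PropertyBag are defined in wikiedit.hpp and are not reproduced here. The struct CheckResult, the function BuildLintSummary and the "lint." key prefix are hypothetical names invented for this example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-check result; not a Huggle type.
struct CheckResult
{
    std::string check_name; // e.g. "unbalanced-brackets"
    int hits;               // how many times the check fired on this edit
};

// Build a key/value summary. The std::map is only a stand-in for whatever
// container MetaLabels / PropertyBag actually use in wikiedit.hpp.
std::map<std::string, std::string> BuildLintSummary(const std::vector<CheckResult> &results)
{
    std::map<std::string, std::string> summary;
    int total = 0;
    for (const CheckResult &result : results)
    {
        if (result.hits <= 0)
            continue; // checks that found nothing are not listed
        summary["lint." + result.check_name] = std::to_string(result.hits);
        total += result.hits;
    }
    summary["lint.total"] = std::to_string(total);
    return summary;
}
```

Each pair would then be copied into the edit's metadata by the extension, using whatever interface wikiedit.hpp provides.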
External linting service (optional)
If you do not feel confident in C++, you can instead design an external linting service, written in a language of your choice, that could be hosted on Tool Labs. The extension would then call this service to validate edits. This is an option, not a requirement.
Hints
You can re-use code from existing tools that do a similar job, for example AWB or CheckWiki: https://tools.wmflabs.org/checkwiki/
You can take the source code of this extension as a reference, because it already does re-score edits in huggle: https://github.com/huggle/extension-scoring
Basic information about huggle can be found at http://enwp.org/WP:HG, documentation for developers can be found at http://tools.wmflabs.org/huggle/docs/head and on wiki at https://github.com/huggle/huggle3-qt-lx/wiki
It's strongly recommended to discuss any potential issues or questions regarding the Huggle code on our IRC channel #huggle on freenode.net
- Primary mentor: Petrb (petan on freenode)
- Co-mentor: (Phabricator username)
- Other mentors: (optional, Phabricator username)
- Skills: C++ (optionally PHP, Perl, .NET or Python in order to analyze the existing tools that do this job)
- Estimated project time for a senior contributor: 3 weeks
- Microtasks: (links to Phabricator tasks that must be completed in order to become a strong candidate)