You may or may not already know that I am participating in the December 2023 Outreachy internship with Wikimedia. We've reached the halfway mark, and so far it has been quite an experience, with a lot of learning. In fact, most of what I've implemented during the internship is something I had never implemented before. The weekly reviews, where my mentors give feedback on my tasks, have been very helpful: they have pushed me to research better ways of improving my code and its performance. In this article, I describe my original plans for the internship and the progress I've made so far.
My internship goals:
I am working on the project "Addressing the Lusophone Technological wishlist proposals". In my original project timeline, I planned to tackle the two proposals with the most up-votes from the Lusophone Wikimedia community: "A tool that lists pages with broken links" and "Activating Internet Archive bot on Portuguese Wikipedia". After discussing with my mentors and weighing the complexity of the two projects against the length of the internship (3 months), we decided to address the most pressing issue, "A tool that lists pages with broken links", together with a less complex but vital proposal, "Option to disable Rollback in regular editors' edition".
The Minimum Viable Product (MVP) for the tool that lists pages with broken links is a script that can be loaded into a Wikipedia skin and that checks the external links on a page, marking those whose HTTP status code is not 200. Other proposed features of the tool include:
Listing the number of broken links found on a page
Caching of links that have been checked to improve performance
Translating the tool to Portuguese
Writing documentation for the tool
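As a minimal sketch of the first step of the MVP described above, the snippet below collects the external links from a page's rendered HTML using Python's standard html.parser module. It assumes that external links in the rendered page carry the "external" CSS class (as Wikipedia's "external text" and "external free" links do); the names ExternalLinkCollector and extract_external_links are hypothetical and not taken from the tool's code.

```python
from html.parser import HTMLParser


class ExternalLinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose class list contains "external"."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        # Assumption: rendered external links carry the "external" CSS class.
        if "external" in classes and href:
            self.links.append(href)


def extract_external_links(html: str) -> list[str]:
    """Return external link URLs in the order they appear in the HTML."""
    collector = ExternalLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = (
        '<p><a class="external text" href="https://example.org/a">A</a> '
        '<a href="/wiki/Foo">Foo</a></p>'
    )
    print(extract_external_links(page))  # prints ['https://example.org/a']
```

Each collected URL would then be requested, and the link flagged when its status code is not 200. Internal wiki links such as /wiki/Foo are skipped because they lack the "external" class.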
For the "Option to disable Rollback in regular editors' edition" proposal, we need a script that checks whether an editor is an active member of known Wikipedia user groups and, if so, hides the rollback button for that editor's edits in a rollbacker's interface.
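The rollback script itself would run in the browser, but its decision logic can be sketched in Python for illustration. The sketch reads a user's groups from a MediaWiki Action API response to action=query&list=users&ususers=<name>&usprop=groups&format=json and hides rollback when any of those groups is in PROTECTED_GROUPS. That set, and the function names, are hypothetical; the sketch also does not check whether the editor is active, which the real script would need to do.

```python
# Hypothetical set of groups whose edits should not show a rollback button.
PROTECTED_GROUPS = {"sysop", "autoreviewer", "rollbacker"}


def user_groups(api_response: dict, username: str) -> set[str]:
    """Read a user's groups from a list=users&usprop=groups API response."""
    for user in api_response.get("query", {}).get("users", []):
        if user.get("name") == username:
            return set(user.get("groups", []))
    return set()  # unknown user: no groups


def should_hide_rollback(api_response: dict, username: str) -> bool:
    """Hide rollback if the user belongs to at least one protected group."""
    return bool(user_groups(api_response, username) & PROTECTED_GROUPS)
```

Keeping the group list in one set makes it easy for the community to adjust which groups count as trusted without touching the rest of the logic.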
Achievements:
During the first half of the internship, I worked on the tool that lists broken links, and so far I've achieved the following:
Deployed the tool to Toolforge, where it can now be accessed and used by all Wikipedia editors at https://deadlinkchecker.toolforge.org/
The tool successfully checks pages with relatively few links (approximately 60) and marks links whose HTTP status code is not 200. It also displays the checker's result: the number of dead links found on the page, or "OK" if none are found.
Wrote documentation for the tool
Translated the tool into Portuguese
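The result display described in the achievements above can be sketched as a small function that, given each checked URL and its status code, returns the URLs to mark and the summary shown to the editor. The function name summarize and the exact summary wording are hypothetical.

```python
def summarize(results: dict[str, int]) -> tuple[list[str], str]:
    """Return (URLs to mark as dead, summary text) for one checked page."""
    # The MVP rule: any status other than 200 counts as a dead link.
    dead = [url for url, status in results.items() if status != 200]
    summary = "OK" if not dead else f"{len(dead)} dead link(s) found"
    return dead, summary


if __name__ == "__main__":
    print(summarize({"https://a.example": 200, "https://b.example": 404}))
    # prints (['https://b.example'], '1 dead link(s) found')
```

Under the "not 200" rule, a redirect such as 301 also counts as dead unless the HTTP client follows redirects before reporting the status, which is worth keeping in mind when choosing the client.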
Highlights of the internship so far:
A major highlight of my internship is how much my coding skills have improved through constant research into better implementations. Before this internship, I mainly worked on back-end projects; this internship has helped me practice and appreciate front-end development.
I also learned a new concept: concurrent programming. For the dead link checker, I have to send requests to the many links on each Wikipedia page. Before this internship, I was only familiar with threading, which turned out to scale poorly for the large number of I/O-bound network requests the tool makes. The performance issues I ran into with threading led me to discover asynchronous concurrency in Python using asyncio.
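To illustrate the asyncio approach, here is a minimal sketch, not the tool's actual code, that checks many links concurrently with asyncio.gather and caps the number of in-flight requests with an asyncio.Semaphore. The HTTP call is injected as fetch_status; fake_fetch_status is a hypothetical stand-in that only simulates latency, and in practice it would be replaced by a real asynchronous HTTP client such as aiohttp. Using 0 as the status for failed requests is also an assumption.

```python
import asyncio
from typing import Awaitable, Callable

FetchStatus = Callable[[str], Awaitable[int]]


async def check_links(
    urls: list[str], fetch_status: FetchStatus, max_concurrency: int = 20
) -> dict[str, int]:
    """Fetch status codes for many URLs concurrently, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check_one(url: str) -> tuple[str, int]:
        async with semaphore:
            try:
                return url, await fetch_status(url)
            except Exception:
                # Assumption: network errors (timeouts, DNS failures) are recorded as 0.
                return url, 0

    pairs = await asyncio.gather(*(check_one(u) for u in urls))
    return dict(pairs)


async def fake_fetch_status(url: str) -> int:
    """Hypothetical stand-in for a real HTTP request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return 404 if "missing" in url else 200


if __name__ == "__main__":
    urls = [f"https://example.org/page{i}" for i in range(100)]
    urls.append("https://example.org/missing")
    results = asyncio.run(check_links(urls, fake_fetch_status))
    print(sum(1 for s in results.values() if s != 200))  # prints 1
```

Because each coroutine yields while it waits on the network, a single thread can keep many requests in flight; the semaphore keeps the checker from opening hundreds of connections at once on pages with 100 or more links.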
Before this internship, I had never implemented caching in a project or worked with Redis. I now understand why caching is needed and how to perform fast data retrieval using an in-memory data store like Redis.
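The caching idea can be sketched without a Redis server by using a small in-memory store whose keys expire, similar to Redis keys set with an expiry. TTLCache, status_with_cache, and the TTL value are hypothetical. With the redis-py client, the same pattern would use r.set(url, status, ex=ttl_seconds) and r.get(url), which returns None on a miss (and bytes on a hit unless the client is created with decode_responses=True).

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis keys that expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[int, float]] = {}

    def set(self, key: str, value: int) -> None:
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[int]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop the key, as Redis would
            return None
        return value


def status_with_cache(url: str, cache: TTLCache, fetch_status: Callable[[str], int]) -> int:
    """Return a cached status if present; otherwise fetch it and cache the result."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    status = fetch_status(url)
    cache.set(url, status)
    return status
```

The expiry matters because a link that is dead today may be fixed tomorrow; without it, the cache would keep reporting a stale status.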
Plans for the second half of the internship:
Improve the performance of the dead link checker so it can check more links, since many Wikipedia pages have 100 or more
Set up caching for pages that have already been checked
Present the script to the Lusophone Wikipedia community for feedback
Improve the security of the tool so that it only accepts requests from Wikimedia domains
Develop the script to disable rollback for active editors
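One possible way to restrict the tool to Wikimedia domains, as planned above, is to check the Origin header of incoming requests against an allowlist. The sketch below accepts only HTTPS origins on the listed domains or their subdomains; ALLOWED_PARENT_DOMAINS and is_allowed_origin are hypothetical names, and the domain list is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allowlist of Wikimedia parent domains.
ALLOWED_PARENT_DOMAINS = ("wikipedia.org", "wikimedia.org", "wikidata.org")


def is_allowed_origin(origin: Optional[str]) -> bool:
    """True if origin is an HTTPS URL on an allowed domain or one of its subdomains."""
    if not origin:
        return False
    parsed = urlparse(origin)
    host = parsed.hostname  # urlparse lower-cases the hostname
    if parsed.scheme != "https" or not host:
        return False
    # Require an exact match or a "." boundary so "evilwikipedia.org" is rejected.
    return any(host == d or host.endswith("." + d) for d in ALLOWED_PARENT_DOMAINS)


if __name__ == "__main__":
    print(is_allowed_origin("https://pt.wikipedia.org"))  # prints True
    print(is_allowed_origin("https://pt.wikipedia.org.evil.example"))  # prints False
```

Browsers set the Origin header on cross-origin requests, so this filters requests coming from other websites; a non-browser client can send any header value, so the check is not a substitute for authentication. If the tool uses a framework such as Flask, the header would be read with request.headers.get("Origin").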
Conclusion:
My Outreachy internship with the Wikimedia Foundation has been a great learning experience, and I look forward to learning more during the rest of my internship.