Commons:ISA Tool/Image to Concept

From Wikimedia Commons, the free media repository


The goal of the Image to Concept project was to create a fully functional crowdsourcing tool for the semi-automatic tagging of images on Wikimedia Commons and to put the tool to the test, in order to explore and document its strengths and weaknesses and to make recommendations regarding its use and further development. The test was implemented in the ISA Tool. This project is related to Computer-aided tagging, which seeks to help community members identify and add depicts statements to Commons files.

Final Project Report

The Final Project Report was published on 6 October 2023 and is available here:

The Image to Concept project

Components of the Image-to-Concept Project

The Image to Concept project pursues two long-term goals:

  • Development and provision of free/libre open source algorithms for semi-automatic entity extraction from images and interlinking of the entities with existing knowledge graphs.
  • Development of a free/libre open source crowdsourcing application facilitating the semi-automatic entity extraction from images that puts the aforementioned algorithms to use.

Phase I of the Image to Concept project comprises the development of a fully functional crowdsourcing tool for the semi-automatic tagging of images on Wikimedia Commons:

  • based on the ISA tool (code repository / license)
  • extended by Google Cloud Vision
  • enhanced by text mining from image metadata
  • adding “depicts” statements pointing to Wikidata
  • supporting the enforcement of Wikimedia Commons community rules
  • generating tags that are relevant from a user point of view (based on several use cases / user stories)
  • with a modular architecture consisting of:
    • crowdsourcing application (human in the loop);
    • entity extraction algorithms;
    • algorithms to ensure quality and to enforce community norms;
    • target knowledge bases.
  • attractive for GLAM institutions
  • well-accepted by the volunteer community on Wikimedia Commons.
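In structured data terms, a "depicts" statement is a statement using the Wikidata property P180 (depicts) whose value is a Wikidata item, stored on the MediaInfo entity of a Commons file. The following Python sketch only builds such a statement in the Wikibase JSON format and wraps it in a payload of the kind accepted by the MediaWiki `wbeditentity` action. It is not taken from the ISA Tool's code, it makes no API call, and the helper names `depicts_claim` and `mediainfo_edit_payload` are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def depicts_claim(qid: str) -> Dict[str, Any]:
    """Build a Wikibase statement 'depicts (P180) = <Wikidata item>'."""
    # Accept only item IDs such as "Q146" (hypothetical validation rule).
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",  # Wikidata property 'depicts'
            "datavalue": {
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(qid[1:]),
                    "id": qid,
                },
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def mediainfo_edit_payload(qids: Iterable[str]) -> str:
    """Serialize one depicts statement per item for the 'data' parameter.

    Assumption: wbeditentity accepts new statements as a list under "claims".
    """
    return json.dumps({"claims": [depicts_claim(q) for q in qids]})
```

For example, `mediainfo_edit_payload(["Q146"])` yields a payload that would add "depicts: house cat" (Q146). Actually sending it would also require an authenticated session, a CSRF token and the file's MediaInfo ID ("M" followed by the page ID), all of which are outside the scope of this sketch.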

The research project consists of two largely consecutive work packages: Software Development (WP 1), followed by Large-Scale Tests (WP 2).

The project is rounded off by three smaller work packages, which focus on integrating the project into the pipelines of the SWITCH Connectome Project (a project aimed at creating an ecosystem for linked open research data in Switzerland), on communications in the context of SWITCH, and on planning future development steps.

WP 1: Software Development

  • Iteration 0: Set up the team and the development environment
  • Iteration 1: Develop first iteration of core suggestions feature within ISA tool (based on the rough test version already developed)
  • Iteration 2: After initial small scale testing and feedback, refine and improve design and user experience.
  • Iteration 3: Develop a first version of the Metadata-to-Concept Module. Fix the remaining bugs in preparation for broad user testing.
  • Iteration 4: Improve the Metadata-to-Concept Module. Record stats on usage of machine vision suggestions, including an option to download them for further processing (e.g. to train the algorithms).
  • Iteration 5: Further improve the Metadata-to-Concept Module.
  • Documentation

WP 2: Large-Scale Tests / Use Cases

After software development and internal small-scale testing, large-scale tests will be carried out.

The research questions will focus on the following aspects:

  • Look, feel, and usability of the crowdsourcing tool (in practice, the ISA Tool) with the machine vision / metadata-to-concept enhancements in place.
  • Relevance of the generated tags from the point of view of GLAM institutions and their users, across various user stories (e.g. in the context of content donations).
  • Usefulness of the tool from the point of view of the Wikimedia Commons community (conformity of the generated tags with community norms, acceptance of the tool by the community on Wikimedia Commons, helpfulness in getting the work done, etc.).

The tests will also serve as an outreach campaign aimed at users and potential users of the tool, both heritage institutions and volunteer image taggers. At the end of the large-scale tests, recommendations will be made regarding the use and further development of the tool.

More specifically, work package 2 includes:

  • Carry out large-scale tests, document results
  • Collect user stories, document use cases
  • Community outreach, exchange regarding support of community norms by the ISA Tool
  • Preparing for benchmarking against machine vision approaches currently employed by heritage institutions (the actual benchmarking study is beyond the scope of the present project)

Timeline

June 2022 to October 2023.

Credits

Team members
  • Beat Estermann - Project Coordinator
  • Eugene Egbe (contractor WMSE) - Software Development
  • Matthias Ruediger (BFH) - Software Development
  • Navino Evans (contractor WMSE) - Software Project Manager / Software Architect
  • Sebastian Sigloch (SWITCH) - Representative Funding Institution
  • Sebastian Berlin (WMSE) - Software Development
  • André Costa (WMSE) - Coordination on the side of WMSE
  • Florence Devouard (contractor WMSE) - Community Outreach

To share opinions, wishes, or ask questions please leave a message on the talk page. If you wish to reach a team member directly, please feel free to leave a message on our respective talk pages.

Partner institutions

Testing sign-up


If you want to get involved in the latest developments of the ISA Tool, please add your name below in the following format: # {{#target:User talk:Your User Name Here}}

List

  1. User talk:Anthere
  2. User talk:Beat Estermann
  3. User talk:Islahaddow
  4. User talk:Ceslause
  5. User talk:Secretlondon
  6. User talk:OtuNwachinemere
  7. User talk:Onyinyeonuoha
  8. User talk:Magotech
  9. User talk:Bile_rene
  10. User talk:Asaf (WMF)
  11. User talk:Beireke1
  12. User talk:Mndetatsin
  13. User talk:GeorgHH
  14. User talk:Serieminou
  15. User talk:Fawaz.tairou
  16. User talk:Fexpr
  17. User talk:actveso
  18. User talk:Iwuala Lucy
  19. User talk:Bile rene
  20. User talk:MichellevL (WMNL)

At this stage of the process, we are looking for feedback on the look, feel and usability of the tool. Please share your feedback on the talk page; thank you in advance!

ISA Workshop - 12 December

An online workshop is planned for Monday, 12 December 2022, from 16:00 to 17:30 (UTC+1).
Organizers of the workshop: Beat Estermann, André Costa (WMSE), Navino Evans and Florence Devouard
Link: https://us02web.zoom.us/j/87465436094
Notes: on this doc
Agenda:
  • Round of introduction and tagging experience sharing
  • Brief introduction to and demonstration of the ISA Tool (what it does etc.)
  • Introduction to the Image to Concept project (computer-aided tagging)
  • Demonstration and testing of the recommendation engine: https://isa-dev.toolforge.org
  • Discussion around tagging rules and recommendations - Commons:Depiction guidelines
Interested in attending the workshop? Please sign up.

How to add new categories to the Machine Vision process

This is the link to the beta version of the enhanced tool: https://isa-dev.toolforge.org. [1]

You may use one of the existing campaigns or create your own. Note that not all images on Wikimedia Commons are passed through the machine vision algorithm by default. When creating a new campaign, make sure that it includes categories that have already been approved for machine vision processing [1], or open a new Phabricator ticket to request that further categories be added to the machine vision queue.
When requesting categories, list every relevant category, including its parent categories and subcategories: the system only processes the files placed directly in each listed category and does not descend into its subcategories.
Please note that the expected waiting time to get the request processed is between 1 and 4 weeks.
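Because each category has to be requested explicitly, it can help to list a category together with its subcategories before opening the Phabricator ticket. The Python sketch below assumes this use: it walks the category tree breadth-first through the standard MediaWiki API (`list=categorymembers` with `cmtype=subcat`), guards against category cycles and caps the depth. The function names, the default depth of 3 and the User-Agent string are illustrative assumptions, not part of the machine vision workflow itself.

```python
import json
from collections import deque
from typing import Callable, Iterable, List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "category-list-sketch/0.1 (example contact)"


def subcat_query_url(category: str, cmcontinue: Optional[str] = None) -> str:
    """Build a MediaWiki API URL listing the direct subcategories of `category`."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "subcat",
        "cmlimit": "max",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return f"{COMMONS_API}?{urlencode(params)}"


def fetch_subcats_live(category: str) -> List[str]:
    """Fetch all direct subcategories from Commons, following API continuation."""
    titles: List[str] = []
    cont: Optional[str] = None
    while True:
        req = Request(subcat_query_url(category, cont), headers={"User-Agent": USER_AGENT})
        with urlopen(req) as resp:
            data = json.load(resp)
        titles += [m["title"] for m in data["query"]["categorymembers"]]
        cont = data.get("continue", {}).get("cmcontinue")
        if not cont:
            return titles


def collect_categories(
    root: str,
    fetch_subcats: Callable[[str], Iterable[str]],
    max_depth: int = 3,
) -> List[str]:
    """Return `root` and its subcategories up to `max_depth` levels, in discovery order."""
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand categories at the depth limit
        for sub in fetch_subcats(category):
            if sub not in seen:  # category graphs can contain cycles
                seen.add(sub)
                order.append(sub)
                queue.append((sub, depth + 1))
    return order
```

Calling `collect_categories("Category:Example", fetch_subcats_live)` would query Commons live (`Category:Example` is a placeholder). For offline use, any function mapping a category title to its subcategories, such as a dictionary lookup, can be passed instead. The depth cap matters because Commons category trees can be very deep.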

Research background

  • Stuber, Christian; Kocher, Manuel; Oesch, Aaron (2020): Deep Learning für Gedächtnisinstitutionen [Deep Learning for Memory Institutions] (“Angewandte Fälle und Übungen” [Applied Cases and Exercises] at Bern University of Applied Sciences, Master Business Informatics)
    • Exploratory study addressing the question of which deep learning methods can be applied in practice to assist heritage institutions in the semantic tagging of their image collections.
  • Burkhalter, Yannick (2021): Using a machine learning algorithm for the semantic annotation to images to Wikidata (“CASE-Arbeit” [CASE project paper] at Bern University of Applied Sciences, Bachelor Business Informatics)
    • Exploratory study examining the extent to which a specially trained Visual Recognition Service can be used to tag images available on Wikimedia Commons.
  • Burkhalter, Yannick (2021): Combination of a Visual Recognition Service with a Crowdsourcing Approach for the Enrichment of Semantic Data on Images (Bachelor Thesis in Business Informatics at Bern University of Applied Sciences)
    • In this thesis, the ISA tool and the CAT tool, both used to add depicts statements, were combined into a new tool to help GLAM institutions add depicts statements to their image collections. The tool was implemented iteratively with the help of the ISA tool developers.

Links

Notes

  1. For comparison, the original tool without the enhancements is available at https://isa.toolforge.org