
Running a data annotation campaign: the guide (2/2)

Written by Nicolas
Published on 2023-12-18

The preliminary steps outlined in the first part of this guide have led to the creation of a team, a clear definition of the problem the project addresses, and the development of precise rules for the annotation tasks. The campaign can begin! In this article, we have compiled a set of recommendations for running successful data annotation campaigns.

Training and mobilizing Data Labelers for successful AI projects

Training and mobilizing Data Labelers (or annotators) is a necessary step in any data annotation campaign. The repetitive, tedious and sometimes complex nature of the annotation task exposes them to the risk of errors such as omitting an object to be annotated on a given image, or assigning an inappropriate label. Thorough training and effective mobilization of annotators, both at the outset and during the course of the project, are essential to mitigate these risks of error, and above all to identify them as early as possible.

In the preliminary phase of the project, it is essential to clearly explain the project's stakes to the team of annotators, emphasizing the central role of annotation in the project's success. This is an essential awareness-raising phase. This integration stage also represents an opportunity to raise annotators' awareness of Artificial Intelligence concepts, and of the reality of AI product development cycles.

It is also good practice to maintain a register of the most common errors, updated as the project progresses using a participative approach: each annotator is invited to add to the register the specific cases they have identified, supplemented with concrete examples and illustrated with screenshots.

Keep annotators engaged throughout the project

Maintaining the commitment of annotators throughout the project requires a constant dynamic of exchange. Sharing tools such as instant messaging, discussion forums and collaborative documents are useful for fostering discussions within the project team, enabling difficulties to be resolved, questions to be asked and mutual support to be provided. Regular synchronization sessions can also be set up to communicate project progress, share any changes or highlight specific points of attention related to annotation.

Control and ensure data quality

When the final objective of the annotation campaign is to develop an algorithm to automate a task, the presence of errors in the data and metadata used for training can cause the algorithm to reproduce the imperfections of manual annotation. We have put together a list of best practices to ensure the reliability of projects of all sizes.

Create a Ground Truth dataset

A "Ground Truth" dataset is made up of annotated documents whose annotations have been rigorously checked, guaranteeing unquestionable quality. This dataset can be exploited in a variety of ways.

On the one hand, the corresponding documents (without their annotations) can be submitted to the annotators for annotation at the start of the project. The aim of this approach is to ensure that annotators have an adequate understanding of the task, and to check that the annotation scheme is unambiguous, i.e. that it could not lead two annotators to annotate the same document in a correct but divergent way. By comparing the annotators' annotations with the quality-assured ones, errors or ambiguities can be detected. These findings help identify elements of the annotation scheme that require further explanation, or lead to corrections of the scheme that eliminate certain ambiguities.
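
By way of illustration, here is a minimal sketch of such a comparison, assuming simple per-document classification labels stored as Python dictionaries; the annotator names, document ids and labels are hypothetical placeholders, and the structure should be adapted to your annotation platform's export format.

```python
# Minimal sketch: comparing annotators' labels against a verified Ground Truth.
# Document ids, labels and annotator names are hypothetical placeholders.
from collections import defaultdict

ground_truth = {"doc_001": "invoice", "doc_002": "contract", "doc_003": "invoice"}

annotations = {
    "alice": {"doc_001": "invoice", "doc_002": "contract", "doc_003": "receipt"},
    "bob":   {"doc_001": "invoice", "doc_002": "invoice",  "doc_003": "invoice"},
}

def agreement_with_ground_truth(annotator_labels, reference):
    """Share of documents on which the annotator matches the Ground Truth."""
    common = set(annotator_labels) & set(reference)
    if not common:
        return 0.0
    matches = sum(annotator_labels[d] == reference[d] for d in common)
    return matches / len(common)

# Per-annotator agreement: low scores point to unclear guidelines or training needs.
for annotator, labels in annotations.items():
    print(annotator, f"{agreement_with_ground_truth(labels, ground_truth):.0%}")

# Documents on which annotators disagree with each other are good candidates
# for clarifying (or correcting) the annotation scheme.
labels_per_doc = defaultdict(set)
for labels in annotations.values():
    for doc, label in labels.items():
        labels_per_doc[doc].add(label)
ambiguous_docs = [doc for doc, labels in labels_per_doc.items() if len(labels) > 1]
print("Documents to review:", ambiguous_docs)
```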

On the other hand, the "Ground Truth" dataset can also be used as a test dataset, offering the possibility of evaluating the algorithm developed on a dataset of maximum quality. This approach makes it possible to measure the algorithm's performance under reliable conditions, and to ensure its robustness and accuracy.
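
For instance, a minimal sketch of such an evaluation, assuming a classification task; the `predict` function is a hypothetical placeholder for your own model's inference call.

```python
# Minimal sketch: evaluating a trained model on the verified Ground Truth test set.
ground_truth = {"doc_101": "invoice", "doc_102": "contract", "doc_103": "invoice"}

def predict(doc_id: str) -> str:
    """Placeholder for the real model; replace with your own inference call."""
    return "invoice"

correct = sum(predict(doc_id) == label for doc_id, label in ground_truth.items())
accuracy = correct / len(ground_truth)
print(f"Accuracy on the Ground Truth test set: {accuracy:.0%}")
```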

Random verification of documents annotated by Data Labelers

Throughout the project, it is recommended that the project manager periodically review a random sample of annotated documents in order to verify the quality of the annotations.
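
As a minimal illustration, the sketch below draws a random sample of annotated documents for manual review; the sample rate, the seed and the document ids are assumptions to adapt to your own workflow.

```python
# Minimal sketch: draw a random sample of annotated documents for manual review.
import random

annotated_docs = [f"doc_{i:04d}" for i in range(1, 501)]  # placeholder document ids

REVIEW_RATE = 0.05  # e.g. re-read 5% of the annotated documents each week

rng = random.Random(42)  # fixed seed so the sample can be reproduced if needed
sample_size = max(1, int(len(annotated_docs) * REVIEW_RATE))
review_batch = rng.sample(annotated_docs, sample_size)

print(f"{sample_size} documents selected for review:")
for doc_id in sorted(review_batch):
    print(" -", doc_id)
```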

Setting up consistency tests on annotations

For certain projects, it is possible to implement automatic tests reflecting the business rules that annotations must respect. When such tests can be integrated, they offer the possibility of automatically detecting annotated documents with a high risk of error, thus requiring priority verification by the business expert.
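
By way of example, here is a minimal sketch of such checks, assuming invoice-style annotations with hypothetical fields (total_amount, line_items, currency); the rules themselves are illustrative and must be replaced by your own business rules.

```python
# Minimal sketch: automatic consistency checks encoding illustrative business rules.
# A document failing at least one rule is flagged for priority review by the expert.

def check_annotation(doc: dict) -> list:
    """Return the list of business rules violated by an annotated document."""
    issues = []
    # Rule 1: the annotated total should equal the sum of the annotated line items.
    if abs(doc["total_amount"] - sum(doc["line_items"])) > 0.01:
        issues.append("total_amount does not match the sum of line_items")
    # Rule 2: the currency label must belong to the list allowed by the scheme.
    if doc["currency"] not in {"EUR", "USD", "GBP"}:
        issues.append(f"unexpected currency label: {doc['currency']}")
    return issues

annotated_docs = [
    {"id": "doc_001", "total_amount": 120.0, "line_items": [100.0, 20.0], "currency": "EUR"},
    {"id": "doc_002", "total_amount": 80.0,  "line_items": [50.0, 20.0],  "currency": "JPY"},
]

to_review = {}
for doc in annotated_docs:
    issues = check_annotation(doc)
    if issues:
        to_review[doc["id"]] = issues
print(to_review)
```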

Finally: taking stock of your annotation campaign

An annotation campaign often confronts complex challenges, and careful evaluation at its conclusion is required to draw useful lessons for subsequent annotation projects. This critical phase is the opportunity to document in detail the methodology used, the course of the campaign, and key metrics. The following section provides a non-exhaustive list of metrics and questions relevant to an in-depth evaluation of your annotation campaign, offering valuable insights.

Below are a few indicators that can be used to assess the performance and relevance of an annotation campaign (a short sketch after the list illustrates how the quantitative ones can be derived from an annotation log):

- Annotation campaign duration

- Number of annotators mobilized

- Total volume of annotated documents

- Average time spent annotating a document

- Suitability of annotation software (performance, comparison of results using several platforms, ergonomics, etc.)

- Appropriateness of annotation scheme (legibility, reproducibility, coverage of special cases)

- Ability to mobilize professional annotators who are experts in their field
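
To illustrate the quantitative indicators above, here is a minimal sketch that assumes each annotation event is logged with the document id, the annotator and start/end timestamps; this log structure is a hypothetical example to adapt to what your annotation platform actually exports.

```python
# Minimal sketch: deriving a few quantitative campaign indicators from an
# annotation log. The log structure below is a hypothetical example.
from datetime import datetime

log = [
    {"doc": "doc_001", "annotator": "alice", "started_at": "2023-11-02T09:00:00", "finished_at": "2023-11-02T09:04:30"},
    {"doc": "doc_002", "annotator": "bob",   "started_at": "2023-11-02T09:01:00", "finished_at": "2023-11-02T09:03:00"},
    {"doc": "doc_003", "annotator": "alice", "started_at": "2023-11-03T10:00:00", "finished_at": "2023-11-03T10:06:00"},
]

spans = [
    (datetime.fromisoformat(e["started_at"]), datetime.fromisoformat(e["finished_at"]))
    for e in log
]

campaign_duration = max(end for _, end in spans) - min(start for start, _ in spans)
annotators = {e["annotator"] for e in log}
total_documents = len({e["doc"] for e in log})
average_seconds = sum((end - start).total_seconds() for start, end in spans) / len(log)

print("Campaign duration:        ", campaign_duration)
print("Annotators mobilized:     ", len(annotators))
print("Annotated documents:      ", total_documents)
print("Average time per document:", f"{average_seconds / 60:.1f} min")
```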

A comprehensive evaluation approach contributes to a better understanding of the successes and challenges encountered, providing essential information for improving future annotation campaigns.

(End of guide. You can find the first part of our guide at this address).

To find out more, read our article on how to choose the right annotation platform for your specific needs.

Innovatiana sets itself apart with "CUBE", its integrated platform for managing your data annotation campaigns. Accessible at https://dashboard.innovatiana.com, it offers a global answer to data collection and annotation challenges: an all-in-one approach that centralizes the specific requirements of each project within a single working environment, enabling tailored customization.