
Is a manual annotation strategy for AI projects still valid in 2024?

Written by
Nicolas
Published on
2023-12-15

Is an annotation phase necessary for my AI development project, and what annotation strategy should I adopt?

When undertaking a project based on unstructured data, the question of annotation inevitably arises. Although this step is not always required, it plays a crucial role in understanding and exploiting data for AI. This article examines when a manual annotation phase is needed and which strategy to adopt: fully manual annotation, fully automated annotation, or automated annotation enriched by manual validation.

Which data? Structured, semi-structured or unstructured?

The first step is to understand the nature of the data to be analyzed, whether text, images or video, for example. Whether the data is structured or unstructured, and its total volume, are determining factors. Should we annotate, and if so, with what approach? To answer these questions, it is essential to understand the difference between manual and automatic annotation in the data preparation stage that precedes the development of an AI product.

Manual or automatic annotations: what's the difference?

Manual annotation involves the assignment of labels to documents, or subsets of documents, by human participants (data annotators, also known as data labelers). Automatic annotation entrusts this task to computer programs instead (not to be confused with the labeling platform, a tool that facilitates annotation tasks and can support both automatic and manual workflows). This automation can be achieved through a variety of methods, including rule-based techniques or supervised learning algorithms dedicated to annotation (whose purpose is therefore not to serve as an end-user product, but to prepare data for other AIs). These supervised learning algorithms themselves require a prior annotation phase.
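To make the distinction concrete, here is a minimal sketch of rule-based automatic annotation in Python. The task, labels and patterns are hypothetical, purely illustrative of how a handful of rules can pre-label text spans without any learning:

```python
import re

# Illustrative rule set for a hypothetical real-estate ad task.
# Each label is assigned wherever its pattern matches.
RULES = {
    "SURFACE": re.compile(r"\b\d+\s?(?:m2|m²)\b", re.IGNORECASE),
    "PRICE": re.compile(r"[$€£]\s?\d[\d,.]*"),
    "ROOMS": re.compile(r"\b\d+\s?(?:bedrooms?|rooms?)\b", re.IGNORECASE),
}

def annotate(text: str) -> list[dict]:
    """Return span-level annotations found by the rules."""
    spans = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append({"label": label, "start": match.start(),
                          "end": match.end(), "text": match.group()})
    return spans

print(annotate("Bright 3 bedrooms apartment, 85 m², listed at €320,000."))
```

Rules like these are cheap to write and fully predictable, but they only cover what their authors anticipated, which is why learning-based annotation is often preferred for more nuanced labels.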

The choice between manual and automatic annotation depends largely on the characteristics of the project. Manual annotation often offers unrivalled accuracy, but can be costly and time-consuming. Automatic annotation, on the other hand, although generally less accurate, can be quicker and more economical. It is also possible to opt for a hybrid approach, combining the advantages of both methods to maximize efficiency while preserving annotation quality.

Enhancing manual annotation with artificial intelligence (AI): when is it relevant?

The appropriateness of using AI methods to structure data depends closely on the volume of data to be processed. For example, when analyzing responses to a questionnaire with a relatively modest volume of data, it may be wiser to opt for a manual annotation approach. This method, although time-consuming, can precisely meet the objectives of analyzing the themes addressed by respondents. It's important to note that determining the suitability of AI is not based solely on a fixed threshold for the number of documents, but rather on criteria such as the nature and length of the documents and the complexity of the annotation task.

However, when faced with a large volume of documents or a continuous flow of data, automating the annotation process generally becomes a relevant option. In these situations, the annotation phase begins by manually annotating a portion of the documents, sized according to their nature and the complexity of the task. This partial annotation is then used to train a supervised algorithm, enabling efficient automation of annotation across the entire corpus. However, automatic annotation should not be assumed to be self-sufficient: generally speaking, it produces pre-labeled data that must be qualified by professional annotators before it can be used to train an AI model. The annotation task thereby becomes more targeted, making the annotators' work more efficient.
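As an illustration, here is a minimal sketch of this pre-labeling pipeline, assuming a single-label text classification task and scikit-learn; the texts and labels are hypothetical placeholders for a manually annotated subset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1. Train on the manually annotated subset (illustrative examples).
labeled_texts = ["sunny flat near the station", "house with a large garden"]
labels = ["APARTMENT", "HOUSE"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(labeled_texts, labels)

# 2. Pre-label the remaining corpus, keeping the model's confidence
#    so annotators can review the least certain predictions first.
unlabeled_texts = ["top-floor flat with balcony", "farmhouse to renovate"]
probas = model.predict_proba(unlabeled_texts)
for text, dist in zip(unlabeled_texts, probas):
    label = model.classes_[dist.argmax()]
    print(f"{label:<10} ({dist.max():.2f})  {text}")
```

The key point is the second step: the model's output is treated as a draft to be qualified by annotators, not as final training data.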

A frequently recommended approach is to use Active Learning in annotation processes to improve annotators' working conditions and efficiency. Active Learning consists in intelligently selecting the most informative examples for the algorithm, in order to progressively improve its performance. By integrating Active Learning into the manual annotation process, the workflow can be optimized by specifically targeting the most complex or ambiguous data, which increases the algorithm's efficiency and accuracy over time.
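Below is a minimal sketch of one common Active Learning strategy, margin-based uncertainty sampling. It assumes `model` is any classifier exposing predict_proba (such as the pipeline above) and `pool` is the still-unlabeled corpus:

```python
import numpy as np

def select_for_annotation(model, pool: list[str], batch_size: int = 20) -> list[str]:
    """Pick the examples the model is least sure about."""
    probas = model.predict_proba(pool)
    # Margin between the two most probable classes:
    # a small margin means the model finds the example ambiguous.
    sorted_probas = np.sort(probas, axis=1)
    margins = sorted_probas[:, -1] - sorted_probas[:, -2]
    most_ambiguous = np.argsort(margins)[:batch_size]
    return [pool[i] for i in most_ambiguous]
```

Each round, the selected batch is annotated manually, the model is retrained, and a new batch is selected, so human effort concentrates where it teaches the model the most.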


Take, for example, a real-estate ad annotation task (30 to 40 labels on average per 500-word ad). After 2,000 texts have been annotated manually, integrating Active Learning generates pre-annotated data. This pre-annotated data is then submitted to the annotators for manual qualification: their task becomes checking and correcting pre-annotation errors on the remaining ads (5,000, for example), rather than manually placing the 30 to 40 labels on each of them.
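As a rough order of magnitude, assuming for illustration that only 15% of pre-labels need correcting (a hypothetical rate) and ignoring the residual time spent confirming correct pre-labels:

```python
labels_per_ad = 35            # midpoint of the 30-40 range above
manual_ads, prelabeled_ads = 2_000, 5_000
correction_rate = 0.15        # assumed: 15% of pre-labels need fixing

full_manual = (manual_ads + prelabeled_ads) * labels_per_ad
with_prelabeling = (manual_ads * labels_per_ad
                    + prelabeled_ads * labels_per_ad * correction_rate)
print(f"label decisions: {full_manual:,} vs {with_prelabeling:,.0f}")
# label decisions: 245,000 vs 96,250
```

Even under these simplified assumptions, the number of labels placed by hand drops by more than half, which is the kind of gain that motivates the hybrid approach.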

Conclusion

The balance between manual and automatic annotation is adjusted according to the specific requirements of data annotation campaigns and artificial intelligence projects. A dynamic, adaptive approach is essential. In this context, Innovatiana stands out by offering a complete solution through its "CUBE" platform, accessible at https://dashboard.innovatiana.com. This platform provides access to labelled data on demand, to meet the varied needs of projects, while offering the possibility of reinforcing labelling teams by mobilizing our team of Data Labelers. In this way, Innovatiana is fully in line with a dynamic and progressive vision of annotation within artificial intelligence projects, offering a comprehensive response adapted to today's challenges.