All about dataset annotation: from raw data to high-performance AI!

Written by
Daniella
Published on
2024-11-22

The success of artificial intelligence depends to a large extent on the quality of the data it receives. Dataset annotation plays a key role in the development of machine learning models.
This process, which involves enriching raw data with relevant metadata, enables algorithms to understand and learn from this information. Whether it's to identify objects in an image, interpret text or recognize sounds, data annotation forms the basis of any high-performance AI model.
In short, data annotation is a prerequisite in sectors such as retail, automotive, healthcare and finance, where it enables the development of accurate and efficient artificial intelligence and machine learning models, as specific use cases illustrate. This subject, at the crossroads of data science and Machine Learning, deserves special attention to understand its importance and impact in the modern AI ecosystem.
💡 In this article, we suggest you discover how dataset annotation work can strengthen your artificial intelligence models. It's laborious, sometimes expensive work, but we're convinced it's a necessary craft for the future of artificial intelligence. We'll tell you more in this article, so follow the guide!
Introduction
Artificial intelligence (AI), machine learning (ML), generative AI... you're probably familiar with these concepts, which have revolutionized, and continue to revolutionize, many sectors, from healthcare and finance to commerce and transport. At the heart of this revolution lies a fundamental element: data. More specifically, the quality and relevance of the data used to train AI models. This is where dataset annotation comes in: a process that transforms raw data into information that algorithms can exploit.
Put simply, data annotation involves enriching raw data with metadata or labels that enable algorithms to understand and learn from this information. Whether it's identifying objects in an image, interpreting text or recognizing sounds, data annotation is the cornerstone of any successful AI model.
So... what's the point of data annotation?
Data annotation is an essential process for training artificial intelligence models. It involves assigning labels or annotations to raw data to make it usable by machine learning algorithms. Data annotation is very useful for supervised learning, a common approach in machine learning where algorithms learn from labeled examples. Annotated data enables algorithms to learn to recognize patterns and make accurate predictions.
In Computer Vision, for example, data annotation helps algorithms to identify and locate elements in an image, such as cars, pedestrians or animals. This enables the development of applications such as facial recognition, object detection and autonomous driving. Similarly, in natural language processing (NLP), data annotation helps algorithms to understand the nuances and contexts in which humans communicate, facilitating tasks such as sentiment analysis, machine translation or chatbots.
Data annotation is a process that requires both precision and a thorough understanding of the data context. The quality of annotation has a direct impact on model performance. Accurate and consistent annotation reduces errors and improves the ability of models to generalize to new data.
What is an annotated dataset?
An annotated dataset is a data set enriched with additional information (or metadata), called annotations, which describe or structure the data to facilitate its understanding by artificial intelligence (AI) algorithms.
These annotations can take different forms, depending on the type of data and the purpose of the analysis: labels to classify images, bounding boxes to locate objects, transcriptions for audio files, or named entities to analyze text.
Overview of a video dataset annotation process - Source: ResearchGate
The main purpose of an annotated dataset is to provide machine learning models with the elements they need to learn to recognize patterns, predict results or perform specific tasks. For example, in Computer Vision, an annotated image dataset could indicate which photos contain cats, where the cats are in each image, and even what they are doing.
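To make this concrete, a single record of an annotated image dataset might look like the sketch below, written in Python. The field names and values are purely illustrative (loosely inspired by the COCO convention), not taken from a real dataset:

```python
# One illustrative record of an annotated image dataset.
# Field names are hypothetical, loosely inspired by the COCO convention.
annotation = {
    "image_id": 42,
    "file_name": "cat_on_sofa.jpg",
    "labels": [
        {
            "category": "cat",            # what the object is
            "bbox": [120, 80, 200, 150],  # [x, y, width, height] in pixels
            "action": "sleeping",         # optional attribute annotation
        }
    ],
}

# A supervised model consumes (image, labels) pairs such as this one.
categories = [obj["category"] for obj in annotation["labels"]]
print(categories)  # -> ['cat']
```

During training, the image file and its labels are loaded together, so the model can associate pixels with the annotated categories and positions.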
💡 TL;DR: annotations are used to train supervised models, which use annotated data as a reference to make accurate predictions on new, unannotated information.
Why is data annotation essential for AI?
Data annotation is essential for artificial intelligence because it forms the basis of supervised learning, the most widespread approach in AI projects. Here's why:
Giving meaning to raw data

Raw, unannotated data is often incomprehensible to algorithms. Annotations enrich this data with explicit information, such as categories, labels or visual cues, enabling models to learn how to interpret it. Data preparation is a crucial step, as it directly influences the efficiency and accuracy of AI models.
Improving model accuracy

Annotations act as a guide for machine learning algorithms, enabling them to recognize patterns and adjust their predictions. The more precise and well-designed the annotations, the better the model will perform. Regularly updating labeling rules is also important to keep annotations accurate and consistent throughout a project.
Adapting AI to specific use cases

Each AI project has its own specific needs. Data annotation enables models to be customized for specific applications, such as image recognition in Computer Vision or sentiment analysis in natural language processing.
Facilitating model evaluation and improvement

The annotated datasets obtained during the data annotation phase are used as a reference to evaluate model performance. They make it possible to measure accuracy, sensitivity and error rates, and to identify areas for improvement.
Making models robust

By annotating varied and representative data, we can train models capable of handling a wide range of situations and reducing biases, thus increasing their reliability.
Examples of annotating microscopic data with bounding boxes - Source: ResearchGate
What role does dataset annotation play in Computer Vision?
Dataset annotation plays a central role in Computer Vision, providing algorithms with the information they need to interpret and analyze visual data. Here are the main roles of annotation in this field:
Enriching images with metadata

Annotations can be used to transform raw images into usable data for artificial intelligence models. This includes adding labels, bounding boxes, segmentation masks or key points, depending on the needs of the application.
Computer systems use this annotated data to improve their performance and produce accurate information.
Training algorithms to recognize objects

By associating visible objects in images with specific categories, annotations help models learn to detect and classify objects, such as cars, pedestrians or animals.
Locating and segmenting visual elements

Annotation not only lets you know what an image contains, but also allows you to precisely locate objects or areas of interest in the image, using contours or masks, for example.
Improving precision in complex tasks

In applications such as facial recognition, anomaly detection or autonomous driving, detailed annotations ensure that models understand visual subtleties such as facial expressions or viewing angles.
Creating datasets for a variety of use cases

Computer Vision covers a wide range of applications, from object recognition to video analysis. Context-sensitive annotations enable models to be customized to meet specific needs.
Evaluating model performance

Annotated datasets serve as a basis for testing and comparing algorithm performance. They can be used to measure the accuracy of detection, classification or segmentation.
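One standard way to score detections against annotated ground truth is Intersection over Union (IoU), which compares a predicted box with the annotated one. The sketch below is a minimal Python implementation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the example values are invented:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes are disjoint)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # -> 1.0 (perfect overlap)
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # -> 0.0 (disjoint boxes)
```

A prediction is typically counted as correct when its IoU with an annotated box exceeds a threshold such as 0.5.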
What are the main types of data annotation?
Data annotations vary according to the type of data and the objectives of artificial intelligence projects. Here are the main types of data annotation, ranked by their frequent use in Computer Vision and natural language processing applications:
Annotation for visual data (images and videos)

  • Classification: Each image or video is given a global label indicating the category to which it belongs (e.g. "cat", "dog", "car").
  • Bounding Boxes: Objects in an image or video are framed by rectangles to indicate their position.
  • Semantic segmentation: Each pixel in an image is assigned to a specific category (e.g. "road", "pedestrian", "vehicle").
  • Instance segmentation: Identical to semantic segmentation, but each instance of an object is distinguished (e.g. two cars have separate masks).
  • Key point annotation: Objects are annotated with specific points (e.g. human joints for pose recognition).
  • Video Tracking: Track annotated objects in a video sequence to understand their movements.
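As an illustration of how the bounding boxes above are stored on disk, the YOLO text format uses one line per object: the class index followed by the box center, width and height, all normalized by the image size. A minimal conversion sketch in Python (the pixel values are invented):

```python
def to_yolo(class_id, x, y, w, h, img_w, img_h):
    """Convert a pixel-space box (top-left x, y, width, height) into a
    YOLO label line: 'class x_center y_center width height', normalized."""
    xc = (x + w / 2) / img_w
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A 200x150 box at (120, 80) in a 640x480 image:
print(to_yolo(0, 120, 80, 200, 150, 640, 480))
# -> 0 0.343750 0.322917 0.312500 0.312500
```

Annotation tools such as LabelImg or CVAT can export directly to this format, so a converter like this is mostly useful when migrating between formats.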
Annotation for text data

  • Named entity recognition (NER): Identification and categorization of specific entities in a text, such as proper nouns, dates or amounts.
  • Text classification: Association of a document or sentence with a category (e.g. positive or negative sentiment).
  • Syntactic analysis: Annotation of the grammatical structure of a sentence, such as the relationships between words.
  • Relationship annotation: linking two entities in a text to identify connections (e.g. a person and a company).
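Text annotations of this kind are often stored as character spans over the raw text. A hypothetical named-entity example in Python (the sentence and labels are invented for illustration):

```python
# Named-entity annotations stored as (start, end, label) character spans.
text = "Alice joined Acme Corp in 2021."
entities = [
    (0, 5, "PERSON"),   # "Alice"
    (13, 22, "ORG"),    # "Acme Corp"
    (26, 30, "DATE"),   # "2021"
]

# Recover each annotated entity from its span.
for start, end, label in entities:
    print(f"{text[start:end]!r} -> {label}")
```

Storing offsets rather than copied strings keeps annotations unambiguous even when the same word appears several times in the text.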
Annotation for audio data

  • Transcription: Conversion of audio into text.
  • Sound event labeling: Indication of when specific sounds appear in an audio file.
  • Temporal segmentation: Annotation of the beginnings and ends of audio segments of interest (e.g. different speakers in a conversation).
Annotation for multimodal data

  • Data alignment: coordinate annotations between different types of data, such as linking a text transcript to a corresponding audio or video segment.
  • Interaction annotation: Analysis of interactions between modalities, for example between facial expression and speech in a video.
Annotation for structured data (tables, databases)

  • Attribute annotation: Add labels to columns or entries in a database to indicate their meaning or category.
  • Linking data: Create relationships between different data sets, e.g. by grouping similar entries.
These types of annotation are often combined to meet the specific needs of AI projects. The choice of annotation type depends on the available data and the targeted task, such as classification, detection or prediction.
Looking for Data Labelers for your dataset annotation tasks?
Our expertise in dataset annotation is at your service. Our dedicated team is here to support you in all your data preparation projects for your artificial intelligence models. We look forward to hearing from you.
What tools should you use to annotate a dataset?
Annotating a dataset requires specialized tools adapted to the type of data and the project's objectives. Here's a list of the most popular annotation tools, broken down by their specific uses (these are tools we've used at Innovatiana; don't hesitate to contact us if you'd like to know more or aren't sure which one to choose):
Tools for annotating images and videos

- LabelImg:
An open-source tool for creating bounding boxes on images. Ideal for object classification and detection.
Strengths: Free, intuitive, compatible with various formats (XML, PASCAL VOC, YOLO).

- CVAT (Computer Vision Annotation Tool):
Open-source platform designed to annotate images and videos. It supports complex tasks such as segmentation and object tracking.
Key features: user-friendly web interface, collaborative management, annotation customization.

- Labelbox:
Commercial solution offering advanced features for annotating and managing datasets.
Highlights: annotation analysis, tools for object segmentation and tracking.

- SuperAnnotate:
Comprehensive platform for computer vision annotation and project management, suitable for large teams.
Highlights: fast annotations, quality management, integration with AI pipelines.
Tools for annotating textual data

- Prodigy:
Python-based annotation tool, ideal for tasks such as named entity recognition, sentiment analysis or text classification.
Strengths: Fast and designed for rapid iteration.

- LightTag:
Collaborative platform for text annotation, suitable for teams working on labeling projects.
Strengths: User-friendly interface, conflict management between annotators, quality reports.

- BRAT (Brat Rapid Annotation Tool):
Open-source solution for syntactic, semantic and relationship annotation in textual data.
Strengths: Suitable for researchers, easy customization, export in various formats.

- Datasaur:
Platform focused on text annotation with collaborative tools and features to manage large-scale projects.
Strengths: Performance monitoring, automation tools to reduce annotation workload.
Tools for annotating audio data

- Label Studio:
Open-source software for segmenting and annotating audio files. Particularly well-suited to this type of use, with a user-friendly interface.
Strengths: Free, wide range of audio editing functions.

- Praat:
Software specialized in the analysis and annotation of audio files, particularly for linguistics and phonetics.
Strengths: Suitable for in-depth analysis, precise segmentation options.

- Sonix:
Pay-per-use platform for automatic transcription and annotation of audio.
Strengths: Fast transcriptions, collaboration tools.
Tools for annotating multimodal data

- VGG Image Annotator (VIA):
A lightweight open-source tool for annotating images, videos and audio files.
Strengths: Versatility, no need for advanced configuration.

- RectLabel:
Paid macOS software for annotating images and videos, especially for multimodal projects.
Strengths: Easy to use, export in common formats (COCO, YOLO).
💡 Please note: at the time of writing, data annotation software solutions for artificial intelligence are still evolving, and the management of multimodal data remains perfectible. In the future, solutions should make it possible to create relationships between various types of data in a way that is both intuitive and high-performance.
Automation-based tools

- Amazon SageMaker Ground Truth:
AWS service combining manual and automated annotation using Machine Learning models.
Highlights: Reduced annotation costs, management of large datasets.

- Scale AI:
Commercial platform combining artificial intelligence and human intervention for rapid annotation of large volumes of data.
Strengths: Massive management, quality guaranteed by teams of crowdsourced annotators.

- Dataloop:
Solution for automating repetitive tasks in complex projects.
Strengths: Scalability, easy integration into ML pipelines.
Tools for collaborative projects

- Diffgram:
Open-source platform for collaborative annotation of images, videos and textual data.
Highlights: Customizable, integrated team management.

- Hive Data:
Paid tool for managing large-scale annotations, with a focus on collaboration and quality.
Strengths: Detailed reporting, integrated validation process.
How do you choose the right tool?

The choice of tool depends on the following factors:

  • Data type: Images, text, audio or multimodal.
  • Budget: Open-source or commercial solution.
  • Team size: whether or not real-time collaboration is required.
  • Data volume: Manual or automated annotations for large datasets.
These tools not only facilitate the annotation process, but also enable efficient project management, contributing to higher-quality, better-performing AI models.
How to guarantee the quality of data annotation?
Ensuring the quality of data annotation is essential for obtaining high-performance, reliable artificial intelligence (AI) models. Quality annotation reduces errors in model training and maximizes their ability to generalize. Here are the main strategies for achieving this:
1. Provide clear, standardized instructions

Well-defined annotation instructions are essential to ensure consistency in the annotation process. These instructions should include:

  • Precise descriptions of categories or labels.
  • Concrete examples and counter-examples.
  • Rules to resolve ambiguities or deal with atypical cases.

These instructions need to be updated in line with feedback from annotators, who are at the heart of this process and need to be professionalized.
2. Train annotators

Annotators need to understand the project objectives and master the annotation tools. Initial training, combined with regular refresher sessions, can improve their accuracy and ability to be rigorous. For specialized tasks, such as medical analysis, it is advisable to work with experts in the field.
3. Use high-performance annotation tools

Annotation tools play an important role in the quality of annotated data. They should include features such as:

  • Managing conflicts between annotators.
  • Automatic validation of annotations according to predefined rules.
  • User-friendly interfaces to minimize human error.

Tools such as CVAT, Prodigy or Labelbox offer advanced features to guarantee better quality.
4. Set up validation by several annotators

To reduce individual bias and ensure consistency, it is useful to have several annotators working on the same data. Conflicting annotations can then be reviewed by an expert or resolved by a majority vote.
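The majority-vote step described above can be sketched in a few lines of Python (the label values are invented; ties are returned as None so an expert can arbitrate):

```python
from collections import Counter

def resolve_by_majority(labels):
    """Resolve conflicting labels from several annotators by majority vote.
    Returns None on a tie so that an expert reviewer can arbitrate."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie -> escalate to an expert
    return counts[0][0]

print(resolve_by_majority(["cat", "cat", "dog"]))  # -> cat
print(resolve_by_majority(["cat", "dog"]))         # -> None (tie)
```

Using an odd number of annotators per item avoids most ties in practice.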
5. Integrate quality control processes

Setting up regular processes to check annotations is essential. This can include:

  • Cross-reviews between annotators.
  • Audits carried out by experts to verify a sample of annotations.
  • The use of quality metrics such as precision, recall or inter-annotator agreement.
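Inter-annotator agreement between two annotators is commonly measured with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A self-contained sketch, using invented labels and no external libraries:

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label lists of equal length."""
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

a = ["cat", "cat", "dog", "dog", "cat"]
b = ["cat", "cat", "dog", "cat", "cat"]
print(round(cohen_kappa(a, b), 3))  # -> 0.545
```

Values near 1 indicate strong agreement; values near 0 mean the annotators agree no more than chance, a signal that the guidelines need clarification.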
6. Use gold data or "gold standards"

"Gold standards" are data that have already been annotated and validated by experts. They can be used to:

  • Train annotators by showing them quality examples.
  • Compare the annotations produced against a reliable reference.
  • Test annotator performance regularly.
7. Automate simple tasks and manually validate complex cases

Automation reduces the workload for simple annotations, such as bounding boxes or image segmentation. Human annotators can then concentrate on ambiguous or expert cases.
8. Manage bias in annotations

Annotations can reflect biases in the annotators or in the data itself. To minimize them:

  • Provide impartial and inclusive instructions.
  • Include a variety of annotators to provide different points of view.
  • Check data representativeness in annotations.
9. Create an iterative annotation process

Data annotation must be a continuous process. By analyzing the performance of models trained with annotated data, it is possible to identify errors or shortcomings and improve annotations for subsequent cycles.
10. Prioritize communication and feedback

Encouraging annotators to ask questions and point out ambiguities improves overall quality. Regular meetings to discuss challenges encountered and possible solutions help refine instructions and ensure greater consistency. A single communication channel for each annotation project is also essential!
How can annotated datasets be used?
Annotated datasets are essential in many fields, as they enable artificial intelligence (AI) models to be trained to solve specific problems. Here are the main application areas where annotated datasets play an important role:
Computer Vision

Dataset annotation is essential for computer vision, where it enables models to identify and locate objects in images or videos. This includes applications such as facial recognition, used for security or personalization, and medical analysis, which helps detect anomalies in X-rays or MRIs.
Another example: in agriculture, annotated satellite images are used to monitor crops and identify diseases or weeds, while in transport, they play a key role in autonomous driving systems.
Natural language processing (NLP)

In the field of natural language processing, annotated datasets are indispensable for tasks such as sentiment analysis, where they help to understand emotions or opinions in texts.
They are also used in machine translation systems, chatbots and voice assistants, which rely on annotations to better interpret users' intentions. Text annotation also makes it possible to develop systems capable of summarizing long documents or extracting named entities, such as dates or people's names.
Health and biotechnology

Annotated datasets play an essential role in healthcare, particularly in medical diagnostics, where they help AI models identify pathologies from images such as scans or ultrasounds.
In genomic analysis, annotations can be used to identify mutations or anomalies in DNA sequences. Telemedicine applications also benefit from annotation, facilitating the automatic interpretation of symptoms for remote diagnosis.
Automotive and transportation

In the automotive sector, annotated datasets are fundamental for training models embedded in autonomous vehicles, enabling them to recognize pedestrians, road signs or other vehicles. They also contribute to route planning and the identification of obstacles on the road, guaranteeing safe and efficient travel.
Trade and e-commerce

In the retail sector, dataset annotation is used to develop personalized recommendation systems, which analyze purchasing behavior to suggest appropriate products. Visual search, which makes it possible to find a product based on an image, also relies on annotations. Finally, in the fight against fraud, annotated data can be used to identify suspicious behavior in online transactions.
Security and defense

Annotated datasets are at the heart of surveillance and defense systems, notably for facial recognition, used in surveillance videos. They are also indispensable for detecting anomalies or unusual objects, and for analyzing satellite images to monitor borders or assess high-risk areas.
Agriculture and the environment

Precision agriculture relies on annotated datasets to monitor crops, detect disease or estimate yields using drones or satellite images. In the environmental field, data annotation helps track deforestation, assess the impact of pollution or improve climate forecasting models.
Video games and virtual reality

Annotations enable the development of immersive experiences in video games and virtual reality. By detecting players' movements or integrating virtual objects into real environments, they help create natural, engaging interactions.
Education and research

In education, annotated datasets are used to develop learning tools tailored to students' specific needs, such as personalized platforms. In scientific research, they help accelerate discoveries in fields such as biology and astrophysics, by structuring and enriching data for more effective analysis.
Entertainment and media

Dataset annotation is widely used to improve speech recognition, for example in automatic transcriptions for movies or online videos. Streaming platforms also rely on these annotations to offer personalized content recommendations, whether for videos, music or podcasts.
Robotics

In robotics, annotated datasets enable robots to navigate autonomously by interpreting their environment. They are also essential for improving human-machine interaction, enabling robots to understand and respond to human commands.
Finance and banking

Finally, in the financial sector, data annotations help identify fraudulent transactions and automate the processing of financial documents. They are also used to analyze statements or contracts, speeding up decision-making processes.
What are the best practices for annotating datasets?
Dataset annotation is an important step in the development of high-performance artificial intelligence models. To guarantee reliable and exploitable results, it is important to follow certain best practices. Here are the main ones:
1. Define clear, precise objectives

As mentioned above in relation to data quality, before you start annotating it's essential to have a clear understanding of the project's objective. What problem needs to be solved? What type of data is required? For example, an object detection project requires annotations that precisely locate objects, while a sentiment analysis project requires textual data labeled with emotions or opinions.
2. Use well-defined annotation guidelines

Providing annotators with clear, standardized instructions is essential to guarantee the consistency and quality of annotations. These instructions should include concrete examples, precise definitions of categories, and rules for handling ambiguous cases.
3. Select qualified annotators

Annotator expertise is a key success factor. For complex tasks, such as annotating medical data, it's best to call on specialists in the field. For less technical tasks, a well-trained and supervised group may suffice.
4. Ensure representative data coverage

It is important that the annotated data is varied and representative of the problem to be solved. This helps to reduce bias and train models capable of generalizing to real data. For example, in a facial recognition project, including images from different lighting conditions, angles and contexts is essential.
5. Carry out regular quality controls

Setting up validation processes to check the quality of annotations is essential. This can include:

  • Cross-reviews, where several annotators check each other's work.
  • The use of audit tools or metrics to measure the consistency and accuracy of annotations.
6. Automate repetitive tasks

For greater efficiency, use automation tools such as Amazon SageMaker Ground Truth or Scale AI for simple or repetitive tasks. Human annotators can then concentrate on complex or ambiguous cases.
7. Document processes

It's good practice to keep up-to-date documentation of methods and decisions made during the annotation process. This guarantees project continuity, even when teams change, and ensures traceability of annotated data.
8. Iterate to refine annotations

Dataset annotation is often an iterative process. After training a model on an initial annotated dataset, analyzing its performance helps to identify errors or gaps in the annotations. This feedback can then be used to improve the dataset.
9. Manage conflicts and ambiguities

Data can sometimes be ambiguous or open to interpretation. To solve these problems, it is useful to:

  • Create consensus among annotators through discussions or additional rules.
  • Set up a validation process involving an expert or supervisor.
10. Maintain ethics and confidentiality

When sensitive data is used, such as medical information or personal data, it's very important to guarantee its confidentiality and comply with local regulations, such as the GDPR in Europe.
💡 By following these best practices, it's possible to achieve high-quality annotations for your datasets, tailored to project needs and capable of maximizing the performance of artificial intelligence models.
What future for dataset annotation with advances in AI?
The future of dataset annotation is closely linked to advances in artificial intelligence (AI), which are profoundly transforming this stage of model development. Here are the main trends and possible developments:
Growing automation thanks to AI

AI technologies, such as Deep Learning and generative models, make it possible to significantly reduce reliance on human annotation. Automated tools are able to perform initial annotation tasks, such as object outlining or classification, with increasing accuracy. Humans then intervene mainly to validate or correct the annotations generated.
This doesn't mean that human annotation is becoming pointless... on the contrary, the Data Labeler profession is becoming increasingly professionalized, and it will soon be necessary to master complex annotation techniques, such as interpolation or SAM2, to produce complete, high-quality datasets.
Unsupervised and self-supervised learning

The rise of unsupervised learning methods or self-supervised learning, where models learn directly from raw data without pre-existing annotations, could limit the need for costly annotations. These approaches, like Computer Vision models that exploit the relationships between pixels in an image, can generate useful representations without human intervention.
Crowdsourcing and enhanced global collaboration

Despite advances in automation, crowdsourcing remains an essential method for collecting diverse annotations. In the future, more advanced collaborative platforms, incorporating gamification or AI technologies to guide annotators, could improve the speed and quality of human annotation, while widening access to a range of contributors on a global scale. Beware, however, of the ethical impact of crowdsourcing: prefer dataset annotation specialists like Innovatiana!
Enhanced quality thanks to AI

AI-assisted annotation systems, such as those based on pre-trained models, will improve annotation accuracy while reducing human error. These tools will automatically detect inconsistencies and suggest corrections, thus guaranteeing optimum dataset quality.
Dynamic creation of simulated datasets

Simulated environments, such as those used for training autonomous vehicles, offer the possibility of generating automatically annotated datasets. These techniques make it possible to create varied, realistic scenarios at low cost, while precisely controlling data conditions, for example, by simulating varied weather conditions or complex interactions.
Reducing annotation bias

Advances in AI make it possible to better identify and correct biases in annotations, guaranteeing greater representativeness of data. In the future, integrated bias analysis systems will be able to automatically flag up imbalances or equity issues in annotated datasets.
Integration into AI development pipelines

As annotation tools evolve, the annotation process will become a fluid, integrated step in AI development pipelines. This includes the use of unified platforms where annotation, model training and evaluation take place in a seamless, interconnected way.
Advanced multimodal annotation

Increasingly complex AI projects require multimodal annotations (images, text, audio). The tools of the future will be able to manage several types of data simultaneously and coordinate their annotations to better reflect the interactions between different modalities, for example, the relationships between a dialogue and an image.
More personalized annotations

As AI advances, annotation tools will become more customizable, adapting to the specific needs of each project or field. For example, pre-trained models in the medical or legal fields will be able to provide contextually relevant annotations, reducing the time and effort required.
Reinforced ethics and regulation

As annotated data volumes increase, ethical and regulatory issues will take center stage. AI will play a key role in ensuring that annotations comply with privacy laws and user rights. Automated auditing tools could be deployed to verify annotations' compliance with ethical and legal standards.
Conclusion
Dataset annotation is a cornerstone in the development of artificial intelligence, linking raw data with the ability of algorithms to learn and generalize. This process, while demanding in terms of time, resources and precision, is essential to guarantee high-performance, reliable models.
Thanks to rigorous practices, adapted tools and the emergence of automation technologies, data annotation is evolving to meet the growing challenges of modern AI projects. Whether for Computer Vision, natural language processing or specialized applications such as healthcare or robotics, it plays a key role in enabling artificial intelligence systems to adapt to varied contexts and specific needs.
As technological advances simplify and optimize this process, it remains essential to maintain a balance between human intervention and automation to ensure the quality, diversity and ethics of annotated data. The future of annotation lies in harmonious collaboration between humans and machines, which promises ever more innovative and effective solutions in the field of artificial intelligence.