
Discover interactive segmentation: a new era in image analysis

Written by
Aïcha
Published on
2025-03-08

πŸ”— Image segmentation involves dividing an image into meaningful regions to facilitate analysis. When the process is interactive, a human guides the algorithm (for example, with advanced annotation tools) to achieve precise segmentation of specific areas of interest. This approach makes it possible to segment any object, even one not anticipated by the classes of an automatic model, thanks to the user's guidance. When preparing datasets, interactive segmentation proves invaluable for filling the gaps left by fully automatic methods, combining the speed of AI with human expertise.


πŸ’‘ In this article, we explore the principles of interactive segmentation, trace the evolution of techniques (from rule-based methods to neural networks), present its flagship applications (medical imaging, image editing, robotics, etc.), and discuss the current challenges as well as future prospects for this technology.


CVAT user interface (illustration) for interactive segmentation: annotators use the Segment Anything 2 feature to create masks on sheets, then review and correct them manually for greater accuracy.


Interactive segmentation principle


Interactive segmentation involves human-machine collaboration to isolate an object in an image. The user provides visual cues, and the segmentation algorithm computes the corresponding mask(s). Several modes of interaction are commonly used:

  • Control points: the user clicks on a few pixels, marking them as belonging either to the target object (positive points) or to the background (negative points). The system adjusts the mask accordingly, and the user adds further points until the desired result is achieved.
  • πŸ”— Bounding box: the user draws an approximate rectangle around the object of interest. The algorithm then precisely segments the interior of this box, distinguishing the object from the background.
  • Scribbles / brushstrokes: the user paints rough strokes on the object to be kept and, optionally, on the background to be excluded. These scribbles guide the algorithm in delimiting the zones.


Each new indication from the user updates the segmentation iteratively, until the target object is correctly isolated. The great advantage of this approach is that it removes ambiguity in complex cases: the human can specify exactly what the machine should segment. For example, if several objects are touching, or if the lighting confuses the scene, the user can steer the result with just a few clicks. In this way, interactive segmentation combines the precision of human control with the computational speed of algorithms, delivering a result that is often more reliable than a fully automatic (or fully manual) method on difficult images.
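
To make this click-driven loop concrete, here is a deliberately simplified, self-contained Python sketch. It is not any production algorithm: a toy intensity-based region growth stands in for the learned model a real tool would use, and all function names are illustrative. Positive clicks grow regions of similar intensity; a negative click carves out the connected piece of the mask it lands on.

```python
from collections import deque

def _flood(start, accept, h, w):
    """Generic 4-connected flood fill returning the set of reached cells."""
    seen = {start}
    q = deque([start])
    while q:
        r, c = q.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen and accept(nr, nc):
                seen.add((nr, nc))
                q.append((nr, nc))
    return seen

def interactive_segment(image, pos_points, neg_points, tol=10):
    """Toy point-guided segmentation on a 2-D list of intensities:
    each positive click grows a region of similar intensity, and each
    negative click deletes the connected piece of the mask it falls on."""
    h, w = len(image), len(image[0])
    mask = set()
    for seed in pos_points:
        ref = image[seed[0]][seed[1]]
        # grow over pixels whose intensity is within `tol` of the seed
        mask |= _flood(seed, lambda r, c: abs(image[r][c] - ref) <= tol, h, w)
    for p in neg_points:
        if p in mask:
            # remove the connected component of the mask containing the click
            mask -= _flood(p, lambda r, c: (r, c) in mask, h, w)
    return mask
```

Each call mimics one round of the iterative loop: the user inspects the mask, adds a point, and the mask is recomputed with the full set of clicks.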



Are you looking for quality datasets for your Computer Vision models?
Don't hesitate to contact us: our team of Data Labelers has the expertise and experience to segment your most complex images and videos.


Evolution of image segmentation techniques


Image segmentation has evolved considerably over the last few decades, from simple deterministic methods to high-performance deep learning algorithms. There are three main stages in this evolution:


1. Rule-based methods (1980s-1990s)

Early segmentation processes were based on criteria set manually by image processing experts. These classic techniques included thresholding (binarizing an image according to a luminance or color threshold), edge detection (delimiting objects via their contours by examining the πŸ”— image gradients), and region growing (grouping neighboring pixels with similar characteristics). These hand-tuned methods work well in simple cases, but lack robustness as soon as πŸ”— scenes become complex or the shooting parameters vary; they often have to be adjusted image by image. Nevertheless, they laid the theoretical foundations of segmentation and remain in use for simple needs or pre-processing.
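
As an illustration of such rule-based segmentation, here is a compact, dependency-free Python implementation of Otsu's method, the classic automatic-thresholding rule: it scans all 256 possible 8-bit thresholds and keeps the one that maximizes the between-class variance.

```python
def otsu_threshold(gray):
    """Otsu's rule (1979): pick the threshold maximizing the
    between-class variance of the resulting two intensity classes."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t in range(256):
        w_bg += hist[t]          # pixels at or below threshold t
        if w_bg == 0:
            continue
        w_fg = total - w_bg      # pixels above threshold t
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray, t):
    """Classic thresholding: 1 for pixels above t, 0 otherwise."""
    return [[1 if v > t else 0 for v in row] for row in gray]
```

On a bimodal image the rule works well; on a complex scene with overlapping intensity distributions it fails, which is exactly the lack of robustness described above.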


2. Machine learning-based approaches (2000s)

With advances in πŸ”— statistical learning, researchers introduced models capable of learning to πŸ”— segment from annotated data. For example, some methods combine pixel descriptors (color, texture, etc.) with trained classifiers (SVMs, random forests...) to predict the label (object or background) of each pixel. Other techniques, such as random walks or Markov models (MRF/CRF), incorporate neighborhood information to improve segment consistency. In interactive segmentation, a landmark algorithm of this era is Graph Cut (and its extension GrabCut), which uses a graph model to interactively separate an object: the user initiates the process (e.g. by roughly surrounding the object) and the algorithm optimizes a cut in the image graph by minimizing a cost criterion. Overall, these approaches partially learn from the data, making them more adaptive than simple fixed rules. However, their performance remains limited by the need to manually define the right features to learn (handcrafted features), and they quickly reach their limits on very complex images or varied objects.
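
To give a feel for the graph model behind Graph Cut, here is a miniature, self-contained Python sketch on a 1-D strip of pixels. It is an illustrative toy, not the original GrabCut code: real implementations work on the full 2-D grid with faster max-flow algorithms (e.g. Boykov-Kolmogorov), and the exact edge weighting here is an arbitrary choice. User clicks become infinite-capacity links to the source/sink, neighboring pixels get links weighted by their similarity, and a minimum cut separates object from background.

```python
import math
from collections import deque, defaultdict

class TinyMaxFlow:
    """Minimal Edmonds-Karp max-flow, enough to cut a toy pixel graph."""
    def __init__(self):
        self.cap = defaultdict(lambda: defaultdict(int))

    def add_edge(self, u, v, c):
        # symmetric capacities model an undirected link
        self.cap[u][v] += c
        self.cap[v][u] += c

    def _bfs_parents(self, s, t):
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c in list(self.cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None

    def source_side_of_min_cut(self, s, t):
        # saturate augmenting paths until none remain
        while True:
            parent = self._bfs_parents(s, t)
            if parent is None:
                break
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            f = min(self.cap[u][w] for u, w in path)
            for u, w in path:
                self.cap[u][w] -= f
                self.cap[w][u] += f
        # nodes still reachable from s in the residual graph = object side
        seen, q = {s}, deque([s])
        while q:
            u = q.popleft()
            for v, c in self.cap[u].items():
                if c > 0 and v not in seen:
                    seen.add(v)
                    q.append(v)
        return seen

def graph_cut_1d(pixels, pos_clicks, neg_clicks, sigma=10.0):
    """Label a 1-D strip: clicked pixels get hard source/sink links,
    neighbors get similarity-weighted links; the min cut does the rest."""
    g, INF = TinyMaxFlow(), 10**6
    for i in pos_clicks:
        g.add_edge("S", i, INF)   # hard "object" constraint
    for i in neg_clicks:
        g.add_edge(i, "T", INF)   # hard "background" constraint
    for i in range(len(pixels) - 1):
        d = pixels[i] - pixels[i + 1]
        # similar neighbors -> expensive to cut; dissimilar -> cheap
        w = int(100 * math.exp(-d * d / (2 * sigma * sigma))) + 1
        g.add_edge(i, i + 1, w)
    obj = g.source_side_of_min_cut("S", "T")
    return [1 if i in obj else 0 for i in range(len(pixels))]
```

The cut naturally falls on the cheapest edge, i.e. the strongest intensity discontinuity between the positive and negative clicks, which is the intuition behind Graph Cut's cost criterion.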


3. Neural networks and Deep Learning (2010s to present)

The revolution came with πŸ”— convolutional neural networks (CNNs), capable of automatically learning relevant features to segment images. Models such as U-Net, Mask R-CNN or, more recently, πŸ”— Segment Anything (SAM) from Meta have pushed back the frontiers in terms of accuracy and generalizability. Fed with large sets of annotated images, these networks manage to finely segment objects of various shapes and sizes, sometimes even under difficult background conditions. Modern architectures often combine an encoder-decoder design (to capture both global context and local detail) with multi-scale attention, making them highly effective at classifying every pixel in the image. What's more, some recent models are promptable, i.e. they accept instructions (points, boxes, text) as input to segment any designated target in the image. This makes them particularly suitable for interactive segmentation, where a point or a click from the user can serve as a prompt to instantly generate a mask.


It's important to note that, despite the excellence of neural networks, traditional methods haven't totally disappeared: in contexts where computing resources are limited or images very simple, well-chosen thresholding may suffice. Nevertheless, for industrial applications requiring robustness and scale, Deep Learning-based approaches dominate image segmentation today.


Applications in various fields


Interactive segmentation has a wide range of applications when it comes to isolating visual objects with precision. It is used both to πŸ”— annotate data (creating training datasets for AI) and in tools aimed at professionals or the general public. Here are a few major areas where it adds value:


Medicine and biomedical imaging


Segmentation of a brain MRI: original image (a) and image segmented into three tissues: white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF) (b). (Source: πŸ”— pmc.ncbi.nlm.nih.gov)


In medicine, image segmentation is used to delineate anatomical structures or anomalies (tumors, organs, lesions, etc.) on imaging scans (MRI, CT, ultrasound, etc.). Automatic methods are useful, but specialist intervention is often required to correct and refine results. Manually analyzing entire volumes is extremely time-consuming and subject to variations.


Interactive segmentation speeds up this process: a radiologist can, for example, automatically segment a tumor and then correct it in a few clicks if necessary, instead of delimiting it entirely by hand. Similarly, when preparing for computer-assisted surgery, the surgeon can quickly adjust the segmented target area (such as an organ to be treated) to obtain an accurate 3D model. Thanks to these interactive tools, reliable cut-outs of structures of interest can be obtained more quickly, aiding diagnosis, treatment planning or the creation of customized operating guides.


Image editing and graphic design


Example of subject extraction by GrabCut: by roughly framing the cat in the photo (left), the algorithm automatically segments the subject on a transparent background (right). Source: πŸ”— researchgate.net


Whether in photography, advertising or design, interactive segmentation is a valuable tool for manipulating visual elements. A common application is object clipping (or background removal): removing the background from an image to isolate the subject (product, person, etc.). Consumer software such as Photoshop incorporates intelligent selection tools (magnetic lasso, enhanced magic wand, etc.) based on interactive segmentation algorithms: the user indicates the approximate area to be preserved, the tool computes the precise contour, and the user refines it by painting over poorly cropped areas.


Today, many online services offer one-click AI-assisted background removal. However, they often provide a "manual" mode to adjust the result, as the automatic output may confuse elements (for example, fine hair with the background). Interactive segmentation is also used in augmented reality (to dynamically place an object or person in a different setting) or for selective colorization (to isolate a colored element on a black-and-white background, etc.). In all these cases, it offers the user precise control, while eliminating the need to trace contours entirely by hand.
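
The background-removal step itself is simple once a mask exists: keep the subject's pixels opaque and make everything else transparent. A minimal sketch, with the caveat that the nested-list RGBA representation is an assumption for illustration (real editors operate on image buffers, e.g. Pillow's RGBA mode):

```python
def cutout(rgb_image, mask):
    """Turn an RGB image plus a binary mask into an RGBA cutout:
    masked pixels stay opaque (alpha 255), background pixels
    become fully transparent (alpha 0)."""
    return [
        [(r, g, b, 255 if mask[y][x] else 0)
         for x, (r, g, b) in enumerate(row)]
        for y, row in enumerate(rgb_image)
    ]
```

This is why mask quality matters so much in editing workflows: every imperfect mask pixel becomes a visible halo or hole in the final composite.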


Robotics, autonomous vehicles and machine vision


Real-time segmented urban scene for an autonomous vehicle (each color representing a class)


Robotic systems and autonomous vehicles rely heavily on computer vision to understand their environment. In particular, πŸ”— semantic segmentation provides a fine-grained understanding of the image captured by the robot's or car's camera: it assigns each pixel a label (vehicle, pedestrian, road, obstacle, building...).


This is particularly important for navigation, as the system needs to know where the road is, how to distinguish a pedestrian from a lamppost, and so on. In most cases, these segmentations are performed fully automatically by neural networks trained on thousands of urban images. Nevertheless, building these training databases relies extensively on interactive segmentation: human operators manually annotate examples (street images) using interactive tools to segment each object, in order to create πŸ”— precise ground truths for training models. Likewise, in industrial robotics, an operator can use interactive segmentation to quickly teach a robot to identify a particular part among others on an assembly line (by segmenting it over a few images to generate examples).


We can therefore see that humans intervene either upstream (to produce high-quality annotated data) or possibly in supervision (for example, a driver supervising an autonomous vehicle could correct the detection of an ambiguous obstacle in real time via an interactive segmentation interface, if such assistance functionalities exist in the future). In all cases, interactive segmentation provides quality assurance and a safety net in fields (transport, automation, robotics) where reliability is paramount.


Current challenges and future prospects


Despite its success, interactive segmentation faces a number of challenges. Ideally, we'd like to segment any object with a single click or instruction. Recent work is moving in this direction with foundation models such as Meta's Segment Anything Model (SAM), capable of generating a mask from a simple point or bounding box provided as input. These highly generic models produce impressive results, but they are not infallible. In practice, their predictions still often require human validation and correction. For example, we note that an annotation produced by SAM is not always perfect, and that a specialist has to rework it to achieve the required quality.


Improving first-attempt accuracy is therefore a key challenge; it will require more efficient networks, possibly combining vision and language (models that can be guided by a textual instruction, such as "select the large tree on the right of the image", are beginning to be explored).


Interactive segmentation also needs to be adapted to new types of data. For example, 3D (volumetric) imaging and video pose additional challenges: how can a user effectively guide segmentation across a temporal sequence or a volume? Tools need to be invented to propagate corrections over time or across 3D slices, so that humans don't have to redo everything frame by frame. Other avenues of research involve continuous learning: an interactive system could learn from the user's corrections, to avoid repeating the same errors. This is known as adaptive interactive segmentation, where the model is customized to the operator's preferences or to the specific data encountered.


Another challenge lies in the user experience itself: making the annotation interface as intuitive and efficient as possible. This requires, for example, instant visual feedback (so that users can see the effect of their clicks in real time), intelligent suggestions (the system could proactively propose to segment a given object if the user hesitates), and the ability to undo or refine locally without starting all over again. Latency must be kept to a minimum to enable fluid interaction, which means optimizing the algorithms (some recent work has focused on lightweight models that can run in real time on CPUs).
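
One way to support local refinement with undo is a simple history stack of masks. The class below is a hypothetical sketch of that interface pattern (the names are illustrative, not from any real annotation tool): every edit pushes a new mask, so reverting one click never forces the user to start over.

```python
class MaskEditor:
    """Tiny undo-capable mask editor: each edit produces a new mask
    kept on a history stack, so any click can be reverted individually."""
    def __init__(self, initial_mask):
        self._history = [set(initial_mask)]

    @property
    def mask(self):
        """The current mask is always the top of the stack."""
        return self._history[-1]

    def add_pixels(self, pixels):
        # e.g. the result of a positive click
        self._history.append(self.mask | set(pixels))

    def remove_pixels(self, pixels):
        # e.g. the result of a negative click
        self._history.append(self.mask - set(pixels))

    def undo(self):
        # keep at least the initial mask on the stack
        if len(self._history) > 1:
            self._history.pop()
```

Because masks are stored immutably per step, instant visual feedback is just "render the top of the stack", which keeps the interaction loop responsive.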


Despite these challenges, the outlook for interactive segmentation is very promising. With the rise of ever more powerful and general-purpose AI models, we can imagine tools capable of "segmenting everything" almost instantaneously, requiring only a quick validation by the user. In many professional fields, these advances will save precious time for experts (doctors, engineers, etc.), who will be able to concentrate on analysis rather than on tedious data preparation... even if these tools in no way dispense with the need to set up a complete and efficient labeling process (or LabelOps).


In conclusion, interactive segmentation is a good illustration of the complementarity between humans and AI: algorithms provide speed of execution and the ability to process large volumes of images, while human expertise guarantees the relevance and quality of the final result. Current research efforts are aimed at minimizing the need for intervention without eliminating it altogether, so that the final decision remains in informed human hands. We're confident that in the near future, thanks to the continuous improvement of models and interfaces, interactive segmentation will become an even more transparent and powerful tool, integrating naturally into many workflows without us even realizing it.


Sources for further information


- For a general introduction to the various image segmentation techniques, you may wish to consult πŸ”— this article from Innovatiana.

- The πŸ”— Kili Technology blog details the principles of interactive segmentation and how it can be used.

- Finally, to discover Meta's Segment Anything model, which foreshadows the future of universal segmentation, we suggest you read πŸ”— SAM: everything you need to know.


Happy exploring!