
Publication of YOLOv9: understanding YOLO, the most popular object detection algorithm

Written by
Nicolas
Published on
2024-03-02

Object detection is a fundamental task in Computer Vision: it enables artificial intelligence systems to locate and classify objects present in images or videos. The ability to accurately detect objects has many applications, from autonomous cars to surveillance systems. In recent years, one algorithm has gained popularity for its exceptional performance in object detection: You Only Look Once (YOLO). But what do you know about this algorithm, and how well do you understand it?

Don't have a clue? Don't panic, this article is here to explain what YOLO is, its importance in the world of AI and its different versions. After reading this, you'll have a good understanding of YOLO and its applications. Let's get started!

‍

Object detection algorithms: what are they?

‍

Object detection algorithms are computer programs designed to identify and locate objects in an image or video. These powerful detection algorithms can identify multiple objects and classify them into different categories.

‍

A popular example of an object detection algorithm is YOLO (You Only Look Once), which quickly processes images in real time, making it highly effective for applications such as traffic monitoring and control. Another example is the R-CNN (Regions with Convolutional Neural Networks) family, which includes Fast R-CNN and Faster R-CNN, renowned for their accuracy in detecting single or multiple objects by first proposing regions and then classifying them.

‍

With advances in artificial intelligence, and Deep Learning in particular, these algorithms are constantly improving, becoming faster and more accurate. They play an essential role in technologies such as autonomous vehicles, where they help detect obstacles on the road, for example.

What is YOLO, and how important is it in AI?

As we have seen, YOLO, or "You Only Look Once", is a tool that helps computers quickly and accurately see things in images or videos.

Introduced in 2015 by Joseph Redmon, Ali Farhadi and their co-authors, YOLO is faster than older tools because it analyzes the entire image in one go. This single pass enables YOLO to quickly identify which objects are present, such as cars, trees or animals, and where they are in the image.

The importance of YOLO is enormous for AI, particularly in the development of advanced products such as autonomous vehicles. For autonomous cars, YOLO can function as the car's eyes, quickly spotting things on the road to avoid accidents. Also, embedded in smart cameras, YOLO can help improve video surveillance by automatically detecting unusual behavior, for example in airports or shopping malls. This means that if someone leaves a backpack alone, YOLO can inform the security team immediately via a notification.

‍

The algorithm has been updated continually, in later years by research teams other than its original creators. There are many versions, from YOLOv1 to YOLOv9 (the most recent at the time of writing, released in February 2024), each generally aiming to be faster and more accurate than the last. YOLO has become very popular because it lets machines see and understand the world quickly and locate objects for a multitude of real-world applications.

‍

‍

‍

‍

Logo


How do you prepare data to train your YOLO models?
Call on our annotators for your most complex data annotation tasks, and improve your data quality to 99% reliability! Work with our data labelers today.

‍

‍

‍

How does YOLO work?

‍

Here's how the YOLO (You Only Look Once) object detection algorithm works, explained in simple steps:

‍

1. Take a photo

First of all, the YOLO algorithm starts with an image, just like when you take a photo with a camera. This is the starting point for object detection, which builds on image classification!

2. Split the image

Next, it divides the given image into small squares, like a checkerboard. Each square is checked to see if it contains an object (a cat, a dog or a tin can, for example).

3. Search for clues

For each square, YOLO looks for clues or features such as edges, shapes or textures that might indicate which object is inside, and surrounds candidate objects with bounding boxes. To learn what to look for, YOLO is first trained on an annotated reference dataset (or "ground truth"), which provides the points of comparison for its predictions.

4. Make predictions

The algorithm makes a guess for each square in an image: what object could it be, and where exactly is it in the square? It assigns each guess a score to show its level of certainty.

5. Discard surplus guesses

Several squares often produce overlapping guesses for the same object, like two squares each guessing part of the same car. YOLO keeps the best guess for each object and gets rid of the superfluous ones.

6. Show what it has found

In the end, YOLO shows you where it thinks each object is by drawing boxes around them and labeling them, like "car" or "tree". If you give it 1,000 images containing dogs and cats and ask it to identify the cats, it will return the images enriched with metadata (boxes and labels) pointing to the cats.

YOLO's strong point is that it examines all the elements of an image (broken down into "squares") at the same time. As a result, it is fast and can even operate in real time, which is extremely useful for applications requiring fast reactions, such as autonomous cars or video surveillance!
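
The checkerboard idea from step 2 can be sketched in a few lines. The example assumes a YOLOv1-style layout, in which the square containing the center of an object is the one responsible for predicting it. The helper name `responsible_cell`, the 7 x 7 grid and the 448 x 448 image size are illustrative assumptions, not something this article specifies.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Return (row, col, x_off, y_off) for a box center given in pixels.

    Assumes a YOLOv1-style S x S grid: the cell containing the center
    of an object is the one responsible for predicting it.
    """
    # Scale the center to grid units, e.g. 0.0 .. 7.0 for a 7 x 7 grid.
    gx = cx / img_w * grid
    gy = cy / img_h * grid
    # Clamp so a center lying exactly on the right or bottom edge stays in range.
    col = min(int(gx), grid - 1)
    row = min(int(gy), grid - 1)
    # Offsets of the center inside its cell, each in [0, 1].
    return row, col, gx - col, gy - row


# Hypothetical example: a 448 x 448 image with an object centered at pixel (224, 100).
row, col, x_off, y_off = responsible_cell(224, 100, 448, 448)
print(row, col, x_off, y_off)
```

Here the object falls in row 1, column 3 of the grid, halfway across that square horizontally. Expressing the center as an offset inside its square keeps the numbers the network has to predict small and bounded.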

‍

‍

‍

‍

Logo


Did you know?
YOLO, short for "You Only Look Once", is one of the most popular model architectures and object detection algorithms. YOLO is capable of predicting an object's class and the bounding box that defines its location on the image in a single pass, making it ideal for real-time applications.

‍

‍

‍

YOLO vs. R-CNN: what's the difference?

‍

YOLO and R-CNN are both effective for locating objects in images or videos, but they do so in different ways and for different use cases. Here's how their object detection processes differ!

Speed

YOLO is very fast, as it analyzes the whole image in one go. But R-CNN examines parts of the image several times to find objects, which takes longer. So the YOLO model offers more speed in object detection!

‍

Steps taken

YOLO divides the image into squares, guesses what's inside each one and eliminates redundant guesses. R-CNN starts by finding interesting parts of the image, then examines these parts more closely to determine what they contain.

Precision

R-CNN is very meticulous and precise, as it spends more time checking every part of the image. YOLO is faster, but sometimes not as meticulous as R-CNN.

‍

Use cases

YOLO is suitable when you need quick answers, as in an autonomous car that needs to make quick decisions. R-CNN is preferable when you need to be really sure of what's in the image and have more time to check, for example if a medical image shows signs of disease.

‍

‍

‍

| Criteria | YOLO | R-CNN |
|---|---|---|
| Speed | Faster | Slower |
| Method | Observes the whole image in one go | Observes fragments of the image repeatedly |
| Precision | Less precise but improving | More precise |
| Better for | Real-time applications | Detailed analysis when reactivity is not a constraint |


Comparison table: YOLO vs. R-CNN

‍

‍

Overall, using YOLO is like taking a quick look around a room and quickly spotting most of the objects in it. Using R-CNN is like taking the time to look at every nook and cranny of that room to make sure you don't miss anything. These algorithms are both excellent at this game, but they play it differently!

‍

Object detection evolution: from YOLOv1 to YOLOv9

YOLO, an acronym for "You Only Look Once", is a real-time object detection algorithm that has undergone significant improvements since its inception. As a "one-shot" detector, it processes images and identifies objects by predicting bounding boxes and class probabilities in a single pass. Over time, YOLO has become increasingly robust and powerful, as illustrated by its authors' latest publication:

Figure: YOLO performance, taken from the authors' GitHub repository and evaluated on the MS COCO dataset. The x-axis shows the number of parameters; the y-axis shows average precision (AP) in percent.

YOLO V1

- The first version of YOLO revolutionized the AI / Computer Vision research community with its real-time object detection capabilities, offering much faster inference speeds than existing methods such as R-CNN.

- YOLO v1 divides the incoming image into a grid and predicts several bounding boxes and class probabilities for each grid cell.

- However, with this first version, accuracy was a compromise. YOLO struggled with small objects and produced numerous object location errors.
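
To make the grid prediction concrete, here is a minimal sketch of how large the YOLO v1 output is. The default values (a 7 x 7 grid, 2 boxes per cell, 20 classes) are the configuration reported in the original YOLO paper for Pascal VOC and are not stated in this article; the function name is hypothetical.

```python
def yolo_v1_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the YOLO v1 prediction tensor for one image."""
    # Each box carries 5 numbers: x, y, w, h and a confidence score.
    # Class probabilities are predicted once per cell, not once per box.
    depth = boxes_per_cell * 5 + num_classes
    return (grid, grid, depth)


shape = yolo_v1_output_shape()
total = shape[0] * shape[1] * shape[2]
print(shape, total)  # (7, 7, 30) 1470
```

Because class probabilities are shared by all boxes in a cell, each cell can only describe one object class. This is one way to see why this first version struggled with small objects that appear close together.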

‍

YOLO V2 and V3

- Subsequent versions, such as YOLO v2 and v3, introduced significant improvements and new features such as anchor boxes, whose shapes are chosen by k-means clustering on the training boxes so that predicted bounding box coordinates are more accurate.

- These versions have also benefited from batch normalization and the handling of higher resolution input images, leading to significantly better detection performance on benchmarks such as the Pascal VOC and COCO datasets.
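
The k-means step mentioned above can be sketched as follows. This is a simplified illustration in plain Python, assuming the distance 1 - IoU between box shapes described in the YOLO v2 paper. The function names, the toy boxes and the deterministic initialization are assumptions made for this example.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50):
    """Cluster (width, height) pairs using 1 - IoU as the distance."""
    # Deterministic start for the sketch: the first k boxes (an assumption;
    # real pipelines usually pick random starting points).
    centroids = [tuple(b) for b in boxes[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU means smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep the old centroid if a cluster is empty
                new_centroids.append(centroids[i])
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return sorted(centroids)


# Toy data: three small, roughly square boxes and three tall, narrow ones.
boxes = [(10, 12), (50, 120), (12, 10), (11, 11), (55, 110), (45, 130)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)  # [(11.0, 11.0), (50.0, 120.0)]
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, so the resulting anchors reflect typical object shapes at every scale.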

‍

YOLO V4 and V5

- With the aim of achieving both high speed and high accuracy, YOLO v4 introduced features such as spatial pyramid pooling and a more complex architecture built on a state-of-the-art convolutional backbone.

- YOLO v5, meanwhile, has focused on simplification and optimization, enabling it to run extremely fast on less powerful hardware while maintaining high precision.

‍

YOLO V6 to V8

- The most recent versions of YOLO, from version 6 onwards, introduce continuous improvements focused on real-life applications of YOLO, such as autonomous vehicles or video surveillance. As time progresses, YOLO moves away from the research community to the general public and real-life use cases.

- These versions have refined the use of Deep Learning techniques, including various forms of data augmentation and optimization algorithms, which have helped improve average precision and the ability to detect a diverse range of object classes.
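
As an illustration of what data augmentation involves for detection, the sketch below mirrors bounding boxes for a horizontally flipped image. It is a generic example of one simple augmentation, not a description of what any particular YOLO version does; the function name and the corner-based box format are assumptions.

```python
def hflip_boxes(boxes, img_w):
    """Mirror (x_min, y_min, x_max, y_max) boxes for a horizontally flipped image."""
    flipped = []
    for x_min, y_min, x_max, y_max in boxes:
        # The left edge becomes the right edge and vice versa;
        # vertical coordinates are unchanged.
        flipped.append((img_w - x_max, y_min, img_w - x_min, y_max))
    return flipped


boxes = [(10, 20, 110, 220)]
print(hflip_boxes(boxes, img_w=640))  # [(530, 20, 630, 220)]
```

Unlike image classification, detection requires the labels to be transformed together with the pixels, which is why augmentation pipelines for object detection always handle images and boxes as a pair.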

‍

YOLO V9

On February 21, 2024, Chien-Yao Wang, I-Hau Yeh and Hong-Yuan Mark Liao published the paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information", which introduces a new computer vision model architecture: YOLOv9.

YOLOv9 represents a major advance in the YOLO model series, offering significant improvements in accuracy and efficiency for real-time object detection. It is distinguished from its predecessors, notably YOLOv8, by a 49% reduction in the number of parameters and a 43% reduction in computational complexity, while increasing average precision (AP) on the MS COCO dataset by 0.6%.

The YOLOv9 series comprises four models: YOLOv9-s (small), YOLOv9-m (medium), YOLOv9-c (compact) and YOLOv9-e (extended), each varying in number of parameters and performance. These models are designed to meet a wide range of needs, from lightweight applications to more demanding performance requirements.

YOLOv9 introduces two major innovations:

- 1. Programmable Gradient Information (PGI)

- 2. Generalized Efficient Layer Aggregation Network (GELAN)

PGI is an auxiliary supervision mechanism comprising three main components:

- 1. a main branch

- 2. an auxiliary reversible branch

- 3. multi-level auxiliary information

This structure helps mitigate the loss of information caused by information bottlenecks, a common problem in deep neural networks. GELAN, for its part, combines elements of CSPNet, known for its efficient gradient path planning, and ELAN, which prioritizes inference speed, creating a versatile architecture that emphasizes lightweight design, fast inference and increased accuracy.

In addition, YOLOv9 is suitable for a variety of Computer Vision applications, including logistics and distribution, autonomous vehicles, people counting in the retail sector and sports analysis. These applications benefit from YOLOv9's ability to detect objects in real time with great precision and efficiency.

‍

‍

All in all, YOLOv9 represents a major milestone in artificial intelligence research, reflecting the field's relentless drive to reach and stay at the leading edge. The YOLOv9 developers have released the source code on GitHub, making it easy to adapt to a variety of Computer Vision tasks.

| Version | Improvements | Speed / precision trade-off | Applications |
|---|---|---|---|
| V1 | Prediction by grid cell, single-shot method | Fast but less precise | Fundamental real-time detection (research) |
| V2 and V3 | Anchor boxes, batch normalization | Faster and more precise | Various real-time applications |
| V4 and V5 | Spatial pyramid pooling, optimizations | Balance between speed and precision | Demanding environments, such as transport |
| V6 to V8 | Targeted optimizations, improved architectures | Highly precise, in real time | Specialized applications, such as surveillance |
| V9 | Programmable Gradient Information (PGI), GELAN architecture | Fewer parameters and less computation than YOLOv8, with slightly higher precision | Applications such as medical imaging, autonomous driving and industrial fault detection |


Overview of YOLO versions and evolutions

‍

‍

Over the course of its evolution from YOLO v1 to v9, the YOLO family of object detection algorithms has consolidated its position as a key tool in Computer Vision. With each version, YOLO has become more adept at detecting objects of varying complexity, in a variety of scenarios, becoming an essential component in automation systems where fast, accurate object detection is paramount. To find out more and test YOLOv9, please visit Hugging Face!

What are YOLO's main applications in various industries?

‍

YOLO, one of the world's leading object detection algorithms, is used in a variety of areas of life, making our everyday lives a whole lot easier. Here's a quick overview of the main industries where YOLO is used!

‍

Surveillance systems

YOLO is widely used in surveillance to maintain security in public spaces such as airports, shopping malls and city streets. It quickly identifies unattended objects, such as bags potentially containing dangerous materials, and unusual movements, alerting the authorities in real time. This helps prevent crime and respond quickly to potential threats, ensuring public safety.

‍

Traffic control and management

In the field of road traffic management, YOLO can analyze traffic patterns, spot traffic violations and detect accidents as soon as they occur. Authorities use this real-time data to optimize traffic flows, reduce congestion and deploy emergency services more quickly if necessary. With YOLO, smart cities can effectively manage their roads, potentially saving lives by reducing accident response times.

‍

Health

In the healthcare sector, YOLO is used in medical imaging to identify anomalies in scans and assist in diagnosis. While not as precise as specialized diagnostic tools, it nevertheless speeds up preliminary analysis, flagging up areas that require further examination by a healthcare professional. This YOLO application can speed up patient screening and help in the early detection of disease.

‍

Industrial automation

Manufacturing and logistics industries benefit from YOLO as it streamlines operations by identifying components on assembly lines, tracking inventory in real time and spotting product defects. This leads to better quality control, greater efficiency and lower operating costs, by minimizing human error and increasing throughput.

‍

Retail

Retailers use YOLO to understand customer behavior and improve store layouts. By analyzing how individuals move around a store, companies can optimize shelf locations, improve customer service and manage queues more effectively. This information helps build better customer experiences.

‍

Autonomous vehicles

The use of YOLO to develop AI for autonomous vehicles means that cars can detect other cars, pedestrians and obstacles on the road, making it indispensable to the driving decision-making process.

Frequently asked questions

Non-maximum suppression (NMS) is a post-processing technique used in YOLO to ensure that each detected object is counted only once. After YOLO has predicted several bounding boxes for detected objects, NMS examines these boxes, keeps the most confident one for each object and removes the lower-scoring boxes that overlap it. This avoids multiple detections of the same object and improves the accuracy of the algorithm.
The Pascal VOC dataset is a well-known Computer Vision benchmark that provides standardized images and annotations for object class recognition. YOLO uses this dataset, among others such as COCO, for training and testing. Training on VOC helps the model learn to detect the 20 object classes included in the dataset, while held-out images are used to validate its accuracy and efficiency.
YOLO can predict more than one bounding box per object; it relies on NMS to decide on the most accurate one. The algorithm first predicts several boxes, then, based on class probabilities and intersection over union (IoU) scores, selects the best bounding box while discarding the others.
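
The selection described above can be sketched as a greedy procedure. This is a minimal single-class illustration in plain Python, not the implementation used by any specific YOLO release. The function names, the corner-based box format and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs belonging to a single class."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-scoring box still in play
        kept.append(best)
        # Drop every lower-scoring box that overlaps the kept one too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


detections = [
    ((100, 100, 200, 200), 0.90),  # best guess for an object
    ((105, 105, 205, 205), 0.75),  # same object, slightly shifted
    ((300, 120, 380, 220), 0.60),  # a different object
]
kept = nms(detections)
print([score for _, score in kept])  # [0.9, 0.6]
```

The two overlapping boxes have an IoU of about 0.82, so only the higher-scoring one survives, while the third box is kept because it does not overlap the others.
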
YOLO is designed as a single-shot detector, meaning that it performs both classification and localization in a single pass. The original YOLOv1 is not fully convolutional, as it relies on fully connected layers at the end of the architecture; from YOLOv2 onward, these layers were removed. A fully convolutional network (FCN), by contrast, has no fully connected layers and is often used for segmentation, producing a pixel-by-pixel segmentation map. For object detection, YOLO offers a fast and efficient way of predicting bounding box coordinates and class probabilities.
YOLO does not use support vector machines (SVMs) for object classification. Instead, it directly predicts class probabilities for each bounding box using softmax or logistic classifiers within the same Deep Learning model, rather than relying on traditional Machine Learning approaches like SVMs.
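
The difference between the softmax and logistic options can be shown on a toy set of class scores. This is a generic sketch: the scores and class order are made up for illustration and do not come from a real model.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (one class per box)."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Independent logistic score per class (labels may overlap)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Hypothetical raw scores for the classes (car, truck, tree).
logits = [2.0, 1.0, -1.0]
print([round(p, 3) for p in softmax(logits)])
print([round(p, 3) for p in sigmoid(logits)])
```

With softmax the classes compete, so a high "car" score pushes "truck" down. With independent logistic outputs, a box can score highly for both, which suits datasets where labels overlap.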

‍

One last word

‍

In short, YOLO is a powerful object detection algorithm, and few competitors can match it when it comes to building high-performance AI products that are relatively inexpensive to develop. With its strong accuracy and real-time speed, YOLO is already used in a wide range of industries. We hope you've enjoyed the information provided in this article. Thank you for reading!

And if you want to know more about preparing datasets to train your YOLO models, why not explore the services offered by Innovatiana? At Innovatiana, we understand the importance of a well-structured and dense dataset for the effectiveness of artificial intelligence models. We specialize in preparing and processing quality data to maximize the performance of your YOLO models!