Publication of YOLOv9: understanding YOLO, the most popular object detection algorithm
Object detection is a fundamental task in Computer Vision. It enables artificial intelligence systems to locate and classify objects in images or videos. The ability to accurately detect objects has many applications, from autonomous cars to surveillance systems. In recent years, one algorithm has gained popularity for its exceptional performance in object detection: You Only Look Once (YOLO). But what do you know about this algorithm, and how well do you understand it?
Don't have a clue? Don't panic, this article is here to explain what YOLO is, its importance in the world of AI and its different versions. After reading this, you'll have a good understanding of YOLO and its applications. Let's get started!
Object detection algorithms: what are they?
Object detection algorithms are computer programs designed to identify and locate objects in an image or video. These powerful detection algorithms can identify multiple objects and classify them into different categories.
A popular example of an object detection algorithm is YOLO (You Only Look Once), which quickly processes images in real time, making it highly effective for applications such as traffic monitoring and control. Another example is the R-CNN (Regions with Convolutional Neural Networks) family, which includes Fast R-CNN and Faster R-CNN, renowned for their accuracy in detecting single or multiple objects by first proposing regions and then classifying them.
With advances in Deep Learning, these algorithms are constantly improving, becoming faster and more accurate. They play an essential role in the development of technologies such as autonomous vehicles, where they help detect obstacles on the road, for example.
What is YOLO, and how important is it in AI?
As we've seen, YOLO, or "You Only Look Once", is an algorithm that helps computers quickly and accurately detect objects in images or videos.
Introduced in 2015 by Joseph Redmon, Ali Farhadi and their co-authors, YOLO is faster than older approaches because it analyzes the entire image in one go. This single pass enables YOLO to quickly identify which objects are present, such as cars, trees or animals, and where they are in the image.
The importance of YOLO is enormous for AI, particularly in the development of advanced products such as autonomous vehicles. For autonomous cars, YOLO can function as the car's eyes, quickly spotting things on the road to avoid accidents. Also, embedded in smart cameras, YOLO can help improve video surveillance by automatically detecting unusual behavior, for example in airports or shopping malls. This means that if someone leaves a backpack alone, YOLO can inform the security team immediately via a notification.
YOLO has been updated many times since, first by its original authors and then by other research teams. There are many versions, from YOLOv1 to YOLOv9 (the most recent at the time of writing, released in February 2024), with each new version generally aiming to be faster and more accurate. YOLO has become very popular because it lets machines see and understand the world quickly and locate objects for a multitude of real-world applications.
How does YOLO work?
Here's how the YOLO (You Only Look Once) object detection algorithm works, explained in simple steps:
1. Take a photo
First of all, the YOLO algorithm starts with an image, just like when you take a photo with a camera. This single image (or video frame) is the input for object detection.
2. Split image
Next, it divides the given image into small squares, like a checkerboard. Each square is checked to see if it contains an object (a cat, a dog or a tin can, for example).
3. Search for clues
For each square, YOLO looks for clues or features such as edges, shapes or textures that might indicate which object is inside, and it surrounds candidate objects with bounding boxes. To learn how to do this, YOLO is first trained on a labeled reference dataset (the "ground truth"), against which its predictions are compared.
4. Make predictions
The algorithm makes a guess for each square in an image: what object could it be, and where exactly is it in the square? It assigns each guess a score to show its level of certainty.
5. Discard surplus guesses
Several squares often produce overlapping guesses for the same object, like two squares each guessing part of the same car. YOLO keeps the best guess for each object and discards the superfluous ones, a step known as non-maximum suppression.
6. Show what it has found
In the end, YOLO shows you where it thinks each object is by drawing boxes around them and labeling them, like "car" or "tree". If you give it 1,000 images containing dogs and cats, and tell it to identify the cats, it will show you images enriched with metadata pointing to the cats.
YOLO's strong point is that it examines all the elements of an image (broken down into "squares") at the same time. As a result, it is fast and can even operate in real time, which is extremely useful for applications requiring fast reactions, such as autonomous cars or video surveillance!
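To make steps 4 to 6 more concrete, here is a minimal Python sketch of how low-confidence guesses can be dropped and overlapping guesses merged. It is an illustration only, not the code of any YOLO release: the function names, the sample boxes and the two thresholds (0.25 and 0.5) are assumptions chosen for the example.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
# A detection is (box, label, confidence score).
Detection = Tuple[Box, str, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union: how much two boxes overlap (0.0 to 1.0)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    detections: List[Detection],
    score_thresh: float = 0.25,  # assumed value, for illustration
    iou_thresh: float = 0.5,     # assumed value, for illustration
) -> List[Detection]:
    """Keep the best guess per object and drop redundant ones."""
    # Step 4: ignore guesses the model is not confident about,
    # then look at the most confident guesses first.
    candidates = sorted(
        (d for d in detections if d[2] >= score_thresh),
        key=lambda d: d[2],
        reverse=True,
    )
    kept: List[Detection] = []
    for det in candidates:
        # Step 5: drop a guess if a better guess with the same label
        # already covers (almost) the same area.
        if all(det[1] != k[1] or iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept


guesses = [
    ((10, 10, 110, 60), "car", 0.90),
    ((15, 12, 115, 65), "car", 0.75),     # overlaps the first car guess
    ((200, 40, 240, 120), "tree", 0.60),
    ((50, 50, 70, 70), "dog", 0.10),      # too uncertain to keep
]
final = non_max_suppression(guesses)

# Step 6: show what was found.
for box, label, score in final:
    print(f"{label} ({score:.2f}) at {box}")
```

With these sample values, the second "car" guess is removed because it overlaps the first one, and the "dog" guess is removed because its score is too low, leaving one car and one tree.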
YOLO vs. R-CNN: what's the difference?
YOLO and R-CNN are both effective for locating objects in images or videos, but they do so in different ways and for different use cases. Here's how they differ in their object detection processes!
Speed
YOLO is very fast, as it analyzes the whole image in one go. But R-CNN examines parts of the image several times to find objects, which takes longer. So the YOLO model offers more speed in object detection!
Steps taken
YOLO divides the image into squares, guesses what's inside each one and eliminates redundant guesses. R-CNN starts by finding interesting parts of the image, then examines these parts more closely to determine what they contain.
Precision
R-CNN is very meticulous and precise, as it spends more time checking every part of the image. YOLO is faster, but sometimes not as meticulous as R-CNN.
Use cases
YOLO is suitable when you need quick answers, as in an autonomous car that needs to make quick decisions. R-CNN is preferable when you need to be really sure of what's in the image and have more time to check, for example when determining whether a medical image shows signs of disease.
Overall, using YOLO is like taking a quick look around a room and quickly spotting most of the objects in it. Using R-CNN is like taking the time to look at every nook and cranny of that room to make sure you don't miss anything. These algorithms are both excellent at this game, but they play it differently!
Object detection evolution: from YOLO 1 to YOLO 9
YOLO, an acronym for "You Only Look Once", is a real-time object detection algorithm that has undergone significant improvements since its inception. As a "one shot" detector, it processes images and identifies objects by predicting bounding boxes and class probabilities in a single pass. Over time, YOLO has become increasingly robust and powerful, as the successive versions below illustrate.
YOLO V1
- The first version of YOLO revolutionized the AI / Computer Vision research community with its real-time object detection capabilities, offering much faster inference speeds than existing methods such as R-CNN.
- YOLO v1 divides the incoming image into a grid and predicts several bounding boxes and class probabilities for each grid cell.
- However, with this first version, accuracy was a compromise. YOLO struggled with small objects and produced numerous object location errors.
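As a rough illustration of the grid idea, here is a minimal Python sketch. It assumes the settings reported in the original YOLO paper for Pascal VOC (a 7×7 grid, 2 boxes per cell, 20 classes) and a 448×448 input image. The helper names `responsible_cell` and `output_shape` are hypothetical and do not come from any YOLO library.

```python
from typing import Tuple


def responsible_cell(
    cx: float, cy: float, img_w: int, img_h: int, s: int = 7
) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains a box centre.

    cx, cy are the centre coordinates in pixels. The cell containing
    the centre is the one in charge of predicting that object.
    """
    col = min(int(cx / img_w * s), s - 1)  # clamp centres on the right edge
    row = min(int(cy / img_h * s), s - 1)  # clamp centres on the bottom edge
    return row, col


def output_shape(s: int = 7, b: int = 2, c: int = 20) -> Tuple[int, int, int]:
    """Shape of the prediction grid.

    Each cell predicts b boxes (x, y, width, height, confidence)
    plus c class probabilities shared by the cell.
    """
    return (s, s, b * 5 + c)


print(responsible_cell(224, 100, 448, 448))  # (1, 3)
print(output_shape())                        # (7, 7, 30)
```

Because each cell only predicts a couple of boxes and one set of class probabilities, several small objects falling in the same cell compete with each other, which is one way to understand why this first version struggled with small objects.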
YOLO V2 and V3
- Subsequent versions, such as YOLO v2 and v3, introduced significant improvements and new features such as anchor boxes, whose shapes are chosen by k-means clustering on the training boxes so that bounding box coordinates are predicted more accurately.
- These versions have also benefited from batch normalization and the handling of higher resolution input images, leading to significantly better detection performance on benchmarks such as the Pascal VOC and COCO datasets.
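The anchor box idea can be sketched in a few lines of Python. The example below clusters box sizes with k-means, using 1 − IoU as the distance between a box and a cluster centre, as described in the YOLO v2 paper. The deterministic starting points, the use of the mean to update each centre and the sample box sizes are assumptions made for this illustration, and `kmeans_anchors` is a hypothetical helper name.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height) of a labeled box


def size_iou(a: Size, b: Size) -> float:
    """IoU of two boxes aligned on the same corner (only sizes matter)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes: List[Size], k: int, iterations: int = 50) -> List[Size]:
    """Group box sizes into k clusters; each cluster centre is an anchor."""
    # Assumed deterministic start: k sizes spread across the area-sorted list.
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    anchors = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[Size]] = [[] for _ in range(k)]
        for s in sizes:
            # Smallest distance (1 - IoU) is the same as largest IoU.
            best = max(range(k), key=lambda i: size_iou(s, anchors[i]))
            clusters[best].append(s)
        # Move each anchor to the mean size of its cluster.
        new_anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
        if new_anchors == anchors:  # nothing moved: converged
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Made-up box sizes: small square objects, tall objects, wide objects.
box_sizes = [
    (10, 12), (12, 10), (11, 11),
    (50, 100), (55, 95), (45, 105),
    (200, 80), (210, 90), (190, 85),
]
print(kmeans_anchors(box_sizes, k=3))
# [(11.0, 11.0), (50.0, 100.0), (200.0, 85.0)]
```

Using IoU rather than plain Euclidean distance means that large boxes do not dominate the clustering simply because their widths and heights are bigger numbers.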
YOLO V4 and V5
- With the aim of achieving both high speed and high accuracy, YOLO v4 has introduced features such as spatial pyramid pooling and a more complex YOLO architecture based on state-of-the-art convolutional networks.
- YOLO v5, meanwhile, has focused on simplification and optimization, enabling it to run extremely fast on less powerful hardware while maintaining high precision.
YOLO V6 to V8
- The most recent versions of YOLO, from version 6 onwards, introduce continuous improvements focused on real-life applications of YOLO, such as autonomous vehicles or video surveillance. As time progresses, YOLO moves away from the research community to the general public and real-life use cases.
- These versions have refined the use of Deep Learning techniques, including various forms of data augmentation and optimization algorithms that have helped improve mean average precision and the ability to detect a diverse range of object classes.
YOLO V9
On February 21, 2024, Chien-Yao Wang, I-Hau Yeh and Hong-Yuan Mark Liao published the article "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information", which introduces a new computer vision model architecture: YOLOv9.
YOLOv9 represents a major advance in the YOLO model series, offering significant improvements in accuracy and efficiency for real-time object detection. It distinguishes itself from its predecessors, notably YOLOv8, by a 49% reduction in the number of parameters and a 43% reduction in computational complexity, while increasing average precision (AP) on the MS COCO dataset by 0.6%.
The YOLOv9 series comprises four models: YOLOv9-s (small), YOLOv9-m (medium), YOLOv9-c (compact) and YOLOv9-e (large), each varying in terms of number of parameters and performance. These models are designed to meet a wide range of needs, from light-duty applications to more demanding performance requirements.
YOLOv9 introduces two major innovations:
- 1. Programmable Gradient Information (PGI)
- 2. Generalized Efficient Layer Aggregation Network (GELAN)
PGI is an auxiliary supervision mechanism comprising three main components:
- 1. a main branch
- 2. an auxiliary reversible branch
- 3. multi-level auxiliary information
This structure helps mitigate the loss of information caused by information bottlenecks, a common problem in deep neural networks. GELAN combines elements of CSPNet, known for its efficient gradient path planning, and ELAN, which prioritizes inference speed, creating a versatile architecture that emphasizes lightweight design, rapid inference and increased accuracy.
In addition, YOLOv9 is suitable for a variety of Computer Vision applications, including logistics and distribution, autonomous vehicles, people counting in the retail sector and sports analysis. These applications benefit from YOLOv9's ability to detect objects in real time with great precision and efficiency.
All in all, YOLOv9 represents a major milestone in artificial intelligence research, reflecting the current dynamic of a relentless quest to achieve and maintain leading-edge status in the field. The developers of YOLOv9 have published the source code on GitHub, making it easy to adapt to various Computer Vision tasks.
Over the course of its evolution from YOLO v1 to v9, the YOLO family of object detection algorithms has consolidated its position as a key tool in Computer Vision. With each version, YOLO has become more adept at detecting objects of varying complexity, in a variety of scenarios, becoming an essential component in automation systems where fast, accurate object detection is paramount. To find out more and test YOLOv9, please go to Hugging Face 🤗!
What are YOLO's main applications in various industries?
YOLO, one of the world's leading object detection algorithms, is used in a variety of areas of life, making our everyday lives a whole lot easier. Here's a quick overview of the main industries where YOLO is used!
Surveillance systems
YOLO is widely used in surveillance to maintain security in public spaces such as airports, shopping malls and city streets. It quickly identifies unattended objects, such as bags potentially containing dangerous materials, and unusual movements, alerting the authorities in real time. This helps prevent crime and respond quickly to potential threats, ensuring public safety.
Traffic control and management
In the field of road traffic management, YOLO can analyze traffic patterns, spot traffic violations and detect accidents as soon as they occur. Authorities use this real-time data to optimize traffic flows, reduce congestion and deploy emergency services more quickly if necessary. With YOLO, smart cities can effectively manage their roads, potentially saving lives by reducing accident response times.
Health
In the healthcare sector, YOLO is used in medical imaging to identify anomalies in scans and assist in diagnosis. While not as precise as specialized diagnostic tools, it nevertheless speeds up preliminary analysis, flagging up areas that require further examination by a healthcare professional. This YOLO application can speed up patient screening and help in the early detection of disease.
Industrial automation
Manufacturing and logistics industries benefit from YOLO as it streamlines operations by identifying components on assembly lines, tracking inventory in real time and spotting product defects. This leads to better quality control, greater efficiency and lower operating costs, by minimizing human error and increasing throughput.
Retail sales
Retailers use YOLO to understand customer behavior and improve store layouts. By analyzing how individuals move around a store, companies can optimize shelf locations, improve customer service and manage queues more effectively. This information helps build better customer experiences.
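As a simple illustration of queue management, the Python sketch below counts how many detected people are standing inside a rectangular "queue zone" of the camera image. It assumes the detector has already produced a list of boxes with labels and scores. The zone coordinates, the 0.5 score threshold and the name `count_in_zone` are hypothetical values chosen for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, str, float]       # (box, label, confidence score)


def count_in_zone(
    detections: List[Detection],
    zone: Box,
    label: str = "person",
    min_score: float = 0.5,  # assumed threshold, for illustration
) -> int:
    """Count confident detections of `label` whose centre lies in `zone`."""
    zx0, zy0, zx1, zy1 = zone
    count = 0
    for (x0, y0, x1, y1), det_label, score in detections:
        if det_label != label or score < min_score:
            continue
        # Use the box centre to decide whether the person is in the zone.
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if zx0 <= cx <= zx1 and zy0 <= cy <= zy1:
            count += 1
    return count


frame_detections = [
    ((20, 30, 60, 120), "person", 0.92),    # in the queue
    ((70, 35, 110, 125), "person", 0.81),   # in the queue
    ((300, 40, 340, 130), "person", 0.88),  # elsewhere in the store
    ((40, 50, 80, 140), "person", 0.30),    # too uncertain
    ((50, 90, 90, 130), "cart", 0.95),      # not a person
]
queue_zone = (0, 0, 150, 200)
print(count_in_zone(frame_detections, queue_zone))  # 2
```

A store could compare this count against a threshold on each video frame to decide when to open another checkout, for instance.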
Autonomous vehicles
The use of YOLO to develop AI for autonomous vehicles means that cars can detect other cars, pedestrians and obstacles on the road, making it indispensable to the driving decision-making process.
One last word
In short, YOLO is a powerful object detection algorithm, and few alternatives can match it for designing and bringing to market high-performance AI products that are relatively inexpensive to develop. With its strong accuracy and real-time speed, YOLO is already used in a wide range of industries. We hope you've enjoyed the information provided in this article. Thank you for reading!
And if you'd like to find out more about preparing datasets to train your YOLO models, why not explore the services offered by Innovatiana? At Innovatiana, we understand the importance of a well-structured and dense dataset for the effectiveness of artificial intelligence models. We specialize in preparing and processing quality data to maximize the performance of your YOLO models!