The YOLO (You Only Look Once) algorithm is a state-of-the-art, real-time object detection algorithm based on neural networks. It was first introduced in 2015 by Joseph Redmon et al. and has since undergone several iterations, the latest being YOLO v7. YOLO divides an input image into an S × S grid; if the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Unlike region-proposal-based classification networks, YOLO is closer to a fully convolutional neural network (FCNN): the image passes through the network once, and the output is the complete set of predictions. Some of the key features of YOLO include:
- Speed: YOLO is extremely fast and can process images in real time, making it well suited for applications such as video surveillance, self-driving cars, and augmented reality.
- Accuracy: YOLO provides accurate results with minimal background errors.
- Learning capabilities: The algorithm has excellent learning capabilities that enable it to learn the representations of objects and apply them in object detection.
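The grid-cell assignment described above can be sketched in a few lines. This is a hypothetical illustration (the function name, image size, and grid size are assumptions, not a specific YOLO version's API): the cell whose region contains an object's center is the one responsible for predicting it.

```python
# Sketch of YOLO's grid-cell assignment: the S x S grid cell that
# contains an object's center is responsible for detecting that object.
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return the (row, col) of the grid cell containing center (cx, cy)."""
    col = min(int(cx / img_w * S), S - 1)  # clamp to the last cell at the edge
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# An object centered at (320, 240) in a 640x480 image with a 7x7 grid:
print(responsible_cell(320, 240, 640, 480))  # -> (3, 3)
```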
The YOLO algorithm aims to predict the class of an object and the bounding box that defines the object's location in the input image. Each bounding box is described by four numbers: the x and y coordinates of its center, its width, and its height. Alongside the box, the network outputs the predicted class and the probability (confidence) of the prediction. YOLO has several variants, including Tiny YOLO and YOLOv3.
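One way to picture this output is to decode a single prediction vector. The layout below (center coordinates, width, height, objectness score, then per-class scores) is a simplified illustration in the spirit of YOLO, not the exact tensor format of any particular version:

```python
# Sketch of decoding one YOLO-style prediction vector:
# [x_center, y_center, width, height, objectness, class_score_0, ...]
# The layout and names are illustrative assumptions.
def decode_prediction(pred, class_names):
    x, y, w, h, objectness = pred[:5]
    class_scores = pred[5:]
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return {
        "box": (x, y, w, h),  # center coordinates, width, height
        "label": class_names[best],
        # A common convention: confidence = objectness * best class score
        "confidence": objectness * class_scores[best],
    }

pred = [0.5, 0.5, 0.2, 0.3, 0.9, 0.1, 0.8, 0.1]
out = decode_prediction(pred, ["car", "dog", "person"])
print(out["label"], round(out["confidence"], 2))  # -> dog 0.72
```

In a full detector this decoding runs for every box in every grid cell, followed by non-maximum suppression to discard overlapping duplicates.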