YOLO (You Only Look Once)

Hana M May 02, 2023 | 10:00 AM Technology

YOLO (You Only Look Once) is a family of real-time object detection algorithms for computer vision applications. The first version of YOLO was introduced in 2016, and it quickly gained popularity because it combined fast inference with competitive accuracy.

The YOLO algorithm works by dividing an input image into a grid of cells and predicting bounding boxes and class probabilities for each grid cell. This approach allows YOLO to perform object detection with a single forward pass of a neural network, making it much faster than two-stage detectors such as the R-CNN family, which first propose candidate regions and then classify each one.
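To make the grid idea concrete, here is a minimal sketch that assumes the settings published for the original YOLO (a 7 x 7 grid, 2 boxes per cell, 20 classes). The function names are hypothetical and chosen only for illustration.

```python
# Sketch of the grid idea, assuming YOLOv1's published settings:
# S = 7 grid cells per side, B = 2 boxes per cell, C = 20 classes.
S, B, C = 7, 2, 20


def output_shape(s=S, b=B, c=C):
    """Shape of the prediction tensor: every cell holds b boxes
    (x, y, w, h, confidence) plus c class probabilities."""
    return (s, s, b * 5 + c)


def responsible_cell(cx, cy, img_w, img_h, s=S):
    """Return (row, col) of the grid cell that contains a box center
    given in pixels. Hypothetical helper, not part of any library."""
    col = min(int(cx / img_w * s), s - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * s), s - 1)  # clamp centers on the bottom edge
    return row, col


print(output_shape())                        # (7, 7, 30)
print(responsible_cell(224, 100, 448, 448))  # (1, 3)
```

Because every cell's predictions live in one tensor, the whole image is handled by a single forward pass rather than one pass per candidate region.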

YOLOv1 and YOLOv2 were the first two versions of the algorithm, and both were implemented in the Darknet framework, with YOLOv2 using the Darknet-19 backbone. YOLOv3, released in 2018, improved on the previous versions by using a larger backbone (Darknet-53) and by making predictions at three scales, in a manner similar to a feature pyramid network, to improve detection of objects of different sizes.
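As a rough illustration of detection at different scales, the sketch below assumes a commonly used YOLOv3 configuration: a 416 x 416 input, output strides of 32, 16, and 8, and three anchor boxes per scale. These values and the function names are assumptions for the example, not details taken from this article.

```python
def grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Side length of the prediction grid at each scale."""
    return [input_size // s for s in strides]


def total_boxes(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Number of candidate boxes produced across all scales."""
    return sum(g * g * anchors_per_scale
               for g in grid_sizes(input_size, strides))


print(grid_sizes())   # [13, 26, 52]
print(total_boxes())  # 10647
```

The coarse 13 x 13 grid suits large objects, while the fine 52 x 52 grid gives small objects a better chance of being detected.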

Figure 1. How YOLO works.

Figure 1 illustrates the process. The main steps are:

  1. Preprocessing: The input image is resized and normalized to a fixed size and format that the YOLO algorithm can handle.
  2. Feature extraction: The input image is processed through a deep neural network, such as Darknet, to extract a set of features that represent the contents of the image.
  3. Objectness score: For each predicted box, YOLO outputs an objectness (confidence) score that indicates how likely the box is to contain an object. During training, the target for this score is tied to the intersection over union (IoU) between the predicted box and the ground truth box; at inference time no ground truth is available, so the score is purely a network output.
  4. Bounding box prediction: Each grid cell predicts one or more bounding boxes, and boxes with a low objectness score are discarded afterwards. Each bounding box is represented by four values (x, y, width, height). In the original YOLO, the center (x, y) is relative to the grid cell, while the width and height are relative to the whole image.
  5. Class prediction: For each bounding box, YOLO predicts the probability that it belongs to each class of objects that the algorithm has been trained to detect.
  6. Non-maximum suppression: The final step applies non-maximum suppression (NMS) to the set of predicted bounding boxes. Among boxes that overlap heavily (high IoU), typically within the same class, only the most confident detection is kept, which removes duplicate detections of the same object.
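The post-processing steps above (box decoding, score filtering, and non-maximum suppression) can be sketched in plain Python. This is a simplified illustration rather than any particular YOLO implementation: it assumes the original YOLO's box encoding (center offsets relative to the cell, width and height relative to the image), a greedy per-class NMS, and thresholds of 0.5. All function names are hypothetical.

```python
def decode_box(row, col, x, y, w, h, img_w, img_h, s=7):
    """Convert a cell-relative prediction to pixel corners (x1, y1, x2, y2).
    Assumes x, y are offsets within cell (row, col) and w, h are
    fractions of the full image."""
    cx = (col + x) / s * img_w
    cy = (row + y) / s * img_h
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_thresh=0.5, iou_thresh=0.5):
    """Greedy per-class NMS. detections is a list of
    (box, score, class_id) tuples; returns the kept detections."""
    # Drop low-confidence boxes, then visit the rest best-first.
    candidates = sorted(
        (d for d in detections if d[1] >= score_thresh),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score, cls in candidates:
        # Keep a box only if it does not heavily overlap an already
        # kept box of the same class.
        if all(k[2] != cls or iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score, cls))
    return kept


detections = [
    ((10, 10, 110, 110), 0.9, 0),
    ((12, 12, 112, 112), 0.8, 0),    # near-duplicate of the first box
    ((200, 200, 260, 260), 0.7, 1),
    ((10, 10, 110, 110), 0.3, 1),    # below the score threshold
]
for det in nms(detections):
    print(det)
```

In this example the second box is suppressed as a duplicate of the first and the fourth is dropped for low confidence, so two detections remain. Production code usually runs a vectorized NMS on the GPU, but the logic is the same.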

One of the key advantages of YOLO is its fast inference speed, which makes it suitable for real-time applications such as self-driving cars, robotics, and video surveillance. YOLO has also been used in a variety of other applications, including tracking objects in sports videos and detecting defects in manufacturing processes.

Despite its advantages, YOLO does have some limitations. For example, it can struggle with small objects and objects with complex shapes, and it may not perform as well in crowded or highly occluded scenes. However, ongoing research and development have led to improved versions of YOLO, such as YOLOv4 and YOLOv5, which continue to push the boundaries of real-time object detection performance.

References:

  1. https://www.researchgate.net/figure/A-YOLO-model-21-Represents-the-working-of-YOLO-model-for-detecting-the-objects-from_fig2_344775420

Cite this article:

Hana M (2023), YOLO (You Only Look Once), AnaTechmaz, pp.220
