Skip to content

Latest commit

Β 

History

History
68 lines (43 loc) Β· 2.8 KB

File metadata and controls

68 lines (43 loc) Β· 2.8 KB
description Single Shot Detectors and You Only Look Once

🀳 SSD and YOLO

πŸ˜‰ You Only Look Once

  • πŸ’₯ The approach involves a single neural network trained end to end
    • It takes an image as input and predicts bounding boxes and class labels for each bounding box directly.
  • πŸ˜• The technique offers lower predictive accuracy (e.g. more localization errors) Compared with region based models
  • βž— YOLO divides the input image into an SΓ—S grid. Each grid cell predicts only one object

πŸ‘·β€β™€οΈ Long Story Short: The system divides the input image into an S Γ— S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.

πŸŽ€ Advantages

  • πŸš€ Speed
  • πŸ€Έβ€β™€οΈ Feasible for real time applications

πŸ™„ Disadvantages

  • πŸ˜• Poor performance on small-sized objects
    • It tends to give imprecise object locations.

TODO: Compare versions of YOLO

πŸ€Έβ€β™€οΈ SSD

  • πŸ’₯ Predicts objects in images using a single deep neural network.
  • πŸ€“ The network generates scores for the presence of each object category using small convolutional filters applied to feature maps.
  • ✌ This approach uses a feed-forward CNN that produces a collection of bounding boxes and scores for the presence of certain objects.
  • ❗ In this model, each feature map cell is linked to a set of default bounding boxes

πŸ‘©β€πŸ« Details

  • πŸ–ΌοΈ After going through a certain of convolutions for feature extraction, we obtain a feature layer of size mΓ—n (number of locations) with p channels, such as 8Γ—8 or 4Γ—4 above.
    • And a 3Γ—3 conv is applied on this mΓ—nΓ—p feature layer.
  • πŸ“ For each location, we got k bounding boxes. These k bounding boxes have different sizes and aspect ratios.
    • The concept is, maybe a vertical rectangle is more fit for human, and a horizontal rectangle is more fit for car.
  • πŸ’« For each of the bounding boxes, we will compute c class scores and 4 offsets relative to the original default bounding box shape.

πŸ€“ Long Story Short

The SSD object detection algorithm is composed of 2 parts:

  • Extract feature maps
  • Apply convolution filters to detect objects.

πŸ•΅οΈβ€β™€οΈ Evaluation

  • Better accuracy compared to YOLO
  • Better speed compared to Region based algorithms

πŸ‘€ Visualization

🚫 SSD vs YOLO

🧐 References