YOLO Object Detection with OpenCV and Python

The You Only Look Once (YOLO) architecture was developed to perform detection and classification in a single step. The image is divided into a fixed grid of uniform cells, and bounding boxes are predicted and classified within each cell. This architecture enables faster object detection and has been applied to streaming video.
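To make the grid idea concrete, here is a minimal sketch that finds which cell contains an object's center. The 13 x 13 grid size and the function name `grid_cell` are assumptions for illustration, not values taken from this network.

```python
def grid_cell(cx, cy, img_w, img_h, grid=13):
    """Return (row, col) of the grid cell containing the point (cx, cy).

    grid=13 is an assumed grid size, used for illustration only.
    """
    if not (0 <= cx < img_w and 0 <= cy < img_h):
        raise ValueError("center must lie inside the image")
    col = int(cx * grid / img_w)
    row = int(cy * grid / img_h)
    return row, col


# A 773 x 512 image with a hypothetical object centered at pixel (400, 256)
print(grid_cell(400, 256, 773, 512))
```

In a YOLO-style detector, the cell that contains an object's center is the one expected to predict that object's box and class.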

The network topology is shown below. The pink layers have been quantized with 1 bit for weights and 3 bits for activations and are executed in the hardware (HW) accelerator, while the other layers are executed in Python.
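The text does not say how the quantization is done, so the following is only a rough sketch of what 1-bit weights and 3-bit activations can look like. It assumes sign-based binarization for weights and uniform rounding of activations clipped to [0, 1]; both function names are hypothetical.

```python
def binarize_weight(w):
    """1-bit weight: keep only the sign, giving -1 or +1 (assumed scheme)."""
    return 1 if w >= 0 else -1


def quantize_activation(a, bits=3):
    """Clip a to [0, 1] and round it to one of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1          # 7 steps between the 8 levels for 3 bits
    a = min(max(a, 0.0), 1.0)
    return round(a * steps) / steps


weights = [0.42, -0.07, 0.0, -1.3]
activations = [-0.2, 0.1, 0.5, 0.93, 1.7]

print([binarize_weight(w) for w in weights])
print([quantize_activation(a) for a in activations])
```

With so few distinct values, the multiply-accumulate operations become cheap enough to implement efficiently in programmable logic, which is the motivation for offloading these layers.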

The image processing is performed within Darknet using Python bindings.

The neural network has been trained on the PASCAL VOC (Visual Object Classes) dataset and can identify 20 classes of objects in 4 categories:

  1. Person: person
  2. Animal: bird, cat, cow, dog, horse, sheep
  3. Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
  4. Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
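The category list above maps naturally onto a lookup table. This is a small sketch, with variable names chosen for illustration, that stores the 4 categories and builds a reverse lookup from class name to category.

```python
VOC_CATEGORIES = {
    "Person": ["person"],
    "Animal": ["bird", "cat", "cow", "dog", "horse", "sheep"],
    "Vehicle": ["airplane", "bicycle", "boat", "bus", "car", "motorbike", "train"],
    "Indoor": ["bottle", "chair", "dining table", "potted plant", "sofa", "tv/monitor"],
}

# Reverse lookup: class name -> category
CLASS_TO_CATEGORY = {
    name: category
    for category, names in VOC_CATEGORIES.items()
    for name in names
}

print(len(CLASS_TO_CATEGORY))
print(CLASS_TO_CATEGORY["horse"])
```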

The steps for detection and classification are similar to those for the previous network, since this network also uses the multi-layer offload architecture.

Initialize the network

  1. Import libraries
  2. Instantiate classifier
  3. Perform other initializations in the Darknet framework

Classify image

  1. Open image to be classified
  2. Execute the first convolutional layer in Python
  3. Compute the quantized layers using HW offload
  4. Normalize using fully connected layers in Python

Code for classification:


Draw detection boxes using Darknet

The image postprocessing (drawing the bounding boxes) is performed in Darknet using Python bindings.
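Before a box can be drawn it usually has to be expressed as pixel corners. The sketch below assumes each detection is reported as a center point plus width and height in pixels; the function name `box_to_corners` and the example box are hypothetical and not taken from the Darknet bindings.

```python
def box_to_corners(cx, cy, w, h, img_w, img_h):
    """Convert a center-format box to integer (left, top, right, bottom),
    clamped so the rectangle stays inside the image."""
    left = max(0, int(round(cx - w / 2)))
    top = max(0, int(round(cy - h / 2)))
    right = min(img_w - 1, int(round(cx + w / 2)))
    bottom = min(img_h - 1, int(round(cy + h / 2)))
    return left, top, right, bottom


# Hypothetical box in a 773 x 512 image
print(box_to_corners(400, 256, 200, 300, 773, 512))
```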

Code for image postprocessing:

Sample image (horses)

The first image that I am going to use is a provided sample image of horses (773 x 512 pixels).



class: cow probability: 84%

class: horse probability: 74%

class: horse probability: 68%

Object detection bounding boxes:


The example shows the issues that occur with multiple overlapping objects.
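A common way to quantify how much two boxes overlap is intersection over union (IoU). The sketch below is a plain-Python illustration with made-up box coordinates; it is not part of the original notebook code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes for two animals standing close together
box_1 = (100, 100, 300, 300)
box_2 = (200, 100, 400, 300)
print(iou(box_1, box_2))
```

An IoU near 1 means the boxes cover almost the same region, and 0 means they do not overlap at all.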

Street camera images

The application that I would like to use neural networks for is object identification in video streams from surveillance cameras. 

The camera is normally pointed at the carport and mailbox, but its pan/tilt capability lets me look up and down the street and also at my front door (270 degrees of coverage). Currently, image motion detection and PIR sensing tell me when something is detected, but I need to look at the camera video to determine whether it is something of interest. And, of course, there are a lot of false detections. I have 2 video sources that I would like to analyze: the live feed from the camera and stored video from a network video recorder (NVR). I have multiple cameras, but I think it would be okay to assume that each camera has dedicated processing hardware.
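One way to cut down on the manual checking described above is to keep only detections that belong to classes of interest and exceed a confidence threshold. The sketch below assumes detections arrive as (label, probability) pairs; the function name, the class set, the sample frame, and the 0.5 threshold are illustrative choices, not part of the original setup.

```python
def detections_of_interest(detections, wanted, min_prob=0.5):
    """Keep (label, probability) pairs whose label is wanted and whose
    probability is at least min_prob."""
    return [(label, p) for label, p in detections
            if label in wanted and p >= min_prob]


# Hypothetical detections for one video frame
frame = [("car", 0.79), ("person", 0.51), ("bird", 0.62), ("car", 0.30)]
alerts = detections_of_interest(frame, wanted={"person", "car"})
print(alerts)
```

The threshold is a trade-off: raising it reduces false alarms, but a weakly detected object, such as a car at night, may then be missed.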

Night image

class: car probability: 30%

The car to the right is not detected.

Day image

class: car probability: 86%

class: car probability: 34%

Multiple bounding boxes for the same image
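When several boxes of the same class land on one object, a common remedy is non-maximum suppression (NMS): keep the highest-probability box and drop same-class boxes that overlap it heavily. The sketch below is a generic illustration, not the method used by this network. The box coordinates and the 0.45 overlap threshold are assumptions; only the two probabilities come from the day image results.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.45):
    """detections: list of (label, probability, box).

    Visits detections from most to least probable and keeps one only if it
    does not overlap an already kept detection of the same label.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        label, _, box = det
        if all(k[0] != label or iou(k[2], box) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Probabilities from the day image; box coordinates are made up
detections = [
    ("car", 0.34, (120, 210, 410, 390)),
    ("car", 0.86, (100, 200, 400, 380)),
]
print(non_max_suppression(detections))
```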

Truck image

class: car probability: 79% (no separate class for truck)

Truck image

class: car probability: 96% (improved classification with larger image size; better resolution?)

Multiple objects

class: car probability: 79%

class: car probability: 75%

class: person probability: 51%

Analytic Square

© 2022 Analytic Square All Rights Reserved by site