Use OpenCV to detect faces using 11 different Caffe models

May 18, 2020

Although many tutorials on using OpenCV and cv2.dnn.readNetFromCaffe exist on the internet, all of them use the same model and don't reveal the fact that not all models work the same way.
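A minimal sketch of loading one of these models and reading its output, assuming the usual SSD output layout of shape (1, 1, N, 7), where each row is [image_id, class_id, confidence, x1, y1, x2, y2] with coordinates normalized to [0, 1]. The function names decode_detections and detect, the file paths and the 0.5 threshold are illustrative, and mean subtraction is left out for brevity.

```python
def decode_detections(detections, width, height, min_confidence=0.5):
    """Turn raw SSD output into (class_id, confidence, (x1, y1, x2, y2)) tuples."""
    results = []
    # Assumed layout: detections[0][0] holds N rows of
    # [image_id, class_id, confidence, x1, y1, x2, y2], coordinates in [0, 1].
    for row in detections[0][0]:
        confidence = float(row[2])
        if confidence < min_confidence:
            continue
        # Scale the normalized corners back to pixel coordinates.
        box = (int(row[3] * width), int(row[4] * height),
               int(row[5] * width), int(row[6] * height))
        results.append((int(row[1]), confidence, box))
    return results


def detect(prototxt_path, caffemodel_path, image_path,
           size=(300, 300), scalefactor=1.0):
    """Load a Caffe SSD model with OpenCV and run it on a single image."""
    import cv2  # imported here so decode_detections works without OpenCV

    net = cv2.dnn.readNetFromCaffe(prototxt_path, caffemodel_path)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    height, width = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, scalefactor, size)
    net.setInput(blob)
    return decode_detections(net.forward(), width, height)
```

Calling detect("deploy.prototxt", "model.caffemodel", "photo.jpg") with your own (hypothetical) file names would return the decoded boxes for that image.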

Here I share some performance data (average detection time and the share of faces detected) for the models I tested.

You can also check my tutorial on Caffe models and how to read them.

Like many others, I use OpenCV mainly for image augmentation and processing.

I have also used OpenCV for real-time applications on embedded systems such as the Raspberry Pi and NVIDIA Jetson.

Here I will be showing the results for two scale factors (the value each pixel is multiplied by when the input blob is built): 1.0 and 1.0/255.0.
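To see what the two values do to the input, here is a small sketch of the arithmetic cv2.dnn.blobFromImage applies to each pixel (subtract the mean, then multiply by the scale factor). The helper names scale_pixel and make_blob are illustrative, and the 300x300 size is just an example.

```python
def scale_pixel(value, scalefactor, mean=0.0):
    """Mimic the per-pixel arithmetic of blobFromImage: subtract mean, then scale."""
    return (value - mean) * scalefactor


def make_blob(image, scalefactor, size=(300, 300)):
    """Build the network input with OpenCV for a given scale factor."""
    import cv2  # imported here so scale_pixel can be tried without OpenCV

    return cv2.dnn.blobFromImage(image, scalefactor=scalefactor, size=size)


# With 1.0 the network sees values in [0, 255]; with 1.0/255.0 it sees [0, 1].
for factor in (1.0, 1.0 / 255.0):
    print(factor, scale_pixel(0, factor), scale_pixel(255, factor))
```

A model trained on one input range can behave very differently when fed the other.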

The results shown are the detections of the human (person) class, but I will also show the overall results, using a confidence threshold 0.3 below the maximum confidence reported for an object, because I have seen some weird results.
Some models work with one of the two values and produce completely wrong results with the other.
Some models detect a different set of faces with each value.
I tested the models on images chosen from the FDDB dataset to measure their accuracy on hard-to-recognize faces. Most of the faces are occluded in some way, and the set includes frontal faces, side faces, statues, drawings and reflections of faces on glass.
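The thresholding described above can be sketched as follows. This is one reading of the rule, not necessarily the exact code used for these tests: keep every detection whose confidence is within 0.3 of the best one. The function name and the tuple layout are illustrative, and the class id 15 for "person" is an assumption that holds for the VOC label map used by the SSD models but not for the COCO or ILSVRC ones.

```python
PERSON_CLASS_ID = 15  # assumption: "person" in the 21-class VOC label map


def filter_detections(detections, margin=0.3, class_id=None):
    """Keep detections whose confidence is within `margin` of the best one.

    `detections` is a list of (class_id, confidence, box) tuples.
    If `class_id` is given, other classes are dropped first.
    """
    if class_id is not None:
        detections = [d for d in detections if d[0] == class_id]
    if not detections:
        return []
    best = max(d[1] for d in detections)
    return [d for d in detections if d[1] >= best - margin]


sample = [
    (15, 0.9, (10, 10, 50, 50)),
    (15, 0.7, (60, 10, 90, 50)),
    (15, 0.5, (0, 0, 5, 5)),
    (7, 0.95, (100, 100, 200, 200)),
]
people = filter_detections(sample, class_id=PERSON_CLASS_ID)  # person class only
overall = filter_detections(sample)                            # all classes
```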

VGG VOC0712 SSD 300x300 iter 160000

The VGG network here is trained for detection on the PASCAL Visual Object Classes (VOC) datasets from the 2007 and 2012 challenges.
The network is trained as a single shot multibox detector (SSD) with input images of size 300x300.
It was trained for 160000 iterations.
It took 0.5s on average to detect faces.
It detected about 52% of the faces. Keep in mind that 50% is not bad at all considering how hard the dataset is.
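Numbers like the two above can be collected with a couple of small helpers. This is a sketch under assumptions, not the measurement code behind the figures in this post: average_seconds and detection_rate are hypothetical names, the timing is plain wall-clock time per image, and the detection rate is simply the number of correctly found faces divided by the number of annotated faces.

```python
import time


def average_seconds(detect_fn, images):
    """Return the mean wall-clock time of detect_fn over a list of images."""
    if not images:
        raise ValueError("need at least one image")
    start = time.perf_counter()
    for image in images:
        detect_fn(image)
    return (time.perf_counter() - start) / len(images)


def detection_rate(found_per_image, annotated_per_image):
    """Share of annotated faces that were found, summed over all images."""
    total = sum(annotated_per_image)
    return sum(found_per_image) / total if total else 0.0
```

For example, detection_rate([1, 2], [2, 4]) gives 0.5, meaning half of the annotated faces were found.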

VGG VOC0712 SSD 512x512 iter 160000
The VGG network is trained on the VOC07 and VOC12 datasets.
It is trained as a single shot detector for input images of size 512x512.
It was trained for 160000 iterations.
It took 1.2s on average to detect faces.

It detected about 43% of the faces, but these were often different faces from the ones detected by the 300x300 model.

VGG ILSVRC2016 SSD 300x300 iter 440000
The VGG network is trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data for the 2016 competition.
It is trained as a single shot detector with input size 300x300.
It was trained for 440000 iterations.
It took 0.7s on average to detect faces.
Although it detected more than 48% of the faces, it also produced some false detections and detected each face multiple times. Solving that problem can be as easy as raising the confidence threshold or combining bounding boxes with more than 70% intersection.
It is quite robust to different sizes, rotations and quality problems, but it is not as good as the others with occlusions.
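One way to deal with the repeated detections is a greedy suppression pass that keeps the most confident box and drops any box that overlaps it too much. The sketch below is an assumption about how this could be done: it measures overlap as intersection over union (IoU) with a 0.7 cutoff, and the names iou and suppress_duplicates are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def suppress_duplicates(detections, overlap=0.7):
    """Keep the most confident box of every group of heavily overlapping boxes.

    `detections` is a list of (confidence, box) tuples.
    """
    kept = []
    for confidence, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep the box only if it does not overlap an already kept box too much.
        if all(iou(box, other) <= overlap for _, other in kept):
            kept.append((confidence, box))
    return kept


faces = [
    (0.8, (5, 5, 100, 100)),       # near duplicate of the next box
    (0.9, (0, 0, 100, 100)),
    (0.7, (200, 200, 300, 300)),   # a separate face
]
unique_faces = suppress_duplicates(faces)
```

Here the first two boxes have an IoU of about 0.90, so only the more confident one survives.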

MobileNet VOC0712 SSD
The MobileNet network is trained on the MS-COCO dataset and then fine-tuned on the PASCAL VOC datasets from the 2007 and 2012 challenges.
The version trained on MS-COCO first has a mean average precision (mAP) of 0.727.
The version not trained on MS-COCO has an mAP of 0.68, as reported by the developers.
It took 0.05s on average to detect faces.
It detected 48% of the faces.

VGG VOC0712 SSD 512x512 iter 120000
The VGG network is trained on the VOC07 and VOC12 datasets.
It is trained as a single shot detector for input images of size 512x512.
It was trained for 120000 iterations.
It took 1.2s on average to detect faces.
It detected 46% of the faces.

VGG COCO SSD 512x512
The VGG network is trained on the COCO dataset only.
It is trained as a single shot detector for input images of size 512x512.
It took 1.2s on average to detect faces.
It detected 46% of the faces.

VGG COCO SSD 300x300
The VGG network is trained on the COCO dataset only.
It is trained as a single shot detector for input images of size 300x300.
It took 0.4s on average to detect faces.
It detected 47% of the faces.

VGG VOC0712 SSD 300x300 iter 120000

The VGG network is trained on the VOC07 and VOC12 datasets.

It is trained as a single shot detector for input images of size 300x300.

It took 0.5s on average to detect faces.

VGG VOC0712 SSD 512x512 iter 240000

The VGG network is trained on the VOC07 and VOC12 datasets.

It is trained as a single shot detector for input images of size 512x512.

It was trained for 240000 iterations.

It took 1.3s on average to detect faces.
