# Instructions for evaluating accuracy (mAP) of SSD models

## Preparation
- Prepare image data and label ('bbox') files for the evaluation. I used COCO 2017 Val images (5K/1GB) and 2017 Train/Val annotations (241MB). You could use your own dataset for the evaluation instead, but you'd need to convert the labels into COCO Object Detection ('bbox') format if you want to use the code in this repository without modifications (see the sketch after this list).

  More specifically, I downloaded the images and labels, and unzipped the files into `${HOME}/data/coco/`.

  ```
  $ wget http://images.cocodataset.org/zips/val2017.zip \
         -O ${HOME}/Downloads/val2017.zip
  $ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip \
         -O ${HOME}/Downloads/annotations_trainval2017.zip
  $ mkdir -p ${HOME}/data/coco/images
  $ cd ${HOME}/data/coco/images
  $ unzip ${HOME}/Downloads/val2017.zip
  $ cd ${HOME}/data/coco
  $ unzip ${HOME}/Downloads/annotations_trainval2017.zip
  ```

  Later on, I would be using the following (unzipped) image and annotation files for the evaluation.

  ```
  ${HOME}/data/coco/images/val2017/*.jpg
  ${HOME}/data/coco/annotations/instances_val2017.json
  ```
- Install 'pycocotools'. The easiest way is to use `pip3 install`.

  ```
  $ sudo pip3 install pycocotools
  ```

  Alternatively, you could build and install it from source.
- Install additional requirements.

  ```
  $ sudo pip3 install progressbar2
  ```
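If you use your own dataset, the conversion target is a single JSON annotation file in COCO 'bbox' format. The snippet below is a minimal sketch of what such a file looks like and how to load it back with pycocotools as a sanity check. It is not part of this repository, and the image and category names are made up for the example.

```python
import json

from pycocotools.coco import COCO

# Hypothetical example: two images with one 'person' box each, in COCO
# 'bbox' format.  Boxes are [x, y, width, height] in pixels, and every
# image/annotation id must be unique.
coco_dict = {
    'images': [
        {'id': 1, 'file_name': 'img_0001.jpg', 'width': 640, 'height': 480},
        {'id': 2, 'file_name': 'img_0002.jpg', 'width': 640, 'height': 480},
    ],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1,
         'bbox': [100.0, 120.0, 50.0, 80.0], 'area': 50.0 * 80.0, 'iscrowd': 0},
        {'id': 2, 'image_id': 2, 'category_id': 1,
         'bbox': [200.0, 60.0, 40.0, 90.0], 'area': 40.0 * 90.0, 'iscrowd': 0},
    ],
    'categories': [{'id': 1, 'name': 'person'}],
}

with open('my_annotations.json', 'w') as f:
    json.dump(coco_dict, f)

# Loading the file back with pycocotools confirms both the annotation
# format and the pycocotools installation.
coco = COCO('my_annotations.json')
print(coco.getImgIds())   # [1, 2]
print(coco.getCatIds())   # [1]
```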
## Evaluation
I've created the `eval_ssd.py` script to do the mAP evaluation.

```
usage: eval_ssd.py [-h] [--mode {tf,trt}] [--imgs_dir IMGS_DIR]
                   [--annotations ANNOTATIONS]
                   {ssd_mobilenet_v1_coco,ssd_mobilenet_v2_coco}
```
The script takes one mandatory argument: either `ssd_mobilenet_v1_coco` or `ssd_mobilenet_v2_coco`. In addition, it accepts the following options:

- `--mode {tf,trt}`: to evaluate either the unoptimized TensorFlow frozen inference graph ('tf') or the optimized TensorRT engine ('trt').
- `--imgs_dir IMGS_DIR`: to specify an alternative directory for reading image files.
- `--annotations ANNOTATIONS`: to specify an alternative annotation/label file.
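Under the hood, COCO mAP numbers like the ones below come from the standard pycocotools `COCOeval` pipeline. The snippet below is a rough sketch of that flow, not the actual contents of `eval_ssd.py`: it assumes the ground-truth annotation file from the Preparation step and a hypothetical `detections.json` file of detection results in COCO results format.

```python
import os

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations prepared above.
ann_file = os.path.join(os.environ['HOME'],
                        'data/coco/annotations/instances_val2017.json')
coco_gt = COCO(ann_file)

# 'detections.json' is a hypothetical results file: a list of entries like
#   {"image_id": ..., "category_id": ..., "bbox": [x, y, w, h], "score": ...}
coco_dt = coco_gt.loadRes('detections.json')

coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()    # match detections to ground truth per image
coco_eval.accumulate()  # aggregate over IoU thresholds, object areas and maxDets
coco_eval.summarize()   # prints the 12-line AP/AR table shown below
```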
For example, I evaluated both the 'ssd_mobilenet_v1_coco' and 'ssd_mobilenet_v2_coco' TensorRT engines on my x86_64 PC and got the results below. The overall mAP values are 0.232 and 0.248, respectively.
```
$ python3 eval_ssd.py --mode trt ssd_mobilenet_v1_coco
......
100% (5000 of 5000) |####################| Elapsed Time: 0:00:26 Time: 0:00:26
loading annotations into memory...
Done (t=0.36s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.11s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=8.89s).
Accumulating evaluation results...
DONE (t=1.37s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.232
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.351
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.254
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.018
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.166
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.530
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.209
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.264
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.264
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.022
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.191
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.606
None
$
$ python3 eval_ssd.py --mode trt ssd_mobilenet_v2_coco
......
100% (5000 of 5000) |####################| Elapsed Time: 0:00:29 Time: 0:00:29
loading annotations into memory...
Done (t=0.37s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.12s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=9.47s).
Accumulating evaluation results...
DONE (t=1.42s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.248
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.375
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.273
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.176
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.573
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.221
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.278
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.279
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.027
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.202
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.643
None
```