Object Detection - Part 11
- Sliding window: run a classifier on every window of the image. Choosing the window size is a challenge, since there are far too many possible positions, sizes, and aspect ratios to evaluate them all.
- Region proposals: use traditional image processing techniques to find a small set of candidate regions that are likely to contain objects. This is fast: the Selective Search algorithm gives about 2000 region proposals in a few seconds on a CPU. A ConvNet is then applied to each proposed region.
- Detection without proposals
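The sketch below shows why the window size is a problem for sliding windows. It counts every axis-aligned box a sliding-window detector could consider in an H x W image. It ignores stride and minimum box size, which are simplifying assumptions, and the function names are illustrative only.

```python
def count_windows(img_h: int, img_w: int) -> int:
    """Brute-force count of every axis-aligned box (all sizes, all positions)."""
    total = 0
    for h in range(1, img_h + 1):        # every possible box height
        for w in range(1, img_w + 1):    # every possible box width
            # an h x w box fits at (img_h - h + 1) * (img_w - w + 1) positions
            total += (img_h - h + 1) * (img_w - w + 1)
    return total


def count_windows_closed_form(img_h: int, img_w: int) -> int:
    """Same count in O(1): the sums over heights and widths factor apart."""
    return (img_h * (img_h + 1) // 2) * (img_w * (img_w + 1) // 2)


# Brute force and closed form agree on a small 6 x 8 image.
print(count_windows(6, 8), count_windows_closed_form(6, 8))  # 756 756
# An 800 x 600 image already has tens of billions of candidate boxes.
print(count_windows_closed_form(600, 800))  # 57768120000
```

A stride and a fixed set of sizes shrink this number, but it remains far too large to run a full classifier on every window. This cost is what motivates region proposals.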
R-CNN
Predicting bounding boxes from region proposals
Inference steps
Using NMS (non-maximum suppression) to remove overlapping bounding boxes
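The sketch below illustrates two of the R-CNN steps: refining a proposal with a predicted box transform, and removing overlapping boxes with greedy NMS. It makes several assumptions:

- Boxes use (x1, y1, x2, y2) pixel coordinates.
- Box transforms follow the (tx, ty, tw, th) parameterization from the R-CNN paper's bounding-box regression.
- The IoU threshold of 0.5 is chosen for illustration.

The function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def apply_box_transform(proposal: Box, t: Tuple[float, float, float, float]) -> Box:
    """Refine a region proposal with a predicted transform (tx, ty, tw, th)."""
    x1, y1, x2, y2 = proposal
    pw, ph = x2 - x1, y2 - y1                       # proposal width / height
    px, py = x1 + 0.5 * pw, y1 + 0.5 * ph           # proposal center
    tx, ty, tw, th = t
    bx, by = px + pw * tx, py + ph * ty             # shift center relative to box size
    bw, bh = pw * math.exp(tw), ph * math.exp(th)   # scale size in log space
    return (bx - 0.5 * bw, by - 0.5 * bh, bx + 0.5 * bw, by + 0.5 * bh)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring box still remaining
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


# Two heavily overlapping detections of one object, plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
print(apply_box_transform((0, 0, 10, 10), (0.1, 0.0, 0.0, 0.0)))  # (1.0, 0.0, 11.0, 10.0)
```

Box 1 overlaps box 0 with IoU of about 0.82, so it is suppressed. Box 2 does not overlap box 0 at all, so it survives.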
Fast R-CNN
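Fast R-CNN swaps the order of the ConvNet and the ROI proposals used in R-CNN to speed up the process. It runs the ConvNet once on the whole image and crops each proposal from the resulting feature map, instead of running the ConvNet separately on every cropped proposal.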
Fast R-CNN using ResNet as the backbone
Each ROI proposal crops and resizes a region of the feature map. This cropping has to be differentiable so that the network can be trained end to end with backpropagation. ROI pooling achieves this.
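ROI pooling produces features of the same size for proposal regions of different sizes. Another method for cropping the feature maps is ROI Align. It avoids snapping the region to the feature-map grid and instead samples features with bilinear interpolation.
During inference, about 90% of the time is consumed by computing the ROI proposals, which is done on a CPU.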
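The sketch below makes the difference between ROI pooling and ROI Align concrete. It makes several assumptions:

- It works on a single-channel feature map stored as nested lists.
- The box is already projected into feature-map coordinates and lies inside the map.
- The output size is 2 x 2, for readability.
- ROI Align takes a single bilinear sample at each bin center. Real implementations typically average several samples per bin.

The function names are hypothetical.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]           # one channel, indexed fmap[row][col]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map coords


def roi_pool(fmap: FeatureMap, box: Box, out_size: int = 2) -> FeatureMap:
    """ROI pooling: snap the box to the grid, split it into bins, max-pool each bin."""
    x1, y1, x2, y2 = (round(v) for v in box)  # first snap: box corners to cells
    h, w = max(y2 - y1, 1), max(x2 - x1, 1)
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # second snap: integer bin boundaries (bins may overlap or differ in size)
            r0, r1 = y1 + math.floor(i * h / out_size), y1 + math.ceil((i + 1) * h / out_size)
            c0, c1 = x1 + math.floor(j * w / out_size), x1 + math.ceil((j + 1) * w / out_size)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        out.append(row)
    return out


def bilinear(fmap: FeatureMap, y: float, x: float) -> float:
    """Interpolate fmap at a real-valued point; values are treated as sitting at integer coords."""
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, len(fmap) - 1), min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fmap: FeatureMap, box: Box, out_size: int = 2) -> FeatureMap:
    """ROI Align: no snapping; sample each bin center with bilinear interpolation."""
    x1, y1, x2, y2 = box
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    return [[bilinear(fmap, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]


fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # 4 x 4 toy feature map
print(roi_pool(fmap, (0.2, 0.3, 3.6, 3.7)))   # snaps to (0, 0, 4, 4): [[5.0, 7.0], [13.0, 15.0]]
print(roi_align(fmap, (0.0, 0.0, 3.0, 3.0)))  # [[3.75, 5.25], [9.75, 11.25]]
```

roi_align never rounds the box, so its output changes smoothly as the box coordinates move. By contrast, roi_pool's output jumps whenever a corner crosses a rounding boundary.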
Faster R-CNN
Learnable Region Proposal Network (RPN): the RPN uses image features and anchor boxes to generate proposals. It places K different anchor boxes at each point of the feature map.
For each anchor box at each point, the network classifies whether the anchor contains an object. It also predicts a box transform that refines the anchor into a proposal.
The RPN then passes its proposals to the second stage. As in Fast R-CNN, this stage crops features for each proposal and predicts its class and final bounding box.
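The anchor layout can be sketched as follows. The stride, scales, and aspect ratios are illustrative assumptions; the notes only say there are K anchors per point. With 3 scales and 3 ratios, K = 9. The function name make_anchors is hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int,
                 scales: List[float], ratios: List[float]) -> List[Box]:
    """Place K = len(scales) * len(ratios) anchor boxes at every feature-map point."""
    anchors: List[Box] = []
    for r in range(feat_h):
        for c in range(feat_w):
            # center of this feature-map cell, mapped back to input-image pixels
            cy, cx = (r + 0.5) * stride, (c + 0.5) * stride
            for s in scales:
                for ratio in ratios:          # ratio = height / width
                    w = s / math.sqrt(ratio)  # keeps the area at s * s
                    h = s * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(feat_h=2, feat_w=3, stride=16,
                       scales=[32.0, 64.0, 128.0], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 2 * 3 points * 9 anchors = 54
print(anchors[1])    # square 32 x 32 anchor centered at (8, 8): (-8.0, -8.0, 24.0, 24.0)
```

For each of these anchors, the RPN outputs one object/not-object score and one (tx, ty, tw, th) box transform.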
Working of RPN with Anchor boxes
Single-shot Detector (SSD)
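Uses only an RPN-like network to predict the final categories and bounding boxes directly. Removing the second stage speeds up the process. Instead of classifying each anchor as object or not object, the network classifies each anchor into one of the object categories or background.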
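A minimal sketch of turning single-stage outputs into detections is shown below. It makes several assumptions:

- Each anchor has a list of class probabilities, with index 0 as background.
- Each anchor has one class-agnostic (tx, ty, tw, th) transform, decoded the same way as an R-CNN box transform.
- The 0.5 score threshold is illustrative.

The function name decode_single_stage is hypothetical. In a full detector, NMS would then be applied to the result.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]        # (x1, y1, x2, y2)
Transform = Tuple[float, float, float, float]  # (tx, ty, tw, th)


def decode_single_stage(anchors: List[Box], class_probs: List[List[float]],
                        deltas: List[Transform],
                        score_thresh: float = 0.5) -> List[Tuple[int, float, Box]]:
    """Turn per-anchor predictions into (category, score, box) detections.

    class_probs[a] holds C + 1 probabilities; index 0 is assumed to be background.
    deltas[a] is one class-agnostic box transform per anchor (an assumption).
    """
    detections = []
    for (x1, y1, x2, y2), probs, (tx, ty, tw, th) in zip(anchors, class_probs, deltas):
        cls = max(range(len(probs)), key=lambda k: probs[k])  # most likely category
        if cls == 0 or probs[cls] < score_thresh:
            continue  # background or low confidence: no detection from this anchor
        aw, ah = x2 - x1, y2 - y1
        cx = x1 + aw / 2 + aw * tx  # shift the anchor center
        cy = y1 + ah / 2 + ah * ty
        w, h = aw * math.exp(tw), ah * math.exp(th)  # rescale the anchor size
        detections.append((cls, probs[cls], (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)))
    return detections


anchors = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
class_probs = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]]  # background + C = 2 categories
deltas = [(0.1, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
print(decode_single_stage(anchors, class_probs, deltas))
# [(1, 0.8, (1.0, 0.0, 11.0, 10.0))]
```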
Two stages in Faster R-CNN
SSD working
Evaluating Object Detection Models
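Average precision (AP) summarizes the trade-off between precision and recall in a single number.
Calculating mean Average Precision (mAP):
- For each category, sort the detections by score. A detection is a true positive if it matches a not-yet-matched ground-truth box with IoU above a threshold (e.g. 0.5); otherwise it is a false positive.
- Compute the area under the precision-recall (PR) curve to get the Average Precision for that category.
- Average the AP over all categories to get the mean Average Precision.
- In practice, mAP is computed at several IoU thresholds and the results are averaged (e.g. COCO averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05).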
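The steps above can be sketched for axis-aligned boxes as follows. This is a simplified, uninterpolated AP: the sum of the precision at each true positive, divided by the number of ground-truth boxes. Benchmark evaluators such as those for PASCAL VOC and COCO use interpolated variants, so their numbers may differ. The data layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[float, Box]            # (score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection], gts: List[Box], iou_thresh: float) -> float:
    """AP for one category: area under the (uninterpolated) precision-recall curve."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = fp = 0
    ap = 0.0
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):  # highest score first
        # pick the best-overlapping ground-truth box that is still unmatched
        best_j, best_iou = -1, iou_thresh
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
            # recall rises by 1/len(gts); add that strip of area at the current precision
            ap += (tp / (tp + fp)) / len(gts)
        else:
            fp += 1
    return ap


def mean_ap(data: Dict[str, Tuple[List[Detection], List[Box]]], iou_thresh: float = 0.5) -> float:
    """mAP: average the per-category AP at one IoU threshold."""
    aps = [average_precision(dets, gts, iou_thresh) for dets, gts in data.values()]
    return sum(aps) / len(aps)


def coco_style_map(data: Dict[str, Tuple[List[Detection], List[Box]]]) -> float:
    """Average the mAP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [t / 100 for t in range(50, 100, 5)]
    return sum(mean_ap(data, t) for t in thresholds) / len(thresholds)


data = {
    "cat": ([(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))],
            [(0, 0, 10, 10), (20, 20, 30, 30)]),
    "dog": ([(0.6, (0, 0, 10, 8))], [(0, 0, 10, 10)]),  # IoU 0.8 with its ground truth
}
print(round(mean_ap(data), 4))         # 0.9167
print(round(coco_style_map(data), 4))  # 0.7667
```

The "dog" detection counts as correct only up to an IoU threshold of 0.8. That is why the COCO-style average is lower than the mAP at 0.5.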
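Faster R-CNN is jointly trained on 4 losses: RPN object/not-object classification, RPN box regression, final object classification, and final box regression.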
Best practices for object detection
- Choose appropriate backbone
- Very big models work better
- Train longer
- Use a multiscale backbone, such as Feature Pyramid Networks (FPN)
- Single-stage methods have improved
- Test-time augmentation pushes numbers up
- Big ensembles, more data, etc. provide better accuracy