Object Detection - Part 11
- Sliding window for object detection: choosing the window size is a challenge, since objects appear at many scales and aspect ratios
- Region proposals - use traditional image processing techniques to come up with proposals for objects in the image. This is relatively fast: the Selective Search algorithm gives about 2000 region proposals in a few seconds on a CPU. A convnet is then run on each proposed region
- Detection without proposals
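To make the sliding-window point above concrete, here is a small sketch that counts how many windows have to be classified for a few window sizes. The image size, window sizes, and stride are made-up values for illustration only.

```python
def sliding_windows(image_h, image_w, window, stride):
    """Yield (x1, y1, x2, y2) boxes for one window size, scanning left to right, top to bottom."""
    win_h, win_w = window
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


def count_windows(image_h, image_w, windows, stride):
    """Number of boxes produced for each (height, width) window size."""
    return {
        w: sum(1 for _ in sliding_windows(image_h, image_w, w, stride))
        for w in windows
    }


# Hypothetical setup: a 64x64 image, three window shapes, stride of 8 pixels.
counts = count_windows(64, 64, [(16, 16), (32, 32), (32, 16)], stride=8)
print(counts)
```

Every extra window size or aspect ratio adds another full pass over the image, which is why picking the sizes by hand does not scale well.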
R-CNN
Fast R-CNN
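Fast R-CNN swaps the order of the ROI proposals and the convnet used in R-CNN to speed up the process: the convnet runs once on the whole image, and each proposed region is then taken from the shared feature map.
Each ROI is cropped and resized from the feature map. This cropping has to happen in a differentiable manner so that gradients can flow back into the convnet. ROI pooling is used to achieve this.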
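A minimal NumPy sketch of ROI max pooling. It assumes the ROI is already given in integer feature-map coordinates and is at least one cell wide and tall. The function name roi_max_pool and the way bins are split are illustrative choices, not a specific library's implementation.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Crop `roi` from a (C, H, W) feature map and max-pool it to a fixed size.

    roi is (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2, x1:x2]
    channels, h, w = region.shape
    out_h, out_w = output_size

    # Split the region into a roughly even grid of bins.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)

    out = np.zeros((channels, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Make sure each bin covers at least one cell.
            y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, y_lo:y_hi, x_lo:x_hi].max(axis=(1, 2))
    return out


# Toy feature map with one channel and values 0..15.
fmap = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
pooled = roi_max_pool(fmap, (0, 0, 4, 4))
print(pooled[0])
```

Whatever the ROI size, the output has the same shape, so it can be fed to fixed-size layers. In an autodiff framework the max would pass gradients back to the feature values that were selected.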
Another method for cropping the feature maps is to use the ROI Align method.
During inference, 90% of the time is consumed by ROI proposal, which is done on a CPU.
Faster R-CNN
The Region Proposal Network (RPN) uses image features and anchor boxes to generate proposals. It places K different anchor boxes at each point of the feature map.
For each anchor box at each point, the network classifies whether the anchor contains an object and also predicts the box transforms.
The RPN passes the resulting proposals on to the second stage of the network, which predicts the class and bounding box for each.
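A rough NumPy sketch of the anchor idea: generate K anchors at every feature-map point, then apply predicted box transforms to them. The scales, ratios, and stride are made-up values, and ratio is assumed to mean width / height. The (dx, dy, dw, dh) parameterization is the one commonly used in the R-CNN family. The network that would predict the objectness scores and transforms is not included.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return (feat_h * feat_w * K, 4) anchors as (x1, y1, x2, y2), K = len(scales) * len(ratios)."""
    # K anchor shapes (width, height); ratio is assumed to be width / height.
    shapes = np.array(
        [(s * np.sqrt(r), s / np.sqrt(r)) for s in scales for r in ratios]
    )
    # Centers of the feature-map cells, in image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)  # (N, 2)

    half = shapes / 2.0  # (K, 2)
    top_left = centers[:, None, :] - half[None, :, :]
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)


def apply_box_transforms(anchors, deltas):
    """Shift and rescale anchors with predicted (dx, dy, dw, dh) transforms."""
    w = anchors[:, 2] - anchors[:, 0]
    h = anchors[:, 3] - anchors[:, 1]
    cx = anchors[:, 0] + 0.5 * w
    cy = anchors[:, 1] + 0.5 * h

    new_cx = cx + deltas[:, 0] * w
    new_cy = cy + deltas[:, 1] * h
    new_w = w * np.exp(deltas[:, 2])
    new_h = h * np.exp(deltas[:, 3])
    return np.stack(
        [new_cx - new_w / 2, new_cy - new_h / 2, new_cx + new_w / 2, new_cy + new_h / 2],
        axis=1,
    )


# Hypothetical 2x3 feature map with stride 16 and K = 2 * 3 = 6 anchors per point.
anchors = generate_anchors(2, 3, 16, scales=[32, 64], ratios=[0.5, 1.0, 2.0])
# All-zero transforms leave the anchors unchanged.
unchanged = apply_box_transforms(anchors, np.zeros_like(anchors))
print(anchors.shape)
```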
Single-shot Detector (SSD)
Uses only an RPN-style network to predict the final categories and bounding boxes directly, getting rid of the second stage to speed up the process.
Evaluating Object Detection Models
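Average precision (AP) summarizes the trade-off between precision and recall as the area under the precision-recall curve. Mean average precision (mAP) is the mean of AP over classes.
In practice we calculate the mAP at various IoU thresholds and take the average.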
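A simplified sketch of the computation for a single class on a single image: IoU, AP at one IoU threshold, and the average over several thresholds. The greedy matching rule and the thresholds 0.50 to 0.95 in steps of 0.05 are assumptions made for illustration. Real benchmarks pool detections over the whole dataset and also average over classes.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_threshold):
    """AP for a list of (score, box) detections against ground-truth boxes."""
    if not detections or not gt_boxes:
        return 0.0
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched, tp = set(), []
    for _, box in detections:
        # Greedily match to the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if j not in matched and overlap > best_iou:
                best_iou, best_j = overlap, j
        hit = best_j >= 0 and best_iou >= iou_threshold
        if hit:
            matched.add(best_j)
        tp.append(1 if hit else 0)

    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / len(gt_boxes)
    # Make precision non-increasing, then integrate it over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall_steps = np.diff(np.concatenate([[0.0], recall]))
    return float(np.sum(recall_steps * precision))


def ap_over_thresholds(detections, gt_boxes, thresholds=None):
    """Average AP over several IoU thresholds."""
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(detections, gt_boxes, t) for t in thresholds]))


# Toy data: two ground-truth boxes and three scored detections.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 29, 28))]
ap50 = average_precision(dets, gt, 0.5)
ap_avg = ap_over_thresholds(dets, gt)
print(ap50, ap_avg)
```

The third detection overlaps its ground-truth box with IoU 0.72, so it counts as a true positive at loose thresholds and as a false positive at strict ones. Averaging over thresholds therefore rewards tighter boxes.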
Best practices for object detection
- Choose appropriate backbone
- Very big models work better
- Train longer
- Use a multiscale backbone: Feature Pyramid Networks
- Single stage methods have improved
- Test-time augmentation pushes numbers up
- Big ensembles, more data, etc. provide better accuracy