Object Detection - Part 11
- Using a sliding window for object detection: choosing the window size is a challenge
- Region proposals: use traditional image processing techniques to come up with candidate object regions in the image. This is relatively fast: the Selective Search algorithm gives about 2000 region proposals in a few seconds on a CPU. A ConvNet is then run on the proposed regions
- Detection without proposals
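
To see why the sliding-window approach is hard to scale, here is a small sketch that counts every possible window position and size in an image. The function names are made up for illustration; the count assumes every integer window size and every integer offset is a candidate.

```python
def count_windows_brute_force(height, width):
    """Count every axis-aligned window of every size that fits in the image."""
    total = 0
    for h in range(1, height + 1):
        for w in range(1, width + 1):
            # An h x w window has (height - h + 1) * (width - w + 1) positions.
            total += (height - h + 1) * (width - w + 1)
    return total


def count_windows_closed_form(height, width):
    """Same count via the closed form H(H+1)/2 * W(W+1)/2."""
    return (height * (height + 1) // 2) * (width * (width + 1) // 2)
```

For an 800 x 600 image the closed form gives roughly 58 billion windows, far too many to classify one by one, which is the motivation for region proposals.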

R-CNN

Predicting Bounding boxes from Region Proposal 
Inference Steps 
Using NMS to remove overlapping bounding boxes 
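
A minimal sketch of greedy non-maximum suppression (NMS) in plain Python. It assumes boxes in `(x1, y1, x2, y2)` format and an IoU threshold of 0.5; both are choices made for this illustration, not something fixed by the notes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that was already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In practice NMS is usually applied separately per category, so overlapping boxes of different classes do not suppress each other.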
Fast R-CNN
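Fast R-CNN swaps the order of the ROI proposals and the ConvNet used in R-CNN: the whole image is passed through the ConvNet once, and the proposals are then cropped from the shared feature map. This speeds up the process.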

Using ResNet as backbone for Fast R-CNN
Each ROI proposal is used to crop and resize the feature map. This cropping has to happen in a differentiable manner, which is what ROI pooling achieves.

ROI pooling provides same-size features for proposal regions of different sizes. Another method for cropping the feature maps is the ROI Align method.
During inference, about 90% of the time is consumed by computing the ROI proposals, which is done on a CPU.
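
A rough sketch of ROI max pooling on a single-channel feature map, to show how regions of different sizes end up as features of the same size. The function name, the integer ROI format `(x1, y1, x2, y2)` with exclusive end, and the 2 x 2 output size are assumptions made for this example.

```python
import math


def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one ROI of a 2D feature map to output_size x output_size.

    feature_map: list of rows (H x W).
    roi: (x1, y1, x2, y2) in feature-map cells, already snapped to integers,
         x2 and y2 exclusive, at least one cell wide and high.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(output_size):
        # Bin edges along y; floor/ceil so that every bin covers >= 1 cell.
        ys = y1 + math.floor(i * roi_h / output_size)
        ye = y1 + math.ceil((i + 1) * roi_h / output_size)
        row = []
        for j in range(output_size):
            xs = x1 + math.floor(j * roi_w / output_size)
            xe = x1 + math.ceil((j + 1) * roi_w / output_size)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Both a 4 x 4 region and a 3 x 2 region come out as a 2 x 2 grid here. ROI Align differs in that it does not snap to cell boundaries and instead samples the feature map with bilinear interpolation.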
Faster R-CNN

Learnable Region Proposal Network (RPN)
The RPN uses image features and anchor boxes to generate proposals. It places K different anchor boxes at each point of the feature map.
For each anchor box at each point, the network classifies whether the anchor contains an object and also predicts a box transform.
The RPN passes the resulting proposals to the convolutional network that predicts the class and bounding boxes.
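
The anchor mechanism can be sketched as follows. The stride, scales, and aspect ratios are illustrative values, and the `(cx, cy, w, h)` box format and function names are assumptions for this example. The transform decoding uses the common R-CNN-style parameterization (shift the center in units of the anchor size, scale width and height in log space).

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return K = len(scales) * len(ratios) anchors per feature-map point."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area near s * s while setting height / width = r.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


def apply_transform(anchor, t):
    """Decode a predicted transform (tx, ty, tw, th) relative to an anchor."""
    cx, cy, w, h = anchor
    tx, ty, tw, th = t
    return (cx + w * tx, cy + h * ty, w * math.exp(tw), h * math.exp(th))
```

A zero transform leaves the anchor unchanged, so the network only has to learn corrections to the anchor.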

Working of RPN with Anchor boxes
Single-shot Detector (SSD)
SSD uses only an RPN-like network to predict the final categories and bounding boxes directly, getting rid of the second stage to speed up the process.

Two stages in Faster R-CNN 
SSD Working
Evaluating Object Detection Models
Average Precision (AP) summarizes the trade-off between precision and recall in a single number.

calculating mean Average Precision 
calculating the Area under the PR curve to get Average Precision 
Average the Average Precision over all categories to get the mean Average Precision (mAP).
In practice, we calculate the mAP at various IoU thresholds and take the average.
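
A minimal sketch of the AP and mAP computation. It assumes each detection has already been marked as a true or false positive (for example by matching it to a ground-truth box with IoU above some threshold), and it uses the interpolated precision-recall curve. The function names and input format are made up for this example.

```python
def average_precision(detections, num_gt):
    """Area under the interpolated PR curve for one category.

    detections: list of (score, is_true_positive) pairs.
    num_gt: number of ground-truth boxes for this category (assumed > 0).
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Interpolation: replace each precision by the best precision
    # reached at the same or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_category):
    """Average the per-category AP values (dict: category -> AP)."""
    return sum(ap_per_category.values()) / len(ap_per_category)
```

To average over thresholds, one would repeat the true/false positive matching at each IoU threshold, compute the mAP for each, and take the mean.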

Faster R-CNN is jointly trained on 4 losses: RPN classification, RPN box regression, final object classification, and final box regression.
Best practices for object detection
- Choose appropriate backbone
- Very big models work better
- Train longer
- Use a multiscale backbone: Feature Pyramid Networks
- Single stage methods have improved
- Test-time augmentation pushes numbers up
- Big ensembles, more data, etc. provide better accuracy