Object Detection - Part 11

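Sliding-window object detection: apply a classifier to every window of the image. Choosing the window size is a challenge, because objects appear at many sizes, aspect ratios, and positions.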
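A minimal sketch of why the window size is a problem: even for a single fixed size the number of windows is large, and every extra size or aspect ratio adds another full sweep. The function name `sliding_windows` and the image, window, and stride values are illustrative assumptions, not taken from the lecture.

```python
def sliding_windows(img_h, img_w, win_h, win_w, stride):
    """Yield (x0, y0, x1, y1) for every window position that fits in the image."""
    for y0 in range(0, img_h - win_h + 1, stride):
        for x0 in range(0, img_w - win_w + 1, stride):
            yield (x0, y0, x0 + win_w, y0 + win_h)


if __name__ == "__main__":
    # Hypothetical 256x256 image, square windows, stride of 16 pixels.
    for size in (32, 64, 128):
        count = sum(1 for _ in sliding_windows(256, 256, size, size, stride=16))
        print(f"window {size}x{size}: {count} positions")
```

Each position would need its own classifier evaluation, which is what makes the exhaustive approach expensive.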
Region proposals: use traditional image processing techniques to come up with candidate regions that are likely to contain objects. This is relatively fast: the Selective Search algorithm gives about 2000 region proposals in a few seconds on a CPU. R-CNN then runs a convolutional network on each proposed region.
Detection without proposals

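Fast R-CNN flips the order of the region proposals and the convnet used in R-CNN to speed up the process: the convnet runs once on the whole image, and the proposals are then cropped from the resulting feature map.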
Fast R-CNN
Using ResNet as the backbone for Fast R-CNN
Each ROI proposal is used to crop and resize the feature map. This cropping has to happen in a differentiable manner so that gradients can flow back into the backbone. ROI pooling is used to achieve this.
ROI pooling produces features of the same size for proposal regions of different sizes.
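As a rough sketch of the idea, the function below max-pools a region of a single-channel feature map into a fixed `out_size` x `out_size` grid, so regions of different sizes give outputs of the same size. It uses plain Python lists and a hypothetical `roi_max_pool` helper with ROI coordinates already expressed in feature-map cells. Real implementations work on batched multi-channel tensors and may quantize the bins differently.

```python
import math


def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the region roi = (x0, y0, x1, y1) into an out_size x out_size grid.

    feature_map is a 2D list indexed as feature_map[y][x]; x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / out_size
    bin_w = (x1 - x0) / out_size
    pooled = []
    for i in range(out_size):
        # Snap each bin to whole cells; keep at least one cell per bin.
        ya = math.floor(y0 + i * bin_h)
        yb = max(math.ceil(y0 + (i + 1) * bin_h), ya + 1)
        row = []
        for j in range(out_size):
            xa = math.floor(x0 + j * bin_w)
            xb = max(math.ceil(x0 + (j + 1) * bin_w), xa + 1)
            row.append(max(feature_map[y][x]
                           for y in range(ya, yb)
                           for x in range(xa, xb)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[y * 8 + x for x in range(8)] for y in range(6)]  # toy 6x8 feature map
    print(roi_max_pool(fmap, (0, 0, 4, 4)))  # 4x4 region -> 2x2 output
    print(roi_max_pool(fmap, (1, 1, 7, 4)))  # 6x3 region -> 2x2 output
```

Because the max over each bin passes gradients to the selected cell, the crop stays trainable end to end.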
Another method for cropping the feature maps is ROI Align, which avoids snapping the region to the feature grid by sampling the features with bilinear interpolation.
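A simplified sketch of ROI Align on a single-channel feature map follows. It assumes `feature_map[y][x]` holds the value at the point (y, x) and takes one bilinear sample at the center of each output bin. Practical implementations usually average several samples per bin. The names `bilinear_sample` and `roi_align` are hypothetical helpers for illustration.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a 2D list feature_map[y][x] at a real-valued point."""
    h, w = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp the sample point to the map
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, out_size=2):
    """Sample the center of each bin of roi = (x0, y0, x1, y1); no rounding of the ROI."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / out_size
    bin_w = (x1 - x0) / out_size
    return [[bilinear_sample(feature_map,
                             y0 + (i + 0.5) * bin_h,
                             x0 + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]


if __name__ == "__main__":
    fmap = [[y * 8 + x for x in range(8)] for y in range(6)]
    print(roi_align(fmap, (1.0, 1.0, 4.0, 3.0)))
```

Since the ROI coordinates are never rounded, the sampled features stay aligned with the proposal, and the interpolation weights are differentiable with respect to the feature values.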
During inference, 90% of Fast R-CNN's runtime is spent computing the region proposals, which is done on the CPU.

Learnable Region Proposal Network (RPN)
The RPN uses image features and anchor boxes to generate proposals. It places K different anchor boxes at each point of the feature map.
For each anchor box at each point, the network classifies whether the anchor contains an object and also predicts a box transform that adjusts the anchor.
The proposals produced by the RPN are then passed to the second-stage network, which predicts the class and bounding box for each one.
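The anchor boxes and box transforms can be sketched as follows. The sketch assumes anchors are stored as (center x, center y, width, height), that the K anchors at each point come from a set of scales and aspect ratios, and that the transform uses the common R-CNN parametrization (shift the center by a fraction of the anchor size, scale width and height exponentially). The function names, stride, scales, and ratios are illustrative assumptions.

```python
import math


def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return K = len(scales) * len(ratios) anchors (cx, cy, w, h) per feature-map point."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:  # r = width / height; area stays s * s
                    anchors.append((cx, cy, s * math.sqrt(r), s / math.sqrt(r)))
    return anchors


def apply_transform(anchor, transform):
    """Apply a predicted (tx, ty, tw, th) to an anchor to get the proposal box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = transform
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))


if __name__ == "__main__":
    anchors = make_anchors(feat_h=2, feat_w=3, stride=16,
                           scales=[64, 128], ratios=[0.5, 1.0, 2.0])
    print(len(anchors))  # 2 * 3 points, K = 6 anchors each
    print(apply_transform((10.0, 10.0, 20.0, 40.0), (0.5, -0.25, 0.0, 0.0)))
```

A zero transform leaves the anchor unchanged, so the network only has to learn small corrections relative to each anchor.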
Working of RPN with Anchor boxes

Single-stage detection: use only the RPN to predict the final categories and bounding boxes directly, getting rid of the second stage to speed up the process.
Two stages in Faster R-CNN
SSD Working