Paper Notes: End-to-End Object Detection with Transformers
Notes from DETR paper
Key Ideas
- An end-to-end object detection approach that predicts the set of objects directly from the image, instead of single- or two-stage methods that rely on anchors and proposals
- Uses a bipartite matching loss, computed with the Hungarian algorithm, to enforce permutation invariance and a unique match for each ground-truth object
- Parallel decoding with Transformers instead of auto-regressive decoding with models like RNNs
- Learns positional encodings, called object queries, in the Transformer decoder - each query is responsible for detecting bounding boxes in a different area of the image
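To make the matching idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it swaps the Hungarian algorithm for a brute-force search over assignments (only feasible for tiny inputs), and the cost function `match_cost`, the helper `best_assignment`, and the toy data are all assumptions made for illustration. Predictions left unmatched would be treated as "no object".

```python
from itertools import permutations

def match_cost(gt, pred):
    """Toy pairwise cost (assumed): L1 box distance minus the
    predicted probability of the ground-truth class."""
    gt_class, gt_box = gt
    pred_probs, pred_box = pred
    l1 = sum(abs(a - b) for a, b in zip(gt_box, pred_box))
    return l1 - pred_probs[gt_class]

def best_assignment(gts, preds):
    """Brute-force stand-in for the Hungarian algorithm: try every
    one-to-one assignment of ground truths to distinct predictions
    and keep the cheapest one."""
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(gts[i], preds[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

# Two ground-truth objects: (class index, box)
gts = [
    (0, (0.2, 0.2, 0.1, 0.1)),
    (1, (0.7, 0.7, 0.2, 0.2)),
]
# Three predictions: (class probabilities, box)
preds = [
    ([0.1, 0.8], (0.7, 0.7, 0.2, 0.2)),
    ([0.3, 0.3], (0.5, 0.5, 0.5, 0.5)),
    ([0.9, 0.05], (0.2, 0.2, 0.1, 0.1)),
]

perm, cost = best_assignment(gts, preds)
# perm[i] is the index of the prediction matched to ground truth i.
# Reordering the predictions changes the indices but not the total cost,
# which is the permutation invariance mentioned above.
perm_rev, cost_rev = best_assignment(gts, preds[::-1])
```

In practice a polynomial-time solver would replace the brute-force loop; the one-to-one constraint is what guarantees that each ground-truth object gets exactly one prediction.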
Background Reading
[TODO: Add details about these]
- Bipartite Matching Loss
- Hungarian Algorithm
- Transformer Architecture
- Positional Encoding
- IoU loss
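As a quick reference for the last item, below is a small sketch of plain IoU and the corresponding loss `1 - IoU`. It assumes boxes in the `[center_x, center_y, height, width]` order used later in these notes; the function names are made up for illustration. Note that the paper uses the generalized IoU variant, which this sketch does not implement.

```python
def to_corners(box):
    """Convert [center_x, center_y, height, width] to (x0, y0, x1, y1)."""
    cx, cy, h, w = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = to_corners(box_a)
    bx0, by0, bx1, by1 = to_corners(box_b)
    # Overlap along each axis, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(box_a, box_b):
    """Plain IoU loss: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(box_a, box_b)

same = iou((0.5, 0.5, 0.4, 0.4), (0.5, 0.5, 0.4, 0.4))
shifted = iou((0.25, 0.5, 0.5, 0.5), (0.5, 0.5, 0.5, 0.5))
disjoint = iou((0.1, 0.1, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1))
```

Plain IoU is zero for every pair of disjoint boxes regardless of how far apart they are, which is one motivation for the generalized variant.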
Architecture
Hungarian Loss:
$y$ - ground truth set of objects
$\hat{y} = \{\hat{y}_i\}_{i=1}^{N}$ - set of $N$ predictions
$y_i = (c_i, b_i)$
$c_i$ - ground truth class
$\hat{p}_i$ - predicted class probability
$b_i \in [0,1]^4$ - vector defining $[center_x, center_y, height, width]$
$\mathbb{I}$ - indicator function, equal to 1 when $c_i \ne \phi$ (where $\phi$ denotes the "no object" class), else 0
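For reference, this is how the symbols above fit together in the paper's formulation, written in the notation of these notes. Three symbols are not defined above and are taken from the paper: $\sigma$ is a permutation of the $N$ predictions, $\hat{b}_{\sigma(i)}$ is the predicted box matched to ground truth $i$, and $\mathcal{L}_{box}$ is the box loss. The ground truth set $y$ is assumed to be padded with $\phi$ to size $N$.

First, the optimal assignment is found by minimizing the pairwise matching cost:

$$\hat{\sigma} = \arg\min_{\sigma} \sum_{i=1}^{N} \mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)})$$

$$\mathcal{L}_{match}(y_i, \hat{y}_{\sigma(i)}) = -\mathbb{I}_{\{c_i \ne \phi\}} \, \hat{p}_{\sigma(i)}(c_i) + \mathbb{I}_{\{c_i \ne \phi\}} \, \mathcal{L}_{box}(b_i, \hat{b}_{\sigma(i)})$$

Then the Hungarian loss is computed over the matched pairs:

$$\mathcal{L}_{Hungarian}(y, \hat{y}) = \sum_{i=1}^{N} \left[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{I}_{\{c_i \ne \phi\}} \, \mathcal{L}_{box}(b_i, \hat{b}_{\hat{\sigma}(i)}) \right]$$

The matching cost uses the raw probability while the final loss uses the log-probability, and the box term only contributes for real objects ($c_i \ne \phi$).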