Libra R-CNN: Towards Balanced Learning for Object Detection

Arc Lab. 2019. 8. 23. 14:28

[Updated 2019.08.24 13:14]

1. Paper

Libra R-CNN: Towards Balanced Learning for Object Detection

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 821-830

2019.06.16-20

Source code: https://github.com/OceanPang/Libra_R-CNN

2. Summary

- In this work, we carefully revisit the standard training practice of detectors and find that detection performance is often limited by imbalance in the training process, which generally arises at three levels: the sample level, the feature level, and the objective level.

- We propose Libra R-CNN, a simple but effective framework towards balanced learning for object detection. It integrates three novel components, IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, which reduce the imbalance at the sample, feature, and objective levels, respectively.
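To make the sample-level component concrete, here is a minimal Python sketch of IoU-balanced negative sampling as we read it from the paper. Plain random sampling draws N negatives from M candidates with probability N/M each, so the few hard negatives (higher IoU with a ground-truth box) are swamped by easy ones. IoU-balanced sampling instead splits the negative IoU range into K equal-width bins and requests N/K samples from each bin k, giving probability (N/K) * (1/M_k) for a candidate in a bin of size M_k. The function name `iou_balanced_neg_sample`, the 0.5 negative threshold, K = 3, and the random fill-up for sparse bins are assumptions for illustration, not the authors' code.

```python
import random
from typing import List, Sequence


def iou_balanced_neg_sample(ious: Sequence[float], num_expected: int,
                            num_bins: int = 3, max_iou: float = 0.5,
                            seed: int = 0) -> List[int]:
    """Hypothetical helper (sketch only): pick `num_expected` negative indices,
    spreading them evenly over equal-width IoU bins.

    `ious[i]` is candidate i's max IoU with any ground-truth box; candidates
    with IoU >= max_iou are assumed not to be negatives.
    """
    rng = random.Random(seed)
    neg_idx = [i for i, v in enumerate(ious) if v < max_iou]
    if len(neg_idx) <= num_expected:
        return neg_idx  # not enough negatives: keep all of them

    # Split [0, max_iou) into `num_bins` equal-width IoU intervals.
    bin_width = max_iou / num_bins
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i in neg_idx:
        k = min(int(ious[i] / bin_width), num_bins - 1)
        bins[k].append(i)

    # Request the same number of samples (N / K) from every bin.
    per_bin = num_expected // num_bins
    chosen: List[int] = []
    for b in bins:
        chosen.extend(rng.sample(b, min(per_bin, len(b))))

    # Sparse bins and the N % K remainder leave a shortfall (assumption:
    # fill it uniformly at random from negatives not yet picked).
    picked = set(chosen)
    rest = [i for i in neg_idx if i not in picked]
    chosen.extend(rng.sample(rest, num_expected - len(chosen)))
    return sorted(chosen)
```

For example, with 90 easy negatives at IoU 0.05, 6 at IoU 0.3, and 4 at IoU 0.45, asking for 9 samples returns three from each bin, whereas uniform sampling would mostly return easy negatives.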

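- Most modern detectors follow a common training paradigm: sampling regions, extracting features from them, and then jointly recognizing categories and refining locations under a standard multi-task objective function. Based on this paradigm, the success of object detector training depends on three key aspects: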

(1) whether the selected region samples are representative,

(2) whether the extracted visual features are fully utilized, and

(3) whether the designed objective function is optimal.

However, our study reveals that the typical training process is significantly imbalanced in all these aspects.

- To mitigate the adverse effects caused by these issues, we propose Libra R-CNN, a simple but effective framework for object detection that explicitly enforces the balance at all three levels discussed above. This framework integrates three novel components:

(1) IoU-balanced sampling, which mines hard samples according to their IoU with assigned ground-truth.

(2) balanced feature pyramid, which strengthens the multi-level features using the same deeply integrated balanced semantic features.

(3) balanced L1 loss, which promotes crucial gradients to rebalance the involved classification, overall localization, and accurate localization.
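For the objective-level component, the balanced L1 loss, as defined in the paper, keeps the outlier part of smooth L1 but changes the inlier part so that its gradient is α ln(b|x| + 1) for |x| < 1 and γ otherwise, where x is the regression residual and the constraint α ln(b + 1) = γ makes the gradient continuous. Integrating this gradient gives L_b(x) = (α/b)(b|x| + 1) ln(b|x| + 1) - α|x| for |x| < 1 and γ|x| + C otherwise, with C = γ/b - α keeping the loss continuous at |x| = 1. The scalar Python sketch below assumes α = 0.5 and γ = 1.5, the values used in the public Libra R-CNN configs as far as we know; the function names are hypothetical.

```python
import math


def _b(alpha: float, gamma: float) -> float:
    # b follows from the constraint alpha * ln(b + 1) = gamma.
    return math.exp(gamma / alpha) - 1.0


def balanced_l1(x: float, alpha: float = 0.5, gamma: float = 1.5) -> float:
    """Balanced L1 loss of one regression residual x (hypothetical helper)."""
    b = _b(alpha, gamma)
    ax = abs(x)
    if ax < 1.0:
        # Inlier branch: its gradient alpha * ln(b|x| + 1) rises quickly near 0.
        return alpha / b * (b * ax + 1.0) * math.log(b * ax + 1.0) - alpha * ax
    # Outlier branch: constant gradient gamma; C = gamma / b - alpha makes
    # both branches meet at |x| = 1.
    return gamma * ax + gamma / b - alpha


def balanced_l1_grad(x: float, alpha: float = 0.5, gamma: float = 1.5) -> float:
    """Derivative dL/dx of balanced_l1 (hypothetical helper)."""
    b = _b(alpha, gamma)
    ax = abs(x)
    g = alpha * math.log(b * ax + 1.0) if ax < 1.0 else gamma
    return math.copysign(g, x)
```

Compared with smooth L1 using threshold 1 (gradient x for |x| < 1), the inlier gradient here is larger, for example about 0.53 instead of 0.1 at x = 0.1, which is the sense in which the loss "promotes crucial gradients" from accurately localized samples.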

- Without bells and whistles, Libra R-CNN achieves 2.5 points and 2.0 points higher Average Precision (AP) than FPN Faster R-CNN and RetinaNet, respectively, on MS COCO [21].

Here, we summarize our main contributions:

(1) We systematically revisit the training process of detectors. Our study reveals the imbalance problems at three levels that limit the detection performance.

(2) We propose Libra R-CNN, a framework that rebalances the training process by combining three new components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss.

(3) We test the proposed framework on MS COCO, consistently obtaining significant improvements over state-of-the-art detectors, including both single-stage and two-stage ones.
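The balanced feature pyramid named in contribution (2) works, as the paper describes it, in four steps: rescale the levels {C2, C3, C4, C5} to the resolution of C4 (max-pooling for the finer maps, interpolation for the coarser one), integrate them by averaging, C = (1/L) Σ C_l over the L levels, refine the averaged map, and finally reverse the rescaling and add the result to each original level. The pure-Python sketch below assumes single-channel square maps whose sides are powers of two, uses nearest-neighbour upsampling, and skips the refinement step (the paper applies an embedded Gaussian non-local block there); all function names are hypothetical.

```python
from typing import List, Optional

Grid = List[List[float]]  # one square, single-channel feature map


def max_pool(x: Grid, f: int) -> Grid:
    """Shrink by factor f with non-overlapping f x f max-pooling."""
    n = len(x) // f
    return [[max(x[i * f + di][j * f + dj] for di in range(f) for dj in range(f))
             for j in range(n)] for i in range(n)]


def upsample(x: Grid, f: int) -> Grid:
    """Enlarge by factor f with nearest-neighbour interpolation."""
    n = len(x) * f
    return [[x[i // f][j // f] for j in range(n)] for i in range(n)]


def resize(x: Grid, size: int) -> Grid:
    """Bring a square map to side length `size` (powers of two assumed)."""
    n = len(x)
    if n > size:
        return max_pool(x, n // size)
    if n < size:
        return upsample(x, size // n)
    return x


def balanced_feature_pyramid(levels: List[Grid],
                             ref: Optional[int] = None) -> List[Grid]:
    """Hypothetical sketch; `levels` run finest to coarsest, e.g. [C2, C3, C4, C5]."""
    if ref is None:
        ref = len(levels) // 2  # index 2, i.e. C4, when there are four levels
    size = len(levels[ref])
    # 1) Rescale every level to the reference (C4) resolution.
    rescaled = [resize(x, size) for x in levels]
    # 2) Integrate: element-wise mean over the L levels.
    num = len(levels)
    balanced = [[sum(r[i][j] for r in rescaled) / num for j in range(size)]
                for i in range(size)]
    # 3) Refine: skipped here (identity); the paper uses a non-local block.
    # 4) Strengthen: reverse the rescaling and add to each original level.
    out: List[Grid] = []
    for x in levels:
        n = len(x)
        back = resize(balanced, n)
        out.append([[x[i][j] + back[i][j] for j in range(n)] for i in range(n)])
    return out
```

For constant maps with values 1, 2, 3, and 4 (finest to coarsest, sides 8, 4, 2, 1), the integrated map is 2.5 everywhere, so each output level equals its input plus 2.5; every level thus receives the same balanced semantic information, as item (2) of the component list describes.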