Class-wise boundary regression by uncertainty in temporal action detection

Abstract

Temporal action detection is a crucial aspect of video understanding. It aims to classify actions and to locate their start and end boundaries in untrimmed videos. Because deep learning is widely used for this task, annotation accuracy is crucial to boundary localization. However, some annotated instances are ambiguous, and the degree of ambiguity varies between categories. To address this problem, a Gaussian model is built to estimate the boundary uncertainty of each instance. Based on instance uncertainty, category uncertainty is used to describe the uncertainty of each category. By combining instance and category uncertainty, the boundaries of the selected proposals are refined and the ranking of candidate proposals is adjusted. Furthermore, overcorrection is avoided for categories with a high level of uncertainty. With this uncertainty-based approach, state-of-the-art performance is achieved: 57.5% on THUMOS14 (mAP@0.5) and 35.4% on ActivityNet (mAP@Avg).

Publication
In IET Image Processing