ProbPose: A Probabilistic Approach to 2D Human Pose Estimation

Visual Recognition Group
Czech Technical University in Prague

TL;DR

ProbPose introduces a probabilistic framework for human pose estimation that reduces false positives by predicting keypoint presence probabilities and handling out-of-image keypoints. It also introduces the Ex-OKS metric, which evaluates models on false positive predictions.

Abstract

Current Human Pose Estimation methods have achieved significant improvements. However, state-of-the-art models ignore out-of-image keypoints and use uncalibrated heatmaps as keypoint location representation. To address these limitations, we propose ProbPose, which predicts for each keypoint: a calibrated probability of keypoint presence at each location in the activation window, the probability of being outside of it, and its predicted visibility. To address the lack of evaluation protocols for out-of-image keypoints, we introduce the CropCOCO dataset and the Extended OKS (Ex-OKS) metric, which extends OKS to out-of-image points. Tested on COCO, CropCOCO, and OCHuman, ProbPose shows significant gains in out-of-image keypoint localization while also improving in-image localization through data augmentation. Additionally, the model improves robustness along the edges of the bounding box and offers better flexibility in keypoint evaluation. The code and models will be released on the project website for research purposes.
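To make the three per-keypoint outputs concrete, here is a minimal sketch of how a probability map and an out-of-window probability could be turned into a single decision. It is an illustration only, not the authors' decoding procedure: the function name `decode_keypoint`, the nested-list map layout, and the 0.5 threshold are all assumptions made for this example.

```python
def decode_keypoint(probmap, p_out, threshold=0.5):
    """Hypothetical decoding of one keypoint (not the ProbPose implementation).

    probmap   : 2D list, probmap[y][x] = presence probability at that location
                of the activation window (assumed layout).
    p_out     : probability that the keypoint lies outside the window.
    threshold : assumed cut-off above which no in-window location is reported.
    """
    # If the model believes the keypoint is outside the window, report nothing
    # instead of forcing a (likely false positive) in-window location.
    if p_out > threshold:
        return None

    # Otherwise return the most probable location and its probability.
    best_x, best_y, best_p = 0, 0, float("-inf")
    for y, row in enumerate(probmap):
        for x, p in enumerate(row):
            if p > best_p:
                best_x, best_y, best_p = x, y, p
    return best_x, best_y, best_p


# Usage: a 2x2 map whose peak is at x=0, y=1.
probmap = [[0.1, 0.2],
           [0.7, 0.0]]
inside = decode_keypoint(probmap, p_out=0.1)   # peak location is returned
outside = decode_keypoint(probmap, p_out=0.9)  # None: keypoint treated as out of window
```

The point of the sketch is the separate `p_out` branch: a plain heatmap argmax always returns some in-window location, whereas an explicit out-of-window probability lets the model decline to place the keypoint.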

Contributions

  • A concept of presence probability, the probability that a keypoint is in the activation window, distinct from confidence, which measures the model’s trust in its own estimate
  • ProbPose: a top-down model for out-of-image keypoint estimation
  • OKSLoss adapted for dense predictions in risk minimization formulation
  • Ex-OKS evaluation metric penalizing false positive keypoints
  • CropCOCO dataset for out-of-image and false positive keypoints evaluation
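For reference, the standard COCO-style OKS that Ex-OKS extends can be sketched as below. This is only the baseline metric; how Ex-OKS scores out-of-image and false positive keypoints is not specified on this page, so it is not reproduced here. The function name `oks` and the argument layout are assumptions for this example; the per-keypoint constants follow the COCO convention k = 2σ.

```python
import math


def oks(pred, gt, visibility, sigmas, area):
    """Standard COCO-style Object Keypoint Similarity for one person.

    pred, gt   : lists of (x, y) keypoint coordinates in pixels.
    visibility : ground-truth flags; keypoints with flag 0 are unlabeled
                 and are skipped.
    sigmas     : per-keypoint constants (k_i = 2 * sigma_i).
    area       : ground-truth object area in pixels, used as the scale s^2.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0


# Usage with two labeled keypoints and illustrative constants.
gt = [(10.0, 10.0), (20.0, 20.0)]
sigmas = [0.026, 0.025]
perfect = oks(gt, gt, [2, 2], sigmas, area=1000.0)                          # exact match
shifted = oks([(12.0, 10.0), (20.0, 20.0)], gt, [2, 2], sigmas, area=1000.0)  # one keypoint off by 2 px
```

Note that this baseline skips unlabeled keypoints entirely, so a confident prediction for a keypoint that is not in the image costs nothing; that gap is what a metric penalizing false positive keypoints is meant to close.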

Results

Below is a comparison with the state of the art (ViTPose). ProbPose, trained on cropped images, gives more stable predictions even under substantial cropping. Training on cropped images also improves predictions near the bounding box borders and for occluded keypoints.

The following images show how stable the predictions are under gradual cropping. ViTPose (left) becomes unstable once a limb is cropped and still tries to place the limb inside the image. ProbPose (right) remains stable even under substantial cropping.

BibTeX


@misc{purkrabek2024ProbPose,
  title={ProbPose: A Probabilistic Approach to 2D Human Pose Estimation},
  author={Miroslav Purkrabek and Jiri Matas},
  year={2024},
  eprint={2412.02254},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.02254},
}