
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark

2023-07-10 · Code Available

Jakub Paplham, Vojtech Franc


Abstract

Comparing different age estimation methods poses a challenge due to the unreliability of published results stemming from inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements over the past decade using specialized methods; however, our findings challenge these claims. This paper identifies two trivial, yet persistent issues with the currently used evaluation protocol and describes how to resolve them. We offer an extensive comparative analysis for state-of-the-art facial age estimation methods. Surprisingly, we find that the performance differences between the methods are negligible compared to the effect of other factors, such as facial alignment, facial coverage, image resolution, model architecture, or the amount of data used for pretraining. We use the gained insights to propose using FaRL as the backbone model and demonstrate its effectiveness on all public datasets. We make the source code and exact data splits public on GitHub.


Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| AFAD | FaRL+MLP | MAE | 3.12 | - | Unverified |
| AFAD | ResNet-50-Unimodal-Concentrated | MAE | 3.2 | - | Unverified |
| AFAD | ResNet-50-Regression | MAE | 3.17 | - | Unverified |
| AFAD | ResNet-50-OR-CNN | MAE | 3.16 | - | Unverified |
| AFAD | ResNet-50-Mean-Variance | MAE | 3.16 | - | Unverified |
| AFAD | ResNet-50-DLDL-v2 | MAE | 3.15 | - | Unverified |
| AFAD | ResNet-50-Cross-Entropy | MAE | 3.14 | - | Unverified |
| AFAD | ResNet-50-DLDL | MAE | 3.14 | - | Unverified |
| AFAD | ResNet-50-SORD | MAE | 3.14 | - | Unverified |
| AgeDB | FaRL+MLP | MAE | 5.64 | - | Unverified |
| AgeDB | ResNet-50-OR-CNN | MAE | 5.78 | - | Unverified |
| AgeDB | ResNet-50-DLDL | MAE | 5.8 | - | Unverified |
| AgeDB | ResNet-50-DLDL-v2 | MAE | 5.8 | - | Unverified |
| AgeDB | ResNet-50-Cross-Entropy | MAE | 5.81 | - | Unverified |
| AgeDB | ResNet-50-SORD | MAE | 5.81 | - | Unverified |
| AgeDB | ResNet-50-Mean-Variance | MAE | 5.85 | - | Unverified |
| AgeDB | ResNet-50-Unimodal-Concentrated | MAE | 5.9 | - | Unverified |
| AgeDB | ResNet-50-Regression | MAE | 6.23 | - | Unverified |
| CACD | ResNet-50-DLDL | MAE | 3.96 | - | Unverified |
| CACD | ResNet-50-Unimodal-Concentrated | MAE | 4.1 | - | Unverified |
| CACD | ResNet-50-Mean-Variance | MAE | 4.07 | - | Unverified |
| CACD | ResNet-50-Regression | MAE | 4.06 | - | Unverified |
| CACD | ResNet-50-OR-CNN | MAE | 4.01 | - | Unverified |
| CACD | FaRL+MLP | MAE | 3.96 | - | Unverified |
| CACD | ResNet-50-SORD | MAE | 3.96 | - | Unverified |
| CACD | ResNet-50-DLDL-v2 | MAE | 3.96 | - | Unverified |
| CACD | ResNet-50-Cross-Entropy | MAE | 3.96 | - | Unverified |
| ChaLearn 2016 | FaRL+MLP | MAE | 3.38 | - | Unverified |
| MORPH Album2 (SE) | FaRL+MLP | MAE | 3.04 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-Unimodal-Concentrated | MAE | 2.78 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-Cross-Entropy | MAE | 2.81 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-DLDL | MAE | 2.81 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-SORD | MAE | 2.81 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-DLDL-v2 | MAE | 2.82 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-OR-CNN | MAE | 2.83 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-Mean-Variance | MAE | 2.83 | - | Unverified |
| MORPH Album2 (SE) | ResNet-50-Regression | MAE | 2.83 | - | Unverified |
| UTKFace | ResNet-50-Regression | MAE | 4.72 | - | Unverified |
| UTKFace | ResNet-50-Unimodal-Concentrated | MAE | 4.47 | - | Unverified |
| UTKFace | ResNet-50-Mean-Variance | MAE | 4.42 | - | Unverified |
| UTKFace | ResNet-50-DLDL-v2 | MAE | 4.42 | - | Unverified |
| UTKFace | ResNet-50-OR-CNN | MAE | 4.4 | - | Unverified |
| UTKFace | ResNet-50-DLDL | MAE | 4.39 | - | Unverified |
| UTKFace | ResNet-50-Cross-Entropy | MAE | 4.38 | - | Unverified |
| UTKFace | ResNet-50-SORD | MAE | 4.36 | - | Unverified |
| UTKFace | FaRL+MLP | MAE | 3.87 | - | Unverified |
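
The results above are reported as MAE (mean absolute error), which for age estimation is the average absolute gap, in years, between predicted and true age; lower is better. A minimal sketch of the metric follows. The function name and the sample ages are illustrative assumptions and are not taken from the paper's code or data.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    if len(true_ages) == 0:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical ages, for illustration only (not from any benchmark dataset).
true_ages = [25, 40, 31, 60]
predicted_ages = [27, 38, 31, 64]
mae = mean_absolute_error(true_ages, predicted_ages)
print(f"MAE: {mae:.2f} years")
```

Since the abstract attributes unreliable comparisons to inconsistent benchmarking, a value computed this way is comparable across methods only when the data splits and preprocessing are the same.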
