SOTAVerified

Data Valuation

Data valuation in machine learning aims to determine the worth of data, or of whole datasets, for downstream tasks. Some methods are task-agnostic and treat datasets as a whole, mostly to support decision making in data markets; these compare datasets through distributional distances between samples. More often, methods measure how individual points affect the performance of a specific machine learning model. They assign each element of a training set a scalar that reflects its contribution to the final performance of a model trained on that set. Some notions of value depend on a specific model of interest, while others are model-agnostic.
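As a minimal sketch of the per-point view, the example below computes leave-one-out values: each training point's value is the drop in validation accuracy when that point is removed. The 1-nearest-neighbour model, the one-dimensional toy data, and the names `knn_accuracy` and `leave_one_out_values` are assumptions made for illustration, not part of any method listed here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label): a hypothetical 1-D toy format


def knn_accuracy(train: Sequence[Point], valid: Sequence[Point]) -> float:
    """Utility: validation accuracy of a 1-nearest-neighbour classifier."""
    if not train:
        return 0.0  # assumption: an empty training set has zero utility
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += int(nearest[1] == y)
    return correct / len(valid)


def leave_one_out_values(train: Sequence[Point], valid: Sequence[Point]) -> List[float]:
    """Value of point i = utility(all points) - utility(all points except i)."""
    train = list(train)
    full = knn_accuracy(train, valid)
    return [full - knn_accuracy(train[:i] + train[i + 1:], valid)
            for i in range(len(train))]


# Toy data: the third training point (0.1, 1) is deliberately mislabeled.
train = [(0.0, 0), (1.0, 1), (0.1, 1)]
valid = [(0.0, 0), (0.2, 0), (0.9, 1), (1.1, 1)]

print(leave_one_out_values(train, valid))  # [0.25, 0.0, -0.25]
```

In this toy setting the mislabeled point receives a negative value, since removing it improves validation accuracy. The values depend on the chosen model and validation set, so this is a model-dependent notion of value.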

Concepts of the usefulness of a datum, or of its influence on the outcome of a prediction, have a long history in statistics and ML, in particular through the notion of the influence function. Only recently, however, have rigorous and practical notions of value for data, and in particular for datasets, appeared in the ML literature. These are often based on concepts from cooperative game theory, but also on generalization estimates for neural networks or on optimal transport theory, among others.
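One widely used game-theoretic notion is the Shapley value, which averages a point's marginal contribution over all orderings in which the training set could be assembled. The sketch below computes it exactly by enumerating permutations, which is only feasible for a handful of points; practical estimators typically sample permutations instead. The 1-nearest-neighbour utility, the toy data, and the function names are assumptions made for illustration.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label): a hypothetical 1-D toy format


def utility(train: Sequence[Point], valid: Sequence[Point]) -> float:
    """Validation accuracy of a 1-nearest-neighbour classifier."""
    if not train:
        return 0.0  # assumption: an empty coalition has zero utility
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += int(nearest[1] == y)
    return correct / len(valid)


def exact_shapley_values(train: Sequence[Point], valid: Sequence[Point]) -> List[float]:
    """Average marginal contribution of each point over all n! orderings."""
    n = len(train)
    totals = [0.0] * n
    n_orders = 0
    for order in permutations(range(n)):
        n_orders += 1
        subset: List[Point] = []
        previous = utility(subset, valid)
        for i in order:
            subset.append(train[i])
            current = utility(subset, valid)
            totals[i] += current - previous  # marginal contribution of point i
            previous = current
    return [t / n_orders for t in totals]


# Toy data: the third training point (0.1, 1) is deliberately mislabeled.
train = [(0.0, 0), (1.0, 1), (0.1, 1)]
valid = [(0.0, 0), (0.2, 0), (0.9, 1), (1.1, 1)]

print(exact_shapley_values(train, valid))  # [0.375, 0.25, 0.125]
```

The values sum to the utility of the full training set minus that of the empty set (0.75 here), which is the efficiency property of the Shapley value. The mislabeled point receives the smallest share.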

Title	Date	Tasks	Status
Data value estimation on private gradients	Dec 22, 2024	Data Valuation, Federated Learning	Unverified
Disentangled Structural and Featural Representation for Task-Agnostic Graph Valuation	Aug 22, 2024	Data Valuation, Diversity	Unverified
Dissecting Representation Misalignment in Contrastive Learning via Influence Function	Nov 18, 2024	Contrastive Learning, Data Valuation	Unverified
Efficient Data Shapley for Weighted Nearest Neighbor Algorithms	Jan 20, 2024	Computational Efficiency, Data Valuation	Unverified
Efficient Data Valuation Approximation in Federated Learning: A Sampling-based Approach	Apr 23, 2025	Data Valuation, Federated Learning	Unverified
Energy-Based Learning for Cooperative Games, with Applications to Valuation Problems in Machine Learning	Jun 5, 2021	Data Valuation, Variational Inference	Unverified
Exploiting the Data Gap: Utilizing Non-ignorable Missingness to Manipulate Model Learning	Sep 6, 2024	Data Valuation, Imputation	Unverified
Fairness-Aware Data Valuation for Supervised Learning	Mar 29, 2023	Active Learning, Data Valuation	Unverified
Fairshare Data Pricing via Data Valuation for Large Language Models	Jan 31, 2025	Data Valuation, Math	Unverified
2D-OOB: Attributing Data Contribution Through Joint Valuation Framework	Aug 7, 2024	Data Poisoning, Data Valuation	Code Available
