SOTAVerified

Data Valuation

Data valuation in machine learning aims to quantify the worth of individual data points, or of whole datasets, for downstream tasks. Some methods are task-agnostic and treat datasets as a whole, mainly to support decisions in data markets; these typically rely on distributional distances between samples. More often, methods measure how individual points affect the performance of a specific machine learning model: they assign each element of the training set a scalar reflecting its contribution to the final performance of a model trained on it. Some notions of value depend on a specific model of interest, while others are model-agnostic.
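As a rough illustration of assigning a scalar to each training point, the sketch below computes leave-one-out values: the value of a point is the drop in test performance when that point is removed. This is only one simple notion of value, not the method of any paper listed here. The toy 1-D dataset, the 1-nearest-neighbour utility, and the function names `accuracy_1nn` and `leave_one_out_values` are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label) in a toy 1-D dataset


def accuracy_1nn(train: Sequence[Point], test: Sequence[Point]) -> float:
    """Test accuracy of a 1-nearest-neighbour classifier fitted on `train`."""
    if not train:
        return 0.0
    correct = 0
    for x, y in test:
        # Predict the label of the closest training point (ties: first found).
        _, predicted = min(train, key=lambda p: abs(p[0] - x))
        correct += int(predicted == y)
    return correct / len(test)


def leave_one_out_values(
    train: Sequence[Point],
    test: Sequence[Point],
    utility: Callable[[Sequence[Point], Sequence[Point]], float] = accuracy_1nn,
) -> List[float]:
    """Value of point i = utility(all points) - utility(all points except i)."""
    full_score = utility(train, test)
    values = []
    for i in range(len(train)):
        rest = list(train[:i]) + list(train[i + 1:])
        values.append(full_score - utility(rest, test))
    return values


# Hypothetical data: the last training point is deliberately mislabelled.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (2.4, 1)]
test = [(0.5, 0), (2.0, 0), (3.0, 1), (4.5, 1)]

values = leave_one_out_values(train, test)
for point, value in zip(train, values):
    print(point, value)
```

On this toy data the mislabelled point receives a negative value, because removing it improves test accuracy, while the redundant points receive zero. Leave-one-out needs one retraining per point and tends to give near-zero values on large datasets, which is part of the motivation for the game-theoretic alternatives.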

Notions of the usefulness of a datum, or of its influence on the outcome of a prediction, have a long history in statistics and ML, in particular through the influence function. Only recently, however, have rigorous and practical notions of value for data, and in particular for datasets, appeared in the ML literature. These are often based on concepts from cooperative game theory, but also on generalization estimates of neural networks and on optimal transport theory, among others.

Title	Date	Tasks	Status
Augment & Valuate : A Data Enhancement Pipeline for Data-Centric AI	Dec 7, 2021	BIG-bench Machine Learning, Data Valuation	Unverified
A Principled Approach to Data Valuation for Federated Learning	Sep 14, 2020	Data Summarization, Data Valuation	Unverified
Data Overvaluation Attack and Truthful Data Valuation in Federated Learning	Feb 1, 2025	Data Valuation, Federated Learning	Unverified
A Unified Framework for Task-Driven Data Quality Management	Jun 10, 2021	Data Summarization, Data Valuation	Unverified
DAVED: Data Acquisition via Experimental Design for Data Markets	Mar 20, 2024	Data Valuation, Experimental Design	Unverified
Disentangled Structural and Featural Representation for Task-Agnostic Graph Valuation	Aug 22, 2024	Data Valuation, Diversity	Unverified
Data Valuation by Leveraging Global and Local Statistical Information	May 23, 2024	Data Valuation	Unverified
Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray Dataset	Oct 15, 2020	Data Valuation, Pneumonia Detection	Unverified
Data Valuation for Offline Reinforcement Learning	May 19, 2022	Data Valuation, Deep Reinforcement Learning	Unverified
Data Acquisition for Improving Model Fairness using Reinforcement Learning	Dec 4, 2024	Data Valuation, Fairness	Unverified
