Data Valuation

Data valuation in machine learning seeks to quantify the worth of data, whether individual points or whole datasets, for downstream tasks. Some methods are task-agnostic and value datasets as a whole, mostly to support decision making in data markets; these typically rely on distributional distances between samples. More often, methods measure how individual points affect the performance of a specific machine learning model: they assign each element of a training set a scalar that reflects its contribution to the final performance of a model trained on it. Some notions of value depend on a specific model of interest, while others are model-agnostic.
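To make the idea of "a scalar per training point" concrete, here is a minimal sketch under stated assumptions rather than a reference implementation of any listed paper. It computes leave-one-out values and exact Shapley values on a five-point toy set. The dataset, the 1-nearest-neighbour utility, and the choice of 0.5 (chance level) as the utility of the empty set are all illustrative assumptions, and the exact Shapley loop is only feasible for very small training sets.

```python
from itertools import combinations
from math import factorial

# Toy 1-D data as (feature, label) pairs. The last training point is
# deliberately mislabelled. All values here are illustrative assumptions.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (0.5, 1)]
test = [(0.2, 0), (0.8, 0), (4.2, 1), (4.8, 1)]


def utility(subset, test_set):
    """Test accuracy of a 1-nearest-neighbour classifier fitted on `subset`."""
    if not subset:
        return 0.5  # assumption: chance level for a balanced binary task
    correct = 0
    for x, y in test_set:
        nearest = min(subset, key=lambda p: abs(p[0] - x))
        correct += nearest[1] == y
    return correct / len(test_set)


def loo_values(train_set, test_set):
    """Leave-one-out value: drop in utility when a single point is removed."""
    full = utility(train_set, test_set)
    return [
        full - utility(train_set[:i] + train_set[i + 1:], test_set)
        for i in range(len(train_set))
    ]


def shapley_values(train_set, test_set):
    """Exact Shapley values by enumerating all subsets (exponential cost)."""
    n = len(train_set)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the subset S that excludes point i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for idx in combinations(others, k):
                s = [train_set[j] for j in idx]
                gain = utility(s + [train_set[i]], test_set) - utility(s, test_set)
                values[i] += weight * gain
    return values


loo = loo_values(train, test)
shapley = shapley_values(train, test)
for point, l, s in zip(train, loo, shapley):
    print(point, f"LOO={l:+.3f}", f"Shapley={s:+.3f}")
```

In this toy setup, leave-one-out gives the mislabelled point a value of zero, the same as the two correctly labelled class-1 points, whereas the Shapley value singles it out as the only point with a negative value. Practical methods replace the exhaustive subset enumeration with sampling or closed-form approximations.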

Notions of the usefulness of a datum, or of its influence on the outcome of a prediction, have a long history in statistics and ML, in particular through the influence function. Only recently, however, have rigorous and practical notions of value for data, and in particular for datasets, appeared in the ML literature. These are often based on concepts from cooperative game theory, but also draw on generalization estimates for neural networks and optimal transport theory, among others.
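As an illustration of the game-theoretic approach, the Shapley value of a training point can be written as below. The notation is an assumption made for this sketch: $D$ is the training set with $n$ points, $u(S)$ is a utility such as the test accuracy of a model trained on the subset $S$, and $v_u(i)$ is the value assigned to point $i$.

```latex
v_u(i) \;=\; \sum_{S \subseteq D \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \bigl[\, u(S \cup \{i\}) - u(S) \,\bigr]
```

Each term is the marginal contribution of point $i$ to a subset $S$, weighted so that all subset sizes count equally. The values sum to $u(D) - u(\emptyset)$, which is the efficiency property that makes the Shapley value attractive as a way of dividing credit among data points.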

Papers

Showing 1–50 of 119 papers

Title	Date	Tasks	Status	Hype	Score
shapiq: Shapley Interactions for Machine Learning	Oct 2, 2024	Benchmarking, Data Valuation	Code Available	4	5
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions	May 22, 2024	Data Valuation, GPU	Code Available	2	5
Redefining Contributions: Shapley-Driven Federated Learning	Jun 1, 2024	Collaborative Fairness, Contribution Assessment	Code Available	1	5
Data Valuation and Detections in Federated Learning	Nov 9, 2023	Data Valuation, Federated Learning	Code Available	1	5
Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning	Oct 26, 2021	BIG-bench Machine Learning, Data Valuation	Code Available	1	5
Data Banzhaf: A Robust Data Valuation Framework for Machine Learning	May 30, 2022	Data Valuation	Code Available	1	5
LAVA: Data Valuation without Pre-Specified Learning Algorithms	Apr 28, 2023	Data Valuation	Code Available	1	5
OpenDataVal: a Unified Benchmark for Data Valuation	Jun 18, 2023	Benchmarking, Data Valuation	Code Available	1	5
The Shapley Value in Machine Learning	Feb 11, 2022	BIG-bench Machine Learning, Data Valuation	Code Available	1	5
Data Shapley: Equitable Valuation of Data for Machine Learning	Apr 5, 2019	BIG-bench Machine Learning, Data Valuation	Code Available	1	5
Interpretable Machine Learning for TabPFN	Mar 16, 2024	Data Valuation, In-Context Learning	Code Available	1	5
Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value	Apr 16, 2023	CPU, Data Valuation	Code Available	1	5
Data Valuation Without Training of a Model	Jan 3, 2023	Data Valuation, model	Code Available	1	5
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation	Mar 2, 2025	Data Valuation	Code Available	1	5
Targeted synthetic data generation for tabular data via hardness characterization	Oct 1, 2024	Data Augmentation, Data Valuation	Code Available	0	5
Scaling Laws for the Value of Individual Data Points in Machine Learning	May 30, 2024	Data Valuation, Learning Theory	Code Available	0	5
Beyond Models! Explainable Data Valuation and Metric Adaption for Recommendation	Feb 12, 2025	Data Valuation, Fairness	Code Available	0	5
QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning	Feb 3, 2025	Data Valuation, Language Modeling	Code Available	0	5
Scalable Data Point Valuation in Decentralized Learning	May 1, 2023	Data Valuation, Federated Learning	Code Available	0	5
Towards Data Valuation via Asymmetric Data Shapley	Nov 1, 2024	Data Valuation, Decision Making	Code Available	0	5
One Sample Fits All: Approximating All Probabilistic Values Simultaneously and Efficiently	Oct 31, 2024	All, Data Valuation	Code Available	0	5
LossVal: Efficient Data Valuation for Neural Networks	Dec 5, 2024	Data Valuation	Code Available	0	5
Exploring Data Redundancy in Real-world Image Classification through Data Selection	Jun 25, 2023	Active Learning, Continual Learning	Code Available	0	5
Data Selection for Fine-tuning Large Language Models Using Transferred Shapley Values	Jun 16, 2023	Data Valuation, Language Modeling	Code Available	0	5
Precedence-Constrained Winter Value for Effective Graph Data Valuation	Feb 2, 2024	Data Valuation	Code Available	0	5
EcoVal: An Efficient Data Valuation Framework for Machine Learning	Feb 14, 2024	Data Valuation	Code Available	0	5
DUPRE: Data Utility Prediction for Efficient Data Valuation	Feb 22, 2025	Data Valuation, Prediction	Code Available	0	5
ModelPred: A Framework for Predicting Trained Model from Training Data	Nov 24, 2021	Data Valuation, Memorization	Code Available	0	5
Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation	Mar 23, 2025	Data Valuation, Value prediction	Code Available	0	5
Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution	Jan 29, 2024	Data Valuation	Code Available	0	5
Probably Approximate Shapley Fairness with Applications in Machine Learning	Dec 1, 2022	Data Valuation, Fairness	Code Available	0	5
Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards	Dec 17, 2021	BIG-bench Machine Learning, Data Valuation	Code Available	0	5
CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification	Nov 13, 2022	Data Valuation	Code Available	0	5
FW-Shapley: Real-time Estimation of Weighted Shapley Values	Mar 9, 2025	Data Valuation	Code Available	0	5
In-Context Probing Approximates Influence Function for Data Valuation	Jul 17, 2024	Data Valuation	Code Available	0	5
Data Valuation with Gradient Similarity	May 13, 2024	Data Valuation	Code Available	0	5
DeRDaVa: Deletion-Robust Data Valuation for Machine Learning	Dec 18, 2023	Data Valuation	Code Available	0	5
2D-Shapley: A Framework for Fragmented Data Valuation	Jun 18, 2023	counterfactual, Data Valuation	Code Available	0	5
Data Valuation using Reinforcement Learning	Sep 25, 2019	Data Valuation, Domain Adaptation	Code Available	0	5
A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms"	Apr 9, 2023	Data Valuation	Code Available	0	5
Data Valuation using Neural Networks for Efficient Instruction Fine-Tuning	Feb 14, 2025	Data Valuation	Code Available	0	5
Data valuation: The partial ordinal Shapley value for machine learning	May 2, 2023	Abstract Algebra, Data Valuation	Code Available	0	5
Data Distribution Valuation	Oct 6, 2024	Data Valuation, Fraud Detection	Code Available	0	5
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms	Aug 22, 2019	Data Valuation, Fairness	Code Available	0	5
CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning	Jun 17, 2024	Data Valuation, Decision Making	Code Available	0	5
Faithful Group Shapley Value	May 25, 2025	Computational Efficiency, Data Valuation	Code Available	0	5
Influence-based Attributions can be Manipulated	Sep 8, 2024	Data Valuation, Fairness	Code Available	0	5
2D-OOB: Attributing Data Contribution Through Joint Valuation Framework	Aug 7, 2024	Data Poisoning, Data Valuation	Code Available	0	5
Towards Algorithmic Fairness by means of Instance-level Data Re-weighting based on Shapley Values	Mar 3, 2023	Data Valuation, Decision Making	Code Available	0	5
Profit Allocation for Federated Learning	Jan 1, 2019	Data Valuation, Federated Learning	Code Available	0	5