Data Augmentation

Data augmentation is a set of techniques that increase the amount of training data by applying modifications to existing examples, producing new examples derived from the original dataset. Besides growing the dataset, augmentation also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps reduce overfitting.

Data augmentation techniques have proven useful in domains such as natural language processing (NLP) and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include swapping, deleting, and randomly inserting words, among others.
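The transformations named above can be sketched in plain Python. This is a minimal, hypothetical illustration, not code from any of the listed papers or from an augmentation library. It assumes a grayscale image stored as a list of pixel rows and a sentence tokenized on whitespace; all function names (`hflip`, `rotate90`, `random_crop`, `random_swap`, `random_deletion`, `random_insertion`, `expand_text_dataset`) are illustrative. For simplicity, random insertion draws from a caller-supplied vocabulary; some published methods insert synonyms instead.

```python
import random
from typing import Callable, List

# Assumption: a grayscale image is a list of rows of pixel values.
Image = List[List[int]]


# ---- Computer vision transforms ----

def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_crop(img: Image, h: int, w: int, rng: random.Random) -> Image:
    """Cut out a randomly positioned h x w window."""
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


# ---- NLP transforms (token level) ----

def random_swap(tokens: List[str], rng: random.Random) -> List[str]:
    """Swap two randomly chosen token positions."""
    out = tokens[:]
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token with probability p, but never return an empty sentence."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_insertion(tokens: List[str], vocab: List[str], rng: random.Random) -> List[str]:
    """Insert one word from a (hypothetical) vocabulary at a random position."""
    out = tokens[:]
    out.insert(rng.randint(0, len(out)), rng.choice(vocab))
    return out


def expand_text_dataset(sentences: List[str], vocab: List[str],
                        n_aug: int, rng: random.Random) -> List[List[str]]:
    """Keep every original sentence and add n_aug augmented variants of it."""
    ops: List[Callable[[List[str]], List[str]]] = [
        lambda t: random_swap(t, rng),
        lambda t: random_deletion(t, 0.1, rng),
        lambda t: random_insertion(t, vocab, rng),
    ]
    out: List[List[str]] = []
    for s in sentences:
        tokens = s.split()
        out.append(tokens)
        for _ in range(n_aug):
            out.append(rng.choice(ops)(tokens))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(hflip(img))     # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
    print(rotate90(img))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
    print(random_crop(img, 2, 2, rng))
    data = expand_text_dataset(["the cat sat on the mat"], ["quietly", "small"], 3, rng)
    print(len(data))      # 4
```

Because every augmented copy is derived from an original, `expand_text_dataset` turns N sentences into N × (1 + n_aug) examples. A common alternative is to apply the transforms on the fly during training rather than precomputing them, so the model sees a fresh random variant of each example every epoch.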

Papers


Showing 1–25 of 8378 papers

Title	Date	Tasks	Status	Hype	Score
YOLOv10: Real-Time End-to-End Object Detection	May 23, 2024	2D Object Detection, Data Augmentation	Code Available	11	5
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data	Jan 19, 2024	Data Augmentation, Depth Estimation	Code Available	9	5
Symmetry Considerations for Learning Task Symmetric Robot Policies	Mar 7, 2024	Data Augmentation, Deep Reinforcement Learning	Code Available	7	5
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer	Jul 24, 2024	Data Augmentation, Decoder	Code Available	7	5
RouteLLM: Learning to Route LLMs with Preference Data	Jun 26, 2024	Data Augmentation, Transfer Learning	Code Available	7	5
Dynamic Evaluation of Large Language Models by Meta Probing Agents	Feb 21, 2024	Data Augmentation	Code Available	7	5
AugLy: Data Augmentations for Robustness	Jan 17, 2022	Adversarial Robustness, Data Augmentation	Code Available	5	5
DEIM: DETR with Improved Matching for Fast Convergence	Dec 5, 2024	Data Augmentation, GPU	Code Available	5	5
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data	Jun 24, 2024	Data Augmentation, Optical Character Recognition (OCR)	Code Available	5	5
A Survey on Knowledge Distillation of Large Language Models	Feb 20, 2024	Data Augmentation, Knowledge Distillation	Code Available	5	5
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation	Jun 2, 2025	Data Augmentation, Human Animation	Code Available	5	5
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Feb 9, 2024	Data Augmentation, GSM8K	Code Available	4	5
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching	Oct 8, 2024	Data Augmentation, Style Transfer	Code Available	4	5
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach	Aug 31, 2020	Data Augmentation, Image Classification	Code Available	4	5
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio	May 11, 2022	CPU, Data Augmentation	Code Available	4	5
Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System	May 17, 2024	Data Augmentation, Speech Dereverberation	Code Available	4	5
RecBole 2.0: Towards a More Up-to-Date Recommendation Library	Jun 15, 2022	Benchmarking, Data Augmentation	Code Available	4	5
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning	May 13, 2024	Data Augmentation, GSM8K	Code Available	3	5
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering	Sep 3, 2023	Data Augmentation, Domain Adaptation	Code Available	3	5
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training	May 14, 2024	Data Augmentation, Self-Supervised Learning	Code Available	3	5
Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Oct 21, 2024	Autonomous Driving, Data Augmentation	Code Available	3	5
Improved motif-scaffolding with SE(3) flow matching	Jan 8, 2024	Data Augmentation, Diversity	Code Available	3	5
Augmentation-Free Graph Contrastive Learning of Invariant-Discriminative Representations	Oct 15, 2022	Contrastive Learning, Data Augmentation	Code Available	3	5
AutoAugment: Learning Augmentation Policies from Data	May 24, 2018	Data Augmentation, Domain Generalization	Code Available	3	5
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera	Jan 5, 2025	Data Augmentation, Depth Estimation	Code Available	3	5



Benchmark Results

Results are listed for three datasets: ImageNet, CIFAR-10, and GA1457.

ImageNet

#	Model	Metric	Claimed	Verified	Status
1	DeiT-B (+MixPro)	Accuracy (%)	82.9	—	Unverified
2	ResNet-200 (DeepAA)	Accuracy (%)	81.32	—	Unverified
3	DeiT-S (+MixPro)	Accuracy (%)	81.3	—	Unverified
4	ResNet-200 (Fast AA)	Accuracy (%)	80.6	—	Unverified
5	ResNet-200 (UA)	Accuracy (%)	80.4	—	Unverified
6	ResNet-200 (AA)	Accuracy (%)	80	—	Unverified
7	ResNet-50 (DeepAA)	Accuracy (%)	78.3	—	Unverified
8	ResNet-50 (TA wide)	Accuracy (%)	78.07	—	Unverified
9	ResNet-50 (LoRot-E)	Accuracy (%)	77.72	—	Unverified
10	ResNet-50 (LoRot-I)	Accuracy (%)	77.71	—	Unverified

CIFAR-10

#	Model	Metric	Claimed	Verified	Status
1	WideResNet-40-2 (Faster AA)	Percentage error	3.7	—	Unverified
2	Shake-Shake (26 2×32d) (Faster AA)	Percentage error	2.7	—	Unverified
3	WideResNet-28-10 (Faster AA)	Percentage error	2.6	—	Unverified
4	Shake-Shake (26 2×112d) (Faster AA)	Percentage error	2	—	Unverified
5	Shake-Shake (26 2×96d) (Faster AA)	Percentage error	2	—	Unverified

GA1457

#	Model	Metric	Claimed	Verified	Status
1	DiffAug	Classification Accuracy	92.7	—	Unverified
2	PaCMAP	Classification Accuracy	85.3	—	Unverified
3	hNNE	Classification Accuracy	77.4	—	Unverified
4	TopoAE	Classification Accuracy	74.6	—	Unverified