Data Augmentation

Data augmentation refers to techniques that enlarge a dataset by applying modifications to existing examples, producing additional training examples. Besides growing the dataset, augmentation also increases its diversity. When training machine learning models, it acts as a regularizer and helps reduce overfitting.

Data augmentation techniques have proven useful in domains such as NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include swapping, deletion, and random insertion, among others.
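The transformations named above can be illustrated with a minimal sketch using only the Python standard library. It assumes an image is a plain list of pixel rows and a sentence is a list of tokens; the function names, the deletion probability `p`, and the `vocabulary` argument are hypothetical choices made for illustration, not part of any particular library.

```python
import random

# --- Computer vision: an image is assumed to be a list of pixel rows ---

def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def random_crop(img, height, width, rng):
    """Cut a height x width window at a random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]

# --- NLP: a sentence is assumed to be a list of tokens ---

def random_swap(tokens, rng):
    """Swap the tokens at two randomly chosen positions."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_insertion(tokens, vocabulary, rng):
    """Insert one word from a (hypothetical) vocabulary at a random position."""
    out = list(tokens)
    out.insert(rng.randint(0, len(out)), rng.choice(vocabulary))
    return out

# Example usage with a seeded generator so runs are repeatable.
rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sentence = "data augmentation increases dataset diversity".split()

augmented_images = [hflip(image), rotate90(image), random_crop(image, 2, 2, rng)]
augmented_sentences = [
    random_swap(sentence, rng),
    random_deletion(sentence, 0.2, rng),
    random_insertion(sentence, ["training", "model"], rng),
]
```

Each function returns a new object and leaves its input untouched, so the original example and its augmented variants can be kept side by side in the training set. In practice the same ideas are usually applied on the fly during training rather than stored ahead of time.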

Papers


Showing 1–50 of 8378 papers

Title	Date	Tasks	Status	Hype	Score
YOLOv10: Real-Time End-to-End Object Detection	May 23, 2024	2D Object Detection, Data Augmentation	Code Available	11	5
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data	Jan 19, 2024	Data Augmentation, Depth Estimation	Code Available	9	5
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer	Jul 24, 2024	Data Augmentation, Decoder	Code Available	7	5
Dynamic Evaluation of Large Language Models by Meta Probing Agents	Feb 21, 2024	Data Augmentation	Code Available	7	5
Symmetry Considerations for Learning Task Symmetric Robot Policies	Mar 7, 2024	Data Augmentation, Deep Reinforcement Learning	Code Available	7	5
RouteLLM: Learning to Route LLMs with Preference Data	Jun 26, 2024	Data Augmentation, Transfer Learning	Code Available	7	5
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation	Jun 2, 2025	Data Augmentation, Human Animation	Code Available	5	5
DEIM: DETR with Improved Matching for Fast Convergence	Dec 5, 2024	Data Augmentation, GPU	Code Available	5	5
AugLy: Data Augmentations for Robustness	Jan 17, 2022	Adversarial Robustness, Data Augmentation	Code Available	5	5
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data	Jun 24, 2024	Data Augmentation, Optical Character Recognition (OCR)	Code Available	5	5
A Survey on Knowledge Distillation of Large Language Models	Feb 20, 2024	Data Augmentation, Knowledge Distillation	Code Available	5	5
Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System	May 17, 2024	Data Augmentation, Speech Dereverberation	Code Available	4	5
RecBole 2.0: Towards a More Up-to-Date Recommendation Library	Jun 15, 2022	Benchmarking, Data Augmentation	Code Available	4	5
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio	May 11, 2022	CPU, Data Augmentation	Code Available	4	5
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching	Oct 8, 2024	Data Augmentation, Style Transfer	Code Available	4	5
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach	Aug 31, 2020	Data Augmentation, Image Classification	Code Available	4	5
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Feb 9, 2024	Data Augmentation, GSM8K	Code Available	4	5
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies	Jun 9, 2022	3D Classification, 3D Part Segmentation	Code Available	3	5
PyABSA: A Modularized Framework for Reproducible Aspect-based Sentiment Analysis	Aug 2, 2022	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Code Available	3	5
Segment Any Medical Model Extended	Mar 26, 2024	Data Augmentation, Image Segmentation	Code Available	3	5
DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector	Apr 13, 2024	Data Augmentation, Key Point Matching	Code Available	3	5
YOLOv4: Optimal Speed and Accuracy of Object Detection	Apr 23, 2020	BIG-bench Machine Learning, Data Augmentation	Code Available	3	5
Data Generation for Hardware-Friendly Post-Training Quantization	Oct 29, 2024	Data Augmentation, GPU	Code Available	3	5
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning	Feb 15, 2024	Data Augmentation, Instruction Following	Code Available	3	5
Robust and Accurate Object Detection via Adversarial Learning	Mar 23, 2021	AutoML, Data Augmentation	Code Available	3	5
Pythia v0.1: the Winning Entry to the VQA Challenge 2018	Jul 26, 2018	Data Augmentation, Visual Question Answering (VQA)	Code Available	3	5
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning	May 13, 2024	Data Augmentation, GSM8K	Code Available	3	5
OpenGraph: Towards Open Graph Foundation Models	Mar 2, 2024	Data Augmentation, Graph Learning	Code Available	3	5
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera	Jan 5, 2025	Data Augmentation, Depth Estimation	Code Available	3	5
Improved motif-scaffolding with SE(3) flow matching	Jan 8, 2024	Data Augmentation, Diversity	Code Available	3	5
Augmentation-Free Graph Contrastive Learning of Invariant-Discriminative Representations	Oct 15, 2022	Contrastive Learning, Data Augmentation	Code Available	3	5
EfficientNetV2: Smaller Models and Faster Training	Apr 1, 2021	AutoML, Classification	Code Available	3	5
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training	May 14, 2024	Data Augmentation, Self-Supervised Learning	Code Available	3	5
Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection	Sep 7, 2023	Anatomy, Data Augmentation	Code Available	3	5
Data Augmentation for Sequential Recommendation: A Survey	Sep 20, 2024	Data Augmentation, Recommendation Systems	Code Available	3	5
Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Oct 21, 2024	Autonomous Driving, Data Augmentation	Code Available	3	5
AutoAugment: Learning Augmentation Policies from Data	May 24, 2018	Data Augmentation, Domain Generalization	Code Available	3	5
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering	Sep 3, 2023	Data Augmentation, Domain Adaptation	Code Available	3	5
Depth Field Networks for Generalizable Multi-view Scene Representation	Jul 28, 2022	Data Augmentation, Depth Estimation	Code Available	2	5
Deep Visual Geo-localization Benchmark	Apr 7, 2022	Benchmarking, Data Augmentation	Code Available	2	5
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking	Mar 7, 2024	Data Augmentation, Multi-Object Tracking	Code Available	2	5
Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions	Feb 24, 2025	Data Augmentation, Image Generation	Code Available	2	5
Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions	Sep 21, 2022	Data Augmentation, Domain Adaptation	Code Available	2	5
Deep learning for time series classification	Oct 1, 2020	Activity Recognition, Classification	Code Available	2	5
Decoupling Representation Learning from Reinforcement Learning	Sep 14, 2020	Data Augmentation, Deep Reinforcement Learning	Code Available	2	5
Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework	Mar 17, 2024	All, Data Augmentation	Code Available	2	5
DD-Ranking: Rethinking the Evaluation of Dataset Distillation	May 19, 2025	Data Augmentation, Data Compression	Code Available	2	5
Deep PCB To COCO Convertor	May 1, 2022	Classification, Data Augmentation	Code Available	2	5
Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt	Mar 22, 2024	Data Augmentation, Time Series	Code Available	2	5
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data	May 16, 2024	Data Augmentation, Diversity	Code Available	2	5


Benchmark Results

ImageNet

#	Model	Metric	Claimed	Verified	Status
1	DeiT-B (+MixPro)	Accuracy (%)	82.9	—	Unverified
2	ResNet-200 (DeepAA)	Accuracy (%)	81.32	—	Unverified
3	DeiT-S (+MixPro)	Accuracy (%)	81.3	—	Unverified
4	ResNet-200 (Fast AA)	Accuracy (%)	80.6	—	Unverified
5	ResNet-200 (UA)	Accuracy (%)	80.4	—	Unverified
6	ResNet-200 (AA)	Accuracy (%)	80	—	Unverified
7	ResNet-50 (DeepAA)	Accuracy (%)	78.3	—	Unverified
8	ResNet-50 (TA wide)	Accuracy (%)	78.07	—	Unverified
9	ResNet-50 (LoRot-E)	Accuracy (%)	77.72	—	Unverified
10	ResNet-50 (LoRot-I)	Accuracy (%)	77.71	—	Unverified

CIFAR-10

#	Model	Metric	Claimed	Verified	Status
1	WideResNet-40-2 (Faster AA)	Percentage error	3.7	—	Unverified
2	Shake-Shake (26 2×32d) (Faster AA)	Percentage error	2.7	—	Unverified
3	WideResNet-28-10 (Faster AA)	Percentage error	2.6	—	Unverified
4	Shake-Shake (26 2×112d) (Faster AA)	Percentage error	2	—	Unverified
5	Shake-Shake (26 2×96d) (Faster AA)	Percentage error	2	—	Unverified

GA1457

#	Model	Metric	Claimed	Verified	Status
1	DiffAug	Classification Accuracy	92.7	—	Unverified
2	PaCMAP	Classification Accuracy	85.3	—	Unverified
3	hNNE	Classification Accuracy	77.4	—	Unverified
4	TopoAE	Classification Accuracy	74.6	—	Unverified