Data Augmentation

Data augmentation is a family of techniques that increase the amount of training data by creating modified copies of the examples in the original dataset. Besides growing the dataset, augmentation also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps reduce overfitting.
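To make the idea of growing a dataset with modified copies concrete, here is a minimal Python sketch. It assumes each example can be passed to plain transform functions; the helper `expand_dataset` and the toy transforms are hypothetical and not part of any library. It keeps every original example and appends `copies_per_example` randomly transformed copies of each one.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def expand_dataset(
    examples: Sequence[T],
    transforms: Sequence[Callable[[T], T]],
    copies_per_example: int = 1,
    seed: int = 0,
) -> List[T]:
    """Return the original examples followed by randomly transformed copies.

    The result has len(examples) * (1 + copies_per_example) items.
    """
    rng = random.Random(seed)  # seeded so the expansion is reproducible
    expanded: List[T] = list(examples)  # keep every original example
    for example in examples:
        for _ in range(copies_per_example):
            transform = rng.choice(transforms)  # pick one modification at random
            expanded.append(transform(example))
    return expanded


# Toy usage: each example is a list of numbers (hypothetical data).
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
toy_transforms = [
    lambda v: v[::-1],             # reverse the order
    lambda v: [a * 2 for a in v],  # rescale the values
]
bigger = expand_dataset(data, toy_transforms, copies_per_example=2)
print(len(bigger))  # 6
```

Many training pipelines do not materialize the copies like this; they apply random transforms on the fly each time an example is loaded, so the model sees a different variant in every epoch without storing extra data.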

Data augmentation techniques have proven useful in domains such as natural language processing (NLP) and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include randomly swapping, deleting, or inserting words, among others.
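The transformations named above can be sketched in plain Python and NumPy. This is a minimal illustration under stated assumptions, not the implementation used by any particular paper: images are H x W x C NumPy arrays, sentences are lists of word tokens, rotation is limited to multiples of 90 degrees (arbitrary angles need interpolation), and random insertion reuses a word from the sentence instead of looking up a synonym. All function names are hypothetical.

```python
import random
from typing import List

import numpy as np

# ---- Computer vision: an image is an H x W x C NumPy array ----

def random_crop(img: np.ndarray, size: int, rng: random.Random) -> np.ndarray:
    """Cut a random size x size patch; assumes size <= height and width."""
    h, w = img.shape[:2]
    top = rng.randint(0, h - size)   # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return img[top:top + size, left:left + size]


def horizontal_flip(img: np.ndarray) -> np.ndarray:
    """Mirror the image left to right by reversing the width axis."""
    return img[:, ::-1]


def random_rotate90(img: np.ndarray, rng: random.Random) -> np.ndarray:
    """Rotate by 90, 180 or 270 degrees in the height/width plane."""
    return np.rot90(img, k=rng.randint(1, 3), axes=(0, 1))


# ---- NLP: a sentence is a list of word tokens ----

def random_swap(words: List[str], rng: random.Random) -> List[str]:
    """Swap the positions of two randomly chosen words."""
    words = list(words)
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, but never return an empty sentence."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_insertion(words: List[str], rng: random.Random) -> List[str]:
    """Insert a copy of an existing word at a random position.

    Simplification (assumption): synonym-based insertion would need a
    thesaurus; reusing a word from the sentence keeps this self-contained.
    """
    words = list(words)
    if words:
        words.insert(rng.randint(0, len(words)), rng.choice(words))
    return words


rng = random.Random(0)
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
print(random_crop(image, 6, rng).shape)   # (6, 6, 3)
print(random_rotate90(image, rng).shape)  # (8, 8, 3)
sentence = "data augmentation helps reduce overfitting".split()
print(random_swap(sentence, rng))
print(random_deletion(sentence, 0.3, rng))
print(random_insertion(sentence, rng))
```

Passing a seeded `random.Random` instance into each function keeps the augmentations reproducible, which helps when debugging a training pipeline.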

Papers

Showing 1–50 of 8378 papers

Title	Date	Tasks	Status	Hype
YOLOv10: Real-Time End-to-End Object Detection	May 23, 2024	2D Object Detection, Data Augmentation	Code Available	11
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data	Jan 19, 2024	Data Augmentation, Depth Estimation	Code Available	9
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer	Jul 24, 2024	Data Augmentation, Decoder	Code Available	7
RouteLLM: Learning to Route LLMs with Preference Data	Jun 26, 2024	Data Augmentation, Transfer Learning	Code Available	7
Symmetry Considerations for Learning Task Symmetric Robot Policies	Mar 7, 2024	Data Augmentation, Deep Reinforcement Learning	Code Available	7
Dynamic Evaluation of Large Language Models by Meta Probing Agents	Feb 21, 2024	Data Augmentation	Code Available	7
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation	Jun 2, 2025	Data Augmentation, Human Animation	Code Available	5
DEIM: DETR with Improved Matching for Fast Convergence	Dec 5, 2024	Data Augmentation, GPU	Code Available	5
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data	Jun 24, 2024	Data Augmentation, Optical Character Recognition (OCR)	Code Available	5
A Survey on Knowledge Distillation of Large Language Models	Feb 20, 2024	Data Augmentation, Knowledge Distillation	Code Available	5
AugLy: Data Augmentations for Robustness	Jan 17, 2022	Adversarial Robustness, Data Augmentation	Code Available	5
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching	Oct 8, 2024	Data Augmentation, Style Transfer	Code Available	4
Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System	May 17, 2024	Data Augmentation, Speech Dereverberation	Code Available	4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Feb 9, 2024	Data Augmentation, GSM8K	Code Available	4
RecBole 2.0: Towards a More Up-to-Date Recommendation Library	Jun 15, 2022	Benchmarking, Data Augmentation	Code Available	4
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio	May 11, 2022	CPU, Data Augmentation	Code Available	4
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach	Aug 31, 2020	Data Augmentation, Image Classification	Code Available	4
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera	Jan 5, 2025	Data Augmentation, Depth Estimation	Code Available	3
Data Generation for Hardware-Friendly Post-Training Quantization	Oct 29, 2024	Data Augmentation, GPU	Code Available	3
Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Oct 21, 2024	Autonomous Driving, Data Augmentation	Code Available	3
Data Augmentation for Sequential Recommendation: A Survey	Sep 20, 2024	Data Augmentation, Recommendation Systems	Code Available	3
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training	May 14, 2024	Data Augmentation, Self-Supervised Learning	Code Available	3
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning	May 13, 2024	Data Augmentation, GSM8K	Code Available	3
DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector	Apr 13, 2024	Data Augmentation, Key Point Matching	Code Available	3
Segment Any Medical Model Extended	Mar 26, 2024	Data Augmentation, Image Segmentation	Code Available	3
OpenGraph: Towards Open Graph Foundation Models	Mar 2, 2024	Data Augmentation, Graph Learning	Code Available	3
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning	Feb 15, 2024	Data Augmentation, Instruction Following	Code Available	3
Improved motif-scaffolding with SE(3) flow matching	Jan 8, 2024	Data Augmentation, Diversity	Code Available	3
Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection	Sep 7, 2023	Anatomy, Data Augmentation	Code Available	3
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering	Sep 3, 2023	Data Augmentation, Domain Adaptation	Code Available	3
Augmentation-Free Graph Contrastive Learning of Invariant-Discriminative Representations	Oct 15, 2022	Contrastive Learning, Data Augmentation	Code Available	3
PyABSA: A Modularized Framework for Reproducible Aspect-based Sentiment Analysis	Aug 2, 2022	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Code Available	3
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies	Jun 9, 2022	3D Classification, 3D Part Segmentation	Code Available	3
EfficientNetV2: Smaller Models and Faster Training	Apr 1, 2021	AutoML, Classification	Code Available	3
Robust and Accurate Object Detection via Adversarial Learning	Mar 23, 2021	AutoML, Data Augmentation	Code Available	3
YOLOv4: Optimal Speed and Accuracy of Object Detection	Apr 23, 2020	BIG-bench Machine Learning, Data Augmentation	Code Available	3
Pythia v0.1: the Winning Entry to the VQA Challenge 2018	Jul 26, 2018	Data Augmentation, Visual Question Answering (VQA)	Code Available	3
AutoAugment: Learning Augmentation Policies from Data	May 24, 2018	Data Augmentation, Domain Generalization	Code Available	3
DD-Ranking: Rethinking the Evaluation of Dataset Distillation	May 19, 2025	Data Augmentation, Data Compression	Code Available	2
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning	May 16, 2025	Data Augmentation	Code Available	2
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation	May 8, 2025	3DGS, Data Augmentation	Code Available	2
SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations	May 4, 2025	Data Augmentation	Code Available	2
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation	Apr 17, 2025	Data Augmentation, Diversity	Code Available	2
Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection	Apr 6, 2025	Cross-Domain Few-Shot, Cross-Domain Few-Shot Object Detection	Code Available	2
RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images and A Benchmark	Mar 21, 2025	Data Augmentation	Code Available	2
External Knowledge Injection for CLIP-Based Class-Incremental Learning	Mar 11, 2025	class-incremental learning, Class Incremental Learning	Code Available	2
Composed Multi-modal Retrieval: A Survey of Approaches and Applications	Mar 3, 2025	Cross-Modal Retrieval, Data Augmentation	Code Available	2
Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions	Feb 24, 2025	Data Augmentation, Image Generation	Code Available	2
RoboBERT: An End-to-end Multimodal Robotic Manipulation Model	Feb 11, 2025	Data Augmentation	Code Available	2
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization	Jan 2, 2025	Data Augmentation, Visual Localization	Code Available	2

Benchmark Results

ImageNet

#	Model	Metric	Claimed	Verified	Status
1	DeiT-B (+MixPro)	Accuracy (%)	82.9	—	Unverified
2	ResNet-200 (DeepAA)	Accuracy (%)	81.32	—	Unverified
3	DeiT-S (+MixPro)	Accuracy (%)	81.3	—	Unverified
4	ResNet-200 (Fast AA)	Accuracy (%)	80.6	—	Unverified
5	ResNet-200 (UA)	Accuracy (%)	80.4	—	Unverified
6	ResNet-200 (AA)	Accuracy (%)	80	—	Unverified
7	ResNet-50 (DeepAA)	Accuracy (%)	78.3	—	Unverified
8	ResNet-50 (TA wide)	Accuracy (%)	78.07	—	Unverified
9	ResNet-50 (LoRot-E)	Accuracy (%)	77.72	—	Unverified
10	ResNet-50 (LoRot-I)	Accuracy (%)	77.71	—	Unverified

CIFAR-10

#	Model	Metric	Claimed	Verified	Status
1	WideResNet-40-2 (Faster AA)	Error (%)	3.7	—	Unverified
2	Shake-Shake (26 2×32d) (Faster AA)	Error (%)	2.7	—	Unverified
3	WideResNet-28-10 (Faster AA)	Error (%)	2.6	—	Unverified
4	Shake-Shake (26 2×96d) (Faster AA)	Error (%)	2	—	Unverified
5	Shake-Shake (26 2×112d) (Faster AA)	Error (%)	2	—	Unverified

GA1457

#	Model	Metric	Claimed	Verified	Status
1	DiffAug	Classification Accuracy	92.7	—	Unverified
2	PaCMAP	Classification Accuracy	85.3	—	Unverified
3	hNNE	Classification Accuracy	77.4	—	Unverified
4	TopoAE	Classification Accuracy	74.6	—	Unverified