SOTAVerified

Data Augmentation

Data augmentation refers to techniques that enlarge a dataset by applying modifications to its existing examples, producing additional training examples. Besides growing the dataset, augmentation also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps reduce overfitting.

Data augmentation techniques have proven useful in domains such as NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include swapping, deletion, and random insertion, among others.
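The transformations named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any particular library: images are stood in for by nested lists of pixel values, text by lists of tokens, and every function name and parameter here (including the `vocabulary` argument used for insertion) is an assumption made for the example.

```python
import random

# --- Image-style transforms on a 2D grid (a list of rows) ---

def horizontal_flip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def random_crop(img, height, width, rng):
    """Cut a height x width window at a random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]

# --- Token-level text transforms ---

def random_swap(tokens, rng):
    """Swap the tokens at two randomly chosen positions."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]

def random_insertion(tokens, vocabulary, rng):
    """Insert one token drawn from `vocabulary` at a random position.

    `vocabulary` is a hypothetical stand-in for whatever source of
    replacement words (for example, synonyms) a real pipeline would use.
    """
    out = list(tokens)
    out.insert(rng.randint(0, len(out)), rng.choice(vocabulary))
    return out

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    sentence = "data augmentation helps reduce overfitting".split()

    print(horizontal_flip(image))
    print(rotate90(image))
    print(random_crop(image, 2, 2, rng))
    print(random_swap(sentence, rng))
    print(random_deletion(sentence, 0.2, rng))
    print(random_insertion(sentence, ["often", "usually"], rng))
```

Each function returns a new object and leaves its input untouched, so several augmented variants can be derived from the same original example. Passing an explicit `random.Random` instance keeps the randomness reproducible.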

(Image credit: Albumentations)

Papers

Showing 51-100 of 8378 papers

| Title | Status | Hype |
| --- | --- | --- |
| BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer | Code | 2 |
| ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Code | 2 |
| Many-MobileNet: Multi-Model Augmentation for Robust Retinal Disease Classification | Code | 2 |
| AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation | Code | 2 |
| Improved Multi-Task Brain Tumour Segmentation with Synthetic Data Augmentation | Code | 2 |
| LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Code | 2 |
| Self-Supervised Any-Point Tracking by Contrastive Random Walks | Code | 2 |
| HSIGene: A Foundation Model For Hyperspectral Image Generation | Code | 2 |
| Synthetic continued pretraining | Code | 2 |
| A Survey on Diffusion Models for Recommender Systems | Code | 2 |
| BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training | Code | 2 |
| RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks | Code | 2 |
| ARoFace: Alignment Robustness to Improve Low-Quality Face Recognition | Code | 2 |
| Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation | Code | 2 |
| Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data | Code | 2 |
| LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages | Code | 2 |
| Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather | Code | 2 |
| Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis | Code | 2 |
| UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2 |
| Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning | Code | 2 |
| Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors | Code | 2 |
| Saturn: Sample-efficient Generative Molecular Design using Memory Manipulation | Code | 2 |
| Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2 | Code | 2 |
| DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data | Code | 2 |
| PHUDGE: Phi-3 as Scalable Judge | Code | 2 |
| Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues | Code | 2 |
| MindBridge: A Cross-Subject Brain Decoding Framework | Code | 2 |
| Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models | Code | 2 |
| Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection | Code | 2 |
| Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding | Code | 2 |
| LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | Code | 2 |
| Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt | Code | 2 |
| A Versatile Framework for Multi-scene Person Re-identification | Code | 2 |
| Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework | Code | 2 |
| Revisiting Adversarial Training under Long-Tailed Distributions | Code | 2 |
| EarthLoc: Astronaut Photography Localization by Indexing Earth from Space | Code | 2 |
| Delving into the Trajectory Long-tail Distribution for Muti-object Tracking | Code | 2 |
| Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications | Code | 2 |
| MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition | Code | 2 |
| CodeS: Towards Building Open-source Language Models for Text-to-SQL | Code | 2 |
| Morphological Symmetries in Robotics | Code | 2 |
| Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering | Code | 2 |
| One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning | Code | 2 |
| Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models | Code | 2 |
| A Survey on Data Augmentation in Large Model Era | Code | 2 |
| Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms | Code | 2 |
| Exploring Color Invariance through Image-Level Ensemble Learning | Code | 2 |
| Authorship Obfuscation in Multilingual Machine-Generated Text Detection | Code | 2 |
| Large Language Models Can Learn Temporal Reasoning | Code | 2 |
| Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DeiT-B (+MixPro) | Accuracy (%) | 82.9 | | Unverified |
| 2 | ResNet-200 (DeepAA) | Accuracy (%) | 81.32 | | Unverified |
| 3 | DeiT-S (+MixPro) | Accuracy (%) | 81.3 | | Unverified |
| 4 | ResNet-200 (Fast AA) | Accuracy (%) | 80.6 | | Unverified |
| 5 | ResNet-200 (UA) | Accuracy (%) | 80.4 | | Unverified |
| 6 | ResNet-200 (AA) | Accuracy (%) | 80 | | Unverified |
| 7 | ResNet-50 (DeepAA) | Accuracy (%) | 78.3 | | Unverified |
| 8 | ResNet-50 (TA wide) | Accuracy (%) | 78.07 | | Unverified |
| 9 | ResNet-50 (LoRot-E) | Accuracy (%) | 77.72 | | Unverified |
| 10 | ResNet-50 (LoRot-I) | Accuracy (%) | 77.71 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | WideResNet-40-2 (Faster AA) | Percentage error | 3.7 | | Unverified |
| 2 | Shake-Shake (26 2×32d) (Faster AA) | Percentage error | 2.7 | | Unverified |
| 3 | WideResNet-28-10 (Faster AA) | Percentage error | 2.6 | | Unverified |
| 4 | Shake-Shake (26 2×112d) (Faster AA) | Percentage error | 2 | | Unverified |
| 5 | Shake-Shake (26 2×96d) (Faster AA) | Percentage error | 2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DiffAug | Classification Accuracy | 92.7 | | Unverified |
| 2 | PaCMAP | Classification Accuracy | 85.3 | | Unverified |
| 3 | hNNE | Classification Accuracy | 77.4 | | Unverified |
| 4 | TopoAE | Classification Accuracy | 74.6 | | Unverified |