| DeltaPy: A Framework for Tabular Data Augmentation in Python | May 22, 2020 | BIG-bench Machine Learning, Data Augmentation | Code Available | 1 | 5 |
| SynTable: A Synthetic Data Generation Pipeline for Unseen Object Amodal Instance Segmentation of Cluttered Tabletop Scenes | Jul 14, 2023 | Amodal Instance Segmentation, Dataset Generation | Code Available | 1 | 5 |
| Privacy-preserving data sharing via probabilistic modelling | Dec 10, 2019 | Privacy Preserving, Synthetic Data Generation | Code Available | 1 | 5 |
| AnthroNet: Conditional Generation of Humans via Anthropometrics | Sep 7, 2023 | 3D human pose and shape estimation, 3D Human Reconstruction | Code Available | 1 | 5 |
| Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction | Mar 7, 2023 | Synthetic Data Generation | Code Available | 1 | 5 |
| Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction | Sep 1, 2021 | Data Poisoning, Knowledge Distillation | Code Available | 1 | 5 |
| Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding | Aug 9, 2020 | Recommendation Systems, Synthetic Data Generation | Code Available | 1 | 5 |
| Exploring Transformer Text Generation for Medical Dataset Augmentation | May 1, 2020 | Synthetic Data Generation, Text Generation | Code Available | 1 | 5 |
| Boosting Synthetic Data Generation with Effective Nonlinear Causal Discovery | Jan 18, 2023 | Causal Discovery, software testing | Code Available | 1 | 5 |
| Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models | May 27, 2021 | Diversity, Grammatical Error Correction | Code Available | 1 | 5 |
| Overcoming Barriers to Data Sharing with Medical Image Generation: A Comprehensive Evaluation | Nov 29, 2020 | Computed Tomography (CT), Image Generation | Code Available | 1 | 5 |
| TimeVAE: A Variational Auto-Encoder for Multivariate Time Series Generation | Nov 15, 2021 | Synthetic Data Generation, Time Series | Code Available | 1 | 5 |
| POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities | Jul 19, 2023 | 3D Hand Pose Estimation, hand-object pose | Code Available | 1 | 5 |
| Noise-Aware Statistical Inference with Differentially Private Synthetic Data | May 28, 2022 | Imputation, Synthetic Data Generation | Code Available | 1 | 5 |
| BLEUBERI: BLEU is a surprisingly effective reward for instruction following | May 16, 2025 | Instruction Following, Synthetic Data Generation | Code Available | 1 | 5 |
| GECTurk: Grammatical Error Correction and Detection Dataset for Turkish | Sep 20, 2023 | Articles, Decoder | Code Available | 1 | 5 |
| NViSII: A Scriptable Tool for Photorealistic Image Generation | May 28, 2021 | Image Generation, Optical Flow Estimation | Code Available | 1 | 5 |
| TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments | Aug 16, 2022 | Diversity, Semantic Segmentation | Code Available | 1 | 5 |
| CAD2Render: A Modular Toolkit for GPU-accelerated Photorealistic Synthetic Data Generation for the Manufacturing Industry | Nov 25, 2022 | GPU, object-detection | Code Available | 1 | 5 |
| AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data | Mar 7, 2025 | Diversity, Fairness | Code Available | 1 | 5 |
| DFNet: Enhance Absolute Pose Regression with Direct Feature Matching | Apr 1, 2022 | Camera Pose Estimation, Camera Relocalization | Code Available | 1 | 5 |
| GeoPointGAN: Synthetic Spatial Data with Local Label Differential Privacy | May 18, 2022 | Management, Privacy Preserving | Code Available | 1 | 5 |
| CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data | Dec 16, 2021 | Pose Estimation, Representation Learning | Code Available | 1 | 5 |
| Generating tabular datasets under differential privacy | Aug 28, 2023 | Synthetic Data Generation | Code Available | 1 | 5 |
| Using matrix-product states for time-series machine learning | Dec 20, 2024 | Astronomy, Imputation | Code Available | 1 | 5 |
| Generative Wind Power Curve Modeling Via Machine Vision: A Self-learning Deep Convolutional Network Based Method | Aug 19, 2021 | Benchmarking, Synthetic Data Generation | Code Available | 1 | 5 |
| LEyes: A Lightweight Framework for Deep Learning-Based Eye Tracking using Synthetic Eye Images | Sep 12, 2023 | Gaze Estimation, Synthetic Data Generation | Code Available | 1 | 5 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Apr 25, 2025 | Clinical Language Translation, Machine Translation | Code Available | 1 | 5 |
| D3A-TS: Denoising-Driven Data Augmentation in Time Series | Dec 9, 2023 | Data Augmentation, Denoising | Code Available | 1 | 5 |
| RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization | May 16, 2025 | RAG, Synthetic Data Generation | Code Available | 1 | 5 |
| Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy | May 9, 2023 | Synthetic Data Generation | Code Available | 1 | 5 |
| MTSS-GAN: Multivariate Time Series Simulation Generative Adversarial Networks | Jun 26, 2020 | Generative Adversarial Network, Image Generation | Code Available | 1 | 5 |
| ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | May 27, 2025 | Image Retrieval, Retrieval | Code Available | 1 | 5 |
| Controllable 3D Generative Adversarial Face Model via Disentangling Shape and Appearance | Aug 30, 2022 | 3D Face Modelling, Face Model | Code Available | 1 | 5 |
| Improved Training of Wasserstein GANs | Mar 31, 2017 | Conditional Image Generation, Image Generation | Code Available | 1 | 5 |
| Copula-based synthetic data augmentation for machine-learning emulators | Dec 16, 2020 | BIG-bench Machine Learning, Data Augmentation | Code Available | 1 | 5 |
| CLIPPER: Compression enables long-context synthetic data generation | Feb 20, 2025 | Claim Verification, Synthetic Data Generation | Code Available | 1 | 5 |
| CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records | Jan 25, 2020 | Disease Prediction, General Classification | Code Available | 1 | 5 |
| Learning Compact Metrics for MT | Oct 12, 2021 | Cross-Lingual Transfer, Language Modeling | Code Available | 1 | 5 |
| MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures | Mar 20, 2025 | Synthetic Data Generation | Code Available | 1 | 5 |
| Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions | Nov 27, 2022 | Synthetic Data Generation | Code Available | 1 | 5 |
| Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement | Jan 21, 2025 | Synthetic Data Generation, World Knowledge | Code Available | 1 | 5 |
| Groove2Groove: One-Shot Music Style Transfer with Supervision from Synthetic Data | Aug 26, 2020 | Decoder, Music Genre Transfer | Code Available | 1 | 5 |
| DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation | Feb 26, 2020 | Privacy Preserving, Sensitivity | Code Available | 1 | 5 |
| A Comprehensive Survey of Synthetic Tabular Data Generation | Apr 23, 2025 | Privacy Preserving, Survey | Code Available | 1 | 5 |
| Learning from synthetic data generated with GRADE | May 7, 2023 | Pose Estimation, Synthetic Data Generation | Code Available | 1 | 5 |
| Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability | Jun 2, 2025 | Descriptive, Synthetic Data Generation | Code Available | 1 | 5 |
| Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint | Jan 1, 2023 | Data Augmentation, Data-free Knowledge Distillation | Code Available | 1 | 5 |
| SocialDial: A Benchmark for Socially-Aware Dialogue Systems | Apr 24, 2023 | Cultural Vocal Bursts Intensity Prediction, Synthetic Data Generation | Code Available | 1 | 5 |
| Will we run out of data? Limits of LLM scaling based on human-generated data | Oct 26, 2022 | Language Modeling, Language Modelling | Code Available | 1 | 5 |