| Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training | Jul 11, 2025 | Generative Adversarial Network, Synthetic Data Generation | Unverified | 0 |
| DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations | Jul 8, 2025 | In-Context Learning, Joint Entity and Relation Extraction | Unverified | 0 |
| How Good Are Synthetic Requirements ? Evaluating LLM-Generated Datasets for AI4RE | Jun 26, 2025 | Defect Detection, Diversity | Code Available | 0 |
| SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation | Jun 24, 2025 | Image Generation, Privacy Preserving | Unverified | 0 |
| PuckTrick: A Library for Making Synthetic Data More Realistic | Jun 23, 2025 | Missing Values, Synthetic Data Generation | Unverified | 0 |
| RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models | Jun 21, 2025 | Synthetic Data Generation, Vision-Language-Action | Unverified | 0 |
| Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation | Jun 19, 2025 | Privacy Preserving, Synthetic Data Generation | Unverified | 0 |
| Graph-Convolutional-Beta-VAE for Synthetic Abdominal Aorta Aneurysm Generation | Jun 16, 2025 | Data Augmentation, Diversity | Unverified | 0 |
| The Synthetic Mirror -- Synthetic Data at the Age of Agentic AI | Jun 15, 2025 | Synthetic Data Generation | Unverified | 0 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Jun 12, 2025 | Benchmarking, Dialogue Generation | Code Available | 2 |
| Spatiotemporal deep learning models for detection of rapid intensification in cyclones | Jun 10, 2025 | Data Augmentation, Deep Learning | Unverified | 0 |
| Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Jun 10, 2025 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| Unlocking the Potential of Large Language Models in the Nuclear Industry with Synthetic Data | Jun 10, 2025 | Decision Making, Information Retrieval | Unverified | 0 |
| Private Evolution Converges | Jun 10, 2025 | Synthetic Data Generation | Unverified | 0 |
| Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques | Jun 9, 2025 | Activity Recognition, Data Augmentation | Unverified | 0 |
| SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | Jun 9, 2025 | Benchmarking, Synthetic Data Generation | Unverified | 0 |
| SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | Jun 6, 2025 | Diversity, Large Language Model | Unverified | 0 |
| Synthetic Tabular Data: Methods, Attacks and Defenses | Jun 6, 2025 | Synthetic Data Generation | Unverified | 0 |
| Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models | Jun 6, 2025 | Synthetic Data Generation | Code Available | 0 |
| Gen-n-Val: Agentic Image Data Generation and Validation | Jun 5, 2025 | Image Harmonization, Instance Segmentation | Unverified | 0 |
| Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training | Jun 5, 2025 | Dataset Generation, object-detection | Unverified | 0 |
| Beyond the Norm: A Survey of Synthetic Data Generation for Rare Events | Jun 4, 2025 | Synthetic Data Generation | Unverified | 0 |
| BEAR: BGP Event Analysis and Reporting | Jun 4, 2025 | In-Context Learning, Synthetic Data Generation | Code Available | 0 |
| Does Prompt Design Impact Quality of Data Imputation by LLMs? | Jun 4, 2025 | Binary Classification, Imputation | Unverified | 0 |
| RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions | Jun 3, 2025 | Referring Expression, Synthetic Data Generation | Unverified | 0 |
| IP-Dialog: Evaluating Implicit Personalization in Dialogue Systems with Synthetic Data | Jun 3, 2025 | Attribute, Synthetic Data Generation | Unverified | 0 |
| Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models | Jun 3, 2025 | Synthetic Data Generation | Unverified | 0 |
| Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability | Jun 2, 2025 | Descriptive, Synthetic Data Generation | Code Available | 1 |
| SMOTE-DP: Improving Privacy-Utility Tradeoff with Synthetic Data | Jun 2, 2025 | Privacy Preserving, Synthetic Data Generation | Unverified | 0 |
| dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation | May 31, 2025 | Synthetic Data Generation, Tabular Data Generation | Code Available | 1 |
| VietMix: A Naturally Occurring Vietnamese-English Code-Mixed Corpus with Iterative Augmentation for Machine Translation | May 30, 2025 | Machine Translation, Synthetic Data Generation | Unverified | 0 |
| Multi-Domain ABSA Conversation Dataset Generation via LLMs for Real-World Evaluation and Model Comparison | May 30, 2025 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Unverified | 0 |
| CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis | May 29, 2025 | Contrastive Learning, Diversity | Unverified | 0 |
| StressTest: Can YOUR Speech LM Handle the Stress? | May 28, 2025 | Question Answering, Sentence | Unverified | 0 |
| Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection | May 28, 2025 | Diversity, Synthetic Data Generation | Code Available | 1 |
| ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | May 27, 2025 | Image Retrieval, Retrieval | Code Available | 1 |
| Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages | May 27, 2025 | Synthetic Data Generation, Voice Cloning | Unverified | 0 |
| GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | May 26, 2025 | Question Answering, Synthetic Data Generation | Code Available | 4 |
| Improving Heart Rejection Detection in XPCI Images Using Synthetic Data Augmentation | May 26, 2025 | Data Augmentation, Synthetic Data Generation | Unverified | 0 |
| Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations | May 26, 2025 | All, Diagnostic | Unverified | 0 |
| From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | May 26, 2025 | cross-modal alignment, Instruction Following | Unverified | 0 |
| A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking | May 26, 2025 | Benchmarking, Optical Flow Estimation | Unverified | 0 |
| SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback | May 26, 2025 | Prompt Learning, Question Answering | Unverified | 0 |
| Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | May 26, 2025 | Data-free Knowledge Distillation, Federated Learning | Code Available | 0 |
| PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders | May 25, 2025 | Diversity, Synthetic Data Generation | Unverified | 0 |
| The Prompt is Mightier than the Example | May 24, 2025 | In-Context Learning, Synthetic Data Generation | Unverified | 0 |
| Large language model as user daily behavior data generator: balancing population diversity and individual personality | May 23, 2025 | Data Augmentation, Diversity | Unverified | 0 |
| Data-Driven Breakthroughs and Future Directions in AI Infrastructure: A Comprehensive Review | May 22, 2025 | Federated Learning, GPU | Unverified | 0 |
| V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation | May 22, 2025 | Event-based vision, Optical Flow Estimation | Code Available | 1 |
| Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation | May 21, 2025 | Language Modeling, Language Modelling | Unverified | 0 |