SOTAVerified

Synthetic Data Generation

The generation of tabular data by any means possible.

Papers

Showing 1-50 of 822 papers

Title | Status | Hype
Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training | | 0
DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations | | 0
How Good Are Synthetic Requirements ? Evaluating LLM-Generated Datasets for AI4RE | Code | 0
SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation | | 0
PuckTrick: A Library for Making Synthetic Data More Realistic | | 0
RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models | | 0
Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation | | 0
Graph-Convolutional-Beta-VAE for Synthetic Abdominal Aorta Aneurysm Generation | | 0
The Synthetic Mirror -- Synthetic Data at the Age of Agentic AI | | 0
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2
Spatiotemporal deep learning models for detection of rapid intensification in cyclones | | 0
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Code | 3
Unlocking the Potential of Large Language Models in the Nuclear Industry with Synthetic Data | | 0
Private Evolution Converges | | 0
Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques | | 0
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | | 0
SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | | 0
Synthetic Tabular Data: Methods, Attacks and Defenses | | 0
Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models | Code | 0
Gen-n-Val: Agentic Image Data Generation and Validation | | 0
Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training | | 0
Beyond the Norm: A Survey of Synthetic Data Generation for Rare Events | | 0
BEAR: BGP Event Analysis and Reporting | Code | 0
Does Prompt Design Impact Quality of Data Imputation by LLMs? | | 0
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions | | 0
IP-Dialog: Evaluating Implicit Personalization in Dialogue Systems with Synthetic Data | | 0
Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models | | 0
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability | Code | 1
SMOTE-DP: Improving Privacy-Utility Tradeoff with Synthetic Data | | 0
dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation | Code | 1
VietMix: A Naturally Occurring Vietnamese-English Code-Mixed Corpus with Iterative Augmentation for Machine Translation | | 0
Multi-Domain ABSA Conversation Dataset Generation via LLMs for Real-World Evaluation and Model Comparison | | 0
CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis | | 0
StressTest: Can YOUR Speech LM Handle the Stress? | | 0
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection | Code | 1
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | Code | 1
Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages | | 0
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | Code | 4
Improving Heart Rejection Detection in XPCI Images Using Synthetic Data Augmentation | | 0
Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations | | 0
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | | 0
A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking | | 0
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback | | 0
Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | Code | 0
PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders | | 0
The Prompt is Mightier than the Example | | 0
Large language model as user daily behavior data generator: balancing population diversity and individual personality | | 0
Data-Driven Breakthroughs and Future Directions in AI Infrastructure: A Comprehensive Review | | 0
V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation | Code | 1
Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | corGAN | AUROC | 0.92 | | Unverified
2 | GAN | AUROC | 0.87 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | kiNETGAN | EMD | 0.07 | | Unverified
2 | CTGAN | EMD | 0.07 | | Unverified