SOTAVerified

Synthetic Data Generation

The generation of tabular data by any means possible.

Papers

Showing 1-50 of 822 papers

Title | Status | Hype
Qwen2.5-Coder Technical Report | Code | 11
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows | Code | 5
LAB: Large-Scale Alignment for ChatBots | Code | 5
TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data | Code | 4
Nemotron-4 340B Technical Report | Code | 4
MegActor: Harness the Power of Raw Video for Vivid Portrait Animation | Code | 4
FSID: Fully Synthetic Image Denoising via Procedural Scene Generation | Code | 4
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models | Code | 4
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | Code | 4
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Code | 3
ReasonIR: Training Retrievers for Reasoning Tasks | Code | 3
Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Code | 3
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | Code | 3
A Survey on Deep Learning for Theorem Proving | Code | 3
REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers | Code | 2
Efficient LLM Scheduling by Learning to Rank | Code | 2
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting | Code | 2
DigiFace-1M: 1 Million Digital Face Images for Face Recognition | Code | 2
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series | Code | 2
VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset | Code | 2
SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery | Code | 2
Synthetic QA Corpora Generation with Roundtrip Consistency | Code | 2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2
UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization | Code | 2
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | Code | 2
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation | Code | 2
Pedagogical Alignment of Large Language Models | Code | 2
End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music | Code | 2
Towards Realistic Generative 3D Face Models | Code | 2
Mellow: a small audio language model for reasoning | Code | 2
BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion | Code | 2
Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework | Code | 2
Improved Multi-Task Brain Tumour Segmentation with Synthetic Data Augmentation | Code | 2
A Synthetic Dataset for Personal Attribute Inference | Code | 2
Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data | Code | 2
InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval | Code | 2
EC-GAN: Low-Sample Classification using Semi-Supervised Algorithms and GANs | Code | 1
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data | Code | 1
EEG Synthetic Data Generation Using Probabilistic Diffusion Models | Code | 1
dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation | Code | 1
dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation | Code | 1
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Code | 1
Diffusion-based Conditional ECG Generation with Structured State Space Models | Code | 1
Differentially Private Synthetic Medical Data Generation using Convolutional GANs | Code | 1
Diffusion-HPC: Synthetic Data Generation for Human Mesh Recovery in Challenging Domains | Code | 1
Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets | Code | 1
DFNet: Enhance Absolute Pose Regression with Direct Feature Matching | Code | 1
DeepNAG: Deep Non-Adversarial Gesture Generation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | corGAN | AUROC | 0.92 | | Unverified
2 | GAN | AUROC | 0.87 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | kiNETGAN | EMD | 0.07 | | Unverified
2 | CTGAN | EMD | 0.07 | | Unverified