SOTAVerified

Synthetic Data Generation

The generation of tabular data by any means possible.

Papers

Showing 51-100 of 822 papers

Title | Status | Hype
Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation |  | 0
Challenges and Limitations in the Synthetic Generation of mHealth Sensor Data |  | 0
Scaling Low-Resource MT via Synthetic Data Generation with LLMs |  | 0
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation | Code | 0
BLEUBERI: BLEU is a surprisingly effective reward for instruction following | Code | 1
RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization | Code | 1
RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs |  | 0
Robust Federated Learning with Confidence-Weighted Filtering and GAN-Based Completion under Noisy and Incomplete Data |  | 0
Privacy-Preserving Analytics for Smart Meter (AMI) Data: A Hybrid Approach to Comply with CPUC Privacy Regulations |  | 0
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | Code | 2
Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data |  | 0
Uni-AIMS: AI-Powered Microscopy Image Analysis |  | 0
Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language | Code | 0
Generating Reliable Synthetic Clinical Trial Data: The Role of Hyperparameter Optimization and Domain Constraints |  | 0
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation | Code | 2
AI-Generated Fall Data: Assessing LLMs and Diffusion Model for Wearable Fall Detection |  | 0
Improving Omics-Based Classification: The Role of Feature Selection and Synthetic Data Generation |  | 0
Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models | Code | 0
Modeling supply chain compliance response strategies based on AI synthetic data with structural path regression: A Simulation Study of EU 2027 Mandatory Labor Regulations |  | 0
Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models |  | 0
ReasonIR: Training Retrievers for Reasoning Tasks | Code | 3
Artificial Intelligence for Personalized Prediction of Alzheimer's Disease Progression: A Survey of Methods, Data Challenges, and Future Directions |  | 0
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation | Code | 0
Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Code | 3
Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer | Code | 1
Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing |  | 0
MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Code | 1
TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation |  | 0
Optimizing the Privacy-Utility Balance using Synthetic Data and Configurable Perturbation Pipelines |  | 0
A Comprehensive Survey of Synthetic Tabular Data Generation | Code | 1
ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving |  | 0
Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code |  | 0
A Statistical Approach for Synthetic EEG Data Generation | Code | 0
Learning from Reasoning Failures via Synthetic Data Generation |  | 0
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation |  | 0
MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation |  | 0
Synthetic Data for Blood Vessel Network Extraction |  | 0
Evaluating the Diversity and Quality of LLM Generated Content |  | 0
Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task |  | 0
Leveraging Vertical Public-Private Split for Improved Synthetic Data Generation |  | 0
Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation | Code | 0
SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data |  | 0
Enhancing Metabolic Syndrome Prediction with Hybrid Data Balancing and Counterfactuals | Code | 0
MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection |  | 0
SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog | Code | 0
A Self-Supervised Framework for Space Object Behaviour Characterisation |  | 0
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use |  | 0
CORTEX-AVD: A Framework for CORner Case Testing and EXploration in Autonomous Vehicle Development |  | 0
Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data |  | 0
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | corGAN | AUROC | 0.92 |  | Unverified
2 | GAN | AUROC | 0.87 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | kiNETGAN | EMD | 0.07 |  | Unverified
2 | CTGAN | EMD | 0.07 |  | Unverified