SOTAVerified
Synthetic Data Generation
The generation of tabular data by any means possible.
Papers
Showing 1–25 of 822 papers
| Title | Date | Tasks | Status | Hype |
|---|---|---|---|---|
| Qwen2.5-Coder Technical Report | Sep 18, 2024 | Code Generation | Code Available | 11 |
| Better Synthetic Data by Retrieving and Transforming Existing Datasets | Apr 22, 2024 | Dataset Generation, Diversity | Code Available | 7 |
| LAB: Large-Scale Alignment for ChatBots | Mar 2, 2024 | Instruction Following, Language Modeling | Code Available | 5 |
| DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows | Feb 16, 2024 | Synthetic Data Generation | Code Available | 5 |
| GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | May 26, 2025 | Question Answering, Synthetic Data Generation | Code Available | 4 |
| TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data | Jan 21, 2025 | Fairness, Imputation | Code Available | 4 |
| Nemotron-4 340B Technical Report | Jun 17, 2024 | Synthetic Data Generation | Code Available | 4 |
| MegActor: Harness the Power of Raw Video for Vivid Portrait Animation | May 31, 2024 | Portrait Animation, Style Transfer | Code Available | 4 |
| TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models | May 18, 2023 | Natural Language Inference, Synthetic Data Generation | Code Available | 4 |
| FSID: Fully Synthetic Image Denoising via Procedural Scene Generation | Dec 7, 2022 | Denoising, Image Denoising | Code Available | 4 |
| Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Jun 10, 2025 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| ReasonIR: Training Retrievers for Reasoning Tasks | Apr 29, 2025 | Information Retrieval, MMLU | Code Available | 3 |
| Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Apr 28, 2025 | Synthetic Data Generation | Code Available | 3 |
| OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | Dec 27, 2024 | Diversity, Synthetic Data Generation | Code Available | 3 |
| A Survey on Deep Learning for Theorem Proving | Apr 15, 2024 | Automated Theorem Proving, Deep Learning | Code Available | 3 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Jun 12, 2025 | Benchmarking, Dialogue Generation | Code Available | 2 |
| Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | May 12, 2025 | AI Agent, Knowledge Distillation | Code Available | 2 |
| SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation | May 8, 2025 | 3DGS, Data Augmentation | Code Available | 2 |
| Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework | Apr 2, 2025 | Benchmarking, Synthetic Data Generation | Code Available | 2 |
| Mellow: a small audio language model for reasoning | Mar 11, 2025 | Audio captioning, Language Modeling | Code Available | 2 |
| Improved Multi-Task Brain Tumour Segmentation with Synthetic Data Augmentation | Nov 7, 2024 | Data Augmentation, Synthetic Data Generation | Code Available | 2 |
| Efficient LLM Scheduling by Learning to Rank | Aug 28, 2024 | Blocking, Chatbot | Code Available | 2 |
| VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset | Jul 25, 2024 | Head Detection, Keypoint Estimation | Code Available | 2 |
| UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Jun 27, 2024 | Attribute, Benchmarking | Code Available | 2 |
| SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery | Jun 26, 2024 | Domain Adaptation, Earth Observation | Code Available | 2 |
Benchmark Results
UCI Epileptic Seizure Recognition
2 submissions (↑ higher is better)

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | corGAN | AUROC | 0.92 | - | Unverified |
| 2 | GAN | AUROC | 0.87 | - | Unverified |
UNSW-NB15
2 submissions (↑ higher is better)

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | kiNETGAN | EMD | 0.07 | - | Unverified |
| 2 | CTGAN | EMD | 0.07 | - | Unverified |