SOTAVerified

Dataset Generation

The task involves improving the training of a target application (e.g., autonomous driving systems) by generating datasets of diverse and critical elements (e.g., traffic scenarios). Traditional methods rely on expensive and limited datasets, which often fail to capture rare but essential situations that can pose risks during testing.

Papers

Showing 51-100 of 308 papers

Title | Status | Hype
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs | Code | 1
SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models | Code | 1
Monocular Multi-Layer Layout Estimation for Warehouse Racks | Code | 1
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Code | 1
CamDiff: Camouflage Image Augmentation via Diffusion Model | Code | 1
Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages | Code | 1
LIQUID: A Framework for List Question Answering Dataset Generation | Code | 1
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Code | 1
OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis | Code | 1
PADetBench: Towards Benchmarking Physical Attacks against Object Detection | Code | 1
Chip Placement with Diffusion Models | Code | 1
PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation | Code | 1
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback | Code | 1
RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos | Code | 1
SofaMyRoom: a fast and multiplatform "shoebox" room simulator for binaural room impulse response dataset generation | Code | 1
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | Code | 1
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration | Code | 1
Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Code | 1
Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments | Code | 1
Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics | Code | 1
A Semi-Synthetic Dataset Generation Framework for Causal Inference in Recommender Systems | Code | 0
PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale | Code | 0
NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Code | 0
Noisemaker 3D: Comprehensive Framework for Mesh Noise Generation | Code | 0
Pipeline and Dataset Generation for Automated Fact-checking in Almost Any Language | Code | 0
Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors | Code | 0
Mitosis Detection from Partial Annotation by Dataset Generation via Frame-Order Flipping | Code | 0
A Framework for Large Scale Synthetic Graph Dataset Generation | Code | 0
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking | Code | 0
LoFT: LoRA-fused Training Dataset Generation with Few-shot Guidance | Code | 0
Drone Detection using Deep Neural Networks Trained on Pure Synthetic Data | Code | 0
Neural Network Surrogate and Projected Gradient Descent for Fast and Reliable Finite Element Model Calibration: a Case Study on an Intervertebral Disc | Code | 0
Affordance Learning for End-to-End Visuomotor Robot Control | Code | 0
Location-Aware Visual Question Generation with Lightweight Models | Code | 0
Masked Face Dataset Generation and Masked Face Recognition | Code | 0
Private Dataset Generation Using Privacy Preserving Collaborative Learning | Code | 0
Learning Camera Miscalibration Detection | Code | 0
JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Code | 0
JABBERWOCK: A Tool for WebAssembly Dataset Generation and Its Application to Malicious Website Detection | Code | 0
ADG-Pose: Automated Dataset Generation for Real-World Human Pose Estimation | Code | 0
Improving Sentence Embeddings with Automatic Generation of Training Data Using Few-shot Examples | Code | 0
KoCoSa: Korean Context-aware Sarcasm Detection Dataset | Code | 0
Bag of Views: An Appearance-based Approach to Next-Best-View Planning for 3D Reconstruction | Code | 0
IrrMap: A Large-Scale Comprehensive Dataset for Irrigation Method Mapping | Code | 0
Learning to Compute Gröbner Bases | Code | 0
Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification | Code | 0
GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks | Code | 0
Dataset Generation and Bonobo Classification from Weakly Labelled Videos | Code | 0
Automating 3D Dataset Generation with Neural Radiance Fields | Code | 0

No leaderboard results yet.