SOTAVerified

Dataset Generation

The task aims to improve the training of a target application (e.g., autonomous driving systems) by generating datasets of diverse and critical elements (e.g., traffic scenarios). Traditional methods rely on expensive, limited datasets that often fail to capture rare but essential situations that can pose risks during testing.

Papers

Showing 51–75 of 308 papers

Title | Status | Hype
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs | Code | 1
SofaMyRoom: a fast and multiplatform "shoebox" room simulator for binaural room impulse response dataset generation | Code | 1
Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map | Code | 1
SynPick: A Dataset for Dynamic Bin Picking Scene Understanding | Code | 1
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Code | 1
Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Code | 1
OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis | Code | 1
Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation | Code | 1
Chip Placement with Diffusion Models | Code | 1
ZeroGen: Efficient Zero-shot Learning via Dataset Generation | Code | 1
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1
Global Tensor Motion Planning | Code | 1
Learning to Answer Visual Questions from Web Videos | Code | 1
RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos | Code | 1
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration | Code | 1
Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Code | 1
Faithful Persona-based Conversational Dataset Generation with Large Language Models | Code | 1
Forcing Diffuse Distributions out of Language Models | Code | 1
Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation | - | 0
A systematic dataset generation technique applied to data-driven automotive aerodynamics | - | 0
Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions | - | 0
CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking | - | 0
A large-scale, physically-based synthetic dataset for satellite pose estimation | - | 0

No leaderboard results yet.