SOTAVerified

Dataset Generation

The task is to improve the training of a target application (e.g., autonomous driving systems) by generating datasets of diverse and critical elements (e.g., traffic scenarios). Traditional methods rely on expensive, limited datasets that often fail to capture rare but essential situations that can pose risks during testing.

Papers

Showing 1–25 of 308 papers

Title | Status | Hype
Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7
Synthetic Dataset Generation for Adversarial Machine Learning Research | Code | 6
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | Code | 4
Prompt2Model: Generating Deployable Models from Natural Language Instructions | Code | 4
Hierarchical Lexical Graph for Enhanced Multi-Hop Retrieval | Code | 3
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3
Vision Language Action Models in Robotic Manipulation: A Systematic Review | Code | 2
CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Code | 2
Physics Informed Distillation for Diffusion Models | Code | 2
DataDream: Few-shot Guided Dataset Generation | Code | 2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2
JAX-SPH: A Differentiable Smoothed Particle Hydrodynamics Framework | Code | 2
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation | Code | 2
MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | Code | 2
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models | Code | 2
Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments | Code | 1
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Code | 1
TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery | Code | 1
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | Code | 1
Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map | Code | 1
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration | Code | 1
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Code | 1
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Code | 1

No leaderboard results yet.