SOTAVerified

Dataset Generation

The task involves improving the training of target applications (e.g., autonomous driving systems) by generating datasets of diverse and critical elements (e.g., traffic scenarios). Traditional methods rely on expensive, limited datasets that often fail to capture rare but essential situations, and these gaps can pose risks during testing.

Papers

Showing 51–75 of 308 papers

Title | Status | Hype
Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation | Code | 1
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Code | 1
Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map | Code | 1
HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction | Code | 1
Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Code | 1
Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages | Code | 1
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration | Code | 1
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Code | 1
MK-SQuIT: Synthesizing Questions using Iterative Template-filling | Code | 1
ViWi Vision-Aided mmWave Beam Tracking: Dataset, Task, and Baseline Solutions | Code | 1
Chip Placement with Diffusion Models | Code | 1
Learning to Answer Visual Questions from Web Videos | Code | 1
OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis | Code | 1
LIQUID: A Framework for List Question Answering Dataset Generation | Code | 1
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Code | 1
LLMaAA: Making Large Language Models as Active Annotators | Code | 1
NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Code | 1
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
SynTable: A Synthetic Data Generation Pipeline for Unseen Object Amodal Instance Segmentation of Cluttered Tabletop Scenes | Code | 1
JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Code | 0
IrrMap: A Large-Scale Comprehensive Dataset for Irrigation Method Mapping | Code | 0
A Semi-Synthetic Dataset Generation Framework for Causal Inference in Recommender Systems | Code | 0
JABBERWOCK: A Tool for WebAssembly Dataset Generation and Its Application to Malicious Website Detection | Code | 0
KoCoSa: Korean Context-aware Sarcasm Detection Dataset | Code | 0

No leaderboard results yet.