SOTAVerified

Dataset Generation

The task aims to improve the training of a target application (e.g., autonomous driving systems) by generating datasets of diverse and critical elements (e.g., traffic scenarios). Traditional methods rely on expensive, limited datasets, which often fail to capture the rare but essential situations that can pose risks during testing.

Papers

Showing 1-50 of 308 papers

| Title | Status | Hype |
|---|---|---|
| Better Synthetic Data by Retrieving and Transforming Existing Datasets | Code | 7 |
| Synthetic Dataset Generation for Adversarial Machine Learning Research | Code | 6 |
| Prompt2Model: Generating Deployable Models from Natural Language Instructions | Code | 4 |
| AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | Code | 4 |
| RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3 |
| Hierarchical Lexical Graph for Enhanced Multi-Hop Retrieval | Code | 3 |
| DataDream: Few-shot Guided Dataset Generation | Code | 2 |
| UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Code | 2 |
| Vision Language Action Models in Robotic Manipulation: A Systematic Review | Code | 2 |
| An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation | Code | 2 |
| DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models | Code | 2 |
| Physics Informed Distillation for Diffusion Models | Code | 2 |
| CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Code | 2 |
| JAX-SPH: A Differentiable Smoothed Particle Hydrodynamics Framework | Code | 2 |
| MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | Code | 2 |
| PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation | Code | 1 |
| NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Code | 1 |
| Perceptual Loss for Robust Unsupervised Homography Estimation | Code | 1 |
| MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Code | 1 |
| LLMaAA: Making Large Language Models as Active Annotators | Code | 1 |
| DCFace: Synthetic Face Generation with Dual Condition Diffusion Model | Code | 1 |
| MK-SQuIT: Synthesizing Questions using Iterative Template-filling | Code | 1 |
| CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation | Code | 1 |
| Detecting Anti-Vaccine Users on Twitter | Code | 1 |
| DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using Stable Diffusion Models | Code | 1 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1 |
| Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages | Code | 1 |
| Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs | Code | 1 |
| Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis | Code | 1 |
| PADetBench: Towards Benchmarking Physical Attacks against Object Detection | Code | 1 |
| Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation | Code | 1 |
| Learning-based NLOS Detection and Uncertainty Prediction of GNSS Observations with Transformer-Enhanced LSTM Network | Code | 1 |
| ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Code | 1 |
| Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks | Code | 1 |
| Image Generation for Efficient Neural Network Training in Autonomous Drone Racing | Code | 1 |
| Learning to Answer Visual Questions from Web Videos | Code | 1 |
| Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Code | 1 |
| Chip Placement with Diffusion Models | Code | 1 |
| ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration | Code | 1 |
| Automated Multi-level Preference for MLLMs | Code | 1 |
| CamDiff: Camouflage Image Augmentation via Diffusion Model | Code | 1 |
| HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction | Code | 1 |
| Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Code | 1 |
| Improving Paraphrase Detection with the Adversarial Paraphrasing Task | Code | 1 |
| Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map | Code | 1 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1 |
| Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Code | 1 |
| Forcing Diffuse Distributions out of Language Models | Code | 1 |
| OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis | Code | 1 |
| Faithful Persona-based Conversational Dataset Generation with Large Language Models | Code | 1 |

No leaderboard results yet.