SOTAVerified

Dataset Generation

This task aims to improve the training of a target application (e.g., autonomous driving systems) by generating datasets of diverse and critical elements (e.g., traffic scenarios). Traditional methods rely on expensive, limited datasets that often fail to capture rare but essential situations, which can pose risks during testing.

Papers

Showing 51-75 of 308 papers

Title | Status | Hype
Holistic Audit Dataset Generation for LLM Unlearning via Knowledge Graph Traversal and Redundancy Removal | - | 0
SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations | - | 0
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking | Code | 0
Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios | Code | 0
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0
One-Shot Federated Learning with Classifier-Free Diffusion Models | - | 0
MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation | - | 0
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | - | 0
Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes | - | 0
iTRI-QA: a Toolset for Customized Question-Answer Dataset Generation Using Language Models for Enhanced Scientific Research | - | 0
Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | - | 0
E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic Expressions | Code | 0
A Dataset Generation Toolbox for Dynamic Security Assessment: On the Role of the Security Boundary | Code | 0
The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation | - | 0
CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Code | 2
Neural Error Covariance Estimation for Precise LiDAR Localization | - | 0
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI | - | 0
Low-Biased General Annotated Dataset Generation | - | 0
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Code | 1
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Code | 1
Movie2Story: A framework for understanding videos and telling stories in the form of novel text | - | 0
Cognition Chain for Explainable Psychological Stress Detection on Social Media | Code | 0
SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation | Code | 0
Unbiased General Annotated Dataset Generation | - | 0
Page 3 of 13

No leaderboard results yet.