
Dataset Generation

This task aims to improve the training of target applications (e.g., autonomous driving systems) by generating datasets that cover diverse and critical cases (e.g., traffic scenarios). Traditional methods rely on expensive, limited datasets that often fail to capture rare but essential situations, and these gaps can pose risks during testing.

Papers

Showing 51-100 of 308 papers

Title | Status | Hype
Holistic Audit Dataset Generation for LLM Unlearning via Knowledge Graph Traversal and Redundancy Removal | | 0
SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations | | 0
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking | Code | 0
Synth It Like KITTI: Synthetic Data Generation for Object Detection in Driving Scenarios | Code | 0
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0
One-Shot Federated Learning with Classifier-Free Diffusion Models | | 0
MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation | | 0
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning | | 0
Synthetic User Behavior Sequence Generation with Large Language Models for Smart Homes | | 0
iTRI-QA: a Toolset for Customized Question-Answer Dataset Generation Using Language Models for Enhanced Scientific Research | | 0
Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | | 0
E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic Expressions | Code | 0
A Dataset Generation Toolbox for Dynamic Security Assessment: On the Role of the Security Boundary | Code | 0
The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation | | 0
CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Code | 2
Neural Error Covariance Estimation for Precise LiDAR Localization | | 0
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
Low-Biased General Annotated Dataset Generation | | 0
DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI | | 0
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Code | 1
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Code | 1
Movie2Story: A framework for understanding videos and telling stories in the form of novel text | | 0
Cognition Chain for Explainable Psychological Stress Detection on Social Media | Code | 0
SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation | Code | 0
Unbiased General Annotated Dataset Generation | | 0
VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition | | 0
JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Code | 0
SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction | Code | 0
An Evolutionary Large Language Model for Hallucination Mitigation | | 0
SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models | Code | 1
Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems | | 0
Global Tensor Motion Planning | Code | 1
OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis | Code | 1
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere | | 0
Drone Detection using Deep Neural Networks Trained on Pure Synthetic Data | Code | 0
Physics Informed Distillation for Diffusion Models | Code | 2
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs | | 0
Fineweb-Edu-Ar: Machine-translated Corpus to Support Arabic Small Language Models | | 0
Fairness-Utilization Trade-off in Wireless Networks with Explainable Kolmogorov-Arnold Networks | | 0
Simulating User Agents for Embodied Conversational-AI | | 0
SYNOSIS: Image synthesis pipeline for machine vision in metal surface inspection | | 0
FTSmartAudit: A Knowledge Distillation-Enhanced Framework for Automated Smart Contract Auditing Using Fine-Tuned LLMs | | 0
Pseudo Dataset Generation for Out-of-Domain Multi-Camera View Recommendation | | 0
Anchored Alignment for Self-Explanations Enhancement | | 0
Autonomous Self-Trained Channel State Prediction Method for mmWave Vehicular Communications | | 0
HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations | | 0
EarthquakeNPP: Benchmark Datasets for Earthquake Forecasting with Neural Point Processes | | 0
Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints | Code | 0
Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation | | 0
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors | | 0

No leaderboard results yet.