Better Synthetic Data by Retrieving and Transforming Existing Datasets | Apr 22, 2024 | Dataset Generation, Diversity | Code available: 7
Synthetic Dataset Generation for Adversarial Machine Learning Research | Jul 21, 2022 | BIG-bench Machine Learning, Dataset Generation | Code available: 6
Prompt2Model: Generating Deployable Models from Natural Language Instructions | Aug 23, 2023 | Data-free Knowledge Distillation, Dataset Generation | Code available: 4
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | May 23, 2024 | Class-level Code Generation, Code Completion | Code available: 4
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Aug 2, 2024 | Benchmarking, Dataset Generation | Code available: 3
Hierarchical Lexical Graph for Enhanced Multi-Hop Retrieval | Jun 9, 2025 | Dataset Generation, RAG | Code available: 3
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation | Feb 26, 2024 | Dataset Generation, text-to-speech | Code available: 2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Jun 27, 2024 | Attribute, Benchmarking | Code available: 2
DataDream: Few-shot Guided Dataset Generation | Jul 15, 2024 | Classification, Dataset Generation | Code available: 2
MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection | Feb 18, 2024 | 3D Object Detection, Dataset Generation | Code available: 2
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models | Aug 11, 2023 | Dataset Generation, Decoder | Code available: 2
Vision Language Action Models in Robotic Manipulation: A Systematic Review | Jul 14, 2025 | Dataset Generation, Natural Language Understanding | Code available: 2
CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Jan 9, 2025 | Cell Segmentation, Dataset Generation | Code available: 2
JAX-SPH: A Differentiable Smoothed Particle Hydrodynamics Framework | Mar 7, 2024 | Dataset Generation | Code available: 2
Physics Informed Distillation for Diffusion Models | Nov 13, 2024 | Dataset Generation, Image Generation | Code available: 2
PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation | Jan 4, 2024 | Dataset Generation, Object | Code available: 1
NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Jun 9, 2023 | Benchmarking, Dataset Generation | Code available: 1
Perceptual Loss for Robust Unsupervised Homography Estimation | Apr 20, 2021 | Dataset Generation, Homography Estimation | Code available: 1
MK-SQuIT: Synthesizing Questions using Iterative Template-filling | Nov 4, 2020 | Dataset Generation, Machine Translation | Code available: 1
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | May 29, 2024 | Dataset Generation, Image to text | Code available: 1
Learning to Answer Visual Questions from Web Videos | May 10, 2022 | Dataset Generation, Question Answering | Code available: 1
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Jun 5, 2025 | Dataset Generation, Mathematical Problem-Solving | Code available: 1
Image Generation for Efficient Neural Network Training in Autonomous Drone Racing | Aug 6, 2020 | Dataset Generation, Efficient Neural Network | Code available: 1
Chip Placement with Diffusion Models | Jul 17, 2024 | Dataset Generation, Denoising | Code available: 1
LIQUID: A Framework for List Question Answering Dataset Generation | Feb 3, 2023 | Dataset Generation, Question Answering | Code available: 1
LLMaAA: Making Large Language Models as Active Annotators | Oct 30, 2023 | Active Learning, Dataset Generation | Code available: 1
Afro-MNIST: Synthetic generation of MNIST-style datasets for low-resource languages | Sep 28, 2020 | BIG-bench Machine Learning, Dataset Generation | Code available: 1
Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map | May 6, 2025 | Dataset Generation, Segmentation | Code available: 1
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis | Mar 11, 2025 | All, Dataset Generation | Code available: 1
PADetBench: Towards Benchmarking Physical Attacks against Object Detection | Aug 17, 2024 | Adversarial Robustness, Benchmarking | Code available: 1
Improving Paraphrase Detection with the Adversarial Paraphrasing Task | Jun 14, 2021 | Dataset Generation, Paraphrase Identification | Code available: 1
Global Tensor Motion Planning | Nov 28, 2024 | Dataset Generation, Diversity | Code available: 1
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs | Sep 18, 2023 | Dataset Generation, Question Answering | Code available: 1
Forcing Diffuse Distributions out of Language Models | Apr 16, 2024 | Dataset Generation, Diversity | Code available: 1
Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks | May 23, 2023 | Attribute, Dataset Generation | Code available: 1
Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation | Sep 25, 2023 | Dataset Generation, Segmentation | Code available: 1
DCFace: Synthetic Face Generation with Dual Condition Diffusion Model | Apr 14, 2023 | Dataset Generation, Face Generation | Code available: 1
Automated Multi-level Preference for MLLMs | May 18, 2024 | Dataset Generation, Hallucination | Code available: 1
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Aug 31, 2023 | Benchmarking, Dataset Generation | Code available: 1
Faithful Persona-based Conversational Dataset Generation with Large Language Models | Dec 15, 2023 | Chatbot, Dataset Generation | Code available: 1
Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects | Dec 31, 2023 | 3D Shape Retrieval, Dataset Generation | Code available: 1
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Dec 24, 2024 | Autonomous Driving, Dataset Generation | Code available: 1
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration | Mar 21, 2025 | Dataset Generation, Point Cloud Registration | Code available: 1
CamDiff: Camouflage Image Augmentation via Diffusion Model | Apr 11, 2023 | Dataset Generation, Image Augmentation | Code available: 1
HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction | Jun 24, 2022 | 3D Reconstruction, Camera Pose Estimation | Code available: 1
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Dec 24, 2024 | Dataset Generation | Code available: 1
Actionet: An Interactive End-To-End Platform For Task-Based Data Collection And Augmentation In 3D Environment | Oct 3, 2020 | Dataset Generation, Task Planning | Code available: 1
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation | Sep 3, 2024 | Dataset Generation, Question Answering | Code available: 1
Detecting Anti-Vaccine Users on Twitter | Oct 21, 2021 | Dataset Generation, Misinformation | Code available: 1
OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis | Nov 14, 2024 | Dataset Generation | Code available: 1