| MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures | Mar 20, 2025 | Synthetic Data Generation | Code Available | 1 |
| AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data | Mar 7, 2025 | Diversity, Fairness | Code Available | 1 |
| CLIPPER: Compression enables long-context synthetic data generation | Feb 20, 2025 | Claim Verification, Synthetic Data Generation | Code Available | 1 |
| DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Feb 7, 2025 | Reinforcement Learning (RL), Synthetic Data Generation | Code Available | 1 |
| SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset | Feb 4, 2025 | 3D Object Detection, Autonomous Driving | Code Available | 1 |
| XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Jan 31, 2025 | Action Localization, Action Recognition | Code Available | 1 |
| FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Jan 28, 2025 | Natural Language Inference, Synthetic Data Generation | Code Available | 1 |
| Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement | Jan 21, 2025 | Synthetic Data Generation, World Knowledge | Code Available | 1 |
| Synthetic Data Generation by Supervised Neural Gas Network for Physiological Emotion Recognition Data | Jan 19, 2025 | EEG, Emotion Recognition | Code Available | 1 |
| Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner | Dec 24, 2024 | Autonomous Driving, Dataset Generation | Code Available | 1 |
| Using matrix-product states for time-series machine learning | Dec 20, 2024 | Astronomy, Imputation | Code Available | 1 |
| ResoFilter: Fine-grained Synthetic Data Filtering for Large Language Models through Data-Parameter Resonance Analysis | Dec 19, 2024 | Data Augmentation, Synthetic Data Generation | Code Available | 1 |
| SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models | Dec 3, 2024 | Dataset Generation, Image-to-Image Translation | Code Available | 1 |
| Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai | Nov 23, 2024 | Diversity, Question Answering | Code Available | 1 |
| BhasaAnuvaad: A Speech Translation Dataset for 13 Indian Languages | Nov 7, 2024 | automatic-speech-translation, Synthetic Data Generation | Code Available | 1 |
| SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification | Oct 7, 2024 | image-classification, Image Classification | Code Available | 1 |
| Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Oct 3, 2024 | HumanEval, Synthetic Data Generation | Code Available | 1 |
| Voice Disorder Analysis: a Transformer-based Approach | Jun 20, 2024 | Data Augmentation, Diversity | Code Available | 1 |
| SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation | May 16, 2024 | Bias Detection, Diversity | Code Available | 1 |
| Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models | Apr 23, 2024 | Conversational Question Answering, Dialogue State Tracking | Code Available | 1 |
| EPIC: Effective Prompting for Imbalanced-Class Data Synthesis in Tabular Data Classification via Large Language Models | Apr 15, 2024 | In-Context Learning, Synthetic Data Generation | Code Available | 1 |
| An evaluation framework for synthetic data generation models | Apr 13, 2024 | Data Augmentation, Synthetic Data Generation | Code Available | 1 |
| API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | Feb 23, 2024 | Benchmarking, slot-filling | Code Available | 1 |
| Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes | Jan 29, 2024 | Data Augmentation, Sound Event Localization and Detection | Code Available | 1 |
| Synthetic Data Generation Framework, Dataset, and Efficient Deep Model for Pedestrian Intention Prediction | Jan 12, 2024 | Autonomous Driving, Prediction | Code Available | 1 |
| RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation | Dec 21, 2023 | Benchmarking, Product Recommendation | Code Available | 1 |
| View-Dependent Octree-based Mesh Extraction in Unbounded Scenes for Procedural Synthetic Data | Dec 13, 2023 | Synthetic Data Generation | Code Available | 1 |
| D3A-TS: Denoising-Driven Data Augmentation in Time Series | Dec 9, 2023 | Data Augmentation, Denoising | Code Available | 1 |
| AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing | Oct 24, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| GECTurk: Grammatical Error Correction and Detection Dataset for Turkish | Sep 20, 2023 | Articles, Decoder | Code Available | 1 |
| LEyes: A Lightweight Framework for Deep Learning-Based Eye Tracking using Synthetic Eye Images | Sep 12, 2023 | Gaze Estimation, Synthetic Data Generation | Code Available | 1 |
| AnthroNet: Conditional Generation of Humans via Anthropometrics | Sep 7, 2023 | 3D human pose and shape estimation, 3D Human Reconstruction | Code Available | 1 |
| FinDiff: Diffusion Models for Financial Tabular Data Generation | Sep 4, 2023 | Fraud Detection, Synthetic Data Generation | Code Available | 1 |
| Generating tabular datasets under differential privacy | Aug 28, 2023 | Synthetic Data Generation | Code Available | 1 |
| POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities | Jul 19, 2023 | 3D Hand Pose Estimation, hand-object pose | Code Available | 1 |
| SynTable: A Synthetic Data Generation Pipeline for Unseen Object Amodal Instance Segmentation of Cluttered Tabletop Scenes | Jul 14, 2023 | Amodal Instance Segmentation, Dataset Generation | Code Available | 1 |
| PyTrial: Machine Learning Software and Benchmark for Clinical Trial Applications | Jun 6, 2023 | Synthetic Data Generation | Code Available | 1 |
| GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes | May 25, 2023 | Computed Tomography (CT), Image Generation | Code Available | 1 |
| Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy | May 9, 2023 | Synthetic Data Generation | Code Available | 1 |
| Learning from synthetic data generated with GRADE | May 7, 2023 | Pose Estimation, Synthetic Data Generation | Code Available | 1 |
| Synthetic Data-based Detection of Zebras in Drone Imagery | Apr 30, 2023 | Missing Labels, Pose Estimation | Code Available | 1 |
| SocialDial: A Benchmark for Socially-Aware Dialogue Systems | Apr 24, 2023 | Cultural Vocal Bursts Intensity Prediction, Synthetic Data Generation | Code Available | 1 |
| Natural Language-Based Synthetic Data Generation for Cluster Analysis | Mar 24, 2023 | Synthetic Data Generation | Code Available | 1 |
| Diffusion-HPC: Synthetic Data Generation for Human Mesh Recovery in Challenging Domains | Mar 16, 2023 | Human Mesh Recovery, Synthetic Data Generation | Code Available | 1 |
| Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction | Mar 7, 2023 | Synthetic Data Generation | Code Available | 1 |
| EEG Synthetic Data Generation Using Probabilistic Diffusion Models | Mar 6, 2023 | Brain Computer Interface, Data Augmentation | Code Available | 1 |
| Generating Multidimensional Clusters With Support Lines | Jan 24, 2023 | Clustering, Synthetic Data Generation | Code Available | 1 |
| Diffusion-based Conditional ECG Generation with Structured State Space Models | Jan 19, 2023 | State Space Models, Synthetic Data Generation | Code Available | 1 |
| Boosting Synthetic Data Generation with Effective Nonlinear Causal Discovery | Jan 18, 2023 | Causal Discovery, software testing | Code Available | 1 |
| Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint | Jan 1, 2023 | Data Augmentation, Data-free Knowledge Distillation | Code Available | 1 |