| Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation | May 21, 2025 | Data Augmentation, Diversity | Unverified | 0 |
| Challenges and Limitations in the Synthetic Generation of mHealth Sensor Data | May 20, 2025 | Data Augmentation, Synthetic Data Generation | Unverified | 0 |
| Scaling Low-Resource MT via Synthetic Data Generation with LLMs | May 20, 2025 | Machine Translation, Synthetic Data Generation | Unverified | 0 |
| LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation | May 17, 2025 | Automated Theorem Proving, Synthetic Data Generation | Code Available | 0 |
| BLEUBERI: BLEU is a surprisingly effective reward for instruction following | May 16, 2025 | Instruction Following, Synthetic Data Generation | Code Available | 1 |
| RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization | May 16, 2025 | RAG, Synthetic Data Generation | Code Available | 1 |
| RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs | May 15, 2025 | Knowledge Graphs, Natural Language Queries | Unverified | 0 |
| Robust Federated Learning with Confidence-Weighted Filtering and GAN-Based Completion under Noisy and Incomplete Data | May 14, 2025 | Federated Learning, Missing Labels | Unverified | 0 |
| Privacy-Preserving Analytics for Smart Meter (AMI) Data: A Hybrid Approach to Comply with CPUC Privacy Regulations | May 13, 2025 | Econometrics, Federated Learning | Unverified | 0 |
| Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | May 12, 2025 | AI Agent, Knowledge Distillation | Code Available | 2 |
| Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data | May 12, 2025 | Program Repair, Synthetic Data Generation | Unverified | 0 |
| Uni-AIMS: AI-Powered Microscopy Image Analysis | May 11, 2025 | Synthetic Data Generation | Unverified | 0 |
| Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language | May 10, 2025 | Language Identification, Synthetic Data Generation | Code Available | 0 |
| Generating Reliable Synthetic Clinical Trial Data: The Role of Hyperparameter Optimization and Domain Constraints | May 8, 2025 | Hyperparameter Optimization, Synthetic Data Generation | Unverified | 0 |
| SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation | May 8, 2025 | 3DGS, Data Augmentation | Code Available | 2 |
| AI-Generated Fall Data: Assessing LLMs and Diffusion Model for Wearable Fall Detection | May 7, 2025 | Synthetic Data Generation | Unverified | 0 |
| Improving Omics-Based Classification: The Role of Feature Selection and Synthetic Data Generation | May 6, 2025 | Binary Classification, Classification | Unverified | 0 |
| Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models | May 6, 2025 | Diversity, Synthetic Data Generation | Code Available | 0 |
| Modeling supply chain compliance response strategies based on AI synthetic data with structural path regression: A Simulation Study of EU 2027 Mandatory Labor Regulations | May 4, 2025 | regression, Synthetic Data Generation | Unverified | 0 |
| Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models | May 2, 2025 | Diversity, Reading Comprehension | Unverified | 0 |
| ReasonIR: Training Retrievers for Reasoning Tasks | Apr 29, 2025 | Information Retrieval, MMLU | Code Available | 3 |
| Artificial Intelligence for Personalized Prediction of Alzheimer's Disease Progression: A Survey of Methods, Data Challenges, and Future Directions | Apr 29, 2025 | Causal Inference, Federated Learning | Unverified | 0 |
| Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation | Apr 29, 2025 | Benchmarking, Fairness | Code Available | 0 |
| Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Apr 28, 2025 | Synthetic Data Generation | Code Available | 3 |
| Towards Ball Spin and Trajectory Analysis in Table Tennis Broadcast Videos via Physically Grounded Synthetic-to-Real Transfer | Apr 28, 2025 | Monocular 3D Object Localization, Sports Analytics | Code Available | 1 |
| Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing | Apr 27, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Apr 25, 2025 | Clinical Language Translation, Machine Translation | Code Available | 1 |
| TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation | Apr 24, 2025 | Synthetic Data Generation, Time Series | Unverified | 0 |
| Optimizing the Privacy-Utility Balance using Synthetic Data and Configurable Perturbation Pipelines | Apr 24, 2025 | Privacy Preserving, Synthetic Data Generation | Unverified | 0 |
| A Comprehensive Survey of Synthetic Tabular Data Generation | Apr 23, 2025 | Privacy Preserving, Survey | Code Available | 1 |
| ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving | Apr 23, 2025 | Code Generation, Synthetic Data Generation | Unverified | 0 |
| Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code | Apr 23, 2025 | Instruction Following, Privacy Preserving | Unverified | 0 |
| A Statistical Approach for Synthetic EEG Data Generation | Apr 22, 2025 | EEG, Electroencephalogram (EEG) | Code Available | 0 |
| Learning from Reasoning Failures via Synthetic Data Generation | Apr 20, 2025 | Synthetic Data Generation | Unverified | 0 |
| Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation | Apr 17, 2025 | Synthetic Data Generation | Unverified | 0 |
| MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation | Apr 17, 2025 | Diversity, Domain Adaptation | Unverified | 0 |
| Synthetic Data for Blood Vessel Network Extraction | Apr 16, 2025 | Graph Generation, Image Generation | Unverified | 0 |
| Evaluating the Diversity and Quality of LLM Generated Content | Apr 16, 2025 | Diversity, Synthetic Data Generation | Unverified | 0 |
| Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task | Apr 15, 2025 | 2D Object Detection, Object | Unverified | 0 |
| Leveraging Vertical Public-Private Split for Improved Synthetic Data Generation | Apr 15, 2025 | Synthetic Data Generation | Unverified | 0 |
| Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation | Apr 11, 2025 | Depth Estimation, Instance Segmentation | Code Available | 0 |
| SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data | Apr 11, 2025 | Decoder, Image Segmentation | Unverified | 0 |
| Enhancing Metabolic Syndrome Prediction with Hybrid Data Balancing and Counterfactuals | Apr 9, 2025 | counterfactual, Synthetic Data Generation | Code Available | 0 |
| MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection | Apr 9, 2025 | Data Augmentation, Diversity | Unverified | 0 |
| SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog | Apr 9, 2025 | Synthetic Data Generation | Code Available | 0 |
| A Self-Supervised Framework for Space Object Behaviour Characterisation | Apr 8, 2025 | Anomaly Detection, Earth Observation | Unverified | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8K, Math | Unverified | 0 |
| CORTEX-AVD: A Framework for CORner Case Testing and EXploration in Autonomous Vehicle Development | Apr 4, 2025 | Autonomous Vehicles, Synthetic Data Generation | Unverified | 0 |
| Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data | Apr 3, 2025 | Computational Efficiency, Synthetic Data Generation | Unverified | 0 |
| Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation | Apr 2, 2025 | Cross-Lingual Transfer, Decoder | Unverified | 0 |