| Synthetica: Large Scale Synthetic Data for Robot Perception | Oct 28, 2024 | Data Augmentation, object-detection | Unverified | 0 |
| zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation | Oct 28, 2024 | Binary Classification, Generative Adversarial Network | Unverified | 0 |
| Rethinking Data Synthesis: A Teacher Model Training Recipe with Interpretation | Oct 27, 2024 | GSM8K, Language Modeling | Unverified | 0 |
| Little Giants: Synthesizing High-Quality Embedding Data at Scale | Oct 24, 2024 | Synthetic Data Generation | Code Available | 0 |
| Privacy-hardened and hallucination-resistant synthetic data generation with logic-solvers | Oct 22, 2024 | Generative Adversarial Network, Hallucination | Unverified | 0 |
| No more hard prompts: SoftSRV prompting for synthetic data generation | Oct 21, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| LLM4GRN: Discovering Causal Gene Regulatory Networks with LLMs -- Evaluation through Synthetic Data Generation | Oct 21, 2024 | Synthetic Data Generation | Unverified | 0 |
| Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning | Oct 21, 2024 | Spatial Reasoning, Synthetic Data Generation | Unverified | 0 |
| Synthetic Data Generation for Residential Load Patterns via Recurrent GAN and Ensemble Method | Oct 20, 2024 | Diversity, Generative Adversarial Network | Unverified | 0 |
| ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions | Oct 18, 2024 | Hallucination, Natural Questions | Code Available | 0 |
| CCUP: A Controllable Synthetic Data Generation Pipeline for Pretraining Cloth-Changing Person Re-Identification Models | Oct 17, 2024 | Cloth-Changing Person Re-Identification, Person Re-Identification | Code Available | 0 |
| A Little Human Data Goes A Long Way | Oct 17, 2024 | Fact Verification, Question Answering | Code Available | 0 |
| Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection | Oct 16, 2024 | Hallucination, In-Context Learning | Unverified | 0 |
| CONSULT: Contrastive Self-Supervised Learning for Few-shot Tumor Detection | Oct 15, 2024 | Anomaly Detection, Contrastive Learning | Unverified | 0 |
| LLM-based Code-Switched Text Generation for Grammatical Error Correction | Oct 14, 2024 | Grammatical Error Correction, Synthetic Data Generation | Unverified | 0 |
| TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs | Oct 14, 2024 | Synthetic Data Generation | Code Available | 0 |
| DFIMat: Decoupled Flexible Interactive Matting in Multi-Person Scenarios | Oct 13, 2024 | Image Matting, Synthetic Data Generation | Code Available | 0 |
| Driving Privacy Forward: Mitigating Information Leakage within Smart Vehicles through Synthetic Data Generation | Oct 11, 2024 | Synthetic Data Generation | Code Available | 0 |
| SimpleStrat: Diversifying Language Model Generation with Stratification | Oct 11, 2024 | Diversity, Language Modeling | Unverified | 0 |
| JurEE not Judges: safeguarding llm interactions with small, specialised Encoder Ensembles | Oct 11, 2024 | Decision Making, Synthetic Data Generation | Unverified | 0 |
| Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains | Oct 10, 2024 | Fairness, Privacy Preserving | Unverified | 0 |
| Unsupervised Data Validation Methods for Efficient Model Training | Oct 10, 2024 | Data Augmentation, model | Unverified | 0 |
| Fill In The Gaps: Model Calibration and Generalization with Synthetic Data | Oct 7, 2024 | Diversity, PAC learning | Unverified | 0 |
| Privacy Vulnerabilities in Marginals-based Synthetic Data | Oct 7, 2024 | Inference Attack, Membership Inference Attack | Unverified | 0 |
| SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification | Oct 7, 2024 | image-classification, Image Classification | Code Available | 1 |
| Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation | Oct 4, 2024 | Domain Adaptation, Hallucination | Unverified | 0 |
| Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images | Oct 4, 2024 | Disentanglement, Pose Estimation | Unverified | 0 |
| Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Oct 3, 2024 | HumanEval, Synthetic Data Generation | Code Available | 1 |
| Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective | Oct 2, 2024 | Synthetic Data Generation | Code Available | 0 |
| Restoring Super-High Resolution GPS Mobility Data | Oct 1, 2024 | Decoder, Synthetic Data Generation | Unverified | 0 |
| Targeted synthetic data generation for tabular data via hardness characterization | Oct 1, 2024 | Data Augmentation, Data Valuation | Code Available | 0 |
| Improved Generation of Synthetic Imaging Data Using Feature-Aligned Diffusion | Oct 1, 2024 | Diversity, SSIM | Code Available | 0 |
| DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation | Sep 30, 2024 | Code Generation, Synthetic Data Generation | Code Available | 0 |
| DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining | Sep 30, 2024 | Continual Pretraining, Domain Adaptation | Unverified | 0 |
| Balancing Cost and Effectiveness of Synthetic Data Generation Strategies for LLMs | Sep 29, 2024 | Synthetic Data Generation | Unverified | 0 |
| Differentially Private Non Parametric Copulas: Generating synthetic data with non parametric copulas under privacy guarantees | Sep 27, 2024 | Privacy Preserving, Synthetic Data Generation | Unverified | 0 |
| Preserving logical and functional dependencies in synthetic tabular data | Sep 26, 2024 | Attribute, Synthetic Data Generation | Code Available | 0 |
| Artificial Data Point Generation in Clustered Latent Space for Small Medical Datasets | Sep 26, 2024 | Overall - Test, Synthetic Data Generation | Unverified | 0 |
| KIPPS: Knowledge infusion in Privacy Preserving Synthetic Data Generation | Sep 25, 2024 | Attribute, Knowledge Graphs | Unverified | 0 |
| Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints | Sep 24, 2024 | Dataset Generation, Privacy Preserving | Code Available | 0 |
| MANTA -- Model Adapter Native generations that's Affordable | Sep 22, 2024 | Diversity, model | Unverified | 0 |
| CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair | Sep 19, 2024 | Code Generation, Code Repair | Code Available | 0 |
| Making Large Language Models into World Models with Precondition and Effect Knowledge | Sep 18, 2024 | Synthetic Data Generation | Unverified | 0 |
| Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation | Sep 18, 2024 | Dataset Generation, Management | Unverified | 0 |
| Qwen2.5-Coder Technical Report | Sep 18, 2024 | Code Generation | Code Available | 11 |
| Synthetic data augmentation for robotic mobility aids to support blind and low vision people | Sep 17, 2024 | Data Augmentation, Diversity | Unverified | 0 |
| Enhanced segmentation of femoral bone metastasis in CT scans of patients using synthetic data generation with 3D diffusion models | Sep 17, 2024 | Denoising, Segmentation | Unverified | 0 |
| SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records | Sep 13, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data | Sep 12, 2024 | Synthetic Data Generation | Unverified | 0 |
| Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Sep 12, 2024 | Multi-hop Question Answering, Question Answering | Unverified | 0 |