| From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning | May 30, 2025 | Diversity, Language Modeling | Unverified | 0 |
| How much do language models memorize? | May 30, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval | May 30, 2025 | Emotion Recognition, Retrieval | Unverified | 0 |
| ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | May 30, 2025 | Image Generation, Language Modeling | Code Available | 2 |
| Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning | May 30, 2025 | Question Answering, Reinforcement Learning (RL) | Unverified | 0 |
| ByzFL: Research Framework for Robust Federated Learning | May 30, 2025 | Benchmarking, Federated Learning | Code Available | 1 |
| PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations | May 30, 2025 | | Code Available | 2 |
| The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | May 30, 2025 | Hallucination, Mathematical Reasoning | Code Available | 1 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | May 30, 2025 | Video Generation | Code Available | 1 |
| Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting | May 30, 2025 | Decision Making | Code Available | 1 |
| Learning Safety Constraints for Large Language Models | May 30, 2025 | Adversarial Attack | Code Available | 1 |
| Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | May 30, 2025 | Benchmarking, Blocking | Code Available | 2 |
| Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists? | May 30, 2025 | | Code Available | 0 |
| WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions | May 30, 2025 | | Code Available | 0 |
| MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge | May 30, 2025 | Emotion Recognition, Self-Supervised Learning | Code Available | 0 |
| RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation | May 30, 2025 | Code Generation, Diversity | Code Available | 0 |
| AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | May 30, 2025 | GPU, Math | Code Available | 7 |
| Invariant Link Selector for Spatial-Temporal Out-of-Distribution Problem | May 30, 2025 | Citation Recommendation, Link Prediction | Code Available | 0 |
| ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration | May 30, 2025 | Low-rank compression | Code Available | 0 |
| Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining | May 30, 2025 | Sensitivity | Code Available | 0 |
| Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors | May 30, 2025 | Misinformation, Text Detection | Code Available | 0 |
| VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | May 30, 2025 | Question Answering, Spatial Reasoning | Code Available | 1 |
| Multi-criteria Rank-based Aggregation for Explainable AI | May 30, 2025 | Decision Making, Feature Importance | Code Available | 0 |
| Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin | May 30, 2025 | Denoising, Image Generation | Code Available | 1 |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | May 30, 2025 | Math, Multiple-choice | Code Available | 0 |
| Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion | May 30, 2025 | | Code Available | 0 |
| Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings | May 30, 2025 | Chunking, Computational Efficiency | Code Available | 1 |
| Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion | May 30, 2025 | In-Context Learning, Voice Conversion | Unverified | 0 |
| The Gaussian Mixing Mechanism: Renyi Differential Privacy via Gaussian Sketches | May 30, 2025 | Federated Learning | Code Available | 0 |
| RealDrive: Retrieval-Augmented Driving with Diffusion Models | May 30, 2025 | Denoising, RAG | Unverified | 0 |
| AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders | May 30, 2025 | Response Generation | Unverified | 0 |
| LLM Inference Enhanced by External Knowledge: A Survey | May 30, 2025 | Hallucination, Knowledge Graphs | Code Available | 0 |
| PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches | May 30, 2025 | Binary Classification, Classification | Code Available | 0 |
| Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings | May 30, 2025 | In-Context Learning | Code Available | 1 |
| TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores | May 30, 2025 | 3DGS | Code Available | 2 |
| Statistical mechanics of extensive-width Bayesian neural networks near interpolation | May 30, 2025 | | Code Available | 0 |
| EVA-MILP: Towards Standardized Evaluation of MILP Instance Generation | May 30, 2025 | | Code Available | 0 |
| Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation | May 30, 2025 | Instruction Following | Code Available | 1 |
| Predicting the Past: Estimating Historical Appraisals with OCR and Machine Learning | May 30, 2025 | Optical Character Recognition (OCR) | Code Available | 0 |
| Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability | May 30, 2025 | | Code Available | 0 |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | May 30, 2025 | Continual Pretraining, Fairness | Code Available | 0 |
| LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews | May 30, 2025 | Binary Classification, Question Answering | Code Available | 0 |
| LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text | May 30, 2025 | Quantization | Code Available | 0 |
| When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways | May 30, 2025 | Continual Learning, Image Augmentation | Code Available | 2 |
| Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | May 30, 2025 | Mixture-of-Experts | Code Available | 1 |
| REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards | May 30, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 5 |
| Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | May 30, 2025 | Autonomous Driving, Math | Code Available | 1 |
| ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving | May 30, 2025 | Autonomous Driving, Decision Making | Unverified | 0 |
| Federated Foundation Model for GI Endoscopy Images | May 30, 2025 | model, Privacy Preserving | Unverified | 0 |