| STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | May 30, 2025 | Video Generation | Code Available | 1 |
| Conformal Prediction for Zero-Shot Models | May 30, 2025 | Conformal Prediction, Prediction | Code Available | 1 |
| Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation | May 30, 2025 | Computational Efficiency, Sign Language Translation | Code Available | 1 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Beyond FACS: Data-driven Facial Expression Dictionaries, with Application to Predicting Autism | May 30, 2025 | | Code Available | 1 |
| Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin | May 30, 2025 | Denoising, Image Generation | Code Available | 1 |
| Efficient RAW Image Deblurring with Adaptive Frequency Modulation | May 30, 2025 | Computational Efficiency, Deblurring | Code Available | 1 |
| Learning Safety Constraints for Large Language Models | May 30, 2025 | Adversarial Attack | Code Available | 1 |
| Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting | May 30, 2025 | Decision Making | Code Available | 1 |
| EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | May 30, 2025 | Action Recognition, Graph Generation | Code Available | 1 |
| Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings | May 30, 2025 | Chunking, Computational Efficiency | Code Available | 1 |
| Beyond the LUMIR challenge: The pathway to foundational registration models | May 30, 2025 | Image Registration, Zero-shot Generalization | Code Available | 1 |
| Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | May 30, 2025 | Spatial Reasoning, Visual Reasoning | Code Available | 1 |
| Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation | May 30, 2025 | Instruction Following | Code Available | 1 |
| Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | May 30, 2025 | Mixture-of-Experts | Code Available | 1 |
| IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models | May 30, 2025 | Image Restoration | Code Available | 1 |
| SiLVR: A Simple Language-based Video Reasoning Framework | May 30, 2025 | Math, MME | Code Available | 1 |
| Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings | May 30, 2025 | In-Context Learning | Code Available | 1 |
| Sorrel: A simple and flexible framework for multi-agent reinforcement learning | May 30, 2025 | Multi-agent Reinforcement Learning, Philosophy | Code Available | 1 |
| Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | May 30, 2025 | Autonomous Driving, Math | Code Available | 1 |
| ByzFL: Research Framework for Robust Federated Learning | May 30, 2025 | Benchmarking, Federated Learning | Code Available | 1 |
| Large Language Models are Locally Linear Mappings | May 30, 2025 | | Code Available | 1 |
| HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts | May 30, 2025 | ARC, General Knowledge | Code Available | 1 |
| The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | May 30, 2025 | Hallucination, Mathematical Reasoning | Code Available | 1 |
| Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research | May 30, 2025 | Mathematical Reasoning | Code Available | 1 |
| 3D Gaussian Splat Vulnerabilities | May 30, 2025 | 3DGS, Adversarial Attack | Code Available | 1 |
| un^2CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP | May 30, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 1 |
| DisTime: Distribution-based Time Representation for Video Large Language Models | May 30, 2025 | Temporal Localization, Video Understanding | Code Available | 1 |
| Towards Effective Code-Integrated Reasoning | May 30, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 1 |
| Boosting All-in-One Image Restoration via Self-Improved Privilege Learning | May 30, 2025 | All, Image Restoration | Code Available | 1 |
| Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model | May 30, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| ProxyThinker: Test-Time Guidance through Small Visual Reasoners | May 30, 2025 | Visual Reasoning | Code Available | 1 |
| TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence | May 30, 2025 | | Code Available | 1 |
| ScienceMeter: Tracking Scientific Knowledge Updates in Language Models | May 30, 2025 | | Code Available | 1 |
| BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization | May 30, 2025 | | Code Available | 1 |
| Reinforcing Video Reasoning with Focused Thinking | May 30, 2025 | Data Augmentation, Visual Reasoning | Code Available | 1 |
| RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement | May 30, 2025 | Image Enhancement, Low-Light Image Enhancement | Code Available | 1 |
| Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation | May 30, 2025 | All, Benchmarking | Code Available | 1 |
| FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification | May 29, 2025 | Anomaly Detection, Contrastive Learning | Code Available | 1 |
| Cora: Correspondence-aware image editing using few step diffusion | May 29, 2025 | Image-to-Image Translation, Semantic correspondence | Code Available | 1 |
| Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages | May 29, 2025 | Diversity, Prompt Learning | Code Available | 1 |
| Label-Guided In-Context Learning for Named Entity Recognition | May 29, 2025 | In-Context Learning, named-entity-recognition | Code Available | 1 |
| LADA: Scalable Label-Specific CLIP Adapter for Continual Learning | May 29, 2025 | Continual Learning | Code Available | 1 |
| Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint | May 29, 2025 | Image Captioning, Question Answering | Code Available | 1 |
| CrossLinear: Plug-and-Play Cross-Correlation Embedding for Time Series Forecasting with Exogenous Variables | May 29, 2025 | Time Series, Time Series Forecasting | Code Available | 1 |
| The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets | May 29, 2025 | | Code Available | 1 |
| Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation | May 29, 2025 | | Code Available | 1 |
| Directed Graph Grammars for Sequence-based Learning | May 29, 2025 | Bayesian Optimization, Graph Generation | Code Available | 1 |
| Accelerating AllReduce with a Persistent Straggler | May 29, 2025 | GPU | Code Available | 1 |
| How does Transformer Learn Implicit Reasoning? | May 29, 2025 | Clustering, Diagnostic | Code Available | 1 |