| Beyond the LUMIR challenge: The pathway to foundational registration models | May 30, 2025 | Image Registration, Zero-shot Generalization | Code Available | 1 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | May 26, 2025 | Zero-shot Generalization | Code Available | 2 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | Unverified | 0 |
| ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving | May 26, 2025 | Autonomous Driving, Bench2Drive | Code Available | 1 |
| WHISTRESS: Enriching Transcriptions with Sentence Stress Detection | May 25, 2025 | Sentence, Zero-shot Generalization | Unverified | 0 |
| G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning | May 24, 2025 | Link Prediction, Node Classification | Unverified | 0 |
| Anchored Diffusion Language Model | May 24, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing | May 23, 2025 | de novo peptide sequencing, Reranking | Code Available | 1 |
| EasyInsert: A Data-Efficient and Generalizable Insertion Policy | May 22, 2025 | Pose Prediction, Zero-shot Generalization | Unverified | 0 |
| CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning | May 22, 2025 | Zero-shot Generalization | Unverified | 0 |
| AnyBody: A Benchmark Suite for Cross-Embodiment Manipulation | May 21, 2025 | Zero-shot Generalization | Unverified | 0 |
| Prompt Tuning Vision Language Models with Margin Regularizer for Few-Shot Learning under Distribution Shifts | May 21, 2025 | Few-Shot Learning, Task 2 | Code Available | 0 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | May 21, 2025 | Vision-Language-Action, Zero-shot Generalization | Code Available | 2 |
| gen2seg: Generative Models Enable Generalizable Instance Segmentation | May 21, 2025 | Decoder, Instance Segmentation | Unverified | 0 |
| EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | May 21, 2025 | Motion Planning, Vision-Language-Action | Unverified | 0 |
| A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in LLMs | May 19, 2025 | Machine Translation, named-entity-recognition | Code Available | 0 |
| ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | May 19, 2025 | Graph Generation, Knowledge Distillation | Unverified | 0 |
| AoP-SAM: Automation of Prompts for Efficient Segmentation | May 17, 2025 | Image Segmentation, Prompt Engineering | Unverified | 0 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 |
| GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction | May 16, 2025 | General Knowledge, Zero-shot Generalization | Code Available | 0 |
| Depth Anything with Any Prior | May 15, 2025 | Depth Completion, Depth Estimation | Unverified | 0 |
| NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning | May 15, 2025 | Novel View Synthesis, Robot Manipulation | Unverified | 0 |
| Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | May 14, 2025 | Denoising, Depth Estimation | Code Available | 7 |
| Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing | May 14, 2025 | cross-modal alignment, Denoising | Unverified | 0 |
| Foundation Models Knowledge Distillation For Battery Capacity Degradation Forecast | May 13, 2025 | Knowledge Distillation, Time Series | Code Available | 1 |