| Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation | Mar 20, 2025 | Depth Estimation, Image Reconstruction | Unverified | 0 |
| Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance | Mar 20, 2025 | Prompt Engineering, Zero-shot Generalization | Unverified | 0 |
| STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding | Mar 20, 2025 | Video Understanding, Zero-shot Generalization | Code Available | 1 |
| GenM^3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation | Mar 19, 2025 | Large Language Model, Motion Generation | Unverified | 0 |
| Learning with Expert Abstractions for Efficient Multi-Task Continuous Control | Mar 19, 2025 | continuous-control, Continuous Control | Code Available | 0 |
| Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better | Mar 19, 2025 | Attribute, Reinforcement Learning (RL) | Unverified | 0 |
| Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach | Mar 18, 2025 | 6D Pose Estimation, Pose Estimation | Unverified | 0 |
| Compound Expression Recognition via Large Vision-Language Models | Mar 14, 2025 | Emotion Recognition, Zero-shot Generalization | Unverified | 0 |
| Autoregressive Image Generation with Randomized Parallel Decoding | Mar 13, 2025 | Conditional Image Generation, Image Generation | Code Available | 2 |
| Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Mar 12, 2025 | Zero-shot Generalization | Code Available | 2 |
| Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation | Mar 11, 2025 | Domain Generalization, Language Modeling | Code Available | 0 |
| A Recipe for Improving Remote Sensing VLM Zero Shot Generalization | Mar 10, 2025 | Cross-Modal Retrieval, Zero-Shot Cross-Modal Retrieval | Unverified | 0 |
| PE3R: Perception-Efficient 3D Reconstruction | Mar 10, 2025 | 3D Reconstruction, Zero-shot Generalization | Code Available | 3 |
| PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM | Mar 10, 2025 | Decoder, Pose Estimation | Unverified | 0 |
| Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain Generalization, Object Detection | Code Available | 4 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality Assessment, Language Modeling | Code Available | 2 |
| OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | Mar 5, 2025 | Vision-Language-Action, Zero-shot Generalization | Unverified | 0 |
| RAILGUN: A Unified Convolutional Policy for Multi-Agent Path Finding Across Different Environments and Tasks | Mar 4, 2025 | Multi-Agent Path Finding, Zero-shot Generalization | Unverified | 0 |
| Nature-Inspired Population-Based Evolution of Large Language Models | Mar 3, 2025 | GPU, Zero-shot Generalization | Code Available | 1 |
| Re-Imagining Multimodal Instruction Tuning: A Representation View | Mar 2, 2025 | Instruction Following, MME | Code Available | 0 |
| Delving into Out-of-Distribution Detection with Medical Vision-Language Models | Mar 2, 2025 | Benchmarking, image-classification | Code Available | 1 |
| Contrastive Learning of English Language and Crystal Graphs for Multimodal Representation of Materials Knowledge | Feb 23, 2025 | Contrastive Learning, Zero-shot Generalization | Unverified | 0 |
| Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Feb 20, 2025 | Reinforcement Learning (RL), Zero-shot Generalization | Unverified | 0 |
| GeLLMO: Generalizing Large Language Models for Multi-property Molecule Optimization | Feb 19, 2025 | Zero-shot Generalization | Code Available | 0 |
| WRT-SAM: Foundation Model-Driven Segmentation for Generalized Weld Radiographic Testing | Feb 17, 2025 | Anomaly Detection, Image Segmentation | Unverified | 0 |