| S^3: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models | Dec 6, 2024 | zero-shot-classification, Zero-shot Generalization | Unverified | 0 |
| Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Dec 5, 2024 | Stereo Matching, Zero-shot Generalization | Code Available | 3 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| UTSD: Unified Time Series Diffusion Model | Dec 4, 2024 | Denoising, model | Unverified | 0 |
| The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control | Dec 4, 2024 | Zero-shot Generalization | Unverified | 0 |
| COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection | Nov 28, 2024 | object-detection, Object Detection | Code Available | 1 |
| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Nov 26, 2024 | GPU, Image Generation | Code Available | 2 |
| vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Nov 26, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 2 |
| Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Nov 26, 2024 | Decoder, multimodal generation | Unverified | 0 |
| Generating Out-Of-Distribution Scenarios Using Language Models | Nov 25, 2024 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models | Nov 25, 2024 | Domain Generalization, Prompt Learning | Unverified | 0 |
| Context-Aware Multimodal Pretraining | Nov 22, 2024 | Contrastive Learning, Representation Learning | Unverified | 0 |
| SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation | Nov 19, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 0 |
| HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments | Nov 19, 2024 | Deep Reinforcement Learning, Robot Navigation | Unverified | 0 |
| Scalable Autoregressive Monocular Depth Estimation | Nov 18, 2024 | Depth Estimation, Monocular Depth Estimation | Unverified | 0 |
| MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models | Nov 15, 2024 | Instruction Following, Zero-shot Generalization | Code Available | 0 |
| Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos | Nov 14, 2024 | 4D reconstruction, Self-Supervised Learning | Unverified | 0 |
| Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching | Nov 14, 2024 | Depth Estimation, Knowledge Distillation | Unverified | 0 |
| WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Nov 8, 2024 | Task Planning, Zero-shot Generalization | Code Available | 2 |
| In the Era of Prompt Learning with Vision-Language Models | Nov 7, 2024 | Domain Adaptation, Domain Generalization | Unverified | 0 |
| Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity | Nov 7, 2024 | Diversity, Meta Reinforcement Learning | Code Available | 0 |
| Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli | Nov 3, 2024 | Optical Flow Estimation, Semantic Segmentation | Code Available | 0 |
| ZIM: Zero-Shot Image Matting for Anything | Nov 1, 2024 | Image Inpainting, Image Matting | Code Available | 3 |
| JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking | Oct 31, 2024 | Code Completion, Open-Domain Question Answering | Unverified | 0 |
| Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning | Oct 31, 2024 | Graph Neural Network, reinforcement-learning | Unverified | 0 |