| Zero-Shot Monocular Scene Flow Estimation in the Wild | Jan 17, 2025 | Depth Estimation, Prediction | Unverified | 0 |
| StereoGen: High-quality Stereo Image Generation from a Single Image | Jan 15, 2025 | Depth Estimation, Image Generation | Unverified | 0 |
| Capability-Aware Shared Hypernetworks for Flexible Heterogeneous Multi-Robot Coordination | Jan 10, 2025 | Diversity, Imitation Learning | Code Available | 0 |
| Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation | Jan 8, 2025 | Code Generation, Language Modeling | Unverified | 0 |
| MADation: Face Morphing Attack Detection with Foundation Models | Jan 7, 2025 | Face Morphing Attack Detection, Face Recognition | Code Available | 0 |
| Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models | Jan 3, 2025 | Zero-shot Generalization | Code Available | 0 |
| On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach | Jan 1, 2025 | Adversarial Robustness, Zero-shot Generalization | Unverified | 0 |
| On the Out-Of-Distribution Generalization of Large Multimodal Models | Jan 1, 2025 | In-Context Learning, Out-of-Distribution Generalization | Unverified | 0 |
| From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models | Dec 31, 2024 | Decision Making, Zero-shot Generalization | Unverified | 0 |
| EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation | Dec 25, 2024 | Object, Zero-shot Generalization | Unverified | 0 |
| Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio | Dec 23, 2024 | Contrastive Learning, Prompt Learning | Unverified | 0 |
| Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees | Dec 21, 2024 | Graph Neural Network, In-Context Learning | Code Available | 0 |
| Zero-Shot Generalization for Blockage Localization in mmWave Communication | Dec 18, 2024 | Self-Supervised Learning, Zero-shot Generalization | Unverified | 0 |
| Efficient Fine-Tuning of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction | Dec 18, 2024 | Drug Discovery, Zero-shot Generalization | Unverified | 0 |
| Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer | Dec 18, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 0 |
| Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion | Dec 18, 2024 | Denoising, Depth Completion | Unverified | 0 |
| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Dec 12, 2024 | Image Comprehension, Image Generation | Unverified | 0 |
| WiFo: Wireless Foundation Model for Channel Prediction | Dec 12, 2024 | model, Multi-Task Learning | Unverified | 0 |
| Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion | Dec 11, 2024 | GPU, Image Segmentation | Code Available | 0 |
| Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models | Dec 11, 2024 | Disentanglement, Position | Unverified | 0 |
| ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning | Dec 10, 2024 | Evolutionary Algorithms, Lifelong learning | Code Available | 0 |
| S^3: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models | Dec 6, 2024 | zero-shot-classification, Zero-shot Generalization | Unverified | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| UTSD: Unified Time Series Diffusion Model | Dec 4, 2024 | Denoising, model | Unverified | 0 |
| The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control | Dec 4, 2024 | Zero-shot Generalization | Unverified | 0 |
| Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Nov 26, 2024 | Decoder, multimodal generation | Unverified | 0 |
| Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models | Nov 25, 2024 | Domain Generalization, Prompt Learning | Unverified | 0 |
| Generating Out-Of-Distribution Scenarios Using Language Models | Nov 25, 2024 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| Context-Aware Multimodal Pretraining | Nov 22, 2024 | Contrastive Learning, Representation Learning | Unverified | 0 |
| SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation | Nov 19, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 0 |
| HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments | Nov 19, 2024 | Deep Reinforcement Learning, Robot Navigation | Unverified | 0 |
| Scalable Autoregressive Monocular Depth Estimation | Nov 18, 2024 | Depth Estimation, Monocular Depth Estimation | Unverified | 0 |
| MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models | Nov 15, 2024 | Instruction Following, Zero-shot Generalization | Code Available | 0 |
| Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos | Nov 14, 2024 | 4D reconstruction, Self-Supervised Learning | Unverified | 0 |
| Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching | Nov 14, 2024 | Depth Estimation, Knowledge Distillation | Unverified | 0 |
| In the Era of Prompt Learning with Vision-Language Models | Nov 7, 2024 | Domain Adaptation, Domain Generalization | Unverified | 0 |
| Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity | Nov 7, 2024 | Diversity, Meta Reinforcement Learning | Code Available | 0 |
| Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli | Nov 3, 2024 | Optical Flow Estimation, Semantic Segmentation | Code Available | 0 |
| Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning | Oct 31, 2024 | Graph Neural Network, reinforcement-learning | Unverified | 0 |
| JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking | Oct 31, 2024 | Code Completion, Open-Domain Question Answering | Unverified | 0 |
| GHIL-Glue: Hierarchical Control with Filtered Subgoal Images | Oct 26, 2024 | Imitation Learning, Video Prediction | Unverified | 0 |
| Adversarial Environment Design via Regret-Guided Diffusion Models | Oct 25, 2024 | Deep Reinforcement Learning, Diversity | Unverified | 0 |
| Random Policy Enables In-Context Reinforcement Learning within Trust Horizons | Oct 25, 2024 | In-Context Learning, In-Context Reinforcement Learning | Unverified | 0 |
| BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Oct 24, 2024 | Instruction Following, Natural Language Understanding | Unverified | 0 |
| LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias | Oct 22, 2024 | 3DGS, Decoder | Unverified | 0 |
| DEL-Ranking: Ranking-Correction Denoising Framework for Elucidating Molecular Affinities in DNA-Encoded Libraries | Oct 19, 2024 | Denoising, Zero-shot Generalization | Unverified | 0 |
| MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Oct 14, 2024 | Transfer Learning, Video Recognition | Code Available | 0 |
| On the Evaluation of Generative Robotic Simulations | Oct 10, 2024 | Diversity, text similarity | Unverified | 0 |
| Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels | Oct 10, 2024 | Motion Forecasting, Zero-shot Generalization | Unverified | 0 |
| Zero-Shot Generalization of Vision-Based RL Without Data Augmentation | Oct 9, 2024 | Data Augmentation, Disentanglement | Unverified | 0 |