| ISCUTE: Instance Segmentation of Cables Using Text Embedding | Feb 19, 2024 | Instance Segmentation, Object Recognition | Unverified | 0 |
| Triple-Encoders: Representations That Fire Together, Wire Together | Feb 19, 2024 | Contrastive Learning, Representation Learning | Code Available | 1 |
| From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning | Feb 19, 2024 | Motion Planning, Task and Motion Planning | Unverified | 0 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Feb 18, 2024 | Denoising, Robot Manipulation | Code Available | 3 |
| Unsupervised Discovery of Object-Centric Neural Fields | Feb 12, 2024 | Object, Object Discovery | Unverified | 0 |
| On the Out-Of-Distribution Generalization of Multimodal Large Language Models | Feb 9, 2024 | In-Context Learning, Out-of-Distribution Generalization | Unverified | 0 |
| Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control | Feb 9, 2024 | Zero-shot Generalization | Code Available | 0 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | Feb 8, 2024 | parameter-efficient fine-tuning, Zero-shot Generalization | Code Available | 2 |
| InCoRo: In-Context Learning for Robotics Control with Feedback Loops | Feb 7, 2024 | In-Context Learning, Scene Understanding | Unverified | 0 |
| Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains | Feb 6, 2024 | TAG, Zero-shot Generalization | Code Available | 1 |
| Image-Caption Encoding for Improving Zero-Shot Generalization | Feb 5, 2024 | Image Classification | Code Available | 0 |
| Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Feb 4, 2024 | Contact-rich Manipulation, Zero-shot Generalization | Code Available | 2 |
| Symbol: Generating Flexible Black-Box Optimizers through Symbolic Equation Learning | Feb 4, 2024 | Meta-Learning, Zero-shot Generalization | Code Available | 1 |
| Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model | Jan 31, 2024 | Image Segmentation, parameter-efficient fine-tuning | Unverified | 0 |
| Data-Free Generalized Zero-Shot Learning | Jan 28, 2024 | Generalized Zero-Shot Learning, zero-shot-classification | Code Available | 0 |
| InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Jan 24, 2024 | document understanding, Question Answering | Code Available | 2 |
| Solving Continual Offline Reinforcement Learning with Decision Transformer | Jan 16, 2024 | Offline RL, reinforcement-learning | Unverified | 0 |
| Exploring the Best Practices of Query Expansion with Large Language Models | Jan 12, 2024 | Information Retrieval, Re-Ranking | Code Available | 1 |
| An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models | Jan 12, 2024 | Active Learning, Diversity | Unverified | 0 |
| MatSAM: Efficient Extraction of Microstructures of Materials via Visual Large Model | Jan 11, 2024 | Image Segmentation, Prompt Engineering | Code Available | 1 |
| Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness | Jan 9, 2024 | Adversarial Robustness, Zero-shot Generalization | Code Available | 1 |
| TimeGraphs: Graph-based Temporal Reasoning | Jan 6, 2024 | Zero-shot Generalization | Unverified | 0 |
| ConfusionPrompt: Practical Private Inference for Online Large Language Models | Dec 30, 2023 | Privacy Preserving, Zero-shot Generalization | Unverified | 0 |
| Semantic Guidance Tuning for Text-To-Image Diffusion Models | Dec 26, 2023 | Zero-shot Generalization | Code Available | 2 |
| Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Dec 20, 2023 | Robot Manipulation, Zero-shot Generalization | Code Available | 2 |
| Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning | Dec 19, 2023 | Diversity, Instruction Following | Unverified | 0 |
| A Dual Curriculum Learning Framework for Multi-UAV Pursuit-Evasion in Diverse Environments | Dec 19, 2023 | Reinforcement Learning (RL), Zero-shot Generalization | Unverified | 0 |
| Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey | Dec 15, 2023 | Image Generation, Image Segmentation | Unverified | 0 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 |
| MmAP: Multi-modal Alignment Prompt for Cross-domain Multi-task Learning | Dec 14, 2023 | Decoder, Language Modelling | Unverified | 0 |
| How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Dec 12, 2023 | Anomaly Detection, Autonomous Driving | Code Available | 1 |
| Adaptive Human Trajectory Prediction via Latent Corridors | Dec 11, 2023 | Prediction, Trajectory Prediction | Unverified | 0 |
| Multi-View Unsupervised Image Generation with Cross Attention Guidance | Dec 7, 2023 | Hard Attention, Image Generation | Unverified | 0 |
| MuRF: Multi-Baseline Radiance Fields | Dec 7, 2023 | NeRF, Zero-shot Generalization | Code Available | 1 |
| Large Language Models are Good Prompt Learners for Low-Shot Image Classification | Dec 7, 2023 | Classification, Few-Shot Image Classification | Code Available | 1 |
| Boosting Segment Anything Model Towards Open-Vocabulary Learning | Dec 6, 2023 | model, Object | Code Available | 1 |
| MASP: Scalable GNN-based Planning for Multi-Agent Navigation | Dec 5, 2023 | Reinforcement Learning (RL), Zero-shot Generalization | Unverified | 0 |
| I-PHYRE: Interactive Physical Reasoning | Dec 4, 2023 | Zero-shot Generalization | Unverified | 0 |
| Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Dec 4, 2023 | Depth Estimation, GPU | Code Available | 4 |
| Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent | Nov 30, 2023 | Autonomous Vehicles, Common Sense Reasoning | Unverified | 0 |
| Large Model Based Referring Camouflaged Object Detection | Nov 28, 2023 | model, Object | Unverified | 0 |
| UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | Nov 28, 2023 | Benchmarking, Information Retrieval | Unverified | 0 |
| C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing | Nov 27, 2023 | Language Modelling, Prompt Learning | Unverified | 0 |
| VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning | Nov 25, 2023 | Decoder, Model Optimization | Code Available | 1 |
| A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs | Nov 21, 2023 | Object Detection | Unverified | 0 |
| Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders | Nov 16, 2023 | Data Augmentation, Domain Generalization | Code Available | 1 |
| Neural-Logic Human-Object Interaction Detection | Nov 16, 2023 | Decoder, Human-Object Interaction Detection | Code Available | 1 |
| Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Nov 15, 2023 | Question Answering, Sentence | Code Available | 0 |
| Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech | Nov 15, 2023 | Contrastive Learning, Cross-corpus | Code Available | 0 |
| Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels | Nov 12, 2023 | Pathfinder, Visual Reasoning | Unverified | 0 |