| Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning | Dec 19, 2023 | Diversity, Instruction Following | Unverified | 0 |
| A Dual Curriculum Learning Framework for Multi-UAV Pursuit-Evasion in Diverse Environments | Dec 19, 2023 | Reinforcement Learning (RL), Zero-shot Generalization | Unverified | 0 |
| Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey | Dec 15, 2023 | Image Generation, Image Segmentation | Unverified | 0 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 |
| MmAP: Multi-modal Alignment Prompt for Cross-domain Multi-task Learning | Dec 14, 2023 | Decoder, Language Modelling | Unverified | 0 |
| How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Dec 12, 2023 | Anomaly Detection, Autonomous Driving | Code Available | 1 |
| Adaptive Human Trajectory Prediction via Latent Corridors | Dec 11, 2023 | Prediction, Trajectory Prediction | Unverified | 0 |
| Multi-View Unsupervised Image Generation with Cross Attention Guidance | Dec 7, 2023 | Hard Attention, Image Generation | Unverified | 0 |
| MuRF: Multi-Baseline Radiance Fields | Dec 7, 2023 | NeRF, Zero-shot Generalization | Code Available | 1 |
| Large Language Models are Good Prompt Learners for Low-Shot Image Classification | Dec 7, 2023 | Classification, Few-Shot Image Classification | Code Available | 1 |
| Boosting Segment Anything Model Towards Open-Vocabulary Learning | Dec 6, 2023 | model, Object | Code Available | 1 |
| MASP: Scalable GNN-based Planning for Multi-Agent Navigation | Dec 5, 2023 | Reinforcement Learning (RL), Zero-shot Generalization | Unverified | 0 |
| I-PHYRE: Interactive Physical Reasoning | Dec 4, 2023 | Zero-shot Generalization | Unverified | 0 |
| Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Dec 4, 2023 | Depth Estimation, GPU | Code Available | 4 |
| Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent | Nov 30, 2023 | Autonomous Vehicles, Common Sense Reasoning | Unverified | 0 |
| Large Model Based Referring Camouflaged Object Detection | Nov 28, 2023 | model, Object | Unverified | 0 |
| UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | Nov 28, 2023 | Benchmarking, Information Retrieval | Unverified | 0 |
| C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing | Nov 27, 2023 | Language Modelling, Prompt Learning | Unverified | 0 |
| VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning | Nov 25, 2023 | Decoder, Model Optimization | Code Available | 1 |
| A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs | Nov 21, 2023 | object-detection, Object Detection | Unverified | 0 |
| Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders | Nov 16, 2023 | Data Augmentation, Domain Generalization | Code Available | 1 |
| Neural-Logic Human-Object Interaction Detection | Nov 16, 2023 | Decoder, Human-Object Interaction Detection | Code Available | 1 |
| Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Nov 15, 2023 | Question Answering, Sentence | Code Available | 0 |
| Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech | Nov 15, 2023 | Contrastive Learning, Cross-corpus | Code Available | 0 |
| Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels | Nov 12, 2023 | Pathfinder, Visual Reasoning | Unverified | 0 |