| Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Apr 4, 2024 | 3D Scene Reconstruction, Depth Estimation | Code Available | 2 | 5 |
| 2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection | Jun 15, 2023 | Anomaly Detection, Anomaly Localization | Code Available | 2 | 5 |
| Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Mar 3, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 | 5 |
| Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model | Mar 17, 2024 | Image Restoration, Zero-shot Generalization | Code Available | 2 | 5 |
| SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Apr 6, 2025 | Multi-Object Tracking, Object | Code Available | 2 | 5 |
| GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Jun 18, 2024 | Benchmarking, Depth Estimation | Code Available | 2 | 5 |
| RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Mar 12, 2024 | Change Detection, Zero-shot Generalization | Code Available | 2 | 5 |
| Segment Any Anomaly without Training via Hybrid Prompt Regularization | May 18, 2023 | Anomaly Detection, Anomaly Localization | Code Available | 2 | 5 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Jun 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Dec 17, 2024 | Denoising | Code Available | 2 | 5 |
| RecGPT: A Foundation Model for Sequential Recommendation | Jun 6, 2025 | Decoder, model | Code Available | 2 | 5 |
| Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Dec 20, 2023 | Robot Manipulation, Zero-shot Generalization | Code Available | 2 | 5 |
| Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Feb 4, 2024 | Contact-rich Manipulation, Zero-shot Generalization | Code Available | 2 | 5 |
| InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Jan 24, 2024 | document understanding, Question Answering | Code Available | 2 | 5 |
| PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Sep 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 | 5 |
| On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? | May 3, 2024 | Computational Efficiency, Prompt Learning | Code Available | 2 | 5 |
| No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Apr 4, 2024 | Benchmarking, Image Generation | Code Available | 2 | 5 |
| OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Aug 16, 2024 | Prediction, Traffic Prediction | Code Available | 2 | 5 |
| Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Mar 28, 2025 | Descriptive, Image Quality Assessment | Code Available | 2 | 5 |
| Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Apr 3, 2025 | Field Boundary Delineation, Instance Segmentation | Code Available | 2 | 5 |
| Multitask Prompted Training Enables Zero-Shot Task Generalization | Oct 15, 2021 | Benchmarking, Decoder | Code Available | 2 | 5 |
| Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Jul 3, 2024 | Domain Generalization, Knowledge Distillation | Code Available | 2 | 5 |
| NeRF-Supervised Deep Stereo | Mar 30, 2023 | NeRF, Neural Rendering | Code Available | 2 | 5 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignment, Instruction Following | Code Available | 2 | 5 |
| Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Oct 15, 2024 | Disentanglement, Inductive Bias | Code Available | 2 | 5 |
| Detecting Everything in the Open World: Towards Universal Object Detection | Mar 21, 2023 | object-detection, Object Detection | Code Available | 2 | 5 |
| IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Sep 9, 2024 | Denoising, Speech Enhancement | Code Available | 2 | 5 |
| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Nov 26, 2024 | GPU, Image Generation | Code Available | 2 | 5 |
| Matryoshka Diffusion Models | Oct 23, 2023 | Image Generation, Zero-shot Generalization | Code Available | 2 | 5 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | Feb 8, 2024 | parameter-efficient fine-tuning, Zero-shot Generalization | Code Available | 2 | 5 |
| Autoregressive Image Generation with Randomized Parallel Decoding | Mar 13, 2025 | Conditional Image Generation, Image Generation | Code Available | 2 | 5 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | May 26, 2025 | Zero-shot Generalization | Code Available | 2 | 5 |
| Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | Apr 19, 2023 | Information Retrieval, Passage Ranking | Code Available | 2 | 5 |
| Crosslingual Generalization through Multitask Finetuning | Nov 3, 2022 | Coreference Resolution, Cross-Lingual Transfer | Code Available | 2 | 5 |
| Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Mar 12, 2025 | Zero-shot Generalization | Code Available | 2 | 5 |
| EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce | Aug 14, 2023 | Diversity, Instruction Following | Code Available | 2 | 5 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality Assessment, Language Modeling | Code Available | 2 | 5 |
| vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Nov 26, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 2 | 5 |
| Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning | Jan 19, 2021 | reinforcement-learning, Reinforcement Learning (RL) | Code Available | 1 | 5 |
| CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation | Oct 6, 2021 | Image Generation, Text to Image Generation | Code Available | 1 | 5 |
| LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models | Feb 6, 2025 | zero-shot-classification, Zero-shot Generalization | Code Available | 1 | 5 |
| CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers | Apr 9, 2024 | Knowledge Distillation, Zero-shot Generalization | Code Available | 1 | 5 |
| How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Dec 12, 2023 | Anomaly Detection, Autonomous Driving | Code Available | 1 | 5 |
| M^3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation | May 25, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Gradient Ascent Post-training Enhances Language Model Generalization | Jun 12, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| GOMAA-Geo: GOal Modality Agnostic Active Geo-localization | Jun 4, 2024 | Contrastive Learning, geo-localization | Code Available | 1 | 5 |
| MAgNet: Mesh Agnostic Neural PDE Solver | Oct 11, 2022 | Zero-shot Generalization | Code Available | 1 | 5 |
| Generalization to New Actions in Reinforcement Learning | Nov 3, 2020 | reinforcement-learning, Reinforcement Learning | Code Available | 1 | 5 |
| Digital Twin-Enhanced Wireless Indoor Navigation: Achieving Efficient Environment Sensing with Zero-Shot Reinforcement Learning | Jun 11, 2023 | Navigate, reinforcement-learning | Code Available | 1 | 5 |
| Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks | Oct 31, 2017 | Machine Translation, Translation | Code Available | 1 | 5 |