| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Nov 26, 2024 | GPU, Image Generation | Code Available | 2 |
| WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Nov 8, 2024 | Task Planning, Zero-shot Generalization | Code Available | 2 |
| BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities | Oct 18, 2024 | Conditional Image Generation, Image Generation | Code Available | 2 |
| Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Oct 15, 2024 | Disentanglement, Inductive Bias | Code Available | 2 |
| PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Sep 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Sep 9, 2024 | Denoising, Speech Enhancement | Code Available | 2 |
| GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Aug 26, 2024 | Few-Shot Learning, Image Generation | Code Available | 2 |
| OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Aug 16, 2024 | Prediction, Traffic Prediction | Code Available | 2 |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Jul 26, 2024 | Depth Estimation, GPU | Code Available | 2 |
| Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Jul 3, 2024 | Domain Generalization, Knowledge Distillation | Code Available | 2 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Jun 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Jun 18, 2024 | Benchmarking, Depth Estimation | Code Available | 2 |
| Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Jun 5, 2024 | Few-Shot Learning, Language Modeling | Code Available | 2 |
| On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? | May 3, 2024 | Computational Efficiency, Prompt Learning | Code Available | 2 |
| GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis | Apr 9, 2024 | Image Generation, Zero-shot Generalization | Code Available | 2 |
| No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Apr 4, 2024 | Benchmarking, Image Generation | Code Available | 2 |
| Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning | Apr 4, 2024 | 3D Scene Reconstruction, Depth Estimation | Code Available | 2 |
| Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model | Mar 17, 2024 | Image Restoration, Zero-shot Generalization | Code Available | 2 |
| RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Mar 12, 2024 | Change Detection, Zero-shot Generalization | Code Available | 2 |
| Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Mar 3, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | Feb 8, 2024 | parameter-efficient fine-tuning, Zero-shot Generalization | Code Available | 2 |
| Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning | Feb 4, 2024 | Contact-rich Manipulation, Zero-shot Generalization | Code Available | 2 |
| InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Jan 24, 2024 | document understanding, Question Answering | Code Available | 2 |
| Semantic Guidance Tuning for Text-To-Image Diffusion Models | Dec 26, 2023 | Zero-shot Generalization | Code Available | 2 |
| Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Dec 20, 2023 | Robot Manipulation, Zero-shot Generalization | Code Available | 2 |