| IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition | Sep 18, 2024 | Imitation Learning, Reinforcement Learning (RL) | —Unverified | 0 |
| ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | Sep 16, 2024 | Autonomous Driving, motion prediction | Code Available | 1 |
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | Sep 16, 2024 | Benchmarking, Object Recognition | —Unverified | 0 |
| PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Sep 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| AnySkin: Plug-and-play Skin Sensing for Robotic Touch | Sep 12, 2024 | Zero-shot Generalization | —Unverified | 0 |
| IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Sep 9, 2024 | Denoising, Speech Enhancement | Code Available | 2 |
| TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs | Sep 8, 2024 | Depth Estimation, Monocular Depth Estimation | —Unverified | 0 |
| Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance | Aug 27, 2024 | Decoder, object-detection | Code Available | 1 |
| GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Aug 26, 2024 | Few-Shot Learning, Image Generation | Code Available | 2 |
| Segment Anything Model for Grain Characterization in Hard Drive Design | Aug 22, 2024 | Zero-shot Generalization | —Unverified | 0 |
| Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment | Aug 22, 2024 | Multi-Task Learning, Retrieval | —Unverified | 0 |
| Generalizable Facial Expression Recognition | Aug 20, 2024 | Domain Adaptation, Facial Expression Recognition | Code Available | 1 |
| Zero-Shot Object-Centric Representation Learning | Aug 17, 2024 | Object, Object Discovery | —Unverified | 0 |
| OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Aug 16, 2024 | Prediction, Traffic Prediction | Code Available | 2 |
| One Shot is Enough for Sequential Infrared Small Target Segmentation | Aug 9, 2024 | One-Shot Segmentation, Segmentation | Code Available | 0 |
| Performance and Non-adversarial Robustness of the Segment Anything Model 2 in Surgical Video Segmentation | Aug 7, 2024 | Adversarial Robustness, Image Segmentation | —Unverified | 0 |
| Visual Grounding for Object-Level Generalization in Reinforcement Learning | Aug 4, 2024 | Language Modelling, Object | Code Available | 1 |
| HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling | Aug 2, 2024 | Diversity, Zero-shot Generalization | Code Available | 0 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Jul 26, 2024 | Depth Estimation, GPU | Code Available | 2 |
| HDL-GPT: High-Quality HDL is All You Need | Jul 25, 2024 | All, Code Generation | —Unverified | 0 |
| SSTD: Stripe-Like Space Target Detection Using Single-Point Weak Supervision | Jul 25, 2024 | Pseudo Label, Zero-shot Generalization | —Unverified | 0 |
| Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models | Jul 22, 2024 | Zero-shot Generalization | Code Available | 1 |
| OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Jul 19, 2024 | Scene Understanding, Spatial Reasoning | —Unverified | 0 |
| BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models | Jul 18, 2024 | Hallucination, Language Modelling | —Unverified | 0 |
| Disentangling Representations through Multi-task Learning | Jul 15, 2024 | Decision Making, Multi-Task Learning | —Unverified | 0 |
| ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | Jul 13, 2024 | Autonomous Driving, Motion Estimation | Code Available | 1 |
| Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting | Jul 12, 2024 | Autonomous Driving, Deep Learning | —Unverified | 0 |
| Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Jul 11, 2024 | Anomaly Detection, Autonomous Vehicles | —Unverified | 0 |
| Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization | Jul 11, 2024 | Data Augmentation, Domain Generalization | —Unverified | 0 |
| Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Jul 10, 2024 | Few-Shot Learning, GPU | Code Available | 0 |
| Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation | Jul 10, 2024 | Instance Segmentation, Semantic Segmentation | Code Available | 1 |
| Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Jul 3, 2024 | Domain Generalization, Knowledge Distillation | Code Available | 2 |
| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | Jul 1, 2024 | cross-modal alignment, Image Retrieval | —Unverified | 0 |
| A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation | Jun 29, 2024 | Combinatorial Optimization, reinforcement-learning | Code Available | 1 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Jun 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| NeuralSCF: Neural network self-consistent fields for density functional theory | Jun 22, 2024 | Zero-shot Generalization | —Unverified | 0 |
| GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Jun 18, 2024 | Benchmarking, Depth Estimation | Code Available | 2 |
| Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers | Jun 17, 2024 | Motion Forecasting, Zero-shot Generalization | —Unverified | 0 |
| Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity | Jun 17, 2024 | Continual Learning, Zero-shot Generalization | Code Available | 0 |
| RobustSAM: Segment Anything Robustly on Degraded Images | Jun 13, 2024 | Deblurring, Image Dehazing | Code Available | 3 |
| Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning | Jun 13, 2024 | Zero-shot Generalization | Code Available | 0 |
| Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Jun 5, 2024 | Few-Shot Learning, Language Modeling | Code Available | 2 |
| Prompt-based Visual Alignment for Zero-shot Policy Transfer | Jun 5, 2024 | Autonomous Driving, Language Modelling | —Unverified | 0 |
| GOMAA-Geo: GOal Modality Agnostic Active Geo-localization | Jun 4, 2024 | Contrastive Learning, geo-localization | Code Available | 1 |
| OLIVE: Object Level In-Context Visual Embeddings | Jun 2, 2024 | Object, Zero-shot Generalization | Code Available | 0 |
| μLO: Compute-Efficient Meta-Generalization of Learned Optimizers | May 31, 2024 | GPU, Zero-shot Generalization | Code Available | 1 |
| Text-only Synthesis for Image Captioning | May 28, 2024 | Image Captioning, Language Modelling | —Unverified | 0 |
| TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability | May 27, 2024 | Adversarial Robustness, Knowledge Distillation | —Unverified | 0 |
| Benchmarking General-Purpose In-Context Learning | May 27, 2024 | Benchmarking, Decision Making | —Unverified | 0 |