| IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition | Sep 18, 2024 | Imitation Learning, Reinforcement Learning (RL) | Unverified | 0 |
| ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | Sep 16, 2024 | Autonomous Driving, motion prediction | Code Available | 1 |
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | Sep 16, 2024 | Benchmarking, Object Recognition | Unverified | 0 |
| PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Sep 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| AnySkin: Plug-and-play Skin Sensing for Robotic Touch | Sep 12, 2024 | Zero-shot Generalization | Unverified | 0 |
| IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Sep 9, 2024 | Denoising, Speech Enhancement | Code Available | 2 |
| TanDepth: Leveraging Global DEMs for Metric Monocular Depth Estimation in UAVs | Sep 8, 2024 | Depth Estimation, Monocular Depth Estimation | Unverified | 0 |
| Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance | Aug 27, 2024 | Decoder, object-detection | Code Available | 1 |
| GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Aug 26, 2024 | Few-Shot Learning, Image Generation | Code Available | 2 |
| Segment Anything Model for Grain Characterization in Hard Drive Design | Aug 22, 2024 | Zero-shot Generalization | Unverified | 0 |
| Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment | Aug 22, 2024 | Multi-Task Learning, Retrieval | Unverified | 0 |
| Generalizable Facial Expression Recognition | Aug 20, 2024 | Domain Adaptation, Facial Expression Recognition | Code Available | 1 |
| Zero-Shot Object-Centric Representation Learning | Aug 17, 2024 | Object, Object Discovery | Unverified | 0 |
| OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction | Aug 16, 2024 | Prediction, Traffic Prediction | Code Available | 2 |
| One Shot is Enough for Sequential Infrared Small Target Segmentation | Aug 9, 2024 | One-Shot Segmentation, Segmentation | Code Available | 0 |
| Performance and Non-adversarial Robustness of the Segment Anything Model 2 in Surgical Video Segmentation | Aug 7, 2024 | Adversarial Robustness, Image Segmentation | Unverified | 0 |
| Visual Grounding for Object-Level Generalization in Reinforcement Learning | Aug 4, 2024 | Language Modelling, Object | Code Available | 1 |
| HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling | Aug 2, 2024 | Diversity, Zero-shot Generalization | Code Available | 0 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Jul 26, 2024 | Depth Estimation, GPU | Code Available | 2 |
| HDL-GPT: High-Quality HDL is All You Need | Jul 25, 2024 | All, Code Generation | Unverified | 0 |
| SSTD: Stripe-Like Space Target Detection Using Single-Point Weak Supervision | Jul 25, 2024 | Pseudo Label, Zero-shot Generalization | Unverified | 0 |
| Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models | Jul 22, 2024 | Zero-shot Generalization | Code Available | 1 |
| OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Jul 19, 2024 | Scene Understanding, Spatial Reasoning | Unverified | 0 |
| BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models | Jul 18, 2024 | Hallucination, Language Modelling | Unverified | 0 |