| Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior | Mar 29, 2024 | NeRF | Code Available | 2 |
| FairCLIP: Harnessing Fairness in Vision-Language Learning | Mar 29, 2024 | Fairness | Code Available | 2 |
| MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation | Mar 29, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 2 |
| Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Mar 29, 2024 | Pose Estimation, regression | Code Available | 2 |
| AgileFormer: Spatially Agile Transformer UNet for Medical Image Segmentation | Mar 29, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| Efficient Modulation for Vision Networks | Mar 29, 2024 | GPU | Code Available | 2 |
| SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects | Mar 29, 2024 | 3D Object Detection, 3D Object Detection From Monocular Images | Code Available | 2 |
| DiJiang: Efficient Large Language Models through Compact Kernelization | Mar 29, 2024 | | Code Available | 2 |
| Motion Inversion for Video Customization | Mar 29, 2024 | Video Generation | Code Available | 2 |
| Fully Geometric Panoramic Localization | Mar 29, 2024 | Indoor Localization, Visual Localization | Code Available | 2 |
| Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | Mar 29, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Mar 29, 2024 | Continual Learning, Continual Panoptic Segmentation | Code Available | 2 |
| Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting | Mar 29, 2024 | Denoising, Image Inpainting | Code Available | 2 |
| StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation | Mar 29, 2024 | Image-to-Image Translation, Translation | Code Available | 2 |
| MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning | Mar 29, 2024 | Multi-Task Learning, parameter-efficient fine-tuning | Code Available | 2 |
| VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis | Mar 29, 2024 | Hallucination, Image Captioning | Code Available | 2 |
| FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation | Mar 29, 2024 | Blind Docking, Drug Discovery | Code Available | 2 |
| DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | Mar 29, 2024 | Object, Video Instance Segmentation | Code Available | 2 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2 |
| TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Mar 28, 2024 | 3D dense captioning, Dense Captioning | Code Available | 2 |
| MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | Mar 28, 2024 | AI Agent, Minecraft | Code Available | 2 |
| Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Mar 28, 2024 | 6D Pose Estimation using RGB, Keypoint Detection | Code Available | 2 |
| MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation | Mar 28, 2024 | Talking Head Generation | Code Available | 2 |
| Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | Mar 28, 2024 | Change Detection, Language Modelling | Code Available | 2 |
| Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction | Mar 28, 2024 | 3D geometry, 3D Reconstruction | Code Available | 2 |