| FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation | Mar 29, 2024 | Blind Docking, Drug Discovery | Code Available | 2 |
| MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation | Mar 29, 2024 | Image Segmentation, Medical Image Analysis | Code Available | 2 |
| Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Mar 29, 2024 | Pose Estimation, regression | Code Available | 2 |
| FairCLIP: Harnessing Fairness in Vision-Language Learning | Mar 29, 2024 | Fairness | Code Available | 2 |
| VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis | Mar 29, 2024 | Hallucination, Image Captioning | Code Available | 2 |
| Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | Mar 29, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| AgileFormer: Spatially Agile Transformer UNet for Medical Image Segmentation | Mar 29, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| SceneTracker: Long-term Scene Flow Estimation Network | Mar 29, 2024 | 3D Object Tracking, Object Tracking | Code Available | 2 |
| Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior | Mar 29, 2024 | NeRF | Code Available | 2 |
| DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | Mar 29, 2024 | Object, Video Instance Segmentation | Code Available | 2 |
| Efficient Modulation for Vision Networks | Mar 29, 2024 | GPU | Code Available | 2 |
| Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting | Mar 29, 2024 | Denoising, Image Inpainting | Code Available | 2 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction Following, Language Modelling | Code Available | 2 |
| StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation | Mar 29, 2024 | Image-to-Image Translation, Translation | Code Available | 2 |
| ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Mar 29, 2024 | Continual Learning, Continual Panoptic Segmentation | Code Available | 2 |
| Fully Geometric Panoramic Localization | Mar 29, 2024 | Indoor Localization, Visual Localization | Code Available | 2 |
| MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning | Mar 29, 2024 | Multi-Task Learning, parameter-efficient fine-tuning | Code Available | 2 |
| Motion Inversion for Video Customization | Mar 29, 2024 | Video Generation | Code Available | 2 |
| DiJiang: Efficient Large Language Models through Compact Kernelization | Mar 29, 2024 | | Code Available | 2 |
| Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | Mar 28, 2024 | Change Detection, Language Modelling | Code Available | 2 |
| MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation | Mar 28, 2024 | Talking Head Generation | Code Available | 2 |
| DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | Mar 28, 2024 | Fine-Grained Image Classification, Image Classification | Code Available | 2 |
| Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction | Mar 28, 2024 | 3D geometry, 3D Reconstruction | Code Available | 2 |
| A Review of Graph Neural Networks in Epidemic Modeling | Mar 28, 2024 | Epidemiology | Code Available | 2 |
| GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM | Mar 28, 2024 | Simultaneous Localization and Mapping | Code Available | 2 |
| RecDiffusion: Rectangling for Image Stitching with Diffusion Models | Mar 28, 2024 | Image Stitching | Code Available | 2 |
| Infrared Small Target Detection with Scale and Location Sensitivity | Mar 28, 2024 | Sensitivity | Code Available | 2 |
| Disentangling Length from Quality in Direct Preference Optimization | Mar 28, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 2 |
| Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM | Mar 28, 2024 | Code Generation, HumanEval | Code Available | 2 |
| TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Mar 28, 2024 | 3D dense captioning, Dense Captioning | Code Available | 2 |
| Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Mar 28, 2024 | 6D Pose Estimation using RGB, Keypoint Detection | Code Available | 2 |
| Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving | Mar 28, 2024 | Autonomous Driving, Language Modeling | Code Available | 2 |
| GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving | Mar 28, 2024 | Autonomous Driving | Code Available | 2 |
| BAMM: Bidirectional Autoregressive Motion Model | Mar 28, 2024 | Denoising, model | Code Available | 2 |
| SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing | Mar 28, 2024 | | Code Available | 2 |
| OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation | Mar 28, 2024 | 3D Object Detection, Novel Class Discovery | Code Available | 2 |
| MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | Mar 28, 2024 | AI Agent, Minecraft | Code Available | 2 |
| Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction | Mar 27, 2024 | 3D Generation, 3DGS | Code Available | 2 |
| A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint | Mar 27, 2024 | Image Dehazing, Pseudo Label | Code Available | 2 |
| Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes | Mar 27, 2024 | Grasp Generation | Code Available | 2 |
| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Mar 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| IDGenRec: LLM-RecSys Alignment with Textual ID Learning | Mar 27, 2024 | Sequential Recommendation, Text Generation | Code Available | 2 |
| Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | Mar 27, 2024 | Attribute, Decision Making | Code Available | 2 |
| Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction | Mar 27, 2024 | Image Captioning, Language Modeling | Code Available | 2 |
| Attention Calibration for Disentangled Text-to-Image Personalization | Mar 27, 2024 | Image Generation, Novel Concepts | Code Available | 2 |
| Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation | Mar 27, 2024 | Mamba, Speech Separation | Code Available | 2 |
| Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding | Mar 27, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| LITA: Language Instructed Temporal-Localization Assistant | Mar 27, 2024 | Instruction Following, Temporal Localization | Code Available | 2 |
| Generative Medical Segmentation | Mar 27, 2024 | Decoder, Domain Generalization | Code Available | 2 |
| Garment3DGen: 3D Garment Stylization and Texture Generation | Mar 27, 2024 | Image to 3D, Texture Synthesis | Code Available | 2 |