| SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization | Jan 1, 2025 | Domain Generalization, Homography Estimation | Code Available | 1 |
| D^3: Scaling Up Deepfake Detection by Learning from Discrepancy | Jan 1, 2025 | DeepFake Detection, Face Swapping | Code Available | 1 |
| Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval | Jan 1, 2025 | Cross-Modal Retrieval, Retrieval | Code Available | 1 |
| Adapting Dense Matching for Homography Estimation with Grid-based Acceleration | Jan 1, 2025 | Homography Estimation | Code Available | 1 |
| FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection | Jan 1, 2025 | | Code Available | 1 |
| MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation | Jan 1, 2025 | Action Anticipation, Mamba | Code Available | 1 |
| Exploring Contextual Attribute Density in Referring Expression Counting | Jan 1, 2025 | Attribute, Referring Expression | Code Available | 1 |
| SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis | Jan 1, 2025 | Large Language Model | Code Available | 1 |
| AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning | Jan 1, 2025 | Self-Supervised Learning | Code Available | 1 |
| VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification | Jan 1, 2025 | Hallucination | Code Available | 1 |
| FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation | Jan 1, 2025 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 1 |
| Population Aware Diffusion for Time Series Generation | Jan 1, 2025 | Time Series, Time Series Generation | Code Available | 1 |
| Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection | Jan 1, 2025 | parameter estimation | Code Available | 1 |
| UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts | Jan 1, 2025 | Deblurring, Denoising | Code Available | 1 |
| Free Lunch Enhancements for Multi-modal Crowd Counting | Jan 1, 2025 | cross-modal alignment, Crowd Counting | Code Available | 1 |
| Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding | Jan 1, 2025 | Arithmetic Reasoning, Language Modeling | Code Available | 1 |
| Multimodal Large Models Are Effective Action Anticipators | Jan 1, 2025 | Action Anticipation, Long Term Action Anticipation | Code Available | 1 |
| PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention | Jan 1, 2025 | Intrinsic Image Decomposition | Code Available | 1 |
| PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers | Jan 1, 2025 | Autonomous Driving, Pose Estimation | Code Available | 1 |
| Docopilot: Improving Multimodal Models for Document-Level Understanding | Jan 1, 2025 | document understanding, RAG | Code Available | 1 |
| Less is More: Token Context-aware Learning for Object Tracking | Jan 1, 2025 | Object Tracking, Visual Tracking | Code Available | 1 |
| VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond | Jan 1, 2025 | Hyperspectral Image Super-Resolution, Image Restoration | Code Available | 1 |
| Foreground-Covering Prototype Generation and Matching for SAM-Aided Few-Shot Segmentation | Jan 1, 2025 | | Code Available | 1 |
| OW-OVD: Unified Open World and Open Vocabulary Object Detection | Jan 1, 2025 | Attribute, Incremental Learning | Code Available | 1 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement | Jan 1, 2025 | cross-modal alignment, Knowledge Distillation | Code Available | 1 |
| CSC-PA: Cross-image Semantic Correlation via Prototype Attentions for Single-network Semi-supervised Breast Tumor Segmentation | Jan 1, 2025 | Image Segmentation, Lesion Segmentation | Code Available | 1 |
| T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting | Jan 1, 2025 | Denoising, Object Counting | Code Available | 1 |
| Implicit Correspondence Learning for Image-to-Point Cloud Registration | Jan 1, 2025 | Image to Point Cloud Registration, Point Cloud Registration | Code Available | 1 |
| Column Property Annotation using Large Language Models | Jan 1, 2025 | Columns Property Annotation, Column Type Annotation | Code Available | 1 |
| ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-based Few-shot Learning | Jan 1, 2025 | Few-Shot Learning, Image Generation | Code Available | 1 |
| A Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts | Jan 1, 2025 | Image Restoration, Rain Removal | Code Available | 1 |
| Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling | Jan 1, 2025 | Denoising, Image Denoising | Code Available | 1 |
| Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration | Jan 1, 2025 | class-incremental learning, Class Incremental Learning | Code Available | 1 |
| LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs | Jan 1, 2025 | In-Context Learning, Meta-Learning | Code Available | 1 |
| Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration | Jan 1, 2025 | Mamba, Video Restoration | Code Available | 1 |
| Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation | Jan 1, 2025 | Denoising, Test-time Adaptation | Code Available | 1 |
| Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Jan 1, 2025 | cross-modal alignment, Denoising | Code Available | 1 |
| Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images | Jan 1, 2025 | Decoder | Code Available | 1 |
| DV-Matcher: Deformation-based Non-rigid Point Cloud Matching Guided by Pre-trained Visual Features | Jan 1, 2025 | | Code Available | 1 |
| On the Implementation of a Bayesian Optimization Framework for Interconnected Systems | Jan 1, 2025 | Bayesian Optimization, Chemical Process | Code Available | 1 |
| All-Day Multi-Camera Multi-Target Tracking | Jan 1, 2025 | All, Mamba | Code Available | 1 |
| LC-Mamba: Local and Continuous Mamba with Shifted Windows for Frame Interpolation | Jan 1, 2025 | Mamba | Code Available | 1 |
| GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector | Jan 1, 2025 | 3D Object Detection, NeRF | Code Available | 1 |
| Diffusion-based Event Generation for High-Quality Image Deblurring | Jan 1, 2025 | Deblurring, Image Deblurring | Code Available | 1 |
| Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater | Jan 1, 2025 | Deep Reinforcement Learning, Segmentation | Code Available | 1 |
| Learning to Filter Outlier Edges in Global SfM | Jan 1, 2025 | Binary Classification, Camera Pose Estimation | Code Available | 1 |
| HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration | Jan 1, 2025 | Point Cloud Registration | Code Available | 1 |
| AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting | Jan 1, 2025 | Disentanglement, model | Code Available | 1 |
| LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate | Jan 1, 2025 | Image Restoration | Code Available | 1 |
| Deep Change Monitoring: A Hyperbolic Representative Learning Framework and a Dataset for Long-term Fine-grained Tree Change Detection | Jan 1, 2025 | Change Detection, Face Anti-Spoofing | Code Available | 1 |