| Point Segment and Count: A Generalized Framework for Object Counting | Jan 1, 2024 | Few-shot Object Counting and Detection, Knowledge Distillation | Code Available | 2 |
| FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models | Jan 1, 2024 | Decoder, Denoising | Code Available | 2 |
| Real-World Mobile Image Denoising Dataset with Efficient Baselines | Jan 1, 2024 | Denoising, Image Denoising | Code Available | 2 |
| MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning | Jan 1, 2024 | Multi-Task Learning, parameter-efficient fine-tuning | Code Available | 2 |
| Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models | Jan 1, 2024 | Survey | Code Available | 2 |
| FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation | Jan 1, 2024 | Action Segmentation, Segmentation | Code Available | 2 |
| Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Jan 1, 2024 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 2 |
| Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring | Jan 1, 2024 | Deblurring | Code Available | 2 |
| The More You See in 2D the More You Perceive in 3D | Jan 1, 2024 | 3D Reconstruction, Image to 3D | Code Available | 2 |
| Day-Night Cross-domain Vehicle Re-identification | Jan 1, 2024 | Vehicle Re-Identification | Code Available | 2 |
| A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Jan 1, 2024 | Age Estimation, Benchmarking | Code Available | 2 |
| An Empirical Study of Scaling Law for Scene Text Recognition | Jan 1, 2024 | Optical Character Recognition (OCR), Scene Text Recognition | Code Available | 2 |
| MMA: Multi-Modal Adapter for Vision-Language Models | Jan 1, 2024 | Domain Generalization, General Knowledge | Code Available | 2 |
| BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition | Jan 1, 2024 | Action Recognition, Skeleton Based Action Recognition | Code Available | 2 |
| SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency | Jan 1, 2024 | Dynamic Time Warping | Code Available | 2 |
| Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases | Jan 1, 2024 | | Code Available | 2 |
| DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement | Jan 1, 2024 | Diversity, Scene Flow Estimation | Code Available | 2 |
| MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation | Jan 1, 2024 | Segmentation, Video Segmentation | Code Available | 2 |
| Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens | Jan 1, 2024 | Semantic Segmentation | Code Available | 2 |
| CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution | Jan 1, 2024 | Diversity, Image Super-Resolution | Code Available | 2 |
| Scaled Decoupled Distillation | Jan 1, 2024 | Knowledge Distillation | Code Available | 2 |
| ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios | Jan 1, 2024 | | Code Available | 2 |
| Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly | Jan 1, 2024 | Anomaly Detection | Code Available | 2 |
| Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion | Jan 1, 2024 | Infrared And Visible Image Fusion | Code Available | 2 |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Jan 1, 2024 | Descriptive, Object | Code Available | 2 |
| LiSA: LiDAR Localization with Semantic Awareness | Jan 1, 2024 | Knowledge Distillation, Semantic Segmentation | Code Available | 2 |
| Exploring Orthogonality in Open World Object Detection | Jan 1, 2024 | Incremental Learning, Object | Code Available | 2 |
| D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval | Jan 1, 2024 | Image Retrieval, Retrieval | Code Available | 2 |
| When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation | Jan 1, 2024 | Attribute, Disentanglement | Code Available | 2 |
| Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding | Jan 1, 2024 | Attribute | Code Available | 2 |
| ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention | Jan 1, 2024 | Blocking | Code Available | 2 |
| Exposure Bracketing Is All You Need For A High-Quality Image | Jan 1, 2024 | All, Deblurring | Code Available | 2 |
| MRFS: Mutually Reinforcing Image Fusion and Segmentation | Jan 1, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation | Jan 1, 2024 | General Knowledge, Navigate | Code Available | 2 |
| DiffLoc: Diffusion Model for Outdoor LiDAR Localization | Jan 1, 2024 | Denoising, model | Code Available | 2 |
| MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting | Dec 31, 2023 | Multivariate Time Series Forecasting, Time Series | Code Available | 2 |
| Masked Modeling for Self-supervised Representation Learning on Vision and Beyond | Dec 31, 2023 | Representation Learning, Self-Supervised Learning | Code Available | 2 |
| RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models | Dec 31, 2023 | Hallucination, RAG | Code Available | 2 |
| Improving the Stability and Efficiency of Diffusion Models for Content Consistent Super-Resolution | Dec 30, 2023 | Decoder, Image Generation | Code Available | 2 |
| Visual Point Cloud Forecasting enables Scalable Autonomous Driving | Dec 29, 2023 | 3D geometry, Autonomous Driving | Code Available | 2 |
| MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining | Dec 29, 2023 | GPU, Language Modeling | Code Available | 2 |
| Overview of the PromptCBLUE Shared Task in CHIP2023 | Dec 29, 2023 | In-Context Learning | Code Available | 2 |
| SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation | Dec 28, 2023 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 |
| Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | Dec 28, 2023 | Decoder, Image Generation | Code Available | 2 |
| SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Dec 28, 2023 | Pose Estimation, Visual Odometry | Code Available | 2 |
| Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels | Dec 28, 2023 | Aesthetics Quality Assessment, Image Quality Assessment | Code Available | 2 |
| One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts | Dec 28, 2023 | All, Anatomy | Code Available | 2 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 |
| ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe | Dec 28, 2023 | Object, Object Tracking | Code Available | 2 |
| Any-point Trajectory Modeling for Policy Learning | Dec 28, 2023 | Trajectory Modeling, Transfer Learning | Code Available | 2 |