| Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies | Jan 4, 2025 | Edge-computing, Knowledge Distillation | Code Available | 2 |
| Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers | Jan 4, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Navigation Variable-based Multi-objective Particle Swarm Optimization for UAV Path Planning with Kinematic Constraints | Jan 3, 2025 | Metaheuristic Optimization | Code Available | 2 |
| TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | Jan 3, 2025 | 3D Human Pose Estimation, Monocular 3D Human Pose Estimation | Code Available | 2 |
| UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery | Jan 3, 2025 | object-detection, Object Detection | Code Available | 2 |
| VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Jan 3, 2025 | Computational Efficiency, Scene Understanding | Code Available | 2 |
| PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping | Jan 3, 2025 | 3DGS, Surface Reconstruction | Code Available | 2 |
| FLAME: Financial Large-Language Model Assessment and Metrics Evaluation | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Metadata Conditioning Accelerates Language Model Pre-training | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Merging Context Clustering with Visual State Space Models for Medical Image Segmentation | Jan 3, 2025 | Clustering, Image Segmentation | Code Available | 2 |
| Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning | Jan 2, 2025 | Imputation, Retrieval | Code Available | 2 |
| R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization | Jan 2, 2025 | Data Augmentation, Visual Localization | Code Available | 2 |
| KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model | Jan 2, 2025 | MTEB Benchmark, Retrieval-augmented Generation | Code Available | 2 |
| Click-Calib: A Robust Extrinsic Calibration Method for Surround-View Systems | Jan 2, 2025 | | Code Available | 2 |
| RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Jan 2, 2025 | Audio Generation, text-to-speech | Code Available | 2 |
| High-Fidelity Lightweight Mesh Reconstruction from Point Clouds | Jan 1, 2025 | | Code Available | 2 |
| DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution | Jan 1, 2025 | Attribute | Code Available | 2 |
| Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis | Jan 1, 2025 | Visual Prompt Tuning | Code Available | 2 |
| 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining | Jan 1, 2025 | Optical Character Recognition (OCR) | Code Available | 2 |
| nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark | Jan 1, 2025 | Benchmarking, Image Segmentation | Code Available | 2 |
| Navigating Image Restoration with VAR's Distribution Alignment Prior | Jan 1, 2025 | Image Reconstruction, Image Restoration | Code Available | 2 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Jan 1, 2025 | Descriptive | Code Available | 2 |
| ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect | Jan 1, 2025 | | Code Available | 2 |
| One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency | Jan 1, 2025 | Object | Code Available | 2 |
| Adaptive Keyframe Sampling for Long Video Understanding | Jan 1, 2025 | Video Understanding | Code Available | 2 |
| MATCHA: Towards Matching Anything | Jan 1, 2025 | Point Tracking | Code Available | 2 |
| MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots | Jan 1, 2025 | NeRF | Code Available | 2 |
| HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver | Jan 1, 2025 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| Structure-from-Motion with a Non-Parametric Camera Model | Jan 1, 2025 | | Code Available | 2 |
| AutoPresent: Designing Structured Visuals from Scratch | Jan 1, 2025 | Image Generation | Code Available | 2 |
| BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer | Jan 1, 2025 | Data Augmentation | Code Available | 2 |
| VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration | Jan 1, 2025 | | Code Available | 2 |
| LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging | Jan 1, 2025 | Lesion Segmentation, Segmentation | Code Available | 2 |
| TrustRAG: Enhancing Robustness and Trustworthiness in RAG | Jan 1, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection | Jan 1, 2025 | Defect Detection | Code Available | 2 |
| Samba: A Unified Mamba-based Framework for General Salient Object Detection | Jan 1, 2025 | Mamba, object-detection | Code Available | 2 |
| Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Jan 1, 2025 | Hallucination, Response Generation | Code Available | 2 |
| RORem: Training a Robust Object Remover with Human-in-the-Loop | Jan 1, 2025 | Object | Code Available | 2 |
| RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions | Dec 31, 2024 | Diversity, RAG | Code Available | 2 |
| Superposition in Transformers: A Novel Way of Building Mixture of Experts | Dec 31, 2024 | Mixture-of-Experts | Code Available | 2 |
| PyMilo: A Python Library for ML I/O | Dec 31, 2024 | | Code Available | 2 |
| Online Video Understanding: OVBench and VideoChat-Online | Dec 31, 2024 | Autonomous Driving, Question Answering | Code Available | 2 |
| MCP-Solver: Integrating Language Models with Constraint Programming Systems | Dec 31, 2024 | Natural Language Understanding | Code Available | 2 |
| Dual Diffusion for Unified Image Generation and Understanding | Dec 31, 2024 | Image Generation, Language Modeling | Code Available | 2 |
| Efficient Parallel Genetic Algorithm for Perturbed Substructure Optimization in Complex Network | Dec 30, 2024 | Combinatorial Optimization, Graph Mining | Code Available | 2 |
| Varformer: Adapting VAR's Generative Prior for Image Restoration | Dec 30, 2024 | Image Reconstruction, Image Restoration | Code Available | 2 |
| YOLO-UniOW: Efficient Universal Open-World Object Detection | Dec 30, 2024 | Incremental Learning, Object | Code Available | 2 |
| DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Dec 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control | Dec 30, 2024 | Denoising, Image Generation | Code Available | 2 |