| Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies | Jan 4, 2025 | Edge-computing, Knowledge Distillation | Code Available | 2 |
| Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers | Jan 4, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Navigation Variable-based Multi-objective Particle Swarm Optimization for UAV Path Planning with Kinematic Constraints | Jan 3, 2025 | Metaheuristic Optimization | Code Available | 2 |
| UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery | Jan 3, 2025 | object-detection, Object Detection | Code Available | 2 |
| PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping | Jan 3, 2025 | 3DGS, Surface Reconstruction | Code Available | 2 |
| VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | Jan 3, 2025 | Computational Efficiency, Scene Understanding | Code Available | 2 |
| Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Merging Context Clustering with Visual State Space Models for Medical Image Segmentation | Jan 3, 2025 | Clustering, Image Segmentation | Code Available | 2 |
| FLAME: Financial Large-Language Model Assessment and Metrics Evaluation | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | Jan 3, 2025 | 3D Human Pose Estimation, Monocular 3D Human Pose Estimation | Code Available | 2 |
| Metadata Conditioning Accelerates Language Model Pre-training | Jan 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Click-Calib: A Robust Extrinsic Calibration Method for Surround-View Systems | Jan 2, 2025 | | Code Available | 2 |
| RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Jan 2, 2025 | Audio Generation, text-to-speech | Code Available | 2 |
| R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization | Jan 2, 2025 | Data Augmentation, Visual Localization | Code Available | 2 |
| Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning | Jan 2, 2025 | Imputation, Retrieval | Code Available | 2 |
| KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model | Jan 2, 2025 | MTEB Benchmark, Retrieval-augmented Generation | Code Available | 2 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Jan 1, 2025 | Descriptive | Code Available | 2 |
| ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect | Jan 1, 2025 | | Code Available | 2 |
| nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark | Jan 1, 2025 | Benchmarking, Image Segmentation | Code Available | 2 |
| Navigating Image Restoration with VAR's Distribution Alignment Prior | Jan 1, 2025 | Image Reconstruction, Image Restoration | Code Available | 2 |
| LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging | Jan 1, 2025 | Lesion Segmentation, Segmentation | Code Available | 2 |
| Samba: A Unified Mamba-based Framework for General Salient Object Detection | Jan 1, 2025 | Mamba, object-detection | Code Available | 2 |
| RORem: Training a Robust Object Remover with Human-in-the-Loop | Jan 1, 2025 | Object | Code Available | 2 |
| MATCHA: Towards Matching Anything | Jan 1, 2025 | Point Tracking | Code Available | 2 |
| One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency | Jan 1, 2025 | Object | Code Available | 2 |