| XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM | Oct 31, 2024 | 3DGS, Benchmarking | Code Available | 3 |
| UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation | Mar 29, 2024 | Image Segmentation, Lesion Segmentation | Code Available | 3 |
| Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning | Mar 8, 2025 | Reranking | Code Available | 3 |
| Pipeline Gradient-based Model Training on Analog In-memory Accelerators | Oct 19, 2024 | | Code Available | 3 |
| General Geospatial Inference with a Population Dynamics Foundation Model | Nov 11, 2024 | Benchmarking, Graph Neural Network | Code Available | 3 |
| Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Dec 5, 2024 | Contrastive Learning, Hallucination | Code Available | 3 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice | Code Available | 3 |
| PuzzleAvatar: Assembling 3D Avatars from Personal Albums | May 23, 2024 | Language Modelling, Text to 3D | Code Available | 3 |
| GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering | Feb 15, 2024 | 3D Reconstruction, Novel View Synthesis | Code Available | 3 |
| Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | Jan 1, 2024 | Domain Generalization, Semantic Segmentation | Code Available | 3 |
| Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | May 4, 2025 | Mixture-of-Experts, NeRF | Code Available | 3 |
| Self-Refine: Iterative Refinement with Self-Feedback | Mar 30, 2023 | Mathematical Reasoning, Response Generation | Code Available | 3 |
| Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Mar 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems | May 22, 2025 | | Code Available | 3 |
| LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry | Jan 3, 2024 | Point Tracking, Visual Odometry | Code Available | 3 |
| Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey | Feb 8, 2024 | Articles, Entity Alignment | Code Available | 3 |
| A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation | Aug 20, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 3 |
| Score-Guided Diffusion for 3D Human Recovery | Mar 14, 2024 | Denoising, Human Mesh Recovery | Code Available | 3 |
| R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | May 5, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Jun 13, 2024 | Instruction Following, Math | Code Available | 3 |
| A Survey on the Optimization of Large Language Model-based Agents | Mar 16, 2025 | Decision Making, Language Modeling | Code Available | 3 |
| SOAP: Style-Omniscient Animatable Portraits | May 8, 2025 | Image to 3D | Code Available | 3 |
| wgatools: an ultrafast toolkit for manipulating whole genome alignments | Sep 13, 2024 | | Code Available | 3 |
| Detecting Twenty-thousand Classes using Image-level Supervision | Jan 7, 2022 | Cross-Domain Few-Shot Object Detection, image-classification | Code Available | 3 |
| VidTok: A Versatile and Open-Source Video Tokenizer | Dec 17, 2024 | Quantization, SSIM | Code Available | 3 |
| Transformers Can Do Arithmetic with the Right Embeddings | May 27, 2024 | GPU, Position | Code Available | 3 |
| StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On | Dec 4, 2023 | Semantic correspondence, Virtual Try-on | Code Available | 3 |
| A General Framework for Inference-time Scaling and Steering of Diffusion Models | Jan 12, 2025 | Protein Design | Code Available | 3 |
| Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis | Oct 10, 2024 | Feature Compression, Image Generation | Code Available | 3 |
| GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images | Mar 8, 2025 | cross-modal alignment, Diagnostic | Code Available | 3 |
| Unified Source-Free Domain Adaptation | Mar 12, 2024 | Domain Adaptation, Language Modelling | Code Available | 3 |
| A Python library for efficient computation of molecular fingerprints | Mar 27, 2024 | Drug Discovery, Molecular Property Prediction | Code Available | 3 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available | 3 |
| SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Feb 5, 2024 | 3D Semantic Segmentation, Camera Pose Estimation | Code Available | 3 |
| MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Jan 2, 2025 | Contrastive Learning, Key Detection | Code Available | 3 |
| Language-Codec: Bridging Discrete Codec Representations and Speech Language Models | Feb 19, 2024 | Audio Compression, Audio Generation | Code Available | 3 |
| ROLAND: Graph Learning Framework for Dynamic Graphs | Aug 15, 2022 | Graph Learning, Graph Representation Learning | Code Available | 3 |
| DiC: Rethinking Conv3x3 Designs in Diffusion Models | Dec 31, 2024 | Decoder | Code Available | 3 |
| LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Oct 14, 2024 | Benchmarking, Large Language Model | Code Available | 3 |
| BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | Feb 6, 2024 | Binarization, GPU | Code Available | 3 |
| MotionGPT: Human Motion as a Foreign Language | Jun 26, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | Oct 3, 2024 | RAG | Code Available | 3 |
| AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation | Mar 26, 2024 | 3D Multi-Person Mesh Recovery, All | Code Available | 3 |
| Efficient Agent Training for Computer Use | May 20, 2025 | | Code Available | 3 |
| Agent Workflow Memory | Sep 11, 2024 | AI Agent, Language Modeling | Code Available | 3 |
| LaViDa: A Large Diffusion Language Model for Multimodal Understanding | May 22, 2025 | Instruction Following, Language Modeling | Code Available | 3 |
| Aquila2 Technical Report | Aug 14, 2024 | Management | Code Available | 3 |
| The Flan Collection: Designing Data and Methods for Effective Instruction Tuning | Jan 31, 2023 | | Code Available | 3 |
| DUFOMap: Efficient Dynamic Awareness Mapping | Mar 3, 2024 | Computational Efficiency | Code Available | 3 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available | 3 |