| Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Nov 16, 2023 | Language Modeling, Language Modelling | Code Available | 4 | 5 |
| OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Feb 27, 2025 | Image Classification, Instance Segmentation | Code Available | 4 | 5 |
| Video Understanding with Large Language Models: A Survey | Dec 29, 2023 | Survey, Video Understanding | Code Available | 4 | 5 |
| Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Mar 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 | 5 |
| InstructIR: High-Quality Image Restoration Following Human Instructions | Jan 29, 2024 | Deblurring, Denoising | Code Available | 4 | 5 |
| Lightweight Pixel Difference Networks for Efficient Visual Representation Learning | Feb 1, 2024 | Edge Detection, Object Recognition | Code Available | 4 | 5 |
| AlphaFold Meets Flow Matching for Generating Protein Ensembles | Feb 7, 2024 | Diversity | Code Available | 4 | 5 |
| ScreenAgent: A Vision Language Model-driven Computer Control Agent | Feb 9, 2024 | Language Modeling, Language Modelling | Code Available | 4 | 5 |
| 2D Matryoshka Sentence Embeddings | Feb 22, 2024 | RAG, Representation Learning | Code Available | 4 | 5 |
| The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark | Apr 3, 2024 | EEG, Motor Imagery | Code Available | 4 | 5 |
| 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors | Mar 4, 2024 | 3D Generation, Text to 3D | Code Available | 4 | 5 |
| SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting | Mar 8, 2024 | GPU | Code Available | 4 | 5 |
| An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | Mar 11, 2024 | Computational Efficiency, Video Understanding | Code Available | 4 | 5 |
| Long-CLIP: Unlocking the Long-Text Capability of CLIP | Mar 22, 2024 | Image Generation, Image Retrieval | Code Available | 4 | 5 |
| RaDe-GS: Rasterizing Depth in Gaussian Splatting | Jun 3, 2024 | Computational Efficiency, Novel View Synthesis | Code Available | 4 | 5 |
| One-Step Effective Diffusion Network for Real-World Image Super-Resolution | Jun 12, 2024 | Image Restoration, Image Super-Resolution | Code Available | 4 | 5 |
| On Scaling Up 3D Gaussian Splatting Training | Jun 26, 2024 | 3DGS, 3D Reconstruction | Code Available | 4 | 5 |
| DiffusionDet: Diffusion Model for Object Detection | Nov 17, 2022 | Denoising, model | Code Available | 4 | 5 |
| Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction | Oct 1, 2024 | Prediction, regression | Code Available | 4 | 5 |
| LAMBDA: A Large Model Based Data Agent | Jul 24, 2024 | model | Code Available | 4 | 5 |
| Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Aug 14, 2024 | Continual Learning, Few-Shot Learning | Code Available | 4 | 5 |
| StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation | Sep 19, 2024 | Image Generation, Personalized Image Generation | Code Available | 4 | 5 |
| Parameter Efficient Instruction Tuning: An Empirical Study | Nov 25, 2024 | Instruction Following, Memorization | Code Available | 4 | 5 |
| Identity-Preserving Text-to-Video Generation by Frequency Decomposition | Nov 26, 2024 | Human-Domain Subject-to-Video, Image to Video Generation | Code Available | 4 | 5 |
| You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Dec 9, 2024 | 3D Generation, 3D geometry | Code Available | 4 | 5 |
| Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Dec 18, 2024 | Question Answering, Spatial Reasoning | Code Available | 4 | 5 |
| CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Feb 11, 2025 | Code Generation, Math | Code Available | 4 | 5 |
| Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM | Feb 10, 2025 | Language Modeling, Language Modelling | Code Available | 4 | 5 |
| Highly Accurate Dichotomous Image Segmentation | Mar 6, 2022 | 2k, 3D Reconstruction | Code Available | 4 | 5 |
| Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Feb 26, 2025 | Depth Estimation, Diversity | Code Available | 4 | 5 |
| MIMIC-IT: Multi-Modal In-Context Instruction Tuning | Jun 8, 2023 | In-Context Learning, Visual Question Answering | Code Available | 4 | 5 |
| ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Jul 8, 2024 | multimodal generation, Text Generation | Code Available | 4 | 5 |
| Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Mar 20, 2025 | Benchmarking, Reinforcement Learning (RL) | Code Available | 4 | 5 |
| OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Mar 30, 2025 | Autonomous Driving, Decision Making | Code Available | 4 | 5 |
| LSKNet: A Foundation Lightweight Backbone for Remote Sensing | Mar 18, 2024 | Change Detection, object-detection | Code Available | 4 | 5 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Mar 20, 2023 | Decision Making, HumanEval | Code Available | 4 | 5 |
| EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Aug 21, 2024 | 3D Instance Segmentation, GPU | Code Available | 4 | 5 |
| Ming-Omni: A Unified Multimodal Model for Perception and Generation | Jun 11, 2025 | Image Generation, text-to-speech | Code Available | 4 | 5 |
| Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models | May 23, 2025 | | Code Available | 4 | 5 |
| Enhance-A-Video: Better Generated Video for Free | Feb 11, 2025 | Video Generation | Code Available | 4 | 5 |
| OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | May 12, 2025 | GPU, Privacy Preserving | Code Available | 4 | 5 |
| Token Merging for Fast Stable Diffusion | Mar 30, 2023 | Image Generation | Code Available | 4 | 5 |
| Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion | Jan 31, 2024 | | Code Available | 4 | 5 |
| MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT | Feb 26, 2024 | | Code Available | 4 | 5 |
| PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice | Jun 11, 2024 | NetHack, reinforcement-learning | Code Available | 4 | 5 |
| Latent Swap Joint Diffusion for 2D Long-Form Latent Generation | Feb 7, 2025 | Audio Generation, Denoising | Code Available | 4 | 5 |
| Elucidating the Design Space of Diffusion-Based Generative Models | Jun 1, 2022 | Image Generation | Code Available | 4 | 5 |
| Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Jun 9, 2022 | Common Sense Reasoning, Math | Code Available | 4 | 5 |
| BitNet a4.8: 4-bit Activations for 1-bit LLMs | Nov 7, 2024 | Quantization | Code Available | 4 | 5 |
| A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning, Instruction Following | Code Available | 4 | 5 |