| I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | May 25, 2025 | Mixture-of-Experts, multimodal interaction | Code Available | 2 |
| Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution | Dec 3, 2022 | Box-supervised Instance Segmentation, Decoder | Code Available | 2 |
| GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | May 29, 2025 | | Code Available | 2 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Jul 28, 2023 | Object, Question Answering | Code Available | 2 |
| OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction | Mar 15, 2022 | 3D Reconstruction, Graph Neural Network | Code Available | 2 |
| Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset | Jul 3, 2023 | Human Mesh Recovery, Motion Generation | Code Available | 2 |
| Routoo: Learning to Route to Large Language Models Effectively | Jan 25, 2024 | MMLU, Multi-task Language Understanding | Code Available | 2 |
| Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Aug 18, 2023 | | Code Available | 2 |
| Objects as Points | Apr 16, 2019 | 3D Object Detection, Keypoint Detection | Code Available | 2 |
| Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Apr 11, 2024 | Decoder, Dense Video Captioning | Code Available | 2 |
| CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Mar 21, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Mar 17, 2025 | Data Interaction, Scene Understanding | Code Available | 2 |
| Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models | Apr 6, 2024 | Image Generation, Unconditional Image Generation | Code Available | 2 |
| Dataset Regeneration for Sequential Recommendation | May 28, 2024 | Recommendation Systems, Sequential Recommendation | Code Available | 2 |
| CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | Jun 10, 2024 | Fairness | Code Available | 2 |
| M^2SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation | Mar 20, 2023 | Computed Tomography (CT), Decoder | Code Available | 2 |
| VQF: Highly Accurate IMU Orientation Estimation with Bias Estimation and Magnetic Disturbance Rejection | Mar 31, 2022 | | Code Available | 2 |
| REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | Mar 4, 2024 | Benchmarking | Code Available | 2 |
| SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | Dec 20, 2022 | Dialogue Generation, Language Modeling | Code Available | 2 |
| Context-Aware Video Instance Segmentation | Jul 3, 2024 | Instance Segmentation, Panoptic Segmentation | Code Available | 2 |
| Benchmarking Graph Neural Networks | Mar 2, 2020 | Benchmarking, Graph Classification | Code Available | 2 |
| PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images | Jul 13, 2022 | 3D human pose and shape estimation, 3D Human Pose Estimation | Code Available | 2 |
| Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering | Nov 26, 2024 | Prognosis, Question Answering | Code Available | 2 |
| MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control | Mar 18, 2024 | Instruction Following, Minecraft | Code Available | 2 |
| DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | May 31, 2024 | cross-modal alignment, Visual Localization | Code Available | 2 |
| PerCo (SD): Open Perceptual Compression | Sep 30, 2024 | Attribute, Image Compression | Code Available | 2 |
| Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture | Mar 12, 2024 | Motion Magnification, Representation Learning | Code Available | 2 |
| MINERVA: Evaluating Complex Video Reasoning | May 1, 2025 | Benchmarking, Temporal Localization | Code Available | 2 |
| RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Apr 9, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 2 |
| Universal Guidance for Diffusion Models | Feb 14, 2023 | Face Recognition, object-detection | Code Available | 2 |
| Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis | Jun 15, 2023 | Image Generation, Preference Mapping | Code Available | 2 |
| GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech | May 15, 2022 | Speech Synthesis, Style Transfer | Code Available | 2 |
| Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark | May 31, 2022 | Autonomous Driving, Camera Pose Estimation | Code Available | 2 |
| Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | May 12, 2025 | Drug Design, Scheduling | Code Available | 2 |
| Autonomous Catheterization with Open-source Simulator and Expert Trajectory | Jan 17, 2024 | | Code Available | 2 |
| Data-Centric Foundation Models in Computational Healthcare: A Survey | Jan 4, 2024 | Ethics, Survey | Code Available | 2 |
| BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | Aug 14, 2024 | Backdoor Attack, Prompt Learning | Code Available | 2 |
| LRM-Zero: Training Large Reconstruction Models with Synthesized Data | Jun 13, 2024 | 3D Reconstruction | Code Available | 2 |
| DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models | Oct 9, 2023 | | Code Available | 2 |
| Regional Tiny Stories: Using Small Models to Compare Language Learning and Tokenizer Performance | Apr 7, 2025 | | Code Available | 2 |
| UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement | Apr 22, 2024 | 4k, Image Enhancement | Code Available | 2 |
| TRACE: Temporal Grounding Video LLM via Causal Event Modeling | Oct 8, 2024 | Text Generation, Video Understanding | Code Available | 2 |
| DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models | Jul 1, 2024 | Denoising, Image Restoration | Code Available | 2 |
| ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention | Jan 1, 2024 | Blocking | Code Available | 2 |
| A Survey on Large Language Models for Code Generation | Jun 1, 2024 | Code Generation, HumanEval | Code Available | 2 |
| Rethinking Optimization and Architecture for Tiny Language Models | Feb 5, 2024 | Language Modelling | Code Available | 2 |
| FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model | Apr 11, 2024 | Mamba | Code Available | 2 |
| Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio | Jun 28, 2023 | Language Modelling, Text Generation | Code Available | 2 |
| VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models | May 27, 2024 | Object | Code Available | 2 |
| THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models | Mar 31, 2025 | GPU | Code Available | 2 |