| MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training | May 31, 2023 | Language Modelling, Quantization | Code Available | 2 |
| MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video | Mar 2, 2022 | 3D Human Pose Estimation, Classification | Code Available | 2 |
| Gradient Boosting Reinforcement Learning | Jul 11, 2024 | GPU, reinforcement-learning | Code Available | 2 |
| AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind | Feb 21, 2025 | Model Discovery | Code Available | 2 |
| CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization | Nov 30, 2023 | 3DGS, NeRF | Code Available | 2 |
| Diffusion Actor-Critic with Entropy Regulator | May 24, 2024 | Decision Making, MuJoCo | Code Available | 2 |
| Contextual Object Detection with Multimodal Large Language Models | May 29, 2023 | Cloze Test, Decoder | Code Available | 2 |
| V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding | Dec 12, 2024 | Position | Code Available | 2 |
| Exploration-Driven Generative Interactive Environments | Apr 3, 2025 | | Code Available | 2 |
| PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution | Nov 26, 2024 | Denoising, Image Super-Resolution | Code Available | 2 |
| Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation | Apr 2, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space | May 3, 2024 | Facial Expression Recognition, Facial Expression Recognition (FER) | Code Available | 2 |
| DreamGaussian4D: Generative 4D Gaussian Splatting | Dec 28, 2023 | Video Generation | Code Available | 2 |
| GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks | Jun 19, 2024 | Kolmogorov-Arnold Networks | Code Available | 2 |
| Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts | Mar 31, 2024 | Image Segmentation, Interactive Segmentation | Code Available | 2 |
| Referring to Any Person | Mar 11, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| Training Language Models to Self-Correct via Reinforcement Learning | Sep 19, 2024 | HumanEval, Math | Code Available | 2 |
| GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning | Feb 3, 2024 | Link Prediction, Node Classification | Code Available | 2 |
| Training on test proteins improves fitness, structure, and function prediction | Nov 4, 2024 | Prediction, Protein Structure Prediction | Code Available | 2 |
| mGPT: Few-Shot Learners Go Multilingual | Apr 15, 2022 | Cross-Lingual Natural Language Inference, Cross-Lingual Paraphrase Identification | Code Available | 2 |
| Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion | May 30, 2024 | Semantic Communication, Video Compression | Code Available | 2 |
| TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting | Jan 22, 2025 | Clustering, Time Series | Code Available | 2 |
| SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model | May 3, 2023 | Instance Segmentation, Object | Code Available | 2 |
| YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception | Aug 24, 2022 | Autonomous Driving, Drivable Area Detection | Code Available | 2 |
| CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | May 24, 2025 | Benchmarking | Code Available | 2 |
| Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation | Jun 14, 2024 | Navigate, Vision and Language Navigation | Code Available | 2 |
| A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | Jun 19, 2024 | Benchmarking | Code Available | 2 |
| MonoOcc: Digging into Monocular Semantic Occupancy Prediction | Mar 13, 2024 | 3D geometry, Autonomous Vehicles | Code Available | 2 |
| Self-Supervised Any-Point Tracking by Contrastive Random Walks | Sep 24, 2024 | Contrastive Learning, Data Augmentation | Code Available | 2 |
| Click-Calib: A Robust Extrinsic Calibration Method for Surround-View Systems | Jan 2, 2025 | | Code Available | 2 |
| ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis | Mar 11, 2024 | Question Answering | Code Available | 2 |
| Brain Latent Progression: Individual-based Spatiotemporal Disease Progression on 3D Brain MRIs via Latent Diffusion | Feb 12, 2025 | | Code Available | 2 |
| Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review | Apr 20, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks | Dec 29, 2024 | 3DGS, Novel View Synthesis | Code Available | 2 |
| Generating Long Semantic IDs in Parallel for Recommendation | Jun 6, 2025 | | Code Available | 2 |
| Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities | Jun 22, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation | Feb 28, 2024 | Semantic Segmentation, TAG | Code Available | 2 |
| Three New Validators and a Large-Scale Benchmark Ranking for Unsupervised Domain Adaptation | Aug 15, 2022 | Domain Adaptation, Unsupervised Domain Adaptation | Code Available | 2 |
| LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Feb 13, 2024 | Benchmarking, Model Selection | Code Available | 2 |
| Learning from All Vehicles | Mar 22, 2022 | All, Autonomous Driving | Code Available | 2 |
| LambdaNetworks: Modeling Long-Range Interactions Without Attention | Feb 17, 2021 | image-classification, Image Classification | Code Available | 2 |
| Next Patch Prediction for Autoregressive Visual Generation | Dec 19, 2024 | Image Generation, Prediction | Code Available | 2 |
| The Stable Artist: Steering Semantics in Diffusion Latent Space | Dec 12, 2022 | Image Generation | Code Available | 2 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | Benchmarking, GSM8K | Code Available | 2 |
| PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Aug 18, 2024 | Language Modelling, Question Answering | Code Available | 2 |
| SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers | Jun 9, 2023 | Continual Learning, Continual Semantic Segmentation | Code Available | 2 |
| CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion | Oct 19, 2022 | Camera Pose Estimation, Depth Estimation | Code Available | 2 |
| Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation | Apr 9, 2024 | Knowledge Distillation, Language Modeling | Code Available | 2 |
| Active Generalized Category Discovery | Mar 7, 2024 | Active Learning, imbalanced classification | Code Available | 2 |
| COLD: A Benchmark for Chinese Offensive Language Detection | Jan 16, 2022 | | Code Available | 2 |