| Causal Diffusion Transformers for Generative Modeling | Dec 16, 2024 | Decoder, Image Generation | Code Available | 2 |
| Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | May 28, 2024 | GPU, Text Generation | Code Available | 2 |
| Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Mar 11, 2024 | Clinical Knowledge, Descriptive | Code Available | 2 |
| Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation | Jul 3, 2025 | Diversity, Video Generation | Code Available | 2 |
| When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Mar 10, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| When Do LLMs Help With Node Classification? A Comprehensive Analysis | Feb 2, 2025 | Node Classification | Code Available | 2 |
| GhostNetV2: Enhance Cheap Operation with Long-Range Attention | Nov 23, 2022 | | Code Available | 2 |
| A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection | Mar 9, 2022 | Co-Salient Object Detection, object-detection | Code Available | 2 |
| Atlas: End-to-End 3D Scene Reconstruction from Posed Images | Mar 23, 2020 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 2 |
| Federated Learning in Mobile Networks: A Comprehensive Case Study on Traffic Forecasting | Dec 5, 2024 | Federated Learning, Management | Code Available | 2 |
| Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization | Nov 1, 2024 | Computational Efficiency, In-Context Learning | Code Available | 2 |
| MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models | May 15, 2025 | General Knowledge, Prompt Engineering | Code Available | 2 |
| SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network | May 16, 2024 | Binary Classification, Decoder | Code Available | 2 |
| Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Mar 29, 2022 | CPU, Decoder | Code Available | 2 |
| Are Self-Attentions Effective for Time Series Forecasting? | May 27, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| Autoformalizing Euclidean Geometry | May 27, 2024 | Math | Code Available | 2 |
| HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation | Nov 19, 2024 | Domain Adaptation, Image Generation | Code Available | 2 |
| Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Apr 21, 2025 | Math | Code Available | 2 |
| HybridNets: End-to-End Perception Network | Mar 17, 2022 | Autonomous Driving, Drivable Area Detection | Code Available | 2 |
| Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | May 23, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) | Mar 24, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| D-Flow: Differentiating through Flows for Controlled Generation | Feb 21, 2024 | | Code Available | 2 |
| REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation | May 25, 2024 | Graph Generation, Object | Code Available | 2 |
| Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Jun 5, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Cross-video Identity Correlating for Person Re-identification Pre-training | Sep 27, 2024 | Denoising, Person Re-Identification | Code Available | 2 |
| Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion | Oct 19, 2024 | image-classification, Image Classification | Code Available | 2 |
| FCN: Fusing Exponential and Linear Cross Network for Click-Through Rate Prediction | Jul 18, 2024 | Click-Through Rate Prediction | Code Available | 2 |
| SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Apr 23, 2024 | 3D Human Pose Estimation, Pose Estimation | Code Available | 2 |
| Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement | Oct 27, 2024 | Decoder, Image Enhancement | Code Available | 2 |
| Learning Vision from Models Rivals Learning Vision from Data | Dec 28, 2023 | Contrastive Learning, Image Captioning | Code Available | 2 |
| Enhancing Retrieval-Augmented Generation: A Study of Best Practices | Jan 13, 2025 | In-Context Learning, RAG | Code Available | 2 |
| A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems | Jun 26, 2024 | Audio Source Separation, Decoder | Code Available | 2 |
| ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation | Aug 13, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search | Mar 26, 2025 | Decision Making, RAG | Code Available | 2 |
| Correlation Matching Transformation Transformers for UHD Image Restoration | Jun 2, 2024 | Deblurring, Image Deblurring | Code Available | 2 |
| Me LLaMA: Foundation Large Language Models for Medical Applications | Feb 20, 2024 | Few-Shot Learning, GPU | Code Available | 2 |
| Mixed Diffusion for 3D Indoor Scene Synthesis | May 31, 2024 | Denoising, Indoor Scene Synthesis | Code Available | 2 |
| Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Nov 25, 2024 | Question Answering, Visual Question Answering | Code Available | 2 |
| NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields | Apr 1, 2024 | 3D Object Detection, NeRF | Code Available | 2 |
| R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Jan 18, 2024 | Benchmarking | Code Available | 2 |
| rPPG-Toolbox: Deep Remote PPG Toolbox | Oct 3, 2022 | Benchmarking, Data Augmentation | Code Available | 2 |
| Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | Feb 14, 2024 | Language Modelling, Segmentation | Code Available | 2 |
| Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data | Feb 8, 2024 | Action Recognition, Mamba | Code Available | 2 |
| Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses | Feb 3, 2021 | Decoder, Speech Denoising | Code Available | 2 |
| ControlVideo: Training-free Controllable Text-to-Video Generation | May 22, 2023 | Image Generation, Text-to-Video Generation | Code Available | 2 |
| Realistic Rainy Weather Simulation for LiDARs in CARLA Simulator | Dec 20, 2023 | Data Augmentation, object-detection | Code Available | 2 |
| Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation | Mar 16, 2023 | Diversity, Gesture Generation | Code Available | 2 |
| Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Mar 12, 2025 | Image Retrieval, Retrieval | Code Available | 2 |
| SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation | Dec 8, 2022 | 3D Reconstruction, 3D Shape Generation | Code Available | 2 |