| Bottleneck Transformers for Visual Recognition | Jan 27, 2021 | image-classification, Image Classification | Code Available | 2 | 5 |
| HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution | May 8, 2024 | Image Super-Resolution | Code Available | 2 | 5 |
| EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records | May 23, 2024 | Mamba | Code Available | 2 | 5 |
| Multi-Modal Self-Supervised Learning for Recommendation | Feb 21, 2023 | Contrastive Learning, Data Augmentation | Code Available | 2 | 5 |
| Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration | Jan 23, 2024 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2 | 5 |
| MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding | Jan 1, 2024 | Autonomous Driving, Instruction Following | Code Available | 2 | 5 |
| ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation | Jun 14, 2024 | Code Generation | Code Available | 2 | 5 |
| MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | Jun 21, 2024 | GPU, Language Modeling | Code Available | 2 | 5 |
| iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Jul 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting | May 31, 2023 | Decoder, Scene Text Detection | Code Available | 2 | 5 |
| Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model | Jul 6, 2024 | Image-to-Image Translation, Translation | Code Available | 2 | 5 |
| PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer | Jul 10, 2024 | Decoder, Handwritten Mathematical Expression Recognition | Code Available | 2 | 5 |
| Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection | Mar 4, 2024 | DeepFake Detection, Face Swapping | Code Available | 2 | 5 |
| Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks | Jul 18, 2024 | Autonomous Driving, BEV Segmentation | Code Available | 2 | 5 |
| Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective | Oct 16, 2024 | Conditional Image Generation, Image Generation | Code Available | 2 | 5 |
| LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain | Aug 19, 2024 | RAG, Retrieval | Code Available | 2 | 5 |
| ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild | Aug 23, 2022 | 2D Human Pose Estimation, Neural Architecture Search | Code Available | 2 | 5 |
| UniFormer: Unifying Convolution and Self-attention for Visual Recognition | Jan 24, 2022 | Image Classification, object-detection | Code Available | 2 | 5 |
| SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation | Oct 19, 2024 | AI Agent, Benchmarking | Code Available | 2 | 5 |
| Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey | Feb 20, 2023 | Survey | Code Available | 2 | 5 |
| Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt | Jan 2, 2024 | Anomaly Detection, Anomaly Segmentation | Code Available | 2 | 5 |
| ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT | Feb 20, 2023 | Event Extraction, named-entity-recognition | Code Available | 2 | 5 |
| BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Nov 21, 2024 | image-classification, Image Classification | Code Available | 2 | 5 |
| Habitat 2.0: Training Home Assistants to Rearrange their Habitat | Jun 28, 2021 | Deep Reinforcement Learning, GPU | Code Available | 2 | 5 |
| UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models | Feb 19, 2024 | Image Generation, Machine Unlearning | Code Available | 2 | 5 |
| Real-time 3D-aware Portrait Video Relighting | Oct 24, 2024 | NeRF | Code Available | 2 | 5 |
| 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining | Jan 1, 2025 | Optical Character Recognition (OCR) | Code Available | 2 | 5 |
| Aligning Large Language Models with Human: A Survey | Jul 24, 2023 | Survey | Code Available | 2 | 5 |
| MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration | Jan 8, 2025 | Deblurring, Denoising | Code Available | 2 | 5 |
| GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation | Mar 31, 2023 | Image Generation, Optical Character Recognition (OCR) | Code Available | 2 | 5 |
| MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning | Nov 24, 2022 | Deep Reinforcement Learning, Layout Design | Code Available | 2 | 5 |
| SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement | Feb 10, 2025 | Semantic Segmentation | Code Available | 2 | 5 |
| EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting | Jun 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation | Mar 27, 2025 | Denoising, Human Animation | Code Available | 2 | 5 |
| Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models | Mar 28, 2025 | MMLU, Quantization | Code Available | 2 | 5 |
| Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation | Mar 1, 2023 | Video Frame Interpolation | Code Available | 2 | 5 |
| Quartet: Native FP4 Training Can Be Optimal for Large Language Models | May 20, 2025 | | Code Available | 2 | 5 |
| SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning | Oct 13, 2024 | Computational Efficiency, Deep Reinforcement Learning | Code Available | 2 | 5 |
| Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM | Apr 7, 2024 | Marine Animal Segmentation | Code Available | 2 | 5 |
| Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation | Apr 12, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 | 5 |
| DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception | May 7, 2025 | object-detection, Object Detection | Code Available | 2 | 5 |
| LightEA: A Scalable, Robust, and Interpretable Entity Alignment Framework via Three-view Label Propagation | Oct 19, 2022 | Entity Alignment | Code Available | 2 | 5 |
| Training-free Graph Neural Networks and the Power of Labels as Features | Apr 30, 2024 | Node Classification | Code Available | 2 | 5 |
| HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis | Apr 29, 2024 | CPU, Edge-computing | Code Available | 2 | 5 |
| ICASSP 2024 Speech Signal Improvement Challenge | Jan 25, 2024 | | Code Available | 2 | 5 |
| Simulate Any Radar: Attribute-Controllable Radar Simulation via Waveform Parameter Embedding | Jun 3, 2025 | 3D Object Detection, Attribute | Code Available | 2 | 5 |
| Text-Guided Synthesis of Eulerian Cinemagraphs | Jul 6, 2023 | Image Animation | Code Available | 2 | 5 |
| MotionGS: Compact Gaussian Splatting SLAM by Motion Filter | May 18, 2024 | 3DGS, NeRF | Code Available | 2 | 5 |
| CoLLaVO: Crayon Large Language and Vision mOdel | Feb 17, 2024 | Large Language Model, model | Code Available | 2 | 5 |
| Masked Autoencoders As Spatiotemporal Learners | May 18, 2022 | Inductive Bias, Representation Learning | Code Available | 2 | 5 |