| AST: Audio Spectrogram Transformer | Apr 5, 2021 | Audio Classification, Audio Tagging | Code Available | 2 | 5 |
| Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models | Sep 17, 2024 | Brain Computer Interface, EEG | Code Available | 2 | 5 |
| Motion Mamba: Efficient and Long Sequence Motion Generation | Mar 12, 2024 | Mamba, Motion Generation | Code Available | 2 | 5 |
| A Graph-Based Approach for Category-Agnostic Pose Estimation | Nov 29, 2023 | 2D Pose Estimation, Animal Pose Estimation | Code Available | 2 | 5 |
| Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Nov 9, 2023 | Math, Question Answering | Code Available | 2 | 5 |
| Toward General Instruction-Following Alignment for Retrieval-Augmented Generation | Oct 12, 2024 | Instruction Following, RAG | Code Available | 2 | 5 |
| Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis | Mar 24, 2022 | Denoising, Image Denoising | Code Available | 2 | 5 |
| CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models | Nov 28, 2023 | Dialogue Generation | Code Available | 2 | 5 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | Feb 28, 2022 | Document Image Classification, document understanding | Code Available | 2 | 5 |
| Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework | Apr 19, 2024 | Earth Observation, Segmentation | Code Available | 2 | 5 |
| Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond | Oct 9, 2024 | Benchmarking | Code Available | 2 | 5 |
| 3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification | Aug 25, 2024 | Computational Efficiency, Hyperspectral Image Classification | Code Available | 2 | 5 |
| Attention as a Hypernetwork | Jun 9, 2024 | | Code Available | 2 | 5 |
| DETR Doesn't Need Multi-Scale or Locality Design | Aug 3, 2023 | Decoder | Code Available | 2 | 5 |
| SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals | May 28, 2024 | Contrastive Learning, Representation Learning | Code Available | 2 | 5 |
| Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion | Feb 22, 2024 | Music Generation | Code Available | 2 | 5 |
| Deformable One-shot Face Stylization via DINO Semantic Guidance | Mar 1, 2024 | One-Shot Face Stylization | Code Available | 2 | 5 |
| STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion | Jan 3, 2024 | 3D Human Pose Estimation, Human Mesh Recovery | Code Available | 2 | 5 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Jan 22, 2024 | Diversity, GSM8K | Code Available | 2 | 5 |
| RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models | Jan 12, 2025 | Image Segmentation, Segmentation | Code Available | 2 | 5 |
| Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Jan 1, 2025 | Hallucination, Response Generation | Code Available | 2 | 5 |
| Multi-Modal Fusion Transformer for End-to-End Autonomous Driving | Apr 19, 2021 | Autonomous Driving | Code Available | 2 | 5 |
| EfficientRAG: Efficient Retriever for Multi-Hop Question Answering | Aug 8, 2024 | Multi-hop Question Answering, Question Answering | Code Available | 2 | 5 |
| Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation | Dec 23, 2023 | Decoder, Image Segmentation | Code Available | 2 | 5 |
| Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer | Dec 19, 2024 | Image Manipulation, Image Manipulation Localization | Code Available | 2 | 5 |