| Knowledge Graph-Guided Retrieval Augmented Generation | Feb 8, 2025 | Diversity, Hallucination | Code Available | 2 | 5 |
| σ-GPTs: A New Approach to Autoregressive Models | Apr 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Feb 4, 2025 | Denoising, Domain Generalization | Code Available | 2 | 5 |
| SimpleClick: Interactive Image Segmentation with Simple Vision Transformers | Oct 20, 2022 | Image Segmentation, Interactive Segmentation | Code Available | 2 | 5 |
| Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer | Dec 11, 2023 | Style Transfer | Code Available | 2 | 5 |
| CodeJudge: Evaluating Code Generation with Large Language Models | Oct 3, 2024 | Code Generation | Code Available | 2 | 5 |
| RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors | Mar 13, 2025 | 3DGS | Code Available | 2 | 5 |
| Towards Generative Ray Path Sampling for Faster Point-to-Point Ray Tracing | Oct 31, 2024 | valid | Code Available | 2 | 5 |
| Temporally Efficient Vision Transformer for Video Instance Segmentation | Apr 18, 2022 | Instance Segmentation, Semantic Segmentation | Code Available | 2 | 5 |
| DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion | Mar 9, 2025 | Image Segmentation, Medical Image Segmentation | Code Available | 2 | 5 |
| Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs | Apr 5, 2024 | 3D Generation, Image to 3D | Code Available | 2 | 5 |
| Pre-training Music Classification Models via Music Source Separation | Oct 24, 2023 | Classification, Genre classification | Code Available | 2 | 5 |
| Smooth Exploration for Robotic Reinforcement Learning | May 12, 2020 | continuous-control, Continuous Control | Code Available | 2 | 5 |
| Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer via Local-Style-Aware Hair Alignment | Aug 16, 2022 | | Code Available | 2 | 5 |
| GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | Jan 30, 2023 | Image Generation, Scene Understanding | Code Available | 2 | 5 |
| RRHF: Rank Responses to Align Language Models with Human Feedback | Sep 21, 2023 | | Code Available | 2 | 5 |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | May 20, 2025 | GPU, Video Generation | Code Available | 2 | 5 |
| Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory | Mar 24, 2022 | Motion Synthesis | Code Available | 2 | 5 |
| Scene Text Recognition with Permuted Autoregressive Sequence Models | Jul 14, 2022 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| U-shaped Vision Mamba for Single Image Dehazing | Feb 6, 2024 | Image Dehazing, Image Restoration | Code Available | 2 | 5 |
| FastMAC: Stochastic Spectral Sampling of Correspondence Graph | Mar 13, 2024 | Point Cloud Registration | Code Available | 2 | 5 |
| LHU-Net: A Light Hybrid U-Net for Cost-Efficient, High-Performance Volumetric Medical Image Segmentation | Apr 7, 2024 | Computational Efficiency, Image Segmentation | Code Available | 2 | 5 |
| Learning A Spiking Neural Network for Efficient Image Deraining | May 10, 2024 | Image Reconstruction, Rain Removal | Code Available | 2 | 5 |
| ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE | Sep 12, 2024 | | Code Available | 2 | 5 |
| Exploring the Benefit of Activation Sparsity in Pre-training | Oct 4, 2024 | | Code Available | 2 | 5 |
| AST: Audio Spectrogram Transformer | Apr 5, 2021 | Audio Classification, Audio Tagging | Code Available | 2 | 5 |
| Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models | Sep 17, 2024 | Brain Computer Interface, EEG | Code Available | 2 | 5 |
| Motion Mamba: Efficient and Long Sequence Motion Generation | Mar 12, 2024 | Mamba, Motion Generation | Code Available | 2 | 5 |
| A Graph-Based Approach for Category-Agnostic Pose Estimation | Nov 29, 2023 | 2D Pose Estimation, Animal Pose Estimation | Code Available | 2 | 5 |
| Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Nov 9, 2023 | Math, Question Answering | Code Available | 2 | 5 |
| Toward General Instruction-Following Alignment for Retrieval-Augmented Generation | Oct 12, 2024 | Instruction Following, RAG | Code Available | 2 | 5 |
| Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis | Mar 24, 2022 | Denoising, Image Denoising | Code Available | 2 | 5 |
| CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models | Nov 28, 2023 | Dialogue Generation | Code Available | 2 | 5 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | Feb 28, 2022 | Document Image Classification, document understanding | Code Available | 2 | 5 |
| Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework | Apr 19, 2024 | Earth Observation, Segmentation | Code Available | 2 | 5 |
| Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond | Oct 9, 2024 | Benchmarking | Code Available | 2 | 5 |
| 3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification | Aug 25, 2024 | Computational Efficiency, Hyperspectral Image Classification | Code Available | 2 | 5 |
| Attention as a Hypernetwork | Jun 9, 2024 | | Code Available | 2 | 5 |
| DETR Doesn't Need Multi-Scale or Locality Design | Aug 3, 2023 | Decoder | Code Available | 2 | 5 |
| SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals | May 28, 2024 | Contrastive Learning, Representation Learning | Code Available | 2 | 5 |
| Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion | Feb 22, 2024 | Music Generation | Code Available | 2 | 5 |
| Deformable One-shot Face Stylization via DINO Semantic Guidance | Mar 1, 2024 | One-Shot Face Stylization | Code Available | 2 | 5 |
| STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion | Jan 3, 2024 | 3D Human Pose Estimation, Human Mesh Recovery | Code Available | 2 | 5 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Jan 22, 2024 | Diversity, GSM8K | Code Available | 2 | 5 |
| RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models | Jan 12, 2025 | Image Segmentation, Segmentation | Code Available | 2 | 5 |
| Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention | Jan 1, 2025 | Hallucination, Response Generation | Code Available | 2 | 5 |
| Multi-Modal Fusion Transformer for End-to-End Autonomous Driving | Apr 19, 2021 | Autonomous Driving | Code Available | 2 | 5 |
| EfficientRAG: Efficient Retriever for Multi-Hop Question Answering | Aug 8, 2024 | Multi-hop Question Answering, Question Answering | Code Available | 2 | 5 |
| Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation | Dec 23, 2023 | Decoder, Image Segmentation | Code Available | 2 | 5 |
| Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer | Dec 19, 2024 | Image Manipulation, Image Manipulation Localization | Code Available | 2 | 5 |