| SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding | Aug 28, 2024 | Instruction Following, scientific discovery | Code Available | 2 |
| CFAT: Unleashing Triangular Windows for Image Super-resolution | Jan 1, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Towards Fast, Accurate and Stable 3D Dense Face Alignment | Sep 21, 2020 | 3D Face Modelling, 3D Face Reconstruction | Code Available | 2 |
| Diffusion Models for Adversarial Purification | May 16, 2022 | Adversarial Purification | Code Available | 2 |
| Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations | Feb 27, 2024 | Recommendation Systems | Code Available | 2 |
| RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions | Dec 31, 2024 | Diversity, RAG | Code Available | 2 |
| Samba: A Unified Mamba-based Framework for General Salient Object Detection | Jan 1, 2025 | Mamba, object-detection | Code Available | 2 |
| Centralized Feature Pyramid for Object Detection | Oct 5, 2022 | Object, object-detection | Code Available | 2 |
| DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models | Oct 17, 2022 | Diversity, Text Generation | Code Available | 2 |
| An End-to-End Structure with Novel Position Mechanism and Improved EMD for Stock Forecasting | Mar 25, 2024 | Position, Time Series | Code Available | 2 |
| PartCraft: Crafting Creative Objects by Parts | Jul 5, 2024 | | Code Available | 2 |
| Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters | Aug 7, 2024 | GPU | Code Available | 2 |
| A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images | Apr 25, 2021 | Decoder, Segmentation | Code Available | 2 |
| Large Language Models Are Zero-Shot Time Series Forecasters | Oct 11, 2023 | Imputation, Time Series | Code Available | 2 |
| TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | Nov 26, 2024 | image-classification, Image Classification | Code Available | 2 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Feb 5, 2025 | Denoising, Model Optimization | Code Available | 2 |
| ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization | Jun 4, 2024 | geo-localization, Visual Place Recognition | Code Available | 2 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain Adaptation, Math | Code Available | 2 |
| iFormer: Integrating ConvNet and Transformer for Mobile Application | Jan 26, 2025 | Instance Segmentation, object-detection | Code Available | 2 |
| Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection | Jan 1, 2025 | Defect Detection | Code Available | 2 |
| PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification | Aug 30, 2019 | Paraphrase Identification, Sentence | Code Available | 2 |
| ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models | Mar 17, 2025 | Computational Efficiency, Hallucination | Code Available | 2 |
| GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation | Oct 27, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale | Apr 19, 2025 | Benchmarking | Code Available | 2 |
| Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Jan 16, 2025 | 16k, Hallucination | Code Available | 2 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available | 2 |
| CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jan 28, 2025 | Hallucination | Code Available | 2 |
| Mitigating Object Hallucination via Concentric Causal Attention | Oct 21, 2024 | Hallucination, Object | Code Available | 2 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 |
| Bridging the Gap Between End-to-End and Two-Step Text Spotting | Apr 6, 2024 | Text Spotting | Code Available | 2 |
| Degradation-Aware Feature Perturbation for All-in-One Image Restoration | May 19, 2025 | All, Deblurring | Code Available | 2 |
| NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation | Feb 18, 2025 | 3D Generation, 3D Molecule Generation | Code Available | 2 |
| Unicom: Universal and Compact Representation Learning for Image Retrieval | Apr 12, 2023 | Image Classification, Image Retrieval | Code Available | 2 |
| SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | Mar 14, 2024 | Action Recognition, Human Interaction Recognition | Code Available | 2 |
| Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection | Jun 25, 2024 | Audio Deepfake Detection, Synthetic Speech Detection | Code Available | 2 |
| Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method | Nov 23, 2024 | Autonomous Driving | Code Available | 2 |
| Golden Cudgel Network for Real-Time Semantic Segmentation | Mar 5, 2025 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 |
| Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | May 3, 2023 | | Code Available | 2 |
| Agent Attention: On the Integration of Softmax and Linear Attention | Dec 14, 2023 | Computational Efficiency, image-classification | Code Available | 2 |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Feb 18, 2025 | Computational Efficiency, CPU | Code Available | 2 |
| Scene Adaptive Sparse Transformer for Event-based Object Detection | Apr 2, 2024 | Object, object-detection | Code Available | 2 |
| Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | May 25, 2024 | continuous-control, Continuous Control | Code Available | 2 |
| Optimizing Large Language Models for OpenAPI Code Completion | May 24, 2024 | Code Completion, Code Generation | Code Available | 2 |
| Preference Alignment with Flow Matching | May 30, 2024 | | Code Available | 2 |
| InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction | Apr 17, 2023 | Zero-shot Named Entity Recognition (NER) | Code Available | 2 |
| Scaling Transformer to 1M tokens and beyond with RMT | Apr 19, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Occupancy as Set of Points | Jul 4, 2024 | | Code Available | 2 |
| LangCoop: Collaborative Driving with Language | Apr 18, 2025 | Autonomous Driving | Code Available | 2 |
| PlanT: Explainable Planning Transformers via Object-Level Representations | Oct 25, 2022 | CARLA longest6, Decision Making | Code Available | 2 |