| ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization | Jun 4, 2024 | geo-localization, Visual Place Recognition | Code Available | 2 | 5 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 | 5 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain Adaptation, Math | Code Available | 2 | 5 |
| iFormer: Integrating ConvNet and Transformer for Mobile Application | Jan 26, 2025 | Instance Segmentation, object-detection | Code Available | 2 | 5 |
| Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection | Jan 1, 2025 | Defect Detection | Code Available | 2 | 5 |
| PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification | Aug 30, 2019 | Paraphrase Identification, Sentence | Code Available | 2 | 5 |
| ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models | Mar 17, 2025 | Computational Efficiency, Hallucination | Code Available | 2 | 5 |
| GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation | Oct 27, 2024 | Image Generation, Text to Image Generation | Code Available | 2 | 5 |
| Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale | Apr 19, 2025 | Benchmarking | Code Available | 2 | 5 |
| Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Jan 16, 2025 | 16k, Hallucination | Code Available | 2 | 5 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available | 2 | 5 |
| CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jan 28, 2025 | Hallucination | Code Available | 2 | 5 |
| Mitigating Object Hallucination via Concentric Causal Attention | Oct 21, 2024 | Hallucination, Object | Code Available | 2 | 5 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 | 5 |
| Bridging the Gap Between End-to-End and Two-Step Text Spotting | Apr 6, 2024 | Text Spotting | Code Available | 2 | 5 |
| Degradation-Aware Feature Perturbation for All-in-One Image Restoration | May 19, 2025 | All, Deblurring | Code Available | 2 | 5 |
| NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation | Feb 18, 2025 | 3D Generation, 3D Molecule Generation | Code Available | 2 | 5 |
| Unicom: Universal and Compact Representation Learning for Image Retrieval | Apr 12, 2023 | Image Classification, Image Retrieval | Code Available | 2 | 5 |
| SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | Mar 14, 2024 | Action Recognition, Human Interaction Recognition | Code Available | 2 | 5 |
| Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection | Jun 25, 2024 | Audio Deepfake Detection, Synthetic Speech Detection | Code Available | 2 | 5 |
| Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method | Nov 23, 2024 | Autonomous Driving | Code Available | 2 | 5 |
| Golden Cudgel Network for Real-Time Semantic Segmentation | Mar 5, 2025 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 | 5 |
| Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | May 3, 2023 | | Code Available | 2 | 5 |
| Agent Attention: On the Integration of Softmax and Linear Attention | Dec 14, 2023 | Computational Efficiency, image-classification | Code Available | 2 | 5 |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Feb 18, 2025 | Computational Efficiency, CPU | Code Available | 2 | 5 |
| Scene Adaptive Sparse Transformer for Event-based Object Detection | Apr 2, 2024 | Object, object-detection | Code Available | 2 | 5 |
| Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization | May 25, 2024 | continuous-control, Continuous Control | Code Available | 2 | 5 |
| Optimizing Large Language Models for OpenAPI Code Completion | May 24, 2024 | Code Completion, Code Generation | Code Available | 2 | 5 |
| Preference Alignment with Flow Matching | May 30, 2024 | | Code Available | 2 | 5 |
| InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction | Apr 17, 2023 | Zero-shot Named Entity Recognition (NER) | Code Available | 2 | 5 |
| Scaling Transformer to 1M tokens and beyond with RMT | Apr 19, 2023 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Occupancy as Set of Points | Jul 4, 2024 | | Code Available | 2 | 5 |
| LangCoop: Collaborative Driving with Language | Apr 18, 2025 | Autonomous Driving | Code Available | 2 | 5 |
| PlanT: Explainable Planning Transformers via Object-Level Representations | Oct 25, 2022 | CARLA longest6, Decision Making | Code Available | 2 | 5 |
| Measuring and Narrowing the Compositionality Gap in Language Models | Oct 7, 2022 | Question Answering | Code Available | 2 | 5 |
| AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Feb 21, 2025 | Scheduling | Code Available | 2 | 5 |
| GrootVL: Tree Topology is All You Need in State Space Model | Jun 4, 2024 | All, image-classification | Code Available | 2 | 5 |
| ViTs for SITS: Vision Transformers for Satellite Image Time Series | Jan 12, 2023 | Semantic Segmentation, Time Series | Code Available | 2 | 5 |
| Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions | Mar 25, 2024 | Attribute | Code Available | 2 | 5 |
| MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | Sep 30, 2022 | Image Classification | Code Available | 2 | 5 |
| ProcessPainter: Learn Painting Process from Sequence Data | Jun 10, 2024 | Denoising, Image Generation | Code Available | 2 | 5 |
| ChainerCV: a Library for Deep Learning in Computer Vision | Aug 28, 2017 | Deep Learning, object-detection | Code Available | 2 | 5 |
| Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings | Mar 25, 2025 | 4k, Action Recognition | Code Available | 2 | 5 |
| Graph Diffusion Transformers for Multi-Conditional Molecular Generation | Jan 24, 2024 | Decoder, Denoising | Code Available | 2 | 5 |
| When and why vision-language models behave like bags-of-words, and what to do about it? | Oct 4, 2022 | Contrastive Learning, Retrieval | Code Available | 2 | 5 |
| CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Mar 31, 2024 | Denoising, Speech Synthesis | Code Available | 2 | 5 |
| FlexiDreamer: Single Image-to-3D Generation with FlexiCubes | Apr 1, 2024 | 3D Generation, Image to 3D | Code Available | 2 | 5 |
| USP: Unified Self-Supervised Pretraining for Image Generation and Understanding | Mar 8, 2025 | Image Generation, Representation Learning | Code Available | 2 | 5 |
| Alpha^2: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning | Jun 24, 2024 | Deep Reinforcement Learning | Code Available | 2 | 5 |
| FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation | Mar 15, 2023 | Decoder, Instance Segmentation | Code Available | 2 | 5 |