| ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization | Jun 4, 2024 | geo-localization, Visual Place Recognition | Code Available | 2 | 5 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 | 5 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain Adaptation, Math | Code Available | 2 | 5 |
| iFormer: Integrating ConvNet and Transformer for Mobile Application | Jan 26, 2025 | Instance Segmentation, object-detection | Code Available | 2 | 5 |
| Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection | Jan 1, 2025 | Defect Detection | Code Available | 2 | 5 |
| PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification | Aug 30, 2019 | Paraphrase Identification, Sentence | Code Available | 2 | 5 |
| ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models | Mar 17, 2025 | Computational Efficiency, Hallucination | Code Available | 2 | 5 |
| GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation | Oct 27, 2024 | Image Generation, Text to Image Generation | Code Available | 2 | 5 |
| Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale | Apr 19, 2025 | Benchmarking | Code Available | 2 | 5 |
| Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Jan 16, 2025 | 16k, Hallucination | Code Available | 2 | 5 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available | 2 | 5 |
| CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jan 28, 2025 | Hallucination | Code Available | 2 | 5 |
| Mitigating Object Hallucination via Concentric Causal Attention | Oct 21, 2024 | Hallucination, Object | Code Available | 2 | 5 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 | 5 |
| Bridging the Gap Between End-to-End and Two-Step Text Spotting | Apr 6, 2024 | Text Spotting | Code Available | 2 | 5 |
| Degradation-Aware Feature Perturbation for All-in-One Image Restoration | May 19, 2025 | All, Deblurring | Code Available | 2 | 5 |
| NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation | Feb 18, 2025 | 3D Generation, 3D Molecule Generation | Code Available | 2 | 5 |
| Unicom: Universal and Compact Representation Learning for Image Retrieval | Apr 12, 2023 | Image Classification, Image Retrieval | Code Available | 2 | 5 |
| SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | Mar 14, 2024 | Action Recognition, Human Interaction Recognition | Code Available | 2 | 5 |
| Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection | Jun 25, 2024 | Audio Deepfake Detection, Synthetic Speech Detection | Code Available | 2 | 5 |
| Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method | Nov 23, 2024 | Autonomous Driving | Code Available | 2 | 5 |
| Golden Cudgel Network for Real-Time Semantic Segmentation | Mar 5, 2025 | Real-Time Semantic Segmentation, Semantic Segmentation | Code Available | 2 | 5 |
| Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | May 3, 2023 | | Code Available | 2 | 5 |
| Agent Attention: On the Integration of Softmax and Linear Attention | Dec 14, 2023 | Computational Efficiency, image-classification | Code Available | 2 | 5 |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Feb 18, 2025 | Computational Efficiency, CPU | Code Available | 2 | 5 |