| Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | Jan 16, 2025 | 16k, Hallucination | Code Available | 2 |
| DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization | May 18, 2025 | Mathematical Reasoning | Code Available | 2 |
| KAN or MLP: A Fairer Comparison | Jul 23, 2024 | Continual Learning | Code Available | 2 |
| ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Nov 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale | Apr 19, 2025 | Benchmarking | Code Available | 2 |
| GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation | Oct 27, 2024 | Image Generation, Text to Image Generation | Code Available | 2 |
| Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers | Oct 9, 2024 | Decoder, Re-Ranking | Code Available | 2 |
| Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing | Feb 18, 2024 | Deep Reinforcement Learning, Edge-computing | Code Available | 2 |
| Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss | Jan 5, 2024 | Knowledge Distillation | Code Available | 2 |
| ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models | Mar 17, 2025 | Computational Efficiency, Hallucination | Code Available | 2 |
| CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards | Jul 12, 2025 | | Code Available | 2 |
| Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Feb 24, 2025 | image-classification, Image Classification | Code Available | 2 |
| CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model | Feb 6, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset | Dec 3, 2024 | 3D Generation | Code Available | 2 |
| ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models | Jun 26, 2024 | Classification | Code Available | 2 |
| mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | Feb 12, 2025 | cross-modal alignment, Large Language Model | Code Available | 2 |
| Multiview Scene Graph | Oct 15, 2024 | Decoder, Object | Code Available | 2 |
| MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Nov 22, 2024 | Video Generation | Code Available | 2 |
| N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting | Jan 30, 2022 | Time Series, Time Series Analysis | Code Available | 2 |
| DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection | Sep 24, 2024 | Depression Detection, Mamba | Code Available | 2 |
| Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models | Feb 26, 2024 | | Code Available | 2 |
| Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors | Jul 13, 2024 | Super-Resolution, Video Super-Resolution | Code Available | 2 |
| DiMeR: Disentangled Mesh Reconstruction Model | Apr 24, 2025 | Image to 3D, model | Code Available | 2 |
| Can Large Language Model Agents Simulate Human Trust Behavior? | Feb 7, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing | Mar 14, 2025 | | Code Available | 2 |
| Fast Best-of-N Decoding via Speculative Rejection | Oct 26, 2024 | | Code Available | 2 |
| Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 |
| On the Generalization of BasicVSR++ to Video Deblurring and Denoising | Apr 11, 2022 | Deblurring, Denoising | Code Available | 2 |
| Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation | May 4, 2025 | Knowledge Distillation, Multivariate Time Series Forecasting | Code Available | 2 |
| PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation | Jan 23, 2024 | Decoder, Image Segmentation | Code Available | 2 |
| One Quantizer is Enough: Toward a Lightweight Audio Codec | Apr 7, 2025 | | Code Available | 2 |
| Side Adapter Network for Open-Vocabulary Semantic Segmentation | Feb 23, 2023 | Language Modelling, Open Vocabulary Semantic Segmentation | Code Available | 2 |
| PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification | Aug 30, 2019 | Paraphrase Identification, Sentence | Code Available | 2 |
| MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models | May 21, 2025 | Computational Efficiency | Code Available | 2 |
| Style-Based Global Appearance Flow for Virtual Try-On | Apr 3, 2022 | Virtual Try-on | Code Available | 2 |
| BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis | Nov 13, 2024 | NeRF, Novel View Synthesis | Code Available | 2 |
| Open-Vocabulary DETR with Conditional Matching | Mar 22, 2022 | Language Modelling, object-detection | Code Available | 2 |
| Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection | Jan 1, 2025 | Defect Detection | Code Available | 2 |
| FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching | Mar 24, 2025 | Weakly-supervised Learning | Code Available | 2 |
| HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions | Jul 28, 2022 | Image Classification, Object Detection | Code Available | 2 |
| iFormer: Integrating ConvNet and Transformer for Mobile Application | Jan 26, 2025 | Instance Segmentation, object-detection | Code Available | 2 |
| FairyGen: Storied Cartoon Video from a Single Child-Drawn Character | Jun 26, 2025 | | Code Available | 2 |
| BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Dec 10, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 2 |
| V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Nov 5, 2024 | Hallucination, Language Modeling | Code Available | 2 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain Adaptation, Math | Code Available | 2 |
| Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection | May 19, 2025 | Event-based vision, Object | Code Available | 2 |
| TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation | Feb 11, 2025 | Image Generation | Code Available | 2 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention | Jan 12, 2023 | Image Dehazing | Code Available | 2 |
| A Plug-and-Play Bregman ADMM Module for Inferring Event Branches in Temporal Point Processes | Jan 8, 2025 | Point Processes | Code Available | 2 |