| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Jun 5, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Cross-video Identity Correlating for Person Re-identification Pre-training | Sep 27, 2024 | Denoising, Person Re-Identification | Code Available | 2 |
| Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion | Oct 19, 2024 | image-classification, Image Classification | Code Available | 2 |
| FCN: Fusing Exponential and Linear Cross Network for Click-Through Rate Prediction | Jul 18, 2024 | Click-Through Rate Prediction | Code Available | 2 |
| SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Apr 23, 2024 | 3D Human Pose Estimation, Pose Estimation | Code Available | 2 |
| Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement | Oct 27, 2024 | Decoder, Image Enhancement | Code Available | 2 |
| Learning Vision from Models Rivals Learning Vision from Data | Dec 28, 2023 | Contrastive Learning, Image Captioning | Code Available | 2 |
| Enhancing Retrieval-Augmented Generation: A Study of Best Practices | Jan 13, 2025 | In-Context Learning, RAG | Code Available | 2 |
| A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems | Jun 26, 2024 | Audio Source Separation, Decoder | Code Available | 2 |
| ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation | Aug 13, 2024 | Segmentation, Semantic Segmentation | Code Available | 2 |
| MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search | Mar 26, 2025 | Decision Making, RAG | Code Available | 2 |
| Correlation Matching Transformation Transformers for UHD Image Restoration | Jun 2, 2024 | Deblurring, Image Deblurring | Code Available | 2 |
| Me LLaMA: Foundation Large Language Models for Medical Applications | Feb 20, 2024 | Few-Shot Learning, GPU | Code Available | 2 |
| Mixed Diffusion for 3D Indoor Scene Synthesis | May 31, 2024 | Denoising, Indoor Scene Synthesis | Code Available | 2 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPU, Language Modeling | Code Available | 2 |
| CoSeR: Bridging Image and Language for Cognitive Super-Resolution | Nov 27, 2023 | Super-Resolution | Code Available | 2 |
| FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation | Mar 8, 2024 | Federated Learning, Image Segmentation | Code Available | 2 |
| Improved Canonicalization for Model Agnostic Equivariance | May 23, 2024 | Contrastive Learning, model | Code Available | 2 |
| PENCIL: Long Thoughts with Short Memory | Mar 18, 2025 | | Code Available | 2 |
| GenN2N: Generative NeRF2NeRF Translation | Apr 3, 2024 | Colorization, Contrastive Learning | Code Available | 2 |
| Translating Images to Road Network: A Sequence-to-Sequence Perspective | Feb 13, 2024 | | Code Available | 2 |
| BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream | Jul 2, 2024 | NeRF | Code Available | 2 |
| Unsupervised Semantic Segmentation by Distilling Feature Correspondences | Mar 16, 2022 | Form, Semantic Segmentation | Code Available | 2 |
| I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models | Dec 27, 2023 | Video Generation | Code Available | 2 |
| Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery | Jan 11, 2024 | 3D Reconstruction, Depth Estimation | Code Available | 2 |
| LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels | Mar 22, 2024 | 3D Semantic Segmentation, LIDAR Semantic Segmentation | Code Available | 2 |
| PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views | Oct 24, 2024 | | Code Available | 2 |
| How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis | Feb 8, 2024 | | Code Available | 2 |
| Blockwise Parallel Transformers for Large Context Models | Sep 21, 2023 | | Code Available | 2 |
| Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | Jun 20, 2024 | Hallucination, Question Answering | Code Available | 2 |
| Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models | Sep 15, 2022 | image-classification, Image Classification | Code Available | 2 |
| Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | Nov 25, 2022 | GPU | Code Available | 2 |
| Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming | Dec 11, 2024 | Text to 3D, Text-to-Image Generation | Code Available | 2 |
| ETSformer: Exponential Smoothing Transformers for Time-series Forecasting | Feb 3, 2022 | Time Series, Time Series Analysis | Code Available | 2 |
| YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion | Mar 10, 2025 | | Code Available | 2 |
| CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI | May 29, 2022 | Chinese Sentiment Analysis, Conversational Response Generation | Code Available | 2 |
| V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer | Mar 20, 2022 | 3D Object Detection, Autonomous Vehicles | Code Available | 2 |
| Preference Leakage: A Contamination Problem in LLM-as-a-judge | Feb 3, 2025 | | Code Available | 2 |
| Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression | Jun 11, 2025 | Image Generation | Code Available | 2 |
| Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution | Feb 27, 2023 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge | Apr 2, 2024 | Robotic Grasping | Code Available | 2 |
| Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | Mar 22, 2024 | Information Retrieval | Code Available | 2 |
| ParCo: Part-Coordinating Text-to-Motion Synthesis | Mar 27, 2024 | Motion Synthesis | Code Available | 2 |
| A Comprehensive Survey on Self-Supervised Learning for Recommendation | Apr 4, 2024 | Contrastive Learning, Recommendation Systems | Code Available | 2 |
| Self-Supervised Visual Preference Alignment | Apr 16, 2024 | 8k, MM-Vet | Code Available | 2 |
| Kandinsky 3.0 Technical Report | Dec 6, 2023 | Image Generation, Image to Video Generation | Code Available | 2 |
| SparseLLM: Towards Global Pruning for Pre-trained Language Models | Feb 28, 2024 | Computational Efficiency, Problem Decomposition | Code Available | 2 |
| AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery | Jan 10, 2025 | | Code Available | 2 |
| Diffusion Time-step Curriculum for One Image to 3D Generation | Apr 6, 2024 | 3D Generation, Image to 3D | Code Available | 2 |
| Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs | Apr 17, 2025 | Position | Code Available | 2 |