| Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary | Jan 16, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Meta Prompting for AI Systems | Nov 20, 2023 | Data Interaction, GSM8K | Code Available | 2 |
| VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking | Jan 24, 2025 | Denoising, Image Generation | Code Available | 2 |
| Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation | Feb 16, 2024 | Video Generation | Code Available | 2 |
| FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion | Oct 27, 2022 | Data Augmentation, text annotation | Code Available | 2 |
| Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking | Mar 9, 2023 | Contrastive Learning, Decoder | Code Available | 2 |
| MS-DETR: Efficient DETR Training with Mixed Supervision | Jan 8, 2024 | Decoder, Object | Code Available | 2 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures | Mar 20, 2025 | Deblurring, Zero-shot Generalization | Code Available | 2 |
| Accelerating Transformers with Spectrum-Preserving Token Merging | May 25, 2024 | image-classification, Image Classification | Code Available | 2 |
| Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See | Oct 8, 2024 | | Code Available | 2 |
| UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning | Jan 12, 2022 | Representation Learning | Code Available | 2 |
| OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation | Jun 9, 2025 | Image Generation | Code Available | 2 |
| ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | Sep 22, 2023 | Math | Code Available | 2 |
| CATT: Character-based Arabic Tashkeel Transformer | Jul 3, 2024 | Arabic Text Diacritization, Decoder | Code Available | 2 |
| Monocular Occupancy Prediction for Scalable Indoor Scenes | Jul 16, 2024 | 3D Semantic Scene Completion from a single RGB image, Prediction | Code Available | 2 |
| Very fast Bayesian Additive Regression Trees on GPU | Oct 30, 2024 | CPU, GPU | Code Available | 2 |
| Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal | Jul 24, 2024 | Raindrop Removal, Rain Removal | Code Available | 2 |
| MARG: Multi-Agent Review Generation for Scientific Papers | Jan 8, 2024 | Review Generation, Specificity | Code Available | 2 |
| YOLOv8-ResCBAM: YOLOv8 Based on An Effective Attention Module for Pediatric Wrist Fracture Detection | Sep 27, 2024 | Fracture detection | Code Available | 2 |
| Ignore Previous Prompt: Attack Techniques For Language Models | Nov 17, 2022 | Adversarial Attack, Adversarial Text | Code Available | 2 |
| IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Jul 15, 2024 | Denoising, Depth Estimation | Code Available | 2 |
| GPT Can Solve Mathematical Problems Without a Calculator | Sep 6, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review | Jul 18, 2024 | Reinforcement Learning (RL) | Code Available | 2 |
| VSSD: Vision Mamba with Non-Causal State Space Duality | Jul 26, 2024 | image-classification, Image Classification | Code Available | 2 |
| SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video | Mar 12, 2025 | Video Inpainting | Code Available | 2 |
| Dynamic Factor Allocation Leveraging Regime-Switching Signals | Oct 18, 2024 | | Code Available | 2 |
| BoQ: A Place is Worth a Bag of Learnable Queries | May 12, 2024 | Image Similarity Search, Retrieval | Code Available | 2 |
| Alpha-CLIP: A CLIP Model Focusing on Wherever You Want | Dec 6, 2023 | 3D Generation | Code Available | 2 |
| Dual Vision Transformer | Jul 11, 2022 | | Code Available | 2 |
| DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Nov 26, 2024 | Attribute, Diversity | Code Available | 2 |
| Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression | Jul 21, 2022 | Hallucination, Image Enhancement | Code Available | 2 |
| Learning to Decode Collaboratively with Multiple Language Models | Mar 6, 2024 | Instruction Following | Code Available | 2 |
| Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers | Jun 25, 2024 | Image Generation, Model Compression | Code Available | 2 |
| QAQ: Quality Adaptive Quantization for LLM KV Cache | Mar 7, 2024 | Quantization, Question Answering | Code Available | 2 |
| VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models | Mar 8, 2024 | Video Generation | Code Available | 2 |
| IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems | Mar 8, 2024 | | Code Available | 2 |
| Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance | Mar 8, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 2 |
| VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models | Mar 10, 2024 | Copy Detection, Image Generation | Code Available | 2 |
| Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | Mar 12, 2024 | Deblurring, Decoder | Code Available | 2 |
| RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Mar 12, 2024 | Change Detection, Zero-shot Generalization | Code Available | 2 |
| The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023 | Apr 1, 2024 | MRI Reconstruction | Code Available | 2 |
| MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning | Mar 13, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments | Mar 14, 2024 | Zero-Shot Learning | Code Available | 2 |
| Generative Region-Language Pretraining for Open-Ended Object Detection | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment | Mar 16, 2024 | Image Quality Assessment | Code Available | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| Graph Neural Networks for Learning Equivariant Representations of Neural Networks | Mar 18, 2024 | | Code Available | 2 |
| Diversified and Personalized Multi-rater Medical Image Segmentation | Mar 20, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 2 |
| A Multimodal Vision Foundation Model for Clinical Dermatology | Oct 19, 2024 | Diagnostic, Lesion Segmentation | Code Available | 2 |