| Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline | Sep 26, 2023 | Knowledge Distillation, Object Tracking | Code Available | 2 |
| Scalable Diffusion Models with State Space Backbone | Feb 8, 2024 | Conditional Image Generation, Image Generation | Code Available | 2 |
| Temporally Consistent Transformers for Video Generation | Oct 5, 2022 | Minecraft, Video Generation | Code Available | 2 |
| Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary | Jan 16, 2024 | Image Super-Resolution, Super-Resolution | Code Available | 2 |
| Meta Prompting for AI Systems | Nov 20, 2023 | Data Interaction, GSM8K | Code Available | 2 |
| VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking | Jan 24, 2025 | Denoising, Image Generation | Code Available | 2 |
| Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation | Feb 16, 2024 | Video Generation | Code Available | 2 |
| FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion | Oct 27, 2022 | Data Augmentation, text annotation | Code Available | 2 |
| Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking | Mar 9, 2023 | Contrastive Learning, Decoder | Code Available | 2 |
| MS-DETR: Efficient DETR Training with Mixed Supervision | Jan 8, 2024 | Decoder, Object | Code Available | 2 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures | Mar 20, 2025 | Deblurring, Zero-shot Generalization | Code Available | 2 |
| Accelerating Transformers with Spectrum-Preserving Token Merging | May 25, 2024 | image-classification, Image Classification | Code Available | 2 |
| Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See | Oct 8, 2024 | | Code Available | 2 |
| UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning | Jan 12, 2022 | Representation Learning | Code Available | 2 |
| OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation | Jun 9, 2025 | Image Generation | Code Available | 2 |
| ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | Sep 22, 2023 | Math | Code Available | 2 |
| CATT: Character-based Arabic Tashkeel Transformer | Jul 3, 2024 | Arabic Text Diacritization, Decoder | Code Available | 2 |
| Monocular Occupancy Prediction for Scalable Indoor Scenes | Jul 16, 2024 | 3D Semantic Scene Completion from a single RGB image, Prediction | Code Available | 2 |
| Very fast Bayesian Additive Regression Trees on GPU | Oct 30, 2024 | CPU, GPU | Code Available | 2 |
| Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal | Jul 24, 2024 | Raindrop Removal, Rain Removal | Code Available | 2 |
| MARG: Multi-Agent Review Generation for Scientific Papers | Jan 8, 2024 | Review Generation, Specificity | Code Available | 2 |
| YOLOv8-ResCBAM: YOLOv8 Based on An Effective Attention Module for Pediatric Wrist Fracture Detection | Sep 27, 2024 | Fracture detection | Code Available | 2 |
| Ignore Previous Prompt: Attack Techniques For Language Models | Nov 17, 2022 | Adversarial Attack, Adversarial Text | Code Available | 2 |
| IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Jul 15, 2024 | Denoising, Depth Estimation | Code Available | 2 |
| GPT Can Solve Mathematical Problems Without a Calculator | Sep 6, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review | Jul 18, 2024 | Reinforcement Learning (RL) | Code Available | 2 |
| VSSD: Vision Mamba with Non-Causal State Space Duality | Jul 26, 2024 | image-classification, Image Classification | Code Available | 2 |
| SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video | Mar 12, 2025 | Video Inpainting | Code Available | 2 |
| Dynamic Factor Allocation Leveraging Regime-Switching Signals | Oct 18, 2024 | | Code Available | 2 |
| BoQ: A Place is Worth a Bag of Learnable Queries | May 12, 2024 | Image Similarity Search, Retrieval | Code Available | 2 |
| Alpha-CLIP: A CLIP Model Focusing on Wherever You Want | Dec 6, 2023 | 3D Generation | Code Available | 2 |
| Dual Vision Transformer | Jul 11, 2022 | | Code Available | 2 |
| DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting | Nov 26, 2024 | Attribute, Diversity | Code Available | 2 |
| Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression | Jul 21, 2022 | Hallucination, Image Enhancement | Code Available | 2 |
| Graph-based Neural Weather Prediction for Limited Area Modeling | Sep 29, 2023 | Weather Forecasting | Code Available | 2 |
| MOROCCO: Model Resource Comparison Framework | Apr 29, 2021 | model | Code Available | 2 |
| LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging | Jan 1, 2025 | Lesion Segmentation, Segmentation | Code Available | 2 |
| Vakyansh: ASR Toolkit for Low Resource Indic languages | Mar 30, 2022 | Punctuation Restoration, speech-recognition | Code Available | 2 |
| Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration | May 26, 2025 | Domain Generalization, Hallucination | Code Available | 2 |
| KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model | Jun 26, 2025 | Representation Learning, Retrieval | Code Available | 2 |
| CodeS: Towards Building Open-source Language Models for Text-to-SQL | Feb 26, 2024 | Data Augmentation, Diagnostic | Code Available | 2 |
| TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer | Jul 27, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models | Apr 8, 2023 | Drug Discovery, Protein Design | Code Available | 2 |
| Number it: Temporal Grounding Videos like Flipping Manga | Nov 15, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 |
| Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Apr 14, 2025 | Mathematical Reasoning, mbpp | Code Available | 2 |
| Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Oct 15, 2024 | Disentanglement, Inductive Bias | Code Available | 2 |
| Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation | Dec 1, 2022 | 3D Generation, Text to 3D | Code Available | 2 |
| Photoreal Scene Reconstruction from an Egocentric Device | Jun 4, 2025 | | Code Available | 2 |
| How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library | Mar 31, 2024 | Question Answering | Code Available | 2 |