| Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Jul 17, 2024 | Benchmarking, Red Teaming | Code Available | 2 | 5 |
| Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Mar 31, 2025 | Logical Reasoning, Multiple-choice | Code Available | 2 | 5 |
| Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ | Jul 1, 2017 | | Code Available | 2 | 5 |
| ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | Nov 11, 2024 | image-classification, Image Classification | Code Available | 2 | 5 |
| CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats | May 29, 2024 | De-identification, Fairness | Code Available | 2 | 5 |
| NeRF-RPN: A general framework for object detection in NeRFs | Nov 21, 2022 | NeRF, object-detection | Code Available | 2 | 5 |
| HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | Oct 23, 2023 | Diagnostic, Hallucination | Code Available | 2 | 5 |
| Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows | Nov 30, 2024 | Dynamic Time Warping | Code Available | 2 | 5 |
| AirMorph: Topology-Preserving Deep Learning for Pulmonary Airway Analysis | Dec 15, 2024 | Anatomy, Deep Learning | Code Available | 2 | 5 |
| Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey | Feb 14, 2024 | Survey | Code Available | 2 | 5 |
| SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | May 22, 2025 | Motion Estimation, Question Answering | Code Available | 2 | 5 |
| An Empirical Study of Qwen3 Quantization | May 4, 2025 | Natural Language Understanding, Quantization | Code Available | 2 | 5 |
| One-shot Entropy Minimization | May 26, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2 | 5 |
| KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Mar 31, 2024 | 3D Human Pose Estimation, Monocular 3D Human Pose Estimation | Code Available | 2 | 5 |
| BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks | May 26, 2023 | Image Captioning, Medical Visual Question Answering | Code Available | 2 | 5 |
| Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets | May 17, 2025 | | Code Available | 2 | 5 |
| VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling | Jun 6, 2024 | Diversity, Music Generation | Code Available | 2 | 5 |
| ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities | Apr 9, 2025 | Colorization, Sketch Colorization | Code Available | 2 | 5 |
| ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | May 30, 2025 | Image Generation, Language Modeling | Code Available | 2 | 5 |
| nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark | Jan 1, 2025 | Benchmarking, Image Segmentation | Code Available | 2 | 5 |
| X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation | Mar 8, 2025 | GPU, Image Generation | Code Available | 2 | 5 |
| A Transformer-Based Siamese Network for Change Detection | Jan 4, 2022 | Change Detection, Decoder | Code Available | 2 | 5 |
| Focal Modulation Networks | Mar 22, 2022 | image-classification, Image Classification | Code Available | 2 | 5 |
| An Embodied Generalist Agent in 3D World | Nov 18, 2023 | 3D dense captioning, 3D Question Answering (3D-QA) | Code Available | 2 | 5 |
| Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Mar 3, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 | 5 |
| JaxUED: A simple and useable UED library in Jax | Mar 19, 2024 | CPU | Code Available | 2 | 5 |
| AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | Mar 2, 2024 | Instruction Following, LLM real-life tasks | Code Available | 2 | 5 |
| Reliable, Reproducible, and Really Fast Leaderboards with Evalica | Dec 15, 2024 | | Code Available | 2 | 5 |
| PillarNet: Real-Time and High-Performance Pillar-based 3D Object Detection | May 16, 2022 | 3D Object Detection, Autonomous Driving | Code Available | 2 | 5 |
| SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention | Feb 15, 2024 | Time Series, Time Series Forecasting | Code Available | 2 | 5 |
| ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Jun 11, 2025 | cross-modal alignment, Descriptive | Code Available | 2 | 5 |
| Flows: Building Blocks of Reasoning and Collaborating AI | Aug 2, 2023 | Prompt Engineering | Code Available | 2 | 5 |
| Supervised Contrastive Learning | Apr 23, 2020 | Class Incremental Learning, Contrastive Learning | Code Available | 2 | 5 |
| All You Need to Know About Training Image Retrieval Models | Mar 17, 2025 | All, Image Retrieval | Code Available | 2 | 5 |
| Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | Jun 30, 2025 | Imitation Learning, Trajectory Planning | Code Available | 2 | 5 |
| AERO: Audio Super Resolution in the Spectral Domain | Nov 22, 2022 | Audio Super-Resolution, Bandwidth Extension | Code Available | 2 | 5 |
| MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining | Dec 29, 2023 | GPU, Language Modeling | Code Available | 2 | 5 |
| GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval | Jul 17, 2024 | Decoder, Image Enhancement | Code Available | 2 | 5 |
| A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization | Jan 1, 2023 | Translation | Code Available | 2 | 5 |
| Training Generative Adversarial Networks with Limited Data | Jun 11, 2020 | 10-shot image generation, Conditional Image Generation | Code Available | 2 | 5 |
| GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond | Sep 28, 2023 | Benchmarking | Code Available | 2 | 5 |
| Rethinking Visual Geo-localization for Large-Scale Applications | Apr 5, 2022 | Contrastive Learning, geo-localization | Code Available | 2 | 5 |
| Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Jun 18, 2024 | Anomaly Detection, Anomaly Localization | Code Available | 2 | 5 |
| DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling | May 16, 2025 | Attribute | Code Available | 2 | 5 |
| SUM: Saliency Unification through Mamba for Visual Attention Modeling | Jun 25, 2024 | Mamba, Marketing | Code Available | 2 | 5 |
| Masked Autoregressive Flow for Density Estimation | May 19, 2017 | Density Estimation | Code Available | 2 | 5 |
| SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI | Jan 23, 2024 | MRI segmentation, Segmentation | Code Available | 2 | 5 |
| Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Jul 31, 2024 | Image Generation, Memorization | Code Available | 2 | 5 |
| Generalized Inner Loop Meta-Learning | Oct 3, 2019 | Meta-Learning, reinforcement-learning | Code Available | 2 | 5 |
| AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning | May 25, 2024 | | Code Available | 2 | 5 |