| Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Aug 8, 2023 | Caption Generation, Image Captioning | Code Available | 2 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake Detection, Mamba | Code Available | 2 |
| MonoFormer: One Transformer for Both Diffusion and Autoregression | Sep 24, 2024 | Image Generation, Text Generation | Code Available | 2 |
| MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition | Nov 24, 2020 | GPU, Image Matting | Code Available | 2 |
| Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov Arnold Networks | Jun 16, 2024 | Form, Kolmogorov-Arnold Networks | Code Available | 2 |
| iPad: Iterative Proposal-centric End-to-End Autonomous Driving | May 21, 2025 | Autonomous Driving, Bench2Drive | Code Available | 2 |
| Reconstructing 3D Human Pose by Watching Humans in the Mirror | Apr 1, 2021 | 3D Pose Estimation, Pose Estimation | Code Available | 2 |
| Occlusion-Aware Instance Segmentation via BiLayer Network Architectures | Aug 8, 2022 | Human Instance Segmentation, Instance Segmentation | Code Available | 2 |
| RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models | Sep 30, 2024 | Contrastive Learning | Code Available | 2 |
| Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference | Dec 15, 2023 | Decoder, Denoising | Code Available | 2 |
| DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Dec 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| Policy improvement by planning with Gumbel | Sep 29, 2021 | reinforcement-learning, Reinforcement Learning (RL) | Code Available | 2 |
| BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization | Oct 14, 2019 | Bayesian Optimisation, Bayesian Optimization | Code Available | 2 |
| TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP | Apr 29, 2020 | Adversarial Attack, Adversarial Text | Code Available | 2 |
| TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation | Oct 8, 2024 | Video Generation | Code Available | 2 |
| Unpaired Motion Style Transfer from Video to Animation | May 12, 2020 | 3D Reconstruction, Motion Style Transfer | Code Available | 2 |
| ADOP: Approximate Differentiable One-Pixel Point Rendering | Oct 13, 2021 | Inverse Rendering, Neural Rendering | Code Available | 2 |
| LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models | Dec 5, 2023 | Decoder | Code Available | 2 |
| OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment | Mar 27, 2025 | Retrieval-augmented Generation | Code Available | 2 |
| JAX, M.D.: A Framework for Differentiable Physics | Dec 9, 2019 | Drug Discovery, GPU | Code Available | 2 |
| Image Inpainting with Learnable Feature Imputation | Nov 2, 2020 | Image Inpainting, Imputation | Code Available | 2 |
| Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation | Jul 19, 2023 | Talking Head Generation, Video Generation | Code Available | 2 |
| RouterBench: A Benchmark for Multi-LLM Routing System | Mar 18, 2024 | | Code Available | 2 |
| VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization | Mar 31, 2021 | Virtual Try-on, Vocal Bursts Intensity Prediction | Code Available | 2 |
| Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models | Feb 5, 2024 | Medical Diagnosis | Code Available | 2 |
| ESOD: Efficient Small Object Detection on High-Resolution Images | Jul 23, 2024 | GPU, Object | Code Available | 2 |
| FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration | Aug 19, 2020 | 3D Hand Pose Estimation, 3D Human Reconstruction | Code Available | 2 |
| Inversion-Based Style Transfer with Diffusion Models | Nov 23, 2022 | Denoising, Image Generation | Code Available | 2 |
| BOP Challenge 2020 on 6D Object Localization | Sep 15, 2020 | 6D Pose Estimation, 6D Pose Estimation using RGB | Code Available | 2 |
| pixelNeRF: Neural Radiance Fields from One or Few Images | Dec 3, 2020 | 3D Reconstruction, Generalizable Novel View Synthesis | Code Available | 2 |
| Episodic Memories Generation and Evaluation Benchmark for Large Language Models | Jan 21, 2025 | | Code Available | 2 |
| Video Swin Transformer | Jun 24, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding | Feb 14, 2024 | Chatbot, Code Generation | Code Available | 2 |
| Open Graph Benchmark: Datasets for Machine Learning on Graphs | May 2, 2020 | Knowledge Graphs, Node Property Prediction | Code Available | 2 |
| "Principal Components" Enable A New Language of Images | Mar 11, 2025 | Decoder | Code Available | 2 |
| FrankMocap: A Monocular 3D Whole-Body Pose Estimation System via Regression and Integration | Aug 13, 2021 | 3D Human Pose Estimation, 3D Human Reconstruction | Code Available | 2 |
| GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields | Nov 24, 2020 | Image Generation, Neural Rendering | Code Available | 2 |
| Flow-edge Guided Video Completion | Sep 3, 2020 | Video Inpainting | Code Available | 2 |
| Frustratingly Simple Few-Shot Object Detection | Mar 16, 2020 | Cross-Domain Few-Shot Object Detection, Few-Shot Object Detection | Code Available | 2 |
| Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM | Aug 2, 2019 | | Code Available | 2 |
| OWL: A Large Language Model for IT Operations | Sep 17, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Auto-Regressive Moving Diffusion Models for Time Series Forecasting | Dec 12, 2024 | Time Series, Time Series Forecasting | Code Available | 2 |
| LandMarkSystem Technical Report | Mar 27, 2025 | 3DGS, 3D Reconstruction | Code Available | 2 |
| SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training | Jun 2, 2021 | Deep Learning, Fraud Detection | Code Available | 2 |
| PTQ4SAM: Post-Training Quantization for Segment Anything | May 6, 2024 | Instance Segmentation, object-detection | Code Available | 2 |
| Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey | Apr 29, 2025 | Decision Making, Multi-agent Reinforcement Learning | Code Available | 2 |
| Full-Atom Peptide Design with Geometric Latent Diffusion | Feb 21, 2024 | | Code Available | 2 |
| SoccerMap: A Deep Learning Architecture for Visually-Interpretable Analysis in Soccer | Oct 20, 2020 | Decision Making | Code Available | 2 |
| TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | Aug 6, 2024 | Speech Enhancement, Speech Separation | Code Available | 2 |
| Example-based Motion Synthesis via Generative Motion Matching | Jun 1, 2023 | Motion Generation, Motion Synthesis | Code Available | 2 |