| H3WB: Human3.6M 3D WholeBody Dataset and Benchmark | Nov 28, 2022 | 3D Facial Landmark Localization, 3D Hand Pose Estimation | Code Available | 2 |
| PointPillars: Fast Encoders for Object Detection from Point Clouds | Dec 14, 2018 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| VTimeLLM: Empower LLM to Grasp Video Moments | Nov 30, 2023 | Dense Video Captioning, Temporal Relation Extraction | Code Available | 2 |
| SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Mar 11, 2025 | Decision Making, Interactive Segmentation | Code Available | 2 |
| Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Nov 15, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 |
| Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings | Mar 25, 2025 | 4k, Action Recognition | Code Available | 2 |
| Graph Diffusion Transformers for Multi-Conditional Molecular Generation | Jan 24, 2024 | Decoder, Denoising | Code Available | 2 |
| When and why vision-language models behave like bags-of-words, and what to do about it? | Oct 4, 2022 | Contrastive Learning, Retrieval | Code Available | 2 |
| CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Mar 31, 2024 | Denoising, Speech Synthesis | Code Available | 2 |
| FlexiDreamer: Single Image-to-3D Generation with FlexiCubes | Apr 1, 2024 | 3D Generation, Image to 3D | Code Available | 2 |
| USP: Unified Self-Supervised Pretraining for Image Generation and Understanding | Mar 8, 2025 | Image Generation, Representation Learning | Code Available | 2 |
| Alpha^2: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning | Jun 24, 2024 | Deep Reinforcement Learning | Code Available | 2 |
| FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation | Mar 15, 2023 | Decoder, Instance Segmentation | Code Available | 2 |
| Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer | Jun 9, 2022 | Autonomous Driving, GPU | Code Available | 2 |
| Mapping the Mind of an Instruction-based Image Editing using SMILE | Dec 20, 2024 | Autonomous Driving | Code Available | 2 |
| MatteFormer: Transformer-Based Image Matting via Prior-Tokens | Mar 29, 2022 | Image Matting | Code Available | 2 |
| LLMGA: Multimodal Large Language Model based Generation Assistant | Nov 27, 2023 | Image Generation, Language Modeling | Code Available | 2 |
| Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers | Jul 13, 2024 | Mamba, State Space Models | Code Available | 2 |
| auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data | Apr 15, 2022 | BIG-bench Machine Learning, counterfactual | Code Available | 2 |
| AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Jul 10, 2023 | Image Animation | Code Available | 2 |
| FastMoE: A Fast Mixture-of-Expert Training System | Mar 24, 2021 | GPU, Language Modeling | Code Available | 2 |
| Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving | Dec 9, 2024 | 4D reconstruction, Autonomous Driving | Code Available | 2 |
| Improving Image Restoration by Revisiting Global Information Aggregation | Dec 8, 2021 | Color Image Denoising, Deblurring | Code Available | 2 |
| Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network | Jul 29, 2024 | Decoder, Super-Resolution | Code Available | 2 |
| AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | Jan 3, 2024 | Video Alignment, Video Generation | Code Available | 2 |
| FLAT: Chinese NER Using Flat-Lattice Transformer | Apr 24, 2020 | Chinese Named Entity Recognition, named-entity-recognition | Code Available | 2 |
| RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models | Dec 31, 2023 | Hallucination, RAG | Code Available | 2 |
| SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Jul 30, 2024 | Caption Generation, Question Answering | Code Available | 2 |
| Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Jun 2, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation | Dec 8, 2022 | 3D Reconstruction, 3D Shape Generation | Code Available | 2 |
| ControlVideo: Training-free Controllable Text-to-Video Generation | May 22, 2023 | Image Generation, Text-to-Video Generation | Code Available | 2 |
| Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | Feb 14, 2024 | Language Modelling, Segmentation | Code Available | 2 |
| Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction | Feb 28, 2024 | Chatbot, Reconstruction Attack | Code Available | 2 |
| Forgetting Transformer: Softmax Attention with a Forget Gate | Mar 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Tool-Planner: Task Planning with Clusters across Multiple Tools | Jun 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| PID: Physics-Informed Diffusion Model for Infrared Image Generation | Jul 12, 2024 | Image Generation | Code Available | 2 |
| Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey | Jul 10, 2024 | Adversarial Attack, Image Generation | Code Available | 2 |
| LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch | Sep 4, 2024 | Evolutionary Algorithms, Fairness | Code Available | 2 |
| PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Sep 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer | Sep 12, 2024 | Target Sound Extraction | Code Available | 2 |
| No More Adam: Learning Rate Scaling at Initialization is All You Need | Dec 16, 2024 | All | Code Available | 2 |
| DAMamba: Vision State Space Model with Dynamic Adaptive Scan | Feb 18, 2025 | image-classification, Image Classification | Code Available | 2 |
| LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Feb 24, 2025 | Code Completion | Code Available | 2 |
| NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit | Oct 28, 2022 | Singing Voice Synthesis | Code Available | 2 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Nov 28, 2023 | 3D Question Answering (3D-QA), Diagnostic | Code Available | 2 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code Available | 2 |
| Hierarchical Open-vocabulary Universal Image Segmentation | Jul 3, 2023 | Image Comprehension, Image Segmentation | Code Available | 2 |
| vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Mar 20, 2024 | Action Recognition, Computational Efficiency | Code Available | 2 |
| Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | Jan 15, 2025 | Image Segmentation, Referring Expression Segmentation | Code Available | 2 |
| Guiding Language Models of Code with Global Context using Monitors | Jun 19, 2023 | Code Completion, Code Generation | Code Available | 2 |