| UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation | Dec 8, 2022 | Image Segmentation, Medical Image Segmentation | Code Available | 2 | 5 |
| StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | Nov 6, 2024 | Image Comprehension, Streaming video understanding | Code Available | 2 | 5 |
| Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba | Jul 12, 2024 | 3D Hand Pose Estimation, Mamba | Code Available | 2 | 5 |
| Light and Optimal Schrödinger Bridge Matching | Feb 5, 2024 | | Code Available | 2 | 5 |
| Fuzz4All: Universal Fuzzing with Large Language Models | Aug 9, 2023 | | Code Available | 2 | 5 |
| VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection | Aug 22, 2023 | Anomaly Detection, Binary Classification | Code Available | 2 | 5 |
| ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction | Mar 30, 2024 | | Code Available | 2 | 5 |
| How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States | Jun 9, 2024 | Safety Alignment | Code Available | 2 | 5 |
| Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design | Oct 17, 2024 | Protein Design, Reinforcement Learning (RL) | Code Available | 2 | 5 |
| Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels | Dec 28, 2023 | Aesthetics Quality Assessment, Image Quality Assessment | Code Available | 2 | 5 |
| TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts | Mar 5, 2024 | Graph Attention, Graph Embedding | Code Available | 2 | 5 |
| The Chosen One: Consistent Characters in Text-to-Image Diffusion Models | Nov 16, 2023 | Consistent Character Generation, Image Generation | Code Available | 2 | 5 |
| SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding | Aug 21, 2023 | Entity Typing, Event Extraction | Code Available | 2 | 5 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | Jul 31, 2023 | Multiple-choice, Question Answering | Code Available | 2 | 5 |
| Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning | Jan 2, 2025 | Imputation, Retrieval | Code Available | 2 | 5 |
| AlignBench: Benchmarking Chinese Alignment of Large Language Models | Nov 30, 2023 | Benchmarking | Code Available | 2 | 5 |
| SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | Dec 20, 2023 | Attribute, Cross-Modal Retrieval | Code Available | 2 | 5 |
| EMBER2024 -- A Benchmark Dataset for Holistic Evaluation of Malware Classifiers | Jun 5, 2025 | Malware Analysis, Malware Classification | Code Available | 2 | 5 |
| Why are Visually-Grounded Language Models Bad at Image Classification? | May 28, 2024 | Classification, image-classification | Code Available | 2 | 5 |
| ICASSP 2023 Acoustic Echo Cancellation Challenge | Sep 22, 2023 | Acoustic echo cancellation, Speech Enhancement | Code Available | 2 | 5 |
| Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning | Sep 11, 2024 | Large Language Model | Code Available | 2 | 5 |
| ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images | Jan 1, 2024 | Denoising | Code Available | 2 | 5 |
| Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning | Feb 14, 2025 | Reinforcement Learning (RL), Skills Assessment | Code Available | 2 | 5 |
| Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation | May 10, 2024 | Semantic Segmentation | Code Available | 2 | 5 |
| Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Sep 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning | Mar 18, 2025 | Autonomous Driving, Motion Planning | Code Available | 2 | 5 |
| ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination | Oct 8, 2023 | Diversity, Multi-agent Reinforcement Learning | Code Available | 2 | 5 |
| BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration | May 22, 2024 | | Code Available | 2 | 5 |
| Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving | Mar 28, 2024 | Autonomous Driving, Language Modeling | Code Available | 2 | 5 |
| Pose for Everything: Towards Category-Agnostic Pose Estimation | Jul 21, 2022 | 2D Pose Estimation, Category-Agnostic Pose Estimation | Code Available | 2 | 5 |
| Neural Optimal Transport | Jan 28, 2022 | Image-to-Image Translation, Translation | Code Available | 2 | 5 |
| Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations | Jul 17, 2024 | Image Enhancement, Low-Light Image Enhancement | Code Available | 2 | 5 |
| Singer Identity Representation Learning using Self-Supervised Techniques | Jan 10, 2024 | Domain Generalization, Representation Learning | Code Available | 2 | 5 |
| What does a platypus look like? Generating customized prompts for zero-shot image classification | Sep 7, 2022 | Descriptive, image-classification | Code Available | 2 | 5 |
| Towards Zero-shot Point Cloud Anomaly Detection: A Multi-View Projection Framework | Sep 20, 2024 | Anomaly Detection, Specificity | Code Available | 2 | 5 |
| Skeleton-free Pose Transfer for Stylized 3D Characters | Jul 28, 2022 | Pose Transfer | Code Available | 2 | 5 |
| Ambiguous Medical Image Segmentation using Diffusion Models | Apr 10, 2023 | Diagnostic, Diversity | Code Available | 2 | 5 |
| VQA^2: Visual Question Answering for Video Quality Assessment | Nov 6, 2024 | Question Answering, Video Quality Assessment | Code Available | 2 | 5 |
| PosterLlama: Bridging Design Ability of Langauge Model to Contents-Aware Layout Generation | Apr 1, 2024 | Layout Design, Layout Generation | Code Available | 2 | 5 |
| Class-Incremental Learning: A Survey | Feb 7, 2023 | class-incremental learning, Class Incremental Learning | Code Available | 2 | 5 |
| Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling | Feb 21, 2019 | Sequence-To-Sequence Speech Recognition | Code Available | 2 | 5 |
| InstMove: Instance Motion for Object-centric Video Segmentation | Mar 14, 2023 | Object, Optical Flow Estimation | Code Available | 2 | 5 |
| EgoMimic: Scaling Imitation Learning via Egocentric Video | Oct 31, 2024 | Diversity, Imitation Learning | Code Available | 2 | 5 |
| Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies | Jul 18, 2024 | ARC | Code Available | 2 | 5 |
| VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder | May 13, 2022 | Blind Face Restoration, Decoder | Code Available | 2 | 5 |
| SG-Reg: Generalizable and Efficient Scene Graph Registration | Apr 20, 2025 | GPU | Code Available | 2 | 5 |
| Modular Boundaries in Recurrent Neural Networks | Oct 31, 2023 | Community Detection, Dimensionality Reduction | Code Available | 2 | 5 |
| ESMStereo: Enhanced ShuffleMixer Disparity Upsampling for Real-Time and Accurate Stereo Matching | Jun 26, 2025 | Disparity Estimation, Stereo Matching | Code Available | 2 | 5 |
| Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model | Mar 13, 2024 | Autonomous Navigation | Code Available | 2 | 5 |