| HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation | Apr 9, 2023 | Denoising, Image Generation | Code Available | 2 |
| Liquid Structural State-Space Models | Sep 26, 2022 | Heart rate estimation, Long-range modeling | Code Available | 2 |
| STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting | Jun 7, 2024 | motion retargeting | Code Available | 2 |
| SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description | Aug 24, 2024 | Descriptive, Speech Synthesis | Code Available | 2 |
| One Transformer Can Understand Both 2D & 3D Molecular Data | Oct 4, 2022 | Graph Regression, molecular representation | Code Available | 2 |
| SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video | Jan 30, 2022 | 3D Human Reconstruction, Neural Rendering | Code Available | 2 |
| LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Nov 14, 2024 | Earth Observation, Instruction Following | Code Available | 2 |
| FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning | Aug 13, 2021 | Federated Learning | Code Available | 2 |
| LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications | Mar 4, 2025 | Action Generation | Code Available | 2 |
| Unveiling COVID-19 from Chest X-ray with deep learning: a hurdles race with small data | Apr 11, 2020 | Small Data Image Classification, Transfer Learning | Code Available | 2 |
| DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification | Jul 4, 2024 | Descriptive, Diversity | Code Available | 2 |
| CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Jul 3, 2024 | Benchmarking, Code Search | Code Available | 2 |
| ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations | Jul 30, 2021 | | Code Available | 2 |
| BioCLIP: A Vision Foundation Model for the Tree of Life | Nov 30, 2023 | | Code Available | 2 |
| VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module | Apr 7, 2024 | Image Registration | Code Available | 2 |
| Fusing finetuned models for better pretraining | Apr 6, 2022 | | Code Available | 2 |
| Flow Matching in Latent Space | Jul 17, 2023 | Computational Efficiency, Image Generation | Code Available | 2 |
| Evaluating Explainability for Graph Neural Networks | Aug 19, 2022 | | Code Available | 2 |
| Efficient Quality Diversity Optimization of 3D Buildings through 2D Pre-optimization | Mar 28, 2023 | Diversity | Code Available | 2 |
| Certified Human Trajectory Prediction | Mar 20, 2024 | Autonomous Vehicles, Prediction | Code Available | 2 |
| Rethinking Mobile Block for Efficient Attention-based Models | Jan 3, 2023 | Unity | Code Available | 2 |
| LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | Oct 3, 2024 | | Code Available | 2 |
| Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era | Sep 3, 2024 | Scene Understanding, Shadow Detection | Code Available | 2 |
| MambaHSI: Spatial-Spectral Mamba for Hyperspectral Image Classification | Jan 9, 2025 | Classification, Hyperspectral Image Classification | Code Available | 2 |
| Provable Robust Watermarking for AI-Generated Text | Jun 30, 2023 | Language Modelling | Code Available | 2 |
| Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | Jan 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation | May 16, 2024 | | Code Available | 2 |
| ToolGen: Unified Tool Retrieval and Calling via Generation | Oct 4, 2024 | Retrieval, Text Generation | Code Available | 2 |
| MoCha-Stereo: Motif Channel Attention Network for Stereo Matching | Apr 10, 2024 | Disparity Estimation, Stereo Depth Estimation | Code Available | 2 |
| Equivariant Energy-Guided SDE for Inverse Molecular Design | Sep 30, 2022 | 3D Molecule Generation, Drug Discovery | Code Available | 2 |
| LinVT: Empower Your Image-level Large Language Model to Understand Videos | Dec 6, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics | Dec 12, 2023 | Information Retrieval, Representation Learning | Code Available | 2 |
| Blue noise for diffusion models | Feb 7, 2024 | Denoising | Code Available | 2 |
| Recurrent Memory Transformer | Jul 14, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Artificial Kuramoto Oscillatory Neurons | Oct 17, 2024 | Adversarial Robustness, Object Discovery | Code Available | 2 |
| AvatarGen: A 3D Generative Model for Animatable Human Avatars | Nov 26, 2022 | Human Animation | Code Available | 2 |
| SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas | Mar 18, 2025 | Multi-agent Reinforcement Learning, reinforcement-learning | Code Available | 2 |
| Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation | Nov 23, 2022 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 |
| SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning | May 16, 2025 | Contrastive Learning | Code Available | 2 |
| Self-Supervised Multimodal Learning: A Survey | Mar 31, 2023 | Machine Translation, Self-Supervised Learning | Code Available | 2 |
| Unified Multimodal Discrete Diffusion | Mar 26, 2025 | Image Captioning, Image Generation | Code Available | 2 |
| RepairAgent: An Autonomous, LLM-Based Agent for Program Repair | Mar 25, 2024 | Language Modelling, Large Language Model | Code Available | 2 |
| RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration | Oct 21, 2024 | Point Cloud Registration | Code Available | 2 |
| Accurate 3D Body Shape Regression using Metric and Semantic Attributes | Jun 14, 2022 | 3D Human Reconstruction, 3D Human Shape Estimation | Code Available | 2 |
| Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | Feb 19, 2024 | | Code Available | 2 |
| ExpertPrompting: Instructing Large Language Models to be Distinguished Experts | May 24, 2023 | In-Context Learning, Instruction Following | Code Available | 2 |
| Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More | Feb 17, 2025 | | Code Available | 2 |
| RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models | Nov 16, 2022 | Dimensionality Reduction, Information Retrieval | Code Available | 2 |
| AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception | Apr 15, 2024 | | Code Available | 2 |
| Multimodal Analogical Reasoning over Knowledge Graphs | Oct 1, 2022 | Graph Embedding, Knowledge Graph Embedding | Code Available | 2 |