| Adapting a Language Model While Preserving its General Knowledge | Jan 21, 2023 | Continual Learning, General Knowledge | Code Available | 2 |
| DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design | Feb 26, 2024 | Avg, Drug Design | Code Available | 2 |
| NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | May 9, 2022 | Sentence, Speech Synthesis | Code Available | 2 |
| General Detection-based Text Line Recognition | Sep 25, 2024 | HTR, Optical Character Recognition (OCR) | Code Available | 2 |
| Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models | Apr 14, 2025 | Action Generation, Denoising | Code Available | 2 |
| FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling | Jun 9, 2025 | Density Estimation | Code Available | 2 |
| Denoising Diffusion Models for Plug-and-Play Image Restoration | May 15, 2023 | Deblurring, Denoising | Code Available | 2 |
| OmniColor: A Global Camera Pose Optimization Approach of LiDAR-360Camera Fusion for Colorizing Point Clouds | Apr 6, 2024 | 3D Reconstruction | Code Available | 2 |
| RemDet: Rethinking Efficient Model Design for UAV Object Detection | Dec 13, 2024 | Object, object-detection | Code Available | 2 |
| Pose2Sim: An open-source Python package for multiview markerless kinematics | Sep 14, 2022 | | Code Available | 2 |
| Scaling Laws of Synthetic Images for Model Training ... for Now | Dec 7, 2023 | | Code Available | 2 |
| DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency | Mar 10, 2024 | Prediction, Prognosis | Code Available | 2 |
| FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation | Mar 29, 2024 | Blind Docking, Drug Discovery | Code Available | 2 |
| Sky-image-based solar forecasting using deep learning with multi-location data: training models locally, globally or via transfer learning? | Nov 3, 2022 | Transfer Learning | Code Available | 2 |
| Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study | Oct 19, 2022 | Data Augmentation, Relation | Code Available | 2 |
| MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | Mar 28, 2024 | AI Agent, Minecraft | Code Available | 2 |
| DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis | Dec 28, 2024 | 3DGS, Face Generation | Code Available | 2 |
| Neural Preset for Color Style Transfer | Mar 23, 2023 | 4k, Color Normalization | Code Available | 2 |
| XNet: Wavelet-Based Low and High Frequency Fusion Networks for Fully- and Semi-Supervised Semantic Segmentation of Biomedical Images | Jan 1, 2023 | Segmentation, Semantic Segmentation | Code Available | 2 |
| VecCity: A Taxonomy-guided Library for Map Entity Representation Learning | Oct 31, 2024 | Representation Learning | Code Available | 2 |
| Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval | Mar 22, 2023 | Image-text matching, Language Modeling | Code Available | 2 |
| MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement | Jul 1, 2025 | Automatic Speech Recognition, Mamba | Code Available | 2 |
| Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps | Sep 26, 2022 | 3D Object Detection, Monocular 3D Object Detection | Code Available | 2 |
| Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning | Oct 12, 2022 | Contrastive Learning, Form | Code Available | 2 |
| FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier | Nov 15, 2023 | Computational Efficiency, Patch Matching | Code Available | 2 |
| Neural Light Spheres for Implicit Image Stitching and View Synthesis | Sep 26, 2024 | Image Stitching | Code Available | 2 |
| The More You See in 2D the More You Perceive in 3D | Jan 1, 2024 | 3D Reconstruction, Image to 3D | Code Available | 2 |
| Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models | Dec 10, 2024 | Video Generation | Code Available | 2 |
| Data Management For Training Large Language Models: A Survey | Dec 4, 2023 | Management, Survey | Code Available | 2 |
| AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning | Dec 4, 2024 | Video Understanding | Code Available | 2 |
| MTS-Mixers: Multivariate Time Series Forecasting via Factorized Temporal and Channel Mixing | Feb 9, 2023 | Multivariate Time Series Forecasting, Time Series | Code Available | 2 |
| Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models | May 5, 2025 | Anomaly Classification, Anomaly Detection | Code Available | 2 |
| AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making | Nov 6, 2024 | Decision Making, Diversity | Code Available | 2 |
| PyTorch-IE: Fast and Reproducible Prototyping for Information Extraction | May 16, 2024 | | Code Available | 2 |
| ProteinBERT: a universal deep-learning model of protein sequence and function | Feb 10, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Large Language Model Instruction Following: A Survey of Progresses and Challenges | Mar 18, 2023 | Instruction Following, Language Modeling | Code Available | 2 |
| OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation | Mar 28, 2024 | 3D Object Detection, Novel Class Discovery | Code Available | 2 |
| On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR | Nov 1, 2024 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 2 |
| Image as Set of Points | Mar 2, 2023 | Clustering | Code Available | 2 |
| Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Jun 11, 2025 | Multimodal Reasoning, Spatial Reasoning | Code Available | 2 |
| Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code | Oct 2, 2023 | Image Generation, Text-based Image Editing | Code Available | 2 |
| Flowformer: Linearizing Transformers with Conservation Flows | Feb 13, 2022 | D4RL, Offline RL | Code Available | 2 |
| Scaled Decoupled Distillation | Jan 1, 2024 | Knowledge Distillation | Code Available | 2 |
| Diffusion Models for Imperceptible and Transferable Adversarial Attack | May 14, 2023 | Adversarial Attack | Code Available | 2 |
| UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Dec 13, 2024 | Contrastive Learning | Code Available | 2 |
| MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems | Jan 7, 2025 | RAG, Retrieval | Code Available | 2 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | Feb 19, 2024 | Benchmarking, Interpretability Techniques for Deep Learning | Code Available | 2 |
| PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns | Mar 20, 2024 | Multimodal Reasoning | Code Available | 2 |
| Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes | May 17, 2023 | Autonomous Driving, Trajectory Planning | Code Available | 2 |
| Harnessing Vision Models for Time Series Analysis: A Survey | Feb 13, 2025 | Survey, Time Series | Code Available | 2 |