| Masked Face Recognition Dataset and Application | Mar 20, 2020 | Face Detection, Face Recognition | Code Available | 2 | 5 |
| LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | Oct 12, 2023 | Natural Language Understanding, Quantization | Code Available | 2 | 5 |
| Semantic Image Synthesis via Diffusion Models | Jun 30, 2022 | Decoder, Denoising | Code Available | 2 | 5 |
| Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing | Apr 4, 2023 | Multimodal fashion image editing | Code Available | 2 | 5 |
| Generating 3D Molecules for Target Protein Binding | Apr 19, 2022 | Drug Discovery, Graph Neural Network | Code Available | 2 | 5 |
| FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | May 22, 2023 | Imitation Learning, Motion Planning | Code Available | 2 | 5 |
| Isotropic Correlation Models for the Cross-Section of Equity Returns | Nov 13, 2024 | | Code Available | 2 | 5 |
| Large Language Model with Region-guided Referring and Grounding for CT Report Generation | Nov 23, 2024 | Computed Tomography (CT), Diagnostic | Code Available | 2 | 5 |
| QAEncoder: Towards Aligned Representation Learning in Question Answering System | Sep 30, 2024 | Document Embedding, Question Answering | Code Available | 2 | 5 |
| Neural-Driven Image Editing | Jul 7, 2025 | Contrastive Learning, Multimodel-guided image editing | Code Available | 2 | 5 |
| Rethinking Negative Instances for Generative Named Entity Recognition | Feb 26, 2024 | named-entity-recognition, Named Entity Recognition | Code Available | 2 | 5 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action Detection, Pose Prediction | Code Available | 2 | 5 |
| Space Group Informed Transformer for Crystalline Materials Generation | Mar 23, 2024 | | Code Available | 2 | 5 |
| SFFNet: A Wavelet-Based Spatial and Frequency Domain Fusion Network for Remote Sensing Segmentation | May 3, 2024 | feature selection | Code Available | 2 | 5 |
| Fourier Neural Operator with Learned Deformations for PDEs on General Geometries | Jul 11, 2022 | valid | Code Available | 2 | 5 |
| KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider | Jun 3, 2025 | | Code Available | 2 | 5 |
| Deep Video Prior for Video Consistency and Propagation | Jan 27, 2022 | Optical Flow Estimation, Semantic Segmentation | Code Available | 2 | 5 |
| Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Jan 11, 2021 | Language Modelling, Mixture-of-Experts | Code Available | 2 | 5 |
| Towards Large-Scale Training of Pathology Foundation Models | Mar 24, 2024 | Nuclear Segmentation, Self-Supervised Learning | Code Available | 2 | 5 |
| Explicit Differentiable Slicing and Global Deformation for Cardiac Mesh Reconstruction | Sep 3, 2024 | Anatomy | Code Available | 2 | 5 |
| MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | Aug 3, 2022 | Instance Segmentation, Segmentation | Code Available | 2 | 5 |
| Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards | Mar 14, 2025 | Denoising, Image Generation | Code Available | 2 | 5 |
| Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models | Jan 23, 2024 | Human-Object Interaction Detection, Object | Code Available | 2 | 5 |
| Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts | Mar 7, 2025 | Mixture-of-Experts, State Space Models | Code Available | 2 | 5 |
| Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Jun 11, 2024 | Diversity, GPU | Code Available | 2 | 5 |
| InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Jan 24, 2024 | document understanding, Question Answering | Code Available | 2 | 5 |
| Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | Oct 12, 2024 | Experimental Design, scientific discovery | Code Available | 2 | 5 |
| MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation | Jun 29, 2025 | GPU, Optical Flow Estimation | Code Available | 2 | 5 |
| NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates | Jun 17, 2022 | Audio Super-Resolution, Super-Resolution | Code Available | 2 | 5 |
| KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model | Jan 2, 2025 | MTEB Benchmark, Retrieval-augmented Generation | Code Available | 2 | 5 |
| RecDiffusion: Rectangling for Image Stitching with Diffusion Models | Mar 28, 2024 | Image Stitching | Code Available | 2 | 5 |
| APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs | Oct 31, 2024 | | Code Available | 2 | 5 |
| PALO: A Polyglot Large Multimodal Model for 5B People | Feb 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Continual Test-Time Domain Adaptation | Mar 25, 2022 | Domain Adaptation, Test-time Adaptation | Code Available | 2 | 5 |
| Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2 | 5 |
| Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection | Sep 13, 2024 | Mamba, Open Vocabulary Object Detection | Code Available | 2 | 5 |
| DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI | Jul 19, 2023 | Conversational Recommendation, Diversity | Code Available | 2 | 5 |
| Sample-Efficient Diffusion for Text-To-Speech Synthesis | Sep 1, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model | May 31, 2024 | 3DGS, Image Compression | Code Available | 2 | 5 |
| SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | May 16, 2024 | object-detection, Object Detection | Code Available | 2 | 5 |
| PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games | Apr 26, 2024 | Decision Making, Language Modeling | Code Available | 2 | 5 |
| Real-time Spatial-temporal Traversability Assessment via Feature-based Sparse Gaussian Process | Mar 6, 2025 | Autonomous Navigation, Computational Efficiency | Code Available | 2 | 5 |
| FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction | May 28, 2024 | In-Context Learning, Prediction | Code Available | 2 | 5 |
| AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities | Dec 18, 2024 | Change Detection, Diversity | Code Available | 2 | 5 |
| LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language | May 21, 2024 | regression | Code Available | 2 | 5 |
| BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations | Apr 15, 2022 | Self-Supervised Learning | Code Available | 2 | 5 |
| Learning local equivariant representations for quantum operators | Jul 8, 2024 | Computational Efficiency | Code Available | 2 | 5 |
| Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation | Apr 1, 2024 | Action Segmentation, Segmentation | Code Available | 2 | 5 |
| Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context | Sep 15, 2023 | | Code Available | 2 | 5 |
| ByT5 model for massively multilingual grapheme-to-phoneme conversion | Apr 6, 2022 | Grapheme-to-Phoneme Conversion | Code Available | 2 | 5 |