| Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models | Jun 1, 2023 | GPU, Image Compression | Code Available | 2 |
| LibAUC: A Deep Learning Library for X-Risk Optimization | Jun 5, 2023 | Benchmarking, Classification | Code Available | 2 |
| STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection | Jun 5, 2023 | Face Alignment, Facial Landmark Detection | Code Available | 2 |
| Estimating heterogeneous treatment effects with right-censored data via causal survival forests | Jan 27, 2020 | | Code Available | 2 |
| FasterViT: Fast Vision Transformers with Hierarchical Attention | Jun 9, 2023 | Image Classification, object-detection | Code Available | 2 |
| Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models | Jun 13, 2023 | Catalytic activity prediction, Chemical-Disease Interaction Extraction | Code Available | 2 |
| P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark | May 21, 2025 | | Code Available | 2 |
| EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations | Jun 21, 2023 | Graph Property Prediction | Code Available | 2 |
| SoftGPT: Learn Goal-oriented Soft Object Manipulation Skills by Generative Pre-trained Heterogeneous Graph Transformer | Jun 22, 2023 | Object | Code Available | 2 |
| 3D Reconstruction of Spherical Images based on Incremental Structure from Motion | Jun 22, 2023 | 3D Reconstruction | Code Available | 2 |
| RVT: Robotic View Transformer for 3D Object Manipulation | Jun 26, 2023 | Object, Robot Manipulation | Code Available | 2 |
| To Spike or Not To Spike: A Digital Hardware Perspective on Deep Learning Acceleration | Jun 27, 2023 | | Code Available | 2 |
| MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying | Jun 30, 2023 | Autonomous Driving, Decoder | Code Available | 2 |
| An Open-Source Knowledge Graph Ecosystem for the Life Sciences | Jul 11, 2023 | Knowledge Graphs | Code Available | 2 |
| Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation | Oct 29, 2024 | Few-shot 3D Point Cloud Semantic Segmentation, Point Cloud Segmentation | Code Available | 2 |
| A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning | Jul 16, 2023 | Continual Learning, Federated Learning | Code Available | 2 |
| Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference | May 9, 2024 | | Code Available | 2 |
| PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks | Jul 21, 2023 | | Code Available | 2 |
| LP-MusicCaps: LLM-Based Pseudo Music Captioning | Jul 31, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion | Aug 11, 2023 | Voice Conversion | Code Available | 2 |
| CDMamba: Incorporating Local Clues into Mamba for Remote Sensing Image Binary Change Detection | Jun 6, 2024 | Change Detection, Mamba | Code Available | 2 |
| Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations | Aug 23, 2023 | Benchmarking, Decoder | Code Available | 2 |
| SONAR: Sentence-Level Multimodal and Language-Agnostic Representations | Aug 22, 2023 | Decoder, Machine Translation | Code Available | 2 |
| FreeVA: Offline MLLM as Training-Free Video Assistant | May 13, 2024 | Fairness, Question Answering | Code Available | 2 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Nov 29, 2024 | Robot Task Planning, Scheduling | Code Available | 2 |
| LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Oct 14, 2024 | Video Captioning, Video Generation | Code Available | 2 |
| WeatherBench 2: A benchmark for the next generation of data-driven global weather models | Aug 29, 2023 | Weather Forecasting | Code Available | 2 |
| ConTextTab: A Semantics-Aware Tabular In-Context Learner | Jun 12, 2025 | In-Context Learning, World Knowledge | Code Available | 2 |
| MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | May 20, 2024 | Benchmarking, Question Answering | Code Available | 2 |
| FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2 |
| PromptASR for contextualized ASR with controllable style | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 2 |
| PLVS: A SLAM System with Points, Lines, Volumetric Mapping, and 3D Incremental Segmentation | Sep 19, 2023 | 3D Reconstruction, Segmentation | Code Available | 2 |
| OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control | Sep 22, 2023 | GPU, reinforcement-learning | Code Available | 2 |
| RLLTE: Long-Term Evolution Project of Reinforcement Learning | Sep 28, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Smoothing Methods for Automatic Differentiation Across Conditional Branches | Oct 5, 2023 | Stochastic Optimization | Code Available | 2 |
| Generative Judge for Evaluating Alignment | Oct 9, 2023 | | Code Available | 2 |
| Under pressure: learning-based analog gauge reading in the wild | Apr 12, 2024 | | Code Available | 2 |
| Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition | Oct 10, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| Large Language Models as Zero-shot Dialogue State Tracker through Function Calling | Feb 16, 2024 | Avg, Dialogue State Tracking | Code Available | 2 |
| DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model | Oct 11, 2023 | Autonomous Driving, Image Generation | Code Available | 2 |
| UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | Oct 12, 2023 | 3D Object Detection, 3D Semantic Segmentation | Code Available | 2 |
| A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models | Oct 14, 2023 | Document Ranking | Code Available | 2 |
| Representation Learning with Large Language Models for Recommendation | Oct 24, 2023 | Recommendation Systems, Representation Learning | Code Available | 2 |
| Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | Oct 29, 2023 | GPU, Quantization | Code Available | 2 |
| EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition | Jun 26, 2024 | EEG, EEG Emotion Recognition | Code Available | 2 |
| GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs | Oct 14, 2024 | Few-Shot Learning, TAG | Code Available | 2 |
| ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras | Oct 12, 2024 | motion prediction, Pose Tracking | Code Available | 2 |
| A Survey of Graph Meets Large Language Model: Progress and Future Directions | Nov 21, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion | Nov 27, 2023 | Lifelike 3D Human Generation | Code Available | 2 |
| TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data | Feb 12, 2025 | | Code Available | 2 |