| LLM Inference Unveiled: Survey and Roofline Model Insights | Feb 26, 2024 | Knowledge Distillation, Language Modelling | Code Available | 4 | 5 |
| Multimodal Whole Slide Foundation Model for Pathology | Nov 29, 2024 | Cross-Modal Retrieval, model | Code Available | 4 | 5 |
| TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch | Oct 27, 2023 | Self-Supervised Learning, Speech Enhancement | Code Available | 4 | 5 |
| Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Jun 12, 2024 | | Code Available | 4 | 5 |
| MonSter: Marry Monodepth to Stereo Unleashes Power | Jan 15, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 4 | 5 |
| Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook | Oct 16, 2023 | Time Series, Time Series Analysis | Code Available | 4 | 5 |
| Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference | Mar 8, 2023 | Hyperparameter Optimization, Language Modeling | Code Available | 4 | 5 |
| Efficient Post-training Quantization with FP8 Formats | Sep 26, 2023 | image-classification, Image Classification | Code Available | 4 | 5 |
| Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments | Jun 24, 2024 | Benchmarking | Code Available | 4 | 5 |
| Transformers in Time Series: A Survey | Feb 15, 2022 | Anomaly Detection, Survey | Code Available | 4 | 5 |
| RaTEScore: A Metric for Radiology Report Generation | Jun 24, 2024 | Diagnostic, Entity Embeddings | Code Available | 4 | 5 |
| ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching | Jul 12, 2025 | Dialogue Generation, text-to-speech | Code Available | 4 | 5 |
| Atom of Thoughts for Markov LLM Test-Time Scaling | Feb 17, 2025 | | Code Available | 4 | 5 |
| Mixtral of Experts | Jan 8, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4 | 5 |
| ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Feb 8, 2025 | Q-Learning, Safe Exploration | Code Available | 3 | 5 |
| KwaiAgents: Generalized Information-seeking Agent System with Large Language Models | Dec 8, 2023 | | Code Available | 3 | 5 |
| FlexRAG: A Flexible and Comprehensive Framework for Retrieval-Augmented Generation | Jun 14, 2025 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| How Far Are We From AGI: Are LLMs All We Need? | May 16, 2024 | All | Code Available | 3 | 5 |
| Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework | Mar 25, 2024 | Denoising | Code Available | 3 | 5 |
| Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting | Mar 15, 2024 | 3D Generation, Image to 3D | Code Available | 3 | 5 |
| TKAN: Temporal Kolmogorov-Arnold Networks | May 12, 2024 | Kolmogorov-Arnold Networks, Management | Code Available | 3 | 5 |
| How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Oct 9, 2023 | Code Generation, Instruction Following | Code Available | 3 | 5 |
| What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? | Mar 10, 2024 | Depth Estimation, Image Matting | Code Available | 3 | 5 |
| HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | Jan 24, 2025 | Autonomous Driving, Language Modeling | Code Available | 3 | 5 |
| AutoAugment: Learning Augmentation Policies from Data | May 24, 2018 | Data Augmentation, Domain Generalization | Code Available | 3 | 5 |
| Attention Heads of Large Language Models: A Survey | Sep 5, 2024 | Survey | Code Available | 3 | 5 |
| Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts | Mar 11, 2024 | Anomaly Detection | Code Available | 3 | 5 |
| Time-series Transformer Generative Adversarial Networks | May 23, 2022 | Question Answering, Time Series | Code Available | 3 | 5 |
| Denoising Vision Transformers | Jan 5, 2024 | Denoising, Depth Estimation | Code Available | 3 | 5 |
| High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain Activity | Jan 1, 2023 | Denoising, Image Reconstruction | Code Available | 3 | 5 |
| CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Jan 20, 2025 | Video Generation, Virtual Try-on | Code Available | 3 | 5 |
| SOAP: Improving and Stabilizing Shampoo using Adam | Sep 17, 2024 | Computational Efficiency | Code Available | 3 | 5 |
| Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips | Feb 15, 2024 | | Code Available | 3 | 5 |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | Feb 1, 2025 | 16k, GPU | Code Available | 3 | 5 |
| MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 3 | 5 |
| MiniViT: Compressing Vision Transformers with Weight Multiplexing | Apr 14, 2022 | Diversity, Image Classification | Code Available | 3 | 5 |
| SPMamba: State-space model is all you need in speech separation | Apr 2, 2024 | All, Mamba | Code Available | 3 | 5 |
| Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Aug 6, 2024 | audio moment retrieval, Highlight Detection | Code Available | 3 | 5 |
| Vision as LoRA | Mar 26, 2025 | | Code Available | 3 | 5 |
| Deep Limit Order Book Forecasting | Mar 14, 2024 | Deep Learning | Code Available | 3 | 5 |
| Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Mar 14, 2024 | Mamba, Moment Retrieval | Code Available | 3 | 5 |
| ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting | Jul 23, 2023 | Image Super-Resolution, Super-Resolution | Code Available | 3 | 5 |
| EfficientFormer: Vision Transformers at MobileNet Speed | Jun 2, 2022 | | Code Available | 3 | 5 |
| Demystify Mamba in Vision: A Linear Attention Perspective | May 26, 2024 | image-classification, Image Classification | Code Available | 3 | 5 |
| Visual Large Language Models for Generalized and Specialized Applications | Jan 6, 2025 | Ethics | Code Available | 3 | 5 |
| Order Matters: Sequence to sequence for sets | Nov 19, 2015 | Language Modeling | Code Available | 3 | 5 |
| MotionBERT: A Unified Perspective on Learning Human Motion Representations | Oct 12, 2022 | 3D Human Pose Estimation, 3D Pose Estimation | Code Available | 3 | 5 |
| SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation | Dec 16, 2024 | Decoder, Semantic Segmentation | Code Available | 3 | 5 |
| Large Language Models as Tool Makers | May 26, 2023 | | Code Available | 3 | 5 |
| Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception | Oct 16, 2024 | Binary Classification, Chunking | Code Available | 3 | 5 |