| Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | Oct 3, 2024 | | Code Available | 3 | 5 |
| RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control | May 27, 2024 | | Code Available | 3 | 5 |
| Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Dec 18, 2024 | Representation Learning, Robot Manipulation | Code Available | 3 | 5 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation, Hallucination | Code Available | 3 | 5 |
| Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | Jul 19, 2024 | | Code Available | 3 | 5 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Apr 15, 2025 | ARC, HellaSwag | Code Available | 3 | 5 |
| The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry | Feb 6, 2024 | | Code Available | 3 | 5 |
| UCF: Uncovering Common Features for Generalizable Deepfake Detection | Apr 27, 2023 | Binary Classification, Decoder | Code Available | 3 | 5 |
| Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection | Mar 19, 2024 | Anomaly Detection, Benchmarking | Code Available | 3 | 5 |
| REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers | Apr 15, 2025 | Image Generation | Code Available | 3 | 5 |
| C-Adapter: Adapting Deep Classifiers for Efficient Conformal Prediction Sets | Oct 12, 2024 | Conformal Prediction, Prediction | Code Available | 3 | 5 |
| Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis | May 16, 2024 | Language Modelling, Large Language Model | Code Available | 3 | 5 |
| CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification | Mar 13, 2022 | Audio Classification, Knowledge Distillation | Code Available | 3 | 5 |
| Modular Duality in Deep Learning | Oct 28, 2024 | Deep Learning, GPU | Code Available | 3 | 5 |
| Distributed Prioritized Experience Replay | Mar 2, 2018 | Atari Games, Deep Reinforcement Learning | Code Available | 3 | 5 |
| PromptHMR: Promptable Human Mesh Recovery | Apr 8, 2025 | 3D Human Pose Estimation, Human Mesh Recovery | Code Available | 3 | 5 |
| Pushing the Limits of Large Language Model Quantization via the Linearity Theorem | Nov 26, 2024 | GPU, Language Modeling | Code Available | 3 | 5 |
| U-Net: Convolutional Networks for Biomedical Image Segmentation | May 18, 2015 | Cell Segmentation, Cell Tracking | Code Available | 3 | 5 |
| History-Guided Video Diffusion | Feb 10, 2025 | Video Generation | Code Available | 3 | 5 |
| Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services | Apr 25, 2024 | GPU | Code Available | 3 | 5 |
| Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval | Feb 17, 2025 | Information Retrieval, Retrieval | Code Available | 3 | 5 |
| Probabilistic Volumetric Fusion for Dense Monocular SLAM | Oct 3, 2022 | | Code Available | 3 | 5 |
| Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation | May 30, 2023 | Machine Translation, Segmentation | Code Available | 3 | 5 |
| Discovered Policy Optimisation | Oct 11, 2022 | Ingenuity, Meta-Learning | Code Available | 3 | 5 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | May 13, 2024 | Data Augmentation, GSM8K | Code Available | 3 | 5 |
| On Distillation of Guided Diffusion Models | Oct 6, 2022 | Denoising, Image Generation | Code Available | 3 | 5 |
| SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | Aug 26, 2024 | | Code Available | 3 | 5 |
| SoundStream: An End-to-End Neural Audio Codec | Jul 7, 2021 | CPU, Decoder | Code Available | 3 | 5 |
| Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective | Feb 2, 2025 | Multi-Task Learning | Code Available | 3 | 5 |
| On the Content Bias in Fréchet Video Distance | Apr 18, 2024 | Video Generation | Code Available | 3 | 5 |
| Flow Matching for Generative Modeling | Oct 6, 2022 | Density Estimation, Image Generation | Code Available | 3 | 5 |
| W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | Aug 7, 2021 | Contrastive Learning, Language Modeling | Code Available | 3 | 5 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Feb 16, 2024 | Denoising, Robot Manipulation | Code Available | 3 | 5 |
| Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion | Jun 6, 2024 | 3D Generation | Code Available | 3 | 5 |
| SkyMath: Technical Report | Oct 25, 2023 | GSM8K, Language Modeling | Code Available | 3 | 5 |
| XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters | May 19, 2023 | | Code Available | 3 | 5 |
| Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Mar 26, 2025 | Few-Shot Learning, Visual Reasoning | Code Available | 3 | 5 |
| Designing and building the mlpack open-source machine learning library | Aug 17, 2017 | BIG-bench Machine Learning | Code Available | 3 | 5 |
| One-step Diffusion with Distribution Matching Distillation | Nov 30, 2023 | | Code Available | 3 | 5 |
| EAFormer: Scene Text Segmentation with Edge-Aware Transformers | Jul 24, 2024 | Decoder, Segmentation | Code Available | 3 | 5 |
| Accurate clinical and biomedical Named entity recognition at scale | Jul 19, 2022 | Clinical Concept Extraction, De-identification | Code Available | 3 | 5 |
| Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | Oct 3, 2024 | Scheduling | Code Available | 3 | 5 |
| EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models | Feb 18, 2024 | Event Extraction, Hallucination | Code Available | 3 | 5 |
| LRM: Large Reconstruction Model for Single Image to 3D | Nov 8, 2023 | Image to 3D, NeRF | Code Available | 3 | 5 |
| GluonTS: Probabilistic Time Series Models in Python | Jun 12, 2019 | Anomaly Detection, Time Series | Code Available | 3 | 5 |
| Practical Deep Reinforcement Learning Approach for Stock Trading | Nov 19, 2018 | Deep Reinforcement Learning, reinforcement-learning | Code Available | 3 | 5 |
| CodeBLEU: a Method for Automatic Evaluation of Code Synthesis | Sep 22, 2020 | Code Translation, Translation | Code Available | 3 | 5 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Dec 5, 2024 | Multimodal Reasoning, Natural Language Visual Grounding | Code Available | 3 | 5 |
| Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Jun 10, 2024 | 3D Semantic Segmentation, Computed Tomography (CT) | Code Available | 3 | 5 |
| Text Embeddings Reveal (Almost) As Much As Text | Oct 10, 2023 | | Code Available | 3 | 5 |