| Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning | Mar 26, 2025 | Few-Shot Learning, Visual Reasoning | Code Available | 3 |
| Designing and building the mlpack open-source machine learning library | Aug 17, 2017 | BIG-bench Machine Learning | Code Available | 3 |
| One-step Diffusion with Distribution Matching Distillation | Nov 30, 2023 | | Code Available | 3 |
| EAFormer: Scene Text Segmentation with Edge-Aware Transformers | Jul 24, 2024 | Decoder, Segmentation | Code Available | 3 |
| Accurate clinical and biomedical Named entity recognition at scale | Jul 19, 2022 | Clinical Concept Extraction, De-identification | Code Available | 3 |
| Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | Oct 3, 2024 | Scheduling | Code Available | 3 |
| EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models | Feb 18, 2024 | Event Extraction, Hallucination | Code Available | 3 |
| LRM: Large Reconstruction Model for Single Image to 3D | Nov 8, 2023 | Image to 3D, NeRF | Code Available | 3 |
| GluonTS: Probabilistic Time Series Models in Python | Jun 12, 2019 | Anomaly Detection, Time Series | Code Available | 3 |
| Practical Deep Reinforcement Learning Approach for Stock Trading | Nov 19, 2018 | Deep Reinforcement Learning, reinforcement-learning | Code Available | 3 |
| CodeBLEU: a Method for Automatic Evaluation of Code Synthesis | Sep 22, 2020 | Code Translation, Translation | Code Available | 3 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | Dec 5, 2024 | Multimodal Reasoning, Natural Language Visual Grounding | Code Available | 3 |
| Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Jun 10, 2024 | 3D Semantic Segmentation, Computed Tomography (CT) | Code Available | 3 |
| Text Embeddings Reveal (Almost) As Much As Text | Oct 10, 2023 | | Code Available | 3 |
| dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | May 17, 2025 | Denoising | Code Available | 3 |
| SkillMimic: Learning Basketball Interaction Skills from Demonstrations | Aug 12, 2024 | Diversity, Human-Object Interaction Detection | Code Available | 3 |
| DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Jan 28, 2025 | 3D Generation | Code Available | 3 |
| Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | Aug 20, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| RMPE: Regional Multi-person Pose Estimation | Dec 1, 2016 | 2D Human Pose Estimation, Human Detection | Code Available | 3 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Jun 12, 2024 | Benchmarking, Chatbot | Code Available | 3 |
| NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Mar 5, 2024 | Quantization, Speech Synthesis | Code Available | 3 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic Reasoning, GSM8K | Code Available | 3 |
| HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving | Dec 2, 2024 | Autonomous Driving, Novel View Synthesis | Code Available | 3 |
| Learning and discovering multiple solutions using physics-informed neural networks with random initialization and deep ensemble | Mar 8, 2025 | Uncertainty Quantification | Code Available | 3 |
| 3D Facial Expressions through Analysis-by-Neural-Synthesis | Apr 5, 2024 | 3D Face Reconstruction, Face Reconstruction | Code Available | 3 |
| ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | Mar 13, 2024 | Instance Segmentation, Object Detection | Code Available | 3 |
| GLU Variants Improve Transformer | Feb 12, 2020 | | Code Available | 3 |
| OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research | May 16, 2023 | Philosophy, reinforcement-learning | Code Available | 3 |
| DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Mar 11, 2024 | Autonomous Driving, Language Modeling | Code Available | 3 |
| ADOPT: Modified Adam Can Converge with Any β_2 with the Optimal Rate | Nov 5, 2024 | Deep Reinforcement Learning, image-classification | Code Available | 3 |
| FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available | 3 |
| Momentum Contrast for Unsupervised Visual Representation Learning | Nov 13, 2019 | Contrastive Learning, Image Classification | Code Available | 3 |
| Characterization of Excess Risk for Locally Strongly Convex Population Risk | Dec 4, 2020 | | Code Available | 3 |
| wav2letter++: The Fastest Open-source Speech Recognition System | Dec 18, 2018 | Speech Recognition | Code Available | 3 |
| Identifying Audio Adversarial Examples via Anomalous Pattern Detection | Feb 13, 2020 | | Code Available | 3 |
| Towards VQA Models That Can Read | Apr 18, 2019 | TextVQA, Visual Question Answering (VQA) | Code Available | 3 |
| First Order Motion Model for Image Animation | Feb 29, 2020 | Image Animation, model | Code Available | 3 |
| Transformers in Medical Imaging: A Survey | Jan 24, 2022 | Image Classification, Image Segmentation | Code Available | 3 |
| Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | Sep 3, 2023 | Hallucination, World Knowledge | Code Available | 3 |
| Pythia v0.1: the Winning Entry to the VQA Challenge 2018 | Jul 26, 2018 | Data Augmentation, Visual Question Answering (VQA) | Code Available | 3 |
| SQLFlow: A Bridge between SQL and Machine Learning | Jan 19, 2020 | BIG-bench Machine Learning | Code Available | 3 |
| Mesh R-CNN | Jun 6, 2019 | 3D Shape Modeling, Prediction | Code Available | 3 |
| PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting | Dec 16, 2024 | 3D Reconstruction, 4k | Code Available | 3 |
| MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Mar 7, 2025 | Video Generation | Code Available | 3 |
| Efficient and Robust Automated Machine Learning | Dec 1, 2015 | AutoML, Bayesian Optimization | Code Available | 3 |
| MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem | May 20, 2025 | Mathematical Reasoning, scientific discovery | Code Available | 3 |
| SynSin: End-to-end View Synthesis from a Single Image | Dec 18, 2019 | Novel View Synthesis | Code Available | 3 |
| An Extensible Framework for Open Heterogeneous Collaborative Perception | Jan 25, 2024 | | Code Available | 3 |
| Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Jun 7, 2024 | Benchmarking, Decoder | Code Available | 3 |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Feb 12, 2020 | BIG-bench Machine Learning, GPU | Code Available | 3 |