| Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion | Mar 20, 2024 | Autonomous Vehicles, Denoising | Code Available | 3 | 5 |
| Delay-penalized CTC implemented based on Finite State Transducer | May 19, 2023 | Attribute | Code Available | 3 | 5 |
| BlackMamba: Mixture of Experts for State-Space Models | Feb 1, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection | Dec 30, 2024 | object-detection, Object Detection | Code Available | 3 | 5 |
| Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 29, 2025 | Domain Generalization, Math | Code Available | 3 | 5 |
| OneChart: Purify the Chart Structural Extraction via One Auxiliary Token | Apr 15, 2024 | Decoder | Code Available | 3 | 5 |
| AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos | Mar 30, 2025 | | Code Available | 3 | 5 |
| What We Talk About When We Talk About LMs: Implicit Paradigm Shifts and the Ship of Language Models | Jul 2, 2024 | | Code Available | 3 | 5 |
| StyleShot: A Snapshot on Any Style | Jul 1, 2024 | Image Generation, Style Transfer | Code Available | 3 | 5 |
| Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Jul 29, 2024 | | Code Available | 3 | 5 |
| BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec | Sep 9, 2024 | Quantization | Code Available | 3 | 5 |
| Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning | May 25, 2023 | reinforcement-learning, Reinforcement Learning | Code Available | 3 | 5 |
| Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks | Mar 30, 2023 | Human Parsing, Pedestrian Attribute Recognition | Code Available | 3 | 5 |
| DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection | Apr 19, 2024 | Benchmarking, DeepFake Detection | Code Available | 3 | 5 |
| VAD: Vectorized Scene Representation for Efficient Autonomous Driving | Mar 21, 2023 | Autonomous Driving, Bench2Drive | Code Available | 3 | 5 |
| Scaling Diffusion Transformers to 16 Billion Parameters | Jul 16, 2024 | Attribute, Conditional Image Generation | Code Available | 3 | 5 |
| Ola: Pushing the Frontiers of Omni-Modal Language Model | Feb 6, 2025 | cross-modal alignment, Language Modeling | Code Available | 3 | 5 |
| SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | Dec 15, 2023 | Contrastive Learning, Earth Observation | Code Available | 3 | 5 |
| LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Jun 5, 2023 | Benchmarking | Code Available | 3 | 5 |
| Matcha-TTS: A fast TTS architecture with conditional flow matching | Sep 6, 2023 | Acoustic Modelling, Decoder | Code Available | 3 | 5 |
| Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models | Jul 9, 2024 | Vision and Language Navigation | Code Available | 3 | 5 |
| Decoding-based Regression | Jan 31, 2025 | Density Estimation, regression | Code Available | 3 | 5 |
| OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | Dec 27, 2024 | Diversity, Synthetic Data Generation | Code Available | 3 | 5 |
| Demystifying Long Chain-of-Thought Reasoning in LLMs | Feb 5, 2025 | Reinforcement Learning (RL) | Code Available | 3 | 5 |
| MAXIM: Multi-Axis MLP for Image Processing | Jan 9, 2022 | Deblurring, Denoising | Code Available | 3 | 5 |
| MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Jun 3, 2024 | MMLU, Multi-task Language Understanding | Code Available | 3 | 5 |
| Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline | Jan 1, 2024 | Crowd Counting, object-detection | Code Available | 3 | 5 |
| TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | Apr 18, 2024 | GPU | Code Available | 3 | 5 |
| IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages | Mar 11, 2024 | Articles | Code Available | 3 | 5 |
| Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | Jun 2, 2024 | 3D Object Detection, cross-modal alignment | Code Available | 3 | 5 |
| Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation | Feb 27, 2025 | Image Generation, token-classification | Code Available | 3 | 5 |
| A Survey on Mixture of Experts | Jun 26, 2024 | In-Context Learning, Mixture-of-Experts | Code Available | 3 | 5 |
| InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts | May 25, 2025 | Chart Understanding, Question Answering | Code Available | 3 | 5 |
| Descriptive Image Quality Assessment in the Wild | May 29, 2024 | Descriptive, Image Quality Assessment | Code Available | 3 | 5 |
| Vision Transformer Adapter for Dense Predictions | May 17, 2022 | Instance Segmentation, Object Detection | Code Available | 3 | 5 |
| UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization | May 20, 2024 | Visual Localization | Code Available | 3 | 5 |
| DoWhy: Addressing Challenges in Expressing and Validating Causal Assumptions | Aug 27, 2021 | Causal Discovery | Code Available | 3 | 5 |
| GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns | May 27, 2024 | | Code Available | 3 | 5 |
| Designing a Better Asymmetric VQGAN for StableDiffusion | Jun 7, 2023 | Decoder, Image Generation | Code Available | 3 | 5 |
| FinanceBench: A New Benchmark for Financial Question Answering | Nov 20, 2023 | Question Answering | Code Available | 3 | 5 |
| Deep Learning for Free-Hand Sketch: A Survey | Jan 8, 2020 | Deep Learning, Survey | Code Available | 3 | 5 |
| SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Nov 26, 2024 | Natural Language Understanding, Referring Video Object Segmentation | Code Available | 3 | 5 |
| OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain | May 12, 2025 | Multivariate Time Series Forecasting, Representation Learning | Code Available | 3 | 5 |
| pyannote.audio: neural building blocks for speaker diarization | Nov 4, 2019 | Action Detection, Activity Detection | Code Available | 3 | 5 |
| PyGDA: A Python Library for Graph Domain Adaptation | Mar 13, 2025 | Domain Adaptation, GRAPH DOMAIN ADAPTATION | Code Available | 3 | 5 |
| o1-Coder: an o1 Replication for Coding | Nov 29, 2024 | Reinforcement Learning (RL) | Code Available | 3 | 5 |
| Molecular Fingerprints Are Strong Models for Peptide Function Prediction | Jan 29, 2025 | Graph Classification, Graph Regression | Code Available | 3 | 5 |
| In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR | Jan 14, 2025 | Knowledge Graphs, Language Modeling | Code Available | 3 | 5 |
| All You May Need for VQA are Image Captions | May 4, 2022 | All, Image Captioning | Code Available | 3 | 5 |
| YourBench: Easy Custom Evaluation Sets for Everyone | Apr 2, 2025 | MMLU | Code Available | 3 | 5 |