| DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Apr 13, 2024 | Data Augmentation, Key Point Matching | Code Available | 3 | 5 |
| SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | Jan 17, 2024 | Natural Language Visual Grounding | Code Available | 3 | 5 |
| Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents | Feb 22, 2025 | AI Agent | Code Available | 3 | 5 |
| MEMORYLLM: Towards Self-Updatable Large Language Models | Feb 7, 2024 | Model Editing | Code Available | 3 | 5 |
| BatchTopK Sparse Autoencoders | Dec 9, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| On the Efficiency of NLP-Inspired Methods for Tabular Deep Learning | Nov 26, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 | 5 |
| Large Language Models Are Human-Level Prompt Engineers | Nov 3, 2022 | Few-Shot Learning, In-Context Learning | Code Available | 3 | 5 |
| Zero-Shot Text-to-Image Generation | Feb 24, 2021 | Image Generation, Text to Image Generation | Code Available | 3 | 5 |
| ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Feb 27, 2024 | 3D geometry, 3D Object Captioning | Code Available | 3 | 5 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | May 15, 2025 | cross-modal alignment, Geometry Problem Solving | Code Available | 3 | 5 |
| LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Jun 28, 2024 | Vision-Language-Action, World Knowledge | Code Available | 3 | 5 |
| The Unreasonable Effectiveness of Deep Features as a Perceptual Metric | Jan 11, 2018 | Image Quality Assessment, SSIM | Code Available | 3 | 5 |
| Cross-Modal Causal Intervention for Medical Report Generation | Mar 16, 2023 | Medical Report Generation, object-detection | Code Available | 3 | 5 |
| Evaluating Large Language Models for Radiology Natural Language Processing | Jul 25, 2023 | | Code Available | 3 | 5 |
| GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement | Jun 17, 2024 | speech-recognition, Speech Recognition | Code Available | 3 | 5 |
| Neuron-Level Sequential Editing for Large Language Models | Oct 5, 2024 | Model Editing | Code Available | 3 | 5 |
| The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas | Jun 25, 2025 | | Code Available | 3 | 5 |
| Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Aug 30, 2024 | Audio Compression, Audio Generation | Code Available | 3 | 5 |
| SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 3 | 5 |
| Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | Apr 24, 2023 | AudioCaps, Audio Generation | Code Available | 3 | 5 |
| PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization | Mar 3, 2025 | | Code Available | 3 | 5 |
| OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | Jul 15, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking | Mar 27, 2022 | CPU, Multi-Object Tracking | Code Available | 3 | 5 |
| TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving | May 31, 2022 | Autonomous Driving, CARLA longest6 | Code Available | 3 | 5 |
| EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba | Mar 15, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | Aug 8, 2024 | | Code Available | 3 | 5 |
| Accelerating Diffusion Transformers with Dual Feature Caching | Dec 25, 2024 | Video Generation | Code Available | 3 | 5 |
| Keypoint Promptable Re-Identification | Jul 25, 2024 | Metric Learning, Occluded Person Re-Identification | Code Available | 3 | 5 |
| Proteus: A Self-Designing Range Filter | Jun 30, 2022 | | Code Available | 3 | 5 |
| SARATR-X: Toward Building A Foundation Model for SAR Target Recognition | May 15, 2024 | 2D Object Detection, Earth Observation | Code Available | 3 | 5 |
| AutoTimes: Autoregressive Time Series Forecasters via Large Language Models | Feb 4, 2024 | Decoder, In-Context Learning | Code Available | 3 | 5 |
| PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | Mar 5, 2024 | Knowledge Distillation, Prompt Engineering | Code Available | 3 | 5 |
| Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions | Aug 28, 2023 | Benchmarking, Formation Energy | Code Available | 3 | 5 |
| Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models | Mar 10, 2024 | Visual Question Answering | Code Available | 3 | 5 |
| SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation | Aug 16, 2024 | Image Segmentation, Marine Animal Segmentation | Code Available | 3 | 5 |
| Multimodal Foundation Models: From Specialists to General-Purpose Assistants | Sep 18, 2023 | Image Generation, Survey | Code Available | 3 | 5 |
| Aria-UI: Visual Grounding for GUI Instructions | Dec 20, 2024 | Natural Language Visual Grounding, Visual Grounding | Code Available | 3 | 5 |
| Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations | Jan 15, 2025 | | Code Available | 3 | 5 |
| VRT: A Video Restoration Transformer | Jan 28, 2022 | Deblurring, Denoising | Code Available | 3 | 5 |
| A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Oct 31, 2024 | Decision Making, Diagnostic | Code Available | 3 | 5 |
| TinyAgent: Function Calling at the Edge | Sep 1, 2024 | Language Modelling, Quantization | Code Available | 3 | 5 |
| Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models | Oct 16, 2024 | Hallucination, Knowledge Graphs | Code Available | 3 | 5 |
| Graph-Augmented Normalizing Flows for Anomaly Detection of Multiple Time Series | Feb 16, 2022 | Anomaly Detection, Density Estimation | Code Available | 3 | 5 |
| Towards An End-to-End Framework for Flow-Guided Video Inpainting | Apr 6, 2022 | Hallucination, Optical Flow Estimation | Code Available | 3 | 5 |
| Sintel: A Machine Learning Framework to Extract Insights from Signals | Apr 19, 2022 | Anomaly Detection, BIG-bench Machine Learning | Code Available | 3 | 5 |
| VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation | Aug 28, 2023 | Instance Segmentation, Optical Flow Estimation | Code Available | 3 | 5 |
| TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement | Jun 14, 2023 | GPU, Motion Estimation | Code Available | 3 | 5 |
| Playing Non-Embedded Card-Based Games with Reinforcement Learning | Apr 7, 2025 | Board Games, Decision Making | Code Available | 3 | 5 |
| Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation | Sep 20, 2018 | Multi-task Audio Source Separation, Music Source Separation | Code Available | 3 | 5 |
| DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models | Oct 26, 2022 | Diversity, Misinformation | Code Available | 3 | 5 |