| Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Dec 5, 2024 | Stereo Matching, Zero-shot Generalization | Code Available | 3 | 5 |
| EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling | Dec 31, 2023 | 3D Face Animation, Diversity | Code Available | 3 | 5 |
| Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration | Apr 2, 2024 | Computational Efficiency, GPU | Code Available | 3 | 5 |
| NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | May 15, 2023 | Controllable Language Modelling, Dialogue Generation | Code Available | 3 | 5 |
| TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation | Apr 25, 2024 | 3D Human Pose Estimation, Human Mesh Recovery | Code Available | 3 | 5 |
| Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| Poseidon: Efficient Foundation Models for PDEs | May 29, 2024 | Operator learning | Code Available | 3 | 5 |
| LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model | Jan 4, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Quantifying the robustness of deep multispectral segmentation models against natural perturbations and data poisoning | May 18, 2023 | Adversarial Robustness, Data Poisoning | Code Available | 3 | 5 |
| pix2gestalt: Amodal Segmentation by Synthesizing Wholes | Jan 25, 2024 | 3D Reconstruction, Object Recognition | Code Available | 3 | 5 |
| Paint Bucket Colorization Using Anime Character Color Design Sheets | Oct 25, 2024 | Colorization, Line Art Colorization | Code Available | 3 | 5 |
| Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | Dec 11, 2023 | Chart Understanding, Decoder | Code Available | 3 | 5 |
| SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving | Nov 25, 2024 | 3DGS, Autonomous Driving | Code Available | 3 | 5 |
| EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation | Mar 24, 2022 | 3D Object Detection, 6D Pose Estimation using RGB | Code Available | 3 | 5 |
| NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | May 12, 2024 | CPU, Deep Learning | Code Available | 3 | 5 |
| Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+ | Mar 16, 2023 | Benchmarking, Graph Regression | Code Available | 3 | 5 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Oct 11, 2018 | Citation Intent Classification, Common Sense Reasoning | Code Available | 3 | 5 |
| ZigMa: A DiT-style Zigzag Mamba Diffusion Model | Mar 20, 2024 | Mamba, model | Code Available | 3 | 5 |
| Searching for Best Practices in Retrieval-Augmented Generation | Jul 1, 2024 | Question Answering, RAG | Code Available | 3 | 5 |
| Evaluating representation learning on the protein structure universe | Jun 19, 2024 | Representation Learning | Code Available | 3 | 5 |
| Proxy Denoising for Source-Free Domain Adaptation | Jun 3, 2024 | Denoising, Domain Adaptation | Code Available | 3 | 5 |
| FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba | Apr 15, 2024 | Infrared And Visible Image Fusion, Mamba | Code Available | 3 | 5 |
| OceanGPT: A Large Language Model for Ocean Science Tasks | Oct 3, 2023 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| 3D-LLM: Injecting the 3D World into Large Language Models | Jul 24, 2023 | 3D Object Captioning, 3D Question Answering (3D-QA) | Code Available | 3 | 5 |
| Fast Feedforward 3D Gaussian Splatting Compression | Oct 10, 2024 | 3DGS, Novel View Synthesis | Code Available | 3 | 5 |
| Scaling Laws for Fine-Grained Mixture of Experts | Feb 12, 2024 | Mixture-of-Experts | Code Available | 3 | 5 |
| WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit | Feb 2, 2021 | Decoder, speech-recognition | Code Available | 3 | 5 |
| A Review of Large Language Models and Autonomous Agents in Chemistry | Jun 26, 2024 | Property Prediction, scientific discovery | Code Available | 3 | 5 |
| Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Dec 2, 2024 | Human Instance Segmentation, Pose-Based Human Instance Segmentation | Code Available | 3 | 5 |
| Accelerating Diffusion Transformers with Token-wise Feature Caching | Oct 5, 2024 | Video Generation | Code Available | 3 | 5 |
| One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion | Sep 10, 2024 | All, Deep Reinforcement Learning | Code Available | 3 | 5 |
| skscope: Fast Sparsity-Constrained Optimization in Python | Mar 27, 2024 | | Code Available | 3 | 5 |
| Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding | Oct 2, 2024 | Image Generation, Text to Image Generation | Code Available | 3 | 5 |
| Repeat After Me: Transformers are Better than State Space Models at Copying | Feb 1, 2024 | State Space Models | Code Available | 3 | 5 |
| SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition | Feb 23, 2025 | Deep Hashing, GPU | Code Available | 3 | 5 |
| Towards Universal Soccer Video Understanding | Dec 2, 2024 | Action Classification, Sports Understanding | Code Available | 3 | 5 |
| Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance | Mar 26, 2024 | Deblurring, Denoising | Code Available | 3 | 5 |
| Temporal Graph Analysis with TGX | Feb 6, 2024 | | Code Available | 3 | 5 |
| From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Dec 4, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL), Visual Grounding | Code Available | 3 | 5 |
| Halton Scheduler For Masked Generative Image Transformer | Mar 21, 2025 | Image Generation, Text to Image Generation | Code Available | 3 | 5 |
| Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance | Sep 23, 2024 | Emotion Recognition, FAD | Code Available | 3 | 5 |
| iNatAg: Multi-Class Classification Models Enabled by a Large-Scale Benchmark Dataset with 4.7M Images of 2,959 Crop and Weed Species | Mar 25, 2025 | Multi-class Classification | Code Available | 3 | 5 |
| Q-Bench+: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs | Feb 11, 2024 | Image Quality Assessment, Question Answering | Code Available | 3 | 5 |
| SemDeDup: Data-efficient learning at web-scale through semantic deduplication | Mar 16, 2023 | | Code Available | 3 | 5 |
| PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Mar 26, 2024 | Image Classification, Instance Segmentation | Code Available | 3 | 5 |
| Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models | Feb 12, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation | Jun 3, 2024 | Image Generation | Code Available | 3 | 5 |
| DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | Nov 7, 2024 | Object Localization | Code Available | 3 | 5 |
| Universal Language Model Fine-tuning for Text Classification | Jan 18, 2018 | General Classification, Language Modeling | Code Available | 3 | 5 |