| ViTPose++: Vision Transformer for Generic Body Pose Estimation | Dec 7, 2022 | 2D Human Pose Estimation, Animal Pose Estimation | Code Available | 3 |
| FAN: Fourier Analysis Networks | Oct 3, 2024 | Language Modeling | Code Available | 3 |
| FilterNet: Harnessing Frequency Filters for Time Series Forecasting | Nov 3, 2024 | Time Series, Time Series Forecasting | Code Available | 3 |
| QuEst: Graph Transformer for Quantum Circuit Reliability Estimation | Oct 30, 2022 | | Code Available | 3 |
| WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Sep 24, 2024 | Management, speech-recognition | Code Available | 3 |
| KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | May 29, 2025 | Question Answering | Code Available | 3 |
| BERGEN: A Benchmarking Library for Retrieval-Augmented Generation | Jul 1, 2024 | Benchmarking, RAG | Code Available | 3 |
| MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Apr 11, 2025 | GPU | Code Available | 3 |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 1, 2024 | Image to text, Question Answering | Code Available | 3 |
| Attention Is All You Need | Jun 12, 2017 | Abstractive Text Summarization, All | Code Available | 3 |
| CodeTF: One-stop Transformer Library for State-of-the-art Code LLM | May 31, 2023 | | Code Available | 3 |
| StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization | Dec 10, 2024 | Story Visualization | Code Available | 3 |
| DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Dec 24, 2024 | Video Editing, Video Generation | Code Available | 3 |
| Residual Kolmogorov-Arnold Network for Enhanced Deep Learning | Oct 7, 2024 | Computational Efficiency, Deep Learning | Code Available | 3 |
| Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Apr 3, 2025 | Mamba, Talking Head Generation | Code Available | 3 |
| AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Dec 10, 2023 | All, Benchmarking | Code Available | 3 |
| A Survey on LoRA of Large Language Models | Jul 8, 2024 | Federated Learning, parameter-efficient fine-tuning | Code Available | 3 |
| VisionZip: Longer is Better but Not Necessary in Vision Language Models | Dec 5, 2024 | Video Understanding, Visual Question Answering | Code Available | 3 |
| Humans in 4D: Reconstructing and Tracking Humans with Transformers | May 31, 2023 | 3D Human Pose Estimation, Action Recognition | Code Available | 3 |
| Sigmoid Loss for Language Image Pre-Training | Mar 27, 2023 | Contrastive Learning, Disentanglement | Code Available | 3 |
| Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Feb 9, 2025 | Image Captioning, Image-text Retrieval | Code Available | 3 |
| T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback | May 29, 2024 | Video Generation | Code Available | 3 |
| Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning | Jun 10, 2024 | Multi-hop Question Answering, Question Answering | Code Available | 3 |
| Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation | Mar 20, 2024 | | Code Available | 3 |
| Restoring Images in Adverse Weather Conditions via Histogram Transformer | Jul 14, 2024 | Image Restoration | Code Available | 3 |
| MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | Nov 18, 2023 | Video Generation | Code Available | 3 |
| MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | Jan 27, 2024 | Benchmarking, RAG | Code Available | 3 |
| NerfAcc: A General NeRF Acceleration Toolbox | Oct 10, 2022 | NeRF | Code Available | 3 |
| Llemma: An Open Language Model For Mathematics | Oct 16, 2023 | Arithmetic Reasoning, Automated Theorem Proving | Code Available | 3 |
| Datasets: A Community Library for Natural Language Processing | Sep 7, 2021 | Image Classification, Object Recognition | Code Available | 3 |
| Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction | Feb 15, 2023 | 3D Semantic Scene Completion, Autonomous Driving | Code Available | 3 |
| ResNeSt: Split-Attention Networks | Apr 19, 2020 | Image Classification | Code Available | 3 |
| MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer | Jan 19, 2023 | Image Generation, Image Segmentation | Code Available | 3 |
| IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Feb 22, 2024 | Zero-shot Generalization | Code Available | 3 |
| StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs | Mar 26, 2025 | Benchmarking | Code Available | 3 |
| Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Apr 10, 2025 | Math, MMLU | Code Available | 3 |
| Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Jan 11, 2024 | Representation Learning, Self-Supervised Learning | Code Available | 3 |
| Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | Jan 9, 2023 | 2D Object Detection, Contrastive Learning | Code Available | 3 |
| Inferring Articulated Rigid Body Dynamics from RGBD Video | Mar 20, 2022 | Contact mechanics, Inverse Rendering | Code Available | 3 |
| SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Apr 25, 2024 | Benchmarking, Multiple-choice | Code Available | 3 |
| Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Mar 18, 2024 | Continual Learning, Incremental Learning | Code Available | 3 |
| Neural Network Verification with Branch-and-Bound for General Nonlinearities | May 31, 2024 | | Code Available | 3 |
| AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation | Apr 4, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 3 |
| DrivAerNet: A Parametric Car Dataset for Data-Driven Aerodynamic Design and Prediction | Mar 12, 2024 | | Code Available | 3 |
| Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection | Mar 4, 2025 | Anomaly Detection, Multi-class Anomaly Detection | Code Available | 3 |
| Diffusion Model-Based Video Editing: A Survey | Jun 26, 2024 | model, Survey | Code Available | 3 |
| Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer | Mar 7, 2022 | | Code Available | 3 |
| BoT-SORT: Robust Associations Multi-Pedestrian Tracking | Jun 29, 2022 | Multi-Object Tracking, Object | Code Available | 3 |
| TopoBench: A Framework for Benchmarking Topological Deep Learning | Jun 9, 2024 | Benchmarking, Deep Learning | Code Available | 3 |
| InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation | Sep 12, 2023 | GPU, Image Generation | Code Available | 3 |