| YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images | Apr 9, 2024 | Object, object-detection | Code Available | 2 | 5 |
| Ecco: An Open Source Library for the Explainability of Transformer Language Models | Aug 1, 2021 | Text Generation | Code Available | 2 | 5 |
| DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer | Feb 8, 2024 | | Code Available | 2 | 5 |
| Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model | Jan 1, 2024 | Motion Style Transfer, Style Transfer | Code Available | 2 | 5 |
| eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data | Feb 13, 2024 | Domain Generalization | Code Available | 2 | 5 |
| AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning | Aug 7, 2023 | Offline RL, reinforcement-learning | Code Available | 2 | 5 |
| MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | Oct 5, 2021 | Image Classification, object-detection | Code Available | 2 | 5 |
| LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Jan 7, 2025 | Mixture-of-Experts, Representation Learning | Code Available | 2 | 5 |
| Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems | Jul 1, 2024 | RAG | Code Available | 2 | 5 |
| Unlocking the Potential of Classic GNNs for Graph-level Tasks: Simple Architectures Meet Excellence | Feb 13, 2025 | Graph Classification, Graph Property Prediction | Code Available | 2 | 5 |
| UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery | Sep 18, 2021 | Change Detection, Decoder | Code Available | 2 | 5 |
| Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV | Jul 14, 2024 | Denoising, Image Denoising | Code Available | 2 | 5 |
| ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention | May 28, 2024 | GPU, Representation Learning | Code Available | 2 | 5 |
| DiffiT: Diffusion Vision Transformers for Image Generation | Dec 4, 2023 | Denoising, Image Generation | Code Available | 2 | 5 |
| Benchmarking Large Language Models in Retrieval-Augmented Generation | Sep 4, 2023 | Benchmarking, counterfactual | Code Available | 2 | 5 |
| MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt | Dec 14, 2024 | Mamba, Object | Code Available | 2 | 5 |
| Grasp, See, and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior | Feb 23, 2024 | Object, Object Rearrangement | Code Available | 2 | 5 |
| Visual Adversarial Examples Jailbreak Aligned Large Language Models | Jun 22, 2023 | | Code Available | 2 | 5 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 | 5 |
| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | Math, Mixture-of-Experts | Code Available | 2 | 5 |
| ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | Jun 4, 2022 | Knowledge Distillation, Quantization | Code Available | 2 | 5 |
| OmniMAE: Single Model Masked Pretraining on Images and Videos | Jun 16, 2022 | | Code Available | 2 | 5 |
| TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers | Jan 13, 2022 | GPU, Object | Code Available | 2 | 5 |
| StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation | May 30, 2023 | 3D Generation, Attribute | Code Available | 2 | 5 |
| SCNet: Sparse Compression Network for Music Source Separation | Jan 24, 2024 | CPU, Music Source Separation | Code Available | 2 | 5 |
| Large Language Models Can Self-Improve in Long-context Reasoning | Nov 12, 2024 | | Code Available | 2 | 5 |
| StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces | Mar 10, 2023 | Attribute, Super-Resolution | Code Available | 2 | 5 |
| LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On | May 22, 2023 | Virtual Try-on | Code Available | 2 | 5 |
| CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding | Apr 22, 2024 | Attribute | Code Available | 2 | 5 |
| A Simple Framework for Contrastive Learning of Visual Representations | Feb 13, 2020 | Contrastive Learning, Image Classification | Code Available | 2 | 5 |
| LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | Dec 2, 2024 | Embodied Question Answering, Question Answering | Code Available | 2 | 5 |
| ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer | Mar 8, 2022 | Image Classification, object-detection | Code Available | 2 | 5 |
| MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection | Aug 16, 2024 | Event Detection, Sound Event Detection | Code Available | 2 | 5 |
| The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge | Oct 27, 2022 | Multi-Object Tracking, Multiple Object Tracking | Code Available | 2 | 5 |
| KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting | Nov 26, 2020 | Knowledge Graphs, Representation Learning | Code Available | 2 | 5 |
| MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation | Nov 24, 2023 | 3D Generation, Image Generation | Code Available | 2 | 5 |
| UniRGB-IR: A Unified Framework for RGB-Infrared Semantic Tasks via Adapter Tuning | Apr 26, 2024 | Multispectral Object Detection, Pedestrian Detection | Code Available | 2 | 5 |
| PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis | May 24, 2024 | Art Analysis, Computational Efficiency | Code Available | 2 | 5 |
| LumberChunker: Long-Form Narrative Document Segmentation | Jun 25, 2024 | Chunking, Form | Code Available | 2 | 5 |
| EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation | Sep 26, 2024 | Image Segmentation, Mamba | Code Available | 2 | 5 |
| PokerBench: Training Large Language Models to become Professional Poker Players | Jan 14, 2025 | | Code Available | 2 | 5 |
| LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Jan 14, 2025 | Feature Compression, Language Modeling | Code Available | 2 | 5 |
| Geodesic Diffusion Models for Medical Image-to-Image Generation | Mar 2, 2025 | Denoising, Image Denoising | Code Available | 2 | 5 |
| Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Mar 12, 2025 | Image Retrieval, Retrieval | Code Available | 2 | 5 |
| Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation | Mar 16, 2023 | Diversity, Gesture Generation | Code Available | 2 | 5 |
| Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses | Feb 3, 2021 | Decoder, Speech Denoising | Code Available | 2 | 5 |
| Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data | Feb 8, 2024 | Action Recognition, Mamba | Code Available | 2 | 5 |
| rPPG-Toolbox: Deep Remote PPG Toolbox | Oct 3, 2022 | Benchmarking, Data Augmentation | Code Available | 2 | 5 |
| R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Jan 18, 2024 | Benchmarking | Code Available | 2 | 5 |
| Explaining Explanations: Axiomatic Feature Interactions for Deep Networks | Feb 10, 2020 | | Code Available | 2 | 5 |