| Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer | Jun 9, 2022 | Autonomous Driving, GPU | Code Available | 2 | 5 |
| Mapping the Mind of an Instruction-based Image Editing using SMILE | Dec 20, 2024 | Autonomous Driving | Code Available | 2 | 5 |
| MatteFormer: Transformer-Based Image Matting via Prior-Tokens | Mar 29, 2022 | Image Matting | Code Available | 2 | 5 |
| LLMGA: Multimodal Large Language Model based Generation Assistant | Nov 27, 2023 | Image Generation, Language Modeling | Code Available | 2 | 5 |
| Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers | Jul 13, 2024 | Mamba, State Space Models | Code Available | 2 | 5 |
| auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data | Apr 15, 2022 | BIG-bench Machine Learning, counterfactual | Code Available | 2 | 5 |
| AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Jul 10, 2023 | Image Animation | Code Available | 2 | 5 |
| FastMoE: A Fast Mixture-of-Expert Training System | Mar 24, 2021 | GPU, Language Modeling | Code Available | 2 | 5 |
| Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving | Dec 9, 2024 | 4D reconstruction, Autonomous Driving | Code Available | 2 | 5 |
| Improving Image Restoration by Revisiting Global Information Aggregation | Dec 8, 2021 | Color Image Denoising, Deblurring | Code Available | 2 | 5 |
| Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network | Jul 29, 2024 | Decoder, Super-Resolution | Code Available | 2 | 5 |
| AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | Jan 3, 2024 | Video Alignment, Video Generation | Code Available | 2 | 5 |
| FLAT: Chinese NER Using Flat-Lattice Transformer | Apr 24, 2020 | Chinese Named Entity Recognition, named-entity-recognition | Code Available | 2 | 5 |
| RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models | Dec 31, 2023 | Hallucination, RAG | Code Available | 2 | 5 |
| SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Jul 30, 2024 | Caption Generation, Question Answering | Code Available | 2 | 5 |
| Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Jun 2, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation | Dec 8, 2022 | 3D Reconstruction, 3D Shape Generation | Code Available | 2 | 5 |
| ControlVideo: Training-free Controllable Text-to-Video Generation | May 22, 2023 | Image Generation, Text-to-Video Generation | Code Available | 2 | 5 |
| Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | Feb 14, 2024 | Language Modelling, Segmentation | Code Available | 2 | 5 |
| Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction | Feb 28, 2024 | Chatbot, Reconstruction Attack | Code Available | 2 | 5 |
| Forgetting Transformer: Softmax Attention with a Forget Gate | Mar 3, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Tool-Planner: Task Planning with Clusters across Multiple Tools | Jun 6, 2024 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| PID: Physics-Informed Diffusion Model for Infrared Image Generation | Jul 12, 2024 | Image Generation | Code Available | 2 | 5 |
| Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey | Jul 10, 2024 | Adversarial Attack, Image Generation | Code Available | 2 | 5 |
| LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch | Sep 4, 2024 | Evolutionary Algorithms, Fairness | Code Available | 2 | 5 |
| PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Sep 13, 2024 | Depth Estimation, Monocular Depth Estimation | Code Available | 2 | 5 |
| SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer | Sep 12, 2024 | Target Sound Extraction | Code Available | 2 | 5 |
| No More Adam: Learning Rate Scaling at Initialization is All You Need | Dec 16, 2024 | All | Code Available | 2 | 5 |
| DAMamba: Vision State Space Model with Dynamic Adaptive Scan | Feb 18, 2025 | image-classification, Image Classification | Code Available | 2 | 5 |
| LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | Feb 24, 2025 | Code Completion | Code Available | 2 | 5 |
| NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit | Oct 28, 2022 | Singing Voice Synthesis | Code Available | 2 | 5 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Nov 28, 2023 | 3D Question Answering (3D-QA), Diagnostic | Code Available | 2 | 5 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code Available | 2 | 5 |
| Hierarchical Open-vocabulary Universal Image Segmentation | Jul 3, 2023 | Image Comprehension, Image Segmentation | Code Available | 2 | 5 |
| vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Mar 20, 2024 | Action Recognition, Computational Efficiency | Code Available | 2 | 5 |
| Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | Jan 15, 2025 | Image Segmentation, Referring Expression Segmentation | Code Available | 2 | 5 |
| Guiding Language Models of Code with Global Context using Monitors | Jun 19, 2023 | Code Completion, Code Generation | Code Available | 2 | 5 |
| dKV-Cache: The Cache for Diffusion Language Models | May 21, 2025 | Code Generation, Denoising | Code Available | 2 | 5 |
| Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models | Dec 26, 2023 | Decoder, Reranking | Code Available | 2 | 5 |
| Diffusion Models Beat GANs on Image Synthesis | May 11, 2021 | Conditional Image Generation, Diversity | Code Available | 2 | 5 |
| Towards Stable Test-Time Adaptation in Dynamic Wild World | Feb 24, 2023 | Test-time Adaptation | Code Available | 2 | 5 |
| beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems | Sep 16, 2024 | Collaborative Filtering, Recommendation Systems | Code Available | 2 | 5 |
| Measuring Style Similarity in Diffusion Models | Apr 1, 2024 | Attribute, Style Detection | Code Available | 2 | 5 |
| LangBridge: Multilingual Reasoning Without Multilingual Supervision | Jan 19, 2024 | Code Completion, Logical Reasoning | Code Available | 2 | 5 |
| LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities | May 22, 2023 | Event Extraction, graph construction | Code Available | 2 | 5 |
| LEACE: Perfect linear concept erasure in closed form | Jun 6, 2023 | Fairness, Form | Code Available | 2 | 5 |
| SEBERTNets: Sequence Enhanced BERT Networks for Event Entity Extraction Tasks Oriented to the Finance Field | Jan 21, 2024 | Asset Management, Event Extraction | Code Available | 2 | 5 |
| Graph-enhanced Large Language Models in Asynchronous Plan Reasoning | Feb 5, 2024 | | Code Available | 2 | 5 |
| CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | Apr 30, 2024 | Mamba, State Space Models | Code Available | 2 | 5 |
| An OpenMind for 3D medical vision self-supervised learning | Dec 22, 2024 | Benchmarking, Self-Supervised Learning | Code Available | 2 | 5 |