| TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer | Mar 20, 2024 | Keyword Spotting | Code Available | 2 |
| Maximum Entropy Heterogeneous-Agent Reinforcement Learning | Jun 19, 2023 | MuJoCo, Multi-agent Reinforcement Learning | Code Available | 2 |
| Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut | Feb 23, 2022 | Object, object-detection | Code Available | 2 |
| Few-Shot Scene Classification of Optical Remote Sensing Images Leveraging Calibrated Pretext Tasks | Jul 6, 2022 | Contrastive Learning, Few-Shot Learning | Code Available | 2 |
| Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration | Aug 28, 2024 | All, Image Restoration | Code Available | 2 |
| FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition | Feb 5, 2024 | Action Recognition, Open Vocabulary Action Recognition | Code Available | 2 |
| Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View | Apr 4, 2024 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 2 |
| AnoDDPM: Anomaly Detection With Denoising Diffusion Probabilistic Models Using Simplex Noise | Jun 30, 2022 | Anomaly Detection, Denoising | Code Available | 2 |
| Efficient Neural Audio Synthesis | Feb 23, 2018 | Audio Synthesis, CPU | Code Available | 2 |
| Muse: Text-To-Image Generation via Masked Generative Transformers | Jan 2, 2023 | Image Generation, Language Modelling | Code Available | 2 |
| A Generalist Agent | May 12, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Stochastic Interpolants: A Unifying Framework for Flows and Diffusions | Mar 15, 2023 | Denoising | Code Available | 2 |
| Unsupervised Cross-Domain Image Generation | Nov 7, 2016 | Domain Adaptation, Image Generation | Code Available | 2 |
| SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks | May 27, 2023 | Decoder | Code Available | 2 |
| Temporal Feature Matters: A Framework for Diffusion Model Quantization | Jul 28, 2024 | Denoising, Image Generation | Code Available | 2 |
| ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | Jan 9, 2025 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 2 |
| Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption | Mar 14, 2025 | Full reference image quality assessment, Full-Reference Image Quality Assessment | Code Available | 2 |
| Dynamic Brain Transformer with Multi-level Attention for Functional Brain Network Analysis | Sep 5, 2023 | | Code Available | 2 |
| DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder | Dec 23, 2024 | | Code Available | 2 |
| FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance | May 9, 2023 | | Code Available | 2 |
| ChemDFM: A Large Language Foundation Model for Chemistry | Jan 26, 2024 | Form, model | Code Available | 2 |
| Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement | Dec 21, 2024 | Mamba | Code Available | 2 |
| ETTA: Elucidating the Design Space of Text-to-Audio Models | Dec 26, 2024 | AudioCaps, Audio captioning | Code Available | 2 |
| FastSpeech: Fast, Robust and Controllable Text-to-Speech | May 22, 2019 | Decoder, text-to-speech | Code Available | 2 |
| Pruning Filters for Efficient ConvNets | Aug 31, 2016 | Image Classification, Network Pruning | Code Available | 2 |
| DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation | Jul 4, 2023 | 3D Shape Generation, Denoising | Code Available | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | Benchmarking, Math | Code Available | 2 |
| PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration | Jun 28, 2024 | image-classification, Image Classification | Code Available | 2 |
| Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model | Nov 10, 2023 | Diversity, NeRF | Code Available | 2 |
| MaskTerial: A Foundation Model for Automated 2D Material Flake Detection | Dec 12, 2024 | Instance Segmentation, Semantic Segmentation | Code Available | 2 |
| Sensitive Data Detection with High-Throughput Neural Network Models for Financial Institutions | Dec 17, 2020 | named-entity-recognition, Named Entity Recognition | Code Available | 2 |
| LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Oct 28, 2024 | Video Generation, Video Reconstruction | Code Available | 2 |
| Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Dec 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment | Nov 7, 2024 | Code Generation | Code Available | 2 |
| Counterfactual Phenotyping with Censored Time-to-Events | Feb 22, 2022 | counterfactual, Counterfactual Reasoning | Code Available | 2 |
| FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF | Dec 20, 2024 | Privacy Preserving, reinforcement-learning | Code Available | 2 |
| Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | Oct 28, 2024 | Image Retrieval, Image to text | Code Available | 2 |
| Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Mar 18, 2025 | Human-Domain Subject-to-Video, Video Generation | Code Available | 2 |
| MemSeg: A semi-supervised method for image surface defect detection using differences and commonalities | May 2, 2022 | Anomaly Detection, Defect Detection | Code Available | 2 |
| Event-Based Motion Magnification | Feb 19, 2024 | Benchmarking, Motion Detection | Code Available | 2 |
| EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies | Mar 25, 2023 | Anomaly Detection, Computational Efficiency | Code Available | 2 |
| Motion Inversion for Video Customization | Mar 29, 2024 | Video Generation | Code Available | 2 |
| Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud | Jul 25, 2022 | Object Recognition, Segmentation | Code Available | 2 |
| Multi-Document Grounded Multi-Turn Synthetic Dialog Generation | Sep 17, 2024 | | Code Available | 2 |
| Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | Nov 28, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| YAKE! Keyword extraction from single documents using multiple local features | Mar 1, 2018 | Keyword Extraction, Natural Language Understanding | Code Available | 2 |
| Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI | Nov 22, 2024 | counterfactual, Counterfactual Explanation | Code Available | 2 |
| MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare | Jun 17, 2022 | Federated Learning, Knowledge Distillation | Code Available | 2 |
| Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis | Jan 16, 2025 | Explainable Artificial Intelligence (XAI), Explainable Models | Code Available | 2 |
| cuSLINK: Single-linkage Agglomerative Clustering on the GPU | Jun 28, 2023 | Clustering, GPU | Code Available | 2 |