| How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers | Mar 4, 2024 | Few-Shot LearningLanguage Modeling | —Unverified | 0 |
| How Lightweight Can A Vision Transformer Be | Jul 25, 2024 | Mixture-of-ExpertsTransfer Learning | —Unverified | 0 |
| How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Feb 17, 2025 | Mixture-of-Experts | —Unverified | 0 |
| Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | May 21, 2025 | ChatbotInstruction Following | —Unverified | 0 |
| HydraSum - Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | Sep 29, 2021 | Abstractive Text SummarizationDecoder | —Unverified | 0 |
| Hypertext Entity Extraction in Webpage | Mar 4, 2024 | Mixture-of-Experts | —Unverified | 0 |
| IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method | Sep 29, 2024 | Domain AdaptationMixture-of-Experts | —Unverified | 0 |
| Identifying Shopping Intent in Product QA for Proactive Recommendations | Apr 9, 2024 | FrictionMixture-of-Experts | —Unverified | 0 |
| iMedImage Technical Report | Mar 27, 2025 | Anomaly DetectionDiagnostic | —Unverified | 0 |
| Imitation Learning from MPC for Quadrupedal Multi-Gait Control | Mar 26, 2021 | Imitation LearningMixture-of-Experts | —Unverified | 0 |