
Mixture-of-Experts

Papers

Showing 1111–1120 of 1312 papers

Title | Status | Hype
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | — | 0
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou | — | 0
HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts | — | 0
How Can Cross-lingual Knowledge Contribute Better to Fine-Grained Entity Typing? | — | 0
How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model | — | 0
How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers | — | 0
How Lightweight Can A Vision Transformer Be | — | 0
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | — | 0
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | — | 0
HydraSum - Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | — | 0
Page 112 of 132

No leaderboard results yet.