SOTAVerified

Mixture-of-Experts

Papers

Showing 501–550 of 1312 papers

Title | Status | Hype
GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks | - | 0
Retraining-Free Merging of Sparse MoE via Hierarchical Clustering | Code | 1
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Code | 2
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | - | 0
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing | Code | 0
Efficient Dictionary Learning with Switch Sparse Autoencoders | Code | 1
Upcycling Large Language Models into Mixture of Experts | - | 0
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Code | 4
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | - | 0
Toward generalizable learning of all (linear) first-order methods via memory augmented Transformers | - | 0
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models | - | 0
Aria: An Open Multimodal Native Mixture-of-Experts Model | Code | 5
Probing the Robustness of Theory of Mind in Large Language Models | - | 0
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Code | 2
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Code | 1
Multimodal Fusion Strategies for Mapping Biophysical Landscape Features | Code | 0
Realizing Video Summarization from the Path of Language-based Semantic Understanding | - | 0
A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles | - | 0
Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding with LLMs | - | 0
MLP-KAN: Unifying Deep Representation and Function Learning | Code | 0
On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions | - | 0
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices | Code | 1
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping | - | 0
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts | Code | 0
Neutral residues: revisiting adapters for model extension | - | 0
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing | - | 0
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | - | 0
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Code | 2
Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging | - | 0
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards | - | 0
UniAdapt: A Universal Adapter for Knowledge Calibration | - | 0
Robust Traffic Forecasting against Spatial Shift over Years | Code | 0
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | - | 0
IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method | - | 0
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Code | 2
SciDFM: A Large Language Model with Mixture-of-Experts for Science | - | 0
A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction | Code | 1
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Code | 4
Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks | - | 0
Leveraging Mixture of Experts for Improved Speech Deepfake Detection | - | 0
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM | - | 0
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | - | 0
A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts | Code | 0
Routing in Sparsely-gated Language Models responds to Context | - | 0
Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning | - | 0
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists | Code | 0
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts | - | 0
Mixture of Diverse Size Experts | - | 0
GRIN: GRadient-INformed MoE | - | 0
Page 11 of 27

No leaderboard results yet.