SOTAVerified

Mixture-of-Experts

Papers

Showing 701–750 of 1312 papers

Title | Status | Hype
Stealing User Prompts from Mixture of Experts | - | 0
Efficient and Interpretable Grammatical Error Correction with Mixture of Experts | Code | 0
MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning | - | 0
ProMoE: Fast MoE-based LLM Serving using Proactive Caching | - | 0
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging | - | 0
Neural Experts: Mixture of Experts for Implicit Neural Representations | - | 0
Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving | - | 0
FinTeamExperts: Role Specialized MOEs For Financial Analysis | - | 0
Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis | Code | 0
MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases | - | 0
Mixture of Parrots: Experts improve memorization more than reasoning | - | 0
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | - | 0
Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches | - | 0
MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | - | 0
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | - | 0
Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling | - | 0
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts | - | 0
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts | Code | 0
MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning | - | 0
Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks | - | 0
Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts | - | 0
On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs | - | 0
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference | - | 0
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router | - | 0
Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models | - | 0
Quadratic Gating Functions in Mixture of Experts: A Statistical Insight | - | 0
Scalable Multi-Domain Adaptation of Language Models using Modular Experts | - | 0
Learning to Ground VLMs without Forgetting | - | 0
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs | - | 0
ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL | - | 0
MoIN: Mixture of Introvert Experts to Upcycle an LLM | - | 0
GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks | - | 0
AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach | - | 0
Upcycling Large Language Models into Mixture of Experts | - | 0
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing | Code | 0
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | - | 0
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | - | 0
Toward generalizable learning of all (linear) first-order methods via memory augmented Transformers | - | 0
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models | - | 0
Probing the Robustness of Theory of Mind in Large Language Models | - | 0
Multimodal Fusion Strategies for Mapping Biophysical Landscape Features | Code | 0
Realizing Video Summarization from the Path of Language-based Semantic Understanding | - | 0
Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding with LLMs | - | 0
A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles | - | 0
On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions | - | 0
Neutral residues: revisiting adapters for model extension | - | 0
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping | - | 0
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts | Code | 0
MLP-KAN: Unifying Deep Representation and Function Learning | Code | 0
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | - | 0
Page 15 of 27
