SOTAVerified

Mixture-of-Experts

Papers

Showing 901–925 of 1312 papers

Title | Status | Hype
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Code | 1
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Code | 1
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Code | 1
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts | — | 0
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Code | 2
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | — | 0
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Code | 1
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Code | 0
Task-Based MoE for Multitask Multilingual Machine Translation | — | 0
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget | — | 0
Fast Feedforward Networks | Code | 2
Motion In-Betweening with Phase Manifolds | Code | 2
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | Code | 1
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | — | 0
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Code | 1
Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Code | 0
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs | — | 0
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Code | 1
Experts Weights Averaging: A New General Training Scheme for Vision Transformers | — | 0
A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction | — | 0
Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving | — | 0
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts | Code | 2
MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Code | 1
Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform | Code | 0
Bidirectional Attention as a Mixture of Continuous Word Experts | Code | 0
Page 37 of 53

No leaderboard results yet.