Paper detail

Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts

Mixture-of-Experts models enable large language models to scale efficiently, as they only activate a subset of experts for each input. Their core mechanisms, Top-k routing and auxiliary load balancing, remain heuristic, however, lacking a cohesive theoretical underpinning to support them. To this end, we build the first unified theoretical framework that rigorously derives these practices as optimal sparse posterior approximation and prior regularization from a Bayesian perspective, while simultaneously framing them as mechanisms to minimize routing ambiguity and maximize channel capacity from an information-theoretic perspective. We also pinpoint the inherent combinatorial hardness of routing, defining it as the NP-hard sparse subset selection problem. We rigorously prove the existence of a "Coherence Barrier"; when expert representations exhibit high mutual coherence, greedy routing strategies theoretically fail to recover the optimal expert subset. Importantly, we formally verify that imposing geometric orthogonality in the expert feature space is sufficient to narrow the divide between the NP-hard global optimum and polynomial-time greedy approximation. Our comparative analyses confirm orthogonality regularization as the optimal engineering relaxation for large-scale models. Our work offers essential theoretical support and technical assurance for a deeper understanding and novel designs of MoE.

preprint2026arXivOpen access

Signal facts

What is known right now

Open access2 authors1 topic

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.