Basic information
- Title: DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
- Source type: paper
- Related topic notes: Mixture of Experts, DeepSeek
TODO
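- Read the original paper and summarize the design motivation behind fine-grained expert segmentation, shared experts, and expert specialization.
- Fill in the differences between DeepSeekMoE and both the classic sparsely-gated MoE and Switch Transformer.
- Add well-established background on MoE: shared vs. routed experts, load balancing, and expert specialization.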