🍊 Latent Atlas 🍉

Sparse and Efficient Architectures

February 14, 2026 · 1 min read

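The Sparse and Efficient Architectures module collects architecture lines such as MoE, Mamba, SSM, and Linear Attention that aim to improve scalability, efficiency, or long-sequence modeling.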

Reading Path

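Mixture of Experts: expands total capacity through sparse FFN layers.
Linear Attention and Efficient Transformer: approaches that reduce the cost of attention on long sequences.
State Space Model and SSM: approaches that replace or complement attention with state space structures.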
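
As a rough illustration of the first item, the sketch below works through the arithmetic behind "sparse FFN expands total capacity": total parameters grow with the number of experts, while the parameters used per token depend only on how many experts are selected. The function names, the two-matrix FFN shape, the single linear router, and the choice to apply softmax after top-k selection are all assumptions made for this example, not a description of any specific model listed on this page.

```python
import math


def ffn_param_count(d_model: int, d_ff: int) -> int:
    """Weights in a two-matrix FFN (d_model -> d_ff -> d_model), biases ignored."""
    return 2 * d_model * d_ff


def moe_param_counts(d_model: int, d_ff: int, num_experts: int, top_k: int):
    """Return (total, active_per_token) parameters for one sparse MoE FFN layer."""
    per_expert = ffn_param_count(d_model, d_ff)
    router = d_model * num_experts  # assumed: one linear router without bias
    total = num_experts * per_expert + router
    active = top_k * per_expert + router
    return total, active


def top_k_gates(router_logits, top_k: int):
    """Select the top_k experts and renormalize their logits with a softmax.

    Assumption: softmax is applied after selection; some routers instead
    apply softmax over all experts first and then select.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


# Hypothetical sizes chosen only for illustration.
total, active = moe_param_counts(d_model=1024, d_ff=4096, num_experts=8, top_k=2)
gates = top_k_gates([0.1, 2.0, -1.0, 2.0, 0.5, 0.0, -0.3, 1.2], top_k=2)
```

With these hypothetical sizes, `total` scales with `num_experts` while `active` scales only with `top_k`, which is the capacity-versus-compute trade the reading path refers to. `gates` maps each selected expert index to its mixing weight, and the weights sum to 1.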

Notes

Mixture of Experts
SSM
State Space Model
Linear Attention
Efficient Transformer

Related Source TODOs

Outrageously Large Neural Networks
GShard
Switch Transformer
DeepSeekMoE
DeepSeek-V2
DeepSeek-V3

5 notes in this folder.

February 15, 2026
Mamba / SSM
- architecture
- non-transformer
February 14, 2026
Efficient Transformer
- transformer
- efficient-architecture
February 14, 2026
Linear Attention
February 14, 2026
Mixture of Experts
February 14, 2026
State Space Model
- ssm
- efficient-architecture

🍊 Latent Atlas 🍉 · An AI knowledge atlas built with Quartz © 2026