🍊 Latent Atlas 🍉

Architecture

January 18, 2026 · 1 min read

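An overview of large language model architecture design, from the basic Transformer structure to attention variants, positional encoding, model families, sparse and efficient architectures, and multimodal architectures.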

Recommended Path

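Transformer: start with the block, residual connections, normalization, the FFN, and the decoder-only structure.
Attention: next, cover self-attention, MHA, MQA/GQA/MLA, and long-context attention variants.
Positional Encoding: understand how position mechanisms such as RoPE, ALiBi, and YaRN affect attention.
Mixture of Experts: understand the capacity/compute trade-off in moving from a dense FFN to sparse expert FFNs.
Model Families: finally, treat GPT, LLaMA, Qwen, DeepSeek, and others as case studies in combining these basic mechanisms.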

Core Modules

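Transformer: the standard Transformer structure, decoder-only models, the FFN, normalization, and residual connections.
Attention: self-attention, MHA, MQA, GQA, and sliding-window attention.
Positional Encoding: absolute positions, sinusoidal positions, RoPE, ALiBi, and YaRN.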

Model and Architecture Families

Model Families: GPT, LLaMA, Qwen, DeepSeek, Mistral, Gemma, and other model families.
Sparse and Efficient Architectures: MoE, Mamba, SSM, linear attention, and other efficiency-oriented approaches.
Multimodal: VLMs, CLIP, LLaVA, Qwen-VL, and multimodal projectors.

This folder contains 6 notes.

February 15, 2026
Multimodal
February 14, 2026
Sparse and Efficient Architectures
February 7, 2026
Model Families
February 1, 2026
Positional Encoding
January 25, 2026
Attention
January 18, 2026
Transformer
