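The Serving Systems module collects notes on online serving systems for large language models, covering vLLM, continuous batching, request scheduling, parallel serving, and prefill/decode (PD) disaggregation.

Notes

- vLLM
- Continuous Batching
- Request Scheduling
- Tensor Parallel Serving
- Disaggregated Serving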