DeepSeek-V4 MegaMoE: Overlapping Communication and Compute
How DeepSeek-V4 MegaMoE overlaps expert-parallel communication with GPU compute using wave scheduling, TMA/MMA pipelines, and epilogue warps for faster serving.
Qing Ke AI