Technology
Deploying Kimi K2: Scalable MoE Model on 128 GPUs
Learn how to deploy Kimi K2, a state-of-the-art Mixture-of-Experts (MoE) model, across a 128-GPU NVIDIA H200 cluster. This guide covers the key challenges and their solutions using OME and SGLang for scalable, high-performance inference, reaching 4800 tokens per second at low latency.
Alex