LLM Inference on H800: A Disaggregated Architecture Guide
Explore LLM inference optimization on H800 SuperPods. Learn how a disaggregated architecture with SGLang tackles the prefill bottleneck to boost throughput.
yiakwy