Kimi K2: A Trillion-Parameter Open-Source LLM
Explore Kimi K2, the 1.04T-parameter open-source MoE model. Our deep dive covers its MuonClip optimizer, agentic training pipeline, and benchmark performance.
Ji Zhi Liu
Explore Jason Wei's three laws of AI: the Verifier's Law, the Commoditization of Intelligence, and the Jagged Edge. Together they offer a framework for understanding AI's future progress and the automation timeline.
Founder Park
Master distributed RLHF frameworks: compare the OpenRLHF and veRL architectures, and learn Ray Actors, GPU colocation, PPO implementation, and hybrid engine design for scalable reinforcement learning systems.
Qing Ke Ai
AI agent challenges explained: solve latency issues, fix brittle planning, and avoid reflection loops. Covers advanced engineering patterns and RL techniques for production-ready agentic AI systems.
Qing Ke Ai
Learn how GRPO-RoC fixes the pitfalls of outcome-based rewards. This training method improves AI reasoning by 40% through data curation, with code examples and benchmarks.
Qing Ke Ai
Explore the 5 core reward functions powering DeepSeek-Coder-V2. Learn how its modular reward model for accuracy, reasoning, and format shapes AI behavior.
Ning Si Ai
Learn to replicate the DeepSeek R1 training process. This guide covers building a reinforcement learning pipeline from scratch using GRPO for advanced LLM reasoning.
Ning Si Ai
Traditional reinforcement learning models struggle with real-time applications due to "AI lag." Two ICLR 2025 papers from Mila introduce groundbreaking solutions to tackle inaction and delay regret, enabling large AI models to operate in high-frequency, dynamic environments without compromising speed or intelligence.
Alex
Master training 671B-parameter LLMs with RL. Solve 5 critical challenges, including Megatron vs. FSDP trade-offs, memory offloading, weight conversion, and scaling to 1000+ GPUs. Follows a real DeepSeek V3 workflow with GRPO that achieves a 10x speedup.
Alex