Document Chunking for RAG: A Practical Guide
Boost your RAG system's performance with our guide to document chunking. Explore strategies from recursive to semantic chunking with Python & LangChain code.
Ai Yun Shu
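The recursive strategy mentioned above can be tried in a few lines. Below is a minimal sketch using LangChain's RecursiveCharacterTextSplitter; the input file name and the chunk_size, chunk_overlap, and separators values are illustrative assumptions, not settings taken from the guide itself.

```python
# Minimal sketch of recursive chunking with LangChain (parameter values are assumptions).
from langchain_text_splitters import RecursiveCharacterTextSplitter

with open("document.txt", "r", encoding="utf-8") as f:  # hypothetical input document
    text = f.read()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,                       # assumed target characters per chunk
    chunk_overlap=50,                     # assumed overlap to preserve context across chunks
    separators=["\n\n", "\n", " ", ""],   # split on paragraphs first, then fall back to finer separators
)

chunks = splitter.split_text(text)
print(f"{len(chunks)} chunks; first chunk starts with: {chunks[0][:80]!r}")
```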
Dive deep into the world of Artificial Intelligence with our curated collection of articles, covering the latest breakthroughs and insights from leading researchers and engineers.
Confused by the LangChain ecosystem? Learn the key differences between LangChain, LangGraph, and LangSmith to choose the right tool for your LLM stack.
Shen Du Xue Xi Shi Jue
Learn how to add special tokens to LLMs during fine-tuning without causing catastrophic forgetting. Our guide covers smart initialization and PEFT/LoRA.
Bao Bao Suan Fa Bi Ji
Learn what Context Engineering is and how it improves LLM accuracy. Discover the 3 key stages in RAG systems for better, more reliable AI-generated outputs.
Quan Ge Tan Ai
Discover the top 10 AI tools thriving in the underground economy. Based on real API data, we reveal the AI coding agents and role-playing apps developers use.
Shi Zi Lu Kou Crossing
Explore the future of AI programming assistants. Learn about a local-first, secure AI coding tool that automates refactoring, testing, and deployment from your CLI.
Lao Yang Ai Gao Sheng Huo
Explore the top RAG frameworks of 2025. Compare production-ready tools like Haystack & RAGFlow with cutting-edge research to build powerful AI applications.
Chen Jin Shi Xue Ai
Explore the architecture of LLM agents. This visual guide covers memory, tools, planning, and multi-agent systems like AutoGen. Learn how AI agents work.
Lao Liu Shuo Nlp
RAG Evaluation 101: From Recall@K to Answer Faithfulness. Retrieval-Augmented Generation (RAG) systems combine an information retriever with a generative model to produce answers grounded in extern...
AI Author
Learn how outcome-based rewards teach AI models bad habits. Discover GRPO-RoC, a training method that improves AI reasoning by curating high-quality data.
Qing Ke Ai
Discover how the Checkpoint-Engine achieves a 30x speedup in LLM RL training. Learn about its innovative approach to parameter updates for large-scale RL.
Qing Ke Ai
Learn about Multi-head Latent Attention (MLA) and how it improves on Multi-Query Attention (MQA). Discover Matrix Absorption and its impact on performance.
Chen Jin Shi Xue Ai
Learn how to build an iOS app with AI. This Vibe Coding guide covers AI pair programming, using v0 for UI, and leveraging generative AI for faster development.
Shan Gui Er
Explore the architecture of Alibaba's Qwen3-Next, a powerful large language model. Learn about its Mixture of Experts (MoE) design and performance.
Zheng Li
Learn what Agentic RAG is and how it improves on traditional RAG. This guide shows you how to build an adaptive system with LangGraph and the Qwen model.
Ning Si Ai
Learn how to train a language model with this PyTorch training loop guide. Explore text generation, the AdamW optimizer, and Mixture of Experts models.
Zheng Li
Learn how to build a Llama-style MoE language model from scratch. This guide covers the Mixture of Experts architecture, tokenization, and model setup.
Zheng Li
Learn the complete Supervised Fine-Tuning (SFT) pipeline to enhance LLM reasoning. This guide covers the DeepSeek R1 process, from SFT to knowledge distillation.
Ning Si Ai
Learn to implement a full GRPO training pipeline. This guide covers Supervised Fine-Tuning (SFT) with cold-start data, CoT prompting, and the GRPOTrainer.
Ning Si Ai
Explore the 5 core reward functions powering DeepSeek-Coder-V2. Learn how its modular reward model for accuracy, reasoning, and format shapes AI behavior.
Ning Si Ai
Learn to replicate the DeepSeek R1 training process. This guide covers building a reinforcement learning pipeline from scratch using GRPO for advanced LLM reasoning.
Ning Si Ai
Learn how Prefill-Decode separation in LLM serving boosts goodput by 4.48x. Discover DistServe, a new architecture that optimizes latency and meets strict SLOs.
GiantPandaLLM
Learn how knowledge distillation and model temperature work to train smaller, more efficient AI models. A key technique for LLM model compression.
Chen Jin Shi Xue Ai
Struggling with AI projects? The problem isn't your models, it's your AI infrastructure. Learn why data silos & lag hold you back and how to build a better f...
Pingxingjilu
Learn to install and use LLaMA Factory to fine-tune hundreds of LLMs on your local machine. This guide covers CUDA setup, installation, and WebUI usage.
Number in the Moutain
Explore the Transformer model and its architecture. Learn about attention mechanisms and how Transformers power modern AI and Large Language Models (LLMs).
Quan Ge Tan Ai
A comprehensive guide to using Ollama for running large language models like Llama 3 and Mistral on your local machine. Learn installation, commands, and how to create custom models.
Jordan Lee
Discover what Large Language Models (LLMs) are and how they power Generative AI. This in-depth guide covers the Transformer architecture, prompt engineering, and more.
Quan Ge Tan Ai
Explore the shift to separated architectures for RL post-training of LLMs. Learn how systems like AsyncFlow & TransferQueue solve data orchestration challenges.
Little Boji
Explore LLM inference optimization on H800 SuperPods. Learn how a disaggregated architecture with SGLang tackles the prefill bottleneck to boost throughput.
yiakwy
Monitoring **PyTorch GPU memory usage** during model training can be perplexing. To demystify this, we'll dive into the **PyTorch memory snapshot** tool, a powerful utility for detailed **GPU memory ...
Panda
Discover a critical flaw in Supervised Fine-Tuning (SFT) that limits LLM performance. Learn how a simple learning rate tweak unifies SFT and DPO for a 25% gain.
Alex
Unpack the powerful workflow behind GraphRAG. Learn how it transforms data into a network of nodes and edges, uses intelligent graph traversal for searching, and applies advanced metrics and metadata filters to deliver highly relevant, contextualized answers.
Mi
This article delves into the core challenges of GPU performance, analyzing the differences between compute-bound and memory-bound operations and highlighting the issue of underutilized memory bandwidth. It further proposes strategies to maximize throughput and looks ahead to the collaborative future of CPUs and GPUs, as well as the evolution of GPU architecture, offering a first-principles perspective on understanding and optimizing GPU performance.
xiaodong gong
Traditional reinforcement learning models struggle with real-time applications due to "AI lag." Two ICLR 2025 papers from Mila introduce groundbreaking solutions to tackle inaction and delay regret, enabling large AI models to operate in high-frequency, dynamic environments without compromising speed or intelligence.
Alex
The rise of Agentic AI places unprecedented demands on our infrastructure. This article explores the emerging software and hardware requirements, from specialized runtimes and memory services to zero-trust security models, dissecting AWS's new Bedrock AgentCore platform and discussing the future of AI infrastructure.
Alex
This article dissects the architectural evolution of modern large language models in 2025, moving beyond benchmarks to analyze the core design choices of flagship open-source models. We explore key innovations like DeepSeek-V3's Multi-Head Latent Attention (MLA) and Mixture of Experts (MoE), OLMo 2's unique normalization strategies, Gemma 3's use of sliding window attention, and Llama 4's take on MoE. By focusing on these architectural blueprints, we gain a clearer understanding of the engineering priorities shaping the future of LLMs.
Alex
Learn how to deploy Kimi K2, a state-of-the-art Mixture-of-Experts (MoE) model, on a massive 128 H200 GPU cluster. This guide covers the key challenges and solutions using OME and SGLang for scalable, high-performance inference, achieving 4800 tokens/second with low latency.
Alex
Learn how to select the best ldmatrix operation in CUTLASS CuTe for high-performance GPU matrix multiplication. Optimize data movement and performance.
Alex
Unlock the full potential of your CUDA kernels by mastering memory coalescing with TiledCopy. This article dives deep into optimizing data transfers from Global to Shared Memory on NVIDIA GPUs, covering cp.async, row-major vs. column-major layouts, and cache line alignment to maximize memory bandwidth and accelerate your deep learning workloads.
Alex
Fine-Tuning Qwen3 with Unsloth: Step-by-Step Guide. Qwen3, the latest generation of large language models, is redefining AI with advanced reasoning, instruction following, and robust multilingual s...
Alex
Baidu ERNIE 4.5: Advancements in Multimodal Large Language Models. Baidu's ERNIE 4.5 marks a major leap in artificial intelligence, especially in the development of multimodal large language mode...
Alex
MemOS: Persistent Memory for LLMs & Next-Gen AI Agents. Learn how supervised fine-tuning (SFT) transforms LLMs from base models to chat assistants, with a step-by-step guide to the SFT workflow, datasets, and best practices.
Alex
Discover how linear layers enable multi-head attention in Transformers, powering advanced NLP models with parallel processing and rich representations.
Alex
Explore single-controller vs multi-controller in veRL, inspired by Google's Pathways, and learn their impact on distributed reinforcement learning systems.
Alex
The field of artificial intelligence has seen rapid advancements in reinforcement learning for reasoning, particularly within large language models (LLMs). This article reviews influential research s...
Alex
Discover how xAI's Grok 4 sets new AI benchmarks, outperforms rivals, and introduces multi-agent systems in the race for next-gen artificial intelligence.
Alex
Qwen3 Training Pipeline: Pre-training, Reinforcement Learning, and Model Distillation. Qwen3 pre-training builds a robust foundation; training begins with a comprehensive three-stage ...
Alex
LLM API Market 2024: Key Trends and Model Leaderboard. As we reach the midpoint of 2024, the competitive landscape for large language models (LLMs) is shifting rapidly. The so-called "LLM Wars" ar...
Alex
Discover the technical challenges and solutions in training a 671B parameter LLM with Reinforcement Learning, covering frameworks, memory, and efficiency.
Alex
Discover how traditional infrastructure skills translate to AI infrastructure. Learn key concepts, differences, and engineering fundamentals for LLM systems.
Alex
With its impressive performance and elegant architecture, **SGLang** is rapidly establishing itself in the competitive world of **large language model (LLM) inference**. Could it be the next PyTorch,...
Alex
Why Direct Reinforcement Learning on Base Language Models is the Next Frontier. Direct reinforcement learning (RL) on base language models is emerging as a transformative approach in LLM optimiza...
Alex
Learn Andrew Ng
Alex
Reinforcement learning for LLMs (large language models) is revolutionizing the field of artificial intelligence by enabling models to learn beyond the constraints of supervised learning. This article...
Alex
Decoding Strategies for Large Language Models (LLMs). At the core of every large language model (LLM) is a sophisticated process for generating text. Instead of selecting words at random, the model...
yong qiang
Kimi Researcher: Advancing AI Agents with End-to-End Reinforcement Learning. Kimi Researcher is the flagship product of the Kimi Agent initiative, designed to revolutionize research automation thr...
Alex
Qwen3 Model Family: QK-Norm and Enhanced Attention Mechanism. The Qwen3 model family, Alibaba's latest large language model release, introduces a significant upgrade for on-device AI: the adoption...
Alex