Latest Articles

Dive deep into the world of Artificial Intelligence with our curated collection of articles, covering the latest breakthroughs and insights from leading researchers and engineers.

Filtering by tag: TiledCopy (1 article)
Technology

Optimizing TiledCopy for Memory Coalescing on NVIDIA GPUs

Get more out of your CUDA kernels by mastering memory coalescing with TiledCopy. This article covers optimizing data transfers from global to shared memory on NVIDIA GPUs, including cp.async, row-major vs. column-major layouts, and cache line alignment, to maximize memory bandwidth and accelerate your deep learning workloads.

Alex