RTPurbo: Minimal Training Achieves 97%+ Sparsity and 9x Speedup for Long-Context LLMs
Alibaba's RTPurbo converts full-attention models into efficient sparse models with only ~600 training steps, achieving up to 9.36x prefill acceleration and near-lossless accuracy on long-context benchmarks.
Zhou Yanke