Blueprint for Supercharging LLM Inference With “PagedAttention over RDMA”
Learn how "PagedAttention over RDMA" (PAoR) accelerates large language model serving by extending the key-value (KV) cache beyond GPU memory, pairing RDMA networking with WEKA's high-performance distributed storage so cached KV blocks can be shared and reused across nodes instead of being recomputed. This session shows how the approach integrates with vLLM and TensorRT-LLM to lower latency and increase throughput in multi-node environments.
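To make the idea concrete, below is a minimal, hypothetical sketch of a paged KV cache backed by a remote block tier. The names (`PagedKVCache`, `RemoteBlockStore`, `append_block`, `get_block`) and the in-memory "remote" store are illustrative assumptions, not the actual PAoR, vLLM, TensorRT-LLM, or WEKA APIs; a real deployment would move blocks over RDMA into GPU memory rather than copy NumPy arrays.

```python
# Conceptual sketch only: fixed-size KV "pages" (PagedAttention-style) that are
# mirrored to a remote tier so other requests or nodes can reuse them.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

import numpy as np

BLOCK_TOKENS = 16            # tokens per KV block (fixed page size)
NUM_HEADS, HEAD_DIM = 8, 128


@dataclass
class RemoteBlockStore:
    """Stand-in for an RDMA-reachable shared block store."""
    blocks: Dict[str, np.ndarray] = field(default_factory=dict)

    def put(self, key: str, block: np.ndarray) -> None:
        self.blocks[key] = block          # in reality: RDMA write / storage put

    def get(self, key: str) -> Optional[np.ndarray]:
        return self.blocks.get(key)       # in reality: RDMA read into GPU memory


@dataclass
class PagedKVCache:
    """Local block table with spill-to-remote and reuse-from-remote."""
    remote: RemoteBlockStore
    capacity_blocks: int
    local: Dict[str, np.ndarray] = field(default_factory=dict)

    def _key(self, prefix_hash: str, block_idx: int) -> str:
        return f"{prefix_hash}:{block_idx}"

    def append_block(self, prefix_hash: str, block_idx: int,
                     kv_block: np.ndarray) -> None:
        """Store a newly computed KV block locally and mirror it remotely."""
        key = self._key(prefix_hash, block_idx)
        if len(self.local) >= self.capacity_blocks:
            # Evict an arbitrary local page; the remote copy keeps it reusable.
            self.local.pop(next(iter(self.local)))
        self.local[key] = kv_block
        self.remote.put(key, kv_block)

    def get_block(self, prefix_hash: str,
                  block_idx: int) -> Tuple[Optional[np.ndarray], str]:
        """Return a KV block and its source: local hit, remote hit, or miss."""
        key = self._key(prefix_hash, block_idx)
        if key in self.local:
            return self.local[key], "local"
        block = self.remote.get(key)
        if block is not None:
            self.local[key] = block       # pull the page back over the fabric
            return block, "remote"
        return None, "miss"               # miss: prefill must recompute this page


if __name__ == "__main__":
    cache = PagedKVCache(remote=RemoteBlockStore(), capacity_blocks=2)
    kv = np.zeros((2, BLOCK_TOKENS, NUM_HEADS, HEAD_DIM), dtype=np.float16)

    # One worker computes and publishes two KV pages for a shared prompt prefix.
    cache.append_block("prompt-abc", 0, kv)
    cache.append_block("prompt-abc", 1, kv)

    # A later request (or a peer sharing the same remote store) reuses them.
    _, source = cache.get_block("prompt-abc", 0)
    print(source)  # "local" here; "remote" after eviction or on another node
```

The point of the sketch is the reuse path: a KV page that already exists anywhere in the shared tier is fetched rather than recomputed, which is where the latency and throughput gains in multi-node serving come from.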