CVE-2025-46570

vLLM’s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel

Description

vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed and the PagedAttention-based prefix cache finds a matching prefix chunk, the prefill stage speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are large enough to be measured and exploited, allowing an attacker to infer whether a submitted prompt shares a prefix with one processed earlier. This issue has been patched in version 0.9.0.
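The mechanism can be illustrated with a small, self-contained sketch. The class name, chunk size, and cost counter below are assumptions made for illustration, not vLLM's actual implementation; in the real attack the observable signal is TTFT rather than an explicit cost value:

```python
# Toy model of chunk-based prefix caching (illustrative only, not vLLM's code).
CHUNK = 4  # tokens per cached chunk (assumed for this sketch)

class ToyPrefixCache:
    """Simulates prefill cost: chunks already in the cache cost nothing."""
    def __init__(self):
        self.cached_chunks = set()

    def prefill(self, tokens):
        cost = 0
        for i in range(0, len(tokens), CHUNK):
            chunk = tuple(tokens[i:i + CHUNK])
            if chunk in self.cached_chunks:
                continue  # cache hit: previously computed work is reused
            cost += len(chunk)  # cache miss: pay per-token prefill compute
            self.cached_chunks.add(chunk)
        return cost

cache = ToyPrefixCache()
cache.prefill(list(range(16)))  # a victim's 16-token prompt populates the cache

# Attacker probes: the guess sharing the victim's 8-token prefix hits
# cached chunks and finishes faster (lower simulated cost, i.e. lower TTFT).
cost_match = cache.prefill(list(range(8)) + list(range(90, 98)))  # shared prefix
cost_miss = cache.prefill(list(range(100, 116)))                  # no overlap
print(cost_match, cost_miss)  # cost_match < cost_miss reveals the shared prefix
```

Because the attacker only needs to compare response latencies, no special access is required beyond the ability to submit prompts and time the responses.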

CVSS

Severity: Low (CVSS 3.1 base score 2.6)
EPSS: 0.03%
Vendor Advisory: github.com
Affected: vllm-project vllm

Frequently Asked Questions

What is the severity of CVE-2025-46570?
CVE-2025-46570 has been scored as a low severity vulnerability.
How to fix CVE-2025-46570?
To fix CVE-2025-46570, upgrade to vLLM 0.9.0 or later, per the vendor release notes. As of now, there are no other specific guidelines available.
Is CVE-2025-46570 being actively exploited in the wild?
As of now, there is no information confirming that CVE-2025-46570 is being actively exploited. According to its EPSS score, there is a ~0% probability that this vulnerability will be exploited by malicious actors in the next 30 days.
What software or system is affected by CVE-2025-46570?
CVE-2025-46570 affects vllm-project vllm.
This platform uses data from the NIST NVD, MITRE CVE, MITRE CWE, First.org and CISA KEV but is not endorsed or certified by these entities. CVE is a registered trademark of the MITRE Corporation and the authoritative source of CVE content is MITRE's CVE web site. CWE is a registered trademark of the MITRE Corporation and the authoritative source of CWE content is MITRE's CWE web site.
© 2025 Under My Watch. All Rights Reserved.