LMDeploy Vulnerability Exploited in 12 Hours, Exposing AI Infrastructure Supply Chain Risk
Critical flaw in deployment toolkit weaponized before most enterprises could patch, signaling coordinated threat actor pivot to AI middleware bottlenecks.
A Server-Side Request Forgery vulnerability in LMDeploy—an open-source toolkit for deploying large language models—was exploited 12 hours and 31 minutes after CVE disclosure, marking the latest compression of security response windows in AI infrastructure.
CVE-2026-33626, published April 21, was first exploited at 03:35 UTC on April 22 from IP 103.116.72.119 in Hong Kong, according to Sysdig. The attacker probed AWS instance metadata services, Redis caches, MySQL databases, and administrative endpoints across an 8-minute session—10 distinct requests targeting cloud credentials and internal infrastructure. No public proof-of-concept code existed; the advisory text alone provided sufficient detail to construct a working exploit.
The vulnerability exists in LMDeploy’s load_image() function, which fetches arbitrary URLs without validating internal or private IP addresses, per the GitHub security advisory. Vision-language model deployments typically run on GPU instances with broad IAM roles for accessing model artifacts, training datasets, and cross-account resources—turning a standard SSRF primitive into a credential exfiltration vector.
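The fetch-anything pattern the advisory describes, beside the validation that closes it, as an added example written for this article (it is not LMDeploy's load_image() nor the 0.12.3 fix; the function names, the injectable fetch callback and the sample hosts are assumptions):

```python
"""Added example (not LMDeploy's own code): an SSRF-prone image loader in the
shape the advisory describes, beside a hardened one.  All names here are
assumptions made for this illustration."""
import ipaddress
import socket
from urllib.parse import urljoin, urlparse

ALLOWED_SCHEMES = {"http", "https"}
MAX_REDIRECTS = 3


class UnsafeURL(ValueError):
    """Raised for a URL an image loader must never fetch."""


def load_image_unsafe(url, fetch):
    """Vulnerable pattern: the client-supplied URL is fetched as given, so
    http://169.254.169.254/latest/meta-data/ or http://127.0.0.1:6379/ is
    reached with the GPU node's network identity.  `fetch(scheme, host, port,
    ip, path)` stands in for urllib/requests and returns (status, headers, body)."""
    p = urlparse(url)
    port = p.port or (443 if p.scheme == "https" else 80)
    return fetch(p.scheme, p.hostname, port, p.hostname, p.path or "/")[2]


def is_public_ip(ip):
    """True only for globally routable unicast.  IPv4-mapped IPv6 literals such
    as [::ffff:169.254.169.254] are unwrapped first so they cannot bypass the check."""
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return ip.is_global and not ip.is_multicast


def resolve_all(host):
    """Default resolver: every A/AAAA answer for `host`, as strings.  Spellings
    urlparse does not see as IP literals (0x7f000001, 2130706433) are normally
    normalised by getaddrinfo to 127.0.0.1 and then rejected like any loopback."""
    return {info[4][0].split("%")[0] for info in socket.getaddrinfo(host, None)}


def validate_target(url, resolver=resolve_all):
    """Return (scheme, host, port, pinned_ip, path) for a URL that is safe to
    fetch, or raise UnsafeURL.  Every resolved address must be public: one
    private answer among several is enough to reject."""
    p = urlparse(url)
    if p.scheme not in ALLOWED_SCHEMES:
        raise UnsafeURL(f"scheme not allowed: {p.scheme!r}")
    if not p.hostname:
        raise UnsafeURL("missing host")
    try:
        port = p.port or (443 if p.scheme == "https" else 80)
    except ValueError as exc:
        raise UnsafeURL(f"bad port: {exc}") from None
    try:
        addrs = {ipaddress.ip_address(p.hostname)}   # literal, incl. bracketed IPv6
    except ValueError:
        addrs = {ipaddress.ip_address(a) for a in resolver(p.hostname)}
    if not addrs:
        raise UnsafeURL(f"{p.hostname} does not resolve")
    bad = sorted(str(a) for a in addrs if not is_public_ip(a))
    if bad:
        raise UnsafeURL(f"{p.hostname} resolves to non-public address(es) {bad}")
    path = (p.path or "/") + (f"?{p.query}" if p.query else "")
    return p.scheme, p.hostname, port, str(sorted(addrs, key=str)[0]), path


def load_image_safe(url, fetch, resolver=resolve_all):
    """Hardened loader.  `fetch` must connect to `ip` and send `Host: host`, so
    the address that was checked is the address that is used; a DNS answer that
    changes between check and connect (rebinding) gains nothing.  Redirects are
    not followed by the transport: each hop comes back here and is re-validated."""
    for _ in range(MAX_REDIRECTS + 1):
        scheme, host, port, ip, path = validate_target(url, resolver)
        status, headers, body = fetch(scheme, host, port, ip, path)
        if status in (301, 302, 303, 307, 308):
            url = urljoin(url, headers.get("location", ""))
            continue
        return body
    raise UnsafeURL("too many redirects")


if __name__ == "__main__":
    # Offline demo with a constructed resolver and transport (no network used).
    fake_dns = {"cdn.example": ["93.184.216.34"], "localhost": ["127.0.0.1"]}

    def fake_fetch(scheme, host, port, ip, path):
        return 200, {}, f"<bytes from {ip}:{port}{path}>".encode()

    for u in ["http://cdn.example/cat.png", "http://169.254.169.254/latest/meta-data/",
              "http://localhost:6379/", "http://[::ffff:127.0.0.1]/"]:
        try:
            print("allowed", u, load_image_safe(u, fake_fetch, lambda h: fake_dns.get(h, [])))
        except UnsafeURL as exc:
            print("blocked", u, "->", exc)
```

Three details matter more than the deny list itself: every resolved address is checked, not only the first; the checked address is the one handed to the transport, so a DNS answer that flips to 127.0.0.1 between validation and connect gains nothing; and each redirect hop is re-validated, because a public host that answers 302 to 169.254.169.254 is the cheapest bypass of a check applied only to the original URL.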
AI Infrastructure as Attack Vector
LMDeploy, developed by Shanghai AI Laboratory’s InternLM project, is a compression and serving toolkit with approximately 7,800 GitHub stars. As of publication, CVE-2026-33626 does not appear in CISA’s Known Exploited Vulnerabilities catalog, consistent with the project’s niche status. The speed of exploitation reflects a pattern observed across AI infrastructure over the past six months: critical flaws in inference servers, model gateways, and agent orchestration tools weaponized within hours of public disclosure.
“CVE-2026-33626 fits a pattern that we have observed repeatedly in the AI-infrastructure space over the past six months: critical vulnerabilities in inference servers, model gateways, and agent orchestration tools are being weaponized within hours of advisory publication.”
— Sysdig Threat Research Team
The attacker’s reconnaissance included queries to 169.254.169.254 (AWS instance metadata), localhost Redis on port 6379, MySQL on port 3306, and out-of-band DNS callbacks via requestrepo.com—a methodical enumeration of cloud credentials and internal service topology, GBHackers reported. Inference deployments commonly ship with Redis for prompt caching, MySQL for usage metering, and internal HTTP control planes, creating multiple pivot points from a single SSRF exploit.
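That probe sequence is straightforward to express as a detection over the gateway's request log, provided the service records the image URL it is asked to fetch. The following is an added example written for this article, not Sysdig's or LMDeploy's tooling; the JSON line format, the field names and the sample lines are constructed, and the source addresses in them are documentation ranges rather than the one reported:

```python
"""Added example (not Sysdig's or LMDeploy's tooling): classify image-URL
targets recorded by an inference gateway and alert when one source probes
several internal targets in a short window.  Log format, field names and the
sample lines below are constructed for this illustration."""
import ipaddress
import json
from collections import defaultdict
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Ports that have no business appearing in an image URL on a model-serving node.
SENSITIVE_PORTS = {6379: "redis", 3306: "mysql", 5432: "postgres",
                   2379: "etcd", 8500: "consul", 9200: "elasticsearch"}
# Cloud metadata endpoints (AWS/Azure IPv4, AWS IPv6, GCP hostname, Alibaba).
METADATA_HOSTS = {"169.254.169.254", "fd00:ec2::254",
                  "metadata.google.internal", "100.100.100.200"}
# Out-of-band callback services used to confirm blind SSRF / DNS egress
# (requestrepo.com is the one in the report; the others are illustrative).
OOB_DOMAINS = ("requestrepo.com", "oast.fun", "oast.pro", "interact.sh",
               "burpcollaborator.net")

# Constructed sample log: JSON lines, one per fetch the gateway was asked to make.
SAMPLE_LOG = [
    '{"ts":"2026-04-22T03:35:00Z","src":"198.51.100.7","image_url":"https://cdn.example/cat.png"}',
    '{"ts":"2026-04-22T03:35:10Z","src":"198.51.100.7","image_url":"http://169.254.169.254/latest/meta-data/iam/security-credentials/"}',
    '{"ts":"2026-04-22T03:36:02Z","src":"198.51.100.7","image_url":"http://127.0.0.1:6379/"}',
    '{"ts":"2026-04-22T03:37:15Z","src":"198.51.100.7","image_url":"http://localhost:3306/"}',
    '{"ts":"2026-04-22T03:38:40Z","src":"198.51.100.7","image_url":"http://abc123.requestrepo.com/ping"}',
    '{"ts":"2026-04-22T03:40:00Z","src":"203.0.113.9","image_url":"http://10.0.0.12:8080/admin"}',
]


def classify(url):
    """Indicator tags for one URL; [] means nothing suspicious.  Hostnames are
    not resolved here (a detector should not issue DNS queries the attacker
    controls), so only literals, well-known names and ports are examined."""
    p = urlparse(url)
    host = (p.hostname or "").lower()
    tags = []
    if p.scheme not in ("http", "https"):
        tags.append(f"scheme:{p.scheme}")
    if host in METADATA_HOSTS:
        tags.append("cloud-metadata")
    else:
        try:
            ip = ipaddress.ip_address(host)
            mapped = getattr(ip, "ipv4_mapped", None)   # ::ffff:127.0.0.1 etc.
            if mapped is not None:
                ip = mapped
            if not ip.is_global:
                tags.append("non-public-ip")
        except ValueError:
            if host == "localhost" or host.endswith((".internal", ".local")):
                tags.append("internal-hostname")
    try:
        port = p.port
    except ValueError:
        port = None
    if port in SENSITIVE_PORTS:
        tags.append(f"service:{SENSITIVE_PORTS[port]}")
    if any(host == d or host.endswith("." + d) for d in OOB_DOMAINS):
        tags.append("oob-callback")
    return tags


def detect(lines, window=timedelta(minutes=10), threshold=3):
    """One alert per source that touched `threshold` distinct suspicious targets
    inside `window`; a single stray metadata hit does not fire on its own."""
    flagged = defaultdict(list)
    for line in lines:
        ev = json.loads(line)
        tags = classify(ev["image_url"])
        if tags:
            ts = datetime.fromisoformat(ev["ts"].replace("Z", "+00:00"))
            flagged[ev["src"]].append((ts, ev["image_url"], tags))
    alerts = []
    for src, events in flagged.items():
        events.sort(key=lambda e: e[0])
        for i, (t0, _, _) in enumerate(events):
            hits = [e for e in events[i:] if e[0] - t0 <= window]
            targets = {e[1] for e in hits}
            if len(targets) >= threshold:
                alerts.append({"src": src, "first_seen": t0.isoformat(),
                               "targets": sorted(targets),
                               "tags": sorted({t for e in hits for t in e[2]})})
                break
    return alerts


if __name__ == "__main__":
    for alert in detect(SAMPLE_LOG):
        print(json.dumps(alert, indent=2))
```

The session threshold is the useful part: a lone request for a private address is often a misconfigured client, whereas metadata, a cache port, a database port and an out-of-band callback from one source inside ten minutes is the enumeration described above and is worth an alert whatever the source IP.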
Generative AI Accelerates Exploit Development
The absence of public exploit code at the time of weaponisation suggests advisory-to-exploit automation. Sysdig noted that “the advisory text, read carefully, is enough to craft an exploit. Generative AI is accelerating this collapse.” The named vulnerable file (lmdeploy/vl/utils.py), affected parameter, and missing validation logic provided a blueprint for rapid weaponisation—collapsing traditional patch windows that assumed days or weeks between disclosure and active exploitation.
On March 26, LiteLLM—an AI infrastructure library with 3.4 million daily downloads—was compromised by the TeamPCP threat group to harvest AWS, GCP and Azure tokens, SSH keys, and Kubernetes credentials. The malicious packages were removed from PyPI after approximately three hours. Zscaler ThreatLabz identified the incident as part of a broader surge in supply chain attacks targeting AI middleware.
IBM’s 2026 X-Force report documented a four-fold increase in supply chain and third-party compromises since 2020, with a 44% rise in attacks exploiting public-facing applications with weak or missing authentication controls, according to Cycode. The LMDeploy and LiteLLM incidents demonstrate threat actor focus on middleware layers connecting models to enterprise data—components with broad IAM privileges but limited security scrutiny.
Organisational Preparedness Gap
A 2026 State of AI and API Security report found that 60.2% of organisations admit a profound lack of control over the security of the AI models driving their applications, Security Boulevard reported. The LMDeploy vulnerability underscores this gap: vision-language model nodes with GPU instance access, model artifact storage permissions, and cross-account assume-role capabilities are high-value targets with minimal defensive coverage.
- 12.5-hour exploitation window represents new baseline for AI infrastructure CVE response
- SSRF vulnerabilities in model serving layers unlock cloud credentials, not just internal network access
- LLM-assisted exploit development eliminates need for public proof-of-concept code
- AI middleware (inference servers, model gateways, agent orchestrators) emerging as primary supply chain attack vector
LMDeploy patched the vulnerability in version 0.12.3 by adding IP address validation to image loading functions. Organisations running vision-language models should audit IAM roles assigned to GPU instances, segment model serving infrastructure from production data stores, and implement network policies restricting instance metadata service access from application workloads. On AWS that means requiring IMDSv2, whose session token must first be obtained with a PUT request that a GET-based URL fetch cannot issue, and keeping the PUT response hop limit at 1 so the token does not reach workloads one network hop away; a validation fix in the application is necessary but should not be the only control standing between the model server and the instance's credentials.
What to Watch
Monitor whether CISA adds CVE-2026-33626 to the Known Exploited Vulnerabilities catalog despite LMDeploy’s relatively small install base—a decision that would signal recognition of AI infrastructure as critical dependency. Track additional disclosures in the inference server and model gateway ecosystem for sub-24-hour exploitation timelines. Watch for threat intelligence on the Hong Kong IP 103.116.72.119 and whether the April 22 attack represents reconnaissance for a broader campaign or opportunistic credential harvesting. The pattern suggests coordinated threat actor investment in AI infrastructure targeting capabilities—organisations should assume CVE disclosure and exploitation now occur on the same operational day.