Researchers find widespread remote code execution risk in AI inference engines from unsafe ZMQ and pickle use

Cybersecurity researchers have reported critical remote code execution vulnerabilities affecting several AI inference engines, including components from Meta, NVIDIA and Microsoft and open-source projects such as vLLM and SGLang. Oligo Security researcher Avi Lumelsky said the flaws all trace to the same overlooked root cause: unsafe use of ZeroMQ (ZMQ) together with Python’s pickle deserialization.

At the center of the issue is a pattern the researchers call ShadowMQ, in which insecure deserialization logic has been copied across projects. The original vulnerability was reported in Meta’s Llama framework. It involved using ZeroMQ’s recv_pyobj() to accept pickled objects over a network-exposed socket, so an attacker able to send crafted data to that socket could trigger arbitrary code execution. Maintainers also addressed related issues in the pyzmq library.
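The mechanism can be illustrated without a network. The following is a minimal Python sketch, not code from any of the affected projects: it assumes that recv_pyobj() effectively passes the received bytes to pickle.loads(), and it uses a harmless os.getcwd() call where a real payload would name something destructive. The names HarmlessPayload, unsafe_decode and safe_decode, and the "prompt" field, are hypothetical.

```python
import json
import os
import pickle


class HarmlessPayload:
    """Stand-in for an attacker-crafted object (hypothetical name)."""

    def __reduce__(self):
        # pickle records "call os.getcwd()" instead of object state.
        # A real attacker could name os.system or similar here.
        return (os.getcwd, ())


def unsafe_decode(raw: bytes):
    # Assumed equivalent of what recv_pyobj() does with socket bytes.
    return pickle.loads(raw)


def safe_decode(raw: bytes) -> dict:
    """Parse data-only JSON and check its shape (illustrative schema)."""
    try:
        message = json.loads(raw)
    except ValueError as exc:  # covers malformed JSON and invalid UTF-8
        raise ValueError("message is not valid JSON") from exc
    if not isinstance(message, dict) or not isinstance(message.get("prompt"), str):
        raise ValueError("unexpected message shape")
    return message


if __name__ == "__main__":
    wire_bytes = pickle.dumps(HarmlessPayload())
    # The function chosen by the sender runs during deserialization.
    print(unsafe_decode(wire_bytes))
    # A data-only format carries values, not callables.
    print(safe_decode(b'{"prompt": "hello"}'))
```

Switching to a data-only format such as JSON is one possible mitigation, not necessarily the fix each project adopted. It does not replace authentication or restricting which network interfaces the socket listens on.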

Oligo said the same unsafe pattern, pickle deserialization over unauthenticated ZMQ TCP sockets, recurred in multiple inference frameworks. The issues have been assigned identifiers including CVE-2025-30165 for vLLM (CVSS 8.0), CVE-2025-23254 for NVIDIA TensorRT-LLM (CVSS 8.8, fixed in version 0.18.2) and CVE-2025-60455 for Modular Max Server, which was addressed in a commit. vLLM maintainers have switched to the V1 engine by default, and SGLang has implemented fixes that have been described as incomplete. Microsoft’s Sarathi-Serve remained unpatched at the time of the disclosure.

The researchers traced how the vulnerability pattern propagated and said it often arose from direct code reuse or copy-pasting. According to the report, the vulnerable files themselves indicate that they were adapted from other projects, and one project borrowed its logic from another, so the same unsafe practice was repeated across codebases.
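Because the pattern spreads through copied code, a simple source audit can help locate it. The sketch below is an illustrative Python example, not a tool from the Oligo report: it walks a file's syntax tree and flags calls to recv_pyobj() and to pickle.load() or pickle.loads(). The function name find_risky_calls and the sample source are hypothetical, and the check is deliberately narrow, so it will miss aliased imports and other indirect uses.

```python
import ast

RISKY_METHODS = {"recv_pyobj"}    # pyzmq helper that unpickles received data
PICKLE_FUNCS = {"load", "loads"}  # direct pickle deserialization calls


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) pairs for flagged calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not isinstance(func, ast.Attribute):
            continue
        if func.attr in RISKY_METHODS:
            findings.append((node.lineno, func.attr))
        elif (
            func.attr in PICKLE_FUNCS
            and isinstance(func.value, ast.Name)
            and func.value.id == "pickle"
        ):
            findings.append((node.lineno, f"pickle.{func.attr}"))
    return sorted(findings)


# Hypothetical snippet resembling the pattern described above.
SAMPLE = '''import pickle, zmq
sock = zmq.Context().socket(zmq.PULL)
obj = sock.recv_pyobj()
data = pickle.loads(sock.recv())
'''

if __name__ == "__main__":
    for lineno, name in find_risky_calls(SAMPLE):
        print(f"line {lineno}: {name}")
```

A hit is only a starting point for review: whether a flagged call is exploitable depends on whether the socket is reachable by untrusted senders.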

Because inference engines are often deployed as cluster nodes, a successful compromise of a single node could allow attackers to execute arbitrary code, escalate privileges, steal models or deploy persistent malicious payloads such as cryptocurrency miners, the researchers warned.

Separately, AI security platform Knostic reported techniques to compromise Cursor’s built-in browser via JavaScript injection and by registering a rogue local Model Context Protocol (MCP) server. The report said a malicious MCP server can replace browser pages to harvest credentials.

The Knostic report also demonstrated how a malicious extension or injected JavaScript can run inside the Node.js interpreter used by the IDE, inheriting full file-system privileges and the ability to modify or persist code and extensions, which the company said could turn the IDE into a malware distribution and exfiltration platform. The researchers recommended disabling Auto-Run features, vetting extensions, installing MCP servers only from trusted sources, using least-privilege API keys and auditing MCP server code for critical integrations.