Unsanitized RPC function calls
The vulnerability is located in PyTorch’s distributed Remote Procedure Call (RPC) component, torch.distributed.rpc. The component handles inter-process communication between the nodes involved in distributed training, in which a task is split across multiple deployments that function as workers and is coordinated from a master node.
When using RPC, workers can serialize Python UDFs (user-defined functions) and send them to the master node, which then deserializes and runs them. The problem is that PyTorch versions older than 2.2.2 place no restrictions on calling built-in Python functions such as eval, which in turn allows an attacker to execute arbitrary commands on the underlying operating system.
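The risk described above can be illustrated without PyTorch at all. The following is a minimal sketch in plain Python, not PyTorch's actual code or its fix: naive_dispatch, guarded_dispatch and ALLOWED_FUNCTIONS are hypothetical names invented for this illustration. It assumes a receiver that runs whatever callable a peer sends, and contrasts it with one that only runs functions registered on an allowlist. The payload is a harmless stand-in that merely reads the process ID.

```python
import os

# Hypothetical allowlist: the only functions the receiver agrees to run.
ALLOWED_FUNCTIONS = {"add": lambda a, b: a + b}


def naive_dispatch(func, args):
    """Run whatever callable the peer sent, with no restrictions."""
    return func(*args)


def guarded_dispatch(name, args):
    """Run a function only if its name is registered in the allowlist."""
    if name not in ALLOWED_FUNCTIONS:
        raise PermissionError(f"function {name!r} is not allowed")
    return ALLOWED_FUNCTIONS[name](*args)


# Harmless stand-in for an attacker-supplied string: it only reads the
# process ID, but the same mechanism could run any expression.
payload = "__import__('os').getpid()"

# The unrestricted receiver evaluates the string inside its own process.
leaked_pid = naive_dispatch(eval, (payload,))
print(leaked_pid == os.getpid())

# The guarded receiver still serves legitimate calls...
print(guarded_dispatch("add", (2, 3)))

# ...but refuses anything that is not on the allowlist.
try:
    guarded_dispatch("eval", (payload,))
except PermissionError as exc:
    print(exc)
```

Because the unrestricted receiver accepts eval as just another callable, a string argument becomes code running inside the receiving process. An allowlist is only one possible mitigation, shown here for contrast.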
“An attacker can exploit this vulnerability to remotely attack master nodes that are starting distributed training,” the researchers who reported the vulnerability wrote in their report. “Through RCE [remote code execution], the master node is compromised, so as to further steal the sensitive data related to AI.”