This makes it a good target for deploying something like a trojan that connects back to the attackers and then collects all possible sensitive information exposed by future workflow executions. But what to use as a trojan that wouldn’t be detected by antivirus products or whose communications wouldn’t get blocked? The GitHub Actions runner agent itself, or rather another instance of it that’s not linked to the PyTorch organization but to a GitHub organization controlled by the attackers.
“Our ‘Runner on Runner’ (RoR) technique uses the same servers for C2 as the existing runner, and the only binary we drop is the official GitHub runner agent binary, which is already running on the system. See ya, EDR and firewall protections,” Stawinski said.
Extracting sensitive access tokens
By this point, the attackers had a very stealthy trojan running on a machine that is part of the organization’s development infrastructure and is used to execute sensitive jobs in its CI/CD pipeline. The next step is post-exploitation: exfiltrating sensitive data and pivoting to other parts of the infrastructure.
Workflows often include access tokens to GitHub itself or other third-party services. These tokens are required for the jobs that are defined in the workflow to execute correctly. For example, the build agent needs read privileges to check out the repository first and might also need write access to publish the resulting binary as a new release or to modify existing releases.
These tokens are kept on the runner in various locations, such as the .git configuration file on the filesystem or in environment variables, and can be read by the stealthy “trojan” that runs with root privileges. Some, such as GITHUB_TOKEN, are ephemeral and only valid during the execution of the workflow, but the researchers found ways to extend their life. Even if they hadn’t found these methods, new workflows with newly generated tokens are executed all the time on a busy repository like PyTorch, so there are plenty of new ones to collect.
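For defenders, one way to see this exposure is to check whether a checkout on a runner has left credentials in its Git configuration. The Python sketch below is only an illustration. It assumes the token is persisted as an HTTP extraheader entry carrying an Authorization value, which is our understanding of how the actions/checkout action behaves by default. The function name, the sample configuration and the placeholder value are hypothetical.

```python
import re

# Matches a section header such as [core] or [http "https://github.com/"].
SECTION_RE = re.compile(r'^\s*\[(?P<name>[^\]]+)\]\s*$')
# Matches an extraheader entry that carries an Authorization value.
HEADER_RE = re.compile(r'^\s*extraheader\s*=\s*authorization\s*:', re.IGNORECASE)


def find_persisted_auth_headers(config_text):
    """Return the names of config sections that contain an Authorization extraheader."""
    hits = []
    section = None
    for line in config_text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            section = match.group("name")
            continue
        if HEADER_RE.match(line):
            hits.append(section)
    return hits


# Hypothetical .git/config content; the credential is a placeholder, not a real token.
SAMPLE_CONFIG = '''\
[core]
    repositoryformatversion = 0
[remote "origin"]
    url = https://github.com/example-org/example-repo
[http "https://github.com/"]
    extraheader = AUTHORIZATION: basic PLACEHOLDER
'''

if __name__ == "__main__":
    for name in find_persisted_auth_headers(SAMPLE_CONFIG):
        print("credential persisted in section:", name)
```

A hit means that any process able to read the working directory can reuse the token for as long as it remains valid.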
“The PyTorch repository used GitHub secrets to allow the runners to access sensitive systems during the automated release process,” Stawinski said. “The repository used a lot of secrets, including several sets of AWS keys and GitHub Personal Access Tokens (PATs).”
PATs are often overprivileged and are an attractive target for attackers, but in this case they were used as part of other workflows that were not executing on the compromised self-hosted runner. However, the researchers found ways to use the ephemeral GitHub tokens they were able to collect to place malicious code into workflows that were executing on other runners and contained those PATs.
“It turns out that you can’t use a GITHUB_TOKEN to modify workflow files,” Stawinski said. “However, we discovered several creative…’workarounds’…that will let you add malicious code to a workflow using a GITHUB_TOKEN. In this scenario, weekly.yml used another workflow, which used a script outside the .github/workflows directory. We could add our code to this script in our branch. Then, we could trigger that workflow on our branch, which would execute our malicious code. If this sounds confusing, don’t worry; it also confuses most bug bounty programs.”
In other words, even if an attacker can’t modify a workflow directly, they might be able to modify an external script that is called by that workflow and get their malicious code in that way. Repositories and CI/CD workflows can get quite complex with many interdependencies, so such small oversights are not uncommon.
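A rough way to audit for this kind of indirect dependency is to list the script files a workflow invokes that live outside the workflows directory. The Python sketch below is a naive, text-based illustration rather than a real YAML parser. The function name, the list of script extensions and the sample workflow (including its file paths) are assumptions made for the example and are not taken from the PyTorch repository.

```python
SCRIPT_SUFFIXES = (".sh", ".bash", ".py", ".ps1")
WORKFLOW_DIR = ".github/workflows/"


def external_script_refs(workflow_text):
    """Return script paths mentioned in a workflow that sit outside .github/workflows/."""
    refs = []
    for line in workflow_text.splitlines():
        code = line.split("#", 1)[0]  # naive removal of trailing YAML comments
        for token in code.split():
            path = token.strip("\"'")
            if path.startswith("./"):
                path = path[2:]
            is_script = path.endswith(SCRIPT_SUFFIXES)
            if is_script and not path.startswith(WORKFLOW_DIR) and path not in refs:
                refs.append(path)
    return refs


# Hypothetical workflow used only to exercise the function.
SAMPLE_WORKFLOW = '''\
name: weekly
on:
  schedule:
    - cron: "0 0 * * 0"
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bash ./tools/update_pin.sh
      - run: python .github/workflows/helper.py
'''

if __name__ == "__main__":
    print(external_script_refs(SAMPLE_WORKFLOW))
```

Every path this returns is a file that someone with write access to a branch may be able to change without touching a workflow file, so it deserves the same review and protection as the workflow itself.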
Even without the PATs, a GITHUB_TOKEN with write privileges would have been enough to poison PyTorch’s releases on GitHub, and the separately extracted AWS keys could have been used to backdoor PyTorch releases hosted on the organization’s AWS account. “There were other sets of AWS keys, GitHub PATs, and various credentials we could have stolen, but we believed we had a clear demonstration of impact at this point,” the researchers said. “Given the critical nature of the vulnerability, we wanted to submit the report as soon as possible before one of PyTorch’s 3,500 contributors decided to make a deal with a foreign adversary.”
Mitigating risk from CI/CD workflows
There are many lessons for software development organizations to learn from this attack: the risks of running self-hosted GitHub Actions runners in default configurations, the risks of workflows that execute scripts from outside the workflows directory, the risks of overprivileged access tokens, and the risk of legitimate applications being repurposed as trojans. Other researchers have previously done the latter with Amazon’s AWS Systems Manager agent and with Google’s SSO and device management solution for Windows.
“Securing and protecting the runners is the responsibility of end users, not GitHub, which is why GitHub recommends against using self-hosted runners on public repositories,” Stawinski said. “Apparently, not everyone listens to GitHub, including GitHub.”
However, if self-hosted runners are necessary, organizations should at the very least consider changing the default setting of “Require approval for first-time contributors” to “Require approval for all outside collaborators.” It’s also a good idea to make self-hosted runners ephemeral and to execute workflows from fork PRs only on GitHub-hosted runners.
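As a starting point for reviewing existing workflows against that advice, the Python sketch below flags workflow files that both respond to pull requests and run jobs on a self-hosted runner. It is a deliberately simple text check under stated assumptions: it only recognizes runners selected with the self-hosted label, so runners targeted through custom labels or runner groups would be missed, and it does not parse YAML properly. The function name and the sample workflows are hypothetical.

```python
import re

PR_TRIGGER_RE = re.compile(r'\bpull_request\b')
SELF_HOSTED_RE = re.compile(r'runs-on\s*:.*\bself-hosted\b')


def risky_for_fork_prs(workflow_text):
    """Return True if the workflow mentions a pull_request trigger and a self-hosted runner."""
    # Naively drop YAML comments so commented-out lines are not counted.
    body = "\n".join(line.split("#", 1)[0] for line in workflow_text.splitlines())
    has_pr_trigger = PR_TRIGGER_RE.search(body) is not None
    uses_self_hosted = SELF_HOSTED_RE.search(body) is not None
    return has_pr_trigger and uses_self_hosted


# Hypothetical workflows used only to exercise the check.
SELF_HOSTED_PR_WORKFLOW = '''\
on: pull_request
jobs:
  build:
    runs-on: [self-hosted, linux]
    steps:
      - uses: actions/checkout@v4
'''

HOSTED_PR_WORKFLOW = '''\
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
'''

if __name__ == "__main__":
    print(risky_for_fork_prs(SELF_HOSTED_PR_WORKFLOW))
    print(risky_for_fork_prs(HOSTED_PR_WORKFLOW))
```

A flagged workflow is not necessarily exploitable, since the approval settings and the runner configuration still matter, but it is a reasonable candidate for a closer manual review.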
This is not the first time that insecure use of GitHub Actions features has created software supply-chain security risks. Other CI/CD services and platforms have also had their own vulnerabilities and insecure default configurations. “The issues surrounding these attack paths are not unique to PyTorch,” the researchers said. “They’re not unique to ML repositories or even to GitHub. We’ve repeatedly demonstrated supply chain weaknesses by exploiting CI/CD vulnerabilities in the world’s most advanced technological organizations across several CI/CD platforms, and those are only a small subset of the greater attack surface.”