I’m a big fan of the “try weird stuff and see what happens” approach to security research. Modern software has a huge number of bugs, and engineering teams often have to prioritize which bugs to fix, based in part on the number of users affected by each bug. The result is that glaringly obvious bugs (say, a website being down) tend to get fixed very quickly. On the other hand, a bug that only occurs in exceedingly unusual circumstances, and has no obvious security impact, can stick around for a long time. Occasionally, new code piles up around the bug, making faulty assumptions about how the buggy system works. Whenever I find one of those bugs, it can be fun to trigger the buggy code in a variety of different ways and see if anything else breaks. Provided that my testing isn’t causing problems for anyone else, this is satisfying in the same way that watching structural demolition videos is satisfying – you can never tell which parts of a structure depend on other parts,1 until you break one small piece and the whole architecture topples.

Messing with pull request inputs

I recently found a bug like that in GitHub’s pull requests.

Every pull request is supposed to have a “base branch” (also known as a “base ref”), which specifies the branch that a set of proposed changes is being compared against. Often, this base branch is the “main” branch of a repository.

As the creator of a pull request, we can decide which base branch it should have. For instance, instead of setting the base branch to the “main” branch, we could set it to another branch that’s still a work in progress, or a release branch that gets updated less frequently.

But this is a user input supplied as a string, so technically we don’t have to provide a branch name. What would happen if we created a pull request using GitHub’s API with the “base branch” name set to something else, such as a commit hash?

mutation {
  createPullRequest(input: {
    repositoryId: "..."
    title: "Update README.md"
    headRefName: "not-an-aardvark:patch-1"

    # ??? this is a commit hash, not a ref
    baseRefName: "fd9cfdc590e789ae559b5a7878e7e6b929a249d9"
  }) {
    __typename
  }
}

When I tried this on a test repository, GitHub politely returned an error:

{
  "errors": [{ "message": "Base ref must be a branch" }]
}

Well, that seems reasonable. Not sure what I expected.

However, I had just found another issue where GitHub applied stricter validation for creating a pull request than for editing one.2 So on a hunch, I tried creating a pull request with a valid base branch, and then editing the base branch name to a commit hash afterwards. Surprisingly, this worked, and I ended up with a pull request where the base branch is a commit.

Nonsensical pull requests

It’s worth noting at this point that a “pull request where the base branch is a commit” doesn’t really make sense as a concept. In git terminology, branches and commits are different types of things, and they’re not interchangeable. Usually, a pull request can be “merged” by updating the base branch to point to a new commit that includes the changes in the pull request. But it’s not clear what it would even mean to “merge” a pull request where the base branch is itself a commit. Commits, as identified by a particular hash, are immutable – you can’t just merge other commits into them.

However, software sometimes does stuff regardless of whether it makes sense, so here we have a pull request where the base branch is a commit.

A screenshot of a GitHub pull request. The title is "Update README.md". The pull request is in an "open" state, and at the top it says, "not-an-aardvark wants to merge 1 commit into an-organization:fd9cfdc590e789ae559b5a7878e7e6b929a249d9 from not-an-aardvark:patch-1". The screenshot is annotated with a red arrow pointing to the middle of that sentence. Further down, the screenshot indicates that the pull request was created 6 days ago and has no description provided. There is one commit, "Update README.md". Below the commit, an event is displayed: "not-an-aardvark changed the base branch from 'main' to 'fd9cfdc590e789ae559b5a7878e7e6b929a249d9'". At the bottom, there is a "Merge pull request" button, which is disabled. The message above it says, "This branch has conflicts that must be resolved".

GitHub’s UI makes a valiant attempt to display this pull request, given that it’s nonsense. The only odd thing in this screenshot is that the UI says there are merge conflicts. There aren’t actually any merge conflicts in this pull request, by the typical definition. I suspect something on the backend tried to simulate a merge and failed (since merging this wouldn’t make any sense), and then the UI assumed that the failure was due to merge conflicts.

Separately, I also tried setting the base ref to a commit shorthash (rather than a full 40-character hash), with similar results in the UI. When I then pushed another commit with the same shorthash, the link to the base branch in the UI started returning a 404 (since it was linking to an ambiguous shorthash), but everything else seemed to keep working.

So that’s all nice, but it didn’t seem particularly useful or security-relevant. While this was technically only possible as a result of a bug, it was very much a “Doctor, it hurts when I do this” situation – the buggy behavior was entirely self-inflicted, and it wasn’t clear why anyone would want to self-inflict it. Even if I created the pull request on someone else’s repository, they could just, say, close the pull request. I spent a bit of time trying to figure out whether the bug was actually useful for anything.

GitHub Actions crash course

Let’s take a brief detour to talk about GitHub Actions, an automation platform built into GitHub. GitHub Actions is effectively a hosted bot service – it allows a project to run arbitrary code in a sandbox in response to events that occur on GitHub. For example, some common uses of GitHub Actions include running a project’s tests every time someone pushes a commit, deploying recent changes to an external server, and adding labels to newly-created GitHub issues. (I used to maintain large open-source projects, and I can attest that automation like this is very useful.)

GitHub Actions has an interesting permission model. To summarize:

  • The configuration for GitHub Actions consists of a set of workflows, stored in the .github/workflows/ folder in any given repository. Each workflow can listen for specific events in that repository, and execute code in response to those events.
  • For workflows that need sensitive values (e.g. credentials for deploying to a server), a maintainer can also add custom secrets via the repository settings on GitHub. A workflow’s configuration can then specify that the secret should be injected into an environment variable. Secrets are only supposed to be accessible to people with write access to the repository, and they’re not copied over when a repository is forked.
  • By default, each repository always has a secret called GITHUB_TOKEN, which is a GitHub API token with write access to that repository.3

How does GitHub Actions handle pull requests?

Pull requests from forks are a special case for GitHub Actions.

It’s often useful to trigger automation in response to a pull request. For example, a maintainer might want to automatically run a project’s tests so that a contributor can see whether their change breaks anything. A maintainer might also want to automatically add comments and labels to a new pull request.

However, it’s important that the author of a pull request can’t access the repository’s secrets (e.g. by updating a workflow file to print out the secrets instead of running tests). To address this issue, GitHub provides two different ways to trigger Actions workflows from pull requests:

  • The pull_request event simulates a merge of the pull request, and triggers Actions workflows based on the configuration and code at the merge commit. This is intended for e.g. running tests, and verifying that the code would still work if the pull request was merged. However, since the code in the pull request is potentially malicious, workflows triggered by the pull_request event are run without access to the repository’s secrets.
  • The pull_request_target event triggers Actions workflows based on the configuration and code at the base branch of the pull request. Since the base branch is part of the base repository itself and not part of a fork, workflows triggered by pull_request_target are trusted and run with access to secrets. This is intended for e.g. adding comments and labels to new pull requests (which requires a GitHub API token).

Nonsensical pull requests vs. GitHub Actions

So in case you didn’t spot it, the nonsensical pull requests that we created earlier weren’t entirely useless – they also had a side-effect of breaking the GitHub Actions permission model.

To see why, look at the last part of the previous section:

Since the base branch is part of the base repository itself and not part of a fork, workflows triggered by pull_request_target are trusted and run with access to secrets.

Only people with write access to the base repository can push to branches in it. So this seems fine, provided that the “base branch” of a pull request is actually a branch.

But we just created a pull request where the “base branch” is a commit hash, not a branch. And anyone can create a new commit hash in the base repository, since GitHub shares commits between forks.

As a result, an attacker could:

  1. Fork any public repository that uses GitHub Actions4
  2. Create a pull request to the repository (updating the readme, say)
  3. Create a malicious Actions workflow with the pull_request_target event (example), and separately commit it to the fork
  4. Update the base branch of the pull request from step (2) to the commit hash from step (3)

And then the malicious Actions workflow would run, immediately, with the secrets from the victim repository. At this point, the attacker would have write access to the repository (due to the GITHUB_TOKEN secret). They would also have access to any services that the repository had integrated with via GitHub Actions secrets.

Many open-source projects publish directly to package managers from GitHub, using a credential saved as a GitHub Actions secret. As a result, it likely would have been possible for an attacker to pull off a much larger-scale version of the eslint-scope supply chain attack, by exploiting this issue on a bunch of open-source projects at once, stealing all of their package manager credentials, and publishing a bunch of package updates containing malware.5 6

Timeline

As always, the GitHub security team was very responsive and patched the issue quickly.

  • 2021-02-04 08:37:44 UTC I reported this issue to GitHub’s bug bounty program via HackerOne
  • 2021-02-04 14:34:56 UTC Initial reply from GitHub, stating that they were looking into the issue
  • 2021-02-04 16:30:36 UTC Issue confirmed by GitHub security team
  • 2021-02-04 22:14:32 UTC Issue partially patched on github.com, GitHub replied on HackerOne to double-check that the patch resolved the issue

    At this point, an Actions build would no longer trigger for a pull request if the base ref was a 40-character commit hash. However, I found that a build would still trigger if the base ref was set to a shorthash, or to another symbolic ref in the base repo that the attacker can modify (e.g. refs/pull/5/head, which always points to the head branch of an existing pull request). So the issue was still exploitable at this point.

  • 2021-02-04 22:36:05 UTC I replied to GitHub stating roughly the above, and suggested that it might be better to prevent setting the base ref of a pull request to a non-branch at all, in addition to trying to detect “bad” base refs from Actions after the fact.
  • (sometime a few hours later) Issue fully patched on github.com, apparently implementing the suggestion from my comment. (This fix might have already been in progress as a secondary mitigation, independently of the fact that I suggested it in a comment.)
  • 2021-03-02, around 20:45 UTC GitHub Enterprise Server 3.0.1 was released with a fix for this issue, as CVE-2021-22862.
  • 2021-03-03 14:45:24 UTC GitHub awarded a $25000 bug bounty

  1. Unless you’re a civil engineer in charge of the operation, I guess. Messing around with bugs is really only satisfying when it’s someone else’s responsibility to deal with the problem. For bugs in my own projects, it’s more like a source of anxiety. I hope everyone at GitHub who has to deal with these bug reports is doing well. 

  2. I’m not sure why there was a pattern of issues like this. My guess is that when the validity checks were added, it was more convenient to only apply them to new pull requests (rather than doing complex migrations on existing pull requests), and validating-only-on-creation was supposed to be a way of doing that. The downside of this approach is that all new code still needs to account for the “invalid” case due the presence of old data, which can undermine some of the benefit of adding validation in the first place. 

  3. The interaction of GITHUB_TOKEN with forks is actually a bit more complicated, but those extra details aren’t directly relevant here. 

  4. The restrictions on this attack were that (a) the attacker must be able to create a fork of the repository, and (b) the victim repository must have used GitHub Actions in some fashion prior to the attack. (My understanding is that if the repository had never used GitHub Actions, the malicious workflow wouldn’t trigger, presumably as a result of some optimization. Since GitHub Actions is by far the most popular CI for GitHub projects, this only slightly narrows down the scope of the issue.) 

  5. To be clear, I have no reason to believe that this issue was exploited in the wild. The attack would be publicly visible in a pull request’s event history as a base-branch change, although it could be mistaken for a benign action if the maintainer wasn’t paying close attention. 

  6. Incidentally, I was on the ESLint team at the time that the eslint-scope attack happened. In the aftermath of that attack, I upgraded our release infrastructure to use npm two-factor authentication at runtime, with a shared secret distributed among team members via Keybase. While this has some caveats and doesn’t solve all supply-chain threat models, it largely ensures that a release can only happen when a maintainer is present and trying to publish a release. The added overhead for our team was quite small, and I would recommend the approach to other open-source projects if you’re looking to reduce the risk of an attack like this.

    The main implementation challenge was that our release infrastructure needed to be able to accept a short-term credential, as input from a human, in the middle of a build. (Prior to publishing, our release pipeline would compile some artifacts and then run all of a project’s tests over the course of several minutes, so any TOTP supplied at the beginning of the release process would have expired by the time it was time to publish the release.) I’m unsure whether it’s possible to do this easily with GitHub Actions.