Reflections on LLVM's switch to GitHub pull requests
2023-9-9 15:0:0 Author: maskray.me(查看原文) 阅读量:13 收藏

Since 2012, LLVM has relied on its self-hosted Phabricator instance (on Google Cloud Platform) for code review, but now it's making a transition to GitHub pull requests. In this post, I'll share my perspective on this switch, highlighting GitHub offers significant benefits in some areas while having major drawbacks in the review process.

I may update this article as the process stabilizes further.

Transition to GitHub pull requests

The move to GitHub pull requests has been a topic of discussion over the past few years. Several lengthy threads on the subject have emerged:

This transition could very well be the most contentious infrastructure change in LLVM's history. If you have the patience to delve into the discussions within the mentioned threads, you'll come across numerous remarks critiquing GitHub pull requests as being subpar. I'd like to express my gratitude to Joerg Sonnenberger for sharing a post by Gregory Szorc titled Problems with Pull Requests and How to Fix Them, which made a great analysis comparing several code review tools.

Nevertheless, a decision has been made, and the targeted transition date was set for September 1, 2023. The negotiation during the decision-making process could have been handled more strategically to mitigate some confusion.

On September 1, 2023 (or 2nd on some parts of the world), the pull request lockdown was completely removed.

Accessibility

In general, I believe that the majority of contributors consider GitHub to offer better accessibility. However, I have heard quite a few opinions suggesting that GitHub's code review capability is significantly worse than that of Phabricator. I will delve into this topic further later in this post.

Having contributed patches to more than 200 projects, many of which are one-off and even trivial, I genuinely appreciate it when a project uses GitHub pull requests. This is because I am already familiar with the system. On the other hand, if a project relies on a self-hosted code review website, I might find it less convenient as I'm not particularly keen on registering a username on a website I may never visit again. Even worse, I might need to invest time in getting acquainted with the system if I haven't encountered similar instances before.

The same argument applies to LLVM's self-hosted Phabricator instance. Many contributors have not used Phabricator before and would consider both the website and the command-line tool (https://github.com/phacility/arcanist), to be challenging to use. In general, I believe that GitHub is more contributor-friendly but perhaps not as reviewer-friendly.

Automation

GitHub provides GitHub Apps and GitHub Actions to extend its functionality. With these, we can automate pull request labelling, testing, code analysis, code coverage, and potentially even a merge queue in the future. Some fantastic tooling is available for free and widely used by other open source projects, making it easy to emulate how other projects have set up automation.

Phabricator can also handle automation, but there are far fewer resources available for it. LLVM's self-hosted Phabricator instance, for instance, relies on https://github.com/google/llvm-premerge-checks and Buildkite.

Patch subscription

The llvm-project repository is vast. With a code frequency of 100+ commits every day, it's practically impossible for anyone to monitor every new commit. Nonetheless, many people wish to stay informed about changes to specific components, making patch subscription essential.

One way to achieve this is through mailing lists, such as llvm-commits. This list contains emails about new pull requests, edits, GitHub actions, labelling, resolved issues, and more, making it quite noisy.

The other method is to utilize the code review tool, formerly Phabricator and now GitHub. With Phabricator, users can set up fairly complex subscription rules known as Herald. When a patch title, description, affected files, or the acting user matches certain criteria, you can take actions like adding yourself as a reviewer/subscriber or sending a one-off email.

GitHub, however, is less flexible in this regard. Individual users can choose to watch all pull requests, but they can't do so selectively. Interestingly, users can watch issues with a specific label but not pull requests with a specific label.

To enable component-based subscription, the llvm organization on GitHub has created multiple pr-subscribers-* teams, which users can freely join them. ( Note: A maintainer is supposed to accept every join request. Unfortunately, GitHub only displays the "pending request" information on the https://github.com/orgs/llvm/teams/pr-subscribers-* page and not on any other page. It's not reasonable to expect a maintainer to routinely check the https://github.com/orgs/llvm/teams/pr-subscribers-* pages for pending join requests. So if they miss the email notification that says would like to join "LLVM", the request may remain pending indefinitely. )

Then we use label-based subscription. A GitHub action is set up to label every pull request, and these labels are used to notify pr-subscribers-* teams. For example, a pull request affecting clang/lib/Driver/XX will receive the labels clang and clang:driver, and the pr-subscribers-clang and pr-subscribers-clang:driver teams will be notified.

.github/CODEOWNERS

Previously, we added pr-subscribers-* teams to .github/CODEOWNERS. Due to GitHub's CODEOWNERS mechanism, the pr-subscribers-clang team would be added as a reviewer when a pull request affecting clang/xx was created. pr-subscribers-clang members would receive an email notification about the pull request with a full diff.

However, a complication arose when a member of the pr-subscribers-* team approved a change. It resulted in a message saying $user approved these changes on behalf of llvm/pr-subscribers-xx, which could be misleading if the user did not wish to assume such authority. In addition, the team was automatically removoed as a team reviewer, adding to the confusion. This use case wasn't in line with GitHub's intended functionality, and there was a risk that GitHub's future changes might disrupt our workflow.

Email filtering

Filtering is another crucial aspect of managing notifications. GitHub supports numerous notification reasons.

I am still in the process to make a curated list of Gmail filters. Here are some filters I currently use:

Code review

Patch evolution

In Phabricator, the body a differential (Phabricator's term for a patch) contains is a patch file. The patch file is based on a specific commit, but Phabricator is not required to know the base commit. A stable identifier, Differential Revision: https://reviews.llvm.org/Dxxxxx, in the commit message connects a patch file to a differential. When you amend a patch, Phabricator recognizes that the differential has evolved from patch X to patch Y. The user interface allows for comparisons between any two revisions associated with a differential. Additionally, review comments are confidently associated with the source line.

On the other hand, GitHub structures the concept of pull requests around branches and enforces a branch-centric workflow. A pull request centers on the difference (commits) between the base branch and the feature branch. GitHub does not employ a stable identifier for commit tracking. If commits are rebased, reordered, or combined, GitHub can easily become confused.

When you force-push a branch after a rebase, the user interface displays a line such as "force-pushed the BB branch from X to Y". Clicking the "compare" button in GitHub presents X..Y, which includes unrelated commits. Ideally, GitHub would show the difference between the two patch files, as Phabricator does, but it only displays the difference between the two head commits. These unrelated in-between commits might be acceptable for projects with lower commit frequency but can be challenging for a project with a code frequency of 100+ commits every day.

The fidelity of preserving inline comments after a force push has always been a weakness. In the past, there was a notorious "lost inline comment" problem. Nowadays, the situation has improved, but some users still report that inline comments may occasionally become misplaced.

Due to the difficulties in comparing revisions and the lack of confidence in preserving inline comments, some recommendations suggest adopting less flexible and less preferred workflows, which involve only appending new commits and discouraging rebases. This approach sometimes results in a cluttered commit history, with commit messages like "fix typo," "add test," and "fix ci."

In a large repository, avoiding rebases may not be realistic due to other commits frequently modifying nearly identical lines. When working with both the latest main branch and the pull request branch, switching between branches results in numerous rebuilds.

Commit message

GitHub's repository setting allows three options for pull requests: "Allow merge commits", "Allow squash merging", and "Allow rebase merging".

The default effect is quite impactful.

  • Many projects utilize merge commits when a linear history is preferred, even for just a single commit.
  • When "Squash and merge" is selected, the default commit message becomes a concatenation of all commits, often resulting in a message filled with "fix typo," "add test," and "fix ci".

In 2022, GitHub finally introduced an option called "Pull request title and description" for the "Allow squash merging" option. This new option mitigates some problems.

Patch series

I am not well-versed in reviewing patch series on GitHub, but this is a widely acknowledged pain point. Numerous projects are exploring ways to alleviate this issue.

Miscellaneous

Pros

  • Fetching a pull request is more straightforward. git fetch origin pull/xx/head:local_branch. Phabricator allows the less-known method: curl -L 'https://reviews.llvm.org/Dxxxxx?download=1' | patch -p1
  • @mention applies to every user on GitHub, which is convenient when you need to seek for a user's opinions. You cannot expect every user to have an account on a self-hosted instance.
  • gh is more powerful than arcanist.

Cons

  • Fetching a pull request at an old revision can be challenging.
  • The narrow display can be somewhat inconvenient.
  • The loss of side-by-side diff.
  • Viewing diff and conversations requires switching between tabs.
  • Collapsed comments are challenging to locate and often require multiple clicks, makeing it difficult for reviewers to confirm whether a comment has been resolved.
  • One cannot comment on a source line a few lines out of the affected lines.
  • The inability to search for lines in nearby context when they are not displayed.

I've noticed that my review productivity has decreased, a sentiment shared by many others. It's disheartening that alternative solutions haven't been thoroughly considered. However, we must move forward and adapt to the new workflow while striving to mitigate the loss in productivity.

I hope that future iterations of GitHub will incorporate some ideas from Pull Request feature requests for GitHub #56635.

I've voiced numerous concerns regarding GitHub pull requests, and for that, I apologize. It's essential to acknowledge that GitHub contributes significantly to open source projects in many positive ways. My intention in sharing these concerns is to express a genuine hope that GitHub pull requests can be enhanced to better support large projects.

I would also like to express my gratitude to Phorge, a community-driven fork of Phabricator, for their dedication and contributions, even as LLVM decided to migrate to the free but proprietary solution provided by GitHub. Phorge's commitment to providing alternatives and nurturing an open-source community for organizations that favor self-hosted solutions is truly commendable.

Future of reviews.llvm.org

On September 5, 2023, I added a red banner to reviews.llvm.org to discourage new patches.

Transitioning existing differentials to GitHub pull requests could potentially cause disruption. Resurrecting old patches is a practice that people regularly engage in. As per the current schedule, new differentials will be disallowed on October 1, 2023.

It is anticipated that at some point next year reviews.llvm.org will become a read-only website. To preserve /Dxxxxx pages (referenced by many commits), we can utilize phab-archive from Mercurial.

As activities on Phabricator wind down, maintenance should become more lightweight.

In the past two weeks, there have been different IP addresses crawling /source/llvm-github/ pages. It very like a botnet as adjacent files are visited by IP addresses from very different autonomous systems. Such a visit will cause Phabricator to spawn a process like git log --skip=0 -n 30 --pretty=format:%H:%P 988a16af929ece9453622ea256911cdfdf079d47 -- llvm/lib/Demangle/ItaniumDemangle.cpp that takes a few seconds (llvm-project is huge). I have redirected some pages to https://github.com/llvm/llvm-project/.

On September 8, 2023, I resized the database disk to 850GB to fix a full disk issue. Hopefully, we won't need to resize the disk again!


文章来源: https://maskray.me/blog/2023-09-09-reflections-on-llvm-switch-to-github-pull-requests
如有侵权请联系:admin#unsafe.sh