Description
We have added a policy to use the .asf.yaml GitHub Collaborators feature: https://github.com/apache/kafka-site/pull/510
The policy states that we set this list to be the top 20 commit authors who are not Kafka committers. Unfortunately, it's not trivial to compute this list.
Here is the process I followed to generate the list the first time (note that I generated this list on 2023-04-28, so the lookback is one year):
1. List authors by commit volume in the last year:
$ git shortlog --email --numbered --summary --since=2022-04-28 | vim -
2. Manually filter out the authors who are committers, based on https://kafka.apache.org/committers
3. Truncate the list to 20 authors.
4. For each author:
4a. Find a commit in the `git log` that they authored:
commit 440bed2391338dc10fe4d36ab17dc104b61b85e8
Author: hudeqi <[email protected]>
Date: Fri May 12 14:03:17 2023 +0800
...
4b. Look up that commit on GitHub: https://github.com/apache/kafka/commit/440bed2391338dc10fe4d36ab17dc104b61b85e8
4c. Copy their GitHub username into .asf.yaml under both the PR whitelist and the Collaborators lists.
5. Send a PR to update .asf.yaml: https://github.com/apache/kafka/pull/13713
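As a starting point for scripting steps 1 to 3, here is a minimal Python sketch. It assumes the committers are available as a set of Git author emails, which is not the case today (see the first complication below). The names `Author`, `parse_shortlog`, `top_non_committers`, and `run_shortlog` are hypothetical helpers, not existing tooling.

```python
import re
import subprocess
from typing import Iterable, List, NamedTuple


class Author(NamedTuple):
    commits: int
    name: str
    email: str


# Matches one line of `git shortlog --email --numbered --summary`,
# e.g. "    12\tAlice Example <alice@example.com>".
_SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(text: str) -> List[Author]:
    """Parse shortlog summary output into Author records (emails lowercased)."""
    authors = []
    for line in text.splitlines():
        match = _SHORTLOG_LINE.match(line)
        if match:
            count, name, email = match.groups()
            authors.append(Author(int(count), name, email.lower()))
    return authors


def top_non_committers(
    authors: Iterable[Author],
    committer_emails: Iterable[str],
    limit: int = 20,
) -> List[Author]:
    """Drop committers (matched by email) and keep the top `limit` authors."""
    committers = {email.lower() for email in committer_emails}
    ranked = sorted(authors, key=lambda a: a.commits, reverse=True)
    return [a for a in ranked if a.email not in committers][:limit]


def run_shortlog(since: str, repo_dir: str = ".") -> str:
    """Run the shortlog command from step 1 and return its output.

    HEAD is passed explicitly because git shortlog reads the log from
    stdin when no revision is given and stdin is not a terminal.
    """
    result = subprocess.run(
        ["git", "shortlog", "--email", "--numbered", "--summary",
         f"--since={since}", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `top_non_committers(parse_shortlog(run_shortlog("2022-04-28")), committer_emails)` would give the candidate list, where `committer_emails` still has to come from somewhere. Matching on email alone will miss committers who commit under more than one address, so the result should still be reviewed by hand.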
This process is time-consuming but very scriptable. There are two complications:
- To do the filtering, we need a mapping from each Git log "Author" to the documented Kafka "Committer". Suggestion: just update the structure of the "Committers" page to include each committer's Git "Author" name and email (https://github.com/apache/kafka-site/blob/asf-site/committers.html)
- To generate the YAML lists, we need to map from Git log "Author" to GitHub username. There's presumably some way to do this with the GitHub REST API (the mapping is based on the email, IIUC), or we could also just update the Committers page to also document each committer's GitHub username.
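For the second complication, one option that mirrors steps 4a to 4c is to ask the GitHub REST API for a commit and read the `author.login` field from the response, which GitHub leaves null when the commit email is not linked to an account. The sketch below assumes unauthenticated access is sufficient for about 20 lookups. The helper names and the `collaborators` key used in the usage note are illustrative, so check the actual key names against the existing .asf.yaml.

```python
import json
import urllib.request
from typing import Iterable, Optional

# GitHub REST API endpoint for a single commit.
API_URL = "https://api.github.com/repos/{repo}/commits/{sha}"


def login_from_commit_json(commit: dict) -> Optional[str]:
    """Return the GitHub login of the commit author, or None if unlinked."""
    author = commit.get("author")
    return author.get("login") if author else None


def fetch_login(sha: str, repo: str = "apache/kafka") -> Optional[str]:
    """Look up one commit (step 4b) and return its author's GitHub login."""
    request = urllib.request.Request(
        API_URL.format(repo=repo, sha=sha),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return login_from_commit_json(json.load(response))


def render_yaml_list(key: str, logins: Iterable[str], indent: int = 2) -> str:
    """Render a de-duplicated, case-insensitively sorted YAML list under `key`."""
    pad = " " * indent
    lines = [f"{pad}{key}:"]
    lines += [f"{pad}  - {login}" for login in sorted(set(logins), key=str.lower)]
    return "\n".join(lines)
```

Calling `fetch_login("440bed2391338dc10fe4d36ab17dc104b61b85e8")` performs the same lookup as step 4b, and `render_yaml_list("collaborators", logins)` produces a block that can be pasted into .asf.yaml for step 4c. Authors for whom the lookup returns `None` would still need to be resolved manually.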
Ideally, we would write this script (to be stored in the Apache Kafka repo) and create a GitHub Action to run it every three months.