KAFKA-14995: Automate asf.yaml collaborators refresh

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Resolved
    • None
    • Fix Version/s: 4.0.0
    • None

    Description

      We have added a policy for using the .asf.yaml GitHub Collaborators list: https://github.com/apache/kafka-site/pull/510

      The policy states that we set this list to be the top 20 commit authors who are not Kafka committers. Unfortunately, it's not trivial to compute this list.

      Here is the process I followed to generate the list the first time (note that I generated this list on 2023-04-28, so the lookback is one year):

      1. List authors by commit volume in the last year:

      $ git shortlog --email --numbered --summary --since=2022-04-28 | vim -

      2. Manually filter out the authors who are committers, based on https://kafka.apache.org/committers

      3. Truncate the list to 20 authors.

      4. For each author:

      4a. Find a commit in the `git log` output that they authored:

      commit 440bed2391338dc10fe4d36ab17dc104b61b85e8
      Author: hudeqi <[email protected]>
      Date:   Fri May 12 14:03:17 2023 +0800
      ...

      4b. Look up that commit on GitHub: https://github.com/apache/kafka/commit/440bed2391338dc10fe4d36ab17dc104b61b85e8

      4c. Copy their GitHub username into .asf.yaml under both the PR whitelist and the Collaborators lists.

      5. Send a PR to update .asf.yaml: https://github.com/apache/kafka/pull/13713
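
Steps 1 to 3 could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes the committers' Git emails are already available as a list, which the Committers page does not currently provide. Note that an author who commits under several emails shows up as several rows in the shortlog output, so a real script would probably need to merge those first.

```python
import re
import subprocess

# Matches one line of `git shortlog --email --numbered --summary` output,
# e.g. "    42\tJane Doe <jane@example.com>".
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(text):
    """Return (commit_count, name, email) tuples in the order git printed them."""
    authors = []
    for line in text.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:
            count, name, email = match.groups()
            authors.append((int(count), name, email.lower()))
    return authors


def top_non_committers(authors, committer_emails, limit=20):
    """Drop known committers, then keep the `limit` most active authors."""
    excluded = {email.lower() for email in committer_emails}
    remaining = [author for author in authors if author[2] not in excluded]
    # The sort is stable, so authors with equal counts keep git's order.
    remaining.sort(key=lambda author: author[0], reverse=True)
    return remaining[:limit]


def shortlog_since(since, repo_dir="."):
    """Run the shortlog command from step 1 and return its raw output."""
    # An explicit revision (HEAD) is passed because git shortlog reads from
    # stdin when no revision is given and stdin is not a terminal.
    result = subprocess.run(
        ["git", "shortlog", "--email", "--numbered", "--summary",
         f"--since={since}", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `top_non_committers(parse_shortlog(shortlog_since("2022-04-28")), committer_emails)` would yield the candidate list for step 4, where `committer_emails` is whatever committer list the script is given.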

       

      This is pretty time-consuming and very scriptable. Two complications:

      • To do the filtering, we need to map from Git log "Author" to documented Kafka "Committer" that we can use to perform the filter. Suggestion: just update the structure of the "Committers" page to include their Git "Author" name and email (https://github.com/apache/kafka-site/blob/asf-site/committers.html)
      • To generate the YAML lists, we need to map from Git log "Author" to GitHub username. There's presumably some way to do this in the GitHub REST API (the mapping is based on the email, IIUC), or we could also just update the Committers page to also document each committer's GitHub username.
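
For the second complication, one option is to automate steps 4a and 4b directly: the GitHub REST API's "get a commit" endpoint returns a top-level `author` object with a `login` field when GitHub can link the commit email to an account, and `null` otherwise. A minimal sketch, with hypothetical function names and an optional token parameter that is an assumption here (unauthenticated requests are rate-limited):

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def login_from_commit_payload(payload):
    """Extract the GitHub login from a 'get a commit' response, or None."""
    # "author" is null when GitHub cannot link the commit email to an account.
    author = payload.get("author")
    if not author:
        return None
    return author.get("login")


def fetch_commit(sha, repo="apache/kafka", token=None):
    """Fetch the JSON description of one commit from the GitHub REST API."""
    request = urllib.request.Request(
        f"{API_ROOT}/repos/{repo}/commits/{sha}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Authors for whom `login_from_commit_payload(fetch_commit(sha))` returns `None` would still need a manual lookup.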

       

      Ideally, we would write this script (to be stored in the Apache Kafka repo) and create a GitHub Action to run it every three months.
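
The last stage of such a script would emit the two lists from step 4c. The sketch below assumes the lists live under `github: collaborators` and `jenkins: github_whitelist` in .asf.yaml; those key names and the function names are assumptions and should be checked against the actual file before use.

```python
def render_yaml_list(key, usernames, indent=2):
    """Render `key:` followed by one `- username` item per line."""
    pad = " " * indent
    lines = [f"{pad}{key}:"]
    lines.extend(f"{pad}  - {name}" for name in usernames)
    return "\n".join(lines)


def render_asf_yaml_fragments(usernames):
    """Build the collaborators and PR whitelist fragments from one username list."""
    # De-duplicate and sort case-insensitively so refresh PRs produce small diffs.
    ordered = sorted(set(usernames), key=lambda name: (name.lower(), name))
    return {
        # Assumed key names; verify against the real .asf.yaml.
        "github": "github:\n" + render_yaml_list("collaborators", ordered),
        "jenkins": "jenkins:\n" + render_yaml_list("github_whitelist", ordered),
    }
```

Using the same sorted list for both fragments keeps the PR whitelist and the Collaborators list in sync, which is what step 4c does by hand.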

       

          People

            Assignee: joaopedrofonseca João Pedro Fonseca
            Reporter: vvcephei John Roesler
            Votes: 0
            Watchers: 8
