Skip to content

DPTP-4801: cmd/ci-secret-bootstrap: Log Secret change keys#5130

Open
wking wants to merge 1 commit intoopenshift:mainfrom
wking:log-secret-change-keys
Open

DPTP-4801: cmd/ci-secret-bootstrap: Log Secret change keys#5130
wking wants to merge 1 commit intoopenshift:mainfrom
wking:log-secret-change-keys

Conversation

@wking
Copy link
Copy Markdown
Member

@wking wking commented Apr 23, 2026

Today there was a Deck outage, when both of two Pods restarted their 'deck' containers simultaneously at 22:23, causing:

Application is not available

errors to https://prow.ci.openshift.org/ users. Timestamps and exit 0:

$ oc -n ci get -l component=deck -o json pods | jq -r '.items[].status.containerStatuses[] | select(.restartCount > 0) | {name, restartCount, lastState}'
{
  "name": "deck",
  "restartCount": 1,
  "lastState": {
    "terminated": {
      "containerID": "cri-o://21786efafb65fd5b67e72eb2a1a91405b182d3b20daeb549980217210bf0e22a",
      "exitCode": 0,
      "finishedAt": "2026-04-23T22:23:37Z",
      "reason": "Completed",
      "startedAt": "2026-04-23T20:04:03Z"
    }
  }
}
{
  "name": "deck",
  "restartCount": 1,
  "lastState": {
    "terminated": {
      "containerID": "cri-o://7af2f63b995729d1e8e64b1776a6a2aa3439076e9d55616182ac61b1c64fe855",
      "exitCode": 0,
      "finishedAt": "2026-04-23T22:23:58Z",
      "reason": "Completed",
      "startedAt": "2026-04-23T19:28:51Z"
    }
  }
}

The container exits were because of Kubeconfig changes:

$ oc -n ci logs -c deck --previous deck-54c8d55b65-6fxcn | tail -n16 | head -n2
{"component":"deck","file":"sigs.k8s.io/prow/cmd/deck/main.go:392","func":"main.main.func2","level":"info","msg":"Kubeconfig changed, exiting to trigger a restart","severity":"info","time":"2026-04-23T22:23:36Z"}
{"component":"deck","file":"sigs.k8s.io/prow/pkg/interrupts/interrupts.go:63","func":"sigs.k8s.io/prow/pkg/interrupts.handleInterrupt","level":"info","msg":"Received signal.","severity":"info","signal":2,"time":"2026-04-23T22:23:36Z"}

The Secret update was this ci-secret-bootstrapper:

{"cluster":"app.ci","component":"ci-secret-bootstrap","file":"/go/src/github.com/openshift/ci-tools/cmd/ci-secret-bootstrap/main.go:815","func":"main.updateSecrets","level":"debug","msg":"secret updated","name":"deck","namespace":"ci","severity":"debug","time":"2026-04-23T22:23:02Z","type":"Opaque"}

This commit adds more logs to that secret updated entry, to make it easier for us to figure out which change triggered the next bump, so we can decide if it's appropriate, and the kind of thing we'll accept a few minutes of Deck outage over, or if it's surprising churn.

Summary by CodeRabbit

  • Improvements
    • Error messages now include a key-level summary showing which secret keys were added, changed, or removed when secret data differs, making failed updates clearer.
    • Debug and success logs now include the same key-level change details so updates and troubleshooting surface precise differences rather than generic notices.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented Apr 23, 2026

@wking: This pull request references DPTP-4801 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Today there was a Deck outage, when both of two Pods restarted their 'deck' containers simultaneously at 22:23, causing:

Application is not available

errors to https://prow.ci.openshift.org/ users. Timestamps and exit 0:

$ oc -n ci get -l component=deck -o json pods | jq -r '.items[].status.containerStatuses[] | select(.restartCount > 0) | {name, restartCount, lastState}'
{
 "name": "deck",
 "restartCount": 1,
 "lastState": {
   "terminated": {
     "containerID": "cri-o://21786efafb65fd5b67e72eb2a1a91405b182d3b20daeb549980217210bf0e22a",
     "exitCode": 0,
     "finishedAt": "2026-04-23T22:23:37Z",
     "reason": "Completed",
     "startedAt": "2026-04-23T20:04:03Z"
   }
 }
}
{
 "name": "deck",
 "restartCount": 1,
 "lastState": {
   "terminated": {
     "containerID": "cri-o://7af2f63b995729d1e8e64b1776a6a2aa3439076e9d55616182ac61b1c64fe855",
     "exitCode": 0,
     "finishedAt": "2026-04-23T22:23:58Z",
     "reason": "Completed",
     "startedAt": "2026-04-23T19:28:51Z"
   }
 }
}

The container exits were because of Kubeconfig changes:

$ oc -n ci logs -c deck --previous deck-54c8d55b65-6fxcn | tail -n16 | head -n2
{"component":"deck","file":"sigs.k8s.io/prow/cmd/deck/main.go:392","func":"main.main.func2","level":"info","msg":"Kubeconfig changed, exiting to trigger a restart","severity":"info","time":"2026-04-23T22:23:36Z"}
{"component":"deck","file":"sigs.k8s.io/prow/pkg/interrupts/interrupts.go:63","func":"sigs.k8s.io/prow/pkg/interrupts.handleInterrupt","level":"info","msg":"Received signal.","severity":"info","signal":2,"time":"2026-04-23T22:23:36Z"}

The Secret update was this ci-secret-bootstrapper:

{"cluster":"app.ci","component":"ci-secret-bootstrap","file":"/go/src/github.com/openshift/ci-tools/cmd/ci-secret-bootstrap/main.go:815","func":"main.updateSecrets","level":"debug","msg":"secret updated","name":"deck","namespace":"ci","severity":"debug","time":"2026-04-23T22:23:02Z","type":"Opaque"}

This commit adds more logs to that secret updated entry, to make it easier for us to figure out which change triggered the next bump, so we can decide if it's appropriate, and the kind of thing we'll accept a few minutes of Deck outage over, or if it's surprising churn.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 23, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
Once this PR has been reviewed and has the lgtm label, please assign prucek for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Compute per-key diffs between expected and existing Kubernetes Secret Data, classifying keys as added, changed, or removed; include this key-level summary in the error returned when in-place updates are blocked without --force and in debug logs when a Secret is updated. Tests adjusted to expect the new structured error message.

Changes

Cohort / File(s) Summary
Secret Update Change Reporting
cmd/ci-secret-bootstrap/main.go
Build key-level diff (added/changed/removed) between desired and existing Secret.Data; include the formatted key summary in the non---force rejection error message and in the debug log when secretClient.Update runs.
Tests Updated
cmd/ci-secret-bootstrap/main_test.go
Update test expectations to assert the new structured key-level diff appears in the error message when force is false and data differs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 12 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Go Error Handling ❓ Inconclusive Go code requires inspection for error handling patterns including ignored errors, proper error wrapping, panic misuse, and nil checks. Provide the Go code (main.go and surrounding files) to perform comprehensive error handling verification.
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: adding Secret change key logging to ci-secret-bootstrap. It directly reflects the actual modifications in both modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Test Coverage For New Features ✅ Passed PR includes appropriate test coverage for new functionality. The TestUpdateSecrets test was updated to verify error messages include key-level diff information when secret data differs and --force is false.
Stable And Deterministic Test Names ✅ Passed The custom check for Ginkgo test names is not applicable. The codebase uses standard Go testing conventions, not Ginkgo-style tests.
Test Structure And Quality ✅ Passed The custom check targets Ginkgo test code, but the modified test file uses standard Go testing.T with table-driven patterns. The check is not applicable.
Microshift Test Compatibility ✅ Passed PR does not add any new Ginkgo e2e tests; only standard Go unit tests using testing package are present.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add any Ginkgo e2e tests. Modifications are within cmd/ci-secret-bootstrap/, a CLI tool directory with standard Go unit tests. Custom check for SNO compatibility applies only to Ginkgo e2e tests, which are absent.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only CLI tool logging, not deployment manifests or scheduling configurations.
Ote Binary Stdout Contract ✅ Passed PR modifies debug logging in main.go using logrus (stderr by default); no stdout writes introduced in process-level code.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR modifies CLI tool and standard Go unit tests using table-driven patterns with *testing.T, not Ginkgo e2e tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/ci-secret-bootstrap/main.go`:
- Around line 805-818: The key-change classification has three issues: comparing
[]byte with !=, a missing short-declaration in the second loop, and an
inconsistent format string. Replace the direct byte-slice comparison in the
first loop by reusing existingValue and calling bytes.Equal(existingValue,
value) (import "bytes" if not present) and invert the result; fix the second
loop to use short declaration if _, ok := secret.Data[k]; !ok { ... } so ok is
declared; and normalize the format string in change to "added: %v, changed: %v,
removed: %v". Use the identifiers secret.Data, existingSecret.Data,
existingValue, addedKeys/changedKeys/removedKeys, and change to locate the
edits.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e449238e-715d-408d-9477-b370aa849dc9

📥 Commits

Reviewing files that changed from the base of the PR and between 191feee and 52c3ae5.

📒 Files selected for processing (1)
  • cmd/ci-secret-bootstrap/main.go

Comment thread cmd/ci-secret-bootstrap/main.go Outdated
@wking wking force-pushed the log-secret-change-keys branch 3 times, most recently from 933dec9 to 0e91221 Compare April 24, 2026 18:20
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cmd/ci-secret-bootstrap/main.go (1)

805-818: Make key-change summaries deterministic before logging.

addedKeys, changedKeys, and removedKeys come from map iteration, so output order is unstable. Sort each slice before formatting to keep logs/errors consistent across runs.

Proposed patch
 					for k := range existingSecret.Data {
 						if _, ok := secret.Data[k]; !ok {
 							removedKeys = append(removedKeys, k)
 						}
 					}
+					sort.Strings(addedKeys)
+					sort.Strings(changedKeys)
+					sort.Strings(removedKeys)
 					change := fmt.Sprintf("added: %v, changed: %v, removed: %v", addedKeys, changedKeys, removedKeys)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/ci-secret-bootstrap/main.go` around lines 805 - 818, The key-change
summary is nondeterministic due to map iteration; before building the summary
string (where change := fmt.Sprintf(...)), sort the slices addedKeys,
changedKeys, and removedKeys (e.g., using sort.Strings) so their order is
stable; locate the block that compares secret.Data and existingSecret.Data (uses
equality.Semantic.DeepEqual) and insert sort.Strings(addedKeys),
sort.Strings(changedKeys), and sort.Strings(removedKeys) just prior to the
fmt.Sprintf call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cmd/ci-secret-bootstrap/main.go`:
- Around line 805-818: The key-change summary is nondeterministic due to map
iteration; before building the summary string (where change :=
fmt.Sprintf(...)), sort the slices addedKeys, changedKeys, and removedKeys
(e.g., using sort.Strings) so their order is stable; locate the block that
compares secret.Data and existingSecret.Data (uses equality.Semantic.DeepEqual)
and insert sort.Strings(addedKeys), sort.Strings(changedKeys), and
sort.Strings(removedKeys) just prior to the fmt.Sprintf call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 343b0b42-54bc-48be-8a9d-7152bd7610d8

📥 Commits

Reviewing files that changed from the base of the PR and between 933dec9 and 0e91221.

📒 Files selected for processing (1)
  • cmd/ci-secret-bootstrap/main.go

@wking wking force-pushed the log-secret-change-keys branch 2 times, most recently from e8a721e to a636842 Compare April 25, 2026 03:04
Today there was a Deck outage, when both of two Pods restarted their
'deck' containers simultaneously at 22:23, causing:

  Application is not available

errors to https://prow.ci.openshift.org/ users.  Timestamps and exit 0:

  $ oc -n ci get -l component=deck -o json pods | jq -r '.items[].status.containerStatuses[] | select(.restartCount > 0) | {name, restartCount, lastState}'
  {
    "name": "deck",
    "restartCount": 1,
    "lastState": {
      "terminated": {
        "containerID": "cri-o://21786efafb65fd5b67e72eb2a1a91405b182d3b20daeb549980217210bf0e22a",
        "exitCode": 0,
        "finishedAt": "2026-04-23T22:23:37Z",
        "reason": "Completed",
        "startedAt": "2026-04-23T20:04:03Z"
      }
    }
  }
  {
    "name": "deck",
    "restartCount": 1,
    "lastState": {
      "terminated": {
        "containerID": "cri-o://7af2f63b995729d1e8e64b1776a6a2aa3439076e9d55616182ac61b1c64fe855",
        "exitCode": 0,
        "finishedAt": "2026-04-23T22:23:58Z",
        "reason": "Completed",
        "startedAt": "2026-04-23T19:28:51Z"
      }
    }
  }

The container exits were because of Kubeconfig changes:

  $ oc -n ci logs -c deck --previous deck-54c8d55b65-6fxcn | tail -n16 | head -n2
  {"component":"deck","file":"sigs.k8s.io/prow/cmd/deck/main.go:392","func":"main.main.func2","level":"info","msg":"Kubeconfig changed, exiting to trigger a restart","severity":"info","time":"2026-04-23T22:23:36Z"}
  {"component":"deck","file":"sigs.k8s.io/prow/pkg/interrupts/interrupts.go:63","func":"sigs.k8s.io/prow/pkg/interrupts.handleInterrupt","level":"info","msg":"Received signal.","severity":"info","signal":2,"time":"2026-04-23T22:23:36Z"}

The Secret update was this ci-secret-bootstrapper in [1]:

  {"cluster":"app.ci","component":"ci-secret-bootstrap","file":"/go/src/github.com/openshift/ci-tools/cmd/ci-secret-bootstrap/main.go:815","func":"main.updateSecrets","level":"debug","msg":"secret updated","name":"deck","namespace":"ci","severity":"debug","time":"2026-04-23T22:23:02Z","type":"Opaque"}

This commit adds more logs to that "secret updated" entry, to make it
easier for us to figure out which change triggered the next bump, so
we can decide if it's appropriate, and the kind of thing we'll accept
a few minutes of Deck outage over, or if it's surprising churn.

[1]: https://deck-internal-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/origin-ci-private/logs/periodic-ci-secret-bootstrap/2047432829557542912
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 25, 2026

@wking: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/breaking-changes a636842 link false /test breaking-changes

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants