jhathaway (Jesse Hathaway)
User

Projects

Infrastructure-Foundations
Group
Trusted-Contributors
Group

User Details

User Since: Nov 22 2021, 10:00 PM (159 w, 2 d)
Availability: Available
LDAP User: JHathaway
MediaWiki User: JHathaway (WMF)

Recent Activity

Yesterday

jhathaway renamed T345067: reimage puppetmasters to puppetservers from reimage puppetmasteres to puppetserveres to reimage puppetmasters to puppetservers.

Wed, Dec 11, 6:25 PM · Puppet-Infrastructure, Infrastructure-Foundations, SRE

jhathaway closed T381538: Backport facter to bullseye, a subtask of T330490: Next steps for Puppet 7, as Resolved.

Wed, Dec 11, 5:34 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

jhathaway closed T381538: Backport facter to bullseye as Resolved.

Wed, Dec 11, 5:34 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

jhathaway changed the status of T381538: Backport facter to bullseye, a subtask of T330490: Next steps for Puppet 7, from Open to In Progress.

Wed, Dec 11, 5:33 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

jhathaway changed the status of T381538: Backport facter to bullseye from Open to In Progress.

Wed, Dec 11, 5:33 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

Tue, Dec 10

jhathaway created T381927: Log tls cipher information.

Tue, Dec 10, 9:34 PM · Infrastructure-Foundations, Mail, SRE

jhathaway created T381919: Supermicro: unable to set boot order after using Redfish to boot once.

Tue, Dec 10, 8:04 PM · Infrastructure-Foundations

Mon, Dec 9

jhathaway triaged T381639: Facter 4 upgrade removed 'mountpoints' fact, breaking cinderutils::ensure as High priority.

Mon, Dec 9, 3:35 PM · Infrastructure-Foundations, Puppet-Core, Cloud-VPS, cloud-services-team

Fri, Dec 6

jhathaway added a comment to T381639: Facter 4 upgrade removed 'mountpoints' fact, breaking cinderutils::ensure.

Packages with a dependency on ruby-sys-filesystem have been published:

Fri, Dec 6, 9:39 PM · Infrastructure-Foundations, Puppet-Core, Cloud-VPS, cloud-services-team

jhathaway added a comment to T381639: Facter 4 upgrade removed 'mountpoints' fact, breaking cinderutils::ensure.

@taavi sorry about the breakage. I'll add a dependency to the package, in addition to the fix @MoritzMuehlenhoff committed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1100999

Fri, Dec 6, 2:58 PM · Infrastructure-Foundations, Puppet-Core, Cloud-VPS, cloud-services-team

Wed, Dec 4

jhathaway updated the task description for T381538: Backport facter to bullseye.

Wed, Dec 4, 11:08 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

jhathaway triaged T381538: Backport facter to bullseye as Low priority.

Wed, Dec 4, 11:06 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

jhathaway created T381538: Backport facter to bullseye.

Wed, Dec 4, 11:06 PM · Patch-For-Review, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

Mon, Dec 2

jhathaway added a comment to T380696: Mailserver refusing emails sent through VRTS due to too large headers.

I notified their postmaster; I'll update if I receive a reply.

Mon, Dec 2, 4:52 PM · collaboration-services, Infrastructure-Foundations, Mail, vrts

jhathaway triaged T380696: Mailserver refusing emails sent through VRTS due to too large headers as Low priority.

Mon, Dec 2, 3:29 PM · collaboration-services, Infrastructure-Foundations, Mail, vrts

Mon, Nov 25

jhathaway added a comment to T380696: Mailserver refusing emails sent through VRTS due to too large headers.

From my initial read of one of the rejected emails, I think the problem is on the recipient end. The email gets caught in some type of routing loop:

Mon, Nov 25, 4:46 PM · collaboration-services, Infrastructure-Foundations, Mail, vrts

jhathaway added a comment to T378285: Emails from wikimediats.zendesk.com fails DMARC policy.

In T378285#10351000, @revi wrote:

In T378285#10270193, @revi wrote:

Option 1 seems… to have conflicting info: they say "no SMTP relay" but they have an EAP program which practically seems to be the… SMTP relay. I think it's up to you to figure out which is true (and what path to go).

Looks like they surrendered to the demand and added SMTP relay support.

Mon, Nov 25, 4:12 PM · User-revi, Mail, Trust-and-Safety, Infrastructure-Foundations

jhathaway added a project to T380696: Mailserver refusing emails sent through VRTS due to too large headers: collaboration-services.

Mon, Nov 25, 3:36 PM · collaboration-services, Infrastructure-Foundations, Mail, vrts

jhathaway triaged T380214: sre.hosts.decommission: clear OOB networking config as Low priority.

Mon, Nov 25, 3:23 PM · Infrastructure-Foundations

jhathaway triaged T379244: PCC: cleanup old hosts as Medium priority.

Mon, Nov 25, 3:22 PM · User-aborrero, Infrastructure-Foundations, Puppet CI

Wed, Nov 20

jhathaway added a comment to T370452: Q1:rack/setup/install thanos-be2005.

@elukey thanos-be2005 is now re-imaging without any user intervention. It wasn't quite as easy as just running the re-image script twice; I still had problems actually booting into Debian, but I lost track of the error states. Perhaps the cause was artifacts of my earlier testing. Hopefully, your re-imaging doesn't have any issues.

Wed, Nov 20, 10:55 PM · SRE, SRE-swift-storage, Data-Persistence, ops-codfw, DC-Ops

Tue, Nov 19

jhathaway updated subscribers of T380214: sre.hosts.decommission: clear OOB networking config.

In T380214#10335094, @Volans wrote:

Was the host unracked?

Tue, Nov 19, 3:44 PM · Infrastructure-Foundations

Mon, Nov 18

jhathaway added a comment to T370452: Q1:rack/setup/install thanos-be2005.

@elukey, unfortunately I observed the same double d-i installer issue with thanos-be2005. Grub's installer does not throw any errors, but upon reboot the Debian boot option is last in the boot order. I suspect that https://www.supermicro.com/support/faqs/faq.cfm?faq=27004 is still true, namely that you cannot affect the boot order from within Debian, or that using httpboot once messes up something in the BIOS. I submitted a question on that ticket, but we should go through our regular support channel as well. Though, I'm also not sure if that FAQ explains all the behaviors we have seen.

Mon, Nov 18, 11:20 PM · SRE, SRE-swift-storage, Data-Persistence, ops-codfw, DC-Ops

jhathaway added a comment to T380009: VRTS e-mail address unreachable / e-mail routing issue.

In T380009#10332939, @eoghan wrote:

We had a quick chat with ITS today where they disabled the change that caused the routing to change, and it did cause gmail to start returning 550 for unknown addresses again, so we have confirmed their change was what caused this to start behaving differently.

Mon, Nov 18, 8:18 PM · Patch-For-Review, collaboration-services, User-revi, Infrastructure-Foundations, Mail, SRE, Znuny, vrts

jhathaway created T380214: sre.hosts.decommission: clear OOB networking config.

Mon, Nov 18, 6:11 PM · Infrastructure-Foundations

fnegri awarded T378235: OpenBao evaluation a Love token.

Mon, Nov 18, 3:20 PM · SecTeam-Processed, Security, Infrastructure-Foundations

Fri, Nov 15

jhathaway closed T376941: Corto: Scrutinize/finalize template text as Resolved.

Updated template to more closely match the original, https://gitlab.wikimedia.org/repos/sre/corto/-/merge_requests/28

Fri, Nov 15, 3:15 PM · Incident Tooling, SRE-OnFire

jhathaway closed T376941: Corto: Scrutinize/finalize template text, a subtask of T356790: Corto: Incident responder workflow automation (MVP), as Resolved.

Fri, Nov 15, 3:15 PM · Incident Tooling, SRE-OnFire

Wed, Nov 13

jhathaway added a comment to T371400: Q1:rack/setup/install ms-be208[1-8].

Sorry, egg on my face, that was my fault. I commented out the auto
reboot so I could do some debugging yesterday, before the reboot, but
forgot to remove the puppet override.

Wed, Nov 13, 1:38 PM · SRE, SRE-swift-storage, Data-Persistence, ops-codfw, DC-Ops

Nov 10 2024

jhathaway added a comment to T371400: Q1:rack/setup/install ms-be208[1-8].

@elukey I was able to reproduce the issue by wiping the files from the EFI partition before kicking off another re-image. I think the problem is actually in the debian-installer rather than on the Supermicro side, which is why we don't see this issue on sretest2001.codfw.wmnet. I think the debian-installer is failing to install grub properly and create the EFI boot entry, which is part of the grub install process. I think the issue is related to setting grub-installer/bootdev, which is done by autoinstall/scripts/partman_early_command.sh on the ms-be boxes. On ms-be2082 this evaluated to grub-installer/bootdev /dev/sdj /dev/sdk, which seems correct, but perhaps /dev/sdk needs to be first? I also tried setting grub-installer/only_debian boolean false, which we set in the raid1-2dev-efi.cfg, but that didn't seem to have any effect, so I don't think we are still hitting "#this workarounds LP #1012629 / Debian #666974", but I'm also not sure. I am off Monday, but happy to investigate more on Tuesday.

Nov 10 2024, 11:24 PM · SRE, SRE-swift-storage, Data-Persistence, ops-codfw, DC-Ops

Nov 7 2024

jhathaway added a comment to T371400: Q1:rack/setup/install ms-be208[1-8].

@elukey I tried reproducing the double Debian installer bug, but I failed. The steps I tried:

Nov 7 2024, 11:06 PM · SRE, SRE-swift-storage, Data-Persistence, ops-codfw, DC-Ops

Nov 6 2024

jhathaway added a comment to T370786: corto: review irc grammar ergonomics.

sounds good to me!

Nov 6 2024, 8:53 PM · Incident Tooling, SRE-OnFire

Nov 1 2024

jhathaway added a comment to T378239: OpenBao Puppet consumer.

Puppet 6 added the ability to defer a function's evaluation, which results in the function being run on the Puppet client, rather than on the Puppetserver. This feature was primarily built so that Puppet could query for secrets on the client, rather than on the Puppetserver. The primary advantage is that your Puppetserver no longer needs to be authorized to read all secrets for an organization. Unfortunately, using Deferred can be a bit cumbersome, as any data manipulation of the secret must also be deferred. For example, the rendering of an EPP template would also need to be deferred so as to execute client side. The result is difficult-to-read code:

Nov 1 2024, 3:35 PM · SecTeam-Processed, Security, Infrastructure-Foundations
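
The deferred-evaluation idea described in the T378239 comment above can be illustrated with a small Python sketch. This is not Puppet code and not the code from the original comment; the Deferred class, compile_catalog, resolve, and the function names lookup_secret and render are all hypothetical stand-ins. It only tries to show why anything that consumes a deferred secret, such as template rendering, ends up deferred too.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Deferred:
    """Placeholder for a function call that the client evaluates later."""
    name: str
    args: Tuple[Any, ...] = ()


def compile_catalog() -> Dict[str, Any]:
    """Server side: build a catalog without ever reading the secret."""
    secret = Deferred("lookup_secret", ("db/password",))
    # The template consumes the secret, so rendering has to be deferred as well.
    content = Deferred("render", ("password={}\n", secret))
    return {"path": "/etc/app.conf", "content": content}


def resolve(value: Any, functions: Dict[str, Callable[..., Any]]) -> Any:
    """Client side: recursively evaluate Deferred placeholders."""
    if isinstance(value, Deferred):
        args = [resolve(arg, functions) for arg in value.args]
        return functions[value.name](*args)
    if isinstance(value, dict):
        return {key: resolve(item, functions) for key, item in value.items()}
    return value


# Hypothetical client-local functions; a real client would query the secret store.
CLIENT_FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "lookup_secret": lambda path: {"db/password": "hunter2"}[path],
    "render": lambda template, *args: template.format(*args),
}

catalog = compile_catalog()                   # contains no secret material
applied = resolve(catalog, CLIENT_FUNCTIONS)  # secret is read only here
```

In this sketch the compiled catalog carries only placeholders, so the server never needs read access to the secret; the cost is the extra wrapping around every step that touches it.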

jhathaway updated subscribers of T378584: Evaluate hw-raid controllers for Supermicro's Config J.

@MatthewVernon, during a sprint week @ayounsi and I worked on adding EFI booting support. We are pretty close to having all the patches merged, but there is one nettlesome issue with Supermicros, in which the first provisioning reboot fails due to an apparent issue with the networking stack on the host. @ayounsi has been chasing that issue down with Supermicro. That said, I think we are still in a position to move forward with EFI for these hosts.

Nov 1 2024, 2:17 PM · SRE-swift-storage, Infrastructure-Foundations, Data-Persistence, DC-Ops

Oct 28 2024

jhathaway updated the task description for T378239: OpenBao Puppet consumer.

Oct 28 2024, 7:53 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway added a comment to T378285: Emails from wikimediats.zendesk.com fails DMARC policy.

Thanks for mentioning option (1), I was involved with re-setting that up for Qualtrics in T314815, and we could do something similar for zendesk, if they support it on their end.

Oct 28 2024, 7:11 PM · User-revi, Mail, Trust-and-Safety, Infrastructure-Foundations

jhathaway updated subscribers of T376949: UEFI and software RAID.

@Volans mentioned that it would be nice to sync the EFI partition following the replacement of a failed disk. One possibility would be to modify the script to support being called from mdadm's monitoring events. For instance, when we receive a RebuildStarted event, we could sync the EFI partitions; see man 8 mdadm.

Oct 28 2024, 4:02 PM · Patch-For-Review, Infrastructure-Foundations
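
The mdadm idea in the T376949 comment above could look roughly like the Python sketch below. It relies on mdadm's documented monitor behaviour of running the configured program with the event name, the md device, and sometimes a related component device as arguments. The script path, the function names, and the choice to ignore every event except RebuildStarted are assumptions for illustration, not the existing sync script.

```python
from typing import List, Optional, Sequence, Tuple

# Events that should trigger a sync; RebuildStarted is the one named in the comment.
SYNC_EVENTS = frozenset({"RebuildStarted"})

# Hypothetical path to the EFI sync script; not taken from the task.
SYNC_SCRIPT = "/usr/local/sbin/efi-sync"


def parse_event(argv: Sequence[str]) -> Tuple[str, str, Optional[str]]:
    """Split mdadm's program arguments into (event, md_device, component)."""
    if len(argv) < 2:
        raise ValueError("expected: EVENT MD_DEVICE [COMPONENT_DEVICE]")
    component = argv[2] if len(argv) > 2 else None
    return argv[0], argv[1], component


def sync_command(argv: Sequence[str]) -> Optional[List[str]]:
    """Return the command to run for this event, or None to ignore it."""
    event, _md_device, _component = parse_event(argv)
    if event not in SYNC_EVENTS:
        return None
    return [SYNC_SCRIPT]
```

A wrapper would pass sys.argv[1:] to sync_command and hand any non-None result to subprocess.run; the wrapper itself would be registered through the PROGRAM line in mdadm.conf.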

jhathaway updated subscribers of T378285: Emails from wikimediats.zendesk.com fails DMARC policy.

Thanks @revi, perhaps the From: header has changed since T272750. @Nahid, any idea who on Trust & Safety manages this zendesk instance?

Oct 28 2024, 3:29 PM · User-revi, Mail, Trust-and-Safety, Infrastructure-Foundations

jhathaway triaged T378285: Emails from wikimediats.zendesk.com fails DMARC policy as Medium priority.

Oct 28 2024, 3:19 PM · User-revi, Mail, Trust-and-Safety, Infrastructure-Foundations

Oct 25 2024

jhathaway triaged T378246: OpenBao unsealing as Medium priority.

Oct 25 2024, 10:17 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378250: OpenBao audit logs as Medium priority.

Oct 25 2024, 10:16 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378250: OpenBao audit logs.

Oct 25 2024, 10:16 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378249: OpenBao backups as Medium priority.

Oct 25 2024, 10:16 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378249: OpenBao backups.

Oct 25 2024, 10:16 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378248: OpenBao high availability as Medium priority.

Oct 25 2024, 10:15 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378248: OpenBao high availability.

Oct 25 2024, 10:15 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378247: OpenBao authentication methods as Medium priority.

Oct 25 2024, 10:14 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378247: OpenBao authentication methods.

Oct 25 2024, 10:14 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378246: OpenBao unsealing.

Oct 25 2024, 10:12 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378245: OpenBao installation & configuration as Medium priority.

Oct 25 2024, 10:11 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378245: OpenBao installation & configuration.

Oct 25 2024, 10:11 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378244: OpenBao storage engine as Medium priority.

Oct 25 2024, 10:09 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378244: OpenBao storage engine.

Oct 25 2024, 10:08 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378243: OpenBao secret engines as Medium priority.

Oct 25 2024, 10:07 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378243: OpenBao secret engines.

Oct 25 2024, 10:07 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378241: OpenBao human consumer as Medium priority.

Oct 25 2024, 10:05 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378241: OpenBao human consumer.

Oct 25 2024, 10:05 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378240: OpenBao Kubernetes consumer as Medium priority.

Oct 25 2024, 10:03 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378240: OpenBao Kubernetes consumer.

Oct 25 2024, 10:03 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378239: OpenBao Puppet consumer as Medium priority.

Oct 25 2024, 10:00 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378239: OpenBao Puppet consumer.

Oct 25 2024, 10:00 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378237: OpenBao consumers as Medium priority.

Oct 25 2024, 9:55 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378237: OpenBao consumers.

Oct 25 2024, 9:54 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway triaged T378235: OpenBao evaluation as Medium priority.

Oct 25 2024, 9:52 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway created T378235: OpenBao evaluation.

Oct 25 2024, 9:52 PM · SecTeam-Processed, Security, Infrastructure-Foundations

jhathaway added a comment to T376949: UEFI and software RAID.

In T376949#10262150, @MoritzMuehlenhoff wrote:

I haven't found the time to look into this deeper myself, but the topic also came up in #debian-boot and the following pointers were also provided there:

The Debian wiki page on this: https://wiki.debian.org/UEFI#RAID_for_the_EFI_System_Partition

Oct 25 2024, 4:42 PM · Patch-For-Review, Infrastructure-Foundations