User Details
- User Since
- Nov 22 2021, 10:00 PM (159 w, 2 d)
- Availability
- Available
- LDAP User
- JHathaway
- MediaWiki User
- JHathaway (WMF) [ Global Accounts ]
Yesterday
Tue, Dec 10
Mon, Dec 9
Fri, Dec 6
Packages with a dependency on ruby-sys-filesystem have been published:
@taavi sorry about the breakage. I'll add a dependency to the package, in addition to the fix @MoritzMuehlenhoff committed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1100999
Wed, Dec 4
Mon, Dec 2
I notified their postmaster; I'll update if I receive a reply.
Mon, Nov 25
From my initial read of one of the rejected emails, I think the problem is on the recipient end. The email gets caught in some type of routing loop:
Wed, Nov 20
@elukey thanos-be2005 is now re-imaging without any user intervention. It wasn't quite as easy as just running the re-image script twice; I still had problems actually booting into Debian, but I lost track of the error states. Perhaps the cause was artifacts of my earlier testing. Hopefully your re-imaging doesn't have any issues.
Tue, Nov 19
Mon, Nov 18
@elukey, unfortunately I observed the same double d-i installer issue with thanos-be2005. Grub's installer does not throw any errors, but upon reboot the Debian boot option is last in the boot order. I suspect that https://www.supermicro.com/support/faqs/faq.cfm?faq=27004 is still true, namely that you cannot affect the boot order from within Debian, or that httpboot messes up something in the BIOS. I submitted a question on that ticket, but we should go through our regular support channel as well. That said, I'm not sure that FAQ explains all the behaviors we have seen.
Fri, Nov 15
Updated template to more closely match the original: https://gitlab.wikimedia.org/repos/sre/corto/-/merge_requests/28
Wed, Nov 13
Sorry, egg on my face, that was my fault. I commented out the auto reboot so I could do some debugging yesterday, before the reboot, but forgot to remove the Puppet override.
Nov 10 2024
@elukey I was able to reproduce the issue by wiping the files from the EFI partition before kicking off another re-image. I think the problem is actually in the debian-installer, rather than on the Supermicro side, which is why we don't see this issue on sretest2001.codfw.wmnet. I think the debian-installer is failing to install grub properly and create the EFI boot entry, which is part of the grub install process. I think the issue is related to setting grub-installer/bootdev, which is done by autoinstall/scripts/partman_early_command.sh on the ms-be boxes. On ms-be2082 this evaluated to grub-installer/bootdev /dev/sdj /dev/sdk, which seems correct, but perhaps /dev/sdk needs to be first? I also tried setting grub-installer/only_debian boolean false, which we set in the raid1-2dev-efi.cfg, but that didn't seem to have any effect, so I don't think we are still hitting "#this workarounds LP #1012629 / Debian #666974", but I'm also not sure. I am off Monday, but happy to investigate more on Tuesday.
Nov 7 2024
@elukey I tried to reproduce the double Debian installer bug, but failed. The steps I tried:
Nov 6 2024
sounds good to me!
Nov 1 2024
Puppet 6 added the ability to defer a function's evaluation, which results in the function being run on the Puppet client rather than on the Puppetserver. This feature was primarily built so that Puppet could query for secrets on the client rather than on the Puppetserver. The primary advantage is that your Puppetserver no longer needs to be authorized to read all secrets for an organization. Unfortunately, using Deferred can be a bit cumbersome, as any data manipulation of the secret must also be deferred. For example, the rendering of an EPP template would also need to be deferred so that it executes client side. The result is difficult-to-read code:
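As a rough illustration of why this becomes awkward, here is a toy model of deferred evaluation written in Python rather than Puppet. It is not Puppet's implementation: the Deferred class, lookup_secret, render, and the secret path are all hypothetical names made up for the sketch. The point it tries to show is that once the secret is a placeholder, the template rendering that consumes it has to be wrapped in a placeholder too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deferred:
    """Placeholder for a call the client will run later (hypothetical model)."""
    name: str
    args: tuple = ()


def lookup_secret(path):
    # Stand-in for a client-side secret lookup; the data is invented.
    return {"secret/db": "hunter2"}[path]


def render(template, variables):
    # Stand-in for template rendering.
    return template.format(**variables)


# Functions that only the client is able to call.
CLIENT_FUNCTIONS = {"lookup_secret": lookup_secret, "render": render}


def resolve(value):
    """Client side: recursively replace Deferred placeholders with results."""
    if isinstance(value, Deferred):
        func = CLIENT_FUNCTIONS[value.name]
        return func(*(resolve(arg) for arg in value.args))
    if isinstance(value, dict):
        return {key: resolve(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(resolve(item) for item in value)
    return value


def compile_catalog():
    """Server side: builds the catalog without ever seeing the secret."""
    password = Deferred("lookup_secret", ("secret/db",))
    # Rendering here would embed the placeholder, not the secret,
    # so the render call has to be deferred as well.
    content = Deferred("render", ("password={password}\n", {"password": password}))
    return {"/etc/app.conf": content}
```

On the "server", compile_catalog() returns only nested placeholders; calling resolve() on that catalog on the "client" performs the lookup first and the rendering second.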
@MatthewVernon, during a sprint week @ayounsi and I worked on adding EFI booting support. We are pretty close to having all the patches merged, but there is one nettlesome issue with Supermicro hosts in which the first provisioning reboot fails due to an apparent issue with the networking stack on the host. @ayounsi has been chasing that issue down with Supermicro. That said, I think we are still in a position to move forward with EFI for these hosts.
Oct 28 2024
Thanks for mentioning option (1). I was involved with re-setting that up for Qualtrics in T314815, and we could do something similar for Zendesk, if they support it on their end.
@Volans mentioned that it would be nice to sync the EFI partition following the replacement of a failed disk. One possibility would be to modify the script to support being called from mdadm's monitoring events. For instance, when we receive a RebuildStarted event, we could sync the EFI partitions; see man 8 mdadm.
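A minimal sketch of what such an event hook could look like, assuming the handler is written in Python. Per mdadm(8), the monitor program is called with the event name, the md device, and optionally a related component device. The sync command path below is a hypothetical placeholder, not the name of the real script.

```python
import subprocess

# Events that should trigger an EFI sync. RebuildStarted is an event
# name documented in mdadm(8); others could be added to this set.
SYNC_EVENTS = {"RebuildStarted"}

# Hypothetical path: stands in for the existing EFI sync script.
SYNC_COMMAND = ["/usr/local/sbin/sync-efi-partitions"]


def handle_event(argv, run=subprocess.run):
    """Handle one mdadm monitor event.

    argv is [event, md_device] or [event, md_device, component_device],
    matching the arguments mdadm passes to its monitor program.
    Returns True if a sync was triggered, False otherwise.
    """
    if len(argv) not in (2, 3):
        raise ValueError("expected: EVENT MD_DEVICE [COMPONENT_DEVICE]")
    event = argv[0]
    if event not in SYNC_EVENTS:
        return False
    # check=True raises if the sync command exits non-zero.
    run(SYNC_COMMAND, check=True)
    return True
```

A wrapper script would pass sys.argv[1:] to handle_event and be registered through the PROGRAM line in mdadm.conf or the --program option of mdadm --monitor.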
Oct 25 2024
Oct 23 2024
They are still a bit rough in places, but resolving for now:
architecture was dropped in favor of only having mx-in and mx-out hosts.