When IPMI Cards Attack
We’ve occasionally had issues with a database machine hanging at boot, waiting for
a PepperC USB mass storage device. It turns out that the device is part of an IPMI
card for Supermicro motherboards, and for some annoying reason, it was
asserting itself as the root device.
This is a big server, with a RAID card, a bunch of drives, and a couple of logical arrays off of that card. It’s meant to boot off of one of the arrays, on /dev/sda.
What is happening is that the SCSI and USB buses are probed in parallel, and there’s a race as to which device responds first and therefore gets the sda designation. The next gets sdb, then sdc, and so on. Drives and partitions are normally referred to by that
designation. Usually the order stays the same from boot to boot, and everything is good.
But if something takes longer than normal, perhaps due to an extra
drive in the case, then the other device has a chance to
take over.
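To make the race concrete, here is a toy model in Python. It is only a sketch of the idea, not how the kernel actually assigns names: the function `assign_names` and the response times are made up for illustration.

```python
def assign_names(response_times):
    """Name devices sda, sdb, ... in the order they answer the probe.

    Toy model only: response_times maps a device label to a
    hypothetical number of seconds before it responds.
    """
    order = sorted(response_times, key=response_times.get)
    return {f"sd{chr(ord('a') + i)}": dev for i, dev in enumerate(order)}


# Made-up timings: on a normal boot the RAID array answers first.
normal_boot = {"raid array": 1.0, "PepperC USB device": 2.5}
# On a bad day the array is slow and the USB device wins the race.
slow_boot = {"raid array": 3.0, "PepperC USB device": 2.5}

print(assign_names(normal_boot)["sda"])  # raid array
print(assign_names(slow_boot)["sda"])    # PepperC USB device
```

Nothing about the hardware changed between the two cases, only the timing, which is why the problem shows up occasionally rather than on every boot.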
And then you’re trying to boot off of something that you don’t expect,
like the newly inserted big drive for backups, or the PepperC device. (Good thing that the IPMI card has KVM over IP. It’s nice when a piece of equipment solves as many problems as it causes.)
So, the other way to refer to drives is by UUID, a universally unique identifier. Each
filesystem carries its own 128-bit id, which makes it a lot harder for some interloper of a device to register a second or two early and mess everything up. This stable root device (AKA UUID) walkthrough worked for me with Debian stable (etch), with the caveat that I had to run mkswap again, as my swap partition didn’t originally have a UUID associated with it.
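As a rough illustration of what the switch looks like in /etc/fstab, here is a small Python sketch that rewrites device paths into `UUID=` entries. The helper `use_uuids` is hypothetical, and the UUIDs below are invented; on a real machine they would come from something like `blkid`. The root device named in the bootloader config needs the same treatment, which this sketch does not cover.

```python
def use_uuids(fstab_text, uuid_by_device):
    """Replace device paths in the first fstab column with UUID=... specs."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave comments, blank lines, and unknown devices untouched.
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] in uuid_by_device:
            spec = "UUID=" + uuid_by_device[fields[0]]
            # Only the first occurrence, i.e. the device column, is replaced.
            line = line.replace(fields[0], spec, 1)
        out.append(line)
    return "\n".join(out)


# Example fstab lines and made-up UUIDs, for illustration only.
fstab = (
    "# <file system> <mount point> <type> <options> <dump> <pass>\n"
    "/dev/sda1 / ext3 defaults 0 1\n"
    "/dev/sda2 none swap sw 0 0"
)
uuids = {
    "/dev/sda1": "11111111-2222-3333-4444-555555555555",
    "/dev/sda2": "66666666-7777-8888-9999-aaaaaaaaaaaa",
}

print(use_uuids(fstab, uuids))
```

With entries like these, it no longer matters which device ends up being called sda.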
Incidentally, this IPMI card will also prevent booting if its event log is full, by displaying a ‘Press F1 to continue’ prompt on the console. I’m at the point where I’m not sure whether the IPMI card helps or hinders reliability, and I’m not likely to put any in machines in the future.