spod.cx

I hate Sun.

This week has been one of "those" weeks. On Tuesday morning (the normal JANET maintenance window), the networks team at work moved a large part of the uni network over to the new core. In theory, this should have been pretty straightforward: an hour of downtime, then everything working again. After the maintenance window had passed and things still weren't working properly, I noticed that none of our users were getting DHCP addresses assigned. Additionally, 3 of our 6 Sun machines didn't appear to be able to talk to anything outside the library network. After speaking to the network guys, we got DHCP running again, but they had no idea why half the Sun kit wasn't working. After digging further, it turned out that the non-working machines were missing the /etc/defaultrouter file, which specifies their default gateway. Why it wasn't there in the first place is a bit of a mystery, since it should have been set at install time. Equally mysterious is why the machines were working right up until the network changes, which didn't actually change anything as far as we were concerned. Odd.
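For what it's worth, a check like the one below would have caught the missing file a lot sooner. It's only a rough sketch: the function name is made up, and it assumes the file holds whitespace-separated gateway entries with "#" starting a comment, so check the defaultrouter man page on your own boxes before trusting it.

```python
from pathlib import Path


def read_default_routers(path="/etc/defaultrouter"):
    """Return the gateway entries from a Solaris-style defaultrouter file.

    Hypothetical helper: returns None if the file is missing, and an
    empty list if it exists but holds no entries.
    """
    p = Path(path)
    if not p.is_file():
        return None
    routers = []
    for line in p.read_text().splitlines():
        # Assumption: '#' starts a comment, entries are whitespace-separated.
        routers.extend(line.split("#", 1)[0].split())
    return routers


if __name__ == "__main__":
    routers = read_default_routers()
    if routers is None:
        print("/etc/defaultrouter is missing: no persistent default gateway")
    elif not routers:
        print("/etc/defaultrouter exists but is empty")
    else:
        print("default gateway(s):", ", ".join(routers))
```

Run it from cron on each machine and you at least find out about the missing file before the network team finds out for you.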

Wednesday's fun and games started when I realised that nemesis, the server that runs our main library system, was reporting a fault on its internal disks. One of the disks had failed, taking out everything else on the same fibre channel loop and leaving only the external half of the RAID arrays online. After speaking to Sun support and doing the usual diagnostic stuff they ask for, they agreed to send out an engineer and a pair of disks.

Today's fun and games started when I updated the OpenBoot PROM firmware on an old E450 we've got. It was going to be our test machine, so I decided to bring everything fully up to date before we started to use it. The firmware update worked fine, but reset some of the EEPROM settings to their defaults, one of which is 'don't boot unless someone enters the firmware password'. Unfortunately, it appears that at some point in the past (before my time) someone set this password, and the only way to reset it is to swap out a bit of hardware inside the machine. More annoyingly, we only took the machine off its maintenance contract 2 weeks ago since it's no longer a production server, which means that replacing the NVRAM chip will cost us money. Fucking Sun.

Christ alone knows what's going to break tomorrow, but we have a Sun engineer coming out at 9am to replace the faulty hardware in nemesis. I suspect somehow he'll manage to turn a currently working server into a dead one, if this week is anything to go by...

To cap it all, the entry gate system that's been sitting idle for 4 years doesn't work properly now that management want to use it.

Instead of actually getting some work done this week, I seem to have spent the last 3 days trying to deal with non-stop brokenness. On the plus side, it's nearly the weekend :)


Contact: site@spod.cx