Archive for July, 2005
More Butterflies from the Zoo
I took another run through the butterfly tent at the zoo last time I was there — and wound up with some more neat shots.
Cozumel In the bulls eye
It looks like Cozumel is about to get a nearly direct hit from a Category 4 hurricane. You may remember Cozumel from galleries such as the Baby Turtle Release, and a lazy sunday. When we were there, a Category 1 fizzled out and made landfall as a tropical storm. Not a big deal.
But this is bigger. The storm surge is supposed to be 10-12 feet or so, a significant percentage of the height of the island. Apparently the eye wall is going to hit the island. I’d expect that the windward side road might have some severe washouts. Not sure how well the town will do — at least most of the construction is cinder block walls. Roofs will be torn off and I’d expect a lot of flooding.
I just hope the locals can ride this out and not lose too much.
Neutrally Buoyant
There’s something just a little eerie about a helium balloon that’s been ballasted to very nearly neutral buoyancy wandering around the living room. Slowly. Chased by the eddies and drafts in the room. Nearly escaping to the great outdoors through the open door to the porch, only to be snatched back inside and ballasted a little more.
Server Death
To answer a few unanswered questions from the last post: Yes, quite a lot of the recovered files were bad. They dropped bits, jpgs look cubist, and mp3s have been invaded by alien noises. Some are OK, and I will probably restore anything that can be restored easily. I’ll recover the text files that I need but blow away the rest. And I’ll be scripting better backups of everything next time.
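One way "better backups" could also catch dropped bits is to record a checksum for every file while the data is still known to be good, and compare against it later. The following is only a minimal sketch of that idea, not the backup script mentioned above; the function names (`file_digest`, `build_manifest`, `compare_manifests`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def compare_manifests(before, after):
    """Return (missing, changed) relative paths, each sorted."""
    missing = sorted(p for p in before if p not in after)
    changed = sorted(
        p for p in before if p in after and before[p] != after[p]
    )
    return missing, changed
```

A manifest built before a failure and another built after a recovery would then separate the files that vanished from the files that came back with different contents.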
And if losing one drive wasn’t enough, I started getting kernel panics when there was only the boot drive in the system. That eliminates the Promise IDE card from being a single point of failure, and leads me to believe that it’s either the processor, motherboard, or memory. None of them are worth any more of my time. Goodbye server, it’s not been a bad run but it’s time to move to something better. RIP Cabbage 2000/10/6 – 2005/7/7
Going Analog and Finding Digital
or what happens when a server blows a drive. I start using the turntable again, since the mp3s are on a crashed drive.
Last week was a bad week for hardware in my life. This webserver randomly turned off one night (as it does every so often). My 160gig media hard drive lost its superblock and a whole lot more, and a server at work one hour away lost a processor fan. But this isn’t a story about fans, this is a story about hard drives. Hard drives with lots of data that are too big to back up to anything but other hard drives.
In the last couple of weeks, I’ve updated to the new Debian stable and replaced a very loud power supply in this machine. I thought that it was preventative maintenance. But then, last week when I was out of town, it locked up, later determined to have been due to IDE errors. Of course, it has to happen when I’m out of town. On reboot, I get errors like:
Jun 30 09:01:50 cabbage kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 30 09:01:50 cabbage kernel: hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=191, high=0, low=191, sector=128
Jun 30 09:01:50 cabbage kernel: end_request: I/O error, dev 22:01 (hdg), sector 128
Jun 30 09:01:50 cabbage kernel: sh-2006: reiserfs read_super_block: bread failed (dev 22:01, block 64, size 1024)
Jun 30 09:01:50 cabbage kernel: sh-2021: reiserfs_read_super: can not find reiserfs on ide3(34,1)
Nice. This drive has 300 CDs of mp3s whose only backup is the original audio discs, 50 or 60 gigs of baby movies that aren’t backed up (the raw footage; I’ve got DVDs and working copies), 30 gigs of pictures that are all backed up, and random other stuff that isn’t critical, but nice to have. Not something that I really want to restore from the original sources. Especially the mp3s. So, off to Fry’s for a new drive; they have a 200gig Seagate for $50. Plug it in, partition for one big partition, and off to the restoration races.
First attempt is reiserfsck on the original bad drive, but of course, it says that it’s a hardware problem. So, time to copy what I can from the old drive to the new one. dd fails instantly, since the first couple of blocks of the drive are bad.
dd_rescue to the rescue. (umm, sorry). Where dd exits on error, dd_rescue just goes really slowly, and can go either forwards or backwards. I started from the end of the drive, and got about half of the data before the machine hung after resetting the IDE bus.
dd_rescue -r /dev/hdg1 /dev/hde1
Update: GNU ddrescue looks to be a better option than dd_rescue + dd_rhelp.
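To make the difference from dd concrete, here is a minimal Python sketch of the skip-and-continue idea: copy block by block, in either direction, and note the blocks that cannot be read instead of giving up. It is not how dd_rescue or GNU ddrescue is actually implemented; `rescue_copy`, `FlakySource`, and the 512-byte block size are assumptions made for illustration, and the failing drive is simulated in memory.

```python
import io

BLOCK_SIZE = 512  # assumed block size for this sketch


class FlakySource(io.BytesIO):
    """Hypothetical stand-in for a failing drive.

    A read that starts at one of the bad offsets raises OSError,
    roughly what a read from a damaged sector looks like to a program.
    """

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size=-1):
        if self.tell() in self.bad_offsets:
            raise OSError("simulated uncorrectable read error")
        return super().read(size)


def rescue_copy(src, dst, total_size, block_size=BLOCK_SIZE, reverse=False):
    """Copy src to dst one block at a time, skipping unreadable blocks.

    Unreadable blocks are written as zeros so that offsets in the copy
    still line up with the original. Returns the list of bad offsets in
    the order they were encountered.
    """
    offsets = list(range(0, total_size, block_size))
    if reverse:
        offsets.reverse()
    bad = []
    for off in offsets:
        length = min(block_size, total_size - off)
        try:
            src.seek(off)
            data = src.read(length)
        except OSError:
            bad.append(off)
            data = b"\0" * length
        dst.seek(off)
        dst.write(data)
    return bad
```

Calling `rescue_copy(src, dst, size, reverse=True)` walks the source from the last block to the first, which is the direction used here because the start of the drive was where the damage was.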
At this point, I should have given up on recovering the drive on that machine and moved the drives to another machine that didn’t need the Promise IDE controller. I also should have just used dd_rhelp to automate what I wound up doing manually for several hours. It took 10 or 15 more reboots after IDE errors for me to decide to move to the other machine and run off of a better controller.
After about a day of the drives churning, and several thousand unreadable blocks later, I had a copy of the bad partition on a fresh new drive. Next task, reconstruct what I could of the file system. It turns out that the superblock and the volume bitmap were pretty well hosed, so first task was to recreate the superblock. I then tried the not too invasive --check option, but had many uncorrectable errors. So, time to bring out the big gun of --rebuild-tree, which should reconstruct as much of the filesystem as possible by scanning the whole disk. 18 hours later (or so) I had an error free file system, with a bunch of missing files.
reiserfsck --create-superblock /dev/hde1
reiserfsck --check /dev/hde1
reiserfsck --rebuild-tree /dev/hde1
reiserfsck --check /dev/hde1
But what files are missing and how many files are silently corrupted? And do I trust the new filesystem? It would have been good to get a list of the block numbers that were bad, but I did that recovery over 2 machines, and one of them started off of a live CD, so I don’t really have that info. I can’t find anything that tells me if the rebuild-tree option will just trash files with bad leaf nodes, or if they are not detected at all. And I don’t trust the new filesystem, so I decided to copy all of the files to other drives, reformat, and copy back.
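For the list of bad block numbers, one option would have been to pull the failing sectors out of the kernel log. A minimal sketch, assuming the log uses the same `end_request: I/O error, dev ... (hdg), sector N` wording as the errors quoted earlier; other kernel versions phrase this differently, and the `bad_sectors` helper and its regular expression are made up for illustration.

```python
import re

# Matches lines such as:
#   "... kernel: end_request: I/O error, dev 22:01 (hdg), sector 128"
# The message format is an assumption based on the log excerpt in this post.
IO_ERROR_RE = re.compile(
    r"end_request: I/O error, dev \S+ \((\w+)\), sector (\d+)"
)


def bad_sectors(log_lines):
    """Return {device name: sorted list of unique failing sector numbers}."""
    found = {}
    for line in log_lines:
        m = IO_ERROR_RE.search(line)
        if m:
            device, sector = m.group(1), int(m.group(2))
            found.setdefault(device, set()).add(sector)
    return {dev: sorted(sectors) for dev, sectors in found.items()}
```

Feeding it the saved log from each machine involved in the recovery would give one sector list per drive, though mapping sectors back to files would still need filesystem-specific tooling.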
At this point, it would have been really nice if everything just worked. But, of course, it didn’t or I wouldn’t be writing this paragraph. I moved the drive with the recovered data and added another large drive to the Promise controller. While copying,
cp -a
failed halfway through when the machine hung. Rsync gave a list of missing files, which I found useful, but it too hung the machine a few times, and I got
kernel bug in page_alloc.c
errors that from a quick googling tend to indicate hardware trouble. Memtest86 seems to indicate that this isn’t a memory or memory controller issue, so I’m guessing that the Promise card is bad.
A year ago, I thought that this machine was old enough that I was on the bubble about getting an IDE controller or replacing the motherboard (which would have cascaded to the processor, memory, and ethernet cards). Since then, I’ve replaced the power supply, the boot drive, and then there’s this fiasco. And now I’m trying to figure out if I should buy a new machine and swap in the hard drives, or just get a Mac mini and get out of the aging x86 hardware repair business.
But the silver linings are that the drive failed 2 weeks before the warranty ran out, and while railing against the crappy sound quality of my analog copies of “The Name of This Band is the Talking Heads”, I found out that it had finally been converted to digital for the first time late last year. And that’s a worthy discovery no matter what.