Tuesday, November 10, 2009

Recovering from Terastation Meltdown, Without a Backup

On Sunday night, I was doing some video editing for a friend, when I realized that my Terastation (original) was not running. I glanced in my server closet and discovered 4 green HDD lights (all on solid), a diag light (flashing), and no fan or hard drive noise coming from the unit.

I cycled the power and the unit started up again. It worked for a short time (performing an array check), then quit and went into the same state again. Each time I powered it up, it lasted between 30 seconds and 3 minutes or so.

I called Buffalo tech support, who graciously offered me assistance (even though this device is sorely out of warranty). The tech recommended I turn it off, hold the "init" button in and power it on, while continuing to hold the init button down for 15 seconds. He recommended I wait until the device powered up and completed the array check, then perform a firmware upgrade. He seemed fairly confident that this would solve my problem, so we hung up.

No sooner did the phone disconnect than the Terastation shut down again with a "click". The init button routine seemed to do absolutely nothing. I decided I was on my own, and I attempted a firmware upgrade.

The firmware upgrade (available from Buffalo Technology's website) ran, and seemed to complete successfully. However, when I booted the Terastation back up after the upgrade, it came up in "EM mode" (Buffalo's name for the recovery firmware, stored on flash, that only allows you to upgrade/write the firmware to the hard disks). This was probably what the tech was trying to get me into when we did the INIT button thing... I again attempted the firmware upgrade. This time, while writing the firmware, I shuddered as I heard:

* click *

... Sure enough, the device had shut off, and it would no longer boot at all (not even to EM mode!) My Terastation was a brick. I searched around online and found that it's possible to recover the device with some soldering of a connector that I didn't have, a JTAG cable that I didn't have, and a firmware upgrade. At this point, I was panicking a little, as a backup of the device was another thing I didn't have (yes, yes, shame on me.)

Recovering the Data

My next feat was to get the data off the drives, which were in a RAID 5 configuration, without the Terastation. I found several articles online, in particular this one from UFSexplorer. Basically, you can hook the 4 PATA drives up to a computer running Windows, use UFSexplorer to build the array virtually, and copy the data off of it. Here's the basic procedure I followed.

Obviously, do all of this at your own risk, but presuming you're doing this because you have no backup, you're probably into risk taking anyway.
What you'll need:

-We'll be running a total of 5 drives (the 4 Terastation drives, plus one Windows system drive), so you'll need a full tower PC case, with a hard drive running Windows.

-It should have at least 4 free power connectors and 2 IDE controllers on board. You can improvise, such as with an extra IDE controller card if you have one, and a 'Y' adapter drive power cable, or an external drive power supply. You'll probably need to disconnect any DVD drives to free up some power and data connectors.

-An external PATA IDE->USB converter (for the 5th drive, which will actually be drive #4 on the Terastation). Again, you can improvise with things like an extra IDE controller card, if you have one.

1. Assuming you have the Terastation "original", you'll have to pretty much completely disassemble it to get to the drives. This involves the removal of the outer case (many screws), a metal guard (many screws), the system board (many screws), one more small metal guard (2 screws), and finally, the drive cage (2 screws). The drives are numbered 4,3,2,1 from left (system board side) to right. You'll probably need to remove drive #4 completely, as you'll be connecting that to your USB hard drive adapter.

2. Set the jumpers on all of the drives, and connect them to power.

Note: you may be able to get away with powering the drive array with the Terastation power supply, but in my case, I later found out that the power supply was bad, which was responsible for this whole mess. It's advisable to use as little from the Terastation as possible unless you are certain you know what's wrong with it.

After a lot of futzing around to get the drives all powered and connected, and changing jumper settings on all the drives, I finally got them all to be recognized by the computer. Here's a full description of the setup, by drive:

Windows system drive:
-Primary onboard IDE controller, jumpered as master.

Terastation drive #1:
-Primary onboard IDE controller, jumpered as slave

Terastation drive #2:
-Secondary onboard IDE controller, jumpered as master

Terastation drive #3:
-Secondary onboard IDE controller, jumpered as slave

Terastation drive #4:
-USB IDE adapter, jumpered as master or cable select

Get your Windows PC all booted up after verifying the drives are all visible to the BIOS setup. They won't show up in Windows as drive letters, because they don't have recognizable file systems. Don't panic.

3. Now you'll need to download (and eventually buy, for about $75 USD) UFS Explorer Professional Recovery. It's worth the money. One major issue this gets around is that the RAID was originally built on a PPC, big-endian-based system, and you are now trying to access it on a PC, which is little-endian. If you don't know what that means, just continue.

4. Follow the instructions on the UFS Explorer site, specific to the Terastation. Here are the basic steps:

-When you launch the program, you should see all 4 drives with a mess of partitions. Some will be "XFS", and some will be "Unknown".

You'll most likely need to use the "Hex View" function to establish which drive is which. This is documented on the UFS Explorer site, as well. View each of the large partitions (232GB on mine) and pay attention to the very first 4 bytes or so - they should help you identify what part of the RAID 5 you're looking at. If you connected the drives as I did above, your array order should be 2 (Superblock), 3 (iNode block), 1 (parity), 4 (parity).

-Click the "RAID Builder" button

-Choose the partition option, not the disk option (I forget the exact wording)

-Go through and add all 4 of the really big partitions to the right side.

-Use the "move to top" and "move to bottom" buttons to get them in the correct order

-Click Ok and you should see a new partition on the bottom of the list. Right click and choose "explore". If you see all of your folders on the right, you can now copy your data somewhere (i.e. to a network or USB drive). If you don't see all of the folders, or you get "error in filesystem", you probably have your drives in the wrong order. Right click to close the partition and try again in a different order.
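For the Hex View step above, here is a rough idea of what "look at the first 4 bytes" means, sketched as a shell check you could run if the drives were ever attached to a Linux box instead. This is only a sketch under assumptions that are mine, not part of the UFS Explorer procedure: that the first chunk of the array begins with an XFS superblock, whose on-disk magic is the ASCII string XFSB (hex 58 46 53 42). The function names are made up for illustration, and the demo runs on a scratch file rather than a real disk.

```shell
#!/bin/sh
# Print the first four bytes of a file or partition as hex, e.g. 58465342.
first4_hex() {
    od -An -tx1 -N4 "$1" | tr -d ' \n'
}

# Succeed if the file or partition starts with the XFS magic "XFSB".
starts_with_xfs_superblock() {
    [ "$(first4_hex "$1")" = "58465342" ]
}

# Demo on a scratch file, so no real disk is touched.
demo=$(mktemp)
printf 'XFSB' > "$demo"
if starts_with_xfs_superblock "$demo"; then
    echo "looks like the superblock chunk"
else
    echo "some other chunk"
fi
rm -f "$demo"
```

On a real system you would point it at a partition (for example /dev/sdb3, a hypothetical name) instead of the scratch file, and read-only access is all it needs.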

After almost 2 days of copying, I have all of my data back, and I am now building a FreeNAS-based NAS, using a dual eSATA enclosure and an older Dell. Terastations are great, but I'm too strapped for cash to replace it now. Maybe someday I'll resurrect it by replacing the power supply, but right now I just don't have the time. The other thing I plan to do, ASAP, is set up Jungledisk to sync the NAS files with an offsite backup.

Wednesday, February 11, 2009

Back to basics - Making Ethernet Cables Without Losing Your Head

A friend recently asked me for some tips on network cabling tools. After realizing I have a lot to say on this subject, I figured it was as good a reason as any to do a post on it. Here are some tips from a guy who has done more cable-crimping than his pay scale warrants:

-Don't skimp when buying crimpers (I especially like the ones that "click" to let you know that they are completely crimped)

-Buy the bulk pack. Don't underestimate how many terminators you'll go through if you don't do cables often.

-Print out (in color) the specification for the T568B standard as small as possible, and STICK IT somewhere on your crimping tool. Then you'll always have it!

Color Code Guide

Note: Others will argue that the T568A standard should be used. In my experience, the T568A standard is usually only involved when making a crossover cable. It's a matter of preference, but be consistent.

-A simple network cable tester is REALLY nice to have. It can save you a lot of frustration, especially if you are doing a job (whole house, building, etc.)

Cheap cable tester

Finally, assemble your crimpers, terminators, and tester into a small box, and label it "Cable Kit". Keep it near your cable box. Ask me how I know.
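In case the color printout from the T568B tip goes missing, here is the pinout as a small shell sketch. The function names are my own invention; the wire colors are the standard T568B and T568A assignments, with pin 1 on the left when the clip faces away from you and the contacts point up. It prints pin, T568B color, and T568A color side by side.

```shell
#!/bin/sh
# Wire color for a given pin (1-8) under T568B.
t568b() {
    case "$1" in
        1) echo "white/orange" ;;
        2) echo "orange" ;;
        3) echo "white/green" ;;
        4) echo "blue" ;;
        5) echo "white/blue" ;;
        6) echo "green" ;;
        7) echo "white/brown" ;;
        8) echo "brown" ;;
        *) echo "unknown pin: $1" >&2; return 1 ;;
    esac
}

# T568A is T568B with the orange and green pairs exchanged
# (pins 1<->3 and 2<->6). That swap is why A on one end and B on the
# other gives a crossover cable.
t568a() {
    case "$1" in
        1) t568b 3 ;;
        2) t568b 6 ;;
        3) t568b 1 ;;
        6) t568b 2 ;;
        *) t568b "$1" ;;
    esac
}

# Print a pin / T568B / T568A table.
for pin in 1 2 3 4 5 6 7 8; do
    printf '%s  %-13s %s\n' "$pin" "$(t568b "$pin")" "$(t568a "$pin")"
done
```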

Saturday, January 17, 2009

HOWTO: Windows Mobile VPN connection to DD-WRT

So, after a little bit of screwing around, I have gotten my Windows Mobile phone to connect to my home network via a PPTP VPN to my DD-WRT router. My main goal for this was to allow my SIP client (I'm using SJ Phone) to connect to my Asterisk Box behind the firewall.

Here's how I did it!

On DD-WRT router/firewall
Go to Services->PPTP
Click "enable"

Enter the following:
Server IP:
(internal IP address of router)

Client IP(s):
(a range of internal IP addresses that are unoccupied)

CHAP-Secrets: myusername * mypassword *
(Asterisks and spaces are required as shown)

Hit apply-settings, and save.
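About those asterisks in the CHAP-Secrets field: as far as I can tell the line follows the usual pppd chap-secrets layout of four fields (client name, server name, secret, allowed IP addresses), where "*" means "any". That DD-WRT passes the field through unchanged is my assumption. A minimal sketch, with a made-up helper name, that builds such a line:

```shell
#!/bin/sh
# Build one CHAP-Secrets line from a user name and a password.
# Field order: client, server, secret, allowed IP addresses.
# The "*" in the server and IP fields means "any".
chap_line() {
    printf '%s * %s *\n' "$1" "$2"
}

chap_line myusername mypassword
```

One line per user should go into the field if you want more than one account.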

On the Windows Mobile Phone
Go to Start->Settings->Connections,
Click the advanced tab
Click select networks
Set to the following:
"Programs that automatically connect to the Internet should connect using: 'My ISP'"
"Programs that automatically connect to a private network should connect using: 'My Work Network'"
Hit Ok
Click to the Tasks tab
Under "My Work Network", click "Edit my VPN servers"
Click "New"

Enter the following:
Name: "Home VPN" (or whatever you want)
Hostname or IP: "blah.dyndns.org" (assuming you have some kind of ddns setup)
VPN type: PPTP
Click Next

Enter the following:
User name: myusername (Or whatever you used above)
Password: mypassword (Or whatever you used above. NOTE: this probably won't be remembered by the device anyway)
Domain: (Leave Blank)

Click Finish

Tell the phone to connect to the VPN

With your device outside your LAN, activate your VPN connection as follows:

Under "My Work Network" Click Edit my VPN Servers
Click and hold the VPN connection entry
Choose "Connect"
Enter your username and password (if asked) (leave domain blank again)
You should see it connect. If you don't see an error pop up, it worked!

Configure SJPhone
Next, configure SJPhone to connect to the internal IP address of the asterisk box:
Click the "Profiles" tab
Click "new"
Profile Name: My asterisk box
(rest as defaults)
Click the "SIP proxy" tab
Proxy domain: (or your asterisk box internal IP) : 5060

Click Ok, and your SIP device should register! If you drop your VPN connection, you may need to manually force SJPhone to reregister the SIP connection before you can make/receive calls again.

Saturday, January 3, 2009

How to repair a damaged Linux partition - the safe way

The other day, my brother asked me to recover the data from a Linux hard drive that would no longer boot for him. He said the superblock was reported as damaged and the machine refused to boot.

I took the hard drive home and put it in a spare machine I had. Presuming the hard drive was damaged, I decided it was best if I didn't try to boot off of it. The more the hard drive is used, the worse the chances of recovery.

The first thing I did was download, burn, and boot from the Ubuntu Rescue Remix live CD. This is basically a stripped version of Ubuntu with only some of the basic tools helpful for performing data recovery and forensic analyses.

I mounted the "victim" drive as follows (after sudo -i to get root access):

# mkdir /mnt/victim_drive 

I used fdisk and printed the partition table on the device in order to determine which partition was the one with the data I was after. In this case it was /dev/sda4.

# mount /dev/sda4 /mnt/victim_drive

This operation failed due to the bad superblock. Time to get to work.

I ran fsck.ext3 (my brother told me he had formatted it ext3) on the partition:
# fsck.ext3 /dev/sda4

Immediately it asked me if I wanted to make changes to a bunch of inodes, and some other stuff that sounded generally scary. At this point, rather than proceed, I hit Ctrl-C and decided to do things the "safe" way. Read on.

The "safe" way of recovering the drive involves working on an image of the damaged disk, rather than the actual one. This has 2 main advantages:

  1. The repairs will be done on a known working device, as opposed to a damaged one that could respond in an undefined way.
  2. Most of all, attempts to repair could ultimately result in even further damage, sealing the fate of the already damaged file system. If you are working on an image instead of the actual drive, you can always go back and try a different approach.

Next, I mounted an alternate location (this could be a network share or another drive) on which I could store the image of the damaged disk. Remember that you'll need at least as much space free as the size of the drive you are working on (actually more):

# mkdir /mnt/rescue
# smbmount //server/share /mnt/rescue
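Before imaging, it's worth checking that the destination really has more free space than the drive is big. Here's a minimal sketch of that check; the function names are hypothetical, and the demo only looks at /tmp so it can be run anywhere without root.

```shell
#!/bin/sh
# Bytes available on the file system that holds directory $1.
avail_bytes() {
    kb=$(df -Pk "$1" | awk 'NR==2 { print $4 }')
    echo $(( kb * 1024 ))
}

# Succeed if AVAILABLE ($2) is strictly larger than NEEDED ($1), in bytes.
has_room() {
    [ "$2" -gt "$1" ]
}

# Harmless demo: is there more than 1 MiB free under /tmp?
if has_room 1048576 "$(avail_bytes /tmp)"; then
    echo "enough room"
else
    echo "not enough room"
fi
```

On the rescue system, the needed size could come from blockdev --getsize64 /dev/sda (run as root), compared against avail_bytes /mnt/rescue.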

Then I grabbed an image of the drive using ddrescue (took a while!). ddrescue copies everything it can read and, with -r 3, goes back and retries the unreadable areas up to three times in order to get as complete an image as possible. mmls is part of The Sleuth Kit (apt-get install sleuthkit) - a handy tool to look inside the disk image once it's done.

# ddrescue -r 3 /dev/sda /mnt/rescue/diskimage.dd

# mmls /mnt/rescue/diskimage.dd

DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

Slot Start End Length Description
00: ----- 0000000000 0000000000 0000000001 Primary Table (#0)
01: ----- 0000000001 0000000062 0000000062 Unallocated
02: 00:00 0000000063 0028017359 0028017297 Linux (0x83)
03: 00:01 0028017360 0029334689 0001317330 DOS Extended (0x05)
04: ----- 0028017360 0028017360 0000000001 Extended Table (#1)
05: ----- 0028017361 0028017422 0000000062 Unallocated
06: 01:00 0028017423 0029334689 0001317267 Linux Swap / Solaris x86 (0x82)
07: ----- 0029334690 0029336831 0000002142 Unallocated

This is a list of all the partitions that are in the disk image. Now we can operate on the disk image and try to repair the partition. The target partition is Linux (0x83), starting at sector 63. We need to know the offset in bytes, so we multiply the starting sector by the 512-byte sector size:

63 * 512 bytes/sector = 32256 bytes.
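The same arithmetic can be pulled straight out of the mmls output. This is just a sketch: linux_offset is a name I made up, it assumes the column layout shown above and 512-byte sectors, and it only looks for the first "Linux (0x83)" line. Note that the leading zeros have to be stripped, otherwise the shell would treat the number as octal.

```shell
#!/bin/sh
# Read mmls output on stdin and print the byte offset of the first
# "Linux (0x83)" partition: start sector * 512.
linux_offset() {
    start=$(awk '$6 == "Linux" && $7 == "(0x83)" { print $3; exit }')
    [ -n "$start" ] || return 1
    # Strip leading zeros so shell arithmetic does not read it as octal.
    start=$(echo "$start" | sed 's/^0*//')
    echo $(( ${start:-0} * 512 ))
}

# Demo with two lines copied from the listing above; prints 32256.
linux_offset <<'EOF'
02: 00:00 0000000063 0028017359 0028017297 Linux (0x83)
06: 01:00 0028017423 0029334689 0001317267 Linux Swap / Solaris x86 (0x82)
EOF
```

On the real image this would be fed with mmls /mnt/rescue/diskimage.dd | linux_offset.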

Now we can work on that partition. We first set up a loop device on the image file, starting at that offset, and then run fsck on it:
# losetup -o 32256 /dev/loop2 /mnt/rescue/diskimage.dd

# fsck.ext3 -y /dev/loop2

The -y option is useful when you have a partition with lots of problems - it's better than sitting there and repeatedly answering "yes" to each of the prompts.

With any luck, after this, you now have a clean disk image that is ready to be mounted. Mount it as follows:

# mkdir /mnt/recovered
# mount /dev/loop2 /mnt/recovered

In my case, the files were all there, but they had all been dumped into the lost+found directory, under directories named after their inode numbers. I was able to quickly locate the files I was looking for using a command such as this:

# cd /mnt/recovered/lost+found
# find | grep home/wmcgrath

If the results had been less successful, I could have gone back to the drawing board. Unmount the image, get rid of the unwanted loop device, and (re)move the image file, such as:

# umount /mnt/recovered

# losetup -d /dev/loop2

# mv /mnt/rescue/diskimage.dd /mnt/rescue/diskimage.failed.attempt.1.dd

Now you can go back and make a new image as we did in the beginning, and try other approaches. More information is available at the Ubuntu data recovery page.

Remember - the best way to avoid losing data is to make regular backups, but if you have to "operate", this is the safest way to do it!