Saturday, January 3, 2009

How to repair a damaged Linux partition - the safe way

The other day, my brother asked me to recover the data from a Linux hard drive that would no longer boot. He said the boot process reported a damaged superblock.

I took the hard drive home and installed it as a second drive in a spare machine. Since the drive was presumably damaged, I decided not to boot from it: the more a failing drive is used, the worse the chances of recovery.

The first thing I did was download, burn, and boot from the Ubuntu Rescue Remix live CD. This is basically a stripped-down version of Ubuntu that includes tools for data recovery and forensic analysis.

I created a mount point for the "victim" drive (after sudo -i to get a root shell):

# mkdir /mnt/victim_drive 

Next, I ran fdisk /dev/sda and printed the partition table (the p command; fdisk -l /dev/sda does the same non-interactively) to determine which partition held the data I was after. In this case it was /dev/sda4.

# mount /dev/sda4 /mnt/victim_drive

This operation failed due to the bad superblock. Time to get to work.
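If you do try to mount the original partition, it is safer to make the attempt strictly read-only. The mount(8) man page notes that ext3 and ext4 will still replay a dirty journal (which writes to the disk) even on a read-only mount, and suggests the ro,noload options to prevent that. A minimal sketch; the function name mount_readonly and the DRY_RUN variable (which prints the command instead of running it) are hypothetical, made up for this example:

```shell
# mount_readonly DEVICE DIR  (hypothetical helper)
# Mount an ext3 partition without letting the kernel write to it:
# "ro" alone still lets a dirty journal be replayed, "noload" skips that.
# Set DRY_RUN=1 to print the mount command instead of running it.
mount_readonly() {
  local dev="$1" dir="$2"
  ${DRY_RUN:+echo} mount -t ext3 -o ro,noload "$dev" "$dir"
}
```

For example, mount_readonly /dev/sda4 /mnt/victim_drive. With a bad superblock this will fail just like the plain mount did; the point is only that it cannot make things worse.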

I ran fsck.ext3 on the partition (my brother told me he had formatted it as ext3):

# fsck.ext3 /dev/sda4

Right away it asked whether I wanted to make changes to a bunch of inodes, along with other prompts that sounded generally scary. Rather than proceed, I hit Ctrl-C and decided to do things the "safe" way. Read on.
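A gentler first look is fsck.ext3 -n, which opens the file system read-only and answers "no" to every question, so it reports problems without fixing anything. Its exit status is a bit mask documented in the e2fsck man page (1 = errors corrected, 2 = reboot needed, 4 = errors left uncorrected, 8 = operational error, 16 = usage error, 32 = cancelled by user request, 128 = shared library error). The sketch below just translates that number into words; describe_fsck_status is a hypothetical name:

```shell
# describe_fsck_status STATUS  (hypothetical helper)
# Translate an e2fsck/fsck.ext3 exit status, which is a sum of the bit
# values listed in the e2fsck man page, into a readable list.
describe_fsck_status() {
  local status="$1" out="" bit name
  if [ "$status" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  for bit in 1 2 4 8 16 32 128; do
    # Skip bits that are not set in the status value.
    [ $(( status & bit )) -ne 0 ] || continue
    case $bit in
      1)   name="errors corrected" ;;
      2)   name="reboot needed" ;;
      4)   name="errors left uncorrected" ;;
      8)   name="operational error" ;;
      16)  name="usage or syntax error" ;;
      32)  name="cancelled by user request" ;;
      128) name="shared library error" ;;
    esac
    # Append to the list, adding ", " only between items.
    out="${out:+$out, }$name"
  done
  echo "$out"
}
```

For example, after fsck.ext3 -n /dev/sda4, running describe_fsck_status $? prints "errors left uncorrected" if the read-only pass found damage (it cannot fix anything, so status 4 is expected).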

The "safe" way of recovering the drive involves working on an image of the damaged disk, rather than the actual one. This has two main advantages:

  1. The repairs are done on a known-good device, as opposed to a damaged one that could respond unpredictably.
  2. More importantly, a repair attempt can end up causing further damage, sealing the fate of an already damaged file system. If you are working on an image instead of the actual drive, you can always go back and try a different approach.

Next, I mounted a separate location (a network share or another drive will do) to store the image of the damaged disk. You'll need at least as much free space as the size of the whole drive you are imaging, and preferably more:

# smbmount //server/share /mnt/rescue
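Before starting a long imaging run, the space requirement can be checked up front. A sketch, assuming util-linux blockdev (its --getsize64 option prints the device size in bytes and usually needs root) and a recent GNU df (--output=avail -B1 prints free space in bytes). The function names and the 1% headroom are hypothetical choices, not a rule:

```shell
# fits_with_margin NEED_BYTES AVAIL_BYTES  (hypothetical helper)
# Succeeds if AVAIL is at least NEED plus 1%. The 1% headroom (for the
# ddrescue log file and file system overhead) is an arbitrary choice.
fits_with_margin() {
  local need="$1" avail="$2"
  [ "$avail" -ge $(( need + need / 100 )) ]
}

# enough_space_for_image DEVICE DIR  (hypothetical helper)
enough_space_for_image() {
  local dev="$1" dir="$2" need avail
  need=$(blockdev --getsize64 "$dev") || return 2
  # Last line of df output is the number; strip padding spaces.
  avail=$(df --output=avail -B1 "$dir" | tail -n 1 | tr -cd '0-9')
  [ -n "$avail" ] || return 2
  if fits_with_margin "$need" "$avail"; then
    echo "OK: image needs $need bytes, $avail free in $dir"
  else
    echo "Not enough space: image needs $need bytes, $avail free in $dir" >&2
    return 1
  fi
}
```

Usage: enough_space_for_image /dev/sda /mnt/rescue, run after the share is mounted.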

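Then I grabbed an image of the whole drive with ddrescue (this took a while!). ddrescue reads everything it can, skips past bad areas at first, and then goes back to retry them (-r 3 allows three retry passes). It doesn't repair the drive; it just tries hard to produce as complete an image as possible. Once the image is done, mmls from The Sleuth Kit (apt-get install sleuthkit) is a handy way to look at the partition table inside it.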

# ddrescue -r 3 /dev/sda /mnt/rescue/diskimage.dd
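GNU ddrescue also accepts a third file name, a log file (called a mapfile in newer releases) in which it records which areas it has recovered. With it, an interrupted copy can be resumed, and a quick first pass can be followed by a slower retry pass that only revisits the bad areas. A sketch, assuming GNU ddrescue (the older dd_rescue takes different options); image_drive and DRY_RUN are hypothetical names:

```shell
# image_drive SOURCE DEST_DIR [NAME]  (hypothetical helper, GNU ddrescue)
# Writes DEST_DIR/NAME.dd plus a log/mapfile DEST_DIR/NAME.log recording
# which areas have been recovered, so reruns pick up where they left off.
# Set DRY_RUN=1 to print the commands instead of running them.
image_drive() {
  local src="$1" dest_dir="$2" name="${3:-diskimage}"
  local img="$dest_dir/$name.dd" map="$dest_dir/$name.log"
  # Pass 1: copy what reads cleanly; -n skips the slow scraping
  # (older releases: splitting) phase.
  ${DRY_RUN:+echo} ddrescue -n "$src" "$img" "$map" || return
  # Pass 2: return to the failed areas and make up to 3 retry passes.
  ${DRY_RUN:+echo} ddrescue -r 3 "$src" "$img" "$map"
}
```

Running image_drive /dev/sda /mnt/rescue would produce the same /mnt/rescue/diskimage.dd used in the rest of this post, plus /mnt/rescue/diskimage.log.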

# mmls /mnt/rescue/diskimage.dd

DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

     Slot    Start        End          Length       Description
00:  -----   0000000000   0000000000   0000000001   Primary Table (#0)
01:  -----   0000000001   0000000062   0000000062   Unallocated
02:  00:00   0000000063   0028017359   0028017297   Linux (0x83)
03:  00:01   0028017360   0029334689   0001317330   DOS Extended (0x05)
04:  -----   0028017360   0028017360   0000000001   Extended Table (#1)
05:  -----   0028017361   0028017422   0000000062   Unallocated
06:  01:00   0028017423   0029334689   0001317267   Linux Swap / Solaris x86 (0x82)
07:  -----   0029334690   0029336831   0000002142   Unallocated

This is a list of all the partitions in the disk image. Now we can operate on the disk image and try to repair the partition. The target partition is Linux (0x83), starting at sector 63. We need its offset in bytes, so we multiply the start sector by the sector size (512 bytes):

63 * 512 bytes/sector = 32256 bytes.
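Reading the start sector off the table by hand works fine, but the lookup can also be scripted. The sketch below assumes mmls output in the format shown above and 512-byte sectors. It also checks for the ext2/ext3 superblock magic number (0xEF53, stored as the bytes 53 ef) 1080 bytes into the partition: the superblock starts at byte 1024 and the magic sits 56 bytes into it. Since the superblock is exactly what may be damaged here, a missing magic number is a hint rather than proof that the offset is wrong. linux_partition_offset and has_ext_magic are hypothetical names:

```shell
# linux_partition_offset [SECTOR_SIZE]  (hypothetical helper)
# Read mmls output on stdin and print the byte offset of the first
# "Linux (0x83)" partition. The third field is the start sector.
linux_partition_offset() {
  local sector_size="${1:-512}"
  awk -v ss="$sector_size" '
    index($0, "Linux (0x83)") { printf "%.0f\n", ($3 + 0) * ss; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# has_ext_magic IMAGE OFFSET  (hypothetical helper)
# Succeed if the two bytes at OFFSET + 1080 are the ext2/3 magic 53 ef.
has_ext_magic() {
  local image="$1" offset="$2" bytes
  bytes=$(od -An -tx1 -j "$(( offset + 1080 ))" -N 2 "$image" | tr -d ' \n')
  [ "$bytes" = "53ef" ]
}
```

For example, offset=$(mmls /mnt/rescue/diskimage.dd | linux_partition_offset) gives 32256 for the table above, and has_ext_magic /mnt/rescue/diskimage.dd "$offset" then tests the magic bytes.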

Now we can work on that partition. We first set up a loop device that starts 32256 bytes into the image file, and then run fsck on it:

# losetup -o 32256 /dev/loop2 /mnt/rescue/diskimage.dd

# fsck.ext3 -y /dev/loop2

The -y option answers "yes" to every prompt automatically. That is useful when a partition has lots of problems; it beats sitting there answering "yes" to each prompt yourself.
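One fsck pass does not always fix everything, so it can be worth checking again until fsck reports a clean file system. A sketch of that loop; fsck_until_clean and the FSCK override variable are hypothetical, and the choice of 3 passes is arbitrary. It adds -f, which forces a full check even when the file system is already marked clean, so a second pass really re-examines it:

```shell
# fsck_until_clean DEVICE [MAX_PASSES]  (hypothetical helper)
# Run "fsck -f -y" until it exits 0 (clean) or MAX_PASSES (default 3)
# have run. FSCK selects the checker and defaults to fsck.ext3.
fsck_until_clean() {
  local dev="$1" max="${2:-3}" pass=1 status
  while [ "$pass" -le "$max" ]; do
    "${FSCK:-fsck.ext3}" -f -y "$dev" && status=0 || status=$?
    echo "pass $pass: exit status $status" >&2
    [ "$status" -eq 0 ] && return 0
    # 8 and above mean fsck itself could not do its job (operational or
    # usage error, cancellation, library error); another pass won't help.
    [ "$status" -ge 8 ] && return "$status"
    pass=$(( pass + 1 ))
  done
  return 1
}
```

Usage: fsck_until_clean /dev/loop2. Because it always answers "yes", only run it against the loop device on the image, never the original drive.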

With any luck, you now have a clean file system image that is ready to be mounted. Mount it as follows:

# mkdir /mnt/recovered
# mount /dev/loop2 /mnt/recovered

In my case, the files were all there, but fsck had moved them into the lost+found directory, where they were listed under names made from their inode numbers. I was able to quickly locate the files I was looking for with a command such as this:

# cd /mnt/recovered/lost+found
# find | grep home/wmcgrath
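Once find has shown where the files ended up, they can be copied off the recovered image onto healthy storage. A sketch that relies on GNU find, xargs and cp (cp --parents recreates each matched path under the destination); copy_matches is a hypothetical name:

```shell
# copy_matches SRC PATTERN DEST  (hypothetical helper, GNU tools)
# Copy every regular file under SRC whose path matches the find -path
# PATTERN into DEST, keeping the directory structure and attributes.
copy_matches() {
  local src="$1" pattern="$2" dest="$3"
  mkdir -p "$dest" || return
  dest=$(cd "$dest" && pwd) || return   # make DEST absolute before cd'ing
  ( cd "$src" &&
    find . -path "$pattern" -type f -print0 |
      xargs -0 -r cp -a --parents -t "$dest" )
}
```

For example, copy_matches /mnt/recovered/lost+found '*home/wmcgrath*' /mnt/rescue/recovered would copy the matching files to the rescue share (the destination directory here is just an example).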

If the results had been less successful, I could have gone back to the drawing board: leave the mounted directory, unmount the image, detach the loop device, and move (or remove) the image file, for example:

# cd /

# umount /mnt/recovered

# losetup -d /dev/loop2

# mv /mnt/rescue/diskimage.dd /mnt/rescue/diskimage.failed.attempt.1.dd
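If you expect several attempts, a small helper can pick the next free attempt number instead of typing it by hand. A sketch following the NAME.failed.attempt.N.dd naming used in the mv command; archive_failed_image is a hypothetical name, and it keeps the renamed file next to the original image:

```shell
# archive_failed_image IMAGE  (hypothetical helper)
# Rename IMAGE (e.g. /mnt/rescue/diskimage.dd) to the first unused
# IMAGE-without-.dd.failed.attempt.N.dd and print the new name.
archive_failed_image() {
  local img="$1" base="${1%.dd}" n=1
  [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }
  while [ -e "$base.failed.attempt.$n.dd" ]; do
    n=$(( n + 1 ))
  done
  mv -- "$img" "$base.failed.attempt.$n.dd" || return
  echo "$base.failed.attempt.$n.dd"
}
```

Usage: archive_failed_image /mnt/rescue/diskimage.dd, run after unmounting and detaching the loop device.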

Now you can go back, make a new image as we did at the beginning, and try other approaches. More information is available on the Ubuntu data recovery page.

Remember - the best way to avoid losing data is to make regular backups, but if you have to "operate", this is the safest way to do it!
