Monday, July 11, 2011

A Short Story on a Waste Of Time

This is about wasting a lot of time, effort and some energy on an unfortunately not so successful transition from smaller to bigger disks. Actors include a few external drives, Time Machine, an iMac with a dying system disk and me, being a little stupid. Fortunately there were no really serious consequences. However, if I ever face a similar situation again, I might come here and read up on how to migrate systems and backups more sensibly.

Previously on Daniel’s machine

To understand the situation fully, this is what my drive and partition layout used to be:

  • 320GB internal hard drive called “Snow Leopard internal”
  • 2TB external USB drive
    • 500GB partition called “TM500” for Time Machine
    • 1.5TB partition called “TMRest” for media, disk images etc.
  • 2x1TB in a Firewire 800 Western Digital MyBook II enclosure, configured as a 1TB RAID1, used for media and VMware instances

The partitioning on the USB drive came about by migrating (cloning) a Time Machine backup from an earlier 500GB ATA drive in an external Firewire enclosure. The enclosure had failed, and as I did not feel like buying a new one for an elderly PATA hard drive, I replaced it with a new USB model, including a 2TB drive. However, I did not want to “waste” the whole new thing on Time Machine. Even with the 500GB drive I had had well over 6 months of backups to go back through, which was more than enough for me. So after transferring the Time Machine data from the old drive to the new bigger one, I set up the remaining 1.5TB as a media partition.

Early signs of trouble

A few weeks ago my iMac’s internal hard disk started failing. It manifested itself as spurious beachballing during random operations, even ones that would not obviously involve intense disk activity, like surfing the web or writing emails. At first I suspected a recent Flash or Adobe Reader updater to be the culprit, because I had had headaches over these before. Only after a few days did I realize that the system was logging disk read errors into the kernel log:

kernel[0]: disk0s2: I/O error.

Googling for that message quickly confirmed my suspicion and revealed that the disk was failing, making it impossible for the OS to function reliably. Fortunately the drive was kind enough to fail slowly instead of with a big bang. With this early warning I could make sure Time Machine had a very recent backup of my data before shutting down the computer and taking it in for repair.

The old 320GB disk that came with the computer had been working since 2008, and while I would not have minded it living a little longer, this turned out to be a fine opportunity to get some more room to breathe on the internal disk, without having to resort to moving the Movies folder and the iTunes library to an external Firewire drive. So I had the shop install a replacement 2TB disk in place of the faulty stock one.

Room to breathe

Upon return I booted from the Snow Leopard DVD and let the Time Machine restore run. A few hours later I was happy to boot up the Mac as if nothing had changed. Well, almost nothing. For one, applications that had been downloaded from the Internet caused the “This is potentially harmful, because it’s from the Intarweb” confirmation dialog to appear, even though I had run these apps for ages before the restore. More importantly, the available disk space was about 1,800GB more than before. So far so good.

Happy about the newly won capacity I started moving the iTunes library back from the external drive to the internal one. After that, I began moving several hundred gigs of iMovie projects, too.

Remember the backups

When I had finished copying lots of data - which had taken several hours - it occurred to me that the 500GB Time Machine partition on the USB drive would now be way too small, even though the system drive was only about half full. I thought about my options for a moment and came up with the idea to reset the Firewire enclosure to work in RAID0 mode, effectively giving me 2TB capacity instead of 1TB with the former RAID1 configuration, and let Time Machine use that volume for backups. Sure, RAID0 increases the chance of fatal failure, but as I also back up my data to Carbonite, I figured this would be an acceptable risk.
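To put a rough number on that RAID0 risk, here is a minimal sketch. It assumes the two drives fail independently and uses a purely hypothetical per-drive failure probability of 5% over some period. It also ignores rebuild windows and enclosure failures, so treat it as an illustration rather than a reliability model.

```python
def raid0_loss_probability(p_drive: float, drives: int = 2) -> float:
    """Striping: data is lost as soon as ANY one drive fails."""
    return 1 - (1 - p_drive) ** drives


def raid1_loss_probability(p_drive: float, drives: int = 2) -> float:
    """Mirroring: data is lost only if ALL drives fail (rebuilds ignored)."""
    return p_drive ** drives


# Hypothetical per-drive failure probability, not a measured value.
p = 0.05

raid0 = raid0_loss_probability(p)
raid1 = raid1_loss_probability(p)
print(f"RAID0: {raid0:.4f}, RAID1: {raid1:.4f}")
```

Under these assumptions the striped set is roughly twice as likely to lose data as a single drive, while the mirror is far less likely to, which is why a second backup elsewhere matters more with RAID0.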

I would then reformat the 2TB USB attached drive as a single large partition for miscellaneous use.

Lots of copying and moving

Because of the upcoming repartitioning and reformatting I had to do a lot of shuffling around of the remaining 1,000GB of data that was still spread across the three external drive partitions. This is when I realized that hard drives may have grown in size, but they have not grown in speed nearly as much. Even without the bottlenecks of Firewire 800 and USB 2.0, with directly attached SATA drives and sequential access, copying around a TB of data will take an uncomfortable amount of time. For example, assuming a (very optimistic) average speed of 60MB/s, which I could get from the FW800 drive to the internal volume, moving 1TB (which is approximately 1,000,000MB) will still take at least 4.5 hours. With USB 2.0 you can feel lucky if you get half the speed on average, easily doubling that time.
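The arithmetic behind those numbers can be sketched in a few lines. The 60MB/s rate and the "half the speed" USB 2.0 figure come from the paragraph above; the function name is made up for illustration, and sustained real-world rates will usually be lower.

```python
def transfer_hours(size_mb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_mb at a sustained rate in MB/s."""
    return size_mb / rate_mb_per_s / 3600


ONE_TB_MB = 1_000_000  # 1TB in decimal megabytes, as used in the text

fw800_hours = transfer_hours(ONE_TB_MB, 60)  # optimistic FW800 rate
usb2_hours = transfer_hours(ONE_TB_MB, 30)   # about half that over USB 2.0

print(f"FW800: {fw800_hours:.1f} h, USB 2.0: {usb2_hours:.1f} h")
```

That works out to a bit over 4.6 hours for a single terabyte over Firewire 800 and a bit over 9 hours over USB 2.0, before any partitioning or formatting in between.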

So over the course of three days I moved several TBs of data between the drives, partitioning and formatting in between. At last I ended up with this setup:

  • 2TB internal hard drive called “Snow Leopard internal”
  • 2TB external USB drive, called “USB2MEDIA”
  • 2x1TB in FW 800 MyBook II enclosure, configured as 2TB RAID0, called “TM2TBR0”

It would have been easier, had I not wanted to keep my 6 months of Time Machine backup history intact. Even though I have rarely needed it, I tend to err on the side of caution. Without that and the accompanying limitations with regard to what could be moved where and repartitioned how and when, I could probably have saved one or two large copy jobs.

Caught my mistake?

As I said earlier, after a few days and nights of shifting large amounts of data over way too narrow bandwidth channels, I was now sitting in front of my target configuration. Extra points to anyone who has caught my little oversight that made all this somewhat in vain. The rest of you would probably have been just as confused as I was when I entered Time Machine for the first time after that, just to see that my backup history had shrunk down to only 2 weeks after all.

Turns out, I was a little too happy about all the space I had won on the internal drive - so I did not remember to turn off Time Machine after the first boot into the new system. Instead, I started importing all sorts of large media files from the external disks to the internal one. While I was doing so, Time Machine dutifully kicked in, noticed a lot of new material and started backing that up to the volume that was still only 500GB. And because of all the new data and the much too small target drive, it did the only thing it could: remove old backups to make room for the new stuff.
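The effect can be illustrated with a toy model. This is not Time Machine's actual algorithm: it simply assumes that each backup occupies an independent amount of space and that the oldest backups are deleted first until the new one fits. All sizes and labels are hypothetical.

```python
from collections import deque


def add_backup(history: deque, capacity_gb: int, label: str, size_gb: int) -> None:
    """Append a backup, deleting the oldest ones until the new one fits."""
    used = sum(size for _, size in history)
    while history and used + size_gb > capacity_gb:
        _, freed = history.popleft()  # drop the oldest backup first
        used -= freed
    if used + size_gb > capacity_gb:
        raise ValueError("backup does not fit even on an empty volume")
    history.append((label, size_gb))


CAPACITY_GB = 500  # the old Time Machine partition

history = deque()
add_backup(history, CAPACITY_GB, "initial full backup", 250)
for month in range(1, 7):
    add_backup(history, CAPACITY_GB, f"month {month} changes", 30)

# A large media import lands on the internal disk and gets backed up.
add_backup(history, CAPACITY_GB, "after media import", 400)

remaining = [label for label, _ in history]
print(remaining)
```

In this toy run the initial backup and the first three months are thrown away to make room for the import, which is the same kind of quiet history loss described above.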

In essence, I had lost my 6 months’ worth of history even before I came up with my elaborate (?) data moving and migration scheme…

So what?

Well, in the end I did not really lose anything very important - all my current files and some history are still there. Should I find out a day from now that I need to get back a version of a file from 6 months ago, I may be out of luck, but that’s about it. However, I am fairly sure this will not happen to me again when the next hard drive failure strikes. And now that you have made it through this text desert, maybe it won’t happen to you at all :)

1 comment:

John VanHouten said...

Wow! That must have sucked. Thanks for sharing your story with the rest of the world though. I will be sure to avoid that Time Machine mishap when I switch hard drives soon.