Trevor Stone's Journal
Those who can, do. The rest hyperlink.
The Pros and Cons of Dropping Your Hard Drive 
5th-Nov-2009 01:16 am
bad decision dinosaur
Long story short:
I dropped my hard drive on the floor. It stopped working. The only stuff that wasn't copied elsewhere was music. I bought a big new hard drive. I copied a metric crap load of music from my brother's computer. I copied files from a whole bunch of MP3 CDs. I copied a bunch of music from my ex (from a hard drive that used to be mine). In the end, I have a lot more music than before and still have most of my original collection.

As I got settled into my parents' house after Burning Man, I was confronted by a persistent problem in this house: the lack of flat spaces not covered with tons of crap. Places I can sensibly do computer work are essentially limited to the dining room table, easy chairs in the living room, and the bedroom I'm occupying. My summertime computer activities had mostly been in the dining room, but I wasn't a huge fan of moving my laptop and external hard drive around when I wanted to eat and I wanted to be able to work in private sometimes. Since there's a bookshelf I could clear off next to the bed, and the convenience of using a computer before I use the bathroom is pretty sweet, I set up shop in the bedroom.

OSHA would not be impressed with my setup. Sitting on the edge of a bed without back support and reaching my arms pretty far is not an ideal setup, and my back has tightened itself into some impressive knots. As a variation of routine, I sometimes move the laptop to the bed and sit cross-legged. The challenge for this maneuver is the shortness of my external hard drive's firewire cable. It's a little longer than the bookshelf-bed divide, but any laptop movement on the bed is pretty restrictive. I'd already pulled it off the shelf once by accident, though it seemed to recover fairly well. The cable occasionally got unplugged too, but that's not a big problem unless a write is in progress. Technically, I don't need my hard drive plugged in, but that's where all my music is (200 GB wouldn't fit in my internal HD's free space) and listening to music makes job searching much more enjoyable.

Astute readers can probably see the impending failure. In late September, I made a slight laptop position adjustment while music was playing. This yanked the hard drive off the shelf and onto the (carpeted) floor. "Crap!" I said, putting my laptop back in a stable location and plugging the drive in. There began a bunch of hard drive reading noise. "Good, it's probably checking its data sanity," I said. When iTunes finished playing the (buffered) song, it paused for an extended period while waiting for I/O operations to respond. I'm a little panicked, but it comes back to life. A song or two plays, then one ends abruptly. I play it again and it again ends abruptly. I quit iTunes as a precautionary measure and continue working.

A little later, when copying images from external to laptop hard drive so I could go work on them in the living room, I get an I/O error on one of my Burning Man pictures. Uh oh. I try again, same result. They're still on my CF card, so there's no catastrophe looming. But I start to suspect that the tumble may have damaged some files. I immediately copy my pictures folder to my laptop. Most of them are already in iPhoto and thus technically backed up, but I like having them in a simple filesystem structure too. Fortunately, that one BM picture is the only one with errors. I dig through the rest of my home directory backup from my last computer, looking for things that hadn't yet joined my new homedir. No I/O errors there, whew. (Or maybe there were, but of things I had elsewhere.) Now the only unsecured data is music files. I poke at the song that was finishing early. It's got I/O errors like the photo did. Realizing I don't have room to copy 200 GB of mp3s onto my available 40 GB of internal hard drive, I run md5sum -r /Volumes/threemusicians/music/. This prints an error for each file it couldn't read fully and a fingerprint for each file it could; I figure I can take the list of files it failed on and see what I can do with those. The command takes a while on that much data, so I let it run and go to sleep.
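The fingerprint-or-error pass described above can be sketched in Python 3. This is an illustration of the idea, not the command that was actually run; the function name `fingerprint_tree` and the chunk size are assumptions made for the example.

```python
import hashlib
import os


def fingerprint_tree(root, chunk_size=1 << 20):
    """Walk `root` and try to read every file all the way through.

    Returns (fingerprints, failures):
      fingerprints maps path -> MD5 hex digest for files that read cleanly,
      failures is a list of (path, error message) for files that did not.
    """
    fingerprints = {}
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so a multi-gigabyte library
                    # never has to fit in memory.
                    for chunk in iter(lambda: handle.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError as err:
                # An I/O error partway through a file lands here;
                # record it and move on to the next file.
                failures.append((path, str(err)))
                continue
            fingerprints[path] = digest.hexdigest()
    return fingerprints, failures
```

Writing the `failures` list to a text file gives the "files to worry about" list directly. On a drive that is actively dying, a pass like this can still hang inside the operating system rather than raise an error, which matches what happened next.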

When I wake up, my computer is essentially frozen. The mouse will move, but nothing is responsive. It sounds like the hard drive has spun down. I restart and continue the process, but it proves unstable for the computer. I turn the hard drive off and think up a plan. "I need a new hard drive, clearly. But Finder gets stuck on an I/O error, so I should write a script to start copying the songs I can't easily replace. Which are those?" Sixty or so gigs are from CDs I own that I ripped in March, figuring I should have a backup in case anything happened to them in storage. So I don't need to save those. A lot of the music on my hard drive came from my brother. Even more of it I downloaded in college and he copied and burned to CD. I gave him a hard drive identical to mine for Christmas a few years ago and copied my whole music library at the time, though he deleted a lot of files which were (a) duplicates or (b) not his musical style. Plus, Tam has my original external drive with my music collection as of about the same time, minus the stuff that annoyed her but I wouldn't delete (from the Grateful Dead to Symphony for Dot Matrix Printer). My brother pointed out that he and Tam have fairly opposite music tastes, so if both had deleted something "Maybe you don't really need it." So really, I had a poor man's distributed backup system. The only large chunks I don't have access to are songs I copied from coworkers at the job I left this year. While I dig a bunch of those albums, they're mostly well known, so I should be able to find them again.

I ran locate on my music path, sticking the file path of every music file in a text file (try that with Spotlight!). I turned each path into an empty file on my laptop with a ".missing" extension, deciding that my script could find .missing files and copy over them; at any point, all .missing files would be the set of things I still need to recover. For the partial list of files that had md5 errors, I renamed the .missing files to .error so I didn't get hung up on them and so I could distinguish files that I needed to replace. I parsed all the artists and albums from my album reviews and had a script create files in appropriate directories to indicate the whole folder didn't need a copy. I then got a list of files on all my brother's hard drives and a Disc Catalog export of all his mp3 CDs. I wrote some scripts to rename .missing files to indicate that I could get them from him, but this is where things started getting complicated. iTunes keeps files organized by the id3 tags' artist and album. Unfortunately, a lot of the mp3 CDs were burned either before it had that feature or without passing through the massaging of iTunes. In some cases it was just a matter of "03 - Time.mp3" versus "03 Time.mp3," but other cases are more perverse: spaces and characters removed from the filename to fit in MacOS Classic's 31 character limit. Album folders with the year in the name. And then there's the "Box Set Disc 2/15 Everydays.mp3" versus "Box Set/2-15 Everydays.mp3" problem. I've often gotten anal about id3 tag correctness, so older copies don't necessarily match the latest file structure. I considered getting fancy and matching just by track number, looking for X% commonality in strings, or prompting for match potentials, but I decided to just set up as many exact matches as possible and hope I'd managed to get the target size below my laptop's free space.
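The placeholder bookkeeping above (empty ".missing" files, renamed to ".error" for known-bad sources, replaced by the real file once recovered) can be sketched roughly as follows in Python 3. The function names and the idea of passing in a plain list of paths are assumptions for illustration; the original scripts are not shown in this post.

```python
import os
import shutil

MISSING = ".missing"
ERROR = ".error"


def create_placeholders(source_paths, source_root, dest_root):
    """Mirror each source path under dest_root as an empty .missing file."""
    for src in source_paths:
        rel = os.path.relpath(src, source_root)
        target = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if not os.path.exists(target):  # skip files already recovered
            open(target + MISSING, "a").close()


def mark_unreadable(rel_path, dest_root):
    """Rename a .missing placeholder to .error for a file that failed to read."""
    target = os.path.join(dest_root, rel_path)
    if os.path.exists(target + MISSING):
        os.rename(target + MISSING, target + ERROR)


def recover(rel_path, replacement, dest_root):
    """Copy a good file into place and drop whichever placeholder it had."""
    target = os.path.join(dest_root, rel_path)
    shutil.copy2(replacement, target)
    for suffix in (MISSING, ERROR):
        if os.path.exists(target + suffix):
            os.remove(target + suffix)


def still_missing(dest_root):
    """Return the sorted relative paths that still have a .missing placeholder."""
    remaining = []
    for dirpath, _dirnames, filenames in os.walk(dest_root):
        for name in filenames:
            if name.endswith(MISSING):
                full = os.path.join(dirpath, name[:-len(MISSING)])
                remaining.append(os.path.relpath(full, dest_root))
    return sorted(remaining)
```

The appeal of this layout is that the filesystem itself is the to-do list: `still_missing` is just a directory walk, and an interrupted copy run loses no state.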

After hacking on things for a while, I figured I should buy a new hard drive so that I'd have room for whatever I could copy. Since my original problem was wanting to listen to music without being tied to a corner of my bedroom, I figured a wireless hard drive would be a good call. Looking around, I found that Apple's Time Capsule is around $200 more than a regular external hard drive. This is partly because the Time Capsule is also an Airport base station, a function I don't need since I already have a router. Other NAS products were likewise significantly more expensive than storage alone. Since I don't know what my "stable" computer and network setup will be once I'm out of my parents' house again, I opted to spend about $200 on a 2 terabyte USB drive. (I was rather fond of my old drive's Firewire800 + Firewire400 + USB + eSATA configuration, but none of the drives at a local office or electronics store had F800, so I just went for straight USB).

So at this point I had a bunch of short python scripts, a big honkin' hard drive, and a hack of a copy plan in place. I turned my external drive on for the first time in a week so I could snag the hardest-to-recover files. It spun up, whirred and clicked like it was reading data, made a disheartening noise, and then didn't want to mount. I ran Disk Utility's Verify and Repair functions, which previously had found nothing wrong. Partway through, I think things crashed. I rebooted and tried again, but the hard drive wasn't making access noises, so after two hours I force quit it. I ran FileSalvage, a program that reads the disk in raw mode and infers files, and it made no progress. (Technically, it showed a slowly advancing progress bar at the rate of 128k every few seconds, but that's a lie; I don't think the program believes it can fail to make progress.) The drive just didn't want to provide any data. So much for my elaborate copy scheme.

So now on to plan B: grab stuff from my brother. In the last six years or so, he's been quite the music magnet. Between the public library, usenet binaries groups, blogs, file-sharing networks, LP ripping, and other sources, he's amassed quite a collection of bluegrass, old blues, acoustic guitar, Indian classical, European classical, Hawaiian, Celtic, and all manner of other genres that are, in most places, well outside the mainstream. I copied the various music folders scattered about his four hard drives to my new terabyte rasa. Adding all that to my missing set brought my total library time up to 400 days; it had started around 100. Next, I started working my way through his booklets of mp3 CDs, often with a clever title and theme. Remember kids, CD-Rs aren't the world's most reliable media; several files had read errors and these are all less than 10 years old. I discovered that if I put files in my new music folder with the same folder and filenames as my old music folder, iTunes was smart enough to adjust the absolute path, so information about the track (playlists, play count, etc.) wasn't lost and I didn't get two entries for each track, one missing and one present. I probably spent a little too much time finessing that structure, but the anal part of my brain took over.

I also discovered that iTunes writes its whole music library (track locations and metadata) whenever anything changes and the GUI is generally nonresponsive during the write (though music keeps playing). This was slightly noticeable when I had around 150 GB of music, but it's a major time drag with a terabyte of songs and a 264 MB library XML file. Making minor corrections to id3 tags takes several seconds of waiting for I/O; not long enough to read something in Firefox, but long enough to feel like I'm wasting time. So for the couple hundred folders my brother had with files missing an artist or album, I turned to python again. The mutagen module can read metadata for several types of media files and provides a dictionary-like interface to that information. My brother's organization scheme (or, more precisely, the set of organization schemes used by people he's downloaded files from) typically contains artist and album information, but in such a heterogeneous way that a script to reliably infer it all would get really complicated. The simple approach I took was to print filenames, artists, and album titles for each music file in a folder with incomplete tags, then prompt for an artist and album. If there was a unanimous choice among partially-tagged files, that was suggested. If not, the folder structure was parsed for suggestions. In practice, I usually ended up drag-n-dropping part of my terminal window. Part way through I also learned some of the annoyances of python's str/unicode split and got hit in a few ways that stricter typing might have prevented.
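The suggestion logic above might look something like this in Python 3. It is a sketch under assumptions, not the original script: the function names are made up, the folder fallback assumes an ".../Artist/Album" layout (which real folders often violate), and `read_tags` relies on the third-party mutagen package's `mutagen.File(path, easy=True)` interface, where each tag value comes back as a list of strings.

```python
import os


def read_tags(path):
    """Return {'artist': ..., 'album': ...} for whichever tags the file has."""
    import mutagen  # third-party; imported here so the helpers below run without it

    audio = mutagen.File(path, easy=True)
    tags = {}
    if audio is None:  # mutagen could not identify the file type
        return tags
    for key in ("artist", "album"):
        values = audio.get(key)
        if values:
            tags[key] = values[0]
    return tags


def suggest_value(tag_dicts, key):
    """Suggest a value only if every file that has the tag agrees on it."""
    seen = {tags[key] for tags in tag_dicts if tags.get(key)}
    if len(seen) == 1:
        return seen.pop()
    return None


def suggest_from_path(folder):
    """Fallback guess, assuming the folder is laid out as .../Artist/Album."""
    parts = os.path.normpath(folder).split(os.sep)
    if len(parts) < 2:
        return None, None
    return parts[-2], parts[-1]


def suggestions_for(folder, tag_dicts):
    """Combine both strategies: unanimous tags first, folder names second."""
    path_artist, path_album = suggest_from_path(folder)
    artist = suggest_value(tag_dicts, "artist") or path_artist
    album = suggest_value(tag_dicts, "album") or path_album
    return artist, album
```

A driver loop would call `read_tags` on each file in a folder, print what it found, show the result of `suggestions_for` as the default, and prompt for a correction before writing anything back.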

The final (at least until I get my CDs out of storage) stop on my music reclamation path was in Pueblo. I visited Tam for a night, petting cats I haven't seen for over a year, talking about life, providing advice on packing lighter, and so forth. She'd deleted some music she didn't like or that took up a lot of space, but most of my circa December 2007 mp3 collection was intact. Hooray!

Current library stats (including a few thousand missing songs, excluding some audio books, but including others I haven't reclassified):
178,731 tracks
495 days, 17 hours, 21 minutes, 59 seconds
1,021.56 GB
8,158 distinct artist names
12,673 distinct album titles (lots of Some Album Disc 1/2 titles need to be fixed)
845 genres, though some are frivolous

Largest folders:
Various Artists    11.82
Unknown Artist      9.95
Frank Zappa         7.63
Grateful Dead       6.82
Arte Flamenco       5.8
King Crimson        3.45
Martin Simpson      2.58
Pink Floyd          2.44
Tom Waits           2.38
Cocteau Twins       2.08
Yo-Yo Ma            2.01
Bob Dylan           1.99
Kronos Quartet      1.96
Joni Mitchell       1.96
John Hartford       1.95
Pablo Casals        1.93
Lonnie Johnson      1.92
Johnny Cash         1.84
John Fahey          1.84

Time spent on this project: Way too much.
Next project: Figuring out a genre/world music arrangement scheme so when I'm in the mood for something I have some idea what my options are.