Another HDD bites the dust… I thought I was prepared. It turns out that there are always unexpected roadblocks when dealing with digital data.
Most recently, I decided to use a RAID 1 scheme (mirroring) as an additional piece of my data security plan. I also have a remote Time Machine backup. The Time Machine disk is smaller than my RAID, so I only back up irreplaceable data, like photos and documents, to Time Machine. The remainder, like MP3s and DVDs, I left to the RAID for protection. In the worst-case scenario, I can always re-rip a CD or DVD.
Lesson 1: RAIDs aren’t backups!
Things had been going fine, and I’d been living happily with this setup for about 3 years before disaster struck. My disk disappeared while I was working, which caused all sorts of weird behavior. One of the RAID slices had failed. The failure was so bad that I couldn’t even read any SMART data from the disk. I also couldn’t remount the RAID for some reason. I shut down, removed the failed HDD, and rebooted. The RAID disk reappeared!
But there’s a catch… I did a quick check to see whether the data on the disk looked current. It turns out that it was not. In fact, it was 6 months out of date! How did that happen?? The only explanation is that the RAID failed silently 6 months ago: this disk dropped out of the mirror back then, and the disk that just died was the only one still receiving writes. I’d been operating on only one disk this whole time!
Lesson 2: Mac OS doesn’t inform you of RAID failure!
In fact, Disk Utility (in El Capitan) doesn’t show problems with RAID sets at all. The only way to verify a RAID set’s health is to check manually with a command in the Terminal, or to use the Yosemite Disk Utility.
$ diskutil AppleRAID list
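Since nothing warns you automatically, a small check that can be run on a schedule helps. Here is a rough sketch, assuming the command prints a line of the form "Status: Online" for each healthy set (I have not verified this layout across macOS versions, so treat the pattern as an assumption). The function name raid_is_healthy is made up for illustration.

```shell
#!/bin/sh
# Sketch: decide whether a RAID set looks healthy from the text that
# `diskutil AppleRAID list` prints. ASSUMPTION: each set reports a line
# starting with "Status:" whose value is "Online" when all is well.

# Reads the diskutil output on stdin.
# Returns 0 only if at least one Status line was seen and all say Online.
raid_is_healthy() {
    awk '
        /^Status:/ { seen = 1; if ($2 != "Online") bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Hypothetical usage on a Mac (not executed here):
#   diskutil AppleRAID list | raid_is_healthy || echo "RAID needs attention"
```

Treating "no Status line found" as a failure is deliberate: a check that quietly passes when it cannot read anything would repeat the silent-failure problem.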
I promptly ordered a new HDD. My plan for recovery is to rsync the data from the out-of-date disk to the new disk, and then rsync the more recent backup from Time Machine on top of it. That will ensure that I have the most coverage of my data.
$ rsync -aPE /Volumes/Old\ Volume/ /Volumes/New\ Volume
When rsyncing, I use the options ‘a’, ‘P’, and ‘E’. The ‘a’ is for ‘archive’, which recurses into directories and preserves symlinks, permissions, modification times, and ownership. ‘P’ displays progress (and keeps partially transferred files), which is helpful for gauging how the copy is going. ‘E’ is for extended attributes in the rsync that ships with Mac OS. I use this because I still have OS 9 files hanging around with resource forks that I am trying to preserve.
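The two-pass plan can be sketched as a small function. This is only an illustration under a few assumptions: the function name merge_restore and the RSYNC override variable are mine, and the paths are placeholders. Also note that ‘-E’ means extended attributes only in the older rsync bundled with Mac OS; in rsync 3.x it means "preserve executability" and ‘-X’ is the extended-attributes switch.

```shell
#!/bin/sh
# Sketch: copy the stale mirror first, then layer the newer backup on top
# so that newer files overwrite older ones.
# RSYNC may be overridden (for example RSYNC=echo) to preview the commands.

merge_restore() {
    old_src=$1   # out-of-date RAID member
    new_src=$2   # more recent backup
    dest=$3      # freshly installed disk

    # A trailing slash on the source copies its contents rather than the
    # directory itself. Stop if the first pass fails.
    "${RSYNC:-rsync}" -aPE "$old_src/" "$dest" || return 1
    "${RSYNC:-rsync}" -aPE "$new_src/" "$dest"
}

# Hypothetical usage (not executed here):
#   merge_restore "/Volumes/Old Volume" "/Volumes/Backup Copy" "/Volumes/New Volume"
```

The order matters: whatever is copied second wins wherever both sources contain the same file.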
Everything goes according to plan, and all my data is merged onto the new HDD! I reboot, log in, and then discover a new hiccup: I can’t write anything to my user folder. I can’t load preferences or save anything. I can’t even touch a file!
$ touch ~/test.txt
touch: ~/test.txt: Permission denied
I look at the permissions of my folder to make sure everything looks fine, and indeed it does!
$ ls -ld ~
drwxr-xr-x@ 99 michael staff 3366 13 Sep 14:47 /Users/michael/
I can’t wrap my head around what could possibly be wrong! If the permissions are right, and the user settings are right, what could the problem be?? My brother is smart enough to have me try listing with the ‘e’ option, which shows a file’s access control list (ACL).
$ ls -lde ~
drwxr-xr-x@ 99 michael staff 3366 13 Sep 14:47 /Users/michael/
0: group:everyone deny add_file,delete,add_subdirectory,delete_child,writeattr,writeextattr,chown
Lesson 3: Time Machine puts protective ACLs on everything!
Access Control Lists (ACLs)… I hate ACLs… A wonderful combination of events led to every file being protected from writes by all users via ACLs. First of all, I didn’t know this, but Time Machine protects all the backed-up files using ACLs. This keeps users from unwittingly deleting things from their backups. It makes sense, but it isn’t something I would have expected to be done at the file level. The second ingredient in this predicament is that I used the ‘-E’ option in rsync, which copies the extended attributes, and the ACLs came along for the ride. If I had left that off, this wouldn’t have been an issue…
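To get a feel for how widespread the problem is, the ACL entries in a long listing can be counted. The sketch below assumes entries look like the one in the listing above: an index, a colon, a principal, then "allow" or "deny" and a list of rights. The function name count_acl_entries is made up for illustration.

```shell
#!/bin/sh
# Sketch: count ACL entry lines in `ls -le` style output read from stdin.
# ASSUMPTION: an entry looks like " 0: group:everyone deny delete,..."

count_acl_entries() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps that case from looking like an error.
    grep -cE '^[[:space:]]*[0-9]+: .+ (allow|deny) ' || true
}

# Hypothetical usage on a Mac (not executed here):
#   ls -leR ~ | count_acl_entries
```

A count of 0 after cleaning up is a quick sanity check that nothing was missed.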
Fortunately, there is an easy solution. Remove all the ACLs from everything! This is done with chmod.
$ sudo chmod -RN ~
This takes a while, but it works. My user folder is usable again: I can write and save documents, read my Photos library, and so on. But there’s one more issue… I notice that Homebrew is acting funny. I look at my Homebrew directory and discover that none of the symbolic links have had their ACLs removed.
$ ls -le libpng-config
lrwxr-xr-x+ 1 michael 501 41 Aug 16 22:50 libpng-config@ -> ../Cellar/libpng/1.6.31/bin/libpng-config
0: group:everyone deny write,delete,append,writeattr,writeextattr,chown
Lesson 4: ACLs can’t be removed from Symbolic Links!
It turns out that using chmod to remove the ACLs from symlinks doesn’t work. The man page suggests that the -h option should be used to make chmod act on the link itself:
-h If the file is a symbolic link, change the mode of the link itself rather than the file that the link points to.
But that doesn’t work either.
$ sudo chmod -hN libpng-config
$ ls -le libpng-config
lrwxr-xr-x+ 1 michael 501 41 Aug 16 22:50 libpng-config@ -> ../Cellar/libpng/1.6.31/bin/libpng-config
0: group:everyone deny write,delete,append,writeattr,writeextattr,chown
Searching online yields similar stories. It sounds like this is a long-standing bug. Time to pull out the big guns… A one-line bash script to erase and recreate all symlinks! (The -print0 and read -d '' pair keeps file names with spaces or odd characters intact.)
$ find . -type l -print0 | while IFS= read -r -d '' link; do echo "$link"; dest=$(readlink "$link"); sudo rm "$link"; ln -s "$dest" "$link"; done
It works like a charm. I run that across all the data that came from Time Machine, and it looks like I am back in business. Full recovery!
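For reuse, the same idea can be wrapped in a function that takes the directory as an argument. This is a sketch rather than a polished tool: the name relink_tree is made up, it does not call sudo itself, and the recreated links get new timestamps and the current user as owner.

```shell
#!/bin/sh
# Sketch: delete and recreate every symlink under a directory, keeping the
# exact target text, so that each new link starts without an inherited ACL.

relink_tree() {
    root=${1:-.}
    # Hand the links to an inner shell in batches; passing them as
    # arguments keeps spaces and newlines in file names intact.
    find "$root" -type l -exec sh -c '
        for link do
            dest=$(readlink "$link") || continue   # stored target, verbatim
            rm "$link" && ln -s "$dest" "$link"    # same target, fresh link
            printf "%s -> %s\n" "$link" "$dest"
        done
    ' sh {} +
}

# Hypothetical usage (not executed here):
#   relink_tree /usr/local
```

Because the target string is copied as-is, relative links such as ../Cellar/... keep pointing at the same place.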
I need to change the way I protect my data. Two things I am going to do differently going forward:
- I won’t rely on RAID 1. The lack of notification when a slice fails is unacceptable. I will now use a weekly rsync cron job to sync my data to a second drive in my computer. Having a fully duplicated drive is handy because you can switch to it immediately in the event of a failure, and not have to go through a lengthy restoration process.
- Full backup to Time Machine. Although I didn’t suffer (major) data loss, I think it’s worth buying a larger HDD to create a full Time Machine backup, and not just the irreplaceable documents.
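The weekly sync from the first item could look something like the sketch below. Everything specific here is an assumption for illustration: the function name weekly_sync, the RSYNC override variable, the script path, and the volume names. I leave off ‘-E’ on purpose so that ACLs and extended attributes don’t tag along again; add it back if resource forks matter on the mirror.

```shell
#!/bin/sh
# Sketch of a weekly mirror job. A crontab entry like the following would
# run a script containing it every Sunday at 03:00
# (fields: minute hour day-of-month month day-of-week):
#   0 3 * * 0 /usr/local/bin/weekly_sync.sh

weekly_sync() {
    src=$1
    dest=$2

    # Refuse to run if the destination is missing (for example, the drive
    # is not mounted), so the failure is loud instead of silent.
    if [ ! -d "$dest" ]; then
        echo "weekly_sync: destination $dest not found" >&2
        return 1
    fi

    "${RSYNC:-rsync}" -a "$src/" "$dest"
}

# Hypothetical call at the end of the script (not executed here):
#   weekly_sync "/Volumes/Data" "/Volumes/Mirror"
```

cron normally mails the job’s output to the local user, so the error message at least has somewhere to go, which is more than the RAID ever offered.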
So, there you have it. Hopefully this will answer questions for other people out there, especially about ACLs!