19 February 2013

Backups: Your Relationship to Your Data and Your IT Guy

My plan for this blog was a series of tips, tricks and suggestions for good practices in using the various pieces of technology I work with on a daily basis, as well as my thoughts on using them to support good practices in science. Today, though, I have a few tips not just on the technology side but also on the social side, all inspired by an email I got this morning (loosely translated and anonymized):

Good Morning [my name misspelled], 
I'm turning to you, because I don't know where else to go. I somehow -- I really don't know how -- managed to delete my pictures folder, and no, I don't have a TimeMachine Backup...[everything is gone, list of important events whose pictures are gone]
I tried the trial version of Data Rescue 3 and saw that the pictures are still "there", but the trial version will only rescue a small amount of data. I really can't afford the 50€ to buy the program at the moment, and who knows, how often you [impersonal -- this is clear in the original] really need it. I'll definitely start using TimeMachine now!
I've asked around and don't know anybody who has such a program. Can you help me? 
It would be really great if you could, because those are really unique and precious memories for me. I'll gladly make sure that you get good, strong coffee this month and next. [This last part sounded better in the original, but the literal meaning is correct.]
Best,
Anna [name changed]

The person in question is a passing acquaintance who worked as the student assistant for a workgroup on my floor, where a few of my friends work, and the computer in question is her personal machine. Finally, according to my contract I'm not actually an IT person; I was more or less drafted into it because I can do it and generally enjoy working with computers.

So that's the baseline information. Now on to what we can learn from all this.  I'm going to discuss:
  1. How deletion works and why programs like Data Rescue 3 can (sometimes) undelete
  2. What this means if you find yourself needing such a program
  3. Why you should still be using real backup software, and a few recommendations on that front (i.e. there is no excuse for not backing up, given the utilities built into modern OSes)
  4. What we can learn from Anna's experience in terms of dealing with your IT guy (and for the IT guys: how not to come off as a jerk without getting abused by coworkers)
This is clearly going to be a long one…


How deletion works and why programs like Data Rescue 3 can (sometimes) undelete

On the majority of filesystems out there today, there is some sort of "table" which lists files and their locations on disk. (Sorry, I know I'm massively oversimplifying this, but if you know how filesystems work, you should probably skip this section anyway.) The locations are actually fixed-size chunks of the disk (think of them as rooms in a building), and a file may be bigger than a single chunk, in which case it's spread out over several locations. These locations don't have to be adjacent. For example, a file might "grow" after the location immediately following it has already been filled, so the rest of the file has to be written somewhere else. (Think of a room whose storage capacity is exceeded while the rooms next door are already taken.) This is the "fragmentation" that defragmentation tries to get rid of: clearly, it's more efficient to read the data when it's all in a row, at least on traditional hard drives. (SSDs don't really suffer from this problem, but that's a discussion for another time.)

When you request a file, its locations are looked up in the table and the data is fetched for you. When you write a file, you can find the free spots on the drive by looking at the same table. (Again, an oversimplification, but it works for now.) So when you delete a file, you don't actually need to go out to the location on disk and erase the data there; you can just delete the entry in the table and mark those disk locations as "free". (Sometimes you see an option for "secure delete" or something similar; in that case, the data actually is overwritten with some pattern.)
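To make the table idea concrete, here is a toy model in Python. It is a sketch under heavy assumptions (a made-up, absurdly small block size, first-fit allocation, no directories, no journaling) and not how HFS+, NTFS or ext4 actually store things, but it shows the key point: an ordinary delete only removes the table entry, while the bytes stay put until something overwrites them.

```python
"""Toy filesystem: equal-size blocks plus a table of which blocks each file uses."""

BLOCK_SIZE = 4  # bytes per block; absurdly small so the demo stays readable


class ToyDisk:
    def __init__(self, n_blocks: int) -> None:
        self.blocks = [b"\x00" * BLOCK_SIZE for _ in range(n_blocks)]
        self.table: dict[str, list[int]] = {}  # file name -> block numbers, in order
        self.free = set(range(n_blocks))       # block numbers marked as free

    def write(self, name: str, data: bytes) -> None:
        # Split the data into block-sized chunks; each chunk goes into the
        # lowest-numbered free block, so a file's blocks need not be adjacent.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        if len(chunks) > len(self.free):
            raise OSError("disk full")
        used = []
        for chunk in chunks:
            block = min(self.free)
            self.free.remove(block)
            self.blocks[block] = chunk.ljust(BLOCK_SIZE, b"\x00")
            used.append(block)
        self.table[name] = used

    def read(self, name: str) -> bytes:
        # Look the blocks up in the table and stitch them together
        # (stripping the zero padding is fine for this text-only toy).
        return b"".join(self.blocks[b] for b in self.table[name]).rstrip(b"\x00")

    def delete(self, name: str) -> None:
        # Ordinary delete: forget the table entry and mark the blocks free.
        # The bytes in those blocks are not touched at all.
        self.free.update(self.table.pop(name))

    def secure_delete(self, name: str) -> None:
        # "Secure delete": overwrite the blocks with a pattern first.
        for b in self.table[name]:
            self.blocks[b] = b"\x00" * BLOCK_SIZE
        self.delete(name)


if __name__ == "__main__":
    disk = ToyDisk(8)
    disk.write("a.txt", b"hello world!")  # lands in blocks 0, 1 and 2
    disk.delete("a.txt")
    print("a.txt" in disk.table)  # False: the file is "gone"...
    print(disk.blocks[:3])        # [b'hell', b'o wo', b'rld!']: ...but the data isn't
```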

On modern OS X, Windows and, depending on your choice of desktop, Linux, there is an extra level of indirection. You first move things to the "Trash" or "Recycle Bin", where they aren't really deleted, just stored in a special location in the directory structure. If you then decide to delete them for good, you empty the waste storage container in question, and only then are the files removed from the allocation table and their locations on disk marked as free.

Programs like Data Rescue 3 take advantage of all this when recovering deleted files. Instead of looking at the allocation table, these programs search the sections marked as free looking for bits and pieces of deleted files. If they can assemble an entire file, it's then listed as recoverable and you're given the option of getting it back. 
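Here's a rough sketch of that kind of search, usually called "file carving", for JPEG pictures. It's my own illustration and not how Data Rescue 3 (or any other product) actually works: it assumes you already have the raw bytes of the free space, that each picture was stored in one contiguous run, and that nothing has overwritten it. JPEG files begin with the bytes FF D8 FF and end with FF D9, so the function just looks for those markers.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus the next marker's first byte
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(raw: bytes) -> list[bytes]:
    """Return every byte run in `raw` that starts with a JPEG start marker and
    ends at the next end marker. Assumes contiguous, un-overwritten files."""
    found = []
    pos = 0
    while True:
        start = raw.find(JPEG_START, pos)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # start marker without an end: the rest was probably overwritten
        found.append(raw[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found


if __name__ == "__main__":
    # Made-up "free space": some junk around one tiny fake picture.
    free_space = b"junk" + b"\xff\xd8\xff\xe0 fake picture \xff\xd9" + b"more junk"
    for picture in carve_jpegs(free_space):
        print(len(picture), "bytes recovered")
```

Real JPEGs often carry an embedded thumbnail with its own end marker, so stopping at the first FF D9 can cut a picture short; that's one reason real recovery tools parse the file structure rather than matching markers naively.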


What this means if you find yourself needing such a program

Too long, didn't read: don't do anything with that computer, because almost everything can lead to a disk write, and any disk write can overwrite your deleted files, which now sit in the area marked as "free"!

This is a subtle but important point: if you use your web browser, it writes to its disk cache. If you use your email client, it also has a local disk cache. Your operating system keeps various logs, which means disk writes. Even your music program writes to disk to keep track of things like play counts. If you're really lucky, there is still a lot of free space on your disk and the portion with your files on it won't come up for a write. But there's a lot of luck involved there, even if your disk is mostly empty. If your disk was nearly full, then every second you keep using it is likely eating away at pieces of your data.
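To put a rough number on the luck involved, here is a back-of-the-envelope model. It assumes, unrealistically, that the filesystem picks free blocks uniformly at random for new writes (real allocators have their own strategies), and all block counts in the example are made up. If F blocks are free, the deleted file occupies d of them, and w new blocks get written, the chance that the file survives untouched is C(F - d, w) / C(F, w), computed here as a running product so the numbers stay small.

```python
def chance_file_survives(free_blocks: int, file_blocks: int, written_blocks: int) -> float:
    """Probability that none of a deleted file's blocks is reused when
    `written_blocks` new blocks are allocated uniformly at random from the
    free ones. Equivalent to comb(F - d, w) / comb(F, w)."""
    if not 0 <= written_blocks <= free_blocks:
        raise ValueError("must write between 0 and free_blocks blocks")
    p = 1.0
    for i in range(written_blocks):
        # The (i+1)-th new block must avoid the deleted file: free_blocks - i
        # candidates remain, of which free_blocks - file_blocks - i are safe.
        p *= max(free_blocks - file_blocks - i, 0) / (free_blocks - i)
    return p


if __name__ == "__main__":
    # Made-up numbers: a 1,000-block photo, 25,000 blocks of cache/log writes.
    print(chance_file_survives(50_000_000, 1_000, 25_000))  # mostly empty disk
    print(chance_file_survives(250_000, 1_000, 25_000))     # nearly full disk
```

With those made-up numbers (25,000 blocks is roughly 100 MB with 4 KB blocks), the photo on the mostly empty disk survives with a probability of about 60%, while on the nearly full disk its chance is essentially zero. A whole folder of photos occupies far more blocks, so the chance that every one of them survives is much lower still.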

So don't touch the computer except to do the recovery! Don't even download the recovery software, or look it up online, with that computer! Even installing the recovery software risks overwriting valuable data! (This is why the better ones offer the option of running directly from a CD or the like.)

Why you should still be using real backup software, and a few recommendations on that front (i.e. there is no excuse for not backing up, given the utilities built into modern OSes)

So, as you can see, recovery software is an iffy proposition at best. And much more catastrophic things can happen to your data: your drive could die (there is a long list of horrible and sudden ways for disks of all kinds to die, and the operating environment of a laptop only makes it more likely). Adding redundancy to your disks doesn't solve the problem either. Even the best RAID setup won't save you from accidental deletion (because the redundancy is only in space and not in time), nor from the problems that arise when additional disks fail during the rebuild process (which happens more often than you think).
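The difference between redundancy in space and redundancy in time fits in a few lines of Python. This is a deliberately silly sketch: the two "disks" are plain dictionaries, and nothing about real RAID controllers or backup software is modelled. The point is just that a mirror faithfully copies your mistakes, while a snapshot taken earlier does not.

```python
"""Redundancy in space (a mirror) versus redundancy in time (snapshots)."""


class Mirror:
    """Two identical copies; every change, including a deletion, hits both."""

    def __init__(self) -> None:
        self.disks: list[dict[str, bytes]] = [{}, {}]

    def write(self, name: str, data: bytes) -> None:
        for disk in self.disks:
            disk[name] = data

    def delete(self, name: str) -> None:
        for disk in self.disks:
            disk.pop(name, None)


class Snapshotted:
    """One live copy plus frozen copies taken at earlier points in time."""

    def __init__(self) -> None:
        self.live: dict[str, bytes] = {}
        self.snapshots: list[dict[str, bytes]] = []

    def write(self, name: str, data: bytes) -> None:
        self.live[name] = data

    def delete(self, name: str) -> None:
        self.live.pop(name, None)

    def snapshot(self) -> None:
        # bytes are immutable, so a shallow copy of the dict is a full freeze
        self.snapshots.append(dict(self.live))


if __name__ == "__main__":
    mirror, snapped = Mirror(), Snapshotted()
    for store in (mirror, snapped):
        store.write("thesis.tex", b"Chapter 1 ...")
    snapped.snapshot()                      # e.g. last night's backup
    for store in (mirror, snapped):
        store.delete("thesis.tex")          # the accidental deletion
    print(any("thesis.tex" in d for d in mirror.disks))  # False
    print("thesis.tex" in snapped.snapshots[-1])         # True
```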

Now, I'm a huge fan of ZFS: it provides protection against silent corruption, its copy-on-write snapshots provide a TimeMachine-like history for your files, and its method of rebuilding makes it quite a bit more robust than traditional RAID. But it's not ready for the average home user, and a lot of its advantages only become apparent in a multiple-disk, server-type setup, which means it's not going to help you on your average laptop.
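For the curious, the copy-on-write idea behind those snapshots can be sketched as a toy. This is not how ZFS is actually implemented (no checksums, no on-disk tree, and unreferenced blocks are never freed here); it only shows why a snapshot is cheap: it copies the list of block pointers, and because changed data always goes into a fresh block, the old pointers keep seeing the old data.

```python
from typing import Optional


class CowStore:
    """Toy copy-on-write store: a data block, once written, is never modified."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []                         # block pool
        self.live: dict[str, list[int]] = {}                  # file -> block ids
        self.snapshots: dict[str, dict[str, list[int]]] = {}  # label -> frozen table

    def _alloc(self, data: bytes) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1

    def write_file(self, name: str, chunks: list[bytes]) -> None:
        self.live[name] = [self._alloc(c) for c in chunks]

    def modify_chunk(self, name: str, index: int, data: bytes) -> None:
        # Copy-on-write: the new data goes into a fresh block and only the
        # live pointer list changes; the old block stays for the snapshots.
        pointers = list(self.live[name])
        pointers[index] = self._alloc(data)
        self.live[name] = pointers

    def snapshot(self, label: str) -> None:
        # Copies pointer lists only; the data blocks themselves are shared.
        self.snapshots[label] = {n: list(p) for n, p in self.live.items()}

    def read(self, name: str, snapshot: Optional[str] = None) -> bytes:
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return b"".join(self.blocks[i] for i in table[name])


if __name__ == "__main__":
    store = CowStore()
    store.write_file("notes.txt", [b"alpha ", b"beta"])
    store.snapshot("monday")
    store.modify_chunk("notes.txt", 1, b"gamma")
    print(store.read("notes.txt"))            # b'alpha gamma'
    print(store.read("notes.txt", "monday"))  # b'alpha beta'
    print(len(store.blocks))                  # 3: only the changed chunk is new
```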

Fortunately, both Windows and OS X have offered passable built-in backup systems for several years. Windows has supposedly had one for a long time, but starting with Windows 7 it has become quite usable. I haven't had much experience testing it myself (luckily, the disk in the machine I use it on hasn't shown any problems yet), but I've heard generally good things about it. On OS X, you have TimeMachine, which is really the easiest thing you could imagine. The only thing you have to watch out for is that TimeMachine deletes the oldest backups when the backup disk gets full, so the disk needs to be big enough to hold as much history as you want to keep. Both provide coarse-grained historical versioning, so you can even revert a file that you changed in an undesirable way. (Though for such things, you really should consider getting into the DVCS groove.) In the newest versions of OS X, you even get the versioning bit to some extent without TimeMachine.
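Apple describes TimeMachine's schedule as hourly backups for the past 24 hours, daily backups for the past month and weekly backups for all previous months, with the oldest ones deleted when the disk is full. Here is a sketch of that kind of thinning. It's my own approximation, not Apple's code: "a month" is taken as 30 days, and keeping the earliest backup of each day or week is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def backups_to_keep(backups: list[datetime], now: datetime) -> list[datetime]:
    """Thin backup timestamps: keep everything from the last 24 hours, one per
    calendar day for the last 30 days, and one per ISO week before that."""
    keep = []
    seen_days = set()
    seen_weeks = set()
    for ts in sorted(backups):  # oldest first, so the earliest one wins
        age = now - ts
        if age <= timedelta(hours=24):
            keep.append(ts)
        elif age <= timedelta(days=30):
            if ts.date() not in seen_days:
                seen_days.add(ts.date())
                keep.append(ts)
        else:
            week = tuple(ts.isocalendar())[:2]  # (ISO year, ISO week number)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.append(ts)
    return keep


if __name__ == "__main__":
    now = datetime(2013, 2, 19, 12, 0)
    hourly = [now - timedelta(hours=h) for h in range(60 * 24)]  # 60 days, hourly
    print(len(hourly), "backups ->", len(backups_to_keep(hourly, now)), "kept")
```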

If you're using your computer at work or for work, many large organizations offer a central backup service. While it might not be practical to do the initial backup from anywhere but the internal network, incremental backups mean that small changes don't take long to back up, even from home.
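Here's a sketch of why small changes are cheap to back up: split the file into chunks, remember a hash of each chunk, and next time only send the chunks whose hash changed. The chunk size and the use of SHA-256 are arbitrary choices for the example, and I'm not claiming any particular backup service works exactly this way. Fixed-size chunks also have a known weakness: inserting a single byte near the start shifts every later chunk, which is why tools like rsync use rolling checksums instead.

```python
import hashlib

CHUNK_SIZE = 4096  # bytes; an arbitrary choice for this sketch


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size chunk of `data`."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def changed_chunks(old_hashes: list[str], new_data: bytes,
                   size: int = CHUNK_SIZE) -> list[int]:
    """Indices of chunks in `new_data` that differ from the chunk remembered at
    the same position, i.e. the only ones that need to be sent again."""
    new_hashes = chunk_hashes(new_data, size)
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]


if __name__ == "__main__":
    original = bytes(10 * 1024 * 1024)   # 10 MiB of zeros stands in for a big file
    remembered = chunk_hashes(original)  # what the backup side already knows
    edited = bytearray(original)
    edited[5_000_000] = 1                # change a single byte
    print(changed_chunks(remembered, bytes(edited)))  # [1220]: one 4 KiB chunk to send
```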

Especially if you're working on something valuable like a presentation or your thesis, use a service like Dropbox, SpiderOak or Wuala. Dropbox is probably the easiest to use, but the others have their advantages too, especially if you're concerned about the privacy of cloud storage. These services all offer versioning (if rather coarse-grained temporally), so you can restore older versions of your files (for Dropbox, you have to use the web interface for this), and even if your computer, apartment, etc. spontaneously combusts, you still have a copy somewhere. (Oh yeah: if you're really paranoid, you should keep at least one backup offsite, just in case some physical calamity, like a burst pipe or a fire, befalls both your computer and your backup disk.)

If you need versions, I really, really recommend looking into DVCS, for a variety of reasons. I use Mercurial most of the time, but Git and Bazaar are also quite good. Joel Spolsky gives perhaps the best introduction to all the concepts involved. SparkleShare provides a Dropbox-like interface for Git, and there are all sorts of graphical tools for the different systems that hide the command line but not the underlying abstractions (which is better if you're doing anything nonlinear).

What doesn't work is just copying your home folder over to an external drive every couple of days. Let's be honest: if it's not automatic, you're not going to have truly regular backups. Further, you're either wasting a lot of space on redundancy by keeping multiple full copies for each day's, week's or month's backup, or you're continuously overwriting the old backup, which means that you only ever have one snapshot. If you only notice an accidental deletion after your next copy, then your one snapshot won't have the files either. In both cases, you're wasting a lot of time on full copies when most of the time you only need to record the comparatively small changes. (This is how things like TimeMachine work so efficiently.)
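The efficient alternative can be sketched with hard links, the same basic trick behind rsync's --link-dest option and TimeMachine's snapshots on HFS+ disks: every snapshot looks like a full copy, but files that haven't changed since the previous snapshot are hard links to the old copy, so they cost almost no time or space. This is a minimal sketch that assumes a filesystem with hard-link support and a new, empty destination directory, decides "unchanged" from size and modification time alone, and doesn't handle symlinks, permissions or errors. Never edit files inside a snapshot: because of the hard links, that would change every snapshot sharing them.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy `source` into the new, empty directory `dest`. Files whose size and
    modification time match the copy in `previous` (an older snapshot) are
    hard-linked to it instead of being copied again."""
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        target = dest / rel
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file():
            s, o = src.stat(), old.stat()
            # Whole seconds only, since backup disks may store coarser timestamps.
            if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                os.link(old, target)  # unchanged: share the previous copy's data
                continue
        shutil.copy2(src, target)  # new or changed: copy data plus timestamps
```

Usage would look something like `snapshot(Path.home() / "Pictures", Path("/Volumes/Backup/2013-02-19"), previous=Path("/Volumes/Backup/2013-02-18"))`; those backup paths are made up for the example.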


What we can learn from Anna's experience in terms of dealing with your IT guy (and for the IT guys: how not to come off as a jerk without getting abused by coworkers)

So, Anna's bottom line was "Livius can help me and I'll save 50€." Now, my after-tax wage is around 15€/hour (officially, i.e. under the assumption that I work exactly the number of hours in my contract, which is a horrible joke for doctoral students). So if I take more than about 3 hours and 20 minutes for this little project, then the cost to me, in terms of pay and ignoring things like time stress for my real job, equals the amount she would have paid for the software. And she hopes to reimburse me with coffee?
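The break-even point, spelled out:

```latex
t_{\text{break-even}} = \frac{50\ \text{EUR}}{15\ \text{EUR/h}} = 3.\overline{3}\ \text{h} \approx 3\ \text{h}\ 20\ \text{min}
```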

The problem here is of course the relative price of my time and the value of her pictures. Clearly, those pictures mean a lot to her. And clearly, 50€ for a one time thing is also an expensive proposition. But my time is expensive, too. And I've made it far enough that I don't depend on coffee donations to make it. At least take me to a nice lunch. Wait, scratch that. I don't want to eat lunch with you, I want to eat in peace. Bring me a nice lunch from my favorite Indian place so that I have something to do while babysitting the recovery process, and go read a book somewhere until I'm done (but do leave me your password or the password to a temporary admin account).  

This is why your IT guy is so grumpy. And when your IT guy isn't actually getting paid to do IT for the computers at work, but is still forced to do so, and then you bring your personal computer with a problem caused by not following good practice and being too cheap to pay for your mistakes, and you expect him to make it all magically go away for a cup of coffee, it gets irritating. Sometimes, there is a bit of two-minute magic, and then it's less annoying. But anything involving large amounts of data is going to take several hours.  

So, what can you do as a non-tech person?

Well, first, stop making excuses and start making backups. Second, follow the advice in the Cracked articles I linked up there. The Cracked style can be a bit … offensive at times, but it's solid advice. Don't get panicky or hovery; just be honest and nice, and understand that you are asking for a major favor. And finally, don't be cheap. If you can't afford 50€, but you can do 20€, then offer to buy your tech guy that nice lunch and then leave him alone to eat in peace. If you can afford 50€, but are too cheap to spend it on that software, then take an honest look at just how valuable you think those pictures actually are. And if you can afford the software, but are scared of screwing something up while using it, then go to your tech guy and ask him for help, offering to buy the software. If you don't have any more money to motivate him, consider giving him the software license to keep as a small thank-you for helping you out.

So, what can you do as the tech person?

Slowly but surely, introduce and enforce some basic yet important policies. For me, this means that work machines are generally repaired for free, but depending on my relationship to you and the nature of the problem, I may bill you for repairing your personal machine. Also, if you have a problem involving a piece of software that I have explicitly said I don't support, or a problem that follows from ignoring my previous advice, then you get less sympathy and a lower spot on my to-do list. For certain repairs, I have a price sheet that splits my time into two separate price categories: "active work time" and "passive observation" (e.g. monitoring a process while I do something else on the side).
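As an illustration, a two-category price sheet boils down to a weighted sum of hours. The rates in the example are hypothetical placeholders, not my actual prices, and the names are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class RepairJob:
    """Hypothetical record of the time spent on one repair."""
    active_hours: float   # hands-on work that needs full attention
    passive_hours: float  # babysitting a process while doing something else


def quote(job: RepairJob, active_rate: float, passive_rate: float) -> float:
    """Total price under a two-category price sheet (rates are per hour)."""
    return job.active_hours * active_rate + job.passive_hours * passive_rate


if __name__ == "__main__":
    # Hypothetical: 1 h of setup plus 6 h of watching a recovery run,
    # at 15 per hour active and 3 per hour passive.
    print(quote(RepairJob(active_hours=1, passive_hours=6), 15, 3))  # 33
```

Splitting the categories like this keeps a long but mostly unattended job, such as a multi-hour recovery scan, affordable while still valuing the hands-on part properly.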

I used to feel bad about taking payment for tech services, but it seems to be the only way to make people realize that they are actually asking a nontrivial favor of me. Everybody thinks computer repair shop rates are high; part of the reason for that perception is how little people value computer guys' time, as evidenced by the kind of favors they ask of tech people. Working at an hourly rate, which is still less than the average repair shop charges despite at times offering better service, keeps things fair for everybody. But if you are charging somebody for service, you have to remember that part of service is being polite and less pissy. Roll your eyes the second they're gone, and absolutely remind them of best practices when telling them what went wrong. But don't assign blame (it really doesn't matter at that point), and be at least civil.

For coworkers, especially when you're an unofficial IT guy, introduce set "tech office hours". You're available to help during those times, but not outside of them for non-critical things. A system bursts into flames? You can help outside of office hours. They put something off and suddenly realize at the last moment that they can't get their script to work? Don't screw yourself over helping them: state clearly and honestly that you have your own things to worry about and that you'll help if you get your stuff done in time. It's tough love, but it seems to me that it's the only way to help without getting abused in the process.

Of course, all of these are somewhat influenced by the type of relationship you have with a person in and out of work. Your significant other should probably get tech support more or less at whim! 

Oh, and one final tip: be very clear about what can go wrong with certain repairs, and make sure to absolve yourself of liability (in writing if need be). Last summer, I repaired an ancient PowerBook whose ridiculously small screws had been tightened way too tight and quickly stripped. (I think there's a metaphor for Apple in there somewhere…) I wound up having to drill out one 5 mm-long screw located directly above the motherboard. Not fun. But at that point, the risk was worth it: if I didn't get that screw out, the system would remain completely dead anyway, so in some sense there was nothing to lose. But I was terrified that I would somehow be held liable if I screwed it up. So get it in writing.

I've only partly implemented these suggestions, but after today, I'm going to have to start following my own advice.

What became of Anna? Luckily, I found a piece of free (GPL) software that seems to be doing the trick as I write this.  You have to use it via the command line, which is a bit scary for some users, I'll admit, but it actually had informative and nice prompts as well as a decent wiki page. So, there was some magic to be had after all. But that coffee better be really good.
