Category: Preservation & Archiving

  • 1.7 Terabytes richer, 12,000 Songs Poorer

    Two weeks ago I discovered that a 400 gig hard drive crashed. Completely. Total data loss.

    I did not lose a lot of data, and almost all my personal data and important docs are safe, but I’m sure I lost several things I can’t remember. More importantly, my complete mp3 collection from 2001 is down the drain. As luck would have it, I lost my 40 gig mp3 player two weeks ago and was in fact waiting for my 80 gig Zune replacement.

    For several hours I was in a state of panic. I was afraid that my main documents partition was down.

    It became immediately clear to me that my data backup and storage solution was unacceptable.  I had the technology and the know-how, but not the attention span to ensure it was being executed properly. Details:

    • network backup.  I had a paid subscription to Mozy Backup, with unlimited storage space. Unfortunately, I used it less often than I should have because of my slow bandwidth. (I upgraded to 3.0 Mbps recently). Now it is perfectly clear how much bandwidth limits your ability to implement a backup solution.
    • DVD backup. I burn a DVD backup of my most critical data every 2 months or so. But I didn’t keep to a regular schedule, and it was pure luck that I had a backup from March.
    • keeping Storage media. Even when I remember to back up storage media, I often forget where things are. A few years ago, I implemented the 3 Copy rule. 3 copies of everything, each in separate CD cases. Amazing how easy it is to lose track of where I put things despite this rule.
    • addresses/financial information. I have a nifty password manager called EWallet, which exists both as a PDA and a desktop application. So even if I lose my PDA, the data is still safe on my desktop. (And vice versa). But what if I lose my PDA (which I did in March)? And what if my desktop hard drive fails before I reload the data onto the PDA (which is what happened)?
    • backup to external drives. This seems like a good solution. And it generally is. But I have three external drives and each behaves erratically. Two of them appear only as USB 1 drives (making it extremely time-consuming to copy lots of data onto them). My motherboard has an ESATA connection, but I’ve never managed to get it to work. Backup software isn’t usually good because it makes incremental backups rather than rsync-style mirrors (which is what I really want). If it’s going to take 20 minutes to make a backup, I probably will forget to do it.
    • USB eccentricities.  Originally people claimed USB was infinitely extensible, that Windows would just recognize everything like magic. Reality is less positive. Apparently USB external hard drives only work when the USB cable is hooked directly to the back of the machine instead of to one of the hubs. Sometimes when I remove a USB device, Windows either doesn’t notice it or doesn’t notice when it is reinserted. It is hard to tell Windows to rescan. One problem is that USB hubs aren’t particularly reliable. Also, they become hot (if they are powered). I frequently had problems with removing and inserting USB cables to a hub. Even the USB connections at the front of the machine don’t work particularly well. Frankly, the most important USB device on my PC is my network card, and that seems not to be as reliable as I would have thought. Finally, you have to be extra careful about removing hard drives. If you don’t use “Safely Remove Hardware”, the hard drive can fail, require CHKDSK or not properly throw files away.
    • Power Down. Nowadays I make a conscious decision to turn off my PC and turn off the air conditioning every day when I go to work. It saves me money and is good for the environment. The problem is that it decreases the window of time for doing backups. Is the extra cost in electricity really worth the security of backups?
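
    For illustration, here is a rough Python sketch of the "rsync-style" one-way mirror mentioned under external drives: copy only files that are new or changed, and delete files that no longer exist in the source. It is not rsync itself; the function names (needs_copy, mirror) and the size/modification-time comparison are assumptions made for this example.

```python
import os
import shutil


def needs_copy(src_file, dst_file):
    """True if dst_file is missing, differs in size, or is older than src_file."""
    if not os.path.exists(dst_file):
        return True
    s, d = os.stat(src_file), os.stat(dst_file)
    # Whole seconds only: some filesystems store coarser timestamps.
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)


def mirror(src, dst):
    """Make dst look like src (one way). Returns (copied, deleted) relative paths."""
    copied, deleted = [], []

    # Pass 1: copy new or changed files from src to dst.
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            if needs_copy(s, d):
                shutil.copy2(s, d)  # copy2 keeps the modification time
                copied.append(os.path.normpath(os.path.join(rel, name)))

    # Pass 2: walk dst bottom-up and remove anything that is gone from src.
    for root, dirs, files in os.walk(dst, topdown=False):
        rel = os.path.relpath(root, dst)
        for name in files:
            if not os.path.exists(os.path.join(src, rel, name)):
                os.remove(os.path.join(root, name))
                deleted.append(os.path.normpath(os.path.join(rel, name)))
        for name in dirs:
            if not os.path.isdir(os.path.join(src, rel, name)):
                os.rmdir(os.path.join(root, name))  # already emptied above

    return copied, deleted
```

    Because unchanged files are skipped, a second run right after the first should copy nothing, which is what keeps a routine backup from taking 20 minutes. Note that a mirror also propagates deletions, so it is no substitute for keeping an older copy somewhere else.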

    What was unhurt:

    • websites. Generally, if I updated something to a site, it is safe. (But dependent on the web host’s own backup solutions).
    • email. I have decided never to download my IMAP mail onto my client for various reasons.
    • music. Even though I lost a ton of music, most of the stuff I’ve been listening to in the last 5 years has been free or creative commons or public domain music. My main loss was the SXSW bit torrents. I have found the 2007 and 2008 files, but nothing else. More than anything else, I just need to know the names of the songs I lost. As luck would have it, I lost my 40 gig mp3 player at about the same time my hard drive crashed. Again, I used my

    What I lost:

    • I misplaced several installation CDs and also some applications I downloaded. Unfortunately, you can’t just download these things again. Sometimes the license is embedded in the download itself.
    • Time. So far I have lost about 12 hours of time on this nonsense. For almost 2 weeks my life has been a mess while I work through a solution. Right now for instance, I have three technical conundrums: 1) why isn’t my new Windows install booting? (I need the install CD to boot every time). 2) why does my new Linksys N USB wifi card not work as advertised? (it worked perfectly before). It works for a few minutes, then stops working totally. I suspect it may be related to the DSL service I may be using. 3) How do I update my Dell AXIM to the latest ROM update? I’m coming closer to an answer to that.
    • Money. I bought a replacement hard drive for $150 (1 Terabyte) and another external drive (only for backups).

    Long Term Solutions

    • Google Docs and Google Gears.

    During this past week I have been rendered helpless and unproductive in my apartment. I spent 8 hours yesterday trying to resolve boot, network and USB problems on my XP machine. I have not gotten my PDA to work, nor have I started another backup solution or tried to recreate the mp3 collection. Also, I have a copy of Vista Home Premium just waiting to be installed.

    Other problems:

    • Keeping better records of backups and licenses and a better reminder system
    • keeping my CDs/DVD’s stored in a safe place
    • Roaches!

    Up until now, I’ve adopted the rule of not worrying about system configuration when doing backups and instead focusing on writing–which is the only precious cargo I carry anyway. But the time, the time! (I thought Norton Ghost was overkill, but I seriously will reconsider that assumption).

    So my question: when is this PC going to start making my life easier again?

  • Backing up Important documents for emergencies

    Mike Panic on how to create a disaster-recovery/data backup usb drive of important documents. (Lifehacker has a more thorough but less helpful article on the same subject).

    Yesterday, I discovered three crazy things:

    1. my will was not in my safe (I had forgotten to leave it there)
    2. a typo in my will misstated the percentage divvied out to people. From the context it was clear what the right percentage should be. It was spelled out and written out numerically, so the discrepancy was apparent. Still, hell’s bells; lawyers need to catch these kinds of things!
    3. I use ewallet, a nifty password management program for my PDA. However, in the event of disaster/emergency, nobody would be able to access these things without my master password!
  • Welcome to 1922! (FAQ)

    (These FAQ are related to the Welcome to 1922! series I wrote for teleread. They list and describe some notable literary and artistic works produced between the years 1923 and 1931). I am not a lawyer and am not giving advice. I am merely sharing what I think to be accurate information at the time.

    How do I find out about which sound recordings from that year are in the public domain?
    How do I find paintings/visual works of art produced in an individual year? And how do I find public domain art in general?
    How can I find out if my congressman voted for this Freeze-the-public-domain legislation?

    (more…)

  • Tips on Saving & Protecting Your Digital Media

    About a year ago I did research into buying a media safe. There are really only two media safes/vaults priced for consumers (i.e., under $2000). Here they are. One costs $150; another costs $260.

    Spending money on a media safe seems extravagant, but we are collecting more kinds of media these days, and it’s unclear about whether we are taking enough steps to protect our data. (more…)

  • Brilliant Backup Strategy

    From a slashdot post.

    1. Buy 2 identically-sized USB hard drives.
    2. Write a script to backup your server onto one of them.
    3. After one week, unplug the USB drive, bring it to work, and put it inside a personal cabinet.
    4. Connect Drive 2 to the server. Let the script run normally on a one week rotation.
    5. After one week, bring Drive 2 to work and put it inside a personal cabinet. Bring Drive 1 home later that day.
    6. Either overwrite the original directory or slap things into a new directory. Repeat ad infinitum.

    So here’s what you have. You have incremental backups onsite and weekly backups offsite. All the rsyncing/sambaing is done within your personal network, so there are no security problems (other than the normal ones). You have hard drives which could be mounted by any server with a similar operating system. In the event of fire, you’d lose at most a week of work.
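
    The rotation above could be scripted roughly as follows. This is only a sketch under stated assumptions: the slashdot post does not give a script, so the function names (drive_for_week, run_backup), the "backup" directory name, and the alternate-by-calendar-week rule are all invented for illustration. It also makes a plain full copy rather than an incremental one.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def drive_for_week(day):
    """Return 1 or 2: which drive should be plugged in during the week of `day`."""
    weeks_since_epoch = (day.toordinal() - 1) // 7  # weeks run Monday to Sunday
    return 1 if weeks_since_epoch % 2 == 0 else 2


def run_backup(source, drive_root, day, overwrite=True):
    """Copy `source` onto the mounted drive and return the target directory.

    overwrite=True replaces the single "backup" directory (step 6, first option);
    overwrite=False writes a new dated directory (step 6, second option).
    """
    source, drive_root = Path(source), Path(drive_root)
    if overwrite:
        target = drive_root / "backup"
        if target.exists():
            shutil.rmtree(target)
    else:
        target = drive_root / f"backup-{day.isoformat()}"
    shutil.copytree(source, target, dirs_exist_ok=True)  # needs Python 3.8+
    return target
```

    A scheduled job could call run_backup(source, mount_point, date.today()) each night and use drive_for_week(date.today()) to print a reminder of which drive belongs at home that week. Counting weeks from a fixed starting point, rather than using the week number within the year, keeps the alternation from slipping at New Year.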

    The biggest problem would be archiving, but if you could fit all your data inside a 250Gig USB drive (and truthfully, that’s a helluva lot of space!), you could survive for a long time on it. 250 gig hard drives are costing about $200 these days ($400 for 2).

    On another note, I recently bought a H340 iriver mp3 player/recorder because my ihp-140 was having problems (and still is). I have about 38 gigs of content on my ihp-140, and if I lost the files, it would be no big deal as long as I had a log listing of these files. (Most are creative commons archives and wouldn’t be difficult to track down).
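
    A "log listing" like that is cheap to produce. Here is a minimal Python sketch, assuming a plain-text manifest with one line per file (size in bytes, a tab, then the path relative to the collection); the function name write_manifest and the line format are made up for this example.

```python
import os


def write_manifest(root, manifest_path):
    """List every file under `root` in a text file. Returns the number of files."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            rows.append((os.path.relpath(full, root), os.path.getsize(full)))
    rows.sort()  # stable ordering makes two manifests easy to compare
    with open(manifest_path, "w", encoding="utf-8") as out:
        for rel, size in rows:
            out.write(f"{size}\t{rel}\n")
    return len(rows)
```

    The manifest is only a few kilobytes even for thousands of songs, so it can be emailed or uploaded anywhere. It should be written somewhere other than the drive or player it describes, otherwise it disappears along with the files.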

    Mike Rubel has an article about backing up with rsync and a slashdot post. (Another tool, rsnapshot, lets users restore backups without needing permission from root). Believe it or not, my current backup solution is not ideal, although when I move to a web server that will change.

  • Tips for Scanning Old Photos

    I need to scan lots of old family photos. Here are some handy articles about how to scan photos for archiving.

    Kimberly Powell writes a detailed guide on about.com. Tips for Scanning & Restoring

    Another excellent piece by Ralph G. McKnight about photo archiving for long periods of time (150 years, etc)

    Workflow on photo archiving, with overview and links to other articles (By Sue Chastain).

    Guidelines for saving photos in photo albums

    Great Scantips by Wayne Fulton, with FAQ, Restoration, How to Clean Scanners and comparison of file formats to save in.

    Article by David Mishkin on Restoring Old Photos.

    Protecting Old Photos by Australian Archive Association.

    Other interesting bits: how to date your photos (by visual cues): Maureen A. Taylor about how to follow the clues. Halvor Moorshead has more thoughts and examples. (He also has written a great concise article about scanning old photos). Also interesting: Using clothing styles to date photos.
    General advice about handling a photo preservation project

  • Friendly URL’s

    Interesting piece on how to make more friendly URL’s by Waferbaby. The article is slightly dated; she writes,

    Why? Why should you go ahead and change a perfectly happy link to one that points to a directory (virtual or otherwise)?…

    • EXPANDABILITY. What if you, or the company you work for, decide to upgrade (or simply change) your site to use a different technology, for instance replacing your PHP-built site (about.php) with a system developed in ColdFusion (about.cfm)? If you stick to a directory structure URL (/about/), the page’s web address remains consistent, regardless of how the site is actually put together. And that’s gold; links don’t break, time is not wasted, and joy abounds. Build a neat directory structure from the get-go, and you’ll be thanking yourself down the track.
    • You may not want to expose the particular technology you’re using on your site to the rest of the world. By using a neat directory structure, you don’t have to… You don’t even need to actually use physical directories on the server; you can map the URLs on your site in any way you like, using mod_rewrite (a personal favorite).

    Actually zope/plone also has URL remapping capabilities, as do the more sophisticated content management systems out there. The real concern I have is archivability. If you change server scripting languages, it’s a lot harder to remap if all your URLs end in .asp or .php than if they end in a simple slash.
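
    To make the remapping idea concrete, here is a small Python sketch that turns an extension-style URL into its directory-style equivalent, the kind of mapping you would need when drawing up redirects for an old site. It is not mod_rewrite configuration; the function name friendly_url, the list of script extensions, and the handling of index pages are assumptions made for this example.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Extensions treated as "technology-revealing"; extend as needed.
SCRIPT_EXTENSIONS = {".php", ".asp", ".cfm"}


def friendly_url(url):
    """Map /about.php to /about/ and leave every other URL untouched."""
    parts = urlsplit(url)
    stem, ext = posixpath.splitext(parts.path)
    if ext.lower() not in SCRIPT_EXTENSIONS:
        return url
    # /blog/index.php is really the page for /blog/ itself.
    if posixpath.basename(stem) == "index":
        stem = posixpath.dirname(stem)
    new_path = stem.rstrip("/") + "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )
```

    Running every old URL through a function like this gives a table of old and new addresses, which can then be fed to whatever redirect mechanism the server offers, so that links already archived elsewhere keep working.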

    On another note, I recently discovered that archive.org doesn’t do a particularly good job of archiving content. The solution? Use Alexa web browser, which will automatically report the URL to archive.org so that it caches data. I was offline for 2 months; ok, that was awful. But imagine my dismay to learn that after a month or so of being offline, all the google caches of my content had disappeared. I was unreachable. And although archive.org archived some of my content, most were 6 months out of date, and one important web page wasn’t cached at all. Using alexa ensures that this won’t happen.

  • Media Vaults

    Yesterday at Walmart I remembered that I should look for a “fireproof safe” or something that can protect my media content in my own apartment. They cost about $100-150.

    Then I came across this great post on the alt.firefighters google newsgroup explaining the difference between fireproof safes and “media vaults” which is really what I need. It costs about $300-400, but it saves me the trouble of having to store something offsite. He refers to the UL Standards Page. Great stuff!

  • Simple OCR

    After successfully installing my USB 2.0 Canon scanner, I discovered that the included OCR really sucked. So I tried instead TOCR, a nonfree ($40) package that worked extremely well. The included OCR had almost everything (zones, spellchecker, etc), but it just gave lousy results. Now if only there were a linux/open-source version, I’d be happy. (Update: Here’s a link to linux OCR programs).

    PS, I ended up buying the cheaper AOPEN DVD writer. Even though the Pioneer was faster, supported DVD+ and DVD- and was more portable, it came with Roxio (ugh!), cost $100 more and didn’t have easy playback (one thing I needed for doing screen captures from commercial DVD’s).

    I spent the morning trying to write a backup script. I like programming when I have the time for it. Shell scripting seems awfully lame though; vowing to do some python stuff.

  • E(x)Literature Conference

    In my previous post I mentioned James Boyle. Boyle gave a fantastic address, “The Opposite of Property,” at the 2003 e(x)literature conference. (Hear the MP3). BTW, this was the first conference I experienced remotely. I downloaded all the mp3’s and listened to 2 days of discussions about digital archiving and the sorry state of literature. Also recommended: Poet Stephanie Strickland reads her poetry (MP3), Stewart Brand talks about how a millennial library would be organized (MP3) and Howard Besser talks about the current technological challenges in archiving (MP3). All of these talks are incredible, and honestly, there are a dozen other talks (all online) that are great too. Everyone important in the eliterature field was there (except me of course!), and the issues of literature were laid out pretty thoroughly. Eliterature is a great bunch of people (I hung out with some of them at Hypertext 2000 Conference in San Antonio).

    Among other things, this conference is a trailblazer by offering mp3’s free for the public. If only more conferences could do this. I actually heard the entire conference while doing housework and going on a long road trip.

    Another rant about conferences. In a technological or academic field, going to conferences sustains a person’s mind and creativity. But the dirty fact is that conferences are expensive and most people go only if their company pays and if they can get the time off. I was amazed at hypertext 2000 to find that many Europeans had trekked to San Antonio for a conference. Great conference, but not THAT great. (And I flirted with the idea of going to Hypertext 2001 in Norway, until I started calculating the costs). The offline chat and networking can be great at these conferences, but almost no one can attend enough of these things (The two exceptions seem to be Cory Doctorow and Dave Weinberger, who go to every conference under the sun). The custom of “conference blogging” is a happy wonderful tradition that should offer the vicarious thrill of attending for the willing but destitute.

    If you develop a good reputation giving talks on the lecture circuit, that increases your chances of getting invited (and possibly even getting travel paid for). The more outrageous and pontifical you sound, the better (see Joshua Davis for how to do that). Some of these conferences are just outrageously expensive ($2000 plus for conference fees alone). I’ve even read about geek conferences given on cruise ships.

    Kind of Blue, a serial web novel by Scott Rettberg. Rettberg was one of the people who put together the world’s best literary experiment, The Unknown. Other email fictional narratives.

    One sign that your blogging is sapping your intellect is when you start linking to articles without actually reading them. Rettberg did a good blurb for Michael Berube, and I suppose I will have to concur. I’ll have to wade through his oeuvre at my leisure.