Below is a guest post by Walker Sampson, the Digital Archivist at the University of Colorado Boulder. Walker describes the disk imaging workflow he presented at the first-ever BitCurator Users Forum, held January 9, 2015.
It was a real pleasure discussing workflows with fellow practitioners at the BitCurator Users Forum this year. Many thanks to Matt Farrell at Duke and Kari Smith at MIT for presenting with me, and to Farrell again for putting together and directing our panel.
I have pictured above a synopsis of my disk imaging workflow, which relies strongly on student help. Students take floppy disks from initial photography to imaging, mount testing, documentation of the results, and rehousing.
The disk is photographed to record the labeling information, which can be quite extensive (one collection of media art contains many disks with a printout of the disk’s file listing, along with the official label of the artists’ studio). A photograph also provides a visual reference for the future should we try to relocate the original media.
Students are trained to connect the KryoFlux floppy disk controller and a vintage disk drive to the host machine, which runs a copy of BitCurator. They are also trained on the KryoFlux GUI, and use this software for most of the disk imaging. While we have the FC5025 controller and software, KryoFlux is the device of choice given its greater versatility and its ability to record flux timings for each track of the floppy disk. Within the KryoFlux GUI, students select two output formats by default: MFM (modified frequency modulation, a common encoding for many IBM PC formatted disks) and the preservation stream. This combination produces an accurate image for most disks. When it does not, students are trained to run other disk encodings against the preservation stream files in an attempt to produce a disk image that can be mounted. In most of those cases, the disk turns out to be in a less common Apple or Commodore format.
The resulting log file, preservation tracks, image file, and disk photograph are saved in a folder. The student then runs a mount test through BitCurator’s mounting script. The results of the student’s disk imaging run are recorded in a row in a collection spreadsheet, which notes the disk image name, the date, the student’s name, the disk drive used (we have different drives in use), the disk imaging device used, any bad sectors found, whether the disk mounts, and whether a photograph accompanies the disk. The student then rehouses the disk in a Hollinger box. Disk collections which contain office documents and correspondence are candidates for floppy disk deaccessioning; collections for which the floppies are integral to the content and process of the donor, such as the media art collection mentioned above, are not.
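For readers who keep this kind of log as a CSV file rather than a hand-edited spreadsheet, the row described above might be sketched as follows. This is only an illustration, not our actual tooling: the `ImagingRecord` class, its field names, and the `append_record` helper are hypothetical names chosen to mirror the columns listed in the previous paragraph.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class ImagingRecord:
    """One spreadsheet row per imaged disk (field names are hypothetical)."""
    image_name: str
    date: str
    student: str
    drive: str          # which physical disk drive was used
    device: str         # imaging device, e.g. "KryoFlux" or "FC5025"
    bad_sectors: str    # free-text note on any bad sectors found
    mounts: bool        # did the mount test succeed?
    photographed: bool  # is there a photograph of the disk?


def append_record(log_path, record):
    """Append one record to a CSV log, writing the header if the file is new."""
    log_path = Path(log_path)
    is_new = not log_path.exists() or log_path.stat().st_size == 0
    column_names = [f.name for f in fields(ImagingRecord)]
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=column_names)
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(record))
```

Keeping one fixed set of columns makes it easy to filter later for the disks that need a second look, such as those with bad sectors or a failed mount test.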
I check the students’ work at the end of the week. If disks have bad sectors or do not mount, I investigate those disk images and attempt new reads if necessary. Besides the local backup, I run BagIt on the cumulative work of the student every week and upload that bag to our servers, which run their own backup routine. When the collection is complete, I do the same for the entire collection and remove the in-progress bags.
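To give a sense of what bagging adds, here is a minimal sketch of the checksum manifest at the heart of a bag: one line per payload file, pairing a checksum with a path under `data/`. It is not a BagIt implementation and is not the tool I run; the function names `sha256_of` and `manifest_lines` are hypothetical, and a real bag also needs its declaration and tag files, which an existing BagIt tool will write for you.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(payload_dir):
    """Build manifest-style lines ("<checksum>  data/<relative path>")
    for every file under payload_dir, sorted by path."""
    payload_dir = Path(payload_dir)
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(payload_dir).as_posix()
        lines.append(f"{sha256_of(path)}  data/{relative}")
    return lines
```

Recomputing these lines after an upload and comparing them with the stored manifest is a simple way to confirm that nothing changed in transit.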
This workflow emphasizes the capture of disk images first and foremost. I provide analysis and description of the disk content intermittently, and batch runs of bulk_extractor and fiwalk on the disk images will be performed at a later time. Import into formal digital archive software will occur down the road as well; that remains a work in progress for us. I hope this workflow can help others who may not have a repository in place, but nevertheless need to rescue content from legacy media in a manner that will allow more refined processing in the future.
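As a rough idea of what those later batch runs could look like, the sketch below only builds the command lines for each disk image and does not execute anything. It assumes the images share a single extension (`.img` here, an assumption), that bulk_extractor is given a fresh output directory with `-o`, and that fiwalk writes its XML report to the file named after `-X`. The `analysis_commands` function and the report naming scheme are hypothetical.

```python
from pathlib import Path


def analysis_commands(image_dir, report_dir, suffix=".img"):
    """Return one bulk_extractor and one fiwalk command per disk image.

    Nothing is executed here; each command is a list of arguments
    suitable for passing to subprocess.run().
    """
    image_dir, report_dir = Path(image_dir), Path(report_dir)
    commands = []
    for image in sorted(image_dir.glob(f"*{suffix}")):
        stem = image.stem
        # bulk_extractor writes its feature files into the -o directory,
        # which should not already exist.
        commands.append(
            ["bulk_extractor", "-o", str(report_dir / f"{stem}_be"), str(image)]
        )
        # fiwalk writes an XML description of the file system to the -X file.
        commands.append(
            ["fiwalk", "-X", str(report_dir / f"{stem}_fiwalk.xml"), str(image)]
        )
    return commands
```

Building the list first makes it possible to review or log exactly what will run before handing each command to `subprocess.run`.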