A few months ago, I wrote about selected digitization readings and how we were going to use them to overhaul our digitization workflows. We’re now a couple of months into our new digitization workflow, and things are starting to run smoothly, but along the way we realized we needed a better way to match our digitized files to their description without relying on semantic filenames or separate metadata sheets.
Today I’m going to write about something that came up at Code4Lib, has been on my mind recently, and is near and dear to my heart: authority reconciliation. You might remember a blog post on the subject from almost a year ago. If you don’t remember, or don’t have time to go back and read it, here’s the gist: reconciling large amounts of data against established authorities is complicated, time-consuming, and often frustrating, but we do the best that we can. I think my own experience doing this type of work is why Christina Harlow’s talk “Get Your Recon” resonated so strongly with me.
Over the weekend, we finished a year-long project to import description for almost every grant the Ford Foundation has ever given. This is the same project I wrote about last October. To refresh your memory: we started with 54,644 grant files described in an Excel spreadsheet, and we wanted to transform much of that data into EAD and then import it into ArchivesSpace. Normally a project like this wouldn’t require an entire year, but over its course we realized we did not have efficient ways to reconcile our structured data against Library of Congress vocabularies. The October post laid out our methods for reconciling subjects against LoC data; this post details the methods we used to reconcile corporate names against the LCNAF.
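To give a flavor of what that reconciliation involves, here is a minimal Python sketch that checks a list of corporate names against the public id.loc.gov suggest service for the LCNAF. The file names, the single-column CSV layout, and the exact-match rule are illustrative assumptions, not our production code:

```python
import csv
import time

import requests

# id.loc.gov offers a "suggest" autocomplete service for each vocabulary.
# Responses follow the OpenSearch Suggestions format:
# [query, [labels], [descriptions], [URIs]]
SUGGEST_URL = "https://id.loc.gov/authorities/names/suggest/"


def reconcile_name(name):
    """Return (label, uri) for an exact LCNAF match, else (None, None)."""
    response = requests.get(SUGGEST_URL, params={"q": name}, timeout=30)
    response.raise_for_status()
    _, labels, _, uris = response.json()
    for label, uri in zip(labels, uris):
        # Only accept exact (case-insensitive) matches; anything fuzzier
        # needs a human to review it.
        if label.lower() == name.lower():
            return label, uri
    return None, None


# Hypothetical input/output files; the real project worked from an Excel export.
with open("corporate_names.csv") as infile, \
        open("reconciled.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["name", "lcnaf_label", "lcnaf_uri"])
    for row in csv.reader(infile):
        name = row[0]
        label, uri = reconcile_name(name)
        writer.writerow([name, label or "", uri or ""])
        time.sleep(0.5)  # be polite to the public endpoint
```

A sketch like this only catches the easy exact hits; variant name forms and near-matches are where most of the hands-on work goes.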
Migration Testing, Data Quality Checks, and Troubleshooting Errors – 295 hours in 8 months
After finishing the initial data cleanup, it was time to start testing our migration; the only way to identify major issues was to do a dry run. To set up for our initial testing, I took a MySQL dump of our AT database, loaded it into an empty AT instance, and then installed the AT Migration Plugin. Installing the plugin just means placing scriptAT.zip in your base AT plugins folder, either on a server or on your local machine.
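Roughly, that setup looks like the following Python sketch; the credentials, database names, and paths are placeholders, and the dump and load could just as easily be run directly from the shell:

```python
import shutil
import subprocess

# Placeholder connection details and paths; substitute your own.
DB_USER = "atuser"
SOURCE_DB = "at_production"
TEST_DB = "at_migration_test"
DUMP_FILE = "at_dump.sql"
AT_PLUGINS_DIR = "/path/to/archivists_toolkit/plugins"

# 1. Dump the production AT database (mysqldump will prompt for a password).
with open(DUMP_FILE, "w") as dump:
    subprocess.run(["mysqldump", "-u", DB_USER, "-p", SOURCE_DB],
                   stdout=dump, check=True)

# 2. Load the dump into the empty AT instance's database.
with open(DUMP_FILE) as dump:
    subprocess.run(["mysql", "-u", DB_USER, "-p", TEST_DB],
                   stdin=dump, check=True)

# 3. "Install" the AT Migration Plugin by dropping scriptAT.zip
#    into the base AT plugins folder.
shutil.copy("scriptAT.zip", AT_PLUGINS_DIR)
```

Working against a disposable copy like this means a failed test run costs nothing but time.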
Our first migration test did not go smoothly.
Since the RAC was founded in 1974, we’ve collected information about our researchers, including contact details, visit dates, topics of research, and publications. Starting in the 1980s, we captured this information in a number of different databases. Currently, this data is stored in ATReference, a customized version of the Archivists’ Toolkit developed by the RAC. As part of both the Aeon and ArchivesSpace implementation processes, we needed to migrate that data forward into Aeon so we could continue to access and add to the wealth of information it contains without having to support ATReference.
About a year and a half ago, we decided to make the switch from our current archival management system, the no-longer-updated Archivists’ Toolkit, to its next iteration, ArchivesSpace. The RAC officially began preparing for ArchivesSpace in January of 2014.
You might remember that earlier this year I wrote a post about Metadata Cleanup and Ford Foundation Grants that gave a very basic overview of how I went about reconciling thousands of subject terms against the Library of Congress. This reconciliation was essential in helping us gain control over data that we did not create, but that we identified as potentially extremely valuable to researchers. This post gives an in-depth and updated account of how I cobbled together a (mostly) automatic way to check large numbers of topical terms against the Library of Congress. It still requires some hands-on work, and quality checking is a must, but it cut a job that would have taken hundreds of hours down to a small fraction of that time.
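As a hedged illustration of the basic check (not necessarily the exact service or parsing I used), id.loc.gov also exposes a known-label lookup that resolves an exact preferred label to its authority record:

```python
import requests

# id.loc.gov's "label" service redirects to the authority record when the
# term exactly matches a preferred label, and returns 404 otherwise. As I
# read the documentation, the match also carries the URI in an X-Uri
# header; verify this behavior before relying on it.
LABEL_URL = "https://id.loc.gov/authorities/subjects/label/{term}"


def check_subject(term):
    """Return the LCSH URI for an exact-label match, or None."""
    response = requests.get(
        LABEL_URL.format(term=term), allow_redirects=False, timeout=30
    )
    if response.is_redirect:
        # Fall back to the redirect target if the X-Uri header is absent.
        return response.headers.get("X-Uri") or response.headers.get("Location")
    return None


print(check_subject("Horses"))  # prints an id.loc.gov URI, or None on a miss
```

Exact-label lookups like this are what make the process only “mostly” automatic: every term that misses still has to be reviewed by hand.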