I recently attended the IIPC (International Internet Preservation Consortium) conference in lovely Reykjavik from April 15th to the 17th. This was the first official year of the IIPC conference and it was a great opportunity for institutions of all sizes to get together and talk about the challenges facing web archiving today, and to strategize about the path forward. The presentations covered a wide variety of topics, but I think the ones that I found most helpful were those that focused on how researchers interact with web archives. Continue reading
Today I’m going to write about something that came up at Code4Lib, has been on my mind recently, and is near and dear to my heart: authority reconciliation. You might remember this blog post from almost a year ago? If you don’t remember, or don’t have time to go back and read it, I’ll summarize the gist pretty quickly: reconciling large amounts of data against established authorities is complicated, time-consuming, and often frustrating, but we do the best that we can. I think my own experience in doing this type of work is why Christina Harlow’s talk “Get Your Recon” resonated so strongly with me. Continue reading
I attended METRO’s annual conference on 1/21. METRO is the Metropolitan New York Library Council, and as members, RAC staff are welcome to attend any of their events. There were a lot of fantastic panels and speakers at the conference this year, but I’d like to focus on an overarching theme that I picked up on: getting our systems to communicate nicely with each other can streamline our work processes and improve our work as archivists.
Somewhat surprisingly, one of my favorite panels of Archives 2015 had very little to do with the actual work that I do day to day. Session 702, Controversial Crawling: Documenting University Scandal in Real Time, dealt with the practical issues of trying to capture internal and public discussions of university controversies on the web.
I often feel like archives shy away from documenting and seeking out controversial source materials, in many ways because of institutional pressure from invested parties that do not want those controversies kept in perpetuity. However, this panel offered a refreshing take on scandal by explaining exactly how three different web archivists selected and collected materials pertaining to institutional scandals, sometimes even against the wishes of those higher up in the organization. Continue reading
Over the weekend, we finished up a year-long project to import description for almost every single grant the Ford Foundation ever made. This is the same project that I wrote a post about last October. To refresh your memory, we started with 54,644 grant files described in an Excel spreadsheet, and we wanted to transform much of that data into EAD, and then import it into ArchivesSpace. Normally this project wouldn’t require an entire year, but we realized over the course of the project that we did not have efficient ways to reconcile our structured data against Library of Congress vocabularies. The post in October laid out our methods for reconciling subjects against LoC data; this post will detail the methods we used to reconcile corporate names against the LCNAF. Continue reading
CUSTOMIZING THE APPLICATION – 22 hours in 4 months
While we were mostly happy with the base ArchivesSpace application, we did want to make a few changes to the display and functionality in order to make it more user-friendly. I started out by referencing the Customizing and Theming ArchivesSpace documentation as well as the developer screencasts. Continue reading
Migration Testing, Data Quality Checks, and Troubleshooting Errors – 295 hours in 8 months
After finishing the initial data cleanup, it was time to start testing our migration; the only way to identify major issues was to do a dry run. To set up for our initial testing, I took a MySQL dump of our AT database, loaded it up into an empty AT instance, and then installed the AT Migration Plugin. To install the AT Migration Plugin, just place the scriptAT.zip in your base AT plugins folder, either on a server or on your local machine.
Our first migration test did not go smoothly. Continue reading
About a year and a half ago, we decided that we were going to make the switch from our current Archival Management System, the no-longer-updated Archivists’ Toolkit, to its next iteration, ArchivesSpace. The RAC officially began preparing for ArchivesSpace in January of 2014. Continue reading
You might remember that earlier this year I wrote a post about Metadata Cleanup and Ford Foundation Grants that gave a very basic overview of how I went about reconciling thousands of subject terms against the Library of Congress. This reconciliation was essential in helping us gain control over data that we did not create, but that we also identified as possibly extremely valuable to researchers. This post will give an in-depth and updated account of how I cobbled together a (mostly) automatic way to check large amounts of topical terms against the Library of Congress. It still requires some hands-on work and quality checking is a must, but it dramatically cut down a job that would otherwise have taken hundreds of hours.
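To give a rough idea of what a (mostly) automatic check with a manual review step can look like, here is a minimal Python sketch. It is not the workflow described in the post: it assumes you already have a local list of authorized headings to compare against, and the function names (`normalize`, `reconcile`), the sample headings, and the 0.85 similarity cutoff are all hypothetical choices made for illustration.

```python
import difflib
import re


def normalize(term):
    """Lowercase a term, replace punctuation with spaces, and collapse whitespace."""
    term = re.sub(r"[^\w\s]", " ", term.lower())
    return " ".join(term.split())


def reconcile(terms, authorized, cutoff=0.85):
    """Match local terms against a list of authorized headings.

    Returns a dict mapping each local term to (heading, status), where
    status is "exact", "review" (a fuzzy match a person should check),
    or "unmatched". The cutoff is an assumed value, not a recommendation.
    """
    # Index the authorized headings by their normalized form.
    index = {normalize(heading): heading for heading in authorized}
    results = {}
    for term in terms:
        key = normalize(term)
        if key in index:
            results[term] = (index[key], "exact")
            continue
        # Fall back to the closest normalized heading above the cutoff.
        close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
        if close:
            results[term] = (index[close[0]], "review")
        else:
            results[term] = (None, "unmatched")
    return results


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    authorized = ["Agriculture", "Education, Higher", "Public health"]
    local_terms = ["agriculture.", "Public helth", "Zymurgy"]
    for term, (heading, status) in reconcile(local_terms, authorized).items():
        print(term, "->", heading, "(" + status + ")")
```

Anything flagged "review" or "unmatched" still needs a person to look at it, which is where the hands-on quality checking comes in.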
One of the first sessions I attended at this year’s SAA annual meeting was “Getting Things Done with Born-Digital Collections,” and it stuck with me as a great entry-level review of how to deal with born-digital materials in a variety of different institutional environments. It also introduced tools to help archivists jump into their work, while providing some advice for those looking to implement or expand born-digital programs. Many of the following tools/concepts may seem familiar in the work that we do here at the RAC.
The panel included five panelists: Gloria Gonzalez, Jason Evans Groth, Ashley Howdeshell, Dan Noonan, and Lauren Sorensen. While all of the panelists covered slightly different experiences, there was one universal takeaway: preserving digital collections needs to be an institutional endeavor, and in many cases, that endeavor is a constant work-in-progress, from tools to processes.