WARCs! What are they good for? Researchers!

I recently attended the IIPC (International Internet Preservation Consortium) conference in lovely Reykjavik from April 15th to the 17th. This was the first official year of the IIPC conference, and it was a great opportunity for institutions of all sizes to get together, talk about the challenges facing web archiving today, and strategize about the path forward. The presentations covered a wide variety of topics, but the ones I found most helpful focused on how researchers interact with web archives.

A METRO Retrospective

I attended METRO’s annual conference on 1/21. METRO is the Metropolitan New York Library Council, and as members, RAC staff are welcome to attend any of its events. There were a lot of fantastic panels and speakers at the conference this year, but I’d like to focus on an overarching theme that I picked up on: getting our systems to communicate nicely with each other can streamline our work processes and improve our work as archivists.


SAA Report: Navigating Scandal

Somewhat surprisingly, one of my favorite panels of Archives 2015 had very little to do with the actual work that I do day to day. Session 702, Controversial Crawling: Documenting University Scandal in Real Time, dealt with the practical issues of trying to capture internal and public discussions of university controversies on the web.

I often feel like archives shy away from documenting and seeking out controversial source materials, largely because of institutional pressure from invested parties that do not want those controversies kept in perpetuity. However, this panel offered a refreshing take on scandal by explaining exactly how three different web archivists selected and collected materials pertaining to institutional scandals, sometimes even against the wishes of those higher up in the organization.

Reconciling Large Corporate Name Datasets

Over the weekend, we finished up a year-long project to import description for almost every grant the Ford Foundation ever made. This is the same project that I wrote a post about last October. To refresh your memory, we started with 54,644 grant files described in an Excel spreadsheet, and we wanted to transform much of that data into EAD and then import it into ArchivesSpace. Normally this project wouldn’t require an entire year, but we realized over the course of the project that we did not have efficient ways to reconcile our structured data against Library of Congress vocabularies. The post in October laid out our methods for reconciling subjects against LoC data; this post will detail the methods we used to reconcile corporate names against the LCNAF.

From AT to AS Part 3: Training and Customization

CUSTOMIZING THE APPLICATION – 22 hours in 4 months

While we were mostly happy with the base ArchivesSpace application, we did want to make a few changes to the display and functionality in order to make it more user-friendly. I started out by referencing the Customizing and Theming ArchivesSpace documentation as well as the developer screencasts.

From AT to ArchivesSpace Part 2: Migrations and Error Reporting

Migration Testing, Data Quality Checks, and Troubleshooting Errors – 295 hours in 8 months

After finishing the initial data cleanup, it was time to start testing our migration; the only way to identify major issues was to do a dry run. To set up for our initial testing, I took a MySQL dump of our AT database, loaded it up into an empty AT instance, and then installed the AT Migration Plugin. To install the AT Migration Plugin, just place the scriptAT.zip in your base AT plugins folder, either on a server or on your local machine.

Our first migration test did not go smoothly.

SAA Report: Getting Things Done with Born-Digital

One of the first sessions I attended at this year’s SAA annual meeting was “Getting Things Done with Born-Digital Collections,” and it stuck with me as a great entry-level review of how to deal with born-digital materials in a variety of institutional environments. It also introduced tools to help archivists jump into their work, while providing some advice for those looking to implement or expand born-digital programs. Many of the following tools and concepts may seem familiar from the work that we do here at the RAC.

The panel included five panelists: Gloria Gonzalez, Jason Evans Groth, Ashley Howdeshell, Dan Noonan, and Lauren Sorensen. While all of the panelists covered slightly different experiences, there was one universal takeaway: preserving digital collections needs to be an institutional endeavor, and in many cases, that endeavor is a constant work-in-progress, from tools to processes.
