Automating Archivematica Ingests

In November and December of 2018, we ingested 189 Rockefeller Foundation officers’ diaries into Archivematica, making access PDFs available on DIMES. These diaries recorded the day-to-day contacts and impressions of Rockefeller Foundation officers who visited educational, scientific, and research institutions and laboratories throughout the United States and abroad. That we could preserve and make available so many digital objects in such a short time is a testament to how far our use of Archivematica has come over the past several years.

In 2014, we ingested diaries from 29 diarists into Archivematica. These had been digitized from microfilm by a vendor and were linked to their descriptions in Archivist’s Toolkit through an integration with Archivematica. At that point, our use of Archivematica was very manual. An archivist would “transfer” each diary in the Archivematica dashboard and choose an option at every decision point for each microservice. The archivist also added PREMIS rights information by hand in the Archivematica UI. Ingesting these diaries took a full year and required an archivist’s hands-on attention throughout.

Since then, Archivematica, and our use of it, has evolved. (And, of course, we’ve moved from the Archivist’s Toolkit to ArchivesSpace.) There have been many updates to the software since Archivematica 1.2 (released in September 2014), and two of the most relevant to this project were sponsored by the RAC: an integration that sends DIP object metadata to ArchivesSpace, either through manual matching in the dashboard or through a CSV file of ArchivesSpace RefIDs, and support for supplying PREMIS rights information as a CSV file.
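As a rough illustration of the CSV route, the Python sketch below pairs access PDFs with their RefIDs and renders them as a headerless, two-column CSV. It assumes the output is saved as `archivesspaceids.csv` in the transfer’s `metadata/` directory and that RefIDs are supplied keyed by filename stem; the file layout and the helper names (`match_ref_ids`, `archivesspace_csv`) are assumptions for illustration rather than the scripts described in this post, so check the Archivematica documentation for your version before relying on the format.

```python
import csv
import io
from pathlib import PurePosixPath


def match_ref_ids(pdf_paths, ref_ids_by_stem):
    """Pair each access PDF path with its ArchivesSpace RefID.

    Assumption: RefIDs are keyed by filename stem (e.g. "smith_1934-1936").
    Returns (pairs, unmatched) so gaps can be reviewed before ingest.
    """
    pairs, unmatched = [], []
    for path in pdf_paths:
        ref_id = ref_ids_by_stem.get(PurePosixPath(path).stem)
        if ref_id is None:
            unmatched.append(path)
        else:
            pairs.append((path, ref_id))
    return pairs, unmatched


def archivesspace_csv(pairs):
    """Render (path, RefID) pairs as headerless two-column CSV text.

    Assumption: the text is saved as metadata/archivesspaceids.csv.
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerows(pairs)
    return buffer.getvalue()
```

Returning the unmatched paths separately, rather than silently skipping them, makes it easy to spot diaries that still need a RefID before a transfer starts.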

To prepare the Rockefeller Foundation officers’ diaries for ingest, I first ran a script to copy all digitized files into directories that follow the Archivematica transfer structure. I then ran a script to create the ArchivesSpace metadata file, matching the RefIDs provided by Kanisha with the file path to the access PDF for each diary. Finally, a third script parsed the dates in the diaries’ filenames to create a PREMIS rights metadata file of machine-actionable rights statements. These scripts are all available on our GitHub.
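The rights step can be sketched in Python as follows: pull the years out of each filename, then emit one restriction row per file. The filename pattern (`<diarist>_<first year>-<last year>.pdf`), the subset of `rights.csv` column names and values, and the `restriction_end_year` rule passed in by the caller are all hypothetical; the actual scripts on our GitHub may differ, and the column names should be checked against Archivematica’s rights import documentation.

```python
import csv
import io
import re

# Hypothetical naming convention: <diarist>_<first year>-<last year>.pdf,
# e.g. "smith_1934-1936.pdf"; a single year ("doe_1940.pdf") also matches.
YEARS = re.compile(r"_(\d{4})(?:-(\d{4}))?\.pdf$", re.IGNORECASE)

# Assumed subset of the columns in Archivematica's rights.csv; verify
# against the documentation for your Archivematica version.
COLUMNS = ["file", "basis", "grant_act", "grant_restriction",
           "grant_start_date", "grant_end_date"]


def parse_years(filename):
    """Return (first_year, last_year) covered by a diary, from its filename."""
    match = YEARS.search(filename)
    if match is None:
        raise ValueError(f"no year range in {filename!r}")
    first = int(match.group(1))
    last = int(match.group(2) or first)
    return first, last


def rights_csv(paths, restriction_end_year, basis="Policy", act="Disseminate"):
    """Build rights.csv text with one dissemination restriction per file.

    restriction_end_year is a hypothetical, caller-supplied rule that maps a
    diary's last year to the year its restriction lifts.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for path in paths:
        first, last = parse_years(path)
        writer.writerow({
            "file": path,
            "basis": basis,
            "grant_act": act,
            "grant_restriction": "Disallow",
            "grant_start_date": f"{first}-01-01",
            "grant_end_date": f"{restriction_end_year(last)}-12-31",
        })
    return buffer.getvalue()
```

Keeping the restriction rule as a parameter separates date parsing, which depends only on the filenames, from rights policy, which is an institutional decision.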

To start the Archivematica ingest, I ran a script Hillel wrote that uses the Archivematica dashboard API to start and approve a transfer automatically. Because I had set the Archivematica processing configuration to automate every decision point, no manual action was required after running the script. I also modified the script to pause for several minutes between transfers, so that it could safely loop through a set of transfers without overwhelming Archivematica. The final piece of automation, set up by Jose, moved access objects from the Archivematica DIP Store to our web server so that the PDFs were accessible in DIMES.
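A minimal sketch of that start, approve, and pause loop, using only the Python standard library, is below. It is not Hillel’s script. The endpoint paths, the `paths[]` encoding (base64 of `<location UUID>:<path>`), and the `ApiKey user:key` header reflect our understanding of the Archivematica 1.x dashboard API and should be verified for your version; the URL, credentials, location UUID, and delay values are placeholders.

```python
import base64
import json
import time
import urllib.parse
import urllib.request

# Placeholder settings (hypothetical); replace with your own values.
DASHBOARD_URL = "http://archivematica.example.org"
API_USER = "demo"
API_KEY = "changeme"
SOURCE_LOCATION_UUID = "00000000-0000-0000-0000-000000000000"


def encode_path(location_uuid, relative_path):
    """Encode a transfer source path as base64('<location uuid>:<path>')."""
    raw = f"{location_uuid}:{relative_path}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def api_post(endpoint, fields):
    """POST form fields to the dashboard API and decode the JSON reply."""
    data = urllib.parse.urlencode(fields, doseq=True).encode("ascii")
    request = urllib.request.Request(
        DASHBOARD_URL + endpoint,
        data=data,
        headers={"Authorization": f"ApiKey {API_USER}:{API_KEY}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def start_and_approve(name, relative_path, post=api_post, sleep=time.sleep,
                      approve_delay=10):
    """Start a standard transfer, wait briefly, then approve it."""
    post("/api/transfer/start_transfer/", {
        "name": name,
        "type": "standard",
        "paths[]": [encode_path(SOURCE_LOCATION_UUID, relative_path)],
    })
    sleep(approve_delay)  # give the dashboard time to register the transfer
    # Assumes the transfer directory is named exactly `name`.
    return post("/api/transfer/approve", {"type": "standard", "directory": name})


def ingest_all(transfers, post=api_post, sleep=time.sleep,
               pause_seconds=300, approve_delay=10):
    """Run (name, relative_path) transfers one at a time, pausing between them."""
    results = []
    for index, (name, relative_path) in enumerate(transfers):
        if index:
            sleep(pause_seconds)  # avoid overwhelming Archivematica
        results.append(start_and_approve(name, relative_path, post=post,
                                         sleep=sleep, approve_delay=approve_delay))
    return results
```

Passing `post` and `sleep` in as parameters lets the loop be dry-run with stand-ins before it touches a live server. A fixed pause is the simplest throttle; polling Archivematica for each transfer’s status would adapt better to diaries of very different sizes.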

Setting up these automated workflows took time, but it was thrilling to see how quickly the diaries were ingested and put online. By pausing Archivematica ingests to review and improve our workflows, we made this work much faster and more efficient, and those gains will carry over to future digitization and digital preservation projects.

Our digitization team also did an amazing job of digitizing these diaries (39,131 pages in total) so quickly. The diarists added in 2018 are:
