We have just partnered with Archive-It to launch our Web Archiving Program. Web archiving is the capture, management, and preservation of websites and web resources. Archive-It is a web-based subscription service from the Internet Archive that will capture digital content for the Rockefeller Archive Center. Archive-It will host and store the crawled website files, called Web ARChive (WARC) files, at the Internet Archive's data centers and provide the Rockefeller Archive Center with a copy of those WARC files. We will in turn preserve these electronic records using Archivematica, our digital preservation management system.
Archive-It is flexible, offering excellent features that we can customize to fit our needs. For example, we can set crawl frequencies and define the depth and scope of the content harvested. The service's description component supports cataloging with Dublin Core metadata as well as additional customizable fields. Its web access component is another great feature: it displays archived websites and makes our web collections full-text searchable.
The harvested content will include our own website and the websites of donors who want their sites managed and preserved as records of institutional memory. Beginning web archiving now is important not only to enhance our traditional archival collections, but also to support records management processes that capture digital records of enduring historical and institutional value. Our initial testing included a successful capture of our institution's website. With that first capture complete, we are now developing the workflows and standardized policies that will make our Web Archiving Program a viable service for our donors.
If you are interested in exploring our work with Archive-It's web archiving service, visit the Rockefeller Archive Center collection page on Archive-It and browse the most recent web crawl!