Web Archives – Where Do We Go From Here?

We’ve recently been thinking a lot about the potential of web archives here at the RAC. Last week, I attended the appropriately titled web archives WARCshop hosted by Penn State University. While the organizers focused on giving participants hands-on experience with web archives research tools, the lesson I took away is that librarians and archivists still have a long way to go in fully supporting researchers working with web archives. Penn State invited a great group of librarians and archivists to learn, and brought in Jefferson Bailey and Lori Donovan from Archive-It, Nick Ruest from York University, and Ian Milligan from Waterloo to help lead the workshop. I was personally very excited for the meeting because I think Nick and Ian have been doing some of the most exciting research in web archives for the past few years, and I always love hearing them talk.

We all originally came to get hands-on with tools like Warcbase, Archive-It Research Services, Gephi, and Kibana, but I think some of the most interesting conversations revolved around how we could help make web archives available for research, in every sense of the word. The things you can do with WARC, WAT, and WANE files are undoubtedly cool, but the barrier to entry for researching web archives is almost certainly higher than for working with other materials. It took a room full of technology-savvy people three or four hours to get all of the tools installed with the help of experts, so expecting a researcher to do so on their own is probably unrealistic. Additionally, not many academic researchers are going to get the leeway to spend six months learning about web archives and how to research them, only to find out that a collection doesn’t have much useful information. How could we introduce researchers to the potential of web archives as a resource? How do we let researchers know that working with web archive data is more than just replaying old archived websites? How do our reference, access, and outreach staff get up to speed on web archives technology well enough to lead these discussions with researchers?

Furthermore, during the course of the workshop it became clear that only York University was making its WARC files accessible for download to outside researchers without someone having to ask for them first. It struck me that researchers are going to have a hard time doing new research if they do not know they can have access to the digital files they need. But before we even get to reference and access, a lot of attendees still had questions about how to describe web resources in an archival context. The general consensus was that they don’t fit well with traditional finding aids, but divorcing them from a finding aid removes them from the overall context of their creation alongside other institutional records. How can we begin to provide access if we do not understand how we’re going to describe the materials we’re supposed to be providing access to?

I went into the workshop excited to learn about what I could do to interact with web archives, and I came out knowing more, but with even more questions about the future. To me, web archives feel like the frontier of library and archival science, and we, as a community, need to make some hard decisions and whip ourselves into shape before we can expect researchers to use these awesome resources the way we hope they will. I know we’re trying to work through some of these issues here at the RAC for our burgeoning web archives program, but, as always, I’d love to hear from others about their thoughts and how they’re working with web archives. Comment, email, or tweet me because I’d love to talk to all of you out there.
