WARCs! What are they good for? Researchers!

I recently attended the IIPC (International Internet Preservation Consortium) conference in lovely Reykjavik from April 15th to the 17th. This was the first official year of the IIPC conference, and it was a great opportunity for institutions of all sizes to get together, talk about the challenges facing web archiving today, and strategize about the path forward. The presentations covered a wide variety of topics, but the ones I found most helpful were those that focused on how researchers interact with web archives.

On the first day, Ian Milligan and Matt Webber spoke about using the hackathon model as a tool for engaging scholars and giving them hands-on experience working with web archives. The premise was that many historical scholars do not know exactly how to handle and use the mass of information that web archives can provide, but a collaborative hackathon can help them jump into the work in a structured environment while the organizers gather feedback about what could improve the research experience. Matt and Ian drew people from a host of different disciplines with a wide variety of technical skills (some had never opened a command line) and had them jump into large-scale data analysis with web archives, introducing them to the materials and the type of work they can do with them. They created engagement by letting scholars make new connections with each other and dive deep into datasets that they would not have known how to approach without some mediation. This was really helpful in thinking about how we can entice researchers to use our web archives, and how we can explain their historical value to our donors. You can see some of the results of the hackathon here.

The second day of the conference had an entire block dedicated to web archives and researcher needs, kicked off by a great talk from Matt Webber and Pamela Graham, “The State of Researcher Use of Web Archives.” Matt and Pamela recognized that institutions needed to better understand how researchers use and collect web archives, so they sought more insight on use cases, on current practices and frequency of use, and on the current metadata and descriptive practices of creators. They collected survey results from 236 respondents (79 librarians/archivists, 157 researchers) and came out with a few key takeaways: researchers are not leveraging existing resources (49% create their own web archives), librarians often have what researchers need but researchers don’t know that, and targeted outreach can produce meaningful engagement. Researchers want to view web archives as a browser that can go back in time; most want exploratory research, but some want aggregate full-text analysis or want to retrieve old documents. Some of the major barriers to researcher use of web archives are a lack of metadata, a lack of tools to access the data, difficulty finding relevant materials, a lack of tools to analyze the data, and datasets that researchers simply find too large. Essentially, we have to be better about promoting what we have, what we can do with that data, and how web archives can help with historical research.

Finally, Jefferson Bailey of the Internet Archive closed out the talks on researchers and web archives with a great talk about Program Models for Research Services. Jefferson started his talk with the technical issues often inherent in research use of web archives: the WARC file format is not widely known and researchers don’t know how to use it, the data is very large, providing access to web archives carries significant infrastructure requirements, and the different types of web archive data raise questions about what a web archive actually is. Jefferson then went through a number of different approaches and solutions for research service models, talking about how best to get your data to researchers, and along the way shared some great information about what the Internet Archive has learned about its users. Web archive researchers are very concerned about what we do not capture and preserve, much more so than researchers working with analog materials. He also reminded institutions that our web archives are there to extend and enhance our other collections, and that more data does not necessarily mean better research or analysis. He ended with the takeaway that research support expectations often exceed available resources, but if we focus on derivation, portability, and access, we’re over halfway to giving researchers what they need to get the most out of our web archives.

Ultimately, the conference gave me a lot to think about with regard to how we should make our web archives available to researchers when we start ramping up our web archiving efforts. Web archives are rich in historical data, and we need to make sure we create a web archive that researchers feel comfortable using.
