As part of our recent cost assessment project, I was asked to count how many items and pages were digitized during each fiscal year that the Special Projects Digitization Program has been in operation.
To do this, I first reviewed all available documentation of which projects had been undertaken, when they were started, how much material was estimated for digitization, and what files were to be produced. I used this to draw up a rough framework to guide my approach to counting specific files within the various structures used to store the many digital surrogates produced. Given the variety of source media, derivatives created, and file hierarchies used, this step was necessary to create meaningful data that could be compiled across all of the variations, and to provide a reference point for checking the very large numbers that counting files by computer would produce.
I then searched each set of files (which are separated into folders by project and stored in a few different places on the server) for whichever derivatives would best represent items digitized and pages digitized. To count items and pages for the FCD project, for example, I counted the PDF files and divided that number by two, since each item received a multi-page PDF service copy, which I knew to be stored twice under our requested file structure. I then counted the TIFF files, since each page was given its own master file. These two searches, performed on all server locations and hard drives holding FCD files, produced a count of items and pages, respectively. I also noted the fiscal year in which the files were created. By devising similar searches for each of the other projects, I was able to produce figures that could be tabulated across the history of the program by item, page, and fiscal year.
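The FCD-style count described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the procedure actually used: the function names are hypothetical, the fiscal year is assumed to begin in July (the start month is not stated here), and a file's modification time is used as a stand-in for the date it was created.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path


def fiscal_year(moment: datetime, start_month: int = 7) -> str:
    """Label a date with its fiscal year, e.g. '2012-13'.

    start_month=7 (July) is an assumption; adjust it to the
    institution's actual fiscal calendar.
    """
    start = moment.year if moment.month >= start_month else moment.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


def count_by_fiscal_year(roots, extensions):
    """Count files under each root whose suffix is in `extensions`,
    grouped by the fiscal year of the file's modification time."""
    wanted = {ext.lower() for ext in extensions}
    counts = Counter()
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix.lower() in wanted:
                # Assumption: modification time approximates creation date.
                modified = datetime.fromtimestamp(path.stat().st_mtime)
                counts[fiscal_year(modified)] += 1
    return counts


def count_items_and_pages(roots):
    """FCD-style count: each item's PDF is stored twice, so halve the
    PDF count for items; each page has one TIFF master file."""
    pdfs = count_by_fiscal_year(roots, {".pdf"})
    tiffs = count_by_fiscal_year(roots, {".tif", ".tiff"})
    items = {fy: n // 2 for fy, n in pdfs.items()}
    pages = dict(tiffs)
    return items, pages
```

Calling `count_items_and_pages` with a list of every server folder and hard drive holding a project's files would return two dictionaries keyed by fiscal year. Other projects would need their own rules for which derivative stands for an item or a page.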
It is important to note certain irregularities. First, what is considered an “item” varies in a way that I could not account for with the information available to me. For example, an item is sometimes a single photo, sometimes a multi-page document, and sometimes an entire folder or compiled scrapbook. Second, some numbers had to be estimated, since the projects they represent are not yet resolved.
Items and Pages Digitized for Special Projects (as of 3/13)
By Fiscal Year (figures are approximate):
- Fiscal Year 2008-09: 1,105 items (120,000 pages)
- Fiscal Year 2010-11: 879 items (6,637 pages)
- Fiscal Year 2011-12: 6,979 items (17,258 pages)
- Fiscal Year 2012-13: 3,212 items (124,728 pages)
By Project:
- RFOD
  - Fiscal Year: 2008-09
  - File Count: 1,105 documents (WO estimate: 120,000 pages)
- Karolinska
  - Fiscal Year: 2010-11
  - File Count: 23 items (48 pages)
- RU Portraits
  - Fiscal Year: 2010-11
  - File Count: 108 photos
- Rainbow Room Scrapbook
  - Fiscal Year: 2010-11
  - File Count: 1 item (238 pages)
- RBF Annual Reports
  - Fiscal Year: 2010-11
  - File Count: 47 items (3,243 pages)
- RFOD Portraits
  - Fiscal Year: 2011-12
  - File Count: 233 items (237 pages)
- PUMC
  - Fiscal Year: 2011-12
  - File Count: 2,670 documents (7,945 pages), 2,576 photos
- FCD
  - Fiscal Year: 2012-13
  - File Count: 1,212 documents (116,228 pages)
- Rockefeller Foundation Centennial (includes in-house, oversized, & TB Posters)
  - Fiscal Year 2010-11: approx. 700 items (3,000 pages)
  - Fiscal Year 2011-12: approx. 1,500 items (6,500 pages)
  - Fiscal Year 2012-13: approx. 2,000 items (8,500 pages)
  - File Count: 4,216 items (18,058 pages)
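As a cross-check, the per-project figures can be rolled up into the fiscal-year totals with a few lines of code. The sketch below rests on two assumptions: each photo is entered as one item and one page (the totals match under that reading, though the list does not say so outright), and the Centennial project is entered using its approximate per-year figures rather than its overall file count. The function name and row layout are illustrative only.

```python
from collections import defaultdict

# (project, fiscal year, items, pages), copied from the By Project list.
# Assumption: every photo counts as one item and one page.
ROWS = [
    ("RFOD", "2008-09", 1105, 120000),
    ("Karolinska", "2010-11", 23, 48),
    ("RU Portraits", "2010-11", 108, 108),
    ("Rainbow Room Scrapbook", "2010-11", 1, 238),
    ("RBF Annual Reports", "2010-11", 47, 3243),
    ("Centennial", "2010-11", 700, 3000),    # approximate
    ("RFOD Portraits", "2011-12", 233, 237),
    ("PUMC documents", "2011-12", 2670, 7945),
    ("PUMC photos", "2011-12", 2576, 2576),
    ("Centennial", "2011-12", 1500, 6500),   # approximate
    ("FCD", "2012-13", 1212, 116228),
    ("Centennial", "2012-13", 2000, 8500),   # approximate
]


def totals_by_fiscal_year(rows):
    """Sum items and pages per fiscal year across all projects."""
    totals = defaultdict(lambda: [0, 0])
    for _project, fy, items, pages in rows:
        totals[fy][0] += items
        totals[fy][1] += pages
    return {fy: tuple(pair) for fy, pair in sorted(totals.items())}


for fy, (items, pages) in totals_by_fiscal_year(ROWS).items():
    print(f"Fiscal Year {fy}: {items:,} items ({pages:,} pages)")
```

Under those assumptions the sums reproduce the By Fiscal Year figures. The Centennial file count of 4,216 items (18,058 pages) is slightly higher than the sum of its rounded yearly estimates (4,200 items, 18,000 pages).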