Ross Harvey - Preserving Digital Materials-De Gruyter Saur (2011)

The document is the second edition of 'Preserving Digital Materials' by Ross Harvey, focusing on digital preservation in libraries and information practices. It covers various aspects such as changing preservation paradigms, the importance of preserving digital materials, and the strategies and challenges involved in the preservation process. The book includes chapters on the attributes of digital materials, selection for preservation, and an overview of digital preservation strategies and methods.

Ross Harvey

Preserving Digital Materials
2nd Edition

Current Topics in Library and Information Practice

De Gruyter Saur
ISBN 978-3-11-025368-9
e-ISBN 978-3-11-025369-6
ISSN 2191-2742

Library of Congress Cataloging-in-Publication Data

Harvey, D. R. (Douglas Ross), 1951-


Preserving digital materials / Ross Harvey. -- 2nd ed.
p. cm. -- (Current topics in library and information practice)
Includes bibliographical references and index.
ISBN 978-3-11-025368-9 (acid-free paper) -- ISBN 978-3-11-025369-6 (ebook)
1. Digital preservation. I. Title.
Z701.3.C65H37 2011
025.8'4--dc23
2011032053

Bibliographic information published by the Deutsche Nationalbibliothek


The Deutsche Nationalbibliothek lists this publication in the Deutsche
Nationalbibliografie; detailed bibliographic data are available in the Internet
at http://dnb.d-nb.de.
© 2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Typesetting: Dr. Rainer Ostermann, München
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
∞ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Contents

List of figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 1
What is Preservation in the Digital Age? Changing Preservation
Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Changing paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
The need for a new preservation paradigm . . . . . . . . . . . . . . . . . . . . . . . 10
Changing definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Preservation definitions in the digital world . . . . . . . . . . . . . . . . . . . . . . 16
What exactly are we trying to preserve? . . . . . . . . . . . . . . . . . . . . . . . . . 21
How long are we preserving them for? . . . . . . . . . . . . . . . . . . . . . . . . . . 23
What strategies and actions do we apply? . . . . . . . . . . . . . . . . . . . . . . . . 24
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Chapter 2
Why do we Preserve? Who Should do it? . . . . . . . . . . . . . . . . . . . . . . 25
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Why preserve digital materials? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Professional imperatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
New stakeholders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
How much data have we lost? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Current state of awareness of digital preservation problems. . . . . . . . . . . 37
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Chapter 3
Why There’s a Problem: Digital Artifacts and Digital Objects . . . . . . . 39
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Modes of digital death . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Digital storage media. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Magnetic media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Optical disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
The future for digital storage media . . . . . . . . . . . . . . . . . . . . . . . . . 49
Digital objects – more than digital artifacts. . . . . . . . . . . . . . . . . . . . . . . 50
Loss of functionality of access devices . . . . . . . . . . . . . . . . . . . . . . . 51
Loss of manipulation and presentation capabilities . . . . . . . . . . . . . . 52
Weak links in the documentation chain and loss of contextual
information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

Chapter 4
Selection for Preservation – The Critical Decision. . . . . . . . . . . . . . . . 56
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Selection for preservation, cultural heritage, and professional practice. . . 57
Selection criteria traditionally used by libraries and archives . . . . . . . . . . 59
Why traditional selection criteria do not apply to digital materials . . . . . . 63
IPR, context, stakeholders, and lifecycle models. . . . . . . . . . . . . . . . . . . 65
Intellectual property rights and legal deposit. . . . . . . . . . . . . . . . . . . 65
Context and community. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Stakeholder input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Value of lifecycle models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Developing selection frameworks for preserving digital materials . . . . . . 69
Some selection frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
How much to select? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Chapter 5
What Attributes of Digital Materials Do We Preserve? . . . . . . . . . . . . 75
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Digital materials, technology, and data. . . . . . . . . . . . . . . . . . . . . . . . . . 77
The importance of preserving context. . . . . . . . . . . . . . . . . . . . . . . . . . . 79
The OAIS Reference Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
The role of metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Preservation metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Preservation metadata standards. . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Persistent identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Authenticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Significant properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Research into authenticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Functional Requirements for Evidence in Recordkeeping Project
(Pittsburgh). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
InterPARES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Trusted digital repositories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

Chapter 6
Overview of Digital Preservation Strategies . . . . . . . . . . . . . . . . . . . . 99
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Historical overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Who is doing what?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Criteria for effective strategies and practices. . . . . . . . . . . . . . . . . . . . . . 107
Broader concerns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Policies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Sustainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Typologies of principles, strategies, and practices. . . . . . . . . . . . . . . . . . 114
A typology of digital preservation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

Chapter 7
‘Preserve Technology’ Approaches: Tried and Tested Methods. . . . . . 121
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
‘Non-solutions’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Do nothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Storage and handling practices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Durable/persistent digital storage media . . . . . . . . . . . . . . . . . . . . . . 127
Analogue backups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Digital archaeology and digital forensics . . . . . . . . . . . . . . . . . . . . . 130
‘Preserve technology’ approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Technology preservation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Technology watch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Emulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
The Universal Virtual Computer . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

Chapter 8
‘Preserve Objects’ Approaches: New Frontiers? . . . . . . . . . . . . . . . . . 140
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
‘Preserve Objects’ approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Bit-stream copying, refreshing, and replication . . . . . . . . . . . . . . . . . . . . 142
Bit-stream copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Refreshing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Standard data formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
File format registries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Standardizing file formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Restricting the range of file formats . . . . . . . . . . . . . . . . . . . . . . . . . 152
Developing archival file formats . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Viewers and migration on request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Encapsulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Combining principles, strategies, and practices. . . . . . . . . . . . . . . . . . . . 165
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

Chapter 9
Digital Preservation Initiatives and Collaborations . . . . . . . . . . . . . . . 168
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Collaboration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Typologies of digital preservation initiatives . . . . . . . . . . . . . . . . . . . . . 171
International initiatives and collaborations . . . . . . . . . . . . . . . . . . . . . . . 172
International services. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
The Internet Archive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
JSTOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
DuraSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
LOCKSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
MetaArchive Cooperative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
International alliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
UNESCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
PADI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
OCLC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
CAMiLEON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
International Internet Preservation Consortium . . . . . . . . . . . . . . 183
Regional initiatives and collaborations . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Regional services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
NEDLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Regional alliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
ERPANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
European Commission-funded projects . . . . . . . . . . . . . . . . . . . 186
Digital Recordkeeping Initiative . . . . . . . . . . . . . . . . . . . . . . . . 188
National initiatives and collaborations . . . . . . . . . . . . . . . . . . . . . . . . . . 187
National services. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
AHDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Florida Digital Archive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
National alliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Digital Curation Centre . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Digital Preservation Coalition . . . . . . . . . . . . . . . . . . . . . . . . . . 191
NDIIPP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
National Digital Stewardship Alliance . . . . . . . . . . . . . . . . . . . . 193
HathiTrust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Sectoral initiatives and collaborations . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Sectoral services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Cedars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Sectoral alliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
JISC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

Chapter 10
Challenges for the Future of Digital Preservation . . . . . . . . . . . . . . . . 199
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
What have we learned so far? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Four major challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Challenge 1: managing digital preservation . . . . . . . . . . . . . . . . . . . 206
Challenge 2: funding digital preservation . . . . . . . . . . . . . . . . . . . . . 208
Challenge 3: peopling digital preservation . . . . . . . . . . . . . . . . . . . . 211
Challenge 4: making digital preservation fit . . . . . . . . . . . . . . . . . . . 213
Research and digital preservation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Conclusion: the future of digital preservation . . . . . . . . . . . . . . . . . . . . . 219

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
List of Figures

Figure 1.1 Selected Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Figure 1.2 Selected Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 3.1 Threats to Digital Continuity . . . . . . . . . . . . . . . . . . . . . . . 43
Figure 3.2 Comparison of Data Carriers . . . . . . . . . . . . . . . . . . . . . . . 44
Figure 3.3 Sample Generic Figures for Lifetimes of Media . . . . . . . . . 47
Figure 5.1 Deciding on Essential Elements . . . . . . . . . . . . . . . . . . . . . 93
Figure 6.1 Factors to Consider when Selecting Digital
Preservation Technologies. . . . . . . . . . . . . . . . . . . . . . . . . 109
Figure 7.1 Environmental Storage Conditions for Some Digital
Storage Media. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Figure 7.2 Storage and Handling of Digital Storage Media . . . . . . . . . . 127
Figure 8.1 Formats – Open and Proprietary . . . . . . . . . . . . . . . . . . . . 145
Figure 9.1 Initiatives and Collaborations . . . . . . . . . . . . . . . . . . . . . . 172
Introduction
This book is a revision of Preserving Digital Materials published in 2005
(Harvey, 2005b). The first edition was well received, one author describing it
as a ‘comprehensive examination of the landscape of preservation’ (Ross, 2007).
An update is timely, because there has been significant change in the field
since 2005 as the experience of practitioners expands and the findings of re-
searchers accumulate. This second edition has the same aims as the first: to
provide an introduction to the preservation of digital materials in order to inform
practice in cultural heritage institutions, and to provide a framework within
which to reflect on digital preservation issues. It is intended for an audience similar
to that of the first edition – information professionals who seek a reference text,
practitioners who want to reflect on the issues, and students in the field of digital
preservation. It differs from the first edition in four principal respects:
– It provides a more international perspective. The first edition described
Australian activities in detail, whereas the second edition gives attention to
major initiatives in the UK, the EU and the US since 2005.
– It expands the audience to include information professionals working in
environments other than libraries and recordkeeping organizations, as well
as those who create digital materials. Digital preservation is considered in-
creasingly as being within the purview of scientists, scholars and individu-
als, not just of professionals employed in the institutional settings of librar-
ies and archives.
– It takes account of developments since 2005. These include, in particular,
the consolidation of digital preservation practice (so that we can now begin
to discuss ‘standard’ practice), developments that result from the activities
of bodies such as the Joint Information Systems Committee (JISC) and the
Digital Curation Centre (DCC) in the UK, and research and development
projects funded by the EU and, in the US, the Library of Congress and the
National Science Foundation (NSF). Specific topics that are added to or
given greater emphasis in the second edition include cost modeling and the
cost of digital preservation, skills identification and education and training
requirements and initiatives, personal digital archiving, and models such as
lifecycle models and the OAIS Reference Model.
– It also takes account of significant publications since 2005, such as Long-
Term Preservation of Digital Documents: Principles and Practices (Borg-
hoff et al., 2006), Digital Preservation (Deegan and Tanner, 2006), Preserv-
ing Digital Information (Gladney, 2007), the Workbook on Digital Private
Papers (Paradigm Project, 2008), and my own Digital Curation (Harvey,
2010).

Preserving Digital Materials investigates the current practice of those who
preserve digital materials – information professionals (librarians, recordkeep-
ing professionals, museum professionals), scholars and scientists, and indi-
viduals. A strong claim can be made that preservation of digital materials is the
single most serious issue faced by information professionals and that it is also
of considerable importance to scholars and scientists, as well as to individuals.
Much information about digital preservation is available in print and on the
web, and the quantity is increasing, but most practitioners do not have the time
or the technical expertise to evaluate and synthesize it. Preserving Digital
Materials fills this gap in the literature. It provides an introduction to the prin-
ciples, strategies and practices applied by information professionals (librarians,
recordkeeping professionals, museum professionals), scholars and scientists,
and individuals to the preservation of digital materials. It aims to improve digital
preservation practice by focusing on current practice, taking stock of what we
know about the principles, strategies and practices that prevail, and describing
the outcomes of recent and current research.
Digital preservation poses many challenges. From a university newsletter
comes this comment: ‘why not embrace the digital future now? The issue of
preservation is one of the main obstacles’ (Shaw, 2010). There is, increasingly,
comment on digital preservation concerns in the popular press and the blogo-
sphere, of which the following is just one of many examples: Thorpe (2011)
reports in The Observer on the ‘Race to save digital art from the rapid pace of
technological change’. The National Library of Australia’s Digital Preserva-
tion Policy neatly sums up the challenges that institution faces in keeping digital
information materials accessible, all of which are faced by a very wide range
of institutions and individuals:

– The volume of materials to be maintained
– The diverse and frequently changing range of file formats and standards, and the
changing availability of hardware, software and other technology required for access
– Widespread use of relatively unstable carriers, subject to short-term media deterio-
ration and data corruption or loss
– The need for preservation decisions to be made early in the life cycle of digital objects
– For some materials, relatively long delays between their creation and their being
acquired and controlled …
– Uncertainty about the significant properties or essential characteristics that must be
maintained for some digital resources
– The need to maintain relationships between objects, between parts of complex ob-
jects which may be in different formats, and between objects and the metadata that
describes them
– The recurring nature of many of the threats and the short replacement cycles for …
infrastructure for managing digital collections
– Uncertainty about the strategies and techniques most likely to be effective, and the
significant time required to plan and implement any currently available strategies
for such diverse and large collections
– The likely high costs of taking action, and the likely high costs of delaying or not
taking action (including the likelihood of loss of access)
– A mismatch between funding cycles and long term preservation commitments, even
for long existing institutions …, leading to the possibility that some preservation
commitments may have to be given priority over others
– Intellectual property and other rights-based constraints on preservation processes
and on the provision of access
– Administrative complexities in ensuring timely action is taken that will be cost-
effective over very long periods of time
– The need to develop and maintain suitable knowledge and systems to deal with
these challenges (National Library of Australia, 2008).

Failure to address these challenges results in the loss of significant quantities
of digital information. The paradigms that shape the environment in which we
now function have changed. By comparison with traditional practice in cultural
heritage institutions, significantly different issues are raised by the increasing
reliance in today’s society on information in digital form, both born-digital and
digitized from paper, film, or other media. The benign neglect that may have
sufficed in the past to preserve information is no longer enough; active inter-
vention is required.
The preservation of digital materials poses many challenges for which pre-
digital paradigms offer little assistance. One challenge is that preserving digital
materials requires constant maintenance, relying on complex hardware and
software that are frequently upgraded or replaced. Another challenge is the
increasing quantity and complexity of digital materials, taxing library and
archive systems designed to manage small numbers of simple documents. The
range of stakeholders who have an interest in maintaining digital materials for
use into the future is wide. A 2003 report begins with the words

The need for digital preservation touches all our lives, whether we work in commercial
or public sector institutions, engage in e-commerce, participate in e-government, or use
a digital camera. In all these instances we use, trust and create e-content, and expect
that this content will remain accessible to allow us to validate claims, trace what we
have done, or pass a record to future generations (NSF-DELOS Working Group on
Digital Archiving and Preservation, 2003, p.i).

These words remain as relevant now as when they were written almost ten
years ago.
We cannot expect a technological quick fix. We now appreciate that the
challenges of maintaining digital materials so they remain accessible in the
future are not just technological. They are equally bound up with organiza-
tional infrastructure, resourcing, and legal factors, and we have not yet got
the balance right. These and other factors combine to make the task difficult,
although there are clear pointers to the way ahead. As Breeding (2010, p.32)
notes, ‘while the current state of the art in digital preservation falls short of an
ideal system that guarantees permanent survival, much has been done to address
the vulnerabilities inherent in digital content’.
Both the library community and the recordkeeping community (archivists
and records managers), as well as an increasing number of other groups, are
energetically seeking solutions to the challenges of digital preservation. Over
the last decade there has been increased sharing of the outcomes of research
and practice. Developments in one community have considerable potential to
assist practice in other information and heritage communities. This book goes
some way towards addressing this need by providing examples from several
different communities.
Although much high-quality information is available to information pro-
fessionals concerned with preserving materials in digital form, most notably on
the web, its sheer volume causes problems for busy information professionals,
scholars and scientists, and individuals who wish to understand the issues and
learn about strategies and practices for digital preservation. Preserving Digital
Materials is written for these time-poor information professionals, scholars
and scientists, and individuals. Its synthesis of current information, research
and perspectives about digital preservation from a wide range of sources
across many areas of practice makes it of interest to a wide range of readers –
from preservation administrators and managers who want a professional refer-
ence text to thinking practitioners who wish to reflect on the issues that digital
preservation raises in their professional practice. It will also be of interest to
students.
The reader should note two features of this book. Preserving Digital Mate-
rials is not a how-to-do-it manual, although it does include information about
practical applications, so it is not the place to learn how to apply the technical
procedures of digital preservation. It is not primarily concerned with digitiza-
tion and makes little distinction between information that is born-digital and
information that is digitized from physical media.
This book addresses four key questions which give the text its four-part
structure:

1. Why do we preserve digital materials?
2. What digital materials do we preserve?
3. How do we preserve digital materials?
4. How do we manage digital preservation?

Chapters 1 to 3 address the first question: why do we preserve digital materials?
These chapters examine key definitions and their relationship to ways of think-
ing about digital preservation, note some of the reasons why preservation is a
strong professional imperative for librarians, recordkeepers, scholars and scien-
tists, and individuals, indicate the extent of the preservation problem for digital
materials, and look at the reasons why a digital preservation problem exists.
The question ‘what digital materials do we preserve?’ is investigated in chapters
4 and 5. Chapter 4 examines the issues of selection of digital materials for
preservation, and chapter 5 notes the questions about the attributes of digital
materials we need to preserve. The question ‘how do we preserve digital mate-
rials?’ is covered in chapters 6, 7 and 8. An overview of digital preservation
strategies is provided in chapter 6, and chapters 7 and 8 describe specific strate-
gies. Aspects of the question ‘how do we manage digital preservation?’ are
noted in chapters 9 and 10. Chapter 9 describes major digital preservation ini-
tiatives and collaborations. Chapter 10 examines some of the issues that digital
preservation faces in the future.
The reader should be aware that this book presents a Western view of
preservation, a view not necessarily embraced by all cultures. The reader should
also note the words in Article 9 of the UNESCO Charter on the Preservation
of the Digital Heritage:

The digital heritage is inherently unlimited by time, geography, culture or format. It is
culture-specific, but potentially accessible to every person in the world. Minorities may
speak to majorities, the individual to a global audience. The digital heritage of all regions,
countries and communities should be preserved and made accessible, creating over
time a balanced and equitable representation of all peoples, nations, cultures and lan-
guages (UNESCO, 2004).

The first edition of Preserving Digital Materials (2005) used many Australian
examples, because Australian practice in digital preservation – from the library,
recordkeeping, audiovisual archiving, data archiving and geoscience sectors –
was often at the forefront of international best practice. This second edition of
Preserving Digital Materials provides a more international perspective, noting
major initiatives in the UK, the EU and the US since 2005. It is possible to do
this in 2011 because of the considerable quantity of material reported by these
and many other initiatives and readily available on web sites, in conference
proceedings and from other public sources.
As noted above, there is a considerable amount of high-quality information
available about preserving materials in digital form, much of it available on the
web. The accessibility that this provides is countered by the impermanence of
much web material, as noted in several chapters in this book. All URLs in this
book were correct at the time of writing.
The first edition of this book acknowledged my indebtedness to many
people, and these debts still remain. In producing the first edition I benefited from
discussions with many colleagues at that time. In particular, I acknowledged
the following individuals for their ideas and support: Tony Dean for suggesting
the example of Piltdown Man; Liz Reuben, Matthew Davies, Stephen Ellis and
Rachel Salmond for case studies; Alan Howell, of the State Library of Victoria,
and staff of the National Library of Australia, in particular Pam Gatenby, Colin
Webb, Kevin Bradley and Margaret Phillips, for their assistance with clarifying
concepts. Some of the material in the first edition was based on interviews
with Australian digital preservation experts, whose assistance and encourage-
ment was invaluable: Toby Burrows, Mathew Davies, Ray Edmondson, Stephen
Ellis, Alan Howell, Maggie Jones, Gavan McCarthy, Simon Pockley, Howard
Quenault, Lloyd Sokvitne, Paul Tresize, and Andrew Wilson. Heather Brown
and Peter Jenkins provided examples, and their assistance and the permission
of the State Library of South Australia was gratefully acknowledged. Thanks
were due to Ken Thibodeau, CLIR, ERPANET and UNESCO for permission
to use their material. I acknowledged my gratitude to my then employer, Charles
Sturt University, which supported me by providing study leave in 2003. I was
fortunate to be based at the National Library of Australia as a National Library
Fellow from March to June 2003 and I greatly appreciated the generous sup-
port of its then Director-General, Jan Fullerton, and other staff of the National
Library. Finally, I acknowledged the unfailing support of Rachel Salmond in
this and others of my endeavours.
In writing the second edition of this book I have incurred new debts. In
addition to those noted for the first edition, I gratefully acknowledge students
who have enrolled in my courses on digital preservation at Yonsei University,
Seoul, and the Graduate School of Library and Information Science, Simmons
College, Boston. My ideas have been informed by conversations with people too
numerous to name, but I wish to particularly thank Jeannette Bastian, Michèle
Cloonan, Joy Davidson, Cal Lee, Michael Lesk, Martha Mahard, Seamus Ross,
Anne Sauer, Shelby Sanett, and Terry Plum. I am grateful for the support of
my current employer, the Graduate School of Library and Information Science,
Simmons College, Boston.
I must again acknowledge the unfailing support of Rachel Salmond. I owe
Rachel more than I can adequately express here for her help over three decades,
her editorial assistance and her patience with me as the preparation of this
book took over normal schedules.
Chapter 1
What is Preservation in the Digital Age?
Changing Preservation Paradigms
Introduction
To preserve, as the dictionary reminds us is to keep
safe … to maintain unchanged … to keep or maintain
intact. But the rapid obsolescence of information
technology entails the probability that any digital
object maintained unchanged for any length of time
will become inaccessible (Thibodeau, 1999)

Any discussion about the preservation of digital materials must begin with the
consideration of two interlinked areas: changing preservation paradigms, and
definitions of terms. Without a clear understanding of what we are discussing,
the potential for confusion is too great. In library and recordkeeping practice
we are moving rapidly from collection-based models, whose principles and
practices have been developed over many centuries, to models where collec-
tions are not of paramount importance and where what matters is the extent of
access provided to information resources, whether they are managed locally or
remotely. Archivists have considered, debated, and sometimes applied the con-
cept of non-custodial archives, where there is no central collection, to accom-
modate the massive increase in numbers of digital records. Librarians manage
hybrid libraries, consisting of both physical collections and distributed digital
information resources, and digital libraries. Other stakeholders with a keen inter-
est in digital preservation manage digital information in specific subject areas,
such as geospatial data or social science data. In the past this material, where it
existed, was maintained as collections of paper and other physical objects. The
practices developed and applied in libraries and archives are still largely based
on managing physical collections and cannot be applied automatically to man-
aging digital collections.
The changing models of library and recordkeeping practice require new
definitions. The old terms do not always convey useful meanings in the digital
environment and can be misleading and, on occasion, even harmful. In library
and recordkeeping practice we are changing from a preservation paradigm
where primary emphasis is placed on preserving the physical object (the arti-
fact as carrier of the information we wish to retain, for example, a CD) to one
where there is no physical carrier to preserve. What, then, does the term pres-
ervation mean in the digital environment? How has its meaning changed?
What are the implications of these changes? The phrase benign neglect pro-
vides an example of a concept that is helpful in the pre-digital preservation
paradigm but is harmful in the new. It refers to the concept that many informa-
tion carriers made of organic materials (most notably paper-based artifacts)
will not deteriorate rapidly if they are left undisturbed. For digital materials
this concept is positively harmful. One thing we understand about information
in digital form is that actions must be applied almost from the moment it is
created, if it is to survive. Pre-digital paradigm definitions do not accommodate
new forms, such as works of art that incorporate digital technologies, and time-
defined creative enterprises such as performance art.
This chapter examines the effect of digital information on ‘traditional’
librarianship and recordkeeping paradigms, noting the need for a new preser-
vation paradigm in an environment that is dynamic and has many stakeholders,
often with competing interests. It considers the differences between born-
digital and digitized information, and defines key terms.

Changing paradigms
It is now commonplace to hear or read that we live in an information society,
of which one main characteristic is the widespread and increasing use of net-
worked computing, which relies on data. This is revolutionizing the way in
which large parts of the world’s population live, work and play, and how
libraries, archives, museums and other institutions concerned with preserving
documentary heritage function and are managed. New expectations of these
institutions are evolving.
The significance of these changes is readily illustrated by just one example.
The internet is rapidly becoming the first choice for people who are searching
for information on a subject, and a new verb, to google (derived directly from
Google, the name of a widely used internet search engine) has entered our
vocabulary. The sheer size and rapid rate of the internet’s growth mean that
no systems have been developed to provide comprehensive access to it. The
systems that do exist are embryonic and experimental, and the quality of the
information available on the web is variable. Attempts to estimate the rate of
the internet’s growth have included counting the number of domain names
over several years. There has been a dramatic increase in the number of domain
names since 1994, when only a small number were registered, rising to around
100 million at the start of 2001 and, ten years later, to almost 800 million in
2010 (Internet Systems Consortium, 2011).
These major changes – it is not too extreme to call them a revolution – raise
the question of how to keep the digital materials we decide are worth keeping.
The ever-increasing quantities being produced do not assist us in finding an
answer. ‘According to a recent study by market-research company IDC … the
size of the information universe is currently 800,000 petabytes. … but it’s just
a down payment on next year’s total, which will reach 1.2 million petabytes, or
1.2 zettabytes. If these growth rates continue, by 2020 the digital universe will
total 35 zettabytes, or 44 times more than in 2009’ (Tweney, 2010). Nor does
the rapidity with which changes in computer and information technology occur.
The challenges are new and complex for nearly all aspects of librarianship and
recordkeeping, including preservation.
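
The figures quoted from Tweney (2010) can be checked with a little arithmetic. The sketch below assumes that the 800,000-petabyte figure refers to 2009, that one zettabyte equals one million petabytes, and that growth is compounded annually over the eleven years to 2020; the function and variable names are illustrative only. On those assumptions the 2020 estimate is about 44 times the 2009 total, which implies average growth of roughly 41 per cent a year.

```python
# Figures as quoted from Tweney (2010). Assumption: the 800,000 PB
# figure is the 2009 total, and 1 zettabyte = 1,000,000 petabytes.
PETABYTES_PER_ZETTABYTE = 1_000_000

size_2009_zb = 800_000 / PETABYTES_PER_ZETTABYTE  # 0.8 ZB
size_2020_zb = 35.0                               # projected total


def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by start, end and elapsed years."""
    return (end / start) ** (1 / years) - 1


multiple = growth_multiple(size_2009_zb, size_2020_zb)
rate = annual_growth_rate(size_2009_zb, size_2020_zb, years=2020 - 2009)

print(f"2020 total is about {multiple:.0f} times the 2009 total")
print(f"Implied average growth: about {rate:.0%} per year")
```
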
There have also been changes in the ways in which information is pro-
duced and becomes available to communities of users. The internet is only one
of these ways. In the pre-digital (print) environment the processes of creation,
reproduction and distribution were separate and different; now, ‘technology
tends to erase distinctions between the separate processes of creation, repro-
duction and distribution that characterize the classic industrial model of print
commodities’ (Nurnberg, 1995, p.21). This has significant implications for
preservation, especially in terms of who takes responsibility for it and at what
stage preservation actions are first applied. For instance, in the industrial-mode
print world, acquiring the artifact – the book – so that it could be preserved
occurred by means such as legal deposit legislation, requiring publishers to
provide copies to libraries for preservation and other purposes. If the creator is
now also the publisher and distributor, as is often the case in the digital world,
who has the responsibility of acquiring the information? These points are noted
in more detail later in this book.
New ways of working and new structures are developing. Cyberscholarship
(known also as e-science or e-research) is based on ready access to digital mate-
rials and applies computing techniques to analyze, visualize and present results.
This research is typically highly collaborative, being based on the use of large
data sets produced and shared by international communities of scholars. The
practices developed in this cyberscholarship environment are significantly dif-
ferent from traditional practices. Other characteristics of cyberscholarship also
illustrate different practices. The enhanced ability to compute large quantities of
data, such as using visualizations and simulations, provides new possibilities,
some of which can be seen in the Electronic Cultural Atlas Initiative (ecai.org).
The generation of large quantities of data places heavy demands on how data are
stored and managed. Heavy emphasis is placed on sharing and re-using digital
information. All of these factors place different demands on how digital informa-
tion is managed, including on its preservation over time to ensure it remains
available and usable in the future. A 2008 study carried out for the Association
of Research Libraries provides examples of cyberscholarship in humanities, so-
cial sciences and scientific/technical/medical subject areas in the US (Maron and
Kirby Smith, 2008). Changing information practices in the humanities, in the
UK specifically, are described by Bulger and her colleagues (Bulger et al., 2011).
Cyberinfrastructure refers to the computer networks, libraries and archives,
online repositories and other resources needed to support cyberscholarship.
These include easy-to-use, effective applications and services to locate, manage,
analyze, visualize and store data, and sufficient numbers of people skilled
in managing large quantities of data. (Borgman (2007) provides an excellent
overview of cyberscholarship and cyberinfrastructure.) Much of the current
discussion about cyberscholarship and cyberinfrastructure is centered on the
roles that research libraries will play. Walters and Skinner (2011, p.5) see
research libraries being ‘repositioned as vibrant knowledge branches that reach
throughout their campuses to provide curatorial guidance and expertise for
digital content, wherever it may be created and maintained’. New structures
will continue to evolve and develop.

The need for a new preservation paradigm
The digital revolution is changing the professional practice of librarians, record-
keepers, and indeed all other information professionals. The paradigm shift in
how libraries, archives and information agencies conduct their activities can be
crudely (and naively) described as the change from acquiring, storing and pro-
viding access to information resources in physical forms, to acquiring, storing
and providing access to digital information resources. Preservation activities
are central to this new paradigm.
The pre-digital preservation paradigm is based on principles such as the
following:

– When materials are treated, the treatments should, when possible, be re-
versible
– Whenever possible or appropriate, the originals should be preserved; only
materials that are untreatable should be reformatted
– Library materials should be preserved for as long as possible
– Efforts should be put into preventive conservation, and aimed at providing
appropriate storage and handling of artifacts
– Benign neglect may be the best treatment (derived from Cloonan (1993,
p.596), Harvey (1993, pp.14,140), and Bastian, Cloonan and Harvey (2011,
pp.612-613)).

The definitions associated with the old preservation paradigm are firmly rooted
in the conservation of artifacts – the physical objects that carry the information
content. In fact, the term ‘materials conservation’ is sometimes used, especially
by museums. The definitions provided in the IFLA Principles for the Care and
Handling of Library Materials (Adcock, 1998), widely adopted in the library
and recordkeeping contexts, articulate principles firmly based on maintenance
of the physical artifact. The definition of Conservation notes that its aims are
to ‘slow deterioration and prolong the life of an object’, and that of Archival
quality emphasizes the longevity and stability of materials in the words ‘a
material, product, or process is durable and/or chemically stable, that it has a long
life, and can therefore be used for preservation purposes’ (Adcock, 1998, p.4).
Medium/media is defined as ‘the material on which information is recorded.
Sometimes also refers to the actual material used to record the image’ (Adcock,
1998, p.5). The point here is that the old preservation paradigm considers
information content and the carrier as one and the same, although this changed
in the declining years of the old paradigm as large-scale copying programmes,
especially microfilming programmes, were implemented. To these definitions
we should add Restoration (from the 1986 version of the IFLA Principles):
‘Denotes those techniques and judgements used by technical staff engaged in
the making good of library and archive materials damaged by time, use and
other factors’ (Dureau and Clements, 1986, p.2).
Pre-digital paradigm thinking does not transfer well to the digital environ-
ment. This is easily illustrated. Taking just one example, the emphasis on
keeping the physical carrier (the diskette, CD, magnetic tape, hard drive, flash
drive) does not work because these carriers quickly become obsolete, are
closely linked to specific hardware and software drivers which also quickly
become obsolete, are easy to corrupt, and deteriorate rapidly (Bastian, Cloonan
and Harvey, 2011, p.611). Cox indicates how a different way of thinking is
needed in the recordkeeping community and suggests four elements of a new
preservation paradigm for electronic records. He notes, for instance, that record-
keepers ‘have long seen centralization or custody of records as crucial to their
work’, but this is not feasible for electronic records; transferring electronic
records to the custody of archives ‘may undermine their very long-term use’
(Cox, 2001, p.95).
As Marcum noted in her preface to a 2002 survey about the state of pres-
ervation programmes in American college and research libraries,

the information landscape has changed, thanks to the digital revolution. Libraries are
working to integrate access to print materials with access to digital materials. There is
likewise a challenge to integrate the preservation of analog and digital materials. Preser-
vation specialists have been trained to work with print-based materials, and they are
justifiably concerned about the increased complexity of the new preservation agenda
(Kenney and Stam, 2002, p.v).

Marcum’s description of the situation is still accurate ten years later. Additionally,
research libraries are seeking new roles as experts in the curation of digital
materials (Walters and Skinner, 2011).
What is the new preservation agenda? How has the preservation paradigm
changed to accommodate it? Pre-digital preservation paradigm thinking does
include some useful understanding of digital preservation. For example, it
recognizes that copying (as in refreshing from tape to tape) is the basis of digital
preservation; this recognition is encapsulated in British Standard BS4783 Part 2,
dating from 1988, which focuses on procedures for refreshing digital data by
copying them from magnetic tape to magnetic tape at regular intervals and on
the best storage and handling of these tapes. The old paradigm does not, how-
ever, engender an understanding of the complexity of copying – which is more
than simply preserving a bit-stream, but must take account of a wide range of
other attributes of the digital object that also need to be preserved.
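
The bit-stream part of refreshing can be illustrated with a short sketch: copy a file from one carrier to another and confirm that the copy is bit-for-bit identical. This is an illustration under stated assumptions, not a procedure taken from BS4783; the file names, the refresh function and the choice of SHA-256 as the checksum are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target and verify that the bit-stream is unchanged."""
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"refresh failed: {target} differs from {source}")
    return after


# Demonstration with temporary files standing in for old and new carriers.
with tempfile.TemporaryDirectory() as workdir:
    old_carrier = Path(workdir) / "old_carrier.bin"
    new_carrier = Path(workdir) / "new_carrier.bin"
    old_carrier.write_bytes(b"example record content")
    checksum = refresh(old_carrier, new_carrier)
    copies_match = new_carrier.read_bytes() == old_carrier.read_bytes()
```

A matching checksum shows only that the bit-stream survived the copy. It says nothing about whether the file format can still be rendered or whether the object's other attributes have been retained, which is the gap in old-paradigm thinking noted above.
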
In 1998 Hedstrom provided early recognition of the need for new preserva-
tion paradigms and of the enormity of the challenges faced, noting that ‘digital
preservation adds a new set of challenges for libraries and archives to the exist-
ing task of preserving a legacy of materials in traditional formats’ (Hedstrom,
1998, p.192). However, the influence of old-paradigm thinking led to digital preservation
thinking being characterized as ‘a myopic focus on technical
problems (such as preserving digital objects) and a concomitant neglect of the
bigger picture (for example, public policy, among other issues)’ (Cloonan,
2001, p.232). Old-paradigm thinking does not suffice because not only are there
additional technical challenges, there are also new challenges resulting from
the quantities of digital information being produced. In a prescient comment
Hedstrom noted: ‘Our ability to create, amass, and share digital materials far
exceeds our current capacity to preserve even that small amount with continuing
value’ (Hedstrom, 1998, p.192). Specific examples from scientific areas, such
as determining long-term global climate change, and biomedicine, are pro-
vided in a 2003 report which contends that ‘much more digital content is avail-
able and worth preserving’ (Workshop on Research Challenges in Digital
Archiving and Long-term Preservation, 2003, p.2). The situation has not
changed significantly in the intervening years. New ways of thinking about
preservation and new skills are still needed.
At this point it is useful to consider some of the key elements of new para-
digm preservation thinking. The first of these is the need to actively maintain
digital information over time from the moment of its creation. Interruptions in
the management of a digital collection will mean that there is no collection left
to manage. Unlike most collections of physical objects, collections of digital
materials require ‘constant maintenance and elaborate “life-support” systems
to remain viable’ (Workshop on Research Challenges in Digital Archiving and
Long-term Preservation, 2003, p.7). In addition to these technical issues there
are other issues – social and institutional (political in the broadest sense) – for
‘even the most ideal technological solutions will require management and
support from institutions that go through changes in direction, purpose, and
funding’ (Workshop on Research Challenges in Digital Archiving and Long-
term Preservation, 2003, p.7). The three-legged stool model of digital preser-
vation, in which the legs represent technology, organization, and resources as
equally important components of supporting digital preservation, describes this
well (McGovern, 2007a).
Further key elements are the scale and nature of the digital information we
wish to maintain into the future and the preservation challenges these pose.
The complexities of the variety of digital materials are described in this way:

Digital objects worthy of preservation include databases, documents, sound and video
recordings, images, and dynamic multi-media productions. These entities are created
on many different types of media and stored in a wide variety of formats. Despite a
steady drop in storage costs, the recent influx of digital information and its growing
complexity exceeds the archiving capacity of most organizations (Workshop on Research
Challenges in Digital Archiving and Long-term Preservation, 2003, p.7).

Arising from these key factors is the need for new kinds of skills. Current
preservation skills and techniques are labour-intensive and, even where ap-
propriate, do not scale up to the massive quantities of digital materials we are
already encountering. The problem cannot simply be addressed by technologi-
cal means. What kind of person will implement the new policies and develop
the new procedures required to maintain digital materials effectively into the
future? New kinds of positions, requiring new skill sets, are already being established
in libraries and archives. Among key selection criteria for a Digital
Archivist at the MIT Libraries in Cambridge, Massachusetts, advertised in
May 2011, were:

– Demonstrated knowledge of digital archival and records management theory
and practice including issues related to intellectual property, content
management, access, and preservation
– Demonstrated knowledge of data storage methods, media and security
– Experience with digital repository platform(s) and with XML and digital
content creation/transformation tools
– Demonstrated knowledge of descriptive metadata standards including
MARC, DACS, Dublin Core and of data structure standards relevant to
archival control of digital collection material (examples: EAD, Dublin
Core, MODS, METS, PREMIS, or VRA Core)
– Experience with relational databases.

People with these skill sets are still in short supply. But it is more than new
skills that is required. We need to redefine the field of preservation and the
terms we use to describe preservation activities.

Changing definitions
Pre-digital preservation paradigm definitions do not convey useful meanings
when they are applied to digital preservation. What are they?

Currently ‘conservation’ is the more specific term and is particularly used in relation to
specific objects, whereas ‘preservation’ is a broader concept covering conservation as
well as actions relating to protection, maintenance, and restoration of library collections.
The eminent British conservator, Christopher Clarkson, emphasizes this broader aspect
when he states that preservation ‘encompasses every facet of library life’: it is, he says,
‘preventive medicine ... the concern of everyone who walks into, or works in, a library.’
For Clarkson conservation is ‘the specialized process of making safe, or to a certain
degree usable, fragile period objects’ and restoration ‘expresses rather extensive rebuilding
and replacement by modern materials within a period object, catering for a future
of more robust use.’ He neatly distinguishes the three terms by relating them to the extent
of operations applied to an item: ‘restoration implies major alterations, conservation
minimal and preservation none’ (Harvey, 1993, pp.6-7).

These definitions are based on the assumptions that deteriorated materials or
artifacts can be made good by restoring them, and that we can slow down, per-
haps even halt, the rate of deterioration by taking appropriate measures, such
as paying attention to careful handling and high-quality storage. The principles
on which old-paradigm preservation practice is based, and in which these defi-
nitions are rooted, are almost entirely oriented towards artifacts – preserving
the media in which the information is stored. These principles were modified
as large-scale mass preservation treatments and practices, such as mass deacidi-
fication or microfilming, were implemented. For example, reformatting pro-
grammes (microfilming and photocopying) moved the paradigm towards a
recognition that information content could be preserved without also having to
preserve the original media on which the content was carried.
But these definitions simply do not work with digital materials. In a per-
ceptive conjecture about how preservation will change in the future, Cloonan
asked in 1993 ‘Which principles will be practiced? ... Will our very concept of
permanence change in the next generation ...? The answer ... is likely to be
“yes”’ (Cloonan, 1993, p.602). She suggested that our preservation concerns in
the future will be about the preservation of knowledge, not the preservation of
individual items, and that ‘we must continue to save as much information as
possible, regardless of the format or the means by which it is stored and dis-
seminated’ (Cloonan, 1993, p.603). Elsewhere, Cloonan conjectured that, while
it is easy to indicate what a library preservation programme consists of – ‘disaster
recovery planning, collection development policies, environmental controls,
integrated pest management, proper storage, physical treatment, reformatting,
migration, staff and user education, and the like’ – we are still not any closer to
what preservation is really about, what its ‘essence’ is. ‘Is preservation merely
a set of actions? Is it a way of seeing? Or a way of interpreting information? Is
copying preservation? Is reformatting?’ (Cloonan, 2001, pp.232-233).
The very stuff we seek to preserve needs better definition. Preserving the
media will not suffice; but what exactly is it that we want to preserve? Is it digi-
tal materials (as this book’s title suggests), digital objects, digital records, digital
documents, data? We need definitions that can be commonly understood by
those who create, manage and use digital materials. The Digital Curation Centre’s
Curation Lifecycle Model (Digital Curation Centre, 2008) gives definitions for
the terms data, digital object and database that accommodate the range of digital
materials we seek to preserve. Data is ‘any information in binary digital form’
and includes digital objects and databases. Digital objects can be simple (‘such
as textual files, images or sound files, along with their related identifiers and
metadata’) or complex (‘made by combining a number of other digital objects,
such as websites’). Databases are ‘structured collections of records or data stored
in a computer system’.
Conway provided an early description of the changes from old to new
preservation paradigms and the development of new definitions in terms of
transformation, suggesting that consensus about ‘a set of fundamental princi-
ples that should govern the management of available resources in a mature
preservation program’ has been reached in the analogue world and that they
persist in the digital world. They ‘in essence, define the priorities for extending
the useful life of information resources. These concepts are longevity, choice,
quality, integrity, and accessibility’ (Conway, 2000, p.22).
These concepts have been transformed to accommodate the preservation of
digital information. Longevity has altered from a focus on extending the life of
physical media to one on ‘the life expectancy of the access system’. Choice
(selection of material to be preserved) is no longer a decision made later in the
life cycle of an item, but has become, for digital materials, ‘an ongoing process
intimately connected to the active use of the digital files’. Integrity, based on
‘the authenticity, or truthfulness, of the information content of an item’, no
longer has maintaining the physical medium as its primary emphasis, but now
is about developing procedures that allow us to ensure and be assured that no
changes have been made. In the digital world, access to the artifact is clearly no
longer sufficient; what is required, suggests Conway, is access to ‘a high quality,
high value, well-protected, and fully integrated digital product’ (Conway,
2000, p.27). These concepts continue to be explored further in relation to elec-
tronic records by the InterPARES Project (InterPARES, 1999-).
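
One common way of providing the assurance that no changes have been made is to record a checksum for each file when it is received and to audit the files against those checksums later. The sketch below illustrates the idea only; it is not drawn from Conway or InterPARES, and the file names, the build_manifest and audit functions and the use of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(folder: Path) -> dict:
    """Record a digest for every file in the folder at the time of ingest."""
    return {p.name: digest(p) for p in sorted(folder.iterdir()) if p.is_file()}


def audit(folder: Path, manifest: dict) -> list:
    """Compare the folder with the manifest and list any discrepancies."""
    problems = []
    for name, recorded in manifest.items():
        path = folder / name
        if not path.exists():
            problems.append((name, "missing"))
        elif digest(path) != recorded:
            problems.append((name, "changed"))
    return problems


# Demonstration: ingest two files, audit them, then alter one and audit again.
with tempfile.TemporaryDirectory() as workdir:
    store = Path(workdir)
    (store / "report.txt").write_text("original text", encoding="utf-8")
    (store / "image.dat").write_bytes(bytes(range(16)))
    manifest = build_manifest(store)
    clean_audit = audit(store, manifest)

    # Simulate an undocumented alteration of a single character.
    (store / "report.txt").write_text("original text.", encoding="utf-8")
    later_audit = audit(store, manifest)
```

In this run the first audit reports nothing, and the second flags report.txt as changed. A check of this kind detects alteration but does not by itself establish authenticity, which also depends on documentation of who held the material and what was done to it.
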
The changing paradigms and definitions also entail major shifts in the con-
ceptualization of preservation. Preservation implies alteration; Cloonan (2001,
p.235) refers to the ‘paradox’ of preservation: ‘it is impossible to keep things
the same forever. To conserve, preserve, or restore is to alter’. If we seek to
keep a digital object unchanged, we require, in effect, that the technology on
which it was developed to operate (or another technology designed to emulate
the original) is available. Over time this reduces access to the digital objects
we are attempting to maintain by preserving them. As Thibodeau, Moore and
Baru (2000, p. 113) suggest, ‘many users would not be pleased if, in order to
access digital objects that had been preserved across the last 30 years, they had
to learn to use the PL1 programming language or Model 204 database software’.
An implication of this paradox is that we should accept some degree of
change in digital materials as we preserve them over time. The real issue is:
how much change is acceptable?
Change is also apparent in the ways that information professionals define
preservation. In a 2002 survey of records managers and archivists who deal
with electronic records, Cloonan and Sanett (2002, p.73) posed the questions:
‘What is the meaning of preservation? Does the meaning change when it is
applied to electronic rather than paper-based records?’. They noted that:

It is clear that professionals are revising their definitions of preservation from a once-
and-forever approach for paper-based materials to an all-the-time approach for digital
materials. Preservation must now accommodate both media and access systems …
while we once tended to think about preserving materials for a particular period of time
– for example, permanent/durable paper was expected to last for five hundred years –
we now think about retaining digital media for a period of continuing value (Cloonan
and Sanett, 2002, p.93).

Interviewees expressed dissatisfaction with the term digital preservation, suggesting
that other terms such as long-term retention are more suitable (Cloonan
and Sanett, 2002, p.74). Definitions of preservation provided by survey re-
spondents, Cloonan and Sanett (2002, p.85) concluded, ‘demonstrate a shift
taking place from defining preservation as a once-and-forever approach for
paper-based materials, to an all-the-time approach for digital materials’. This
shift implies an acceptance that preservation may even begin before a record has
been created.
Debate has continued about whether digital preservation sufficiently de-
scribes what needs to occur for digital materials to be accessible over time, on
the grounds that more than just aspects of preservation need to be encompassed
by whatever term is used. The terms digital curation and digital stewardship
have been proposed and are gaining acceptance because they describe more than
just preservation, referring also to ‘the creation, collection, organization [and]
dissemination’ of digital objects (Bastian, Cloonan and Harvey 2011, p.609).

Preservation definitions in the digital world


What, then, are viable definitions of preservation in a digital world? What
concepts do these definitions need to accommodate? What difficulties do we
encounter in trying to develop definitions that will assist us in our attempts to
preserve digital materials?
The first point to note is the dynamic nature of the field. We must be pre-
pared to change our paradigms and the definitions that we develop from them to
address its changing nature. In a discussion of how archivists’ thinking can bet-
ter inform digital preservation, Gilliland-Swetland (2000, p.v) comments that

the paradigms of any of the information professions come up short when compared with
the scope of the issues continuously emerging in the digital environment. An overarching
dynamic paradigm – that adopts, adapts, develops, and sheds principles and practices
of the constituent information communities as necessary – needs to be created.

Our new definitions need to accommodate the idea of information being
preserved independent of the media on which it resides. This is now well
accepted; old preservation paradigm practices were well acquainted with it
through microfilming programmes. It is no longer a fact that the original has
‘more integrity and veracity than a copy’ (Cloonan, 2001, pp.236-237); instead,
in the digital world, we need to look further to define what attributes of digital
objects we wish to maintain over time.
Definitions should also accommodate the social and organizational aspects
of digital preservation, the ‘public policy, economic, political, social or educa-
tional perspective’ (Cloonan, 2001, p.238). Old-paradigm definitions certainly
recognized that there was more to preservation than the technical aspects – the
IFLA Principles suggest it also encompasses ‘managerial and financial con-
siderations, ... staffing levels [and] policies’ (Adcock, 1998, p.5) – but defini-
tions of digital preservation need to go much further. They must be extended
because of yet another factor, the need to start preserving digital materials
almost from the moment of their creation, and even, some suggest, before they
are created.
Preservation in the pre-digital paradigm was usually applied retrospec-
tively. Conservation procedures were applied to artifacts only after the artifact,
or the information contained in or on it, had been deemed to be of significance
and therefore worth preserving for use in the future. For example, books printed
before 1800 are typically considered to be significant because they are the
product of handcraft production techniques and, therefore, no two items are
identical; and some artifacts are preserved because they attain iconic status –
the Magna Carta, the US Declaration of Independence. We could also rely on
benign neglect, where lack of action did not usually harm the item (assuming
certain factors such as low use were in play) and did not significantly affect the
likelihood of its survival. This concept no longer works for digital materials.
The 3.5-inch diskette, once very common, provides a good example. Informa-
tion on it was likely to become unreadable for many reasons: the diskette may
have been stored in conditions too humid or too hot; the drive to read it may
have been superseded by newer technology and may no longer be readily
available; the driver software for that drive may no longer be easy to find. After
a period of time, the diskette is rendered unusable. Active preservation needs
to start close to the time of creation of digital information if there is to be any
certainty that that information will be accessible in the future.
Other concepts need to be accommodated by the new definitions. For digital
materials ‘their preservation must be an integral element of the initial design of
systems and projects’ (Ross, 2000, p.13), but this is not usually the case. Digital
materials exist in a bewilderingly large number of formats; there is still little
standardization. The most significant concept is that the preservation of digital
materials is much more than the preservation of information content or physical
carrier:

it is about preserving the intellectual integrity of information objects, including capturing
information about the various contexts within which information is created, organized,
and used; organic relationships with other information objects; and characteristics that
provide meaning and evidential value (Gilliland-Swetland, 2000, p.29).

Preserving the original bit-stream is only one part of the problem; equally im-
portant is the requirement to preserve ‘the means of interpreting, reading and
utilizing the bit stream’ (Deegan and Tanner, 2002).
The difficulties of definition are not helped by disciplinary differences.
There are, for instance, differences in the way archivists and librarians use terms.
Some terms, such as integrity and authenticity, arise from the world of archives
and were not, until recently, usually associated with the work of librarians.
These differences, however, pale in comparison with the significantly different
definitions used in the IT industry. How IT professionals think about the long-
term storage of data is a question that assumes importance for digital preserva-
tion because of the heavy reliance that information professionals place on their
skills and services. There is abundant evidence that they think very differently
about preservation. Definitions of archive, archiving and archival storage give
us some indication of the mindset of IT professionals. A selection of online
dictionaries of information technology indicate that the terms are used in two
ways:

1. The process of moving data to a different kind of storage medium: for ex-
ample, ‘archive … 2 verb to put data in storage … on backing storage (such
as magnetic tape rather than a hard disk)’ (Collins, 2002)
2. The process of backing up data for long-term storage: for example, ‘ar-
chive (v.) To copy files to a long-term storage medium for backup … On
smaller systems archiving is synonymous with backing up’ (Webopedia,
2011).

Few of the definitions located display any interest or concern with the reasons
why long-term storage might be required, although one earlier definition is a
notable exception: ‘archiving Long term storage of information on electronic
media. Information is archived for legal, security or historical reasons, rather
than for regular processing or retrieval’ (Gunton, 1993, p.11). Perhaps the mind-
set of IT professionals is better indicated by this excerpt: ‘You detect data that’s
not needed online and move it an off-shore store. When someone wants to use
it, go find the off-line media and restore the data’ (Faulds and Challinor, 1998,
p.280).
There is no indication in these definitions of the period of time that long-term
refers to, yet this is a crucial point for those who are concerned with preservation.
While it is not especially helpful to define long-term in terms of a specific
number of months or years, some awareness of the problems is required
in the definition, as the OAIS (Open Archival Information System) Reference
Model’s definition of long-term indicates:

A period of time long enough for there to be concern about the impacts of changing
technologies, including support for new media and data formats, and of a changing user
community, on the information being held in a repository (International Organization
for Standardization, 2003, p.1-11).

Definitions have changed as we come to know more about how to preserve
digital objects, as the example of time – how long we want to preserve material
for – illustrates. Long-term, initially incorporating elements of old-paradigm
thinking of indefinitely, or as long as possible – as in ‘digital preservation
means retaining digital image collections in a usable and interpretable form for
the long term’ (Kenney and Rieger, 2000, p.135) – is now more commonly
defined in the preservation community, in terms derived from the archival
community, as the period during which the information remains of value. Such
a definition is more helpful than the commonly encountered phrase ‘over time’
– as in ‘ensuring the integrity of information over time’ (Gilliland-Swetland,
2000, p.22), and ‘digital preservation [is] the processes and activities which
stabilize and protect reformatted and “born digital” authentic electronic mate-
rials in forms which are retrievable, readable, and usable over time’ (Cloonan
and Sanett, 2002, p.95). Selected definitions from the influential publications
Preservation Management of Digital Materials: The Handbook (Digital Pres-
ervation Coalition, 2008) and the UNESCO Guidelines for the Preservation of
Digital Heritage (UNESCO, 2003) are presented in Figure 1.1.

Sources: Preservation Management of Digital Materials (Digital Preservation Coalition, 2008) and Guidelines for the Preservation of Digital Heritage (UNESCO, 2003)

Access / Accessibility
Digital Preservation Coalition, 2008: Access ... continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and/or acquired for (p.24)
UNESCO, 2003: Accessibility The ability to access the essential, authentic meaning or purpose of a digital object (p.157)
Digital materials cannot be said to be preserved if access is lost. The purpose of preservation is to maintain the ability to present the essential elements of authentic digital materials (p.21)

Authenticity
Digital Preservation Coalition, 2008: Authenticity The digital material is what it purports to be. In the case of electronic records, it refers to the trustworthiness of the electronic record as a record. In the case of ‘born digital’ and digitized materials, it refers to the fact that whatever is being cited is the same as it was when it was first created unless the accompanying metadata indicates any changes. Confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made (p.24)
UNESCO, 2003: Authenticity Quality of genuineness and trustworthiness of some digital materials, as being what they purport to be, either as an original object or as a reliable copy derived by fully documented processes from an original (p.157)

Digital heritage
UNESCO, 2003: Digital heritage Those digital materials that are valued sufficiently to be retained for future access and use (p.157)

Digital materials
Digital Preservation Coalition, 2008: Digital Materials A broad term encompassing digital surrogates created as a result of converting analogue materials to digital form (digitization), and ‘born digital’ for which there has never been and is never intended to be an analogue equivalent, and digital records (p.24)
UNESCO, 2003: Digital materials is generally used here as a preferred term covering items of digital heritage at a general level. In some places, digital object or digital resource have also been used. These terms have been used interchangeably and generically (p.20)

Digital preservation
Digital Preservation Coalition, 2008: Digital Preservation Refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation is defined very broadly for the purposes of this study and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change ... (p.24)
UNESCO, 2003: Digital preservation The processes of maintaining accessibility of digital objects over time (p.157)
… is used to describe the processes involved in maintaining information and other kinds of heritage that exist in a digital form. In these Guidelines, it does not refer to the use of digital imaging or capture techniques to make copies of non-digital items, even if that is done for preservation purposes … (p.20)

Information packages
UNESCO, 2003: Information Packages ... Preservation depends on maintaining digital objects and any information and tools that would be needed in order to access and understand them. Together, these can be considered to form an information package that must be managed either as a single object or as a virtual package (with the object and associated information tools linked but stored separately) (p.39)

Preservation program
UNESCO, 2003: Preservation program The set of arrangements, and those responsible for them, that are put in place to manage digital materials for ongoing accessibility (p.158).
… is used to refer to any set of coherent arrangements aimed at preserving digital objects (p.20)

Figure 1.1: Selected Definitions (From Digital Preservation Coalition, 2008; UNESCO,
2003)

These definitions assist us by providing useful starting points for an extended
discussion of digital preservation. In particular, they address some significant
questions that we need guidance on:

– What exactly are we trying to preserve?
– How long are we preserving them for?
– What strategies and actions do we need to apply?

What exactly are we trying to preserve?


One of UNESCO’s thematic areas is culture, and, within that, heritage. There-
fore the UNESCO Guidelines are primarily concerned with digital heritage,
‘those digital materials that are valued sufficiently to be retained for future
access and use’. Although this statement is too general to help us decide pre-
cisely what it is we want to preserve, it does introduce the essential concept of
selection, of deciding value – in this case, that digital material on which high
value is placed. This is the subject of Chapter 4.
More specific in the UNESCO Guidelines and in Preservation Management
of Digital Materials: The Handbook are the definitions of digital materials –
the specific digital items, objects or resources that we are concerned with. Both
publications use digital materials, suggesting a high level of consensus; this
term will, therefore, be adopted in this book, despite its suggestion of pre-digital
paradigm thinking in the physicality of the word materials, as things with a
physical presence. Both sets of definitions categorically state that they are not
concerned with the use of digitizing of analogue materials as a preservation
technique. The Digital Preservation Coalition’s handbook (2008, pp.24-25)
‘specifically excludes the potential use of digital technology to preserve the
original artefacts through digitisation’, and the UNESCO Guidelines (2003,
p.20) are equally adamant, stating that digital preservation ‘does not refer to
the use of digital imaging or capture techniques to make copies of non-digital
items, even if that is done for preservation purposes’. Such statements are
worth making firmly because of the misconception still too commonly encoun-
tered in the information professions that digitizing of analogue materials, usually
photographs or paper-based material, is sufficient for preservation purposes.
This is not the case, as a 2010 report of LIBER members (Ligue des Biblio-
thèques Européennes de Recherche, representing European research libraries)
indicates:

Making the digitised material available and visible online is only one of the challenges
faced ... Another lies in assuring long-term access to them. Digitised materials – like
other digital data – are also fragile items and need special measures and arrangements
in order to be accessible despite technological change. While the preservation of paper
documents is well understood and is supported by a well-established infrastructure and a
profession of librarians and other experts, the preservation of digital objects in general
and digitised material in particular is a relatively new task for libraries and poses great
challenges in terms of the expertise and resources required (Bergau, 2010, p.6).

In terms of how they are preserved, though, the definitions in these two sources
make no distinction between born-digital materials and digital materials created
by digitizing analogue materials. This is acknowledged in the Digital Preserva-
tion Coalition’s definition of digital materials, which covers both ‘digital sur-
rogates created as a result of converting analogue materials to digital form
(digitisation), and “born digital” for which there has never been and is never
intended to be an analogue equivalent, and digital records’ (Digital Preservation
Coalition, 2008, p.24). These definitions also make clear that it is not only the
bit-stream that we seek to preserve. In order to ensure access in the future to
digital materials, we also need to take account of other attributes of digital
materials. The UNESCO Guidelines indicate this in the definition of information
packages, which comes from the OAIS Reference Model in Chapter 5. In addi-
tion to the bit-stream, which is typically ‘not understandable or re-presentable’
by itself, ‘any information and tools that would be needed in order to access
and understand’ the digital materials must also be preserved (UNESCO, 2003,
p.39).
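The idea of an information package can be sketched as a simple data structure that keeps the bit-stream together with the information needed to access and understand it. This is a hedged illustration only: the class name, the field names and the two required keys (`format`, `rendering_tool`) are hypothetical and are not drawn from the UNESCO Guidelines or the OAIS Reference Model.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """A bit-stream plus the information needed to access and understand it."""

    bit_stream: bytes
    # Hypothetical descriptive keys chosen for this example only.
    access_info: dict[str, str] = field(default_factory=dict)

    def missing_info(
        self, required: tuple[str, ...] = ("format", "rendering_tool")
    ) -> list[str]:
        """List the required descriptive items that are absent or empty."""
        return [key for key in required if not self.access_info.get(key)]


bare = InformationPackage(bit_stream=b"\x00\x01\x02")
documented = InformationPackage(
    bit_stream=b"\x00\x01\x02",
    access_info={"format": "TIFF 6.0", "rendering_tool": "any TIFF viewer"},
)

print(bare.missing_info())        # ['format', 'rendering_tool']
print(documented.missing_info())  # []
```

The two packages hold identical bit-streams; only the second carries enough accompanying information to be interpreted, which is the distinction the definition draws.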
The definitions are also very clear about the need to maintain other attributes
of digital materials. To ensure that digital materials remain usable in the future,
access to them is required – and not simply access, but access to ‘all qualities
of authenticity, accuracy and functionality’ (Digital Preservation Coalition,
2008, p.24). This, in turn, requires definitions of authenticity, expressed by the
UNESCO Guidelines as the ‘quality of genuineness and trustworthiness of
some digital materials, as being what they purport to be, either as an original
object or as a reliable copy derived by fully documented processes from an
original’ (UNESCO, 2003, p.157). (Note the emphasis on the significance of
full documentation to ensure authenticity; this has important implications for
digital preservation, noted in Chapter 5.) Four further definitions in the
UNESCO Guidelines clarify and emphasize these requirements (see Figure
1.2). Digital materials can be considered as physical objects, logical objects, or
conceptual objects. The physical object is the artifact (for example, the disk-
ette, the CD, or the magnetic tape whose physical characteristics store in or on
it the bit-stream – that is, the logical object). These are given sense when they
are used by humans and are labeled conceptual objects – ‘what we deal with in
the real world’ (Thibodeau, 2002, p.8). Essential elements (now more commonly
known as significant properties) of digital materials enable us to re-present the
materials in the manner in which they were originally intended.

Guidelines for the Preservation of Digital Heritage (UNESCO, 2003)

Conceptual objects Digital objects as humans interact with them in a human-understandable
form (p.157)
Essential elements The elements, characteristics and attributes of a given digital object
that must be preserved in order to re-present its essential meaning or purpose. Also
called significant properties (pp.157-158)
Logical objects Digital objects as computer encoding, underlying conceptual objects
(p.158)
Physical objects Digital objects as physical phenomena that record the logical encoding,
such as polarity states in magnetic media, or reflectivity states in optical media (p.158)

Figure 1.2: Selected Definitions (From UNESCO, 2003)

How long are we preserving them for?


Although the definitions in the UNESCO Guidelines do not provide specific
guidance about the length of time we preserve digital materials, the Digital
Preservation Coalition’s handbook assists us with its articulation of long-term,
medium-term and short-term preservation. Long-term preservation aims to
provide indefinite access to digital materials, or at least to the information con-
tained in them. Continued access to digital materials for a defined time (but not
indefinitely) is medium-term preservation: here, the time period is long enough
to encompass changes in technology. Short-term preservation is, in part, defined
by changes in technology: access to digital materials is maintained until tech-
nological changes make it inaccessible, or for a period during which the material
is likely to be in use but which is relatively short (Digital Preservation Coali-
tion, 2008, p.25). Definitions like these provide helpful ways of thinking about
digital preservation programmes, for example, about resource allocation and
the long-term resource implications of embarking on long-term preservation.

What strategies and actions do we apply?


The definitions in these two sources make us aware, in a general sense, of the
components of a digital preservation programme. In order to achieve the aim of
a digital preservation programme (‘maintaining accessibility of digital objects
over time’ (UNESCO, 2003, p.20), or ‘to ensure continued access to digital
materials for as long as necessary’), various processes forming a ‘series of
managed activities’ (Digital Preservation Coalition, 2008, p.24) are required.
They need to form ‘a set of coherent arrangements’ (UNESCO, 2003, p.20).
These are noted in Chapters 7 to 10.

Conclusion
This chapter has introduced some of the key concepts that are reshaping preser-
vation practice in the digital environment. It notes the need for new ways of
thinking about preservation and poses three key questions that need to be con-
sidered when we think about the preservation of digital materials:

– What exactly are we trying to preserve?
– How long are we preserving them for?
– What strategies and actions do we need to apply?

These questions and other themes introduced in Chapter 1 are explored in the
rest of this book.
Chapter 2
Why do we Preserve? Who Should do it?

Introduction
Society, of course, has a vital interest in preserving
materials that document issues, concerns, ideas, dis-
course and events ... The ability of a culture to survive
into the future depends on the richness and acuity of
its members’ sense of history (Task Force on Archiv-
ing of Digital Information, 1996, p.1)

Preservation is commonly perceived to be the responsibility of large, well-resourced
institutions such as national libraries and archives, state libraries, and
university and research libraries. This perception is no longer valid in the digital
age. It may have been a legitimate view in the days when expensive conserva-
tion laboratories were considered a necessary requirement for a successful
preservation programme and when computer installations were expensive and
few in number, but the current reality is very different. Documentary materials
in digital form are now being created at all levels of society. Responsibility for
the preservation of these digital materials must be shared among creators and
users of digital information, and not remain solely the concern of librarians
and archivists.
This chapter investigates three questions:

– Why should digital materials be preserved?
– Who has responsibility for their preservation?
– How significant is the problem of digital preservation?

While the continuing roles of the institutions traditionally identified as responsible
for preservation – libraries, archives and museums – are noted, attention
is also paid to the roles of the much wider range of stakeholders who must also
participate if digital preservation is to be effective.

Why preserve digital materials?


The reasons for preserving knowledge are variously described in terms of duties,
obligations and benefits. Preservation is based on the notion that, because man
learns from the past, ‘evidence of the past therefore has considerable signifi-
cance to the human race and is worth saving’ (Harvey, 1993, p.6). Not only is
preservation worth doing, it is also, some suggest, a duty. Agresto, former head
of the US National Endowment for the Humanities, suggested that ‘we have a
human obligation not to forget’ (cited in Harvey, 1993, p.7) and that preserva-
tion is essential for the well-being of democracies that depend ‘on knowledge
and the diffusion of knowledge’ and on ‘knowledge shared’ (Harvey, 1993,
p.7). Even greater claims are made: ‘the ability of a culture to survive into the
future’ depends on the preservation of knowledge (Task Force on Archiving of
Digital Information, 1996, p.1).
The cultural and political imperatives that have led to preservation being
considered as fundamental have been explored in books, such as Lowenthal’s
The Past is a Foreign Country (Lowenthal, 1985) and Taylor’s Cultural Selec-
tion (Taylor, 1996), which persuade us that preservation is not simply the con-
cern of a limited number of cultural heritage institutions and professions, but
has dimensions that have significant impact, both limiting and sustaining, on
most aspects of society. There is, in fact, no single reason why we preserve
knowledge. Preservation, suggests Cloonan (2001, p.231), ‘has a life force
fueled by many (often disparate) sources’.
None of these reasons change when we consider the preservation of
knowledge encoded in digital materials, but the rhetoric alters to emphasize
economic rationales. Preserving digital materials is essential. If we do not attend
to it ‘what is at stake is the loss of data representing billions of dollars of investment
in new information technology, new scientific discoveries, and new
information on which our economic prosperity and national security depend’
(NDIIPP, 2011, p.1). Evidential and accountability reasons are also commonly
given: ‘we expect that this [digital] content will remain accessible to allow us to
validate claims, trace what we have done, or pass a record to future generations’,
states the NSF-DELOS Working Group on Digital Archiving and Preservation
(2003, p.[i]), who also specify five conditions for preservation, any one of
which is sufficient to provide a benefit to society:

– If unique information objects that are vulnerable and sensitive and therefore subject
to risks can be preserved and protected;
– If preservation ensures long-term accessibility for researchers and the public;
– If preservation fosters the accountability of governments and organisations;
– If there is an economic or societal advantage in re-using information, or
– If there is a legal requirement to keep it (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.3).

Considerable attention is being directed to the need to preserve scientific data.
‘Data are the foundation on which scientific, engineering, and medical knowledge
is built’, notes the Committee on Ensuring the Utility and Integrity of
Research Data in a Digital Age (2009, p.ix). Their report examines in detail
how digital technologies have changed scientific research, leading to the genera-
tion of very large quantities of data that need to be accessible into the future.
There is also a widespread appreciation that the preservation scene is changing
in significant ways. Lyman and Kahle (of Internet Archive fame) note that
recordkeeping, which, combined with archival preservation, is the basis of
historical memory, was greatly facilitated by print, and institutions such as uni-
versities, publishers and museums collect, organize and preserve ‘the historical
memory that gives culture continuity and depth’. But this is changing:

What are, and will be, the social contexts and institutions for preserving digital docu-
ments? Indeed, what new kinds of institutions are possible in cyberspace, and what
technologies will support them? What kind of new social contexts and institutions
should be invented for cyberspace? (Lyman and Kahle, 1998).

Such new social contexts are emerging and their digital content is deemed
worth preserving. The Library of Congress’s work in preserving Twitter content
(Watters, 2011) and the Schlesinger Library’s in preserving blogs (Dunn, 2009)
are two examples.
The very aims of preservation are also being questioned – ‘What are we
preserving? For whom? And why?’ (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.2) – and the expanded number of stake-
holders in the digital age means that a range of different interests must be con-
sidered.

Professional imperatives
What do these changes mean for libraries and archives? Have there been signifi-
cant changes in their practices?
At one level, there has been little change. Libraries are still, to use Deanna
Marcum’s words, ‘society’s stewards of cultural and intellectual resources’
(Kenney and Stam, 2002, p.v). Preservation is nothing less than core business
for libraries who maintain collections for use in the future. One typical view of
the preservation role of libraries is Gorman’s statement:

Libraries have a duty to preserve and make available all the records of humankind.
That is a unique burden. No other group of people has ever been as successful in pre-
serving the records of the past and no other group of people has that mission today ... Let
there be no mistake: if we librarians do not rise to the occasion, successive generations
will know less and have access to less for the first time in human history. This is not a
challenge from which we can shrink or a mission in which we can fail (Gorman, 1997).

The arguments for preservation as a core activity in libraries are traditionally
articulated as maintaining the collection for access for a period of time (which
varies according to the type of library and the number of years material is
required to remain accessible), as making good sense economically by allowing
items to be used longer before they wear out, and by the ‘just in case’ argument:
‘It cannot easily be predicted what will be of interest to researchers in the
future. Preserving current collections is the best way to serve future users’
(Adcock, 1998, p.8).
Similarly, there has been no fundamental change in archivists’ secure under-
standing of their preservation responsibilities. They have typically placed the
physical care of their collections at least on a par with, if not at a higher level
of importance than, the provision of access to those collections. This ‘physical
defence of archives’ was indeed considered paramount by the British archivist
Sir Hilary Jenkinson, who formulated this influential statement in 1922:

The duties of the Archivist ... are primary and secondary. In the first place he has to
take all possible precautions for the safeguarding of his Archives and for their custody
... Subject to the discharge of these duties he has in the second place to provide to the
best of his ability for the needs of historians and other research workers. But the position
of primary and secondary must not be reversed (Jenkinson, 1965, p.15).

There is ample evidence in the archives literature to support the generalization
that archivists have thought more thoroughly about their professional practice
and have articulated it more clearly than have librarians. Gilliland-Swetland
examines how archival principles provide powerful ways of thinking about the
preservation of digital materials. ‘Implementing the archival perspective in the
digital environment’, she suggests, encompasses

– Working with information creators to identify requirements for the long-term man-
agement of information;
– Identifying the roles and responsibilities of those who create, manage, provide ac-
cess to, and preserve information;
– Ensuring the creation and preservation of reliable and authentic materials;
– Understanding that information can be dynamic in terms of form, accumulation,
value attribution, and primary and secondary use; …
– Identifying evidence in materials and addressing the evidential needs of materials
and their users through archival appraisal, description, and preservation activities
(Gilliland-Swetland, 2000, p.21).

Nor should we forget the specific legal reasons for preservation. In the case of
archives these reasons are often connected to administrative and political ac-
countability. For some types of libraries, statutory responsibilities require that
preservation is their core business. National libraries, for example, have a stat-
utory responsibility for collecting and safeguarding access to information pub-
lished in their countries.
While the traditional preservation responsibilities of libraries and archives
may remain the same, there has been significant change in the ways in which
they are interpreted and operationalized as digital materials have become prev-
alent. In their report about repositioning research libraries Walters and Skinner
(2011, p.57) state firmly that

very few research libraries should have more than half of their infrastructure devoted to
physical collections at this point in time. The library needs to think of digital curation as a
core function of the library and to invest financial and other resources into it accordingly.

New expertise and new perspectives are required, without discarding the prin-
ciples of preservation developed for non-digital materials. Of greatest signifi-
cance is the need to engage with other stakeholders and ‘form new alliances and
partnerships’ (Webb, 2000). These new stakeholders may not always embrace
engagement willingly and will often need to be convinced of their roles, as
Hilton, Thompson and Walters (2010) point out when writing of donations of
digital material to the Wellcome Library in the UK.
Smith cogently summarizes the concepts in this section:

Society has always created objects and records describing its activities, and it has con-
sciously preserved them in a permanent way … Cultural institutions are recognised
custodians of this collective memory: archives, librar[ies] and museums play a vital
role in organizing, preserving and providing access to the cultural, intellectual and his-
torical resources of society. They have established formal preservation programs for
traditional materials and they understand how to safeguard both the contextual circum-
stances and the authenticity and integrity of the objects and information placed in their
care … It is now evident that the computer has changed forever the way information is
created, managed, archived and accessed, and that digital information is now an integral
part of our cultural and intellectual heritage. However the institutions that have tradi-
tionally been responsible for preserving information now face major technical, organiza-
tional, resource, and legal challenges in taking on the preservation of digital holdings
(B. Smith, 2002, pp.133-134).

New stakeholders
The new challenges of digital preservation call for the involvement of new par-
ticipants. No longer are librarians and archivists the main groups concerned
with preserving digital materials: it is increasingly evident that the cultural heri-
tage institutions traditionally charged with responsibility for preserving materials
cannot continue to carry this responsibility in the digital age without widening
the range of partners in their endeavours. Scholars and scientists who, increas-
ingly, base their research on large data sets, drug companies who need to prove
ownership of intellectual property, lawyers who must keep secure evidence in
digital form, are but a few of myriad potential stakeholders.
Not only are new kinds of stakeholders claiming an interest or claiming
control, but higher levels of collaboration among stakeholders are also com-
monly understood to be necessary for digital preservation to be effective.
Narrowly focused localized solutions are not considered likely to be the most
effective. Cooperation ‘can enhance the productive capacity of a limited sup-
ply of digital preservation funds, by building shared resources, eliminating
redundancies, and exploiting economies of scale’ (Lavoie and Dempsey, 2004).
The preservation of digital materials has become ‘essentially a distributed proc-
ess’ where ‘traditional demarcations do not apply’ and one for which ‘an inter-
disciplinary approach is necessary’ (Shenton, 2000, p.164).
Collaboration is considered more and more as the only way in which viable
and sustainable solutions can be developed, as the problems are well beyond
the scope of even the largest and most well-resourced single institution.
(UNESCO (2003, Chapter 11) explores collaboration in more detail, and Chap-
ter 9 of this book provides examples of collaborative activities.)
Who, more specifically, are these new stakeholders? What are their preser-
vation roles in an increasingly digital environment? An early indication was
provided by the Task Force on Archiving of Digital Information, whose in-
fluential 1996 report set much of the digital preservation agenda for the fol-
lowing decade. This report suggested that ‘intense interactions among the
parties with stakes in digital information are providing the opportunity and
stimulus for new stakeholders to emerge and add value, and for the relation-
ships and division of labor among existing stakeholders to assume new forms’.
It proposed two principles, the first that information creators, providers and
owners ‘have initial responsibility for archiving their digital information ob-
jects and thereby assuming the long-term preservation of these objects’, and
the second that, where this mechanism fails or becomes unworkable, ‘certified
digital archives have the right and duty’ to preserve digital materials (Task
Force on Archiving of Digital Information, 1996, pp.19-20). Since 1996 the
landscape of digital preservation has become clearer and we now see significant
levels of collaboration, strong emphasis on and involvement of data creators as
first-line preservers, and the development of certified digital archives (trusted
digital repositories).
In addition to data creators and certified digital archives, there are other
new stakeholders. They include commercial services, government agencies,
individuals, rights holders, beneficiaries, funding agencies, and users (Hodge and
Frangakis, 2004, p.15). ‘Hardware and software developers, publishers, produc-
ers, and distributors of digital materials as well as other private sector partners’
(UNESCO, 2004, Article 10) can also be added to the list. All stakeholders
are learning how to work together, learning first to understand the languages
of other disciplines and then working out how complementary skills can fit
together. Collaboration was of course not unknown in the old preservation
paradigm, one example being the collaboration of scholars and librarians to
identify the core literature in specific discipline areas for microfilming and
scanning projects (see Gwinn (1993) for an example in agriculture). The extent
and nature of collaborative activity has, however, intensified. It is encapsulated
in the ‘Community Watch and Participation’ action in the influential DCC
Curation Lifecycle Model (Digital Curation Centre, 2008) where collaboration
to develop shared standards, tools and software is specifically noted. Chapter 9
notes many examples of collaborative digital preservation activities.
The role of scientists and scholars as stakeholders in digital preservation is
increasingly recognized. The way in which scholars’ work is being transformed,
most notably in the sciences but also in the social sciences and the humanities,
is well documented. The editorial of a 2011 issue of D-Lib Magazine notes that
‘The management of research data in a digital networked world is increasingly
recognized as a significant challenge, a significant opportunity, and absolutely
essential to the conduct of scientific research in the 21st century’ (Lannom,
2011). The predominantly data-driven approaches to science create and use
very large data sets and, consequently, increasing attention is being paid to the
long-term preservation of scientific data (see, for example, the DCC SCARP
project (www.dcc.ac.uk/projects/scarp) and the Alliance for Permanent Access
(www.alliancepermanentaccess.org)).
The dissemination of scholarly output provides an example of the trans-
formation of scholarship. Many scholars no longer rely solely on the formal
mechanisms of printed publications for disseminating their research. They
place more and more emphasis on other mechanisms, such as pre-print archives
in high-energy physics and in mathematics, institutional repositories (often
university-based), the development of web sites and social media sites based
around communities of scholars, and the requirements of scholarly journals
that data supporting published research are deposited in a public archive, all of
which have implications for the preservation of their scholarly output.
Digital preservation is increasingly also of concern to individuals. Personal
information – ‘social and personal memory’ (B. Smith, 2002, p.135) – is being
created and stored in digital form on digital media such as CDs and flash
drives ‘with the mistaken belief that this will ensure that those memories will
always be available to them for consultation’ (B. Smith, 2002, p.135). As recog-
nition of the precariousness of personal digital materials grows, so too does the
interest of individuals in understanding the issues and responding to them.
Individuals have been identified as essential to any discussion of digital pres-
ervation. How we archive our personal digital materials has become a topic of
scholarly investigation (see, for example, a synthesis of the outcomes of the
Digital Lives research project that ran from 2007 to 2009, which indicates the
extent of the research and literature about this field (John et al., 2010)), the focus
of conferences (Personal Digital Archiving conferences were held in 2010 and
2011), the topic of publications (for example, Lee, 2011) and the target of advice
(the Library of Congress ran a Personal Archiving Day in 2011 and provides
advice (www.digitalpreservation.gov/you) for keeping personal information in
digital form).
The keeping of personal correspondence provides an informative illustra-
tion of the major changes that are occurring. Its value as a source of historical
information has long been recognized, but the widespread shift from writing
letters to the use of email has diminished the likelihood that personal corre-
spondence will remain accessible to future historians. Lukesh asked in 1999,
‘Where will our understandings of today and, more critically, the next century be,
if this rich source of information is no longer available?’, as scientists, scholars,
historians and almost everyone increasingly use email (Lukesh, 1999). This
concern has continued to be expressed in actions to archive emails, for example,
at Harvard University (Goethals and Gogel, 2010). Pre-digital paradigm pres-
ervation conventions mean that creators of emails expect librarians and archi-
vists to collect and maintain materials ascertained to be of long-term value,
usually well after the time of the materials’ creation. But because emails, like all
digital materials, become inaccessible quickly, for reasons given in Chapter 3,
the determination of long-term value cannot be made well after the time the
emails were created. With digital materials, suggests Smith, ‘the critical depend-
ency of preservation on good stewardship begins with the act of creation, and
the creator has a decisive role in the longevity of the digital object’ (Smith,
2003, pp.2-3). For most creators of information in any form this is a new role.
Publishers are another stakeholder group whose responsibilities and roles
change as more of their output is distributed in digital form. Some national
libraries have developed cooperative arrangements with publishers to ensure
that preservation responsibilities for digital publications are understood and
shared. One example is the 2002 agreement between Elsevier Science and
the Koninklijke Bibliotheek (the National Library of the Netherlands)
through which the Library receives digital copies of all journals on Elsevier’s
ScienceDirect web platform. The level of participation by publishers in
CLOCKSS, a cooperation of publishers and libraries to preserve e-journals
(noted in more detail in Chapter 9), indicates increasing understanding of digital
preservation issues by publisher stakeholders.
Doubts have been expressed about the interest and willingness of for-profit
organizations to participate in digital preservation initiatives. Search engine
companies, for instance, ‘are not in the business of long-term archiving of the
web or even a portion of it, nor should they be expected to take on this respon-
sibility’. The entertainment industry is increasingly digital, and its products,
audio and video, have a well-established place as ‘critical resources for research,
historical documentaries, and cultural coherence resources’. Even given the
prevailing market-driven political ethos, it is difficult to envisage a situation
where market forces will be sufficient to ensure the preservation of this digital
material. The opposite is more likely to apply: ‘in some cases market forces
work against long-term preservation by locking customers into proprietary
formats and systems’ (Workshop on Research Challenges in Digital Archiving
and Long-term Preservation, 2003, pp.x-xi).

How much data have we lost?


It is helpful to have some understanding of how extensive the preservation
problem is, although estimates must remain very inexact. We know something
of its parameters for the traditional artifacts, mainly paper-based, that make up
the collections of libraries and archives. Studies of paper deterioration have
been carried out since the 1950s, after Barrow’s influential study Deterioration
of Book Stock: Causes and Remedies (Barrow, 1959) raised alarm within the
library community. Later studies refined the conclusions of this study. Although
the evidence allows only general statements to be made, it is commonly
thought that paper embrittlement affects in the order of 30 per cent of the col-
lections of large research libraries in the United States. (This is explained in
more detail in Harvey, 1993, pp.9-10.) Nor is the problem of deterioration of
the artifacts typically found in traditional collections limited to paper. All
materials deteriorate: photographs, nitrate film, and cellulose acetate film are
but a few examples.
The concern of this book, however, is with digital materials. What is the
extent of the preservation problem for these materials? The dramatic increase
in the volume of digital materials (noted in Chapter 1) means that there is
much more digital content worth preserving, and the rate of increase far
outpaces our ability to manage and preserve it. It
was suggested in 2002 that the last 25 years have been a ‘scenario of data loss
and poor records that has dogged our progress’ and that, if this is not reversed,
‘the human record of the early 21st century may be unreadable’ (Deegan and
Tanner, 2002).
Rhetoric of this kind is common in the literature, but is regrettably poorly
supported with specifics and evidence. Alarmist descriptions abound: there
will be ‘a digital black hole … truly a digital dark age from which information
may never reappear’ (Deegan and Tanner, 2002) if we do not address the
problem, and we will become an amnesiac society. (An early use of this
evocative term in relation to digital materials was by Sturges (1990), whose
comments are still worth reading today.) Although we may be in ‘the best
documented era in history’ much of the documentation of the era, according to
some, has been lost – ‘the first email message, chat group session, and web
site’ (Vogt-O’Connor, 1999) are gone.
An extensive literature survey in 2004 located relatively few documented
examples of data loss. Usually the literature notes only general categories of
digital material that are, or are thought to be, at risk. One of these comes from
the US federal government, for whom, O’Mahony notes, it was the norm,
when web sites or internet files were changed, to overwrite the old information.
Consequently ‘the public is now experiencing losses of government informa-
tion, on a scale similar to that of the catastrophic fire of 1921 [in which the
1890 census records were destroyed], on what seems to be a regular basis’
(O’Mahony, 1998, p.108). Another example is of business records in electronic
form. A study of the companies that were relocated after the World Trade Center
bombing in 1993 found that 40 per cent ceased trading, a major reason being
the loss of key business records in electronic form, and ‘43 per cent of compa-
nies which lose their data close down’ (cited in Ross and Gow, 1999, p.iii).
Some of the documented examples are of data recovery, in the sense that data
thought to have become inaccessible were recoverable, or those thought to
have been lost were only mislaid. Ross and Gow note four case studies of data
recovery: the Challenger Space Shuttle Tapes, Hurricane Marilyn (the Virgin
Islands, September 1995), video image recovery from damaged 8mm recorders
(from a crashed fighter plane), and German unification and the recovery of
electronic records from East Germany (Ross and Gow, 1999, pp.39-42). Another
documented example is the Functional Requirements for Evidence in Record-
keeping project administered by the University of Pittsburgh. The web site
with the working files of this project was accidentally destroyed, but because it
had not been updated since 1996 it can be accessed through the Internet Archive; it
can also be seen at www.archimuse.com/papers/nhprc.
From the literature it is only possible to conclude, as the authors of the
Digital Preservation Coalition’s handbook (2008, p.32) did for the UK, that the
evidence of data loss is ‘as yet only largely anecdotal ... [but] it is certain that
many potentially valuable digital materials have already been lost’.
We should document some specific examples to counter the charge of being
alarmist. Useful questions to pose are: Is the problem of digital preservation as
great as we have assumed? How much digital information has been lost and
how much has been compromised? To what extent have the data been com-
promised? There are excellent reasons to put some effort into attempting to
quantify the extent of digital information loss or compromise or, at the very
least, document some specific examples to supplement the still limited number
of case studies. The desirability of more documented examples and case studies
has been recognized for some time. For example, Ross and Gow concluded
that ‘information about data loss, recovery, and risk is very difficult to acquire
… more case studies about data loss and rescue need to be collected’ (Ross
and Gow, 1999, p.vi).
The term data loss used here also includes data that are compromised; they
are degraded to the extent that their quality is affected. (The phrases loss of data
integrity or loss of data authenticity might also be used.) The data may still be
accessible, but we have no clear idea of what they mean, what software was
used to create them, and so on.
The question ‘How much digital information has been lost and how much
has been compromised?’ is impossible to answer. No general estimates of
quantity based on solid evidence (as opposed to conjecture) have been located,
and few specific examples or carefully documented case studies seem to exist.
The same examples are presented, even when they are no longer in the ‘lost or
compromised’ category: the BBC’s Domesday Project, NASA data, the Viking
Mars mission, the Combat Area Casualty file containing prisoner of war and
missing in action information for the Vietnam war, the first email, the first web
site, as described in more detail below. Attempting to answer the question
requires, first, a consideration of the issue of selection for preservation (noted
in more detail in Chapter 4). A common argument is that anything significant
is likely to be maintained anyway; so should we be concerned about the rest?
Some of the examples that follow assume that the first (email, web site, and so
on) is worth preserving; but is this necessarily the case? It is often a view
developed in hindsight. Betts tells us that Ray Tomlinson, principal engineer at
BBN Technologies in Cambridge, Massachusetts did not save the first network
email ever sent in 1972 because ‘it just didn’t seem worth saving … Even if
backup tapes did exist, they might not be readable. They were just mag tapes,
and after seven or eight years, the oxide starts falling off, especially from tapes
of that era’ (Betts, 1999).
The small number of specific examples located indicates how great the
problem of loss or compromise of digital materials could be. The most often
quoted, indeed overused, examples are those cited in the 1996 report of the Task
Force on Archiving of Digital Information. Because they have been reported
very widely since, they warrant quoting at some length. The report notes the
case of the US Census of 1960.

In 1976, the National Archives identified seven series of aggregated data from the 1960
Census as having long-term historical value. A large portion of the selected records,
however, resided on tapes that the Bureau could read only with a UNIVAC type-II-A
tape drive. By the mid-seventies, that particular tape drive was long obsolete, and the
Census Bureau faced a significant engineering challenge in preserving the data from
the UNIVAC type II-A tapes. By 1979, the Bureau had successfully copied onto industry-
standard tapes nearly all the data judged then to have long-term value (Task Force on
Archiving of Digital Information, 1996, p.2).

The report notes other lost examples, one of them the first email message ‘sent
either from the Massachusetts Institute of Technology, the Carnegie Institute
of Technology or Cambridge University’ in 1964 (Task Force on Archiving of
Digital Information, 1996, p.3).
Rothenberg reminds us of some examples noted in a 1990 US House of
Representatives report:

hundreds of reels of tape from the Department of Health and Human Services; files
from the National Commission on Marijuana and Drug Abuse, the Public Land Law
Review Commission, the President’s Commission on School Finance, and the National
Commission on Consumer Finance; the Combat Area Casualty file containing POW
and MIA information for the Vietnam war; herbicide information needed to analyze the
impact of Agent Orange; and many others (Rothenberg, 1999b, pp.1-2).

He reiterates the paucity of specific examples and offers a reason: ‘this may
simply reflect the fact that documents or data that are recognized as important
while they are still retrievable are the ones most likely to be preserved’ (Rothen-
berg, 1999b, p.2).
Another frequently cited example is the BBC Domesday Project. This pro-
ject captured the national imagination in the UK and resulted in a multi-media
version of the Domesday Book on videodisc, produced to mark the 900th anni-
versary of the original. It became inaccessible in the late 1980s as the hardware
platform for which it was developed, the BBC microcomputer and interactive
videodisc player, became obsolete. Attempts to restore the data include applying
emulation techniques and reverse engineering so it could be used on a Windows
PC and made available on the web. In 2011 the BBC made available a full ex-
traction of the community disc on the Domesday Reloaded web site
(www.bbc.co.uk/history/domesday). This project is also noted in Chapter 7.
Cook’s 1995 call to action provides Canadian examples of data loss where
recordkeeping practices were ignored in the move to online recordkeeping.
Cook noted that ‘the National Archives of Canada … found not only that 30 out
of 100 randomly chosen policy documents could not be found in the govern-
ment’s paper records, but also that no system was in place to safeguard the
contents of the electronic system’. Ontario Hydro’s nuclear power plant failed
to keep adequate electronic or paper records of its construction and operation
(Cook, 1995).
Scientific data have been the focus of many studies. One is Preserving
Scientific Data on Our Physical Universe (National Research Council, 1995),
which indicates what scientific data were then available from United States
scientific observation and what they have been and might be used for. It includes
some comments about what has survived. Space physics research has gener-
ated significant quantities of data over the last 30 years and much of this was
‘archived’ by sending the tapes, and sometimes relevant documentation, to the
NSSDC (National Space Science Data Center). However, ‘there are many data
at the NSSDC that most scientists would find difficult to use with only the
information originally supplied’ (National Research Council, 1995, p.21). This
report also notes the Landsat data, a large part of which resided ‘on tapes that
cannot be read by any existing hardware. Recent data-rescue efforts have been
successful in getting older data into accessible form, but these efforts are time-
consuming and costly’ (National Research Council, 1995). Humphrey gives
examples of the research data generated by research funded by the Social
Sciences and Humanities Research Council of Canada. Of a set of 150 studies
from 1977 to 1980, the data sets from only three could be located in 1998
(Humphrey, 2003).
There is some cause for optimism. Our longer experience with digital
preservation, together with some research, indicates that data previously thought
to be unrecoverable can be made usable if sufficient time, expertise and money
are available. The Legacy Media Project at the National Archives of Australia,
for example, recovered data fully from 82 per cent of the carriers it investi-
gated, with only 2 per cent unrecoverable; this was, however, at a cost of about
$35 per megabyte (Pearson, 2009). Storage media failure is, of course, only
one of the issues.

Current state of awareness of digital preservation problems


Some data loss is the result of lack of awareness about the proactive thinking and
interventionist action needed to preserve digital materials. Wheatley described in
2003 the lack of understanding by IT industry vendors as demonstrated during
the BBC Domesday Project. Commercial vendors offered to assist this digital
preservation project: ‘[one] offered us a special polymer that they guaranteed
would preserve a CD-ROM for 100 years. They were unable to answer how we
would preserve a CD-ROM player for that length of time’. Wheatley concluded
that ‘the myth that long-lived media equals long-lived preservation is still wor-
ryingly popular’ (Abbott, 2003, p.10). A more recent example suggests there
has been little change; the US producer of recordable CDs, MAM-A, promotes
its archival discs (CD-R and DVD-R) with the slogan ‘Store Your Most Precious
Memories For Over a Century’ (www.mam-a-store.com).
Levels of awareness of digital preservation are not keeping pace with the
expansion in digital resource creation, although the situation is improving.
Two decades ago there were pockets of awareness in only a few isolated disci-
pline areas, such as geoscience, where considerable specialist expertise is to be
found. Preservation was not generally considered to be a central professional
concern of information professionals, as Marcum, for instance, suggested: ‘pres-
ervation as a core activity of libraries remains less visible than others such as
cataloging and user surveys’ (Kenney and Stam, 2002, p.v). Digital preserva-
tion fared little better: it was ‘quite clearly not attention grabbing enough … to
have yet brought seriously on board authors, publishers and other digital con-
tent creators, funding agencies, senior administrators, hardware and software
manufacturers, and so on’ noted a keynote speaker at a digital preservation
conference in 2000 (Brindley, 2000, p.127).
A low level of awareness was also evident in a 2008 survey of local
government agencies in the UK to determine how prepared they were for the
preservation of digital materials. Although more than 80 per cent of respondents
held records in digital form, their awareness of preservation issues relating to
those records was low. Some steps had been taken; almost half of the
respondents had a digital preservation policy in place and had undertaken some
planning, but little else was reported (Boyle, Eveleigh and Needham, 2008).
We can now be a little more sanguine; the level of awareness is increasing.
The awareness that is evident in the requirements of funding agencies provides
an excellent example. In the UK the Research Information Network (RIN)
framework of principles for the stewardship of research data articulates clearly
why continued access to data is central to high-quality research and lays out
the responsibilities of stakeholders – ‘universities, research institutions, libraries
and other information providers, publishers, research funders as well as re-
searchers themselves’ (Research Information Network, 2008, p.3). As a con-
sequence, scholarly and scientific communities now need to learn about digital
preservation so they can articulate data management plans in their applications
for research funding. The Digital Curation Centre (DCC) is one organization
that provides data management advice and resources for these communities.
The growing awareness of digital preservation issues by individuals concerned
with archiving their personal digital materials, and the increase in advice
aimed at this group, is noted earlier in this chapter.
There is no doubt, though, that still more guidance is needed for a wide
range of people with varying levels of awareness of and expertise in digital
preservation. With increasing awareness comes an urgent need for clear
guidance and institutional backing so that current best practice is applied. It is
no coincidence that the number of books about digital preservation published
in recent years or projected to be published soon has risen significantly.

Conclusion
This chapter dwells on the question of who should take responsibility for digital
preservation. As well as the organizations that have traditionally been concerned
with preservation (libraries, archives, museums), the preservation of digital
materials must involve many other stakeholders. Their input is needed in deter-
mining what is kept, in negotiating rights, and in developing policies and proce-
dures for managing digital materials. This chapter also considers the question
of how much digital material we have lost. Although the parameters of loss are
very unclear, there is little doubt that the amount of digital materials that we
are unable to access, or able to access only after considerable effort and expense,
is significant and will continue to increase unless action is taken. Our ability to
take action now is hampered by continuing widespread lack of awareness of
the problem. The next chapter looks more closely at why there is a problem by
considering the nature of digital materials.
Chapter 3
Why There’s a Problem:
Digital Artifacts and Digital Objects
Introduction
On the surface, digital technology appears to offer
few preservation problems. Bits and bytes are easy to
copy, so there should be no problems in developing
an unending chain of copies into the future, and hav-
ing copies all over the world in case of disaster. How-
ever, we already know that the reality is not so simple
and that there are very significant technical and man-
agement problems. The two main factors leading to
inaccessibility of digital information: changing tech-
nology platforms and media instability, are relentless,
with the potential to render digital information useless
(Webb, 2000)

Why are digital materials different? What are the modes of digital death?
Answering these questions provides a framework for understanding why digital
preservation poses major challenges. There are three sets of challenges:

– those relating to the nature of the media that are used to store digital mate-
rials;
– those resulting from the technologies required to create, store and access
digital materials;
– those characterized in this book as challenges to the integrity of digital
materials.

The issues are complex and are interrelated. Rothenberg’s perceptive statement
about them is worth quoting at length for its clear description of the range of
issues and their close relationship:

It is now generally recognized that the physical lifetimes of digital storage media are
often surprisingly short, requiring information to be ‘refreshed’ by copying it onto new
media with disturbing frequency. Moreover, most digital documents and artifacts exist
only in encoded form, requiring specific software to bring their bit streams to life and
make them truly usable; as these programs (or the hardware/software environments in
which they run) become obsolete, the digital documents that depend on them become
unreadable – held hostage to their own encoding. This problem is paradoxical, given
the fact that digital documents can be copied perfectly, which is often naively taken to
mean that they are eternal … In addition to the technical aspects of this problem, there
are administrative, procedural, organizational, and policy issues surrounding the management
of digital material. Digital documents are significantly different from traditional
paper documents in ways that have significant implications for the means by which
they are generated, captured, transmitted, stored, maintained, accessed, and managed …
[mandating] new approaches to accessioning and saving digital documents to avoid
their loss. These approaches raise nontechnical issues concerning jurisdiction, funding,
responsibility for successive phases of the digital document life cycle, and the develop-
ment of policies requiring adherence to standard techniques and practices to prevent the
loss of digital information (Rothenberg, 1999a, p.2).

The sheer number of digital materials now in existence threatens to overwhelm;
they are being created at an ever-increasing rate (as noted in Chapter 1). The
ownership of intellectual property rights in digital materials is often complex
(consider, for instance, a film with its scriptwriters, actors, directors and many
other interests) and for some materials these rights need to be negotiated be-
fore libraries and archives can legally preserve them. New formats continue
to proliferate. Many libraries are focusing their attention on short-term goals
with the consequence that collection maintenance over the longer term is not
adequately resourced.
This chapter is principally concerned with the reasons why it is difficult to
maintain access to digital materials into the future – that is, to preserve them. It
examines the causes of deterioration of the media, noting structural and manu-
facturing reasons and deterioration that is related to how the media are stored
and handled. It considers the reasons for technological obsolescence and its
consequences, such as the loss of functionality of access devices, manipulation
and presentation capabilities, and contextual information. This chapter does
not include strategies and techniques used to address the deterioration of digi-
tal materials (the focus of Chapters 6, 7 and 8), but some are mentioned in
passing.

Modes of digital death


The Digital Preservation Coalition’s handbook compares the challenges in-
volved in the preservation of digital materials – ‘maintaining access to digital
materials over time’ – with the preservation of paper-based materials. First, the
implications of technology-related challenges, such as the dependence on
machinery for playback of digital materials and the speed at which this tech-
nology changes, are noted. These call for more immediate preservation action
than paper-based materials, for which a period of inaction is not usually harmful.
Its authors point out that preservation of digital materials cannot be sporadic:
‘a continual programme of active management is needed from the design and
creation stage’, and this, in turn, results in changing roles of stakeholders and
in different kinds of inter- and intra-institutional collaboration. The issues
resulting from ‘the fragility of the media’ are noted, as are the challenges of
‘ensuring the continued integrity, authenticity, and history’ of digital materials,
which are a consequence of their ease of alteration. The implications resulting
from the nature of digital materials are given; if proactive preservation is not
implemented at an early stage in the life of the digital materials, they will be
lost or unusable in a short period of time. This places significant demands on
allocation of preservation resources (Digital Preservation Coalition, 2008, p.33).
The National Library of Australia’s statement of the challenges of digital
preservation in its Digital Preservation Policy (National Library of Australia,
2008) enlarges on the Digital Preservation Coalition’s statement. Technology-
related challenges are ‘the changing availability of hardware, software and
other technology required for access’. Media-related issues are ‘widespread
use of relatively unstable carriers, subject to short-term media deterioration
and data corruption or loss’ and ‘the diverse and frequently changing range of
file formats and standards’. The challenges of ensuring integrity and authenticity
include ‘uncertainty about the significant properties that must be maintained
for some digital resources’, ‘intellectual property and other rights-based con-
straints’, and ‘the need for preservation decisions to be made early in the life
cycle of digital objects’ which can for some materials be frustrated by ‘rela-
tively long delays between their creation and their being acquired and con-
trolled by the Library’. The policy also notes general issues that have an impact
on the preservation of digital materials, such as their large quantity and rapid
growth-rate, the ‘recurring nature of many of the threats’, and uncertainty
about ‘the strategies and techniques most likely to be effective’. Other chal-
lenges and uncertainties are the resource implications (‘the likely high costs of
taking action, and … of delaying or not taking action’), ‘administrative com-
plexities in ensuring timely action is taken that will be cost-effective over very
long periods of time’, and ‘the need to develop and maintain suitable knowledge
and systems to deal with these challenges’.
Examples to illustrate these challenges are readily located (some are in
Chapter 2). Kroll Ontrack, a large international data recovery company, suggests
that hardware and system problems are the most common reason customers approach
it (29 per cent), followed by human error (27 per cent). Software corruption
or program problems, computer viruses and natural disasters are the other rea-
sons listed (Kroll Ontrack, 2011). Hedstrom and Montgomery, in their 1998
survey of Research Libraries Group members, reported that the oldest digital
materials were written to their current carriers in the late 1970s. Fifteen of the
36 institutions responding to the survey could not access some digital materials
they held because they lacked ‘the operational and/or technical capacity to
mount, read, and access’ them (Hedstrom and Montgomery, 1999, pp.11-12).
Among the findings of a 2009 survey of mainly European national libraries
and archives conducted by the Planets Project were: while over 90 per cent of
respondents are aware of the challenges presented by digital preservation, only
half had a digital preservation policy in place; less than a third of respondents
consider they have ‘complete control over the formats that they will accept and
enter into their archives’; and over 70 per cent held less than 100 terabytes
in 2009, but expect to store over 100 terabytes in ten years’ time (Planets,
2009, p.3). A 2010 survey of special collections in North American research
libraries described the ‘increasing availability of special collections materials
in digital form over the past decade … [as] nothing short of revolutionary for
both users of special collections and the professionals who manage them’ and
highlighted two challenges – ‘the need for complex technical skills and chal-
lenging new types of intra-institutional collaboration’ (Dooley and Luce, 2010,
p.53). Most often cited as impediments to effective digital preservation in
these collections were lack of funding (69 per cent), lack of time for planning
(54 per cent) and lack of expertise (52 per cent) (p.60).
Figure 3.1 lists the threats to digital continuity (‘continuity of production,
continuity of survival, continuity of access’) identified in the UNESCO Guide-
lines. Although not all of these threats are specific to digital materials, the list
serves as a useful reminder of the magnitude of the challenge that we face in
preserving digital materials.

– The carriers used to store these digital materials are usually unstable and deteriorate
within a few years or decades at most
– Use of digital materials depends on means of access that work in particular ways:
often complex combinations of tools including hardware and software, which typi-
cally become obsolete within a few years and are replaced with new tools that work
differently
– Materials may be lost in the event of disasters such as fire, flood, equipment failure,
or virus or direct attack that disables stored data and operating systems
– Access barriers such as password protection, encryption, security devices, or hard-
coded access paths may prevent ongoing access beyond the very limited circum-
stances for which they were designed
– The value of the material may not be recognised before it is lost or changed
– No one may take responsibility for the material even though its value is recognised
– Those taking responsibility may not have adequate knowledge or facilities
– There may be insufficient resources available to sustain preservation action over the
required period
– It may not be possible to negotiate legal permissions needed for preservation
– There may not be the time or skills available to respond quickly enough to a sudden
and large change in technology
– The digital materials may be well protected but so poorly identified and described
that potential users cannot find them
– So much contextual information may be lost that the materials themselves are un-
intelligible or not trusted even when they can be accessed
– Critical aspects of functionality, such as formatting of documents or the rules by
which databases operate, may not be recognised and may be discarded or damaged
in preservation processing.

Figure 3.1: Threats to Digital Continuity (UNESCO, 2003, pp.30-31)

Digital storage media


The term digital storage media is used here to mean the carriers or media on
which digital data are recorded. Common examples are diskettes, CDs, magnetic
tapes, hard drives and flash drives. To generalize, digital storage media are
fragile, deteriorate quickly and can unexpectedly fail if not handled and stored
appropriately. They are Thibodeau’s physical objects, ‘simply an inscription of
signs on a medium’ whose preservation is necessary but not sufficient for effec-
tive preservation of digital materials (Thibodeau, 2002, pp.6-7).
Rothenberg brought the issues associated with the short lifespan of digital
storage media to public attention in his 1995 article in Scientific American
(Rothenberg, 1995, expanded in Rothenberg, 1999b). The physical lifetimes
of digital storage media are, he noted, ‘often surprisingly short’ (Rothenberg,
1999a, p.2), but the actual lifespan was irrelevant because it usually exceeded
the length of time during which the media could be read. This is a result of the
rapid obsolescence of hardware and software in a computing environment
driven by rapidly decreasing costs of storage and computing and rapid in-
creases in computer speeds. His estimates (Rothenberg, 1995), which we now
know to have been conservative, were:

Magnetic tape    1 year       equipment obsolescence 5 years
Videotape        1-2 years    equipment obsolescence 5 years
Magnetic disk    5-10 years   equipment obsolescence 5 years
Optical disk     30 years     equipment obsolescence 10 years.
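
On these figures, the period for which data remain readable is bounded by whichever comes first: failure of the medium or obsolescence of the equipment needed to read it. The following minimal sketch works through that arithmetic for the four estimates listed above. The dictionary and function names are illustrative assumptions, not anything taken from Rothenberg.

```python
# Rothenberg's 1995 estimates as quoted above, in years:
# (physical lifetime low, physical lifetime high, equipment obsolescence)
ESTIMATES = {
    "Magnetic tape": (1, 1, 5),
    "Videotape": (1, 2, 5),
    "Magnetic disk": (5, 10, 5),
    "Optical disk": (30, 30, 10),
}


def readable_window(low, high, obsolescence):
    """Return the (low, high) span of years for which data stay readable.

    Each bound is capped by equipment obsolescence, since a medium that
    outlives its playback equipment can no longer be read.
    """
    return (min(low, obsolescence), min(high, obsolescence))


if __name__ == "__main__":
    for medium, (low, high, obsolescence) in ESTIMATES.items():
        print(medium, readable_window(low, high, obsolescence))
```

For optical disk the 30-year physical lifetime is capped at 10 years by equipment obsolescence, which is the sense in which the physical lifespan can be irrelevant.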

A more recent comparison of data carriers is provided in the UNESCO Guidelines,
noted in Figure 3.2.

Carrier          Current storage  Speed of increase    Expected usable life     Other comments
                 capacities per   in capacity          of single unit
                 unit

Magnetic disk    up to 200        doubling every       around 5 years           generally fixed media
(eg hard disk)   gigabytes        12-18 months

Magnetic tape    up to 200        doubling every       around 5 years           portable media suitable
                 gigabytes        12-18 months                                  for backup

Optical disk     up to 4          slow because not     wide range from, say,    portable media; unit costs
(CD, DVD)        gigabytes        used for very large  5 years for low quality  low; low-cost consumer
                                  archives or backups  products to several      equipment widely available
                                                       decades for high
                                                       quality products

Figure 3.2: Comparison of Data Carriers (UNESCO, 2003, p.113)

An illustration of the speed of change in storage capacities is that at the time of
writing (2011) hard disks aimed at the consumer market routinely store
five to ten times more than in 2003 – one or two terabytes compared with ‘up
to 200 gigabytes’ in 2003.
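
The 'five to ten times' figure can be checked with a short calculation. The sketch below is illustrative only: it assumes decimal units (1 terabyte = 1,000 gigabytes) and an eight-year interval (2003 to 2011), and the function names are hypothetical.

```python
import math


def growth_factor(old_gb, new_gb):
    """How many times larger the new capacity is than the old one."""
    return new_gb / old_gb


def implied_doubling_time(years, factor):
    """Years per doubling if growth had been steady over the interval."""
    return years / math.log2(factor)


if __name__ == "__main__":
    years = 2011 - 2003
    for new_gb in (1000, 2000):  # one and two terabytes, decimal units assumed
        factor = growth_factor(200, new_gb)
        print(new_gb, factor, round(implied_doubling_time(years, factor), 2))
```

The doubling times produced are only as reliable as the round figures quoted in the text and should be read as a rough indication.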
Even though the lifespan of some digital storage media, such as ‘archival’
quality CD, is comfortably measured in decades, digital materials stored on
them will not be safe and accessible in the future. Other factors come into play:
access to working drives required to play the media and to the software drivers
those drives need to operate; the file formats in which the digital information is
stored; and the availability of applications that can read those formats. These
are noted in more detail later in this chapter.
The large quantities of materials that are to be preserved exacerbate the
challenges. UNESCO estimates that there are 200 million hours of audiovisual
content in archives around the world. For Europe alone the estimate is 100 mil-
lion hours, of which about 10 million hours had been digitized by 2007. It is
estimated that 40 to 70 per cent will never be digitized and so will eventually be
lost. Preserving only the large quantities of material resulting from digitizing
activity to date is a major challenge, and there is much more to come (Wright,
2005, p.4).
Howell (2001, p.139) suggests six factors that contribute to the rapid dete-
rioration of digital storage media: manufacturing quality of the medium; how
heavily the medium is used; how carefully it is handled; the temperature and
humidity levels at which it is stored; the quality of its storage environment;
and the quality of the equipment used to access the medium. These are worth
examining in more detail.
The manufacturing quality of the medium is a crucial factor simply because
all materials deteriorate. In the words of David Bearman (1998, p.24), it is ‘a
fact of physics’, a fact that we must accept as a major limitation on digital
preservation, and the ‘outside boundary beyond which we cannot rationally
plan to retain the information without transforming the medium’. We can, how-
ever, attempt to influence manufacturers so that they improve the quality of
their products to meet archival and preservation requirements. Knowledge of
the physical and chemical makeup of digital storage media and of their processes
of deterioration is helpful in making decisions about preservation actions. One of
the keys to prolonging the life of digital artifacts is providing storage conditions
that slow down the rate of deterioration.
Howell (2001, p.138) makes the point that many digital storage media are
‘rotating technologies’ with moving parts and are, therefore, subject to wear that
may damage the media. For example, even a small rearrangement of the magnet-
ized particles on a magnetic tape, perhaps caused by accidental physical contact
with a part of the playback equipment, can result in loss of data and may some-
times be sufficient to render a whole file unreadable. Maintaining recording and
playback equipment to a high standard minimizes the likelihood of this occur-
ring. Similarly, inappropriate handling of optical media can result in physical con-
tact with the part of the media that records the data. For example, touching the
surface of a CD-ROM and leaving an oily residue can corrupt the bits stored on it.
One noteworthy change from the pre-digital preservation paradigm is the
realization that digital storage media have little or no artifactual value. This
point is explored in the 2001 report of the Task Force on the Artifact in Library
Collections. Artifacts are valued in library and archives collections because
their physical form demonstrates ‘the originality, faithfulness (or authenticity),
fixity, and stability of the content’ (Task Force on the Artifact in Library Collec-
tions, 2001, p.vi); the artifact is significant for research purposes because
it provides this evidence. When the information stored on or in an artifact is
reformatted, as in microfilming a book printed on brittle paper or copying data
from an obsolete format to a current one, these evidentiary qualities are lost.
Because deterioration of digital materials is, as already noted, ‘a fact of physics’,
refreshing and migration are facts of digital preservation life. Because refreshing
and migration replace obsolete digital media with current media, the replaced
artifact cannot itself demonstrate qualities such as originality, authenticity or
fixity; other mechanisms are used to demonstrate these evidentiary qualities. This
is a major change from pre-digital paradigm thinking, and we are only slowly
changing our professional mindsets to accommodate the necessary changes in practice.
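
One widely used mechanism for demonstrating fixity when data are refreshed onto a new carrier is a checksum comparison. The text does not name a specific mechanism, so the following is a minimal sketch under that assumption: the function names and the choice of SHA-256 are illustrative, not a prescribed method.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target (a hypothetical 'refresh' step) and verify the copy.

    Returns the checksum so that it can be recorded alongside the new copy.
    """
    recorded = sha256_of(source)
    shutil.copyfile(source, target)
    if sha256_of(target) != recorded:
        raise OSError(f"fixity check failed for {target}")
    return recorded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        old = Path(workdir) / "old_carrier.bin"
        new = Path(workdir) / "new_carrier.bin"
        old.write_bytes(b"example bit stream")
        print(refresh(old, new))
```

A matching checksum shows only that the bit stream is unchanged by the copy; it says nothing about whether the content can still be rendered or about its provenance.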
The following examples of magnetic tape and optical disks illustrate many
of the factors that contribute to the deterioration of digital storage media. Other
storage media in current use, such as hard disk drives, could have been noted.
However, the principles that are identified for these two media types apply
more generally.

Magnetic media

This section is based on the writings of Van Bogart (1995), Ross and Gow
(1999), and the International Association of Sound and Audiovisual Archives
Technical Committee (2004 and 2009), which can be referred to for more de-
tailed information.
Magnetic tapes have been used for digital data storage from the 1960s.
Magnetic media (tape in reels and in housings such as cassettes and cartridges)
are still in common use today because they are versatile and cheap and can
provide higher data densities than other media. In 2011 the highest storage
capacity magnetic tapes were five terabytes. They are available in a large
number of formats: the IASA guidelines for digital audio objects provide the
specifications of 23 common data tape formats (International Association of
Sound and Audiovisual Archives Technical Committee, 2004, p.58), but the
number is higher, especially if formats not in current use are included in the
count. The wide range of computer tape formats handled by one data conversion
company is given on its web site
(www.ndci.com/Home/NewsInfo/TapeFormats/tabid/92/Default.aspx).
Magnetic tapes store information in the alignment of magnetic particles sus-
pended within a polymer binder, which sticks the magnetic information-carrying
layer to a substrate and provides a smooth surface that helps the tape to run
through playback equipment smoothly. If humidity levels are too high, the
binder softens or becomes brittle through hydrolysis, resulting in the ‘sticky
tape’ phenomenon where the binder sticks to the equipment’s tape heads. Data
loss from dropout is one result of ‘sticky tape’. The magnetic particles, which
store data in the direction of the magnetism in them, vary in their magnetic
stability. The substrate is usually made of chemically stable polyester film
(Mylar or polyethylene terephthalate (PET)). It is affected by mechanical prob-
lems such as stresses on the tape caused by fluctuations in temperature and
humidity levels in storage areas, resulting in mistracking during playback. The
substrate can also be stretched if the tape is not appropriately tensioned when it
is wound or rewound. Other factors that cause data loss include the quality and
maintenance of tape recording and playback devices.
The longevity of magnetic tape can be improved by attention to its care
and handling. Appropriate storage is essential for minimizing deterioration
through binder hydrolysis, which is a result of excessive moisture. The rate of
hydrolysis can be reduced by lowering humidity levels and temperatures in
tape storage areas. Magnetic pigments degrade more slowly at lower temperatures.
It is also important that temperature and humidity levels are kept constant
and stable. Storage at temperatures that are too high (above 23°C, suggests
Van Bogart) increases dropout because the tightness of the tape packing is
increased; this, in turn, increases tape distortion. Increased tape-pack stresses
also occur as the tape absorbs moisture and expands at relative humidity levels
greater than about 70 per cent. Temperature and relative humidity levels that
are too high also promote fungal growth. Attention should be paid to maintain-
ing good air quality and to reducing dust and debris. Conditioning (acclimati-
zation) is required if tape is stored in an environment that differs from the
environment in which it is used. (Further information about the care and han-
dling of magnetic media is available in Chapter 7 and in Van Bogart, 1995.)
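
The two numeric limits mentioned above (temperatures above 23°C and relative humidity above about 70 per cent) lend themselves to a simple automated check of storage-area readings. The sketch below is illustrative only: the class and function names are hypothetical, and a real monitoring programme would also track stability over time, for which the text gives no numeric tolerance.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the text; treated here as simple upper bounds.
MAX_TEMPERATURE_C = 23.0       # Van Bogart's suggested limit
MAX_RELATIVE_HUMIDITY = 70.0   # per cent; the text says "about 70 per cent"


@dataclass(frozen=True)
class Reading:
    """One environmental reading from a tape storage area."""
    temperature_c: float
    relative_humidity: float


def storage_warnings(reading: Reading) -> List[str]:
    """Return a warning for each quoted limit that the reading exceeds."""
    warnings = []
    if reading.temperature_c > MAX_TEMPERATURE_C:
        warnings.append("temperature too high: risk of increased tape-pack tightness and dropout")
    if reading.relative_humidity > MAX_RELATIVE_HUMIDITY:
        warnings.append("humidity too high: risk of moisture absorption and tape-pack stress")
    return warnings


if __name__ == "__main__":
    for warning in storage_warnings(Reading(temperature_c=25.0, relative_humidity=75.0)):
        print(warning)
```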
An indication of the life expectancy of some common tapes is provided in
Figure 3.3. The important point to note here is the effect of different relative
humidity (RH) and temperature levels on the life expectancy of digital artifacts.
(Note that although D3 tape and DLT tape cartridges have been superseded,
the trends indicated in Figure 3.3 are still valid.)

Medium                        25% RH    30% RH    40% RH    50% RH     50% RH
                              10°C      15°C      20°C      25°C       28°C

D3 magnetic tape              50 years  25 years  15 years  3 years    1 year
DLT magnetic tape cartridge   75 years  40 years  15 years  3 years    1 year
CD/DVD                        75 years  40 years  20 years  10 years   2 years
CD-ROM                        30 years  15 years  3 years   9 months   3 months


Figure 3.3: Sample Generic Figures for Lifetimes of Media (Digital Preservation Coalition,
2008, p.154)

Optical disks

This section is based on Byers (2003), Ross and Gow (1999), the International
Association of Sound and Audiovisual Archives Technical Committee (2004
and 2009) and Iraci (2010), which can be referred to for more detailed infor-
mation.
The term optical disk is applied to a large number of media which share
the characteristic of using laser light to record and retrieve bits from a data layer.
Optical disks became readily available at the end of the 1970s. The IASA
Guidelines on the Production and Preservation of Digital Audio Objects noted
12 commercially available CD and DVD disk types with storage capacities
ranging from 650 megabytes to 9.4 gigabytes (International Association of
Sound and Audiovisual Archives Technical Committee, 2004, p.42).
All optical disks are structured in basically the same way, but with
differences in the way data are recorded. Read-only optical disks are produced
by a laser which burns pits into a coating on a master disk. From this, another
master is produced which is then used to stamp the disks on a plastic base or
substrate. The substrate is coated with a thin layer of metal, and is then
covered with a protective polymer layer. Recordable optical disks are of two
types – record-once (CD-R, DVD-R) and rewritable (CD-RW, DVD-RW).
These use different methods for recording data on the disk. Record-once disks
use a dye process, in which the laser light alters the reflectivity of the dye layer
so that it is read as either reflective or non-reflective. Different kinds of dyes
are used: cyanine (blue); phthalocyanine (green); and azo (dark blue). Rewritable
optical disks use a phase change (crystallization) process where the recording
material (metal alloy film) is heated. Their longevity is not clear, Iraci (2010,
p.1) noting that for recordable CDs and DVDs the lifetime, as indicated in
‘research studies, anecdotal information, and manufacturers’ literature’, is
stated as ranging from ‘a couple of years to more than 200 years’.
The part of an optical disk that is most susceptible to deterioration is the
metal reflective layer, because the metal is vulnerable to oxidization. Oxidiza-
tion causes corrosion, which obscures the distinction between pit and surface
(that is, between 0 and 1, in which digital data are stored) and data loss occurs.
Metals or alloys less likely to oxidize, such as platinum or gold, are sometimes
used, but their manufacturing costs are higher. The polymer base supporting
the metal layer may not be impermeable and thus may allow oxygen to
reach the metal and oxidize it. This polymer base may also promote oxidization
in small areas where rough spots or other defects do not provide full
protection. Failure of the protective polymer coating, which can be caused by the
ink used to print onto the disk, also exposes the metal layer to oxidization.
As with magnetic tapes, the longevity of optical disks can be extended by
improvements in manufacturing quality and attention to appropriate storage and
handling. Although the temporary unreadability caused by fingerprints on a
CD-ROM is common knowledge, it is normally assumed that optical disks are
less vulnerable to damage caused by poor handling than are magnetic tapes.
This is not the case; all digital media need careful handling and optical disks are
no exception. The protective coating on a CD is thin, so care must be taken to
ensure that it does not break down. Storage at extremes of temperature and
humidity can affect the physical structures; for example, because plastic sub-
strates can absorb moisture, with oxidization of the metal layer following as a
consequence, high humidity conditions should be avoided. For the same reasons
storage areas in which temperature and humidity fluctuate, resulting in conden-
sation, should be avoided. (Storage and handling are covered in Chapter 7.)
There is a clear link between manufacturing quality of optical disks and
their longevity. CD deterioration and its connection with manufacturing quality
were recognized very early in their existence (Day, 1989). For reasons that in-
clude constantly changing production processes and narrow profit margins, the
quality of blank CD-R and DVD-R can change from batch to batch and ‘at best
… can be described as variable’ (International Association of Sound and
Audiovisual Archives Technical Committee, 2004, p.67). When using optical
disks in a preservation setting – for interim storage, not for long-term storage – it
is essential to monitor the quality of new disks before use. This applies even
to blank disks from reliable brands.
Early studies of optical disk longevity, such as that carried out by NARA
in 1992, suggested that CD-ROMs would last for three to five years. Conse-
quently NARA did not consider CD-ROM to be an acceptable medium for
archival storage, although it was acceptable for use as a transfer medium for
permanent records (Harvey, 1995). A 1992 report on accelerated aging studies
of CDs, carried out by 3M with input from the National Media Laboratory,
suggested ‘a 25-year warranty that assures 100 year life-time at room tempera-
ture’: that is, the lower estimate takes account of ‘general storage fluctuation,
as long as it’s non-condensing’, with 100 years as the lifetime in high quality
storage conditions (Arps, 1993, pp.102-103). Saffady in 1993 summarized manu-
facturers’ lifetime estimates for read/write optical disks, which ranged from 10
years to 100 years (Saffady, 1993, Table 5). Today we are much less sanguine;
as noted in Figure 3.3, the periods can be significantly shorter (as little as
3 months), depending on storage conditions, and there are other factors that
need to be taken into account. However, the question of how long optical disks
last is arguably not very important in the larger digital preservation picture.
Their limited capacity and uncertainty about their lifespan result in their use in
preservation settings principally as short-term storage devices and definitely
not for long-term storage.

The future for digital storage media

The implication of estimates of the longevity of digital storage media is obvious;
the limiting factor for digital preservation is not the lifetime of the media. It is,
nevertheless, worth continuing to improve the ability of digital storage media
to store greater quantities of data securely for longer periods of time. Although
it is well recognized that equipment and software obsolescence, rather than
the longevity of digital storage media, is the limiting factor, it is worthwhile
committing research and resources to improving these media. For example,
improvements offer potential reductions in storage costs if storage density is
increased and if the vulnerability of digital storage media to changes in tem-
perature and humidity is reduced. Perhaps more importantly, improvements in
50 Why There’s a Problem: Digital Artifacts and Digital Objects

media could reduce the frequency with which copying of data (refreshing or
migrating them) needs to be carried out. Further research into digital storage
media is needed and is, in fact, being conducted, as exemplified by General
Electric’s work on new holographic digital storage technology (Lohr, 2009)
and by experiments with archival-quality microfilm for long-term storage of
bit-streams (www.peviar.ch and www.bitsave.ch).
Until we succeed in improving digital storage media, we need interim re-
sponses to the challenges presented by deteriorating media. One such response
was proposed by Howell in 2001, who noted that ‘it is pragmatic to keep crucial
pieces of hardware and operating software tucked away from the IT depart-
ment’s upgrading programmes, for a few years at least’ and provided an example
from the State Library of New South Wales, where a 5¼-inch floppy disk
drive was maintained to provide access to legal deposit material in that format
(Howell, 2001, p.142). This is still sound advice today.
Harvey concluded in 1995 that ‘there are at present too many unknowns to
commit digital data to currently-available artefacts for anything other than
short-term storage’ (Harvey, 1995). This situation has not changed. If we
choose to preserve digital artifacts, then we do so in the knowledge that this is
a short-term expedient.

Digital objects – more than digital artifacts


At the start of this chapter three sets of challenges for the preservation of digital
materials were listed. The first, challenges relating to the nature of media used
to store digital materials, is explained above. This section examines the second
and third sets of challenges – challenges arising from the technologies required
to create, store and access digital materials; and challenges to the integrity of
digital materials.
In the preceding sections the term digital storage media has been used; this
section uses the term digital objects. The distinction between these two terms
is helpful in understanding the reasons why digital materials are difficult to
preserve. Digital storage media consist of the physical storage medium, plus
the bit-stream recorded on it. A digital object is different; it is the bit-stream,
plus everything else that is needed to make sense of the bit-stream, and over its
life these could be stored on a number of different media. A digital object is
likely to include information about the format of the data preserved in the bit-
stream. The distinction noted in Chapter 1, between physical objects and logi-
cal objects – the terms used by the UNESCO Guidelines (2003) and by
Thibodeau (2002) – is the key. Physical objects (our digital storage media) are
simply ‘an inscription of signs on a medium’ (Thibodeau, 2002, p.6) and have
no concern with any meaning that is represented by the data in the bit-stream.
Logical objects (our digital objects) are

processable units … according to the logic of some application software. The rules that
govern the logical object are independent of how the data are written on a physical
medium … A logical unit is a unit recognized by some application software. This rec-
ognition is typically based on data type [for example, ASCII or other more complex
formats] … to preserve digital information as logical objects, we have to know the
requirements for correct processing of each object’s data type and what software can
perform correct processing (Thibodeau, 2002, p.7).

To preserve the digital object, then, we need to preserve not merely the bit-
stream, but also the means to process that bit-stream: the access devices that let
us read the bit-stream from the digital media on which it is stored; the software
that allows us to manipulate and present the information represented by the
data carried by the bit-stream; the documentation so that we can understand
the data formats used and the software; and the contextual information that is
essential to ensure the integrity and authenticity of the information. Ross lists
the main factors that ‘can render resources non-interpretable’ as degradation of
the media, loss of functionality of access devices, loss of manipulation capa-
bilities, loss of presentation capabilities, weak links in the documentation chain,
and loss of contextual information (Ross, 2000, p.12). His terms are used here-
after in this chapter. As the UNESCO Guidelines for the Preservation of Digital
Heritage bluntly put it, ‘Digital materials cannot be said to be preserved if
access is lost’ (UNESCO, 2003, p.21).

Loss of functionality of access devices

The short lifespans of digital storage media are only one reason that digital
preservation remains a challenge. Another reason is what is commonly referred
to as technological obsolescence, where hardware and software are replaced by
newer devices or versions that supersede the old technology. The consequence
is that information stored on and accessed using obsolete technologies becomes
inaccessible. Even if the digital storage medium on which the bit-stream is stored
remains in usable condition and the bit-stream stored on it is intact, it is almost
inevitable that the drive, software driver or computer will no longer be available
to access it.
Australian academic Tara Brabazon described this situation in 2000:

I still own my first laptop computer, bought in 1991. It is an Olivetti M316. It functions,
although the battery no longer does. It has a 40 MB hard drive, which is not large enough
to install a current version of Windows 98, let alone the ability to use the Windows
environment to prepare documents. That is probably quite fortunate, as the ‘F’ key
does not work, and most of the letters on the keyboard have been scratched off through
excessive use. There is no possibility or space for a modem connection (Brabazon, 2000,
p.156).

Brabazon does not mention another kind of technological obsolescence: the
obsolescence of digital storage media, where the media are no longer manufac-
tured and cannot be purchased, drives to access the media are no longer pro-
duced, and software needed to control the drives (in the form of device drivers)
and to read the data in the formats recorded on the media is no longer written
(Rothenberg, 1999a, p.7).
The information professions have little or no control over this rapid rate of
technological obsolescence, which is driven by the constantly changing eco-
nomic demands of the information technology marketplace. Conway, writing
about digitizing, suggested that digital project managers ‘ultimately ... have no
control over the evolution of the imaging marketplace’ (Conway, 2000). And
Rothenberg alluded to another reason, the influence of those working in the
computer industry, who have ‘become inured to the fact that every new gen-
eration of software and hardware technology entails the loss of information, as
documents are translated between incompatible formats’ (Rothenberg, 1999a,
p.4).

Loss of manipulation and presentation capabilities

The crux of the problem of preserving digital materials is that they are ‘inher-
ently software-dependent’ (Rothenberg, 1999a, p.8). The bit-stream can repre-
sent any of a very wide range of content and formats – often text or data, but also
images, audio and video, graphics, or combinations of these and other content.
These data require software to interpret them, to turn them into information:

This point cannot be overstated: in a very real sense, digital documents exist only by
virtue of software that understands how to access and display them; they come into
existence only by virtue of running this software (Rothenberg, 1999a, p.8).
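
Rothenberg's point can be illustrated with a minimal sketch, assuming Python and an arbitrary four-byte bit-stream invented for the purpose (it comes from no real file format). The same bytes yield quite different information depending on which interpretation the software applies:

```python
import struct

# A four-byte bit-stream; the values are arbitrary and chosen for illustration.
bit_stream = bytes([0x48, 0x69, 0x21, 0x00])

# Interpretation 1: the first three bytes read as ASCII text.
as_text = bit_stream[:3].decode("ascii")

# Interpretation 2: all four bytes read as one unsigned 32-bit integer,
# least significant byte first (little-endian).
as_int_le = struct.unpack("<I", bit_stream)[0]

# Interpretation 3: the same four bytes, most significant byte first.
as_int_be = struct.unpack(">I", bit_stream)[0]

print(repr(as_text))
print(hex(as_int_le))
print(hex(as_int_be))
```

Nothing in the bit-stream itself indicates which reading is intended; that knowledge resides in the software and in documentation about the data format.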

Meaning is preserved only by preserving ‘the ability to reconstruct streams of
bits in a meaningful way that computers and humans can interpret, use, re-
purpose, and understand at any arbitrary point in the future’ (Workshop on
Research Challenges in Digital Archiving and Long-term Preservation, 2003,
p.6). Besser (2000) divides the preservation challenges of this dependence on
software into the viewing problem, the scrambling problem and the translation
problem. Without appropriate software, the digital file cannot be displayed
appropriately on the screen, so that ‘all we will be able to see is gibberish’.
Besser uses the example of files created in WordStar, once common word-
processing software, which cannot be viewed using today’s versions of Micro-
soft Word. The scrambling problem comprises the complications that result
from the use of compression techniques to reduce the size of digital files so
that they can be stored and transmitted more economically. These add further
complexity to the encoding of files, adding another layer of coding to be
recognized and allowed for. The use of lossy data compression techniques discards
data. Besser’s translation problem refers to the inability of applications software
to read files created for use in its own earlier versions: ‘in fact’, he notes, despite
the bit-stream being identical, ‘the very reason for converting the file is because
we are unable to successfully sustain that application’s environment over time’
(Besser, 2000). Besser also notes what he calls the inter-relation problem. In the
digital world material is typically related to other material, a common example
being web materials that often contain sections hyperlinked to other web-based
materials. This raises an important question: what are the boundaries of a web
page? That is, where does one stop preserving? Is it sufficient to preserve only
the main web page and not the links to other web sites? (Besser, 2000). Chap-
ter 4 explores this further.
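
The scrambling problem described above can be sketched in a few lines, assuming Python's standard zlib module for the lossless case. The quantize function is a hypothetical stand-in for a lossy scheme, not a real codec. Lossless compression adds a layer of encoding that must be recognized and undone; lossy reduction discards data that cannot be recovered:

```python
import zlib

original = b"Preserving digital materials. " * 20

# Lossless compression: an extra layer of encoding over the bit-stream.
# The compressed bytes are unreadable unless the scheme is known.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)


def quantize(data: bytes) -> bytes:
    """Hypothetical lossy scheme: keep only the high 4 bits of each byte."""
    return bytes(b & 0xF0 for b in data)


samples = bytes(range(64))
lossy = quantize(samples)

print(len(original), len(compressed))       # size before and after compression
print(restored == original)                 # lossless: bit-identical when undone
print(len(set(samples)), len(set(lossy)))   # lossy: distinct values collapse
```

In the lossless case the original bit-stream is fully recoverable, but only by software that knows which compression scheme was applied. In the lossy case many input values map to the same output value, so no software can restore what was discarded.
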
Another metaphor for this software dependence is the National Archives of
Australia’s performance model. Digital records are the result of the mediation
of technology and data. The source, a data file, is meaningless to users by itself
until it is combined with the process, the technology ‘required to render mean-
ing from the source’. A performance is what provides meaning to users. It is
created when a source is combined with a process and is ‘what is rendered to
the screen or to any other output device’. The performance is ephemeral, last-
ing only as long as the source is combined with the process, and a new per-
formance is created every time the source and process are combined (Heslop,
Davis and Wilson, 2002).
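
The performance model can be expressed as a minimal sketch, assuming Python. The names Performance, perform, text_viewer and hex_viewer are hypothetical and are not taken from the National Archives of Australia's own documentation:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Performance:
    """What is rendered to an output device when source meets process."""
    rendered: str


def perform(source: bytes, process: Callable[[bytes], str]) -> Performance:
    """Combine a source with a process to create a new performance."""
    return Performance(rendered=process(source))


# Two hypothetical processes standing in for different rendering technology.
def text_viewer(source: bytes) -> str:
    return source.decode("utf-8")


def hex_viewer(source: bytes) -> str:
    return source.hex(" ")


source = "Digital record".encode("utf-8")

first = perform(source, text_viewer)    # one performance
second = perform(source, text_viewer)   # a new performance, same content
other = perform(source, hex_viewer)     # same source, different process

print(first.rendered)
print(other.rendered)
print(first == second, first is second)
```

The source stays the same throughout. What the user experiences depends on the process, and each combination of source and process produces a fresh performance rather than retrieving a stored one.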

Weak links in the documentation chain and loss of contextual information

Documentation is the key to preserving digital objects – documentation about
them so that their meaning continues to be determinable in the future, about
the systems required to access the bit-stream, and about the context in which
the bit-stream was created and has been maintained. Digital objects that have
become separated from documentation about them are likely to be valueless.
Not only must this documentation about the objects be actively generated and
recorded, systems to maintain it and to transfer it to the future must also be
developed and maintained.
As the report of the NSF-DELOS Working Group on Digital Archiving
and Preservation (2003, p.iv) puts it, ‘There are two problems related to the
preservation of digital entities that need generic solutions – interpretability and
trustworthiness’. Interpretability is maintained by preserving the means to access
the bit-stream and make sense of it, or preserving documentation about it that
allows us to recreate the means. This requires preserving documentation about
the data formats used and about the software.

An unbroken documentation chain is considered essential for determining
whether a digital object can be trusted. ‘Trustworthiness’ of the digital object
is based on its authenticity and its integrity: that it is what it purports to be, and
that it is complete and has not been altered or corrupted (Ross, 2002, p.7).
Authenticity and integrity demand special attention if preserved digital objects
are to be trusted and used as evidence or as a source of information. The ease
with which digital objects can be altered adds complexity to these require-
ments, as do some preservation strategies, such as migration and normalization
(see Chapter 8) which ‘involve transformations of the original bitstream’
(NSF-DELOS Working Group on Digital Archiving and Preservation, 2003,
p.6). If we cannot be sure of the authenticity and integrity of a digital object,
we may not be able to use it effectively. For example, the value of a record of
a business or legal transaction in digital form is compromised if it cannot be
established that it has not been altered in any way.
The challenge is to demonstrate the authenticity and integrity of digital
objects. This is where the documentation about objects becomes significant, as
‘authenticity is best protected by … documentation that maintains the clear iden-
tity of the material’ (UNESCO, 2003, p.22). For instance, if it can be established
that a digital object has always been kept in the custody of an archive in which
every change to it has been recorded (such as changes resulting from migration
of the bit-stream – the dates at which migration occurred, to which media, and so
on), then we can be more secure about its integrity. The Digital Preservation
Coalition’s handbook gives further examples: scholars need confidence that the
objects they refer to will remain stable, and materials used as evidence in legal
situations need to demonstrate authenticity (Digital Preservation Coalition, 2008,
p.37). Ross poses the questions ‘What are the requirements of authenticity and
integrity functionality and what can be done to ensure that they are present in
digital objects or in the systems that maintain them?’ He suggests that

underpinning authenticity and integrity and their preservation over time are the con-
cepts of fixity, stabilisation, trust, and the requirements of custodians and users … an
authentic digital object is one whose genuineness can be assumed on the basis of one or
more of the following: mode, form, state of transmission, and manner of preservation
and custody (Ross, 2002, p.7).
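
One common way of supporting fixity in practice is to record a cryptographic checksum of the bit-stream, together with a log of every change made while the object is in custody. The sources cited here do not prescribe a particular technique. The following is a minimal sketch, assuming Python's standard hashlib module and SHA-256; the CustodyRecord class, the identifier and the dates are hypothetical:

```python
import hashlib
from dataclasses import dataclass, field


def fixity(bit_stream: bytes) -> str:
    """Return the SHA-256 checksum of a bit-stream as a hex string."""
    return hashlib.sha256(bit_stream).hexdigest()


@dataclass
class CustodyRecord:
    """Hypothetical record of an object's checksum and documented changes."""
    identifier: str
    checksum: str
    events: list = field(default_factory=list)

    def record_event(self, date: str, description: str,
                     new_bit_stream: bytes = None) -> None:
        # A documented change, such as a migration, updates the checksum.
        if new_bit_stream is not None:
            self.checksum = fixity(new_bit_stream)
        self.events.append((date, description, self.checksum))

    def verify(self, bit_stream: bytes) -> bool:
        """True if the bit-stream matches the last documented state."""
        return fixity(bit_stream) == self.checksum


original = b"Record of a business transaction"
record = CustodyRecord("obj-001", fixity(original))
record.record_event("2011-03-01", "ingested into archive")

intact_ok = record.verify(original)            # unchanged bit-stream
tampered_ok = record.verify(original + b"!")   # undocumented alteration

migrated = original.upper()                    # stand-in for a real migration
record.record_event("2016-03-01", "migrated to new format", migrated)
migrated_ok = record.verify(migrated)

print(intact_ok, tampered_ok, migrated_ok)
```

A checksum of this kind shows only that a bit-stream matches a documented state. Judging authenticity still depends on the surrounding documentation of custody that Ross describes.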

Many of the concepts that have been developed to ensure authenticity and in-
tegrity of preserved digital objects come from research and thinking among the
recordkeeping community, such as the outcomes of the Functional Requirements
for Evidence in Recordkeeping Project at the University of Pittsburgh, and
InterPARES (see Chapter 5). Gilliland-Swetland (2000, p.16) reminds us that
‘the value of an individual record is derived in part from the sequence of records
within which it is located’ and that ‘it can be difficult to understand an individual
record without understanding its historical, legal, procedural, and documentary
context’ (p.18).

It is also necessary to preserve documentation about other aspects of digital
objects in order to ensure their preservation, so that they can be understood
in the future. Chapter 5 discusses this in more detail.

Conclusion
This chapter notes three modes of digital death: instability of storage media;
obsolescence of storage and access technologies; and challenges to the integ-
rity of digital materials. The rapid obsolescence of hardware and software is, to
a large extent, a result of today’s prevailing market-driven ethos, the highly
competitive nature of which means that product obsolescence is essential to
the survival of businesses. Digital preservation requires that means of addressing
rapid obsolescence be established. (Some of these means are discussed in
Chapters 6, 7 and 8.) However, rapid obsolescence is not the only threat to the
preservation of digital materials imposed by the prevailing competitive philoso-
phy; commercial imperatives seldom coincide with cultural heritage imperatives.
Creators of digital materials and other stakeholders may lose interest in their
digital output – a business might close down, or a web site might cease to be
maintained – which has consequences for the future of their materials. Other
threats include stakeholders’ lack of awareness about digital preservation issues,
a shortage of the skill sets needed to preserve digital materials, lack of inter-
nationally agreed approaches, a shortage of practical models on which to base
preservation practice, and a lack of ongoing funding to address digital preservation
issues. These issues are discussed further in the following chapters.
Chapter 4
Selection for Preservation – The Critical Decision

Introduction
Selection in the digital world is not a choice made
once and for all near the end of an item’s life cycle,
but rather is an ongoing process intimately connected
to the active use of the digital files (Conway, 2000)

Librarians and archivists have long acknowledged their responsibility for pre-
serving documents for future use, and have developed criteria and processes
for identifying the documents to which they will devote resources to ensure
their preservation. Archivists have developed a considerable body of theory
and practice about appraisal to support this responsibility. In the main, however,
these criteria, processes, theory and practice have been developed and applied
primarily to paper-based documents. They do not automatically translate to
digital materials and therefore need to be revisited and modified to ensure
that they can be applied effectively in the digital world.
Selection must be reconsidered as we move from old to new preservation
paradigms. For digital materials, selection decisions are ‘not a choice made
once and for all near the end of an item’s life cycle, but rather … an ongoing
process intimately connected to the active use of the digital files’ (Conway,
2000). The digital mortgage (that is, the ongoing costs) that results from selection
decisions also needs to be considered: ‘Program costs don’t cease when
the Web site disappears’ (Vogt-O’Connor, 2000).
This chapter considers the important role that selection plays in the respon-
sible professional practice of librarians and recordkeepers, when applied to
digital materials. It notes selection criteria traditionally applied in library practice
and appraisal criteria traditionally used by archives, and indicates why these
selection criteria, developed for physical artifacts, do not translate well when
applied to digital materials. It considers what additional factors need to be
taken into account or existing factors emphasized when developing effective selection
policies and practices for digital materials, such as the role of intellectual
property ownership, the importance of preserving context, and the nature of
stakeholder input. Emerging frameworks for selecting digital documents,
incorporating some new selection criteria and modified weightings for tradi-
tional selection criteria, are noted.

Selection for preservation, cultural heritage, and professional practice
Libraries, archives and museums have traditionally considered that one of their
primary roles is preservation, ensuring that the cultural heritage of the societies
they are embedded in and defined by is available for use by future generations,
as well as by current users. For these institutions preservation is a core business
activity, and they have developed preservation systems and technologies to
provide continuing access to cultural heritage materials. This preservation role
has, however, always been one in which major conceptual challenges are
embedded. For whom are we preserving? Who exactly are the users of the
future? Because the resources available to cultural heritage institutions (librar-
ies, archives and museums) are seldom sufficient for the preservation of all
materials, some selection is required. What are the implications of selecting
one item or category of material over another one? How might this selection
shape the ways that future generations think about us? Such selection decisions
have been the matter of considerable debate, sometimes charged with political
and polemical emotion, and they are not easy ones to make. ‘Every choice to
preserve is at the expense of something else’ (UNESCO, 2003, p.70).
Selection decisions – what should we preserve?, for how long?, who takes
responsibility? – are essential in managing collections of heritage materials.
They are necessary because ‘there are usually more things – more information,
more records, more publications, more data – than we have the means to keep’
(UNESCO, 2003, p.70). The universal dependence on computing and networks
has exacerbated this situation. In the pre-digital print environment the business
of publishing provided (and continues to provide for this material) some filter-
ing, some quality assurance of the product, through mechanisms such as sub-
mitting manuscripts to readers to make publishing decisions based on quality,
relevance to a defined public, and saleability. As noted in Chapter 1, increasing
quantities of information are being produced digitally and it is easier for indi-
viduals and organizations to make available to the public – to publish – without
any intervening quality assurance measures. The ease with which information
can be mounted on web sites has meant substantial increases in the amount
of information readily available, but it has also meant significant variation
in its quality. The inevitable result has been an even stronger requirement to
select materials for preservation purposes in the face of insufficient budgets,
expertise and facilities. ‘Digital preservation will be expensive’, the Cedars
Project (2002) reminds us in response to the question ‘why do we need to
select?’. (Just how expensive is discussed in Chapter 10.) Unlike non-digital
material such as paper-based artifacts, where there is a period of time in which
to make selection decisions before deterioration of materials becomes an issue,
the time frame for deciding whether or not to preserve digital materials is very
short.

Selection decisions for cultural heritage materials have been the focus of
considerable debate, and this increasingly includes digital materials. The issues
have been articulated most clearly by the archives profession. Cook summarizes
many of them. Archives, he suggests, are ‘a source of memory about the past,
about history, heritage, and culture, about personal roots and family connec-
tions, about who we are as human beings and about glimpses into our common
humanity through recorded information in all media, much more than they are
about narrow accountabilities or administrative continuity’ (Cook, 2000). But
there are many dangers: ‘Memory is notoriously selective – in individuals, in
societies, and, yes, in archives. With memory comes forgetting. With memory
comes the inevitable privileging of certain records and records creators, and
the marginalizing or silencing of others’ (Cook, 2000).
This is not the only way of thinking about selection. Other approaches to
selection are based on a clear distinction between the activities of records man-
agers and the work of archivists. Records managers select records for retention
based on ‘risk avoidance, market opportunities, or desires to avoid embarrass-
ment or accountability’ but this approach ‘inevitably will privilege the needs of
business or government in terms of the issues that get addressed, the allocation
of resources, and the long-term survival of records’ (Cook, 2000). The records
that survive into the future will reflect the concerns of administrators, rather
than the full range of human experience. Recordkeepers, suggests Cook, ‘need
as a profession to remind [themselves] continually of the fate of records left to
White House presidents and Soviet commissars, South African apartheid police
forces and Canadian peacekeepers in Somalia, rogue Queensland politicians
and the American Internal Revenue Service’ (Cook, 2000).
In any approach to selection the selectors must inevitably bring to their
decisions their personal beliefs and values. Responsible selection practice in
preservation must aim to minimize bias in the value judgments that are inevitably
reflected by the decisions about what to select for preservation. Selection dis-
enfranchises some groups, as our experiences with non-digital collections have
shown; an example is that history has moved from being that of significant
individuals, usually male, to a wider view based on the records of other groups
such as women and the poor, and of indigenous cultures.
Should we save everything? This is now technologically feasible for digital
materials with massive decreases in the cost of digital storage and significant
increases in processing power. Current thinking is that we should not, for
many reasons. We lack sufficient resources. Retrieval tools are not sufficiently
developed to provide adequate results. But there are other reasons why the
answer to this question is no and some are noted later in this chapter.
The current state of development of digital preservation requires that we
still have to pose the question of what really matters. How do we decide? Various
selection approaches might be used. Preserving what is easiest to preserve –
‘picking the low-hanging fruit’ (this metaphor is used in UNESCO, 2003,
p.76) – is unlikely to select materials that are of particular value. Selection
decisions could be made on the basis of the type of material, but this raises
more questions than it answers, such as ‘are websites more important to pre-
serve than emails?’ Approaches such as these are not completely satisfactory
as a basis for making realistic decisions that are professionally responsible,
although some of them have been applied to digital materials on a small scale
in order to develop expertise and scalable strategies and practices for digital
preservation.
Where might we look for guidance on these issues? This chapter indicates
that selection criteria based on library practice are inadequate when applied to
digital materials, and that more appropriate guidance is to be found in appraisal
theory and practice derived from the archives discipline. However, all matrices
of selection criteria, regardless of their disciplinary origins, share common
problems. One problem is the difficulty of determining value. Current assess-
ments of value are not always a useful indicator of future value; the UNESCO
Guidelines for the Preservation of Digital Heritage (UNESCO, 2003, p.72)
provide the example of remote sensing data from earlier decades, which are now
being used to assess environmental damage. Another problem is determining
how much to preserve. There is a growing perception that we need to keep
more, rather than less, of the digital environment, partly because we have the
technology (such as that used by the Internet Archive) to do so and partly
because our abilities to interrogate and get new meanings from data are
increasing as new tools such as data visualization and data mining software are
developed. We understand better the requirements of some disciplines as a result
of research carried out in the last decade, such as the ERPANET case studies
(www.erpanet.org/studies) and the SCARP case studies
(www.dcc.ac.uk/projects/scarp). Probably the most significant development in our thinking about
selection of digital materials for preservation is our increasing realization that
we must specify as precisely as we are able the user group of the future, and
tailor our selection decisions and practices to meeting their projected needs.
One example of this thinking is the ‘designated community’ concept defined in
the OAIS Reference Model (Consultative Committee for Space Data Systems,
2002), which is noted in more detail in Chapter 5.

Selection criteria traditionally used by libraries and archives


Libraries, with their long history of preserving documentary heritage, are more
attuned to making preservation decisions about physical objects than about the
less tangible digital materials. Many of the policies developed for the selection
of artifacts also apply to digital materials. Certainly the key issues remain the
same: on what basis do we select? What are the implications for current and
future users? Who might the users of the future be? Who decides how much
should be kept? Who keeps it? (The rest of this chapter is based in part on
Harvey (2005a and 2010, chapter 11).)
Most library selection practice is aimed at meeting the current needs of
their user communities. (This is, of course, a generalization: national libraries
are a significant exception to this in their use of legal deposit mechanisms to
acquire comprehensive national documentary resources collections.) Secondary
principles, such as decisions to develop particular subject areas where user
needs are of lower priority, may modify these policies. These selection guide-
lines based on meeting current user needs do not automatically apply to selecting
material for preserving in the future.
Criteria developed by libraries to select artifacts for preservation vary little.
They address characteristics of artifacts – usually age, evidential value, aesthetic
value, scarcity, associational value, market value and exhibition value (Task
Force on the Artifact in Library Collections, 2001, pp.9,11). Other perspectives
on these criteria may apply: for instance, the fact that an artifact is an original
and not a copy may be strongly associated with its evidential value and its
exhibition value. Occasionally additional criteria are used; examples include
‘fragility or condition’ (Pacey, 1991, p.189) and whether the artifact is ‘held in
community esteem’ ((Significance) 2001, p.11). The practicalities of preserva-
tion are also influential, among them the existence of a management plan and
assessment of physical condition (Edmondson, 2002, pp.22-23; Harris, 2000,
pp.206-224). Constraints to preservation resulting from intellectual property
issues also make an appearance, as in Harris’s ‘are there any constraints caused
by copyright laws?’ (Harris, 2000, pp.206-224).
To summarize, traditional library selection practices for preservation are
based on the preservation of items (or artifacts) in their original formats, accord-
ing to five key criteria: evidential value; aesthetic value; market value; asso-
ciational value; and exhibition value. Additional criteria are often also used,
although there is less consistency in their application: physical condition;
resources available; use; and an ill-defined, but nonetheless well-represented,
category of social significance (is the item ‘held in community esteem’?).
Although this summary sounds straightforward and its criteria clear cut, in
practice many problems arise in their interpretation, most of which relate to the
difficulty of determining what value is. ‘What were the grounds for deciding in
favor of one object and against another? How can libraries cope with the fact
that the value of the artifact is never quite the same to different researchers?’
Artifacts are ‘cultural variables’ that ‘are viewed and used in a given culture at
a particular moment’ (Task Force on the Artifact in Library Collections, 2001,
p.12). Libraries haven’t always got this right – in fact it is impossible to get it
right, given changing perceptions of value. As an example, consider the chang-
ing status as research materials of romance novels, a genre which has been
‘banish[ed] … from the cultural record of the nation’ (Flesch, 1996, p.190;
2004). Other problems arise when determining if and when a surrogate is just
as suitable as the original artifact. Copies of artifacts lose some of their infor-
mation each time they are copied – a microfilm can never capture all of the
information that resides in the physical structure of a book. This is known as
generational loss.
Many approaches to selection for preservation have been applied in libraries.
For some of these the aim has been to reduce the amount of intellectual input
into the selection process, or to expend that effort only once so that others do
not have to duplicate it. The ‘great collections’ approach was based on selecting
artifacts whose content was valuable intellectually and which were physically
fragile, but this is labour-intensive. Similar approaches involved the determina-
tion by a bibliographer or panel of experts of the core literature in a field, but
this too was labour-intensive and required a high level of bibliographic control
in the field. User-driven selection occurs when an item requested by a user is
in poor physical condition and is treated when attention is called to it (Cox, 2002,
pp.97-98). Smith (1999, pp.9-11) provides more details of these approaches.
None of these approaches is fully appropriate to the selection of artifacts for
preservation. The same reservations apply to their application in the digital
environment.
The archival community has a well-developed theory and set of practices
known as appraisal. Appraisal theory and practice are significantly more
developed than library selection practice, and their deeper theoretical basis informs
our thinking about which digital materials are worth preserving. Appraisal, which
lies at the core of archival practice, is traditionally defined in terms of evaluating
records in order to determine whether they need to be kept as archives (that is,
as long as possible) or for a specified period or whether they are to be destroyed.
Appraisal of archival materials applies criteria similar to those used in
library selection. Archival value encompasses several narrower values: admin-
istrative (usefulness for the conduct of business); fiscal (usefulness for financial
business); legal (worth for conduct of legal business); intrinsic (the inherent
nature of the material, its significance as artifact); evidential (its value as evi-
dence of the record creator’s origins, functions and activities); and informa-
tional (usefulness for more general research purposes because of the record’s
information content) (Tibbo, 2003, pp.29-30). What most strongly distinguishes
archival appraisal practice from library selection practice is a stronger recognition
of, and emphasis on, the importance of context; as Cox (2002, p.53) indicates,
‘from an archival perspective, context is crucial to understanding information
or evidence (in any form)’. The distinctive nature of appraisal is noted by Gilli-
land-Swetland:

The archival perspective brings an evidence-based approach to the management of
recorded knowledge. It is fundamentally concerned with the organizational and
personal processes and contexts through which records and knowledge are created as
well as the ways in which records individually and collectively reflect those
processes. This perspective distinguishes the archival community from other
communities of information professionals that manage decontextualized information
(Gilliland-Swetland, 2000, p.2).

This evidence-based approach with its emphasis on context is proving to be
more helpful in developing effective selection criteria for digital materials than
is selection practice based on the traditional library criteria. Digital materials
can be more effectively handled by considering them, at least in part, as records,
especially in ensuring that contextual information is recorded through docu-
mentation and metadata.
The Archives Association of Ontario’s brief statement about appraisal (Ar-
chives Association of Ontario, 1999) poses six questions:

1. What is the administrative, evidential or informational value of the records to the
organization?
2. Do the records meet the terms of your mandate and acquisition policy?
3. Are the records primary or unique?
4. Is the information in the records duplicated in another set of records?
5. Can the records be properly preserved?
6. Can the records be made available?

Two of these questions (5 and 6) are concerned with practicalities: are the
resources (technical, financial and human) available to preserve the records
and make them available in the future? Are access restrictions so rigid that the
records cannot be made available to users in a realistic and timely way? Ques-
tion 1 is fundamental to appraisal. By linking the administrative, evidential and
informational value of the records to an organization, it clearly indicates the
importance of context. Cox (2001, p.6) further indicates that ‘a record is a spe-
cific entity. It is transaction oriented. It is evidence of activity (transaction) and
that evidence can only be preserved if we maintain content, structure, and con-
text’.
How might we interpret this for materials that are not records of trans-
actions, such as the kinds of materials (including digital materials) usually
managed by libraries? Cox (2001, p.6) assists us: ‘Structure is the record form.
Context is the linkage of one record to other records. Content is the information,
but content without structure and context cannot be information that is reliable’.
We need to identify the equivalents of content, structure and context for digital
materials that are not principally records of transactions.
Appraisal theory and practice are themselves being continually reappraised,
in part because appraisal ‘is not free of bias and subjectivity; its results reflect
the cultural and other values of the time’ (Piggott, 2001). The ‘great trinity
mystery’ of appraisal pithily notes:

– apart from some exceptional cases, it is beyond our resources and power to keep all
records; which is a pity, because
– beyond their original use, all records conceivable have their uses; we’ve come to
expect unexpected uses and yet
– it is almost impossible to accurately predict future use, and when we try, the passage
of time can cause serious havoc with appraisal judgments (Piggott, 2001).

The definition of appraisal is evolving to accommodate digital records, so that
appraisal decisions are made at or near the time that records are created. The
ISO standard Records Management – Guidelines, which is based on an earlier
Australian Standard for Records Management, does not define appraisal but
notes that capture is ‘the process of determining that a record should be made
and kept … It involves deciding which documents are captured, which in turn
implies decisions about who may have access to those documents and generally
how long they are to be retained’ (AS ISO 15489.2, 2002, 4.3.2 – the italics
have been added).

Why traditional selection criteria do not apply to digital materials

The UNESCO Guidelines for the Preservation of Digital Heritage suggest that
there are no differences between selection for preservation of digital materials
and selection of non-digital materials, that the ‘selection of digital heritage is
conceptually the same as selection for non-digital materials’; the same principles
and practices, based on existing policies and procedures, apply, although they
‘may need some adjustment’ (UNESCO, 2003, p.70). This is too simplistic a
view, because the different emphases required are so great that the concepts
are altered significantly. For instance, we place increased emphasis on whether
we possess the technical capability to preserve digital materials. This means
that the practicalities become more important, whereas in the past the informa-
tion content or evidential value of most materials in the collections of libraries
and archives would not be significantly altered if no actions were taken.
Which differences between digital and non-digital materials affect selection decisions?
One is that, as noted in previous chapters, digital materials are more vulnerable
than material in traditional formats, such as paper. Another is cost. While major
resourcing issues are attached to any decisions about which materials we will
retain, the issues are different for digital materials (the digital mortgage noted
earlier in this chapter). (Costs are noted in Chapter 10.) Also different is that
preservation strategies and techniques developed in a pre-digital era do not
necessarily apply in the digital environment – in Cloonan’s words, ‘digital
objects force us to preserve them on their own terms’ (Cloonan, 2001, p.237).
For example, selection decisions made about digital materials need to take into
account whether or not the equipment to read them is available; this is not
typically the case for non-digital materials. A further difference which relates
directly to selection is the need to make conscious preservation decisions about
digital materials early in their existence. Burrows compares the pre-digital and
the digital environments:

In the pre-digital era, selection necessarily preceded preservation. Once a book or journal
had been acquired, a decision could be made about its long-term retention. The mere
act of placing it in a library was often seen as a sufficient method of preservation in itself.
In the digital era things are very different. ‘Digital records don’t just survive by accident’,
observes Margaret Hedstrom. As a result, a lack of preservation is tantamount to de-
selection (Burrows, 2000, p.152).

Or, as the UNESCO Guidelines concisely state, ‘it may not be possible to wait
for evidence of enduring value to emerge before making selection decisions’
(UNESCO, 2003, p.71).
One characteristic of digital materials that assumes greater significance
when considering selection is the quantity of digital materials being generated
(noted in Chapter 1). We currently consider that we need to preserve a consid-
erable percentage of this material, and this consideration has a major conse-
quence for selection as well as for other aspects of digital preservation. Our
current best-practice techniques were developed for small quantities of simply-
structured digital objects and they work best with these. They require significant
input by people and paying people costs money. One response is to develop
efficient and effective automated processes that require minimal human inter-
vention. Appraisal is one area where research into the development of auto-
mated processes is being carried out (for examples see Harvey and Thompson,
2010; Oliver et al., 2008).
Other differences between digital and non-digital materials also need to be
considered when developing selection criteria. For digital materials the quanti-
ties to be assessed may be significantly greater and their quality may be more
variable; for example, unlike most print-based publications, they may not go
through the quality assurance processes of a publisher. We also face the chal-
lenges posed by new genres being developed, especially those that incorporate
linked resources. There is, too, the question of exactly which attributes of digital
materials should be preserved (noted in more detail in Chapter 5). Yet another
issue is the difficulty of determining ownership of intellectual property rights
for some digital materials. Any effective process and criteria for selection of
digital materials for preservation must take account of these differences.

IPR, context, stakeholders, and lifecycle models


Viable selection criteria for digital materials need to take account of many
factors that differentiate them from non-digital materials. This section consid-
ers four of these factors: intellectual property rights, which assume greater
importance, and legal deposit for digital materials, which is slowly becoming
a reality; the need to preserve much more contextual information than is
thought necessary for most non-digital materials; the potentially much greater
role that stakeholders play; and the lifecycle models that assist us to develop
effective preservation techniques, including principles for selection for preser-
vation.

Intellectual property rights and legal deposit

Legal frameworks for the preservation of traditional (usually non-digital)
material that enters library collections are already well established. They
encompass the provisions of copyright law and legal deposit legislation. Copyright
laws typically include a provision that an item can be copied without specific
approval from the copyright owner if the copies made are for preservation
purposes. No similar blanket provisions apply to digital materials, for which the
legal framework is complex. Not much has changed; copying digital materials
for preservation purposes – the very basis of the digital preservation strategies
of refreshing, migration, and emulation – can infringe intellectual property
rights as they currently stand for digital material (Muir, 2004, pp.76-77). Some
countries have accommodated digital materials in their copyright legislation,
but the provisions are uneven (Besek et al., 2008). Data protection acts in some
jurisdictions may also present barriers to preservation.
Commenting on trends in intellectual property rights, Beagrie notes that,
as the economic value of intellectual property increases and investment in it
expands, legislation becomes more restrictive, for example, by the extension of
copyright protection periods. He comments that ‘the needs of memory institu-
tions for legal exceptions to undertake archiving are often overlooked or not
sufficiently understood’ and are overridden by commercial imperatives to
protect intellectual property (Beagrie, 2003, p.3). There are at present no simple
solutions to this issue. Strategies include establishing a dialogue with intellec-
tual property rights owners so that they are fully informed of the issues and of
what is required to allow the preservation of digital materials (Digital Preser-
vation Coalition, 2008, p.43).
Intellectual property rights issues may influence selection of digital materials
for preservation. If the rights are so restrictive that there is no real possibility
that access to the material can be made available in the future, then it would
probably be pointless to expend resources on its preservation.
Among the powerful tools that some libraries apply to the preservation of
traditional documentary heritage materials are the provisions of legal deposit
legislation, through which material comes automatically to a designated library
without the expenditure of effort and resources to acquire that material. Legal
deposit is conceptually antithetical to selection; it implies that no selection is
made, but that all materials defined by the legislation are to be retained by
the library in which they are deposited. Legal deposit is noted here, however,
because it addresses one of the issues faced in preserving digital materials –
getting hold of them in the first place.
Because legal deposit legislation usually predates the digital era, digital
materials are often not covered. Some countries have drafted and/or enacted legal
deposit legislation that covers all digital materials: these include Canada,
Denmark, Finland, France, Germany, Iceland, New Zealand, Norway, South
Africa, Sweden and the United Kingdom. Other countries have legal deposit
legislation that covers some digital materials, usually static publications such
as those issued on CD-ROM (Legal deposit, 2007). In the absence of legal
deposit legislation covering digital materials, there is some potential in negoti-
ating voluntary deposit schemes. A model code for voluntary legal deposit
agreements was developed and adopted in 2005 by the Conference of Euro-
pean National Libraries and the Federation of European Publishers (2005).

Context and community

The need to preserve contextual information about digital objects is noted in
earlier chapters. It is not simply the bit-stream that we aim to preserve, but also
the additional information and tools needed to access and understand that bit-
stream. Selection processes must take account of this requirement. If contextual
information is not available, then the selection decision may be not to preserve
those digital materials. The UNESCO Guidelines (2003, p.72) suggest some
specific instances:

Where digital materials can only be understood by reference to a set of rules such as a
record keeping system, database or data generation system, or other contextual infor-
mation, selection processes must identify the documentation that will also need to be
preserved.

Preserving contextual information needs to take into account the context in
which selection decisions are made. Every preservation programme operates
within a context. For national libraries the context is the nation; for a business
archive, the company that it is established to serve; for a university, its faculty
and students. The community that the programme serves is likely to impose
its own requirements. This directly affects the quantity and nature of digital
materials that are selected for preservation: in the case of a national library it is
a wide range, whereas for a university-based programme it will be narrower,
perhaps restricted only to the intellectual output of its faculty and research
students.
The necessary influence of context on selection for preservation is demon-
strated in the OAIS Reference Model, a key standard for developing digital
archives. This standard defines the concept of a ‘Designated Community’ (‘An
identified group of potential Consumers who should be able to understand a
particular set of information’ (Consultative Committee for Space Data Systems,
2002, p.1-10)) and in doing so allows us to be more precise about what serves
the needs of the specified designated community, including what is selected
for preservation for the use of that community. As well as the nature and extent
of the material that is preserved, the community will also define what kind of
contextual information is collected and preserved. For some communities it
may be sufficient to see only a passive rendition of the digital materials: a
screen shot or other visual presentation; perhaps a PDF version is all that is
required. For others, it will be necessary to retain sufficient contextual infor-
mation to allow the digital materials to be searched or manipulated. For yet
other communities, there must be enough contextual information of the right
kind to demonstrate that the authenticity of the digital materials has not been
compromised. Selection criteria for determining digital materials to be pre-
served need to take account of this contextual information. Chapter 5 examines
further the questions surrounding what attributes need to be preserved.

Stakeholder input

Chapter 2 noted that stakeholders typically play a greater role in digital preser-
vation than they played in the preservation of non-digital materials. There is
increasing awareness that the creators of digital materials are well placed to
influence the preservation of the materials they create, and that engaging and
influencing creators provide worthwhile benefits. The UNESCO Guidelines
(2003, p.73) suggest that because producers of digital materials are ‘well placed
to understand why digital objects were brought into being, their essential
“message”, and the relationships between objects and their context’ they are
likely to play an important role in selection decisions; in fact, it may be impos-
sible to reconstruct this information at a later date. An example of essential
engagement with stakeholders is the need to communicate with intellectual
property rights owners so that they understand the preservation implications of
the control they wish to exert over digital materials.
In the preservation of digital materials some community sectors, disci-
plines and individuals are increasingly engaged in the selection of what to
preserve, relieving archivists and librarians of sole responsibility for such
decisions. Discipline-based approaches, which were also taken in the pre-digital
preservation paradigm, are proving effective, as shown by examples
such as the Smithsonian Astrophysical Observatory (SAO/NASA) Astrophysics
Data System (adswww.harvard.edu). A meta-analysis of sixteen case studies from
different disciplines carried out as part of the DCC SCARP Project concluded
that, because curation practices vary widely across the disciplines, research
data as a general concept is too broad a level at which to develop generalized
practice; it proposed instead ‘a finer-grained level, such as the research group’
(Lyon et al., 2010, p.4). Further case studies will add to our understanding and
indeed there is considerable research into understanding how data are used
and managed, as the Data Curation Profiles project (www4.lib.purdue.edu/dcp)
indicates.
The influential role of the individual in digital preservation is also in-
creasingly recognized. Although this is not a new role – institutional collectors
(libraries and archives) have relied on individuals to collect the more ephemeral
materials that constitute popular culture – the different characteristics of digital
materials call for greater emphasis on educating individuals about digital preser-
vation so that they can create preservation-friendly objects and keep them usable
while in the custody of individuals. Selection is emphasized in advice provided
to individuals, such as that given by the Library of Congress in the Personal
Archiving pages of its web site (www.digitalpreservation.gov/you).

Value of lifecycle models

As noted several times already in this book, selection decisions about which
digital materials to preserve are best made at an early stage in their existence.
Continuum and lifecycle models assist us to develop effective selection princi-
ples. The continuum approach, explained in more detail by Upward (2005), is
essentially a way of thinking about the life of a record from its creation onwards
and was conceived to get around issues associated with the traditional split
between records and archives. For electronic records, recordkeepers can no
longer wait ‘passively at the end of the life cycle for records to arrive at the
archives when their creators no longer wanted them – or were dead’ (Cook,
2000, p.2). Significant records must be identified early in their life so that
management and preservation decisions that will ensure ongoing access to the
critical aspects of those records, such as the information content and the attrib-
utes that determine their authenticity, are made right from the start.
Lifecycle models applicable to digital preservation have proliferated. Per-
haps the most widely applied, specifically developed for digital curation, is the
DCC Curation Lifecycle Model (Digital Curation Centre, 2008). This model
places heavy emphasis on selection and appraisal by encapsulating them in the
Sequential Action ‘Appraise & Select’ and the Occasional Actions ‘Reappraise’
and ‘Dispose’. (Harvey (2010, chapter 11) describes selection of digital mate-
rials for preservation in the context of the DCC Curation Lifecycle Model.)
Other lifecycle models similarly emphasize that selection of materials is essen-
tial for high-quality preservation. Selection is a component of the ‘Acquisition’
element of the LIFE (Life Cycle Information for E-Literature) Model (Wheatley
et al., 2007). Classifying data assets as ‘vital’, ‘important’ or ‘minor’ is part of
the Data Asset Framework process (www.data-audit.eu) (Jones, Ruusalepp and
Ross, 2009, p. 27).

Developing selection frameworks for preserving digital materials

The point that traditional preservation practices do not transfer well to the digi-
tal environment has been made already, and it is certainly true for selection for
preservation. Characteristics of digital materials such as media instability and
technological obsolescence shorten the time frame within which decisions
about the future of digital materials must be made. The physical condition of
artifacts drives many traditional preservation decisions (Gertz, 2000, p.98), but
this factor is of little or no importance for digital materials. We have decided
what to preserve in the past by keeping everything that we can and waiting ‘for
the significant information to rise to the surface as time, people and events dic-
tate its importance’ (Howell, 2000, p.129). This inclusive approach has also
been applied in the digital environment, the most notable example being the
Internet Archive, but there is doubt about whether inclusive approaches are
sustainable as the quantity of digital materials increases. They also avoid
the real issue – how to determine significance.
The evolution of thinking about selection of digital materials for preserva-
tion is instructive. Earlier literature about selecting digital materials was con-
cerned with the selection of analogue material for digitization and not with the
selection of digital material for preservation. These are not the same. The princi-
pal reason for selecting material to digitize is usually to improve access to that
material, which may still be subject to pre-digital preservation regimes. Selection
for preservation is, on the other hand, aimed at long-term retention of access to
information that is already, and probably only, in digital form. Digitizing pro-
grammes are usually access-driven, with preservation as a by-product. Despite
these differences, criteria for selecting materials to digitize provide useful advice
for developing selection criteria for preservation. While many of the traditional
selection criteria apply, the emphasis is different; for example, greater promi-
nence is given to user demand and intellectual property rights (Gertz, 2000,
pp.98-99). Gertz has identified the most frequently cited criteria for selecting
material to digitize as:

– Does the item or collection have sufficient value to and demand from a current
audience to justify digitization?
– Do we have the legal right to create a digital version?
– Do we have the legal right to disseminate it?
– Can the materials be digitized successfully?
– Do we have the infrastructure to carry out a digital project?
– Does or can digitization add something beyond simply creating a copy?
– Is the cost appropriate? (Gertz, 2000, p.104)

There has been increasing recognition that additional criteria are needed to
adapt selection decisions to digital materials. The Cedars Project Team report
(2002) was particularly helpful with its definition of a digital object’s Significant
Properties as ‘the level of content and functionality retained’. These significant
properties are not empirical; they require judgments to be made by organiza-
tions about how they apply to their user communities and to the organization’s
preservation responsibilities. The Cedars Project Team report provides an
example: for a text in PDF, the decision is made that the text, not the format, is
significant, so information about the PDF format does not need to be stored
(Cedars Project Team, 2002, pp.14-15). The report places high importance on
negotiating intellectual property rights before any other preservation actions
occur. It suggests that selection decisions for digital preservation must ‘be
pragmatic’ and based on the ‘estimated value of the material, the cost of stor-
age and support mechanisms, and the production of metadata to support the
material’ (Cedars Project Team, 2002, p.53). Primary criteria proposed were
that the digital material was currently in high use, it was the type of material
that we would expect to preserve if it were published in traditional printed
format (typically commercially published scholarly works), and it was tied to
the long-term or cultural interests of the organization (Cedars Project Team,
2002, p.53). Additional criteria proposed included legal and IP issues, format
issues and technical issues.
The decision tree for selection of digital materials for long-term retention
(Digital Preservation Coalition, 2006), developed to accompany the Digital
Preservation Coalition’s handbook on the preservation of digital materials,
provides further guidance. It first poses questions relating to selection of con-
tent and format: is there an institutional selection policy? Does the material fit
into it? Is the material of long-term value? A second group of questions is
about legal and intellectual property issues: have acceptable rights been nego-
tiated? Can they be? Technical questions form a third group of questions: can
you handle the file format, now and in the future? Can the material be
transferred to a more manageable format? Questions about the existence of
documentation and metadata form a fourth group: has enough been supplied?
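The four groups of questions described above can be read as an ordered checklist. The following is a minimal Python sketch of that reading, not a reproduction of the Digital Preservation Coalition's decision tree: the class name, the field names and the rule of stopping at the first unmet group are assumptions made for illustration, and the published tree contains further branches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Answers for one item under consideration (all field names are hypothetical)."""
    fits_selection_policy: bool
    long_term_value: bool
    rights_negotiated_or_negotiable: bool
    format_manageable_or_convertible: bool
    documentation_sufficient: bool


def first_unmet_group(c: Candidate) -> Optional[str]:
    """Return the first group of questions that is not satisfied, or None if all are."""
    # The order follows the four groups in the text; stopping at the
    # first failure is an assumption of this sketch.
    groups = [
        ("content and format", c.fits_selection_policy and c.long_term_value),
        ("legal and intellectual property", c.rights_negotiated_or_negotiable),
        ("technical", c.format_manageable_or_convertible),
        ("documentation and metadata", c.documentation_sufficient),
    ]
    for name, satisfied in groups:
        if not satisfied:
            return name
    return None
```

A result of `None` would suggest that the item passes all four groups and is a candidate for long-term retention; any other result names the group of questions that needs further attention before a decision is made.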
These digital selection frameworks still place high priority on criteria for
determining value, but emphasize other criteria: the legal and intellectual
property rights governing a resource; whether we have the technical ability to
preserve it; the costs involved in preserving it; and the presence of appropriate
documentation and metadata. Other research and practice have indicated that
these other criteria are central to decision-making about selection of digital
materials for preservation. In Phase 1 of its investigations the InterPARES pro-
ject addressed the question of how to select electronic records for preservation.
It determined that records in digital format should be selected for long-term
preservation on the basis of their continuing value and authenticity, and whether
it is feasible to preserve them. The authenticity of records can be established
using the InterPARES Benchmark Requirements for Assessing the Authenticity
of Electronic Records. The feasibility of preservation should take into account
costs and capacity to preserve. Appraisal of records in digital form should be
carried out as early as possible, ideally being part of the design of records
management systems, but assessment of authenticity and feasibility may be
carried out later. Appraisal decisions should be monitored regularly to ensure
that the information kept about the records is valid (InterPARES, 1999-). The
UNESCO Guidelines consider the attributes that contribute to authenticity as
‘the elements that give material its value … the selection process should con-
sider what those elements and characteristics are’ (UNESCO, 2003, p.72).
These vary according to the kinds of material.

Some selection frameworks

It is unlikely that a single selection framework will suffice for all digital materials
in every context, given factors such as disciplinary differences and the variant
structures of digital materials (an email is not the same as a database, for exam-
ple). Selection frameworks for some materials are available and provide models
that can be adapted to suit other materials in different contexts. The concept of
technical appraisal provides one example. Thompson (2010) describes how
this is being applied at the Wellcome Library. The long-term preservation of
digital materials depends on an understanding of how file formats work and
requires access to appropriate software and hardware and the skills to use them.
If these are not available in the archive, digital preservation cannot be success-
ful. The Wellcome Library assigns high, medium and low levels of confidence
in its ability to preserve material based on its format. These levels of
confidence are ‘based on resources available, the availability of tools for managing
digital material and experience with the life cycle management of born digital
materials’ (Thompson, 2010). Thompson takes care to point out that ‘this
approach is clearly based on pragmatic considerations’ and has risks associated
with it, but is ‘appropriate for the Wellcome Library at this point in time’. He
notes that ‘other factors such as intellectual content, significance of the material,
significance of the donor/creator and any relationship to material already in the
Library also play a part’. Selection may also be based on an assessment of
risks. Risk management principles used at the British Library to prioritize digi-
tal materials stored on portable media such as CDs and DVDs are described by
McLeod (2008).
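The format-based confidence levels described above for the Wellcome Library can be pictured as a simple lookup. The following is a minimal sketch only: the formats listed and the levels assigned to them are hypothetical assumptions made for illustration, not the Library's actual assignments, and the function name is invented here.

```python
# Hypothetical format-to-confidence table. The formats and levels are
# illustrative assumptions, not the Wellcome Library's actual assignments.
CONFIDENCE_BY_FORMAT = {
    "pdf": "high",
    "tiff": "high",
    "doc": "medium",
    "wpd": "low",
}


def preservation_confidence(filename: str) -> str:
    """Return the assumed confidence level for a file, based on its extension."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    # Unknown or missing extensions default to "low" so that a person reviews them.
    return CONFIDENCE_BY_FORMAT.get(extension, "low")
```

In such a scheme the level would be only one input to a selection decision, alongside the other factors Thompson lists, such as intellectual content and the significance of the material.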
Research funding agencies increasingly require that data management plans
are presented as a part of applications for funding. Guidance for researchers
about developing funding applications includes advice about selection. Whyte
and Wilson’s guide (2010), developed for the Digital Curation Centre, provides
seven criteria that should be present in any selection policy for research data,
noting that they will be modified by discipline-specific factors. These criteria
are worth quoting in full:

1. Relevance to Mission: The resource content fits the centre’s remit and any priori-
ties stated in the research institution or funding body’s current strategy, including
any legal requirement to retain the data beyond its immediate use.
2. Scientific or Historical Value: Is the data scientifically, socially, or culturally sig-
nificant? Assessing this involves inferring anticipated future use, from evidence of
current research and educational value.
3. Uniqueness: The extent to which the resource is the only or most complete source
of the information that can be derived from it, and whether it is at risk of loss if not
accepted, or may be preserved elsewhere.
4. Potential for Redistribution: The reliability, integrity, and usability of the data
files may be determined; these are received in formats that meet designated techni-
cal criteria; and Intellectual Property or human subjects issues are addressed.
5. Non-Replicability: It would not be feasible to replicate the data/resource or doing
so would not be financially viable.
6. Economic Case: Costs may be estimated for managing and preserving the resource,
and are justifiable when assessed against evidence of potential future benefits; fund-
ing has been secured where appropriate.
7. Full Documentation: the information necessary to facilitate future discovery, ac-
cess, and reuse is comprehensive and correct; including metadata on the resource’s
provenance and the context of its creation and use (Whyte and Wilson, 2010).

Each of these seven criteria for selecting research data is expanded in the
guide. They are robust enough to be used as the basis for selection frameworks
for digital materials other than research data.
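One way to apply the seven criteria in practice is as a checklist recorded for each candidate dataset. The sketch below assumes a simple yes/no judgement per criterion, which is a simplification introduced here; the guide itself does not prescribe a scoring method, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SelectionAssessment:
    """One yes/no judgement per criterion listed by Whyte and Wilson (2010)."""
    relevance_to_mission: bool
    scientific_or_historical_value: bool
    uniqueness: bool
    potential_for_redistribution: bool
    non_replicability: bool
    economic_case: bool
    full_documentation: bool


def unmet_criteria(assessment: SelectionAssessment) -> list[str]:
    """List the criteria judged not to be satisfied, in the order given above."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


# Example: a dataset with unresolved rights issues and incomplete documentation.
example = SelectionAssessment(True, True, True, False, True, True, False)
print(unmet_criteria(example))
```

A list of unmet criteria, rather than a single score, keeps the reasons for a decision visible when the appraisal is reviewed later.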

How much to select?


The discussions about how much to select, and about the feasibility of select-
ing all digital materials for preservation (that is, applying little or no selection),
were initially carried out in relation to web archiving. In this context the oppos-
ing ends of the spectrum are the selective approach, exemplified by the National
Library of Australia’s PANDORA digital archive (pandora.nla.gov.au), and
the ‘whole domain harvesting’ approach of the Internet Archive
(www.archive.org). Both have their strong proponents and in the preservation of web sites a
combination of targeted acquisitions and supplementary periodic snapshots of
a larger domain is often applied. The same spectrum applies to the selection
for preservation of other kinds of digital materials.
The cases for and against selection in the digital preservation context are
presented in this way:

Advocates of a comprehensive approach argue that any information may turn out to
have long-term value, and that the costs of detailed selection are greater than the costs
of collecting and storing everything. Advocates of a more selective approach argue that
it allows them to create collections of high value resources, with some assurance of tech-
nical quality and an opportunity to negotiate access rights with producers (UNESCO,
2003, p.73).

In practice both approaches are often combined.


Whyte and Wilson (2010) provide four objections to the common view
that because storage is cheap we can keep everything. First, ‘digital content
expands’; the rate of growth continues to increase, as noted in Chapter 1. Sec-
ond, ‘backup and mirroring increases costs’; backup is currently considered
essential for digital preservation, so the storage requirements are, therefore, at
least doubled. Third, ‘discovery gets harder’; we still do not have acceptably
accurate tools for locating data in very large data sets. Fourth, ‘managing and
preserving is expensive’; high additional costs are incurred, such as the costs
of creating metadata. Other advocates of a selective approach have suggested
that sampling may prove useful (Neumayer and Rauber, 2007 and responses).
Whatever point on the selection spectrum is chosen, and at whatever point
on the timeline, a cautious approach is urged by the UNESCO Guidelines
because ‘a decision not to preserve is usually a final one for digital materials’
(UNESCO, 2003, p.72). A triage system might be used, so that material defi-
nitely worth preserving and material definitely not worth preserving are identi-
fied, and interim preservation measures are applied to the rest until a definite
decision about selection can be made.
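The triage idea can be expressed as a three-way classification. This is a minimal sketch under the assumption that each item carries a judgement that is either definite (yes or no) or not yet made; the names used are hypothetical.

```python
from enum import Enum
from typing import Optional


class Triage(Enum):
    PRESERVE = "preserve"
    DO_NOT_PRESERVE = "do not preserve"
    INTERIM = "apply interim preservation measures"


def triage(worth_preserving: Optional[bool]) -> Triage:
    """Classify an item; None means no definite decision has been made yet."""
    if worth_preserving is None:
        # Undecided material is kept safe until a definite decision is possible.
        return Triage.INTERIM
    return Triage.PRESERVE if worth_preserving else Triage.DO_NOT_PRESERVE
```

Treating the undecided case as its own outcome reflects the caution urged by the UNESCO Guidelines: nothing is discarded merely because no decision has been reached.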

Conclusion
Generic frameworks for selection of digital materials for preservation are now
available and can be tailored to meet the requirements of specific contexts or
kinds of digital materials. Further research into selection is still needed, as it is
into other parts of the life cycle of digital materials. Such investigation needs
to take into account the high levels of human input currently required in making
selection decisions, and their cost. By identifying selection processes that can
be automated, it may be possible to reduce the level of resources needed for
this aspect of digital preservation. The outcomes may well result in ‘radically
different approaches’ (NSF-DELOS Working Group on Digital Archiving and
Preservation, 2003, p.7). Some of these different approaches are suggested by
Oliver and her colleagues (2008), who pose two questions. The first: ‘Is it
necessary to determine the significance of digital information?’ Some digital
materials may be so poorly organized that determining significance is
excessively time-consuming and resource-intensive. The second: ‘Is the process of
determining significance fundamentally flawed?’ (Oliver et al., 2008, pp.32-
36).
One could easily become obsessed by the challenges of selection, but con-
sider this: ‘preservation … is a relative rather than an absolute concept because
objects change over time as do our approaches to viewing or interpreting those
objects’ (Cloonan, 2001, p.235). So we should be reconciled to never getting it
completely right for all time and to the fact that concepts will change fre-
quently in the future. Nevertheless, it is our responsibility to rise to the chal-
lenges posed with some understanding of the consequences of ignoring them.

Chapter 5
What Attributes of Digital Materials Do We Preserve?

Introduction

There is an inherent paradox in digital preservation. On the one hand, it aims to
deliver the past to the future in an unaltered, authentic state. On the other hand,
doing so inevitably requires some alteration (Thibodeau, 2002, p.28)

In 1915 the fossil remains of an ancient Pleistocene hominid were discovered
in the Piltdown quarries in Sussex, following the discovery of other fossil
remains from this quarry in the preceding three years. The discovery of this
Piltdown man transformed thinking about the evolution of mankind. However,
in the 1950s it was clearly established that Piltdown man was ‘a case of out-
right deliberate fraud’, made up of fragments from a wide geographical and
temporal range of sources (Harter, 1999). There is a real possibility that the
issues that arose with Piltdown man might also arise with digital information if
digital materials are reconstructed, interpreted and used for evidential purposes
without appropriate knowledge of their context, leading, at the very least, to
erroneous conclusions, and perhaps even to fraud.
A digital ‘Piltdown man’ has probably occurred already. The characteristics
of digital materials predispose them to such misinterpretation and fraud. Pre-
serving digital materials inevitably means altering them, because the processes
and techniques we apply require changes to be made. This alteration triggers
Thibodeau’s ‘inherent paradox’, quoted at the head of this chapter. We need,
therefore, to determine the levels of change to digital materials that are accept-
able, so that we can claim to have preserved digital materials in a state as close
as possible to the original. We also need to have a clear idea about the level of
loss of digital data that we can accept. We need to understand better the nature
of the changes that we must make in order to preserve digital materials. We
need to record details of the changes.
As well as gaining a deeper understanding about the changes that we make
and the implications of making them, we need to know more about exactly
what it is we are trying to preserve. What characteristics, attributes, essential
elements, significant properties – many terms are used – of digital materials do
we seek to retain access to? What is the ‘essence’ of digital materials? Whereas
this question was a simple one to answer in the non-digital context, where
typically we sought to conserve and preserve the original artifact, for digital
materials it is not so straightforward.
The example of emails, referred to in Chapter 2, is commonly used to illustrate
this point. The essential elements of emails are relatively easy to identify:
the content information (sender’s name and address, subject, time, date, recipi-
ent(s), text of the message itself) and a simple standardized structure. But for
other digital materials the essential elements are considerably less standardized,
if they are standardized at all, so identifying what attributes of these we need
to retain is difficult. The UNESCO Guidelines note that
some materials are much more difficult to characterise, and expectations about how
they will be re-presented for use, especially to an open-ended community of potential
users, may be so hard to define in advance that it becomes almost impossible (UNESCO,
2003, p.75).

A good example to illustrate this difficulty comes from the realm of the web.
Some web sites do not have a clearly-defined audience, so it is difficult to
determine what users of these sites might wish to see in the future and even
more difficult to define the characteristics of those web sites that it is essential
to preserve. Many web sites are dynamic and interactive, with ‘search and
retrieval aspects intrinsically bound with content’. Therefore we need to preserve
the functionality of the search and retrieval components, as well as the data
that the functionality interacts with to generate the required output (Smith,
2003, p.1).
In order to determine the essential elements of digital materials we want to
retain access to, we need to know in the future ‘that they are what they purport
to be’, that they ‘are complete and have not been altered or corrupted’ (Ross,
2002, p.7). If we cannot ensure this, we will be victim to fraud, as exemplified
by the Piltdown man. Considerable energy has been expended, particularly in
the recordkeeping context, on determining what authenticity and integrity
mean in the digital environment. If these characteristics cannot be established for
digital materials, then their genuineness and our ability to use them for evidential
purposes are devalued. How do we demonstrate that digital materials have not
been altered? What actions do we have to take to establish authenticity and in-
tegrity and maintain them over time so that the future user can be confident the
digital materials they are using retain their original meaning? (Ross, 2002, p.7).
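One common technical measure for demonstrating that a bit-stream has not been altered is to record a cryptographic digest when the material is received and to recompute it later. The sketch below uses SHA-256 from Python's standard library; the choice of algorithm and the function names are assumptions made for illustration. A matching digest shows only that the bits are unchanged since the digest was recorded. It does not by itself establish authenticity in the fuller sense discussed here.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a bit-stream as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at an earlier time."""
    return sha256_digest(data) == recorded_digest


# Record a digest on receipt, then check the stored copy against it later.
original = b"Minutes of the meeting held on 3 March"
recorded = sha256_digest(original)
print(is_unaltered(original, recorded))
```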
Another concept that we need to know more about when we address the
question ‘what attributes of digital materials do we preserve?’ is that of accept-
able loss. It is unlikely, if not impossible, that we can preserve all of the attributes
and functionality of digital materials, but we do not know much at all about the
levels of loss that are acceptable to users. In 2003 a statement about research
needs in digital preservation asked what still remains a crucial question:
how can we measure what loss is acceptable? What tools can be developed to inform
future users about the relationship between the original digital entity and what they re-
ceive in response to a query? (NSF-DELOS Working Group on Digital Archiving and
Preservation, 2003, pp.19-20).
Current knowledge of the attributes of digital materials and of how important
they are for users is in itself insufficient to preserve digital materials
effectively. We need to know even more:

If an object is preserved only to enhance access and use, some transformations might
be desirable that would be prohibited if they transformed attributes deemed essential to
the object … It would seem obvious that strategies, tactics and method for the preserva-
tion of digital objects ought to be informed by a rich understanding of their nature and the
specific objectives for preserving them in digital form. The nature of digital objects
defines what we need to preserve (Thibodeau, 1999).

This chapter examines some of the key concepts and issues associated with
identifying the attributes of digital objects that need to be preserved.

Digital materials, technology, and data


Digital materials result from the mediation of information technology and data.
That is, the digital object that is developed on a particular software and hardware
platform can be viewed as it was originally conceived only on that particular
platform. Viewing web sites illustrates this point. The software used to view them
– the web browsers – is configurable by the user in relation to characteristics
such as the font size and colour, unless the web site designer insists that the
browser displays a site in a particular way. The experience of viewing that web
site will be different for users who access it using software and hardware that
differs from that used to develop the site. Strictly speaking, observes Eastwood,
‘it is not possible to preserve an electronic record. It is only possible to preserve
the ability to reproduce an electronic record. It is always necessary to receive
from storage the binary digits that make up the record and process them through
some software for delivery or presentation’ (Eastwood, 2002, p.77, citing Thibo-
deau, 2000).
This mediation aspect – the rendering or presentation (and re-presentation)
of the digital files – is an important point to understand in digital preservation.
Digital objects are not simply a bit-stream; rather, they are a combination of
hardware, software and bit-stream. The bit-stream of a computer file has to be
processed technically before it makes sense to a user. If the hardware and
software combination differs from the combination within which the digital
object was created, it may look and behave differently from how it was origi-
nally intended to look and behave. This difference may be so extreme that the
authenticity of the digital object is called into question. (These concepts have
already been noted in the consideration of digital objects as physical, logical or
conceptual objects in Chapter 1.)
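The point that an identical bit-stream can be presented differently by different software can be shown with a very small sketch. The four bytes and the two character encodings used (Latin-1 and IBM code page 437) are chosen here purely for illustration and do not come from the sources cited in this chapter.

```python
# Four stored bytes: the same bit-stream is handed to two different decoders.
bit_stream = bytes([0x63, 0x61, 0x66, 0xE9])

# Interpreted as Latin-1 (ISO 8859-1), the final byte is rendered as 'é'.
as_latin1 = bit_stream.decode("latin-1")

# Interpreted as the older IBM PC code page 437, the final byte is rendered
# as a different character, although the stored bits are identical.
as_cp437 = bit_stream.decode("cp437")

print(as_latin1 == as_cp437)  # False
```

Nothing in the stored bytes records which interpretation was intended, which is why information about format and encoding has to be kept alongside the bit-stream.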
The UNESCO Guidelines (2003, p.120) identify three different kinds of
software dependency:

– digital objects that are simple and less dependent on specific software
(plain text in ASCII files is an example)
– digital objects that depend on more complex software that is, however,
generic (such as HTML)
– digital objects that depend on software specific to a particular hardware
platform or operating environment (such as spreadsheets and word proc-
essing software).

Some digital objects contain executable files and software programs, and others
may combine all of the above.
The extent to which the digital object is dependent on software has a sig-
nificant effect on the way in which it can be preserved (UNESCO, 2003,
p.120). Furthermore, it is generally considered that it is not possible to preserve
originals of digital materials, because migration is probably inevitable at some
stage in the preservation process, and migration involves newer technology
that alters the originals. We are preserving copies of the originals, ‘preservation
copies made according to the particular methods and strategies that are appro-
priate or expedient’ (Gilliland-Swetland, 2002, p.198).
Crucial to the development of digital preservation was the realization that
separating technology from data is a key requirement. The technology (hard-
ware and software) changes rapidly and will continue to do so. Earlier thinking
placed a high emphasis on preserving the ability to read bit-streams in the
software that was used to create them, and digital preservation solutions such
as museums of technology and emulation were proposed. Once it was acknowl-
edged that bit-streams would have to be transferred from one medium to another
and that they could be rendered (or viewed or presented) using hardware/soft-
ware combinations other than those with which they were created, the focus
shifted from maintaining access to obsolete software and hardware to deter-
mining the characteristics or attributes of digital objects essential for maintaining
access to them on current and future hardware/software combinations. However,
this change in thinking brought its own problems, as an early statement about
electronic records recognized:

The content of a traditional record is recorded on a medium (storage device, like a
piece of paper) and cannot be separated from this medium. The content of an electronic
record is also recorded on a medium, but from time to time it has to be separated from
the original device and transferred to other, and often different types of, storage devices
whenever it is retrieved or when necessitated by technological obsolescence. Unlike
traditional records, an electronic record is therefore not permanently attached to a spe-
cific medium or storage device, so the opportunities for corruption grow. This pre-
sents additional problems in ensuring that the record’s authenticity and reliability are
maintained (International Council on Archives Committee on Electronic Records,
1997, p.22).

Once we accept that attention should be focused on the preservation of the data,
regardless of the hardware and software that the data will be processed with,
then we need to pay a lot more attention to those data – to the attributes of the
data to be preserved and the level of acceptable loss and alteration of the data.
As already noted, the process of preserving digital objects inevitably means
that they are altered. While at one level digital preservation is a relatively simple
matter of preserving the bit-stream (although its simplicity should not be over-
stated, as Rosenthal (2010a) reminds us), the result is meaningless unless the
means to read and interpret that bit-stream are available, either by preserving
them or developing software that makes them accessible. We might, for example,
emulate a software application with the probable resultant loss of some func-
tionality, which affects that software’s ability to interpret the bit-stream fully.
What effect does this have on the authenticity of the digital object? Is some
loss acceptable? If we can determine which attributes of that digital object are
essential for us to understand it, we can then seek to preserve these attributes.
For example, the colours presented on a screen may not be significant for text
content, but they could be significant for understanding a web page. Static
document-like digital materials, typically those in PDF, HTML or XML for-
mats, can be read using software other than that in which they were created.
(In fact, it is this very characteristic of these file formats that makes them so
useful for digital preservation purposes.) Other digital materials, such as edu-
cational software, games and web animations, are ‘executable’ files – that is,
they are in a format that is directly read by a computer as a program and is run
(‘executed’). These are typically in proprietary file formats and are, therefore,
not easily re-usable in other contexts.
To date the determination of which attributes of digital objects are essen-
tial to maintain into the future has been addressed chiefly by the recordkeeping
community in relation to establishing the requirements for authentic digital
records (considered later in this chapter). Once these have been established,
‘combinations of data, software and hardware that will re-present those elements
as accurately as required’ need to be maintained (UNESCO, 2003, pp.120-121).

The importance of preserving context


In addition to determining the essential attributes of digital objects, informa-
tion about the context in which the digital objects were created and used needs
to be identified and maintained over time. Gilliland-Swetland points out the
value that is placed on context in standard archival practice:

When materials are generated by the activity of an individual or organization, an
interdependent relationship exists between the materials and their creator. A complex web
of relationships also exists between the materials and the historical, legal, and
procedural contexts of their development as well as among all materials created by the same
activity. The organic nature of records refers to all these interrelationships, and archival
practices are designed to collectively document, capture, and exploit them. These
practices recognize that the value of an individual record is derived in part from the
sequence of records within which it is located. They also recognize that it can be difficult
to understand an individual record without understanding its historical, legal, proce-
dural, and documentary aspect (Gilliland-Swetland, 2000, pp.16,18).

Maintaining contextual information about digital objects, a concept deriving
from archival practice, is acknowledged as having significant value when we
manage digital materials in non-archival contexts. The requirement to trace the
process of creation and change is doubly significant for digital materials because,
as we have seen, they are altered in the process of preserving them. It is sig-
nificant because the authenticity of digital objects – whether they are what they
claim to be – and our ability to trust them as accurate depend on knowing how,
when and by whom they have been altered. In other words, we need to pre-
serve information about the context of the data (as has already been noted in
Chapter 4).
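Knowing how, when and by whom a digital object has been altered implies keeping a history of the actions applied to it. The following minimal sketch shows one way such a history might be recorded. The class names, field names and example values are hypothetical and are not drawn from any particular metadata standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEvent:
    """A single alteration: when it happened, who made it, and what was done."""
    when: datetime
    agent: str
    action: str


@dataclass
class ObjectHistory:
    """The accumulated record of changes made to one digital object."""
    identifier: str
    events: list[ChangeEvent] = field(default_factory=list)

    def record(self, agent: str, action: str) -> ChangeEvent:
        """Append an event stamped with the current time in UTC."""
        event = ChangeEvent(datetime.now(timezone.utc), agent, action)
        self.events.append(event)
        return event


history = ObjectHistory("report-001")
history.record("archivist A", "migrated from word-processing format to PDF")
```

Events are made immutable (`frozen=True`) in this sketch so that an entry, once recorded, is not silently edited afterwards.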
Three examples illustrate the importance of determining and keeping con-
textual information. For records (that is, records as evidence of transactions) to be
authentic, ‘knowledge about the business context and about the interrelation-
ship between records’ is required. We should be able to identify ‘why, when,
where, by whom and so on they were created (or received) and used’ as indica-
tors of the record’s trustworthiness. Put another way, ‘it should be possible to
position a record in the time and context in which it purports to originate’
(Hofman, 2002, pp.18-19). The value of a digital file that represents a single
page of a digitized book is limited unless the relationship between that file and
files representing other pages of the same book is known and maintained. The
value of data from a curated database (a database that is contributed to and
edited by experts in a field) is limited unless those data can be linked to infor-
mation about their provenance (where they came from), which indicates how
authoritative they might be, and about who edited them and the reasons for
their editing.

The OAIS Reference Model


The Open Archival Information System (OAIS) Reference Model brings
together many of the concepts discussed so far in this chapter and points to
others that are relevant to determining the attributes of digital materials that we
should attempt to maintain into the future. This model was developed to pro-
vide a common framework for describing and comparing architectures and
operations of digital archives. It has been adopted throughout the world as a
schema for the design of digital preservation systems. The digital preservation
community has adopted the concepts and terminology embedded within it. The
OAIS Reference Model is now firmly established as the key international
standard in digital preservation. (Lee (2010) provides a brief and informative
introduction to the OAIS Reference Model and its influence.)
The OAIS Reference Model was developed by the Consultative Committee
for Space Data Systems (based at NASA in the US) and, after input from other
communities, is now an international standard – ISO 14721:2003. The OAIS
Reference Model has been widely used as the basis for digital preservation
systems and research projects, including those of major libraries, such as the
British Library and the national libraries of Australia, France, New Zealand
and the Netherlands, and significant research projects, such as Planets (Lee,
2010, p.4028).
Part of the significance of the OAIS Reference Model lies in the establish-
ment of a common language for discussion of digital preservation; its Fore-
word notes that it ‘establishes a common framework of terms and concepts’ to
allow ‘existing and future archives to be more meaningfully compared and
contrasted’ and to promote standardization (Consultative Committee for Space
Data Systems, 2002, p.iii). For example, it defines long term preservation:
‘long enough to be concerned with the impact of changing technologies, in-
cluding support for new media and data formats, or with a changing user
community. Long Term may extend indefinitely’ (Consultative Committee for
Space Data Systems, 2002, p.1-1). It makes a clear distinction between simple
data storage and long-term preservation:

A major purpose of this reference model is to facilitate a much wider understanding
of what is required to preserve and access information for the Long Term. To avoid
confusion with simple ‘bit storage’ functions, the reference model defines an Open
Archival Information System (OAIS) which performs a long-term information preser-
vation and access function (Consultative Committee for Space Data Systems, 2002,
p.2-1).

The significance of the OAIS Reference Model also lies in its articulation of
the functional requirements of a digital archival system. It defines seven func-
tions:

– the Ingest function is ‘responsible for receiving information from producers
and preparing it for storage and management within the archive’
– the Archival Storage function ‘handles the storage, maintenance and
retrieval of the AIPs [Archival Information Packages] held by the archive’
– the Data Management function ‘coordinates the Descriptive Information
pertaining to the archive’s AIPs, in addition to system information used in
support of the archive’s function’
– the Administration function ‘manages the day-to-day operation of the ar-
chive’ and coordinates other functions
– the Access function ‘helps consumers to identify and obtain descriptions of
relevant information in the archive, and delivers information from the ar-
chive to consumers’
– the Preservation Planning function undertakes technology watch, develops
preservation plans and recommends changes
– the Common Services function provides services that any IT system needs
in order to function.

These seven functions identify the processes that need to be incorporated into a
system for preserving digital information.
The OAIS Reference Model has proved to be influential in the develop-
ment of digital preservation, one indication being the widespread adoption of
the OAIS Reference Model’s term ingest in digital preservation discussion. Of
greatest relevance to this chapter, however, is the OAIS concept of ‘information
package’, explained as

a conceptual container of two types of information called Content Information and
Preservation Description Information (PDI). The Content Information and PDI are
viewed as being encapsulated and identifiable by the Packaging Information. The result-
ing package is viewed as being discoverable by virtue of the Descriptive Information
(Consultative Committee for Space Data Systems, 2002, p.2-5).

In other words, a digital object consists of considerably more than just the con-
tent that we wish to preserve (the ‘information which is the original target of
preservation’). It also comprises information that tells us what we need to know
in order to preserve it (‘to ensure it is clearly identified, and to understand the
environment in which the Content Information was created’), information about
its attributes (such as file formats), and so on. What is this information? And
why is it an essential component of the digital materials ‘package’ in relation
to preservation? This information – metadata – is an essential part of the OAIS
Reference Model and is vital to all digital preservation activities. The OAIS
Reference Model uses the terms description information and representation
information to refer to metadata (that is, ‘structured information that describes,
explains, locates, or otherwise makes it easier to retrieve, use, or manage an
information resource’ (NISO, 2004, p.1)).
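The structure of an information package can be pictured as a record with four parts. The sketch below takes its field names from the OAIS terms quoted above. The use of Python dictionaries, the example keys and values, and the `missing_parts` helper are assumptions made for illustration and are not part of the standard.

```python
from dataclasses import dataclass


@dataclass
class InformationPackage:
    """A conceptual container named after the four OAIS terms quoted above."""
    content_information: bytes                   # the original target of preservation
    preservation_description_information: dict   # PDI: identification, creation context
    packaging_information: dict                  # binds the content and the PDI together
    descriptive_information: dict                # makes the package discoverable

    def missing_parts(self) -> list[str]:
        """Name any of the four parts that is empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical example: a package that has not yet been described for discovery.
package = InformationPackage(
    content_information=b"%PDF-1.4 ...",
    preservation_description_information={"provenance": "received from the creator"},
    packaging_information={"container": "tar"},
    descriptive_information={},
)
print(package.missing_parts())
```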

The role of metadata


Metadata is an essential component of digital preservation. The UNESCO
Guidelines reinforce the essential nature of metadata for digital preservation,
listing among their fundamental principles four relating to metadata:

17. Digital heritage materials must be uniquely identified, and described using
appropriate metadata for resource discovery, management and preservation.
18. Taking the right action later depends on adequate documentation. It is easier to
document the characteristics of digital resources close to their source than it is to
build that documentation later.
19. Preservation programmes should use standardised metadata schemas as they become
available, for interoperability between programmes.
20. The links between digital objects and their metadata must be securely maintained,
and the metadata must be preserved (UNESCO, 2003, p.22).

Metadata, ‘the backbone of digital curation’ (Higgins, 2006), allows digital
objects to be identified, located, accessed, understood and used. More specifi-
cally, in digital preservation what we need metadata to do is:

– ensure that we can always locate digital objects, regardless of where they
are stored
– describe digital objects clearly
– indicate the relationship between one digital object and other digital ob-
jects
– identify the technical characteristics of digital objects clearly
– indicate who has responsibility for managing and preserving digital objects
– describe how digital objects can legally be used
– describe the requirements for re-presenting digital objects
– record the history of digital objects
– document the authenticity of digital objects.

A reminder is timely at this point – digital preservation requires that the meta-
data associated with a digital object, as well as the digital object itself, is main-
tained over time and is readable in the future.
The kinds of metadata required to carry out the functions noted above are
generally categorized as:

– Descriptive metadata, which describes digital objects in ways that ensure
that they can be identified and located
– Structural metadata, which indicates the relationships between one digital
object and others
– Technical metadata, which records details of technical aspects of the crea-
tion, encoding and formatting of digital objects, which are required so that
these digital objects can be used
– Administrative metadata, which describes the processes applied to digital
objects over time and may include rights management metadata and other
specific kinds of metadata
– Preservation metadata (sometimes considered as a subset of administra-
tive metadata), which records the provenance of digital objects and the ac-
tions applied to them as they are managed over time in a digital archive.

Preservation metadata

We need to consider preservation metadata in more detail. It is defined as
‘information that supports and documents the process of digital preservation’
(Caplan, 2006, p.7). It should not be confused with descriptive metadata, such as
that produced by the application of schemas like Dublin Core and its many
derivatives, EAD (Encoded Archival Description), VRA Core (Visual Resources
Association Core), and MODS (Metadata Object Description Schema). These
have the primary aim of resource discovery, that is of enabling information
resources to be identified and linked to users’ requests, and are best considered
as ‘a set of signposts for digital surfers … there to guide people to resources’
(Wilson, 2003, p.33, citing Tom Baker). By comparison, preservation metadata
has a different aim, which is to support ‘the functions of maintaining the fixity,
viability, renderability, understandability, and/or authenticity of digital materi-
als’ (Caplan, 2006, p.7). It does this by storing ‘technical details on the format,
structure and use of the digital content, the history of all actions performed on
the resource including changes and decisions, the authenticity information
such as technical features or custody history, and the responsibilities and rights
information applicable to preservation actions’ (Preservation metadata, 2003).
As already noted, the OAIS Reference Model considers preservation
metadata as being of two kinds: content information, and preservation descrip-
tion information. Content information consists of ‘details about the technical
nature of the object which tells the system how to re-present the data as spe-
cific data types and formats’. Preservation description information is ‘other
information needed for long-term management and use of the object, including
identifiers and bibliographic details, information on ownership and rights,
provenance, history, context including relationships to other objects, and valida-
tion information’ (UNESCO, 2003, p.94). More specifically, preservation meta-
data:

– Identifies the material for which a preservation programme has responsibility
– Communicates what is needed to maintain and protect
– Communicates what is needed to re-present the intended object (or its defined es-
sential elements) to a user when needed, regardless of changes in storage and access
technologies
– Records the history and the effects of what happens to the object
– Documents the identity and integrity of the object as a basis for authenticity
– Allows a user and the preservation programme to understand the context of the object
in storage and in use (UNESCO, 2003, p.94).

Preservation metadata standards

Commonly encountered standards for preservation metadata include PREMIS
(Preservation Metadata: Implementation Strategies) and METS (Metadata
Encoding and Transmission Standard), although many others are also used.
PREMIS defines preservation metadata elements that apply to a wide range of
digital objects. It is used in conjunction with other metadata standards, often
with METS, a standard that specifies how metadata is encoded in XML. This
means that metadata can be expressed in a standardized way so that it can be
used on most hardware and software platforms. XML is a widely adopted non-
proprietary standard that is widely understood and well supported by open
software applications. Standardizing the way in which metadata is expressed
using XML allows it to be exchanged among and reused by different archives.
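
The way PREMIS elements can be carried inside a METS wrapper may be easier to picture with a small example. The following Python sketch is illustrative only: it builds a much simplified record with the standard library, it is not validated against either schema, and the function name, the identifier type ‘local’ and the choice of SHA-256 are assumptions made for the example. The PREMIS namespace shown is the one used by version 2 of the data dictionary; other versions use a different namespace.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xml-v2"  # PREMIS 2.x namespace (assumed version for this sketch)
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Ask ElementTree to use readable prefixes when the tree is serialized.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def q(namespace, tag):
    """Return a namespace-qualified tag name in ElementTree notation."""
    return f"{{{namespace}}}{tag}"


def premis_in_mets(identifier, content, format_name):
    """Wrap a minimal PREMIS object description in a METS techMD section.

    Hypothetical helper: a simplified, non-validated illustration.
    """
    mets = ET.Element(q(METS_NS, "mets"))
    amd_sec = ET.SubElement(mets, q(METS_NS, "amdSec"))
    tech_md = ET.SubElement(amd_sec, q(METS_NS, "techMD"), {"ID": "TECH1"})
    md_wrap = ET.SubElement(tech_md, q(METS_NS, "mdWrap"),
                            {"MDTYPE": "PREMIS:OBJECT"})
    xml_data = ET.SubElement(md_wrap, q(METS_NS, "xmlData"))

    obj = ET.SubElement(xml_data, q(PREMIS_NS, "object"),
                        {q(XSI_NS, "type"): "premis:file"})
    ident = ET.SubElement(obj, q(PREMIS_NS, "objectIdentifier"))
    ET.SubElement(ident, q(PREMIS_NS, "objectIdentifierType")).text = "local"
    ET.SubElement(ident, q(PREMIS_NS, "objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, q(PREMIS_NS, "objectCharacteristics"))
    ET.SubElement(chars, q(PREMIS_NS, "compositionLevel")).text = "0"
    fixity = ET.SubElement(chars, q(PREMIS_NS, "fixity"))
    ET.SubElement(fixity, q(PREMIS_NS, "messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q(PREMIS_NS, "messageDigest")).text = (
        hashlib.sha256(content).hexdigest())
    ET.SubElement(chars, q(PREMIS_NS, "size")).text = str(len(content))
    fmt = ET.SubElement(chars, q(PREMIS_NS, "format"))
    designation = ET.SubElement(fmt, q(PREMIS_NS, "formatDesignation"))
    ET.SubElement(designation, q(PREMIS_NS, "formatName")).text = format_name
    return mets
```

Calling `ET.tostring(premis_in_mets("obj-001", data, "text/plain"), encoding="unicode")` would give an XML document that another archive could parse with any XML tool, which is the point of expressing the metadata in a standardized way.
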
The development of preservation metadata standards is notable for its high
level of international collaboration. NEDLIB (Networked European Deposit
Library) published a metadata scheme in 2000, which was intended specifi-
cally to manage the preservation of digital materials received through national
deposit. Cedars (CURL Exemplars in Digital Archives) quickly developed
the NEDLIB metadata scheme further to cover administrative, technical and
legal information (Lupovici, 2001, pp.6-7). These and other schemes were
analyzed by a Working Group on Preservation Metadata convened by OCLC
and the Research Libraries Group (RLG), with representatives from the RLG’s
international membership. The Working Group released a recommended preser-
vation metadata framework in 2002 (OCLC/RLG Working Group on Preservation
Metadata, 2002), which was based on the OAIS Reference Model recommen-
dations. Following its release, OCLC and RLG convened the PREMIS Working
Group to focus on how preservation metadata could be implemented in digital
preservation systems. In 2004 PREMIS released a report on metadata practices
and other digital preservation activities from respondents in 13 countries
(OCLC/RLG PREMIS Working Group, 2004). Among its outcomes was an
implementable set of core preservation metadata elements and a data diction-
ary, available in 2005. (Caplan (2006) provides an excellent introduction to
preservation metadata which gives more on the history of its development; her
writing about PREMIS specifically (Caplan, 2009) is also recommended.)
Preservation metadata schemas continue to evolve. There is still relatively
little experience of the effective application of preservation metadata. The
NSF-DELOS Working Group on Digital Archiving and Preservation noted
that there had been ‘limited evaluation of the effectiveness or cost of metadata
for managing digital entities over time’ and suggested that research is needed,
for example to demonstrate its value for specific purposes and to determine how
much metadata is needed. Tools are needed to ‘aid in the creation, authoring and
management of metadata’, such as those which automate, or partly automate, the
creation of metadata. (One example is the National Library of New Zealand’s
metadata extraction tool.) Also required are tools that manage metadata schemes
so that they are useful over time, such as those ‘to track the provenance of
metadata schema, for version control, and to allow users to navigate from cur-
rent metadata schema and ontologies to those used when the digital entity
was created’. The value of metadata could be assessed relative to the costs of
‘extracting, creating and managing’ it, to provide a better understanding of the
‘minimum amount of metadata necessary for digital preservation’ (NSF-DELOS
Working Group on Digital Archiving and Preservation, 2003, p.19).

Persistent identifiers
Persistent identifiers are a form of metadata applied to digital materials. A per-
sistent identifier is ‘a name for a resource which will remain the same regardless
of where the resource is located’ (National Library of Australia, 2002). If the
material (or resource) is moved, the persistent identifier provides access to it in
its new location, provided that the persistent identifier is ‘maintained with the
correct current associated location when the resource was moved’ (Persistent
identifiers, 2002). Persistent identifiers can be considered as an element of preser-
vation metadata, in the subset of preservation description information already
noted above.
The use of persistent identifiers is usually explained with respect to web-
based digital materials. The normal way of identifying web materials is to use a
URL (Uniform Resource Locator). However, a URL points only to one location
of the material on the web; if the location of that material changes (for example,
the material is moved from one domain name to another), the URL is altered and
the material becomes inaccessible if the superseded URL is used. The persis-
tence of URLs has been investigated in many studies and their tendency to change
is widely recognized as an impediment to digital continuity. A recent example is
Rhodes’ report (2011) on a study which found that link rot was 8.3 per cent in the
first year, 14.3 per cent in the second year, 27.9 per cent in year three and 30.4
per cent in the fourth year. Persistent identifiers have been developed as a re-
sponse to this tendency. They are needed not only for web material but also for
data of all kinds. Unambiguous and reliable identification of digital materials is
essential for reliable long-term access and for ensuring their authenticity. It is
also required to support the linking of data, through which digital objects are
connected to other data or digital objects automatically and transparently.
Persistent identifiers are not new; formal schemes for identifying objects,
such as ISBN (International Standard Book Numbers), ISSN (International
Standard Serial Numbers) and ISMN (International Standard Music Numbers),
were developed and applied for many years in the pre-digital context. Various
kinds of persistent identifiers for digital materials have been developed and
used. The Uniform Resource Name (URN) is ‘a standard, persistent and unique
identifier for digital resources on the Internet’ (Persistent identifiers, 2002).
URNs rely on a resolver service to link a URN to the location (URL) of the
web site on which the digital resource is hosted. A Persistent Uniform Resource
Locator (PURL) is similar to a URL, but links to the resolver service rather
than to the actual location of the digital material on the web. An example of a
PURL is http://purl.org.net/weibel/, where http:// is the protocol, purl.org.net/
is the resolver address, and weibel/ is the name. (This hypothetical example is
one of many given in answers to frequently asked questions about PURL
(purl.oclc.org/docs/purl_faq.html#toc1.4).) The PURL service was developed
and is maintained by OCLC.
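
The role of the resolver can be illustrated with a toy example. The Python sketch below is not how the PURL service or any other resolver is implemented; the `Resolver` class, its method names and the example.org and example.net locations are all hypothetical. It shows only the principle: the persistent name stays fixed, and only the resolver's record of the current location changes when the material is moved.

```python
from urllib.parse import urlsplit


class Resolver:
    """Toy resolver mapping a persistent name to a resource's current URL."""

    def __init__(self, address):
        self.address = address   # resolver address, e.g. "purl.org.net"
        self._locations = {}     # persistent name -> current URL

    def register(self, name, url):
        """Record the current location and return the persistent URL."""
        self._locations[name] = url
        return f"http://{self.address}/{name}"

    def move(self, name, new_url):
        """Update the stored location; the persistent URL is unchanged."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, persistent_url):
        """Split the persistent URL into its parts and look up the name."""
        parts = urlsplit(persistent_url)
        if parts.netloc != self.address:
            raise ValueError("URL does not belong to this resolver")
        name = parts.path.lstrip("/")
        return self._locations[name]
```

Persistence here depends entirely on `move` being called when the material changes location. If the record held by the resolver is not maintained, the persistent identifier fails in the same way as an ordinary URL.
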
The Digital Object Identifier (DOI) system arose from an initiative of the
Association of American Publishers to ‘assist the publishing community with
electronic commerce and copyright management of digital objects published on
the Internet’ and allow ‘the allocation of a unique digital identifier to commer-
cial digital publications’ (Persistent identifiers, 2002). The system ‘provides
for the unique identification, persistence, resolution, metadata, and semantic
interoperability of content entities (“objects”)’ (Paskin, 2010, p.1587) and
has become an international standard (ISO 26324). The rules and opera-
tion of the DOI system are defined by the International DOI Foundation
(www.doi.org), which operates through seven Registration Agencies, covering
different types of digital material. These agencies register DOI names and pro-
vide the necessary infrastructure for the allocation and maintenance of DOIs.
Other persistent identifier systems have been developed, such as the Handle
System (www.handle.net), which is the resolution component of the DOI
system, and the ARK (Archival Resource Key). Tonkin (2008) provides additio-
nal information about the range of persistent identifiers available. Research
into and development of persistent identifiers continues, as the PersID initiative
(www.persid.org) illustrates.

Authenticity
As noted in Chapter 1, definitions of terms taken from the pre-digital preserva-
tion paradigm are not always appropriate when applied to digital preservation.
Discussions about digital preservation are further confused by the difference in
terminology among librarians, recordkeepers and other interested groups:
‘Terms like provenance, archiving, context, records, etc. are used with slightly
different meanings … any discussion about preservation is challenged by
confusion of terminology’ (Hofman, 2002, p.15). The term authenticity is one
of these, so we need to establish what it means in the context of digital preser-
vation.

Authenticity is defined as:

– the ‘quality of genuineness and trustworthiness of some digital materials,
as being what they purport to be, either as an original object or as a reliable
copy derived by fully documented processes from an original’ (UNESCO,
2003, p.157)
– referring to ‘the degree of confidence a user can have that the object is the
same as that expected based on a prior reference or that it is what it pur-
ports to be’ (Authenticity, 1999).

Terms related to authenticity are:

– integrity, which for digital material is ‘the state of being whole, uncorrupted
and free of unauthorised and undocumented changes’ (UNESCO, 2003,
p.158)
– significant properties – ‘the elements, characteristics and attributes of a
given digital object that must be preserved in order to re-present its essential
meaning or purpose’ (UNESCO, 2003, pp.157-158); and
– identity – the attributes of a digital object that uniquely characterize and
distinguish it from other digital objects (Duranti, 2003, p.2).

The UNESCO Guidelines get to the heart of the matter, stating succinctly that
‘Authenticity derives from being able to trust both the identity of an object
– that it is what it says it is, and has not been confused with some other object –
and the integrity of the object – that it has not been changed in ways that
change its meaning’ and that ‘evaluating, maintaining and providing evidence
of continued authenticity are key responsibilities for most preservation pro-
grammes’ (UNESCO, 2003, p.108-109).
Authenticity is valued in a number of different contexts for a variety of
reasons. It is critical where digital materials are used as evidence. For records,
the authenticity of the record is paramount; all who use it need to be able to
trust that it is what it purports to be. Many kinds of enterprises, whether com-
mercial or nonprofit, government or non-government, are legally obliged to
keep records for specified periods to demonstrate accountability and for other
reasons such as continuity of operations and organizational memory; their
records must be authentic if they are to support these purposes. Scholars and
researchers rely on references to the materials they cite being stable over time.
For legal purposes some material must be able to meet the evidential require-
ments of a jurisdiction. The list goes on: heritage materials are valued because
they are authentic; for scientific data, ‘trust in their ongoing authenticity is criti-
cal, for without it they are of virtually no value’ (UNESCO, 2003, p.108).
A high value has been placed on authenticity for a very long time. The
government of ancient Athens considered that a key function of a library
should be ‘to serve as a repository of trustworthy copies’, at least of the works
of Aeschylus, Sophocles, and Euripides, whose plays were considered to be
superior to those of their successors and were revived repeatedly to become the
dramatic mainstay. However, according to Casson (2002), ‘the actors in them
took liberties with the text’ to the extent that Lycurgus, who held high office in
Athens from 338 to 325 BC, directed that ‘Written versions of the tragedies of
[Aeschylus, Sophocles, and Euripides] are to be preserved in the records office,
and the city clerk is to read them, for purposes of comparison, to the actors
playing the roles, and they are not to depart from them’. As Casson (2002,
pp.29-30) states, ‘an authoritative version of each play was to be kept on file,
and the actors were to follow it, under penalty of law’.
Authenticity of non-digital records has traditionally been assured by ‘the need
for creators to rely upon their own active records, the fixity of these records, a
documented unbroken chain of custody from the creators to the archivists, and
the description of the archival record within a finding aid’ (Gilliland-Swetland,
2005). Authenticity of non-digital materials was determined by considering
provenance, integrity and context. Artifacts are of value because their original-
ity, fidelity, fixity, and stability are preserved (Task Force on the Artifact in
Library Collections, 2001, pp.10-11). For museum objects, provenance (‘the
chain of ownership and context of use of an object’ ((Significance), 2001,
p.37)) is especially important. An artwork that has a doubtful provenance is
diminished in value (usually financial); material from an archaeological site
whose provenance has not been recorded is similarly reduced in its reliability
as evidence of past societies, and, therefore, in the uses to which it can be put.
Integrity – an object’s ‘condition, intactness and integrity’ – provides an object
with authenticity in that it is complete and in original condition ((Significance),
2001, p.43). These determinations of authenticity and integrity are very closely
linked to context. We need to be sure that the context in which the object was
produced, discovered, owned and stored is recorded. But this is more difficult
to ensure for digital materials, and their authenticity, therefore, is harder to es-
tablish and maintain.
Why is establishing and maintaining the authenticity of digital materials
different from traditional materials? Why is it of such concern to those involved
in digital preservation? As noted at the start of this chapter, preserving digital
materials inevitably means altering them, because the processes and techniques
we apply require changes to be made. Furthermore, changes to digital materials
are easy to make. In other words, the threats to authenticity of digital materials
are intrinsic to the preservation processes applied to digital materials. These
threats (UNESCO, 2003, p.109) can be summarized as being of three kinds:

1. multiple versions of a digital object resulting from ‘confusion in identifying
data, changes to identifiers, or failure to document the relationships between
different versions or copies’
2. changes caused by preservation processes, for example, the common practice
of migrating information from one system or carrier to another may result in
changes, as may adding metadata, creating new copies and other processes
3. changes to the content of the material.

In addition, as noted in Chapters 2 and 3, there are threats that apply to all digi-
tal data – what the UNESCO Guidelines call ‘the ongoing integrity of data’ –
and they affect authenticity. Among these threats are the breakdown of carriers;
malicious acts, such as attacks by hackers or viruses; terrorist attacks, war and civil
unrest (especially as they compromise power supplies and the integrity of build-
ings); accidental acts by staff; fires, floods and other natural disasters; and
business failure (UNESCO, 2003, p.109).
Because authenticity is an attribute that is so highly valued, digital preserva-
tion programmes need to take appropriate steps to ensure that it is not compro-
mised during the processes of managing the materials in their custody. The
strategies applied to ensure the authenticity of digital materials include assigning
unique identifiers, applying encapsulation techniques (packaging together the
digital object, its metadata, and other associated data), digital watermarking,
using digital signatures, encryption, digital time stamping, maintaining audit
trails, controlling custody, and establishing trusted repositories (noted later in
this chapter).
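
Two of these strategies, recording a checksum at ingest and maintaining an audit trail, can be sketched in a few lines of Python. The class and field names below are hypothetical, and SHA-256 is an assumed choice of algorithm. A checksum of this kind shows only whether the bit-stream has changed since the digest was recorded; it does not establish who created the object, which is the role of digital signatures and controlled custody.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256(data):
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class PreservedObject:
    """A digital object packaged with its digest and an audit trail."""
    identifier: str
    content: bytes
    recorded_digest: str = ""
    audit_trail: list = field(default_factory=list)

    def __post_init__(self):
        # At ingest, record the digest against which later checks are made.
        if not self.recorded_digest:
            self.recorded_digest = sha256(self.content)
            self._log("ingest", "initial digest recorded")

    def _log(self, action, note):
        self.audit_trail.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "note": note,
        })

    def verify(self):
        """Fixity check: has the bit-stream changed since the last record?"""
        ok = sha256(self.content) == self.recorded_digest
        self._log("fixity check", "passed" if ok else "FAILED")
        return ok

    def apply_documented_change(self, new_content, note):
        """An authorized change (for example a migration) updates the
        digest and is written to the audit trail."""
        self.content = new_content
        self.recorded_digest = sha256(new_content)
        self._log("change", note)
```

The distinction the sketch draws is the one that matters for integrity: a change made through `apply_documented_change` is recorded and the object continues to verify, whereas an undocumented alteration to `content` causes the next fixity check to fail.
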

Significant properties

Ensuring authenticity does not demand that materials are kept in their original
form without change; as already noted, it is impossible to keep digital materials
in their original form. To attempt to do so is incompatible with digital preser-
vation processes, which often result in the original bit-stream being altered.
We need to identify the attributes of digital materials that must be preserved to
ensure authenticity, and also those attributes that we do not need to preserve.
The attributes that must be preserved are variously described as essence, essential
elements, and significant properties, which has become the term most com-
monly encountered.
The term significant properties is usually defined as ‘the characteristics of
digital objects that must be preserved over time in order to ensure the continued
accessibility, usability, and meaning of the objects, and their capacity to be ac-
cepted as evidence of what they purport to record’ (Grace, Knight and Montague,
2009, p.3). It was first defined and used by the Cedars Project, which ran from
1998 to 2002. Other research projects expanded on the work of Cedars, including
research at the National Archives of Australia (Heslop, Davis and Wilson, 2002)
and the InSPECT Project based at King’s College London and The National
Archives (Grace, Knight and Montague, 2009). Knight and Pennock (2009)
describe research projects that have contributed to our understanding of sig-
nificant properties.
Hedstrom and Lee (2002, p.218) defined significant properties as ‘those
properties of digital objects that affect their quality, usability, rendering, and
behaviour’. Microfilming of paper documents preserves the significant proper-
ties of most of these documents, that is, their information content. But what is
the equivalent for digital materials, which are considerably more complex?
Hedstrom and Lee attempted to provide an empirical model for assisting deci-
sion-making about which significant properties of digital materials need to be
preserved. The National Library of Australia’s Digital Preservation Issues Group
also investigated significant properties, defining them in relation to four cate-
gories of desired functionality of digital objects that it wished to preserve: 1)
full dynamic/interactive functionality; 2) look, sound and feel; 3) intellectual
content; and 4) description of what was (for example, metadata for an object
that no longer exists). Both the National Library of Australia’s experience and
that of Hedstrom and Lee suggest that further research is required to ascertain
significant properties for specific types of digital materials in relation to their
use by particular communities (Hedstrom and Lee, 2002, p.223). Some further
research has occurred, most notably in the InSPECT Project, but there is much
more to be done in defining the significant properties of digital formats.
The term ‘essence’ of digital materials is used by the National Archives of
Australia to refer to the essential characteristics or properties of a record
(Heslop, Davis and Wilson, 2002, p.13). An Australian digital preservation
specialist interviewed in 2004 noted the National Archives of Australia’s view
of essence and the related term performance:

issues like look and feel … are significantly less important, because we can demonstrate
to you that if we show you the same Word document on a different operating system, it
can actually look different. If we show it on a different machine which has different
fonts installed, it will look different. If I alter the page setup, or even on different
machines with the same page setup, you’ll get a different pagination, and if you’ve got
an automatic footer with a page number in it, it will automatically change that pagination
for you, and it won’t be apparent to you that the pagination has changed. So a lot of that
we have actually defined as being ephemeral or circumstantial aspects of the performance
of the record in a particular situation … is not particularly relevant.

The idea of access to digital materials as analogous to a performance is also
well established and has been widely adopted. Thibodeau (2000, p.1) noted
that:

It is necessary to recognize that, strictly speaking, it is not possible to preserve an elec-
tronic record. It is only possible to preserve the ability to reproduce an electronic record.
It is always necessary to retrieve from storage the binary digits that make up the record
and process them through some software for delivery or presentation. Analogously, a
musical score does not actually store music. It stores a symbolic notation which, when
processed by a musician on a suitable instrument, can produce music. Presuming the
process is the right process and it is executed correctly, it is the output of such processing
that is the record, not the stored bits that are subject to processing.

The concept is simple: the application of the same software and hardware to
the same digital materials should create a ‘presentation or performance’ that is
the same every time. For preserved digital materials, their essential elements
are presented during a performance sometime in the future: ‘copying data from
carrier to carrier, and providing the right tools to recreate the intended perform-
ance will preserve continuity of access to most digital objects’. This apparently
simple model is, however, rather more complex in practice:

it may be hard to define the performance that must be re-presented; it is usually diffi-
cult to work out what tools are needed once the original ones have been lost; the tools
themselves typically rely on other tools that also may have been superseded; and it may
be difficult to find tools that will create the required performance in a reliable, cost-
effective and timely way, especially in the context of many thousands, millions or more
of digital objects. Despite such underlying complexities, the performance model helps
in recognising what digital preservation programmes must aim for: the best means of
re-presenting what users need to access (UNESCO, 2003, p.35).

What this boils down to is that we need to understand the characteristics em-
bodied in the materials we are preserving and to determine the minimum set of
these characteristics that must be maintained in order to recreate the materials
in the future. As well as needing to preserve the physical object (the physical
form that carries the bit-stream) it is necessary to preserve the logical object
(the computer-readable code) and the conceptual object (‘the performance pre-
sented to a user’ (UNESCO, 2003, p.35)) that is meaningful to the human user.
However, we also need to envisage the digital object as ‘bundles of essential
elements that embody the message, purpose, or features for which the material
was chosen for preservation’ (UNESCO, 2003, p.35). Not all of the elements
that make up a digital object are equally important in recreating the conceptual
object.
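
One way of picturing this is as a comparison made after a preservation action such as migration: only the properties judged significant need to survive, while others may change. The Python sketch below assumes, purely for illustration, that the properties of each version have already been extracted into a dictionary; the function name and the property names are hypothetical.

```python
def lost_properties(original, migrated, significant):
    """Return the significant properties whose values differ between two
    versions of an object, as {name: (original value, migrated value)}.

    Properties not named in `significant` are ignored, however much
    they have changed.
    """
    return {
        name: (original.get(name), migrated.get(name))
        for name in sorted(significant)
        if original.get(name) != migrated.get(name)
    }


# Hypothetical property sets for a word-processing document before and
# after migration to another format.
before = {"text": "Minutes of the meeting", "author": "A. Smith",
          "page_count": 3, "font": "Times New Roman"}
after = {"text": "Minutes of the meeting", "author": "A. Smith",
         "page_count": 4, "font": "Liberation Serif"}

# Pagination and font are treated here as circumstantial, not significant.
significant = {"text", "author"}
```

With these values `lost_properties(before, after, significant)` returns an empty dictionary, so the migration would be judged acceptable even though the pagination and font differ. Adding `page_count` to the significant set would cause the same migration to be reported as a loss.
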
It is also necessary to understand the needs of the community of users for
whom digital materials are being preserved. The significant properties concept
can also be understood as the properties of digital materials that are significant
to the stakeholder; that is, it is the stakeholder who assigns significance, rather
than the technical characteristics of the digital object that determine it (del Pozo,
Long and Pearson, 2010, p.293). ‘Community’ can be as closely defined as a
specific organization or discipline group, or as widely defined as ‘the general
public’. Community needs determine the kind of material selected for preserva-
tion and the level of authenticity required for the material selected. (As already
noted, the OAIS Reference Model articulates this as Designated Community –
‘An identified group of potential Consumers who should be able to understand
a particular set of information. The Designated Community may be composed
of multiple user communities’ (Consultative Committee for Space Data Systems,
2002, p.1-10).) ‘Essence’, ‘significant properties’, or ‘essential elements’ of the
materials selected are defined in relation to the community’s requirements. For
example, if the community is one where value is placed on the authenticity of
records of transactions (that is, their evidential value is high), then considerable
attention must be paid to maintaining the integrity of those records by ensuring
that any alterations made to them are carried out only by authorized personnel
and are appropriately documented, or that the records are preserved in an un-
alterable (read-only) form. However, some communities will not require that
authenticity be proved to this extent. ‘Ultimately, preservation programmes
must decide how much to invest in ensuring that the authenticity of material in
their care can be trusted, bearing in mind that object identity and data integrity
are fundamental responsibilities’ (UNESCO, 2003, p.110).
The UNESCO Guidelines suggest some questions to be posed in deciding
on significant properties (‘essential elements’ as they refer to them) to preserve:

– For whom should this material be kept? Do they have specific expectations about
what they will be able to do with the material when it is re-presented?
– Why are the materials worth keeping? What gives them the value that warrants the
trouble of preserving them? Is that value associated with evidence, information, ar-
tistic or aesthetic factors, significant innovation, historic or cultural association,
what a user can make the material do or do with the material, culturally significant
characteristics?
– Is the value tied to the way the material looks? (Would it be lost or significantly de-
graded if the material looked different?)
– Is the value tied to the way the object works? (Would it be lost if particular func-
tions were removed? Or if particular functions happened at a different speed or re-
quired different keystrokes?)
– Is the value tied to the context of the material? (Would it be lost if links embedded
in the material did not work? Or if a user could no longer see evidence that con-
nected the material with its original context?)
– Is it possible to distinguish between elements within each of these areas? For exam-
ple, would advertising banners be considered an essential part of the way the mate-
rial looked? Would some navigation elements or display functions be needed but
not others?
– If it is difficult to define what needs to be maintained, it may be easier to consider
the impact of an element not being maintained, and to look for functions or elements
that are definitely not needed.

Figure 5.1: Deciding on Essential Elements (From UNESCO, 2003, p.75)

Further elaboration of the example of emails used earlier in this chapter illus-
trates some aspects of preserving authenticity and of significant properties.
Authenticity in this case ensures the trustworthiness of an email as a record of
a transaction. An authentic record is ‘one that can be proven a) to be what it
purports to be, b) to have been created or sent by the person purported to have
created or sent it, and c) to have been created or sent at the time purported’
(Millar, 2004, p.4). For an email to be considered an authentic record, it must be
demonstrated that it has not been changed or corrupted and that it was created
by a particular person. This is achieved by ‘describing and preserving the
original context of the records and by maintaining a chain of unbroken custody’
(Digital Preservation Testbed, 2003, p.15). Changes are acceptable as long as
they do not alter the original meaning of the email. Emails can be considered
as consisting of five characteristics:

– context (‘the environment in which the digital record is made’)
– content (‘the body of the record, regardless of structure, colour, position or
font’)
– structure (‘the structure as it was originally made and reproduced on the
screen’)
– appearance (‘the final presentation, what the records looks like when it
appears on the screen’)
– behaviour (‘the interactive characteristics of a records; that which enables
us to manipulate and use the record so that new and extra content is dis-
played’) (Digital Preservation Testbed, 2003, pp.14-16).

For emails, it could be decided that it is only the content information that users
require – ‘the name and address of the sender, subject, date and time, recipients,
and the message, in a standardised structure with only the most simple of for-
matting’ (UNESCO, 2003, p.74).
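As an illustration only, the content elements named above can be pulled out of
a raw message with the email package in Python's standard library. The sample
message, the keys of the resulting dictionary and the function name
extract_content are assumptions made for this sketch; they are not drawn from
the UNESCO Guidelines or from the Digital Preservation Testbed.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, invented for this illustration.
RAW_EMAIL = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.org>, carol@example.org
Subject: Minutes of the preservation meeting
Date: Mon, 03 Mar 2003 10:15:00 +0000
Content-Type: text/plain; charset="utf-8"

The minutes are attached below.
"""


def extract_content(raw):
    """Keep only the content elements of a simple, single-part email."""
    msg = message_from_string(raw)
    sender = getaddresses([msg.get("From", "")])
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    date = msg.get("Date")
    return {
        # (display name, address) pairs, as returned by getaddresses
        "sender": sender[0] if sender else ("", ""),
        "recipients": recipients,
        "subject": msg.get("Subject", ""),
        # ISO 8601 string, or None when the Date header is missing
        "sent": parsedate_to_datetime(date).isoformat() if date else None,
        # Plain body text; formatting and attachments are not handled here
        "message": msg.get_payload().strip(),
    }


record = extract_content(RAW_EMAIL)
print(record["subject"], record["sent"])
```

A sketch of this kind deliberately discards structure, appearance and
behaviour, which is acceptable only if it has been decided that content alone
is the significant property.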
More research into authenticity and its requirements in the digital world is
required, although, as the authors of a CLIR publication (Authenticity in a digital
environment, 2000) emphasize, establishing just what the requirements are is
no easy task. The NSF-DELOS Working Group on Digital Archiving and
Preservation suggested in 2003 that the digital preservation research agenda
should include research into tools that allow future users of digital materials to
determine whether they are authentic (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.vii). Research has also been done by the
InSPECT Project (Grace, Knight and Montague, 2009), but we still need defini-
tions of significant properties for more types of digital materials and a clearer
understanding of how these significant properties affect use and access of digital
materials, and knowledge of how much change is acceptable before the authen-
ticity of digital materials is compromised, among other things.

Research into authenticity


Several research projects have been influential in determining requirements for
digital materials to be considered as authentic. Although these projects originate
in the recordkeeping professions and, consequently, are principally concerned
with electronic records, their conclusions have proved to be applicable more
generally and are relevant to other kinds of digital materials.

Functional Requirements for Evidence in Recordkeeping Project (Pittsburgh)

One of the ironies of digital preservation is that the web site on which the re-
ports of the Functional Requirements for Evidence in Recordkeeping Project,
administered by the University of Pittsburgh, were to be found was an early
victim of the instability of digital materials. The web site hosting the working
files of this project was deleted. Some of the site was subsequently reconstructed
using files captured and preserved by the Internet Archive
(www.archimuse.com/papers/nhprc). The project, commonly referred to as the Pittsburgh
Project, investigated during the early to mid 1990s the functional requirements
necessary for electronic recordkeeping systems to meet the needs of archivists
better. Its focus soon changed from system requirements to what is required for
electronic records to be considered as evidence. It developed a definition of what
constituted an electronic record that highlighted the importance of metadata
(Heazlewood, 2000, p.176).

InterPARES

The InterPARES (International Research on Permanent Authentic Records in
Electronic Systems) project (www.interpares.org) was based on an earlier project
headed by Luciana Duranti at the University of British Columbia in the mid
1990s, which sought to establish the systems requirements for electronic record-
keeping. Its first phase, now referred to as InterPARES 1, ran from 1999 to 2001.
Gilliland-Swetland (2002) provided a detailed description of InterPARES 1
and its outcomes. InterPARES 1 focused on the preservation of records that
are no longer current and was mainly concerned with records generated in data-
bases and document management systems. It asked ‘What is required to prove
the authenticity of electronic records?’ and analyzed case studies of electronic
records systems to develop statements of requirements for creating and main-
taining stable electronic records and for reproducing copies of preserved elec-
tronic records (US-InterPARES Project, 2002, p.5). Recommendations about
authenticity included:

– Records should be created and maintained in a trusted recordkeeping system, and
preserved via a trusted custodian, who is able to maintain them for the long term
without alteration.
– The custodian of records should assess the authenticity of the records before transfer
to the preserver ... The assessment of authenticity needs to … establish the record’s
identity (the distinguishing characteristics of the record) and its integrity (the re-
cord’s wholeness and soundness) [and] establish evidence that the records have not
been inappropriately altered ...
– Records creators … should [meet] … the following Benchmark Requirements in the
creation, handling, and maintenance of active records:
Maintain expression of record attributes (especially those relating to identity and
integrity), and evidence of the following:
– Effectively implemented, creator-defined access privileges.
– Protective procedures to prevent loss or corruption of records procedures and to
prevent media and technology deterioration.
– Documentary forms of records (according to requirements of juridical systems
or of creators).
– Specific rules for authentication of records (i.e., which records are authenti-
cated, by whom, and how).
– Procedures that identify an authoritative record (when multiple copies exist).
– Procedures for removal and transfer of relevant documentation (involving re-
moval of records from the electronic system) ...
– The trusted custodian should be able to attest to the authenticity of copies of inactive
or preserved electronic records by meeting the following Baseline Requirements:
– Maintain controls over records transfer, maintenance, and reproduction.
– Retain documentation of reproduction processes.
– Capture, as part of archival description, any changes the records have under-
gone since they were first created (US-InterPARES Project, 2002, pp.7-8).

Other recommendations relating to preservation stressed the importance of
metadata, for instance, the need to ‘maintain information about the original
form of the record and the methods needed to translate between the stored
digital components and the copy of the record presented for use’. Thorough
documentation of the entire process of preservation was also recommended
(US-InterPARES Project, 2002, pp.9-10).
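One way to picture the recommendations on integrity and documentation is a
minimal sketch in which a custodian records a checksum at transfer and logs
every later reproduction. The use of SHA-256, the layout of the custody entry
and the function names are assumptions chosen for illustration; InterPARES does
not prescribe any particular mechanism.

```python
import hashlib
from datetime import datetime, timezone


def digest(data: bytes) -> str:
    """Return a SHA-256 fixity value for the stored bytes."""
    return hashlib.sha256(data).hexdigest()


def register(identifier: str, data: bytes) -> dict:
    """Create a hypothetical custody entry at the moment of transfer."""
    return {"identifier": identifier, "digest": digest(data), "history": []}


def is_unaltered(entry: dict, data: bytes) -> bool:
    """Integrity check: do the bytes still match the recorded digest?"""
    return digest(data) == entry["digest"]


def document_change(entry: dict, new_data: bytes, process: str) -> None:
    """Log an authorized change (such as a reproduction) and how it was made."""
    new_digest = digest(new_data)
    entry["history"].append({
        "when": datetime.now(timezone.utc).isoformat(),
        "process": process,
        "previous_digest": entry["digest"],
        "new_digest": new_digest,
    })
    # From now on the reproduction is the version whose integrity is checked.
    entry["digest"] = new_digest


original = b"Minutes of the meeting, 3 March 2003"
entry = register("record-0001", original)
print(is_unaltered(entry, original))
```

A checksum of this kind only shows that the bits are unchanged; the record's
identity and context still have to be established through description and
custody documentation.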
InterPARES 2 ran from 2002 to 2007. It addressed issues of authenticity,
reliability and accuracy of records, but differed from InterPARES 1 in that it
examined these characteristics of records throughout their life-cycle, and not
just after they had been determined to be worthy of long-term retention; it
focused on dynamic records, rather than those generated in databases and
document management systems.
InterPARES 3 began in 2007 and is planned to conclude in 2012. It builds
on the findings of the first two phases of InterPARES. Its aims are to ‘put theory
into practice, working with small and medium-sized archives and archival/
records units within organizations, and develop teaching modules for in-house
training programs, continuing education and academic curricula’ (InterPARES
Project, 1999-).
The InterPARES Project has produced a range of influential documents
and helpful advice that are well worth investigating further.

Trusted digital repositories


InterPARES 1 defined the role of a trusted custodian in ensuring that authentic
records remain available. The concept of trusted digital repositories, whose
roles and responsibilities were articulated by the influential 1996 Task Force on
Archiving of Digital Information, is similar. A number of ‘trusted organizations’
which have the capability of ‘storing, migrating and providing access’ to digital
materials are crucial for effective digital preservation, and a process of certifica-
tion is essential to ensure trust (Task Force on Archiving of Digital Information,
1996, p.37). Trusted digital repositories were identified in 2002 as, desirably,
‘national – and, increasingly, international – systems of digital repositories that
are ... responsible for the long-term access to the world’s ... heritage in digital
form’ (RLG/OCLC Working Group on Digital Archive Attributes, 2002, p.[i]).
To reach this level of operation the requirements had to be clearly articulated
and measures developed against which these repositories can be assessed.
Since 2002 the situation regarding trusted digital repositories has changed
significantly. The requirements have been clearly identified and disseminated
in the TRAC (Trustworthy Repositories Audit & Certification: Criteria and Checklist) document
(RLG-NARA Task Force on Digital Repository Certification, 2007). Criteria
and tools for auditing and certifying digital repositories are available. A
working group is developing an ISO standard for auditing and certification of
trusted digital repositories based on TRAC, expected to be introduced in the
middle of 2012 as ISO/DIS 16363, and work is under way on establishing crite-
ria for the bodies that will audit repositories (Center for Research Libraries,
2009). Since 2010 repositories have met the TRAC requirements and, accordingly,
have been certified as trusted. The experiences of Portico, the first repository to
be certified by the Center for Research Libraries as a trusted digital repository
using TRAC, are described by Amy Kirchhoff and her colleagues (2010) and
the full report of the audit is available (Center for Research Libraries, 2010).
Dale and Gore (2010) provide further information about the emergence and
development of trusted digital repositories.
A trusted digital repository is ‘one whose mission is to provide reliable,
long-term access to managed digital resources to its designated community, now
and in the future’ (RLG/OCLC Working Group on Digital Archive Attributes,
2002). In order to provide such access the repository must meet requirements
specified in the TRAC document. These include: the repository’s compliance
with the OAIS Reference Model; a structure that supports the long-term viability
of the repository as well as the digital information for which it has assumed
responsibility; demonstration of financial responsibility and sustainability;
systems that meet commonly accepted standards to ensure the ongoing management,
access and security of the digital materials accepted by the repository;
implementation of system evaluation methodologies; and clearly stated policies.
Because the creators, the owners and the users of digital materials need to
be able to trust digital repositories, certification is a vital aspect of their estab-
lishment and operation. Not only is certification required, but auditing is also
required to monitor ongoing trustworthiness. Checklists are available for self-
assessment of repositories, such as TRAC and nestor’s Catalogue of Criteria
for Trusted Digital Repositories (nestor Working Group, 2006). Tools to assist
auditors are also available. Two examples are DRAMBORA (Digital Repository
Audit Method Based on Risk Assessment; www.repositoryaudit.eu), an online
interactive toolkit for repository managers developed by the DCC and Digital
Preservation Europe, and PLATTER (Planning Tool for Trusted Electronic Repositories;
www.digitalpreservationeurope.eu/publications/reports/Repository_Planning_Checklist_and_Guidance.pdf).

Conclusion
Chapter 1 notes that to ensure that digital materials remain usable in the future,
access to them is required – and not simply access, but access to ‘all qualities of
authenticity, accuracy and functionality’ (Digital Preservation Coalition, 2008,
p.24). Authenticity of digital material needs to be defined in relation to users’
requirements, and the elements of the materials that meet these requirements
must also be clearly defined. It is also essential that we realize that some loss of
elements and functionality is inevitable. Consequently the levels of acceptable
loss must also be clearly described. It must, however, be acknowledged that
there are no absolute answers.
Once we have defined the levels of loss that future users of digital materials
are able to accept, we can focus on how to preserve the essential elements. The
next chapter examines strategies and techniques for digital preservation.
Chapter 6
Overview of Digital Preservation Strategies

Introduction
Maintaining access to digital resources over the
long-term involves interdependent strategies for
preservation in the short to medium term based on
safeguarding storage media, content and documen-
tation, and computer software and hardware; and
strategies for long-term preservation to address the
issues of software and hardware obsolescence
(Digital Preservation Coalition, 2008, p.103)

This chapter provides an overview of the range of principles, strategies and
practices that are currently in use for digital preservation. It also investigates
the requirements for approaches to digital preservation that will remain viable
and effective in the future. It attempts to put the approaches identified, both
those in current use and those that are emerging, into a useful context so that
they can be reflected on. Such reflection may point to clearer directions for the
future.
A distinction is made in this book between principles, strategies and prac-
tices. Principles are general ways of thinking, usually at the conceptual level;
for example, it is a principle to use standards where they exist. Strategies are
more concrete plans to achieve a particular long-term aim; an example of a
strategy is to limit the number of standard file formats that are to be maintained.
Practices are even more specific. A practice is what is needed to implement
and maintain a strategy, for example, the application of the necessary technology
to maintain the standard file formats selected. Practices encompass technolo-
gies – the machinery and equipment used in the practical application of scientific
knowledge. The word ‘technology’ is used in different ways throughout the
literature. For instance, ERPANET (the Electronic Resource Preservation and
Access Network) used the word technology inclusively to cover all hardware
and software as well as methods and procedures: ‘technology are all means
that serve the purpose of preserving digital objects for as long as it [sic] is
needed’ (ERPANET, 2003, p.2). The principles–strategies–practices distinc-
tion is not the only way that the range of digital preservation activities can be
characterized. Another possibility is to distinguish between policies and proce-
dures. Policies are defined courses of action. Procedures are specific sequences
of actions that allow the policies to be put into practice.
Other typologies have been applied to digital preservation principles, strategies
and practices. (Typology is used here in the sense of a systematic grouping
to help our understanding of things being studied by identifying attributes or
qualities among them that link them together.) These typologies range from
limited lists with a small number of categories, such as that of Lim, Ramaiah
and Pitt (2003), to more sophisticated versions, such as Rothenberg’s typology
(2003). Some of these are examined in more detail later in this chapter.
Why might reflecting on these typologies help us to address digital preser-
vation concerns? The UNESCO Guidelines suggested in 2003 that strategies
‘are still evolving’ and that ‘there is, as yet, no universally applicable and prac-
tical solution to the problem of technological obsolescence for digital materials’
(UNESCO, 2003, pp.118,120). Nearly ten years later, this statement still holds
true. Making a distinction between principles and strategies on the one hand
and practices on the other, instead of combining them all together, allows us to
think more precisely about the validity of different approaches. Practices are
very likely to change frequently as the technologies on which they depend
change, whereas principles and strategies are less likely to change regularly be-
cause they are less bound to the issues associated with technology obsolescence.
There are many ways of characterizing the range of strategies for long-
term sustainability and accessibility of digital objects. For example, we could
distinguish between passive approaches (such as improving digital storage
media and improving storage and handling practices) and active approaches
(such as refreshing, migration, emulation, encapsulation, normalization and
replication). Research and implementation work proceeds in all of these and
more besides. The contention of this chapter is that, because such a wide range
of approaches, strategies and practices are in use or under development,
determining their soundness and viability could guide us to more rapid develop-
ment of mechanisms to address the significant threat of loss of digital informa-
tion. Developing useful typologies will assist us to make these determinations.
The UNESCO Guidelines identify some strategies that are likely to be viable
in the long term: ‘the use of standards for data encoding, structuring and de-
scription’, emulation, and migration of data (UNESCO, 2003, pp.120-121).
The Guidelines also note some principles that lie behind current approaches
to preserving digital materials (UNESCO, 2003, section 17.11-12). Despite
guidance such as this, the list of possible strategies remains long and bewildering.
From this a smaller number of strategies that are most commonly applied is
beginning to emerge.
What does a list of possible strategies look like? It is likely to include these
at least:

– Analogue backups: output to permanent paper or microfilm
– Backwards compatibility
– Bit-stream copying
– Canonicalization: translating artifacts into standard or ‘canonical’ forms
– Data recovery
– Digital archaeology
– Mass storage systems
– Durable/persistent digital storage media
– Emulation
– Encapsulation
– Improving handling and storage of storage media
– Long-term formats
– Migration (format migration, normalization then migration, software mig-
ration, version migration)
– Normalization
– Persistent object preservation
– Policy development
– Refreshing (of data, of storage media)
– Replication (redundancy, keeping multiple copies)
– Reverse engineering of software
– Software repositories
– Standardizing data formats
– Standards
– Technology preservation
– Technology watch
– Universal Virtual Computer
– Viewers for obsolete formats
– Virtual machines
– XML.

This chapter first describes some of the history of the development of principles
and strategies for digital preservation. It then notes strategies that are being
applied. Finally, some existing typologies of principles, strategies and practices
are noted and a further typology is proposed.

Historical overview
Chapter 1 explored the need for a new preservation paradigm in the digital world.
It noted that the principles that lie behind preservation practice, and the preser-
vation techniques themselves, need to change. Where pre-digital preservation
paradigm practices are based on the preservation of artifacts, the new preserva-
tion paradigm requires different ways of thinking which are still not completely
clear. Some are accepted and understood: for example, that there is a need to
actively maintain digital information from the moment of its creation, and that
there is likely to be greater emphasis on collaboration and support from a wider
range of stakeholders.
It is inevitable that the development of these new ways of thinking should
have its beginnings in the pre-digital preservation paradigm and be heavily
reliant on strategies and practices based on keeping and copying physical arti-
facts. Consequently, early digital preservation efforts focused on the storage
media (such as developing more durable media) and on copying onto more
durable storage media (the printing of digital objects to paper is an extreme
example of this).
It is salutary to review the literature of digital preservation over the last
decade and to see how thinking has changed. An awareness of this historical
background helps us understand better the requirements of effective digital
preservation strategies and practices. Before about 2000 the list of possible
strategies and practices was short: a heavy emphasis on media preservation,
then migration, emulation, and technology preservation as constants, with men-
tion of refreshing, output to permanent paper or microfilm, and some short-lived
possibilities such as digital tablets. (A 2002 description of a digital tablet reads
like a description of the iPad of today, with the additional characteristics that it
can ‘withstand millennia of neglect under harsh conditions’ and have ‘a storage
capacity of dozens or hundreds of terabytes’ (Lee et al., 2002, p.98).) The view
from this period was depicted by Lee et al. (2002, p.94), who described a
primary emphasis on media refreshing (copying), initially onto paper and film,
but over time onto magnetic tape and optical media such as CDs and DVDs.
Data preservation techniques are also noted: technology preservation; emulation;
migration; encapsulation.
Since about 2000 other strategies and practices have been added. Migration,
emulation, and technology preservation remain as the primary strategies and
practices. The list expanded, however, to include digital archaeology, standard-
ized file formats, developing policy, keeping multiple copies, and developing
preservation metadata. Currently the list, having expanded still further, also
includes bit-stream copying, reliance on standards, viewers, replacing artifacts
by formal descriptions, the Universal Virtual Computer approach, mass storage
systems, software repositories, and technology watch. Also included is the ‘do
nothing’ strategy. It is now understood that combinations of strategies and
practices, rather than any one single approach, will be required.
Pre-digital paradigm concepts may also have led to the earlier focus of
digital preservation activities on technical problems, without sufficient effort
being directed to wider issues such as developing appropriate public policy.
A decade ago, research in digital preservation could be characterized along
similar lines. Early research focused on two areas: ‘preserving the bitstream’
and ‘extending the useful life of the entity on which information was recorded’
(NSF-DELOS Working Group on Digital Archiving and Preservation, 2003,
pp.10-11). Research into media attempted (and still attempts) to develop more
durable media. Research directed at preserving the bit-stream examined issues
such as media migration. Much of this research was concentrated on three
strategies: ‘normalization of digital content into a few common formats;
migration of data from obsolete to current computing platforms and applications;
and emulation of obsolete platforms on current computing platforms’
(NSF-DELOS Working Group on Digital Archiving and Preservation, 2003,
p.12). There was, perhaps, a mindset that willed us to develop the killer applica-
tion, a single technical solution (such as Rothenberg’s championing of emula-
tion, described below), although this way of thinking no longer prevails in the
literature of digital preservation.
In the intervening years there has been significant progress in many aspects
of digital preservation. Research focus is now shifting from technical concerns
to broader concerns focusing on sustainability of digital preservation initiatives.
Attendees at a 2011 workshop on shaping EU-funded research in digital preser-
vation proposed areas where research was needed. Research into technical
aspects of digital preservation, such as automation and simplification of preser-
vation processes, was required. Research into sustainability was also suggested:
‘developing well-expressed business models to support investment in digital
preservation’ was one proposed area of research; another was developing new
ways to educate computer scientists so that digital preservation becomes a core
part of their curricula (Workshop on the Future of the Past, 2011).
Two key realizations have allowed us to move towards a new preservation
paradigm and to develop new approaches, strategies and practices. One is the
concept of separating digital content from the technologies; the other is the
need to embrace a wider range of stakeholders in digital preservation processes
and to accommodate them in all-embracing policies and guidelines. These are
examined in the rest of this chapter and in the two following chapters.

Who is doing what?


One way of thinking about the applicability and long-term viability of specific
strategies and practices is to consider the ones that are actually being applied
and see if there are lessons to be learned from observing current practice. In
2005 the current state of practice could be characterized in this way: digital
preservation is a field in which, so far, there has been considerable invention
but relatively little application. More than five years later this characterization is
no longer valid. We have seen significant development in practice, especially
in the automation of digital preservation practices and the development of
software tools, and more widespread awareness of digital preservation issues
and implementation of digital preservation programmes.
In 1998 Hedstrom reported that methods being applied ‘fall far short of
what is required to preserve digital materials’. She noted that ‘all preservation
methods require trade-offs between what is desirable from the standpoint of
functionality, dependability, and cost and what is possible and affordable’ and
this could be observed in the methods being applied: printing to paper or micro-
film, with the loss of ‘retrieval and reuse potential’, and migration strategies,
which by normalizing to a standardized data format often result in loss of
information about document structures and relationships (Hedstrom, 1998,
p.195). One year later Hedstrom and Montgomery reported the results of a
survey of 54 Research Libraries Group members. Only 13 institutions had digital
preservation methods in place, the most common being ‘transfer accessions to
new media’ (25), refresh (18), migrate (17), ‘limit formats accessioned’ (16) and
‘standards for archival masters’ (7) (Hedstrom and Montgomery, 1999, pp.14-
15).
Five years later Walton observed that, although there was no consensus on
the best method or methods, four major national archives selected strategies
which emphasized ‘preserving objects over preserving technologies: conversion
to standard format (NA, NARA, NAA) or format migration (PRO)’ (Walton,
2003, p.7). Also in 2003, Bryan’s small survey of United States manuscript
repositories, which sought to ascertain their practice in managing preservation
of born-digital material, noted that of the nine respondents, four ‘print electronic
documents to paper and three migrate them to a server’ (Bryan, 2003). In the
same period the European Commission-funded ERPANET Project developed
case studies documenting the experiences of and approaches to digital preser-
vation in the pharmaceutical, publishing and telecommunications sectors.
These note some general strategies, for example, migration and reducing the
proliferation of formats. In the pharmaceutical industry, ‘size and proliferation
of formats are the main obstacles to the preservation of objects’. Migration to
new data formats was carried out when necessary, PDF (Portable Document Format)
had become the standard format for preservation purposes, and there was a
general belief that digital preservation solutions should come from outside the
industry. In the publishing industries ‘PDF proved a very popular format alongside
TIFF (Tagged Image File Format), XML (Extensible Markup Language),
and SGML (Standard Generalized Markup Language) for distribution and preservation.’
The telecommunications organizations relied on migration carried out
when their business software was updated (Ross, Greenan and McKinney, 2003).
Two more extensive surveys published in 2004 provide firmer data. A sur-
vey of digital preservation practice in 21 natural science and scientific
publishing organizations operating on an international level concluded that
‘migration remains the preservation strategy of choice; it is still too soon for
most archives to have undergone a significant technological change’ (Hodge
and Frangakis, 2004, p.2). The strategies used in this community were ‘Trans-
formation to a Preservation Format’ (for example, ASCII and XML), migration,
and ‘Migration On-Request [where] the original version of the material is re-
tained and when necessary, conversion tools are applied to convert the original
to the format required by the user’ – this had been tested but not applied (Hodge
and Frangakis, 2004, pp.38-40). Another international survey, reported in 2004,
of current digital preservation practice in national libraries, state libraries,
university and research libraries and consortia, archives, museums, and other
organizations in 13 countries was carried out under the auspices of OCLC and
the Research Libraries Group (OCLC/RLG PREMIS Working Group, 2004).
Of its 48 respondents, 92 per cent were implementing (or planned to imple-
ment) normalization, migration, or emulation and most of these indicated that
they applied more than one strategy. The most popular strategy was bit-level
preservation (refreshing), implemented by 85 per cent of respondents, followed
by restricting submissions and normalization (both of which limit the variety
of formats that have to be managed), migration, and migration-on-demand, in
that order. Decisions about digital preservation were being made on a short-
term basis because it was assumed that ‘there will be additional technical options
in the future’ (OCLC/RLG PREMIS Working Group, 2004, p.37). Finally, the
report noted that there was not a lot of operational experience in digital preser-
vation.
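The idea of migration on request, noted above as tested but not yet applied,
can be sketched in a few lines: the original bit-stream is never overwritten
and conversion happens only at the moment of delivery. The Archive class, the
format labels and the single text converter are hypothetical and are not taken
from the surveys cited.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical registry: (stored format, requested format) -> conversion tool.
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("latin-1-text", "utf-8-text"):
        lambda raw: raw.decode("latin-1").encode("utf-8"),
}


class Archive:
    """Keeps originals untouched and converts them only when requested."""

    def __init__(self) -> None:
        self._originals: Dict[str, Tuple[str, bytes]] = {}

    def ingest(self, identifier: str, fmt: str, data: bytes) -> None:
        # The original bytes and their format label are stored as received.
        self._originals[identifier] = (fmt, data)

    def deliver(self, identifier: str, wanted_fmt: str) -> bytes:
        fmt, data = self._originals[identifier]
        if fmt == wanted_fmt:
            return data  # no conversion needed
        convert = CONVERTERS.get((fmt, wanted_fmt))
        if convert is None:
            raise LookupError(f"no converter from {fmt} to {wanted_fmt}")
        # Conversion produces a copy for the user; the original is retained.
        return convert(data)


archive = Archive()
archive.ingest("doc-1", "latin-1-text", b"caf\xe9")
print(archive.deliver("doc-1", "utf-8-text"))
```

The design choice here is that only the converters need to be kept current as
formats change, while the stored originals are left alone.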
This lack of operational experience means that the knowledge of those
who have applied digital preservation strategies and technologies beyond test
situations becomes all the more important. Interviews during 2004 of Austra-
lian digital preservation specialists whose expertise was derived from their
extensive operational experience indicated five themes that characterized viable
preservation strategies and practices (noted in more detail in the first edition of
this book):

1. Societal and organizational missions
2. Knowing what you are preserving
3. Standards
4. Operational matters
5. Technical issues.

The first theme (societal and organizational missions) involved concepts such
as assured funding, a sustainable supportive environment over time, knowing
the context in which preservation occurs, and community will. A sustainable
environment that supports digital preservation over time was considered essen-
tial, and this is related to ongoing funding. To be sustainable, digital preservation
activities had to be built into ‘normal operating activity’ and a good understand-
ing of the context was required. The second theme (knowing what you are pre-
serving) was expressed in terms of the need to think clearly about exactly what
it was that we are trying to preserve. These concepts were closely linked to the
third theme (standards), which noted strategies such as metadata, data formats
that remain accessible, normalization, and standard data formats. Standards
were considered as an essential aspect of any preservation strategy; XML,
metadata standards, and standard data formats were specifically noted. The
fourth group of characteristics comprised operational aspects: capture the
material first, build digital preservation into normal operations, integrate digital
preservation processes fully, keep data moving. The technical issues of the
fifth theme included such principles as the importance of not relying on pro-
prietary data formats or systems.
According to the Australian specialists interviewed, an effective digital
preservation strategy had these characteristics:

– It should be part of a sustainable environment that supports digital
preservation over time
– The context in which digital preservation operates, and therefore in which
the strategy is placed, needs to be clearly understood
– A very far-reaching forward plan should exist for it
– It should be built into normal operating activities
– It should be based on clear definitions of what it is that we are trying to
preserve – the ‘essence’ of a record
– It should be based on stable, widely used, and clearly defined standards,
including a limited number of standard data formats where possible
– It requires sufficient management information metadata and preservation
metadata
– It should recognize that digital preservation is an active process
– It should not be based on proprietary data formats or systems.

Has the situation changed in the last five years? Surveys of digital preservation
activities in 2009 and 2011 suggest that it has. Responses to a 2009 survey,
mainly of national libraries and archives in Europe but also with North American
input, indicated that digital preservation was understood as an issue that
demands action. Eighty-five per cent of respondents had or were planning a
digital preservation ‘solution’. Half of the respondents had in place a digital
preservation policy, which is significant because the existence of a policy meant
that an organization was three times more likely to have a budget for digital
preservation and a solution planned or in place (Planets, 2009, p.4). A significant
change from earlier surveys was the availability of software tools for digital
preservation, both proprietary and open-source, that had been evaluated and
used, among them DSpace, Fedora, E-prints, Ex Libris Digitool, and IBM
DIAS (Planets, 2009, p.50). Although this survey was not specifically about the
strategies in place or being considered, the high percentage of organizations
with policies, budgets, and solutions in place augurs well.
A 2011 survey of 72 members of the ARL (Association of Research Librar-
ies) about preservation of materials in their institutional repositories (Li and
Banach, 2011) also indicated that awareness of the importance of policies
about digital preservation had increased since a 2005 survey. Just over half of
the respondents to the 2011 survey indicated that they now have policies in
place. Specific strategies noted in this survey were backups (implemented by
93 per cent), storage in a secure system (76 per cent), checksums (63 per cent),
migration (50 per cent), refreshing (47 per cent), and emulation (7 per cent).
Many of these preservation processes were being handled within repository
software, DSpace being the most popular.
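
Checksums, one of the strategies reported in the survey above, can be illustrated with a short sketch. It is not taken from the survey or from any repository software; the manifest layout and the function names (sha256_of, build_manifest, verify) are assumptions made for illustration, and SHA-256 is simply one commonly used digest.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under folder (hypothetical manifest layout)."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match the manifest."""
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        (store / "report.txt").write_text("original content", encoding="utf-8")
        manifest = build_manifest(store)
        print(verify(store, manifest))  # nothing has changed yet
        (store / "report.txt").write_text("silently altered", encoding="utf-8")
        print(verify(store, manifest))  # the altered file is reported
```

A check of this kind detects change but does not repair it; in practice it would be paired with the backups that the same survey reports.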

Criteria for effective strategies and practices


Other commentators have investigated the requirements for effective digital
preservation strategies and practices. Respondents to a survey of RLG members
in 1999 ranked the threats, in order of significance, as ‘technology obsolescence,
insufficient resources, insufficient planning; and physical condition of materi-
als’ (Hedstrom and Montgomery, 1999, p.19). This suggests that effective
approaches to and strategies for digital preservation need to address these con-
cerns, so our list of criteria for an effective digital preservation strategy is very
likely to include the requirements that the approach or strategy should be in-
dependent of technology, that it must be adequately resourced, that it is planned
in detail, and that attention is paid to improving the physical condition of
media.
Jeff Rothenberg has commented extensively on some of the requirements. He
encourages ‘a sound technical approach’ as the basis, recognizing that ‘techni-
cal, administrative, procedural, organizational, and policy issues’ (Rothenberg,
1999b, p.6) must also be addressed and issues such as privacy, authentication,
validation, and intellectual property rights need also to be taken into account.
Costs of implementation must be feasible (Rothenberg, 1999b, p.9). His ‘ideal
approach’ would be a single solution that is long-lived and can be ‘applied uni-
formly, automatically, and in synchrony (for example, at every future refresh
cycle) to all types of documents and all media, with minimal human intervention’
(Rothenberg, 1999b, p.16):

This approach must be extensible, since we cannot predict future changes, and it must
not require labor-intensive translation or examination of individual documents. It must
handle current and future documents of unknown type in a uniform way, while being
capable of evolving as necessary. Furthermore, it should allow flexible choices and
tradeoffs among priorities such as access, fidelity, and ease of document management
(Rothenberg, 1999b, p.30).

This is a tall order. Implicit in Rothenberg’s comments is the idea that there is
a single technical solution (he is well known for championing emulation as
this single solution), but others do not agree. Lavoie and Dempsey (2004) have
suggested that factors such as ‘cost, user preferences, nature of the material,
whether it exists in multiple forms’ must be taken into account when ascertaining
which preservation strategies are appropriate.
Thibodeau has identified four criteria by which to ascertain the best strategy:

– its feasibility (is there software and/or hardware capable of doing it?),
– its sustainability (can it be done into the future, or can an alternative future path be
identified?),
– its practicality (can it be applied within reasonable limits of difficulty and expense?),
and
– its appropriateness (this criterion relates to the type of objects and why we are pre-
serving them) (Thibodeau, 2002, pp.15-16).

To illustrate these he describes a spectrum of possibilities, ranging from
‘preserving technology itself to preserving objects that were produced using
information technology (IT)’. The suitability of any strategy and practice, which
Thibodeau calls ‘methods’, can be determined only with respect to the nature
of the material being preserved. Computer games, for example, are much more
likely to require methods that fall at the ‘preserve technology’ end of this
spectrum, whereas emails fall at the ‘preserve object’ end. Some methods are
general and some apply only to specific technologies, for example specific
software and/or hardware, or specific data types. Therefore, the ‘range of appli-
cability is another basis for evaluating preservation methods’ (Thibodeau, 2002,
p.17).
Nor must we forget the requirements for authenticity in digital preservation,
noted in Chapter 5. Because ‘an authentic digital object is one whose genuine-
ness can be assumed on the basis of one or more of the following: mode, form,
state of transmission, and manner of preservation and custody’ (Ross, 2002,
p.7), it follows that effective digital preservation strategies and technologies
must ensure that these characteristics are not lost. We should keep in mind
Gilliland-Swetland’s cautionary comments:

Counterintuitively, perhaps, it is during the preservation of digital materials that
evidential value is often most at risk of being compromised. Digital preservation
techniques have moved beyond a concern for the longevity of digital media to a concern for
the preservation of the information stored in those media during recurrent migration to
new software and hardware. In the process, many of the intrinsic characteristics of
information objects can disappear – data structures can be modified and presentation of
the object on a computer screen can be altered (Gilliland-Swetland, 2000, pp.11-12).

ERPANET has produced a guide to assist with selecting digital preservation
technologies in its erpaTool series (ERPANET, 2003). They provide a list of
evaluation factors, reproduced here, with minor amendments, as Figure 6.1.

General
– Maturity: Is the technology fully developed and are there already systems in
productive use?
– Experience: Are there already verifiable experiences in applying the technology
for the preservation of similar objects?
– Spread: Is the technology widespread enough to guarantee that it will be
supported by the manufacturers during the desired lifespan of the preservation
system?
– Standardization; open specifications: Is the technology based on standards and
are the specifications of all the critical elements laid open by the
manufacturers or at least deposited with an independent and trusted third party
and available there in case of the dissolution or downfall of the manufacturers?
– Reliability: Does the technology work reliably and can the reliability of the
outcome easily be checked?
– Modularity and flexibility: Is it easily possible to add new components at low
cost, to change or update them?
– Costs: It is important to include not only the price of system components, but
all cost of implementing and maintaining the system.

Objects
– Legislation: Are the objects subject to specific legislation which asks for a
specific form, format, storage medium, or accessibility? Such regulations are
basic conditions for the selection of technologies.
– Characteristics: How can the main characteristics of the objects and their
context be preserved without threatening authenticity and integrity? As
conversion of digital objects for preservation is often difficult and costly,
all selected technologies must be able to treat current and, as far as possible,
also future objects. Consider also carefully what consequences a selected
technology has regarding the creation or preparation of the objects for
preservation.
– Preservation period: As digital systems only have a lifespan of about five to
10 years at the highest and digital objects must be preserved for much longer
periods, it is important that the system is able to efficiently export objects
and their context data in standard formats in order to migrate them into a new
system.

People
– Skills: Does the selected technology need specific skills which must be
available in-house? Are these skills already available or can they easily be
acquired?
– Staff for maintenance: Is the appropriately skilled work force in the right
number for the maintenance available?
– Experience: Is there sufficient experience with the technology for support in
case of difficulties available? (in-house or easily accessible in the region)

Procedures
– Workflow: Can the technology easily be implemented in the preservation workflow
or can the workflow be adapted to the technology without major difficulties or
loss of efficiency?
– Flexibility: Can the technology flexibly be implemented? Does it allow changes
in the preservation procedures?
– Good practices: Are good practices in using the technology already established?
– Quality requirements: Can the technology meet the previously defined quality
standards? However, automatic or semi-automatic techniques are difficult to be
applied for heterogeneous collections, complex objects, and high quality
requirements.

Figure 6.1: Factors to Consider when Selecting Digital Preservation Technologies
(Modified from ERPANET, 2003, pp.4-5)

More recently, attention has focused on a holistic approach to digital
preservation. For example, more attention is paid to such broader concerns as
developing and implementing robust policies and applying standards (and where
appropriate participating in the development of new standards). Also increas-
ingly emphasized is the need for developing and adopting software tools that
assist with digital preservation, including software that assists with planning
digital preservation procedures. Automating digital preservation processes and
sustainability of digital preservation processes are also increasingly emphasized.
These are noted in more detail in the sections that follow.
Finally, we need to note the effect of national and even trans-national
information infrastructures on approaches and strategies for digital preservation.
Effective approaches and strategies will work best if supported within these
infrastructures. Examples are most evident in the UK, where JISC has coordinated
and funded the development of digital preservation research and development
for UK universities (the range of its activities can be seen at
www.jisc.ac.uk/preservation), and in the European Union through research and
development activities coordinated and funded by the European Commission’s Common
Strategic Framework for research. National strategies for digital preservation
in other countries include New Zealand’s National Digital Heritage Archive
(NDHA) Programme
(www.natlib.govt.nz/about-us/current-initiatives/ndha/past-initiatives/ndha-programme)
and the National Digital Information Infrastructure and Preservation Program
(NDIIPP) in the US (www.digitalpreservation.gov). Also relevant here are
possibilities for cooperation which are evident in the international groups that
have developed many key digital preservation standards and in research projects
where it is now routine to find researchers and organizations from several countries.

Broader concerns
Attention to a more holistic approach to digital preservation is evident in the
increasing literature about broader concerns such as standards, planning, poli-
cies, and sustainability. Planning of digital preservation needs to take place at
all stages of digital preservation and needs to be based firmly in policy and
standards. Digital preservation operations need to be sustainable.

Standards

The principle of adhering to standards provides a more favourable environment
for the success of digital preservation; the widespread adoption of standards is in
fact a prerequisite for effective preservation. As the field of digital preservation
matures, more standards are developed, promulgated, and adopted. Widespread
adoption of standards is of considerable importance because it promotes collabo-
ration among organizations and interoperability among systems.
Standards that are stable and have been widely adopted are more likely to
be supported and remain viable over a longer period. This is not always the
case with proprietary standards or standards designed for use only with spe-
cific software and/or hardware. Advantages that accrue from standardization
are that costs will be reduced because there are fewer and less complex varia-
tions to deal with: for example, economies of scale could be achieved when
migrating data. The UNESCO Guidelines note that the use of widely available
standards ‘is more likely to allow re-interpretation of the data or re-construction
of tools in the future, if necessary’, but adds that use of standards is an ‘in-
vestment’ strategy, that is, it involves investment of effort at the start of the
process (UNESCO, 2003, pp.123-124), which has resourcing implications.

Planning

In order to preserve digital materials effectively, they must be managed over
time. This requires planning. In the OAIS Reference Model, a key standard for
creating a digital archive, planning is fundamental – Preservation Planning is
one of the central functions of this standard (see Chapter 5). Developing a data
management plan is increasingly required by funding bodies, who may specify
that data sharing, curation and preservation are a part of the plan for any
project for which funding is sought. Examples of funding bodies that require data
management plans are the National Science Foundation in the US
(www.nsf.gov/eng/general/dmp.jsp) and the Wellcome Trust
(www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Guidance-for-researchers/index.htm),
a major funder of medical research in the UK.
Planning should start, ideally, when digital materials are created. They
should be created using open file formats that are in widespread use, are well
supported, and are stable. This reduces the risk of file formats becoming obso-
lete and unusable in the future. Later stages in the digital preservation life-
cycle also require planning: ingesting material into a digital archive requires
planning to ensure that procedures are consistent; storing digital materials
demands that sustainable storage is planned; making digital materials accessible
to users requires planning so that ways of accessing digital objects that are
appropriate to user communities are implemented; planning to collect relevant
metadata is vital at all stages of digital preservation. These are only some of
the areas where planning is required.
Software tools to assist with planning digital preservation are now avail-
able. One example is Plato, an outcome of the Preservation and Long-Term
Access through Networked Services (Planets) project, now the OpenPlanets
Foundation (www.openplanetsfoundation.org). Plato supports the decision-
making process to determine which preservation actions best suit the digital
materials that are to be preserved. It is available as open-source software
(www.ifs.tuwien.ac.at/dp/plato/intro.html). Case studies describing implemen-
tations of Plato are available: examples are its use in preservation of the digital
collection of the Danish Folklore Archive (Chivers et al., 2010) and its integra-
tion into the EPrints digital repository software (Tarrant et al., 2010). Another
planning tool is DMP Online (dmponline.dcc.ac.uk). Developed by the Digital
Curation Centre, this software tool assists users to develop a preservation plan
that meets the requirements of funding bodies for a data management plan
(Donnelly, Jones and Pattenden-Fail, 2010).

Policies

A principle that applies to all information services is that policies are necessary;
digital preservation is no different. Policies are a prerequisite for effective
digital preservation. They clearly state principles, values, and intentions, specify
how the policy is monitored and who is responsible for its maintenance, pro-
vide links to related policies, and indicate the process for reviewing the policy,
including frequency of review. The benefits of having policies about digital
preservation in place include assisting with planning a coherent digital
preservation programme and publicly indicating that the organization is serious
about digital preservation.
Hand-in-hand with policies go procedures. As noted earlier in this chapter,
policies are defined courses of action, whereas procedures are specific se-
quences of actions that allow the policies to be put into practice. For example, a
policy about metadata will indicate what kinds of metadata are necessary and
at what stages of the lifecycle of digital material they are assigned or derived;
associated procedures will specify in detail which metadata standards are used
and how they are applied.
The DISC-UK DataShare Project’s helpful guide to developing policy,
Policy-making for Research Data in Repositories (Green, Macdonald and Rice,
2009) indicates six areas where policies are needed. The content of the archive
is the first area and includes its scope, kinds of digital information handled,
and file formats accepted. Policy about metadata is next; statements about who
can access metadata, what types are needed, and which metadata schemas are
applied are examples. Examples of policy about ingesting digital objects into
the archive include statements about the quality requirements the objects must
meet, about confidentiality of data, and about rights relating to the objects. The
fourth area is policy about access, use, and reuse of digital objects. Policy
about preservation of digital objects is the fifth area; examples are the specifica-
tion of retention periods, and the safeguarding and demonstration of authenticity.
Finally, policy about withdrawing digital objects from the archive is needed.
The OAIS Reference Model specifies that policies are required for archival
storage, management, disaster recovery, and security. A 2008 JISC study
(Beagrie et al., 2008) provides useful guidance. It presents an analysis of digi-
tal preservation policies available at the time of their study and gives numerous
helpful examples.

Sustainability

The sustainability of digital preservation programmes has been of significant
concern for at least the last decade. The economic sustainability of programmes
is of concern because to date most have been funded by project
money – that is, their funding is finite and usually only for a few years. This is
an impediment to planning programmes that aim to preserve material for long
periods of time, perhaps in perpetuity. However, sustainability includes other
factors, such as securing skilled personnel. The costs of digital preservation
and access to skilled personnel are both noted in more detail in Chapter 10.
Typologies of principles, strategies, and practices


An unclassified list of possible strategies was provided at the beginning of this
chapter. It is worth attempting to categorize these, to develop a typology (or
typologies) to help us understand them better so that we can determine which
are likely to be viable or most appropriate in particular digital preservation
contexts. Although most of the literature simply lists the strategies, five publi-
cations (the Digital Preservation Coalition handbook (2008), Walton (2003),
Thibodeau (2002), Rothenberg (2003) and the UNESCO Guidelines (2003))
do categorize them and deserve further attention.
The Digital Preservation Coalition’s influential handbook Preservation
Management of Digital Materials (2008) divides digital preservation strategies
into primary and secondary strategies. Primary strategies are ‘those which
might be selected by an archiving repository for medium to long-term preser-
vation of digital materials for which they have accepted responsibility’. Secon-
dary strategies are ‘those which might be employed in the short to medium
term either by the repository with long-term preservation responsibility and/or
by those with a more transient interest in the materials’ (Digital Preservation
Coalition, 2008, p.111). Primary preservation strategies are migration and emu-
lation; secondary strategies are technology preservation, adherence to standards,
backwards compatibility, encapsulation, permanent identifiers, converting to
stable analogue format, and digital archaeology. The DPC handbook notes that
secondary strategies will sometimes be employed as a holding action, to defer
the need for primary strategies to be applied. It proposes that there are two
dominant strategies, migration and emulation, and that the need to apply these
can on occasion be ‘deferred and/or simplified if appropriate preventive pres-
ervation procedures such as storage and maintenance … and selected secondary
preservation strategies’ are applied. Also noted as a further strategy is ‘con-
verting to an analogue preservation format’ and its significant limitations such
as the loss of digital characteristics. Digital archaeology is presented as a fall-
back position – ‘not precisely a preservation strategy at all but rather when the
absence of preservation strategies has left valuable resources inaccessible’
(Digital Preservation Coalition, 2008, p.111). The need to apply more than one
strategy is indicated: ‘employing a judicious mix of secondary strategies’ has
potential for reducing the risks of obsolescence and also the costs of preservation
(Digital Preservation Coalition, 2008, p.111).
Walton suggests five categories: preserve old technology; emulate old tech-
nology; format migration; standardize formats; and persistent object preservation
(Walton, 2003, pp.5-7). These fall into two types: those concerned primarily
with technology (preserve old technology, emulate old technology); and those
concerned primarily with data formats (format migration, standardize formats,
persistent object preservation). The merit of the first type, focusing on tech-
nology, is that authenticity is much more likely to be maintained, but it is
impractical because ‘it requires multiple solutions (that is, one for every dis-
tinct category of equipment being preserved) rather than a single one’, and
writing emulator software is labour-intensive and costly. There is also, for the
first category, the major issue of technology obsolescence.
The second type concentrates primarily on data formats. Format migration
attempts to ensure that the data remains live by continually updating it so it can
be accessed using whatever technology is current. Format standardization con-
verts the data to a limited number of non-proprietary formats (for example,
TIFF for images, XML for text) and there is consequently no need to rely on
one technology to read them, as many kinds of software can do this. Both of
these categories minimize the need to rely on software and hardware that will
become obsolete, but both require high levels of resources to manage ongoing
migration projects or convert to standard formats. Persistent object preservation
attempts to separate data completely from the technology. Digital materials are
described in terms that are independent of software or hardware; an example is
electronic records converted to XML and then encapsulated with metadata.
Walton notes, however, that ‘its operational viability remains to be demon-
strated’ (Walton, 2003, pp.5-7).
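
The idea of converting a record to XML and encapsulating it with metadata can be sketched in a few lines. This is an illustration only, not Walton’s method or any published packaging standard: the element names (package, metadata, field, content) and the function names are assumptions, and a real archive would follow an agreed metadata schema.

```python
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(record_text: str, metadata: dict[str, str]) -> str:
    """Wrap a text record and its descriptive metadata in one XML package.

    The element names used here are illustrative assumptions, not a standard.
    """
    package = ET.Element("package")
    meta = ET.SubElement(package, "metadata")
    for name, value in sorted(metadata.items()):
        ET.SubElement(meta, "field", name=name).text = value
    # A checksum of the content lets a later reader confirm it is unchanged.
    digest = hashlib.sha256(record_text.encode("utf-8")).hexdigest()
    ET.SubElement(meta, "field", name="sha256").text = digest
    ET.SubElement(package, "content").text = record_text
    return ET.tostring(package, encoding="unicode")


def unpack(xml_text: str) -> tuple[str, dict[str, str]]:
    """Recover the record and its metadata using only a generic XML parser."""
    package = ET.fromstring(xml_text)
    metadata = {f.get("name"): f.text or "" for f in package.find("metadata")}
    content = package.find("content").text or ""
    return content, metadata


if __name__ == "__main__":
    packaged = encapsulate(
        "Minutes of the records committee",
        {"creator": "Records Office", "created": "2004-05-01"},
    )
    text, meta = unpack(packaged)
    print(text)
    print(meta["creator"])
```

The point of the sketch is that the package can be read back with any XML parser, so access does not depend on the software that created the record.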
Walton’s approach is based on distinguishing between data preservation
and maintaining the technology. Thibodeau extends this approach. He describes
a ‘preservation spectrum’, with ‘preserve technology’ at one end and ‘preserve
objects’ at the other. On the ‘preserve technology’ end of the spectrum are
methods that attempt to keep data in specific logical or physical formats and to
use technology originally associated with those formats to access the data and
reproduce the objects. In the middle of the spectrum are methods that migrate
data formats as technology changes, enabling use of state-of-the-art technology
for discovery, access, and reproduction. On the ‘preserve objects’ end of the
spectrum are methods that focus on preserving essential characteristics of
objects that are defined explicitly and independently of specific hardware or
software (Thibodeau, 2002). But the suitability of a method for the material
being preserved also needs to be taken into account. Because some methods
are general and others apply only to specific technologies, Thibodeau proposes
that the method’s suitability (which he designates as applicability) needs also
to be accommodated.
Rothenberg takes a different tack. His classification is based on the extent
to which strategies and practices are complete. His ‘overview of proposed
approaches to preservation’ (Rothenberg, 2003) has three categories: non-
solutions, partial solutions, and potentially complete solutions. Non-solutions
include ‘do nothing’ and digital archaeology. Partial solutions form a longer
list:

– Save page-images of artifacts;
– Extract and save ‘core contents’ of artifacts;
– Translate artifacts into standard or ‘canonical’ forms (without migration);
– Rely on ‘viewer’ programs to render obsolete formats in the future;
– Save metadata to help interpret saved bit streams (‘assisted archaeology’);
– Save source-code of rendering software (for future reverse-engineering).

Rothenberg’s potentially complete solutions are of most interest. Only three
are noted, suggesting that there is still significant work needed to develop a
range of viable strategies and practices:

– Formalization (replace artifacts by formal descriptions of themselves);
– Migration (repeatedly convert artifacts into new artifacts);
– Emulation (run original rendering software on virtually recreated hardware)
(Rothenberg, 2003).

The categorization in the UNESCO Guidelines (2003) is different again, being
based in part on the length of time that the strategies are likely to provide a
viable result, and in part on what stage of the preservation process the principal
effort is needed. Four sets of strategies are identified. ‘Investment’ strategies
(primarily involving investment of effort at the start) are: use of standards; data
extraction and structuring; encapsulation; restricting the range of formats to be
managed; and the ‘UVC’ (Universal Virtual Computer) approach. Short-term
strategies (likely to work best over the short-term only) are: technology preser-
vation; backwards compatibility and version migration; and migration (which
may also work over longer periods). Medium- to long-term strategies (likely to
work over longer periods) are viewers and emulation, plus two already noted:
migration, and the UVC approach. Alternative strategies are noted as: non-
digital approaches; data recovery; and combinations of strategies (UNESCO,
2003, pp.122-123).
The UNESCO Guidelines distinguish between general principles, and spe-
cific strategies and practices. The general principles (which they label as
‘strategies’ (UNESCO, 2003, p.36)) for preserving digital materials include:

– Working with producers (creators and distributors) to apply standards
– Recognising that selection of material to be preserved is essential
– Placing the material in a safe place
– Using structured metadata and other documentation to control material, fa-
cilitate access and support preservation processes
– Protecting the integrity and identity of data
– Effectively managing preservation programs.

The UNESCO Guidelines also articulate nine specific principles on which
current strategies and practices are based. These can be grouped into four
categories: conversion to analogue form; maintaining the original format;
standardizing formats; and migration. Conversion to analogue form involves converting
data to a human readable form on a stable carrier such as paper, film or metal.
Maintaining the original format is the most popular category, with four strate-
gies: ‘Making the data ‘self-describing’ and ‘self-sustaining’ by packaging it
with metadata and with links to software that will continue to provide access
for some time’; ‘Maintaining the data in its original form … and providing
tools that will re-present it … using the original software and hardware … or
using new software that emulates the behaviour of the original software and/or
hardware’; ‘Maintaining the data and providing new presentation software
(viewers) that will render an acceptable presentation of it for each new operating
environment’; ‘Providing specifications for emulating the original means of
access on a theoretical intermediate computer platform, as a bridge to later
emulation in a future operating environment’. Standardizing formats includes
either ‘creating data in, or converting data to, a highly standardised form of
encoding and/or document structure (or file format) that will continue to be
widely recognised by computer systems for a long time’, or ‘converting the
data to a format where the means of access will be easier to find’. Migration
involves converting the data to new formats that can be read by each new
technology; it also includes providing the ability for migration on demand ‘by
maintaining the data and recording enough information about it to allow a future
user or manager to convert it to a then-readable form’ (UNESCO, 2003, p.121).
Finally, preservation programmes, these guidelines suggest, must be organiza-
tionally viable and financially sustainable, if they are to be effective and credible
(UNESCO, 2003, p.42).
This classification is noteworthy because of its emphasis on combining
strategies and practices. In the absence of a single, universally applicable solution
(which is unlikely ever to be developed), these guidelines support the applica-
tion of multiple strategies in preservation programmes. These will likely be
based on the use of standards for encoding, structuring and describing digital
objects, emulation of obsolete software or hardware, and migration of digital
objects (UNESCO, 2003, pp.120-121). Whatever we might consider as main-
stream approaches – the consensus is that these are normalization, migration,
technology preservation, and emulation – they are not mutually exclusive, and
will be deployed according to the kind of digital materials to which they are
applied and their access requirements.
None of the typologies described here is complete. They do not, for in-
stance, note the principle of redundancy (multiple copies at multiple locations),
best exemplified by the LOCKSS programme. However, they allow us to iden-
tify the principles that are most important and to make judgments about the
applicability and viability of specific strategies and practices in the longer
term. For example, they help us distinguish more clearly between ‘preserve
technology’ and ‘preserve objects’ approaches, they encourage us to consider
the nature of the digital object and the different requirements that each may
have, with consequent different ‘best’ strategies and technologies for their
preservation, and they suggest that the tools we have available to us at present
should be applied in combination.
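The principle of redundancy mentioned above (multiple copies at multiple locations) lends itself to a brief illustration. The Python sketch below compares the SHA-256 digests of several copies of the same object and reports any copy that disagrees with the majority. It is a deliberately simplified illustration and does not reproduce the LOCKSS protocol; the function name, the location labels and the sample data are all hypothetical.

```python
import hashlib
from collections import Counter


def find_divergent_copies(copies):
    """Given {location: bytes}, return the locations whose content differs
    from the digest shared by the largest number of copies.

    If two digests are equally common, the one encountered first is
    treated as the reference (an arbitrary choice for this sketch).
    """
    digests = {
        location: hashlib.sha256(data).hexdigest()
        for location, data in copies.items()
    }
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(
        location
        for location, digest in digests.items()
        if digest != majority_digest
    )


if __name__ == "__main__":
    # Hypothetical copies held at three sites; one has a single altered byte.
    copies = {
        "site-a": b"report, version 1",
        "site-b": b"report, version 1",
        "site-c": b"report, versiom 1",
    }
    print(find_divergent_copies(copies))
```

A comparison of this kind can show that a copy has changed, but it cannot say which version is authentic; that judgment still depends on the documentation and metadata kept with the object.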

A typology of digital preservation?


The changing nature of the field of digital preservation means that listing the
desirable principles, strategies and practices that should be in place for viable
and effective digital preservation activities is challenging. The list that follows
attempts to be complete, but it is essential to recognize that it needs to be revised
on a regular basis.

1) Principles (general ways of thinking, usually at the conceptual level)
a) Separate archival storage of bits from the processes of representing
and managing those bits
b) Actively maintain digital information continuously from the moment
of its creation
c) Give high priority to cooperative activities, including collaboration
with and support from a wide range of stakeholders
d) Apply standards and participate in their development
e) Combine strategies and technologies, rather than focus on a single
solution
f) Apply the principle of redundancy
g) Ensure that digital preservation activities are part of a sustainable
environment that supports digital preservation over time
h) Build digital preservation into normal operating activities.

2) Desirable strategies (plans designed to achieve a particular long-term aim)
a) Distinguish between medium- to long-term strategies (primary strate-
gies) and short-term strategies (secondary strategies). Short-term strate-
gies focus on safeguarding storage media, content and documentation,
and computer software and hardware; long-term strategies attempt to
address software and hardware obsolescence
b) Strategies should be feasible in terms of institutional responsibilities
and the costs required to implement them
c) Authenticity must be maintained, so strategies that emphasize preserving
objects over preserving technologies are to be preferred
d) Prefer strategies that recognize the nature of different kinds of digital
materials and the different requirements that each may have. Clear
definitions of what it is that we are trying to preserve are therefore
needed.
e) Adequate and appropriate metadata is required
f) Strategies should not be based on proprietary data formats or systems,
but rather on stable, widely used and clearly defined standards
g) Strategies should be designed around the smallest number of standard
data formats possible.

3) Desirable practices (the machinery and equipment used in the practical
application of scientific knowledge, from the definition of technology at
the beginning of this chapter)
a) Practices should have the characteristics of feasibility (is there software
and/or hardware capable of doing it?); sustainability (can it be done into
the future, or can an alternative future path be identified?); practicality
(can it be applied within reasonable limits of difficulty and expense?); and
appropriateness (this criterion relates to the type of objects and why we
are preserving them)
b) Distinguish between practices that leave the digital components of the
record exactly as received (e.g. preserve old technology, emulate old
technology), and those that focus on the data formats (e.g. format mi-
gration, standardize formats, and persistent object preservation).

Conclusion
Lavoie and Dempsey have noted that digital preservation is ‘a set of agreed
outcomes’ based on considerations such as the complexity of digital materials
and the features that we collectively decide should be preserved. These consid-
erations, they suggest, require that ‘the choice of preservation strategy will
need to reflect a consensus of all stakeholders associated with the archived
digital materials. Achieving such a consensus is difficult, and in some circum-
stances, impossible’ (Lavoie and Dempsey, 2004).
How, then, are we to determine which strategies are best if we do not have
this consensus to guide us? Lavoie and Dempsey continue:

A second-best solution is for the digital repository to articulate clearly what outcomes
can be expected from the preservation process. These outcomes should in turn be under-
stood and validated by stakeholders. Communication between the repository and stake-
holders, either to promote consensus on preservation outcomes, or for the repository to
disclose and explain its preservation policies, mitigates the risk that the repository’s
commitments are misaligned with stakeholder expectations (Lavoie and Dempsey, 2004).

Clear communication with stakeholders is the key as long as we are in a rapidly
evolving environment where there is little agreement about preferred strategies
and practices, and where new approaches are being developed and trialed.
The following two chapters are loosely based on Thibodeau’s characteriza-
tion of strategies on a spectrum that ranges from ‘preserve technology’ to ‘pre-
serve object’ (noted earlier in this chapter), to which are added some aspects of
Rothenberg’s typology (also noted earlier in this chapter). They describe spe-
cific principles, strategies and practices. Chapter 7 examines approaches based
on preserving the technology, and Chapter 8 examines approaches based on
preserving the digital object.
Chapter 7
‘Preserve Technology’ Approaches: Tried and Tested
Methods

Introduction
The need to preserve digital materials runs counter
to the market ethos of the computing industry, which
requires high turn-over of hardware and software in
order to survive financially. This outlook necessitates
rapid changes in formats and functionality and an
unwillingness to support ‘obsolete’ technology, all of
which make it harder and harder to preserve access
to digital materials (Heazlewood, 2002)

The previous chapter presented a long list of possible strategies that are avail-
able or under consideration for digital preservation. It noted that they are still
developing, with no universally accepted solution in sight. The previous chap-
ter also noted some of the characteristics of digital preservation strategies and
practices and described some of the typologies proposed for their categorization.
This chapter and the next take as their basis Thibodeau’s typology, described in
Chapter 6, add to it some parts of Rothenberg’s typology, also described in
Chapter 6, and use the outcome to describe specific principles, strategies and
practices. The result is three categories:

– ‘Non-solutions’ (examined in this chapter)
– ‘Preserve Technology’ approaches (also examined in this chapter)
– ‘Preserve Objects’ approaches (examined in Chapter 8).

‘Non-solutions’ (the term used by Rothenberg (2003)) are:

– Do nothing
– Storage and handling practices
– Durable/persistent digital storage media
– Analogue backups
– Policy development (already noted in Chapter 6)
– Standards (also already noted in Chapter 6)
– Digital archaeology.

The list of ‘Preserve Technology’ approaches (the term used by Thibodeau
(2002)) is much shorter:

– Technology preservation
– Emulation.

The reader of this chapter and the next should keep firmly in mind that ‘Pre-
serve Technology’ and ‘Preserve Objects’, as noted in Chapter 6, are the two
ends of a spectrum of possibilities, not discrete points on that spectrum. There
are many points in between these two poles. It should also be understood that
there are other ways of grouping the possible approaches. The grouping used in
Chapters 7 and 8 represents current thinking about digital preservation strate-
gies and practices, which will change. For instance, many of the approaches
described in this chapter – technology preservation, adherence to standards,
converting to stable analogue format, digital archaeology – are what the Digital
Preservation Coalition handbook refers to as secondary strategies, ‘those which
might be employed in the short to medium term either by the repository with
long-term preservation responsibility and/or by those with a more transient in-
terest in the materials’ (Digital Preservation Coalition, 2008, p.111). These
strategies are, in a sense, holding actions, buying time for digital materials while
longer-term strategies are developed, and practices that keep digital materials
viable are implemented. But, as is made clear in the Digital Preservation Coalition
handbook, these strategies and practices do not address the real issues of tech-
nology obsolescence; they only put off the need to make decisions.
This chapter draws heavily on two key sources: Preservation Management
of Digital Materials: A Handbook (Digital Preservation Coalition, 2008), and
the UNESCO Guidelines for the Preservation of Digital Heritage (2003). The
reader may wish to update these by referring to the numerous high quality
resources available on the web. Two useful starting points are the DCC web
site (www.dcc.ac.uk) and the Library of Congress’s Digital Preservation web
site (www.digitalpreservation.gov).

‘Non-solutions’
‘Non-solutions’ is one of three categories suggested by Rothenberg (2003) in
his overview of preservation approaches, the others being ‘partial solutions’,
and ‘potentially complete solutions’. This categorization is useful because it
emphasizes that some of the practices promoted as viable digital preservation
techniques, both in the past and at present, are not likely to achieve the aims
of digital preservation activities – that is, an assurance of long-term access
to authentic digital materials. Rothenberg’s examples of non-solutions are ‘do
nothing’ and digital archaeology. Leaving aside ‘do nothing’ for the moment,
these non-solutions are strategies and practices that are useful in the suite of
approaches to digital preservation, or are essential parts of the infrastructure
required for successful digital preservation, but they do not provide outcomes
that result in preserved digital materials over time.
Of the seven non-solutions noted in this chapter, the first (do nothing) is the
least realistic option. Unlike non-digital materials, for which the principle of
benign neglect can have validity, digital materials typically require active inter-
vention right from the moment of their creation, if they are to survive, as has
been noted in more detail in Chapter 1. Two of the non-solutions (storage and
handling practices and durable/persistent digital storage media) provide a
breathing space, extending the life of digital storage media and, thereby, ensuring
that the digital materials stored on them as bit-streams remain in good condition
for longer. This potentially provides sufficient time to develop and implement
strategies and practices that are viable over the long term. Storage and handling
practices focus on environmental control, handling, building design, and secu-
rity, although we note here only the first two. The durable/persistent digital
storage media non-solution is based on the premise that developing improved
storage media – media that will last longer, store more, and remain accessible
for longer than current media – will provide greater economies and efficiencies
by reducing the frequency of copying and the number of media that are handled.
However, both of these approaches are non-solutions because they focus on
the media, avoiding the real issue of technological obsolescence. Proposing
them as anything more than an interim solution is a classic example of the in-
appropriate application of pre-digital paradigm preservation thinking to digital
materials.
Another of the non-solutions, analogue backups, has the problem of negat-
ing the essential advantages of information in digital form, such as improved
retrieval of the information content and ease of dissemination. (There is, of
course, the possibility of converting back to digital form material that has been
copied to analogue form, although this is obviously expensive and unlikely to
happen.) This approach is also referred to as analogue copying, output to per-
manent paper or microfilm, and sometimes as page image techniques and saving
page images of artifacts, although these last two can also refer to making digi-
tal page images, such as PDF files. This approach, too, is an example of the in-
appropriate application of pre-digital paradigm preservation thinking to digital
materials. Two more approaches included in the category of non-solutions
(policy development and standards) are in fact principles, following the defi-
nition and use of this term in Chapter 6, rather than strategies or practices.
Attending to these provides a more favourable environment for successful
digital preservation. Policy development is a prerequisite for digital preservation.
Standards are also a prerequisite in that their development and application provide
the same benefits for digital preservation as they do for any cooperative endeavour.
Both are considered in more detail later in this chapter. Standards are also
considered in another context, that of standard data formats, in Chapter 8.
The final non-solution, digital archaeology (also known as data recovery),
is commonly recognized to be ‘not precisely a preservation strategy’ (Digital
Preservation Coalition, 2008, p.111) but, rather, a fall-back position. It encom-
passes actions taken to recover digital data that have become inaccessible using
normal techniques, but where the value of the data warrants the time-consuming
and expensive techniques of the data recovery specialist. To digital archae-
ology we can add a recent approach, digital forensics, which has much in
common with digital archaeology.

Do nothing

The do nothing approach need not detain us long. As noted earlier in this chap-
ter and in more detail in previous chapters, especially Chapter 1, it is not an
option for digital materials. Doing nothing reduces to zero, in a very short
time, the possibility of preserving digital materials. One familiar example is
failure to monitor and respond to the deterioration of digital media (for example,
a diskette or a hard drive) and the consequent inaccessibility of any data it
might carry.

Storage and handling practices

Some of the reasons for media deterioration were noted in Chapter 3. This sec-
tion notes techniques that are currently recommended as good practice for the
storage and handling of media, in order to prolong the length of time that the
data on them remain accessible. Application of these techniques assumes that
the hardware and software required for these media are still available. (The
sections ‘Technology preservation’ and ‘Emulation’ later in this chapter con-
sider what is necessary to support this assumption.)
Paying attention to the care and handling of digital storage media is a suitable
preservation strategy because it reduces risk. The risk of losing access to
digital objects stored on digital media can be reduced significantly, meaning
that although this strategy is short-term it is, nonetheless, worth committing
resources to. There are also cost factors to be considered; keeping the media
accessible for longer periods reduces the frequency with which media refresh-
ing or migration needs to occur and, therefore, the costs associated with these
activities. Storing digital media in appropriate environmental conditions (this
term refers to temperature, relative humidity, light and clean air levels) extends
the time the media will last. It also protects them to some extent against accidental
damage, as does appropriate handling. Of course, the need for attention to storage
and handling is not unique to digital storage media; it is fundamental to the
preservation of other materials as well.
The National Library of Australia’s 1999 Draft Research Agenda for the
Preservation of Physical Format Digital Publications lists ‘what we already
know’, based on experience at the National Library of Australia and other in-
stitutions. One of its statements is particularly relevant to this section – that
‘magnetic media such as floppy disks have a relatively short useable life span,
so we have to slow down the rate of deterioration and/or move what we want
to preserve to a more stable carrier’. The comment is made that ‘we know
quite a lot about the conditions of storage and handling that will maximise the
useful life’ of these media (National Library of Australia, 1999). Optimum
environmental conditions for the storage of digital media are well documented.
Figure 7.1 summarizes a range of recommendations for magnetic tape, CD-
ROM, and DVDs; conditions for other digital storage media are noted by
Brown (2008c).

Media                     Access storage (allows immediate      Long-term storage (preserves the
                          access and playback)                  media as long as possible)
                          Temperature        Relative           Temperature        Relative
                          (°C)               Humidity (%)       (°C)               Humidity (%)

Magnetic tape cassettes   18 to 24           45 to 55           18 to 22           35 to 45
12.7mm [1]
Magnetic tape             10 to 45           20 to 80           18 to 22           35 to 45
cartridges [1]
Magnetic tape 4 & 8mm     5 to 45            20 to 80           5 to 32            20 to 60
helical scan [1]
Magnetic tape [3]         Room ambient       Room ambient       As low as 5°,      As low as 20%,
                          (15 to 23),        (25-75),           maximum            maximum
                          maximum            maximum            variation 4°       variation ±10%
                          variation 4°       variation 20%
Magnetic tape [4]                                               11 to 23           20 to 50
                                                                (maximum temperature must be associated
                                                                with minimum RH, and vice versa)
CD-ROM [1]                10 to 50           10 to 80           18 to 22           35 to 45
CD-ROM & DVD [2]                                                4 to 20            20 to 50
Recordable CD & DVD [5]                                         -10 to 23          20 to 25, maximum
                                                                                   variation ±10%

[1] Digital Preservation Coalition, 2008, p.108, from BS 4783
[2] Byers, 2003, p.vi
[3] Based on Van Bogart, 1995, p.18
[4] Bigourdan et al., 2006, from ISO 18923:2000
[5] Iraci, 2010, p.4, from ISO 18925:2008
Figure 7.1: Environmental Storage Conditions for Some Digital Storage Media
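As a rough illustration of how the ranges in Figure 7.1 might be used when monitoring a storage area, the following Python sketch checks a single temperature and relative humidity reading against the long-term storage ranges that the figure attributes to the Digital Preservation Coalition handbook (from BS 4783). The dictionary keys, the function name and the sample readings are assumptions made for this example; they do not come from any of the cited sources.

```python
# Long-term storage ranges copied from Figure 7.1 (rows sourced from
# Digital Preservation Coalition, 2008, p.108). The keys are illustrative
# labels chosen for this sketch.
LONG_TERM_RANGES = {
    "magnetic tape cassette 12.7mm": {"temp_c": (18, 22), "rh_pct": (35, 45)},
    "magnetic tape cartridge": {"temp_c": (18, 22), "rh_pct": (35, 45)},
    "magnetic tape 4 & 8mm helical scan": {"temp_c": (5, 32), "rh_pct": (20, 60)},
    "CD-ROM": {"temp_c": (18, 22), "rh_pct": (35, 45)},
}


def check_reading(media, temp_c, rh_pct):
    """Return a list of problems found for one reading (empty if none)."""
    limits = LONG_TERM_RANGES[media]
    problems = []
    low, high = limits["temp_c"]
    if not low <= temp_c <= high:
        problems.append(f"temperature {temp_c} °C outside {low} to {high} °C")
    low, high = limits["rh_pct"]
    if not low <= rh_pct <= high:
        problems.append(f"relative humidity {rh_pct}% outside {low} to {high}%")
    return problems


if __name__ == "__main__":
    # Hypothetical readings, as might come from a storage-area data logger.
    for media, temp_c, rh_pct in [("CD-ROM", 20, 40), ("CD-ROM", 25, 50)]:
        problems = check_reading(media, temp_c, rh_pct)
        print(media, temp_c, rh_pct, "OK" if not problems else "; ".join(problems))
```

A check of this kind covers only the recommended ranges. Limits on fluctuation, which the figure also records for some media, would need a series of readings rather than a single one.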

The life expectancy of digital storage media is determined by a number of
factors. For optical disks, for example, it is a combination of the ‘manufacturing
quality, condition of the disk before recording, quality of the disk recording,
handling and maintenance, [and] environmental conditions’ (Byers, 2003, p.12).
Similar statements can be made about other media types. For all of these media
a critical issue is the provision of optimum storage conditions, with tempera-
ture and relative humidity within the recommended ranges, and with as little
fluctuation as possible. Light levels are usually controlled; Byers notes that
this is less of an issue for CD-ROMs and DVDs, but is important for CD-Rs
that use dye-based technology, because light levels affect the rate at which the
dye layer (the layer where the data are recorded) degrades (Byers, 2003, p.17).
Another factor for media in long-term storage is the need to acclimatize the
media if they are removed from this storage to room temperature for access (playback)
purposes. Long-term storage areas are typically kept at much lower temperatures
and relative humidity levels: for instance, Byers’ ‘consensus of several
reliable sources on the prudent care of CDs and DVDs’ (2003, p.1) recommends
long-term storage levels as 18°C and 40% RH, but ‘a lower temperature
and RH … for extended-term storage’ (2003, p.vi).
Careful handling of digital storage media also prolongs their life. For CDs
and DVDs, for instance, careful handling minimizes the possibility of data loss
resulting from scratching the data layer. Written guidelines and procedures for
handling digital storage media help to ensure correct handling. For CDs and
DVDs these procedures could include statements about handling the disks by
their centre hole and outer edge, not touching the disk surface, returning them
to storage cases immediately after use, marking or labeling disks in a non-
harmful way, not attempting to bend the disks, and cleaning them as seldom as
possible, and then using approved methods only (Byers, 2003, pp.vi,19,25-26;
Iraci, 2010, p.4).
Figure 7.2 summarizes recommendations derived from several sources
about the storage and handling of digital storage media. It is not exhaustive.

Storage areas        Keep free of smoke, dust, dirt and other contaminants
                     Store magnetic media away from strong magnetic fields
                     Keep cool, dry, dust-free, stable and secure
                     Minimize light levels, especially direct sunlight
                     Prohibit smoking and eating in the storage area
                     Minimize the threat from natural disasters
                     Provide appropriate enclosures for media to afford additional protection
                     Store in appropriate conditions any non-digital accompanying
                       materials such as operating instructions and codebooks
Monitoring           Monitor environmental conditions on a regular basis
                     Monitor media condition on a systematic basis
Handling practices   Handle with care
                     Minimize the handling and use of archival media
                     Establish guidelines and procedures for acclimatizing media if
                       moving them from significantly different storage conditions
                     Do not place adhesive labels on optical disks
                     Follow recommended guidelines for labeling and marking
                     Document the contents of the media, when created, and their
                       frequency of use
Quality of media     Choose digital media with their longevity in mind
and equipment        Use high quality equipment
                     Keep access devices well maintained and clean
Disasters            Prepare for accidents
Figure 7.2: Storage and Handling of Digital Storage Media (From Digital Preservation
Coalition, 2008, pp.104,139; Howell, 2001, p.144; Iraci, 2010, p.4; Ross and
Gow, 1999, p.43)

Durable/persistent digital storage media

Encouraging the manufacture and use of durable/persistent digital storage media
is based on the idea that by using storage media that last longer and store more
data, the number of times refreshing (copying of data to new media) is required
and the number of media items to be handled are reduced, thereby producing
economies through efficiencies. This approach is a non-solution because it does
not address technological obsolescence. It does, however, have a place among
the practices used in digital preservation in that it buys time while other strategies
and practices that will make long-term preservation viable are evaluated
or developed and then implemented. The lifespan of consumer digital storage
media is typically short. One of the reasons for this short lifespan is the economics
of the storage media manufacturing industry; another is that the media are
continuously evolving.
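Refreshing, as described above, is simply the copying of data to new media. One way it might be scripted is sketched below in Python: a file is copied to a new location, and the copy is accepted only if its SHA-256 digest matches that of the source. The function names and the choice of SHA-256 are assumptions made for this illustration, not a procedure prescribed by the sources cited in this chapter.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_file(source, destination):
    """Copy source to destination and confirm the bit-stream is unchanged.

    Returns the digest of the verified copy; raises IOError on a mismatch.
    """
    source, destination = Path(source), Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    before = sha256_of(source)
    shutil.copy2(source, destination)  # copies content and basic file metadata
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"copy of {source} does not match the original")
    return after
```

Such a check confirms only that the bit-stream survived the copy. As this section stresses, it does nothing about the obsolescence of the formats, or of the hardware and software needed to read them.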
Despite the limited use of strategies and practices aimed at increasing the
life and efficiency of digital media, they are useful tools. The lifespan of digi-
tal storage media is well enough known for us to be aware of the risks we take
when we use these media for preservation purposes. This awareness should
make us wary of some of the claims made by the media manufacturers, which
‘tend to reflect the exuberance of scientists compounded by the hype of their
marketing teams’ (Ross and Gow, 1999, p.iii). Attention to the selection of
high-quality media for preservation purposes (gold CDs perhaps – claims are
common that archival gold disks will last in excess of 100 years) extends the
period of time before refreshing is needed and helps reduce data loss as the
result of media deterioration, but that is all it does. Relying on strategies and
practices that focus on media storage has ‘the potential for endangering con-
tent by providing a false sense of security’ (Kenney et al., 2003). There is still
a considerable lack of awareness about the real issues of digital preservation.
A typical example is an advertisement for the ‘Amazing Century Disc’ which
offers a long-lived optical disk but makes no mention of technological obsolescence
concerns, such as the unavailability of drives or software drivers in the
future (www.centurydisc.com/faqs/CenturyDisc_FAQ.html). As noted in Chapter 2,
‘the myth that long-lived media equals long-lived preservation is still
worryingly popular’ (Abbott, 2003, p.10).
Research into digital storage media continues. Typically it is concerned
with developing media with greater storage densities, but some focuses on
long-lived media. One example is HD Rosetta technology, explained as nickel
plates onto which document images are micro-engraved using an ion beam;
viewing with optical magnifiers is required (www.norsam.com/rosetta.html).
Despite the benefits that improvements in digital media may produce, it is
doubtful that there will be sufficient benefit to warrant serious attention. There
is, perhaps, an analogy with the efforts made by the library and publishing
industries, archivists, and authors to influence the manufacture of paper so that
it was less acidic and longer lasting (Harvey, 1993, pp.192-194). In that case
the benefits were significant. For digital materials, the outcome of this way of
thinking (which is pre-digital preservation paradigm thinking) is considerably
less productive.

Analogue backups

Making analogue backups is a non-solution in the sense that it preserves only
some of the attributes of the digital materials and drops what Rothenberg refers
to as ‘their core digital attributes’, such as their machine-readability, interac-
tive abilities, and other aspects of their functionality (Rothenberg, 1999a, p.9).
This approach is most commonly discussed in terms of making copies of the
information content by printing to paper or transferring to microfilm. Early
attempts to preserve digital materials often focused on copying them from
relatively unstable digital storage media to formats known through extensive
experience to be more stable, such as paper and microfilm. Hedstrom remarked
in 1998 that it was ‘probably the most commonly used preservation strategy’
because, lacking ‘more robust and cost-effective migration strategies’ at that
time, copying to more stable formats, even if they were analogue, offered advan-
tages as a method of last resort (Hedstrom, 1998, p.194). Although it has largely
been discounted as a viable preservation approach, it may on occasion still be
useful. An Australian digital preservation specialist interviewed in 2004 com-
mented:

when we first started thinking about preserving web sites, and capturing them was at
that stage seen to be too difficult, I considered for a time just at least printing out all the
web pages that we could identify … on good colour printers and storing them as repre-
sentations of at least what was on the Web in 1996 … [but] I thought that was inappro-
priate as a form of preservation … because it had to be digital. Well in fact now I really
rue that decision because … we would have had at least some record of what was
happening in 1996 on the Web.

Making analogue backups or copies is defined more specifically as ‘converting
certain valuable digital resources to a stable analogue medium’ (Digital Preser-
vation Coalition, 2008, p.118), thus ‘shifting the preservation burden to an ana-
logue copy in place of the digital object’ (UNESCO, 2003, p.142). The stable
analogue media suggested are paper (usually ‘permanent’ paper – paper that
meets the requirements of ISO 11108:1996) or microfilm (silver halide produced
to preservation standards).
Making analogue backups may have value as a preservation practice for a
limited number of categories of digital materials. For these categories, it pre-
serves the information content in a form free from the threat of technological
obsolescence because it either does not need equipment to access it (as in paper)
or requires only relatively simple equipment (such as a microfilm reader).
These analogue formats are likely to remain accessible for some hundreds of
years if they are produced to preservation standards (such as the standards
defined for ‘permanent’ paper or archival-quality microfilm) and stored in
appropriate conditions. However, this approach has significant limitations, as
already noted; it does not preserve the digital characteristics of the materials it
is applied to. For instance, there is a loss of functionality, such as the ability to
carry out calculations in spreadsheets, to search, or to make lossless copies.
Because of its resource requirements, it is only applicable to small-scale opera-
tions. For these reasons, making analogue copies is best considered as an interim
just-in-case strategy that is limited in its application to a narrow range of digital
materials – those for which loss of functionality is not important. Case studies
of institutions where microfilm is part of an integrated preservation programme
are provided by Brown et al. (2011).

Digital archaeology and digital forensics

Digital archaeology, often referred to as data recovery, denotes a set of tech-
niques that are applied as a last resort and is, therefore, not a solution in any
sense. The term applies to ‘methods and procedures to rescue content from
damaged media or from obsolete or damaged hardware and software environ-
ments’ and involves ‘specialized techniques to recover bitstreams from media
that has been rendered unreadable, either due to physical damage or hardware
failure’ and is ‘explicitly an emergency recovery strategy’ (Kenney et al.,
2003). Special facilities, equipment and expertise are required, and it is usually
carried out by specialized data recovery companies whose expertise has estab-
lished that it is possible to recover data from a wide range of media types. The
recovery of the data does not necessarily lead to recovery of the ability to under-
stand those data, although it is a necessary pre-condition for it.
The UNESCO Guidelines caution us against relying on digital archaeology
and remind us that it is ‘a very unreliable and high-risk substitute’ for a current
and active preservation programme (UNESCO, 2003, p.144). For one thing, it
is expensive, and most materials would not justify the costs of their recovery.
But more important, it is not reliable; there is no guarantee that digital materials
can be recovered, and, even if the data are recovered, there is no certainty that
they will be intelligible (Digital Preservation Coalition, 2008, p.111; UNESCO,
2003, p.144).
Ross and Gow’s 1999 study Digital Archaeology: Rescuing Neglected and
Damaged Data Resources suggests that data recovery should be unnecessary if
‘good disaster planning’ is in place, but this is very rarely the case. They note
(and this is the sting in the tail) that ‘with sufficient resources much material
that most of us would expect to be lost can be recovered’ (Ross and Gow,
1999, p.iii). Their detailed report is essential reading for further information
about digital archaeology and its techniques.
Digital forensics is, at the time of writing, the new kid on the block. It is
related to digital archaeology in that the original technology is needed in order
to make a copy of the files from the original storage device in the form of a
disk image. Digital forensics was developed in law enforcement and computer
security environments and is routinely used in these fields and others, such as
counter-terrorism, to locate data on digital storage media and authenticate
them so they meet standards of admissibility for legal purposes. This clearly
has strong resonances with the requirements to preserve authenticity in digital
preservation settings. Digital forensic techniques are based on capturing disk
images that are exact copies of information on digital media, including hidden
files and files that record changes to the information. This allows capturing of
contextual information that is often vital to preserving the authenticity of digital
materials. The methods and tools developed and used by digital forensics experts
are being adapted and applied in cultural heritage environments. Digital foren-
sics laboratories have been established at some larger archives and libraries,
including the Stanford University Libraries’ digital forensics laboratory
(lib.stanford.edu/digital-forensics) and BEAM (Bodleian Electronic Archives and
Manuscripts) (www.bodleian.ox.ac.uk/beam). The FIDO (Forensic Investigation
of Digital Objects) project (fido.cerch.kcl.ac.uk) is investigating the applica-
tion of digital forensics to archives in the university sector. Digital forensics
techniques are likely to become a standard part of the digital preservation toolkit.
Kirschenbaum, Ovenden and Redwine (2010) provide an excellent introduction
to the application of digital forensics to cultural heritage materials in digital
form.

‘Preserve technology’ approaches


Approaches located at the ‘preserve technology’ end of the digital preservation
spectrum attempt to keep data in specific logical or physical formats and use
technology originally associated with those formats to access the data and re-
produce the objects (Thibodeau, 2002). That is, they aim at as little change as
possible, preferably none, to both the digital materials (which remain exactly
as received) and the software, operating system and hardware that these mate-
rials were originally developed to be rendered on. Strategies and practices at the
extremes of the ‘preserve technology’ end of the spectrum tolerate no change
at all. Further along the spectrum lies emulation of old technology, where
change is permissible but the original conditions are maintained as far as
possible.
Of the two approaches noted in this ‘preserve technology’ category, one –
emulation – is commonly considered as viable for long-term preservation, but
the other – maintain original technology – is not. The UNESCO Guidelines
take the view that technology preservation is a short-term strategy and classify
emulation as medium- to long-term (UNESCO, 2003, p.131).

Technology preservation

The ‘preserve technology’ approach to digital preservation is also referred to
by other terms which, taken together, define its scope: preserve technology;
maintain old technology; maintain museums of working computing equipment,
software and documentation; and the related strategy ‘technology watch’. This
approach requires that obsolete hardware and software are maintained in working
condition so that the digital materials retained by an institution can be accessed.
It has a long history, being the most immediately obvious approach to take
when obsolescence is imminent, but experience with this approach has shown
that it is an expensive – ultimately too expensive – approach. This experience
has also indicated that, given the large number of formats, hardware and soft-
ware that become obsolete (a number that is increasing over time), it is not a
viable option for anything but the short term. Its only role is as a first line of
defence:

it is the most basic, and in some ways the most important first step in preserving access
if no other strategy is in place. If the hardware and software required for access are dis-
carded before other strategies are available, it may be effectively impossible to provide
later access without expensive and uncertain data recovery work (UNESCO, 2003,
p.131).

Technology preservation is ‘ultimately a dead end’ because obsolete technol-
ogy cannot be maintained in a functional condition indefinitely. At best it can
‘extend the window of access for obsolete media and file formats’. Technology
preservation is heavily resource-intensive (Kenney et al., 2003) and the ex-
perience of preserving technology in audiovisual archives over many decades
provides clear evidence of this. Edmondson (2004) provides a starting point
for further investigation into its application in audiovisual archives.
There are several reasons why preserving technology is helpful. It allows
for complete preservation of authenticity, both of the digital materials (which
are unaltered), and of the technology and software platforms that render them
(which are also not changed); the functionality of the original and the way the
original performs are retained. Preserving technology provides the benefit of
buying time, by delaying the need to apply practices that will ensure long-term
preservation. If documentation for the technology is also preserved, this will
be of use in implementing other strategies such as emulation in the future.
But there are many less positive factors of technology preservation. It be-
comes more and more expensive as replacement parts for hardware and the
expertise to maintain it become scarce, both of which may happen over a short
period of time. Although technology preservation allows us to extend the time
we can access obsolescent and obsolete digital materials, it does so for only a
short period of time, perhaps as little as five years. To be effective this approach
requires access to documentation, such as software and hardware manuals,
which also need to be procured and managed.
To work effectively, the ‘preserve technology’ approach has specific require-
ments, which are articulated in the UNESCO Guidelines. It requires that the
hardware and software that will be needed to provide access are actively iden-
tified and that their path towards obsolescence is monitored. (This is discussed
in the ‘Technology watch’ section below.) Active and continuing arrangements
to maintain hardware and to license software should be put in place. The exper-
tise needed to maintain equipment and software should ideally not reside in one
person, but should be shared among several people (UNESCO, 2003, p.132).
It is worth reiterating that technology preservation is useful in the suite of
strategies and practices for digital preservation, but it must also be noted that it
can be no more than an interim strategy because of its limitations. It is good
practice after a technology upgrade to preserve the last generation of technol-
ogy for a short period.
Some libraries and archives maintain working pieces of old technology.
An example from Cornell University, where a service was offered to migrate
digital objects in obsolete file formats and on obsolete media into formats and
media that are more preservation-friendly, illustrates the kinds of old technology
that were needed (Entlich and Buckley, 2006).
Museums of computing have been established, such as the Computer His-
tory Museum (www.computerhistory.org), the National Museum of Computing
(www.tnmoc.org) and the Personal Computer Museum (www.pcmuseum.ca),
but maintaining working computers to support the preservation of digital mate-
rials is unlikely to be their primary interest. An earlier commentator envisaged
that a widespread industry would develop, ‘the equivalent of Kinko’s where they’ll
have every ancient computer available’, whose customers would drop in to read
and copy their old data files (Peter Schwartz, quoted in Hafner, 2004), but this
has not happened.

Technology watch

A strategy related to technology preservation is technology watch. Because the
process of technology obsolescence is rapid and storage media and file formats
can change equally rapidly, there is a need to monitor the rate of change so that
access to digital materials in danger of being lost because of this obsolescence
can be maintained. The process of technology watch acts as a trigger to action,
to ensure that equipment is maintained until another preservation practice is
applied to at-risk material or until digital objects are migrated to a more stable
storage medium.
The process is simply described: first, identify at-risk technology in a col-
lection; next, monitor its rate of obsolescence at regular intervals; finally, take
action when a technology is in danger of no longer being supported. At the
institutional level, a first step is likely to be a survey of digital materials present
in a collection to identify the formats, media, and software requirements pre-
sent. A risk assessment approach from which a plan of action is developed
may be useful. McLeod (2008) illustrates how risk management principles
were applied at the British Library to identify strategies for mitigating the risks
identified for digital storage media.
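The three steps described above (identify, monitor, act) can be sketched in code. The following Python fragment is an illustration only and is not a tool described in any of the sources cited here: the Holding record, its fields, the two-year warning window and the status labels are all assumptions made for the purposes of the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Holding:
    """One technology dependency found by a collection survey (hypothetical fields)."""
    name: str                      # a file format, storage medium or application
    items: int                     # number of objects in the collection that depend on it
    support_ends: Optional[date]   # expected end of support, or None if not known


def review(holdings, today, warning_days=730):
    """Classify each holding and return (name, status) pairs, most urgent first.

    The 730-day warning window is an arbitrary assumption, not a recommended value.
    """
    rank = {"act now": 0, "at risk": 1, "unknown": 2, "ok": 3}
    results = []
    for holding in holdings:
        if holding.support_ends is None:
            status = "unknown"          # needs investigation before it can be monitored
        else:
            remaining = (holding.support_ends - today).days
            if remaining <= 0:
                status = "act now"      # support has already ended
            elif remaining <= warning_days:
                status = "at risk"      # plan migration or another practice
            else:
                status = "ok"
        results.append((holding.name, status))
    # sorted() is stable, so holdings with the same status keep their survey order
    return sorted(results, key=lambda pair: rank[pair[1]])
```

Run at regular intervals against an up-to-date survey, a routine of this kind provides the trigger to action that technology watch is intended to supply; the judgement about when support will end still has to come from people monitoring the technology.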
The Digital Preservation Coalition publishes a series of Technology Watch
Reports, available from its web site
(www.dpconline.org/advice/technology-watch-reports), as does the Digital
Curation Centre (www.dcc.ac.uk/resources/briefing-papers/technology-watch-papers).

Emulation

The battle lines were drawn in the 1990s between migration and emulation as
the preservation strategy most likely to succeed. In the event neither has domi-
nated, as we learn to place less trust in a single-strategy salvation and to develop
ways of working and thinking that accommodate several approaches simulta-
neously.
Emulation is based on the principle that it is possible to imitate obsolete sys-
tems on current systems using software ‘that makes one technology behave as
another’ (UNESCO, 2003, p.140). For example, an obsolete operating system
can be imitated so that digital objects are rendered and programs executed
without any change to them on a current computer. This is instead of either
changing the digital objects so they can be read on a current system or locating
a working version of the obsolete operating system and the hardware on which
it runs. Emulation reduces the need to keep old hardware working.
Emulation is a well-established principle in the computing industry and
will be familiar to most computer users. Emulation can be of operating systems:
for example, Windows Virtual PC, which allows older versions of the Windows
operating system to be run in Windows 7 (www.microsoft.com/windows/virtual-pc),
and VMware Fusion (www.vmware.com/products/fusion), which allows
Windows to be run on an Apple computer running the Mac OS X operating
system. It can be emulation of hardware platforms: for example, Kaypro or
Apple II machines can be emulated on a PC; or of computer chips
(visual6502.org). It can be emulation of software applications: for example, arcade
games emulated through the MAME project (mamedev.org). Printers are often
designed to emulate Hewlett-Packard printers. Terminal emulation is also
common, so that a PC, for example, can be used as a terminal connected to a
larger computer; such terminals were once very common in mainframe computer
environments. Although emulation is often associated with computer games,
such as emulators that allow Sony PlayStation games to be played on a PC, it
is an essential part of all areas of computing. The web is a fruitful source of
information about emulators.
Both hardware emulation and software emulation have been experimented
with to determine their feasibility for preserving digital materials. Emulation
of hardware is attractive because the result is applicable to a wide range and
large volume of digital objects. The same widespread applicability applies to
emulation of operating systems. Emulation of software applications is less
favoured, because it is more limited in its use and the effort and level of skill
required to develop a complex piece of emulator software that can be used for
only a small number of digital materials may be too high for it to be an option
in most preservation situations.
Emulation has been investigated in several major projects, most notably
the CAMiLEON (Creative Archiving at Michigan and Leeds) project which
investigated emulation, testing available emulators and constructing an emula-
tor for the BBC Domesday Project. Among its conclusions was that ‘emulation
is not necessarily superior to migration for preserving the original look and
feel of complex digital objects’. (The project’s web site
(www2.si.umich.edu/CAMILEON) provides more information.) However, more research was
needed, as this was a study of limited scope (Hedstrom and Lampe, 2001).
Project NEDLIB (Networked European Deposit Library) ran from 1998 to 2000
and was based at the Koninklijke Bibliotheek in The Hague. One of its activities
was to conduct an experiment using commercial emulation tools to investigate
the feasibility of emulation for digital preservation. This experiment, con-
ducted by Rothenberg, concluded that emulation should work in principle, but
that further investigation was required to demonstrate that it can also work in
practice (Rothenberg, 2000).
Jeff Rothenberg’s name is firmly linked with emulation as a digital preser-
vation strategy. He has been a strong proponent of emulation as the only digital
preservation strategy that is likely to be effective. His argument is that authenticity of a
digital object can only be ensured if it is run on the software with which it was
created, because it is unrealistic to expect software in the future to behave in
exactly the same way as the original. Rothenberg points out that all digital
materials depend on software and that many new kinds of digital materials are
‘inherently digital’ and ‘cannot be meaningfully represented as page images’
(Rothenberg, 2003). This means that preservation strategies such as saving
page images are of little use for the preservation of much digital material.
Emulation, he contends, is the only preservation approach that has multiple
advantages and capabilities, among them preserving executable digital objects
(objects in which software programs are embedded), providing a ‘single, con-
sistent way’ of preserving all kinds of digital materials, reducing the effort ex-
pended in preserving individual artifacts (except for the necessary effort of
copying the bit-stream onto new media), and minimizing the need to under-
stand record formats. Despite his strong advocacy of emulation, Rothenberg
suggests that a mixed strategy approach is most feasible, with emulation used
‘if original behavior is needed; or as a cheap backup, to preserve everything’
(Rothenberg, 2003).
Other authorities also suggest that emulation has potential. It is one of only
two primary strategies (those suitable for medium to long-term preservation of
digital materials) noted, the other being migration, in the Digital Preservation
Coalition’s handbook (2008, pp.111-114). This source suggests that it has the
advantage over other methods of recreating the look and feel and functionality
of the original digital material, as well as the potential for avoiding the high
costs associated with repeated migration. Emulation is judged to have good
prospects for preserving complex digital materials (Digital Preservation Coali-
tion, 2008, p.113). The UNESCO Guidelines note that emulation is already
well established and understood in computing, that many emulators already ex-
ist for a variety of hardware and software platforms, and that it has the potential
to ‘allow a range of digital objects to be recreated with full functionality, in-
cluding software objects, using the original, untransformed data stream in
combination with original preserved software’ (UNESCO, 2003, p.141).
Not all share Rothenberg’s enthusiasm. The arguments against emulation
form a list as long as the arguments in its favour (Digital Preservation Coalition,
2008, p.114; UNESCO, 2003, p.141). Chief among them is that emulation has
not been sufficiently tested in practice. Also high on the list of disadvantages is
the high cost of developing emulators, which may be greater than the costs of
repeated migration, because it requires high levels of expertise to write com-
plex software. There is some scepticism about the ability of emulation to do all
that is claimed, and it may not be possible to emulate fully all of the functionality
of the original, nor all of its look and feel. The lack of adequate documenta-
tion of hardware and software may frustrate emulation attempts. Copyright
issues associated with ownership of software code may impede emulator devel-
opment. Users may have problems in interacting with old applications, and
there may be a need to either migrate the emulators themselves or emulate
the emulators.
If emulation is to be applied, then the requirements are many and varied.
Appropriate expertise is, of course, essential. Documentation of the systems to
be emulated needs to be comprehensive and accurate. The emulation software
should be written in open-source code and should follow best industry prac-
tice, which includes thorough documentation.
Probably the most widely reported emulation project carried out for preser-
vation purposes to date is the BBC Domesday Project. Abbott (2003), Darling-
ton, Finney and Pearce (2003), Mellor (2003) and Domesday Reloaded (2011)
are just some of the reports on this project. The CAMiLEON web site
(www2.si.umich.edu/CAMILEON/domesday/domesday.html) is also a useful source.
The original Domesday Project, undertaken from 1984 to 1986, surveyed the
UK to celebrate the 900th birthday of the Domesday Book. It cost about £2.5
million and involved about one million school children from 14,000 British
schools. The resulting images and text were recorded on two 12-inch video-
disks that were accessed using a LV-ROM (LaserVision Read Only Memory)
player attached to a BBC Master computer with additional software and hard-
ware (Abbott, 2003, p.7). As part of its contribution to the project to preserve
the Domesday Project, the CAMiLEON Project developed an emulation of the
original Domesday system hardware.
The process of developing this emulation involved migrating the data files
from the videodisks to current media and developing software that emulates
the BBC Master computer and the laserdisk player (Mellor, 2003). Image files
were re-digitized from the original one-inch analogue videotapes (Darlington,
Finney and Pearce, 2003). The need to avoid the obsolescence of the emulation
software was kept in mind; ideally, the operation of the software should not be
limited to any specific operating system or type of computer so that it will be
easier to run on future computers. In the end, however, the BBC Domesday
emulator that was developed ran only on the Windows operating system (Mellor,
2003, p.8). A separate initiative resulted in a web version that was available
until 2008 (domesday1986.com). In 2011 the BBC made available a full extraction
of the community disk on the Domesday Reloaded web site
(www.bbc.co.uk/history/domesday). A second disk, the national disk, is not available, but
has been handed over to The National Archives for long-term preservation.
Significant lessons were learned from the BBC Domesday Project. This
emulation project was hampered by lack of documentation and software to test
the emulation, but was fortunate in that the original system was still available
and functioning. This allowed the developers to ‘compare with and validate the
migrated system’, which has special significance in a multimedia system ‘where
the look-and-feel and user interaction is important’ (Darlington, Finney and
Pearce, 2003). Wheatley, one of the team who worked on this project, summa-
rizes the issues:

Most of the really difficult problems we faced were due to the long time gap between the
creation of Domesday and its preservation. If we had conducted the rescue 10 years
earlier it would have been far easier. The timeliness of preservation work is a crucial
issue that Domesday really underlines. Would we be able to rescue Domesday if we left it
another 10 years? I’m sure we could, but it would be at far greater expense (Abbott, 2003,
p.10).

Emulation is likely to play a significant role as a major preservation strategy. It
has been sufficiently well tested to be adopted as one key strategy of the Plan-
ets approach to digital preservation, resulting in the prototype GRATE (Global
Remote Access to Emulation) service which provides access via the web to sev-
eral centrally hosted emulators, such as Dioscuri (dioscuri.sourceforge.net) and
QEMU (wiki.qemu.org) (Suchodoletz et al., 2010). Dioscuri is an emulator of x86
hardware, able to run MS-DOS, developed specifically for preservation purposes. It is open-source
and portable, running on any computer platform that supports the Java Virtual
Machine.
There is still a need to allocate significant resources, probably through
collaborative action, to develop a range of emulators. Emulation is, however,
unlikely to be the single solution to digital preservation as has been advocated.
Nor is it likely to replace migration as a primary digital preservation strategy,
because emulators will themselves need to be migrated. Holdsworth and
Wheatley (2001) remind us that emulation

should not be over-sold as the answer to all digital preservation issues. It is just part of
the armoury necessary for defending our digital heritage against the ravages of time in
a world where innovation (and hence change) is highly prized.

There are plentiful signs that emulation will be used increasingly as more emu-
lators are developed for specific categories of digital materials, such as com-
plex digital materials, or for digital objects containing executable software that
will only run on particular hardware, or for digital materials that need to be
viewed in their original environment. Its use at the Nationaal Archief of the
Netherlands is described in a recent case study (Helwig, Roberts and Nimmo,
2010).

The Universal Virtual Computer

Related to emulation is the Universal Virtual Computer (UVC). A UVC is
a software program that is written in simple machine language to emulate
the basic architecture of a computer. It is based on the concept that this basic
architecture can be implemented on any future computer because the UVC is
completely independent of the architecture of the computer on which it runs.
The UVC is created at the point when a digital object is accessed from an
archive. It is

constructed independently of any existing hardware or software, so that it is independent,
too, of time. It would simulate the basic architecture that every computer has had since
the beginning: memory, a sequence of registers, and rules for how to move information
among them (Tristram, 2002, quoting Raymond Lorie).

The simplicity of the Universal Virtual Computer ensures that it will be ‘rela-
tively easy to write an emulator for [it] in the future on the real machine being
used at that time’ (Lorie, 2002, p.1). The Universal Virtual Computer was
developed by the Koninklijke Bibliotheek and IBM who have implemented a
UVC tool that handles images in JPEG and GIF formats (Van Wijngaarden and
Oltmans, 2004). Van Der Hoeven, Van Diessen and Van der Meer (2005) describe
the tool and its operation at the Koninklijke Bibliotheek in more detail.
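The idea of a very simple machine built from memory, registers and rules for moving information among them can be illustrated with a toy interpreter. The Python sketch below is not the UVC and does not use its instruction set; the six instructions (LOADI, ADD, STORE, LOAD, JNZ, HALT), the number of registers and the memory size are all invented for this example. It is intended only to suggest why a machine this small should be straightforward to reimplement on a future platform.

```python
def run(program, registers=4, memory_size=16):
    """Interpret a tiny, made-up instruction set and return (registers, memory).

    `program` is a list of tuples such as ("LOADI", 0, 3). This is an
    illustrative register machine, not the UVC specification.
    """
    reg = [0] * registers
    mem = [0] * memory_size
    pc = 0                                  # program counter
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LOADI":                   # LOADI r, value: constant into register r
            reg[args[0]] = args[1]
        elif op == "ADD":                   # ADD r, s: reg[r] = reg[r] + reg[s]
            reg[args[0]] += reg[args[1]]
        elif op == "STORE":                 # STORE r, addr: register r into memory
            mem[args[1]] = reg[args[0]]
        elif op == "LOAD":                  # LOAD r, addr: memory into register r
            reg[args[0]] = mem[args[1]]
        elif op == "JNZ":                   # JNZ r, target: jump if reg[r] is not zero
            if reg[args[0]] != 0:
                pc = args[1]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return reg, mem


# A program for the toy machine: add 3 + 2 + 1 and store the total at address 0.
SUM_TO_THREE = [
    ("LOADI", 0, 3),    # register 0: counter
    ("LOADI", 1, 0),    # register 1: running total
    ("LOADI", 2, -1),   # register 2: the constant -1
    ("ADD", 1, 0),      # total = total + counter
    ("ADD", 0, 2),      # counter = counter - 1
    ("JNZ", 0, 3),      # repeat from instruction 3 while counter is not zero
    ("STORE", 1, 0),    # write the total to memory address 0
    ("HALT",),
]
```

In this analogy the archived program corresponds to SUM_TO_THREE, which never changes, while run is the only part that would need to be rewritten for each new generation of computer.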

The significance of a UVC for preservation is that it provides a single
platform, with the potential for minimizing the amount of effort required to
handle diverse combinations of hardware and software. For example, it mini-
mizes the need to migrate digital objects (including emulation software). The
Planets Project investigated the use of UVC as one of the preservation services
it offered. If the Universal Virtual Computer is more widely implemented, it
represents a valuable strategy although it does require considerable investment
at the beginning of the preservation process (UNESCO, 2003, p.130).

Conclusion
This chapter has described a number of strategies and practices currently applied
or being tested for digital preservation. They are all focused on the principle
that alteration of digital materials must be kept to a minimum and that the
technology (hardware and software) to access these materials is kept operational
or is emulated. Of the range of strategies and practices noted here, some are
interim measures and only one – emulation – is generally considered to be viable
for the long-term preservation of digital materials. The next chapter considers a
range of approaches from the other end of the preservation spectrum: ‘preserve
objects’.
Chapter 8
‘Preserve Objects’ Approaches: New Frontiers?

Introduction
Handling file formats is an essential part of a long-
term archive … most file formats fall out of fashion
within a few decades, and unless action is taken at an
early stage, many archived files will be incomprehen-
sible blocks of bits (Clausen, 2004)

The previous chapter and this chapter are structured around three categories of
preservation strategies and practices, derived from typologies of Thibodeau
and Rothenberg:

1. ‘Non-solutions’
2. ‘Preserve Technology’ approaches
3. ‘Preserve Objects’ approaches.

This chapter examines the third category, the ‘Preserve Objects’ approaches
which are at the opposite end of Thibodeau’s spectrum of digital preservation
methods from ‘Preserve Technology’ approaches. Like Chapter 7, it relies heav-
ily on two key sources: Preservation Management of Digital Materials: The
Handbook (Digital Preservation Coalition, 2008) and the UNESCO Guidelines
for the Preservation of Digital Heritage (UNESCO, 2003).

‘Preserve Objects’ approaches


The ‘preserve objects’ approach attempts to preserve the essential characteristics
of digital materials without reliance on specific hardware or software. That is,
the digital materials may be altered, although it is preferable if they are not, but
only to the extent that is required to make them able to be rendered by current
and future technology (hardware and software) without compromising the
‘essence’ or essential elements (discussed in Chapter 5) of that material. These
essential elements are ‘defined explicitly and independently of specific hardware
or software’ (Thibodeau, 2002). The ‘preserve objects’ approach can be con-
trasted with the other end of the spectrum of digital preservation methods, the
‘preserve technology’ approach, where, as stated in Chapter 7, as little change
as possible – preferably none – occurs to both the digital materials and the
software, operating system, and hardware that these materials were originally
developed to be rendered on. The strategies and techniques encompassed by
the ‘preserve technology’ approach attempt to keep data in specific logical or
physical formats and to use technology originally associated with those for-
mats to access the data and reproduce the objects (Thibodeau, 2002). There are
other positions along the spectrum between these two poles; for example, in the
middle are ‘methods that migrate data formats as technology changes, enabling
use of state-of-the-art technology for discovery, access, and reproduction’ (Thi-
bodeau, 2002).
This chapter is concerned with the range of strategies and practices that
seek to accommodate changes in technology without compromising the digital
materials themselves or the ability to render their essential elements accurately
and meaningfully. These strategies and practices are grouped for the purposes
of this chapter as:

– those based on copying the bit-stream (backup and restore, bit-stream
copying, refreshing, replication, mass digital storage systems)
– those centered around data formats (standard data formats, normalization,
developing archival file formats)
– migration, which requires the bit-stream to be copied, and for which sig-
nificant knowledge about data formats is required (including migration on
demand and the use of viewers)
– encapsulation, which is based on packaging the digital object with meta-
data and other associated information
– combinations of strategies and practices.

Migration is considered as a primary strategy: one ‘which might be selected …
for medium to long-term preservation of digital materials’ (Digital Preserva-
tion Coalition, 2008, p.111). While migration is in widespread use, it is often
combined with other strategies and practices from the groups noted above, parti-
cularly with normalization (restricting the range of file formats that the archive
agrees to receive and manage). Other strategies described in this chapter, such
as normalization and encapsulation, are typically used in conjunction with
migration; they are secondary strategies – ‘those which might be employed in
the short to medium term either by the repository with long-term preservation
responsibility and/or by those with a more transient interest in the materials’
(Digital Preservation Coalition, 2008, p.111) – which may precede, ‘substan-
tially defer the need for’, or strengthen primary strategies. Some strategies noted
in this chapter are investment strategies, as they are named in the UNESCO
Guidelines’ classification of digital preservation strategies (UNESCO, 2003,
pp.122-141); they primarily involve ‘investment of effort at the start’. Encapsu-
lation and restricting the range of formats to be managed fall into this category.
Other strategies noted in this chapter (backwards compatibility and version
migration, and migration) are short-term strategies, those ‘likely to work best
over the short-term only’. Two of the strategies and practices included in this
chapter, migration and using standards, especially standards for data structuring,
are ‘current front-runners as long-term strategies’ that have been shown to work
over time (UNESCO, 2003, pp.120-121).

Bit-stream copying, refreshing, and replication


Some definitions are called for. Bit-stream copying is the process of making an
exact duplicate of a digital object (Kenney et al., 2003). Refreshing is the
copying of data onto new media. Replication, also known as redundancy, refers
to keeping multiple copies. All of these processes involve making copies, but
each differs significantly from any of the others.

Bit-stream copying

Procedures for bit-stream copying are well understood. In making copies of a
bit-stream, well-established techniques, such as checksums, are applied to
ensure that data are copied accurately. Regular backup routines are a key
component of the operation of any computing facility. They are commonly
practised as backup and restore, that is, backing up (copying) computer files
on a regular basis and restoring them if the data in the primary source (the files
from which the backup copy or copies are made) are corrupted or destroyed.
The backup files are often stored at sites away from the main site of operation,
which offers additional security in the event that one site is affected by disaster.
This is usually referred to as remote storage.
Although it is necessary for all digital preservation strategies, bit-stream
copying is not a long-term strategy because it does not address the key factors
that cause digital deterioration, mainly obsolescence of hardware and software.
It is effective as a risk management technique to minimize the possibility of data
loss through failure of hardware, media deterioration, sabotage, natural disasters
or other events. It should be considered as ‘the minimum maintenance strategy
for even the most lightly valued, ephemeral data’ (Kenney et al., 2003).
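
The checksum-verified copying described in this section can be sketched in a few lines of Python. This is a minimal illustration only, assuming SHA-256 as the checksum and local file paths; the function names `sha256_of` and `copy_and_verify` are hypothetical and are not taken from any preservation system.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target: Path) -> str:
    """Copy source to target, then confirm both files have the same checksum.

    Returns the checksum so that it can be recorded alongside the copy.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    actual = sha256_of(target)
    if actual != expected:
        raise IOError(f"checksum mismatch copying {source} to {target}")
    return expected
```

Recording the returned checksum with the backup copy allows the same comparison to be repeated later, for example before restoring from the backup.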

Refreshing

Whereas bit-stream copying is carried out so that a backup copy exists in case
there is a problem with the primary source files, refreshing takes a longer-term
view. It refers to copying data from one storage medium to another of the same
type, for example, from a DAT tape that is becoming unstable to a new DAT
tape. The bit-stream is not altered and well-established techniques, such as
checksums, are applied to ensure that the data are copied accurately. As with
bit-stream copying, refreshing is ‘a necessary component of any successful digi-
tal preservation program’ but does not address issues of obsolescence (Kenney
et al., 2003).

Replication

Keeping multiple copies – replication – is a long-established preservation
technique in libraries, valued for its ‘built-in high redundancy and protections against
loss of information through vandalism, theft and disaster’ (Howell, 2001, p.143).
For example, a copy of a book may be kept in secure environment-controlled
storage as a preservation copy, with other copies available for use. The term repli-
cation is used here in two ways: as a planned activity that makes copies of digi-
tal objects and stores them in different locations; and as a planned approach
based on consortia of interested institutions that has as its core peer-to-peer
checking of files. The first, also known as offline backup and offline storage, has
already been noted in this chapter in the section about bit-stream copying. The
backup files are often stored at sites away from the main site of operation, which
offers additional security in the event that one site is affected by disaster.
The consortial approach is demonstrated in the LOCKSS (Lots Of Copies
Keep Stuff Safe) initiative, in which consortium partners agree to copy and
store selected digital objects, sharing their digital objects for the purpose of
peer-to-peer comparison and repairing files that become damaged. Replication
of digital materials had been noted as a feasible preservation strategy for digital
materials before it was implemented in the LOCKSS project: Exon suggested
that mirror sites around the world could store selected digital information ac-
cording to international agreement, so that ‘when the tides of history turn against
previously stable and peaceful areas of the world, their history can be saved
elsewhere, just like the Elgin marbles’ (Exon, 1995). (LOCKSS is noted in more
detail in Chapter 9.)
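
The idea of comparing copies held by several partners and repairing a damaged copy from the others can be illustrated with a simple majority vote. The sketch below is not the LOCKSS polling protocol; it is a simplified stand-in, and the function names and the site labels used with it are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a replica's content."""
    return hashlib.sha256(data).hexdigest()


def repair_by_majority(replicas: Dict[str, bytes]) -> Dict[str, bytes]:
    """Replace replicas that disagree with the copy held by a strict majority.

    Raises ValueError when no single version is held by more than half
    of the sites, because no copy can then be trusted automatically.
    """
    votes = Counter(digest(data) for data in replicas.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(replicas):
        raise ValueError("no majority among replicas; manual review needed")
    good = next(data for data in replicas.values() if digest(data) == winner)
    return {site: good for site in replicas}
```

With three sites, one damaged copy is outvoted and overwritten by the agreed version; with only two sites that disagree, the sketch refuses to choose.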

Standard data formats


Understanding file formats is a necessary precondition for the effective use of
many digital preservation techniques. In order to access and display digital
content it is necessary to decipher the bit-stream, to learn what the information
in that bit-stream represents. Characterization, the process of deciphering the
bit-stream, allows us to determine the likely format of a digital object and
check the degree to which it conforms to the definitive specification of the
format. Characterization needs to be effective for digital preservation programs
to work efficiently.
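
One common way of determining the likely format of a digital object is to compare the first bytes of the bit-stream with known signatures (‘magic numbers’). The following sketch covers only a handful of well-known signatures and is an illustration, not the method of any particular tool; the names `SIGNATURES` and `identify` are hypothetical. It identifies a likely format only and does not check conformance to the format's specification.

```python
from typing import Optional

# Leading byte sequences of a few widely used formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def identify(data: bytes) -> Optional[str]:
    """Return the likely format of a bit-stream, or None if unrecognized."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

A file that passes this test may still be malformed, which is why identification and validation are treated as separate steps in characterization.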
File formats define how to interpret the bits in the bit-stream. They are ‘a
crucial layer, indeed a hinge between the bits in storage and their meaningful
interpretation’ (ERPATraining, 2004, p.1). If we do not know what the file
format is, digital objects remain only a collection of bits. Because they are so
critical to understanding and using digital objects, file format obsolescence is a
major challenge for digital preservation. (At least, this is the current under-
standing. David Rosenthal has argued that file format obsolescence is the ex-
ception rather than the norm, one reason being the recent widespread use of
networked services and another the maturing technology market (Rosenthal,
2010b and c). If Rosenthal’s view is valid, where we direct digital preservation
efforts and resources needs major reassessment.)
File formats would not pose so great a challenge to digital preservation if
there were fewer of them and if information about them was more accessible.
The wide range of standards and formats that need to be handled is readily
illustrated. A survey of RLG member institutions in 1998 identified at least 24
different storage formats. Of the 36 institutions with digital holdings, 24 main-
tained these in six or more different formats, with 10 or more different formats
in 13 institutions. The most common file formats were image files, text files
with mark-up, and ASCII files, with word-processing files, audio, video, and
spreadsheets also well represented (Hedstrom and Montgomery, 1999, pp.9-
10). In one week’s crawl of Danish web sites by the Danish Royal Library in
2003, 20 formats were identified from a total of 688,029 documents. Of the
total, 66.78 per cent were text files in HTML format, 19.17 per cent were image
files in GIF format, and 10.12 per cent were image files in JPEG format; that
is, 96 per cent were in HTML, GIF, or JPEG formats (Clausen, 2004, p.5).
Similar results were found in a 2006 harvest of Australian web sites for the
National Library of Australia. Almost 78 per cent were text files in HTML
format, 13 per cent were image files in JPEG format, and 5.3 per cent were image
files in GIF format, these three totalling 96.1 per cent. The next six most highly
represented formats include PDF (1.2 per cent), plain text (0.5 per cent) and PNG
(0.4 per cent) (Fellows et al., 2008, pp.140-142). The range of formats present
in a range of UK institutional repositories was surveyed by Hitchcock and
Tarrant (2011). Heavily represented were JPEG, GIF, HTML, and PDF, as well
as discipline-specific formats for specialized repositories (such as Chemical
Markup Language files in a science data repository). A wide range of other
identified file formats were also represented. Of most concern, though, was
the high number of formats for which little information is available.
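
Surveys of this kind amount to a format profile: a count of objects per format expressed as a share of the whole. A minimal sketch of such a tally follows, assuming each object has already been labelled with a format name; the function name `format_profile` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def format_profile(formats: List[str]) -> List[Tuple[str, float]]:
    """Return (format, percentage share) pairs, most common format first."""
    counts = Counter(formats)
    total = sum(counts.values())
    return [(name, 100 * n / total) for name, n in counts.most_common()]
```

Applied to a harvest, the first few entries of the profile show how concentrated the collection is in a small number of formats, and the tail shows how many rarely used formats also need to be managed.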
The problem posed by the multiplicity of formats (one estimate is more than
15,000 (Clausen, 2004, p.4); the Wikipedia entry ‘List of file formats’ lists many
of them) is compounded by their nature. Many formats are proprietary, that is,
they are the property of an owner who, for commercial reasons, is not willing
to provide access to documentation about them, and who may require a fee to
be paid for their use. Because one of the essential requirements of nearly all
digital preservation strategies and techniques is a thorough understanding of
file formats, the lack of access to full documentation about proprietary formats
presents a major barrier. By comparison, the documentation for open formats,
those that are in the public domain, is much more accessible. Consequently,
open formats are considered more favourably for use in digital preservation
applications than proprietary formats. To illustrate this point, consider the num-
ber of text document formats that the word-processing software in Open Office
(an open-source office software suite) can open: in addition to the OpenDocument
formats (.odt, .ott, .oth, and .odm), it opens formats used by earlier versions of
Open Office (.sxw, .stw, and .sxg), as well as the following:

– Microsoft Word 6.0/95/97/2000/XP (.doc and .dot)
– Microsoft Word 2003 XML (.xml)
– Microsoft Word 2007 XML (.docx, .docm, .dotx, .dotm)
– Microsoft WinWord 5 (.doc)
– WordPerfect Document (.wpd)
– WPS 2000/Office 1.0 (.wps)
– .rtf, .txt, and .csv
– StarWriter formats (.sdw, .sgl, .vor)
– DocBook (.xml)
– Unified Office Format text (.uot, .uof)
– Ichitaro 8/9/10/11 (.jtd and .jtt)
– Hangul WP 97 (.hwp)
– T602 Document (.602, .txt)
– AportisDoc (Palm) (.pdb)
– Pocket Word (.psw) (Open Office, 2010).

Figure 8.1 compares open and proprietary formats from the perspective of their
use in digital preservation.

What is it?
– Open format: Publicly shared intellectual property, usually maintained by a standards organisation.
– Open proprietary format: Privately-owned intellectual property.
– Closed proprietary format: Privately-owned intellectual property.

Availability of the format specification
– Open format: Specification published without any restrictions.
– Open proprietary format: Specification may be made available with restrictions.
– Closed proprietary format: No specification publicly accessible.

How is it developed?
– Open format: The format is developed through a publicly visible, community driven process.
– Open proprietary format: The format is developed and marketed by companies who control the way the technology is used, to better their market position.
– Closed proprietary format: The format is developed and marketed by companies who control the way the technology is used, to better their market position.

How can it be used?
– Open format: Can be used and changed by anyone without restrictions, other than licensing conditions that may limit development of commercialised versions of software.
– Open proprietary format: License holder has exclusive control of the technology to the (current or future) exclusion of others.
– Closed proprietary format: License holder has exclusive control of the technology to the (current or future) exclusion of others.

What software is needed to use it?
– Open format: Open formats are free to be implemented by anyone, including both proprietary and free and open-source software, using the typical licenses used by each.
– Open proprietary format: Generally only licensed applications are free to use this format.
– Closed proprietary format: Proprietary file formats such as Microsoft’s Windows Media Audio can only be accessed using the software that produced that file, or licensed applications.

Examples
– Open format: Portable Network Graphics (PNG); OpenDocument Format (ODF); JPEG; Extensible Hypertext Markup Language (XHTML); Tagged Image File Format (TIFF); Free Lossless Audio Codec (FLAC); Portable Document Format (PDF) – ISO 32000.
– Open proprietary format: MP3 audio; Microsoft Open Office XML – e.g. Docx (OoXML, also issued as ECMA standard 376 and ISO 29500 transitional).
– Closed proprietary format: Microsoft Word (DOC), Microsoft Outlook, Excel (XLS), PowerPoint (PPT), Photoshop (PSD), Microsoft Access, RAW image formats, WAV audio format.

Conclusion
– Open format: Open formats – which are supported by a wide range of software or are platform-independent – are recommended for use where possible. Open formats support long-term sustainability of data by allowing migration from one technical environment to another, without lock-in to a specific vendor.
– Open proprietary format: Open proprietary formats are a greater risk than open formats because they are controlled by a corporate entity under licensing arrangements which may change.
– Closed proprietary format: Proprietary formats carry greater risk to long-term accessibility of the data they hold – the lack of documentation of the specification, and licensing requirements for software means the format is less future-proof.

Figure 8.1: Formats – Open and Proprietary (Source: PARBICA Toolkit Guideline 18:
Digital Preservation)

Four main responses to the issues posed by the proliferation and complexity of
file formats can be identified: file format registries; standardizing file formats;
restricting the range of file formats handled in digital preservation systems;
and developing archival file formats such as PDF/A.

File format registries

One response to the issues presented by file formats is the establishment of file
format registries. These exist in the computer science arena and have some
applicability for digital preservation; one widely used example is the UNIX
file command in the UNIX operating system (Underwood, 2009). Specialized
registries have also been established for digital preservation purposes. They have
been developed to provide detailed and reliable information about file formats,
which are the key to many digital preservation activities. In a case study of file
formats that is still pertinent, Lawrence and his associates note that the risks
associated with migration, such as the differences between the source and tar-
get formats, are quantifiable, so they can be managed (Lawrence et al., 2000,
pp.12-13).
The kinds of file format problems that can arise as a result of migration are
illustrated in a report on missing fonts (Brown and Woods, 2011). Some soft-
ware performs font substitution without warning; Brown and Woods begin
their account with the common example of ‘PowerPoint presentations where
the slides were clearly missing glyphs (visible characters) or were otherwise
poorly rendered … the direct result of copying the presentation from the machine
upon which it was created to a machine provided for the presentation without
ensuring that the target machine has the required fonts’ (Brown and Woods,
2011, p.6). They estimate that only 79 per cent of digital documents are rendered
accurately using the fonts on a modern desktop computer; even if this can be
increased to 92 per cent without too much additional effort, that still leaves a
significant percentage that does not render accurately.
Risks of this kind can be minimized if specifications of file formats are
available. The experiments reported by Lawrence and his associates found that
‘the most difficult aspect of this project was the acquisition of complete and
reliable file format specifications’ (Lawrence et al., 2000, p.13). They concluded
that publicly available information about file formats is vulnerable because it
relies on the efforts of interested individuals, so noted that it was essential to
establish ‘reliable, sustained repositories of file format specifications, docu-
mentation, and related software’ to support migration (Lawrence et al., 2000,
p.13). The Representation and Rendering Project at the University of Leeds
reached the same conclusions; accessible file format information fell ‘far short
of what is required to successfully tackle the problems of data obsolescence’
and the accuracy of most of what was available is ‘mediocre at best’ (Univer-
sity of Leeds, 2003, p.42). It specifically recommended that, because file for-
mat information available on the web is vulnerable, it should, as a matter of
urgency, be captured and preserved in a file format registry.
There are now a number of file format registries for digital preservation
and work to develop them continues. They include PRONOM, JHOVE and
JHOVE2, the Global Digital Formats Registry and the Unified Digital Formats
Registry, the DCC/CASPAR Representation Information Registry, and a listing
provided by the Library of Congress.
PRONOM is one file format registry set up specifically to support digital
preservation. It was developed by The National Archives (UK) to provide and
manage information about file formats and about the software applications used
to render these formats. PRONOM is a response to the short life cycle of soft-
ware and to the fact that older file formats are not necessarily supported by later
versions of software, or, if they are, only with alteration of formatting or content,
a situation inimical to the faithful reproduction of electronic records. Originally
developed as an in-house tool to support the National Archives’ digital preserva-
tion activities, it was quickly made publicly accessible on the web. PRONOM
makes publicly available the specifications of file formats and their related soft-
ware, and allows searching for file format, software, vendor, and migration
pathways among other options. Associated with PRONOM is the software tool
DROID (Digital Record Object Identification). DROID was developed by The
National Archives (UK) so that searching PRONOM for file formats could be
automated. It is open-source and, because it is written in Java, is platform-
independent. PRONOM is being developed further to support a wider range of
digital preservation processes. Its database of file formats and software is being
incorporated into the UDFR (Unified Digital Formats Registry). (This section is
based on the PRONOM web site (www.nationalarchives.gov.uk/pronom).)
JHOVE (JSTOR/Harvard Object Validation Environment) was developed
by JSTOR (www.jstor.org) and the Harvard University Library as a characteri-
zation framework that provides information about and confirms file formats as
being valid and well formed. It is written in Java, so it can run on a range of
operating systems that support Java. JHOVE2 (https://bitbucket.org/jhove2/main/wiki/Home),
developed by the California Digital Library, Portico and Stanford
University, and based on input from the international digital preservation
community, enhances JHOVE and expands the range of formats it can iden-
tify (Abrams, Morrissey and Cramer, 2009).
Another file format registry developed to support digital preservation is the
GDFR (Global Digital Format Registry) set up by US partners to be ‘a distrib-
uted and replicated registry of format information populated and vetted by
experts and enthusiasts world-wide’ (www.gdfr.info). The GDFR web site pro-
vides use cases that identify some potential situations where a format repository
would be used. These include assessing the risk associated with a digital for-
mat, collection audit, validating the ingest of a digital object with a new for-
mat, monitoring for technology obsolescence, identifying rendering conditions
for a digital object, determining the appropriate migration path for a digital
object, determining the format of a given digital object, and identifying the
migration pathway for a format. The GDFR is a partner in the UDFR (Unified
Digital Formats Registry), established in 2009 in conjunction with The National
Archives (UK) and other international partners. The UDFR will be based on
the PRONOM database of file formats and the use cases developed for the
GDFR. Expected to be operational in 2012, the UDFR will provide web access
to its content, an API (an application programming interface – software that
allows different software programs to communicate and interact with each
other) to allow interaction between the Registry and local repositories, the ability
to work with DROID, and other features (www.udfr.org). A use case for the
UDFR, explained by the Library of Congress’s Chief of Repository Develop-
ment, illustrates how it is envisaged to operate:

‘Say the archive of a famous writer was written with an obsolete program, such as
WordStar, which would need to be either rendered for use, or migrated to a more cur-
rent system.’ So, a decision would have to be made on which program or tool would be
used to extract the information from the archive. … ‘UDFR will provide the documen-
tation to help make the decisions, and be incorporated into the tools themselves to
make preservation format analysis and action easier’ (Manus, 2011b).
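
The kind of lookup envisaged in this use case can be sketched as a small in-memory registry that records, for each format, whether its specification is public, which software renders it, and which migration targets are known. This is an illustration only: it does not reproduce the data model or API of PRONOM, the GDFR or the UDFR, and the class names and the sample record are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FormatRecord:
    """What a registry might hold about one file format (illustrative fields)."""
    name: str
    specification_public: bool
    rendering_software: List[str] = field(default_factory=list)
    migration_targets: List[str] = field(default_factory=list)


class FormatRegistry:
    """A toy registry keyed by case-insensitive format name."""

    def __init__(self) -> None:
        self._records: Dict[str, FormatRecord] = {}

    def register(self, record: FormatRecord) -> None:
        self._records[record.name.lower()] = record

    def lookup(self, name: str) -> Optional[FormatRecord]:
        return self._records.get(name.lower())

    def migration_pathways(self, name: str) -> List[str]:
        """Return known migration targets, or an empty list if unregistered."""
        record = self.lookup(name)
        return list(record.migration_targets) if record else []
```

A repository tool could consult such a registry before deciding whether to render an obsolete format with surviving software or to migrate it to one of the listed targets.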

Another format registry, the DCC/CASPAR Representation Information
Registry, started life as a DCC initiative to develop a registry of representation
information. It combined with CASPAR (Cultural, Artistic and Scientific
knowledge for Preservation, Access and Retrieval), an EU-funded research
project that ran from 2005 to 2009 (www.casparpreserves.eu). The Registry
was developed as part of the CASPAR Project’s Representation Information
(RepInfo) Toolbox to support the creation, maintenance and reuse of represen-
tation information. It includes file format information and is accessible and
searchable through a web interface (registry.dcc.ac.uk:8080/RegistryWeb/Registry/index.jsp).
The Library of Congress provides file format information
on its Sustainability of Digital Formats web site (www.digitalpreservation.gov/
formats/intro/intro.shtml). Many other file format registries, mostly established
and maintained by computer scientists or enthusiasts rather than by those
involved in digital preservation, are used to support digital preservation; an
example is Wotsit’s Format: The Programmer’s Resource (www.wotsit.org).
The file format registry response to file format issues attempts to deal with
all file formats, making no distinction among them. Other responses are based
on restricting the range of file formats, both those in which files are created
and those which digital archives agree to manage and preserve.

Standardizing file formats

If ‘good’ file formats are used for creating digital materials, the difficulties of
preserving these digital materials will be minimized. This is the thinking behind
the digital preservation approach centered on encouraging document creation
in file formats that are most likely to be sustainable over the long term. Investi-
gations into the effectiveness of different formats for this purpose are ongoing.
One example of earlier research comes from the Nationaal Archief of the
Netherlands. In its search for ways to ensure sustained accessibility to authentic
archival records in the long term the Digital Preservation Testbed at the
Nationaal Archief of the Netherlands investigated the sustainability of different
record types (text documents, emails, spreadsheets, databases) in different file
formats – MS Word or WordPerfect for text documents, Outlook, Eudora,
Novell GroupWise, Hotmail, and Kmail for emails, MS Excel and Lotus for
spreadsheets, and MS Access and Oracle for databases. Together these account
for more than 90 per cent of Dutch Government records (Slats, 2004).
What are the characteristics of ‘good’ file formats? They are the formats
that are most likely to be viable for longer periods, and are most likely to be
open-source and widely available and supported. Many have written about
this, including Clausen, who tells us that most file formats ‘fall out of fashion
within a few decades’ (Clausen, 2004, p.23) and suggests criteria by which we
can predict the ongoing viability of a file format: openness criteria (for example,
whether there is an open, publicly available specification for the format); port-
ability criteria (for example, independent of hardware and operating systems);
and quality criteria (for example, robustness, simplicity) (Clausen, 2004,
pp.11-12). Another list of criteria is provided by The National Archives (UK)
(Brown, 2008a): ubiquity (how widely adopted); support (amount of current
software support); disclosure (availability of documentation); documentation
quality (how comprehensive, accurate, and understandable); stability (frequency
of change); ease of identification and validation (ability to accurately identify);
intellectual property rights (the fewer the better); metadata support (can
metadata be included); complexity (not overspecified); interoperability; viability
(has built-in error detection capability); and re-usability.
There is consensus in the literature on five main criteria:

– adoption: the extent to which use of a format is widespread
– technological dependencies: whether a format depends on other technologies
– disclosure: whether file format specifications are in the public domain
– transparency: how readily a file can be identified and its contents checked
– metadata support: whether metadata is provided within the format (Todd, 2009, p.2).

Other criteria are also commonly noted:

– reusability / interoperability: can the format function with a variety of services
– robustness / complexity / viability: is the format inherently simple
– stability: is the format part of a managed release cycle …
– intellectual property / digital rights protection: whether rights complicate preservation
(Todd, 2009, p.2).

Three criteria are occasionally addressed in the literature:

– the ability of formats to convey content information
– extent, or ‘verbosity’ of format and
– cost (Todd, 2009, p.2).
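
Criteria such as these are sometimes turned into a simple rating so that candidate formats can be compared. The sketch below rates a format against the five consensus criteria listed above. The 0 to 2 scale, the equal weighting and the function name `sustainability_score` are assumptions made for illustration and are not drawn from Todd or from any published assessment scheme.

```python
from typing import Dict

# The five consensus criteria, rated 0 (poor), 1 (partial) or 2 (good).
# For technological dependencies, a higher rating means fewer dependencies.
CRITERIA = (
    "adoption",
    "technological_dependencies",
    "disclosure",
    "transparency",
    "metadata_support",
)


def sustainability_score(ratings: Dict[str, int]) -> float:
    """Return a score between 0.0 and 1.0 from ratings on all five criteria."""
    missing = [name for name in CRITERIA if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for name in CRITERIA:
        if not 0 <= ratings[name] <= 2:
            raise ValueError(f"rating for {name} must be 0, 1 or 2")
    return sum(ratings[name] for name in CRITERIA) / (2 * len(CRITERIA))
```

The ratings themselves remain a matter of judgement; the calculation only makes it easier to see where two formats differ and to record the reasoning behind a preference.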

Using standard file formats is not a long-term digital preservation strategy. It
does not address the obsolescence of file formats, but merely slows down the
rate at which it occurs. It is based on the assumption that, for popular file for-
mats, obsolescence and compatibility problems will be addressed because of
their large number of users, who will require that software to handle and ren-
der this format be upgraded so that their files can be used in new operating
systems and on new hardware platforms. However, this strategy is not a per-
manent preservation solution, as it lessens the immediate threat to digital objects
rather than ensures their accessibility and viability over time.
Identifying preferred file formats and requiring that digital objects are
created using them is common practice. One example is the use of TIFF
(Tagged Image File Format) to create preservation master images in many
digitization programmes. TIFF is used in this context because it is open and
stable and has a large user base. Creating digital materials in standard formats
can be encouraged in situations where some creators of the material can be
influenced, for example, in-house digitizing programmes. Format standardiza-
tion is suitable for use in situations where the file formats can accommodate all
or most of the characteristics of the digital materials. Consider a document
created in a current version of a standard office word-processor, which, when
converted to ASCII, will lose formatting. While this may not be considered an
important loss for some kinds of materials, it will be significant for others. In
this case another file format that retains formatting, such as PDF, could be
considered.
A point in favour of the strategy of requiring standard file formats is that it
is likely to slow the rate at which file formats become obsolete. If widely used
and supported file formats, or very basic formats such as ASCII, are selected,
then the software available to encode, decode and render these formats is
likely to be available for longer periods of time than those for less popular
formats. This strategy is not suitable for digital objects where selecting a dif-
ferent file format results in a loss of characteristics that are essential for under-
standing the object (UNESCO, 2003, p.123).

Restricting the range of file formats

A strategy related to restricting the range of file formats in which files are
created is for the digital archive to restrict the range of file formats that it
agrees to receive and manage. The archive converts material it receives into
these formats, which are chosen for the extent to which they address require-
ments such as openness, portability, functionality, longevity, and preservability.
This practice has a long history in digital archives. Examples frequently cited
are the Florida Digital Archive, which indicates the formats to which it will
provide full preservation support (Florida Digital Archive, 2009), the UK Data
Archive, which notes optimal data formats for long-term preservation (UK
Data Archive, 2011?), and ICPSR (the Inter-University Consortium for Political
and Social Research archive), based at the University of Michigan in the US,
where digital files deposited are converted from their original format to a format
that has been determined by ICPSR to be a preservation format, and both the
original file and the normalized file are retained (ICPSR, 2007). The Wellcome
Library adds further criteria that take account of that library’s technical ability
to manage the formats over time (Thompson, 2010).
This strategy is most suitable where the digital archive is responsible for
digital materials that are uniform in structure and content. The archive may be
able to influence the creation of these materials. For example, a government
archives may specify and encourage, or require, that records created by its
agencies are in the standard formats and that it will only accept records in those
formats. Requirements for the successful operation of this strategy include
clear submission guidelines, effective data conversion processes (if data conver-
sion is needed), and quality control checking to a high standard (UNESCO,
2003, pp.128-129). Clausen proposes this strategy for web archiving. He
recommends that the file formats harvested and added to the web archive
should be constantly monitored so that action can be taken when new formats
become widespread and old formats fall out of favour. He also suggests that
the files received should be retained in their original format as well as in the
converted and migrated versions ‘to allow for higher-quality conversion or
emulation at a later stage’ (Clausen, 2004, pp.23-24).
The use of stable, open file formats has the same advantages as those noted
in the previous section. The file formats selected will adhere to the characteris-
tics of openness, portability, functionality, longevity, and preservability. In
particular, the strategy has the potential to use the resources available for digital
preservation in the most efficient way to ensure that resource requirements,
such as specific software and expertise, are minimized and that the need for
customized attention during migration processes is reduced. It is most suitable
for materials for which retention of content is essential but formatting and
other characteristics are less significant. While this strategy ‘reduces the range
of problems needing to be managed’ (UNESCO, 2003, p.128), it needs to be
flexible enough to respond to the emergence of new formats.
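
The decision an archive makes when material arrives can be sketched as a lookup against its format policy: accept material already in a preservation format, normalize material for which a conversion is defined, and refer anything else for review, retaining the original in every case. The format lists below are hypothetical examples and do not represent the policy of any archive named in this section.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: formats kept as received, and conversions for others.
PRESERVATION_FORMATS = {"TIFF", "PDF/A", "ASCII"}
CONVERSIONS = {"DOC": "PDF/A", "WPD": "PDF/A", "BMP": "TIFF"}


@dataclass(frozen=True)
class IngestPlan:
    received_format: str
    action: str                    # "accept", "normalize" or "review"
    target_format: Optional[str]   # None when no conversion is defined
    keep_original: bool


def plan_ingest(received_format: str) -> IngestPlan:
    """Decide how a deposited file should be handled under the policy."""
    fmt = received_format.upper()
    if fmt in PRESERVATION_FORMATS:
        return IngestPlan(fmt, "accept", fmt, True)
    if fmt in CONVERSIONS:
        return IngestPlan(fmt, "normalize", CONVERSIONS[fmt], True)
    # Unknown formats are flagged so that the policy can be updated.
    return IngestPlan(fmt, "review", None, True)
```

Routing unrecognized formats to review, rather than rejecting them, gives the archive a signal that a new format is emerging and that its policy may need to change.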

Developing archival file formats

Another response to the issues posed by the proliferation and complexity of
file formats is the development of file formats with characteristics that will
ensure their sustainability. One of these is PDF/A, a variant of PDF. PDF was
originally a proprietary file format, but became an open standard ratified by the
ISO in 2008 (ISO 32000-1:2008). PDF, which reproduces the visual appearance
of the digital material, is in widespread use. Because it allows the physical
appearance as well as the content of digital materials to be reproduced, it is
considered to be especially well suited for situations where authenticity is
required. However, PDF is not an archival format. Some of its features, such
as allowing parts of the document (for example, specifications of fonts) to be
defined outside the format, are incompatible with long-term sustainability or
authenticity.
In order to ensure the sustainability of PDF over long periods of time, an
International Joint Working Group was established under the auspices of the
International Organization for Standardization to prepare an archival version
of the PDF format. A draft standard for PDF/A, ISO 19005-1 Document
Management: Electronic Document File Format for Long-Term Preservation
– Part 1: Use of PDF 1.4 (PDF/A-1), was circulated for comment late in 2003,
and was formally approved in 2005 as ISO 19005-1:2005. A newer version,
PDF/A-2, was published in 2011 as ISO 19005-2:2011 Document management
– Electronic Document File Format for Long-Term Preservation – Part 2: Use
of ISO 32000-1 (PDF/A-2). The PDF/A standard defines a format that will
ensure that PDF documents can be rendered consistently and predictably in the
future. The main difference between the PDF/A and the PDF formats is that
PDF/A documents must be totally self-contained; that is, all of the information
needed to display a document must be present in the file, and cannot refer
to external sources (for example, fonts). This means that many document
characteristics, such as audio content, video content, JavaScript, executable
files, and encryption must not be present. The PDF/A standard also mandates
the use of metadata to specified standards. Its viability for preservation was
examined at Ohio State University’s archives and library (Noonan, McCrory
and Black, 2010). Because it is still relatively new, PDF/A has not yet been
widely adopted, but it has the potential to be influential.
Another example is the image format JPEG 2000. It is being adopted by
cultural heritage institutions to replace uncompressed TIFF, but modifications
to its current specifications are needed before it should be widely adopted (van
der Knijff, 2011). An international community, including the JP2K-UK work-
ing group (wiki.opf-labs.org/display/JP2/Home), is collaborating on developing
JPEG 2000 to make it more effective for use in the heritage sector.

XML
XML (eXtensible Markup Language) is now considered an important archival
data format. XML is non-proprietary, being based on ISO 8879:1986 (a
standard for SGML, the Standard Generalized Markup Language) and is con-
sidered to promise long-term sustainability. XML is not, strictly speaking, a
file format; rather, it is a set of rules to describe data and documents, a mark-
up language in the same sense that HTML, widely used for web documents, is.
XML was specifically designed to be used independently of any hardware
platform and is supported by many open software applications.
A principal reason for XML’s favourable consideration for digital preser-
vation, in addition to its openness, is its stability and longevity. It has been
used since 1996, so there is a considerable amount of knowledge about it in the
IT industry, and it is well supported by open-source software applications. The
National Archives of Australia adopted XML for digital preservation because
‘even if the IT industry replaces XML with another data format in the future,
we will still be able to create our own XML tools for as long as we wish be-
cause all the information needed to construct XML tools is publicly available’
(Heslop, Davis and Wilson, 2002, p.18). The National Archives of Australia
converts digital records in proprietary data formats into equivalent formats in
XML. It has defined XML schema for common formats (for instance JPEG
and PNG) and these are available on the National Archives web site. It has
developed open-source tools that convert digital materials in some proprietary
formats into XML. One of these tools is Xena, described in the ‘Encapsulation’
section later in this chapter.
XML is used in digital preservation contexts in three main ways: for
characterization purposes, to express digital objects and/or associated data (such
as metadata) in standard ways; to provide a wrapper to keep bit-streams, metadata
and other information together (this use is described in more detail in
the ‘Encapsulation’ section later in this chapter); and as a standard for express-
ing metadata (noted in Chapter 5). Standardization by the application of XML
means that digital objects can more readily be managed and processed by auto-
mated means. The UNESCO Guidelines provide three examples: a large-scale
project to apply XML tags to emails at the San Diego Supercomputer Center;
XML tagging of emails and other common record formats by the National
Archives of Australia; and XML representation of database tables by the VERS
project at the Public Record Office Victoria in Australia (UNESCO, 2003,
p.125).
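The second of these uses, expressing a common record format in XML, can be illustrated with a minimal Python sketch. It is offered only as an illustration: the element names (email, header, body) are invented for this example and do not reproduce the schemas used by the National Archives of Australia or any other project, and only simple single-part text messages are handled.

```python
from email import message_from_string
import xml.etree.ElementTree as ET


def email_to_xml(raw_message: str) -> str:
    """Express a plain-text email as XML (element names are hypothetical)."""
    msg = message_from_string(raw_message)
    root = ET.Element("email")
    # Each header becomes an element that carries the header name as an attribute.
    for name, value in msg.items():
        header = ET.SubElement(root, "header", name=name)
        header.text = value
    # Assumption: the message is a single text part, so the payload is a string.
    body = ET.SubElement(root, "body")
    body.text = msg.get_payload()
    return ET.tostring(root, encoding="unicode")


raw = "From: a@example.org\nTo: b@example.org\nSubject: Minutes\n\nSee attached.\n"
xml_text = email_to_xml(raw)
```

Because the result is plain tagged text, the headers and body can be read without the mail software that created the message, which is the property that makes this kind of representation attractive for automated processing.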
XML was investigated in the Dutch Digital Preservation Testbed study
From Digital Volatility to Digital Permanence: Preserving Email (Digital
Preservation Testbed, 2003) where, in comparison with migration and emula-
tion, it was found to be the most suitable preservation strategy for emails.
Portico uses XML for preservation purposes. Portico was established as a
permanent archive for the content of journals and by August 2011 held the
output of more than 130 publishers, with more than 12,000 journal titles and
102,000 e-book titles (www.portico.org/digital-preservation/the-archive-content-
access/archive-facts-figures). It preserves content in an application-neutral
way by producing an archive of XML metadata plus the digital objects. Each
archived object is packaged in a ZIP file containing the original publisher-
provided digital artifacts, any Portico-created digital objects, and associated
XML metadata. The metadata developed by Portico includes structural, tech-
nical, descriptive, and events metadata preserved in a Portico-defined XML
file (Morrissey et al., 2010).
One of the issues currently perceived with the XML approach is that it is
based on close definition of elements of the data structures; the ‘essence’ (see
Chapter 5) has to be defined in advance. While this works well for simple data
structures such as emails, it is more difficult to capture and sustain all of the
characteristics and behaviours of more complex kinds of materials. It is, there-
fore, applied primarily to ‘structured or semi-structured data or documents for
which retention of content, semantics and relationships is more important than
any particular display characteristics’ (UNESCO, 2003, p.126). In other words,
XML allows content to be reproduced, but is less accurate in representing the
original formatting and layout of the document; it ‘is not a one-size-fits-all
solution’ (Aschenbrenner, 2004). However, as simple documents constitute a
very high proportion of the digital materials in many contexts, such as record-
keeping environments, this approach has proven to be useful.

Migration
We first need to establish what migration refers to in the context of digital
preservation. Is it format migration, software migration, or version migration?
In its simplest definition migration is ‘to copy data, or convert data, from one
technology to another, whether hardware or software, preserving the essential
characteristics of the data’ (Kenney et al., 2003). A commonly-encountered
longer definition is that migration is ‘a set of organized tasks designed to
achieve the periodic transfer of digital materials from one hardware/software
configuration to another, or from one generation of computer technology to a
subsequent generation’. (The first use of this definition appears to have been in
the report of the Task Force on Archiving of Digital Information (1996).)
Another definition describes migration as ‘a means of overcoming technologi-
cal obsolescence by transferring digital resources from one hardware/software
generation to the next’, the purpose being ‘to preserve the intellectual content
of digital objects and to retain the ability for clients to retrieve, display, and
otherwise use them in the face of constantly changing technology’ (Digital
Preservation Coalition, 2008, p.26). Migration, then, addresses the problems
caused by obsolescence of technology (hardware and software) and file for-
mats so that the intellectual content of the digital objects migrated is preserved
and the ability for users to retrieve, display, and use them is retained.
The term migration is sometimes used in the same sense as refreshing
(defined earlier as ‘the copying of data onto new media’), but it is considerably
more than that. Refreshing is carried out because of concerns about obsoles-
cence of the physical carrier of data; migration is additionally concerned with
obsolescence of data formats and attempts to ensure that file formats remain
readable. Migration is included in this chapter, rather than in Chapter 7, because
it is based on ensuring that the digital materials remain accessible in whatever
technology is current. Migration places a high premium on preserving the
integrity of digital objects so that they remain unchanged when used and ren-
dered, but it must also be recognized that migration inevitably changes the
data that are migrated. This raises concerns about the authenticity of migrated
data.
There are many varieties of migration. Thibodeau, for example, notes simple
version migration, format standardization, typed object model conversion,
Rosetta Stones translation, and object interchange format (Thibodeau, 2002,
pp.23-24). The most commonly used migration process is simple version migra-
tion – migration within the same software product. Because commercial soft-
ware developers provide this kind of migration there is always the likelihood
that they will discontinue support when they update their products. An exam-
ple is Microsoft Word, where the most recent version works with earlier ver-
sions of Word, although the number of versions for which backwards compati-
bility is possible is limited by the manufacturer. Another migration process is
format standardization, noted elsewhere in this chapter as a prerequisite or
favourable pre-condition for successful migration over time.
Wheatley discusses in more detail the different kinds of migration and offers
an extended example taken from his work on the Domesday Project
(Wheatley, 2001). The definition in the Task Force on Archiving of Digital
Information 1996 report (noted above) specifies ‘a set of organised tasks’ and
Wheatley defines these tasks in terms of the preservation techniques then
available. His listing includes:

– Minimum preservation – preserving a copy of the bit-stream (noted in this
chapter as bit-stream copying and refreshing)
– Minimum migration – migration that uses simple manual or automated
techniques to improve human viewing; for example, removing all characters
except common ASCII characters from a word-processed file, resulting in
a text file that is easy to access but no longer has any formatting or struc-
tural information
– Automatic conversion migration – uses software tools to interpret digital
material and modify it into a new form; for example, a file from obsolete
word-processing software is output as a current Word document. Wheatley
notes this as ‘a good example of a traditional view of migration’.

Wheatley suggests that the minimum migration approach has the advantages of
simplicity and ease of execution, and, although all functionality is lost, it pro-
vides a ‘cheap and reliable way of getting to a substantial amount of the intel-
lectual content’. It has potential for application as a ‘useful stop gap measure’
as long as the original bit-stream and documentation are also maintained.
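The minimum migration technique described in the list above can be sketched in a few lines of Python. This is a hedged illustration rather than Wheatley's own procedure: the choice of which characters count as 'common ASCII' (printable characters plus tab and newline) and the function name are assumptions made for the example.

```python
import string

# Assumption: "common ASCII" means printable ASCII plus tab and newline.
KEEP = set(string.printable) - set("\r\x0b\x0c")


def minimum_migration(data: bytes) -> str:
    """Return only the common ASCII characters found in a byte stream."""
    # latin-1 maps every byte to exactly one character, so decoding never fails.
    text = data.decode("latin-1")
    return "".join(ch for ch in text if ch in KEEP)


# A made-up fragment of a word-processed file with embedded control bytes.
sample = b"\x02\x00Annual report\x0e 1987\xff\n"
plain = minimum_migration(sample)
```

The function returns a new string and leaves the input untouched, in keeping with the condition that the original bit-stream is retained alongside the extracted text.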
No matter how it is defined, migration has been, with emulation, the prin-
cipal preservation technique applied to date. It has a long history and was noted
by one writer as ‘the only serious candidate thus far for preservation of large
scale archives’ (Granger, 2000). There is considerable expertise in migration
on the part of data administrators and computer centre administrators.
The literature of digital preservation includes some substantial studies of
migration. An early example is the report Preserving the Whole: A Two-Track
Approach to Rescuing Social Science Data and Metadata (Green, Dionne and
Dennis, 1999), a detailed case study of preserving data in obsolete column
binary format, and its associated documentation. This case study is based on
the experience of the data archiving community, in particular that of Yale Uni-
versity Library’s management of social science numeric data since 1972.
It identified that the best strategy for data in column binary format was to
convert them to ASCII, because this format is software-independent and also
preserved the original content. This approach did, however, require ‘a file-by-
file migration strategy’ that is time-consuming for large numbers of files
(Green, Dionne and Dennis, 1999, p.24). One conclusion of this study was that
the existence of detailed documentation about the obsolete column binary for-
mat meant that there were many options available to migrate this format to
others and also to read data in it (Green, Dionne and Dennis, 1999, p.[vi]). A
Dutch study of migration investigated the issues of large-scale migration, in
the order of 100 terabytes (Van Diessen and Van Rijnsoever, 2002). The quan-
tity of data posed major resource issues; writing them to optical storage media
with a capacity of about 5 gigabytes and a write speed of 4 megabytes per sec-
ond would take at least 290 days (Van Diessen and Van Rijnsoever, 2002, p.v).
This report developed ‘medium migration indicators’ which, when applied in
conjunction with ‘changes in the system architecture or load characteristics’,
can be used to ascertain when migration needs to be carried out. The report
also notes the actions required to manage media migration effectively in an
electronic deposit system (Van Diessen and Van Rijnsoever, 2002).
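The scale of the figure quoted above can be checked with a short calculation. The sketch below assumes decimal units (1 terabyte = 10^12 bytes) and continuous writing with no time allowed for changing or verifying discs; these assumptions are made for the example and are not taken from the report.

```python
TOTAL_BYTES = 100 * 10**12    # 100 terabytes (decimal units assumed)
WRITE_SPEED = 4 * 10**6       # 4 megabytes per second
DISC_CAPACITY = 5 * 10**9     # about 5 gigabytes per disc

seconds = TOTAL_BYTES / WRITE_SPEED
days = seconds / 86_400       # 86,400 seconds in a day
discs = TOTAL_BYTES // DISC_CAPACITY

print(f"{days:.1f} days of continuous writing across {discs} discs")
```

Under these assumptions the transfer takes a little over 289 days and fills 20,000 discs, which is consistent with the report's estimate of at least 290 days once any handling time is added.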
Migration is frequently used together with normalization (restricting the
number of file formats). The preservation processes implemented at ICPSR,
referred to above, apply migration and normalization together as their primary
digital preservation strategies. Normalization limits the range of file formats to
be preserved ‘to ensure that the digital preservation load is manageable’ and
that file formats are in the most recent rather than obsolete versions. Related
materials, such as data files and documentation files, are converted to file formats
‘that are as close as possible to ASCII for text, or TIFF for images, to
enable preservation’. Digital content is regularly copied from older to newer
storage media. Systematically applied policies are in place to govern the migra-
tion processes (McGovern, 2007b). The Florida Digital Archive also uses nor-
malization and migration together in a migration strategy that aims to be ‘as
lossless as possible, and retain the content, appearance, and behaviors of the
source’ (Caplan, 2007, p.307). File formats handled through migration, selected
based on the assumption that most files from participating libraries will be
documentary files of text, image, or sound, as opposed to executable software,
are AIFF, AVI, JPEG, JP2, JPX, PDF, TIFF, WAVE, XML, XML DTD, QuickTime,
and plain text. Migration workflows and routines have been developed
for these formats. Files in other formats for which no migration routine has
been developed are preserved as bit-streams and may be migrated in the future
if the archive receives a request for access to them. The development of migra-
tion routines for new file formats requires considerable resources, so files
taken in by the archive are normalized to a file format that the archive does
support. For example, PDFs are converted to page image TIFFs during the
ingest process.
Migration, as suggested above, is the preferred preservation strategy for most
digital archives. It is favoured for the reasons already noted – its long history
of use, and our experience with it, at least for some kinds of materials and
formats. Key criteria to consider when choosing an appropriate digital preser-
vation method or combination of methods are the nature of the data and digital
objects, the technical infrastructure required, costs, organizational factors, and
rights management. How do these criteria apply to migration? Although keeping
the original bit-stream is standard practice when migrating data, for very
large data sets the costs of doing this may be prohibitive. Knowing what the
requirements of the designated community are can help us determine the char-
acteristics of data to be preserved (see the section ‘Significant properties’ in
Chapter 5). The technical infrastructure required comes into play: is the tech-
nology required to get data off obsolete media in order to migrate them avail-
able? Cost is a critical criterion: is the particular strategy affordable? Migration
is labour-intensive, because the requirements to carefully document the process
in metadata and to implement strict quality-check procedures require ongoing
effort that uses considerable resources. Further, are the ongoing costs of main-
taining the migrated data over time feasible? It should be kept in mind that
deciding to migrate data usually, probably inevitably, requires ongoing future
migrations which also have associated costs. Organizational factors, such as
disciplinary preservation strategies, may require particular preservation ap-
proaches; a summary of practice of 16 case studies from a range of disciplines
indicates that ‘disciplinary differences exist ... on a variety of technical and
behavioural levels’ and present challenges (Key Perspectives Ltd, 2010, p.3).
Rights management also plays a key role: are the rights to digital objects available
so they can be managed over time?
However, although migration is heavily used as a digital preservation
strategy, the list of arguments against it is long. Rothenberg, whose view is
that migration is ‘essentially an approach based on wishful thinking’ (Rothen-
berg, 1999a, p.15), trenchantly summarizes these when he characterizes migra-
tion as ‘labor-intensive, time-consuming, expensive, error-prone, and fraught
with the danger of losing or corrupting information’ (Rothenberg, 1999a,
p.13). Migration needs to occur on a regular basis throughout the life of the
materials, each time different in that it requires analysis of data structures,
rules to be developed to control the transformation from one data format to
another, programs to be written, and quality control procedures (UNESCO,
2003, p.135). Migration, therefore, has significant ongoing costs associated
with it. It is unlikely to benefit from increases in computing power, Rothen-
berg suggests, because it is ‘highly dependent on the particular characteristics
of individual document formats’. Rothenberg also considers it unlikely that
automated migration techniques will assist; in fact, he encourages us to consider
them ‘with great suspicion because of their potential for silently corrupting
entire corpora of digital documents by performing inadvertently destructive
conversions on them’ (Rothenberg, 1999a, p.15). Complex materials are not
generally considered to be migratable because the loss of functionality and the
compromising of integrity and other attributes of the materials will be outside
acceptable ranges (Beagrie, 2001, p.104). Reliable data about the costs of
migration are only now starting to become available (noted in Chapter 10).
The strongest argument against migration is that the process of migration
changes data. Change is fundamental to the process. The digital encoding of the
digital objects may be changed to make them more suitable for preservation, or
the original file or object may be changed so they can be better processed when
they are accessed. Each time objects are migrated, more changes are made. Over
time these may cumulate into major alterations to the digital object, so their
characteristics and behaviour differ from the original object. Because migration
has the potential to result in loss of functionality and compromising of integrity,
strict quality-checking procedures must be established and implemented.
That migration, a commonly-applied primary digital preservation strategy,
changes the material it is intended to preserve raises a serious question: how
can the requirements of longevity (the availability of digital materials for as
long as their designated community requires them), integrity (the availability
of digital objects that have not been manipulated, forged, or substituted), and
accessibility (the ability to locate and use the digital objects) be met? (These
characteristics are identified as essential requirements of digital preservation in
Chapter 1.) The InterPARES Project (noted in Chapter 5) has contributed
significantly to our understanding of how to meet these requirements.
Many tools for migration are available. For format migration, one institution
uses the open-source ImageMagick for image format migration and SoX
(also open-source) for sound file migration. Another institution uses
commercial software tools: Acrobat Professional, Solid PDF, and Amber Lotus
1-2-3 Converter. An experiment applied readily available open-source software
to migrate social science and scientific data from legacy media (CD-ROM,
DVD, floppy disks): OpenOffice was applied to convert Microsoft Word,
PowerPoint, and Corel WordPerfect files to PDF (Woods and Brown, 2008).
The Planets (Preservation and Long-term Access through Networked Services)
project, ending in 2010 and continuing in the Open Planets Foundation, deter-
mined that enhancing existing migration tools was more effective than develop-
ing new migration tools from scratch (Zierau and van Wijk, 2008).
Digital preservation practices have often been applied on a small scale.
Because migration, as currently practised, is labour-intensive, the quantities of
digital materials that can be migrated are limited. We need to be able to migrate
larger quantities of data, and there has been some research into automated digital
preservation workflows and procedures. Automated migration tools have been
developed by projects such as CASPAR and Planets (digital preservation pro-
jects funded by the European Union), the DCC in the UK, and by NDIIPP (the
National Digital Information Infrastructure and Preservation Program) in the
US. Lists of data curation tools are readily available on the web, such as the
DCC’s Digital Curation Tools (DCC, 2010) and the NDIIPP Partner Tools and
Services Inventory (Library of Congress, 2010).
Migration is considered to be best suited to large collections of digital
objects that are simple in structure (for example, bitmap page images, ASCII
files, or well-defined XML formats). For migration to be successful certain
requirements should be met. These include (based on Digital Preservation
Coalition, 2008, pp.112-113 and UNESCO, 2003, p.135):

– Written policies and guidelines
– Rigorous documentation of the procedures followed
– Strict quality control procedures, applied during the development stages and
also after migration
– Retention of copies of the original material if some essential elements may
be lost in migration
– Testing the migration process before it is fully implemented.

Viewers and migration on request

Migration on request – that is, migration that is triggered by a request to access
the digital material – was promoted initially by the CAMiLEON Project (noted
in more detail in Chapter 7). Migration on request differs from other migration
techniques in the point at which migration is applied: it is performed only when
access to the digital material is requested, rather than in advance. For
example, a request for access to a digital object that was created in a now ob-
solete software package could be met by providing a copy of the original bit-
stream plus a migration tool (such as a conversion application or a viewer) that
can run on a current computing platform. A viewer is a small piece of software
that allows a file created in a particular program to be read on any computer,
even if the program is not installed. Viewers are in common use; Microsoft,
for example, provides free viewers that allow people to view, but not edit,
documents produced in its Office software.
This approach, which Clausen calls ‘conversion on demand’ (Clausen,
2004, p.16), has two main advantages over standard migration approaches.
First, it reduces the possibility of cumulative errors (loss of functionality or of
content), typically introduced during each migration, so that loss of essential
properties of the digital material and of the attributes which contribute to its
authenticity is minimized. Second, it has the potential to reduce the costs
incurred in regular migration. Because the migration is performed in response
to user requests for material, mass migration, with its significant resource
demands, is avoided. Against these advantages must be set some potential
risks. One is that it may not be possible to develop viewers or software tools that
work for complex materials. Another is that not all elements of the material
will be re-presented satisfactorily by the viewers. And, of course, viewers and
other software tools needed for migration on demand will themselves need to
be migrated (UNESCO, 2003, p.139).
Migration on request is used in the LOCKSS system, where it is called
migration on access. LOCKSS preserves content in its original format and
migrates content on the fly, if required, when preserved content is requested.
The arguments for this approach include lower resource implications, and
fewer problems arising from migration because the content is migrated directly
from the original to the current format. Migration on access is triggered in
LOCKSS when a request is received for materials that cannot be read in the
requester’s browser. The request lists formats that the browser can display. If
the browser cannot read the original format of the preserved content, LOCKSS
selects a converter that can convert the original format to a format the browser
can display and creates a temporary access copy using the converter
(lockss.stanford.edu/lockss/How_It_Works).
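The decision logic described in this paragraph can be sketched as follows. This is not LOCKSS code: the registry, the format names, and the placeholder converter are hypothetical, and a real converter would transform the content rather than prefix it. The sketch shows only the order of the checks, namely that the original is served if the browser can display it and that a converter is otherwise chosen from the formats the request lists.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical registry: (stored format, target format) -> converter function.
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("image/x-old", "image/png"): lambda data: b"PNG:" + data,  # placeholder only
}


def serve(original: bytes, stored_format: str, accepted: List[str]) -> Tuple[str, bytes]:
    """Return (format, content) for a request without altering the stored original."""
    # If the browser can display the preserved format, no migration is needed.
    if stored_format in accepted:
        return stored_format, original
    # Otherwise look for a converter to one of the formats the browser lists.
    for target in accepted:
        converter = CONVERTERS.get((stored_format, target))
        if converter is not None:
            return target, converter(original)  # temporary access copy
    # No converter is available: fall back to the original bit-stream.
    return stored_format, original
```

Each access copy is derived directly from the original, so conversions do not build on earlier conversions; this is the basis of the claim that fewer problems arise than with repeated migration.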
Migration on demand appears to be of most use for digital materials for
which there is low demand and for which the cost of regular migration is high.
LOCKSS and other implementations have demonstrated its value in that con-
text.

Encapsulation
Encapsulation has been on the digital preservation agenda for over two decades.
Encapsulation is a technique of ‘grouping together a digital resource and what-
ever is necessary to maintain access to it ... [including] metadata, software
viewers, and discrete files forming the digital resource’ (Digital Preservation
Coalition, 2008, p.117). In some cases, rather than including the actual soft-
ware that reads the data in the package, metadata that points to the software at
another location or to the software’s specifications is included. Encapsulation
is widely used; for example data, metadata and software are often wrapped in
XML metadata. This is the approach that has been taken in the VERS Project
at the Public Record Office Victoria, Australia (210.8.122.120/vers).
VERS accepts records in a range of formats, including text files, PDF, PDF/A,
JPEG, TIFF and MPEG, which are encapsulated in an XML wrapper with
metadata developed according to defined standards and authenticated using a
digital signature. The OAIS Reference Model’s Information Packages, described
in Chapter 5, are another example of encapsulation.
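A minimal sketch of encapsulation in an XML wrapper is given below. It is an assumption-laden illustration, not the VERS format: the element names are invented, and a SHA-256 checksum stands in for the digital signature that VERS uses (a checksum detects accidental change but, unlike a signature, does not establish who created the record).

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(content: bytes, fmt: str, title: str) -> str:
    """Wrap a bit-stream and basic metadata in one XML package (hypothetical layout)."""
    root = ET.Element("package")
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "format").text = fmt
    ET.SubElement(meta, "sha256").text = hashlib.sha256(content).hexdigest()
    # Binary content is base64-encoded so that it can be carried as XML text.
    payload = ET.SubElement(root, "content", encoding="base64")
    payload.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract(package_xml: str) -> bytes:
    """Recover the bit-stream and confirm it matches the recorded checksum."""
    root = ET.fromstring(package_xml)
    content = base64.b64decode(root.findtext("content"))
    if hashlib.sha256(content).hexdigest() != root.findtext("metadata/sha256"):
        raise ValueError("content does not match the recorded checksum")
    return content


package = encapsulate(b"\x89PNG example bytes", "image/png", "Site plan")
```

Keeping the metadata and the content in a single file means that neither can be separated from the other by accident, which is the main benefit claimed for encapsulation.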
The advantages that the Digital Preservation Coalition’s Handbook notes
as being offered by encapsulation as a digital preservation strategy lie in the
way it groups together relevant information, so that ‘all supporting information
required for access is maintained as one entity’ (Digital Preservation Coalition,
2008, p.117). There is some skepticism, however, in the statement that ‘osten-
sibly, the grouping process lessens the likelihood that any critical component
necessary to decode and render a digital object will be lost’ (Kenney et al.,
2003). In theory, everything that is necessary to access the digital object, along
with the digital object itself, is maintained into the future, but this is perhaps a
risky approach to take because ‘the only test of an encapsulated specification is
at the point it is used to implement a rendering tool. The risk of missing vital
information in the specification seems to invalidate this approach’ (University
of Leeds, 2003, p.39).
Encapsulation by itself is not considered a viable digital preservation tech-
nique because it ‘does not really address the basic problem of technological
change’ (UNESCO, 2003, p.127). The software encapsulated will still become
obsolete. It is a key part of emulation, and is best considered as a strategy that
is either a part of or a prerequisite for other digital preservation approaches.
The UNESCO Guidelines suggest that encapsulation is best considered as ‘a
basic good practice for all objects’, one that ‘may facilitate other strategies’
(UNESCO, 2003, p.128).
Encapsulation is one of the functions performed in Xena (XML Electronic
Normalizing for Archives; xena.sourceforge.net), open-source software devel-
oped by the National Archives of Australia for its digital preservation pro-
gramme. Xena detects the file format of a digital object, normalizes that format
to an open format, provides some metadata, and wraps all this in an XML
wrapper to produce a file with the .Xena extension. Xena is written in Java and
runs on the Linux, Windows, and Mac OS X operating systems.

Storage
All digital preservation activities require reliable techniques for storing large
quantities of data. ‘Storing digital objects leads, if nothing else, to large byte-
counts’, noted Linden and his associates, pointing out in 2005 that individuals
routinely dealt with gigabytes of data for their personal archiving requirements
(Linden et al., 2005, p.3). Just five years later these quantities can be measured
in terabytes. Although digital storage systems can be considered as a basic part
of computer system management and, therefore, outside the scope of this book,
they are included here because one of the fundamental routines required by
digital preservation is the secure storage of large amounts of data. Concepts
and practices such as ensuring security of computer systems, redundancy,
RAID, networked storage and virtual storage are all part of the knowledge re-
quired to develop and maintain digital storage systems that will provide secure,
long-term storage of both the digital materials and associated metadata.
A storage system for digital preservation can be simple or very complex,
depending on what is required of it. At its most basic, it is little more than a
storage and backup system. At its most complex, it is a fully automated system
with sophisticated, robust safeguards built in. Typically it requires a high
financial outlay on hardware and software, and ongoing financial investment
in running, upgrading, and maintenance. Although the costs of hardware and
storage continue to decrease, other costs associated with running a reliable
storage system do not.
Technical specifications for storage systems are available. The Digital
Preservation Coalition published a Technology Watch Report on mass archival
storage systems, based on experience at the British Library, which includes
specifications and notes lessons learned in the process (Linden et al., 2005).
Calls for tender and similar documents available on the web are a further
source of technical specifications. Adrian Brown (2008b) provides criteria for
selecting removable storage media for preservation purposes, concluding that
LTO (Linear Tape Open), DVD-R and hard disks were the preferred options
according to his criteria at that time.
Whatever the criteria, the reliability of storage is of critical concern to
digital preservation. Our current state of knowledge about bit preservation is
characterized by Rosenthal as:

– The more copies, the safer. As the size of the data increases, the per-copy cost in-
creases, reducing the number of backup copies that can be afforded.
– The more independent the copies, the safer. As the size of the data increases, there
are fewer affordable storage technologies. Thus, the number of copies in the same
storage technology increases, decreasing the average level of independence.
– The more frequently the copies are audited, the safer. As the size of the data increases,
the time and cost needed for each audit to detect and repair damage increases, reduc-
ing their frequency (Rosenthal, 2010d).

This is predicated on reliable storage. Rosenthal examines statistics about
storage failure and concludes that ‘in the real world, failures are inevitable’,
noting that we need different ways of detecting and managing them.
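
One simple way of detecting and managing such failures is to audit several copies against one another and repair any copy that disagrees with the majority. The sketch below assumes this majority-vote approach purely for illustration; it is not a method described by Rosenthal, and the names digest and audit_and_repair are hypothetical. Copies are held in memory here, whereas a real system would read them from independent storage.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 checksum of one copy."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: Dict[str, bytes]) -> List[str]:
    """Compare all copies, overwrite minority copies, return repaired names.

    Assumes a strict majority of copies is intact; raises otherwise.
    """
    counts = Counter(digest(data) for data in replicas.values())
    winner, votes = counts.most_common(1)[0]
    if votes <= len(replicas) / 2:
        raise RuntimeError("no majority: cannot tell which copy is intact")
    good = next(d for d in replicas.values() if digest(d) == winner)
    repaired = []
    for name, data in list(replicas.items()):
        if digest(data) != winner:
            replicas[name] = good  # repair from an intact copy
            repaired.append(name)
    return repaired
```

The sketch also shows why Rosenthal's three points interact: with fewer copies a majority is harder to establish, and with less frequent audits several copies may be damaged before any repair is attempted.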
As well as the technical requirements, digital storage systems require well-
defined management processes, including:

– allocating ‘unambiguous responsibility for managing data storage and protection’
– appropriate technical infrastructure to do the job
– system capabilities such as: sufficient storage capacity; demonstrated reli-
ability; ability to manage redundant storage (‘as digital media has a small,
but significant failure rate, redundant copies of files at every stage are a
necessity’); error checking; means of storing metadata and of reliably linking
metadata (International Association of Sound and Audiovisual Archives
Technical Committee, 2004, pp.51-52).

Commercial data archiving services are available and have become big business.
They offer state-of-the-art data security and fully automated backup and restore
procedures. Some of them are specifically tailored to digital preservation; one
example is OCLC’s Digital Archive (www.oclc.org/digitalarchive), which offers
secure managed storage, automated monitoring and ‘health record’ reports on
data, and integration with workflows for OCLC and other products.
Cloud storage (defined in various ways, which have in common that it is
not owned and managed by a digital archive and is accessed over a network
using web services APIs) is increasingly finding favour for digital preservation
purposes. An example is the use of Amazon S3 cloud storage service by the
Central Connecticut State University Library for its digital archive (Iglesias
and Meesangnil, 2010). Using Amazon S3 cloud storage incurred significantly
lower costs than other services investigated. Software tools were investigated
and adopted for file integrity checking, format verification and the creation of
preservation metadata, so that the archive could be certain that data were not
lost or corrupted during the process of transfer over the Internet. The resulting
system was evaluated as being ‘quite usable’, requiring a minimum of training,
although some issues were encountered. Overall, the authors consider that ‘the
decision to go with Amazon’s S3 storage has been a very good one’ (Iglesias
and Meesangnil, 2010). The use of cloud storage by a small archive, the Jewish
Women’s Archive (jwa.org), offered that archive a cost-effective alternative to
the more usual model of owning, managing and updating in-house servers
(Jeffs, 2011). DuraCloud (duracloud.org), a service operated by DuraSpace, a
nonprofit organization concerned with preserving durable digital materials and
noted in more detail in Chapter 9, provides another example. DuraCloud pro-
vides software tools that allow users to manage their digital materials and store
them with several cloud storage providers at the same time, as well as a hosted
service that offers a simple approach to storage.
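
File integrity checking around a transfer can be sketched as follows. This is an illustration under stated assumptions, not the tooling used in any of the projects cited: a local file copy stands in for an upload to a cloud service, and the function names file_digest and transfer_with_fixity are hypothetical. The checksum is computed in chunks so that large files need not be held in memory.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum by reading the file in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def transfer_with_fixity(source: Path, destination: Path) -> str:
    """Copy a file and confirm the copy matches the original.

    shutil.copyfile is only a stand-in for a network transfer.
    """
    before = file_digest(source)
    shutil.copyfile(source, destination)
    after = file_digest(destination)
    if before != after:
        raise IOError(f"checksum mismatch after transfer of {source.name}")
    return after  # store this value as preservation metadata
```

The returned checksum can be recorded with the object's metadata and reused in later audits of the stored copy.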
Commercial data archiving services and cloud storage may not yet be considered
as trusted digital repositories as described in Chapter 5. Handing sensitive
data to a third party carries the threat of data breach or loss, and ensuring
trust, privacy and security is perceived to pose the greatest challenge. These
services are, however, being actively investigated and adopted for digital
preservation purposes and will become a major part of the digital preservation
landscape.

Combining principles, strategies, and practices


How do all of the various principles, strategies and practices noted in this
chapter and the two previous chapters fit together? The key lies in their effective
combination. As noted in this chapter, combinations of them are already
in use. The use of standard formats and migration are frequently combined; for
example, TIFF is often chosen as a standard file format for collections of im-
ages to prepare them for migration in the future. Encapsulation and emulation
work together, as noted in the section above. Standard data formats (PDF),
viewers and migration (of data and metadata encoded in XML) are combined
in the VERS strategy.
The concept of micro-services has been actively investigated for its rele-
vance to digital curation and services are being developed based on the con-
cept. This approach to developing infrastructure for digital preservation is based
on combining small services that perform a single preservation function into
one cohesive service. Each small service is self-contained and can be developed,
maintained and improved more easily than larger single software applications.
The small services can be combined to provide comprehensive preservation
environments. The micro-services approach has many advantages, such as ease
of updating or replacement of a component when it becomes outdated (Abrams
et al., 2010). The University of California Curation Center has adopted this
approach in developing its digital preservation infrastructure (www.cdlib.org/
services/uc3/curation).
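
The micro-services idea can be sketched in a few lines. The example below is an illustration only and does not reproduce the University of California Curation Center's services: each function performs a single preservation task on a record, and run_pipeline combines them. All names (fixity_service, identify_service, size_service, run_pipeline) are hypothetical, and the format check recognizes only the PDF file signature.

```python
import hashlib
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Service = Callable[[Record], Record]


def fixity_service(record: Record) -> Record:
    """Single function: add a checksum for the content."""
    return {**record, "sha256": hashlib.sha256(record["content"]).hexdigest()}


def identify_service(record: Record) -> Record:
    """Single function: a deliberately tiny format check (PDF signature only)."""
    is_pdf = record["content"].startswith(b"%PDF-")
    fmt = "application/pdf" if is_pdf else "application/octet-stream"
    return {**record, "format": fmt}


def size_service(record: Record) -> Record:
    """Single function: record the size in bytes."""
    return {**record, "size": len(record["content"])}


def run_pipeline(record: Record, services: List[Service]) -> Record:
    """Combine small services into one cohesive preservation workflow."""
    for service in services:
        record = service(record)
    return record
```

Because each service takes a record and returns a new one, any component can be replaced, for example by a more capable format identifier, without changing the others.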
Archivematica (archivematica.org) is also using the micro-services concept
to build its digital preservation system. This system aims to ‘reduce the cost
and technical complexity of deploying a comprehensive, interoperable digital
curation solution that is compliant with standards and best practices’ (Van
Garderen, 2010). Archivematica combines open-source software tools into an
integrated software suite that provides a digital preservation path from ingest
to access, compliant with the OAIS Reference Model. It uses best practice
metadata standards, such as METS, PREMIS and Dublin Core. Archivematica
is itself open-source and free. Examples of the software incorporated into
Archivematica include BagIt to package digital objects and metadata, FITS
(File Information Tools Set) for file format identification and validation,
ImageMagick for converting bitmap images, MD5 for check-sum generation
and verification, and OpenOffice to normalize office documents.
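
Two of these functions, packaging and checksum verification, can be illustrated together. The sketch below writes a simplified BagIt-style package (a data directory, a bagit.txt declaration and a manifest-md5.txt file) using only the Python standard library. It is not Archivematica's code and does not use a BagIt library; the functions make_bag and verify_bag are hypothetical, and only flat file names are handled. MD5 is used because it is the algorithm named above; it detects accidental corruption but is not resistant to deliberate collisions.

```python
import hashlib
from pathlib import Path
from typing import Dict


def make_bag(bag_dir: Path, files: Dict[str, bytes]) -> None:
    """Write payload files plus a checksum manifest in a BagIt-style layout."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        lines.append(f"{hashlib.md5(content).hexdigest()}  data/{name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each checksum and compare it with the manifest entry."""
    manifest = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        checksum, rel_path = line.split(None, 1)
        actual = hashlib.md5((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != checksum:
            return False
    return True
```

Running verify_bag on a schedule gives the kind of regular fixity check that the storage discussion earlier in this chapter calls for.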
Proprietary managed digital preservation systems are also available. They
aim to provide a single solution to digital preservation. Three systems of this
kind available in 2011 are Tessella’s Safety Deposit Box, Ex Libris Rosetta,
and OCLC’s Digital Archive. Tessella’s Safety Deposit Box
(www.digital-preservation.com/solution/safety-deposit-box) was developed by
Tessella in conjunction with The National Archives (UK) and provides a complete digital
archival solution. The Safety Deposit Box is described as flexible, accommo-
dating different user requirements in a variety of ways, including providing a
choice of providers of storage solutions and relational databases, and accom-
modating an institution’s metadata requirements. Ex Libris Rosetta (www.
exlibrisgroup.com/category/RosettaOverview), developed in conjunction with
the National Library of New Zealand, is described as a flexible, highly scalable,
secure, and easily managed digital preservation system.
OCLC’s Digital Archive (www.oclc.org/digitalarchive) promises a secure, easily
managed, scalable storage environment for digital materials. OCLC emphasizes
the Digital Archive’s automated monitoring and reporting capabilities.

Conclusion
Chapters 7 and 8 provide an insight into the range of practices that constitute
the current digital preservation landscape. The section ‘Combining principles,
strategies and practices’ in this chapter has noted that the practices described in
both of these chapters are often combined and gives examples of their combi-
nation. Chapter 9 takes this further by noting a number of digital preserva-
tion activities that have been selected as examples to illustrate the range of
approaches being taken in preserving digital materials.
Chapter 9
Digital Preservation Initiatives and Collaborations

Introduction
Preservation is a problem domain that demands col-
laborative action (Ross, 2004)

This chapter first considers the theme of collaboration that is strikingly apparent
in many digital preservation initiatives. It then notes some ways in which these
initiatives have been categorized and describes a number of digital preservation
activities in those categories, structured first around geographical considera-
tions – international, regional, national or sectoral – and then subdivided into
services (projects actually carrying out digital preservation) and alliances (col-
laborations to develop, test and/or promote approaches to digital preservation).
Only a selection of initiatives and collaborations are covered here. Their de-
scription is intended to illustrate the range and nature of contemporary digital
preservation activities, rather than to provide a comprehensive listing. The
selection and structuring of initiatives and collaborations in this chapter are
intended to provide a framework to help us reflect usefully on the experience
gained through these initiatives.

Collaboration
Collaboration has been firmly embedded in the digital preservation agenda
from the earliest days of its compilation.

Partnerships have always been important in the digital preservation community. From
the very beginning it was apparent that no one organization – whether library, govern-
ment or academic – could adequately archive, preserve and continue to provide access
to the digital material, even with stringent selection criteria (Hodge and Frangakis,
2004, p.63).

Certainly collaborative activities have become more and more evident at all
levels, and collaboration is now assumed to be one of the keys to effective
confrontation of what seem at times to be overwhelming challenges of digital
preservation. Speakers at a 2004 workshop noted that ‘one key theme running
throughout the day was the need for active collaboration at every level and
across sectoral and geographic boundaries. Speaker after speaker illustrated
how this collaboration was essential’ (Digital Preservation Coalition, 2004).
The topic of Article 11 of the UNESCO Charter on the Preservation of the
Digital Heritage (UNESCO, 2004) is ‘Partnerships and cooperation’ and the
UNESCO Guidelines devote a whole chapter to collaboration (UNESCO,
2003, chapter 11, ‘Working Together’). In 2011 research libraries ‘cannot
hope to build a stable cyberinfrastructure unless they work together in collabo-
rative units, investing in community-generated solutions rather than insisting
on building individualized workflows and systems’, note Walters and Skinner
(2011, p.6), who provide recommendations for making collaborative strategies
work and a range of examples of collaborative activities. ‘The question’, they
suggest, ‘is no longer whether, but rather how to collaborate’ (p.13). Increasing
collaboration at local, national, regional and international levels can be readily
observed and the rest of this chapter describes examples of collaboration.
The reasons for the prominence of collaborative action in digital preservation
are to be found in large part in the scale of the issues and uncertainty surrounding
how to address them. Because digital preservation is expensive and resources
are scarce, collaborative activities ‘can enhance the productive capacity of a
limited supply of digital preservation funds, by building shared resources, elimi-
nating redundancies, and exploiting economies of scale’ (Lavoie and Dempsey,
2004). The issues are similar across different kinds of organizations (such as
libraries and archives) and different sectors (for example, different scientific
disciplines) so that ‘it makes sense to capitalise on the potential benefits of
pooling expertise and experience’; there may also be pressure, for example by
funding agencies, to collaborate (Digital Preservation Coalition, 2008, p.48).
Another compelling reason to collaborate is uncertainty about where the
responsibility for preserving digital materials lies. There are many stakeholders
(noted in Chapter 2), none of whom can realistically expect to develop workable,
scalable solutions on their own. Academic authors are encouraged to collaborate
with libraries in archiving their own works in university library-based digital
repositories, for example. Journal publishers are collaborating with libraries or
library-based organizations such as JSTOR and LOCKSS to provide continuing
access to their publications. Morris in 2002 saw promise in ‘the way that organi-
sations from different parts of the information chain are beginning to work to-
gether to address some of the problems’ (Morris, 2002, p.130), cautioning that

the only way we can hope to find reasonable and scholarship-friendly solutions is to
work together. We have to make sure that there is close communication between the
plethora of initiatives in different parts of the world, so that none of us wastes time and
money re-inventing the wheel ... And we need to make sure that all the members of the
information chain – information creators, information users, and all the intermediaries
in between – are involved in the discussions and in creating the solutions (Morris,
2002, p.132).

As the initiatives described in this chapter amply demonstrate, this promise is
being realized.
However, as the UNESCO Guidelines put it, ‘Collaboration costs’
(UNESCO, 2003, p.62). Collaboration has the potential to sidetrack the main
outcome by diverting attention from local objectives. Considerable effort has
to be put into ensuring that ‘unambiguous agreements able to be accepted by
all parties’ are reached (Digital Preservation Coalition, 2008, p.50). Besser
(2007) provides several case studies of collaborative activities in digital preser-
vation that emphasize the need for mutual understanding. Despite these and
other concerns, it is acknowledged that the benefits outweigh the disadvan-
tages. The Guidelines identify the benefits of collaboration as:

– Access to a wider range of expertise
– Shared development costs
– Access to tools and systems that might otherwise be unavailable
– Shared learning opportunities
– Increased coverage of preserved materials
– Better planning to reduce wasted effort
– Encouragement for other influential stakeholders to take preservation seriously
– Shared influence on agreements with producers
– Shared influence on research and development of standards and practices
– Attraction of resources and other support for well-coordinated programmes at a
regional, national or sectoral level (UNESCO, 2003, p.63).

Certainly one does not have to look far to identify collaborative activities. In
his 2002 survey of digital preservation activities, Beagrie observed that the
National Library of Australia ‘believes that international collaboration at many
levels is essential in digital preservation’ (Beagrie, 2003, p.18) and noted the
Library’s commitment to many international collaborations, including PADI
with its international advisory group, a Memorandum of Understanding with
the Digital Preservation Coalition, the Safekeeping Initiative with CLIR, the
National Library’s role in the Conference of Directors of National Libraries,
whose action plan includes aspects of digital preservation, and its participation
in working groups such as the OCLC/RLG working groups on preservation
metadata and on digital archive attributes (Beagrie, 2003, p.19). Current ex-
amples include the collaborative activities that ARL (Association of Research
Libraries) members are participating in (Walters and Skinner, 2011, pp.32-55),
the projects and partners of the US National Digital Information Infrastructure
and Preservation Program (NDIIPP, 2011, pp.47-56) and any one of the major
European Commission-funded digital preservation activities (CORDIS, 2011).
These are noted in more detail later in this chapter.

Typologies of digital preservation initiatives


Collaboration takes many forms. One distinction that can be made is between
internal collaboration (cooperation among sections of a large organization)
and external collaboration (cooperation with other organizations) (Digital
Preservation Coalition, 2008, p.39). Beyond this, though, existing typologies
are not particularly helpful in grouping activities in ways that acknowledge the
variety and extent of these collaborative activities adequately. One typology
takes an evolutionary approach by suggesting that collaboration comes at the
end of a chain of activities: first acknowledging that digital preservation is of
concern at a local level; next acting by initiating a project; then consolidation
by moving from projects to programmes; institutionalizing the activities to
incorporate the larger environments; and finally externalizing the activities by
‘embracing inter-institutional collaboration and dependency’ (Kenney and
McGovern, 2003). Another typology is based on the nature of the organizations
that carry out digital preservation: ‘enterprise-based preservation services’
such as research libraries, academic disciplines, and publishers; and ‘community-
based preservation services’, for example, JSTOR and The Internet Archive
(Smith, 2003).
The UNESCO Guidelines, which, as noted above, devote a whole chapter
to cooperation, suggest four ways in which digital preservation collaborations
can be categorized. In the centralized distributed model one partner takes the
main role and other partners contribute in specifically defined ways. While this
model offers some cost sharing, develops expertise among partners, and may
offer economies of scale, it does not necessarily encourage ownership by lesser
partners. This model is ‘probably good for beginning programmes seeking to
collaborate with large, advanced programmes’. More equally distributed models,
with several partners who have similar responsibilities and commitments, may
have issues of leadership, and consultation may become time-consuming. This
model is ‘probably suitable where there are a number of players willing to
share responsibility but none wanting to lead a programme’. Very highly dis-
tributed collaborations have a number of participants who largely take respon-
sibility for their own activities. This model ‘may be a useful starting point for a
preservation programme, raising awareness and allowing some steps to be
taken’ and does not require high costs to participate. Standalone arrangements,
curiously, are noted in the UNESCO Guidelines as the fourth model of col-
laborative activities. They allow the development of ‘expertise, strategies and
systems’ before collaborative partners are sought (UNESCO, 2003, p.65). The
UNESCO Guidelines also provide useful information about how to make col-
laborations work effectively and how to set them up (UNESCO, 2003, pp.63-64).
The typology applied in this chapter is loosely based around geographical
considerations – international, regional, national or sectoral – and subdivided
into services and alliances. (As noted at the start of this chapter, services
denotes projects carrying out digital preservation, and alliances denotes col-
laborations whose aim is to develop, test and/or promote approaches to digital
preservation.) These categories, based on geography and the nature of the col-
laboration, required arbitrary distinctions to be made in 2004, when the first
edition of this book was being prepared. These distinctions are even more diffi-
cult to make in 2011, as many recently-established initiatives have characteristics
of more than one category. Although some of these programmes are catego-
rized in this chapter as national, or local, and so on, such is the collaborative
nature of digital preservation activities that their lessons are typically made
available to a wider audience and are keenly scrutinized. Despite its imperfec-
tions this typology is applied because it accommodates most current digital
preservation activities. Initiatives and programmes that have been described
elsewhere in this book are referred to only briefly in this chapter.

               Services                 Alliances

International  The Internet Archive     UNESCO
               JSTOR                    PADI
               DuraSpace                OCLC
               LOCKSS                   CAMiLEON
               MetaArchive              International Internet Preservation Consortium

Regional       NEDLIB                   ERPANET
                                        European Commission-funded Projects

National       AHDS                     Digital Curation Centre
               Florida Digital Archive  Digital Preservation Coalition
                                        NDIIPP
                                        NDSA
                                        HathiTrust

Sectoral       Cedars                   JISC


Figure 9.1: Initiatives and Collaborations Described in this Chapter

International initiatives and collaborations


The term international in this chapter is applied to two types of programme.
The first type are those digital preservation initiatives and collaborations that
were established from their inception to be available or applicable to participants
in any country. The second type are those initiatives or collaborations that
were initially established with a geographical or sectoral limitation, but have
since become available to the international community.

International services

International services considered here are The Internet Archive, JSTOR,
DuraSpace, LOCKSS, and MetaArchive.
The Internet Archive (www.archive.org)
The Internet Archive, established in 1996 by Brewster Kahle, is a nonprofit
organization based in San Francisco whose aim is to provide permanent access
to digital materials, primarily those on the web. Its web site (on 3 September
2011) describes its aims as ‘offering permanent access for researchers, historians,
scholars, people with disabilities, and the general public to historical collec-
tions that exist in digital format’. It is ‘working to prevent the Internet – a new
medium with major historical significance – and other “born-digital” materials
from disappearing into the past’ and promotes the ideals of ‘open and free access
to literature and other writings’ which have ‘long been considered essential to
education and to the maintenance of an open society.’
The Internet Archive is funded by donations from individuals and philan-
thropic organizations and by contract work it undertakes for bodies such as the
Library of Congress, the national archives of the US and Britain, and the Biblio-
thèque nationale de France.
The Internet Archive Wayback Machine contained 2.4 petabytes of data at
the end of 2010 and was growing at a rate of 20 terabytes per month, the result
of regular web crawls of all publicly accessible material on the web plus, from
1999, targeted web crawls, some commissioned by specific organizations. This
has resulted in several collections of web sites:

– The Asian Tsunami Web Archive, a collection of web sites relating to the
December 2004 Tsunami disaster in Asia
– Hurricanes Katrina and Rita, web sites that record the devastation caused
by Hurricane Katrina and its aftermath
– The UK Central Government Web Archive: selected UK Government web
sites from 2003, collected for the National Archives (UK)
– Election 2002 Web Archive: almost 4,000 web sites relating to the 2002
US elections, collected for the Library of Congress
– September 11th: archived web sites relating to the events of 11 September
2001 in the US, collected for the Library of Congress
– Election 2000: web sites relating to the US elections held in 2000, com-
missioned by the Library of Congress
– Web Pioneers: web sites illustrating the early years of the internet.

The Internet Archive is not, as is popularly believed, a complete depiction of
the web. It does not capture password-protected sites, dynamically-generated
content, and other material. It justifies its inclusive approach of archiving the
entire publicly-available web by arguing that selection is more costly and
riskier than capturing as much as possible. The thorny question of intellectual
property rights is addressed by collecting all publicly available material,
and responding to requests from site owners for privacy and to block access. It
is also considerably more than a collection of web sites, its collections having
expanded to include moving images, texts, audio and software.
The technical base of the Internet Archive is a large number of personal
computers running the Linux operating system and with one terabyte of storage
on hard disks, using storage it developed called the Petabox (www.petabox.org).
As of December 2010 the total amount of data stored was 5.8 petabytes, of
which 2.4 petabytes was for the Wayback Machine. It has mirror sites in Egypt
and Holland. Data are migrated as appropriate and, to counter data format ob-
solescence, the Archive also collects software and emulators. The Wayback
Machine, launched in 2001, provides access to archived versions of web pages.
The Internet Archive is active in research and development, for example to
develop more effective web crawlers and better user middleware. It actively
seeks collaboration, such as in the International Internet Preservation Consor-
tium (netpreserve.org). It offers services such as Archive-It, a subscription-
based web archiving service developed for institutions who wish or need to
outsource the harvesting, storage and access of collections of web sites. As of
3 September 2011, Archive-It had collected 3,159,734,496 URLs for 1,672
public collections.
The Internet Archive has influenced digital preservation significantly by
demonstrating that large quantities of web materials can be archived over time,
and through its technical developments such as the ARC file format, the Heritrix
web crawler, and the Wayback Machine. (This section is based on Lyman and
Kahle (1998), Smith (2003), Rackley (2010) and the Internet Archive’s web
site (www.archive.org).)
JSTOR (www.jstor.org)
JSTOR, a nonprofit organization based in the US, was established in 1995 to
address the problems libraries faced in providing storage space for long runs of
scholarly journals. Its role has expanded to help ‘scholars, researchers, and
students discover, use, and build upon a wide range of content on a trusted
digital archive of more than one thousand academic journals and one million
primary sources’ (JSTOR at a Glance, 2011). Today, users of JSTOR can search
and retrieve high resolution images from over 1,400 journals, as well as from
conference proceedings, primary source materials and books (from 2012).
More than 800 publishers participate in JSTOR by making their publications
available to users in over 7,000 institutions from more than 150 countries.
JSTOR’s content initially covered scholarly journals in the humanities and
social sciences, but its scope has expanded considerably; one example of the
expanded focus is JSTOR Plant Science (plants.jstor.org).
JSTOR’s preservation strategies focus on stable standard formats, data backup,
and redundancy. A content model and XML metadata scheme developed
by JSTOR are applied, assisting with preservation activities such as migration in
the future. Redundancy is applied by replicating a complete copy of the JSTOR
material in three data centres.
In 2009 JSTOR merged with and became a service of ITHAKA, a non-
profit organization founded in 2003 and ‘dedicated to helping the academic
community take full advantage of rapidly advancing information and network-
ing technologies’ (www.ithaka.org).
JSTOR’s significance for digital preservation lies mainly in its business
model which benefits all of its stakeholders – publishers, libraries and users.
This business model, Smith suggested in 2003, ‘promises to be sustainable
over time’ (Smith, 2003, p.21), and JSTOR’s continued growth and its strength
in 2011 indicate this to be the case. (This section is based on Heterick (2002),
Smith (2003), a presentation by Eileen Gifford Fenton at the DPC Forum,
London, 23 June 2004, JSTOR at a Glance (2011), and the JSTOR web site
(www.jstor.org).)

DuraSpace (duraspace.org)
DuraSpace was established in 2009 as a not-for-profit organization committed
to ‘providing leadership and innovation in the development and deployment
of open technologies that promote durable, persistent access to digital data’
(duraspace.org/about.php). Collaboration with a wide range of stakeholders,
such as scientists, researchers, librarians, and data specialists, is intrinsic to its
operation.
DuraSpace currently supports and develops two open-source repository
applications widely used in digital preservation, Fedora and DSpace, and has
introduced a new technology, DuraCloud. As noted on the DuraSpace web site,
its values ‘are expressed in our organizational byline, “open technologies for
durable digital content”’ and Fedora, DSpace and DuraCloud support access
to digital materials over long periods of time.
The open-source repository applications Fedora and DSpace have their
origins in research programmes based originally at US universities. They are
currently well established as important applications for digital preservation, being
implemented throughout the world in a wide range of contexts. DuraCloud,
also open-source, is a storage service that uses cloud storage and cloud com-
puting for online backup of digital materials in multiple locations.
DSpace (www.dspace.org)
DSpace began life as a sectoral initiative, as an institutional repository for
material produced by faculty of MIT (Massachusetts Institute of Technology).
At an early stage in its development DSpace was made available to other insti-
tutions and has been adopted around the world. Developed jointly by MIT
Libraries and Hewlett-Packard, it was trialed within MIT from February 2002
and launched in September 2002. A significant characteristic of DSpace is that
it is an open-source system.
DSpace participants collaborated early in projects such as the DSpace@
Cambridge Project (www.dspace.cam.ac.uk). In 2003 the DSpace Federation
was established, its members all institutions that had implemented DSpace.
The intention of the DSpace Federation is to share ‘technical innovation,
content, and services’ and ‘to promote interoperability among institutional
repositories to support distributed services, virtual communities, virtual collec-
tions, and cataloging’. This happened through activities such as sharing in the
development and maintenance of the DSpace source code, and promoting the
DSpace service and interoperability of archival repositories. In 2007 the DSpace
Foundation was formed to support and develop DSpace (HP and MIT, 2007),
and in 2009 DSpace became part of DuraSpace.
From its beginnings DSpace was intended to support long-term preservation.
It ‘is committed to going beyond reliable file preservation to offer functional
preservation where files are kept accessible as technology formats, media, and
paradigms evolve over time for as many types of files as possible’ (DSpace,
2010?). Preservation is supported by such features as: automatic calculation
and verification of a checksum for each file uploaded; regular checking of
checksums to ensure file integrity; automatic identification of the file formats
of objects added to the repository; use of standards, including METS to main-
tain links between files and the Handle system to provide unique identifiers; and
maintaining the bit-stream of items added to the repository. DSpace recog-
nizes that faculty will create digital materials in a wide variety of formats that
support their own aims, and that repositories must, therefore, handle these
formats. It accepts all forms of digital materials and defines three levels of
preservation for file formats – supported, known, and unsupported. Bit-level
preservation is carried out for each of these levels. The functions of supported
file formats (those that are open and ‘archival’, such as TIFF, SGML, XML
and PDF) are preserved using either format migration or emulation techniques.
The future support of known file formats (popular proprietary formats such as
Microsoft Word and PowerPoint, Lotus 1-2-3, and WordPerfect) depends on
the likelihood that third-party format migration tools will be developed for
them. Functional preservation will not be applied to unsupported file formats
(those about which little is known, such as unique software programs). DSpace
assigns to material submitted ‘a unique identifier, stores provenance informa-
tion, maintains an auditable history and record of changes to the archive [and]
provides persistent storage’ (Sullivan et al., 2004; Pennock, 2006a).
DSpace has been widely adopted, its web site listing over 1,100 instances
at 4 September 2011. It is influential in digital preservation because it provides
a framework in which academic libraries and archives can develop strategies
and practices in a collaborative international environment.
Fedora (fedora-commons.org)
Fedora (Flexible Extensible Digital Object and Repository Architecture) is an
open-source repository management system developed at Cornell University
and now in widespread use. Fedora’s suitability as the basis of a digital archive
was investigated by Tufts University and Yale University (Fedora, 2006).
They concluded that Fedora has value as ‘a preservation application, in
conjunction with “the appropriate people, infrastructure, policies, and procedures”’,
noting ‘its agnostic view towards file formats and object types enables it to
manage essentially any type of file’, and features such as its use of XML and
its ability to manage multiple bit streams for a single object (Fedora, 2006,
Section 4.1).
The FedoraCommons web site (www.fedora-commons.org/about/features)
indicates that Fedora’s key features include the ability to store and manage all
types of content and associated metadata, scalability (up to ‘millions of objects’),
web access and search facilities, a wide range of storage options, and a
‘Rebuilder Utility’ for disaster recovery and data migration. Fedora’s support for
preservation includes its use of open standards such as METS and XML, its
system architecture which accommodates the OAIS Reference Model, and its
ability to handle metadata from a range of sources (Pennock, 2006b).
More than 170 institutions were registered with Fedora installations at
4 September 2011. It has an international community-based development team.
DuraCloud (duracloud.org)
DuraCloud is a new service, incorporating open-source technology, developed
by DuraSpace and released in 2010. Its aim is to make the use of cloud services
straightforward for end-users, providing pay-for-use access to digital materials
and at the same time ensuring the durability of digital content. The DuraCloud
service uses a cloud server environment and multiple cloud storage providers,
one of them Amazon Web Services (AWS). It was decided that writing open-source
code would encourage community involvement in software development and
the creation of new services to integrate with the DuraCloud system.
DuraCloud supports digital preservation in various ways, one of which is
its support of redundancy. DuraCloud maintains multiple copies of digital
materials with different cloud storage providers. Another preservation support
feature is its integrity checking, which allows users to check the integrity of
material they have stored in DuraCloud. Other preservation-related features
allow bulk handling of file conversions and easy synchronizing of files between
a local repository and cloud storage.
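The idea of synchronizing a local repository with several cloud stores can be sketched as follows. This does not use the DuraCloud API: each store is modelled as a plain dictionary mapping file names to content, and the function names are hypothetical. The sketch also assumes, as a design choice of its own, that files absent from the local repository are only reported and never deleted from cloud storage.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(content: bytes) -> str:
    """Checksum used to decide whether two copies are identical."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(local: Dict[str, bytes],
              cloud: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Return (files to upload, files held only in the cloud store)."""
    to_upload = sorted(
        name for name, content in local.items()
        if name not in cloud or digest(cloud[name]) != digest(content)
    )
    cloud_only = sorted(name for name in cloud if name not in local)
    return to_upload, cloud_only


def replicate(local: Dict[str, bytes], stores: List[Dict[str, bytes]]) -> None:
    """Bring every store up to date so that each holds a full copy."""
    for store in stores:
        to_upload, _ = plan_sync(local, store)
        for name in to_upload:
            store[name] = local[name]
```

Comparing checksums rather than timestamps means that a silently corrupted cloud copy is treated like a missing one and is replaced on the next synchronization.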
DuraCloud, only recently introduced at the time of writing in 2011, is
planning many further initiatives. They include developing new alliances and
funding models, working on system integration of DuraCloud with DSpace,
Fedora, ePrints and other open-source products, offering education and train-
ing, and developing a service for individual researchers.
The above section on DuraSpace and its subsidiary applications and ser-
vices is based on Barton and Walker (2003), Greenan (2003), Smith (2003), Sul-
livan et al. (2004), Walters and Skinner (2011), and the web sites of DuraSpace
(duraspace.org), FedoraCommons (fedora-commons.org) and DSpace (www.
dspace.org).

LOCKSS (lockss.stanford.edu)
The LOCKSS (Lots Of Copies Keep Stuff Safe) project is based on the well-
established preservation principle of redundancy (keeping multiple copies as a
safeguard against loss). LOCKSS is significant in digital preservation terms
because it established the feasibility of replication and peer-to-peer polling using
standard personal computers.
LOCKSS was developed initially for the preservation of e-journals and has
expanded to be applicable to any Web-published content. It is based on open-
source software that harvests, stores and copies digital content using standard
desktop computers, and ensures accuracy of the digital material through peer-
to-peer polling. It is inexpensive because it does not require costly hardware,
the software is free, and relatively little technical administration is required.
The LOCKSS programme for preserving e-journals has expanded rapidly: from
more than 80 libraries and 50 publishers using the LOCKSS software in 2004,
it had grown to ‘about 200 LOCKSS boxes in libraries around the world’ in 2010
(Rosenthal, 2010e) and 470 publishers in 2011
(lockss.stanford.edu/lockss/Publishers_and_Titles).
The LOCKSS web site (lockss.stanford.edu) provides a detailed summary of
how the system works. An inexpensive personal computer running LOCKSS
software collects specified digital content, for which the publisher’s permis-
sion to collect has been secured, using a web crawler. It continually compares
the content collected with the same content collected by others in the network
and ensures that the content is identical. If changes in a file are detected, the
changed file is repaired from an intact copy. The LOCKSS software also allows
access to this content by authorized users and provides administrative func-
tions. In order to allow the LOCKSS crawler software access to their content,
publishers need to give permission to the LOCKSS system. The basis of the
preservation function of LOCKSS is the continual checking of digital content
against other copies and the repairing of discrepancies identified by comparing
copies through polling. LOCKSS is OAIS-compliant.
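The principle of comparing copies and repairing discrepancies can be sketched in a few lines. This is a deliberately simplified illustration, not the LOCKSS polling protocol, which is considerably more elaborate: here each peer's copy is held in a dictionary, every peer votes with a checksum of its copy, and copies that disagree with a strict majority are replaced from an agreeing copy. All names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """Checksum that stands in for a peer's vote on the content."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(copies: Dict[str, bytes]) -> List[str]:
    """Repair minority copies from the majority copy; return repaired peers."""
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        # Without a strict majority there is no basis for choosing a copy.
        raise ValueError("no majority: cannot tell which copy is intact")
    intact = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for peer, content in list(copies.items()):
        if digest(content) != winner:
            copies[peer] = intact
            repaired.append(peer)
    return repaired
```

The sketch shows why more copies give more safety: with only two peers a disagreement cannot be resolved, whereas with three or more a single damaged copy is outvoted and repaired.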
The LOCKSS concept and software have been applied widely elsewhere.
CLOCKSS (Controlled LOCKSS (www.clockss.org)) was established in 2006
by a consortium of publishers and libraries as an archive of electronic journals
no longer supported by any publisher (Reich and Rosenthal, 2009). LOCKSS
is used as the basis of PLNs (Private LOCKSS Networks), which have been
established by libraries and archives collaborating to preserve digital objects.
The operation of one LOCKSS PLN, the Alabama Digital Preservation Network
(www.adpn.org) for locally created digital content, is described in detail by
Trehub and Wilson (2010). Other LOCKSS PLNs in operation in the US include
the Persistent Digital Archives and Library System (PeDALS,
www.pedalspreservation.org), the Data Preservation Alliance for the Social Sciences
(Data-PASS, www.icpsr.umich.edu/icpsrweb/DATAPASS), and the US Government
Documents PLN (www.lockss.org/lockss/Government_Documents_PLN).
LOCKSS software is the basis of the MetaArchive Cooperative (Skinner and
Halbert, 2009) and was being considered in 2010 as the basis of a national
digital preservation system for Germany (Seadle, 2010).

MetaArchive Cooperative (MetaArchive.org)


Projects that share archival storage resources, such as MetaArchive, demon-
strate a very high level of collaboration. The MetaArchive Cooperative was es-
tablished in 2003 in conjunction with the Library of Congress’s NDIIPP. It is a
coalition of libraries, archives and other cultural heritage organizations work-
ing cooperatively to develop and operate a preservation infrastructure for their
digital materials, with each member investing in the preservation infrastructure
rather than paying for services. The MetaArchive model focuses on sharing re-
sponsibility, expertise and infrastructure so that members can accomplish their
preservation goals. Forty-eight institutions in the US, South America and
Europe were members of MetaArchive as at 4 September 2011.
MetaArchive’s mission statement articulates its cooperative nature: ‘to foster
better understanding of distributed digital preservation methods and to create
enduring and stable, geographically dispersed “dark archives” of digital materials
that can, if necessary, be drawn upon to restore collections at the contributing
organizations’. The cooperative’s aims include encouraging and supporting
long-term preservation of digital materials that are at risk, promoting a decen-
tralized approach to digital preservation, encouraging members to build their
own preservation infrastructures and knowledge rather than outsourcing digital
preservation to external vendors, using and creating open standards and systems,
administering services that have wide applicability to a range of organizations
and digital content, and carrying out research and development that advance
best practice in digital preservation (MetaArchive Cooperative, 2010).
There are three categories of MetaArchive membership: Sustaining; Preser-
vation; and Collaborative. Each member runs a server for the MetaArchive
LOCKSS network. Sustaining Members commit to a high level of support and
input into future developments; Preservation Members are single institutions
that run and maintain a network server; and Collaborative Members are groups
of institutions that run a single, shared, centralized repository. Funding comes
from membership dues, consulting, sponsorship and fees for services. Central
staffing is intentionally kept low, based on the principle that local staff must be
actively involved in preservation activities; this also distributes risks from human
error within the network. Three committees (content, preservation, and techni-
cal) provide guidance to MetaArchive.
MetaArchive is based on the principle of distributed digital preservation,
and to that end uses open-source LOCKSS software, which allows digital pres-
ervation to be carried out collaboratively at a series of geographically distributed
sites. Each member participates in a PLN (Private LOCKSS Network) and
operates a server in a secure preservation-dedicated network environment. This
ensures that materials are preserved in various ways: content is stored on at
least six servers in different geographic locations and maintained by different
systems administrators; content is constantly monitored for change, which if
detected is repaired; expertise and technical infrastructure are kept within
institutions rather than residing with outside vendors; and centralized expertise is
available to advise and assist all members of MetaArchive. MetaArchive inte-
grates well with other repository applications, including DSpace and Fedora. If
the MetaArchive Cooperative decides that a format type held in its network
needs to be migrated, all material in this format will be migrated and both the
original copies and the migrated copies preserved.
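The migration policy described above, in which both originals and migrated copies are kept, can be sketched as follows. This is an illustration only and not MetaArchive software: the `StoredObject` record, the identifier scheme and the `convert` function are all hypothetical, and a real converter would be a format-specific migration tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class StoredObject:
    identifier: str
    file_format: str
    content: bytes
    derived_from: Optional[str] = None  # identifier of the original, if migrated


def migrate_format(archive: List[StoredObject],
                   source_format: str,
                   target_format: str,
                   convert: Callable[[bytes], bytes]) -> List[StoredObject]:
    """Return the archive with a migrated copy added for each matching object.

    Originals are never altered or removed; each migrated copy records
    which object it was derived from.
    """
    migrated = [
        StoredObject(
            identifier=f"{obj.identifier}.{target_format}",
            file_format=target_format,
            content=convert(obj.content),
            derived_from=obj.identifier,
        )
        for obj in archive
        if obj.file_format == source_format
    ]
    return archive + migrated
```

Keeping the link from each migrated copy back to its original allows a later, better migration tool to be applied to the untouched original rather than to an already converted copy.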
MetaArchive works with other preservation initiatives, such as NDIIPP, the
National Digital Stewardship Alliance (NDSA), and the Networked Digital
Library of Theses and Dissertations (NDLTD). It partners with other groups
(Chronopolis, Data-PASS, and the California Digital Library are examples) to
develop preservation technologies. The MetaArchive Cooperative has since
2007 been a programme of the Educopia Institute (www.educopia.org), whose
goal is to foster successful community-based cyberinfrastructure, rather than to
develop its own assets, thereby building knowledge and resources in the com-
munity. (This section is based on Halbert and Skinner (2008), Skinner and
Halbert (2009), Minor, Phillips and Schultz (2010), Walters and Skinner (2011,
pp.36-39) and the MetaArchive Cooperative web site (MetaArchive.org).)

International alliances

The examples of digital preservation services noted above are all based on
collaboration. Some international collaborations are different in that they are
intended primarily to conduct research, educate, inform and lobby, rather than
to establish services that actually preserve digital materials. The examples
noted in this section are UNESCO, PADI, OCLC, CAMiLEON, and the Inter-
national Internet Preservation Consortium. Another example, the InterPARES
Project, is noted in Chapter 5.
UNESCO (www.unesco.org)
UNESCO (United Nations Educational, Scientific and Cultural Organization)
has assumed a prominent role in promoting digital preservation. Its general
conference in 2001 adopted a resolution that drew attention to the need to
safeguard endangered digital heritage. A discussion paper prepared by ECPA
(European Commission on Preservation and Access) (de Lusenet, 2002) and
further consultation, including regional expert meetings in Canberra (Australia),
Managua (Nicaragua), Addis Ababa (Ethiopia), Riga (Latvia), and Budapest
(Hungary) in 2002 and 2003, resulted in the 2004 Charter on the Preservation of
the Digital Heritage (UNESCO, 2004). This statement of principles is intended
for advocacy and public policy purposes. Its 12 articles describe the value of
digital resources and their vulnerability, note that it is vital that these resources
remain accessible, and stress the need for action. They also note the impor-
tance of taking a life-cycle approach, affirm the need to consult stakeholders
and to define criteria for selection, emphasize the importance of appropriate
legal and policy frameworks, stress the need for cooperation, and note the
responsibility of UNESCO (UNESCO, 2004, pp.14-17). De Lusenet (2007)
examines the Charter in detail and notes its significance in promoting the
preservation of digital heritage.
In conjunction with the Charter on the Preservation of the Digital Heritage,
UNESCO contracted the National Library of Australia to develop guidelines
for digital preservation. The Guidelines for the Preservation of Digital Heritage
(UNESCO, 2003), the product of wide international consultation with interest
groups from governments to individual experts, were made available via the
UNESCO web site in 2003. The preface to the Guidelines states their intention
is to introduce ‘general and technical guidelines for the preservation and con-
tinuing accessibility of the ever growing digital heritage of the world’ and to
complement the Charter on the Preservation of the Digital Heritage. They are
best seen as ‘a guide to the questions that programme managers need to find
answers to’ and do not claim to address ‘every technical and practical question
that will arise in managing digital preservation programmes’, relying rather on
establishing and stating principles that can be applied (UNESCO, 2003, pp.6,10).
The UNESCO Guidelines have proved valuable to the digital preservation
community for their comprehensive approach and statement of principles, as
the frequent reference to them throughout this book attests.
PADI (www.nla.gov.au/padi)
PADI (Preserving Access to Digital Information) described itself as ‘a subject
gateway to digital preservation resources’ and was considered by many as the
essential starting point for digital preservation matters. It was established in
1997 by the National Library of Australia to share information about digital
preservation and support activities in Australia and worldwide. PADI demon-
strates the collaborative aspects that characterize many digital preservation
activities: it was developed in cooperation with CLIR (Council on Library
and Information Resources) and the UK-based Digital Preservation Coalition,
and had an international advisory group. It was highly regarded by its users,
one respondent to a survey in 2000 describing it as a ‘campfire around which
this community of interest can gather’ (Howell and Berthon, 2000, p.13). In
2010 the National Library of Australia announced it would cease to maintain
PADI beyond the end of 2010, having reluctantly concluded that it could neither
maintain any longer the large number of links in PADI nor provide the re-
sources needed to upgrade the site to current technical standards.
OCLC (www.oclc.org)
The activities of OCLC in digital preservation sit uneasily with the distinction
between services and alliances made in this chapter. OCLC has been active in
both arenas. Its collaborative digital preservation activities with RLG are noted
in this section. OCLC was founded, as the Ohio College Library Center, in 1967,
primarily to offer cataloging services to US libraries. Now the Online Computer
Library Center, it offers a wide range of services, including preservation ser-
vices, to an international clientele. In addition to collaboration with RLG, noted
below, OCLC’s digital preservation activities include a Registry of Digital
Masters (with the Digital Library Federation (www.diglib.org/community/
groups/rdm)), and participation in the development of METS (Metadata
Encoding and Transmission Standard), an essential metadata standard for digital
archives. OCLC has also maintained a digital archive service since 2003. This
general purpose OAIS-compliant digital repository can handle multiple file for-
mats and through its links with Connexion offers automatic metadata retrieval;
it also has a web harvester service (www.oclc.org/digitalarchive).

RLG
RLG, founded as the Research Libraries Group in 1974, was a not-for-profit
alliance of libraries, archives, museums, and historical societies with strong
research collections. It provided a mechanism for collaborative action on the
problems facing research collections. RLG merged with OCLC in 2006.
Preservation was always a major interest of RLG, as demonstrated by its
significant digital preservation activities, particularly in advocacy and raising
awareness and in standards development. RLG’s advocacy and awareness-
raising activities included publication from 1997 to 2007 of the electronic
journal RLG DigiNews. Perhaps its most influential activity in digital preserva-
tion was its participation, with the Commission on Preservation and Access, in
the Task Force on Archiving of Digital Information. The report of the Task
Force in 1996 (Task Force on Archiving of Digital Information, 1996) laid the
foundations for much subsequent work in digital preservation.
RLG’s role in standards development was similarly significant. It partici-
pated in 1998 in the development of the OAIS Reference Model and in many
working groups, including:
– RLG-OCLC Working Group on Digital Archive Attributes, whose final
report was Trusted Digital Repositories: Attributes and Responsibilities
(RLG/OCLC Working Group on Digital Archive Attributes, 2002)
– OCLC-RLG Preservation Metadata Working Group, whose report was A
Metadata Framework to Support the Preservation of Digital Objects
(OCLC/RLG Working Group on Preservation Metadata, 2002)
– Preservation Metadata: Implementation Strategies (PREMIS) Working
Group, another joint RLG-OCLC working group, which reported in 2004:
Implementing Preservation Repositories for Digital Materials: Current Practice
and Emerging Trends in the Cultural Heritage Community (OCLC/RLG
PREMIS Working Group, 2004)
– PREMIS Working Group final report, published as Data Dictionary for
Preservation Metadata: Final Report of the PREMIS Working Group (OCLC/RLG
PREMIS Working Group, 2005).

Chapter 5 notes in more detail the history of PREMIS and the role of OCLC
and RLG in it.
Other RLG collaborations included a ‘fruitful working relationship and
strategic partnership’ with JISC (Dale, 2004, p.20). RLG staff also contributed
to the advisory groups of many collaborative digital preservation activities,
such as Cedars and CAMiLEON. RLG was a founding member of the DPC.
(This section is based on Bellinger et al. (2004), Dale (2004) and the OCLC
web site (www.oclc.org).)

CAMiLEON (www.si.umich.edu/CAMILEON)
CAMiLEON (Creative Archiving at Michigan and Leeds), whose role in
developing emulation is described in Chapter 7, is noted here because it was a
significant example of an international digital preservation collaboration, in this
case a research collaboration. From 1999 to 2003 the University of Michigan (in
the US) and the University of Leeds (in the UK) combined forces to develop
and evaluate a range of techniques for long-term preservation of digital
materials. CAMiLEON’s reports and other publications, available on its web
site, remain valuable source material. Rusbridge considers its influence to have
been in its high-profile activities, especially the BBC Domesday project (see
Chapter 7), and its proof-of-concept of migration on request. CAMiLEON, he
suggests, ‘has given leadership, attracted public attention, advanced both
theory and practice, and highlighted many of the issues involved in digital
preservation today’ (Rusbridge, 2004, p.35).
International Internet Preservation Consortium (netpreserve.org)
The International Internet Preservation Consortium (IIPC) was formed in
2003, initially for three years. Its charter members were the national libraries
of Australia, Canada, Denmark, Finland, Iceland, Italy, Norway, and Sweden,
the British Library, the Library of Congress, and the Internet Archive, with
overall coordination by the Bibliothèque nationale de France. After the initial
three-year period had elapsed, membership expanded, numbering forty in 2011,
and coordination moved to the British Library. The consortium’s mission is
‘to acquire, preserve and make accessible knowledge and information from
the Internet for future generations everywhere, promoting global exchange
and international relations’, this mission being articulated by three goals:

– To enable the collection of a rich body of Internet content from around the world
to be preserved in a way that it can be archived, secured and accessed over time
– To foster the development and use of common tools, techniques and standards that
enable the creation of international archives
– To encourage and support national libraries everywhere to address Internet archiving
and preservation (netpreserve.org).

The research activities of the IIPC are conducted by working groups, of which
there were three in 2010: Harvesting, concerned with developing web harvesting
techniques; Access, focusing on understanding and defining user requirements
for access; and Preservation, concentrating on policy, practices and resources
to support the preservation of web archives. Earlier working groups tackled
other aspects of web archiving such as standards, researchers’ requirements,
metrics, and access tools. The IIPC’s research activities are disseminated
through reports available on its web site and at its regular meetings.
The IIPC has developed software tools for web archiving, such as Heritrix
(an archive quality web crawler), DeepArc (software that extracts database
content to XML flat files), and the Web Curator Tool (a workflow manage-
ment tool). The IIPC aims to develop a toolkit of web archiving software that
is open-source and easy to install. It makes available from its web site software
tools recommended and used by its members. One of its principal activities has
been the development of the WARC file format, a container format that allows
one file to contain a large number of objects and associated metadata that
includes a record of actions taken to preserve the files. WARC was adopted as
an ISO standard, ISO 28500:2009 Information and documentation–WARC file
format, in 2009. (This section is based on information from the International
Internet Preservation Consortium’s web site (netpreserve.org) and from com-
ments made during the Archiving Web Resources International Conference,
Canberra, 9-11 November 2004.)
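The container idea behind WARC, one file holding many records, each with its own metadata, can be shown with a much simplified sketch. It writes ‘resource’ records with a handful of named header fields, concatenates them, and reads them back using each record's Content-Length. It is an illustration under stated assumptions rather than a conforming implementation: the function names are hypothetical, the reader handles only records produced by this writer, and practical work would rely on established web archiving tools.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, List, Tuple

CRLF = b"\r\n"


def warc_record(target_uri: str, payload: bytes,
                content_type: str = "application/octet-stream") -> bytes:
    """Build one simplified WARC 'resource' record."""
    fields = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    header = b"WARC/1.0" + CRLF + b"".join(
        f"{name}: {value}".encode("utf-8") + CRLF for name, value in fields
    )
    # A blank line ends the header; two CRLFs follow the content block.
    return header + CRLF + payload + CRLF + CRLF


def read_records(data: bytes) -> List[Tuple[Dict[str, str], bytes]]:
    """Split a concatenation of such records into (header fields, payload)."""
    records = []
    pos = 0
    while pos < len(data):
        header_end = data.index(CRLF + CRLF, pos)
        lines = data[pos:header_end].split(CRLF)
        fields = dict(line.decode("utf-8").split(": ", 1) for line in lines[1:])
        start = header_end + 4
        length = int(fields["Content-Length"])
        records.append((fields, data[start:start + length]))
        pos = start + length + 4  # skip the two CRLFs that close the record
    return records
```

Because the reader relies on Content-Length rather than searching for a delimiter, a payload may itself contain blank lines without confusing the record boundaries.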

Regional initiatives and collaborations


The term regional is applied in this chapter to those digital preservation initia-
tives or collaborations established to apply to, or be available for, participants
from a region greater than a single nation, for example, in the European Union.
As with some of the international initiatives and collaborations noted above,
an initiative may initially have been intended for a defined geographical area
or a single sector, but may since have become more widely available within a
region. The regional service considered here is NEDLIB, and the
regional alliances noted are ERPANET and European Commission-funded
projects.

Regional services

NEDLIB (www.kb.nl/hrd/dd/dd_projecten/projecten_nedlib-en.html)
NEDLIB (the Networked European Deposit Library) was a European Com-
mission-funded research project that was launched at the start of 1998 and
ended in January 2001. NEDLIB was a collaborative project of national librar-
ies, a national archive, IT organizations and publishers, led by the Koninklijke
Bibliotheek (National Library of the Netherlands), formed to explore the tech-
nical and managerial issues of building a networked European deposit library.
The web site of the Koninklijke Bibliotheek hosts the publications and
reports resulting from NEDLIB’s activities, which
include Jeff Rothenberg’s report An Experiment in Using Emulation to Preserve
Digital Publications (Rothenberg, 2000). Its outcomes include a proof-of-
concept demonstrator of a deposit system for electronic publications, and guide-
lines for best practice. Its results have been further developed by the Koninklijke
Bibliotheek and IBM-Netherlands in the implementation of Koninklijke Biblio-
theek’s electronic deposit system, e-Depot (described below). The significance
of NEDLIB lies in the background work it did to develop and implement a
fully functioning electronic deposit system. (This section is based on Beagrie
(2003) and the NEDLIB web site (www.kb.nl/hrd/dd/dd_projecten/projecten_
nedlib-en.html).)

Regional alliances

ERPANET (www.erpanet.org)
ERPANET (Electronic Resource Preservation and Access Network) was
funded by the European Commission and the Swiss Government from 2001 to
2004 to ‘enhance the preservation of cultural and scientific digital objects
through raising awareness, providing access to experience, sharing policies
and strategies, and improving services’ (ERPANET, 2002). This collaboration
was led by HATII (Humanities Advanced Technology and Information Institute)
at the University of Glasgow, in partnership with the Schweizerisches Bundes-
archiv (Switzerland), ISTBAL at the Università di Urbino (Italy) and the
Nationaal Archief (Dutch National Archives). ERPANET produced a remarkable
variety of products and services, aimed at identifying quality sources of infor-
mation, and distributing the outcomes of its activities widely. They included
thematic workshops, training activities, studies of digital preservation in practice,
assessments of selected publications about digital preservation, an e-repository,
‘tools’ that provide structured advice on aspects of digital preservation activities,
an online discussion forum, and an advisory service. As one example, it
selected significant items of literature on digital curation and preservation from a
broad range of publications, analyzed them, and provided commentary on
them (Ross, 2004). Its thematic workshops and seminars covered a wide range
of topics, such as the OAIS Reference Model, digital preservation policies, web
archiving, metadata, appraisal of scientific data, file formats for preservation,
and persistent identifiers.
From its research, ERPANET concluded that ‘the community at large is
hungry for practical case studies and reports of real world experiences’ (Ross,
2004), leading ERPANET to conduct case studies in a wide range of sectors,
including publishing, pharmaceutical, and government; completed case studies
are available on the ERPANET web site. These case studies led to the
realization in 2003 that most organizations ‘are hesitant in their digital preservation
activities’, often because of a lack of awareness, and are ‘waiting for external
developments that they can adopt, or off-the-shelf solutions they can imple-
ment’ (Ross, Greenan and McKinney, 2003, p.8). This conclusion still holds
true in 2011.
At the heart of ERPANET was its collaborative nature. It depended on
synergy among many European and international agencies – national libraries,
universities, other digital preservation collaborative activities – and with pro-
fessionals (Ross, 2004). Some of ERPANET’s key functions were taken up
and continued by another collaborative project, DigitalPreservationEurope (DPE,
www.digitalpreservationeurope.eu). The ERPANET web site contains much
material that is still useful. (This section is based on Ross, Greenan and
McKinney (2003), Ross (2004) and the ERPANET web site (www.erpanet.org).)

European Commission-funded Projects


The European Commission has over the last decade funded many significant
research and development projects in digital preservation (CORDIS, 2011).
Although the initial intention of these projects may have been to focus on
activities within the European Union, they have increasingly attracted inter-
national partners, so that many of them can now be classified as international
projects. There are too many of these projects to describe in detail here; some
have already been noted in this chapter, such as ERPANET and DigitalPreser-
vationEurope, and others elsewhere in this book. Several of the projects are,
however, so significant that they require specific mention.
The CASPAR project (www.casparpreserves.eu) ran from 2006 to 2009,
working in the domains of science, cultural heritage, and creative arts. One of
its products was ‘a suite of flexible, sustainable, and interchangeable digital
preservation services’ (Lamb et al., 2009). The suite includes: a registry and
repository of representation information; a representation information toolbox
to assist with creating, maintaining, and reusing metadata; a preservation data
store; a digital rights manager; and an authenticity management tool. The Planets
Project (www.planets-project.eu), which ran from 2006 to 2010, also devel-
oped a suite of tools to apply at each stage of the digital preservation process.
Among these tools are: the Planets Core Registry of file formats; emulation
tools; the GRATE (Global Remote Access to Emulation Services) tool to pro-
vide access to emulators; the Planets Testbed, a mechanism for testing the
effectiveness of different preservation tools; and Plato, a decision-support tool
that assists with preservation workflow planning. Planets tools are being inte-
grated into digital preservation activities at the British Library, the Swiss
Federal Archives, the Koninklijke Bibliotheek and Nationaal Archief in the
Netherlands, Det Kongelige Bibliotek (National Library of Denmark), and the
Österreichische Nationalbibliothek (Austrian National Library) (Integrating
Planets, 2009). The activities of Planets are being continued by the Open Planets
Foundation (www.openplanetsfoundation.org).
The suites of tools developed by the CASPAR and Planets projects are
examples of the influential outcomes of many of the research and development
projects in digital preservation funded by the European Commission. They
are by no means the only significant examples of such projects.
Among others worthy of mention are Keeping Emulation Environments Portable
(KEEP) and SHAMAN. KEEP (www.keepproject.eu), funded from 2009 to
2011, is developing emulation tools and aims to improve understanding about
how emulation strategies can be integrated into digital archives. SHAMAN
(shaman-ip.eu/shaman), funded from 2007 to 2012, is building a digital preser-
vation framework using new technologies such as grid computing, virtualization
and distribution technologies with their associated tools. Its primary areas of
concern are scientific publishing and parliamentary archives, industrial design
and engineering, and scientific applications.

National initiatives and collaborations


The term national refers in this chapter to a single country (and in one case to
a single US state). While the initial intention may have been to develop a
service or alliance within only one country, the availability (as with some of the
international and regional initiatives and collaborations noted above) may have
since expanded beyond national borders. National services considered here are
the AHDS and the Florida Digital Archive. National alliances noted are the
Digital Curation Centre, the Digital Preservation Coalition, NDIIPP, the
NDSA, and HathiTrust.

National services

AHDS (www.ahds.ac.uk)
The Arts and Humanities Data Service (AHDS) was a federation of five data
archive services: AHDS Archaeology; AHDS History; AHDS Literature, Language
and Linguistics; AHDS Performing Arts; and AHDS Visual Arts. The AHDS
was established in 1996 as an outcome of a JISC feasibility study which rec-
ommended and funded a centrally managed distributed service. It ceased op-
eration in 2008. Its aims were to ‘preserve the rapidly growing unpublished
primary digital research materials being generated within the higher education
arts and humanities community and beyond’ (Beagrie, 2001, p.222). It was
based on the already existing Oxford Text Archive (established in 1976 at the
University of Oxford) and the History Data Service (established in 1993 at the
University of Essex), as well as on three more recently established data ar-
chives, the Archaeology Data Service (at the University of York), the Perform-
ing Arts Data Service (at the University of Glasgow) and the Visual Arts Data
Service (at the Surrey Institute of Art and Design).
The holdings of the AHDS included electronic texts, databases, still im-
ages, audio, GIS data, geophysics data, and metadata sets. The AHDS estab-
lished and guided the overall policy for the management of each of its compo-
nent services, provided outreach services, set standards, and encouraged best
practice through the guides it compiled and distributed, such as Digitising His-
tory (Townsend, Chappell and Struijvé, 1999).
The AHDS was significant for digital preservation as an early example of
collaboration in data archiving, that is, a centrally managed distributed model.
After the withdrawal of funding for a national service in 2008 several services
continued: the Centre for e-Research at King’s College London, the Archaeology
Data Service at the University of York, the Oxford Text Archive at Oxford
University, the History Data Service at the University of Essex, and the Visual
Arts Data Service at the University for the Creative Arts. (This section is based
on Beagrie (2001) and information from the AHDS web site (www.ahds.ac.uk).)

Florida Digital Archive (fclaweb.fcla.edu/FDA_landing_page)


The Florida Digital Archive (FDA) is not, strictly speaking, a national service
in the definition applied in this chapter, but is included here because it is a
well-documented example of interest and value to readers of this book. The
FDA is a centralized digital preservation repository run by the Florida Center
for Library Automation (FCLA), which was established in 1984 to provide
shared automated library systems and services for university libraries in the US
state of Florida, its role expanding in 2001 to encompass digital preservation.
The FDA has developed the DAITSS (Dark Archive in the Sunshine State)
preservation repository application. The FDA began operation in late 2006 and
after four years of operation had stored just over 73 terabytes. The DAITSS
application was re-engineered as a series of web services and DAITSS 2 was
available in 2011.
The FDA is a ‘dark archive’; it is concerned primarily with long-term
preservation and does not provide public access to the material it stores. Digital
materials submitted to the archive conform to a specified package format. The
FDA commits to returning, on request, a copy that is identical to the original
resource in terms of its bit-stream, plus a version that can be rendered with
tools available at the time of the request. Affiliated libraries are responsible for
selection, negotiating rights, providing metadata, submitting material conforming
to FDA specifications, and other actions. The FDA’s responsibilities include
managing the material in the archive, implementing specified preservation
strategies, preserving files exactly as submitted and demonstrating their integ-
rity, viability and authenticity, providing for all supported formats a version
that can be rendered, providing material on request, and aiming at certification
as a trustworthy repository.
DAITSS is at the heart of the FDA. It is based on standards such as the
OAIS Reference Model, METS and PREMIS, and conforms strictly with
OAIS. DAITSS uses format transformation. For files in supported formats it
creates a normalized or migrated version (or both) when feasible. Provenance
is ensured by the system-generated record of all format-based processing,
including migration and normalization, inside the system. DAITSS was redevel-
oped as a set of RESTful web services to become DAITSS 2 and is consequently
easier to modify and test and more readily integrated with other software. It is
intended that DAITSS 2 code be made available as open-source. (This section
is based on Caplan (2007), Caplan (2010), Walters and Skinner (2011), and the
FDA web site (fclaweb.fcla.edu/FDA_landing_page).)

National alliances

Digital Curation Centre (www.dcc.ac.uk)


Funding of £1.3 million a year for three years was announced in February
2004 to establish a Digital Curation Centre (DCC) in the UK. The funding,
provided by JISC and the eScience Core Programme of the UK Government,
established a consortium led by the University of Edinburgh in partnership
with HATII (at the University of Glasgow), UKOLN (formerly the UK Office for
Library Networking), and the Council for the Central Laboratory of the Research
Councils (which withdrew from the consortium in 2010). Since its establish-
ment the DCC has become one of the key players internationally in the digital
preservation arena. The first edition of this book, published in 2005, noted that
the activities of the just-established DCC will be ‘watched with considerable
interest throughout the world’ and that ‘it has the potential to significantly
advance understanding of digital preservation practice’ (Harvey, 2005, p.173).
The phrase ‘digital curation’ was coined to encompass data archiving, digi-
tal preservation, and the active management and appraisal of data throughout
their life cycle. The DCC intended, according to press releases on 5 February
and 7 April 2004, to ‘provide a national focus for research into curation issues
and ... promote expertise and good practice, both national and international, for
the management of all research outputs in digital format’. To do this it would
lead a research programme addressing digital curation issues, create a network
to bring together and support curators, and provide services such as evaluation
of tools.
Since its establishment in 2004 the DCC has received two further rounds
of funding, Phase 2 from 2007 to 2010 and the current phase from 2010 to
2013. In its first phase the DCC’s projected audience was digital preservation
specialists in the UK higher and further education sector, and it sought links
with similar organizations elsewhere in the world and with standards groups.
Phase 2 saw an increased focus on the e-science research community, and
Phase 3 is concentrating on capacity building in the UK higher education re-
search community.
For digital preservation communities in the UK and elsewhere, the resources
that the DCC makes available through its web site are very significant. Probably
the most influential are the materials in the ‘Resources’ section of its web site,
which include chapters of its Curation Reference Manual and numerous brief,
accessible publications that provide guidance in aspects of digital curation.
These are essential starting points for anyone seeking quality information
about a comprehensive range of digital curation topics. The ‘Resources’ section
also provides access to the DCC Curation Lifecycle Model, which is widely
used as a planning tool in developing a digital archive. The DCC web site
provides training materials and, in its ‘Projects’ section, numerous case studies,
including the immersive case studies developed by the
DCC SCARP Project. (This section is based on Lord and Macdonald (2003),
Ross et al. (2004) and the Digital Curation Centre’s web site (www.dcc.ac.uk).)
Digital Preservation Coalition (www.dpconline.org)
The idea of a national digital preservation alliance was developed in 1999 at a
workshop on digital preservation held at the University of Warwick, which
recommended the establishment of a coalition to promote digital preservation
activities in the UK. To further the development of a coalition, JISC established
a Digital Preservation Focus in 2000 (Beagrie, 2001). The Digital Preservation
Coalition (DPC) was established in 2001 with seven founding members. Mem-
bership is open to all collectives and nonprofit organizations from all sectors in
the UK. The DPC had 26 members at January 2004, plus JISC. By 2011 its
membership had grown to 14 full members, 22 affiliate members, and 4 allied
organizations.
The DPC’s aim is ‘to secure the preservation of digital resources in the UK
and to work with others internationally to secure our global digital memory
and knowledge base’. To achieve this it has established goals which include:

– Producing, providing, and disseminating information on current research and practice
and building expertise amongst its members …
– Instituting a concerted and co-ordinated effort to get digital preservation on the
agenda of key stakeholders in terms that they will understand and find persuasive
– Acting in concert to make arguments for appropriate and adequate funding to secure
the nation’s investment in digital resources and ensure an enduring global digital
memory
– Providing a common forum for the development and co-ordination of digital preser-
vation strategies in the UK and placing them within an international context
– Promoting and developing services, technology, and standards for digital preserva-
tion (www.dpconline.org/about/mission-and-goals).

In the DPC’s early years its activities included an advocacy campaign, dis-
semination and current awareness, forums and training workshops, a survey of
industry vendors, its Technology Watch, and surveying DPC members as part
of a UK-wide needs assessment exercise. Its activities in 2011 have expanded,
as indicated by the impressive array of activities and materials on its web site.
The DPC co-sponsors a Digital Preservation Award. Its collaborative activities
are not limited to the UK; for example, it lists as ‘Allied Organisations’ ICPSR,
the National Library of Australia, and the Library of Congress’s NDIIPP.
Advisory materials, many available publicly, include the Digital Preservation
Handbook (referred to frequently in this book), Technology Watch Reports,
and the What’s New in Digital Preservation newsletter.
The significance of the DPC for digital preservation lies in its active en-
couragement of digital preservation through advocacy and training activities,
which have been influential in the UK and are keenly observed in other countries.
(This section is based on Beagrie (2001), the DPC annual reports for
2002-03 and 2003-04, Jones (2004), Simpson (2004) and the DPC’s web site
(www.dpconline.org).)
NDIIPP (www.digitalpreservation.gov)
The US-based National Digital Information Infrastructure and Preservation
Program (NDIIPP) is led by the Library of Congress. Federal legislation estab-
lished this programme in December 2000 with US$100 million in funding.
NDIIPP’s goals are to develop a national digital collection and preservation
strategy, to work with other stakeholders to establish partnerships and form
networks, to help identify and preserve at-risk digital content, and to support
the development of tools, models, and methods for digital preservation. Collabo-
ration lies at the heart of NDIIPP, having been mandated by the legislation that
created the programme, which expected collaboration between the Library of
Congress, National Archives and Records Administration, National Library of
Medicine, National Agricultural Library, National Institute of Standards and
Technology, and other federal agencies, as well as non-federal organizations
and institutions. Since its establishment NDIIPP has sought participation from
a wide range of organizations.
Early activities of NDIIPP included convening stakeholder meetings in
2001, commissioning environmental scans, and presenting a plan to Congress
in 2003. NDIIPP commissioned a report on international digital preservation
activities for its own information, the initiatives surveyed being selected for
their relevance and interest to NDIIPP. In mid-2004 Version 2.0 of the Technical
Architecture for NDIIPP was released for review and comment and NDIIPP
entered into a partnership with the National Science Foundation to fund research
programmes in digital preservation. In September 2004 NDIIPP awarded
US$15 million to eight US consortia for three-year projects to identify, collect
and preserve digital materials within a nationwide digital preservation infra-
structure.
NDIIPP considered its achievements in its first five years to include devel-
oping a network of 50 to 75 partners, establishing a large archive of at-risk
content, and making recommendations to the US Congress about the long-term
governance of a national digital preservation programme. The first edition of
this book noted in 2005 that NDIIPP ‘is being keenly observed by digital preser-
vation interest groups throughout the world. With such large resources at its
command it stands a good chance of providing tools that will revolutionize
how digital preservation is carried out.’ This promise has definitely been met.
By 2011 NDIIPP’s activities have expanded considerably. The number of
partners has more than doubled and includes international partners. Its web site
features an impressive array of resources aimed at all levels of understanding
of digital preservation. As examples, it has provided funding for preserving
at-risk digital collections listed on the web site; it lists tools and services
developed by its partners; its Personal Archiving page links to an extensive
collection of advice, including videos, about preserving personal digital infor-
mation; it provides videos and podcasts about digital preservation topics; and
there is much more.
NDIIPP’s 2010 report notes that it will focus on building ‘a distributed
stewardship for a national collection’ (NDIIPP, 2011, p.4) and to that end has
established the National Digital Stewardship Alliance (noted below), is con-
tinuing to build a national digital collection, will explore new ways of funding
digital preservation through public–private partnerships, and will investigate
ways in which federal policy can provide greater incentives for digital preser-
vation and access. That it is moving effectively towards meeting these aims is
indicated by a list of the top developments for NDIIPP and its partners in
2010-2011. These included the development of the Recollection tool for
accessing and visualizing digital collections, release of JHOVE2, participation
at international workshops, the launch of DuraCloud, planning meetings to
establish the NDSA, the launch of the Geospatial Data Preservation Resource
Center, the launch of an NDIIPP Twitter account (@ndiipp) and The Signal blog,
and the second Personal Archiving Day (Manus, 2011a). (This section is based
on Beagrie (2003), Hodge and Frangakis (2004), Manus (2011a), NDIIPP
(2011) and NDIIPP’s web site (www.digitalpreservation.gov).)
National Digital Stewardship Alliance (www.digitalpreservation.gov/ndsa)
The National Digital Stewardship Alliance (NDSA) was launched in July 2010
by NDIIPP as a collaborative national network to support long-term access to
digital content. Its members collaborate to achieve aims such as: broadening
access to digital resources in the US; developing and coordinating sustainable
digital preservation infrastructures; promoting standards; promoting innova-
tion; and raising public awareness of the enduring value of digital resources
and the need for active stewardship of the nation’s digital resources. Collabo-
ration is at the heart of its activities, as indicated by one of its value statements:
‘collaborative work is the centering value of the Alliance; it is a value shared
by all members and a priority in work with all organizations and associations’
(www.digitalpreservation.gov/ndsa/about.html). Membership is open to organi-
zations that can demonstrate a commitment to digital preservation and share
the NDSA’s values of stewardship, collaboration, inclusiveness, and exchange.
Eighty-nine members were listed on the NDSA’s web site at 4 September
2011, coming from academia, all levels of government, and the private sector.
The NDSA is based on the experience of NDIIPP’s first decade, during
which NDIIPP developed and expanded a national digital preservation network.
This network includes a wide range of partners whose collective experience is
influential in developing best practices for the stewardship of digital collec-
tions. Members of NDSA join with other organizations who are committed to
digital preservation to share expertise, tools and practices for mutual benefit.
They contribute by serving on working groups, collecting, providing and/or
curating digital materials, and by providing services.
The five working groups of the NDSA and their key activities are:

– Content – identifying content already preserved and at-risk materials, investigating
selection guidelines, matching orphan content with NDSA partners
who will commit to its preservation
– Infrastructure – building a community for sharing information and best
practices about tools and systems for the long-term preservation of digital
materials
– Innovation – encouraging and sharing innovative digital preservation prac-
tices and technologies, conducting and guiding research and development
to develop new solutions
– Outreach – building relationships with stakeholders, developing and shar-
ing resources about digital preservation
– Standards – developing understanding of the role, benefits and effective
use of digital preservation standards.

(This section is based on Walters and Skinner (2011, pp.32-33) and on the
NDSA web site (www.digitalpreservation.gov/ndsa).)

HathiTrust (www.hathitrust.org)
HathiTrust is an alliance of universities, all except one in the US, whose mission
is ‘to contribute to the common good by collecting, organizing, preserving,
communicating, and sharing the record of human knowledge’. To do this it is
building a digital archive of library materials converted from print, strongly
improving access to these materials, preserving them, coordinating shared
storage strategies among libraries, developing sustainable cost models, and
creating a responsive technical framework (www.hathitrust.org/mission_goals).
HathiTrust began as an initiative by a cooperative of libraries that are
members of the Committee on Institutional Cooperation (CIC) and the Univer-
sity of California system, plus the University of Virginia, to make publicly
available the backup of the Google-digitized versions of books they owned. Its
aim was to archive and share the digitized collections of its members by preserv-
ing the intellectual content and, where possible, the materials’ exact appearance
and layout. It quickly expanded and at 4 September 2011 had as its members
three consortia (CIC, the Triangle Research Libraries Network, and the Uni-
versity of California) and 56 individual institutions. The focus is on preserving
and providing access to digitized book and journal content from partner libraries.
Longer term, HathiTrust’s objectives include expanding its focus to include
other types of digital materials, and engaging in research and development in
search of better discovery and preservation tools. Its growth has been impres-
sive. On 4 September 2011, the HathiTrust web site noted 9,554,741 total
volumes (428 terabytes) as ‘Currently Digitized’, of which approximately 27 per
cent are in the public domain. This latter point is significant, because, although
HathiTrust’s primary commitment is to its member libraries, it is also committed
to making its materials publicly available to users anywhere in the world to the
extent that legal constraints allow.
Preservation strategies implemented by HathiTrust include redundancy,
quality control and checking regimes, format migration, and adherence to rati-
fied digital preservation standards. To ensure persistence of its data, HathiTrust
maintains two complete, geographically separated, active storage sites as well
as a tape backup. The two versions are automatically checked and synchro-
nized. Content is thoroughly validated on ingest, and regular automated checks
of the integrity of stored content compare digital objects with the versions
ingested. Files to be ingested are required to be in well-documented preserva-
tion formats, allowing effective migration of the archived content in the future.
These formats include TIFF, JPEG or JPEG2000, Unicode text, and XML files
with an accompanying DTD.
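
The integrity checking described above, in which stored objects are regularly compared with the versions ingested, is commonly implemented by recording a checksum for each file at ingest and recomputing it later. The following is a minimal sketch of that general technique only; it is not HathiTrust's code, and the function names (sha256_of, record_ingest, audit) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_ingest(files):
    """Hypothetical ingest step: build a manifest mapping each path to its checksum."""
    return {str(path): sha256_of(path) for path in files}


def audit(manifest):
    """Hypothetical periodic check: list paths that are missing or whose
    current checksum no longer matches the one recorded at ingest."""
    failures = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

In a repository with two storage sites, the same manifest could in principle be audited against each copy, so that a file failing at one site can be restored from the other; how any particular repository arranges this is not specified here.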
HathiTrust is committed to using digital preservation standards. Its specifi-
cations for digital objects require the use of METS and PREMIS. The HathiTrust
repository is designed according to the OAIS Reference Model and the Trust-
worthy Repository Audit and Certification (TRAC) criteria. HathiTrust is one
of only two repositories that had been certified at the time of writing (in 2011)
by the Center for Research Libraries (CRL) as a trusted digital repository
complying with the TRAC criteria for digital repositories. (This section is
based on York (2010), Walters and Skinner (2011, pp.43-46) and the HathiTrust
web site (www.hathitrust.org).)

Sectoral initiatives and collaborations


The term sectoral refers in this chapter to a sector such as a discipline, interest
group, or industry. Case studies of sectoral initiatives and collaborations in
Europe, for example in the pharmaceutical, publishing and government sectors,
are available in the ERPANET case studies. (For more information about
these case studies see the section on ERPANET above, the ERPANET web
site, and Ross, Greenan and McKinney (2003).) The sectoral initiatives con-
sidered here are Cedars and JISC.

Sectoral services

Sectoral services for a number of scientific disciplines are described in the
DCC SCARP case studies (see the DCC web site (www.dcc.ac.uk/projects/
scarp) and a summary of the case studies by Lyon et al., 2010).

Cedars (www.webarchive.org.uk/ukwa/target/99695/source/search)
Cedars (CURL Exemplars in Digital Archives) was a collaborative research
project that ran from 1998 to 2002, funded by JISC and hosted at the Universi-
ties of Leeds, Oxford and Cambridge. It is included here because it was ‘an
important test-bed for digital preservation within research libraries in the higher
education sector’ (Beagrie, 2001, p.219). Its aim was to provide guidance to
others in the sector about best practice for digital preservation. Among its out-
comes are a preservation metadata schema and an archiving demonstrator pro-
ject based on OAIS. Some of the activities of Cedars are noted in Chapter 5.
The significance of Cedars for digital preservation lies in its reports (available
on the Cedars web site, as preserved by the UK Web Archive) and its devel-
opment of a prototype distributed archiving system. (This section is based on
Beagrie (2001) and the Cedars site (www.webarchive.org.uk/ukwa/target/99695/
source/search).)

Sectoral alliances
JISC (www.jisc.ac.uk)
JISC (Joint Information Systems Committee) represents institutions in the UK
higher and further education sector. Its mission is ‘to provide world-class leader-
ship in the innovative use of Information and Communications Technology to
support education, research and institutional effectiveness’ (www.jisc.ac.uk/
aboutus/strategy.aspx). JISC has energetically pursued digital preservation
activities since its early collaborations, such as the AHDS, Cedars, and
CAMiLEON (all referred to elsewhere in this chapter), and currently through
its funding of programmes and projects in the field. It has also provided sub-
stantial funding for the Digital Curation Centre.
Beagrie’s description of JISC’s Continuing Access and Digital Preservation
Strategy (one of the finalists in the inaugural Digital Preservation Award),
which ran from 2002 to 2005, notes that JISC had three major objectives in its
digital preservation activities:

– To establish and disseminate best practice and guidelines for digital preservation – a
major outcome was the publication of Preservation Management of Digital Materials:
A Handbook (Jones and Beagrie, 2001)
– To collaborate with other agencies worldwide – the DPC was the principal outcome
of this objective
– To develop a long-term digital preservation strategy relevant to the higher and further
education sector in the UK – the major outcome was JISC’s Continuing Access and
Digital Preservation Strategy and implementation plan (Beagrie, 2004).

JISC’s Continuing Access and Digital Preservation Strategy was influential. Its
activities included the provision of a wide range of resources on its web site,
for example internet resources, e-journals, e-prints, feasibility studies and risk
assessments that recommended actions for specific categories of digital mate-
rial of interest to JISC members. It lobbied successfully for a Digital Curation
Centre (described above). Beagrie notes other JISC achievements in progress-
ing digital preservation in the UK, including increased funding for digital preser-
vation activities, and partnerships such as with DPC and the UK Web Archiving
Consortium (Beagrie, 2004).
In 2011 JISC’s activities in this field come under the umbrella of digital
preservation and curation and are focused on keeping valuable and useful digital
material available for future use by scholars, researchers and other users. Seven
programmes are identified, including ‘Managing Research Data’, the ‘Reposi-
tories and Preservation Programme’, and ‘Digital Preservation and Records
Management’. Its many projects include research and development into archiv-
ing e-publications, archiving JISC-funded project web sites, tool development,
and policy studies. A list of these projects, some of which are noted elsewhere
in this book, is available on the JISC web site (www.jisc.ac.uk/whatwedo/
topics/digitalpreservation.aspx?page=1&filter=Projects).
JISC has been an important catalyst for digital preservation, not only in the
UK higher and further education sector, but also more widely in the UK and
internationally. (This section is based on Beagrie (2004) and on the JISC web site
(www.jisc.ac.uk, in particular, www.jisc.ac.uk/whatwedo/topics/digitalpreservation.aspx).)

Conclusion
In describing this selection of digital preservation programmes and initiatives
the intention has been to illustrate the range and nature of digital preservation
activities, and to emphasize their increasingly collaborative nature. It is reas-
suring to note that many of the programmes and initiatives noted in the first
edition of this book, published in 2005, are still thriving. Some have formed
new alliances, combining with other programmes or initiatives, and many new
programmes and initiatives have emerged. Only a small selection of the many
that exist in 2011 can be provided in this book, which has inevitably omitted
mention of some significant examples. Investigating other examples may prompt
the reader to reflect on digital preservation activities and, in doing so, to derive
some value from them. Useful starting points to locate other digital preserva-
tion services and alliances are the ‘Partners’ and ‘Tools & Services’ sections of
the Library of Congress’s NDIIPP web site (www.digitalpreservation.gov) and
the DCC web site’s list of ‘Tools and Applications’ (www.dcc.ac.uk/resources/
tools-and-applications).
The reader should be aware that these descriptions are of activities up to
the middle of 2011. Given the rapid rate of progress in the field of digital preservation,
information about activities in the field becomes outdated quickly.
Frequent reference to the web sites of these programmes and initiatives and to
the sources noted in the previous paragraph is recommended to ensure that
knowledge of them is up-to-date.
Chapter 10
Challenges for the Future of Digital Preservation

Introduction
The lack of awareness of digital preservation issues
by stakeholders, the lack of the necessary skill sets
to preserve digital materials, the lack of agreed in-
ternational approaches, a shortage of practical models
on which to base preservation practice, and a lack of
funding on an ongoing basis to address digital preser-
vation issues, all contribute to the problem (Beagrie,
2003, p.6)

This chapter identifies the key challenges that are likely to be the focus of digital
preservation activities in the future. Predicting futures is, of course, always a
risky business; in this attempt to do so the views of many experts are drawn
upon to derive a consensus about what the challenges will be.
In 2001 the key threats to digital continuity were listed, in an earlier ver-
sion of the Digital Preservation Coalition’s handbook, as:

– rapid obsolescence resulting in large part from a market-driven ethos
– the creators of digital materials and other stakeholders may lose interest
– the lack of awareness of digital preservation issues by stakeholders
– the lack of the necessary skill sets to preserve digital materials
– the lack of agreed international approaches
– a shortage of practical models on which to base preservation practice
– a lack of funding on an ongoing basis to address digital preservation issues
– traditional organizational structures that do not readily accommodate digital preser-
vation activities
– lack of agreement about which institutions should preserve digital materials
– lack of applicable selection principles
– legal issues, such as intellectual property rights and legal deposit (Jones and Beagrie,
2001, pp.27-34).

Ten years later the key threats are the same, but new areas of concern have
appeared:

Technological Risks
– Hardware and software, both proprietary and open source, can be a challenge to
maintain and keep current.
– Content formats can be complex and fragile. They are often not well documented
and frequently become obsolete.
– Lifecycle management risks such as data migration, file degradation (“bit rot”), or
unauthorized use can make content unusable.

Legal and Policy Risks
– Copyright laws are unclear about libraries’ rights to create and keep preservation
copies.
– Privacy claims can prohibit collection and documentation of content.
– Sarbanes-Oxley regulations can induce content owners to destroy historically valu-
able documents.
– The law does not recognize public value in preserving digital content. There are few
policy incentives for concerned parties to preserve content in the public interest.

Content Risks
– The volume or complexity of content makes it difficult to collect comprehensively.
– Insufficient description of content makes it challenging to discover or retrieve it
for use.

Organizational Risks
– Insufficient resources to maintain information can lead to content loss.
– Lines of authority and responsibility for maintaining digital content are often not
aligned with the demands of such content.
– Insufficient skilled personnel can prevent even routine best practices from being im-
plemented. (NDIIPP, 2011, p.12)

Although there is considerable commonality between these two lists, there are
also differences that indicate that our understandings and experience have
developed and suggest how our concerns have shifted over the decade. The
key threat of obsolescence is a constant. Also present in both lists are concerns
about lack of skilled personnel, lack of sufficient funding, organizational struc-
tures that do not accommodate digital preservation, legal issues, and lack of
policy incentives to preserve and agreements about whose responsibility it is to
preserve. The 2011 list no longer notes some concerns present a decade earlier,
suggesting that there is now little, perhaps no, apprehension about them: aware-
ness of digital preservation issues, including awareness by creators of digital
materials; a lack of international approaches; a shortage of practical models;
and selection. New concerns are expressed: the quantity and complexity of
digital materials; lack of metadata; and ‘lifecycle management risks’ (threats
from migration, file degradation and unauthorized use are specified – although
these were also present in 2001).

What have we learned so far?


How far have we gone towards countering the threats to digital continuity?
What do we now know about digital preservation? The two lists from a decade
apart suggest that change has occurred. Some of our knowledge to date is
summed up in three aphorisms developed from the experiences of participants
in Canadian Conservation Institute workshops: something is better than noth-
ing; move it or lose it; don’t throw out the original (Strang, 2003). Webb’s six
points also summarize a significant body of experience:

– selection is necessary
– benign neglect does not work
– someone needs to accept responsibility
– working together is effective and increasing
– we already know much, including much about what we need to do
– action is possible now, even if we do not have all of the answers (Webb,
2004, p.50 (paraphrased)).

The Center for Research Libraries’ ‘Ten Principles’ statement in 2007 indicates
a more recent understanding of the requirements. In summary, there needs to
be institutional commitment to digital preservation; appropriate levels of staff-
ing, technical infrastructure and other resourcing; legal rights over the digital
materials; policies and planning procedures in place; selection criteria; techni-
cal procedures that maintain and ensure the integrity, authenticity and usability
of digital objects; metadata; and the ability to disseminate the digital objects
(Center for Research Libraries, 2007).
These are, however, only summaries. Closer examination of the challenges
is worthwhile to reinforce how much we still need to do and to delineate more
precisely those challenges. The challenges are unquestionably not just technical;
in fact, there is a school of thought that we have most of the technical solutions
at hand, or we can develop additional technical solutions to problems as they
are defined. The greater challenges, for which there can be no technical solu-
tions, are social, political and economic. Fourteen challenges were articulated
in the first edition of this book, published in 2005. They did not constitute a
comprehensive list, but were those about which there was a high level of con-
sensus, and it is revealing to consider them from the perspective of 2011. Two
more challenges have been added in this edition.
1) Developing standards for digital preservation and encouraging their use.
The development and implementation of standards (discussed in Chapters 7
and 8) are key aspects of most digital preservation activities. The benefits of
using standards include the opportunity for increased interoperability, greater
economies associated with limiting the number of file formats handled, and facili-
tation of collaboration. The rapid rate of change associated with technologies,
for instance, the rapid obsolescence of file formats as new formats replace them,
means that monitoring and maintenance of standards need to be constant and
ongoing. Since 2001 there has been positive development, with standards
developed and/or ratified since then now in common use. Examples include
the OAIS Reference Model, ratified in 2003 as an ISO standard, and PDF/A,
also ratified as an ISO standard, first in 2005 as PDF/A-1 and in 2011 as
PDF/A-2. This challenge is ongoing, despite the considerable advances made.
2) Influencing data creation where possible. Encouraging the application
of standards such as metadata standards, persistent identifiers and the use of
non-proprietary file formats extends the period of time for which digital objects
remain accessible and usable.
Engaging the creators of digital objects so that they understand the issues
and apply appropriate standards when creating their digital materials is the
main way to address this challenge. The absence of this as a concern in the
2011 list of threats to digital preservation suggests that there has been positive
movement. Even if the level of awareness of this challenge has increased, there
is an ongoing need to educate new creators about the issues and techniques to
address them.
3) Increasing collaboration and engaging stakeholders in digital preserva-
tion. Chapter 9 noted in some detail the value of collaboration and Chapter 2 the
value of engaging stakeholders in digital preservation activities. Collaborative
research and development projects and consortia based on digital preservation
activities, such as storage, are now considered the norm in the digital preservation
community. Collaboration does not occur without effort; the greatest
challenges are in identifying stakeholders and potential partners and engaging
them in collaborative activities, and other challenges lie in managing collabo-
rative activities so that they are effective and benefit all participants.
4) Developing policy about digital preservation. Clearly articulated policies,
such as a written policy framework specifically aimed at digital preservation
(noted in Chapter 6), are considered an essential requirement for effective digital
preservation. One aspect of policy, and perhaps the one where the greatest
challenges lie, is the decision about who will take responsibility. This is a
critical decision because digital materials require constant and ongoing attention
to ensure their preservation. Concern about where responsibility lies has grown
in the last ten years; the 2011 NDIIPP statement of risks notes specifically the
lack of alignment between who takes responsibility for maintaining digital
materials and the demands of the material.
5) Ensuring that legal rights do not impede digital preservation. A major
challenge (noted in Chapter 4) arises from the conflict between ownership and
control of intellectual property, on the one hand, and the needs of cultural heri-
tage institutions to collect and copy for preservation purposes, on the other.
Ongoing efforts are required in order to change legislation to ensure adequate
rights to collect and copy digital materials for preservation purposes. Although
there has been some activity over the last decade, such as lobbying of legislators
to enact laws allowing preservation institutions to make and keep sufficient
copies of digital materials as required for effective preservation, the legal
impediments to digital preservation are still considered one of its major chal-
lenges.
6) Determining which digital materials, and what attributes of them, it is
crucial to maintain access to. Although some argue that it is technically feasible
to collect all digital materials, most commentators appreciate that it is not prac-
tical to collect everything and selection of digital materials is required (noted
in Chapter 4). In addition, decisions about the attributes of digital materials
that we should preserve are required (noted in Chapter 5). Clearly articulated
selection policies and guidelines must be developed and a better understanding
of the attributes of digital materials that affect their authenticity must be
reached. There is little evidence that much has changed in thinking about and
development of selection and appraisal policies over the last decade.
7) Integrating digital preservation into mainstream operations. It is now
recognized that the viability of ongoing digital preservation programmes is
compromised if they depend on short-term project funding, and that they need to
be fully incorporated into the mainstream activities of institutions. The challenge
is to build sufficient awareness of this so that digital preservation is appropri-
ately funded and resourced as a mainstream activity. The infrastructure and
organizational structures of most libraries and archives do not yet recognize
curation of digital records as a core activity. Some significant changes are
occurring, however, such as the widespread implementation of cloud computing
with cloud storage being used increasingly for digital preservation purposes
(Kimpton and Payette, 2010). Curation activities need to become a routine part
of the operation of libraries and archives, incorporating curation processes into
normal workflows. This is not necessarily straightforward, as the experience of
the Wellcome Library indicates (Wellcome Library, 2009). Integrating digital
preservation into mainstream operations remains a significant challenge. Aspects
of this challenge are discussed later in this chapter.
8) Taking action now, even if on a small scale. Over the last decade there
has been substantial activity in defining and codifying widely agreed-upon
techniques for digital preservation. ‘Is digital preservation now routine?’ a con-
ference attendee asked in 2010; he concluded that ‘not many years ago such
conferences were dominated by debate about the technical complexities it
posed, about … the risks we faced if and when we got it wrong’. Now, we are
in the position ‘where our ability to actually preserve this stuff indefinitely and
to continue to provide access to it seems, without much triumph or fanfare to
now be taken as read’ (Bailey, 2010). Earlier advice still holds: ‘take active
steps now, even small ones, which will preserve access for the “manageable
future”, while also planning for whatever long-term approaches appear to be
the most practical’ (UNESCO, 2003, p.121). Waiting for reliable techniques to
become available will almost certainly result in the loss of material. The
challenge is to promote greater awareness of the likelihood of loss and provide
appropriate guidelines about how to take action. There is more about aspects
of this challenge later in this chapter.
9) Maintaining access to the preserved digital materials. Perhaps the
greatest challenge faced in digital preservation is to maintain access – ‘the abil-
ity to present the essential elements of authentic digital materials’ (UNESCO,
2003, p.21) – to preserved digital materials. The numerous challenges in
achieving this are noted throughout this book, and strategies and techniques
available to us are noted in Chapters 6, 7, and 8. As indicated in challenge no. 8,
there has been significant progress in our abilities in this respect.
10) Maintaining the authenticity and integrity of the preserved digital
materials. This challenge is noted in detail in Chapter 5. Although authenticity
is not a primary requirement for all digital materials, for many categories (for
example, for digital materials that are used for evidential purposes) it is essential.
One challenge, as noted above, is to determine the attributes that determine
authenticity for different categories of materials; another is to maintain those
attributes into the future. Research activities, in particular those of the
InterPARES Project and the InSPECT Project (both noted in Chapter 5), have over
the last decade contributed significantly to our understanding of the require-
ments.
11) Developing better costing data. Considerable research has been under-
taken over the last decade into the costs of digital preservation. Whereas in
2001 there was a lack of reliable data about the costs of digital preservation,
ten years later it is possible to state that we know a lot about it, although there
is still more to understand. Better costing data are needed immediately; with-
out better information about costs, actions such as securing funding and deter-
mining responsibilities for digital preservation are impeded.
12) Securing funding for digital preservation. As noted in challenge no. 7,
an emphasis on short-term project funding is not conducive to the long-term viability
of digital preservation. Securing long-term funding requires that political
masters and funding agencies be convinced of the need for digital information
to be preserved. This challenge may, in part, be mitigated by success in addressing
one of the challenges mentioned above: integrating digital preservation into
mainstream operations. There has not been sufficient movement in this area;
integrating digital preservation into mainstream operations remains a
challenge, as it continues to rely heavily on short-term funding.
13) Increasing awareness about digital preservation. Digital preservation
will be incorporated into mainstream activities, will attract secure funding, will
attract appropriate resources such as skilled personnel, and will achieve maturity
in other respects only when awareness of it is heightened at all levels, from the
political to the individual. This challenge has been addressed positively; for
example, it no longer appears on the 2011 list of digital preservation risks. As
with challenge no. 2, even if the level of awareness has increased, there is an
ongoing need to educate new creators about the issues and about the tech-
niques to address them.
14) Developing the skills required for digital preservation. Skilled personnel
with appropriate knowledge of digital preservation requirements are in short
supply around the world (noted briefly in Chapter 1). The challenge is to identify
more precisely the skill sets that are required and to encourage their acquisition.
While the number of opportunities to acquire skills in digital preservation has
increased considerably since the first edition of this book appeared in 2005, the
demand for skilled personnel will only continue to increase. This challenge is
discussed in more detail later in this chapter.
Two challenges have been added to the fourteen examined in the first edi-
tion of this book, published in 2005. They have been identified from recent
writings about digital preservation.
15) Automating digital preservation. The most pressing of the technical
issues requiring a better response is the need to automate curation processes for
handling the very large and increasing quantities of digital records. Software
tools that are easy to implement and use, and that enable the automation of
preservation processes such as refreshing, migration, tracking changes to data,
verifying provenance and assigning metadata, need to be developed. One challenge
is to automate more of the task of creating metadata; unless this is achieved we
are seriously limited by our still largely manual digital preservation processes,
which can only deal with small quantities of digital materials and generate a
relatively small amount of metadata. Although considerable progress has been
made in European research projects, such as toolkit development by the
Planets and CASPAR projects, more tools are definitely needed. Among current
research and development activities is the development of curation micro-
services (noted in Chapter 8).
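
As a minimal illustration of what automating one such process might look like, the sketch below records a checksum and basic technical metadata for every file in a directory and later re-checks the files against that record. It is an assumption-laden example, not a description of the Planets or CASPAR tools or of any curation micro-service: the function names (sha256_of, build_manifest, verify) and the manifest layout are hypothetical, and a production system would also need to record provenance and store the manifest safely.

```python
import hashlib
import os
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Walk a directory and record minimal technical metadata per file.

    The manifest layout is hypothetical, chosen only for illustration.
    """
    manifest = {}
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            rel = os.path.relpath(path, root)
            manifest[rel] = {
                "sha256": sha256_of(path),
                "size_bytes": os.path.getsize(path),
                "recorded_at": datetime.now(timezone.utc).isoformat(),
            }
    return manifest


def verify(root, manifest):
    """Compare the files under root with a stored manifest.

    Returns the relative paths of files that are missing or whose
    checksum no longer matches the recorded value.
    """
    problems = {"missing": [], "changed": []}
    for rel, entry in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path):
            problems["missing"].append(rel)
        elif sha256_of(path) != entry["sha256"]:
            problems["changed"].append(rel)
    return problems
```

Run on a schedule, a check of this kind replaces manual inspection of individual files, which is the sort of scaling that this challenge calls for.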
16) Coping with large and increasing quantities of changing digital material.
The sheer quantity of digital material being generated, coupled with its
changing nature and changes in the way it is used, raises major challenges
for its preservation. Sharing and re-using material are emphasized more and
more, requiring that standards for creating digital objects are agreed and
widely implemented to ensure they can be discovered, located and preserved.
The deconstruction, repurposing and constant rebundling of digital information
have significant implications for the authenticity of digital records of all kinds,
including business records. As noted in Chapter 5, authenticity can be ensured
by actions such as paying attention to the significant properties of digital records
and being meticulous about quality-checking procedures during migration.
This poses challenges in part because of the quantities of records involved and
the present lack of viable automated procedures to support these tasks.

Four major challenges


Of these 16 challenges, four emerged as the most intractable during interviews
with Australian digital preservation experts when the first edition of this book
was being prepared in 2004, and these have been reiterated in the literature of
digital preservation, at conferences, and through the concerns of research funding
bodies since then:

– challenge 1: managing digital preservation, especially the importance of
integrating digital preservation into mainstream operations
– challenge 2: funding digital preservation, especially without adequate costing
data for digital preservation activities
– challenge 3: peopling digital preservation, especially defining the skills re-
quired in the future and ensuring that they remain available
– challenge 4: making digital preservation fit, especially achieving scalability
of practice to both large systems and the smallest institutions.

The discussion that follows is based on the views and comments gathered during
the interviews with digital preservation experts in 2004, and amplified by the
literature of digital preservation.

Challenge 1: managing digital preservation

One frequently voiced issue is that digital preservation has not become a normal
part of mainstream practice in most institutions. Much of the development in
digital preservation to date has been carried out as special projects. The project
approach has some merit as a response to the magnitude of the problem, where
project-based approaches that bring stakeholders together to collaborate are
desirable and effective. It also has merit as a way of securing funding for scop-
ing studies, or of exploring a technique, or of testing the waters in other areas.
However, project-based funding is by definition short-term and engenders short-
term ways of thinking. Project-based digital preservation activities get in the way
of an appropriate recognition that digital preservation is here to stay and can only
be addressed if all institutions and stakeholders play a part. Beagrie states that

ultimately, digital preservation will be successful when it can be seen not as a stand
alone institutional activity but as an activity embedded in how institutions manage and
approach digital information and resources on an ongoing basis ... It remains a simple
objective, yet one immensely challenging to achieve (Beagrie, 2004).

Project-based approaches have achieved many notable results, but will not en-
sure sustainability in the medium or long term. The challenge is to integrate digi-
tal preservation fully into normal operations of libraries, archives, museums
and other institutions with responsibility for digital sustainability. As one Austra-
lian digital preservation specialist put it, ‘the challenge we’ve got ... around
digital preservation … [is] that it just becomes the way we do things’. Another
Australian expert interviewed described the issue in the library context. Because
digital materials have not been incorporated into the normal processes, they
‘are treated as a different acquisition process, or as a different this or a different
that’. Short-term thinking based primarily on business need may lead to digital
preservation activities being halted; if no current demand for material is evident,
it may be disposed of.
The principal reason for this concern is that sustaining digital materials
over time requires ongoing, unbroken activity, expressed pithily by an Austra-
lian digital preservation specialist in this way: ‘preservation is not something
that you ever achieve. It’s just a process you’re in the middle of, or at the start
of, but never at the end of’. There needs to be full recognition that digital preser-
vation is an active process that is fully integrated into mainstream operations in
libraries, archives, museums and other institutions with responsibility for digital
sustainability.
How do we make the change from project-based, short-term support for
digital preservation to its integration into mainstream activities? What changes
will be required? Some institutions have created new positions at senior levels
that aim to break down existing structures by working across different areas. An
example is the position of Librarian/Archivist for Digital Projects at the
Schlesinger Library, part of Harvard University’s Radcliffe Institute for Ad-
vanced Study. This position has oversight of all digital materials in the
Schlesinger Library, from selection and appraisal to preservation. Other institu-
tions have already successfully integrated digital preservation into their normal
operations, or are well on the way to doing it. One is the National Archives of
Australia, whose preservation programme is ‘concerned to preserve all formats
of records … It’s not about digital … just as AV preservation is about preserva-
tion of the record – there are special skill sets for people needed in that too, but
it’s still fundamentally about preservation’. The National Library of Australia
‘has been able to institutionalize [digital preservation] or operationalize it’.
(These are the words of Australian digital preservation specialists interviewed in
2004.) The Wellcome Library is acquiring born-digital materials on the same ba-
sis as other material it collects, building digital preservation ‘into the everyday
business’ of the Library (Hilton and Thompson, 2007). Its early experience in
developing workflows and procedures to accommodate digital materials was
helped by not distinguishing digital materials as different (Thompson, 2008).
A Capability Maturity Guide developed by the Australian National Data
Service (2011), although intended to apply to research data infrastructure provi-
sion, is applicable to cultural heritage and other institutions who have digital
preservation programmes. It provides helpful guidelines for assessing the stage
an institution is at and for identifying areas for improvement. It proposes five
levels: the initial level, the developmental level, the defined level, the managed
level, and the optimizing level. Summaries of each level are provided and within
each of these the level of performance in four areas (policies and procedures, IT
infrastructure, support services, and metadata management) is defined. At Level
1, ‘the organisation does not provide a stable environment to support research
data management. Expertise is likely to be concentrated within a few individuals
… Co-ordination and cohesion across the various groups (e.g. research office,
IT, library, records office, research areas) … is patchy, if non-existent’. By con-
trast, at Level 5 the organization ‘uses the policies, procedures, practices, ser-
vices and facilities already developed as the basis for continual improvement’.
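
A self-assessment against a guide of this kind can be recorded very simply. The sketch below is illustrative only: the level names and the areas are taken from the description above, but the rule that an institution's overall level equals that of its weakest area is an assumption made for this example and is not stated in the guide. The names Level, AREAS, overall_level and areas_to_improve are likewise hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five maturity levels named in the text."""
    INITIAL = 1
    DEVELOPMENTAL = 2
    DEFINED = 3
    MANAGED = 4
    OPTIMIZING = 5


# The areas of performance named in the text.
AREAS = (
    "policies and procedures",
    "IT infrastructure",
    "support services",
    "metadata management",
)


def overall_level(assessment):
    """Return the lowest level recorded across all areas.

    Assumption for illustration: the institution is only as mature
    as its weakest area. The guide itself may aggregate differently.
    """
    missing = [area for area in AREAS if area not in assessment]
    if missing:
        raise ValueError(f"no level recorded for: {', '.join(missing)}")
    return min(assessment[area] for area in AREAS)


def areas_to_improve(assessment):
    """List the areas sitting at the institution's lowest level."""
    lowest = overall_level(assessment)
    return [area for area in AREAS if assessment[area] == lowest]


# Hypothetical institution, invented for this example.
example = {
    "policies and procedures": Level.DEFINED,
    "IT infrastructure": Level.MANAGED,
    "support services": Level.DEVELOPMENTAL,
    "metadata management": Level.DEVELOPMENTAL,
}
print(overall_level(example).name)
print(areas_to_improve(example))
```

Recording the assessment in this form makes it straightforward to repeat it each year and to see which areas are holding the institution back.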
Two approaches to encourage integration are gaining popularity. One is a
risk management approach, increasingly being applied to digital preservation
activities. This involves a systematic identification of potential risks and actions
applied to manage those risks. Another approach is to use business planning
methodologies (see Bishoff and Allen, 2004).
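
The risk management approach can be sketched as a simple risk register. The example below assumes a common convention, scoring each risk as likelihood multiplied by impact on five-point scales; this convention, the class and function names, and all of the scores are hypothetical and are not drawn from any of the sources cited here. The risk descriptions echo the NDIIPP list quoted earlier in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (minor) to 5 (severe); hypothetical scale
    action: str      # the action chosen to manage the risk

    @property
    def score(self):
        """Likelihood multiplied by impact (an assumed scoring rule)."""
        return self.likelihood * self.impact


def prioritize(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Illustrative entries; the scores are invented for this example.
register = [
    Risk("File format becomes obsolete", 3, 4,
         "Monitor formats and plan migration"),
    Risk("File degradation ('bit rot')", 2, 5,
         "Schedule regular checksum verification"),
    Risk("Insufficient skilled personnel", 4, 3,
         "Fund training and share expertise"),
]

for risk in prioritize(register):
    print(risk.score, risk.description, "->", risk.action)
```

The value of such a register lies less in the numbers than in the discipline of naming each risk and attaching an action to it.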

Challenge 2: funding digital preservation

The UNESCO Guidelines note that two aspects should characterize reliable
digital preservation programmes. One is ‘organisational viability’ which is
based on ‘an ongoing mandate’ and appropriate resources and infrastructure.
The other is ‘financial sustainability’: the organization is likely to ‘provide the
required resources well into the future, with a sustainable business model to
support its digital preservation mandate’ (UNESCO, 2003, p.42).
As noted in the discussion above, digital preservation is vulnerable unless
it becomes part of core business activities. This means that its funding must
become a core business expense and, in the library context, that preservation
‘needs to be viewed as an activity just as essential as is cataloging’ (A. Smith,
2002, p.5). Funding should be sustained to pay the digital mortgage. Adequate
sustained funding is critical; it may be the key issue for digital preservation,
subsuming all others. Technical issues, such as determining significant properties
for new formats and ensuring that they are captured, can be seen as resourcing
issues, because the expertise to carry out such technical work can only be hired
if money to pay salaries is available. The technical infrastructure can be ex-
pensive to purchase, operate and replace on a regular basis; again, this is
a resourcing issue. Digital archives typically grow in size, so their funding
requirements also continue to grow.
One problem that continues to obstruct securing ongoing funding is our
lack of concrete information about how much it will cost. Projects such as Ce-
dars attempted to develop firmer costing data as part of their research activities,
but despite their efforts the outcomes are still not well defined. Early investiga-
tions of the costs of digital preservation included research into the ongoing
costs to libraries of maintaining periodicals in print and digital forms
(Schonfield et al., 2004). Cloonan and Sanett (2000, p.210) determined that the major
factor influencing decisions in many institutions is how much it will cost, and
they urged the development of better cost modeling. Sanett notes elsewhere that
‘a cost model makes intelligent planning possible’ (Sanett, 2003). One attempt
to cost repository storage at OCLC noted a 1996 prediction by Lesk that ‘“the
costs of the digital and traditional library operations [would] cross over in
about five years,” and that electronic storage would offer a “major cost advan-
tage” within ten’ (Chapman, 2003, p.8). The OCLC experience was that the
cross-over point had not been reached for one type of digital material, com-
pressed 600 dpi images, but had been for ASCII files. This study concluded
that ‘there are many variables to the equation of managed storage costs’, such
as formats, numbers of items, number of versions, and number of collections,
which impeded firm costings at that time (Chapman, 2003, p.13). James et al.
(2003) also researched costs of repository storage, in this case cost models for
preserving e-prints in institutional repositories.
A model developed by Lavoie describes the economic issues of funding
digital preservation in terms of incentives, which need to be strong enough to
solicit stakeholders’ participation. The key stakeholders are the owners of the
intellectual property rights, the archive that provides the services, and the entity
that benefits from the long-term preservation of the materials. Lavoie charac-
terized the incentives as ‘perceived motivation sufficient to 1) induce a party to
recognize a need to take action to secure the long-term viability of digital
materials in which they are a stakeholder, and 2) induce a party to develop and
implement technologies aimed at ensuring the long-term viability of digital
materials’ (Lavoie, 2003, p.ii). He noted that the absence of empirical data
about costing has required that values for many of the variables be estimated
(p.7), examined five organizational models, and provided an agenda for further
research that

should include the accumulation and synthesis of digital preservation case studies; the
development of appropriate policies for enhancing incentives based on the characteris-
tics of the underlying organizational model; characterizing and analyzing the structure
of aftermarkets associated with digital preservation services; and devising sustainable
pricing strategies for digital preservation services (Lavoie, 2003, p.iii).

Lavoie has since expanded his view of the economic issues to encompass the
responsibilities of stakeholders, strategies for organizing preservation resources
most efficiently (Lavoie, 2004), and public good in relation to responsibilities
and costings for digital preservation (Lavoie and Dempsey, 2004).
Research into costs is ongoing and to date focuses on developing better
costing data – Lavoie’s ‘accumulation and synthesis of digital preservation
case studies’ – and on investigating where resources might come from. The
Blue Ribbon Task Force on Sustainable Digital Preservation and Access inves-
tigated digital preservation from the perspective of economic sustainability by
determining costs and identifying sustainable economic models that could
ensure the availability of resources for preservation activities. Its final report
(Blue Ribbon Task Force, 2010) is required reading for anyone interested in
digital preservation. Another investigation into sustainability, which focused
on the role of funding bodies, concluded that although the funders of digital
resource creation agree on the value of sustaining resources, their views on
what this means in practice are not uniform. The investigators’ report (Maron
and Loy, 2011) called for clearer articulation of thinking about requirements
for sustainability, the costs of achieving sustainability, and possible sources
of funding.
Several bodies are actively investigating the costs of ensuring that research
data are accessible. The Alliance for Permanent Access, one of whose aims is
to ‘support the development of a sustainable European Digital Information
infrastructure that guarantees the permanent access to the digital records of
science’ (www.alliancepermanentaccess.org/index.php/about) held a conference
in 2008 with the theme ‘Keeping the Records of Science Accessible: Can We
Afford It?’ (Alliance for Permanent Access, 2008). Among the points made at
this conference was that cost was determined in part by when preservation
actions were carried out. Curating data properly when they are created is much
cheaper than attempting to do so later. Clearly, then, planning data curation
before data creation begins is well worthwhile.
Two particularly significant cost modeling projects are LIFE and Keeping
Research Data Safe. The LIFE (Lifecycle Information for E-Literature) Project
(www.life.ac.uk), funded by JISC in the UK, had three phases. In the first
phase a LIFE model was developed that identified cost elements at specific
stages of the lifecycle; it was tested against case studies to develop a generic
preservation model. In Phase 2 new case studies were analyzed and the model
was further developed (Wheatley, 2008). Phase 3 has produced a web-based
predictive cost model (Hole et al., 2010). JISC also funded the Keeping Research
Data Safe (KRDS) project (www.beagrie.com/krds.php). Its outputs include a
toolkit to help determine the benefits of digital preservation, and a detailed
user guide to the application of the KRDS cost framework to develop local
cost models for digital preservation. A factsheet summarizes key findings from
the two phases of the KRDS project, including that acquisition and ingest of
digital materials into an archive cost much more (around 55 per cent of the total
preservation costs in one study) than archival storage and preservation activi-
ties, and that there are fixed costs that do not vary, regardless of the size of the
collection. The KRDS factsheet (Charles Beagrie Ltd and JISC, 2010) is re-
quired reading for anyone interested in digital preservation.
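The two KRDS findings noted above (a large ingest share and fixed costs that are independent of collection size) can be made concrete with a small worked calculation. The sketch below is illustrative only: the function names, the monetary figures, and the treatment of the 55 per cent ingest share as a simple proportion of total cost are assumptions made for the example, not part of the KRDS cost framework.

```python
def preservation_cost(items, fixed_cost, variable_cost_per_item, ingest_share=0.55):
    """Return (total, ingest, storage_and_preservation) for a collection.

    fixed_cost does not vary with collection size; variable_cost_per_item
    does. ingest_share is the assumed fraction of the total attributed to
    acquisition and ingest (0.55 echoes the figure reported in one study).
    """
    if items < 0:
        raise ValueError("items must be non-negative")
    if not 0.0 <= ingest_share <= 1.0:
        raise ValueError("ingest_share must be between 0 and 1")
    total = fixed_cost + variable_cost_per_item * items
    ingest = total * ingest_share
    return total, ingest, total - ingest


def cost_per_item(items, fixed_cost, variable_cost_per_item):
    """Average cost per item; fixed costs weigh more on small collections."""
    if items <= 0:
        raise ValueError("items must be positive")
    total, _, _ = preservation_cost(items, fixed_cost, variable_cost_per_item)
    return total / items


if __name__ == "__main__":
    # Hypothetical figures: 10,000 in fixed costs, 5 per item.
    for n in (100, 10_000):
        total, ingest, rest = preservation_cost(n, 10_000.0, 5.0)
        print(n, total, round(ingest, 2), round(rest, 2),
              cost_per_item(n, 10_000.0, 5.0))
```

With these hypothetical figures the average cost per item falls sharply as the collection grows, because the fixed costs are spread over more items. This is one way of seeing why cost findings from large institutions do not transfer directly to small ones.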
Those involved in the LIFE and KRDS projects conferred with personnel
from other cost modeling projects (the Danish CMDP, and DANS and the
National Archives Testbed in the Netherlands) in 2010 to compare models and
collaborate on future activities (Expert Meeting, 2010). Among the observa-
tions reported were: file formats may last longer than previously expected
(TIFF, for example, for 16 years); assumptions underlying one model that 10
per cent of the records would require repair turn out to be much too low; fees
for managing public records may help cover long-term access costs; and, as
ingest is the most expensive phase, a way of funding preservation may be to
charge a one-time fee at ingest. The experts at this meeting concluded that
much more research on cost modeling must be done.

Challenge 3: peopling digital preservation

Commentators regularly lament the lack of people with the expertise needed to
extend the digital preservation agenda and have done so for many years. Hed-
strom and Montgomery noted in 1999 that ‘Lack of staff expertise is a common
problem both in institutions with digital preservation responsibilities and in
institutions that have not yet assumed responsibility for digital materials’, adding
that those with the required information technology skills typically did not
have an understanding of long-term preservation (Hedstrom and Montgomery,
1999, pp.16-18). In 2004 the International Council on Archives noted ‘the lack
of adequate training of, and human resources development for, records per-
sonnel’ and how this impeded the efforts of archivists to ‘protect archives as
evidence and to preserve an “authentic digital heritage”’ (Millar, 2004, p.5).
The proliferation of activities since then to identify skills needed for digital
preservation and to develop these skills, and the funding that has been devoted
to educating and training information professionals with digital preservation
skills, indicate very clearly that the situation has persisted.
What new knowledge, what new skill sets are needed? Chapter 1 noted
that new skills would be required to implement new policies and develop new
procedures. A decade ago Jones and Beagrie described the digital preservation
work environment and identified some of its requirements, characterizing it as
an environment of rapid and constant change, where the boundaries of respon-
sibilities were blurred and increased weight was given to collaboration and
accountability. For digital preservation activities specifically, formal training
opportunities were rare, so that much was learned on the job, and informal
contacts with others working in the field were valuable (Jones and Beagrie,
2001, p.54). There was consensus that general professional skills were required
to a high level, to provide a deep understanding of the reasons why preservation
is required within the context of that profession and to provide an holistic under-
standing of digital preservation rather than a narrow perspective. Generic skills
such as project management, communication and presentation skills, and stra-
tegic thinking were also considered to be necessary at a high level. Technical
skills to a higher degree than previously required were also considered essen-
tial. At a broader technical level, digital preservation specialists needed to
know about areas that assume greater importance for digital preservation, such
as metadata and XML, and also about the specific media that they work with.
The IT skills needed to be at the level where (in the words of an Australian
digital preservation specialist in 2004) ‘you’ve got the literacy to be able to get
down into that bit-stream and extract things and pull things out of corrupted
disks, and all of that – all that technical nous’, and, in addition to this computer
literacy, digital preservation specialists need to have more generic skills in areas
such as change management, writing comprehensible and rigorous documenta-
tion, and working as part of teams.
More recently, research has attempted to improve our understanding of ex-
actly what the skills needed for digital preservation are. The most detailed
investigation has been carried out by the DigCCurr (Digital Curation Curricu-
lum) project, based at the School of Information and Library Science, University
of North Carolina at Chapel Hill, which has produced a detailed listing of the
skills and competencies needed (Lee, 2008, 2009). Another investigation was
the SHERPA (Securing a Hybrid Environment for Research Preservation and
Access) Project (sherpa.ac.uk/index.html) which identified broad categories of
skills: management; software; metadata; storage and preservation; content;
advocacy, training and support; liaison (internal); liaison (external); and current
awareness and professional development (Robinson, 2009). The balance between
generic and technical skills is present in all listings of skills required for digital
preservation. Another example is given by Cunningham who includes advocacy
skills as well as generic personal attributes, such as the ability to respond to
change, among requirements for digital archivists (Cunningham, 2008, pp.541-
542). The interest in identifying better the skills needed and in ensuring the
availability of training in those continues, as evidenced by meetings such as
the ICE (International Curriculum Education) Forum held in London in 2011
(sils.unc.edu/events/2011/ice), at which participants from several countries dis-
cussed curricula, course design, and the production of educational materials at
all levels.
Where can the skills for digital preservation be acquired? The options range
from formal education programmes to learning on the job. In the past five years
there has been considerable growth in formal educational programmes devoted
to managing digital materials which have a substantial digital preservation
component. Examples are, in the UK, the MA in Digital Asset Management at
King’s College London (www.kcl.ac.uk/prospectus/graduate/index/name/digital-
asset-management) and the MSc in Information Management and Preservation
(Digital) offered by HATII (Humanities Advanced Technology and Information
Institute) at the University of Glasgow (www.gla.ac.uk/departments/hatii), and
in the US, the University of Arizona’s online Graduate Certificate in Digital
Information Management (digin.arizona.edu). Many other schools of information
management or librarianship around the world offer individual subjects,
sequences of subjects, or graduate certificates in digital preservation or curation.
Increasingly the large digital preservation research projects are including
educational opportunities in their agendas. Among examples of these are:
InterPARES 3 (www.interpares.org/ip3/ip3_index.cfm) which is developing
teaching modules for use in in-house training programs, continuing education
workshops and academic curricula; the Planets Project which has produced
training materials and made them freely available online (www.planets-
project.eu/training-materials); and DataONE (www.dataone.org/about), one of
the cyberinfrastructure projects funded generously in the US by the National
Science Foundation, which intends to provide training on best practices to
scientists and students as well as formal graduate-level training. Informal
opportunities also exist, such as online self-study tutorials. A long-established
example is the Cornell University Library’s online tutorial Digital Preservation
Management: Implementing Short-term Strategies for Long-term Problems
(Kenney et al., 2003). Other informal educational opportunities include work-
shops, summer schools and professional institutes.
Despite the considerable progress evident in the last ten years, the challenge
of meeting the ‘pressing need for education and training in new digital archiving
methods, tools, and technologies’ (Workshop on Research Challenges in Digital
Archiving and Long-term Preservation, 2003, p.xix) is still great and will take
many years to address comprehensively.

Challenge 4: making digital preservation fit

Kenney, in an unpublished address to the UNESCO Regional Consultation on
the Preservation of Digital Heritage meeting, Canberra, Australia, 4-6 November
2002, noted that digital preservation activities can be characterized as being
too high (conceptual rather than pragmatic), too low (narrowly defined), too
little (failing to acknowledge the problem), or too late (occurring reactively
rather than anticipating the problem). This characterization still holds ten years
later. For digital preservation activities to be fully integrated into standard
practice in cultural heritage institutions and into the working practice of scien-
tists, scholars, and indeed all individuals, they need to be applicable to indi-
viduals as well as to institutions of all sizes and shapes. Individuals and the
employees of institutions must be aware of the need for digital preservation
and must have ready access to practical advice that they can apply. Discussion
about digital preservation has too frequently been couched in terms that apply
only to large well-resourced institutions, with little regard for the requirements
of smaller organizations and of individuals. Yet it has become abundantly clear
that there is a need to engage all those involved in digital preservation and to
address its challenges at all levels.
To date, responses to the challenges have largely been developed by large,
well-resourced institutions. So, what can be done about digital preservation in
smaller institutions? How can small libraries and archives respond to the chal-
lenges of preserving digital data? The issue here is one of scalability – scaling
down, as well as scaling up. This realization has taken firm hold in the last
decade and there are now many initiatives that seek to address the digital preser-
vation requirements of smaller institutions and of individuals.
A close look at the lessons learned by institutions that have been active in
digital preservation to see how they could apply in other contexts is rewarding.
The National Library of Australia’s experience provides some guidance and
assistance. One of its responses is

stages of action … Our most pressing demands were to make some decisions about
what we should try to preserve and to put those materials in a safe place … We have
formalized these processes into two broad terms: archiving and long-term preservation.
For the NLA, archiving refers to the process of bringing material into an archive; long-term
preservation refers to the process of ensuring that archived material remains authentic
and accessible (Webb, 2002, p.66).

This translates into a staged approach: start now; do what you can now; and
then consider the possibilities.
Think of a local history collection in a small public library. Such collections
are of considerable significance to the area they serve. This library will be
offered photographs of its locality in digital form; indeed, this is already hap-
pening. Local organizations now produce and disseminate newsletters only in
digital form, their web sites contain useful information for local history collec-
tions, and they develop significant databases that are widely used, for example
the St Croix African Roots Project (stx.visharoots.org). Other kinds of institu-
tions operate in similar ways, for instance the one-person special library, the
small archive perhaps staffed largely by volunteers, and the school library
which may already be attuned to the need to preserve the school’s heritage and
finds that much of it now resides on servers or on short-lived media. What
these small institutions have in common are digital objects in their collections,
staff whose duties are diverse and whose time is limited, and very limited
resources. These institutions urgently need guidance in the form of precise and
concise directions that can be readily implemented, perhaps as workflows
applicable to day-to-day operations. The challenge is to translate the lessons
learned in larger operations into guidelines and action lists for smaller institu-
tions.
Guides for individuals are now abundant. Examples include a guide to
designing preservable web sites whose five tips include ‘follow accessibility
standards’ and ‘avoid proprietary formats for important content or provide
alternate versions’ (Davis, 2011). YouTube clips are proliferating: ‘Backing
Up Your Digital Family Photos’ (www.youtube.com/watch?v=IQIoE8ceAu0)
is but one of many examples. The Library of Congress also provides guidance
for individuals about archiving their personal digital photographs, audio,
video, emails and web sites (www.digitalpreservation.gov/you). In a similar
vein, the National Digital Stewardship Alliance in ‘Digital Preservation in a
Box’ provides resources that can be used in planning and presenting events
that introduce digital preservation (Lazorchak, 2011). There is definitely still
room for more.

Research and digital preservation


Digital preservation is characterized by a high level of research activity. This
is evident in many of the initiatives and collaborations described in Chapter 9.
They conduct research, or have done so in the past, and in some of them research
has a very high priority. Early examples include research commissioned by the
British Library to compare methods and costings (Hendley, 1998) and the
National Library of Australia’s Draft Research Agenda for the Preservation of
Physical Format Digital Publications (National Library of Australia, 1999).
Some of the most prominent examples were: Cedars and CAMiLEON, both
established as research collaborations; initiatives that were established specifi-
cally to test strategies and services, such as the Digital Preservation Testbed;
and initiatives in which research is one major activity alongside other activities,
such as the Digital Curation Centre and ERPANET.
The high levels of research activity in digital preservation can be explained
in part by the need for innovation in ways of thinking and activities. Few of the
pre-digital paradigm approaches are transferable and, apart from the experience
of the data archiving community, there has been no significant body of knowl-
edge to fall back on. Another reason is the urgency of the issues, which has,
Lavoie has suggested, ‘motivated an ambitious, community-wide research
agenda aimed at an improved understanding of what digital preservation entails’
(Lavoie, 2003, p.3). Certainly the engagement of a large number of stake-
holders – the ‘community-wide’ support noted by Lavoie – is clear, collabora-
tion being a hallmark of digital preservation, as Chapter 9 notes. Research has
resulted in many responses to the technical challenges, such as preservation
metadata schemes, increased understanding of the longevity of media, requisite
architectures for digital archives, and standards for trusted digital repositories.
Other challenges, too, have been addressed, including the social challenges,
such as a better understanding of risks, and increased knowledge of authenticity
requirements.
Statements about the digital preservation research agenda abound – general
statements, those that are sectorally focused, and some with a national focus.
Two digital preservation research agendas whose relevance persists are the
joint US National Science Foundation (NSF) and Library of Congress report,
It’s About Time (Workshop on Research Challenges in Digital Archiving and
Long-term Preservation, 2003), and the joint NSF and European Union’s
DELOS report, Invest to Save (NSF-DELOS Working Group on Digital Archiv-
ing and Preservation, 2003). These agendas demonstrate an encouraging level
of consensus.
An early statement (Bennett, 1997) posed some key questions that are still
valid and can usefully motivate our current research:

– Why? What is the rationale for preservation? When an object is retrieved from the
archive will it still be valuable in 50 years time? Will it still be recognizable and com-
prehensible? … what costs are non-discretionary, how do they apply to an item’s life-
cycle in archive, and when will costs start to be discretionary? What benefits are
measurable, how can they be achieved, and who can be tasked with capturing them?
– How much? What contextual information is necessary for preservation? … what
contextual information is sufficient, so that when it is retrieved it can be interpreted
correctly? How the object will eventually be accessed, and for what purpose, how
will this affect the approach to preservation? …
– How? What are the preservation processes’ procedural needs in order to achieve a
long term archive? Who are the stakeholders, who will influence the way the archive
is built up and managed? What quick, cost-saving routes are there, which do not
adversely affect the quality of the archive? What safety nets exist which can provide a
fall-back for the archive should accidental loss or deliberate sabotage to the archive
occur?
– Where? While technology is in a state of continuous transition, when will technology
be resilient and stable enough for any item to be assured of its long term preservation?
(Bennett, 1997, pp.9-11).

At about the same time, Hedstrom noted four areas for research that were
likely to be fruitful – storage media, migration, conversion, and management
tools (Hedstrom, 1998, pp.197-200). Five years later, the outcomes from the
joint NSF and Library of Congress workshop to consider research challenges in
digital preservation provided a comprehensive map of the digital preservation
research agenda based on four themes:
1. Technical architectures for archival repositories: specification, system and tool
development, pilot implementation, and evaluation of repository models; develop a
spectrum of repository architectures; develop a spectrum of digital archiving
services; alternative repository models and interoperability; scalability and cost
2. Attributes of archival collections: articulating and modeling of curatorial processes;
developing appropriate preservation methods for diverse digital objects and collec-
tions; aggregation of items and objects into collections; decision models
3. Digital archiving tools and technologies: acquisition and ingest; managing the
evolution of tools, technology, standards, and metadata schema; naming and authori-
zation; standards and interoperability
4. Organizational, economic, and policy issues: metrics; economic and business models
(Workshop on Research Challenges in Digital Archiving and Long-term Preservation,
2003).

The joint working group of the NSF and European Union’s DELOS Pro-
gramme also outlined an ambitious agenda, grouping their recommendations
for future research into three areas:

1. Preservation strategies: emerging research domains (including research into repository
models, software and format repositories, archival media, salvage and rescue,
accelerated aging)
2. Re-engineering preservation processes (including research into modeling preservation
processes, automation of processes, scalability, distributed and grid storage)
3. Preservation of systems and technology (research into areas such as formats, man-
aging complex and dynamic digital materials, automated metadata creation,
acceptable loss) (NSF/DELOS Working Group on Digital Archiving and Preserva-
tion, 2003, pp.v-ix).

This report noted that three research areas were most likely to have the greatest
impact: self-contextualizing objects, metadata and the development of ontolo-
gies, and mechanisms for preservation of complex and dynamic objects
(NSF/DELOS Working Group on Digital Archiving and Preservation, 2003,
p.ix).
There have been frequent calls for more reports on practical experiences of
digital preservation. The predominant theme of a 2004 DPC forum about the
global context of digital preservation was ‘the need to develop and share prac-
tical experience’ (Digital Preservation Coalition, 2004). Lynch suggested that
the digital preservation experience to date is most applicable to the largest
organizations and that for smaller organizations at the state and local govern-
ment level, and for corporations of all sizes, ‘we need a research agenda to
give guidance and assistance for those who manage these valuable resources;
we need to offer them affordable and implementable approaches, and supporting
systems. There is a place not just for analytical research but also descriptive
research’ (Lynch, 2004).
A more recent agenda is the DPE’s Research Roadmap (DigitalPreservation
Europe, 2007), which is particularly useful for its synthesis of the major research
agendas published up to 2007. It identifies the principal issues that need to be
addressed in future digital preservation research. Ten areas are identified:
restoration; conservation; management; risk; significant properties of digital
objects; interoperability; automation; context; storage; and experimentation.
To what extent have these research agendas and concerns been addressed?
Three regions where there has been significant funding for research activity
stand out: the NDIIPP Program in the US; JISC-funded activities in the UK;
and European research projects funded by the EU. The wide scope of research
funded by NDIIPP is recorded in its 2010 report (NDIIPP, 2011), for example in
the Appendix C inventory of the tools and services created by NDIIPP partners
with funding from NDIIPP. Significant funding for digital preservation in the
US also comes from the National Science Foundation, most notably $100 million
made available to ensure preservation and curation of engineering and science
data, the Institute of Museum and Library Services ($11 million since 2001), the
National Historical Publications and Records Commission, and the National
Endowment for the Humanities (Howard, 2011a and 2011b). JISC’s long-
standing interest in digital preservation (www.jisc.ac.uk/preservation) has meant
that funding has been available in the UK for digital preservation research that
has made significant contributions, such as the SHERPA, LIFE and KRDS
projects noted in this chapter.
What has probably advanced the digital preservation research agenda more
than anything else has been the funding available from the European Commis-
sion; many of its outputs have been noted in preceding chapters, for example
the output of the Planets and CASPAR projects. In May 2011 the European
Commission hosted a workshop of digital preservation experts to identify
where research into digital preservation should head when new funding rounds
commenced in 2013. The summary report of this workshop noted some general
areas for research: the need to expand and integrate communities of interest;
more automation and simplification; services to address increasingly complex
digital objects; developing self-preserving objects; the need for better business
models to support investment in digital preservation; and making digital preser-
vation a core part of the curriculum for computer science education. Fourteen
areas were proposed for consideration: extraction of preservation information;
integrated access (time – systems – community); reformulating digital preserva-
tion as a computer science question; integrated emulation systems; knowledge
preservation; quality assessment; complex object automation; ease of use and
private data; integration of digital preservation into digital asset management;
standards; market-driven and cost benefit; and self-preserving objects. The
report (Billenness, 2011) expands these areas and provides justifications for
their inclusion in the research agenda.
Although considerable research has been carried out, there is much more
to be done. Recent activity in the curation of large data sets has resulted in the
scoping of areas where our knowledge and understanding are limited, such as
how data are shared and the differences in data-sharing practices in different
communities, who reuses what data, what new kinds of education are required,
obstacles (for example, legal, technical, and social) to the creation of new digital
objects, and the extent to which these new digital objects need to be preserved
(Faniel and Zimmerman, 2011). There are also some notable gaps in research
carried out so far. Lavoie noted in 2003 one of the most significant, which is
still of concern today and still being investigated:

as digital preservation moves beyond the realm of small-scale, experimental projects to
become a routine component of a digital asset’s life-cycle management, the question of
how it can be shaped into an economically sustainable process begins to overshadow
other concerns … Digital preservation can hardly be classified as a new topic anymore,
yet we still find ourselves not very far from the beginning in terms of exploring its eco-
nomic ramifications. No systematic study of the economics of digital preservation has
yet emerged (Lavoie, 2003, pp.41-42).

Research into digital preservation continues, and will continue into the fore-
seeable future, to be a pressing requirement if there is to be any real progress
in the preservation of digital materials.

Conclusion: the future of digital preservation


Although there is still considerable wariness and skepticism surrounding our
ability to maintain and provide access to digital materials in the long term, these
doubts lessen as our understanding develops and as effective tools become available.
One early demonstration of this was the 2004 Association of Research Libraries
statement Recognizing Digitization as a Preservation Reformatting Method,
in which ‘the Association … endorses digitization as an accepted preservation
reformatting option for a range of materials’. Although this statement recog-
nized that many issues associated with the long-term preservation of digital
materials still needed to be resolved, it acknowledged that the means for short-
term preservation were available and that considerable activity was taking
place to develop long-term solutions, and it concluded that ‘libraries cannot
wait for these solutions to be completely settled before testing the waters.
Therefore, we must be prepared for persistent technological change’ (Arthur et
al., 2004). Similar points had already been made by others, including
Woodyard (2000), who four years earlier noted that at the National Library of
Australia ‘we have ... analyzed the situation, recognized problems and made a
good start on implementing some solutions. We have not waited until they are
all thoroughly tested and proven because we believe there is limited time to
find solutions due to the rapid rate of technological change’.
Digital preservation has now reached a point of maturity where its require-
ments can be codified. At the end of a comprehensive overview of archiving in
the digital age, Tibbo provided seven edicts for the future of digital preserva-
tion; they need to be read with a broader participation in mind that involves
more than just archivists:

1. Society must recognize, understand, and actively support the efforts to solve the
challenges of digital archiving
2. The challenges that electronic records, and more broadly, digital objects, pose in re-
spect to their long-term preservation and access must be fully explored as a priority re-
search and development area that receives both strategic planning and extensive funding
3. Preservation and access must be viewed as having an inseparable relationship
4. Archivists, information scientists, librarians, policy makers, and computer scientists
must address the full range of issues integral to digital preservation in a coordinated
and collaborative fashion
5. The information technology industry must produce tools to support digital preserva-
tion and access
6. The notion of responsible custody of digital assets must pervade society; digital
archiving must become ubiquitous
7. Archivists must take a leading role in educating society regarding digital preserva-
tion (Tibbo, 2003, pp.43-52).

Are we, as information professionals and as individuals, capable of meeting
these challenges? Do we have the will, the skills, and the influence to manage
this at institutional and governmental levels? Or is the best chance for large-
scale, affordable digital archiving likely to be linked to individuals needing to
preserve their own digital materials, with the consequence that affordable
software solutions will become widely available?
Although there has been significant progress in the quest for effective and
efficient preservation of digital materials, any pride in our achievements must
be tempered by an understanding that digital preservation is still undeveloped.
In particular, there seems to be a dearth of research into user requirements
from the perspective of digital preservation; the work of Snow and her
colleagues (2008) and Chowdury (2010) are notable exceptions. The personal
observation of an attendee at meetings in 2006 and 2011 gives an indication of
progress since 2006: ‘Most of the issues relating to digital preservation remain.
We are, for example, still looking for improved practices for dealing with digital
content across all phases of its life cycle. Securing sustainable funding remains
a challenge. Most preservation models remain institution-specific. The best
path forward still is broad-based collaboration.’ There have, however, been
many changes:

so many new technological developments since 2006 – social media, cloud computing,
smartphones, multi-terabyte hard drives. We’ve preserved so much digital content since
then, including websites, data sets and video. Plus there is so much more information
about digital preservation available: everything from short animated videos, to focused
websites, to specific guidance on personal digital archiving (LeFurgy, 2011).

Another indication of progress can be seen in the observations made by the
National Digital Information Infrastructure and Preservation Program (and
therefore offering a US- and Library of Congress-centric view) of the top ten
digital preservation developments of 2010 (The Top 10 Digital Preservation
Developments, 2010):

– The Library of Congress acquires the Twitter Archives
– The final report of the Blue Ribbon Task Force on Sustainable Digital
Preservation and Access is issued
– Memento (a web site archiving project) wins the Digital Preservation Award
2010
– The rise of the digital preservation awareness video
– Collaborative efforts expand: examples include NDSA, LOCKSS, Hathi
Trust
– The Planets Project’s activities receive attention
– Personal archiving outreach is launched
– A focus on geospatial data preservation
– The (US) Federal Agencies Digitization Guidelines Initiative progresses
– The introduction of digital forensics for cultural heritage.

We have examples of practice that has been successful for more than fifty
years for preserving some kinds of digital materials, particularly social science
data sets, but these have been developed for and work best with relatively
small quantities of simple materials. We must keep reflecting, investigating,
experimenting, and researching as the complexities of the digital world deepen
and new kinds of data evolve, and as new ways of interacting with and using
these data continue to develop.
Bibliography
The Preface to this book notes that there is a considerable amount of high-quality informa-
tion available about preserving materials in digital form, and that much of it is available on
the web. The accessibility that this provides is countered by the impermanence of much web
material, as noted in several chapters in this book. All URLs in this Bibliography were cor-
rect at the time of writing.

Abbott, D. (2003) Overcoming the dangers of technological obsolescence: rescuing the
BBC Domesday Project. DigiCULT.Info: A Newsletter on Digital Culture, 4, 7-10
Abrams, S. et al. (2010) An emergent micro-services approach to digital curation
infrastructure. International Journal of Digital Curation, 5 (1), 172-186 <www.ijdc.net/index.php/
ijdc/article/viewFile/154/217>
Abrams, S., Morrissey, S. and Cramer, T. (2009) ‘What? so what’: the next-generation
JHOVE2 architecture for format-aware characterization. International Journal of Digi-
tal Curation, 4 (3), 123-136 <www.ijdc.net/index.php/ijdc/article/view/139/174>
Access in the future tense (2004) Washington, D.C.: Council on Library and Information
Resources
Adcock, E.P. (1998) Principles for the care and handling of library materials. International
Preservation Issues, no. 1. Paris: IFLA PAC
Alliance for Permanent Access (2008) Keeping the records of science accessible, can we
afford it?: report on the 2008 Annual Conference, Budapest, 4 November <www.
alliancepermanentaccess.org/wp-content/uploads/2010/12/documenten_Alliance2008con
ference_report.pdf>
Archives Association of Ontario (1999) Archival appraisal: what to keep and what to destroy?
<http://aao-archivists.ca/resources/document-downloads/doc_download/12-archival-
appraisal-what-to-keep-and-what-to-destroy->
Archiving Web Resources (2004) Themes emerging from Archiving Web Resources – Issues
for Cultural Heritage Organisations [Conference], National Library of Australia, Can-
berra, 9-11 November 2004 <www.nla.gov.au/webarchiving/ConferenceReport.rtf>
ARL Workshop on New Collaborative Relationships (2006) To stand the test of time: long-
term stewardship of digital data sets in science and engineering. Association of Research
Libraries <www.arl.org/bm~doc/digdatarpt.pdf>
Arps, M. (1993) CD-ROM: some archival considerations. In Preservation of electronic
formats and electronic formats for preservation, ed. J. Mohlhenrich, pp.83-107. Fort
Atkinson, Wisc: Highsmith
Arthur, K. et al. (2004) Recognizing digitization as a preservation reformatting method
<www.arl.org/bm~doc/digi_preserv.pdf>
AS ISO 15489.2-2002 (2002) Records management – guidelines. Sydney: Standards Australia
Aschenbrenner, A. (2004) The bits and bites of data formats: stainless design for digital
endurance. RLG DigiNews, 8 (1) <worldcat.org/arcviewer/2/OCC/2009/08/11/H125001
0316214/viewer/file2.html>
Australian National Data Service (2011) Research Data Management Framework: capability
maturity guide <ands.org.au/guides/dmframework/dmf-capability-maturity-guide.html>
Authenticity (1999) [PADI summary]. National Library of Australia <www.nla.gov.au/padi/
topics/4.html>
Authenticity in a digital environment (2000) Washington, D.C.: Council on Library and In-
formation Resources
Bailey, S. (2010) Is digital preservation now routine? Posting to Records Management
Futurewatch blog, 2 May <rmfuturewatch.blogspot.com/2010/05/is-digital-preservation-
now-routine.html>
Barrow, W.J. (1959) Deterioration of book stock: causes and remedies: two studies on the
permanence of book paper. Richmond, VA: Virginia State Library
Barton, M.R. and Walker, J.H. (2003) Building a business plan for DSpace, MIT Libraries’
digital institutional repository. Journal of Digital Information, 4 (2) <jodi.ecs.soton.ac.
uk/Articles/v04/i02/Barton>
Bastian, J., Cloonan, M. and Harvey, R. (2011) From teacher to learner to user: developing
a digital stewardship pedagogy. Library Trends, 59 (4), 607-622
Beagrie, N. (2001) Preserving UK digital library collections. Program, 35 (3), 217-226
Beagrie, N. (2003) National digital preservation initiatives: an overview of developments in
Australia, France, the Netherlands, and the United Kingdom and of related interna-
tional activity. Washington, D.C.: Council on Library and Information Resources and
Library of Congress
Beagrie, N. (2004) The Continuing Access and Digital Preservation Strategy for the UK
Joint Information Systems Committee (JISC). D-Lib Magazine, 10 (7/8) <www.dlib.
org/dlib/july04/beagrie/07beagrie.html>
Beagrie, N. et al. (2008) Digital preservation policies study. part 1, final report. Salisbury:
Charles Beagrie Ltd <www.jisc.ac.uk/media/documents/programmes/preservation/jisc
policy_p1finalreport.pdf>
Bearman, D. (1998) Archival methods. Archives and Museums Informatics Technical Report,
9. Pittsburgh: Archives and Records Informatics
Bellinger, M. et al. (2004) OCLC’s digital preservation program for the next generation
library. Advances in Librarianship, 27, 29-48
Bennett, J.C. (1997) A framework of data types and formats, and issues affecting the long
term preservation of digital material: JISC/NPO studies on the preservation of elec-
tronic materials. British Library Research and Innovation Paper, 50. London: British
Library Research and Innovation Centre
Bergau, N. (2010) Report on digital preservation practice and plans amongst LIBER members
with recommendations for practical action. EuropeanaTravel <www.europeanatravel.
eu/downloads/D1.3._ET_report_final_23092010.pdf>
Besek, J.M. et al. (2008) Digital preservation and copyright: an international study. Inter-
national Journal of Digital Curation, 3 (2), 103-111 <www.ijdc.net/index.php/ijdc/article/
viewFile/90/61>
Besser, H. (2000) Digital longevity. In Handbook for digital projects: a management tool
for preservation and access, ed. M. Sitts. Andover, Mass.: Northeast Document Con-
servation Center <www.nedcc.org/resources/digitalhandbook/dighome.htm>
Besser, H. (2007) Collaboration for electronic preservation. Library Trends, 56 (1), 216-229
Betts, M. (1999) Businesses worry about long-term data losses: will we access our saved
data in 20 years? Computerworld News, Sept 20
Bigourdan, J.-L. (2006) The preservation of magnetic tape collections: a perspective. Roches-
ter, NY: Image Permanence Institute <www.imagepermanenceinstitute.org/imaging/
research/magnetic-tape>
Billenness, C.S.G. (2011) The future of the past: shaping new visions for EU-research in
digital preservation. Luxemburg: European Commission, Information Society and Media
Directorate <http://cordis.europa.eu/fp7/ict/telearn-digicult/future-of-the-past_en.pdf>
Bishoff, L. and Allen, N. (2004) Business planning for cultural heritage institutions. Washing-
ton, D.C.: Council on Library and Information Resources
Blue Ribbon Task Force on Sustainable Digital Preservation and Access (2010) Sustainable
economics for a digital planet: ensuring long-term access to digital information <brtf.
sdsc.edu/biblio/BRTF_Final_Report.pdf>
Borghoff, U.M. et al. (2006) Long-term preservation of digital documents: principles and
practices. Berlin: Springer
Borgman, C.L. (2007) Scholarship in the digital age: information, infrastructure, and the
internet. Cambridge, MA: MIT Press
Boyle, F., Eveleigh, A. and Needham, H. (2008) Report on the survey regarding digital
preservation in local authority archive services <www.dpconline.org/docs/reports/dig
pressurvey08.pdf>
Brabazon, T. (2000) He lies like a rug: digitising memory. Media International Australia,
96, 153-161
Breeding, M. (2010) Ensuring our digital future. Computers in Libraries, 30 (9), 32-35
Brindley, L.J. (2000) Keynote address to the Preservation 2000 Conference. New Review of
Academic Librarianship, 6, 125-137
Brown, A. (2008a) Selecting file formats for long-term preservation. Digital preservation
guidance note 1 <www.nationalarchives.gov.uk/documents/information-management/
selecting-file-formats.pdf>
Brown, A. (2008b) Selecting storage media for long-term preservation. Digital preservation
guidance note 2 <www.nationalarchives.gov.uk/documents/information-management/
selecting-storage-media.pdf>
Brown, A. (2008c) Care, handling and storage of removable media. Digital preservation
guidance note 3 <www.nationalarchives.gov.uk/documents/information-management/
removable-media-care.pdf>
Brown, G. and Woods, K. (2011) Born broken: fonts and information loss in legacy digital
documents. International Journal of Digital Curation, 6 (1), 5-19 <www.ijdc.net/index.
php/ijdc/article/viewFile/159/243>
Brown, H., et al. (2011) Digital curation reference manual: instalment on the role of microfilm
in digital preservation <www.dcc.ac.uk/resources/curation-reference-manual/microfilm>
Bryan, R.E. (2003) Survey results: preservation management of born-digital documents in
United States manuscripts repositories. In Symposium 2003: Preservation of Electronic
Records: New Knowledge and Decision-making, Ottawa, 15-18 September 2003: pre-
prints
BS4783 (1988- ) Storage, transportation and maintenance of media for use in data process-
ing and information storage. London: British Standards Institution
Bulger, M. et al. (2011) Reinventing research?: information practices in the humanities.
London: Research Information Network.
Burrows, T. (2000) Preserving the past, conceptualising the future: research libraries and
digital preservation. Australian Academic & Research Libraries, 31 (4), 142-153
Byers, F.R. (2003) Care and handling of CDs and DVDs: a guide for librarians and archi-
vists. Washington, D.C.: Council on Library and Information Resources and National
Institute of Standards and Technology
Caplan, P. (2006) DCC digital curation reference manual: instalment on preservation metadata
<www.dcc.ac.uk/resources/curation-reference-manual/completed-chapters/preservation-
metadata>
Caplan, P. (2007) The Florida Digital Archive and DAITSS: a working preservation repository
based on format migration. International Journal on Digital Libraries, 6 (4), 305-311.
Caplan, P. (2009) Understanding PREMIS. Washington, D.C.: Library of Congress <www.
loc.gov/standards/premis/understanding-premis.pdf>
Caplan, P. (2010) The Florida Digital Archive and DAITSS: a model for digital preserva-
tion. Library Hi Tech, 28 (2), 224-234
Casson, L. (2002) Libraries in the ancient world. New Haven: Yale Nota Bene
Cedars Project Team (2002) The Cedars Project Report, April 1998-March 2001 <www.web
archive.org.uk/wayback/archive/20050410120000/http://www.leeds.ac.uk/cedars/Our
Publications/CedarsProjectReportToMar01.pdf>
Center for Research Libraries (2007) Ten principles <www.crl.edu/archiving-preservation/
digital-archives/metrics-assessing-and-certifying/core-re>
Center for Research Libraries (2009) Metrics for assessment and certification <www.crl.
edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying>
Center for Research Libraries (2010) Portico audit report <www.crl.edu/archiving-preserva
tion/digital-archives/certification-and-assessment-digital-repositories/portico>
Chapman, S. (2003) Counting the cost of digital preservation: Is repository storage af-
fordable? Journal of Digital Information, 4 (2), 1-15 <journals.tdl.org/jodi/article/view/
100>
Charles Beagrie Ltd and JISC (2010) Keeping Research Data Safe factsheet: cost issues in
digital preservation of research data <www.beagrie.com/KRDS_Factsheet_0910.pdf>
Chivers, L. et al. (2010) Digital folklore preserved for the future: a case study for the use of
Planets in preserving the digital collection of the Danish Folklore Archives, the Royal
Library, Denmark. London: Planets <www.planets-project.eu/docs/casestudies/Planets
Casestudy_DigitalFolklore.pdf>
Chowdury, G. (2010) From digital libraries to digital preservation research: the importance
of users and context. Journal of Documentation, 66 (2), 207-223.
Clausen, L.R. (2004) Handling file formats. Århus: State and University Library
Cloonan, M.V. (1993) The preservation of knowledge. Library Trends, 41 (4), 594-605
Cloonan, M.V. (2001) W(h)ither preservation? Library Quarterly, 71 (2), 231-242
Cloonan, M.V. and Sanett, S. (2000) Comparing preservation strategies and practices for
electronic records. New Review of Academic Librarianship, 6, 205-216
Cloonan, M.V. and Sanett, S. (2002) Preservation strategies for electronic records: where
are we now – obliquity and squint? American Archivist, 65, 70-106
Collin, S.M.H. (2002) Dictionary of information technology, 3rd ed. Teddington: Peter
Collin Publishing
Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age (2009)
Ensuring the integrity, accessibility and stewardship of research data in the digital
age. Washington, D.C.: National Academies Press.
Conference of European National Librarians and Federation of European Publishers (CENL/
FEP) (2005) Statement on the development and establishment of voluntary deposit
schemes for electronic publications <www.nlib.ee/cenl/docs/05-11CENLFEP_Draft_
Statement050822_02.pdf>
Consultative Committee for Space Data Systems (2002) Reference Model for an Open Archi-
val Information System (OAIS). CCSDS 650.0-B-1 Blue Book. Washington, D.C.: NASA
Conway, P. (2000) Overview: rationale for digitization and preservation. In Handbook for
digital projects: a management tool for preservation and access, ed. M. Sitts. Andover,
Mass: Northeast Document Conservation Center <www.nedcc.org/resources/digital
handbook/dighome.htm>
Cook, T. (1995) It’s 10 o’clock: do you know where your data are? Technology Review,
January <web.mit.edu/erm/tcook.tr1995.html>
Cook, T. (2000) Beyond the screen: the records continuum and archival cultural heritage.
Paper presented at the Australian Society of Archivists Conference, Melbourne, 18 August
<www.mybestdocs.com/cook-t-beyondthescreen-000818.htm>
CORDIS (2011) TeLearn–DigiCult: research topics and projects <cordis.europa.eu/fp7/ict/
telearn-digicult/digicult-projects_en.html>
Cox, R.J. (2000?) The functional requirements for evidence in recordkeeping <www.archi
muse.com/papers/nhprc>
Cox, R.J. (2001) Managing records as evidence and information. Westport, Conn.: Quorum
Books
Cox, R.J. (2002) Vandals in the stacks: a response to Nicholson Baker’s assault on libraries.
Westport, Conn: Greenwood Press
Cunningham, A. (2008) Digital curation/digital archiving: a view from the National Archives
of Australia. American Archivist, 71, 530-543.
Dale, R.L. (2004) Consortial actions and collaborative achievements: RLG’s preservation
program. Advances in Librarianship, 27, 1-23
Dale, R. and Gore, E. (2010) Process models and the development of trustworthy digital re-
positories. Information Standards Quarterly, 22 (2), 14-19
Darlington, J., Finney, A. and Pearce, A. (2003) Domesday Redux: the rescue of the BBC
Domesday Project videodiscs. Ariadne, 36 <www.ariadne.ac.uk/issue36/tna>
Davis, R.C. (2011) Five tips for designing preservable websites <blog.photography.si.edu/
2011/08/02/five-tips-for-designing-preservable-websites>
Day, R. (1989) Where’s the rot?: a special report on CD longevity. Stereo Review, April,
23-24
de Lusenet, Y. (2002) Preservation of digital heritage: draft discussion paper prepared for
UNESCO. European Commission on Preservation and Access <www.ica.org/download.
php?id=606>
de Lusenet, Y. (2007) Tending the garden or harvesting the fields: digital preservation and
the UNESCO Charter on the Preservation of the Digital Heritage. Library Trends, 56
(1), 164-182
Deegan, M. and Tanner, S. (2002) The digital dark ages. Update, May
Deegan, M. and Tanner, S. (eds) (2006) Digital preservation. London: Facet
Del Pozo, N., Long, A.S. and Pearson, D. (2010) ‘Land of the lost’: a discussion of what
can be preserved through digital preservation. Library Hi Tech, 28 (2), 290-300
Digital Curation Centre (2008) The DCC Curation Lifecycle Model <www.dcc.ac.uk/docs/
publications/DCCLifecycle.pdf>
Digital Curation Centre (2010) Tools <www.dcc.ac.uk/resources/external/software-and-hard
ware/tools>
Digital Preservation Coalition (2003) Annual company report 23 July 2002-31 July 2003
<www.dpconline.org/docs/DPCAR02-03.pdf>
Digital Preservation Coalition (2004) Annual company report 1 August 2003-31 July 2004
<www.dpconline.org/docs/DPCAR03-04.pdf>
Digital Preservation Coalition (2004) Digital preservation: the global context: report on the
DPC Forum held at the British Library Conference Centre, Wednesday 23 June <www.
dpconline.org/events/previous-events/299-digital-preservation-the-global-context?format
=pdf>
Digital Preservation Coalition (2006) Interactive assessment: selection of digital materials
for long-term retention <www.dpconline.org/advice/preservationhandbook/decision-tree?
format=pdf>.
Digital Preservation Coalition (2008) Preservation management of digital materials: the
handbook <www.dpconline.org/component/docman/doc_download/299-digital-preser
vation-handbook>
Digital Preservation Testbed (2003) From digital volatility to digital permanence: preserving
email. The Hague: ICTU <www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/vola
tility-permanence-email-en.pdf>
DigitalPreservationEurope (2007) Research roadmap <www.digitalpreservationeurope.eu/
publications/dpe_research_roadmap_D72.pdf>
Domesday reloaded: story of Domesday (2011) <www.bbc.co.uk/history/domesday/story>
Donnelly, M., Jones, S., and Pattenden-Fail, J.W. (2010) DMP Online: the Digital Curation
Centre’s web-based tool for creating, maintaining and exporting data management plans.
International Journal of Digital Curation, 5 (1) <www.ijdc.net/index.php/ijdc/article/
viewFile/155/218>
Dooley, J.M., and Luce, K. (2010) Taking our pulse: the OCLC Research survey of special
collections and archives. Dublin, Ohio: OCLC Research. <www.oclc.org/research/pub
lications/library/2010/2010-11.pdf>
DSpace (2010?) DSpace: an open source dynamic digital repository <www.dspace.org/
images/stories/dspace-diagram.pdf>
Dunn, M. (2009) Schlesinger Library Web Archiving Pilot LDI Project: final report <http://
hul.harvard.edu/ois/ldi/resources/Schlesinger_WAX.pdf>
Dureau, J.M. and Clements, D.W.G. (1986) Principles for the preservation and conserva-
tion of library materials. IFLA Professional Reports, no. 8. The Hague: IFLA
Eastwood, T. (2002) The appraisal of electronic records: what is new? Comma, 2002-1/2,
77-87
Edmondson, R. (2002) Memory of the World: general guidelines, rev. edn. Paris: UNESCO
Edmondson, R. (2004) Audiovisual archiving: philosophy and principles. Paris: UNESCO
<unesdoc.unesco.org/images/0013/001364/136477e.pdf>
Entlich, R. and Buckley, E. (2006) Digging up bits of the past: hands-on with obsolescence.
RLG Digi-News, 10 (5) <worldcat.org/arcviewer/1/OCC/2007/07/10/0000068996/viewer/
file1.html>
ERPANET (2002) erpanet <www.erpanet.org/brochure.pdf>
ERPANET (2003) Selecting technologies tool <www.erpanet.org/guidance/docs/ERPANET
Select_Techno.pdf>
ERPATraining (2004) File formats for preservation: briefing paper <www.erpanet.org/
events/2004/vienna/erpaTrainingWien_BriefingPaper_v02.pdf>
Exon, M. (1995) The long-term management issues in the preservation of electronic infor-
mation. Paper presented at Multimedia Preservation: Capturing the Rainbow Confer-
ence, Brisbane, 28-30 November 1995 <www.nla.gov.au/niac/meetings/npo95me.html>
Expert Meeting (2010) Price tags of digital preservation policy choices, The Hague, 16 Sep-
tember <www.ncdd.nl/en/documents/20100916PriceTagsConferenceReportfinal.pdf>
Faniel, I.M. and Zimmerman, A. (2011) Beyond the data deluge: a research agenda for
large-scale data sharing and reuse. International Journal of Digital Curation, 6 (1) <www.
ijdc.net/index.php/ijdc/article/viewFile/163/231>
Faulds, F. and Challinor, A. (1998) Archiving from the data warehouse. Information Man-
agement & Technology, 31 (6), 278-280
Fedora and the preservation of university records (2006) Medford, MA: Tufts University;
New Haven, CT: Yale University <http://dca.lib.tufts.edu/features/nhprc/reports>
Fellows, G. et al. (2008) Separating the wheat from the chaff: identifying key elements in the
NLA .au domain harvest. Australian Academic & Research Libraries, 39 (3), 137-148.
Flesch, J. (1996) A labour of love?: the story behind the compilation of Love brought to book:
a bio-bibliography of 20th century Australian romance novels. Australian Academic &
Research Libraries, 27 (3), 182-190
Flesch, J. (2004) From Australia with love: a history of modern Australian popular romance
novels. Fremantle: Curtin University Books
Florida Digital Archive (2009) FDA file preservation strategies by format <fclaweb.fcla.edu/
fda_format_landing_page>
Gertz, J. (2000) Selection for preservation in the digital age. Library Resources & Technical
Services, 44 (2), 97-104
Gilliland-Swetland, A.J. (2000) Enduring paradigm: the value of the archival perspective in
the digital environment. Washington, D.C.: Council on Library and Information Resources
Gilliland-Swetland, A.J. (2002) Testing our truths: delineating the parameters of the authentic
archival electronic record. American Archivist, 65, 196-215
Gilliland-Swetland, A.J. (2005) Electronic records management. ARIST: Annual Review of
Information Science and Technology, 39, 219-253
Gladney, H.M. (2007) Preserving digital information. Springer
Goethals, A. and Gogel, W. (2010) Reshaping the repository: the challenge of email archiving.
Presented at iPRES 2010, Vienna, September 20 <www.ifs.tuwien.ac.at/dp/ipres2010/
papers/goethals-08.pdf>
Gorman, M. (1997) What is the future of cataloguing and cataloguers? Paper presented at
the 63rd IFLA General Conference, Copenhagen, Denmark, August 31-September 5,
1997 <www.ifla.org/IV/ifla63/63gorm.htm>
Grace, S., Knight, G. and Montague, L. (2009) InSPECT final report <www.significant
properties.org.uk/inspect-finalreport.pdf>
Granger, S. (2000) Emulation as a digital preservation strategy. D-Lib Magazine, 6 (10)
<www.dlib.org/dlib/october00/granger/10granger.html>
Green, A., Dionne, J. and Dennis, M. (1999) Preserving the whole: a two-track approach to
rescuing social science data and metadata. Washington, D.C.: Council on Library and
Information Resources
Green, A., Macdonald, S. and Rice, R. (2009) Policy-making for research data in reposito-
ries: a guide. Version 1.2. Edinburgh: EDINA and University Data Library <www.
disc-uk.org/docs/guide.pdf>
Greenan, M. (2003) Dspace. DigiCULT.Info: A Newsletter on Digital Culture, 3, 8
Gunton, T. (1993) A dictionary of information technology and computer science, 2nd edn.
Manchester: NCC Blackwell
Gwinn, N.E. (1993) A national preservation program for agricultural literature <usain.org/
Preservation/preservation.pdf>
Hafner, K. (2004) Even digital memories fade. New York Times, 10 November
Halbert, M. and Skinner, K. (2008) The MetaArchive Cooperative: a new collaborative ser-
vice organization providing a distributed digital preservation infrastructure. CLIR Issues,
66 <www.clir.org/pubs/issues/issues66.html>
Harris, C. (2000) Selection for preservation. In Preservation: issues and planning, ed. P.N.
Banks and R. Pilette, pp.206-224. Chicago: American Library Association
Harter, R. (1999) Piltdown Man <home.tiac.net/~cri_a/piltdown/piltdown.html>
Harvey, R. (1993) Preservation in libraries: principles, strategies and practices for librari-
ans. London: Bowker-Saur
Harvey, R. (1995) From digital artefact to digital object. Paper presented at Multimedia
Preservation: Capturing the Rainbow Conference, Brisbane, 28-30 November 1995
<www.nla.gov.au/niac/meetings/npo95rh.html>
Harvey, R. (2005a) Preserving digital documentary heritage in libraries: what do we select?
In Preservation of electronic records: new knowledge and decision-making: postprints
of a conference, Symposium 2003, Ottawa, Canada, September 15-18, 2003, pp.13-20.
Ottawa: Canadian Conservation Institute
Harvey, R. (2005b) Preserving digital materials. London: K.G. Saur
Harvey, R. (2010) Digital curation: a how-to-do-it manual. New York: Neal Schuman
Harvey, R. and Thompson, D. (2010) Automating the appraisal of digital materials. Library
Hi Tech, 28 (2), 313-322.
Heazlewood, J. (2000) ePreservation in the archive: theories, practices. Australian Academic
& Research Libraries, 31 (4), 173-187
Heazlewood, J. (2002) Public Record Office Victoria: background report to the UNESCO
Regional Consultation on the Preservation of Digital Heritage, Canberra, 4-6 Novem-
ber 2002
Hedstrom, M. (1998) Digital preservation: a time bomb for digital libraries. Computers and
Humanities, 31, 189-202
Hedstrom, M. and Lampe, C. (2001) Emulation vs. migration: do users care? RLG DigiNews,
5 (6) <worldcat.org/arcviewer/2/OCC/2009/08/11/H1250008241211/viewer/file2.html>
Hedstrom, M. and Lee, C.A. (2002) Significant properties of digital objects: definitions,
applications, implications. Proceedings of the DLM-Forum 2002: @ccess and Preser-
vation of Electronic Information: Best Practices and Solutions, Barcelona, 6-8 May
2002, pp. 218-223. Brussels: European Commission
Hedstrom, M. and Montgomery, S. (1999) Digital preservation needs and requirements in
RLG member institutions: a study commissioned by the Research Libraries Group.
Mountain View, CA: Research Libraries Group
Helwig, P., Roberts, B. and Nimmo, E. (2010) The Nationaal Archief of the Netherlands
and the use of emulation: a study of the use of Planets in preserving the digital col-
lection of the Nationaal Archief of the Netherlands. London: Planets <www.planets-
project.eu/docs/casestudies/PlanetsCasestudy_NationalArchiefandemulation.pdf>
Hendley, T. (1998) Comparison of methods & costs of digital preservation. London: British
Library Research and Innovation Centre
Heslop, H., Davis, S. and Wilson, A. (2002) An approach to the preservation of digital
records. Canberra: National Archives of Australia <www.naa.gov.au/images/an-
approach-green-paper_tcm2-888.pdf>
Heterick, B. (2002) Applying the lessons learned from retrospective archiving to the digital
archiving conundrum. Information Services & Use, 22, 113-120
Higgins, S. (2006) What are metadata standards <www.dcc.ac.uk/resources/briefing-papers/
standards-watch-papers/what-are-metadata-standards>
Hilton, C. and Thompson, D. (2007) Collecting born digital archives at the Wellcome Library.
Ariadne, 50 <www.ariadne.ac.uk/issue50/hilton-thompson>
Hilton, C., Thompson, D. and Walters, N. (2010) Trust me, I’m an archivist: experiences
with digital donors. Ariadne, 65 <www.ariadne.ac.uk/issue65/hilton-et-al>
Hitchcock, S. and Tarrant, D. (2011) Characterising and preserving digital repositories: file
format profiles. Ariadne, 66 <www.ariadne.ac.uk/issue66/hitchcock-tarrant>
Hodge, G. and Frangakis, E. (2004) Digital preservation and permanent access to scientific
information: the state of the practice: a report sponsored by the International Council
for Scientific and Technical Information (ICSTI) and CENDI <cendi.dtic.mil/publica
tions/04-3dig_preserv.html>
Hofman, H. (2002) Review: some comments on preservation metadata and the OAIS Model.
DigiCULT.Info: A Newsletter on Digital Culture, 2, 15-20
Holdsworth, D. and Wheatley, P. (2001) Emulation, preservation, and abstraction. RLG
DigiNews, 5 (4) <worldcat.org/arcviewer/1/OCC/2007/08/08/0000070511/viewer/file
3149.html#feature2>
Hole, B. et al. (2010) Life3: a predictive costing tool for digital collections. Presented at iPRES
2010, Vienna, September 22 <www.ifs.tuwien.ac.at/dp/ipres2010/papers/hole-64.pdf>
Howard, B. (2011a) IMLS grants relating to digital preservation. Posting to The Signal:
Digital Preservation blog, September 1 <blogs.loc.gov/digitalpreservation/2011/09/
neh-grants-relating-to-digital-preservation-2>
Howard, B. (2011b) NEH grants relating to digital preservation. Posting to The Signal: Digital
Preservation blog, August 25 <blogs.loc.gov/digitalpreservation/2011/08/neh-grants-
relating-to-digital-preservation>
Howell, A. (2000) Perfect one day – digital the next: challenges in preserving digital infor-
mation. Australian Academic & Research Libraries, 31 (4), 121-141
Howell, A. (2001) Preserving information in a digital age: what’s the difference? Paper
Conservator, 25, 133-149
Howell, A. and Berthon, H. (2000) http://www.nla.gov.au/padi/: Preserving Access to Digital
Information (PADI) – an opportunity for global cooperation. Paper presented at IFLA
Symposium on Managing the Preservation of Periodicals and Newspapers, Paris, 21-
24 August <www.ifla.org.sg/VI/4/conf/howell.pdf>
HP and MIT create non-profit organization to support growing community of DSpace users
(2007) <www.hp.com/hpinfo/newsroom/press/2007/070717a.html>
Humphrey, C. (2003) Preserving research data: a time for action. In Symposium 2003: Preser-
vation of Electronic Records: New Knowledge and Decision-making, Ottawa, Canada,
15-18 September 2003: Preprints
ICPSR (2007) Glossary [Normalization] <www.icpsr.umich.edu/icpsrweb/ICPSR/curation/
preservation/glossary.jsp?token=>
Iglesias, E. and Meesangnil, W. (2010) Using Amazon S3 in digital preservation in a mid-
sized academic library: a case study of CCSU ERIS Digital Archive System. code{4}lib,
12 <journal.code4lib.org/articles/4468>
Integrating Planets into archives and libraries (2009) Planetarium, 7, 4-5 <www.planets-
project.eu/docs/newsletters/Planetarium7_July09.pdf>
International Association of Sound and Audiovisual Archives Technical Committee (2004)
Guidelines on the production and preservation of digital audio objects. Aarhus: IASA
International Association of Sound and Audiovisual Archives Technical Committee (2009)
Guidelines on the production and preservation of digital audio objects, IASA-TC04,
2nd ed. by Kevin Bradley. Auckland Park: IASA <www.iasa-web.org/tc04/audio-
preservation>
International Council on Archives Committee on Electronic Records (1997) Guide for man-
aging electronic records from an archival perspective. ICA Studies, 8. Paris: ICA
International Organization for Standardization (2003) Space data and information transfer
systems – Open archival information system – Reference model (ISO 14721:2003)
Internet Systems Consortium (2011) The ISC domain survey <www.isc.org/solutions/survey>
InterPARES Project (1999- ) International Research on Permanent Authentic Records in
Electronic Systems <www.interpares.org>
Iraci, J. (2010) Longevity of recordable CDs and DVDs (CCI Notes 19/1) <www.cci-icc.gc.
ca/crc/notes/html/19-1-eng.aspx>
James, H. et al. (2003) Feasibility and requirements study on preservation of e-prints: report
commissioned by the Joint Information Systems Committee. London: JISC <www.jisc.
ac.uk/uploaded_documents/e-prints_report_final.pdf>
Jeffs, B. (2011) Cloud computing, NEA Newsletter, 38 (3), 22 <www.newenglandarchivists.
org/pdfs/NEA_Newsletter_Summer_2011.pdf>
Jenkinson, H. (1965) A manual of archive administration, 2nd edn rev. London: Lund
Humphries
John, J.L. et al. (2010) Digital lives: personal digital archives for the 21st century: an initial
synthesis <www.ucl.ac.uk/infostudies/research/ciber/digitallives.pdf>
Jones, M. (2004) Editor’s interview: Digital Preservation Coalition. RLG DigiNews, 8 (1)
<worldcat.org/arcviewer/1/OCC/2007/08/08/0000070511/viewer/file3683.html#article1>
Jones, M. and Beagrie, N. (2001) Preservation management of digital materials: a hand-
book. London: British Library
Jones, S., Ruusalepp, R. and Ross, S. (2009) Data Audit Framework methodology. Glasgow:
HATII <www.data-audit.eu/DAF_Methodology.pdf>
JSTOR at a glance (2011) <about.jstor.org/sites/default/files/jstor-factsheet-20110610.pdf>
Kenney, A.R. and McGovern, N.Y. (2003) The five organizational stages of digital preser-
vation. In Digital libraries: a vision for the 21st century. Ann Arbor, MI: MPublishing,
University of Michigan Library <quod.lib.umich.edu/s/spobooks/bbv9812.0001.001/1:11?
rgn=div1;view=fulltext>
Kenney, A.R. and Rieger, O.Y. (2000) Moving theory into practice: digital imaging for
libraries and archives. Mountain View, CA: Research Libraries Group
Kenney, A.R. and Stam, D.C. (2002) The state of preservation programs in American college
and research libraries: building a common understanding and action agenda. Washing-
ton, D.C.: Council on Library and Information Resources
Kenney, A. et al. (2003) Digital preservation management: implementing short-term strate-
gies for long-term problems [tutorial]. Ithaca, N.Y.: Cornell University Library <www.
icpsr.umich.edu/dpm/>
Key Perspectives Ltd (2010) Data dimensions: disciplinary differences in research data
sharing, reuse and long term viability. Edinburgh: DCC <www.dcc.ac.uk/sites/default/
files/SCARP%20SYNTHESIS_FINAL.pdf>
Kimpton, M. and Payette, S. (2010) Using cloud infrastructure as part of a digital preservation
strategy with DuraCloud. Educause Quarterly, 33 (2) <www.educause.edu/library/
EQM10212>
Kirchhoff, A. et al. (2010) Becoming a certified trusted digital repository: the Portico experi-
ence. Presented at iPRES 2010, Vienna, September 20 <www.ifs.tuwien.ac.at/dp/ipres
2010/papers/Kirchhoff-35.pdf>
Kirschenbaum, M.G., Ovenden, R. and Redwine, G. (2010) Digital forensics and born-
digital content in cultural heritage collections. Washington, D.C.: Council on Library
and Information Resources.
Knight, G. and Pennock, M. (2009) Data without meaning: establishing the significant
properties of digital research. International Journal of Digital Curation, 4 (1) <www.
ijdc.net/index.php/ijdc/article/viewFile/110/87>
Kroll Ontrack (2011) Understanding data loss <www.krollontrack.com/data-recovery/under
standing-data-loss>
Lamb, D. et al. (2009) CASPAR <www.dcc.ac.uk/resources/briefing-papers/technology-
watch-papers/caspar>
Lannom, L. (2011) Editorial: research data. D-Lib Magazine, 17 (1/2) <www.dlib.org/dlib/
january11/01editorial.html>
Lavoie, B.F. (2003) The incentives to preserve digital materials: roles, scenarios, and eco-
nomic decision-making. Dublin, Ohio: OCLC <www.oclc.org/research/projects/digipres/
incentives-dp.pdf>
Lavoie, B.F. (2004) Of mice and men: economically sustainable preservation for the
twenty-first century. In Access in the future tense, pp.45-54
Lavoie, B. and Dempsey, L. (2004) Thirteen ways of looking at … digital preservation. D-Lib
Magazine, 10 (7/8) <www.dlib.org/dlib/july04/lavoie/07lavoie.html>
Lawrence, G.W. et al. (2000) Risk management of digital information: a file format investi-
gation. Washington, D.C.: Council on Library and Information Resources
Lazorchak, B. (2011) Digital preservation in a box: NDSA Outreach. Posting to The Signal:
Digital Preservation blog, August 3 <blogs.loc.gov/digitalpreservation/2011/08/digital-
preservation-in-a-box-ndsa-outreach>
Lee, C.A. (2008) High-level categories of digital curation functions. Draft, Version 14.
Chapel Hill, NC: DigCCurr <ils.unc.edu/digccurr/digccurr-funct-categories.pdf>
Lee, C.A. (2009) Matrix of digital curation knowledge and competencies (overview). Draft,
Version 13. Chapel Hill, NC: DigCCurr <ils.unc.edu/digccurr/digccurr-matrix.html>
Lee, C.A. (2010) Open Archival Information System (OAIS) Reference Model. In Encyclo-
pedia of Library and Information Sciences, 3rd ed., ed. by M.J. Bates and M.N. Maack,
1(1), pp.4020-4030. Boca Raton, Fl.: CRC Press
Lee, C.A. (ed.) (2011) I, digital: personal collections in the digital era. Chicago: Society of
American Archivists
Lee, K-H. et al. (2002) The state of the art and practice in digital preservation. Journal of
Research of the National Institute of Standards and Technology, 107 (1), 93-106
LeFurgy, B. (2011) Kicking off the 2011 NDIIPP Digital Preservation Partners Meeting.
Posting to The Signal: Digital Preservation blog, July 19 <blogs.loc.gov/digitalpreserva
tion/2011/07/kicking-off-the-2011-ndiipp-digital-preservation-partners-meeting>
Legal deposit (2007) [PADI summary]. National Library of Australia <www.nla.gov.au/
padi/topics/67.html>
Li, Y. and Banach, M. (2011) Institutional repositories and digital preservation: assessing
current practices at research libraries. D-Lib Magazine, 17 (5/6) <www.dlib.org/dlib/
may11/yuanli/05yuanli.html>
Library of Congress (2010) Partner tools and services <www.digitalpreservation.gov/partners/
resources/tools/index.html>
Lim, S.L., Ramaiah, C.K. and Pitt, K.W. (2003) Problems in the preservation of electronic
records. Library Review, 52 (3), 117-125
Linden, J. et al. (2005) The large-scale archival storage of digital objects: technology watch
report. DPC technology watch series report, 04-03 <www.dpconline.org/docs/dpctw04-
03.pdf>
Lohr, S. (2009) G.E.’s breakthrough can put 100 DVDs on a disc. New York Times, April 27
<www.nytimes.com/2009/04/27/technology/business-computing/27disk.html>
Lord, P. and Macdonald, A. (2003) E-science curation report: data curation for e-science
in the UK: an audit to establish requirements for future curation and provision. Pre-
pared for the JISC Committee for the Support of Research (JCSR). London: Digital
Archival Consultancy
Lorie, R. (2002) The UVC: a method of preserving digital documents – proof of concept.
Long-term preservation study report series, no. 4. Amsterdam: IBM Netherlands
Lowenthal, D. (1985) The past is a foreign country. Cambridge: Cambridge University
Press
Lukesh, S.S. (1999) E-mail and potential loss to future archives and scholarship or the dog
that didn’t bark. Firstmonday, 4 (9) <firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/
fm/article/view/692/602>
Lupovici, C. (2001) Technical data and preservation needs. Paper presented at the 67th
IFLA Council and General Conference, Boston, August 16-25 <www.ifla.org/IV/ifla67/
papers/163-168e.pdf>
Lyman, P. and Kahle, B. (1998) Archiving digital cultural artifacts: organizing an agenda
for action. D-Lib Magazine, July/August <www.dlib.org/dlib/july98/07lyman.html>
Lynch, C.A. (2004) Editor’s interview with Clifford A. Lynch. RLG DigiNews, 8 (4)
<worldcat.org/arcviewer/1/OCC/2007/08/08/0000070519/viewer/file3518.html#article0>
Lyon, L. et al. (2010) Disciplinary approaches to sharing, curation, reuse and preservation:
final report <www.dcc.ac.uk/sites/default/files/documents/scarp/SCARP-FinalReport-
Final-SENT.pdf>
Manus, S. (2011a) It’s been a busy year: partnership highlights. Posting to The Signal: Digi-
tal Preservation blog, August 19 <blogs.loc.gov/digitalpreservation/2011/08/it’s-been-
a-busy-year-–-partnership-highlights>
Manus, S. (2011b) A meeting of the minds for UDFR. Posting to The Signal: Digital Preser-
vation blog, June 17 <blogs.loc.gov/digitalpreservation/2011/06/a-meeting-of-the-minds-
for-udfr/>
Maron, N.L. and Kirby Smith, K. (2008) Current models of digital scholarly communica-
tion: results of an investigation conducted by Ithaka for the Association of Research
Libraries. Washington, D.C.: Association of Research Libraries <www.arl.org/bm~doc/
current-models-report.pdf>
Maron, N.L. and Loy, M. (2011) Funding for sustainability: how funders’ practices influence
the future of digital resources. Bristol: JISC <www.ithaka.org/ithaka-s-r/research/funding-
for-sustainability/FundingForSustainability.pdf>
McGovern, N.Y. (2007a) A digital decade: where have we been and where are we going in
digital preservation? RLG DigiNews, 11 (1) <worldcat.org/arcviewer/1/OCC/2007/07/
10/0000068890/viewer/file1.html#article3>
McGovern, N.Y. (2007b) ICPSR digital preservation policy framework <www.icpsr.umich.
edu/icpsrweb/ICPSR/curation/preservation/policies/dpp-framework.jsp>
McLeod, R. (2008) Risk assessment: using a risk based approach to prioritise handheld digital
information. Presented at iPRES 2008, British Library, London, September 29 <www.bl.
uk/ipres2008/presentations_day1/20_McLeod.pdf>
Meeting of Experts on Digital Preservation (2004) Report on the Meeting of Experts on
Digital Preservation: Metadata Specifications. Washington, D.C.: USGPO <www.nla.
gov.au/padi/metafiles/resources/15663.html>
Mellor, P. (2003) CAMiLEON: Emulation and BBC Domesday, RLG DigiNews, 7 (2)
<worldcat.org/arcviewer/1/OCC/2007/07/10/0000068904/viewer/file1.html#feature3>
MetaArchive Cooperative (2010) Charter. Atlanta, GA: Educopia Institute <www.meta
archive.org/public/resources/charter_member/2011_MetaArchive_Charter.pdf>
Millar, L. (2004) Authenticity of electronic records: a report prepared for UNESCO and the
International Council on Archives. ICA Study, 13-2. Paris: ICA
Minor, D., Phillips, M. and Schultz, M. (2010) Chronopolis and MetaArchive: preservation
cooperation. Presented at iPRES 2010, Vienna, September 21 <www.ifs.tuwien.ac.at/
dp/ipres2010/papers/minor-29.pdf>
Morris, S. (2002) The preservation problem: collaborative approaches. Information Services &
Use, 22, 127-132
Morrissey, S. et al. (2010) Portico: a case study in the use of XML for the long-term preser-
vation of digital artifacts. Presented at International Symposium on XML for the Long
Haul: Issues in the Long-term Preservation of XML, Montréal, Canada, August 2 <www.
balisage.net/Proceedings/vol6/html/Morrissey01/BalisageVol6-Morrissey01.html>
Muir, A. (2004) Digital preservation: awareness, responsibility and rights issues. Journal of
Information Science, 30 (1), 73-92
National Library of Australia (1999) A draft research agenda for the preservation of physical
format digital publications <pandora.nla.gov.au/pan/25426/20100713-1409/www.nla.gov.
au/policy/rsagenda.html>
National Library of Australia (2002) Persistent identifiers <www.nla.gov.au/initiatives/per
sistence.html>
National Library of Australia (2008) Digital preservation policy, 3rd ed. <www.nla.gov.au/
policy/digpres.html>
National Research Council (1995) Preserving scientific data on our physical universe: a
new strategy for archiving the nation’s scientific information resources. Washington,
D.C.: National Academy Press
NDIIPP (2011) Preserving our digital heritage: National Digital Information Infrastructure
and Preservation Program. Washington, D.C.: Library of Congress <www.digitalpreser
vation.gov/library/resources/pubs/docs/NDIIPP2010Report_Post.pdf>
nestor Working Group Trusted Repositories – Certification (2006) Catalogue of criteria for
trusted digital repositories. Frankfurt am Main: nestor <files.d-nb.de/nestor/materialien/
nestor_mat_08-eng.pdf>
Neumayer, R. and Rauber, A. (2007) Why appraisal is not ‘utterly’ useless and why it’s not
the way to go either: a provocative position paper <www.digitalpreservationeurope.
eu/publications/appraisal_final.pdf and also responses at www.digitalpreservation
europe.eu/forum/phpBB2/viewtopic.php?t=9>
NISO (2004) Understanding metadata. Bethesda, MD: National Information Standards
Organization Press
Noonan, D.W., McCrory, A. and Black, E.L. (2010) PDF/A: a viable addition to the
preservation toolkit. D-Lib Magazine, 16 (11-12) <www.dlib.org/dlib/november10/noonan/
11noonan.print.html>
NSF-DELOS Working Group on Digital Archiving and Preservation (2003) Invest to save:
report and recommendations of the NSF-DELOS Working Group on Digital Archiving
and Preservation. National Science Foundation & The European Union
Nunberg, G. (1995) The places of books in the age of electronic reproduction. In Future
libraries, ed. R.H. Bloch and C.A. Hesse, pp.13-37. Berkeley: University of California
Press
OCLC/RLG Working Group on Preservation Metadata (2002) A metadata framework to
support the preservation of digital objects <www.oclc.org/research/projects/pmwg/pm_
framework.pdf>
OCLC/RLG PREMIS Working Group (2004) Implementing preservation strategies for
digital materials: current practice and emerging trends in the cultural heritage com-
munity. Dublin, OH: OCLC
OCLC/RLG PREMIS Working Group (2005) Data dictionary for preservation metadata: final
report of the PREMIS Working Group. Dublin, OH: OCLC
Oliver, G. et al. (2008) Report on automated re-appraisal: managing archives in digital librar-
ies. Pisa: DELOS NoE
O’Mahony, D.P. (1998) Here today, gone tomorrow: what can be done to assure permanent
public access to electronic government information? Advances in Librarianship, 22,
107-121
Open Office (2010) File formats <wiki.services.openoffice.org/wiki/Documentation/OOo3_
User_Guides/Getting_Started/File_formats>
Pacey, A. (1991) Developing selection criteria for special collections. Canadian Library
Journal, 48, 187-190
Paradigm Project (2008) Workbook on digital private papers <www.paradigm.ac.uk>
PARBICA (2004?) Digital preservation. Toolkit guideline 18 (unpublished)
Paskin, N. (2010) Digital Object Identifier (DOI®) system. In Encyclopedia of Library and
Information Sciences, 3rd ed., ed. by M.J. Bates and M.N. Maack, 1(1), pp.1586-1592.
Boca Raton, Fl.: CRC Press
Pearson, D. (2009) Preserve or preserve not: there is no try: some dilemmas relating to per-
sonal digital archiving <www.nla.gov.au/openpublish/index.php/nlasp/article/view/1388/
1678>
Pennock, M. (2006a) DSpace digital repository software. Edinburgh: Digital Curation Centre
<www.dcc.ac.uk/webfm_send/462>
Pennock, M. (2006b) Fedora. Edinburgh: Digital Curation Centre <www.dcc.ac.uk/webfm_
send/463>
Persistent identifiers (2002) [PADI summary]. National Library of Australia <www.nla.gov.
au/padi/topics/36.html>
Piggott, M. (2001) Appraisal: the state of the art: paper delivered at a professional develop-
ment workshop presented by ASA South Australia Branch 26 March 2001 <asa.oxide
interactive.com.au/appraisal-state-art-26-march-2001>
Planets (2009) Survey analysis report <www.planets-project.eu/docs/reports/planets-survey-
analysis-report-dt11-d1.pdf>
Rackley, M. (2010) Internet Archive. In Encyclopedia of Library and Information Sciences,
3rd ed., ed. by M.J. Bates and M.N. Maack, 1(1), pp.2966-2976. Boca Raton, Fl.: CRC
Press
Reich, V. and Rosenthal, D.S. (2009) Distributed digital preservation: private LOCKSS net-
works as business, social, and technical frameworks. Library Trends, 57 (3), 461-471
Research Information Network (2008) Stewardship of digital research data <www.rin.ac.uk/
system/files/attachments/Stewardship-data-guidelines.pdf>
Rhodes, S. (2011) Breaking down link rot: the Chesapeake Project Legal Information Ar-
chive’s examination of URL stability <www.llrx.com/features/linkrot.htm>
RLG-NARA Task Force on Digital Repository Certification (2007) Trustworthy repositories
audit & certification: criteria and checklist. Chicago: Center for Research Libraries
<www.crl.edu/PDF/trac.pdf>
RLG/OCLC Working Group on Digital Archive Attributes (2002) Trusted digital reposito-
ries: attributes and responsibilities. Mountain View, CA: Research Libraries Group
Robinson, M. (2009) Institutional repositories: staff and skills set. Nottingham: SHERPA
<www.sherpa.ac.uk/documents/Staff_and_Skills_Set_2009.pdf>
Rosenthal, D. (2010a) Bit preservation: a solved problem? International Journal of Digital
Curation, 5 (1), 134-148 <www.ijdc.net/index.php/ijdc/article/viewFile/151/224>
Rosenthal, D. (2010b) Format obsolescence: assessing the threat and the defenses. Library Hi
Tech, 28 (2), 195-210 <lockss.stanford.edu/locksswiki/files/LibraryHighTech2010.pdf>
Rosenthal, D. (2010c) The half-life of digital formats, dshr’s blog, November 24 <blog.dshr.
org/2010_11_01_archive.html>
Rosenthal, D. (2010d) Keeping bits safe: how hard can it be? Communications of the ACM,
53 (11), 47-55 <http://cacm.acm.org/magazines/2010/11/100620-keeping-bits-safe-how-
hard-can-it-be/fulltext>
Rosenthal, D. (2010e) LOCKSS: Lots of Copies Keep Stuff Safe. Presented to the NIST
Digital Preservation Interoperability Framework Workshop, March 29-31 <http://lockss.
stanford.edu/locksswiki/files/NIST2010.pdf>
Ross, S. (2000) Changing trains at Wigan: digital preservation and the future of scholarship.
London: National Preservation Office, British Library <www.bl.uk/blpac/pdf/wigan.
pdf>
Ross, S. (2002) Position paper on integrity and authenticity of digital cultural heritage objects.
DigiCULT Thematic Issue, 1, 7-8 <www.digicult.info/downloads/thematic_issue_1_
final.pdf>
Ross, S. (2004) The role of ERPANET in supporting digital curation and preservation in
Europe. D-Lib Magazine, 10 (7/8) <www.dlib.org/dlib/july04/ross/07ross.html>
Ross, S. (2007) Digital preservation, archival science and methodological foundations for
digital libraries. Keynote address at the 11th European Conference on Digital Libraries,
Budapest, 17 September <www.ecdl2007.org/Keynote_ECDL2007_SROSS.pdf>
Ross, S. and Gow, A. (1999) Digital archaeology: rescuing neglected and damaged data
resources: a JISC/NPO study within Electronic Libraries (eLib) Programme on the
Preservation of Electronic Materials. London: Library Information Technology Centre
<www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf>
Ross, S., Greenan, M. and McKinney, P. (2003) Cross-sectoral development of digital preser-
vation strategies: ERPANET and the expansion of knowledge. In Symposium 2003:
Preservation of Electronic Records: New Knowledge and Decision-making, Ottawa,
15-18 September 2003: preprints
Ross, S. et al. (2004) New organisational structures responding to new challenges: the Digital
Curation Centre in the UK. DCC Public Lecture, Schweizerisches Bundesarchiv, 25
October 2004 <www.erpanet.org/events/2004/bern/SR_DCCpresentation_BERNE_erpa
net_mtg_2.pdf>
Rothenberg, J. (1995) Ensuring the longevity of digital documents. Scientific American,
272, 42-47
Rothenberg, J. (1999a) Avoiding technological quicksand: finding a viable technical founda-
tion for digital preservation. Washington, D.C.: Council on Library and Information
Resources
Rothenberg, J. (1999b) Ensuring the longevity of digital information. Washington, D.C.:
Council on Library and Information Resources (Expanded version of Rothenberg 1995)
<www.clir.org/pubs/archives/ensuring.pdf>
Rothenberg, J. (2000) An experiment in using emulation to preserve digital publications
(NEDLIB report series, no. 1). Den Haag: Koninklijke Bibliotheek <www.kb.nl/hrd/dd/
dd_links_en_publicaties/nedlib/NEDLIBemulation.pdf>
Rothenberg, J. (2003) Digital preservation summary. Presented at Practical Experiences in
Digital Preservation Conference, National Archives, Kew, 2-4 April <www.national
archives.gov.uk/documents/rothenberg.pdf>
Rusbridge, A. (2004) Recognising advances in digital preservation. DigiCULT.Info: A News-
letter on Digital Culture, 8, 34-36
Saffady, W. (1993) Electronic document imaging systems: design, evaluation, and implemen-
tation. Westport, CT: Meckler
Sanett, S. (2003) The cost to preserve electronic records in perpetuity: comparing costs
across cost models and cost frameworks. RLG DigiNews, 7 (4) <worldcat.org/arcviewer/
1/OCC/2007/07/10/0000068885/viewer/file1.html#feature2>
Schonfeld, R.C. et al. (2004) The nonsubscription side of periodicals: changes in library
operations and costs between print and electronic formats. Washington, D.C.: Council
on Library and Information Resources
Seadle, M. (2010) Archiving in the networked world: LOCKSS and national hosting. Library
Hi Tech, 28 (4), 710-717
Shaw, J. (2010) Digital preservation: an unsolved problem. Harvard Magazine, May-June
<harvardmagazine.com/2010/05/digital-preservation-an-unsolved-problem>
Shenton, H. (2000) From talking to doing: digital preservation at the British Library. New
Review of Academic Librarianship, 6, 163-177
(Significance): a guide to assessing the significance of cultural heritage objects and collec-
tions (2001) Canberra: Heritage Collections Council
Simpson, D. (comp.) (2004) Directory of digital preservation repositories and services in
the UK <www.dpconline.org/docs/guides/directory.pdf>
Skinner, K. and Halbert, M. (2009) The MetaArchive Cooperative: a collaborative approach
to distributed digital preservation. Library Trends, 57 (3), 371-392
Slats, J. (2004) Practical experiences of the Digital Preservation Testbed: office formats.
Paper presented at the ERPANET File Formats for Preservation seminar, Vienna, 10-
11 May <www.erpanet.org/events/2004/vienna/presentations/erpaTrainingVienna_Slats.
pdf>
Smith, A. (1999) The future of the past: preservation in American research libraries. Washing-
ton, D.C.: Council on Library and Information Resources
Smith, A. (2002) The cost of providing access. CLIR Issues, 27, 5
Smith, A. (2003) New-model scholarship: how will it survive? Washington, D.C.: Council
on Library and Information Resources
Smith, B. (2002) Preserving tomorrow’s memory: preserving digital content for future genera-
tions. Information Services & Use, 22, 133-139
Snow, K. et al. (2008) Considering the user perspective: research into usage and communica-
tion of digital information. D-Lib Magazine, 14 (5/6) <www.dlib.org/dlib/may08/ross/
05ross.html>
Strang, T. (2003) Choices and decisions. In Symposium 2003: Preservation of Electronic
Records: New Knowledge and Decision-making, Ottawa, 15-18 September 2003: pre-
prints
Sturges, P. (1990) Access to information from the past: the birth of the amnesiac society? In
Conference Proceedings: Papers Presented at the Australian Library and Information
Association 1st Biennial Conference, Perth, Western Australia, September 30-October
5 1990, v.2, pp.295-306. Canberra, ACT: ALIA
Suchodoletz, D. von et al. (2010) Seven steps for reliable emulation strategies: solved
problems and open issues. Presented at iPRES 2010, Vienna, September 22 <www.ifs.
tuwien.ac.at/dp/ipres2010/papers/vonsuchodoletz-53.pdf>
Sullivan, S. et al. (2004) Bringing hidden treasures to light: illuminating DSpace. Paper pre-
sented at VALA 2004 12th Biennial Conference and Exhibition, Melbourne, 3-5 February
<www.vala.org.au/vala2004/2004pdfs/17SHGY.PDF>
Tarrant, D. et al. (2010) Connecting preservation planning and Plato with digital repository
interfaces. Presented at iPRES 2010, Vienna, September 19 <eprints.ecs.soton.ac.uk/
21289/>
Task Force on Archiving of Digital Information (1996) Preserving digital information.
Washington, D.C.: Commission on Preservation and Access
Task Force on the Artifact in Library Collections (2001) The evidence in hand. Washington,
D.C.: Council on Library and Information Resources
Taylor, G. (1996) Cultural selection. New York: Basic Books
Thibodeau, K. (1999) Resolving the inherent tensions in digital preservation. Paper presented
at NSF Workshop on Data Archiving and Information Preservation, Washington,
D.C., 5 March 1999
Thibodeau, K. (2000) Certifying authenticity of electronic records, version 1, 19 April 2000.
Unpublished
Thibodeau, K. (2002) Overview of technological approaches to digital preservation and
challenges in coming years. In The state of digital preservation: an international perspec-
tive, pp.4-31. Washington, D.C.: Council on Library and Information Resources
Thibodeau, K., Moore, R. and Baru, C. (2000) Persistent object preservation: advanced
computing infrastructure for digital preservation. In Proceedings of the DLM Forum on
electronic records, Brussels, 18-19 October 1999, pp. 113-118. Luxembourg: Office
for Official Publications of the European Communities (EUR-OP)
Thompson, D. (2008) Going digital, experiences at the Wellcome Library. Paper presented
at the 29th IATUL Conference, Auckland, 21-24 April <www.iatul.org/doclibrary/public/
Conf_Proceedings/2008/DThompson080306.doc>
Thompson, D. (2010) A pragmatic approach to preferred file formats for acquisition.
Ariadne, 63 <www.ariadne.ac.uk/issue63/thompson/>
Thorpe, V. (2011) Race to save digital art from the rapid pace of technological change. The
Observer, 8 May
Tibbo, H.R. (2003) On the nature and importance of archiving in the digital age. Advances
in Computers, 57, 1-67
Todd, M. (2009) File formats for preservation. DPC technology watch series report 09-02
<www.dpconline.org/component/docman/doc_download/375-file-formats-for-
preservation>
Tonkin, E. (2008) Persistent identifiers: considering the options. Ariadne, 56 <www.ariadne.
ac.uk/issue56/tonkin>
The top 10 digital preservation developments of 2010 (2010) <www.digitalpreservation.gov/
news/2010/20101229news_article_top10stories.html>
Townsend, S., Chappell, C. and Struijvé, O. (1999) Digitising history: a guide to creating
digital resources from historical documents. AHDS Guides to Good Practice <hds.essex.
ac.uk/g2gp/digitising_history/index.asp>
Trehub, A. and Wilson, T.C. (2010) Keeping it simple: the Alabama Digital Preservation
Network (ADPNet). Library Hi Tech, 28 (2), 245-258.
Tristram, C. (2002) Data extinction. Technology Review, October <www2.technology
review.com/articles/02/10/tristram1002.asp>
Tweney, D. (2010) Here comes the zettabyte age. Wired, April 30 <www.wired.com/gadgetlab/
2010/04/here-comes-the-zettabyte-age/>
UK Data Archive (2011?) Create & manage data: formatting your data: file formats table
<www.data-archive.ac.uk/create-manage/format/formats-table>
Underwood, W. (2009) Extensions of the UNIX file command and magic file for file type
identification (Technical Report ITTL/CSITD 09-02). Georgia Tech Research Institute
<perpos.gtri.gatech.edu/publications/TR%2009-02.pdf>
UNESCO (2003) Guidelines for the preservation of digital heritage, prepared by the National
Library of Australia. Paris: UNESCO <unesdoc.unesco.org/images/0013/001300/130071e.
pdf>
UNESCO (2004) Charter on the preservation of the digital heritage. Paris: UNESCO <portal.
unesco.org/ci/en/file_download.php/4cc126a2692a22c7c7dcc5ef2e2878c7Charter_en.pdf>
University of Leeds Representation and Rendering Project (2003) Survey and assessment of
sources of information on file formats and software documentation <www.jisc.ac.uk/
uploaded_documents/FileFormatsreport.pdf>
Upward, F. (2005) The records continuum. In Archives: recordkeeping in society, ed. S.
McKemmish et al., pp.197-222. Wagga Wagga: Centre for Information Studies
US-InterPARES Project (2002) InterPARES interpreted: a guide to findings on the preservation
of authentic electronic records
Van Bogart, J.W.C. (1995) Magnetic tape storage and handling: a guide for libraries and
archives. Washington, D.C.: Commission on Preservation and Access and National
Media Laboratory <www.clir.org/pubs/reports/pub54>
Van der Hoeven, J.R., Van Diessen, R.J. and Van der Meer, K. (2005) Development of a
Universal Virtual Computer (UVC) for long-term preservation of digital objects. Journal
of Information Science, 31 (3), 196-208
Van der Knijff, J. (2011) JPEG 2000 for long-term preservation: JP2 as a preservation for-
mat. D-Lib Magazine, 17 (5-6) <www.dlib.org/dlib/may11/vanderknijff/05vanderknijff.
html>
Van Diessen, R.J. and Van Rijnsoever, B.J. (2002) Managing media migration in a deposit
system. IBM/KB Long-term Preservation Study Report Series, no. 5. Amsterdam: IBM;
The Hague: Koninklijke Bibliotheek <www.kb.nl/hrd/dd/dd_onderzoek/reports/5-media
migration.pdf>
Van Garderen, P. (2010) Archivematica: using micro-services and open-source software to
deliver a comprehensive digital curation solution. Presented at iPRES 2010, Vienna, Sep-
tember 19 <www.ifs.tuwien.ac.at/dp/ipres2010/papers/vanGarderen28.pdf>
Van Wijngaarden, H. and Oltmans, E. (2004) Digital preservation and permanent access:
the UVC for images <www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/uvc-ist.pdf>
Vogt-O’Connor, D. (1999) Is the record of the 20th century at risk? CRM: Cultural Resource
Management, 22, 21-24
Vogt-O’Connor, D. (2000) Selection of materials for scanning. In Handbook for digital pro-
jects: a management tool for preservation and access, ed. M. Sitts. Andover, Mass.:
Northeast Document Conservation Center <www.nedcc.org/digital/dighome.htm>
Walters, T. and Skinner, K. (2010) Economics, sustainability, and the cooperative model in
digital preservation. Library Hi Tech, 28 (2), 259-272
Walters, T. and Skinner, K. (2011) New roles for new times: digital curation for preserva-
tion. Washington, D.C.: Association of Research Libraries <www.arl.org/rtl/plan/nrnt>
Walton, B. (2003) Accessibility and authenticity in digital records preservation. In
Symposium 2003: Preservation of Electronic Records: New Knowledge and Decision-making,
Ottawa, 15-18 September 2003: preprints
Watters, A. (2011) The Library of Congress’ Twitter archive, one year later. Forbes, June 13
<www.forbes.com/sites/oreillymedia/2011/06/13/the-library-of-congress-twitter-archive-one-year-later>
Webb, C. (2000) The role of preservation and the library of the future. Paper presented
at CONSAL 2000, Singapore, 26-28 April <www.nla.gov.au/nla/staffpaper/cwebb9.html>
Webb, C. (2002) Digital preservation – a many-layered thing: experience at the National
Library of Australia. In The state of digital preservation: an international perspective,
pp.65-77. Washington, D.C.: Council on Library and Information Resources
Webb, C. (2004) The malleability of fire: preserving digital information. In Managing
preservation for libraries and archives: current practice and future developments, ed. J.
Feather, pp.27-52. Aldershot, Hants: Ashgate
Webopedia (2011) <www.webopedia.com/TERM/A/archive.html>
Wellcome Library (2009) Digital curation toolbox <library.wellcome.ac.uk/node289.html>
Wheatley, P. (2001) Migration: a CAMiLEON discussion paper. Ariadne, 29
<www.ariadne.ac.uk/issue29/camileon/intro.html>
Wheatley, P. (2008) Costing the digital preservation lifecycle more effectively. Presented at
iPRES 2008, British Library, London, September 29
<www.bl.uk/ipres2008/presentations_day1/19_Wheatley.pdf>
Wheatley, P. et al. (2007) The LIFE Model v1.1. London: LIFE Project
<http://discovery.ucl.ac.uk/4831>
Whyte, A. and Wilson, A. (2010) How to appraise & select research data for curation.
<www.dcc.ac.uk/resources/how-guides/appraise-select-research-data>
Wilson, A. (2003) Why the Dublin Core Metadata Initiative (DCMI) is important. DigiCULT.
Info: A Newsletter on Digital Culture, 6, 32-34
Woods, K. and Brown, G. (2008) Migration performance for legacy data access.
International Journal of Digital Curation, 3 (2), 74-87
<www.ijdc.net/index.php/ijdc/article/viewFile/88/59>
Woodyard, D. (2000) Digital preservation: the Australian experience. Paper presented at
Positioning the Fountain of Knowledge: 3rd Digital Library Conference, Kuching 2-4
October <www.nla.gov.au/nla/staffpaper/dw001004.html>
Workshop on Research Challenges in Digital Archiving and Long-term Preservation (2003)
It’s about time: research challenges in digital archiving and long-term preservation:
final report. Washington, D.C.: National Science Foundation and Library of Congress
Workshop on the Future of the Past (2011) Summary Report on the Proceedings,
Luxembourg, 4-11 May 2011. Luxembourg: European Commission
<http://cordis.europa.eu/fp7/ict/telearn-digicult/future-of-the-past-summary_en.pdf>
Wright, R. (2005) Annual report on preservation issues for European audiovisual collections
<www.prestospace.org/project/deliverables/D22-4_Report_on_Preservation_Issues_2004.pdf>
York, J. (2010) Building a future by preserving our past: the preservation infrastructure of
HathiTrust Digital Library. Paper presented at World Library and Information Congress:
76th IFLA General Conference and Assembly, Gothenburg, Sweden, 10-15 August
<www.hathitrust.org/documents/hathitrust-ifla-201008.pdf>
Zierau, E. and van Wijk, C. (2008) The Planets approach to migration tools. Paper presented
at IS&T Archiving conference
<www.planets-project.eu/docs/papers/Archiving2008_Zierau_Wijk.pdf>
Index
A
acceptable loss 76, 98
access, changing requirements 15
– definition 19
– maintenance 204
access devices 51-52, see also playback equipment
accessibility, definition 19
acclimation, of storage media 126
accountability 26, 28, 88
action, need for 203-204
active strategies 100
administrative metadata 83
AHDS see Arts and Humanities Data Service
Alabama Digital Preservation Network 178-179
Alliance for Permanent Access 31, 172, 210
alteration for preservation 15-16, 75, 89-90, 160
Amazon AWS 177
Amazon S3 165
amnesiac society 33
analogue backups 100, 123, 129-130
appraisal 56, 61-63, 64
– and systems design 71
appraisal practice, and digital materials 63-64
archival discs 37, 44, 128
archival file formats, developing 141, 153-154
archival practice 28, 58, 79-80
Archival Resource Key see ARK
archival value 10-11, 61
Archivematica 166
archives, and memory 58
Archives Association of Ontario 62
archiving, IT definitions 18
archivists, and selection 56, 61-63
– preservation responsibility 28, 57
ARK 87
ARL see Association of Research Libraries
artifact as carrier of information 7, 11
artifacts, preservation 9, 10, 14, 17, 60-61, 75
Arts and Humanities Data Service 188, 196
Association of American Publishers 86
Association of Research Libraries 9, 170
– survey of libraries 106-107
attributes to be preserved 75-98, 203
audiovisual archiving, technology preservation 132
audit trails 90
Australian National Data Service 207-208
Australian practice 5
Australian preservationists, views on strategies 105-106
Australian Standard for Records Management 63
authenticity 18, 76, 77, 80, 87-97, 108, 156, see also trustworthiness
– definition 20, 22, 88
– ensuring 40, 41, 54, 132, 204, 205
– pre-digital conditions 89
– research 95-97
– threats to 89-90
automation 103, 110, 205
– of appraisal processes 64, 73
– of metadata creation and management 85-86
– of migration 160
awareness, of preservation issues 37-38, 41-42, 103, 200, 204-205
B
‘backup and restore’ 142, 143, 164
backwards compatibility 100
Barrow, W., on deterioration of books 33
BBC Domesday Project 35, 36, 37, 135, 136-137, 157, 183
BEAM 131
benign neglect 3, 8, 10, 17, 201
Besser, H. on preservation challenges 52-53
best practice 5
Bibliothèque nationale de France 184
bit-stream copying 100, 141, 142
bit-stream preservation 18, 79
blogs, preservation 27
Blue Ribbon Task Force 210
Bodleian Electronic Archives and Manuscripts see BEAM
books, survey of deterioration 33
born-digital materials 22
British Library 72, 134, 164, 184, 215
BS4783 22
business continuity 88
business planning methodologies 208
business records, loss of 34
C
California Digital Library 149
CAMiLEON Project 135, 136-137, 161, 183, 196, 215
Canadian federal government records 36
canonicalization 101
carriers see digital storage media; physical objects
CASPAR 149-150, 160, 187, 205
CD-R see optical disks
CDs, storage and handling 45, 126-127
Cedars 70, 85, 90, 183, 196, 208, 215
Center for Research Libraries 97, 201
Central Connecticut State University Library 165
certified digital archives see trusted digital repositories
challenges 2-4, 12, 39, 40-41, 52-53, 199-219
choice 15
CICS see Committee on Institutional Cooperation
CLIR see Council on Library and Information Resources
CLOCKSS 32, 178
Cloonan, M. on paradigm shift 12, 14
cloud storage 165, 203
collaboration 29-30, 31, 40, 168-170, 200, 202
collaborative infrastructures 110, 169
collection maintenance, decline 40
collection-based models 7
collections, declining importance 7
combining strategies and practices 141, 165-166
commercial data archiving 164-165
commercial indifference to preservation 37
Committee on Institutional Cooperation 194
Common Strategic Framework 110
community requirements 92, 93, 159
community will 105
compact disks see CDs
compromise of data 54, see also data loss
computer games, emulation 134
Computer History Museum 133
computer scientists, training 103
conceptual objects, definition of 23
conservation 10, 14, 17
Consultative Committee for Space Data Systems 81
content information 84
context preservation 51, 61-62, 66-67, 79-80, 89
continuous maintenance, need for 3, 12, 101, 207
continuum approach, and selection 68
Conway, P. on paradigm shift 15
Cook, T. 36, 58
cooperation see collaboration
copying 12
copyright law 65, see also intellectual property
Cornell University 133, 177, 213
corporate records see business records
cost models 210-211
costing of preservation 204, 208
costs, of collaboration 170
Council on Library and Information Resources 181
Cox, R. on paradigm shift 11
Creative Archiving at Michigan and Leeds see CAMiLEON
creators of information, preservation role 9, 30, 32, 67, 202
cultural heritage institutions, role 57
cultural imperatives 25, 57
Cultural, Artistic and Scientific Knowledge for Preservation, Access and Retrieval see CASPAR
curation see digital curation
Curation Lifecycle Model 15, 68-69, 190
CURL Exemplars in Digital Archives see Cedars
cyberinfrastructure 9-10, 169, 180
cyberscholarship 9-10, 205
D
DAITSS 189
Dark Archive in the Sunshine State see DAITSS
data, as preservation focus 78-79
– definition 15
data archiving services 164-165
Data Asset Framework 69
data creation 202
data creators 30
Data Curation Profiles project 68
data formats see file formats
data loss 33-37
Data Preservation Alliance for the Social Sciences see Data-PASS
data protection legislation 65
data recovery 34, 36-37, 101, 130-131
data recovery companies 41
data / technology separation 78
data transformation see alteration for preservation
data-driven science 31
Data-PASS 179
database, definition 15
databases 80
DataONE 213
DataShare Project 113
DeepARC 184
definitions, differences among information professionals 18, 87
– need for new 7-8, 13-16
– pre-digital 13-14
– revised 16-21
– revision of 13-16
DELOS Programme 216, 217
democracy, support of 26
description information 82, 84, see also metadata
descriptive metadata 83, 84
designated communities 59, 67, 92-93, 159
deterioration of traditional library materials 33
DigCCurr 212
digital archaeology 101, 124, 130
Digital Archive 165, 166
digital artifacts 40-50
digital continuity, threats to 42-43
Digital Curation Centre 38, 72, 112, 134, 149, 160, 190-191, 196, 197, 215, see also Curation Lifecycle Model
Digital Curation Curriculum see DigCCurr
digital curation, definition 16, 190
digital forensics 124, 130-131
digital heritage, definition 20, 21
digital information, volume 40, 44, 57, 64, 200, 205
Digital Lives project 31
digital materials see also digital information; digital storage media
– complexity 3, 13, 77-78, 200
– definition 20, 21, 22
– dependence on playback machinery 40
– deterioration causes 44-45
– fragility 40
– selection for preservation 56-74
Digital Object Identifier see DOI
digital objects 50-55
– attributes 17
– definition 50
digital preservation, definition 20
– dynamic nature 16-17, 118
– future 199-221
– holistic approach 110
– management 206-208
– models 12, 15, see also Curation Lifecycle Model; OAIS Reference Model
– practice 103-107
– publications 38
– social aspects 12, 17, 26, 105, 106
– threats to 200
– typology 118-119
Digital Preservation Coalition 70, 134, 181, 183, 191-192
– typology of strategies 114
digital preservation programmes see preservation programmes
Digital Preservation Testbed 150, 155, 215
digital repositories 169, 175-176, 189
Digital Repository Audit Method Based on Risk Assessment see DRAMBORA
digital signatures 90
digital stewardship 16
digital storage media 42-50, 101, 123, 125-128
– lack of artifactual value 45
digital storage systems see mass storage systems
digital time-stamping 90
digital watermarking 90
DigitalPreservationEurope Research Roadmap 217
digitization, and preservation 22
– selection for 69-70
digitized materials 194-195
Dioscuri 137-138
DMP Online 112
‘do nothing’ strategy 123, 124
documentation, maintaining essential 22, 53-55
DOI 87
Domesday Reloaded 36, 137
DRAMBORA toolkit 98
DROID 148
DSpace 175-176
Dublin Core 84
DuraCloud 165, 175, 177
Duranti, L. 95
DuraSpace 165, 175, 176
DVD-R see optical disks
E
e-Depot 185
e-journals 32
e-repositories see digital repositories
e-research see cyberscholarship
e-science see cyberscholarship
EAD see Encoded Archival Description
economic rationale 26
economic sustainability 209-210
ECPA see European Commission for Preservation and Access
education and training 205, 212-213
Educopia Institute 180
Electronic Cultural Atlas Initiative 9
electronic journals 169, 178
electronic publications, deposit system 185
electronic recordkeeping 11, 95
electronic records 95
Electronic Resource Preservation and Access Network see ERPANET
Elsevier Science 32
email 32, 33, 35
– essential elements 76, 93-94
emulation 101, 105, 107, 134-138, 187
encapsulation 90, 101, 141, 162-163
Encoded Archival Description 84
entertainment industry 32
environmental conditions 125-127, 128
equipment obsolescence see obsolescence
ERPANET 185-186, 215
– case studies 59, 104, 186, 195
eScience Core Programme 190
essence see essential elements, significant properties
essential elements 75, 90, see also significant properties
– definition 23
European Commission 110, 170, 185-187, 218
European Commission for Preservation and Access 181
evidential value 26, 76, 80, 88, 95
Ex Libris Rosetta 166
executable files 79
expertise see skills, development of
eXtensible Markup Language see XML
F
Fedora 175, 177
FIDO project 131
file compression 52-53
file formats 141, 143-154
– proliferation 40
– registries 147-150
– restricting range of 152-153, see also normalization
– standardization 144, 150-152, 157
– sustainability criteria 150-151
Florida Digital Archive 152, 158, 189
Forensic Investigation of Digital Objects see FIDO
formalization 116
Functional Requirements for Evidence in Recordkeeping project 34, 54, 95
functional requirements, of systems 81-82
funding 105, 200, 203, 204, 208-211
future, of digital preservation 199-221
G
geoscience, high level of expertise 37
Global Digital Format Registry 149
Global Remote Access to Emulation see GRATE
government documents PLN 179
GRATE 137, 187
Greek drama, and authenticity 88-89
growth rate of digital materials 33, 41, 57, 205
H
Handle System 87
handling media 101, 127, see also storage and handling practices
Harvard University Library 148
HathiTrust 194-195
HATII 186, 190
HD Rosetta 128
Hedstrom, M., on paradigm shift 12
Heritrix 184
Hewlett Packard 176
Humanities Advanced Technology and Information Institute see HATII
humanities research 9
hybrid libraries 7
I
IBM 176
ICE Forum see International Curriculum Education Forum
ICPSR see Inter-University Consortium for Political and Social Research
identity, definition 88
IIPC see International Internet Preservation Consortium
ImageMagick 160
incentives 209
individuals, preservation responsibility 31-32, 68, 214-215
information packages, definition 20, 22, 82
information professionals, preservation definitions 18, 87
ingest 82
initiatives, typologies 171-172
InSPECT 90, 91, 94, 204
Institute of Museum and Library Services 218
institutional repositories 31, 175
integrity 76, 89, see also trustworthiness
– checking 177
– definition 88
– ensuring 40, 54, 204
intellectual property 29, see also copyright law
intellectual property rights 40, 41, 64, 65, 67, 70, 170, 172, 184, 202-203
Inter-University Consortium for Political and Social Research 152, 158
international collaboration 170, 172-184, 200
International Curriculum Education Forum 212
International DOI Federation 87
International Internet Preservation Consortium 174, 183-184
Internet Archive 34, 69, 72-73, 173-174, 184
internet, growth 8-9
InterPARES 15, 95-97, 160, 204, 213
– and selection criteria 71
interpretability, maintaining 55
ISO 14721:2003 81
ISO 15489 63
ISO 19005-1 153
ISO 28500 184
ISO 32000-1 153
ISO/DIS 16363 97
ISTBAL 186
ITHAKA 175
J
Jenkinson, Sir H. 28
Jewish Women’s Archive 165
JHOVE 148-149
JISC 110, 183, 190, 196-197, 210, 218
Joint Information Systems Committee see JISC
journals, preservation of 174
JPEG 2000 154
JSTOR 148, 174-175
JSTOR/Harvard Object Validation System see JHOVE
K
Kahle, Brewster 173
KEEP 187
keep everything approach, sustainability 73, 203
Keeping Emulation Environments Portable see KEEP
Keeping Research Data Safe 210
Kenney, A., on scalability 213
knowledge, preservation of 14
Koninklijke Bibliotheek 32, 135, 138, 185
KRDS see Keeping Research Data Safe
Kroll Ontrack 41
L
Landsat data 36
Lavoie, B., on economic issues 209
Legacy Media Project 37
legal deposit legislation 9, 66, 88
legal reasons for preservation 28, 29
legal rights see intellectual property rights
libraries, preservation role 27-28, 57
– selection practice 59-61
Library of Congress 27, 31, 68, 150, 184, 192, 215-216
LIFE 69, 210
Life Cycle Information for E-Literature see LIFE
lifecycle management, risks 200
lifecycle models 68-69, see also Curation Lifecycle Model; LIFE; Data Asset Framework
lifespan, of storage media 43-44, 126, 128
local government, awareness of issues 37
local history collections 214
LOCKSS Project 117, 143, 162, 178-179, 180
logical objects 23
– definition 50-51
long-term, definition 19, 23, 81
longevity, changing definition 15
loss of digital information 33-37, see also acceptable loss
Lots Of Copies Keep Stuff Safe see LOCKSS Project
M
magnetic media 46-47
– lifespan 126
– storage conditions 125
magnetic tapes 46-47
mainstream, integration into 106, 203, 204, 205-208
maintenance of playback equipment see playback equipment, maintenance
MAM-A 37
MAME project 134
manufacturing quality of storage media 45, 127-128
manuscript repositories, survey of 104
Marcum, D., on paradigm shift 11
mass preservation treatments 14
mass storage systems 101, 141, 163-165
Massachusetts Institute of Technology see MIT
media see also magnetic media; digital storage media
– acclimation 126
– definition 11
medium-term, definition 23
MetaArchive Cooperative 179-180
metadata 82-88, 96, 106, 154, 200
– extraction tools 85-86
– management tools 85
– schemes 84, 85-86
Metadata Encoding and Transmission Standard see METS
Metadata Object Description Schema see MODS
METS 85, 177, 183, 195
micro-services 166
microfilm, copying to 129, 130
migration 45, 101, 104, 105, 141, 156-161
– and file formats 147-148
– costs 159
– on request 161-162, 183
– tools 160
MIT 175-176
models see digital preservation models; lifecycle models
MODS 84
museum objects, and provenance 89
museums of computing 133
museums, preservation role 57
N
NASA data 35
Nationaal Archief van Nederland 138, 150, 186
national archives, digital preservation strategies 104
– survey of 106
National Archives (UK) 90, 137, 148, 149, 166
National Archives of Australia 37, 53, 90, 91, 154, 155, 163, 207
national collaboration 187-195
National Digital Heritage Archive 110-111
National Digital Information Infrastructure and Preservation Program see NDIIPP
National Digital Stewardship Alliance 180, 193-194, 215
National Endowment for the Humanities 218
National Historical Publications and Records Commission 218
national libraries 183-184
– and legal deposit 60
– cooperation with publishers 32
– role 28
– survey of 106
– use of OAIS model 81
National Library of Australia 41, 91, 170, 181-182, 214
National Library of New Zealand 110-111
– metadata extraction tool 85-86
National Museum of Computing 133
National Science Foundation 112, 192, 215, 216, 217-218
National Space Science Data Center 36
NDIIPP 111, 160, 170, 179, 180, 192-193, 217-218
NDLTD see Networked Digital Library of Theses and Dissertations
NEDLIB 85, 135, 185
Networked Digital Library of Theses and Dissertations 180
Networked European Deposit Library see NEDLIB
non-custodial models 7
non-proprietary formats and systems see open formats and systems
‘non-solutions’ 121, 122-124
normalization 101, 105, 141, 158
NSF see National Science Foundation
NSSDC see National Space Science Data Center
O
OAIS Reference Model 19, 22, 59, 67, 80-82, 85, 92-93, 97, 111-112, 113, 162, 166, 177, 182, 189, 195, 202
obsolescence 43-44, 51-52, 55, 128, 132, 134, 142, 156, 200
OCLC 85, 87, 105, 165, 182-183, 202
Online Computer Library Center see OCLC
Ontario Hydro nuclear power plant 36
Open Archival Information System see OAIS
open formats and systems 145-147, 153, 150
Open Office 145, 160
Open Planets Foundation 112, 160, 187
optical disks 37, 47-49
– care and handling 126-127
– lifespan 128
– storage conditions 126
originals, preservation of 10, 60-61, 132
P
PADI 170, 181-182
PANDORA 72-73
paper, printing to 129
paper deterioration 33
paradigm shift 3, 7, 8-10
partnerships see collaboration
passive strategies 100
PDF, as standard 104, 153
PDF/A 153-154, 202
PeDALS 179
performance model 53, 91-92
PersID 87
Persistent Digital Archives and Library System see PeDALS
persistent identifiers 86-87
persistent object preservation 101
Persistent Uniform Resource Locator see PURL
Personal Computer Museum 133
personal correspondence 32
Personal Digital Archiving conferences 31
personal information 31
personnel required 13, 200, 211-213
physical collections 7
physical objects 7, 23
– as carriers of information 11
– definition 50
Piltdown man 75
Planets 137, 139, 160, 187, 205, 213, 218
– survey 81
planning 111
Planning Tool for Trusted Electronic Repositories see PLATTER
planning tools 112
Plato 112, 187
PLATTER 98
playback equipment, dependence on 40
– maintenance 45, 46, 133
PLNs 178-179, 180
policies 111, 112-113
policy, definition 99
policy development 101, 113, 123, 202
Portable Document Format see PDF
Portico 97, 149, 155
practices 119
– criteria for effective 107-111
– definition 99
pre-digital paradigm see preservation paradigms, pre-digital
pre-print archives 31
PREMIS 85, 189, 195
presentation of digital files 77
preservable essence see essential elements; significant properties
preservation, awareness of issues 37-38
– conceptualization of 15-16, 74
– definition 16
– extent of problem 12, 33
– principles 10, 14, 15
– purposes 25-27
– responsibility for 25, 200-201
Preservation and Long-term Access through Networked Services see Planets
preservation metadata 84-86
Preservation Metadata: Implementation Strategies see PREMIS
preservation paradigms 7-13
– digital 12-13
– pre-digital 3, 10-12, 17, 101-102, 123
preservation paradox 15-16, 39-40, 75
preservation programmes 14, 23
– definition of 21, 24
– surveys of 11
‘preserve objects’ approaches 140-167
‘preserve technology’ approaches 121-139
Preserving Access to Digital Information see PADI
Preserving and Accessing Networked Documentary Resources of Australia see PANDORA
principles 118, 201
– definition 99
Private LOCKSS Networks see PLNs
procedures, definition 99
professional imperatives 27-29
project-based management 206-207
PRONOM 148, 149
proprietary formats and systems 79, 106, 144-147, 166
provenance 80, 89
Public Record Office of Victoria 155, 162
publishers, preservation responsibility 9, 32, 169
PURL 87
Q
QEMU 137
R
recordkeeping 27
recordkeeping practice 58, 68, 79
redundancy 117, 177
reformatting 10
– and paradigm shift 14
refreshing 12, 101, 105, 141, 142-143, 156
regional collaboration 185-187
Registry of Digital Masters 182
relative humidity in storage areas 125-126
rendering of digital files 77
replication 101, 141, 143
representation information 82, see also metadata
research, into digital preservation 215-218
Research Information Network 38
research libraries 169
– new roles 10, 11, 29
– North American survey 42, 106-107
Research Libraries Group 85, 107, 182-183
– survey 41, 104-105, 144
responsibility for preservation 200, 201
restoration, definition 11
restricting file formats 152-153, see also normalization
reversibility of treatments 10
RIN see Research Information Network
risk management 71-72, 124, 134, 208
RLG see Research Libraries Group
Rothenberg, J. 39
– and emulation 135-136
– on media lifespan 43
– on migration 159
– on strategies 107, 115-116
S
Safety Deposit Box 166
scalability, of solutions 213-215
SCARP 31
– case studies 59, 68, 190-191, 195
Schlesinger Library 27, 207
scholarly output, dissemination 31
scholars, preservation expectations 88
– role in digital preservation 29, 31
Schweizerisches Bundesarchiv 186
science publishers, survey of 104
scientific data, and research 26, 29
– loss 36
scientists, role in digital preservation 29, 31
search engine companies 32
sectoral collaboration 195-197
Securing a Hybrid Environment for Research Preservation and Access see SHERPA
selection, for digitization 69-70
– for preservation 56-74, 200, 201, 203
– research 73-74
– vs. keeping everything approach 58, 73
selection criteria 59, 72
– in archives 61-63
– in libraries 59-61
– bases for 21, 58-59
selection decisions, debate about 58
– importance 57
– influence of preservation cost 63, 64
selection frameworks 69-72
separation of data from technology 78, 103, 140
SHAMAN 187
SHERPA 212
short-term, definition 23
significant properties 70, 75, 88, 90-94, 106
simple version migration 156
skills, development of 205, 211-213
– required 13, 29, 200, 211-212
small institutions 214
Smithsonian Astrophysics Observatory Astrophysics Data System 68
social imperatives 26, 57
social science data 36
Social Sciences and Humanities Research Council of Canada 36
societal benefits 57
software, dependence on 52-53, 77-78
– reverse engineering 101
software obsolescence see obsolescence
SoX 160
St Croix African Roots Project 214
stakeholders, expansion of 27, 29, 101, 103
– preservation role 67-68, 169, 202, 209, 215
standardizing file formats 150-152, 157
standards and standardization 18, 81, 85-86, 101-105, 111, 123-124, 141, 201-202
Stanford University Libraries 131, 149
State Library of New South Wales 50
storage and handling practices 12, 101, 123, 124-127
storage capacity 44, see also mass storage systems
storage conditions 124-126
storage costs 209
strategies 99-120
– criteria for effective 107-111
– definition 99
– history 101-103
– independent of technology 107
– sustainability 103, 105, 106, 110, 113
– typology of 100, 114-118
structural metadata 83
surveys, of book deterioration 33
– of European national libraries 41-42, 106
– of international practice 105
– of North American research libraries 106-107
– of preservation programmes 11
– of RLG members 41, 104, 107, 144
– of science publishers 104
– of UK local government 37
– of US manuscript repositories 104
sustainability, of strategies 103, 105, 106, 110, 113
systems design 71, 80
T
Task Force on Archiving of Digital Information 30, 35, 97, 182
Task Force on the Artifact in Library Collections 45
technical infrastructure, cost 208
technical metadata 93
technological obsolescence see obsolescence
technology preservation 101, 132-133
technology watch 101, 133-134
technology-independent strategies 107, 140
temperature, for storage areas 125-126
Tessella 166
Testbed Digitale Bewaring see Digital Preservation Testbed
Thibodeau, K., on strategies 108, 131
– typology of strategies 115
Tibbo, H., on future 219-220
TIFF 104
TRAC 97, 195
training see education and training
transformation see alteration for preservation
Triangle Research Libraries Network 194
trusted digital repositories 30, 90, 97-98, 195
Trusted Digital Repositories and Audit Checklist see TRAC
trustworthiness 53-54, 80, see also authenticity; integrity
Tufts University 177
Twitter content 27
typologies, of initiatives 171-172
– of strategies 100, 114-118
U
UK Data Archive 152
UKOLN 190
UNESCO 21, 180-181
– typology of strategies 116-117
Unified Digital Formats Registry 149
Uniform Resource Locator see URL
Uniform Resource Name see URN
Universal Virtual Computer 101, 138-139
University of British Columbia 95
University of California 194
University of California Digital Curation Center 166
University of Edinburgh 190
University of Leeds 148, 183
University of Leeds Representation and Rendering Project 148
University of Michigan 95, 183
University of Pittsburgh 34, 54
university repositories 169, 175-176, 189
UNIX 147
URL 86
URN 86-87
US 1960 Census data 35
US federal government records 33-34, 35
user communities 92, 159
user groups, and selection 59, 60, 67
user-driven selection for preservation 61
UVC see Universal Virtual Computer
V
value, determining see selection
VERS 155, 162, 166
version incompatibility 53
version migration 156
Vietnam War combat area casualty data 35
viewers 141, 161
Viking Mars mission data 35
Visual Resources Association Core see VRA Core
VMWare Fusion 134
VRA Core 84
W
Walton, B., typology of strategies 114-115
WARC file format 184
Wayback Machine 173, 174
web archiving 152, 173-174, 178, 184
– selective approach 72-73
Web Curator Tool 184
web domain harvesting 72-73
web sites, and varying displays 77
– identification 86
– selection criteria 76
web-based information, impermanence of 5, 197-198
Wellcome Library 152, 203, 207
– selection framework 71
Wellcome Trust 112
Wheatley, P., on migration 157
Windows Virtual PC 134
Wotsit’s Format 150
X
Xena 154, 163
XML 85, 101, 104, 154-155, 175, 177
XML Electronic Normalising of Archives see Xena
Y
Yale University Library 157, 177
