Ross Harvey - Preserving Digital Materials-De Gruyter Saur (2011)
De Gruyter Saur
Ross Harvey
Preserving Digital Materials
2nd Edition
De Gruyter Saur
ISBN 978-3-11-025368-9
e-ISBN 978-3-11-025369-6
ISSN 2191-2742
Contents
List of figures
Introduction
Chapter 1
What is Preservation in the Digital Age? Changing Preservation Paradigms
Introduction
Changing paradigms
The need for a new preservation paradigm
Changing definitions
Preservation definitions in the digital world
What exactly are we trying to preserve?
How long are we preserving them for?
What strategies and actions do we apply?
Conclusion
Chapter 2
Why do we Preserve? Who Should do it?
Introduction
Why preserve digital materials?
Professional imperatives
New stakeholders
How much data have we lost?
Current state of awareness of digital preservation problems
Conclusion
Chapter 3
Why There’s a Problem: Digital Artifacts and Digital Objects
Introduction
Modes of digital death
Digital storage media
Magnetic media
Optical disks
The future for digital storage media
Chapter 4
Selection for Preservation – The Critical Decision
Introduction
Selection for preservation, cultural heritage, and professional practice
Selection criteria traditionally used by libraries and archives
Why traditional selection criteria do not apply to digital materials
IPR, context, stakeholders, and lifecycle models
Intellectual property rights and legal deposit
Context and community
Stakeholder input
Value of lifecycle models
Developing selection frameworks for preserving digital materials
Some selection frameworks
How much to select?
Conclusion
Chapter 5
What Attributes of Digital Materials Do We Preserve?
Introduction
Digital materials, technology, and data
The importance of preserving context
The OAIS Reference Model
The role of metadata
Preservation metadata
Preservation metadata standards
Persistent identifiers
Authenticity
Significant properties
Research into authenticity
Functional Requirements for Evidence in Recordkeeping Project (Pittsburgh)
InterPARES
Trusted digital repositories
Conclusion
Chapter 6
Overview of Digital Preservation Strategies
Introduction
Historical overview
Who is doing what?
Criteria for effective strategies and practices
Broader concerns
Standards
Planning
Policies
Sustainability
Typologies of principles, strategies, and practices
A typology of digital preservation?
Conclusion
Chapter 7
‘Preserve Technology’ Approaches: Tried and Tested Methods
Introduction
‘Non-solutions’
Do nothing
Storage and handling practices
Durable/persistent digital storage media
Analogue backups
Digital archaeology and digital forensics
‘Preserve technology’ approaches
Technology preservation
Technology watch
Emulation
The Universal Virtual Computer
Conclusion
Chapter 8
‘Preserve Objects’ Approaches: New Frontiers?
Introduction
‘Preserve Objects’ approaches
Bit-stream copying, refreshing, and replication
Bit-stream copying
Refreshing
Replication
Standard data formats
File format registries
Chapter 9
Digital Preservation Initiatives and Collaborations
Introduction
Collaboration
Typologies of digital preservation initiatives
International initiatives and collaborations
International services
The Internet Archive
JSTOR
DuraSpace
LOCKSS
MetaArchive Cooperative
International alliances
UNESCO
PADI
OCLC
CAMiLEON
International Internet Preservation Consortium
Regional initiatives and collaborations
Regional services
NEDLIB
Regional alliances
ERPANET
European Commission-funded projects
Digital Recordkeeping Initiative
National initiatives and collaborations
National services
AHDS
Florida Digital Archive
National alliances
Digital Curation Centre
Digital Preservation Coalition
NDIIPP
National Digital Stewardship Alliance
HathiTrust
Sectoral initiatives and collaborations
Sectoral services
Cedars
Sectoral alliances
JISC
Conclusion
Chapter 10
Challenges for the Future of Digital Preservation
Introduction
What have we learned so far?
Four major challenges
Challenge 1: managing digital preservation
Challenge 2: funding digital preservation
Challenge 3: peopling digital preservation
Challenge 4: making digital preservation fit
Research and digital preservation
Conclusion: the future of digital preservation
Bibliography
Index
List of Figures
– The likely high costs of taking action, and the likely high costs of delaying or not
taking action (including the likelihood of loss of access)
– A mismatch between funding cycles and long term preservation commitments, even
for long existing institutions …, leading to the possibility that some preservation
commitments may have to be given priority over others
– Intellectual property and other rights-based constraints on preservation processes
and on the provision of access
– Administrative complexities in ensuring timely action is taken that will be
cost-effective over very long periods of time
– The need to develop and maintain suitable knowledge and systems to deal with
these challenges (National Library of Australia, 2008).
The need for digital preservation touches all our lives, whether we work in commercial
or public sector institutions, engage in e-commerce, participate in e-government, or use
a digital camera. In all these instances we use, trust and create e-content, and expect
that this content will remain accessible to allow us to validate claims, trace what we
have done, or pass a record to future generations (NSF-DELOS Working Group on
Digital Archiving and Preservation, 2003, p.i).
These words remain as relevant now as when they were written almost ten
years ago.
We cannot expect a technological quick fix. We now appreciate that the
challenges of maintaining digital materials so they remain accessible in the
future are not just technological. They are equally bound up with organizational
infrastructure, resourcing, and legal factors, and we have not yet got
the balance right. These and other factors combine to make the task difficult,
although there are clear pointers to the way ahead. As Breeding (2010, p.32)
notes, ‘while the current state of the art in digital preservation falls short of an
ideal system that guarantees permanent survival, much has been done to address
the vulnerabilities inherent in digital content’.
Both the library community and the recordkeeping community (archivists
and records managers), as well as an increasing number of other groups, are
energetically seeking solutions to the challenges of digital preservation. Over
the last decade there has been increased sharing of the outcomes of research
and practice. Developments in one community have considerable potential to
assist practice in other information and heritage communities. This book goes
some way towards addressing this need by providing examples from several
different communities.
Although much high-quality information is available to information professionals
concerned with preserving materials in digital form, most notably on
the web, its sheer volume causes problems for busy information professionals,
scholars and scientists, and individuals who wish to understand the issues and
learn about strategies and practices for digital preservation. Preserving Digital
Materials is written for these time-poor information professionals, scholars
and scientists, and individuals. Its synthesis of current information, research
and perspectives about digital preservation, drawn from many sources
across many areas of practice, makes it of interest to a wide range of readers,
from preservation administrators and managers who want a professional reference
text to thinking practitioners who wish to reflect on the issues that digital
preservation raises in their professional practice. It will also be of interest to
students.
The reader should note two features of this book. First, Preserving Digital
Materials is not a how-to-do-it manual: although it includes information about
practical applications, it is not the place to learn how to apply the technical
procedures of digital preservation. Second, it is not primarily concerned with
digitization and makes little distinction between information that is born-digital
and information that is digitized from physical media.
This book addresses four key questions which give the text its four-part
structure:
The first edition of Preserving Digital Materials (2005) used many Australian
examples, because Australian practice in digital preservation – from the library,
recordkeeping, audiovisual archiving, data archiving and geoscience sectors –
was often at the forefront of international best practice. This second edition of
Preserving Digital Materials provides a more international perspective, noting
major initiatives in the UK, the EU and the US since 2005. It is possible to do
this in 2011 because of the considerable quantity of material reported by these
and many other initiatives and readily available on web sites, in conference
proceedings and from other public sources.
As noted above, there is a considerable amount of high-quality information
available about preserving materials in digital form, much of it available on the
web. The accessibility that this provides is countered by the impermanence of
much web material, as noted in several chapters in this book. All URLs in this
book were correct at the time of writing.
The first edition of this book acknowledged my indebtedness to many
people, and these debts remain. In producing the first edition I benefited from
discussions with many colleagues at that time. In particular, I acknowledged
the following individuals for their ideas and support: Tony Dean for suggesting
the example of Piltdown Man; Liz Reuben, Matthew Davies, Stephen Ellis and
Rachel Salmond for case studies; Alan Howell, of the State Library of Victoria,
and staff of the National Library of Australia, in particular Pam Gatenby, Colin
Webb, Kevin Bradley and Margaret Phillips, for their assistance with clarifying
concepts. Some of the material in the first edition was based on interviews
with Australian digital preservation experts, whose assistance and encouragement
were invaluable: Toby Burrows, Mathew Davies, Ray Edmondson, Stephen
Ellis, Alan Howell, Maggie Jones, Gavan McCarthy, Simon Pockley, Howard
Quenault, Lloyd Sokvitne, Paul Tresize, and Andrew Wilson. Heather Brown
and Peter Jenkins provided examples, and their assistance and the permission
of the State Library of South Australia was gratefully acknowledged. Thanks
were due to Ken Thibodeau, CLIR, ERPANET and UNESCO for permission
to use their material. I acknowledged my gratitude to my then employer, Charles
Sturt University, which supported me by providing study leave in 2003. I was
fortunate to be based at the National Library of Australia as a National Library
Fellow from March to June 2003 and I greatly appreciated the generous support
of its then Director-General, Jan Fullerton, and other staff of the National
Library. Finally, I acknowledged the unfailing support of Rachel Salmond in
this and others of my endeavours.
In writing the second edition of this book I have incurred new debts. In
addition to those noted for the first edition, I gratefully acknowledge students
who have enrolled in my courses on digital preservation at Yonsei University,
Seoul, and the Graduate School of Library and Information Science, Simmons
College, Boston. My ideas have been informed by conversations with people too
numerous to name, but I wish to particularly thank Jeannette Bastian, Michèle
Cloonan, Joy Davidson, Cal Lee, Michael Lesk, Martha Mahard, Seamus Ross,
Anne Sauer, Shelby Sanett, and Terry Plum. I am grateful for the support of
my current employer, the Graduate School of Library and Information Science,
Simmons College, Boston.
I must again acknowledge the unfailing support of Rachel Salmond. I owe
Rachel more than I can adequately express here for her help over three decades,
her editorial assistance and her patience with me as the preparation of this
book took over normal schedules.
Chapter 1
What is Preservation in the Digital Age?
Changing Preservation Paradigms
Introduction
To preserve, as the dictionary reminds us is to keep
safe … to maintain unchanged … to keep or maintain
intact. But the rapid obsolescence of information
technology entails the probability that any digital
object maintained unchanged for any length of time
will become inaccessible (Thibodeau, 1999)
Any discussion about the preservation of digital materials must begin with the
consideration of two interlinked areas: changing preservation paradigms, and
definitions of terms. Without a clear understanding of what we are discussing,
the potential for confusion is too great. In library and recordkeeping practice
we are moving rapidly from collection-based models, whose principles and
practices have been developed over many centuries, to models where collections
are not of paramount importance and where what matters is the extent of
access provided to information resources, whether they are managed locally or
remotely. Archivists have considered, debated, and sometimes applied the concept
of non-custodial archives, where there is no central collection, to accommodate
the massive increase in numbers of digital records. Librarians manage
hybrid libraries, consisting of both physical collections and distributed digital
information resources, and digital libraries. Other stakeholders with a keen interest
in digital preservation manage digital information in specific subject areas,
such as geospatial data or social science data. In the past this material, where it
existed, was maintained as collections of paper and other physical objects. The
practices developed and applied in libraries and archives are still largely based
on managing physical collections and cannot be applied automatically to managing
digital collections.
The changing models of library and recordkeeping practice require new
definitions. The old terms do not always convey useful meanings in the digital
environment and can be misleading and, on occasion, even harmful. In library
and recordkeeping practice we are changing from a preservation paradigm
where primary emphasis is placed on preserving the physical object (the artifact
as carrier of the information we wish to retain, for example, a CD) to one
where there is no physical carrier to preserve. What, then, does the term preservation
mean in the digital environment? How has its meaning changed?
What are the implications of these changes? The phrase benign neglect provides
an example of a concept that is helpful in the pre-digital preservation
paradigm but is harmful in the new. It refers to the concept that many information
carriers made of organic materials (most notably paper-based artifacts)
will not deteriorate rapidly if they are left undisturbed. For digital materials
this concept is positively harmful. One thing we understand about information
in digital form is that actions must be applied almost from the moment it is
created, if it is to survive. Pre-digital paradigm definitions do not accommodate
new forms, such as works of art that incorporate digital technologies, and
time-defined creative enterprises such as performance art.
This chapter examines the effect of digital information on ‘traditional’
librarianship and recordkeeping paradigms, noting the need for a new preservation
paradigm in an environment that is dynamic and has many stakeholders,
often with competing interests. It considers the differences between born-digital
and digitized information, and defines key terms.
Changing paradigms
It is now commonplace to hear or read that we live in an information society,
of which one main characteristic is the widespread and increasing use of networked
computing, which relies on data. This is revolutionizing the way in
which large parts of the world’s population live, work and play, and how
libraries, archives, museums and other institutions concerned with preserving
documentary heritage function and are managed. New expectations of these
institutions are evolving.
The significance of these changes is readily illustrated by just one example.
The internet is rapidly becoming the first choice for people who are searching
for information on a subject, and a new verb, to google (derived directly from
Google, the name of a widely used internet search engine) has entered our
vocabulary. The sheer size and rapid rate of the internet’s growth mean that
no systems have been developed to provide comprehensive access to it. The
systems that do exist are embryonic and experimental, and the quality of the
information available on the web is variable. Attempts to estimate the rate of
the internet’s growth have included counting the number of domain names
over several years. There has been a dramatic increase in the number of domain
names since 1994, when only a small number were registered, rising to around
100 million at the start of 2001 and, ten years later, to almost 800 million in
2010 (Internet Systems Consortium, 2011).
These major changes – it is not too extreme to call them a revolution – raise
the question of how to keep the digital materials we decide are worth keeping.
The ever-increasing quantities being produced do not assist us in finding an
answer. ‘According to a recent study by market-research company IDC … the
size of the information universe is currently 800,000 petabytes. … but it’s just
a down payment on next year’s total, which will reach 1.2 million petabytes, or
1.2 zettabytes. If these growth rates continue, by 2020 the digital universe will
total 35 zettabytes, or 44 times more than in 2009’ (Tweney, 2010). Nor does
the rapidity with which changes in computer and information technology occur.
The challenges are new and complex for nearly all aspects of librarianship and
recordkeeping, including preservation.
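The growth figures quoted above can be checked with some simple arithmetic. The following is a minimal sketch in Python, not part of the IDC study or of Tweney's article. It assumes that the 800,000 petabyte figure refers to 2009 and the 1.2 zettabyte figure to 2010, and that one zettabyte equals one million petabytes; the function and variable names are purely illustrative.

```python
# Illustrative arithmetic only: the sizes are the figures quoted in the text.
# Assumptions: 800,000 PB is the 2009 size, 1.2 ZB is the 2010 size,
# and 1 zettabyte = 1,000,000 petabytes.


def growth_factor(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate implied by start -> end."""
    return (end / start) ** (1 / years) - 1


SIZE_2009_ZB = 0.8   # 800,000 petabytes
SIZE_2010_ZB = 1.2   # 1.2 million petabytes
SIZE_2020_ZB = 35.0  # projected size quoted for 2020

factor = growth_factor(SIZE_2009_ZB, SIZE_2020_ZB)
one_year = annual_growth_rate(SIZE_2009_ZB, SIZE_2010_ZB, 1)
eleven_year = annual_growth_rate(SIZE_2009_ZB, SIZE_2020_ZB, 11)

print(f"2020 projection is {factor:.2f} times the 2009 size")
print(f"Growth from 2009 to 2010: {one_year:.0%}")
print(f"Implied average annual growth, 2009 to 2020: {eleven_year:.0%}")
```

Under these assumptions the ratio of the 2020 projection to the 2009 figure is consistent with the quoted ‘44 times’, and the implied average annual growth over the eleven years is somewhat lower than the single-year jump from 2009 to 2010.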
There have also been changes in the ways in which information is produced
and becomes available to communities of users. The internet is only one
of these ways. In the pre-digital (print) environment the processes of creation,
reproduction and distribution were separate and different; now, ‘technology
tends to erase distinctions between the separate processes of creation, reproduction
and distribution that characterize the classic industrial model of print
commodities’ (Nurnberg, 1995, p.21). This has significant implications for
preservation, especially in terms of who takes responsibility for it and at what
stage preservation actions are first applied. For instance, in the industrial-mode
print world, acquiring the artifact – the book – so that it could be preserved
occurred by means such as legal deposit legislation, requiring publishers to
provide copies to libraries for preservation and other purposes. If the creator is
now also the publisher and distributor, as is often the case in the digital world,
who has the responsibility of acquiring the information? These points are noted
in more detail later in this book.
New ways of working and new structures are developing. Cyberscholarship
(known also as e-science or e-research) is based on ready access to digital materials
and applies computing techniques to analyze, visualize and present results.
This research is typically highly collaborative, being based on the use of large
data sets produced and shared by international communities of scholars. The
practices developed in this cyberscholarship environment are significantly different
from traditional practices. Other characteristics of cyberscholarship also
illustrate different practices. The enhanced ability to compute large quantities of
data, for example through visualizations and simulations, provides new possibilities,
some of which can be seen in the Electronic Cultural Atlas Initiative (ecai.org).
The generation of large quantities of data places heavy demands on how data are
stored and managed. Heavy emphasis is placed on sharing and re-using digital
information. All of these factors place different demands on how digital information
is managed, including on its preservation over time to ensure it remains
available and usable in the future. A 2008 study carried out for the Association
of Research Libraries provides examples of cyberscholarship in humanities, social
sciences and scientific/technical/medical subject areas in the US (Maron and
Kirby Smith, 2008). Changing information practices in the humanities, in the
UK specifically, are described by Bulger and her colleagues (Bulger et al., 2011).
Cyberinfrastructure refers to the computer networks, libraries and archives,
online repositories and other resources needed to support cyberscholarship.
– When materials are treated, the treatments should, when possible, be
reversible
– Whenever possible or appropriate, the originals should be preserved; only
materials that are untreatable should be reformatted
– Library materials should be preserved for as long as possible
– Efforts should be put into preventive conservation, and aimed at providing
appropriate storage and handling of artifacts
– Benign neglect may be the best treatment (derived from Cloonan (1993,
p.596), Harvey (1993, pp.14,140), and Bastian, Cloonan and Harvey (2011,
pp.612-613)).
The definitions associated with the old preservation paradigm are firmly rooted
in the conservation of artifacts – the physical objects that carry the information
content. In fact, the term ‘materials conservation’ is sometimes used, especially
by museums. The definitions provided in the IFLA Principles for the Care and
Handling of Library Materials (Adcock, 1998), widely adopted in the library
and recordkeeping contexts, articulate principles firmly based on maintenance
of the physical artifact. The definition of Conservation notes that its aims are
to ‘slow deterioration and prolong the life of an object’, and that of Archival
The need for a new preservation paradigm 11
the information landscape has changed, thanks to the digital revolution. Libraries are
working to integrate access to print materials with access to digital materials. There is
likewise a challenge to integrate the preservation of analog and digital materials. Preser-
vation specialists have been trained to work with print-based materials, and they are
justifiably concerned about the increased complexity of the new preservation agenda
(Kenney and Stam, 2002, p.v).
Marcum’s description of the situation is still accurate ten years later. Additionally,
research libraries are seeking new roles as experts in the curation of digital
materials (Walters and Skinner, 2011).
What is the new preservation agenda? How has the preservation paradigm
changed to accommodate it? Pre-digital preservation paradigm thinking does
include some useful understanding of digital preservation. For example, it
recognizes that copying (as in refreshing from tape to tape) is the basis of digital
Further key elements are the scale and nature of the digital information we
wish to maintain into the future and the preservation challenges these pose.
The complexities of the variety of digital materials are described in this way:
Digital objects worthy of preservation include databases, documents, sound and video
recordings, images, and dynamic multi-media productions. These entities are created
on many different types of media and stored in a wide variety of formats. Despite a
steady drop in storage costs, the recent influx of digital information and its growing
complexity exceeds the archiving capacity of most organizations (Workshop on Research
Challenges in Digital Archiving and Long-term Preservation, 2003, p.7).
Arising from these key factors is the need for new kinds of skills. Current
preservation skills and techniques are labour-intensive and, even where ap-
propriate, do not scale up to the massive quantities of digital materials we are
already encountering. The problem cannot simply be addressed by technologi-
cal means. What kind of person will implement the new policies and develop
the new procedures required to maintain digital materials effectively into the
future? New kinds of positions, requiring new skill sets, are already being
established in libraries and archives. Among key selection criteria for a Digital
Archivist at the MIT Libraries in Cambridge, Massachusetts, advertised in
May 2011, were:
People with these skill sets are still in short supply. But it is more than new
skills that is required. We need to redefine the field of preservation and the
terms we use to describe preservation activities.
Changing definitions
Pre-digital preservation paradigm definitions do not convey useful meanings
when they are applied to digital preservation. What are they?
Currently ‘conservation’ is the more specific term and is particularly used in relation to
specific objects, whereas ‘preservation’ is a broader concept covering conservation as
well as actions relating to protection, maintenance, and restoration of library collections.
The eminent British conservator, Christopher Clarkson, emphasizes this broader aspect
when he states that preservation ‘encompasses every facet of library life’: it is, he says,
‘preventive medicine ... the concern of everyone who walks into, or works in, a library.’
For Clarkson conservation is ‘the specialized process of making safe, or to a certain
degree usable, fragile period objects’ and restoration ‘expresses rather extensive
rebuilding and replacement by modern materials within a period object, catering for a future
of more robust use.’ He neatly distinguishes the three terms by relating them to the extent
of operations applied to an item: ‘restoration implies major alterations, conservation
minimal and preservation none’ (Harvey, 1993, pp.6-7).
It is clear that professionals are revising their definitions of preservation from a once-
and-forever approach for paper-based materials to an all-the-time approach for digital
materials. Preservation must now accommodate both media and access systems …
while we once tended to think about preserving materials for a particular period of time
– for example, permanent/durable paper was expected to last for five hundred years –
we now think about retaining digital media for a period of continuing value (Cloonan
and Sanett, 2002, p.93).
the paradigms of any of the information professions come up short when compared with
the scope of the issues continuously emerging in the digital environment. An overarching
dynamic paradigm – that adopts, adapts, develops, and sheds principles and practices
of the constituent information communities as necessary – needs to be created.
Preserving the original bit-stream is only one part of the problem; equally im-
portant is the requirement to preserve ‘the means of interpreting, reading and
utilizing the bit stream’ (Deegan and Tanner, 2002).
The difficulties of definition are not helped by disciplinary differences.
There are, for instance, differences in the way archivists and librarians use terms.
Some terms, such as integrity and authenticity, arise from the world of archives
and were not, until recently, usually associated with the work of librarians.
These differences, however, pale in comparison with the significantly different
definitions used in the IT industry. How IT professionals think about the long-
term storage of data is a question that assumes importance for digital preserva-
tion because of the heavy reliance that information professionals place on their
skills and services. There is abundant evidence that they think very differently
about preservation. Definitions of archive, archiving and archival storage give
us some indication of the mindset of IT professionals. A selection of online
dictionaries of information technology indicates that the terms are used in two
ways:
1. The process of moving data to a different kind of storage medium: for ex-
ample, ‘archive … 2 verb to put data in storage … on backing storage (such
as magnetic tape rather than a hard disk)’ (Collins, 2002)
2. The process of backing up data for long-term storage: for example, ‘ar-
chive (v.) To copy files to a long-term storage medium for backup … On
smaller systems archiving is synonymous with backing up’ (Webopedia,
2011).
Few of the definitions located display any interest or concern with the reasons
why long-term storage might be required, although one earlier definition is a
notable exception: ‘archiving Long term storage of information on electronic
media. Information is archived for legal, security or historical reasons, rather
than for regular processing or retrieval’ (Gunton, 1993, p.11). Perhaps the mind-
set of IT professionals is better indicated by this excerpt: ‘You detect data that’s
not needed online and move it [to] an off-shore store. When someone wants to use
it, go find the off-line media and restore the data’ (Faulds and Challinor, 1998,
p.280).
Preservation definitions in the digital world 19
A period of time long enough for there to be concern about the impacts of changing
technologies, including support for new media and data formats, and of a changing user
community, on the information being held in a repository (International Organization
for Standardization, 2003, p.1-11).
items, even if that is done for preservation purposes’. Such statements are
worth making firmly because of the misconception still too commonly encoun-
tered in the information professions that digitizing of analogue materials, usually
photographs or paper-based material, is sufficient for preservation purposes.
This is not the case, as a 2010 report of LIBER members (Ligue des Biblio-
thèques Européennes de Recherche, representing European research libraries)
indicates:
Making the digitised material available and visible online is only one of the challenges
faced ... Another lies in assuring long-term access to them. Digitised materials – like
other digital data – are also fragile items and need special measures and arrangements
in order to be accessible despite technological change. While the preservation of paper
documents is well understood and is supported by a well-established infrastructure and a
profession of librarians and other experts, the preservation of digital objects in general
and digitised material in particular is a relatively new task for libraries and poses great
challenges in terms of the expertise and resources required (Bergau, 2010, p.6).
In terms of how they are preserved, though, the definitions in these two sources
make no distinction between born-digital materials and digital materials created
by digitizing analogue materials. This is acknowledged in the Digital Preserva-
tion Coalition’s definition of digital materials, which covers both ‘digital sur-
rogates created as a result of converting analogue materials to digital form
(digitisation), and “born digital” for which there has never been and is never
intended to be an analogue equivalent, and digital records’ (Digital Preservation
Coalition, 2008, p.24). These definitions also make clear that it is not only the
bit-stream that we seek to preserve. In order to ensure access in the future to
digital materials, we also need to take account of other attributes of digital
materials. The UNESCO Guidelines indicate this in the definition of information
packages, which comes from the OAIS Reference Model (described in Chapter 5). In
addition to the bit-stream, which is typically ‘not understandable or re-presentable’
by itself, ‘any information and tools that would be needed in order to access
and understand’ the digital materials must also be preserved (UNESCO, 2003,
p.39).
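The idea that a bit-stream must travel with the means of understanding it can be sketched in code. The following is a minimal illustration only: the class name, field names and sample values are hypothetical and are not taken from the OAIS Reference Model or the UNESCO Guidelines. It simply shows that two packages can hold identical bits while only one of them records what is needed to interpret those bits.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class InformationPackage:
    """Hypothetical, simplified package: a bit-stream plus what is needed to use it."""
    bit_stream: bytes
    # Information needed to interpret the bits (format name, encoding, and so on).
    representation_info: dict = field(default_factory=dict)
    # Tools needed to render the content (for example a viewer or an emulator).
    required_tools: list = field(default_factory=list)

    def fixity(self) -> str:
        """SHA-256 digest of the bit-stream, useful for detecting later changes."""
        return hashlib.sha256(self.bit_stream).hexdigest()

    def is_interpretable(self) -> bool:
        """True only if something beyond the raw bits has been recorded."""
        return bool(self.representation_info) and bool(self.required_tools)


# The same two bytes, stored with and without accompanying information.
bare = InformationPackage(bit_stream=b"\x48\x69")
described = InformationPackage(
    bit_stream=b"\x48\x69",
    representation_info={"format": "plain text", "encoding": "ASCII"},
    required_tools=["any text viewer"],
)

print(bare.is_interpretable())       # bits alone
print(described.is_interpretable())  # bits plus interpretation information
```

Both packages yield the same checksum, so a fixity check alone cannot distinguish them; the difference lies entirely in the accompanying information, which is the point the definitions make.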
The definitions are also very clear about the need to maintain other attributes
of digital materials. To ensure that digital materials remain usable in the future,
access to them is required – and not simply access, but access to ‘all qualities
of authenticity, accuracy and functionality’ (Digital Preservation Coalition,
2008, p.24). This, in turn, requires definitions of authenticity, expressed by the
UNESCO Guidelines as the ‘quality of genuineness and trustworthiness of
some digital materials, as being what they purport to be, either as an original
object or as a reliable copy derived by fully documented processes from an
original’ (UNESCO, 2003, p.157). (Note the emphasis on the significance of
full documentation to ensure authenticity; this has important implications for
digital preservation, noted in Chapter 5.) Four further definitions in the
How long are we preserving them for? 23
Conclusion
This chapter has introduced some of the key concepts that are reshaping preser-
vation practice in the digital environment. It notes the need for new ways of
thinking about preservation and poses three key questions that need to be con-
sidered when we think about the preservation of digital materials:
– What exactly are we trying to preserve?
– How long are we preserving them for?
– What strategies and actions do we apply?
These questions and other themes introduced in Chapter 1 are explored in the
rest of this book.
Chapter 2
Why do we Preserve? Who Should do it?
Introduction
Society, of course, has a vital interest in preserving
materials that document issues, concerns, ideas, dis-
course and events ... The ability of a culture to survive
into the future depends on the richness and acuity of
its members’ sense of history (Task Force on Archiv-
ing of Digital Information, 1996, p.1)
preservation worth doing, it is also, some suggest, a duty. Agresto, former head
of the US National Endowment for the Humanities, suggested that ‘we have a
human obligation not to forget’ (cited in Harvey, 1993, p.7) and that preserva-
tion is essential for the well-being of democracies that depend ‘on knowledge
and the diffusion of knowledge’ and on ‘knowledge shared’ (Harvey, 1993,
p.7). Even greater claims are made: ‘the ability of a culture to survive into the
future’ depends on the preservation of knowledge (Task Force on Archiving of
Digital Information, 1996, p.1).
The cultural and political imperatives that have led to preservation being
considered as fundamental have been explored in books, such as Lowenthal’s
The Past is a Foreign Country (Lowenthal, 1985) and Taylor’s Cultural Selec-
tion (Taylor, 1996), which persuade us that preservation is not simply the con-
cern of a limited number of cultural heritage institutions and professions, but
has dimensions that have significant impact, both limiting and sustaining, on
most aspects of society. There is, in fact, no single reason why we preserve
knowledge. Preservation, suggests Cloonan (2001, p.231), ‘has a life force
fueled by many (often disparate) sources’.
None of these reasons change when we consider the preservation of
knowledge encoded in digital materials, but the rhetoric alters to emphasize
economic rationales. Preserving digital materials is essential. If we do not attend
to it ‘what is at stake is the loss of data representing billions of dollars of
investment in new information technology, new scientific discoveries, and new
information on which our economic prosperity and national security depend’
(NDIIPP, 2011, p.1). Evidential and accountability reasons are also commonly
given: ‘we expect that this [digital] content will remain accessible to allow us to
validate claims, trace what we have done, or pass a record to future generations’,
states the NSF-DELOS Working Group on Digital Archiving and Preservation
(2003, p.[i]), who also specify five conditions for preservation, any one of
which is sufficient to provide a benefit to society:
– If unique information objects that are vulnerable and sensitive and therefore subject
to risks can be preserved and protected;
– If preservation ensures long-term accessibility for researchers and the public;
– If preservation fosters the accountability of governments and organisations;
– If there is an economic or societal advantage in re-using information, or
– If there is a legal requirement to keep it (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.3).
What are, and will be, the social contexts and institutions for preserving digital docu-
ments? Indeed, what new kinds of institutions are possible in cyberspace, and what
technologies will support them? What kind of new social contexts and institutions
should be invented for cyberspace? (Lyman and Kahle, 1998).
Such new social contexts are emerging and their digital content is deemed
worth preserving. The Library of Congress’s work in preserving Twitter content
(Watters, 2011) and the Schlesinger Library’s in preserving blogs (Dunn, 2009)
are two examples.
The very aims of preservation are also being questioned – ‘What are we
preserving? For whom? And why?’ (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.2) – and the expanded number of stake-
holders in the digital age means that a range of different interests must be con-
sidered.
Professional imperatives
What do these changes mean for libraries and archives? Have there been signifi-
cant changes in their practices?
At one level, there has been little change. Libraries are still, to use Deanna
Marcum’s words, ‘society’s stewards of cultural and intellectual resources’
(Kenney and Stam, 2002, p.v). Preservation is nothing less than core business
for libraries that maintain collections for use in the future. One typical view of
the preservation role of libraries is Gorman’s statement:
Libraries have a duty to preserve and make available all the records of humankind.
That is a unique burden. No other group of people has ever been as successful in pre-
serving the records of the past and no other group of people has that mission today ... Let
there be no mistake: if we librarians do not rise to the occasion, successive generations
will know less and have access to less for the first time in human history. This is not a
challenge from which we can shrink or a mission in which we can fail (Gorman, 1997).
items to be used longer before they wear out, and by the ‘just in case’ argument:
‘It cannot easily be predicted what will be of interest to researchers in the
future. Preserving current collections is the best way to serve future users’
(Adcock, 1998, p.8).
Similarly, there has been no fundamental change in archivists’ secure under-
standing of their preservation responsibilities. They have typically placed the
physical care of their collections at least on a par with, if not at a higher level
of importance than, the provision of access to those collections. This ‘physical
defence of archives’ was indeed considered paramount by the British archivist
Sir Hilary Jenkinson, who formulated this influential statement in 1922:
The duties of the Archivist ... are primary and secondary. In the first place he has to
take all possible precautions for the safeguarding of his Archives and for their custody
... Subject to the discharge of these duties he has in the second place to provide to the
best of his ability for the needs of historians and other research workers. But the position
of primary and secondary must not be reversed (Jenkinson, 1965, p.15).
– Working with information creators to identify requirements for the long-term man-
agement of information;
– Identifying the roles and responsibilities of those who create, manage, provide ac-
cess to, and preserve information
– Ensuring the creation and preservation of reliable and authentic materials;
– Understanding that information can be dynamic in terms of form, accumulation,
value attribution, and primary and secondary use; …
– Identifying evidence in materials and addressing the evidential needs of materials
and their users through archival appraisal, description, and preservation activities
(Gilliland-Swetland, 2000, p.21).
Nor should we forget the specific legal reasons for preservation. In the case of
archives these reasons are often connected to administrative and political ac-
countability. For some types of libraries, statutory responsibilities require that
preservation is their core business. National libraries, for example, have a stat-
utory responsibility for collecting and safeguarding access to information pub-
lished in their countries.
While the traditional preservation responsibilities of libraries and archives
may remain the same, there has been significant change in the ways in which
they are interpreted and operationalized as digital materials have become prev-
alent. In their report about repositioning research libraries Walters and Skinner
(2011, p.57) state firmly that
very few research libraries should have more than half of their infrastructure devoted to
physical collections at this point in time. The library needs to think of digital curation as a
core function of the library and to invest financial and other resources into it accordingly.
New expertise and new perspectives are required, without discarding the prin-
ciples of preservation developed for non-digital materials. Of greatest signifi-
cance is the need to engage with other stakeholders and ‘form new alliances and
partnerships’ (Webb, 2000). These new stakeholders may not always embrace
engagement willingly and will often need to be convinced of their roles, as
Hilton, Thompson and Walters (2010) point out when writing of donations of
digital material to the Wellcome Library in the UK.
Smith cogently summarizes the concepts in this section:
Society has always created objects and records describing its activities, and it has con-
sciously preserved them in a permanent way … Cultural institutions are recognised
custodians of this collective memory: archives, librar[ies] and museums play a vital
role in organizing, preserving and providing access to the cultural, intellectual and his-
torical resources of society. They have established formal preservation programs for
traditional materials and they understand how to safeguard both the contextual circum-
stances and the authenticity and integrity of the objects and information placed in their
care … It is now evident that the computer has changed forever the way information is
created, managed, archived and accessed, and that digital information is now an integral
part of our cultural and intellectual heritage. However the institutions that have tradi-
tionally been responsible for preserving information now face major technical, organiza-
tional, resource, and legal challenges in taking on the preservation of digital holdings
(B. Smith, 2002, pp.133-134).
New stakeholders
The new challenges of digital preservation call for the involvement of new par-
ticipants. No longer are librarians and archivists the main groups concerned
with preserving digital materials: it is increasingly evident that the cultural heri-
tage institutions traditionally charged with responsibility for preserving materials
cannot continue to carry this responsibility in the digital age without widening
the range of partners in their endeavours. Scholars and scientists who, increas-
ingly, base their research on large data sets, drug companies who need to prove
ownership of intellectual property, lawyers who must keep secure evidence in
digital form, are but a few of myriad potential stakeholders.
Not only are new kinds of stakeholders claiming an interest or claiming
control, but higher levels of collaboration among stakeholders are also com-
monly understood to be necessary for digital preservation to be effective.
Narrowly focused localized solutions are not considered likely to be the most
effective. Cooperation ‘can enhance the productive capacity of a limited sup-
ply of digital preservation funds, by building shared resources, eliminating
redundancies, and exploiting economies of scale’ (Lavoie and Dempsey, 2004).
The preservation of digital materials has become ‘essentially a distributed proc-
ess’ where ‘traditional demarcations do not apply’ and one for which ‘an inter-
disciplinary approach is necessary’ (Shenton, 2000, p.164).
Collaboration is considered more and more as the only way in which viable
and sustainable solutions can be developed, as the problems are well beyond
the scope of even the largest and most well-resourced single institution.
(Chapter 11 of the UNESCO Guidelines (UNESCO, 2003) explores collaboration in more
detail, and Chapter 9 of this book provides examples of collaborative activities.)
Who, more specifically, are these new stakeholders? What are their preser-
vation roles in an increasingly digital environment? An early indication was
provided by the Task Force on Archiving of Digital Information, whose in-
fluential 1996 report set much of the digital preservation agenda for the fol-
lowing decade. This report suggested that ‘intense interactions among the
parties with stakes in digital information are providing the opportunity and
stimulus for new stakeholders to emerge and add value, and for the relation-
ships and division of labor among existing stakeholders to assume new forms’.
It proposed two principles, the first that information creators, providers and
owners ‘have initial responsibility for archiving their digital information
objects and thereby assuring the long-term preservation of these objects’, and
the second that, where this mechanism fails or becomes unworkable, ‘certified
digital archives have the right and duty’ to preserve digital materials (Task
Force on Archiving of Digital Information, 1996, pp.19-20). Since 1996 the
landscape of digital preservation has become clearer and we now see significant
levels of collaboration, strong emphasis on and involvement of data creators as
first-line preservers, and the development of certified digital archives (trusted
digital repositories).
In addition to data creators and certified digital archives, there are other
new stakeholders. They include commercial services, government agencies,
individuals, rights holders, beneficiaries, funding agencies, and users (Hodge and
Frangakis, 2004, p.15). ‘Hardware and software developers, publishers, produc-
ers, and distributors of digital materials as well as other private sector partners’
(UNESCO, 2004, Article 10) can also be added to the list. All stakeholders
are learning how to work together, learning first to understand the languages
of other disciplines and then working out how complementary skills can fit
together. Collaboration was of course not unknown in the old preservation
paradigm, one example being the collaboration of scholars and librarians to
identify the core literature in specific discipline areas for microfilming and
scanning projects (see Gwinn (1993) for an example in agriculture). The extent
and nature of collaborative activity has, however, intensified. It is encapsulated
The same examples are presented, even when they are no longer in the ‘lost or
compromised’ category: the BBC’s Domesday Project, NASA data, the Viking
Mars mission, the Combat Area Casualty file containing prisoner of war and
missing in action information for the Vietnam war, the first email, the first web
site, as described in more detail below. Attempting to answer the question
requires, first, a consideration of the issue of selection for preservation (noted
in more detail in Chapter 4). A common argument is that anything significant
is likely to be maintained anyway; so should we be concerned about the rest?
Some of the examples that follow assume that the first (email, web site, and so
on) is worth preserving; but is this necessarily the case? It is often a view
developed in hindsight. Betts tells us that Ray Tomlinson, principal engineer at
BBN Technologies in Cambridge, Massachusetts, did not save the first network
email ever sent in 1972 because ‘it just didn’t seem worth saving … Even if
backup tapes did exist, they might not be readable. They were just mag tapes,
and after seven or eight years, the oxide starts falling off, especially from tapes
of that era’ (Betts, 1999).
The small number of specific examples located indicates how great the
problem of loss or compromise of digital materials could be. The most often
quoted, indeed overused, examples are those cited in the 1996 report of the Task
Force on Archiving of Digital Information. Because they have been reported
very widely since, they warrant quoting at some length. The report notes the
case of the US Census of 1960.
In 1976, the National Archives identified seven series of aggregated data from the 1960
Census as having long-term historical value. A large portion of the selected records,
however, resided on tapes that the Bureau could read only with a UNIVAC type II-A
tape drive. By the mid-seventies, that particular tape drive was long obsolete, and the
Census Bureau faced a significant engineering challenge in preserving the data from
the UNIVAC type II-A tapes. By 1979, the Bureau had successfully copied onto industry-
standard tapes nearly all the data judged then to have long-term value (Task Force on
Archiving of Digital Information, 1996, p.2).
The report notes other lost examples, one of them the first email message ‘sent
either from the Massachusetts Institute of Technology, the Carnegie Institute
of Technology or Cambridge University’ in 1964 (Task Force on Archiving of
Digital Information, 1996, p.3).
Rothenberg reminds us of some examples noted in a 1990 US House of
Representatives report:
hundreds of reels of tape from the Department of Health and Human Services; files
from the National Commission on Marijuana and Drug Abuse, the Public Land Law
Review Commission, the President’s Commission on School Finance, and the National
Commission on Consumer Finance; the Combat Area Casualty file containing POW
and MIA information for the Vietnam war; herbicide information needed to analyze the
impact of Agent Orange; and many others (Rothenberg, 1999b, pp.1-2).
He reiterates the paucity of specific examples and offers a reason: ‘this may
simply reflect the fact that documents or data that are recognized as important
while they are still retrievable are the ones most likely to be preserved’ (Rothen-
berg, 1999b, p.2).
Another frequently cited example is the BBC Domesday Project. This pro-
ject captured the national imagination in the UK and resulted in a multi-media
version of the Domesday Book on videodisc, produced to mark the 900th anni-
versary of the original. It became inaccessible in the late 1980s as the hardware
platform for which it was developed, the BBC microcomputer and interactive
videodisc player, became obsolete. Attempts to restore the data include applying
emulation techniques and reverse engineering so it could be used on a Windows
PC and made available on the web. In 2011 the BBC made available a full ex-
traction of the community disc on the Domesday Reloaded web site (www.
bbc.co.uk/history/domesday). This project is also noted in Chapter 7.
Cook’s 1995 call to action provides Canadian examples of data loss where
recordkeeping practices were ignored in the move to online recordkeeping.
Cook noted that ‘the National Archives of Canada … found not only that 30 out
of 100 randomly chosen policy documents could not be found in the govern-
ment’s paper records, but also that no system was in place to safeguard the
contents of the electronic system’. Ontario Hydro’s nuclear power plant failed
to keep adequate electronic or paper records of its construction and operation
(Cook, 1995).
Scientific data have been the focus of many studies. One is Preserving
Scientific Data on Our Physical Universe (National Research Council, 1995),
which indicates what scientific data were then available from United States
scientific observation and what they have been and might be used for. It includes
some comments about what has survived. Space physics research has gener-
ated significant quantities of data over the last 30 years and much of this was
‘archived’ by sending the tapes, and sometimes relevant documentation, to the
NSSDC (National Space Science Data Center). However, ‘there are many data
at the NSSDC that most scientists would find difficult to use with only the
information originally supplied’ (National Research Council, 1995, p.21). This
report also notes the Landsat data, a large part of which resided ‘on tapes that
cannot be read by any existing hardware. Recent data-rescue efforts have been
successful in getting older data into accessible form, but these efforts are time-
consuming and costly’ (National Research Council, 1995). Humphrey gives
examples of the research data generated by research funded by the Social
Sciences and Humanities Research Council of Canada. Of a set of 150 studies
from 1977 to 1980, the data sets from only three could be located in 1998
(Humphrey, 2003).
There is some cause for optimism. Our longer experience with digital
preservation, together with some research, indicates that data previously thought
to be unrecoverable can be made usable if sufficient time, expertise and money
are available. The Legacy Media Project at the National Archives of Australia,
for example, recovered data fully from 82 per cent of the carriers it investi-
gated, with only 2 per cent unrecoverable; this was, however, at a cost of about
$35 per megabyte (Pearson, 2009). Storage media failure is, of course, only
one of the issues.
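To give a sense of scale for the Legacy Media Project figures, the short calculation below applies the reported rate of about $35 per megabyte to a hypothetical batch of carriers. The batch (1,000 floppy disks of 1.44 MB each) is an assumption made purely for illustration and does not come from Pearson (2009). So is the reading of the remaining carriers, those neither fully recovered nor unrecoverable, as partially recovered.

```python
COST_PER_MB = 35.0      # dollars per megabyte, as reported (Pearson, 2009)
FULLY_RECOVERED = 0.82  # share of carriers recovered fully
UNRECOVERABLE = 0.02    # share of carriers that could not be recovered

# Assumption: the remaining carriers were recovered only in part.
partially_recovered = 1.0 - FULLY_RECOVERED - UNRECOVERABLE


def recovery_cost(megabytes: float) -> float:
    """Cost of recovering the given volume at the reported per-megabyte rate."""
    return megabytes * COST_PER_MB


# Hypothetical batch: 1,000 floppy disks of 1.44 MB each.
batch_mb = 1000 * 1.44

print(f"Partially recovered share: {partially_recovered:.0%}")
print(f"Cost for {batch_mb:.0f} MB: ${recovery_cost(batch_mb):,.0f}")
```

Under these assumptions, a volume of data that would fit many times over on a single modern storage device costs tens of thousands of dollars to recover, which illustrates why recovery after the fact is described as possible but expensive.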
Conclusion
This chapter dwells on the question of who should take responsibility for digital
preservation. As well as the organizations that have traditionally been concerned
with preservation (libraries, archives, museums), the preservation of digital
materials must involve many other stakeholders. Their input is needed in deter-
mining what is kept, in negotiating rights, and in developing policies and proce-
dures for managing digital materials. This chapter also considers the question
of how much digital material we have lost. Although the parameters of loss are
very unclear, there is little doubt that the amount of digital materials that we
are unable to access, or able to access only after considerable effort and expense,
is significant and will continue to increase unless action is taken. Our ability to
take action now is hampered by continuing widespread lack of awareness of
the problem. The next chapter looks more closely at why there is a problem by
considering the nature of digital materials.
Chapter 3
Why There’s a Problem:
Digital Artifacts and Digital Objects
Introduction
On the surface, digital technology appears to offer
few preservation problems. Bits and bytes are easy to
copy, so there should be no problems in developing
an unending chain of copies into the future, and hav-
ing copies all over the world in case of disaster. How-
ever, we already know that the reality is not so simple
and that there are very significant technical and man-
agement problems. The two main factors leading to
inaccessibility of digital information: changing tech-
nology platforms and media instability, are relentless,
with the potential to render digital information useless
(Webb, 2000)
Why are digital materials different? What are the modes of digital death?
Answering these questions provides a framework for understanding why digital
preservation poses major challenges. There are three sets of challenges:
– those relating to the nature of the media that are used to store digital mate-
rials;
– those resulting from the technologies required to create, store and access
digital materials;
– those characterized in this book as challenges to the integrity of digital
materials.
The issues are complex and are interrelated. Rothenberg’s perceptive statement
about them is worth quoting at length for its clear description of the range of
issues and their close relationship:
It is now generally recognized that the physical lifetimes of digital storage media are
often surprisingly short, requiring information to be ‘refreshed’ by copying it onto new
media with disturbing frequency. Moreover, most digital documents and artifacts exist
only in encoded form, requiring specific software to bring their bit streams to life and
make them truly usable; as these programs (or the hardware/software environments in
which they run) become obsolete, the digital documents that depend on them becomes
unreadable – held hostage to their own encoding. This problem is paradoxical, given
the fact that digital documents can be copied perfectly, which is often naively taken to
mean that they are eternal … In addition to the technical aspects of this problem, there
are administrative, procedural, organizational, and policy issues surrounding the man-
agement of digital material. Digital documents are significantly different from traditional
paper documents in ways that have significant implications for the means by which
they are generated, captured, transmitted, stored, maintained, access, and managed …
[mandating] new approaches to accessioning and saving digital documents to avoid
their loss. These approaches raise nontechnical issues concerning jurisdiction, funding,
responsibility for successive phases of the digital document life cycle, and the develop-
ment of policies requiring adherence to standard techniques and practices to prevent the
loss of digital information (Rothenberg, 1999a, p.2).
consider they have ‘complete control over the formats that they will accept and
enter into their archives’; and over 70 per cent held less than 100 terabytes
in 2009, but expect to store over 100 terabytes in ten years’ time (Planets,
2009, p.3). A 2010 survey of special collections in North American research
libraries described the ‘increasing availability of special collections materials
in digital form over the past decade … [as] nothing short of revolutionary for
both users of special collections and the professionals who manage them’ and
highlighted two challenges – ‘the need for complex technical skills and chal-
lenging new types of intra-institutional collaboration’ (Dooley and Luce, 2010,
p.53). Most often cited as impediments to effective digital preservation in
these collections were lack of funding (69 per cent), lack of time for planning
(54 per cent) and lack of expertise (52 per cent) (p.60).
Figure 3.1 lists the threats to digital continuity (‘continuity of production,
continuity of survival, continuity of access’) identified in the UNESCO Guide-
lines. Although not all of these threats are specific to digital materials, the list
serves as a useful reminder of the magnitude of the challenge that we face in
preserving digital materials.
– The carriers used to store these digital materials are usually unstable and deteriorate
within a few years or decades at most
– Use of digital materials depends on means of access that work in particular ways:
often complex combinations of tools including hardware and software, which typi-
cally become obsolete within a few years and are replaced with new tools that work
differently
– Materials may be lost in the event of disasters such as fire, flood, equipment failure,
or virus or direct attack that disables stored data and operating systems
– Access barriers such as password protection, encryption, security devices, or hard-
coded access paths may prevent ongoing access beyond the very limited circum-
stances for which they were designed
– The value of the material may not be recognised before it is lost or changed
– No one may take responsibility for the material even though its value is recognised
– Those taking responsibility may not have adequate knowledge or facilities
– There may be insufficient resources available to sustain preservation action over the
required period
– It may not be possible to negotiate legal permissions needed for preservation
– There may not be the time or skills available to respond quickly enough to a sudden
and large change in technology
– The digital materials may be well protected but so poorly identified and described
that potential users cannot find them
– So much contextual information may be lost that the materials themselves are un-
intelligible or not trusted even when they can be accessed
Digital storage media
and the quality of the equipment used to access the medium. These are worth
examining in more detail.
The manufacturing quality of the medium is a crucial factor simply because
all materials deteriorate. In the words of David Bearman (1998, p.24), it is ‘a
fact of physics’, a fact that we must accept as a major limitation on digital
preservation, and the ‘outside boundary beyond which we cannot rationally
plan to retain the information without transforming the medium’. We can, how-
ever, attempt to influence manufacturers so that they improve the quality of
their products to meet archival and preservation requirements. Knowledge of
the physical and chemical makeup of digital storage media and of their processes
of deterioration is helpful in making decisions about preservation actions. One of
the keys to prolonging the life of digital artifacts is providing storage conditions
that slow down the rate of deterioration.
Howell (2001, p.138) makes the point that many digital storage media are
‘rotating technologies’ with moving parts and are, therefore, subject to wear that
may damage the media. For example, even a small rearrangement of the magnet-
ized particles on a magnetic tape, perhaps caused by accidental physical contact
with a part of the playback equipment, can result in loss of data and may some-
times be sufficient to render a whole file unreadable. Maintaining recording and
playback equipment to a high standard minimizes the likelihood of this occur-
ring. Similarly, inappropriate handling of optical media can result in physical con-
tact with the part of the media that records the data. For example, touching the
surface of a CD-ROM and leaving an oily residue can corrupt the bits stored on it.
One noteworthy change from the pre-digital preservation paradigm is the
realization that digital storage media have little or no artifactual value. This
point is explored in the 2001 report of the Task Force on the Artifact in Library
Collections. Artifacts are valued in library and archives collections because
their physical form demonstrates ‘the originality, faithfulness (or authenticity),
fixity, and stability of the content’ (Task Force on the Artifact in Library Collec-
tions, 2001, p.vi); the artifact is significant for research purposes because
it provides this evidence. When the information stored on or in an artifact is
reformatted, as in microfilming a book printed on brittle paper or copying data
from an obsolete format to a current one, these evidentiary qualities are lost.
Because deterioration of digital materials is, as already noted, ‘a fact of physics’,
refreshing and migration are facts of digital preservation life. Because refreshing
and migration replace obsolete digital media with current media, the artifact
itself, because it is replaced, cannot demonstrate qualities such as originality,
authenticity or fixity; other mechanisms are used to demonstrate these eviden-
tiary qualities. This is a major change from pre-digital paradigm thinking, and
we are only slowly changing our professional mindsets to accommodate the
necessary changes in practice.
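One widely used mechanism of this kind is the checksum: a digest computed from the bit-stream before a refresh can be compared with one computed afterwards, so that fixity is demonstrated by the content rather than by the carrier. The following is a minimal sketch in Python, not a description of any particular repository's practice. The function names sha256_of and refresh_copy are hypothetical, and SHA-256 is used only as one example of a digest algorithm.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination):
    """Copy a file to a new location and confirm the bit-stream is unchanged.

    Hypothetical helper: it returns the digest so that the value can be
    recorded and checked again at the next refresh.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"fixity check failed for {destination}")
    return after
```

Recording the returned digest alongside the object allows a later refresh or migration to be checked against it, whatever medium the copy then resides on.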
The following examples of magnetic tape and optical disks illustrate many
of the factors that contribute to the deterioration of digital storage media. Other
storage media in current use, such as hard disk drives, could have been noted.
However, the principles that are identified for these two media types apply
more generally.
Magnetic media
This section is based on the writings of Van Bogart (1995), Ross and Gow
(1999), and the International Association of Sound and Audiovisual Archives
Technical Committee (2004 and 2009), which can be referred to for more de-
tailed information.
Magnetic tapes have been used for digital data storage since the 1960s.
Magnetic media (tape in reels and in housings such as cassettes and cartridges)
are still in common use today because they are versatile and cheap and can
provide higher data densities than other media. In 2011 the highest-capacity
magnetic tapes held five terabytes. They are available in a large
number of formats: the IASA guidelines for digital audio objects provide the
specifications of 23 common data tape formats (International Association of
Sound and Audiovisual Archives Technical Committee, 2004, p.58), but the
number is higher, especially if formats not in current use are included in the
count. The wide range of computer tape formats handled by one data conversion
company is given on its web site
(www.ndci.com/Home/NewsInfo/TapeFormats/tabid/92/Default.aspx).
Magnetic tapes store information in the alignment of magnetic particles sus-
pended within a polymer binder, which sticks the magnetic information-carrying
layer to a substrate and provides a smooth surface that helps the tape to run
through playback equipment smoothly. If humidity levels are too high, the
binder softens or becomes brittle through hydrolysis, resulting in the ‘sticky
tape’ phenomenon where the binder sticks to the equipment’s tape heads. Data
loss from dropout is one result of ‘sticky tape’. The magnetic particles, which
store data in the direction of the magnetism in them, vary in their magnetic
stability. The substrate is usually made of chemically stable polyester film
(Mylar or polyethylene terephthalate (PET)). It is affected by mechanical prob-
lems such as stresses on the tape caused by fluctuations in temperature and
humidity levels in storage areas, resulting in mistracking during playback. The
substrate can also be stretched if the tape is not appropriately stressed when it
is wound or rewound. Other factors that cause data loss include the quality and
maintenance of tape recording and playback devices.
The longevity of magnetic tape can be improved by attention to its care
and handling. Appropriate storage is essential for minimizing deterioration
through binder hydrolysis, which is a result of excessive moisture. The rate of
hydrolysis can be reduced by lowering humidity levels and temperatures in
tape storage areas. Magnetic pigments degrade more slowly at lower tempera-
tures. It is also important that temperature and humidity levels are kept constant
and stable. Storage at temperatures that are too high (above 23°C, suggests
Van Bogart) increases dropout because the tightness of the tape packing is
increased; this, in turn, increases tape distortion. Increased tape-pack stresses
also occur as the tape absorbs moisture and expands at relative humidity levels
greater than about 70 per cent. Temperature and relative humidity levels that
are too high also promote fungal growth. Attention should be paid to maintain-
ing good air quality and to reducing dust and debris. Conditioning (acclimati-
zation) is required if tape is stored in an environment that differs from the
environment in which it is used. (Further information about the care and han-
dling of magnetic media is available in Chapter 7 and in Van Bogart, 1995.)
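The storage guidance above lends itself to a simple automated check of environmental readings. The sketch below is illustrative only. The two ceilings (23°C and 70 per cent relative humidity) are taken from the figures quoted above; the tolerances used to test whether conditions are 'constant and stable' are assumptions made for the example, and the names Reading and storage_warnings are hypothetical.

```python
from dataclasses import dataclass

MAX_TEMP_C = 23.0        # upper limit the text attributes to Van Bogart
MAX_RH_PERCENT = 70.0    # text: tape-pack stresses rise above about 70 per cent RH
# Assumed tolerances for "constant and stable" conditions; not from the text.
MAX_TEMP_SWING_C = 2.0
MAX_RH_SWING = 5.0


@dataclass
class Reading:
    temp_c: float
    rh_percent: float


def storage_warnings(readings):
    """Return a list of warnings for a series of storage-area readings."""
    warnings = []
    if not readings:
        return warnings
    temps = [r.temp_c for r in readings]
    humidities = [r.rh_percent for r in readings]
    if max(temps) > MAX_TEMP_C:
        warnings.append("temperature too high")
    if max(humidities) > MAX_RH_PERCENT:
        warnings.append("relative humidity too high")
    if max(temps) - min(temps) > MAX_TEMP_SWING_C:
        warnings.append("temperature unstable")
    if max(humidities) - min(humidities) > MAX_RH_SWING:
        warnings.append("relative humidity unstable")
    return warnings
```

A store that logs readings regularly could run such a check over each day's values; the thresholds would need to be set from the guidance appropriate to the media actually held.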
An indication of the life expectancy of some common tapes is provided in
Figure 3.3. The important point to note here is the effect of different relative
humidity (RH) and temperature levels on the life expectancy of digital artifacts.
(Note that although D3 tape and DLT tape cartridges have been superseded,
the trends indicated in Figure 3.3 are still valid.)
Optical disks
This section is based on Byers (2003), Ross and Gow (1999), the International
Association of Sound and Audiovisual Archives Technical Committee (2004
and 2009) and Iraci (2010), which can be referred to for more detailed infor-
mation.
The term optical disk is applied to a large number of media which share
the characteristic of using laser light to record and retrieve bits from a data layer.
Optical disks became readily available at the end of the 1970s. The IASA
Guidelines on the Production and Preservation of Digital Audio Objects noted
12 commercially available CD and DVD disk types with storage capacities
media could reduce the frequency with which copying of data (refreshing or
migrating them) needs to be carried out. Further research into digital storage
media is needed and is, in fact, being conducted, as exemplified by General
Electric’s work on new holographic digital storage technology (Lohr, 2009)
and by experiments with archival-quality microfilm for long-term storage of
bit-streams (www.peviar.ch and www.bitsave.ch).
Until we succeed in improving digital storage media, we need interim re-
sponses to the challenges presented by deteriorating media. One such response
was proposed by Howell in 2001, who noted that ‘it is pragmatic to keep crucial
pieces of hardware and operating software tucked away from the IT depart-
ment’s upgrading programmes, for a few years at least’ and provided an example
from the State Library of New South Wales, where a 5¼-inch floppy disk
drive was maintained to provide access to legal deposit material in that format
(Howell, 2001, p.142). This is still sound advice today.
Harvey concluded in 1995 that ‘there are at present too many unknowns to
commit digital data to currently-available artefacts for anything other than
short-term storage’ (Harvey, 1995). This situation has not changed. If we
choose to preserve digital artifacts, then we do so in the knowledge that this is
a short-term expedient.
processable units … according to the logic of some application software. The rules that
govern the logical object are independent of how the data are written on a physical
medium … A logical unit is a unit recognized by some application software. This rec-
ognition is typically based on data type [for example, ASCII or other more complex
formats] … to preserve digital information as logical objects, we have to know the
requirements for correct processing of each object’s data type and what software can
perform correct processing (Thibodeau, 2002, p.7).
To preserve the digital object, then, we need to preserve not merely the bit-
stream, but also the means to process that bit-stream: the access devices that let
us read the bit-stream from the digital media on which it is stored; the software
that allows us to manipulate and present the information represented by the
data carried by the bit-stream; the documentation so that we can understand
the data formats used and the software; and the contextual information that is
essential to ensure the integrity and authenticity of the information. Ross lists
the main factors that ‘can render resources non-interpretable’ as degradation of
the media, loss of functionality of access devices, loss of manipulation capa-
bilities, loss of presentation capabilities, weak links in the documentation chain,
and loss of contextual information (Ross, 2000, p.12). His terms are used here-
after in this chapter. As the UNESCO Guidelines for the Preservation of Digital
Heritage bluntly put it, ‘Digital materials cannot be said to be preserved if
access is lost’ (UNESCO, 2003, p.21).
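The dependencies described above can be pictured as a checklist in which the loss of any one element is enough to make an object non-interpretable. The sketch below is illustrative only: the class PreservationRecord and its field names are hypothetical, and the risk labels are those of Ross's list as quoted above.

```python
from dataclasses import dataclass, fields


@dataclass
class PreservationRecord:
    """Hypothetical record of what survives alongside a bit-stream."""
    media_intact: bool
    access_device_available: bool
    manipulation_software_available: bool
    presentation_software_available: bool
    documentation_complete: bool
    context_recorded: bool


# Each field maps to the factor in Ross's list that applies when it is False.
RISK_LABELS = {
    "media_intact": "degradation of the media",
    "access_device_available": "loss of functionality of access devices",
    "manipulation_software_available": "loss of manipulation capabilities",
    "presentation_software_available": "loss of presentation capabilities",
    "documentation_complete": "weak links in the documentation chain",
    "context_recorded": "loss of contextual information",
}


def risks(record):
    """List the factors that currently threaten the object."""
    return [RISK_LABELS[f.name] for f in fields(record) if not getattr(record, f.name)]


def is_interpretable(record):
    """An object is treated as interpretable only if no factor applies."""
    return not risks(record)
```

The all-or-nothing test in is_interpretable is a deliberate simplification: in practice some losses can be repaired, but the sketch reflects the point that an intact bit-stream alone is not sufficient.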
The short lifespans of digital storage media are only one reason that digital
preservation remains a challenge. Another reason is what is commonly referred
to as technological obsolescence, where hardware and software are replaced by
newer devices or versions that supersede the old technology. The consequence
is that information stored on and accessed using obsolete technologies becomes
inaccessible. Even if the digital storage medium on which the bit-stream is stored
remains in usable condition and the bit-stream stored on it is intact, it is almost
inevitable that the drive, software driver or computer will no longer be available
to access it.
Australian academic Tara Brabazon described this situation in 2000:
I still own my first laptop computer, bought in 1991. It is an Olivetti M316. It functions,
although the battery no longer does. It has a 40 MB hard drive, which is not large enough
to install a current version of Windows 98, let alone the ability to use the Windows
environment to prepare documents. That is probably quite fortunate, as the ‘F’ key
does not work, and most of the letters on the keyboard have been scratched off through
excessive use. There is no possibility or space for a modem connection (Brabazon, 2000,
p.156).
The crux of the problem of preserving digital materials is that they are ‘inher-
ently software-dependent’ (Rothenberg, 1999a, p.8). The bit-stream can repre-
sent any of a very wide range of content and formats – often text or data, but also
images, audio and video, graphics, or combinations of these and other content.
These data require software to interpret them, to turn them into information:
This point cannot be overstated: in a very real sense, digital documents exist only by
virtue of software that understands how to access and display them; they come into
existence only by virtue of running this software (Rothenberg, 1999a, p.8).
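Software dependence can be illustrated with a few bytes. In the minimal sketch below, the same eight-byte bit-stream is read as text, as a series of numbers and as pixel intensities; nothing in the bytes themselves indicates which reading is intended. The function name interpretations, the sample bytes and the three chosen readings are assumptions made for the example.

```python
import struct


def interpretations(bit_stream):
    """Interpret the same eight bytes in three different ways."""
    if len(bit_stream) != 8:
        raise ValueError("this sketch expects exactly eight bytes")
    return {
        # Each byte read as one ASCII character.
        "ascii_text": bit_stream.decode("ascii"),
        # Pairs of bytes read as unsigned 16-bit little-endian integers.
        "unsigned_16_bit_little_endian": struct.unpack("<4H", bit_stream),
        # Each byte read as one greyscale pixel intensity (0-255).
        "greyscale_pixels": list(bit_stream),
    }


SAMPLE = bytes([72, 101, 108, 108, 111, 33, 33, 33])
```

Calling interpretations(SAMPLE) yields a short piece of text, four integers and eight pixel values from identical data; choosing the correct one requires knowledge of the format, which is held by the software and documentation rather than by the bit-stream.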
underpinning authenticity and integrity and their preservation over time are the con-
cepts of fixity, stabilisation, trust, and the requirements of custodians and users … an
authentic digital object is one whose genuineness can be assumed on the basis of one or
more of the following: mode, form, state of transmission, and manner of preservation
and custody (Ross, 2002, p.7).
Many of the concepts that have been developed to ensure authenticity and in-
tegrity of preserved digital objects come from research and thinking among the
recordkeeping community, such as the outcomes of the Functional Requirements
for Evidence in Recordkeeping Project at the University of Pittsburgh, and
InterPARES (see Chapter 5). Gilliland-Swetland (2000, p.16) reminds us that
‘the value of an individual record is derived in part from the sequence of records
within which it is located’ and that ‘it can be difficult to understand an individual
record without understanding its historical, legal, procedural, and documentary
context’ (p.18).
Conclusion
This chapter notes three modes of digital death: instability of storage media;
obsolescence of storage and access technologies; and challenges to the integ-
rity of digital materials. The rapid obsolescence of hardware and software is, to
a large extent, a result of today’s prevailing market-driven ethos, the highly
competitive nature of which means that product obsolescence is essential to
the survival of businesses. Digital preservation requires that means of addressing
rapid obsolescence be established. (Some of these means are noted in
Chapters 6, 7 and 8.) However, rapid obsolescence is not the only threat to the
preservation of digital materials posed by the prevailing competitive philosophy;
commercial imperatives seldom coincide with cultural heritage imperatives.
Creators of digital materials and other stakeholders may lose interest in their
digital output – a business might close down, or a web site might cease to be
maintained – which has consequences for the future of their materials. Other
threats include stakeholders’ lack of awareness about digital preservation issues,
a shortage of the skill sets needed to preserve digital materials, lack of inter-
nationally agreed approaches, a shortage of practical models on which to base
preservation practice, and a lack of ongoing funding to address digital preserva-
tion issues. These issues are noted further in the chapters following.
Chapter 4
Selection for Preservation – The Critical Decision
Introduction
Selection in the digital world is not a choice made
once and for all near the end of an item’s life cycle,
but rather is an ongoing process intimately connected
to the active use of the digital files (Conway, 2000)
Librarians and archivists have long acknowledged their responsibility for pre-
serving documents for future use, and have developed criteria and processes
for identifying the documents to which they will devote resources to ensure
their preservation. Archivists have developed a considerable body of theory
and practice about appraisal to support this responsibility. In the main, however,
these criteria, processes, theory and practice have been developed and applied
primarily to paper-based documents. They do not automatically translate to
digital materials and therefore need to be revisited and modified to ensure
that they can be applied effectively in the digital world.
Selection must be reconsidered as we move from old to new preservation
paradigms. For digital materials, selection decisions are ‘not a choice made
once and for all near the end of an item’s life cycle, but rather … an ongoing
process intimately connected to the active use of the digital files’ (Conway,
2000). The digital mortgage (that is, the ongoing costs) that results from
selection decisions also needs to be considered: ‘Program costs don’t cease when
the Web site disappears’ (Vogt-O’Connor, 2000).
This chapter considers the important role that selection plays in the respon-
sible professional practice of librarians and recordkeepers, when applied to
digital materials. It notes selection criteria traditionally applied in library practice
and appraisal criteria traditionally used by archives, and indicates why these
selection criteria, developed for physical artifacts, do not translate well when
applied to digital materials. It examines what additional factors need to be
considered or existing factors emphasized when developing effective selection
policies and practices for digital materials, such as the role of intellectual
property ownership, the importance of preserving context, and the nature of
stakeholder input. Emerging frameworks for selecting digital documents,
incorporating some new selection criteria and modified weightings for tradi-
tional selection criteria, are noted.
Selection for preservation, cultural heritage, and professional practice
Selection decisions for cultural heritage materials have been the focus of
considerable debate, and this increasingly includes digital materials. The issues
have been articulated most clearly by the archives profession. Cook summarizes
many of them. Archives, he suggests, are ‘a source of memory about the past,
about history, heritage, and culture, about personal roots and family connec-
tions, about who we are as human beings and about glimpses into our common
humanity through recorded information in all media, much more than they are
about narrow accountabilities or administrative continuity’ (Cook, 2000). But
there are many dangers: ‘Memory is notoriously selective – in individuals, in
societies, and, yes, in archives. With memory comes forgetting. With memory
comes the inevitable privileging of certain records and records creators, and
the marginalizing or silencing of others’ (Cook, 2000).
This is not the only way of thinking about selection. Other approaches to
selection are based on a clear distinction between the activities of records man-
agers and the work of archivists. Records managers select records for retention
based on ‘risk avoidance, market opportunities, or desires to avoid embarrass-
ment or accountability’ but this approach ‘inevitably will privilege the needs of
business or government in terms of the issues that get addressed, the allocation
of resources, and the long-term survival of records’ (Cook, 2000). The records
that survive into the future will reflect the concerns of administrators, rather
than the full range of human experience. Recordkeepers, suggests Cook, ‘need
as a profession to remind [themselves] continually of the fate of records left to
White House presidents and Soviet commissars, South African apartheid police
forces and Canadian peacekeepers in Somalia, rogue Queensland politicians
and the American Internal Revenue Service’ (Cook, 2000).
In any approach to selection the selectors must inevitably bring to their
decisions their personal beliefs and values. Responsible selection practice in
preservation must aim to minimize bias in the value judgments that are inevitably
reflected in the decisions about what to select for preservation. Selection
disenfranchises some groups, as our experiences with non-digital collections have
shown; an example is that history has moved from being that of significant
individuals, usually male, to a wider view based on the records of other groups
such as women and the poor, and of indigenous cultures.
Should we save everything? This is now technologically feasible for digital
materials with massive decreases in the cost of digital storage and significant
increases in processing power. Current thinking is that we should not, for
many reasons. We lack sufficient resources. Retrieval tools are not sufficiently
developed to provide adequate results. But there are other reasons why the
answer to this question is no and some are noted later in this chapter.
The current state of development of digital preservation means that we
still have to pose the question of what really matters. How do we decide? Various
selection approaches might be used. Preserving what is easiest to preserve –
‘picking the low-hanging fruit’ (this metaphor is used in UNESCO, 2003,
Selection criteria traditionally used by libraries and archives
should be kept? Who keeps it? (The rest of this chapter is based in part on
Harvey (2005a and 2010, chapter 11).)
Most library selection practice is aimed at meeting the current needs of
their user communities. (This is, of course, a generalization: national libraries
are a significant exception to this in their use of legal deposit mechanisms to
acquire comprehensive national documentary resources collections.) Secondary
principles, such as decisions to develop particular subject areas where user
needs are of lower priority, may modify these policies. These selection guide-
lines based on meeting current user needs do not automatically apply to selecting
material to be preserved for the future.
Criteria developed by libraries to select artifacts for preservation vary little.
They address characteristics of artifacts – usually age, evidential value, aesthetic
value, scarcity, associational value, market value and exhibition value (Task
Force on the Artifact in Library Collections, 2001, pp.9,11). Other perspectives
on these criteria may apply: for instance, the fact that an artifact is an original
and not a copy may be strongly associated with its evidential value and its
exhibition value. Occasionally additional criteria are used; examples include
‘fragility or condition’ (Pacey, 1991, p.189) and whether the artifact is ‘held in
community esteem’ (Significance, 2001, p.11). The practicalities of preservation
are also influential, among them the existence of a management plan and
assessment of physical condition (Edmondson, 2002, pp.22-23; Harris, 2000,
pp.206-224). Constraints on preservation resulting from intellectual property
issues also make an appearance, as in Harris’s ‘are there any constraints caused
by copyright laws?’ (Harris, 2000, pp.206-224).
To summarize, traditional library selection practices for preservation are
based on the preservation of items (or artifacts) in their original formats, accord-
ing to five key criteria: evidential value; aesthetic value; market value; asso-
ciational value; and exhibition value. Additional criteria are often also used,
although there is less consistency in their application: physical condition;
resources available; use; and an ill-defined, but nonetheless well-represented,
category of social significance (is the item ‘held in community esteem’?).
Although this summary sounds straightforward and its criteria clear cut, in
practice many problems arise in their interpretation, most of which relate to the
difficulty of determining what value is. ‘What were the grounds for deciding in
favor of one object and against another? How can libraries cope with the fact
that the value of the artifact is never quite the same to different researchers?’
Artifacts are ‘cultural variables’ that ‘are viewed and used in a given culture at
a particular moment’ (Task Force on the Artifact in Library Collections, 2001,
p.12). Libraries haven’t always got this right – in fact it is impossible to get it
right, given changing perceptions of value. As an example, consider the chang-
ing status as research materials of romance novels, a genre which has been
‘banish[ed] … from the cultural record of the nation’ (Flesch, 1996, p.190;
2004). Other problems arise when determining if and when a surrogate is just
as suitable as the original artifact. Copies of artifacts lose some of their infor-
mation each time they are copied – a microfilm can never capture all of the
information that resides in the physical structure of a book. This is known as
generational loss.
Many approaches to selection for preservation have been applied in libraries.
For some of these the aim has been to reduce the amount of intellectual input
into the selection process, or to expend that effort only once so that others do
not have to duplicate it. The ‘great collections’ approach was based on selecting
artifacts whose content was valuable intellectually and which were physically
fragile, but this is labour-intensive. Similar approaches involved the determina-
tion by a bibliographer or panel of experts of the core literature in a field, but
this too was labour-intensive and required a high level of bibliographic control
in the field. User-driven selection occurs when an item requested by a user is
in poor physical condition and is treated when attention is called to it (Cox, 2002,
pp.97-98). Smith (1999, pp.9-11) provides more details of these approaches.
None of these approaches is fully appropriate to the selection of artifacts for
preservation. The same reservations apply to their application in the digital
environment.
The archival community has well-developed theory and a set of practices
known as appraisal. Appraisal theory and practice are significantly more developed
than library selection practice, and their deeper theoretical basis informs our
thinking about what digital materials are worth preserving. Appraisal, which
lies at the core of archival practice, is traditionally defined in terms of evaluating
records in order to determine whether they need to be kept as archives (that is,
as long as possible) or for a specified period or whether they are to be destroyed.
Appraisal of archival materials applies criteria similar to those used in
library selection. Archival value encompasses several narrower values: admin-
istrative (usefulness for the conduct of business); fiscal (usefulness for financial
business); legal (worth for conduct of legal business); intrinsic (the inherent
nature of the material, its significance as artifact); evidential (its value as evi-
dence of the record creator’s origins, functions and activities); and informa-
tional (usefulness for more general research purposes because of the record’s
information content) (Tibbo, 2003, pp.29-30). What most strongly distinguishes
archival appraisal practice from library selection practice is a stronger recognition
of, and emphasis on, the importance of context; as Cox (2002, p.53) indicates,
‘from an archival perspective, context is crucial to understanding information
or evidence (in any form)’. The distinctive nature of appraisal is noted by Gilli-
land-Swetland:
Two of these questions (5 and 6) are concerned with practicalities: are the
resources (technical, financial and human) available to preserve the records
and make them available in the future? Are access restrictions so rigid that the
records cannot be made available to users in a realistic and timely way? Ques-
tion 1 is fundamental to appraisal. By linking the administrative, evidential and
informational value of the records to an organization, it clearly indicates the
importance of context. Cox (2001, p.6) further indicates that ‘a record is a spe-
cific entity. It is transaction oriented. It is evidence of activity (transaction) and
that evidence can only be preserved if we maintain content, structure, and con-
text’.
How might we interpret this for materials that are not records of trans-
actions, such as the kinds of materials (including digital materials) usually
managed by libraries? Cox (2001, p.6) assists us: ‘Structure is the record form.
Context is the linkage of one record to other records. Content is the information,
but content without structure and context cannot be information that is reliable’.
We need to identify the equivalents of content, structure and context for digital
materials that are not principally records of transactions.
Appraisal theory and practice are themselves being continually reappraised,
in part because appraisal ‘is not free of bias and subjectivity; its results reflect
the cultural and other values of the time’ (Piggott, 2001). The ‘great trinity
mystery’ of appraisal pithily notes:
– apart from some exceptional cases, it is beyond our resources and power to keep all
records; which is a pity, because
– beyond their original use, all records conceivable have their uses; we’ve come to
expect unexpected uses and yet
– it is almost impossible to accurately predict future use, and when we try, the passage
of time can cause serious havoc with appraisal judgments (Piggott, 2001).
typically the case for non-digital materials. A further difference which relates
directly to selection is the need to make conscious preservation decisions about
digital materials early in their existence. Burrows compares the pre-digital and
the digital environments:
In the pre-digital era, selection necessarily preceded preservation. Once a book or journal
had been acquired, a decision could be made about its long-term retention. The mere
act of placing it in a library was often seen as a sufficient method of preservation in itself.
In the digital era things are very different. ‘Digital records don’t just survive by accident’,
observes Margaret Hedstrom. As a result, a lack of preservation is tantamount to de-
selection (Burrows, 2000, p.152).
Or, as the UNESCO Guidelines concisely state, ‘it may not be possible to wait
for evidence of enduring value to emerge before making selection decisions’
(UNESCO, 2003, p.71).
One characteristic of digital materials that assumes greater significance
when considering selection is the quantity of digital materials being generated
(noted in Chapter 1). We currently consider that we need to preserve a consid-
erable percentage of this material, and this consideration has a major conse-
quence for selection as well as for other aspects of digital preservation. Our
current best-practice techniques were developed for small quantities of simply-
structured digital objects and they work best with these. They require significant
input by people and paying people costs money. One response is to develop
efficient and effective automated processes that require minimal human inter-
vention. Appraisal is one area where research into the development of auto-
mated processes is being carried out (for examples see Harvey and Thompson,
2010; Oliver et al., 2008).
Other differences between digital and non-digital materials also need to be
considered when developing selection criteria. For digital materials the quanti-
ties to be assessed may be significantly greater and their quality may be more
variable; for example, unlike most print-based publications, they may not go
through the quality assurance processes of a publisher. We also face the chal-
lenges posed by new genres being developed, especially those that incorporate
linked resources. There is, too, the question of exactly which attributes of digital
materials should be preserved (noted in more detail in Chapter 5). Yet another
issue is the difficulty of determining ownership of intellectual property rights
for some digital materials. Any effective process and criteria for selection of
digital materials for preservation must take account of these differences.
Among the powerful tools that some libraries apply to the preservation of
traditional documentary heritage materials are the provisions of legal deposit
legislation, through which material comes automatically to a designated library
without the expenditure of effort and resources to acquire that material. Legal
deposit is conceptually antithetical to selection; it implies that no selection is
made, but that all materials defined by the legislation are to be retained by
the library in which they are deposited. Legal deposit is noted here, however,
because it addresses one of the issues faced in preserving digital materials –
getting hold of them in the first place.
Because legal deposit legislation usually predates the digital era, digital
materials are not covered. Some countries have drafted and/or enacted legal
deposit legislation that covers all digital materials: these include Canada,
Denmark, Finland, France, Germany, Iceland, New Zealand, Norway, South
Africa, Sweden and the United Kingdom. Other countries have legal deposit
legislation that covers some digital materials, usually static publications such
as those issued on CD-ROM (Legal deposit, 2007). In the absence of legal
deposit legislation covering digital materials, there is some potential in negoti-
ating voluntary deposit schemes. A model code for voluntary legal deposit
agreements was developed and adopted in 2005 by the Conference of Euro-
pean National Libraries and the Federation of European Publishers (2005).
Where digital materials can only be understood by reference to a set of rules such as a
record keeping system, database or data generation system, or other contextual infor-
mation, selection processes must identify the documentation that will also need to be
preserved.
materials that are selected for preservation: in the case of a national library it is
a wide range, whereas for a university-based programme it will be narrower,
perhaps restricted only to the intellectual output of its faculty and research
students.
The necessary influence of context on selection for preservation is demon-
strated in the OAIS Reference Model, a key standard for developing digital
archives. This standard defines the concept of a ‘Designated Community’ (‘An
identified group of potential Consumers who should be able to understand a
particular set of information’ (Consultative Committee for Space Data Systems,
2002, p.1-10)) and in doing so allows us to be more precise about what serves
the needs of the specified designated community, including what is selected
for preservation for the use of that community. As well as the nature and extent
of the material that is preserved, the community will also define what kind of
contextual information is collected and preserved. For some communities it
may be sufficient to see only a passive rendition of the digital materials: a
screen shot or other visual presentation; perhaps a PDF version is all that is
required. For others, it will be necessary to retain sufficient contextual infor-
mation to allow the digital materials to be searched or manipulated. For yet
other communities, there must be enough contextual information of the right
kind to demonstrate that the authenticity of the digital materials has not been
compromised. Selection criteria for determining digital materials to be pre-
served need to take account of this contextual information. Chapter 5 examines
further the questions surrounding what attributes need to be preserved.
Stakeholder input
Chapter 2 noted that stakeholders typically play a greater role in digital preser-
vation than they played in the preservation of non-digital materials. There is
increasing awareness that the creators of digital materials are well placed to
influence the preservation of the materials they create, and that engaging and
influencing creators provide worthwhile benefits. The UNESCO Guidelines
(2003, p.73) suggest that because producers of digital materials are ‘well placed
to understand why digital objects were brought into being, their essential
“message”, and the relationships between objects and their context’ they are
likely to play an important role in selection decisions; in fact, it may be impos-
sible to reconstruct this information at a later date. An example of essential
engagement with stakeholders is the need to communicate with intellectual
property rights owners so that they understand the preservation implications of
the control they wish to exert over digital materials.
In the preservation of digital materials some community sectors, disci-
plines and individuals are increasingly engaged in the selection of what to
preserve, relieving archivists and librarians of sole responsibility for such
As noted several times already in this book, selection decisions about which
digital materials to preserve are best made at an early stage in their existence.
Continuum and lifecycle models assist us to develop effective selection princi-
ples. The continuum approach, explained in more detail by Upward (2005), is
essentially a way of thinking about the life of a record from its creation onwards
and was conceived to get around issues associated with the traditional split
between records and archives. For electronic records, recordkeepers can no
longer wait ‘passively at the end of the life cycle for records to arrive at the
archives when their creators no longer wanted them – or were dead’ (Cook,
2000, p.2). Significant records must be identified early in their life so that
management and preservation decisions that will ensure ongoing access to the
critical aspects of those records, such as the information content and the attrib-
utes that determine their authenticity, are made right from the start.
Lifecycle models applicable to digital preservation have proliferated. Per-
haps the most widely applied, specifically developed for digital curation, is the
DCC Curation Lifecycle Model (Digital Curation Centre, 2008). This model
places heavy emphasis on selection and appraisal by encapsulating them in the
Sequential Action ‘Appraise & Select’ and the Occasional Actions ‘Reappraise’
and ‘Dispose’. (Harvey (2010, chapter 11) describes selection of digital mate-
rials for preservation in the context of the DCC Curation Lifecycle Model.)
Other lifecycle models similarly emphasize that selection of materials is essen-
tial for high-quality preservation. Selection is a component of the ‘Acquisition’
element of the LIFE (Life Cycle Information for E-Literature) Model (Wheatley
et al., 2007). Classifying data assets as ‘vital’, ‘important’ or ‘minor’ is part of
the Data Asset Framework process (www.data-audit.eu) (Jones, Ruusalepp and
Ross, 2009, p. 27).
– Does the item or collection have sufficient value to and demand from a current
audience to justify digitization?
– Do we have the legal right to create a digital version?
– Do we have the legal right to disseminate it?
– Can the materials be digitized successfully?
– Do we have the infrastructure to carry out a digital project?
– Does or can digitization add something beyond simply creating a copy?
– Is the cost appropriate? (Gertz, 2000, p.104)
There has been increasing recognition that additional criteria are needed to
adapt selection decisions to digital materials. The Cedars Project Team report
(2002) was particularly helpful with its definition of a digital object’s Significant
Properties as ‘the level of content and functionality retained’. These significant
properties are not empirical; they require judgments to be made by organiza-
tions about how they apply to their user communities and to the organization’s
preservation responsibilities. The Cedars Project Team report provides an
example: for a text in PDF, the decision is made that the text, not the format, is
significant, so information about the PDF format does not need to be stored
(Cedars Project Team, 2002, pp.14-15). The report places high importance on
negotiating intellectual property rights before any other preservation actions
occur. It suggests that selection decisions for digital preservation must ‘be
pragmatic’ and based on the ‘estimated value of the material, the cost of stor-
age and support mechanisms, and the production of metadata to support the
material’ (Cedars Project Team, 2002, p.53). Primary criteria proposed were
that the digital material was currently in high use, it was the type of material
that we would expect to preserve if it were published in traditional printed
format (typically commercially published scholarly works), and it was tied to
the long-term or cultural interests of the organization (Cedars Project Team,
2002, p.53). Additional criteria proposed included legal and IP issues, format
issues and technical issues.
The decision tree for selection of digital materials for long-term retention
(Digital Preservation Coalition, 2006), developed to accompany the Digital
Preservation Coalition’s handbook on the preservation of digital materials,
provides further guidance. It first poses questions relating to selection of con-
tent and format: is there an institutional selection policy? Does the material fit
into it? Is the material of long-term value? A second group of questions is
about legal and intellectual property issues: have acceptable rights been nego-
tiated? Can they be? Technical questions form a third group of questions: can
you handle the file format, now and in the future? Can the material be trans-
ferred to a more manageable format? The existence of documentation and
metadata forms a fourth group: has sufficient documentation been supplied?
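The four groups of questions can be read as a short chain of checks. The following Python sketch is only an illustration of that reading: the field names, the order in which failures are reported and the three outcomes (accept, defer, reject) are assumptions made for this example and are not taken from the Digital Preservation Coalition's decision tree itself.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Answers to the four groups of questions (all field names are hypothetical)."""
    fits_selection_policy: bool
    has_long_term_value: bool
    rights_negotiated: bool
    rights_negotiable: bool
    format_manageable: bool
    format_convertible: bool
    documentation_sufficient: bool


def decide(c: Candidate) -> str:
    """Return an assumed outcome for one candidate item."""
    # Group 1: selection of content (policy fit and long-term value).
    if not (c.fits_selection_policy and c.has_long_term_value):
        return "reject: outside selection policy or no long-term value"
    # Group 2: legal and intellectual property rights.
    if not (c.rights_negotiated or c.rights_negotiable):
        return "reject: acceptable rights cannot be obtained"
    # Group 3: technical questions about the file format.
    if not (c.format_manageable or c.format_convertible):
        return "reject: format cannot be handled or converted"
    # Group 4: documentation and metadata.
    if not c.documentation_sufficient:
        return "defer: request further documentation and metadata"
    return "accept"
```

In practice an institution would probably record the reasons for each answer as well as the answer itself, so that the decision can be reviewed later.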
These digital selection frameworks still place high priority on criteria for
determining value, but emphasize other criteria: the legal and intellectual
property rights governing a resource; whether we have the technical ability to
preserve it; the costs involved in preserving it; and the presence of appropriate
documentation and metadata. Other research and practice has indicated that
these other criteria are central to decision-making about selection of digital
materials for preservation. In Phase 1 of its investigations the InterPARES pro-
ject addressed the question of how to select electronic records for preservation.
It determined that records in digital format should be selected for long-term
preservation on the basis of their continuing value and authenticity, and whether
it is feasible to preserve them. The authenticity of records can be established
using the InterPARES Benchmark Requirements for Assessing the Authenticity
of Electronic Records. The feasibility of preservation should take into account
costs and capacity to preserve. Appraisal of records in digital form should be
carried out as early as possible, ideally being part of the design of records
management systems, but assessment of authenticity and feasibility may be
carried out later. Appraisal decisions should be monitored regularly to ensure
that the information kept about the records is valid (InterPARES, 1999-). The
UNESCO Guidelines consider the attributes that contribute to authenticity as
‘the elements that give material its value … the selection process should con-
sider what those elements and characteristics are’ (UNESCO, 2003, p.72).
These vary according to the kinds of material.
It is unlikely that a single selection framework will suffice for all digital materials
in every context, given factors such as disciplinary differences and the variant
structures of digital materials (an email is not the same as a database, for exam-
ple). Selection frameworks for some materials are available and provide models
that can be adapted to suit other materials in different contexts. The concept of
technical appraisal provides one example. Thompson (2010) describes how
this is being applied at the Wellcome Library. The long-term preservation of
digital materials depends on an understanding of how file formats work and
requires access to appropriate software and hardware and the skills to use them.
If these are not available in the archive, digital preservation cannot be success-
ful. The Wellcome Library assigns high, medium and low levels of confidence
in its ability to preserve material based on their formats. These levels of confi-
dence are ‘based on resources available, the availability of tools for managing
digital material and experience with the life cycle management of born digital
materials’ (Thompson, 2010). Thompson takes care to point out that ‘this
approach is clearly based on pragmatic considerations’ and has risks associated
with it, but is ‘appropriate for the Wellcome Library at this point in time’. He
notes that ‘other factors such as intellectual content, significance of the material,
significance of the donor/creator and any relationship to material already in the
Library also play a part’. Selection may also be based on an assessment of
risks. Risk management principles used at the British Library to prioritize digi-
tal materials stored on portable media such as CDs and DVDs are described by
McLeod (2008).
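The format-based confidence levels described above for technical appraisal could be recorded as a simple lookup table. This is a minimal sketch under stated assumptions: the extensions and the level given to each are invented for illustration and are not the Wellcome Library's actual ratings, and the function name is hypothetical. Unknown formats are given the lowest level, which is also an assumption.

```python
from pathlib import PurePath

# Illustrative table only: these assignments are assumptions for the example,
# not ratings published by any institution.
CONFIDENCE_BY_EXTENSION = {
    "txt": "high",
    "pdf": "high",
    "tif": "high",
    "doc": "medium",
    "xls": "medium",
    "exe": "low",
}


def preservation_confidence(filename: str) -> str:
    """Return the assumed confidence level for a file, judged by extension alone."""
    # Normalize the extension: ".PDF" becomes "pdf"; no extension becomes "".
    extension = PurePath(filename).suffix.lower().lstrip(".")
    # Formats missing from the table are treated cautiously as "low".
    return CONFIDENCE_BY_EXTENSION.get(extension, "low")
```

A file extension is a weak indicator of format, so a working system would more plausibly rely on a format identification tool. As Thompson notes, other factors such as intellectual content and the significance of the material also play a part.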
Research funding agencies increasingly require that data management plans
are presented as a part of applications for funding. Guidance for researchers
about developing funding applications includes advice about selection. Whyte
and Wilson’s guide (2010), developed for the Digital Curation Centre, provides
seven criteria that should be present in any selection policy for research data,
noting that they will be modified by discipline-specific factors. These criteria
are worth quoting in full:
1. Relevance to Mission: The resource content fits the centre’s remit and any priori-
ties stated in the research institution or funding body’s current strategy, including
any legal requirement to retain the data beyond its immediate use.
2. Scientific or Historical Value: Is the data scientifically, socially, or culturally sig-
nificant? Assessing this involves inferring anticipated future use, from evidence of
current research and educational value.
3. Uniqueness: The extent to which the resource is the only or most complete source
of the information that can be derived from it, and whether it is at risk of loss if not
accepted, or may be preserved elsewhere.
4. Potential for Redistribution: The reliability, integrity, and usability of the data
files may be determined; these are received in formats that meet designated techni-
cal criteria; and Intellectual Property or human subjects issues are addressed.
5. Non-Replicability: It would not be feasible to replicate the data/resource or doing
so would not be financially viable.
6. Economic Case: Costs may be estimated for managing and preserving the resource,
and are justifiable when assessed against evidence of potential future benefits; fund-
ing has been secured where appropriate.
7. Full Documentation: the information necessary to facilitate future discovery, ac-
cess, and reuse is comprehensive and correct; including metadata on the resource’s
provenance and the context of its creation and use (Whyte and Wilson, 2010).
Each of these seven criteria for selecting research data is expanded in the
guide. They are robust enough to be used as the basis for selection frameworks
for digital materials other than research data.
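One way to apply the seven criteria consistently is as a checklist that reports which criteria an item has not yet been shown to meet. The sketch below assumes that each criterion is reduced to a yes/no judgment, which simplifies the guide considerably. The function name and the treatment of unanswered criteria as unmet are assumptions made for this example.

```python
# Short labels for the seven criteria quoted above, in the same order.
CRITERIA = (
    "relevance to mission",
    "scientific or historical value",
    "uniqueness",
    "potential for redistribution",
    "non-replicability",
    "economic case",
    "full documentation",
)


def unmet_criteria(assessment: dict[str, bool]) -> list[str]:
    """List the criteria judged unmet; a criterion with no answer counts as unmet."""
    return [name for name in CRITERIA if not assessment.get(name, False)]
```

An empty result means that every criterion was judged to be met. It does not by itself settle the selection decision, because the criteria are modified by discipline-specific factors.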
org). Both have their strong proponents and in the preservation of web sites a
combination of targeted acquisitions and supplementary periodic snapshots of
a larger domain is often applied. The same spectrum applies to the selection
for preservation of other kinds of digital materials.
The cases for and against selection in the digital preservation context are
presented in this way:
Advocates of a comprehensive approach argue that any information may turn out to
have long-term value, and that the costs of detailed selection are greater than the costs
of collecting and storing everything. Advocates of a more selective approach argue that
it allows them to create collections of high value resources, with some assurance of tech-
nical quality and an opportunity to negotiate access rights with producers (UNESCO,
2003, p.73).
Conclusion
Generic frameworks for selection of digital materials for preservation are now
available and can be tailored to meet the requirements of specific contexts or
kinds of digital materials. Further research into selection is still needed, as it is
into other parts of the life cycle of digital materials. Such investigation needs
to take into account the high levels of human input currently required in making
selection decisions, and their cost. By identifying selection processes that can
be automated, it may be possible to reduce the level of resources needed for
this aspect of digital preservation. The outcomes may well result in ‘radically
A good example to illustrate this difficulty comes from the realm of the web.
Some web sites do not have a clearly-defined audience, so it is difficult to
determine what users of these sites might wish to see in the future and even
more difficult to define the characteristics of those web sites that it is essential
to preserve. Many web sites are dynamic and interactive, with ‘search and
retrieval aspects intrinsically bound with content’. Therefore we need to preserve
the functionality of the search and retrieval components, as well as the data
that the functionality interacts with to generate the required output (Smith,
2003, p.1).
In order to determine the essential elements of digital materials we want to
retain access to, we need to know in the future ‘that they are what they purport
to be’, that they ‘are complete and have not been altered or corrupted’ (Ross,
2002, p.7). If we cannot ensure this, we will be victim to fraud, as exemplified
by the Piltdown man. Considerable energy has been expended, particularly in
the recordkeeping context, on determining what authenticity and integrity
mean in the digital environment. If these characteristics cannot be established for
digital materials, then their genuineness and our ability to use them for evidential
purposes are devalued. How do we demonstrate that digital materials have not
been altered? What actions do we have to take to establish authenticity and in-
tegrity and maintain them over time so that the future user can be confident the
digital materials they are using retain their original meaning? (Ross, 2002, p.7).
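One common technical answer to the narrower question of whether a bit-stream has been altered is a fixity check: a cryptographic digest is recorded when the material is received and recomputed later for comparison. The following sketch uses SHA-256 from Python's standard library. The function names are hypothetical, and the choice of SHA-256 is an assumption made for this example rather than a recommendation drawn from the sources cited here.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a bit-stream as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at an earlier time."""
    return digest(data) == recorded_digest
```

A matching digest shows only that the bits are unchanged since the digest was recorded. It says nothing about whether the material was what it purported to be in the first place, so it addresses integrity rather than authenticity in the fuller sense discussed above.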
Another concept that we need to know more about when we address the
question ‘what attributes of digital materials do we preserve?’ is that of acceptable
loss. It is unlikely, and may be impossible, that we can preserve all of the attributes
and functionality of digital materials, but we do not know much at all about the
levels of loss that are acceptable to users. In 2003 a statement about research
needs in digital preservation asked what still remains a crucial question:
how can we measure what loss is acceptable? What tools can be developed to inform
future users about the relationship between the original digital entity and what they re-
ceive in response to a query? (NSF-DELOS Working Group on Digital Archiving and
Preservation, 2003, pp.19-20).
If an object is preserved only to enhance access and use, some transformations might
be desirable that would be prohibited if they transformed attributes deemed essential to
the object … It would seem obvious that strategies, tactics and method for the preserva-
tion of digital objects ought to be informed by a rich understanding of their nature and the
specific objectives for preserving them in digital form. The nature of digital objects
defines what we need to preserve (Thibodeau, 1999).
This chapter examines some of the key concepts and issues associated with
identifying the attributes of digital objects that need to be preserved.
– digital objects that are simple and less dependent on specific software
(plain text in ASCII files is an example)
– digital objects that depend on more complex software that is, however,
generic (such as HTML)
– digital objects that depend on software specific to a particular hardware
platform or operating environment (such as spreadsheets and word proc-
essing software).
Some digital objects contain executable files and software programs, and others
may combine all of the above.
The extent to which the digital object is dependent on software has a sig-
nificant effect on the way in which it can be preserved (UNESCO, 2003,
p.120). Furthermore, it is generally considered that it is not possible to preserve
originals of digital materials, because migration is probably inevitable at some
stage in the preservation process, and migration involves newer technology
that alters the originals. We are preserving copies of the originals, ‘preservation
copies made according to the particular methods and strategies that are appro-
priate or expedient’ (Gilliland-Swetland, 2002, p.198).
Crucial to the development of digital preservation was the realization that
separating technology from data is a key requirement. The technology (hard-
ware and software) changes rapidly and will continue to do so. Earlier thinking
placed a high emphasis on preserving the ability to read bit-streams in the
software that was used to create them, and digital preservation solutions such
as museums of technology and emulation were proposed. Once it was acknowl-
edged that bit-streams would have to be transferred from one medium to another
and that they could be rendered (or viewed or presented) using hardware/soft-
ware combinations other than those with which they were created, the focus
shifted from maintaining access to obsolete software and hardware to deter-
mining the characteristics or attributes of digital objects essential for maintaining
access to them on current and future hardware/software combinations. However,
this change in thinking brought its own problems, as an early statement about
electronic records recognized:
Once we accept that attention should be focused on the preservation of the data,
regardless of the hardware and software that the data will be processed with,
then we need to pay a lot more attention to those data – to the attributes of the
data to be preserved and the level of acceptable loss and alteration of the data.
As already noted, the process of preserving digital objects inevitably means
that they are altered. While at one level digital preservation is a relatively simple
matter of preserving the bit-stream (although its simplicity should not be over-
stated, as Rosenthal (2010a) reminds us), the result is meaningless unless the
means to read and interpret that bit-stream are available, either by preserving
them or developing software that makes them accessible. We might, for example,
emulate a software application with the probable resultant loss of some func-
tionality, which affects that software’s ability to interpret the bit-stream fully.
What effect does this have on the authenticity of the digital object? Is some
loss acceptable? If we can determine which attributes of that digital object are
essential for us to understand it, we can then seek to preserve these attributes.
For example, the colours presented on a screen may not be significant for text
content, but they could be significant for understanding a web page. Static
document-like digital materials, typically those in PDF, HTML or XML for-
mats, can be read using software other than that in which they were created.
(In fact, it is this very characteristic of these file formats that makes them so
useful for digital preservation purposes.) Other digital materials, such as edu-
cational software, games and web animations, are ‘executable’ files – that is,
they are in a format that is directly read by a computer as a program and is run
(‘executed’). These are typically in proprietary file formats and are, therefore,
not easily re-usable in other contexts.
To date the determination of which attributes of digital objects are essen-
tial to maintain into the future has been addressed chiefly by the recordkeeping
community in relation to establishing the requirements for authentic digital
records (considered later in this chapter). Once these have been established,
‘combinations of data, software and hardware that will re-present those elements
as accurately as required’ need to be maintained (UNESCO, 2003, pp.120-121).
dural contexts of their development as well as among all materials created by the same
activity. The organic nature of records refers to all these interrelationships, and archival
practices are designed to collectively document, capture, and exploit them. These
practices recognize that the value of an individual record is derived in part from the
sequence of records within which it is located. They also recognize that it can be difficult
to understand an individual record without understanding its historical, legal, proce-
dural, and documentary aspect (Gilliland-Swetland, 2000, pp.16,18).
community has adopted the concepts and terminology embedded within it. The
OAIS Reference Model is now firmly established as the key international
standard in digital preservation. (Lee (2010) provides a brief and informative
introduction to the OAIS Reference Model and its influence.)
The OAIS Reference Model was developed by the Consultative Committee
for Space Data Systems (based at NASA in the US) and, after input from other
communities, is now an international standard – ISO 14721:2003. The OAIS
Reference Model has been widely used as the basis for digital preservation
systems and research projects, including those of major libraries, such as the
British Library and the national libraries of Australia, France, New Zealand
and the Netherlands, and significant research projects, such as Planets (Lee,
2010, p.4028).
Part of the significance of the OAIS Reference Model lies in the establish-
ment of a common language for discussion of digital preservation; its Fore-
word notes that it ‘establishes a common framework of terms and concepts’ to
allow ‘existing and future archives to be more meaningfully compared and
contrasted’ and to promote standardization (Consultative Committee for Space
Data Systems, 2002, p.iii). For example, it defines long term preservation:
‘long enough to be concerned with the impact of changing technologies, in-
cluding support for new media and data formats, or with a changing user
community. Long Term may extend indefinitely’ (Consultative Committee for
Space Data Systems, 2002, p.1-1). It makes a clear distinction between simple
data storage and long-term preservation:
The significance of the OAIS Reference Model also lies in its articulation of
the functional requirements of a digital archival system. It defines seven func-
tions:
These seven functions identify the processes that need to be incorporated into a
system for preserving digital information.
The OAIS Reference Model has proved to be influential in the develop-
ment of digital preservation, one indication being the widespread adoption of
the OAIS Reference Model’s term ingest in digital preservation discussion. Of
greatest relevance to this chapter, however, is the OAIS concept of ‘information
package’, explained as
In other words, a digital object consists of considerably more than just the con-
tent that we wish to preserve (the ‘information which is the original target of
preservation’). It also comprises information that tells us what we need to know
in order to preserve it (‘to ensure it is clearly identified, and to understand the
environment in which the Content Information was created’), information about
its attributes (such as file formats), and so on. What is this information? And
why is it an essential component of the digital materials ‘package’ in relation
to preservation? This information – metadata – is an essential part of the OAIS
Reference Model and is vital to all digital preservation activities. The OAIS
Reference Model uses the terms description information and representation
information to refer to metadata (that is, ‘structured information that describes,
explains, locates, or otherwise makes it easier to retrieve, use, or manage an
information resource’ (NISO, 2004, p.1)).
17. Digital heritage materials must be uniquely identified, and described using
appropriate metadata for resource discovery, management and preservation.
18. Taking the right action later depends on adequate documentation. It is easier to
document the characteristics of digital resources close to their source than it is to
build that documentation later.
19. Preservation programmes should use standardised metadata schemas as they become
available, for interoperability between programmes.
20. The links between digital objects and their metadata must be securely maintained,
and the metadata must be preserved (UNESCO, 2003, p.22).
– ensure that we can always locate digital objects, regardless of where they
are stored
– describe digital objects clearly
– indicate the relationship between one digital object and other digital ob-
jects
– identify the technical characteristics of digital objects clearly
– indicate who has responsibility for managing and preserving digital objects
– describe how digital objects can legally be used
– describe the requirements for re-presenting digital objects
– record the history of digital objects
– document the authenticity of digital objects.
A reminder is timely at this point – digital preservation requires that the meta-
data associated with a digital object, as well as the digital object itself, is main-
tained over time and is readable in the future.
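The functions listed above can be pictured as fields of a single metadata record that travels with the digital object. The following is a minimal sketch only: the class name, field names and sample values are hypothetical and are not drawn from the OAIS Reference Model or from any published metadata schema. It also illustrates the reminder just given, by writing the record to a plain-text serialization (JSON) and reading it back.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ObjectMetadata:
    """Hypothetical record; one field per metadata function listed in the text."""
    identifier: str              # locate the object wherever it is stored
    description: str             # describe the object clearly
    related_objects: list        # relationships to other digital objects
    technical: dict              # technical characteristics, e.g. file format
    custodian: str               # who is responsible for managing and preserving it
    rights: str                  # how the object can legally be used
    rendering_requirements: str  # what is needed to re-present the object
    history: list = field(default_factory=list)  # events in the object's life
    authenticity_notes: str = ""                 # evidence supporting authenticity


record = ObjectMetadata(
    identifier="example:12345",  # made-up identifier
    description="Annual report, 2010",
    related_objects=["example:12344"],
    technical={"format": "PDF", "size_bytes": 482113},
    custodian="Example University Library",
    rights="Open access",
    rendering_requirements="Any PDF viewer",
)
record.history.append("2011-03-01: ingested")

# Store the metadata in a simple, documented text format alongside the object,
# then confirm that it can be read back without loss.
serialised = json.dumps(asdict(record), indent=2, sort_keys=True)
restored = ObjectMetadata(**json.loads(serialised))
print(restored == record)
```

In practice a preservation programme would use a standardised schema rather than an ad hoc structure of this kind; the sketch shows only how the separate functions can be gathered into one record.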
The kinds of metadata required to carry out the functions noted above are
generally categorized as:
Preservation metadata
metadata extraction tool.) Also required are tools that manage metadata schemes
so that they are useful over time, such as those ‘to track the provenance of
metadata schema, for version control, and to allow users to navigate from cur-
rent metadata schema and ontologies to those used when the digital entity
was created’. The value of metadata could be assessed relative to the costs of
‘extracting, creating and managing’ it, to provide a better understanding of the
‘minimum amount of metadata necessary for digital preservation’ (NSF-DELOS
Working Group on Digital Archiving and Preservation, 2003, p.19).
Persistent identifiers
Persistent identifiers are a form of metadata applied to digital materials. A per-
sistent identifier is ‘a name for a resource which will remain the same regardless
of where the resource is located’ (National Library of Australia, 2002). If the
material (or resource) is moved, the persistent identifier provides access to it in
its new location, provided that the persistent identifier is ‘maintained with the
correct current associated location when the resource was moved’ (Persistent
identifiers, 2002). Persistent identifiers can be considered as an element of
preservation metadata, in the subset of preservation description information already
noted above.
The use of persistent identifiers is usually explained with respect to web-
based digital materials. The normal way of identifying web materials is to use a
URL (Uniform Resource Locator). However, a URL points only to one location
of the material on the web; if the location of that material changes (for example,
the material is moved from one domain name to another), the URL is altered and
the material becomes inaccessible if the superseded URL is used. The persistence
of URLs has been investigated in many studies and their tendency to change
is widely recognized as an impediment to digital continuity. A recent example is
Rhodes’s report (2011) on a study which found that link rot was 8.3 per cent in the
first year, 14.3 per cent in the second year, 27.9 per cent in the third year and 30.4
per cent in the fourth year. Persistent identifiers have been developed as a
response to this tendency. They are needed not only for web material but also for
data of all kinds. Unambiguous and reliable identification of digital materials is
essential for reliable long-term access and for ensuring their authenticity. It is
also required to support the linking of data, through which digital objects are
connected to other data or digital objects automatically and transparently.
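The difference between a URL and a persistent identifier can be illustrated with a toy resolver. This is a sketch under stated assumptions, not a description of how any real identifier scheme is implemented: the class, the identifier string and the URLs are all hypothetical. The point it makes is the one in the definition quoted above: the name stays the same, and access survives a move only if the recorded location is updated when the move happens.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Toy resolver mapping a persistent identifier to its current location."""
    locations: dict[str, str] = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        self.locations[pid] = url

    def move(self, pid: str, new_url: str) -> None:
        # The identifier is unchanged; only the recorded location is updated.
        if pid not in self.locations:
            raise KeyError(f"unknown identifier: {pid}")
        self.locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self.locations[pid]


resolver = Resolver()
pid = "example:12345"  # hypothetical persistent identifier
old_url = "http://old.example.org/report.pdf"
new_url = "http://new.example.org/archive/report.pdf"

resolver.register(pid, old_url)
resolver.move(pid, new_url)  # the material changes domain

# A citation that used the identifier still works; one that used old_url does not.
print(resolver.resolve(pid))
```

If the `move` step is omitted when the material is relocated, the identifier resolves to a location that no longer exists, which is why the maintenance of the identifier matters as much as its assignment.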
Persistent identifiers are not new; formal schemes for identifying objects,
such as ISBN (International Standard Book Number), ISSN (International
Standard Serial Number) and ISMN (International Standard Music Number),
were developed and applied for many years in the pre-digital context. Various
kinds of persistent identifiers for digital materials have been developed and
used. The Uniform Resource Name (URN) is ‘a standard, persistent and unique
Authenticity
As noted in Chapter 1, definitions of terms taken from the pre-digital preserva-
tion paradigm are not always appropriate when applied to digital preservation.
Discussions about digital preservation are further confused by the difference in
terminology among librarians, recordkeepers and other interested groups:
‘Terms like provenance, archiving, context, records, etc. are used with slightly
different meanings … any discussion about preservation is challenged by
confusion of terminology’ (Hofman, 2002, p.15). The term authenticity is one
of these, so we need to establish what it means in the context of digital preser-
vation.
– integrity, which for digital material is ‘the state of being whole, uncorrupted
and free of unauthorised and undocumented changes’ (UNESCO, 2003,
p.158)
– significant properties – ‘the elements, characteristics and attributes of a
given digital object that must be preserved in order to re-present its essential
meaning or purpose’ (UNESCO, 2003, pp.157-158); and
– identity – the attributes of a digital object that uniquely characterize and
distinguish it from other digital objects (Duranti, 2003, p.2).
The UNESCO Guidelines get to the heart of the matter, stating succinctly that
‘Authenticity derives from being able to trust both the identity of an object
– that it is what it says it is, and has not been confused with some other object –
and the integrity of the object – that it has not been changed in ways that
change its meaning’ and that ‘evaluating, maintaining and providing evidence
of continued authenticity are key responsibilities for most preservation
programmes’ (UNESCO, 2003, pp.108-109).
Authenticity is valued in a number of different contexts for a variety of
reasons. It is critical where digital materials are used as evidence. For records,
the authenticity of the record is paramount; all who use it need to be able to
trust that it is what it purports to be. Many kinds of enterprises, whether
commercial or nonprofit, government or non-government, are legally obliged to
keep records for specified periods to demonstrate accountability and for other
reasons such as continuity of operations and organizational memory; their
records must be authentic if they are to support these purposes. Scholars and
researchers rely on references to the materials they cite being stable over time.
For legal purposes some material must be able to meet the evidential require-
ments of a jurisdiction. The list goes on: heritage materials are valued because
they are authentic; for scientific data, ‘trust in their ongoing authenticity is criti-
cal, for without it they are of virtually no value’ (UNESCO, 2003, p.108).
A high value has been placed on authenticity for a very long time. The
government of ancient Athens considered that a key function of a library
In addition, as noted in Chapters 2 and 3, there are threats that apply to all digi-
tal data – what the UNESCO Guidelines call ‘the ongoing integrity of data’ –
and they affect authenticity. Among these threats are breakdown of carriers;
malicious acts, such as attacks by hackers or viruses; terrorist attacks, war and
civil unrest (especially as they compromise power supplies and the integrity of
buildings); accidental acts by staff; fires, floods and other natural disasters; and
business failure (UNESCO, 2003, p.109).
Because authenticity is an attribute that is so highly valued, digital preserva-
tion programmes need to take appropriate steps to ensure that it is not compro-
mised during the processes of managing the materials in their custody. The
strategies applied to ensure the authenticity of digital materials include assigning
unique identifiers, applying encapsulation techniques (packaging together the
digital object, its metadata, and other associated data), digital watermarking,
using digital signatures, encryption, digital time stamping, maintaining audit
trails, controlling custody, and establishing trusted repositories (noted later in
this chapter).
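Several of these strategies rest on being able to show that a bit-stream has not changed without the change being documented. One common building block is a checksum recorded when the material is received and recomputed later, with each check written to an audit trail. The sketch below is illustrative only: the function names and the structure of the audit entries are assumptions, and SHA-256 is chosen simply because it is available in the Python standard library.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a bit-stream as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def check_fixity(data: bytes, recorded_digest: str, audit_trail: list) -> bool:
    """Compare the current digest with the recorded one and log the outcome."""
    unchanged = sha256_of(data) == recorded_digest
    audit_trail.append({
        "event": "fixity check",
        "when": datetime.now(timezone.utc).isoformat(),
        "outcome": "pass" if unchanged else "fail",
    })
    return unchanged


original = b"Minutes of the meeting held on 1 March."
recorded = sha256_of(original)  # digest stored when the object was received
audit_trail = []

ok_before = check_fixity(original, recorded, audit_trail)
ok_after = check_fixity(original + b" (edited)", recorded, audit_trail)
print(ok_before, ok_after)
```

A failed check shows only that the bit-stream differs from what was recorded. Whether the difference matters for authenticity is a separate judgement, since documented preservation actions may themselves alter the bit-stream.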
Significant properties
Ensuring authenticity does not demand that materials are kept in their original
form without change; as already noted, it is impossible to keep digital materials
in their original form. To attempt to do so is incompatible with digital preser-
vation processes, which often result in the original bit-stream being altered.
We need to identify the attributes of digital materials that must be preserved to
ensure authenticity, and also those attributes that we do not need to preserve.
The attributes that must be preserved are variously described as essence, essential
elements, and significant properties, which has become the term most com-
monly encountered.
The term significant properties is usually defined as ‘the characteristics of
digital objects that must be preserved over time in order to ensure the continued
accessibility, usability, and meaning of the objects, and their capacity to be
accepted as evidence of what they purport to record’ (Grace, Knight and Montague,
2009, p.3). It was first defined and used by the Cedars Project, which ran from
1998 to 2002. Other research projects expanded on the work of Cedars, including
research at the National Archives of Australia (Heslop, Davis and Wilson, 2002)
and the InSPECT Project based at King’s College London and The National
Archives (Grace, Knight and Montague, 2009). Knight and Pennock (2009)
issues like look and feel … are significantly less important, because we can demonstrate
to you that if we show you the same Word document on a different operating system, it
can actually look different. If we show it on a different machine which has different
fonts installed, it will look different. If I alter the page setup, or even on different
machines with the same page setup, you’ll get a different pagination, and if you’ve got
an automatic footer with a page number in it, it will automatically change that pagination
for you, and it won’t be apparent to you that the pagination has changed. So a lot of that
we have actually defined as being ephemeral or circumstantial aspects of the performance
of the record in a particular situation … is not particularly relevant.
The concept is simple: the application of the same software and hardware to
the same digital materials should create a ‘presentation or performance’ that is
the same every time. For preserved digital materials, their essential elements
are presented during a performance sometime in the future: ‘copying data from
carrier to carrier, and providing the right tools to recreate the intended perform-
ance will preserve continuity of access to most digital objects’. This apparently
simple model is, however, rather more complex in practice:
it may be hard to define the performance that must be re-presented; it is usually diffi-
cult to work out what tools are needed once the original ones have been lost; the tools
themselves typically rely on other tools that also may have been superseded; and it may
be difficult to find tools that will create the required performance in a reliable, cost-
effective and timely way, especially in the context of many thousands, millions or more
of digital objects. Despite such underlying complexities, the performance model helps
in recognising what digital preservation programmes must aim for: the best means of
re-presenting what users need to access (UNESCO, 2003, p.35).
What this boils down to is that we need to understand the characteristics em-
bodied in the materials we are preserving and to determine the minimum set of
these characteristics that must be maintained in order to recreate the materials
in the future. As well as needing to preserve the physical object (the physical
form that carries the bit-stream) it is necessary to preserve the logical object
(the computer-readable code) and the conceptual object (‘the performance pre-
sented to a user’ (UNESCO, 2003, p.35)) that is meaningful to the human user.
However, we also need to envisage the digital object as ‘bundles of essential
elements that embody the message, purpose, or features for which the material
was chosen for preservation’ (UNESCO, 2003, p.35). Not all of the elements
that make up a digital object are equally important in recreating the conceptual
object.
It is also necessary to understand the needs of the community of users for
whom digital materials are being preserved. The significant properties concept
can also be understood as the properties of digital materials that are significant
to the stakeholder; that is, it is the stakeholder who assigns significance, rather
than the technical characteristics of the digital object that determine it (del Pozo,
Long and Pearson, 2010, p.293). ‘Community’ can be as closely defined as a
specific organization or discipline group, or as widely defined as ‘the general
public’. Community needs determine the kind of material selected for preserva-
tion and the level of authenticity required for the material selected. (As already
noted, the OAIS Reference Model articulates this as Designated Community –
‘An identified group of potential Consumers who should be able to understand
– For whom should this material be kept? Do they have specific expectations about
what they will be able to do with the material when it is re-presented?
– Why are the materials worth keeping? What gives them the value that warrants the
trouble of preserving them? Is that value associated with evidence, information, ar-
tistic or aesthetic factors, significant innovation, historic or cultural association,
what a user can make the material do or do with the material, culturally significant
characteristics?
– Is the value tied to the way the material looks? (Would it be lost or significantly de-
graded if the material looked different?)
– Is the value tied to the way the object works? (Would it be lost if particular func-
tions were removed? Or if particular functions happened at a different speed or re-
quired different keystrokes?)
– Is the value tied to the context of the material? (Would it be lost if links embedded
in the material did not work? Or if a user could no longer see evidence that con-
nected the material with its original context?)
– Is it possible to distinguish between elements within each of these areas? For exam-
ple, would advertising banners be considered an essential part of the way the mate-
rial looked? Would some navigation elements or display functions be needed but
not others?
– If it is difficult to define what needs to be maintained, it may be easier to consider
the impact of an element not being maintained, and to look for functions or elements
that are definitely not needed.
Further elaboration of the example of emails used earlier in this chapter illus-
trates some aspects of preserving authenticity and of significant properties.
Authenticity in this case ensures the trustworthiness of an email as a record of
For emails, it could be decided that it is only the content information that users
require – ‘the name and address of the sender, subject, date and time, recipients,
and the message, in a standardised structure with only the most simple of for-
matting’ (UNESCO, 2003, p.74).
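If that decision were taken, the essential elements could be extracted from each message and stored in a simple standardised structure, leaving aside everything else. The sketch below uses the `email` package from the Python standard library; the sample message, the function name and the field names of the resulting record are invented for illustration and do not represent any particular repository's practice.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# A made-up message; the X-Mailer header stands for detail judged non-essential.
RAW = """\
From: Ann Example <ann@example.org>
To: Bob Example <bob@example.org>, carol@example.org
Subject: Budget approval
Date: Tue, 01 Mar 2011 09:30:00 +0000
X-Mailer: SomeClient 1.0
Content-Type: text/plain; charset="utf-8"

The budget is approved.
"""


def essential_elements(raw: str) -> dict:
    """Keep only sender, recipients, subject, date and the plain message text."""
    msg = message_from_string(raw)
    name, address = parseaddr(msg["From"])
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    return {
        "sender_name": name,
        "sender_address": address,
        "recipients": recipients,
        "subject": msg["Subject"],
        "date": msg["Date"],
        "message": msg.get_payload().strip(),
    }


record = essential_elements(RAW)
print(record)
```

The sketch handles only a single-part plain-text message. Attachments, HTML bodies and threading information are simply not captured, which is exactly the kind of loss that has to be judged acceptable, or not, for the community concerned.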
More research into authenticity and its requirements in the digital world is
required, although, as the authors of a CLIR publication (Authenticity in a digital
environment, 2000) emphasize, establishing just what the requirements are is
no easy task. The NSF-DELOS Working Group on Digital Archiving and
Preservation suggested in 2003 that the digital preservation research agenda
should include research into tools that allow future users of digital materials to
determine whether they are authentic (NSF-DELOS Working Group on Digital
Archiving and Preservation, 2003, p.vii). Research has also been done by the
InSPECT Project (Grace, Knight and Montague, 2009), but we still need defini-
tions of significant properties for more types of digital materials and a clearer
understanding of how these significant properties affect use and access of digital
materials, and knowledge of how much change is acceptable before the authen-
ticity of digital materials is compromised, among other things.
One of the ironies of digital preservation is that the web site on which the re-
ports of the Functional Requirements for Evidence in Recordkeeping Project,
administered by the University of Pittsburgh, were to be found was an early
victim of the instability of digital materials. The web site hosting the working
files of this project was deleted. Some of the site was subsequently
reconstructed using files captured and preserved by the Internet Archive
(www.archimuse.com/papers/nhprc). The project, commonly referred to as the Pittsburgh
Project, investigated during the early to mid 1990s the functional requirements
necessary for electronic recordkeeping systems to meet the needs of archivists
better. Its focus soon changed from system requirements to what is required for
electronic records to be considered as evidence. It developed a definition of what
constituted an electronic record that highlighted the importance of metadata
(Heazlewood, 2000, p.176).
InterPARES
tems that meet commonly accepted standards to ensure the ongoing manage-
ment, access and security of the digital materials accepted by the repository;
implementation of system evaluation methodologies; and clearly stated policies.
Because the creators, the owners and the users of digital materials need to
be able to trust digital repositories, certification is a vital aspect of their estab-
lishment and operation. Not only is certification required, but auditing is also
required to monitor ongoing trustworthiness. Checklists are available for self-
assessment of repositories, such as TRAC and nestor’s Catalogue of Criteria
for Trusted Digital Repositories (nestor Working Group, 2006). Tools to assist
auditors are available, two examples being the online interactive tool for
repository managers, DRAMBORA (Digital Repository Audit Method Based
on Risk Assessment) toolkit (www.repositoryaudit.eu), developed by the DCC
and Digital Preservation Europe, and PLATTER (Planning Tool for Trusted
Electronic Repositories; www.digitalpreservationeurope.eu/publications/reports/
Repository_Planning_Checklist_and_Guidance.pdf).
Conclusion
Chapter 1 notes that to ensure that digital materials remain usable in the future,
access to them is required – and not simply access, but access to ‘all qualities of
authenticity, accuracy and functionality’ (Digital Preservation Coalition, 2008,
p.24). Authenticity of digital material needs to be defined in relation to users’
requirements, and the elements of the materials that meet these requirements
must also be clearly defined. It is also essential that we realize that some loss of
elements and functionality is inevitable. Consequently the levels of acceptable
loss must also be clearly described. It must, however, be acknowledged that
there are no absolute answers.
Once we have defined the levels of loss that future users of digital materials
are able to accept, we can focus on how to preserve the essential elements. The
next chapter examines strategies and techniques for digital preservation.
Chapter 6
Overview of Digital Preservation Strategies
Introduction
Maintaining access to digital resources over the
long-term involves interdependent strategies for
preservation in the short to medium term based on
safeguarding storage media, content and documen-
tation, and computer software and hardware; and
strategies for long-term preservation to address the
issues of software and hardware obsolescence
(Digital Preservation Coalition, 2008, p.103)
This chapter first describes some of the history of the development of principles
and strategies for digital preservation. It then notes strategies that are being
applied. Finally, some existing typologies of principles, strategies and practices
are noted and a further typology is proposed.
Historical overview
Chapter 1 explored the need for a new preservation paradigm in the digital world.
It noted that the principles that lie behind preservation practice, and the preser-
vation techniques themselves, need to change. Where pre-digital preservation
paradigm practices are based on the preservation of artifacts, the new preserva-
tion paradigm requires different ways of thinking which are still not completely
clear. Some are accepted and understood: for example, that there is a need to
actively maintain digital information from the moment of its creation, and that
there is likely to be greater emphasis on collaboration and support from a wider
range of stakeholders.
this could be observed in the methods being applied: printing to paper or micro-
film, with the loss of ‘retrieval and reuse potential’, and migration strategies,
which by normalizing to a standardized data format often result in loss of
information about document structures and relationships (Hedstrom, 1998,
p.195). One year later Hedstrom and Montgomery reported the results of a
survey of 54 Research Libraries Group members. Only 13 institutions had digital
preservation methods in place, the most common being ‘transfer accessions to
new media’ (25), refresh (18), migrate (17), ‘limit formats accessioned’ (16) and
‘standards for archival masters’ (7) (Hedstrom and Montgomery, 1999, pp.14-
15).
Five years later Walton observed that, although there was no consensus on
the best method or methods, four major national archives selected strategies
which emphasized ‘preserving objects over preserving technologies: conversion
to standard format (NA, NARA, NAA) or format migration (PRO)’ (Walton,
2003, p.7). Also in 2003, Bryan’s small survey of United States manuscript
repositories, which sought to ascertain their practice in managing preservation
of born-digital material, noted that of the nine respondents, four ‘print electronic
documents to paper and three migrate them to a server’ (Bryan, 2003). In the
same period the European Commission-funded ERPANET Project developed
case studies documenting the experiences of and approaches to digital preser-
vation in the pharmaceutical, publishing and telecommunications sectors.
These note some general strategies, for example, migration and reducing the
proliferation of formats. In the pharmaceutical industry, ‘size and proliferation
of formats are the main obstacles to the preservation of objects’. Migration to
new data formats was carried out when necessary, PDF (Portable Document Format)
had become the standard format for preservation purposes, and there was a
general belief that digital preservation solutions should come from outside the
industry. In the publishing industries ‘PDF proved a very popular format
alongside TIFF (Tagged Image File Format), XML (Extensible Markup Language),
and SGML (Standard Generalized Markup Language) for distribution and
preservation.’ The telecommunications organizations relied on migration carried out
when their business software was updated (Ross, Greenan and McKinney, 2003).
Two more extensive surveys published in 2004 provide firmer data. A sur-
vey of digital preservation practice in 21 natural science and scientific
publishing organizations operating on an international level concluded that
‘migration remains the preservation strategy of choice; it is still too soon for
most archives to have undergone a significant technological change’ (Hodge
and Frangakis, 2004, p.2). The strategies used in this community were ‘Trans-
formation to a Preservation Format’ (for example, ASCII and XML), migration,
and ‘Migration On-Request [where] the original version of the material is re-
tained and when necessary, conversion tools are applied to convert the original
to the format required by the user’ – this had been tested but not applied (Hodge
and Frangakis, 2004, pp.38-40). Another international survey, reported in 2004,
The first theme (societal and organizational missions) involved concepts such
as assured funding, a sustainable supportive environment over time, knowing
the context in which preservation occurs, and community will. A sustainable
environment that supports digital preservation over time was considered essen-
tial, and this is related to ongoing funding. To be sustainable, digital preservation
activities had to be built into ‘normal operating activity’ and a good understanding
of the context was required. The second theme (knowing what you are
preserving) was expressed in terms of the need to think clearly about exactly what
it is that we are trying to preserve. These concepts were closely linked to the
third theme (standards), which noted strategies such as metadata, data formats
that remain accessible, normalization, and standard data formats. Standards
were considered as an essential aspect of any preservation strategy; XML,
metadata standards, and standard data formats were specifically noted. The
fourth group of characteristics comprised operational aspects: capture the
material first, build digital preservation into normal operations, integrate digital
preservation processes fully, keep data moving. The technical issues of the
fifth theme included such principles as the importance of not relying on pro-
prietary data formats or systems.
According to the Australian specialists interviewed, an effective digital
preservation strategy had these characteristics:
Has the situation changed in the last five years? Surveys of digital preservation
activities in 2009 and 2011 suggest that it has. Responses to a 2009 survey,
mainly of national libraries and archives in Europe but also with North American
input, indicated that digital preservation was understood as an issue that
demands action. Eighty-five per cent of respondents had or were planning a
digital preservation ‘solution’. Half of the respondents had in place a digital
preservation policy, which is significant because the existence of a policy meant
that an organization was three times more likely to have a budget for digital
preservation and a solution planned or in place (Planets, 2009, p.4). A significant
change from earlier surveys was the availability of software tools for digital
preservation, both proprietary and open-source, that had been evaluated and
used, among them DSpace, Fedora, EPrints, Ex Libris DigiTool, and IBM
DIAS (Planets, 2009, p.50). Although this survey was not specifically about the
strategies in place or being considered, the high percentage of organizations
with policies, budgets, and solutions in place augurs well.
A 2011 survey of 72 members of the ARL (Association of Research Librar-
ies) about preservation of materials in their institutional repositories (Li and
Banach, 2011) also indicated that awareness of the importance of policies
about digital preservation had increased since a 2005 survey. Just over half of
the respondents to the 2011 survey indicated that they now have policies in
place. Specific strategies noted in this survey were backups (implemented by
93 per cent), storage in a secure system (76 per cent), checksums (63 per cent),
migration (50 per cent), refreshing (47 per cent), and emulation (7 per cent).
Many of these preservation processes were being handled within repository
software, DSpace being the most popular.
This approach must be extensible, since we cannot predict future changes, and it must
not require labor-intensive translation or examination of individual documents. It must
handle current and future documents of unknown type in a uniform way, while being
capable of evolving as necessary. Furthermore, it should allow flexible choices and
tradeoffs among priorities such as access, fidelity, and ease of document management
(Rothenberg, 1999b, p.30).
This is a tall order. Implicit in Rothenberg’s comments is the idea that there is
a single technical solution – he is well known for championing emulation as
this single solution – but others do not agree. Lavoie and Dempsey (2004) have
suggested that factors such as ‘cost, user preferences, nature of the material,
whether it exists in multiple forms’ must be taken into account when ascertaining
which preservation strategies are appropriate.
108 Overview of Digital Preservation Strategies
– its feasibility (is there software and/or hardware capable of doing it?),
– its sustainability (can it be done into the future, or can an alternative future path be
identified?),
– its practicality (can it be applied within reasonable limits of difficulty and expense?),
and
– its appropriateness (this criterion relates to the type of objects and why we are pre-
serving them) (Thibodeau, 2002, pp.15-16).
General

Factors                 Remarks
Maturity                Is the technology fully developed and are there
                        already systems in productive use?
Experience              Are there already verifiable experiences in applying
                        the technology for the preservation of similar objects?
Spread                  Is the technology widespread enough to guarantee
                        that it will be supported by the manufacturers during
                        the desired lifespan of the preservation system?
Standardization; open   Is the technology based on standards and are the
specifications          specifications of all the critical elements laid open
                        by the

Procedures

Factors                 Remarks
Staff for maintenance   Is the appropriately skilled work force in the right
                        number for the maintenance available?
Experience              Is there sufficient experience with the technology for
                        support in case of difficulties available? (in-house or
                        easily accessible in the region)
Workflow                Can the technology easily be implemented in the
                        preservation workflow or can the workflow be
                        adapted to the technology without major difficulties
                        or loss of efficiency?
Flexibility             Can the technology flexibly be implemented? Does it
Broader concerns
Attention to a more holistic approach to digital preservation is evident in the
increasing literature about broader concerns such as standards, planning, poli-
cies, and sustainability. Planning of digital preservation needs to take place at
all stages of digital preservation and needs to be based firmly in policy and
standards. Digital preservation operations need to be sustainable.
Standards
Planning
one of the central functions of this standard (see Chapter 5). Developing a data
management plan is increasingly required by funding bodies, who may specify
that data sharing, curation and preservation are a part of the plan for any
project for which funding is sought. Examples of funding bodies that require
data management plans are the National Science Foundation in the US
(www.nsf.gov/eng/general/dmp.jsp) and the Wellcome Trust
(www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Guidance-for-researchers/index.htm),
a major funder of medical research in the UK.
Planning should start, ideally, when digital materials are created. They
should be created using open file formats that are in widespread use, are well
supported, and are stable. This reduces the risk of file formats becoming obso-
lete and unusable in the future. Later stages in the digital preservation life-
cycle also require planning: ingesting material into a digital archive requires
planning to ensure that procedures are consistent; storing digital materials
demands that sustainable storage is planned; making digital materials accessible
to users requires planning so that ways of accessing digital objects that are
appropriate to user communities are implemented; planning to collect relevant
metadata is vital at all stages of digital preservation. These are only some of
the areas where planning is required.
Software tools to assist with planning digital preservation are now avail-
able. One example is Plato, an outcome of the Preservation and Long-Term
Access through Networked Services (Planets) project, now the OpenPlanets
Foundation (www.openplanetsfoundation.org). Plato supports the decision-
making process to determine which preservation actions best suit the digital
materials that are to be preserved. It is available as open-source software
(www.ifs.tuwien.ac.at/dp/plato/intro.html). Case studies describing implemen-
tations of Plato are available: examples are its use in preservation of the digital
collection of the Danish Folklore Archive (Chivers et al., 2010) and its integra-
tion into the EPrints digital repository software (Tarrant et al., 2010). Another
planning tool is DMP Online (dmponline.dcc.ac.uk). Developed by the Digital
Curation Centre, this software tool assists users to develop a preservation plan
that meets the requirements of funding bodies for a data management plan
(Donnelly, Jones and Pattenden-Fail, 2010).
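The kind of decision support that planning tools offer can be sketched as a simple weighted comparison of alternatives. The sketch below is not Plato's actual method or data model; it only shows one plausible way to rank candidate preservation actions. The criterion names are the four attributed to Thibodeau earlier in this chapter, while the weights, the 0 to 5 scale, the alternatives and every score are hypothetical values invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative importance; normalized inside rank_alternatives


def rank_alternatives(criteria, scores):
    """Rank alternatives by weighted average score, highest first.

    scores maps an alternative's name to {criterion name: score from 0 to 5}.
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    ranked = []
    for alternative, per_criterion in scores.items():
        value = sum(c.weight * per_criterion[c.name] for c in criteria)
        ranked.append((alternative, round(value / total_weight, 3)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical weights and scores, for illustration only.
criteria = [
    Criterion("feasibility", 3),
    Criterion("sustainability", 3),
    Criterion("practicality", 2),
    Criterion("appropriateness", 2),
]
scores = {
    "migrate to open format": {
        "feasibility": 4, "sustainability": 4,
        "practicality": 3, "appropriateness": 4,
    },
    "emulate original environment": {
        "feasibility": 3, "sustainability": 3,
        "practicality": 2, "appropriateness": 5,
    },
    "keep original hardware": {
        "feasibility": 4, "sustainability": 1,
        "practicality": 2, "appropriateness": 3,
    },
}
ranking = rank_alternatives(criteria, scores)
```

The value of such an exercise lies less in the final number than in forcing the weights and scores to be written down, so that stakeholders can inspect and challenge them.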
Policies
A principle that applies to all information services is that policies are necessary;
digital preservation is no different. Policies are a prerequisite for effective
digital preservation. They clearly state principles, values, and intentions, specify
how the policy is monitored and who is responsible for its maintenance, pro-
vide links to related policies, and indicate the process for reviewing the policy,
including frequency of review. The benefits of having policies about digital
Sustainability
impractical because ‘it requires multiple solutions (that is, one for every dis-
tinct category of equipment being preserved) rather than a single one’, and
writing emulator software is labour-intensive and costly. There is also, for the
first category, the major issue of technology obsolescence.
The second type concentrates primarily on data formats. Format migration
attempts to ensure that the data remains live by continually updating it so it can
be accessed using whatever technology is current. Format standardization con-
verts the data to a limited number of non-proprietary formats (for example,
TIFF for images, XML for text) and there is consequently no need to rely on
one technology to read them, as many kinds of software can do this. Both of
these categories minimize the need to rely on software and hardware that will
become obsolete, but both require high levels of resources to manage ongoing
migration projects or convert to standard formats. Persistent object preservation
attempts to separate data completely from the technology. Digital materials are
described in terms that are independent of software or hardware; an example is
electronic records converted to XML and then encapsulated with metadata.
Walton notes, however, that ‘its operational viability remains to be demon-
strated’ (Walton, 2003, pp.5-7).
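The example of records converted to XML and encapsulated with metadata can be made concrete with a small sketch. It wraps a bit-stream, a digest of that bit-stream, and descriptive metadata in a single XML document, and then unpacks it again. The element and attribute names (record, metadata, field, content) are hypothetical and do not follow any particular encapsulation standard; base64 and SHA-256 are likewise choices made for this illustration.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(content: bytes, metadata: dict) -> str:
    """Wrap content and descriptive metadata in one XML document."""
    record = ET.Element("record")  # hypothetical element names throughout
    meta = ET.SubElement(record, "metadata")
    for key, value in sorted(metadata.items()):
        ET.SubElement(meta, "field", name=key).text = value
    payload = ET.SubElement(record, "content", encoding="base64")
    payload.set("sha256", hashlib.sha256(content).hexdigest())
    payload.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(record, encoding="unicode")


def unpack(document: str):
    """Recover (content, metadata), checking the recorded digest."""
    record = ET.fromstring(document)
    metadata = {f.get("name"): f.text or "" for f in record.find("metadata")}
    payload = record.find("content")
    content = base64.b64decode(payload.text or "")
    if hashlib.sha256(content).hexdigest() != payload.get("sha256"):
        raise ValueError("content does not match its recorded digest")
    return content, metadata
```

Because the package is plain text with self-describing tags, it can be read without the software that created the original record, which is the point of separating data from technology.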
Walton’s approach is based on distinguishing between data preservation
and maintaining the technology. Thibodeau extends this approach. He describes
a ‘preservation spectrum’, with ‘preserve technology’ at one end and ‘preserve
objects’ at the other. On the ‘preserve technology’ end of the spectrum are
methods that attempt to keep data in specific logical or physical formats and to
use technology originally associated with those formats to access the data and
reproduce the objects. In the middle of the spectrum are methods that migrate
data formats as technology changes, enabling use of state-of-the-art technology
for discovery, access, and reproduction. On the ‘preserve objects’ end of the
spectrum are methods that focus on preserving essential characteristics of
objects that are defined explicitly and independently of specific hardware or
software (Thibodeau, 2002). But the suitability of a method for the material
being preserved also needs to be taken into account. Because some methods
are general and others apply only to specific technologies, Thibodeau proposes
that the method’s suitability (which he designates as applicability) needs also
to be accommodated.
Rothenberg takes a different tack. His classification is based on the extent
to which strategies and practices are complete. His ‘overview of proposed
approaches to preservation’ (Rothenberg, 2003) has three categories: non-
solutions, partial solutions, and potentially complete solutions. Non-solutions
include ‘do nothing’ and digital archaeology. Partial solutions form a longer
list:
data to a human readable form on a stable carrier such as paper, film or metal.
Maintaining the original format is the most popular category, with four strate-
gies: ‘Making the data ‘self-describing’ and ‘self-sustaining’ by packaging it
with metadata and with links to software that will continue to provide access
for some time’; ‘Maintaining the data in its original form … and providing
tools that will re-present it … using the original software and hardware … or
using new software that emulates the behaviour of the original software and/or
hardware’; ‘Maintaining the data and providing new presentation software
(viewers) that will render an acceptable presentation of it for each new operating
environment’; ‘Providing specifications for emulating the original means of
access on a theoretical intermediate computer platform, as a bridge to later
emulation in a future operating environment’. Standardizing formats includes
either ‘creating data in, or converting data to, a highly standardised form of
encoding and/or document structure (or file format) that will continue to be
widely recognised by computer systems for a long time’, or ‘converting the
data to a format where the means of access will be easier to find’. Migration
involves converting the data to new formats that can be read by each new
technology; it also includes providing the ability for migration on demand ‘by
maintaining the data and recording enough information about it to allow a future
user or manager to convert it to a then-readable form’ (UNESCO, 2003, p.121).
Finally, preservation programmes, these guidelines suggest, must be organiza-
tionally viable and financially sustainable, if they are to be effective and credible
(UNESCO, 2003, p.42).
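Migration on demand, as described in the guidelines quoted above, keeps the original data untouched and records enough about its format for conversion to happen only when access is requested. The following sketch assumes a small registry of converters keyed by source and target format. The format labels, the StoredObject class and the access function are hypothetical names, and the two text-encoding converters stand in for the far more complex format conversions a real programme would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]


@dataclass(frozen=True)
class StoredObject:
    data: bytes       # original bit-stream, never modified
    format_id: str    # recorded format description (hypothetical labels)


# Registry of known conversions; new entries are added as formats age.
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("text/latin-1", "text/utf-8"):
        lambda raw: raw.decode("latin-1").encode("utf-8"),
    ("text/cp437", "text/utf-8"):
        lambda raw: raw.decode("cp437").encode("utf-8"),
}


def access(obj: StoredObject, wanted: str) -> bytes:
    """Return the object in the wanted format, converting at access time."""
    if obj.format_id == wanted:
        return obj.data
    try:
        convert = CONVERTERS[(obj.format_id, wanted)]
    except KeyError:
        raise LookupError(
            f"no converter from {obj.format_id} to {wanted}"
        ) from None
    return convert(obj.data)
```

The design choice here is that the archive stores only the original and the format record; if a better converter is written later, every object benefits without having been migrated in the meantime.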
This classification is noteworthy because of its emphasis on combining
strategies and practices. In the absence of a single, universally applicable solution
(which is unlikely ever to be developed), these guidelines support the applica-
tion of multiple strategies in preservation programmes. These will likely be
based on the use of standards for encoding, structuring and describing digital
objects, emulation of obsolete software or hardware, and migration of digital
objects (UNESCO, 2003, pp.120-121). Whatever we might consider as main-
stream approaches – the consensus is that these are normalization, migration,
technology preservation, and emulation – they are not mutually exclusive, and
will be deployed according to the kind of digital materials to which they are
applied and their access requirements.
None of the typologies described here is complete. They do not, for in-
stance, note the principle of redundancy (multiple copies at multiple locations),
best exemplified by the LOCKSS programme. However, they allow us to iden-
tify the principles that are most important and to make judgments about the
applicability and viability of specific strategies and practices in the longer
term. For example, they help us distinguish more clearly between ‘preserve
technology’ and ‘preserve objects’ approaches, they encourage us to consider
the nature of the digital object and the different requirements that each may
have, with consequent different ‘best’ strategies and technologies for their
preservation, and they suggest that the tools we have available to us at present
should be applied in combination.
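The principle of redundancy mentioned above (multiple copies at multiple locations) can be sketched as a comparison of digests across replicas, with a damaged copy repaired from the version held by the majority. This is a deliberately simplified illustration and is not the LOCKSS protocol, which uses a more elaborate polling scheme among peers. The function names, the location labels and the strict-majority rule are assumptions made for the sketch.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(data: bytes) -> str:
    """SHA-256 hex digest of one copy."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Compare the copies held at several locations.

    Returns the digest held by a strict majority of locations (None if no
    majority exists) and the sorted locations that disagree with it.
    """
    digests = {location: digest(data) for location, data in replicas.items()}
    if not digests:
        return None, []
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None, sorted(digests)
    return winner, sorted(loc for loc, d in digests.items() if d != winner)


def repair(replicas: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return the replicas with dissenting copies replaced by the majority copy."""
    winner, damaged = audit_replicas(replicas)
    if winner is None:
        raise ValueError("no majority copy to repair from")
    good = next(data for data in replicas.values() if digest(data) == winner)
    return {
        loc: (good if loc in damaged else data)
        for loc, data in replicas.items()
    }
```

With only two copies a disagreement cannot be resolved by voting, which is one reason redundancy schemes tend to favour three or more independent locations.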
Conclusion
Lavoie and Dempsey have noted that digital preservation is ‘a set of agreed
outcomes’ based on considerations such as the complexity of digital materials
and the features that we collectively decide should be preserved. These consid-
erations, they suggest, require that ‘the choice of preservation strategy will
need to reflect a consensus of all stakeholders associated with the archived
digital materials. Achieving such a consensus is difficult, and in some circum-
stances, impossible’ (Lavoie and Dempsey, 2004).
How, then, are we to determine which strategies are best if we do not have
this consensus to guide us? Lavoie and Dempsey continue:
A second-best solution is for the digital repository to articulate clearly what outcomes
can be expected from the preservation process. These outcomes should in turn be under-
stood and validated by stakeholders. Communication between the repository and stake-
holders, either to promote consensus on preservation outcomes, or for the repository to
disclose and explain its preservation policies, mitigates the risk that the repository’s
commitments are misaligned with stakeholder expectations (Lavoie and Dempsey, 2004).
serve object’ (noted earlier in this chapter), to which are added some aspects of
Rothenberg’s typology (also noted earlier in this chapter). They describe spe-
cific principles, strategies and practices. Chapter 7 examines approaches based
on preserving the technology, and Chapter 8 examines approaches based on
preserving the digital object.
Chapter 7
‘Preserve Technology’ Approaches: Tried and Tested
Methods
Introduction
The need to preserve digital materials runs counter
to the market ethos of the computing industry, which
requires high turn-over of hardware and software in
order to survive financially. This outlook necessitates
rapid changes in formats and functionality and an
unwillingness to support ‘obsolete’ technology, all of
which make it harder and harder to preserve access
to digital materials (Heazlewood, 2002)
The previous chapter presented a long list of possible strategies that are avail-
able or under consideration for digital preservation. It noted that they are still
developing, with no universally accepted solution in sight. The previous chap-
ter also noted some of the characteristics of digital preservation strategies and
practices and described some of the typologies proposed for their
categorization. This chapter and the next take as their basis Thibodeau’s
typology, described in Chapter 6, add to it some parts of Rothenberg’s typology,
also described in Chapter 6, and use the outcome to describe specific principles,
strategies and practices. The result is three categories:
– Do nothing
– Storage and handling practices
– Durable/persistent digital storage media
– Analogue backups
– Policy development (already noted in Chapter 6)
– Standards (also already noted in Chapter 6)
– Digital archaeology.
– Technology preservation
– Emulation.
The reader of this chapter and the next should keep firmly in mind that ‘Pre-
serve Technology’ and ‘Preserve Objects’, as noted in Chapter 6, are the two
ends of a spectrum of possibilities, not discrete points on that spectrum. There
are many points in between these two poles. It should also be understood that
there are ways of grouping the possible approaches. The grouping used in
Chapters 7 and 8 represents current thinking about digital preservation strate-
gies and practices, which will change. For instance, many of the approaches
described in this chapter – technology preservation, adherence to standards,
converting to stable analogue format, digital archaeology – are what the Digital
Preservation Coalition handbook refers to as secondary strategies, ‘those which
might be employed in the short to medium term either by the repository with
long-term preservation responsibility and/or by those with a more transient in-
terest in the materials’ (Digital Preservation Coalition, 2008, p.111). These
strategies are, in a sense, holding actions, buying time for digital materials while
longer-term strategies are developed, and practices that keep digital materials
viable are implemented. But, as is made clear in the Digital Preservation Coalition
handbook, these strategies and practices do not address the real issues of tech-
nology obsolescence; they only put off the need to make decisions.
This chapter draws heavily on two key sources: Preservation Management
of Digital Materials: A Handbook (Digital Preservation Coalition, 2008), and
the UNESCO Guidelines for the Preservation of Digital Heritage (2003). The
reader may wish to update these by referring to the numerous high quality
resources available on the web. Two useful starting points are the DCC web
site (www.dcc.ac.uk) and the Library of Congress’s Digital Preservation web
site (www.digitalpreservation.gov).
‘Non-solutions’
‘Non-solutions’ is one of three categories suggested by Rothenberg (2003) in
his overview of preservation approaches, the others being ‘partial solutions’,
and ‘potentially complete solutions’. This categorization is useful because it
emphasizes that some of the practices promoted as viable digital preservation
techniques, both in the past and at present, are not likely to achieve the aims
of digital preservation activities – that is, an assurance of long-term access
to authentic digital materials. Rothenberg’s examples of non-solutions are ‘do
nothing’ and digital archaeology. Leaving aside ‘do nothing’ for the moment,
these non-solutions are strategies and practices that are useful in the suite of
approaches to digital preservation, or are essential parts of the infrastructure
required for successful digital preservation, but they do not provide outcomes
that result in preserved digital materials over time.
Of the five non-solutions noted in this chapter, the first (do nothing) is the
least realistic option. Unlike non-digital materials, for which the principle of
benign neglect can have validity, digital materials typically require active inter-
vention right from the moment of their creation, if they are to survive, as has
been noted in more detail in Chapter 1. Two of the non-solutions (storage and
handling practices and durable/persistent digital storage media) provide a
breathing space, extending the life of digital storage media and, thereby, ensuring
that the digital materials stored on them as bit-streams remain in good condition
for longer. This potentially provides sufficient time to develop and implement
strategies and practices that are viable over the long term. Storage and handling
practices focus on environmental control, handling, building design, and secu-
rity, although we note here only the first two. The durable/persistent digital
storage media non-solution is based on the premise that developing improved
storage media – media that will last longer, store more, and remain accessible
for longer than current media – will provide greater economies and efficiencies
by reducing the frequency of copying and the number of media that are handled.
However, both of these approaches are non-solutions because they focus on
the media, avoiding the real issue of technological obsolescence. Proposing
them as anything more than an interim solution is a classic example of the in-
appropriate application of pre-digital paradigm preservation thinking to digital
materials.
Another of the non-solutions, analogue backups, has the problem of negat-
ing the essential advantages of information in digital form, such as improved
retrieval of the information content and ease of dissemination. (There is, of
course, the possibility of converting back to digital form material that has been
copied to analogue form, although this is obviously expensive and unlikely to
happen.) This approach is also referred to as analogue copying, output to per-
manent paper or microfilm, and sometimes as page image techniques and saving
page images of artifacts, although these last two can also refer to making digi-
tal page images, such as PDF files. This approach, too, is an example of the in-
appropriate application of pre-digital paradigm preservation thinking to digital
materials. Two more approaches included in the category of non-solutions
(policy development and standards) are in fact principles, following the
definition and use of this term in Chapter 6, rather than strategies or practices.
Attending to these provides a more favourable environment for successful
digital preservation. Policy development is a prerequisite for digital
preservation. Standards are also a prerequisite in that their development and
application provide the same benefits for digital preservation as they do for
any cooperative endeavour. Both are considered in more detail later in this
chapter.
Do nothing
The do nothing approach need not detain us long. As noted earlier in this chap-
ter and in more detail in previous chapters, especially Chapter 1, it is not an
option for digital materials. Doing nothing reduces to zero, in a very short
time, the possibility of preserving digital materials. One familiar example is
failure to monitor and respond to the deterioration of digital media (for example,
a diskette or a hard drive) and the consequent inaccessibility of any data it
might carry.
Some of the reasons for media deterioration were noted in Chapter 3. This sec-
tion notes techniques that are currently recommended as good practice for the
storage and handling of media, in order to prolong the length of time that the
data on them remain accessible. Application of these techniques assumes that
the hardware and software required for these media are still available. (The
sections ‘Technology preservation’ and ‘Emulation’ later in this chapter con-
sider what is necessary to support this assumption.)
Paying attention to the care and handling of digital storage media is a
suitable preservation strategy because it reduces risk. The risk of losing access to
digital objects stored on digital media can be reduced significantly, meaning
that although this strategy is short-term it is, nonetheless, worth committing
resources to. There are also cost factors to be considered; keeping the media
accessible for longer periods reduces the frequency with which media refresh-
ing or migration needs to occur and, therefore, the costs associated with these
activities. Storing digital media in appropriate environmental conditions (this
term refers to temperature, relative humidity, light and clean air levels) extends
the time the media will last. It also protects them to some extent against acci-
dental damage, as does appropriate handling. Of course, the need for attention
to the storage and handling of digital media is not unique to digital storage
media; it is fundamental to the preservation of other materials as well.
The National Library of Australia’s 1999 Draft Research Agenda for the
Preservation of Physical Format Digital Publications lists ‘what we already
know’, based on experience at the National Library of Australia and other in-
stitutions. One of its statements is particularly relevant to this section – that
‘magnetic media such as floppy disks have a relatively short useable life span,
so we have to slow down the rate of deterioration and/or move what we want
to preserve to a more stable carrier’. The comment is made that ‘we know
quite a lot about the conditions of storage and handling that will maximise the
useful life’ of these media (National Library of Australia, 1999). Optimum
environmental conditions for the storage of digital media are well documented.
Figure 7.1 summarizes a range of recommendations for magnetic tape, CD-
ROM, and DVDs; conditions for other digital storage media are noted by
Brown (2008c).
DVDs these procedures could include statements about handling the disks by
their centre hole and outer edge, not touching the disc surface, returning them
to storage cases immediately after use, marking or labeling disks in a non-
harmful way, not attempting to bend the disks, and cleaning them as seldom as
possible, and then using approved methods only (Byers, 2003, pp.vi,19,25-26;
Iraci, 2010, p.4).
Figure 7.2 summarizes recommendations derived from several sources
about the storage and handling of digital storage media. It is not exhaustive.
Storage areas        Keep free of smoke, dust, dirt and other contaminants
                     Store magnetic media away from strong magnetic fields
                     Keep cool, dry, dust-free, stable and secure
                     Minimize light levels, especially direct sunlight
                     Prohibit smoking and eating in the storage area
                     Minimize the threat from natural disasters
                     Provide appropriate enclosures for media to afford
                       additional protection
                     Store in appropriate conditions any non-digital
                       accompanying materials such as operating instructions
                       and codebooks
Monitoring           Monitor environmental conditions on a regular basis
                     Monitor media condition on a systematic basis
Handling practices   Handle with care
                     Minimize the handling and use of archival media
                     Establish guidelines and procedures for acclimatizing
                       media if moving them from significantly different
                       storage conditions
                     Do not place adhesive labels on optical disks
                     Follow recommended guidelines for labeling and marking
                     Document the contents of the media, when created, and
                       their frequency of use
Quality of media     Choose digital media with their longevity in mind
and equipment        Use high quality equipment
                     Keep access devices well maintained and clean
Disasters            Prepare for accidents
Figure 7.2: Storage and Handling of Digital Storage Media (From Digital Preservation
Coalition, 2008, pp.104,139; Howell, 2001, p.144; Iraci, 2010, p.4; Ross and
Gow, 1999, p.43)
Analogue backups
when we first started thinking about preserving web sites, and capturing them was at
that stage seen to be too difficult, I considered for a time just at least printing out all the
web pages that we could identify … on good colour printers and storing them as repre-
sentations of at least what was on the Web in 1996 … [but] I thought that was inappro-
priate as a form of preservation … because it had to be digital. Well in fact now I really
rue that decision because … we would have had at least some record of what was
happening in 1996 on the Web.
is applied to. For instance, there is a loss of functionality, such as the ability to
carry out calculations in spreadsheets, to search, or to make lossless copies.
Because of its resource requirements, it is only applicable to small-scale opera-
tions. For these reasons, making analogue copies is best considered as an interim
just-in-case strategy that is limited in its application to a narrow range of digital
materials – those for which loss of functionality is not important. Case studies
of institutions where microfilm is part of an integrated preservation programme
are provided by Brown et al. (2011).
security environments and is routinely used in these fields and others, such as
counter-terrorism, to locate data on digital storage media and authenticate
them so they meet standards of admissibility for legal purposes. This clearly
has strong resonances with the requirements to preserve authenticity in digital
preservation settings. Digital forensic techniques are based on capturing disk
images that are exact copies of information on digital media, including hidden
files and files that record changes to the information. This allows capturing of
contextual information that is often vital to preserving the authenticity of digital
materials. The methods and tools developed and used by digital forensics experts
are being adapted and applied in cultural heritage environments. Digital foren-
sics laboratories have been established at some larger archives and libraries,
including the Stanford University Libraries’ digital forensics laboratory
(lib.stanford.edu/digital-forensics) and BEAM (Bodleian Electronic Archives and
Manuscripts) (www.bodleian.ox.ac.uk/beam). The FIDO (Forensic Investigation
of Digital Objects) project (fido.cerch.kcl.ac.uk) is investigating the applica-
tion of digital forensics to archives in the university sector. Digital forensics
techniques are likely to become a standard part of the digital preservation toolkit.
Kirschenbaum, Ovenden and Redwine (2010) provide an excellent introduction
to the application of digital forensics to cultural heritage materials in digital
form.
Technology preservation
it is the most basic, and in some ways the most important first step in preserving access
if no other strategy is in place. If the hardware and software required for access are dis-
carded before other strategies are available, it may be effectively impossible to provide
later access without expensive and uncertain data recovery work (UNESCO, 2003,
p.131).
Technology watch
Emulation
The battle lines were drawn in the 1990s between migration and emulation as
the preservation strategy most likely to succeed. In the event neither has domi-
nated, as we learn to place less trust in a single-strategy salvation and to develop
ways of working and thinking that accommodate several approaches simulta-
neously.
Emulation is based on the principle that it is possible to imitate obsolete sys-
tems on current systems using software ‘that makes one technology behave as
another’ (UNESCO, 2003, p.140). For example, an obsolete operating system
can be imitated so that digital objects are rendered and programs executed
without any change to them on a current computer. This is instead of either
changing the digital objects so they can be read on a current system or locating
a working version of the obsolete operating system and the hardware on which
it runs. Emulation reduces the need to keep old hardware working.
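The principle can be illustrated with a toy example: a short program written for an obsolete machine is run, unchanged, by software on a current one. The machine below is entirely made up for this sketch (an 8-bit accumulator design with five instructions), and the emulate function is a hypothetical name; real emulators must reproduce far more of the original system, including timing, peripherals and display.

```python
def emulate(program, memory_size=8, max_steps=10_000):
    """Run a program for a made-up 8-bit accumulator machine.

    Hypothetical instruction set:
      ("LOADI", n)    put the constant n in the accumulator
      ("STORE", addr) copy the accumulator to memory[addr]
      ("ADD", addr)   add memory[addr] to the accumulator, wrapping at 8 bits
      ("OUT",)        append the accumulator to the output
      ("HALT",)       stop and return the output
    """
    memory = [0] * memory_size
    acc, pc, output = 0, 0, []
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "LOADI":
            acc = args[0] & 0xFF
        elif op == "STORE":
            memory[args[0]] = acc
        elif op == "ADD":
            # Reproduce the old machine's 8-bit overflow, not modern arithmetic.
            acc = (acc + memory[args[0]]) & 0xFF
        elif op == "OUT":
            output.append(acc)
        elif op == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise RuntimeError("step limit reached without HALT")


# The 'legacy' program is data; it is never rewritten for the new host.
legacy_program = [
    ("LOADI", 100),
    ("STORE", 0),
    ("LOADI", 200),
    ("ADD", 0),
    ("OUT",),
    ("HALT",),
]
result = emulate(legacy_program)  # [44], because 300 wraps around at 256
```

The result is 44 rather than 300 because the emulator imitates the behaviour of the imagined original hardware, quirks included. That fidelity to original behaviour is what distinguishes emulation from rewriting the program for a current system.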
Emulation is a well-established principle in the computing industry and
will be familiar to most computer users. Emulation can be of operating systems:
for example, Windows Virtual PC, which allows older versions of the Windows
operating system to be run in Windows 7 (www.microsoft.com/windows/virtual-pc),
and VMware Fusion (www.vmware.com/products/fusion), which allows
Windows to be run on an Apple computer running the Mac OS X operating
system. It can be emulation of hardware platforms: for example, Kaypro or
Apple II machines can be emulated on a PC; or of the computer chips
(visual6502.org). It can be emulation of software applications: for example, arcade
games emulated through the MAME project (mamedev.org). Printers are often
designed to emulate Hewlett-Packard printers. Terminal emulation is also
common, so that a PC, for example, can be used as a terminal connected to a
larger computer; these were once very common in mainframe computer
environments. Although emulation is often associated with computer games,
such as emulators that allow Sony PlayStation games to be played on a PC, it
is an essential part of all areas of computing. The web is a fruitful source of
information about emulators.
Both hardware emulation and software emulation have been experimented
with to determine their feasibility for preserving digital materials. Emulation
of hardware is attractive because the result is applicable to a wide range and
large volume of digital objects. The same widespread applicability applies to
emulation of operating systems. Emulation of software applications is less
favoured, because it is more limited in its use and the effort and level of skill
required to develop a complex piece of emulator software that can be used for
only a small number of digital materials may be too high for it to be an option
in most preservation situations.
Emulation has been investigated in several major projects, most notably
the CAMiLEON (Creative Archiving at Michigan and Leeds) project, which
tested available emulators and constructed an emulator for the BBC Domesday
Project. Among its conclusions was that ‘emulation is not necessarily superior
to migration for preserving the original look and feel of complex digital
objects’, although more research was needed, as this was a study of limited
scope (Hedstrom and Lampe, 2001). (The project’s web site
(www2.si.umich.edu/CAMILEON) provides more information.)
Project NEDLIB (Networked European Deposit Library) ran from 1998 to 2000
and was based at the Koninklijke Bibliotheek in The Hague. One of its activities
was to conduct an experiment using commercial emulation tools to investigate
the feasibility of emulation for digital preservation. This experiment, con-
ducted by Rothenberg, concluded that emulation should work in principle, but
that further investigation was required to demonstrate that it can also work in
practice (Rothenberg, 2000).
Jeff Rothenberg’s name is firmly linked with emulation as a digital preservation
strategy. He has been a strong proponent of emulation as the only digital
preservation strategy that is likely to be effective. His argument is that authenticity of a
digital object can only be ensured if it is run on the software with which it was
created, because it is unrealistic to expect software in the future to behave in
exactly the same way as the original. Rothenberg points out that all digital
materials depend on software and that many new kinds of digital materials are
‘inherently digital’ and ‘cannot be meaningfully represented as page images’
(Rothenberg, 2003). This means that preservation strategies such as saving
page images are of little use for the preservation of much digital material.
Emulation, he contends, is the only preservation approach that has multiple
advantages and capabilities, among them preserving executable digital objects
(objects in which software programs are embedded), providing a ‘single, con-
sistent way’ of preserving all kinds of digital materials, reducing the effort ex-
pended in preserving individual artifacts (except for the necessary effort of
copying the bit-stream onto new media), and minimizing the need to under-
stand record formats. Despite his strong advocacy of emulation, Rothenberg
suggests that a mixed strategy approach is most feasible, with emulation used
‘if original behavior is needed; or as a cheap backup, to preserve everything’
(Rothenberg, 2003).
Other authorities also suggest that emulation has potential. It is one of only
two primary strategies (those suitable for medium to long-term preservation of
digital materials) noted, the other being migration, in the Digital Preservation
Coalition’s handbook (2008, pp.111-114). This source suggests that it has the
advantage over other methods of recreating the look and feel and functionality
of the original digital material, as well as the potential for avoiding the high
costs associated with repeated migration. Emulation is judged to have good
prospects for preserving complex digital materials (Digital Preservation Coali-
tion, 2008, p.113). The UNESCO Guidelines note that emulation is already
well established and understood in computing, that many emulators already ex-
ist for a variety of hardware and software platforms, and that it has the potential
to ‘allow a range of digital objects to be recreated with full functionality, in-
cluding software objects, using the original, untransformed data stream in
combination with original preserved software’ (UNESCO, 2003, p.141).
Not all share Rothenberg’s enthusiasm. The arguments against emulation
form a list as long as the arguments in its favour (Digital Preservation Coalition,
2008, p.114; UNESCO, 2003, p.141). Chief among them is that emulation has
not been sufficiently tested in practice. Also high on the list of disadvantages is
the high cost of developing emulators, which may be greater than the costs of
repeated migration, because it requires high levels of expertise to write com-
plex software. There is some scepticism about the ability of emulation to do all
that is claimed, and it may not be possible to emulate fully all of the functionality
of the original, nor all of its look and feel. The lack of adequate documenta-
tion of hardware and software may frustrate emulation attempts. Copyright
issues associated with ownership of software code may impede emulator devel-
opment. Users may have problems in interacting with old applications, and
there may be a need to either migrate the emulators themselves or emulate
the emulators.
If emulation is to be applied, then the requirements are many and varied.
Appropriate expertise is, of course, essential. Documentation of the systems to
be emulated needs to be comprehensive and accurate. The emulation software
should be written in open-source code and should follow best industry prac-
tice, which includes thorough documentation.
Probably the most widely reported emulation project carried out for preser-
vation purposes to date is the BBC Domesday Project. Abbott (2003), Darling-
ton, Finney and Pearce (2003), Mellor (2003) and Domesday Reloaded (2011)
are just some of the reports on this project. The CAMiLEON web site
(www2.si.umich.edu/CAMILEON/domesday/domesday.html) is also a useful source.
The original Domesday Project, undertaken from 1984 to 1986, surveyed the
UK to celebrate the 900th birthday of the Domesday Book. It cost about £2.5
million and involved about one million school children from 14,000 British
schools. The resulting images and text were recorded on two 12-inch videodisks
that were accessed using an LV-ROM (LaserVision Read Only Memory)
player attached to a BBC Master computer with additional software and hard-
ware (Abbott, 2003, p.7). As part of its contribution to the project to preserve
the Domesday Project, the CAMiLEON Project developed an emulation of the
original Domesday system hardware.
The process of developing this emulation involved migrating the data files
from the videodisks to current media and developing software that emulates
the BBC Master computer and the laserdisk player (Mellor, 2003). Image files
were re-digitized from the original one-inch analogue videotapes (Darlington,
Finney and Pearce, 2003). The need to avoid the obsolescence of the emulation
software was kept in mind; ideally, the operation of the software should not be
limited to any specific operating system or type of computer so that it will be
easier to run on future computers. In the end, however, the BBC Domesday
emulator that was developed ran only on the Windows operating system (Mellor,
2003, p.8). A separate initiative resulted in a web version that was available
until 2008 (domesday1986.com). In 2011 the BBC made available a full extraction
of the community disk on the Domesday Reloaded web site
(www.bbc.co.uk/history/domesday). A second disk, the national disk, is not available, but
has been handed over to The National Archives for long-term preservation.
Significant lessons were learned from the BBC Domesday Project. This
emulation project was hampered by lack of documentation and software to test
the emulation, but was fortunate in that the original system was still available
and functioning. This allowed the developers to ‘compare with and validate the
migrated system’, which has special significance in a multimedia system ‘where
the look-and-feel and user interaction is important’ (Darlington, Finney and
Pearce, 2003). Wheatley, one of the team who worked on this project, summa-
rizes the issues:
Most of the really difficult problems we faced were due to the long time gap between the
creation of Domesday and its preservation. If we had conducted the rescue 10 years
earlier it would have been far easier. The timeliness of preservation work is a crucial
issue that Domesday really underlines. Would we be able to rescue Domesday if we left it
another 10 years? I’m sure we could, but it would be at far greater expense (Abbott, 2003,
p.10).
and portable, running on any computer platform that supports the Java Virtual
Machine.
There is still a need to allocate significant resources, probably through
collaborative action, to develop a range of emulators. Emulation is, however,
unlikely to be the single solution to digital preservation as has been advocated.
Nor is it likely to replace migration as a primary digital preservation strategy,
because emulators will themselves need to be migrated. Holdsworth and
Wheatley (2001) remind us that emulation
should not be over-sold as the answer to all digital preservation issues. It is just part of
the armoury necessary for defending our digital heritage against the ravages of time in
a world where innovation (and hence change) is highly prized.
There are plentiful signs that emulation will be used increasingly as more emu-
lators are developed for specific categories of digital materials, such as com-
plex digital materials, or for digital objects containing executable software that
will only run on particular hardware, or for digital materials that need to be
viewed in their original environment. Its use at the Nationaal Archief of the
Netherlands is described in a recent case study (Helwig, Roberts and Nimmo,
2010).
The simplicity of the Universal Virtual Computer ensures that it will be ‘rela-
tively easy to write an emulator for [it] in the future on the real machine being
used at that time’ (Lorie, 2002, p.1). The Universal Virtual Computer was
developed by the Koninklijke Bibliotheek and IBM who have implemented a
UVC tool that handles images in JPEG and GIF formats (Van Wijngaarden and
Oltmans, 2004). Van Der Hoeven, Van Diessen and Van der Meer (2005) describe
the tool and its operation at the Koninklijke Bibliotheek in more detail.
Conclusion
This chapter has described a number of strategies and practices currently applied
or being tested for digital preservation. They are all focused on the principle
that alteration of digital materials must be kept to a minimum and that the
technology (hardware and software) to access these materials is kept operational
or is emulated. Of the range of strategies and practices noted here, some are
interim measures and only one – emulation – is generally considered to be viable
for the long-term preservation of digital materials. The next chapter considers a
range of approaches from the other end of the preservation spectrum: ‘preserve
objects’.
Chapter 8
‘Preserve Objects’ Approaches: New Frontiers?
Introduction
Handling file formats is an essential part of a long-
term archive … most file formats fall out of fashion
within a few decades, and unless action is taken at an
early stage, many archived files will be incomprehen-
sible blocks of bits (Clausen, 2004)
The previous chapter and this chapter are structured around three categories of
preservation strategies and practices, derived from typologies of Thibodeau
and Rothenberg:
1. ‘Non-solutions’
2. ‘Preserve Technology’ approaches
3. ‘Preserve Objects’ approaches.
This chapter examines the third category, the ‘Preserve Objects’ approaches
which are at the opposite end of Thibodeau’s spectrum of digital preservation
methods from ‘Preserve Technology’ approaches. Like Chapter 7, it relies heav-
ily on two key sources: Preservation Management of Digital Materials: The
Handbook (Digital Preservation Coalition, 2008) and the UNESCO Guidelines
for the Preservation of Digital Heritage (UNESCO, 2003).
lation and restricting the range of formats to be managed fall into this category.
Other strategies noted in this chapter (backwards compatibility and version
migration, and migration) are short-term strategies, those ‘likely to work best
over the short-term only’. Two of the strategies and practices included in this
chapter, migration and using standards, especially standards for data structuring,
are ‘current front-runners as long-term strategies’ that have been shown to work
over time (UNESCO, 2003, pp.120-121).
Bit-stream copying
Refreshing
Whereas bit-stream copying is carried out so that a backup copy exists in case
there is a problem with the primary source files, refreshing takes a longer-term
view. It refers to copying data from one storage medium to another of the same
type, for example, from a DAT tape that is becoming unstable to a new DAT
tape. The bit-stream is not altered and well-established techniques, such as
check-sums, are applied to ensure that the data are copied accurately. As with
bit-stream copying, refreshing is ‘a necessary component of any successful digi-
tal preservation program’ but does not address issues of obsolescence (Kenney
et al., 2003).
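The checksum step can be illustrated with a minimal sketch in Python. It is not a description of any particular archive's procedure: it assumes that both the old and the new carrier are mounted as ordinary file systems, the function names sha256_of and refresh are invented for illustration, and SHA-256 is simply one widely available checksum algorithm.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target and confirm that the bit-stream is unchanged.

    Hypothetical helper: raises OSError if the checksums differ,
    otherwise returns the checksum so that it can be recorded.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise OSError(f"checksum mismatch: {source} -> {target}")
    return after
```

Recording the returned checksum alongside the copy allows the same comparison to be repeated at the next refresh. The sketch confirms only that the bits were copied accurately; it says nothing about whether the format can still be read.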
Replication
they are the property of an owner who, for commercial reasons, is not willing
to provide access to documentation about them, and who may require a fee to
be paid for their use. Because one of the essential requirements of nearly all
digital preservation strategies and techniques is a thorough understanding of
file formats, the lack of access to full documentation about proprietary formats
presents a major barrier. By comparison, the documentation for open formats,
those that are in the public domain, is much more accessible. Consequently,
open formats are considered more favourably for use in digital preservation
applications than proprietary formats. To illustrate this point, consider the num-
ber of text document formats that the word-processing software in Open Office
(an open-source office software suite) can open: in addition to the OpenDocument
formats (.odt, .ott, .oth, and .odm), it opens formats used by earlier versions of
Open Office (.sxw, .stw, and .sxg), as well as the following:
Figure 8.1 compares open and proprietary formats from the perspective of their
use in digital preservation.
Figure 8.1: Formats – Open and Proprietary (Source: PARBICA Toolkit Guideline 18:
Digital Preservation)
Four main responses to the issues posed by the proliferation and complexity of
file formats can be identified: file format registries; standardizing file formats;
restricting the range of file formats handled in digital preservation systems;
and developing archival file formats such as PDF/A.
One response to the issues presented by file formats is the establishment of file
format registries. These exist in the computer science arena and have some
applicability for digital preservation; one widely used example is the file
command in the UNIX operating system (Underwood, 2009). Specialized
registries have also been established for digital preservation purposes. They have
been developed to provide detailed and reliable information about file formats,
which are the key to many digital preservation activities. In a case study of file
formats that is still pertinent, Lawrence and his associates note that the risks
associated with migration, such as the differences between the source and tar-
get formats, are quantifiable, so they can be managed (Lawrence et al., 2000,
pp.12-13).
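Format identification of this kind typically works by matching the first bytes of a file against known signatures. The following Python sketch shows the principle only. The signature table is a small hand-picked sample of well-known signatures, not the database used by the file command or by any registry, and the function name identify is an assumption made for illustration.

```python
# A few well-known leading byte signatures ("magic numbers").
# This table is illustrative and far from complete.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Return a format name for the leading bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

A signature match identifies a likely format but does not show that the file is valid or well formed. Formats that lack a distinctive signature, such as plain text, cannot be identified this way at all.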
The kinds of file format problems that can arise as a result of migration are
illustrated in a report on missing fonts (Brown and Woods, 2011). Some soft-
ware performs font substitution without warning; Brown and Woods begin
their account with the common example of ‘PowerPoint presentations where
the slides were clearly missing glyphs (visible characters) or were otherwise
poorly rendered … the direct result of copying the presentation from the machine
upon which it was created to a machine provided for the presentation without
ensuring that the target machine has the required fonts’ (Brown and Woods,
2011, p.6). They estimate that only 79 per cent of digital documents are rendered
accurately using the fonts on a modern desktop computer; even if this can be
increased to 92 per cent without too much additional effort, that still leaves a
significant percentage that does not render accurately.
Risks of this kind can be minimized if specifications of file formats are
available. The experiments reported by Lawrence and his associates found that
‘the most difficult aspect of this project was the acquisition of complete and
reliable file format specifications’ (Lawrence et al., 2000, p.13). They concluded
that publicly available information about file formats is vulnerable because it
relies on the efforts of interested individuals, and so noted that it was essential to
establish ‘reliable, sustained repositories of file format specifications, docu-
mentation, and related software’ to support migration (Lawrence et al., 2000,
p.13). The Representation and Rendering Project at the University of Leeds
reached the same conclusions; accessible file format information fell ‘far short
of what is required to successfully tackle the problems of data obsolescence’
and the accuracy of most of what was available is ‘mediocre at best’ (Univer-
sity of Leeds, 2003, p.42). It specifically recommended that, because file for-
mat information available on the web is vulnerable, it should, as a matter of
urgency, be captured and preserved in a file format registry.
There are now a number of file format registries for digital preservation
and work to develop them continues. They include PRONOM, JHOVE and
JHOVE2, the Global Digital Formats Registry and the Unified Digital Formats
Registry, the DCC/CASPAR Representation Information Registry, and a listing
provided by the Library of Congress.
PRONOM is one file format registry set up specifically to support digital
preservation. It was developed by The National Archives (UK) to provide and
manage information about file formats and about the software applications used
to render these formats. PRONOM is a response to the short life cycle of soft-
ware and to the fact that older file formats are not necessarily supported by later
versions of software, or, if they are, only with alteration of formatting or content,
a situation inimical to the faithful reproduction of electronic records. Originally
developed as an in-house tool to support the National Archives’ digital preserva-
tion activities, it was quickly made publicly accessible on the web. PRONOM
makes publicly available the specifications of file formats and their related soft-
ware, and allows searching for file format, software, vendor, and migration
pathways among other options. Associated with PRONOM is the software tool
DROID (Digital Record Object Identification). DROID was developed by The
National Archives (UK) so that searching PRONOM for file formats could be
automated. It is open-source and, because it is written in Java, is platform-
independent. PRONOM is being developed further to support a wider range of
digital preservation processes. Its database of file formats and software is being
incorporated into the UDFR (Unified Digital Formats Registry). (This section is
based on the PRONOM web site (www.nationalarchives.gov.uk/pronom).)
JHOVE (JSTOR/Harvard Object Validation Environment) was developed
by JSTOR (www.jstor.org) and the Harvard University Library as a characterization
framework that provides information about file formats and confirms that files
in those formats are valid and well formed. It is written in Java, so it can run on a range of
operating systems that support Java. JHOVE2 (https://bitbucket.org/jhove2/main/wiki/Home),
developed by the California Digital Library, Portico and Stanford
University, and based on input from the international digital preservation
community, enhances JHOVE and expands the range of formats it can iden-
tify (Abrams, Morrissey and Cramer, 2009).
Another file format registry developed to support digital preservation is the
GDFR (Global Digital Format Registry) set up by US partners to be ‘a distrib-
uted and replicated registry of format information populated and vetted by
experts and enthusiasts world-wide’ (www.gdfr.info). The GDFR web site pro-
vides use cases that identify some potential situations where a format repository
would be used. These include assessing the risk associated with a digital for-
mat, collection audit, validating the ingest of a digital object with a new for-
mat, monitoring for technology obsolescence, identifying rendering conditions
for a digital object, determining the appropriate migration path for a digital
object, determining the format of a given digital object, and identifying the
migration pathway for a format. The GDFR is a partner in the UDFR (Unified
Digital Formats Registry), established in 2009 in conjunction with The National
Archives (UK) and other international partners. The UDFR will be based on
the PRONOM database of file formats and the use cases developed for the
GDFR. Expected to be operational in 2012, the UDFR will provide web access
to its content, an API (an application programming interface – software that
allows different software programs to communicate and interact with each
other) to allow interaction between the Registry and local repositories, the ability
to work with DROID, and other features (www.udfr.org). A use case for the
UDFR, explained by the Library of Congress’s Chief of Repository Develop-
ment, illustrates how it is envisaged to operate:
‘Say the archive of a famous writer was written with an obsolete program, such as
WordStar, which would need to be either rendered for use, or migrated to a more cur-
rent system.’ So, a decision would have to be made on which program or tool would be
used to extract the information from the archive. … ‘UDFR will provide the documen-
tation to help make the decisions, and be incorporated into the tools themselves to
make preservation format analysis and action easier’ (Manus, 2011b).
If ‘good’ file formats are used for creating digital materials, the difficulties of
preserving these digital materials will be minimized. This is the thinking behind
the digital preservation approach centered on encouraging document creation
in file formats that are most likely to be sustainable over the long term. Investi-
gations into the effectiveness of different formats for this purpose are ongoing.
One example of earlier research comes from the Nationaal Archief of the
Netherlands. In its search for ways to ensure sustained accessibility to authentic
archival records in the long term, the Digital Preservation Testbed at the
Nationaal Archief investigated the sustainability of different
record types (text documents, emails, spreadsheets, databases) in different file
formats – MS Word or WordPerfect for text documents, Outlook, Eudora,
Novell GroupWise, Hotmail, and Kmail for emails, MS Excel and Lotus for
spreadsheets, and MS Access and Oracle for databases. Together these account
for more than 90 per cent of Dutch Government records (Slats, 2004).
What are the characteristics of ‘good’ file formats? They are the formats
that are most likely to be viable for longer periods, and are most likely to be
open-source and widely available and supported. Many have written about
this, including Clausen, who tells us that most file formats ‘fall out of fashion
within a few decades’ (Clausen, 2004, p.23) and suggests criteria by which we
can predict the ongoing viability of a file format: openness criteria (for example,
whether there is an open, publicly available specification for the format); port-
ability criteria (for example, independent of hardware and operating systems);
and quality criteria (for example, robustness, simplicity) (Clausen, 2004,
pp.11-12). Another list of criteria is provided by The National Archives (UK)
(Brown, 2008a): ubiquity (how widely adopted); support (amount of current
software support); disclosure (availability of documentation); documentation
quality (how comprehensive, accurate, and understandable); stability (frequency
of change); ease of identification and validation (ability to accurately identify);
intellectual property rights (the fewer the better); metadata support (can
metadata be included); complexity (not overspecified); interoperability; viability
(has built-in error detection capability); and re-usability.
There is consensus in the literature on five main criteria:
important loss for some kinds of materials, it will be significant for others. In
this case another file format that retains formatting, such as PDF, could be
considered.
A point in favour of the strategy of requiring standard file formats is that it
is likely to slow the rate at which file formats become obsolete. If widely used
and supported file formats, or very basic formats such as ASCII, are selected,
then the software available to encode, decode and render these formats is
likely to be available for longer periods of time than those for less popular
formats. This strategy is not suitable for digital objects where selecting a dif-
ferent file format results in a loss of characteristics that are essential for under-
standing the object (UNESCO, 2003, p.123).
A strategy related to restricting the range of file formats in which files are
created is for the digital archive to restrict the range of file formats that it
agrees to receive and manage. The archive converts material it receives into
these formats, which are chosen for the extent to which they address require-
ments such as openness, portability, functionality, longevity, and preservability.
This practice has a long history in digital archives. Examples frequently cited
are the Florida Digital Archive, which indicates the formats to which it will
provide full preservation support (Florida Digital Archive, 2009), the UK Data
Archive, which notes optimal data formats for long-term preservation (UK
Data Archive, 2011?), and ICPSR (the Inter-University Consortium for Political
and Social Research archive), based at the University of Michigan in the US,
where digital files deposited are converted from their original format to a format
that has been determined by ICPSR to be a preservation format, and both the
original file and the normalized file are retained (ICPSR, 2007). The Wellcome
Library adds further criteria that take account of that library’s technical ability
to manage the formats over time (Thompson, 2010).
This strategy is most suitable where the digital archive is responsible for
digital materials that are uniform in structure and content. The archive may be
able to influence the creation of these materials. For example, a government
archives may specify and encourage, or require, that records created by its
agencies are in the standard formats and that it will only accept records in those
formats. Requirements for the successful operation of this strategy include
clear submission guidelines, effective data conversion processes (if data conver-
sion is needed), and quality control checking to a high standard (UNESCO,
2003, pp.128-129). Clausen proposes this strategy for web archiving. He
recommends that the file formats harvested and added to the web archive
should be constantly monitored so that action can be taken when new formats
become widespread and old formats fall out of favour. He also suggests that
the files received should be retained in their original format as well as in the
converted and migrated versions ‘to allow for higher-quality conversion or
emulation at a later stage’ (Clausen, 2004, pp.23-24).
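The monitoring that Clausen recommends can be approximated, very roughly, by comparing the mix of formats in successive harvests. The Python sketch below is an illustration under stated assumptions rather than Clausen's method: it uses file extensions as a stand-in for formats, which is a weak proxy, and the function names format_census and share_changes are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def format_census(names):
    """Count files by lower-cased extension ('' when there is none)."""
    return Counter(PurePosixPath(name).suffix.lower() for name in names)


def share_changes(earlier, later):
    """Return the change in each extension's share between two harvests.

    A positive value means the extension has become more common,
    a negative value that it is falling out of use.
    """
    before, after = format_census(earlier), format_census(later)
    total_before = sum(before.values()) or 1
    total_after = sum(after.values()) or 1
    return {
        ext: after[ext] / total_after - before[ext] / total_before
        for ext in set(before) | set(after)
    }
```

An archive could review such figures after each harvest and decide when a rising format needs a conversion pathway, or when a declining one should be migrated before tools for it disappear. In practice an identification tool would give more reliable results than extensions.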
The use of stable, open file formats has the same advantages as those noted
in the previous section. The file formats selected will adhere to the characteris-
tics of openness, portability, functionality, longevity, and preservability. In
particular, the strategy has the potential to use the resources available for digital
preservation in the most efficient way to ensure that resource requirements,
such as specific software and expertise, are minimized and that the need for
customized attention during migration processes is reduced. It is most suitable
for materials for which retention of content is essential but formatting and
other characteristics are less significant. While this strategy ‘reduces the range
of problems needing to be managed’ (UNESCO, 2003, p.128), it needs to be
flexible enough to respond to the emergence of new formats.
needed to display a document must be present in the file, and cannot refer
to external sources (for example, fonts). This means that many document
characteristics, such as audio content, video content, JavaScript, executable
files, and encryption must not be present. The PDF/A standard also mandates
the use of metadata to specified standards. Its viability for preservation was
examined at Ohio State University’s archives and library (Noonan, McCrory
and Black, 2010). Because it is still relatively new, PDF/A has not yet been
widely adopted, but it has the potential to be influential.
Another example is the image format JPEG 2000. It is being adopted by
cultural heritage institutions to replace uncompressed TIFF, but modifications
to its current specifications are needed before it should be widely adopted (van
der Knijff, 2011). An international community, including the JP2K-UK work-
ing group (wiki.opf-labs.org/display/JP2/Home), is collaborating on developing
JPEG 2000 to make it more effective for use in the heritage sector.
XML
XML (eXtensible Markup Language) is now considered an important archival
data format. XML is non-proprietary, being based on ISO 8879:1986 (a
standard for SGML, the Standard Generalized Markup Language) and is con-
sidered to promise long-term sustainability. XML is not, strictly speaking, a
file format; rather, it is a set of rules to describe data and documents, a mark-
up language in the same sense that HTML, widely used for web documents, is.
XML was specifically designed to be used independently of any hardware
platform and is supported by many open software applications.
A principal reason for XML’s favourable consideration for digital preser-
vation, in addition to its openness, is its stability and longevity. It has been
used since 1996, so there is a considerable amount of knowledge about it in the
IT industry, and it is well supported by open-source software applications. The
National Archives of Australia adopted XML for digital preservation because
‘even if the IT industry replaces XML with another data format in the future,
we will still be able to create our own XML tools for as long as we wish be-
cause all the information needed to construct XML tools is publicly available’
(Heslop, Davis and Wilson, 2002, p.18). The National Archives of Australia
converts digital records in proprietary data formats into equivalent formats in
XML. It has defined XML schemas for common formats (for instance JPEG
and PNG) and these are available on the National Archives web site. It has
developed open-source tools that convert digital materials in some proprietary
formats into XML. One of these tools is Xena, described in the ‘Encapsulation’
section later in this chapter.
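The general idea of re-expressing data in XML can be shown with a small Python sketch. It is an assumption-laden illustration and does not reproduce the schemas or tools of the National Archives of Australia: the element and attribute names (table, record, field, name) are invented, comma-separated text stands in for a source format, and the input is assumed to be regular, with a header row and the same number of columns in every row.

```python
import csv
import io
import xml.etree.ElementTree as ET


def table_to_xml(csv_text: str) -> str:
    """Re-express comma-separated text as a simple, self-describing XML document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("table")  # hypothetical element names throughout
    for row in reader:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            field = ET.SubElement(record, "field", {"name": column})
            field.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(table_to_xml("title,year\nDomesday,1986\n"))
```

Because each value is labelled in the markup itself, the result can be read with any generic XML parser, or by eye, without the software that created the original. This is the property that makes XML attractive for preservation.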
XML is used in digital preservation contexts in three main ways: for
characterization purposes, to express digital objects and/or associated data (such
Migration
We first need to establish what migration refers to in the context of digital
preservation. Is it format migration, software migration, or version migration?
In its simplest definition migration is ‘to copy data, or convert data, from one
technology to another, whether hardware or software, preserving the essential
characteristics of the data’ (Kenney et al., 2003). A commonly encountered
longer definition is that migration is ‘a set of organized tasks designed to
achieve the periodic transfer of digital materials from one hardware/software
configuration to another, or from one generation of computer technology to a
subsequent generation’. (The first use of this definition appears to have been in
the report of the Task Force on Archiving of Digital Information (1996).)
Another definition describes migration as ‘a means of overcoming technologi-
cal obsolescence by transferring digital resources from one hardware/software
generation to the next’, the purpose being ‘to preserve the intellectual content
of digital objects and to retain the ability for clients to retrieve, display, and
otherwise use them in the face of constantly changing technology’ (Digital
Preservation Coalition, 2008, p.26). Migration, then, addresses the problems
caused by obsolescence of technology (hardware and software) and file for-
mats so that the intellectual content of the digital objects migrated is preserved
and the ability for users to retrieve, display, and use them is retained.
The term migration is sometimes used in the same sense as refreshing
(defined earlier as ‘the copying of data onto new media’), but it is considerably
more than that. Refreshing is carried out because of concerns about obsoles-
cence of the physical carrier of data; migration is additionally concerned with
obsolescence of data formats and attempts to ensure that file formats remain
readable. Migration is included in this chapter, rather than in Chapter 7, because
it is based on ensuring that the digital materials remain accessible in whatever
technology is current. Migration places a high premium on preserving the
integrity of digital objects so that they remain unchanged when used and ren-
dered, but it must also be recognized that migration inevitably changes the
data that are migrated. This raises concerns about the authenticity of migrated
data.
There are many varieties of migration. Thibodeau, for example, notes simple
version migration, format standardization, typed object model conversion,
Rosetta Stones translation, and object interchange format (Thibodeau, 2002,
pp.23-24). The most commonly used migration process is simple version migra-
tion – migration within the same software product. Because commercial soft-
ware developers provide this kind of migration there is always the likelihood
that they will discontinue support when they update their products. An exam-
ple is Microsoft Word, where the most recent version works with earlier ver-
sions of Word, although the number of versions for which backwards compati-
bility is possible is limited by the manufacturer. Another migration process is
Wheatley suggests that the minimum migration approach has the advantages of
simplicity and ease of execution, and, although all functionality is lost, it pro-
vides a ‘cheap and reliable way of getting to a substantial amount of the intel-
lectual content’. It has potential for application as a ‘useful stop gap measure’
as long as the original bit-stream and documentation are also maintained.
No matter how it is defined, migration has been, with emulation, the prin-
cipal preservation technique applied to date. It has a long history and was noted
by one writer as ‘the only serious candidate thus far for preservation of large
scale archives’ (Granger, 2000). There is considerable expertise in migration
on the part of data administrators and computer centre administrators.
The literature of digital preservation includes some substantial studies of
migration. An early example is the report Preserving the Whole: A Two-Track
Approach to Rescuing Social Science Data and Metadata (Green, Dionne and
Dennis, 1999), a detailed case study of preserving data in obsolete column
binary format, and its associated documentation. This case study is based on
the experience of the data archiving community, in particular that of Yale Uni-
versity Library’s management of social science numeric data since 1972.
It identified that the best strategy for data in column binary format was to
convert them to ASCII, because this format is software-independent and also
preserved the original content. This approach did, however, require ‘a file-by-
file migration strategy’ that is time-consuming for large numbers of files
(Green, Dionne and Dennis, 1999, p.24). One conclusion of this study was that
the existence of detailed documentation about the obsolete column binary for-
mat meant that there were many options available to migrate this format to
others and also to read data in it (Green, Dionne and Dennis, 1999, p.[vi]). A
Dutch study of migration investigated the issues of large-scale migration, in
the order of 100 terabytes (Van Diessen and Van Rijnsoever, 2002). The quan-
tity of data posed major resource issues; writing them to optical storage media
with a capacity of about 5 gigabytes and a write speed of 4 megabytes per sec-
ond would take at least 290 days (Van Diessen and Van Rijnsoever, 2002, p.v).
This report developed ‘medium migration indicators’ which, when applied in
conjunction with ‘changes in the system architecture or load characteristics’,
can be used to ascertain when migration needs to be carried out. The report
also notes the actions required to manage media migration effectively in an
electronic deposit system (Van Diessen and Van Rijnsoever, 2002).
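The scale of that estimate can be checked with simple arithmetic. The sketch below assumes decimal units (1 terabyte = 10^12 bytes, 1 megabyte = 10^6 bytes) and a single writer running without pause; the function names are illustrative and are not taken from the report.

```python
import math

TB, GB, MB = 10**12, 10**9, 10**6  # decimal units (an assumption)
SECONDS_PER_DAY = 86_400


def write_days(total_bytes: float, bytes_per_second: float) -> float:
    """Days needed to write total_bytes at a sustained rate with one writer."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


def media_needed(total_bytes: float, capacity_bytes: float) -> int:
    """Number of storage media needed, rounded up to whole units."""
    return math.ceil(total_bytes / capacity_bytes)


days = write_days(100 * TB, 4 * MB)
discs = media_needed(100 * TB, 5 * GB)
print(f"{days:.1f} days of continuous writing on {discs} media")
```

Under these assumptions the pure writing time comes to a little over 289 days, spread across 20,000 media, which is close to the report's figure of at least 290 days before any time for handling the media is counted.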
Migration is frequently used together with normalization (restricting the
number of file formats). The preservation processes implemented at ICPSR,
referred to above, apply migration and normalization together as their primary
digital preservation strategies. Normalization limits the range of file formats to
be preserved ‘to ensure that the digital preservation load is manageable’ and
that file formats are in the most recent rather than obsolete versions. Related
material, such as data files and documentation files, are converted to file for-
mats ‘that are as close as possible to ASCII for text, or TIFF for images, to
enable preservation’. Digital content is regularly copied from older to newer
storage media. Systematically applied policies are in place to govern the migra-
tion processes (McGovern, 2007b). The Florida Digital Archive also uses nor-
malization and migration together in a migration strategy that aims to be ‘as
lossless as possible, and retain the content, appearance, and behaviors of the
source’ (Caplan, 2007, p.307). File formats handled through migration, selected
based on the assumption that most files from participating libraries will be
documentary files of text, image, or sound, as opposed to executable software,
are AIFF, AVI, JPEG, JP2, JPX, PDF, TIFF, WAVE, XML, XML DTD, QuickTime,
and plain text. Migration workflows and routines have been developed
for these formats. Files in other formats for which no migration routine has
been developed are preserved as bit-streams and may be migrated in the future
if the archive receives a request for access to them. The development of migra-
tion routines for new file formats requires considerable resources, so files
taken in by the archive are normalized to a file format that the archive does
support. For example, PDFs are converted to page image TIFFs during the
ingest process.
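The decision rule described for the Florida Digital Archive can be expressed as a small lookup. The following is a minimal sketch, not the archive's own software: the format list and the PDF-to-TIFF example come from the text above, while the names `Plan`, `plan_for`, `MIGRATABLE` and `NORMALIZE_AT_INGEST` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Formats for which a migration routine exists, as listed in the text.
MIGRATABLE = {
    "AIFF", "AVI", "JPEG", "JP2", "JPX", "PDF", "TIFF",
    "WAVE", "XML", "XML DTD", "QuickTime", "plain text",
}

# Normalization applied at ingest; only the PDF example is given in the text.
NORMALIZE_AT_INGEST = {"PDF": "TIFF"}


@dataclass(frozen=True)
class Plan:
    strategy: str                # "migration" or "bit-stream only"
    normalize_to: Optional[str]  # target format at ingest, if any


def plan_for(file_format: str) -> Plan:
    """Choose a preservation plan for a file of the given format."""
    if file_format in MIGRATABLE:
        return Plan("migration", NORMALIZE_AT_INGEST.get(file_format))
    # No migration routine yet: keep the bits and revisit on request.
    return Plan("bit-stream only", None)


print(plan_for("PDF"))
print(plan_for("EXE"))
```

Keeping the policy in data rather than in code means that adding a migration routine for a new format only requires a new table entry.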
Migration, as suggested above, is the preferred preservation strategy for most
digital archives. It is favoured for the reasons already noted – its long history
of use, and our experience with it, at least for some kinds of materials and
formats. Key criteria to consider when choosing an appropriate digital preser-
vation method or combination of methods are the nature of the data and digital
Encapsulation
Encapsulation has been on the digital preservation agenda for over two decades.
Encapsulation is a technique of ‘grouping together a digital resource and what-
ever is necessary to maintain access to it ... [including] metadata, software
viewers, and discrete files forming the digital resource’ (Digital Preservation
Coalition, 2008, p.117). In some cases, rather than including the actual soft-
ware that reads the data in the package, metadata that points to the software at
another location or to the software’s specifications is included. Encapsulation
is widely used; for example data, metadata and software are often wrapped in
XML metadata. This is the approach that has been taken in the VERS Project
at the Public Record Office Victoria, Australia (210.8.122.120/vers).
VERS accepts records in a range of formats, including text files, PDF, PDF/A,
JPEG, TIFF and MPEG, which are encapsulated in an XML wrapper with
metadata developed according to defined standards and authenticated using a
digital signature. The OAIS Reference Model’s Information Packages, described
in Chapter 5, are another example of encapsulation.
The advantages that the Digital Preservation Coalition’s Handbook notes
as being offered by encapsulation as a digital preservation strategy lie in the
way it groups together relevant information, so that ‘all supporting information
required for access is maintained as one entity’ (Digital Preservation Coalition,
2008, p.117). There is some skepticism, however, in the statement that ‘osten-
sibly, the grouping process lessens the likelihood that any critical component
necessary to decode and render a digital object will be lost’ (Kenney et al.,
2003). In theory, everything that is necessary to access the digital object, along
with the digital object itself, is maintained into the future, but this is perhaps a
risky approach to take because ‘the only test of an encapsulated specification is
at the point it is used to implement a rendering tool. The risk of missing vital
information in the specification seems to invalidate this approach’ (University
of Leeds, 2003, p.39).
Encapsulation by itself is not considered a viable digital preservation tech-
nique because it ‘does not really address the basic problem of technological
change’ (UNESCO, 2003, p.127). The software encapsulated will still become
obsolete. It is a key part of emulation, and is best considered as a strategy that
is either a part of or a prerequisite for other digital preservation approaches.
The UNESCO Guidelines suggest that encapsulation is best considered as ‘a
basic good practice for all objects’, one that ‘may facilitate other strategies’
(UNESCO, 2003, p.128).
Encapsulation is one of the functions performed in Xena (XML Electronic
Normalizing for Archives; xena.sourceforge.net), open-source software devel-
oped by the National Archives of Australia for its digital preservation pro-
gramme. Xena detects the file format of a digital object, normalizes that format
to an open format, provides some metadata, and wraps all this in an XML
wrapper to produce a file with the .Xena extension. Xena is written in Java and
runs on the Linux, Windows, and Mac OS X operating systems.
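The general idea of wrapping an object and its metadata in XML can be illustrated briefly. The sketch below is not Xena's file format and omits format detection and normalization altogether; the element names, the Base64 encoding of the payload and the SHA-256 checksum are all assumptions made for the example.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap(name: str, payload: bytes, fmt: str) -> bytes:
    """Wrap a payload and minimal metadata in a single XML document."""
    root = ET.Element("package")  # hypothetical element names throughout
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "name").text = name
    ET.SubElement(meta, "format").text = fmt
    ET.SubElement(meta, "sha256").text = hashlib.sha256(payload).hexdigest()
    content = ET.SubElement(root, "content", encoding="base64")
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="utf-8")


def unwrap(document: bytes) -> bytes:
    """Recover the payload, checking it against the recorded checksum."""
    root = ET.fromstring(document)
    payload = base64.b64decode(root.findtext("content") or "")
    if hashlib.sha256(payload).hexdigest() != root.findtext("metadata/sha256"):
        raise ValueError("payload does not match the recorded checksum")
    return payload


doc = wrap("letter.txt", b"hello", "plain text")
print(unwrap(doc))
```

Because the metadata travels inside the same file as the content, the two cannot be separated by accident, which is the benefit claimed for encapsulation.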
Storage
All digital preservation activities require reliable techniques for storing large
quantities of data. ‘Storing digital objects leads, if nothing else, to large byte-
counts’, noted Linden and his associates, pointing out in 2005 that individuals
routinely dealt with gigabytes of data for their personal archiving requirements
(Linden et al., 2005, p.3). Just five years later these quantities can be measured
in terabytes. Although digital storage systems can be considered as a basic part
of computer system management and, therefore, outside the scope of this book,
they are included here because one of the fundamental routines required by
digital preservation is the secure storage of large amounts of data. Concepts
and practices such as ensuring security of computer systems, redundancy,
RAID, networked storage and virtual storage are all part of the knowledge re-
quired to develop and maintain digital storage systems that will provide secure,
long-term storage of both the digital materials and associated metadata.
A storage system for digital preservation can be simple or very complex,
depending on what is required of it. At its most basic, it is little more than a
storage and backup system. At its most complex, it is a fully automated system
with sophisticated, robust safeguards built in. Typically it requires a high
financial outlay on hardware and software, and ongoing financial investment
in running, upgrading, and maintenance. Although the costs of hardware and
– The more copies, the safer. As the size of the data increases, the per-copy cost in-
creases, reducing the number of backup copies that can be afforded.
– The more independent the copies, the safer. As the size of the data increases, there
are fewer affordable storage technologies. Thus, the number of copies in the same
storage technology increases, decreasing the average level of independence.
– The more frequently the copies are audited, the safer. As the size of the data increases,
the time and cost needed for each audit to detect and repair damage increases, reduc-
ing their frequency (Rosenthal, 2010d).
Commercial data archiving services are available and have become big business.
They offer state-of-the-art data security and fully automated backup and restore
procedures. Some of them are specifically tailored to digital preservation; one
Combining principles, strategies, and practices
viewers and migration (of data and metadata encoded in XML) are combined
in the VERS strategy.
The concept of micro-services has been actively investigated for its rele-
vance to digital curation and services are being developed based on the con-
cept. This approach to developing infrastructure for digital preservation is based
on combining small services that perform a single preservation function into
one cohesive service. Each small service is self-contained and can be developed,
maintained and improved more easily than larger single software applications.
The small services can be combined to provide comprehensive preservation
environments. The micro-services approach has many advantages, such as ease
of updating or replacement of a component when it becomes outdated (Abrams
et al., 2010). The University of California Curation Center has adopted this
approach in developing its digital preservation infrastructure (www.cdlib.org/
services/uc3/curation).
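The micro-services idea can be sketched in a few lines. In the example below each service is a small function that performs one task and returns an enriched record, and a pipeline is simply a list of such functions. The service names and the record layout are hypothetical and are not drawn from the University of California Curation Center's software.

```python
import hashlib
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Service = Callable[[Record], Record]


def identify(record: Record) -> Record:
    """Guess a format from the file extension (a stand-in for a real tool)."""
    name = record["name"]
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else "unknown"
    return {**record, "format": ext}


def fixity(record: Record) -> Record:
    """Record a checksum of the content."""
    return {**record, "sha256": hashlib.sha256(record["data"]).hexdigest()}


def describe(record: Record) -> Record:
    """Record the size of the content in bytes."""
    return {**record, "size": len(record["data"])}


def run(services: List[Service], record: Record) -> Record:
    """Apply each service in turn; any one can be replaced independently."""
    for service in services:
        record = service(record)
    return record


result = run([identify, fixity, describe],
             {"name": "report.PDF", "data": b"%PDF-1.4"})
print(result["format"], result["size"])
```

Replacing an outdated component then amounts to substituting one function in the list, leaving the others untouched.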
Archivematica (archivematica.org) is also using the micro-services concept
to build its digital preservation system. This system aims to ‘reduce the cost
and technical complexity of deploying a comprehensive, interoperable digital
curation solution that is compliant with standards and best practices’ (Van
Garderen, 2010). Archivematica combines open-source software tools into an
integrated software suite that provides a digital preservation path from ingest
to access, compliant with the OAIS Reference Model. It uses best practice
metadata standards, such as METS, PREMIS and Dublin Core. Archivematica
is itself open-source and free. Examples of the software incorporated into
Archivematica include BagIt to package digital objects and metadata, FITS
(File Information Tool Set) for file format identification and validation,
ImageMagick for converting bitmap images, MD5 for checksum generation
and verification, and OpenOffice to normalize office documents.
Proprietary managed digital preservation systems are also available. They
aim to provide a single solution to digital preservation. Three systems of this
kind available in 2011 are Tessella’s Safety Deposit Box, Ex Libris Rosetta,
and OCLC’s Digital Archive. Tessella’s Safety Deposit Box
(www.digital-preservation.com/solution/safety-deposit-box) was developed by Tessella in
conjunction with The National Archives (UK) and provides a complete digital
archival solution. The Safety Deposit Box is described as flexible, accommodating
different user requirements in a variety of ways, including providing a
choice of providers of storage solutions and relational databases, and accommodating
an institution’s metadata requirements. Ex Libris Rosetta
(www.exlibrisgroup.com/category/RosettaOverview), developed in conjunction with
the National Library of New Zealand, is described as a flexible, highly scalable,
secure, and easily managed digital preservation system.
OCLC’s Digital Archive (www.oclc.org/digitalarchive) promises a secure, easily
managed, scalable storage environment for digital materials. OCLC emphasizes
the Digital Archive’s automated monitoring and reporting capabilities.
Conclusion
Chapters 7 and 8 provide an insight into the range of practices that constitute
the current digital preservation landscape. The section ‘Combining principles,
strategies and practices’ in this chapter has noted that the practices described in
both of these chapters are often combined and gives examples of their combi-
nation. Chapter 9 takes this further by noting a number of digital preserva-
tion activities that have been selected as examples to illustrate the range of
approaches being taken in preserving digital materials.
Chapter 9
Digital Preservation Initiatives and Collaborations
Introduction
Preservation is a problem domain that demands col-
laborative action (Ross, 2004)
This chapter first considers the theme of collaboration that is strikingly apparent
in many digital preservation initiatives. It then notes some ways in which these
initiatives have been categorized and describes a number of digital preservation
activities in those categories, structured first around geographical considera-
tions – international, regional, national or sectoral – and then subdivided into
services (projects actually carrying out digital preservation) and alliances (col-
laborations to develop, test and/or promote approaches to digital preservation).
Only a selection of initiatives and collaborations are covered here. Their de-
scription is intended to illustrate the range and nature of contemporary digital
preservation activities, rather than to provide a comprehensive listing. The
selection and structuring of initiatives and collaborations in this chapter are
intended to provide a framework to help us reflect usefully on the experience
gained through these initiatives.
Collaboration
Collaboration is firmly embedded in the digital preservation agenda and has been
so from the earliest days of its compilation.
Partnerships have always been important in the digital preservation community. From
the very beginning it was apparent that no one organization – whether library, govern-
ment or academic – could adequately archive, preserve and continue to provide access
to the digital material, even with stringent selection criteria (Hodge and Frangakis,
2004, p.63).
Certainly collaborative activities have become more and more evident at all
levels, and collaboration is now assumed to be one of the keys to effective
confrontation of what seem at times to be overwhelming challenges of digital
preservation. Speakers at a 2004 workshop noted that ‘one key theme running
throughout the day was the need for active collaboration at every level and
across sectoral and geographic boundaries. Speaker after speaker illustrated
how this collaboration was essential’ (Digital Preservation Coalition, 2004).
The topic of Article 11 of the UNESCO Charter on the Preservation of the
Digital Heritage (UNESCO, 2004) is ‘Partnerships and cooperation’ and the
UNESCO Guidelines devote a whole chapter to collaboration (UNESCO,
2003, chapter 11, ‘Working Together’). In 2011 research libraries ‘cannot
hope to build a stable cyberinfrastructure unless they work together in collabo-
rative units, investing in community-generated solutions rather than insisting
on building individualized workflows and systems’, note Walters and Skinner
(2011, p.6), who provide recommendations for making collaborative strategies
work and a range of examples of collaborative activities. ‘The question’, they
suggest, ‘is no longer whether, but rather how to collaborate’ (p.13). Increasing
collaboration at local, national, regional and international levels can be readily
observed and the rest of this chapter describes examples of collaboration.
The reasons for the prominence of collaborative action in digital preservation
are to be found in large part in the scale of the issues and uncertainty surrounding
how to address them. Because digital preservation is expensive and resources
are scarce, collaborative activities ‘can enhance the productive capacity of a
limited supply of digital preservation funds, by building shared resources, elimi-
nating redundancies, and exploiting economies of scale’ (Lavoie and Dempsey,
2004). The issues are similar across different kinds of organizations (such as
libraries and archives) and different sectors (for example, different scientific
disciplines) so that ‘it makes sense to capitalise on the potential benefits of
pooling expertise and experience’; there may also be pressure, for example by
funding agencies, to collaborate (Digital Preservation Coalition, 2008, p.48).
Another compelling reason to collaborate is uncertainty about where the
responsibility for preserving digital materials lies. There are many stakeholders
(noted in Chapter 2), none of whom can realistically expect to develop workable,
scalable solutions on their own. Academic authors are encouraged to collaborate
with libraries in archiving their own works in university library-based digital
repositories, for example. Journal publishers are collaborating with libraries or
library-based organizations such as JSTOR and LOCKSS to provide continuing
access to their publications. Morris in 2002 saw promise in ‘the way that organi-
sations from different parts of the information chain are beginning to work to-
gether to address some of the problems’ (Morris, 2002, p.130), cautioning that
the only way we can hope to find reasonable and scholarship-friendly solutions is to
work together. We have to make sure that there is close communication between the
plethora of initiatives in different parts of the world, so that none of us wastes time and
money re-inventing the wheel ... And we need to make sure that all the members of the
information chain – information creators, information users, and all the intermediaries
in between – are involved in the discussions and in creating the solutions (Morris,
2002, p.132).
Certainly one does not have to look far to identify collaborative activities. In
his 2002 survey of digital preservation activities, Beagrie observed that the
National Library of Australia ‘believes that international collaboration at many
levels is essential in digital preservation’ (Beagrie, 2003, p.18) and noted the
Library’s commitment to many international collaborations, including PADI
with its international advisory group, a Memorandum of Understanding with
the Digital Preservation Coalition, the Safekeeping Initiative with CLIR, the
National Library’s role in the Conference of Directors of National Libraries,
whose action plan includes aspects of digital preservation, and its participation
in working groups such as the OCLC/RLG working groups on preservation
metadata and on digital archive attributes (Beagrie, 2003, p.19). Current ex-
amples include the collaborative activities that ARL (Association of Research
Libraries) members are participating in (Walters and Skinner, 2011, pp.32-55),
the projects and partners of the US National Digital Information Infrastructure
and Preservation Program (NDIIPP, 2011, pp.47-56) and any one of the major
European Commission-funded digital preservation activities (CORDIS, 2011).
These are noted in more detail later in this chapter.
Typologies of digital preservation initiatives
denotes projects carrying out digital preservation, and alliances denotes col-
laborations whose aim is to develop, test and/or promote approaches to digital
preservation.) These categories, based on geography and the nature of the col-
laboration, required arbitrary distinctions to be made in 2004, when the first
edition of this book was being prepared. These distinctions are even more diffi-
cult to make in 2011, as many recently-established initiatives have characteristics
of more than one category. Although some of these programmes are catego-
rized in this chapter as national, or local, and so on, such is the collaborative
nature of digital preservation activities that their lessons are typically made
available to a wider audience and are keenly scrutinized. Despite its imperfec-
tions this typology is applied because it accommodates most current digital
preservation activities. Initiatives and programmes that have been described
elsewhere in this book are referred to only briefly in this chapter.
International services
– The Asian Tsunami Web Archive, a collection of web sites relating to the
December 2004 Tsunami disaster in Asia
– Hurricanes Katrina and Rita, web sites that record the devastation caused
by Hurricane Katrina and its aftermath
– The UK Central Government Web Archive: selected UK Government web
sites from 2003, collected for the National Archives (UK)
– Election 2002 Web Archive: almost 4,000 web sites relating to the 2002
US elections, collected for the Library of Congress
– September 11th: archived web sites relating to the events of 11 September
2001 in the US, collected for the Library of Congress
– Election 2000: web sites relating to the US elections held in 2000, com-
missioned by the Library of Congress
– Web Pioneers: web sites illustrating the early years of the internet.
DuraSpace (duraspace.org)
DuraSpace was established in 2009 as a not-for-profit organization committed
to ‘providing leadership and innovation in the development and deployment
of open technologies that promote durable, persistent access to digital data’
(duraspace.org/about.php). Collaboration with a wide range of stakeholders,
such as scientists, researchers, librarians, and data specialists, is intrinsic to its
operation.
DuraSpace currently supports and develops two open-source repository
applications widely used in digital preservation, Fedora and DSpace, and has
introduced a new technology, DuraCloud. As noted on the DuraSpace web site,
its values ‘are expressed in our organizational byline, “open technologies for
durable digital content”’ and Fedora, DSpace and DuraCloud support access
to digital materials over long periods of time.
The open-source repository applications Fedora and DSpace have their
origins in research programmes based originally at US universities. They are
currently well established as important applications for digital preservation, being
implemented throughout the world in a wide range of contexts. DuraCloud,
also open-source, is a storage service that uses cloud storage and cloud com-
puting for online backup of digital materials in multiple locations.
DSpace (www.dspace.org)
DSpace began life as a sectoral initiative, as an institutional repository for
material produced by faculty of MIT (Massachusetts Institute of Technology).
At an early stage in its development DSpace was made available to other insti-
tutions and has been adopted around the world. Developed jointly by MIT
Libraries and Hewlett-Packard, it was trialed within MIT from February 2002
and launched in September 2002. A significant characteristic of DSpace is that
it is an open-source system.
DSpace participants collaborated early in projects such as the DSpace@
Cambridge Project (www.dspace.cam.ac.uk). In 2003 the DSpace Federation
was established, its members all institutions that had implemented DSpace.
The intention of the DSpace Federation is to share ‘technical innovation,
content, and services’ and ‘to promote interoperability among institutional
repositories to support distributed services, virtual communities, virtual collec-
tions, and cataloging’. This happened through activities such as sharing in the
development and maintenance of the DSpace source code, and promoting the
DSpace service and interoperability of archival repositories. In 2007 the DSpace
Foundation was formed to support and develop DSpace (HP and MIT, 2007),
and in 2009 DSpace became part of DuraSpace.
From its beginnings DSpace was intended to support long-term preservation.
It ‘is committed to going beyond reliable file preservation to offer functional
preservation where files are kept accessible as technology formats, media, and
paradigms evolve over time for as many types of files as possible’ (DSpace,
2010?). Preservation is supported by such features as: automatic calculation
and verification of a checksum for each file uploaded; regular checking of
checksums to ensure file integrity; automatic identification of the file formats
of objects added to the repository; use of standards, including METS to main-
tain links between files and the Handle system to provide unique identifiers; and
maintaining the bit-stream of items added to the repository. DSpace recog-
nizes that faculty will create digital materials in a wide variety of formats that
support their own aims, and that repositories must, therefore, handle these
formats. It accepts all forms of digital materials and defines three levels of
preservation for file formats – supported, known, and unsupported. Bit-level
preservation is carried out for each of these levels. The functions of supported
file formats (those that are open and ‘archival’, such as TIFF, SGML, XML
and PDF) are preserved using either format migration or emulation techniques.
The future support of known file formats (popular proprietary formats such as
Microsoft Word and PowerPoint, Lotus 1-2-3, and WordPerfect) depends on
the likelihood that third-party format migration tools will be developed for
them. Functional preservation will not be applied to unsupported file formats
(those about which little is known, such as unique software programs). DSpace
assigns to material submitted ‘a unique identifier, stores provenance informa-
tion, maintains an auditable history and record of changes to the archive [and]
provides persistent storage’ (Sullivan et al., 2004; Pennock, 2006a).
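The checksum features mentioned above follow a simple pattern: a checksum is recorded when a file is deposited, and the file is later re-read and compared with the recorded value. The sketch below illustrates that pattern only. It is not DSpace code; the choice of SHA-256 and the names `checksum` and `audit` are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(registry: Dict[str, str], root: Path) -> List[str]:
    """Return the registered files that are missing or no longer match."""
    failures = []
    for relative, recorded in registry.items():
        target = root / relative
        if not target.is_file() or checksum(target) != recorded:
            failures.append(relative)
    return failures
```

At deposit the repository would store `checksum(path)` in the registry; running `audit` on a regular schedule then reports any file whose bit-stream has changed or disappeared.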
DSpace has been widely adopted, its web site listing over 1,100 instances
at 4 September 2011. It is influential in digital preservation because it provides
a framework in which academic libraries and archives can develop strategies
and practices in a collaborative international environment.
Fedora (fedora-commons.org)
Fedora (Flexible Extensible Digital Object and Repository Architecture) is an
open-source repository management system developed at Cornell University
and now in widespread use. Fedora’s suitability as the basis of a digital archive
was investigated by Tufts University and Yale University (Fedora, 2006).
They concluded that Fedora has value as ‘a preservation application, in conjunction
with “the appropriate people, infrastructure, policies, and procedures”’,
noting ‘its agnostic view towards file formats and object types enables it to
manage essentially any type of file’, and features such as its use of XML and
its ability to manage multiple bit streams for a single object (Fedora, 2006,
Section 4.1).
The FedoraCommons web site (www.fedora-commons.org/about/features)
indicates that Fedora’s key features include the ability to store and manage all
types of content and associated metadata, scalability (up to ‘millions of objects’),
web access and search facilities, a wide range of storage options, and a ‘Re-
builder Utility’ for disaster recovery and data migration. Fedora’s support for
preservation includes its use of open standards such as METS and XML, its
system architecture which accommodates the OAIS Reference Model, and its
ability to handle metadata from a range of sources (Pennock, 2006b).
More than 170 institutions were registered with Fedora installations at
4 September 2011. It has an international community-based development team.
DuraCloud (duracloud.org)
DuraCloud is a new service, incorporating open-source technology, developed
by DuraSpace and released in 2010. Its aim is to make the use of cloud services
straightforward for end-users, providing pay-for-use access to digital materials
and at the same time ensuring the durability of digital content. The DuraCloud
service uses a cloud server environment and multiple cloud storage providers,
one of them Amazon AWS. It was decided that writing open-source
code would encourage community involvement in software development and
the creation of new services to integrate with the DuraCloud system.
DuraCloud supports digital preservation in various ways, one of which is
its support of redundancy. DuraCloud maintains multiple copies of digital
materials with different cloud storage providers. Another preservation support
feature is its integrity checking, which allows users to check the integrity of
material they have stored in DuraCloud. Other preservation-related features
allow bulk handling of file conversions and easy synchronizing of files between
a local repository and cloud storage.
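The redundancy and integrity checking described above can be illustrated with a minimal sketch. It is not DuraCloud's code or API: the provider names, the in-memory `stores` dictionary, and the function names are hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
from typing import Dict


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def check_integrity(stores: Dict[str, bytes], expected: str) -> Dict[str, bool]:
    """Compare each provider's copy with the checksum recorded at deposit."""
    return {provider: checksum(copy) == expected
            for provider, copy in stores.items()}


# Hypothetical example: one file held by three storage providers.
original = b"annual report 2011"
recorded = checksum(original)  # checksum recorded when the file was deposited
stores = {
    "provider_a": original,
    "provider_b": original,
    "provider_c": b"annual report 2O11",  # a silently corrupted copy
}

report = check_integrity(stores, recorded)
damaged = sorted(p for p, ok in report.items() if not ok)
print(damaged)
```

Because intact copies remain with the other providers, a copy flagged as damaged could in principle be replaced from one of them.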
DuraCloud, only recently introduced at the time of writing in 2011, is
planning many further initiatives. They include developing new alliances and
funding models, working on system integration of DuraCloud with DSpace,
Fedora, ePrints and other open-source products, offering education and train-
ing, and developing a service for individual researchers.
178 Digital Preservation Initiatives and Collaborations
The above section on DuraSpace and its subsidiary applications and ser-
vices is based on Barton and Walker (2003), Greenan (2003), Smith (2003), Sul-
livan et al. (2004), Walters and Skinner (2011), and the web sites of DuraSpace
(duraspace.org), FedoraCommons (fedora-commons.org) and DSpace (www.
dspace.org).
LOCKSS (lockss.stanford.edu)
The LOCKSS (Lots Of Copies Keep Stuff Safe) project is based on the well-
established preservation principle of redundancy (keeping multiple copies as a
safeguard against loss). LOCKSS is significant in digital preservation terms
because it established the feasibility of replication and peer-to-peer polling using
standard personal computers.
LOCKSS was developed initially for the preservation of e-journals and has
expanded to be applicable to any Web-published content. It is based on open-
source software that harvests, stores and copies digital content using standard
desktop computers, and ensures accuracy of the digital material through peer-
to-peer polling. It is inexpensive because it does not require costly hardware,
the software is free, and relatively little technical administration is required.
The LOCKSS programme for preserving e-journals has expanded rapidly: the
more than 80 libraries and 50 publishers using the LOCKSS software in 2004
had grown to ‘about 200 LOCKSS boxes in libraries around the world’ in 2010
(Rosenthal, 2010e) and 470 publishers in 2011
(lockss.stanford.edu/lockss/Publishers_and_Titles).
The LOCKSS web site (lockss.stanford.edu) provides a detailed summary of
how the system works. An inexpensive personal computer running LOCKSS
software collects specified digital content, for which the publisher’s permis-
sion to collect has been secured, using a web crawler. It continually compares
the content collected with the same content collected by others in the network
and ensures that the content is identical. If changes in a file are detected, the
changed file is repaired from an intact copy. The LOCKSS software also allows
access to this content by authorized users and provides administrative func-
tions. In order to allow the LOCKSS crawler software access to their content,
publishers need to give permission to the LOCKSS system. The basis of the
preservation function of LOCKSS is the continual checking of digital content
against other copies and the repairing of discrepancies identified by comparing
copies through polling. LOCKSS is OAIS-compliant.
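The compare-and-repair cycle described above can be sketched in a few lines. This is a deliberately simplified illustration, not the LOCKSS polling protocol itself, which is considerably more elaborate: it assumes a simple majority vote over SHA-256 digests, and the peer names and the function `poll_and_repair` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """Return the SHA-256 digest of a copy as hexadecimal text."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(peers: Dict[str, bytes]) -> List[str]:
    """Hold one poll: find the majority version and repair dissenting peers.

    Returns the names of the peers whose copies were repaired.
    """
    votes = Counter(digest(content) for content in peers.values())
    winning_hash, count = votes.most_common(1)[0]
    if count <= len(peers) / 2:
        raise ValueError("no majority; cannot decide which copy is intact")
    intact = next(c for c in peers.values() if digest(c) == winning_hash)
    repaired = []
    for name, content in list(peers.items()):
        if digest(content) != winning_hash:
            peers[name] = intact  # repair from an intact copy
            repaired.append(name)
    return repaired


# Hypothetical network of four boxes holding the same article.
peers = {
    "library_a": b"article v1",
    "library_b": b"article v1",
    "library_c": b"article v1 (damaged)",
    "library_d": b"article v1",
}
repaired = poll_and_repair(peers)
print(repaired)
```

The sketch refuses to repair anything when no version holds a majority, since in that case it cannot tell which copy is intact.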
The LOCKSS concept and software have been applied widely elsewhere.
CLOCKSS (Controlled LOCKSS (www.clockss.org)) was established in 2006
by a consortium of publishers and libraries as an archive of electronic journals
no longer supported by any publisher (Reich and Rosenthal, 2009). LOCKSS
is used as the basis of PLNs (Private LOCKSS Networks), which have been
established by libraries and archives collaborating to preserve digital objects.
The operation of one LOCKSS PLN, the Alabama Digital Preservation Network
International initiatives and collaborations 179
error within the network. Three committees (content, preservation, and techni-
cal) provide guidance to MetaArchive.
MetaArchive is based on the principle of distributed digital preservation,
and to that end uses open-source LOCKSS software, which allows digital pres-
ervation to be carried out collaboratively at a series of geographically distributed
sites. Each member participates in a PLN (Private LOCKSS Network) and
operates a server in a secure preservation-dedicated network environment. This
ensures that materials are preserved in various ways: content is stored on at
least six servers in different geographic locations and maintained by different
systems administrators; content is constantly monitored for change, which if
detected is repaired; expertise and technical infrastructure are kept within
institutions rather than residing with outside vendors; and centralized expertise is
available to advise and assist all members of MetaArchive. MetaArchive inte-
grates well with other repository applications, including DSpace and Fedora. If
the MetaArchive Cooperative decides that a format type held in its network
needs to be migrated, all material in this format will be migrated and both the
original copies and the migrated copies preserved.
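The migration policy described above, in which both the originals and the migrated copies are kept, can be sketched as follows. This is an illustration only, not MetaArchive software: the `StoredFile` record, the `migrate_format` function, and the stand-in conversion are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StoredFile:
    name: str
    fmt: str
    data: bytes
    derived_from: str = ""  # name of the original, if this is a migrated copy


def migrate_format(holdings: List[StoredFile], old_fmt: str, new_fmt: str,
                   convert: Callable[[bytes], bytes]) -> List[StoredFile]:
    """Migrate every file in old_fmt, keeping the originals alongside the new copies."""
    migrated = [
        StoredFile(name=f"{f.name}.{new_fmt}", fmt=new_fmt,
                   data=convert(f.data), derived_from=f.name)
        for f in holdings if f.fmt == old_fmt
    ]
    return holdings + migrated


# Hypothetical holdings; bytes.upper stands in for a real format converter.
holdings = [
    StoredFile("map.tif", "tiff", b"image bytes"),
    StoredFile("thesis.pdf", "pdf", b"document bytes"),
]
result = migrate_format(holdings, "tiff", "jp2", bytes.upper)
print([f.name for f in result])
```

Recording `derived_from` on each migrated copy keeps the link back to its original, so either version can be consulted later.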
MetaArchive works with other preservation initiatives, such as NDIIPP, the
National Digital Stewardship Alliance (NDSA), and the Networked Digital
Library of Theses and Dissertations (NDLTD). It partners with other groups
(Chronopolis, Data-PASS, and the California Digital Library are examples) to
develop preservation technologies. The MetaArchive Cooperative has since
2007 been a programme of the Educopia Institute (www.educopia.org), whose
goal is to foster successful community-based cyberinfrastructure, rather than to
develop its own assets, thereby building knowledge and resources in the com-
munity. (This section is based on Halbert and Skinner (2008), Skinner and
Halbert (2009), Minor, Phillips and Schultz (2010), Walters and Skinner (2011,
pp.36-39) and the MetaArchive Cooperative web site (MetaArchive.org).)
International alliances
The examples of digital preservation services noted above are all based on
collaboration. Some international collaborations are different in that they are
intended primarily to conduct research, educate, inform and lobby, rather than
to establish services that actually preserve digital materials. The examples
noted in this section are UNESCO, PADI, OCLC, CAMiLEON, and the Inter-
national Internet Preservation Consortium. Another example, the InterPARES
Project, is noted in Chapter 5.
UNESCO (www.unesco.org)
UNESCO (United Nations Educational, Scientific and Cultural Organization)
has assumed a prominent role in promoting digital preservation. Its general
RLG
RLG, founded as the Research Libraries Group in 1974, was a not-for-profit
alliance of libraries, archives, museums, and historical societies with strong
research collections. It provided a mechanism for collaborative action on the
problems facing research collections. RLG merged with OCLC in 2006.
Preservation was always a major interest of RLG, as demonstrated by its
significant digital preservation activities, particularly in advocacy and raising
awareness and in standards development. RLG’s advocacy and awareness-
raising activities included publication from 1997 to 2007 of the electronic
journal RLG DigiNews. Perhaps its most influential activity in digital preserva-
tion was its participation, with the Commission on Preservation and Access, in
the Task Force on Archiving of Digital Information. The report of the Task
Force in 1996 (Task Force on Archiving of Digital Information, 1996) laid the
foundations for much subsequent work in digital preservation.
RLG’s role in standards development was similarly significant. It partici-
pated in 1998 in the development of the OAIS Reference Model and in many
working groups, including:
Chapter 5 notes in more detail the history of PREMIS and the role of OCLC
and RLG in it.
Other RLG collaborations included a ‘fruitful working relationship and
strategic partnership’ with JISC (Dale, 2004, p.20). RLG staff also contributed
to the advisory groups of many collaborative digital preservation activities,
such as Cedars and CAMiLEON. RLG was a founding member of the DPC.
(This section is based on Bellinger et al. (2004), Dale (2004) and the OCLC
web site (www.oclc.org).)
CAMiLEON (www.si.umich.edu/CAMILEON)
CAMiLEON (Creative Archiving at Michigan and Leeds), whose role in
developing emulation is described in Chapter 7, is noted here because it was a
significant example of an international digital preservation collaboration, in this
case a research collaboration. From 1999 to 2003 the University of Michigan (in
the US) and the University of Leeds (in the UK) combined forces to develop
and evaluate a range of techniques for long-term preservation of digital
materials. CAMiLEON’s reports and other publications, available on its web
site, remain valuable source material. Rusbridge considers its influence to have
been in its high-profile activities, especially the BBC Domesday project (see
Chapter 7), and its proof-of-concept of migration on request. CAMiLEON, he
suggests, ‘has given leadership, attracted public attention, advanced both
theory and practice, and highlighted many of the issues involved in digital
preservation today’ (Rusbridge, 2004, p.35).
International Internet Preservation Consortium (netpreserve.org)
The International Internet Preservation Consortium (IIPC) was formed in
2003, initially for three years. Its charter members were the national libraries
of Australia, Canada, Denmark, Finland, Iceland, Italy, Norway, and Sweden,
the British Library, the Library of Congress, and the Internet Archive, with
overall coordination by the Bibliothèque nationale de France. After the initial
three-year period had elapsed, membership expanded, numbering forty in 2011,
and coordination moved to the British Library. The consortium’s mission is
‘to acquire, preserve and make accessible knowledge and information from
the Internet for future generations everywhere, promoting global exchange
and international relations’, this mission being articulated by three goals:
– To enable the collection of a rich body of Internet content from around the world
to be preserved in a way that it can be archived, secured and accessed over time
– To foster the development and use of common tools, techniques and standards that
enable the creation of international archives
– To encourage and support national libraries everywhere to address Internet archiving
and preservation (netpreserve.org).
The research activities of the IIPC are conducted by working groups, of which
there were three in 2010: Harvesting, concerned with developing web harvesting
techniques; Access, focusing on understanding and defining user requirements
for access; and Preservation, concentrating on policy, practices and resources
to support the preservation of web archives. Earlier working groups tackled
other aspects of web archiving such as standards, researchers’ requirements,
metrics, and access tools. The IIPC’s research activities are disseminated
through reports available on its web site and at its regular meetings.
The IIPC has developed software tools for web archiving, such as Heritrix
(an archive-quality web crawler), DeepArc (software that extracts database
content to XML flat files), and the Web Curator Tool (a workflow manage-
ment tool). The IIPC aims to develop a toolkit of web archiving software that
is open-source and easy to install. It makes available from its web site software
tools recommended and used by its members. One of its principal activities has
been the development of the WARC file format, a container format that allows
one file to contain a large number of objects and associated metadata that
includes a record of actions taken to preserve the files. WARC was adopted as
an ISO standard, ISO 28500:2009 Information and documentation–WARC file
format, in 2009. (This section is based on information from the International
Internet Preservation Consortium’s web site (netpreserve.org) and from com-
ments made during the Archiving Web Resources International Conference,
Canberra, 9-11 November 2004.)
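The idea of a container format, in which one file holds many objects together with their metadata, can be illustrated with a simplified sketch of WARC-style records. It is not a complete or validated implementation of ISO 28500: only a handful of header fields are written, the function name `warc_record` and the example URI are hypothetical, and production archives would normally rely on established tools.

```python
import uuid
from datetime import datetime, timezone


def warc_record(record_type: str, target_uri: str, payload: bytes,
                content_type: str = "application/octet-stream") -> bytes:
    """Serialize one simplified record: version line, header fields,
    a blank line, the content block, then two line breaks."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


# One container holding a harvested object and a note about a preservation action.
uri = "http://example.org/page.html"  # hypothetical harvested resource
container = b"".join([
    warc_record("resource", uri, b"hello world", "text/html"),
    warc_record("metadata", uri, b"action: checksum verified", "text/plain"),
])
print(container.count(b"WARC/1.0\r\n"))
```

Because each record states its own length, a reader can step through the container record by record without parsing the content blocks.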
Regional initiatives and collaborations 185
Regional services
NEDLIB (www.kb.nl/hrd/dd/dd_projecten/projecten_nedlib-en.html)
NEDLIB (the Networked European Deposit Library) was a European Com-
mission-funded research project that was launched at the start of 1998 and
ended in January 2001. NEDLIB was a collaborative project of national librar-
ies, a national archive, IT organizations and publishers, led by the Koninklijke
Bibliotheek (National Library of the Netherlands), formed to explore the tech-
nical and managerial issues of building a networked European deposit library.
The web site of the Koninklijke Bibliotheek (the National Library of the Nether-
lands) hosts the publications and reports resulting from its activities, which
include Jeff Rothenberg’s report An Experiment in Using Emulation to Preserve
Digital Publications (Rothenberg, 2000). Its outcomes include a proof-of-
concept demonstrator of a deposit system for electronic publications, and guide-
lines for best practice. Its results have been further developed by the Koninklijke
Bibliotheek and IBM-Netherlands in the implementation of Koninklijke Biblio-
theek’s electronic deposit system, e-Depot (described below). The significance
of NEDLIB lies in the background work it did to develop and implement a
fully functioning electronic deposit system. (This section is based on Beagrie
(2003) and the NEDLIB web site (www.kb.nl/hrd/dd/dd_projecten/projecten_
nedlib-en.html).)
Regional alliances
ERPANET (www.erpanet.org)
ERPANET (Electronic Resource Preservation and Access Network) was
funded by the European Commission and the Swiss Government from 2001 to
2004 to ‘enhance the preservation of cultural and scientific digital objects
through raising awareness, providing access to experience, sharing policies
have already been noted in this chapter, such as ERPANET and DigitalPreser-
vationEurope, and others elsewhere in this book. Several of the projects are,
however, so significant that they require specific mention.
The CASPAR project (www.casparpreserves.eu) ran from 2006 to 2009,
working in the domains of science, cultural heritage, and creative arts. One of
its products was ‘a suite of flexible, sustainable, and interchangeable digital
preservation services’ (Lamb et al., 2009). The suite includes: a registry and
repository of representation information; a representation information toolbox
to assist with creating, maintaining, and reusing metadata; a preservation data
store; a digital rights manager; and an authenticity management tool. The Planets
Project (www.planets-project.eu), which ran from 2006 to 2010, also devel-
oped a suite of tools to apply at each stage of the digital preservation process.
Among these tools are: the Planets Core Registry of file formats; emulation
tools; the GRATE (Global Remote Access to Emulation Services) tool to pro-
vide access to emulators; the Planets Testbed, a mechanism for testing the
effectiveness of different preservation tools; and Plato, a decision-support tool
that assists with preservation workflow planning. Planets tools are being inte-
grated into digital preservation activities at the British Library, the Swiss
Federal Archives, the Koninklijke Bibliotheek and Nationaal Archief in the
Netherlands, Det Kongelige Bibliotek (National Library of Denmark), and the
Österreichische Nationalbibliothek (Austrian National Library) (Integrating
Planets, 2009). The activities of Planets are being continued by the Open Planets
Foundation (www.openplanetsfoundation.org).
The suites of tools developed by the CASPAR and Planets projects are
examples of the influential outcomes of many of the research and development
projects in digital preservation funded by the European Commission. CASPAR and
Planets are by no means the only significant examples of digital preservation
research and development projects funded by the European Commission.
Among others worthy of mention are Keeping Emulation Environments Portable
(KEEP) and SHAMAN. KEEP (www.keepproject.eu), funded from 2009 to
2011, is developing emulation tools and aims to improve understanding about
how emulation strategies can be integrated into digital archives. SHAMAN
(shaman-ip.eu/shaman), funded from 2007 to 2012, is building a digital preser-
vation framework using new technologies such as grid computing, virtualization
and distribution technologies with their associated tools. Its primary areas of
concern are scientific publishing and parliamentary archives, industrial design
and engineering, and scientific applications.
vice or alliance within only one country, the availability (as with some of the
international and regional initiatives and collaborations noted above) may have
since expanded beyond national borders. National services considered here are
the AHDS and the Florida Digital Archive. National alliances noted are the
Digital Curation Centre, the Digital Preservation Coalition, NDIIPP, the
NDSA, and HathiTrust.
National services
AHDS (www.ahds.ac.uk)
The Arts and Humanities Data Service (AHDS) was a federation of five data
archive services: AHDS Archaeology; AHDS History; AHDS Literature, Lan-
guage and Linguistics; AHDS Performing; and AHDS Visual Arts. The AHDS
was established in 1996 as an outcome of a JISC feasibility study which rec-
ommended and funded a centrally managed distributed service. It ceased op-
eration in 2008. Its aims were to ‘preserve the rapidly growing unpublished
primary digital research materials being generated within the higher education
arts and humanities community and beyond’ (Beagrie, 2001, p.222). It was
based on the already existing Oxford Text Archive (established in 1976 at the
University of Oxford) and the History Data Service (established in 1993 at the
University of Essex), as well as on three more recently established data ar-
chives, the Archaeology Data Service (at the University of York), the Perform-
ing Arts Data Service (at the University of Glasgow) and the Visual Arts Data
Service (at the Surrey Institute of Art and Design).
The holdings of the AHDS included electronic texts, databases, still im-
ages, audio, GIS data, geophysics data, and metadata sets. The AHDS estab-
lished and guided the overall policy for the management of each of its compo-
nent services, provided outreach services, set standards, and encouraged best
practice through the guides it compiled and distributed, such as Digitising His-
tory (Townsend, Chappell and Struijvé, 1999).
The AHDS was significant for digital preservation as an early example of
collaboration in data archiving, that is, a centrally managed distributed model.
After the withdrawal of funding for a national service in 2008 several services
continued: the Centre for e-Research at King’s College London, the Archaeology
Data Service at the University of York, the Oxford Text Archive at Oxford
University, the History Data Service at the University of Essex, and the Visual
Arts Data Service at the University for the Creative Arts. (This section is based
on Beagrie (2001) and information from the AHDS web site (www.ahds.ac.uk).)
National initiatives and collaborations 189
National alliances
DCC SCARP Project. (This section is based on Lord and Macdonald (2003),
Ross et al. (2004) and the Digital Curation Centre’s web site (www.dcc.ac.uk).)
Digital Preservation Coalition (www.dpconline.org)
The idea of a national digital preservation alliance was developed in 1999 at a
workshop on digital preservation held at the University of Warwick, which
recommended the establishment of a coalition to promote digital preservation
activities in the UK. To further the development of a coalition, JISC established
a Digital Preservation Focus in 2000 (Beagrie, 2001). The Digital Preservation
Coalition (DPC) was established in 2001 with seven founding members. Mem-
bership is open to all collectives and nonprofit organizations from all sectors in
the UK. The DPC had 26 members at January 2004, plus JISC. By 2011 its
membership had grown to 14 full members, 22 affiliate members, and 4 allied
organizations.
The DPC’s aim is ‘to secure the preservation of digital resources in the UK
and to work with others internationally to secure our global digital memory
and knowledge base’. To achieve this it has established goals which include:
In the DPC’s early years its activities included an advocacy campaign, dis-
semination and current awareness, forums and training workshops, a survey of
industry vendors, its Technology Watch, and surveying DPC members as part
of a UK-wide needs assessment exercise. Its activities in 2011 have expanded,
as indicated by the impressive array of activities and materials on its web site.
The DPC co-sponsors a Digital Preservation Award. Its collaborative activities
are not limited to the UK; for example, it lists as ‘Allied Organisations’ ICPSR,
the National Library of Australia, and the Library of Congress’s NDIIPP.
Advisory materials, many available publicly, include the Digital Preservation
Handbook (referred to frequently in this book), Technology Watch Reports,
and the What’s New in Digital Preservation newsletter.
The significance of the DPC for digital preservation lies in its active en-
couragement of digital preservation through advocacy and training activities,
which have been influential in the UK and are keenly observed in other
countries. (This section is based on Beagrie (2001), the DPC annual reports for
2002-03 and 2003-04, Jones (2004), Simpson (2004) and the DPC’s web site
(www.dpconline.org).)
NDIIPP (www.digitalpreservation.gov)
The US-based National Digital Information Infrastructure and Preservation
Program (NDIIPP) is led by the Library of Congress. Federal legislation estab-
lished this programme in December 2000 with US$100 million in funding.
NDIIPP’s goals are to develop a national digital collection and preservation
strategy, to work with other stakeholders to establish partnerships and form
networks, to help identify and preserve at-risk digital content, and to support
the development of tools, models, and methods for digital preservation. Collabo-
ration lies at the heart of NDIIPP, having been mandated by the legislation that
created the programme, which expected collaboration between the Library of
Congress, National Archives and Records Administration, National Library of
Medicine, National Agricultural Library, National Institute of Standards and
Technology, and other federal agencies, as well as non-federal organizations
and institutions. Since its establishment NDIIPP has sought participation from
a wide range of organizations.
Early activities of NDIIPP included convening stakeholder meetings in
2001, commissioning environmental scans, and presenting a plan to Congress
in 2003. NDIIPP commissioned a report on international digital preservation
activities for its own information, the initiatives surveyed being selected for
their relevance and interest to NDIIPP. In mid-2004 Version 2.0 of the Technical
Architecture for NDIIPP was released for review and comment and NDIIPP
entered into a partnership with the National Science Foundation to fund research
programmes in digital preservation. In September 2004 NDIIPP awarded
US$15 million to eight US consortia for three-year projects to identify, collect
and preserve digital materials within a nationwide digital preservation infra-
structure.
NDIIPP considered its achievements in its first five years to include devel-
oping a network of 50 to 75 partners, establishing a large archive of at-risk
content, and making recommendations to the US Congress about the long-term
governance of a national digital preservation programme. The first edition of
this book noted in 2005 that NDIIPP ‘is being keenly observed by digital preser-
vation interest groups throughout the world. With such large resources at its
command it stands a good chance of providing tools that will revolutionize
how digital preservation is carried out.’ This promise has definitely been met.
By 2011 NDIIPP’s activities have expanded considerably. The number of
partners has more than doubled and includes international partners. Its web site
features an impressive array of resources aimed at all levels of understanding
of digital preservation. As examples, it has provided funding for preserving
at-risk digital collections listed on the web site; it lists tools and services
(This section is based on Walters and Skinner (2011, pp.32-33) and on the
NDSA web site (www.digitalpreservation.gov/ndsa).)
HathiTrust (www.hathitrust.org)
HathiTrust is an alliance of universities, all except one in the US, whose mission
is ‘to contribute to the common good by collecting, organizing, preserving,
communicating, and sharing the record of human knowledge’. To do this it is
building a digital archive of library materials converted from print, strongly
improving access to these materials, preserving them, coordinating shared
storage strategies among libraries, developing sustainable cost models, and
creating a responsive technical framework (www.hathitrust.org/mission_goals).
HathiTrust began as an initiative by a cooperative of libraries that are
members of the Committee on Institutional Cooperation (CIC) and the Univer-
sity of California system, plus the University of Virginia, to make publicly
available the backup of the Google-digitized versions of books they owned. Its
aim was to archive and share the digitized collections of its members by preserv-
ing the intellectual content and, where possible, the materials’ exact appearance
and layout. It quickly expanded and at 4 September 2011 had as its members
three consortia (CIC, the Triangle Research Libraries Network, and the Uni-
versity of California) and 56 individual institutions. The focus is on preserving
and providing access to digitized book and journal content from partner libraries.
Longer term, HathiTrust’s objectives include expanding its focus to include
other types of digital materials, and engaging in research and development in
search of better discovery and preservation tools. Its growth has been impres-
sive. On 4 September 2011, the HathiTrust web site noted 9,554,741 total
Sectoral services
Cedars (www.webarchive.org.uk/ukwa/target/99695/source/search)
Cedars (CURL Exemplars in Digital Archives) was a collaborative research
project that ran from 1998 to 2002, funded by JISC and hosted at the Universi-
ties of Leeds, Oxford and Cambridge. It is included here because it was ‘an
important test-bed for digital preservation within research libraries in the higher
education sector’ (Beagrie, 2001, p.219). Its aim was to provide guidance to
others in the sector about best practice for digital preservation. Among its out-
comes are a preservation metadata schema and an archiving demonstrator pro-
ject based on OAIS. Some of the activities of Cedars are noted in Chapter 5.
The significance of Cedars for digital preservation lies in its reports (available
on the Cedars web site, as preserved by the UK Web Archive) and its devel-
opment of a prototype distributed archiving system. (This section is based on
Beagrie (2001) and the Cedars site (www.webarchive.org.uk/ukwa/target/99695/
source/search).)
Sectoral alliances
JISC (www.jisc.ac.uk)
JISC (Joint Information Systems Committee) represents institutions in the UK
higher and further education sector. Its mission is ‘to provide world-class leader-
ship in the innovative use of Information and Communications Technology to
support education, research and institutional effectiveness’ (www.jisc.ac.uk/
aboutus/strategy.aspx). JISC has energetically pursued digital preservation
activities since its early collaborations, such as the AHDS, Cedars, and
CAMiLEON (all referred to elsewhere in this chapter), and currently through
its funding of programmes and projects in the field. It has also provided sub-
stantial funding for the Digital Curation Centre.
Beagrie’s description of JISC’s Continuing Access and Digital Preservation
Strategy (one of the finalists in the inaugural Digital Preservation Award),
which ran from 2002 to 2005, notes that JISC had three major objectives in its
digital preservation activities:
– To establish and disseminate best practice and guidelines for digital preservation – a
major outcome was the publication of Preservation Management of Digital Materials:
A Handbook (Jones and Beagrie, 2001)
– To collaborate with other agencies worldwide – the DPC was the principal outcome
of this objective
– To develop a long-term digital preservation strategy relevant to the higher and further
education sector in the UK – the major outcome was JISC’s Continuing Access and
Digital Preservation Strategy and implementation plan (Beagrie, 2004).
JISC’s Continuing Access and Digital Preservation Strategy was influential. Its
activities included the provision of a wide range of resources on its web site,
for example internet resources, e-journals, e-prints, feasibility studies and risk
assessments that recommended actions for specific categories of digital mate-
rial of interest to JISC members. It lobbied successfully for a Digital Curation
Centre (described above). Beagrie notes other JISC achievements in progress-
ing digital preservation in the UK, including increased funding for digital preser-
vation activities, and partnerships such as with DPC and the UK Web Archiving
Consortium (Beagrie, 2004).
In 2011 JISC’s activities in this field come under the umbrella of digital
preservation and curation and are focused on keeping valuable and useful digital
material available for future use by scholars, researchers and other users. Seven
programmes are identified, including ‘Managing Research Data’, the ‘Reposi-
tories and Preservation Programme’, and ‘Digital Preservation and Records
Management’. Its many projects include research and development into archiv-
ing e-publications, archiving JISC-funded project web sites, tool development,
and policy studies. A list of these projects, some of which are noted elsewhere
in this book, is available on the JISC web site (www.jisc.ac.uk/whatwedo/
topics/digitalpreservation.aspx?page=1&filter=Projects).
JISC has been an important catalyst for digital preservation, not only in the
UK higher and further education sector, but also more widely in the UK and
internationally. (This section is based on Beagrie (2004) and on the JISC web site
(www.jisc.ac.uk, in particular, www.jisc.ac.uk/whatwedo/topics/digitalpreserva
tion.aspx).)
Conclusion
In describing this selection of digital preservation programmes and initiatives
the intention has been to illustrate the range and nature of digital preservation
activities, and to emphasize their increasingly collaborative nature. It is reas-
suring to note that many of the programmes and initiatives noted in the first
edition of this book, published in 2005, are still thriving. Some have formed
new alliances, combining with other programmes or initiatives, and many new
programmes and initiatives have emerged. Only a small selection of the many
that exist in 2011 can be provided in this book, which has inevitably omitted
mention of some significant examples. Investigating other examples may prompt
the reader to reflect on digital preservation activities and, in doing so, to derive
some value from them. Useful starting points to locate other digital preserva-
tion services and alliances are the ‘Partners’ and ‘Tools & Services’ sections of
the Library of Congress’s NDIIPP web site (www.digitalpreservation.gov) and
the DCC web site’s list of ‘Tools and Applications’
(www.dcc.ac.uk/resources/tools-and-applications).
The reader should be aware that these descriptions are of activities up to
the middle of 2011. Given the rapid rate of progress in the field of digital preser-
Introduction
The lack of awareness of digital preservation issues
by stakeholders, the lack of the necessary skill sets
to preserve digital materials, the lack of agreed in-
ternational approaches, a shortage of practical models
on which to base preservation practice, and a lack of
funding on an ongoing basis to address digital preser-
vation issues, all contribute to the problem (Beagrie,
2003, p.6)
This chapter identifies the key challenges that are likely to be the focus of digital
preservation activities in the future. Predicting futures is, of course, always a
risky business; in this attempt to do so the views of many experts are drawn
upon to derive a consensus about what the challenges will be.
In 2001 the key threats to digital continuity were listed, in an earlier ver-
sion of the Digital Preservation Coalition’s handbook, as:
Ten years later the key threats are the same, but new areas of concern have
appeared:
200 Challenges for the Future of Digital Preservation
Technological Risks
– Hardware and software, both proprietary and open source, can be a challenge to
maintain and keep current.
– Content formats can be complex and fragile. They are often not well documented
and frequently become obsolete.
– Lifecycle management risks such as data migration, file degradation (“bit rot”), or
unauthorized use can make content unusable.
Content Risks
– The volume or complexity of content makes it difficult to collect comprehensively.
– Insufficient description of content makes it challenging to discover or retrieve it
for use.
Organizational Risks
– Insufficient resources to maintain information can lead to content loss.
– Lines of authority and responsibility for maintaining digital content are often not
aligned with the demands of such content.
– Insufficient skilled personnel can prevent even routine best practices from being im-
plemented. (NDIIPP, 2011, p.12)
Although there is considerable commonality between these two lists, there are
also differences that indicate that our understandings and experience have
developed and suggest how our concerns have shifted over the decade. The
key tenet of obsolescence is a constant. Also present in both lists are concerns
about lack of skilled personnel, lack of sufficient funding, organizational struc-
tures that do not accommodate digital preservation, legal issues, and lack of
policy incentives to preserve and agreements about whose responsibility it is to
preserve. The 2011 list no longer notes some concerns present a decade earlier,
suggesting that there is now little, perhaps no, apprehension about them: aware-
ness of digital preservation issues, including awareness by creators of digital
materials; a lack of international approaches; a shortage of practical models;
and selection. New concerns are expressed: the quantity and complexity of
digital materials; lack of metadata; and ‘lifecycle management risks’ (threats
from migration, file degradation, unauthorized use are specified – although these
were also present in 2001).
What have we learned so far? 201
– selection is necessary
– benign neglect does not work
– someone needs to accept responsibility
– working together is effective and increasing
– we already know much, including much about what we need to do
– action is possible now, even if we do not have all of the answers (Webb,
2004, p.50 (paraphrased)).
The Center for Research Libraries’ ‘Ten Principles’ statement in 2007 indicates
a more recent understanding of the requirements. In summary, there needs to
be institutional commitment to digital preservation; appropriate levels of staff-
ing, technical infrastructure and other resourcing; legal rights over the digital
materials; policies and planning procedures in place; selection criteria; techni-
cal procedures that maintain and ensure the integrity, authenticity and usability
of digital objects; metadata; and the ability to disseminate the digital objects
(Center for Research Libraries, 2007).
These are, however, only summaries. Closer examination of the challenges
is worthwhile to reinforce how much we still need to do and to delineate more
precisely those challenges. The challenges are unquestionably not just technical;
in fact, there is a school of thought that we have most of the technical solutions
at hand, or we can develop additional technical solutions to problems as they
are defined. The greater challenges, for which there can be no technical solu-
tions, are social, political and economic. Fourteen challenges were articulated
in the first edition of this book, published in 2005. They did not constitute a
comprehensive list, but were those about which there was a high level of con-
sensus, and it is revealing to consider them from the perspective of 2011. Two
more challenges have been added in this edition.
1) Developing standards for digital preservation and encouraging their use.
The development and implementation of standards (discussed in Chapters 7
and 8) are key aspects of most digital preservation activities. The benefits of
using standards include the opportunity for increased interoperability, greater
economies associated with limiting the number of file formats handled, and facili-
tation of collaboration. The rapid rate of change associated with technologies,
for instance, the rapid obsolescence of file formats as new formats replace them,
ongoing need to educate new creators about the issues and about the tech-
niques to address them.
14) Developing the skills required for digital preservation. Skilled personnel
with appropriate knowledge of digital preservation requirements are in short
supply around the world (noted briefly in Chapter 1). The challenge is to identify
more precisely the skill sets that are required and to encourage their acquisition.
While the number of opportunities to acquire skills in digital preservation has
increased considerably since the first edition of this book appeared in 2005, the
demand for skilled personnel will only continue to increase. This challenge is
discussed in more detail later in this chapter.
Two challenges have been added to the fourteen examined in the first edi-
tion of this book, published in 2005. They have been identified from recent
writings about digital preservation.
15) Automating digital preservation. The most pressing of the technical
issues requiring a better response is the need to automate curation processes for
handling the very large and increasing quantities of digital records. Software
tools need to be developed that are easy to implement and use and that enable
the automation of preservation processes such as refreshing, migration, tracking
changes to data, verifying provenance and assigning metadata. One challenge
is to automate more of the task of creating metadata; unless this is achieved we
are seriously limited by our still largely manual digital preservation processes,
which can only deal with small quantities of digital materials and generate a
relatively small amount of metadata. Although considerable progress has been
made in European research projects, such as toolkit development by the
Planets and CASPAR projects, more is definitely needed. Among current
research and development activities is the development of curation micro-
services (noted in Chapter 8).
16) Coping with large and increasing quantities of changing digital material.
The sheer quantity of digital material being generated, coupled with its
changing nature and changes in the way it is used, raises major challenges
for its preservation. Sharing and re-using material are emphasized more and
more, requiring that standards for creating digital objects are agreed and
widely implemented to ensure they can be discovered, located and preserved.
The deconstruction, repurposing and constant rebundling of digital information
have significant implications for the authenticity of digital records of all kinds,
including business records. As noted in Chapter 5, authenticity can be ensured
by actions such as paying attention to the significant properties of digital records
and being meticulous about quality-checking procedures during migration.
This poses challenges in part because of the quantities of records involved and
the present lack of viable automated procedures to support these tasks.
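One small piece of the automation called for in challenges 15 and 16 can be sketched in a few lines. The Python example below is a minimal illustration, not a description of any tool named in this book: it records a SHA-256 checksum for every file in a directory and compares two such records, which is one way of quality-checking a refresh (a bit-level copy to new media). The function names (sha256_of, build_manifest, compare) are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, changed, or newly added."""
    shared = before.keys() & after.keys()
    return {
        "missing": sorted(set(before) - set(after)),
        "changed": sorted(k for k in shared if before[k] != after[k]),
        "added": sorted(set(after) - set(before)),
    }
```

A manifest built before a refresh and another built afterwards should compare as identical. The same check does not apply directly to format migration, where the bytes change by design; there the significant properties of the records have to be compared instead.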
The discussion that follows is based on the views and comments gathered during
the interviews with digital preservation experts in 2004, and amplified by the
literature of digital preservation.
One frequently voiced issue is that digital preservation has not become a normal
part of mainstream practice in most institutions. Much of the development in
digital preservation to date has been carried out as special projects. The project
approach has some merit as a response to the magnitude of the problem, where
project-based approaches that bring stakeholders together to collaborate are
desirable and effective. It also has merit as a way of securing funding for scop-
ing studies, or of exploring a technique, or of testing the waters in other areas.
However, project-based funding is by definition short-term and engenders short-
term ways of thinking. Project-based digital preservation activities get in the way
of an appropriate recognition that digital preservation is here to stay and can only
be addressed if all institutions and stakeholders play a part. Beagrie states that
ultimately, digital preservation will be successful when it can be seen not as a stand
alone institutional activity but as an activity embedded in how institutions manage and
approach digital information and resources on an ongoing basis ... It remains a simple
objective, yet one immensely challenging to achieve (Beagrie, 2004).
Project-based approaches have achieved many notable results, but will not en-
sure sustainability in the medium or long term. The challenge is to integrate digi-
tal preservation fully into normal operations of libraries, archives, museums
Four major challenges 207
and other institutions with responsibility for digital sustainability. As one Austra-
lian digital preservation specialist put it, ‘the challenge we’ve got ... around
digital preservation … [is] that it just becomes the way we do things’. Another
Australian expert interviewed described the issue in the library context. Because
digital materials have not been incorporated into the normal processes, they
‘are treated as a different acquisition process, or as a different this or a different
that’. Short-term thinking based primarily on business need may lead to digital
preservation activities being halted; if no current demand for material is evident,
it may be disposed of.
The principal reason for this concern is that sustaining digital materials
over time requires ongoing, unbroken activity, expressed pithily by an Austra-
lian digital preservation specialist in this way: ‘preservation is not something
that you ever achieve. It’s just a process you’re in the middle of, or at the start
of, but never at the end of’. There needs to be full recognition that digital preser-
vation is an active process that is fully integrated into mainstream operations in
libraries, archives, museums and other institutions with responsibility for digital
sustainability.
How do we make the change from project-based, short-term support for
digital preservation to its integration into mainstream activities? What changes
will be required? Some institutions have created new positions at senior levels
that aim to break down existing structures by working across different areas. An
example is the position of Librarian/Archivist for Digital Projects at the
Schlesinger Library, part of Harvard University’s Radcliffe Institute for Ad-
vanced Study. This position has oversight of all digital materials in the
Schlesinger Library, from selection and appraisal to preservation. Other institu-
tions have already successfully integrated digital preservation into their normal
operations, or are well on the way to doing it. One is the National Archives of
Australia, whose preservation programme is ‘concerned to preserve all formats
of records … It’s not about digital … just as AV preservation is about preserva-
tion of the record – there are special skill sets for people needed in that too, but
it’s still fundamentally about preservation’. The National Library of Australia
‘has been able to institutionalize [digital preservation] or operationalize it’.
(These are the words of Australian digital preservation specialists interviewed in
2004.) The Wellcome Library is acquiring born-digital materials on the same ba-
sis as other material it collects, building digital preservation ‘into the everyday
business’ of the Library (Hilton and Thompson, 2007). Its early experience in
developing workflows and procedures to accommodate digital materials was
helped by not distinguishing digital materials as different (Thompson, 2008).
A Capability Maturity Guide developed by the Australian National Data
Service (2011), although intended to apply to research data infrastructure provi-
sion, is applicable to cultural heritage and other institutions who have digital
preservation programmes. It provides helpful guidelines for assessing the stage
an institution is at and for identifying areas for improvement. It proposes five
levels: the initial level, the developmental level, the defined level, the managed
level, and the optimizing level. Summaries of each level are provided and within
each of these the level of performance in four areas (policies and procedures, IT
infrastructure, support services, and metadata management) is defined. At Level
1, ‘the organisation does not provide a stable environment to support research
data management. Expertise is likely to be concentrated within a few individuals
… Co-ordination and cohesion across the various groups (e.g. research office,
IT, library, records office, research areas) … is patchy, if non-existent’. By con-
trast, at Level 5 the organization ‘uses the policies, procedures, practices, ser-
vices and facilities already developed as the basis for continual improvement’.
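A self-assessment against these levels can be recorded very simply. The Python sketch below uses the level names and the areas given above; the example ratings are invented for illustration, and treating the lowest-rated areas as the first candidates for improvement is an assumption made here, not a rule taken from the Guide.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """The five levels named in the Capability Maturity Guide."""
    INITIAL = 1
    DEVELOPMENTAL = 2
    DEFINED = 3
    MANAGED = 4
    OPTIMIZING = 5


def weakest_areas(assessment: dict[str, Maturity]) -> list[str]:
    """Return the areas rated at the lowest level in the assessment."""
    lowest = min(assessment.values())
    return sorted(area for area, level in assessment.items() if level == lowest)


# Hypothetical ratings for an imaginary institution.
assessment = {
    "policies and procedures": Maturity.DEFINED,
    "IT infrastructure": Maturity.MANAGED,
    "support services": Maturity.DEVELOPMENTAL,
    "metadata management": Maturity.DEVELOPMENTAL,
}

for area in weakest_areas(assessment):
    print(f"Candidate for improvement: {area} ({assessment[area].name.lower()})")
```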
Two approaches to encourage integration are gaining popularity. One is a
risk management approach, increasingly being applied to digital preservation
activities. This involves a systematic identification of potential risks and actions
applied to manage those risks. Another approach is to use business planning
methodologies (see Bishoff and Allen, 2004).
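The risk management approach can be illustrated with a simple risk register. The Python sketch below is an assumption-laden example, not a published methodology: the five-point scales, the likelihood-times-impact score, and the sample entries and ratings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (minor) to 5 (severe); hypothetical scale
    action: str      # the action planned to manage the risk

    @property
    def score(self) -> int:
        """A simple priority score: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def prioritise(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative entries only; the ratings are invented.
register = [
    Risk("File format becomes obsolete", 4, 4, "Monitor formats; plan migration"),
    Risk("Storage media degrade", 2, 5, "Schedule checksum audits and refreshing"),
    Risk("Key staff member leaves", 4, 3, "Document procedures; cross-train staff"),
]

for risk in prioritise(register):
    print(f"{risk.score:>2}  {risk.description}: {risk.action}")
```

The value of such a register lies less in the arithmetic than in the discipline of naming each risk and attaching an action to it.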
The UNESCO Guidelines note that two aspects should characterize reliable
digital preservation programmes. One is ‘organisational viability’ which is
based on ‘an ongoing mandate’ and appropriate resources and infrastructure.
The other is ‘financial sustainability’: the organization is likely to ‘provide the
required resources well into the future, with a sustainable business model to
support its digital preservation mandate’ (UNESCO, 2003, p.42).
As noted in the discussion above, digital preservation is vulnerable unless
it becomes part of core business activities. This means that its funding must
become a core business expense and, in the library context, that preservation
‘needs to be viewed as an activity just as essential as is cataloging’ (A. Smith,
2002, p.5). Funding should be sustained to pay the digital mortgage. Adequate
sustained funding is critical; it may be the key issue for digital preservation,
subsuming all others. Technical issues, such as determining significant properties
for new formats and ensuring that they are captured, can be seen as resourcing
issues, because the expertise to carry out such technical work can only be hired
if money to pay salaries is available. The technical infrastructure can be ex-
pensive to purchase, operate and replace on a regular basis; again, this is
a resourcing issue. Digital archives typically grow in size, so their funding
requirements also continue to grow.
One problem that continues to obstruct securing ongoing funding is our
lack of concrete information about how much it will cost. Projects such as Ce-
dars attempted to develop firmer costing data as part of their research activities,
but despite their efforts the outcomes are still not well defined. Early investiga-
tions of the costs of digital preservation included research into the ongoing
should include the accumulation and synthesis of digital preservation case studies; the
development of appropriate policies for enhancing incentives based on the characteris-
tics of the underlying organizational model; characterizing and analyzing the structure
of aftermarkets associated with digital preservation services; and devising sustainable
pricing strategies for digital preservation services (Lavoie, 2003, p.iii).
Lavoie has since expanded his view of the economic issues to encompass the
responsibilities of stakeholders, strategies for organizing preservation resources
most efficiently (Lavoie, 2004), and public good in relation to responsibilities
and costings for digital preservation (Lavoie and Dempsey, 2004).
Research into costs is ongoing and to date focuses on developing better
costing data – Lavoie’s ‘accumulation and synthesis of digital preservation
case studies’ – and on investigating where resources might come from. The
Blue Ribbon Task Force on Sustainable Digital Preservation and Access inves-
tigated digital preservation from the perspective of economic sustainability by
determining costs and identifying sustainable economic models that could
ensure the availability of resources for preservation activities. Its final report
(Blue Ribbon Task Force, 2010) is required reading for anyone interested in
digital preservation. Another investigation into sustainability, which focused
on the role of funding bodies, concluded that although the funders of digital
resource creation agree on the value of sustaining resources, their views on
what this means in practice are not uniform. The investigators’ report (Maron
and Loy, 2011) called for clearer articulation of thinking about requirements
for sustainability, the costs of achieving sustainability, and possible sources
of funding.
Several bodies are actively investigating the costs of ensuring that research
data are accessible. The Alliance for Permanent Access, one of whose aims is
to ‘support the development of a sustainable European Digital Information
infrastructure that guarantees the permanent access to the digital records of
science’ (www.alliancepermanentaccess.org/index.php/about), held a conference
in 2008 with the theme ‘Keeping the Records of Science Accessible: Can We
Afford It?’ (Alliance for Permanent Access, 2008). Among the points made at
this conference was that cost was determined in part by when preservation
actions were carried out. Curating data properly when they are created is much
cheaper than attempting to do so later. Clearly, then, planning data curation
before data creation begins is well worthwhile.
Two particularly significant cost modeling projects are LIFE and Keeping
Research Data Safe. The LIFE (Lifecycle Information for E-Literature) Project
(www.life.ac.uk), funded by JISC in the UK, had three phases. In the first
phase a LIFE model was developed that identified cost elements at specific
stages of the lifecycle; it was tested against case studies to develop a generic
preservation model. In Phase 2 new case studies were analyzed and the model
was further developed (Wheatley, 2008). Phase 3 has produced a web-based
predictive cost model (Hole et al., 2010). JISC also funded the Keeping Research
Data Safe (KRDS) project (www.beagrie.com/krds.php). Its outputs include a
toolkit to help determine the benefits of digital preservation, and a detailed
user guide to the application of the KRDS cost framework to develop local
cost models for digital preservation. A factsheet summarizes key findings from
the two phases of the KRDS project, including that acquisition and ingest of
digital materials into an archive cost much more (around 55 per cent of the total
preservation costs in one study) than archival storage and preservation activi-
ties, and that there are fixed costs that do not vary, regardless of the size of the
collection. The KRDS factsheet (Charles Beagrie Ltd and JISC, 2010) is re-
quired reading for anyone interested in digital preservation.
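The two KRDS findings noted above have a simple arithmetical consequence, sketched here in Python. This is not the KRDS cost framework itself: the function names are hypothetical, the monetary figures in the usage note are invented, and only the 55 per cent share comes from the factsheet (where it is reported for one study).

```python
def unit_cost(fixed_cost: float, variable_cost_per_item: float, items: int) -> float:
    """Average cost per item when a fixed cost is spread over a collection."""
    if items <= 0:
        raise ValueError("items must be a positive number")
    return fixed_cost / items + variable_cost_per_item


def split_by_share(total_cost: float, ingest_share: float = 0.55) -> dict[str, float]:
    """Split a total into acquisition and ingest versus all other costs.

    The default share of 0.55 is the figure reported for one study;
    other collections may differ considerably.
    """
    if not 0 <= ingest_share <= 1:
        raise ValueError("ingest_share must be between 0 and 1")
    ingest = total_cost * ingest_share
    return {"acquisition_and_ingest": ingest, "all_other_costs": total_cost - ingest}
```

With an invented fixed cost of 10,000 and a variable cost of 2 per item, unit_cost gives 12 per item for a collection of 1,000 items and 3 per item for a collection of 10,000. Because fixed costs do not vary with size, the cost per item falls as the collection grows.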
Those involved in the LIFE and KRDS projects conferred with personnel
from other cost modeling projects (the Danish CMDP, and DANS and the
Commentators regularly lament the lack of people with the expertise needed to
extend the digital preservation agenda and have done so for many years. Hed-
strom and Montgomery noted in 1999 that ‘Lack of staff expertise is a common
problem both in institutions with digital preservation responsibilities and in
institutions that have not yet assumed responsibility for digital materials’, adding
that those with the required information technology skills typically did not
have an understanding of long-term preservation (Hedstrom and Montgomery,
1999, pp.16-18). In 2004 the International Council on Archives noted ‘the lack
of adequate training of, and human resources development for, records per-
sonnel’ and how this impeded the efforts of archivists to ‘protect archives as
evidence and to preserve an “authentic digital heritage”’ (Millar, 2004, p.5).
The proliferation of activities since then to identify skills needed for digital
preservation and to develop these skills, and the funding that has been devoted
to educating and training information professionals with digital preservation
skills, indicate very clearly that the situation has persisted.
What new knowledge, what new skill sets are needed? Chapter 1 noted
that new skills would be required to implement new policies and develop new
procedures. A decade ago Jones and Beagrie described the digital preservation
work environment and identified some of its requirements, characterizing it as
an environment of rapid and constant change, where the boundaries of respon-
sibilities were blurred and increased weight was given to collaboration and
accountability. For digital preservation activities specifically, formal training
opportunities were rare, so that much was learned on the job, and informal
contacts with others working in the field were valuable (Jones and Beagrie,
2001, p.54). There was consensus that general professional skills were required
to a high level, to provide a deep understanding of the reasons why preservation
is required within the context of that profession and to provide an holistic under-
standing of digital preservation rather than a narrow perspective. Generic skills
such as project management, communication and presentation skills, and stra-
tegic thinking were also considered to be necessary at a high level. Technical
skills to a higher degree than previously required were also considered essen-
tial. At a broader technical level, digital preservation specialists needed to
know about areas that assume greater importance for digital preservation, such
as metadata and XML, and also about the specific media that they work with.
The IT skills needed to be at the level where (in the words of an Australian
digital preservation specialist in 2004) ‘you’ve got the literacy to be able to get
down into that bit-stream and extract things and pull things out of corrupted
disks, and all of that – all that technical nous’, and, in addition to this computer
literacy, digital preservation specialists need to have more generic skills in areas
such as change management, writing comprehensible and rigorous documenta-
tion, and working as part of teams.
More recently, research has attempted to improve our understanding of ex-
actly what the skills needed for digital preservation are. The most detailed
investigation has been carried out by the DigCCurr (Digital Curation Curricu-
lum) project, based at the School of Information and Library Science, University
of North Carolina at Chapel Hill, which has produced a detailed listing of the
skills and competencies needed (Lee, 2008, 2009). Another investigation was
the SHERPA (Securing a Hybrid Environment for Research Preservation and
Access) Project (sherpa.ac.uk/index.html) which identified broad categories of
skills: management; software; metadata; storage and preservation; content;
advocacy, training and support; liaison (internal); liaison (external); and current
awareness and professional development (Robinson, 2009). The balance between
generic and technical skills is present in all listings of skills required for digital
preservation. Another example is given by Cunningham who includes advocacy
skills as well as generic personal attributes, such as the ability to respond to
change, among requirements for digital archivists (Cunningham, 2008, pp.541-542).
The interest in better identifying the skills needed and in ensuring the
availability of training in those skills continues, as evidenced by meetings such as
the ICE (International Curriculum Education) Forum held in London in 2011
(sils.unc.edu/events/2011/ice), at which participants from several countries dis-
cussed curricula, course design, and the production of educational materials at
all levels.
Where can the skills for digital preservation be acquired? The options range
from formal education programmes to learning on the job. In the past five years
there has been considerable growth in formal educational programmes devoted
to managing digital materials which have a substantial digital preservation
component. Examples are, in the UK, the MA in Digital Asset Management at
King’s College London
(www.kcl.ac.uk/prospectus/graduate/index/name/digital-asset-management)
and the MSc in Information Management and Preservation
(Digital) offered by HATII (Humanities Advanced Technology and Information
Institute) at the University of Glasgow (www.gla.ac.uk/departments/hatii), and
in the US, the University of Arizona’s online Graduate Certificate in Digital
Information Management (digin.arizona.edu). Many other schools of informa-
stages of action … Our most pressing demands were to make some decisions about
what we should try to preserve and to put those materials in a safe place … We have
formalized these processes into two broad terms: archiving and long-term preservation.
For the NLA, archiving refers to the process of bringing material into an archive; long-term
preservation refers to the process of ensuring that archived material remains authentic
and accessible (Webb, 2002, p.66)
This translates into a staged approach: start now; do what you can now; and
then consider the possibilities.
Think of a local history collection in a small public library. Such collections
are of considerable significance to the area they serve. This library will be
offered photographs of its locality in digital form; indeed, this is already hap-
pening. Local organizations now produce and disseminate newsletters only in
digital form, their web sites contain useful information for local history collec-
tions, and they develop significant databases that are widely used, for example
the St Croix African Roots Project (stx.visharoots.org). Other kinds of institu-
tions operate in similar ways, for instance the one-person special library, the
small archive perhaps staffed largely by volunteers, and the school library
which may already be attuned to the need to preserve the school’s heritage and
finds that much of it now resides on servers or on short-lived media. What
these small institutions have in common are digital objects in their collections,
staff whose duties are diverse and whose time is limited, and very limited
resources. These institutions urgently need guidance in the form of precise and
concise directions that can be readily implemented, perhaps as workflows
applicable to day-to-day operations. The challenge is to translate the lessons
learned in larger operations into guidelines and action lists for smaller institu-
tions.
Guides for individuals are now abundant. Examples include a guide to
designing preservable web sites whose five tips include ‘follow accessibility
standards’ and ‘avoid proprietary formats for important content or provide
alternate versions’ (Davis, 2011). YouTube clips are proliferating: ‘Backing
Up Your Digital Family Photos’ (www.youtube.com/watch?v=IQIoE8ceAu0)
Research and digital preservation 215
is but one of many examples. The Library of Congress also provides guidance
for individuals about archiving their personal digital photographs, audio,
video, emails and web sites (www.digitalpreservation.gov/you). In a similar
vein, the National Digital Stewardship Alliance in ‘Digital Preservation in a
Box’ provides resources that can be used in planning and presenting events
that introduce digital preservation (Lazorchak, 2011). There is definitely still
room for more.
– Why? What is the rationale for preservation? When an object is retrieved from the
archive will it still be valuable in 50 years time? Will it still be recognizable and com-
prehensible? … what costs are non-discretionary, how do they apply to an item’s life-
cycle in archive, and when will costs start to be discretionary? What benefits are
measurable, how can they be achieved, and who can be tasked with capturing them?
– How much? What contextual information is necessary for preservation? … what
contextual information is sufficient, so that when it is retrieved it can be interpreted
correctly? How the object will eventually be accessed, and for what purpose, how
will this affect the approach to preservation? …
– How? What are the preservation processes’ procedural needs in order to achieve a
long term archive? Who are the stakeholders, who will influence the way the archive
is built up and managed? What quick, cost-saving routes are there, which do not
adversely affect the quality of the archive? What safety nets exist which can provide a
fall-back for the archive should accidental loss or deliberate sabotage to the archive
occur?
– Where? While technology is in a state of continuous transition, when will technology
be resilient and stable enough for any item to be assured of its long term preservation?
(Bennett, 1997, pp.9-11).
At about the same time, Hedstrom noted four areas for research that were
likely to be fruitful – storage media, migration, conversion, and management
tools (Hedstrom, 1998, pp.197-200). Five years later, the outcomes from the
joint NSF and Library of Congress workshop to consider research challenges in
digital preservation provided a comprehensive map of the digital preservation
research agenda based on four themes:
1. Technical architectures for archival repositories: specification, system and tool
development, pilot implementation, and evaluation of repository models; develop a
spectrum of repository architectures; develop a spectrum of digital archiving
services; alternative repository models and interoperability; scalability and cost
2. Attributes of archival collections: articulating and modeling of curatorial processes;
developing appropriate preservation methods for diverse digital objects and collec-
tions; aggregation of items and objects into collections; decision models
3. Digital archiving tools and technologies: acquisition and ingest; managing the
evolution of tools, technology, standards, and metadata schema; naming and authori-
zation; standards and interoperability
4. Organizational, economic, and policy issues: metrics; economic and business models
(Workshop on Research Challenges in Digital Archiving and Long-term Preservation,
2003).
Research and digital preservation 217
The joint working group of the NSF and European Union’s DELOS Pro-
gramme also outlined an ambitious agenda, grouping their recommendations
for future research into three areas:
This report noted that three research areas were most likely to have the greatest
impact: self-contextualizing objects, metadata and the development of ontolo-
gies, and mechanisms for preservation of complex and dynamic objects
(NSF/DELOS Working Group on Digital Archiving and Preservation, 2003,
p.ix).
There have been frequent calls for more reports on practical experiences of
digital preservation. The predominant theme of a 2004 DPC forum about the
global context of digital preservation was ‘the need to develop and share prac-
tical experience’ (Digital Preservation Coalition, 2004). Lynch suggested that
the digital preservation experience to date is most applicable to the largest
organizations and that for smaller organizations at the state and local govern-
ment level, and for corporations of all sizes, ‘we need a research agenda to
give guidance and assistance for those who manage these valuable resources;
we need to offer them affordable and implementable approaches, and supporting
systems. There is a place not just for analytical research but also descriptive
research’ (Lynch, 2004).
A more recent agenda is the DPE’s Research Roadmap (DigitalPreservation
Europe, 2007), which is particularly useful for its synthesis of the major research
agendas published up to 2007. It identifies the principal issues that need to be
addressed in future digital preservation research. Ten areas are identified:
restoration; conservation; management; risk; significant properties of digital
objects; interoperability; automation; context; storage; and experimentation.
To what extent have these research agendas and concerns been addressed?
Three regions where there has been significant funding for research activity
stand out: the NDIIPP Program in the US; JISC-funded activities in the UK;
and European research projects funded by the EU. The wide scope of research
funded by NDIIPP is recorded in its 2010 report (NDIIPP, 2011), for example in
the Appendix C inventory of the tools and services that NDIIPP partners created
with program funding. Significant funding for digital preservation in the
US also comes from the National Science Foundation, most notably $100 million
218 Challenges for the Future of Digital Preservation
yet we still find ourselves not very far from the beginning in terms of exploring its eco-
nomic ramifications. No systematic study of the economics of digital preservation has
yet emerged (Lavoie, 2003, pp.41-42).
Research into digital preservation continues, and will continue into the fore-
seeable future, to be a pressing requirement if there is to be any real progress
in the preservation of digital materials.
1. Society must recognize, understand, and actively support the efforts to solve the
challenges of digital archiving
2. The challenges that electronic records, and more broadly, digital objects, pose in re-
spect to their long-term preservation and access must be fully explored as a priority re-
search and development area that receives both strategic planning and extensive funding
3. Preservation and access must be viewed as having an inseparable relationship
4. Archivists, information scientists, librarians, policy makers, and computer scientists
must address the full range of issues integral to digital preservation in a coordinated
and collaborative fashion
5. The information technology industry must produce tools to support digital preserva-
tion and access
6. The notion of responsible custody of digital assets must pervade society; digital
archiving must become ubiquitous
7. Archivists must take a leading role in educating society regarding digital preserva-
tion (Tibbo, 2003, pp.43-52).
so many new technological developments since 2006 – social media, cloud computing,
smartphones, multi-terabyte hard drives. We’ve preserved so much digital content since
then, including websites, data sets and video. Plus there is so much more information
about digital preservation available: everything from short animated videos, to focused
websites, to specific guidance on personal digital archiving (LeFurgy, 2011).
We have examples of practice that has been successful for more than fifty
years for preserving some kinds of digital materials, particularly social science
data sets, but these have been developed for and work best with relatively
small quantities of simple materials. We must keep reflecting, investigating,
experimenting, and researching as the complexities of the digital world deepen
and new kinds of data evolve, and as new ways of interacting with and using
these data continue to develop.
Bibliography
The Preface to this book notes that there is a considerable amount of high-quality informa-
tion available about preserving materials in digital form, and that much of it is available on
the web. The accessibility that this provides is countered by the impermanence of much web
material, as noted in several chapters in this book. All URLs in this Bibliography were cor-
rect at the time of writing.
Authenticity in a digital environment (2000) Washington, D.C.: Council on Library and In-
formation Resources
Bailey, S. (2010) Is digital preservation now routine? Posting to Records Management
Futurewatch blog, 2 May <rmfuturewatch.blogspot.com/2010/05/is-digital-preservation-
now-routine.html>
Barrow, W.J. (1959) Deterioration of book stock: causes and remedies: two studies on the
permanence of book paper. Richmond, VA: Virginia State Library
Barton, M.R. and Walker, J.H. (2003) Building a business plan for DSpace, MIT Libraries’
digital institutional repository. Journal of Digital Information, 4 (2) <jodi.ecs.soton.ac.
uk/Articles/v04/i02/Barton>
Bastian, J., Cloonan, M. and Harvey, R. (2011) From teacher to learner to user: developing
a digital stewardship pedagogy. Library Trends, 59 (4), 607-622
Beagrie, N. (2001) Preserving UK digital library collections. Program, 35 (3), 217-226
Beagrie, N. (2003) National digital preservation initiatives: an overview of developments in
Australia, France, the Netherlands, and the United Kingdom and of related interna-
tional activity. Washington, D.C.: Council on Library and Information Resources and
Library of Congress
Beagrie, N. (2004) The Continuing Access and Digital Preservation Strategy for the UK
Joint Information Systems Committee (JISC). D-Lib Magazine, 10 (7/8) <www.dlib.
org/dlib/july04/beagrie/07beagrie.html>
Beagrie, N. et al. (2008) Digital preservation policies study. part 1, final report. Salisbury:
Charles Beagrie Ltd <www.jisc.ac.uk/media/documents/programmes/preservation/jisc
policy_p1finalreport.pdf>
Bearman, D. (1998) Archival methods. Archives and Museums Informatics Technical Report,
9. Pittsburgh: Archives and Records Informatics
Bellinger, M. et al. (2004) OCLC’s digital preservation program for the next generation
library. Advances in Librarianship, 27, 29-48
Bennett, J.C. (1997) A framework of data types and formats, and issues affecting the long
term preservation of digital material: JISC/NPO studies on the preservation of elec-
tronic materials. British Library Research and Innovation Paper, 50. London: British
Library Research and Innovation Centre
Bergau, N. (2010) Report on digital preservation practice and plans amongst LIBER members
with recommendations for practical action. EuropeanaTravel <www.europeanatravel.
eu/downloads/D1.3._ET_report_final_23092010.pdf>
Besek, J.M. et al. (2008) Digital preservation and copyright: an international study. Inter-
national Journal of Digital Curation, 3 (2), 103-111 <www.ijdc.net/index.php/ijdc/article/
viewFile/90/61>
Besser, H. (2000) Digital longevity. In Handbook for digital projects: a management tool
for preservation and access, ed. M. Sitts. Andover, Mass.: Northeast Document Con-
servation Center <www.nedcc.org/resources/digitalhandbook/dighome.htm>
Besser, H. (2007) Collaboration for electronic preservation. Library Trends, 56 (1), 216-229
Betts, M. (1999) Businesses worry about long-term data losses: will we access our saved
data in 20 years? Computerworld News, Sept 20
Bigourdan, J.-L. (2006) The preservation of magnetic tape collections: a perspective. Roches-
ter, NY: Image Permanence Institute <www.imagepermanenceinstitute.org/imaging/
research/magnetic-tape>
Billenness, C.S.G. (2011) The future of the past: shaping new visions for EU-research in
digital preservation. Luxemburg: European Commission, Information Society and Media
Directorate <http://cordis.europa.eu/fp7/ict/telearn-digicult/future-of-the-past_en.pdf>
Bishoff, L. and Allen, N. (2004) Business planning for cultural heritage institutions. Washing-
ton, D.C.: Council on Library and Information Resources
Blue Ribbon Task Force on Sustainable Digital Preservation and Access (2010) Sustainable
economics for a digital planet: ensuring long-term access to digital information <brtf.
sdsc.edu/biblio/BRTF_Final_Report.pdf>
Borghoff, U.M. et al. (2006) Long-term preservation of digital documents: principles and
practices. Berlin: Springer
Borgman, C.L. (2007) Scholarship in the digital age: information, infrastructure, and the
internet. Cambridge, MA: MIT Press
Boyle, F., Eveleigh, A. and Needham, H. (2008) Report on the survey regarding digital
preservation in local authority archive services <www.dpconline.org/docs/reports/dig
pressurvey08.pdf>
Brabazon, T. (2000) He lies like a rug: digitising memory. Media International Australia,
96, 153-161
Breeding, M. (2010) Ensuring our digital future. Computers in Libraries, 30 (9), 32-35
Brindley, L.J. (2000) Keynote address to the Preservation 2000 Conference. New Review of
Academic Librarianship, 6, 125-137
Brown, A. (2008a) Selecting file formats for long-term preservation. Digital preservation
guidance note 1 <www.nationalarchives.gov.uk/documents/information-management/
selecting-file-formats.pdf>
Brown, A. (2008b) Selecting storage media for long-term preservation. Digital preservation
guidance note 2 <www.nationalarchives.gov.uk/documents/information-management/
selecting-storage-media.pdf>
Brown, A. (2008c) Care, handling and storage of removable media. Digital preservation
guidance note 3 <www.nationalarchives.gov.uk/documents/information-management/
removable-media-care.pdf>
Brown, G. and Woods, K. (2011) Born broken: fonts and information loss in legacy digital
documents. International Journal of Digital Curation, 6 (1), 5-19 <www.ijdc.net/index.
php/ijdc/article/viewFile/159/243>
Brown, H., et al. (2011) Digital curation reference manual: instalment on the role of microfilm
in digital preservation <www.dcc.ac.uk/resources/curation-reference-manual/microfilm>
Bryan, R.E. (2003) Survey results: preservation management of born-digital documents in
United States manuscripts repositories. In Symposium 2003: Preservation of Electronic
Records: New Knowledge and Decision-making, Ottawa, 15-18 September 2003: pre-
prints
BS4783 (1988- ) Storage, transportation and maintenance of media for use in data process-
ing and information storage. London: British Standards Institution
Bulger, M. et al. (2011) Reinventing research?: information practices in the humanities.
London: Research Information Network.
Burrows, T. (2000) Preserving the past, conceptualising the future: research libraries and
digital preservation. Australian Academic & Research Libraries, 31 (4), 142-153
Byers, F.R. (2003) Care and handling of CDs and DVDs: a guide for librarians and archi-
vists. Washington, D.C.: Council on Library and Information Resources and National
Institute of Standards and Technology
Caplan, P. (2006) DCC digital curation reference manual: instalment on preservation metadata
<www.dcc.ac.uk/resources/curation-reference-manual/completed-chapters/preservation-
metadata>
Caplan, P. (2007) The Florida Digital Archive and DAITSS: a working preservation repository
based on format migration. International Journal on Digital Libraries, 6 (4), 305-311.
Cook, T. (2000) Beyond the screen: the records continuum and archival cultural heritage.
Paper presented at the Australian Society of Archivists Conference, Melbourne, 18 August
<www.mybestdocs.com/cook-t-beyondthescreen-000818.htm>
CORDIS (2011) TeLearn–DigiCult: research topics and projects <cordis.europa.eu/fp7/ict/
telearn-digicult/digicult-projects_en.html>
Cox, R.J. (2000?) The functional requirements for evidence in recordkeeping <www.archi
muse.com/papers/nhprc>
Cox, R.J. (2001) Managing records as evidence and information. Westport, Conn.: Quorum
Books
Cox, R.J. (2002) Vandals in the stacks: a response to Nicholson Baker’s assault on libraries.
Westport, Conn: Greenwood Press
Cunningham, A. (2008) Digital curation/digital archiving: a view from the National Archives
of Australia. American Archivist, 71, 530-543.
Dale, R.L. (2004) Consortial actions and collaborative achievements: RLG’s preservation
program. Advances in Librarianship, 27, 1-23
Dale, R. and Gore, E. (2010) Process models and the development of trustworthy digital re-
positories. Information Standards Quarterly, 22 (2), 14-19
Darlington, J., Finney, A. and Pearce, A. (2003) Domesday Redux: the rescue of the BBC
Domesday Project videodiscs. Ariadne, 36 <www.ariadne.ac.uk/issue36/tna>
Davis, R.C. (2011) Five tips for designing preservable websites <blog.photography.si.edu/
2011/08/02/five-tips-for-designing-preservable-websites>
Day, R. (1989) Where’s the rot?: a special report on CD longevity. Stereo Review, April,
23-24
de Lusenet, Y. (2002) Preservation of digital heritage: draft discussion paper prepared for
UNESCO. European Commission on Preservation and Access <www.ica.org/download.
php?id=606>
de Lusenet, Y. (2007) Tending the garden or harvesting the fields: digital preservation and
the UNESCO Charter on the Preservation of the Digital Heritage. Library Trends, 56
(1), 164-182
Deegan, M. and Tanner, S. (2002) The digital dark ages. Update, May
Deegan, M. and Tanner, S. (eds) (2006) Digital preservation. London: Facet
Del Pozo, N., Long, A.S. and Pearson, D. (2010) ‘Land of the lost’: a discussion of what
can be preserved through digital preservation. Library Hi Tech, 28 (2), 290-300
Digital Curation Centre (2008) The DCC Curation Lifecycle Model <www.dcc.ac.uk/docs/
publications/DCCLifecycle.pdf>
Digital Curation Centre (2010) Tools <www.dcc.ac.uk/resources/external/software-and-hard
ware/tools>
Digital Preservation Coalition (2003) Annual company report 23 July 2002-31 July 2003
<www.dpconline.org/docs/DPCAR02-03.pdf>
Digital Preservation Coalition (2004) Annual company report 1 August 2003-31 July 2004
<www.dpconline.org/docs/DPCAR03-04.pdf>
Digital Preservation Coalition (2004) Digital preservation: the global context: report on the
DPC Forum held at the British Library Conference Centre, Wednesday 23 June <www.
dpconline.org/events/previous-events/299-digital-preservation-the-global-context?format
=pdf>
Digital Preservation Coalition (2006) Interactive assessment: selection of digital materials
for long-term retention <www.dpconline.org/advice/preservationhandbook/decision-tree?
format=pdf>
Flesch, J. (1996) A labour of love?: the story behind the compilation of Love brought to book:
a bio-bibliography of 20th century Australian romance novels. Australian Academic &
Research Libraries, 27 (3), 182-190
Flesch, J. (2004) From Australia with love: a history of modern Australian popular romance
novels. Fremantle: Curtin University Books
Florida Digital Archive (2009) FDA file preservation strategies by format <fclaweb.fcla.edu/
fda_format_landing_page>
Gertz, J. (2000) Selection for preservation in the digital age. Library Resources & Technical
Services, 44 (2), 97-104
Gilliland-Swetland, A.J. (2000) Enduring paradigm: the value of the archival perspective in
the digital environment. Washington, D.C.: Council on Library and Information Resources
Gilliland-Swetland, A.J. (2002) Testing our truths: delineating the parameters of the authentic
archival electronic record. American Archivist, 65, 196-215
Gilliland-Swetland, A.J. (2005) Electronic records management. ARIST: Annual Review of
Information Science and Technology, 39, 219-253
Gladney, H.M. (2007) Preserving digital information. Springer
Goethals, A. and Gogel, W. (2010). Reshaping the repository: the challenge of email archiving.
Presented at iPRES 2010, Vienna, September 20 <www.ifs.tuwien.ac.at/dp/ipres2010/
papers/goethals-08.pdf>
Gorman, M. (1997) What is the future of cataloguing and cataloguers? Paper presented at
the 63rd IFLA General Conference, Copenhagen, Denmark, August 31-September 5,
1997 <www.ifla.org/IV/ifla63/63gorm.htm>
Grace, S., Knight, G. and Montague, L. (2009) InSPECT final report <www.significant
properties.org.uk/inspect-finalreport.pdf>
Granger, S. (2000) Emulation as a digital preservation strategy. D-Lib Magazine, 6 (10)
<www.dlib.org/dlib/october00/granger/10granger.html>
Green, A., Dionne, J. and Dennis, M. (1999) Preserving the whole: a two-track approach to
rescuing social science data and metadata. Washington, D.C.: Council on Library and
Information Resources
Green, A., Macdonald, S. and Rice, R. (2009) Policy-making for research data in reposito-
ries: a guide. Version 1.2. Edinburgh: EDINA and University Data Library <www.
disc-uk.org/docs/guide.pdf>
Greenan, M. (2003) Dspace. DigiCULT.Info: A Newsletter on Digital Culture, 3, 8
Gunton, T. (1993) A dictionary of information technology and computer science, 2nd edn.
Manchester: NCC Blackwell
Gwinn, N.E. (1993) A national preservation program for agricultural literature <usain.org/
Preservation/preservation.pdf>
Hafner, K. (2004) Even digital memories fade. New York Times, 10 November
Halbert, M. and Skinner, K. (2008) The MetaArchive Cooperative: a new collaborative ser-
vice organization providing a distributed digital preservation infrastructure. CLIR Issues,
66 <www.clir.org/pubs/issues/issues66.html>
Harris, C. (2000) Selection for preservation. In Preservation: issues and planning, ed. P.N.
Banks and R. Pilette, pp.206-224. Chicago: American Library Association
Harter, R. (1999) Piltdown Man <home.tiac.net/~cri_a/piltdown/piltdown.html>
Harvey, R. (1993) Preservation in libraries: principles, strategies and practices for librari-
ans. London: Bowker-Saur
Harvey, R. (1995) From digital artefact to digital object. Paper presented at Multimedia
Preservation: Capturing the Rainbow Conference, Brisbane, 28-30 November 1995
<www.nla.gov.au/niac/meetings/npo95rh.html>
Lavoie, B.F. (2004) Of mice and men: economically sustainable preservation for the
twenty-first century. In Access in the future tense, pp.45-54
Lavoie, B. and Dempsey, L. (2004) Thirteen ways of looking at … digital preservation. D-Lib
Magazine, 10 (7/8) <www.dlib.org/dlib/july04/lavoie/07lavoie.html>
Lawrence, G.W. et al. (2000) Risk management of digital information: a file format investi-
gation. Washington, D.C.: Council on Library and Information Resources
Lazorchak, B. (2011) Digital preservation in a box: NDSA Outreach. Posting to The Signal:
Digital Preservation blog, August 3 <blogs.loc.gov/digitalpreservation/2011/08/digital-
preservation-in-a-box-ndsa-outreach>
Lee, C.A. (2008) High-level categories of digital curation functions. Draft, Version 14.
Chapel Hill, NC: DigCCurr <ils.unc.edu/digccurr/digccurr-funct-categories.pdf>
Lee, C.A. (2009) Matrix of digital curation knowledge and competencies (overview). Draft,
Version 13. Chapel Hill, NC: DigCCurr <ils.unc.edu/digccurr/digccurr-matrix.html>
Lee, C.A. (2010) Open Archival Information System (OAIS) Reference Model. In Encyclo-
pedia of Library and Information Sciences, 3rd ed., ed. by M.J. Bates and M.N. Maack,
1(1), pp.4020-4030. Boca Raton, Fl.: CRC Press
Lee, C.A. (ed.) (2011) I, digital: personal collections in the digital era. Chicago: Society of
American Archivists
Lee, K-H. et al. (2002) The state of the art and practice in digital preservation. Journal of
Research of the National Institute of Standards and Technology, 107 (1), 93-106
LeFurgy, B. (2011) Kicking off the 2011 NDIIPP Digital Preservation Partners Meeting.
Posting to The Signal: Digital Preservation blog, July 19 <blogs.loc.gov/digitalpreserva
tion/2011/07/kicking-off-the-2011-ndiipp-digital-preservation-partners-meeting>
Legal deposit (2007) [PADI summary]. National Library of Australia <www.nla.gov.au/
padi/topics/67.html>
Li, Y. and Banach, M. (2011) Institutional repositories and digital preservation: assessing
current practices at research libraries. D-Lib Magazine, 17 (5/6) <www.dlib.org/dlib/
may11/yuanli/05yuanli.html>
Library of Congress (2010) Partner tools and services <www.digitalpreservation.gov/partners/
resources/tools/index.html>
Lim S.L., Ramaiah, C.K. and Pitt K.W. (2003) Problems in the preservation of electronic
records. Library Review, 52 (3), 117-125
Linden, J. et al. (2005) The large-scale archival storage of digital objects: technology watch
report. DPC technology watch series report, 04-03 <www.dpconline.org/docs/dpctw04-
03.pdf>
Lohr, S. (2009) G.E.’s breakthrough can put 100 DVDs on a disc. New York Times, April 27
<www.nytimes.com/2009/04/27/technology/business-computing/27disk.html>
Lord, P. and Macdonald, A. (2003) E-science curation report: data curation for e-science
in the UK: an audit to establish requirements for future curation and provision. Pre-
pared for the JISC Committee for the Support of Research (JCSR). London: Digital
Archival Consultancy
Lorie, R. (2002) The UVC: a method of preserving digital documents – proof of concept.
Long-term preservation study report series, no. 4. Amsterdam: IBM Netherlands
Lowenthal, D. (1985) The past is a foreign country. Cambridge: Cambridge University
Press
Lukesh, S.S. (1999) E-mail and potential loss to future archives and scholarship or the dog
that didn’t bark. Firstmonday, 4 (9) <firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/
fm/article/view/692/602>
Lupovici, C. (2001) Technical data and preservation needs. Paper presented at the 67th
IFLA Council and General Conference, Boston, August 16-25 <www.ifla.org/IV/ifla67/
papers/163-168e.pdf>
Lyman, P. and Kahle, B. (1998) Archiving digital cultural artifacts: organizing an agenda
for action. D-Lib Magazine, July/August <www.dlib.org/dlib/july98/07lyman.html>
Lynch, C.A. (2004) Editor’s interview with Clifford A. Lynch. RLG DigiNews, 8 (4) <world-
cat.org/arcviewer/1/OCC/2007/08/08/0000070519/viewer/file3518.html#article0>
Lyon, L. et al. (2010) Disciplinary approaches to sharing, curation, reuse and preservation:
final report <www.dcc.ac.uk/sites/default/files/documents/scarp/SCARP-FinalReport-
Final-SENT.pdf>
Manus, S. (2011a) It’s been a busy year: partnership highlights. Posting to The Signal: Digi-
tal Preservation blog, August 19 <blogs.loc.gov/digitalpreservation/2011/08/it’s-been-
a-busy-year-–-partnership-highlights>
Manus, S. (2011b) A meeting of the minds for UDFR. Posting to The Signal: Digital Preser-
vation blog, June 17 <blogs.loc.gov/digitalpreservation/2011/06/a-meeting-of-the-minds-
for-udfr/>
Maron, N.L. and Kirby Smith, K. (2008) Current models of digital scholarly communica-
tion: results of an investigation conducted by Ithaka for the Association of Research
Libraries. Washington, D.C.: Association of Research Libraries <www.arl.org/bm~doc/
current-models-report.pdf>
Maron, N.L. and Loy, M. (2011) Funding for sustainability: how funders’ practices influence
the future of digital resources. Bristol: JISC <www.ithaka.org/ithaka-s-r/research/funding-
for-sustainability/FundingForSustainability.pdf>
McGovern, N.Y. (2007a) A digital decade: where have we been and where are we going in
digital preservation? RLG DigiNews, 11 (1) <worldcat.org/arcviewer/1/OCC/2007/07/
10/0000068890/viewer/file1.html#article3>
McGovern, N.Y. (2007b) ICPSR digital preservation policy framework <www.icpsr.umich.
edu/icpsrweb/ICPSR/curation/preservation/policies/dpp-framework.jsp>
McLeod, R. (2008) Risk assessment: using a risk based approach to prioritise handheld digital
information. Presented at iPRES 2008, British Library, London, September 29 <www.bl.
uk/ipres2008/presentations_day1/20_McLeod.pdf>
Meeting of Experts on Digital Preservation (2004) Report on the Meeting of Experts on
Digital Preservation: Metadata Specifications. Washington, D.C.: USGPO <www.nla.
gov.au/padi/metafiles/resources/15663.html>
Mellor, P. (2003) CAMiLEON: Emulation and BBC Domesday, RLG DigiNews, 7 (2)
<worldcat.org/arcviewer/1/OCC/2007/07/10/0000068904/viewer/file1.html#feature3>
MetaArchive Cooperative (2010) Charter. Atlanta, GA: Educopia Institute <www.meta
archive.org/public/resources/charter_member/2011_MetaArchive_Charter.pdf>
Millar, L. (2004) Authenticity of electronic records: a report prepared for UNESCO and the
International Council on Archives. ICA Study, 13-2. Paris: ICA
Minor, D., Phillips, M. and Schultz, M. (2010) Chronopolis and MetaArchive: preservation
cooperation. Presented at iPRES 2010, Vienna, September 21 <www.ifs.tuwien.ac.at/
dp/ipres2010/papers/minor-29.pdf>
Morris, S. (2002) The preservation problem: collaborative approaches. Information Services &
Use, 22, 127-132
Morrissey, S. et al. (2010) Portico: a case study in the use of XML for the long-term preser-
vation of digital artifacts. Presented at International Symposium on XML for the Long
Haul: Issues in the Long-term Preservation of XML, Montréal, Canada, August 2 <www.
balisage.net/Proceedings/vol6/html/Morrissey01/BalisageVol6-Morrissey01.html>
Muir, A. (2004) Digital preservation: awareness, responsibility and rights issues. Journal of
Information Science, 30 (1), 73-92
National Library of Australia (1999) A draft research agenda for the preservation of physical
format digital publications <pandora.nla.gov.au/pan/25426/20100713-1409/www.nla.gov.
au/policy/rsagenda.html>
National Library of Australia (2002) Persistent identifiers <www.nla.gov.au/initiatives/per
sistence.html>
National Library of Australia (2008) Digital preservation policy, 3rd ed. <www.nla.gov.au/
policy/digpres.html>
National Research Council (1995) Preserving scientific data on our physical universe: a
new strategy for archiving the nation’s scientific information resources. Washington,
D.C.: National Academy Press
NDIIPP (2011) Preserving our digital heritage: National Digital Information Infrastructure
and Preservation Program. Washington, D.C.: Library of Congress <www.digitalpreser
vation.gov/library/resources/pubs/docs/NDIIPP2010Report_Post.pdf>
nestor Working Group Trusted Repositories – Certification (2006) Catalogue of criteria for
trusted digital repositories. Frankfurt am Main: nestor <files.d-nb.de/nestor/materialien/
nestor_mat_08-eng.pdf>
Neumayer, R. and Rauber, A. (2007) Why appraisal is not ‘utterly’ useless and why it’s not
the way to go either: a provocative position paper <www.digitalpreservationeurope.
eu/publications/appraisal_final.pdf and also responses at www.digitalpreservation
europe.eu/forum/phpBB2/viewtopic.php?t=9>
NISO (2004) Understanding metadata. Bethesda, MD: National Information Standards
Organization Press
Noonan, D.W., McCrory, A. and Black, E.L. (2010) PDF/A: a viable addition to the
preservation toolkit. D-Lib Magazine, 16 (11-12) <www.dlib.org/dlib/november10/noonan/
11noonan.print.html>
NSF-DELOS Working Group on Digital Archiving and Preservation (2003) Invest to save:
report and recommendations of the NSF-DELOS Working Group on Digital Archiving
and Preservation. National Science Foundation & The European Union
Nunberg, G. (1995) The places of books in the age of electronic reproduction. In Future
libraries, ed. R.H. Bloch and C.A. Hesse, pp.13-37. Berkeley: University of California
Press
OCLC/RLG Working Group on Preservation Metadata (2002) A metadata framework to
support the preservation of digital objects <www.oclc.org/research/projects/pmwg/pm_
framework.pdf>
OCLC/RLG PREMIS Working Group (2004) Implementing preservation strategies for
digital materials: current practice and emerging trends in the cultural heritage com-
munity. Dublin, OH: OCLC
OCLC/RLG PREMIS Working Group (2005) Data dictionary for preservation metadata: final
report of the PREMIS Working Group. Dublin, OH: OCLC
Oliver, G. et al. (2008) Report on automated re-appraisal: managing archives in digital librar-
ies. Pisa: DELOS NoE
O’Mahony, D.P. (1998) Here today, gone tomorrow: what can be done to assure permanent
public access to electronic government information? Advances in Librarianship, 22,
107-121
Open Office (2010) File formats <wiki.services.openoffice.org/wiki/Documentation/OOo3_
User_Guides/Getting_Started/File_formats>
Pacey, A. (1991) Developing selection criteria for special collections. Canadian Library
Journal, 48, 187-190
Paradigm Project (2008) Workbook on digital private papers <www.paradigm.ac.uk>
PARBICA (2004?) Digital preservation. Toolkit guideline 18 (unpublished)
Paskin, N. (2010) Digital Object Identifier (DOI®) system. In Encyclopedia of Library and
Information Sciences, 3rd ed., ed. by M.J. Bates and M.N. Maack, 1(1), pp.1586-1592.
Boca Raton, Fl.: CRC Press
Pearson, D. (2009) Preserve or preserve not: there is no try: some dilemmas relating to per-
sonal digital archiving <www.nla.gov.au/openpublish/index.php/nlasp/article/view/1388/
1678>
Pennock, M. (2006a) DSpace digital repository software. Edinburgh: Digital Curation Centre
<www.dcc.ac.uk/webfm_send/462>
Pennock, M. (2006b) Fedora. Edinburgh: Digital Curation Centre <www.dcc.ac.uk/webfm_
send/463>
Persistent identifiers (2002) [PADI summary]. National Library of Australia <www.nla.gov.
au/padi/topics/36.html>
Piggott, M. (2001) Appraisal: the state of the art: paper delivered at a professional develop-
ment workshop presented by ASA South Australia Branch 26 March 2001 <asa.oxide
interactive.com.au/appraisal-state-art-26-march-2001>
Planets (2009) Survey analysis report <www.planets-project.eu/docs/reports/planets-survey-
analysis-report-dt11-d1.pdf>
Rackley, M. (2010) Internet Archive. In Encyclopedia of Library and Information Sciences,
3rd ed., ed. by M.J. Bates and M.N. Maack, 1(1), pp.2966-2976. Boca Raton, Fl.: CRC
Press
Reich, V. and Rosenthal, D.S. (2009) Distributed digital preservation: private LOCKSS net-
works as business, social, and technical frameworks. Library Trends, 57 (3), 461-471
Research Information Network (2008) Stewardship of digital research data <www.rin.ac.uk/
system/files/attachments/Stewardship-data-guidelines.pdf>
Rhodes, S. (2011) Breaking down link rot: the Chesapeake Project Legal Information Ar-
chive’s examination of URL stability <www.llrx.com/features/linkrot.htm>
RLG-NARA Task Force on Digital Repository Certification (2007) Trustworthy repositories
audit & certification: criteria and checklist. Chicago: Center for Research Libraries
<www.crl.edu/PDF/trac.pdf>
RLG/OCLC Working Group on Digital Archive Attributes (2002) Trusted digital reposito-
ries: attributes and responsibilities. Mountain View, CA: Research Libraries Group
Robinson, M. (2009) Institutional repositories: staff and skills set. Nottingham: SHERPA
<www.sherpa.ac.uk/documents/Staff_and_Skills_Set_2009.pdf>
Rosenthal, D. (2010a) Bit preservation: a solved problem? International Journal of Digital
Curation, 5 (1), 134-148 <www.ijdc.net/index.php/ijdc/article/viewFile/151/224>
Rosenthal, D. (2010b) Format obsolescence: assessing the threat and the defenses. Library Hi
Tech, 28 (2), 195-210 <lockss.stanford.edu/locksswiki/files/LibraryHighTech2010.pdf>
Rosenthal, D. (2010c) The half-life of digital formats, dshr’s blog, November 24
<blog.dshr.org/2010_11_01_archive.html>
Rosenthal, D. (2010d) Keeping bits safe: how hard can it be? Communications of the ACM,
53 (11), 47-55
<http://cacm.acm.org/magazines/2010/11/100620-keeping-bits-safe-how-hard-can-it-be/fulltext>
Rosenthal, D. (2010e) LOCKSS: Lots of Copies Keep Stuff Safe. Presented to the NIST
Digital Preservation Interoperability Framework Workshop, March 29-31
<http://lockss.stanford.edu/locksswiki/files/NIST2010.pdf>
Ross, S. (2000) Changing trains at Wigan: digital preservation and the future of scholarship.
London: National Preservation Office, British Library <www.bl.uk/blpac/pdf/wigan.pdf>
Ross, S. (2002) Position paper on integrity and authenticity of digital cultural heritage objects.
DigiCULT Thematic Issue, 1, 7-8
<www.digicult.info/downloads/thematic_issue_1_final.pdf>
Ross, S. (2004) The role of ERPANET in supporting digital curation and preservation in
Europe. D-Lib Magazine, 10 (7/8) <www.dlib.org/dlib/july04/ross/07ross.html>
Ross, S. (2007) Digital preservation, archival science and methodological foundations for
digital libraries. Keynote address at the 11th European Conference on Digital Libraries,
Budapest, 17 September <www.ecdl2007.org/Keynote_ECDL2007_SROSS.pdf>
Ross, S. and Gow, A. (1999) Digital archaeology: rescuing neglected and damaged data
resources: a JISC/NPO study within Electronic Libraries (eLib) Programme on the
Preservation of Electronic Materials. London: Library Information Technology Centre
<www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf>
Ross, S., Greenan, M. and McKinney, P. (2003) Cross-sectoral development of digital
preservation strategies: ERPANET and the expansion of knowledge. In Symposium 2003:
Preservation of Electronic Records: New Knowledge and Decision-making, Ottawa,
15-18 September 2003: preprints
Ross, S. et al. (2004) New organisational structures responding to new challenges: the Digital
Curation Centre in the UK. DCC Public Lecture, Schweizerisches Bundesarchiv, 25
October 2004
<www.erpanet.org/events/2004/bern/SR_DCCpresentation_BERNE_erpanet_mtg_2.pdf>
Rothenberg, J. (1995) Ensuring the longevity of digital documents. Scientific American,
272, 42-47
Rothenberg, J. (1999a) Avoiding technological quicksand: finding a viable technical
foundation for digital preservation. Washington, D.C.: Council on Library and Information
Resources
Rothenberg, J. (1999b) Ensuring the longevity of digital information. Washington, D.C.:
Council on Library and Information Resources (Expanded version of Rothenberg 1995)
<www.clir.org/pubs/archives/ensuring.pdf>
Rothenberg, J. (2000) An experiment in using emulation to preserve digital publications
(NEDLIB report series, no. 1). Den Haag: Koninklijke Bibliotheek
<www.kb.nl/hrd/dd/dd_links_en_publicaties/nedlib/NEDLIBemulation.pdf>
Rothenberg, J. (2003) Digital preservation summary. Presented at Practical Experiences in
Digital Preservation Conference, National Archives, Kew, 2-4 April
<www.nationalarchives.gov.uk/documents/rothenberg.pdf>
Rusbridge, A. (2004) Recognising advances in digital preservation. DigiCULT.Info: A
Newsletter on Digital Culture, 8, 34-36
Saffady, W. (1993) Electronic document imaging systems: design, evaluation, and
implementation. Westport, CT: Meckler
Sanett, S. (2003) The cost to preserve electronic records in perpetuity: comparing costs
across cost models and cost frameworks. RLG DigiNews, 7 (4)
<worldcat.org/arcviewer/1/OCC/2007/07/10/0000068885/viewer/file1.html#feature2>
Schonfeld, R.C. et al. (2004) The nonsubscription side of periodicals: changes in library
operations and costs between print and electronic formats. Washington, D.C.: Council
on Library and Information Resources
Seadle, M. (2010) Archiving in the networked world: LOCKSS and national hosting. Library
Hi Tech, 28 (4), 710-171
Wheatley, P. (2008) Costing the digital preservation lifecycle more effectively. Presented at
iPRES 2008, British Library, London, September 29
<www.bl.uk/ipres2008/presentations_day1/19_Wheatley.pdf>
Wheatley, P. et al. (2007) The LIFE Model v1.1. London: LIFE Project
<http://discovery.ucl.ac.uk/4831>
Whyte, A. and Wilson, A. (2010) How to appraise & select research data for curation.
<www.dcc.ac.uk/resources/how-guides/appraise-select-research-data>
Wilson, A. (2003) Why the Dublin Core Metadata Initiative (DCMI) is important. DigiCULT.
Info: A Newsletter on Digital Culture, 6, 32-34
Woods, K. and Brown, G. (2008) Migration performance for legacy data access.
International Journal of Digital Curation, 3 (2), 74-87
<www.ijdc.net/index.php/ijdc/article/viewFile/88/59>
Woodyard, D. (2000) Digital preservation: the Australian experience. Paper presented at
Positioning the Fountain of Knowledge: 3rd Digital Library Conference, Kuching 2-4
October <www.nla.gov.au/nla/staffpaper/dw001004.html>
Workshop on Research Challenges in Digital Archiving and Long-term Preservation (2003)
It’s about time: research challenges in digital archiving and long-term preservation:
final report. Washington, D.C.: National Science Foundation and Library of Congress
Workshop on the Future of the Past (2011) Summary Report on the Proceedings,
Luxembourg, 4-11 May 2011. Luxembourg: European Commission
<http://cordis.europa.eu/fp7/ict/telearn-digicult/future-of-the-past-summary_en.pdf>
Wright, R. (2005) Annual report on preservation issues for European audiovisual collections
<www.prestospace.org/project/deliverables/D22-4_Report_on_Preservation_Issues_2004.pdf>
York, J. (2010) Building a future by preserving our past: the preservation infrastructure of
HathiTrust Digital Library. Paper presented at World Library and Information Congress:
76th IFLA General Conference and Assembly, Gothenburg, Sweden, 10-15 August
<www.hathitrust.org/documents/hathitrust-ifla-201008.pdf>
Zierau, E. and van Wijk, C. (2008) The Planets approach to migration tools. Paper presented
at IS&T Archiving conference
<www.planets-project.eu/docs/papers/Archiving2008_Zierau_Wijk.pdf>
Index
A
acceptable loss 76, 98
access, changing requirements 15
  definition 19
  maintenance 204
access devices 51-52, see also playback equipment
accessibility, definition 19
acclimation, of storage media 126
accountability 26, 28, 88
action, need for 203-204
active strategies 100
administrative metadata 83
AHDS see Arts and Humanities Data Service
Alabama Digital Preservation Network 178-179
Alliance for Permanent Access 31, 172, 210
alteration for preservation 15-16, 75, 89-90, 160
Amazon AWS 177
Amazon S3 165
amnesiac society 33
analogue backups 100, 123, 129-130
appraisal 56, 61-63, 64
  and systems design 71
appraisal practice, and digital materials 63-64
archival discs 37, 44, 128
archival file formats, developing 141, 153-154
archival practice 28, 58, 79-80
Archival Resource Key see ARK
archival value 10-11, 61
Archivematica 166
archives, and memory 58
Archives Association of Ontario 62
archiving, IT definitions 18
archivists, and selection 56, 61-63
  preservation responsibility 28, 57
ARK 87
ARL see Association of Research Libraries
artifact as carrier of information 7, 11
artifacts, preservation 9, 10, 14, 17, 60-61, 75
Arts and Humanities Data Service 188, 196
Association of American Publishers 86
Association of Research Libraries 9, 170
  survey of libraries 106-107
attributes to be preserved 75-98, 203
audiovisual archiving, technology preservation 132
audit trails 90
Australian National Data Service 207-208
Australian practice 5
Australian preservationists, views on strategies 105-106
Australian Standard for Records Management 63
authenticity 18, 76, 77, 80, 87-97, 108, 156, see also trustworthiness
  definition 20, 22, 88
  ensuring 40, 41, 54, 132, 204, 205
  pre-digital conditions 89
  research 95-97
  threats to 89-90
automation 103, 110, 205
  of appraisal processes 64, 73
  of metadata creation and management 85-86
  of migration 160
awareness, of preservation issues 37-38, 41-42, 103, 200, 204-205

B
‘backup and restore’ 142, 143, 164
backwards compatibility 100
Barrow, W., on deterioration of books 33
BBC Domesday Project 35, 36, 37, 135, 136-137, 157, 183
BEAM 131
benign neglect 3, 8, 10, 17, 201
E
e-Depot 185
e-journals 32
e-repositories see digital repositories
e-research see cyberscholarship
e-science see cyberscholarship
EAD see Encoded Archival Description
economic rationale 26
economic sustainability 209-210
ECPA see European Commission for Preservation and Access

F
Fedora 175, 177
FIDO project 131
file compression 52-53
file formats 141, 143-154
  proliferation 40
  registries 147-150
  restricting range of 152-153, see also normalization
  standardization 144, 150-152, 157
  sustainability criteria 150-151