
Citation for published version:

Heery, R, Powell, A & Day, M 1997, 'Metadata', Library & Information Briefings, vol. 75, pp. 1-19.

Publication date:
1997


University of Bath

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners
and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.



Issue 75 September 1997

Metadata

by Rachel Heery, Andy Powell and Michael Day


Metadata Projects Group, UKOLN: The UK Office for Library and Information Networking, University of Bath

‘Metadata’ has become a fashionable and overused term, but nevertheless provides a useful label within the library world for description of digital resources. It is an important part of the activity being undertaken to impose some order on the explosion of material available across networks. This Briefing examines metadata within the context of network information management and describes some of the growing number of projects and services which are now using metadata for resource discovery in a networked environment.

INTRODUCTION

Why metadata?

Some people do not like the term ‘metadata’. Metadata means subtly different things within the various disciplines that use the term. It has also become a fashionable term, and is often overused. We would argue, however, that it is a label useful within the library world for referring to information about resources, and in particular description of digital resources. There is a different emphasis within the computer science disciplines, where the term refers to data which describe data elements, datasets or database management systems, and where metadata models and metadata systems are constructed to integrate disparate databases. One can see overlaps between such work and resource discovery and information management, but there are marked differences in the nature of the data described: the unit being described would be a data element in computer science, and a resource in the information world. In the information world metadata may consist of an agreed set of data elements with agreed semantics, agreed syntax and agreed rules for formulating the content of the elements.

The term metadata is useful in that it acknowledges a significant change in the emphasis between traditional book cataloguing and the activity being undertaken today to impose some order on the explosion of material available across networks. Caplan points out the advantages of using a ‘new’ term that does not have the traditional connotations of cataloguing.1 The popularity of its usage is indicative of the interest in resource description running across both computer science and librarianship. It reflects changes in the nature of cataloguing brought about by digital technology, changes which David Levy typifies as ‘cataloguing in the digital order’.2

Within this Briefing we will be examining metadata in the context of what is traditionally called bibliographic control but might more widely be understood as network information management. We will use metadata to mean the information about a resource which enables us to identify, locate and request that resource. Metadata also allows us to manage resources, both in terms of local database management, and access management (for example controlling terms and conditions of access). Metadata can be ‘descriptive data’, such as author, title; ‘subject data’, such as uncontrolled keywords or controlled language descriptors; ‘access data’, describing hardware and software requirements for using a resource; and metadata might also be ‘administrative data’, describing the metadata itself, such as who created the record, date the record was created, owner of the metadata record. It might also include information about terms and conditions of use. The range of metadata as described here illustrates that metadata is itself ‘data’ and, particularly in the context of system design, is not usefully distinguished from other data.3

What is metadata for?

Much activity is centred on development of metadata formats and the standardization of these formats. The emphasis on formats should not obscure the importance of the process requirements: metadata cannot be viewed in isolation from the context in which it is used. Within information systems metadata performs a range of functions. These include:

• Searching: identifying the existence of a resource by keyword searching, browsing indexes or visualization techniques.
• Location: finding a particular instance of a resource.
• Selection: analysis and evaluation based on the description provided.
• Semantic interoperability: allowing searching across domains by means of equivalent elements.
• Resource management: collection and database management.
• Terms of availability information.

Decisions about formats will be influenced by which of the above functions the metadata will perform. Thus within a system it will sometimes be appropriate to have a simple metadata format, for example to allow


for interoperability in searching across subject domains, while on occasion a richer format will be required to enable selection of resources in a specialized domain.

Much of this Briefing will concentrate on metadata as it relates to networked resources and in particular to World Wide Web resources. The opportunities provided by the Web for new services and new publishing processes require new forms of resource description. The volatile nature of Web documents and the continuing increase in the amount of information being made available are driving services to seek alternatives to high cost traditional cataloguing. Services looking at incorporating the advantages of an automated approach to indexing are tending towards the use of simple resource description formats.

Resource discovery service models

In the context of the Web, users are offered alternative options for discovering resources, all of which are based more or less on structured metadata. These include:

• Lists: lists of pointers to useful resources.
• Searching: by keyword or controlled vocabulary.
• Browsing: alphabetically by subject keyword, or using more formal subject classification schemes.
• Visualization: navigation of the Web site by spatial browsing techniques.

At present the predominant service for discovery of Web resources is the search engine or search service which may use one or more of these techniques. Search engines can be categorized by their coverage and selection policy, and by the method by which their indexes are created. A number of search services have been evaluated although the lack of information on policies available from the larger services makes comparisons difficult.4,5

Coverage of search engines can be characterized as:

• Global: these would attempt to cover all Web sites, although in reality this will be limited in terms of granularity and frequency of update.
• Geographical: covering all Web sites in a particular area, country or region.
• Sectoral: this might be a subject area, a user community like higher education, or a curatorial tradition like museums, libraries or archives.
• Selective: typically sectoral services will select resources for description on the basis of quality criteria.
• Organizational/Intranet: organizations or individuals may want to allow searching of their own resources.

The indexes on which these services are based may be derived from the automatically harvested full text of Web resources, or they may be based on records created manually. Pilot implementations are now beginning to make use of metadata embedded in resources, in particular Dublin Core embedded in HTML. In the future it seems likely that more metadata will be held on Web sites independently of the HTML, or on third party databases linked to the Web resource.

Range of formats

When examining the issues surrounding the use of metadata within the Web environment, it is helpful to consider the wider context of resource discovery. Metadata formats vary according to a number of criteria and there is increasing awareness of the strengths and weaknesses of these various diverse formats. Metadata ranges from generic simple Internet resource descriptions to highly structured records relating to complex objects such as databases. On the one hand there is the full text indexing of the global search services (Excite, Lycos, etc.) where the complete text of Web documents is indexed, there is no fielded record, and the ‘display record’ is an extract from the full text, typically the first few lines. On the other hand there are the complex tagged records of MARC formats, or the analytical mark-up of SGML-based formats.

Detailed reviews of current metadata formats have been carried out elsewhere.6,7 Here we will present a


simple typology of formats along a continuum from simple to rich (Figure 1).

Depending on the position on this continuum from simple to rich it is possible to associate a number of characteristics with the three bands of metadata and these are summarized briefly in Figure 2. The simplest formats are used to create relatively unstructured indexes for locating items, whereas the most complex records can be used as the basis of sophisticated analysis and navigational tools. The simpler records are created automatically and the more complex by hand. This will affect the overall cost of record creation. Simpler records do not permit complex designation of sub-fields and qualifiers whereas the richer records have defined rules for detailed designation of sub-fields. The more complex formats are associated with relatively heavy-weight search and retrieve protocols (like Z39.50), whereas the simpler formats tend to be associated with directory service protocols.

DUBLIN CORE

Dublin Core history

The Dublin Core Element Set (Dublin Core or DC) is a fifteen element metadata set that is primarily intended to aid resource discovery on the Web.8 Dublin Core forms a simple description record, which has emerged as a result of a series of workshops sponsored by the Online Computer Library Center (OCLC) and other organizations:

• OCLC/NCSA Metadata Workshop, Dublin, Ohio. March 1995.
• OCLC/UKOLN Warwick Metadata Workshop, Warwick. April 1996.
• CNI/OCLC Image Metadata Workshop, Dublin, Ohio. September 1996.
• Fourth Dublin Core Workshop, Canberra. March 1997.

FIGURE 1: A SIMPLE TYPOLOGY OF RESOURCE DISCOVERY METADATA

Simple                                                      Rich

Full text indexing,        ROADS templates                  MARC
e.g. Alta Vista            Dublin Core                      TEI headers
Proprietary formats,       SOIF                             CIMI
e.g. Yahoo!, NetFirst                                       EAD

FIGURE 2: ASSOCIATED CHARACTERISTICS OF METADATA FORMATS

Simple                                                      Rich

Location                   Selection                        Evaluation and analysis
Robot generated            Robot plus manual input          Manually created
Unstructured               Attribute/value pairs            Sub-fields
Proprietary                Emerging standards               International standards


The workshops represent a consensus building effort which has included participants from a range of backgrounds (IETF, SGML, digital library research), domains (text, image, geographic information systems) and professions (librarians, computer scientists, content specialists). This consensus and the international acceptance of Dublin Core are probably the most significant outcomes of the workshops, and have largely been achieved through the leadership of OCLC.

The objectives for Dublin Core set by the first workshop were firstly to define a simple set of data elements so that authors and publishers of Internet documents could create their own metadata with no extensive training, the Dublin Core approach being mid-way between the detailed tagging of MARC or structured TEI headers and the automatic indexing of locator services such as Alta Vista. Secondly, Dublin Core aimed to provide a basis for semantic interoperability between other, more complicated, formats. By means of mapping from more complex formats, and by ‘filtering’ more complex formats, Dublin Core facilitates searching across other disparate record formats.

An initial element set was agreed upon and certain principles were established for further development of the set, these being:

• Extensibility: the core set can be extended with further elements as it is acknowledged that many ‘publishers’ or metadata producers may wish to augment this simple set with more specialized data.
• Optionality: all elements are optional.
• Repeatability: all elements are repeatable.

During the first workshop there was an explicit decision not to define syntax at that stage, but first to reach consensus on the semantics of a minimum element set. To tie Dublin Core semantics to any one particular syntax (as in the MARC family of record formats) was seen as unhelpful. The second workshop, which took place in the UK at the University of Warwick in April 1996 sponsored by UKOLN and OCLC, went on to consider possible syntaxes. Embedding metadata in resources using HTML was the obvious choice to fulfil the immediate need of pilot implementations.

The Warwick workshop also looked at the implementation of Dublin Core and requirements for extensibility, change control and implementation. The Warwick framework emerged as a concept from the second workshop. This is a model for a container architecture for packages of metadata, each package being metadata of a different type.9,10

The third workshop, the CNI/OCLC Image Metadata Workshop, considered use of the Dublin Core element set for describing images, in particular those images which could be defined as ‘document like objects’. Perhaps surprisingly the workshop reached the conclusion that images could be described using the minimal Dublin Core elements with some minor adjustments.

Discussion prior to the fourth workshop in Canberra resulted in agreement to extend the thirteen elements agreed in Dublin to fifteen, and these fifteen have been defined and documented in an Internet-Draft.11 Note that all Dublin Core elements are optional, so you do not have to embed all fifteen elements into each Web page. They can also be repeated if necessary, for example to indicate that a page has more than one author.

The semantics of Dublin Core elements can be modified using qualifiers, and use of qualifiers was central to discussions at the Canberra workshop.12 There are three kinds of qualifier: the TYPE qualifier which refines the meaning of an element; the SCHEME qualifier which indicates that the element value conforms to some external and widely recognized scheme; and the LANGUAGE qualifier which indicates the language of the element value. It has been agreed that the use of qualifiers should refine the element rather than extend it. In general, the intention is that a Web robot should be able to take the embedded Dublin Core metadata, throw away all of the qualifiers and still have something meaningful to add to its index. However, the widespread use of qualifiers could cause severe problems with interoperability.

The marked confidence in Dublin Core has had significant impact on standards-making activities such as USMARC discussions, Z39.50, and W3C initiatives;


it has also been chosen as the solution for early implementations within projects in Australia, Scandinavia, Europe and the US.

Dublin Core creation and management

By embedding Dublin Core metadata into Web pages and then gathering it into searchable databases using Web robots it will be possible to provide Web-based search services with improved precision over those currently available.

In order for Web page authors and Web-site administrators to be able to embed Dublin Core metadata into Web pages there need to be tools available.13 As an aid to creating Dublin Core META tags several Web based ‘Dublin Core generators’ have been made available on the Web. One of these is DC-dot, available from the UKOLN Web-site.14 DC-dot first prompts for the URL of the Web page that you want to describe. It then retrieves that page from the Web and automatically generates Dublin Core META tags to describe it. The Dublin Core META tags are then displayed in such a way that they can be updated and extended manually using a Web form. Once editing is complete the tags can be copied into a Web page using cut-and-paste to a text editor. Alternatively, DC-dot will convert the Dublin Core into other formats, including USMARC, SOIF, XML, IAFA/ROADS, and send these formats back to you via your Web browser or e-mail.

However, the last few years have seen a general move away from using simple text editors to create and maintain HTML pages towards the use of more sophisticated authoring tools. These tools do not, in general, make it easy to add META tags to Web pages. Even where tools do allow for the creation of META tags there are longer term issues associated with embedding metadata by hand that must be considered. What happens if the syntax for embedding metadata in HTML changes in the future? How easy will it be to move embedded metadata into alternative metadata formats that are likely to become more commonly used in the future, for example in PICS-NG?

More recently Web-site management tools have become available which hold all the pages for a site in a database. A ‘publish’ button causes the information in the database to be written out as a set of HTML Web pages. These tools have the immediate advantage of standardizing the style of Web pages across a site, and in future may become metadata aware. In the meantime the use of these tools for managing metadata may be possible using available ‘macro’ facilities.

Sites interested in home grown solutions to the issues of managing metadata may choose to hold the metadata separately, in a neutral format, and then convert it and embed it into Web pages using ‘server-side include’ scripts. A more detailed description of one such system being implemented at UKOLN is available elsewhere.15

WEB INDEXES

Harvesting

Once Dublin Core metadata is embedded into significant numbers of HTML Web pages it needs to be collected into a Web index so that it can be made available using a search engine. This may be done on a site-wide basis, to form a local site search engine, or it may be done across a group of Web servers to form a more comprehensive search engine encompassing, for example, all the Web pages in a geographical region or subject area. The collection of metadata from Web pages is usually done using a Web robot. A Web robot can be thought of as an automated Web browser. Starting from a given URL or set of URLs it visits each page in turn extracting the embedded metadata and adding it into a database (Web index). For each page visited, the robot also extracts all the embedded links in the page and adds them into a list of URLs still to be visited. The robot needs to maintain this list of URLs in such a way that it does not visit the same server too often in quick succession, thus overloading it, but also needs to ensure that pages are revisited fairly regularly so that information in the


METADATA IN HTML
HTML allows arbitrary metadata to be embedded into the head section using the META tag. To make things
clearer, here is an example:
<HTML>
<HEAD>
<TITLE>UKOLN: UK Office for Library and Information Networking</TITLE>
<META NAME="Keywords" CONTENT="national centre, network information support, library
community, awareness, research, information services, public library networking,
bibliographic management, distributed library systems, metadata, resource discovery,
conferences, lectures, workshops">
<META NAME="Description" CONTENT="UKOLN is a national centre for support in
network information management in the library and information communities. It provides
awareness, research and information services and is based at the University of Bath">
</HEAD>
<BODY>
...
</BODY>
</HTML>
In this example, the TITLE tag and the two META tags give the title, some keywords and a short description
for the page. Note that the HTML specification does not say anything about what type of metadata should be
placed into the META tags. However, the Web robots used by some of the big Internet search engines (for
example Alta Vista) look for the two META tags shown in this example and use them to improve the
effectiveness of their searches. Words found in these tags are given extra weight when they match user
queries and pages with these tags tend to appear higher up in search results than pages without them. Because
of this, these two META tags are in fairly common usage.
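
As a rough illustration of how a robot might read the TITLE tag and these two META tags, here is a minimal Python sketch built on the standard library html.parser module. The class name MetaTagCollector and the shortened sample page are hypothetical, chosen only for this example; the robots run by the large search services are certainly more elaborate, and nothing here describes how any particular service works.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects the page title and NAME/CONTENT pairs from META tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}  # lower-cased NAME -> CONTENT
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            fields = dict(attrs)
            name = fields.get("name")
            content = fields.get("content")
            if name and content is not None:
                # Collapse line breaks inside long CONTENT values.
                self.meta[name.lower()] = " ".join(content.split())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical, shortened version of the example page above.
SAMPLE_PAGE = """<HTML>
<HEAD>
<TITLE>UKOLN: UK Office for Library and Information Networking</TITLE>
<META NAME="Keywords" CONTENT="metadata, resource discovery,
 workshops">
<META NAME="Description" CONTENT="UKOLN is a national centre for support in
 network information management.">
</HEAD>
<BODY>...</BODY>
</HTML>"""

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)
keywords = [k.strip() for k in collector.meta["keywords"].split(",")]
print(collector.title)
print(keywords)
print(collector.meta["description"])
```

The sketch keeps only the last value seen for each NAME, which is enough for Keywords and Description; an index that wanted to weight these words more heavily would do so at a later stage.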

DUBLIN CORE IN HTML


The elements in the Dublin Core are TITLE, SUBJECT, DESCRIPTION, CREATOR, PUBLISHER, CONTRIBUTOR,
DATE, TYPE, FORMAT, IDENTIFIER, SOURCE, LANGUAGE, RELATION, COVERAGE and
RIGHTS. These elements can be embedded into META tags in the head section of a Web page in a similar
way to the example above. Here is the same page with embedded Dublin Core tags:
<HTML>
<HEAD>
<TITLE>UKOLN: UK Office for Library and Information Networking</TITLE>
<META NAME="DC.title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.subject" CONTENT="national centre, network information support, library
community, awareness, research, information services, public library networking,
bibliographic management, distributed library systems, metadata, resource discovery,
conferences, lectures, workshops">
<META NAME="DC.description" CONTENT="UKOLN is a national centre for support in network
information management in the library and information communities. It provides
awareness, research and information services and is based at the University of Bath">
<META NAME="DC.creator" CONTENT="UKOLN Information Services Group">
</HEAD>
<BODY>
...
</BODY>
</HTML>
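
A harvesting program would need to pull these DC tags back out of the page. The following Python sketch shows one way this could be done with the standard library html.parser module, storing each element as a list of values because all Dublin Core elements are optional and repeatable. The names DublinCoreCollector and extract_dublin_core, the sample page and the second creator value are hypothetical and exist only for illustration. Qualified element names are simply ignored here, since their syntax is not covered in this box.

```python
from html.parser import HTMLParser

# The fifteen element names listed in the text above.
DC_ELEMENTS = {
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


class DublinCoreCollector(HTMLParser):
    """Gathers META tags whose NAME has the form 'DC.<element>'."""

    def __init__(self):
        super().__init__()
        # element name -> list of values, because elements may repeat
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name = (fields.get("name") or "").lower()
        content = fields.get("content")
        if not name.startswith("dc.") or content is None:
            return
        element = name[len("dc."):]
        if element in DC_ELEMENTS:
            value = " ".join(content.split())  # collapse line breaks
            self.record.setdefault(element, []).append(value)


def extract_dublin_core(html_text):
    """Return a dict mapping each DC element found to its list of values."""
    collector = DublinCoreCollector()
    collector.feed(html_text)
    return collector.record


# Hypothetical sample page; the second creator is invented to show repetition.
SAMPLE_PAGE = """<HTML><HEAD>
<TITLE>UKOLN: UK Office for Library and Information Networking</TITLE>
<META NAME="DC.title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.creator" CONTENT="UKOLN Information Services Group">
<META NAME="DC.creator" CONTENT="A second, purely illustrative creator">
<META NAME="Keywords" CONTENT="metadata">
</HEAD><BODY>...</BODY></HTML>"""

record = extract_dublin_core(SAMPLE_PAGE)
print(record)
```

Elements that are absent from a page simply do not appear in the result, and the ordinary Keywords tag is skipped because its NAME does not begin with the DC prefix.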


database does not become out of date. For large search engines covering many Web sites it may be necessary to run several Web robots on several machines, all feeding metadata into the same database, in order to increase the rate at which Web pages can be indexed.

This is exactly how the big search engines, like Alta Vista, function. However, their Web robots do not currently look for embedded Dublin Core and thus have to extract the available metadata in the form of Keywords and Description META tags or try to automatically generate metadata based on the text of the HTML page or simply build a full-text index. In many cases a combination of these three approaches is taken.

In the case of building a search engine for a single Web-site it may not be necessary to run a Web robot to collect metadata. The Web index can be built directly from the files on the Web server filestore. This is the approach taken by the public domain CNIDR Isite software.16 Isite is an integrated Internet publishing software package including a text indexer, a search engine and Z39.50 communication tools to access databases.17 It is worth noting that there are a couple of problems in building an index based directly on files rather than by using a Web robot. Firstly, a filestore view of a Web server may include many pages that are not visible on the Web (because they are not linked to any other pages). It may well be undesirable to include such pages in a Web index. Secondly, metadata that is embedded using server side includes (SSI) will not be available to a program that simply reads a file from the Web filestore.

Although none of the big search engines looks for embedded Dublin Core metadata, there are some projects that are developing robots that do. The European DESIRE project is building a partial European Web index, covering the Nordic countries, using a Web robot that is being enhanced to extract embedded Dublin Core metadata.18 Similarly, the UK Electronic Libraries Programme (eLib) NewsAgent for Libraries project will obtain information content for the service by the use of a Web robot that will look for embedded Dublin Core and other, NewsAgent-specific, metadata in pages.19 The eLib ROADS project, which provides the tools used by the other eLib ‘subject services’ to construct databases of Internet resource descriptions, will also use this software to construct robot-generated ROADS databases. There are other projects around the world looking at similar areas.20 Some of these projects are described in more detail later in this Briefing.

Distributed searching

Once metadata has been collected using a Web robot, it needs to be made available for searching. There are several approaches to this. A fundamental concept is that of centralized versus distributed searching. A centralized search engine pulls all the metadata into a single database. Although this database may be mirrored in several places, users only have the opportunity of searching one database at a time. Alta Vista is an example of a centralized Web index. A distributed Web index is made up of a group of databases that may well be physically distributed across the Internet. In addition to sharing the load across multiple servers this approach also allows for localized management of server databases. Searches may be sent in parallel to all the databases and the results merged, or may be routed to appropriate databases in some way.

There are various protocols available to facilitate distributed searching, including Z39.50, WHOIS++, and LDAP (described below). These protocols enable a client to send a search request to a server and obtain results from several databases. Depending on the protocol and the contents of the underlying database, the client may be able to request more detailed information about the search results (which may initially be returned as a simple list of hits) and may also be able to request that the full text of the object be returned. In some cases the client may be a dedicated piece of software, for example a Java applet or a Web browser plug-in, running on the end user’s local computer. Often, however, the search client will be a CGI based gateway running on a Web server and accessed by the end user as a Web based form.21,22

The DESIRE European Web Index, following the


distributed model, is made available using several GILS compliant Z39.50 servers, one per country. Users indicate which of the servers they would like their search sent to as part of specifying the search. Results from multiple servers are merged before being displayed to the user.

In the ROADS project, distributed ROADS databases are made available using the WHOIS++ protocol. Searches across several ROADS databases (both robot-generated and manually constructed) are possible with searches currently being sent to each server in parallel. Future versions of the ROADS software will support the Common Indexing Protocol, which allows servers to share knowledge about their databases, and thus route queries between different servers in a more efficient manner.23 It should be noted that the Common Indexing Protocol is not specific to WHOIS++ and could be used, in theory, to route queries between multiple LDAP servers or multiple Z39.50 servers.

PROJECTS AND SERVICES USING METADATA

There is a growing number of projects and services currently using metadata for resource discovery in a networked environment. The following section comprises a brief description of some of these projects.

Projects funded by the Electronic Libraries Programme

Access to Network Resources projects

The UK Electronic Libraries Programme (eLib), a series of projects, demonstrators and services funded by the Joint Information Systems Committee (JISC) of the UK Higher Education Funding Councils, was formed in 1995 in response to recommendations made by the authors of the Report of the Joint Funding Councils’ Libraries Review Group in December 1993 (the Follett Report). Amongst other things, the Report recommended that JISC should fund the ‘development of a limited number of top level networking navigation tools in the UK to encourage the growth of local subject based tools and information servers’.24 Once eLib was in place, it funded several Access to Network Resources (ANR) projects and services.25 These include:

• ADAM: Art, Design, Architecture & Media Information Gateway;
• Biz/ed: Business Education on the Internet;
• EEVL: Edinburgh Engineering Virtual Library;
• IHR-Info: Institute of Historical Research;
• OMNI: Organizing Medical Networked Information;
• RUDI: Resources for Urban Design Information;
• SOSIG: Social Science Information Gateway.

These projects are creating large amounts of metadata for network resources in their specialist areas. These subject services, sometimes called subject-based information gateways, are one solution to the problem of resource discovery on the Internet. The services use specialist staff to select Internet resources ensuring quality control, and these are then described using human-created metadata. The subject service approach to resource discovery is based to some extent on the traditional library model. Resources are chosen according to defined selection criteria and they will then be manually ‘catalogued’ for inclusion in a database. This process ensures that only good quality resources are made available through the service and that sufficient metadata is available to enable the adequate searching and retrieval of these resources. The resulting service often provides access both by searching and by browsing, either by a list of subject terms or by a particular subject-classification. Several of the eLib subject services are based on the software tools developed by the ROADS project.

ROADS: Resource Organization and Discovery in Subject-based services

ROADS is an eLib project, also under the ANR strand, and is a collaboration between the Institute of Learning and Research Technology (ILRT) at the University of Bristol, the UK Office for Library and Information


Networking (UKOLN) at the University of Bath and the Department of Computer Studies at Loughborough University.26 Its aim is to develop and implement a user-orientated resource discovery system enabling users to find and access networked resources. In short, ROADS is developing discovery software for a networked discovery framework primarily with regard to the requirements of the eLib ANR services.

ROADS is very much concerned with metadata: its creation, its organization, and how it can be searched and presented to users. ROADS templates, the metadata format chosen for use by the ROADS project, are based on IAFA/WHOIS++ templates, a format originally designed for anonymous FTP archives. They are based on simple (text based and human readable) attribute/value pairs of variable length. One major advantage of using ROADS templates is the possibility of searching across multiple subject services using the WHOIS++ protocol.27

The nature of the ROADS project has resulted in its participation in wider discussions of metadata and Internet resource discovery. For this reason, ROADS partners have been involved with the Dublin Core initiative and with deployment of WHOIS++. There is also a strong focus on the semantic interoperability of metadata formats: producing metadata mappings or crosswalks; looking at potential interaction with the Z39.50 protocol; and developing template registries, cataloguing rules, etc.

NewsAgent

NewsAgent for Libraries is another eLib project, this time in the Electronic Journals programme area.28 The aim of the project is to create a user-configurable electronic news and current awareness service for library and information professionals, with the information content taken from selected UK library and information science journals and from briefing materials from five organizations. The service will obtain information content from a Web robot designed to look for embedded Dublin Core and other, NewsAgent-specific, metadata. As part of the project, UKOLN have developed a replacement for the HTML summarizer that is available as part of the Harvest suite of resource discovery tools. This work is intended to make the Harvest Web robot Dublin Core aware and will eventually be made available with the public domain version of the Harvest software.29

European Union funded projects

DESIRE: Development of a European Service for Information on Research and Education

The DESIRE Project is an extremely large project funded by the EU Telematics for Research Sector of the Fourth Framework Programme.30 The project is investigating Web technology and the implementation of pilot information services on behalf of European researchers and is divided into ten work packages. The one with the most relevance to metadata issues is work package 3 (WP3), ‘Resource discovery and indexing’,31 which has the general aim of supporting research users of the Internet to locate information relevant to their research. The work package partners include all of the ROADS project partners, together with NetLab (University of Lund, Sweden) and the National Library of the Netherlands. It has two main strands:

• Subject services (subject-based information gateways). Building on the subject service approach to Internet subject services in conjunction with work done at NetLab on engineering (EELS: Engineering Electronic Library, Sweden) and the National Library of the Netherlands (NBW: Nederlandse Basisclassificatie Web), WP3 has looked at quality-controlled subject-based information gateways based on library-type selection and cataloguing skills. A demonstrator is planned for European social science information, together with further services for engineering and fine art.
• Automated indexing of WWW information sources. WP3’s work on providing tools and methods for the automatic indexing of WWW information is an extension of work carried out at NetLab and the National Technological Library of Denmark (DTV) on the Nordic Web Index (NWI). A European Web Index (EWI) will be developed as part of WP3 to provide a harvesting and indexing
service for the academic sector in Europe and to establish a single uniform service with the aim of indexing all European Internet documents relevant to the academic area.

Several reports have been produced as part of the project. NetLab have produced a state-of-the-art review of indexing and data collection methods used in robot-based Internet search services32 and a functional specification for a European Web Index.33 WP3 has also resulted in a three-part report on a Specification for resource description methods which included a survey of current metadata formats, a study of quality selection criteria for Internet subject services, and an evaluation of the use of subject classification schemes for providing access to Internet resources.34

BIBLINK: Linking Publishers and National Bibliographic Services

The BIBLINK project is funded by the Telematics Applications Programme of the European Commission and aims to create an electronic link between publishers of electronic material and national bibliographic agencies.35 The project is led by the British Library, and its partners include the national libraries of France, The Netherlands, Norway and Spain, the Universitat Oberta de Catalunya in Barcelona, and UKOLN. The intention of the project is that the bibliographic experts of the national libraries of Europe, with cooperation of partners in the book industry, will be able to examine what type of descriptive metadata would be required for catalogues of electronic publications and to investigate the possibility of establishing electronic links for the transfer of this metadata from publishers to national bibliographic agencies. BIBLINK intends to produce an interactive demonstration system which would enable selected electronic publishers to transmit metadata to national bibliographic agencies, where this data would then be enriched and converted to specific MARC formats (primarily UNIMARC and UKMARC) for use by national libraries. The level of data required is the minimum amount sufficient to support traditional Cataloguing in Publication (CIP) type functions.

There are two distinct phases in BIBLINK, the second one involving the development and installation of the demonstration system at the sites of the project partners and participating publishers. The first phase, however, consists of a series of seven work packages investigating background issues for BIBLINK. Work Package 1, for example, made recommendations regarding what particular formats should be accepted from publishers, deciding to look at SGML DTDs like Simplified SGML for Serial Headers (SSSH) for complex records and the use of Dublin Core as a minimum element set for data exchange.36 Work Package 2 reviewed the important area of unique identifiers for electronic publications, including the Uniform Resource Name (URN), the Serial Item and Contribution Identifier (SICI) and the Digital Object Identifier (DOI).37 Other work packages have looked at the transmission of data between libraries and publishers, conversion processes to investigate interoperability between publishers’ metadata and MARC formats, and the important area of authentication.

Other metadata-related projects and services

Nordic Metadata Project

The Nordic Metadata Project is funded by NORDINFO, the Nordic Council for Scientific Information, and has six participating organizations.38 The Nordic countries are used to sharing information about printed materials, but there is an awareness that sharing information about electronic documents has been complicated by the inadequacy of current resource discovery mechanisms. The project is using Dublin Core and, amongst other things, is investigating the following:

• The production of conversion tables and programs to convert Dublin Core to Nordic MARC formats. An experimental converter can currently produce NORMARC, FINMARC and USMARC records. Other Nordic formats will be added to the converter, together with a MARC to DC converter, if required. It is intended that the software should also be easily adaptable to convert DC to non-Nordic MARC formats.
• The production of tools for the creation of Dublin Core metadata to encourage an improvement in the quality and quantity of metadata that is made available. A Nordic Metadata DC production template was published at the start of 1997 and has since been modified to conform with the changes to HTML syntax agreed at the DC 4 Workshop in Canberra.
• Working with the DESIRE project to make the Nordic Web Index robot metadata aware so that it can recognize and extract embedded Dublin Core.

The range of activities being carried out by the Nordic Metadata Project (metadata creation, harvesting and interoperability) will be of great interest to others who are considering the implementation of metadata-based systems.

Arts and Humanities Data Service

The Arts and Humanities Data Service (AHDS) is funded by JISC for the collection, description and preservation of the electronic resources that result from and are used by research and teaching in the humanities.39 It consists of an executive based at King’s College London, and five service providers, located throughout the UK:

• Archaeology Data Service (a consortium led by the University of York);
• History Data Service (The Data Archive, University of Essex);
• Oxford Text Archive (Oxford University Computing Services);
• Performing Arts Data Service (Glasgow University);
• Visual Arts Data Service (Surrey Institute of Art and Design).

AHDS will provide a unified catalogue giving access to its service providers’ holdings and possibly to other scholarly collections. For this reason, the AHDS has examined the needs of arts and humanities scholars with regard to information discovery and resource description with the intention of identifying shared metadata requirements which could be used in a distributed catalogue.40 AHDS, in conjunction with UKOLN, initiated Resource Discovery Workshops in early 1997 so that specific requirements in all relevant disciplines could be integrated into a system giving access to a distributed, interdisciplinary and mixed-media collection of digital resources.41 It is recognized that each service provider may have its own preferred formats for storing metadata; for example, the Oxford Text Archive will be using TEI headers. The AHDS is looking at a solution where a core set of metadata, based on Dublin Core, could be used to provide ‘top-level’ access to the distributed AHDS resource, while individual service providers maintain their own specific metadata for their own collections. It is possible that the subject-specific metadata created by service providers could be used to generate automatically (through metadata mappings/crosswalks) a subset of core metadata which could then be used in a ‘top-level’ catalogue.

The MathN Broker

A service currently using metadata is the MathN Broker, a mathematical pre-print service based at the University of Osnabrück, Germany.42 The service grew out of a ‘Fachinformation’ project run by the DMV, the German Mathematical Society. The service gives electronic access to PostScript versions of pre-prints stored on about 40 departmental Web servers in Germany.43 The Harvest software is used for indexing, but this has limitations when used with PostScript. For this reason, the pre-print service indexes metadata which was originally stored in what Roland Schwänzl has described as a ‘preliminary Warwick Container for HTML coded MetaData’, using a format known as the MathDMV-Preprint Core.44 Since the beginning of 1997 the service has used Dublin Core elements embedded in HTML META tags. The metadata can include subject classifications from the Mathematics Subject Classification (MSC), the Physics and Astronomy Classification Scheme (PACS) and the ACM Computing Classification System (CCS), together with subject keywords and abstracts. The metadata is provided by authors using a Web page with a FORMS interface called the Mathematics Metadata Markup editor (MMM).
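
To make the idea of Dublin Core elements embedded in HTML META tags more concrete, the following Python sketch renders a small record as META elements of the kind a forms-based editor might emit. It is illustrative only: the `DC.` name prefix, the use of a SCHEME attribute for classification schemes, the sample record and the function name `dublin_core_meta` are assumptions made here, not a description of the MMM editor or of the syntax agreed at DC-4.

```python
from html import escape


def dublin_core_meta(record):
    """Render (element, value, scheme) triples as HTML META tags.

    `record` is a list of tuples; `scheme` may be None. The "DC." prefix
    and the SCHEME attribute are assumed conventions for this sketch.
    """
    lines = []
    for element, value, scheme in record:
        attrs = 'NAME="DC.%s"' % escape(element, quote=True)
        if scheme:
            attrs += ' SCHEME="%s"' % escape(scheme, quote=True)
        # Escape the value so that quotes or angle brackets cannot break the tag.
        attrs += ' CONTENT="%s"' % escape(value, quote=True)
        lines.append("<META %s>" % attrs)
    return "\n".join(lines)


# Hypothetical pre-print record; the values are invented for illustration.
sample = [
    ("title", "A note on <compact> operators", None),
    ("creator", "A. N. Author", None),
    ("subject", "47B07", "MSC"),
]

if __name__ == "__main__":
    print(dublin_core_meta(sample))
```

The output would be pasted into the HEAD of the page describing the pre-print, where a metadata-aware robot can find it without having to parse the PostScript file itself.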

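
Several of the projects described above (ROADS, BIBLINK, the Nordic Metadata Project and AHDS) depend on mappings, or crosswalks, from a richer local format to a small core element set. The Python sketch below shows the general shape of such a mapping. The local field names, the mapping table and the function `crosswalk` are hypothetical; they do not reproduce any project's actual conversion tables.

```python
# Hypothetical mapping from local (e.g. subject-specific) field names
# to Dublin Core element names. Invented for illustration only.
LOCAL_TO_DC = {
    "main_title": "title",
    "author_name": "creator",
    "keywords": "subject",
    "summary": "description",
    "date_issued": "date",
}


def crosswalk(local_record, mapping=LOCAL_TO_DC):
    """Return a core record holding only the fields the mapping covers.

    Fields with no core equivalent are dropped, which is why a crosswalk
    to a small core set is usually lossy. Values are collected into lists
    so that two local fields mapping to one element do not overwrite
    each other.
    """
    core = {}
    for field, value in local_record.items():
        target = mapping.get(field)
        if target is None:
            continue  # no core equivalent: stays in the local record only
        core.setdefault(target, []).append(value)
    return core


# Invented local record for a service provider's own catalogue.
local = {
    "main_title": "Excavation archive, site 12",
    "author_name": "Example Unit",
    "grid_reference": "SE 600 520",  # local detail with no core element
    "keywords": "archaeology",
}

if __name__ == "__main__":
    print(crosswalk(local))
```

The design mirrors the ‘top-level’ catalogue idea: each provider keeps its full local record, and only the derived core subset is shared.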



ASSOCIATED TECHNOLOGIES

Protocols

HTTP

The Hypertext Transfer Protocol (HTTP) defines the way in which Web clients (typically Web browsers such as Netscape Navigator) and Web servers communicate with each other. It specifies how clients request a particular page from a server; such requests are based on Uniform Resource Locators (URLs). It enables clients to ask for information about a page, such as when it was last updated. It also specifies how servers send Web pages, informational messages and error messages back to the client.

LDAP

The Lightweight Directory Access Protocol (LDAP) was developed as a simple alternative to the ISO X.500 protocol, a protocol for providing distributed information about people (names, e-mail addresses, telephone numbers, etc.). Although primarily designed for providing access to information about people, LDAP can also be used for other sorts of information, for example to access data about Web pages. LDAP servers are typically organized into a strict hierarchy with the ‘root’ at the top, country level nodes below that, organizational nodes below them, etc.

WHOIS++

The WHOIS++ protocol was developed as a lightweight Internet protocol for providing distributed information about people (names, e-mail addresses, telephone numbers, etc.). It can also be used for other sorts of information. The eLib ROADS project provides software that uses WHOIS++ to distribute descriptions of Internet resources. Unlike LDAP and X.500, WHOIS++ does not have a strict hierarchical representation of the data space, instead using a more flexible ‘mesh’ of servers. WHOIS++ based searches are routed through this mesh based on ‘forward knowledge’ held by one server about another. This ‘forward knowledge’ is maintained using the Common Indexing Protocol (CIP).

Z39.50

Z39.50 is a standard for information retrieval approved by the National Information Standards Organization (NISO), a committee accredited by the American National Standards Institute (ANSI). It has also been recognized by the International Organization for Standardization (ISO) where it is known as ISO 23950. Z39.50 can be described as a protocol for supporting the construction of distributed information retrieval applications.45 The protocol allows client applications (known in the standard as the ‘origin’) to search databases on remote servers (the ‘target’) and to retrieve relevant information. As an open standard, Z39.50 supports the retrieval of information from distributed remote databases.46 The first applications were developed specifically for bibliographic data, for example the distributed searching of library online public access catalogues, but attribute-sets can be defined to allow the protocol to work with many other types of data.

Languages

HTML

The HyperText Markup Language (HTML) is the language in which World Wide Web documents are written and is an application of the Standard Generalized Markup Language (SGML).47 HTML is primarily concerned with two things: defining how documents look, by the use of a variety of structural or presentational tags; and the creation of hypertext links to separate network documents. HTML pages are split into two main sections, the header or HEAD element and the BODY. The HEAD section of a page contains information about the document (or metadata), for example an HTML TITLE tag, while the BODY will typically contain the information content of the document itself together with its structural and presentational tags, which can then be displayed by a Web browser.
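
As an illustration of the HEAD/BODY split, the Python sketch below uses the standard library's `html.parser` module to collect the TITLE and any META name/content pairs from a page, which is roughly what a metadata-aware robot has to do. The class name `HeadMetadata`, the helper `extract` and the sample page are invented for this example; it does not describe how any particular robot is implemented.

```python
from html.parser import HTMLParser


class HeadMetadata(HTMLParser):
    """Collect TITLE text and META name/content pairs from the HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = ""
        self.meta = []  # (name, content) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "head":
            self.in_head = True
        elif tag == "title" and self.in_head:
            self.in_title = True
        elif tag == "meta" and self.in_head:
            a = dict(attrs)
            if "name" in a and "content" in a:
                self.meta.append((a["name"], a["content"]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside TITLE is kept; BODY text is ignored.
        if self.in_title:
            self.title += data


# Invented sample page with two embedded metadata elements.
PAGE = """<HTML><HEAD>
<TITLE>Sample pre-print</TITLE>
<META NAME="DC.creator" CONTENT="A. N. Author">
<META NAME="DC.subject" CONTENT="metadata">
</HEAD><BODY><H1>Sample pre-print</H1><P>Body text.</P></BODY></HTML>"""


def extract(page):
    """Return (title, meta_pairs) for an HTML page given as a string."""
    parser = HeadMetadata()
    parser.feed(page)
    parser.close()
    return parser.title.strip(), parser.meta
```

Calling `extract(PAGE)` returns the title and the two name/content pairs while ignoring everything in the BODY.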


XML

XML stands for ‘Extensible Markup Language’, and is a simplified subset of SGML. Development of XML is an initiative within W3C (the World Wide Web Consortium) and its aim is to define an SGML DTD for the Web.48 XML is designed to allow flexibility and extensibility (hence the name). Whereas HTML facilitates display of information on the Web, XML provides for standards-based management of data (including metadata). XML specifies how the semantics of data elements can be expressed, indicating what each data element means. Examples of XML tags might include author, price, person lastname, person firstname and so on, there being no limit on the tags that might be included in a schema. XML is a text-based markup language that looks similar to HTML, but it indicates the semantics of data rather than specifying the mode of display.

Schemas specifying agreed element names can be shared between Web ‘publishers’, and a schema can itself be expressed in XML. XML can be used to add semantic information to an HTML document, and HTML, using devices such as stylesheets, can display the information expressed in XML in a standardized way. XML looks likely to be used within various applications now under development for publishing information on the Web: for the Meta-Content Framework (MCF), the Channel Definition Format (CDF) and the new version of PICS labels.

PICS (Platform for Internet Content Selection)

Another initiative within W3C is PICS, which currently provides a mechanism for associating numeric content rating labels with Internet resources.49 PICS enables attributes to be linked to a resource and rated on a numeric scale (e.g. level of violence = 10). PICS is now being used primarily as a means to filter content on the Web, particularly against criteria such as suitability for children. The next version of PICS (commonly referred to as PICS-NG) will provide an infrastructure for associating more general string labels (i.e. metadata) with resources. It is likely that XML will be used as the language for encoding PICS labels.

PICS does not define semantics but is positioned as a transport syntax (i.e. a syntax for sharing data between applications). It is envisaged that different element sets might be encoded using PICS-NG, and that Dublin Core might be one of these. PICS records might be embedded in the resource, linked to the resource, or indeed located independently on a third party database.

IDENTIFIERS

Unique identifiers are an essential part of the technology that enables electronic trading, copyright management, electronic tables of contents, production tracking and resource discovery. Traditionally publishers and libraries have worked with identifiers such as the ISBN and ISSN for paper products. These identifiers are assigned at the book or journal level, but the need for a unique and persistent identifier for electronic resources at a lower level of granularity has become more important. Increasingly, we need to identify much smaller fragments of complete works, for example parts of text, images, video clips, pieces of software, etc. Recent schemes, such as the DOI, can be used at arbitrary levels of granularity determined by individual publishers based on commercial or other considerations.

There are significant outstanding issues in relation to identifiers:

• What is being identified? For an online document that has multiple versions and that is mirrored on several Web sites, is it the logical ‘document’ that is being identified or particular instances of that document?
• Identification vs. location. The Uniform Resource Locator (URL) that we are all familiar with is a locator rather than an identifier. If an object moves, its associated URL changes and people using the
old URL are likely to get a failure indicating that it is no longer available. There are significant political and commercial interests which act as barriers to establishing services which will resolve identifiers to URLs.

ISSN (International Standard Serial Number)

The ISSN is a standardized international numeric code which enables the identification of serial publications, for example periodicals, newspapers, annuals or series. Serials can be in printed form, on another medium (microform, floppy disk, CD-ROM or CD-i), or can be accessible online. An ISSN is normally represented as the string ‘ISSN’ followed by two sets of four digits: for example, ISSN 0374-0536.

ISBN (International Standard Book Number)

The ISBN system is an international standard numbering system for monographs. It has traditionally been used for books, but has been expanded to include other new media such as videocassettes and electronic media. An ISBN is normally represented as the string ‘ISBN’ followed by ten digits separated into four parts: for example, ISBN 82-7111-124-8.

SICI (Serial Item and Contribution Identifier)

The SICI is a variable length code that uniquely identifies serial issues (items) and articles within a serial (contributions). The SICI is a complex identifier split into three parts: the item segment (based on the ISSN of the serial); the contribution segment (which identifies an article or other contribution within the serial); and the control segment. For example: 0730-9295(199206)11:2<168:CRFAOC>2.0.TX;2-#.

PII (Publisher Item Identifier)

Elsevier Science developed the PII to identify journal articles independently from their packaging unit, because they may be published in different ways (database, CD-ROM, paper, World Wide Web, etc.). It is primarily intended for document items of interest to scientific publishers. For example: S0165-3806(96)00403-8.

URN (Uniform Resource Name)

Uniform Resource Names (URNs) are intended to serve as persistent, globally unique resource identifiers that fit into the larger Internet information architecture composed of, additionally, Uniform Resource Characteristics (URCs) and Uniform Resource Locators (URLs). URNs are for identification, URCs for including metadata and URLs for locating resources. URNs are designed to make it easy to map other identification schemes into URN-space. The exact format of URNs is still under discussion but it is likely that, for example, an ISBN may be represented as a URN as follows: urn:isbn:0-395-36341-1.

DOI (Digital Object Identifier)

The Digital Object Identifier (DOI) system is being developed on behalf of the Association of American Publishers (AAP). The DOI system is based around a directory, which stores an object’s DOI and its associated location (URL). Queries sent to the directory result in the DOI being looked up and the location returned to the client. In Web terminology, this is a standard Hypertext Transfer Protocol (HTTP) redirect. A DOI has two parts: a globally unique part called the Publisher ID and a publisher-assigned part called the Item ID. For example: 10.153/34571.

PURL (Persistent Uniform Resource Locator)

PURLs have been developed and deployed by OCLC as a naming and resolution service for general Internet resources. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL Resolution Service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. As with the DOI, this is achieved using an HTTP redirect. For example: http://purl.oclc.org/OCLC/PURL/INET96.
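
Both the DOI directory and the PURL service are described above as a lookup followed by an HTTP redirect. The Python sketch below models that behaviour with an in-memory table. The table contents, the target URLs and the function names `resolve` and `move` are hypothetical, and the status codes chosen (302 for a redirect, 404 for an unknown name) are an assumption; the text above does not say which codes the real services return.

```python
# Toy resolver: maps a persistent name to the resource's current location.
# All entries and target URLs are invented for illustration.
RESOLVER_TABLE = {
    "/OCLC/PURL/INET96": "http://www.example.org/papers/inet96.html",
    "10.153/34571": "http://publisher.example.com/articles/34571",
}


def resolve(name, table=RESOLVER_TABLE):
    """Return (status, headers) as a redirecting resolver might.

    A known name yields a redirect whose Location header carries the
    current URL; the client then fetches that URL in the normal way.
    """
    target = table.get(name)
    if target is None:
        return 404, {}
    return 302, {"Location": target}


def move(name, new_url, table=RESOLVER_TABLE):
    """Record that a resource has moved; the persistent name is unchanged."""
    table[name] = new_url


if __name__ == "__main__":
    print(resolve("/OCLC/PURL/INET96"))
    move("/OCLC/PURL/INET96", "http://archive.example.org/inet96.html")
    print(resolve("/OCLC/PURL/INET96"))
```

The point of the indirection is that only the table changes when a resource moves; links that cite the persistent name keep working.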

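
The ISSN and ISBN examples given in this section can be checked mechanically. The Python sketch below validates their check digits using the standard modulus-11 rules for ISSNs and ten-digit ISBNs; these rules are not described in the text above and are included here as background. It also builds the tentative `urn:isbn:` form and splits a DOI into its two parts. The function names are invented for this example.

```python
def _value(ch):
    """Numeric value of an identifier character ('X' stands for 10)."""
    return 10 if ch in "xX" else int(ch)


def issn_is_valid(issn):
    """Check an ISSN such as '0374-0536' (weights 8 down to 1, modulus 11)."""
    chars = issn.replace("-", "")
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    if not (chars[7].isdigit() or chars[7] in "xX"):
        return False
    total = sum(w * _value(c) for w, c in zip(range(8, 0, -1), chars))
    return total % 11 == 0


def isbn10_is_valid(isbn):
    """Check a ten-digit ISBN such as '82-7111-124-8' (weights 10 down to 1)."""
    chars = isbn.replace("-", "").replace(" ", "")
    if len(chars) != 10 or not chars[:9].isdigit():
        return False
    if not (chars[9].isdigit() or chars[9] in "xX"):
        return False
    total = sum(w * _value(c) for w, c in zip(range(10, 0, -1), chars))
    return total % 11 == 0


def isbn_to_urn(isbn):
    """Express an ISBN in the tentative URN form quoted in the text."""
    return "urn:isbn:" + isbn


def split_doi(doi):
    """Split a DOI into (publisher_id, item_id) at the first slash."""
    publisher_id, _, item_id = doi.partition("/")
    return publisher_id, item_id
```

The ISSN and both ISBNs quoted in this section (ISSN 0374-0536, ISBN 82-7111-124-8 and the ISBN in the URN example) pass these checks, and `split_doi("10.153/34571")` separates the Publisher ID from the Item ID.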



GLOSSARY

CIMI    Computer Interchange of Museum Information. CIMI records are an SGML-based metadata format developed for museum information.

DTD    Document Type Definition. A formal specification defining document types in an SGML context.

Dublin Core    Dublin Core Metadata Element Set. A metadata format defined on the basis of international consensus which has defined a minimal information resource description, generally for use in a Web environment.

EAD    Encoded Archival Description. An SGML-based metadata format developed for the description of archives.

GILS    Government Information Locator Service. Metadata format created by the US Federal Government in order to provide a means of locating information generated by government agencies.

Granularity    The level of detail at which indexing takes place.

Harvest    A system providing a set of software tools for the gathering, indexing and accessing of Internet information. Uses SOIF.

IAFA templates    Internet Anonymous FTP Archive templates. Metadata format designed for anonymous FTP archives, now adapted for use in the ROADS project.

MARC    MAchine Readable Cataloguing. A family of formats based on ISO 2709 for the exchange of bibliographic and other related information in machine readable form.

PICS    Platform for Internet Content Selection. Internet content filtering infrastructure. The next generation (PICS-NG) is likely to provide a general metadata infrastructure.

ROADS    Resource Organization and Discovery in Subject-based services. eLib funded project developing software for use by Internet subject services.

SGML    Standard Generalized Markup Language. An international standard (ISO 8879) for the description of marked-up electronic text.

SOIF    Summary Object Interchange Format. A metadata format developed for use with the Harvest architecture.

SSI    Server Side Includes. A mechanism for dynamically generating parts of Web pages.

TEI    Text Encoding Initiative. An attempt to define, using SGML, the encoding of literary and linguistic texts in electronic form. TEI headers are an SGML-based metadata format used for the documentation of these texts.

Warwick Framework    An architecture for the exchange of distinct metadata packages involving the aggregation of metadata packages into containers.


REFERENCES

1. See: Caplan, P. ‘You call it corn, we call it syntax-independent metadata for document-like objects’. The Public-Access Computer Systems Review, 6(4), 1995. Available from: <URL:http://info.lib.uh.edu/pr/v6/n4/capl6n4.html>

2. Levy, D. Cataloging in the digital order. [Paper for: Digital Libraries ’95: The Second Annual Conference on the Theory and Practice of Digital Libraries, Austin, Texas, June 11-13, 1995]. Available from: <URL:http://csdl.tamu.edu/DL95/papers/levy/levy.html>

3. Lagoze, C. ‘From static to dynamic surrogates: resource discovery in the digital age’. D-Lib Magazine, June 1997. Available from: <URL:http://www.dlib.org/dlib/june97/06lagoze.html>

4. Stobart, S. and Kerridge, S. ‘An investigation into World Wide Web Search Engine use from within the UK: preliminary findings’. Ariadne, 6, November 1996. Available from: <URL:http://www.ariadne.ac.uk/issue6/survey/>

5. Koch, T., Ardö, A., Brümmer, A. and Lundberg, S., The building and maintenance of robot based Internet search services: a review of current indexing and data collection methods. Draft D3.11 (version 3) for Work Package 3 of Telematics for Research project DESIRE, September 1996. Available from: <URL:http://www.ub2.lu.se/desire/radar/reports/D3.11/>

6. Dempsey, L. and Heery, R., with contributions from M. Hamilton, D. Hiom, J. Knight, T. Koch, M. Peereboom and A. Powell, Specification for resource description methods. Part 1: A review of metadata: a survey of current resource description formats. Deliverable 3.2 (1) for Work Package 3 of Telematics for Research project DESIRE, March 1997. Available from: <URL:http://www.ukoln.ac.uk/metadata/DESIRE/overview/>

7. Burnard, L. and Light, R., Three SGML metadata formats: TEI, EAD, and CIMI. A study for BIBLINK Work Package 1, December 1996. Available from: <URL:http://www.ukoln.ac.uk/metadata/BIBLINK/wp1/sgml/>

8. The Dublin Core Metadata Element Set: Home Page. Available from: <URL:http://purl.org/metadata/dublin_core>

9. Lagoze, C., Lynch, C. and Daniel, R., The Warwick Framework: a container architecture for aggregating sets of metadata. TR96-1593, June 21, 1996. Available from: <URL:http://cs-tr.cs.cornell.edu:80/Dienst/UI/2.0/Describe/ncstrl.cornell%2fTR96-1593>

10. Lagoze, C. ‘The Warwick Framework: a container architecture for diverse sets of metadata’. D-Lib Magazine, July/August 1996. Available from: <URL:http://www.dlib.org/dlib/july96/lagoze/07lagoze.html>

11. Weibel, S., Kunze, J. and Lagoze, C., Dublin Core Metadata for simple resource description. Internet-Draft, 9 February 1997. Available from: <URL:ftp://ds.internic.net/internet-drafts/draft-kunze-dc-00.txt>

12. Weibel, S., Iannella, R. and Cathro, W., ‘The 4th Dublin Core Metadata Workshop Report: DC-4, March 3-5, 1997, National Library of Australia, Canberra’. D-Lib Magazine, June 1997. Available from: <URL:http://www.dlib.org/dlib/june97/metadata/06weibel.html>

13. UKOLN Metadata Software Tools. Available from: <URL:http://www.ukoln.ac.uk/metadata/software-tools/>

14. DC-dot. Available from: <URL:http://www.ukoln.ac.uk/metadata/dcdot/>

15. Powell, A., ‘Dublin Core management’. Ariadne, 10, July 1997. Available from: <URL:http://www.ariadne.ac.uk/issue10/dublin/>

16. CNIDR Isite. Available from: <URL:http://vinca.cnidr.org/software/Isite/Isite.html>

17. BSn Doctypes Description. Available from: <URL:http://w3.bsn.com/Z39.50/INTRO.html>

18. Nordic Web Index. Available from: <URL:http://nwi.ub2.lu.se/>

19. Powell, A., Notes on use of Dublin Core by NewsAgent. Available from: <URL:http://www.ukoln.ac.uk/metadata/NewsAgent/dcusage.html>

20. UKOLN metadata resources: Dublin Core, list of projects. Available from: <URL:http://www.ukoln.ac.uk/metadata/resources/dc.html>

21. Europagate. Available from: <URL:http://europagate.dtv.dk/>

22. UKOLN Experimental Z39.50 based demonstrators. Available from: <URL:http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw>


23. Allen, J. and Mealling, M., The Architecture of the Common Indexing Protocol (CIP). Internet-Draft, 9 June 1997. Available from: <URL:ftp://ds.internic.net/internet-drafts/draft-ietf-find-cip-arch-00.txt>

24. Joint Funding Councils’ Libraries Review Group, Report [The Follett Report]. Bristol: Higher Education Funding Council for England, December 1993, Section 265.

25. Electronic Libraries Programme, Project details. Available from: <URL:http://www.ukoln.ac.uk/services/elib/projects/>

26. ROADS. Available from: <URL:http://www.ukoln.ac.uk/roads/>

27. Knight, J. and Hamilton, M., Overview of the ROADS software. LUT CS-TR 1010. Loughborough: Loughborough University of Technology, March 1996. Available from: <URL:http://www.roads.lut.ac.uk/Reports/arch/arch.html>

28. NewsAgent for Libraries. Available from: <URL:http://www.sbu.ac.uk/~litc/newsagent/>

29. Harvest Web Indexing. Available from: <URL:http://www.tardis.ed.ac.uk/harvest/>

30. DESIRE. Available from: <URL:http://www.nic.surfnet.nl/surfnet/projects/desire/desire.html>

31. DESIRE WP3 Resource Discovery and Indexing. Available from: <URL:http://www.ub2.lu.se/desire/>

32. Koch, T., Ardö, A., Brümmer, A. and Lundberg, S., The building and maintenance of robot based Internet search services: a review of current indexing and data collection methods. Draft D3.11 (version 3) for Work Package 3 of Telematics for Research project DESIRE, September 1996. Available from: <URL:http://www.ub2.lu.se/desire/radar/reports/D3.11/>

33. Lundberg, S., Ardö, A., Brümmer, A. and Koch, T., The European Web Index: an Internet search service for the European higher education, research and development communities. Deliverable 3.1 for Work Package 3 of Telematics for Research project DESIRE, 1996. Available from: <URL:http://www.nic.surfnet.nl/surfnet/projects/desire/

DESIRE, February-May 1997. Available from: <URL:http://www.ukoln.ac.uk/metadata/DESIRE/specification.html>

35. BIBLINK. Available from: <URL:http://www.ukoln.ac.uk/metadata/BIBLINK/>

36. Heery, R., Metadata formats. Work Package 1 of Telematics for Libraries project BIBLINK (LB 4034), November 1996. Available from: <URL:http://www.ukoln.ac.uk/metadata/BIBLINK/wp1/d1.1/>

37. Høgås, H., van der Werf, T. and Powell, A., Identification. Work Package 2 of Telematics for Libraries project BIBLINK (LB 4034), May 1997. Available from: <URL:http://www.ukoln.ac.uk/metadata/BIBLINK/wp2/d2.1/>

38. Nordic Metadata Project. Available from: <URL:http://linnea.helsinki.fi/meta/>

39. Arts and Humanities Data Service. Available from: <URL:http://ahds.ac.uk/>

40. Dempsey, L., and Greenstein, D., Proposal to identify shared metadata requirements. 15 January 1997. Available from: <URL:http://www.kcl.ac.uk/projects/ahds/jobs/proposal.html>

41. Miller, P., Resource Discovery Workshops: a guide to implementation and participation. 23 May 1997. Available from: <URL:http://www.york.ac.uk/~apm9/focus01.html>

42. MathN Broker. Available from: <URL:http://www.mathematik.uni-osnabrueck.de/harvest/brokers/MathN/>

43. Plümer, J. and Schwänzl, R., A mathematics preprint index: DC in an application. [Paper for: 4th Dublin Core Metadata Workshop, Canberra, 3-5 March 1997]. Available from: <URL:http://www.dstc.edu.au/DC4/roland/>

44. Scheme Definition: DMV MetaData for Mathematical Papers, Version 1.2. Available from: <URL:http://www.mathematik.uni-osnabrueck.de/ak-technik/DMVPreprint-Core.html>

45. Dempsey, L., Distributed library and information systems: the significance of Z39.50. Managing Information, 1(6), June
deliver/WP3/D3-1.html> 1994, 41-43.

34. Specification for resource description methods. Deliverable 46. Turner, F., An overview of the Z39.50 Information Retrieval
for Work Package 3 of Telematics for Research project standard. IFLA Universal Dataflow and


Telecommunications Core Programme, Occasional Paper, 3, July 1995, rev. January 1997. Available from: <URL:http://www.nlc-bnc.ca/ifla/VI/5/op/udtop3.htm>

47. Raggett, D., Le Hors, A. and Jacobs, I. (eds.), HTML 4.0 Specification. W3C Working Draft. 18 July 1997. Available from: <URL:http://www.w3.org/TR/WD-html40-970708/>

48. XML White Paper. Microsoft Corporation, June 23, 1997. Available from: <URL:http://www.microsoft.com/standards/xml/xmlwhite.htm>

49. Resnick, P. and Miller, J., PICS: Internet access controls without censorship. Communications of the ACM, 39(10), October 1996, 87-93.


FURTHER READING

Metadata is one of those subjects that has a rapidly growing literature and is also an area which has regular changes of focus and emphasis. As can be seen by the references in this Briefing, a large amount of information on metadata topics is available on the Internet and specifically through the World Wide Web. For these reasons it may be useful to note the following Web sites devoted to keeping up-to-date with the subject:

• International Federation of Library Associations. DIGITAL LIBRARIES: Metadata Resources. Available from: <URL:http://www.nlc-bnc.ca/ifla/II/metadata.htm>
• UKOLN Metadata Group. Metadata. Available from: <URL:http://www.ukoln.ac.uk/metadata/>

Dempsey, L., ‘Meta Detectors’. Ariadne, 3, May 1996, 6-7. Available from: <URL:http://www.ukoln.ac.uk/ariadne/issue3/metadata/>

Dempsey, L., ‘ROADS to Desire: some UK and other European metadata and resource discovery projects’. D-Lib Magazine, July/August 1996. Available from: <URL:http://www.dlib.org/dlib/july96/07dempsey.html>

Dempsey, L., and Heery, R., ‘Metadata: a current view of practice and issues’. Journal of Documentation (forthcoming).

Dempsey, L. and Weibel, S., ‘The Warwick Metadata Workshop: a framework for the deployment of resource description’. D-Lib Magazine, July/August 1996. Available from: <URL:http://www.dlib.org/dlib/july96/07weibel.html>

Heery, R., ‘Review of metadata formats’. Program, 30(4), October 1996, 345-373.

Lynch, C., ‘Searching the Internet’. Scientific American, 276(3), March 1997, 44-48. Also available from: <URL:http://www.sciam.com/0397issue/0397lynch.html>

Wallace, D., ‘Metadata and the archival management of electronic records: a review’. Archivaria, 36, Autumn 1993, 87-110.

Weibel, S., ‘The World Wide Web and emerging Internet resource discovery standards for scholarly literature’. Library Trends, 43(4), Spring 1995, 627-644.

Weibel, S., ‘Metadata: The Foundations of Resource Description’. D-Lib Magazine, July 1995. Available from: <URL:http://www.dlib.org/dlib/July95/07weibel.html>


ACKNOWLEDGEMENTS

UKOLN is funded by the Joint Information Systems Committee of the Higher Education Funding Councils and by the British Library Research and Innovation Centre, as well as by project funding from several sources.

The work carried out in this document is supported by the ROADS, DESIRE and BIBLINK projects.

The authors would like to thank Lorcan Dempsey for commenting on a draft version of this Briefing.
