locators

Beyond the problem of discovering these resources, of uncovering references to appropriate documents when we search for them, we face another difficulty: ensuring that the texts themselves can be retrieved. Libraries are crucially founded upon the notion of stable, unique object locators, an idea that first became clear to me as an undergraduate, when my university’s library was caught in the middle of a transition from the Dewey Decimal System to the Library of Congress Classification System for shelving its books; half the collection was shelved one way, and half the other, and there seemed to me to be no rhyme or reason to the ordering of blocks of shelves on particular floors. In order to find any given text, you needed to know not only how to search the card catalog system (and yes, I am old enough that I mean the card catalog system) but also how to look up the locator found there on the library’s shelving map. With a few of the usual exceptions (mis-shelved books, books awaiting reshelving, and of course books that were checked out [4.20]), if you knew how to read the metadata, your search would lead you directly to the book, despite what seemed like a chaotic, if not wholly random, arrangement of texts.

In digital publishing, being able to rely on our locators requires that the links to the texts for which we are searching work; in theory, those links take us directly from the catalog to the text. The web, however, is of course notoriously prone to “link rot,” such that “hard-coded,” static links to specific URLs, or Uniform Resource Locators, quickly break. One study, released in 2005, found that the half-life of the links included in articles in D-Lib Magazine, an online journal focused specifically on issues regarding digital libraries, was about 10 years, meaning that after that span of time 50% of the links no longer functioned.[4.21] This presents a significant enough problem for scholarship that the Modern Language Association, in its most recent update of its bibliographic format, eliminated the inclusion of URLs in citations, saying that these links were too fluid to serve as permanent referents, and that searching the web for current links would be more reliable.[4.22] And they’re not wrong: projects move, server structures change, and software upgrades or platform migrations produce entirely new URL models. When we reach a point, however, at which having no information about a text’s location is preferable to having some, because the some we have is more likely to be wrong than right, something has clearly gone wrong.
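To make the half-life figure concrete, the short sketch below works through the arithmetic. It assumes that links fail at a constant rate (simple exponential decay), which is an illustrative simplification rather than a finding of the study; Koehler's longitudinal work, cited in the notes, suggests that real degradation levels off after an initial drop. The function name and the sample intervals are hypothetical.

```python
def surviving_fraction(years: float, half_life_years: float = 10.0) -> float:
    """Fraction of links expected to still work after `years`.

    Assumes a constant half-life (exponential decay); this is a modeling
    assumption for illustration, not a claim made by the cited study.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years / half_life_years)


if __name__ == "__main__":
    # With a 10-year half-life, half the links survive 10 years,
    # and a quarter survive 20 years.
    for years in (5, 10, 20):
        share = surviving_fraction(years)
        print(f"after {years:>2} years: {share:.0%} of links expected to work")
```

Under this assumption, a citation list compiled today would be expected to lose half of its working links within a decade and three quarters of them within two.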

The mobility of digital resources, which the web’s very rewritability promotes, is of course extremely problematic for libraries and archives: “The function of bibliographic control is to insure that every item has its unique place in an organizational arrangement that allows for systematic searching, discovery and retrieval. But the web plays havoc with the notions of control, order, fixity and hierarchy that are at the heart of the bibliographic enterprise” (Manoff 3). The answer, however, is neither to mandate fixity on the part of a fluid medium nor to eliminate links and references in our citations. Rather, we need to focus on the implementation of more robust ways of determining where our desired resources are and of creating links to them that will not break even as the resources grow, develop, and move. Our digital publications thus must employ a system of bibliographic identification that allows object identifiers to resolve dynamically into the correct URL as materials move.

The Handle System is a key project working toward this goal. Handle was conceived and developed beginning in 1994 by researchers at the Corporation for National Research Initiatives (a not-for-profit organization that is also, among other things, the publisher of D-Lib Magazine); the system provides the specifications for assigning and resolving persistent identifiers for digital objects on the Internet, enabling “a distributed computer system to store names, or handles, of digital resources and resolve those handles into the information necessary to locate, access, and otherwise make use of the resources” (Sun et al.). While traditional URLs focus entirely on location, combining the name of a particular web server (regulated by the Domain Name System, or DNS) with the local name of a document, thus tying that resource to a specific filepath on a specific server, handles are unique and persistent references for the name of a digital object, rather than its location.[4.23] A document’s handle can remain the same even as the document moves, so long as the metadata in the Handle System associating that document name with a particular URL is updated. Thus links to the document, structured as handles rather than URLs, will continue to function regardless of the document’s actual location. Moreover, Handle, unlike DNS, is a fully distributed system; the handle that names a particular document includes a reference to a “naming authority,” or a local instance of the Handle System that resolves the object’s name, followed by the name itself. The global Handle namespace is thus the sum of all local namespaces, registered with the global system and governed by local naming authorities. A user’s request for a particular handle is sent by the global Handle System to the appropriate local naming authority, which then resolves the handle into the correct URL.
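The two-step resolution described above can be sketched in a few lines of code. This is a toy model under stated assumptions, not the Handle System's actual software or protocol: the class names, the prefix "20.5000", the local name "monograph-42", and the example.org URLs are all hypothetical, and a handle is assumed to take the form of a naming authority, a slash, and a local name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocalHandleService:
    """Hypothetical stand-in for one naming authority's local service."""

    records: dict[str, str] = field(default_factory=dict)

    def register(self, local_name: str, url: str) -> None:
        # Registering an existing name again models updating the
        # metadata after the document has moved.
        self.records[local_name] = url

    def resolve(self, local_name: str) -> str:
        return self.records[local_name]


@dataclass
class GlobalHandleRegistry:
    """Hypothetical global registry that routes handles to authorities."""

    authorities: dict[str, LocalHandleService] = field(default_factory=dict)

    def resolve(self, handle: str) -> str:
        # Split "authority/local-name" at the first slash only.
        authority, sep, local_name = handle.partition("/")
        if not sep or not authority or not local_name:
            raise ValueError(f"malformed handle: {handle!r}")
        if authority not in self.authorities:
            raise KeyError(f"unknown naming authority: {authority!r}")
        return self.authorities[authority].resolve(local_name)


if __name__ == "__main__":
    press = LocalHandleService()
    press.register("monograph-42", "https://old-server.example.org/books/42.html")
    registry = GlobalHandleRegistry({"20.5000": press})

    handle = "20.5000/monograph-42"
    print(registry.resolve(handle))

    # The document moves; only the local record changes, not the handle.
    press.register("monograph-42", "https://new-server.example.org/archive/monograph-42")
    print(registry.resolve(handle))
```

The point of the design is visible in the last four lines: every citation that recorded the handle keeps working after the move, because the only thing that changed is one record held by the naming authority.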

While there are over 1,000 handle services running today, a key implementation of the service is found in the DOI (Digital Object Identifier) system, which is in use in many scientific and scholarly publications and which has registered over 40 million handles.[4.24] The DOI system, which is governed by a not-for-profit foundation initially funded by its member organizations,[4.25] is an application of Handle focused specifically on intellectual property, adding much more robust metadata about each digital object, including information about access rights, display formats, encryption, and the like, in addition to the more bibliographic forms of metadata discussed above.[4.26] The identifier in the DOI system refers, as in Handle, to a particular object, but based upon the object’s metadata, the identifier might resolve to one of a number of potential URLs; for instance, a journal article may be mirrored on multiple servers or may be available in multiple formats, and thus the DOI would direct the user to the most appropriate copy for the user’s location and browser. CrossRef is the most extensive implementation of DOI, founded in 2000 by a group of leading scholarly publishers who together formed a non-profit, independent organization, the Publishers International Linking Association, to oversee the project. Today nearly 3,000 publishers and scholarly societies around the world use its services; publishers pay an annual fee as well as transaction costs for each DOI registered with CrossRef. Library affiliates of CrossRef can use its system to provide links from citations or database records to the digital objects in question.[4.27]

DOI and CrossRef thus provide a means of creating and resolving persistent, unique handles for and links to digital texts published online; they also provide the potential for access control to particular documents based on the copyright restrictions placed on those documents and the user’s subscription information. Used in conjunction with OpenURL, which “provides a standardized format for transporting bibliographic metadata about objects between information services” (Van de Sompel and Beit-Arie), thus providing for “context-sensitive linking” (Paskin XX), DOI handles can resolve to the most appropriate copy of a digital object, using information about a user’s institutional affiliation. If, for instance, the user’s library has a subscription to a particular journal, the DOI can resolve into full access to the article from an appropriate provider; otherwise, it can resolve into a link through which the user can request the article or get more information about the library’s holdings.
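The "appropriate copy" logic in this paragraph can be illustrated with a small sketch. It is a simplification under stated assumptions: the function, the provider names, the DOI "10.9999/example.123", and the example.org URLs are hypothetical, and the fallback link is an invented query string, not the actual OpenURL syntax or any real resolver's interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Copy:
    """One available copy of a digital object (hypothetical record)."""

    provider: str
    url: str


def resolve_for_user(
    doi: str,
    copies_by_doi: dict[str, list[Copy]],
    subscribed_providers: set[str],
    library_request_url: str,
) -> str:
    """Return full-text access if the user's library subscribes to a
    provider holding the object; otherwise return a request link."""
    for copy in copies_by_doi.get(doi, []):
        if copy.provider in subscribed_providers:
            return copy.url
    # Hypothetical fallback: hand the identifier to the library's own
    # request form so the user can ask for the article.
    return f"{library_request_url}?{urlencode({'doi': doi})}"


if __name__ == "__main__":
    doi = "10.9999/example.123"  # hypothetical identifier
    copies = {
        doi: [
            Copy("ProviderA", "https://provider-a.example.org/article/123"),
            Copy("ProviderB", "https://provider-b.example.org/article/123"),
        ]
    }
    request_form = "https://library.example.edu/request"

    # A library subscribing to ProviderB is sent to that provider's copy.
    print(resolve_for_user(doi, copies, {"ProviderB"}, request_form))
    # A library with no relevant subscription gets the request link.
    print(resolve_for_user(doi, copies, set(), request_form))
```

The same identifier thus leads two readers to two different destinations, which is what distinguishes context-sensitive resolution from a static link.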

These projects (Handle, DOI, CrossRef, and OpenURL) are each produced by a group of interested parties, whether researchers, publishers, or librarians, are each governed by not-for-profit corporations or foundations, and are each designed with openness and extensibility in mind. The projects are therefore complementary rather than competitive, solving more problems together than they can independently. Of course, simply having systems such as these won’t do us much good unless those systems are used; as creators of new digital objects, whether on the authoring end or on the publishing side, we must insist on the use of persistently resolvable object identifiers in our links and appropriate metadata in our archives, to ensure that the resources we use and create in our research remain searchable and addressable in the future.

[4.20] It’s shocking to remember that, not so very long ago, our library cataloging systems didn’t provide us with this crucial bit of information. Not knowing whether a text is actually available in my library before I walk over there is unthinkable to me today, suggesting the extent to which the kinds of information we consider crucial in our metadata change over time.

[4.21] See McCown et al. See also Koehler for a longitudinal study that suggests both that link degradation stabilizes after an initial, precipitous drop, and that links to different kinds of web objects degrade at different rates.

[4.22] See the seventh edition of the MLA Handbook: “Inclusion of URLs has proved to have limited value, however, for they often change, can be specific to a subscriber or a session of use, and can be so long and complex that typing them into a browser is cumbersome and prone to transcription errors. Readers are now more likely to find resources on the Web by searching for titles and authors’ names than by typing URLs” (182). Note, of course, that the assumption is that a reader wanting to find a cited resource would need to transcribe that URL rather than simply clicking on a link; the default assumption in this handbook is still that the citation itself will appear in print.

[4.23] Other forms of identifying digital objects by name rather than location exist, including URNs (or Uniform Resource Names); URLs and URNs are both subsets of the larger category of URIs, or Uniform Resource Identifiers. Technically, the W3C has deprecated the term URL in favor of URI, but popularly, the location-based term remains the norm, as it is location through which web browsers address the object.

[4.24] See Handle System, “Quick Facts.”

[4.25] It should be noted that the International DOI Foundation has announced its plans to move toward an economic model based on fees paid by registration agencies, who may in turn charge publishers wishing to register DOIs. See The DOI Handbook 78.

[4.26] See Rosenblatt, “The Digital Object Identifier.”

[4.27] See CrossRef.org, “Fast Facts.”


Sean Gillies 11 January 2010 at 2.29 am

The problem with URLs isn’t inherent fragility but that we often don’t get the identifier space of our information architecture straight before we begin to publish resources on the web [1]. Major web “properties” like Wikipedia can and do maintain their URLs as their infrastructures change. Nine years ago http://www.wikipedia.org/wiki/Computing was served by a Perl CGI script on a single server. Now it’s served by 200 application servers, 20 database servers and 70 cache servers [2]. Wikipedia’s data has moved many times, yet the original URL still exists, now redirecting to a language-specific variant (http://en.wikipedia.org/wiki/Computing in my case). Maintaining the original URL is Wikipedia’s policy. Nine years might not seem very long to a librarian, but there’s no technical reason why (given funding) that policy can’t continue indefinitely, even if Wikipedia grew tenfold, physically relocated its data center, switched to app servers written in Erlang, or switched from Squid to Varnish.

[1] http://www.w3.org/Provider/Style/URI
[2] http://www.datacenterknowledge.com/archives/2008/06/24/a-look-inside-wikipedias-infrastructure/


1 Comment on paragraph 3

Jesse Pearlman Karlsberg 11 November 2010 at 9.21 am

Just a copy edit to offer here. In the quote from Manoff, “insure” ought to read “ensure.” If this error is in the original, perhaps a “[sic]” is in order.


Pingback: L’accademia dei morti viventi. Parte quarta: la conservazione, 22 February 2012 at 4.57 pm

[…] scientific […] must reckon with the mobility of digital resources and the consequent instability of URLs. What is needed, therefore, is a system of identifiers that resolve dynamically into the URL, […]


Planned Obsolescence
