Latent OpenURLs in HTML for Resource Autodiscovery, Localization and Personalization

!! DRAFT FOR TRIAL USE, SUBJECT TO CHANGE !!

Recently, there has been very compelling work by thought leaders in the library information community focusing on the possibilities of embedding metadata in HTML web pages using OpenURL (for example, see the obscurely named GCS-PCS list).

Although the ability to embed OpenURLs in conventional HTML documents has been around for a while, implementations have been almost nonexistent. For a number of reasons, this situation may be changing rapidly.

  1. A large number of institutions have implemented OpenURL resolvers to manage linking to electronic resources.
  2. An increasing number of free or open-access internet resources need a simple and cost-effective way to provide OpenURL services to readers with access to full-text resources in libraries.
  3. New forms of publishing, such as blogs, syndicated news feeds and collaborative bookmarking environments, need ways to provide localized linking services to libraries.
  4. Barriers to client-side implementations have fallen, as JavaScript-based browser plugins and bookmarking techniques are becoming popular. Institutional agents, such as the rewriting proxy servers widely deployed to facilitate web access, could also implement localized linking.

What has been missing so far is agreement (or even awareness) among the diverse actors on the best way to implement OpenURL in conventional HTML. Example implementations have been reported by Van de Sompel (DLIB) and by Chudnov et al. (working paper) (Ariadne). The intent of the current document is to distill the essence of previous proposals into the simplest convention necessary for the majority of applications to make use of an OpenURL embedded in HTML.

Proposal: the Latent OpenURL Convention for HTML

To add a Latent OpenURL to an HTML document, put a NISO 1.0 OpenURL into the "HREF" attribute of an HTML anchor ("A") tag whose REL attribute is set to "Z39.88". Example of a latent, or unactivated, OpenURL link:

<A REL="Z39.88" HREF="?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.issn=1045-4438"></A>

This latent OpenURL is placed directly below this line:

If you are being served by a compliant activating agent, you will see a link; if not, the line above should be empty.

The same link, after (hypothetical) activation:

<A REL="Z39.88" HREF="http://library.example.edu/?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.issn=1045-4438">Find at Example Library</A>

This hypothetically activated OpenURL is placed directly below this line:
Find at Example Library
If you are being served by a compliant activating agent, you will see a link with text different from "Find at Example Library".

To activate Latent OpenURLs in an HTML document,

  1. Select all anchor tags with REL="Z39.88".
  2. Replace the part of the HREF attribute before the "?" with the baseURL of the local link server.
  3. Replace the content of the anchor tag with the anchor text or button image for the local link server.
  4. If the target resolver supports only 0.1 OpenURL, adjust the rest of the URL accordingly (see the conversion guide below).
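Steps 2 and 3 can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the function names `activate_href` and `activate_anchor` and the baseURL `http://library.example.edu/` are hypothetical, and we assume the local baseURL carries no query string of its own. Selecting anchors (step 1) and 0.1 conversion (step 4) are left out.

```python
from html import escape


def activate_href(latent_href, base_url):
    """Step 2: replace everything before the first "?" with the resolver baseURL.

    Assumption: base_url has no query string of its own.
    """
    query_start = latent_href.find("?")
    if query_start < 0:
        raise ValueError("latent OpenURL has no query part: " + latent_href)
    return base_url + latent_href[query_start:]


def activate_anchor(latent_href, base_url, anchor_text):
    """Steps 2 and 3: build the activated anchor element as HTML text."""
    href = activate_href(latent_href, base_url)
    # escape() writes "&" as "&amp;" so the attribute value is valid HTML.
    return '<a rel="Z39.88" href="{}">{}</a>'.format(escape(href), escape(anchor_text))
```

Because everything before the "?" is discarded, the same function works whether the latent HREF starts with "?" or with a dummy or default baseURL.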

Details

Empty anchors

The example above shows an empty anchor tag containing an OpenURL with an empty baseURL. In the absence of an activating agent, the link will be completely invisible to the user. The embedding page relies on user or institutional activating agents to fill in anchor text that makes the link visible, so its layout and design should gracefully accommodate such additions. Alternatively, the web page might use a default baseURL (see below) and anchor text for users without access to activating agents.

Why REL attributes?

The use of REL (relation) attributes to mark the elements with latent OpenURL links is chosen because the REL attribute has been used in a similar way in a number of other applications.

In HTML, multiple relations can be associated with an element using a single space-separated attribute value, for example REL="rel1 rel2". Activating agents should recognize the Z39.88 relation even when it is combined with other relations.

Why Z39.88?

The official designator for the NISO OpenURL standard is Z39.88-2004; we have dropped the year. Activating agents that need version information should look inside the OpenURL itself (for example, at the url_ver key). "OpenURL" was considered as an alternative to "Z39.88", but "Z39.88" is extremely unlikely to be chosen by any other application, and compatibility was judged to trump other considerations.

In HTML, relations are not case sensitive. Activating agents should recognize "z39.88" as being equivalent to "Z39.88".
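One way an activating agent might carry out step 1 of activation, handling both multi-valued and case-insensitive REL values, is sketched below with Python's standard `html.parser` module. The names `has_z3988_rel`, `LatentOpenURLFinder` and `find_latent_openurls` are our own; the sketch only collects HREF values and does not rewrite the page.

```python
from html.parser import HTMLParser


def has_z3988_rel(rel_value):
    """True if a space-separated REL value contains Z39.88, ignoring case."""
    if not rel_value:
        return False
    return any(token.lower() == "z39.88" for token in rel_value.split())


class LatentOpenURLFinder(HTMLParser):
    """Collect the HREF of every <a> element whose REL includes Z39.88."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names (A/REL/HREF -> a/rel/href)
        # and decodes character references such as "&amp;" in attribute values.
        if tag != "a":
            return
        attributes = dict(attrs)
        if has_z3988_rel(attributes.get("rel")) and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_latent_openurls(html_text):
    finder = LatentOpenURLFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hrefs
```

Note that a token such as "Z39.88-2004" is deliberately not matched: only the exact relation name "Z39.88", in any case, counts.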

What kind of OpenURL?

To spare activating agents the complexity of handling multiple embedded OpenURL formats, this convention requires a single format: a specific form of the NISO 1.0 version of OpenURL. Because it uses the NISO standard, an embedded OpenURL can be used with ANY resolver system. Embedded 1.0 OpenURLs are restricted to the "in-line ContextObject format" and "in-line metadata" for the referent. (In addition to transporting metadata in the key-value pairs of a query string, the full 1.0 OpenURL standard allows metadata objects to be transported "by-reference" using a network pointer and "by-value" in an encoded blob; these forms must not be used in latent OpenURLs.) A simple guide to OpenURL 1.0 implementation is HERE.

(for OpenURL aficionados) There are no format or transport restrictions on ContextObject entities other than the referent.
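An activating agent could screen embedded links against these restrictions before using them. The sketch below is hedged on two points: the set of keys treated as signalling by-reference or by-value transport (`url_ctx_ref`, `url_ctx_val`, `rft_ref`, `rft_ref_fmt`) reflects our reading of the Z39.88-2004 KEV keys, and we assume every latent OpenURL carries in-line referent metadata via `rft_val_fmt`. The function name `check_latent_openurl` is hypothetical.

```python
from urllib.parse import parse_qsl

# Keys that (as we read Z39.88-2004) signal a ContextObject or referent
# transported by-reference or by-value rather than in-line.
FORBIDDEN_KEYS = {"url_ctx_ref", "url_ctx_val", "rft_ref", "rft_ref_fmt"}
# In-line KEV metadata formats share this prefix, e.g. ...:mtx:journal
KEV_FORMAT_PREFIX = "info:ofi/fmt:kev:mtx:"


def check_latent_openurl(href):
    """Return a list of problems; an empty list means the HREF looks acceptable."""
    query = href.split("?", 1)[1] if "?" in href else ""
    pairs = parse_qsl(query, keep_blank_values=True)
    values = dict(pairs)  # later duplicates win; enough for these checks
    problems = []
    if values.get("url_ver") != "Z39.88-2004":
        problems.append("url_ver is not Z39.88-2004")
    # Assumption: every latent OpenURL carries in-line referent metadata.
    if not values.get("rft_val_fmt", "").startswith(KEV_FORMAT_PREFIX):
        problems.append("referent metadata is not in-line KEV")
    for key in sorted(FORBIDDEN_KEYS.intersection(values)):
        problems.append("forbidden key: " + key)
    return problems
```

Keys for other ContextObject entities are not inspected, since the convention places no format or transport restrictions on them.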

Converting 1.0 OpenURL for use with 0.1 resolver systems (a guide for authors of activating agents)

Activating agents will need to adjust the OpenURL for use with specific resolver systems such as those which only understand version 0.1. This is a straightforward process, which is outlined here in gory detail:

  1. First of all, 0.1 systems will only understand book and journal article links; in version 1.0, these OpenURLs use the journal and book metadata formats:
    rft_val_fmt=info:ofi/fmt:kev:mtx:journal
    rft_val_fmt=info:ofi/fmt:kev:mtx:book

    Activating agents will need to skip other metadata formats (patents, dissertations, Dublin Core links and other formats which may become popular in the future) for 0.1-only resolver systems.
  2. By removing "rft." from the book and journal referent metadata keys the OpenURL can be made functional for all known 0.1-only resolver systems.
  3. There are a small number of new metadata keys introduced in OpenURL 1.0 that need to be adjusted for use with 0.1 resolver systems.
    1. "jtitle" should be replaced by "title".
    2. "btitle" should be replaced by "title".
  4. The genre key should be set to "article" (journal format) or "book" (book format) if it is not already present in the 1.0 OpenURL.
  5. The syntax for identifiers is different in 0.1 and 1.0. The following replacements should be made:
    1. "rft_id=info:pmid/" should be replaced by "id=pmid:".
    2. "rft_id=info:doi/" should be replaced by "id=doi:".
    3. "rft_id=info:bibcode/" should be replaced by "id=bibcode:".
  6. Set sid to a value that identifies your activating agent. In 0.1 OpenURL the value of sid is more or less proprietary, so it cannot really be converted from its OpenURL 1.0 equivalent (rfr_id); just use it to identify your agent.
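The six conversion steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a tested converter: the name `to_openurl_01` is hypothetical, and the guide does not say what to do with 1.0 keys that have no 0.1 counterpart (url_ver, rft_val_fmt, rfr_id, identifiers in other namespaces), so this sketch simply drops them.

```python
from urllib.parse import parse_qsl, urlencode

# Step 1: the only referent formats a 0.1 resolver understands, mapped to
# the genre to add when the 1.0 OpenURL does not carry one (step 4).
DEFAULT_GENRE = {
    "info:ofi/fmt:kev:mtx:journal": "article",
    "info:ofi/fmt:kev:mtx:book": "book",
}
# Step 3: 1.0 keys (after "rft." is stripped) renamed for 0.1.
RENAMED_KEYS = {"jtitle": "title", "btitle": "title"}
# Step 5: 1.0 identifier prefixes and their 0.1 replacements.
ID_PREFIXES = {
    "info:pmid/": "pmid:",
    "info:doi/": "doi:",
    "info:bibcode/": "bibcode:",
}


def to_openurl_01(query_10, sid):
    """Convert an in-line KEV OpenURL 1.0 query to a 0.1 query string.

    Returns None when the referent format cannot be expressed in 0.1.
    Assumption: keys without a 0.1 counterpart are dropped.
    """
    query = query_10.split("?", 1)[-1]  # tolerate a leading baseURL or "?"
    pairs = parse_qsl(query, keep_blank_values=True)
    fmt = dict(pairs).get("rft_val_fmt")
    if fmt not in DEFAULT_GENRE:
        return None  # step 1: skip patents, dissertations, Dublin Core, ...

    converted = []
    for key, value in pairs:
        if key.startswith("rft."):
            name = key[len("rft."):]  # step 2: strip "rft."
            converted.append((RENAMED_KEYS.get(name, name), value))  # step 3
        elif key == "rft_id":
            for prefix_10, prefix_01 in ID_PREFIXES.items():  # step 5
                if value.startswith(prefix_10):
                    converted.append(("id", prefix_01 + value[len(prefix_10):]))
                    break

    if not any(key == "genre" for key, _ in converted):
        converted.append(("genre", DEFAULT_GENRE[fmt]))  # step 4
    converted.append(("sid", sid))  # step 6: identify the activating agent
    # Keep ":" and "/" readable in values such as "doi:10.1000/182".
    return urlencode(converted, safe=":/")
```

Applied to the latent link from the example at the top of this page with a sid of "myagent", the sketch yields issn=1045-4438&genre=article&sid=myagent, which an activating agent would then append to the 0.1 resolver's baseURL.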

Default baseURL and BASE

Our example above starts the latent OpenURL with "?". HTML processing software will typically interpret this as a relative URI and use the document URI (or the URI specified in a BASE tag) to determine an absolute URI for the anchor element. Thus, until an activating agent adds a resolver baseURL and anchor text, the latent OpenURL will often point back to the document in which it resides. Software that is not aware of the latent OpenURL convention may mistake the latent URL for a real one. In most cases this should not be a problem, but embedding sites should expect some odd log entries from ignorant spidering robots. For this reason, it may be useful to specify a default or dummy baseURL in the latent OpenURL.
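The relative-URI behaviour described above can be reproduced with Python's standard `urljoin`, which follows the RFC 3986 resolution rules. The document URI `http://publisher.example.org/articles/123.html` and the helper name `naive_absolute_url` are hypothetical; real software may differ in detail.

```python
from urllib.parse import urljoin

# Hypothetical page that embeds a latent OpenURL.
DOCUMENT_URI = "http://publisher.example.org/articles/123.html"


def naive_absolute_url(document_uri, href):
    """Resolve an HREF the way convention-unaware software usually does:
    as a reference relative to the document (or BASE) URI."""
    return urljoin(document_uri, href)
```

With the "?"-only form, `naive_absolute_url(DOCUMENT_URI, "?url_ver=Z39.88-2004&rft.issn=1045-4438")` points back at the article page itself with the OpenURL as its query, which is where the odd log entries come from; an HREF that starts with a dummy baseURL such as "http://example.com/" is left pointing at that dummy host instead.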

Dummy baseURLs are URLs that do not lead to a real service. RFC 2606 reserves "example.com" and "example.org" for documentation and examples, so they will never be assigned to real sites; these might be used as dummy baseURLs. Similarly, new URI schemes, such as "info", might be used to signify that a URI can be activated into an OpenURL.

Default baseURLs (and default anchor text) can be used when a publisher wants to provide default services for users who do not have OpenURL services available. A default baseURL might be a special-purpose resolver, or it might simply be a static web page address, taking advantage of the fact that web servers typically ignore query data when serving static pages. This has been done for the link to the Van de Sompel article cited above.

Forms

This specification does not provide means to embed or activate latent OpenURL information in HTML forms.

Definitions

Activating Agent
Software that processes an HTML page to make OpenURLs active for a user.

Implementations

In this section we list implementations of latent OpenURLs.

Embedding Sites

  1. MRS Internet Journal of Nitride Semiconductor Research (4000+ reference database pages)
  2. A WordPress plugin and examples

Activating Agents

  1. Various Greasemonkey scripts and a demo site
  2. A beta version of the OpenURL referrer Firefox plugin.

Notes

Note that, for clarity, our displayed examples do not encode ampersands as "&amp;", as valid HTML requires.
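An embedding site generating latent links programmatically can get the ampersand encoding right with Python's standard `html.escape`. The helper name `latent_anchor_html` is hypothetical; it produces the empty-anchor form used in the first example on this page.

```python
from html import escape


def latent_anchor_html(openurl_query):
    """Serialize an empty latent OpenURL anchor with "&" written as "&amp;"."""
    href = "?" + openurl_query.lstrip("?")
    # escape() with quote=True also encodes '"', so the value cannot break
    # out of the double-quoted attribute.
    return '<a rel="Z39.88" href="{}"></a>'.format(escape(href, quote=True))
```

HTML parsers decode "&amp;" back to "&" when reading the attribute, so activating agents see the original OpenURL.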

Open Issues
  1. Which attribute of the anchor element should be used for marking?
    1. REL (suggested by Alf Eaton, seconded by Dan Chudnov; good arguments both ways)
    2. CLASS (my suggestion)
    3. TITLE (suggested by Ross Singer; possibly used in an experiment). It seems everyone else is indifferent.
  2. Should embedded 0.1 links be supported? (Decent activating agents will, of course, have to support making both 0.1 and 1.0 links out of the embedded ones.) My view is NO, but Ross reasonably raises the question. Perhaps his points can be addressed by requiring a very simple form of OpenURL 1.0?
  3. Is there a need for TYPE? I don't understand what the motivation for TYPE is.
  4. Is there a need for version information? Versioning seems to be addressed by OpenURL 1.0, but I can see the advantages.
  5. Should "Z39.88" be replaced by "OpenURL" as the key value?
  6. Is there a need for a document marker and/or profile reference? For example, one could add a <LINK> element in the <HEAD> as an indication that a page contains OpenURL-related information. Something like:
    <link rel="Z39.88" title="OpenURL enabled"/>
    The advantage is that activating software could be made more efficient and profile information could be included; the disadvantage is that if the anchor element is removed from the document context, it may lose its OpenURL significance.