ResourceSync Framework Specification - Archives - Beta Draft

21 August 2013

This version:
http://www.openarchives.org/rs/0.9.1/archives
Latest version:
http://www.openarchives.org/rs/archives
Previous version:
http://www.openarchives.org/rs/0.9/archives
Editors:
Martin Klein, Robert Sanderson, Herbert Van de Sompel - Los Alamos National Laboratory
Simeon Warner - Cornell University
Graham Klyne - University of Oxford
Bernhard Haslhofer - University of Vienna
Michael Nelson - Old Dominion University
Carl Lagoze - University of Michigan

Abstract

The ResourceSync specifications describe a synchronization framework for the web consisting of various capabilities that allow third party systems to remain synchronized with a server's evolving resources. This ResourceSync Archives specification describes additional capabilities that extend the core specification to provide historical information about a set of resources.

This specification is one of several documents comprising the ResourceSync Framework Specifications.

Status of this Document

This specification is a beta draft released for public comment. Feedback is most welcome on the ResourceSync Google Group.

Table of Contents

1. Introduction
    1.1 Motivating Examples
    1.2 Structure
    1.3 Notational Conventions
2. Advertising Archive Capabilities
    2.1 Inclusion of Archives in a Capability List
    2.2 Linking to Archives
3. Resource List Archives
    3.1 Resource List Archive Index
4. Resource Dump Archives
5. Change List Archives
6. Change Dump Archives
7. References

Appendices

A. Acknowledgements
B. Change Log

1. Introduction

The ResourceSync specifications introduce a range of easy-to-implement capabilities that a server may support in order to enable remote systems to remain more tightly in step with its evolving resources. They also describe how a server can advertise the capabilities it supports. Remote systems can inspect this information to determine how best to remain aligned with the evolving data.

This ResourceSync Archives specification adds to the framework capabilities that allow a server to provide historical data based on archives of the core capabilities (Resource Lists, Resource Dumps, Change Lists, and Change Dumps). Like all other capabilities, Archives are implemented using the document formats introduced by the Sitemap protocol. Each archive capability is optional and may be implemented independently of any other archive capability. Archives need not be implemented in order to support synchronization with ResourceSync, but may facilitate certain use cases.

For example, a Change List Archive allows a server to list a timestamped set of historical Change Lists, thus allowing description of changes over an extended period without placing additional requirements on the generation and rotation of the current Change List. A Resource Dump Archive allows a server to list a timestamped set of historical Resource Dumps, providing snapshots of the server's resources at different times. A remote server may select an appropriate historical Resource Dump to synchronize with a past state of the server's resources.
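The selection step described above can be illustrated with a minimal Python sketch. It assumes the entries of a Resource Dump Archive have already been parsed into (timestamp, URL) pairs sorted by time. The example URLs, the `ARCHIVE` list, and the `select_dump` function are hypothetical and not defined by this specification.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical, already-parsed archive entries: (snapshot time, dump URL),
# sorted by snapshot time. Real entries would come from the archive document.
ARCHIVE = [
    (datetime(2013, 1, 1, tzinfo=timezone.utc), "http://example.com/dump-2013-01.xml"),
    (datetime(2013, 2, 1, tzinfo=timezone.utc), "http://example.com/dump-2013-02.xml"),
    (datetime(2013, 3, 1, tzinfo=timezone.utc), "http://example.com/dump-2013-03.xml"),
]


def select_dump(archive, target):
    """Return the URL of the latest dump taken at or before `target`.

    Returns None when every archived dump is newer than `target`.
    """
    times = [taken_at for taken_at, _ in archive]
    # bisect_right gives the number of snapshots with time <= target.
    count = bisect_right(times, target)
    return archive[count - 1][1] if count > 0 else None


if __name__ == "__main__":
    wanted = datetime(2013, 2, 15, tzinfo=timezone.utc)
    print(select_dump(ARCHIVE, wanted))
```

Choosing the latest snapshot at or before the target time is only one possible policy; a remote system could equally pick the nearest snapshot in either direction.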

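This document is structured as follows: Section 2 describes how archive capabilities are advertised. Sections 3 to 6 describe Resource List Archives, Resource Dump Archives, Change List Archives, and Change Dump Archives, respectively.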

All archive capabilities may have indexes to allow extension to very large numbers of entries in the same manner as the core capabilities. This is described in detail for the Resource List Archive Index in Section 3.1.

1.1. Motivating Examples

Many projects and services have synchronization needs and have implemented ad hoc solutions. ResourceSync provides a standard synchronization method that will reduce implementation effort and facilitate easier reuse of resources. Archive capabilities allow historical data to be described within the same framework as current synchronization information. This section describes motivating examples for the archive capabilities.

The way in which a ResourceSync Source generates Change Lists will be determined by the particular technical configuration of the Source, the frequency of changes, and the intended use. While Change Lists that use the Sitemap index format and a set of Sitemaps may have a very large number of entries, it may be convenient to rotate individual lists of changes frequently and avoid generating a very large Change List. Change List Archives add flexibility while retaining the ability for a Source to make available a complete change history enabling incremental synchronization from any past state. A Source with very frequent changes might create separate Sitemap files as part of a Change List at hourly intervals, and perhaps each month (about 720 hours) start a new Change List while archiving the old one. If all the resource states were recorded in addition to the change information, then Change Dumps and a Change Dump Archive could be used to optimize download of the changed resources.
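As a rough illustration of the rotation scenario above, the following Python sketch works out the hourly file count for a 30-day month and picks the archived Change Lists a remote system would need in order to catch up from a past synchronization time. The `ArchivedChangeList` record, its `start` and `end` fields, the example URLs, and the `lists_needed` function are assumptions made for this sketch, not structures defined by this specification.

```python
from datetime import datetime, timezone
from typing import NamedTuple

# Hourly Sitemap files in one Change List, assuming a 30-day month.
HOURLY_FILES_PER_LIST = 30 * 24  # 720


class ArchivedChangeList(NamedTuple):
    """Hypothetical record for one archived Change List."""
    start: datetime  # beginning of the period the list covers
    end: datetime    # end of the period the list covers
    url: str


def lists_needed(archive, last_sync):
    """Return URLs of Change Lists that may hold changes after `last_sync`.

    Lists whose period ended at or before `last_sync` are skipped.
    The result is ordered oldest first so changes can be replayed in order.
    """
    ordered = sorted(archive, key=lambda change_list: change_list.start)
    return [cl.url for cl in ordered if cl.end > last_sync]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical monthly Change Lists, deliberately listed out of order.
ARCHIVE = [
    ArchivedChangeList(utc(2013, 8, 1), utc(2013, 9, 1), "http://example.com/changes-2013-08.xml"),
    ArchivedChangeList(utc(2013, 6, 1), utc(2013, 7, 1), "http://example.com/changes-2013-06.xml"),
    ArchivedChangeList(utc(2013, 7, 1), utc(2013, 8, 1), "http://example.com/changes-2013-07.xml"),
]

if __name__ == "__main__":
    print(HOURLY_FILES_PER_LIST)
    print(lists_needed(ARCHIVE, utc(2013, 7, 10)))
```

A remote system that last synchronized on 10 July would skip the June list and process the July and August lists in order.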

Many services provide snapshots of historical content either as stable reference points, or to permit the evolution of the service's resources to be studied in situations where describing all updates would be difficult. Examples include Wikipedia Snapshots and Nature Linked Data Snapshots. The Resource Dump Archive capability provides the opportunity to describe such snapshots in a consistent and machine-navigable way.

Resource List Snapshots provide the ability for servers to describe the state of their resources at particular points in time. This would allow clients to investigate changes expressed in the metadata or to compare the current state with historical state.
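A comparison of the current state with a historical state could look like the following Python sketch. It assumes each state has been reduced to a mapping from resource URI to some metadata value such as a last modification time or content hash. The `compare_states` function and the example data are hypothetical.

```python
def compare_states(old, new):
    """Compare two states given as {resource URI: metadata value} mappings.

    Returns the URIs that were created, updated, or deleted between the
    historical state `old` and the later state `new`.
    """
    old_uris, new_uris = set(old), set(new)
    return {
        "created": sorted(new_uris - old_uris),
        "updated": sorted(uri for uri in old_uris & new_uris if old[uri] != new[uri]),
        "deleted": sorted(old_uris - new_uris),
    }


# Hypothetical states: URI -> last modification time as a string.
HISTORICAL = {
    "http://example.com/res1": "2013-01-02T13:00:00Z",
    "http://example.com/res2": "2013-01-02T14:00:00Z",
}
CURRENT = {
    "http://example.com/res2": "2013-03-05T09:30:00Z",
    "http://example.com/res3": "2013-03-01T10:00:00Z",
}

if __name__ == "__main__":
    print(compare_states(HISTORICAL, CURRENT))
```

The sketch only compares metadata values; it does not fetch or inspect the resources themselves.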

1.2. Structure

The capabilities introduced in this specification extend the framework structure described in ResourceSync Core: Structure. Figure 1 shows how the archive capabilities fit into the ResourceSync Framework:

A. Acknowledgements

This specification is the collaborative work of NISO and the Open Archives Initiative. Funding for ResourceSync is provided by the Alfred P. Sloan Foundation. UK participation is supported by Jisc.

We also thank numerous individual contributors including: Martin Haye (California Digital Library), Richard Jones (Cottage Labs), Stuart Lewis (University of Edinburgh), Peter Murray (Lyrasis), David Rosenthal (LOCKSS), Shlomo Sanders (Ex Libris, Inc.), Ed Summers (Library of Congress), Paul Walk (UKOLN), Vincent Wehren (Microsoft), Zhiwu Xie (Virginia Tech), and Jeff Young (Online Computer Library Center).

B. Change Log

Date        Editor                         Description
2013-08-21  simeon, martin, herbert        reorder sections, add structure figure
2013-08-05  simeon, martin, herbert, rob   version 0.9.1
2013-06-07  simeon                         version 0.9
2013-05-06  simeon                         separated archives portion for version 0.6
2013-02-01  martin, herbert, rob, simeon   beta spec draft
2012-08-13  martin, herbert, simeon, bernhard  first alpha spec draft

Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.