from Santa Fe Convention to OAI-PMH v.2.0

Introduction to the OAI-PMH
Michael L. Nelson
[email protected]
http://www.cs.odu.edu/~mln/
Several slides from Herbert Van de Sompel, Simeon Warner and Terry L. Harrison
University of Southern California, 6/15/04

Outline
  - History of OAI-PMH: UPS, Santa Fe Convention
  - Overview of the OAI-PMH: verbs, data model
  - OAI 1.0, 1.1, 2.0 and how 2.0 was created
  - Example data providers and service providers
  - More information: http://www.openarchives.org/

UPS and SFC

The Rise and Fall of Distributed Searching
  - wholesale distributed searching, popular at the time, is attractive in theory but troublesome in practice
      Davis & Lagoze, JASIS 51(3), pp. 273-80
      Powell & French, Proc 5th ACM DL, pp. 264-265
  - distributed searching of N nodes still viable, but only for small values of N
      NCSTRL: N > 100; bad
      NTRS/NIX: N <= 20; ok (but could be better)
  - Other problems of distributed searching (from STARTS)
      source-metadata problem: how do you know which nodes to search?
      query-language problem: syntax varies and drifts over time between the various nodes
      rank-merging problem: how do you meaningfully merge multiple result sets?

Universal Preprint Service
  - A cross-archive DL that provides services on a collection of metadata harvested from multiple archives
      based on NCSTRL+, a modified version of Dienst
      support for clustering
      support for buckets
  - Demonstrated at Santa Fe NM, October 21-22, 1999
      http://ups.cs.odu.edu/
      D-Lib Magazine, 6(2) 2000 (2 articles): http://www.dlib.org/dlib/february00/02contents.html

  - UPS was soon renamed the Open Archives Initiative (OAI)
      http://www.openarchives.org/

UPS Participants (totals ca. July 1999)

  Archive / DL   Site                    Records in DL   Buckets in UPS   Buckets Linked to Full Content
  arXiv          www.arxiv.org           128943          85204            85204
  CogPrints      cogprints.soton.ac.uk   743             742              659
  NACA           naca.larc.nasa.gov      3036            3036             3036
  NCSTRL         www.ncstrl.org          29680           25184            9084
  NDLTD          www.ndltd.org           1590            1590             951
  RePEc          netec.mcc.ac.uk         71359           71359            13582
  Totals                                 235361          187115           112516

Metadata Harvesting
  - Getting metadata out of archives
      not all archives support metadata extraction
      some archives have undocumented metadata extraction procedures
      not all archives support rich criteria for extraction
        single dump concept only
  - Intellectual property and use rights not always clear
      many policies akin to "don't ask, don't tell"

Metadata Formatting and Quality
  - Quality problems with:
      record duplication
      crucial missing fields
      internal errors
      ambiguous references to people and places, publications
  - Different formats!

      Archive     Metadata format
      arXiv       (local)
      CogPrints   (local)
      NACA        refer
      RePEc       ReDIF
      NDLTD       MARC
      NCSTRL      RFC-1807

  - observation: n digital libraries results in O(n) metadata formats

Buckets: Information Surrogates in UPS
  - Limitations on intellectual property, file size, transmission time, system load, etc. caused us to focus on metadata only
  - Metadata was collected into buckets, with pointers back to the data files (still at the original sites)

Value Added Services Attached to the Buckets
  - SFX Reference Linking Service, developed at Univ of Ghent, Belgium
      provides a layer of indirection between reference services available at a local site and the object itself
  - SFX buttons are attached to the buckets themselves
      communication occurs between SFX server and the bucket
  - Adding other services to the buckets is easy...

Data and Service Providers
  - Data Providers
      publishing into an archive
      Self-describing archives
        Much of the learning about the constituent UPS archives occurred out of band
      providing methods for metadata harvesting
      provide non-technical context for sharing information also
  - Service Providers
      harvest metadata from providers
      implement user interface to data
  - Even if these are done by the same DL, these are distinct roles

Metadata Harvesting
  - Move away from distributed searching
  - Extract metadata from various sources
  - Build services on local copies of metadata
      data remains at remote repositories

[Figure: a user query ("search for cfd applications") runs against a local copy of metadata that was harvested offline from several nodes. All searching, browsing, etc. is performed on the local metadata; each node is independently maintained, and individual nodes can still support direct user interaction.]

Result: OAI
  - The OAI was the result of the demonstration and discussion during the Santa Fe meeting
  - Initial focus was on federating collections of scholarly e-print materials

      however, interest grew and the scope and application of OAI expanded to become a generic bulk metadata transport protocol
  - Note: OAI is only about metadata -- not full text!
      what is metadata and what is full text? OAI is neutral with respect to the nature of the metadata or the resources the metadata describes
      read: commercial publishers have an interest in OAI too...

Open Archives Initiative
  - Open: the protocol is openly documented, and metadata is exposed to at least some peer group (note: rights management can still apply!)
  - Archive: defined as a collection of stuff -- not the archivist's definition of archive. "Repository" is used in most OAI documents.
  - Initiative: needed a TLA

Open Archives Initiative vs. Open Archival Information System (OAIS)
  - Open Archives Initiative: exposure of metadata for harvesting
  - Open Archival Information System: ensuring long-term preservation of archival materials
[Figure: an OAIS, and an OAIS w/ an OAI interface]
  - http://www.dlib.org/dlib/april01/04editorial.html
  - http://www.dlib.org/dlib/may01/05letters.html
  - http://ssdoo.gsfc.nasa.gov/nost/isoas/us/overview.html

OAI Protocol for Metadata Harvesting
  - Then:
      OAI-PMH originally a subset of the Dienst (NCSTRL) protocol and originally called the Santa Fe Convention
      originally defined an OAI-specific metadata format
  - Now:
      OAI metadata format dropped in favor of unqualified Dublin Core
        other formats possible, but DC is required as lowest common denominator
      No longer dependent on Dienst (Cornell CS TR 95-1514)
        defined independently (though still easily mappable)

Dublin Core
  - Dublin Core Metadata Initiative: http://www.dublincore.org/
      from 1994-1995, recognizing the need for simple, interoperable metadata for resource discovery

      good overview of metadata & DC: http://www.dlib.org/dlib/january01/lagoze/01lagoze.html
  - 15 elements (qualifiers/refinements possible):
      Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights

Open Archives Initiative Protocol for Metadata Harvesting

OAI-PMH Actors
  - data providers / repositories:
      A repository is a network accessible server that can process the 6 OAI-PMH requests in the manner described in [the OAI-PMH document]. A repository is managed by a data provider to expose metadata to harvesters.
  - service providers / harvesters:
      A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.

Data Providers / Service Providers
[Figure: data providers (repositories) harvested by service providers (harvesters)]

Aggregators
  - aggregators allow for:
      scalability for OAI-PMH
      load balancing
      community building
      discovery
[Figure: data providers (repositories) -> aggregator -> service providers (harvesters)]

Aggregators
  - Frequently interchangeable terms:
      aggregators: likely to be community / institutionally focused
      caches: stores a copy, less likely to be community-oriented
      proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols
        Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03
  - To learn more about aggregators, caches & proxies:
      http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
      http://www.cs.odu.edu/~mln/jcdl03/

OAI-PMH Data Model
  - resource
  - item = identifier
      all available metadata about David: Dublin Core metadata, MARC metadata, SPECTRUM metadata
      set-membership is item-level property
  - records
      record = identifier + metadata format + datestamp

Overview of OAI-PMH Verbs

  Verb                  Function
  Identify              description of repository
  ListMetadataFormats   metadata formats supported by repository
  ListSets              sets defined by repository
  ListIdentifiers       OAI unique ids contained in repository
  ListRecords           listing of N records
  GetRecord             listing of a single record

  - Identify, ListMetadataFormats and ListSets return metadata about the repository
  - ListIdentifiers, ListRecords and GetRecord are the harvesting verbs
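Since each verb is issued as an ordinary HTTP GET/POST request, the verb overview can be illustrated with a small URL builder. This is only a sketch: `build_request` is a hypothetical helper name, the `from_` spelling is a Python-side convenience (because `from` is a reserved word), and the base URL and identifier are simply the arXiv examples that appear elsewhere in these slides.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs from the overview.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, **args):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in VERBS:
        # A repository would answer such a request with the badVerb error.
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in args.items():
        # Callers write from_=... because "from" is a Python keyword.
        params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_request("http://arXiv.org/oai2", "Identify"))
    print(build_request("http://arXiv.org/oai2", "GetRecord",
                        identifier="oai:arXiv:cs/0112017",
                        metadataPrefix="oai_dc"))
```

The helper only checks the verb name; which arguments are required, optional or exclusive for each verb is left to the caller here.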

  - most verbs take arguments: dates, sets, ids, metadata format and resumption token (for flow control)

Identify (supporting protocol request)
[Figure: service provider (harvester) -> data provider (repository): Identify. Response: Identify / Time / Request, then Repository identifier, Base-URL, Admin e-mail, OAI protocol version, Description. Slide credit: Herbert Van de Sompel]

Identify
  1.1  Arguments: none
       Errors: none
  2.0  Arguments: none
       Errors: badArgument

ListMetadataFormats (supporting protocol request)
[Figure: harvester -> repository: ListMetadataFormats, identifier=oai:mlib:123a. Response: ListMetadataFormats / Time / Request, then REPEAT Format prefix, Format XML schema /REPEAT]

ListMetadataFormats
  1.1  Arguments: identifier (OPTIONAL)
       Errors: id does not exist
  2.0  Arguments: identifier (OPTIONAL)
       Errors: badArgument, noMetadataFormats, idDoesNotExist

ListSets (supporting protocol request)
[Figure: harvester -> repository: ListSets, resumptionToken. Response: ListSets / Time / Request, then REPEAT SetSpec, SetName /REPEAT]

ListSets
  1.1  Arguments: resumptionToken (EXCLUSIVE)
       Errors: no set hierarchy
  2.0  Arguments: resumptionToken (EXCLUSIVE)
       Errors: badArgument, badResumptionToken, noSetHierarchy

ListRecords (harvesting request)
[Figure: harvester -> repository: ListRecords, from=a, until=b, set=klm, metadataPrefix=dc, resumptionToken. Response: ListRecords / Time / Request, then REPEAT Identifier, Datestamp, Metadata /REPEAT]

ListRecords
  1.1  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
       Errors: no records match; metadata format cannot be disseminated
  2.0  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
       Errors: noRecordsMatch, cannotDisseminateFormat, badResumptionToken, noSetHierarchy, badArgument

ListIdentifiers (harvesting request)
[Figure: harvester -> repository: ListIdentifiers, from=a, until=b, set=klam, metadataPrefix, resumptionToken. Response: ListIdentifiers / Time / Request, then REPEAT Identifier, Datestamp /REPEAT]

ListIdentifiers
  1.1  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE)
       Errors: no records match
  2.0  Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
       Errors: badArgument, cannotDisseminateFormat, badResumptionToken, noSetHierarchy, noRecordsMatch

GetRecord (harvesting request)
[Figure: harvester -> repository: GetRecord, identifier=oai:mlib:123a, metadataPrefix=dc. Response: GetRecord / Time / Request, then Identifier, Datestamp, Metadata]

GetRecord
  1.1  Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
       Errors: id does not exist; metadata format cannot be disseminated
  2.0  Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)

       Errors: badArgument, cannotDisseminateFormat, idDoesNotExist

Argument Summary

  Verb                  metadataPrefix   from       until      set        resumptionToken   identifier
  Identify              -                -          -          -          -                 -
  ListMetadataFormats   -                -          -          -          -                 optional
  ListSets              -                -          -          -          exclusive         -
  ListIdentifiers       required         optional   optional   optional   exclusive         -
  ListRecords           required         optional   optional   optional   exclusive         -
  GetRecord             required         -          -          -          -                 required

Error Summary

  Verb                  Errors
  Identify              BA
  ListMetadataFormats   BA, NMF, IDDNE
  ListSets              BA, BRT, NSH
  ListIdentifiers       BA, BRT, CDF, NRM, NSH
  ListRecords           BA, BRT, CDF, NRM, NSH
  GetRecord             BA, CDF, IDDNE

  BA = badArgument, BRT = badResumptionToken, CDF = cannotDisseminateFormat, IDDNE = idDoesNotExist, NMF = noMetadataFormats, NRM = noRecordsMatch, NSH = noSetHierarchy

  - Generate badVerb on any input not matching the 6 defined verbs
  - this is an inversion of the table in section 3.6 of the OAI-PMH specification

Flow Control
  - ListSets, ListIdentifiers, ListRecords are all allowed to return partial responses, via a combination of:
      resumptionToken: an opaque, archive-defined data string that when passed back to the archive allows the response to begin where it left off
        each archive defines their own resumptionToken

        syntax; it may have visible semantics or not
      503 http status code, "retry after"
        up to the harvester to understand this code and respect it, and up to the archive to enforce it

resumptionToken
  - scenario: harvesting 277 records in 3 separate 100 record chunks

      harvester  -> repository: ListRecords
      repository -> harvester:  Records 1-100, resumptionToken=AXad31
      harvester  -> repository: ListRecords, resumptionToken=AXad31
      repository -> harvester:  Records 101-200, resumptionToken=pQ22-x
      harvester  -> repository: ListRecords, resumptionToken=pQ22-x
      repository -> harvester:  Records 201-277, resumptionToken=(null)

    (in the slide's diagram the repository is backed by an RDBMS)

Let's Look at some Repositories
  - Repository Explorer: http://www.purl.org/NET/oai_explorer

OAI-PMH 1.0, 1.1, 2.0
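The 277-record resumptionToken scenario from the flow-control slides can be sketched as a harvester loop. This is an illustration under stated assumptions, not a real client: `make_repository` is a hypothetical in-memory stand-in for a data provider (its tokens are just offsets, whereas a real archive defines its own opaque syntax), and `harvest` is a hypothetical name for the loop. No network access is involved.

```python
def make_repository(total=277, chunk_size=100):
    """Toy in-memory repository; here the token is simply the next offset."""
    def list_records(token=None):
        start = 0 if token is None else int(token)
        end = min(start + chunk_size, total)
        # No token on the final chunk tells the harvester the list is complete.
        next_token = str(end) if end < total else None
        return list(range(start + 1, end + 1)), next_token
    return list_records


def harvest(list_records):
    """Follow resumptionTokens until the repository stops returning one.

    The harvester never interprets the token; it only hands it back.
    """
    records, requests, token = [], 0, None
    while True:
        chunk, token = list_records(token)
        records.extend(chunk)
        requests += 1
        if token is None:
            return records, requests


if __name__ == "__main__":
    records, requests = harvest(make_repository())
    print(len(records), "records in", requests, "requests")
```

A real harvester would also have to honour a 503 "retry after" response before reissuing a request, which this sketch leaves out.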

              Santa Fe convention   OAI-PMH v.1.0/1.1         OAI-PMH v.2.0
  nature      experimental          experimental              stable
  verbs       Dienst                OAI-PMH                   OAI-PMH
  requests    HTTP GET/POST         HTTP GET/POST             HTTP GET/POST
  responses   XML                   XML                       XML
  transport   HTTP                  HTTP                      HTTP
  metadata    OAMS                  unqualified Dublin Core   unqualified Dublin Core
  about       eprints               document like objects     resources
  model       metadata harvesting   metadata harvesting       metadata harvesting

Santa Fe Convention [02/2000]
  - goal: optimize discovery of e-prints
  - input:
      the UPS prototype
      RePEc / SODA data provider / service provider model
      Dienst protocol
      deliberations at Santa Fe meeting [10/99]

OAI-PMH v.1.0 [01/2001]
  - goal: optimize discovery of document-like objects
  - input:
      SFC
      DLF meetings on metadata harvesting
      deliberations at Cornell meeting [09/00]
      alpha test group of OAI-PMH v.1.0

OAI-PMH v.1.0 [01/2001]
  - low-barrier interoperability specification
  - metadata harvesting model: data provider / service provider
  - focus on document-like objects
  - autonomous protocol
  - HTTP based
  - XML responses
  - unqualified Dublin Core
  - experimental: 12-18 months

Selected Pre-2.0 OAI Highlights

  - October 21-22, 1999 - initial UPS meeting
  - February 15, 2000 - Santa Fe Convention published in D-Lib Magazine
      precursor to the OAI metadata harvesting protocol
  - June 3, 2000 - workshop at ACM DL 2000 (Texas)
  - August 25, 2000 - OAI steering committee formed, DLF/CNI support
  - September 7-8, 2000 - technical meeting at Cornell University
      defined the core of the current OAI metadata harvesting protocol
  - September 21, 2000 - workshop at ECDL 2000 (Portugal)
  - November 1, 2000 - Alpha test group announced (~15 organizations)
  - January 23, 2001 - OAI protocol 1.0 announced, OAI Open Day in the U.S. (Washington DC)
      purpose: freeze protocol for 12-16 months, generate critical mass
  - February 26, 2001 - OAI Open Day in Europe (Berlin)
  - July 3, 2001 - OAI protocol 1.1 announced
      to reflect changes in the W3C's latest XML Schema recommendation
  - September 8, 2001 - workshop at ECDL 2001 (Darmstadt)

OAI-PMH v.2.0 [06/2002]
  - goal: recurrent exchange of metadata about resources between systems
  - input:
      OAI-PMH v.1.0
      feedback on OAI-implementers
      deliberations by OAI-tech [09/01 - 06/02]
      alpha test group of OAI-PMH v.2.0 [03/02 - 06/02]
  - officially released June 14, 2002

OAI-PMH v.2.0 [06/2002]
  - low-barrier interoperability specification
  - metadata harvesting model: data provider / service provider
  - metadata about resources
  - autonomous protocol
  - HTTP based
  - XML responses
  - unqualified Dublin Core
  - stable

releasing OAI-PMH v.2.0 (illustrating the OAI process)
  - See also Lagoze, Carl and Van de Sompel, Herbert. The making of the Open Archives Initiative Protocol for Metadata Harvesting. 2003. Library Hi Tech. v21, N2. Draft

  - creation of OAI-tech
  - pre-alpha phase
  - alpha phase
  - beta phase

creation of OAI-tech [06/01]
  - created for 1 year period
  - charge:
      review functionality and nature of OAI-PMH v.1.0
      investigate extensions
      release stable version of OAI-PMH by 05/02
      determine need for infrastructure to support broad adoption of the protocol
  - communication: listserv, SourceForge, conference calls

OAI-tech
  - US representatives: Thomas Krichel (Long Island U), Jeff Young (OCLC), Tim Cole (U of Illinois at Urbana Champaign), Hussein Suleman (Virginia Tech), Simeon Warner (Cornell U), Michael Nelson (NASA), Caroline Arms (LoC), Mohammad Zubair (Old Dominion U), Steven Bird (U Penn.)
  - European representatives: Andy Powell (Bath U. & UKOLN), Mogens Sandfaer (DTV), Thomas Baron (CERN), Les Carr (U of Southampton)

pre-alpha phase [09/01 - 02/02]
  - review process by OAI-tech:
      identification of issues
      conference call to filter/combine issues
      white paper per issue
      on-line discussion per white paper
      proposal for resolution of issue by OAI-exec
      discussion of proposal & closure of issue
      conference call to resolve open issues

pre-alpha phase [02/02]
  - creation of revised protocol document
      in-person meeting Lagoze - Van de Sompel - Nelson - Warner
      autonomous decisions
      internal vetting of protocol document

alpha phase [02/02 - 05/02]
  - alpha-1 release to OAI-tech March 1st 2002
  - OAI-tech extended with alpha testers
  - discussions/implementations by OAI-tech
  - ongoing revision of protocol document

OAI-PMH 2.0 alpha testers (1/2)
  - The British Library
  - Cornell U. -- NSDL project & e-print arXiv
  - Ex Libris
  - FS Consulting Inc -- harvester for my.OAI
  - Humboldt-Universität zu Berlin
  - InQuirion Pty Ltd, RMIT University
  - Library of Congress
  - NASA
  - OCLC

OAI-PMH 2.0 alpha testers (2/2)
  - Old Dominion U. -- ARC, DP9
  - U. of Illinois at Urbana-Champaign
  - U. of Southampton -- OAIA (now Celestial), CiteBase, eprints.org
  - UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection
  - UKOLN, U. of Bath -- RDN
  - Virginia Tech -- repository explorer

beta phase [05/02 - 06/02]
  - beta release on May 1st 2002 to:
      registered data providers and service providers
      interested parties
  - fine tuning of protocol document
  - preparation for the release of 2.0 conformant tools by alpha testers

OAI-PMH v.2.0 highlights: quick recap
  - important improvements in 2.0
  - corrections
  - new functionality

important improvements

protocol vs periphery
  - clear distinction between protocol and periphery
  - fixed protocol document
  - extensible implementation guidelines:
      e.g. sample metadata formats, description containers, about containers
      allows for OAI guidelines and community guidelines

OAI-PMH vs HTTP
  - clear separation of OAI-PMH and HTTP
  - OAI-PMH error handling
      all OK at HTTP level? => 200 OK
      something wrong at OAI-PMH level? => OAI-PMH error (e.g. badVerb)
  - http codes 302, 503, etc. still available to implementers, but no longer represent OAI-PMH events

other improvements
  - better definitions of harvester, repository, item, unique identifier, record, set, selective harvesting
  - oai_dc schema builds on DCMI XML Schema for unqualified Dublin Core
  - usage of must, must not etc. as in RFC2119
  - wording on response compression
  - all protocol responses can be validated with a single XML Schema
      easier for data providers
      no redundancy in type definitions
      SOAP-ready
      clean for error handling

response no errors
[XML example; the markup was lost in extraction. Surviving values, in order: 2002-02-08T08:55:46Z; http://arXiv.org/oai2; oai:arXiv:cs/0112017; 2001-12-14; cs; math]
  - note no http encoding of the OAI-PMH request

response with error
[XML example; the markup was lost in extraction. Surviving values, in order: 2002-02-08T08:55:46Z; http://arXiv.org/oai2; ShowMe is not a valid OAI-PMH verb]
  - with errors, only the correct attributes are echoed in the request element

corrections

dates/times
  - all dates/times are UTC, encoded in ISO8601, Z-notation
      1957-03-20T20:30:00Z

resumptionToken
  - idempotency of resumptionToken: return same incomplete list when rT is reissued
      while no changes occur in the repo: strict
      while changes occur in the repo: all items with unchanged datestamp
  - new, optional attributes for the resumptionToken:
      expirationDate
      completeListSize
      cursor

noRecordsMatch
  - 1.x: if no records match, an empty list was returned
  - 2.0: if no records match, the error condition noRecordsMatch is returned -- not an empty list

new functionality
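The 2.0 rules above (HTTP 200 even when something is wrong at the OAI-PMH level, and noRecordsMatch as an error rather than an empty list) can be sketched from the harvester's side. The sample responses below are hand-written assumptions, reduced to the response namespace and the `error` element with its `code` attribute; `OAIError` and `identifiers_or_empty` are hypothetical names, and only the first `error` element is inspected.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 response namespace.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class OAIError(Exception):
    """An error reported at the OAI-PMH level (HTTP itself said 200 OK)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def identifiers_or_empty(xml_text):
    """Return the identifiers in a list response, or [] on noRecordsMatch."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:
        code = error.get("code")
        if code == "noRecordsMatch":
            return []  # map the 2.0 error back to an empty result for callers
        raise OAIError(code, (error.text or "").strip())
    return [e.text for e in root.iterfind(".//oai:header/oai:identifier", NS)]


# Hand-written sample responses (assumed shapes, not captured from a server).
NO_MATCH = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<error code="noRecordsMatch">no records match</error>'
    '</OAI-PMH>'
)
BAD_VERB = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>'
    '</OAI-PMH>'
)

if __name__ == "__main__":
    print(identifiers_or_empty(NO_MATCH))
    try:
        identifiers_or_empty(BAD_VERB)
    except OAIError as exc:
        print("repository reported:", exc)
```

Treating noRecordsMatch as an empty result is a design choice for incremental harvests, where "nothing changed since the last harvest" is the normal case rather than a failure.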

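harvesting granularity
  - mandatory support of YYYY-MM-DD
  - optional support of YYYY-MM-DDThh:mm:ssZ
  - other granularities considered, but ultimately rejected
  - granularity of from and until must be the same

Identify
  - Identify more expressive
[XML example; the markup was lost in extraction. Surviving values, in the order of the Identify response elements: repositoryName Library of Congress 1; baseURL http://memory.loc.gov/cgi-bin/oai; protocolVersion 2.0; adminEmail [email protected]; adminEmail [email protected]; earliestDatestamp 1990-02-01T00:00:00Z; deletedRecord transient; granularity YYYY-MM-DDThh:mm:ssZ; compression deflate]

header
  - header contains set membership of item
[XML example; the markup was lost in extraction. Surviving values, in order: oai:arXiv:cs/0112017; 2001-12-14; cs; math]
  - eliminates the need for the "double harvest" 1.x required to get all records and all set information

ListIdentifiers
  - ListIdentifiers returns headers
[XML example; the markup was lost in extraction. Surviving values, in order: 2002-02-08T08:55:46Z; http://arXiv.org/oai2; first header: oai:arXiv:hep-th/9801001, 1999-02-23, physic:hep; second header: oai:arXiv:hep-th/9801002, 1999-03-20, physic:hep, physic:exp]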
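Only the values of the ListIdentifiers example survive here, so the sketch below rebuilds a plausible response around them and shows how a harvester might read identifier, datestamp and set membership from each header. The XML is a hand reconstruction, not the original slide content: the element names follow OAI-PMH 2.0, while the `request` attributes and the `parse_headers` helper name are assumptions.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 response namespace.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-reconstructed response using the values from the slide.
SAMPLE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def parse_headers(xml_text):
    """Return one dict per header: identifier, datestamp, list of setSpecs."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:header", NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        })
    return headers


if __name__ == "__main__":
    for header in parse_headers(SAMPLE):
        print(header)
```

Because set membership travels in the header, one pass over the headers yields both the identifiers and their sets.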

ListIdentifiers
  - ListIdentifiers mandates metadataPrefix as argument
      http://www.perseus.tufts.edu/cgi-bin/pdataprov?verb=ListIdentifiers&metadataPrefix=olac&from=2001-01-01&until=2001-01-01&set=Perseus:collection:PersInfo

ListIdentifiers
  - the changes to ListIdentifiers are subtle, and reflect a change in the OAI-PMH data model
  - Could have been named ListHeaders or reduced to an option for ListRecords
  - ListIdentifiers kept for lexicographical consistency

metadataPrefix
  - character set for metadataPrefix and setSpec extended to URL-safe characters
      A-Z a-z 0-9 _ ! $ ( ) + - . *

in the periphery

provenance
  - introduction of provenance container to facilitate tracing of harvesting history
[XML example; the markup was lost in extraction. Surviving values, in order: http://an.oa.org; oai:r1:plog/9801001; 2001-08-13T13:00:02Z; oai_dc; 2001-08-15T12:01:30Z]

friends
  - introduction of friends container to facilitate dynamic discovery of repositories
[XML example; the markup was lost in extraction. Surviving values: http://cav2001.library.caltech.edu/perl/oai; http://formations2.ulst.ac.uk/perl/oai; http://cogprints.soton.ac.uk/perl/oai; http://wave.ldc.upenn.edu/OLAC/dp/aps.php4]

branding
  - introduction of branding container for DPs to suggest rendering & association hints
[XML example; the markup was lost in extraction. Surviving values, in order: http://my.site/icon.png; http://my.site/homepage.html; MySite(tm); 88; 31; http://some.where/DCrender.xsl; http://another.place/MARCrender.css]

oai-identifier
  - revision of oai-identifier
      oai : oai-stuff.foo.org : 5324 gives oai:oai-stuff.foo.org:5324
  - domain based repository names
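The three-part shape of the identifier on the oai-identifier slide (the oai scheme, a repository name, a local identifier) can be illustrated with a small parser. This is a sketch only: `parse_oai_identifier` is a hypothetical helper, and it checks nothing beyond the colon-separated layout shown in the example.

```python
def parse_oai_identifier(value):
    """Split 'oai:<repository>:<local id>' into (repository, local id)."""
    scheme, first_colon, rest = value.partition(":")
    repository, second_colon, local_id = rest.partition(":")
    if scheme != "oai" or not first_colon or not second_colon:
        raise ValueError(f"not an oai-identifier: {value!r}")
    if not repository or not local_id:
        raise ValueError(f"not an oai-identifier: {value!r}")
    # Any further colons stay inside the local identifier.
    return repository, local_id


if __name__ == "__main__":
    print(parse_oai_identifier("oai:oai-stuff.foo.org:5324"))
    print(parse_oai_identifier("oai:arXiv:cs/0112017"))
```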

oai_dc
  - OAI 1.x: oai_dc Schema defined by OAI
  - OAI 2.0: oai_dc Schema imports from DCMI Schema for unqualified DC elements

MARC21
  - OAI 1.x: oai_marc
  - OAI 2.0: LoC marcxml, oai_marc
      http://www.loc.gov/standards/marcxml/

did not make it into OAI-PMH v.2.0
  - SOAP implementation
  - Result set filtering
  - Multiple / best metadata
  - GetRecord -> GetRecords
  - Machine readable rights management
  - XML format for mini-archives

Example Data and Service Providers

NTRS OAI Architecture
[Figure: a user query ("search for cfd applications") runs against NTRS, which holds a local copy of metadata harvested offline, through the OAI interface, from LTRS, ATRS, GTRS, ..., CASITRS. All searching, browsing etc. is performed on the metadata at NTRS; content (reports) remain archived at the local sites; each node is independently maintained, and individual nodes can still support direct user interaction.]

NASA Technical Report Server
  - replacement for the previous distributed searching version of NTRS
      MySQL
      Va Tech harvester
      modified bucket
      details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (March 2003)
  - a service provider & aggregator
      same OAI baseURL as used for interactive searching

  - http://ntrs.nasa.gov/

NASA Technical Report Server
  - advanced, fielded search
  - explicit query routing
  - 12 NASA repositories
  - 4 non-NASA repositories
      turned off by default
  - >600k abstracts; >300k full-text

NASA DLs in the Larger STI Realm
[Figure: NTRS (harvesting LTRS, ATRS, ..., CASITRS) alongside the DLs of Publishers, Universities, DOD, DOE, International, ...; this could be a fully connected graph]
  - NTRS could also be a data provider from the point of view of other DLs; allowing the harvesting of NASA report metadata.
  - NTRS could also harvest metadata from other DLs, and provide access to non-NASA content.
  - We hope to influence the direction of the science.gov effort to use OAI-PMH

New Kinds of DLs
  - Drawing from the same pool of DPs: different interfaces, capabilities and collection policies for:
      public affairs
      K-12 education
      science & research
      authors / librarians / managers
  - NTRS and NIX could
      harvest from the same sources
      be the same DL, but with different interfaces?
      be replaced with a new, all-encompassing DL?
  - DL creators can now focus on collection management
      "a la carte-ing" their collections and sub collections
      instead of fussing over syntax synchronization of remote search services

Scientific Communication
  - With only some exceptions, which interface is used for discovery is not as important as the fact that discovery occurred in the first place
  - control of the discovered objects is not lost by data providers
      however, higher level mirroring services can be built on top of OAI (cf. NACA & ARC mirroring between NASA LaRC and MAGiC)

NACA Technical Report Server
  - publicly available
      began in 1996
      details in NASA TM-1999-209127
  - scanned reports from 1917-1958
      NACA = predecessor to NASA
  - contents mirrored with the MAGiC project
      a UK-based grey-literature preservation project
      OAI-PMH used to mirror contents
  - http://naca.larc.nasa.gov/
  - http://naca.larc.nasa.gov/oai2.0/

NACA Report 1345 as seen through its native DL
  http://naca.larc.nasa.gov/

NACA Report 1345 as seen through MAGiC

  http://www.magic.ac.uk/

NACA Report 1345 as seen through Scirus (Elsevier)
  http://www.scirus.com/

NACA Report 1345 as seen through my.OAI (FS Consulting)
  http://www.myoai.com/

What Does OAI-PMH Mean for Authors?
  - On the surface, absolutely nothing!
      the ideal OAI deployment should be absolutely invisible to normal DL operations
      uninterested users should not even notice or care
  - Indirectly, they should enjoy the benefits of the critical mass of current and developing DL tools & systems
      personal, institutional data providers
      proliferation of targeted, value-added service providers

What Does OAI-PMH Mean For Publishers & Institutions?
  - Absolutely everything
  - The decoupling of SPs and DPs will have significant and profound implications on scientific and technical information exchange
      OAI-PMH is actually just one component in a larger engineering effort for scholarly communication (e.g. OpenURL)
      Service and resource integration will be the focus of journals, professional societies, universities, etc.
  - OAI-PMH will be as basic and core a technology for scientific publishing as http & XML

Field of Dreams
  - It should be easy to be a data provider, even if it makes more work for the service provider.
      if enough data providers exist, the service providers will come (DPs >> SPs)
  - Open-source / freely available tools
      drop-in data providers:
        industrial strength: http://www.eprints.org/
        personal size: http://kepler.cs.odu.edu/

      tools to make your existing DL a data provider: http://www.openarchives.org/tools/tools.htm
        also: OAI-implementers mailing list / mail archive!
      service providers:
        Arc: http://sourceforge.net/projects/oaiarc/

OAI-PMH Meeting History

  Topic                                    OAI Open Day, Washington DC 1/2001   2nd OAI Workshop, CERN 10/2002
  Protocol definition, development tools   4                                    1
  DPs, retrofitting existing DLs           5                                    4
  SPs, new services                        1                                    11
  Socio-Economic-Political Issues          0                                    6

Shift of Topics
  - From: the protocol itself, supporting & debugging tools and how to retrofit (existing) DLs
  - To: building (new) services that use the OAI-PMH as a core technology and reporting on their impact to the institution/community

Arc
  - http://arc.cs.odu.edu/
  - harvests all known archives
  - first end-user service provider
  - source available through SourceForge
  - hierarchical harvesting

NCSTRL
  - http://www.ncstrl.org/
  - metadata harvesting replacement for Dienst-based NCSTRL
  - based on Arc
  - computer science metadata

Archon
  - http://archon.cs.odu.edu/
  - physics metadata
  - based on Arc
  - features: citation indexing, equation-based searching

Torii
  - http://torii.sissa.it/
  - physics metadata
  - features: personalization, recommendations, WAP access

iCite
  - http://icite.sissa.it/
  - physics metadata
  - features: citation based access to arXiv metadata

my.OAI
  - http://www.myoai.com/
  - covers all registered metadata
  - features: result sets, personalization, many other advanced features

Cyclades
  - http://www.ercim.org/cyclades
  - scientific metadata
  - features: personalization, recommendations, collaboration
  - status?

citebase
  - http://citebase.eprints.org/
  - arXiv metadata
  - citation based indexing, reporting

OAIster
  - http://oaister.umdl.umich.edu/
  - harvests all known archives

Public Knowledge Project
  - http://www.pkp.ubc.ca/harvester/
  - domain-specific filtering of harvested metadata (?)

Perseus
  - http://www.perseus.tufts.edu/
  - they claim to harvest all DPs, but only humanities related DPs appear in the pull down menu

Others
  - Commercial publishers: American Physical Society (APS), Institute of Physics (IOP), Elsevier / Scirus (www.scirus.com)
  - Department of Energy: OSTI, LANL
  - Institutional servers: DSpace (MIT; www.dspace.org), Eprints (www.eprints.org), DARE (All Dutch universities)

Service Providers
  - It is clear that SPs are proliferating, despite (because of?) the inherent bias toward DPs in the protocol
      easy to be a DP -> many DPs -> SPs eventually emerge
      hard to be a DP -> SPs starve
      currently 5x more DPs than SPs
  - SPs are beginning to offer increasingly sophisticated services
      competitive market originally envisioned for SPs is emerging

OAI-PMH Observation: Front-End Only
  - No input/registry mechanism
      OAI-PMH is always a front-end for something else: filesystem, Dienst, RDBMS, LDAP, etc.
      convenient for pre-existing DLs, but does not address new DLs
        e.g., "we want to do OAI"
  - Bounds the scope of OAI
      tension between functionality and simplicity

OAI-PMH Observation: No T&C
  - No terms & conditions provisions
      assumes all metadata has uniform access rights
      how to restrict metadata to certain hosts? (see upcoming OAI-rights discussion)
  - introducing T&C would increase the scope of application, but at the expense of simplicity
      how expensive do we want to make a "just-a-front-end" protocol?

OAI-PMH Observation: No T&C
  - Possible to use multiple repositories in a DMZ-like configuration
[Figure: OAI requests from trusted hosts and OAI requests from arbitrary hosts are answered by a Private OAI Server and a Public OAI Server respectively, both in front of the source database; could even use a separate copy of the database]

OAI-PMH Observation: No T&C
  - Possible to use OAI-PMH in closed, restricted systems
[Figure: four DLs labelled OAI 1, OAI 2, OAI 3, OAI 4; all OAI requests originate from these 4 DLs]
  - see Technical Report Interchange Project: http://www.cs.odu.edu/~mln/pubs/tri.pdf

OAI-PMH Observation: Monolithic
  - A repository has no protocol-defined concept of other OAI repositories
      the friends container was added in 2.0
      backups, mirrors, etc. have to be resolved outside of the scope of OAI
      scope vs. complexity again
  - fully connected graph of DLs harvesting from each other is unnecessary
      cf. web crawlers vs. gatherers in U of Colorado's Harvest System
      3rd party harvesting interfaces raise more T&C and data coherency issues

OAI-PMH Observation: Data Coherency
  - In the interest of implementer simplicity, several issues are left for the service provider to interpret
      what is an update vs. addition?
        in the NACA repository, they are reported as the same and it's up to the harvesting system to figure it out
      deletions?
        it is currently optional for repositories to mark records as deleted or not
        still left to the harvester to interpret
  - Liu, et al., JCDL 2003, Repository Synchronization in the OAI Framework
      http://www.cs.odu.edu/~mln/pubs/freshness-jcdl.pdf

OAI-PMH Observation: Harvest Model
  - Frequency of harvests
      all-at-once harvests?
        initial harvest
        resolving data coherency
      frequent incremental harvests?
        far more efficient for both service and data providers
  - Webcrawling vs. digital library models
      webcrawlers: little to no a priori information about target
      DLs: frequent harvesting of a small number of known targets

DC?! Metadata
  - Q: Which format should I use?
  - A: any/all of them
      lowest common denominator: unqualified Dublin Core
  - Again, little known about actual behavior
      will DC actually be useful? or too lossy?
      will communities create/adopt specific formats?
      will native (presumably richer) formats be harvested?
        The Return of MARC?!
      we very much want this to happen...

XML Observations
  - Service providers
      XML can be pretty picky
      a large ListRecords result can be invalidated with a single error
        harvest in chunks? individual records?
      author contributed metadata particularly a problem (e.g. control characters from copy-n-paste)
      one advantage of resumptionToken is that it compartmentalizes bad data

Why The OAI-PMH is NOT Important
  - Users don't care
  - OAI-PMH is middleware
      if done right, the uninterested user should never have to know
  - Using OAI-PMH does not ensure a good SP
  - OAI-PMH is (or is becoming) HTTP for DLs
      few people get excited about http now
      http & OAI-PMH are core technologies whose presence is now assumed
