We use a few different metadata standards in DSpace in various places:
However, they're used differently in different places. E.g. MODS is in the METS exporter code, Simple DC is in the OAI-PMH code. Why can't the OAI-PMH code serve up MODS? Then we could have e.g. a single batch import/export tool to concentrate our development efforts on instead of having multiple ones.
This proposal is only concerned with importing and exporting metadata; see PackagerPlugins for a parallel module to handle packages (SIPs and DIPs). The packager plugins call these crosswalk plugins to handle the metadata when ingesting or disseminating.
Ingestion in general is far more complicated and awkward than dissemination, so they are considered separately, and naturally the easiest is considered first :wink:
After conversations with Richard R & Larry, various renaming has been done, for consistency and predictable behaviour.
*SubmissionCrosswalk -> *IngestionCrosswalk
MetsDissemination -> METSDisseminationCrosswalk
ModsCrosswalk -> MODSDisseminationCrosswalk
NullSubmissionCrosswalk -> NullIngestionCrosswalk
PremisCrosswalk -> PREMISCrosswalk
SimpleDCCrosswalk -> SimpleDCDisseminationCrosswalk
Xslt*Crosswalk -> XSLT*Crosswalk
The Crosswalk Plugin interface described here only addresses XML-based metadata formats. Since OAI-PMH can only export XML, and metadata containers like METS and IMS-CP have a preference for XML metadata, this is not seen as an important limitation at this time. If there is a need, anyone can add a new plugin interface to handle binary or text-based metadata (e.g. old-style MARC).
This file contains the interfaces and some sample crosswalk implementations. It is still highly experimental and subject to sudden changes, so do not rely on the stability of this code: crosswalk2.zip
Also see XsltCrosswalk for another sample use of this plugin.
Whenever a DSpace object has to translate its metadata into some external metadata format, and whenever an external metadata record is applied to a DSpace object, call on a Crosswalk Plugin. All crosswalk activity should live in the plugins, so every crosswalk developed for one purpose can be shared by all the consumers of crosswalks.
Consumers are typically:
This implementation includes a module for the OAI-PMH server, <i>oaicat</i>, which lets it use any <i>DisseminationCrosswalk</i> plugin. The single class <i>org.dspace.app.oai.PluginCrosswalk</i> implements any metadata prefix that matches the name of a dynamic plugin crosswalk.
Just add lines like these to <i>oaicat.properties</i>:
Crosswalks.MODS=org.dspace.app.oai.PluginCrosswalk
Crosswalks.OCW-LOM=org.dspace.app.oai.PluginCrosswalk
For dissemination, the metadata crosswalk turns an item's internal DC values into a serialized representation, such as XML.
Note that metadata disseminations can be nested. For example, the METS format is actually a framework that includes (or refers to) objects in other standard formats. One disseminator could produce METS with MODS descriptive metadata, while another produces METS with DC.
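The nesting described above can be sketched as a framework disseminator that delegates to an inner one. This is only an illustration under stated assumptions: `SimpleDisseminator`, `DcDisseminator`, `ModsDisseminator` and `MetsDisseminator` are hypothetical stand-ins that return strings rather than JDOM elements, so the example compiles without DSpace or JDOM, and the XML they emit is a simplified shape, not valid METS or MODS.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a disseminator: takes DC values, returns XML text.
// No XML escaping is done here; a real crosswalk would build proper elements.
interface SimpleDisseminator {
    String disseminate(Map<String, String> dcValues);
}

// Inner crosswalk: each DC field becomes one flat element.
class DcDisseminator implements SimpleDisseminator {
    public String disseminate(Map<String, String> dcValues) {
        StringBuilder sb = new StringBuilder("<dc>");
        for (Map.Entry<String, String> e : dcValues.entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(e.getValue())
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</dc>").toString();
    }
}

// Inner crosswalk: a MODS-like record carrying only the title.
class ModsDisseminator implements SimpleDisseminator {
    public String disseminate(Map<String, String> dcValues) {
        return "<mods><titleInfo><title>" + dcValues.get("title")
             + "</title></titleInfo></mods>";
    }
}

// Framework disseminator: wraps whatever the inner crosswalk produces.
class MetsDisseminator implements SimpleDisseminator {
    private final String mdType;
    private final SimpleDisseminator inner;

    MetsDisseminator(String mdType, SimpleDisseminator inner) {
        this.mdType = mdType;
        this.inner = inner;
    }

    public String disseminate(Map<String, String> dcValues) {
        return "<mets><dmdSec><mdWrap MDTYPE=\"" + mdType + "\"><xmlData>"
             + inner.disseminate(dcValues)
             + "</xmlData></mdWrap></dmdSec></mets>";
    }
}

class NestingDemo {
    public static void main(String[] args) {
        Map<String, String> dc = new LinkedHashMap<>();
        dc.put("title", "Crosswalk notes");
        // Same framework, two different descriptive metadata formats inside it.
        System.out.println(new MetsDisseminator("MODS", new ModsDisseminator()).disseminate(dc));
        System.out.println(new MetsDisseminator("DC", new DcDisseminator()).disseminate(dc));
    }
}
```

The outer class never needs to know which inner format it carries, which is the property that would let one METS disseminator be configured with different descriptive crosswalks.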
Here is the disseminator interface, as implemented experimentally:
public interface DisseminationCrosswalk
{
    // returns array of namespaces, which may be empty.
    public org.jdom.Namespace[] getNamespaces();

    // returns SchemaLocation string, including URI namespace,
    // followed by whitespace and URI of XML schema document, or
    // empty string if unknown.
    public String getSchemaLocation();

    // predicate, true if the given object can be crosswalked.
    public boolean canDisseminate(DSpaceObject dso);

    // returns results of crosswalk as list of XML elements.
    public java.util.List disseminateList(DSpaceObject dso)
        throws CrosswalkException,
               IOException, SQLException, AuthorizeException;

    // returns results of crosswalk as one XML element, root of document.
    public org.jdom.Element disseminateElement(DSpaceObject dso)
        throws CrosswalkException,
               IOException, SQLException, AuthorizeException;
}
Note: The Dissemination methods do not have a Context object parameter, since it is not required to get an object's DC values, and some callers (notably the OAI-PMH server) don't have a context available.
Since the disseminator is a dynamic plugin, use the PluginManager to get one:
DisseminationCrosswalk crosswalk = (DisseminationCrosswalk)
    PluginManager.getNamedPlugin(DisseminationCrosswalk.class, "MODS");
List mods = crosswalk.disseminateList(item);
....
Crosswalk plugins are "configured" by being listed as dynamic plugins in the PluginManager configuration properties; it does the rest. A single class may implement both <i>DisseminationCrosswalk</i> and <i>SubmissionCrosswalk</i>.
The main issue is whether an `Item` object is the most appropriate thing to pass in. Perhaps just a Handle? Or a database ID? We need to enable efficient implementations of this, but at the same time the `MetadataDisseminator` implementation shouldn't have to do all the work.
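To make the idea of name-based configuration concrete, here is a minimal sketch of a registry that maps a plugin name to a class through properties. It is not the DSpace PluginManager: `TinyPluginRegistry`, `NamedCrosswalk`, `ModsStub` and the `Interface.Name` key format are all hypothetical, chosen only so the example runs on its own.

```java
import java.util.Properties;

// Hypothetical plugin interface, standing in for a crosswalk interface.
interface NamedCrosswalk {
    String formatName();
}

// Hypothetical plugin implementation with a no-argument constructor.
class ModsStub implements NamedCrosswalk {
    public String formatName() {
        return "MODS";
    }
}

// Looks up "<InterfaceSimpleName>.<pluginName>" in the properties and
// instantiates the class named there by reflection.
class TinyPluginRegistry {
    private final Properties config;

    TinyPluginRegistry(Properties config) {
        this.config = config;
    }

    // Returns a fresh instance, or null when no plugin is configured under that name.
    <T> T getNamedPlugin(Class<T> iface, String name) {
        String className = config.getProperty(iface.getSimpleName() + "." + name);
        if (className == null) {
            return null;
        }
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return iface.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load plugin " + className, e);
        }
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("NamedCrosswalk.MODS", ModsStub.class.getName());
        TinyPluginRegistry registry = new TinyPluginRegistry(config);
        NamedCrosswalk crosswalk = registry.getNamedPlugin(NamedCrosswalk.class, "MODS");
        System.out.println(crosswalk.formatName());
    }
}
```

Because the caller only names the interface and the format, adding a new crosswalk in this sketch means adding one property line and one class, with no change to the calling code.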
/!\ Changed to pass a <i>DSpaceObject</i> to disseminator, since collections and communities have metadata too; however, each class doesn't need to implement it.
The submission crosswalk only needs to handle Items since only items have DC metadata, and we only "import" Items.
Passing in options. Disseminators might have common or specific parameters. e.g.:
Trickier. This will need to support 'ingest new stuff' as well as 'update stuff that's already there'; and also to support ingesting stuff that already has a persistent ID (such as a Handle) rather than having a new one created by the system.
The contract of a metadata ingester is to interpret the XML structure it is given as metadata values, and set the appropriate values in the DSpace Object (e.g. Item) metadata. See MetadataSupport for a proposal to attach metadata fields from schemas other than DC to Items.
Here is a possible interface:
public interface SubmissionCrosswalk
{
    // crosswalk from root element of a document
    public void ingest(Context context, DSpaceObject dso, org.jdom.Element root)
        throws CrosswalkException, IOException, SQLException, AuthorizeException;

    // crosswalk from list of metadata fields
    public void ingest(Context context, DSpaceObject dso, java.util.List elements)
        throws CrosswalkException, IOException, SQLException, AuthorizeException;
}
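The ingester contract (interpret an XML structure as metadata values and set them on the target object) might look like the following in its simplest form. This sketch makes several assumptions: it uses the JDK's DOM parser rather than JDOM, it stores values in a plain map instead of a DSpace Item, and `SimpleIngester` is a hypothetical name. Each child element of the root is treated as one metadata value keyed by its element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SimpleIngester {
    // Parses the XML text and collects the text of every child element of the
    // root, grouped by element name, preserving document order.
    static Map<String, List<String>> ingest(String xml) {
        org.w3c.dom.Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            // A real crosswalk would report this through its own exception type.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }

        Map<String, List<String>> metadata = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments between elements
            }
            metadata.computeIfAbsent(child.getNodeName(), k -> new ArrayList<>())
                    .add(child.getTextContent().trim());
        }
        return metadata;
    }

    public static void main(String[] args) {
        String xml = "<dc><title>A</title><creator>X</creator><creator>Y</creator></dc>";
        System.out.println(ingest(xml)); // prints {title=[A], creator=[X, Y]}
    }
}
```

Repeated elements accumulate rather than overwrite, which matters for multi-valued fields such as creators.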
Should the behavior be different when the Item already has valid metadata? Is there a need for a filter, limiting what fields can be set?
The ingester has the same nesting configuration issue as the disseminator; a framework format such as METS may call on several other ingesters to interpret the metadata embedded in (or linked from) its stream.
The Submission and Dissemination crosswalk plugins share a family of exceptions, under the superclass `CrosswalkException`. They are:
Something went wrong inside the crosswalk, not necessarily caused by the input or state (although it could be an incorrectly handled pathological case). This is most likely a configuration problem. It deserves its own exception because many crosswalks are configuration-driven (e.g. the XSLT crosswalks) so configuration errors are likely to be common enough that they ought to be easy to identify and debug.
This indicates a problem with the input metadata (for submission) or item state (dissemination). It is invalid or incomplete, or simply unsuitable to be crosswalked.
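A caller can use such a family to tell a configuration fault from bad input. The sketch below assumes two subclass names, `CrosswalkInternalException` and `MetadataValidationException`, purely for illustration, since only the superclass `CrosswalkException` is named in the text here. The `crosswalkTitle` method and its stylesheet argument are likewise hypothetical.

```java
// Superclass named in the text.
class CrosswalkException extends Exception {
    CrosswalkException(String message) {
        super(message);
    }
}

// Assumed name: a fault inside the crosswalk, most likely configuration.
class CrosswalkInternalException extends CrosswalkException {
    CrosswalkInternalException(String message) {
        super(message);
    }
}

// Assumed name: the input metadata or item state is invalid or incomplete.
class MetadataValidationException extends CrosswalkException {
    MetadataValidationException(String message) {
        super(message);
    }
}

class ExceptionDemo {
    // Hypothetical crosswalk step: needs a configured stylesheet and a non-empty title.
    static String crosswalkTitle(String stylesheetPath, String title) throws CrosswalkException {
        if (stylesheetPath == null) {
            throw new CrosswalkInternalException("no stylesheet configured");
        }
        if (title == null || title.isEmpty()) {
            throw new MetadataValidationException("record has no title");
        }
        return "<title>" + title + "</title>";
    }

    // Maps each failure kind to a message aimed at the person who can fix it.
    static String describe(String stylesheetPath, String title) {
        try {
            return "ok: " + crosswalkTitle(stylesheetPath, title);
        } catch (MetadataValidationException e) {
            return "bad input: " + e.getMessage();
        } catch (CrosswalkInternalException e) {
            return "check configuration: " + e.getMessage();
        } catch (CrosswalkException e) {
            return "crosswalk failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(null, "Notes"));
        System.out.println(describe("mods.xsl", ""));
        System.out.println(describe("mods.xsl", "Notes"));
    }
}
```

Catching the subclasses before the superclass lets a batch importer skip a bad record and continue, while treating a configuration fault as a reason to stop.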
I think the interface should use Java generics to allow compile-time data type checking. I think everywhere where we currently use the Java collection API we should use generics. Thus, instead of <i>java.util.List</i>, use <i>java.util.List<org.jdom.Element></i>. This will require that DSpace be compiled with a Java 1.5 compiler (however, an older JVM, such as 1.4, will still be able to execute the bytecode). – ScottPhillips
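The difference the comment points at can be shown in a few lines. In this sketch `XmlElement` is a hypothetical stand-in for `org.jdom.Element`, so the example needs nothing beyond the JDK. With a raw list, a wrong entry is only discovered by a `ClassCastException` at run time; with a typed list, the same mistake does not compile.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.jdom.Element.
class XmlElement {
    private final String name;

    XmlElement(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class GenericsDemo {
    // Raw list: the caller must cast, and nothing stops a non-element entry.
    @SuppressWarnings("rawtypes")
    static String firstNameRaw(List elements) {
        return ((XmlElement) elements.get(0)).getName();
    }

    // Typed list: no cast needed, and only XmlElement entries can be added.
    static String firstNameTyped(List<XmlElement> elements) {
        return elements.get(0).getName();
    }

    // Shows that the raw version fails only when the bad entry is read.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRuntime() {
        List raw = new ArrayList();
        raw.add("not an element"); // compiles, because the list is raw
        try {
            firstNameRaw(raw);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<XmlElement> typed = new ArrayList<>();
        typed.add(new XmlElement("title"));
        // typed.add("not an element"); // would be rejected by the compiler
        System.out.println(firstNameTyped(typed));
        System.out.println(rawListFailsAtRuntime());
    }
}
```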