...

Ingestion is, in general, far more complicated and awkward than dissemination, so the two are considered separately, and naturally the easier one is considered first.

Renaming note

After conversations with Richard R & Larry, several classes have been renamed for consistency and predictable behaviour.


*SubmissionCrosswalk -> *IngestionCrosswalk
MetsDissemination -> METSDisseminationCrosswalk
ModsCrosswalk -> MODSDisseminationCrosswalk
NullSubmissionCrosswalk -> NullIngestionCrosswalk
PremisCrosswalk -> PREMISCrosswalk
SimpleDCCrosswalk -> SimpleDCDisseminationCrosswalk
Xslt*Crosswalk -> XSLT*Crosswalk

XML Formats Only

The Crosswalk Plugin interface described here only addresses XML-based metadata formats. Since OAI-PMH can only export XML, and metadata containers like METS and IMS-CP have a preference for XML metadata, this is not seen as an important limitation at this time. If there is a need, anyone can add a new plugin interface to handle binary or text-based metadata (e.g. old-style MARC).

Sample Implementation

This file contains the interfaces and some sample crosswalk implementations. It is still highly experimental and subject to sudden changes so do not rely on the stability of this code: crosswalk2.zip

...

  • The OAI-PMH metadata provider server.
  • Network-based package ingest and dissemination, e.g. through LightweightNetworkInterface
  • Batch (command-line) ingest of packages through PackagerPlugins
  • Classic <i>ItemImporter</i> batch importer.

OAI-PMH plugin-driven Crosswalk

This implementation includes a module for the OAI-PMH server, <i>oaicat</i>, which lets it use any <i>DisseminationCrosswalk</i> plugin. The single class <i>org.dspace.app.oai.PluginCrosswalk</i> implements any metadata prefix that matches the name of a dynamic plugin crosswalk.
Just add lines like these to <i>oaicat.properties</i>:

...


List mods = crosswalk.disseminateAsXml(item);
....

Configuration

Crosswalk plugins are "configured" by listing them as dynamic plugins in the PluginManager configuration properties; the PluginManager does the rest. A single class may implement both <i>DisseminationCrosswalk</i> and <i>IngestionCrosswalk</i> (formerly <i>SubmissionCrosswalk</i>).
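
As a rough illustration of one class serving both roles, here is a minimal sketch. The two interfaces are simplified stand-ins, not the real plugin interfaces (which operate on DSpace objects and XML elements), and every name in it (`SimpleDisseminationCrosswalk`, `SimpleIngestionCrosswalk`, `KeyValueCrosswalk`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the two plugin interfaces (assumption: the real
// ones take DSpace objects and XML elements, not maps and strings).
interface SimpleDisseminationCrosswalk {
    List<String> disseminate(Map<String, String> itemMetadata);
}

interface SimpleIngestionCrosswalk {
    void ingest(Map<String, String> itemMetadata, List<String> elements);
}

// Hypothetical crosswalk that handles both directions for "name=value" pairs.
class KeyValueCrosswalk implements SimpleDisseminationCrosswalk, SimpleIngestionCrosswalk {
    @Override
    public List<String> disseminate(Map<String, String> itemMetadata) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, String> entry : itemMetadata.entrySet()) {
            out.add(entry.getKey() + "=" + entry.getValue());
        }
        return out;
    }

    @Override
    public void ingest(Map<String, String> itemMetadata, List<String> elements) {
        for (String element : elements) {
            int eq = element.indexOf('=');
            if (eq > 0) { // silently skip anything that is not "name=value"
                itemMetadata.put(element.substring(0, eq), element.substring(eq + 1));
            }
        }
    }
}
```

Because the class implements both interfaces, the same class name could be listed under either plugin interface in the configuration.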

Dissemination Issues

The main issue: is an `Item` object the most appropriate thing to pass in? Perhaps just a Handle, or a database ID? We need to enable efficient implementations, but at the same time the `MetadataDisseminator` implementation shouldn't have to do all the work.

...

  • Is "Disseminators" the right word? Maybe just "crosswalk" or `createPackage`
    • I vote "crosswalk" --lcs
  • Filter out certain things? e.g. in OAI-PMH export of METS, you may want to exclude some provenance information. Already in OAI-PMH export, `description.provenance` is filtered out for privacy reasons. (It includes the email address of submitters.)
  • Package disseminators might want to know which metadata format to package. e.g. METS could include Dublin Core, MODS, or anything else...
    • It would be helpful to parameterize a crosswalk "type" in the configuration. This could get hairy since they would each want different sets of parameters (e.g. DC has none, METS has several sub-formats).
    • This could be an issue for a generalized plug-in framework to solve; each instance of a plugin is further specialized by a set of other plugins. The PackagerPlugins could use a mechanism like that to refer to crosswalk plugins, as well. <b>NOTE: I think the best way to do this is a superclass that is parameterized by subclassing it; this is harder to code but gives more flexibility, and since metadata has to meet outside specifications it is probably not the kind of thing you want to be configuring on the fly anyway.</b>
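
The "superclass parameterized by subclassing" idea in the note above might look roughly like the following sketch. All class and method names are hypothetical, and the output is a plain-text stand-in; a real package disseminator would presumably look up a crosswalk plugin by name and emit XML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: the packaging logic lives here, and each
// subclass fixes which embedded metadata format it uses.
abstract class AbstractPackageDisseminator {
    // Subclasses "configure" the disseminator by overriding this method.
    protected abstract String embeddedMetadataFormat();

    // Fields to leave out. The default mirrors the provenance filtering
    // mentioned in the list above; subclasses may override it.
    protected boolean isExcluded(String fieldName) {
        return fieldName.equals("description.provenance");
    }

    // Builds a plain-text stand-in for the package contents.
    public List<String> disseminate(List<String> fieldNames) {
        List<String> out = new ArrayList<String>();
        out.add("format:" + embeddedMetadataFormat());
        for (String name : fieldNames) {
            if (!isExcluded(name)) {
                out.add("field:" + name);
            }
        }
        return out;
    }
}

// Each concrete variant is a tiny subclass rather than a runtime setting.
class ModsPackageDisseminator extends AbstractPackageDisseminator {
    @Override
    protected String embeddedMetadataFormat() { return "MODS"; }
}

class DublinCorePackageDisseminator extends AbstractPackageDisseminator {
    @Override
    protected String embeddedMetadataFormat() { return "DC"; }
}
```

The trade-off is the one the note describes: each variant needs its own class, but the choice of format is fixed at compile time instead of being adjusted on the fly.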

Submission (Ingest)

Trickier. This will need to support 'ingest new stuff' as well as 'update stuff that's already there'; and also to support ingesting stuff that already has a persistent ID (such as a Handle) rather than having a new one created by the system.

...


// crosswalk from list of metadata fields
public void ingest(Context context, DSpaceObject dso, java.util.List elements)
throws CrosswalkException, IOException, SQLException, AuthorizeException;
}

Ingester Issues

Should the behavior be different when the Item already has valid metadata? Is there a need for a filter, limiting what fields can be set?

The ingester has the same nesting configuration issue as the disseminator; a framework format such as METS may call on several other ingesters to interpret the metadata embedded in (or linked from) its stream.
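
One possible shape for that nesting is sketched below: a container ingester hands each embedded section to whichever ingester is registered for the section's metadata type. The types here (`SectionIngester`, `MetadataSection`, `ContainerIngester`) are hypothetical simplifications, not part of the proposed interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified ingester: consumes one embedded metadata payload.
interface SectionIngester {
    void ingest(Map<String, String> target, String payload);
}

// One embedded section of a container document: a type label plus content.
class MetadataSection {
    final String type;
    final String payload;

    MetadataSection(String type, String payload) {
        this.type = type;
        this.payload = payload;
    }
}

// Hypothetical container ingester that delegates each section to the
// ingester registered for that section's metadata type.
class ContainerIngester {
    private final Map<String, SectionIngester> ingestersByType =
            new HashMap<String, SectionIngester>();

    void register(String type, SectionIngester ingester) {
        ingestersByType.put(type, ingester);
    }

    void ingest(Map<String, String> target, List<MetadataSection> sections) {
        for (MetadataSection section : sections) {
            SectionIngester delegate = ingestersByType.get(section.type);
            if (delegate == null) {
                // Assumption: an unknown type is an error rather than being skipped.
                throw new IllegalArgumentException("No ingester for type: " + section.type);
            }
            delegate.ingest(target, section.payload);
        }
    }
}
```

Whether an unrecognized section type should fail the whole ingest or simply be skipped is a policy question the sketch does not settle.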

Exceptions

The Submission and Dissemination crosswalk plugins share a family of exceptions, under the superclass `CrosswalkException`. They are:

CrosswalkInternalException

Something went wrong inside the crosswalk, not necessarily caused by the input or state (although it could be an incorrectly handled pathological case). This is most likely a configuration problem. It deserves its own exception because many crosswalks are configuration-driven (e.g. the XSLT crosswalks) so configuration errors are likely to be common enough that they ought to be easy to identify and debug.

MetadataValidationException

This indicates a problem with the input metadata (for submission) or item state (for dissemination): the metadata or state is invalid or incomplete, or simply unsuitable to be crosswalked.
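
A minimal sketch of how this family might be declared and told apart by a caller follows. The class names come from the text above; the constructors, the choice of checked exceptions, and the `CrosswalkErrorReporter` helper are assumptions made for illustration.

```java
// Common superclass (assumption: a checked exception with a message).
class CrosswalkException extends Exception {
    CrosswalkException(String message) {
        super(message);
    }
}

// Failure inside the crosswalk itself, most likely a configuration problem.
class CrosswalkInternalException extends CrosswalkException {
    CrosswalkInternalException(String message) {
        super(message);
    }
}

// Problem with the input metadata (ingest) or item state (dissemination).
class MetadataValidationException extends CrosswalkException {
    MetadataValidationException(String message) {
        super(message);
    }
}

// Hypothetical helper showing how a caller could route each kind of failure.
class CrosswalkErrorReporter {
    static String describe(CrosswalkException e) {
        if (e instanceof CrosswalkInternalException) {
            return "Check the crosswalk configuration: " + e.getMessage();
        }
        if (e instanceof MetadataValidationException) {
            return "Fix the metadata or item: " + e.getMessage();
        }
        return "Crosswalk failed: " + e.getMessage();
    }
}
```

Keeping the two subclasses distinct lets a caller point an administrator at the configuration in one case and at the submitted content in the other.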

Generics

I think the interface should use Java generics to allow compile-time type checking. I think we should use generics everywhere we currently use the Java collections API. Thus, replace <i>java.util.List</i> with <i>java.util.List<org.jdom.Element></i>. This will require that DSpace be compiled with a Java 1.5 compiler (however, an older JVM, such as 1.4, will still be able to execute the bytecode). – ScottPhillips
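
To make the difference concrete, here is a small sketch contrasting a typed list with a raw one. `XmlElement` is a hypothetical stand-in for <i>org.jdom.Element</i> so that the example has no external dependencies, and `GenericsExample` is likewise an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.jdom.Element (assumption: only the name matters here).
class XmlElement {
    final String name;

    XmlElement(String name) {
        this.name = name;
    }
}

class GenericsExample {
    // Typed list: the compiler rejects anything that is not an XmlElement,
    // and the loop needs no cast.
    static List<String> names(List<XmlElement> elements) {
        List<String> result = new ArrayList<String>();
        for (XmlElement element : elements) {
            result.add(element.name);
        }
        return result;
    }

    // Raw list: the cast is only checked at run time and throws
    // ClassCastException if a caller put something else in the list.
    @SuppressWarnings("rawtypes")
    static List<String> namesRaw(List elements) {
        List<String> result = new ArrayList<String>();
        for (Object o : elements) {
            result.add(((XmlElement) o).name);
        }
        return result;
    }
}
```

With the typed signature, passing a list that contains a `String` is a compile error; with the raw signature the same mistake only surfaces when the offending entry is reached.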

...