Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The observant reader will notice that the IngestService findResource method returns a resource (of a given type) for an item without any additional parameters or information. How is this possible? On what basis is the selection made? CGI assumes that, in general, each resource type bears a similar relationship with each of its items, and thus that it can be expressed in a rule or formula. For reasons that will become apparent, these rules are known as expressions in a small grammar called RCL (Resource Composition Language). In most cases, RCL just amounts to a very simple template. For example, suppose we wanted to express the rule that DSpace currently uses in selecting an input form for configurable submission. We would write the property containing the expression as:

...

For clients of the CGI services, knowledge of the APIs and configuration properties discussed above should be sufficient to obtain the primary benefits of the facility: it makes no difference how they are implemented. But for the curious, the following discussion details some of the key decisions and trade-offs that were could be encountered in creating an acceptable, performant, and sustainable CGI implementation. And even the base service implementation raises numerous questions about user interfaces (esp. to the ResourceMapping service, but possibly also the ContextService) that might invoke them, which the base CGI does not address directly at all.

...

Perhaps the biggest challenge concerns how persistence can be managed. Recall that the IngestContexts, ResourceMaps, etc are all persistent, as well as the Resource objects themselves. The latter already mostly exist, and have persistent serializations in XML files (input-forms.xml and that crowd), so we will split the topic into two parts: context and mapping persistence, and resource persistence.

...

Context and

...

Mapping Persistence

Given that IngestContexts may be read and written to frequently, most file-based serializations (e.g. XML) do not seem attractive. The obvious alternative is the RDBMS, but we recall that CGI is an add-on, so we are reluctant to graft tables or columns to the standard DSpace database_schema.sql. Worse still, we have no Oracle licenses, (or expertise), etc. so cannot ensure that our database logic is portable across vendors. In fact, there is no existing practice for add-ons using DB tables at all. Is there a way forward? Fortunately, considerable work has been done in the area of virtualizing access to SQL data sources since DSpace was first written, and industry-standard, widely supported, performant, open source tools exist. The current Java enterprise standard is JPA (Java Persistence API), which we will use as the persistence layer for CGI data. These so-called 'ORM' (object-relational mapping) tools combine Java native OO semantics with familiar SQL constructs (e.g. query languages) to provide familiar but powerful programming idioms. Add to this convenient Java 5-style annotations to POJOs, and the result is concise but readable database code. As an illustration, here is a complete implementation of the IngestContext service class (remember again, this is merely demo code):

Code Block

@Entity
@NamedQuery(name="ic.findByItem",
            query="SELECT c FROM IngestContext c WHERE c.itemId = :itemId")
public class IngestContext {
    @Id @GeneratedValue(strategy=GenerationType.AUTO)
    private int id;
    private int itemId;
    @ElementCollection
    private Map<String, String> attributes = new HashMap<String, String>();

    public IngestContext() {}

    public String getAttribute(String name) {
        return attributes.get(name);
    }

    public void setAttribute(String name, String value) {
        attributes.put(name, value);
    }
}

The JPA persistence provider (e.g. Hibernate) code will worry about creating database schema, ensuring the correct SQL dialect for every vendor, etc. - responsibilities we are quite content to delegate.

Resource Persistence

The situation with Ingest Resources is quite different, as noted above: current DSpace practice (which basically all derives from configurable submission) is to encode the resource data in an XML file, which is parsed at