Fedora Architecture Summit

Cornell University, Ithaca, New York

A meeting with leading members of the Fedora community to discuss the future of Fedora's architecture.

Presentations

All presentations are available online.

Action Items

NOTE: Prioritization is indicated by the percentage of attendees who voted an item Essential, Highly Desirable, or Nice-to-Have. For example, "90/10/0" means that ninety percent said it was "Essential," ten percent said it was "Highly Desirable," and zero percent said it was "Nice-to-Have." The items below are roughly ordered from highest to lowest priority based on this scheme.

Transaction Management for Create/Read/Update/Delete (i.e., CRUD) Operations Upon a Graph of Related Digital Objects
90/10/0

This requirement observes that, increasingly, the "entity of management" for Fedora is a graph of related digital objects rather than a single digital object. This reflects how organizations are modeling digital objects: deconstructing entities into constituent parts and letting each part be an independent entity. We see more examples of compound entities made up of interrelated digital objects (i.e., "atomistic approach," "graph-oriented content models," "networks of objects"). The current Fedora APIs and management modules were designed to manage a transaction for a single operation on a single digital object (e.g., ingest an object, modify a datastream in an object).

A simple example of this requirement arises when ingesting a compound entity such as an article, where the text is one digital object, each figure is a separate "image" digital object, and the accompanying data is its own digital object. The requirement is to ingest all constituent parts as one composition; if something fails on one component part, we want to roll back the ingest of the whole thing (to avoid having an incomplete/broken object).
Another example is found in cases where humanities scholars are working on a set of digital objects representing a text, associated annotations, and interpretive analysis. In such cases, a scholarly task can target several related objects. To support this, we must ensure that a scholar's modifications to multiple objects in a network are committed together (i.e., to prevent the work from existing in an unintelligible state), and we must also prevent two or more scholars from committing incompatible changes to the same objects (i.e., to enable reliable editing of the work).

Currently, Fedora implementers deal with this via custom middleware designed to manage specific transactions around pre-defined "content models" or information models for their applications. The summit group put forth a new requirement that Fedora provide native support for CRUD operations upon digital object compositions that are a graph of related objects.

There are different ways to accomplish this that should be investigated. A simple way forward might be to enable Fedora to receive a message/request that encodes a "batch" of API-M service requests to be executed as one transaction. In this scenario, you can imagine submitting a set of API-M requests that interleave read and write operations pertaining to the creation or updating of a compound entity made up of several related digital objects. If something fails in executing the set of operations, the whole set will be rolled back. The idea is to create a "transaction boundary" that encompasses the whole composition of related objects.

Other approaches might be built around the notion of a content model (needs more investigation) and have a new API operation that accepts a SIP describing a graph of related objects. (Note the relationship to the ORE work here.) Note that this scheme could be reduced to operate on a single digital object, since a graph could have just one node.
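The "batch as one transaction" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `InMemoryRepository`, its methods, and `execute_batch` are hypothetical stand-ins, not Fedora API-M operations, and the snapshot-and-restore rollback is just one simple way to get all-or-nothing behavior.

```python
import copy


class InMemoryRepository:
    """Hypothetical stand-in for a repository; not the Fedora API."""

    def __init__(self):
        # Maps an object identifier (PID) to {datastream id: content}.
        self.objects = {}

    def ingest(self, pid, datastreams):
        if pid in self.objects:
            raise ValueError("object already exists: " + pid)
        self.objects[pid] = dict(datastreams)

    def modify_datastream(self, pid, ds_id, content):
        if pid not in self.objects:
            raise KeyError("no such object: " + pid)
        self.objects[pid][ds_id] = content


def execute_batch(repo, operations):
    """Apply every (method_name, args) pair, or none of them."""
    # Snapshot the state so a failure anywhere in the batch can be undone.
    snapshot = copy.deepcopy(repo.objects)
    try:
        for method_name, args in operations:
            getattr(repo, method_name)(*args)
    except Exception:
        # Roll back the whole batch, then report the failure to the caller.
        repo.objects = snapshot
        raise


repo = InMemoryRepository()
article_batch = [
    ("ingest", ("demo:article-text", {"TEXT": "body of the article"})),
    ("ingest", ("demo:figure-1", {"IMAGE": "image bytes"})),
    # This step fails because the target object was never ingested.
    ("modify_datastream", ("demo:dataset", "DATA", "table of results")),
]
try:
    execute_batch(repo, article_batch)
except KeyError:
    pass  # the two successful ingests were rolled back with the failed step
```

After the failed batch, `repo.objects` is empty again, so no partial article is left behind. The sketch covers only the rollback half of the requirement; preventing two scholars from committing incompatible changes would additionally need locking or version checks.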
Faster Performance for Retrieving Graphs of Digital Objects
70/30/0

Devise ways to facilitate faster access/retrieval of digital object graphs, including the retrieval of datastream content (byte streams). One aspect of the problem is to avoid an explosion of Fedora API callback requests such as "getDatastream." An example of where this arises is in Topaz, when retrieving a digital object requires many Fedora API-A calls to put together a "unit of viewing" or "unit of management" that is a graph of objects, and it is desirable or necessary to pre-fetch all content rather than take a lazier approach of calling back for content as needed. In such a case it is desirable to avoid many SOAP/REST requests, each returning independently. One idea is to have a way to return a "DIP" of the whole graph with all the binaries in it (SOAP with attachments or multipart MIME).

From a pure API perspective, there is the possibility of new Fedora API operations:
Note: We should also think about the interplay between querying a triplestore (e.g., SPO on MPTstore; XQuery on DbXML; iTQL on Mulgara) and calls to Fedora APIs to get datastream content.

Note: create/update/delete operations are addressed in the requirements for CRUD transactions on a graph of objects (described in #1 above).

CMDA: Better Service-to-Object Mapping in Fedora
70/30/0

The CMDA design specification maintains the basic concept of a Fedora disseminator, but implements it differently. For one, it provides an indirect binding of behavior definitions/mechanisms to digital objects. It also opens up more possibilities for dynamic service-to-object mapping/binding in Fedora. The core Fedora development team already has a prototype of the CMDA design and will begin moving towards a production release for Fedora 2.3 or 3.0. See: http://www.cs.cornell.edu/payette/fedora/designs/cmda/ (NOTE: similar work is being done by others, such as OWL-S and Semantic Web Services.)

Better Support for Storing Very Large Datastreams in Fedora
50/50/0
JMS Messaging in Fedora Framework
80/10/10
Alternative Interfaces on Core Fedora Repository Service
30/70/0

This is motivated by the performance gained from not having to make SOAP requests. It can also make for easier integration of Fedora with certain types of applications. The identified interface possibilities are:
Fedora Content Model Registry to Facilitate Sharing/Reuse
20/70/10

Create a registry of Fedora content models to facilitate sharing and re-use. The registry can be implemented as a Fedora repository that stores Fedora "content model objects." Each content model object will have one or more datastreams, each representing a different way of expressing content model constraints. A registry of content models would facilitate a bottom-up, organic, Darwinian approach to the sharing of content models (the community defines them, as opposed to top-down promulgation of models). The idea is to let the community create their "best offerings" and share them. If others like the models, they will adopt them.

There is the question of whether there is one hosted content model registry for the Fedora community, or whether we just provide a reference model for a Content Model Registry (implemented with a running instance of a Fedora repository). The first step is for the Fedora development team to put out a reference content model object that includes two datastreams: one with an RDFS or OWL expression of the sample content model, and one with a CMDA XML-based expression of the sample content model. The next step is to determine whether to distribute a reference model for a content model registry, or to actually run a content model registry for the community (using Fedora as the registry repository).

Validation of Fedora Objects Based on Content Models
20/60/20

Facilitate standard means of enforcing the constraints of a content model in support of digital object validation and integrity checking. This entails having one or more logical expression languages to describe patterns and constraints for Fedora digital objects that are useful for enforcement of constraints.
The favored means to date is a shared logical expression language (meta-modeling language) describing the patterns and constraints of Fedora digital objects (per the Fedora object model), including the ability to describe container-to-container relationships (object type to object type rels). This gets at the original idea of a content model. (This could wind up being subsumed in #5 as it matures. It may merge.)
- Fedora-specific view of this.
- This implies some sort of cmodel expression support by Fedora.

Manage a distributed transaction which entails having to update multiple services in the framework (multiple repos; repo plus several services)
0/60/40

Reduce barriers for application developers with Fedora by offering alternate interfaces for Fedora. Enable better community process (developers).
0/40/60
Note: the alternate interfaces requirement was already discussed in requirement #6 above. This item motivates the same requirement from a different standpoint: making things easier for developers who are writing Java-based middleware or applications upon Fedora. They can write directly to Java interfaces and bypass the SOAP interfaces.

GetObject (PID, versionDate)
10/0/90

Provide the option to get the entire digital object (FOXML) as of a certain version date. You get back only one version of each datastream: the version that was current at the provided version date.

Versioning enhancements
To enable "virtual distributed repository" ("fedororation")
0/0/100

Other Points Discussed

Sharing and re-use

The community must focus on the means of sharing and re-using objects from the standpoint of shared content models. What are the best ways to promote shared logical expressions/descriptions of the object types (i.e., "content models," templates, application profiles, whatever) and shared vocabularies? Note that the ORE effort proposes a standard way to share "named sub-graphs." The ORE expression will be a generic graph, but we anticipate that the generic graph expressions can be sub-typed with community semantics. One means of doing this can be a shared registry of "entity types." We should think about our proposal for "content models" and content model registries, and how much we can generalize this outside of Fedora. Note that registries can enable discovery of content models (or entity types). A use case for sharing is developers trying to write apps to CRUD the units of management.

Note: facilitate a bottom-up, organic, Darwinian approach where the community creates, rather than a top-down one.

Note: the different communities working in this area include W3C Semantic Web, ORE, Topaz-Mulgara, Fedora, and others. Fedora Commons can position itself to be a key player/leader in this area.

Hourglass design

When evolving Fedora, keep this principle in mind. Aim for the "slim" IFaP (Interface, Formats, and Protocols) on the core repository service (API-A/M). Think about this when we are considering adding more verbs to the APIs.

Enduring/Reliable System

WE ALREADY CONSIDER THIS ESSENTIAL
Better Extensibility schemes for digital object APIs
XACML outside the repository

Chi from the Australian RAMP project has done extensive work in this area. We must review and figure out how to deploy his work as an alternative configuration for XACML enforcement.