Notes:

(question)  How does an admin determine which of the files listed in the directory above map to Fedora nodes? In other words, what is the algorithm for generating these filenames?

(question) The pasted text in the boxes below needs a detailed description. It is not fully "self-evident".

Walkthrough

The following steps simulate a typical user session. The end result (i.e., the resulting layout of files and directories) is then shown.

The user creates a node through the UI and uploads content – in this example it's a simple text file (chandni.txt) with a string ("chandni o. . ."):

Start Fedora

The ModeShape configuration specifies a file-based backend store.

cd fcrepo4/fcrepo-webapp
mvn jetty:run -Dfcrepo.modeshape.configuration=classpath:config/single-file/repository.json

Add Content

curl -v -XPUT --upload-file chandni.txt http://localhost:8080/rest/chandni/ds0/fcr:content
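
For readers who prefer to script the upload, the same PUT can be sketched in Python with the standard library. This is only an illustration of the curl command above, not part of Fedora: the helper names build_put_request and upload are hypothetical, and the explicit Content-Type header is an assumption (urllib would otherwise send a form-encoded type by default).

```python
import urllib.request

# URL taken from the curl command in this walkthrough.
CONTENT_URL = "http://localhost:8080/rest/chandni/ds0/fcr:content"


def build_put_request(url, payload):
    """Build (but do not send) an HTTP PUT carrying the file bytes."""
    request = urllib.request.Request(url, data=payload, method="PUT")
    # Assumption: send the bytes as a generic binary stream.
    request.add_header("Content-Type", "application/octet-stream")
    return request


def upload(path, url=CONTENT_URL):
    """Send the file at `path` to the repository and return the HTTP status."""
    with open(path, "rb") as handle:
        request = build_put_request(url, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.status

# Usage, with Fedora running locally:
#   upload("chandni.txt")
```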

Fedora will create a directory "fcrepo4-data" in the current working directory. The default directories found in "fcrepo4-data" will be the following:

> ls fcrepo4-data
com.arjuna.ats.arjuna.common.ObjectStoreEnvironmentBean.default.objectStoreDir
com.arjuna.ats.arjuna.objectstore.objectStoreDir
fcrepo.activemq.dir
fcrepo.ispn.repo.CacheDirPath
fcrepo.modeshape.index.location

The serialized Fedora nodes can be found in the "fcrepo.ispn.repo.CacheDirPath/FedoraRepository" directory. The files in that directory would look something like this:

> ls fcrepo4-data/fcrepo.ispn.repo.CacheDirPath/FedoraRepository/
1008466944
1016939520
102151168
-1022408704
1028132864
1036310528
1040946176
...

These generated files contain serialized data about each of the JCR/Fedora nodes; running the cURL command above, for example, generates several of these binary files. Some things to note about these files and their contents:

  • With FileCacheStore, each file name is generated by calculating the Java hashCode of the node's key (UUID) string and applying a fixed bit mask that keeps the upper 22 bits of the result, which is why every file name listed above is a multiple of 1024. For example, for the root node, the file -891938816 gets generated, containing the root node and its list of children.
  • Each file contains serialized ModeShape nodes. Although the generated files are binary, the data can be read with a tool that understands both BSON and JBoss serialization, or through the ModeShape API itself. (It does not seem possible to read these files with existing BSON tools, such as MongoDB's bsondump.)
  • The serialization is done by a JBoss serialization library, not by the JDK's native object serialization machinery. For this reason the generated files look different from an ordinary JDK-serialized file. (Please refer to the section on node contents for details.)
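
The file-naming rule from the first bullet can be sketched as follows. This is a minimal illustration in Python, not Infinispan's own code: the mask value 0xFFFFFC00 is an assumption inferred from the file names (all multiples of 1024), and the function names are hypothetical. With the root node key "87a0a8c7505d64/" shown later on this page, the sketch yields the root file name -891938816.

```python
def java_string_hashcode(text):
    """Emulate Java's String.hashCode() as a signed 32-bit integer."""
    value = 0
    # Java hashes UTF-16 code units, so iterate over 2-byte units.
    data = text.encode("utf-16-be")
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        value = (31 * value + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned result as a signed 32-bit int.
    return value - 0x100000000 if value >= 0x80000000 else value


# Assumption: the fixed mask keeps the upper 22 bits of the hash.
BUCKET_MASK = 0xFFFFFC00


def bucket_file_name(key):
    """Return the assumed FileCacheStore file name for a node key."""
    masked = java_string_hashcode(key) & BUCKET_MASK
    signed = masked - 0x100000000 if masked >= 0x80000000 else masked
    return str(signed)


print(bucket_file_name("87a0a8c7505d64/"))
```

Because the mask discards the low 10 bits, several keys can map to the same file, which is consistent with one file holding more than one serialized node.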

A tool can help in reading the contents of a file. For example, for our root node file (-891938816), a tool could show the root node data as follows:

{ "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "mode:root" } , "uuid" : "87a0a8c7505d64/" } } , "children" : [ { "key" : "87a0a8c317f1e7jcr:system" , "name" : "jcr:system" } , 
{ "key" : "87a0a8c7505d646988a17a-ad2f-48f6-aef5-e7d411e184d9" , "name" : "chandni" } ] , "childrenInfo" : { "count" : 2 } }
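
Once a node has been decoded to JSON like the document above, its structure can be inspected with ordinary JSON tooling. The sketch below simply parses that document and lists the children; the helper names are hypothetical, and it assumes the decoding from the binary file has already been done by some other tool.

```python
import json

# The root node document as shown above.
ROOT_DOCUMENT = """
{ "properties" : { "http://www.jcp.org/jcr/1.0" : {
      "primaryType" : { "$name" : "mode:root" } , "uuid" : "87a0a8c7505d64/" } } ,
  "children" : [
    { "key" : "87a0a8c317f1e7jcr:system" , "name" : "jcr:system" } ,
    { "key" : "87a0a8c7505d646988a17a-ad2f-48f6-aef5-e7d411e184d9" , "name" : "chandni" } ] ,
  "childrenInfo" : { "count" : 2 } }
"""

JCR_NAMESPACE = "http://www.jcp.org/jcr/1.0"


def primary_type(document):
    """Return the JCR primary type name of a decoded node document."""
    return document["properties"][JCR_NAMESPACE]["primaryType"]["$name"]


def list_children(document):
    """Return (name, key) pairs for each child reference."""
    return [(child["name"], child["key"]) for child in document.get("children", [])]


root = json.loads(ROOT_DOCUMENT)
print(primary_type(root))
for name, key in list_children(root):
    print(name, "->", key)
```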

 

The actual binary file (in this case -891938816) looks something like this when opened in a text editor or printed to a terminal:

> cat ./-891938816

87a0a8c7505d641/7org.infinispan.schematic.internal.SchematicEntryLiteral6org.infinispan.marshall.jboss.JBossExternalizerAdapterq externalizer$org.infinispan.marshall.ExternalizerDorg.infinispan.schematic.internal.SchematicEntryLiteral

$Externalizer7org.infinispan.schematic.internal.SchematicExternalizer8org.infinispan.schematic.internal.document.BasicDocument;?org.infinispan.schematic.internal.document.DocumentExternalizer2;

23metadata?id87a0a8c7505d64/contentTypeapplication/jsoncontent>propertiesghttp://www.jcp.org/jcr/1.0FprimaryType$namemode:rootuuid

87a0a8c7505d64/children03<key87a0a8c317f1e7jcr:systemnamejcr:system1Skey387a0a8c7505d64

53f49a07-8e14-41a1-bab3-abc59d86846enamechandni4

Some of these elements are:

  1. 87a0a8c7505d64 is the root node's UUID.

  2. The fully qualified class names (org.infinispan.*) are the ModeShape classes responsible for data representation and serialization. The serialization is done by the JBoss Marshalling framework, which can be configured to use custom serialization classes that read and write content in a format of their choosing (in this case BSON). The serialized output therefore contains the names of the data structures and of the ModeShape custom marshallers (classes in org.infinispan.schematic.internal.*).

  3. The file also contains binary data describing internal attributes, so stray characters or digits appear between the strings. For example, the children array appears as: children0<key87a0a8c317f1e7jcr:systemnamejcr:system1Skey387a0a8c7505d64. If the object 'chandni' had siblings, they would appear in the same array.

  4. Each entry in the array is a key string followed by a name string, for example: 6988a17a-ad2f-48f6-aef5-e7d411e184d9namechandni
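
A quick way to get a view like the one above without a BSON-aware tool is to pull the printable strings out of the binary file, much as the Unix strings utility does. The sketch below is only that: it does not decode BSON or JBoss Marshalling, the function names are hypothetical, and the sample bytes are made up to resemble the children array.

```python
import re


def printable_strings(data, min_length=4):
    """Return runs of printable ASCII of at least min_length characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


def strings_in_file(path, min_length=4):
    """Extract printable strings from a serialized node file."""
    with open(path, "rb") as handle:
        return printable_strings(handle.read(), min_length)


# Hypothetical bytes: readable names separated by non-printable markers.
SAMPLE = (b"\x00\x07children\x00\x01key\x00\x1887a0a8c317f1e7jcr:system"
          b"\x00\x04name\x00\x0ajcr:system\x00")

print(printable_strings(SAMPLE, min_length=3))
```

In the real files the type and length markers are often printable characters themselves, so the extracted strings may carry a leading or trailing stray character, as in the dump above.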
 

Similarly, for the datastream:

87a0a8c7505d64

8e6504dd-1fb8-4011-8d38-4fcd2e46c0f77org.infinispan.schematic.internal.SchematicEntryLiteral6org.infinispan.marshall.jboss.JBossExternalizerAdapterq externalizer$org.infinispan.marshall.ExternalizerDorg.infinispan.schematic.internal.SchematicEntryLiteral$Externalizer7org.infinispan.schematic.internal.SchematicExternalizer8org.infinispan.schematic.internal.document.BasicDocument; ?org.infinispan.schematic.internal.document.DocumentExternalizer;23metadatabid387a0a8c7505d64

8e6504dd-1fb8-4011-8d38-4fcd2e46c0f7

contentTypeapplication/jsoncontentkey

87a0a8c7505d64

8e6504dd-1fb8-4011-8d38-4fcd2e46c0f7

parent387a0a8c7505d64

46c11a1e-b2d9-496c-a950-5bd8cf7f2096

propertieshttp://www.jcp.org/jcr/1.0+primaryType$name nt:resourcedatachandni o meri chandni

lastModified.$date2013-12-05T18:52:00.714-05:00

mixinTypesL0D$name4{http://fedora.info/definitions/v4/rest-api#}binarylastModifiedBy bypassAdminmimeTypeapplication/octet-streamhttp://fedora.info/definitions/v4/rest-api

#NdigestA$uri2urn:sha1:1c63b638cab226a394ea27819c18397dd96687fchttp://www.loc.gov/premis/rdf/v1#hasSize55
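
The digest property in the dump above has the form urn:sha1: followed by a hexadecimal SHA-1 value. A sketch of how such a value can be recomputed for a local file is shown below, for example to compare an uploaded file against what the repository recorded. The function names are hypothetical, and the exact bytes of chandni.txt are not given on this page, so the sketch does not try to reproduce the digest shown above.

```python
import hashlib


def sha1_urn(content):
    """Format the SHA-1 of some bytes as a urn:sha1: URI."""
    return "urn:sha1:" + hashlib.sha1(content).hexdigest()


def file_matches_digest(path, recorded_urn):
    """Check a local file against a digest URI recorded in the repository."""
    with open(path, "rb") as handle:
        return sha1_urn(handle.read()) == recorded_urn

# Usage:
#   file_matches_digest("chandni.txt", "urn:sha1:...")  # digest copied from the node
```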

 

Infinispan Configuration Options

Depending on the configured Infinispan backend, the directory layout and the contents of the binary files differ. The following sections cover other cache store options.

File system Backends

LevelDB

Currently, the default configuration writes Fedora data to LevelDB (a fast, filesystem-based key-value store). When Fedora 4 is started, ModeShape (in practice, Infinispan and LevelDB in the background) creates several directories on the filesystem. Currently, these directories are:

  1. fcrepo.ispn.binary.CacheDirPath (binary data)
  2. fcrepo.ispn.CacheDirPath (metadata)
  3. fcrepo.ispn.repo.CacheDirPath (repository)
  4. fcrepo.modeshape.index.location

The layout of files in directories 1-3 is determined by LevelDB. Some of the important files are:

  1. The .log file holds entries for recent transactions. The relevant API for representing these entries is modeshape-schematics (see, e.g., org.infinispan.schematic.SchematicEntry).
  2. When the .log file reaches a size threshold, its entries are written to a .sst file and a new .log file is started.
  3. The MANIFEST.x file records information about the .sst files (among other things).
  4. The CURRENT file names the current MANIFEST file.

Most of these files are binary and can be read with a LevelDB Java library.
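
To see which of these files are present in one of the LevelDB directories, a small script can group them by role and read CURRENT to find the active manifest. The sketch below only inspects file names and the text content of CURRENT; it does not parse the binary files. The function names and role labels are hypothetical, and it assumes CURRENT holds the manifest file name as a single line of text.

```python
from pathlib import Path


def classify(name):
    """Map a LevelDB file name to the role described in the list above."""
    if name == "CURRENT":
        return "current"
    if name.startswith("MANIFEST"):
        return "manifest"
    if name.endswith(".log"):
        return "log"
    if name.endswith(".sst"):
        return "table"
    return "other"


def summarize(directory):
    """Group the files in a LevelDB directory by role and find the active manifest."""
    directory = Path(directory)
    groups = {}
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            groups.setdefault(classify(entry.name), []).append(entry.name)
    current = directory / "CURRENT"
    # Assumption: CURRENT is a one-line text file naming the manifest.
    active = current.read_text().strip() if current.is_file() else None
    return groups, active

# Usage, pointing at one of the directories listed above:
#   summarize("fcrepo4-data/fcrepo.ispn.repo.CacheDirPath")
```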

 
