This documentation covers an old version of Fedora.

TODO: How does an admin determine which of the files in the directory listed below map to Fedora nodes? In other words, what is the algorithm for generating these filenames?

TODO: The pasted text in the boxes below needs a detailed description; it is not fully self-evident.

Walkthrough

The following steps simulate a typical user session. The resulting layout of files and directories is then shown.

The user creates a node (through the UI or, as here, with cURL) and uploads content to it. In this example the content is a simple text file (chandni.txt) containing a string ("chandni o. . ."):

Start Fedora

The ModeShape configuration passed on the command line (config/single-file/repository.json) specifies a file-based backend store.

cd fcrepo4/fcrepo-webapp
mvn jetty:run -Dfcrepo.modeshape.configuration=classpath:config/single-file/repository.json

Add Content

curl -v -XPUT --upload-file chandni.txt http://localhost:8080/rest/chandni/ds0/fcr:content
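
The same upload can also be issued from Java. The following is a minimal sketch using the JDK 11 java.net.http client; it assumes Fedora is running locally as started above, and the class and method names (FedoraUpload, uploadRequest) are hypothetical. It only mirrors the curl -XPUT --upload-file call and does not use any Fedora-specific client library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FedoraUpload {

    // Builds a PUT request equivalent to: curl -XPUT --upload-file <file> <baseUrl><nodePath>
    static HttpRequest uploadRequest(String baseUrl, String nodePath, byte[] content) {
        return HttpRequest.newBuilder(URI.create(baseUrl + nodePath))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: Fedora is running at localhost:8080 and chandni.txt is in the working directory.
        byte[] content = Files.readAllBytes(Paths.get("chandni.txt"));
        HttpRequest request = uploadRequest(
                "http://localhost:8080/rest", "/chandni/ds0/fcr:content", content);

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Print whatever status code and body Fedora returns for the PUT.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Building the request in a separate method keeps the URL and body easy to inspect without a running server.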

Fedora creates a directory named "fcrepo4-data" in the current working directory. By default, it contains the following directories:

> ls fcrepo4-data
com.arjuna.ats.arjuna.common.ObjectStoreEnvironmentBean.default.objectStoreDir
com.arjuna.ats.arjuna.objectstore.objectStoreDir
fcrepo.activemq.dir
fcrepo.ispn.repo.CacheDirPath
fcrepo.modeshape.index.location

The serialized Fedora nodes can be found in the "fcrepo.ispn.repo.CacheDirPath/FedoraRepository" directory. The files in that directory would look something like this:

> ls fcrepo4-data/fcrepo.ispn.repo.CacheDirPath/FedoraRepository/
1008466944
1016939520
102151168
-1022408704
1028132864
1036310528
1040946176
...

These generated files contain serialized data about each of the JCR/Fedora nodes; running the cURL command above, for example, produces several such binary files. With FileCacheStore, the file names are derived from a hash of the node's UUID. For the root node, for example, a file named -8.. is generated to hold the node data. Although the file itself is binary, the data it contains can be read with a tool or with the ModeShape API. For the root node, that data is:
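
The TODO at the top of this page asks how these names are generated. The sketch below assumes that the Infinispan 5.x FileCacheStore names each bucket file after its bucket id, that the bucket id is the key's hashCode() with its low 10 bits cleared (key.hashCode() & 0xfffffc00), and that the cache key is the node-key string shown in the dump below (e.g. 87a0a8c7505d64/ for the root). This is consistent with every file name in the listing above being a multiple of 1024 (e.g. 1008466944 = 984831 * 1024, -1022408704 = -998446 * 1024), but it should be checked against the Infinispan source for the version in use. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BucketFiles {

    // Assumed mask: clears the low 10 bits, so up to 1024 distinct hash codes share one bucket.
    static final int BUCKET_MASK = 0xfffffc00;

    // Bucket id for a cache key (assumption: this is what the store uses as the file name).
    static int bucketId(Object key) {
        return key.hashCode() & BUCKET_MASK;
    }

    static String bucketFileName(Object key) {
        return Integer.toString(bucketId(key));
    }

    // Returns the bucket file that should hold the key, or null if no such file exists.
    static Path findBucketFile(Path storeDir, Object key) {
        Path candidate = storeDir.resolve(bucketFileName(key));
        return Files.isRegularFile(candidate) ? candidate : null;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: BucketFiles <store-dir> <node-key>...");
            return;
        }
        Path dir = Paths.get(args[0]);
        for (int i = 1; i < args.length; i++) {
            String key = args[i];
            Path file = findBucketFile(dir, key);
            System.out.println(key + " -> " + bucketFileName(key)
                    + (file != null ? " (found)" : " (not found)"));
        }
    }
}
```

For example, java BucketFiles fcrepo4-data/fcrepo.ispn.repo.CacheDirPath/FedoraRepository '87a0a8c7505d64/' reports the bucket name computed for the root key. Because the mask discards the low bits of the hash, several nodes can share a bucket, so one file may hold more than one node.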

{ "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "mode:root" } , "uuid" : "87a0a8c7505d64/" } } , "children" : [ { "key" : "87a0a8c317f1e7jcr:system" , "name" : "jcr:system" } , { "key" : "87a0a8c7505d646988a17a-ad2f-48f6-aef5-e7d411e184d9" , "name" : "chandni" } ] , "childrenInfo" : { "count" : 2 } }
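
The keys in this document appear to follow ModeShape's node-key layout: a 7-character source key, a 7-character workspace key, and then the node identifier. Under that assumption, the root key 87a0a8c7505d64/ splits into 87a0a8c, 7505d64 and the identifier /, while the jcr:system key carries a different workspace key (317f1e7). The helper below is a hypothetical sketch of that split, not ModeShape's own NodeKey class.

```java
public class NodeKeyParts {

    // Assumed layout, consistent with every key in the dumps on this page.
    static final int SOURCE_LENGTH = 7;
    static final int WORKSPACE_LENGTH = 7;

    final String source;
    final String workspace;
    final String identifier;

    NodeKeyParts(String source, String workspace, String identifier) {
        this.source = source;
        this.workspace = workspace;
        this.identifier = identifier;
    }

    // Splits a key string into source key, workspace key and node identifier.
    static NodeKeyParts parse(String key) {
        int prefix = SOURCE_LENGTH + WORKSPACE_LENGTH;
        if (key.length() <= prefix) {
            throw new IllegalArgumentException("key too short: " + key);
        }
        return new NodeKeyParts(
                key.substring(0, SOURCE_LENGTH),
                key.substring(SOURCE_LENGTH, prefix),
                key.substring(prefix));
    }

    @Override
    public String toString() {
        return "source=" + source + " workspace=" + workspace + " id=" + identifier;
    }

    public static void main(String[] args) {
        String[] keys = {
            "87a0a8c7505d64/",                                   // root node
            "87a0a8c317f1e7jcr:system",                          // jcr:system
            "87a0a8c7505d646988a17a-ad2f-48f6-aef5-e7d411e184d9" // chandni
        };
        for (String k : keys) {
            System.out.println(k + " -> " + parse(k));
        }
    }
}
```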

The printable portion of the actual binary file looks something like this:

87a0a8c7505d64/7org.infinispan.schematic.internal.SchematicEntryLiteral6org.infinispan.marshall.jboss.JBossExternalizerAdapterq externalizer$org.infinispan.marshall.ExternalizerDorg.infinispan.schematic.internal.SchematicEntryLiteral$Externalizer7
org.infinispan.schematic.internal.SchematicExternalizer8org.infinispan.schematic.internal.document.BasicDocument;?
org.infinispan.schematic.internal.document.DocumentExternalizer;23metadata?id87a0a8c7505d64/
contentTypeapplication/jsoncontent>propertiesghttp://www.jcp.org/jcr/1.0FprimaryType$namemode:rootuuid
87a0a8c7505d64/children0<key87a0a8c317f1e7jcr:systemnamejcr:system1Skey387a0a8c7505d64
53f49a07-8e14-41a1-bab3-abc59d86846enamechandni
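
A dump like the one above can be produced by pulling the runs of printable characters out of a bucket file, much as the Unix strings utility does; the non-printable marshalling bytes between the runs are dropped, so field names and values run together in the output. The sketch below treats the file as raw bytes and does not decode the Infinispan/JBoss marshalling; the class name and the minimum run length of 4 are arbitrary choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PrintableStrings {

    // Returns every run of at least minLength printable ASCII bytes (0x20 to 0x7e).
    static List<String> extract(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        int start = -1;
        // Iterate one past the end so a run at the very end of the file is flushed.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] <= 0x7e;
            if (printable && start < 0) {
                start = i;
            } else if (!printable && start >= 0) {
                if (i - start >= minLength) {
                    runs.add(new String(data, start, i - start, StandardCharsets.US_ASCII));
                }
                start = -1;
            }
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PrintableStrings <path-to-bucket-file>
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : extract(data, 4)) {
            System.out.println(run);
        }
    }
}
```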

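The dump above has been edited slightly for readability. The class names in it (org.infinispan.schematic.* and org.infinispan.marshall.*) are the ModeShape and Infinispan classes responsible for data representation and serialization; the remaining text is the node data itself. In the root node file:

87a0a8c7505d64/ is the root node's key, also stored as its uuid property. The root's two children, jcr:system and chandni, are listed with their keys, and the chandni key begins with the same 87a0a8c7505d64 prefix.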

Similarly, for the datastream:

87a0a8c7505d648e6504dd-1fb8-4011-8d38-4fcd2e46c0f77org.infinispan.schematic.internal.SchematicEntryLiteral6org.infinispan.marshall.jboss.JBossExternalizerAdapterq externalizer$org.infinispan.marshall.ExternalizerDorg.infinispan.schematic.internal.SchematicEntryLiteral$Externalizer7
org.infinispan.schematic.internal.SchematicExternalizer8org.infinispan.schematic.internal.document.BasicDocument; ?
org.infinispan.schematic.internal.document.DocumentExternalizer;23metadatabid387a0a8c7505d648e6504dd-1fb8-4011-8d38-4fcd2e46c0f7
contentTypeapplication/jsoncontentkey87a0a8c7505d648e6504dd-1fb8-4011-8d38-4fcd2e46c0f7
parent387a0a8c7505d6446c11a1e-b2d9-496c-a950-5bd8cf7f2096
propertieshttp://www.jcp.org/jcr/1.0+primaryType$name nt:resourcedatachandni o meri chandni
lastModified.$date2013-12-05T18:52:00.714-05:00
mixinTypesL0D$name4{http://fedora.info/definitions/v4/rest-api#}binarylastModifiedBy bypassAdminmimeTypeapplication/octet-streamhttp://fedora.info/definitions/v4/rest-api#NdigestA$uri2urn:sha1:1c63b638cab226a394ea27819c18397dd96687fchttp://www.loc.gov/premis/rdf/v1#hasSize55
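
The datastream's digest property is a urn:sha1: URI. To check it against the uploaded file, the SHA-1 can be recomputed over the file's bytes. The sketch below uses the JDK's MessageDigest and assumes the stored digest covers exactly the uploaded bytes; the class name Sha1Urn is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Urn {

    // Formats the SHA-1 of the given bytes as urn:sha1:<lowercase hex>.
    static String sha1Urn(byte[] content) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
        StringBuilder out = new StringBuilder("urn:sha1:");
        for (byte b : md.digest(content)) {
            out.append(String.format("%02x", b));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Sha1Urn chandni.txt [expected-urn]
        String actual = sha1Urn(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(actual);
        if (args.length > 1) {
            System.out.println(actual.equals(args[1]) ? "digest matches" : "digest differs");
        }
    }
}
```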

Infinispan Configuration Options

The directory layout differs depending on the configured Infinispan backend. The following sections cover some of the file-system cache store options.

File System Backends

LevelDB

Currently, the default configuration stores Fedora data in LevelDB, a fast, filesystem-based key-value store. When Fedora 4 starts, ModeShape (in practice, Infinispan and LevelDB underneath it) creates the following directories on the filesystem:

  1. fcrepo.ispn.binary.CacheDirPath (binary data)
  2. fcrepo.ispn.CacheDirPath (metadata)
  3. fcrepo.ispn.repo.CacheDirPath (repository)
  4. fcrepo.modeshape.index.location

The layout of files in directories 1-3 is determined by LevelDB. Some of the important files are:

  1. The .log file holds entries for recent transactions. The API for representing these entries is provided by modeshape-schematic (see, e.g., org.infinispan.schematic.SchematicEntry).
  2. When the .log file reaches a size threshold, its entries are written to an .sst file and a new .log file is started.
  3. MANIFEST-x files record information about the .sst files (among other things).
  4. The CURRENT file names the active MANIFEST file.
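
The CURRENT file is a small text file holding the name of the active manifest (LevelDB names these MANIFEST-NNNNNN) followed by a newline. The sketch below reads it and groups the other files in a LevelDB directory by the roles listed above. The role labels are this sketch's own, and since some LevelDB versions name table files .ldb instead of .sst, both extensions are accepted; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LevelDbLayout {

    // Labels a file name according to the roles described above.
    static String role(String fileName) {
        if (fileName.equals("CURRENT")) return "current";
        if (fileName.startsWith("MANIFEST-")) return "manifest";
        if (fileName.endsWith(".log")) return "log";
        if (fileName.endsWith(".sst") || fileName.endsWith(".ldb")) return "table";
        return "other"; // e.g. LOCK and the LOG info file
    }

    // CURRENT contains the active manifest's file name plus a trailing newline.
    static String currentManifest(Path storeDir) throws IOException {
        byte[] raw = Files.readAllBytes(storeDir.resolve("CURRENT"));
        return new String(raw, StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LevelDbLayout <leveldb-directory>
        Path dir = Paths.get(args[0]);
        Map<String, List<String>> byRole = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                String name = p.getFileName().toString();
                byRole.computeIfAbsent(role(name), r -> new ArrayList<>()).add(name);
            }
        }
        byRole.forEach((r, names) -> System.out.println(r + ": " + names));
        if (Files.exists(dir.resolve("CURRENT"))) {
            System.out.println("active manifest: " + currentManifest(dir));
        }
    }
}
```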

Most of these files are binary and can be read by a LevelDB Java library.

FileCacheStore

As with LevelDB, when Fedora 4 is started with the FileCacheStore configuration, ModeShape creates several directories on the filesystem:

  1. data
  2. expired
  3. index files

The FileCacheStore (configured via file/infinispan.xml) writes hundreds of binary files into the data directory (e.g. 11333332.., -2334002.., etc.), far more than LevelDB produces. FileCacheStore is deprecated as of Infinispan 6.x; our ModeShape currently runs on Infinispan 5.x. A hashing algorithm maps keys to buckets, and each bucket is stored in its own file. The value files contain serialized ModeShape nodes. The key files can be read using org.infinispan.schematic.internal.document.BsonReader. (It does not seem possible to read these files with existing BSON tools such as MongoDB's bsondump, but further inspection is needed.)
