...

Add Content
Code Block
curl -X PUT --data-binary "@fcrepo4_greetings.txt" "http://localhost:8080/rest/greetings_en/ds0/fcr:content"
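The same request can also be issued from Java. The sketch below assumes Java 11+ (for java.net.http), the local endpoint shown in the curl command, and a file named fcrepo4_greetings.txt in the working directory. The class AddContent and its buildRequest helper are hypothetical names, not part of Fedora.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Sketch: Java 11+ equivalent of the curl PUT above (hypothetical helper class).
// Assumes a local Fedora 4 instance listening on http://localhost:8080/rest.
final class AddContent {
    static final String CONTENT_URL =
            "http://localhost:8080/rest/greetings_en/ds0/fcr:content";

    // Builds a PUT request whose body is streamed from the given file,
    // like curl's --data-binary "@file".
    static HttpRequest buildRequest(Path file) throws java.io.FileNotFoundException {
        return HttpRequest.newBuilder(URI.create(CONTENT_URL))
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest(Path.of("fcrepo4_greetings.txt"));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```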

Fedora will create a directory "fcrepo4-data" in the current working directory. The default directories found in "fcrepo4-data" will be the following:

...

  • In the case of FileCacheStore, the file names are signed 32-bit integers, so they are usually 10 or 11 characters long (up to 10 digits plus an optional minus sign). The file names (i.e., the integers) are generated by calculating the Java hashCode of the node's UUID string (prefixed with the root node UUID) and masking it with a fixed 22-bit mask (0xfffffc00), which clears the low 10 bits. For example, the following Java snippet calculates the file name for the root node (UUID "87a0a8c7505d64/"):

    Code Block
    languagejava
    int fileName = "87a0a8c7505d64/".hashCode() & 0xfffffc00; // equals -891938816

    Therefore, file -891938816 would contain data about the root node. 

  • Each of the files contains serialized ModeShape nodes. Although the generated files are binary, a tool (a simple utility is under development) can help in reading their contents; it might not be easy to read these files using MongoDB's bsondump alone. For example, for our root node file (-891938816), a tool might display the root node data in JSON as follows:

    Code Block
    languagejs
    { "properties" : 
      { "http://www.jcp.org/jcr/1.0" : 
        { "primaryType" : 
          { "$name" : "mode:root" } , "uuid" : "87a0a8c7505d64/" 
        } 
      } ,
     "children" : 
        [ { "key" : "87a0a8c317f1e7jcr:system" , "name" : "jcr:system" } , 
      { "key" : "87a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152" , "name" : "greetings_en" } ] ,
        "childrenInfo" : { "count" : 2 } 
    }

    As an aside, a tool could also be written to show the full file-to-content mapping; the utility under development aims to display this information as in the following snippet:

    Code Block
    -891938816  UUID    87a0a8c7505d64 mode:root
    -1857312768 UUID    17c366d9-25bc-4e2d-9e53-4428fe7b8152 greetings_en
    328781824   UUID    7e84184d-3a8e-4f57-8e49-a550f1c19b3a greetings_en/ds0
    39043072    UUID    fc859bab-6f7b-46de-9b4c-f40dfe28643c hello, world!

  • The serialization is done by the JBoss Marshalling library rather than the JDK's native object serialization machinery, which is why the generated files look different from an ordinary JDK serialized file. JBoss Marshalling can be configured to use custom serialization classes that read and write content in a format of the repository's choosing.

  • The data is encoded in Binary JSON (BSON). If the file containing the root node (which references the node 'greetings_en') is opened in a hex editor, you would see, in accordance with the BSON spec, \u0002 preceding string-valued fields (such as "key" and "name"), \u0004 preceding an array (here "children", which lists the sub-nodes), and \u0003 preceding each embedded document (here each child entry within the array):

    Code Block
    \u0000\u0000\u0000\u0004children\u0000ü\u0000\u0000\u0000\u00030\u0000<\u0000\u0000\u0000
    \u0002key\u0000\u0019\u0000\u0000\u000087a0a8c317f1e7jcr:system\u0000\u0002name\u0000
    \u000B\u0000\u0000\u0000jcr:system\u0000\u0000\u00031\u0000X
    \u0000\u0000\u0000\u0002key\u00003\u0000\u0000\u000087a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152\u0000\u0002name\u0000\u000D\u0000\u0000\u0000greetings_en\u0000\u0000\u0000\u0003childrenI
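The file-name calculation from the first item above can be wrapped in a small helper, which also shows which bucket file a given node key should land in. This is a sketch assuming, as described above, that the bucket is the node key's Java hashCode masked with 0xfffffc00. The class and method names are hypothetical, and the helper does not read any files, so the node keys must already be known (for example, from a parent node's "children" entries).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the FileCacheStore bucket (file-name) calculation described above.
// FileCacheBuckets is a hypothetical helper, not an Infinispan/ModeShape API.
final class FileCacheBuckets {
    static final int BUCKET_MASK = 0xfffffc00; // keeps the upper 22 bits

    // Integer file name of the bucket that stores the given node key.
    static int bucketFor(String nodeKey) {
        return nodeKey.hashCode() & BUCKET_MASK;
    }

    // Groups known node keys by bucket file: a partial file-to-content mapping.
    static Map<Integer, List<String>> groupByBucket(List<String> nodeKeys) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : nodeKeys) {
            buckets.computeIfAbsent(bucketFor(key), b -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketFor("87a0a8c7505d64/")); // prints -891938816
    }
}
```

Because only the low 10 bits are discarded, up to 1024 distinct hash values share one bucket file, which is why a single file can hold several serialized nodes.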

...

 

87a0a8c7505d64fc859bab-6f7b-46de-9b4c-f40dfe28643c1 . . .2

metadatabid387a0a8c7505d64fc859bab-6f7b-46de-9b4c-f40dfe286433ccontentTypeapplication/jsoncontent4Ékey387a0a8c7505d64fc859babÉkey387a0a8c7505d64fc859bab-6f7b-46de-9b4c-f40dfe28643c3parent387a0a8c7505d647e84184d-3a8e-4f57-8e49-a550f1c19b3a5properties ̃http://www.jcp.org/jcr/1.0"primaryType$name

nt:resourcedatahello, world!6 lastModified.$date2013-12-09T23:51:00.520-05:007mixinTypesL0D$name4{http://fedora.info/definitions/v4/rest-api#}binarylastModifiedBy bypassAdmin8mimeTypeapplication/octet-stream9http://fedora.info/definitions/v4/rest-api#NdigestA $uri2urn:sha1:e91ba0972b9055187fa2efa8b5c156f487a8293a10http://www.loc.gov/premis/rdf/v1#hasSize55
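The BSON type bytes discussed above (0x02 string, 0x03 embedded document, 0x04 array) are enough to walk a document's structure by hand. The following is a minimal sketch of such a walker, not a full BSON parser. It handles only a few element types from the BSON spec and assumes the byte array starts exactly at a BSON document, so with the raw cache-store files you would presumably first have to locate where each document begins. BsonWalker is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Minimal BSON structure walker (sketch): emits "0xTT dotted.path" for each
// element, recursing into embedded documents (0x03) and arrays (0x04).
final class BsonWalker {

    static List<String> walk(byte[] bson) {
        ByteBuffer buf = ByteBuffer.wrap(bson).order(ByteOrder.LITTLE_ENDIAN);
        List<String> out = new ArrayList<>();
        readDocument(buf, "", out);
        return out;
    }

    private static void readDocument(ByteBuffer buf, String path, List<String> out) {
        int start = buf.position();
        int length = buf.getInt();          // int32 total size, including itself
        int end = start + length - 1;       // offset of the trailing 0x00
        while (buf.position() < end) {
            byte type = buf.get();
            String name = readCString(buf);
            String full = path.isEmpty() ? name : path + "." + name;
            out.add(String.format("0x%02x %s", type, full));
            switch (type) {
                case 0x02:                  // string: int32 length (incl. NUL) + bytes
                    int n = buf.getInt();
                    buf.position(buf.position() + n);
                    break;
                case 0x03:                  // embedded document
                case 0x04:                  // array: a document keyed "0", "1", ...
                    readDocument(buf, full, out);
                    break;
                case 0x05:                  // binary: int32 length, subtype, bytes
                    int len = buf.getInt();
                    buf.get();
                    buf.position(buf.position() + len);
                    break;
                case 0x01: buf.getDouble(); break;                      // double
                case 0x08: buf.get(); break;                            // boolean
                case 0x0A: break;                                       // null
                case 0x10: buf.getInt(); break;                         // int32
                case 0x09: case 0x11: case 0x12: buf.getLong(); break;  // 8-byte types
                default:
                    throw new IllegalArgumentException("unhandled BSON type " + type);
            }
        }
        buf.get();                          // consume the document's trailing 0x00
    }

    // Reads a NUL-terminated UTF-8 element name.
    private static String readCString(ByteBuffer buf) {
        int start = buf.position();
        while (buf.get() != 0) { /* scan to the terminator */ }
        return new String(buf.array(), start, buf.position() - start - 1,
                StandardCharsets.UTF_8);
    }
}
```

Assuming the root node document contains only the handled types, walking it would be expected to yield lines such as 0x04 children, 0x03 children.0 and 0x02 children.0.key.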

...

ModeShape currently uses Hibernate Search to manage Lucene indexes. These index files can be found in the "fcrepo.modeshape.index.location" directory, and the indexes can be viewed with Luke, the Lucene index toolbox.


Luke

Inspecting ObjectStore Folders

Directories "com.arjuna.ats.arjuna.objectstore.objectStoreDir" and "com.arjuna.ats.arjuna.common.ObjectStoreEnvironmentBean.default.objectStoreDir" are JBoss JTA transaction engine artifacts. The default Fedora Infinispan configuration attempts to find a JBossJTA transaction manager implementation via "org.infinispan.transaction.lookup.GenericTransactionManagerLookup". This configuration uses an Arjuna shadow file store (ShadowNoFileLockStore, as the directory names show) as a backend, resulting in several directories within fcrepo4-data such as "object-store" and "object-store-default":

Code Block
   |-object-store
   |---ShadowNoFileLockStore
   |-----defaultStore
   |-------Recovery
   |---------TransactionStatusManager
   |-object-store-default
   |---ShadowNoFileLockStore
   |-----defaultStore
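To compare a local installation against the layout above, the directory tree can be printed in the same style. This is a sketch assuming fcrepo4-data sits in the current working directory (or is passed as the first argument). DirTree is a hypothetical helper, and it lists directories only, not the transaction log files inside them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: prints subdirectories as "|-name", "|---name", ... (two extra
// dashes per nesting level), matching the listing shown above.
final class DirTree {
    static List<String> tree(Path root) throws IOException {
        List<String> lines = new ArrayList<>();
        appendChildren(root, 1, lines);
        return lines;
    }

    private static void appendChildren(Path dir, int depth, List<String> lines)
            throws IOException {
        List<Path> children;
        try (Stream<Path> s = Files.list(dir)) {
            // Sort siblings so "object-store" precedes "object-store-default".
            children = s.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }
        for (Path child : children) {
            lines.add("|" + "-".repeat(2 * depth - 1) + child.getFileName());
            appendChildren(child, depth + 1, lines);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : "fcrepo4-data");
        tree(root).forEach(System.out::println);
    }
}
```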

A detailed description of the artifacts maintained by the JBossJTA implementation is most likely beyond the scope of this document (at least for now).

Infinispan Configuration Options

Depending on the configured Infinispan backend, the directory layout and the contents of the binary files will differ. The following sections cover other cache store options.

...

  1. File .log holds entries for recent transactions. The relevant API for representing these entries is modeshape-schematics (see, e.g., org.infinispan.schematic.SchematicEntry).
  2. File .sst stores these entries once the .log file reaches a size threshold; a new .log file is then generated.
  3. File MANIFEST-x records information about the .sst files (among other things).
  4. File CURRENT specifies the current MANIFEST file.

Apart from CURRENT, which is plain text, these files are binary and can be read with a LevelDB Java library.
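Because CURRENT is plain text, the active manifest can be found without any LevelDB library. The sketch below assumes a standard LevelDB directory layout, in which CURRENT holds a name such as MANIFEST-000002 followed by a newline, and that it is pointed at the cache store's data directory. The class and method names are hypothetical. Note that newer LevelDB versions name table files *.ldb instead of *.sst.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: inspect a LevelDB directory using only the JDK.
final class LevelDbFiles {
    // CURRENT names the active MANIFEST file, followed by a newline.
    static String currentManifest(Path dbDir) throws IOException {
        String name = Files.readString(dbDir.resolve("CURRENT"), StandardCharsets.UTF_8).trim();
        if (!name.startsWith("MANIFEST-")) {
            throw new IOException("unexpected CURRENT contents: " + name);
        }
        return name;
    }

    // Lists file names ending in the given suffix (e.g. ".log" or ".sst").
    // LevelDB file numbers are zero-padded, so name order follows creation order.
    static List<String> filesWithSuffix(Path dbDir, String suffix) throws IOException {
        try (Stream<Path> s = Files.list(dbDir)) {
            return s.map(p -> p.getFileName().toString())
                    .filter(n -> n.endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```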

Inspecting Individual Binary Data Files

As is the case with the FileCacheStore, the .log files are binary. The default initial .log file is 000003.log, and it contains serialized entries for all the nodes from recent transactions. If the repository has only a few hundred added nodes, this file will contain all of them (i.e., the added nodes along with default mixins and properties such as mix:etag).
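The .log files follow LevelDB's write-ahead log format, which can be read without a LevelDB library. The sketch below follows the format described in LevelDB's doc/log_format.md: the file is split into 32 KiB blocks, and each record carries a 7-byte header (masked CRC32C, little-endian length, and a type of FULL, FIRST, MIDDLE or LAST). It assumes the store writes standard LevelDB logs and does not verify checksums. It decodes only the WriteBatch header (sequence number and entry count); the serialized repository entries inside each batch are left undecoded. LevelDbLogReader is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Sketch of a LevelDB log reader; checksums are skipped, not verified.
final class LevelDbLogReader {
    static final int BLOCK_SIZE = 32 * 1024; // logs are split into 32 KiB blocks
    static final int HEADER_SIZE = 7;        // crc32c (4) + length (2) + type (1)

    // Reassembles logical records; in the database .log each one is a WriteBatch.
    static List<byte[]> readRecords(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> records = new ArrayList<>();
        ByteArrayOutputStream fragments = new ByteArrayOutputStream();
        while (buf.remaining() >= HEADER_SIZE) {
            int leftInBlock = BLOCK_SIZE - buf.position() % BLOCK_SIZE;
            if (leftInBlock < HEADER_SIZE) {              // zero-filled block trailer
                buf.position(buf.position() + leftInBlock);
                continue;
            }
            buf.getInt();                                 // masked CRC32C, ignored
            int length = Short.toUnsignedInt(buf.getShort());
            int type = buf.get() & 0xff;
            if (type == 0 || length > buf.remaining()) {
                break;                                    // preallocated zeros or truncated tail
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            switch (type) {
                case 1: records.add(payload); break;      // FULL
                case 2: fragments.reset();                // FIRST
                        fragments.writeBytes(payload); break;
                case 3: fragments.writeBytes(payload); break; // MIDDLE
                case 4: fragments.writeBytes(payload);    // LAST
                        records.add(fragments.toByteArray());
                        fragments.reset(); break;
                default: throw new IllegalStateException("unknown record type " + type);
            }
        }
        return records;
    }

    // A WriteBatch begins with an 8-byte sequence number and a 4-byte entry count.
    static long batchSequence(byte[] batch) {
        return ByteBuffer.wrap(batch).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
    }

    static int batchCount(byte[] batch) {
        return ByteBuffer.wrap(batch).order(ByteOrder.LITTLE_ENDIAN).getInt(8);
    }
}
```

For example, passing the bytes of 000003.log to readRecords would return one WriteBatch per logged write, and batchCount reports how many entries each batch carries.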