...

Code Block
> ls fcrepo4-data
com.arjuna.ats.arjuna.common.ObjectStoreEnvironmentBean.default.objectStoreDir
com.arjuna.ats.arjuna.objectstore.objectStoreDir
fcrepo.activemq.dir
fcrepo.ispn.repo.CacheDirPath
fcrepo.modeshape.index.location

The directory "fcrepo.ispn.repo.CacheDirPath" contains the generated data files, and "fcrepo.modeshape.index.location" contains the Lucene index.

...

  • With FileCacheStore, the file names are just signed integers, so a file name is usually 10 or 11 characters long and never longer (ten digits plus an optional minus sign). Each file name is generated by computing the Java hash code of the node's UUID string and applying a fixed 22-bit mask to it. For example, the file -891938816 is generated for the root node and contains the root node's children.

  • Each file contains serialized ModeShape nodes. Although the generated files are binary, the data can be read with a tool that understands both BSON and JBoss serialization, or through the ModeShape API itself. (It does not seem possible to read these files with existing BSON tools such as MongoDB's bsondump.) A tool such as X can help in reading the contents of a file. For example, for our root node file (-891938816), such a tool could show the root node data as follows:

    Code Block
    { "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "mode:root" } , "uuid" : "87a0a8c7505d64/" } } ,
     "children" : [ { "key" : "87a0a8c317f1e7jcr:system" , "name" : "jcr:system" } , 
    { "key" : "87a0a8c7505d646988a17a-ad2f-48f6-aef5-e7d411e184d9" , "name" : "chandni" } ] ,
     "childrenInfo" : { "count" : 2 } }

    As an aside, X would show the full file-node mapping as: 

    Code Block
    -891938816 UUID:87a0a8c7505d64 mode:root
    1036310528 UUID:    greetings_en
    -112221211 UUID:    hello, world!         
  • The serialization is done by a JBoss serialization library, not by the JDK's native object serialization machinery. For this reason, the generated files look different from ordinary JDK-serialized files. (Please refer to the section on node contents for details.)
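
The file-naming rule described in the first bullet above can be sketched in Java. This is only an illustration under stated assumptions, not the Infinispan source: the class and method names are hypothetical, the mask value 0xfffffc00 (keep the upper 22 bits of the hash code, clear the lower 10) is an assumption inferred from the 22-bit mask mentioned above, and the exact string that gets hashed is assumed to be the node key.

```java
// Hypothetical sketch: derives a FileCacheStore-style file name from a node key.
final class BucketNameSketch {

    // ASSUMPTION: a fixed mask that keeps the upper 22 bits of the 32-bit
    // hash code and clears the lower 10, so that many keys share one file.
    static final int ASSUMED_MASK = 0xfffffc00;

    // Returns the signed integer that would serve as the file name.
    static int bucketFor(String nodeKey) {
        return nodeKey.hashCode() & ASSUMED_MASK;
    }

    public static void main(String[] args) {
        // Key copied from the root node example. The printed value depends on
        // the assumed mask and on which string is actually hashed.
        String key = "87a0a8c7505d64/";
        int bucket = bucketFor(key);
        System.out.println(bucket + " (name length "
                + Integer.toString(bucket).length() + ")");
    }
}
```

Because a Java int ranges from -2147483648 to 2147483647, any name produced this way is at most 11 characters long including the sign, which matches the length noted above.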

Running X reveals the file-node mapping:

Code Block
-891938816 UUID:87a0a8c7505d64 mode:root
1036310528 UUID:    chandni           

 

The actual binary file (e.g. -891938816) looks something like this when opened up in a text editor:

...

  1. 87a0a8c7505d64 is the root node's UUID.

  2. The grayed-out text refers to the ModeShape classes responsible for data representation and serialization. Node serialization is done by the JBoss Marshalling framework, which can be configured to use custom serialization classes that read and write content in a format of their choosing (in this case, BSON). The serialized output contains the names of the data structures and of the ModeShape custom marshallers (classes in org.infinispan.schematic.internal.*).

  3. Node name.
  4. The file contains binary data that encodes various internal attributes, so when it is opened in a text editor, garbage characters or numbers may appear inside strings, for example: /children0<key87a0a8c317f1e7jcr:systemnamejcr:system1Skey387a0a8c7505d64. Similarly, if object 'chandni' had siblings, the array would appear in the binary as . . .:
  5. Child node UUID and name (6988a17a-ad2f-48f6-aef5-e7d411e184d9namechandni).
  6. Children count in binary.
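
When no BSON-aware tool is at hand, the readable fragments listed above (keys, node names, class names) can still be pulled out of a cache file by scanning for runs of printable ASCII, much like the Unix `strings` utility. The sketch below is a hypothetical helper written for illustration. It does not decode BSON or JBoss Marshalling; the class name, method name, and minimum run length are all assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts readable ASCII runs from a binary cache file.
final class PrintableRuns {

    // Collects every run of printable ASCII bytes that is at least
    // minLength characters long.
    static List<String> extract(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b < 0x7f) {
                current.append((char) b);
            } else {
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the file.
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PrintableRuns <cache-file>");
            return;
        }
        // args[0] is the path to a cache file, such as the root node file.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : extract(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Length prefixes and type markers that happen to fall in the printable range will still be glued onto neighboring strings, as in the fragment shown in item 4, so the output is a rough aid for orientation rather than a faithful decoding.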

...