...

Some key things to note about the files and their contents:

  • The file names in the case of FileCacheStore are just signed 32-bit integers, so file names are usually 10 or 11 characters long, 11 being the maximum (ten digits plus a minus sign). The file names (i.e. integers) are generated by calculating the Java hashCode of the node's UUID string (prefixed with the root node UUID) and masking it with a fixed 22-bit mask. For example, the following Java snippet calculates the file name for the root node (UUID "87a0a8c7505d64/"):

    Code Block
    languagejava
    int fileName = "87a0a8c7505d64/".hashCode() & 0xfffffc00; // equals -891938816

    Therefore, the file -891938816 would contain data about the root node, including its children.


    AWoods: Osman Din, can you more specifically walk through this example of how -891938816 is created? i.e. write out the inputs to the hashing function. It would be great if your example would allow a user who knew a node's UUID to be able to calculate in which integer-named file the serialization of the node could be found.


  • Each of the files contains serialized ModeShape nodes. Although the generated files are binary, a tool (a simple utility is under development) can help in reading their contents; it does not seem possible to read these files using MongoDB's bsondump. For example, for our root node file (-891938816), this tool would represent the root node data in JSON as follows:

    Code Block
    languagejs
    {
      "properties" : {
        "http://www.jcp.org/jcr/1.0" : {
          "primaryType" : { "$name" : "mode:root" },
          "uuid" : "87a0a8c7505d64/"
        }
      },
      "children" : [
        { "key" : "87a0a8c317f1e7jcr:system", "name" : "jcr:system" },
        { "key" : "7c366d9-25bc-4e2d-9e53-4428fe7b8152", "name" : "greetings_en" }
      ],
      "childrenInfo" : { "count" : 2 }
    }

    As an aside, this tool also proposes to show the full file-to-content mapping: 

    Code Block
    -891938816   UUID  87a0a8c7505d64                        mode:root
    -1857312768  UUID  7c366d9-25bc-4e2d-9e53-4428fe7b8152   greetings_en
    328781824    UUID  7e84184d-3a8e-4f57-8e49-a550f1c19b3a  greetings_en/ds0
    39043072     UUID  fc859bab-6f7b-46de-9b4c-f40dfe28643c  hello, world!
  • The serialization is done by the JBoss Marshalling library, not the JDK's native object serialization machinery. For this reason the generated files look different from an ordinary JDK serialized file. JBoss Marshalling can be configured to use custom serialization classes that read and write content in the format of the repository's choosing.

  • The data is encoded in Binary JSON (BSON). If the file containing the root node (referencing the node 'greetings_en') is opened in a hex editor, you would see \u0002 preceding strings (such as "name" and "key"), \u0004 preceding an array (representing sub-nodes), and \u0003 preceding an embedded document (here, each child entry with its key and name), in accordance with the BSON spec.

    Code Block
    \u0000\u0000\u0000\u0004children\u0000ü\u0000\u0000\u0000\u00030\u0000<\u0000\u0000\u0000
    \u0002key\u0000\u0019\u0000\u0000\u000087a0a8c317f1e7jcr:system\u0000\u0002name\u0000
    \u000B\u0000\u0000\u0000jcr:system\u0000\u0000\u00031\u0000X
    \u0000\u0000\u0000\u0002key\u00003\u0000\u0000\u000087a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152\u0000\u0002name\u0000\u0000\u0000\u0000greetings_en\u0000\u0000\u0000\u0003childrenI
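
To make the file-name calculation from the first item above easier to reproduce, here is a minimal Java sketch. It assumes, as the snippet in that item does, that the input is the full key string as stored (for the root node, "87a0a8c7505d64/"). The class and method names (FileNameSketch, fileNameFor) are hypothetical and are not part of ModeShape or Infinispan.

```java
// Sketch only: FileNameSketch and fileNameFor are hypothetical names,
// not ModeShape or Infinispan API.
class FileNameSketch {
    // Mask from the snippet above: keeps the upper 22 bits, clears the lower 10.
    static final int MASK = 0xfffffc00;

    // Maps a node key string to the integer that is used as the file name.
    static int fileNameFor(String nodeKey) {
        return nodeKey.hashCode() & MASK;
    }

    public static void main(String[] args) {
        String rootKey = "87a0a8c7505d64/";       // root node key from the example
        System.out.println(rootKey.hashCode());   // raw hash code: -891938418
        System.out.println(fileNameFor(rootKey)); // after masking: -891938816
    }
}
```

Because the mask clears the lowest 10 bits, every result is a multiple of 1024, and keys whose hash codes differ only in those bits map to the same file name.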

...

  1. 87a0a8c7505d64 refers to the node UUID.

  2. The grayed-out text refers to ModeShape classes responsible for data representation and serialization. The JBoss Marshalling serialization format contains the names of data structures, i.e. custom marshallers in org.infinispan.schematic.internal.*

  3. Node name.
  4. The file contains binary data specifying different internal attributes, so when it is opened in an ordinary text editor, the editor might show some numbers as garbage characters or as digits inside strings. These numbers have meaning for the node's attributes, though. For example, the highlighted entry #4 specifies an array of the children of a node. The index number of each child precedes its key and name. So, if object "greetings_en" gets another sibling "greetings_fr", the latter's entry would be preceded by 2 (its index). The full children array would now appear something like:

    /children0<key87a0a8c317f1e7jcr:systemnamejcr:system

    1Xkey387a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152namegreetings_en

    2Xkey387a0a8c7505d64aa21b2ee-d22a-4b0a-8b43-a8e8b58c5ec6namegreetings_fr

  5. Children count in binary.
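
The layout of the children array in item 4 can be illustrated with a small Java sketch that encodes the three example children as BSON and walks the bytes back. It is only a sketch under stated assumptions: the class BsonChildrenSketch and its methods are hypothetical, it handles only the BSON element types 0x02 (string), 0x03 (embedded document) and 0x04 (array), and it builds its own bytes rather than reading a real cache file, which also carries JBoss Marshalling data around the BSON content.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: a hypothetical helper class, not part of ModeShape or Infinispan.
class BsonChildrenSketch {

    // BSON document: int32 total length (little-endian), the elements, then 0x00.
    static byte[] document(byte[]... elements) {
        int len = 4 + 1;
        for (byte[] e : elements) len += e.length;
        ByteBuffer b = ByteBuffer.allocate(len).order(ByteOrder.LITTLE_ENDIAN).putInt(len);
        for (byte[] e : elements) b.put(e);
        return b.put((byte) 0).array();
    }

    // BSON element: type byte, NUL-terminated name, then the already-encoded value.
    static byte[] element(int type, String name, byte[] value) {
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.allocate(1 + n.length + 1 + value.length)
                .put((byte) type).put(n).put((byte) 0).put(value).array();
    }

    // BSON string value: int32 length (including the trailing 0x00), UTF-8 bytes, 0x00.
    static byte[] stringValue(String s) {
        byte[] v = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + v.length + 1).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(v.length + 1).put(v).put((byte) 0).array();
    }

    // One child entry: an embedded document holding "key" and "name" strings.
    static byte[] child(String key, String name) {
        return document(element(0x02, "key", stringValue(key)),
                        element(0x02, "name", stringValue(name)));
    }

    // Recursively walks a document and records "path = value" for every string element.
    // Only types 0x02, 0x03 and 0x04 are handled; anything else is rejected.
    static void walk(ByteBuffer buf, String path, List<String> out) {
        int end = buf.position() + buf.getInt();
        while (buf.position() < end - 1) {
            int type = buf.get();
            StringBuilder name = new StringBuilder();
            for (byte c = buf.get(); c != 0; c = buf.get()) name.append((char) c);
            String full = path.isEmpty() ? name.toString() : path + "/" + name;
            if (type == 0x02) {
                byte[] raw = new byte[buf.getInt() - 1];
                buf.get(raw);
                buf.get(); // trailing 0x00 of the string
                out.add(full + " = " + new String(raw, StandardCharsets.UTF_8));
            } else if (type == 0x03 || type == 0x04) {
                walk(buf, full, out);
            } else {
                throw new IllegalArgumentException("unsupported BSON type " + type);
            }
        }
        buf.get(); // 0x00 terminator of the document
    }

    // Builds the three-child array from the example in item 4 and decodes it again.
    static List<String> decodeSample() {
        byte[] doc = document(element(0x04, "children", document(
                element(0x03, "0", child("87a0a8c317f1e7jcr:system", "jcr:system")),
                element(0x03, "1", child("87a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152", "greetings_en")),
                element(0x03, "2", child("87a0a8c7505d64aa21b2ee-d22a-4b0a-8b43-a8e8b58c5ec6", "greetings_fr")))));
        List<String> out = new ArrayList<>();
        walk(ByteBuffer.wrap(doc).order(ByteOrder.LITTLE_ENDIAN), "", out);
        return out;
    }

    public static void main(String[] args) {
        decodeSample().forEach(System.out::println);
    }
}
```

In this sketch the jcr:system entry encodes to 60 bytes and each greetings entry to 88 bytes. Those are the byte values of the characters `<` (0x3C) and `X` (0x58) that a text editor shows right after each index, and the `3` (0x33, i.e. 51) after `key` is the length of the 50-character key plus its terminating zero byte.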

...