Info: TODO: The files need to be regenerated, since the object name/text was changed.
Walkthrough
The following steps simulate a typical user session. The end result (i.e., the resulting layout of files and directories) is then shown.
...
- In the case of FileCacheStore, the file names are just signed 32-bit integers, so a file name is at most 11 characters long (10 digits plus an optional minus sign), and most are 10 or 11 characters long. The file names are generated by computing the Java hash code of the node's UUID string (prefixed with the root node UUID) and masking it with a fixed 22-bit mask. For example, the root node is stored in file -891938816, which specifies the root node's children.
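The naming rule above can be sketched in Java. This is a minimal sketch under two assumptions the text does not spell out: that the hash is `String.hashCode()` of the key string, and that the 22-bit mask keeps the upper 22 bits (`0xFFFFFC00`) and clears the lower 10. The second assumption is at least consistent with file names shown in this walkthrough, such as -891938816 and 39043072, which are both multiples of 1024. `BucketFileName` and `bucketFor` are hypothetical names, and the key string is illustrative, not necessarily the exact value the store hashes.

```java
// Sketch only: derives a FileCacheStore-style file name from a node key.
// ASSUMPTIONS (not confirmed by this page): the store hashes the key with
// String.hashCode(), and the "22-bit mask" keeps the high 22 bits.
final class BucketFileName {
    // High 22 bits set, low 10 bits cleared (assumed reading of "22-bit mask").
    static final int MASK = 0xFFFFFC00;

    // Returns the signed 32-bit value that would be used as the file name.
    static int bucketFor(String key) {
        return key.hashCode() & MASK;
    }

    public static void main(String[] args) {
        // Illustrative key: the root node key as it appears in the dumps.
        String rootKey = "87a0a8c7505d64/";
        int bucket = bucketFor(rootKey);
        System.out.println(rootKey + " -> file " + bucket);

        // With the low 10 bits cleared, every name is a multiple of 1024, and only
        // 2^22 distinct names exist, so different keys can land in the same file.
        System.out.println("multiple of 1024: " + (bucket % 1024 == 0));

        // Longest possible name: Integer.MIN_VALUE is unchanged by the mask.
        String longest = Integer.toString(Integer.MIN_VALUE & MASK);
        System.out.println(longest + " has " + longest.length() + " characters");
    }
}
```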
Each file contains serialized ModeShape nodes. Although the generated files are binary, the data can be read with a tool that understands both BSON and JBoss serialization, or through the ModeShape API itself (a simple utility for this is under development). It does not seem possible to read these files with existing BSON tools, such as MongoDB's bsondump. For example, for our root node file (-891938816), such a tool would represent the root node data in JSON as follows:
```js
{
  "properties" : {
    "http://www.jcp.org/jcr/1.0" : {
      "primaryType" : { "$name" : "mode:root" },
      "uuid" : "87a0a8c7505d64/"
    }
  },
  "children" : [
    { "key" : "87a0a8c317f1e7jcr:system", "name" : "jcr:system" },
    { "key" : "87a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152", "name" : "greetings_en" }
  ],
  "childrenInfo" : { "count" : 2 }
}
```
As an aside, such a tool could also show the full file-to-node content mapping:
```
-891938816   UUID: 87a0a8c7505d64                        mode:root
-1857312768  UUID: 17c366d9-25bc-4e2d-9e53-4428fe7b8152  greetings_en
328781824    UUID: 7e84184d-3a8e-4f57-8e49-a550f1c19b3a  greetings_en/ds0
39043072     UUID: fc859bab-6f7b-46de-9b4c-f40dfe28643c  hello, world!
```
- Serialization is done by the JBoss Marshalling library, not by the JDK's native object serialization machinery, so the generated files look different from ordinary JDK serialized files. (See the section on node contents for details.) JBoss Marshalling can be configured to use custom serialization classes that read and write content in a format of their choosing (in this case, BSON).
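The data is encoded in Binary JSON (BSON). If the file containing the root node (which references the node 'greetings_en') is opened in a hex editor, you would see, in accordance with the BSON spec, \u0002 preceding string fields (such as "name" and "key"), \u0004 preceding an array (here, the list of child nodes), and \u0003 preceding an embedded document (here, each child entry in that array, whose field names are the indices "0", "1", ...). Each string value is preceded by its length as a little-endian 32-bit integer that counts the trailing null byte.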
```
"\u0000\u0000\u0000\u0004children\u0000ü\u0000\u0000\u0000\u00030\u0000<\u0000\u0000\u0000 \u0002key\u0000\u0019\u0000\u0000\u000087a0a8c317f1e7jcr:system\u0000\u0002name\u0000\u000B\u0000\u0000\u0000jcr:system \u0000\u0000\u00031\u0000X\u0000\u0000\u0000\u0002key\u00003\u0000\u0000\u000087a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152\u0000\u0002name\u0000\u0000\u0000\u0000greetings_en\u0000\u0000\u0000\u0003childrenI";
```
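The size bytes in the dump above follow directly from the BSON layout. The sketch below is a minimal encoder written from the BSON specification, not ModeShape's or Infinispan's serializer; `BsonSketch` and its methods are hypothetical names, and it handles only string fields. Encoding the two child entries reproduces the sizes visible in the dump: the jcr:system entry is 60 bytes (the '<' after \u00030\u0000) and the greetings_en entry is 88 bytes (the 'X' after \u00031\u0000).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal BSON encoder for documents with string fields only (illustration, per the BSON spec).
final class BsonSketch {
    // BSON int32 values are little-endian.
    static byte[] int32(int v) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
    }

    // String element: 0x02, field name as C string, int32 (UTF-8 length + 1), bytes, 0x00.
    static byte[] stringElement(String name, String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        out.write(0x02);
        out.write(n, 0, n.length);
        out.write(0x00);
        out.write(int32(v.length + 1), 0, 4);
        out.write(v, 0, v.length);
        out.write(0x00);
        return out.toByteArray();
    }

    // Document: int32 total size (counting itself and the final 0x00), elements, 0x00.
    static byte[] document(Map<String, String> fields) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            byte[] el = stringElement(e.getKey(), e.getValue());
            body.write(el, 0, el.length);
        }
        body.write(0x00);
        byte[] b = body.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(int32(b.length + 4), 0, 4);
        out.write(b, 0, b.length);
        return out.toByteArray();
    }

    // A child entry shaped like the ones in the "children" array: {key, name}, in that order.
    static Map<String, String> child(String key, String name) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("key", key);
        m.put("name", name);
        return m;
    }

    public static void main(String[] args) {
        byte[] sys = document(child("87a0a8c317f1e7jcr:system", "jcr:system"));
        byte[] greet = document(child("87a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152", "greetings_en"));
        // The first size byte is what shows up as a character in a text view.
        System.out.println(sys.length + " '" + (char) sys[0] + "'");     // prints 60 '<'
        System.out.println(greet.length + " '" + (char) greet[0] + "'"); // prints 88 'X'
    }
}
```

The same arithmetic explains the other prefixes: the "name" value "jcr:system" is preceded by \u000B (11 = 10 characters plus the null terminator).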
The actual binary file (e.g. -891938816) looks something like this when opened up in a text editor:
```
87a0a8c7505d641/7org.infinispan.schematic.internal.SchematicEntryLiteral6org.infinispan.marshall.jboss.JBossExternalizerAdapterq externalizer$org.infinispan.marshall.ExternalizerDorg.infinispan.schematic.internal.SchematicEntryLiteral $Externalizer7org.infinispan.schematic.internal.SchematicExternalizer8org.infinispan.schematic.internal.document.BasicDocument;?org.infinispan.schematic.internal.document.DocumentExternalizer2; 23metadata?id87a0a8c7505d64/contentTypeapplication/jsoncontent>propertiesghttp://www.jcp.org/jcr/1.0FprimaryType$namemode:root3uuid 87a0a8c7505d64/children04<key87a0a8c317f1e7jcr:systemnamejcr:system1Skey387a0a8c7505d64 53f49a07-8e14-41a1-bab3-abc59d86846enamechandni5 system1key387a0a8c7505d6417c366d9-25bc-4e2d-9e53-4428fe7b8152name greetings_en5childrenInfocount556
```
Some of the key elements (annotated in the dump) are:
- 87a0a8c7505d64 refers to the root node UUID.
- The grayed-out text refers to the ModeShape classes responsible for data representation and serialization. As described above, node serialization is done by JBoss Marshalling configured with custom serialization classes that write BSON, so the serialized form contains the names of the data structures and of ModeShape's custom marshallers (classes in org.infinispan.schematic.internal.*).
- Node name.
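- The file contains binary data encoding various internal attributes, so when opened in a text editor it may show garbage characters, or stray characters and digits inside strings, as in /children0<key87a0a8c317f1e7jcr:systemnamejcr:system1Skey387a0a8c7505d64. Here "0" and "1" are the array indices of the children, "<" and "S" are the first bytes of each entry's BSON size (60 and 83), and the "3" before the second key is the length byte of that 51-byte key string. Similarly, if the object 'chandni' had siblings, the array would appear in the binary as . . .:
- Child node UUIDs and names (6988a17a-ad2f-48f6-aef5-e7d411e184d9namechandni).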
- Children count in binary.
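Readable fragments like the ones above can be pulled out of a store file with a small strings-style scan. This is a minimal sketch, assuming you only want runs of printable ASCII (the Unix `strings` tool does much the same); `PrintableRuns` is a hypothetical name, and the file path is supplied on the command line. Size and length bytes that happen to be printable, such as '<', 'S', or '3', stay attached to the neighboring text, which is why they appear inside strings.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Extracts runs of printable ASCII (0x20..0x7E) from binary data, like `strings`.
final class PrintableRuns {
    static List<String> runs(byte[] data, int minLength) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 0x20 && c < 0x7F) {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run; keep it if long enough.
                if (current.length() >= minLength) result.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() >= minLength) result.add(current.toString());
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a cache-store file, e.g. one named -891938816.
        Path file = Paths.get(args[0]);
        for (String s : runs(Files.readAllBytes(file), 4)) {
            System.out.println(s);
        }
    }
}
```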
...
Similarly, for the datastream content (omitting ModeShape API artifacts):
```
87a0a8c7505d641 8e6504dd 87a0a8c7505d64fc859bab- 6f7b- 46de- 9b4c-f40dfe28643c1 . . .2 metadatabid387a0a8c7505d6 4fc859bab-6f7b-46de-9b4c-f40dfe286433ccontentTypeapplication/ jsoncontent4Ékey387a0a8c7505d64fc859bab-6f7b-46de-9b4c- f40dfe28643c3parent387a0a8c7505d647e84184d-3a8e-4f57-8e49-a550f1c19b3a5properties ̃http SchematicEntryLiteral$Externalizer7org.infinispan.schematic.internal.SchematicExternalizer8org.infinispan.schematic.internal.document.BasicDocument; ?org.infinispan.schematic.internal.document.DocumentExternalizer;23metadatabid387a0a8c7505d642 8e6504dd-1fb8-4011-8d38-4fcd2e46c0f7 contentTypeapplication/jsoncontent3 key87a0a8c7505d648e6504dd-1fb8-4011-8d38-4fcd2e46c0f74 parent387a0a8c7505d6446c11a1e-b2d9-496c-a950-5bd8cf7f2096 http:// www.jcp.org/jcr/1.0 +"primaryType$name nt:resourcedata hello, world!6 lastModified. $date2013-12- 09T23: 51:00. 520-05:00 7mixinTypesL0D$name4{http://fedora.info/ definitions/v4/rest-api#}binarylastModifiedBy bypassAdminmimeTypeapplication/octet-stream 9http://fedora.info/definitions/v4/rest- api#NdigestA $uri2urn:sha1: e91ba0972b9055187fa2efa8b5c156f487a8293a10http://www.loc.gov/premis/rdf/ v1#hasSize55
```
Some of the elements of interest are:
- Root UUID
- Omitted serialization artifacts (see the note on the parent node for details).
- Content type of the document (here, the default content type for documents). See 8.
- UUID (repeats)
- Parent UUID (datastream)
- Actual text content ("hello, world!" in this case).
- Last modified date.
- Last modified by (here, bypassAdmin).
- Content type of datastream.
- SHA-1 digest generated by Fedora.
The corresponding readable representation is:
```js
{ "metadata" : { "id" : "87a0a8c7505d64/" , "contentType" : "application/json" } , "content" : { "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : {
```
...
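The SHA-1 digest recorded for the datastream (the urn:sha1: value in the dump) can be recomputed for comparison. This is a minimal sketch using the JDK's `MessageDigest`, assuming the digest is a plain SHA-1 over the stored binary content; this walkthrough does not confirm exactly which bytes Fedora hashes, and `Sha1Urn` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Computes a SHA-1 digest and formats it as urn:sha1:<lowercase hex>.
final class Sha1Urn {
    static String urn(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder out = new StringBuilder("urn:sha1:");
            for (byte b : digest) {
                out.append(String.format("%02x", b & 0xFF)); // two hex digits per byte
            }
            return out.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the datastream's stored bytes are exactly this text.
        byte[] content = "hello, world!".getBytes(StandardCharsets.UTF_8);
        System.out.println(urn(content));
    }
}
```

If the printed value differs from the one in the dump, the stored bytes are not exactly that text (for example, a trailing newline would change the digest).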
Inspecting Indexing Folder
...