Table of Contents
Data Structure Approaches
In-memory Processing | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Triple Store Data Structure | In-memory Data Structure | Direct access at loaded position | Direct access URI as ID (pre-loaded) | Iteration foward | Iteration | Append | Append after global last | Insert after loaded item | Insert before loaded item | Insert after loaded position | Insert before loaded position | Delete loaded item | Delete from loaded position | Fetch one by global position | Fetch | Fetch range | Fetch next range | Persist one at global position | Persist one by URI | Persist range | List size | Comments |
doubly linked list | ||||||||||||||||||||||
doubly linked list | NO | SLOW | FAST | FAST | FAST | FAST | FAST | FAST | NO | NO | FAST | NO | NO | FAST | SLOW | SLOW | NO | FAST | SLOW | NO | ||
convert to array on fetch | FAST | SLOW | FAST | FAST | FAST | FAST | FAST | FAST | FAST | FAST | FAST | FAST | NO | FAST | SLOW | SLOW | NO | FAST | SLOW | NO | ||
convert to hash on fetch | NO | FAST | FAST | FAST | FAST | FAST | FAST | FAST | NO | NO | FAST | NO | NO | FAST | SLOW | SLOW | NO | FAST | SLOW | NO | key=proxy URI | |
doubly linked list + order index | ||||||||||||||||||||||
doubly linked list | SLOW | SLOW | FAST | FAST | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | FAST | FAST | SLOW | SLOW | FAST | SLOW | FAST | ||
convert to array on fetch | FAST | SLOW | FAST | FAST | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | FAST | FAST | SLOW | SLOW | FAST | SLOW | FAST | ||
convert to hash (key=proxy URI) on fetch | SLOW | FAST | FAST | FAST | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | SLOW | FAST | FAST | SLOW | SLOW | FAST | SLOW | FAST | key=proxy URI |
Analysis of Triple Store Data Structure
Triple Store Data Structure | Pros | Cons | Comments |
---|---|---|---|
doubly linked list | + all updates effect at most 3 items (i.e., prev, current, and next) | - requires traversal of links during fetch (i.e. fetch each item individually - one query for each item) | |
doubly linked list + order index | + much faster fetch retrieving all items in a range (i.e., single query gets all items) | - adding/removing items from list can be very slow when item is early in the list (worst case touches all list items) |
Analysis of In-memory Data Structure
In-memory Data Structure | Pros | Cons | Comments |
---|---|---|---|
doubly linked list | + least amount of processing during fetch (probably not significant relative to overall processing required) | - iteration requires traversing next links | |
convert to array on fetch | + allows for fast access by position to loaded items | ||
convert to hash on fetch | + allows for fast fast access by proxy URI to loaded items | - iteration requires traversing next links | I anticipate the need to have direct access by proxy URI more than positional. |
Use Cases:
Use Case: Fetch the first N items in the list and display to user; Fetch next N and display; Fetch prev N and display; Fetch page X and display
Scenario:
User is browsing the list starting from the beginning, displaying N items at a time with the ability to move forward and back through the display of N items.
Operations:
- Fetch range starting from first and retrieving the next N items
- Iterate over list from ListHeaderInfo.first_loaded to last_loaded using ListItemInfo.next (next method) to traverse the list
- Display items in order with value=ListItemInfo.proxyFor URI and id=ListItemInfo.uri (Actually will get the title from a proxyFor triple and use that as the display value, but proxyFor's title isn't available from List Item Info. It requires extra processing beyond the ORE GEM.)
- User presses next button causing fetch of next N items starting from ListHeaderInfo.resume_next_token
- Iterate and display
- User presses previous button causing fetch of prev N items starting from ListHeaderInfo.resume_prev_token
- User presses button for page X; Find first item on page X
- For Doubly linked list + order index, the first item on page X with page size = N can be calculated.
- For Doubly linked list, not directly supported. Could do traversal of list (potentially asynchronous) to determine first item on Y pages beyond and preceding the current page.
Analysis:
- In-memory Data Structure
- All three will perform equally well for iteration from first to last.
- Triple Store Data Structure
- Doubly linked list + order index will perform better for fetch N items. However, this is likely to not be significant at low N. Not sure how large N needs to be before this becomes an issue.
- Doubly linked list + order index Triple Store Data Structure will perform much better for fetch page X.
Use Case: Move position of an item in the list
Scenario:
User is browsing the list and drags an item to a new position in the list.
Operations:
- KNOWN: ListItemInfo.uri of item being moved (from id of item in display list) – could also use ListItemInfo.position
- KNOWN: ListItemInfo.uri of item preceding position of insert – could also use ListItemInfo.position
- update links of old prev item, old next item, new prev item, new next item, and item being moved (and potentially list first or last if needed)
- persist all modified resources (perhaps triggered by user clicking Save button or auto-save with each change)
Analysis:
- In-memory Data Structure
- Hash is most efficient with direct access by ListItemInfo.uri (key of hash) and no moves are required within the data structure. Only links are updated.
- Doubly linked list is less efficient as the list has to be traversed to find the items to update. No moves are required within the data structure as only links are updated.
- Array is the least efficient. Direct access using ListItemInfo.position can be used to locate the items, but the array will need to be reordered to reflect the change in order of the list items.
- Triple Store Data Structure
- Both will perform equally well for move.
Use Case: Add (append) item to end of list – default add position
Scenario:
User is browsing the catalog and decides to add one or more bibliographic references to the list.
Operations:
- create new instance of ORE::Proxy
- fetch ListHeaderInfo.last
- update list items prev and next links and list last
- persist all three resources
Analysis:
- In-memory Data Structure
- All three will perform equally well for append.
- Triple Store Data Structure
- Both will perform equally well for append.
List Header Info Structure
The list header info will be a hash. It holds information about the list as a whole and the currently loaded range of items as a sub-structure. The values for the list information stored depends on the type of the loaded items sub-structure. The loaded items sub-structure holds items as a group are in-memory using one of four possible grouping methods: doubly linked lists, doubly linked lists + order index, array, or hash with proxy URI as the key. NOTE: Order index is global to the entire list. The array and hash implementations may also have this global order index information stored with the item info.
example value types | Comments | |||||
---|---|---|---|---|---|---|
key | description of value | doubly linked | doubly linked + order | array | hash | |
first | pointer to first item in list | URI of first proxy | URI of first proxy | URI of first proxy | URI of first proxy | |
last | pointer to last item in list | URI of last proxy | URI of last proxy | URI of last proxy | URI of last proxy | |
count | count of all items in list | X | last.index | last.index | last.index | Count for full list is available for array and hash when order index is stored in triplestore. |
first_loaded | pointer to first item in loaded range of items | item | item | array[0] | URI of first loaded proxy | |
current | pointer to current item | item | item | int cpos | URI of current proxy |
|
last_loaded | pointer to last item in loaded range of items | item | item | array[size of array] | URI of last loaded proxy | |
count_loaded | count of items in currently loaded range | X | =position of last - position of first | size of array | size of hash | |
resume_next_token | pointer to item after last_loaded | URI of proxy | URI of proxy | URI of proxy | URI of proxy | |
resume_prev_token | pointer to item just before first_loaded | URI of proxy | URI of proxy | URI of proxy | URI of proxy | |
loaded_items | pointer to loaded items | use first_loaded | use first_loaded | array | hash |
List Item Info Structure
Each item's info is stored in a hash. It holds information about the individual item. The loaded items as a group are in-memory using one of four possible grouping methods: doubly linked lists, doubly linked lists + order index, array, or hash with proxy URI as the key.
example value types | ||||||
---|---|---|---|---|---|---|
key | description of value | doubly linked | doubly linked + order | array | hash | Comments |
uri | URI of proxy for this item | URI of this proxy | URI of this proxy | URI of this proxy | URI of this proxy | This is the hash key for loaded_items when the list is implemented as a hash. |
prev | pointer to previous item in list | URI of prev proxy | URI of prev proxy | = p - 1 | URI of prev proxy | |
next | pointer to next item in list | URI of next proxy | URI of next proxy | = p + 1 | URI of next proxy | |
prev_loaded | pointer to previous item in loaded range of items | prev item | prev item | = pl - 1 | URI of prev proxy | |
next_loaded | pointer to next item in loaded range of items | next item | next item | = pl + 1 | URI of next proxy | |
proxyFor | URI of list item being aggregated | URI | URI | URI | URI | |
proxyIn | URI of list aggregation | URI of aggregation | URI of aggregation | URI of aggregation | URI of aggregation | |
proxyIn_loaded | pointer to List Header | header | header | header | header | |
position (p) | position in full list | X | p | p | p | Position in full list is available for array and hash when order index is stored in triplestore. |
position_loaded (pl) | position in loaded range of items | X | pl | pl | X |