Time/Place
- Time: 12:00pm Eastern Daylight Time US (UTC-4)
- Place: Google-hangout, https://plus.google.com/hangouts/_/event/c1glu6soq43r1rr6ou17qtobug8
Attendees
- Andrew Woods
- Esme Cowles
- Chris Beer
- Daniel Davis*
- Declan Fleming
- A. Soroka
- Benjamin Armintor
- Zhiwu Xie
- Neil Jefferies (having issues with my data connection - may not be in the call if I can't fix it!)
- Yinlin Chen
Note-taker =
Previous note-taker = *
Agenda
- Review of areas of assessment
- Action Item: Enhance descriptions of different areas (particularly 6, 7, and 8)
- Architecture walk-through
- Notes as comments on the wiki page.
- Review to-date performance testing summary
- Assign owners to (some number of) areas of assessment
- Thought exercise: "What would be the technical 'risks' of releasing 4.0 Production *now*?"
- Or, put another way: "Where do we want to put next sprint's dev energy?"
Discussion
- Architecture walk-through
- Message-emitter should be added to F4 diagram
- Do we need two diagrams? - no
- as implemented
- aspirational
- It would be beneficial to define ci-tests
- There was interest in testing how to extend the code
- Next meeting: Wednesday next week (8/27) at noon ET
Actions
- Esme to investigate current ModeShape development roadmap and how it aligns with F4
- clustering, etc
- Adam and Ben to assess REST-API (goal of versioning this API)
- Dan to enhance descriptions of "Areas of Assessment" numbers 6, 7, and 8
- Neil to define initial set of system CI tests
2 Comments
Neil Jefferies
Some illustrative digital collection profiles for the Bod...
Does not include research data which has the potential to grow at approximately the same total volume as above per annum!
Chris Beer
Write performance
Our most time-sensitive collection (where ingest performance and throughput are important) is a feed of scanned books from an external vendor. With Fedora 3, we managed to pull material from the vendor at a rate of 300 books/hour. Each book was estimated at about 50 MB and may easily contain several hundred page images. The entire dataset is likely around 5 million books.
Most other collections have no ingest performance targets, other than "fast enough".
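To put the scanned-book figures above in perspective, here is a rough back-of-envelope sketch. It assumes the ingest runs continuously around the clock at the Fedora 3 rate, that 50 MB is a fair average per book, and that sizes use decimal units; the function name `ingest_estimate` is purely illustrative.

```python
# Figures quoted in the comment above; all are estimates.
BOOKS_PER_HOUR = 300        # Fedora 3 ingest rate from the vendor feed
MB_PER_BOOK = 50            # estimated average size per book
TOTAL_BOOKS = 5_000_000     # approximate size of the full dataset


def ingest_estimate(books_per_hour, mb_per_book, total_books):
    """Return (MB/s throughput, total days, total TB) for a sustained ingest.

    Assumes a continuous 24-hour ingest with no downtime (an assumption,
    not something stated in the notes).
    """
    mb_per_second = books_per_hour * mb_per_book / 3600
    total_days = total_books / books_per_hour / 24
    total_tb = total_books * mb_per_book / 1_000_000  # decimal: 1 TB = 10^6 MB
    return mb_per_second, total_days, total_tb


mb_s, days, tb = ingest_estimate(BOOKS_PER_HOUR, MB_PER_BOOK, TOTAL_BOOKS)
print(f"{mb_s:.1f} MB/s sustained, {days:.0f} days, {tb:.0f} TB")
# prints "4.2 MB/s sustained, 694 days, 250 TB"
```

Under these assumptions the full dataset would take roughly two years at the Fedora 3 rate, which suggests why ingest throughput matters for this collection.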
Read performance
Our repository currently averages 5-10 data change operations per minute, with regular bursts of changes. Indexing operations should be fast enough to keep up with these changes, and we should be able to scale the repository out to handle the read load.
Currently, we can index about 10 objects / second. That figure includes pulling all the object metadata from the repository from ~10 XML datastreams, and often a handful of other supporting objects (collections, policies, etc.). At that rate, we can reindex our entire repository in under a couple of days. Fedora 4 should have comparable or better performance.
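As a rough illustration of what these numbers imply, the sketch below derives an upper bound on repository size and the indexing headroom over the average change rate. It reads "a couple of days" as exactly 2 days and assumes each change operation triggers one object reindex; both are assumptions, and the function names are hypothetical.

```python
INDEX_RATE_PER_SEC = 10        # objects indexed per second (from the comment)
CHANGE_OPS_PER_MIN = (5, 10)   # average data change operations per minute
REINDEX_WINDOW_DAYS = 2        # assumption: "a couple of days" taken as 2 days


def max_objects_reindexed(rate_per_sec, window_days):
    """Upper bound on objects that can be reindexed within the window."""
    return rate_per_sec * window_days * 24 * 3600


def indexing_headroom(rate_per_sec, change_ops_per_min):
    """Ratio of indexing capacity to change rate.

    Assumes one object reindex per change operation (an assumption).
    """
    return rate_per_sec * 60 / change_ops_per_min


print(max_objects_reindexed(INDEX_RATE_PER_SEC, REINDEX_WINDOW_DAYS))
# prints 1728000
for ops in CHANGE_OPS_PER_MIN:
    print(ops, indexing_headroom(INDEX_RATE_PER_SEC, ops))
# prints "5 120.0" then "10 60.0"
```

On these assumptions the average change rate leaves a 60x to 120x margin, so the open question is how large the bursts are, which the comment does not quantify.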