Time/Place

This meeting is a hybrid teleconference and IRC chat. Anyone is welcome to join; here's the info:

Attendees

Agenda

  1. 4.0 Features, raise any comments/concerns now
    1. Development focus on reaching 4.0
  2. Single-node Transactions issue
  3. Cluster, how do we get over the hump?
  4. other...?

Minutes

  1. 4.0 Release features
    1. Andrew:  walkthrough of feature groups:  candidates for beta/4.0 release, then others (prioritized, unprioritized, parallel feature development in Hydra/Islandora)
    2. Timeline:  4.0 Beta by end of Q1 (April), 4.0 point release by end of Q2
    3. Short-term goal:  communicate the concrete timeline and set of features for the 4.0 release to the community and stakeholders
    4. Andrew:  what do the developers think of the candidate feature list and timeline?
      1. Scott:  are any features on the list in danger of slowing down the project or its release?
      2. Andrew:  none of the proposed features covers 100% of its associated use cases, but all of them are on track to deliver some of the requested functionality for the initial release.  The only risky feature is clustering:  we need to get it working, and we need to get it performant (faster than a single node, and faster than Fedora 3, for CRUD operations)
      3. Stefano:  timeline for releasing robust content modelling features?
      4. Andrew:  Stefano is working on a branch for testing and developing complex CNDs and node types.  Goals:
        1. Get a branch that builds
        2. Get others to test it out
        Stefano's branch may continue in parallel development for some time, beyond the initial 4.0 release.
      5. Stefano:  Goal:  he'll get a working branch by end of Q1
      6. Scott:  Also needed:  documentation on how to create and manipulate CNDs and objects, once Stefano's branch is in working order
        (Andrew asks Stefano to mention that he is working on a branch when describing build problems on the tech list, to avoid confusion about the state of master)
      7. Stefano:  Connector/sequencer features (Fedora/JMS connectors, Fedora/Modeshape filesystem connector)?
      8. Andrew:  Implement the Fedora interface wrapper around the Modeshape filesystem connector interface:  see Modeshape documentation, examples
      9. Frank, Osman:  get clustering working
      10. Andrew:  OK – we'll work on these listed features over the next month and a half.
  2. Single Node Transactions bug
    1. Kai, Mike Durbin, Adam working on it
    2. Background:  overall, Fedora 4 performance is slightly better than Fedora 3's, but calls to session.save() are costly.  Goal of transactions:  reduce the number of session.save() calls by bundling several actions into a single transaction and calling session.save() once at the end of the transaction.
    3. Bug:  when a principal is tied to a transaction, the HTTP session gets wiped out
      1. Scott:  how are multiple sessions and transactions handled when they share a single principal (such as fedoraAdmin)?
      2. Andrew:  the same principal is attached to each of the transactions
    4. Table this discussion, as none of those working on the problem are present
  3. Cluster work
    1. Frank, Scott, Greg working on clustering this sprint
    2. Andrew:  goals are:
      1. Get clusters to work (no bugs, no problems)
      2. Get it performant
    3. Scott:  current work:  has nodes provisioned and managed by Puppet;  working on deploying and managing a Fedora cluster with a Puppet module
      1. Question:  focus on getting something running first, then tooling?
      2. Andrew:  strike a balance between developing tools that make cluster deployment easy and fast, and simply getting a cluster up and running;  any tools that help get a cluster running are welcome, but the priority is to iron out bugs and performance problems in clustered setups
      3. Scott:  will focus on getting a Fedora cluster up and working, by end of Friday
    4. Frank:  working on resolving two problems:
        1. Serialized processing:  have to create parent objects, wait for them to replicate across nodes, then create their child objects (slow)
        2. Unsynchronized commits
      1. Frank:  Goal is to get a 10% speed improvement over Fedora 3 for ingests
        1. the overhead of managing the cluster is high
        2. Denmark use case:  ingesting large numbers of large binary objects (audio/video) – the higher the number of objects being ingested, the slower the cluster runs
      2. Andrew:  can you create properties on the object at the same time you create the object?  Frank agrees and will try that
      3. Scott:  read tests?  (argues that if reads are fast, then maybe slower ingests aren't so important)
      4. Frank:  Hasn't done any thorough read tests yet.  Agrees that most repositories will be largely WORM (write once, read many), but if you can't get content ingested in a timely manner, repository admins will be frustrated, so ingests need to be performant.  Andrew nods in agreement.
      5. Andrew:  are you wrapping your ingests in transactions?
      6. Frank:  not really:  using the Java API directly to perform actions, within a single HTTP session
      7. Andrew:  are you splitting up the ingests across the nodes into groups of ingests per node?
      8. Frank:  using queues.  Queries the cluster for information about its nodes, creates a queue for each node, and feeds ingests into the queues.
      9. Andrew:  using a load balancer?
      10. Frank:  Set one up, then turned it off, to implement the queue-based processing directly against each node
    5. Andrew:  Frank, Scott, Greg:  work together on solving cluster issues (Scott nods head)
      1. Frank:  Scott can help by getting his Fedora cluster minimally configured and reproducing Frank's issues
        Frank will push his scripts for configuring Fedora clusters up to github, for Scott/Greg to adapt and use
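The batching idea behind single-node transactions (item 2 above) can be illustrated with a minimal Python sketch. `MockSession`, `add_node`, and the two ingest functions are hypothetical names used only for illustration, not the Fedora 4 or ModeShape API; the sketch just shows that bundling N actions into one transaction replaces N session.save() calls with a single one.

```python
# Hypothetical sketch: MockSession is an illustrative stand-in, not the
# Fedora 4 or ModeShape API. It only counts how often save() is called.
class MockSession:
    def __init__(self):
        self.pending = []      # changes made but not yet saved
        self.persisted = []    # changes flushed by save()
        self.save_calls = 0

    def add_node(self, path):
        self.pending.append(path)

    def save(self):
        # Assumed to be the expensive step: flush every pending change at once.
        self.persisted.extend(self.pending)
        self.pending.clear()
        self.save_calls += 1


def ingest_one_save_per_action(session, paths):
    for path in paths:
        session.add_node(path)
        session.save()  # one costly save per action


def ingest_in_transaction(session, paths):
    for path in paths:
        session.add_node(path)
    session.save()  # a single save at the end of the bundled actions


paths = ["/objects/a", "/objects/b", "/objects/c"]
unbatched, batched = MockSession(), MockSession()
ingest_one_save_per_action(unbatched, paths)
ingest_in_transaction(batched, paths)
print(unbatched.save_calls, batched.save_calls)  # prints "3 1"
```

Both approaches persist the same changes; only the number of expensive flushes differs.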
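Frank's scripts are not part of these minutes, so the following is only a minimal Python sketch of the queue-per-node pattern he describes (one queue per cluster node, ingests fed into the queues and processed directly against each node). The round-robin assignment, the node names, and the `record_ingest` callback are assumptions for illustration; a real script would send create requests to each node instead of recording tuples.

```python
import queue
import threading

# Hypothetical sketch: node names and the ingest callback are illustrative only.
def build_node_queues(objects, nodes):
    """Create one FIFO queue per cluster node and deal objects out round-robin."""
    queues = {node: queue.Queue() for node in nodes}
    for i, obj in enumerate(objects):
        queues[nodes[i % len(nodes)]].put(obj)
    return queues


def drain_queues(queues, ingest):
    """Start one worker thread per node; each ingests its own queue in order."""
    def worker(node, q):
        while True:
            try:
                obj = q.get_nowait()
            except queue.Empty:
                return  # this node's queue is exhausted
            ingest(node, obj)

    threads = [threading.Thread(target=worker, args=(node, q))
               for node, q in queues.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


ingested = []
lock = threading.Lock()

def record_ingest(node, obj):
    # Stand-in for a real create request sent to `node`.
    with lock:
        ingested.append((node, obj))

nodes = ["node1", "node2"]
queues = build_node_queues([f"obj{i}" for i in range(5)], nodes)
print({n: q.qsize() for n, q in queues.items()})  # prints "{'node1': 3, 'node2': 2}"
drain_queues(queues, record_ingest)
print(len(ingested))  # prints "5"
```

Order is preserved within each node's queue but not across nodes, so a child object routed to a different node than its parent could be created before the parent has replicated; that is the serialized-processing problem Frank describes, where parents must be created and replicated before their children.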

Meeting adjourned.

New Actions

  • Scott Prater to get a cluster similar to Frank's up and running by Friday, Feb. 14th
  • Frank Asseg to put his cluster configuration scripts up on GitHub for Scott and Greg to use
  • Stefano Cossu to continue work to get the content modelling branch working and ready for testing by others