Time/Place

This meeting is a hybrid teleconference and Slack chat. Anyone is welcome to join; here's the info:

Attendees 

**Each week a meeting chair will be assigned based on a rotating schedule.**

⭐ - denotes note taker

🪑 - denotes chair

Agenda

  1. Announcements:
  2. Pop-up/Other Topics:
    1. Outstanding Slack Issues
      1. Triplestore indexing and internal ActiveMQ running out of memory
        1. https://fedora-project.slack.com/archives/C8B5TSR4J/p1750175735408399
  3. Migration Updates:
  4. Dependency Upgrade Project Updates
    1. PRs for Review
      1. https://github.com/fcrepo/fcrepo/pulls
    2. Up Next - Community-wide testing
      1. Up for review: Release Testing
      2. We should review: Fedora Release Process 
    3. Timeline for next steps
    4. LTS Version Conversation
  5. Updates on:
    REMINDER: We need to put tickets in the meeting notes to ensure we don't lose information from the dynamic filters.
    1. Open Tickets (but assigned in some cases):

    2. In Progress and older but still relevant open tickets:

    3. In Review:

    4. Recently closed tickets (closed within the last 2 weeks):

  6. Backlog Tickets to consider working: N/A
  7. Next Meeting Chair:
    1. Chair: Dan Field
    2. Note Taker: Volunteer?

See Rotating Schedule here  

Notes

  1. Announcements
    1. None
  2. Pop-up/Other Topics
    1. ActiveMQ issues
      1. Triplestore indexing problems
        1. NLW (National Library of Wales) is working on a Fedora 3 to 6 migration
        2. Very large repository, causing long DB index time (though switching to SSD helped get it down to 2 days)
        3. Not using archival groups
        4. Triplestore indexing (into Fuseki) is not making much progress within a reasonable timeframe, and it causes Fedora slowdowns and container pod failures
          1. The process walks the repository tree; the code has been customized to reduce extra queries based on the known three-level structure of the NLW collection.
          2. Even with the simplified code, the process takes weeks and exhausts memory (garbage collection takes up more time than actual processing).
          3. Theory: this has to do with the internal ActiveMQ, but why would that be if we're only doing read operations?
          4. NLW uses Artemis for external queue management, but the problems are happening in the internal ActiveMQ.
          5. Jason has observed log errors in Camel Toolbox referring to port 61616 even when brokerUrl is set to use port 41616; is this a sign that reindexing service is sending things to the wrong place?
          6. Jared wonders if part of the problem is related to the speed at which items are added to the queue.
            1. Daniel found that throttling in Camel Toolbox didn't make a difference; it seems like memory is not being properly released regardless of the rate of requests.
            2. Daniel also observed that some requests keep coming to Fedora even after the queue has been paused; Jared thinks this may be related to pre-fetching behavior (but Daniel observes that hundreds or thousands of requests are coming through, so this seems like a very high rate). Could these be in some Tomcat queue? Since Camel is synchronous, it seems like it shouldn't be generating this volume when paused.
          7. Suggestion: turn on metrics and see what they show about database and OCFL performance; this might offer a clue. (Metrics will also show heap usage, though we already know there's a problem there.)
            1. It's possible to capture a heap dump and evaluate the objects in the dump to figure out how much memory is being used by ActiveMQ in particular (turn on capture of heap dump, allow the instance to fall over, then share the dump with the Fedora team for analysis).
            2. The last successful test run completed, but it took weeks and periodically came close to falling over as it nearly maxed out the 28GB heap. This was with Camel Toolbox throttled to 150 messages per second.
          8. There was discussion of persistence of data for the internal ActiveMQ: if you use a queue instead of a topic, data should be stored in a KahaDB directory inside your Fedora home directory.
          9. If Camel Toolbox is connected to an external ActiveMQ, it should never send messages to Fedora's internal ActiveMQ. Double-checking that this is not happening could be helpful.
          10. Another possible test: create a repo with 10,000 objects, run concurrent requests against them, and examine memory behavior under this load. Does it behave the same way as the Camel Toolbox reindex?
            1. A thought: could Camel Toolbox be failing to close connections properly, causing a memory leak? Seems unlikely, but worth confirming. Should be possible to figure out through Tomcat tools.
          11. Question: does NLW use direct or indirect containers through LDP? (Answer: probably not.)
          12. Fuseki does not seem to be the problem, though performance was improved by bundling Fuseki updates instead of sending each one individually (using a Camel AggregationStrategy).
          13. Limited test: 1 million objects indexed in about an hour using aggregation strategy, etc., but without throttling.
      2. Troubleshooting 
        1. Is there a way to get better logging/visibility into the internal ActiveMQ?
          1. Ben suggested that raising the log levels for the jms and activemq namespaces might help.
            1. Add the system property fcrepo.log.jms=DEBUG (or TRACE to see the messages themselves).
          2. The embedded ActiveMQ is missing its HTML artifacts, so there is no way to access its web console without changing Fedora's build to include more files.
  3. Migration Updates
    1. None
  4. Dependency Upgrade Project Updates
    1. PRs for Review
      1. Several issues are flagged for inclusion in the release; we agreed not to add any more beyond those already flagged.
    2. Up Next
      1. We should review our testing and release documentation to be sure we're ready for the 7.0 release; as far as we know, though, they should be fairly up to date.
    3. LTS version conversation: Arran is drafting documentation for inclusion on the website describing our stance on the LTS version. Agreement: once 7.0 is released, it will become the LTS version since 6.0 can no longer be easily upgraded due to its outdated libraries.
      1. LTS support period = 3 years.
        1. Maybe we need clarity on "3 years from what?"
      2. Fedora 6 came out in 2021
      3. Security fixes should be possible on 6.x in some cases, but other things will be harder to fix because of dependency issues.
  5. ...time ran out at this point; more next week!
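The proposed load test in the ActiveMQ notes (about 10,000 objects, concurrent requests, then watch memory) could be driven by a small client like the Python sketch below. It is a minimal sketch, not a Fedora tool: the URL layout, the worker count, and the convention of recording "no HTTP response" as status 0 are assumptions. Memory still has to be observed on the Fedora side (metrics, a heap dump, or Tomcat tools), since this client only generates load and tallies response codes. The client reads every body fully and closes every response so that it does not leak connections itself, which keeps it from muddying the separate question of whether unclosed connections are behind the leak.

```python
"""Minimal concurrent-GET load generator (hypothetical sketch, not a Fedora tool)."""
import http.client
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def http_get_status(url: str, timeout: float = 30.0) -> int:
    """GET url, drain and close the response, and return its HTTP status.

    Convention used only in this sketch: 0 means no usable HTTP response
    (connection refused, timeout, protocol error, ...).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # read the full body so the connection is released
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        err.close()
        return err.code
    except (OSError, http.client.HTTPException):  # URLError, timeouts, bad responses
        return 0


def run_load(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_get_status,
    workers: int = 16,
) -> Counter:
    """Call fetch(url) for every URL on a thread pool and tally results by status."""
    counts: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for status in pool.map(fetch, urls):
            counts[status] += 1
    return counts
```

For example, run_load(urls, workers=32), where urls lists the 10,000 test objects under a (hypothetical) test container, returns a Counter of status codes; the fetch parameter lets the same loop be pointed at a stub for dry runs.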
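The Fuseki speed-up noted above came from bundling updates with a Camel AggregationStrategy instead of sending each one individually. The Python sketch below illustrates the same batching idea outside Camel; it is not the Camel Toolbox code. It assumes each update arrives as a complete SPARQL 1.1 Update string, and it relies on SPARQL 1.1 Update allowing several operations in one request, separated by ';'. The helper names are hypothetical. Flushing the last partial batch when input ends plays roughly the role that a completion timeout plays in a long-running Camel aggregator.

```python
"""Batch SPARQL 1.1 Update operations into fewer HTTP requests (sketch).

Illustrates the batching idea only; this is not Camel Toolbox code.
"""
import urllib.request
from typing import Iterable, Iterator, List


def batch_updates(updates: Iterable[str], batch_size: int) -> Iterator[str]:
    """Yield request bodies that each join up to batch_size update operations.

    SPARQL 1.1 Update allows a request to contain several operations
    separated by ';', so one POST can replace batch_size separate POSTs.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending: List[str] = []
    for update in updates:
        # Drop any trailing ';' so joining never doubles the separator.
        pending.append(update.strip().rstrip(";").strip())
        if len(pending) == batch_size:
            yield " ;\n".join(pending)
            pending = []
    if pending:  # flush the last, partially filled batch
        yield " ;\n".join(pending)


def post_update(update_endpoint: str, body: str, timeout: float = 60.0) -> int:
    """POST one body as application/sparql-update and return the HTTP status."""
    request = urllib.request.Request(
        update_endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Five toy INSERT DATA operations, batched two at a time.
    updates = [
        f'INSERT DATA {{ <urn:example:{i}> <urn:example:p> "{i}" }}'
        for i in range(5)
    ]
    bodies = list(batch_updates(updates, batch_size=2))
    print(len(bodies))  # 3 request bodies instead of 5 separate requests
```

In a real run, each body would go to post_update along with the Fuseki dataset's update URL (whatever that is in a given deployment). The batch size trades per-request overhead against how much update text is held in memory per batch.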
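The two indexing runs in the notes report their speed in different units (Camel Toolbox throttled to 150 messages per second; about 1 million objects in about an hour with aggregation and no throttling). The short Python sketch below converts them to common units. The numbers come from the notes; the helper names are hypothetical, and the comparison assumes a steady rate and one message per object, neither of which is guaranteed in practice.

```python
"""Convert the indexing rates mentioned in the notes to common units (sketch).

Assumes a steady rate and one message per object; neither is guaranteed.
"""
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def per_hour(items_per_second: float) -> float:
    """Items handled in one hour at a constant per-second rate."""
    return items_per_second * SECONDS_PER_HOUR


def per_second(items: int, elapsed_seconds: float) -> float:
    """Average per-second rate for items processed in elapsed_seconds."""
    return items / elapsed_seconds


def days_to_process(items: int, items_per_second: float) -> float:
    """Days needed to process items at a constant per-second rate."""
    return items / items_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    throttled = 150.0  # Camel Toolbox throttle from the notes
    aggregated = per_second(1_000_000, SECONDS_PER_HOUR)  # ~1M objects in ~1 hour
    print(per_hour(throttled))               # 540000.0 items per hour
    print(round(aggregated, 1))              # 277.8 items per second
    print(round(aggregated / throttled, 2))  # 1.85 times the throttled rate
```

Once the repository's actual message count is known, days_to_process gives a steady-rate lower bound for a full reindex at either rate.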