New Relic is a commercial application performance monitoring service that can introspect an application and report on key transaction performance.

Server Monitoring

New Relic can be used to perform availability checks of application services.  It will raise an alert if a service is unresponsive for several minutes.  A sketch of example checks follows the list below.

  • Check availability of the XMLUI service (load home page)
    • Note, this does not explicitly test the availability of a database connection
  • Check availability of the Solr cores
    • search
    • statistics 
    • oai
  • Check availability of the handle service
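
As a rough illustration, these checks amount to simple HTTP probes against the services above. The hostnames, ports and paths below are placeholders for a typical DSpace 5/6 layout and should be adjusted for your installation; the Solr ping handler and the Handle server's HTTP interface on port 8000 are assumptions to verify against your own configuration.

  # XMLUI home page (does not prove the database connection is healthy)
  curl -sf https://repo.example.edu/xmlui/
  # Solr cores, using the ping handler if it is enabled
  curl -sf http://localhost:8080/solr/search/admin/ping
  curl -sf http://localhost:8080/solr/statistics/admin/ping
  curl -sf http://localhost:8080/solr/oai/admin/ping
  # Handle server HTTP interface (port 8000 by default)
  curl -sf http://repo.example.edu:8000/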

Key Transaction Monitoring

Unfortunately, by default, all XMLUI transactions appear as a single transaction, which makes it difficult to distinguish or analyze different types of traffic:

  • /CocoonForwardController/forwardRequest

The following configuration change in newrelic.yml has enabled us to identify some useful key transactions (sketched in the excerpt below):

  • enable_auto_transaction_naming: false 
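
For context, a minimal newrelic.yml excerpt with this change might look as follows. Placing the setting under the common stanza is the usual convention for the Java agent, and the app_name value is a placeholder; check the template shipped with your agent version.

  common: &default_settings
    app_name: My DSpace
    # name transactions from the request URI rather than the servlet/component,
    # so that URL patterns can be grouped into key transactions
    enable_auto_transaction_naming: false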

With that change in place, we were able to create the following key transactions:

  • /community-list
  • /handle/*/*
  • /discover
  • /handle/*/*/discover

Unfortunately, we have not yet been able to group bitstream downloads into a single key transaction.


6 Comments

  1. A few comments; we use it here as well to debug specific problems.

    Different New Relic "Products": APM vs Browser vs Server

    When getting started with New Relic, it was often confusing to me what the different views are used for. A short summary:

    APM = used to monitor your Tomcat. It's the Java process of your Tomcat (running the New Relic Java agent) that supplies New Relic with most of the info you see in APM.
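
    For reference, the APM data comes from the Java agent attached to the Tomcat JVM via the -javaagent flag. A minimal sketch, assuming a bin/setenv.sh and an agent unpacked under /opt/newrelic (both placeholders, adapt to your install):

      # bin/setenv.sh -- attach the New Relic Java agent to Tomcat
      CATALINA_OPTS="$CATALINA_OPTS -javaagent:/opt/newrelic/newrelic.jar"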

    Browser = actual user behaviour, sent by JavaScript in the user's browser once you have activated it. A super useful tool for looking at performance issues of actual users and pages.
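
    If you want to activate it from the agent side, the Java agent's newrelic.yml has a browser_monitoring section; a hedged sketch (setting name taken from the agent documentation, confirm for your agent version):

      browser_monitoring:
        # auto-inject the New Relic JavaScript snippet into generated pages where supported
        auto_instrument: true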

    Server = data sent directly from your Linux server. It can help you with overall processes on your machine, disk space availability, etc.

    Standard circuit breaker thresholds

    New Relic has standard settings at which it throttles back its own memory usage and reporting when it detects specific parameters (CPU, memory) running high. Since a Tomcat running DSpace will generally gobble up most of the memory it has available, running close to the limit of the available memory is not unusual for DSpace. The circuit breaker thresholds should therefore be tweaked so that you generate fewer false alerts.
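
    In the Java agent these thresholds live in newrelic.yml under the circuitbreaker stanza; a sketch of what tweaking them might look like (setting names per the agent documentation, values purely illustrative rather than recommendations):

      circuitbreaker:
        enabled: true
        # trip only when free heap drops below this percentage (agent default is around 20)
        memory_threshold: 10
        # trip only when the JVM spends more than this percentage of CPU time in GC (agent default is around 10)
        gc_cpu_threshold: 20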

    Problem: DSpace cron jobs messing with performance

    This is still something we often struggle with when diagnosing: performance is frequently hit by specific cron jobs running wild in combination with the normal behaviour. The New Relic Server view shows you when and how many Java processes you have running, but I have not yet found a way to split them out per exact process and reveal the arguments.

    Interested: reasons for buying the pro version?

    So far we have usually been able to diagnose everything with the free version. Has anyone found a killer feature in the pro version that makes it worth subscribing?

    APM: JVM view: Healthy DSpace server vs struggling DSpace server

    All the lines and graphs in the JVM view can be a little daunting. Just look at these two patterns of a healthy DSpace server vs a struggling DSpace server.

    HEAP: Happy vs Struggling JVM


    Garbage Collect: Happy vs Struggling JVM

    Key indicator (for me): JVM spending a high percentage of CPU in garbage collection. Correct me if I'm wrong, but I don't think this is ever a good sign.

    When used heap approaches max heap = trouble

  2. Bram Luyten (Atmire), the pro version allows you to track log events for 30 or 60 days.  We have found this valuable when trying to determine what event initiated a performance issue.

    Since some of the cron based tasks are performance intensive, we have found it helpful to view performance trends over several days.

    The Pro version also provides access to the APM feature which should allow you to introspect specific transactions.

    At our next DSpace release, we will be disabling enable_auto_transaction_naming, which we hope will give us better introspection into specific request types.

    As implemented, the Cocoon framework seems to be difficult to instrument since all activity goes through the same pipeline.

    The Pro version also provides SQL analysis.  Our DSpace database generally performs quite well, so I have not yet gained significant insights from this portion of the service.

  3. Terrence W Brady in reply to your comment about the cron based tasks

    ... Some of the cron based tasks are performance intensive ...

    We have been chasing performance issues where we suspect that conflicting Java processes / cron jobs are fighting each other for CPU and memory. The "APM" window will only give you information about the JVM running Tomcat, and not about the other Java processes.

    We've found that the "Server" overview shows all of the Java processes on the machine; however, it aggregates these processes per user space. In the example below, you see that there is one Java process running; a second, third and fourth one get started for a short while, after which it falls back to two.

    Something we still have to experiment with is passing "-Dnewrelic.config.app_name=MY_APPLICATION" in the execution of the cron jobs, which can help ensure that each of these Java processes reports data separately (see the sketch below the link).

    https://discuss.newrelic.com/t/monitoring-java-processes-separately-by-name/5485
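
    A sketch of what that could look like in a crontab, assuming the agent jar is also attached to the CLI JVM and that the [dspace]/bin/dspace launcher picks up JAVA_OPTS (paths, schedule and the application name are placeholders):

      # Report this nightly task to New Relic under its own application name
      0 1 * * * JAVA_OPTS="-javaagent:/opt/newrelic/newrelic.jar -Dnewrelic.config.app_name=DSpace-cron-filter-media" [dspace]/bin/dspace filter-media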

    I would be interested to hear if you've done anything to visualize the different resources consumed per cron job.

    1. Bram Luyten (Atmire), I tried providing -Dnewrelic.config.app_name for a cron task and I do not see the name in the process list within New Relic.

      Have you successfully tested this yet?

  4. Bram Luyten (Atmire), I have not tried the option you listed above.  I am curious to see what we might learn with that option enabled.

    With some support help from New Relic, we have some key transaction requests grouped into APM.  Looking at a recent 3-minute window, we see the following grouping of key transactions.

    We have not yet drawn any significant insights from these key transactions.  The next time we encounter a performance issue, I will be curious to see what we learn from this panel.

    1. Great to see. It looks like the community-list page is slow, which I've seen with many other repositories that have a sizeable community/collection tree, for example:

      https://www.repository.cam.ac.uk/community-list

      https://cgspace.cgiar.org/community-list

      It might be worth looking at optimizing this in DSpace 5 and then comparing/testing against DSpace 6 performance.

      For DSpace 6, there is a recent related discussion on these performance issues:

      https://groups.google.com/forum/#!msg/dspace-tech/tBNckjJ0ocE/mUCJXXHJCQAJ