Performance Metrics

Overview

Engine can be configured to record and report on a number of metrics that cover various aspects of its performance. By default, this feature is disabled, but you can enable it by adding a setting called EnableRecordingMetrics to your Engine config file with a value of true. Once this setting is enabled, you can retrieve those metrics by querying /RusticiEngine/metrics. The information is exposed by a Prometheus exporter, so we recommend that you analyze this data using a Prometheus server or a similar tool. For more information, you can refer to the Prometheus website.
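
If you point a Prometheus server at this endpoint, a minimal scrape configuration might look like the sketch below. The job name, host, and port are placeholder assumptions for your own deployment:

scrape_configs:
  - job_name: 'rustici-engine'            # assumed name; use whatever fits your setup
    metrics_path: '/RusticiEngine/metrics'
    static_configs:
      - targets: ['localhost:8080']       # assumed host and port for your Engine instance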

Preface: Histogram Timers

Most of our metrics are recorded using histogram-backed timers. Each timer is reported across multiple lines of the metric report: one line per histogram bucket, with the le label on each line identifying the bucket being described. Let's use the timer that monitors the response time of every HTTP request that Engine receives as our example:

http_all_requests_duration_seconds_bucket{le="0.01"} 20
http_all_requests_duration_seconds_bucket{le="0.05"} 42
http_all_requests_duration_seconds_bucket{le="0.1"} 50
...etc...
http_all_requests_duration_seconds_bucket{le="60"} 56
http_all_requests_duration_seconds_bucket{le="+Inf"} 56
http_all_requests_duration_seconds_sum 10.37343612
http_all_requests_duration_seconds_count 56

Metrics ending in _bucket are used to categorize the various response times of the API. The le label is short for "less than or equal to", so a _bucket metric with a label of le="0.25" reports the number of requests that Engine responded to in 0.25 seconds or less. These buckets are cumulative: a request that lands in the le="0.25" bucket also contributes to the le="0.5" and le="1" buckets, all the way up to le="+Inf", which is why the le="+Inf" bucket always matches _count. The _sum and _count metrics measure the total amount of time spent responding to requests for a given endpoint and the total number of requests Engine has received for that endpoint.
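
As a sketch of how these histograms are typically consumed, assuming the metrics are being scraped by Prometheus, the first query below estimates a 95th-percentile response time from the buckets and the second derives an average from _sum and _count (the 5m window is an arbitrary choice):

# estimated 95th-percentile response time over the last five minutes
histogram_quantile(0.95, rate(http_all_requests_duration_seconds_bucket[5m]))

# average response time over the same window
rate(http_all_requests_duration_seconds_sum[5m]) / rate(http_all_requests_duration_seconds_count[5m])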

Course Runtime

To see the response times of the endpoints used by the player, you can review course_runtime_operations_duration_seconds. Along with the usual le label, this metric also uses a label called operation, which has the following possible values:

  • launch_page - Used when launching a registration. All learning standards use this page.
  • player_configuration - The endpoint called by the player after it is launched. Used by all learning standards, except for Tin Can courses when the setting UsePlayerForTinCanLaunches is set to false (the default value in Engine 2018+ is true).
  • record_results - Called periodically to record a registration's state while a SCORM course is running.
  • record_results_on_player_exit - Called when a learner exits a SCORM course.
  • process_aicc_request - Used to perform various operations during AICC courses.
  • cmi5_au_launch - Only used when launching cmi5 courses.

Here is an example:

course_runtime_operations_duration_seconds_bucket{le="0.01", operation="player_configuration"} 128
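
To focus on a single operation, filter on that label. For example, assuming Prometheus, this hypothetical query estimates the 95th-percentile time spent recording SCORM results:

histogram_quantile(0.95, rate(course_runtime_operations_duration_seconds_bucket{operation="record_results"}[5m]))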

The API

We record metrics covering requests to both versions of the API. The method label describes the HTTP method used to make the request, and the url label contains the URL that was requested, with any identifiers replaced by [id]. Aside from the metric's name and the possible values of the url label, versions 1 and 2 of the API are reported identically. The metrics are formatted like this:

api_v2_requests_duration_seconds_bucket{method="get", le="0.01", url="/registrations/[id]"} 28
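
Because these histograms carry several labels, it is common to aggregate before taking a quantile. As a sketch, assuming Prometheus, this sums the bucket rates across HTTP methods so that the quantile is reported per URL:

histogram_quantile(0.99, sum by (le, url) (rate(api_v2_requests_duration_seconds_bucket[5m])))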

API Counters

In addition to the above histograms, Engine also records counters that track the number of API requests associated with each tenant and the status codes used in its responses to API requests:

api_requests_total{tenant="default"} 3
api_responses_total{status_code="200"} 328
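
Because these are plain counters, rate() turns them into throughput. As an example, assuming Prometheus, the first query below shows responses per second broken down by status code, and the second isolates server errors:

# responses per second, by status code
sum by (status_code) (rate(api_responses_total[5m]))

# server errors per second
rate(api_responses_total{status_code=~"5.."}[5m])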

The LRS

The performance of Engine's LRS can also be monitored. The metrics reporter uses the method label to distinguish the xAPI resources being requested. For a detailed description of each of these endpoints, please refer to the xAPI spec. The values currently used for the method label are:

  • activities
  • activities_profile
  • activities_state
  • actors
  • actor_profile
  • extended_actions
  • statements

Example:

xapi_requests_duration_seconds_bucket{method="activities", le="0.01"} 17
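
As with the other histograms, the _count series doubles as a request counter. For example, assuming Prometheus, this shows the rate of requests to the statements resource:

rate(xapi_requests_duration_seconds_count{method="statements"}[5m])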

Statement Pipe Metrics

We monitor each xAPI statement pipe individually. You can see the IDs of your statement pipes using the /xapi/pipes resource. You can then monitor a specific pipe by matching its ID to the value of the pipe_id label:

xapi_statement_pipe_processing_duration_seconds_bucket{le="0.01", pipe_id="0374c238-e5d8-4361-97e7-e5612e1acf18"} 0
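
To compare pipes side by side, group by the pipe_id label. As a sketch, assuming Prometheus, this computes the average processing time per pipe over the last five minutes:

sum by (pipe_id) (rate(xapi_statement_pipe_processing_duration_seconds_sum[5m]))
  / sum by (pipe_id) (rate(xapi_statement_pipe_processing_duration_seconds_count[5m]))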

The Database

Engine keeps timers describing how long it takes to execute its various database operations. These timers follow the usual histogram reporting pattern described above, and they use a method label. Internally, Engine separates its operations into queries (pulling data out of the database) and non-queries (inserting new data or manipulating existing data). These are the two possible values for the method label on our database metrics, and they can help you pinpoint the source of any database performance issues.

database_data_helper_operations_duration_seconds_bucket{method="query", le="0.01"} 221
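
To see whether reads or writes are the slow side, take a quantile per method value. For example, assuming Prometheus:

histogram_quantile(0.95, sum by (le, method) (rate(database_data_helper_operations_duration_seconds_bucket[5m])))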

In addition to monitoring its own database operations, Engine can also keep track of certain JDBC operations. Right now, we only record the time it takes to acquire a connection, but more JDBC operations may be added in the future.

database_jdbc_operations_duration_seconds_bucket{le="0.01", operation="acquire_connection"} 5042
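
Slow connection acquisition usually points at an exhausted connection pool. As an example, assuming Prometheus, this estimates the 99th-percentile time spent waiting for a connection:

histogram_quantile(0.99, rate(database_jdbc_operations_duration_seconds_bucket{operation="acquire_connection"}[5m]))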

JVM Metrics

The Java version of Engine can also report on various JVM statistics. These metrics fall into a few categories, each with its own predictable prefix:

  • Metrics describing garbage collection begin with jvm_gc_
  • Metrics describing memory usage begin with jvm_memory_
  • Metrics describing thread usage begin with jvm_threads_
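
As an illustration only: if Engine's exporter follows the standard Prometheus JVM collector, series like the ones below would be available. The exact names are assumptions, so check your own /RusticiEngine/metrics output before building dashboards on them:

# assumed series from the standard JVM exporter: current heap usage in bytes
jvm_memory_bytes_used{area="heap"}

# assumed series; approximate time spent in garbage collection per second
rate(jvm_gc_collection_seconds_sum[5m])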
