Performance Metrics
Overview
Engine can be configured to record and report on a number of metrics that cover various aspects of its performance. By default, this feature is disabled, but you can enable it by adding a setting called EnableRecordingMetrics
to your Engine config file with a value of true
. Once this setting is enabled, you can analyze those metrics by querying /RusticiEngine/metrics
. The information is reported using a Prometheus exporter, so we recommend that you analyze this data using a Prometheus server or a similar tool. For more information, you can refer to the Prometheus website.
Preface: Histogram Timers
Most of our metrics are recorded using histogram-backed timers. This means that each timer has multiple different labels, and each label identifies the bucket of the histogram being described. A single timer will show up on multiple lines of the metric report. Let's use the timer that monitors the response time of every single HTTP request that Engine receives as our example:
http_all_requests_duration_seconds_bucket{le="0.01"} 20
http_all_requests_duration_seconds_bucket{le="0.05"} 42
http_all_requests_duration_seconds_bucket{le="0.1"} 50
...etc...
http_all_requests_duration_seconds_bucket{le="60"} 56
http_all_requests_duration_seconds_bucket{le="+inf"} 56
http_all_requests_duration_seconds_sum 10.37343612
http_all_requests_duration_seconds_count 219
Metrics ending in _bucket
are used to categorize the various response times of the API. The le
label is short for "less than or equal to", so a _bucket
metric with a label of le="0.25"
specifies the number of requests that Engine responded to in less than 0.25 seconds. These buckets are cumulative, so if a request lands in the le="0.25"
bucket, then it will also contribute to the le="0.5"
and le="1"
buckets, all the way up to le="+inf"
. The _sum
and _count
metrics measure the total amount of time spent responding to given endpoint and the number of requests Engine has received to that endpoint.
Course Runtime
To see the response times of the endpoints used by the player, you can review course_runtime_operations_duration_seconds
. Along with the usual le
label, this metric also uses a label called operation
, which has the following possible values:
launch_page
- Used when launching a registration. All learning standards will use this page.player_configuration
- This is the endpoint called by the player after being launched. This is used by all learning standards, except for Tin Can courses when the settingUsePlayerForTinCanLaunches
is set to false (the default value in Engine 2018+ is true).record_results
- Called to record a registration's state during the course of a SCORM course.record_results_on_player_exit
- Called when a learner exits a SCORM course.process_aicc_request
- Used to perform various operations during AICC coursescmi5_au_launch
- Only used when launching cmi5 courseslti_launch
- Only used when launching LTI links
Here is an example:
course_runtime_operations_duration_seconds_bucket{le="0.01", operation="player_configuration"} 128
The API
We keep recording metrics covering the requests to both versions of the API. The method
label describes the HTTP method used to request the URL, and the url
label contains the URL that was requested with any identifiers replaced with [id]
. Aside from the name used for the metric and the possible values of the url
label, versions 1 and 2 of the API are presented identically. They are formatted like this:
api_v2_requests_duration_seconds_bucket{method="get", le="0.01", url="/registrations/[id]"} 28
API Counters
In addition to the above histograms, Engine also records counters to keep track of the number of requests made to/by each tenant and the status codes used in its responses to API requests:
api_requests_total{tenant="default"} 3
api_responses_total{status_code="200"} 328
The LRS
The performance of Engine's LRS can also be monitored. The metrics reporter uses the method
label to differentiate the different xAPI resources being requested. For a detailed description of each of these endpoints, please refer to the xAPI spec. The values currently used for the method
label are:
activities
activities_profile
activities_state
actors
actor_profile
extended_actions
statements
Example:
xapi_requests_duration_seconds_bucket{method="activities", le="0.01"} 17
Statement Pipe Metrics
We monitor each xAPI statement pipe individually. You can see the IDs of your statements pipes using the /xapi/pipes resource. You can then monitor a specific pipe by matching its ID to the value of the pipe_id
label:
xapi_statement_pipe_processing_duration_seconds_bucket{le="0.01", pipe_id="0374c238-e5d8-4361-97e7-e5612e1acf18"} 0
The Database
Engine keeps timers describing how long it takes to execute its various database operations. These timers follow our usual histogram timer reporting patterns from above, and they utilize a method
label. Internally, Engine separates its operations into queries (pulling data out of the database) and non-queries (inserting new or manipulating existing data). These are the two possible values for the method
label on our database metrics, and they can be used to help diagnose the source of any issues with database speed that you might be having.
database_data_helper_operations_duration_seconds_bucket{method="query", le="0.01"} 221
In addition to monitoring its own database operations, Engine can also keep track of certain JDBC operations. Right now, we only record the time it takes to acquire a connection, but more JDBC operations may be added in the future.
database_jdbc_operations_duration_seconds_bucket{le="0.01", operation="acquire_connection"} 5042
JVM Metrics
The Java version of Engine can also report on various JVM statistics. There are various metrics recorded within a few category that each have their own predictable prefix:
- Metrics describing garbage collection begin with
jvm_gc_
- Memory usage statistics are denoted by
jvm_memory_
jvm_threads_
denotes information about the JVM's thread usage