Performance Metrics

Overview

Engine can be configured to record and report on a number of metrics that cover various aspects of its performance. By default, this feature is disabled, but you can enable it by adding a setting called EnableRecordingMetrics to your Engine config file with a value of true. Once this setting is enabled, you can analyze those metrics by querying /RusticiEngine/metrics. The information is reported using a Prometheus exporter, so we recommend that you analyze this data using a Prometheus server or a similar tool. For more information, you can refer to the Prometheus website.

Preface: Histogram Timers

Most of our metrics are recorded using histogram-backed timers. This means that each timer has multiple different labels, and each label identifies the bucket of the histogram being described. A single timer will show up on multiple lines of the metric report. Let's use the timer that monitors the response time of every single HTTP request that Engine receives as our example:

http_all_requests_duration_seconds_bucket{le="0.01"} 20
http_all_requests_duration_seconds_bucket{le="0.05"} 42
http_all_requests_duration_seconds_bucket{le="0.1"} 50
...etc...
http_all_requests_duration_seconds_bucket{le="60"} 56
http_all_requests_duration_seconds_bucket{le="+inf"} 56
http_all_requests_duration_seconds_sum 10.37343612
http_all_requests_duration_seconds_count 219

Metrics ending in _bucket are used to categorize the various response times of the API. The le label is short for "less than or equal to", so a _bucket metric with a label of le="0.25" specifies the number of requests that Engine responded to in less than 0.25 seconds. These buckets are cumulative, so if a request lands in the le="0.25" bucket, then it will also contribute to the le="0.5" and le="1" buckets, all the way up to le="+inf". The _sum and _count metrics measure the total amount of time spent responding to given endpoint and the number of requests Engine has received to that endpoint.

Course Runtime

To see the response times of the endpoints used by the player, you can review course_runtime_operations_duration_seconds. Along with the usual le label, this metric also uses a label called operation, which has the following possible values:

launch_page - Used when launching a registration. All learning standards will use this page.
player_configuration - This is the endpoint called by the player after being launched. This is used by all learning standards, except for Tin Can courses when the setting UsePlayerForTinCanLaunches is set to false (the default value in Engine 2018+ is true).
record_results - Called to record a registration's state during the course of a SCORM course.
record_results_on_player_exit - Called when a learner exits a SCORM course.
process_aicc_request - Used to perform various operations during AICC courses
cmi5_au_launch - Only used when launching cmi5 courses
lti_launch - Only used when launching LTI links
persist_runtime_data - The saving of course runtime data to the database

Here is an example:

course_runtime_operations_duration_seconds_bucket{le="0.01", operation="player_configuration"} 128

The API

We keep recording metrics covering the requests to both versions of the API. The method label describes the HTTP method used to request the URL, and the url label contains the URL that was requested with any identifiers replaced with [id]. Aside from the name used for the metric and the possible values of the url label, versions 1 and 2 of the API are presented identically. They are formatted like this:

api_v2_requests_duration_seconds_bucket{method="get", le="0.01", url="/registrations/[id]"} 28

API Counters

In addition to the above histograms, Engine also records counters to keep track of the number of requests made to/by each tenant and the status codes used in its responses to API requests:

api_requests_total{tenant="default"} 3
api_responses_total{status_code="200"} 328

API Import Course Jobs

Every time a course is imported, we track how long it takes to do so under the api_import_course_jobs name. This metric has a label titled 'type', which designates the type of import as uploaded_zip, ad_hoc_import, import_by_manifest, file_path, or zip_url. An example metric for the 10-second bucket for ad-hoc imports follows:

api_import_course_jobs_duration_seconds_bucket{le="10", type="ad_hoc_import"} 1

Webhooks

In the past, Engine only officially supported postback updates for two things: registration rollups and import job status. In Engine 22, we introduced our new subscription system, allowing you to subscribe to Engine event notifications and receive postbacks to a URL of your choosing.

Although we strongly advise you use the new subscription system, Engine still is capable of using the legacy system. As such, the following metrics are used for both the new subscription system and the legacy postback system:

api_postback_successful_responses: postbackType label with Queue ID as string
api_postback_failed_responses: postbackType label with Queue ID as string
api_postback_queue_size : type label with Queue ID as stringapi_postback_queue_items : type label with Queue ID as string

Example:

api_postback_queue_size{type="rollup_reg"} 0

api_postback_queue_items{type="rollup_reg"} 0

Modern Webhooks Metrics

sub_postback: type label with potential value send_message

Example:

sub_postback_duration_seconds_bucket{le="30", type="send_message"} 1

Legacy Postback Metrics

api_postback: operation label with potential values import_results and reg_rollup

Example:

api_postback{le="30", operation="import_results"} 1

The LRS

The performance of Engine's LRS can also be monitored. The metrics reporter uses the method label to differentiate the different xAPI resources being requested. For a detailed description of each of these endpoints, please refer to the xAPI spec. The values currently used for the method label are:

activities
activities_profile
activities_state
actors
actor_profile
extended_actions
statements

Example:

xapi_requests_duration_seconds_bucket{method="activities", le="0.01"} 17

xAPI Requests by Version

xAPI requests are tracked by version and sorted into three seperate buckets (listed below). These metrics include a method label for what HTTP method was used for the request and a path label to specify what xAPI URL path was hit.

xapi__v0_9_requests
xapi__v0_95_requests
xapi__v1_0_x_requests

Here's what the metric for accessing GET activities/state with xAPI 1.0 or above might look like:

xapi__v1_0_x_requests{path="activities/state", method="get"} 1

Statement Pipe Metrics

We monitor each xAPI statement pipe individually. You can see the IDs of your statements pipes using the /xapi/pipes resource. You can then monitor a specific pipe by matching its ID to the value of the pipe_id label:

xapi_statement_pipe_processing_duration_seconds_bucket{le="0.01", pipe_id="0374c238-e5d8-4361-97e7-e5612e1acf18"} 0

xAPI Metadata Fetching

Every time Engine fetches metadata for an activity, it's logged under the xapi_metadata_fetches_total meter.

This meter includes a label titled fetch_type with possible values: success_registry, success_other, tried_registry, tried_other, objStoreCached_registry, objStoreCached_other.

Example:

xapi_metadata_fetches_total{fetch_type="success_registry"} 9

Background Processors

Engine can have several processes running in the background performing different tasks, like processing event messages or forwarding xAPI statements. The list below contains a complete record of the metric names associated with our various processors.

registration_postback_queue_processor
import_postback_queue_processor
x_api_statement_pipe_processor
scorm_to_tin_can_processor
x_api_token_deletion_processor
pens_async_processor
event_message_processor
notification_sequencing_deletion_processor
launch_history_expiration_processor

Example:

registration_postback_queue_processor_duration_seconds_bucket{le="0.05", status="running"} 66

Note: Alongside the typical _duration_seconds_bucket, _duration_seconds_count, and _duration_seconds_sum suffixes for metrics, Background Processor metrics also track the completion and failure of tasks with the _successful and _failure suffixes.

Example:

registration_postback_queue_processor_success{status="completed"} 1

The Database

Engine keeps timers describing how long it takes to execute its various database operations. These timers follow our usual histogram timer reporting patterns from above, and they utilize a method label. Internally, Engine separates its operations into queries (pulling data out of the database) and non-queries (inserting new or manipulating existing data). These are the two possible values for the method label on our database metrics, and they can be used to help diagnose the source of any issues with database speed that you might be having.

database_data_helper_operations_duration_seconds_bucket{method="query", le="0.01"} 221

In addition to monitoring its own database operations, Engine can also keep track of certain JDBC operations. Right now, we only record the time it takes to acquire a connection, but more JDBC operations may be added in the future.

database_jdbc_operations_duration_seconds_bucket{le="0.01", operation="acquire_connection"} 5042

JVM Metrics

The Java version of Engine can also report on various JVM statistics. There are various metrics recorded within a few category that each have their own predictable prefix:

Metrics describing garbage collection begin with jvm_gc_
Memory usage statistics are denoted by jvm_memory_
jvm_threads_ denotes information about the JVM's thread usage

Performance Metrics

Performance Metrics

Overview

Preface: Histogram Timers

Course Runtime

The API

API Counters

API Import Course Jobs

Webhooks

Modern Webhooks Metrics

Legacy Postback Metrics

The LRS

xAPI Requests by Version

Statement Pipe Metrics

xAPI Metadata Fetching

Background Processors

The Database

JVM Metrics

results matching ""

No results matching ""