The mender-monitor service supports the following directory structure:
/etc/mender-monitor
`-- monitor.d
    |-- available
    |   |-- log_auth_root_session.sh
    |   `-- service_mender-connect.sh
    |-- dbus.sh
    |-- enabled
    |   |-- log_auth_root_session.sh -> /etc/mender-monitor/monitor.d/available/log_auth_root_session.sh
    |   `-- service_mender-connect.sh -> /etc/mender-monitor/monitor.d/available/service_mender-connect.sh
    |-- log.sh
    `-- service.sh
The scripts at the root level of monitor.d are the Subsystems (log.sh, dbus.sh and service.sh).
The directory monitor.d/available lists the created Checks. The create and delete commands of mender-monitorctl create or delete a Check: create writes the file following the correct naming convention and defines the variables within it. The naming convention follows the structure:
<monitoring_subsystem_name>_<check_name>.sh
In the example above, log_auth_root_session.sh is a Check for the log Subsystem and service_mender-connect.sh is a Check for the service Subsystem.
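Because the Subsystem name comes first and is separated from the Check name by an underscore, both parts can be recovered from a file name with plain parameter expansion. The following is a minimal sketch assuming, as in the listing above, that Subsystem names contain no underscore while Check names may; check_subsystem and check_name are hypothetical helpers, not part of mender-monitorctl.

```shell
# Hypothetical helpers: split a Check file name of the form
# <monitoring_subsystem_name>_<check_name>.sh into its two parts.
# Assumes the Subsystem name itself contains no underscore.

check_subsystem() {
    base=$(basename "$1" .sh)    # drop any directory and the .sh suffix
    printf '%s\n' "${base%%_*}"  # everything before the first underscore
}

check_name() {
    base=$(basename "$1" .sh)
    printf '%s\n' "${base#*_}"   # everything after the first underscore
}
```

For example, check_name /etc/mender-monitor/monitor.d/enabled/log_auth_root_session.sh prints auth_root_session.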
A Check must be enabled before mender-monitor takes it into account. Running mender-monitorctl with the enable parameter creates a symbolic link inside the enabled folder that points to the corresponding Check in the available folder, and the disable parameter removes that link. The mender-monitor service then executes the Subsystems based on the Checks enabled in this folder.
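On the file system, enabling a Check therefore amounts to a symbolic link in enabled pointing at the file in available. The sketch below reproduces enable and disable by hand, assuming mender-monitorctl does nothing beyond managing that link; enable_check, disable_check, is_check_enabled and the MONITOR_D override are hypothetical, and the CLI remains the supported way to do this.

```shell
# Hypothetical manual equivalents of `mender-monitorctl enable|disable`.
# MONITOR_D defaults to the real directory but can be overridden for testing.
MONITOR_D=${MONITOR_D:-/etc/mender-monitor/monitor.d}

enable_check() {   # usage: enable_check <subsystem> <check>
    file="${1}_${2}.sh"
    if [ ! -f "$MONITOR_D/available/$file" ]; then
        echo "no such check: $MONITOR_D/available/$file" >&2
        return 1
    fi
    mkdir -p "$MONITOR_D/enabled"
    # Absolute link target, as in the directory listing; -f replaces a stale link.
    ln -sf "$MONITOR_D/available/$file" "$MONITOR_D/enabled/$file"
}

disable_check() {  # usage: disable_check <subsystem> <check>
    rm -f "$MONITOR_D/enabled/${1}_${2}.sh"
}

is_check_enabled() {  # succeeds if the enabled/ link exists
    [ -L "$MONITOR_D/enabled/${1}_${2}.sh" ]
}
```

Run as root, enable_check service countdown creates the same kind of link that readlink reports for an enabled Check.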
Create a service that counts down for 30 seconds and then fails:
cat > countdown.sh << "EOF"
#!/bin/sh
i=30
while [ $i -gt 0 ]; do
    echo "INFO: $i seconds remaining"
    i=$((i-1))
    sleep 1
done
echo "ERROR: Exiting with return 1"
exit 1
EOF
PATH_TO_SCRIPT=$(realpath countdown.sh)
chmod +x "$PATH_TO_SCRIPT"
cat > /etc/systemd/system/countdown.service << EOF
[Unit]
Description=Countdown Service
[Service]
Type=simple
ExecStart=$PATH_TO_SCRIPT
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl start countdown.service
You can confirm the countdown with journalctl -fu countdown.service.
Now create a check for the service monitoring subsystem, with countdown as the service name and systemd as the service type:
sudo mender-monitorctl create service countdown systemd
This creates a file in the directory /etc/mender-monitor/monitor.d/available with the check definitions of the service name and the service type:
cat /etc/mender-monitor/monitor.d/available/service_countdown.sh
# This file was autogenerated by Monitoring Utilities based on the configuration
SERVICE_NAME="countdown"
SERVICE_TYPE="systemd"
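The generated Check is an ordinary shell fragment that only assigns variables. A rough hand-written equivalent, assuming create writes nothing beyond a comment and these two variables, could look like this; write_service_check is a hypothetical name, and the target directory is passed in so the sketch can be tried outside /etc.

```shell
# Hypothetical sketch of the file that
# `mender-monitorctl create service <name> <type>` produces.
# usage: write_service_check <name> <type> <dir>; prints the written path.
write_service_check() {
    target="$3/service_$1.sh"
    {
        echo "# Hand-written check mirroring the mender-monitorctl output"
        printf 'SERVICE_NAME="%s"\n' "$1"
        printf 'SERVICE_TYPE="%s"\n' "$2"
    } > "$target"
    echo "$target"
}
```

For example, write_service_check countdown systemd /tmp prints /tmp/service_countdown.sh, and sourcing that file sets SERVICE_NAME and SERVICE_TYPE.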
Then, enable the check by running:
sudo mender-monitorctl enable service countdown
This command links the file in /etc/mender-monitor/monitor.d/available to /etc/mender-monitor/monitor.d/enabled. You can verify it by running:
readlink /etc/mender-monitor/monitor.d/enabled/service_countdown.sh
The following use cases extend beyond the typical usage of Mender Monitor, but they are attainable due to the tool's customizable design.
Since Checks and Subsystems are represented by a directory structure, you can also modify the files directly instead of using the CLI tool.
You must run the following commands with root permissions.
To use the Mender Monitor library, first source the environment, which provides a set of functions for interacting with the Mender Server and the Monitor logic.
cd /usr/share/mender-monitor
source lib/monitor-lib.sh
Once the environment is sourced, new functions become available. For example, the function monitor_send_alert sends the alert data (OK or CRITICAL) to the Mender Server.
This function takes the following parameters:
monitor_send_alert "alert_type" "alert_description" "alert_details" "subject_name" "subject_status" "subject_type" "log_pattern" "log_file_path" "lines_before" "line_matching" "lines_after"
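To make the positional order easier to follow, the helper below only prints each argument next to its parameter name. It is a hypothetical debugging aid, not the library's implementation: it never contacts the Mender Server, and it is deliberately named print_alert_args so it cannot shadow the real monitor_send_alert. The examples on this page pass only the first six parameters, so missing trailing values are printed as empty.

```shell
# Hypothetical helper: label the positional arguments of a monitor_send_alert call.
print_alert_args() {
    for name in alert_type alert_description alert_details subject_name \
                subject_status subject_type log_pattern log_file_path \
                lines_before line_matching lines_after; do
        printf '%-18s %s\n' "$name" "${1-}"  # empty when the argument is absent
        if [ $# -gt 0 ]; then
            shift
        fi
    done
}
```

For instance, print_alert_args OK "Service my-service running" "The main process is present again" my-service running service lists those six values and leaves the five log parameters empty.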
Sending an OK alert clears your alert level. If your subsystem does not send OK alerts itself, you can force one by running a command similar to the following (shown here for a service subsystem):
SERVICE_NAME="your-service-name"
monitor_send_alert OK "Service ${SERVICE_NAME} running" "The main process is present again" "${SERVICE_NAME}" "running" "service"
In this example, we will guide you through the necessary steps to create a new subsystem that monitors disk usage on the device.
Start by creating the subsystem file:
cat >/etc/mender-monitor/monitor.d/diskusage.sh <<EOF
#!/usr/bin/env bash
# Copyright 2022 Northern.tech AS
#
# All Rights Reserved
#
#
# Monitor the disk space of a given disk.
#
#
# More specifically
#
# DISKUSAGE_NAME=<some name>
# DISKUSAGE_THRESHOLD=<1-100> (default: 80)
#
EOF
The file must also source the mender-monitor libraries that provide the functions required for interacting with the Mender Server:
. common/common.sh
. lib/monitor-lib.sh
. lib/alert-lib.sh
. lib/subsystem-storage-lib.sh
Next, validate any input the subsystem requires. In this example, only the DISKUSAGE_NAME variable is validated:
#
# Parse the input
#
if [[ -z "${DISKUSAGE_NAME}" ]]; then
    log_error "DISKUSAGE_NAME not set, this is an error."
    exit 0
fi
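The header comment also documents DISKUSAGE_THRESHOLD as a value from 1 to 100, with 80 as the default. If you want to validate it as well, a POSIX sketch could look like the following; valid_threshold is a hypothetical name, and falling back to the default on bad input (rather than exiting) is a choice of this sketch, not something the subsystem above does.

```shell
# Hypothetical validation for DISKUSAGE_THRESHOLD: prints the value to use.
# Non-numeric or out-of-range input falls back to the documented default, 80.
valid_threshold() {
    case "$1" in
        ''|*[!0-9]*) echo 80; return ;;  # empty, or contains a non-digit
    esac
    if [ "$1" -ge 1 ] && [ "$1" -le 100 ]; then
        echo "$1"
    else
        echo 80
    fi
}
```

It could be applied right after the DISKUSAGE_NAME check, for example with DISKUSAGE_THRESHOLD=$(valid_threshold "${DISKUSAGE_THRESHOLD}").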
The definition of the actual command or logic that the subsystem is going to monitor comes next:
function disk_usage() {
    df --output=pcent "${DISKUSAGE_NAME}" | tail -1 | cut -d% -f1
}
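With GNU coreutils, df --output=pcent prints a Use% header followed by a right-aligned value such as " 42%", so tail keeps the value line and cut drops the percent sign. The leading space is harmless in the arithmetic comparison used later, but a variant that strips it explicitly is easier to check in isolation. This sketch reads df-style output on standard input; parse_pcent is a hypothetical name.

```shell
# Hypothetical helper: extract the integer percentage from `df --output=pcent`
# output read on stdin, e.g. "Use%\n 42%\n" -> 42.
parse_pcent() {
    tail -n 1 | tr -d ' %'  # keep the last line, drop padding and the % sign
}
```

On a system with GNU df, df --output=pcent / | parse_pcent prints the usage of the root file system as a bare number.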
Each check needs a unique key under which its last alarm status is stored. The following function derives that key from the name of the check file that sources the subsystem:
function get_monitor_name() {
    local -r monitor_name=$(basename "${env}")
    local -r strip_subsystem_name=${monitor_name//diskusage_/}
    echo "${strip_subsystem_name%.sh}"
}
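get_monitor_name takes the check file path from ${env} and removes diskusage_ and the .sh suffix from its base name. The same derivation can be written as a pure function of the path, which makes it easy to try out. This sketch takes the path as an argument (a hypothetical change) and strips diskusage_ only as a prefix, whereas ${monitor_name//diskusage_/} removes every occurrence of that string anywhere in the name.

```shell
# Hypothetical variant of get_monitor_name that takes the check path as $1.
# /etc/mender-monitor/monitor.d/enabled/diskusage_root_space.sh -> root_space
monitor_name_from_path() {
    name=$(basename "$1" .sh)           # e.g. diskusage_root_space
    printf '%s\n' "${name#diskusage_}"  # remove the prefix only
}
```

With this variant, a check named diskusage_data_diskusage_x.sh yields data_diskusage_x, while the ${monitor_name//diskusage_/} form would yield data_x.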
Finally, we define how and when the OK and CRITICAL alerts are generated and sent to the Mender Server.
In this case, we check whether DISKUSAGE_USAGE exceeds DISKUSAGE_THRESHOLD (80 by default). If it does, the subsystem sends the CRITICAL alert using the function monitor_send_alert from the mender-monitor library. When DISKUSAGE_USAGE drops back to or below the threshold, it sends the OK alert. The last alert sent is stored with subsystem_set, so each alert is sent only once per change of state.
DISKUSAGE_MONITOR_KEY=$(get_monitor_name)
DISKUSAGE_USAGE=$(disk_usage)
if [[ ${DISKUSAGE_USAGE} -gt ${DISKUSAGE_THRESHOLD:-80} ]]; then
    log_debug "Disk storage has grown to fill more than ${DISKUSAGE_THRESHOLD:-80}% of the disk"
    if [[ $(subsystem_get "${SUBSYSTEM_NAME}" "LAST_ALARM_${DISKUSAGE_MONITOR_KEY}") != CRITICAL ]]; then
        log_debug "Disk storage alarm ready to send CRITICAL"
        monitor_send_alert \
            CRITICAL \
            "Disk storage has grown to fill more than ${DISKUSAGE_THRESHOLD:-80}% of the disk '${DISKUSAGE_NAME}'" \
            "Disk ${DISKUSAGE_NAME} is now at ${DISKUSAGE_USAGE}% capacity, above the ${DISKUSAGE_THRESHOLD:-80}% threshold" \
            "${DISKUSAGE_NAME}" \
            DISKUSAGE_USAGE_WARNING \
            "disk"
        subsystem_set "${SUBSYSTEM_NAME}" "LAST_ALARM_${DISKUSAGE_MONITOR_KEY}" CRITICAL
    else
        log_debug "The disk usage is too high, but the alarm CRITICAL is already sent"
    fi
else
    if [[ $(subsystem_get "${SUBSYSTEM_NAME}" "LAST_ALARM_${DISKUSAGE_MONITOR_KEY}") == CRITICAL ]]; then
        log_debug "Disk storage alarm send OK"
        monitor_send_alert \
            OK \
            "Disk storage has grown to fill more than ${DISKUSAGE_THRESHOLD:-80}% of the disk '${DISKUSAGE_NAME}'" \
            "Disk ${DISKUSAGE_NAME} is now at ${DISKUSAGE_USAGE}% capacity, back below the ${DISKUSAGE_THRESHOLD:-80}% threshold" \
            "${DISKUSAGE_NAME}" \
            DISKUSAGE_USAGE_WARNING \
            "disk"
        subsystem_set "${SUBSYSTEM_NAME}" "LAST_ALARM_${DISKUSAGE_MONITOR_KEY}" OK
    else
        log_debug "Disk usage is fine, and no need to send alarm OK"
    fi
fi
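Stripped of logging and the server call, the block above is a small state machine: CRITICAL is sent only on the transition into high usage, and OK only when recovering from a stored CRITICAL, so repeated runs do not resend the same alert. The pure function below restates that decision so it can be checked in isolation; next_alert is a hypothetical name, and storing the result with subsystem_set is left to the caller as in the original.

```shell
# Hypothetical restatement of the alert decision above.
# usage: next_alert <last_alarm> <usage_percent> [threshold]
# Prints CRITICAL or OK when an alert should be sent, nothing otherwise.
next_alert() {
    last=$1 usage=$2 threshold=${3:-80}
    if [ "$usage" -gt "$threshold" ]; then
        # Over the threshold: alert once, unless CRITICAL was already sent.
        [ "$last" != CRITICAL ] && echo CRITICAL
    else
        # At or below the threshold: clear only a previously sent CRITICAL.
        [ "$last" = CRITICAL ] && echo OK
    fi
    return 0
}
```

For example, next_alert "" 90 prints CRITICAL, next_alert CRITICAL 90 prints nothing, and next_alert CRITICAL 50 prints OK.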
After creating the subsystem, create a new check named root_space that monitors the disk usage of the root file system, mounted at /:
cat >/etc/mender-monitor/monitor.d/available/diskusage_root_space.sh <<EOF
# Copyright 2022 Northern.tech AS
#
# All Rights Reserved
#
# Monitor the whole rootfs space usage
#
DISKUSAGE_NAME="/"
# Report on 3/4 full disk
DISKUSAGE_THRESHOLD=75
EOF
To enable the check, run the following command:
mender-monitorctl enable diskusage root_space
Pseudo subsystems add a level of abstraction on top of subsystems. They are predefined configurations built upon existing subsystems, designed to streamline the creation of new checks.
The following example documents pseudo subsystems further.
dockerevents Pseudo Subsystem
The dockerevents pseudo subsystem can be used to generate a check that monitors any event reported by the docker events command.
The dockerevents definition can be found in the mender-monitor library source code:
cat /usr/share/mender-monitor/lib/ctl-lib.sh
...
function ctl_create_dockerevents_subsystem_entry() {
    local -r service_name="$1"
    local -r container_name="$2"
    local -r action_name="$3"
    EXTRA_SETTINGS="LOG_ALERT_DESCRIPTION=\"Docker container ${container_name} ${action_name}\"\nLOG_ALERT_DETAILS=\"Alert was raised due to:%line_matching\"\nLOG_ALERT_STATUS=DOCKEREVENTS_CONTAINER_EVENT\nLOG_ALERT_TYPE=docker_event\n" ctl_create_log_subsystem_entry "$service_name" ".*container ${action_name}.*name=${container_name}.*" "@docker events" "$4"
}
declare -A SUBSYSTEMS_NAME_TO_SUBSYSTEM=([dockerevents]="log")
...
As the code shows, the function predefines the variables the log subsystem requires and then creates the check just like a regular log check; SUBSYSTEMS_NAME_TO_SUBSYSTEM maps dockerevents to the log subsystem.
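The LOG_PATTERN passed to the log subsystem is built from the container and action names. The sketch below rebuilds it the same way and tries it against docker events style lines. Using grep -E as the matcher is an assumption, since this page does not say which regular expression engine Mender Monitor uses; dockerevents_pattern and event_matches are hypothetical helpers, and any event lines you feed them are illustrative, not captured output.

```shell
# Hypothetical re-implementation of the LOG_PATTERN built by
# ctl_create_dockerevents_subsystem_entry.
dockerevents_pattern() {  # usage: dockerevents_pattern <container> <action>
    printf '.*container %s.*name=%s.*\n' "$2" "$1"
}

# Succeeds if the event line matches; grep -E stands in for the real matcher.
event_matches() {  # usage: event_matches <pattern> <event line>
    printf '%s\n' "$2" | grep -Eq -- "$1"
}
```

Note that the pattern for container scanner also matches an event for a container named scanner2, because nothing after name=scanner marks where the name ends.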
Using this pseudo subsystem makes the creation of checks easier:
mender-monitorctl create dockerevents scanner_kill scanner kill 16
mender-monitorctl enable dockerevents scanner_kill
systemctl restart mender-monitor
These commands create the Check log_scanner_kill.sh in the folder /etc/mender-monitor/monitor.d/available/, enable it, and restart mender-monitor. The resulting check uses the log Subsystem, as you can see with:
cat /etc/mender-monitor/monitor.d/available/log_scanner_kill.sh
# This file was autogenerated by Monitoring Utilities based on the configuration
SERVICE_NAME="scanner_kill"
LOG_PATTERN=".*container kill.*name=scanner.*"
LOG_FILE="@docker events"
LOG_PATTERN_EXPIRATION=16
LOG_ALERT_DESCRIPTION="Docker container scanner kill"
LOG_ALERT_DETAILS="Alert was raised due to:%line_matching"
LOG_ALERT_STATUS=DOCKEREVENTS_CONTAINER_EVENT
LOG_ALERT_TYPE=docker_event
With the above configuration, you will receive a CRITICAL alert if someone or something kills your scanner container. This leads the Mender UI to present the device in a critical monitoring state. Since there is no natural way to recover from this situation, we use the last, optional argument of the mender-monitorctl create dockerevents command, which sets the number of seconds after which the Mender Monitor daemon sends an automatic OK. In this way, after 16 seconds without a kill event on the container, the device recovers to a normal state.
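The expiration can be read as a timer that restarts with every matching event. Under that reading (the actual timing logic lives in the Mender Monitor daemon and is not shown on this page), the decision reduces to comparing the time since the last match with the expiration; expired is a hypothetical helper working on Unix timestamps in seconds.

```shell
# Hypothetical model of LOG_PATTERN_EXPIRATION: succeed when at least
# <expiration> seconds have passed since the last matching event.
expired() {  # usage: expired <last_match_epoch> <now_epoch> <expiration_seconds>
    [ $(( $2 - $1 )) -ge "$3" ]
}
```

For example, expired 1000 1016 16 succeeds and expired 1000 1015 16 fails.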
© 2024 Northern.tech AS