Monitor your node
This guide walks you through setting up Prometheus with Grafana to monitor your node on Ubuntu 18.04 or 20.04.
A Substrate-based chain exposes data such as the height of the chain, the number of peers connected to your node, and the CPU and memory usage of your machine. To monitor this data, Prometheus collects the metrics and Grafana displays them on a dashboard.
Preparation
First, create a dedicated prometheus user. The --no-create-home flag skips creating a home directory for it, and giving the user a non-login shell such as /usr/sbin/nologin prevents prometheus from logging in.
Create the directories required to store the configuration and executable files.
Change the ownership of these directories to prometheus so that the prometheus user owns and can write to them.
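The preparation steps above can be sketched as the commands below. This is a minimal sketch, not necessarily the exact commands this guide was written with. The run helper and the APPLY variable are hypothetical conveniences: by default the block only prints each command, and it executes them with sudo only when APPLY=1 is set. /etc/prometheus is the configuration directory used throughout this guide; /var/lib/prometheus as a data directory is an assumption.

```shell
# Dry run by default: print each command. Set APPLY=1 to execute it with sudo.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    sudo "$@"
  else
    echo "sudo $*"
  fi
}

# Create a user with no home directory and no login shell.
run useradd --no-create-home --shell /usr/sbin/nologin prometheus

# Configuration directory (used in this guide) and data directory
# (/var/lib/prometheus is an assumed location for the metrics database).
run mkdir -p /etc/prometheus /var/lib/prometheus

# Make the prometheus user the owner of both directories.
run chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
```

Review the printed commands first, then rerun the same block with APPLY=1 exported to apply them.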
Installing and Configuring Prometheus
After setting up the environment, update your OS and install the latest Prometheus. You can find the latest release on the releases page of the Prometheus GitHub repository.
The extracted release directory contains the following two binaries:
prometheus - Prometheus main binary file
promtool - utility for checking configuration and rule files
It also contains the following two directories (which hold the web interface, example configuration files, and the license):
consoles
console_libraries
Copy the executable files to the /usr/local/bin/ directory.
Change the ownership of these files to the prometheus user.
Copy the consoles and console_libraries directories to /etc/prometheus.
Change the ownership of these directories to the prometheus user.
Once everything is done, run this command to remove the extracted prometheus directory.
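The copy, ownership, and cleanup steps above might look like the sketch below. The SRC variable is a hypothetical placeholder for wherever you extracted the release archive, and the run helper with its APPLY switch is an illustrative dry-run wrapper, not part of Prometheus: nothing is executed unless APPLY=1 is set.

```shell
# Dry run by default: print each command. Set APPLY=1 to execute it with sudo.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    sudo "$@"
  else
    echo "sudo $*"
  fi
}

# Path to the extracted release directory (placeholder; adjust to your download).
SRC="${SRC:-./prometheus-release}"

# Install the two binaries and hand them to the prometheus user.
run cp "$SRC/prometheus" "$SRC/promtool" /usr/local/bin/
run chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

# Install the console directories next to the configuration.
run cp -r "$SRC/consoles" "$SRC/console_libraries" /etc/prometheus
run chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries

# Remove the extracted directory once everything has been copied.
run rm -rf "$SRC"
```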
Prometheus needs some configuration before you can use it. Create a YAML configuration file named prometheus.yml by running the command below.
The configuration file is divided into three parts: global, rule_files, and scrape_configs.
global - sets the defaults: scrape_interval defines how often Prometheus scrapes targets, while evaluation_interval controls how often it evaluates rules.
rule_files - lists the locations of any rule files we want the Prometheus server to load.
scrape_configs - defines which resources Prometheus monitors.
The configuration file should look like this:
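A configuration along these lines matches the three-part description above. Treat it as a sketch rather than the canonical file: the job names prometheus and substrate_node, the 15-second global intervals, and staging the file in the current directory before installing it are all assumptions. The 5-second interval for the first job and the ports 9090 and 9615 come from this guide.

```shell
# Stage the file in the current directory; review it before installing.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s      # default scrape frequency (assumed value)
  evaluation_interval: 15s  # default rule evaluation frequency (assumed value)

rule_files:
  # - "rules.yml"           # no rule files are loaded yet

scrape_configs:
  # Prometheus scraping its own metrics endpoint.
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]

  # Metrics exposed by the Substrate-based node (default port 9615).
  - job_name: "substrate_node"
    static_configs:
      - targets: ["localhost:9615"]
EOF

# Validate the staged file if promtool is already installed.
if command -v promtool >/dev/null 2>&1; then
  promtool check config prometheus.yml
fi

# To install it afterwards, something like:
#   sudo cp prometheus.yml /etc/prometheus/prometheus.yml
```

The second job has no scrape_interval of its own, so it inherits the global default.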
With the above configuration file, the first scrape job lets Prometheus monitor itself. Because we want more precise information about the state of the Prometheus server, the scrape_interval is reduced to 5 seconds for this job. The static_configs and targets parameters determine where the exporters are running. The second job captures the data from your node, which is exposed on port 9615 by default.
You can check the validity of this configuration file by running promtool check config /etc/prometheus/prometheus.yml.
Save the configuration file and change the ownership of the file to the prometheus user.
Starting Prometheus
To test that Prometheus is set up properly, execute the following command to start it as the prometheus user.
The following messages indicate the status of the server. If you see them, your server is set up properly.
Go to http://SERVER_IP_ADDRESS:9090/graph to check whether you can access the Prometheus interface. If it is working, exit the process by pressing CTRL + C.
Next, we would like the server to start automatically during the boot process, so we have to create a new systemd configuration file with the following config.
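A unit file for this purpose could look like the sketch below. Several details are assumptions rather than facts from this guide: the unit description, /var/lib/prometheus as the storage path, the ExecReload line, and staging the file locally before copying it into place. The console flags assume a Prometheus release that still ships the consoles and console_libraries directories described earlier.

```shell
# Stage the unit file in the current directory; review it before installing.
cat > prometheus.service <<'EOF'
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
EOF

# To install it afterwards, something like:
#   sudo cp prometheus.service /etc/systemd/system/prometheus.service
```

Running the service as the prometheus user keeps it consistent with the file ownership set up earlier.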
Once the file is saved, execute the command below to reload systemd and enable the service so that it is loaded automatically during the operating system's startup.
Prometheus should be running now, and you should be able to access its front end again by revisiting http://SERVER_IP_ADDRESS:9090/.
Installing Grafana
In order to visualize your node metrics, you can use Grafana to query the Prometheus server. Run the following commands to install it first.
If everything is fine, configure Grafana to auto-start on boot and then start the service.
You can now access it by going to http://SERVER_IP_ADDRESS:3000/login. The default username and password are admin/admin.
NOTE
If you want to change the port on which Grafana runs (3000 is a popular port), edit the file /usr/share/grafana/conf/defaults.ini with a command like sudo vim /usr/share/grafana/conf/defaults.ini and change the http_port value to something else. Then restart Grafana with sudo systemctl restart grafana-server.
To visualize the node metrics, first click settings to configure the Data Sources.
Click Add data source to choose where the data is coming from.
Select Prometheus.
The only thing you need to enter is the URL, which is http://localhost:9090 (Prometheus serves plain HTTP by default). Then click Save & Test. If you see Data source is working, your connection is configured correctly.
Next, import the dashboard that lets you visualize your node data. Go to the menu bar on the left, hover over "+", and select Import.
Import via grafana.com - This lets you use a dashboard that someone else has created and made public. You can browse the available dashboards at https://grafana.com/grafana/dashboards. In this guide, we use "My Polkadot Metrics", so enter "12425" in the ID field and click Load.
Once it has loaded, make sure to select "Prometheus" in the Prometheus dropdown list. Then click Import.
In the meantime, start your Polkadot node by running ./polkadot. If everything is done correctly, you should be able to monitor your node's performance, such as the current block height and CPU and memory usage, on the Grafana dashboard.
Installing and Configuring Alertmanager (Optional)
In this section, let's configure Alertmanager, which helps you anticipate potential problems and notifies you of current problems on your server. Alerts can be sent via Slack, email, Matrix, or other channels. In this guide, we will show you how to configure email notifications through Gmail that are sent if your node goes down.
First, download the latest Alertmanager binary and extract it by running the command below:
Gmail Setup
To allow Alertmanager to send email to you, you will need to generate an app password in your Gmail account. For details, follow Google's instructions for setting up app passwords.
Once the app password has been generated, copy it and save it somewhere safe.
Alertmanager Configuration
There is a configuration file named alertmanager.yml inside the directory that you extracted in the previous command, but we will not use it. Instead, we will create our own alertmanager.yml file under /etc/alertmanager with the following config.
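A minimal email-only configuration might look like the sketch below. The receiver name gmail-notifications, the 1-minute resolve_timeout, the send_resolved setting, and staging the file locally are assumptions made for illustration. YOUR_EMAIL and YOUR_APP_PASSWORD are the placeholders this guide asks you to replace, and smtp.gmail.com:587 is Gmail's SMTP submission endpoint.

```shell
# Stage the file in the current directory; fill in the placeholders before installing.
cat > alertmanager.yml <<'EOF'
global:
  resolve_timeout: 1m          # assumed value

route:
  receiver: "gmail-notifications"

receivers:
  - name: "gmail-notifications"
    email_configs:
      - to: YOUR_EMAIL
        from: YOUR_EMAIL
        smarthost: smtp.gmail.com:587
        auth_username: YOUR_EMAIL
        auth_identity: YOUR_EMAIL
        auth_password: YOUR_APP_PASSWORD
        send_resolved: true    # also notify when the alert clears
EOF

# To install it afterwards, something like:
#   sudo mkdir -p /etc/alertmanager
#   sudo cp alertmanager.yml /etc/alertmanager/alertmanager.yml
```

Because the file contains a password, consider restricting its permissions after installing it.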
NOTE
Make sure to change the ownership of /etc/alertmanager to prometheus by executing sudo chown -R prometheus:prometheus /etc/alertmanager.
With the above configuration, alerts will be sent using the email account you set above. Remember to change YOUR_EMAIL to your email address and replace YOUR_APP_PASSWORD with the app password you saved earlier.
Next, create another systemd configuration file named alertmanager.service by running the command sudo nano /etc/systemd/system/alertmanager.service with the following config.
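One possible shape for this unit file is sketched below, staged locally as an alternative to editing it in place with nano. The binary location /usr/local/bin/alertmanager, running as the prometheus user, and the /var/lib/alertmanager storage path (which would need to exist and be writable by that user) are all assumptions; adjust them to match where you installed Alertmanager.

```shell
# Stage the unit file in the current directory; replace SERVER_IP before installing.
cat > alertmanager.service <<'EOF'
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.external-url=http://SERVER_IP:9093

[Install]
WantedBy=multi-user.target
EOF

# To install it afterwards, something like:
#   sudo cp alertmanager.service /etc/systemd/system/alertmanager.service
```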
SERVER_IP - Change this to your host IP address, and make sure port 9093 is open.
To start Alertmanager, run the following commands:
You should see that the process status is "active (running)" if you have configured it properly.
There is an Alertmanager plugin for Grafana that can help you monitor the alert information. To install it, execute the command below:
Restart Grafana once the plugin is successfully installed.
Now go to your Grafana dashboard at SERVER_IP:3000 and configure the Alertmanager data source.
Go to Configuration -> Data Sources and search for "Prometheus AlertManager" if you cannot find it at the top.
Fill in the URL with your server location followed by the port number used by Alertmanager.
Then click "Save & Test" at the bottom to test the connection.
To monitor the alerts, import dashboard "8010", which is built for Alertmanager. Make sure to select "Prometheus AlertManager" in the last column, then click "Import".
You will end up having the following:
Alertmanager Integration
To let the Prometheus server talk to Alertmanager, we need to add the following config to /etc/prometheus/prometheus.yml.
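The addition could look like the fragment sketched here. It is written to a separate, hypothetically named file (alerting-fragment.yml) so you can review it and merge it into prometheus.yml by hand, replacing the existing rule_files block rather than duplicating the key. The target localhost:9093 assumes Alertmanager runs on the same host on port 9093, and the relative path rules.yml is resolved against the directory of the Prometheus configuration file.

```shell
# Stage the fragment locally; merge it into /etc/prometheus/prometheus.yml by hand.
cat > alerting-fragment.yml <<'EOF'
# Load the alert rules file that sits next to prometheus.yml.
rule_files:
  - "rules.yml"

# Tell Prometheus where to send fired alerts.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
EOF
```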
That is the updated /etc/prometheus/prometheus.yml.
Next, create a new file called "rules.yml" under /etc/prometheus/ that defines all the rules we would like to detect. If the condition of any rule defined in this file is met, an alert is triggered. The rule below checks whether an instance is down. If it is down for more than 5 minutes, an email notification is sent. To learn more about defining rules, see the Prometheus documentation on alerting rules. Community-maintained collections of other useful alert rules are also available.
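A rule matching that description could be sketched as follows. The group name alert_rules, the alert name InstanceDown, the severity label, and the annotation wording are assumptions; the expression up == 0 held for 5 minutes is the standard way to express "instance down for more than 5 minutes". The file is staged locally and checked with promtool only if promtool is installed.

```shell
# Stage the rules file locally. The quoted 'EOF' keeps $labels from being
# expanded by the shell, so the template reaches Prometheus unchanged.
cat > rules.yml <<'EOF'
groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0          # target failed its most recent scrape
        for: 5m                # must stay down this long before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
EOF

# Check the syntax of the staged rules if promtool is available.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules rules.yml
fi

# To install it afterwards, something like:
#   sudo cp rules.yml /etc/prometheus/rules.yml
```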
Change the ownership of this file to prometheus instead of root by running:
To check that the rules defined in "rules.yml" are syntactically correct, run the following command:
Finally, restart everything by running:
Now if one of your target instances goes down, you will receive an alert in Alertmanager and in Gmail.