Monitoring TON server performance

Tools such as htop, iotop, iftop, dstat, and nmon are effective for measuring real-time system performance; however, they fall short when it comes to troubleshooting performance issues that have occurred in the past.

This guide recommends the Linux SAR (System Activity Report) utility for monitoring TON server performance and explains how to use it effectively.

Tip: This guide helps you identify whether your server is short of resources, not whether the validator engine itself performs badly.

Installation

SAR Installation

sudo apt-get install sysstat

Enable automatic statistics gathering

sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat

Enable the service

sudo systemctl enable sysstat sysstat-collect.timer sysstat-summary.timer

Start the service

sudo systemctl start sysstat sysstat-collect.timer sysstat-summary.timer
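To confirm that collection is really switched on after the steps above, a quick check along the following lines may help. This is a minimal sketch: the helper name sysstat_enabled is hypothetical, and it assumes the Debian/Ubuntu layout in which /etc/default/sysstat carries an ENABLED="true" line.

```shell
# Hypothetical helper: succeeds if the given sysstat defaults file
# contains an uncommented ENABLED="true" line.
# Assumes the Debian/Ubuntu layout (/etc/default/sysstat).
sysstat_enabled() {
    grep -Eq '^[[:space:]]*ENABLED="?true"?' "${1:-/etc/default/sysstat}" 2>/dev/null
}

# Usage:
#   sysstat_enabled && echo "collection enabled" || echo "collection disabled"
#   systemctl is-active sysstat-collect.timer   # prints "active" when the timer runs
#   ls /var/log/sysstat/                        # daily sa files appear here after the first run
```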

Usage

By default, SAR gathers statistics every 10 minutes and reports the current day's statistics, starting at midnight. To view them, run sar without parameters:

sar

To see statistics for the previous day or for two days ago, pass the number of days back as an option:

sar -1 # previous day
sar -2 # two days ago

For an exact date, use the -f option to point to the sa file of the given day of the month. For example, for September 23rd:

sar -f /var/log/sysstat/sa23
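The mapping from a calendar date to its sa file can be sketched as a small helper. The function name sa_file_for_date and the SA_DIR variable are hypothetical, and the sketch assumes the default Debian/Ubuntu directory and the two-digit saDD file naming used above; other distributions or sysstat configurations may store the files elsewhere or name them differently.

```shell
# Hypothetical helper: map an ISO date (YYYY-MM-DD) to the daily sa file.
# Assumes the default Debian/Ubuntu location and the two-digit "saDD" naming.
sa_file_for_date() {
    day=${1##*-}                     # keep only the day-of-month part, e.g. "23"
    printf '%s/sa%s\n' "${SA_DIR:-/var/log/sysstat}" "$day"
}

# Usage:
#   sar -u -f "$(sa_file_for_date 2024-09-23)"
#   sar -u -f "$(sa_file_for_date 2024-09-23)" -s 10:00:00 -e 12:00:00   # narrow the time window
```

The -s and -e options limit the report to a start and end time, which is handy once you know roughly when an incident occurred.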

To identify performance issues, it is essential to run specific SAR reports and analyze their results effectively.

The following list outlines the SAR commands that can be used to gather various system statistics. By combining these commands with the provided options, you can quickly generate reports for the desired date.

Memory report

sar -rh

The TON validator engine uses jemalloc, which allows it to cache a significant amount of data. As a result, the sar -rh command often returns a low number in the %memused column.

Meanwhile, you will typically see a high number in the kbcached column. Therefore, there is no need to worry about the low amount of free RAM indicated in the kbmemfree column. The key indicator to monitor is the value shown in the %memused column.

If the percentage exceeds 90%, you should consider adding more RAM. Additionally, keep an eye on your validator engine to ensure it doesn't stop unexpectedly due to an out-of-memory (OOM) issue. The best way to check for this is to grep the /var/ton-work/log file for any Signal messages.
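Scanning a whole day for samples above the 90% mark can be scripted. The sketch below is one possible approach, not part of the original guide: it relies on sadf -d, which prints the same data as sar in semicolon-separated form with a header line, and the helper name samples_above is hypothetical.

```shell
# Hypothetical helper: read `sadf -d` output (semicolon-separated, with a
# "# hostname;interval;timestamp;..." header) on stdin and print the timestamp
# and value of every sample where the named column exceeds the limit.
samples_above() {   # usage: samples_above COLUMN LIMIT
    awk -F';' -v col="$1" -v limit="$2" '
        /^#/ { idx = 0                       # header line: locate the column
               for (i = 1; i <= NF; i++) if ($i == col) idx = i
               next }
        idx && $idx + 0 > limit + 0 { print $3, $idx }
    '
}

# Usage (LC_ALL=C keeps "." as the decimal separator):
#   LC_ALL=C sadf -d -- -r | samples_above %memused 90
#   grep -n 'Signal' /var/ton-work/log      # log path as given in this guide
```

Looking the column up by name rather than by position keeps the helper usable across sysstat versions that order the memory columns differently.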

Swap usage:

sar -Sh

If you notice that swap is in use, you should consider adding more RAM. The general recommendation from the TON Core Team is to disable swap.

CPU report

sar -u

An average CPU utilization of up to 70% (see the %user column) is considered healthy.

Disk Usage report

sar -dh

Watch the %util column and react accordingly if it stays above 90% for a particular disk.
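Because a single busy sample is rarely a problem, it can help to look for runs of consecutive samples above the threshold. The following sketch shows one way to do that. The helper name sustained_util and the choice of three consecutive samples (30 minutes at the default interval) are assumptions for illustration.

```shell
# Hypothetical helper: read `sadf -d -- -d` output on stdin and print each
# device, with the timestamp, at the moment its %util has been above LIMIT
# for RUN consecutive samples.
sustained_util() {   # usage: sustained_util LIMIT RUN
    awk -F';' -v limit="$1" -v need="$2" '
        /^#/ { dev = 0; util = 0             # header line: locate DEV and %util
               for (i = 1; i <= NF; i++) {
                   if ($i == "DEV")   dev  = i
                   if ($i == "%util") util = i
               }
               next }
        dev && util {
            if ($util + 0 > limit + 0) { run[$dev]++ } else { run[$dev] = 0 }
            if (run[$dev] == need + 0) print $dev, $3
        }
    '
}

# Usage (LC_ALL=C keeps "." as the decimal separator):
#   LC_ALL=C sadf -d -- -d | sustained_util 90 3
```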

Network report

Use the command:

sar -n DEV -h

or

sar -n DEV -h --iface=<interface name>

if you want to filter results by network interface name.

Check the %ifutil column: it shows how much of the interface's maximum link speed is in use.

You can check what speed is supported by your NIC by executing the command below:

cat /sys/class/net/<interface>/speed

Info: This value is the link speed of your NIC in Mbit/s, not the bandwidth that your provider has allocated to you.

Consider upgrading your link speed if %ifutil shows more than 70% usage, or if the rxkB/s and txkB/s columns report values close to the bandwidth provided by your provider.
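To compare rxkB/s and txkB/s against a bandwidth figure quoted in Mbit/s, the units have to be converted. The arithmetic can be sketched as below. The helper name link_util_pct is hypothetical, and the sketch assumes that 1 kB equals 1024 bytes and that the link is full duplex, so only the busier direction counts.

```shell
# Hypothetical helper: estimate link utilisation (percent) from sar's
# rxkB/s and txkB/s values and a link speed or provider bandwidth in Mbit/s.
# Assumptions: 1 kB = 1024 bytes and a full-duplex link, so the busier
# direction determines the result.
link_util_pct() {   # usage: link_util_pct RX_KB_S TX_KB_S SPEED_MBIT
    LC_ALL=C awk -v rx="$1" -v tx="$2" -v speed="$3" 'BEGIN {
        peak = (rx + 0 > tx + 0) ? rx : tx       # busier direction, kB/s
        printf "%.1f\n", peak * 1024 * 8 / (speed * 1000000) * 100
    }'
}

# Usage: 61035 kB/s received on a 1000 Mbit/s link is about half the capacity
#   link_util_pct 61035 1000 1000
#   link_util_pct 61035 1000 "$(cat /sys/class/net/eno1/speed)"   # eno1 is an example name
```

Passing your provider's bandwidth instead of the NIC speed as the third argument shows how close you are to the contracted limit.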

Reporting a performance issue

Before reporting any performance issues, ensure that you meet the minimum requirements for the node. Then, execute the following commands:

To generate today's report, run the following command, replacing eno1 with the name of your network interface:

{ sar -rudh && sar -n DEV -h --iface=eno1; } > report_today.txt

For yesterday's report, use the following command:

{ sar -rudh -1 && sar -n DEV -h --iface=eno1 -1; } > report_yesterday.txt
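The interface name eno1 in these commands is only an example. If you are unsure which interface carries your traffic, a sketch like the following can derive it from the default route. The helper name default_iface is hypothetical, and the sketch assumes the iproute2 ip tool is installed.

```shell
# Hypothetical helper: print the interface of the first default route found
# in `ip route` output read from stdin.
default_iface() {
    awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage:
#   iface=$(ip route show default | default_iface)
#   { sar -rudh && sar -n DEV -h --iface="$iface"; } > report_today.txt
```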

Additionally, stop the TON node and measure your disk I/O with the command below:

sudo fio --randrepeat=1 --ioengine=io_uring --direct=1 --gtod_reduce=1 --name=test --filename=/var/ton-work/testfile --bs=4096 --iodepth=1 --size=40G --readwrite=randread --numjobs=1 --group_reporting

Look for the value at read: IOPS= and include it in your report. A value above 10K IOPS is considered good.
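If you save the fio output to a file for the report, the IOPS figure can be pulled out of it with a small filter. This is only a sketch: the helper name fio_read_iops and the file name fio_result.txt are hypothetical, and the pattern assumes fio's default human-readable output, in which the read line has the form "read: IOPS=..., BW=...".

```shell
# Hypothetical helper: pull the read IOPS figure out of fio's human-readable
# output (a line such as "read: IOPS=..., BW=...") read from stdin.
fio_read_iops() {
    sed -n 's/.*read: IOPS=\([^,]*\),.*/\1/p' | head -n 1
}

# Usage:
#   fio_read_iops < fio_result.txt    # fio_result.txt holds the saved fio output
#   sudo rm /var/ton-work/testfile    # fio leaves the 40G test file in place
```

Remember to delete the test file afterwards, since fio does not remove it unless told to.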

Check your download and upload speeds using the following command:

curl -s https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py | python3 -

Speeds exceeding 700 Mbit/s are deemed satisfactory.

When reporting, please send the SAR report, IOPS, and network speed results to @mytonctrl_help_bot.

Initial version by @neodix - TON Core Team, September 23, 2024