Performance monitoring
Monitoring TON server performance
Tools such as htop
, iotop
, iftop
, dstat
, and nmon
are effective for measuring real-time system performance; however, they fall short when it comes to troubleshooting performance issues that have occurred in the past.
This guide recommends using the Linux SAR (System Activity Report) utility for monitoring the performance of TON servers, and it provides an explanation of how to use it effectively.
This guideline helps to identify whether your server experiences a resource shortage, not whether the validator engine performs badly.
Installation
SAR Installation
sudo apt-get install sysstat
Enable automatic statistics gathering
sudo sed -i 's/false/true/g' /etc/default/sysstat
Enable the service
sudo systemctl enable sysstat sysstat-collect.timer sysstat-summary.timer
Start the service
sudo systemctl start sysstat sysstat-collect.timer sysstat-summary.timer
Usage
By default, the SAR gathers statistics every 10 minutes and shows statistics for the current day, starting at midnight. You can check it by running the SAR without parameters:
sar
If you want to see statistics of the previous day or two days before, pass the number as an option:
sar -1 # previous day
sar -2 # two days ago
For the exact date, you should use the f option to point to the sa
file of a given day within a month. Thus, for the September 23rd it would be:
sar -f /var/log/sysstat/sa23
To identify performance issues, it is essential to run specific SAR reports and analyze their results effectively.
The following list outlines the SAR commands that can be used to gather various system statistics. By combining these commands with the provided options, you can quickly generate reports for the desired date.
Memory report
sar -rh
The TON validator engine uses the jemalloc feature, which allows it to cache a significant amount of data. As a result, the sar —rh
command often returns a low number in the %memused
column.
Meanwhile, you will typically see a high number in the kbcached
column. Therefore, there is no need to worry about the low amount of free RAM indicated in the kbmemfree
column. The key indicator to monitor is the value shown in the %memused
column.
If the percentage exceeds 90%, you should consider adding more RAM. Additionally, keep an eye on your validator engine to ensure it doesn't stop unexpectedly due to an out-of-memory (OOM) issue. The best way to check for this is to grep the /var/ton-work/log
file for any Signal messages.
Swap usage:
sar -Sh
If you notice that a swap is used, you should consider adding more RAM. The general recommendation from the TON Core Team is to disable the swap.
CPU report
sar -u
If your server utilizes CPU on average up to 70% (see the %user
column), this should be considered as good.
Disk Usage report
sar -dh
Watch the %util
column and react accordingly if it stays above 90% for a particular disk.
Network report
Use the commands :
sar -n DEV -h
or
sar -n DEV -h --iface=<interface name>
if you want to filter results by network interface name.
Check out the result of column %ifutil
- it shows the usage of your interface considering its maximum link speed.
You can check what speed is supported by your NIC by executing the command below:
cat /sys/class/net/<interface>/speed
The speed you're experiencing is not what your provider promised you.
Consider upgrading your link speed if %ifutil
shows above 70% usage or columns rxkB/s
and txkB/s
reporting values close to a bandwidth provided by your provider.
Reporting a performance issue
Before reporting any performance issues, ensure that you meet the minimum requirements for the node. Then, execute the following commands:
To generate today's report, run:
sar -rudh | cat && sar -n DEV -h --iface=eno1 | cat > report_today.txt
For yesterday's report, use the following command:
sar -rudh -1 | cat && sar -n DEV -h --iface=eno1 -1 | cat > report_yesterday.txt
Additionally, stop the TON node and measure your disk I/O and network speed with the command below:
sudo fio --randrepeat=1 --ioengine=io_uring --direct=1 --gtod_reduce=1 --name=test --filename=/var/ton-work/testfile --bs=4096 --iodepth=1 --size=40G --readwrite=randread --numjobs=1 --group_reporting
Look for the value at read: IOPS=
and include it in your report. A value above 10K IOPS is considered good.
Check your download and upload speeds using the following command:
curl -s https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py | python3 -
Speeds exceeding 700 Mbit/s are deemed satisfactory.
When reporting, please send the SAR report, IOPS, and network speed results to @mytonctrl_help_bot.
Initial version by @neodix - TON Core Team, September 23, 2024