Monitoring
Objective
Set up Prometheus and Grafana for TON node metrics. kube-prometheus-stack is recommended because the node Helm chart includes a ServiceMonitor template, which Prometheus Operator uses for automatic scrape discovery.
Prerequisites
- Enable the metrics HTTP server in the node config (config.json):

  ```json
  {
    "metrics": {
      "address": "0.0.0.0:9100",
      "global_labels": {
        "network": "mainnet",
        "node_id": "my-node-0"
      }
    }
  }
  ```

  The server exposes /metrics (Prometheus format), /healthz (liveness), and /readyz (readiness). If the metrics section is absent, the server is not started.

  Required labels: global_labels with network and node_id are required for the bundled Grafana dashboard. Without them, dashboard variables are empty and panels show no data.

- Set ports.metrics in Helm values:

  ```yaml
  ports:
    metrics: 9100
  ```

  The port must match the port in metrics.address in the node config.
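The two prerequisites are set in different places and can drift apart, for example when the port is changed in config.json but not in the Helm values. The TypeScript sketch below is a hypothetical pre-flight check (checkMetricsConfig and portOf are illustrative names, not part of the chart or the node) that compares the node config with the ports.metrics value and flags missing dashboard labels.

```typescript
// Hypothetical pre-flight check for the metrics prerequisites. Field names
// follow the config.json example above; checkMetricsConfig is illustrative.
interface MetricsConfig {
  address: string;
  global_labels?: Record<string, string>;
}

interface NodeConfig {
  metrics?: MetricsConfig;
}

// Labels the bundled Grafana dashboard filters on.
const REQUIRED_LABELS = ["network", "node_id"] as const;

// Extract the port from a listen address such as "0.0.0.0:9100".
function portOf(address: string): number {
  const idx = address.lastIndexOf(":");
  const port = Number(address.slice(idx + 1));
  if (idx < 0 || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid metrics.address: ${address}`);
  }
  return port;
}

// Returns a list of problems; an empty list means the settings look consistent.
function checkMetricsConfig(config: NodeConfig, helmMetricsPort: number): string[] {
  const metrics = config.metrics;
  if (!metrics) {
    return ["metrics section is absent, so the metrics server will not start"];
  }
  const problems: string[] = [];
  const port = portOf(metrics.address);
  if (port !== helmMetricsPort) {
    problems.push(`metrics.address port ${port} does not match ports.metrics ${helmMetricsPort}`);
  }
  for (const label of REQUIRED_LABELS) {
    if (!metrics.global_labels?.[label]) {
      problems.push(`global_labels.${label} is missing, so dashboard panels will be empty`);
    }
  }
  return problems;
}
```

For the config shown above, checkMetricsConfig(config, 9100) returns an empty array; each mismatch or missing label adds one entry.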
Network security
The metrics port is never exposed on public per-replica LoadBalancer services. The chart creates a dedicated internal <release>-metrics ClusterIP service instead, accessible only inside the cluster.
External metrics access is possible with a custom LoadBalancer service that targets the metrics port, but the recommended approach is an ingress with authentication (basic auth, OAuth2 proxy, or similar) that proxies to <release>-metrics.
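Behind an ingress with basic auth, every client that reads metrics, including an external Prometheus, has to send credentials with each request. As a minimal sketch, assuming basic auth (RFC 7617) and ASCII credentials, such a request looks like this in TypeScript; the URL and credentials are placeholders. A Prometheus scrape job would use the basic_auth setting of its scrape configuration instead of code.

```typescript
// Sketch: read node metrics through an ingress protected by HTTP basic auth
// (RFC 7617). URL and credentials are placeholders supplied by the caller.

// Build the Authorization header value. btoa accepts only Latin-1 strings,
// so this assumes ASCII credentials.
function basicAuthHeader(user: string, password: string): string {
  return `Basic ${btoa(`${user}:${password}`)}`;
}

// Fetch the Prometheus text exposition format from the authenticated endpoint.
async function fetchMetrics(url: string, user: string, password: string): Promise<string> {
  const response = await fetch(url, {
    headers: { Authorization: basicAuthHeader(user, password) },
  });
  if (!response.ok) {
    throw new Error(`metrics request failed: HTTP ${response.status}`);
  }
  return response.text();
}
```

For example, await fetchMetrics("https://metrics.example.com/metrics", "prometheus", "change-me") returns the raw metrics text, where the host name stands in for whatever the ingress exposes.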
Quick start
Minimal values to enable metrics, probes, and ServiceMonitor:
```yaml
ports:
  metrics: 9100

probes:
  startup:
    httpGet:
      path: /healthz
      port: metrics
    failureThreshold: 60
    periodSeconds: 10
  liveness:
    httpGet:
      path: /healthz
      port: metrics
    periodSeconds: 30
    failureThreshold: 3
  readiness:
    httpGet:
      path: /readyz
      port: metrics
    periodSeconds: 10
    failureThreshold: 3

metrics:
  serviceMonitor:
    enabled: true
```

ServiceMonitor configuration
Enable ServiceMonitor so kube-prometheus-stack discovers and scrapes node metrics automatically:
```yaml
metrics:
  serviceMonitor:
    enabled: true
```

Label matching
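Some Prometheus Operator installations select ServiceMonitor resources by label, through serviceMonitorSelector in the Prometheus custom resource. kube-prometheus-stack, for example, by default selects only ServiceMonitors labeled with its own Helm release name (release: <name>). If a Prometheus instance requires labels, set them: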
```yaml
metrics:
  serviceMonitor:
    enabled: true
    labels:
      release: kube-prometheus-stack
```

Scrape interval
By default, the ServiceMonitor inherits the global Prometheus scrape interval (typically 30s). To override it:
```yaml
metrics:
  serviceMonitor:
    enabled: true
    interval: "15s"
    scrapeTimeout: "10s"
```

Cross-namespace monitoring
If Prometheus runs in a different namespace, create the ServiceMonitor in a namespace that Prometheus watches for ServiceMonitor resources:
```yaml
metrics:
  serviceMonitor:
    enabled: true
    namespace: monitoring
```

A namespaceSelector is added automatically so Prometheus can discover services in the release namespace.
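The ServiceMonitor settings in this section fail quietly when they are wrong: a ServiceMonitor whose labels do not satisfy serviceMonitorSelector is simply not picked up, and Prometheus does not accept a scrape timeout longer than the scrape interval. The TypeScript sketch below checks both for a set of values. It assumes the selector uses only matchLabels and that durations are single-unit strings such as "15s"; real selectors can also use matchExpressions, and Prometheus also accepts compound durations such as "1m30s". checkServiceMonitor is an illustrative name.

```typescript
// Sketch: pre-deployment checks for the metrics.serviceMonitor values above.
// Assumptions: the Prometheus selector uses matchLabels only, and durations
// are single-unit strings such as "15s"; checkServiceMonitor is illustrative.
type Labels = Record<string, string>;

interface ServiceMonitorValues {
  labels?: Labels;
  interval?: string;
  scrapeTimeout?: string;
}

const UNIT_MS: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };

// Parse "500ms", "15s", "1m", or "2h" into milliseconds.
function parseDurationMs(duration: string): number {
  const match = /^(\d+)(ms|s|m|h)$/.exec(duration);
  if (!match) {
    throw new Error(`unsupported duration: ${duration}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

// matchLabels semantics: every selector key must be present with the same value.
function selectedBy(selector: Labels, labels: Labels): boolean {
  return Object.entries(selector).every(([key, value]) => labels[key] === value);
}

function checkServiceMonitor(values: ServiceMonitorValues, prometheusSelector: Labels): string[] {
  const problems: string[] = [];
  if (!selectedBy(prometheusSelector, values.labels ?? {})) {
    problems.push("labels do not satisfy serviceMonitorSelector; Prometheus will ignore it");
  }
  const { interval, scrapeTimeout } = values;
  // Only comparable when both are set; otherwise the global defaults apply.
  if (interval && scrapeTimeout && parseDurationMs(scrapeTimeout) > parseDurationMs(interval)) {
    problems.push(`scrapeTimeout ${scrapeTimeout} exceeds interval ${interval}`);
  }
  return problems;
}

// Values from the examples in this section, checked against a release-label selector.
const selector: Labels = { release: "kube-prometheus-stack" };
console.log(checkServiceMonitor({ labels: selector, interval: "15s", scrapeTimeout: "10s" }, selector)); // []
```

In the Prometheus custom resource, an empty selector ({}) matches every ServiceMonitor, while an unset selector matches none, so the check above only applies when the selector has labels.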
Alternative: Prometheus annotations
If Prometheus Operator is not used and services are scraped through prometheus.io/* annotations:
```yaml
metrics:
  annotations:
    enabled: true
```

This adds the prometheus.io/scrape, prometheus.io/port, and prometheus.io/path annotations to the <release>-metrics ClusterIP service.
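These annotations are a community convention rather than built-in Prometheus behavior: they only take effect if the Prometheus configuration has a job that reads them through relabel_configs, as many community scrape configurations do. Under that assumption, the scrape URL for a discovered target is derived roughly as in this TypeScript sketch (scrapeUrl is an illustrative name; the address and port values are placeholders).

```typescript
// Sketch of the common prometheus.io/* annotation convention. Assumption: the
// Prometheus job implements it with relabel_configs; setups differ in details.
type Annotations = Record<string, string>;

// Returns the scrape URL for a discovered target, or null when it is not opted in.
function scrapeUrl(host: string, discoveredPort: number, annotations: Annotations): string | null {
  if (annotations["prometheus.io/scrape"] !== "true") {
    return null; // only targets annotated with the string "true" are scraped
  }
  const port = annotations["prometheus.io/port"] ?? String(discoveredPort);
  const path = annotations["prometheus.io/path"] ?? "/metrics";
  return `http://${host}:${port}${path}`;
}

// Placeholder values; the real annotations come from the <release>-metrics service.
console.log(
  scrapeUrl("10.0.0.12", 9100, {
    "prometheus.io/scrape": "true",
    "prometheus.io/port": "9100",
    "prometheus.io/path": "/metrics",
  }),
); // http://10.0.0.12:9100/metrics
```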
Alternative: static scrape config
For other Prometheus setups, the metrics endpoint (/metrics on the metrics port) is available through the internal ClusterIP service:
<release>-metrics.<namespace>.svc.cluster.local
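A static scrape job needs only that host name plus the metrics port. The TypeScript sketch below builds such a job as an object and prints it as JSON, whose keys map one-to-one onto a scrape_configs entry in prometheus.yml. The release name ton-node, the namespace ton, port 9100, and the job name are placeholders, and cluster.local assumes the default cluster domain.

```typescript
// Sketch of a static scrape job for setups without service discovery.
// Release name, namespace, port, and job name are placeholders.
function metricsTarget(release: string, namespace: string, port: number): string {
  // Assumes the default cluster domain cluster.local.
  return `${release}-metrics.${namespace}.svc.cluster.local:${port}`;
}

const scrapeConfig = {
  job_name: "ton-node",
  // metrics_path defaults to /metrics in Prometheus; stated here for clarity.
  metrics_path: "/metrics",
  static_configs: [{ targets: [metricsTarget("ton-node", "ton", 9100)] }],
};

console.log(JSON.stringify(scrapeConfig, null, 2));
```

Because the target is a ClusterIP service, each scrape reaches whichever pod the service routes it to. With several replicas, a static job therefore sees one replica per scrape rather than all of them; ServiceMonitor discovery, which targets individual endpoints, avoids this.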
Grafana dashboard
The Grafana dashboard is written in TypeScript with the Grafana Foundation SDK and generated as JSON. The dashboard source is available in the TON Rust Node repository (TON Rust Node Grafana source). The generated output file is ton-node-overview.json.
The dashboard uses two multi-select template variables:
- network
- node_id

These correspond to the global_labels keys in the node metrics config.
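Panels filter on these variables, which is why series exported without global_labels never show up. The TypeScript sketch below mimics that filtering. It assumes the dashboard queries use =~ matchers, that Grafana renders each multi-value variable as a regex alternation, and that label values contain no regex metacharacters; some_metric is a placeholder, not a real TON node metric.

```typescript
// Sketch of how the network and node_id variables narrow a query.
// Assumption: multi-value variables become regex alternations (a|b) with =~.
type Labels = Record<string, string>;
type Selection = Record<string, string[]>;

// Build a PromQL selector such as some_metric{network=~"mainnet",node_id=~"a|b"}.
function promSelector(metric: string, selection: Selection): string {
  const matchers = Object.entries(selection).map(
    ([label, values]) => `${label}=~"${values.join("|")}"`,
  );
  return `${metric}{${matchers.join(",")}}`;
}

// Evaluate the same matchers against one series' labels. PromQL regex matchers
// are fully anchored, and a missing label is treated as the empty string.
function seriesMatches(series: Labels, selection: Selection): boolean {
  return Object.entries(selection).every(([label, values]) =>
    values.includes(series[label] ?? ""),
  );
}

const selection: Selection = { network: ["mainnet"], node_id: ["my-node-0"] };
console.log(promSelector("some_metric", selection)); // some_metric{network=~"mainnet",node_id=~"my-node-0"}
// A series exported without global_labels has neither label, so it never matches.
console.log(seriesMatches({}, selection)); // false
console.log(seriesMatches({ network: "mainnet", node_id: "my-node-0" }, selection)); // true
```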
Dashboard sections:
- Node Status
- Build Info
- Transactions per second
- Sync and Block Progress
- Validation and Collation
- Outbound Message Queue
- Network
- Database and Storage
Generate dashboard JSON
Run from the TON Rust Node repository root.
```shell
cd grafana
bun install
bun run generate
```

bun run generate writes ton-node-overview.json.
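Before importing, a small script can confirm that the generated file still declares the variables the panels rely on. This TypeScript sketch (checkDashboard is an illustrative name) assumes only the standard Grafana dashboard JSON layout, in which template variables are listed under templating.list and panels under panels.

```typescript
// Sketch: sanity-check the generated ton-node-overview.json before importing it.
// Grafana dashboard JSON lists template variables under templating.list.
interface DashboardJson {
  title?: string;
  templating?: { list?: Array<{ name?: string }> };
  panels?: unknown[];
}

// Variables the dashboard panels filter on (see the template variables above).
const EXPECTED_VARIABLES = ["network", "node_id"];

// Returns problems found; an empty array means the basic checks passed.
function checkDashboard(dashboard: DashboardJson): string[] {
  const names = new Set((dashboard.templating?.list ?? []).map((v) => v.name));
  const problems = EXPECTED_VARIABLES.filter((n) => !names.has(n)).map(
    (n) => `missing template variable: ${n}`,
  );
  if (!dashboard.panels || dashboard.panels.length === 0) {
    problems.push("dashboard has no panels");
  }
  return problems;
}
```

With Bun, for example, checkDashboard(await Bun.file("ton-node-overview.json").json()) runs the check against the generated file.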
Import into Grafana
- Open Dashboards > New > Import.
- Upload ton-node-overview.json.
- Select a Prometheus data source.
- Click Import.
Edit workflow
- Edit dashboard TypeScript source files.
- Run bun run generate.
- Import the generated JSON and verify panels.
- Commit TypeScript source files. The generated JSON file is ignored by Git.
Alert rules
PrometheusRule resources can be created to trigger alerts based on TON node metrics.
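As a starting point that does not depend on any TON-specific metric names, a rule can alert when Prometheus fails to scrape the node. The TypeScript sketch below builds such a PrometheusRule object and prints it as JSON, which kubectl apply -f - accepts. It assumes the target carries namespace and service labels (Prometheus Operator adds them to ServiceMonitor targets), that the service is named <release>-metrics, and that the Prometheus instance selects rules labeled release: kube-prometheus-stack; the alert name, severity, and 5m duration are illustrative.

```typescript
// Hypothetical PrometheusRule: alert when the node's metrics target is down.
// Field names follow the Prometheus Operator PrometheusRule CRD; the service
// label value, rule labels, and thresholds are assumptions.
function targetDownRule(release: string, namespace: string, ruleLabels: Record<string, string>) {
  return {
    apiVersion: "monitoring.coreos.com/v1",
    kind: "PrometheusRule",
    metadata: { name: `${release}-alerts`, namespace, labels: ruleLabels },
    spec: {
      groups: [
        {
          name: "ton-node",
          rules: [
            {
              alert: "TonNodeMetricsDown",
              // up is set by Prometheus itself: 1 on a successful scrape, 0 on failure.
              expr: `up{namespace="${namespace}",service="${release}-metrics"} == 0`,
              for: "5m",
              labels: { severity: "critical" },
              annotations: {
                summary: `Prometheus cannot scrape ${release}-metrics in ${namespace}`,
              },
            },
          ],
        },
      ],
    },
  };
}

// Placeholder release and namespace; pipe the output to kubectl apply -f -.
console.log(
  JSON.stringify(targetDownRule("ton-node", "ton", { release: "kube-prometheus-stack" }), null, 2),
);
```

The up == 0 condition fires only while the target exists and its scrapes fail; if the target disappears entirely, an additional absent(up{...}) rule covers that case.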