All NixOS boxes currently run node-exporter, which exposes system metrics on port 9100. It is installed with various plugins enabled to gather those metrics.
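To make the port 9100 endpoint concrete, here is a minimal sketch of reading node-exporter output from Python. It assumes the standard `/metrics` path and the Prometheus text exposition format without trailing timestamps; the host name in the usage comment is hypothetical.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Assumption: samples carry no trailing timestamp, so the value is
    everything after the last space on the line.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(host, port=9100):
    """Fetch and parse metrics from a node-exporter instance."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Usage (hypothetical host name):
#   load = fetch_metrics("somebox.internal")["node_load1"]
```

The parser is deliberately naive; it is only meant for a quick manual check that a box is exporting metrics.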
node-exporter is configured by nixos here
Prometheus is configured to scrape metrics from various sources. It is currently deployed with static scrape configs that point to the DNS entries of servers.
All servers need to be added here to ensure they are scraped by Prometheus for metrics.
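As an illustration of what a static scrape config contains, the sketch below builds one as a Python dictionary using the `job_name`, `scrape_interval`, and `static_configs` fields from Prometheus's configuration format. It is not the actual Redbrick configuration; the job name and host names are hypothetical.

```python
import json


def static_scrape_job(job_name, hosts, port=9100, interval="15s"):
    """Build one scrape job with a static list of host:port targets."""
    return {
        "job_name": job_name,
        "scrape_interval": interval,
        "static_configs": [
            {"targets": [f"{host}:{port}" for host in hosts]},
        ],
    }


# Hypothetical job and host names, for illustration only.
job = static_scrape_job("node", ["box1.internal", "box2.internal"])
print(json.dumps({"scrape_configs": [job]}, indent=2))
```

Adding a server then amounts to appending its DNS name to the target list of the relevant job.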
By default Prometheus scrapes every
15s; the scrape frequency may need to be reduced to once every
1m later on. All data is retained for 15 days by default. Redbrick
currently has no use case for long-term data, but if one arises, an InfluxDB or
Graphite database should be used as a remote_write target for Prometheus.
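A quick back-of-the-envelope calculation shows what the scrape interval means for storage. The sketch below counts samples kept per time series over the retention window; it ignores compression and missed scrapes.

```python
def samples_per_series(retention_days, scrape_interval_seconds):
    """Number of samples one series accumulates over the retention window."""
    return retention_days * 24 * 60 * 60 // scrape_interval_seconds


print(samples_per_series(15, 15))  # 15s interval, 15 days -> 86400
print(samples_per_series(15, 60))  # 1m interval, 15 days  -> 21600
```

Moving from 15s to 1m therefore cuts the number of stored samples per series by a factor of four.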
Fluentd is used as a syslog endpoint.
log.internal:514 is the logs endpoint.
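For an application that does not log via the system syslog daemon, a minimal sketch of pointing Python's standard `logging` module at this endpoint follows. It assumes the endpoint accepts syslog over UDP, which is not stated above; the logger name is hypothetical.

```python
import logging
import logging.handlers
import socket


def syslog_logger(name, host="log.internal", port=514):
    """Return a logger that forwards records to a syslog endpoint.

    Assumption: the endpoint listens on UDP (SOCK_DGRAM).
    """
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        socktype=socket.SOCK_DGRAM,
    )
    # Prefix each message with the logger name so it can be tagged downstream.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Usage (hypothetical logger name):
#   log = syslog_logger("myservice")
#   log.info("service started")
```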
Fluentd can be configured to parse and tag logs. Manual parsing of logs should be
avoided in Fluentd in favour of Loki and Fluentd plugins.
Fluentd is configured to send logs to Loki on the same host it is running on.
Loki is Grafana's logging solution and is queryable in Grafana. All logs should be configured to send to it. Loki supports multiple ways to receive logs; Redbrick uses Fluentd and the Docker logging driver.
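For context on what a Loki client sends, the sketch below builds a JSON body in the shape Loki's HTTP push API expects: a list of streams, each with a label set and `[timestamp_ns, line]` pairs. It only constructs the payload and leaves the destination out; the label names and values are hypothetical.

```python
import json
import time


def loki_push_payload(labels, lines, now_ns=None):
    """Build a Loki push body with one stream and one entry per log line.

    Timestamps are nanoseconds since the epoch, encoded as strings.
    """
    ts = str(now_ns if now_ns is not None else time.time_ns())
    return {
        "streams": [
            {
                "stream": dict(labels),
                "values": [[ts, line] for line in lines],
            }
        ]
    }


# Hypothetical labels, for illustration only.
body = json.dumps(loki_push_payload({"job": "demo", "host": "box1"}, ["hello"]))
```

Labels are what make the logs filterable in Grafana, so they are best kept to a small, stable set per stream.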
To send logs to Loki using a Loki client point logs to
Grafana is a graphing front end with a large number of dashboards for reviewing metrics and logs from every node. Alerts should be configured in Grafana to notify admins and root holders when events occur, based on metrics or log events.