Motivation
Why would you run a monitoring system like this? Usually I would say "because we can", but this time such a stack is actually necessary. I run autonomous systems and an IT company, and Emile mostly joins as a companion in most of the non-customer projects. After two years of maintaining a shitload of Icinga2 and check_mk based systems, I decided to migrate the whole monitoring to a shiny new system. After two weeks of evaluation I tried the setup we describe in this blog post, and I can recommend it!
Quickstats
Monitoring in a nutshell: Have a master to which the workers report their status. Scalable, simple, efficient: the ETVGA stack (Exporter, Telegraf, Victoria Metrics, Grafana, Alertmanager).

The bird exporter exports metrics that are scraped by Telegraf. Telegraf then sends the scraped data to Victoria Metrics. Grafana then accesses the data exposed for it by Victoria Metrics.
Overall concept
The individual nodes report their stats to the master. This makes it possible to dynamically add nodes without needing to adjust stuff on the master node.

In the example schema above, the worker nodes node[1-n].company.com report their stats to the master node located at masternode.company.com.
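To make the push model concrete, here is a minimal sketch of a single push done by hand. It assumes Victoria Metrics accepts InfluxDB line protocol on its /write endpoint, which is what Telegraf's InfluxDB output talks to. The metric name node_up, the VM_URL variable, and the credentials are placeholders, not values from the repository.

```shell
#!/bin/sh
# Assumed base URL of the Victoria Metrics instance on the master node.
VM_URL="${VM_URL:-https://masternode.company.com}"

# Build one InfluxDB line-protocol record: <measurement>,<tag>=<v> <field>=<v>
influx_line() {
    host="$1"
    value="$2"
    printf 'node_up,host=%s value=%s\n' "$host" "$value"
}

# Push a single record; user and password are placeholders for the auth.
push_metric() {
    influx_line "$1" "$2" | curl -sS -u "user:password" --data-binary @- "$VM_URL/write"
}

# Show what would be sent for node1 without touching the network.
influx_line node1.company.com 1
```

Calling push_metric node1.company.com 1 would send that record to the master. Because every node identifies itself in the data it pushes, the master needs no per-node configuration.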
Setup
All files needed are located in this git repository.

The setup works like this: The Ansible inventory is built from the data provided by NetBox. The Ansible runner then uses this inventory to create the exporter and the sidecar Telegraf service that export the data on the individual nodes.
Set up the main node
Install docker + docker-compose
- install docker
$ curl -s https://get.docker.com | sh
- install docker-compose
$ sudo curl -L "https://github.com/docker/compose/releases/download/1.25.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
Set up the directory structure
- create a docker directory
$ mkdir -p /docker/monitoring
$ cd /docker/monitoring
Insert the needed files
- place the docker-compose.yml here
- place the grafana.env here
Adjust the docker-compose to suit your needs
- adjust host rules (replace "yourdomain.com" with your domain)
sed -i 's/yourdomain.com/newdomain.com/g' docker-compose.yml
- create passwords using htpasswd
- create passwords for the auth (traefik, victoria-metrics)
- create a password for grafana in the grafana.env
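A minimal sketch of the password steps above. It assumes the htpasswd hash ends up inside docker-compose.yml (for example in a Traefik basic-auth label), where every "$" has to be doubled, and that Grafana reads its admin password from GF_SECURITY_ADMIN_PASSWORD. The helper names are made up for illustration.

```shell
#!/bin/sh
# Double every "$" so docker-compose does not treat parts of the hash as variables.
escape_for_compose() {
    sed -e 's/\$/$$/g'
}

# Print a compose-safe htpasswd entry (needs htpasswd, e.g. from apache2-utils).
# Usage: make_auth_entry <user> <password>
make_auth_entry() {
    htpasswd -nbB "$1" "$2" | escape_for_compose
}

# Generate a random 32-character alphanumeric password.
random_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
}

# Print the line to put into grafana.env (assumed variable name, see above).
printf 'GF_SECURITY_ADMIN_PASSWORD=%s\n' "$(random_password)"
```

Running make_auth_entry with a user name and password prints an entry that can be pasted into the compose file as-is.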
Deploy the compose
docker-compose up -d
Set up Grafana
- log in to Grafana using the user admin and the password defined in the env file
- add the Victoria Metrics endpoint as a data source
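One way to script the data source step is Grafana's HTTP API. The sketch below assumes Victoria Metrics is reachable from Grafana at http://victoria-metrics:8428 (a guessed compose service name plus the default port) and is added as a Prometheus-type data source, since it speaks the Prometheus query API. The Grafana URL and the password variable are placeholders.

```shell
#!/bin/sh
# Assumed addresses; adjust to the names used in your docker-compose.yml.
GRAFANA_URL="${GRAFANA_URL:-https://grafana.yourdomain.com}"
VM_INTERNAL_URL="${VM_INTERNAL_URL:-http://victoria-metrics:8428}"

# Print the JSON body that describes the data source.
datasource_payload() {
    cat <<EOF
{
  "name": "VictoriaMetrics",
  "type": "prometheus",
  "access": "proxy",
  "url": "$VM_INTERNAL_URL",
  "isDefault": true
}
EOF
}

# Create the data source as the admin user; GF_SECURITY_ADMIN_PASSWORD is
# expected to hold the password from grafana.env.
add_datasource() {
    datasource_payload | curl -sS -u "admin:$GF_SECURITY_ADMIN_PASSWORD" \
        -H 'Content-Type: application/json' \
        -X POST --data-binary @- "$GRAFANA_URL/api/datasources"
}

# Inspect the payload before sending it.
datasource_payload
```

Calling add_datasource once after the stack is up has the same effect as adding the data source by hand in the Grafana UI.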
Set up the worker nodes
Ansible setup
We use Ansible to deploy Telegraf and the exporter onto the devices. The playbook performs the following tasks:
- Add influx repo
- Add influx repo gpg key
- Update apt cache
- Install Telegraf
- Build config from template
- Restart Telegraf
To deploy a node:
- add the host to the Ansible inventory
This is done like this
- adjust the telegraf config file
- host
- password
- run the ansible playbook
ansible-playbook -i <inventory> Playboks/setup-telegraf.yml --limit "<ip>"
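For the inventory step, a static INI inventory is enough to try the playbook on a single host. This is only a sketch: the group name telegraf_nodes, the file name inventory.ini, and the ansible_user value are made up here, and in the setup described above the inventory is generated from NetBox instead.

```shell
#!/bin/sh
# Write a one-host inventory; group name and user are hypothetical.
cat > inventory.ini <<'EOF'
[telegraf_nodes]
node1.company.com ansible_user=root
EOF

# Show the result. With Ansible installed, the parsed view is available via:
#   ansible-inventory -i inventory.ini --list
cat inventory.ini
```

The resulting file can be passed to the playbook run with -i inventory.ini.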
Master
This is the master node which bundles the metrics. This means that all other nodes PUSH their metrics here and Victoria Metrics bundles the results so that Grafana can display them.
Nodes
These are worker nodes that aggregate the metrics that should be monitored. This happens in two steps:
- Aggregate the metrics using an exporter (such as bird_exporter)
- Scrape the exported data on the node using Telegraf. This periodically collects the results from the exporter and pushes the data to the Victoria Metrics instance on the master node.
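The two steps can be sketched as a Telegraf config. Assumptions: bird_exporter listens on its default localhost:9324, Victoria Metrics is written to through Telegraf's InfluxDB output (it accepts the Influx line protocol), and the URL, user, and password are placeholders for the values filled in by the real template.

```shell
#!/bin/sh
# Write a minimal Telegraf config to the current directory for inspection.
cat > telegraf.conf <<'EOF'
[agent]
  interval = "10s"

# Step 1: scrape the exporter running on this node.
[[inputs.prometheus]]
  urls = ["http://localhost:9324/metrics"]

# Step 2: push the scraped metrics to Victoria Metrics on the master node.
[[outputs.influxdb]]
  urls = ["https://masternode.company.com"]
  username = "user"
  password = "password"
  skip_database_creation = true
EOF

# Dry run on a node with Telegraf installed (gathers once, prints, sends nothing):
#   telegraf --config telegraf.conf --test
echo "wrote telegraf.conf"
```

Adding another exporter on the same node only needs another entry in urls, so the master stays untouched.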
Alerting
Alerting is currently done by Grafana. I want to attach the Prometheus Alertmanager, but it is currently not supported by Victoria Metrics.