Providing a BI platform centers on a comprehensive health check of the BI system components and tools, together with a technical examination of the processes involved in data management and transformation. Using a monitoring stack of Grafana and Prometheus, PTA defines both technical and business KPIs and builds the corresponding dashboards. The state of each higher-level component is derived by rolling up its related, detailed KPIs.
Supplement
At the lowest level, dashboards contain detailed panels that show the history over time of technical KPIs such as CPU utilization, memory usage, and drive utilization. These technical metrics are provided by a node exporter installed on every VM of the BI platform. Business processes are covered as well, in the form of information about ETL jobs. The client side is monitored through HTTP checks and DNS name resolution checks. Based on defined metrics and threshold values, a self-contained tile is created for each system component and shows its current status in green, yellow, or red. At the top level, exactly one tile represents the state of the entire platform, derived from the states of its platform components. In addition to the graphical display of threshold values in Grafana, alerting is enabled through Prometheus alerting rules.
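The roll-up from detailed KPIs to component tiles to a single platform tile can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the component names, metric names, and threshold values are hypothetical, and the "worst status wins" aggregation rule is an assumption, since the exact roll-up rule is not specified here.

```python
from enum import IntEnum


class Status(IntEnum):
    """Tile colors, ordered so that a higher value means a worse state."""
    GREEN = 0
    YELLOW = 1
    RED = 2


def classify(value: float, warn: float, crit: float) -> Status:
    """Map a metric value to a status using two thresholds (higher is worse)."""
    if value >= crit:
        return Status.RED
    if value >= warn:
        return Status.YELLOW
    return Status.GREEN


def roll_up(statuses) -> Status:
    """Assumed rule: the worst status wins; no inputs counts as GREEN."""
    return max(statuses, default=Status.GREEN)


# Hypothetical components and metrics: metric -> (value, warn, crit)
platform = {
    "etl-server": {
        "cpu_percent": (72.0, 80.0, 95.0),
        "memory_percent": (88.0, 85.0, 95.0),
    },
    "database": {
        "cpu_percent": (35.0, 80.0, 95.0),
        "disk_percent": (97.0, 85.0, 95.0),
    },
    "reporting": {
        "cpu_percent": (20.0, 80.0, 95.0),
    },
}

# One tile per component, rolled up from its detailed KPIs
component_tiles = {
    name: roll_up(classify(*metric) for metric in metrics.values())
    for name, metrics in platform.items()
}

# Exactly one tile for the whole platform, rolled up from the component tiles
platform_tile = roll_up(component_tiles.values())

for name, status in component_tiles.items():
    print(f"{name}: {status.name}")
print(f"platform: {platform_tile.name}")
```

With these sample values, a single critical disk reading on one component is enough to turn the platform tile red, which is the behavior a worst-state roll-up is meant to give. A real setup would more likely express the thresholds in Grafana panels and Prometheus rules rather than in application code.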
Subject description
The components of the BI platform run largely in the cloud and can already be monitored at the operating system level. Email alerting can also be configured for the relevant metrics. However, this distributed, heterogeneous monitoring gives no visibility into the state of interdependent tools and processes, so integrated dashboards are defined that present these components together. This helps ensure that the effects of specific failures or overloads are detected and remedied in a timely manner.