Introduction
When a system goes into operation, it is important that it is monitored. The Operations team is in charge of this, and the Platform provides the mechanisms so that it can be monitored.
For this, the Platform offers two components, which we call Basic Monitoring and Advanced Monitoring (read "What does Onesait Platform's Advanced Monitoring offer?").
Technologies
Platform monitoring covers both the cluster machines and the Kubernetes components (Pods, Services, etc.) and is based on the following technologies:
Prometheus: a tool that collects Kubernetes metrics, both from the k8s components themselves (Control Plane, Scheduler, etcd) and from system components, as well as from the deployed Platform pods, in addition to basic metrics of the nodes that form the cluster.
Grafana: once the Datasource is configured, it visualizes the metrics stored both by Prometheus and by InfluxDB.
In addition, the CaaS (Rancher 2 or OpenShift) integrates Prometheus+Grafana monitoring, so the different cluster dashboards can be accessed from its administration console.
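As an illustration, once the Prometheus Datasource is configured in Grafana, queries like the following can be used in a dashboard panel. This is only a sketch: the namespace name is an assumption, and the metrics come from the standard cAdvisor and node-exporter scrape targets.

# CPU cores used per pod in a given namespace, averaged over the last 5 minutes
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="onesaitplatform"}[5m]))

# Available memory per node, as exposed by node-exporter
node_memory_MemAvailable_bytes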
Deployment in Rancher
To deploy the Grafana+Prometheus combo offered by Rancher 2, click on the Tools>Monitoring tab.
This will open a new window where you can change the default settings for the display of these tools.
When you click on deploy, Rancher 2 will run a Helm chart that, after a few minutes, will have the monitoring of the elements that make up the cluster ready.
How to access the Monitoring in Rancher
To access the monitoring of the cluster and/or of the components or microservices deployed in the different namespaces, we will have to log into the Rancher 2 console:
Once inside, we access the cluster we want to monitor, in this case onesaitplatform.
A dashboard will open showing the percentage of CPU, memory and Pod capacity used in the cluster, along with a direct link to a specific dashboard for each Kubernetes component: etcd, Control Plane, Scheduler and Nodes.
How to access the Monitoring of the Platform components
In the same way that we accessed the cluster monitoring (k8s components and the nodes on which it is deployed), we can access the monitoring of each of the Platform components. To do this, we have to open the namespace we are interested in from the Rancher 2 console.
Within the namespace we will access the workloads or deployed resources, in this case Deployments and Pods:
In the list of Deployments we will choose the one we want to monitor:
The workload metrics will be displayed along with a link to the dashboard if we click on the Grafana icon.
Within Grafana we will be able to examine the dashboard of any Deployment in the namespace; it is enough to select it in the Deployment drop-down:
We will even be able to inspect the dashboards of other namespaces without having to go back to the Rancher 2 console.
New Monitoring System in Rancher >= 2.5
Monitoring using the Cluster Manager is deprecated as of Rancher version v2.5.0. The updated version is installed from the Cluster Explorer Marketplace.
Deploying Monitoring from Cluster Explorer
Open Cluster Explorer
Open the marketplace
Configure the monitoring application. The deployment is done with Helm 3. You can use the form to configure the installation, or edit the YAML values directly.
Select the project where the resources will be deployed.
Select the type of cluster
Select the retention policy for Prometheus.
Choose whether to persist the data. To set up persistence, you need to create and configure several additional resources depending on the cloud provider. The following steps are necessary if you are in an on-premises environment without a disk provider.
Choose the appropriate storage size, taking into account the retention period indicated previously.
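As an illustration, the retention and the requested volume size can also be set in the chart values. This is a minimal sketch, assuming the rancher-monitoring chart exposes the usual kube-prometheus-stack prometheusSpec options; the values shown are examples:

prometheus:
  prometheusSpec:
    retention: 15d              # how long Prometheus keeps metrics
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi    # size chosen according to the retention period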
Prometheus is deployed as a StatefulSet, so its PVCs are created from a PVC template defined in the StatefulSet itself. When there is no disk provider, the PVs have to be created by hand with labels that can be used as selectors, so that the dynamically created PVCs bind to the correct PVs.
Example of a PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-1
  namespace: cattle-monitoring-system
  labels:
    minsait/storage: prometheus
spec:
  capacity:
    storage: 100Gi
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: minsait/node
              operator: In
              values:
                - osp1
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /Data/onesaitplatform/prometheus
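For the dynamically created PVC to bind to this PV, a matching selector can be added to the PVC template in the chart values. A minimal sketch, assuming the same kube-prometheus-stack style prometheusSpec values as above:

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          selector:
            matchLabels:
              minsait/storage: prometheus   # matches the label set on the PV
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi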
There is a bug in RKE when using Local or HostPath storage with subPaths in the persisted volume. The Helm chart maps the PVCs to the Prometheus pods using subPaths and therefore fails. When this happens, everything looks correctly installed, but the data is not persisted in the specified directory; instead, it ends up in a local Kubernetes volume under /var/lib/kubelet. To fix it, edit the YAML and add the setting disableMountSubPath: true in the storageSpec section of the prometheusSpec configuration:
prometheus:
  prometheusSpec:
    storageSpec:
      disableMountSubPath: true
To configure Grafana persistence, you need to create a PV and a PVC manually, as there is no disk provider.
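A minimal sketch of such a pair, following the same hostPath and label conventions as the Prometheus PV above; the names, sizes and paths are assumptions to adapt to each environment:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
  labels:
    minsait/storage: grafana
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /Data/onesaitplatform/grafana
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: cattle-monitoring-system
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      minsait/storage: grafana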
Finally, affinity rules can be defined to choose on which nodes the resources will be deployed. For example:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: minsait/env
              operator: In
              values:
                - osp
Some pods used to extract metrics from nodes are deployed on all nodes.
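These node-level collectors run as a DaemonSet, so their placement can be checked directly; a quick verification, assuming the chart installs into the cattle-monitoring-system namespace as in the PV example above:

kubectl get daemonset -n cattle-monitoring-system
kubectl get pods -n cattle-monitoring-system -o wide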
Finally, if you want to delete the monitoring application, several resources need to be deleted manually. Once the application has been removed from the marketplace, check that the monitoring namespace has been deleted. Sometimes a resource gets stuck in an updating state in the APIServices; you can delete all the APIServices related to monitoring. The same must be done with the CRDs (Custom Resource Definitions). In addition, some webhooks can be left behind. These can be checked and deleted with the following commands:
kubectl get validatingwebhookconfiguration -A --insecure-skip-tls-verify=true
# kubectl delete validatingwebhookconfiguration/rancher-monitoring-admission --insecure-skip-tls-verify=true
kubectl get mutatingwebhookconfiguration -A --insecure-skip-tls-verify=true
# kubectl delete mutatingwebhookconfiguration/rancher-monitoring-admission --insecure-skip-tls-verify=true
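In the same vein, the leftover APIServices and CRDs mentioned above can be listed and removed. A sketch, assuming the standard monitoring.coreos.com CRD names installed by the chart (exact names may vary by version, so check the output of the list commands first):

# List APIServices and CRDs related to monitoring
kubectl get apiservices | grep monitoring
kubectl get crd | grep monitoring.coreos.com
# Example deletion of the monitoring CRDs (adjust the list to what the previous command returns)
# kubectl delete crd prometheuses.monitoring.coreos.com alertmanagers.monitoring.coreos.com servicemonitors.monitoring.coreos.com podmonitors.monitoring.coreos.com prometheusrules.monitoring.coreos.com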