Current thresholds don't matche real warning/minor values for
the time_squeeze numbers. As a result we have false positive.
Change-Id: I6990c101fe671c05d75d0640fd6799667b5f3fa1
Related-PROD: PROD-24406 (PROD:24406)
Since we added to nstat's telegraf plugin the possibility
to collect data from `/proc/net/softnet_stat` regarding
dropped packets and rx_net_action a.k.a time squeeze, we need to enable
it globally on all hosts.
Also grafana dashboard update to include new graphs + added four
new Prometheus alers.
Related-Bug: PROD-21090
Change-Id: I9dfe87bdc8b677a51e3f305dd3c75c7d4cc4e0d4
The other disk alerts use predict_linear() to trigger before a disk gets
full but they don't trigger when the disk is effectively (or nearly)
full.
Change-Id: I8e6794d35bf96378ca3e3d527db4315d2b3a868d
Since metrics on dropped packets are counters, the alerts should use
the rate() function. This change also fixes some inconsistencies in the
alert descriptions.
Change-Id: I9abbc0a49f45ba760836c436a3e7e65aa62f652e
This change adds the Telegraf configuration to collect swap metrics, the
associated Prometheus alarms and graphs to the Grafana dashboard.
Change-Id: I3595fd0b8cab06215c620642da69dd29c398396a