Mateusz Matuszkowiak
ee7c76af8b
Enable nstat input plugin for softnet_stat data
The Telegraf nstat plugin can now collect data from
`/proc/net/softnet_stat` on dropped packets and
rx_net_action (a.k.a. time squeeze), so we need to enable
it globally on all hosts.
Also update the Grafana dashboard to include the new graphs and add four
new Prometheus alerts.
Related-Bug: PROD-21090
Change-Id: I9dfe87bdc8b677a51e3f305dd3c75c7d4cc4e0d4
6 years ago
mkobus
f546f9582f
Revert "Add monitoring for cron job"
As we resign to develop full-stack solution to monitor cron jobs
This reverts commit 697ce4bf04.
Change-Id: Icab6008011141bb658c836897a05018dd6ce2984
6 years ago
Michal Kobus
697ce4bf04
Add monitoring for cron job
Change-Id: I710b65decf6697d0bb5d21fc3fc2d332b78119c5
Closes-bug: PROD-21073
6 years ago
Michal Kobus
97242f156a
Cosmetic changes for alerts
Change-Id: I9e8464e3ee5ef28ca5eb7eb84e645e42fb6576cd
Closes-bug: PROD-20466
6 years ago
Michal Kobus
d40d0f1e24
Alerts reworked
Change alert names, severities and descriptions.
Closes-bug: PROD-19718
Change-Id: I238fbcd51cf48389b504ccb531ba9b2bc9dd4be6
6 years ago
Mateusz Matuszkowiak
734ab84c19
Added one more alert regarding bond
Partial-Bug: PROD-16264
Change-Id: I4f548a95bfb83076301f4669c1ff662c213c4aa3
6 years ago
Mateusz Matuszkowiak
55ca321447
Added bond related Prometheus alerts
Change-Id: Ic3c3186f42762062a65d340010b0ebff40f7c577
Partial-Bug: PROD-16264
6 years ago
Bartosz Kupidura
6616077674
Generate metrics from logs
Change-Id: I5a8ccb235d36c1b4115794904f373a5704c2296d
7 years ago
Kirill Mashchenko
01ad2ccdce
Increase disk issues timeout for alerts
Change-Id: I646a852be587598ff0866e5941d954a6ac1fdd08
7 years ago
Kirill Mashchenko
f2a380d42a
Reduce alerting noise for system disk issues
Change-Id: I4fb69e8defa44a9d92a9fb7c23a6280fffc1a3e9
7 years ago
Szymon Bańka
a0dd1737af
Fix SystemDiskInodesTooLow alert
Change-Id: I715f78983c69084c81d4efd4a5625d5dfe0f276f
7 years ago
Ramon Melero
14ef04f504
Adds alert to warn about open files being depleted
Change-Id: I87d132ce6473715b0992e561b2855456f24bcb3b
7 years ago
Dmitry Kalashnik
2dd3b450d5
Raise severity for System(Tx,Rx)PacketsDroppedTooHigh
Raise severity from warning to critical
Partial-Bug: PROD-15203
Change-Id: I32f19b5520bc200d61280da57f4ab5842b060454
7 years ago
Bartosz Kupidura
652ed7ced6
Remove SwapUsed alert
Change-Id: I67531b6ad15a2e96ee05178f17aae2504b3362bf
7 years ago
Simon Pasquier
b9d6e99ca1
Add alerts on disk full
The other disk alerts use predict_linear() to trigger before a disk gets
full, but they don't trigger when the disk is actually (or nearly)
full.
Change-Id: I8e6794d35bf96378ca3e3d527db4315d2b3a868d
7 years ago
Simon Pasquier
1483c5b3d3
Add a critical alert on low memory
Change-Id: I1c8e752de9ad3479da830706ae736df6846b977f
7 years ago
Simon Pasquier
c462fdfe27
Fix typos in linux/meta/prometheus.yml
Change-Id: Ia7df4918732ce8fcf28b1d6eed629073146a567c
7 years ago
Simon Pasquier
db768fb47c
Fix Prometheus alerts on dropped packets
Since metrics on dropped packets are counters, the alerts should use
the rate() function. This change also fixes some inconsistencies in the
alert descriptions.
Change-Id: I9abbc0a49f45ba760836c436a3e7e65aa62f652e
7 years ago
Simon Pasquier
c7b79ad6b4
Rename Prometheus alerts for consistency
Change-Id: I1cc00b41a6a1774d1401a9f71ab4c6364c65d139
7 years ago
Olivier Bourdon
0723131ffd
Fix linux/meta/prometheus.yml for the CI
Change-Id: Idc73c152a0e71d5ac2a8c10f46c955755d8e77ae
7 years ago
Simon Pasquier
9083abf8a3
Add monitoring of the swap usage
This change adds the Telegraf configuration to collect swap metrics, the
associated Prometheus alarms and graphs to the Grafana dashboard.
Change-Id: I3595fd0b8cab06215c620642da69dd29c398396a
7 years ago
Simon Pasquier
4d290b5eec
Add Prometheus alerts for dropped packets
Change-Id: If50f18367b22338b3fba1ff15902d557a0bdf2ea
7 years ago
Simon Pasquier
d32688e7aa
Reword Prometheus alert messages
Change-Id: I54e02e0741d53ec7b2335145dc968b7b8c8f5e00
7 years ago
Bartosz Kupidura
d8b54c95da
Add variables in prometheus alerts
Change-Id: I1765fc6aa4a8c3da25330f19bb043ddbf548b9ad
7 years ago
Bartosz Kupidura
0bd8565876
Add support for prometheus
Change-Id: I66576b4ed40ef160c5f13747a908f018f252b6b4
7 years ago