Michael Fladischer
1e41e3065d
Use items() instead of iteritems() for Python3 compatibility.
iteritems() was dropped in Python 3, and items() is compatible
with Python 2.7.
6 years ago
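The compatibility point in commit 1e41e3065d can be illustrated with a minimal sketch. The `settings` mapping and the `format_pairs` helper are hypothetical and are not taken from the formula itself; the only claim is that `items()` exists on dicts in both Python 2.7 and Python 3, while `iteritems()` does not exist on Python 3 dicts.

```python
# Hypothetical mapping, used only for illustration.
settings = {"bond0": "up", "eth0": "down"}


def format_pairs(mapping):
    """Return sorted 'key=value' strings.

    items() is available on dicts in both Python 2.7 and Python 3,
    so this loop runs unchanged on either interpreter.
    """
    return sorted("{0}={1}".format(key, value) for key, value in mapping.items())


pairs = format_pairs(settings)
print(pairs)
```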
Mateusz Matuszkowiak
03538a8530
Change regex for hdd errors to be more strict
It's possible for fluentd to match and report false positives with the
current regex for hdd errors. The following example log line:
failed to deactivate service binding for container
jenkins_slave02.1.tijvdstzxrs6gikbwrtu85078" error=
can be caught by the regex, and a (false positive) report about the
issue will be sent to Prometheus. The new regex must therefore be stricter
to avoid such alerts.
Change-Id: Ieb27ca39a32ad7bf6e1d0e88d564405e460a4f5f
Closes-Bug: PROD-17883
6 years ago
Bartosz Kupidura
6616077674
Generate metrics from logs
Change-Id: I5a8ccb235d36c1b4115794904f373a5704c2296d
7 years ago
Kirill Mashchenko
01ad2ccdce
Increase disk issues timeout for alerts
Change-Id: I646a852be587598ff0866e5941d954a6ac1fdd08
7 years ago
Kirill Mashchenko
f2a380d42a
Reduce alerting noise for system disk issues
Change-Id: I4fb69e8defa44a9d92a9fb7c23a6280fffc1a3e9
7 years ago
Filip Pytloun
36200a423f
Fix meta/salt.yml to workaround broken formulas
Change-Id: I6b6fbaaebd3e349bf76aa05cb9eb2004a842d9c5
7 years ago
Bartosz Kupidura
3852f9cffc
Move fluentd config under agent role
Change-Id: I22e7e4713e20f6a0f79c5ab4b3066f1f0129feb0
7 years ago
sgudz
f73b92fddf
Fix for disabled repos
Change-Id: Icbbd64144e6619eaa56e02c6c9362c7bcad9dd96
7 years ago
Bartosz Kupidura
f2706bc09a
Fix pos_file location
Change-Id: Ifa2787d76cf18e29f54583046349c071c5e9a25e
7 years ago
Bartosz Kupidura
19330f5e9e
Add fluentd support
Change-Id: I64a93135daebe7d55430adc51de2c9186c7a5ad7
7 years ago
Szymon Bańka
a0dd1737af
Fix SystemDiskInodesTooLow alert
Change-Id: I715f78983c69084c81d4efd4a5625d5dfe0f276f
7 years ago
Ramon Melero
14ef04f504
Adds alert to warn for open files being depleted
Change-Id: I87d132ce6473715b0992e561b2855456f24bcb3b
7 years ago
Dmitry Kalashnik
2dd3b450d5
Raise severity for System(Tx,Rx)PacketsDroppedTooHigh
Raise severity from warning to critical
Partial-Bug: PROD-15203
Change-Id: I32f19b5520bc200d61280da57f4ab5842b060454
7 years ago
Bartosz Kupidura
652ed7ced6
Remove SwapUsed alert
Change-Id: I67531b6ad15a2e96ee05178f17aae2504b3362bf
7 years ago
Serhiy Ovsianikov
67bd56a83c
Add atop
Change-Id: I59297736406469e5314236cb40851d9a6f94386e
7 years ago
Simon Pasquier
0ab8d27812
Update Telegraf config to ignore aufs partitions
Change-Id: I94f09359f976ccd0f207277da52d20e659b36a69
7 years ago
Simon Pasquier
b9d6e99ca1
Add alerts on disk full
The other disk alerts use predict_linear() to trigger before a disk gets
full but they don't trigger when the disk is effectively (or nearly)
full.
Change-Id: I8e6794d35bf96378ca3e3d527db4315d2b3a868d
7 years ago
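The gap described in commit b9d6e99ca1 can be sketched in a few lines. This is not the formula's alert definition: the `predict_linear` function below is a plain least-squares extrapolation that only approximates the Prometheus function of the same name, and the 8-hour horizon and 5% threshold are hypothetical values chosen for illustration.

```python
def predict_linear(samples, horizon):
    """Least-squares extrapolation of (timestamp, value) samples.

    Returns the value predicted `horizon` seconds after the last sample.
    This is a simplified stand-in for Prometheus' predict_linear().
    """
    n = float(len(samples))
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t if var_t else 0.0
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + horizon)


HORIZON = 8 * 3600          # hypothetical look-ahead, in seconds
FREE_NOW_THRESHOLD = 5.0    # hypothetical "disk full" threshold, in percent free


def alerts(samples):
    """Evaluate a predictive alert and a plain threshold alert on free-space samples."""
    predicted = predict_linear(samples, HORIZON)
    return {
        "predictive": predicted < 0,
        "disk_full": samples[-1][1] < FREE_NOW_THRESHOLD,
    }


# Free space (percent) dropping steadily: only the predictive alert fires.
draining = [(0, 50.0), (3600, 40.0), (7200, 30.0)]
# Free space already very low but flat: only the threshold alert fires.
nearly_full = [(0, 2.0), (3600, 2.0), (7200, 2.0)]

print(alerts(draining))
print(alerts(nearly_full))
```

A flat series has zero slope, so the extrapolation never crosses zero, which is why a separate threshold alert is needed for a disk that is already full.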
Ales Komarek
7a7ddfbf8f
Fixed the dns records grain
Change-Id: I574c6e1a31f71502eb279cdc3c5768ee483d73fa
7 years ago
Ales Komarek
417e8c5cdb
Allow mining for the dns records for local hosts records
Change-Id: I8f2a66c6edafc425794d7cedc8b9217df7ee5951
7 years ago
Jaymes Mosher
a2c295dc68
Add bond member status monitoring.
Pillar values:
linux.monitoring.bond_status.interfaces = [ 'bond0', 'all', 'etc' ]
Leave bond_status.interfaces undefined to disable (default).
Depends-On: Ia07d4c473bf64d98170f51599caaedb46645ede3
Change-Id: I62a7d59251d37cb6c7fc7b761f63a5599930f1dc
7 years ago
Simon Pasquier
05a8fd2bb1
Don't collect metrics from overlay filesystems
This is typically used to mount Docker containers but it generates too
many volatile metrics which aren't useful.
Change-Id: I00117895570515b2c8f9690542e83061309464c3
7 years ago
Simon Pasquier
1483c5b3d3
Add a critical alert on low memory
Change-Id: I1c8e752de9ad3479da830706ae736df6846b977f
7 years ago
Simon Pasquier
c462fdfe27
Fix typos in linux/meta/prometheus.yml
Change-Id: Ia7df4918732ce8fcf28b1d6eed629073146a567c
7 years ago
Bartosz Kupidura
3d2af0c43f
Don't collect metrics from 'virtual' filesystems
Change-Id: I456ed02ad54a9b55486b4c4a61c9cebfb8f28613
7 years ago
Bartosz Kupidura
d2c6bc323a
Disable not used metrics exposed per cpu
Change-Id: Ie3f9da382c23148836e4a20ff0f37c3929e062cf
7 years ago
Simon Pasquier
db768fb47c
Fix Prometheus alerts on dropped packets
Since metrics on dropped packets are counters, the alerts should use
the rate() function. This change also fixes some inconsistencies in the
alert descriptions.
Change-Id: I9abbc0a49f45ba760836c436a3e7e65aa62f652e
7 years ago
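The reasoning in commit db768fb47c (counters need rate()) can be sketched as follows. The `simple_rate` helper is a hypothetical simplification: it computes the per-second increase between the first and last sample and treats any decrease as a counter reset, whereas the real Prometheus rate() also extrapolates to the window boundaries. The sample values are invented for illustration.

```python
def simple_rate(samples):
    """Per-second increase over a list of (timestamp, counter_value) samples.

    A decrease between two samples is treated as a counter reset,
    so the new value is counted as the increase since the reset.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# The counter is large because of old drops, but nothing is being dropped now.
idle = [(0, 10000), (60, 10000), (120, 10000)]
# The counter is growing: packets are being dropped right now.
active = [(0, 10000), (60, 10600), (120, 11200)]

idle_rate = simple_rate(idle)
active_rate = simple_rate(active)
print(idle_rate, active_rate)
```

An alert on the raw counter value would treat both series the same way, since both are far above any small threshold; an alert on the rate separates the idle interface from the one that is dropping packets.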
Simon Pasquier
c7b79ad6b4
Rename Prometheus alerts for consistency
Change-Id: I1cc00b41a6a1774d1401a9f71ab4c6364c65d139
7 years ago
Olivier Bourdon
0723131ffd
Fix linux/meta/prometheus.yml for the CI
Change-Id: Idc73c152a0e71d5ac2a8c10f46c955755d8e77ae
7 years ago
Jaymes Mosher
aa2a52cf9b
Scratch using interfaces_override
7 years ago
Jaymes Mosher
603e62ab9e
Keep regex as default but still allow overrides.
7 years ago
Simon Pasquier
9083abf8a3
Add monitoring of the swap usage
This change adds the Telegraf configuration to collect swap metrics, the
associated Prometheus alarms and graphs to the Grafana dashboard.
Change-Id: I3595fd0b8cab06215c620642da69dd29c398396a
7 years ago
Jaymes Mosher
cf6dbf1d6a
Use Pillar to choose which interfaces to monitor.
The `linux_netlink.ls` function used a regex to choose which interfaces
to collect metrics for.
`_alphanum_re = re.compile(r'^[a-z0-9]+$')`
Unfortunately, by default this excludes vlan and tap interfaces, which
are kind of important, e.g. `bond0.120` or `tap2a3dab86-fb`.
There is a further problem: even if we update the regex to include
these interfaces, when someone deletes an instance and spawns a new one
the tap device name changes on the compute host, and the new device will
not be monitored unless someone re-runs `collectd` on the compute host again.
Less than ideal.
This commit lets us choose `VerboseInterface "all"` using Pillar data
to avoid this problem.
7 years ago
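The behaviour of the regex quoted in commit cf6dbf1d6a can be checked directly. The pattern and the two excluded interface names come from the commit message; `eth0` and `bond0` are added here only as examples of names that do match.

```python
import re

# Pattern quoted in the commit message.
_alphanum_re = re.compile(r'^[a-z0-9]+$')

# bond0.120 and tap2a3dab86-fb come from the commit message;
# eth0 and bond0 are illustrative additions.
names = ["eth0", "bond0", "bond0.120", "tap2a3dab86-fb"]

matched = [name for name in names if _alphanum_re.match(name)]
excluded = [name for name in names if not _alphanum_re.match(name)]

print("matched:", matched)
print("excluded:", excluded)
```

The `.` in the vlan name and the `-` in the tap name fall outside `[a-z0-9]`, so both interfaces are skipped.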
Simon Pasquier
4d290b5eec
Add Prometheus alerts for dropped packets
Change-Id: If50f18367b22338b3fba1ff15902d557a0bdf2ea
7 years ago
Simon Pasquier
d32688e7aa
Reword Prometheus alert messages
Change-Id: I54e02e0741d53ec7b2335145dc968b7b8c8f5e00
7 years ago
Ales Komarek
02f35a537c
Graph metadata
Change-Id: If0ee6f1ac5ab697559fcd853225e1520de2e8c1c
7 years ago
Simon Pasquier
234e14acda
Add Grafana dashboard for Prometheus datasource
Change-Id: Icacb0ca22a34f1ff438a895700040563d250bac9
7 years ago
Simon Pasquier
b1813426dc
Enable kernel, net and process metrics for Telegraf
Change-Id: I008818853c2058746be08365283b949177efa254
Depends-On: I3c3c569a013aff8c3ab8e46cffb93a60d74ddf09
7 years ago
Swann Croiset
d66a782570
Enable diskio input telegraf plugin
Change-Id: I80193afad1842f67967d1bab164f049078e3cd75
7 years ago
Erick Cantwell
e5770ac50f
[MMO-132] Check the length of the dict, instead of if it's defined (it
will always be defined since the default is an empty dict)
7 years ago
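The distinction made in commit e5770ac50f can be shown with a Python analogue. The actual change was presumably made in a template, which is not shown here; the `get_checks` helper and the `checks` key are hypothetical names used only to reproduce the "default is an empty dict" situation.

```python
def get_checks(pillar):
    """Return the configured checks, defaulting to an empty dict (hypothetical helper)."""
    return pillar.get("checks", {})


checks = get_checks({})  # nothing configured

# An "is it defined" style test: always True, because the default is {}.
is_defined = checks is not None

# A length test: True only when at least one check is configured.
has_entries = len(checks) > 0

print(is_defined, has_entries)
```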
Filip Pytloun
ea11327afe
Fix grains generation when linux_netlink.ls is not available
Change-Id: Id4b0b405872457bd8b20f450e4031d6808d3cf59
7 years ago
Filip Pytloun
e70606d0d2
Manage grains using support metadata
Change-Id: I25fb0eb0d4b922b8853eceb0c1c220a4040e1704
7 years ago
Bartosz Kupidura
d8b54c95da
Add variables in prometheus alerts
Change-Id: I1765fc6aa4a8c3da25330f19bb043ddbf548b9ad
7 years ago
Damian Szeluga
1e47abe149
Add option to parametrize checks
7 years ago
Bartosz Kupidura
0bd8565876
Add support for prometheus
Change-Id: I66576b4ed40ef160c5f13747a908f018f252b6b4
7 years ago
Bartosz Kupidura
df9b40d973
Add telegraf support
Change-Id: I03bed44bafdebbcd22f487e59ef0de45dfbf3463
7 years ago
Simon Pasquier
a4a6f16bbe
Fix severity for the linux_system_cpu_warning alarm
Change-Id: Ic3a1e77f2d38c5d916dd3c07211a6ea160559e6f
7 years ago
Simon Pasquier
89b97640d0
Report swap metrics in bytes
Change-Id: Ic39fa0f18e0d9aeca0ef73ae6d985d12d15a1c3a
7 years ago
vmikes
37837f3280
Revert "turn off check swap if needed"
This reverts commit a63f4053f3.
8 years ago
vmikes
a63f4053f3
turn off check swap if needed
8 years ago
Éric Lemoine
6d6f5b4c00
Remove support for log_collector
The support for collecting syslog is going to be moved to the rsyslog
formula.
8 years ago