Serhiy Ovsianikov
67bd56a83c
Add atop
Change-Id: I59297736406469e5314236cb40851d9a6f94386e
7 years ago
Simon Pasquier
0ab8d27812
Update Telegraf config to ignore aufs partitions
Change-Id: I94f09359f976ccd0f207277da52d20e659b36a69
7 years ago
Ales Komarek
7a7ddfbf8f
Fixed the dns records grain
Change-Id: I574c6e1a31f71502eb279cdc3c5768ee483d73fa
7 years ago
Ales Komarek
417e8c5cdb
Allow mining for the dns records for local hosts records
Change-Id: I8f2a66c6edafc425794d7cedc8b9217df7ee5951
7 years ago
Jaymes Mosher
a2c295dc68
Add bond member status monitoring.
Pillar values:
linux.monitoring.bond_status.interfaces = [ 'bond0', 'all', 'etc' ]
Leave bond_status.interfaces undefined to disable (default).
Depends-On: Ia07d4c473bf64d98170f51599caaedb46645ede3
Change-Id: I62a7d59251d37cb6c7fc7b761f63a5599930f1dc
7 years ago
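Commit a2c295dc68 describes the `linux.monitoring.bond_status.interfaces` pillar only by example. Below is a minimal sketch of how such a value might be resolved to concrete bond names. It assumes that `'all'` expands to every bond on the host and that an undefined or empty value disables monitoring. The helper `select_bond_interfaces` and the host bond list are hypothetical and are not the formula's actual template logic.

```python
from typing import Iterable, List, Optional


def select_bond_interfaces(
    pillar_value: Optional[List[str]], host_bonds: Iterable[str]
) -> List[str]:
    """Hypothetical helper: resolve bond_status.interfaces to concrete bond names."""
    if not pillar_value:
        # Undefined or empty pillar: bond member monitoring stays disabled (the default).
        return []
    present = sorted(set(host_bonds))
    if "all" in pillar_value:
        # Assumption: 'all' expands to every bond interface present on the host.
        return present
    # Otherwise keep only the listed bonds that exist on this host, in pillar order.
    return [name for name in pillar_value if name in present]
```

For example, `select_bond_interfaces(["all"], ["bond1", "bond0"])` would return `["bond0", "bond1"]`, while leaving the pillar undefined (`None`) returns an empty list.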
Simon Pasquier
05a8fd2bb1
Don't collect metrics from overlay filesystems
This is typically used to mount Docker containers but it generates too
many volatile metrics which aren't useful.
Change-Id: I00117895570515b2c8f9690542e83061309464c3
7 years ago
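Commits 05a8fd2bb1 and 0ab8d27812 skip overlay and aufs mounts so that Docker container filesystems do not flood the metrics. Telegraf's disk input exposes an `ignore_fs` list for this purpose. The Python below is only an illustrative sketch of the filtering idea, assuming `/proc/mounts`-style input; it is not Telegraf's implementation, and the helper names are hypothetical.

```python
from typing import List, Tuple

# Filesystem types named in the commits: overlay (05a8fd2bb1) and aufs (0ab8d27812).
IGNORED_FS_TYPES = frozenset({"overlay", "aufs"})


def parse_mounts(text: str) -> List[Tuple[str, str]]:
    """Return (mount point, fs type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def collectable_mounts(text: str) -> List[Tuple[str, str]]:
    """Drop mounts whose fs type is in the ignore set."""
    return [(mp, fs) for mp, fs in parse_mounts(text) if fs not in IGNORED_FS_TYPES]
```

Filtering by filesystem type instead of by mount point means that newly created container mounts are excluded automatically, whatever path they appear under.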
Simon Pasquier
1483c5b3d3
Add a critical alert on low memory
Change-Id: I1c8e752de9ad3479da830706ae736df6846b977f
7 years ago
Simon Pasquier
c462fdfe27
Fix typos in linux/meta/prometheus.yml
Change-Id: Ia7df4918732ce8fcf28b1d6eed629073146a567c
7 years ago
Bartosz Kupidura
3d2af0c43f
Don't collect metrics from 'virtual' filesystems
Change-Id: I456ed02ad54a9b55486b4c4a61c9cebfb8f28613
7 years ago
Bartosz Kupidura
d2c6bc323a
Disable unused metrics exposed per CPU
Change-Id: Ie3f9da382c23148836e4a20ff0f37c3929e062cf
7 years ago
Simon Pasquier
db768fb47c
Fix Prometheus alerts on dropped packets
Since metrics on dropped packets are counters, the alerts should use
the rate() function. This change also fixes some inconsistencies in the
alert descriptions.
Change-Id: I9abbc0a49f45ba760836c436a3e7e65aa62f652e
7 years ago
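Commit db768fb47c explains that dropped-packet metrics are counters, so alerts need `rate()`. A counter's raw value only ever grows, except when it resets, so a threshold on the raw value would eventually fire permanently. The sketch below computes a simplified per-second rate over `(timestamp, value)` samples. Like Prometheus it treats a decrease as a counter reset, but unlike Prometheus' `rate()` it does not extrapolate to the edges of the time window. The function name `simple_rate` is hypothetical.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp, value) samples.

    Simplified: handles counter resets, but does not extrapolate to the
    window boundaries the way Prometheus rate() does.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter restarted from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

With samples `[(0, 100), (60, 160), (120, 220)]` the counter grows by 120 over 120 seconds, giving 1.0 dropped packet per second, a value that stays meaningful however long the host has been up.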
Simon Pasquier
c7b79ad6b4
Rename Prometheus alerts for consistency
Change-Id: I1cc00b41a6a1774d1401a9f71ab4c6364c65d139
7 years ago
Olivier Bourdon
0723131ffd
Fix linux/meta/prometheus.yml for the CI
Change-Id: Idc73c152a0e71d5ac2a8c10f46c955755d8e77ae
7 years ago
Jaymes Mosher
aa2a52cf9b
Scratch using interfaces_override
7 years ago
Jaymes Mosher
603e62ab9e
Keep regex as default but still allow overrides.
7 years ago
Simon Pasquier
9083abf8a3
Add monitoring of the swap usage
This change adds the Telegraf configuration to collect swap metrics, the
associated Prometheus alarms and graphs to the Grafana dashboard.
Change-Id: I3595fd0b8cab06215c620642da69dd29c398396a
7 years ago
Jaymes Mosher
cf6dbf1d6a
Use Pillar to choose which interfaces to monitor.
The `linux_netlink.ls` function used a regex to choose which interfaces
to collect metrics for:
`_alphanum_re = re.compile(r'^[a-z0-9]+$')`
Unfortunately, by default this excludes VLAN and tap interfaces, which
are important, e.g. `bond0.120` or `tap2a3dab86-fb`.
We also have a problem where, even if we update the regex to include
these interfaces, deleting an instance and spawning a new one changes
the tap device name on the compute host, and the new device is not
monitored until someone re-runs `collectd` on that compute host.
Less than ideal.
This commit lets us choose `VerboseInterface "all"` using Pillar data
to avoid this problem.
7 years ago
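The regex quoted in commit cf6dbf1d6a accepts only lowercase letters and digits, so any interface name containing `.` or `-` is rejected. The sketch below checks that claim against the example names from the commit. The `monitored` helper is hypothetical, and its `verbose_all` flag models `VerboseInterface "all"` under the assumption that this disables name filtering altogether.

```python
import re
from typing import Iterable, List

# Regex quoted in the commit message.
_alphanum_re = re.compile(r'^[a-z0-9]+$')


def monitored(interfaces: Iterable[str], verbose_all: bool = False) -> List[str]:
    """Return the interface names that would be collected.

    verbose_all=True is a hypothetical stand-in for VerboseInterface "all".
    """
    names = list(interfaces)
    if verbose_all:
        # Assumption: "all" collects every interface, whatever its name.
        return names
    # Only purely alphanumeric names pass, so VLAN (dot) and tap (dash) names are dropped.
    return [name for name in names if _alphanum_re.match(name)]
```

Given `["eth0", "bond0", "bond0.120", "tap2a3dab86-fb"]`, the regex keeps only `eth0` and `bond0`. Because "all" does not depend on names, a tap device recreated under a new name is picked up without re-running anything.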
Simon Pasquier
4d290b5eec
Add Prometheus alerts for dropped packets
Change-Id: If50f18367b22338b3fba1ff15902d557a0bdf2ea
7 years ago
Simon Pasquier
d32688e7aa
Reword Prometheus alert messages
Change-Id: I54e02e0741d53ec7b2335145dc968b7b8c8f5e00
7 years ago
Ales Komarek
02f35a537c
Graph metadata
Change-Id: If0ee6f1ac5ab697559fcd853225e1520de2e8c1c
7 years ago
Simon Pasquier
234e14acda
Add Grafana dashboard for Prometheus datasource
Change-Id: Icacb0ca22a34f1ff438a895700040563d250bac9
7 years ago
Simon Pasquier
b1813426dc
Enable kernel, net and process metrics for Telegraf
Change-Id: I008818853c2058746be08365283b949177efa254
Depends-On: I3c3c569a013aff8c3ab8e46cffb93a60d74ddf09
7 years ago
Swann Croiset
d66a782570
Enable diskio input telegraf plugin
Change-Id: I80193afad1842f67967d1bab164f049078e3cd75
7 years ago
Erick Cantwell
e5770ac50f
[MMO-132] Check the length of the dict instead of whether it's defined (it
will always be defined since the default is an empty dict)
7 years ago
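Commit e5770ac50f replaces a "defined" test with a length test. The reason is that a key whose default is `{}` always exists, so the old test is always true. The Python sketch below shows the difference. The pillar key `checks` and both helper functions are hypothetical, since the commit does not name the key, and the formula itself performs this test in its templates rather than in Python.

```python
# Hypothetical pillar in which the formula defaults always supply an empty dict.
defaults = {"checks": {}}


def check_defined(pillar: dict, key: str) -> bool:
    """Old-style test: true whenever the key exists, even if its dict is empty."""
    return key in pillar


def check_nonempty(pillar: dict, key: str) -> bool:
    """Fixed test: true only when the dict has at least one entry."""
    return len(pillar.get(key, {})) > 0
```

With only the default present, `check_defined(defaults, "checks")` is `True` while `check_nonempty(defaults, "checks")` is `False`, which is the case the fix targets.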
Filip Pytloun
ea11327afe
Fix grains generation when linux_netlink.ls is not available
Change-Id: Id4b0b405872457bd8b20f450e4031d6808d3cf59
7 years ago
Filip Pytloun
e70606d0d2
Manage grains using support metadata
Change-Id: I25fb0eb0d4b922b8853eceb0c1c220a4040e1704
7 years ago
Bartosz Kupidura
d8b54c95da
Add variables in prometheus alerts
Change-Id: I1765fc6aa4a8c3da25330f19bb043ddbf548b9ad
7 years ago
Damian Szeluga
1e47abe149
Add option to parametrize checks
7 years ago
Bartosz Kupidura
0bd8565876
Add support for prometheus
Change-Id: I66576b4ed40ef160c5f13747a908f018f252b6b4
7 years ago
Bartosz Kupidura
df9b40d973
Add telegraf support
Change-Id: I03bed44bafdebbcd22f487e59ef0de45dfbf3463
7 years ago
Simon Pasquier
a4a6f16bbe
Fix severity for the linux_system_cpu_warning alarm
Change-Id: Ic3a1e77f2d38c5d916dd3c07211a6ea160559e6f
7 years ago
Simon Pasquier
89b97640d0
Report swap metrics in bytes
Change-Id: Ic39fa0f18e0d9aeca0ef73ae6d985d12d15a1c3a
7 years ago
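Commit 89b97640d0 switches swap metrics to bytes. On Linux, `/proc/meminfo` reports `SwapTotal` and `SwapFree` with a "kB" label, but the unit is actually KiB (1024 bytes), so converting to bytes means multiplying by 1024. The sketch below performs that conversion on `/proc/meminfo`-style text. It illustrates the arithmetic only. It does not reproduce how Telegraf collects swap metrics, and the function name `swap_bytes` is hypothetical.

```python
from typing import Dict


def swap_bytes(meminfo_text: str) -> Dict[str, int]:
    """Return SwapTotal, SwapFree and SwapUsed in bytes from /proc/meminfo-style text."""
    kib: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            # The kernel labels these values "kB", but they are KiB (1024 bytes).
            kib[key] = int(rest.split()[0])
    total = kib.get("SwapTotal", 0) * 1024
    free = kib.get("SwapFree", 0) * 1024
    return {"SwapTotal": total, "SwapFree": free, "SwapUsed": total - free}
```

For `SwapTotal: 1024 kB` and `SwapFree: 512 kB`, the result is 1048576 bytes total, with 524288 bytes free and 524288 bytes used.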
vmikes
37837f3280
Revert "turn off check swap if needed"
This reverts commit a63f4053f3.
8 years ago
vmikes
a63f4053f3
turn off check swap if needed
8 years ago
Éric Lemoine
6d6f5b4c00
Remove support for log_collector
The support for collecting syslog is going to be moved to the rsyslog
formula.
8 years ago
Guillaume Thouvenin
b4f82c6013
Put Grafana dashboards into their own directory
8 years ago
Éric Lemoine
2f06db9e6d
Add more alarms
This commit adds more built-in alarms to the Linux formula.
8 years ago
Guillaume Thouvenin
e29d0a4f77
Provides Grafana dashboard
8 years ago
Swann Croiset
210e98304e
Redefine alerting property
The alerting property can be one of 'disabled', 'enabled' or
'enabled_with_notification'
8 years ago
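Commit 210e98304e restricts the `alerting` property to three string values. One way to enforce that restriction when reading configuration is a small enum that rejects anything else. This is a hypothetical Python sketch, not the formula's own validation. `Alerting`, `parse_alerting` and `sends_notifications` are illustrative names, and the notification semantics are assumed from the value names.

```python
from enum import Enum


class Alerting(str, Enum):
    """The three values the alerting property may take."""
    DISABLED = "disabled"
    ENABLED = "enabled"
    ENABLED_WITH_NOTIFICATION = "enabled_with_notification"


def parse_alerting(value: str) -> Alerting:
    """Convert a raw property value, rejecting anything outside the allowed set."""
    try:
        return Alerting(value)
    except ValueError:
        allowed = ", ".join(a.value for a in Alerting)
        raise ValueError(f"alerting must be one of: {allowed}; got {value!r}") from None


def sends_notifications(alerting: Alerting) -> bool:
    # Assumption based on the value name: only this state notifies.
    return alerting is Alerting.ENABLED_WITH_NOTIFICATION
```

Using an enum instead of free-form strings means a typo such as `"enable"` fails at load time rather than silently disabling an alarm.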
Simon Pasquier
8db94b38f4
Fix Syslog pattern for system logs
Currently Syslog doesn't log the priority ('<PRI>').
8 years ago
Simon Pasquier
e877605126
Add timezone support for system logs
8 years ago
Éric Lemoine
1787f0b297
Rename netlink.py to linux_netlink.py
This is to comply with a comment from @cznewt in
https://github.com/tcpcloud/salt-formula-heka/pull/24 .
8 years ago
Éric Lemoine
1c39744e43
Use netlink collectd plugin instead of interface
This patch replaces the "interface" collectd plugin with the "netlink" one. The
"netlink" plugin provides the same metrics as "interface" plus other
metrics such as the number of dropped packets.
8 years ago
Éric Lemoine
3035609caf
Remove Heka decoder tz handling
This is now handled by the Heka formula the same way for all the Heka sandbox
decoders. https://github.com/tcpcloud/salt-formula-heka/pull/20
8 years ago
Adam Tengler
599068289d
Orchestration metadata
8 years ago
Simon Pasquier
318ebd1569
Remove the log counter filter from meta/heka
This filter should be configured by the heka formula itself.
8 years ago
Ales Komarek
480003965f
Sample alarms
8 years ago
Éric Lemoine
b87ccd327d
Add timezone to syslog decoder config
8 years ago
Éric Lemoine
bf02e9dede
Use the proper module directory
The stacklight module dir is /usr/share/lma_collector/common, not
/usr/share/lma_collector_modules. This fixes it.
8 years ago
Éric Lemoine
1a1f375498
Set "hostname" in the linux_hdd_errors|counters filters
8 years ago