Tune default thresholds for the nstat_time_squeeze based alerts

Current thresholds don't match real warning/minor values for
the time_squeeze numbers. As a result we get false positives.

Change-Id: I6990c101fe671c05d75d0640fd6799667b5f3fa1
Related-PROD: PROD-24406 (PROD:24406)
pull/173/head
Ildar Svetlov, 6 years ago
Parent
commit
4be6e49698
2 files changed with 6 additions and 6 deletions
  1. linux/map.jinja (+2, -2)
  2. linux/meta/prometheus.yml (+4, -4)

linux/map.jinja (+2, -2)

@@ -431,8 +431,8 @@ Debian:
'warn': 5,
},
'net_rx_action_per_cpu_threshold': {
-'warning': '0',
-'minor': '100'
+'warning': '500',
+'minor': '5000'
},
'packets_dropped_per_cpu_threshold': {
'minor': '0',
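
The values above are formula defaults. A deployment that needs different limits could presumably override them from pillar. The sketch below assumes the formula merges these defaults with a `linux:monitoring` pillar key, which this diff does not show, and the numbers are purely illustrative.

```yaml
# Hypothetical pillar override; assumes map.jinja merges its defaults
# with the linux:monitoring pillar key (not visible in this diff).
linux:
  monitoring:
    net_rx_action_per_cpu_threshold:
      warning: '1000'   # illustrative value, not from this commit
      minor: '10000'    # illustrative value, not from this commit
```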

linux/meta/prometheus.yml (+4, -4)

@@ -234,24 +234,24 @@ server:
{%- endraw %}
{%- set net_rx_action_warning_threshold = monitoring.net_rx_action_per_cpu_threshold.warning %}
if: >-
-floor(increase(nstat_time_squeeze[24h])) > {{ net_rx_action_warning_threshold }}
+floor(increase(nstat_time_squeeze[1d])) > {{ net_rx_action_warning_threshold }}
labels:
severity: warning
service: system
annotations:
summary: "CPU terminated {{ net_rx_action_warning_threshold }}{%- raw %} net_rx_action loops"
-description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours."
+description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours. Modify the net.core.netdev_budget kernel parameter."
NetRxActionByCpuMinor:
{%- endraw %}
{%- set net_rx_action_minor_threshold = monitoring.net_rx_action_per_cpu_threshold.minor %}
if: >-
-floor(increase(nstat_time_squeeze[24h])) > {{ net_rx_action_minor_threshold }}
+floor(increase(nstat_time_squeeze[1d])) > {{ net_rx_action_minor_threshold }}
labels:
severity: minor
service: system
annotations:
summary: "CPU terminated {{ net_rx_action_minor_threshold }}{%- raw %} net_rx_action loops"
-description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours."
+description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours. Modify the net.core.netdev_budget kernel parameter."
{%- endraw %}
{%- if monitoring.bond_status.interfaces is defined and monitoring.bond_status.interfaces %}
{%- raw %}
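
For reference, with the new default of `'minor': '5000'` from linux/map.jinja and no override, the minor rule would be expected to render roughly as follows. This is a sketch: the indentation is assumed, since the diff view above does not preserve it, and the Prometheus template expressions stay literal because they sit inside `raw` blocks.

```yaml
# Approximate rendered output of the NetRxActionByCpuMinor template,
# assuming the default minor threshold of 5000 is in effect.
NetRxActionByCpuMinor:
  if: >-
    floor(increase(nstat_time_squeeze[1d])) > 5000
  labels:
    severity: minor
    service: system
  annotations:
    summary: "CPU terminated 5000 net_rx_action loops"
    description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours. Modify the net.core.netdev_budget kernel parameter."
```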
