
Tune default thresholds for the nstat_time_squeeze based alerts

Current thresholds don't match real warning/minor values for
the time_squeeze numbers. As a result, we get false positives.

Change-Id: I6990c101fe671c05d75d0640fd6799667b5f3fa1
Related-PROD: PROD-24406 (PROD:24406)
pull/173/head
Ildar Svetlov 6 years ago
Parent
Commit
4be6e49698
2 changed files with 6 additions and 6 deletions
  1. linux/map.jinja (+2, -2)
  2. linux/meta/prometheus.yml (+4, -4)

linux/map.jinja (+2, -2)

         'warn': 5,
     },
     'net_rx_action_per_cpu_threshold': {
-        'warning': '0',
-        'minor': '100'
+        'warning': '500',
+        'minor': '5000'
     },
     'packets_dropped_per_cpu_threshold': {
         'minor': '0',

linux/meta/prometheus.yml (+4, -4)

       {%- endraw %}
       {%- set net_rx_action_warning_threshold = monitoring.net_rx_action_per_cpu_threshold.warning %}
       if: >-
-        floor(increase(nstat_time_squeeze[24h])) > {{ net_rx_action_warning_threshold }}
+        floor(increase(nstat_time_squeeze[1d])) > {{ net_rx_action_warning_threshold }}
       labels:
         severity: warning
         service: system
       annotations:
         summary: "CPU terminated {{ net_rx_action_warning_threshold }}{%- raw %} net_rx_action loops"
-        description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours."
+        description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours. Modify the net.core.netdev_budget kernel parameter."
     NetRxActionByCpuMinor:
       {%- endraw %}
       {%- set net_rx_action_minor_threshold = monitoring.net_rx_action_per_cpu_threshold.minor %}
       if: >-
-        floor(increase(nstat_time_squeeze[24h])) > {{ net_rx_action_minor_threshold }}
+        floor(increase(nstat_time_squeeze[1d])) > {{ net_rx_action_minor_threshold }}
       labels:
         severity: minor
         service: system
       annotations:
         summary: "CPU terminated {{ net_rx_action_minor_threshold }}{%- raw %} net_rx_action loops"
-        description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours."
+        description: "The {{ $labels.cpu }} CPU on the {{ $labels.host }} node terminated {{ $value }} net_rx_action loops during the last 24 hours. Modify the net.core.netdev_budget kernel parameter."
       {%- endraw %}
       {%- if monitoring.bond_status.interfaces is defined and monitoring.bond_status.interfaces %}
       {%- raw %}
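
To illustrate how the new thresholds behave, here is a rough Python sketch of the comparison the two alert rules perform. It assumes that nstat_time_squeeze mirrors the per-CPU time_squeeze counter found in the third hexadecimal column of /proc/net/softnet_stat; the commit itself does not state where the metric comes from. The function names are hypothetical and not part of the formula, and the reset handling is a simplification of what Prometheus's increase() does.

```python
# Illustrative sketch only: these helpers are hypothetical, not part of the formula.

def parse_time_squeeze(softnet_stat_text):
    """Return per-CPU time_squeeze counters from softnet_stat-style text.

    Assumption: one row per CPU, third column is time_squeeze in hex.
    """
    counters = []
    for line in softnet_stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            counters.append(int(fields[2], 16))
    return counters


def classify(before, after, warning=500, minor=5000):
    """Approximate `floor(increase(...[1d])) > threshold` for two snapshots.

    `before` and `after` are per-CPU counters taken roughly a day apart.
    Returns the highest matching severity per CPU, or None.
    """
    severities = []
    for old, new in zip(before, after):
        # Simplification: a counter reset is treated as "no increase";
        # Prometheus handles resets differently.
        delta = max(new - old, 0)
        if delta > minor:
            severities.append("minor")
        elif delta > warning:
            severities.append("warning")
        else:
            severities.append(None)
    return severities
```

With the previous defaults ('0' and '100'), a single squeezed loop in a day would already raise the warning alert, which matches the false positives described in the commit message. Note that in the real rules both alerts fire once the minor threshold is exceeded; the sketch reports only the higher severity.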
