Production-ready: Configure Observability #77

@slopezz

In 3scale SaaS we have been successfully using limitador for a couple of years, together with Redis, to protect all our public endpoints. However:

  • We are using an old community image
  • YAMLs are managed individually via ArgoCD

We would like to update how we manage the limitador application and move to the recommended setup based on limitador-operator, at a production-ready grade.

Current limitador-operator (at least the version 0.4.0 that we use):

  • Provides a few prometheus metrics on the HTTP port
  • Does not create a prometheus PodMonitor by default
  • Does not create a GrafanaDashboard by default
  • Does not allow creating a prometheus PodMonitor via CR
  • Does not allow creating a GrafanaDashboard via CR

Desired features:

  • Allow creating a prometheus PodMonitor via CR
  • Allow creating a GrafanaDashboard via CR
  • Since observability is optional, it does not need to be enabled by default

3scale SaaS specific example

Example of the PodMonitor used in 3scale SaaS production, which handles between 3,500 and 5,500 requests/second with 3 limitador pods (the selector labels need to match the labels currently managed by limitador-operator):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: limitador
spec:
  podMetricsEndpoints:
    - interval: 30s
      path: /metrics
      port: http
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: limitador

Possible CR config

Both the PodMonitor and the GrafanaDashboard should be customizable via CR, but fall back to sane defaults when enabled, so you don't need to provide the whole config if you are happy to rely on the defaults.

  • PodMonitor possible customization:
    • enabled: true/false
    • interval: how often prometheus-operator will scrape limitador pods (has an impact on prometheus memory and time series database size)
    • labelSelector: sometimes prometheus-operator is configured to scrape only PodMonitors/ServiceMonitors with specific label selectors
  • GrafanaDashboard possible customization:
    • enabled: true/false
    • labelSelector: sometimes grafana-operator is configured to load only GrafanaDashboards with specific label selectors

apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador-sample
spec:
  podMonitor:
    enabled: true  # defaults to false, in which case no PodMonitor is created
    interval: 30s  # defaults to 30s if not defined
    labelSelector: XX  # by default no label/selector is defined
    ...  # more PodMonitor fields could become overridable in the future if needed; nothing else seems necessary for now
  grafanaDashboard:
    enabled: true
    labelSelector: XX  # by default no label/selector is defined
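
To illustrate why labelSelector matters: prometheus-operator only picks up PodMonitors whose labels match the podMonitorSelector of the Prometheus resource. Below is a minimal sketch of that relationship. The label monitoring-key: middleware and the resource names are hypothetical, and copying spec.podMonitor.labelSelector into the PodMonitor's metadata.labels is an assumption about how the operator could behave, not something limitador-operator does today:

```yaml
# Prometheus resource managed by prometheus-operator
# (name and label are hypothetical examples)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  # Only PodMonitors carrying this label are scraped
  podMonitorSelector:
    matchLabels:
      monitoring-key: middleware
---
# PodMonitor the operator could generate from spec.podMonitor in the CR above
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: limitador-sample
  labels:
    monitoring-key: middleware  # assumed: taken from spec.podMonitor.labelSelector
spec:
  podMetricsEndpoints:
    - interval: 30s             # assumed: taken from spec.podMonitor.interval
      path: /metrics
      port: http
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: limitador
```

Without the matching label, the PodMonitor would exist but be ignored by that Prometheus instance, which is why the field needs to be configurable.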

The initial dashboard would be provided by us (3scale SRE) and can be embedded into the operator as an asset, as is done in 3scale-operator.

Screenshots of the current dashboard, which includes limitador metrics by limitador_namespace (the app being limited) as well as pod and resource (cpu/mem/net) metrics:
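
As a rough idea of what the operator could generate when grafanaDashboard.enabled is true, here is a sketch of a GrafanaDashboard resource. It assumes the grafana-operator v4 API (integreatly.org/v1alpha1); newer grafana-operator versions use a different API group. The label is hypothetical, and the JSON body is only a placeholder, not the real 3scale SRE dashboard:

```yaml
apiVersion: integreatly.org/v1alpha1   # assumed: grafana-operator v4 API group
kind: GrafanaDashboard
metadata:
  name: limitador-sample
  labels:
    monitoring-key: middleware         # hypothetical; would come from spec.grafanaDashboard.labelSelector
spec:
  # Dashboard definition embedded as a JSON string; in the operator this
  # would be loaded from the embedded asset rather than written inline.
  json: |
    {
      "title": "Limitador",
      "panels": []
    }
```

The label plays the same role as for the PodMonitor: a Grafana instance configured with a dashboard label selector only loads dashboards that carry a matching label.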

[3 dashboard screenshots]

PrometheusRules (aka prometheus alerts)

Regarding PrometheusRules (prometheus alerts), my advice is not to embed them into the operator, but to provide in the repo a YAML file with example alerts that the app administrator can deploy and tune if needed.

Example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: limitador
spec:
  groups:
    - name: limitador.rules
      rules:
        - alert: LimitadorJobDown
          annotations:
            message: Prometheus Job {{ $labels.job }} on {{ $labels.namespace }} is DOWN
          expr: up{job=~".*limitador.*"} == 0
          for: 5m
          labels:
            severity: critical

        - alert: LimitadorPodDown
          annotations:
            message: Limitador pod {{ $labels.pod }} on {{ $labels.namespace }} is DOWN
          expr: limitador_up == 0
          for: 5m
          labels:
            severity: critical
