Production-ready: Configure Observability #77

In **3scale SaaS** we have been successfully using limitador together with Redis for a couple of years to protect all our public endpoints. However:
- We are using an old community image
- YAML manifests are managed individually via ArgoCD

We would like to change how we manage the limitador application and move to the recommended setup based on limitador-operator, at a **production-ready** grade.


### Current limitador-operator (at least version `0.4.0`, the one we use):
- Exposes a few Prometheus metrics on the HTTP port
- Does not create a Prometheus PodMonitor by default
- Does not create a GrafanaDashboard by default
- Does not allow creating a Prometheus PodMonitor via the CR
- Does not allow creating a GrafanaDashboard via the CR

### Desired features:
- Allow creating a Prometheus PodMonitor via the CR
- Allow creating a GrafanaDashboard via the CR
- Since observability is optional, it should not be enabled by default

### 3scale SaaS specific example
Example of the PodMonitor used in 3scale SaaS production, where 3 limitador pods handle between 3,500 and 5,500 requests/second (the selector labels must match the labels currently managed by limitador-operator):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: limitador
spec:
  podMetricsEndpoints:
    - interval: 30s
      path: /metrics
      port: http
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: limitador
```
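For context, a PodMonitor like the one above is only scraped if a Prometheus instance managed by prometheus-operator selects it. Below is a minimal sketch of such a `Prometheus` resource; its name and `serviceAccountName` are assumptions. Empty selectors (`{}`) match every PodMonitor in every namespace, whereas an omitted (null) `podMonitorSelector` matches none. When a non-empty selector is used, the PodMonitor must carry matching labels, which is why a configurable label selector matters.

```yaml
# Sketch of a prometheus-operator Prometheus resource (name and service account are hypothetical)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example-prometheus          # hypothetical name
spec:
  serviceAccountName: prometheus    # hypothetical; needs RBAC to list/watch pods
  podMonitorSelector: {}            # empty selector: select all PodMonitors
  podMonitorNamespaceSelector: {}   # empty selector: look for PodMonitors in all namespaces
```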


### Possible CR config

Both the PodMonitor and the GrafanaDashboard should be customizable via the CR, but fall back to sane defaults when enabled, so you don't need to provide the full config if you prefer to rely on the defaults.

- PodMonitor possible customization:
  - enabled: true/false
  - interval: how often Prometheus scrapes the limitador pods (this affects Prometheus memory usage and time-series database size)
  - labelSelector: sometimes prometheus-operator is configured to only pick up PodMonitors/ServiceMonitors matching specific label selectors
- GrafanaDashboard possible customization:
  - enabled: true/false
  - labelSelector: sometimes grafana-operator is configured to only load GrafanaDashboards matching specific label selectors

```yaml
apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador-sample
spec:
  podMonitor:
    enabled: true      # defaults to false, so no PodMonitor is created unless enabled
    interval: 30s      # defaults to 30s if not defined
    labelSelector: XX  # by default no label/selector is defined
    # ... maybe in the future allow overriding more PodMonitor fields if needed; not needed for now
  grafanaDashboard:
    enabled: true
    labelSelector: XX  # by default no label/selector is defined
```
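As a sketch of the intended behaviour, assuming `labelSelector` ends up being a map of extra labels (the `XX` placeholder above leaves its shape open), the operator could render something like the following PodMonitor for `limitador-sample`. The object name and the `monitoring: sre` label are hypothetical; the endpoint and pod selector are copied from the 3scale SaaS PodMonitor above.

```yaml
# Hypothetical PodMonitor rendered by limitador-operator from the Limitador CR above
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: limitador-sample        # assumption: named after the Limitador CR
  labels:
    monitoring: sre             # hypothetical label taken from spec.podMonitor.labelSelector
spec:
  podMetricsEndpoints:
    - interval: 30s             # from spec.podMonitor.interval (default 30s)
      path: /metrics
      port: http
      scheme: http
  selector:
    matchLabels:
      app.kubernetes.io/name: limitador   # copied from the 3scale SaaS example
```

A Prometheus whose `podMonitorSelector` is `matchLabels: {monitoring: sre}` would pick this PodMonitor up, while one with a different selector would ignore it.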

The initial dashboard would be provided by us (3scale SRE) and could be embedded into the operator as an asset, as is done in 3scale-operator.

Screenshots of the current dashboard, showing limitador metrics by `limitador_namespace` (the app being limited) as well as pod resource (CPU/memory/network) metrics:

![image](https://github.com/Kuadrant/limitador-operator/assets/41513123/693fea2e-1e91-4186-b44b-81e0f85e1798)
![image](https://github.com/Kuadrant/limitador-operator/assets/41513123/e385e8c1-063f-45d1-9fc0-40356a6ff247)
![image](https://github.com/Kuadrant/limitador-operator/assets/41513123/e9c03bf8-895d-4087-b509-8200c871180c)
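A minimal sketch of a GrafanaDashboard the operator could ship, assuming grafana-operator v4 (`integreatly.org/v1alpha1`, where a `Grafana` instance loads dashboards matching its `dashboardLabelSelector`). The object name and the `monitoring: sre` label are hypothetical, and the embedded JSON is a single-panel placeholder built on the `limitador_up` metric, not the 3scale SRE dashboard shown in the screenshots.

```yaml
# Hypothetical GrafanaDashboard, assuming grafana-operator v4 (integreatly.org/v1alpha1)
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: limitador                # hypothetical name
  labels:
    monitoring: sre              # hypothetical label from spec.grafanaDashboard.labelSelector
spec:
  json: |
    {
      "title": "Limitador",
      "panels": [
        {
          "type": "timeseries",
          "title": "Limitador pods up",
          "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
          "targets": [
            {"expr": "sum(limitador_up)", "legendFormat": "pods up"}
          ]
        }
      ]
    }
```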



### PrometheusRules (aka prometheus alerts)
Regarding PrometheusRules (Prometheus alerts), my advice is not to embed them into the operator, but to provide an example YAML in the repo with possible alerts that the app administrator can deploy and tune as needed.

Example:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: limitador
spec:
  groups:
    - name: limitador.rules
      rules:
        - alert: LimitadorJobDown
          annotations:
            message: Prometheus Job {{ $labels.job }} on {{ $labels.namespace }} is DOWN
          expr: up{job=~".*limitador.*"} == 0
          for: 5m
          labels:
            severity: critical

        - alert: LimitadorPodDown
          annotations:
            message: Limitador pod {{ $labels.pod }} on {{ $labels.namespace }} is DOWN
          expr: limitador_up == 0
          for: 5m
          labels:
            severity: critical
```
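The two rules above only fire while Prometheus still has a limitador target to evaluate; if the PodMonitor is not selected at all (for example, because of a label selector mismatch), `up{job=~".*limitador.*"}` simply returns no series. A possible extra rule for that case is sketched below with PromQL's `absent()`; the resource name, alert name, `for` duration and severity are assumptions for the administrator to tune.

```yaml
# Hypothetical extra rule, to be tuned by the app administrator
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: limitador-absent                # hypothetical name
spec:
  groups:
    - name: limitador-absent.rules
      rules:
        - alert: LimitadorTargetMissing # hypothetical alert name
          annotations:
            message: "No limitador scrape target found; check that the PodMonitor is selected by Prometheus"
          expr: absent(up{job=~".*limitador.*"})  # returns 1 only when no matching series exist
          for: 10m
          labels:
            severity: warning
```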
