@@ -8,139 +8,259 @@ description: GreptimeDB Trigger overview.
88Trigger lets users define trigger rules in SQL. GreptimeDB evaluates these rules periodically
99and sends a notification once a rule's condition is met.
1010
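11- The rest of this document walks through an example that uses a Trigger to monitor system load and fire an alert.
12- For the exact syntax used to write a Trigger, see the [Syntax](/reference/sql/trigger-syntax.md) documentation.
11+ ## Key Features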
1312
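14- # Quick Start Example
13+ - **SQL-native**: define trigger rules in SQL and reuse GreptimeDB's built-in functions, with no extra learning curve
14+ - **Multi-stage state management**: a built-in pending / firing / inactive state machine prevents flapping and duplicate
15+ notifications
16+ - **Rich context**: custom labels and annotations, plus automatically injected query result fields, make it easier to
17+ pinpoint the root cause
18+ - **Ecosystem-friendly**: the alert payload is fully compatible with Prometheus Alertmanager, so you can use its grouping, inhibition,
19+ silencing, and routing directly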
1520
16- This section walks through an end-to-end example of using a Trigger to monitor system load and fire an alert.
21+ ## Quick Start Example
1722
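18- The following diagram shows the complete end-to-end workflow of this example.
23+ This section walks through an end-to-end example of monitoring system load (`load1`) and firing an alert when the load exceeds a threshold.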
1924
20- ![Trigger demo architecture](/trigger-demo-architecture.png)
25+ In this example you will:
2126
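22- 1. Vector continuously collects host metrics and writes them to GreptimeDB.
23- 2. The Trigger in GreptimeDB evaluates its rule every minute; when the condition is met, it sends a notification to
24- Alertmanager.
25- 3. Alertmanager groups, inhibits, and routes alerts according to its own configuration, and finally delivers the message
26- to the designated channel through its Slack integration.
27+ - Create a metrics table: set up the `load1` table to store host load metrics
28+ - Define a Trigger: specify the trigger condition in SQL and configure labels, annotations, and the notification channel
29+ - Simulate data ingestion: write normal and then abnormal load data to activate the alerting logic
30+ - Observe state changes: watch the alert instances move through the full pending → firing → inactive lifecycle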
2731
28- ## Collect Host Metrics with Vector
32+ ### 1. Create the Table
2933
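30- First, use Vector to collect the local machine's load data and write it to GreptimeDB. An example Vector
31- configuration is shown below:
34+ Connect to GreptimeDB with a MySQL client and create the `load1` table: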
3235
33- ```toml
34- [sources.in]
35- type = "host_metrics"
36- scrape_interval_secs = 15
36+ ```sql
37+ CREATE TABLE `load1` (
38+   host STRING,
39+   load1 FLOAT32,
40+   ts TIMESTAMP TIME INDEX
41+ ) WITH ('append_mode' = 'true');
42+ ```
43+
44+ ### 2. Create the Trigger
3745
38- [sinks.out]
39- inputs = ["in"]
40- type = "greptimedb"
41- endpoint = "localhost:4001"
46+ Connect to GreptimeDB with a MySQL client and create the `load1_monitor` Trigger:
47+
48+ ```sql
49+ CREATE TRIGGER IF NOT EXISTS `load1_monitor`
50+   ON (
51+     SELECT
52+       host AS label_host,
53+       avg(load1) AS avg_load1,
54+       max(ts) AS ts
55+     FROM public.load1
56+     WHERE ts >= NOW() - '1 minutes'::INTERVAL
57+     GROUP BY host
58+     HAVING avg(load1) > 10
59+   ) EVERY '1 minutes'::INTERVAL
60+   FOR '3 minutes'::INTERVAL
61+   KEEP FIRING FOR '3 minutes'::INTERVAL
62+   LABELS (severity=warning)
63+   ANNOTATIONS (comment='Your computer is smoking, should take a break.')
64+   NOTIFY(
65+     WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout='1m')
66+   );
4267```
4368
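44- GreptimeDB automatically creates tables when data is written. The `host_load1` table records the load1 data;
45- load1 is a key performance indicator of system activity. We can create a monitoring rule to track the values in this table. The table
46- schema is as follows:
69+ This Trigger runs every minute, computes each host's average load over the past 60 seconds, and creates an alert instance for every
70+ host that satisfies `avg(load1) > 10`.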
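
To make the rule concrete, the following sketch reproduces the Trigger's aggregation outside the database. It is only an illustration: the helper name `hosts_over_threshold` and the `host value` input format are assumptions made for this example, and the sample rows reuse the high-load values from the ingestion script in step 4.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GreptimeDB): mimics
#   SELECT host, avg(load1) ... GROUP BY host HAVING avg(load1) > threshold
# over "host value" lines read from stdin.
hosts_over_threshold() {
  local threshold="$1"
  awk -v t="$threshold" '
    { sum[$1] += $2; n[$1]++ }   # accumulate per-host sum and sample count
    END {
      for (h in sum) {
        avg = sum[h] / n[h]
        if (avg > t) printf "%s %.2f\n", h, avg
      }
    }' | sort                    # awk iteration order is unspecified, so sort
}

# Two samples per host inside the 1-minute window.
printf '%s\n' \
  "newyork1 1.2" "newyork1 1.2" \
  "newyork2 12.1" "newyork2 12.1" \
  "newyork3 11.5" "newyork3 11.5" | hosts_over_threshold 10
```

Only `newyork2` and `newyork3` are printed, which corresponds to the two alert instances shown later in this guide.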
71+
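72+ Key parameters:
73+
74+ - **FOR**: how long the condition must hold continuously before the alert enters the firing state
75+ - **KEEP FIRING FOR**: how long the alert instance stays in the firing state after the condition stops being met
76+
77+ See [Trigger syntax](/reference/sql/trigger-syntax.md) for details.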
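
The interaction between `FOR` and `KEEP FIRING FOR` can be sketched as a small state machine. The model below is a simplification that assumes Prometheus-style semantics; the function `simulate_states` is hypothetical, and GreptimeDB's actual evaluation timing may differ from it.

```bash
#!/usr/bin/env bash
# Simplified model of the pending -> firing -> inactive lifecycle.
# Assumption: Prometheus-style semantics; not GreptimeDB's real implementation.
# Usage: simulate_states INTERVAL FOR KEEP r1 r2 ...
#   INTERVAL, FOR, KEEP are in seconds; each r is 1 if the condition holds, else 0.
simulate_states() {
  local interval="$1" for_secs="$2" keep_secs="$3"
  shift 3
  local state="inactive" t=0 active_at=0 last_true=0 met
  for met in "$@"; do
    if [ "$met" -eq 1 ]; then
      last_true=$t
      if [ "$state" = "inactive" ]; then
        state="pending"; active_at=$t          # condition met for the first time
      fi
      if [ "$state" = "pending" ] && [ $((t - active_at)) -ge "$for_secs" ]; then
        state="firing"                         # condition has held for FOR
      fi
    else
      if [ "$state" = "pending" ]; then
        state="inactive"                       # cleared before FOR elapsed
      elif [ "$state" = "firing" ] && [ $((t - last_true)) -ge "$keep_secs" ]; then
        state="inactive"                       # KEEP FIRING FOR has expired
      fi
    fi
    echo "t=${t}s $state"
    t=$((t + interval))
  done
}

# EVERY 1 minute, FOR 3 minutes, KEEP FIRING FOR 3 minutes
simulate_states 60 180 180 0 1 1 1 1 1 1 0 0 0 0
```

With these inputs the modeled alert is pending for three evaluations, fires on the fourth, keeps firing for two evaluations after the condition clears, and becomes inactive at `t=540s`.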
78+
79+ ### 3. View Trigger Status
80+
81+ #### List All Triggers
4782
4883```sql
49- +-----------+----------------------+------+------+---------+---------------+
50- | Column    | Type                 | Key  | Null | Default | Semantic Type |
51- +-----------+----------------------+------+------+---------+---------------+
52- | ts        | TimestampMillisecond | PRI  | NO   |         | TIMESTAMP     |
53- | collector | String               | PRI  | YES  |         | TAG           |
54- | host      | String               | PRI  | YES  |         | TAG           |
55- | val       | Float64              |      | YES  |         | FIELD         |
56- +-----------+----------------------+------+------+---------+---------------+
84+ SHOW TRIGGERS;
5785```
5886
59- ## Configure Alertmanager with Slack Integration
87+ Output:
6088
61- The GreptimeDB Trigger webhook payload is compatible with [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/),
62- so we can reuse Alertmanager's grouping, inhibition, silencing, and routing without any extra
63- glue code.
89+ ```text
90+ +---------------+
91+ | Triggers |
92+ +---------------+
93+ | load1_monitor |
94+ +---------------+
95+ ```
6496
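65- You can refer to the [official documentation](https://prometheus.io/docs/alerting/latest/configuration/)
66- to configure Prometheus Alertmanager. To render consistent, readable content in Slack messages, you can
67- configure the following message template.
97+ #### Show the CREATE Statement
98+
99+ ```sql
100+ SHOW CREATE TRIGGER `load1_monitor`\G
101+ ```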
102+
103+ Output:
68104
69105```text
70- {{ define "slack.text" }}
71- {{ range .Alerts }}
106+ *************************** 1. row ***************************
107+ Trigger: load1_monitor
108+ Create Trigger: CREATE TRIGGER IF NOT EXISTS `load1_monitor`
109+ ON (SELECT host AS label_host, avg(load1) AS avg_load1 ...) EVERY '1 minutes'::INTERVAL
110+ FOR '3 minutes'::INTERVAL
111+ KEEP FIRING FOR '3 minutes'::INTERVAL
112+ LABELS (severity = 'warning')
113+ ANNOTATIONS (comment = 'Your computer is smoking, should take a break.')
114+ NOTIFY(
115+ WEBHOOK `alert_manager` URL `http://localhost:9093` WITH (timeout = '1m'),
116+ )
117+ ```
72118
73- Labels:
74- {{- range .Labels.SortedPairs }}
75- - {{ .Name }}: {{ .Value }}
76- {{ end }}
119+ #### View Trigger Details
77120
78- Annotations:
79- {{- range .Annotations.SortedPairs }}
80- - {{ .Name }}: {{ .Value }}
81- {{ end }}
121+ ```sql
122+ SELECT * FROM information_schema.triggers\G
123+ ```
82124
83- {{ end }}
84- {{ end }}
125+ Output:
126+
127+ ```text
128+ *************************** 1. row ***************************
129+ trigger_name: load1_monitor
130+ trigger_id: 1024
131+ raw_sql: (SELECT host AS label_host, avg(load1) AS avg_load1, ...)
132+ interval: 60
133+ labels: {"severity":"warning"}
134+ annotations: {"comment":"Your computer is smoking, should take a break."}
135+ for: 180
136+ keep_firing_for: 180
137+ channels: [{"channel_type":{"Webhook":{"opts":{"timeout":"1m"}, ...}]
138+ flownode_id: 0
85139```
86140
87- Generating the Slack message with the template above iterates over all alerts and displays each alert's labels and annotations.
141+ For more field descriptions, see [Triggers](/reference/sql/information-schema/triggers).
88142
89- Once the configuration is complete, start Alertmanager.
143+ #### View Alert Instances
144+
145+ ```sql
146+ SELECT * FROM information_schema.alerts;
147+ ```
148+
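149+ If no data has been written yet, this returns an empty result.
90150
91- ## Create the Trigger
92151
93- Create the Trigger in GreptimeDB. Connect to GreptimeDB with a MySQL client and run the following SQL:
152+ For more field descriptions, see [Alerts](/reference/sql/information-schema/alerts).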
153+
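154+ ### 4. Write Data and Observe Alert States
155+
156+ The script below simulates data ingestion: it writes normal values for 1 minute, then high values for 6 minutes to trigger the alert, and then
157+ returns to normal values.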
158+
159+ ```bash
160+ #!/usr/bin/env bash
161+
162+ MYSQL="mysql -h 127.0.0.1 -P 4002"
163+
164+ insert_normal() {
165+   $MYSQL -e "INSERT INTO load1 (host, load1, ts) VALUES
166+     ('newyork1', 1.2, now()),
167+     ('newyork2', 1.1, now()),
168+     ('newyork3', 1.3, now());"
169+ }
170+
171+ insert_high() {
172+   $MYSQL -e "INSERT INTO load1 (host, load1, ts) VALUES
173+     ('newyork1', 1.2, now()),
174+     ('newyork2', 12.1, now()),
175+     ('newyork3', 11.5, now());"
176+ }
177+
178+ # First minute: normal data (4 writes, 15 seconds apart)
179+ for i in {1..4}; do insert_normal; sleep 15; done
180+
181+ # Next 6 minutes: high load on newyork2 and newyork3 (24 writes)
182+ for i in {1..24}; do insert_high; sleep 15; done
183+
184+ # Afterwards: back to normal until interrupted
185+ while true; do insert_normal; sleep 15; done
186+ ```
187+
188+ #### State Transitions
189+
190+ In another terminal, query the alert state:
191+
192+ **Stage 1: no alerts**
94193
95194```sql
96- CREATE TRIGGER IF NOT EXISTS load1_monitor
97- ON (
98- SELECT collector AS label_collector,
99- host as label_host,
100- val
101- FROM host_load1 WHERE val > 10 and ts >= now() - '1 minutes'::INTERVAL
102- ) EVERY '1 minute'::INTERVAL
103- LABELS (severity=warning)
104- ANNOTATIONS (comment='Your computer is smoking, should take a break.')
105- NOTIFY(
106- WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout="1m")
107- );
108- ```
109-
110- The SQL above creates a trigger named `load1_monitor` that runs every minute. It evaluates the last 60 seconds of data in the `host_load1`
111- table; if any load1 value exceeds 10, the `WEBHOOK` option in the `NOTIFY`
112- clause tells the Trigger to send a notification to the Alertmanager running on localhost at port 9093.
113-
114- Run `SHOW TRIGGERS` to list the triggers that have been created.
195+ SELECT * FROM information_schema.alerts\G
196+ ```
197+
198+ Output:
199+
200+ ```
201+ Empty set
202+ ```
203+
204+ **Stage 2: pending** (the condition is met for the first time, but the `FOR` duration has not yet elapsed)
115205
116206```sql
117- SHOW TRIGGERS;
207+ SELECT trigger_id, labels, active_at, fired_at, resolved_at FROM information_schema.alerts;
118208```
119209
120- The output should look like this:
210+ ```text
211+ +------------+-----------------------------------------------------------------------+----------------------------+----------+-------------+
212+ | trigger_id | labels | active_at | fired_at | resolved_at |
213+ +------------+-----------------------------------------------------------------------+----------------------------+----------+-------------+
214+ | 1024 | {"alert_name":"load1_monitor","host":"newyork3","severity":"warning"} | 2025-12-29 11:58:20.992670 | NULL | NULL |
215+ | 1024 | {"alert_name":"load1_monitor","host":"newyork2","severity":"warning"} | 2025-12-29 11:58:20.992670 | NULL | NULL |
216+ +------------+-----------------------------------------------------------------------+----------------------------+----------+-------------+
217+ ```
218+
219+ **Stage 3: firing** (the `FOR` duration is satisfied, and notifications start being sent)
220+
221+ ```sql
222+ SELECT trigger_id, labels, active_at, fired_at, resolved_at FROM information_schema.alerts;
223+ ```
121224
122225```text
123- +---------------+
124- | Triggers |
125- +---------------+
126- | load1_monitor |
127- +---------------+
226+ +------------+-----------------------------------------------------------------------+----------------------------+----------------------------+-------------+
227+ | trigger_id | labels | active_at | fired_at | resolved_at |
228+ +------------+-----------------------------------------------------------------------+----------------------------+----------------------------+-------------+
229+ | 1024 | {"alert_name":"load1_monitor","host":"newyork3","severity":"warning"} | 2025-12-29 11:58:20.992670 | 2025-12-29 12:02:20.991713 | NULL |
230+ | 1024 | {"alert_name":"load1_monitor","host":"newyork2","severity":"warning"} | 2025-12-29 11:58:20.992670 | 2025-12-29 12:02:20.991713 | NULL |
231+ +------------+-----------------------------------------------------------------------+----------------------------+----------------------------+-------------+
128232```
129233
130- ## Test the Trigger
234+ **Stage 4: inactive** (the condition is no longer met and the `KEEP FIRING FOR` period has expired; a resolved notification is sent)
131235
132- Use [stress-ng](https://github.com/ColinIanKing/stress-ng) to simulate 60 seconds of high CPU load:
236+ ```sql
237+ SELECT trigger_id, labels, active_at, fired_at, resolved_at FROM information_schema.alerts;
238+ ```
133239
134- ```bash
135- stress-ng --cpu 100 --cpu-load 10 --timeout 60
240+ ```text
241+ +------------+-----------------------------------------------------------------------+----------------------------+----------------------------+----------------------------+
242+ | trigger_id | labels | active_at | fired_at | resolved_at |
243+ +------------+-----------------------------------------------------------------------+----------------------------+----------------------------+----------------------------+
244+ | 1024 | {"alert_name":"load1_monitor","host":"newyork3","severity":"warning"} | 2025-12-29 11:58:20.992670 | 2025-12-29 12:02:20.991713 | 2025-12-29 12:05:20.991750 |
245+ | 1024 | {"alert_name":"load1_monitor","host":"newyork2","severity":"warning"} | 2025-12-29 11:58:20.992670 | 2025-12-29 12:02:20.991713 | 2025-12-29 12:05:20.991750 |
246+ +------------+-----------------------------------------------------------------------+----------------------------+----------------------------+----------------------------+
136247```
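
Instead of re-running the query by hand at each stage, the transitions can be followed with a small polling loop. This is a sketch only: `query_alerts` and `watch_alerts` are hypothetical helper names, and the connection settings assume the same MySQL endpoint (`127.0.0.1:4002`) used by the ingestion script above.

```bash
#!/usr/bin/env bash
# Assumes the MySQL endpoint used earlier in this guide (127.0.0.1:4002).
MYSQL="${MYSQL:-mysql -h 127.0.0.1 -P 4002}"

# Hypothetical helper: print one snapshot of the alert instances.
query_alerts() {
  $MYSQL -e "SELECT trigger_id, labels, active_at, fired_at, resolved_at FROM information_schema.alerts;"
}

# Hypothetical helper: print COUNT snapshots, DELAY seconds apart.
watch_alerts() {
  local count="$1" delay="$2" i
  for ((i = 1; i <= count; i++)); do
    echo "--- snapshot $i at $(date '+%H:%M:%S') ---"
    query_alerts
    if ((i < count)); then sleep "$delay"; fi
  done
}

# Example: poll every 30 seconds for about 10 minutes.
# watch_alerts 20 30
```

A row whose `fired_at` changes from `NULL` to a timestamp has moved from pending to firing; a non-`NULL` `resolved_at` marks the transition to inactive.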
137248
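138- The load1 value rises quickly and the Trigger notification fires; within one minute, the designated Slack channel receives the following
139- alert:
249+ ### 5. Integrate with Alertmanager (Optional)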
250+
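251+ If Prometheus Alertmanager is already deployed, GreptimeDB automatically pushes alerts in the firing and inactive
252+ states to it.
253+
254+ After each evaluation, the Trigger injects fields from the query result into labels and annotations. In this example,
255+ `host` becomes a label and `avg_load1` becomes an annotation. These fields are passed on to Alertmanager
256+ and can be used in notification templates.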
140257
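141- ![Slack alert example](/trigger-slack-alert.png)
258+ Because the payload is compatible with Alertmanager, you can directly reuse its grouping, inhibition, silencing, and routing capabilities
259+ without an adapter.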
142260
143261## 参考资料
144262
145- - [ Trigger 语法] ( /reference/sql/trigger-syntax.md ) : 与 ` TRIGGER ` 相关的 SQL 语句的语法细节。
263+ - [ Trigger 语法] ( /reference/sql/trigger-syntax.md ) : 与 ` TRIGGER ` 相关的 SQL 语句的语法细节
264+ - [ INFORMATION_SCHEMA.TRIGGERS] ( /reference/sql/information-schema/triggers ) : 关于 ` Trigger ` 元数据的视图
265+ - [ INFORMATION_SCHEMA.ALERTS] ( /reference/sql/information-schema/alerts ) : 关于告警实例元数据的视图
146266