Conversation
📝 Walkthrough
Adds a periodic SMART cache collection service and writes the results into the cache for the node overview to read.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as WebUI
    participant Script as PVE-Tools.sh
    participant Cache as /run/smart-cache.json
    participant Py as smart-cache-collect.py
    participant Systemd as systemd
    participant SmartCtl as smartctl
    Systemd->>Systemd: timer fires periodically
    Systemd->>Py: start oneshot service
    Py->>SmartCtl: scan and collect NVMe/SATA/RAID
    SmartCtl-->>Py: return raw SMART data
    Py->>Cache: write/update /run/smart-cache.json
    UI->>Script: request node overview data
    Script->>Cache: read cache
    Cache-->>Script: return nvme_all / sata_all / raid_all
    Script->>Script: aggregate/escape/format render blocks
    Script-->>UI: return formatted overview data
    UI->>UI: render aggregated device list
```
```mermaid
sequenceDiagram
    participant Admin as Administrator
    participant Script as PVE-Tools.sh
    participant Systemd as systemd
    participant Py as smart-cache-collect.py
    participant Cache as /run/smart-cache.json
    Admin->>Script: call cpu_add()
    Script->>Script: enable kernel/modules/tools
    Script->>Script: deploy_smart_cache_service()
    Script->>Systemd: install timer & oneshot units
    Systemd->>Py: run collection periodically
    Py->>Cache: initialize and update cache
    Admin->>Script: call cpu_del()
    Script->>Script: remove_smart_cache_service()
    Script->>Systemd: remove timer & oneshot units
    Systemd->>Systemd: stop periodic collection
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@PVE-Tools.sh`:
- Around line 2847-2849: The code is defaulting missing smart_status.passed to
true which incorrectly marks unknown/unsupported SMART as healthy; instead only
set result[key]["smart_status"] when the smart_status field is present in the
parsed data. Update the SATA/NVMe handling (the block using
data.get("smart_status", {}).get("passed", True) and any analogous code around
smart_status handling, also the similar section referenced at lines 3233-3246)
to check for the existence of "smart_status" (or that "passed" key exists)
before assigning result[key]["smart_status"], matching the RAID branch semantics
of "only write when field exists."
- Around line 4026-4031: The display_rows computation misses the RAID controller header row: when a RAID controller is present, the raid-all function unshifts a "Controller: ..." line, but only raidi is added to the height, so the height comes up one row short. Update the display_rows computation (involving the variables display_rows and raidi, and the logic calling raid-all) to add 1 when a RAID controller is present (i.e., count the extra controller header row when raidi>0 or controller info is detected), and keep the enable_ups branch unchanged.
- Around line 3746-3785: The SATA renderer crashes because escapeHtml is
undefined in its scope and the sleep flag is misspelled; update the SATA
renderer to either import/define the same escapeHtml helper used by the NVMe
renderer (so calls to escapeHtml(...) resolve) or move escapeHtml to a shared
scope accessible to both renderers, and correct the property check from v.standy
to v.standby so the standby branch executes; ensure you reference the SATA
renderer function, the escapeHtml helper, and the v.standby property when making
the changes.
- Around line 2808-2810: The current smartctl invocation on disks uses "-a", which wakes drives that are in standby; add "-n standby" first to avoid waking them: where cmd is built (variable name cmd), change the smartctl arguments from "smartctl {device} {args} -a -j" to "smartctl -n standby {device} {args} -a -j" (keeping the subprocess.run call and the shell/timeout/capture_output/text settings unchanged), and make the same replacement at every other smartctl call site in the script (the SATA/NVMe detection blocks and the RAID detection block, currently identifiable by their smartctl calls) so the behavior is consistent.
- Around line 3316-3320: Extracting JSON fragments from $cache_data with regexes (matching "sd$sdi" or RAID entries, assigned to $sd_data etc.) fails to match when an object contains nested structures, so $res->{sdX} / $res->{raidX} fall back to '{}'; replace the regexes with JSON parsing: when processing the cache, use JSON->new / JSON::XS's decode_json to parse the entire $cache_data into a Perl data structure, access $decoded->{"sd$sdi"} or the corresponding RAID key directly and assign it to $sd_data/$raid_data; likewise replace the RAID extraction logic at 3396-3400 with reads from the parsed structure, and add the necessary existence and type checks as a safety net (still fall back to '{}' when a key is missing).
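The "-n standby" fix described above can be sketched as a small argv builder. This is a minimal sketch, assuming the script invokes smartctl via subprocess; `build_smartctl_cmd` is a hypothetical helper name, not from the PR:

```python
def build_smartctl_cmd(device, extra_args=()):
    """Build the smartctl argv with a standby guard.

    Placing "-n standby" before the other options makes smartctl check
    the drive's power state first and exit without touching drives that
    are spun down, so periodic SMART polling no longer wakes sleeping
    disks.  extra_args carries pass-through options such as RAID device
    selectors ("-d megaraid,0").
    """
    return ["smartctl", "-n", "standby", device, *extra_args, "-a", "-j"]


# Example argv for a plain SATA disk and for a disk behind a RAID card:
plain = build_smartctl_cmd("/dev/sda")
raided = build_smartctl_cmd("/dev/sdb", ("-d", "megaraid,0"))
```

The resulting list can be handed straight to `subprocess.run(..., capture_output=True, text=True, timeout=...)` as in the collector.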
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a2ed859d-f7b0-419c-8065-78bd19757996
⛔ Files ignored due to path filters (1)
images/add-raid-storage.png is excluded by !**/*.png
📒 Files selected for processing (1)
PVE-Tools.sh
```python
# smart_status
passed = data.get("smart_status", {}).get("passed", True)
result[key]["smart_status"] = {"passed": passed}
```
A missing SMART result must not be backfilled as passing.
When smart_status.passed is missing, this writes true directly, so the front end renders "unknown / read failure / unsupported" as a green "healthy". That is the opposite of this change's stated goal of preserving the raw SMART signal; the RAID branch has already been changed to "only write when the field exists", and the SATA/NVMe path here should keep the same semantics.
🛠️ Suggestion: only emit the SMART status when the field exists

```diff
-passed = data.get("smart_status", {}).get("passed", True)
-result[key]["smart_status"] = {"passed": passed}
+smart_status = data.get("smart_status", {})
+if "passed" in smart_status:
+    result[key]["smart_status"] = {"passed": smart_status["passed"]}
```

Also applies to: 3233-3246
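A runnable sketch of the suggested semantics, assuming the collector's `data`/`result` dict layout; `set_smart_status` is a hypothetical helper name:

```python
def set_smart_status(data, result, key):
    """Copy smart_status into the result only when the collector
    actually reported one; a missing field stays missing instead of
    defaulting to 'healthy'."""
    smart_status = data.get("smart_status", {})
    if "passed" in smart_status:
        result[key]["smart_status"] = {"passed": smart_status["passed"]}
```

With this, an unsupported or unreadable device simply has no `smart_status` key, letting the renderer show its own "unknown/unsupported" state instead of a false green.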
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@PVE-Tools.sh` around lines 2847 - 2849, The code is defaulting missing
smart_status.passed to true which incorrectly marks unknown/unsupported SMART as
healthy; instead only set result[key]["smart_status"] when the smart_status
field is present in the parsed data. Update the SATA/NVMe handling (the block
using data.get("smart_status", {}).get("passed", True) and any analogous code
around smart_status handling, also the similar section referenced at lines
3233-3246) to check for the existence of "smart_status" (or that "passed" key
exists) before assigning result[key]["smart_status"], matching the RAID branch
semantics of "only write when field exists."
Main features:
- Detect and display RAID controllers (e.g., LSI MR9260-8i, DELL PERC H310)
- Display SMART information for physical disks behind RAID cards
- Asynchronous SMART data collection to avoid blocking API requests
- Dynamic JSON array rendering supporting any number of disks

Security fixes:
- Add an HTML-escaping function to prevent XSS
- Fix empty-object filtering
- Preserve raw SMART signals instead of forcing default values
- Fill in missing fields to keep front end and back end consistent

Technical implementation:
- Add the smart-cache-collect.py asynchronous collection script
- Add a systemd timer service to collect SMART data automatically
- Modify the Nodes.pm backend API to support RAID data queries
- Modify pvemanagerlib.js to render the RAID disk list in the front end
- Add a device_type field to distinguish controllers from physical disks
87e6512 to ddc9f37
Actionable comments posted: 4
♻️ Duplicate comments (1)
PVE-Tools.sh (1)
3236-3249: ⚠️ Potential issue | 🟠 Major
Do not backfill a missing NVMe SMART result as true.
With the default 'true' at line 3236, line 3249 renders "unknown / read failure / unsupported" as a green "healthy". This should be kept consistent with the SATA/RAID branches: only output the field when smart_status.passed actually exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PVE-Tools.sh` around lines 3236 - 3249, The code defaults $smart_passed to 'true' when no "smart_status.passed" is present in $nvme_full, causing unknown/unsupported reads to appear as healthy; change this by not defaulting $smart_passed and instead only append the "smart_status" field into $nvme_data when the regex on $nvme_full actually captures a value (i.e., if $smart_passed is defined/exists after matching). Update the extraction from $nvme_full so $smart_passed is undefined on no match, and modify the $nvme_data construction to conditionally add the " \"smart_status\": {\"passed\": ...} " entry only when that capture succeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@PVE-Tools.sh`:
- Around line 3224-3261: Replace the inline smartctl collection (the block that
builds $nvme_full/$nvme_data and assigns $res->{nvme$nvi}) so it reads
precomputed NVMe JSON from the smart-cache-collect.py cache instead of running
smartctl per request: look up the cache file produced by smart-cache-collect.py
for the device, parse/validate its JSON and assign that to $res->{nvme$nvi};
only if the cache is missing or stale, fall back to a non-blocking quick read or
return an empty/placeholder value (do not run timeout-bound smartctl
synchronously). Update the code that references $nvme_full, $nvme_data and
$res->{nvme$nvi} accordingly to use the cached JSON format and preserve the same
keys (model_name, temperature.current, power_on_time.hours, power_cycle_count,
smart_status.passed, nvme_smart_health_information_log) so downstream consumers
remain compatible.
- Line 2926: The device-glob patterns (e.g., the loop using for dev in
sorted(glob.glob("/dev/sd[a-z]")) and the nvme0-9 handling) miss multi-letter
device names like sdaa or nvme10; replace these ad-hoc globs with a robust
whole-disk enumeration using lsblk or /sys/block: call lsblk -dn -o NAME (or
read /sys/block and select entries where /sys/block/<name>/partition does not
exist) then filter names by whole-disk regex (e.g., ^sd[a-z]+$ or
^nvme[0-9]+n[0-9]+$, or simply accept all names returned by lsblk), prepend /dev/
and iterate over that list; update all places referencing the old glob (the for
dev in sorted(glob.glob("/dev/sd[a-z]")) site and the nvme handling) to use the
new enumeration helper so multi-letter device names are collected.
- Around line 4025-4056: The UI/count logic must account for the always-present
"nvme-all", "sata-all", and "raid-all" blocks even when there are zero drives;
update nvme_ui_count, sata_ui_count, and raid_ui_count so they reflect the
presence of those blocks (i.e. count as 1 even when nvi/sdi/raidi == 0) and then
compute addRs using those UI counts and enable_ups as before; also change
display_rows to add one row per block when there are zero drives (use:
display_rows = 2 + (nvi>0 ? nvi : 1) + (sdi>0 ? sdi : 1) + (raidi>0 ? raidi : 1)
and still add the extra RAID controller row when raidi>0 and the UPS row when
enable_ups=true) so empty-state "未检测到 …" rows are included (update references to
nvme_ui_count, sata_ui_count, raid_ui_count, addRs, display_rows, nvi, sdi,
raidi, enable_ups, ups_info).
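The whole-disk enumeration described above can be sketched as follows. Helper names are hypothetical; this assumes /sys/block lists whole-disk entries (with partitions named like sda1 or nvme0n1p1 elsewhere), and the regex filter keeps multi-letter and multi-digit names such as sdaa and nvme10n1:

```python
import os
import re

# Whole-disk names only: sda, sdaa, ... and nvme0n1, nvme10n1, ...
# Partition names (sda1, nvme0n1p2) do not match the full pattern.
_WHOLE_DISK = re.compile(r"^(sd[a-z]+|nvme[0-9]+n[0-9]+)$")


def is_whole_disk(name):
    return bool(_WHOLE_DISK.match(name))


def enumerate_disks(sys_block="/sys/block"):
    """List /dev paths of whole disks, unbounded by sd[a-z]/nvme0-9."""
    try:
        names = sorted(os.listdir(sys_block))
    except OSError:
        return []
    return ["/dev/" + n for n in names if is_whole_disk(n)]
```

As the thread notes below, this still does not see physical members hidden behind a hardware RAID virtual disk; it only removes the 26-disk / 10-NVMe naming ceiling.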
---
Duplicate comments:
In `@PVE-Tools.sh`:
- Around line 3236-3249: The code defaults $smart_passed to 'true' when no
"smart_status.passed" is present in $nvme_full, causing unknown/unsupported
reads to appear as healthy; change this by not defaulting $smart_passed and
instead only append the "smart_status" field into $nvme_data when the regex on
$nvme_full actually captures a value (i.e., if $smart_passed is defined/exists
after matching). Update the extraction from $nvme_full so $smart_passed is
undefined on no match, and modify the $nvme_data construction to conditionally
add the " \"smart_status\": {\"passed\": ...} " entry only when that capture
succeeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3d8235c1-814c-4e38-87d7-d380cf32324a
⛔ Files ignored due to path filters (1)
images/add-raid-storage.png is excluded by !**/*.png
📒 Files selected for processing (1)
PVE-Tools.sh
```python
# Collect SATA devices (physical disks and RAID controller virtual disks)
sdi = 0
raid_controller_key = None  # holds RAID controller info
for dev in sorted(glob.glob("/dev/sd[a-z]")):
```
The device-matching patterns silently miss disks.
Lines 2926 and 3287 only cover /dev/sd[a-z]; once a host exceeds 26 disks, /dev/sdaa, /dev/sdab, etc. are never collected, and line 3221 only covers nvme0 through nvme9. This feature targets exactly the multi-disk/HBA/RAID hosts, so whole-disk enumeration based on lsblk or /sys/block would be better than relying on single-character patterns.
Also applies to: 3221-3221, 3287-3287
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@PVE-Tools.sh` at line 2926, The device-glob patterns (e.g., the loop using
for dev in sorted(glob.glob("/dev/sd[a-z]")) and the nvme0-9 handling) miss
multi-letter device names like sdaa or nvme10; replace these ad-hoc globs with a
robust whole-disk enumeration using lsblk or /sys/block: call lsblk -dn -o NAME
(or read /sys/block and select entries where /sys/block/<name>/partition does
not exist) then filter names by whole-disk regex (e.g., ^sd[a-z]+$ or
^nvme[0-9]+n[0-9]+$, or simply accept all names returned by lsblk), prepend /dev/
and iterate over that list; update all places referencing the old glob (the for
dev in sorted(glob.glob("/dev/sd[a-z]")) site and the nvme handling) to use the
new enumeration helper so multi-letter device names are collected.
@firmament2008 Understood; recorded as a future improvement. If you like, I can open a GitHub Issue to track this optimization (changing device enumeration from glob("/dev/sd[a-z]") to full enumeration based on lsblk or /sys/block, to support more than 26 SATA disks and multi-digit NVMe names such as nvme10). Let me know if needed.
✏️ Learnings added
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:2926-2926
Timestamp: 2026-03-25T07:21:17.071Z
Learning: In PVE-Tools.sh (smart-cache-collect.py), the device glob patterns `/dev/sd[a-z]` (for SATA, line ~2926) and the NVMe glob (nvme0-9) are intentionally limited to single-character/single-digit device names. The author (firmament2008) has accepted this as a known limitation because the target use case (compute nodes) is not expected to have more than 26 SATA disks or more than 10 NVMe devices. This is tracked as a future TODO to improve using lsblk or /sys/block enumeration. Do not flag this as a blocking issue.
🧠 Learnings used
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:2976-3028
Timestamp: 2026-03-24T04:34:32.616Z
Learning: In PVE-Tools.sh, the RAID card disk smartctl probing (around line 3018) intentionally uses `timeout 1` per device. The existing SATA (`smartctl $sd -a -j`) and NVMe (`smartctl $nvme -a -j`) sections have NO timeout at all, so the RAID `timeout 1` is actually more protective than the existing pattern — not a regression. Do not flag the per-disk `timeout 1` in the RAID section as a blocking concern relative to the other disk types.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:3471-3482
Timestamp: 2026-03-24T04:32:47.952Z
Learning: In PVE-Tools.sh (Bash/JavaScript), the RAID card disk SMART status rendering logic (in the JavaScript renderer injected for `raidX` panels) is intentionally designed with the following priority: (1) gate on `smart_support.available === false` to show "不支持"; (2) use `smart_status.passed` as the primary SMART pass/fail indicator; (3) fall back to `ata_smart_data.self_test.status.passed` only when `smart_status.passed` is undefined. The `ata_smart_data` fallback is intentional for RAID controllers where `smart_status` may be absent.
lsblk or /sys/block cannot enumerate all devices behind a RAID1 array either. This will be developed once a suitable environment is available.
@firmament2008 Understood — lsblk//sys/block likewise cannot fully enumerate all physical devices behind a RAID1 array, so this improvement indeed needs a real RAID environment to be designed and verified correctly. This additional constraint has been recorded; development is deferred until you have such an environment.
✏️ Learnings added
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:2926-2926
Timestamp: 2026-03-25T07:43:03.128Z
Learning: In PVE-Tools.sh (smart-cache-collect.py), the proposed improvement of replacing `/dev/sd[a-z]` glob with `lsblk` or `/sys/block` enumeration is also insufficient for full RAID1 array device discovery — lsblk/sys/block cannot enumerate all physical devices behind a RAID1 array either. The author (firmament2008) has deferred this improvement until a proper RAID environment is available for development and testing. Do not suggest lsblk/sys/block as a drop-in replacement for the current glob pattern in the context of RAID device enumeration.
🧠 Learnings used
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3224-3261
Timestamp: 2026-03-25T07:26:27.733Z
Learning: In PVE-Tools.sh (cpu_add function), the NVMe smartctl collection (around line 3228) is intentionally run inline within the Nodes.pm API path rather than via the async smart-cache-collect.py cache. The author (firmament2008) has accepted this because NVMe devices are CPU-directly attached (PCIe direct), platform lane limits bound the count to a very small number (typically 1-4), and the cumulative per-device overhead is not significant enough to block the API. Do not flag inline NVMe smartctl in Nodes.pm as a blocking performance concern.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:2926-2926
Timestamp: 2026-03-25T07:21:17.071Z
Learning: In PVE-Tools.sh (smart-cache-collect.py), the device glob patterns `/dev/sd[a-z]` (for SATA, line ~2926) and the NVMe glob (nvme0-9) are intentionally limited to single-character/single-digit device names. The author (firmament2008) has accepted this as a known limitation because the target use case (compute nodes) is not expected to have more than 26 SATA disks or more than 10 NVMe devices. This is tracked as a future TODO to improve using lsblk or /sys/block enumeration. Do not flag this as a blocking issue.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:3471-3482
Timestamp: 2026-03-24T04:32:47.952Z
Learning: In PVE-Tools.sh (Bash/JavaScript), the RAID card disk SMART status rendering logic (in the JavaScript renderer injected for `raidX` panels) is intentionally designed with the following priority: (1) gate on `smart_support.available === false` to show "不支持"; (2) use `smart_status.passed` as the primary SMART pass/fail indicator; (3) fall back to `ata_smart_data.self_test.status.passed` only when `smart_status.passed` is undefined. The `ata_smart_data` fallback is intentional for RAID controllers where `smart_status` may be absent.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:2976-3028
Timestamp: 2026-03-24T04:34:32.616Z
Learning: In PVE-Tools.sh, the RAID card disk smartctl probing (around line 3018) intentionally uses `timeout 1` per device. The existing SATA (`smartctl $sd -a -j`) and NVMe (`smartctl $nvme -a -j`) sections have NO timeout at all, so the RAID `timeout 1` is actually more protective than the existing pattern — not a regression. Do not flag the per-disk `timeout 1` in the RAID section as a blocking concern relative to the other disk types.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add function), the NVMe/SATA/RAID device topology (sdN, nvmeN, raidN key names) is intentionally captured at cpu_add() install time rather than dynamically at API runtime. The Perl code injected into Nodes.pm reads /run/smart-cache.json at each API call, but the set of sdN/nvmeN/raidN keys to look up is fixed when cpu_add() runs. This is an accepted design trade-off by the author (firmament2008) for efficiency — the async smart-cache-collect.py handles collection, and the static key set avoids runtime device scanning overhead. Do not flag this as a blocking issue.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add), each sdN and raidN Perl block independently opens and JSON::PP::decode_json's the entire /run/smart-cache.json file. This repeated open+decode pattern per disk is intentional and accepted by the author (firmament2008) — the async collection design is the intended efficiency mechanism, not per-call optimization. Do not flag repeated JSON decodes as a blocking concern.
```diff
 for nvme in $(ls /dev/nvme[0-9] 2> /dev/null); do
 chmod +s /usr/sbin/smartctl 2>/dev/null
 # Use -a for full info (-i may miss nvme_smart_health_information_log)
 cat >> $tmpf << EOF
-\$res->{nvme$nvi} = \`smartctl $nvme -a -j\`;
+{
+    my \$nvme_full = \`timeout 2 smartctl $nvme -a -j 2>/dev/null\`;
+    my \$nvme_data = '{}';
+    if (\$nvme_full =~ /^\s*\{/) {
+        # Extract only the needed fields to slim down the JSON
+        my \$model = (\$nvme_full =~ /"model_name"\s*:\s*"([^"]+)"/) ? \$1 : '';
+        my \$temp = (\$nvme_full =~ /"temperature"\s*:\s*\{[^}]*"current"\s*:\s*(\d+)/) ? \$1 : 0;
+        my \$hours = (\$nvme_full =~ /"power_on_time"\s*:\s*\{[^}]*"hours"\s*:\s*(\d+)/) ? \$1 : 0;
+        my \$cycles = (\$nvme_full =~ /"power_cycle_count"\s*:\s*(\d+)/) ? \$1 : 0;
+        my \$smart_passed = (\$nvme_full =~ /"smart_status"\s*:\s*\{[^}]*"passed"\s*:\s*(true|false)/) ? \$1 : 'true';
+        # NVMe health information log
+        my \$pct_used = (\$nvme_full =~ /"percentage_used"\s*:\s*(\d+)/) ? \$1 : '';
+        my \$media_err = (\$nvme_full =~ /"media_errors"\s*:\s*(\d+)/) ? \$1 : 0;
+        my \$unsafe_shutdown = (\$nvme_full =~ /"unsafe_shutdowns"\s*:\s*(\d+)/) ? \$1 : 0;
+        my \$data_read = (\$nvme_full =~ /"data_units_read"\s*:\s*(\d+)/) ? \$1 : 0;
+        my \$data_written = (\$nvme_full =~ /"data_units_written"\s*:\s*(\d+)/) ? \$1 : 0;
+        # Build the slimmed-down JSON
+        \$nvme_data = "{\n";
+        \$nvme_data .= "  \\"model_name\\": \\"\$model\\",\n";
+        \$nvme_data .= "  \\"temperature\\": {\\"current\\": \$temp},\n";
+        \$nvme_data .= "  \\"power_on_time\\": {\\"hours\\": \$hours},\n";
+        \$nvme_data .= "  \\"power_cycle_count\\": \$cycles,\n";
+        \$nvme_data .= "  \\"smart_status\\": {\\"passed\\": \$smart_passed}";
+        if (\$pct_used ne '') {
+            \$nvme_data .= ",\n  \\"nvme_smart_health_information_log\\": {\n";
+            \$nvme_data .= "    \\"percentage_used\\": \$pct_used,\n";
+            \$nvme_data .= "    \\"media_errors\\": \$media_err,\n";
+            \$nvme_data .= "    \\"unsafe_shutdowns\\": \$unsafe_shutdown,\n";
+            \$nvme_data .= "    \\"data_units_read\\": \$data_read,\n";
+            \$nvme_data .= "    \\"data_units_written\\": \$data_written\n";
+            \$nvme_data .= "  }";
+        }
+        \$nvme_data .= "\n}";
+    }
+    \$res->{nvme$nvi} = \$nvme_data;
+}
 EOF
 echo "Detected NVMe disk: $nvme (nvme$nvi)"
 let nvi++
```
The device topology is baked in at install time.
These for loops expand the current disk set into Nodes.pm when cpu_add() runs; they do not discover devices dynamically at API time. NVMe/SATA/RAID disks added or removed later never enter *_all, and each sdN/raidN block repeatedly opens and decode_json's the entire cache file; a more robust implementation would read /run/smart-cache.json once and build nvme_all / sata_all / raid_all directly from the key prefixes.
Also applies to: 3287-3335, 3357-3419
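The "read once, group by key prefix" approach suggested here might look like the following. The injected code is Perl; this is an illustrative Python sketch with a hypothetical helper name, assuming cache keys of the form nvme0, sd0, raid0:

```python
import json


def group_cache(cache_text):
    """Parse the whole cache once and bucket entries by key prefix,
    instead of re-opening and re-decoding the file per device."""
    try:
        data = json.loads(cache_text)
    except ValueError:
        data = {}
    groups = {"nvme_all": {}, "sata_all": {}, "raid_all": {}}
    for key, value in data.items():
        if key.startswith("nvme"):
            groups["nvme_all"][key] = value
        elif key.startswith("raid"):
            groups["raid_all"][key] = value
        elif key.startswith("sd"):
            groups["sata_all"][key] = value
    return groups
```

Because the grouping is driven by whatever keys the collector wrote, disks added or removed after install would show up automatically, without the install-time expansion.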
@firmament2008 Understood — the asynchronous cache-collection design is an intentional efficiency trade-off: the device topology is fixed at cpu_add() install time and data is read from /run/smart-cache.json at runtime. Recorded as intentional; not pursuing further.
✏️ Learnings added
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add function), the NVMe/SATA/RAID device topology (sdN, nvmeN, raidN key names) is intentionally captured at cpu_add() install time rather than dynamically at API runtime. The Perl code injected into Nodes.pm reads /run/smart-cache.json at each API call, but the set of sdN/nvmeN/raidN keys to look up is fixed when cpu_add() runs. This is an accepted design trade-off by the author (firmament2008) for efficiency — the async smart-cache-collect.py handles collection, and the static key set avoids runtime device scanning overhead. Do not flag this as a blocking issue.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add), each sdN and raidN Perl block independently opens and JSON::PP::decode_json's the entire /run/smart-cache.json file. This repeated open+decode pattern per disk is intentional and accepted by the author (firmament2008) — the async collection design is the intended efficiency mechanism, not per-call optimization. Do not flag repeated JSON decodes as a blocking concern.
🧠 Learnings used
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:2976-3028
Timestamp: 2026-03-24T04:34:32.616Z
Learning: In PVE-Tools.sh, the RAID card disk smartctl probing (around line 3018) intentionally uses `timeout 1` per device. The existing SATA (`smartctl $sd -a -j`) and NVMe (`smartctl $nvme -a -j`) sections have NO timeout at all, so the RAID `timeout 1` is actually more protective than the existing pattern — not a regression. Do not flag the per-disk `timeout 1` in the RAID section as a blocking concern relative to the other disk types.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:3471-3482
Timestamp: 2026-03-24T04:32:47.952Z
Learning: In PVE-Tools.sh (Bash/JavaScript), the RAID card disk SMART status rendering logic (in the JavaScript renderer injected for `raidX` panels) is intentionally designed with the following priority: (1) gate on `smart_support.available === false` to show "不支持"; (2) use `smart_status.passed` as the primary SMART pass/fail indicator; (3) fall back to `ata_smart_data.self_test.status.passed` only when `smart_status.passed` is undefined. The `ata_smart_data` fallback is intentional for RAID controllers where `smart_status` may be absent.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:2926-2926
Timestamp: 2026-03-25T07:21:17.071Z
Learning: In PVE-Tools.sh (smart-cache-collect.py), the device glob patterns `/dev/sd[a-z]` (for SATA, line ~2926) and the NVMe glob (nvme0-9) are intentionally limited to single-character/single-digit device names. The author (firmament2008) has accepted this as a known limitation because the target use case (compute nodes) is not expected to have more than 26 SATA disks or more than 10 NVMe devices. This is tracked as a future TODO to improve using lsblk or /sys/block enumeration. Do not flag this as a blocking issue.
NVMe collection still blocks the Nodes API.
Line 3228 still runs smartctl per device inside the Nodes.pm request path. Once a host has several NVMe drives, every node-overview refresh keeps consuming that IO/timeout budget; the asynchronous cache service does not cover this path, and NVMe should also go through the cache produced by smart-cache-collect.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@PVE-Tools.sh` around lines 3224 - 3261, Replace the inline smartctl
collection (the block that builds $nvme_full/$nvme_data and assigns
$res->{nvme$nvi}) so it reads precomputed NVMe JSON from the
smart-cache-collect.py cache instead of running smartctl per request: look up
the cache file produced by smart-cache-collect.py for the device, parse/validate
its JSON and assign that to $res->{nvme$nvi}; only if the cache is missing or
stale, fall back to a non-blocking quick read or return an empty/placeholder
value (do not run timeout-bound smartctl synchronously). Update the code that
references $nvme_full, $nvme_data and $res->{nvme$nvi} accordingly to use the
cached JSON format and preserve the same keys (model_name, temperature.current,
power_on_time.hours, power_cycle_count, smart_status.passed,
nvme_smart_health_information_log) so downstream consumers remain compatible.
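A minimal sketch of the cache-first read with a staleness guard described above; the helper name and the max_age value are illustrative assumptions, while the path /run/smart-cache.json comes from the PR:

```python
import json
import os
import time


def read_cached_entry(key, path="/run/smart-cache.json", max_age=120):
    """Return one device's cached SMART dict, or {} when the cache is
    missing, stale, or unparseable -- never run smartctl inline on the
    API request path."""
    try:
        if time.time() - os.path.getmtime(path) > max_age:
            return {}  # stale: let the UI show an empty/unknown state
        with open(path) as fh:
            return json.load(fh).get(key, {})
    except (OSError, ValueError):
        return {}
```

Returning `{}` rather than blocking keeps API latency flat regardless of disk count; the next timer run refreshes the cache.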
NVMe devices are CPU-direct-attached, so the supported count is limited; this does not affect overall performance.
@firmament2008 Understood, thanks for the explanation. NVMe devices attach directly over CPU PCIe lanes, so platform lane limits keep the count very small in practice (typically 1-4), and the cumulative smartctl overhead per API refresh is manageable without significant blocking. This design decision has been recorded; future reviews will not flag it as blocking.
✏️ Learnings added
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3224-3261
Timestamp: 2026-03-25T07:26:27.733Z
Learning: In PVE-Tools.sh (cpu_add function), the NVMe smartctl collection (around line 3228) is intentionally run inline within the Nodes.pm API path rather than via the async smart-cache-collect.py cache. The author (firmament2008) has accepted this because NVMe devices are CPU-directly attached (PCIe direct), platform lane limits bound the count to a very small number (typically 1-4), and the cumulative per-device overhead is not significant enough to block the API. Do not flag inline NVMe smartctl in Nodes.pm as a blocking performance concern.
🧠 Learnings used
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add), each sdN and raidN Perl block independently opens and JSON::PP::decode_json's the entire /run/smart-cache.json file. This repeated open+decode pattern per disk is intentional and accepted by the author (firmament2008) — the async collection design is the intended efficiency mechanism, not per-call optimization. Do not flag repeated JSON decodes as a blocking concern.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:3471-3482
Timestamp: 2026-03-24T04:32:47.952Z
Learning: In PVE-Tools.sh (Bash/JavaScript), the RAID card disk SMART status rendering logic (in the JavaScript renderer injected for `raidX` panels) is intentionally designed with the following priority: (1) gate on `smart_support.available === false` to show "不支持"; (2) use `smart_status.passed` as the primary SMART pass/fail indicator; (3) fall back to `ata_smart_data.self_test.status.passed` only when `smart_status.passed` is undefined. The `ata_smart_data` fallback is intentional for RAID controllers where `smart_status` may be absent.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:2976-3028
Timestamp: 2026-03-24T04:34:32.616Z
Learning: In PVE-Tools.sh, the RAID card disk smartctl probing (around line 3018) intentionally uses `timeout 1` per device. The existing SATA (`smartctl $sd -a -j`) and NVMe (`smartctl $nvme -a -j`) sections have NO timeout at all, so the RAID `timeout 1` is actually more protective than the existing pattern — not a regression. Do not flag the per-disk `timeout 1` in the RAID section as a blocking concern relative to the other disk types.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add function), the NVMe/SATA/RAID device topology (sdN, nvmeN, raidN key names) is intentionally captured at cpu_add() install time rather than dynamically at API runtime. The Perl code injected into Nodes.pm reads /run/smart-cache.json at each API call, but the set of sdN/nvmeN/raidN keys to look up is fixed when cpu_add() runs. This is an accepted design trade-off by the author (firmament2008) for efficiency — the async smart-cache-collect.py handles collection, and the static key set avoids runtime device scanning overhead. Do not flag this as a blocking issue.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:2926-2926
Timestamp: 2026-03-25T07:21:17.071Z
Learning: In PVE-Tools.sh (smart-cache-collect.py), the device glob patterns `/dev/sd[a-z]` (for SATA, line ~2926) and the NVMe glob (nvme0-9) are intentionally limited to single-character/single-digit device names. The author (firmament2008) has accepted this as a known limitation because the target use case (compute nodes) is not expected to have more than 26 SATA disks or more than 10 NVMe devices. This is tracked as a future TODO to improve using lsblk or /sys/block enumeration. Do not flag this as a blocking issue.
Problem: the RAID detection logic uses 'or', so any device with smart_support.available=false is treated as RAID; but ordinary disks behind an HBA in passthrough (IT mode) can also report SMART as unavailable. Fix: switch to 'and', requiring both conditions: 1. SMART is unsupported; 2. a RAID-related identifier is present (vendor/product contains a RAID keyword). This correctly distinguishes: RAID virtual disks (MR/PERC/HPSA etc. plus SMART unavailable) from HBA passthrough disks (SMART may be unavailable, but no RAID identifier).
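The 'and'-based detection described above can be sketched as a small predicate. This is an illustrative sketch only: the function name, keyword list, and field choices are assumptions, not the exact implementation in smart-cache-collect.py.

```python
# Illustrative sketch of the 'and'-based RAID virtual-disk detection.
# The keyword list and helper name are assumptions, not the script's code.
RAID_KEYWORDS = ("MR", "PERC", "HPSA", "RAID", "MEGARAID", "VIRTUAL")

def is_raid_virtual_disk(smart_json: dict) -> bool:
    """Treat a device as a RAID virtual disk only when SMART is
    unsupported AND the vendor/product strings carry a RAID marker."""
    smart_unsupported = smart_json.get("smart_support", {}).get("available") is False
    ident = " ".join(
        str(smart_json.get(k, "")) for k in ("vendor", "product", "model_name")
    ).upper()
    has_raid_marker = any(kw in ident for kw in RAID_KEYWORDS)
    return smart_unsupported and has_raid_marker

# A passthrough (IT-mode) disk without SMART is no longer misclassified:
hba_disk = {"smart_support": {"available": False}, "model_name": "WDC WD40EFRX"}
raid_vd = {"smart_support": {"available": False}, "vendor": "DELL", "product": "PERC H730"}
```

With 'or', `hba_disk` would have been flagged as RAID; with 'and', only `raid_vd` is.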
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
PVE-Tools.sh (1)
3289-3336: ⚠️ Potential issue | 🟠 Major
The sdN numbering and the collector's rules are already misaligned. The collection script skips /dev/sdX devices identified as RAID virtual disks, while this code increments sdi for every /dev/sdX it sees. As soon as a host has both RAID virtual disks and real SATA disks, the sdN keys written by smart-cache-collect.py diverge from the sdN keys read here: the later real SATA disks read the wrong cache entries, or disappear from sata_all entirely. At minimum, reuse the collector's RAID-virtual-disk filter here so that sdi only increments for SATA devices that actually enter the cache.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@PVE-Tools.sh` around lines 3289 - 3336, The loop increments sdi for every /dev/sdX even when the device is a RAID virtual disk, causing sdN mismatches with the collector; update the loop so you first apply the exact same RAID-virtual-device filter used in smart-cache-collect.py (reuse its predicate/logic) to decide whether a /dev/sdX is a real SATA device, and only then proceed to append the Perl JSON block and increment sdi. Concretely, perform the RAID-check before using sd, sdsn, sdcr and before writing the block that references "sd$sdi" so sdi only increments for devices that passed the collector's filter. Ensure the same identifying symbols (sd, sdsn, sdcr, sdi and the cache key "sd$sdi") are used so keys remain aligned with smart-cache-collect.py.
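One way to keep the installer's sdN keys aligned with the collector, as the prompt suggests, is to run both sides through a single shared filter. A minimal hypothetical sketch (the helper names and the tuple layout are illustrative, not the script's actual symbols):

```python
# Hypothetical sketch: assign cache keys sd0, sd1, ... only to devices that
# pass the same filter the collector uses, so installer and collector agree.
def assign_sata_keys(devices, is_raid_virtual_disk):
    """devices: list of (path, smart_json) tuples in scan order."""
    keys = {}
    sdi = 0
    for path, smart_json in devices:
        if is_raid_virtual_disk(smart_json):
            continue  # skipped by the collector, so it must not consume an index
        keys[path] = f"sd{sdi}"
        sdi += 1
    return keys

devices = [
    ("/dev/sda", {"raid": True}),   # RAID virtual disk -> skipped by collector
    ("/dev/sdb", {"raid": False}),  # real SATA disk
    ("/dev/sdc", {"raid": False}),
]
keys = assign_sata_keys(devices, lambda d: d["raid"])
```

Because /dev/sda is filtered out before the index is consumed, /dev/sdb maps to sd0 and /dev/sdc to sd1, matching what the collector writes.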
♻️ Duplicate comments (2)
PVE-Tools.sh (2)
3229-3263: ⚠️ Potential issue | 🟠 Major
Assemble the NVMe entries with JSON::PP instead of hand-building JSON. This code uses regexes to pluck fields out of the smartctl output, then concatenates JSON by hand. The moment model_name contains a quote or a backslash, nvme_all becomes invalid JSON and the frontend's JSON.parse fails for the entire block; and when smart_status is missing it is still backfilled as true, displaying "unknown" as "healthy". Calling decode_json, building a Perl hash from only the fields that exist, and then calling encode_json would be far more robust.
Verify each finding against the current code and only fix it if needed. In `@PVE-Tools.sh` around lines 3229 - 3263, Replace the hand-rolled JSON assembly around $nvme_full/$nvme_data with JSON::PP-based parsing and encoding: use decode_json on $nvme_full to get a Perl hashref, then build a result hash only adding keys when the source fields exist (e.g. model_name, temperature->{current}, power_on_time->{hours}, power_cycle_count, smart_status->{passed} — do not default missing smart_status to true), include nvme_smart_health_information_log only when percentage_used exists, and finally set $res->{nvme$nvi} = encode_json($result_hashref); this avoids injection/escaping bugs from manual string concatenation and preserves correct "unknown" state handling.
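The review asks for Perl's JSON::PP, but the underlying principle (parse the smartctl JSON, copy only fields that actually exist, and re-serialize with a real encoder rather than string concatenation) can be sketched in Python. This is a pattern illustration, not the Perl code that would land in Nodes.pm:

```python
import json

def build_nvme_entry(smartctl_output: str) -> str:
    """Parse smartctl -j output and re-encode only the fields that exist.
    A real JSON encoder escapes quotes/backslashes in model_name for free."""
    data = json.loads(smartctl_output)
    entry = {}
    if "model_name" in data:
        entry["model_name"] = data["model_name"]
    if "current" in data.get("temperature", {}):
        entry["temperature"] = {"current": data["temperature"]["current"]}
    # Do NOT default a missing smart_status to passed=True:
    if "passed" in data.get("smart_status", {}):
        entry["smart_status"] = {"passed": data["smart_status"]["passed"]}
    return json.dumps(entry)

# A model name containing quotes round-trips safely:
raw = '{"model_name": "ACME \\"Turbo\\" 1TB", "temperature": {"current": 41}}'
encoded = build_nvme_entry(raw)
```

The output is always valid JSON, and a device with no smart_status simply has no smart_status key rather than a fabricated `passed: true`.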
4029-4058: ⚠️ Potential issue | 🟡 Minor
The auto-height calculation still does not account for the empty-state blocks. nvme-all, sata-all, and raid-all are inserted unconditionally, but height is only counted for a block when nvi/sdi/raidi > 0. Common setups (pure SATA, pure NVMe, or no RAID) come up 1-2 rows short, so under the default "auto-calculate" mode the left-hand card can still be clipped.
Verify each finding against the current code and only fix it if needed. In `@PVE-Tools.sh` around lines 4029 - 4058, The auto-height calculation ignores the always-present empty blocks (nvme-all, sata-all, raid-all), so adjust the counters to account for those UI blocks even when nvi/sdi/raidi are zero: set nvme_ui_count, sata_ui_count, and raid_ui_count to 1 when their block is always inserted (i.e., make their initial assignment unconditional or check the block-presence flag), then use these ui_count values when computing addRs and display_rows (replace usages of nvi/sdi/raidi in the display_rows base count with nvme_ui_count/sata_ui_count/raid_ui_count so empty blocks contribute 1 row), and keep the extra RAID controller row logic (display_rows += 1 when raidi > 0) and UPS adjustments as-is.
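The fix described in the prompt amounts to counting each always-inserted UI block as at least one row. A hypothetical sketch of the corrected row arithmetic (the names mirror the prompt's nvi/sdi/raidi counters, not the script verbatim):

```python
def display_rows(nvi: int, sdi: int, raidi: int, base_rows: int = 0) -> int:
    """Each of nvme-all / sata-all / raid-all always renders at least one
    row (its empty-state line), plus one row per detected device, plus an
    extra row for the RAID controller line when any RAID disk exists."""
    nvme_ui = max(nvi, 1)   # empty-state block still occupies one row
    sata_ui = max(sdi, 1)
    raid_ui = max(raidi, 1)
    rows = base_rows + nvme_ui + sata_ui + raid_ui
    if raidi > 0:
        rows += 1  # RAID controller header row
    return rows
```

A SATA-only host (nvi=0, sdi=2, raidi=0) now counts the two empty-state rows it previously missed, so the card is sized tall enough.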
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@PVE-Tools.sh`:
- Around line 2830-2854: The code currently fills missing fields with defaults
(temp=0, hours=0, cycles=0, standby=False) causing the UI to render nonexistent
data; change the logic in the block handling
temperature/power_on_time/power_cycle_count/standby so you only assign
result[key]["temperature"], result[key]["power_on_time"],
result[key]["power_cycle_count"], and result[key]["standby"] when the upstream
data actually contains those fields (e.g., check for "temperature" key before
setting temperature.current, check for "hours" or "raw.value" in power_on_time
before setting power_on_time.hours, check that power_cycle_count exists before
writing cycles, and only set standby when "standby" is present), mirroring how
smart_status is handled so absent fields are left unset.
---
Outside diff comments:
In `@PVE-Tools.sh`:
- Around line 3289-3336: The loop increments sdi for every /dev/sdX even when
the device is a RAID virtual disk, causing sdN mismatches with the collector;
update the loop so you first apply the exact same RAID-virtual-device filter
used in smart-cache-collect.py (reuse its predicate/logic) to decide whether a
/dev/sdX is a real SATA device, and only then proceed to append the Perl JSON
block and increment sdi. Concretely, perform the RAID-check before using sd,
sdsn, sdcr and before writing the block that references "sd$sdi" so sdi only
increments for devices that passed the collector's filter. Ensure the same
identifying symbols (sd, sdsn, sdcr, sdi and the cache key "sd$sdi") are used so
keys remain aligned with smart-cache-collect.py.
---
Duplicate comments:
In `@PVE-Tools.sh`:
- Around line 3229-3263: Replace the hand-rolled JSON assembly around
$nvme_full/$nvme_data with JSON::PP-based parsing and encoding: use decode_json
on $nvme_full to get a Perl hashref, then build a result hash only adding keys
when the source fields exist (e.g. model_name, temperature->{current},
power_on_time->{hours}, power_cycle_count, smart_status->{passed} — do not
default missing smart_status to true), include nvme_smart_health_information_log
only when percentage_used exists, and finally set $res->{nvme$nvi} =
encode_json($result_hashref); this avoids injection/escaping bugs from manual
string concatenation and preserves correct "unknown" state handling.
- Around line 4029-4058: The auto-height calculation ignores the always-present
empty blocks (nvme-all, sata-all, raid-all), so adjust the counters to account
for those UI blocks even when nvi/sdi/raidi are zero: set nvme_ui_count,
sata_ui_count, and raid_ui_count to 1 when their block is always inserted (i.e.,
make their initial assignment unconditional or check the block-presence flag),
then use these ui_count values when computing addRs and display_rows (replace
usages of nvi/sdi/raidi in the display_rows base count with
nvme_ui_count/sata_ui_count/raid_ui_count so empty blocks contribute 1 row), and
keep the extra RAID controller row logic (display_rows += 1 when raidi > 0) and
UPS adjustments as-is.
```python
# temperature
temp = data.get("temperature", {}).get("current", 0)
result[key]["temperature"] = {"current": temp}

# power_on_time
hours = 0
power_on = data.get("power_on_time", {})
if "hours" in power_on:
    hours = power_on["hours"]
elif "raw" in power_on and "value" in power_on["raw"]:
    hours = power_on["raw"]["value"]
result[key]["power_on_time"] = {"hours": hours}

# power_cycle_count
cycles = data.get("power_cycle_count", 0)
result[key]["power_cycle_count"] = cycles

# smart_status - only emit when present; do not default to true
smart_status = data.get("smart_status", {})
if "passed" in smart_status:
    result[key]["smart_status"] = {"passed": smart_status["passed"]}

# standby - standby state
standby = data.get("standby", False)
result[key]["standby"] = standby
```
Don't backfill missing fields with 0/false.
The frontend now decides whether to render each field based on its presence. Defaulting temperature.current, power_on_time.hours, power_cycle_count (and standby for SATA) to 0/false turns "not returned / unsupported" into 0°C, 0 hours, or "not in standby", which conflicts with this change's goal of preserving raw values. Handle them like smart_status: only write a field when the upstream data actually returns it.
🛠️ Suggested fix

```diff
-temp = data.get("temperature", {}).get("current", 0)
-result[key]["temperature"] = {"current": temp}
+temp = data.get("temperature", {})
+if "current" in temp:
+    result[key]["temperature"] = {"current": temp["current"]}
-hours = 0
 power_on = data.get("power_on_time", {})
 if "hours" in power_on:
-    hours = power_on["hours"]
+    result[key]["power_on_time"] = {"hours": power_on["hours"]}
 elif "raw" in power_on and "value" in power_on["raw"]:
-    hours = power_on["raw"]["value"]
-result[key]["power_on_time"] = {"hours": hours}
+    result[key]["power_on_time"] = {"hours": power_on["raw"]["value"]}
-cycles = data.get("power_cycle_count", 0)
-result[key]["power_cycle_count"] = cycles
+if "power_cycle_count" in data:
+    result[key]["power_cycle_count"] = data["power_cycle_count"]
-standby = data.get("standby", False)
-result[key]["standby"] = standby
+if "standby" in data:
+    result[key]["standby"] = data["standby"]
```

Also applies to: 2875-2888
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@PVE-Tools.sh` around lines 2830 - 2854, The code currently fills missing
fields with defaults (temp=0, hours=0, cycles=0, standby=False) causing the UI
to render nonexistent data; change the logic in the block handling
temperature/power_on_time/power_cycle_count/standby so you only assign
result[key]["temperature"], result[key]["power_on_time"],
result[key]["power_cycle_count"], and result[key]["standby"] when the upstream
data actually contains those fields (e.g., check for "temperature" key before
setting temperature.current, check for "hours" or "raw.value" in power_on_time
before setting power_on_time.hours, check that power_cycle_count exists before
writing cycles, and only set standby when "standby" is present), mirroring how
smart_status is handled so absent fields are left unset.
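The fix described above boils down to a presence-gated copy. A runnable standalone sketch of the same pattern, using the field names from the suggested diff (`result[key]` is replaced here by a local dict for self-containment):

```python
def copy_present_fields(data: dict) -> dict:
    """Copy SMART fields into the result only when the upstream data
    actually contains them, mirroring the smart_status handling."""
    out = {}
    temp = data.get("temperature", {})
    if "current" in temp:
        out["temperature"] = {"current": temp["current"]}
    power_on = data.get("power_on_time", {})
    if "hours" in power_on:
        out["power_on_time"] = {"hours": power_on["hours"]}
    elif "value" in power_on.get("raw", {}):
        out["power_on_time"] = {"hours": power_on["raw"]["value"]}
    if "power_cycle_count" in data:
        out["power_cycle_count"] = data["power_cycle_count"]
    if "standby" in data:
        out["standby"] = data["standby"]
    return out
```

A device that returns nothing yields an empty dict, so the frontend's presence checks render "unknown" instead of 0°C / 0 hours.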
@firmament2008 Understood, thanks for the heads-up! Once the next version adds the "only write a field when it exists" logic, the spurious 0°C / 0-hour displays will be resolved along with it. Looking forward to the next release 👍
🧠 Learnings used
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3224-3261
Timestamp: 2026-03-25T07:26:27.733Z
Learning: In PVE-Tools.sh (cpu_add function), the NVMe smartctl collection (around line 3228) is intentionally run inline within the Nodes.pm API path rather than via the async smart-cache-collect.py cache. The author (firmament2008) has accepted this because NVMe devices are CPU-directly attached (PCIe direct), platform lane limits bound the count to a very small number (typically 1-4), and the cumulative per-device overhead is not significant enough to block the API. Do not flag inline NVMe smartctl in Nodes.pm as a blocking performance concern.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:3471-3482
Timestamp: 2026-03-24T04:32:47.952Z
Learning: In PVE-Tools.sh (Bash/JavaScript), the RAID card disk SMART status rendering logic (in the JavaScript renderer injected for `raidX` panels) is intentionally designed with the following priority: (1) gate on `smart_support.available === false` to show "不支持"; (2) use `smart_status.passed` as the primary SMART pass/fail indicator; (3) fall back to `ata_smart_data.self_test.status.passed` only when `smart_status.passed` is undefined. The `ata_smart_data` fallback is intentional for RAID controllers where `smart_status` may be absent.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 73
File: PVE-Tools.sh:2976-3028
Timestamp: 2026-03-24T04:34:32.616Z
Learning: In PVE-Tools.sh, the RAID card disk smartctl probing (around line 3018) intentionally uses `timeout 1` per device. The existing SATA (`smartctl $sd -a -j`) and NVMe (`smartctl $nvme -a -j`) sections have NO timeout at all, so the RAID `timeout 1` is actually more protective than the existing pattern — not a regression. Do not flag the per-disk `timeout 1` in the RAID section as a blocking concern relative to the other disk types.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add function), the NVMe/SATA/RAID device topology (sdN, nvmeN, raidN key names) is intentionally captured at cpu_add() install time rather than dynamically at API runtime. The Perl code injected into Nodes.pm reads /run/smart-cache.json at each API call, but the set of sdN/nvmeN/raidN keys to look up is fixed when cpu_add() runs. This is an accepted design trade-off by the author (firmament2008) for efficiency — the async smart-cache-collect.py handles collection, and the static key set avoids runtime device scanning overhead. Do not flag this as a blocking issue.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:3221-3265
Timestamp: 2026-03-25T07:25:27.596Z
Learning: In PVE-Tools.sh (cpu_add), each sdN and raidN Perl block independently opens and JSON::PP::decode_json's the entire /run/smart-cache.json file. This repeated open+decode pattern per disk is intentional and accepted by the author (firmament2008) — the async collection design is the intended efficiency mechanism, not per-call optimization. Do not flag repeated JSON decodes as a blocking concern.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:2926-2926
Timestamp: 2026-03-25T07:43:03.128Z
Learning: In PVE-Tools.sh (smart-cache-collect.py), the proposed improvement of replacing `/dev/sd[a-z]` glob with `lsblk` or `/sys/block` enumeration is also insufficient for full RAID1 array device discovery — lsblk/sys/block cannot enumerate all physical devices behind a RAID1 array either. The author (firmament2008) has deferred this improvement until a proper RAID environment is available for development and testing. Do not suggest lsblk/sys/block as a drop-in replacement for the current glob pattern in the context of RAID device enumeration.
Learnt from: firmament2008
Repo: Mapleawaa/PVE-Tools-9 PR: 74
File: PVE-Tools.sh:2926-2926
Timestamp: 2026-03-25T07:21:17.071Z
Learning: In PVE-Tools.sh (smart-cache-collect.py), the device glob patterns `/dev/sd[a-z]` (for SATA, line ~2926) and the NVMe glob (nvme0-9) are intentionally limited to single-character/single-digit device names. The author (firmament2008) has accepted this as a known limitation because the target use case (compute nodes) is not expected to have more than 26 SATA disks or more than 10 NVMe devices. This is tracked as a future TODO to improve using lsblk or /sys/block enumeration. Do not flag this as a blocking issue.
Handling logic consistent with smart_status:
- temperature.current - only emitted when present
- power_on_time.hours - only emitted when present
- power_cycle_count - only emitted when present
- standby - only emitted when present
- ata_smart_attributes.table - only emitted when present
- smart_support.available - only emitted when present
- ata_smart_data.self_test.status.passed - only emitted when present
This avoids displaying "not returned / unsupported" as 0°C, 0 hours, or "not in standby".
I will not merge this PR, and it will be closed shortly.

First, to be clear: "it already runs fine on my own machine" is not grounds for merging into the main repository. Running in your environment only shows the implementation suits your usage; it does not show it suits the main repository, let alone other users' environments.

The reason for not merging is not simply code style: the implementation direction itself has drifted outside the main repository's current maintenance boundary. The main repository's design goals are explicit: stay a single script, keep state minimal, remain rollback-friendly, and avoid adding a background service and a new maintenance chain just for a node-overview display feature.

This is no longer "adding RAID controller display support"; it stands up a separate resident collection mechanism, and the main repository does not accept that direction. Beyond that, this PR also introduces concrete problems:

So this is not a case of "it runs fine on your machine and I just don't like it"; for the main repository, this implementation means higher maintenance cost, more state, and greater risk.

If you believe in this approach, you are welcome to keep maintaining a variant in your own fork; that is perfectly fine.

One more reminder: this repository is licensed under GPL-3.0.
Main features:
Security fixes:
Technical implementation:
Summary by CodeRabbit
New features
Performance optimizations