27 changes: 27 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,33 @@ All notable changes to NightMend will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [0.1.0.0] - 2026-04-10

### Added
- **Autopilot Demo Mode**: zero-config Docker demo that showcases the full AI detect→diagnose→fix→verify loop. Run `docker compose -f docker-compose.demo.yml up` and watch NightMend auto-detect a disk-full fault, diagnose root cause with AI, execute cleanup, and verify recovery. No configuration needed.
- Demo orchestrator: pre-seeds 3 hosts, alert rules, service topology, and metrics. Injects disk fault at T+60s, triggers immediate alert evaluation, and auto-starts AI diagnosis session.
- `GET /api/v1/demo/status` endpoint for frontend to poll demo phase (seeding, exploring, injecting, diagnosing, complete).
- Auto-approve mode for command execution during demo (skips 60-second user confirmation wait).
- Frontend: DEMO badge on OpsAssistant, global alert bar with 3-second auto-redirect, demo session auto-select.
- `docker-compose.demo.yml` with `DEMO_MODE=true` for one-command demo startup.
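
A minimal sketch of how a client might poll the demo status endpoint listed above. The phase names come from this changelog entry. The `"phase"` JSON key, the `http://localhost:8000` base URL, and the helper names are assumptions for illustration, not the project's actual response schema or frontend code.

```python
import json
import time
import urllib.request
from typing import Callable, List

# Phase names as listed in the changelog entry.
DEMO_PHASES = ("seeding", "exploring", "injecting", "diagnosing", "complete")


def http_fetch_status(base_url: str = "http://localhost:8000") -> dict:
    """Fetch the demo status over HTTP (base URL is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/demo/status") as resp:
        return json.load(resp)


def poll_demo_status(
    fetch_status: Callable[[], dict] = http_fetch_status,
    interval_seconds: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the demo reports "complete".

    Returns the distinct phases observed, in order. The "phase" key in the
    response body is hypothetical.
    """
    seen: List[str] = []
    for _ in range(max_polls):
        phase = fetch_status().get("phase")
        if phase not in DEMO_PHASES:
            raise ValueError(f"unexpected demo phase: {phase!r}")
        # Record a phase only when it changes.
        if not seen or seen[-1] != phase:
            seen.append(phase)
        if phase == "complete":
            return seen
        sleep(interval_seconds)
    raise TimeoutError("demo did not reach the 'complete' phase")
```

Passing `fetch_status` and `sleep` as parameters keeps the loop testable without a running backend.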

### Fixed
- AI ops assistant "new session" button now creates a new session on each call instead of reusing blank drafts.
- Host selector shows "no online hosts" guidance with link to host management when no agents are connected.

## [2026.04.08] - 2026-04-08

### Fixed
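- AI ops assistant "New session" button did nothing when clicked: `POST /api/v1/ops/sessions` used to reuse an existing blank draft, so every click returned the same session id and `title` was silently dropped. It now always creates a new session and cleans up leftover blank drafts.
- AI ops assistant hit a dead end when the database had no online hosts: the host selector was hidden entirely by a `hosts.length > 0` guard, yet sending still required `selectedHostId`, so users had no way to select a host. The row is now always rendered, and the empty state shows the hint "No online hosts: go to Host Management to add and start an Agent first" with a link to that page.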

### Added
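- 4 regression tests for the ops session routes (`backend/tests/test_ops_session_routes.py`):
  - Consecutive session creations return different ids
  - `body.title` is persisted
  - Blank drafts are cleaned up
  - Sessions with a title or with messages are not deleted by mistake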
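
The behaviour these regression tests pin down can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the code in `backend/tests/test_ops_session_routes.py` or the real route handler: `Session`, `SessionStore`, and the definition of a "blank draft" (no title and no messages) are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    id: str
    title: Optional[str] = None
    messages: List[str] = field(default_factory=list)

    @property
    def is_blank_draft(self) -> bool:
        # Assumed definition: no title and no messages.
        return not self.title and not self.messages


class SessionStore:
    """Hypothetical stand-in for the ops session table."""

    def __init__(self) -> None:
        self.sessions: Dict[str, Session] = {}

    def create(self, title: Optional[str] = None) -> Session:
        # Drop leftover blank drafts, then always create a fresh session
        # instead of handing back an existing draft.
        self.sessions = {
            sid: s for sid, s in self.sessions.items() if not s.is_blank_draft
        }
        session = Session(id=str(uuid.uuid4()), title=title)
        self.sessions[session.id] = session
        return session
```

With this model, two consecutive `create()` calls return different ids, a passed `title` is kept, and sessions that already carry a title or messages survive the cleanup.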

## [2026.03.29] - 2026-03-29

### Added
Expand Down
76 changes: 76 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,79 @@ Key routing rules:
- Architecture review → invoke plan-eng-review
- Save progress, checkpoint, resume → invoke checkpoint
- Code quality, health check → invoke health

## Overview
NightMend (vigilops): an AI-driven intelligent operations monitoring platform. Receive alerts → AI analysis → automatic remediation.

## Tech Stack
- **Backend**: Python 3, FastAPI, SQLAlchemy (async), PostgreSQL, Redis, Alembic migrations
- **Frontend**: React 19, TypeScript, Vite 7, Ant Design 6, ECharts, i18n
- **Agent**: Python, asyncssh, deployed independently on monitored hosts
- **Deployment**: Docker Compose (four sets: dev/demo/prod/ssl), ClickHouse (time-series data)
- **AI**: DeepSeek API, MCP integration, fastmcp

## Project Structure
```
vigilops/
├── backend/ # FastAPI backend
│   ├── app/
│   │   ├── api/ # REST API routes
│   │   ├── models/ # SQLAlchemy models
│   │   ├── services/ # Business logic
│   │   ├── remediation/ # Auto-remediation engine
│   │   ├── mcp/ # MCP integration
│   │   └── tasks/ # Background tasks
│   ├── alembic/ # Database migrations
│   └── tests/ # Backend tests
├── frontend/ # React frontend
│   └── src/
│       ├── pages/ # Pages
│       ├── components/ # Components
│       ├── services/ # API calls
│       └── stores/ # State management
├── agent/ # Agent for monitored hosts
│   └── nightmend_agent/ # Agent core code
├── charts/ # Helm charts
├── deploy/ # Deployment scripts
└── docker-compose*.yml # Multi-environment deployment
```

## Development

### Common Commands
- **Backend dev**: `cd backend && uvicorn app.main:app --reload`
- **Backend tests**: `cd backend && pytest`
- **Frontend dev**: `cd frontend && npm run dev`
- **Frontend build**: `cd frontend && tsc -b && npm run build`
- **Frontend lint**: `cd frontend && npm run lint`
- **Full-stack deploy**: `docker compose up -d --build`
- **Demo mode**: `docker compose -f docker-compose.demo.yml up -d`
- **DB migration**: `cd backend && alembic upgrade head`

### Branch Strategy
- Main branch: `main`
- Feature: `feat/[description]`
- Fix: `fix/[description]`
- Commit format: conventional commits (`feat:`, `fix:`, `chore:`)

## Rules
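- Database changes must go through an Alembic migration; do not alter tables by hand
- The frontend supports i18n; every new piece of copy must add both Chinese and English keys
- The Agent module is deployed independently; keep changes backward compatible
- Docker Compose has four configurations (dev/demo/prod/ssl); when changing one, check whether the others are affected
- After changing AI/MCP code, confirm compatibility with the fastmcp interface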

## Health Stack

- typecheck: cd frontend && npx tsc --noEmit
- lint_fe: cd frontend && npx eslint .
- lint_be: /usr/local/Cellar/pyenv/versions/3.12.12/bin/ruff check . --exclude ".git,node_modules,frontend/node_modules,.venv,venv,__pycache__"
- test: cd backend && /usr/local/Cellar/pyenv/versions/3.12.12/bin/python -m pytest --tb=short -q
- deadcode: cd frontend && npx knip
- shell: find . -name "*.sh" -not -path "./.git/*" -not -path "*/node_modules/*" -exec shellcheck {} \;

## Workflow
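Default tier: **M**. The product is in an iteration phase, mostly medium-sized feature work and bug fixes.
Urgent bugs (P0/P1) use tier S: `/investigate` → fix → `/review` → `/ship`
Large new features (e.g. integrating a new alert source) use tier L: `/autoplan` → develop → `/qa` → `/cso` → `/ship`
Use `/workflow [task description]` for specific routing advice.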
1 change: 1 addition & 0 deletions VERSION
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
0.1.0.0
7 changes: 7 additions & 0 deletions agent/install-agent-centos7.sh
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,13 @@ YUM_BACKUP_DIR="/etc/yum.repos.d.bak.vigilops-$(date +%Y%m%d%H%M%S)"
cp -r /etc/yum.repos.d "$YUM_BACKUP_DIR"
log_info "已备份到: $YUM_BACKUP_DIR"

# Function to restore the yum repos (must be defined before step 2 so error handling can use it)
restore_yum() {
    rm -f /etc/yum.repos.d/*.repo
    cp -r "$YUM_BACKUP_DIR"/* /etc/yum.repos.d/
    log_info "yum 源已恢复"
}

echo ""

################################################################################
Expand Down
2 changes: 0 additions & 2 deletions backend/alembic/versions/023_add_menu_settings_table.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,6 @@
Create Date: 2026-03-20
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB

revision = "023_add_menu_settings_table"
down_revision = "022_add_ops_tables"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
Create Date: 2026-03-20
"""
from alembic import op
import sqlalchemy as sa

revision = "024_add_ai_operation_logs_table"
down_revision = "023_add_menu_settings_table"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
Expand Down
2 changes: 0 additions & 2 deletions backend/alembic/versions/a9f923ad91b0_merge_multiple_heads.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,6 @@
"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
Expand Down
2 changes: 1 addition & 1 deletion backend/app/api/v1/alert_deduplication.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
Provides configuration management and statistics APIs for alert deduplication and aggregation.
"""
from datetime import datetime, timedelta
from typing import Dict, List
from typing import List

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel, Field
Expand Down
2 changes: 1 addition & 1 deletion backend/app/api/v1/data_retention.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
- View data statistics and cleanup history
"""
from datetime import datetime, timedelta
from typing import Dict, Any
from typing import Dict

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel, Field
Expand Down
5 changes: 4 additions & 1 deletion backend/app/core/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@
management for database connections, Redis cache, AI services, JWT authentication, and other modules.
"""
import logging
import os
import secrets

from pydantic_settings import BaseSettings
Expand Down Expand Up @@ -143,6 +142,10 @@ class Settings(BaseSettings):
enable_remediation: bool = True # False = 仅诊断模式,不执行修复 (False = diagnosis-only demo mode)
demo_sse_max_clients: int = 50 # SSE 最大并发连接数 (Max concurrent SSE connections for demo)

# Autopilot Demo 配置 (Autopilot Demo Configuration)
demo_mode: bool = False # 启用 Autopilot Demo 模式 (Enable Autopilot Demo mode)
demo_fault_delay_seconds: int = 60 # 故障注入延迟秒数 (Fault injection delay in seconds)

# 环境变量别名(Environment Variable Aliases)
# Pydantic Settings requires the environment variable names to be specified explicitly
WEBHOOK_ALLOWED_DOMAINS: str = ""
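
As a rough illustration of how the two new settings could be driven from the environment, here is a standard-library approximation. It is a sketch, not the project's `Settings` class and not how `pydantic_settings` is implemented. The field names and defaults come from the hunk above and `DEMO_MODE` appears in the changelog; the `DEMO_FAULT_DELAY_SECONDS` variable name, the `DemoSettings` class, and the accepted boolean spellings are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed set of accepted boolean spellings (case-insensitive).
_TRUE = {"1", "true", "t", "yes", "y", "on"}
_FALSE = {"0", "false", "f", "no", "n", "off"}


def parse_bool(raw: str) -> bool:
    """Convert an environment string such as "true" or "0" to a bool."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical stand-in for the demo-related fields of Settings."""

    demo_mode: bool = False
    demo_fault_delay_seconds: int = 60

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DemoSettings":
        # Missing variables fall back to the defaults shown in the diff.
        return cls(
            demo_mode=parse_bool(env.get("DEMO_MODE", "false")),
            demo_fault_delay_seconds=int(env.get("DEMO_FAULT_DELAY_SECONDS", "60")),
        )
```

For example, `DemoSettings.from_env({"DEMO_MODE": "true"})` enables demo mode and keeps the 60-second fault delay.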
Expand Down
1 change: 0 additions & 1 deletion backend/app/core/log_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
- loki: Grafana Loki time-series log storage
"""

import asyncio
import logging
from abc import ABC, abstractmethod
from datetime import datetime
Expand Down
3 changes: 1 addition & 2 deletions backend/app/core/rate_limiting.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,9 @@
- 普通级别:常规业务 API (Normal: regular business API)
- 宽松级别:静态资源、健康检查 (Relaxed: static resources, health check)
"""
import hashlib
import json
import time
from typing import Dict, Optional, List, Tuple
from typing import Dict, Optional, Tuple

from fastapi import Request, HTTPException, status
from starlette.middleware.base import BaseHTTPMiddleware
Expand Down
2 changes: 1 addition & 1 deletion backend/app/core/security_middleware.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
"""
import os
import re
from typing import Dict, List, Optional
from typing import Dict, Optional

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
Expand Down
11 changes: 10 additions & 1 deletion backend/app/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,12 @@ async def lifespan(app: FastAPI):
from app.tasks.remediation_listener import remediation_listener_loop
task_factories["remediation_listener"] = lambda: remediation_listener_loop()

# Autopilot Demo flow (started only when DEMO_MODE=true)
if app_settings.demo_mode:
from app.services.demo_orchestrator import run_demo_flow
task_factories["demo_autopilot"] = lambda: run_demo_flow()
logger.info("Demo mode enabled — Autopilot Demo will start automatically")

# Start all tasks
for name, factory in task_factories.items():
background_tasks[name] = asyncio.create_task(factory(), name=name)
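
The registration pattern in this hunk (a dict of task factories, with the demo task added only when demo mode is on) can be sketched in isolation as follows. The two coroutines are trivial stand-ins that return immediately, and `start_background_tasks` is a hypothetical helper; the real `lifespan` keeps long-running tasks alive instead of gathering their results.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict


async def remediation_listener_loop() -> str:
    # Stand-in for a long-running background loop.
    await asyncio.sleep(0)
    return "remediation"


async def run_demo_flow(fault_delay_seconds: float = 0.0) -> str:
    # Stand-in for the demo orchestrator: wait, then "inject" the fault.
    await asyncio.sleep(fault_delay_seconds)
    return "demo"


async def start_background_tasks(demo_mode: bool) -> Dict[str, str]:
    # Each factory is a zero-argument callable returning a fresh coroutine.
    task_factories: Dict[str, Callable[[], Coroutine[Any, Any, str]]] = {
        "remediation_listener": lambda: remediation_listener_loop(),
    }
    if demo_mode:
        task_factories["demo_autopilot"] = lambda: run_demo_flow()

    # Start every registered task, named after its dict key.
    tasks = {
        name: asyncio.create_task(factory(), name=name)
        for name, factory in task_factories.items()
    }
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))
```

Calling `asyncio.run(start_background_tasks(demo_mode=True))` runs both stand-ins, while `demo_mode=False` starts only the listener.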
Expand Down Expand Up @@ -252,7 +258,6 @@ async def _monitor_tasks():
"https://demo.lchuangnet.com",
"https://lchuangnet.com",
"https://www.lchuangnet.com",
"http://139.196.210.68:3001",
]
if _frontend_url and _frontend_url not in allowed_origins:
allowed_origins.append(_frontend_url)
Expand Down Expand Up @@ -311,6 +316,10 @@ async def _monitor_tasks():
app.include_router(webhooks.router) # 外部告警源 Webhook (External Alert Source Webhooks)
app.include_router(alert_stream.router) # 告警诊断 SSE 流 (Alert Diagnosis SSE Stream)

# Demo routes (always registered; DEMO_MODE is checked internally)
from app.routers import demo
app.include_router(demo.router) # Autopilot Demo 状态 (Autopilot Demo Status)


@app.get("/health")
@app.get("/api/v1/health")
Expand Down
1 change: 0 additions & 1 deletion backend/app/mcp/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@
"""
import argparse
import logging
import os
import sys
from pathlib import Path

Expand Down
4 changes: 1 addition & 3 deletions backend/app/mcp/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,9 @@
import uuid
from contextlib import contextmanager
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Any, Union
from typing import Dict, Optional, Any

from fastmcp import FastMCP
from pydantic import BaseModel
from sqlalchemy.orm import joinedload

from app.core.config import settings
from app.core.database import SessionLocal
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/alert.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
from datetime import datetime, time
from typing import Optional

from sqlalchemy import String, Integer, Float, DateTime, Boolean, Text, JSON, Time, func, ARRAY
from sqlalchemy import String, Integer, Float, DateTime, Boolean, Text, JSON, Time, func
from sqlalchemy.orm import Mapped, mapped_column

from app.core.database import Base
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/alert_group.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
from typing import Optional

from sqlalchemy import String, Integer, DateTime, Text, JSON, Boolean, func
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy import ForeignKey

from app.core.database import Base
Expand Down
1 change: 0 additions & 1 deletion backend/app/models/custom_runbook.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@
Allows users to create custom remediation scripts, extending built-in Runbook capabilities.
"""
from datetime import datetime
from typing import Optional

from sqlalchemy import String, Integer, Boolean, DateTime, Text, JSON, func
from sqlalchemy.orm import Mapped, mapped_column
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/dashboard_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
- Persistent storage of layout configuration
- Preset layout templates
"""
from sqlalchemy import Column, String, Integer, Boolean, Text, DateTime, ForeignKey, JSON
from sqlalchemy import Column, String, Integer, Boolean, DateTime, ForeignKey, JSON
from sqlalchemy.orm import relationship
from sqlalchemy.sql import func
from app.core.database import Base
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/notification_template.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
from datetime import datetime
from typing import Optional

from sqlalchemy import String, Integer, DateTime, Boolean, Text, func
from sqlalchemy import String, DateTime, Boolean, Text, func
from sqlalchemy.orm import Mapped, mapped_column

from app.core.database import Base
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/ops_message.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""运维消息模型 (Ops Message Model)"""
import uuid
from datetime import datetime
from sqlalchemy import String, DateTime, ForeignKey, func, Text
from sqlalchemy import String, DateTime, ForeignKey, func
from sqlalchemy.dialects.postgresql import UUID, JSONB
from sqlalchemy.orm import Mapped, mapped_column
from app.core.database import Base
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/ops_session.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""运维会话模型 (Ops Session Model)"""
import uuid
from datetime import datetime
from sqlalchemy import String, DateTime, Integer, func, Text
from sqlalchemy import String, DateTime, Integer, func
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column
from app.core.database import Base
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/suppression_rule.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
"""
from datetime import datetime

from sqlalchemy import Integer, String, Text, DateTime, Boolean, JSON, ForeignKey, func
from sqlalchemy import Integer, String, Text, DateTime, Boolean, ForeignKey, func
from sqlalchemy.orm import Mapped, mapped_column

from app.core.database import Base
Expand Down
2 changes: 1 addition & 1 deletion backend/app/models/user.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
role permissions, and other fields. Provides user authentication,
permission management, and account management functions for the system.
"""
from datetime import datetime, timezone
from datetime import datetime

from sqlalchemy import String, Boolean, DateTime, func
from sqlalchemy.orm import Mapped, mapped_column, relationship
Expand Down
2 changes: 1 addition & 1 deletion backend/app/remediation/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@
from datetime import datetime, timezone

UTC = timezone.utc
from typing import Any, Dict, List, Optional
from typing import Any, Optional

from pydantic import BaseModel, Field

Expand Down
4 changes: 0 additions & 4 deletions backend/app/routers/agent.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,9 @@

Author: NightMend Team
"""
import os
from datetime import datetime, timezone
from pathlib import Path

from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import FileResponse
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

Expand Down Expand Up @@ -768,7 +765,6 @@ async def _check_log_keyword_alerts(logs: list, db: AsyncSession):
# =============================================================================

from fastapi import WebSocket, WebSocketDisconnect
from datetime import datetime
import asyncio
import hashlib
import hmac
Expand Down