langflow-ai
diff --git a/‎.agents/skills/backend-code-review/SKILL.md‎
Lines changed: 225 additions & 0 deletions b/‎.agents/skills/backend-code-review/SKILL.md‎
Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
+---
+name: backend-code-review
+description: Review backend code for quality, security, maintainability, and best practices based on established checklist rules. Use when the user requests a review, analysis, or improvement of backend files (e.g., `.py`) under the `src/backend/` directory. Do NOT use for frontend files (e.g., `.tsx`, `.ts`, `.js`). Supports pending-change review, code snippets review, and file-focused review.
+---
+
+# Backend Code Review
+
+## When to use this skill
+
+Use this skill whenever the user asks to **review, analyze, or improve** backend code (e.g., `.py`) under the `src/backend/` directory. Supports the following review modes:
+
+- **Pending-change review**: when the user asks to review current changes (inspect staged/working-tree files slated for commit to get the changes).
+- **Code snippets review**: when the user pastes code snippets (e.g., a function/class/module excerpt) into the chat and asks for a review.
+- **File-focused review**: when the user points to specific files and asks for a review of those files (one file or a small, explicit set of files, e.g., `src/backend/base/langflow/api/v1/flows.py`).
+
+Do NOT use this skill when:
+
+- The request is about frontend code or UI (e.g., `.tsx`, `.ts`, `.js`, `src/frontend/`).
+- The user is not asking for a review/analysis/improvement of backend code.
+- The scope is not under `src/backend/` (unless the user explicitly asks to review backend-related changes outside `src/backend/`).
+
+## How to use this skill
+
+Follow these steps when using this skill:
+
+1. **Identify the review mode** (pending-change vs snippet vs file-focused) based on the user's input. Keep the scope tight: review only what the user provided or explicitly referenced.
+2. Follow the rules defined in **Checklist** to perform the review. If no Checklist rule matches, apply **General Review Rules** as a fallback to perform the best-effort review.
+3. Compose the final output strictly following the **Required Output Format**.
+
+Notes when using this skill:
+- Always include actionable fixes or suggestions (including possible code snippets).
+- Use best-effort `File:Line` references when a file path and line numbers are available; otherwise, use the most specific identifier you can.
+
+## Checklist
+
+- db schema design: if the review scope includes code/files under `src/backend/base/langflow/services/database/models/` or Alembic migrations under `src/backend/base/langflow/alembic/versions/`, follow [references/db-schema-rule.md](references/db-schema-rule.md) to perform the review
+- architecture: if the review scope involves route/service/model layering, dependency direction, or moving responsibilities across modules, follow [references/architecture-rule.md](references/architecture-rule.md) to perform the review
+- service abstraction: if the review scope contains table/model operations (e.g., `select(...)`, `session.execute(...)`, joins, CRUD) and is not already inside a service under `src/backend/base/langflow/services/`, follow [references/repositories-rule.md](references/repositories-rule.md) to perform the review
+- sqlalchemy patterns: if the review scope involves SQLAlchemy/SQLModel session/query usage, db transaction/crud usage, `session_scope()` usage, or raw SQL usage, follow [references/sqlalchemy-rule.md](references/sqlalchemy-rule.md) to perform the review
+
+## General Review Rules
+
+### 1. Security Review
+
+Check for:
+- SQL injection vulnerabilities (especially raw `text()` queries with string interpolation). Consequence: attacker can read/modify/delete any data in the database.
+- Server-Side Request Forgery (SSRF) in component HTTP calls. Consequence: attacker uses the server to scan internal networks or access cloud metadata endpoints.
+- Command injection (especially in subprocess or shell-executing components). Consequence: attacker gains shell access to the server.
+- Insecure deserialization (pickle, yaml.load without SafeLoader). Consequence: arbitrary code execution on the server.
+- Hardcoded secrets/credentials. Consequence: secrets leak via git history and are impossible to fully revoke.
+- Improper authentication/authorization (missing `CurrentActiveUser` dependency). Consequence: unauthenticated users can access protected endpoints.
+- Insecure direct object references (missing `user_id` scoping on queries). Consequence: user A can read/modify user B's flows, variables, API keys.
+- Path traversal in file storage operations. Consequence: attacker reads arbitrary server files (e.g., `/etc/passwd`, `.env`).
+
+### 2. Performance Review
+
+Check for:
+- N+1 queries (especially in loops calling `session.execute()`). Consequence: 100 flows = 101 DB queries instead of 2; page load goes from 50ms to 5s.
+- Missing database indexes on frequently queried columns. Consequence: full table scans on large datasets; queries degrade from O(log n) to O(n).
+- Memory leaks (unbounded caches, retained references in long-lived services). Consequence: server OOM after hours of operation; pods restart in production.
+- Blocking operations in async code (`time.sleep()`, synchronous I/O, CPU-bound work without `run_in_executor`). Consequence: entire event loop stalls; all concurrent requests hang until the blocking call completes.
+- Missing caching opportunities for expensive computations. Consequence: repeated computation of the same result on every request.
+- Large result sets loaded entirely into memory without pagination. Consequence: memory spike + slow response when user has 10K+ flows.
+
+### 3. Code Quality Review
+
+Check for:
+- Code forward compatibility with Python 3.10-3.13
+- Code duplication (DRY violations — extract when the *exact same business rule* is duplicated in 3+ places)
+- Functions doing too much (SRP violations — if you need "and" to describe it, split it)
+- Deep nesting / complex conditionals (prefer early returns and guard clauses)
+- Magic numbers/strings (extract to named constants or enums)
+- Poor naming: unclear abbreviations, misleading names, generic names (`data`, `result`, `obj`, `temp`). Functions should use verbs (`get`, `create`, `validate`). Booleans should use prefixes (`is_`, `has_`, `can_`, `should_`).
+- Missing error handling (bare `except`, swallowed exceptions, silent failures)
+- Incomplete type coverage (use strong typing, avoid `Any` where a concrete type is known)
+- Use Python 3.10+ union syntax (`X | Y` not `Union[X, Y]`, `X | None` not `Optional[X]`)
+- Use `TYPE_CHECKING` guard for imports only needed for type annotations (prevents circular imports)
+- Use `Annotated[Type, Depends(...)]` with project aliases (`CurrentActiveUser`, `DbSession`, `DbSessionReadOnly`) for FastAPI DI
+- Google-style docstrings (enforced by Ruff): `Args:`, `Returns:`, `Raises:` sections for public functions
+- Violations of SOLID principles
+- YAGNI violations (code that anticipates future needs without a present requirement)
+- Line length exceeding 120 characters (project Ruff config)
+- Comments that explain WHAT instead of WHY (comments should only explain reasoning, not restate code)
+- Commented-out code (use version control instead)
+- Boolean parameters that switch function behavior (split into two named functions instead)
+- Mutable shared state where immutable alternatives exist (prefer returning new objects over mutation)
+
+### 4. File Structure Review
+
+Check for:
+- Production files exceeding ~500 lines of code (excluding imports, types, and docstrings). Files above 600 lines are a red flag and should be split by responsibility. Why: Files above 500 lines have statistically higher defect rates and take longer to review. They signal multiple responsibilities (SRP violation). In Langflow, services like `DatabaseService` that grow beyond this limit should have their CRUD operations extracted to dedicated modules.
+- Test files exceeding ~1000 lines. Split by logical grouping if exceeded.
+- No more than 5 functions with different responsibilities in a single file (per AGENTS-example.md).
+- Each file has a single reason to exist and a single reason to change (SRP).
+- No generic file names: `utils.py`, `helpers.py`, `misc.py`, `common.py` as standalone files. Why: A file named `utils.py` becomes a dumping ground for unrelated functions. Within months it has 50+ functions covering formatting, validation, parsing, and HTTP calls — violating SRP. Each function group should be in a file named after its responsibility (`formatting.py`, `validation.py`).
+
+### 5. Testing Review
+
+Check for:
+- Missing test coverage for new code paths
+- Tests that don't test behavior (testing implementation details)
+- Flaky test patterns (time-dependent, order-dependent, external-service-dependent)
+- Proper use of `pytest.mark.asyncio` for async tests
+- Excessive mocking (prefer real integrations per project conventions)
+- Coverage target: 80% (minimum acceptable: 75%)
+- Test anti-patterns: The Liar (passes but doesn't verify claimed behavior), The Mirror (asserts exactly what code does), The Giant (50+ lines setup), The Mockery (tests only mock setup), The Inspector (coupled to implementation), The Chain Gang (depends on execution order), The Flaky (inconsistent results)
+
+**Happy path tests are the foundation but are NOT enough.** Tests MUST also challenge the code to find real defects:
+
+- **Unexpected inputs**: `None`, `""`, `[]`, `{}`, `0`, `-1`, `UUID("00000000-0000-0000-0000-000000000000")`
+- **Boundary values**: max length strings, exactly at the limit, one past the limit, zero items, max items
+- **Malformed data**: missing required fields, extra unexpected fields, wrong types, invalid formats
+- **Error states**: what happens when the database is down? When an external API returns 500? When the user doesn't exist?
+- **What should NOT happen**: verify that user A CANNOT access user B's flows. Verify that a deleted flow returns 404. Verify that invalid `endpoint_name` is rejected with 422.
+- **Error messages and types**: not just that it fails, but that it fails with the RIGHT exception and the RIGHT message
+- **Concurrency**: what happens when two requests try to update the same flow simultaneously?
+
+**Write tests based on REQUIREMENTS/SPEC, not on what the source code currently does.** This is how you catch bugs where the code diverges from expected behavior.
+
+**When a test fails:** first ask if the CODE is wrong, not the test. Do NOT silently change a failing assertion to match the current code without understanding WHY.
+
+### 6. Observability Review
+
+Check for:
+- Use the async logger from `lfx.log.logger` with `a`-prefixed methods (`adebug`, `ainfo`, `awarning`, `aerror`, `aexception`). Never use `print()` or stdlib `logging`.
+- Log at key decision points and boundaries, not inside tight loops
+- Include: operation name, relevant IDs, outcome (success/failure), duration if relevant
+- Correct log levels: ERROR (broken, needs attention), WARN (degraded but recoverable), INFO (significant events), DEBUG (diagnostic, off in prod)
+- **ZERO PII TOLERANCE**: Never log email addresses, user names, phone numbers, tokens, passwords. Only approved identifiers: `user_id`, `flow_id`, `session_id`
+- No `print()` statements — these go to production logs
+- Use `{e!s}` for string representation of exceptions in log messages
+
+### 7. Pre-Commit Verification
+
+For pending-change reviews, verify the author has run:
+- `make format_backend` (Ruff formatter) — inconsistent formatting creates noisy diffs that hide real changes in code review. Format first, review second.
+- `make lint` (MyPy type checking) — type errors caught at lint time are 10x cheaper to fix than runtime crashes in production. Langflow services use duck typing via `Service` base class; MyPy catches mismatches early.
+- `make unit_tests` (pytest) — a failing test means the change breaks existing behavior. Never merge with failing tests; investigate whether the code or the test is wrong.
+
+## Required Output Format
+
+When this skill is invoked, the response must exactly follow one of the two templates:
+
+### Template A (any findings)
+
+```markdown
+# Code Review Summary
+
+Found <X> critical issues need to be fixed:
+
+## 🔴 Critical (Must Fix)
+
+### 1. <brief description of the issue>
+
+FilePath: <path> line <line>
+<relevant code snippet or pointer>
+
+#### Explanation
+
+<detailed explanation and references of the issue>
+
+#### Suggested Fix
+
+1. <brief description of suggested fix>
+2. <code example> (optional, omit if not applicable)
+
+---
+... (repeat for each critical issue) ...
+
+Found <Y> suggestions for improvement:
+
+## 🟡 Suggestions (Should Consider)
+
+### 1. <brief description of the suggestion>
+
+FilePath: <path> line <line>
+<relevant code snippet or pointer>
+
+#### Explanation
+
+<detailed explanation and references of the suggestion>
+
+#### Suggested Fix
+
+1. <brief description of suggested fix>
+2. <code example> (optional, omit if not applicable)
+
+---
+... (repeat for each suggestion) ...
+
+Found <Z> optional nits:
+
+## 🟢 Nits (Optional)
+### 1. <brief description of the nit>
+
+FilePath: <path> line <line>
+<relevant code snippet or pointer>
+
+#### Explanation
+
+<explanation and references of the optional nit>
+
+#### Suggested Fix
+
+- <minor suggestions>
+
+---
+... (repeat for each nits) ...
+
+## ✅ What's Good
+
+- <Positive feedback on good patterns>
+```
+
+- If there are no critical issues or suggestions or optional nits or good points, just omit that section.
+- If the issue number is more than 10, summarize as "Found 10+ critical issues/suggestions/optional nits" and only output the first 10 items.
+- Don't compress the blank lines between sections; keep them as-is for readability.
+- If there is any issue that requires code changes, append a brief follow-up question to ask whether the user wants to apply the fix(es) after the structured output. For example: "Would you like me to use the Suggested fix(es) to address these issues?"
+
+### Template B (no issues)
+
+```markdown
+## Code Review Summary
+✅ No issues found.
+```