|
| 1 | +--- |
| 2 | +name: backend-code-review |
| 3 | +description: Review backend code for quality, security, maintainability, and best practices based on established checklist rules. Use when the user requests a review, analysis, or improvement of backend files (e.g., `.py`) under the `src/backend/` directory. Do NOT use for frontend files (e.g., `.tsx`, `.ts`, `.js`). Supports pending-change review, code snippets review, and file-focused review. |
| 4 | +--- |
| 5 | + |
| 6 | +# Backend Code Review |
| 7 | + |
| 8 | +## When to use this skill |
| 9 | + |
| 10 | +Use this skill whenever the user asks to **review, analyze, or improve** backend code (e.g., `.py`) under the `src/backend/` directory. Supports the following review modes: |
| 11 | + |
| 12 | +- **Pending-change review**: when the user asks to review current changes (inspect staged/working-tree files slated for commit to get the changes). |
| 13 | +- **Code snippets review**: when the user pastes code snippets (e.g., a function/class/module excerpt) into the chat and asks for a review. |
| 14 | +- **File-focused review**: when the user points to specific files and asks for a review of those files (one file or a small, explicit set of files, e.g., `src/backend/base/langflow/api/v1/flows.py`). |
| 15 | + |
| 16 | +Do NOT use this skill when: |
| 17 | + |
| 18 | +- The request is about frontend code or UI (e.g., `.tsx`, `.ts`, `.js`, `src/frontend/`). |
| 19 | +- The user is not asking for a review/analysis/improvement of backend code. |
| 20 | +- The scope is not under `src/backend/` (unless the user explicitly asks to review backend-related changes outside `src/backend/`). |
| 21 | + |
| 22 | +## How to use this skill |
| 23 | + |
| 24 | +Follow these steps when using this skill: |
| 25 | + |
| 26 | +1. **Identify the review mode** (pending-change vs snippet vs file-focused) based on the user's input. Keep the scope tight: review only what the user provided or explicitly referenced. |
| 27 | +2. Follow the rules defined in **Checklist** to perform the review. If no Checklist rule matches, apply **General Review Rules** as a fallback to perform the best-effort review. |
| 28 | +3. Compose the final output strictly following the **Required Output Format**. |
| 29 | + |
| 30 | +Notes when using this skill: |
| 31 | +- Always include actionable fixes or suggestions (including possible code snippets). |
| 32 | +- Use best-effort `File:Line` references when a file path and line numbers are available; otherwise, use the most specific identifier you can. |
| 33 | + |
| 34 | +## Checklist |
| 35 | + |
| 36 | +- db schema design: if the review scope includes code/files under `src/backend/base/langflow/services/database/models/` or Alembic migrations under `src/backend/base/langflow/alembic/versions/`, follow [references/db-schema-rule.md](references/db-schema-rule.md) to perform the review |
| 37 | +- architecture: if the review scope involves route/service/model layering, dependency direction, or moving responsibilities across modules, follow [references/architecture-rule.md](references/architecture-rule.md) to perform the review |
| 38 | +- service abstraction: if the review scope contains table/model operations (e.g., `select(...)`, `session.execute(...)`, joins, CRUD) and is not already inside a service under `src/backend/base/langflow/services/`, follow [references/repositories-rule.md](references/repositories-rule.md) to perform the review |
| 39 | +- sqlalchemy patterns: if the review scope involves SQLAlchemy/SQLModel session/query usage, db transaction/crud usage, `session_scope()` usage, or raw SQL usage, follow [references/sqlalchemy-rule.md](references/sqlalchemy-rule.md) to perform the review |
| 40 | + |
| 41 | +## General Review Rules |
| 42 | + |
| 43 | +### 1. Security Review |
| 44 | + |
| 45 | +Check for: |
| 46 | +- SQL injection vulnerabilities (especially raw `text()` queries with string interpolation). Consequence: attacker can read/modify/delete any data in the database. |
| 47 | +- Server-Side Request Forgery (SSRF) in component HTTP calls. Consequence: attacker uses the server to scan internal networks or access cloud metadata endpoints. |
| 48 | +- Command injection (especially in subprocess or shell-executing components). Consequence: attacker gains shell access to the server. |
| 49 | +- Insecure deserialization (pickle, yaml.load without SafeLoader). Consequence: arbitrary code execution on the server. |
| 50 | +- Hardcoded secrets/credentials. Consequence: secrets leak via git history and are impossible to fully revoke. |
| 51 | +- Improper authentication/authorization (missing `CurrentActiveUser` dependency). Consequence: unauthenticated users can access protected endpoints. |
| 52 | +- Insecure direct object references (missing `user_id` scoping on queries). Consequence: user A can read/modify user B's flows, variables, API keys. |
| 53 | +- Path traversal in file storage operations. Consequence: attacker reads arbitrary server files (e.g., `/etc/passwd`, `.env`). |
| 54 | + |
| 55 | +### 2. Performance Review |
| 56 | + |
| 57 | +Check for: |
| 58 | +- N+1 queries (especially in loops calling `session.execute()`). Consequence: 100 flows = 101 DB queries instead of 2; page load goes from 50ms to 5s. |
| 59 | +- Missing database indexes on frequently queried columns. Consequence: full table scans on large datasets; queries degrade from O(log n) to O(n). |
| 60 | +- Memory leaks (unbounded caches, retained references in long-lived services). Consequence: server OOM after hours of operation; pods restart in production. |
| 61 | +- Blocking operations in async code (`time.sleep()`, synchronous I/O, CPU-bound work without `run_in_executor`). Consequence: entire event loop stalls; all concurrent requests hang until the blocking call completes. |
| 62 | +- Missing caching opportunities for expensive computations. Consequence: repeated computation of the same result on every request. |
| 63 | +- Large result sets loaded entirely into memory without pagination. Consequence: memory spike + slow response when user has 10K+ flows. |
| 64 | + |
| 65 | +### 3. Code Quality Review |
| 66 | + |
| 67 | +Check for: |
| 68 | +- Code forward compatibility with Python 3.10-3.13 |
| 69 | +- Code duplication (DRY violations — extract when the *exact same business rule* is duplicated in 3+ places) |
| 70 | +- Functions doing too much (SRP violations — if you need "and" to describe it, split it) |
| 71 | +- Deep nesting / complex conditionals (prefer early returns and guard clauses) |
| 72 | +- Magic numbers/strings (extract to named constants or enums) |
| 73 | +- Poor naming: unclear abbreviations, misleading names, generic names (`data`, `result`, `obj`, `temp`). Functions should use verbs (`get`, `create`, `validate`). Booleans should use prefixes (`is_`, `has_`, `can_`, `should_`). |
| 74 | +- Missing error handling (bare `except`, swallowed exceptions, silent failures) |
| 75 | +- Incomplete type coverage (use strong typing, avoid `Any` where a concrete type is known) |
| 76 | +- Use Python 3.10+ union syntax (`X | Y` not `Union[X, Y]`, `X | None` not `Optional[X]`) |
| 77 | +- Use `TYPE_CHECKING` guard for imports only needed for type annotations (prevents circular imports) |
| 78 | +- Use `Annotated[Type, Depends(...)]` with project aliases (`CurrentActiveUser`, `DbSession`, `DbSessionReadOnly`) for FastAPI DI |
| 79 | +- Google-style docstrings (enforced by Ruff): `Args:`, `Returns:`, `Raises:` sections for public functions |
| 80 | +- Violations of SOLID principles |
| 81 | +- YAGNI violations (code that anticipates future needs without a present requirement) |
| 82 | +- Line length exceeding 120 characters (project Ruff config) |
| 83 | +- Comments that explain WHAT instead of WHY (comments should only explain reasoning, not restate code) |
| 84 | +- Commented-out code (use version control instead) |
| 85 | +- Boolean parameters that switch function behavior (split into two named functions instead) |
| 86 | +- Mutable shared state where immutable alternatives exist (prefer returning new objects over mutation) |
| 87 | + |
| 88 | +### 4. File Structure Review |
| 89 | + |
| 90 | +Check for: |
| 91 | +- Production files exceeding ~500 lines of code (excluding imports, types, and docstrings). Files above 600 lines are a red flag and should be split by responsibility. Why: Files above 500 lines have statistically higher defect rates and take longer to review. They signal multiple responsibilities (SRP violation). In Langflow, services like `DatabaseService` that grow beyond this limit should have their CRUD operations extracted to dedicated modules. |
| 92 | +- Test files exceeding ~1000 lines. Split by logical grouping if exceeded. |
| 93 | +- No more than 5 functions with different responsibilities in a single file (per AGENTS-example.md). |
| 94 | +- Each file has a single reason to exist and a single reason to change (SRP). |
| 95 | +- No generic file names: `utils.py`, `helpers.py`, `misc.py`, `common.py` as standalone files. Why: A file named `utils.py` becomes a dumping ground for unrelated functions. Within months it has 50+ functions covering formatting, validation, parsing, and HTTP calls — violating SRP. Each function group should be in a file named after its responsibility (`formatting.py`, `validation.py`). |
| 96 | + |
| 97 | +### 5. Testing Review |
| 98 | + |
| 99 | +Check for: |
| 100 | +- Missing test coverage for new code paths |
| 101 | +- Tests that don't test behavior (testing implementation details) |
| 102 | +- Flaky test patterns (time-dependent, order-dependent, external-service-dependent) |
| 103 | +- Proper use of `pytest.mark.asyncio` for async tests |
| 104 | +- Excessive mocking (prefer real integrations per project conventions) |
| 105 | +- Coverage target: 80% (minimum acceptable: 75%) |
| 106 | +- Test anti-patterns: The Liar (passes but doesn't verify claimed behavior), The Mirror (asserts exactly what code does), The Giant (50+ lines setup), The Mockery (tests only mock setup), The Inspector (coupled to implementation), The Chain Gang (depends on execution order), The Flaky (inconsistent results) |
| 107 | + |
| 108 | +**Happy path tests are the foundation but are NOT enough.** Tests MUST also challenge the code to find real defects: |
| 109 | + |
| 110 | +- **Unexpected inputs**: `None`, `""`, `[]`, `{}`, `0`, `-1`, `UUID("00000000-0000-0000-0000-000000000000")` |
| 111 | +- **Boundary values**: max length strings, exactly at the limit, one past the limit, zero items, max items |
| 112 | +- **Malformed data**: missing required fields, extra unexpected fields, wrong types, invalid formats |
| 113 | +- **Error states**: what happens when the database is down? When an external API returns 500? When the user doesn't exist? |
| 114 | +- **What should NOT happen**: verify that user A CANNOT access user B's flows. Verify that a deleted flow returns 404. Verify that invalid `endpoint_name` is rejected with 422. |
| 115 | +- **Error messages and types**: not just that it fails, but that it fails with the RIGHT exception and the RIGHT message |
| 116 | +- **Concurrency**: what happens when two requests try to update the same flow simultaneously? |
| 117 | + |
| 118 | +**Write tests based on REQUIREMENTS/SPEC, not on what the source code currently does.** This is how you catch bugs where the code diverges from expected behavior. |
| 119 | + |
| 120 | +**When a test fails:** first ask if the CODE is wrong, not the test. Do NOT silently change a failing assertion to match the current code without understanding WHY. |
| 121 | + |
| 122 | +### 6. Observability Review |
| 123 | + |
| 124 | +Check for: |
| 125 | +- Use the async logger from `lfx.log.logger` with `a`-prefixed methods (`adebug`, `ainfo`, `awarning`, `aerror`, `aexception`). Never use `print()` or stdlib `logging`. |
| 126 | +- Log at key decision points and boundaries, not inside tight loops |
| 127 | +- Include: operation name, relevant IDs, outcome (success/failure), duration if relevant |
| 128 | +- Correct log levels: ERROR (broken, needs attention), WARN (degraded but recoverable), INFO (significant events), DEBUG (diagnostic, off in prod) |
| 129 | +- **ZERO PII TOLERANCE**: Never log email addresses, user names, phone numbers, tokens, passwords. Only approved identifiers: `user_id`, `flow_id`, `session_id` |
| 130 | +- No `print()` statements — these go to production logs |
| 131 | +- Use `{e!s}` for string representation of exceptions in log messages |
| 132 | + |
| 133 | +### 7. Pre-Commit Verification |
| 134 | + |
| 135 | +For pending-change reviews, verify the author has run: |
| 136 | +- `make format_backend` (Ruff formatter) — inconsistent formatting creates noisy diffs that hide real changes in code review. Format first, review second. |
| 137 | +- `make lint` (MyPy type checking) — type errors caught at lint time are 10x cheaper to fix than runtime crashes in production. Langflow services use duck typing via `Service` base class; MyPy catches mismatches early. |
| 138 | +- `make unit_tests` (pytest) — a failing test means the change breaks existing behavior. Never merge with failing tests; investigate whether the code or the test is wrong. |
| 139 | + |
| 140 | +## Required Output Format |
| 141 | + |
| 142 | +When this skill is invoked, the response must exactly follow one of the two templates: |
| 143 | + |
| 144 | +### Template A (any findings) |
| 145 | + |
| 146 | +```markdown |
| 147 | +# Code Review Summary |
| 148 | + |
| 149 | +Found <X> critical issues need to be fixed: |
| 150 | + |
| 151 | +## 🔴 Critical (Must Fix) |
| 152 | + |
| 153 | +### 1. <brief description of the issue> |
| 154 | + |
| 155 | +FilePath: <path> line <line> |
| 156 | +<relevant code snippet or pointer> |
| 157 | + |
| 158 | +#### Explanation |
| 159 | + |
| 160 | +<detailed explanation and references of the issue> |
| 161 | + |
| 162 | +#### Suggested Fix |
| 163 | + |
| 164 | +1. <brief description of suggested fix> |
| 165 | +2. <code example> (optional, omit if not applicable) |
| 166 | + |
| 167 | +--- |
| 168 | +... (repeat for each critical issue) ... |
| 169 | + |
| 170 | +Found <Y> suggestions for improvement: |
| 171 | + |
| 172 | +## 🟡 Suggestions (Should Consider) |
| 173 | + |
| 174 | +### 1. <brief description of the suggestion> |
| 175 | + |
| 176 | +FilePath: <path> line <line> |
| 177 | +<relevant code snippet or pointer> |
| 178 | + |
| 179 | +#### Explanation |
| 180 | + |
| 181 | +<detailed explanation and references of the suggestion> |
| 182 | + |
| 183 | +#### Suggested Fix |
| 184 | + |
| 185 | +1. <brief description of suggested fix> |
| 186 | +2. <code example> (optional, omit if not applicable) |
| 187 | + |
| 188 | +--- |
| 189 | +... (repeat for each suggestion) ... |
| 190 | + |
| 191 | +Found <Z> optional nits: |
| 192 | + |
| 193 | +## 🟢 Nits (Optional) |
| 194 | +### 1. <brief description of the nit> |
| 195 | + |
| 196 | +FilePath: <path> line <line> |
| 197 | +<relevant code snippet or pointer> |
| 198 | + |
| 199 | +#### Explanation |
| 200 | + |
| 201 | +<explanation and references of the optional nit> |
| 202 | + |
| 203 | +#### Suggested Fix |
| 204 | + |
| 205 | +- <minor suggestions> |
| 206 | + |
| 207 | +--- |
| 208 | +... (repeat for each nits) ... |
| 209 | + |
| 210 | +## ✅ What's Good |
| 211 | + |
| 212 | +- <Positive feedback on good patterns> |
| 213 | +``` |
| 214 | + |
| 215 | +- If there are no critical issues or suggestions or optional nits or good points, just omit that section. |
| 216 | +- If the issue number is more than 10, summarize as "Found 10+ critical issues/suggestions/optional nits" and only output the first 10 items. |
| 217 | +- Don't compress the blank lines between sections; keep them as-is for readability. |
| 218 | +- If there is any issue that requires code changes, append a brief follow-up question to ask whether the user wants to apply the fix(es) after the structured output. For example: "Would you like me to use the Suggested fix(es) to address these issues?" |
| 219 | + |
| 220 | +### Template B (no issues) |
| 221 | + |
| 222 | +```markdown |
| 223 | +## Code Review Summary |
| 224 | +✅ No issues found. |
| 225 | +``` |
0 commit comments