Skip to content

feat: diff tree from lockfile and new dependency added, example poc#391

Open
lirantal wants to merge 1 commit intomainfrom
feat/transitive-deps-check
Open

feat: diff tree from lockfile and new dependency added, example poc#391
lirantal wants to merge 1 commit intomainfrom
feat/transitive-deps-check

Conversation

@lirantal
Copy link
Owner

@lirantal lirantal commented Nov 29, 2025

User description

IDEATION Experimental branch for diff'ing dependency changes when adding a new dependency to a project with a lockfile or node_modules

Description

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Related Issue

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Checklist:

  • I have updated the documentation (if required).
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

Enhancement


Description

  • Add dependency tree analysis using npm Arborist library

  • Implement diff calculation between actual and ideal dependency trees

  • Extract and report newly added transitive dependencies

  • Support lockfile-based dependency resolution without node_modules


Diagram Walkthrough

flowchart LR
  A["Project Path"] --> B["Load Actual Tree"]
  A --> C["Load Virtual Tree"]
  B --> D["Build Ideal Tree<br/>with New Dependency"]
  C --> D
  D --> E["Calculate Diff<br/>actual vs ideal"]
  E --> F["Extract Added<br/>Dependencies"]
  F --> G["Output JSON<br/>Added Packages"]
Loading

File Walkthrough

Relevant files
Enhancement
deps-analyzer.js
Dependency tree diff analyzer with Arborist                           

lib/deps-analyzer.js

  • New module for analyzing dependency tree changes using Arborist
  • Loads actual tree from node_modules and virtual tree from lockfile
  • Builds ideal tree with new dependency and calculates diff
  • Walks diff tree to extract all packages with ADD action
  • Outputs added dependencies with name, version, location and integrity
    info
+96/-0   
Dependencies
package.json
Add Arborist dependency                                                                   

package.json

  • Add @npmcli/arborist v9.1.8 as new dependency
+1/-0     

@qodo-free-for-open-source-projects
Copy link
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🔴
Hardcoded dependency injection

Description: Hardcoded dependency name 'ai' in production code could lead to unintended package
installation, potentially introducing malicious or vulnerable dependencies into the
project.
deps-analyzer.js [28-28]

Referred Code
args.add = "ai";
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Generic variable naming: Variables arb, err, and adds use abbreviated or generic names that don't clearly
express their purpose without additional context.

Referred Code
const arb = new Arborist({ path: projectPath });

// Load the actual tree if node_modules exists (this returns the root node)
// If you only have a package-lock.json and no node_modules, loadVirtual() is the method that reads lockfile/yarn.lock
let actual;
try {
  actual = await arb.loadActual();
} catch (err) {
  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
  actual = arb.actualTree || undefined;
}

// If you have a lockfile and want to build from it, ensure the virtual tree is loaded first.
// (If you already have node_modules, buildIdealTree will still consult lockfiles.)
try {
  await arb.loadVirtual();
} catch (err) {
  // loadVirtual will throw if there's no package-lock.json/package.json; ignore if not using it.
}

// Build the ideal tree with the new dependency. `add` takes an array of specifiers (e.g. 'foo@1.2.3').


 ... (clipped 12 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Silent error handling: Errors from loadActual() and loadVirtual() are caught but silently ignored without logging
or providing context about what failed.

Referred Code
try {
  actual = await arb.loadActual();
} catch (err) {
  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
  actual = arb.actualTree || undefined;
}

// If you have a lockfile and want to build from it, ensure the virtual tree is loaded first.
// (If you already have node_modules, buildIdealTree will still consult lockfiles.)
try {
  await arb.loadVirtual();
} catch (err) {
  // loadVirtual will throw if there's no package-lock.json/package.json; ignore if not using it.
}

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Error exposure risk: The catch block at line 93 logs the entire error object to console, which may expose
internal system details or stack traces to end users.

Referred Code
main().catch(err => {
  console.error(err);
  process.exit(2);
});

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing input validation: The args.add value is used directly without validation or sanitization before being passed
to buildIdealTree(), which could lead to injection or unexpected behavior.

Referred Code
  const args = {}
  args.path = process.cwd();
  args.add = "ai";

const projectPath = path.resolve(args.path);
if (!fs.existsSync(projectPath)) {
  console.error('Project path does not exist:', projectPath);
  process.exit(1);
}

// instantiate Arborist pointing at the project. Add registry/auth options here if needed.
const arb = new Arborist({ path: projectPath });

// Load the actual tree if node_modules exists (this returns the root node)
// If you only have a package-lock.json and no node_modules, loadVirtual() is the method that reads lockfile/yarn.lock
let actual;
try {
  actual = await arb.loadActual();
} catch (err) {
  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
  actual = arb.actualTree || undefined;


 ... (clipped 14 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Unvalidated input

Description: The script hardcodes args.add = "ai" and defaults args.path = process.cwd() without
validation or sanitization, enabling unintended package installation analysis on arbitrary
working directories and potentially causing the tool to resolve untrusted specs or paths
if later integrated with install flows.
deps-analyzer.js [26-35]

Referred Code
  const args = {}
  args.path = process.cwd();
  args.add = "ai";

const projectPath = path.resolve(args.path);
if (!fs.existsSync(projectPath)) {
  console.error('Project path does not exist:', projectPath);
  process.exit(1);
}
Sensitive info exposure

Description: Unstructured error logging with console.error(err) may leak sensitive tokens or
registry/auth details present in Arborist error objects to stdout/stderr.
deps-analyzer.js [90-95]

Referred Code
  console.log(JSON.stringify({ added: adds }, null, 2));
}

main().catch(err => {
  console.error(err);
  process.exit(2);
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing input validation: The script uses args.path and args.add without validating or sanitizing external input
(the parseArgs path is commented out), and proceeds to build an ideal tree and output
dependency metadata without validation.

Referred Code
  const args = {}
  args.path = process.cwd();
  args.add = "ai";

const projectPath = path.resolve(args.path);
if (!fs.existsSync(projectPath)) {
  console.error('Project path does not exist:', projectPath);
  process.exit(1);
}

// instantiate Arborist pointing at the project. Add registry/auth options here if needed.
const arb = new Arborist({ path: projectPath });

// Load the actual tree if node_modules exists (this returns the root node)
// If you only have a package-lock.json and no node_modules, loadVirtual() is the method that reads lockfile/yarn.lock
let actual;
try {
  actual = await arb.loadActual();
} catch (err) {
  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
  actual = arb.actualTree || undefined;


 ... (clipped 17 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: The new script performs dependency analysis and outputs results but does not implement
structured audit logging of critical actions or outcomes beyond console output, making
auditability unclear.

Referred Code
// instantiate Arborist pointing at the project. Add registry/auth options here if needed.
const arb = new Arborist({ path: projectPath });

// Load the actual tree if node_modules exists (this returns the root node)
// If you only have a package-lock.json and no node_modules, loadVirtual() is the method that reads lockfile/yarn.lock
let actual;
try {
  actual = await arb.loadActual();
} catch (err) {
  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
  actual = arb.actualTree || undefined;
}

// If you have a lockfile and want to build from it, ensure the virtual tree is loaded first.
// (If you already have node_modules, buildIdealTree will still consult lockfiles.)
try {
  await arb.loadVirtual();
} catch (err) {
  // loadVirtual will throw if there's no package-lock.json/package.json; ignore if not using it.
}



 ... (clipped 35 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Generic errors: Errors are printed with console.error and the process exits without contextual details
(e.g., which operation failed or inputs used), and edge cases like missing lockfile/actual
tree are partially handled but without structured context.

Referred Code
if (!fs.existsSync(projectPath)) {
  console.error('Project path does not exist:', projectPath);
  process.exit(1);
}

// instantiate Arborist pointing at the project. Add registry/auth options here if needed.
const arb = new Arborist({ path: projectPath });

// Load the actual tree if node_modules exists (this returns the root node)
// If you only have a package-lock.json and no node_modules, loadVirtual() is the method that reads lockfile/yarn.lock
let actual;
try {
  actual = await arb.loadActual();
} catch (err) {
  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
  actual = arb.actualTree || undefined;
}

// If you have a lockfile and want to build from it, ensure the virtual tree is loaded first.
// (If you already have node_modules, buildIdealTree will still consult lockfiles.)
try {


 ... (clipped 45 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Unstructured logging: The code logs operational messages and outputs using console and raw JSON without a
structured logging framework or safeguards to prevent sensitive data exposure in errors.

Referred Code
if (!fs.existsSync(projectPath)) {
  console.error('Project path does not exist:', projectPath);
  process.exit(1);
}

// instantiate Arborist pointing at the project. Add registry/auth options here if needed.
const arb = new Arborist({ path: projectPath });

// Load the actual tree if node_modules exists (this returns the root node)
// If you only have a package-lock.json and no node_modules, loadVirtual() is the method that reads lockfile/yarn.lock
let actual;
try {
  actual = await arb.loadActual();
} catch (err) {
  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
  actual = arb.actualTree || undefined;
}

// If you have a lockfile and want to build from it, ensure the virtual tree is loaded first.
// (If you already have node_modules, buildIdealTree will still consult lockfiles.)
try {


 ... (clipped 44 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Refactor the POC into a reusable library module

The current proof-of-concept script should be refactored into a reusable library
module. This involves creating a function with a proper API that accepts
parameters and returns structured data, rather than using hardcoded values and
logging to the console.

Examples:

lib/deps-analyzer.js [23-96]
async function main() {
//   const args = parseArgs();

    const args = {}
    args.path = process.cwd();
    args.add = "ai";

  const projectPath = path.resolve(args.path);
  if (!fs.existsSync(projectPath)) {
    console.error('Project path does not exist:', projectPath);

 ... (clipped 64 lines)

Solution Walkthrough:

Before:

// lib/deps-analyzer.js

async function main() {
  // Hardcoded arguments
  const args = {}
  args.path = process.cwd();
  args.add = "ai";

  // ... Arborist logic to calculate diff ...

  const adds = [];
  // walk diff tree and populate `adds` array

  // Log result to console and exit
  console.log(JSON.stringify({ added: adds }, null, 2));
}

main().catch(err => {
  console.error(err);
  process.exit(2);
});

After:

// lib/deps-analyzer.js
'use strict';
const Arborist = require('@npmcli/arborist');
// ... other imports

async function getDependencyChanges(options) {
  const { projectPath, add: addSpec } = options;

  const arb = new Arborist({ path: projectPath });
  // ... Arborist logic to calculate diff using options ...
  await arb.buildIdealTree({ add: addSpec });
  // ...

  const adds = [];
  // walk diff tree and populate `adds` array

  return { added: adds };
}

module.exports = { getDependencyChanges };
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies the submitted code as a non-reusable proof-of-concept script and proposes a critical architectural refactoring to make it an integrable library module, which is essential for its utility.

High
Possible issue
Avoid incorrect dependency diff calculation

Remove the fallback to arb.virtualTree when defining the ideal tree. Instead,
rely only on arb.idealTree and add a check to ensure it was generated, throwing
an error if it's missing to prevent incorrect diff calculations.

lib/deps-analyzer.js [62-63]

 // After buildIdealTree, the ideal tree should be available on arb.idealTree
-const ideal = arb.idealTree || arb.virtualTree;
+const ideal = arb.idealTree;
 
+if (!ideal) {
+  throw new Error('Ideal tree was not generated by buildIdealTree.');
+}
+
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a logic bug where falling back to arb.virtualTree would produce an incorrect dependency diff, defeating the script's purpose. The fix ensures the script fails explicitly instead of silently generating wrong output.

Medium
Improve error handling for dependency loading

Refine the error handling for arb.loadActual() to only ignore errors for a
missing node_modules directory (ENOENT). Re-throw all other errors to prevent
the script from producing incorrect results on unexpected failures.

lib/deps-analyzer.js [41-47]

 let actual;
 try {
   actual = await arb.loadActual();
 } catch (err) {
-  // loadActual may fail if there's no node_modules; that's okay, we'll still have an "actual" value undefined/null
+  // loadActual may fail if there's no node_modules.
+  // We should only ignore the error if it's a missing directory error.
+  if (err.code !== 'ENOENT') {
+    throw err;
+  }
+  // It's okay if node_modules doesn't exist, we'll treat it as an empty tree.
   actual = arb.actualTree || undefined;
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that swallowing all errors from arb.loadActual() can lead to silent failures and incorrect results. The proposed fix to only ignore ENOENT errors significantly improves the script's robustness.

Medium
  • More

@qodo-free-for-open-source-projects
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Refactor PoC script into a module

The current proof-of-concept script should be refactored into a reusable module
with a clear API. This involves removing hardcoded values and direct console
outputs to make it testable and integrable.

Examples:

lib/deps-analyzer.js [23-96]
async function main() {
//   const args = parseArgs();

    const args = {}
    args.path = process.cwd();
    args.add = "ai";

  const projectPath = path.resolve(args.path);
  if (!fs.existsSync(projectPath)) {
    console.error('Project path does not exist:', projectPath);

 ... (clipped 64 lines)

Solution Walkthrough:

Before:

// lib/deps-analyzer.js

async function main() {
  // Hardcoded arguments
  const args = {}
  args.path = process.cwd();
  args.add = "ai";

  // ... logic to load trees and calculate diff ...
  const arb = new Arborist({ path: projectPath });
  await arb.buildIdealTree({ add: [args.add] });
  // ...

  // Direct output to console
  console.log(JSON.stringify({ added: adds }, null, 2));
}

main().catch(err => {
  console.error(err);
  process.exit(2);
});

After:

// lib/deps-analyzer.js

export async function getDependencyChanges(projectPath, packagesToAdd) {
  const arb = new Arborist({ path: projectPath });

  // ... logic to load trees and calculate diff ...
  await arb.buildIdealTree({ add: packagesToAdd });
  // ...

  const adds = [];
  // ... walk diff and collect added dependencies

  // Return structured data
  return { added: adds };
}

// CLI wrapper can be in a separate file
// e.g. bin/cli.js
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies the code as a non-integrable proof-of-concept and proposes a critical refactoring into a reusable module, which is essential for maintainability and future integration.

High
Possible issue
Ensure correct dependency tree is used

To prevent incorrect dependency diffs, remove the fallback to arb.virtualTree
and instead throw an error if arb.idealTree is not available after being built.

lib/deps-analyzer.js [62-63]

 // After buildIdealTree, the ideal tree should be available on arb.idealTree
-const ideal = arb.idealTree || arb.virtualTree;
+const ideal = arb.idealTree;
+if (!ideal) {
+  throw new Error('Failed to build the ideal tree.');
+}
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical logical bug where falling back to arb.virtualTree would lead to incorrect dependency analysis, and proposes a robust fix by ensuring the correct tree is used or an error is thrown.

High
General
Prevent potential stack overflow errors

Convert the recursive walk function to an iterative approach using a stack to
prevent potential "Maximum call stack size exceeded" errors on deep dependency
trees.

lib/deps-analyzer.js [70-87]

-function walk(diffNode) {
-  // Diff nodes have an `action` property (ADD / REMOVE / CHANGE / null)
-  if (diffNode.action === 'ADD') {
-    const idealNode = diffNode.ideal;
-    if (idealNode && idealNode.package) {
-      adds.push({
-        name: idealNode.package.name,
-        version: idealNode.package.version,
-        location: idealNode.location,
-        resolved: idealNode.resolved || null,
-        integrity: idealNode.integrity || null,
-      });
+function walk(rootNode) {
+  if (!rootNode) {
+    return;
+  }
+  const stack = [rootNode];
+  while (stack.length > 0) {
+    const diffNode = stack.pop();
+    // Diff nodes have an `action` property (ADD / REMOVE / CHANGE / null)
+    if (diffNode.action === 'ADD') {
+      const idealNode = diffNode.ideal;
+      if (idealNode && idealNode.package) {
+        adds.push({
+          name: idealNode.package.name,
+          version: idealNode.package.version,
+          location: idealNode.location,
+          resolved: idealNode.resolved || null,
+          integrity: idealNode.integrity || null,
+        });
+      }
     }
-  }
-  if (Array.isArray(diffNode.children)) {
-    for (const child of diffNode.children) walk(child);
+    if (Array.isArray(diffNode.children)) {
+      for (const child of diffNode.children) {
+        stack.push(child);
+      }
+    }
   }
 }
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies a potential stack overflow issue with the recursive walk function and provides a standard iterative solution, improving the script's robustness for deep dependency trees.

Low
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant