
Feature/knowledge connector sharepoint #870

Open
tmccaughey wants to merge 6 commits into master from feature/knowledge-connector-sharepoint

Conversation

@tmccaughey
Contributor

Added the Knowledge Connector for SharePoint

Copilot AI review requested due to automatic review settings January 29, 2026 10:35
@graymalkin77

graymalkin77 commented Jan 29, 2026

Snyk checks have passed. No issues have been found so far.

Scanner               Critical  High  Medium  Low  Total
Open Source Security  0         0     0       0    0 issues
Licenses              0         0     0       0    0 issues


@tmccaughey tmccaughey requested a review from falak-asad January 29, 2026 10:35
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a Knowledge Connector for Microsoft SharePoint that enables extracting and indexing documents from SharePoint sites into Cognigy's knowledge base system.

Changes:

  • Added SharePoint Knowledge Connector with support for multiple file types (PDF, DOCX, TXT, CSV, JSON, JSONL, MD, PPTX)
  • Implemented file extraction, text chunking, and knowledge source creation
  • Updated dependencies to include LangChain libraries for document processing
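The extract-and-chunk step summarized above can be sketched without the LangChain dependency. The function below is an illustrative stand-in for the PR's text splitter; the function name, parameter defaults, and overlap strategy are assumptions, not the connector's actual configuration:

```typescript
// Illustrative sketch only: fixed-size chunking with overlap, standing in
// for the LangChain text splitter the PR actually uses. The defaults
// (1000 chars, 200 overlap) are assumptions, not the connector's config.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each resulting string would then become one knowledge chunk (`{ text, data }`) when handed to the knowledge source API.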

Reviewed changes

Copilot reviewed 11 out of 14 changed files in this pull request and generated 2 comments.

Summary per file:

  • package.json: Added base dependencies for extension tools and axios
  • extensions/microsoft-sharepoint/tsconfig.json: Added skipLibCheck configuration
  • extensions/microsoft-sharepoint/src/module.ts: Registered new connector and connection type
  • extensions/microsoft-sharepoint/src/knowledge-connectors/sharepoint-connector.ts: Main connector implementation with file processing logic
  • extensions/microsoft-sharepoint/src/knowledge-connectors/helpers/utils/config.ts: Configuration utilities for chunk size management
  • extensions/microsoft-sharepoint/src/knowledge-connectors/helpers/text_extractor.ts: Document text extraction with LangChain loaders
  • extensions/microsoft-sharepoint/src/knowledge-connectors/helpers/text_chunker.ts: Text splitting functionality for document chunking
  • extensions/microsoft-sharepoint/src/knowledge-connectors/helpers/list_files.ts: SharePoint file listing with filtering
  • extensions/microsoft-sharepoint/src/knowledge-connectors/helpers/chunk_extractor.ts: File download and chunk extraction orchestration
  • extensions/microsoft-sharepoint/src/connections/connectorConnection.ts: Connection schema for SharePoint authentication
  • extensions/microsoft-sharepoint/package.json: Updated dependencies including LangChain packages


fields: [
{ fieldName: "tenantId" },
{ fieldName: "clientId" },
{ fieldName: "clientSecret"},

Copilot AI Jan 29, 2026


Inconsistent spacing: missing space before the closing brace. Should be { fieldName: "clientSecret" }, to match the formatting of the other fields.

Contributor Author


done

…rs/chunk_extractor.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 29, 2026 10:57
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated 4 comments.



}
} catch (error) {
// Continue with next file even if this one fails
console.error(`Failed to process file ${file.name}:`, error);

Copilot AI Jan 29, 2026


The error message doesn't provide actionable context. Consider including the drive name and file path, and suggesting next steps.

Suggested change
- console.error(`Failed to process file ${file.name}:`, error);
+ console.error(
+   `Failed to process file "${file.name}" (id: ${file.id}) in drive "${drive.name}" (id: ${driveId}). ` +
+     `Check that the file exists in SharePoint, that the connector has access permissions, and that the file type is supported.`,
+   error
+ );

Contributor Author


@falak-asad I might just remove the logs again?

Comment on lines +13 to +14
export const logger = {
log: (level: string, context: any, message: string) => {

Copilot AI Jan 29, 2026


The custom logger object should have a more specific name like 'customLogger' or 'textExtractorLogger' to avoid confusion with standard console logging.

Copilot AI review requested due to automatic review settings February 16, 2026 09:21
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated 11 comments.



Comment on lines +153 to +157
if (!pptx)
return [];
return [
new Document({
pageContent: pptx,

Copilot AI Feb 16, 2026


The parseOfficeAsync function from the officeparser package is called without any error handling beyond the try-catch wrapper. According to the library documentation, if the parsing fails, it may return an empty string or undefined. The code checks for !pptx but should also handle the case where pptx is an empty string to avoid creating empty knowledge chunks.

Suggested change
- if (!pptx)
-   return [];
- return [
-   new Document({
-     pageContent: pptx,
+ // Handle cases where parsing fails or returns empty/meaningless content
+ if (!pptx) {
+   return [];
+ }
+ const cleanedContent = removeUnnecessaryChars(pptx);
+ if (!cleanedContent) {
+   return [];
+ }
+ return [
+   new Document({
+     pageContent: cleanedContent,

tags: sourceTags as string[],
chunkCount: chunks.length,
contentHashOrTimestamp: contentHash,
externalIdentifier: file.name,

Copilot AI Feb 16, 2026


Potential issue with external identifier uniqueness. Using file.name as the externalIdentifier may cause collisions if files with the same name exist in different drives or folders. Consider using a more unique identifier such as a combination of driveId and file.id, or the full file path, to prevent potential conflicts and ensure proper cleanup logic.

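One way to address this identifier-collision concern (a sketch under the reviewer's suggestion, not the connector's actual code — the function name and ":" separator are illustrative) is to derive the external identifier from IDs that are stable in Microsoft Graph rather than from the display name:

```typescript
// Sketch: build a collision-resistant external identifier from the drive id
// and item id, which Graph keeps stable across renames and duplicates.
// The function name and ":" separator are illustrative assumptions.
function buildExternalIdentifier(driveId: string, fileId: string): string {
  return `${driveId}:${fileId}`;
}

// Two files with the same display name in different drives no longer collide:
const a = buildExternalIdentifier("drive-A", "item-1");
const b = buildExternalIdentifier("drive-B", "item-1");
console.log(a !== b); // true
```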
} finally {
// Clean up temp file
if (fs.existsSync(tempFilePath)) {
try { fs.unlinkSync(tempFilePath); } catch (e) { /* ignore cleanup errors */ }

Copilot AI Feb 16, 2026


The temp file cleanup may fail silently in the catch block, potentially leaving temporary files on disk if the unlink operation fails. While the error is caught to prevent crashing, accumulated temp files could eventually consume disk space. Consider logging a warning when cleanup fails to help diagnose issues in production.

Suggested change
- try { fs.unlinkSync(tempFilePath); } catch (e) { /* ignore cleanup errors */ }
+ try {
+   fs.unlinkSync(tempFilePath);
+ } catch (e) {
+   console.warn(`Warning: Failed to remove temp file at ${tempFilePath}:`, e);
+ }

Comment on lines +66 to +76
// Convert split documents to ChunkContent format
const chunks: ChunkContent[] = splitDocuments
  .slice(0, MAX_CHUNKS_PER_FILE)
  .map((doc, index) => ({
    text: doc.pageContent,
    data: {
      title: `${fileName} - Part ${index + 1}`,
      source: fileName,
      fileType: fileExtension,
    },
  }));

Copilot AI Feb 16, 2026


Hard-coded chunk limit of 500 chunks per file may not be appropriate for all use cases. Large documents could be truncated without warning to the user. Consider either making this configurable through the connector fields or logging a warning when a file exceeds this limit so users are aware that content is being truncated.

Suggested change
- // Convert split documents to ChunkContent format
- const chunks: ChunkContent[] = splitDocuments
-   .slice(0, MAX_CHUNKS_PER_FILE)
-   .map((doc, index) => ({
-     text: doc.pageContent,
-     data: {
-       title: `${fileName} - Part ${index + 1}`,
-       source: fileName,
-       fileType: fileExtension,
-     },
-   }));
+ const totalChunks = splitDocuments.length;
+ if (totalChunks > MAX_CHUNKS_PER_FILE) {
+   console.warn(
+     `Truncating SharePoint file "${fileName}": ` +
+       `${totalChunks} chunks generated, but only the first ` +
+       `${MAX_CHUNKS_PER_FILE} will be kept. ` +
+       `${totalChunks - MAX_CHUNKS_PER_FILE} chunks will be discarded.`
+   );
+ }
+ // Convert split documents to ChunkContent format
+ const limitedDocuments = splitDocuments.slice(0, MAX_CHUNKS_PER_FILE);
+ const chunks: ChunkContent[] = limitedDocuments.map((doc, index) => ({
+   text: doc.pageContent,
+   data: {
+     title: `${fileName} - Part ${index + 1}`,
+     source: fileName,
+     fileType: fileExtension,
+   },
+ }));

Comment on lines +105 to +151
// Process each file
for (const file of files) {
  try {
    const fileExtension = path.extname(file.name).slice(1).toLowerCase();

    // Get chunks for this file
    const chunks = await getSharePointFileChunks(
      accessToken,
      driveId,
      file.id,
      file.name,
      fileExtension
    );

    if (chunks.length === 0) continue;

    // Compute content hash to support safe upserts
    const contentHash = createContentHash(chunks);

    // Upsert knowledge source so we can skip re-ingestion if unchanged
    const knowledgeSource = await api.upsertKnowledgeSource({
      name: file.name,
      description: `Data from ${file.name} in SharePoint library ${drive.name}`,
      tags: sourceTags as string[],
      chunkCount: chunks.length,
      contentHashOrTimestamp: contentHash,
      externalIdentifier: file.name,
    });

    if (knowledgeSource) {
      // Create all chunks (the runtime may optimise if content unchanged)
      for (const chunk of chunks) {
        await api.createKnowledgeChunk({
          knowledgeSourceId: knowledgeSource.knowledgeSourceId,
          text: chunk.text,
          data: chunk.data,
        });
      }
    }

    // Track processed external identifiers for cleanup
    newSources.push(file.name);
  } catch (error) {
    // Continue with next file even if this one fails
    console.error(`Failed to process file ${file.name}:`, error);
    continue;
  }

Copilot AI Feb 16, 2026


Sequential processing of files and chunks with await in loops will cause significant performance issues for SharePoint libraries with many files. Each file is processed one at a time, and within each file, chunks are created one at a time. For a library with hundreds of files, this could take hours. Consider using Promise.all() or batch processing to process multiple files in parallel, and batch-create chunks to improve performance.

Suggested change
- // Process each file
- for (const file of files) {
-   try {
-     const fileExtension = path.extname(file.name).slice(1).toLowerCase();
-     // Get chunks for this file
-     const chunks = await getSharePointFileChunks(
-       accessToken,
-       driveId,
-       file.id,
-       file.name,
-       fileExtension
-     );
-     if (chunks.length === 0) continue;
-     // Compute content hash to support safe upserts
-     const contentHash = createContentHash(chunks);
-     // Upsert knowledge source so we can skip re-ingestion if unchanged
-     const knowledgeSource = await api.upsertKnowledgeSource({
-       name: file.name,
-       description: `Data from ${file.name} in SharePoint library ${drive.name}`,
-       tags: sourceTags as string[],
-       chunkCount: chunks.length,
-       contentHashOrTimestamp: contentHash,
-       externalIdentifier: file.name,
-     });
-     if (knowledgeSource) {
-       // Create all chunks (the runtime may optimise if content unchanged)
-       for (const chunk of chunks) {
-         await api.createKnowledgeChunk({
-           knowledgeSourceId: knowledgeSource.knowledgeSourceId,
-           text: chunk.text,
-           data: chunk.data,
-         });
-       }
-     }
-     // Track processed external identifiers for cleanup
-     newSources.push(file.name);
-   } catch (error) {
-     // Continue with next file even if this one fails
-     console.error(`Failed to process file ${file.name}:`, error);
-     continue;
-   }
+ // Process each file in small concurrent batches to avoid sequential bottlenecks
+ const FILE_BATCH_SIZE = 5;
+ for (let i = 0; i < files.length; i += FILE_BATCH_SIZE) {
+   const batch = files.slice(i, i + FILE_BATCH_SIZE);
+   await Promise.all(
+     batch.map(async (file) => {
+       try {
+         const fileExtension = path.extname(file.name).slice(1).toLowerCase();
+         // Get chunks for this file
+         const chunks = await getSharePointFileChunks(
+           accessToken,
+           driveId,
+           file.id,
+           file.name,
+           fileExtension
+         );
+         if (chunks.length === 0) {
+           return;
+         }
+         // Compute content hash to support safe upserts
+         const contentHash = createContentHash(chunks);
+         // Upsert knowledge source so we can skip re-ingestion if unchanged
+         const knowledgeSource = await api.upsertKnowledgeSource({
+           name: file.name,
+           description: `Data from ${file.name} in SharePoint library ${drive.name}`,
+           tags: sourceTags as string[],
+           chunkCount: chunks.length,
+           contentHashOrTimestamp: contentHash,
+           externalIdentifier: file.name,
+         });
+         if (knowledgeSource) {
+           // Create all chunks concurrently (the runtime may optimise if content unchanged)
+           await Promise.all(
+             chunks.map((chunk) =>
+               api.createKnowledgeChunk({
+                 knowledgeSourceId: knowledgeSource.knowledgeSourceId,
+                 text: chunk.text,
+                 data: chunk.data,
+               })
+             )
+           );
+         }
+         // Track processed external identifiers for cleanup
+         newSources.push(file.name);
+       } catch (error) {
+         // Continue with next file even if this one fails
+         console.error(`Failed to process file ${file.name}:`, error);
+         return;
+       }
+     })
+   );

{ fieldName: "tenantId" },
{ fieldName: "clientId" },
{ fieldName: "clientSecret" },


Copilot AI Feb 16, 2026


Extra blank line before the closing bracket. Remove the blank line at line 10 to maintain consistent formatting with other connection schemas in the codebase.

Suggested change

import * as path from 'path';
import * as crypto from "crypto";



Copilot AI Feb 16, 2026


Extra blank line after imports. Remove the extra blank line at line 8 to follow consistent formatting patterns seen in other knowledge connectors in the codebase.

Suggested change

"sourceMap": true,
"moduleResolution": "node",
"watch": false
"skipLibCheck": true

Copilot AI Feb 16, 2026


The 'watch' property was removed from tsconfig.json. While 'skipLibCheck' was added (which is good for build performance), removing 'watch: false' changes the default TypeScript behavior. If watch mode was explicitly disabled before, this change may cause unintended watch mode activation. Verify that this change is intentional and won't affect the build process.

Comment thread package.json Outdated
Comment on lines +147 to +150
} catch (error) {
// Continue with next file even if this one fails
console.error(`Failed to process file ${file.name}:`, error);
continue;

Copilot AI Feb 16, 2026


The error handling silently continues on failure but error messages may contain sensitive information or stack traces that could expose internal details. While the continuation logic is correct to keep processing other files, ensure that error logging is controlled in production environments to avoid exposing sensitive data.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 09:49
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated 7 comments.



Comment thread package.json
@@ -0,0 +1 @@
{}

Copilot AI Feb 18, 2026


This root-level package.json is empty, but the root package-lock.json declares dependencies. This mismatch will break reproducible installs (npm will consider the lock out of sync) and is inconsistent with the repo’s per-extension package.json structure. Either remove the root package.json/package-lock.json entirely, or define the intended root dependencies and regenerate the lockfile from that package.json.

Suggested change
- {}
+ {
+   "name": "root",
+   "version": "1.0.0",
+   "private": true
+ }

Comment on lines +35 to +49
const url = `${baseUrl}?$top=1000`;

const response = await axios.get(url, {
  headers: {
    'Authorization': `Bearer ${accessToken}`,
    'Content-Type': 'application/json'
  }
});

if (!response.data.value) {
  return [];
}

// Filter for files only (not folders), non-empty, and supported types
const sharePointFiles: SharePointFile[] = response.data.value

Copilot AI Feb 18, 2026


The header comment claims pagination support, but the implementation only fetches the first page (?$top=1000) and ignores @odata.nextLink. Either implement pagination by following @odata.nextLink until exhausted, or update the comment to avoid promising behavior that doesn’t exist.

Suggested change
- const url = `${baseUrl}?$top=1000`;
- const response = await axios.get(url, {
-   headers: {
-     'Authorization': `Bearer ${accessToken}`,
-     'Content-Type': 'application/json'
-   }
- });
- if (!response.data.value) {
-   return [];
- }
- // Filter for files only (not folders), non-empty, and supported types
- const sharePointFiles: SharePointFile[] = response.data.value
+ let url = `${baseUrl}?$top=1000`;
+ const allItems: any[] = [];
+ while (url) {
+   const response = await axios.get(url, {
+     headers: {
+       'Authorization': `Bearer ${accessToken}`,
+       'Content-Type': 'application/json'
+     }
+   });
+   if (!response.data || !response.data.value) {
+     break;
+   }
+   allItems.push(...response.data.value);
+   const nextLink = response.data['@odata.nextLink'];
+   url = nextLink && typeof nextLink === 'string' ? nextLink : '';
+ }
+ if (allItems.length === 0) {
+   return [];
+ }
+ // Filter for files only (not folders), non-empty, and supported types
+ const sharePointFiles: SharePointFile[] = allItems

Comment on lines +65 to +68
// Log first few files for debugging
sharePointFiles.slice(0, 3).forEach((file, index) => {
  console.log(`File ${index + 1}: ${file.name} (${file.size} bytes)`);
});

Copilot AI Feb 18, 2026


This debug logging will run on every connector sync and can create noisy logs (and potentially leak document names). Consider removing it or gating it behind an explicit debug flag (e.g., env var) so production runs stay quiet by default.

Suggested change
- // Log first few files for debugging
- sharePointFiles.slice(0, 3).forEach((file, index) => {
-   console.log(`File ${index + 1}: ${file.name} (${file.size} bytes)`);
- });
+ // Log first few files for debugging (enabled only when explicitly requested)
+ if (process.env.SHAREPOINT_CONNECTOR_DEBUG === 'true') {
+   sharePointFiles.slice(0, 3).forEach((file, index) => {
+     console.log(`File ${index + 1}: ${file.name} (${file.size} bytes)`);
+   });
+ }

Comment on lines +125 to +135
// split document into paragraphs according to specified or default splitter
const splitDocuments = (
await splitDocs(docs)
).map((doc) => doc.pageContent);

// join the paragraphs into the format we want
const textParagraphs = splitDocuments.join('\n\n');

logger.log("info", null, "Successfully used langchain to extract content");

return textParagraphs;

Copilot AI Feb 18, 2026


lsExtractor already splits documents into chunks (via splitDocs) and then rejoins them into a single string, but getSharePointFileChunks immediately calls splitDocs again on that returned string. This duplicates work and makes it unclear where chunking is supposed to happen. Consider having lsExtractor return extracted text only (no chunking), or return split Document[] and let chunk_extractor be the single place that chunks.

Suggested change
- // split document into paragraphs according to specified or default splitter
- const splitDocuments = (
-   await splitDocs(docs)
- ).map((doc) => doc.pageContent);
- // join the paragraphs into the format we want
- const textParagraphs = splitDocuments.join('\n\n');
- logger.log("info", null, "Successfully used langchain to extract content");
- return textParagraphs;
+ // Join cleaned document contents into a single string; downstream code is responsible for chunking
+ const fullText = docs.map((doc) => doc.pageContent).join('\n\n');
+ logger.log("info", null, "Successfully used langchain to extract content");
+ return fullText;

{ fieldName: "tenantId" },
{ fieldName: "clientId" },
{ fieldName: "clientSecret" },


Copilot AI Feb 18, 2026


Minor: there’s an extra blank line in the fields array, and indentation differs from the other connection schema files in this extension. Cleaning this up will keep formatting consistent and reduce diffs over time.

Suggested change


// Save to temp file (text_extractor needs file path)
const tempDir = os.tmpdir();
const tempFileName = `${Date.now()}_${fileName}`;

Copilot AI Feb 18, 2026


The temp file name is derived from fileName. To avoid filesystem issues (invalid characters/very long names) and to guarantee uniqueness, prefer generating the temp file name from stable IDs (e.g., fileId) and/or sanitize the original file name before using it on disk.

Suggested change
- const tempFileName = `${Date.now()}_${fileName}`;
+ const tempFileName = `${fileId}_${Date.now()}`;

const tempFileName = `${Date.now()}_${fileName}`;
const tempFilePath = path.join(tempDir, tempFileName);

fs.writeFileSync(tempFilePath, bodyContents as any);

Copilot AI Feb 18, 2026


This uses synchronous filesystem I/O (writeFileSync) inside an async workflow. When processing many files, this will block the Node.js event loop and can significantly reduce throughput. Prefer await fs.promises.writeFile(...) (and similar async APIs for cleanup) to keep the connector responsive.

Copilot AI review requested due to automatic review settings March 3, 2026 10:09
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 20 changed files in this pull request and generated 13 comments.

Comments suppressed due to low confidence (2)

extensions/microsoft-sharepoint/src/nodes/getSharepointSiteInfo.ts:108

  • The form still references a "connectionSection" (form[0]), but sections no longer defines it. This mismatch can break the node editor UI. Either re-add the missing section definition or remove the form reference.
	sections: [
		{
			key: "storageOption",
			label: "Storage Option",
			defaultCollapsed: true,

extensions/microsoft-sharepoint/src/nodes/getSharepointListItems.ts:125

  • The form references a "connectionSection" (form[0]), but sections does not define it. This mismatch can break the node editor UI. Either add the section definition back or remove the form entry.
	sections: [
		{
			key: "storageOption",
			label: "Storage Option",
			defaultCollapsed: true,


Comment on lines +137 to +140
logMessage(`Failed to process file ${file.name}: ${error.message}`,
  "sharepoint-connector",
  "error"
);

Copilot AI Mar 3, 2026


Error logging assumes error.message exists; caught values can be non-Error objects and Axios errors often contain useful response.status/response.data. Normalize the error to a string (and include status when present) to avoid logs like "...: undefined" and improve diagnosability.

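A minimal sketch of the normalization the reviewer asks for (not the PR's actual code): duck-type an Axios-like error shape so the helper works without importing axios, and fall back to `String(error)` for non-Error values. The function name is illustrative.

```typescript
// Sketch: turn an unknown caught value into a log-friendly string,
// surfacing an HTTP status when the error looks like an Axios error.
// The name describeError and the exact format are assumptions.
function describeError(error: unknown): string {
  if (error && typeof error === "object") {
    const maybeAxios = error as { response?: { status?: number }; message?: string };
    const status = maybeAxios.response?.status;
    const message = maybeAxios.message ?? String(error);
    return status !== undefined ? `${message} (HTTP ${status})` : message;
  }
  return String(error);
}

console.log(describeError(new Error("boom")));                                // "boom"
console.log(describeError({ message: "denied", response: { status: 403 } })); // "denied (HTTP 403)"
console.log(describeError("plain string"));                                   // "plain string"
```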
Comment on lines +10 to +14
const params = new URLSearchParams();
params.append('client_id', clientId);
params.append('client_secret', clientSecret);
params.append('scope', 'https://graph.microsoft.com/.default');
params.append('grant_type', 'client_credentials');

Copilot AI Mar 3, 2026


The OAuth scope is hard-coded to https://graph.microsoft.com/.default, but callers also use this token against SharePoint REST endpoints (${url}/_api/...). Graph-audience tokens are typically not accepted by SharePoint REST (audience mismatch). Consider parameterizing the scope (Graph vs https://{tenant}.sharepoint.com/.default) or splitting into separate token helpers.

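Parameterizing the scope, as suggested, could look like the sketch below. It only builds the token request (no network call), so the audience choice is explicit at each call site; the function name is an assumption, while the endpoint shape follows the standard Microsoft identity platform v2.0 token endpoint.

```typescript
// Sketch: build a client-credentials token request with a caller-chosen
// scope, so Graph calls and SharePoint REST calls can each request a token
// with the right audience. buildTokenRequest is an illustrative name.
function buildTokenRequest(
  tenantId: string,
  clientId: string,
  clientSecret: string,
  scope: string // e.g. "https://graph.microsoft.com/.default" or "https://contoso.sharepoint.com/.default"
): { url: string; body: string } {
  const params = new URLSearchParams();
  params.append("client_id", clientId);
  params.append("client_secret", clientSecret);
  params.append("scope", scope);
  params.append("grant_type", "client_credentials");
  return {
    url: `https://login.microsoftonline.com/${tenantId}/oauth2/v2.0/token`,
    body: params.toString(),
  };
}
```

The caller would then POST `body` with Content-Type `application/x-www-form-urlencoded` and read `access_token` from the response.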
Comment on lines +131 to +135
// Get OAuth2 access token for SharePoint Online
const { tenantId, clientId, clientSecret } = cloudAuth;
const accessToken = await getAccessToken(tenantId, clientId, clientSecret);

const response = await request.get({
url: `${url}/_api/web`,
headers: headers,
json: true
});
const response = await axios.get(

Copilot AI Mar 3, 2026


This node requests an access token via getAccessToken (currently Graph-scoped) and then calls the SharePoint REST API (/_api/web). If the token audience is Graph, SharePoint commonly returns 401/403. Consider requesting a SharePoint-scoped token for the site host or switching this node to Microsoft Graph equivalents.


// Save to temp file (text_extractor needs file path)
const tempDir = os.tmpdir();
const tempFileName = `${Date.now()}_${fileName}`;

Copilot AI Mar 3, 2026


The temp file name is built from untrusted fileName (${Date.now()}_${fileName}) and then joined into a path. A SharePoint filename containing path separators or very long names can cause write failures or unexpected paths. Prefer a safe name (e.g., use fileId + random suffix, and/or sanitize to remove path separators).

Suggested change
- const tempFileName = `${Date.now()}_${fileName}`;
+ const tempFileName = `${Date.now()}_${fileId}`;

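A sanitizing variant of the reviewer's suggestion could look like this sketch (function name, allow-list, and length caps are illustrative assumptions): key the temp name on the stable file id, but keep a cleaned copy of the display name so the extension survives for the text extractor.

```typescript
// Sketch: temp-file name from the stable file id plus a sanitized,
// length-capped copy of the display name. safeTempName, the allow-list
// regex, and the 50-char cap are illustrative, not the PR's code.
import * as path from "path";

function safeTempName(fileId: string, fileName: string): string {
  const ext = path.extname(fileName).toLowerCase().slice(0, 10);
  // basename() drops any path components; the regex strips anything
  // outside a conservative allow-list.
  const base = path
    .basename(fileName, path.extname(fileName))
    .replace(/[^A-Za-z0-9._-]/g, "_")
    .slice(0, 50);
  return `${fileId}_${base}${ext}`;
}

console.log(safeTempName("item-2", "Quarterly Report.PDF")); // "item-2_Quarterly_Report.pdf"
console.log(safeTempName("item-1", "../../etc/passwd"));     // no path separators survive
```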
Comment on lines +116 to +120
tags: sourceTags as string[],
chunkCount: chunks.length,
contentHashOrTimestamp: contentHash,
externalIdentifier: file.name,
});

Copilot AI Mar 3, 2026


externalIdentifier: file.name is not stable/unique across drives/folders and can collide when multiple libraries contain the same filename or when files are renamed. This can cause upserts/cleanup to affect the wrong Knowledge Source. Prefer a unique identifier like ${driveId}:${file.id} (or include the full path/webUrl).

Comment on lines +52 to +53
fs.writeFileSync(tempFilePath, bodyContents as any);


Copilot AI Mar 3, 2026


Synchronous file IO (writeFileSync) blocks the Node.js event loop during connector execution, which can hurt throughput/timeouts for large files. Prefer using fs.promises.writeFile (and similarly async cleanup) to keep the connector non-blocking.

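The non-blocking version the reviewer describes could be sketched as a small helper that owns the temp file's whole lifecycle with `fs.promises` (the helper name and temp-name format are illustrative assumptions, not the connector's code):

```typescript
// Sketch: non-blocking temp-file lifecycle using fs.promises instead of
// writeFileSync/unlinkSync. withTempFile and the "sp_" prefix are
// illustrative names.
import { promises as fs } from "fs";
import * as os from "os";
import * as path from "path";

async function withTempFile<T>(
  contents: Buffer,
  work: (filePath: string) => Promise<T>
): Promise<T> {
  const tempFilePath = path.join(
    os.tmpdir(),
    `sp_${Date.now()}_${Math.random().toString(36).slice(2)}`
  );
  await fs.writeFile(tempFilePath, contents); // does not block the event loop
  try {
    return await work(tempFilePath);
  } finally {
    // Async cleanup; log rather than crash if the unlink fails.
    await fs.unlink(tempFilePath).catch((e) =>
      console.warn(`Failed to remove temp file ${tempFilePath}:`, e)
    );
  }
}
```

Usage would look like `await withTempFile(downloadedBytes, (p) => extractText(p))`, where `extractText` stands in for whatever path-based extractor is in play (hypothetical name).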
Comment on lines +84 to +85
const newSources: string[] = [];


Copilot AI Mar 3, 2026


newSources is an array and later cleanup uses includes, making deletion checks O(n) each (O(n²) overall). For sites with many files, this can become slow. Track processed identifiers in a Set instead (add via set.add(...) and check via set.has(...)).

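The Set-based tracking the reviewer suggests is a one-line change in spirit; a sketch (the identifier strings below are illustrative, not real SharePoint IDs):

```typescript
// Sketch: track processed external identifiers in a Set so cleanup
// membership checks are O(1) instead of O(n) Array#includes scans.
const processed = new Set<string>();
processed.add("drive-A:item-1");
processed.add("drive-A:item-2");

// Cleanup pass: an existing source is stale only if it was not re-processed.
const existingSources = ["drive-A:item-1", "drive-A:item-9"];
const stale = existingSources.filter((id) => !processed.has(id));
console.log(stale); // ["drive-A:item-9"]
```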
Comment on lines 126 to 127
const { cloudAuth, authentication, url, storeLocation, inputKey, contextKey } = config;


Copilot AI Mar 3, 2026


The node function destructures authentication but never uses it (and always uses cloudAuth). Once Basic auth is removed, consider removing authentication from the config type + UI entirely to avoid misleading configuration and dead code.

Comment on lines 145 to 146
const { cloudAuth, authentication, url, list, filter, storeLocation, inputKey, contextKey } = config;


Copilot AI Mar 3, 2026


authentication is included in the config and destructured in the function, but it’s effectively constant (only "cloud") and never used. Consider removing it from both the UI schema and the params interface to simplify the node configuration and avoid confusion.

Comment on lines +55 to +57
case 'pptx':
documentLoader = new PPTXLoader(inputFile);
break;

Copilot AI Mar 3, 2026


PPTXLoader is referenced in the switch-case before it is declared. Class declarations are hoisted but not initialized until their declaration is evaluated, so if the 'pptx' case runs before the class PPTXLoader ... declaration has executed (for example during module initialization), it will throw a ReferenceError. Move the class declaration above lsExtractor (or define it before the switch).

@falak-asad falak-asad force-pushed the feature/knowledge-connector-sharepoint branch from 7d1c8a4 to 00726e0 Compare March 5, 2026 08:08
Labels: none yet

Projects: none yet

3 participants