Fixing highlights metadata by Strift · Pull Request #1359 · meilisearch/meilisearch-js-plugins

Strift · 2025-01-15T08:32:21Z

Pull Request

Related issue

Fixes #1337

What does this PR do?

...

PR checklist

Please check if your PR fulfills the following requirements:

Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
Have you read the contributing guidelines?
Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

Tests
- Added comprehensive test coverage for search result highlighting across nested and complex data structures.
Refactor
- Improved search highlighting accuracy by enhancing metadata calculation for better match detection across all field types.

changeset-bot · 2025-01-15T08:32:25Z

⚠️ No Changeset found

Latest commit: 1836b95

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai · 2026-04-06T05:50:07Z

📝 Walkthrough


This PR enriches Meilisearch results with highlight metadata (fullyHighlighted, matchLevel, matchedWords) to match Algolia's capabilities. A new highlight.ts module parses highlight strings and computes metadata. The fetchMeilisearchResults function is refactored to recursively enrich nested highlight structures using these utilities, bounded by a depth limit. Tests validate recursive enrichment across various nested document structures.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Highlight Metadata Module `packages/autocomplete-client/src/search/highlight.ts` | New module exporting the `HighlightMetadata` interface and the `calculateHighlightMetadata` function. Parses Meilisearch highlight strings by extracting segments wrapped in pre/post tags, computing `fullyHighlighted` status, determining `matchLevel` ('none', 'partial', 'full'), and extracting `matchedWords`. |
| Search Results Refactoring `packages/autocomplete-client/src/search/fetchMeilisearchResults.ts` | Refactored hit enrichment pipeline: delegated request construction to `buildSearchRequest`, introduced `buildHits` and `enrichHighlightTree` for recursive highlight enrichment, added `shouldIncludeTopLevelHighlightField` filtering, imported `calculateHighlightMetadata`, and set `MAX_HIGHLIGHT_DEPTH = 20` to bound recursion. |
| Utility Additions `packages/autocomplete-client/src/utils.ts` | Added generic `mapOneOrMany` utility function that applies a mapping function to either a single value or an array of values, preserving the input shape. |
| Test Suite Expansion `packages/autocomplete-client/src/search/__tests__/fetchMeilisearchResults.test.ts` | Added comprehensive test suite ("Highlighting Metadata") validating recursive enrichment of `_highlightResult` across top-level arrays, nested objects within arrays, multiple fields in nested objects, and case-insensitive match-level validation. |
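To make the `mapOneOrMany` row concrete, here is a minimal sketch of what such a helper might look like. The signature and parameter names are assumptions based only on the summary above; the PR's actual implementation in `utils.ts` may differ.

```typescript
// Hypothetical sketch of a "one or many" mapper; not copied from the PR.
// A single value maps to a single result, an array maps to an array.
function mapOneOrMany<T, U>(value: T | T[], mapFn: (item: T) => U): U | U[] {
  if (Array.isArray(value)) {
    // Wrap mapFn so Array.prototype.map's extra arguments (index, array) are not forwarded.
    return (value as T[]).map((item) => mapFn(item))
  }
  return mapFn(value as T)
}

const upper = (text: string): string => text.toUpperCase()

const single = mapOneOrMany('star', upper) // 'STAR'
const many = mapOneOrMany(['star', 'wars'], upper) // ['STAR', 'WARS']
console.log(single, many)
```

Preserving the input shape means callers that pass a single highlight entry and callers that pass an array of entries can share one code path.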

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant fetchMeilisearch as fetchMeilisearchResults
    participant buildHits
    participant enrichTree as enrichHighlightTree
    participant calcMeta as calculateHighlightMetadata

    Client->>fetchMeilisearch: fetchMeilisearchResults(queries)
    fetchMeilisearch->>fetchMeilisearch: buildSearchRequest(queries)
    fetchMeilisearch->>fetchMeilisearch: execute Meilisearch query
    fetchMeilisearch->>buildHits: buildHits(result, query)
    
    loop For each hit
        buildHits->>enrichTree: enrichHighlightTree(highlightResult)
        loop Recursive traversal (depth <= 20)
            enrichTree->>enrichTree: traverse structure
            alt Is highlight value
                enrichTree->>calcMeta: calculateHighlightMetadata(...)
                calcMeta-->>enrichTree: HighlightMetadata{value, fullyHighlighted, matchLevel, matchedWords}
                enrichTree->>enrichTree: attach metadata
            else Is object/array
                enrichTree->>enrichTree: recurse into nested structure
            end
        end
        enrichTree-->>buildHits: enriched highlight tree
    end
    
    buildHits-->>fetchMeilisearch: enriched hits with metadata
    fetchMeilisearch-->>Client: results with HighlightMetadata
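The traversal in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `enrichTree`, `describeLeaf`, and `isLeaf` are hypothetical names, leaves are assumed to have the shape `{ value: string }`, and the stub only extracts `<em>` segments instead of computing the full `HighlightMetadata`.

```typescript
// Depth bound taken from the walkthrough (MAX_HIGHLIGHT_DEPTH = 20).
const MAX_HIGHLIGHT_DEPTH = 20

type LeafMetadata = { value: string; matchedWords: string[] }

// Stub standing in for calculateHighlightMetadata: collects <em>...</em> segments only.
function describeLeaf(value: string): LeafMetadata {
  const matchedWords: string[] = []
  const pattern = /<em>(.*?)<\/em>/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(value)) !== null) {
    matchedWords.push(match[1] ?? '')
  }
  return { value, matchedWords }
}

// Assumption: a highlight leaf is a plain object with a string `value` field.
function isLeaf(node: unknown): node is { value: string } {
  return (
    typeof node === 'object' &&
    node !== null &&
    !Array.isArray(node) &&
    typeof (node as { value?: unknown }).value === 'string'
  )
}

function enrichTree(node: unknown, depth = 0): unknown {
  // Past the bound, return the subtree untouched instead of recursing further.
  if (depth > MAX_HIGHLIGHT_DEPTH) return node
  if (isLeaf(node)) return describeLeaf(node.value)
  if (Array.isArray(node)) {
    return node.map((child) => enrichTree(child, depth + 1))
  }
  if (typeof node === 'object' && node !== null) {
    const source = node as Record<string, unknown>
    const result: Record<string, unknown> = {}
    for (const key of Object.keys(source)) {
      result[key] = enrichTree(source[key], depth + 1)
    }
    return result
  }
  return node
}

const enriched = enrichTree({
  title: { value: '<em>Star</em> Wars' },
  cast: [{ name: { value: '<em>Mark</em> Hamill' } }],
})
console.log(JSON.stringify(enriched, null, 2))
```

Returning the subtree unchanged past the depth bound keeps the result usable even for pathological documents; whether the PR drops, keeps, or flags such subtrees is not stated in the walkthrough.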

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Highlights dance in nested trees,
Metadata flows with gentle ease,
Match levels sing from deep within,
Recursion counts to twenty—then!
Algolia's glow, now Meilisearch's art. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fixing highlights metadata' is concise and directly describes the main change: adding metadata fields to highlight results. |
| Linked Issues check | ✅ Passed | The PR adds fullyHighlighted, matchLevel, and matchedWords properties to _highlightResult across top-level fields, arrays, and nested objects, fully addressing issue `#1337`'s requirements. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to highlight metadata enrichment; refactors to fetchMeilisearchResults and utility additions directly support the stated objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

packages/autocomplete-client/src/utils.ts (1)
1-7: Drop the JSDoc here.

mapOneOrMany and its type signature already explain the helper, so this block mostly adds noise.

As per coding guidelines, "Prefer self-descriptive code to comments. Only use comments for complex logic."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autocomplete-client/src/utils.ts` around lines 1 - 7, Remove the
redundant JSDoc block above the mapOneOrMany helper: delete the multi-line
comment that documents the function and its parameters/returns so the code
remains self-descriptive; locate the comment immediately preceding the
mapOneOrMany function in utils.ts and remove it, leaving only the function
declaration and its TypeScript signature.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/autocomplete-client/src/search/highlight.ts`:
- Around line 37-40: The current fullyHighlighted check uses highlightedText =
matches.join(''), which drops the original separators (spaces/punctuation) and
makes comparisons with cleanValue incorrect for multi-tag full matches; instead
reconstruct the highlightedText using the matched ranges from the original
string (use the match indices/positions associated with matches) and include the
intervening substrings from cleanValue between matches so the rebuilt
highlightedText preserves separators, then set fullyHighlighted = cleanValue ===
rebuiltHighlightedText (update references to matches and fullyHighlighted
accordingly).
- Around line 23-35: The current code constructs RegExp with raw user-provided
highlight tags (preTag/postTag) in the highlighting logic (variables preTag,
postTag, highlightValue), which breaks for regex metacharacters; fix by escaping
meta-characters before building regexes: implement or use an escapeRegExp helper
and apply it to preTag and postTag when creating the two RegExp instances used
for extracting matches (highlightRegex) and for removing tags (the replace
regex), so both the exec loop in the highlight extraction and the cleanValue
replace operate on the literal tag strings.

---

Nitpick comments:
In `@packages/autocomplete-client/src/utils.ts`:
- Around line 1-7: Remove the redundant JSDoc block above the mapOneOrMany
helper: delete the multi-line comment that documents the function and its
parameters/returns so the code remains self-descriptive; locate the comment
immediately preceding the mapOneOrMany function in utils.ts and remove it,
leaving only the function declaration and its TypeScript signature.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2c811d6-d743-41fb-b93e-d900f4ca4a38

📥 Commits

Reviewing files that changed from the base of the PR and between 18e8370 and 1836b95.

📒 Files selected for processing (4)

packages/autocomplete-client/src/search/__tests__/fetchMeilisearchResults.test.ts
packages/autocomplete-client/src/search/fetchMeilisearchResults.ts
packages/autocomplete-client/src/search/highlight.ts
packages/autocomplete-client/src/utils.ts

coderabbitai · 2026-04-06T06:12:27Z

packages/autocomplete-client/src/search/highlight.ts

+  // Extract all highlighted segments
+  const highlightRegex = new RegExp(`${preTag}(.*?)${postTag}`, 'g')
+  const matches: string[] = []
+  let match
+  while ((match = highlightRegex.exec(highlightValue)) !== null) {
+    matches.push(match[1])
+  }
+
+  // Remove highlight tags to get the highlighted text without the tags
+  const cleanValue = highlightValue.replace(
+    new RegExp(`${preTag}|${postTag}`, 'g'),
+    ''
+  )


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and find the highlight.ts file
find . -name "highlight.ts" -type f 2>/dev/null | head -20

Repository: meilisearch/meilisearch-js-plugins

Length of output: 132

🏁 Script executed:

# Get the full content of the highlight.ts file to understand the context
cat -n packages/autocomplete-client/src/search/highlight.ts

Repository: meilisearch/meilisearch-js-plugins

Length of output: 2461

🏁 Script executed:

# Search for other RegExp constructions with template literals in the autocomplete-client package
rg 'new RegExp.*\$\{' packages/autocomplete-client/src --type ts --type tsx -A 2

Repository: meilisearch/meilisearch-js-plugins

Length of output: 105

🏁 Script executed:

# Check if there are any existing tests for this function that might reveal expected behavior
rg 'calculateHighlightMetadata|preTag|postTag' packages/autocomplete-client --type ts --type tsx | head -30

Repository: meilisearch/meilisearch-js-plugins

Length of output: 105

🏁 Script executed:

# Search for other RegExp constructions with template literals in the autocomplete-client package
rg 'new RegExp.*\$\{' packages/autocomplete-client/src --type ts -A 2

Repository: meilisearch/meilisearch-js-plugins

Length of output: 566

🏁 Script executed:

# Check if there are any existing tests for this function
find . -path "*/test*" -o -path "*/spec*" | grep -i highlight

Repository: meilisearch/meilisearch-js-plugins

Length of output: 60

🏁 Script executed:

# Look for test files that might test calculateHighlightMetadata
find . -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "calculateHighlightMetadata" 2>/dev/null

Repository: meilisearch/meilisearch-js-plugins

Length of output: 62

🏁 Script executed:

# Check how calculateHighlightMetadata is called in the codebase
rg 'calculateHighlightMetadata' packages/autocomplete-client/src -B 3 -A 3

Repository: meilisearch/meilisearch-js-plugins

Length of output: 2188

🏁 Script executed:

# Look for where preTag and postTag come from (likely from Meilisearch response)
rg 'preTag|postTag' packages/autocomplete-client/src -B 2 -A 2 | head -50

Repository: meilisearch/meilisearch-js-plugins

Length of output: 4034

🏁 Script executed:

# Check the package.json and documentation for context on highlight tags
cat packages/autocomplete-client/package.json

Repository: meilisearch/meilisearch-js-plugins

Length of output: 1241

🏁 Script executed:

# Check the constants to see what the default highlight tags are
rg 'HIGHLIGHT_PRE_TAG|HIGHLIGHT_POST_TAG' packages/autocomplete-client/src -B 2 -A 2

Repository: meilisearch/meilisearch-js-plugins

Length of output: 2114

🏁 Script executed:

# Look at the complete context of how preTag and postTag flow through the code
cat -n packages/autocomplete-client/src/search/fetchMeilisearchResults.ts | head -50

Repository: meilisearch/meilisearch-js-plugins

Length of output: 1902

Escape custom highlight tags before using them in regex patterns.

highlightPreTag and highlightPostTag come from user-provided parameters and are directly interpolated into RegExp constructors at lines 24 and 33. Any regex metacharacter (e.g., ., *, [, ], ^, $, |, +, ?, (, ), {, }, \) will either cause a runtime error or alter the regex semantics, breaking metadata extraction for otherwise valid custom tags.

Suggested fix

+function escapeRegExp(value: string): string {
+  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
+}
+
 export function calculateHighlightMetadata(
   query: string,
   preTag: string,
   postTag: string,
   highlightValue: string
 ): HighlightMetadata {
-  const highlightRegex = new RegExp(`${preTag}(.*?)${postTag}`, 'g')
+  const escapedPreTag = escapeRegExp(preTag)
+  const escapedPostTag = escapeRegExp(postTag)
+  const highlightRegex = new RegExp(
+    `${escapedPreTag}(.*?)${escapedPostTag}`,
+    'g'
+  )
   const matches: string[] = []
   let match
   while ((match = highlightRegex.exec(highlightValue)) !== null) {
     matches.push(match[1])
   }

   const cleanValue = highlightValue.replace(
-    new RegExp(`${preTag}|${postTag}`, 'g'),
+    new RegExp(`${escapedPreTag}|${escapedPostTag}`, 'g'),
     ''
   )
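As a small illustration of the two failure modes described above, the sketch below compares raw and escaped tag interpolation. `extractSegments` and `rawPatternThrows` are hypothetical helpers written for this demo, and the tags `**`, `[` and `]` are example values chosen here, not tags taken from the PR.

```typescript
// Escape regex metacharacters so a tag is matched as literal text.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical demo helper: collect the first capture group of every match.
function extractSegments(pattern: RegExp, input: string): Array<string | undefined> {
  const segments: Array<string | undefined> = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(input)) !== null) {
    segments.push(match[1])
  }
  return segments
}

// Failure mode 1: the raw tag is not a valid pattern and the constructor throws.
function rawPatternThrows(preTag: string, postTag: string): boolean {
  try {
    new RegExp(`${preTag}(.*?)${postTag}`, 'g')
    return false
  } catch (_error) {
    return true
  }
}

// Failure mode 2: the raw tags form a valid pattern with different semantics.
// "[(.*?)]" is a character class, so it matches single characters and captures nothing.
const preTag = '['
const postTag = ']'
const value = '[Star] Wars (1977)'

const rawSegments = extractSegments(
  new RegExp(`${preTag}(.*?)${postTag}`, 'g'),
  value
)
const escapedSegments = extractSegments(
  new RegExp(`${escapeRegExp(preTag)}(.*?)${escapeRegExp(postTag)}`, 'g'),
  value
)

console.log(rawPatternThrows('**', '**')) // true
console.log(rawSegments) // [undefined, undefined]
console.log(escapedSegments) // ['Star']
```

With escaping in place, both the extraction regex and the tag-stripping regex treat the tags as literal strings, which is the behavior the review comment asks for.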

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-  // Extract all highlighted segments
-  const highlightRegex = new RegExp(`${preTag}(.*?)${postTag}`, 'g')
-  const matches: string[] = []
-  let match
-  while ((match = highlightRegex.exec(highlightValue)) !== null) {
-    matches.push(match[1])
-  }
-
-  // Remove highlight tags to get the highlighted text without the tags
-  const cleanValue = highlightValue.replace(
-    new RegExp(`${preTag}|${postTag}`, 'g'),
-    ''
-  )
+function escapeRegExp(value: string): string {
+  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
+}
+
+export function calculateHighlightMetadata(
+  query: string,
+  preTag: string,
+  postTag: string,
+  highlightValue: string
+): HighlightMetadata {
+  // Extract all highlighted segments
+  const escapedPreTag = escapeRegExp(preTag)
+  const escapedPostTag = escapeRegExp(postTag)
+  const highlightRegex = new RegExp(
+    `${escapedPreTag}(.*?)${escapedPostTag}`,
+    'g'
+  )
+  const matches: string[] = []
+  let match
+  while ((match = highlightRegex.exec(highlightValue)) !== null) {
+    matches.push(match[1])
+  }
+
+  // Remove highlight tags to get the highlighted text without the tags
+  const cleanValue = highlightValue.replace(
+    new RegExp(`${escapedPreTag}|${escapedPostTag}`, 'g'),
+    ''
+  )

🧰 Tools

🪛 ast-grep (0.42.0)

[warning] 23-23: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`${preTag}(.*?)${postTag}`, 'g')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

[warning] 32-32: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`${preTag}|${postTag}`, 'g')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/autocomplete-client/src/search/highlight.ts` around lines 23 - 35, The current code constructs RegExp with raw user-provided highlight tags (preTag/postTag) in the highlighting logic (variables preTag, postTag, highlightValue), which breaks for regex metacharacters; fix by escaping meta-characters before building regexes: implement or use an escapeRegExp helper and apply it to preTag and postTag when creating the two RegExp instances used for extracting matches (highlightRegex) and for removing tags (the replace regex), so both the exec loop in the highlight extraction and the cleanValue replace operate on the literal tag strings.

coderabbitai · 2026-04-06T06:12:28Z

packages/autocomplete-client/src/search/highlight.ts

+  // Determine if the entire attribute is highlighted
+  // fullyHighlighted = true if cleanValue and the concatenation of all matched segments are identical
+  const highlightedText = matches.join('')
+  const fullyHighlighted = cleanValue === highlightedText


⚠️ Potential issue | 🟠 Major

fullyHighlighted is wrong for multi-tag full matches.

matches.join('') drops separators, so <em>Star</em> <em>Wars</em> becomes StarWars and compares unequal to Star Wars. A value whose entire content is highlighted across multiple tags is therefore reported as fullyHighlighted: false, which breaks the exact selection/ranking behavior this PR is adding.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/autocomplete-client/src/search/highlight.ts` around lines 37 - 40, The current fullyHighlighted check uses highlightedText = matches.join(''), which drops the original separators (spaces/punctuation) and makes comparisons with cleanValue incorrect for multi-tag full matches; instead reconstruct the highlightedText using the matched ranges from the original string (use the match indices/positions associated with matches) and include the intervening substrings from cleanValue between matches so the rebuilt highlightedText preserves separators, then set fullyHighlighted = cleanValue === rebuiltHighlightedText (update references to matches and fullyHighlighted accordingly).
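One possible way to address the `fullyHighlighted` finding is sketched below. It deliberately uses a simpler rule than the range reconstruction described in the prompt: the value counts as fully highlighted when at least one tagged segment exists and everything outside the tags is whitespace. `isFullyHighlighted` is a hypothetical helper, and treating only whitespace as a separator is an assumption; how punctuation should be handled is left open.

```typescript
// Escape regex metacharacters so custom tags are matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper, not part of the PR.
function isFullyHighlighted(
  highlightValue: string,
  preTag: string,
  postTag: string
): boolean {
  const tagged = new RegExp(
    `${escapeRegExp(preTag)}(.*?)${escapeRegExp(postTag)}`,
    'g'
  )
  // Remove every tagged segment; what remains is the text outside the highlights.
  const outside = highlightValue.replace(tagged, '')
  // Tags are non-empty, so any removal shortens the string.
  const hasMatch = outside.length !== highlightValue.length
  // Assumption: only whitespace may remain outside the highlighted segments.
  return hasMatch && !/\S/.test(outside)
}

console.log(isFullyHighlighted('<em>Star</em> <em>Wars</em>', '<em>', '</em>')) // true
console.log(isFullyHighlighted('<em>Star</em> Wars', '<em>', '</em>')) // false
console.log(isFullyHighlighted('Star Wars', '<em>', '</em>')) // false
```

Checking the leftover text avoids rebuilding the highlighted string at all, so separators between tags can no longer cause a false negative.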

Strift marked this pull request as draft January 15, 2025 08:32

Strift added the bug (Something isn't working) label Apr 6, 2026

Strift added 7 commits April 6, 2026 13:07

Create highlight helper file

8b0d4ca

Move utils to its own file

755790b

Refactor: extract function to build search requests

68e38c9

Refactor: create buildHits function

52a8bc6

Add new tests for unhandled usecase

f0d87a7

Implement recursive highlight metadata enrichment

36c3559

Restore type safety

fe7bd75

Strift force-pushed the fix/highlights-metadata branch from 29f70c0 to fe7bd75 Compare April 6, 2026 05:50

Format

1836b95

Strift marked this pull request as ready for review April 6, 2026 06:01

coderabbitai bot reviewed Apr 6, 2026

Strift wants to merge 8 commits into main from fix/highlights-metadata
