
Fixing highlights metadata#1359

Open
Strift wants to merge 8 commits into main from fix/highlights-metadata

Conversation

@Strift
Collaborator

@Strift Strift commented Jan 15, 2025

Pull Request

Related issue

Fixes #1337

What does this PR do?

  • ...

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • Tests

    • Added comprehensive test coverage for search result highlighting across nested and complex data structures.
  • Refactor

    • Improved search highlighting accuracy by enhancing metadata calculation for better match detection across all field types.

@changeset-bot

changeset-bot bot commented Jan 15, 2025

⚠️ No Changeset found

Latest commit: 1836b95

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@Strift Strift marked this pull request as draft January 15, 2025 08:32
@Strift Strift added the bug Something isn't working label Apr 6, 2026
@Strift Strift force-pushed the fix/highlights-metadata branch from 29f70c0 to fe7bd75 Compare April 6, 2026 05:50
@coderabbitai

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough


This PR enriches Meilisearch results with highlight metadata (fullyHighlighted, matchLevel, matchedWords) to match Algolia's capabilities. A new highlight.ts module parses highlight strings and computes metadata. The fetchMeilisearchResults function is refactored to recursively enrich nested highlight structures using these utilities, bounded by a depth limit. Tests validate recursive enrichment across various nested document structures.
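
For orientation, the enriched value described above could be typed roughly as follows. This is a sketch inferred from the field names in the walkthrough: the real `HighlightMetadata` interface in highlight.ts is not shown on this page and may differ, and the sample hit (field names, values, and match levels) is made up for illustration.

```typescript
// Assumed shape, based only on the field names listed in the walkthrough.
type MatchLevel = 'none' | 'partial' | 'full'

interface HighlightMetadata {
  value: string // highlight string, still containing the pre/post tags
  fullyHighlighted: boolean
  matchLevel: MatchLevel
  matchedWords: string[]
}

// Hypothetical enriched _highlightResult for the query "star".
// The matchLevel values assume that 'full' means every query word matched.
const exampleHighlightResult: Record<'title' | 'overview', HighlightMetadata> = {
  title: {
    value: '<em>Star</em> Wars',
    fullyHighlighted: false,
    matchLevel: 'full',
    matchedWords: ['Star'],
  },
  overview: {
    value: 'A space opera',
    fullyHighlighted: false,
    matchLevel: 'none',
    matchedWords: [],
  },
}
```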

Changes

Cohort / File(s) | Summary
Highlight Metadata Module (packages/autocomplete-client/src/search/highlight.ts) | New module exporting HighlightMetadata interface and calculateHighlightMetadata function. Parses Meilisearch highlight strings by extracting segments wrapped in pre/post tags, computing fullyHighlighted status, determining matchLevel ('none', 'partial', 'full'), and extracting matchedWords.
Search Results Refactoring (packages/autocomplete-client/src/search/fetchMeilisearchResults.ts) | Refactored hit enrichment pipeline: delegated request construction to buildSearchRequest, introduced buildHits and enrichHighlightTree for recursive highlight enrichment, added shouldIncludeTopLevelHighlightField filtering, imported calculateHighlightMetadata, and set MAX_HIGHLIGHT_DEPTH = 20 to bound recursion.
Utility Additions (packages/autocomplete-client/src/utils.ts) | Added generic mapOneOrMany utility function to conditionally apply a mapping function to either a single value or array of values, preserving input shape.
Test Suite Expansion (packages/autocomplete-client/src/search/__tests__/fetchMeilisearchResults.test.ts) | Added comprehensive test suite ("Highlighting Metadata") validating recursive enrichment of _highlightResult across top-level arrays, nested objects within arrays, multiple fields in nested objects, and case-insensitive match-level validation.
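
The mapOneOrMany helper mentioned in the table is not shown on this page. A minimal sketch of a helper with that behavior, assuming a two-argument signature (value, mapper) that may not match the PR's, could look like this:

```typescript
// Hypothetical signature; the actual helper in utils.ts may differ.
// Applies `mapper` to a single value or to each element of an array,
// returning a result with the same shape as the input.
function mapOneOrMany<T, U>(
  value: T | T[],
  mapper: (item: T) => U
): U | U[] {
  if (Array.isArray(value)) {
    return (value as T[]).map((item) => mapper(item))
  }
  return mapper(value as T)
}

// Usage: a single string stays a single value, an array stays an array.
const one = mapOneOrMany<string, string>('star', (s) => s.toUpperCase())
const many = mapOneOrMany<string, string>(['star', 'wars'], (s) =>
  s.toUpperCase()
)
```

Keeping the input shape matters here because a Meilisearch highlight field can be either a single string or an array of strings, and the enriched result should mirror it.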

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant fetchMeilisearch as fetchMeilisearchResults
    participant buildHits
    participant enrichTree as enrichHighlightTree
    participant calcMeta as calculateHighlightMetadata

    Client->>fetchMeilisearch: fetchMeilisearchResults(queries)
    fetchMeilisearch->>fetchMeilisearch: buildSearchRequest(queries)
    fetchMeilisearch->>fetchMeilisearch: execute Meilisearch query
    fetchMeilisearch->>buildHits: buildHits(result, query)
    
    loop For each hit
        buildHits->>enrichTree: enrichHighlightTree(highlightResult)
        loop Recursive traversal (depth <= 20)
            enrichTree->>enrichTree: traverse structure
            alt Is highlight value
                enrichTree->>calcMeta: calculateHighlightMetadata(...)
                calcMeta-->>enrichTree: HighlightMetadata{value, fullyHighlighted, matchLevel, matchedWords}
                enrichTree->>enrichTree: attach metadata
            else Is object/array
                enrichTree->>enrichTree: recurse into nested structure
            end
        end
        enrichTree-->>buildHits: enriched highlight tree
    end
    
    buildHits-->>fetchMeilisearch: enriched hits with metadata
    fetchMeilisearch-->>Client: results with HighlightMetadata
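
The recursive traversal in the diagram can be sketched as below. This is an illustration under stated assumptions, not the PR's enrichHighlightTree: the function name, the `enrichLeaf` callback, and the choice to return a subtree untouched once the depth limit is exceeded are all hypothetical. Only the limit of 20 comes from the walkthrough.

```typescript
// Depth limit quoted in the walkthrough.
const MAX_HIGHLIGHT_DEPTH = 20

// Hypothetical stand-in for the PR's enrichHighlightTree.
function enrichHighlightTreeSketch(
  node: unknown,
  enrichLeaf: (value: string) => unknown,
  depth = 0
): unknown {
  // Assumption: past the limit, the subtree is returned as-is.
  if (depth > MAX_HIGHLIGHT_DEPTH) return node

  // A string is treated as a highlight value and gets metadata attached.
  if (typeof node === 'string') return enrichLeaf(node)

  // Arrays and plain objects are traversed one level deeper.
  if (Array.isArray(node)) {
    return node.map((child: unknown) =>
      enrichHighlightTreeSketch(child, enrichLeaf, depth + 1)
    )
  }
  if (node !== null && typeof node === 'object') {
    const source = node as Record<string, unknown>
    const enriched: Record<string, unknown> = {}
    for (const key of Object.keys(source)) {
      enriched[key] = enrichHighlightTreeSketch(
        source[key],
        enrichLeaf,
        depth + 1
      )
    }
    return enriched
  }

  // Anything else (numbers, booleans, null) is left unchanged.
  return node
}
```

Passing the leaf enrichment in as a callback keeps the traversal independent of how metadata is computed, so calculateHighlightMetadata (or a test stub) can be plugged in.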

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Highlights dance in nested trees,
Metadata flows with gentle ease,
Match levels sing from deep within,
Recursion counts to twenty—then!
Algolia's glow, now Meilisearch's art.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Fixing highlights metadata' is concise and directly describes the main change: adding metadata fields to highlight results.
Linked Issues check | ✅ Passed | The PR adds fullyHighlighted, matchLevel, and matchedWords properties to _highlightResult across top-level fields, arrays, and nested objects, fully addressing issue #1337's requirements.
Out of Scope Changes check | ✅ Passed | All changes are scoped to highlight metadata enrichment; refactors to fetchMeilisearchResults and utility additions directly support the stated objective.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@Strift Strift marked this pull request as ready for review April 6, 2026 06:01

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/autocomplete-client/src/utils.ts (1)

1-7: Drop the JSDoc here.

mapOneOrMany and its type signature already explain the helper, so this block mostly adds noise.

As per coding guidelines, "Prefer self-descriptive code to comments. Only use comments for complex logic."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autocomplete-client/src/utils.ts` around lines 1 - 7, Remove the
redundant JSDoc block above the mapOneOrMany helper: delete the multi-line
comment that documents the function and its parameters/returns so the code
remains self-descriptive; locate the comment immediately preceding the
mapOneOrMany function in utils.ts and remove it, leaving only the function
declaration and its TypeScript signature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2c811d6-d743-41fb-b93e-d900f4ca4a38

📥 Commits

Reviewing files that changed from the base of the PR and between 18e8370 and 1836b95.

📒 Files selected for processing (4)
  • packages/autocomplete-client/src/search/__tests__/fetchMeilisearchResults.test.ts
  • packages/autocomplete-client/src/search/fetchMeilisearchResults.ts
  • packages/autocomplete-client/src/search/highlight.ts
  • packages/autocomplete-client/src/utils.ts

Comment on lines +23 to +35
// Extract all highlighted segments
const highlightRegex = new RegExp(`${preTag}(.*?)${postTag}`, 'g')
const matches: string[] = []
let match
while ((match = highlightRegex.exec(highlightValue)) !== null) {
  matches.push(match[1])
}

// Remove highlight tags to get the highlighted text without the tags
const cleanValue = highlightValue.replace(
  new RegExp(`${preTag}|${postTag}`, 'g'),
  ''
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and find the highlight.ts file
find . -name "highlight.ts" -type f 2>/dev/null | head -20

Repository: meilisearch/meilisearch-js-plugins

Length of output: 132


🏁 Script executed:

# Get the full content of the highlight.ts file to understand the context
cat -n packages/autocomplete-client/src/search/highlight.ts

Repository: meilisearch/meilisearch-js-plugins

Length of output: 2461


🏁 Script executed:

# Search for other RegExp constructions with template literals in the autocomplete-client package
rg 'new RegExp.*\$\{' packages/autocomplete-client/src --type ts --type tsx -A 2

Repository: meilisearch/meilisearch-js-plugins

Length of output: 105


🏁 Script executed:

# Check if there are any existing tests for this function that might reveal expected behavior
rg 'calculateHighlightMetadata|preTag|postTag' packages/autocomplete-client --type ts --type tsx | head -30

Repository: meilisearch/meilisearch-js-plugins

Length of output: 105


🏁 Script executed:

# Search for other RegExp constructions with template literals in the autocomplete-client package
rg 'new RegExp.*\$\{' packages/autocomplete-client/src --type ts -A 2

Repository: meilisearch/meilisearch-js-plugins

Length of output: 566


🏁 Script executed:

# Check if there are any existing tests for this function
find . -path "*/test*" -o -path "*/spec*" | grep -i highlight

Repository: meilisearch/meilisearch-js-plugins

Length of output: 60


🏁 Script executed:

# Look for test files that might test calculateHighlightMetadata
find . -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "calculateHighlightMetadata" 2>/dev/null

Repository: meilisearch/meilisearch-js-plugins

Length of output: 62


🏁 Script executed:

# Check how calculateHighlightMetadata is called in the codebase
rg 'calculateHighlightMetadata' packages/autocomplete-client/src -B 3 -A 3

Repository: meilisearch/meilisearch-js-plugins

Length of output: 2188


🏁 Script executed:

# Look for where preTag and postTag come from (likely from Meilisearch response)
rg 'preTag|postTag' packages/autocomplete-client/src -B 2 -A 2 | head -50

Repository: meilisearch/meilisearch-js-plugins

Length of output: 4034


🏁 Script executed:

# Check the package.json and documentation for context on highlight tags
cat packages/autocomplete-client/package.json

Repository: meilisearch/meilisearch-js-plugins

Length of output: 1241


🏁 Script executed:

# Check the constants to see what the default highlight tags are
rg 'HIGHLIGHT_PRE_TAG|HIGHLIGHT_POST_TAG' packages/autocomplete-client/src -B 2 -A 2

Repository: meilisearch/meilisearch-js-plugins

Length of output: 2114


🏁 Script executed:

# Look at the complete context of how preTag and postTag flow through the code
cat -n packages/autocomplete-client/src/search/fetchMeilisearchResults.ts | head -50

Repository: meilisearch/meilisearch-js-plugins

Length of output: 1902


Escape custom highlight tags before using them in regex patterns.

highlightPreTag and highlightPostTag come from user-provided parameters and are directly interpolated into RegExp constructors at lines 24 and 33. Any regex metacharacter (e.g., ., *, [, ], ^, $, |, +, ?, (, ), {, }, \) will either cause a runtime error or alter the regex semantics, breaking metadata extraction for otherwise valid custom tags.
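
To make the failure concrete, here is a small self-contained sketch. The `extractSegments` helper and the `**` tags are hypothetical and only mirror the extraction loop quoted above; `escapeRegExp` is the common escaping pattern, not necessarily what the PR will adopt.

```typescript
// Escapes every regex metacharacter so the tag is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper mirroring the extraction loop under review.
function extractSegments(
  text: string,
  preTag: string,
  postTag: string,
  escapeTags: boolean
): string[] {
  const pre = escapeTags ? escapeRegExp(preTag) : preTag
  const post = escapeTags ? escapeRegExp(postTag) : postTag
  const regex = new RegExp(`${pre}(.*?)${post}`, 'g')
  const segments: string[] = []
  let match: RegExpExecArray | null
  while ((match = regex.exec(text)) !== null) {
    segments.push(match[1] ?? '')
  }
  return segments
}

// With Markdown-style tags, the raw pattern is `**(.*?)**`, which is not a
// valid regular expression, so the RegExp constructor throws.
let rawTagsThrew = false
try {
  extractSegments('**Star** Wars', '**', '**', false)
} catch (error) {
  rawTagsThrew = error instanceof SyntaxError
}

// With escaping, the same tags are matched literally.
const escapedSegments = extractSegments('**Star** Wars', '**', '**', true)
```

Other tags fail more quietly: a pair such as `[` and `]` produces a valid pattern, but one that is read as a character class rather than as literal brackets.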

Suggested fix
+function escapeRegExp(value: string): string {
+  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
+}
+
 export function calculateHighlightMetadata(
   query: string,
   preTag: string,
   postTag: string,
   highlightValue: string
 ): HighlightMetadata {
-  const highlightRegex = new RegExp(`${preTag}(.*?)${postTag}`, 'g')
+  const escapedPreTag = escapeRegExp(preTag)
+  const escapedPostTag = escapeRegExp(postTag)
+  const highlightRegex = new RegExp(
+    `${escapedPreTag}(.*?)${escapedPostTag}`,
+    'g'
+  )
   const matches: string[] = []
   let match
   while ((match = highlightRegex.exec(highlightValue)) !== null) {
     matches.push(match[1])
   }

   const cleanValue = highlightValue.replace(
-    new RegExp(`${preTag}|${postTag}`, 'g'),
+    new RegExp(`${escapedPreTag}|${escapedPostTag}`, 'g'),
     ''
   )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-// Extract all highlighted segments
-const highlightRegex = new RegExp(`${preTag}(.*?)${postTag}`, 'g')
-const matches: string[] = []
-let match
-while ((match = highlightRegex.exec(highlightValue)) !== null) {
-  matches.push(match[1])
-}
-// Remove highlight tags to get the highlighted text without the tags
-const cleanValue = highlightValue.replace(
-  new RegExp(`${preTag}|${postTag}`, 'g'),
-  ''
-)
+function escapeRegExp(value: string): string {
+  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
+}
+export function calculateHighlightMetadata(
+  query: string,
+  preTag: string,
+  postTag: string,
+  highlightValue: string
+): HighlightMetadata {
+  // Extract all highlighted segments
+  const escapedPreTag = escapeRegExp(preTag)
+  const escapedPostTag = escapeRegExp(postTag)
+  const highlightRegex = new RegExp(
+    `${escapedPreTag}(.*?)${escapedPostTag}`,
+    'g'
+  )
+  const matches: string[] = []
+  let match
+  while ((match = highlightRegex.exec(highlightValue)) !== null) {
+    matches.push(match[1])
+  }
+  // Remove highlight tags to get the highlighted text without the tags
+  const cleanValue = highlightValue.replace(
+    new RegExp(`${escapedPreTag}|${escapedPostTag}`, 'g'),
+    ''
+  )
🧰 Tools
🪛 ast-grep (0.42.0)

[warning] 23-23: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`${preTag}(.*?)${postTag}`, 'g')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)


[warning] 32-32: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`${preTag}|${postTag}`, 'g')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autocomplete-client/src/search/highlight.ts` around lines 23 - 35,
The current code constructs RegExp with raw user-provided highlight tags
(preTag/postTag) in the highlighting logic (variables preTag, postTag,
highlightValue), which breaks for regex metacharacters; fix by escaping
meta-characters before building regexes: implement or use an escapeRegExp helper
and apply it to preTag and postTag when creating the two RegExp instances used
for extracting matches (highlightRegex) and for removing tags (the replace
regex), so both the exec loop in the highlight extraction and the cleanValue
replace operate on the literal tag strings.

Comment on lines +37 to +40
// Determine if the entire attribute is highlighted
// fullyHighlighted = true if cleanValue and the concatenation of all matched segments are identical
const highlightedText = matches.join('')
const fullyHighlighted = cleanValue === highlightedText

⚠️ Potential issue | 🟠 Major

fullyHighlighted is wrong for multi-tag full matches.

matches.join('') drops separators, so <em>Star</em> <em>Wars</em> becomes StarWars and compares unequal to Star Wars. A value whose entire content is highlighted across multiple tags is therefore reported as fullyHighlighted: false, which breaks the exact selection/ranking behavior this PR is adding.
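
One possible way to avoid the problem is to check what remains after removing every highlighted segment, instead of comparing joined segments. The sketch below is hypothetical and not the PR's code: `isFullyHighlighted` is an illustrative name, the default `<em>` tags are taken from the example above, and it assumes that only whitespace may separate highlighted segments in a fully highlighted value.

```typescript
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: a value counts as fully highlighted when at least one
// tagged segment exists and nothing but whitespace is left outside the tags.
function isFullyHighlighted(
  highlightValue: string,
  preTag = '<em>',
  postTag = '</em>'
): boolean {
  const segment = new RegExp(
    `${escapeRegExp(preTag)}[\\s\\S]*?${escapeRegExp(postTag)}`,
    'g'
  )
  const remainder = highlightValue.replace(segment, '')
  const hasHighlight = remainder.length < highlightValue.length
  return hasHighlight && remainder.trim() === ''
}

// The join-based comparison from the PR, reproduced for contrast:
// the space between the two segments is lost.
const joinedSegments = ['Star', 'Wars'].join('')
const joinBasedResult = joinedSegments === 'Star Wars'

const remainderBasedResult = isFullyHighlighted('<em>Star</em> <em>Wars</em>')
```

Under the whitespace assumption, a value such as `<em>Star</em> Wars` still reports false, because `Wars` is left over. Values whose highlighted words are separated by punctuation would also report false, so a real fix would need to decide which separators count.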

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autocomplete-client/src/search/highlight.ts` around lines 37 - 40,
The current fullyHighlighted check uses highlightedText = matches.join(''),
which drops the original separators (spaces/punctuation) and makes comparisons
with cleanValue incorrect for multi-tag full matches; instead reconstruct the
highlightedText using the matched ranges from the original string (use the match
indices/positions associated with matches) and include the intervening
substrings from cleanValue between matches so the rebuilt highlightedText
preserves separators, then set fullyHighlighted = cleanValue ===
rebuiltHighlightedText (update references to matches and fullyHighlighted
accordingly).


Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add 'fullyHighlighted', 'matchLevel' and 'matchedWords' to '_highlightResult' in autocomplete client

1 participant