Commit 63b342e
docs: Sync Haystack API reference on Docusaurus (#10490)
Co-authored-by: julian-risch <4181769+julian-risch@users.noreply.github.com>
1 parent f0987de commit 63b342e

1 file changed: +9 -1 lines changed
docs-website/reference/haystack-api/preprocessors_api.md

Lines changed: 9 additions & 1 deletion
@@ -192,7 +192,9 @@ def __init__(remove_empty_lines: bool = True,
 remove_regex: str | None = None,
 unicode_normalization: Literal["NFC", "NFKC", "NFD", "NFKD"]
 | None = None,
-ascii_only: bool = False)
+ascii_only: bool = False,
+strip_whitespaces: bool = False,
+replace_regexes: dict[str, str] | None = None)
 ```
 
 Initialize DocumentCleaner.
@@ -213,6 +215,12 @@ Note: This will run before any other steps.
 Will remove accents from characters and replace them with ASCII characters.
 Other non-ASCII characters will be removed.
 Note: This will run before any pattern matching or removal.
+- `strip_whitespaces`: If `True`, removes leading and trailing whitespace from the document content
+using Python's `str.strip()`. Unlike `remove_extra_whitespaces`, this only affects the beginning
+and end of the text, preserving internal whitespace (useful for markdown formatting).
+- `replace_regexes`: A dictionary mapping regex patterns to their replacement strings.
+For example, `{r'\n\n+': '\n'}` replaces multiple consecutive newlines with a single newline.
+This is applied after `remove_regex` and allows custom replacements instead of just removal.
 
 <a id="document_cleaner.DocumentCleaner.run"></a>
 
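The behavior described for the two new parameters can be approximated in plain Python. The sketch below is not Haystack's implementation: `clean_text` is a hypothetical helper that uses only `re.sub` and `str.strip()`. Running `strip_whitespaces` last is an assumption, since the diff only states that `replace_regexes` runs after `remove_regex`.

```python
from __future__ import annotations

import re


def clean_text(
    text: str,
    remove_regex: str | None = None,
    replace_regexes: dict[str, str] | None = None,
    strip_whitespaces: bool = False,
) -> str:
    """Hypothetical stand-in for the documented steps, not Haystack's code."""
    if remove_regex:
        # Removal is a substitution with the empty string.
        text = re.sub(remove_regex, "", text)
    if replace_regexes:
        # Applied after remove_regex, one pattern at a time in dict order.
        for pattern, replacement in replace_regexes.items():
            text = re.sub(pattern, replacement, text)
    if strip_whitespaces:
        # Assumed position of this step; only the two ends are trimmed.
        text = text.strip()
    return text


raw = "  # Title\n\n\n    indented code\n\nbody [ad]  \n"
cleaned = clean_text(
    raw,
    remove_regex=r"\[ad\]",
    replace_regexes={r"\n\n+": "\n"},
    strip_whitespaces=True,
)
print(repr(cleaned))  # '# Title\n    indented code\nbody'
```

The four-space indentation before `indented code` survives because stripping touches only the start and end of the text, which is the difference from `remove_extra_whitespaces` that the docstring points out.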