fix: pass encoding to partition_csv for CSV loader #538

Open
DoTA (dotuananh0712) wants to merge 1 commit into langchain-ai:main from dotuananh0712:main

@dotuananh0712

Summary

Issue

When users pass encoding via unstructured_kwargs, the unstructured library ignores it and defaults to UTF-8, causing UnicodeDecodeError for CSV files with other encodings.
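The failure mode can be reproduced with the standard library alone. The following is a minimal sketch, not the loader's actual code path: `read_rows` and the sample data are hypothetical, and the only assumption is that the CSV bytes are decoded as UTF-8 when no other encoding reaches the parser.

```python
import csv
import io
from typing import List

# A CSV payload encoded as Latin-1; each "é" becomes the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
raw = "name,city\nRené,Orléans\n".encode("latin-1")


def read_rows(data: bytes, encoding: str) -> List[List[str]]:
    """Decode the bytes with the given encoding, then parse them as CSV."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text)))


# Decoding with the UTF-8 default fails on the Latin-1 bytes.
try:
    read_rows(raw, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the encoding the file was written in succeeds.
rows = read_rows(raw, "latin-1")
```

The same bytes parse cleanly once the correct encoding is supplied, which is why the value the user provides has to actually reach the parser.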

Fix

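def _get_elements(self) -> List:
    from unstructured.partition.csv import partition_csv

    # Work on a copy so the loader's stored kwargs are not mutated, and pop
    # "encoding" so it is not also forwarded through **kwargs (passing the
    # same keyword twice raises TypeError).
    kwargs = dict(self.unstructured_kwargs)
    input_encoding = kwargs.pop("encoding", None)
    return partition_csv(
        filename=self.file_path, encoding=input_encoding, **kwargs
    )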
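A note on forwarding `encoding` alongside `**kwargs`: Python rejects a call that supplies the same keyword both explicitly and through `**` unpacking, so the key has to be absent from the dict that is unpacked. A minimal sketch of that behavior follows. `fake_partition_csv` is a hypothetical stand-in that only echoes its arguments and is not part of unstructured. The `include_header` key is just an illustrative extra option.

```python
from typing import Any, Dict, Optional


def fake_partition_csv(
    filename: str, encoding: Optional[str] = None, **kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical stand-in for partition_csv; it only echoes its arguments."""
    return {"filename": filename, "encoding": encoding, "extra": kwargs}


user_kwargs = {"encoding": "latin-1", "include_header": True}

# Passing the key explicitly while it is still inside the ** dict fails
# with a TypeError before the function body ever runs.
try:
    fake_partition_csv(
        filename="data.csv", encoding=user_kwargs.get("encoding"), **user_kwargs
    )
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)

# Popping from a copy forwards the value exactly once and leaves the
# caller's dict untouched.
forwarded = dict(user_kwargs)
encoding = forwarded.pop("encoding", None)
result = fake_partition_csv(filename="data.csv", encoding=encoding, **forwarded)
```

Popping from a copy rather than from the original dict keeps repeated calls on the same loader instance behaving identically.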

Testing

AI assistance used for code review and implementation

Fixes langchain-ai#505 - Need to pass encoding to partition_csv() to allow
users to specify custom encodings for CSV files.

Previously, encoding was passed via unstructured_kwargs but the
unstructured library ignores kwargs for encoding, defaulting to utf-8.
Now encoding is extracted from kwargs and passed directly to partition_csv.

AI assistance used for code review
github-actions bot added the fix label on Feb 15, 2026