Use npz instead of pickle for Thai W2P weights (#1328)
* Update np.load to pass allow_pickle=False
Change np.load to disallow pickling for security.
* Fix np.load allow_pickle=False to work correctly with .npz format (#1329)
* Initial plan
* Fix np.load allow_pickle=False to work with .npz NpzFile format
- Replace .item().get(key) with [key] dict-style access on NpzFile
- Remove variables instance attribute; use local variable instead
- Add type annotation for variables local var as np.lib.npyio.NpzFile
- Add allow_pickle=False to embeddings.npy load in words_spelling_correction.py
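
The change from `.item().get(key)` to `[key]` access can be illustrated with a minimal sketch. The key names and array shapes below are hypothetical and are not taken from the actual Thai W2P corpus; the sketch only shows that an `.npz` archive of plain numeric arrays loads with `allow_pickle=False` and is read with dict-style indexing.

```python
import os
import tempfile

import numpy as np

# Hypothetical weight names; the real keys in the Thai W2P corpus may differ.
weights = {
    "enc_emb": np.arange(6, dtype=np.float32).reshape(2, 3),
    "dec_emb": np.ones((2, 2), dtype=np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "weights.npz")
    np.savez(npz_path, **weights)

    # np.load returns an NpzFile for .npz archives. Plain numeric arrays
    # need no pickle, so allow_pickle=False is sufficient here.
    with np.load(npz_path, allow_pickle=False) as variables:
        # Dict-style access replaces the old .item().get(key) pattern,
        # which only applied to a pickled dict stored in a .npy file.
        enc_emb = variables["enc_emb"]
        keys = sorted(variables.files)
```

Each array is read from the archive when it is indexed, so `enc_emb` stays usable after the `with` block closes the file.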
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
* Initial plan
* Fix np.load allow_pickle for legacy .npy corpus (ValueError fix)
The thai_w2p corpus v0.2 is stored as a .npy pickled dict.
Loading it with allow_pickle=False raises ValueError.
Detect file format by extension:
- .npz: use allow_pickle=False (secure, for future corpus versions)
- .npy (legacy): use allow_pickle=True + dict validation
Also add `import os` for os.path.splitext().
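
The extension-based dispatch described in this intermediate step could look roughly like the sketch below. The function name `load_variables` and the file names are assumptions made for illustration, not the actual code in w2p.py. Note that unpickling can execute arbitrary code, so the legacy branch is only appropriate for trusted files.

```python
import os
import tempfile

import numpy as np


def load_variables(path):
    """Load weights, choosing allow_pickle from the file extension.

    Hypothetical helper for illustration only.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".npz":
        # Secure path: plain arrays in a zip archive, no pickle needed.
        with np.load(path, allow_pickle=False) as npz:
            return {name: npz[name] for name in npz.files}
    if ext == ".npy":
        # Legacy path: a pickled dict wrapped in a 0-d object array.
        loaded = np.load(path, allow_pickle=True).item()
        if not isinstance(loaded, dict):
            raise ValueError("expected a dict in legacy .npy file")
        return loaded
    raise ValueError(f"unsupported extension: {ext!r}")


with tempfile.TemporaryDirectory() as tmp:
    data = {"w": np.zeros(3)}
    legacy_path = os.path.join(tmp, "legacy.npy")
    modern_path = os.path.join(tmp, "modern.npz")
    np.save(legacy_path, data, allow_pickle=True)  # stored as a pickled dict
    np.savez(modern_path, **data)

    legacy = load_variables(legacy_path)
    modern = load_variables(modern_path)
```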
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
* Initial plan
* Sync dev, add pickle warning, fix docstring and code style
Co-authored-by: bact <128572+bact@users.noreply.github.com>
* Update CHANGELOG for release 5.3.1
This release focuses on security issues related to corpus file loading, including improved pickle handling and defensive file loading.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Only load pickle file if PYTHAINLP_W2P_ALLOW_LEGACY_PICKLE is set
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Add PYTHAINLP_W2P_ALLOW_LEGACY_PICKLE: 1 to unit test env
* Change PYTHAINLP_W2P_ALLOW_LEGACY_PICKLE to PYTHAINLP_ALLOW_UNSAFE_PICKLE
Add is_unsafe_pickle_allowed() function
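
A minimal sketch of what an opt-in check like `is_unsafe_pickle_allowed()` might do is shown below. The set of accepted truthy values and the optional `environ` parameter (added here so the check can be exercised without touching the process environment) are assumptions; the actual function may differ.

```python
import os

ENV_VAR = "PYTHAINLP_ALLOW_UNSAFE_PICKLE"

# Assumption: these strings count as "enabled"; the real accepted values may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def is_unsafe_pickle_allowed(environ=None):
    """Return True only when the opt-in variable is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() in _TRUTHY


# Pickle loading stays disabled unless the user opts in explicitly.
default_allowed = is_unsafe_pickle_allowed({})
opted_in = is_unsafe_pickle_allowed({ENV_VAR: "1"})
```

Defaulting to `False` when the variable is unset or unrecognized keeps the unsafe path strictly opt-in.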
* Sort import
* Add test for PYTHAINLP_ALLOW_UNSAFE_PICKLE
* Fix import
* Set PYTHAINLP_ALLOW_UNSAFE_PICKLE for tone detector test
* Refactor is_unsafe_pickle_allowed()
* Refactor model loading to use .npz format only
Updated model name and refactored variable loading to use .npz format exclusively, removing legacy .npy handling.
* Clean up w2p.py by removing os and warnings imports
Removed unused imports from w2p.py
* Remove PYTHAINLP_ALLOW_UNSAFE_PICKLE tests
* Update CHANGELOG.md
* Fix imports
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Update CHANGELOG.md
* Remove PYTHAINLP_ALLOW_UNSAFE_PICKLE from doc
We no longer use pickle, so do not advertise this env var. Keep it internally for future use (it may be removed in 6.0.0).
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: Arthit Suriyawongkul <arthit@gmail.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>