chore: migrate sentence segmentation from NLTK to spaCy#75
Efreet408 wants to merge 8 commits into development
  [tool.poetry.dependencies]
- python = "^3.11"
+ python = ">=3.11, <3.15"
- pydantic = "^2.7.1"
- nltk = "^3.8.1"
+ pydantic = {version = ">=2.13.0b2", allow-prereleases = true}
+ spacy = ">=3.8.9,<4"
spaCy v3.8.9 is the release that adds Python 3.14 support ("v3.8.9: Support Python 3.14"):
https://github.com/explosion/spaCy/releases/tag/release-v3.8.9
+ srsly = ">=2.5.2"
+ murmurhash = ">=1.0.14"
+ cymem = ">=2.0.12"
+ preshed = ">=3.0.11"
For some reason, spaCy does not pull its dependencies up to versions compatible with Python 3.14, even though those versions exist:
https://github.com/explosion/preshed/releases/tag/release-v3.0.11
https://github.com/explosion/murmurhash/releases/tag/release-v1.0.14
https://github.com/explosion/cymem/releases/tag/release-v2.0.12
https://github.com/explosion/srsly/releases/tag/release-v2.5.2
So I had to set the minimum versions manually.
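To confirm that an environment actually picked up these floors, a small check along the following lines might help. This is only a sketch: the helper names (`parse_release`, `find_outdated`, `installed_versions`) are hypothetical, the version parsing is deliberately simplified (it ignores pre-release suffixes), and the minimums are copied from the pyproject.toml lines above.

```python
from importlib import metadata

# Minimum versions copied from the pyproject.toml change above.
MIN_VERSIONS = {
    "srsly": "2.5.2",
    "murmurhash": "1.0.14",
    "cymem": "2.0.12",
    "preshed": "3.0.11",
}


def parse_release(version: str) -> tuple[int, ...]:
    """Turn '3.0.11' into (3, 0, 11); stops at the first non-numeric part.

    Simplified on purpose: '2.13.0b2' becomes (2, 13, 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # trailing suffix such as 'b2' ends the release part
            break
    return tuple(parts)


def find_outdated(installed: dict[str, str],
                  minimums: dict[str, str] = MIN_VERSIONS) -> dict[str, str]:
    """Return {name: installed version} for packages that are missing or too old."""
    outdated = {}
    for name, minimum in minimums.items():
        current = installed.get(name)
        if current is None:
            outdated[name] = "not installed"
        elif parse_release(current) < parse_release(minimum):
            outdated[name] = current
    return outdated


def installed_versions(names) -> dict[str, str]:
    """Look up installed versions via importlib.metadata, skipping missing packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    # An empty dict means every floor is satisfied in the current environment.
    print(find_outdated(installed_versions(MIN_VERSIONS)))
```

Running it inside the project's virtualenv after `poetry install` should print an empty dict if the resolver honoured the floors.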
pyproject.toml
Outdated
  s3fs = {version = "^2024.3.1", optional = true}
- pydantic = "^2.7.1"
- nltk = "^3.8.1"
+ pydantic = {version = ">=2.13.0b2", allow-prereleases = true}
Running on Python 3.14 fails with the error pydantic.v1.errors.ConfigError: unable to infer type for attribute "REGEX".
This issue is documented in spaCy's GitHub (explosion/spaCy#13902, explosion/spaCy#13895).
Unfortunately, the required changes in Pydantic are still in prerelease. From the v2.13.0b1 release notes (https://github.com/pydantic/pydantic/releases/tag/v2.13.0b1):
"Latest V1.10.26 release under the pydantic.v1 namespace. This version includes support for Python 3.14."
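As a rough illustration of which environments this constraint is meant to protect, the condition can be written as a small predicate. This is a sketch under assumptions: `is_affected` and `current_python` are hypothetical helpers, and the cutoff treats the whole 2.13 series as fixed, which glosses over the difference between individual prereleases.

```python
import sys

# Assumption: the fix ships with the pydantic 2.13 prerelease series,
# as the release notes quoted above suggest.
FIRST_FIXED_PYDANTIC = (2, 13)
AFFECTED_PYTHON = (3, 14)


def is_affected(python_version: tuple[int, int],
                pydantic_release: tuple[int, int]) -> bool:
    """True when the ConfigError described above would be expected."""
    return (python_version >= AFFECTED_PYTHON
            and pydantic_release < FIRST_FIXED_PYDANTIC)


def current_python() -> tuple[int, int]:
    """(major, minor) of the running interpreter."""
    return sys.version_info[:2]


if __name__ == "__main__":
    # Example: the previous constraint "^2.7.1" could resolve to a 2.12.x release.
    print(is_affected(current_python(), (2, 12)))
```

On Python 3.11 to 3.13 the predicate is false for any Pydantic version, so the prerelease pin only matters for the 3.14 part of the supported range.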
Description of changes
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.