Resolve NumPy compatibility hell#3231
Resolve NumPy compatibility hell#3231menshikh-iv wants to merge 12 commits intopiskvorky:developfrom
Conversation
Wouldn't that preclude us from using any new API functions from numpy? (new = introduced since If so, that doesn't sound like a desirable solution. For example, this PR seems to introduce a regression: #2864 and #3220 fixed |
gensim/models/word2vec.py
Outdated
| if not hasattr(self, 'random'): | ||
| # use new instance of numpy's recommended generator/algorithm | ||
| self.random = np.random.default_rng(seed=self.seed) | ||
| self.random = np.random.RandomState(self.seed) |
There was a problem hiding this comment.
Wouldn't that preclude us from using any new API functions from numpy? (new = introduced since oldest-supported-numpy)
It doesn't have to. Where we can easily replicate the new API functions, we should do so (as @menshikh-iv has done with gensim.utils.default_rng function). Where we can't (so far, there are no such cases, but there may be in the future), we could do something like:
def np_cool_new_function(*args, **kwargs):
try:
fn = np.cool_new_function
except AttributeError:
raise RuntimeError("your numpy version is too old, please upgrade")
else:
return fn(*args, **kwargs)There was a problem hiding this comment.
Perhaps. Although this smells of re-implementing dependency resolution internally in Gensim, which would be extra hell.
But I guess as long as setup.py requires an actual working version of numpy (not just oldest-supported-numpy), it should be OK.
But requiring some dependency version in setup.py, and then raising RuntimeErrors dynamically suggesting that version is too old, would not be OK.
There was a problem hiding this comment.
But I guess as long as setup.py requires an actual working version of numpy (not just oldest-supported-numpy), it should be OK.
The problem with going that way is that it prevents us from building against oldest-supported-numpy. We then have to reinvent that particular wheel in our build system infrastructure.
There was a problem hiding this comment.
Let's see if I got this right:
| numpy API (setup.py): oldest-support-numpy | numpy API (setup.py): 1.17.0 | |
|---|---|---|
| numpy ABI (wheels): oldest-support-numpy | this #3231 | current develop |
| numpy ABI (wheels): 1.17.0 | ? | desired, #3226 (comment) |
Building against the oldest numpy we actually support makes more sense to me than building against oldest-supported-numpy and then throwing RuntimeError because it's too old, or maintaining separate code paths for deprecated functions we don't really need/want (RandomState).
@mpenkov what exactly was the problem with the "desired" quadrant? Too much code?
There was a problem hiding this comment.
It's not exactly code, it's build system configuration.
We're essentially replicating this with slightly different business logic.
There was a problem hiding this comment.
OK. Reading the comment in your .yml link, that sounds exactly right, exactly like what we want = the "desired" state. Why does that not work?
@gojomo can you check too? I may have gotten lost in the PRs and comments.
@mpenkov not to get too philosophical – let's release bugfix 4.1.1 ASAP. We can discuss a "proper" solution once the fire is put out.
If we'll have tests for oldest & newest numpy in CI, these will be detected automatically |
Adding some comments to the code
|
Closing this, because we've moved on. I also recall a problem with oldest-supported-numpy, but can't remember what it is at this stage. @menshikh-iv Thank you for your effort and for guiding us through these numpy issues. |
|
🙏 |
Idea:
oldest-supported-numpyeverywhere (on every python and platform)Goal: Make gensim works with numpy correctly (avoid issues like #3226).
Fixes #3226.