Description
Proposal
Document speciesist bias as a recognized form of bias in spaCy's documentation, particularly in sections about word vectors and model evaluation.
Context
Peer-reviewed research has demonstrated that word embeddings and NLP models, the foundation of spaCy's pipeline, encode speciesist associations:
- Takeshita et al. (2022). "Speciesist language and nonhuman animal bias in English masked language models." Information Processing & Management. Found BERT, DistilBERT, RoBERTa, and ALBERT all associate harmful words with nonhuman animals.
- Hagendorff et al. (2023). "Speciesist bias in AI." AI and Ethics. Documented speciesist bias across image recognition, word embeddings, and language models.
- Leach et al. (2023). Large-scale word embedding analysis establishing anthropocentric speciesism in everyday language. British Journal of Social Psychology.
Relevance to spaCy
- Word vectors (en_core_web_md, en_core_web_lg): Pre-trained vectors inherit speciesist associations from training corpora.
- NER: Named entity models trained on biased corpora may treat animal-related entities differently.
- Text classification: Downstream models built on spaCy inherit these biases.
- Sentiment: spaCy-based sentiment pipelines reflect speciesist patterns in training data.
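To make the word-vector point concrete, here is a minimal sketch of how such an association could be probed. It is not the method used in the papers cited above. The `association` helper, the `toy_lookup` function, and the 2-d vectors are all hypothetical and invented for illustration; real measurements would need actual pre-trained vectors and carefully chosen word lists.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def association(
    word: str,
    attrs_a: Sequence[str],
    attrs_b: Sequence[str],
    get_vector: Callable[[str], Vector],
) -> float:
    """Mean similarity of `word` to attrs_a minus its mean similarity to attrs_b.

    A positive score means `word` sits closer to attrs_a in the vector space.
    """
    w = get_vector(word)
    mean_a = sum(cosine(w, get_vector(a)) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(w, get_vector(b)) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Toy 2-d vectors, made up for illustration only (NOT real embeddings).
TOY_VECTORS = {
    "pig": (0.9, 0.1),
    "person": (0.1, 0.9),
    "dirty": (1.0, 0.0),
    "clean": (0.0, 1.0),
}


def toy_lookup(word: str) -> Vector:
    """Hypothetical stand-in for a real vector lookup."""
    return TOY_VECTORS[word]


pig_score = association("pig", ["dirty"], ["clean"], toy_lookup)
person_score = association("person", ["dirty"], ["clean"], toy_lookup)
print(f"pig: {pig_score:+.3f}  person: {person_score:+.3f}")
```

With a vectors-enabled pipeline such as en_core_web_md installed, the lookup could presumably be swapped for something like `lambda w: nlp.vocab[w].vector`. That wiring is an assumption here and is untested in this sketch.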
Suggested Change
A brief note in the word vectors or model evaluation documentation acknowledging speciesist bias as a documented phenomenon, with citations. This aligns with spaCy's existing documentation on model limitations and biases.