docs: document speciesist bias in word vectors and NLP models #13926

## Proposal

Document speciesist bias as a recognized form of bias in spaCy's documentation, particularly in sections about word vectors and model evaluation.

## Context

Peer-reviewed research has demonstrated that word embeddings and NLP models, which form the foundation of spaCy's pipeline, encode speciesist associations:

- **Takeshita et al. (2022).** "Speciesist language and nonhuman animal bias in English masked language models." *Information Processing & Management*. Found that BERT, DistilBERT, RoBERTa, and ALBERT all associate harmful words with nonhuman animals.
- **Hagendorff et al. (2023).** "Speciesist bias in AI." *AI and Ethics*. Documented speciesist bias across image recognition, word embeddings, and language models.
- **Leach et al. (2023).** *British Journal of Social Psychology*. A large-scale word embedding analysis establishing anthropocentric speciesism in everyday language.

## Relevance to spaCy

- **Word vectors** (`en_core_web_md`, `en_core_web_lg`): Pre-trained vectors inherit speciesist associations from their training corpora.
- **NER**: Named entity models trained on biased corpora may handle animal-related entities differently from human-related ones.
- **Text classification**: Downstream models built on spaCy can inherit these biases.
- **Sentiment**: spaCy-based sentiment pipelines can reflect speciesist patterns in their training data.
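
As a rough illustration of how the word-vector point could be checked, here is a minimal sketch of an association-gap probe. It is not the method used in the papers cited above. The helper names (`cosine`, `mean_similarity`, `association_gap`) are hypothetical, and the word lists are left to the reader. The sketch assumes only that a `lookup` callable maps a word to its vector.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(
    lookup: Callable[[str], Vector],
    targets: Sequence[str],
    attributes: Sequence[str],
) -> float:
    """Average cosine similarity over every (target, attribute) pair."""
    pairs = [(t, a) for t in targets for a in attributes]
    return sum(cosine(lookup(t), lookup(a)) for t, a in pairs) / len(pairs)


def association_gap(
    lookup: Callable[[str], Vector],
    targets_a: Sequence[str],
    targets_b: Sequence[str],
    attributes: Sequence[str],
) -> float:
    """Positive result: targets_a sit closer to the attribute words than targets_b."""
    return mean_similarity(lookup, targets_a, attributes) - mean_similarity(
        lookup, targets_b, attributes
    )


# Hypothetical hookup to spaCy's static vectors (requires the model to be installed):
#
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   lookup = lambda word: nlp.vocab[word].vector
#   gap = association_gap(lookup, animal_words, human_words, attribute_words)
#
# animal_words, human_words and attribute_words are lists you would choose yourself.
```

A single gap value is only a starting point. The result depends heavily on the chosen word lists, so any documentation note should rely on the published studies, not on an ad hoc probe like this one.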

## Suggested Change

Add a brief note to the word vectors or model evaluation documentation acknowledging speciesist bias as a documented phenomenon, with citations. This would align with spaCy's existing documentation on model limitations and biases.
