docs: document speciesist bias in word vectors and NLP models #13926

@stuckvgn


Proposal

Document speciesist bias as a recognized form of bias in spaCy's documentation, particularly in sections about word vectors and model evaluation.

Context

Peer-reviewed research has demonstrated that word embeddings and NLP models, the foundation of spaCy's pipeline, encode speciesist associations:

  • Takeshita et al. (2022). "Speciesist language and nonhuman animal bias in English masked language models." Information Processing & Management. Found BERT, DistilBERT, RoBERTa, and ALBERT all associate harmful words with nonhuman animals.
  • Hagendorff et al. (2023). "Speciesist bias in AI." AI and Ethics. Documented speciesist bias across image recognition, word embeddings, and language models.
  • Leach et al. (2023). Large-scale word embedding analysis establishing anthropocentric speciesism in everyday language. British Journal of Social Psychology.

Relevance to spaCy

  • Word vectors (en_core_web_md, en_core_web_lg): Pre-trained vectors inherit speciesist associations from training corpora.
  • NER: Named entity models trained on biased corpora may treat animal-related entities differently.
  • Text classification: Downstream models built on spaCy can inherit these biases.
  • Sentiment: spaCy-based sentiment pipelines may reflect speciesist patterns in their training data.
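
To make the word-vector point concrete, here is a minimal sketch of a WEAT-style association score: the mean cosine similarity of a target word to a set of "pleasant" attribute words minus its mean similarity to a set of "unpleasant" ones. The vectors and word lists below are toy values invented for illustration, not real spaCy vectors and not the word sets used in the cited papers, so the numbers show the method only and say nothing about any actual model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def association(word_vec, pleasant, unpleasant):
    """Mean similarity to pleasant attributes minus mean similarity to unpleasant ones."""
    pos = sum(cosine(word_vec, p) for p in pleasant) / len(pleasant)
    neg = sum(cosine(word_vec, n) for n in unpleasant) / len(unpleasant)
    return pos - neg


# Hypothetical 3-d toy vectors, chosen by hand so the example is self-contained.
toy = {
    "person": [0.9, 0.1, 0.0],
    "pig": [0.1, 0.9, 0.0],
    "kind": [1.0, 0.0, 0.0],
    "dirty": [0.0, 1.0, 0.0],
}

pleasant = [toy["kind"]]
unpleasant = [toy["dirty"]]

human_score = association(toy["person"], pleasant, unpleasant)
animal_score = association(toy["pig"], pleasant, unpleasant)

# A positive gap means the human term sits closer to the pleasant attributes
# than the animal term does, in this toy space.
gap = human_score - animal_score
print(f"human={human_score:.3f} animal={animal_score:.3f} gap={gap:.3f}")
```

To try this against a real pipeline, the toy entries could be swapped for `nlp.vocab["pig"].vector` after `nlp = spacy.load("en_core_web_md")`. The choice of target and attribute words strongly affects the result, so any list used for documentation should come from the cited studies rather than from this sketch.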

Suggested Change

A brief note in the word vectors or model evaluation documentation acknowledging speciesist bias as a documented phenomenon, with citations. This aligns with spaCy's existing documentation on model limitations and biases.
