Abstract

Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.
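The measurement behind these results compares cosine similarities between word vectors, analogously to the reaction-time differences measured by the IAT. Below is a minimal sketch of such a word-embedding association test, assuming pretrained word vectors are available; the function names and the toy random vectors are illustrative placeholders, not the authors' published code, and the printed value is meaningless without real embeddings.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B, vec):
    # s(w, A, B): mean similarity of word w to attribute set A
    # minus its mean similarity to attribute set B.
    return (np.mean([cosine(vec[w], vec[a]) for a in A])
            - np.mean([cosine(vec[w], vec[b]) for b in B]))

def test_statistic(X, Y, A, B, vec):
    # Differential association of target sets X and Y with attributes
    # A versus B; positive means X leans toward A and Y toward B.
    return (sum(association(x, A, B, vec) for x in X)
            - sum(association(y, A, B, vec) for y in Y))

# Toy demo: random vectors stand in for real pretrained embeddings.
# With real web-trained vectors, flower words typically associate
# more strongly with pleasant words than insect words do.
rng = np.random.default_rng(0)
words = ["rose", "daisy", "ant", "moth", "pleasant", "lovely", "nasty", "ugly"]
vec = {w: rng.normal(size=50) for w in words}
print(test_statistic(["rose", "daisy"], ["ant", "moth"],
                     ["pleasant", "lovely"], ["nasty", "ugly"], vec))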
Language: English
Pages: 183-186
Number of pages: 4
Journal: Science
Volume: 356
Issue number: 6334
DOIs: 10.1126/science.aal4230
Status: Published - 14 Apr 2017

Fingerprint

  • Learning systems
  • Semantics
  • Artificial intelligence

Keywords

  • cs.AI
  • cs.CL
  • cs.CY
  • cs.LG

Cite this

Semantics derived automatically from language corpora contain human-like biases. / Caliskan, Aylin; Bryson, Joanna J; Narayanan, Arvind.

In: Science, Vol. 356, No. 6334, 14.04.2017, p. 183-186.

Research output: Contribution to journal › Article

Caliskan, Aylin; Bryson, Joanna J; Narayanan, Arvind. / Semantics derived automatically from language corpora contain human-like biases. In: Science. 2017; Vol. 356, No. 6334. pp. 183-186.
@article{7c2d36d9579a45649fbfa622eade17a3,
title = "Semantics derived automatically from language corpora contain human-like biases",
abstract = "Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.",
keywords = "cs.AI, cs.CL, cs.CY, cs.LG",
author = "Aylin Caliskan and Bryson, {Joanna J} and Arvind Narayanan",
year = "2017",
month = apr,
day = "14",
doi = "10.1126/science.aal4230",
language = "English",
volume = "356",
pages = "183--186",
journal = "Science",
issn = "0036-8075",
publisher = "American Association for the Advancement of Science",
number = "6334",

}

TY  - JOUR
T1  - Semantics derived automatically from language corpora contain human-like biases
AU  - Caliskan, Aylin
AU  - Bryson, Joanna J
AU  - Narayanan, Arvind
PY  - 2017/4/14
Y1  - 2017/4/14
N2  - Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.
AB  - Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.
KW  - cs.AI
KW  - cs.CL
KW  - cs.CY
KW  - cs.LG
UR  - https://doi.org/10.1126/science.aal4230
U2  - 10.1126/science.aal4230
DO  - 10.1126/science.aal4230
M3  - Article
VL  - 356
SP  - 183
EP  - 186
JO  - Science
T2  - Science
JF  - Science
SN  - 0036-8075
IS  - 6334
ER  -