Ethical considerations with Big Data

Questions for today

  • Do data-intensive methods change our ethical considerations?
  • What new ethical considerations arise when conducting data-intensive research?
  • What are some new safeguards we might consider?

Foundations of ethics in biomedical research

  • Nuremberg trials and the Nuremberg Code (1947)
    • Basic principles: continual informed consent, risk minimization
  • Declaration of Helsinki (1964)
    • World Medical Association
    • Serves as a basis for the system of institutional review boards (IRBs)
  • Belmont Report (1979)
    • Respect for persons
    • Beneficence
    • Justice

Big data or naturally occurring datasets (BONDS)

  • Collected not for research purposes
  • E.g., data from social media

Examples

  • Decision-making quantified based on search engine data (Moat et al.)
  • Decision-making in used car purchases (Pope)
  • Corpus analysis of language demonstrating correlation of phonetic and semantic distances (Christiansen and Monaghan)
  • Theories of learning tested in educational technology (Koedinger et al.)

Some worrisome signs

  • Social media, in general
  • Scraping of data from OkCupid (2016-2018)
    • Release of the data as an open dataset
    • Use of the data to classify sexual orientation
  • Human experimentation at scale
    • Facebook emotional contagion study (2014)
  • The Menlo Report (2012): US DHS

Recasting the Belmont principles

  • Respect
    • Informed consent in digital services is very problematic
  • Beneficence
    • Minimizing the data that is collected
    • Indirect harms
  • Justice
    • Privacy
    • Representation
    • Algorithmic justice

Privacy

  • Latanya Sweeney: “Only You, Your Doctor, and Many Others May Know”
  • Some data looks de-identified on its own
  • But it becomes identifiable when combined with other available data
  • Sweeney was able to find the medical records of the MA governor
  • Cross-referenced date of birth, gender, and ZIP code between:
    • Publicly available voter rolls
    • Newly released “anonymized” health records of state employees.
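
The cross-referencing described above can be sketched as a join on shared attributes. This is a minimal illustration with made-up toy records, not Sweeney's actual procedure or data. The field names, the use of a full birth date for the age attribute, and the `link` helper are all assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical toy data: every name and value below is invented for illustration.
voter_roll = [
    {"name": "A. Example", "birth_date": "1950-01-01", "sex": "M", "zip": "02100"},
    {"name": "B. Sample", "birth_date": "1972-06-15", "sex": "F", "zip": "02139"},
    {"name": "C. Placeholder", "birth_date": "1972-06-15", "sex": "F", "zip": "02139"},
]
health_records = [  # "anonymized": names removed, other attributes kept
    {"birth_date": "1950-01-01", "sex": "M", "zip": "02100", "diagnosis": "condition X"},
    {"birth_date": "1972-06-15", "sex": "F", "zip": "02139", "diagnosis": "condition Y"},
]

# Attributes present in both datasets (assumed field names).
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")


def key(record):
    """Build the tuple of shared attributes used to match records."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(voters, records):
    """Return (name, diagnosis) pairs where exactly one voter matches a record."""
    names_by_key = defaultdict(list)
    for voter in voters:
        names_by_key[key(voter)].append(voter["name"])

    matches = []
    for record in records:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(voter_roll, health_records)
print(reidentified)
```

In this toy example only the first health record is re-identified. The second matches two voters, which shows why records shared by several people are harder to link.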

Privacy

Differential privacy

  • “Statistical databases”
  • Answer queries by adding calibrated random noise to each entry or to each query result
  • While maintaining statistical properties across entries
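
One common way to realize this idea is the Laplace mechanism applied to a counting query. The sketch below is a minimal illustration under stated assumptions, not a production implementation. It adds noise to the query answer rather than to each entry, and the data, the `private_count` helper, and the choice of `epsilon` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng):
    """Answer 'how many rows satisfy predicate?' with Laplace noise.

    A counting query has sensitivity 1 (one person changes the count by at
    most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ages = [23, 35, 41, 52, 67, 29, 48, 71, 33, 59]  # hypothetical data

true_count = sum(1 for age in ages if age >= 40)
noisy_answers = [
    private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
    for _ in range(5)
]
print("true count:", true_count)
print("noisy answers:", [round(answer, 2) for answer in noisy_answers])
```

Each individual answer is perturbed, but the noise has mean zero, so aggregate statistics remain approximately correct. A smaller `epsilon` gives stronger privacy at the cost of noisier answers.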

Privacy

  • Clearview AI
  • Facial recognition at population scale
  • The death of privacy?

Hill, 2023

Algorithmic justice

COMPAS algorithm for sentencing

Algorithmic bias

Koenecke et al., 2020

“Weapons of Math Destruction”

  • Methods that are:
    • Opaque
    • Unregulated
    • Difficult to contest
    • Scalable
    • Consequential

O’Neil, 2016

Fairness, Accountability, and Transparency (FAccT)