Reproducibility

Why is reproducibility important?

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

Buckheit and Donoho (1995), paraphrasing Claerbout

What is reproducibility?

  • The terminology is widely confused.
  • Different groups use similar words to describe subtly different concepts, sometimes in exactly opposite ways.
  • See Barba, 2018 for a detailed history of this confusion.

“Most research findings are false”

Ioannidis 2005, PLoS Medicine

  • The main reason is the reliance on p-values for assessing the truth of scientific hypotheses.
  • PPV: “positive predictive value”, the post-study probability that a research finding is true.
  • I.e., the chance it would replicate/generalize.
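As a rough illustration, the basic PPV formula from Ioannidis (2005) can be written as a short function. Here R is the pre-study odds that a tested relationship is true, α is the type I error rate and 1 − β is the power. The function and argument names are chosen for this sketch only.

```python
def ppv(prior_odds, power, alpha=0.05):
    """Post-study probability that a claimed (significant) finding is true.

    prior_odds: R, the ratio of true to null relationships tested in the field.
    power:      1 - beta, the probability of detecting a true relationship.
    alpha:      type I error rate (significance threshold).
    """
    beta = 1 - power
    # PPV = (1 - beta) * R / (R - beta * R + alpha)
    return (1 - beta) * prior_odds / (prior_odds - beta * prior_odds + alpha)


# A well-powered study in a field where half the tested relationships are true:
well_powered = ppv(prior_odds=1.0, power=0.8)   # about 0.94
# An underpowered study in a field where 1 in 11 tested relationships is true:
exploratory = ppv(prior_odds=0.1, power=0.2)    # about 0.29
```

With low pre-study odds and low power, a significant result is more likely to be false than true, even at α = 0.05.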

“Most research findings are false”

  • PPV depends on statistical power (effect size, sample size), but not on power alone. It also depends on:
    • The ratio of “true relationships” to “no relationships” tested within a research field.
    • Bias: design, analysis and presentation factors that produce false research findings.
    • Testing by several independent teams.
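A minimal sketch of how these factors enter the PPV, using the formulas given in Ioannidis (2005). R is the pre-study odds of a true relationship, u is the proportion of analyses that would not have been findings but are reported as such because of bias, and n is the number of independent teams testing the same question. Function and argument names are illustrative.

```python
def ppv_with_bias(prior_odds, power, bias, alpha=0.05):
    """PPV when a fraction `bias` (u) of would-be non-findings get reported as findings."""
    R, beta, u = prior_odds, 1 - power, bias
    numerator = (1 - beta) * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


def ppv_multiple_teams(prior_odds, power, n_teams, alpha=0.05):
    """PPV of a positive claim when n_teams independent studies test the same question
    and at least one of them reports a significant result."""
    R, beta, n = prior_odds, 1 - power, n_teams
    numerator = R * (1 - beta ** n)
    denominator = R + 1 - (1 - alpha) ** n - R * beta ** n
    return numerator / denominator


no_bias = ppv_with_bias(prior_odds=1.0, power=0.8, bias=0.0)      # about 0.94
some_bias = ppv_with_bias(prior_odds=1.0, power=0.8, bias=0.1)    # about 0.85
five_teams = ppv_multiple_teams(prior_odds=1.0, power=0.8, n_teams=5)
```

Both bias and a larger number of competing teams lower the PPV relative to a single unbiased study with the same power and pre-study odds.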

“Most research findings are false”

  • The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
  • The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
  • The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
  • The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.

“Most research findings are false”

It gets worse…

“False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant”

Simmons, Nelson, Simonsohn, 2011

Undisclosed flexibility

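  • HARKing: hypothesizing after the results are known, i.e. presenting a post hoc hypothesis as if it had been made in advance.
  • p-hacking: trying different analyses, outcomes or exclusions until a result reaches statistical significance.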
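A small simulation can show why this flexibility matters. The sketch below is not the simulation from Simmons et al. (2011). It assumes two independent outcome measures with known unit variance and no true effect, and compares an analyst who tests only the pre-specified outcome with one who reports whichever outcome reaches p < .05. All names and settings are illustrative.

```python
import random
from statistics import NormalDist, mean


def z_test_p(sample_a, sample_b):
    """Two-sided p-value for a difference in means, assuming unit variance in both groups."""
    standard_error = (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, n_per_group=20, alpha=0.05, seed=0):
    """Return (false-positive rate with one pre-specified outcome,
    false-positive rate when the better of two outcomes is reported)."""
    rng = random.Random(seed)
    honest_hits = 0
    flexible_hits = 0
    for _ in range(n_sims):
        p_values = []
        for _outcome in range(2):
            # Both groups are drawn from the same distribution: the null is true.
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(z_test_p(group_a, group_b))
        honest_hits += p_values[0] < alpha       # pre-specified outcome only
        flexible_hits += min(p_values) < alpha   # report whichever outcome "worked"
    return honest_hits / n_sims, flexible_hits / n_sims


honest_rate, flexible_rate = simulate()
```

Under these assumptions the honest rate stays near the nominal 5%, while the flexible rate approaches 1 − 0.95² ≈ 9.75%. Each additional undisclosed choice pushes it higher.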

The Open Science Collaboration

  • Many concerns, but little hard evidence.
  • Goal: estimate PPV in psychology research.
  • Replicated 100 studies from prominent psychology journals (Psych Sci, JPSP, JEP:LMC).
  • Studies/experiments selected using a uniform approach.
  • Replications conducted in consultation with original authors.

Results

  • Using null hypothesis rejection as the criterion: 35 / 97 replications.
  • Using effect size overlap as the criterion: 30 / 73 replications.
  • Overall, replication was more common in cognitive psychology than in social psychology.

What about reproducibility?

“Open science” approach to reproducibility

  • Availability of data and software:
    • Available vs. usable
  • Automation and provenance tracking
  • Copyright issues and other encumbrances
  • Open reporting of results
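One possible way to approach provenance tracking, sketched with the Python standard library only: alongside each result, save a small JSON record of which input files (by checksum), which parameters and which software environment produced it. The record layout and the function names are assumptions made for this example, not an established standard.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Checksum of a file, read in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(input_paths, parameters):
    """Collect what is needed to trace a result back to its inputs and environment."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command": sys.argv,
        "parameters": parameters,
        "inputs": {str(p): file_sha256(p) for p in input_paths},
    }


def write_provenance(record, out_path):
    """Store the record as human-readable JSON next to the analysis output."""
    Path(out_path).write_text(json.dumps(record, indent=2, sort_keys=True))
```

A record like this does not make an analysis reproducible by itself, but it makes it possible to check later whether the same data and settings were used.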

What about generalization?

“The generalizability crisis”

Yarkoni, 2022

  • Makes the statistical argument for an overall challenge to generalizability.
  • In practice, reproducibility and generalization may be at odds!
  • Example: the Alogna et al. (2014) registered replication report.

What about robustness?

  • Seems understudied, to be honest
  • Might be a nice project for this course 😀

What is important?