I don't know very much about how reproducibility validation works. Is it the case that, if we assume p ≈ 0.05 and that all 50 original studies are perfect, we would expect the first iteration of reproducibility validation to fail for ~2 of the 50 studies?
Not necessarily 5%, no. The p-value threshold controls the false-positive rate, i.e. it constrains the likelihood that a seemingly significant result arose purely by chance. So if we were now handed the real ground truth, it's likely that ~5% of the original positive results (assuming the studies were properly conducted) would turn out to be unsubstantiated after all.
But you seem to be positing the reverse: a hypothetical case where we assume that the results of the original studies were in fact correct (i.e. ground truth), and we want a sort of false-negative rate, the chance that a fact would not be confirmed by a replication study, despite actually being true. That depends on several things about the replication attempt, such as sample size and methodology, summarized into the concept of statistical power: http://en.wikipedia.org/wiki/Statistical_power
The overall failure-to-replicate rate would involve both aspects, with four possible outcomes at varying likelihood: 1) a correct result that was confirmed; 2) a correct result that failed to be confirmed; 3) an incorrect result that was confirmed anyway; and 4) an incorrect result that failed to be confirmed. Obviously it would be ideal if #1 and #4 were much higher than #2 and #3.
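Those four outcomes can be tallied with a small sketch. The input numbers here are made up purely for illustration: a fraction of original results assumed correct, a chance that a correct result replicates (essentially the replication's power), and a chance that an incorrect result appears to replicate anyway (roughly the false-positive rate).

```python
# Tally the four replication outcomes from the comment above.
# All three input probabilities are illustrative assumptions.

def outcome_rates(frac_correct, p_rep_if_true, p_rep_if_false):
    """Return the expected fraction of studies landing in each outcome."""
    return {
        "1) correct, confirmed":       frac_correct * p_rep_if_true,
        "2) correct, not confirmed":   frac_correct * (1 - p_rep_if_true),
        "3) incorrect, confirmed":     (1 - frac_correct) * p_rep_if_false,
        "4) incorrect, not confirmed": (1 - frac_correct) * (1 - p_rep_if_false),
    }

# e.g. 80% of originals correct, 90% power, 5% false-positive rate
rates = outcome_rates(0.8, 0.9, 0.05)
for outcome, rate in rates.items():
    print(f"{outcome}: {rate:.3f}")
```

The four fractions always sum to 1, and the "ideal" situation described above corresponds to the mass concentrating in outcomes 1 and 4.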
> That depends on several things about the replication attempt, such as sample size and methodology, summarized into the concept of statistical power: http://en.wikipedia.org/wiki/Statistical_power
It's not just power; you also need prior odds if you want to make an unconditional estimate. This is pretty much the core of Ioannidis's famous paper http://www.plosmedicine.org/article/info:doi/10.1371/journal... - even after taking power into account, the replication rate could be anywhere from 0 to say 80% depending on the prior odds.
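The dependence on prior odds is easy to see in a short sketch. This computes the positive predictive value (the chance a significant result reflects a real effect) in the spirit of Ioannidis's argument; the alpha and power values are illustrative assumptions, not figures from the paper.

```python
# PPV = P(true effect | significant result), as a function of the
# prior probability that a tested hypothesis is true.
# alpha = 0.05 and power = 0.8 are assumed for illustration.

def ppv(prior, power, alpha=0.05):
    true_pos = power * prior          # real effects correctly detected
    false_pos = alpha * (1 - prior)   # null effects flagged by chance
    return true_pos / (true_pos + false_pos)

for prior in (0.5, 0.1, 0.01):
    print(f"prior = {prior:<5} -> PPV = {ppv(prior, power=0.8):.2f}")
```

With the same power, the PPV swings from ~0.94 down to ~0.14 as the prior drops from 0.5 to 0.01, which is why power alone can't tell you the expected replication rate.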
Thanks, very interesting. It was silly of me to assume that the p-value could be used in that way when, as you explain, it's not measuring or indicating that at all.
> it's likely that ~5% of the original positive results (if the studies were properly conducted) would turn out to be unsubstantiated after all.
Not quite. Assume P of the tests are truly positive, and N are truly negative.
We expect to see approximately 0.05 x N + A x P positives in the entire sample, where A is the statistical power, i.e. the probability of detecting a true effect (so the false-negative rate is 1 - A). So the fraction of observed positives that are actually false, and hence likely to be unsubstantiated, is 0.05 x N / (0.05 x N + A x P) - which need not be anywhere near 5%.
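Plugging hypothetical numbers into that expression makes the point concrete. The split between P and N and the power value below are made up for illustration:

```python
# Expected fraction of observed positives that are false, given
# P truly-positive effects, N truly-negative effects, significance
# threshold alpha, and power A. All inputs here are assumptions.

def false_positive_fraction(P, N, A=0.8, alpha=0.05):
    expected_positives = alpha * N + A * P
    return alpha * N / expected_positives

# If 40 of 50 tested effects are real and 10 are not:
print(false_positive_fraction(P=40, N=10))  # ~= 0.015
# If only 10 of 50 are real:
print(false_positive_fraction(P=10, N=40))  # = 0.2
```

So the "~5% unsubstantiated" intuition only holds for particular mixes of true and false hypotheses; the actual fraction depends heavily on P versus N.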
This is Bilal from Science Exchange, and I help with the Reproducibility Initiative as well.
I think that in general you're right, with the assumption that p=0.05. However, each of the studies being replicated is composed of many intricate experiments. We will be reproducing most of those experiments, so while the odds of a single validation experiment failing are certainly not zero, the chances of an entire paper's worth of reproduction experiments all being false negatives are much lower. All of the protocols and reagents used, and the data from the replication studies, will be publicly available when they are complete, so that everyone can see what we did and how we did it.
Oh, dear. This is very much not a good sign, since the answer to the grandparent's question is "no", as pointed out above, and anyone who knows statistics should see that immediately. (Checks Science Exchange site.) Okay, Bilal Mahmood is "customer support and outreach", so this is not necessarily fatal. Still worrisome.