Results

False positive risk calculations

This web app was written by Colin Longstaff and David Colquhoun, with help from Brendan Halpin, and, in ver 1.7, from Dr Will Parry, who fixed the default for nsamp2.

Statistical considerations

The question which we ask in refs [1-4, 6, 7] is as follows: if you observe a "significant" P value after doing a single unbiased experiment, what is the probability that your result is a false positive? An account of the precise assumptions that underlie the calculations is given in ref [7]. "False positive risk" (FPR) is defined here as the probability that a result which is "significant" at a specified P value is a false positive. It is defined and explained in refs [3, 7] and at 26' in ref [6]. The same quantity was called "false discovery rate" in refs [1] and [2], and "false positive rate" in earlier drafts of refs [3, 7]. The notation in this field is a mess, so it is important to check the definitions in each paper.

There are two different ways to calculate FPR. These are explained in detail in section 10 of ref [1] and, more carefully, in section 3 of ref [3]. They can be called the p-equals method and the p-less-than method. The latter definition is used most frequently (e.g. by Ioannidis and by Wacholder), but the former is more appropriate for answering our question. All three options give results calculated with both methods. The p-equals method gives a higher false positive risk, for any given P value, than the p-less-than method (see Fig 2 in ref [3]), but it is the appropriate way to answer the question.
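For readers who want to see the two methods side by side, the following is a minimal Python sketch (ours, not the app's own code) under the assumptions used here: two independent samples of equal size, normal errors with equal SDs, and a point alternative hypothesis. The function name and structure are illustrative only.

```python
# Minimal sketch (not the app's code) of the two ways of calculating FPR for a
# two-sided, two-sample t test, following the definitions in refs [1] and [3].
# Assumes equal sample sizes, equal true SDs, and a point alternative hypothesis.
import numpy as np
from scipy import stats

def fpr_both_methods(p_obs, prior, n_per_sample, effect_size):
    """Return (FPR by p-equals, FPR by p-less-than) for an observed two-sided P value."""
    df = 2 * (n_per_sample - 1)                      # degrees of freedom
    ncp = effect_size * np.sqrt(n_per_sample / 2)    # noncentrality parameter
    t_obs = stats.t.ppf(1 - p_obs / 2, df)           # |t| corresponding to p_obs

    # p-equals method: likelihood ratio for the exact observed P value.
    # Under H0 the P value is uniform (density 1); under H1 its density is
    # obtained from the noncentral t distribution by a change of variable.
    lr = (stats.nct.pdf(t_obs, df, ncp) + stats.nct.pdf(-t_obs, df, ncp)) \
         / (2 * stats.t.pdf(t_obs, df))
    fpr_p_equals = (1 - prior) / ((1 - prior) + prior * lr)

    # p-less-than method: everything with P <= p_obs counts as "significant",
    # so the FPR depends on the power at a significance level equal to p_obs.
    power = 1 - stats.nct.cdf(t_obs, df, ncp) + stats.nct.cdf(-t_obs, df, ncp)
    fpr_p_less_than = (1 - prior) * p_obs / ((1 - prior) * p_obs + prior * power)

    return fpr_p_equals, fpr_p_less_than

# With n = 16 per sample, effect size = 1 SD and prior = 0.5, an observed
# P = 0.05 gives an FPR of roughly 0.26 (p-equals) and roughly 0.06 (p-less-than),
# illustrating why the p-equals method gives the higher, and more relevant, figure.
print(fpr_both_methods(p_obs=0.05, prior=0.5, n_per_sample=16, effect_size=1.0))
```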

How to run calculations

Click on the calculations tab and choose which calculation to do by selecting one of the three radio buttons (top left). The input boxes that are appropriate for the chosen calculation will appear. There are three variables: the observed P value, the FPR, and the prior probability that the null hypothesis is false. The calculator will work out any one of these, given numbers for the other two. All three calculations also require the number of observations in each sample and the effect size, expressed as a multiple of the standard deviation of the observations (default value 1.0). The default number per sample is 16, which gives a power of 0.78 for P = 0.05 and effect size = 1 (see refs [1] and [3] for more details). Note that all that matters is the effect size expressed as a multiple of the standard deviation of the original observations (sometimes known as Cohen's d). The true mean of sample 1 is always 0 (null hypothesis), and the true mean of sample 2 is set to the normalised effect size, so the true standard deviation can always be set to 1 with no loss of generality.
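A quick simulation sketch (ours, not the app's code) of this set-up may make the normalisation clearer: sample 1 is drawn with true mean 0, sample 2 with true mean equal to the normalised effect size, and both with SD 1. With the defaults (n = 16 per sample, effect size = 1 SD), about 78% of simulated experiments give P < 0.05, matching the quoted power of 0.78.

```python
# Simulation sketch of the default set-up: sample 1 has true mean 0, sample 2
# has true mean equal to the normalised effect size, both with SD 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, effect_size, n_sim = 16, 1.0, 20000

sample1 = rng.normal(0.0, 1.0, size=(n_sim, n))            # null-hypothesis mean
sample2 = rng.normal(effect_size, 1.0, size=(n_sim, n))    # true effect of 1 SD
p_values = stats.ttest_ind(sample1, sample2, axis=1).pvalue

print(np.mean(p_values < 0.05))   # fraction "significant" at P < 0.05: roughly 0.78
```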

A real life example

Fully worked examples are given in section 8 of ref [7].
A study of transcranial electromagnetic stimulation, published in Science, concluded that it "improved associative memory performance", P = 0.043. If we assume that the experiment had adequate power (the sample size of 8 suggests that might be optimistic) then, in order to achieve a false positive risk of 5% when we observe P = 0.043, we would have to assume a prior probability of 0.85 that the effect on memory was genuine (found from radio button 1). Most people would think it was less than convincing to present an analysis based on the assumption that you were almost certain (probability 0.85) to be right before you did the experiment.
Another way to express the strength of the evidence provided by P = 0.043 is to note that it makes the existence of a real effect only 3.3 times as likely as the existence of no effect (likelihood ratio). This would correspond to a minimum false positive risk of 23% if we were willing to assume that non-specific electrical zapping of the brain was as likely as not to improve memory (prior probability of a real effect was 0.5) (found via radio button 3).
The radio button 2 option shows that in the most optimistic case (prior = 0.5), you need to have P = 0.008 to achieve an FPR of 5%. (Example from refs [3] and [7].)
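For anyone who wants to reproduce the arithmetic, the quoted figures follow (approximately) from the relation between FPR, the prior, and the likelihood ratio used in the p-equals approach. A hedged check, assuming a likelihood ratio of about 3.3 for P = 0.043:

```python
# Arithmetic check of the numbers quoted above, using the p-equals relation
# FPR = (1 - prior) / ((1 - prior) + prior * LR), where LR is the likelihood
# ratio in favour of a real effect (roughly 3.3 for P = 0.043 in this example).
lr = 3.3

# With a prior of 0.5, the minimum FPR is about 23%
prior = 0.5
fpr = (1 - prior) / ((1 - prior) + prior * lr)
print(round(fpr, 2))           # ~0.23

# Conversely, the prior needed to bring the FPR down to 5% is about 0.85
target_fpr = 0.05
prior_needed = (1 - target_fpr) / (1 - target_fpr + target_fpr * lr)
print(round(prior_needed, 2))  # ~0.85
```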

Matching power

Much the same results are found for FPR if the power is kept constant. This is explained and exemplified in section 5 and Figure 1 of ref [7].
For example, effect size = 1 and n = 16 gives power = 0.78. For an effect size of 0.5 SD, n = 61 gives similar power and also a similar FPR, and for an effect size of 0.2 SD a power of 0.78 requires n = 375, which again gives a similar FPR (see ref [7] for more details, and the check sketched below). So choose n so that the calculated power matches that of your experiment.
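A quick check, under the same assumptions as the sketches above, that these matched sample sizes give roughly the same power for a two-sided test at P = 0.05:

```python
# Check that the matched sample sizes quoted above all give power close to 0.78
# for a two-sided, two-sample t test at P = 0.05 (equal n and equal SDs assumed).
import numpy as np
from scipy import stats

for n, d in [(16, 1.0), (61, 0.5), (375, 0.2)]:
    df = 2 * (n - 1)
    ncp = d * np.sqrt(n / 2)            # noncentrality parameter
    t_crit = stats.t.ppf(0.975, df)     # two-sided critical value, alpha = 0.05
    power = 1 - stats.nct.cdf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
    print(f"n = {n:4d}, effect size = {d} SD, power = {power:.2f}")   # ~0.78 each
```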
There is a popular account of the logic involved in ref [4]. Ref [3] has, in section 9, a response to the recent 72-author paper by Benjamin et al. [5] on related topics. There is a more technical account of the assumptions in ref [7], and a defence of those assumptions in ref [8].

Versions

From ver 1.1 onwards, the effect size (expressed as a multiple of the standard deviation of the observations) can be entered. From ver 1.3 onwards, the values of power that are printed out are calculated for P = 0.05 and the specified effect size (expressed as a multiple of the standard deviation of the observations); in earlier versions they were calculated using the observed P value. Ver 1.4 has updated help notes. Ver 1.5 has updated help notes and references. Ver 1.6 is unchanged, apart from the default radio button selected at start-up, which is now button 3 rather than button 1. Ver 1.7 allows unequal sample sizes (but still assumes the same variance for both samples).

References

1. Colquhoun, D. (2014) An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science 1(3): 140216. doi: 10.1098/rsos.140216. Click for full text
2. Colquhoun, D. False discovery rates: the movie (now superseded by ref [6]). Click for YouTube
3. Colquhoun, D. (2017) The reproducibility of research and misinterpretation of P values. Royal Society Open Science 4(12). doi: 10.1098/rsos.171085. Click for full text
4. Colquhoun, D. (2016) The problem with p-values. Aeon Magazine. Click for full text
5. Benjamin, D. et al. (2017) Redefine Statistical Significance. PsyArXiv Preprints, July 22, 2017. Click for full text
6. Colquhoun, D. (2018) The false positive risk: a proposal concerning what to do about p-values (version 2) [talk based on that given at EvidenceLive, 2018]. Click for YouTube
7. Colquhoun, D. (2019) The false positive risk: a proposal concerning what to do about p values. American Statistician. Click for full text
8. Colquhoun, D. (2019b) A response to critiques of "The reproducibility of research and the misinterpretation of p-values". Royal Society Open Science. Click for full text

A list of all of DC's publications on p values can be found at Some papers about p values.