The product-moment correlation is a central statistic in psychological research including meta-analysis. Unfortunately, it has a rather complex sampling distribution, which leads to sample correlations that are biased indicators of the respective population correlations. Moreover, there seems to be some uncertainty about how to properly calculate the standard error of these correlations. Because no simple analytical solution exists, several approximations have been introduced previously. This note briefly summarizes 10 different ways to calculate the standard error of the Pearson correlation. Moreover, a simulation study on the accuracy of these estimators compared their relative percentage biases for different population correlations and sample sizes. The results showed that all estimators were largely unbiased for sample sizes of at least 40. For smaller samples, a simple approximation by Bonett (2008) led to the least biased results. Based on these results, it is recommended to use the expression \(\left( 1 - r^{2} \right)/\sqrt{N - 3}\) for the calculation of the standard error of the Pearson correlation.
A Brief Note on the Standard Error of the Pearson Correlation
The product-moment correlation (Pearson, 1896) is an essential statistic to quantify the linear association between two variables. It is not only an elementary component of descriptive and exploratory research but is also used as a measure of effect size to facilitate interpretations and comparisons of results between studies. Therefore, it also plays a pivotal role in research syntheses including quantitative meta-analyses (e.g., Cheung, 2015; Hafdahl, 2008). Although it is well known among statisticians that the sample correlation r represents a biased estimator of the population correlation ρ (e.g., de Winter et al., 2016; Hedges, 1989; Olkin & Pratt, 1958; Shieh, 2010; Zimmerman et al., 2003), applied researchers seldom adopt unbiased estimators of ρ because the bias in r is often considered increasingly negligible with larger sample sizes (Shieh, 2010). Moreover, there seems to be some uncertainty regarding the calculation of the standard error of r, which is often used in meta-analytic investigations to form precision weights and, thus, determines the contribution of each correlation coefficient to the pooled estimate (e.g., Bonett, 2008; Hafdahl, 2008). Because r follows a rather complex sampling distribution, the standard error is rarely calculated from the exact distribution. Rather, various large-sample approximations have been suggested in the literature (e.g., Bonett, 2008; Hedges, 1989; Hotelling, 1953). However, if these approximations represent biased estimators of the standard error, their use in meta-analyses might involuntarily distort effect size estimates and their interpretation. Therefore, this brief note reviews different approaches to calculating the standard error of r and demonstrates their biases for different values of ρ and sample sizes.
The Sampling Distribution of the Pearson Correlation
Let \(X = \left\{ x_{1}, x_{2}, \ldots, x_{N} \right\}\) and \(Y = \left\{ y_{1}, y_{2}, \ldots, y_{N} \right\}\) represent two bivariate normally distributed variables with population means \(\mu_{X}\) and \(\mu_{Y}\), variances \(\sigma_{X}^{2}\) and \(\sigma_{Y}^{2}\), and correlation ρ that are measured in a sample of N elements. Then, the Pearson product-moment correlation is given by
\[ r = \frac{\sum_{i=1}^{N} \left( x_{i} - \bar{x} \right) \cdot \left( y_{i} - \bar{y} \right)}{\sqrt{\sum_{i=1}^{N} \left( x_{i} - \bar{x} \right)^{2}} \cdot \sqrt{\sum_{i=1}^{N} \left( y_{i} - \bar{y} \right)^{2}}} \tag{1} \]
with \(\bar{x}\) and \(\bar{y}\) as the sample means of X and Y. The probability density function of r was derived by Hotelling (1951, 1953), based on prior work by Fisher (1915, 1921), and follows a rather curious shape,
\[ f(r) = \frac{(N - 2) \cdot \Gamma(N - 1) \cdot \left( 1 - \rho^{2} \right)^{\frac{N - 1}{2}} \cdot \left( 1 - r^{2} \right)^{\frac{N - 4}{2}}}{\sqrt{2\pi} \cdot \Gamma\left( N - \tfrac{1}{2} \right) \cdot \left( 1 - \rho r \right)^{N - \frac{3}{2}}} \cdot {}_{2}F_{1}\left( \tfrac{1}{2}, \tfrac{1}{2}; N - \tfrac{1}{2}; \tfrac{1 + \rho r}{2} \right) \tag{2} \]
involving the gamma function Γ(x) = (x − 1)! and Gauss’s hypergeometric function \({}_{2}F_{1}(a, b; c; z) = \sum_{k=0}^{\infty} \frac{(a)_{k} \cdot (b)_{k}}{(c)_{k}} \cdot \frac{z^{k}}{k!}\) with \((\cdot)_{k}\) denoting the rising factorial (Pochhammer symbol). For demonstration, Figure 1 visualizes these distributions for two correlations, ρ = .20 and ρ = .80. Whereas the former represents a small to medium effect that is typically observed in different areas of psychology (see Bosco et al., 2015; Gignac & Szodorai, 2016; Lovakov & Agadullina, 2021), the latter reflects a rather large effect that is often limited to specific domains such as competence research (e.g., Gnambs & Lockl, 2023). The distributions in Figure 1 highlight that the sample correlation r follows a rather asymmetrical distribution. In the present case, the modes of these distributions are larger than the respective ρ, thus resulting in a negative skew. Generally, the skew is more pronounced for larger |ρ| because correlations are bounded at −1 and +1. In contrast, smaller correlations often exhibit approximately normal distributions. Moreover, the asymmetry of the sampling distribution is strongly affected by the sample size. It is stronger for small sample sizes, while larger samples result in more symmetric distributions. In the presented examples, a sample size of N = 50 already leads to an approximately normal sampling distribution, even for ρ = .80.
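To make the density in (2) concrete, the following minimal sketch evaluates it numerically in Python, assuming SciPy is available; the function name r_density and the log-scale handling of the gamma ratio are illustrative choices, not part of the original sources.

```python
import numpy as np
from scipy.special import gammaln, hyp2f1
from scipy.integrate import quad

def r_density(r, rho, N):
    """Sampling density of the Pearson correlation r (Hotelling's form of (2))
    for a bivariate normal population with correlation rho and sample size N."""
    # gamma-function ratio on the log scale to avoid overflow for large N
    log_c = (np.log(N - 2) + gammaln(N - 1)
             - 0.5 * np.log(2 * np.pi) - gammaln(N - 0.5))
    return (np.exp(log_c)
            * (1 - rho**2) ** ((N - 1) / 2)
            * (1 - r**2) ** ((N - 4) / 2)
            / (1 - rho * r) ** (N - 1.5)
            * hyp2f1(0.5, 0.5, N - 0.5, (1 + rho * r) / 2))

# sanity check: the density integrates to 1, here for rho = .80 and N = 10
print(quad(lambda r: r_density(r, 0.80, 10), -1, 1)[0])  # ~1.0
```

Plotting r_density over r ∈ (−1, 1) for ρ = .20 and ρ = .80 reproduces the asymmetry described above.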
The Estimator of the Population Correlation
Because of its skewed sampling distribution, the sample correlation r is a biased estimator of the population correlation ρ (e.g., Hedges, 1989; Olkin & Pratt, 1958). As pointed out by Zimmerman and colleagues (2003), this bias can reach up to .03 or .04 in many applied situations. Therefore, Olkin and Pratt (1958) derived an estimator of ρ from (2) that corrects for the bias in r as
\[ \hat{\rho} \approx r \cdot \left( 1 + \frac{1 - r^{2}}{2 \cdot (N - 4)} \right) \tag{3} \]
with ≈ indicating an approximation that is accurate within ±.01 for N ≥ 18. Monte Carlo simulations confirmed that the approximate estimator in (3) is largely unbiased for different sample sizes and population correlations, whereas r tends to underestimate ρ, particularly for medium to large correlations and small sample sizes (Shieh, 2010). However, r tends to exhibit a higher precision for |ρ| < .60, as reflected by the mean squared error. Therefore, in these cases, the sample correlation r might, despite its bias, serve as a meaningful estimator of the strength of the association between two variables.
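For illustration, a small sketch (again Python with SciPy; the function names are illustrative) contrasts the exact Olkin–Pratt estimator in its hypergeometric form, which also appears as q in (15), with the approximation in (3):

```python
from scipy.special import hyp2f1

def olkin_pratt(r, N):
    """Exact Olkin-Pratt estimator of rho (hypergeometric form, cf. q in (15))."""
    return r * hyp2f1(0.5, 0.5, (N - 2) / 2, 1 - r**2)

def olkin_pratt_approx(r, N):
    """Approximate estimator as in (3)."""
    return r * (1 + (1 - r**2) / (2 * (N - 4)))

print(olkin_pratt(0.50, 20))         # ~.511
print(olkin_pratt_approx(0.50, 20))  # ~.512
```

Both corrections pull the sample value r = .50 upward, consistent with the downward bias of r described above.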
Estimators of the Sampling Variance
The sampling variance of r was derived by Hotelling (1951, 1953) using the moments of r about its mean as
\[ \sigma_{r}^{2} = \int_{-1}^{1} \left( r - \operatorname{E}(r) \right)^{2} \cdot f(r) \, dr, \quad \text{with} \quad \operatorname{E}(r) = \int_{-1}^{1} r \cdot f(r) \, dr \tag{4} \]
and f(r) as the density in (2).
Consequently, the standard error of r is \(\sigma_{r} = \sqrt{\sigma_{r}^{2}}\). Because (4) requires integrating over the hypergeometric function, no simple analytic solution exists. Therefore, various approximations have been suggested in the literature that often take the general form
\[ \sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{df}} \cdot \left( 1 + O(n) \right) \tag{5} \]
with df representing the degrees of freedom and O(n) a series of terms of order n that approximates the integral over the hypergeometric function in (4). For n → ∞, the estimator in (5) approaches (4). An overview of 10 approximations derived in this way is given in Table 1.
Table 1

| Source | Standard error | Eq. |
|---|---|---|
| Pearson (1896, p. 266) | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N \cdot \left( 1 + \rho^{2} \right)}}\) | (6) |
| Pearson & Filon (1898, p. 174) | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N}}\) | (7) |
| Soper (1913, p. 107) | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N - 1}}\) | (8) |
| Bonett (2008, p. 174) | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N - 3}}\) | (9) |
| Soper (1913, p. 107) | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N}} \cdot \left( 1 + \frac{1 + 5.5 \cdot \rho^{2}}{2 \cdot N} \right)\) | (10) |
| Soper (1913, p. 107) | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N - 1}} \cdot \left( 1 + \frac{11 \cdot \rho^{2}}{4 \cdot (N - 1)} \right)\) | (11) |
| Hotelling (1953, p. 212) | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N - 1}} \cdot \left( 1 + \frac{11 \cdot \rho^{2}}{4 \cdot (N - 1)} + \frac{-192 \cdot \rho^{2} + 479 \cdot \rho^{4}}{32 \cdot (N - 1)^{2}} \right)\) | (12) |
| Ghosh (1966, p. 260)ᵃ | \(\sigma_{r} = \frac{1 - \rho^{2}}{\sqrt{N + 6}} \cdot \left( 1 + \frac{14 + 11 \cdot \rho^{2}}{2 \cdot (N + 6)} + \ldots \right)^{0.5}\) | (13) |
| Hedges (1989, p. 474)ᵇ | \(\sigma_{r} = \sqrt{q^{2} - Q}\) with \(q = r + \frac{r \cdot \left( 1 - r^{2} \right)}{2 \cdot (N - 3)}\) and \(Q = 1 - \frac{N - 3}{N - 2} \cdot \left( \frac{2 - 4 \cdot r^{2} + 2 \cdot r^{4}}{N} + \frac{8 - 24 \cdot r^{2}}{N \cdot (N + 2)} + \frac{48}{N \cdot (N + 2) \cdot (N + 4)} \right)\) | (14) |
| Hedges (1989, p. 477) | \(\sigma_{r} = \sqrt{q^{2} - Q}\) with \(q = r \cdot {}_{2}F_{1}\left( \frac{1}{2}, \frac{1}{2}; \frac{N - 2}{2}; 1 - r^{2} \right)\) and \(Q = 1 - \frac{(N - 3) \cdot \left( 1 - r^{2} \right) \cdot {}_{2}F_{1}\left( 1, 1; \frac{N}{2}; 1 - r^{2} \right)}{N - 2}\) | (15) |
ᵃ Ghosh (1966) presented an approximation to the order of 6. Because this resulted in rather complex terms, for ease of presentation (13) reports only the first ones. ᵇ Hedges’ (1989) approximation of *q* (Equation 14) seems to mistakenly use a value of 3 in the denominator instead of 4, thus adopting *N* as the degrees of freedom, whereas the degrees of freedom were *N* − 1 in the remainder of the paper. Note. ρ = population correlation, r = sample correlation, N = sample size, \({}_{2}F_{1}\) = Gauss hypergeometric function.
Originally, Pearson (1896) proposed df = N · (1 + ρ²), resulting in (6). However, he soon noticed that this referred to a rather special case where the variances of X and Y are known (see Pearson & Filon, 1898) and therefore revised the expression to df = N as in (7). However, the expression in (8) with df = N − 1 provides an even more accurate estimator of \(\sigma_{r}\) when the means in (1) are estimated from the sample, because (7) implies that the means of the two variables in (1) are known, that is, \(\bar{x} = \mu_{X}\) and \(\bar{y} = \mu_{Y}\) (Olkin & Pratt, 1958). Moreover, simulation studies led to Bonett’s (2008) suggestion of df = N − 3 in (9). The formulas in (7) to (9) are frequently used in applied research because they are easy to calculate and give good approximations of (4). However, they are biased to some degree because they ignore O(n) in the approximation of the integrated hypergeometric function.
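As a quick illustration of how much the choice of df matters in small samples, the following sketch evaluates (7) to (9) for hypothetical values ρ = .80 and N = 20 (the helper name se_simple is illustrative):

```python
import numpy as np

def se_simple(rho, df):
    """Simple approximations (7)-(9): (1 - rho^2) / sqrt(df)."""
    return (1 - rho**2) / np.sqrt(df)

rho, N = 0.80, 20
for label, df in [("(7), df = N", N), ("(8), df = N - 1", N - 1), ("(9), df = N - 3", N - 3)]:
    print(label, round(se_simple(rho, df), 4))  # .0805, .0826, .0873
```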
Several authors sought analytic solutions for O(n) and different n. Soper (1913) derived O(1) for df = N and df = N − 1, resulting in (10) and (11). The latter was later extended by Hotelling (1953) to O(2) as in (12). Moreover, Ghosh (1966) independently presented an approximation of O(6), resulting in (13). The formulas in (10) to (13) are expected to provide better estimators of (4) because of their closer approximation of the sampling distribution of r. However, they are hardly used today because researchers who want to improve the accuracy of the estimated standard errors compared to (7) to (9) can easily do so by directly evaluating the integral in (4) using modern numerical integration routines implemented in standard statistical software.
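Such a direct evaluation might look as follows; this is a sketch reusing the r_density function from above, with scipy.integrate.quad serving as the numerical integration routine:

```python
import numpy as np
from scipy.integrate import quad

def se_exact(rho, N):
    """Standard error of r obtained by numerically evaluating the moments
    in (4) with respect to the density in (2)."""
    m1 = quad(lambda r: r * r_density(r, rho, N), -1, 1)[0]     # E(r)
    m2 = quad(lambda r: r**2 * r_density(r, rho, N), -1, 1)[0]  # E(r^2)
    return np.sqrt(m2 - m1**2)

print(se_exact(0.80, 20))  # ~.08, close to Bonett's approximation (9)
```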
Hedges (1989) presented an alternative approach in the context of meta-analytic studies based on a variance decomposition derived from classical test theory. Here, the observed variance in study i is decomposed into the variance of the distribution from which the study-specific population correlations are sampled and sampling error. Consequently, an estimator of the standard error can be obtained by using unbiased estimates of ρ as given in (3) and of the squared multiple correlation ρ² as derived by Olkin and Pratt (1958). This results in the approximate and exact expressions of the unbiased standard error given in (14) and (15), respectively.
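A sketch of the exact expression (15), again assuming SciPy’s hyp2f1 (the function name is illustrative):

```python
import numpy as np
from scipy.special import hyp2f1

def se_hedges(r, N):
    """Hedges' (1989) exact expression (15): sigma_r = sqrt(q^2 - Q)."""
    q = r * hyp2f1(0.5, 0.5, (N - 2) / 2, 1 - r**2)
    Q = 1 - (N - 3) * (1 - r**2) * hyp2f1(1, 1, N / 2, 1 - r**2) / (N - 2)
    return np.sqrt(q**2 - Q)

print(se_hedges(0.50, 20))  # ~.18
```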
Finally, it might be helpful to illuminate a potential misconception arising from the fact that the bivariate correlation can also be expressed in terms of the standardized linear regression coefficient. The studentized regression coefficient \(t = (b - \beta)/\sigma_{b}\), with b and β as the estimated and true regression weights, respectively, and \(\sigma_{b}\) as the standard error of b, follows a t distribution with df = N − 2. Under the null hypothesis of β = 0, this reduces to \(t = r \cdot \sqrt{N - 2}/\sqrt{1 - R^{2}}\) with R² as the coefficient of determination (see Pugh & Winslow, 1966, for the detailed derivation). Because for a single predictor R² = r², it might be tempting to mistake the term \(\sqrt{\left( 1 - R^{2} \right)/(N - 2)}\) for the standard error of r. However, this is only true for the special case of ρ = 0, whereas this expression leads to increasingly biased estimators of \(\sigma_{r}\) for larger |ρ|.
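The divergence is easy to demonstrate numerically; the sketch below compares the regression-based term with Bonett’s approximation (9) for hypothetical values of r and N = 50:

```python
import numpy as np

N = 50
for r in (0.0, 0.4, 0.8):
    se_slope = np.sqrt((1 - r**2) / (N - 2))  # SE of the standardized slope
    se_bonett = (1 - r**2) / np.sqrt(N - 3)   # approximation (9)
    print(r, round(se_slope, 3), round(se_bonett, 3))
# r = 0.0: .144 vs .146 -- nearly identical
# r = 0.8: .087 vs .053 -- the regression term overstates the SE
```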
Comparison of Estimators of the Standard Error
As highlighted in Table 1, different approximations have been suggested for the estimation of the standard error of r. However, little is known about which of these estimators might yield noteworthy benefits in substantive research. Therefore, the accuracy and efficiency of these estimators were compared for different population correlations and sample sizes.
Methods
The comparison varied the population correlation between ρ = .00 and .90 (in increments of .10) and the sample size N from 10 to 50 (in increments of 10) and 100. For each condition, six estimators of the standard error of r were examined: the approximations following (7) in Pearson and Filon (1898), (8) in Soper (1913), and (9) in Bonett (2008); the direct evaluation of the integral in (4) by Hotelling (1953) with ρ estimated by r; and the exact expression (15) in Hedges (1989). Moreover, the standard error of a regression coefficient, \(\sqrt{\left( 1 - R^{2} \right)/(N - 2)}\), was also derived to demonstrate its inadequacy for the correlation coefficient. The approximations in (10) to (13) were not considered because they are rarely used in practice and, more importantly, are superseded by the direct evaluation of the integral in (4). The performance of these estimators was compared using the relative percentage bias (Hoogland & Boomsma, 1998), which is given as \(\%Bias = 100 \cdot \left( \operatorname{E}\left( \hat{\sigma}_{r} \right) - \sigma_{r} \right)/\sigma_{r}\). Values of %Bias less than 5% were considered negligible. The true standard error \(\sigma_{r}\) was calculated following (4) using the population correlation ρ from the data-generating process. Moreover, the efficiency of the estimators was studied using the root mean squared error, \(RMSE = \sqrt{\operatorname{E}\left\lbrack \left( \hat{\sigma}_{r} - \sigma_{r} \right)^{2} \right\rbrack}\). Because the RMSE can be decomposed into the squared bias and the variance of \(\hat{\sigma}_{r}\), it represents a trade-off between the two components. Thus, an estimator might be more biased but at the same time also more efficient if it has a smaller variance.¹ The bias and RMSE were computed by numerical integration using adaptive Simpson’s quadrature with respect to r, that is, \(\operatorname{E}\left( \hat{\sigma}_{r} \right) = \int_{-1}^{1} \hat{\sigma}_{r}(r) \cdot f(r) \, dr\) and \(RMSE = \sqrt{\int_{-1}^{1} \left( \hat{\sigma}_{r}(r) - \sigma_{r} \right)^{2} \cdot f(r) \, dr}\).
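The following sketch reproduces this computation for a single condition, reusing r_density and se_exact from above; SciPy’s adaptive quadrature stands in for the adaptive Simpson rule, and the function name pct_bias is illustrative:

```python
import numpy as np
from scipy.integrate import quad

def pct_bias(se_estimator, rho, N):
    """Relative percentage bias: 100 * (E[se_hat] - sigma_r) / sigma_r,
    with the expectation taken over the sampling density of r."""
    sigma_r = se_exact(rho, N)  # true SE following (4)
    e_se = quad(lambda r: se_estimator(r, N) * r_density(r, rho, N), -1, 1)[0]
    return 100 * (e_se - sigma_r) / sigma_r

# e.g., Bonett's approximation (9) at rho = .80 and N = 10
print(pct_bias(lambda r, N: (1 - r**2) / np.sqrt(N - 3), 0.80, 10))
```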
Results and Discussion
The relative bias of the different estimators of \(\sigma_{r}\) for different sample sizes is summarized in Figure 2. These results show little difference between the compared estimators for sample sizes of N = 40 or larger. Except for the standard error of the regression coefficient, the approximations of the standard error of the correlation yielded largely unbiased estimates. In contrast, at small sample sizes such as N = 10 or 20, more pronounced differences were observed. Estimators using N (Pearson & Filon, 1898) or N − 1 (Soper, 1913) as degrees of freedom resulted in negative relative biases that increased for larger population correlations. In contrast, the estimator by Bonett (2008) with N − 3 degrees of freedom resulted in unbiased estimates of the standard error across a wide range of correlations; only for very large correlations was a slight negative bias observed. Similarly, an evaluation of the integral in (4) led to comparably unbiased estimates for sample sizes of at least 20. However, in extremely small samples it was less precise than the approximation by Bonett (2008), presumably because (4) repeatedly uses the population correlation, which is estimated rather imprecisely in small samples. Interestingly, the approximation by Hedges (1989) was only unbiased for correlations up to about .60; larger correlations resulted in a slightly negative bias. These results also emphasize that, independent of the sample size, the standard error of the regression coefficient yielded unbiased results only for population correlations close to 0. For larger correlations, the relative percentage bias increased up to 125%. Thus, mistakenly using this standard error as an indicator of precision might result in severely distorted meta-analytic summaries of product-moment correlations.
The root mean squared error showed only marginal differences between the compared estimators (see Figure 2). For samples of N = 50 or N = 100, the RMSE fell below .001 in all conditions. Although it was slightly larger for smaller sample sizes, the RMSE did not indicate pronounced differences in the efficiency of the studied estimators. Except for N = 10, the different estimators resulted in comparable RMSEs. But again, the standard error of the regression coefficient was less efficient at larger population correlations. Together, these results indicate that the simple approximation by Bonett (2008) results in the least biased estimates of the standard error for different values of ρ and sample sizes. More complex estimators that either require the evaluation of the integral in (4) or rely on the hypergeometric function, such as (15), do not seem to provide noteworthy benefits for the accuracy of the standard error.
Conclusion
Despite its popularity in applied research, the distributional properties of the sample correlation are often not well understood or simply neglected. In particular, the calculation of the standard error of the Pearson correlation remains challenging because the complex sampling distribution of r does not permit a simple analytical solution. Therefore, various approximations are currently used in substantive research. Although some of the more complex expressions in Table 1 have only historical value today because the integration of complex functions became substantially easier with modern computers, it was unclear whether the choice between the simpler approaches might matter. Therefore, the simulation evaluated the accuracy of these estimators for different population correlations and sample sizes. The respective findings suggest that the least biased estimator used the expression \(\left( 1 - r^{2} \right)/\sqrt{N - 3}\) (Bonett, 2008). Particularly for larger population correlations in small samples, this estimator should result in more precise standard errors. However, it needs to be emphasized that differences between estimators become negligible as sample sizes increase. At sample sizes typical for psychology, which often exceed 50 or 100 respondents, the choice of the estimator is unlikely to matter substantially. Thus, the estimator in (8) that is currently often used in practice should usually be acceptable in many applied situations as well. However, further research is needed to identify specific conditions under which biased estimators might yield non-negligible consequences, such as in meta-analytic applications with small samples.
Author Contributions
TG conceived the study, conducted the analyses, and wrote the manuscript.
Funding
The publication of this article was funded by the Open Access Fund of the Leibniz Association.
Competing Interests
The author has no competing interests to declare that are relevant to the content of this article.
Code Availability
The computer code for the simulation is available at https://osf.io/m6srf.
Footnotes
1. The simulation only considered correlations ρ ≥ 0 because the bias for negative correlations is simply the opposite of the bias for positive correlations, that is, \(\%Bias(-\rho) = -\%Bias(\rho)\), while the root mean squared error is identical in both cases, that is, \(RMSE(-\rho) = RMSE(\rho)\).