Whenever parameter estimates are uncertain or observations are contaminated by measurement error, the Pearson correlation coefficient can severely underestimate the true strength of an association. Various approaches exist for inferring the correlation in the presence of estimation uncertainty and measurement error, but none are routinely applied in psychological research. Here we focus on a Bayesian hierarchical model proposed by Behseta, Berdyyeva, Olson, and Kass (2009) that allows researchers to infer the underlying correlation between error-contaminated observations. We show that this approach may be also applied to obtain the underlying correlation between uncertain parameter estimates as well as the correlation between uncertain parameter estimates and noisy observations. We illustrate the Bayesian modeling of correlations with two empirical data sets; in each data set, we first infer the posterior distribution of the underlying correlation and then compute Bayes factors to quantify the evidence that the data provide for the presence of an association.

## Introduction

Formal mathematical models are useful tools for analyzing data obtained from psychological experiments (e.g., Busemeyer & Diederich, 2010; Forstmann & Wagenmakers, 2015; M. D. Lee & Wagenmakers, 2013; Levine, 2000; Lewandowsky & Farrell, 2010). These models come in many flavors, ranging from simple statistical distributions to sophisticated cognitive process models. For example, in the field of response-time modeling, a simple statistical model is the ex-Gaussian distribution, while a sophisticated process model is the diffusion model (Ratcliff, 1978), a popular sequential sampling model of choice response times. Regardless of its degree of sophistication, the general goal of mathematical modeling is to capture regularities in the data using parameters that represent separate statistical components or distinct psychological variables.

Typically, investigators estimate model parameters for each participant separately, for instance with maximum likelihood (e.g., Myung, 2003; Ly, Marsman, Verhagen, Grasman, & Wagenmakers, in press) or Bayesian methods (e.g., M. D. Lee & Wagenmakers, 2013). Once the parameter estimates have been obtained, researchers are sometimes interested in assessing the association between parameters, or between parameters and other observed variables. For example, a researcher may fit the diffusion model to the individual data sets obtained from a lexical decision task and compute the correlation between parameter estimates that reflect participants’ response caution and parameter estimates that quantify participants’ rate of information accumulation (e.g., drift rate). Alternatively, a researcher may be interested in computing the correlation between estimates of drift rate and measurements of general intelligence.

The Pearson product-moment correlation coefficient is a popular measure of the linear relationship between two variables (for Bayesian solutions, see Ly, Verhagen, & Wagenmakers, 2016a, 2016b; Ly, Marsman, & Wagenmakers, 2015). Its popularity in psychology is illustrated by the fact that 42% of the 67 articles in the 2012 volume of the *Journal of Experimental Psychology: General* (JEP:G) report at least one Pearson correlation coefficient, with 152 correlations in total, and an average of more than *5* correlations per article.^{1} Despite its popularity, psychology as a field seems generally unaware that the correlation coefficient can severely underestimate the true strength of the association in the presence of measurement error or estimation uncertainty.

### Measurement Error

It is generally recognized that most—if not all—psychological constructs are measured imperfectly. It is also well-known among statisticians that measurement error decreases the correlation coefficient between two sets of observations (e.g., Charles, 2005; Spearman, 1904). Although various approaches are available to obtain correlation coefficients that take into account measurement error, none are routinely applied in psychology. Indeed, out of the 28 JEP:G articles in the 2012 volume that reported one or more correlations, only one acknowledged the deleterious effects of measurement error and corrected the observed correlation.

Attempts to remedy the problem of the attenuation of the correlation date back to Spearman (1904), who proposed to correct the correlation coefficient using the reliability with which the observations were obtained. Spearman’s attenuation formula is related to errors-in-variables models, which are extensions to standard regression models that aim to correct the bias in parameter estimates that results from measurement error (e.g., Buonaccorsi, 2010; Cheng & Van Ness, 1999; Clayton, 1992; Fuller, 1987; Westfall & Yarkoni, 2016; for Bayesian solutions, see Congdon, 2006; Gilks, Richardson, & Spiegelhalter, 1996; Gustafson, 2004; Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2012; Muff, Riebler, Held, Rue, & Saner, 2015; Richardson & Gilks, 1993). If both the criterion and the response variables are assumed to be measured with noise, Spearman’s method and the correction within standard linear regression models result in the same disattenuation (see Behseta et al., 2009, Appendix). Although these solutions take into account uncertainty that results from measurement noise, they do not explicitly address uncertainty that results from estimating model parameters from limited data.

### Estimation Uncertainty

It is also generally recognized that if only limited data are available, model parameters are typically estimated with some degree of uncertainty. It is, however, rarely acknowledged that the correlation coefficient computed between two sets of uncertain parameter estimates can severely underestimate the true strength of the association between the parameters. This widespread neglect is puzzling because various approaches are available to obtain correlation coefficients that take into account the uncertainty of the parameter estimates. Monte Carlo methods that deal with the adverse consequences of estimation uncertainty in cognitive models have been described by Ratcliff and Strayer (2014). The Bayesian estimation of correlations between parameters in hierarchical regression models has been illustrated by Gelman and Hill (2007, p. 279). Bayesian solutions for inferring the correlation in the context of hierarchical cognitive models have been outlined by Klauer (2010), Matzke, Dolan, Batchelder, and Wagenmakers (2015), Rouder, Lu, Morey, Sun, and Speckman (2008), and Rouder et al. (2007). Although these solutions take into account uncertainty that results from parameter estimation, they do not explicitly address uncertainty that results from the measurement error associated with observed covariates.

### Measurement Error and Estimation Uncertainty: A Bayesian Solution

Methods that remedy the attenuation of the correlation typically target uncertainty that results from either imperfect measurements or parameter estimation, but do not deal with both sources of uncertainty simultaneously. We attempt to address this limitation and describe a Bayesian method that allows researchers to infer the underlying correlation between (1) two sets of error-contaminated observations; (2) two sets of uncertain parameter estimates; and (3) a set of uncertain parameter estimates and a set of error-contaminated observations. Our approach is based on the Bayesian hierarchical method proposed by Behseta et al. (2009) that explicitly models measurement error, and so can be conceived as a principled method for “correcting” the correlation for the presence of noise in the data. Our approach extends that of Behseta et al. so that correlations involving uncertain parameter estimates can be inferred.

Our method for modeling correlations involving parameter estimates comprises of two-steps: In the first step, we obtain posterior distributions for the model parameters of each participant. In the second step, we rely on hierarchical modeling to adjust the correlation for the uncertainty of the parameter estimates. As a result of the hierarchical formulation, the point estimates obtained in the first step are shrunk toward their corresponding group mean. The degree of shrinkage is determined by the relative uncertainty of the estimates. As a result, the correlation computed using the shrunken estimates is automatically adjusted for the additional source of variability that results from imperfect parameter estimation. The benefits of this adjustment are twofold. First, as the method infers the likely value of the correlation in the absence of estimation uncertainty, it typically increases the absolute value of the correlation. Second, as the method properly accounts for the precision of the parameter estimates, it increases the uncertainty of the correlation and guards against overconfident conclusions about the relationship between noisy parameter estimates.

Bayesian hierarchical modeling also allows for the *simultaneous* estimation of the participant-level model parameters and their correlations (e.g., Gelman & Hill, 2007). In this approach, the individual parameters are assumed to follow a bivariate (or multivariate) normal distribution that is described by the group-level means and a variance-covariance matrix. The group-level and the participant-level parameters are estimated simultaneously from the data, where the group-level bivariate normal distribution acts as a prior to adjust extreme individual estimates to more moderate values. As a result of shrinkage of the individual parameters, the correlation automatically takes into account the uncertainty of the participant-level estimates. Note that the two-step approach presented in this article also relies on hierarchical modeling, but, in contrast to simulations hierarchical modeling where the model parameters and the covariance structure are estimated jointly, our framework relies on a modular approach that uses the hierarchical structure only in the second step.

Our two-step procedure and the simultaneous hierarchical framework assume different generative models for the data and as such constitute different approaches to modeling correlations. The two-step procedure assumes that the individual model parameters that generated the data are uncorrelated a priori. The simultaneous framework assumes a priori correlated parameters and automatically uses this prior information during the estimation of the individual parameters. We see merits in both approaches. The two-step approach is easy-to-use and can be applied when only point estimates—from Bayesian or non-Bayesian procedures—and the corresponding estimation uncertainties are available. Moreover, the two-step approach can be useful in exploratory analyses of the correlation between model parameters and external observations (see also Ly, Boehm, et al., in press). The simultaneous modeling framework has the potential to correct for bias in the individual parameter estimates that may result from fitting a simpler but unrealistic model with a priori uncorrelated parameters to scarce data. However, in the context of theory-laden models with meaningful priors, one should consider whether the prior set-up for the multivariate normal distribution used in the simultaneous framework adequately approximates the desired prior set-up for the cognitive model in question (M. D. Lee & Vanpaemel, in press). Note that the two-step and simultaneous modeling frameworks yield similar results when the prior has little or no influence on the individual posterior distributions, such as in situations with a large number of observations per participant. Of course, if the individual parameters are estimated accurately and precisely, the attenuation of the correlation tends to be negligible.

The outline of this article is as follows. First, we illustrate the consequences of measurement error for the computation of the correlation and show that a similar problem applies to correlations derived between uncertain parameter estimates. Second, we describe a Bayesian hierarchical approach for inferring the underlying correlation in the presence of measurement error (Behseta et al., 2009) and show that the same method can also aid the recovery of the underlying correlation between two sets of parameter estimates. Third, we illustrate our approach with two empirical data sets: one focusing on the correlation between parameters of cumulative prospect theory (Tversky & Kahneman, 1992), and the other focusing on the correlation between general intelligence and the drift rate parameter of the diffusion model (Ratcliff, 1978).

## Attenuation of the Correlation

First we illustrate the deleterious consequences of measurement error for the computation of the correlation. We then show that a similar problem applies to correlations computed between two sets of parameter estimates that are obtained from scarce data.

### The Consequences of Measurement Error

Let *θ* and *β* be unobserved random variables and let $\theta ^$ and $\beta ^$ be the observed, error-contaminated measurements:

and

where *ϵ _{θ}* and

*ϵ*are the measurement errors associated with

_{β}*θ*and

*β*, respectively. The measurement errors are uncorrelated with

*θ*and

*β*and with each other, and are assumed to be normally distributed with variance $\sigma \u03f5\theta 2$ and $\sigma \u03f5\beta 2$:

and

so that

and

The correlation between the *unobserved* variables *θ* and *β* with variances $\sigma \theta 2$ and $\sigma \beta 2$, respectively, is given by:

The correlation between the *observed* variables $\theta ^$ and $\beta ^$ is given by:

It follows from Equation 7 and Equation 8 that the observed correlation *r* is always lower in absolute value than the unobserved true correlation *ρ* due to the additional source of variability that results from measurement error^{2} (see also Behseta et al., 2009, Appendix).

### The Consequences of Estimation Uncertainty

In many applications of mathematical modeling, researchers estimate model parameters for each participant separately with maximum likelihood or Bayesian methods. In situations with only a limited number of observations per participant with respect to the number of parameters, the model tends to overfit the data. The resulting parameter estimates are therefore often overdispersed relative to the true parameters that generated the data (Farrell & Ludwig, 2008; Rouder, Lu, Speckman, Sun, & Jiang, 2005; Rouder, Sun, Speckman, Lu, & Zhou, 2003). In fact, we can distinguish two sources of variability in parameter estimates: variability that reflects true individual differences in model parameters and variability that reflects the uncertainty of the estimates that results from estimating model parameters from limited information. The inflation of the variability of the estimates as a result of estimation uncertainty may in turn decrease the observed correlation; as shown in Equation 7 and Equation 8, all else being equal, the larger the variances, the smaller the correlation.

*Example: Step 1*. Consider the following hypothetical scenario. A researcher is interested in the heritability of the latencies of speeded two-choice decisions. In particular, the researcher hypothesizes that fast response times (RTs) are positively correlated between pairs of homozygotic twins. The researcher tests 80 twin pairs in a simple letter discrimination task (e.g., Thapar, Ratcliff, & McKoon, 2003) with 100 trials. The researcher models the RT data of each participant with the ex-Gaussian distribution (Hohle, 1965; Matzke & Wagenmakers, 2009). The ex-Gaussian distribution is a popular RT distribution that results from the convolution of a normal and an exponential distribution; the ex-Gaussian *µ* parameter gives the mean of the normal component and is often used to quantify the latency of fast responses. The researcher obtains posterior distributions for the *µ*_{T1} (Twin 1) and *µ*_{T2} (Twin 2) parameter for each twin pair using Bayesian inference. The researcher then computes the mean of the posterior distribution of the parameters and uses these as point estimates. Lastly, the researcher obtains a Pearson correlation coefficient of *r* = 0.75 between the two sets of point estimates.

Suppose that—unbeknownst to the investigator—the true value of the *µ*_{T1} and *µ*_{T2} parameters that generated the data are known for each twin pair, and so is the true correlation between the parameters: *ρ* = 0.86. The gray points in the top row of Figure 1 show the point estimates $\mu ^T1$ and $\mu ^T2$ plotted against their true values. The gray vertical arrows indicate the range of the point estimates; the black horizontal arrows indicate the range of the true values. The arrows show that the point estimates are more variable than the true values. Note that this result also holds in the maximum likelihood framework. Most importantly, as shown in the bottom panel, the Pearson correlation coefficient computed between the point estimates $\mu ^T1$ and $\mu ^T2$ (i.e., dotted line at *ρ* = 0.75) underestimates the true correlation between *µ*_{T1} and *µ*_{T2} (i.e., dashed line at *ρ* = 0.86) as a result of the additional source of variability that results from the uncertainty of the point estimates.

## A Bayesian Solution to Remedy the Attenuation of the Correlation

In this section, we outline the Bayesian approach proposed by Behseta et al. (2009) that allows for the estimation of the underlying correlation between error-contaminated observations. We then show that the same Bayesian model can also aid the recovery of the underlying correlation between uncertain parameter estimates.

### Inferring the Correlation in the Presence of Measurement Error

Behseta et al.’s (2009) method for inferring the underlying correlation in the presence of measurement error relies on Bayesian hierarchical modeling (e.g., Farrell & Ludwig, 2008; Gelman & Hill, 2007; M. D. Lee, 2011; Rouder et al., 2005). The graphical representation of the hierarchical model is shown in Figure 2. Observed variables are represented by shaded nodes and unobserved variables are represented by unshaded nodes. The graph structure indicates dependencies between the nodes (e.g., M. D. Lee & Wagenmakers, 2013).

*Level I: Observed Data*. As before, let *θ* and *β* represent the true values, $\theta ^$ and $\beta ^$ the corresponding observed values, and *ϵ _{θ}* and

*ϵ*the normally distributed errors associated with

_{β}*θ*and

*β*, respectively. For each observation

*i, i*= 1, …,

*N*, $\theta ^i$ and $\beta ^i$ are given by:

and

The error variances $\sigma \u03f5\theta i2$ and $\sigma \u03f5\beta i2$ are assumed to be known a priori or are estimated from data. Note that the model does not assume homogenous error variances across the *N* observations. Instead, each observation *i* has its own error variance.

*Level II: Inferred Variables*. For each observation *i*, the inferred parameters **η**_{i} = (*θ _{i}, β_{i}*) are assumed to follow a bivariate normal distribution, with means

**µ**and variance-covariance matrix

**Σ**:

where

and

Here *ρ* is the underlying correlation between *θ* and *β* that is not contaminated by measurement error.

The group-level means **µ** and the three elements (i.e., *σ _{θ}, σ_{β}*, and

*ρ*) of the variance-covariance matrix

**Σ**are estimated from data and require prior distributions. We use the following prior set-up:

where $\sigma \mu \theta 2,\sigma \mu \beta 2$, *b _{σθ}*, and

*b*are set to large values to create relatively uninformative prior distributions. We follow Jeffreys (1961) and assign the correlation

_{σβ}*ρ*a default prior distribution that is uniform from –1 to 1, expressing the belief that all values of the correlation are equally likely a priori:

A benefit of the hierarchical formulation is that the correlation *ρ* is automatically adjusted for the additional source of variability that results from the uncertainty of the observations. In particular, the observations $\theta ^$ and $\beta ^$—especially those that are measured with great uncertainty—are shrunk toward their corresponding group mean. The degree of shrinkage is determined by the relative uncertainty of the observations, so that the posterior means of the inferred parameters *θ* and *β* are given by a weighted combination of the observations and the weighted group means. The inferred parameters are therefore less dispersed than the observed data points, and the inferred correlation is typically higher than the correlation computed with the direct observations.

Parameter estimation for the Bayesian hierarchical model outlined above may proceed using standard Bayesian statistical software, such as WinBUGS (Bayesian inference Using Gibbs Sampling for Windows; Lunn et al., 2012; for an introduction for psychologists, see Kruschke, 2010b, and M. D. Lee & Wagenmakers, 2013). The WinBUGS script (version 1.4.3) is presented in the Appendix. Note that the WinBUGS script requires minimal, if any, modification to run under OpenBUGS (Lunn, Spiegelhalter, Thomas, & Best, 2009) or JAGS (Plummer, 2003, 2013). The Stan project (Stan Development Team, 2012) provides yet another alternative to obtain the posterior distribution of *ρ*. For a detailed introduction to Bayesian methods, the reader is referred to Dienes (2011), Edwards, Lindman, and Savage (1963), Etz and Vandekerckhove (in press), Gelman and Hill (2007), Gelman et al. (2013), Kruschke (2010b), Kruschke (2010a), M. D. Lee and Wagenmakers (2013), Rouder and Lu (2005), Rouder, Speckman, Sun, Morey, and Iverson (2009), Wagenmakers et al. (2015), and Wagenmakers et al. (in press).

### Inferring the Correlation in the Presence of Estimation Uncertainty

The Bayesian hierarchical model proposed by Behseta et al. (2009) may be also used to infer the correlation between uncertain parameter estimates. We focus on parameter estimates obtained with Bayesian inference because the resulting posterior distributions can be conveniently used to quantify estimation uncertainty.

In the first step, we infer the posterior distribution of the model parameters for each participant. We use the posterior mean as a point estimate for the parameters; we use the posterior variance as a measure of the uncertainty of the estimates. Moreover, we assume that the point estimates can be conceptualized as an additive combination of a true parameter value and a normally distributed displacement parameter (see Equations 9–10).^{3} In the second step, we use the graphical model in Figure 2 with two minor adjustments to infer the underlying correlation between the model parameters. First, we replace the observed measurements $\theta ^i$ and $\beta ^i$ with the point estimates (i.e., mean of the posterior distributions). Second, we substitute the measurement error variances $\sigma \u03f5\theta i2$ and $\sigma \u03f5\beta i2$ with the uncertainty of the point estimates (i.e., variance of the posterior distributions).

As a result of the hierarchical formulation, the point estimates are shrunk toward their corresponding group mean, where the degree of shrinkage is determined by the relative uncertainty of the estimates. The correlation *ρ* is therefore automatically adjusted for the additional source of variability that results from imperfect parameter estimation.

*Example Continued: Step 2*. Aware of the disadvantages of parameter estimation with scarce data, the researcher sets out to use Bayesian hierarchical modeling to infer the underlying correlation between *µ*_{T1} and *µ*_{T2}. The researcher uses the mean of the posterior distribution of the individual parameters as point estimates ($\mu ^T1$ and $\mu ^T2$) and quantifies their uncertainty with the variance of the posteriors ($\sigma \u03f5\mu T12$ and $\sigma \u03f5\mu T22$). The researcher then feeds the posterior means and posterior variances into the hierarchical model in Figure 2 and adjusts the prior distribution of the group-level means and group-level standard deviations to match the measurement scale of the RT data.

The black points in the top rows of Figure 1 show the posterior means of the inferred *µ*_{T1} and *µ*_{T2} parameters plotted against their true values. The black vertical arrows indicate the range of the inferred parameters; as a result of shrinkage, the inferred parameters from the hierarchical analysis are less dispersed than the point estimates (i.e., gray points). Most importantly, as shown in the bottom panel, the posterior distribution of the correlation *ρ* from the hierarchical analysis more closely approximates the true correlation between *µ*_{T1} and *µ*_{T2} than the Pearson correlation coefficient *r* computed between the point estimates.

## Empirical Examples

We now turn to two examples from the existing literature. In the first example, we demonstrate how our Bayesian approach can be used to infer the underlying correlation between two sets of parameter estimates. In particular, we assess the correlation between parameters of cumulative prospect theory obtained at two different time points. In the second example, we show how the Bayesian approach can be used to infer the underlying correlation between a set of parameter estimates and a set of error-contaminated measurements. In particular, we assess the correlation between the drift rate parameter of the diffusion model and general intelligence as measured by the Raven Progressive Matrices Test.

The examples proceed as follows: We estimate the posterior distribution of the model parameters for each participant separately and use the mean of the posterior distributions as point estimates. To highlight the advantages of our approach, we then conduct two separate analyses. In the first analysis, we compute the posterior distribution of the *observed* correlation *r* by modeling the point estimates and the Raven scores directly using a bivariate normal distribution. In the second analysis, we take into account the uncertainty of the parameter estimates and the Raven scores and compute the posterior distribution of the *inferred underlying* correlation *ρ* using the hierarchical model in Figure 2. For both analyses, we run three MCMC chains with over-dispersed start values. We discard the first 1,000 posterior samples as burn in and retain only every 3^{rd} sample to reduce autocorrelation. Results reported below are based on a total of 14,000 posterior samples. Convergence of the MCMC chains is assessed by visual inspection and the $R^$ statistic (i.e., $R^<1.05$ for all parameters; Gelman & Rubin, 1992).

Finally, for both analyses, we compute the Bayes factor (*BF*_{+0}; Jeffreys, 1961; Kass & Raftery, 1995) to quantify the evidence the data provide for the presence of a positive association ($\mathcal{H}0:\rho =0$ vs. $\mathcal{H}+:\rho >0$). First, we determine the two-sided Bayes factor using the Savage-Dickey density ratio by computing the ratio of the height of the prior and the posterior distribution of *ρ* under the alternative hypothesis at *ρ* = 0 (e.g., Dickey & Lientz, 1970; Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010; Wetzels, Grasman, & Wagenmakers, 2010). We then compute the one-sided—order-restricted—Bayes factor as recommended by Morey and Wagenmakers (2014), namely by correcting the two-sided Bayes factor using the proportion of posterior samples that is consistent with the order-restriction. As the Bayes factor is sensitive to the choice of the prior (e.g., Bartlett, 1957; Liu & Aitkin, 2008), we also examine the robustness of the conclusions with respect to the width of the prior distribution. In particular, we start with a Beta distribution stretched out between –1 and +1 with shape parameters *α* = *β* = 1.^{4} We then increase the value of *α* and *β*, creating progressively more informative symmetric prior distributions around the test value (i.e., *ρ* = 0). The R script (version 3.2.4; R Core Team, 2016) for the computation of the Bayes factor is available at https://osf.io/mvz29/.

### Example 1: Correlation between Parameters of Cumulative Prospect Theory

Our first example features a data set obtained from a decision-making experiment reported in Glöckner and Pachur (2012). The 64 participants were instructed to choose between monetary gambles in two experimental sessions. The two sessions were separated by one week and each consisted of 138 two-outcome gambles.

We model the observed choice data with cumulative prospect theory (CPT; Tversky & Kahneman, 1992) and focus on the *δ* parameter that governs how individual decision makers weight the probability information of the gambles: high values of *δ* indicate high degree of risk aversion. As the CPT parameters are assumed to be relatively stable across short periods of time, we examine the correlation between the *δ* parameters estimated from data collected during the two experimental sessions. We infer posterior distributions for the model parameters for each participant, separately for the two testing occasions, by adapting the model used by Nilsson, Rieskamp, and Wagenmakers (2011).^{5} The prior distribution for *δ* covers a wide range of plausible values based on previous research.

The posterior distributions of the individual model parameters are obtained using JAGS (version 3.3.0). We use the posterior means as point estimates for the *δ* parameters on the two testing occasions: $\delta ^1$ and $\delta ^2$. For the estimation of the inferred correlation *ρ*, we use the variance of the posterior distributions to quantify the uncertainty of $\delta ^1$ and $\delta ^2:\sigma \u03f5\delta 12$ and $\sigma \u03f5\delta 22$.

### Results

The results are shown in Figure 3. Panel A shows a scatterplot between the point estimates $\delta ^1$ and $\delta ^2$. Panel B shows the same scatterplot but includes the corresponding estimation uncertainties $\sigma \u03f5\delta 1$ and $\sigma \u03f5\delta 2*$. Panel *C* shows a scatterplot of the mean of the posterior distributions of the inferred **η** = (*δ*_{1}, *δ*_{2}) parameters. Panel *D* shows the posterior distribution of the observed (*r*) and the inferred (*ρ*) correlation.

As shown in Panel *A*, the Pearson correlation coefficient *r* between the point estimates is 0.62. If we take into account the uncertainty of the point estimates, the correlation increases substantially. Panel *C* shows that the posterior means of the inferred *δ*_{1} and *δ*_{2} parameters are shrunk toward their group mean and are associated very strongly. Panel *D* shows that the posterior distribution of the inferred correlation *ρ* is shifted to higher values relative to the posterior of the observed correlation *r*. In fact, after accounting for the uncertainty in $\delta ^1$ and $\delta ^2$, the mean of the posterior distribution of the correlation increases from 0.61 to 0.92.

One-sided Bayes factors indicate decisive evidence (Jeffreys, 1961) for the presence of a positive association for the observed *r* as well as the inferred *ρ* correlation; in both analyses, the data are more than 3,000,000 times more likely under $\mathcal{H}+$ than under $\mathcal{H}0$. This result is visually evident from the fact that both posterior distributions are located away from zero such that their height at *ρ* = 0 is all but negligible. The results of the robustness analyses for the inferred correlation *ρ* are shown in Figure 4. The value of the Bayes factor increases with increasing values of *α* and *β*; regardless of the prior distribution, the Bayes factor indicates overwhelming evidence in favor of $\mathcal{H}+$.

The dramatic increase in the correlation is not unusual. Figure 5 shows the results of a simulation study investigating the magnitude of the attenuation for different values of the true correlation in a parameter setting that resembles the one found in the present data. We conducted five sets of simulations, each with a different value of the true correlation: 0.92 (i.e., the posterior mean of the inferred correlation *ρ* in the present data), 0.21, 0, –0.21, and –0.92. For each set of simulations, we generated 1,000 synthetic data sets with *N* = 64, using the estimation uncertainties $\sigma \u03f5\delta 12$ and $\sigma \u03f5\delta 22$ and the posterior mean of the group-level *µ*_{δ1}, *µ*_{δ2}, $\sigma \delta 12$, and $\sigma \delta 22$ parameters obtained from the Glöckner and Pachur (2012) data. We then computed the Pearson correlation coefficient *r** in each synthetic data set. The gray violin plots (e.g., Hintze & Nelson, 1998) show the distribution of the 1,000 *r** values for the five levels of the true correlation.

Two results are noteworthy. First, all else being equal, the larger the absolute value of the true correlation, the larger the attenuation. This relationship is also evident from Equation 7 and Equation 8. Secondly, considering the very high inferred correlation of *ρ* = 0.92 in the present data set (upper arrow), the observed Pearson correlation of *r* = 0.62 (lower arrow) seems reasonable. In particular, the observed Pearson *r* between the point estimates $\delta ^1$ and $\delta ^2$ is well within the 2.5^{th} and 97.5^{th} percentile of the Pearson correlation coefficients *r** predicted by the simulations.

In sum, taking into account the uncertainty of the point estimates resulted in a dramatic increase in the correlation between the CPT parameters; the mean of the posterior distribution of the correlation increased from 0.61 to 0.92. The Bayes factor indicated decisive evidence for the presence of a positive association for the observed as well as the inferred correlation.

### Example 2: Correlation between Drift Rate and General Intelligence

Our second example focuses on the correlation between the drift rate parameter of the diffusion model and general intelligence as measured by the total score of the 20-min version of the Raven Progressive Matrices Test (Hamel & Schmittmann, 2006; Raven, Raven & Court, 1998) in a data set collected by Weeda and Verouden (unpublished data). The data set featured RT and accuracy data from 51 participants. The stimuli were borrowed from the *π*-paradigm (Vickers, Nettelbeck, & Willson, 1972; Jensen, 1998), a speeded two-choice RT task and consisted of a series of images, each with one horizontal and two vertical lines (i.e.; two legs) that together formed the letter *π*, with one of the vertical lines longer than the other. The task was to indicate by means of a button press whether the left or the right leg of the *π* was longer. Task difficulty was manipulated on three levels—easy, medium, and difficult—by varying the difference between the length of the two legs.

We model the RT and accuracy data with the diffusion model (Ratcliff, 1978; Wagenmakers, 2009). The diffusion model provides a theoretical account of performance in speeded two-choice tasks; it assumes that participants accumulate noisy information over time from a starting point toward one of two response boundaries corresponding to the two response options. A response is initiated when one of the two response boundaries is reached. The four key parameters of the diffusion model correspond to well-defined psychological processes (Ratcliff & McKoon, 2008; Voss, Rothermund, & Voss, 2004): response caution(a), bias(z), the time taken up by encoding and motor processes, and the rate of information accumulation, which is the drift rate parameter of interest.^{6} As drift rate *v* is often associated with higher cognitive functions and reasoning (e.g., Ratcliff, Schmiedek, & McKoon, 2008; Ratcliff, Thapar, & McKoon, 2010; Schmiedek, Oberauer, Wilhelm, Suss, & Wittmann, 2007; van Ravenzwaaij, Brown, & Wagenmakers, 2011), we examine the correlation between drift rate and general intelligence.

We infer the posterior distributions of the four key diffusion model parameters for each participant using the diffusion model JAGS (version 3.4.0) module (Wabersich & Vandekerckhove, 2014). The prior distributions of the model parameters are based on parameter values reported in Matzke and Wagenmakers (2009). As drift rate is known to decrease with increasing task difficulty (e.g., Ratcliff & McKoon, 2008), we implement the following order-restriction: *v*_{difficult} < *v*_{medium} < *v*_{easy}. For simplicity, we focus on the average of the drift rate parameters across the three task difficulty conditions ($v\xaf$). The remaining parameters are constrained to be equal across the conditions, and we set $z\u2009=\u2009\u2009a2$.^{7}

Once the posterior distributions of the model parameters are obtained, we use the posterior means as point estimates for average drift rate: $v\xaf^$.^{8} For the estimation of the inferred correlation *ρ*, we use the variance of the posterior distributions to quantify the uncertainty of $v\xaf^:\sigma \u03f5v\xaf2$. For the Raven total score *g*, we assume homogenous error variance across participants and—for illustrative purposes—investigate how the inferred correlation changes as a function of the amount of measurement noise assumed in the data. In particular, we examine three scenarios: we assumed that 5%, 25%, and 55% of the total variance in Raven scores is attributable to measurement error, corresponding to excellent, acceptable, and very poor reliability, respectively.^{9}

### Results

The results are shown in Figure 6. Panels *A* to *C* show scatterplots between the point estimates $v\xaf^$ and the Raven total scores $g^$. The gray lines show the corresponding estimation uncertainties $\sigma \u03f5v\xaf$ and the standard deviation of the measurement error $\sigma \u03f5g$ for the 5%, 25%, and 55% noise scenarios. Panels *D* to *F* show scatterplots between the mean of the posterior distributions of the inferred $\eta =(v\xaf,g)$ parameters for the three scenarios. Panels *G* shows the posterior distribution of the observed (*r*) and the inferred correlations (*ρ*) for the three scenarios.

As shown in panel *A*, the Pearson correlation coefficient between the point estimates is 0.14. Panels *D* to *F* show that the posterior means of the inferred $v\xaf$ and *g* parameters are slightly shrunk toward their group mean. The degree of shrinkage is determined by the amount of measurement noise assumed for *g*: the larger the noise, the larger the shrinkage. Panel *G* shows that the mean of the posterior distribution of the inferred correlation *ρ* (i.e., black posteriors) is progressively shifted to higher values relative to the posterior of the observed correlation *r* (i.e., gray posterior). As expected, the magnitude of the shift increases as the amount of measurement noise increases. Note, however, that the shift is modest even if we assume that the Raven total score is an extremely unreliable indicator of general intelligence. The mean of the posterior distribution of the observed correlation *r* equals 0.13; the mean of the posterior of the inferred correlation *ρ* equals 0.14 in the 5% noise, 0.16 in the 25% noise, and 0.21 in the 55% noise scenario. Note also that the posterior of *ρ* is quite spread out, a tendency that becomes more pronounced as the amount of measurement noise in *g* increases.

One-sided Bayes factors indicate evidence for the *absence* of an association between drift rate and general intelligence for the observed as well as the inferred correlation. The evidence, however, is “not worth more than a bare mention” (i.e., 1 < *BF* < 3), following the categorization of evidential strength provided by Jeffreys (1961). The *BF*_{0+}^{10} for the observed correlation *r* equals 2.13. The *BF*_{0+} for the inferred correlation *ρ* decreases from 2.00 in the 5% noise, to 1.75 in the 25% noise, and to 1.32 in the 55% noise scenario. Even with extremely unreliable Raven scores, the data are thus still more likely to have occurred under $\mathcal{H}0$ than under $\mathcal{H}+$. Note however that *BF*_{0+} of 1.32—or even *BF*_{0+} of 2.13—constitutes almost perfectly ambiguous evidence, indicating that the data are not sufficiently diagnostic to discriminate between $\mathcal{H}0$ and $\mathcal{H}+$. Inspection of the posterior distribution of the correlations suggests a similar conclusion: both the observed and the inferred correlations are estimated quite imprecisely. The results of the robustness analyses for the inferred correlation *ρ* are shown in Figure 7. For low values of *α* and *β* the Bayes factor indicates weak evidence in favor of $\mathcal{H}0$, whereas for high values of *α* and *β* the Bayes factor indicates weak evidence in favor of $\mathcal{H}+$ for all three measurement-noise levels. Nevertheless, even for the most informative setting (*α* = *β* = 10), the Bayes factor provides only anecdotal evidence for the presence of a positive association.

Figure 8 shows the results of a simulation study investigating the magnitude of the expected attenuation for different values of the true correlation in a parameter setting that resembles the one found in the present data. Here we present results for the most extreme scenario where we assumed that 55% of the total variance of the Raven scores is attributable to error variance. We conducted five sets of simulations, each with a different value of the true correlation: 0.92, 0.21 (i.e., the posterior mean of the inferred correlation *ρ* in the present data set), 0, –0.21, and –0.92. For each set of simulations, we generated 1,000 synthetic data sets with *N* = 51, using the estimation uncertainties $\sigma \u03f5v\xaf2$ and the posterior means of the group-level $\mu v\xaf,\mu g,\sigma v\xaf2$, and $\sigma g2$ parameters obtained from the Weeda and Verouden (unpublished data) data. We then computed the Pearson correlation coefficient *r** in each synthetic data set. The gray violin plots show the distribution of the 1,000 *r** values for the five levels of the true correlation.

As before, all else being equal, the larger the absolute value of the true correlation, the larger the attenuation. Moreover, considering the relatively low inferred correlation of *ρ* = 0.21 in the present data (upper arrow), the observed Pearson correlation of *r* = 0.14 (lower arrow) seems perfectly reasonable. In fact, the median of the Pearson correlation coefficients *r** predicted by the simulations very closely approximates the observed Pearson *r* between $v\xaf^$ and $g^$.

In sum, taking into account the uncertainty of the variables resulted in negligible increase in the correlation between drift rate and general intelligence; even when assuming unrealistically unreliable Raven scores, the mean of the posterior distribution of the correlation increased only from 0.13 to 0.21. The Bayes factor indicated evidence for the absence of an association between drift rate and general intelligence for the observed as well as the inferred correlations, regardless of the magnitude of the error variance assumed for *g*. The evidence for $\mathcal{H}0$ was, however, only anecdotal, a result that is attributable to the substantial uncertainty in the estimated correlations.

## Discussion

In the psychological sciences, few researchers acknowledge that the Pearson correlation coefficient may underestimate the true strength of the association between two sets of observations as result of measurement error. Even fewer recognize that the attenuation also plagues correlation coefficients derived from uncertain parameter estimates. Although various approaches are available to infer the correlation in the presence of measurement error or estimation uncertainty, most methods do not deal with both sources of noise simultaneously. Our goal was therefore to outline a method that allows researchers to infer the underlying correlation between (1) two sets of error-contaminated observations; (2) two sets of uncertain parameter estimates; and (3) a set of noisy parameter estimates and a set of error-contaminated observations.

We illustrated the use of the Bayesian approach with two empirical data sets. In the first example, we assessed the correlation between parameters of cumulative prospect theory and demonstrated that taking into account the uncertainty of the parameter estimates can result in a dramatic increase in the inferred correlation: the mean of the posterior distribution of the correlation increased from 0.61 to 0.92. In the second example, we assessed the correlation between general intelligence and the drift rate parameter of the diffusion model, where we examined three scenarios: we assumed that 5%, 25%, and 55% of the total variance in Raven scores is attributable to measurement error. The estimated correlation increased only marginally when the uncertainty of the variables was taken into account; even with extremely unreliable Raven scores, the posterior mean of the correlation increased from 0.13 to only 0.21. Importantly, although the posterior mean of the inferred correlation increased slightly as a function of the amount of measurement noise in *g*, so did the posterior standard deviation. This result is only logical; if we measure the variables with a great degree of uncertainty, we cannot be confident about the underlying value of the correlation either.

Our approach for inferring correlations between model parameters is a straightforward application of the hierarchical method proposed by Behseta et al. (2009) for modeling measurement error. The original formulation of the model relies on a slightly different prior set-up than the one used in this article. In particular, Behseta et al. used an Inverse-Wishart distribution to model the variance-covariance matrix **Σ** of the latent variables, whereas we chose to model the individual components in **Σ** rather than **Σ** itself. We feel that the latter specification is more intuitive and flexible. It allows researchers to adapt the range of the prior for the group-level *σ _{θ}* and

*σ*parameters to the measurement scale of their variables and to express prior knowledge about the likely values of the inferred correlation in a straightforward manner.

_{β}Of course, we can take into account measurement error only when the error variance of the observations is known or when it can be estimated from data. Our investigation of the extent of the disattenuation as a function of the amount of measurement noise in the Raven scores served merely as an illustration. In real-life applications, the magnitude of the error variance should not be cherry-picked to obtain the desired (higher) correlation; rather it should be estimated from data. Note that the questionable practice of using arbitrary reliability coefficients is one of the major objections to Spearman’s (1904) original attenuation formula (e.g., Muchinsky, 1996). Luckily, our literature review suggested that the uncertainty of the observations may have been estimated for approximately 25% of the reported correlations. This is possible, for example, when each observation was derived as the average of multiple trials in a repeated–measures design.

Similarly, we can infer the underlying correlation between two sets of parameter estimates only if the estimation uncertainty is known. Throughout this article we used Bayesian inference because the resulting posterior distributions can be automatically used to quantify the uncertainty of the parameter estimates. In particular, we treated the mean of the posterior distribution of the CPT and diffusion model parameters as point estimates and used the variance of the posterior distributions as a measure of the participant-specific estimation uncertainty.

With scarce data, however, the posterior distribution of the individual parameters can be sensitive to the choice of the prior. As a result, estimation uncertainty—expressed in terms of the variance of the posterior—can depend on the prior setting used to derive the individual estimates. It follows that the inferred correlation can only be interpreted given the particular prior setting used to obtain the individual parameter estimates. The present two-step framework should therefore be considered as a post-hoc method for inferring correlations: in the first step we obtain posterior distributions for the model parameters under a particular prior setting; in the second step we infer the correlation between the estimates given the estimation uncertainties obtained under the prior setting used in the first step. In the context of theory-laden models, such as the CPT and the diffusion model, we do not consider prior sensitivity as a shortcoming of the two-step approach; we consider the prior as part of the model which should come from theory just as the likelihood does (M. D. Lee & Vanpaemel, in press; Vanpaemel & Lee, 2012).

The present two-step approach is not the only analysis strategy that can account for measurement error or estimation uncertainty. For instance, structural equation models (e.g., Hoyle, 2012; Cole & Preacher, 2014; for Bayesian solutions, see Kaplan & Depaoli, 2012; S.-Y. Lee, 2007; Song & Lee, 2012), factor models (e.g., Lopes & West, 2004; Ghosh & Dunson, 2009; Turner, Wang, & Merkle, 2017), and cognitive latent variable models (Vandekerckhove, 2014) provide more sophisticated avenues for the estimation and testing of covariance structures in the presence of noise in the data. Moreover, as discussed earlier, within the Bayesian framework we are not limited to the two-step procedure illustrated here. Bayesian hierarchical modeling allows for the simultaneous estimation of the individual parameters and their correlations (Gelman & Hill, 2007; Klauer, 2010; Matzke et al., 2015; Rouder et al., 2008, 2007).

Nevertheless, estimating covariance structures in the context of cognitive models is often not trivial. For instance, estimating the pair-wise correlations between model parameters using the simultaneous hierarchical approach has computational consequences; the simultaneous approach can result in highly complex models with undesirable estimation properties (e.g., Turner et al., 2017). We view our two-step method as an easy-to-use and flexible complement to more principled approaches, one that is especially valuable in exploratory analyses of the correlation between model parameters and external observations (see also Ly, Boehm, et al., in press). For instance, the two-step approach allows researchers to explore the relationship between model parameters and covariates without the considerable computational costs involved in re-fitting the cognitive model. Similarly, the two-step approach may be a computationally efficient complement to recently developed multivariate approaches for modeling the relationship between parameters and covariates in cognitive models (Boehm, Steingroever, & Wagenmakers, in press). In particular, the two-step approach may be extended to multiple regression problems by replacing the multivariate structure in the second step of the analysis by the appropriate regression equation. On a more theoretical level, we argue that the two-step method often provides a more flexible and tractable approach to specifying prior distributions in the context of theory-laden models than more sophisticated simultaneous methods (M. D. Lee & Vanpaemel, in press). For instance, the multivariate normal distribution typically assumed in the simulations hierarchical approach may not necessarily reflect the desired prior set-up for the cognitive model in question.

## Conclusion

Our literature review showed that 42% of the articles published in the 2012 volume of the *Journal of Experimental Psychology: General* reported at least one Pearson product-moment correlation coefficient. Despite the wide-spread use of correlations, most researchers fail to acknowledge that the correlation computed between noisy observations or uncertain parameter estimates often underestimates the true strength of the relationship between the variables. Here we outlined a Bayesian method that allows researchers to infer the underlying correlation in the presence of measurement error and estimation uncertainty. Of course, the measurement error of the observations and the uncertainty of the parameter estimates are not always readily available. Also, our simulations confirmed that for relatively low underlying correlations, correcting the attenuation is likely to have only a small effect. We nevertheless encourage researchers to carefully consider the issue of attenuation and whenever sensible take into account the uncertainty that accompanies parameter estimation and psychological measurement.

## Data Accessibility Statement

The WinBUGS and R code for the computation of the inferred correlation and the corresponding Bayes factor, and a detailed summary of the literature review of correlations published in the 2012 volume of the *Journal of Experimental Psychology: General* are available on the Open Science Framework at https://osf.io/mvz29/.

The numbers are based on an non-systematic literature review conducted by the third author. The results of the literature review are available at https://osf.io/mvz29/.

Theoretically, as $\sigma \theta 2$ and $\sigma \beta 2$ are typically unknown and must be replaced by their sample estimates, *r* can be larger than *ρ*.

Naturally, this assumption is sensible only if the posterior distribution is approximately normally distributed.

Note that this distribution is the same as the uniform distribution we use for estimating *ρ*.

The CPT account of performance in the Glöckner and Pachur (2012) data set is merely an illustration; we do not suggest that the CPT with the present parameter setting provides the best description of the data of the individual participants. Note also that Glöckner and Pachur reported the results from fitting a slightly different model than the one used in the present article.

In addition to these key parameters, the diffusion model also features parameters that describe the trial-to-trial variability of the key parameters.

The diffusion model account of performance in the Weeda and Verouden data set (unpublished data) is merely an illustration; we do not suggest that the diffusion model with the present parameter constraints provides the best description of the data of the individual participants.

Note that the scale of both drift rate and general intelligence is bounded: $v\xaf$ can take on values between 0 and 5.86 (i.e., prior range) and the Raven total score can take on values between 0 and 36. The use of the bivariate normal group-level distribution shown in Figure 2 is therefore theoretically unjustified. As a solution, we may transform the drift rate parameters and the Raven scores to the real line using a probit transformation. Additional analyses not reported here confirmed that using the transformed $v\xaf$ and *g* values yields results that are very similar to the ones obtained using the untransformed drift rates and Raven scores. For simplicity, here we report the results of modeling the untransformed parameters.

Note that this is only an illustration; the reliability of the Raven is well documented and is considered adequate (internal consistency ~ 0.90 and test-retest reliability ~ 0.82).

The subscript 0+ indicates the the Bayes factor quantifies evidence in favor of $\mathcal{H}0$; *BF*_{0+} is computed as 1/*BF*_{+0}.

## Acknowledgments

We thank Andreas Glöckner and Thorsten Pachur for sharing their data.

We thank Jaap Verouderen for his help with the data collection (Weeda & Verouden; unpublished data).

## Funding Information

DM is supported by a Veni grant (451-15-010) from the Netherlands Organization of Scientific Research (NWO).

BS is supported by a grant (100014_172806) from the Swiss National Science Foundation (SNSF).

This work was supported by the starting grant “Bayes or Bust” awarded by the European Research Council.

## Competing Interests

The authors have no competing interests to declare.

## Author Contributions

Contributed to conception: DM, AL, BS, MDL, EJW

Contributed to acquisition of data: RS, BS, WW

Contributed to analysis and interpretation of data: DM, AL, BS, MDL, EJW

Drafted and/or revised the article: DM, ML, EJW

Approved the submitted version for publication: DM, AL, RS, WW, BS, MDL, EJW