Given the need for a rapid and critical response from behavioural sciences during times of crisis, this study investigated the trajectory of all preprints posted to the repository PsyArXiv up to 19 May 2020 that were related to COVID-19 (n = 211). Specifically, we examined the trajectory, transparency, and diversity of these preprints as compared to PsyArXiv preprints unrelated to COVID-19 (n = 167) and articles published in psychology journals (n = 75) within the same time frame. Preprints related to COVID-19 had similar traction to published journal articles on COVID-19, but compared to preprints unrelated to COVID-19, the COVID-19 preprints were more likely to be subsequently published during a follow-up period (until 2 March 2021), were published more quickly, and received more citations. Preprints related to COVID-19 reported fewer open science practices than preprints unrelated to COVID-19, but more than COVID-19 journal articles. Primary affiliations for all preprints and journal articles predominantly originated from Western countries, but this was comparatively more common for preprints (both related and unrelated to COVID-19), even though preprints had more international authorship teams than journal articles. In conclusion, this study sheds light on preprint dissemination within the field of psychology amid the COVID-19 crisis, emphasising the swift spread, heightened probability of subsequent publication, and varied adherence to open science practices among COVID-19-related preprints. These results underline the continual need for rigorous evaluation and advancement of scholarly communication practices, especially during periods of global urgency, to uphold transparency, diversity, and rigour in disseminating vital research findings.
During periods of crisis and instability, psychological research plays a critical role in informing short and long-term policies (O’Connor et al., 2020). The COVID-19 pandemic was one such crisis: a major public health emergency where policies need to be implemented rapidly based on the best scientific evidence available (Ruggeri et al., 2020). While cognitive and behavioural sciences can increase our understanding of the world around us, some scholars are sceptical that certain social and behavioural sciences are advanced enough to deal with policy problems during crises that concern life-or-death questions (Bryan et al., 2021; IJzerman et al., 2020). Consequently, some have argued that the field should first establish the credibility necessary to inform policy, instead of trying to make policy-makers more willing to draw on the discipline (IJzerman et al., 2020).
The practical relevance for the field is considerable, given that non-pharmaceutical interventions for pandemics, which are the only effective measures in the early stages of a pandemic before vaccines and therapeutics are developed, are largely behavioural. This was the case more than a century ago in response to the Spanish Flu (Soper, 1919) and remained so when the COVID-19 pandemic first hit (Zhang et al., 2021). People are often unaware of the dangers of highly transmittable diseases and unwittingly become persistent threats to themselves and others, e.g., by not maintaining social distancing or not observing basic hygiene. Heeding advice from epidemiologists and public health experts necessitates significant behavioural changes (Hargreaves & Davey, 2020) and puts considerable psychological strain on people (Van Bavel et al., 2020). Thus, various elements of a pandemic response can benefit from psychological research. From a mental health perspective, for instance, psychology can inform best practice for supporting individuals with mental health challenges during isolation (Moreno et al., 2020; Talevi et al., 2020). With respect to outreach and public opinion, findings from psychology can advise social media platforms and governments in their communications efforts to better promote accurate information while mitigating damage from COVID-19-related conspiracy theories (Romer & Jamieson, 2020; Uscinski et al., 2020).
However, the crisis-relevance of psychological research depends not only on whether the field produces actionable evidence that suits the challenges of COVID-19, but also whether that evidence is shared in a timely fashion (Whitty, 2015). During a crisis, “speed trumps perfection” (United Nations, 2020), thus the rapid production and dissemination of knowledge is as much a critical factor as the quality of the information (Lipworth et al., 2020). Policy decisions in a rapidly evolving crisis need to be made immediately, not months later when the evidence is available (Whitty, 2015). Psychological research can therefore only inform rapid decision-making if it is available on time.
The dissemination of research has traditionally taken place through the publication of journal articles. Researchers submit their findings in a manuscript, which is then reviewed by experts in the field (“peer review”) and undergoes an often lengthy revision process before it is finally published. The average duration of the review process for submissions in psychology has been estimated to be 20 weeks (Huisman & Smits, 2017). In the context of COVID-19 biomedical research, even just a month taken to review and revise manuscripts (which may only improve the quality minimally; Carneiro et al., 2020) can mean thousands of new COVID cases. Thus, there is a need for timely knowledge production during times of crisis.
Preprints
Preprints, i.e., publicly available scientific papers that have yet to be reviewed or are in the process of peer review (Mudrak, 2020), can address issues of timeliness in crisis knowledge production. Crucially, preprints are submitted to a public server ahead of the journal publication process (although servers may also host “post-prints”, i.e., accepted manuscripts that are shared openly outside the journal’s paywall), which means they allow the scientific community and members of the public, including journalists, policy-makers, and practitioners, to have early access to research. Preprint servers have existed since 1991, and first became common in mathematics, computer science, and physics (Vlasschaert et al., 2020). Since then, many fields have developed dedicated repositories, including psychology’s PsyArXiv server (Vlasschaert et al., 2020).
Preprints could offer a solution to accelerated knowledge production without unnecessarily compromising quality. For example, preprints facilitated novel analyses and new data throughout the Ebola and Zika outbreaks, and the bulk of those that were matched to peer-reviewed papers was accessible more than 100 days before journal publication (Johansson et al., 2018). During the COVID-19 pandemic, preprints have proliferated, especially in the biomedical field (Gianola et al., 2020). For instance, 50-100 daily COVID-19-related preprints were posted to the clinical repository medRxiv in April 2020. The accessibility of research that preprints offer goes beyond rapid dissemination: preprints can encourage comments, share information, and potentially increase the rigour of methodologies (Vlasschaert et al., 2020).
Preprints have also been critiqued and assessed amongst the scientific community on social media platforms such as Twitter (Carlson & Harris, 2020), with preprints on COVID-19 receiving more attention than non-COVID preprints (Fraser et al., 2021). Evidence further indicates that this attentional advantage of COVID-19 preprints has translated into more citations compared to non-COVID-19 preprints, at least in the fields of biology and medicine (Fraser et al., 2021). This trend was also seen among COVID-19-related journal articles, which were cited on average eight times more than research on other topics between 2020 and 2021 (Ioannidis et al., 2022). While to our knowledge citations for COVID-19-related psychology research have not been studied, the global spotlight on the pandemic and the relevance of psychological interventions to crisis response make it likely that a similar trend would be observed for psychology.
While quicker communication and accumulating evidence benefit scientific research, the drawback with rapid, open dissemination may be the diminishing quality of evidence. There have been concerns about poor quality research and its detrimental effects on an evidence-based pandemic response, especially when the media recklessly amplifies questionable findings (Glasziou et al., 2020; IJzerman et al., 2020). Issues surrounding generalisability, replicability, and validity could undermine the relevance of psychological research, even if it is produced in a timely fashion (Landy et al., 2020). There appears, prima facie, to be a tension between the need for knowledge production in a crisis to be both rapid and rigorous (van Aert et al., 2023).
However, one should not assume that the journal peer review process automatically leads to more rigorous studies. The generalisability, replicability, and validity of published psychological research have been debated since long before the pandemic hit (Open Science Collaboration, 2015), and retractions of articles reporting rapid COVID-19 research occurred even in a high-profile journal such as The Lancet (The Editors of the Lancet Group, 2020) as well as on a preprint server (e.g., bioRxiv). Overall, the peer review process can improve the quality of published research, but the improvements may be small (Carneiro et al., 2020), raising the question of whether withholding publication of crisis-related research throughout the lengthy peer-review process is worth the gain in quality. As Whitty (2015) states, “An 80% right paper before a policy decision is made is worth ten 95% right papers afterwards, provided the methodological limitations imposed by doing it fast are made clear” (p. 3).
With these issues in mind, we set out to investigate the role that preprints could play in supporting crisis-relevant psychological research by studying the trajectory of PsyArXiv preprints early in the COVID-19 pandemic (posted by mid-May 2020). We approached this question by examining the traction of PsyArXiv COVID-related preprints compared to other types of papers, whether these preprints reflected efforts to promote transparency and replicability of rapid research, and whether posting preprints provided a diversification of the evidence base.
Currently, the extent to which these preprints ended up as timely peer-reviewed publications is unclear. Assessing whether and where the preprints were cited and published, and the time elapsed to publication, could provide insights into the extent to which preprints gained traction as crisis-relevant research. We also consider the subsequent publication and citation of a preprint as one proxy indicator of whether the preprint accelerated research dissemination without sacrificing too much quality, although these measures are of course not without flaws and limitations.
One way to promote research quality across a body of research is to implement scientific practices that promote the replicability of research. Even though these practices may not necessarily ensure the quality of individual pieces of work, they can, at scale, provide the means for the scientific community to independently validate findings. Open science principles, including transparency, reproducibility, and cooperation, can thus promote the validity and reliability of research findings, advance scientific progress, and inform decision-making in the pandemic context. Researchers can improve the transparency and rigour of their work, lower the risk of errors and biases, and foster cooperation and cross-disciplinary interaction by publishing data, methods, and results openly (Allen & Mehler, 2019). In addition, open science approaches can facilitate the replication of key findings, boost public confidence in scientific research, and aid in identifying and addressing the psychological impact of the pandemic on individuals and communities (Norris & Toomey, 2020). Thus, we analysed whether COVID-19-related PsyArXiv preprints utilised open science practices early in the pandemic.
Crisis-relevant research needs to provide adequate coverage of the situation globally and consider people from multiple backgrounds. During times of crisis, a lack of diversity in research could limit the ability of policymakers around the world to base policies on research evidence. It is therefore useful to determine the level of global diversity represented by crisis-relevant psychological preprints. For example, there have been many reports of clinical trials underrepresenting people from Black, Asian, and ethnic minority backgrounds (National Institute for Health and Care Research, 2020), in at least one instance leading to a failure to identify the more frequent occurrence of a drug’s side effects in some racial/ethnic minority groups (Yates et al., 2020). The unrepresentativeness of these clinical samples thus leaves open questions about the suitability of the pharmaceuticals under testing for these groups. Historical grievances also need to be considered when examining a target population’s background, as vulnerable groups that have experienced mistreatment by governmental actors in the past are less likely to trust those entities with public health measures such as vaccination campaigns (Jamison et al., 2019; Lowes & Montero, 2021). Non-compliance with public health measures, e.g., social restrictions, can be a response to the disproportionate impact on these groups, rather than a deliberate attempt to undermine the crisis response (Lewandowsky et al., 2022).
In the context of psychology, participant samples are drawn overwhelmingly from Western countries, which accounted for 96% of all participants in one study of psychological journals (Tindle, 2021). Another study found that 94% of studies that reported the nationality of participants focused exclusively on WEIRD (Western, educated, industrialised, rich and democratic) samples (Rad et al., 2018). Recently, a study analysed samples, participants, and authors from relevant preprints containing “coronavirus” or “COVID-19” keywords posted on PsyArXiv in two waves (March to April 2020 and May to December 2020). Its results showed that countries such as the United States were overrepresented in both waves, with publications featuring authors and samples from these countries more likely to be published in higher-impact journals and cited more frequently (Puthillam, 2023). These findings, i.e., the overreliance on a small share of the world’s population and the resulting problems of generalisability beyond the Western context (see “Towards a Global Psychological Science,” 2022), underscore the need for greater scrutiny and concern about publication bias, research output from non-WEIRD countries, and its impact on psychological research. Specifically, it is important to understand if this overrepresentation was exacerbated compared to non-crisis levels.
This study
The goal of our exploratory study was to examine the trajectory, transparency, and diversity in origin of preprints related to COVID-19 on PsyArXiv. Our primary research objective was to describe these characteristics of COVID-19 preprints in the psychology domain. To achieve this, we used citation and publication rates to quantify the trajectory of preprints, as a measure of whether they gained traction in the scientific community. As a proxy for transparency, we assessed the prevalence of open science practices reported. Finally, we examined diversity of origin in terms of the presence of international authorship and the countries in which the lead authors’ institutions were based.
Our second research objective was to assess whether these characteristics of COVID-19 preprints were unique to attempts to rapidly post relevant research findings during the pandemic (i.e., examining effects of posting a preprint and effects of reporting on COVID-relevant research). To achieve this, we needed to compare how COVID-19 preprints differed from preprints on other topics published in a similar time window, as well as journal articles related to COVID-19 that did not appear on preprint servers. We therefore collected additional metadata for two comparison groups of (i) non-COVID-related PsyArXiv preprints published between January 2020 and 19 May 2020 (n = 167); (ii) COVID-related journal articles published up to 19 May 2020, as indexed by Web of Science and belonging to the category “psychology” (n = 70). In this part of the study, we pre-registered analyses to investigate whether PsyArXiv preprints related to COVID-19 differed in terms of their citation rates, publication status, time to publication, and reporting of open science practices (preregistration and/or open data) compared to (1) non-COVID-related preprints and (2) COVID-related published journal articles.
Methods
We report how we determined our sample size, data exclusions, manipulations, and measures in the study. Our sample sizes were determined by the number of relevant papers available on the PsyArXiv repository or in the indexed journal database within the research time frame. Data for other non-COVID-related preprints and COVID-related published journal articles were collected subsequent to the initial collection of the COVID-related preprints sample. The method and analysis plan for comparing these types of papers to COVID-related preprints was pre-registered prior to collecting the additional papers and their metadata, and is available (https://osf.io/h7z5r), along with the data and analysis scripts, on the Open Science Framework: https://osf.io/nufjh/.
Data collection
We collected metadata for three types of psychology papers:
COVID-19-related preprints, which comprised all preprints identified with the search term “COVID” uploaded to PsyArXiv between 01 January 2020 and 19 May 2020 (roughly two months after many European countries had announced the introduction of measures such as lockdowns to limit the spread of COVID-19). We removed seven preprints that were duplicates or had been withdrawn from circulation at the time of our analysis (initial sample n = 218; final sample n = 211).
Non-COVID-related preprints were identified using a retrospective search on 31 August 2021, excluding all COVID-related preprints and limiting results to preprints posted between 01 January 2020 and 19 May 2020, i.e., the same date ranges as the first and last COVID-related preprints identified in the first sample. No other restriction for the search was imposed. We removed two duplicates from this dataset (initial sample n = 169; final sample n = 167).
COVID-related journal articles, as indexed by Web of Science and belonging to the category “psychology” (n = 75)1. To maintain comparability of time frames, we limited this sample to only articles published up to 19 May 2020.
Coding of papers
Eight coders formed the team responsible for coding the preprints. The coding was performed manually by conducting Google searches and reviewing each article individually. For example, Open Science practices were identified by determining whether authors had pre-registered their methods and analyses, and/or shared their data and materials in an open repository. At least one of these indicators was required to classify an article as adhering to open science practices. The coding sessions were conducted via Zoom in March and August 2021. Each coder was assigned a set of preprints and tasked with coding them. Prior to the meetings, coders received briefings on the coding procedure (the instructions are outlined in the procedure discussed below). Throughout the meetings, coders had the opportunity to engage in discussions with one another and pose questions to the coordinating researchers.
For each of these datasets, we recorded for analysis the following metadata that was comparable across the three datasets.
Number of citations (and citation rate). For preprints (both COVID-related and non-COVID-related), we manually searched for the preprint by its title in Google Scholar and recorded the number of citations given by the “cited by” feature in Google Scholar. COVID-related preprint citations were recorded on 2 March 2021. Non-COVID-related preprint citations were collected at a second stage, on 31 August 2021. We obtained COVID-related journal articles together with their citation counts directly from the Web of Science indexing service, which provides the number of citations received by a paper across the databases it indexes.
Based on the number of citations received, we calculated the daily citation rate for the paper as the number of citations at the point of data collection divided by the number of days since the paper had been uploaded (for preprints) or published (for journal articles).
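For illustration, this calculation can be written as a small R function; the function and variable names below are ours and are not taken from the shared analysis scripts.

# Daily citation rate: citations accrued per day since posting (or publication).
# 'coding_date' defaults to the follow-up date used for COVID-related preprints.
citation_rate <- function(citations, date_posted, coding_date = as.Date("2021-03-02")) {
  days_online <- as.numeric(coding_date - as.Date(date_posted))
  citations / days_online
}

citation_rate(citations = 12, date_posted = "2020-04-01")  # ~0.036 citations per day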
Publication status. For the COVID-related preprints, we searched for the preprint on Google Scholar on 2 March 2021 to record whether the preprint had been published in a journal by this time, and if so, the publication date and the total number of citations for preprint and published versions2. We used the publication date minus date of upload on PsyArXiv to calculate the time from posting to publication for preprints that were subsequently published as journal articles.
For the non-COVID-related preprints, we followed the same procedure (Google Scholar search), but on 31 August 2021, and used the publication dates of published preprints to record whether each preprint had been published as of 2 March 2021 (to match the follow-up date for the COVID-related preprints). Journal articles by default had a publication status of “published”.
International authorship teams. For each paper, we established the location of all the authors’ primary institutional affiliation and recorded an international authorship team when there were at least two authors based at institutions in different countries.
Open science practices. We used two indicators of open science practices: whether authors had pre-registered the methods and analysis and/or shared their data and materials in an open repository (at least one of the two indicators was required). Note that shared code and data do not guarantee computational reproducibility, so these indicators provide at best an upper bound on the reliability of a preprint (Obels et al., 2020).
Primary country of origin. We recorded the country of the lead author’s primary institutional affiliation as the paper’s primary country of origin. In our analysis, countries were further grouped as being of “Western” (predominantly English-speaking or Western European) or “non-Western” origin, reflecting the tendency for research to be conducted in “Western, educated, industrialised, rich and democratic” societies (Henrich et al., 2010).
Other metadata collected but not analysed (including the corresponding author’s name, paper URL, paper title, disciplines, tags, etc.) can be found in the dataset shared on the Open Science Framework.
Analytical approach
We conducted and report here Bayesian analyses performed in R (R Core Team, 2021) using the BayesFactor R package (Morey & Rouder, 2014). This choice of analysis was guided by the exploratory nature of our study, where the novelty of the pandemic situation in 2020 made it difficult to specify informative priors a priori. We thus used the default prior specified by Morey and Rouder (2014) in their package, which uses a Cauchy distribution with scale factor = √2/2. Code to reproduce this analysis is shared on the Open Science Framework.
An advantage of using Bayesian analyses in this exploratory study was that it allowed us to compare the support for competing hypotheses given the observed data, as opposed to computing a p-value that is conditional on the null hypothesis being true (Wagenmakers et al., 2018). The analyses are quantified in the form of a Bayes factor (BF10), which, in generic terms, is the ratio of the probability of the data given one model (e.g., the model with predictors, corresponding to the alternative hypothesis, H1) to the probability of the data given another (e.g., the null, or intercept-only model, H0). We interpret BF10 in accordance with conventional Bayes factor evidence categories (Aczel et al., 2017; Jeffreys, 1961; Lee & Wagenmakers, 2014), whereby BF10 > 10, 3-10, and 1-3 indicate, respectively, strong, moderate, and anecdotal support for the alternative hypothesis (support for the null hypothesis is the inverse, i.e., < 1/10, 1/10-1/3, and 1/3-1, respectively), and BF10 = 1 indicates equal support for the null and alternative hypotheses.
For all continuous dependent variables (citation rates, time taken to publish the paper), we used Bayesian regression models with Bayesian independent-samples t-tests for follow-up comparisons. For categorical dependent variables (i.e., proportions of preprints/journal articles), we used Bayesian contingency tables to test the difference between observed and expected frequencies.
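To make this analytical approach concrete, the sketch below illustrates the kinds of BayesFactor calls involved, using a hypothetical data frame and column names; the scripts shared on the Open Science Framework are the authoritative record of the analyses actually run.

library(BayesFactor)

# Hypothetical data frame 'papers', one row per paper, with columns:
#   log_cite_rate (numeric), topic (two-level factor: "COVID"/"non-COVID"),
#   type (two-level factor: "preprint"/"journal"), open_science (logical).

# Bayesian independent-samples t-test (default Cauchy prior, rscale = sqrt(2)/2)
ttestBF(formula = log_cite_rate ~ topic,
        data = droplevels(subset(papers, type == "preprint")))

# Bayesian contingency table testing observed vs. expected frequencies,
# e.g., reporting of open science practices by topic among preprints
tab <- with(subset(papers, type == "preprint"), table(topic, open_science))
contingencyTableBF(tab, sampleType = "indepMulti", fixedMargin = "rows")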
Results
Trajectories of COVID-19 preprints
To assess the trajectories of COVID-19 preprints, we investigated the publication and citation rates of the preprints, relative to the comparison groups. (Although the non-COVID-19 preprints were coded at a later date, we were able to trace back from the publication date which of these preprints had been published as of the same coding date as the COVID-19 preprints.) As of 2 March 2021, 54% of COVID-19 preprints were published. By contrast, only 36% of non-COVID preprints were published at that time, BF10 = 89.86.
Citation rates for COVID-19 preprints. We analysed the citation rates of the COVID-19 preprints in relation to the two comparison groups: non-COVID preprints and COVID-related journal articles posted/published in the same time period as the COVID preprints were posted. Citation numbers were highly positively skewed, skewness = 9.15, kurtosis = 100.25. We thus calculated the log of number of citations for each article/preprint, and calculated a (log) citation rate as the log citation number divided by the number of days since the preprint or article was publicly posted (or published). We implemented a Bayesian linear regression model on this log citation rate, which included topic (COVID vs. non-COVID) and article type (preprint vs. journal article) as predictors.
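A sketch of this transformation and model comparison, again with hypothetical object names and the package defaults described in the Analytical approach, might look as follows.

library(BayesFactor)

# Log citation rate: log citations per day online ('+ 1' avoids log(0) for
# uncited papers; the exact handling of zeros in the original analysis may differ).
papers$log_cite_rate <- log(papers$citations + 1) / papers$days_online

# Compare models with and without each predictor to obtain the Bayes factor for its effect
bf_full  <- lmBF(log_cite_rate ~ topic + type, data = papers)
bf_type  <- lmBF(log_cite_rate ~ type,         data = papers)
bf_topic <- lmBF(log_cite_rate ~ topic,        data = papers)
bf_full / bf_type    # evidence for the effect of topic (COVID vs. non-COVID)
bf_full / bf_topic   # evidence for the effect of article type (preprint vs. journal)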
There was strong evidence that papers related to COVID had a higher (log) citation rate than those that were not, BF10 = 9.48 × 10^23 (see Figure 1). There was moderate evidence against article type having an effect on (log) citation rates, BF10 = 0.17. However, pre-printing (as opposed to only publishing a journal article without a preprint) appeared to raise (log) citation rates over and above the effect of COVID-relevance, with moderate support for this effect, BF10 = 6.81.
We conducted follow-up Bayesian independent samples t-tests between COVID-related PsyArXiv preprints and each comparison group. There was strong evidence that COVID-related PsyArXiv preprints had a higher citation rate than non-COVID-related PsyArXiv preprints, BF10 = 1.98 × 10^23 (d_posterior = 1.26, 95% interval = [1.04, 1.48]). There was only anecdotal evidence that COVID-related PsyArXiv preprints had a higher citation rate than COVID-related published journal articles, BF10 = 2.81 (d_posterior = 0.32, 95% interval = [0.06, 0.58]).
Citation rates for published preprints. Because some of the preprints had been published (see Figure 2, panel C), we checked to see whether, within this subset of preprints, citation rates differed among the comparison groups (see Figure 2, panel A). We ran Bayesian independent samples t-tests to compare the two preprint types (COVID and non-COVID), which found strong evidence that (log) citation rates for the published versions of the COVID-related preprints were higher than for the published versions of the non-COVID-related preprints, BF10 = 16.81, d_posterior = 0.48, 95% interval = [0.17, 0.79]. In addition, we assessed whether the preprints differed in the length of time they took to be published (i.e., “time to publication”) by calculating the difference between each published preprint’s publication date and its date of posting on the preprint archive (see Figure 2, panel B). A Bayesian independent-samples t-test found strong evidence that COVID-related preprints had a shorter time to publication than non-COVID-related preprints, BF10 = 1.27 × 10^5, d_posterior = -0.85, 95% interval = [-1.18, -0.53].
Characteristics of COVID-19 preprints
Open science practices. The proportion of non-COVID preprints reporting at least one open science practice (59%) was the highest, with strong evidence that this was more than the proportion of COVID preprints (17%), BF10 = 1.91 × 10^15. There was also strong evidence that the proportion of COVID-related preprints reporting at least one open science practice was higher than the proportion of COVID-related published journal articles (17% vs. 3%), BF10 = 44.77.
International authorship teams. There was moderate evidence that the proportions of COVID (35%) and non-COVID (39%) preprints that included authors from more than one country were similar, BF10 = 0.25 (see Figure 3). A greater proportion of COVID preprints had authors from more than one country (35%) compared to the proportion of COVID published journal articles (16%), BF10 = 26.80.
Primary country of origin. Figure 3 shows the proportion of preprints and journal articles with countries of origin from different regions (based on the lead author’s primary institutional affiliation). A greater proportion of non-COVID preprints (90%) had a lead author based at an institution of Western origin compared to the proportion of COVID-related preprints (81%), BF10 = 4.88, and the proportion of COVID published journal articles (55%), BF10 = 1.41 × 10^3.
Post-hoc analysis on citation rates for journal articles
One limitation of the analysis of citation rates was that citation counts were calculated by different platforms for journal articles (Web of Science, as a long-standing publication indexing service) and preprints (Google Scholar, which was the only means available to obtain these data). Differences in how these two services count citations may thus mask or overstate an effect, since Google Scholar has previously been found to identify more citations than Web of Science for publications in some disciplines (e.g., medicine: Kulkarni et al., 2009; Anker et al., 2019), but fewer in others (e.g., physics: Bar-Ilan, 2008; chemistry: Bornmann et al., 2009). To our knowledge, there is no benchmark for how much citation counts might vary between Google Scholar and Web of Science in psychology or the social sciences, but Web of Science is more likely to provide an undercount than an overcount relative to Google Scholar (Waltman, 2016). As such, we followed up our analysis of comparisons between preprints and journal articles with a post-hoc simulation to assess the potential impact of a difference in citation counts on our findings.
We focused our simulation on calculating a hypothetical citation count for the journal articles under the assumption that Web of Science was undercounting citations relative to Google Scholar. However, since no precise benchmark exists to scale up the count, we ran a series of simulations. For each journal article in our sample, we recalculated the log citation rate under a series of hypothetical scenarios that each assumed a given undercount by Web of Science and scaled up the recorded citation count accordingly. In each simulated scenario, the original citation count was “corrected” by a multiplication factor (e.g., 1.1, 1.2, 1.3, 1.4, and so on). For example, for a journal article in our dataset that had a recorded 10 citations (based on Web of Science data), we calculated log citation rates for that article under hypothetical scenarios that assumed it had instead received 11, 12, 13, 14 (and so on) citations. This effectively simulated a series of situations in which we assumed that Google Scholar (rather than Web of Science) would have retrieved 10%, 20%, 30%, 40% (and so on) more citations for each journal article; in each situation, the simulation calculated what the article’s log citation rate would be. We then repeated the Bayesian regression and follow-up t-test for each of these scenarios, producing BF10 values for each effect. In these analyses with the simulated journal article log citations, we kept the original log citation rates of the preprints constant, since these had already been retrieved via Google Scholar.
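The sketch below illustrates the logic of this simulation in R, using hypothetical object names and an assumed grid of correction factors; the exact grid and model calls are recorded in the scripts on the Open Science Framework.

library(BayesFactor)

# Assumed inputs: 'journal_articles' with Web of Science citation counts and days
# online, and 'preprints' with log citation rates already computed from Google
# Scholar counts (held constant across scenarios, as described above).
correction_factors <- seq(1.00, 2.00, by = 0.01)   # assumed grid of undercount scenarios

bf_type_by_factor <- sapply(correction_factors, function(f) {
  adj <- journal_articles
  adj$log_cite_rate <- log(adj$citations * f + 1) / adj$days_online  # scaled-up counts
  combined <- rbind(preprints[, c("log_cite_rate", "topic", "type")],
                    adj[,       c("log_cite_rate", "topic", "type")])
  bf_full <- lmBF(log_cite_rate ~ topic + type, data = combined)
  bf_red  <- lmBF(log_cite_rate ~ topic,        data = combined)
  extractBF(bf_full / bf_red)$bf   # BF10 for the article-type effect at this factor
})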
As illustrated in Figure 4, increasing the citation counts for journal articles produced a qualitative change only for effects involving preprint vs. journal article comparisons (panels A and C); the strong effect of COVID relevance would only get stronger if journal article citations were corrected upwards (panel B). Adjusting journal article citations upwards reduced the evidence for the null effect of preprinting (Figure 4, panel A), but Web of Science would need to be undercounting citations by 40% before evidence starts to build that publication increases citation rates relative to preprinting, and by 65% before this evidence becomes moderate.
For the comparison of COVID-related preprints and journal articles, evidence for the effect of higher (log) citation rates for preprints (which was anecdotal-to-moderate at the outset) decreased as the journal article citation counts were adjusted upwards, with the evidence starting to turn in favour of there being no effect at a Web of Science undercount of approximately 17-23%.
Discussion
This study assessed the trajectory, transparency, and diversity of COVID-related preprints published on the PsyArXiv repository. Overall, we found substantial differences between preprints addressing COVID-19 and those covering other topics. Preprints related to COVID-19 were more likely to be published, were published more quickly, and received more citations than non-COVID preprints.
Additionally, posting a preprint on PsyArXiv was associated with a higher citation rate, with some evidence to suggest this might go beyond the effect of being COVID-related. Moreover, although these preprints reported fewer open science practices than non-COVID preprints, they had more open science practices than published journal articles on COVID-19. A larger number of COVID-related preprints originated from Western countries compared to COVID-related published journal articles. The proportion of international authorship teams did not differ between COVID and non-COVID preprints.
These findings are largely in line with evidence from other fields. Preprints addressing COVID-19 have been found to be published more quickly, cited more often, and shared more widely than non-COVID-19 preprints in biology and medicine (Fraser et al., 2021). In some areas, COVID-19-related preprints were published twice as fast as papers covering other topics (Else, 2020). This can largely be attributed to the gravity of COVID-19 and the attentional pull that comes with it. This becomes especially apparent when looking at the most cited articles in 2020 and 2021: in the pandemic’s first year, 98 of the 100 most-cited articles published that year addressed COVID-19. A year later, that share dropped to 76, but remained high nevertheless (Ioannidis et al., 2022). Across various disciplines, papers reporting COVID-19-related research were cited on average eight times more than non-COVID-19-related articles (Ioannidis et al., 2022). In light of our findings, this pattern of COVID dominance seems to extend to preprints from psychology and their trajectories as well.
In terms of quality, our findings seem to follow the general pattern of COVID-19-related publishing: getting the article out appears to be more important than committing to open science practices. For instance, COVID-19-related articles published in leading medical journals, e.g., JAMA, The Lancet, and The New England Journal of Medicine, between February and May 2020 were found to be less likely to adhere to reporting guidelines compared to articles unrelated to COVID-19, and were also more likely to be accompanied by a retraction or major post-publication correction (Quin et al., 2021). There may, however, be practical reasons for this. Committing to and implementing open science practices consumes time and resources that are notoriously scarce, especially in times of crisis such as the COVID-19 pandemic. Adopting these practices has also been linked to heightened work-related stress and prolonged project duration (Sarafoglou et al., 2022). Some authors estimate that adopting and implementing preregistration and registered reports requires twice the regular duration of a research project (Allen & Mehler, 2019). In a situation like the COVID-19 pandemic, when the public demands a rapid response from the scientific community to prevent further suffering and death, this can be difficult to justify. Therefore, the additional time needed to implement open science practices must be weighed against the costs to human life.
Although speed is an understandable priority, achieving it by dropping open science practices carries considerable risk. This is particularly illustrated by cases in which policy is based on flawed or outright false evidence. For instance, a preprint reporting the efficacy of ivermectin, an antiparasitic drug, as a treatment for COVID-19, claiming to reduce the death rate by more than 90%, was later withdrawn when concerns about data manipulation were raised. The preprint nevertheless informed public health policy in Peru, effectively wasting scarce resources on a treatment not backed by robust evidence (Reardon, 2021). Beyond threatening lives, the neglect of good scientific practice also threatens scientific integrity. The Surgisphere scandal and the resulting retraction of two influential papers based on fabricated data, in The Lancet and The New England Journal of Medicine, underline this issue (for a full breakdown of the matter, see Offord, 2020). Had both journals adhered to open science practices, including data sharing, the scandal could have been prevented, as the company declined to share the raw data (Ledford & Van Noorden, 2020).
Thus, our results, which show that open science practices were reported more often by preprints than by published papers, and more often by non-COVID-19 papers than by COVID-19-related ones, are somewhat concerning. Although past instances in which the validity of psychological research has been questioned have put a spotlight on the adoption of open science measures (Landy et al., 2020; Open Science Collaboration, 2015; Shrout & Rodgers, 2018), practices such as preregistration still remain rare (Hardwicke et al., 2022). Therefore, more work is needed to encourage researchers and publishers to demand the adoption of open science measures to improve psychological science. The implementation of open practices necessitates a shift in mindset and efficiency standards, which academics at all levels and funders must accept (Allen & Mehler, 2019; Norris & O’Connor, 2019). In light of various scandals and the prevalence of questionable research practices, open science should not be an option but a key requirement for research, and should be promoted wherever possible.
The fact that most of the preprints came from Western countries is a further cause for concern, an issue also documented by Puthillam (2023). A lack of representation in science can create blind spots when policymakers draw on evidence created by rather homogeneous research groups. This is especially problematic in times of global crisis. Thus, more needs to be done to address this disparity to provide a comprehensive perspective on the psychological implications of situations like a pandemic. Beyond that, if the behavioural sciences aim to have a truly global impact, the representation of diverse cultures and backgrounds in research groups is a necessary precondition.
Limitations
Although it might seem reasonable to assume that preprints within the field of psychology are posted on PsyArXiv, this is not necessarily the case. Scholars from psychological disciplines closer to clinical practice and medicine in general might have put their research on other repositories such as bioRxiv or medRxiv. This could introduce a bias with respect to the research questions addressed and the general research designs covered by the preprints on PsyArXiv. Such fragmentation also impedes our ability to understand preprinting practices throughout the field of psychology. This results in a partial or biased perspective on current research trends and topics within the discipline. Moreover, such fragmentation may complicate meta-analyses and systematic reviews, particularly when preprints serve as the primary objects of study. Future research should aim to address this issue by exploring the differences among these repositories. One approach could involve bibliometric analysis to examine the citation impact and networks of psychological preprints published across various platforms. Additionally, interviewing authors who have posted preprints on multiple repositories could provide insights into their reasons for selecting specific sites and the perceived effects of these choices.
Furthermore, our search was limited to articles on PsyArXiv, but it is important to recognise that scholars from other behavioural sciences, such as economics, pedagogy, and sociology, may also contribute to this repository. As a result, some of the articles included in our analysis may not exclusively belong to psychology. Future research should consider refining the analysis to focus specifically on psychological articles, ensuring a more targeted examination of the field’s preprint landscape.
While our study coded for open science measures in research papers by assessing whether authors had pre-registered the methods and analysis and/or shared their data and materials in an open repository, it is important to note that these criteria represent the bare minimum standards for open science practices. Thus, it is reasonable to argue that studies that were coded as using open science measures might have varying open science standards. Future research should document the adherence to open science standards in greater detail. For this purpose, indicators that facilitate the replication of studies, such as the availability of analysis scripts, materials, protocols, and raw data (see, for instance, Hardwicke et al., 2022), should be defined and used as a basis for analysing the prevalence of open science practices.
Another limitation concerns the number of citations for COVID-related journal articles, which were obtained automatically from the Web of Science when searching for the journal articles, while citation counts for preprints were manually retrieved from Google Scholar. While these metrics can produce a comparable number of citations, Google Scholar tends to find more citations for a number of disciplines, although no benchmark exists for psychology papers (Waltman, 2016). Our simulation exercise showed that the differences (or lack thereof) in citation rates found in the two analyses we conducted between journal articles and preprints would only qualitatively differ if Web of Science was undercounting by 17-40% (depending on the exact effect). Further research on differences in citation counts between these tools for psychology papers is needed to ascertain whether this amount of undercounting from Web of Science should be expected.
The interpretation of the disparities in publication time between COVID and non-COVID preprints is constrained by the adoption of fast-track procedures in certain journals during the pandemic, such as those published by the Association for Psychological Science (APS), including Psychological Science, Current Directions in Psychological Science, and Perspectives on Psychological Science. A subset of the articles examined in this study have been published in APS journals (three articles in total), suggesting that the accelerated publication pace could be influenced, at least in part, by streamlined processes implemented by these journals.
Conclusions
Psychology and the behavioural sciences more generally can contribute important insights and develop policy recommendations to cope with crises such as a global pandemic. Yet, to provide robust evidence, the field needs to produce timely research of reliable quality. Preprints are central to these efforts. Although it is encouraging that a majority of research unrelated to COVID-19 reported open science practices (as a rough measure of quality), only a minority of COVID-19-related preprints and published articles did. After the replication crisis, and in light of the high stakes inherent to a pandemic, it would be devastating for the discipline to see cases of scientific misconduct similar to those that led to the retraction of papers in leading medical journals. Additionally, while the geographic concentration of authors is already a limitation in normal circumstances, its impact is particularly pronounced during a global pandemic such as COVID-19, where assembling an international collaboration may require more time and effort, resources that are especially scarce. Thus, in preparation for future pandemics and other crises, a robust infrastructure needs to be developed to improve geographical diversity. For this purpose, international collaboration grants should be established to financially support researchers from diverse regions, along with interconnected global research hubs for resource sharing. Beyond that, standardised digital collaboration tools that are accessible globally should be adopted, supporting features for virtual communication and data sharing across multiple languages and time zones. Generative AI applications, such as ChatGPT, can help provide translation and editorial services to aid non-native English speakers in publishing their research. Although these initiatives require significant time and resources, they are crucial for enhancing the geographical diversity of research. This improvement is vital not only in normal circumstances but also in times of crisis, ultimately strengthening research findings and increasing their relevance to addressing global health and other emergencies.
Contributions
Substantial contributions to conception and design: MW, MY, DH, SH, SL, and UH. Acquisition of data: MW, MY, DH, MR, ES, KT, SY, GS, and GE-H. Analysis and interpretation of data: DH. Drafted and/or revised the article: MW, MY, DH, CMA, SL, UH, and SH. Final approval of the version to be published: MW, DH, CMA, SL.
Funding Information
DH and SL are currently supported by Horizon 2020 grant 964728 (JITSUVAX). DH received funding from UKRI research fellowship grant ES/V011901/1 during the research. SL and CMA are supported by an Advanced Grant from the European Research Council (101020961 PRODEMINFO) to SL.
Competing Interests
The authors declare that there were no conflicts of interest with respect to the authorship or the publication of this article.
Data Accessibility Statement
Analysis scripts and data are available on the Open Science Framework: https://osf.io/nufjh/?view_only=00f1d52faeec4573951c2c7210e64aab.
Footnotes
In line with a pre-registered exclusion criterion, we excluded one article from this dataset that had also been posted as a PsyArXiv preprint within the same time frame. This was because we sought to investigate the difference between posting a preprint and only publishing as a journal article, so this article was analysed as part of the COVID-related preprints dataset.
Coders were asked to search by the title of the preprint, but before coding a preprint as “not published”, they were instructed to click on Google’s “see all results” button, because Google Scholar may list the published version of a preprint under a different title. Coders therefore checked for papers that might be published versions of the preprint under a different title.