The impact of the COVID-19 pandemic on psychological well-being led to a proliferation of online psychological interventions, along with the publication of studies assessing their efficacy. The aim of the present work was to assess the scientific quality of studies addressing online psychological interventions for common mental health problems, comparing studies published during the COVID-19 pandemic to equivalent control articles published in the four years before the pandemic. To this end, we developed and applied a quality checklist to both samples of articles (N=108). Overall, we found that the methodological quality of many studies on psychological interventions was poor both before and during the pandemic. For instance, in both samples of articles, 33% of the studies lacked a control group of any kind, and fewer than 5% of studies used blinding of any sort. Within this context, we found that studies conducted during the pandemic were published faster but showed a decline in key indicators such as the randomized allocation of participants to experimental groups, pre-registration, or data sharing. We conclude that the low overall quality of the available research on online psychological interventions deserves further scrutiny and should be taken into consideration to make informed decisions on therapy choice, policy making, and public health, particularly in times of increased demand and public interest such as the COVID-19 pandemic.

The COVID-19 pandemic created an exceptional demand on the scientific community to address a wide array of critical, time-sensitive challenges. One of these challenges has been the deterioration of mental health and well-being, and the need for treatment or prevention of these pervasive problems via online interventions. However, past experience suggests that expediting scientific production may be problematic (e.g., Jung et al., 2021; Kataoka et al., 2021; Keenan et al., 2021). Here, we address whether the general context of the COVID-19 pandemic may have had an impact on the quality of research in the specific area of online interventions for common mental health problems such as depression, anxiety, or stress.

Concerns regarding the quality of research produced in response to the pandemic were first raised in the medical-clinical research field, where scores on methodological quality assessments were reported to decline (Candal-Pedreira et al., 2022; Jung et al., 2021; Quinn et al., 2021). More specifically, meta-scientific reports identified several issues, such as a decreased use of randomized clinical trials (RCTs) in favour of observational designs (Joshy et al., 2022), smaller sample sizes (Candal-Pedreira et al., 2022), poorer adherence to publication standards (Quinn et al., 2021), a lower rate of pre-registered studies, a lack of transparency in research protocols and data analysis (Kapp et al., 2022), and a higher risk of publication and reporting bias (Candal-Pedreira et al., 2022). There is also evidence that these studies underwent faster peer review, which is not a direct indicator of quality but has been reported to be statistically associated with lower methodological quality (Allen, 2021; Horbach, 2021; Jung et al., 2021; Kapp et al., 2022). Based on these results, some authors have underscored that special care should be taken when interpreting the scientific output produced during the COVID-19 pandemic (Candal-Pedreira et al., 2022; Jung et al., 2021; Khatter et al., 2021).

The bulk of research regarding this meta-scientific question has been conducted in biomedical fields, monitoring the quality of pharmacological interventions and their scientific output during the pandemic, but research in other areas with widespread consequences for public health has been almost entirely overlooked. In particular, methodological assessments of the scientific production in the field of mental health are scarce, despite the well-documented negative psychological consequences of the COVID-19 pandemic (Gruber et al., 2021; Usher, Bhullar, et al., 2020; Usher, Durkin, et al., 2020) and the considerable increase in the number of therapeutic interventions made available to alleviate such consequences. To our knowledge, only Nieto, Navas, and Vázquez (2020) have systematically reviewed the evidence regarding the impact of the pandemic on the quality of research on mental health. Nieto et al. pointed out that the reviewed papers may not meet the standards of validity, generalizability, reproducibility, and replicability (see, for example, Eignor, 2013). They identified issues such as the use of convenience samples, the lack of a priori power analyses, or poor adherence to open science recommendations. Nevertheless, the research quality of the mental health interventions that proliferated during the pandemic, mostly online due to lockdown constraints, remains virtually unexplored.

The anxiety related to the health threat of the pandemic, compounded by social distancing requirements and mobility restrictions, fostered a rapid proliferation of online tools to meet the urgent demand for psychological support. While online (i.e., “tele-health”) psychological interventions had emerged in recent years as alternatives to face-to-face interventions (Torre et al., 2018), it was during the pandemic that these approaches experienced a dramatic surge in popularity (Ho et al., 2021; Sammons et al., 2020), along with corresponding research aimed at assessing their effectiveness (e.g., Holmes et al., 2020; Palayew et al., 2020; Ruiz-Real et al., 2020). These interventions often involve applications on electronic devices, SMS services associated with different institutions, or the delivery of classic face-to-face therapy through video-conference platforms. Gaining knowledge about the methodological quality of such research is critical to ensure that patients and users receive appropriate mental healthcare and support. Hence, the goal of the present study was to assess the quality of research on online interventions for mental health during the COVID-19 pandemic, compared to a control set of studies published before the pandemic. To this end, given the lack of adequate tools available for this purpose, we developed a quality checklist designed to address key methodological features of a heterogeneous set of research designs before and after the COVID-19 outbreak. We focused on methodological issues previously identified as problematic in this field (Cybulski et al., 2016; Fraley & Vazire, 2014; Spring, 2007; Tackett et al., 2017; Tackett & Miller, 2019).

Literature Search

We searched for primary research articles published in journals included in three scientific databases widely recognized throughout the community (Scopus, Web of Science, and PubMed). The general inclusion criteria for the selection of articles were: (1) peer-reviewed original research articles published in English in indexed journals, (2) studies that tested the effects of online interventions for common mental health problems and well-being, (3) studies that used quantitative measures of the target dependent variables, namely mental health outcomes, and (4) studies using at least one within-group comparison (before vs. after the intervention) or one between-group comparison (outcomes for experimental vs. control groups). We allowed ample heterogeneity in research designs in order to collect sufficiently large samples (especially for the period after the outbreak, given the short time elapsed). Observational studies, research articles focused on qualitative analyses, case studies, published protocols, reviews, and meta-analyses were excluded from our search. We also excluded grey literature (i.e., evidence not published in indexed journals that could provide data with null or negative results that might not otherwise be disseminated; see, for example, Paez, 2018). This decision was made because we aimed to explore the quality of the articles that would effectively contribute usable knowledge to society (e.g., for the implementation of public policies on mental health, users’ decision-making, …).

For the COVID-19 sample, we searched for articles published during the pandemic. In addition to the general inclusion criteria listed above, articles were selected if they were published after 2018 and included in their title, abstract, or keywords a reference to COVID-19 (“covid-19”, “coronavirus”, “sars-cov-2”, or “pandemic”), to their interventional nature (“intervention”, “program”, “training”, “treatment”, or “therapy”), to the online format of the interventions (“online”, “web”, “app”, “eHealth”, “e-health”, “telehealth”, “tele-health”, “videoconference”, “videocall”, or “digital”), and to the expected psychological variables targeted by the interventions (“anxiety”, “depression”, “stress”, “distress”, “worry”, “mental health”, “mood disorder”, “coping”, “fear”, or “loneliness”). The number of records found for articles published during the pandemic was 2,278. After reading the title and abstract, 1,924 articles were excluded, and the remaining 354 articles were further evaluated for eligibility. We then filtered out studies that reported interventions that started before the pandemic, case studies, studies that applied a decommissioning strategy, studies that did not report psychological outcomes, studies without interventions, studies that were not exclusively online (e.g., studies that evaluated the transition of a treatment from in-person to the online modality), qualitative studies, and study protocols. Articles not explicitly mentioning COVID-19 were also excluded, as these may mostly represent papers that were already in the pipeline or submitted for publication before the pandemic. Of the 354 candidate studies, 56 finally made up the sample of articles that were analyzed (see PRISMA flowchart below).
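For illustration, the four keyword groups above might be combined into a single Boolean query along the following lines (a schematic sketch in Scopus-style syntax; the exact field codes and operators differ across Scopus, Web of Science, and PubMed, and this is not the literal search string used in the study):

```
TITLE-ABS-KEY ( ( "covid-19" OR "coronavirus" OR "sars-cov-2" OR "pandemic" )
  AND ( "intervention" OR "program" OR "training" OR "treatment" OR "therapy" )
  AND ( "online" OR "web" OR "app" OR "eHealth" OR "e-health" OR "telehealth"
        OR "tele-health" OR "videoconference" OR "videocall" OR "digital" )
  AND ( "anxiety" OR "depression" OR "stress" OR "distress" OR "worry"
        OR "mental health" OR "mood disorder" OR "coping" OR "fear" OR "loneliness" ) )
AND PUBYEAR > 2018
```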

Figure 1a.
PRISMA flow diagram for searching and selecting items from the COVID-19 sample of articles. Adapted scheme from The PRISMA Group (2009).

For the control sample, we searched for articles published before the pandemic outbreak (between 2016 and 2020), with otherwise identical keywords and criteria as for the COVID-19 sample, except for omitting pandemic-related terms. Because this approach yielded too many records, we added a constraint that narrowed the results down to papers addressing the types of interventions already present in the COVID-19 sample. This was intended not only to facilitate the literature search, but also to improve the comparability between the two groups of articles. The added keywords represented the most common interventions found in the abstracts of the COVID-19 sample of articles. Specifically, we restricted our search to articles including at least one of the following terms in their title, abstract, or keywords: “mindfulness”, “relax”, “sms”, “CBT”, “iCBT”, “psychoeducation”, “support”, “meditation”, “coach”, “EMDR”, “acceptance and commitment”, “MBCT”, “MBI”, “MBSR”, or “heartfulness”. A total of 467 articles were found in the Scopus database. After reading the title and abstract, 260 articles were excluded. After applying the same filters as above, we retained 205 articles that were eligible for evaluation. Of these, 52 articles made up the final control sample (see PRISMA flowchart below).

Figure 1b.
PRISMA flow diagram for the search and selection of items of the control sample (articles published before the pandemic). Adapted scheme from PRISMA Group (2009).

Note that three databases (Scopus, Web of Science, and PubMed) were used to search for the COVID-19 articles, whereas only Scopus was used for the control articles. In the first case, we wanted to maximize the number of COVID-19 articles and therefore expanded the search as much as possible. In the second case, the objective was simply to match the COVID-19 articles, and a Scopus search was sufficient for this purpose, since most journals are represented in all three databases.

We conducted a power analysis using G*Power version 3.1.9.7 (Faul et al., 2007) to determine the minimum effect size detectable with a sample of 108 articles (56 articles published during the pandemic, 52 articles published before the pandemic). For a two-tailed Wilcoxon-Mann-Whitney test, our sample size yields 90% power to detect an effect size of d = 0.645 and 80% power to detect an effect size of d = 0.557, assuming a Gaussian parent distribution. Therefore, this sample size allowed us to detect medium-to-large effects with reasonable power, which we considered adequate given our time and financial resources.
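For readers who prefer to verify such figures without G*Power, a quick simulation approximates them under the same assumptions (Gaussian data and a pure location shift); the function power_wmw and all of its defaults below are illustrative, not part of the original analysis:

```r
# Simulated power of a two-sided Wilcoxon-Mann-Whitney test with
# n1 = 56 and n2 = 52 articles and a standardized mean shift d (Gaussian data).
power_wmw <- function(d, n1 = 56, n2 = 52, nsim = 5000, alpha = 0.05) {
  mean(replicate(nsim, {
    x <- rnorm(n1)             # first group
    y <- rnorm(n2, mean = d)   # second group, shifted by d standard deviations
    wilcox.test(x, y)$p.value < alpha
  }))
}

set.seed(1)
power_wmw(0.645)  # should be close to 0.90
power_wmw(0.557)  # should be close to 0.80
```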

Development of a Checklist for Quality Assessment

The article search described above returned studies with a large variety of methodologies. Previous quality checklists focus almost exclusively on specific designs, typically randomized controlled trials, in which methodological characteristics that are uncommon in psychology, such as blinding, carry great weight: for example, the RoB-2 (Sterne et al., 2019), which is specific to randomized trials, or the ROBINS-I (Sterne et al., 2016) and the Newcastle-Ottawa scale (Wells et al., 2013) for non-randomized studies. In the present context, we decided to develop a specific checklist tailored to assess the methodological quality of studies with wider methodological variability. Although the development of the checklist was not the primary goal of our study, we aimed to capture the heterogeneity in research designs present in the field of online psychological interventions. We took as a reference previous work on the assessment of research quality (Ferrero et al., 2021; Jung et al., 2021; Sterne et al., 2019). Items were organized in the following clusters: design features, blinding, statistical analysis, replicability, and reporting (Table 1). Following the definition proposed by Nosek and Errington (2020), by replicability we refer to the possibility of repeating the same study. In this sense, the checklist included three items to check whether an independent research team would have sufficient information to re-run the same study on an online intervention. We did not attempt to compute a single quality score for each study merging information from different items, as such composite scores have been criticized extensively in the literature due to their heterogeneous nature (Jüni et al., 1999; Mazziotta & Pareto, 2013; Peduzzi et al., 1993).

Table 1.
Items included in our quality assessment checklist for online interventions in clinical psychology. Each variable is described and operationalized in the Coding Manual (see Supplementary Material 1).
Design features
  1. Is there a study pre-registration, trial registration, or pre-published protocol?
  2. Are participants randomized into groups?
    • Is this study labelled as a randomized controlled trial (RCT)?
    • If it is not labelled as an RCT, what name does it receive?
  3. What is the total sample size at the end of the study?
  4. What is the attrition rate of the experimental group?
  5. Is there an equivalence experimental group?
  6. Is there a control group?
    • If yes, is this an active control group?
    • What is the attrition rate of the control group?
  7. What is the sample size ratio between experimental and control groups at the end of the study?
  8. Does the study adhere to CONSORT guidelines?

Blinding
  1. Is it stated that participants were blinded to the goals or hypotheses of the study and to the assignment to study groups?
  2. Is it stated that those delivering the intervention were blind to the goals or hypotheses of the study and to the assignment to study groups?
  3. Is it stated that data analysts were blind to the goals or hypotheses of the study, as well as to the group membership of participants?

Statistical analysis
  1. Is the sample size established by an a priori power analysis?
  2. Are groups equivalent in sociodemographic variables by statistical analysis?
    • If differences are detected, is there an adequate treatment of sociodemographic variables in the data analysis?
  3. Are groups equivalent in baseline scores for the dependent variable by statistical analysis?
    • If not, is there an adequate treatment of the baseline scores?
  4. Are the statistical analyses adequate for the research design?
  5. Is data analysis performed only for complete cases (not intention-to-treat)?

Replicability
  1. Is the intervention replicable by independent researchers?
  2. Is the dependent variable (DV) replicable?
  3. Is the dependent variable (DV) obtained with validated psychometric tools?

Reporting & publishing
  1. Are the key results of the analysis fully reported?
  2. Is the article available in open access?
  3. Are the data publicly available?
  4. Are potential conflicts of interest explicitly reported?
  5. What is the acceptance time (in days) of the article?
  6. What is the source of research funding?

Coding Procedure

Two independent judges (authors C.R.P. and M.B.) assessed the checklist items for each article. The coding protocol was refined throughout 12 meetings (average duration: 2.5 hours) over five months, during which the two coders iterated between independent pilot coding and discussion to add or remove variables and to fine-tune their definitions, following the recommendations by Wilson (2019). Most importantly, during these meetings the two raters never discussed the specific coding of any particular study, only general problems faced in applying the checklist. This ensured that the assessment of each judge remained as independent as possible while allowing the progressive refinement of the checklist. Once the two judges had completed their independent assessments, they resolved disagreements through discussion until agreement was reached on each case.

Of note, coders were not blind to the article category. While we acknowledge that coding would benefit from blind coders, this would have added an extra layer of complexity to the quality assessment process (i.e., having an independent researcher thoroughly pre-process article characteristics before coding) that was deemed unfeasible. The coding protocol can be found in Supplementary Material 1.

Data Analysis

The dataset containing the scoring of each article was created, stored, and manipulated using Microsoft Excel 16.0.15028.20160 and Google Spreadsheets. Statistical analyses were performed using Jamovi (The jamovi project, 2021) and RStudio (Posit team, 2023), with the packages tidyr (Wickham, Girlich, et al., 2022) and dplyr (Wickham, François, et al., 2022) for data wrangling. For descriptive and inferential statistical analysis, we used pastecs (Grosjean et al., 2018), stats (R Core Team, 2023), MVN (Korkmaz et al., 2021), asht (Fay, 2022), summarytools (Comtois, 2022), gmodels (Warnes et al., 2022), MASS (Ripley et al., 2022), and vcd (Meyer et al., 2023). The ggplot2 (Wickham, 2016) package was used for data visualization. Continuous variables were reported as mean and SD, or median where appropriate, and categorical variables were reported as proportions (%). Continuous variables were compared using Student’s t-test (or the Mann-Whitney U-test when assumptions were not met), and categorical variables were compared using the χ2 test, Fisher’s exact test, or the Kruskal-Wallis test, depending on compliance with statistical assumptions. All contrasts were two-tailed. Cohen’s d and odds ratios (OR) are presented as effect sizes for continuous and categorical variables, respectively. However, it is worth noting that because some of our continuous variables did not meet normality assumptions, Cohen’s d may misrepresent the difference between the medians of the two distributions.
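As an illustration of the two kinds of contrasts reported below, the following sketch shows how one continuous and one dichotomous checklist variable might be compared across the two groups of articles in R. The data frame articles and its columns group, acceptance_days, and preregistered are hypothetical placeholders, not the project’s actual dataset, and the snippet is a minimal sketch rather than the authors’ analysis script:

```r
library(vcd)  # provides oddsratio() for 2x2 tables

# Continuous variable: Mann-Whitney U test across article groups
wilcox.test(acceptance_days ~ group, data = articles)

# Hand-computed Cohen's d (pooled SD) as the accompanying effect size
with(articles, {
  m <- tapply(acceptance_days, group, mean)
  s <- tapply(acceptance_days, group, sd)
  n <- tapply(acceptance_days, group, length)
  unname((m[1] - m[2]) / sqrt(((n[1] - 1) * s[1]^2 + (n[2] - 1) * s[2]^2) / (sum(n) - 2)))
})

# Dichotomous variable: chi-square test (Yates-corrected by default for 2x2 tables)
# and the corresponding odds ratio with its 95% confidence interval
tab <- with(articles, table(group, preregistered))
chisq.test(tab)
or <- oddsratio(tab, log = FALSE)
or
confint(or)
```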

Out of the 108 studies included in the analyses, 41.7% of the articles were originally produced in research centers or universities in North America, 27.8% in Europe, 18.5% in Asia, 9.3% in Oceania, and 0.9% (one article) in South America. Most articles included samples of participants drawn from the general population in western countries. The most represented journals were Mindfulness (Springer, IF = 3.8; n = 8), JMIR Mental Health (JMIR Publications, IF = 6.33; n = 8), International Journal of Environmental Research (Springer, IF = 1.48; n = 4), Frontiers in Psychology (Frontiers Media, IF = 4.23; n = 4), and Internet Interventions (Elsevier, IF = 5.36; n = 3). Impact Factors refer to 2022. Results for each category of the quality assessment checklist are presented below separately for each group (published before the pandemic started and during COVID-19). Summary statistical results are shown in Table 3 for dichotomous variables.

Design Features

In terms of design features, statistically significant differences between COVID-19 and control articles were found in the use of pre-registration, randomization of participants, and the use of RCTs, with control articles scoring higher than COVID-19 articles in all cases. No statistically significant differences were found in the use of equivalence experimental groups, control groups, active control groups, or research guidelines (e.g., CONSORT; Schulz et al., 2010) (Figure 2, Table 3). It should be noted that, overall, only 71 of the 108 studies (66%) used a control group at all. This proportion was numerically higher for articles published before the pandemic (73%) than during the pandemic (55%), although the difference did not reach statistical significance (Figure 2, Table 3).

Figure 2.
Design features (categorical variables) in both samples of studies (in %, separated by publication period). “Not Applicable” means lack of enough information to make a clear call. * = p < 0.05; *** = p < 0.001

No statistically significant differences were found in final sample size (MCOVID-19 = 267.38; MCONTROL = 153.85; SDCOVID-19 = 824.66; SDCONTROL = 204.46; MdnCOVID-19 = 75.50, MdnCONTROL = 62; Mann-Whitney U = 0.53, p = 0.597, 95% CI = [0.42; 0.63], Cohen’s d = 0.186, 95% CI = [-0.19; 0.56]) or in sample size ratios between the experimental and control groups (MCOVID-19 = 1.06; MCONTROL = 1.02; SDCOVID-19 = 0.62; SDCONTROL = 0.316; MdnCOVID-19 = 1, MdnCONTROL = 0.945; Mann-Whitney U = 0.56, p = 0.336, 95% CI = [0.43; 0.69], d = 0.101, 95% CI = [-0.358, 0.558]). However, the attrition rates of both the experimental (MCOVID-19 = 14.90; MCONTROL = 28.85; SDCOVID-19 = 18.84; SDCONTROL = 22.84; MdnCOVID-19 = 7.62, MdnCONTROL = 25.69; Mann-Whitney U = 0.30, p < 0.001, 95% CI = [0.21; 0.40], d = 0.67, 95% CI = [1.07, 0.26]) and control groups (MCOVID-19 = 11.43; MCONTROL = 21.18; SDCOVID-19 = 16.45; SDCONTROL = 19.76; MdnCOVID-19 = 1, MdnCONTROL = 16.13; Mann-Whitney U = 0.32, p < 0.001, 95% CI = [0.21; 0.46], d = 0.553, 95% CI = [1.02, 0.03]) did differ, with articles published during COVID-19 showing a significantly lower attrition rate (Figure 3).

Figure 3.
Attrition rates in experimental and control groups in both samples of studies. In pink, studies published before COVID-19; in green, studies published during COVID-19.

Blinding

No statistically significant differences were found in the reported blinding of participants, interveners, or data analysts. However, it is noteworthy that the reported use of blinding of any sort (and, by implication, of double-blind designs) was extremely rare in both samples of articles: 5.66% of the studies published during the pandemic and 0% of their historical controls (Figure 4). We assume that there is no tradition of reporting these characteristics, given the nature of the interventions and studies within clinical psychology.

Figure 4.
Reported use of blinding in both samples of studies (in %, separated by publication period). “Not Applicable” means lack of enough information to make a clear call.

Statistical Analysis

Articles published during the pandemic relied significantly more often on the analysis of complete cases, without considering the percentage of the sample that did not complete the intervention. For the remaining checklist items, no statistically significant differences were found, although it is noteworthy that articles published during COVID-19 showed the highest proportion of “no information/not applicable” responses (Figure 5, Table 3).

Figure 5.
Data analysis items in both samples of studies (in %, separated by publication period). SD = sociodemographic; BL = baseline; ** = p < 0.01. “Not Applicable” means lack of enough information to make a clear call.

Replicability

Articles published during the pandemic were assessed as less replicable by other researchers. No statistically significant differences were found concerning the use of replicable and validated dependent variables (Figure 6, Table 3). The difference was therefore driven by other aspects, particularly the specificity of the descriptions of the interventions, which often did not provide sufficient detail to enable consistent application by other researchers or clinicians.

Figure 6.
Replicability items in both samples of studies (in %, separated by publication period). “Not Applicable” means lack of enough information to make a clear call. ** = p < 0.01.

Reporting and Publishing

Articles published during COVID-19 were more likely to be published in open access than articles published before the pandemic, and they underwent shorter acceptance times (Mann-Whitney U = 0.32, p < 0.01, 95% CI = [0.21; 0.44], d = 0.58, 95% CI = [1.04, 0.12]) (Figures 7 and 8). No statistically significant differences were found in terms of full reporting of key results, conflicts of interest, or the types of funding received (Figure 7).

The twenty-seven conflicts of interest explicitly reported in the articles were related to the authors’ roles as developers of the online apps being assessed, income from books on psychopathology and interventions, and/or collaboration with private companies or royalties received from them.

Figure 7.
Reporting & publishing items in both samples of studies (in %, separated by publication period). “Not Applicable” means lack of enough information to make a clear call. *** p < 0.001.
Figure 8.
Distribution of Acceptance time (in days) in both samples of studies. In pink, studies published before COVID-19. In green, studies published during COVID-19. The right-hand side shows the number of days to acceptance for individual articles. Vertical lines indicate group averages, showing how articles published during COVID-19 had shorter acceptance times than articles published before COVID-19. ** p < 0.01
Table 3.
Summary of statistical results for dichotomous variables.
| Variable | N | χ² | p | χ² (cc) | p (cc) | OR | 95% CI |
|---|---|---|---|---|---|---|---|
| *Design features* | | | | | | | |
| **Pre-registered** | 108 | 6.40 | 0.011 | 5.40 | 0.020 | 0.34 | [0.15, 0.80] |
| **Randomization** | 108 | 14.26 | < .001 | 12.80 | < .001 | 0.21 | [0.09, 0.48] |
| **RCT** | 108 | 16.30 | < .001 | 14.80 | < .001 | 0.19 | [0.09, 0.44] |
| Use of equivalence experimental groups | 108 | 1.55 | 0.213 | 0.95 | 0.330 | 0.50 | [0.17, 1.50] |
| Use of control groups | 108 | 2.40 | 0.122 | 1.81 | 0.179 | 0.53 | [0.23, 1.19] |
| Use of active control groups | 71 | 1.16 | 0.280 | 0.71 | 0.400 | 0.60 | [0.23, 1.53] |
| Use of guidelines | 108 | 1.25 | 0.264 | 0.80 | 0.372 | 0.60 | [0.25, 1.47] |
| *Statistical analysis* | | | | | | | |
| Power analysis | 108 | 1.19 | 0.274 | 0.799 | 0.372 | 0.65 | [0.29, 1.42] |
| Sociodemographic equivalence (statistical) | 74 | 1.99 | 0.159 | 1.35 | 0.245 | 0.50 | [0.19, 1.32] |
| Sociodemographic features accounted for | 71 | 0.0482 | 0.826 | | | 0.90 | [0.33, 2.40] |
| Baseline scores equivalence (statistical) | 65 | 0.244 | 0.621 | 0.0471 | 0.828 | 1.32 | [0.44, 3.96] |
| Baseline scores accounted for | 75 | 2.49 | 0.287 | 2.49 | 0.287 | 2.40 | [0.55, 10.4] |
| Adequate statistical analysis | 108 | 1.62 | 0.202 | 0.78 | 0.377 | 2.87 | [0.53, 15.5] |
| **Exclusive complete-cases analysis** | 101 | 9.39 | 0.002 | 8.15 | 0.004 | 3.85 | [1.59, 9.31] |
| *Reported use of blinding* | | | | | | | |
| Of participants | 108 | 2.87 | 0.091 | 1.22 | 0.268 | 6.87ª | [0.35, 136] |
| Of interveners | 108 | 0.89 | 0.345 | 0.19 | 0.664 | 2.89 | [0.29, 28.7] |
| Of data analysts | 108 | 0.08 | | 5.01e-31 | 0.772 | 1.26 | [0.27, 5.90] |
| *Replicability* | | | | | | | |
| **Replicable interventions** | 108 | 5.88 | 0.015 | 4.86 | 0.027 | 0.33 | [0.13, 0.83] |
| Replicable DV | 108 | 0.937 | 0.333 | 1.11e-30 | | 0.35ª | [0.014, 8.84] |
| Validated DV | 108 | 0.392 | 0.531 | 0.0669 | 0.796 | 0.62 | [0.14, 2.75] |
| *Reporting & publishing* | | | | | | | |
| Fully reported | 108 | 1.98 | 0.160 | 1.35 | 0.245 | 2.00 | [0.75, 5.31] |
| **Open Access** | 108 | 14.6 | < .001 | 12.9 | < .001 | 6.91 | [2.36, 20.2] |
| Open Data | 108 | 1.15 | 0.284 | 0.464 | 0.496 | 2.45 | [0.45, 13.2] |

Note. OR estimates were calculated by comparing rows (COVID-19/control). Variables for which a significant effect was observed are shown in bold. cc = continuity correction. ª Haldane-Anscombe correction applied.
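The Haldane-Anscombe correction flagged with “ª” simply adds 0.5 to every cell of the 2×2 table before computing the odds ratio, which keeps the estimate finite when a cell count is zero. A minimal sketch with made-up cell counts (not the study’s actual data):

```r
# Hypothetical 2x2 table with an empty cell: 3 of 56 COVID-19 articles and
# 0 of 52 control articles reporting a given feature
tab <- matrix(c(3, 0, 53, 52), nrow = 2,
              dimnames = list(group = c("COVID-19", "control"),
                              reported = c("yes", "no")))

tab_ha <- tab + 0.5  # Haldane-Anscombe correction: add 0.5 to every cell
(tab_ha[1, 1] * tab_ha[2, 2]) / (tab_ha[1, 2] * tab_ha[2, 1])  # finite odds ratio
```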

The COVID-19 pandemic that started in early 2020 has become a large-scale natural experiment that can reveal valuable information about how research works under unusually high pressure. It is well known that the generalized context of increased stress levels after the COVID-19 outbreak led to a higher prevalence of mental health issues in the population (see, for example, Seyed et al., 2021). This, combined with a surge of online communication due to mobility constraints, led to the proliferation of online mental health interventions in the market (see, for example, Sammons et al., 2020). Many of these interventions urgently sought empirical evidence supporting their efficacy, thereby creating a research context in which: (1) the need to obtain information immediately sped up the research process, increasing the risk of mistakes; (2) a rapid peer review process may have contributed to overlooking important details; (3) the urge to publish while the topic remained timely intensified an already competitive research environment driven by bibliometric indicators; (4) effective and early solutions were urgently required in a moment of social emergency; and (5) contextual variables may have negatively affected researchers’ performance (e.g., the stress of a highly demanding situation, or work-life balance problems). All these factors have been associated with a negative impact on methodological research quality (Allen, 2021; Bommier et al., 2021; Kapp et al., 2022; Park et al., 2021). Consequently, we wondered whether the methodological quality of articles addressing the efficacy of online mental health interventions may have decreased during the pandemic compared to before it. This question was additionally motivated by previous reports underlining a decrease in the methodological quality of other health-related research fields in the wake of the COVID-19 outbreak (see Jung et al., 2021). To address this issue, we developed a quality checklist and applied it to a selection of studies published during the COVID-19 pandemic and a set of comparable control studies. Overall, for many indicators, we found poor adherence to the best methodological standards across the board, without significant differences between the two samples of articles. In addition, in some of the indicators we detected signs of lower methodological quality in the articles on online mental health interventions conducted after the COVID-19 outbreak. Specifically, there was a decrease in the use of randomized groups, RCT designs, and pre-registration (which helps reduce questionable practices such as HARKing; Kerr, 1998; Lakens, 2019). In addition, although the difference was not statistically significant, we detected a numerical decline in the use of active control groups in favour of waiting-list control groups. Overall, 25 variables were measured, 17 were non-significant, and 4 indicated poorer research quality for research published during the pandemic (pre-registration, randomization, use of RCT designs, replicability). Additionally, attrition rates were lower in research published during the pandemic, and open-access publishing was more common in these studies.

The decline in the use of randomized groups could have compromised the quality of the evidence produced after the pandemic outbreak. The lack of randomization increases the chance that confounding baseline variables produce systematic differences between groups, in addition to, or instead of, the treatment under test (Reeves, 2008). Although randomization is not a silver bullet for perfectly matched groups (Sella et al., 2021), its absence makes it more likely to incur range restriction and to overestimate effects. Besides inflating the apparent efficacy of treatments, the overestimation of effects increases false-positive rates, contributing to the replicability crisis (Fraley & Vazire, 2014; Spring, 2007; Tackett et al., 2017; Tackett & Miller, 2019). Furthermore, the lack of an adequate control group may make it impossible to draw valid conclusions from a study, since the effect of applying an intervention cannot be compared with its absence. In other words, it poses problems in terms of internal validity (Joy et al., 2005). Beyond the comparison between COVID-19 and control articles, we would like to point out that the generalized lack of active control groups (only 35 of the 108 studies included one) is an alarming result in itself. What is more, about 37 of the 108 studies did not include any kind of control group at all. In clinical research, the inclusion of an active control group, although not sufficient, is often necessary to tell apart improvements due to the treatment from those due to other factors such as the passage of time or general amelioration based on the expectation of treatment (e.g., Boot et al., 2013).

Articles published during the pandemic also showed a preference for complete-case analysis (as opposed to intention-to-treat analysis). The loss of participants mid-treatment breaks the principle of randomization and yields a smaller final sample shaped by variables that were not measured or observed but may be related to the outcome. When the remaining sample consists of participants who are particularly motivated by the intervention, a circumstance that attrition itself can produce, effect sizes may be overestimated (Cuijpers et al., 2010; May et al., 1981; Peduzzi et al., 1993; Sackett & Gent, 1979). Thus, the conclusions that can be drawn about the efficacy of the interventions may be biased (Salim et al., 2008). Another distinguishing feature between the COVID-19 and control articles is the time to acceptance for publication in journals. Articles published during the pandemic had shorter acceptance times than articles published before the pandemic. This is consistent with converging evidence that editorial and peer review processes were faster during the pandemic, which was associated with fewer editorial recommendations for significant changes or additional experiments and with a more conciliatory and cooperative tone (Horbach, 2021; Kapp et al., 2022). Although this may not necessarily be an indication of lower quality, it may reflect more lenient standards for certain studies (see, for example, Horbach, 2021), which may ultimately lead to overlooking methodological problems and to a lack of transparency (Allen, 2021; Jung et al., 2021; Kapp et al., 2022). An alternative explanation for the shorter review times may be that reviewers were more motivated given the situation and prioritized these reviews over other tasks, compressing the review in time even though the time spent on the review itself remained equivalent. With our present data, we cannot discard these alternative explanations.

Articles published during COVID-19 tended to publish their full texts in open access to a greater extent, but this was not accompanied by an increase in making the datasets available to the scientific community. This limitation was shared with articles published before the pandemic: neither before nor during the pandemic was there a tradition of publishing data in repositories (Vanpaemel et al., 2015). This has implications for the replicability debate. Sharing datasets is associated with the strength of the evidence obtained and the quality of statistical reporting of the results (Claesen et al., 2023; Wicherts et al., 2011). It also helps prevent or resolve statistical errors, generate new research questions, and establish collaborations between researchers (Wicherts, 2016). Open Science initiatives thus seem to pay more attention to sharing articles than to sharing data, generating a ‘marketing-oriented open science’ phenomenon driven by the response cost that data sharing imposes on individual researchers (e.g., preparing research by-products for public availability without extra resources from their institutions), as suggested by some studies (see Fernández Pinto, 2020; Scheliga & Friesike, 2014; Tenopir et al., 2020; Vines et al., 2014). Given the design of the present study, we cannot tell whether the greater proportion of open-access publications was due to the contextual conditions caused by the pandemic or to the general trend toward this publishing modality over time. There was already an increasing tendency to publish in open access before the pandemic began, so the causal link is impossible to establish at present (Belli et al., 2020; Miguel et al., 2016; Tennant et al., 2016).

Beyond the potential before-after differences highlighted above, a secondary but important finding that emerges from the present study relates to the generalized problems in research on online interventions for mental health. Of particular relevance are the absence of control groups in about a third of all the studies, and the even rarer use of active control groups; the limited use of randomization to form treatment groups; the lack of sample size calculations supported by a priori power analyses; and the lack of pre-registration of the study designs. We also found that the studies in our sample often did not use (or failed to report) blinding strategies, did not usually match groups on sociodemographic variables, did not describe the implemented interventions in sufficient detail to allow replication, and did not provide sufficient details about the data treatment and analyses performed, among other limitations that jeopardize the generalizability of their results.

It is also relevant to highlight several methodological limitations of the work presented in this article. Regarding the design of the checklist, it is possible that we failed to cover all the content necessary to fully assess methodological quality in such a heterogeneous area of research. Furthermore, we encountered difficulties in the evaluation of some items, such as the adequacy of the data analysis: while there is agreement regarding the inadequacy of certain analyses, there is considerably less agreement concerning the correctness of others. In many cases, it was difficult to know whether the reported analysis was ideal for the question posed in the study merely because of insufficient information (e.g., lack of precision in justifying or describing the chosen technique). This has potential implications, as Simmons et al. (2011) point out that flexibility in reporting the selection of statistical analyses is related to a higher incidence of false positives. As Scheel (2022) indicates, “most psychological research findings are not even wrong” because of their critical underspecification. Some decisions on analysis techniques are clearly incorrect (e.g., using separate t-tests for experimental and control groups), while others are more open to discussion, which made it impossible to be categorical on this item. Another limitation of the checklist is that it is only sensitive to reported information. That is, some of the included studies may have met the methodological standards assessed in our checklist but simply failed to report them. For instance, blinding of participants may have been implemented but not mentioned in the article. Conversely, other information may have been concealed, such as the existence of undeclared conflicts of interest. Yet, failing to report what would seem important information about a research study would be, in itself, an indicator of poor scholarship. Beyond financial conflicts of interest (which are the easiest to detect if reported), non-financial conflicts such as personal experience, academic competition, and the theoretical approach of the researchers could also influence the research process. This issue is not assessed by our checklist because of the inherent difficulties in conceptualizing it. Nevertheless, the fact that such conflicts are not explicitly stated does not mean that they do not occur; rather, they remain an important factor to consider (Bero, 2017).

It is also important to note that the present study cannot establish a causal link between changes in research quality and the pandemic. However, we can hypothesize that the difficult working conditions generated by the public health crisis probably played a role. There may have been difficulties in recruiting samples, with the participants available being more committed to the study, hence the reduction in attrition; a more agile and immediate research process may have been necessary, hence the reduced RCT planning and pre-registration. These are all speculative, but plausible, causal explanations of the phenomena observed in this study. Sakamoto et al. (2022) found that 91.1% of a sample of researchers from different fields changed the way they managed their work routine because of the pandemic. Many of the problems mentioned above (pressure to increase the number of publications, problems in planning research, conflicting interpretations, and difficulties in finding financial support) may have been exacerbated by the pandemic (Fuentes, 2017; Mohan et al., 2020; Zarea Gavgani, 2020). However, there is no systematized literature on all these factors, and their study needs to be expanded.

The present study has revealed an initial but admittedly incomplete picture of the research quality of the burgeoning field of online mental health interventions. Future research should, for example, develop more complete checklists and other evaluation tools that cover more faithfully the characteristics of research addressing online psychological treatments. These could be complemented by interviews and surveys of the authors of the assessed articles to understand in greater depth which steps were taken. We believe that it is necessary to continue working on the evaluation of the methodological quality of online interventional studies, since they are becoming one of the main approaches used in psychology, especially for the treatment of the most frequent mental health problems. Moreover, it is worth exploring how other relevant and more sophisticated issues in research, such as HARKing (hypothesizing after the results are known) and ‘spin’ (when the narrative of an article does not correspond to the results obtained, emphasizing positive results and disguising negative ones; see, for example, Chiu et al., 2017), could be included. It is equally necessary to analyze the context in which research is conducted and to consider the rules of the game that govern it, many of which favour questionable research practices, in order to propose more appropriate ones (Bakker et al., 2012).

We assessed the methodological quality of articles addressing the efficacy of online interventions for common mental health problems in clinical psychology in two time windows, right before and after the COVID-19 outbreak. We observed a decline affecting various aspects of the research process. Articles published during COVID-19 showed less frequent use of pre-registration, randomized allocation of participants, and RCT designs, more frequent use of complete-case analyses, and shorter acceptance times in journals. Most of the detected problems are not new. In fact, our results also revealed important generalized shortcomings in this area of clinical research on online psychological interventions, although some of them may have been exacerbated during the pandemic. Given the important consequences for the well-being and mental health of large sections of the population, the present results should motivate a more careful consideration of methodological issues in the field of online psychological interventions in general, and during times of increased societal stress and urgency in particular.

Contributed to conception and design: SSF, MAV, LM, MB

Contributed to acquisition of data: CRP, MB

Contributed to analysis and interpretation of data: CRP, MAV, LM, SSF, MB

Drafted and/or revised the article: CRP, SSF, MAV, LM, MB

Approved the submitted version for publication: CRP, SSF, MAV, LM, MB

Many thanks to Gonzalo García-Castro for his invaluable classes in the use of ggplot2, and for making the stays in Barcelona so pleasant. Academia is a kinder place because he is in it.

The project was funded by ‘Ayudas Fundación BBVA a Equipos de Investigación Científica SARS-CoV-2 y COVID-19’, grant nº #093_2020.

MAV is funded by AEI / UE (grant CNS2022-135346)

SSF is funded by the AGAUR Generalitat de Catalunya (2021 SGR 00911).

MB is funded by the Serra Hunter programme / Generalitat de Catalunya.

CRP is co-financed by the funds of the Recovery, Transformation, Resilience and Next Generation - EU plan of the European Union. Exp. INVESTIGO 2022-C23.I01.P03.S0020-0000031- Psychiatry Area.

The authors declare there are no competing interests.

The database used, as well as the scripts used, and the raw results can be found on the project page within the Open Science Framework: “COREVID: COronavirus Research EVIDence evaluation.” https://osf.io/t9gqj/. DOI: 10.17605/OSF.IO/T9GQJ

Allen, R. M. (2021). When peril responds to plague: Predatory journal engagement with COVID-19. Library Hi Tech, 39(3), 746–760. https://doi.org/10.1108/lht-01-2021-0011
Bakker, M., van Dijk, A., & Wicherts, J. M. (2012). The Rules of the Game Called Psychological Science. Perspectives on Psychological Science, 7(6), 543–554. https://doi.org/10.1177/1745691612459060
Belli, S., Mugnaini, R., Baltà, J., & Abadal, E. (2020). Coronavirus mapping in scientific publications: When science advances rapidly and collectively, is access to this knowledge open to society? Scientometrics, 124(3), 2661–2685. https://doi.org/10.1007/s11192-020-03590-7
Bero, L. (2017). Addressing Bias and Conflict of Interest Among Biomedical Researchers. JAMA, 317(17), 1723–1724. https://doi.org/10.1001/jama.2017.3854
Bommier, C., Stœklé, H.-C., & Hervé, C. (2021). COVID-19: The urgent call for academic research in research ethics. Ethics, Medicine and Public Health, 18, 100679. https://doi.org/10.1016/j.jemep.2021.100679
Boot, W. R., Simons, D. J., Stothart, C., & Stutts, C. (2013). The Pervasive Problem With Placebos in Psychology: Why Active Control Groups Are Not Sufficient to Rule Out Placebo Effects. Perspectives on Psychological Science, 8(4), 445–454. https://doi.org/10.1177/1745691613491271
Candal-Pedreira, C., Pérez-Ríos, M., & Ruano-Ravina, A. (2022). Comparison of COVID-19 and non-COVID-19 papers. Gaceta Sanitaria, 36(6), 506–511. https://doi.org/10.1016/j.gaceta.2022.03.006
Chiu, K., Grundy, Q., & Bero, L. (2017). ‘Spin’ in published biomedical literature: A methodological systematic review. PLOS Biology, 15(9), e2002173. https://doi.org/10.1371/journal.pbio.2002173
Claesen, A., Vanpaemel, W., Maerten, A.-S., Verliefde, T., Tuerlinckx, F., & Heyman, T. (2023). Data sharing upon request and statistical consistency errors in psychology: A replication of Wicherts, Bakker and Molenaar (2011). PLOS ONE, 18(4), e0284243. https://doi.org/10.1371/journal.pone.0284243
Comtois, D. (2022). summarytools: Tools to Quickly and Neatly Summarize Data (1.0.1). https://CRAN.R-project.org/package=summarytools
Cuijpers, P., Smit, F., Bohlmeijer, E., Hollon, S. D., & Andersson, G. (2010). Efficacy of cognitive–behavioural therapy and other psychological treatments for adult depression: meta-analytic study of publication bias. The British Journal of Psychiatry: The Journal of Mental Science, 196(3), 173–178. https://doi.org/10.1192/bjp.bp.109.066001
Cybulski, L., Mayo-Wilson, E., & Grant, S. (2016). Improving transparency and reproducibility through registration: The status of intervention trials published in clinical psychology journals. Journal of Consulting and Clinical Psychology, 84(9), 753–767. https://doi.org/10.1037/ccp0000115
Eignor, D. R. (2013). The standards for educational and psychological testing. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 1. Test theory and testing and assessment in industrial and organizational psychology (pp. 245–250). American Psychological Association. https://doi.org/10.1037/14047-013
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/bf03193146
Fay, M. P. (2022). asht: Applied Statistical Hypothesis Tests (0.9.7). https://CRAN.R-project.org/package=asht
Fernández Pinto, M. (2020). Open Science for private Interests? How the Logic of Open Science Contributes to the Commercialization of Research. Frontiers in Research Metrics and Analytics, 5. https://doi.org/10.3389/frma.2020.588331
Ferrero, M., Vadillo, M. A., & León, S. P. (2021). Is project-based learning effective among kindergarten and elementary students? A systematic review. PLOS ONE, 16(4), e0249627. https://doi.org/10.1371/journal.pone.0249627
Fraley, R. C., & Vazire, S. (2014). The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power. PLOS ONE, 9(10), e109019. https://doi.org/10.1371/journal.pone.0109019
Fuentes, M. J. V. (2017). Challenges in Doing Research and Its Effects to the Research Output. In Ascendens Asia Journal of Multidisciplinary Research Conference Proceedings 1(3). https://api.semanticscholar.org/CorpusID:151304211
Grosjean, P., Ibanez, F., & Etienne, M. (2018). pastecs: Package for Analysis of Space-Time Ecological Series (1.3.21). https://CRAN.R-project.org/package=pastecs
Gruber, J., Prinstein, M. J., Clark, L. A., Rottenberg, J., Abramowitz, J. S., Albano, A. M., Aldao, A., Borelli, J. L., Chung, T., Davila, J., Forbes, E. E., Gee, D. G., Hall, G. C. N., Hallion, L. S., Hinshaw, S. P., Hofmann, S. G., Hollon, S. D., Joormann, J., Kazdin, A. E., … Weinstock, L. M. (2021). Mental health and clinical psychological science in the time of COVID-19: Challenges, opportunities, and a call to action. American Psychologist, 76(3), 409–426. https://doi.org/10.1037/amp0000707
Ho, Y.-S., Fu, H.-Z., & McKay, D. (2021). A bibliometric analysis of COVID-19 publications in the ten psychology-related Web of Science categories in the social science citation index. Journal of Clinical Psychology, 77(12), 2832–2848. https://doi.org/10.1002/jclp.23227
Holmes, E. A., O’Connor, R. C., Perry, V. H., Tracey, I., Wessely, S., Arseneault, L., Ballard, C., Christensen, H., Cohen Silver, R., Everall, I., Ford, T., John, A., Kabir, T., King, K., Madan, I., Michie, S., Przybylski, A. K., Shafran, R., Sweeney, A., … Bullmore, E. (2020). Multidisciplinary research priorities for the COVID-19 pandemic: A call for action for mental health science. The Lancet Psychiatry, 7(6), 547–560. https://doi.org/10.1016/s2215-0366(20)30168-1
Horbach, S. P. J. M. (2021). No time for that now! Qualitative changes in manuscript peer review during the Covid-19 pandemic. Research Evaluation, 30(3), 231–239. https://doi.org/10.1093/reseval/rvaa037
Joshy, A. S., Thomas, C., Surendran, S., & Undela, K. (2022). Can We Really Trust the Findings of the COVID-19 Research? Quality Assessment of Randomized Controlled Trials Published on COVID-19 [Preprint]. medRxiv. https://doi.org/10.1101/2022.04.15.22273881
Joy, J. E., Penhoet, E. E., Petitti, D. B., & Institute of Medicine (US) and National Research Council (US) Committee on New Approaches to Early Detection and Diagnosis of Breast Cancer. (2005). Common Weaknesses in Study Designs. In Saving Women's Lives: Strategies for Improving Breast Cancer Detection and Diagnosis. National Academies Press (US). https://www.ncbi.nlm.nih.gov/books/NBK22323/
Jung, R. G., Di Santo, P., Clifford, C., Prosperi-Porta, G., Skanes, S., Hung, A., Parlow, S., Visintini, S., Ramirez, F. D., Simard, T., & Hibbert, B. (2021). Methodological quality of COVID-19 clinical research. Nature Communications, 12(1), 943. https://doi.org/10.1038/s41467-021-21220-5
Jüni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring the quality of clinical trials for meta-analysis. JAMA, 282(11), 1054–1060. https://doi.org/10.1001/jama.282.11.1054
Kapp, P., Esmail, L., Ghosn, L., Ravaud, P., & Boutron, I. (2022). Transparency and reporting characteristics of COVID-19 randomized controlled trials [Preprint]. medRxiv. https://doi.org/10.1101/2022.02.03.22270357
Kataoka, Y., Oide, S., Ariie, T., Tsujimoto, Y., & Furukawa, T. A. (2021). The Methodological Quality Score of COVID-19 Systematic Reviews is Low, Except for Cochrane Reviews: A Meta-epidemiological Study. Annals of Clinical Epidemiology, 3(2), 46–55. https://doi.org/10.37737/ace.3.2_46
Keenan, C., Noone, C., McConnell, K., & Cheng, S. H. (2021). A rapid response to the COVID-19 outbreak. Journal of EAHIL, 17(2), 16–20. https://doi.org/10.32384/jeahil17464
Kerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4
Khatter, A., Naughton, M., Dambha-Miller, H., & Redmond, P. (2021). Is rapid scientific publication also high quality? Bibliometric analysis of highly disseminated COVID-19 research papers. Learned Publishing, 34(4), 568–577. https://doi.org/10.1002/leap.1403
Korkmaz, S., Goksuluk, D., & Zararsiz, G. (2021). MVN: Multivariate Normality Tests. https://CRAN.R-project.org/package=MVN
Lakens, D. (2019). The Value of Preregistration for Psychological Science: A Conceptual Analysis [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/jbh4w
May, G. S., DeMets, D. L., Friedman, L. M., Furberg, C., & Passamani, E. (1981). The randomized clinical trial: Bias in analysis. Circulation, 64(4), 669–673. https://doi.org/10.1161/01.cir.64.4.669
Mazziotta, M., & Pareto, A. (2013). Methods for constructing composite indices: One for all or all for one. Rivista Italiana di Economia Demografia e Statistica, 67(2), 67–80.
Meyer, D., Zeileis, A., & Hornik, K. (2023). vcd: Visualizing Categorical Data (1.4-11). https://CRAN.R-project.org/package=vcd
Miguel, S., Tannuri de Oliveira, E. F., & Cabrini Grácio, M. C. (2016). Scientific Production on Open Access: A Worldwide Bibliometric Analysis in the Academic and Scientific Context. Publications, 4(1), 1. https://doi.org/10.3390/publications4010001
Mohan, A., Chacko, J., & Mohan, P. (2020). Current trends in research influencing its scientific value. International Journal of Scientific Research, 9(8), 1–2. https://doi.org/10.36106/ijsr/3130290
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & The PRISMA Group. (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Medicine, 6(7), e1000097. https://doi.org/10.1371/journal.pmed.1000097
Nieto, I., Navas, J. F., & Vázquez, C. (2020). The quality of research on mental health related to the COVID-19 pandemic: A note of caution after a systematic review. Brain, Behavior, & Immunity - Health, 7, 100123. https://doi.org/10.1016/j.bbih.2020.100123
Nosek, B. A., & Errington, T. M. (2020). What is replication? PLoS Biology, 18(3), e3000691. https://doi.org/10.1371/journal.pbio.3000691
Palayew, A., Norgaard, O., Safreed-Harmon, K., Andersen, T. H., Rasmussen, L. N., & Lazarus, J. V. (2020). Pandemic publishing poses a new COVID-19 challenge. Nature Human Behaviour, 4(7), 666–669. https://doi.org/10.1038/s41562-020-0911-0
Park, J. J. H., Mogg, R., Smith, G. E., Nakimuli-Mpungu, E., Jehan, F., Rayner, C. R., Condo, J., Decloedt, E. H., Nachega, J. B., Reis, G., & Mills, E. J. (2021). How COVID-19 has fundamentally changed clinical research in global health. The Lancet Global Health, 9(5), e711–e720. https://doi.org/10.1016/s2214-109x(20)30542-8
Peduzzi, P., Wittes, J., Detre, K., & Holford, T. (1993). Analysis as-randomized and the problem of non-adherence: An example from the Veterans Affairs Randomized Trial of Coronary Artery Bypass Surgery. Statistics in Medicine, 12(13), 1185–1195. https://doi.org/10.1002/sim.4780121302
Posit team. (2023). RStudio: Integrated Development Environment for R. Posit Software, PBC, Boston, MA. http://www.posit.co/
Quinn, T. J., Burton, J. K., Carter, B., Cooper, N., Dwan, K., Field, R., Freeman, S. C., Geue, C., Hsieh, P.-H., McGill, K., Nevill, C. R., Rana, D., Sutton, A., Rowan, M. T., & Xin, Y. (2021). Following the science? Comparison of methodological and reporting quality of COVID-19 and other research from the first wave of the pandemic. BMC Medicine, 19(1), 46. https://doi.org/10.1186/s12916-021-01920-x
R Core Team. (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Reeves, B. C. (2008). Principles of research: Limitations of non-randomized studies. Surgery (Oxford), 26(3), 120–124. https://doi.org/10.1016/j.mpsur.2008.02.004
Ripley, B., Venables, B., Bates, D. M., Hornik, K., Gebhardt, A., & Firth, D. (2022). MASS: Support Functions and Datasets for Venables and Ripley's MASS (7.3-57). https://CRAN.R-project.org/package=MASS
Ruiz-Real, J. L., Nievas-Soriano, B. J., & Uribe-Toril, J. (2020). Has Covid-19 Gone Viral? An Overview of Research by Subject Area. Health Education & Behavior, 47(6), 861–869. https://doi.org/10.1177/1090198120958368
Sackett, D. L., & Gent, M. (1979). Controversy in counting and attributing events in clinical trials. New England Journal of Medicine, 301(26), 1410–1412. https://doi.org/10.1056/nejm197912273012602
Sakamoto, B., Boer, G. C., de Lima, C., & Paiva, C. E. (2022). Influence of the COVID-19 pandemic in the academic production of health researchers from public universities of the State of São Paulo, Brazil. Manuscripta Médica, 5, 31–42. https://doi.org/10.59255/mmed.2022.71
Salim, A., Mackinnon, A., Christensen, H., & Griffiths, K. (2008). Comparison of data analysis strategies for intent-to-treat analysis in pre-test–post-test designs with substantial dropout rates. Psychiatry Research, 160(3), 335–345. https://doi.org/10.1016/j.psychres.2007.08.005
Sammons, M. T., VandenBos, G. R., & Martin, J. N. (2020). Psychological Practice and the COVID-19 Crisis: A Rapid Response Survey. Journal of Health Service Psychology, 46(2), 51–57. https://doi.org/10.1007/s42843-020-00013-2
Scheel, A. M. (2022). Why most psychological research findings are not even wrong. Infant and Child Development, 31(1), e2295. https://doi.org/10.1002/icd.2295
Scheliga, K., & Friesike, S. (2014). Putting open science into practice: A social dilemma? First Monday, 19(9). https://doi.org/10.5210/fm.v19i9.5381
Schulz, K. F., Altman, D. G., Moher, D., & the CONSORT Group. (2010). CONSORT 2010 Statement: Updated guidelines for reporting parallel group randomised trials. BMJ, 340, c332. https://doi.org/10.1136/bmj.c332
Sella, F., Raz, G., & Cohen Kadosh, R. (2021). When randomisation is not good enough: Matching groups in intervention studies. Psychonomic Bulletin & Review, 28(6), 2085–2093. https://doi.org/10.3758/s13423-021-01970-5
Seyed, A. S., Karimi, A., Shobeiri, P., Nowroozi, A., Mehraeen, E., Afsahi, A. M., & Barzegary, A. (2021). Psychological symptoms of COVID-19 epidemic: A systematic review of current evidence. Psihologija, 54(2), 173–192. https://doi.org/10.2298/psi200703035s
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Spring, B. (2007). Evidence-based practice in clinical psychology: What it is, why it matters; what you need to know. Journal of Clinical Psychology, 63(7), 611–631. https://doi.org/10.1002/jclp.20373
Sterne, J. A., Hernán, M. A., Reeves, B. C., Savović, J., Berkman, N. D., Viswanathan, M., Henry, D., Altman, D. G., Ansari, M. T., Boutron, I., Carpenter, J. R., Chan, A.-W., Churchill, R., Deeks, J. J., Hróbjartsson, A., Kirkham, J., Jüni, P., Loke, Y. K., Pigott, T. D., … Higgins, J. P. (2016). ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ, 355, i4919. https://doi.org/10.1136/bmj.i4919
Sterne, J. A., Savović, J., Page, M. J., Elbers, R. G., Blencowe, N. S., Boutron, I., Cates, C. J., Cheng, H.-Y., Corbett, M. S., Eldridge, S. M., Emberson, J. R., Hernán, M. A., Hopewell, S., Hróbjartsson, A., Junqueira, D. R., Jüni, P., Kirkham, J. J., Lasserson, T., Li, T., … Higgins, J. P. T. (2019). RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ (Clinical Research Ed.), 366, l4898. https://doi.org/10.1136/bmj.l4898
Tackett, J. L., Lilienfeld, S. O., Patrick, C. J., Johnson, S. L., Krueger, R. F., Miller, J. D., Oltmanns, T. F., & Shrout, P. E. (2017). It’s Time to Broaden the Replicability Conversation: Thoughts for and From Clinical Psychological Science. Perspectives on Psychological Science, 12(5), 742–756. https://doi.org/10.1177/1745691617690042
Tackett, J. L., & Miller, J. D. (2019). Introduction to the special section on increasing replicability, transparency, and openness in clinical psychology. Journal of Abnormal Psychology, 128(6), 487–492. https://doi.org/10.1037/abn0000455
Tennant, J. P., Waldner, F., Jacques, D. C., Masuzzo, P., Collister, L. B., & Hartgerink, C. H. J. (2016). The academic, economic and societal impacts of Open Access: An evidence-based review. F1000Research, 5, 632. https://doi.org/10.12688/f1000research.8460.3
Tenopir, C., Rice, N. M., Allard, S., Baird, L., Borycz, J., Christian, L., Grant, B., Olendorf, R., & Sandusky, R. J. (2020). Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide. PLOS ONE, 15(3), e0229003. https://doi.org/10.1371/journal.pone.0229003
The jamovi project. (2021). Jamovi. https://www.jamovi.org/
Torre, M. de la, Pardo, R., & Colegio Oficial de Psicólogos (España). (2018). Guía para la intervención telepsicológica [Guide for telepsychological intervention].
Usher, K., Bhullar, N., & Jackson, D. (2020). Life in the pandemic: Social isolation and mental health. Journal of Clinical Nursing, 29(15–16), 2756–2757. https://doi.org/10.1111/jocn.15290
Usher, K., Durkin, J., & Bhullar, N. (2020). The COVID-19 pandemic and mental health impacts. International Journal of Mental Health Nursing, 29(3), 315–318. https://doi.org/10.1111/inm.12726
Vanpaemel, W., Vermorgen, M., Deriemaecker, L., & Storms, G. (2015). Are We Wasting a Good Crisis? The Availability of Psychological Research Data after the Storm. Collabra, 1(1), 3. https://doi.org/10.1525/collabra.13
Vines, T. H., Albert, A. Y. K., Andrew, R. L., Débarre, F., Bock, D. G., Franklin, M. T., Gilbert, K. J., Moore, J.-S., Renaut, S., & Rennison, D. J. (2014). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. https://doi.org/10.1016/j.cub.2013.11.014
Warnes, G. R., Bolker, B., Lumley, T., & Johnson, R. C. (2022). gmodels: Various R Programming Tools for Model Fitting (2.18.1.1). https://CRAN.R-project.org/package=gmodels
Wells, G., Shea, B., O'Connell, D., Peterson, J., Welch, V., Losos, M., & Tugwell, P. (2013). The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp
Wicherts, J. M. (2016). Data re-analysis and open data. In Plucker & Makel (Eds.), Doing Good Social Science: Trust, Accuracy, Transparency. American Psychological Association.
Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLOS ONE, 6(11), e26828. https://doi.org/10.1371/journal.pone.0026828
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org/
Wickham, H., François, R., Henry, L., Müller, K., & RStudio. (2022). dplyr: A Grammar of Data Manipulation (1.0.9). https://CRAN.R-project.org/package=dplyr
Wickham, H., Girlich, M., & RStudio. (2022). tidyr: Tidy Messy Data (1.2.0). https://CRAN.R-project.org/package=tidyr
Wilson, D. B. (2019). Systematic Coding. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 153–172). Russell Sage Foundation. https://doi.org/10.7758/9781610448864.12
Zarea Gavgani, V. (2020). Does Covid-19 Change the Direction of Citation Impact and Social Impact of Research Articles? Depiction of Health, 11(3), 200–201. https://doi.org/10.34172/doh.2020.27
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Supplementary Material