UvA-DARE (Digital Academic Repository) A Roadmap to Large-Scale Multi-Country Replications in Psychology

and technological advances provide opportunities to replicate studies across a wide range of countries and settings to investigate whether these findings are universally applicable, limited to specific countries, or vary in magnitude depending on settings. Researchers from around the world connect to revisit such findings collaboratively, adapt the original design to the Zeitgeist, integrate new knowledge to improve statistical analyses, and broaden the scope by testing effects globally – or at least in as many countries as budget and feasibility allow. We currently observe multiple international consortia conducting large-scale multi-country replications. How do such collaborations form and how do they approach these complex investigations? This paper brings together researchers from different initiatives that conduct replications on an international scale to outline approaches and summarises what we have learned in applying them: Junior Researcher Programme (JRP), Psychological Science Accelerator (PSA), ManyBabies, Collaborative Open-science REsearch (CORE), and International Study of Metanorms (ISMN). We describe different approaches to study selection, methodology, statistical analyses, and ethical issues, and, most importantly, how the different collaborations formed and how team communication worked. We look in detail at the challenges of including typically underrepresented countries in psychological science, not only in terms of data collection but also in making it possible for local researchers to contribute. This paper provides a structured insight into how different collaborations work and issues to consider for anyone who seeks to conduct a multi-country replication in psychology, or who is looking for additional perspectives on their existing plan. We close the article with a checklist built as a helpful tool for colleagues putting together their study protocols for such efforts – and invite them to collaboratively expand it in the future.


Introduction
The replication and reproducibility crisis (Pashler & Wagenmakers, 2012; Wiggins & Christopherson, 2019) has eroded trust in classic findings from psychology and behavioural sciences (Anvari & Lakens, 2018; Wingen et al., 2020). Revisiting such findings and investigating whether widely cited and applied concepts can be replicated and generalised across time, context, and countries is increasingly visible and encouraged by researchers and journals.
Methodological and technological advances, as well as improvements in data collection and management practices, can now be utilised to update our knowledge about theory and findings and expand further to reach populations around the globe. These advances provide new opportunities for initiatives that are sometimes labelled Big Team Science: collaborations "involving a relatively large number of collaborators who may be dispersed across labs, institutions, disciplines, cultures, and continents" (Forscher et al., 2020, p. 2). For example, to test their 1979 Nobel prize-winning Prospect Theory, Daniel Kahneman and Amos Tversky relied on a small random sample of students from the US, Sweden, and Israel. Using collaborative online tools, Ruggeri and colleagues (2019) recently replicated patterns of prospect theory across 19 countries and in 13 languages with over 4,000 participants. This project was facilitated with the help of the Junior Researcher Programme (JRP), an initiative that provides opportunities for early-career researchers in psychology (jrp.pscholars.org; Jarke, 2021; Ruggeri, 2020). Multi-country replication initiatives that started earlier include the Open Science Collaboration (2012), ManyLabs (e.g., Ebersole et al., 2014; Klein et al., 2014), or the Collaborative Replications and Education Project (CREP; Grahe et al., 2013). Another approach to such multi-country replications is facilitated by the Psychological Science Accelerator (PSA; Beshears et al., 2020; Moshontz et al., 2018), an organisation focused on producing psychological science that is generalizable and reliable by utilising their distributed network of labs.
In this paper, we aim to synthesise insights gained from several multi-country collaborations. In two online panel discussions (Jarke et al., 2021), we brought together researchers who participated in and organised multi-country replication efforts to discuss key aspects of such undertakings (see Table 1 for an overview).
Based on these discussions, we summarise structured insights and experiences with different approaches on key topics, including study selection, replication criteria, translation, selection of potential collaborators, team communication, data collection methods and privacy issues, gaining ethical approval for multiple countries, and statistical approaches. We also address the inclusion of underrepresented countries and outline hurdles in including colleagues from these areas. We conceptualised this paper as a roadmap to support researchers planning a multi-country replication so that they can incorporate what has been learned in previous attempts and tailor successful approaches to their lab's situation. Some consortia already have compiled information on their approaches in detail. These are included in the accompanying reading list to this paper (Appendix 1). We discuss our insights chronologically, starting at the beginning of a study until publication and beyond. In addition, we compiled a checklist of lessons learned (Appendix 2). This checklist is not a comprehensive "must-have" but rather a tool that summarises considerations for preparing a multi-country replication. Researchers already involved in such efforts may also find suggestions and insights that can strengthen their existing approaches or add a perspective they had not previously considered. While this paper primarily focuses on replications including large numbers of labs and researchers, many insights are also applicable to replications within fewer countries or settings.

Building a Consortium: Identifying Collaborators and Team Communication
Perhaps the easiest way to build a consortium (see footnote 1) of collaborators is to draw on existing networks. Uhlmann et al. (2019) describe multi-lab collaborations along two axes, one being inclusiveness versus selectivity (i.e., who can participate and make decisions), the other being communication (i.e., is there constant collaboration, or does the PI "collect" and synthesise the work of otherwise mostly independent units). We outline our own experiences below. The JRP draws on its network of programme alumni and current team members, with current programme interns usually leading a country team in collaboration with students participating in GLOBES, a programme at Columbia University that connects undergraduate students in behavioural sciences to researchers. The majority of collaborators are early-career researchers (ECRs; here: students and postdocs who make up small project teams each year but often go on to participate in larger collaborations), whereas the project itself is led by an established academic. JRP interns and alumni can be integrated into projects without further screening, as a working relationship and trust are already established through them having participated in a summer school and a one-year research project supported by the programme at that point. Participating researchers are usually briefed five to six months before activities commence and learn about their responsibilities, the project timeline, and how communication will be facilitated (Ruggeri, 2020). Shortly before and during data collection, communication is facilitated via daily briefing emails with updates, key developments, and reminders. These are supported by group calls and a Slack channel for discussion between country teams as well as an FAQ page for providing important information across country teams.
While the first JRP-organised multi-country replication (Ruggeri et al., 2020) was conducted with most contributors on-site in one place, the two following projects have been conducted online (Ruggeri et al., 2021, 2022) due to the COVID-19 pandemic. Although online work clearly allowed for more contributors, it also constitutes a much higher workload for the project leaders and administrators. Further, with the JRP's contributors being mostly ECRs, the in-person approach also provided a unique and motivating experience, and ECRs benefit more by forming stronger relationships with international colleagues.
CORE works with a different model in which the Principal Investigator (PI) conducts all replications with students as part of the coursework in a one-semester course, publicly sharing each stage of replication, and inviting international involvement and collaborations throughout. Once projects are finalised at the end of the semester, ECRs are publicly invited to take the lead over the completed projects, verifying and extending the available outputs, and then helping bring these to publication in journals. The PI works with each ECR on their led projects, and over time, with gained experience, the ECRs support each other on specific issues or initiatives, and join forces in tackling new projects.
Footnote 1: We refer to a consortium in this context as any formal or informal collaboration between multiple labs.
The PSA represents a globally distributed network of labs located on all populated continents (see https://psysciacc.org/map/ for an updated map of all labs) organising data collection of democratically chosen projects (Moshontz et al., 2018). The network includes one director and four associate directors that are elected by all members. In addition, the network is coordinated by several committees focusing on different aspects of the study process such as study selection, ethics, translation and cultural diversity, community building, project monitoring, data and methods, training, and funding. The network regularly issues calls for project submissions from both inside and outside the network, which are first screened by the selection committee, then sent out for review by at least three experts, and finally voted on by all PSA members. Once a project has been successfully selected, PSA members are able to sign up for contributing to the project by collecting data, translating questionnaires or coordinating aspects of the project. All researchers can sign up to become a PSA member. Therefore, the PSA focuses both on inclusiveness and a mode of constant collaboration.
ManyBabies is a network of developmental psychologists distributed across a range of large-scale replication projects. New members opt in to the network and join individual projects through the organization's website. The governing board reviews new project proposals and advertises approved projects to the general listserv, where collaborators can volunteer to contribute to any and all parts of a project, including study design, data collection, analysis, and writing. The organization also has several dedicated positions, including an executive director, office assistant, and postdocs who support the range of ongoing studies. ManyBabies is committed to open science and uses a consensus-based approach for all project decisions to promote transparency, inclusivity, diversity, and collective governance.

Authorship and credit
No matter how a consortium is established, it is advantageous if everyone's responsibilities and the consequences of inaction are clear from the beginning. These responsibilities should ideally be listed in the final output (e.g., using the Contributor Roles Taxonomy, CRediT for short; Allen et al., 2019; Holcombe et al., 2020, adopted also by the PSA) to transparently report what each author has contributed. To set expectations, it is also advisable to decide on the order of authors (or principles for determining author order) as early as possible and be clear on who will be responsible for article processing fees. For instance, the PSA has specific roles for each project (e.g., data management, translation coordinator, ethics coordinator), and these roles typically qualify for different authorship tiers; the PSA also adapts these criteria on a case-by-case basis beforehand. ManyBabies has a guiding example for determining authorship (https://manybabies.github.io/authorship) but the leadership team for each new project is responsible for defining and communicating authorship guidelines. Similarly, CORE employs a detailed guide for determining authorship (https://mgto.org/joinmassreplication).
We suggest that authorship be granted to all collaborators who contribute substantially to any crucial parts of the study. A collaboration agreement that transparently outlines the criteria for co-authorship as well as how authorship order is determined can ensure expectations on this topic are aligned. Authorship may act as a strong individual and career incentive for academic collaborators, increase the public representation of researchers from marginalised backgrounds, and show that contributions are credited and valued. In terms of administration, it should also be kept in mind that typing in all author names and data for submission can take many hours of work. Therefore, we urge journals to implement systems that allow for easier ways of providing such data, such as the option of filling in template spreadsheets that can be read and copied by the submission system. Big Team Science projects may run into other unexpected issues regarding authorship: For example, in a recent submission, the PSA Self-Determination Theory Collaboration (2022) could not list all 500 authors on the byline because this would have exceeded the journal's maximum page count. Instead, consortium authorship was adopted, which especially disadvantaged researchers from low- and middle-income countries and early-career researchers: Institutional policies often do not recognize this type of authorship (for an extensive discussion about disadvantages resulting from group authorship and a call for change, see PSA's Open Letter, footnote 2).
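As a rough illustration of the template-spreadsheet idea, the following Python sketch reads a small contributor table and groups names by CRediT role. The column names (`name`, `affiliation`, `roles`), the placeholder contributors, and both helper functions are hypothetical; they are not taken from any journal's submission system or from any of the consortia described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical template: one row per contributor, roles separated by ";".
TEMPLATE = """name,affiliation,roles
Ada Example,University A,Conceptualization;Writing - original draft
Ben Sample,Institute B,Investigation;Data curation
Cy Placeholder,University A,Investigation
"""


def read_contributors(text):
    """Parse the spreadsheet text into dicts, splitting the roles column."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        roles = [r.strip() for r in row["roles"].split(";") if r.strip()]
        rows.append({
            "name": row["name"].strip(),
            "affiliation": row["affiliation"].strip(),
            "roles": roles,
        })
    return rows


def credit_statement(contributors):
    """Group contributor names by role (roles sorted alphabetically)."""
    by_role = defaultdict(list)
    for person in contributors:
        for role in person["roles"]:
            by_role[role].append(person["name"])
    return "; ".join(
        f"{role}: {', '.join(names)}" for role, names in sorted(by_role.items())
    )


contributors = read_contributors(TEMPLATE)
statement = credit_statement(contributors)
print(statement)
```

In practice, a consortium would likely maintain such a table in a shared spreadsheet and export it as CSV; the same file could then feed both the contribution statement and the author metadata requested at submission.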

Study Selection
The process of deciding which study to replicate can vary substantially from lab to lab and consortium to consortium. Approaches range from top-down, where the PI or their lab decides which study to replicate, how to do it, and whom to include, to bottom-up, where consortia form around loose ideas and develop goals and structure collectively.
Within the PSA, several labs or PIs submit a study proposal, the study selection committee checks its feasibility, the network votes on the blinded proposal, then the lead team of the accepted studies pre-registers the experiment, and publicly opens it up for potential collaborators. There are also more bottom-up approaches to study selection. For instance, the ManyBabies Consortium accepts study proposals from any researcher, which are then evaluated by the ManyBabies Governing Board. Accepted proposals are advertised to ManyBabies Consortium members and through related organisations, enabling interested researchers and labs to join new projects. Similarly, CREP starts projects through collaborators expressing interest first. The study Director looks for top-cited studies in a range of sub-disciplines of psychology. After being checked for feasibility by established researchers, feasible studies are sent to students and collaborators to rate on different aspects, such as feasibility and interest. The CREP administrator then takes those ratings and identifies the next studies for collaboration. After a study-specific admin team is put together for each study, labs are allowed to sign on to the project. On the other hand, replications conducted by the JRP have started with a small team around the PI deciding on the topic and detailed study plan first, before recruiting collaborators.

Feasibility and durability: Is this topic suitable for replication?
The most important criterion for deciding on a topic is feasibility: whether a project is doable for the PI's team (planning and managing) and the partnered labs (execution). Key considerations include how easily the research can be executed across environments by all participating labs, whether all equipment is available across sites, and whether necessary training has already been attained or can be provided. All parties need to carefully consider whether there are sufficient resources to coordinate such a large-scale effort, which can require substantial amounts of time-consuming administrative work. For example, the ManyBabies consortium has multiple staff members dedicated exclusively to non-research tasks. Basic considerations must include budget (for ethics review, study materials, personnel, etc.), whether research goals are attainable (e.g., study materials are clear and available), and whether the expected impact is worth the large-scale investment.
Replication targets can also be selected based on feasibility in terms of the replicators' qualifications, and the intended methods and target sample. In CORE, replication studies are designed by undergraduate students as part of a one-semester course. Studies are built on simplified survey platforms such as Qualtrics, and data are collected online through labour markets such as Amazon Mechanical Turk (with CloudResearch) and Prolific, mostly with US American and British participants. Therefore, in CORE the selected target articles are typically the highest-impact classics that can be easily adjusted to an online design, with methods and statistics that are doable at undergraduate level. Isager et al. (2020) propose another approach, based on a formalised model that can help guide the decision on choosing a topic, aiming to calculate an expected utility gain, which they label replication value. For this model, the authors consider the costs of the study, uncertainty about the claim before replication, the value of the scientific claim, and the expected utility of the claim before the replication as determinants for the expected utility gain. However, they also state that these variables are not necessarily exclusive and that further, unexpected variables can influence the utility gain.

Direct replications
Direct replications (see footnote 3) are usually defined as the repetition of a procedure, whereas conceptual replications test hypotheses or results of existing research using different methods or participant populations (Open Science Collaboration, 2015; Schmidt, 2009). In principle, both forms of replications can add valuable knowledge to the literature. A third possibility combining the two types is to run a direct replication with a conceptual replication extension added in a way that would have minimal impact on the direct replication yet allow for a comparison of the two types of replication. For example, Korbmacher et al. (2022) ran a direct replication of Kruger's (1999) above and below average effects which manipulated difficulty in a within-subject design, with an added extension on top of that introducing a manipulation in a between-subject design, resulting in a mixed design demonstrating the strength, robustness, and generalizability of the phenomenon. Chen et al. (2021) report a combination of two direct replications (Studies 1 and 2) and one additional conceptual replication (Study 3 building on the same design as in Study 1) within the same article.
A replication and extension design is especially helpful in disentangling the cause for failure in finding support for a novel contribution, that is, whether the failure has to do with the novelty added or the replicability of the underlying phenomenon. Extensions can also add nuances to what is known, address potential weaknesses in the original design, or explore new directions that would advance the literature beyond the original study. For example, when Jakob et al. (2019) replicated an investigation of the relationship between membership in the fictional, fraternity-like Hogwarts houses from the Harry Potter franchise and psychometric measures of personality, the authors identified the concept of basic human values to be more in line with the description of these houses found in the books, and added this measurement to their questionnaire.
In addition to planning these technical aspects, it is also helpful to take a step back and consider the implications of the study design and its potential outcomes. Let's assume an effect replicates in the country where the original study was conducted, in a neighbouring country as well, but not in a third country, or across a specific set of states. The study would then look into not just whether an effect replicates, but also its generalisability. The two competing hypotheses here would be consistency in the effect vs. variation. This variability in specific countries can add nuance and allow for comparison of populations, even without adding extensions. For example, Hornsey et al. (2018) provide country-specific insights regarding political ideology and scepticism about climate change, highlighting how these relationships were stronger in the USA compared to the rest of the world. In a similar fashion, other studies can reveal how country or cultural differences might mediate the effect of interventions, as shown for social discounting (Tiokhin et al., 2019). As such, multi-country replications can be both replications and investigations into generalisability, showing either variation or robustness of a phenomenon across countries, or even means of data collection (e.g., near-identical results across the platforms MTurk and Prolific, see Brick et al., 2021; Chandrashekar et al., 2022; Efendić et al., 2022).
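One common way to put numbers on the "consistency vs. variation" question is a heterogeneity check on country-level effect sizes. The Python sketch below computes a fixed-effect pooled estimate, Cochran's Q, and the I² statistic. It is offered only as an illustration of that general technique, not as the analysis used by any of the projects cited above; the function name and the effect sizes and sampling variances in the example are made up.

```python
def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, I^2 in percent).

    effects   -- one effect size estimate per country
    variances -- the sampling variance of each estimate
    """
    # Inverse-variance weights: more precise estimates count more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variation attributed to between-country
    # differences rather than sampling error (truncated at zero).
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2


# Made-up example: three countries with hypothetical effect sizes.
example_effects = [0.42, 0.38, 0.05]
example_variances = [0.010, 0.012, 0.015]
pooled, q, i2 = heterogeneity(example_effects, example_variances)
print(f"pooled = {pooled:.3f}, Q = {q:.2f}, I2 = {i2:.1f}%")
```

A low I² would be in line with a consistent effect across countries, whereas a high value suggests variation worth examining. With only a handful of countries, both Q and I² are imprecise, so they are better read as descriptive aids than as decisive tests.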

Security and safety of collaborators and participants
Sometimes, a research topic may be incredibly valuable and relevant to both researchers and participants, but also too taboo in a country to study safely. This could involve research eliciting people's opinions on sensitive political, religious, health, or social/moral issues, or on sexuality, and can range from these topics being frowned upon to being outright illegal to discuss in public. Even in situations where governing bodies or the laws of a country are not of concern, the community of a participant or collaborator may exact punishment if sensitive information is somehow made accessible. Consulting local collaborators in an informal manner (if possible) before deciding to go ahead with the replication is of utmost importance here, and if safety concerns are found, the best approach is to not conduct the study in that particular area. The security of participants and local researchers always comes before anything else.

Including Underrepresented Populations
Psychological research often lacks the inclusion of historically underrepresented populations (Thalmayer et al., 2021). However, their inclusion is key when trying to investigate phenomena globally. While researchers appear to be generally motivated to recruit collaborators and participants from outside more traditionally represented populations (e.g. those often summarised as western, educated, industrialised, rich and democratic [WEIRD], see also Arnett, 2008), they face a number of obstacles that limit the feasibility and perceived benefits of doing so. Below are some prevalent examples of such obstacles, along with suggestions for addressing them.

Funding
Reaching out to potential collaborators from different geographic areas often incurs costs that researchers are unprepared to bear. These may include costs for hiring and training local research assistants, high-quality translation of materials, travel associated with training and meeting collaborators, and incentivising participants. When multi-country efforts are not funded with an explicitly cross-cultural approach in mind, it can be difficult for researchers to globalise their samples at later stages. In some cases, even well-funded studies designed to document cross-cultural phenomena can be difficult to implement because of restrictions from funding agencies requiring that purchases be made from companies based in the home country of the funder. Based on discussions in our panels and some literature (Azouaghe et al., 2020; Silan et al., 2021), funding appears to be the largest obstacle to achieving more globally representative samples in multi-country replications. Where the collaborations included here have made efforts to address these obstacles, they are summarised in Table 2 (When consulting this table, please note that the PI for JRP projects is not in the current authorship. Out of respect for the personal and professional sacrifices that the individual made toward the work, and that there will be an additional publication with extensive detail from the PI, we provide only brief information here). Consider also that the use of expensive software may pose a barrier to participating in a collaborative project for institutions that do not have the necessary licenses. It is worth exploring if institutions are able to share licenses or provide project-based access, or considering open-source alternatives with bespoke edits made through expert collaborators.
Footnote 3: Note that even a "direct" replication is not exactly identical to the original investigation. They are usually as close as possible, with ideally only inevitable differences present (e.g., a new sample). Brandt et al. (2014) argue that "close" replications would be a more fitting term.

Data vs. collaborators from underrepresented groups
Where funding is not a limiting factor, it is beneficial to approach cross-country replications with the mindset of not simply obtaining underrepresented or non-WEIRD data, but rather involving collaborators (researchers, clinicians, trainees, etc.) from underrepresented groups and regions in fundamental stages of the research process, ideally starting with the planning phase. This approach can help to broaden perspectives and mitigate potential power imbalances that may otherwise exist when lead researchers from high-resource institutions seek out collaborators from lower-resourced ones.
Part of making a study valuable to any group of people is approaching the topic of replication from all angles, including the local perspectives of researchers from all collaborating regions. Trying to conduct a replication in a vastly different culture than that of the original study may result in weaker replication findings, not due to issues in the methods, false positive findings, or a challenge to the central theory/phenomenon, but rather because that theory/phenomenon works differently in the context in which the replication is conducted. For example, while conspiracy beliefs have been found to mediate the link between political ideology and risk perception in Americans, the effects seem to be much weaker in India (Puthillam & Kapoor, 2021). Conversely, a successful replication in a novel population may or may not mean that the studied phenomenon occurs in the same way across countries; it might also be the result of non-representative sampling.
Involving local collaborators from the outset of the project not only demonstrates consideration and respect but also benefits the research by ensuring that design decisions are cross-culturally informed. One option is to establish an "outreach committee" within the research group or consortium that specialises in finding and contacting potential collaborators for specific tasks and studies. The PSA uses their community building and network expansion committee to find ways to engage with all members of the network (e.g., social events, conferences/seminars, Slack threads), which in turn allows the committee to build connections naturally. For conceptual replications, inspiration could also be taken from the STRATEQ-2 project (Dujols & IJzerman, 2021), where researchers started developing an international questionnaire by first asking researchers from different countries around the globe to generate items relevant to the phenomenon at hand to gather what fits the respective cultural context.

Considering study value for participants and collaborators
The ability to recruit participants and collaborators will likely depend on how valuable a study's topic is to the local population it aims to sample. When considering a study replication, researchers are advised to consider (and ideally consult local collaborators about) how relevant the topic might be in different populations, as this might heavily influence participation rates or interest from potential collaborators. For instance, replicating a study about financial choices relating to health insurance costs might be of interest to US citizens but relatively inconsequential to someone in a country with universal healthcare provided by the state. Researchers can then either adapt the replication to better suit their target populations or, with the help of local collaborators, find ways to convey or augment the value and benefits (direct or indirect) of the research to potential participants. In the case of large studies requiring community investment or designed to impact communities through intervention, it may be most appropriate to involve community leaders and local stakeholders in ongoing conversations about both costs and benefits of the study. In determining adequate compensation, it is advisable to consult researchers and personnel with local knowledge, especially if a population may be vulnerable for economic or social reasons.

Strengths-based vs. deficit-based framing of cross-population comparisons
An important aspect of taking local perspectives into account is to frame research questions in a way that will adequately contextualise the study's results and allow room for the exploration of phenomena that may vary from those seen in the original study. One way of accomplishing this would be to use a strengths-based (as opposed to a deficit-based) model when comparing the original population with those from multi-country replications. Instead of viewing divergences in results as an indication of one population performing better at a task or better understanding a theory than the other population, it would be more meaningful to investigate how differing skill sets, values, goals, and incentives play into how different populations interact with the concepts being studied. As discussed in the previous section, certain concepts or relationships may not be statistically significant in some populations; take for instance conspiratorial blame and political ideology as laid out by Puthillam and Kapoor (2021). A researcher could either look at this phenomenon through a deficit-based lens (e.g., the comparison population has less of a grasp on conspiracy theories than the original sample, or they are less 'clued into'/exposed to international media), or they could use a strengths-based approach to explore why the comparison population is not incentivized or motivated to link conspiratorial blame to their political ideologies. What aspects of their context (daily life challenges, governmental structure, etc.) might counterbalance their (potential) inclination to engage with certain conspiracy theories?

Methodological Approaches
Even for conceptual replications, the general methodology is often at least partially based on the methodology of the original study. However, there are several methodological aspects that require extra attention and adaptation due to the cross-cultural nature of multi-country replications, and these come with some challenges to consider before embarking on such a study. Also, note that multi-country collaborations focussing on existing research and materials can take various forms, including the validation of metrics in different countries and cultures, or a revalidation to see if an instrument measures the same construct in different populations (see also Ruggeri et al., 2019).

Pre-registering your project
Pre-registration is always advisable in confirmatory research (Munafò et al., 2017) but especially important in multi-country replication studies, where adaptations are likely needed for different countries. Where possible, this may take the form of a registered report: a publishing model in which a journal first approves a study protocol and later the full report with results and conclusions, including potential deviations from the original plan. While registered reports are no deus ex machina to solve issues with reproducibility, initial evidence supports that they promote reproducibility, transparency, and self-correction across disciplines as intended (Chambers & Tzavella, 2021; Scheel et al., 2021) and that they are perceived as having greater research quality and rigour compared to the standard publishing model (Soderberg et al., 2021). However, consider also that publishing a registered report can impact the timeline of your project, as review at this early stage can take a long time and may delay the start of your study. If this is a concern, or where materials need to remain private before the conclusion of data collection, a time-stamped pre-registration (private or public) on platforms such as the OSF can ensure that any deviations from the planned methodology are traceable upon publication.

Ethics
Applying for ethical approval can be challenging when operating in multiple countries. In general, there are two options: a centralised approval from the PI's Institutional Review Board (IRB) or approval from the PI's institution and at least one IRB per country. It is also possible that institutions or journals require approval from each institution's IRB. Having only one approval is obviously the easier approach but may not always be possible. IRBs may refuse such a wide-ranging approval because their members are not able to review materials in all languages involved or cannot assess data privacy matters for all countries. In our experience, contributors should at least sign a letter confirming their commitment to upholding procedures exactly as outlined in the procedural plans submitted to the IRB. Where approvals from multiple IRBs are needed, timelines should account for the potential of considerable delays so that data collection can commence in all countries around the same time, as global events may otherwise interfere with the validity of data collected at different time points. In the experience of PSA, approvals can take different forms and may take any time between just one week and nine months to be finalised, which might even lead to abandoning plans for a country. However, having obtained approval from one institution can often speed up the process, as it may give another IRB extra confidence that other colleagues have already approved the study. This should also be considered when making the choice of a private pre-registration versus registered report, as a journal may require all ethical approvals before consideration, which can cause delays in the project timeline. Further, you can find yourself in a situation where one IRB requires a change in approach (e.g., dropping a culturally sensitive question). The impact of such a change will then have to be assessed from a methodological perspective.
In addition to these administrative aspects, there are specific ethical considerations to keep in mind when planning research in multiple countries. Primarily, this means discussing cultural aspects with the local collaborators in detail and also involving local stakeholders for their approval. Especially when the research requires intense time commitment, the consortium should consider what to provide in return. For example, linguists commonly create dictionaries based on their work and share them with participants. Further, the age of consent for participation may differ between countries and IRBs may require a different age minimum, which is worth checking with local collaborators early.

[...] connecting data to new research question), coauthorship on all subsequent papers emerging from the main dataset.

Strengths-based approach
JRP: JRP encourages country-specific analyses; examples can be found in the appendix of Ruggeri et al. (2020).
PSA: N/A
ManyBabies: By involving scientists from a diverse range of research sites in all project stages, MB provides a platform for perspectives that are often underrepresented within scientific discourse. MB projects have generally interpreted differential performance across participants and populations as evidence for the many paths that typical cognitive development may take.
CORE: N/A
ISMN: ISM encourages country-level analyses and contextualisation of meta-norms in contemporary times (e.g., a follow-up study on COVID-19 related metanorms).

Translation and adaptation of materials
Another time-consuming aspect of multi-country studies is the translation of study materials. It may take multiple iterations and checks to confirm that materials are not only translated, but also that the translated text is perceived as intended in the base version of the materials. The forward- and back-translation protocol (Brislin, 1970) seems to be the most commonly utilised method in psychological research and requires at least one bilingual, as well as additional native speakers of the target language. Ideally, the forward- and back-translation should be complemented by methods that can test the conceptual (i.e., construct has a comparable meaning in all countries/cultures), instrument (i.e., construct is operationalized in a similar way across countries/cultures), item (i.e., construct can be measured with the same instrument across countries/cultures), and scalar (i.e., construct can be assessed on the same metric) equivalence of an instrument to meaningfully compare different cultures/countries (Hui & Triandis, 1985). Despite the prevalence of the forward- and back-translation method, evidence that this method yields the best results is lacking (Epstein et al., 2015). While some researchers consider back-translation indispensable, others argue that it does not necessarily guarantee equivalence and linguistic appropriateness of an instrument (Behr, 2017) and therefore advise against it (Epstein et al., 2015). In addition, there is no standardised method of translating research material in psychology (Cha et al., 2007; Epstein et al., 2015). This poses an issue as the back-translation process requires several people with different backgrounds/expertise levels, making it difficult to achieve a truly valid translation where the target language is very rare or in regions where psychological research may not have a large presence. In these cases, it may be worth considering utilising a combination of translation techniques, depending on the resources and personnel available.
The PSA uses a variation of back-translation as its official method and also appoints language coordinators for each study.
The PSA method uses two translators, and goes as follows:
When following the back-translation method, focusing on the feedback of non-academic speakers of the target language is crucial, as they ensure that materials are jargon-free, make practical/social (not just grammatical) sense in their cultural contexts, and are easy to follow for everyone. To put it bluntly: When your colleague tells you your questionnaire is fine, but your grandma tells you it does not make sense, your grandma is right. Sometimes such feedback may require changes to very fundamental parts of the materials, such as items in a scale that do not apply or make sense in a particular culture. Another issue to consider is adapting materials to local dialects or versions rather than leaving them in the 'standard' version of the language (such as Austrian German vs. Standard High German), as an extra measure to ensure cultural validity.
Whether such an adaptation is adequate should be evaluated by a local contributor. Additionally, in countries with multiple major languages where people may be fluent in more than one, it is helpful to discuss which languages the materials will be translated into. Given the culturally dependent and, at times, subjective nature of translation, transparency in the process is paramount. One of the easiest ways to maintain transparency is to keep track of changes made to materials through the translation process, and comment on the rationale behind those changes. This can be done in an informal way at first (such as commenting on a Word document or using the Track Changes feature), and later refined into an easily digestible format (such as a table with every major change to each material; see Supplementary materials in Ruggeri et al., 2020, 2021). Especially when replicating older studies, adaptations to the base materials are sometimes needed. For example, studies including financial decisions need to be adapted to the local currency, but also to inflation, and anchored to income levels of the country (for an example, see the appendix of Ruggeri et al., 2020). Similarly, the Zeitgeist might make adaptation of materials necessary, as encountered in Wagenmakers and colleagues' (2016) attempt at replicating experiments underlying the facial feedback hypothesis, where the pictures used in the original study to evoke laughs were deemed not funny anymore.
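As a rough illustration of anchoring monetary amounts to local income levels, the following sketch expresses an original amount as a share of a reference income and rescales it to a local income figure. The function name, the rounding rule, and all numbers are hypothetical and are not taken from Ruggeri et al. (2020); any real adaptation should be checked with local contributors.

```python
from math import floor, log10


def adapt_amount(original_amount, reference_income, local_income, sig_figs=2):
    """Rescale a monetary amount so that it represents the same share of income.

    original_amount and reference_income are in the original study's currency
    (and price level); local_income is in the local currency at current prices.
    The result is therefore expressed in local currency at current prices.
    """
    share_of_income = original_amount / reference_income
    raw = share_of_income * local_income
    if raw == 0:
        return 0.0
    # Round to a few significant figures so participants see "natural" amounts.
    digits = sig_figs - 1 - floor(log10(abs(raw)))
    return round(raw, digits)


# Hypothetical numbers: 500 against a reference monthly income of 2000 is a
# quarter of a monthly income, which is then rescaled to a local income of 1234.
print(adapt_amount(500, 2000, 1234))
```

Expressing amounts as a share of income handles currency conversion and inflation in one step, provided that the local income figure is current. Whether median income, minimum wage, or another anchor is appropriate is a design decision for the consortium.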

Data Collection
The process of collecting and storing data from multiple countries differs across consortia, with some collecting data centrally (i.e., the PI saves data from multiple countries into their own database) and others collecting data through individual researchers or labs (i.e., collaborators store the data they collected in their countries, and then pass it on to the PIs). Both national data protection laws and ethics committees determine some aspects of how data are to be collected and stored. While these requirements provide guidance on establishing a data collection protocol, the multitude of regulations also limits how data can be collected and shared. It may also mean that data have to be collected and stored differently from country to country, which can cause methodological inconsistencies.

Data Privacy
Data privacy should be considered in parallel to ethical issues before applying to an IRB. IRB applications can be complicated when multiple countries with different data protection laws are involved. It is advisable to think about data protection from a general ethical perspective first and try to adhere to this set of standards throughout the project, even if the law in some countries may not require it. In most aspects, the EU's General Data Protection Regulation (GDPR) appears to be the strictest law in that sense. For those coming in contact with the GDPR for the first time, it is important to know that it only applies if personal data are collected and is not applicable if individuals cannot be identified, directly or indirectly, from the data collected in a research endeavour (i.e., the data are anonymised, see Recital 26; note that merely pseudonymised data still count as personal data). When collecting data online, researchers should pay special attention not to collect IP addresses or geolocation of respondents (as these may make participants identifiable). GDPR also provides for certain exemptions for data processing in research contexts, subject to appropriate safeguards (Article 89), so the applicability should be checked in advance, where possible with the help of a legal advisor or data privacy expert, such as the institution's Data Protection Officer. As GDPR is sometimes poorly understood and interpreted differently by IRBs, both outside and inside Europe, it is advisable to plan additional time to clarify these issues. Independent of which data laws apply, collecting non-personal and anonymised data is always the least likely to cause problems and avoids having to add complicated and long notices that can scare people off.
If personally identifiable data are involved, researchers may also need to set up Data Sharing Agreements between the consortium partners (these regulate how the collected data are shared between partners and processed), or even Data Protection Impact Assessments (a process to identify and minimise data protection risks which is necessary when processing sensitive data on a large scale). As of 2022, this also largely still applies to the UK, as the Data Protection Act 2018 is the UK's adaptation of GDPR. Note that, depending on the category of data concerned, countries will often have specific laws pertaining solely to those categories (e.g., HIPAA or FERPA in the US). Based on GDPR categories and the authors' experiences, the following categories of data always warrant special attention to details and protection and will therefore be more complicated to handle:
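Where a survey platform records IP addresses or geolocation by default, one option is to drop these fields before the data are stored or shared. The following is a minimal sketch of that step; the column names are hypothetical and depend on the platform's export format. Where the platform offers a setting to not record such fields in the first place, that setting is preferable to removing them afterwards.

```python
# Hypothetical names of export columns that may make participants identifiable.
IDENTIFYING_COLUMNS = frozenset({"IPAddress", "LocationLatitude", "LocationLongitude"})


def strip_identifiers(rows, identifying=IDENTIFYING_COLUMNS):
    """Return copies of the response rows without the identifying columns."""
    return [
        {key: value for key, value in row.items() if key not in identifying}
        for row in rows
    ]


# Made-up example response (203.0.113.7 is a documentation-only IP address).
responses = [
    {"IPAddress": "203.0.113.7", "LocationLatitude": "48.2", "country": "AT", "item_1": "3"},
]
print(strip_identifiers(responses))
```

Removing these columns does not by itself guarantee anonymity, since combinations of other variables (e.g., free-text answers or rare demographic profiles) can still identify individuals.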

Statistical Approach
If you ask multiple data analysts what analysis to choose for your research project, chances are high you will find yourself with many different replies (Menkveld et al., 2021; Silberzahn et al., 2018). However, there are multiple aspects that should always be considered when choosing the statistical approach to a multi-country replication. For direct replications, studies should be powered to detect an effect no larger than the one found in the original study, and ideally a smaller one: if there is no existing cross-cultural comparison, it is possible that the effect you are investigating may be smaller in some of the target countries. More generally, due to the combination of publication bias and small sample sizes, reported effect sizes in the literature tend to be inflated (Gelman & Carlin, 2014; Lane & Dunlap, 1978). Due to this "winner's curse", it is advisable to power replications to a smaller effect size than the one reported in the literature (e.g., using the small telescopes approach; Simonsohn, 2015). Where resources are constrained, sequential analyses can also be considered: This approach, commonly used in medical trials (where stakes for participants are high), uses interim analyses to check whether a sufficient sample has been reached while controlling the Type I error rate (Lakens, 2014).
In addition, researchers should take into consideration whether it is advantageous and feasible to power the study for individual regions or groups. One option is to replicate the effect in one country only before opting for a multi-country replication. That is specifically useful when the original evidence is old and based on small samples. It is also an option to repeat the original analysis (often frequentist) and additionally analyse the data with an equivalent Bayesian analysis. In any case, the statistical approach(es) should be part of the pre-registration, and any additional, exploratory analyses should be labelled as such.
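To illustrate how quickly sample size requirements grow when powering for a smaller effect, the following sketch uses a normal approximation for a two-sided comparison of two independent groups. It also computes d33, the effect size that the original study had 33% power to detect, which is the benchmark used in the small telescopes approach. The function names and example values are assumptions for illustration, and a dedicated power analysis tool (using the t distribution and the actual design) should be preferred for a real study.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group needed to detect a standardised
    mean difference d in a two-sided, two-group comparison."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def d33(original_n_per_group, alpha=0.05):
    """Approximate effect size that a two-group study with the given
    per-group sample size had one-third power to detect."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_third = STD_NORMAL.inv_cdf(1 / 3)
    return (z_alpha + z_third) * sqrt(2 / original_n_per_group)


# Hypothetical original study: d = 0.5 with 20 participants per group.
print(n_per_group(0.5))   # powering for the reported effect
print(n_per_group(0.25))  # powering for half the reported effect
print(round(d33(20), 2))  # small-telescopes benchmark for the original study
```

Because the required sample size scales with 1/d², halving the assumed effect size roughly quadruples the number of participants needed per group.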
To estimate the necessary sample size per country it should be considered whether the approach allows for a multi-level structure. If so, the number of countries and the variation of the effect at that level should be taken into consideration.
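One simplified way to see why the number of countries and the between-country variation matter is to treat each country as providing one estimate of the effect and to approximate the power for the average effect across countries. The sketch below assumes equal sample sizes in every country, a two-group design, a normal approximation, and a known between-country standard deviation of the effect (tau). All names and values are hypothetical; for a real multi-level design, a simulation-based power analysis would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist


def power_mean_effect(delta, n_countries, n_per_group, tau, alpha=0.05):
    """Approximate power to detect an average standardised effect delta.

    tau is the assumed standard deviation of the true effect across countries.
    The within-country sampling variance of the effect is approximated by
    2 / n_per_group, which is reasonable for small effects.
    """
    sampling_var = 2 / n_per_group
    se_mean = sqrt((tau ** 2 + sampling_var) / n_countries)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible chance of a significant result in the wrong direction is ignored.
    return NormalDist().cdf(delta / se_mean - z_alpha)


# Same design with and without variation of the effect between countries.
print(round(power_mean_effect(0.1, n_countries=10, n_per_group=200, tau=0.0), 2))
print(round(power_mean_effect(0.1, n_countries=10, n_per_group=200, tau=0.15), 2))
```

Under these assumptions, the standard error of the average effect cannot fall below tau divided by the square root of the number of countries, no matter how many participants are recruited per country. When the effect varies between countries, adding countries therefore tends to help more than adding participants within countries.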
For conceptual replications, assessing sample size is more complicated. Assumptions on effect sizes should be as conservative as possible, yet might yield unrealistically large minimum sample sizes. Lakens (2021) provides detailed discussions on different approaches and highlights that, depending on the justification for a sample size, it should be considered "1) what the smallest effect size of interest is, 2) which minimal effect size will be statistically significant, 3) which effect sizes they expect (and what they base these expectations on), 4) which effect sizes would be rejected based on a confidence interval around the effect size, 5) which ranges of effects a study has sufficient power to detect based on a sensitivity power analysis, and 6) which effect sizes are plausible in a specific research area" (p. 1). When powering your study, you should also consider the practical implications of the effect size you are aiming for: while an extremely small but significant effect may provide insight, its practical relevance may be nil (Götz et al., 2021; see also reply by Primbs et al., 2021). While we would clearly not advise against research for the sole purpose of gaining knowledge in general, practical relevance is surely a key consideration for such large-scale endeavours where resources could be used to investigate potentially more beneficial matters. Anvari and colleagues (2021) provide an insightful discussion on this matter.
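Two of the quantities listed by Lakens (2021), the minimal statistically significant effect size and the effect size detectable in a sensitivity power analysis, can be approximated for a fixed sample size. The sketch below assumes a two-sided comparison of two equally sized independent groups and a normal approximation; the function name and the example sample size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sensitivity(n_per_group, alpha=0.05, power=0.80):
    """Approximate effect sizes (standardised mean differences) that a fixed
    two-group sample can declare significant and detect with the given power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    scale = sqrt(2 / n_per_group)
    return {
        "minimal_significant_d": z_alpha * scale,
        "detectable_d": (z_alpha + z_power) * scale,
    }


# Hypothetical case: 100 participants per group are feasible in a given country.
for name, value in sensitivity(100).items():
    print(name, round(value, 2))
```

Comparing these values with the smallest effect size of interest gives a quick indication of whether a feasible sample is informative for a given country.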

Reporting and Disseminating Findings
Once the study is completed, it is time for the exciting part of sharing the results with the world. The unique advantage in communicating results of a multi-country replication lies in the opportunity to 1) share results in every author's native language and 2) highlight country-specific findings that might otherwise receive little attention in the shadow of the overall findings of the paper. It can be advantageous to put a dissemination plan in writing early, to avoid scattered communication around results. All collaborators may inform their universities and institutions or faculty, who often provide the opportunity to write a blog post or news article. Sharing results before peer review has become fairly common but also raises concerns about research being shared uncritically, as most people may not be able to differentiate between peer-reviewed research and preprints. This concern was confirmed by Wingen et al. (2022), who experimentally showed that a brief explanation can help clarify this issue. As such, if you opt for a preprint publication, we recommend both clearly marking your preprint as such, as well as adding the explanation developed by Wingen and colleagues. In addition, it is good practice to ensure that all materials are accessible. Collecting information and conducting studies is costly, so it is important to make these data available to achieve the maximum benefit of your work and resources, so that others may answer additional research questions. The PSA has even incentivised secondary analysis by challenging researchers to work with one of their large datasets and offering monetary rewards (Forscher et al., 2019).

Concluding Remarks
Large-scale multi-country replications are not the most straightforward or easiest research endeavours. Yet they come with large benefits such as comparable data from different countries, and datasets which, if curated well, can be the source of future insights. While there is clearly no one-size-fits-all approach, we hope that the lessons we learned and summarised in this paper will be helpful at all steps of planning future multi-country replications. When approaching your multi-country replication, remember to plan the protocol and responsibilities of collaborators well in advance, listen closely to your collaborators' insights into their own countries, identify potential pitfalls, and make sure everyone's safety is guaranteed. Likewise, share your knowledge about your own region or country. Your study also does not need to solve every question there is: a simple effect or theory is much more realistic to test at scale and makes it easier to provide tangible insights from the observations. Likewise, extensions should be equally straightforward.
Working on a large scale with colleagues from many countries can be a challenging but enriching experience and provides collaborators with research expertise and insights into how scientists operate in other parts of the world. These differences allow for additional perspectives and a more holistic view of phenomena, but they also require clear guidelines on communication channels and responsibilities (ideally provided in a way that is easily accessible to all collaborators). This includes using communication tools that are accessible to every partner in the consortium. Simple and prevalent communication is key to the success of every project employing Big Team Science. However, it should also be considered that communication tools may not always take into account their usability for researchers with disabilities. In order to ensure that colleagues with disabilities can contribute without additional hurdles, such factors should be explored before the start of a project and not be made the responsibility of potential contributors.
Lastly, it is our impression that despite the differences in approaches, all large-scale multi-country endeavours we have been part of have one thing in common: the motivation to conduct such studies mostly stems from a drive to produce solid research that can help improve people's lives. Multi-lab approaches may facilitate a mindset shift in how research is conducted. Instead of researchers operating in silos and potentially competing for publication spots in journals, a collaborative approach that brings different perspectives into one research project not only allows for more productive discussions in which everyone has the same goal, but also leads to more nuanced insights. Including students in such initiatives can further help to support developments towards more collaborations of this kind and provide valuable early research experience and networking opportunities. On the other hand, journals will also have to consider how best to provide impartial reviews, as experts for such collaborations will become more and more connected to each other. It will become increasingly difficult to find editors who are knowledgeable in multi-country replications but have no ties to author consortia that often include more than a hundred researchers. While we hope that our insights and checklist are helpful in conducting your multi-country replication, we also encourage researchers to build meaningful and lasting relationships with their project partners, moving towards a methodologically sound, more collaborative, and inclusive field. We would also like to encourage researchers who have created, or know of, additional materials for approaching this topic to get in touch and add resources or links to the accompanying online repository (https://osf.io/xrv5p/).