The credibility of scientific claims depends upon the transparency of the research products upon which they are based (e.g., study protocols, data, materials, and analysis scripts). As psychology navigates a period of unprecedented introspection, user-friendly tools and services that support open science have flourished. However, the plethora of decisions and choices involved can be bewildering. Here we provide a practical guide to help researchers navigate the process of preparing and sharing the products of their research (e.g., choosing a repository, preparing research products for sharing, and structuring folders). Being an open scientist means adopting a few straightforward research management practices, which lead to less error-prone, reproducible research workflows. Further, this adoption can be piecemeal: each incremental step towards complete transparency adds positive value. Transparent research practices not only improve the efficiency of individual researchers but also enhance the credibility of the knowledge generated by the scientific community.
… until recently I was an open-data hypocrite. Although I was committed to open data, I was not implementing it in practice. … Some of it was a lack of effort. It was a pain to document the data; it was a pain to format the data; it was a pain to contact the library personnel; it was a pain to figure out which data were indeed published as part of which experiments. Some of it was forgetfulness. I had neither a routine nor any daily incentive to archive data. (Rouder, 2016, p. 1063)
Introduction
Science is a cumulative and self-corrective enterprise; over time the veracity of the scientific literature should gradually increase as falsehoods are refuted and credible claims are preserved (Merton, 1973; Popper, 1963). These processes can optimally occur when the scientific community is able to access and examine the key products of research (materials, data, analyses, and protocols), enabling a tradition where results can be truly cumulative (Ioannidis, 2012). Recently, there has been growing concern that self-correction in psychological science (and scientific disciplines more broadly) has not been operating as effectively as assumed, and a substantial proportion of the literature may therefore consist of false or misleading evidence (Ioannidis, 2005; Johnson, Payne, Wang, Asher, and Mandal, 2016; Klein et al., 2014; Open Science Collaboration, 2015; Simmons, Nelson, & Simonsohn, 2011; Swiatkowski & Dompnier, 2017). Many solutions have been proposed; we focus here on the adoption of transparent research practices as an essential way to improve the credibility and cumulativity of psychological science.
There has never been an easier time to embrace transparent research practices. A growing number of journals, including Science, Nature, and Psychological Science, have indicated a preference for transparent research practices by adopting the Transparency and Openness Promotion guidelines (Nosek et al., 2015). Similarly, a number of major funders have begun to mandate open practices such as data sharing (Houtkoop et al., 2018). But how should individuals and labs make the move to transparency?
The level of effort and technical knowledge required for transparent practices is rapidly decreasing with the exponential growth of tools and services tailored towards supporting open science (Spellman, 2015). While a greater diversity of tools is advantageous, researchers are also faced with a paradox of choice. The goal of this paper is thus to provide a practical guide to help researchers navigate the process of preparing and sharing the products of research, including materials, data, analysis scripts, and study protocols. In the supplementary material, readers can find concrete procedures and resources for integrating the principles we outline into their own research.1 Our view is that being an open scientist means adopting a few straightforward research management practices, which lead to less error-prone, reproducible research workflows. Further, this adoption can be piecemeal: each incremental step towards complete transparency adds positive value. These steps not only improve the efficiency of individual researchers but also enhance the credibility of the knowledge generated by the scientific community.
Why Share?
Science is based on verifiability, rather than trust. Imagine an empirical paper with a Results section that claimed that “statistical analyses, not reported for reasons of brevity, supported our findings (details are available upon request)”. Such opaque reporting would be unacceptable, because readers lack essential information to assess or reproduce the findings, namely the analysis methods and their results. Although publication norms for print journals previously supported sharing only verbal descriptions rather than a broader array of research products, the same logic applies equally to materials, data, and analysis scripts.
When study data and analysis scripts are openly available, a study’s analytic reproducibility can be established by re-running the reported statistical analyses, facilitating the detection and correction of any unintended errors in the analysis pipeline (Hardwicke et al., 2018; Peng, 2006; Stodden, 2015; Stodden, Seiler, & Ma, 2018; see supplementary material [SM]: Promoting analytic reproducibility). Once analytic reproducibility has been established, researchers can examine the analytic robustness of the reported findings, by employing alternative analysis specifications (Silberzahn et al., in press; Simonsohn, Simmons, & Nelson, 2015; Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016), highlighting how conclusions depend on particular choices in data processing and analysis. When stimuli and other research materials are openly available, researchers can conduct replication studies where new data are collected and analyzed using the same procedures to assess the replicability of the finding (Simons, 2014). And once a finding has been shown to be replicable, researchers can investigate its generalisability: how it varies across different contexts and methodologies (Brandt et al., 2014).
Transparency also enhances trust in the validity of statistical inference. Across statistical frameworks, conducting multiple tests and then selectively reporting only a subset may lead to improper and ungeneralisable conclusions (Goodman et al., 2016; Wasserstein & Lazar, 2016). Even if only a single analysis is conducted, selecting it based on post-hoc examination of the data can undermine the validity of inferences (Gelman & Loken, 2014). Transparency regarding analytic planning is thus critical for assessing the status of a particular statistical test on the continuum between exploratory and confirmatory analysis (De Groot, 2014; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). Such transparency can be readily achieved by publicly documenting one’s hypotheses, research design, and analysis plan before conducting a study in a process called pre-registration (De Angelis et al. 2004; Nosek, Ebersole, DeHaven, & Mellor, 2018; see SM: Preregistration).
Besides increasing the credibility of scientific findings, transparency also boosts the efficiency of scientific discovery. When information is not shared, the value of a study is limited because its products cannot be reused. In contrast, when research products are shared, subsequent researchers can avoid duplication of effort in data collection and decrease the expense involved in creating stimulus materials and analytic code. Further, sharing research products allows researchers to explore related hypotheses and can inspire new research questions. Shared research products can also be important models for researchers, especially trainees, to use in the development of their own materials and analyses. And in the case of publicly funded research, there is also an ethical imperative to make the results of this work available to the public.
Finally, there are practical reasons to embrace transparency. First, public sharing is probably the best protection against data loss, since – as we will discuss – best practices require sharing in durable repositories. Second, open research practices increase visibility and facilitate access to unique opportunities for collaboration, jobs, and funding (McKiernan et al., 2016). Third, data sharing has been associated with a citation benefit (Piwowar & Vision, 2013). Fourth, and perhaps most importantly: in our own anecdotal experience (cf. Lowndes et al., 2017), a research workflow designed at its core to be shared with others is far more efficient and sustainable to use oneself. Accessing an old project to find data, code, or materials need not trigger nightmares. A useful saying to keep in mind is that “your closest collaborator is you six months ago, but you don’t reply to emails” (Broman, 2016). Research is a finite enterprise for everyone: collaborators leave projects, change jobs, and even die. If work is not shared, it is often lost.
What to Share?
In this section, we review the different parts of the scientific process that can be shared. Our primary recommendations are:
Make transparency a default: If possible, share all products of the research process for which there are no negative constraints (due to e.g., funder, IRB/ethics, copyright, or other contract requirements). While attributes of the data, such as disclosure risk, sensitivity, or size, may limit sharing, there are many options for granting partial and restricted access to the data and associated materials.
If negative constraints prohibit transparency, explicitly declare and justify these decisions in the manuscript (Morey et al., 2016).
Any shared material incrementally advances the goals of increasing verifiability and reuse. Authors need not wait to resolve uncertainty about sharing all products before beginning the process: Bearing in mind any negative constraints (e.g., privacy of participants), any product that is shared is a positive step.
Navigating this space can be difficult (see Figure 1). For this reason, we recommend that lab groups discuss and develop a set of “Standard Operating Procedures” (SOP) to guide the adoption of transparent research practices in a manner that is well-calibrated to their own unique circumstances.2 One part of that organisation scheme is a consistent set of naming conventions and a consistent project structure (see SM: Folder structure); an example of a project created in accordance with our recommendations is available on The Open Science Framework (http://doi.org/10.17605/OSF.IO/XF6UG). Below, we review each of the different products that can be shared.
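Before turning to those products, here is a concrete illustration of a consistent project structure: the minimal Python sketch below creates one possible folder skeleton that a lab SOP might specify. The folder names are assumptions made for this example (they are not the structure of our example OSF project) and should be adapted to your own conventions.

```python
from pathlib import Path

# Illustrative folder names; these are assumptions for this sketch, not a
# prescribed standard, and should be adapted to your lab's SOP.
FOLDERS = [
    "materials",        # stimuli, questionnaires, consent forms
    "data/raw",         # data exactly as recorded
    "data/processed",   # anonymised, analysis-ready files (e.g., CSV)
    "analysis",         # scripts that turn raw data into reported results
    "documentation",    # codebook, study protocol, preregistration
    "output",           # figures and tables produced by the analysis scripts
]

def create_project(root: str) -> None:
    """Create a consistently named folder skeleton for a new study."""
    for folder in FOLDERS:
        (Path(root) / folder).mkdir(parents=True, exist_ok=True)
    # A top-level README tells visitors (including your future self) what is where.
    (Path(root) / "README.md").write_text(
        "# <Study title>\n\nSee documentation/ for the codebook and study protocol.\n"
    )

if __name__ == "__main__":
    create_project("example_study")
```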
Figure 1. Decision flowchart outlining important considerations when sharing research products.
Study Protocol. A study protocol consists of a detailed written specification of hypotheses, methods, and analysis. For relatively straightforward studies, it may be reasonable to include all of this information in the main body of the primary report of the study. Alternatively, you may wish to share a separate protocol document and provide a higher-level verbal summary in the main report. For certain experimental procedures it may also be beneficial to include instructive video footage. One way to view the study protocol is as a verbal layer that collates, describes, and organises more specific research products, such as materials, software, and analysis code, and informs the reader how they were implemented during your study. Either way, the level of detail should be sufficient to allow others to replicate your work without direct instruction from you.
Materials. What constitutes materials differs widely from application to application, even within psychology. In simpler studies, the materials may be a list of questionnaire items or stimuli presented to participants manually (videos, images, sounds, etc.). In other studies, materials may include elaborate video stimuli, (video-taped) procedures for an interaction with a confederate or participants (Grahe, Brandt, & IJzerman, 2015), or computer code to present stimuli and collect responses. For clinical studies, materials may include case report forms and materials for informed consent. Sharing these materials is valuable for both interpretation of research results and for future investigators. A detailed examination of stimulus materials can lead to insights about a particular phenomenon or paradigm by readers or reviewers. In addition, since these materials are often costly and difficult to produce, lack of sharing will be a barrier for the replication and extension of findings.
Data and Metadata. Sharing data is a critical part of transparency and openness, but investigators must make decisions regarding what data to share. “Raw data” are the data as originally recorded, whether by software, an experimenter, a video camera, or other instrument (Ellis & Leek, 2017). Sharing such data can raise privacy concerns due to the presence of identifying or personal information. In some cases, anonymisation may be possible, while in others (e.g., video data), anonymity may be impossible to preserve and permission for sharing may not be granted. Regardless of the privacy concerns surrounding raw data, it should almost always be possible to share anonymised data in tabular format as they are used in statistical analyses (see SM: Anonymisation). Such data should typically be shared in an easily readable format that does not rely on proprietary software (e.g., comma-separated values, or CSV). Ideally, the script for generating these processed data from the raw data should be made available as well to ensure full transparency (see SM: Automate or thoroughly document all analyses).
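As a minimal sketch of this step, the hypothetical Python script below reads a raw data file, removes identifying columns, keeps the variables used in the analyses, and writes an analysis-ready CSV file. All file and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical file and column names, for illustration only.
raw = pd.read_csv("data/raw/raw_responses.csv")

# Remove direct identifiers before sharing (names, emails, IP addresses, ...).
deidentified = raw.drop(columns=["name", "email", "ip_address"])

# Keep only the variables used in the reported analyses.
processed = deidentified[["participant_id", "condition", "age", "rt_ms", "accuracy"]]

# Write an analysis-ready file in a non-proprietary format (CSV) for sharing.
processed.to_csv("data/processed/study1_processed.csv", index=False)
```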
One critical ingredient of data sharing is often overlooked: the need for metadata. Metadata is a term describing documentation that accompanies and explains a dataset (see SM: Data documentation). In psychology, metadata typically include information on who collected the data, how and when they were collected, the number of variables and cases in each data file, and dataset-level information such as verbose variable and value labels. Although they can also be shared in standardized, highly structured, machine-readable formats, often metadata are simply a separate document (called a “codebook” or “data dictionary”; see SM: Data documentation) that gives verbal descriptions of variables in the dataset. Researchers do not need to be experts in metadata standards: Knowing the structure of metadata formats is less important than making sure the information is recorded and shared. Machine-readable structure can always be added to documentation by metadata experts after the information is shared.
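A basic codebook can even be drafted semi-automatically. The sketch below, which assumes the hypothetical processed file from the previous example, writes a codebook skeleton with one row per variable; the verbal descriptions still need to be completed by hand.

```python
import pandas as pd

data = pd.read_csv("data/processed/study1_processed.csv")

# One row per variable: name, storage type, an example value, and a
# free-text description to be filled in by the researcher.
codebook = pd.DataFrame({
    "variable": list(data.columns),
    "type": [str(dtype) for dtype in data.dtypes],
    "example_value": [data[col].iloc[0] for col in data.columns],
    "description": ["" for _ in data.columns],  # e.g., "Response time in milliseconds"
})
codebook.to_csv("documentation/codebook.csv", index=False)
```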
Analysis Procedure. To ensure full transparency and reproducibility of research findings it is critical to share detailed documentation of how the analytic results reported in a research project were obtained (see SM: Analytic reproducibility). Researchers analyze their data in many different ways, and so the precise product(s) to be shared will vary. Nevertheless, the aim is to provide an exact specification of how to move from raw data to final descriptive and statistical analyses, ensuring complete documentation of any cleaning or transformation of data. For some researchers, documenting analyses will mean sharing, for example, R scripts or SPSS syntax; for others it may mean writing a step-by-step description of analyses performed in non-scriptable software programs such as spreadsheets. In all cases, however, the goal is to provide a recipe for reproducing the precise values in the research report.
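For a scripted analysis, such a recipe can be as simple as a single file that starts from the shared data and ends with the reported numbers. The hedged sketch below illustrates the idea with invented variable names and a simple t-test; it is not the analysis of any particular study.

```python
import pandas as pd
from scipy import stats

# Re-running this one script should reproduce every value reported in the paper.
data = pd.read_csv("data/processed/study1_processed.csv")

# Document every exclusion or transformation applied to the shared data.
data = data[data["rt_ms"].between(200, 3000)]

# Descriptive statistics by condition.
descriptives = data.groupby("condition")["rt_ms"].agg(["mean", "std", "count"])

# Inferential test reported in the manuscript (here, an independent-samples t-test).
experimental = data.loc[data["condition"] == "experimental", "rt_ms"]
control = data.loc[data["condition"] == "control", "rt_ms"]
t_value, p_value = stats.ttest_ind(experimental, control)

# Save the outputs so every number in the report can be traced back to this script.
descriptives.to_csv("output/descriptives.csv")
with open("output/t_test.txt", "w") as out:
    out.write(f"t = {t_value:.2f}, p = {p_value:.3f}\n")
```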
One challenge for sharing analyses is the rapid pace of change in hardware and software (SM: Avoid “works on my machine” errors). Some researchers may find it discouraging to try to create a fully reproducible analytic ecosystem with all software dependencies completely specified (e.g., Boettiger, 2015; SM: Sharing software environments). We have several recommendations. First, do not let the perfect be the enemy of the good. Share and document what you can, as it will provide a benefit compared with not sharing. Second, document the specific versions of the analysis software and packages/add-ons that were used (American Psychological Association, 2010; Eubank, 2016). And finally, when possible, consider using open source software (e.g., R, Python), as accessing and executing code is more likely to be possible in the future compared with commercial packages (Huff, 2017; Morin et al., 2012).
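In R, for example, a call to sessionInfo() reports the R version and loaded package versions, which can be saved alongside the analysis. The hedged Python sketch below does something similar, writing the interpreter and package versions to a text file; the package list is an assumption to be adapted to your own project.

```python
import sys
from importlib.metadata import version, PackageNotFoundError

# Packages actually used by the analysis scripts (adjust to your own project).
PACKAGES = ["pandas", "numpy", "scipy"]

with open("documentation/software_versions.txt", "w") as out:
    out.write(f"Python {sys.version}\n")
    for package in PACKAGES:
        try:
            out.write(f"{package} {version(package)}\n")
        except PackageNotFoundError:
            out.write(f"{package} not installed\n")
```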
Research Reports. While we primarily focus on research products beyond the standard written report in this guide, research reports themselves (i.e., published papers) also provide important information about how materials were used, how data were collected, and the myriad other details that are required to understand other products. Making research reports publicly available (through “Open Access”) greatly facilitates the use of shared research products. Two main options exist to publish Open Access: Green (posting research online through preprint repositories, like PsyArXiv; https://psyarxiv.com) or Gold (full open access via the publisher; most publishers currently still charge Article Processing Charges). For the Green route, preprints do not typically affect the traditional publication process, as most journals do not consider them a ‘prior publication’ (Bourne, Polka, Vale, & Kiley, 2017). A further discussion of Open Access is beyond the scope of this article. However, you can always check a particular journal’s stance on open access by typing its name into the SHERPA/ROMEO database (http://www.sherpa.ac.uk/romeo/index.php). This will also tell you whether the journal makes the final article publicly available on its website, and whether this will require you to pay a fee.
When to Share
When it comes to the question of when to share, any time is better than never. However, benefits are maximised when sharing occurs as soon as possible. We consider the possibilities for sharing 1) before data collection, 2) during data collection, 3) when submitting a paper, 4) when the paper is published, 5) at the end of a project, and 6) after a specified embargo period. Figure 2 presents a typical workflow.
Figure 2. Typical workflow indicating when to share research products at different stages of the research process.
Planning to share. Sharing research products is easier when you have planned for it in advance. For example, it makes sense to store and structure your data files in a systematic manner throughout the data collection period (i.e., to have a basic “data management plan”). Sharing then only requires a few clicks to upload the files. Many researchers justify not sharing their data because of the time and effort it takes (Borgman, 2012; Houtkoop, Chambers, Macleod, Bishop, Nichols, & Wagenmakers, 2018) – starting early helps avoid this problem. Ideally, researchers should create a data management plan at the beginning of their study (for information on how to create one, see, e.g., Jones, 2011).
Before data collection. Sharing key study design and analysis information prior to data collection can confer a number of significant benefits, such as mitigating selective reporting bias or “p-hacking” (Nosek et al., 2018; Simmons et al., 2011). Your study protocol – hypotheses, methods, and analysis plan – can be formally “registered” by creating a time-stamped, read-only copy in a public registry (e.g., The Open Science Framework), such that they can be viewed by the scientific community (a “pre-registration”; see SM: Pre-registration). If you wish, it is possible to pre-register the study protocol, but keep it private under an “embargo” for a specified period of time (e.g., until after your study is published).
While embargoes on preregistrations can mitigate the fear of being scooped, flexibility in the release of pre-registered documents limits transparency. For example, researchers may strategically release only those documents that fit the narrative they wish to convey once the results are in. It is therefore preferable to encourage transparency from the outset. At the very least, the scientific community should be able to check whether a study was preregistered and, preferably, have access to the content of this preregistration, regardless of whether it is communicated in the final paper.
“Registered Reports” (Chambers, 2013; Hardwicke & Ioannidis, 2018; also see https://cos.io/rr/) address this concern by embedding the pre-registration process directly within the publication pipeline. Researchers submit their study protocol to a journal where it undergoes peer review, and may be offered in-principle acceptance for publication before the study has even begun. This practice could yield additional advantages beyond standard pre-registration, such as mitigation of publication bias (because publication decisions are not based on study outcomes) and improved study quality (because authors receive expert feedback before studies begin).
The central purpose of pre-registration is transparency with respect to which aspects of the study were pre-planned (confirmatory) and which were not (exploratory). Viewed from this perspective, pre-registration does not prevent researchers from making changes to their protocol as they go along, or from running exploratory analyses, but simply maintains the exploratory-confirmatory distinction (Wagenmakers et al., 2012). When used appropriately, pre-registration has the potential to reduce bias in hypothesis-testing (confirmatory) aspects of research. This ambition, however, does not preclude opportunities for exploratory research when it is explicitly presented as such (Ioannidis, 2014; Nosek et al., 2018).
During data collection. Study protocols and materials can be readily shared once data collection commences. Rouder (2016) has, additionally, advocated sharing data while they are being collected, a concept he calls “born-open data” (see SM: Born-open data). Born-open data are automatically uploaded to a public repository, for example, after every day of data collection. Besides the obvious advantages of greater transparency and immediate accessibility, born-open data can simplify data management (e.g., the published data constitute an off-site backup on professionally managed storage). Because of technical and privacy issues, this approach may not be right for every project. However, once the system is set up, sharing data requires minimal effort (other than appropriate maintenance and periodic checking).
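Rouder (2016) implements born-open data with automated nightly commits of new data files to a public git repository. The sketch below shows the same idea in Python, under the assumptions that the data folder already lives in a git repository with a public remote configured and that the data are cleared for public sharing; scheduling the script (e.g., as a daily cron job) is left to the reader.

```python
import subprocess
from datetime import date

def publish_todays_data(repo_path: str = ".") -> None:
    """Commit and push newly collected data files to a public git remote."""
    subprocess.run(["git", "add", "data/raw"], cwd=repo_path, check=True)
    # 'git commit' exits with a non-zero status when there is nothing new to
    # publish, so we deliberately do not raise an error in that case.
    subprocess.run(
        ["git", "commit", "-m", f"Born-open data for {date.today().isoformat()}"],
        cwd=repo_path,
    )
    subprocess.run(["git", "push"], cwd=repo_path, check=True)

if __name__ == "__main__":
    publish_todays_data()
```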
Upon paper submission or publication. A more common practice is to share research products when submitting a paper to a journal or when the paper is published. Of these two possibilities, we recommend sharing on submission. First, editors and reviewers may need access to this information in order to properly evaluate your study. Second, sharing on submission adds value to your paper by demonstrating that your research products will be made available to the scientific community. Finally, sharing on submission allows errors to be caught before publication, reducing the possibility of later public correction. If ‘blind-reviewing’ is important and author names are displayed alongside shared research products, some repositories, such as The Open Science Framework, offer a “view-only” link option that (partially) circumvents this problem.
After an embargo period. Finally, there may be reasons why researchers cannot or do not want to share all research products immediately. It is possible to archive products in an accessible repository right away, and temporarily delay their release by placing them under an embargo.
How to Share?
Once a researcher has decided to share research products, one of the most important decisions to make is where and how to share. Journals and professional societies often recommend that data and other research products be made “available upon request” (Vasilevsky et al., 2017). However, a number of studies suggest that data requests typically do not result in data access (Alsheikh et al., 2011; Dehnhard et al., 2013; Stodden et al., 2018; Vanpaemel et al., 2015; Vines et al., 2014; Wicherts et al., 2006). Similarly, sharing via personal websites is very flexible and increases accessibility and discoverability compared to sharing on request, but is also not a sustainable solution: Research products may become inaccessible when the personal website is deleted, overhauled, or moved to a different hosting service. Thus we do not recommend either of these options.
Instead, we recommend the use of independent public repositories for sharing research products. When choosing a repository, researchers should consider whether the repository:
Uses persistent and unique identifiers for products (such as DOIs).
Accommodates structured metadata to maximize discoverability and reuse.
Tracks data re-use (e.g., citations, download counts).
Accommodates licensing (e.g., provides the ability to place legal restrictions on data reuse or signal there are no restrictions).
Features access controls (e.g., allows restriction of access to a particular set of users).
Has some persistence guarantees for long-term access.
Stores data in accordance with local legislation (e.g., the new General Data Protection Regulation for the EU, http://www.eugdpr.org/).
Within this category, we highlight the Open Science Framework (http://osf.io). This repository satisfies the first six criteria above (the last one being dependent on the exact location of the researchers),3 is easy to use, and provides for sharing the variety of products listed above (for a detailed tutorial on using the Open Science Framework to share research products, see Soderberg, 2018). Note that some research communities make use of specialized repositories, for example, for brain imaging data (https://openneuro.org/) or video and audio recordings (https://nyu.databrary.org/). Such repositories are more likely to have metadata standards and storage capacity calibrated to specific data types. For an overview of other public repositories, see Table 1.
Table 1. Features of selected public repositories that hold psychological data.
Repository | Operator(s) | For-profit/Non-profit | Country/jurisdiction | Focus/specialization | Costs for data sharer | Self-deposit1 | Private (i.e., non-public) storage/projects possible | Restrictions of access possible (for published/public projects) | Embargo period possible | Content types2 |
---|---|---|---|---|---|---|---|---|---|---|
Code Ocean | Code Ocean | Non-profit | USA | None | 50 GB of storage, 10 hrs/month cloud computing time (with up to 5 concurrent runs) free for academic users | Yes | Only before publication of the project | No | No | Software applications, Source code, Structured graphics, Configuration data |
DANS EASY | Netherlands Organisation for Scientific Research (NWO) & Royal Netherlands Academy of Arts and Sciences (KNAW) | Non-profit | Netherlands | None | Free for up to 50GB | Yes | No | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Images, Audiovisual data, Raw data, Structured text, Structured graphics, Databases |
Dryad | Dryad Digital Repository | Non-profit | USA | Medical and life sciences | $120 for every data package up to 20GB (+ $50 for each additional 10 GB); no costs for data sharers if charges are covered by a journal or the submitter is based in a country that qualifies for a fee waiver | Yes | No | No | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Software applications, Source code, Structured text, other |
figshare | Digital Science | For-profit | UK | None | Free for up to 100GB | Yes | Yes (up to 20GB for free accounts) | Yes3 | Yes3 | Scientific and statistical data formats, Standard office documents, Plain text, Images, Audiovisual data, Raw data, Archived data, Source code, Structured graphics |
GESIS datorium | GESIS – Leibniz Institute for the Social Sciences | Non-profit | Germany | Social sciences | Free | Yes | No | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Raw data, Structured graphics, other |
GESIS standard archiving | GESIS – Leibniz Institute for the Social Sciences | Non-profit | Germany | Social sciences, survey data (esp. from large or repeated cross-sectional or longitudinal studies) | Free | No | No | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Archived data |
Harvard Dataverse | Harvard University, Institute for Quantitative Social Sciences | Non-profit | USA4 | None | Free | Yes | Yes | Yes | Yes | Scientific and statistical data formats, Standard office documents, Raw data, Archived data, Software applications, Source code, Databases |
Mendeley Data | Elsevier (in cooperation with DANS) | For-profit | Netherlands | None | Free5 | Yes | Yes | No | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Software applications, Structured text, Configuration data, other |
openICPSR | Inter-university Consortium for Political and Social Research (ICPSR) | Non-profit | USA | Political and social research, social and behavioral sciences | Free up to 2GB6 | Yes | Yes | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Archived data, Structured text, Structured graphics |
Open Science Framework | Center for Open Science | Non-profit | USA | None | Free | Yes | Yes | No | Yes | Scientific and statistical data formats, Standard office documents, Plain text, other |
PsychData | ZPID – Leibniz Institute for Psychology Information | Non-profit | Germany | Psychology, data for peer-reviewed publications | Free | No | No | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text |
UK Data Service standard archiving | UK Data Archive & Economic and Social Research Council (ESRC) | Non-profit | UK | Social research, esp. large-scale surveys, longitudinal, and qualitative studies | Free | No | No | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Images, Audiovisual data, Raw data, Structured graphics |
UK Data Service ReShare | UK Data Archive & Economic and Social Research Council (ESRC) | Non-profit | UK | Social sciences | Free | Yes | No | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Images, Audiovisual data, Raw data, Structured graphics |
Zenodo | European Organization for Nuclear Research & Open Access Infrastructure for Research in Europe (OpenAIRE) | Non-profit | EU | None | Free7 | Yes | No | Yes | Yes | Scientific and statistical data formats, Standard office documents, Plain text, Images, Audiovisual data, Raw data, Archived data, Source code, Structured text, Structured graphics, Network-based data, other |
1 If self-deposit (i.e., researchers can directly upload their own materials) is not possible, this means that the repository is curated (or at least more strongly curated than the others). The advantage of these repositories is that they offer additional help and services by professional archiving staff (e.g., in the creation of study- and variable-level documentation or the conversion of files to nonproprietary formats).
2 We used the content type category from the re3data schema v3.0 here (see Rücknagel et al., 2015).
3 Individual files can be embargoed or made confidential.
4 Dataverse is a special case in several regards. There is the overall Dataverse Project (https://dataverse.org/), then there are different Dataverse repositories (e.g., the Harvard Dataverse: https://dataverse.harvard.edu/ or DataverseNL: https://dataverse.nl/ by the Dutch Data Archiving and Networked Services) which host multiple individual Dataverses (e.g., by individual universities, research groups or researchers). If the institution a researcher is affiliated with does not have its own Dataverse repository or Dataverse, it is possible to create a Dataverse within the Harvard Dataverse repository. For a more detailed description of Dataverse and its organizational architecture, see King (2007) and Leeper (2014).
5 The FAQ on the Mendeley Data website states that they may introduce a freemium model in the future “for instance charging for storing and posting data, above a certain dataset size threshold” (see https://data.mendeley.com/faq).
6 If more storage space or additional services are needed, researchers or their institutions can choose to pay for branded openICPSR hosting or the “Professional Curation Package” to access all of the ICPSR (curation) services (see https://www.openicpsr.org/openicpsr/pricing).
7 The Zenodo terms of use state that “content may be uploaded free of charge by those without ready access to an organized data centre”.
We recommend sharing on a platform, such as the OSF, that makes it possible to assign a unique and persistent identifier (such as a DOI) to the project. Several studies have indicated that regular URLs used by journals to link to supplementary files can often break over time, severing access to research products (Evangelou, Trikalinos, & Ioannidis, 2005; Gertler & Bullock, 2017). Using persistent identifiers increases the chances that research products will be accessible for the long term.
Sharing can raise a number of legal and ethical issues, and these vary between countries and between institutions. Handling these is vastly simplified by addressing them ahead of time. For example, consent forms (see SM: Informed consent) can explicitly request participant consent for public data sharing, as it can be hard or impossible to obtain retroactive consent. Additionally, researchers should always clarify any requirements of their institution, granting agency, and intended publication venue. Below we review issues related to privacy and licensing.
Considering participants’ privacy can be both an ethical issue and a legal requirement (for example, The United States’ Health Insurance Portability and Accountability Act and The European Union’s General Data Protection Regulation, see SM: EU Data Protection Guidelines). In short, researchers must take appropriate precautions to protect participant privacy prior to sharing data. Fortunately, many datasets generated during psychological research either do not contain identifying information, or can be anonymised (“de-identified”) relatively straightforwardly (see SM: Anonymisation). However, some forms of data can be quite difficult to anonymise (e.g., genetic information, video data, or structural neuroimaging data; Gymrek et al., 2013; Sarwate et al., 2014), and require special considerations beyond the scope of this article. Because it is often possible to identify individuals based on minimal demographic information (e.g., postal code, profession, age; Sweeney, 2000), researchers should consult with their ethics board to find out the appropriate legal standard for anonymisation.
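As a simple illustration of such precautions, the hedged Python sketch below replaces names with salted hash codes and coarsens quasi-identifiers such as exact age; the file and column names are invented. Note that keeping the salt makes the codes pseudonymous rather than strictly anonymous, so the appropriate standard should still be checked with your ethics board.

```python
import hashlib
import pandas as pd

data = pd.read_csv("data/raw/raw_responses.csv")

# Replace direct identifiers with salted hash codes. Keeping the salt secret
# makes it hard to recover identities by hashing lists of known names.
SALT = "replace-with-a-long-random-secret"
data["participant_id"] = [
    hashlib.sha256((SALT + str(name)).encode()).hexdigest()[:10]
    for name in data["name"]
]
data = data.drop(columns=["name", "email", "ip_address"])

# Coarsen quasi-identifiers that can identify people in combination
# (e.g., exact age plus postal code; Sweeney, 2000).
data["age_band"] = pd.cut(
    data["age"], bins=[17, 25, 35, 50, 65, 120],
    labels=["18-25", "26-35", "36-50", "51-65", "66+"],
)
data = data.drop(columns=["age", "postal_code"])

data.to_csv("data/processed/study1_anonymised.csv", index=False)
```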
One further legal concern for sharing research products is their ownership. Researchers often assume that publicly available research products have been placed in the public domain, that is, that the authors waive all property rights. However, by default, researchers retain full copyright of the products they create. Unless they are published with a permissive license, the products technically cannot be used or redistributed without approval from the authors – despite scientific norms to the contrary. Thus, to reduce uncertainty about copyright, shared products should ideally be released under an easy-to-understand standard license such as a Creative Commons (CC; https://creativecommons.org/licenses/) or Open Data Commons (ODC; https://opendatacommons.org/licenses/) license. In the spirit of openness, we recommend releasing research products into the public domain by using maximally permissive licenses, such as CC0 and ODC-PDDL, or conditioning re-use only on attribution (e.g., CC-BY and ODC-BY). Licensing research products is as easy as including a text file alongside your research products containing a statement such as “All files in this repository are licensed under a Creative Commons Zero License (CC0 1.0)”.
So, why not share?
Given all of the arguments we and others have presented, why would researchers still not share their research products? Beyond the concerns described above (privacy, etc.), one commonly heard worry is that other researchers will make use of shared resources to gain academic precedence (“scooping”; Houtkoop et al., 2018). In our view, this worry is usually unwarranted. Most subfields of psychology are not so competitive as to be populated with investigators racing to be the first to publish a particular finding. In addition, in many cases the possibility of being scooped is likely outweighed by the benefits of increased exposure, as noted by Gary King:4 “The thing that matters the least is being scooped. The thing that matters the most is being ignored”. Researchers who are truly concerned about being scooped – whether justifiably or not – can simply wait to share their materials, code, and data until after they publish, or release research products under a temporary embargo. Such embargoes slow verification and reuse, but they are far better than not sharing at all.
Another worry is that errors will be revealed by others checking original data, or original conclusions will be challenged by alternative analyses (Houtkoop et al., 2018). Indeed, it seems likely that errors will be discovered and conclusions will be challenged as widespread adoption of transparent research practices adds fuel to the idling engines of scientific self-correction and quality control, such as replication and peer review. It is understandable that researchers worry about errors being discovered in their own work, but such errors are inevitable – we are, after all, only human. A rise in informed critical discourse will be healthy for science and make discovery of such errors normative. We believe that more, rather than less, transparency is the best response. Honesty and transparency are likely to enhance – rather than diminish – one’s standing as a scholar (Fetterman & Sassenberg, 2015).
Researchers may also be concerned that learning and then implementing transparent research practices will be too time-consuming (Houtkoop et al., 2018). In our experience, there is indeed a significant time-cost to learning such practices. Nonetheless, they need not all be embraced and mastered at once. It is often through “baby steps”, via trial and error, that the practice of open science becomes natural and habitual. It helps to include, for example, “research milestones” in one’s workflow. Adding such milestones also makes for an effective teaching strategy, with students learning how to engage in open science practices in small steps. Besides, there are major benefits that make this time well spent for the individual researcher. First, transparent research practices are often synonymous with good research management practices, and therefore increase efficiency in the longer term. For example, it is much easier to locate stimuli from an old project or re-use analysis code when it is well-documented and available in a persistent online repository. Second, transparent practices can lead to benefits in terms of citation and reuse of one’s work (see SM: Incentivising sharing). Finally, transparent research practices inspire confidence in one’s own research findings, allowing one to more readily identify fertile avenues for future studies that are truly worth investing resources in.
Conclusion
The field of psychology is engaged in an urgent conversation about the credibility of the extant literature. Numerous research funders, institutions, and scientific journals have endorsed transparent and reproducible research practices through the TOP guidelines (Nosek et al., 2015)5 and major psychology journals have begun implementing policy changes that encourage or mandate sharing (see e.g., Kidwell et al., 2016; Nuijten et al., 2017). Meanwhile, the scientific ecosystem is shifting and evolving. A new open science frontier has opened and is flourishing, with a plethora of tools and services to help researchers adopt transparent research practices.
Here we have sketched out a map to help researchers navigate this exciting new terrain. Like any map, some aspects will become outdated as the landscape evolves over time: Exciting new tools to make research transparency even more user-friendly and efficient are already on the horizon. Nevertheless, many of the core principles will remain the same, and we have aimed to capture them here. Our view is that being an open scientist means adopting a few straightforward research management practices, which lead to less error-prone, reproducible research workflows with each incremental step adding positive value. Doing so will improve the efficiency of individual researchers and it will enhance the credibility of the knowledge generated by the scientific community.
One of the major burdens facing scientists is keeping up with the evolution of standards and resources. That is why the SM will be updated regularly and collaboratively. This “live” version, available at http://open-science-guide.uni-koeln.de, will therefore differ from the publisher’s version.
For example, see the SOP for the Nosek group (https://osf.io/mv8pj/) and the Green group (https://github.com/acoppock/Green-Lab-SOP).
OSF, which is located in the United States, satisfies US legislation. It has also adapted its privacy policy and terms of use to comply with the GDPR: https://cos.io/blog/were-committed-gdpr-heres-how/. Note, however, that compliance with these regulations depends on the use of proper anonymisation procedures by the researchers (see Supplementary Material).
See also guidelines for specific fields and types of research: the CONSORT statement (randomized controlled trial: http://www.consort-statement.org/), the ARRIVE guidelines (animal research: https://www.nc3rs.org.uk/arrive-guidelines) or the PRISMA statement (meta-analysis: http://prisma-statement.org/). The EQUATOR website (http://www.equator-network.org/) lists the main reporting guidelines.
Acknowledgments
The authors thank Tim van der Zee and Kai Horstmann for helpful comments and Daniël Lakens for his help at the start of this project. We also thank Christoph Stahl and Tobias Heycke for allowing us to use their data and materials for the example project (from Heycke, Aust, & Stahl, 2017) and Luce Vercammen for proofreading the manuscript. Any remaining errors are the authors’ responsibility.
Funding Information
This work was partly funded by the French National Research Agency in the framework of the “Investissements d’avenir” program (ANR-15-IDEX-02) awarded to Hans IJzerman. Tom Hardwicke was supported by a general support grant awarded to METRICS from the Laura and John Arnold Foundation.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
OK initiated the project and coordinated it with MF. All authors contributed to the writing and commented on previous versions. FA designed the example project.
Author Information
Olivier Klein, Center for Social and Cultural Psychology, Brussels, Belgium; Tom E. Hardwicke, Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, USA; Frederik Aust, Department of Psychology, University of Cologne, Cologne, Germany; Johannes Breuer, Data Archive for the Social Sciences, GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany; Alicia Hofelich Mohr, Liberal Arts Technologies and Innovation Services, College of Liberal Arts, University of Minnesota, Minneapolis, Minnesota, USA; Hans IJzerman, LIP/PC2S, Université Grenoble Alpes, Grenoble, Isère, France; Gustav Nilsonne, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden, Stress Research Institute, Stockholm University, Stockholm, Sweden, and Department of Psychology, Stanford University, Stanford, USA; Wolf Vanpaemel, Research Group of Quantitative Psychology and Individual Differences, University of Leuven, Leuven, Belgium; Michael C. Frank, Department of Psychology, Stanford University, Stanford, California, USA.
Peer review comments
The author(s) of this paper chose the Open Review option, and the peer review comments are available at: http://doi.org/10.1525/collabra.158.pr