Computational results are considered reproducible if the same computation on the same data yields the same results when performed on a different computer or on the same computer at a later time. Reproducibility is a prerequisite for replicable, robust, and transparent research in digital environments. Various approaches have been suggested to increase the chances of reproducibility of R code. Many of them rely on R Markdown as a language to dynamically generate reproducible research assets (e.g., reports, posters, or presentations). However, a simple way to automatically verify reproducibility is still missing. We introduce the R package reproducibleRchunks, which automatically stores metadata about original computational results and later verifies reproduction attempts. We hope that this approach, which requires only a minimal change to users’ workflows, increases the transparency and trustworthiness of digital research assets.
Computational results, including those from statistical data analyses, are considered reproducible if the same computation on the same data yields the same results when performed on a different computer or on the same computer at a later time. Reproducibility is a prerequisite for replicable, robust, and credible research in digital environments (Epskamp, 2019; Hardwicke et al., 2018). Surprisingly often, results from published raw data cannot be reproduced. For example, Artner et al. (2021) were able to reproduce only 70% of major scientific conclusions from a sample of articles published in the domain of Psychology; however, the problem is likely pervasive across all scientific disciplines.

Various approaches have been suggested to increase the chances of reproducibility of statistical data analyses in the R language (Chan & Schoch, 2023; Nagraj & Turner, 2023; Peikert et al., 2021; Ushey & Wickham, 2024). Many of these approaches rely on R Markdown (Allaire et al., 2023). R Markdown is a simple language that allows users to mix natural language, simple formatting instructions (e.g., what is a headline, what should appear in bold face, or what are list elements), and executable computer code (R, Python, or other languages) in a single document (Xie et al., 2018). Whenever an R Markdown document is rendered, all of its computer code chunks are executed, which allows for the generation of dynamic content in the document. This is particularly interesting for quantitative research reports, as statistical results and the contents of tables and figures can be generated dynamically. Packages like papaja (Aust & Barth, 2022) or stargazer (Hlavac, 2022) further extend R Markdown’s capabilities to format documents (including tables, figures, and statistical results) in a standardized style, such as APA style (American Psychological Association, 2020). As another example, (R) Markdown also forms the basis of executable research articles (Tsang & Maciocci, 2020), a browser-based form of research articles that allows authors to enrich their publications with interactive elements, such as figures or results that change dynamically based on readers’ input.

A particular advantage of using R Markdown to create research reports is that it helps to avoid common threats to the reproducibility of statistical results, first and foremost copy-and-paste errors, which occur when the results of a regular R script and a scientific report (e.g., a reported statistic or table in a Word document) mismatch by mistake (Peikert & Brandmaier, 2021). Furthermore, R Markdown is versatile enough to generate a variety of scientific assets beyond manuscripts, including posters (Thorne, 2019), presentations (Xie, 2022), resumés (O’Hara-Wild & Hyndman, 2023), or preregistrations (Peikert et al., 2021), all using the same R Markdown approach.
In sum, using R Markdown for the creation of scientific reports including statistical computations comes with various advantages, particularly avoiding copy-and-paste errors as a threat to reproducibility. However, there are still various other sources of errors that are not eliminated solely by using R Markdown. For example, different R versions may lead to different results; a famous example is the change in the random number generator between R versions (Peikert & Brandmaier, 2021). For another example, bugfixes or changes in package defaults across different package versions may go unnoticed and lead to different results. Therefore, it is essential to monitor changes in computational results within R Markdown documents throughout the document’s entire lifecycle — from the initial draft, through (peer) review, to publication and long-term storage. Currently, there is no straightforward, automated method for documenting and verifying reproducibility. The reproducibleRchunks package addresses this challenge by ensuring that computational results in R Markdown documents are automatically testable for successful reproduction. It verifies whether the same script with the same data produces identical results, even on different computers or at later times, with minimal disruption to users’ workflows.
This article is written for readers who use R for statistical analyses and ideally have some experience with R Markdown. We begin with a straightforward tutorial demonstrating a typical use case for the package. We then delve into more detailed technical background information and explain various ways to customize the package’s behavior. While a deep understanding of the technical details is not necessary for effective use, gaining insight into these aspects can enhance the user experience and help readers appreciate the challenges of ensuring reproducibility.
Package at a glance
For users already familiar with R Markdown, only two modifications are needed when using the package:
Load the reproducibleRchunks package in the first code chunk of your document;
Change the code chunk type from r to reproducibleR for every code chunk whose reproducibility should be automatically verified.
In reproducibleR chunks, all newly declared variables are automatically identified and their contents are stored in a metadata file. Furthermore, information about the code chunk itself (that is, the code syntax) is also stored, such that later non-reproduction can be traced back to either changes of the code chunk or changes of the computational environment. Once a document is rendered again, all computational results generated in the reproducibleR chunks are automatically tested for reproducibility. Successes and failures of the reproduction attempts are displayed in a reproducibility report for each chunk. Customization or suppression of these reports is also possible, as we will demonstrate later. A simple way to assess reproducibility of computational results would be a bitwise comparison of the entire knitted1 R Markdown document (i.e., the resulting HTML, PDF, or DOCX file) to a previous version. Yet, a particular advantage of using reproducibleR chunks is that users have fine-grained control over what should be tested for reproducibility, and at which stages during the data analysis. This allows for a forensic examination of non-reproducibility; for example, finding out whether the non-reproducibility is due to changes in the data file, in the data cleaning, or in the estimation of a statistical model. Using reproducibleRchunks, users can separately assess the reproducibility of each stage of a given statistical data analysis (or any other computation). Also, not all elements of a research asset need to reproduce exactly (such as the current date displayed in a presentation of results). In the following, we will briefly demonstrate how reproduction is checked and how successful and failed attempts are displayed by the package. We will then go into the details of the mechanics of the package and discuss potential cases of non-reproducibility and how the package supports researchers in handling these cases.
Introductory Example
The core functionality of the package is to provide reproducibleR chunks, which are code chunks in R Markdown documents that can be used like regular R chunks (including name labels and the usual options regulating output parameters) but offer reproducibility checks in addition. To use these new code chunks, the package must be loaded2 in the first regular R code chunk in the document using:
library(reproducibleRchunks)
On load, the package registers a new type of code chunk called reproducibleR. Figure 1 shows a snippet from an R Markdown file in which there is a reproducibleR chunk with the name addition. The entire tutorial file can be found in the R package in subdirectory “inst/examples_simple_demo” or can be downloaded from https://github.com/brandmaier/reproducibleRchunks/tree/main/inst/examples/simple_demo. In the chunk, a new variable my_sum is declared and defined to be the sum of some variable x plus one. Variable x was declared in an earlier code chunk (not shown in the figure) and is thus not subject to the reproducibility test of this chunk.
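In plain text, the chunk shown in Figure 1 looks roughly like this (a sketch; the exact chunk options used in the figure may differ):

```{reproducibleR addition}
my_sum <- x + 1
```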
In this example, we assume that the R Markdown document will eventually be converted into an HTML report (but all other pandoc-supported formats are possible as well, such as PDF or Word documents). During the initial document conversion, the R code in the reproducibleR chunk is executed and the value of my_sum is stored in a separate file (with the prefix .repro) that contains all information needed for later reproducibility verification tests. A reproducibility report is shown in the generated document below the code chunk, listing information about all variables of the given chunk. Figure 2 shows a snippet from the HTML report generated during the initial creation of the document.
Once the document is rendered a second time, and the reproducibility metadata file exists, the computations in the code chunk are rerun, and their results are compared to the stored reproducibility information for each variable. The reproducibility report will then detail the success of each reproduction attempt. In our example, the computation was successfully reproduced. A snippet from the resulting generated document is displayed in Figure 3.
If the document is rendered again and the computational result differs from the original result, the failure to reproduce is noted in the reproducibility report. In a variation of the previous example, assume someone changed the value of x; the variable my_sum would then change, and a failure would be displayed (see Figure 4).
Ideally, the reproducibility metadata files are always kept together with the original Markdown file. For example, when providing Open Code on open platforms, the repository should contain both the Markdown files as well as the metadata files. Note that the metadata file format is particularly suitable for use with version control systems (see Peikert & Brandmaier, 2021). If users intentionally modify R code later, e.g., because they want to fix an error, then the metadata files would need to be deleted and recreated. Version control can help users document such changes.
Methods
In the following, we describe how the package stores, retrieves, and compares reproducibility information to verify reproduction attempts. First of all, the package executes reproducibleR code chunks just like regular R code chunks, meaning there are no additional restrictions on what can be computed and tested for reproducibility. After code execution, the package collects information about all variables that were newly declared in the current chunk. The contents of those variables are stored in a separate JSON data file. JSON, short for JavaScript Object Notation, is an open standard for mapping complex objects to text files with high simplicity and readability for machines and humans (Lennon, 2009). The name of the JSON file contains the name of the original Markdown file and the chunk label and, by package default, starts with the prefix .repro. That is, reproducibility information of a chunk labelled datacleaning in the file sem_analysis.Rmd is stored in a file called .repro_sem_analysis.Rmd_datacleaning.json. Once the document is regenerated and matching JSON data files exist, their content is checked against the newly computed chunk variables for identity.
Here is an example of how the contents of a single variable called numbers are stored in JSON format. In this case, the variable content is a vector of five random draws from the numbers one to ten: 1, 5, 10, 8, and 2, generated by the following command:
set.seed(42)
numbers <- sample(1:10, 5)
This vector is serialized in raw format to a JSON format as follows:
{ "type": "integer", "attributes": {}, "value": [1, 5, 10, 8, 2] }
The JSON format has a clear, formal structure that facilitates parsing of information for both computers and humans. This accessibility can make forensics easier if reproduction issues arise. To further aid forensic investigation, the package also stores information about the code syntax used to generate an object. While JSON can represent arbitrarily complex R objects (such as an entire regression model), we opted not to store all computational results by default but only their “fingerprints.” Using the fingerprint, we can determine whether an object has changed without storing the object itself. This strategy avoids accidental leakage of potentially sensitive data3 and reduces storage demands. Fingerprints are realized via so-called one-way hash functions (we use the SHA256 algorithm by default), which take an arbitrarily large digital object and map it onto a fixed-size object, typically displayed as a hexadecimal string. The following shows how the information about the variable numbers is stored using a SHA256 fingerprint (a 256-bit fingerprint that is usually displayed as 64 hexadecimal characters):
52bbb8b04a1e1533e223be4d8c2966c781b2d473ddf737acf31641c5369d08b1
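Such a fingerprint can be computed with the digest package; the following is a minimal sketch, and its exact output may differ from the fingerprint shown above because it depends on the serialization settings the package applies before hashing:

library(digest)

set.seed(42)
numbers <- sample(1:10, 5)

# SHA256 fingerprint of the serialized object: 64 hexadecimal characters (256 bits)
digest(numbers, algo = "sha256")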
The storage of the fingerprinted information about a code chunk and its computational results is illustrated in the schematic in Figure 5.
Comparisons of computational results
Once a document is (re-)generated with metadata present (i.e., reproducibleR chunks with matching JSON files), the package will compare all objects that are amenable to reproducibility checks. Equality of the original and regenerated results is checked with the all.equal() function from R’s base package. According to the R documentation, it “is a utility to compare R objects x and y testing ‘near equality’. If they are different, comparison is still made to some extent, and a report of the differences is returned.” (see the ?all.equal R documentation; R Core Team, 2024).
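For illustration, all.equal() tolerates tiny floating-point discrepancies that a bitwise comparison would flag (a generic R example, not specific to the package):

x <- 0.1 + 0.2
identical(x, 0.3)   # FALSE: bitwise comparison fails due to floating-point representation
all.equal(x, 0.3)   # TRUE: near equality within the default tolerance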
Types of objects
In principle, any R object is suitable for reproducibility testing no matter how complex. For creating fingerprints, the package uses R’s serialization methods that take any R object as input and prepare it for storage. That is, all of the following variables (x of class integer, y of class character, z of class lm, and lst of class list) are valid examples for the automated reproducibility checks:
x <- 1:10
y <- "qr"
z <- lm(x ~ 1, method = y)
lst <- list(x, y, z)
Tutorial
In the following, we provide a practical guide on how to use reproducible R code chunks in R Markdown documents. First, install the latest version of the package from CRAN4:
install.packages("reproducibleRchunks")
In your R Markdown file, load the package using library(reproducibleRchunks), preferably in the first regular R code chunk at the very top of the document.
Now, we define a reproducible code chunk by setting the code chunk language to reproducibleR. Once this document is rendered for the first time, a data file is created that stores all reproducible results computed in this code block. The name of the file will contain the chunk label (in this example, helloworld). The following code block contains two reproducible results stored in the variables x and y. These computations can be based on results computed in previous chunks, but variables from previous chunks are not themselves subject to the reproducibility tests of this chunk. If the R Markdown document is rendered a second time, the computational results are recomputed and compared against the original results. For each result, reproduction success or failure is reported separately.
```{reproducibleR helloworld, echo=TRUE, eval=TRUE}
set.seed(42)
x <- rnorm(10, mean=0, sd=1)
y <- 4 * 4
```
Rendering this code chunk twice in a row on the same computer should result in a message of successful reproduction. To break reproducibility, remove or comment out the set.seed() command that puts the random number generator into a reproducible state, and render the document again. Now, you should obtain an error message indicating that the result of x could not be reproduced, while y still reproduces because it did not depend on the random number generator.
In practice, we expect that reproducible R chunks will be used both in cases in which the reproduction report should be displayed (e.g., when developing a data analysis script or reproducing a former one) and those in which it should not (e.g., when rendering a presentation or a manuscript for submission to a journal). By default, reproduction report statements are produced for each chunk but this default can be changed via the code chunk argument report=FALSE. If users wish to change the default display of reports over the entire document, this can be adjusted via a global knitr option as follows:
knitr::opts_chunk$set(report = FALSE)
Changing defaults
Some default behaviors of the package can be changed via R options(). The package defines the following options:
reproducibleRchunks.digits This is the number of digits for the rounding of numbers and controls the numeric precision of the reproducibility checks. By default, this is 10.
reproducibleRchunks.filetype This is the type of data storage. Currently, only the JSON format is supported; however, this leaves room for other formats to be supported in the future. By default, this is ‘json’.
reproducibleRchunks.hashing Boolean. This indicates whether fingerprints should be used (default) or raw values should be stored.
reproducibleRchunks.hashing_algorithm This is the hashing algorithm used to generate fingerprints of variable contents. Any of the algorithms offered by the digest package can be used. By default, this is sha256.
reproducibleRchunks.templates This is a list with keys corresponding to pandoc output formats and values corresponding to templates for formatting the reproducibility reports. See below for more details.
reproducibleRchunks.prefix This is the prefix for the file containing reproducibility information. The filename will always contain the name of the R Markdown file and the chunk name separated by an underscore. The default prefix is .repro.
Here are a few examples of how these options can be changed. First, it is possible to change the precision with which numeric results are stored. By default, this is up to 10 digits. The following line reduces the precision to only four digits after the decimal point:
options(reproducibleRchunks.digits = 4)
By default, computational results are stored as fingerprints using a hash function. To store data in raw format, switch off hashing with the following option:
options(reproducibleRchunks.hashing = FALSE)
There are various hashing functions with different features available. They generally differ along three dimensions: speed, chance of collisions, and security. Generally, hashing functions map objects of arbitrary size to a fixed-size alphanumeric string. In this application, the choice of algorithm is not overly crucial. Supported algorithms (provided by the digest package; Antoine Lucas et al., 2022) are sha1, crc32, sha256, sha512, xxhash32, xxhash64, murmur32, spookyhash, and blake3. To reduce the chance of collisions, the package defaults to sha256, which results in fingerprints of 64 hexadecimal characters (corresponding to 256 bits) but comes at the cost of some speed (computing the sha256 hash of a vector of 10000 numbers still takes only 0.05 seconds on our standard computers). We believe that in most cases the speed factor will be irrelevant, but it could become an issue when very large objects (e.g., entire neuroimaging data sets) are fingerprinted. In that case, users either have to use a faster fingerprinting algorithm or forego the fingerprinting of raw data and fingerprint only intermediate or final results instead.
options(reproducibleRchunks.hashing_algorithm = "sha256")
Note that these options can be chosen differently for each chunk. That is, it is possible to use fingerprints to store results of one chunk and plain data storage for results of another chunk.
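A minimal sketch of such per-chunk configuration, assuming the option is set in a regular R chunk immediately before the chunk it should affect (chunk labels and the example computation are illustrative):

```{r}
options(reproducibleRchunks.hashing = FALSE)  # next chunk stores raw values
```

```{reproducibleR descriptives}
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
```

```{r}
options(reproducibleRchunks.hashing = TRUE)   # subsequent chunks store fingerprints again
```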
Customization
Last, users of this package can customize the appearance of the reproducibility reports. This can be done in one of two ways. One option is to keep the default reports that are appended to each code chunk output (as shown in the previous examples); here, users cannot change the content but can style the output via template layouts, which can be tailored to each specific (pandoc) output format, for example an HTML template for HTML files and a LaTeX template for PDF files. The other option is to generate entirely custom reports based on a summary of the reproduction status of every variable in every chunk. This summary is provided by the function get_reproducibility_summary(), which returns a data.frame with three columns: the name of the code chunk, the variable name, and a Boolean value indicating whether the reproduction was a success. This information can be used either to generate custom reports, such as one general report at the very end of the document, or to write reports to a separate file. For the default reports, the option reproducibleRchunks.templates stores the templates used for displaying the report information. It is a list of key-value pairs in which the key is the final pandoc output format (typically ‘html’, ‘pdf’, or ‘docx’) and the value is a string containing formatting information. If ‘html’ output is chosen, the template can contain any valid HTML/CSS code; if the output is ‘pdf’, it can contain any valid LaTeX code. These formatting instructions can contain two placeholders, which the package replaces with the title of the report (${title}) and the content of the report (${content}).
options(reproducibleRchunks.templates = list(
  html = "<div style='border: 3px solid black; padding: 10px 10px 10px 10px; background-color: #EEEEEE;'>
  <h5>${title}</h5> ${content}</div>"))
For example, the following code reformats the appearance of the reproducibility report in PDF documents (via LaTeX), such that the report is enclosed by two horizontal rules (\hrulefill elements), the title is displayed as a section header, there is a medium skip between title and content, and the content is displayed in a small font size:
options(reproducibleRchunks.templates = list(
  latex = "\\hrulefill \n \\section{${title}} \\medskip \\small ${content}\n \\hrulefill \n "))
As mentioned above, if users wish to entirely suppress the default reproducibility reports, they can do so via the chunk argument report=FALSE as part of the code chunk options. Note that reproduction is still attempted and reproducibility information remains accessible through the function get_reproducibility_summary(). The other standard chunk options can be used with reproducibleR code chunks as usual. In particular, eval=FALSE suppresses execution of the code, echo=FALSE suppresses the display of the code, and so forth.
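As a sketch, a final code chunk could aggregate the status of all variables into one overview; we index the success flag by position (the third column described above), which is an assumption about the returned data.frame:

repro <- get_reproducibility_summary()

# keep only the variables that failed to reproduce and display them as a table
failures <- repro[!repro[[3]], , drop = FALSE]
if (nrow(failures) > 0) knitr::kable(failures)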
Recommendations
In the following, we give recommendations on how to use the package in typical cases of statistical data analysis and reporting. A simple way to start is to replace all classic R code chunks with reproducibleR code chunks. Then, all variables created in the process of a given data analysis will be tested for reproducibility. However, we advise against this procedure as it is likely that some variables will not exactly reproduce even though all statistical results that researchers find meaningful will reproduce perfectly.
To explain: Statistical models often store internal information that is not necessarily relevant to the statistical result but will strictly lead to a non-reproducibility error. For example, structural equation models estimated with OpenMx (Neale et al., 2016) store information about the time that elapsed while fitting the model. Below is an example of the contents of a simple confirmatory factor model from the OpenMx documentation, called factorFit1, run on OpenMx’s demonstration dataset demoOneFactor, which includes 500 observations on five numeric variables. Among information relevant for checking the reproducibility of parameter estimates and model fit, its output attribute also contains the timing variables wallTime and cpuTime as well as a timestamp variable, which will lead to a non-reproducibility error if the entire factorFit1 object is tested for reproducibility. For illustration, we show the content of the cpuTime variable and the timestamp variable for the aforementioned factor model, which was estimated when this manuscript was generated:
cat(factorFit1$output$cpuTime)
## 0.04672503323
cat(factorFit1$output$timestamp)
## 1744280694
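For context, the model referenced above can be specified roughly as follows (a sketch adapted from the OpenMx documentation; the object name factorFit1 is chosen to match the text, and details may differ from the exact specification used for this manuscript):

library(OpenMx)

data(demoOneFactor)                 # 500 observations on five numeric variables
manifests <- names(demoOneFactor)
latents   <- c("G")

factorModel <- mxModel("OneFactor", type = "RAM",
  manifestVars = manifests, latentVars = latents,
  mxPath(from = latents, to = manifests),                        # factor loadings
  mxPath(from = manifests, arrows = 2),                          # residual variances
  mxPath(from = latents, arrows = 2, free = FALSE, values = 1),  # fix factor variance
  mxData(cov(demoOneFactor), type = "cov", numObs = 500))

factorFit1 <- mxRun(factorModel)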
Therefore, we generally recommend a checkpoint approach in which reproducibility tests are applied only at selected, meaningful stages of the data analysis process and only to variables that contain values that are shown and/or interpreted in the scientific report (e.g., effect size point estimates, goodness-of-fit indices, standard error estimates, confidence intervals, test statistics, p values, Bayes factors, etc.). Specifically, we recommend using at least the following checkpoints (a minimal sketch follows the list):
Data loading: Check whether the loaded raw data are identical to the data loaded in the original analysis;
Preprocessing: Check whether the preprocessed data (that is, data after steps such as outlier removal, aggregation, filtering) are identical to the preprocessed data in the original analysis;
Results: Check whether the results that are reported in text, tables, and figures are identical to the results of the original analysis. At the same time, avoid adding automated tests of entire statistical models but focus on the results.
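A minimal sketch of this checkpoint structure in an R Markdown document; the file name, variable names, and the simple linear model are placeholders:

```{reproducibleR dataloading}
raw_data <- read.csv("study_data.csv")             # checkpoint 1: raw data as loaded
```

```{reproducibleR preprocessing}
clean_data <- subset(raw_data, !is.na(outcome))    # checkpoint 2: data after filtering
```

```{r estimation}
fit <- lm(outcome ~ predictor, data = clean_data)  # model object itself is not fingerprinted
```

```{reproducibleR results}
estimates <- coef(fit)                             # checkpoint 3: reported point estimates
rsq       <- summary(fit)$r.squared
```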
A convenient approach is to use standard wrapper methods that extract relevant numeric quantities from statistical models, such as parameter estimates, effect size estimates, confidence interval limits, p values, fit indices, or Bayes factors. To this end, several packages support the generic function coef(), which can be used, for example, to extract point estimates from linear regression models. Other packages offer their own accessor functions, such as parTable() for structural equation models in lavaan or omxGetParameters() for the aforementioned OpenMx model. The output of the summary() function may be a good target for reproducibility checks because it often contains information about parameter estimates and fit statistics; however, in some cases (as with OpenMx models), it may again contain timing information, which will not exactly reproduce.

Note that there may be small deviations from exact reproducibility that can be tolerated. For example, when comparing reproducibility across machines that work with different numerical precision (e.g., 32-bit vs. 64-bit precision), numeric representations may count as reproducible if they are identical up to some numerical precision. To avoid such problems, reproducibleRchunks rounds numeric values to a given precision (by default, ten digits). Users may also wish to test reproducibility only up to a precision lower than the package default. This could be relevant for algorithms that are fundamentally based on random numbers, such as Monte Carlo methods, bootstrap estimators, and similar (even though their perfect reproducibility should usually be guaranteed by setting a random seed; but see Peikert & Brandmaier, 2021). In this case, approximate reproducibility tests can be realized by individually adjusting the numeric precision of the tests (see the option reproducibleRchunks.digits described previously).

Adherents of the tidyverse approach (Wickham et al., 2019) are well advised to use the functions tidy() and glance() from the broom package (Robinson et al., 2023). These functions convert parameter estimates and model fit statistics from various statistical models in R (e.g., anova, glm, coxph, gam, lm, lavaan, smooth.spline, survfit, and others) into a consistent and easily accessible format known as a tibble. This format is particularly suitable for fingerprinting using our proposed approach. Here is a brief example of how the broom package formats the output of a linear regression model:
x <- rnorm(10)
y <- rnorm(10)
broom::tidy(lm(y ~ x))
## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)    0.389     0.272      1.43   0.191
## 2 x             -0.586     0.420     -1.39   0.201
As a rule of thumb, we recommend fingerprinting the results of all parameter estimates from the tidy() function and all model fit statistics from the glance() function in a reproducibleR chunk.
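A sketch of this rule of thumb, assuming a model object fit was estimated in a preceding regular chunk:

```{reproducibleR lm_results}
param_estimates <- broom::tidy(fit)     # parameter estimates to be checked
fit_statistics  <- broom::glance(fit)   # model fit statistics to be checked
```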
Discussion
Summary
Dynamic document generation using R Markdown (e.g., reports, posters, or presentations) is an important building block for reproducible research. However, a simple way to automatically verify reproducibility of dynamically generated content within such R Markdown documents was lacking. We have developed an R package that enables automatic reproducibility testing of R code chunks within R Markdown documents while requiring essentially no changes to users’ workflows. This is achieved by storing metadata about computational results that can be used later to verify reproducibility. Templates allow for customization of the appearance of reproducibility reports; further, the package allows users to generate entirely customized reproducibility reports either within a given R Markdown document or as separate files.
Strengths and practical implications
Adopting reproducibleR chunks in a workflow offers several benefits. First of all, it allows manual verification of reproducibility whenever an R Markdown document is rendered. This may be particularly relevant when collaboratively working on scientific reports. Adopting the proposed approach ensures that the same results are generated across all computers of a team of researchers. Even in projects where a person works alone, the proposed workflow is useful. It ensures that all computed results remain consistent throughout the lifecycle of a scientific report, from the first draft through peer-review phases to publication. In addition to manually verifying computational results, automatic checks are also possible. Documents can be systematically checked for reproducibility using the function call isReproducible(), which takes as input an R Markdown file and returns a logical value indicating whether reproduction was successful. On collaborative platforms like GitHub, reproducibility could be automatically tested with a GitHub action whenever the repository is modified. It is conceivable that journals could use a similar mechanism to check the reproducibility of a paper upon submission.
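For example, a continuous-integration script could loop over all R Markdown documents of a project (a sketch; the directory name and the failure handling are illustrative):

# check every R Markdown file in the analysis directory for reproducibility
rmd_files <- list.files("analysis", pattern = "\\.Rmd$", full.names = TRUE)
status <- vapply(rmd_files, isReproducible, logical(1))
if (!all(status)) {
  stop("Reproduction failed for: ", paste(rmd_files[!status], collapse = ", "))
}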
Limitations
Our approach is currently limited to users who rely on R Markdown. Regular R scripts are not amenable to our proposed approach. Recently, Quarto (Allaire & Dervieux, 2024) was proposed as a next-generation Markdown publishing system and its popularity has grown. However, our proposed approach is not yet compatible with Quarto. In general, our package relies on the availability and stability of some R functions and packages used for serializing objects and generating fingerprints. Below, we outline these issues and other short-term and long-term issues one may experience with this package.
Potential short-term and long-term issues
In the following, we briefly discuss a few future scenarios in which threats to reproducibility or reproducibility checking may occur and explain how these situations can be dealt with:
Original computations were executed and stored in the JSON data file. Later, a reproduction attempt fails using the exact same R Markdown file executed on a different computer. This is a classic case of non-reproducibility that is often due to changes in the software packages and the R version the computations in R Markdown rely upon (Epskamp, 2019; Peikert et al., 2021). Note that the goal of this package is not to guarantee reproducibility but to allow for automated testing and reporting of reproducibility. To increase the chances of reproducibility in the first place, various solutions exist (Chan & Schoch, 2023; Nagraj & Turner, 2023; Peikert et al., 2021; Ushey & Wickham, 2024).
Some computations are executed and their original results are stored in a JSON file as planned. Later on, someone modifies the R code in the R Markdown file, such that the results differ and a failure of reproduction is indicated. This change of code between the original computation and the reproduction attempt is caught by the package because fingerprints of the entire code chunk syntax are stored. Users are informed by a warning in the reproducibility report. The package will still try to reproduce each result and give individual reports on successes and failures.
The metadata files get lost. Without metadata files, no automated reproducibility check can be made. In this case, it may be a good idea to rerun the analysis in a computing environment as close as possible to the original computing environment and store the newly generated metadata files. It is advisable to manually compare the recreated results to some original reference (e.g., a report or published article) for consistency.
The reproducibleRchunks package is not available anymore. In this case, all R Markdown code chunks of the type reproducibleR can be renamed to r, allowing the manuscript to render, albeit without automated reproducibility checks. If there is a large number of code chunks, the following line of code can be inserted into the very first code chunk of the Markdown document; it tells knitr to render all reproducibleR chunks as regular R chunks without the need to rename them: knitr::knit_engines$set(reproducibleR = knitr::knit_engines$get("R")).
A future R version changes the way objects are serialized (that is, converted from the internal representation to a byte-stream representation, of which the fingerprint is taken). The metadata contains information about the R version used to generate the metadata, so future versions of our package could easily be adapted to this change. As long as the digest package is used for generating fingerprints, the serialization version can be fixed to the version that is used at the time of writing (2025) using this command: options(serializeVersion=2).
Rigor in software development
In developing this package, we adhere to three major aspects of rigor in scientific software development (Brandmaier et al., 2024). First, the package comes with a variety of formal tests (based on the testthat package; Wickham, 2011) to verify its correct functioning. Second, we provide documentation in the form of this manuscript and online documentation (https://github.com/brandmaier/reproducibleRchunks). Third, bug reports and feature requests can be submitted through our GitHub project website.
Outlook
Again, we emphasize that this package is not meant to ensure reproducibility but to allow for automatic testing and verification of reproducibility. For example, it would allow for reproducibility checks in peer review, as suggested by Crüwell et al. (2023), which could be automated using this package. In the terminology of these authors, the current approach tests whether results are exactly reproducible, which is consistent with the idea of awarding an Open Data badge. If there were demand, the package could be extended to support the more fine-grained judgements suggested by Crüwell et al. (2023), such as essentially reproducible (minor deviations in the decimals), partially reproducible (minor deviations, but the results were mostly numerically consistent), or mostly not reproducible (major deviations). Note, however, that the consistent use of R Markdown already eliminates some sources of irreproducibility, such as copy-and-paste errors (Peikert et al., 2021).

Even though a variety of approaches have been suggested to ensure reproducibility of computations in R Markdown documents (Chan & Schoch, 2023; Nagraj & Turner, 2023; Peikert et al., 2021; Ushey & Wickham, 2024), we ourselves have encountered various scenarios in which such approaches (including our own) failed. First and foremost, many approaches rely on further software packages such as Docker, which essentially provide virtual environments that execute code under identical conditions on different machines (or on the same machine at different time points, e.g., before and after an upgrade of R or of some or all packages used). On some machines, Docker may simply not be available, either because it is not (yet) available on newly introduced hardware (e.g., this happened when Apple switched to their own processor brand) or because a user does not have the admin privileges to install Docker in the first place. Further, some approaches rely on service providers such as Microsoft’s MRAN archive, which was unexpectedly terminated a while ago. Now and then, these weak points lead to situations in which users try to locally reproduce historic computational results and want to ensure that these reproductions were successful. With our package, the success of such reproduction attempts becomes easily and formally testable.
We hope the suggested approach helps to raise awareness of the importance of reproducibility and to increase the visibility of potential non-reproducibility issues in R code, eventually increasing the quality and credibility of digital research assets.
Contributions
Conceptualization: Andreas M. Brandmaier (Lead). Methodology: Andreas M. Brandmaier (Equal), Aaron Peikert (Equal). Software: Andreas M. Brandmaier (Lead). Writing – original draft: Andreas M. Brandmaier (Lead). Writing – review & editing: Andreas M. Brandmaier (Lead), Aaron Peikert (Supporting).
Acknowledgements
We thank Leonie Hagitte for providing comments on an earlier version of the manuscript. We thank Julia Delius for her helpful assistance in language and style editing.
Competing Interests
The authors declare no competing interests.
Ethics statement
This study did not involve testing of human participants.
Footnotes
1. This is R Markdown-speak for generating a publishable document from an R Markdown source file.
2. If an R Markdown file is knitted and the reproducibleRchunks package was not loaded, the following error is shown: Error in get_engine(options$engine) : Unknown language engine reproducibleR.
3. Note that many fitted statistical models in R also contain raw data, such as (general) linear model fits from the stats or lme4 package or structural equation model fits from the lavaan or OpenMx packages.
4. The latest version can be found in the package repository: https://github.com/brandmaier/reproducibleRchunks.