Estimating and examining the replicability of belief system networks

Belief system structure can be investigated by estimating belief systems as networks of interacting political attitudes, but we do not know if these estimates are replicable. In a sample of 31 countries from the World Values Survey (N = 52,826), I find that countries’ belief system networks are relatively replicable in terms of connectivity, proportion of positive edges, some centrality measures (e.g., expected influence), and the estimates of individual edges. Betweenness, closeness, and strength centrality estimates are more unstable. Belief system networks estimated with smaller samples or in countries with more unstable political systems tend to be less replicable than networks estimated with larger samples in stable political systems. Although these analyses are restricted to the items available in the World Values Survey, they show that belief system networks can be replicable, but that this replicability is related to features of the study design and the political system.


ORIGINAL RESEARCH REPORT
Mark J. Brandt

Keywords: Belief systems; networks; replication; political stability
Estimating the structure of belief systems is a central activity in political psychology, political science, and sociology (e.g., Barker & Tinnick, 2006; Converse, 1964; Johnston & Ollerenshaw, 2020; Kinder & Kalmoe, 2017; Malka et al., 2019). Multiple teams have begun to conceptualize (Friedkin et al., 2016) and estimate (Fishman & Davis, 2019; Boutyline & Vaisey, 2017; Brandt et al., 2019) the structure of political belief systems as networks of relevant political attitudes and identities (for work looking at individual attitudes see Dalege et al., 2016; for work looking at moral values see Turner-Zwinkels et al., in press). The attitudes and identities of the belief system are the nodes of the network and the connections between them are the edges. After estimating the belief system network, the teams use centrality metrics from network science to identify the most central components of the belief system in the population (e.g., Boutyline & Vaisey, 2017; Brandt et al., 2019) or compare belief system density between different subgroups (e.g., Fishman & Davis, 2019). Although these teams focused on centrality and density, other edge, node, and network characteristics could also be used to understand the structure of political belief systems, just as they have been used to understand the structure of other psychological constructs (e.g., psychopathology, Fried et al., 2018).
Prior work on belief system structure often assesses the association between pairs of beliefs in a population to inform how a belief system is structured (e.g., Chen & Goren, 2016; Kinder & Kalmoe, 2017; Malka et al., 2019). For example, Malka and colleagues (Malka et al., 2019) recently demonstrated that the link between economic beliefs and cultural beliefs is not always positive when looking across countries. These emerging methods rooted in network science allow researchers to go beyond pairs of associations to analyze the entire belief system simultaneously (rather than just two or three nodes at a time). This allows individual cultural and economic beliefs, such as those used by Malka and colleagues, to be situated with the other beliefs and identities in the belief system (Boutyline & Vaisey, 2017; Brandt et al., 2019). This also allows scholars to use modeling techniques that best match the tendency to theorize about belief systems as if they are networks. The purpose of this paper is to document and explore the replicability of belief system networks in a range of countries, so that researchers interested in these methods and ideas have necessary information about the replicability of the technique.

What is the Technique?
There are multiple methods for analyzing belief systems as networks depending on one's theoretical assumptions (e.g., Boutyline & Vaisey, 2017; Brandt et al., 2019). One approach (Brandt et al., 2019) builds on work conceptualizing a variety of psychological constructs as networks (e.g., Borsboom & Cramer, 2013; Costantini & Perugini, 2016; Dalege et al., 2016; Sayans-Jiménez et al., 2019) and models belief systems using a partial correlation approach. This approach assumes that nodes that are positively connected want to be like one another, that connected nodes reciprocally affect one another, and that nodes that are unconnected are independent conditional on all of the other nodes of the network. These assumptions are consistent with a pairwise Markov random field and can be estimated as Gaussian graphical models (Epskamp & Fried, 2018; Lauritzen, 1996).¹ Theoretically, this approach is consistent with the idea that people prefer to have consistent belief systems and worldviews (Festinger, 1957; Gawronski et al., 2012; Randles et al., 2015) and with theory (e.g., Converse, 1964; Gerring, 1997) and quantitative models (e.g., Friedkin et al., 2016) that conceptualize belief systems as interconnected political attitudes and beliefs. Empirically, this approach estimates partial correlations between all of the variables (i.e., nodes) in the network and adopts regularization and model selection techniques to reduce the size of the parameter space and decrease the false discovery rate (Epskamp, Borsboom, & Fried, 2018; Epskamp & Fried, 2018; Williams, 2018; Williams et al., 2018). These estimates become the edges (or paths) in the belief system networks. They can be the target of investigations in and of themselves, or be used to calculate other features of the belief system (e.g., connectivity, centrality of a node). We use these methods to estimate the overall belief system in countries and test the replicability of these estimates.
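As a minimal illustration of what an edge represents in a partial correlation network, the sketch below derives partial correlations from the inverse of a sample covariance matrix using NumPy. It is not the regularized or Bayesian estimation used in the work cited above. The function name `partial_correlations` and the simulated three-variable chain are assumptions introduced only for this example.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlations between the columns of `data` (rows = respondents).

    Unregularized sketch: invert the sample covariance matrix and
    standardize the off-diagonal entries of the resulting precision matrix.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 0.0)  # a node has no edge with itself
    return pcor

# Hypothetical toy data: a -> b -> c, so a and c are related only through b.
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
edges = partial_correlations(np.column_stack([a, b, c]))
```

In this toy chain, the a-c entry of `edges` should be close to zero even though a and c are correlated, because the two variables are independent once b is taken into account. That is the sense in which unconnected nodes are independent conditional on the other nodes.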

Why Investigate Replicability?
Belief system networks are a recent methodological technique adopted from the psychopathology literature. Although this technique may be promising, it is important to understand the extent to which estimates from belief system networks replicate before extending the technique to study a wide range of phenomena in the belief system literature. Assessing the replicability of estimates of belief system networks can provide justification for the use (or abandonment) of these estimates in the political psychology and political science literatures, but can also help address theoretical predictions about the stability of belief system structure.
I aim to address two questions with this study. First, I aim to document the extent to which belief system networks are replicable. I will do this by comparing belief systems estimated for a country at two different time points and assessing how similar the estimates of edge weights, centrality metrics, connectivity, and other features are between the two time points. Second, I aim to explore how methodological features and characteristics of the political systems are associated with the replicability of political belief system networks.

Are Belief System Networks Replicable?
It is important to document the replicability of belief system networks because we do not know the extent to which we can expect a belief system estimated in a population to replicate in the same population. There is some indication that belief systems are replicable. For example, in both the United States and New Zealand operational components of belief systems (i.e., political policies) tend to be less central than symbolic components of belief systems (i.e., identification with political symbols) across multiple time points (Brandt et al., 2019; Fishman & Davis, 2019).
Moreover, when researchers test network replicability in the network psychopathology literature, the typical result is that the networks are similar in different samples (Borsboom et al., 2017; Fried et al., 2018; Jones, Williams, & McNally, 2019), although others disagree (Forbes et al., 2017; Forbes et al., in press). Together, one might suspect that the estimation of belief system networks is replicable.
However, there are also reasons to suspect that belief system networks are difficult to estimate reliably. One challenge is that they require researchers to estimate a large number of parameters. A 10-node belief system has 45 potential edges between nodes. A 20-node belief system has 190 potential edges. And a 30-node belief system has 435 potential edges. Although the estimation techniques are designed to reliably estimate the networks, even when faced with many parameters (e.g., Epskamp, Borsboom, & Fried, 2018; Epskamp & Fried, 2018; Williams, 2018; Williams et al., 2018), it is not yet clear if this is the case with real political data. A second challenge is that researchers typically use single items to estimate each node in a belief system network. This is because the instruments used to assess belief systems are not designed to assess each potential node with multiple items. Instead, each possible policy or identity is typically represented with just one item. This practice may result in less replicable networks overall.
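The edge counts quoted above follow from the number of distinct pairs among k nodes, k(k - 1)/2. A small sketch of that arithmetic is given here; the helper name `potential_edges` is introduced only for illustration.

```python
def potential_edges(n_nodes):
    """Number of distinct undirected node pairs in a network of n_nodes."""
    return n_nodes * (n_nodes - 1) // 2

# Network sizes mentioned in the text, plus the 19 items used in this study.
counts = {k: potential_edges(k) for k in (10, 19, 20, 30)}
```

By the same formula, a 19-node network such as the ones estimated in this study has 171 potential edges.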
Another reason that political belief system networks may not be replicable is that the political system changes over time. This may be due to shifts in political coalitions, the salience of particular issues, high profile discrete events (e.g., a terrorist attack), or the experience of large-scale social upheavals (e.g., economic recessions). Such changes are likely to be reflected in the structure of the belief systems themselves (e.g., Ciuk & Yost, 2016; Converse, 1964; Federico & Malka, 2018). For example, partisan cues about changing the New Zealand flag shifted the link between party identity and support for changing the flag (Satherley et al., 2018). Similarly, major societal events like wars, economic recessions, and major terrorist events can shift political attitudes (Van de Vyver et al., 2016; Zaller, 1992). Zaller (1992), for example, highlights how the link between political dispositions and support for the Vietnam war changes as the elite rhetoric changes. These are examples that might lead to less replicable belief system networks over time and highlight the importance of the political context for understanding the structure of political belief systems.

What Predicts Belief System Network Replicability?
The methodological and theoretical reasons to expect that belief systems may not be replicable can also be used to generate expectations for what might be related to belief system network replicability.
Methodologically, sample size and time between assessments may affect belief system replicability. First, larger samples can increase replicability by helping to precisely estimate the large number of parameters in the belief system network. This should increase the chances that a belief system network is replicable (Epskamp & Fried, 2018). Moreover, prior meta-science research suggests that the sample size of the original study is associated with its replicability (Open Science Collaboration, 2015). Second, belief system networks may be less replicable when there is more time between their estimation because of a variety of subtle and not-so-subtle changes in the political system. Just as the correlation between personality assessments decreases with time (Anusic & Schimmack, 2016), so might the replicability of belief system networks in the countries we examine.
Theoretically, indicators of instability in a political system and the fragmentation of political parties may be associated with less replicable belief systems. First, countries with more political changes should have less replicable belief system estimates. Although some political systems are relatively stable over time, other political systems are not (e.g., Carlsen & Bruggemann, 2017). Such instability may result in belief systems that are less replicable. This may be because the salient issues in the system change over time (e.g., Ciuk & Yost, 2016; Zaller, 1992), or because precise packages of beliefs propagated by elites shift with the shifting political system (e.g., Converse, 1964; Federico & Malka, 2018; Zaller, 1992). Second, party fragmentation (Gallagher & Mitchell, 2008; Laakso & Taagepera, 1979) may also be associated with less replicable belief system estimates. Party fragmentation occurs when there are more political parties competing for votes. When there are more political parties each vying for votes, influence, coalitions, and legitimacy, there may be more elite packages of beliefs to choose from, leading to less replicability over time. Whereas the stability of the political system may produce less replicable belief system estimates due to changes in the political system, party fragmentation may produce less replicable belief system estimates due to the greater availability of belief system packages (i.e., issue combinations) in the system.²

The Current Study
To answer my two key questions, I analyze data from the World Values Survey. This allows me to estimate belief system networks for a variety of countries using exactly the same items for multiple countries at multiple points of time. This means that any differences in the belief system estimates cannot be attributable to differences in the items. I estimate the replicability of belief system networks by comparing belief system networks estimated from a single country at multiple time points. I assess how stable the edge-characteristics (e.g., size of the edges), node-level characteristics (e.g., centrality), and overall network characteristics (e.g., connectivity) are across time. In addition to mapping on to the metrics used in the belief system network and psychological network literatures, this broad selection of metrics allows us to ascertain if some aspects of the belief system (e.g., overall characteristics) are more replicable than others (e.g., edge-characteristics). By holding method constant and only varying time and country, we are able to estimate and compare replicability between countries.
After examining overall rates of replicability, we test if sample size, years between assessments, the stability of the political system, and party fragmentation are associated with variation in replicability across countries. In addition to furthering our understanding of belief system networks, these latter analyses also build on work on the replicability of psychological networks (Borsboom et al., 2017; Forbes et al., 2017; Forbes et al., in press; Fried et al., 2018; Jones, Williams, & McNally, 2019). Only by estimating networks for more samples than is typical (e.g., Fried et al., 2018 investigated four samples) are we able to investigate the correlates of network replicability.

Method
Participants and Procedure
I used data from the World Values Survey (Inglehart et al., 2014). After excluding countries, waves, and participants who did not complete all of the relevant measures, my analyses included data from 52,826 participants (52% men, 48% women, 0.001% missing gender data, M age = 42.0, SD = 15.9) from 31 countries (mean N/country/wave = 724, SD = 349) who were part of the 3rd (1995-1998), 4th (1999-2004), and 5th (2005-2009) waves of the World Values Survey (see Table S1 for list of sample sizes and countries). This allows me to estimate the replicability of belief system networks estimated at Wave 3 by comparing it with those estimated at Waves 4 and 5, and the replicability of belief system networks estimated at Wave 4 by comparing it with those estimated at Wave 5. For narrative simplicity, I refer to the earlier network (i.e., Wave 3 or Wave 4 networks depending on the comparison) as the "original network".

Measures
Belief System Measures
I included 19 items assessing political attitudes and identities in the belief system networks. I chose items if they were available across the three waves and if they were measures of political attitudes. One challenge for item selection is that we ideally would include items that are relevant in the countries, yet countries have different relevant issues. To guard against this issue, we included a broader array of items than past work on political beliefs using the World Values Survey (e.g., Malka et al., 2019).³ The items we chose included items assessing social issues (e.g., immigration policy, justifiability of euthanasia), economic issues (e.g., the role of government in businesses, inequality), environmental issues (e.g., protecting the environment vs. economic growth), government types (e.g., preference for army rule, democracy), and self-identification as right-wing or left-wing (all items are in Table 1). Ideological identification, economic issues, environmental issues, and social issues were all recoded so that higher scores indicated more traditionally right-wing positions. Governing types were scored so that higher scores indicated more support for non-democratic and anti-democratic governing types.

Country-Level Measures
To explore correlates of replicability across countries, I used the original network's sample size (see Table S1), years between assessments, indicators of the stability of the country's political system, and the number of effective parties in the political system.
I assessed the stability of the country's political system in two ways. First, I used the 2006 values of the Fragile States Index. This index uses a variety of content analyses, qualitative data, and quantitative data to assess the stability of the country's political, economic, and social system. Because the different facets of the index all tap into issues that could affect the items in the belief systems I estimate, I use the total score for the Fragile States Index (Fragile States Index, 2006). Ideally, I would have used values from the same years I have data for the countries; however, 2006 was the earliest year for which this index was available. Second, I used changes in the levels of democracy between waves to assess overall changes in the political system. To estimate democracy, I used the average democracy as assessed by the Varieties of Democracy project (Coppedge et al., 2019) and calculated the absolute value of the difference between two waves as an indicator of political change. In addition, I explored if the level of democracy (rather than change) is associated with belief system replicability by using the democracy estimates from the initial year in the replicability comparison (e.g., Wave 3 democracy to predict Wave 3 to Wave 4 replicability). Party fragmentation was assessed using the effective number of parties at the parliamentary or legislative level (Gallagher, 2019; Gallagher & Mitchell, 2008; Laakso & Taagepera, 1979). This measure is a combination of the number of parties and relative party size within a political system. Higher numbers indicate greater fragmentation.

Table 1: Items used to estimate the belief system networks.

Ideological Identification
1. In political matters, people talk of "the left" and "the right." How would you place your views on this scale, generally speaking? (1 = left, 10 = right).
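For readers unfamiliar with the effective number of parties, the index of Laakso and Taagepera (1979) is the inverse of the sum of squared party shares. The sketch below computes it from seat counts. The function name `effective_number_of_parties` and the example legislatures are hypothetical and are not taken from the data used in this study.

```python
def effective_number_of_parties(seats):
    """Laakso-Taagepera index: 1 / sum of squared seat shares."""
    total = sum(seats)
    shares = [s / total for s in seats]
    return 1.0 / sum(p * p for p in shares)

# Hypothetical legislatures (seat counts per party).
two_equal = effective_number_of_parties([50, 50])       # two equally sized parties
four_equal = effective_number_of_parties([25, 25, 25, 25])
one_dominant = effective_number_of_parties([70, 20, 10])
```

With equally sized parties the index equals the raw number of parties. When one party dominates, the index falls below the raw count, which is how it combines the number of parties with their relative size.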

Estimating Belief System Networks
Belief system networks were estimated for each country/wave combination, 73 networks in total. Partial correlation networks are a type of Gaussian graphical model that can be used to estimate networks that meet the assumptions outlined in the introduction (Epskamp & Fried, 2018; Lauritzen, 1996).
Although Gaussian graphical models can be encoded in a partial correlation matrix where each edge in the network is a partial correlation, the large number of parameters makes it necessary to adopt estimation techniques designed for this situation. There are multiple methods that do this (e.g., Epskamp & Fried, 2018).
We chose a Bayesian estimation technique with a Wishart prior distribution (Williams, 2018) implemented in the R package BGGM (Williams & Mulder, 2019). This method has an acceptable false discovery rate, is computationally efficient, and efficiently incorporates network comparisons (a key aspect of this study). All of the belief system measures are included in the analysis. The output is a matrix of edges between all of the nodes in the network (i.e., the partial correlations between all of the issues and identities in the belief system). We keep edges (i.e., partial correlations) in the belief system network when the probability of a positive or negative effect is 95% and set the remaining edges to zero (see e.g., Williams, 2018). These are the belief system networks and they can be interpreted as the partial correlations between issues and identities when controlling for all of the other nodes in the network. The networks are visualized in Figures S1-S3.
I compare belief system networks from the same country estimated in different years. This analysis holds the country constant and examines how similar the belief system networks are at different time points. I compare overall features of the edge-level, node-level, and overall network metrics. For each metric, I also computed benchmark expectations using simulations.
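The edge-selection rule described in this section can be sketched as follows. This is not BGGM's implementation. It assumes that posterior draws of each partial correlation are already available as an array, that the posterior mean is used as the edge estimate, and that the 95% criterion is applied as "at least .95". The function name `select_edges` and the toy draws are hypothetical.

```python
import numpy as np

def select_edges(posterior_draws, threshold=0.95):
    """Keep an edge when the posterior probability of a positive or a
    negative effect reaches `threshold`; set all other edges to zero.

    posterior_draws: array of shape (n_draws, n_edges).
    Returns one value per edge (posterior mean for kept edges, else 0).
    """
    prob_positive = (posterior_draws > 0).mean(axis=0)
    prob_direction = np.maximum(prob_positive, 1.0 - prob_positive)
    estimates = posterior_draws.mean(axis=0)
    return np.where(prob_direction >= threshold, estimates, 0.0)

# Hypothetical draws for three edges: clearly positive, clearly negative, unclear.
draws = np.array([
    [0.20, -0.10,  0.05],
    [0.30, -0.20, -0.05],
    [0.25, -0.15,  0.02],
    [0.10, -0.30, -0.01],
])
selected = select_edges(draws)
```

In the toy example the first two edges are retained because every draw has the same sign, while the third is set to zero because its draws are split between positive and negative values.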

Description of Benchmark Simulations
It is not clear how replicable we should expect belief system networks to be. The complexity of belief system networks and their estimation makes it impossible to rely on a typical null distribution or benchmarks developed for assessing the replicability of psychological scales. Therefore, I conducted simulations to identify replicability benchmarks. I first simulated 1000 random graph networks (Erdős & Rényi, 1960; Yin & Li, 2011) using BDgraph's (Mohammadi & Wit, 2019) bdgraph.sim function. Then, I simulated two datasets based on each network and estimated the network from each dataset using the same methods described above. Finally, I calculated the similarity between the two estimated networks using the same methods used to compute replicability (see below). For each network, the probability of nodes connecting randomly was randomly determined and could take on the values [.6, .7, .8, .9]. The sample sizes for the simulated datasets were randomly chosen from the Wave 3 Ns (simulated dataset 1) and the Wave 5 Ns (simulated dataset 2).

Replicability of Edge-Level Metrics
First, I compared networks on edge-level metrics. I correlated the edges from the original network to the edges from subsequent networks from the same country. This assesses how replicable the connections between the nodes (i.e., the partial correlations) are across time. Figure 1 shows these correlations. The median wave-to-wave correlation ranges between .65 and .69. Although this suggests that for most countries there is some correspondence between edges at two time points, all of these estimates are below the median benchmark and nearly all are outside of the benchmark expectations. There is variation in the edge-to-edge correlation between countries. For example, Montenegro's wave 3 to wave 4 correlation is .38, suggesting relatively less correspondence between edges at two time points. Other countries, such as Albania, India, and Moldova, showed correlations less than .50. On the other side, across all three comparisons the estimate for the United States is within the benchmark expectations and is greater than .83.
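A minimal sketch of this edge-level comparison is given below. It assumes that each network is stored as a symmetric matrix of edge weights with the same node order at both waves and that a Pearson correlation is used. The function name `edge_replicability` and the two toy networks are hypothetical.

```python
import numpy as np

def edge_replicability(net_a, net_b):
    """Correlate the unique edges (upper triangles) of two networks."""
    rows, cols = np.triu_indices_from(net_a, k=1)  # skip diagonal and duplicates
    return float(np.corrcoef(net_a[rows, cols], net_b[rows, cols])[0, 1])

# Hypothetical 4-node networks for one country at two waves.
original = np.array([
    [ 0.0, 0.3, 0.0, -0.2],
    [ 0.3, 0.0, 0.1,  0.0],
    [ 0.0, 0.1, 0.0,  0.4],
    [-0.2, 0.0, 0.4,  0.0],
])
later = np.array([
    [ 0.0, 0.2, 0.1, -0.1],
    [ 0.2, 0.0, 0.0,  0.0],
    [ 0.1, 0.0, 0.0,  0.5],
    [-0.1, 0.0, 0.5,  0.0],
])
r_same = edge_replicability(original, original)
r_waves = edge_replicability(original, later)
```

Only the upper triangle is used so that each edge is counted once. Note that a correlation is insensitive to uniform rescaling of the edges, so two networks whose edges differ by a constant factor would still correlate perfectly.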
A more direct way to test if edges differ between waves is to directly compare them using Bayesian hypothesis testing. Here I follow the example of Jones and colleagues (2019) and I compute Bayes Factors (H0 = equal, H1 = not equal) for each edge. I used a somewhat unrestricted and uninformative prior (sd = .35) that is agnostic to the size of the edges (a less informative prior finds higher replication rates). For each edge, I can see if evidence is primarily in favor of equality, inequality, or if it's inconclusive. I use a Bayes Factor of 3 to make this determination.
The results of these comparisons are summarized in Figure 2. On average, approximately 73% of the edges were equal when comparing belief system networks across waves (i.e., a Bayes factor >3 for the "equal" hypothesis; Median range [.71, .75]), whereas approximately 5% were not equal (i.e., a Bayes factor >3 for the "not equal" hypothesis; Median range [.05, .06]). The remaining edges were inconclusively equal or not equal (Median range [.18, .20]). That is, across countries there is relatively little evidence of dramatic differences in edges across waves. The edges of belief system networks are largely stable. As before, there is variation in these estimates. Belief system networks in countries like India, Moldova, and Montenegro tended to have fewer edges identified as equal and more edges identified as not equal or inconclusive. The proportions of equal and inconclusive edges are similar to the benchmarked estimates; however, the proportion of not equal edges generally appears to exceed benchmarked estimates, suggesting that the edges that are not equal may indicate genuine changes in the underlying network (i.e., a genuine change in belief system structure in the country from wave to wave).
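The three-way classification used here can be sketched as below. The sketch assumes that each edge comes with a Bayes factor expressed as evidence for equality over inequality (written `bf01` here, a name chosen for this example), so that the evidence for "not equal" is its reciprocal. The Bayes factors in the example are made up.

```python
from collections import Counter

def classify_edge(bf01, threshold=3.0):
    """Label an edge comparison from its Bayes factor for H0 (equal) over H1."""
    if bf01 > threshold:
        return "equal"
    if 1.0 / bf01 > threshold:
        return "not equal"
    return "inconclusive"

def summarize(bayes_factors, threshold=3.0):
    """Proportion of edges falling in each category."""
    labels = Counter(classify_edge(bf, threshold) for bf in bayes_factors)
    n = len(bayes_factors)
    return {k: labels[k] / n for k in ("equal", "not equal", "inconclusive")}

# Hypothetical Bayes factors for four edges.
proportions = summarize([5.0, 10.0, 0.2, 1.0])
```

The proportions returned by `summarize` correspond to the kind of per-country summaries reported in this section (share of equal, not equal, and inconclusive edges).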

Replicability of Node-Level Metrics
Second, I compare the networks on node-level metrics. Although the node-level metrics are often composites of the edges compared above, it is necessary to also estimate the replicability of the node-level metrics. This is because these metrics are sometimes used as outcomes in and of themselves (e.g., centrality estimates found in Brandt et al., 2019) and because researchers in other domains have noted instances where some node-level metrics (e.g., betweenness centrality) are unreplicable even when the edges are replicable (e.g., Epskamp, Borsboom, & Fried, 2018; Fried et al., 2018). I calculate betweenness centrality, closeness centrality, strength centrality, eigenvector centrality, 1-step expected influence, and 2-step expected influence for each node (see Table 2; Bonacich, 1987; Epskamp & Fried, 2018; Opsahl, Agneessens, & Skvoretz, 2010; Robinaugh, Millner, & McNally, 2016). These metrics give an indication of the centrality of the node in the network and its potential for influencing the nodes around it. They have all been used in research on psychological and belief system-related networks (Boutyline & Vaisey, 2017; Brandt et al., 2019; Epskamp & Fried, 2018; Robinaugh et al., 2016).
I also calculate the Bayesian R² for each node (i.e., the percent variance explained in the node by all of the other nodes in the network). This gives an estimation of the upper bound on the extent of controllability of the node (i.e., if all edges go towards this node, this tells how much we can influence the node by changing its neighbors; Haslbeck & Waldorp, 2018). For each of the node-level metrics, I correlate the node-level metrics from one belief system network to the node-level metrics of the belief system network of the same country from another wave (e.g., Argentina's betweenness centrality on all nodes at Wave 3 correlated with Argentina's betweenness centrality on all nodes at Wave 4). Higher correlations indicate greater replicability.

Figure note: Horizontal dashed grey lines are the medians from the benchmark simulations. Horizontal dotted grey lines are the 2.5% and 97.5% percentiles from the benchmark simulations.

Table 2: Summary of measures of centrality.

Betweenness: The number of times a node sits on the shortest path between two other nodes in the network.
Closeness: The inverse of the total length between a node and all other nodes in the network.
Strength: The sum of the absolute value of the connections between a node and its immediate neighbors.
Eigenvector: The extent a node is connected to other prominent nodes in the network.
Expected Influence (1-Step): The sum of the value of the connections between a node and its immediate neighbors.
Expected Influence (2-Step): A node's 1-Step Expected Influence plus the 1-Step Expected Influence of the other nodes in the network weighted by their connections with the target node.
Note: Consistent with practices in the field, the first four centrality metrics treat all edge weights as positive (i.e., they take the absolute value of all the edges). The last two centrality metrics use both positive and negative edges.
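To make the definitions of strength and expected influence in Table 2 concrete, the sketch below computes them from a symmetric matrix of edge weights with a zero diagonal. It is an illustration under those assumptions rather than the code used for the analyses. The function names and the three-node toy network are hypothetical.

```python
import numpy as np

def strength(weights):
    """Sum of absolute edge weights attached to each node."""
    return np.abs(weights).sum(axis=1)

def expected_influence_1(weights):
    """Sum of signed edge weights attached to each node."""
    return weights.sum(axis=1)

def expected_influence_2(weights):
    """1-step expected influence plus the neighbors' 1-step expected
    influence, weighted by each neighbor's edge with the target node."""
    ei1 = expected_influence_1(weights)
    return ei1 + weights @ ei1

# Hypothetical 3-node network with one negative edge.
w = np.array([
    [ 0.0, 0.5, -0.2],
    [ 0.5, 0.0,  0.3],
    [-0.2, 0.3,  0.0],
])
s = strength(w)                 # [0.7, 0.8, 0.5]
ei1 = expected_influence_1(w)   # [0.3, 0.8, 0.1]
ei2 = expected_influence_2(w)   # [0.68, 0.98, 0.28]
```

The first node shows why the two families of metrics can diverge: its strength (0.7) is larger than its 1-step expected influence (0.3) because the negative edge counts toward strength but against expected influence.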
Results of the node-level comparisons are in Figure 3. Although median stability is greater than zero across all of the metrics and comparisons (Median range [.21, .93]), there is substantial variability across each of the measures. For example, betweenness centrality ranges from -.17 (Moldova) to .68 (Norway) for the wave 3 to wave 5 comparison, suggesting anything from slight anti-stability to moderate stability. Although the medians are generally higher, similarly wide ranges are found for closeness, strength, and eigenvector centrality. Of the centrality metrics, eigenvector centrality had the highest overall stability (Median range [.75, .81]). The two expected influence metrics had moderate overall stability (Median range [.63, .75]), although the 2-step version had somewhat higher stability. Node predictability tended to be relatively high (Median range [.88, .92]). Notably, the replicability of betweenness, closeness, strength, and expected influence (1 step) metrics were typically outside the benchmarked expectations. The replicability of eigenvector centrality, expected influence (2 step), and predictability were typically within the benchmarked expectations. These findings suggest that betweenness and closeness centrality, which have featured prominently in work on belief system networks (Boutyline & Vaisey, 2017; Brandt et al., 2019), should be treated with caution.

Replicability of Overall Network Characteristics
Third, I compare the networks on overall features of the network. These include the overall connectivity and the proportion of positive edges. Average shortest path length was used as the measure of network connectivity (Dalege et al., 2018; Wasserman & Faust, 1994). Shortest path length was calculated using Dijkstra's algorithm. This algorithm minimizes the inverse distance between two nodes using the absolute value of the edge weights. Higher connectivity is indicated by lower average shortest path length. The proportion of positive edges is simply the proportion of edges greater than zero. This indicates whether the overall "logic" of the network is replicable (cf. Boutyline & Vaisey, 2017). I examine how large the absolute differences in connectivity and in the proportion of positive connections are between waves. Larger differences indicate worse replicability.
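The connectivity measure can be sketched as follows. The sketch assumes that the length of an edge is the inverse of its absolute weight, so that stronger edges are shorter, and that the average is taken over all pairs of distinct nodes in a connected network. These are assumptions made for illustration. The function names and the toy network are hypothetical.

```python
import heapq
import math

def shortest_path_lengths(weights, source):
    """Dijkstra's algorithm with edge length = 1 / |edge weight|."""
    n = len(weights)
    dist = [math.inf] * n
    dist[source] = 0.0
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for other in range(n):
            w = abs(weights[node][other])
            if other == node or w == 0:
                continue  # no edge between these nodes
            candidate = d + 1.0 / w
            if candidate < dist[other]:
                dist[other] = candidate
                heapq.heappush(queue, (candidate, other))
    return dist

def average_shortest_path_length(weights):
    """Mean shortest path length over all ordered pairs of distinct nodes."""
    n = len(weights)
    total = 0.0
    for source in range(n):
        dist = shortest_path_lengths(weights, source)
        total += sum(dist[t] for t in range(n) if t != source)
    return total / (n * (n - 1))

# Hypothetical 3-node chain: lengths are 1/0.5 = 2 and 1/0.25 = 4.
toy = [
    [0.0,   0.5,  0.0],
    [0.5,   0.0, -0.25],
    [0.0, -0.25,  0.0],
]
aspl = average_shortest_path_length(toy)  # (2 + 4 + 6) / 3 = 4.0
```

The replicability index for connectivity would then be the absolute difference between the values obtained for two waves. If a network contains disconnected nodes, this sketch returns infinity, so such cases would need separate handling.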
The differences in connectivity and proportion of positive edges of the networks are compared in Figure 4. [...] primarily fall outside of the benchmark expectations.⁴ The median difference in the proportion of positive edges is small (Median range [.02, .04]) and all estimates are consistent with the benchmark expectations. These data suggest relative similarity between belief system networks estimated at different time points. That said, there is variability, with some countries (e.g., Montenegro, Philippines, Serbia) showing larger differences in connectivity and the proportion of positive edges.

Robustness Check
I tested if the replicability estimates in the prior sections are consistent with replicability estimates for the same data and networks after removing a subset of items. This helps us understand if the replicability estimates are due to the specific combination of items. For these checks, I randomly removed 4 of the items, reestimated all of the networks for each country/wave combination, and recomputed the replicability metrics from the prior sections. These estimates are presented in Figures S4-S7 in the supplemental materials. The distributions of replicability estimates in the original networks (examined above) and in the networks using a subset of items are highly overlapping. This suggests that the replicability estimates are not due to the specific combination of items.

Replicability Associations
Belief system networks appear to be relatively replicable in absolute terms. However, average levels of replicability mask underlying variation: belief system networks tend to be highly replicable in some countries and seem to be substantially less replicable in others. This could mean that belief systems are meaningfully different across countries, but it may also indicate that differences in precise methodological details play a role. I tested if sample size, years between assessments, the stability of the countries' political system, and the number of effective parties in the countries' political system are associated with replicability. The predictors were the original network's sample size, the number of years between assessments, and the countries' scores on either the Fragile States Index, changes in democracy, overall levels of democracy, or the number of effective parties in the countries' political system. The outcomes were all of the replicability indexes included here (i.e., the y-axes in Figures 1-4). I regressed replicability for each index on sample size, number of years between assessments, and one of the four political-system predictors using a multilevel model estimated with lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017), nesting observations within the three wave-comparison groups (48 models total). All predictors and outcomes were rescaled to range from 0 to 1. I reverse scored the replicability indices for the overall network so that higher scores indicated more replicability. Finally, I averaged all of the indicators of replicability to create an aggregate index of replicability across all possible metrics.
The results of these analyses are in Figure 5. The confidence intervals for sample size, difference in years, and changes in democracy are, in general, relatively wide, which means that these estimates are imprecise. The confidence intervals for state fragility, overall democracy, and number of effective parties were narrower. In general, larger sample sizes are associated with higher replicability, more unstable states are associated with lower replicability, and more democracy is associated with higher replicability. Changes in democracy, the number of effective parties, and the difference in years did not have clear effects. These overall impressions should be interpreted cautiously given the wide confidence intervals; however, it does appear that sample size and political context are associated with replicability.
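The rescaling, reverse scoring, and averaging steps that precede these models can be illustrated with a short Python sketch. It assumes min-max rescaling as the way of mapping each variable onto the 0 to 1 range and reverse scoring as one minus the rescaled value; neither detail is spelled out in the text, and the values and function names below are hypothetical. The multilevel models themselves (estimated with lme4 and lmerTest in the study) are not reproduced here.

```python
def rescale_01(values):
    """Min-max rescale a list of numbers to the range [0, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


def reverse_score(values01):
    """Reverse a 0-1 variable so that higher scores mean more replicability."""
    return [1.0 - v for v in values01]


# Hypothetical replicability indices for three countries.
edge_correlation = [0.6, 0.8, 0.9]          # higher = more replicable
connectivity_difference = [2.0, 1.0, 0.0]   # lower = more replicable

edge_01 = rescale_01(edge_correlation)
# Difference-based indices are reversed after rescaling.
connectivity_01 = reverse_score(rescale_01(connectivity_difference))

# Aggregate index: mean of all rescaled indicators for each country.
aggregate = [(e + c) / 2 for e, c in zip(edge_01, connectivity_01)]
print([round(a, 3) for a in aggregate])
```

After this step every outcome points in the same direction, so a positive coefficient in any of the models indicates that the predictor is associated with higher replicability.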

Discussion
Conceptualizing and analyzing belief systems as networks can give insight into the overall structure of belief systems, including their central components (Brandt et al., 2019b) and how they change over time (Fishman & Davis, 2019). However, to be confident in these insights, we need to know how replicable the method is. I find that belief system networks are, on average, replicable across a range of countries. For five of the 11 metrics, the median replicability fell within the expected range of the benchmark simulations. For the remaining six metrics, the median fell outside of the expected range; however, it often at least represented a moderate correlation between waves (e.g., the median replicability for expected influence [1 step] was >.60). It does appear that estimates based on the identification of shortest paths through the network (e.g., connectivity, betweenness centrality, and closeness centrality) tend to be less replicable than estimates that are based on directly connected edges (e.g., strength centrality, predictability).
The overall relative replicability of the belief system networks masks underlying variation in replicability. For each indicator of replicability, there was variation across countries. I found that this variation was associated with state fragility, overall democracy, and sample size. When sample sizes are small and political systems are unstable, we should not expect the estimate of the belief system network to be stable. Theoretically, these results suggest that the stability of the belief system corresponds to the stability of the broader political system. It is, of course, possible for belief systems to change in stable political systems; however, these changes appear to be larger and more readily apparent in unstable political systems. This is consistent with work suggesting that changes in the political environment can shift the structure of the belief system (e.g., Zaller, 1992). It also suggests that belief systems studied in less stable contexts may be less stable overall, which is an empirical finding in need of study in and of itself. In practice, these results (re)highlight simulations which have shown the need for large samples in order to estimate replicable belief system networks (Epskamp, Borsboom, & Fried, 2018; Epskamp & Fried, 2018). By analyzing a large number of belief system networks, I was able to empirically show that small sample sizes are associated with less replicable network estimates.
By taking advantage of the World Values Survey, I was able to estimate replicability across a range of countries in representative samples (for the importance of representative samples when studying ideology, see Kalmoe, in press); however, these data did not allow me to incorporate other important features of political belief systems, like partisan identities, or all possible relevant political beliefs for all countries. For example, if a subset of particularly relevant beliefs was not included for a particular country, and the belief system network of this subset would have been more replicable than the network of less relevant political beliefs, replicability may have been underestimated for such countries. The finding that belief system networks with four randomly chosen items removed had similar replicability to the full belief system networks suggests that the specific items do not have large effects on replicability indices (although they may have effects on the specific structure in specific countries, something that I did not examine).
I was also able to analyze differences between countries that are associated with replicability; however, other issues, such as the proximity to an election, may also induce belief system change and affect replicability. This is an important question subject to ongoing research (e.g., Fishman & Davis, 2019). Moreover, my study uses between-subject associations between variables, which highlight belief cleavages in society (Martin, 2000), rather than the belief system "in someone's head". Future work may take advantage of intensive longitudinal designs (e.g., >20 waves) to begin to estimate and assess the stability and heterogeneity of individual-level belief systems. Despite these limitations, the current study shows that belief system networks are largely replicable, although the replicability varies by features of both the sample and the political system.

Data Accessibility Statement
Data is publicly available. This is detailed along with replication code at https://osf.io/csx2g/?view_only=0233b894fd1b40e391175e84f22b312a.

Issues
2. 1 = Incomes should be made more equal, 10 = We need larger income differences as incentives for individual effort
3. 1 = People should take more responsibility to provide for themselves, 10 = The government should take more responsibility to ensure that everyone is provided for; reverse scored
4. 1 = Private ownership of business and industry should be increased, 10 = Government ownership of business and industry should be increased; reverse scored
5. 1 = Competition is good, 10 = Competition is harmful; reverse scored

Environmental Issues
6. Increase in taxes if used to prevent environmental pollution (1 = Strongly agree, 2 = Agree, 3 = Disagree, 4 = Strongly disagree)
7. Here are two statements people sometimes make when discussing the environment and economic growth. Which of them comes closer to your own point of view? (1 = Protecting the environment should be given priority, even if it causes slower economic growth and some loss of jobs, 2 = No answer, 3 = Economic growth and creating jobs should be the top priority, even if the environment suffers to some extent)

Social Beliefs
8. Euthanasia (1 = Never justifiable, 10 = Always justifiable); reverse scored
9. Prostitution (1 = Never justifiable, 10 = Always justifiable); reverse scored
10. Homosexuality (1 = Never justifiable, 10 = Always justifiable); reverse scored
11. Abortion (1 = Never justifiable, 10 = Always justifiable); reverse scored
12. Men make better political leaders than women do (1 = Strongly agree, 2 = Agree, 3 = Disagree, 4 = Strongly disagree); reverse scored
13. When jobs are scarce, men should have more right to a job than women (1 = Disagree, 2 = Neither, 3 = Agree)
14. When jobs are scarce, employers should give priority to people of this country over immigrants (1 = Disagree, 2 = Neither, 3 = Agree)
15. How about people from other countries coming here to work. Which one of the following do you think the government should do? (1 = Let anyone come who wants to? 2 = Let people come as long as there are jobs available? 3 = Place strict limits on the number of foreigners who can come here? 4 = Prohibit people coming here from other countries?)

Governing Types
16. Having a strong leader (1 = Very good, 2 = Fairly good, 3 = Fairly bad, 4 = Very bad); reverse scored
17. Having experts make decisions (1 = Very good, 2 = Fairly good, 3 = Fairly bad, 4 = Very bad); reverse scored
18. Have the army rule (1 = Very good, 2 = Fairly good, 3 = Fairly bad, 4 = Very bad); reverse scored
19. Having a democratic political system (1 = Very good, 2 = Fairly good, 3 = Fairly bad, 4 = Very bad)

Note: Numbers are used to label nodes in network figures in the supplemental materials.

Figure 1: Boxplots of the stability of edges. High values imply higher replicability. The y-axis is the correlation between the edges in the original network and replication network. The top and bottom edges of the box indicate the 75th and 25th percentiles, respectively, and the black line near the middle of the box is the 50th percentile. The whiskers represent the lowest and highest data points within 1.5 times the interquartile range of the lowest quartile and the highest quartile, respectively. Points are horizontally jittered to improve clarity. Country abbreviations are in Table S1. Horizontal dashed grey lines are the medians from the benchmark simulations. Horizontal dotted grey lines are the 2.5% and 97.5% percentiles from the benchmark simulations.

Figure 2: Boxplots of the proportion of edges identified as equal, inconclusive, and not equal. The top and bottom edges of the box indicate the 75th and 25th percentiles, respectively, and the black line near the middle of the box is the 50th percentile. The whiskers represent the lowest and highest data points within 1.5 times the interquartile range of the lowest quartile and the highest quartile, respectively. Points are horizontally jittered to improve clarity. Country abbreviations are in Table S1. Horizontal dashed grey lines are the medians from the benchmark simulations. Horizontal dotted grey lines are the 2.5% and 97.5% percentiles from the benchmark simulations.

Figure 3: Boxplots of the stability of node-level characteristics. High values imply higher replicability. Each panel shows the correlation between the original network and replication network for each node-level metric. The top and bottom edges of the box indicate the 75th and 25th percentiles, respectively, and the black line near the middle of the box is the 50th percentile. The whiskers represent the lowest and highest data points within 1.5 times the interquartile range of the lowest quartile and the highest quartile, respectively. Points are horizontally jittered to improve clarity. Country abbreviations are in Table S1. Horizontal dashed grey lines are the medians from the benchmark simulations. Horizontal dotted grey lines are the 2.5% and 97.5% percentiles from the benchmark simulations.

Figure 4: Boxplots of the stability of overall network characteristics. Low values imply higher replicability. The top panel shows the absolute value of the difference in connectivity. The bottom panel shows the absolute value of the difference in the proportion of positive connections. The top and bottom edges of the box indicate the 75th and 25th percentiles, respectively, and the black line near the middle of the box is the 50th percentile. The whiskers represent the lowest and highest data points within 1.5 times the interquartile range of the lowest quartile and the highest quartile, respectively. Points are horizontally jittered to improve clarity. Country abbreviations are in Table S1. Horizontal dashed grey lines are the medians from the benchmark simulations. Horizontal dotted grey lines are the 2.5% and 97.5% percentiles from the benchmark simulations.

Figure 5: Multilevel estimate and 95% confidence interval of sample size and state fragility index on replicability. All replicability outcomes were scored so that higher scores indicate higher replicability. All variables were rescored to range from zero to one.