Hydrological modeling of freshwater discharge into Hudson Bay using HYPE

This study details the enhancement and calibration of the Arctic implementation of the HYdrological Predictions for the Environment (HYPE) hydrological model established for the BaySys group of projects to produce freshwater discharge scenarios for the Hudson Bay Drainage Basin (HBDB). The challenge in producing estimates of freshwater discharge for the HBDB is that it spans over a third of Canada’s continental landmass and is 40% ungauged. Scenarios for BaySys require the separation between human and climate interactions, specifically the separation of regulated river discharge from a natural, climate-driven response. We present three key improvements to the modelling system required to support the identification of natural from anthropogenic impacts: representation of prairie disconnected landscapes (i.e., non-contributing areas), a method to generalize lake storage-discharge parameters across large regions, and frozen soil modifications. Additionally, a unique approach to account for irregular hydrometric gauge density across the basins during model calibration is presented that avoids overfitting parameters to the densely gauged southern regions. We summarize our methodologies used to facilitate improved separation of human and climate driven impacts to streamflow within the basin and outline the baseline discharge simulations used for the BaySys group of projects. Challenges remain for modeling the most northern reaches of the basin, and in the lake-dominated watersheds. The techniques presented in this work, particularly the lake and flow signature clusters, may be applied to other high latitude, ungauged Arctic basins. Discharge simulations are subsequently used as input data for oceanographic, biogeochemical, and ecosystem studies across the HBDB.


Introduction
The Hudson Bay Drainage Basin (HBDB) comprises over a third of the Canadian landmass, contains important hydroelectric infrastructure and agricultural lands, and accounts for over a fifth of freshwater exports into the pan-arctic ocean system via the Hudson Bay Complex (HBC comprised of Hudson Bay, James Bay, Ungava Bay, Foxe Basin, and Hudson Strait;McClelland et al., 2006).The HBC itself is important to Arctic marine wildlife, and as a shipping route for Canada.It is a large region of primary production, influenced mainly by the timing of freshwater inputs (largely meltwater).Changing freshwater inputs are essential to the annual formation, decay and break up of sea ice, with seasonal ice cover being important for the HBC pelagic life (Manak and Mysak, 1989;Saucier et al. 1998Saucier et al. , 2004)).HBDB terrestrial discharge influences Arctic Ocean circulation, sea ice dynamics, biological and biogeochemical processes within the HBC.Due to the importance of terrestrial freshwater fluxes into the HBC for bay-wide circulation and ice-formation, continental-scale hydrologic modeling is key to the combined modeling/observation BaySys group of projects (Barber, 2014).
The importance of the HBDB necessitates a comprehensive hydrological monitoring and prediction framework.Although hydrometric gauges remain relatively dense in southern, lower-latitude regions of the basin, higher-latitude counterparts remain poorly gauged.Additionally, the number of hydrometric gauges has declined across Canada in recent decades (Coulibaly et al., 2013;Mlynowski et al., 2011).As of 2013, 40% of the HBDB is ungauged and 27% has never been gauged.Hydrological modeling is required to fill spatial and temporal gaps in the observational record, and for making long-term streamflow projections.
Over the same period this basin has undergone significant climate change impacting the distribution and variability in hydrology of the basin (MacDonald et al. 2018), there has been increasing regulation imposing a similar trend towards flattening of the average annual hydrograph (Déry et al., 2018).This creates a need to better understand the individual roles of climate change and anthropogenic alteration of the water budget of the basin to properly identify and mitigate contemporary change, but more importantly, for accurate projection of future changes in freshwater discharge.This is a key objective of the BaySys project, and a requirement of any modeling system developed to examine freshwater discharge to Hudson Bay (Barber, 2014).Added to this need is the diverse and complex hydrologic interactions within the basin, which need must be accurately defined.
Across the vast and diverse landscape of the HBDB, water is stored variably in lakes and wetlands, and as snow and ice.These storage dynamics provide several hydrological modeling challenges, such as accurately representing both natural and anthropogenically controled lake storage-discharge dynamics and prairie pothole non-contributing areas (NCAs), as well as the changing role and distribution of frozen soils across the basin.
The Global Lakes and Wetlands Database (Lehner et al., 2011) shows that the HBDB has amongst the greatest concentrations of lakes in the world, with approximately 10% of the HBDB surface area covered by lakes.Lakes modify streamflow by storing and releasing large volumes of water, buffering extreme floods, and increasing the potential for evaporative loss.The multitude of lakes across the HBDB conflicts with the parameterizations found in most hydrological models, which account only for the natural storage-discharge dynamics of the largest lakes (e.g., HYPE (HYdrological Predictions for the Environment; Lindström, 2016), Hydrotel (Fortin et al., 2001) and WATFLOOD (Kouwen, 1988(Kouwen, , 2016))).Even more uncommon are parameterizations of storage-discharge relationships in small, internal and regulated lakes (i.e., model subgrid).Several models permit a direct representation of subgrid, internal lakes, yet parameterizing these lakes in numerous ungauged natural basins, and in managed landscapes or subbasins, remains a challenge across the HBDB.
Much of the Nelson River Basin (western HBDB) consists of semi-arid and low relief prairie terrain that generates relatively low runoff volumes.Ubiquitous pothole depressions (e.g., wetlands, sloughs and small natural lakes) are often the terminus of this limited runoff, resulting in numerous internally closed basins (Spence and Woo, 2003).Since prairie streamflow generation depends largely on the interconnectivity of potholes and their connections to river networks, which is also often regulated or controlled through diversions and culverts, streamflow volumes can vary widely from year-to-year (Stichling and Blackwell, 1957).Approximately 8% of the HBDB contains prairie terrain that does not contribute directly to streamflow for a 2-year return period (Prairie Farm Rehabilitation Administration [PFRA], 1983), which poses a significant modeling challenge, particularly because of the variability in connectivity at differing return periods.Early approaches ignored the dynamic nature of the Prairie Pothole Region (PPR) and simply masked out (i.e., removed) published NCAs from the stream network (PFRA, 1983;Wood et al., 1992).More recently, the storage and dynamic interactions between adjacent pothole wetlands have been successfully simulated at small scales using high resolution topographic data (Shook and Pomeroy, 2011;Shook et al., 2013;Huang et al., 2013).Success at modeling the dynamics of the NCA region at relatively coarse resolutions has been achieved using calibrated probability distribution functions of dynamic contributing area models (Muhammad et al., 2019;Mekonnen et al., 2014;Mengistu and Spence, 2016), or with the aid of high-resolution elevation data and a US national wetland inventory (Evenson et al., 2016), whilst others ignore the NCAs altogether at the expense of model accuracy and representativeness.
Prediction of spring melt volumes is additionally complicated by the extensive permafrost and frozen soils underlying the HBDB.During spring melt, frozen soils reduce infiltration into the subsurface, causing relatively more overland flow (Hayashi, 2013).Thawing upper soil layers contribute to runoff as spring warming progresses.Continuous and discontinuous permafrost, common at latitudes above 51°N, reduce groundwater-surface water interactions (Woo and Winter, 1993).Several hydrological models have been improved by including soil freezing processes (Cherkauer and Lettenmaier, 2003;Pomeroy et al., 2007;Wang et al., 2010).Suppressing infiltration under frozen conditions influences runoff timing but can have mixed results on model skill, suggesting that, although frozen soils effectively restrict infiltration at small scales, their parameterization is not always appropriate at larger scales due to the heterogeneous nature of soil properties and flow through macropores (Pitman et al., 1999;Slater et al., 1998).
Given the importance of describing changes in freshwater discharge to Hudson Bay for the BaySys project, and the necessity to separate the relative contributions of anthropogenic and climate driven influences, we develop a hydrologic model of the HBDB.Here we present the development of that hydrologic model, and unique modifications made to account for numerous regulated and natural lakes, prairie potholes or ineffectively drained regions, and underlying frozen soils across the domain.Careful consideration of hydrometric gauge density and proximity is also a consideration within the expansive HBDB domain.Our objective is to summarize enhancements made to the Arctic implementation of the Hydrological Predictions for the Environment (A-HYPE) hydrological model to handle the spatially heterogeneous (natural and anthropogenically controlled) lake-storage discharge relationships, Prairie NCAs and frozen soils.We present an innovative calibration strategy to account for uneven gauge density, and apply it to simulate discharge throughout the HBDB, both in gauged and ungauged watersheds.This work provides the foundation for the BaySys project by producing a reliable hydrological model that is used to provide historic and future projections of runoff and streamflow from the HBDB terrestrial landmass, into the HBC.

Hudson Bay Drainage Basin
The HBDB (Figure 1) makes up the landmass between three North American continental divides: the Laurentian to the south, Arctic to the north and Pacific to the west.Runoff from the Canadian provinces of Alberta, Saskatchewan, Manitoba, Ontario and Québec, the Northwest Territories and Nunavut, and four American States (Montana, North Dakota, South Dakota, and Minnesota), all eventually discharge into the HBC.The basin spans 26° latitude, 54° longitude, and rises from sea level to over 3200 m on the eastern slopes of the western Rocky Mountain Ranges.
The landscape of the HBDB spans seven ecozones: from the southwestern Montane Cordillera, across the semiarid Prairies, to the cooler and wetter Boreal and Hudson Plains, through the Taiga Shield to the Southern and Northern Arctic (Canadian Council of Ecological Areas, 2004).Permafrost is continuous north of the Cape Henrietta Maria area (boundary between Hudson and James Bays), but transitions to sporadic and isolated moving further south toward the Hudson Bay lowlands.A small portion of the basin, in the Canadian Rockies, contains glaciers.A large portion of the HBDB, particularly along the Precambrian Shield and Boreal Forest region, is covered by open surface water in the form of lakes and wetlands.
The HBDB is composed of several regional river systems, which are detailed in Table 1 of Déry et al. (2016).The two largest rivers by volume are the Nelson River, which discharges into the western Hudson Bay, and La Grande Rivière, which discharges into the eastern James Bay.Both of these river systems are heavily fragmented with diversions and storage for hydroelectric generation (Déry et al., 2018).In total, there are 84 reservoirs within the watershed with 303.7 km 3 of storage (Lehner et al., 2011).Discharge from regulated rivers accounts for up to 53% of freshwater discharge into Hudson Bay (Déry et al., 2018).
The HBDB spans several climatic zones.Mean annual air temperature ranges from 4°C (Canadian Prairies and upper midwestern United States) to -12°C in Nunavut.In northern and southern regions of the basin, climate is drier ( ~200 mm annually) with wetter conditions for the Boreal Forest region ( ~800 mm annually).Generally, the HBDB is characterized by long, cold winters having significant snowpack accumulation ranging from 100 mm mean annual snow water equivalent (SWE) in the Prairies, to 400 mm SWE in northern Québec (Déry et al., 2005).Snow cover typically begins by early October (mid-November in the south) and stays until mid-June (mid-April in the south) (McKay and Gray, 1981).The HBDB has a primarily nival river regime.Further details on the HBDB watershed and physiography are found in Stadnyk et al. (2019).

Arctic-HYPE hydrological model
A-HYPE (Andersson et al., 2015;Gelfan et al., 2016) is a regional implementation of the HYPE hydrological model (Lindström et al., 2010), developed by the Swedish Meteorological and Hydrological Institute (SMHI).A-HYPE is set up for hydrological modeling of the pan-arctic landmass that discharges into the Arctic Ocean system, spanning North America and Eurasia.In addition to the base version of HYPE, A-HYPE includes representations of cryospheric processes including glacier accumulation and melt, lake and river ice, and a combined air temperature and radiation index snowmelt module (Gelfan et al. 2017).

H-HYPE model
The HBDB is set up within A-HYPE as 6,668 subbasins having mean (median) area of 597.8 km 2 (351.0 km 2 ), ranging from 2.0 km 2 in size, to a maximum of 24,431 km * Objective functions: MNS -mean Nash-Sutcliffe Efficiency (Nash and Sutcliffe, 1970); MKG -median subbasin KGE (Gupta et al., 2009); MCC -mean subbasin Pearson correlation coefficient (component of KGE); MRS -mean subbasin error in standard deviation (component of KGE); MDV -mean subbasin deviation of volumes (component of KGE; Gupta et al., 1999); RDV -regional relative error (data from all stations combined into a single vector).^ Multipliers for Evaporation and Runoff parameters correspond to the number of significant landcover and soil types requiring a unique parameter calibration.^^ Multipliers for Lake Discharge correspond to the number of lake types requiring a unique parameter calibration (Section 4.2). and urban.Table S1 lists the data sources used in A-HYPE, including inputs for snow and ice (Krenke 2004), glaciers (WGMS, 2012), and lakes (Lehner and Döll 2014).

Regulated reservoirs
There are multiple regulated reservoirs within the HBDB as identified by the Canadian Dam Association criteria (Natural Resources Canada, 2008), documented within Stadnyk et al. (2019).Tefs et al. (in revision) develop a generalized daily regulation routine for HYPE that can be applied to predict specific reservoir releases for 10 of the largest reservoirs operated within the HBDB, targeting monthly operational levels.There remain, however, numerous reservoirs with human-altered regulation that do not have sufficient data to develop stage-discharge relationships for, or which reside in ungauged basins (or where hydrometric data are unavailable).
Using a modification of the existing sine wave relationship coded in HYPE (Lindström et al., 2010), we derive a more suitable and generalized reservoir release relationship for the HBDB in consultation with hydropower utilities having knowledge of the major regulation within the basin (Tefs et al., in revision;Stadnyk et al., 2019).Outflow from regulated reservoirs is predicted using a parameterized specified desired discharge (qprod), which is satisfied if there is sufficient volume in the reservoir.qprod can have a uniform value throughout the year or vary sinusoidally over the year as: where qprod mean is the mean annual production flow (m 3 s -1 ), qamp is the amplitude of the production flow, dayno is the day of the year, days is the number of days per year (365 or 366) and qpha sets the seasonal phase (minimum sine curve occurs at day number qpha).For reservoir levels above a maximum threshold (w0 [m]), spillway flow is assumed to occur and follows the rating curve: where k and p are coefficients, and wlm is lake level [m].
Parameters qprod, qprod mean , q amp , qpha, k and p are manually calibrated for a number reservoirs in the HBDB.This is critical for accurate prediction of volume and timing within basins with regulated reservoirs.

Calibration approach
Aside from ensuring adequate process representation, it is imperative that a strong model calibration strategy is used.Recurrent topics in calibrating and validating hydrological models include the choice of calibration algorithm, objective function and split-sample periods (e.g., Refsgaard, 1997, Lindström et al., 1997, and Moriasi et al., 2007).Unaddressed in the literature is the objective selection of gauges for calibration over continental regions, such as the HBDB, where gauge density is irregular and often clustered.The relatively higher density of gauges in the southern HBDB compared to the north presents the risk of over-calibrating the model to fit hydrographs in higher-density regions.We employ a flow signature and cluster analysis within our calibration methodology to avoid overfitting more densely gauged regions, and the under-representation of sparsely gauged (northern) regions (Donnelly et al., 2016).We perform simulations from 1961-2013, with model spin up from 1961-1970.A split calibration period is used to span both the early (relatively drier and cooler) and late (relatively wetter and warmer) periods; calibration years were 1971-1975, 1981-1985, 1991-1995 and 2001-2005, with skipped years used for validation (Section 3.2.3).Statistics were calculated on a daily basis for day in each year of calibration.Markov chain differential evolution, a genetic algorithm, was used to define uncertainty during auto-calibration (Vrugt, 2009) with 10 populations and 400 generations for each calibration step.This method selects new parameter values as a function of randomness and inherits traits from accepted previous solutions, based on probability analysis.The advantage of this approach over other differential evolutionary methods is that both a global optimum value and probability-based uncertainty estimate of that value are obtained.Calibration is performed using daily hydrometric data from the Water Survey of Canada and US Geological Survey.Of 245 hydrometric gauges, 101 gauges were used to calibrate land surface parameters and 12 are used for calibrating dams and reservoirs (shown on Figure S1).Although the hydrometric gauge on the Sylvia Grinnell River near Iqaluit on Baffin Island is not within the HDBD, its data were used for validation because it is the only gauge located near the northernmost reaches of the basin (i.e., our model domain extended further north of the HBDB extent to include this northern-most subbasin for validation purposes).Calibration steps are outlined in Table 1 and the process flow visually presented on Figure 2 (additional process diagrams are provided in Supplemental material for each operation: sensitivity analysis, guided calibration, and process improvement; Figures S2 to S4, respectively).A subset of unregulated gauges (10 in total) without large headwater lakes were used in aspects of model calibration further described in Section 4 (Table S2).
The initial parameter set was that calibrated by SMHI for the entire pan-Arctic (Gelfan et al., 2017).The first stage of calibration in this study was a regional calibration of evaporation and runoff adjustment factors (0.80-1.20) to scale the existing A-HYPE landcover and soil type dependent parameters to the regional HBDB context (step 1, Table 1).The same adjustment factor is applied to all landcover and soil types for a given parameter.Evaluation metrics comprised bias, correlation and variance errors, which were equally weighted in the Kling-Gupta Efficiency (KGE; Gupta et al., 2007) score.The Nash-Sutcliffe Efficiency score (NSE; Nash and Sutcliffe, 1970) was also evaluated.Multiple criteria during calibration were weighted according to the error components they most strongly influence, as determined by sensitivity runs (shown in Table 1, last column).Sensitivity runs were conducted separately before each calibration step listed in Table 1, with 10 populations and 200 generations for each, with the objective function an equal weighting of all error criteria (see Figure S2).For each step, the coefficient of determination (R 2 ) between each error criterion and perturbed parameters was calculated.Evaporation parameters most affected simulation bias, runoff parameters most affected NSE and the KGE correlation and variance errors, and lake parameters most affected NSE and the KGE variance error.Details on hydrometric gauge selection for calibration are provided in Section 4.4.
Following regional adjustment factor calibration was the PET crop coefficient calibration that controls evapotranspiration loss (kc), and the fraction of soil available for evaporation (fc), across all landcover and soil types (Step 2, Table 1).
Step 3 involved calibration of lake discharge from unregulated basins (Step 3a), and dams and reservoirs for regulated basins (Step 3b), which were performed in parallel.Lastly, routing parameters (damp, rivvel) were optimized in the final stage of calibration to leverage improvements in the estimation of runoff and hydrological storages from all other steps in the calibration process (Step 4, Table 1).Figure S3 provided a process flow of the guided and random calibration conducted for H-HYPE to derive the final set of parameters, and in the stepwise introduction of new processes (and the evaluation of the effectiveness of each process) to the model structure.
We employed k-means clustering to develop the prairie NCA parameterization (Section 4.1), lake clusters (Section 4.3), and for the selection of hydrometric gauges for calibration (Section 4.4).In each case, input data were normalized as standard scores (z-scores) prior to clustering.Two to 10 clusters were calculated for each application.Silhouette widths, which are a measure of how similar a point is to its own cluster compared to other clusters (Rousseeuw, 1987), were calculated for each cluster grouping to objectively select the optimal number of clusters.Clusters with highest average silhouette width and fewest negative scores were selected (Tables 2 and  3, Table S3).
Lastly, flow signatures were used to capture specific (sometimes regional) similarities from measured hydrometric data to assist with calibration gauge selection.Eight flow signatures were calculated using daily observed discharge based on the method from Pechlivanidis and Arheimer (2015; Figure S5; Table S4) and were applied to select distinct groupings of gauges for calibration (Section 4.4, Table 2).Gauges used for calibration, relative to flow signatures and NCA sites, are shown on Figure S1.

Validation
After each auto-calibration step, a split sample temporal validation of the optimized parameter set was performed before starting the next auto-calibration step.This resulted in five validation steps using the same objective functions as derived for calibration.Validation years were 1976-1980, 1986-1990, 1996-2000 and 2006-2013.

Model enhancements and results
A-HYPE model enhancements were incrementally implemented as part of the development of the H-HYPE model, which are further described below.Model performance  from calibration is presented sequentially, for each enhancement.Figures with boxplots show the median, 25 th and 75 th percentiles at the hinges, 95% confidence interval at the whisker extents, and outliers beyond.

Prairie non-contributing area
The new parameterization strategy leverages the existing NCA map from the PFRA (1983), which includes anthropogenically drained or altered landscapes.The NCA (shown on Figure 1) is that for a return period of two years; therefore, it is likely that the PFRA NCA is underestimated for low runoff conditions, and potentially overestimated for the highest runoff conditions.An increase in the NCA decreases the effective drainage area of a subbasin.We identify thresholds in simulated daily runoff at which different NCAs (or effective drainage areas) are applied.The analysis uses A-HYPE (prior to H-HYPE calibration) evaluated at 10 gauges with upstream NCA but without large upstream lakes or regulation (Table S2, Figure S1).Although the Water Survey of Canada database (http:// www.ec.gc.ca/rhc-wsc/) indicates that these are natural rivers, it cannot be confirmed that their drainage areas are completely void of agriculture withdrawals or altered drainage.Although there is considerable variability in the relationship between simulated runoff and observed specific discharge for these 10 gauges (Figure 3), a Pearson's correlation coefficient of 0.87 (p < 0.001) indicates that, even when neglecting the influence of complex pothole connectivity dynamics (contributing to the scatter, or variability), runoff is a suitable for parameterizing NCA within A-HYPE.
The relationship between simulated runoff and the error in simulated specific discharge is examined using differences between the logarithms of simulated and measured discharge to give comparable weighting to high and low flows (Figure 4).K-means clustering (Section 3.2) is performed to identify runoff thresholds at which different NCA schemes may be applied.Two runoff thresholds (three clusters) are selected: <0.351 mm day -1 , 0.351 to 1.071 mm day -1 , >1.071 mm day -1 ; thresholds are taken as the minimum values of runoff clusters 2 and 3.
Sensitivity runs were performed, using KGE as a performance metric, to find an NCA scheme for each of the three runoff threshold groupings.Each runoff threshold has a multiplier, α i , that is applied to the published NCA in subsequent simulations: where NCA new is the updated NCA and NCA pub is the published PFRA NCA.Best fit multipliers are found to be:  low runoff, and overestimation for higher runoff.The new runoff threshold based NCA parameterization improves model performance across the 10 gauges (Figure 5).NCA processes and parameterization were introduced into the H-HYPE model, and the model recalibrated (Figure S4).Mean deviation of runoff volumes (D v ) improve from -13% to -8%, and median KGE improves from 0.21 to 0.28, relative to a simple NCA mask.Results confirm that NCA modifications impact median to low flow more significantly than peak flow, evidenced by similarity among NSE scores.

Lake clusters
Lakes are represented in HYPE as either outlet lakes (olakes) or internal lakes (ilakes).Olakes are located at the outlet of subbasins.Ilakes are located conceptually within a subbasin and receive a portion of subbasin runoff.Olakes, in particular, exert a strong control on discharge characteristics.Outflow from olakes and ilakes in HYPE are parameterized using rating curves: where k and p are coefficients, and wlm is lake level (m); lake depth is taken from Kourzeneva (2010).In H-HYPE, 84% of the subbasins contain ilakes and 12% contain olakes.Few of these lakes are gauged directly at the mouth, therefore, it is necessary to generalize lake outflow rating curve parameters in some way.Groupings of ilakes and olakes with similar physiographic characteristics were made using k-means cluster analyses (Section 3.3.2).Lakes with similar physiography have similar discharge characteristics and can be given common calibrated rating curve parameters (k and p).Combinations of a number of physiographic parameters are considered; for olakes: olake surface area, slope (10 km buffer around olake), relief (10 km buffer around olake), elevation (10 km buffer around olake) and upstream area; for ilakes: ilake area, ilake count per subbasin, mean ilake area, standard deviation of ilake area, subbasin slope, subbasin relief.
It is best to avoid a large number of lake clusters to limit equifinality during calibration (Beven, 2001).Table 1 identifies three parameters × number of ilake clusters and two parameters × number of olake clusters that are used in the model setup and calibration.Silhouette analyses (Table S3) indicate that the best selections of lake clusters were four olake clusters based olake area and relief, and three ilake clusters based on ilake count alone (Figure 6 and Table 3).Subbasins with high ilake count and relief were most common in the Canadian Shield and low Arctic (Figure S6), as would be expected resulting from localized internal (to the subbasin) storage in isolated and perched wetlands and lakes within the Canadian Shield and poorly draining sub-Arctic terrain.Cluster regions, assigned at the subbasin level, determine how ilakes and olakes are parameterized within a given region (Figure S6a, b).Selection of the optimal number of lake clusters was made by finding the number of clusters which returned the smallest number of negative silhouette scores and the largest average silhouette width by ranking these data and taking the highest combined rank (Table S3), with distributions of characteristics for the optimal selection shown in Figure 6.
Model performance results from two alternate lake cluster selections were compared to those based on the above cluster formations for validation of the methodology: Alternative 1) four olake clusters based on olake area and three ilake clusters based on ilake area; and Alternative 2) four olake clusters based on olake area, slope and elevation and three ilake clusters based on ilake area, standard deviation of ilake area, elevation and relief.Lake parameters were recalculated for the alternate lake clusters (i.e., parameters from calibration steps 1 and 2 (Table 1) are retained, but step 3a is repeated separately).Results from the calibration of olake and ilake cluster parameters are presented in Section 4.5.1.

Frozen soils
Four representations of frozen soils in A-HYPE are considered: 1) none, 2) frozen soils limiting infiltration, 3) frozen soils limiting subsurface runoff, and 4) frozen soils limiting both subsurface runoff and infiltration.Soil layer temperatures are calculated following the equations presented in Pers et al. (2016).Prior to this study, there was no representation of frozen soils in A-HYPE (Option 1).Option 2 employs the parametric expression for estimating infiltration into frozen soils from Zhao and Gray (1999).The relationship relates infiltration to total soil saturation and temperature at the beginning of snowmelt, the soil surface saturation during melt and the infiltration opportunity time estimated following Zhao and Gray (1997).Infiltration calculations are grouped into three categories: Restricted -infiltration is completely restricted due to impermeable surface conditions such as ice lens formation.This is identified by the soil temperature falling below a threshold for lens formation (default: -10°C) and sufficient recent snowmelt (default: 5 mm).Limited -capillary flow predominates and infiltration is primarily controlled by soil physical properties.Unlimited -gravity flow predominates and all water infiltrates.The parametric equation is used for the limited infiltration case: where INF is the potential infiltration capacity (mm), C is an empirical constant with value taken from Gray et al. (2001), S 0 is the soil surface saturation (assumed equal to 1.00 during infiltration), S I = θ I ⁄θ 0 is the premelt pore saturation of the upper soil layer with θ I being the volumetric soil moisture at the start of infiltration and θ 0 being the porosity, T I is the pre-melt temperature of the upper soil layer (K), and t 0 is the infiltration opportunity time (hours).Option 3 employs the freezing point depression equation (e.g., Fuchs et al., 1978;Flerchinger and Saxon, 1989).As soil temperatures fall below freezing, water nearest soil particles remains in liquid form due to absorptive and capillary forces.The soil matric potential has some dependence on soil texture, expressed by the empirically derived Clapp-Hornberger parameters (Cosby et al., 1984).Combining the equation relating water suction to temperature when ice is present (e.g., Black and Tice, 1988) with the Clapp and Hornberger (1978) equation for suction as a function of liquid water content gives the upper limit of liquid water content for subfreezing temperatures: where θ l,max is the maximum volumetric liquid water content (m 3 m -3 ), θ S is the saturation volumetric water content or porosity (m 3 m -3 ), L F is the latent heat of fusion (3.34 × 10 5 J kg -1 ), T freeze = 273.15K, T soil is soil layer temperature (K), g is gravitational acceleration (9.81 m s -2 ), ψ s is soil water potential at saturation (m), and b is the shape coefficient of the soil water potential-moisture curve.ψ s and b are empirically derived soil texture dependent parameters with values taken from Cosby et al. (1984).Equation 6is applied to each soil layer in A-HYPE, with only θ l,max being available for runoff.The model is calibrated separately for each of the four frozen soil representations using the same methodology described in Section 3.2 (shown on Figure S4).Five populations of 300 generations were used, with equally weighted mean KGE and mean NSE as the objective function.Only soil runoff parameters are calibrated (srrcs_corr, rrcs_corr, ep_corr, wp_corr and fc from Table 1).Performance statistics for the best parameter set from each calibration chain were combined for evaluation and all gauges in the HBDB are assessed (refer to Section 4.5.1).

Flow signatures and calibration gauge selection
For a large watershed such as the HBDB, the hydrometric gauges selected for calibration should be non-biased with respect to drainage area covered by distinct flow signatures.K-means cluster analyses (Section 3.3.2) are used to group hydrometric gauges based on similar flow signatures (Section 3.3.1, Figure S5).A grouping of five flow signature clusters was chosen to maximize mean silhouette width with the fewest negative scores.Resulting flow signatures follow expected regional streamflow patterning: greatest flashiness in higher sloping mountainous regions, greatest normalized high flow ratios due to NCAs across the prairies, and greatest flow magnitudes in wettest regions with the largest drainage areas (Table S2).
An appropriate number of area-weighted stations from each flow signature cluster are selected for calibration (Figure S1).The particular calibration stations are manually selected with the goal of minimizing overlapping upstream drainage area, maximizing spatial coverage, and maximizing the range of years covered.In practice, selection of stations for calibration is limited in most cases to those available in, and required from each, flow signature cluster.The ilake and olake clusters are well-covered by the calibration gauges: olake cluster 1 (45.2%),olake cluster 2 (35.7%), olake cluster 3 (47.9%),olake cluster 4 (20.6%),ilake cluster 1 (40.5%),ilake cluster 2 (37.5%), ilake cluster 3 (32.5%).
To verify the flow signature approach to gauge selection, model calibration is repeated for four different random selections of 101 gauges.Calibration steps 1-3a (Table 1) were repeated; model performance results are compared in section 4.5.

Frozen soils, lake clusters and flow signatures
The ilake and olake clusters were selected using the silhouette analyses to optimize model performance (NSE and KGE scores) among the alternate lake clusters (Figure S7 only NSE is shown).Median NSE and KGE are 0.04-0.07and 0.04-0.08better than the alternate clusters.There are negligible differences in runoff volume (D v ) scores between the different clusters because bias is more so controlled by the evaporation and runoff parameters.
For frozen soil representation, option 3 (only frozen subsurface) provides the best performance across all evaluation metrics, i.e.NSE, KGE and D v (each of mean, median, 1 st quartile and 3 rd quartile of statistics; Figure 7 -only KGE shown).When compared to no frozen soils, option 3 provides slight improvements in mean evaluation scores (2.9% D v , 0.04 KGE and 0.06 NSE improvements).Option 3 was retained for further modeling and calibration.Including frozen soil effects on infiltration (option 2) degrades model performance across all measures (18.5% D v , 0.14 KGE and 0.23 NSE degradations compared to no frozen soils).
The flow signatures approach to gauge selection for calibration provides better performance largely in validation (i.e., long term simulation) than the random selection of gauges across all measures of model performance.Median NSE, KGE and D v are 0.03-0.11,0.04-0.16and 3.4%-8.0%better than the random gauge selection for validation.Lower quartiles improved by 0.08-0.17,0.03-0.20,and 2.2%-7.4% during validation when using the flow signatures.

Final model performance
Calibration improves all performance measures, with similar scores for calibration and validation indicating a verifiable parameter set.Calibration improves the variance of performance across all gauges (i.e., a lower difference between the third and first quartiles).There is substantial improvement in simulated volume with median D v error decreasing from 27.4% to 2.7% and variance reduced (all gauges, both calibration and validation time periods; Figure not shown).This is an indication of general improvement of estimation error at all gauges across the HBDB domain.There is also increased skill in the timing and variance of discharge, with median NSE improving from 0.26 to 0.43 and median KGE improving more substantially from 0.39 to 0.59 (Figure 8).This result indicates a reduction of the long-term error estimation was There is negligible improvement in volume error from subsequent calibration steps.Lake calibration improves the simulated variance of discharge, as reflected in improved KGE scores.Lake calibration also degraded some NSE scores, but higher scores are obtained following dam and routing calibration.Routing calibration has little effect on model error statistics, indicating the strong control of lake discharge parameters on routing performance.
Spatially, calibration improves model performance across the whole HBDB.Volume error and NSE improve across all regions of the HBDB with few degradations, with comparatively smaller improvements in NSE than KGE (KGE not shown).Some regional distinctions in statistical improvement are apparent (Figure 9).Spatial plots indicate lower D v values in northern rivers, in Québec, and parts of the southern prairies.Conversely, there are gauges in the southwest HBDB where volume errors are high (Figure 9b). Figure 10 presents the final KGE error across the HBDB domain, used as a final assessment of model performance for the long-term predictive capability to estimate mean and variance in freshwater discharge to Hudson Bay.

Discussion
The results presented in Section 4.5.2demonstrate improvement in model performance from the processbased enhancements added to the HYPE model for the HBDB.Including an NCA representation markedly reduces streamflow overestimation across the prairies (Figure 5).The developed parameterization does not, however, simulate prairie pothole fill and spill dynamics, which are the primary runoff generation control in the prairie environment (Stichling and Blackwell, 1957;Shook and Pomeroy, 2011;Shook et al., 2013).It is expected that the simulation of large-scale prairie pothole dynamics will aid in simulating the timing of discharge and peak flow performance (NSE); whereas the parameterization developed here is primarily a volume-based (storage) correction.There have been several algorithms developed over the past decade to specifically tackle runoff generation in the Prairie Pothole Region, including probability distribution models (Mekonnen et al., 2014;2016), fill and spill DEM or depression-driven methods (Shook et al., 2013;Nasab et al., 2017), and hybrid approaches combining physically-based and stochastic (artificial neural networks and physically based models) (Mekonnen et al., 2015).But a fully realistic representation of the complexity and dynamic nature of the prairie potholes in large-scale hydrological modeling experiments still requires the use of a routing product derived from high-resolution digital terrain models, and exponentially increases computational resource requirements (Muhammad et al., 2019), which is not practical or possible for the vast HBDB.The NCA representation we develop in this study balances improved model accuracy (for low flow approximation) with efficiency and realistic input data requirements.Model improvements realized from the lake clusters were demonstrated during calibration (Figure 8).Combined calibration of the olake and ilake cluster parameters improves the efficacy of H-HYPE to correctly simulate the variance of observed discharge, with the variance error component of KGE reflecting this result.Calibration improves the mean and median KGE variance error components to 0.96 (from 0.70) and 0.98 (from 0.77), respectively, which approach the optimal value of 1.00.This results from improvements to the amount of land-based storage, or simulated volume of runoff within the model (ilakes), resulting in a more constrained simulated discharge value downstream of olakes.
With global warming frozen soils have become an indicator of change in the natural ecohydrological system.The implementation of a frozen soil scheme in cold region land surface models or spatially distributed hydrological models is required for more realistic long-term projections of runoff impacted by changing soil moisture and soil temperature profiles (Wang et al., 2010;Qin et al., 2016).Frozen soils are distinct because of unique hydrological characteristics relative to thawed soils (e.g.lower hydraulic conductivity, higher soil water content, seasonal freeze/thaw dynamics, and redistribution of frozen/thawed water) (Cherkauer and Lettenmaier, 1999;Qin et al., 2016).In this study, the freezing point depression representation of subsurface runoff provides a small improvement in discharge simulations compared to no representation of frozen soils (Figure 7).These improvements are generally greater at higher latitudes, where H-HYPE performs poorest.There is otherwise not a clear spatial pattern to model improvement.The Zhao and Gray (1999) parameterization tested for reducing infiltration into frozen soils degrades model performance across the HBDB in this study as it produced earlier and flashier runoff, which was not commensurate with observed discharge records.It is noted, however, that regulated discharge signals (downstream of reservoirs) produce flattened hydrograph responses that do not mimic naturally flashy hydrographs typical of frozen soils (i.e., particularly in the Prairie basins).In these regions, where frozen soils and regulated reservoirs co-exist, there is likely error in the modeling process resulting from both the predicted (anthropogenic) reservoir releases and (natural) frozen soil infiltration that cannot be distinguished without more data to validate frozen soils.Pitman et al. (1999) showed that runoff simulations at large scales can be poorer when suppressing infiltration into frozen soils and suggest this is due to the presence of preferential drainage pathways such as macropores.
Our calibration methodology improves model performance, but there are identified areas for further improvement.The H-HYPE model set-up has eight landcover types and seven soil types, with common landcover and soil parameters applied to various areas across the whole HBDB.We suspect that further discretization of soil and land classes, and a more regional approach to model calibration, could improve model performance.NSE scores are lowest (<0) in the lake-heavy Churchill River Basin where a high fraction of ilake cluster 3 exists.Further improvements to lake storage-discharge curve relationships is also likely required in this region to improve model performance.Representations of more than a single ilake and olake per subbasin may be warranted.Model performance metrics indicate the H-HYPE model performs well at estimating long-term mean and variability in streamflow (i.e., KGE scores), but still struggles to estimate the volume, timing (or both) of peak flow, evidenced by smaller improvements to the NSE scores.This is particularly true, and most prevalent (i.e., region with lowest NSE scores) across the Prairie region of the Western Hudson Bay where hydrographs are characteristically flashy.This region is notoriously difficult to simulate using hydrologic models and has been referred to as the "graveyard of hydrologic models" (Muhammad et al., 2019).Moriasi et al. (2015) suggest a hydrologic model is satisfactorily calibrated when NSE>0.45 at the watershed scale.The HBDB is an extremely large and hydrologically diverse (and complex) region, and yet we are able to achieve this standard for more than half the subbasins in the H-HYPE domain.Moreover, given the emphasis on long-term and large-scale mean and variability in freshwater discharge to Hudson Bay, KGE scores better reflect the desired performance of our model.Based on KGE performance, we determine the H-HYPE model to be satisfactorily fit-for-purpose for estimating the freshwater discharge to Hudson Bay for the BaySys project.

Conclusions
This study summarizes the initial development of the H-HYPE hydrological model, developed to estimate freshwater discharge across the HBDB terrestrial domain and into Hudson Bay; a largely ungauged continental scale region.This modeling was undertaken as part of the larger BaySys project to provide non-stationary, dynamic time-series of freshwater flux for the NEMO ocean model (Ridenour et al., 2019), which will then be used to generate scenarios of sea ice, biogeochemical, and ecosystem processes.A key aspect of the BaySys project is to separate the relative contributions of human and climate-driven impacts to the freshwater discharge to Hudson Bay.To that end, several key developments were needed for the model to be capable of simulating and discerning climatedriven hydrology from those imposed by anthropogenic control, or water resources management.Our publication documents those developments, and a stepwise approach to calibrating the baseline freshwater discharge scenarios for the BaySys project.
We made three landscape scale storage-based improvements to the original A-HYPE model to facilitate improved separation of natural from human-driven changes to the water cycle, including the 1) representation of prairie pothole NCAs, 2) parameterization of lakes and regulated storage-discharge relationships, and 3) representation of frozen soils in the model structure.
The NCA parameterization provided marked reduction in simulated discharge bias across the Prairie region.As it is principally a volume-based correction without representing the dynamic fill and spill runoff generation process, it is recognized that a more refined understanding of large-scale prairie runoff generation mechanisms will further improve hydrological simulations in this region.Our approach to identifying NCA thresholds is applicable in other Prairie region hydrological studies, and the identified thresholds transferable to other studies within this region.
The use of lake clusters improved the ability of H-HYPE to simulate the variability of observed discharge.The lake clustering technique, based on physiographic characteristics, allows the transfer of calibrated rating curve parameters to ungauged regions.This approach is generalizable and can be applied in other, regional or continental scale ungauged basins with numerous lakes (e.g., Viglione et al., 2013).By identifying lake similarity characteristics (e.g., Fig. 6) that dominate the lake runoff response, and calibrating that response among gauged basins or lakes in each cluster, the identified parameters can then be transferred to ungauged regions identified as having similar hydrologic behaviour.Further, more refined (region-specific) calibration of the most lake-heavy subbasins in the HBDB, such as the western Canadian Shield, are needed to further improve model performance.Since the goal of this study was to derive a Hudson Bay-wide model and calibration that resulted in a good match of total freshwater flux to the Bay, region-specific calibration was out of scope for this project, and there were insufficient data to support such an endeavour.Though the timing and volume of freshwater flux delivered to the Bay are important for local-scale circulation, the scope of the BaySys project (given the size of the domain and scale of analyses) was to analyse variability on a monthly basis, including the separation of regulation and climate-driven influences (Barber, 2014).The nature of how H-HYPE freshwater discharge is integrated into the NEMO ocean model also spatially aggregates discharge on a regional basis, meaning that discharge from several adjacent rivers is combined into one ocean polygon.Given these spatial and temporal constraints imposed on BaySys analyses, we find the H-HYPE model to be appropriate for simulating the dynamics of the Hudson Bay freshwater discharge response.A recent study by Ridenour et al. (2019) compared the H-HYPE freshwater discharge inputs, and their impact on ocean variables, to reanalysis discharge datasets and found comparable performance.
Reducing subsurface runoff due to soil freezing provided only minor improvements in simulated discharge; however, limiting infiltration into frozen soils was found to degrade discharge simulations.We suggest that the parameterization of infiltration into frozen soils at large scales should remain an open area of research, and the algorithms identified in our study tested in other cold regions at varying spatial scales.An emerging critical area of research is on the parameterization of deep, frozen soils (i.e., permafrost).Several studies have been published looking at the effects of frozen soils and permafrost under changing climates (e.g., Evans et al., 2018;Connon et al., 2014;Smith et al., 2010), but further research into the impact on surface runoff, and for methods of regional data collection (i.e., remote sensing) are warranted and could potentially improve discharge simulations in the

Figure 3 :
Figure 3: A-HYPE simulated daily runoff (original parameter set) relative to measured daily specific discharge for 10 gauges used in NCA parameterization development.A black line represents the linear regression through the simulation data with associated Pearson's r correlation.Scatter indicates the complex pothole runoff generation dynamics of the region.DOI: https://doi.org/10.1525/elementa.439.f3