We propose a modular framework for methane emission detection, localization, and quantification on oil and gas production sites that uses concentration and wind data from point-in-space continuous monitoring systems. The framework leverages a gradient-based spike detection algorithm to estimate emission start and end times (event detection) and pattern matches simulated and observed concentrations to estimate emission source location (localization) and rate (quantification). The framework was evaluated on a month of non-blinded, single-source controlled releases ranging from 0.50 to 8.25 h in duration and 0.18 to 6.39 kg/h in size. All controlled releases are detected and 82% are localized correctly; 5.5% of estimated events are false positives. For emissions ≤1 kg/h, the framework underestimates the emission rate by 3.9% on average, with 90% of rate estimates falling within a percent difference of [−74.9%, 195.2%] from the true rate. For emissions >1 kg/h, the framework overestimates the emission rate by 4.3% on average, with 90% of rate estimates falling within a percent difference of [−49.3%, 78.8%] from the true rate. Potential uses for the proposed framework include near-real-time alerting for rapid emission mitigation and emission quantification for use in measurement-informed inventories on production sites.
1. Introduction
Reducing methane emissions is a key component of short-term climate action and would greatly increase the feasibility of the 1.5°C temperature goal from the 2015 Paris Climate Agreement (Schleussner et al., 2016; Collins et al., 2018). The oil and gas sector accounts for 22% of global anthropogenic methane emissions (Crippa et al., 2021; O’Rourke et al., 2021) and 32% within the United States (U.S. Environmental Protection Agency, 2022), although this is likely an underestimate (Karion et al., 2013; Brandt et al., 2014; Pétron et al., 2014; Zavala-Araiza et al., 2015; Alvarez et al., 2018; Wang et al., 2022). Therefore, this sector provides a promising avenue for emissions reduction.
Recent attention and advances in measurement technologies have revealed a number of important characteristics of methane emissions from the oil and gas sector. First, emissions exhibit high temporal variability, with intermittent events such as liquids unloadings and blowdowns representing a significant portion of total emissions (Schleussner et al., 2016; Allen et al., 2017; Lavoie et al., 2017; Schwietzke et al., 2017; Vaughn et al., 2018). Second, emissions can vary by orders of magnitude across basins and sites (Brandt et al., 2016; Robertson et al., 2017; O’Connell et al., 2019). Third, infrequent super-emitter events can represent a large portion of total emissions, and hence measuring these events is critical for accurate emissions accounting (Brandt et al., 2016; Caulton et al., 2019; Cusworth et al., 2021). Bottom-up inventories have been found to underestimate total emissions and are poorly suited to accommodate these complicated emission characteristics (Karion et al., 2013; Brandt et al., 2014; Pétron et al., 2014; Zavala-Araiza et al., 2015; Alvarez et al., 2018; Wang et al., 2022). Therefore, direct measurements are needed to better understand and mitigate emissions. In accordance with these findings, the Inflation Reduction Act has directed the U.S. Environmental Protection Agency to incorporate empirical data into its oil and gas emissions reporting requirements (Yarmuth, 2022).
A range of methane measurement technologies exists, including satellites, aircraft, and ground-based continuous monitoring systems (CMS). Satellites with publicly available data have detection thresholds around 1,400 kg/h in ideal conditions (Sherwin et al., 2023) and 5,000 kg/h in more complex conditions (Gorroño et al., 2023), meaning that they can only detect very large emissions. Satellites designed to target specific locations have lower detection thresholds of around 200 kg/h, but the data are (for the most part) not publicly available (Sherwin et al., 2023). Aerial measurement technologies have a much lower detection threshold of around 3 kg/h and provide mostly unbiased emission rate estimates (Johnson et al., 2021; Bell et al., 2022; Rutherford et al., 2023). However, measurement campaigns using aerial technologies often sample each individual site infrequently, making it challenging to consistently capture intermittent, short-lived emissions at the site level. CMS observe methane concentrations in near real time and have detection thresholds ranging from less than 1 kg/h (this work) to 3–6 kg/h (Bell et al., 2023), making them a promising avenue for site-level, measurement-based emission information. However, an analytical framework is required to translate the raw concentration data from the CMS into emission event start and end times (event detection), emission source locations (localization), and emission rates (quantification).
In this work, we propose a new analytical framework that performs methane emission detection, localization, and quantification using raw concentration and wind data from any network of point-in-space CMS sensors. The framework has 2 primary use cases: (1) alerting, in which localization and quantification estimates are provided to the operator in near real time, and (2) inventory development, in which quantification estimates are used in aggregate to determine cumulative emissions at a given cadence. The proposed framework is best suited for production sites that are around 150 m in diameter and contain spatially distinct equipment groups (e.g., wellheads, separators, and tanks).
A number of groups have proposed similar techniques for emission localization and quantification in this setting. Some of these solutions rely on a large sensor network (more than 20 sensors), which would be cost-prohibitive in practice (Keats et al., 2007; Sharan et al., 2009), or on long-term (multiple hours to days) aggregates, which would be unable to detect short-lived events, even if they are large (Luhar et al., 2014; Feitz et al., 2018). Additionally, some solutions rely on the Gaussian plume atmospheric transport model (Sharan et al., 2009; Kumar et al., 2022). This model is optimized for far-field applications with distance scales much larger than the typical production site and assumes steady-state wind conditions over short (approximately 10 min) time intervals, an assumption that breaks down in many practical scenarios where wind conditions are variable. On the other end of the complexity spectrum, some solutions rely on large eddy simulations for modeling atmospheric transport (Keats et al., 2007; Travis et al., 2020). These simulations are far more accurate than the Gaussian plume model but require special expertise to operate and implement for specific sites, are computationally expensive, and are not generally publicly available. Finally, none of the solutions discussed here attempt to perform emission event detection, and hence they are all evaluated in rounds with known start and end times (Keats et al., 2007; Sharan et al., 2009; Luhar et al., 2014; Feitz et al., 2018; Travis et al., 2020; Kumar et al., 2022). Note that proprietary solutions are developing rapidly in the private sector, which we are unable to assess.
The event detection, localization, and quantification framework proposed here contributes a number of advancements to this body of literature. First, the framework includes a novel spike detection algorithm that creates a local background estimate for each spike identified in the concentration time series. This algorithm is used in multiple steps of the broader framework. Second, the framework requires no information about emission start or end times to operate. The framework can be run continuously and will only estimate emission location and rate if an emission event is detected. Third, the framework uses a Gaussian puff atmospheric transport model that provides a balance between simpler models whose assumptions are rarely met in practice (e.g., the Gaussian plume) and more complex models that are computationally expensive and may require customization for different locations. Fourth, with the goal of providing rapid alerts, the framework can provide a localization and quantification estimate using as little as 15 min of data. Finally, the framework does not require an expansive sensor network, but rather can operate using only as many sensors as are required to surround all potential emission sources on the site (approximately 4–8 sensors). Note that the proposed framework can be applied to data from any sensor network that provides point-in-space methane concentration, wind speed, and wind direction measurements.
A preliminary evaluation of the framework was performed on 1 month of non-blinded, single-source controlled releases at the Methane Emissions Technology Evaluation Center (METEC) in Fort Collins, Colorado. These controlled release data are used solely to demonstrate the practical feasibility of the proposed framework, and further blinded testing is needed to more rigorously evaluate the framework’s performance. Non-blinded data are used in this study because the authors did not have access to blinded testing when developing the proposed methods.
2. Methods
In this section, we first describe the METEC facility and the specific CMS sensors that are used to demonstrate the proposed methods. We then describe the 2 main contributions of this work: (1) a gradient-based spike detection algorithm and (2) a modular framework for methane emission detection, localization, and quantification.
2.1. METEC and CMS sensors
The framework described in Section 2.3 was evaluated using data collected by Project Canary CMS sensors at METEC during the Advancing Development of Emissions Detection (ADED) research program (Zimmerle, 2020). METEC is a testing center that resembles an oil and gas production site and performs controlled methane releases from multiple pieces of equipment. The controlled release data used in this article are non-blinded, which is a deviation from the ADED protocol. Non-blinded data were used in this work because blinded data were not available to the authors when developing the proposed methods.
We focus on 1 month of controlled releases (April 17 to May 16, 2022), which contains 85 single-source releases ranging from 0.50 to 8.25 h in duration and 0.18 to 6.39 kg/h in size. Emission rates were constant during each release, and releases were separated by periods of no emissions with varying durations. For each single-source release, emissions could come from 1 of 5 potential emission sources, indicated with colored boxes in Figure 1. We assumed that the release point for the tanks was at a height of 4.5 m and the release point for all other sources was at a height of 2 m. There were 7 multisource emission events during this study period, which were excluded from all analysis given that the current framework is unable to localize multisource emissions.
Figure 1. Satellite imagery of the Methane Emissions Technology Evaluation Center (METEC). Source locations are marked with colored boxes. Continuous monitoring systems (CMS) sensor locations are marked with pins and are named based on their geographical position. Pins with gray interiors indicate that the corresponding sensor measures wind speed and direction in addition to methane concentrations.
Methane concentrations were measured by 8 CMS sensors placed around the perimeter of the METEC site at a height of 2.4 m, 3 of which also measured wind speed and direction. Additional higher sensors would have been necessary if taller emission sources were present (e.g., flare stacks). Sensor locations are marked with pins in Figure 1. Project Canary used Near-IR Tunable Diode Laser Absorption Spectroscopy (TDLAS) methane sensors and R.M. Young 2D ultrasonic anemometers during the ADED controlled releases. The methane concentration sensors have an accuracy of ±2% and a precision of ≤0.125 ppm with 60 s averaging as reported by the manufacturer (R Mistry, personal communication, 17/01/2024). The exact model of the methane sensors and details regarding the calibration process were not provided to the authors. The anemometers have an accuracy of ±2% ±0.3 m/s for wind speed and ±2° for wind direction and a resolution of 0.01 m/s for wind speed and 0.1° for wind direction as reported by the manufacturer (R.M. Young Company, 2021). Methane concentration, wind speed, and wind direction were measured every second and then averaged every minute by Project Canary, resulting in one data point per minute.
For the purpose of this study, we assumed a homogeneous wind field across the site, and hence the minute-by-minute median of wind speed and direction was taken across the 3 sensors that measured these variables. Using the median reduced the impact of sensor noise compared to the mean. Section S6 in the Supporting Information (SI) file contains details on the wind data processing scheme.
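The minute-by-minute median across wind sensors can be sketched as follows. This is only an illustration: the actual processing scheme is given in Section S6 of the SI file and is not reproduced here. The function name `site_wind` is hypothetical, and the choice to take the median on east and north wind components (so that directions near 0°/360° do not wrap incorrectly) is an assumption of this sketch rather than a statement of the authors' method.

```python
import numpy as np

def site_wind(speed, direction_deg):
    """Collapse per-sensor wind into one site-level series.

    speed, direction_deg: arrays of shape (n_minutes, n_sensors).
    Returns (site_speed, site_direction_deg), each of shape (n_minutes,).
    Assumption: the median is taken on wind components, not on raw angles.
    """
    speed = np.asarray(speed, dtype=float)
    theta = np.deg2rad(np.asarray(direction_deg, dtype=float))

    # Decompose each sensor's wind vector into east (u) and north (v) parts.
    u = speed * np.sin(theta)
    v = speed * np.cos(theta)

    # Median across sensors for every minute.
    u_med = np.median(u, axis=1)
    v_med = np.median(v, axis=1)

    # Convert the median components back to speed and direction.
    site_speed = np.hypot(u_med, v_med)
    site_direction = np.rad2deg(np.arctan2(u_med, v_med)) % 360.0
    return site_speed, site_direction
```

With three sensors reporting 350°, 0°, and 10°, a naive median of the raw angles would return 10°, whereas the component-based median returns a direction at 0°.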
2.2. Spike detection algorithm
The proposed spike detection algorithm flags sharply elevated values (“spikes”) in a univariate methane concentration time series. This algorithm is used in the broader event detection, localization, and quantification framework to estimate background methane concentrations, determine when an emission is occurring, and isolate time steps in which a given sensor is recording a relevant signal.
The algorithm relies on 3 parameters: (1) the going up threshold in ppm, which is used to identify the start of an event, (2) the return threshold as a percent, which is used to identify the end of an event, and (3) the amplitude threshold in ppm, which is used to filter spikes by their background-removed amplitude.
The algorithm proceeds as follows. Iterate once through the univariate time series. When an observation is going up threshold ppm greater than the previous observation, start a spike. Take the concentration value immediately preceding the spike as an initial estimate of the methane background for this specific spike. Remain in the spike until the background-removed concentration values return to return threshold percent of the largest background-removed concentration value encountered during the spike. After exiting the spike, take the average of the concentration values immediately preceding and following the spike as an updated background estimate for the spike. Subtract this value from all observations within the spike, and if the largest background-removed concentration value within the spike is less than amplitude threshold ppm, discard the spike.
This algorithm results in a mask that records if each entry of the concentration time series is a spike or background. Furthermore, it requires only a single loop through the methane concentration time series, meaning that it can be run on historical or real-time data. Section S3 in the SI file contains a more detailed description of the spike detection algorithm, a discussion of the default parameter values, and example output from the algorithm.
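The single-pass procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see Section S3 of the SI file for that). The function name `detect_spikes` and the default parameter values are placeholders chosen for the example, not the defaults discussed in the SI. The handling of a spike that is still open at the end of the series is also an assumption.

```python
import numpy as np

def detect_spikes(conc, going_up=0.25, return_pct=5.0, amplitude=1.0):
    """Return a boolean mask that is True where `conc` is inside a spike.

    going_up   : ppm jump between consecutive observations that starts a spike
    return_pct : spike ends once the background-removed value falls to this
                 percent of the largest background-removed value in the spike
    amplitude  : minimum background-removed peak (ppm) for a spike to be kept
    All three default values are placeholders for illustration.
    """
    conc = np.asarray(conc, dtype=float)
    mask = np.zeros(conc.size, dtype=bool)
    in_spike = False
    start = 0          # index of the first observation inside the spike
    background = 0.0   # initial background: value just before the spike
    peak = 0.0         # largest background-removed value seen so far

    for i in range(1, conc.size):
        if not in_spike:
            if conc[i] - conc[i - 1] >= going_up:
                in_spike = True
                start = i
                background = conc[i - 1]
                peak = conc[i] - background
        else:
            excess = conc[i] - background
            peak = max(peak, excess)
            if excess <= return_pct / 100.0 * peak:
                # Observation i is the first one after the spike.
                in_spike = False
                updated_bg = 0.5 * (conc[start - 1] + conc[i])
                if conc[start:i].max() - updated_bg >= amplitude:
                    mask[start:i] = True

    # Assumption: a spike still open at the end keeps its initial background.
    if in_spike and conc[start:].max() - background >= amplitude:
        mask[start:] = True
    return mask
```

Because the loop only looks at the current and previous observations plus a few running values, the same logic could be applied to streaming data one observation at a time.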
2.3. Event detection, localization, and quantification framework
The proposed event detection, localization, and quantification framework pattern matches methane concentration observations from the CMS sensors to simulated concentrations assuming different potential sources. For each detected emission event, the potential source whose simulated concentrations most closely match the actual CMS observations is taken to be the localization estimate for that event. The emission rate is then estimated by minimizing error between the simulated concentrations given the localization estimate and the CMS observations. A rate estimate is only produced when there is reasonably good alignment between the simulated concentrations and CMS observations, which reduces the impact of simulation inadequacies (e.g., not accounting for turbulence).
The framework is separated into 4 steps that are described in Sections 2.3.1 through 2.3.4. The framework parameters and evaluation metrics discussed in these sections were informed by the non-blinded controlled release data while developing the framework. We provide a sensitivity study for each parameter and evaluation metric in Sections S1 and S2 of the SI file, which shows that the performance of the framework is not sensitive to the choice of parameter value or evaluation metric.
Finally, while event detection, localization, and quantification can be framed as an inverse problem, the proposed framework does not perform a full inversion to retrieve source location and emission rate. Instead, the dimension of the problem is greatly reduced by specifying potential emission sources. Doing so is a useful choice in practice for oil and gas production sites, as potential emission sources are often well-known (e.g., tank thief hatches, separators, and wellheads). This procedure would introduce quantification error if emissions originated from a source that was not specified in advance, but it is often feasible to identify all potential on-site emission sources for the production sites considered in this study.
2.3.1. Background removal and event detection
The simulated methane concentrations described in the following section do not include background concentrations that are present in the atmosphere. Therefore, background concentrations from the CMS observations must first be removed before comparing them to simulated concentrations. This is done for each sensor separately using the spike detection algorithm proposed in Section 2.2. Specifically, identified spikes that are close together in time are grouped, and each group is background-removed using the average of the concentration observations immediately preceding and following the group as the background estimate. Finally, all observations that are not within a group of spikes are set to zero.
Event detection is performed by taking the minute-by-minute maximum across the background-removed concentration observations from each sensor. This collapses the signals from each sensor into one time series that captures the concentration spikes across the entire site. If the sensors sufficiently surround the site, then any on-site emission will cause an enhancement in this maximum value time series, regardless of wind direction. Since all non-spike observations were already set to zero during background removal, emission events are defined as groups of non-zero values in this maximum value time series. Events that are separated by less than 30 min are combined, as there are often small gaps between concentration enhancements that do not correspond to gaps in emissions, but rather to periods in which the methane plume is not detected. Events that are less than 15 min long are discarded, as these short events typically correspond to noise rather than actual emissions. The 30- and 15-min thresholds used in this step were informed by controlled release data, and a sensitivity study is provided in Section S1 of the SI file. Section S4 in the SI file contains details on the event detection step and provides an alternative method that is better suited for more complex sites that have consistent concentration enhancements.
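The event detection logic (site-wide maximum, merging of nearby events, removal of short events) can be sketched as follows. The function name `detect_events` and the input layout are assumptions made for illustration; the input is assumed to be one-minute, background-removed data in which all non-spike observations have already been set to zero.

```python
import numpy as np

def detect_events(bg_removed, max_gap=30, min_length=15):
    """Return a list of (start, end) minute indices, end exclusive.

    bg_removed : array of shape (n_minutes, n_sensors), zero outside spikes
    max_gap    : events separated by fewer than this many minutes are merged
    min_length : events shorter than this many minutes are discarded
    """
    # Collapse all sensors into one site-level series.
    site_max = np.max(np.asarray(bg_removed, dtype=float), axis=1)
    active = site_max > 0

    # Locate the edges of each run of consecutive active minutes.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    runs = zip(edges[::2], edges[1::2])

    # Merge runs that are separated by short gaps.
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = end
        else:
            merged.append([start, end])

    # Drop events that are too short after merging.
    return [(int(s), int(e)) for s, e in merged if e - s >= min_length]
```

In this sketch, merging is applied before the minimum-length filter, following the order in which the two rules are stated in the text.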
2.3.2. Forward simulation of methane concentrations
An atmospheric transport model is used to simulate methane concentrations at each sensor location during the events detected in the previous step. A separate simulation is run for all potential emission sources on the oil and gas site, which makes it possible to pick the most likely source for each event in the following step. The framework is not intrinsically tied to any atmospheric transport model, and a Gaussian puff model was used in this study as a balance between simulation accuracy and computational expense, availability, and ease of use. This model approximates a continuous release of methane from a point source as a sequence of many small “puffs,” each of which is modeled using a 3-dimensional Gaussian-like function. This provides a reasonable approximation of atmospheric transport within a short time period after release and over short distances barring any major obstructions that would block the transport of methane (e.g., a large building).
Simulated concentrations from the Gaussian puff model are a linear function of the emission rate selected for use in the simulation, meaning that the pattern of the simulated concentrations does not depend on the choice of emission rate. Therefore, an arbitrary unitary emission rate (1 g/s) was used in the simulations, as true emission rates are unknown in practice. With this in mind, the localization step described in the following section was designed to be scale-independent by evaluating patterns rather than overall amplitudes. The choice of emission rate in this simulation step does not impact the output of the framework. Section S5 in the SI file and Jia et al. (2023) contain a detailed description of the Gaussian puff atmospheric transport model and our implementation for use in the framework proposed here.
2.3.3. Source localization
For each detected emission event, the potential source whose simulated concentrations most closely match the actual CMS observations is taken as the estimated source. This pattern matching is performed by computing the correlation coefficient between a stacked vector of simulated concentrations at all sensor locations and a stacked vector of the corresponding background-removed CMS observations. This results in a correlation value for each potential source for all detected emission events. For each emission event, the source with the highest positive correlation is taken to be the localization estimate for that event. Correlation coefficient is a natural choice for the evaluation metric used in this step, but a customized metric that rewards spike alignment and penalizes spike misalignment was also tested. Section S2 of the SI file contains a detailed discussion of these metrics and a sensitivity study.
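A minimal sketch of the correlation-based localization step is shown below. The function name `localize`, the dictionary input, and the choice to return `None` when no source has a positive correlation are assumptions of this example. Only the correlation coefficient metric is shown, not the customized metric discussed in the SI.

```python
import numpy as np

def localize(observed, simulated_by_source):
    """Pick the source whose simulated pattern best matches the observations.

    observed            : array (n_minutes, n_sensors), background-removed
    simulated_by_source : dict mapping source name -> array of the same shape
    Returns (best_source_or_None, dict of correlation per source).
    """
    # Stack all sensors into one long vector.
    obs = np.asarray(observed, dtype=float).ravel()
    scores = {}
    for name, sim in simulated_by_source.items():
        sim_vec = np.asarray(sim, dtype=float).ravel()
        if obs.std() == 0 or sim_vec.std() == 0:
            # Correlation is undefined for a constant vector.
            scores[name] = float("nan")
        else:
            scores[name] = float(np.corrcoef(sim_vec, obs)[0, 1])

    # Keep only sources with a positive (and defined) correlation.
    positive = {k: v for k, v in scores.items() if v > 0}
    best = max(positive, key=positive.get) if positive else None
    return best, scores
```

Because the correlation coefficient is unchanged when either vector is multiplied by a positive constant, the result does not depend on the emission rate used in the simulation.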
2.3.4. Emission rate quantification
Emission rates are estimated by comparing the amplitude of the background-removed CMS observations to the amplitude of the simulated concentrations, assuming the localization estimate from the previous section. This comparison is performed using only the time steps where both observations and simulated concentrations are in a spike, which are identified by the spike detection algorithm described in Section 2.2. This filtering drastically reduces the impact of transport model inadequacies on the emission rate estimate and in part justifies the use of a relatively simple forward model. If there are at least 4 time steps where both the observations and simulated concentrations are in a spike, then a rate estimate is computed for that emission event. Otherwise, the event is deemed unsuitable for quantification. This prevents individual time steps with large model errors from dominating any one emission rate estimate. Section S1 in the SI file contains more details about this threshold and a sensitivity study.
Specifically, emission rates are estimated as follows. For each emission event, the time steps in which both observations and simulated concentrations are in a spike are sampled with replacement 1,000 times. For each sample, root-mean-square error (RMSE) between a vector of simulated concentrations at the sampled time steps and a vector of the corresponding background-removed CMS observations is minimized over a range of simulation emission rates. This results in 1,000 sample-specific emission rate estimates. The vectors of simulated concentrations and observations contain data from all CMS sensors installed on the site. The average of the 1,000 sample-specific rate estimates is taken as the overall rate estimate for each event, and the 5th and 95th percentiles of the sample-specific rate estimates are used to construct a 90% confidence interval. This method of uncertainty quantification does not impose any assumptions on the symmetry of the confidence interval.
Three other evaluation metrics were tested in addition to RMSE, and Section S2 in the SI file contains a discussion and sensitivity study of these metrics. The optimization in this step can be performed using only one set of simulated concentrations, as the Gaussian puff model is a linear function of emission rate. Hence, to compute the simulated concentrations at a new emission rate, q g/s, the concentrations simulated using a unitary emission rate (1 g/s) simply need to be multiplied by q.
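The bootstrap procedure above can be sketched as follows. Two simplifications are assumed here. First, the inputs are one-dimensional vectors that already contain only the entries (stacked across sensors) where both observations and simulated concentrations are in a spike. Second, instead of searching over a range of emission rates, the sketch uses the closed-form least-squares scaling factor, which minimizes RMSE exactly when simulated concentrations are linear in the emission rate. The function name `bootstrap_rate` is hypothetical.

```python
import numpy as np

def bootstrap_rate(obs, sim_unit, n_boot=1000, min_steps=4, seed=0):
    """Estimate an emission rate (kg/h) with a bootstrapped 90% interval.

    obs      : background-removed observations at the overlapping spike steps
    sim_unit : simulated concentrations at the same steps for a 1 g/s release
    Returns (mean_rate, (p5, p95)) in kg/h, or None if too few steps exist.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim_unit, dtype=float)
    if obs.size < min_steps:
        return None  # event deemed unsuitable for quantification

    rng = np.random.default_rng(seed)
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, obs.size, size=(n_boot, obs.size))
    o, s = obs[idx], sim[idx]

    # Closed-form minimizer of RMSE between q * s and o for each resample.
    q_g_per_s = np.sum(s * o, axis=1) / np.sum(s * s, axis=1)
    q_kg_per_h = q_g_per_s * 3.6  # 1 g/s equals 3.6 kg/h

    p5, p95 = np.percentile(q_kg_per_h, [5, 95])
    return float(q_kg_per_h.mean()), (float(p5), float(p95))
```

Because the interval is built from percentiles of the resampled estimates, it is not forced to be symmetric around the mean.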
3. Results
A preliminary evaluation of the proposed framework was performed on 85 single-source controlled releases. Figure 2 provides an overview of the timing, source, and emission rate for each of the controlled releases along with the event detection, localization, and quantification results from the proposed framework. The remainder of this section will discuss specific aspects of the framework’s performance.
Figure 2. Detection, localization, and quantification results over the 1 month study period from April 17 to May 16, 2022. Controlled releases are shown with solid rectangles. The width of the rectangle corresponds to the duration of the release, the height shows the true emission rate, and the color indicates the true source. Estimated emission events are shown with transparent rectangles. The width of the rectangle corresponds to the duration of the estimated event and the color indicates the localization estimate. The heights of the transparent rectangles are fixed at an arbitrary value for visual clarity. Estimated emission rates and bootstrapped 90% confidence intervals are shown with black circles and lines, respectively.
Timing and minute-by-minute event detection performance is considered first. Most emission start times are estimated accurately, with a median start time error of 3 min (meaning the estimated start time came 3 min late) and a distribution tightly centered around the median (25th percentile = 1 min and 75th percentile = 11 min). Similarly, most end times are estimated accurately, with a median end time error of 0 min and a distribution tightly centered around the median (25th percentile = −1 min and 75th percentile = 2 min). For context, the controlled releases ranged from 0.5 to 8.25 h in length, with an average length of 4 h. Overall, the framework correctly estimates the presence or absence of an emission event for 86.3% of time steps and has a minute-by-minute true positive rate of 79.9% and true negative rate of 92.5%. Sections S7 and S8 in the SI file contain details on start and end time errors and minute-by-minute event detection errors, respectively.
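The minute-by-minute metrics reported above (overall agreement, true positive rate, and true negative rate) can be computed from two boolean time series, as in the sketch below. The function name `minute_metrics` is hypothetical, and the sketch assumes that both emitting and non-emitting minutes are present in the true series.

```python
import numpy as np

def minute_metrics(true_mask, est_mask):
    """Compare true and estimated emission state for every minute.

    true_mask, est_mask : boolean arrays, True when an emission is
                          occurring (true) or estimated (est).
    """
    t = np.asarray(true_mask, dtype=bool)
    e = np.asarray(est_mask, dtype=bool)
    tp = np.sum(t & e)      # emitting and detected
    tn = np.sum(~t & ~e)    # not emitting and not detected
    return {
        "accuracy": float((tp + tn) / t.size),
        "true_positive_rate": float(tp / np.sum(t)),
        "true_negative_rate": float(tn / np.sum(~t)),
    }
```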
Figure 3 summarizes a number of event-level performance metrics. The framework correctly detects 100% of the controlled releases, with 92% deemed suitable for quantification. Note that only 5.5% of estimated events are false positives. Furthermore, 89% of the controlled releases overlap with at least one estimated emission event that was correctly localized and 82% have a completely correct localization estimate. The performance in all 4 categories is largely consistent across the 5 potential emission sources, indicating that the sensor arrangement shown in Figure 1 (including sensor height) is sufficient for capturing emissions from both the taller (i.e., tanks at 4.5 m) and shorter sources at METEC. This level of detection and localization performance suggests that the framework could be used to provide informative alerts to operators when emissions are occurring on production sites similar in complexity to METEC.
Figure 3. Summary of event-level performance metrics. The bar marked “correctly detected emission event” shows the percent of controlled releases that overlap with an estimated emission event. The bar marked “rate estimate available” shows the percent of controlled releases that are deemed suitable for quantification. Recall that the framework can estimate multiple emission events that all occur during one controlled release. Therefore, the bar marked “location estimate partially correct” shows the percent of controlled releases where at least one of the overlapping estimated events has correctly estimated the source location, while the bar marked “location estimate completely correct” shows the percent of controlled releases where all overlapping estimated events have correctly estimated the source location. Within each bar, the events are grouped by their true emission source.
The parity plot in Figure 4 summarizes quantification performance by showing the true rate on the horizontal axis and the corresponding estimate on the vertical axis for all emission events. Ordinary least squares was used to fit a line with a zero intercept to all of the correctly detected events (i.e., true positives only), which can be used as one measure of overall bias in the estimates. Estimates for the larger emissions (>1 kg/h) are less biased than estimates for the smaller emissions (≤1 kg/h), where there is a tendency to underestimate. This suggests that the framework is better able to quantify larger emissions, which is expected as these emissions have a stronger signal (more separation from baseline and noise) and hence should be easier to quantify. Note that more than 1 in 10 uncertainty intervals miss the 1:1 line, which exceeds the 1-in-10 miss rate that a perfectly calibrated 90% confidence interval would imply. This is likely because the emission rate resampling procedure described in Section 2.3.4 does not quantify uncertainty from the localization step or from inadequacies in the forward model.
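The two summary quantities discussed above, the slope of a zero-intercept least-squares fit and the fraction of uncertainty intervals that contain the true rate, can be computed as in the following sketch. The function names are hypothetical, and the sketch ignores the uncertainty in the true rates themselves.

```python
import numpy as np

def zero_intercept_slope(true_rate, est_rate):
    """Slope b that minimizes sum((est - b * true)**2), with no intercept."""
    x = np.asarray(true_rate, dtype=float)
    y = np.asarray(est_rate, dtype=float)
    return float(np.sum(x * y) / np.sum(x * x))

def interval_coverage(true_rate, lower, upper):
    """Fraction of intervals [lower, upper] that contain the true rate."""
    x = np.asarray(true_rate, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    return float(np.mean((lo <= x) & (x <= hi)))
```

A slope below 1 indicates underestimation in aggregate and a slope above 1 indicates overestimation. Coverage below 0.9 for nominal 90% intervals indicates that the intervals are too narrow.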
Figure 4. Parity plot of true (horizontal axis) and estimated (vertical axis) emission rates. Point color and symbol show the true emission source for each event. Points between the black dashed lines are within a factor of 2 from the true rate. Points between the black dotted lines are within a factor of 3 from the true rate. Points along the vertical solid gray line are false positives, and points along the horizontal solid gray line are false negatives. Points along the horizontal dot-dashed gray line are emission events that were detected but were not deemed suitable for quantification. The magenta line shows the linear model fit to all correctly detected events (i.e., true positives only) using ordinary least squares and a zero intercept. One estimate of 12.1 kg/h corresponding to a controlled release of 4.9 kg/h is excluded for visual clarity. Uncertainty in the METEC emission rates is a 95% confidence interval provided by METEC. The right panel zooms in on data shown in the left panel. The magenta line in the right panel is the same best fit line as shown in the left panel. METEC = Methane Emissions Technology Evaluation Center.
True emission rate is the primary driver of quantification accuracy, with higher emission rates resulting in smaller quantification errors. There is a slight relationship between quantification error and emission duration and wind speed, with longer emissions and higher wind speeds resulting in slightly smaller errors, but this relationship is much less influential than the relationship between quantification error and true emission rate. Section S9 in the SI file contains details on these relationships.
Figure 5 further explores the difference in quantification accuracy between small (≤1 kg/h) and large (>1 kg/h) emissions by showing histograms of the percent difference between estimated and true rates. Note that percent difference is calculated as the difference between the estimated rate and the true rate, divided by the true rate. Estimates for small emissions (≤1 kg/h) tend to underestimate, with an average percent difference of −3.9% across all identified emission events. Note, however, that the median percent difference is −28.9%, and 68.6% of the estimates underestimate the true rate. This means that underestimation is more common than overestimation for small emissions, but the overestimation error is larger than the underestimation error on average. Estimates for large emissions (>1 kg/h) tend to overestimate, with an average percent difference of 4.3%. In this regime, the median percent difference is −2.3%, and 53.5% of the estimates underestimate the true rate. This means that over- and underestimation are nearly equally likely for large emissions, with overestimation error being only slightly larger than underestimation error on average.
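The gap between the mean and median percent difference follows from the asymmetry of the metric: underestimates are bounded below by −100%, whereas overestimates are unbounded. The short Python sketch below illustrates this with made-up rates (the values and variable names are assumptions for illustration, not study data), in which most estimates are low but a single large overestimate pulls the mean up to roughly zero.

```python
import numpy as np

def percent_difference(est, true):
    """Percent difference: 100 * (estimated - true) / true."""
    est = np.asarray(est, dtype=float)
    true = np.asarray(true, dtype=float)
    return 100.0 * (est - true) / true

# Illustrative values only (kg/h); not data from this study.
true = np.array([0.5, 0.5, 0.5, 0.5, 0.5])
est = np.array([0.30, 0.35, 0.40, 0.45, 1.00])

pct = percent_difference(est, true)       # roughly [-40, -30, -20, -10, 100]
mean_pct = float(np.mean(pct))            # near 0: the one overestimate offsets four underestimates
median_pct = float(np.median(pct))        # about -20: the typical estimate is low
frac_under = float(np.mean(pct < 0))      # share of estimates below the true rate

print(mean_pct, median_pct, frac_under)
```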
Percent difference between estimated and true emission rate. (a) shows events with small (≤1 kg/h) true emission rates, and (b) shows events with large (>1 kg/h) true emission rates. Vertical gray lines show the error bounds of the middle 50% and 90% of all individual estimates. For example, the solid gray lines show the percent difference range that contains 50% of all emission rate estimates. The vertical red lines show the average percent difference value taken across all individual rate estimates. Darker shading indicates underestimation and lighter shading indicates overestimation.
For small emissions (≤1 kg/h), 90% of the estimates have error within a percent difference of [−74.9%, 195.2%] from the true rate. Performance is markedly better for large emissions (>1 kg/h), where 90% of the estimates have error within a percent difference of [−49.3%, 78.8%] from the true rate. For small emissions (≤1 kg/h), 57% and 77% of estimates have error within a factor of 2 and 3, respectively, from the true rate. For large emissions (>1 kg/h), 91% and 100% of estimates have error within a factor of 2 and 3, respectively, from the true rate.
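The two kinds of error summary used here can be related with a small amount of arithmetic. The Python sketch below assumes that "within a factor of k" means the estimate lies between true/k and k × true, and that the 90% range is the empirical 5th to 95th percentile of percent differences; both are assumptions about the computation, and the function names and example values are hypothetical. Under the first assumption, a factor of 2 corresponds to a percent difference between −50% and +100%, and a factor of 3 to between about −66.7% and +200%.

```python
import numpy as np

def within_factor(est, true, k):
    """Fraction of estimates satisfying true / k <= est <= k * true."""
    ratio = np.asarray(est, dtype=float) / np.asarray(true, dtype=float)
    return float(np.mean((ratio >= 1.0 / k) & (ratio <= k)))

def factor_to_percent_bounds(k):
    """Percent-difference bounds equivalent to being within a factor of k."""
    return 100.0 * (1.0 / k - 1.0), 100.0 * (k - 1.0)

def central_interval(pct_diff, coverage=0.90):
    """Empirical central interval of percent differences (assumed quantile-based)."""
    tail = (1.0 - coverage) / 2.0
    lower, upper = np.quantile(np.asarray(pct_diff, dtype=float), [tail, 1.0 - tail])
    return float(lower), float(upper)

# Illustrative values only (kg/h); not data from this study.
true = [1.0, 1.0, 1.0, 1.0]
est = [0.4, 0.6, 1.5, 2.5]
frac_within_2 = within_factor(est, true, 2)   # 0.6 and 1.5 qualify
frac_within_3 = within_factor(est, true, 3)   # all four qualify
print(frac_within_2, frac_within_3, factor_to_percent_bounds(2))
```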
Finally, when considering only small emissions (≤1 kg/h), the framework underestimates cumulative emissions over the 1-month study period by 38.5%. When considering only large emissions (>1 kg/h), the framework underestimates cumulative emissions by 0.2%. Overall, when including both false positives and false negatives and both small and large emissions, the framework underestimates cumulative emissions by 5.5%. Note that these cumulative errors are with respect to total emitted volume and are therefore impacted by errors in estimated emission event start and end times. This is why the cumulative errors for small and large emissions differ from the average quantification errors on individual events discussed earlier. In particular, estimated emission events for smaller emissions (≤1 kg/h) have a slight tendency to underestimate total emission duration (see Figure 2 for examples), which is why cumulative emissions are more underestimated in this regime. Section S10 in the SI file contains details on cumulative quantification error.
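The dependence of cumulative error on event timing can be seen in a minimal Python sketch. Here each event is represented as a hypothetical (rate in kg/h, duration in h) pair and the cumulative total is taken as the sum of rate times duration; this representation, the function names, and the values are assumptions for illustration only. In the example, both rate estimates are exact, yet the cumulative total is still underestimated because one estimated event is an hour too short.

```python
def total_emitted(events):
    """Total emitted amount (kg) for a list of (rate_kg_per_h, duration_h) events."""
    return sum(rate * hours for rate, hours in events)

def cumulative_percent_error(est_events, true_events):
    """Percent error in cumulative emissions; the two lists need not align one to one,
    so false positives and false negatives can simply be present in one list only."""
    true_total = total_emitted(true_events)
    return 100.0 * (total_emitted(est_events) - true_total) / true_total

# Illustrative values only; not data from this study.
true_events = [(0.5, 4.0), (2.0, 1.0)]   # 2.0 kg + 2.0 kg = 4.0 kg
est_events = [(0.5, 3.0), (2.0, 1.0)]    # same rates, first event 1 h too short
error = cumulative_percent_error(est_events, true_events)
print(f"cumulative error: {error:.1f}%")
```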
4. Discussion
We propose an analytical framework for the detection, localization, and quantification of single-source methane emissions on relatively simple oil and gas production sites. A preliminary evaluation of the framework’s performance was conducted using non-blinded data, but further blinded testing is needed to evaluate the framework more rigorously. The true controlled release information was used solely for framework evaluation in this study and not for training, as the framework is fairly rigid and does not require training data to operate. While the framework relies on a number of input parameters, they control site-agnostic settings and were not optimized using the controlled release data. A full sensitivity study of the framework parameters is provided in the SI file and shows that framework performance is not highly sensitive to the input parameters. The proposed framework has also been used on multiple oil and gas production sites, with results closely in line with top-down measurement techniques (Daniels et al., 2023).
We do not intend for this work to be a direct comparison to the CMS evaluation performed in Bell et al. (2023), despite using a subset of the same controlled release data. The evaluation in Bell et al. (2023) was single-blinded and performed on both single- and multisource emissions, whereas this study considers only single-source emissions. Quantifying single-source emissions is a considerably simpler problem than multisource emissions but is a crucial first step in overall algorithmic development. The primary aim of this work is to propose an open-source framework for emission event detection, localization, and quantification, and the initial evaluation on non-blinded, single-source controlled releases is meant to demonstrate the framework’s practical feasibility.
Multisource and off-site emissions pose challenges that are outside the scope of this article and are therefore left for future work. Multisource emissions could be modeled as the sum of 2 or more single-source emissions (as simulated concentrations from the Gaussian puff model are linearly additive), and hence would require no additional simulations to accommodate. However, this will notably increase the search space during localization and quantification and requires further study to assess its practical feasibility. Off-site emissions are an important consideration for dense production settings (e.g., the Permian basin) and could be identified by incorporating potential off-site sources during the simulation step. The fidelity of this strategy would need to be balanced against the extra computational cost of simulating from many additional sources.
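The additivity argument and the growth of the search space can be sketched as follows. This is a hypothetical illustration rather than the framework's localization or quantification procedure: it assumes that each candidate source already has a simulated concentration series at a unit emission rate, that simulated concentrations scale linearly with rate, and that a plain least-squares fit stands in for the pattern matching used in this work. The source names, array values, and function name are all invented for the example.

```python
from itertools import combinations

import numpy as np

def best_source_pair(unit_sims, observed):
    """Search all source pairs; fit observed ~ q_a * sim_a + q_b * sim_b by least squares.

    unit_sims: dict mapping source name -> simulated series at unit rate (assumed input).
    Returns the pair with the smallest squared residual and its fitted rates.
    Note: plain least squares does not enforce non-negative rates.
    """
    best = None
    for a, b in combinations(sorted(unit_sims), 2):
        design = np.column_stack([unit_sims[a], unit_sims[b]])
        rates, *_ = np.linalg.lstsq(design, observed, rcond=None)
        resid = float(np.sum((design @ rates - observed) ** 2))
        if best is None or resid < best[0]:
            best = (resid, (a, b), rates)
    return best[1], best[2]

# Illustrative unit-rate simulations at four time steps; not output of the forward model.
unit_sims = {
    "tank": np.array([1.0, 0.0, 0.0, 2.0]),
    "wellhead": np.array([0.0, 1.0, 0.0, 1.0]),
    "separator": np.array([0.0, 0.0, 1.0, 1.0]),
}
# Two simultaneous sources modeled as a sum of scaled single-source simulations.
observed = 2.0 * unit_sims["tank"] + 0.5 * unit_sims["separator"]

pair, rates = best_source_pair(unit_sims, observed)
print(pair, rates)
```

With n candidate sources, the number of pairs to test is n(n − 1)/2, and the count grows further if three or more simultaneous sources are allowed, which is the increase in search space noted above.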
The number of sensors and their placement is another important consideration when deploying CMS. The evaluation in this study used 8 sensors, and hence it was very likely that a methane plume would be detected by at least one sensor regardless of wind direction. Fewer sensors are often used in practice, making it more likely that a methane plume may blow between two sensors and hence cause very small or no concentration enhancements. Therefore, instances in which the methane plume is blown between sensors need to be properly identified so that they do not bias the overall quantification estimates low, which will be addressed in future work.
Applying this framework to larger, more complex sites (e.g., midstream compressor stations) will likely require a more nuanced monitoring and modeling approach. Large buildings and equipment groups block the transport of methane plumes and introduce downwash effects, both of which are not captured by the Gaussian puff model. Therefore, a more sophisticated transport model may be necessary for these types of sites. Furthermore, a single ring of sensors placed around the perimeter of larger sites will likely provide insufficient signal separation, as it is hard to distinguish between two nearby sources when the sensors are far away. For monitoring purposes, such sites may need to be divided into smaller sectors that are each surrounded by sensors.
Despite these limitations, the proposed framework in its current form has 2 clear use cases on relatively simple oil and gas production sites: (1) alerting, where localization and quantification estimates are provided to the operator in near real time, and (2) inventory development, where quantification estimates are used in aggregate to determine cumulative emissions at a given cadence. The variability of individual rate estimates does not detract from the applicability of the framework in these use cases. For alerting, the ability to generate continuous emission estimates with localization in near real time is more pertinent than the exact rate estimate, as the main purpose of alerts is to give the operator enough information to decide if further investigation is necessary. If a large rate estimate is consistently localized to a given equipment group at regular intervals (e.g., every 15 min), then the operator can be fairly confident that an emission is indeed occurring, regardless of the exact rate estimates that are being generated. For inventory development, the average of many individual rate estimates will be used to produce an overall estimate for the site at, for example, a monthly or yearly cadence. This means that variability in the individual estimates will be averaged out and the overall bias in the estimates will drive the accuracy of the inventory. The framework is largely unbiased, with quantification estimates having an average percent difference of −3.9% from the true rate for small emissions and 4.3% for large emissions. The proposed framework has already been used for exploratory inventory development on oil and gas production sites, with a case study in Daniels et al. (2023) showing that the framework is better able to capture operational activity than a traditional bottom-up inventory, an encouraging sign of the utility of this framework in real-world settings.
Data accessibility statement
All code and data can be found at https://github.com/wsdaniels/DLQ.
Supplemental files
The supplemental file for this article can be found as follows:
DLQ_SI.pdf
Funding
This work was partially funded by grants from Project Canary and from the Energy Emissions Modeling and Data Lab (EEMDL).
Competing interests
The authors have no competing interests to declare.
Author contributions
Contributed to conception and design: WSD, MJ, DMH.
Contributed to acquisition of data: WSD, MJ, DMH.
Contributed to analysis and interpretation of data: WSD, MJ, DMH.
Drafted the article: WSD.
Revised the article: WSD, MJ, DMH.
Approved the submitted version for publication: WSD, MJ, DMH.
References
How to cite this article: Daniels, WS, Jia, M, Hammerling, DM. 2024. Detection, localization, and quantification of single-source methane emissions on oil and gas production sites using point-in-space continuous monitoring systems. Elementa: Science of the Anthropocene 12(1). DOI: https://doi.org/10.1525/elementa.2023.00110
Domain Editor-in-Chief: Detlev Helmig, Boulder AIR LLC, Boulder, CO, USA
Guest Editor: David Lowry, Earth Sciences, Royal Holloway University of London, Egham, United Kingdom
Knowledge Domain: Atmospheric Science