August 2023

Machine learning-assisted induced seismicity characterization of the Ellenburger formation, Midland basin

A hybrid geophysical method enables engineers to understand the relationship between saltwater injection in the Ellenburger formation of West Texas and associated earthquake activity. The innovative combination of technologies provides operators with a reliable earthquake probability map to guide planning of injection wells and schedules to avoid environmental issues.
Niven Shumaker / SLB Dr. Kaustubh Shrivastava / SLB Mohamed Afia / SLB

Produced water disposal is a key challenge associated with oil and gas operations. One of the popular techniques to dispose of produced water is through its reinjection in subsurface. This reinjection can lead to an increase in the pressure in the formation used for disposing produced water and can result in increased earthquake activity. This can have a significant impact on the environment and add uncertainty to design and planning decisions.  

The northern Midland basin has seen a recent uptick of basement fault-related seismic events, commonly referred to as induced seismicity, which correspond to increased industrial activity since 2017. Previous investigations in the Permian basin have focused on characterizing earthquake focal mechanisms, poroelasticity, fault-slip potential, and multivariate statistical models.  

In this article, we focus on characterizing the relationship between induced pore pressure in the heterogeneous Ellenburger saltwater disposal (SWD) units and observed seismicity in the northern Midland basin. The novelty of this investigation is the incorporation of high-quality semi-regional 3D seismic data for structural and stratigraphic interpretation, modeling of reservoir heterogeneity, and the coupling of physics-based reservoir simulation with multivariate statistics, ML, and optimization workflows. 


Fig. 1. Simplified stratigraphic column in our area of interest in the northern Midland basin. Structural interpretation of the 3D seismic area focused on the Strawn (green) seismic horizon and underlying strata. Seismic attribute work identifying karst facies was performed on stratal slices between the Strawn and Ellenburger (light blue). A reservoir model was built for the upper Ellenburger. Most earthquake hypocenters are in the basement (orange).

Historically, the San Andres formation of the northern Midland basin has been suitable for SWD, given its high porosity and low drilling and completion (D&C) cost. However, incidences of drilling challenges and interference into lower productive units caused by overpressuring has caused operators to seek other alternatives, such as the underlying Lower Ordovician Ellenburger formation. While D&C costs of the Ellenburger are considerably higher because of its depth and the overlying pressure benches, the Ellenburger is regionally extensive, known to have prolific karst facies of high injectivity index, and is sealed by overlying Permian mudrocks, making it a suitable candidate for SWD. Interpretation of 3D seismic data indicate a Lower Paleozoic structural system, largely decoupled from the overlying Permian mudrocks. In this article, we focus on structural interpretation and earthquake seismicity of the Lower Paleozoic/basement section, Fig. 1. 


Overview and data sources. Our methodology discussed herein is a hybrid of a traditional petrotechnical and a data science workflow, where features such as basement structuring and disposal unit pressure are derived from semi-regional 3D seismic interpretation, geocellular property modeling, and reservoir simulation. These features are subsequently fed into multivariate ML engines for the purpose of earthquake risk characterization. The generated workflow is, thereafter, also used to run what-if and optimization scenarios, Fig. 2. 

Fig. 2. Process diagram depicting key elements of our method. 3D seismic interpretation is used as the basis for reservoir model construction and feature engineering that is passed to the multivariate classification model. Underground injection control records and proprietary sources are used to construct wellbore models, calculate injectivity index and to history-match the reservoir model.

Data used for this analysis were derived from various public sources. The historical injection data were taken from the Texas Railroad Commission underground injection control (UIC) records. UIC records contain wellbore information, such as top and base of injection zone, monthly data on fluid volumes, and maximum and average injection pressures. Earthquake data were obtained from the TexNet earthquake catalog. In addition to public sources, proprietary datasets were used for obtaining regional 3D seismic (WesternGeco). 

SWD well performance and injection history. To assess reservoir injectivity, in this study we used a Hall diagnostic plot, Fig. 3. The Hall Plot is a graphical representation of the pressure response of a well to injection, which helps to identify the main factors affecting the well's performance, such as reservoir heterogeneity and wellbore damage. The plot shows the aggregate daily-weighted injection pressure vs. cumulative injection volume. Given the large number of injection wells in our area of interest, we adopted a linear tree regressor to automatically flag poor data, interpret flow regimes, and calculate estimates of reservoir injectivity index. The calculated reservoir injectivity index was then employed for calibrating the reservoir model. 

Fig. 3. Hall diagnostic plot divided into injectivity index segments (dashed gray lines), using a linear tree regressor. This high-rate injection well demonstrates high injectivity index in underfilled karst zones and progressive filling of tighter pore volumes and possibly increasing completion skin.

Structural interpretation and reservoir modeling. We used semi-regional 3D seismic data to interpret stratigraphic horizons and structural faults in the reservoir. This enabled us to generate facies maps that further informed our reservoir simulation model. Because of the structural complexity of the Lower Paleozoic in the Midland basin, we used image-segmentation-based, ML-assisted fault interpretation methods to generate fault point sets. This analysis indicated that the deep faults within our interpretation area are typically dipping steeply (>65°). Using the ML-assisted fault interpretation enabled us to quickly process the semi-regional 3D seismic data and enabled us to identify small branch faults, subtle juxtapositions, and radial karst faults in the dataset.  

In addition, the process enabled us to leverage curvature attributes to extract karst facies from the seismic dataset. We use seismic curvature attributes to identify karst features on stratal slices in the Ordovician, Fig. 4. Karst features are used to identify reservoir facies and are used as a feature in the multivariate earthquake prediction model. While structural faults are manifest on the curvature attributes, we rely on the fault point artifacts from the ML-assisted fault interpretation to delineate structural faults. 

Fig. 4. Seismic curvature attribute from Ordovician stratal slice, demonstrating the presence of karst zones (circular features near white arrow) and structural faults (linear features shown by the pink arrow). Green arrow indicates north.

To generate Ellenburger reservoir model layers, we used the facies model presented by Kerans (1988) and Sanchez (2019) as a first-order guide, Fig. 5. We extended the generated Ellenburger model up section into the Simpson group and into non-reservoir Lower Ellenburger. The reservoir model was subsequently populated with grid faults, and local grid refinements were carried out near complex faults in the model. Except for the local grid refinements, the base grid size in the reservoir simulation was set at 1,000 ft x 1,000 ft x 100 ft. To calibrate fault transmissibility, permeability anisotropy, and reservoir properties, we used injectivity index estimates generated from the linear tree regressor and also employed standard history-matching workflows.  

Fig. 5. Generalized facies model. Key layers with high permeability are the cave roof and lower collapse zone. Property modeling is calibrated to injectivity indices derived from the Hall plots when available. Many high-rate injection wells are located near karst features (after Kerans 1988).

For the simulation, the Ellenburger formation was assumed to be normally pressured at initial conditions and was saturated with high-salinity brine. To ensure stability of the reservoir model while simulating flow of a slightly compressible fluid in a saturated reservoir, a large numerical aquifer is used at the model boundaries. This enabled us to calculate pressure distribution in the reservoir as salt water is injected into the associated disposal wells, Fig. 6. The increased (induced) reservoir pressure caused by injection of salt water is then used to characterize earthquake risk in the region, using ML. 

Fig. 6. Representative history match of semi-regional Ellenburger reservoir model. Average monthly tubinghead pressures from UIC records are converted to flowing bottomhole pressures and are used as a guide to tune hydraulic properties of the reservoir model, such as grid cell poro-perm, fault transmissibility, and aquifer size.

Feature engineering and machine-learning pipeline. An ML pipeline was developed to build and deploy a multivariate classification scheme to characterize earthquake risk.

The fundamental unit of analysis in the constructed pipeline is a calendar month in the time domain and a reservoir model grid cell in the spatial domain. Simulated reservoir pressures and earthquake data were aggregated on a monthly basis. Static geological interpretation points such lineament positions, lineament orientation, and karst facies are aggregated to the nearest reservoir model grid cell and repeated for all time steps in the simulation. The calculated features were then scaled and passed into our modeling framework. In the workflow, earthquake data were filtered to exclude earthquakes from the study that occurred at depths shallower than the base of the Permian section. The remaining deeper earthquake epicenters were aggregated to the nearest reservoir grid cell in the reservoir model.  

The processed dataset was split into a training/testing set stratified on the class label of earthquake/no earthquake to use in the model training workflow. As earthquake occurrence data were spatially skewed in our reservoir model, class weighting was applied to the dataset during the model training to correct for the imbalance. It was observed that the reservoir simulation grid had a 400:1 imbalance between grid cells that did not contain an epicenter and grid cells that did contain an epicenter.  

To develop the optimal ML model, we conducted a systematic study to understand the performance and applicability of various classification models on the problem of characterizing earthquakes. We observed that tree-style schemes typically overfit the training data. After our methodical investigation, we found that multivariate logistic regression and a simple three-layer neural network are optimum choices for addressing the problem. The multivariate logistic regression results in a smoother prediction kernel while the neural network emphasizes the local features, such as the lineament trends. We designed an ensemble of these two types of models to produce a final classification that can be more robust and accurate than the predictions of these two individually, Fig. 7. 

Fig. 7. Receiver operating characteristic evaluating the test dataset of the earthquake binary classification scheme. Area under curve score is shown in the legend. Both models are trained on the same six input features. The three-layer neural network has slightly higher accuracy and emphasizes the basement lineament set more than the logistic regression.


A large body of existing work in the literature is based on deterministic Mohr-Coulomb-style fault-slip potential (DFSP), where the unit of analysis is a given fault plane within a background stress field. DFSP depends on estimates of reservoir pressure, internal friction of the material, and knowledge of the principal stresses acting on the fault plane. In practice, rigorous geomechanical modeling is limited by lack of relevant data acquisition, such as borehole image logs, dipole sonic, leakoff tests and formation pressure measurements that are needed to estimate the principal stresses and rock strength along a 1D wellbore and, by extension, within a semiregional 3D area of interest.  

The novelty of our data science method is that relevant geomechanical properties are inferred by linking engineered features (such as reservoir pressure, distance to a basement lineament, Riedel shear-stress projection on the lineament set, and prevalence of karst that are derived from semi-regional 3D seismic interpretation) with historic earthquake events.  

The results of our earthquake prediction model are shown in Figs. 8a and 8b. Our model was trained to identify subsurface conditions correlated with earthquake events and will err on the side of depicting a false positive at any given grid cell in our model. Partial dependency plots demonstrate earthquake probability is the highest for grid cells: 1) that are not associated with a karst feature; 2) are above a pressure threshold; and 3) are near a basement lineament oriented along a Riedel shear set implied by a N79°E principal displacement zone, which is consistent with SHmax in the northern Midland basin, reported by Snee and Zoback (2018).  

Fig. 8. Classification scheme outputs inferred in December 2022. Color is probability of earthquake, calculated at grid cells of the reservoir model. Red dots are the epicenters of earthquakes for all dates up to December 2022. Injection wells are indicated with colored circles. The green arrow is north.

(with Fig. 8: The classification probability can be interpreted as a seismic risk map, indicating the similarity of conditions historically associated with earthquake epicenters and the conditions of the cells of the reservoir model for the given timestep. Figure 8a shows an earthquake swarm in the vicinity of three injection wells from the northern part of our study area. The map indicates earthquake positions concentrated along a pressurized lineament set. The white arrow annotated “K” indicates the presence of a karst feature interpreted from seismic curvature attributes. In our study area, few earthquake epicenters are coincident with karsts. Fig. 8b shows earthquake swarms concentrated along a major ENE structural fault zone. Red coloring indicates high reservoir pressure along the basement lineament set in the vicinity of the two active injection wells (peach and yellow). White arrow annotated “L1” shows a preferentially oriented lineament, whereas white arrow annotated L2 shows a non-preferentially oriented lineament. The white arrow annotated “NL” shows an area away from a basement lineament (not lineament). Like in Fig. 8a, the purple circular features are karsts.)

Physical process model. Most earthquake hypocenters in our dataset are positioned below the base of the Paleozoic section, which is between 12,000 and 14,000 ft in our area, whereas reservoir pressure was modeled, and faults were picked, in the Paleozoic section. Consistent with other investigators, we propose that structural faults transmit pressure into deeper basement layers which, in turn, induces fault slip, Fig. 9. 

Fig. 9. A process model consistent with our observations. Pressuring of the Ellenburger in the vicinity of basement lineaments is correlated with basement seismicity in the northern Midland basin (Walsh et al. 2017).

We observed that in some areas of our 3D seismic footprint, earthquake epicenters are consistent with the presence of a Paleozoic fault system, whereas in some areas there are linear earthquake swarms that are not associated with a nearby structural fault. In addition, we observed that these linear earthquake swarms were on trend with a mapped fault set. Following a similar lineament mapping methodology often employed to infer joints, faults, and shear zones in the subsurface, based on topographic maps, we generated a basement lineament set by connecting on-trend fault segments mapped in the Lower Paleozoic. To generate the basement lineament set, we employed a Hough transform (commonly used for edge detection in image processing) to quantitatively generate the lineament set from the fault interpretation points.  

It was observed that as a predictive feature, the lineament set is effective in predicting both earthquake epicenters in the vicinity of a mappable Paleozoic fault plane and linear earthquake swarms not directly associated with a mappable fault plane. In the context of our multivariate classification model, the basement lineament set is capturing implied hydraulic properties of a discrete fracture network (DFN) of high permeability and low storage porosity.  

In our multivariate model, earthquake probability increases on a preferentially oriented lineament that is coincident with high pore pressure in the overlying Ellenburger group, caused by water injection. Grouping the Ellenburger and basement as a common geomechanical unit in the current tectonic stress regime, the lineament set indicates preferred planes of strain connected by preexisting weaknesses in the rock fabric of the basement and Lower Paleozoic section.  

SWD optimization scheme. Given the trained multivariate classification model, what-if forecasting and planning scenarios can be used to optimize SWD rates, given known proximity to a karst feature, basement lineament presence and orientation, and reservoir pressure while remaining below an induced seismicity risk tolerance. Figure 10 shows the result of an injection rate maximization scheme in the vicinity of a major ESE structural fault.  

Fig. 10. A SWD injection rate maximization scheme, based on the trained multivariate classification model. Color is reservoir pressure at grid cells along the fault; cyan streamlines demonstrate flow away from the injection wells. The injection well rate, at 1,000 barrels of water per day, is labeled above each of the three injection wells. Maximum injection rate depends on the background reservoir pressure from historic injection, well spacing, distance to faults and lineaments and presence of karst.


The methodology described in this article provides a tool to explain historic seismicity and to inform asset development planning, the regulatory permitting process, and interaction within adjacent operators and the SWD industry. Our model uses traditional “seismic to simulation” workflows as the basis for custom feature engineering that is passed into multivariate classification models that infer the probability of an earthquake epicenter occurring at a given grid cell in our semiregional reservoir model.  

Our findings are that grid cells with higher pore pressure caused by water injection, proximity to a basement lineament preferentially oriented relative to a Riedel shear set, and not positioned at an Ellenburger karst, are more likely to coincide with an earthquake epicenter as reported by TexNet. Novelties of this work are the incorporation of Ellenburger heterogeneity on a semi-regional scale, characterization of a basement lineament set based on detailed mapping of 3D seismic data, and the imputation of relevant geomechanical features commonly used in DFSP analysis, using ML workflows calibrated to historic earthquake data. 


The authors would like to thank SLB and WesternGeco for allowing them to present this work. This article contains excerpts from URTeC paper 3855873, “Machine-learning-assisted induced seismicity characterization and forecasting of the Ellenburger formation in northern Midland basin,” presented at the Unconventional Resources Technology Conference, held in Denver, Colo., June 13-15, 2023. 

About the Authors
Niven Shumaker
Niven Shumaker is a data scientist and innovation advisor on the Global Innovation Factori team at SLB where he builds new AI-driven solutions for reservoir simulation, pressure transient analysis and seismic interpretation.
Dr. Kaustubh Shrivastava
Dr. Kaustubh Shrivastava DR. KAUSTUBH SHRIVASTAVA is a data scientist with the Houston Innovation Factori at SLB where he develops AI driven solutions for the oil and gas industry.
Mohamed Afia
Mohamed Afia is a Cairo-based senior reservoir geophysicist at SLB.
Related Articles FROM THE ARCHIVE
Connect with World Oil
Connect with World Oil, the upstream industry's most trusted source of forecast data, industry trends, and insights into operational and technological advances.