Data Quality in Petrophysical Machine Learning Workflows: Why it matters
Andy McDonald
July 03, 2025
Well log data is a key data source for petrophysical analysis and machine learning models; however, it can be affected by a range of issues, including sensor malfunctions, borehole environmental conditions, missing data and even human error (see Data Quality in Petrophysical Machine Learning Workflows: Why it matters).
Any errors or inconsistencies within the data that are not identified and dealt with can propagate through entire workflows, leading to less reliable interpretations, poor-performing machine learning models and costly decisions. By catching data issues early on, we can ensure that data quality remains high, thereby maximising any value derived from it.
Interactive Petrophysics (IP) provides a powerful suite of data visualisation and data quality tools to help you better understand your data. These include crossplots, log plots, histograms and a dedicated Log Quality Control module.
Within this article we will explore how these different tools can be used to understand your data and identify data quality issues that may be present.
Interactive Petrophysics (IP) contains numerous tools for visualising well log data, including log plots, crossplots (scatter plots), histograms and raw data listings. It is highly recommended that several different plots are used to interrogate and visualise the data: something that is not apparent on a log plot may become apparent on a crossplot once other variables are considered.
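As a quick complement to IP's built-in viewers, the sketch below shows one way to pull a LAS file into Python for the same kind of first-pass inspection. It uses the open-source lasio and pandas libraries; "example.las" is a placeholder path, and the curve mnemonics will depend on your own data.

```python
import lasio  # pip install lasio

# Load a LAS file into a pandas dataframe for quick inspection.
# "example.las" is a placeholder; curve mnemonics vary by file.
las = lasio.read("example.las")
df = las.df()  # depth becomes the index, curves become columns

print(df.describe())    # summary statistics per curve
print(df.isna().sum())  # count of missing samples per curve
```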
Log plots are a great and very common way to visualise well log measurements with respect to depth. They also allow us to easily compare other measurements at the same depth, which can help us understand if the anomaly we are seeing is isolated to a single measurement or related to other measurements. For example, a washed out section would show a high caliper reading, which in turn could lead to an increase in Sonic Slowness (DTC) and a decrease in Bulk Density (RHOB).
Example of a data spike within the Sonic Slowness (DTC) curve, where measurement values are in excess of 598 µs/ft.
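To illustrate the multi-track idea outside IP, here is a minimal sketch of a log plot built with pandas and matplotlib. The data is synthetic and the curve mnemonics (CALI, RHOB, DTC) are assumptions; the point is simply that a simulated washout shows up consistently across all three tracks at the same depth, just as described above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for a well log dataframe; real data would come
# from a LAS export, with mnemonics matching your own dataset.
rng = np.random.default_rng(42)
depth = np.arange(2500.0, 2600.0, 0.5)
df = pd.DataFrame({
    "DEPTH": depth,
    "CALI": 8.5 + rng.normal(0, 0.1, depth.size),    # inches
    "RHOB": 2.45 + rng.normal(0, 0.02, depth.size),  # g/cc
    "DTC": 85 + rng.normal(0, 1.5, depth.size),      # us/ft
})

# Simulate a washed-out interval: caliper reads high, bulk density
# drops and sonic slowness increases over the same depths.
washout = (df["DEPTH"] > 2540) & (df["DEPTH"] < 2550)
df.loc[washout, "CALI"] += 3.0
df.loc[washout, "RHOB"] -= 0.25
df.loc[washout, "DTC"] += 15.0

# Plot the three curves side by side against depth so an anomaly on
# one track can be checked against the others at the same depth.
fig, axes = plt.subplots(1, 3, figsize=(8, 10), sharey=True)
for ax, curve in zip(axes, ["CALI", "RHOB", "DTC"]):
    ax.plot(df[curve], df["DEPTH"])
    ax.set_xlabel(curve)
    ax.grid(alpha=0.3)
axes[0].set_ylabel("Depth (m)")
axes[0].invert_yaxis()  # depth increases downwards, as on a log plot
plt.tight_layout()
plt.show()
```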
A histogram is used to visualise the distribution of values for a given measurement as well as the frequency of data points within specified ranges. From these charts, we can understand how spread out the data is, if there are outliers or if there is any skewness within the dataset.
If we are working with multiple wells over the same formation, we can easily compare the data distributions between the wells. This can be useful for understanding whether there are any differences between them, which could arise from anything from geological variation to differences between logging tools.
Outliers, provided they fall within the set range for the plot, will appear away from the main body of the distribution.
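The following is a minimal sketch of this kind of comparison, using synthetic gamma ray data for two hypothetical wells; overlaying the histograms makes a distribution shift and a handful of spurious high readings immediately visible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic gamma ray data for two wells over the same formation;
# in practice these would be sliced from your own log dataframes.
rng = np.random.default_rng(0)
wells = {
    "Well A": rng.normal(75, 15, 500),  # GR in API units
    "Well B": rng.normal(85, 20, 500),
}
wells["Well B"][:5] = 350  # a few spurious high readings (outliers)

# Overlay the distributions so shifts, skewness and outliers stand out.
fig, ax = plt.subplots(figsize=(8, 5))
bins = np.linspace(0, 400, 80)
for name, gr in wells.items():
    ax.hist(gr, bins=bins, alpha=0.5, label=name)
ax.set_xlabel("Gamma Ray (API)")
ax.set_ylabel("Frequency")
ax.legend()
plt.show()
```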
A crossplot (also known as a scatter plot) is used to visualise the relationship between two or more variables, with one variable on each axis. A third variable can also be represented, for example as a colour scale or by varying the point size.
They are a common tool for understanding the relationship between the variables and identifying patterns within the data, especially for different lithologies.
In the crossplot example below, we are plotting neutron porosity vs bulk density. These data are then coloured by a data quality flag generated by the Log QC module. When there is washout, data points tend towards the upper right as density drops and neutron porosity increases. These points have been highlighted by the data quality flag, in addition to a few points that are in the centre and require further attention.
An Interactive Petrophysics density-neutron crossplot can be used to identify data impacted by borehole washout (points flagged in red).
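A rough Python equivalent of this plot is sketched below using synthetic data, with a boolean washout flag standing in for the flag curve generated by the Log QC module; the flagged points drift towards the upper right, exactly as described above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic density-neutron data; a real workflow would use NPHI and
# RHOB curves plus a QC flag exported from the Log QC module.
rng = np.random.default_rng(1)
n = 400
nphi = rng.normal(0.18, 0.04, n)                    # v/v
rhob = 2.71 - 1.8 * nphi + rng.normal(0, 0.02, n)   # rough trend

flag = np.zeros(n, dtype=bool)
flag[:30] = True    # points affected by washout
nphi[:30] += 0.12   # neutron porosity increases in washout
rhob[:30] -= 0.25   # bulk density drops in washout

fig, ax = plt.subplots(figsize=(7, 6))
ax.scatter(nphi[~flag], rhob[~flag], s=10, label="Good data")
ax.scatter(nphi[flag], rhob[flag], s=10, c="red", label="Washout flag")
ax.set_xlabel("Neutron Porosity (v/v)")
ax.set_ylabel("Bulk Density (g/cc)")
ax.invert_yaxis()  # density conventionally decreases up the y-axis
ax.legend()
plt.show()
```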
The Log QC module within Interactive Petrophysics (IP) has been designed to help you flag potential issues within your well data. It is used to highlight and draw your attention to potentially problematic intervals, such as missing data, constant values, data affected by badhole conditions and values outside expected limits.
The Interactive Petrophysics Log Quality Control module is used to flag areas of poor quality data, missing data and data affected by borehole washout.
By using the defaults for the selected inputs, you can build up a general idea of the quality of your data. These defaults can easily be changed to suit the experience gained within your project.
In the example image above, we can clearly see where the problematic areas are. At the top, we have data highlighted in red, indicating that certain badhole limits have been reached (e.g., the caliper exceeds our upper limit).
Any curves that are missing from the inputs will be shaded in grey, making it immediately obvious that data is missing.
Further down the log in the gamma ray track (Track 3), an interval has been highlighted in blue, indicating that we have constant values.
Overall, this builds up a picture of the quality of our data: we can see what is missing, what is impacted by badhole conditions and what lies outwith our expected limits for each input.
The Log Quality Control module can also be used to flag values that fall outside our expectations and require further attention, such as the high gamma ray shown in Track 2.
In a slightly different example, the log above shows what happens when values fall outside of your expected limits (SGR greater than 200 API). This is not necessarily a bad thing, but it highlights that something is out of the ordinary.
Likewise, when we have extreme values, such as RHOB (Track 3) falling below 1 g/cc, we know that something may not be right with this data.
Both of the above would require further investigation to determine whether the data is real (which it could be: a hot shale could explain the high gamma ray, while a tool issue could explain the near-zero RHOB reading) and what to do with it (for example, excluding it from further interpretation or repairing it with machine learning algorithms).
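The checks described above are straightforward to reason about. The sketch below is a simplified, hypothetical version of that kind of logic in pandas. The curve names, limits and window length are all assumptions to adapt to your own data, and this is not the Log QC module's actual implementation.

```python
import numpy as np
import pandas as pd

# Synthetic log data with deliberately injected problems.
rng = np.random.default_rng(7)
depth = np.arange(3000.0, 3050.0, 0.5)
df = pd.DataFrame({
    "CALI": 8.5 + rng.normal(0, 0.1, depth.size),   # inches
    "GR": rng.normal(80, 20, depth.size),           # API
    "RHOB": 2.4 + rng.normal(0, 0.03, depth.size),  # g/cc
}, index=pd.Index(depth, name="DEPTH"))
df.loc[3010:3015, "CALI"] = 13.0    # simulate a washed-out interval
df.loc[3020:3025, "GR"] = 75.0      # simulate constant (stuck) values
df.loc[3030:3032, "RHOB"] = np.nan  # simulate missing data
df.loc[3040:3042, "GR"] = 250.0     # simulate out-of-range values

# Build one boolean flag per check, sample by sample.
flags = pd.DataFrame(index=df.index)
flags["badhole"] = df["CALI"] > 10.0        # caliper above an upper limit
flags["missing"] = df.isna().any(axis=1)    # any input curve missing
# Constant values: effectively zero variation over a short window.
flags["constant_gr"] = df["GR"].rolling(10).std() < 1e-6
flags["gr_out_of_range"] = df["GR"] > 200.0  # e.g. SGR above 200 API

print(flags.sum())  # number of samples tripping each check
```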
To access the Log QC module, navigate to the Edit menu and select Log QC.
After data quality has been assessed and managed, a number of benefits follow, including more reliable interpretations, better-performing machine learning models and more confident business decisions.
As well log data is a primary data source for petrophysical analysis and petrophysical machine learning models, it is essential that the data is of good quality and trustworthy. This in turn allows us to make confident and effective business decisions based upon any analysis derived from that data.
After loading your data, it is essential to identify and deal with any data quality issues. These can be caused by sensor failures, borehole environmental conditions or even missing data. By catching these issues early on, you can ensure that your data quality remains high and thereby maximise the value that can be derived from it.
Interactive Petrophysics (IP) contains several modules that can help you understand your data and identify data quality issues before they become costly. Data visualisation is a key part of the process and can be done using crossplots, histograms and log plots, as well as other tools. The Log Quality Control module provides a simple and easy way to flag potential outliers, missing data and data affected by bad hole conditions.
By using these tools in Interactive Petrophysics (IP), you can ensure that you are following steps to identify data quality issues early on, which can save you time and headaches further down the interpretation pipeline.
References

McDonald, A., 2021. Data Quality Considerations for Petrophysical Machine-Learning Models. Petrophysics, 62(6), pp. 585-613. Available at: https://onepetro.org/petrophysics/article-abstract/62/06/585/473276/Data-Quality-Considerations-for-Petrophysical

McDonald, A., 2022. Impact of Missing Data on Petrophysical Regression-Based Machine Learning Model Performance. SPWLA 63rd Annual Logging Symposium, June 2022. Available at: https://onepetro.org/SPWLAALS/proceedings-abstract/SPWLA22/4-SPWLA22/D041S016R004/487867

Banas, R., McDonald, A. and Perkins, T.J., 2021. Novel Methodology for Automation of Bad Well Log Data Identification and Repair. SPWLA 62nd Annual Logging Symposium, May 2021. doi: https://doi.org/10.30632/SPWLA-2021-0070