The Hidden Cost of Dirty Data: Why AI in Mining Demands Robust Data Governance
Author: Sarada MW Lee, Perth Machine Learning Group
Published: Jul 16 | Updated: Jul 29

As the mining industry accelerates toward digital transformation, AI and machine learning (ML) are becoming indispensable tools for optimising operations, reducing costs, and enhancing safety.
However, behind the promise of predictive algorithms lies a vital truth: the quality of insights depends entirely on the quality of data. Garbage in, garbage out (GIGO) is not just a cliché - it’s a harsh reality in AI.
Start with Digitalisation
Before deploying AI solutions, mining operations must prioritise digitising paperwork and legacy systems (e.g. SCADA logs, handwritten maintenance forms). This key step ensures that data is accessible, traceable, and ready for analysis. Without digital records, even the most advanced AI models are ineffective.
Fit-for-Purpose Use Cases
AI success begins with asking the right questions. Not every problem calls for a complex model. Concentrate on use cases where measurable variables have a causal link to the target result, whether it's predicting equipment failure, optimising ore grades, or boosting haulage efficiency. It's also crucial to realise that not all valuable observations are captured by current systems. Human knowledge, undocumented procedures, contextual factors, and observations beyond what sensors can quantifiably measure often escape digital records. As a result, even the most advanced AI models can overlook important patterns or draw incorrect conclusions if the data doesn't show the complete picture.
Understanding and Managing Data Bias
Biases in data, whether from incomplete records, historical skew, or sensor issues, can lead to flawed insights and risky decisions. A solid data governance framework includes checks for bias with clear strategies for mitigation.
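As a quick illustration, a governance check for representation and temporal skew can be as simple as a few summary statistics. The sketch below assumes a hypothetical maintenance-log export with columns such as site, equipment_type, failure_recorded and record_date; the file and column names are illustrative, not a prescribed schema.

```python
import pandas as pd

# Hypothetical maintenance-log export; file and column names are assumed.
logs = pd.read_csv("maintenance_logs.csv")

# Representation check: if one site or equipment type dominates the history,
# a model trained on it will be skewed toward that context's failure modes.
print(logs["site"].value_counts(normalize=True))
print(logs.groupby("equipment_type")["failure_recorded"].mean())

# Temporal skew check: records concentrated in a few years can hide older
# operating regimes (different ore bodies, sensor fleets, work practices).
logs["year"] = pd.to_datetime(logs["record_date"]).dt.year
print(logs["year"].value_counts().sort_index())
```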
Handling Outliers and Missing Data
Not all outliers are errors to be discarded. Some indicate safety incidents, rare events caused by changes in input materials, or other unexpected environmental conditions. Understanding their context through domain expertise is crucial. Similarly, missing data can skew analysis if overlooked. Techniques such as interpolation, imputation, or exclusion should be used with care.
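A minimal sketch of that kind of care, assuming a hypothetical hourly bearing-temperature series with timestamp and temp_c columns: outliers are flagged for expert review rather than silently dropped, and only short gaps are interpolated.

```python
import pandas as pd

# Hypothetical hourly sensor series; file and column names are assumed.
df = (pd.read_csv("bearing_temp.csv", parse_dates=["timestamp"])
        .set_index("timestamp"))

# Flag outliers with a simple IQR rule and keep them for domain review;
# some may be genuine safety events rather than noise.
q1, q3 = df["temp_c"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = (df["temp_c"] < q1 - 1.5 * iqr) | (df["temp_c"] > q3 + 1.5 * iqr)

# Interpolate only short gaps (here, up to 4 hours); longer gaps stay NaN
# rather than inventing readings across an unknown operating state.
df["temp_filled"] = df["temp_c"].interpolate(method="time", limit=4)

print(df.loc[df["is_outlier"], "temp_c"].head())
```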
Understanding Relationships Between Data Variables
Understanding how different variables in your data interact is essential. If seemingly independent variables are actually connected, AI models can make false assumptions, leading to inaccurate predictions. Ensuring data variables are genuinely independent where necessary helps create more reliable and meaningful models.
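One practical way to test that assumption is to look at pairwise correlations and variance inflation factors before modelling. The sketch below uses pandas and statsmodels on a hypothetical process dataset; the column names are placeholders for whatever variables your model actually uses.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical process dataset; column names are illustrative only.
data = pd.read_csv("plant_process_data.csv")
features = ["feed_rate", "mill_power", "ore_hardness", "water_addition"]

# Strong correlation between supposedly independent inputs is a warning sign.
print(data[features].corr().round(2))

# Variance inflation factor (VIF): values well above ~5-10 suggest a variable
# is largely explained by the others and may mislead the model.
X = data[features].dropna()
for i, name in enumerate(features):
    print(name, round(variance_inflation_factor(X.values, i), 2))
```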
Getting AI-Ready: Where Mining Data Collection Must Start
As mining operations increasingly automate asset maintenance, production monitoring, and environmental compliance, now is the time to audit your data collection processes. If you're relying on AI to optimise these areas, ensure your data is digital, complete, and governed with rigour. Start by identifying gaps in your current systems — because dirty data doesn’t just compromise insights, it undermines your decisions.
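A first pass at such an audit can be a short completeness and recency check. The sketch below assumes a hypothetical export of an asset or maintenance register; the file and field names are placeholders.

```python
import pandas as pd

# Hypothetical register export; file and column names are assumed.
register = pd.read_csv("asset_register_export.csv")

# Field completeness: columns near zero are effectively not being captured.
completeness = register.notna().mean().sort_values()
print(completeness)

# Data recency: stale records are a governance gap in their own right.
last_entry = pd.to_datetime(register["last_inspection_date"]).max()
print("Most recent inspection on record:", last_entry)
```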
About the Author
Sarada is a co-founder of Perth Machine Learning Group and a Director of medical imaging start-up, Assisted Evolution Pty Ltd. She received the Women in Technology WA Tech [+] 20 Award in 2019.
Sarada brings machine learning/artificial intelligence and strong corporate governance together to help companies evolve. She is passionate about sharing knowledge and supporting diversity in tech.
About EXTAG
EXTAG is a powerful software platform designed for asset-intensive organisations to ensure all assets – no matter how minor – are compliant, safe, and ready for use.
Proven in Oil & Gas since 2018, EXTAG’s proprietary platform complements existing systems to deliver cost-effective, ERP-level control of excluded assets – making Asset Managers’ jobs simpler and their results visible.
Media Contact
Lan Tran
Chief Relationship Officer
0412 026 208



