Application of PCA and Machine Learning for Predicting Oil Measurement Discrepancies in Custody Transfer Systems: Understanding from an Indonesian Mature Onshore Facility

Authors

  • Wan Fadly, Department of Petroleum Engineering, Faculty of Engineering, Universitas Islam Riau
  • Fiki Hidayat, Universitas Islam Riau, https://orcid.org/0000-0003-1407-8952
  • Noratikah Abu, Centre for Mathematical Sciences, Universiti Malaysia Pahang
  • Muhammad Khairul Afdhol, Department of Petroleum Engineering, Faculty of Engineering, Universitas Islam Riau
  • Dike Fitriansyah Putra, Department of Petroleum Engineering, Faculty of Engineering, Universitas Islam Riau
  • Mulyandri, PT. Pertamina Hulu Rokan

DOI:

https://doi.org/10.29017/scog.v48i4.404

Keywords:

Oil measured volume discrepancies, Time-Series Forecasting, Principal Component Analysis, Time Series Cross-Validation

Abstract

Discrepancies in measured oil volume are a persistent challenge in custody transfer systems and are often caused by complex thermal, hydraulic, and compositional interactions. This study introduces a data-driven framework that combines Principal Component Analysis (PCA) and machine learning (ML) to identify and predict such discrepancies at a representative onshore gathering station (GS) in Indonesia (Field-X). Major operational parameters, including gross volume, unallocated net oil, pressure, temperature, and Basic Sediment & Water (BS&W), were analyzed to assess their impact on volumetric imbalance. PCA reduced 64 correlated variables to five principal components that explained 95% of the total variance and showed gross volume, pressure, and temperature to be the dominant factors. Four ML models, namely XGBoost, Random Forest, Support Vector Regression, and ElasticNet, were trained and validated with three-fold time series cross-validation to assess temporal robustness. Incorporating PCA substantially improved predictive performance, with Support Vector Regression showing the largest R² increase (from -0.0082 to 0.82). The results indicate that discrepancies were governed primarily by thermodynamic shrinkage, temperature changes, and BS&W-related metering errors. The proposed PCA–ML framework offers an interpretable and reliable method for early detection and mitigation of oil volume discrepancies in complex production environments.
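
The abstract does not state how the three time series folds were constructed. As a minimal sketch, assuming an expanding-window scheme in which each test fold lies strictly after its training data, the splitting logic and the R² metric could look as follows. The function names `expanding_window_splits` and `r2_score` are illustrative and are not taken from the paper; the split sizes follow the same convention as scikit-learn's `TimeSeriesSplit`, but the code uses only the Python standard library.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(
    n_samples: int, n_splits: int = 3
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; every test index follows every train index."""
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    test_size = n_samples // (n_splits + 1)
    for k in range(n_splits):
        # The test window slides forward; the training window grows to meet it.
        test_start = n_samples - (n_splits - k) * test_size
        test_end = test_start + test_size
        yield list(range(test_start)), list(range(test_start, test_end))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical series of 12 daily records, split into three folds.
    for train_idx, test_idx in expanding_window_splits(12, n_splits=3):
        print("train:", train_idx, "test:", test_idx)
```

Because R² compares the model's squared error with that of a constant prediction at the mean of the test data, a slightly negative value such as the -0.0082 reported for Support Vector Regression without PCA means the model did marginally worse than that constant baseline on held-out data.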

Author Biography

Fiki Hidayat, Universitas Islam Riau

Lecturer (Lektor) - Petroleum Engineering Study Program, Universitas Islam Riau

Published

16-12-2025

Section

Articles