Journal of Computational Intelligence in Materials Science


A Theoretical Review on Improving Predictive Accuracy and Mitigating Overfitting in Materials Informatics




Received On : 29 December 2023

Revised On : 28 March 2024

Accepted On : 04 April 2024

Published On : 29 April 2024

Volume 02, 2024

Pages : 068-076


Abstract


In machine learning, where a model must identify the function that best fits a given set of data, overfitting is a common concern. To improve predictive accuracy, this paper investigates the problem of overfitting and recommends novel techniques for establishing hypothesis functions derived from data. It also discusses the need for additional data collection and the challenges of handling unrepresentative datasets. For effective prediction, the paper emphasizes the selection of effective descriptors and feature extraction methods, with a focus on entropy-based and decision tree models within the field of informatics. In addition, it reviews principal component analysis (PCA) and model interpretability in applications. The paper concludes with a discussion of the selection of standard models and machine learning algorithms to enhance performance. The discussions in this article provide a basis for understanding the processes involved in advancing content-based reporting models, emphasizing the necessity of gathering essential data, developing and refining sophisticated models, and putting them to practical use.


Keywords


Overfitting, Materials Informatics, Principal Component Analysis, Predictive Accuracy, Descriptor Selector.



Acknowledgements


The authors would like to thank the reviewers for their helpful comments on the manuscript.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


No data are available for this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access. This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows the material to be redistributed commercially or non-commercially, but the user cannot make any changes whatsoever to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Razak bin Osman, “A Theoretical Review on Improving Predictive Accuracy and Mitigating Overfitting in Materials Informatics,” Journal of Computational Intelligence in Materials Science, vol. 2, pp. 068-076, 2024. doi: 10.53759/832X/JCIMS202402007.


Copyright


© 2024 Razak bin Osman. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.