Exploring the Generalization Ability of Machine Learning Models: Predicting Physical Information of Materials with Unknown Elements

Yang Bai¹*, Jingjin He, Yanjing Su and Lijie Qiao

EXTENDED ABSTRACT: Machine learning methods that predict the physical properties of unknown materials from known data must accurately capture the key factors that govern the target properties and the relationships among them. Good generalization ability is also highly desirable: a model built on known data should capture the underlying knowledge well enough to make accurate predictions across a huge search space of unknown samples. Generalization is therefore a key problem in machine learning, and it remains a great challenge to predict accurately as far outside the known data set as possible, including for materials that contain elements absent from that data set. This study takes phase diagram prediction for a multicomponent ferroelectric system as an example to explore how to accurately predict the physical information of materials containing unknown elements. For the complex multicomponent ferroelectric system $(\mathrm{Ba}_{1-x-y}\mathrm{Ca}_x\mathrm{Sr}_y)(\mathrm{Ti}_{1-u-v-w}\mathrm{Zr}_u\mathrm{Sn}_v\mathrm{Hf}_w)\mathrm{O}_3$, a deep neural network with physical features as input and phase transition temperature as target was established to predict the phase diagram, and the element generalization ability of the model was analyzed. Predictions of the phase diagrams of materials doped with unknown isovalent elements (Zr, Hf, Sn) and heterovalent elements (La, Ce, Nd, Sm, Eu, Gd) show that the model predicts accurately when the key feature values of the unknown elements lie within the range of the corresponding features in the training set. The prediction error on test data depends on where those data lie relative to the training set: the error increases as the Euclidean distance between the test data and the training set increases. Conversely, when a predicted sample is close to some samples in the training set, the model more readily captures the relevant physical information and gives a better prediction.
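The two diagnostics described above, whether an unknown element's key features fall inside the training range and how far a test sample lies from the training set in Euclidean distance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy feature values are hypothetical, and features are min-max scaled with training-set bounds so that no single feature dominates the distance (the abstract does not say how features were normalized). The distance to a set is taken here as the distance to the nearest training sample, which is one common choice.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Bounds = List[Tuple[float, float]]


def feature_bounds(train: List[Vector]) -> Bounds:
    """Per-feature (min, max) over the training set."""
    return [(min(col), max(col)) for col in zip(*train)]


def within_training_range(x: Vector, bounds: Bounds) -> List[bool]:
    """For each feature, True if its value lies inside the training range."""
    return [lo <= v <= hi for v, (lo, hi) in zip(x, bounds)]


def scale(x: Vector, bounds: Bounds) -> List[float]:
    """Min-max scale with training bounds; constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(x, bounds)]


def distance_to_training_set(x: Vector, train: List[Vector], bounds: Bounds) -> float:
    """Euclidean distance from x to its nearest training sample (scaled space)."""
    xs = scale(x, bounds)
    return min(math.dist(xs, scale(t, bounds)) for t in train)


# Hypothetical toy data: two features per sample.
train = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]]
bounds = feature_bounds(train)
near = [0.5, 0.5]  # inside the training range on both features
far = [2.0, 0.0]   # first feature lies outside the training range

print(within_training_range(near, bounds))                      # [True, True]
print(within_training_range(far, bounds))                       # [False, True]
print(distance_to_training_set(near, train, bounds))            # 0.5
print(round(distance_to_training_set(far, train, bounds), 4))   # 1.4142
```

In an analysis like the one described, such distances would be computed for every test sample and compared against that sample's prediction error.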
Based on this analysis, adding a small amount of targeted data can substantially improve the generalization ability of the model and, with it, the accuracy of its predictions of the physical properties of materials containing unknown elements.
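The abstract does not specify how the targeted data were chosen. One plausible distance-based strategy, offered here only as an assumed illustration and not as the authors' procedure, is greedy farthest-point selection: from a pool of candidate compositions (already expressed as scaled feature vectors), repeatedly pick the candidate farthest from the current training set, then treat it as if it had been measured and added. The function names and toy data below are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_distance(x: Vector, reference: List[Vector]) -> float:
    """Euclidean distance from x to the closest vector in reference."""
    return min(math.dist(x, r) for r in reference)


def select_targeted_samples(pool: List[Vector], train: List[Vector], k: int) -> List[List[float]]:
    """Greedy farthest-point selection of up to k candidates from pool.

    Each pick is the remaining candidate farthest from the training set
    plus all earlier picks, so the selection avoids redundant neighbours.
    The input lists are not modified.
    """
    reference = [list(t) for t in train]
    remaining = [list(p) for p in pool]
    selected: List[List[float]] = []
    for _ in range(min(k, len(remaining))):
        best = max(range(len(remaining)),
                   key=lambda i: nearest_distance(remaining[i], reference))
        chosen = remaining.pop(best)
        selected.append(chosen)
        reference.append(chosen)  # treat the pick as newly labeled data
    return selected


# Hypothetical scaled features: training data cluster near the origin.
train = [[0.0, 0.0], [0.1, 0.1]]
pool = [[0.05, 0.0], [1.0, 1.0], [0.9, 1.0], [0.0, 1.0]]
picks = select_targeted_samples(pool, train, k=2)
print(picks)  # [[1.0, 1.0], [0.0, 1.0]]
```

Note that the second pick is [0.0, 1.0] rather than [0.9, 1.0]: once [1.0, 1.0] has been added, [0.9, 1.0] is no longer far from the training set, so a small number of well-spread additions covers more of the search space.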

Keywords: ferroelectric materials; machine learning; phase diagram; generalization 

Brief Introduction of Speaker
Yang Bai

Yang Bai is a Professor at the University of Science and Technology Beijing (China). He received his B.S. and Ph.D. in Materials Science and Engineering from Tsinghua University in 2001 and 2006, respectively. In 2012, he was selected for the National Program for Support of Top-Notch Young Professionals and for the New Century Excellent Talents program of the Ministry of Education (MOE). In 2015, he was named one of the Best Scientific Research Workers by the Chinese Association of Young Scientists and Technologists. In 2019, he received the 13th Youth Science and Technology Award of the Chinese Ceramic Society. He has also received a first prize and a second prize of the Natural Science Award of the Ministry of Education. He is currently executive director of the Metamaterials Society of the Chinese Materials Research Society and a fellow of the Advanced Ceramics Society of the Chinese Ceramic Society. He serves on the editorial boards of Journal of Advanced Ceramics, SCIENCE CHINA: Technological Sciences, Journal of Physics D: Applied Physics, and International Journal of Minerals, Metallurgy and Materials, and is a reviewer for more than 80 international academic journals, including Nature Communications, Advanced Materials, and Materials Today.