A Review of Molecule Featurization Methods for Materials Informatics

Submitted to 6th For MGE

Wei Wang*

1The Hong Kong University of Science and Technology (Guangzhou), China;

2The Hong Kong University of Science and Technology, Hong Kong SAR, China

EXTENDED ABSTRACT: Machine learning approaches, including deep learning, play a critical role in materials informatics and have demonstrated great success in material property modeling (prediction or regression tasks) and material design. An important step in applying machine learning is featurization (also known as descriptor generation, feature extraction, feature engineering, or representation learning), which converts input objects into a representation (typically a high-dimensional vector) amenable to downstream machine learning models. In this paper, we review existing featurization approaches for modeling molecules in materials informatics and group them into major categories (such as feature preprocessing, strings, graphs, matrices, and topological analysis). We then discuss recent progress in machine learning that can supplement and enhance the existing featurization approaches.
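As an illustration of the matrix category, the sketch below computes a Coulomb matrix, one widely used matrix-style descriptor, and flattens it into a fixed-length vector. It is a minimal example under stated assumptions, not a method taken from this abstract: the helper names (`coulomb_matrix`, `featurize`), the small element table, and the approximate water geometry are all hypothetical, and the coordinate units are left unconverted (the descriptor is usually defined in atomic units).

```python
import math

# Nuclear charges for a few elements (illustrative subset, extend as needed).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def coulomb_matrix(symbols, coords):
    """Return the Coulomb matrix of a molecule as a list of lists.

    Diagonal entries are 0.5 * Z_i ** 2.4; off-diagonal entries are
    Z_i * Z_j / |R_i - R_j|.
    """
    charges = [ATOMIC_NUMBER[s] for s in symbols]
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def featurize(symbols, coords, max_atoms):
    """Flatten the upper triangle of a zero-padded Coulomb matrix.

    Padding to max_atoms gives every molecule a vector of the same
    length, max_atoms * (max_atoms + 1) / 2.
    """
    n = len(symbols)
    if n > max_atoms:
        raise ValueError("molecule has more atoms than max_atoms")
    matrix = coulomb_matrix(symbols, coords)
    vector = []
    for i in range(max_atoms):
        for j in range(i, max_atoms):
            inside = i < n and j < n
            vector.append(matrix[i][j] if inside else 0.0)
    return vector


# Approximate, illustrative geometry for water (assumed values).
water_symbols = ["O", "H", "H"]
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
features = featurize(water_symbols, water_coords, max_atoms=4)
```

The resulting vector depends on the order in which atoms are listed. Practical pipelines typically remove that dependence, for example by sorting rows and columns by norm or by using the eigenvalue spectrum instead of the raw entries.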

Keywords: Featurization, Machine Learning, Materials Informatics, Representation Learning

Brief Introduction of Speaker

Wei WANG graduated from the Department of Computer Science, The Hong Kong University of Science and Technology in 2004. He is currently a Professor in the Data Science and Analytics Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou). He has published more than 160 papers in reputable journals and conferences, and has won Best Paper Awards at SIGCOMM 2022 and ICMR 2021 and the Best Student Paper Award at DASFAA 2016. He is an Associate Editor of IEEE Transactions on Knowledge and Data Engineering and Journal of Materials Informatics.