Deep learning based molecular generation strategies for different materials

Xuemei Pu1*, Ming Sun1, Guangchuan Wang2, Chenghui Wang2, Haoming Su1, Jing Liu, Songran Yang1

1College of Chemistry, Sichuan University, Chengdu 610064

2College of Computer, Sichuan University, Chengdu 610064

EXTENDED ABSTRACT: Deep learning has achieved great success in diverse fields, including molecular generation. However, identifying appropriate generation strategies for different material fields remains challenging, for example across low-data and sufficient-data regimes. Motivated by this challenge, we probe different deep learning-based generation frameworks. 1) For energetic materials in the low-data regime, we explore a correlated deep learning framework consisting of three recurrent neural networks (RNNs) linked by a transfer learning strategy, to efficiently generate new energetic molecules with high detonation velocity when only very limited data are available. To avoid dependence on a large external dataset, we apply data augmentation by fragment shuffling and Simplified Molecular Input Line Entry System (SMILES) enumeration, and introduce pretrained knowledge to improve the RNN-based generation and prediction models. Starting from only 303 energetic molecules, we obtained thirty-five new molecules with higher detonation velocity and lower synthetic accessibility scores than the classic explosive RDX. 2) For metal-organic frameworks (MOFs), for which relatively sufficient data are available, a previous limitation is that the complex structures of MOFs often cause machine learning (ML) models to predict inaccurately and to generate unrealistic material structures when sequence-based representations such as SMILES and SELFIES are used. To address this limitation, we develop a novel graph-based MOF representation and design an inverse-design ML framework to generate desirable MOFs. We construct a multi-component input from the graph representation and use the corresponding encoders to encode the input into a variational autoencoder (VAE)-based generative model. Meanwhile, a property predictor is trained on descriptors from the latent space. Our model achieved 82% reconstruction accuracy, 100% prior validity, and a coefficient of determination of 0.95 for the predictor, greatly outperforming previous competitive models. Building on this DL-based framework, we further introduce an optimization algorithm to search for MOFs, with CO2 uptake in CO2/N2 separation as the target property. Finally, some of the newly generated MOFs can capture 2.7 mol/kg of CO2, comparable to the best adsorbent materials, demonstrating the success of our model.
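
For readers less familiar with the predictor metric quoted above, the coefficient of determination can be computed as in the minimal sketch below. It assumes the standard definition R^2 = 1 - SS_res / SS_tot; the function name r_squared and the numerical values are hypothetical illustrations and are not data or code from this work.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the reference values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical uptake values (mol/kg), for illustration only.
    reference = [1.0, 1.5, 2.0, 2.7]
    predicted = [1.1, 1.4, 2.1, 2.6]
    print(f"R^2 = {r_squared(reference, predicted):.3f}")
```

A value of 1 corresponds to perfect prediction, while a value of 0 means the predictor does no better than always returning the mean of the reference values.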

Brief Introduction of Speaker
Xuemei Pu

Dr. Xuemei Pu is a professor at the College of Chemistry, Sichuan University, and a member of the Computational Chemistry Professional Committee of the Chinese Chemical Society. In recent years, she has carried out a series of research projects in the fields of functional materials and biomedicine, supported by multiple grants from the National Natural Science Foundation of China. She has co-authored more than 100 SCI papers and holds six granted patents and five computer software copyrights.