Lessons to Integrate Heterogeneous Data Sets for Prediction
Shuichi Iwata
University of Science and Technology Beijing, Beijing, 100083, China
EXTENDED ABSTRACT: In principle, each material is unique: no two identical existences are found in the cosmos. In this context, each material can be characterized uniquely by the set of properties measured on it. Starting from this understanding, our experience in building a materials database system named 'Data-Free-Way' [1] is critically reviewed as a reference for examining methods of integrating heterogeneous data sets. The system was developed for the conceptual design of fusion reactors by four national organizations (NIMS, JAERI, PNC and JST) under a cooperative agreement, in order to share and utilize the available data captured by three different experimental modes of defect production dynamics organized by NIMS, JAERI and PNC. The modes differ in energy deposition and in the quantum beam particle used, namely ion or neutron, which leads to differences in the consequent microstructural evolution, including nuclear transmutation. The resulting data are inherently heterogeneous with respect to space and time. To extract value from these three apparently different types of heterogeneous data sets, each data set needs to be resolved into a set of normalized descriptions that enable scientifically meaningful comparisons among the three types of data and predictions of the microstructural evolution of materials under virtual fusion reactor conditions. This resolving process needs to be carried out by converging tentative results such as design specifications from conceptual/detailed design, the selection of constitutive equations/models following established design standards, and qualitative/morphological explanations of microstructural evolution. Consistency is required between our understanding of the design parameters of an engineering product and the raw data obtained by feasible experiments.
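As an illustration only, the idea of resolving each data set into a normalized description can be sketched as a mapping from source-specific records onto one shared schema. Everything named here is an assumption made for the sketch and is not taken from Data-Free-Way: the source labels (`source_a`, `source_b`), the field names, the choice of kelvin as the common temperature unit, and the example property.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class NormalizedRecord:
    """Hypothetical common schema shared by all sources."""
    source: str           # label of the originating data set
    particle: str         # "ion" or "neutron"
    temperature_k: float  # temperature converted to kelvin
    property_name: str    # name of the measured property
    value: float          # measured value


# Hypothetical field mappings: each source names the same concepts differently.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"particle": "beam", "temperature": "T",
                 "temperature_unit": "T_unit",
                 "property_name": "quantity", "value": "val"},
    "source_b": {"particle": "projectile", "temperature": "temp",
                 "temperature_unit": "temp_unit",
                 "property_name": "prop", "value": "measurement"},
}


def to_kelvin(value: float, unit: str) -> float:
    """Convert a temperature to kelvin; only K and degrees Celsius are handled."""
    if unit == "K":
        return value
    if unit == "C":
        return value + 273.15
    raise ValueError(f"unsupported temperature unit: {unit}")


def normalize(source: str, raw: Dict[str, Any]) -> NormalizedRecord:
    """Map one raw record from a known source onto the common schema."""
    fields = FIELD_MAPS[source]
    return NormalizedRecord(
        source=source,
        particle=str(raw[fields["particle"]]).lower(),
        temperature_k=to_kelvin(float(raw[fields["temperature"]]),
                                raw[fields["temperature_unit"]]),
        property_name=str(raw[fields["property_name"]]),
        value=float(raw[fields["value"]]),
    )
```

Once records from different sources share one schema, they can be compared field by field. A real normalization would also have to reconcile the differences in energy deposition and in space and time scales discussed above, which this sketch does not attempt.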
To achieve consistency among the heterogeneous data sets, the information infrastructure of JST, such as metadata and knowledge management and fundamental data on materials, has been used for editing the data. The editing procedure is, as usual, a collection of direct approaches for analysis and inverse approaches for fitting, where inverse problem solving is twofold: the parametric-optimization inverse and the model-derivation inverse. For the model-derivation inverse challenge, comparative studies [2] have been carried out to predict the performance of materials in structural components and, ultimately, in engineering products in service; there, the bridging of microscopic, mesoscopic and macroscopic features of materials was carried out strategically to establish a de facto standard. A brief summary of the lessons obtained through this challenge agrees with the common explanation that the behavior of a complex system of many strongly interacting components cannot simply be derived by summing the behavior of the individual components. Our materials world has semantics too rich for us to see the essentials and values hidden in it, which implies the importance of discovery approaches in unexperienced domains, especially for predicting the long-term dynamic performance of materials. We have experienced the same issues in building databases for high-Tc superconducting materials, superalloys, etc., which require strategic challenges to bridge the diversity and the universality of materials. Based on this experience, future perspectives on materials data systems are shown in Fig. 1 for discussion, taking into account recent progress in ICT and new challenges in the evolving information infrastructure on materials.
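The distinction between the direct approach and the two kinds of inverse problem can be illustrated with a minimal sketch, under stated assumptions. The two candidate model forms (`linear` and `sqrt`), the function names and the use of the residual sum of squares as the selection criterion are hypothetical choices made for illustration. Choosing among a fixed list of candidates is only a crude stand-in for model derivation, and it is not the procedure used in the studies cited above.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate model forms y = a * g(x) + b, one g per candidate.
CANDIDATES: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: x,
    "sqrt": math.sqrt,
}


def forward(name: str, params: Tuple[float, float], x: float) -> float:
    """Direct approach: predict y from x for a given model and parameters."""
    a, b = params
    return a * CANDIDATES[name](x) + b


def fit_line(u: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for y ~ a * u + b."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = suy / suu
    return a, mean_y - a * mean_u


def fit_and_score(name: str, x: List[float],
                  y: List[float]) -> Tuple[Tuple[float, float], float]:
    """Parametric-optimization inverse: fit (a, b) for one fixed model form."""
    u = [CANDIDATES[name](xi) for xi in x]
    params = fit_line(u, y)
    rss = sum((forward(name, params, xi) - yi) ** 2 for xi, yi in zip(x, y))
    return params, rss


def select_model(x: List[float],
                 y: List[float]) -> Tuple[str, Tuple[float, float]]:
    """Stand-in for the model-derivation inverse: pick the best-fitting form."""
    results = {name: fit_and_score(name, x, y) for name in CANDIDATES}
    best = min(results, key=lambda name: results[name][1])
    return best, results[best][0]


# Synthetic data generated from y = 2 * sqrt(x) + 1 (not experimental data).
xs = [1.0, 4.0, 9.0, 16.0, 25.0]
ys = [2.0 * math.sqrt(v) + 1.0 for v in xs]
best_name, best_params = select_model(xs, ys)
```

On this synthetic input the `sqrt` form fits exactly while the `linear` form leaves a residual, so the selection step recovers the generating form. With real, noisy data a criterion that penalizes model complexity would be needed, and the candidate set itself would have to come from physical understanding.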
In addition to progress in cutting-edge challenges, the so-called high-throughput intelligent calculations and high-throughput intelligent experiments, two exemplars of materials data systems, by Granta [3] and MPDS [4], have been becoming de facto standards as holistic guide maps on materials after an incubation period of several decades. It is therefore becoming feasible to explore new frontiers with cutting-edge approaches using a set of guide maps on materials, as shown in [5]. The basic parameters up to this phase are in principle mass, space, time and energy, which are i) developed by combination, structure, causality and relation/interaction, ii) connected by mathematics, iii) aggregated by statistics, and iv) designed as engineering products. Work from i) to iii) can be managed as optimization problems in the framework of complex systems modeling, along workflows and with reuse of established scientific knowledge. Human dimensions need to be considered to deal with stakes and uncertainties, heuristics and emergent interactions of agents for work from iv) up to social complexity, namely the making of rules, including standards. It is therefore important to develop an open platform that handles decision-making procedures properly (transparent, traceable, FAIR) based on scientific data.

REFERENCES
1. H. Tsuji, M. Fujita, S. Kano, R. Nakajima, S. Iwata, et al., J. Nucl. Mater., 271&272 (1999) 486-490.
2. S. Iwata, Radiation Effects & Defects in Solids, 144 (1998) 1-25.
3. https://www.ansys.com/products/materials
4. https://mpds.io/#start
5. https://arxiv.org/abs/2105.12784
 
Emeritus Professor, The University of Tokyo; Guest Professor, Yamanashi University; Member of EAJ; Coordinator of the Linus Pauling File Project; Chief Scientist of MGE/USTB; Editor-in-Chief of Ecoethica; Member of the Academic Advisory Board (iShine, BAIC for MGE/USTB); Doctor of Engineering (The Art of Alloy Engineering). He has served as President and EC Member of CODATA; Editor-in-Chief of Data Science Journal; Member of SCJ; Director of RACE, The University of Tokyo; and Visiting Professor/Guest Researcher at FIZ Karlsruhe, the National Museum of Nature and Science, the National Institute of Metals, NIST, ITODYS/Paris University, Cambridge University and Oxford University. He has published about 300 scientific papers, including two Editorials in Science. He has committed himself to chairing the Project for Virtual Experiments for Materials Design and the Nuclear Test Research of AEC, Japan, and to serving as Program Director/Officer of the Initiative for Strategic Scientific Research on Nuclear Fields and as an advisor for SIP Materials Integration, etc. His lectures, given since 1978, span data, informatics, nuclear fuels and materials, engineering, materials science, design science, environmental studies, philosophy and ethics. Awards and Honors: Honda Memorial Young Researcher Award; Iketani Science Foundation Award; Promotion of Science and Technology Information Award, JST; Paper Award, JIM; GIW Best Paper Award.