Applications of Artificial Intelligence in Process Systems Engineering

Ebook · 977 pages · 9 hours

About this ebook

Applications of Artificial Intelligence in Process Systems Engineering offers a broad perspective on the issues related to artificial intelligence technologies and their applications in chemical and process engineering. The book comprehensively introduces the methodology and applications of AI technologies in process systems engineering, making it an indispensable reference for researchers and students. Because chemical processes and systems are usually nonlinear and complex, making it challenging to apply AI methods and technologies, this book is an ideal resource on emerging areas such as cloud computing, big data, the industrial Internet of Things, and deep learning.

With process systems engineering's potential to become one of the driving forces for the development of AI technologies, this book covers all the right bases.

  • Explains the concept of machine learning, deep learning and state-of-the-art intelligent algorithms
  • Discusses AI-based applications in process modeling and simulation, process integration and optimization, process control, and fault detection and diagnosis
  • Gives direction to future development trends of AI technologies in chemical and process engineering
Language: English
Release date: Jun 5, 2021
ISBN: 9780128217436
    Book preview

    Applications of Artificial Intelligence in Process Systems Engineering - Jingzheng Ren


    Chapter 1: Artificial intelligence in process systems engineering

    Tao Shi(a); Ao Yang(a); Yuanzhi Jin(a); Jingzheng Ren(a); Weifeng Shen(b); Lichun Dong(b); Yi Man(a)
    (a) Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, People’s Republic of China

    (b) School of Chemistry and Chemical Engineering, Chongqing University, Chongqing, People’s Republic of China

    Abstract

    Accompanied by great advances in computer hardware and the widespread commercial application of big data, artificial intelligence (AI), represented by machine learning technology, has gained widespread application over the past two decades. At the same time, challenges such as multiscale modeling, simulation, optimization, control, and supply chain management have been encountered in research on process systems engineering (PSE). Advanced AI technologies such as deep learning and reinforcement learning provide a promising way to address these problems in PSE from a different perspective. Therefore, the background of PSE and the typical branches of AI are introduced to give an overall grasp of both disciplines. In addition, work related to AI applications in PSE is reviewed in the hope of providing inspiration in the relevant fields.

    Keywords

    Artificial intelligence; Process systems engineering; Machine learning

    Acknowledgment

    The authors would like to express their sincere thanks to the Research Committee of The Hong Kong Polytechnic University for the financial support of the project through a PhD studentship (project account code: RK3P).

    1: What is process systems engineering?

    1.1: Background

    In 1961, process systems engineering (PSE) was first proposed in a special volume of the AIChE Symposium Series. The term PSE was not widely accepted, however, until 1982, when the first international symposium on the topic took place in Kyoto, Japan. The first journal devoted to PSE-related research was Computers and Chemical Engineering, launched in 1977. PSE is considered to fall under the banner of chemical engineering, having developed from the concepts of unit operations, mathematical models, and computer visualization. Of note, chemical engineering differs considerably from chemistry research, since engineers are responsible for translating natural theories and microscale matter into macroscale products. Therefore, in some sense, PSE is a multiscale, systematic undertaking that benefits humanity; the whole multiscale concept is shown in Fig. 1. The widely accepted definition of PSE is an interdisciplinary research field concerned with the development of systematic procedures, with the aid of mathematics and computers, for the smart design, effective control, and scheduling and distribution of product manufacturing systems, which are characterized by multiple scales [2].

    Fig. 1

    Fig. 1 Multiscale research flowsheet of PSE [1].

    Up to now, PSE research, deeply influenced by engineering achievements and scientific theory, has been closely tied to mathematics, computer science, and software engineering. Ever bigger data and high-performance computers have greatly boosted the investigation of PSE while connecting commercial enterprises and research institutions more closely. With the aid of real industrial data from enterprises, researchers can further improve process models and simulations on advanced computers. The improved designs can then be continuously applied to industrial adjustment and upgrading, leading to an interesting cycle of feedback. As shown in Fig. 1, the major concerns of PSE research are divided into modeling, simulation, optimization, and control, all of which span multiple scales.

    1.2: Challenges

    Among these concerns, many accomplishments have been made in the past, and a full review is given by Klatt et al. [3]. It can be summarized that PSE strives to create representations and models to generate reasonable alternatives, ranging from molecules to process superstructures, and to select the optimal solution from these candidates. Developing computationally efficient methods, accurate models, and simulation tools are all contributions of the PSE area. PSE is a relatively young area from the viewpoint of discipline development, and tough challenges remain under the pressure of global competition and environmental protection. Overcoming these challenges ultimately depends on innovative technology, which is closely tied to improvements in manufacturing efficiency, product yield, payback period, energy utilization, and so on [4].

    First, product design at the molecular scale has received more attention than ever before [5], in addition to process design aiming at more energy-efficient structures. During product design, property predictions of pure components or mixtures are needed, involving thermodynamics, the environment, health, safety, and so on. For example, evaluating the health-related and environmental properties of new chemicals is so important in pharmaceutical engineering that researchers make great efforts to find green synthetic routes with the aid of huge databases and efficient computers [6]. Moreover, increasing attention is being paid to sustainability and safety during product and process design.

    Second, how to deal with coupling in the control structure of a multi-input, multi-output process system is also a big challenge. Traditionally, proportional-integral-derivative (PID) control has been the classical approach and has been widely applied across the chemical and petrochemical industries, embedded in distributed control systems. PID controllers take actions based on a linear combination of current and past deviations of the dynamic variables. As processes become more complex, characterized by nonlinearity, multiple variables, and strong coupling, classic control theory usually yields poor control quality [7]. Advanced control methods are called for to achieve efficient control of such chemical processes.

    Third, a big challenge lies in process optimization, which is necessary not only to evaluate the impact of different factors but also to manage the optimal design while considering the benefits of stakeholders. The high degree of nonlinearity in chemical processes greatly increases the difficulty for optimization solvers, as the models often comprise more than 100,000 algebraic equations that have to be solved [3]. Multivariable problems combined with single-objective or multiobjective (e.g., economic, environmental, and safety indicators) optimization must be solved within a process design. Fortunately, the huge improvement in numerical modeling and process simulation has partially provided an effective means of solving such problems. Challenging requirements still exist for efficient optimization algorithms that can deal with nonlinearity, discrete (integer) variables, and the constraints displayed by the model [8, 9].

    Fourth, when the research area of PSE is expanded to the macroscale supply chain, as shown in Fig. 1, managers and investors have to consider downstream logistics and product distribution activities while saving transportation costs and making more profit, which is easy to understand given the popularity of electronic commerce today. Each online transaction translates into offline logistics and environmental impacts. Therefore, it is necessary to introduce the life-cycle concept and carry out supply chain optimization and inventory control, integrating research, development, control, and operation at the business level [10, 11]. Life-cycle assessment integrated with the supply chain is an interesting topic in the PSE field. In summary, there are challenges of multicriterion decision making in PSE for enterprises and managers, including production scheduling and planning, if industry wants to sell its products with little waste.

    2: What is artificial intelligence?

    2.1: Background

    In 1956, the first artificial intelligence meeting in history, the Dartmouth Summer Research Project on Artificial Intelligence (AI), was initiated by John McCarthy and other experts in the area [12]. The meeting lasted for a month and expressed the great ambition of building intelligent machines that could work like human beings. There is still a long way to go to achieve that ambition. However, AI technologies specialized in particular skills have improved greatly since around 2015, when parallel computing became much faster and more efficient with the aid of high-performance CPUs and GPUs. Such specialized skills, like image identification, classification, and logical calculation, are obtained by learning and can surpass human performance. One of the most important and popular ways to achieve these specialized skills in AI is machine learning (ML). Basically, once the relevant data are collected, ML algorithms are applied for feature extraction, model training, and evaluation. After this learning procedure, effective models can be produced to make predictions and guide decision-making in the real world. In summary, the biggest advantage of ML is that trained AI models actively acquire specialized skills and replace millions of lines of hand-written code in real-world applications [13].
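
    The collect-train-evaluate-predict workflow described above can be sketched in a few lines. The following is a minimal illustration using scikit-learn on synthetic data; the regression task, network size, and variable names are chosen purely for illustration.

```python
# Minimal sketch of the collect -> train -> evaluate -> predict loop described
# above, using scikit-learn on synthetic data (all names here are illustrative).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))                     # "collected" process data
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.05 * rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(StandardScaler(),            # feature scaling / extraction
                      MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                                   random_state=0))
model.fit(X_train, y_train)                        # model training
print("test MAE:", mean_absolute_error(y_test, model.predict(X_test)))  # evaluation
print("prediction for a new operating point:", model.predict(X_test[:1]))
```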

    There are many ML methods targeting different problems, generally divided into supervised learning, unsupervised learning, and reinforcement learning (RL) according to the label information available in the training dataset and whether the decision-making process is dynamic [14]. Of note, RL, a learning method that requires no prior labeled dataset, actively chooses the next step (i.e., makes decisions) to maximize long-term benefit according to good or bad feedback in a dynamic environment with uncertainty. During this sequential decision-making process, rewards and penalties are crucial elements guiding the optimization [15]. In addition, ML branches such as the support vector machine, based on statistical learning theory, and the artificial neural network (ANN), inspired by the connections between biological neural cells, have gained intensive attention for handling different problems. The ANN architecture resembles the thinking process of the human brain: the model is composed of a large number of neuron-like nodes. Each node connects to others and applies a specific output function called an activation function, and each connection between two nodes carries a weight used for signal passing and calculation [16].
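
    To make the node/weight/activation description concrete, here is a minimal NumPy sketch of a single forward pass through one hidden layer; the layer sizes and the sigmoid activation are arbitrary choices for illustration.

```python
# Minimal forward pass of a one-hidden-layer ANN: each connection carries a
# weight, each node applies an activation function to its weighted inputs.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.uniform(size=3)            # input signals (3 input nodes)
W1 = rng.normal(size=(4, 3))       # weights: input -> hidden (4 hidden nodes)
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))       # weights: hidden -> output (1 output node)
b2 = np.zeros(1)

h = sigmoid(W1 @ x + b1)           # hidden-layer activations
y = W2 @ h + b2                    # network output (linear output node)
print(y)
```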

    2.2: Deep learning

    The first-generation ANN could only perform some linear classifications and could not effectively handle nonlinearly separable logical functions [17]. The second-generation ANN was developed by Geoffrey Hinton and coworkers, who used multiple hidden layers to replace the hand-crafted feature layer of the ANN model and calculated the intermediate parameters with the back-propagation algorithm [18]. As the number of hidden layers increases, the optimization becomes more prone to local optima and the training of deep networks becomes difficult. To overcome this problem, hierarchical feature extraction and unsupervised pretraining were proposed by Hinton et al. [19]. The third-generation ANN technology, represented by deep learning, was then developed and recognized as a breakthrough in the training of multilayer networks, where "deep" refers to the number of layers in the network. Different architectures such as the deep neural network (DNN), the recurrent neural network, and the convolutional neural network are used in deep learning studies targeting different applications [20]. The development of hardware, the explosion of datasets, and the demand for image and text processing have provided additional boosts to the effectiveness of DNN training. Deep learning provides an efficient approach to obtaining intelligent models trained from big data, which can be integrated into PSE to solve complex problems [15].

    3: The AI-based application in PSE

    As is well known, AI can be applied in PSE to achieve more intelligent computer-aided tools for chemical engineering. The investigation of AI in PSE began with the work of some researchers in the late 1960s and early 1970s, and AI methods for chemical engineering problems then developed vigorously in the early 1980s [21]. The Adaptive Initial Design Synthesizer system for chemical process synthesis [22] was arguably the first systematic application of AI methods in chemical engineering, including means-ends analysis, symbolic manipulation, and linked data structures. Afterwards, AI technology was widely extended to physical property prediction, process modeling and simulation, dynamic control, process optimization, fault detection and diagnosis, etc.

    3.1: Physical properties prediction and product design

    Physical properties (e.g., critical properties) and prediction models play an important role in chemical process and product design [23]. To improve prediction accuracy, an intelligent and automated quantitative structure-property relationship (QSPR) model based on deep learning was developed [24]. A DNN combining a tree-structured long short-term memory network and a back-propagation neural network (BPNN) was used to build the proposed QSPR model, and the results demonstrated that the approach could predict critical temperature, pressure, and volume well [24]. Furthermore, the proposed model can be extended to predict environmental properties such as octanol-water partition coefficients [25] and Henry’s law constants [26]. Similarly, hazardous properties such as toxicity and flash point can be forecast via ensemble neural network models. Physical property prediction enables high-throughput screening and further promotes the development of AI-based product design [27]. At an early stage, the BatchDesign-Kit system, based on a state-transition network and explored by Linninger et al. [28], was used to find new batch reaction processes with lower health, safety, and environmental impacts and cost. Subsequently, Zhang et al. [13] developed a machine learning and computer-aided molecular design (ML-CAMD) framework to generate and screen suitable fragrance molecules, which are widely used in modern industries. The results showed that the obtained molecule C9H18O2 has higher odor pleasantness than the existing product. Chai et al. [29] extended ML-CAMD to design a crystallization solvent for an industrial crystallization process, and a hybrid model has been established for the design of new food products [30]. In summary, extending AI to physical property prediction can effectively promote the development of product design and process synthesis.

    3.2: Process modeling

    A practical bottleneck in chemical process operation is that the detailed analyses (e.g., of feed and product) required for high-frequency optimization and process control take a lot of time. Such issues can be addressed with ANN models. Plehiers et al. [31] developed ANNs to model and simulate a steam-cracking process with a naphtha feedstock and to predict a detailed composition of the steam cracker effluent; the computational results illustrated that the presented method is applicable to any type of reactor without loss of performance. According to the study of George-Ufot et al. [32], forecasting electric load can improve energy efficiency and reduce production cost. A hybrid framework combining a genetic algorithm (GA), particle swarm optimization, and a BPNN has been established, displaying good reliability and high accuracy in electric load forecasting [33]. In addition, prediction models based on various neural network techniques have been used to forecast energy consumption [34], chemical oxygen demand [35], and throughput [36]. Such black-box neural network models can be employed in process optimization and control, achieving energy savings, emission reduction, and cleaner production.
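
    As a minimal sketch of the idea behind these data-driven process models, the code below trains an ANN on samples from a slow, illustrative stand-in for a rigorous model and then uses it for fast prediction; the "rigorous model" and its two inputs are invented for this example.

```python
# Minimal sketch of an ANN surrogate replacing a slow, detailed calculation,
# in the spirit of the process-modeling examples above (the "rigorous model"
# and its two-variable form are purely illustrative).
import time
import numpy as np
from sklearn.neural_network import MLPRegressor

def rigorous_model(x):                        # placeholder for a slow simulation
    time.sleep(0.001)                         # pretend each call is expensive
    return np.sin(3 * x[0]) + x[1] ** 2

rng = np.random.default_rng(3)
X = rng.uniform(size=(400, 2))
y = np.array([rigorous_model(x) for x in X])  # offline data generation

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                         random_state=3).fit(X, y)

x_new = np.array([[0.2, 0.8]])
print("surrogate prediction:", surrogate.predict(x_new)[0])
print("rigorous model value:", rigorous_model(x_new[0]))
```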

    3.3: Fault detection and diagnosis

    Fault detection and diagnosis are very important in process control and monitoring because they help ensure process safety and reliability (and hence product quality) in chemical engineering processes [37, 38]. A back-propagation neural network-based methodology was proposed by Venkatasubramanian and Chan [39] to handle fault diagnosis of the fluidized catalytic cracking process in oil refineries. Similarly, adaptive network models have been applied to fault diagnosis and process control [40]; the results illustrated that the established approach could diagnose novel fault combinations (not explicitly trained upon) and handle incomplete and uncertain data. Neural network methods have also been used for fault detection in reactors [41] and for pressure/temperature sensor faults in a fatty acid fractionation precut column [42]. An extended deep belief network has been developed that avoids losing the valuable information in the raw data during the layer-wise feature compression of traditional deep networks [43].
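
    At its core, such neural-network-based diagnosis is a classifier trained on labeled process measurements. The following is a minimal sketch on synthetic sensor data with two hypothetical fault classes; the sensor count, fault signatures, and network size are all invented for illustration.

```python
# Minimal sketch of neural-network fault diagnosis: classify simulated sensor
# snapshots into "normal" vs. two hypothetical fault classes (illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n = 600
labels = rng.integers(0, 3, size=n)                # 0 = normal, 1/2 = fault types
means = np.array([[0.0, 0.0, 0.0],                 # each fault shifts the sensors
                  [1.5, 0.0, -1.0],
                  [0.0, 2.0, 0.5]])
X = means[labels] + 0.5 * rng.normal(size=(n, 3))  # 3 noisy sensor readings

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=1)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```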

    3.4: Process optimization and scheduling

    AI methods for process optimization and scheduling have attracted attention because they can increase speed, reduce the time consumed by optimization, and achieve optimal matching. For instance, Qiu et al. [44] developed a data-driven model based on a radial basis function neural network combined with a GA to solve the mixed-integer nonlinear programming problem of a distillation process. The separation of propylene/propane using externally heat-integrated distillation columns was employed to illustrate the proposed approach, and the optimal solution could be found quickly over a wide search space. A hybrid model combining an artificial neural network and a GA was investigated by Khezri et al. [45], and its capability was illustrated by means of a gas-to-liquids process. The hybrid model shows a significant advantage because the computational time for optimization is greatly reduced from multiple days to just a few seconds while the relative error remains within an acceptable range. In addition, flow shop scheduling [46, 47], optimization of cooling water systems [48], and dynamic vehicle routing [49] can be effectively addressed via neural network models.
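
    In the spirit of these surrogate-plus-evolutionary approaches, the sketch below fits an ANN surrogate to samples of an "expensive" objective and then lets a tiny genetic algorithm search the surrogate. It is a simplified illustration (MLP surrogate, toy objective, minimal GA), not the specific RBF-GA method of Qiu et al. [44].

```python
# Minimal sketch of surrogate-assisted optimization: fit an ANN surrogate to
# samples of an "expensive" objective, then search it with a tiny GA.
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_objective(x):                 # stand-in for a rigorous simulation
    return (x[:, 0] - 0.3) ** 2 + 2.0 * (x[:, 1] - 0.7) ** 2

rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 2))
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                         random_state=2).fit(X, expensive_objective(X))

pop = rng.uniform(size=(40, 2))             # initial population of candidates
for _ in range(60):                         # generations
    fitness = surrogate.predict(pop)        # cheap evaluations on the surrogate
    parents = pop[np.argsort(fitness)[:20]] # selection: keep the best half
    kids = (parents[rng.integers(0, 20, 40)] +
            parents[rng.integers(0, 20, 40)]) / 2.0                 # blend crossover
    pop = np.clip(kids + 0.05 * rng.normal(size=kids.shape), 0, 1)  # mutation
print("best candidate found:", pop[np.argmin(surrogate.predict(pop))])
```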

    4: Summary

    The applications of AI in the above fields of PSE may be just the tip of the iceberg. With the continuous development and penetration of AI technology, we believe that AI will be applied more extensively in the future. In the short term, maintaining AI's influence on society is conducive to studying its deeper applications in fields such as logical reasoning and proof, natural language processing, intelligent information retrieval, and expert systems. In the long run, achieving general AI that is better than humans at most cognitive tasks is the ultimate goal of scientists and engineers. Novel materials and high-performance computing will have a crucial influence on this process.

    References

    [1] Marquardt W., von Wedel L., Bayer B. Perspectives on lifecycle process modeling. In: AIChE Symposium Series. New York: American Institute of Chemical Engineers; 2000.

    [2] Grossmann I.E., Westerberg A.W. Research challenges in process systems engineering. AICHE J. 2000;46(9):1700–1703.

    [3] Klatt K.-U., Marquardt W. Perspectives for process systems engineering—personal views from academia and industry. Comput. Chem. Eng. 2009;33(3):536–550.

    [4] Grossmann I.E., Harjunkoski I. Process systems engineering: academic and industrial perspectives. Comput. Chem. Eng. 2019;126:474–484.

    [5] Ng L.Y., Chong F.K., Chemmangattuvalappil N.G. Challenges and opportunities in computer-aided molecular design. Comput. Chem. Eng. 2015;81:115–129.

    [6] Chen H., et al. The rise of deep learning in drug discovery. Drug Discov. Today. 2018;23(6):1241–1250.

    [7] Qian X., et al. MPC-PI cascade control for the Kaibel dividing wall column integrated with data-driven soft sensor model. Chem. Eng. Sci. 2020;116240.

    [8] Biegler L.T., Grossmann I.E. Retrospective on optimization. Comput. Chem. Eng. 2004;28(8):1169–1192.

    [9] Grossmann I.E., Biegler L.T. Part II. Future perspective on optimization. Comput. Chem. Eng. 2004;28(8):1193–1218.

    [10] Tian X., et al. Sustainable design of geothermal energy systems for electric power generation using life cycle optimization. AICHE J. 2020;66(4):e16898.

    [11] Zhao N., Lehmann J., You F. Poultry waste valorization via pyrolysis technologies: economic and environmental life cycle optimization for sustainable bioenergy systems. ACS Sustain. Chem. Eng. 2020;8(11):4633–4646.

    [12] McCarthy J., et al. A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Mag. 2006;27(4):12.

    [13] Zhang L., et al. A machine learning based computer-aided molecular design/screening methodology for fragrance molecules. Comput. Chem. Eng. 2018;115:295–308.

    [14] Alpaydin E. Introduction to Machine Learning. MIT Press; 2020.

    [15] Lee J.H., Shin J., Realff M.J. Machine learning: overview of the recent progresses and implications for the process systems engineering field. Comput. Chem. Eng. 2018;114:111–121.

    [16] Dietterich T.G. Machine-learning research. AI Mag. 1997;18(4):97.

    [17] Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958;65(6):386.

    [18] Rumelhart D.E., Hinton G.E., Williams R.J. Learning Internal Representations by Error Propagation. California Univ San Diego La Jolla Inst for Cognitive Science; 1985.

    [19] Hinton G.E., Osindero S., Teh Y.-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–1554.

    [20] LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521(7553):436–444.

    [21] Rudd D.F., Powers G.J., Siirola J.J. Process Synthesis. Prentice-Hall; 1973.

    [22] Siirola J.J., Rudd D.F. Computer-aided synthesis of chemical process designs. From reaction path data to the process task network. Ind. Eng. Chem. Fundam. 1971;10(3):353–362.

    [23] Venkatasubramanian V. The promise of artificial intelligence in chemical engineering: is it here, finally?. AICHE J. 2019;65(2):466–478.

    [24] Su Y., et al. An architecture of deep learning in QSPR modeling for the prediction of critical properties using molecular signatures. AICHE J. 2019;65(9):e16678.

    [25] Wang Z., et al. Predictive deep learning models for environmental properties: the direct calculation of octanol-water partition coefficients from molecular graphs. Green Chem. 2019;21(16):4555–4565.

    [26] Wang Z., et al. A novel unambiguous strategy of molecular feature extraction in machine learning assisted predictive models for environmental properties. Green Chem. 2020;22(12):3867–3876.

    [27] Zhang L., et al. Chemical product design—recent advances and perspectives. Curr. Opin. Chem. Eng. 2020;27:22–34.

    [28] Linninger A.A., et al. Generation and assessment of batch processes with ecological considerations. Comput. Chem. Eng. 1995;19:7–13.

    [29] Chai S., et al. A grand product design model for crystallization solvent design. Comput. Chem. Eng. 2020;135:106764.

    [30] Zhang X., et al. Food product design: a hybrid machine learning and mechanistic modeling approach. Ind. Eng. Chem. Res. 2019;58(36):16743–16752.

    [31] Plehiers P.P., et al. Artificial intelligence in steam cracking modeling: a deep learning algorithm for detailed effluent prediction. Engineering. 2019;5(6):1027–1040.

    [32] George-Ufot G., Qu Y., Orji I.J. Sustainable lifestyle factors influencing industries' electric consumption patterns using fuzzy logic and DEMATEL: the Nigerian perspective. J. Clean. Prod. 2017;162:624–634.

    [33] Hu Y., et al. Short term electric load forecasting model and its verification for process industrial enterprises based on hybrid GA-PSO-BPNN algorithm—a case study of papermaking process. Energy. 2019;170:1215–1227.

    [34] Chen C., et al. Energy consumption modelling using deep learning embedded semi-supervised learning. Comput. Ind. Eng. 2019;135:757–765.

    [35] Picos-Benítez A.R., et al. The use of artificial intelligence models in the prediction of optimum operational conditions for the treatment of dye wastewaters with similar structural characteristics. Process Saf. Environ. Prot. 2020;143:36–44.

    [36] Sagheer A., Kotb M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing. 2019;323:203–213.

    [37] Ming L., Zhao J. Review on chemical process fault detection and diagnosis. In: 2017 6th International Symposium on Advanced Control of Industrial Processes (AdCONIP); IEEE; 2017.

    [38] Nor N.M., Hassan C.R.C., Hussain M.A. A review of data-driven fault detection and diagnosis methods: applications in chemical process systems. Rev. Chem. Eng. 2019;1: (ahead-of-print).

    [39] Venkatasubramanian V., Chan K. A neural network methodology for process fault diagnosis. AICHE J. 1989;35(12):1993–2002.

    [40] Ungar L., Powell B., Kamens S. Adaptive networks for fault diagnosis and process control. Comput. Chem. Eng. 1990;14(4–5):561–572.

    [41] Ahmad A., Hamid M., Mohammed K. Neural networks for process monitoring, control and fault detection: application to Tennessee Eastman plant. In: Proceedings of the Malaysian Science and Technology Congress, Melaka, Malaysia; Johor Bahru, Malaysia: Universiti Teknologi Malaysia; 2001.

    [42] Othman M.R., Ali M.W., Kamsah M.Z. Process fault detection using hierarchical artificial neural network diagnostic strategy. J. Teknol. 2007;46(1):11–26.

    [43] Wang Y., et al. A novel deep learning based fault diagnosis approach for chemical process with extended deep belief network. ISA Trans. 2020;96:457–467.

    [44] Qiu P., et al. Data-driven analysis and optimization of externally heat-integrated distillation columns (EHIDiC). Energy. 2019;189:116177.

    [45] Khezri V., et al. Hybrid artificial neural network–genetic algorithm-based technique to optimize a steady-state gas-to-liquids plant. Ind. Eng. Chem. Res. 2020;59(18):8674–8687.

    [46] Liu F., et al. On the robust and stable flowshop scheduling under stochastic and dynamic disruptions. IEEE Trans. Eng. Manage. 2017;64(4):539–553.

    [47] Zeng Z., et al. Multi-object optimization of flexible flow shop scheduling with batch process—consideration total electricity consumption and material wastage. J. Clean. Prod. 2018;183:925–939.

    [48] Zhu Q., et al. Model reductions for multiscale stochastic optimization of cooling water system equipped with closed wet cooling towers. Chem. Eng. Sci. 2020;224:115773.

    [49] Joe W., Lau H.C. Deep reinforcement learning approach to solve dynamic vehicle routing problem with stochastic customers. In: Proceedings of the Thirtieth International Conference on Automated Planning and Scheduling; 2020:394–402.

    Chapter 2: Deep learning in QSPR modeling for the prediction of critical properties

    Yang Su; Weifeng Shen    School of Chemistry and Chemical Engineering, Chongqing University, Chongqing, People's Republic of China

    Abstract

    Deep learning is rapidly advancing many fields, with success stories in natural language processing. An architecture of deep neural network (DNN) combining a tree-structured long short-term memory (Tree-LSTM) network and a back-propagation neural network (BPNN) is developed for predicting physical properties. Inspired by natural language processing in artificial intelligence, we first developed a strategy for data preparation, including encoding molecules with canonical molecular signatures and vectorizing bond substrings with an embedding algorithm. Then, the dynamic Tree-LSTM network is employed to depict molecular tree data structures, while the BPNN is used to correlate properties. To evaluate the performance of the proposed DNN, the critical properties of nearly 1800 compounds are employed for training and testing the DNN models. Compared with classical group contribution methods, the learned DNN models are able to provide more accurate predictions and cover more diverse molecular structures without considering the frequencies of substructures.

    Keywords

    Deep learning; Neural network; Signature molecular descriptor; Property prediction; Critical properties

    Acknowledgments

    The chapter is reprinted from AIChE Journal, 2019, 65 (9), e16678, Yang Su, Zihao Wang, Saimeng Jin, Weifeng Shen, Jingzheng Ren, Mario R. Eden, An architecture of deep learning in QSPR modeling for the prediction of critical properties using molecular signatures, by permission of John Wiley & Sons.

    1: Introduction

    Chemical process and product design rely heavily on physical properties (e.g., critical properties) and prediction models [1, 2]. To investigate relationships between molecular structures and properties, many mathematical models have been developed [3]. Most prediction models are based on semi-empirical quantitative structure-property relationships (QSPRs), including group contribution (GC) methods and topological indices (TIs).

    In GC methods, any compound can be divided into fragments (e.g., atoms, bonds, or groups of atoms and bonds). Each fragment has a partial value called a contribution, and the final property value is given by summing the fragment contributions. A large variety of these models has been designed, differing in their fields of applicability and in the underlying sets of experimental data. For example, the GC methods reported by Lydersen [4], Klincewicz and Reid [5], Joback and Reid [6], Constantinou and Gani [7], and Marrero and Gani [8] are generally suitable for obtaining values of physical properties, because these methods provide the advantage of quick estimation without substantial computational work. As alternative approaches, topological indices (TIs) are used to estimate properties in a manner similar to GC methods. In TI methods, molecular topology is characterized in terms of standard molecular graph properties such as vertex degrees, connectivity, atomic types, etc. Additionally, one of their main advantages is that TI methods can distinguish between two similar structures from a more holistic perspective than GC methods [9].
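
    As a concrete illustration of this additive scheme, the short sketch below sums fragment contributions weighted by their occurrence counts; the group names and contribution values are invented for illustration and are not taken from any published GC table.

```python
# Illustrative group-contribution estimate: the property is the sum of the
# fragment contributions, weighted by how often each fragment occurs.
# (Group names and contribution values are invented, not from a real table.)
contributions = {"CH3": 23.6, "CH2": 22.9, "OH": 44.5}   # hypothetical values
molecule = {"CH3": 1, "CH2": 2, "OH": 1}                 # fragment counts (1-propanol-like)

estimate = sum(n * contributions[group] for group, n in molecule.items())
print(f"estimated property: {estimate:.1f}")             # 23.6 + 2*22.9 + 44.5
```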

    Another method, the signature molecular descriptor, which combines the advantages of GC and TI methods, was developed by Faulon et al. [10, 11]. Similar to TI methods, chemical structures are conceived of as chemical graphs. The signature descriptor retains all the structural and connectivity information of every atom in a molecule, rather than ascribing various numerical values to the complete molecular graph [9]. Meanwhile, the signature descriptor can represent molecular substructures, similar to GC methods. Faulon et al. [12] also introduced a canonical form of molecular signatures to solve the molecular graph isomorphism problem; this canonical form provides a holistic picture of a molecular graph while also holding the substructural information of the molecule. Nevertheless, we found that previous research has made few attempts to use the canonical molecular signature for QSPR modeling. To the best of our knowledge, the main reason is that the canonical molecular signature is not represented in numeric form, so it cannot be employed within the commonly used mathematical models for QSPRs.

    For property estimation, most of the above-mentioned QSPR models, which are based on specific rules such as a fixed set of molecular substructures or an array of molecular graph-theoretic properties, are formulated as multiple linear regressions (MLRs). MLR techniques have proved to correlate QSPRs well; however, their encoding rules and mapping functions are defined a priori (i.e., the mathematical formulations are not adaptive to different regression tasks). Moreover, MLRs cannot be applied to the canonical molecular signatures for QSPR modeling. On the other hand, an alternative technique, the neural network, has been used to learn molecular structures and correlate physical properties or activities [13]. A variety of molecular descriptors (e.g., topological characteristics, frequencies of molecular substructures, and microscopic molecular data) are fed to artificial neural networks (ANNs). Limited by the computing capability and development platforms of that period, most researchers adopted feedforward neural networks (FNNs) with static computation graphs in their studies [14–32].

    Although these methods are widely used or precise in property prediction, the molecular features are chosen manually as inputs to the above-mentioned models. For example, the splitting rules for molecular groups are predetermined manually in GC methods, and carefully chosen descriptors are fed into the ANNs. As the number of properties and product designs keeps increasing, some properties/activities may need to be correlated with more molecular features or calculated with more complex mathematical models. It is therefore a challenge in classical QSPR modeling to pick out the relevant molecular features from massive data.

    Recently, improvements in computing performance have encouraged many researchers to study deep learning in artificial intelligence. Deep learning is a more intelligent technique that can capture valuable features automatically. This advantage enables deep neural networks (DNNs) to formulate models from a great variety of big data. As such, new information carriers (e.g., graphs, images, text, and 3D models) can be used to represent molecular structures in QSPR modeling with DNNs. Lusci et al. [33] utilized recurrent neural networks (RNNs) to represent a molecular graph by considering molecules as undirected graphs (UGs) and proposed an approach for mapping molecular structures to aqueous solubility. Goh et al. [34] developed a deep RNN, SMILES2vec, that automatically learns features from the simplified molecular-input line-entry system (SMILES) [35] to correlate properties without the aid of additional explicit feature engineering. Goh et al. [36] also developed a deep CNN for the prediction of chemical properties, using just images of 2D drawings of molecules, without providing any additional explicit chemistry knowledge such as periodicity, molecular descriptors, or fingerprints. These creative works [34, 36] demonstrate the plausibility of using DNNs to assist computational chemistry research. Neural networks based on the long short-term memory (LSTM) units proposed by Hochreiter et al. [37] have also been adopted in quantitative structure-activity relationship (QSAR) research. Altae-Tran et al. [38] proposed a new deep learning architecture based on an iterative refinement LSTM to improve the learning of meaningful distance metrics over small molecules. The Tree-LSTM introduced by Tai et al. [39] is able to capture the syntactic properties of natural languages; two natural extensions of the basic LSTM architecture were proposed, which outperformed other RNNs in their experiments. We noticed that the Tree-LSTM might be able to depict the canonical molecular signature.

    Motivated by the preceding research, in this contribution we focus on developing a deep learning approach that can learn QSPRs automatically and cover a wider range of substances for better predictive capability. A Python-based implementation of Faulon's algorithm [12] is used to convert molecules into canonical signatures for depicting molecular graphs, and an in-house encoding approach is developed to parse the signatures into tree data structures conveniently. The Tree-LSTM network and the BPNN are incorporated into the DNN for QSPR modeling: the Tree-LSTM mimics the tree structure of the canonical signature and outputs a feature vector that is then used to correlate properties within the BPNN. As such, there is no need to convert molecules into bitmap images for training CNNs or to treat molecules as linear sequences for training RNNs. The novelty of the proposed approach is that canonical molecular signatures are used as templates to generate the topological structures of the Tree-LSTM networks. In this sense, the contribution of this work is an intelligent QSPR modeling strategy based on deep learning that can extract valuable features from molecular structures automatically. Critical properties, an important type of property in process and product design, are used as a case study to clarify the main details of the deep learning architecture and to highlight the performance of the QSPR modeling strategies implemented within the proposed DNN.

    2: Methodology

    In this section, the technical details of the deep learning architecture for QSPR modeling are introduced. The proposed architecture incorporates multiple techniques, including canonical molecular signatures, word embedding, the Tree-LSTM network, and the BPNN. The architecture consists of eight steps, as illustrated in Fig. 1. Step 1 mainly involves the acquisition of molecular structure data, where SMILES expressions are collected from open-access databases. The second step is the embedding stage, where vectors representing the substrings of chemical bonds are generated and collected into a dictionary with a widely used word-embedding algorithm. The third step focuses on the canonization of a molecule, where the molecular structure is transformed into a canonical molecular signature serving as the template for formulating the Tree-LSTM network. Step 4 is the mapping stage, where the adaptive structure of the Tree-LSTM network is obtained recursively from the canonical signature; in other words, the Tree-LSTM network is self-adaptive to the molecule. Step 5 involves inputting the vector of each substring at the corresponding node; the Tree-LSTM network is evaluated from the lowest leaf nodes up to the root node in this step, and finally a vector representing the molecule is produced at the root node. Step 6 is the correlation stage for a property, where the vector representing the molecule is fed into a BPNN to compute a scalar output for the property prediction. Step 7 is the comparison stage, where the deviation between the predicted value and the experimental value is calculated. Step 8 is the feedback stage, where the adjustable parameters in the Tree-LSTM network and the BPNN are updated to reduce the deviation computed in step 7. The training process of the proposed DNN is the iterative loop over steps 5–8; a minimal code sketch of this loop is given after Fig. 1.

    Fig. 1

    Fig. 1 Schematic diagram of technical architecture for deep learning in the prediction of physical properties.
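
    The following PyTorch sketch mirrors steps 4–8 for a single molecule: a child-sum Tree-LSTM cell is evaluated recursively over a tree whose shape stands in for a parsed signature, the root vector is fed to a small BPNN head, and one gradient step reduces the prediction error. The toy tree, vocabulary size, and target value are placeholders; this is a simplified illustration under those assumptions, not the authors' implementation.

```python
# Simplified sketch of steps 4-8: a child-sum Tree-LSTM evaluated over a
# molecule tree, a BPNN head on the root vector, and one training step.
# The tree, vocabulary, and target below are placeholders, not real data.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.mem_dim = mem_dim
        self.W_iou = nn.Linear(in_dim, 3 * mem_dim)
        self.U_iou = nn.Linear(mem_dim, 3 * mem_dim, bias=False)
        self.W_f = nn.Linear(in_dim, mem_dim)
        self.U_f = nn.Linear(mem_dim, mem_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,); child_h, child_c: (n_children, mem_dim)
        i, o, u = torch.chunk(self.W_iou(x) + self.U_iou(child_h.sum(0)), 3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))  # one forget gate per child
        c = i * u + (f * child_c).sum(0)
        h = o * torch.tanh(c)
        return h, c

def encode(node, embed, cell):
    """Recursively evaluate the Tree-LSTM from the leaves up to the root,
    so the network topology follows the tree parsed from the signature."""
    if node["children"]:
        hs, cs = zip(*(encode(ch, embed, cell) for ch in node["children"]))
        child_h, child_c = torch.stack(hs), torch.stack(cs)
    else:
        child_h = child_c = torch.zeros(0, cell.mem_dim)
    return cell(embed(torch.tensor(node["token"])), child_h, child_c)

vocab_size, in_dim, mem_dim = 10, 8, 16
embed = nn.Embedding(vocab_size, in_dim)              # bond-substring vectors (step 5)
cell = ChildSumTreeLSTMCell(in_dim, mem_dim)
bpnn = nn.Sequential(nn.Linear(mem_dim, 16), nn.ReLU(), nn.Linear(16, 1))

# Hand-coded toy tree standing in for a parsed canonical signature (step 4).
tree = {"token": 0, "children": [{"token": 1, "children": []},
                                 {"token": 2, "children": [{"token": 3, "children": []}]}]}
target = torch.tensor([1.0])                          # placeholder experimental value

opt = torch.optim.Adam(list(embed.parameters()) + list(cell.parameters())
                       + list(bpnn.parameters()), lr=1e-3)
root_h, _ = encode(tree, embed, cell)                 # step 5: molecule vector at the root
pred = bpnn(root_h)                                   # step 6: property prediction
loss = nn.functional.mse_loss(pred, target)           # step 7: compare with experiment
opt.zero_grad(); loss.backward(); opt.step()          # step 8: update parameters
print(float(loss))
```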

    2.1: The signature molecular descriptor

    The canonical molecular signature is employed to depict molecules in this work. One reason is that a computer program can generate signatures automatically. Another important reason is that the canonical molecular signature provides a way to distinguish molecular structures and resolve isomorphism. It also puts molecules into a uniform form for mapping to the neural network model.

    To introduce canonical molecular signatures, atomic and molecular signatures have to be defined first. An atomic signature is a subgraph that originates at a specific root atom and includes all atoms/bonds extending out to a predefined distance, without backtracking. The predefined distance is a user-specified parameter called the signature height h, and it determines the size of the local neighborhood of atoms in a molecule. This means that, given a certain root atom in a chemical graph, its atomic signature represents all of the atoms within a certain distance h from the root. The atomic signature of atom x at height h, written as ${}^{h}\sigma_G(x)$, is a representation of the subgraph of the 2D graph G = (V, E) containing all atoms within distance h of x. Note that V and E correspond to the vertex (atom) set and the edge (bond) set, respectively. Acetaldoxime (CAS No. 107-29-9) is taken as an example, and its atomic signatures are shown in Fig. 2. The carbon atom numbered 0 (C0) is chosen as the root atom; it is single-bonded to three hydrogen atoms and to another carbon atom numbered 1 (C1). Thus, the atomic signature of this root atom at height 1 is [C]([C][H][H][H]); the other atomic signatures are shown in Fig. 2B.

    Fig. 2

    Fig. 2 Signature descriptors generated from the molecule of acetaldoxime. (Note: C0 is the root. An atomic signature begins from this root atom numbered 0, steps forward a predetermined height, and records all the atoms encountered on the paths connected to the root atom. The process is repeated for all atoms in the molecule at a certain height, and the molecular signature is then obtained as a linear combination of all the atomic signatures. Using acetaldoxime as an example, we present the heights from the root atom in: (A) the molecular structure; (B) the tree form and atomic signatures of different heights from a certain atom; (C) the molecular signatures from height = 0 to height = 1; (D) the canonical molecular signature, shown in the red rectangle, which is the lexicographically largest atomic signature. The atoms found further down the tree from a branch-point atom are marked by nested parentheses. Single bonds between atoms are omitted in atomic signatures; other bond types are written as follows: = is a double bond, # is a triple bond, and : is an aromatic bond.)

    In Faulon’s theory [8], the molecular signature shown in Fig. 2C is a linear combination of all the atomic signatures and is defined as Eq. (1).

    ${}^{h}\sigma(G) = \sum_{x \in V(G)} {}^{h}\sigma_G(x)$    (1)

    In a given compound, an atomic signature can appear more than once. For example, the atomic signature [H]([C]) occurs four times in acetaldoxime. When the height of the atomic signatures reaches its maximum value, the molecular graph can be reconstructed from any one of the atomic signatures. Consequently, as far as graph canonization is concerned, there is no need to record all atomic signatures: the lexicographically largest atomic signature suffices to represent the graph in a unique manner [10]. For example, acetaldoxime has nine atomic signatures at the maximum height, as shown in Fig. 2D, and each of them is able to describe the complete molecular structure. If these nine signatures are sorted in decreasing lexicographic order (a canonical order), the lexicographically largest one is defined as the canonical molecular signature, which can then be encoded and mapped to the Tree-LSTM network.
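
    The sketch below illustrates the idea with RDKit: it builds height-limited atomic signature strings for every atom by a simple recursive traversal and then picks the lexicographically largest one. It is a simplified stand-in (bond orders and Faulon's exact canonization rules are omitted, and height 1 is used instead of the maximum height), intended only to show the flavor of the procedure.

```python
from rdkit import Chem  # assumes RDKit is installed

def atomic_signature(mol, root_idx, height, parent_idx=None):
    """Simplified height-limited atomic signature: recursively list the root
    atom and its neighbors up to `height` bonds away, without stepping back
    to the parent. Bond orders and Faulon's canonical ordering are omitted;
    branches are just sorted to make the string deterministic."""
    atom = mol.GetAtomWithIdx(root_idx)
    sig = f"[{atom.GetSymbol()}]"
    if height == 0:
        return sig
    branches = sorted(
        atomic_signature(mol, nb.GetIdx(), height - 1, root_idx)
        for nb in atom.GetNeighbors() if nb.GetIdx() != parent_idx
    )
    return sig + ("(" + "".join(branches) + ")" if branches else "")

mol = Chem.AddHs(Chem.MolFromSmiles("CC=NO"))      # acetaldoxime with explicit H
sigs = [atomic_signature(mol, a.GetIdx(), 1) for a in mol.GetAtoms()]
print(sigs)        # height-1 atomic signatures, one per atom
print(max(sigs))   # lexicographically largest, standing in for the canonical one
```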

    2.2: Data preparation: Molecules encoding and canonizing

    In this work, the SMILES expressions used to depict molecular structures are gathered from the PubChem database [40]. We developed a program based on RDKit [41] for parsing and preserving the canonical molecular signature. The program implements Faulon's algorithm to generate and canonize atomic signatures; it translates SMILES expressions into molecular graphs before canonizing the molecular structures. There are two ways of coding molecular structures in this program: one is the canonical string encoding the canonical molecular signature, and the other is the in-house coding method developed here. The canonical molecular signature is used to determine the root atom in different molecules. However, it is difficult to reproduce the molecular structure and feed it into the neural network from a molecular signature represented as a canonical string. When training the neural networks, a more straightforward and simpler expression is needed for parsing a molecule into a tree data structure. As such, a dedicated in-house coding method is proposed. Again taking acetaldoxime as an example, the codes of the atomic signatures from height 0 to 4, starting from C0 in Fig. 2B, are shown in Table 1.

    Table 1

    Note: When its height is zero, an atomic signature contains only one atom, expressed by a short string. When the height is greater than zero, the atomic signature consists of more than one atom. Each atom and its information are represented by a substring between two vertical bars in a line of code.

    The substring shown in Fig. 3 represents the carbon atom in the atomic signature of height 1. The first character in the code gives the layer position (height) of this atom in the tree form of the molecule, with the character S indicating that the layer position equals 0. The second character, 0, is the initial position index of the atom in the molecule. The third and fourth characters give the numbers of neighboring atoms and child atoms, respectively. The next character, C, is the SMILES symbol of the atom. The sixth character is the initial position index of the parent atom; when an atom is the root atom of a signature structure, the character S indicates that it has no parent. The seventh character is the valence of the atom. The eighth character represents the type of bond between this atom and its parent atom, and the last character indicates the isomeric type of that bond. A small parsing sketch for this field layout is given after Fig. 3.

    Fig. 3

    Fig. 3 Signature coding for the molecule of acetaldoxime when height is one.
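
    The following sketch maps the nine positional fields described above onto named attributes; the example string passed at the end is hypothetical (it is not copied from Table 1), so only the field layout, not the character values, should be read as authoritative.

```python
# Sketch of a parser for the in-house per-atom code described in the text:
# nine single-character fields per substring, substrings separated by "|".
from typing import NamedTuple

class AtomCode(NamedTuple):
    layer: str         # layer position (height) in the tree; 'S' means 0
    atom_index: str    # initial position index of the atom in the molecule
    n_neighbors: str   # number of neighboring atoms
    n_children: str    # number of child atoms in the tree
    symbol: str        # SMILES symbol of the atom
    parent_index: str  # index of the parent atom; 'S' means no parent (root)
    valence: str       # valence of the atom
    bond_type: str     # bond type to the parent atom
    isomeric: str      # isomeric type of the bond to the parent atom

def parse_line(line: str) -> list[AtomCode]:
    """Split one line of the code on vertical bars and map each
    nine-character substring onto the fields described in the text."""
    return [AtomCode(*field) for field in line.strip("|").split("|") if field]

# Hypothetical example substring (not taken from Table 1), only to show the layout:
print(parse_line("|S044CS40N|"))
```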

    2.3: Data preparation: Atom embedding from chemical bonds

    As inputs to the Tree-LSTM network, atoms and bonds need to be translated into vector representations. Word embedding has been widely applied in natural language processing, and several well-known programs, such as Word2vec [40], have been developed in that field. Inspired by this method, we propose a simple approach to generate vector representations of atoms (see Fig. 4) by breaking a chemical bond string into two smaller parts.

    Fig. 4

    Fig. 4 The procedure of the embedding neural network for vectorizing the bond-strings.

    As is well known, chemical bonds are frequently represented in the form A-B, where A and B represent atoms and - represents the bond type between the two atoms. Strings of the form A-B are extracted from a dataset of molecular structures and then split into two parts, A and -B, as samples for training the embedding neural network. For this application, the skip-gram algorithm [42] is employed. As such, the substrings A and -B can be mapped to vectors expressing each node in the Tree-LSTM network. In other words, a molecule is considered as a sentence in the embedding algorithm, and A or -B is equivalent to a word.
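
    A small sketch of this embedding step, assuming gensim (4.x API) is available: each molecule is treated as a 'sentence' of A and -B tokens, and a skip-gram Word2Vec model maps each token to a vector. The token streams below are illustrative, not the actual training corpus.

```python
# Sketch of the skip-gram embedding of bond substrings: each molecule becomes
# a "sentence" of A and -B tokens (the token lists below are illustrative only).
# Assumes gensim 4.x (Word2Vec with vector_size / sg / epochs arguments).
from gensim.models import Word2Vec

corpus = [
    ["C", "-C", "C", "=N", "N", "-O", "O", "-H"],   # acetaldoxime-like token stream
    ["C", "-C", "C", "-O", "O", "-H"],              # ethanol-like token stream
]

model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1, epochs=50)
print(model.wv["C"])     # vector for atom substring "C"
print(model.wv["-O"])    # vector for bond substring "-O"
```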
