Big Mechanisms in Systems Biology: Big Data Mining, Network Modeling, and Genome-Wide Data Identification
Ebook, 1,627 pages, 16 hours

About this ebook

Big Mechanisms in Systems Biology: Big Data Mining, Network Modeling, and Genome-Wide Data Identification explains big mechanisms of systems biology through system identification and big data mining methods applied to models of biological systems. Systems biology is currently undergoing revolutionary changes in response to the integration of powerful technologies. For readers faced with a large volume of available literature, complicated mechanisms, limited prior knowledge, few classes on the topic, and causal and mechanistic language, this book is an ideal resource.

This book addresses system immunity, regulation, infection, aging, evolution, and carcinogenesis, which are complicated biological systems with inconsistent findings in existing resources. These inconsistencies may reflect the underlying biology: time-varying systems and signal transduction events are often context-dependent. This raises a significant problem for mechanistic modeling, since it is not clear which genes/proteins to include in models or experimental measurements.

The book is a valuable resource for bioinformaticians and for members of several areas of the biomedical field who want an in-depth understanding of how to process and apply large amounts of biological data to improve research.

  • Written in a didactic manner in order to explain how to investigate Big Mechanisms by big data mining and system identification
  • Provides more than 140 diagrams to illustrate Big Mechanisms in systems biology
  • Presents worked examples in each chapter
Language: English
Release date: Oct 25, 2016
ISBN: 9780128097076
Author

Bor-Sen Chen

Bor-Sen Chen received the B.S. degree in electrical engineering from Tatung Institute of Technology in 1970, the M.S. degree in geophysics from National Central University in 1973, and the Ph.D. in electrical engineering from the University of Southern California in 1982. He is an expert on nonlinear robust control and filter designs based on stochastic Nash game theory, which override the influence of intrinsic random fluctuations and attenuate the effect of environmental disturbances. These designs can be applied to the evolutionary game strategies of biological networks under natural selection, which must respond to random genetic variations and environmental disturbances in the evolutionary process. Prof. Chen audited more than 10 biology courses before beginning his research in systems biology. He has published about 100 papers in bioinformatics and systems biology, more than 100 papers in system theory and control, and more than 80 papers in signal processing and communication. In the last three years, he has also published 7 monographs. He was elected an IEEE Fellow in 2001 and became an IEEE Life Fellow in 2014.

    Book preview

    Big Mechanisms in Systems Biology - Bor-Sen Chen

    Big Mechanisms in Systems Biology

    Big Data Mining, Network Modeling, and Genome-Wide Data Identification

    Bor-Sen Chen

    National Tsing Hua University, Hsinchu, Taiwan

    Cheng-Wei Li

    National Tsing Hua University, Hsinchu, Taiwan

    Table of Contents

    Cover image

    Title page

    Copyright

    Chapter 1. Introduction to Big Mechanisms in Systems Biology

    Abstract

    Introduction

    1.1 Introduction to Big Mechanisms

    1.2 Big Mechanisms in Systems Biology

    1.3 The Scope of Big Mechanisms of Systems Biology in This Book

    References

    Chapter 2. System Modeling and System Identification Methods for Big Mechanisms in Biological Systems

    Abstract

    Introduction

    2.1 Dynamic System Models and Their Parameter Estimation by Time-Profile Experimental Data

    2.2 Static Models and Their Parameter Estimation by Sample Microarray Data

    2.3 Modeling and Identification of Integrated Genetic and Epigenetic Cellular Networks

    2.4 The Core Network by PNP of the Integrated Genetic and Epigenetic Cellular Network Using PCA

    References

    Chapter 3. Procedure for Exploring Big Mechanisms of Systems Biology Through System Identification and Big Database Mining

    Abstract

    Introduction

    3.1 Big Mechanisms Based on GRNs by System Identification and Big Database Mining

    3.2 Big Mechanisms Based on PPINs by System Identification and Big Database Mining

    3.3 Big Mechanisms Based on the Integrated GRN and PPIN by System Identification and Big Database Mining

    3.4 Big Mechanisms Based on the Integrated Genetic and Epigenetic Cellular Network by System Identification and Big Database Mining

    References

    Chapter 4. Big Cellular Mechanisms in the Cell Cycle by System Identification and Big Data Mining

    Abstract

    Introduction

    4.1 Constructing Transcriptional Regulatory Network to Investigate the Big Mechanisms in the Yeast Cell Cycle by System Identification and Big Data Mining

    Appendix A Matched Filter for Selecting More Correlated Regulators in Yeast Cell Cycle

    4.2 Constructing TRMs for Big Regulatory Mechanisms of the Yeast Cell Cycle

    Appendix B Methods and Figures

    References

    Chapter 5. Big Regulatory Mechanisms in the Transcriptional Regulation Control of Gene Expression Using a Stochastic System Model and Genome-Wide Experimental Data

    Abstract

    Introduction

    5.1 Identification of TF Cooperativity in Gene Regulation of the Cell Cycle via the Stochastic System Model

    Appendix A Methods in Identifying the TF Cooperativity

    5.2 Cis-Regulatory Mechanisms for Gene Expression via Cross-Gene Identification and Data Mining

    5.3 Nonlinear Dynamic Trans/Cis-Regulatory Mechanisms for Gene Transcription via Microarray Data

    Appendix B Figures

    References

    Chapter 6. Big Mechanisms of Information Flow in Cellular Systems in Response to Environmental Stress Signals via System Identification and Data Mining

    Abstract

    Introduction

    6.1 Constructing Stress-Response Mechanisms via Dynamic Gene Regulatory Modeling and Data Mining

    6.2 Identifying Protective Mechanisms of Gene and Protein Networks in Response to a Broad Range of Environmental Stress Signals

    6.3 Constructing GRNs for Control Mechanisms of Photosynthetic Light Acclimation in Response to Different Light Signals

    6.4 Constructing IGECN for Investigating Whole Cellular Signal Flow Mechanisms in Response to Environmental Stress Signals Using High-Throughput NGS

    References

    Chapter 7. Big Offensive and Defensive Mechanisms in Systems Immunity From System Modeling and Big Data Mining

    Abstract

    Introduction

    7.1 A Systems Biology Approach to Construct the GRN of Systemic Inflammation Mechanisms via Microarray and Databases Mining

    Appendix A Tables and Figures

    7.2 Identification of Infection and Defense-Related Mechanisms via a Dynamic Host–Pathogen Interaction Network Using C. albicans-Zebrafish Infection Model

    Appendix B Methods, Tables, and Figures

    7.3 Investigating Host–Pathogen Interaction Networks to Reveal the Pathogenic Mechanism in HIV Infection: A Systems Biology Approach

    Appendix C Figures

    References

    Chapter 8. Big Regeneration Mechanisms via Systems Biology and Big Database Mining Methods

    Abstract

    Introduction

    8.1 Dynamic System Mechanisms in the Three Differentiation Stages of Stem Cells to Reveal Essential Proteins and Functional Modules in the Directed Differentiation Process

    Appendix A Figures

    8.2 Cerebella Regeneration-Related Pathways and Their Crosstalks in Molecular Restoration Mechanisms After TBI in Zebrafish

    Appendix B Methods, Tables, and Figures

    References

    Chapter 9. Big Tumorigenesis Mechanisms in Systems Cancer Biology via Big Database Mining and Network Modeling

    Abstract

    Introduction

    9.1 Construction and Clarification of Dynamic Networks of the Cancer Cell Cycle via Microarray Data

    Appendix A Methods

    9.2 Investigating Tumorigenesis Mechanisms by Cancer-Perturbed PPINs

    Appendix B Methods of Constructing Cancer-Perturbed PPINs

    9.3 A Network-Based Biomarker Approach for Molecular Investigation and Diagnosis of Lung Cancer

    Appendix C Tables and Figures

    9.4 Network Biomarkers of Bladder Cancer Based on a Genome-Wide Genetic and Epigenetic Network Derived From NGS Data

    References

    Chapter 10. Big Evolutionary Mechanisms of Network Robustness and Signaling Transductivity in Aging and Carcinogenic Process by System Modeling and Database Mining

    Abstract

    Introduction

    10.1 New Measurement Methods of Network Robustness and Response Ability in Aging and Carcinogenic Process via Microarray Data and Dynamic System Model

    Appendix A Methods and Figures

    10.2 Evolution of Signal Transductivities of Coupled Signal Pathways in the Carcinogenic Process

    Appendix B Figures

    10.3 Nonlinear Stochastic Game Strategy for Evolution Mechanisms of Organ Carcinogenesis Under a Natural Selection Scheme

    Appendix C

    References

    Chapter 11. Big Mechanisms of Aging via System Identification and Big Database Mining

    Abstract

    Introduction

    11.1 On the Systematic Mechanism of GRN in the Aging Process: A Systems Biology Approach via Microarray Data

    11.2 Investigating Specific Core GEN for Cellular Mechanisms of Human Aging via NGS Data

    References

    Chapter 12. Big Drug Design Mechanisms via Systems Biology and Big Database Mining

    Abstract

    Introduction

    12.1 Overview of Drug Discovery Using Systems Biology

    12.2 Investigating Core and Specific Network Markers of Cancers for Multiple Drug Targets

    Appendix A Methods, Tables, and Figures

    12.3 Systems Drug Design Mechanisms for Multiple Drug Targets

    Appendix B Method and Table

    References

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    125 London Wall, London EC2Y 5AS, United Kingdom

    525 B Street, Suite 1800, San Diego, CA 92101-4495, United States

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    Copyright © 2017 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    ISBN: 978-0-12-809479-2

    For Information on all Academic Press publications visit our website at https://www.elsevier.com

    Publisher: Mica Haley

    Acquisition Editor: Rafael Teixeira

    Editorial Project Manager: Mariana Kuhl

    Production Project Manager: Karen East and Kirsty Halterman

    Designer: Victoria Pearson

    Typeset by MPS Limited, Chennai, India

    Chapter 1

    Introduction to Big Mechanisms in Systems Biology

    Abstract

    To date, the theorems, analyses, and mechanisms of complicated systems, such as ecosystems, biological systems, economic systems, and social systems, remain compartmentalized because their data and literature are fragmented, distributed, and inconsistent. Even as data (big data) and databases (big databases) grow, integrating information into an explanatory pattern (or mechanistic model) that captures the overall mechanisms remains a challenge. Such an explanatory pattern (or mechanistic model) is referred to as a Big Mechanism.

    Systems biology sits at the heart of a new integrative 21st-century paradigm of biology. In this chapter, the scope of the Big Mechanisms of systems biology is introduced. Further, we show how a block diagram can help resolve and describe the Big Mechanisms of systems biology via system modeling, system identification, and extensive data.

    Keywords

    Big Mechanisms; complicated systems; big data; systems biology; system identification; explanatory pattern; mechanistic model

    Introduction

    Currently, the reductionist approach in science yields causal models that cover only small fragments of complicated systems. Constructing causal models of entire systems is difficult because the required information is distributed across an extensive literature. In the United States, the Defense Advanced Research Projects Agency (DARPA) created the Big Mechanism program (BMP) to develop technology for constructing, understanding, and reasoning about large, complicated systems such as climatic, economic, ecological, and biological systems. At present, the BMP focuses on cancer signaling pathways, but the technology is intended for general-purpose application [1,2]. In the future, the BMP aims to produce machines that can read literature and assemble causal fragments found in individual articles into larger causal models. For example, a computer could gather information from the literature on cancer biology, extract fragments of causal mechanisms from publications, assemble the mechanisms into executable models of unprecedented scale and fidelity, use these models to explain and predict aspects of cancer biology, and even test these predictions in vitro [1,2]. The BMP aims to use existing online sources of biological knowledge, together with existing machine reading and information extraction methods such as big data mining, to develop three things: representation and inference methods for mechanistic biology models; methods that let machines understand the claims and evidence in papers; and algorithms or system models for inferring causal relationships through system identification via big data and big data mining [2].

    In this chapter, we introduce Big Mechanisms, which are mechanistic models of complicated systems with too many elements and relationships, or too many possible current or future states, to be easily comprehended by humans. Next, we introduce Big Mechanisms in systems biology via system modeling and identification through genome-wide data and big data mining methods. Since the BMP focuses on systems biology, this book covers the system mechanisms of the cell cycle, signal flow, immunity, regeneration, infection, aging, evolution, carcinogenesis, and medicine, based on system models, big database mining, and genome-wide high-throughput experimental data. This book is designed to develop systematic methods that help scientists understand the Big Mechanisms of very complicated biological systems.

    1.1 Introduction to Big Mechanisms

    DARPA has recently introduced a search for technologies based on a new kind of science in which research is integrated automatically or semi-automatically into causal, explanatory models [1]. It emphasizes the need for solutions to complex systems such as ecosystems, brains, economies, and social systems, whose parts and processes are currently studied piecemeal, with literature and data that are fragmented, distributed, and inconsistent. It addresses the problem of information overload that we all currently experience in Big Data environments. In recent years, Big Data has transformed science, engineering, medicine, healthcare, finance, business, the military, and ultimately society itself. For example, analyses of command information flows during military crises have been suggested as an approach to ontogenetic learning that avoids both the problem of describing the subject covered in a document, and the problem of integrating new subject matter into a predetermined classification code. We need to make documents and data computer-readable and assemble the fragments into explanatory patterns. These patterns would form part of the Big Mechanisms that explain causes and effects within systems.

    Big Mechanisms are therefore made up of many small mechanisms, which are dispersed and need to be integrated into a knowledge base before the mechanisms of big and complex systems can be understood. However, our knowledge of these mechanisms is increasingly fragmented, voluminous, and inconsistent. It is of central importance to plan and follow a big database mining process to create a knowledge base, and to investigate how to put this knowledge base into both relational and graph formats so that analyses and visualizations can support the development and application of knowledge. DARPA’s goal is to develop technologies for a new kind of science in which many areas of research are integrated into causal, explanatory, or mechanistic models, with deeper semantics to represent the causal and often kinetic models, in response to the big data challenge that science and industry face today.

    At present, the domain of the BMP is systems biology [1,3,4]. The amount of data produced by the study of biological systems in different species and at different scales, from molecules to ecosystems, is growing exponentially. As this increase in information presents challenges to some areas of biology, systems biology researchers employ bioinformatics technologies to handle large amounts of data, from local production through to storage in publicly accessible and integrative depositories. Systems biology is a new area of study in biology that seeks to incorporate bioinformatics, statistics, mathematics, physics, chemistry, biology, and engineering, and that promises to advance and transform our ability to increase biological understanding, program biological functions, and render the engineering of biological circuits or pathways faster and more predictable. A large-scale, integrative, and multidisciplinary approach is needed for systems biology to flourish.

    1.2 Big Mechanisms in Systems Biology

    This book addresses Big Mechanisms of systems biology by system identification and big data mining methods using models of biological systems [2,5]. Faced with a large volume of available literature, complicated mechanisms, limited prior knowledge, few classes on the topic, and causal and mechanistic language, biological research is currently undergoing revolutionary changes in response to the integration of powerful technologies. Immunity, regeneration, infection, aging, evolution, and carcinogenesis are complicated biological systems, and the related biological knowledge is fragmented, voluminous, and even inconsistent. The authors have observed remarkably poor agreement among different databases regarding the components of these biological systems. These inconsistencies may reflect the underlying biology: time-varying systems and signal transduction events are often context-dependent. This raises a significant problem for mechanistic modeling, since it is not clear which genes/proteins to include in models or experimental measurements.

    In this book, we first construct candidate biological networks from omics data and big data mining; these candidates contain many false positives, interaction inconsistencies, and contradictions (see Fig. 1.1). System models are then employed to describe the relationships between components in the biological network. System identification (reverse engineering) methods are used to identify the interaction parameters of the candidate biological network from experimental measurements, via microarray data, NGS (next-generation sequencing) data, or real-time PCR data, in a specific biological condition such as lung cancer. Finally, a system order detection method is employed to determine the system order (the number of interactions) of the biological network. Interactions that fall outside the detected system order are insignificant and can be deleted, which prunes false positives, inconsistencies, and contradictory interactions from the candidate biological network and yields the real biological network of the specific biological condition.

    Figure 1.1 A block diagram of Big Mechanisms of systems biology via system identification and big data mining.
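
    The identify-then-prune procedure described above can be sketched for a single target gene. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear regulatory model, ordinary least squares for system identification, and the Akaike information criterion (AIC) as the order detection rule. The function name identify_regulators, the regulator names, and the synthetic data are all hypothetical.

```python
import numpy as np

def identify_regulators(regs, y, names):
    """Least-squares identification with AIC-based pruning for one target gene.

    regs  : (T, n) array, levels of n candidate regulators at T time points
    y     : (T,) array, measured response of the target gene
    names : list of n regulator names
    Returns {regulator name: estimated interaction parameter} for kept edges.
    Assumes the regulators are on comparable scales, because candidates are
    ranked by the magnitude of their estimated parameters.
    """
    T, n = regs.shape

    def fit(cols):
        # Regression matrix: chosen regulators plus a constant (basal level).
        cols = np.asarray(cols, dtype=int)
        phi = np.column_stack([regs[:, cols], np.ones(T)])
        theta, *_ = np.linalg.lstsq(phi, y, rcond=None)
        resid = y - phi @ theta
        sigma2 = max(float(resid @ resid) / T, np.finfo(float).tiny)
        aic = np.log(sigma2) + 2.0 * (len(cols) + 1) / T
        return theta, aic

    # Step 1: identify all candidate interactions at once.
    theta_full, _ = fit(np.arange(n))
    ranking = np.argsort(-np.abs(theta_full[:n]))

    # Step 2: order detection. Try keeping the p strongest candidates for
    # p = 0..n and keep the order with the smallest AIC.
    best_aic, best_cols, best_theta = None, None, None
    for p in range(n + 1):
        cols = ranking[:p]
        theta, aic = fit(cols)
        if best_aic is None or aic < best_aic:
            best_aic, best_cols, best_theta = aic, cols, theta

    return {names[j]: float(best_theta[k]) for k, j in enumerate(best_cols)}

# Synthetic example: six candidate regulators, only TF1 and TF3 are real.
rng = np.random.default_rng(0)
T = 60
names = ["TF1", "TF2", "TF3", "TF4", "TF5", "TF6"]
regs = rng.standard_normal((T, len(names)))
y = 0.8 * regs[:, 0] - 0.6 * regs[:, 2] + 0.1 + 0.05 * rng.standard_normal(T)

network = identify_regulators(regs, y, names)
print(network)
```

    AIC tends to keep an occasional spurious edge, so in practice a stricter criterion or a significance test on each retained parameter may be preferable.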

    After detecting and removing inconsistencies between the experimental data and the candidate biological network, we obtain biological networks for different biological conditions. We can compare two biological networks under different conditions (e.g., normal versus cancer) to find significant changes in interactions; these changes serve as the network markers of the transition between conditions. Using related online tools, such as GO Enrichment Analysis, we can find the biological functions or pathways that explain the system mechanisms behind these network markers, and we can identify suitable drug targets for therapeutic treatment through sensitivity and robustness analysis.

    In this book, the Big Mechanisms of biological systems, like the immune system, regeneration process, photosynthetic system, and cancer, are constructed by systems biology and big database mining methods. Systems biology sits at the heart of a new integrative 21st-century paradigm of biology. It has introduced a search for a new kind of scientific research based on big data mining and on causal and explanatory models of experimental data. If we could resolve the Big Mechanisms of systems biology by system modeling, system identification, and big data, the results could provide a paradigm for resolving the Big Mechanisms of other complex systems, such as ecosystems, brains, economies, and social systems, in the future.
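
    The comparison of two identified networks can be illustrated with a small sketch. It assumes each network is stored as a mapping from (regulator, target) pairs to identified interaction strengths and that a simple difference with a fixed threshold marks a change as significant; the book's actual criterion is not given in this chapter. The function name network_markers, the node names, the weights, and the threshold are all made up for illustration.

```python
def network_markers(net_a, net_b, threshold=0.2):
    """Rank edges by the change in interaction strength between two networks.

    net_a, net_b : dicts mapping (regulator, target) -> identified weight;
                   an edge absent from a network is treated as weight 0.
    threshold    : hypothetical cutoff below which a change is ignored.
    Returns a list of ((regulator, target), change) sorted by |change|.
    """
    edges = set(net_a) | set(net_b)
    changes = {e: net_b.get(e, 0.0) - net_a.get(e, 0.0) for e in edges}
    kept = [(e, d) for e, d in changes.items() if abs(d) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Illustrative weights only; these are not measured values.
normal = {("TF1", "G1"): 0.9, ("TF2", "G2"): 0.30, ("TF3", "G3"): 0.5}
cancer = {("TF1", "G1"): 0.1, ("TF2", "G2"): 0.35, ("TF4", "G4"): 0.7}

markers = network_markers(normal, cancer)
for edge, change in markers:
    print(edge, round(change, 2))
```

    Edges that are lost, gained, or strongly reweighted all appear in the ranking, while the nearly unchanged TF2 to G2 edge is filtered out.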

    1.3 The Scope of Big Mechanisms of Systems Biology in This Book

    To describe system mechanisms and behaviors, system modeling and system identification methods are introduced together with big data mining in Chapter 2, System Modeling and System Identification Methods for Big Mechanisms in Biological Systems. Linear and nonlinear models, along with the related systematic and mathematical analyses for biological networks, are introduced to investigate the Big Mechanisms in systems biology. Then, least-squares and maximum likelihood parameter estimation methods are introduced to estimate the system parameters of biological networks from experimental or high-throughput data [5]. Some stochastic testing, systems theory, and system order detection methods are provided for statistical inference and hypothesis testing of biological mechanisms in bioinformatics and systems biology [6,7].

    Chapter 3, Procedure for Exploring Big Mechanisms of Systems Biology Through System Identification and Big Database Mining, introduces how to construct Big Mechanisms in systems biology by integrating many smaller mechanisms from dispersed, heterogeneous omics data via system models and big database mining. Since the main domain of Big Mechanisms is systems biology, several examples are given, namely how to construct a gene regulatory network (GRN), a protein–protein interaction network (PPIN), and integrated genetic and epigenetic networks (IGEN) from omics data. These examples illustrate how to follow big data mining processes and system identification procedures to elucidate system-based Big Mechanisms.

    The most remarkable feature of cells and entire organisms is their ability to reproduce. The cell cycle entails an ordered series of macromolecular events that lead to cell division and the production of two daughter cells. Accordingly, Chapter 4, Big Cellular Mechanisms in the Cell Cycle by System Identification and Big Data Mining, examines the regulation of transcription during the progression of the yeast cell cycle as an example of analyzing Big Cellular Mechanisms using big data mining processes and system identification procedures [8–10].
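
    For orientation, the two estimators named above can be written in standard textbook form. The notation is an assumption of this sketch rather than the book's own: y stacks the T measurements of one target, Φ is the regression matrix built from the candidate regulators, θ holds the unknown interaction parameters, and ε is measurement noise. The closed form requires Φ to have full column rank.

```latex
% Assumed linear-in-parameters model and its least-squares estimate
\begin{align}
  y &= \Phi\,\theta + \varepsilon, \\
  \hat{\theta}_{\mathrm{LS}}
    &= \arg\min_{\theta}\,\lVert y - \Phi\,\theta \rVert^{2}
     = \left(\Phi^{\top}\Phi\right)^{-1}\Phi^{\top} y .
\end{align}
% If \varepsilon \sim \mathcal{N}(0, \sigma^{2} I), maximizing the likelihood
% over \theta gives the same estimate, and the noise variance estimate is
\begin{align}
  \hat{\theta}_{\mathrm{ML}} &= \hat{\theta}_{\mathrm{LS}}, &
  \hat{\sigma}^{2}_{\mathrm{ML}}
    &= \frac{1}{T}\,\lVert y - \Phi\,\hat{\theta}_{\mathrm{LS}} \rVert^{2}.
\end{align}
```

    Under Gaussian noise the two methods therefore agree on θ; they differ when the noise model is non-Gaussian or when the model is nonlinear in its parameters.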

    Since the control that transcription factors (TFs) exert on their target genes is one of the most important cellular mechanisms, it is appealing to identify the regulatory trans/cis mechanisms of TFs via dynamic regulatory models and genome-wide microarray data. Therefore, in Chapter 5, Big Regulatory Mechanisms in the Transcriptional Regulation Control of Gene Expression Using a Stochastic System Model and Genome-Wide Experimental Data, stochastic system models and experimental data are used to analyze Big Regulatory Mechanisms in transcription control [11–13].

    Cells sense extracellular signals through intercellular communication or stress responses. Cellular responses to sudden environmental stresses or physiological changes give living organisms the opportunity to survive and develop further. Therefore, defending against environmental stresses with protective mechanisms is an important topic. In Chapter 6, Big Mechanisms of Information Flow in Cellular Systems in Response to Environmental Stress Signals via System Identification and Data Mining, cellular systems under environmental stresses are given as examples to identify Big Mechanisms of information flow and protective mechanisms in response to environmental stresses using system identification and database mining [14–17].

    The major task of the immune system is to defend the host against infections. Recently, a large variety of experimental techniques and high-throughput NGS data have become available. They provide a sound scientific basis for integrative approaches to advanced quantitative and qualitative systems biology, in which computation and modeling allow us to understand the Big Mechanisms of complex immune responses. In Chapter 7, Big Offensive and Defensive Mechanisms in Systems Immunity From System Modeling and Big Data Mining, system modeling and big data mining are used to identify the big offensive and defensive mechanisms in the immune system [18–20].

    Regeneration is one of the most intriguing and fascinating biological phenomena, but the molecular and cellular bases of regeneration are still not fully understood. Unlike vertebrates with high regenerative capacity, such as amphibians and fish, mammals have a restricted ability to regenerate lost cells, tissues, and organs at the adult stage. Therefore, understanding the molecular mechanism of regeneration in organisms is not only valuable in shedding light on the longstanding question of regeneration mechanisms, but it is also useful for stem cell-based therapies in the future. In Chapter 8, Big Regeneration Mechanisms via Systems Biology and Big Database Mining Methods, Big Regeneration Mechanisms are given as an example of using systems biology and big database mining methods [21,22].

    Cancer is a genetic disease. Many screening arrays for finding new cancer genes have been constructed. Cancer is also a systemic disease. Therefore, tumorigenesis mechanisms should be investigated from a systems biology perspective. In Chapter 9, Big Tumorigenesis Mechanisms in Systems Cancer Biology via Big Database Mining and Network Modeling, cancer systems biology is used to identify Big Tumorigenesis Mechanisms through big database mining and network modeling [20,23–25].

    Living organisms are complex systems characterized by emergent properties. A ubiquitous property in complex systems is robustness. Robustness is not a trivial biological mechanism that can be studied using a reductionist approach. Robust mechanisms are frequently and widely distributed, guaranteeing organization of the system and cooperation between various parts. Signal transductivity, a system property, is complementary to robustness, i.e., a biological system with greater network robustness will show greater loss of signal transductivity and vice versa. Since aging and cancer are both systems diseases of somatic cell evolution, it is appealing to investigate the network robustness and signal transductivity of somatic cells in the aging and carcinogenic processes. In Chapter 10, Big Evolutionary Mechanisms of Network Robustness and Signaling Transductivity in Aging and Carcinogenic Process by System Modeling and Database Mining, the Big Mechanisms in aging processes and carcinogenesis are given as examples of examining the mechanisms of network robustness and signaling transductivity through system modeling and database mining [26,27].

    Aging, an extremely complex and system-level process, has attracted much attention in medical research, especially as chronic diseases are quite prevalent in the elderly population. These diseases may result from accumulated genetic and epigenetic variations that lead to intrinsic perturbations, and from environmental changes that may stimulate signaling in the body. Decades of research have demonstrated that a higher incidence of many pathophysiological phenotypes, such as cancer, inflammation, diabetes, cardiovascular diseases, and neurodegenerative disorders, is associated with the process of aging. In Chapter 11, Big Mechanisms of Aging via System Identification and Big Database Mining, we first investigate the systematic mechanism of the GRN in the aging process using a systems biology method and microarray data. We then investigate the specific core genetic and epigenetic network for cellular mechanisms of human aging [28].

    In the past few decades, medicine has sought a magic bullet that targets a single disease-causing molecule and that could cure cancer, diabetes, and other complex diseases. Although some drugs have proven successful, many others have been found to be ineffective or to cause significant side effects. This disappointing outcome highlights the limitation of single-target drug paradigms and is considered to be the underlying cause of stagnation in the productivity of the pharmaceutical industry. However, compound efficacy and safety in humans, including toxicity and pharmacokinetic profiles, rather than target selection, are the criteria that determine which drug candidates enter the clinic. Therefore, drug discovery using database mining and systems biology will become the most important approach to drug discovery. In Chapter 12, Big Drug Design Mechanisms via Systems Biology and Big Database Mining, Big Drug Design Mechanisms are introduced as therapeutic treatments using systems biology and big database mining. We first give an overview of drug discovery using systems biology and then introduce how to investigate core and specific network markers for multiple drug targets from the perspective of systems biology. Finally, a cocktail drug design is introduced for multiple drug targets [29,30].

    References

    1. A data science Big Mechanism for DARPA, <http://semanticommunity.info/>.

    2. Cohen PR. DARPA’s Big Mechanism program. Phys Biol. 2015;12(4):1–9.

    3. Chen B-S, Wu C-C. Systems biology: an integrated platform for bioinformatics, systems synthetic biology and systems metabolic engineering. New York, NY: Nova Science Pub Inc.; 2014.

    4. Klipp E, Liebermeister W, Wierling C, Kowald A, Lehrach H, Herwig R. Systems biology. Weinheim: Wiley; 2011.

    5. Johansson R. System modeling and identification. Englewood Cliffs, NJ: Prentice Hall; 1993.

    6. Chen BS, Li CW. Constructing an integrated genetic and epigenetic cellular network for whole cellular mechanism using high-throughput next-generation sequencing data. BMC Syst Biol. 2016;10:18.

    7. Wang YC, Chen BS. Integrated cellular network of transcription regulations and protein-protein interactions. BMC Syst Biol. 2010;4:20.

    8. Chen HC, Lee HC, Lin TY, Li WH, Chen BS. Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle. Bioinformatics. 2004;20(12):1914–1927.

    9. Wu WS, Li WH, Chen BS. Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle. BMC Bioinformatics. 2006;7:421.

    10. Wu WS, Li WH, Chen BS. Identifying regulatory targets of cell cycle transcription factors using gene expression and ChIP-chip data. BMC Bioinformatics. 2007;8:188.

    11. Chang YH, Wang YC, Chen BS. Identification of transcription factor cooperativity via stochastic system model. Bioinformatics. 2006;22(18):2276–2282.

    12. Chang YH, Wang YC, Chen BS. Nonlinear dynamic trans/cis regulatory circuit for gene transcription via microarray data. Gene Regul Syst Bio. 2007;1:151–166.

    13. Lin LH, Lee HC, Li WH, Chen BS. Dynamic modeling of cis-regulatory circuits and gene expression prediction via cross-gene identification. BMC Bioinformatics. 2005;6:258.

    14. Li CW, Chen BS. Identifying functional mechanisms of gene and protein regulatory networks in response to a broader range of environmental stresses. Comp Funct Genomics. 2010;Article ID:408705 20 pages.

    15. Wu WS, Li WH, Chen BS. Reconstructing a network of stress-response regulators via dynamic system modeling of gene regulation. Gene Regul Syst Bio. 2008;2:53–62.

    16. Yao CW, Hsu BD, Chen BS. Constructing gene regulatory networks for long term photosynthetic light acclimation in Arabidopsis thaliana. BMC Bioinformatics. 2011;12:335.

    17. Przytycka TM, Kim YA. Network integration meets network dynamics. BMC Biol. 2010;8:48.

    18. Chen BS, Yang SK, Lan CY, Chuang YJ. A systems biology approach to construct the gene regulatory network of systemic inflammation via microarray and databases mining. BMC Med Genomics. 2008;1:46.

    19. Kuo ZY, Chuang YJ, Chao CC, Liu FC, Lan CY, Chen BS. Identification of infection- and defense-related genes via a dynamic host-pathogen interaction network using a Candida albicans-zebrafish infection model. J Innate Immun. 2013;5(2):137–152.

    20. Chen BS, Li CW. Constructing an integrated genetic and epigenetic cellular network for whole cellular mechanism using high-throughput next-generation sequencing data. BMC Syst Biol. 2016;10(1):18.

    21. Wu CC, Lin C, Chen BS. Dynamic network-based relevance score reveals essential proteins and functional modules in directed differentiation. Stem Cells Int. 2015;2015:792843.

    22. Wu CC, Tsai TH, Chang C, et al. On the crucial cerebellar wound healing-related pathways and their cross-talks after traumatic brain injury in Danio rerio. PLoS One. 2014;9.

    23. Chu LH, Chen BS. Construction of a cancer-perturbed protein-protein interaction network for discovery of apoptosis drug targets. BMC Syst Biol. 2008;2:56.

    24. Li CW, Chu YH, Chen BS. Construction and clarification of dynamic gene regulatory network of cancer cell cycle via microarray data. Cancer Inform. 2006;2:223–241.

    25. Wang YC, Chen BS. A network-based biomarker approach for molecular investigation and diagnosis of lung cancer. BMC Med Genomics. 2011;4:2.

    26. Chen BS, Tsai KW, Li CW. Using nonlinear stochastic evolutionary game strategy to model an evolutionary biological network of organ carcinogenesis under a natural selection scheme. Evol Bioinform. 2015;11:155–178.

    27. Tu CT, Chen BS. On the increase in network robustness and decrease in network response ability during the aging process: a systems biology approach via microarray data. IEEE-ACM Trans Comput Biol Bioinform. 2013;10(2):468–480.

    28. Li CW, Wang WH, Chen BS. Investigating the specific core genetic-and-epigenetic networks of cellular mechanisms involved in human aging in peripheral blood mononuclear cells. Oncotarget. 2016;7(8):8556–8579.

    29. Chen BS, Li CW. Analysing microarray data in drug discovery using systems biology. Expert Opin Drug Discov. 2007;2(5):755–768.

    30. Wong YH, Lin CL, Chen TS, et al. Multiple target drug cocktail design for attacking the core network markers of four cancers using ligand-based and structure-based virtual screening methods. BMC Med Genomics. 2015;8(Suppl. 4):S4.

    Chapter 2

    System Modeling and System Identification Methods for Big Mechanisms in Biological Systems

    Abstract

    Big Mechanisms in systems biology employ big database mining, system modeling, and system identification technologies to integrate small mechanisms behind big and complex systems using high-throughput data. Therefore, system models and their parameter estimation methods are necessary to remove the many false positives produced by big database mining and to construct real gene regulatory networks (GRNs), protein–protein interaction networks (PPINs), or genetic and epigenetic networks (GENs) for investigating the Big Mechanisms in high-throughput experimental data. In this chapter, we introduce several system identification methods for constructing a GRN, PPIN, or GEN from high-throughput data. Further, the principal network projection method, which is based on principal component analysis, is introduced to obtain the corresponding core network and to investigate the most significant mechanism of biological systems.

    Keywords

    System modeling; system identification; gene regulatory network (GRN); protein–protein interaction network (PPIN); genetic and epigenetic network (GEN); principal network projection (PNP)

    Introduction

    Consider the cellular system as shown in Fig. 2.1. Through intercellular communication and cellular stress responses, the cells sense extracellular signals. Different external changes or events may induce signaling in cells. Typical signals are hormones, pheromones, pathogens, heat, cold, light, osmotic pressure, and changes in the appearance or concentration of substances such as glucose, potassium ions, calcium ions, or cyclic adenosine monophosphate (cAMP) [1–7]. In the signal transduction pathway, extracellular signals are perceived by transmembrane receptors. The receptor changes its own state from susceptible to active and then it triggers subsequent cellular processes.

    Figure 2.1 An integrated signal transduction pathway (i.e., protein–protein interaction network (PPIN), gene regulatory network (GRN), and miRNA regulatory network).

    The active receptor stimulates an internal signaling cascade. This cascade frequently includes a series of changes in protein phosphorylation states. The sequence of state changes crosses the nuclear membrane. Eventually, some transcription factors (TFs) are activated or deactivated, which changes their ability to bind a set of genes that produce the corresponding proteins in response to extracellular signals or stresses. In addition to genetic regulation by TFs, epigenetic regulation due to DNA methylation and microRNAs (miRNAs) can also alter gene expression. However, owing to the complex nature of dynamic biological systems, knowledge of their components and interactions is not sufficient to interpret behaviors in the cellular system. In this book, as shown in Fig. 1.1, we employ big database mining to construct candidate integrated genetic and epigenetic networks (GENs) to interpret cellular mechanisms under specific biological conditions. However, a big database mining process produces a large number of false positives. Therefore, it is very important to prune these false positive regulations and interactions from the candidate GEN by real high-throughput experimental data. To do so, we model the interactions and regulations in the candidate GEN and then prune the false positive interactions and regulations to obtain the real GEN through system modeling and identification methods using microarray data or next generation sequencing (NGS) data.

    Therefore, in this chapter, we will describe the models for these cellular systems and use experimental data, such as microarray data, NGS data, or real-time polymerase chain reaction (PCR) data, to identify the parameters of the model. This will allow us to efficiently and accurately integrate these data into the Big Mechanism, and to interpret the behaviors of cellular systems by system modeling and identification methods. To begin, we will model and identify the protein–protein interaction network (PPIN) and gene regulatory network (GRN) by protein expression and microarray data, respectively. Then, these genetic networks will be integrated with epigenetic regulation by miRNA expression data to construct an integrated genetic and epigenetic cellular network. Further, the core network, extracted by applying principal network projection (PNP) to the integrated genetic and epigenetic cellular network, is also introduced in this chapter to investigate the significant network mechanism by principal component analysis (PCA) via the singular value decomposition (SVD) method.

    For the convenience of illustration, only linear system models are introduced for system identification of biological networks to investigate the Big Mechanisms through genome-wide high-throughput data. When necessary, the system identification methods based on nonlinear system models will be introduced in the following chapters.

    2.1 Dynamic System Models and Their Parameter Estimation by Time-Profile Experimental Data

    For simplicity of analysis, we first discuss only the coupled signal transduction pathways shown in Fig. 2.2, which is extracted from Fig. 2.1. In Fig. 2.2, yi(t) denotes the expression level of the ith protein in the coupled signal transduction pathways, and ui(t) denotes the extracellular signals. Different external changes or events outside the cell may stimulate signaling. In the flow chart of the coupled signal transduction pathways in Fig. 2.2, y42(t), y43(t), y44(t), y45(t), y46(t), y47(t),… represent the expression levels of the terminal TFs in the pathways as a PPIN.

    Figure 2.2 The coupled signal transduction pathways, i.e., protein–protein interaction network, or PPIN, extracted from Fig. 2.1.

    2.1.1 Dynamic model and parameter estimation for PPIN

    For the purpose of system identification in the flow of signal transduction pathways in Fig. 2.2, the simple linear dynamic regression model for the expression level of the ith protein at time t+1 can be described as follows:

    (2.1)

    where yi(t) indicates the expression level of the ith protein at the present time t; ci,j denotes the interaction ability between protein i and protein j; λi denotes the degradation rate of protein i; hi denotes the basal level of the ith protein; bi,j denotes the binding ability of extracellular signal uj to the ith protein. In general, extracellular signals always bind to the receptor proteins on the membrane. If the ith protein is not bound by extracellular signal uj(t), then bi,j=0. wi(t) denotes model residue or noise.

    Let us denote the state vector and system matrices of the PPINs from Fig. 2.2 in Eq. (2.1) as follows:

    where N is the total number of proteins and l is the total number of extracellular signals. Then, the PPI network of the coupled signal transduction pathways in Fig. 2.2 is represented by the following discrete-time dynamic equation

    (2.2)
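
    The displays for Eqs. (2.1) and (2.2) do not appear in this text. One form consistent with the definitions above is sketched here. The exact arrangement of terms is an assumption: in particular, whether a carry-over term yi(t) appears explicitly, and how the degradation term is folded into the diagonal of C, cannot be confirmed from the surrounding prose.

```latex
% Assumed scalar model for the ith protein, cf. Eq. (2.1)
y_i(t+1) = \sum_{j=1}^{N} c_{i,j}\, y_j(t) - \lambda_i\, y_i(t) + h_i
           + \sum_{j=1}^{l} b_{i,j}\, u_j(t) + w_i(t)

% Stacked vectors
y(t) = [\,y_1(t), \dots, y_N(t)\,]^{T}, \quad
u(t) = [\,u_1(t), \dots, u_l(t)\,]^{T}, \quad
H = [\,h_1, \dots, h_N\,]^{T}, \quad
w(t) = [\,w_1(t), \dots, w_N(t)\,]^{T}

% Assumed matrix form, cf. Eq. (2.2); the diagonal of C absorbs -\lambda_i
y(t+1) = C\, y(t) + B\, u(t) + H + w(t),
\qquad C \in \mathbb{R}^{N \times N}, \; B \in \mathbb{R}^{N \times l}
```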

    To exploit the effect of extracellular signals u(t) on the coupled signal transduction pathways, the effect of basal level H and noise w(t) should be extracted from the dynamic network in Eq. (2.2). Without consideration of the extracellular signal u(t), the pathway network in Eq. (2.2) can be represented by

    (2.3)

    which is only the effect of basal level H and noise w(t).

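    Define the signal-induced response as the difference between the solutions of Eqs. (2.2) and (2.3), and subtract Eq. (2.3) from Eq. (2.2). We then get

    (2.4)

    for the coupled signal transduction pathways in Fig. 2.2.

    The solution of the dynamic signal transduction system in Eq. (2.4) is given by the following signal flow equation [8]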

    (2.5)

    If we want to know the signal flow from u(t) to any protein, then the coupled signal transduction dynamic equation in Eq. (2.4) should be represented by

    (2.6)

    where the output of Eq. (2.6) denotes the expression of the ith protein due to extracellular signal u(t), and Di is a row vector with all zeros except a 1 at the ith element.

    Therefore, the signal flow from u(t) to the ith protein in the signal transduction dynamic equation in Eq. (2.6) is given by

    (2.7)
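
    The signal flow from u(t) to the ith protein can be illustrated numerically. The sketch below assumes the standard zero-initial-state solution of a discrete-time linear system, a sum over k of Di C^(t-1-k) B u(k), and checks it against direct simulation of the recursion. The matrices, the input sequence, and the function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def signal_flow(C, B, u, i):
    """Response of protein i to the input sequence u, starting from zero state.

    Evaluates sum_{k=0}^{t-1} D_i C^(t-1-k) B u(k) for t = 0..T, where D_i
    is the row vector that selects the ith state.
    """
    N = C.shape[0]
    T = u.shape[0]
    D_i = np.zeros(N)
    D_i[i] = 1.0
    out = np.zeros(T + 1)
    for t in range(1, T + 1):
        acc = 0.0
        for k in range(t):
            acc += D_i @ np.linalg.matrix_power(C, t - 1 - k) @ B @ u[k]
        out[t] = acc
    return out

def simulate(C, B, u):
    """Direct recursion x(t+1) = C x(t) + B u(t) from x(0) = 0."""
    x = np.zeros(C.shape[0])
    states = [x]
    for u_t in u:
        x = C @ x + B @ u_t
        states.append(x)
    return np.array(states)

# Hypothetical 3-protein cascade driven by one extracellular signal
C = np.array([[0.5, 0.0, 0.0],
              [0.3, 0.4, 0.0],
              [0.0, 0.2, 0.6]])
B = np.array([[1.0], [0.0], [0.0]])
u = np.ones((10, 1))                     # constant stimulus over 10 steps

flow_to_protein_2 = signal_flow(C, B, u, i=2)
states = simulate(C, B, u)
```

    Both routes give the same trajectory for the selected protein, which is what makes the closed-form expression useful for analysis without simulation.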

    Remark 2.1

    1. Considering basal level and noise, the solution of the dynamic equation of coupled transduction pathways in Eq. (2.2) is given by

    (2.8)

    where y(0) denotes the initial condition. In this case, the expression of the ith protein in Eq. (2.7) should be modified as

    (2.9)

    2. In the continuous dynamic case, Eq. (2.2) should be modified as

    (2.10)

    with the solution as follows [9]:

    (2.11)

    in which the signal flow from u(t) to the ith protein is given by

    (2.12)

    3. In nonlinear biological systems, Eqs. (2.2) and (2.10) should be modified as

    (2.13)

    and

    (2.14)

    .

    4. In the metabolic pathways in Fig. 2.2, the stoichiometric description of the flux network based on a balanced equation can be also modeled as Eq. (2.2) or (2.10) in linear dynamic systems or Eq. (2.13) or (2.14) in nonlinear dynamic systems.
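
    For reference, the closed-form solutions mentioned in items 1 and 2 of Remark 2.1 take the following standard form in linear systems theory. This is a sketch under the assumption that Eq. (2.2) reads y(t+1) = C y(t) + B u(t) + H + w(t) and that Eq. (2.10) is its continuous-time counterpart; the same symbols are reused for the continuous-time matrices only for brevity.

```latex
% Discrete time, including basal level and noise
y(t) = C^{t}\, y(0) + \sum_{k=0}^{t-1} C^{\,t-1-k}\,\bigl[\,B\,u(k) + H + w(k)\,\bigr]

% Continuous time
\dot{y}(t) = C\, y(t) + B\, u(t) + H + w(t)

y(t) = e^{Ct}\, y(0) + \int_{0}^{t} e^{C(t-\tau)}\,\bigl[\,B\,u(\tau) + H + w(\tau)\,\bigr]\, d\tau

% Signal flow from u(t) to the ith protein (zero initial state, H and w removed)
D_i \int_{0}^{t} e^{C(t-\tau)}\, B\, u(\tau)\, d\tau
```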

    Based on the above analyses, if the system matrices C, H, and B in Eq. (2.2) or (2.10) can be identified from time-profile microarray data or protein microarray data, behavior and system characteristics such as robustness, sensitivity, and transductivity of the biological network can be further analyzed. Therefore, we next estimate the system matrices of the coupled signal transduction pathways in Eq. (2.2) from time-profile microarray data as an illustrative example of system parameter estimation for the PPIN. In general, we do not identify C, H, and B from Eq. (2.2) directly, owing to the complex computation involved and the roundoff errors in the parameter estimation process. Instead, we estimate these parameters protein-by-protein from Eq. (2.1), which can be represented in the following linear regression form [10]

    (2.15)

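    where the regression vector collects the measured protein expression levels and extracellular signals at time t, and the parameter vector collects the interaction, degradation, basal-level, and binding parameters to be estimated.

    The parameter vector is estimated from the following M regression equations, written for t=1,…,M, i.e.,

    (2.16)

    which can be stacked in vector and matrix notation as follows:

    (2.17)

    Finally, the resulting linear regression model for parameter estimation is

    (2.18)

    The parameter vector is estimated so as to minimize the following square error [8]

    (2.19)

    The least squares estimate that minimizes the square error in Eq. (2.19) is obtained as [8]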

    (2.20)

    .
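
    The protein-by-protein least squares step can be sketched as follows. The regressor layout [y(t), u(t), 1] is an assumption made for this illustration, so the estimated vector stacks the ith row of C, the ith row of B, and the basal level hi. The code calls NumPy's `lstsq` solver in place of the explicit normal-equation inverse; both return the same minimizer when the regression matrix has full column rank. The data are synthetic and noise-free.

```python
import numpy as np

def estimate_protein_parameters(y, u, i):
    """Least squares fit of y_i(t+1) = phi(t)^T theta_i + w_i(t).

    y: (M+1, N) time profile of N proteins; u: (M, l) extracellular signals.
    phi(t) = [y(t), u(t), 1] is an assumed regressor layout.
    """
    M = u.shape[0]
    Phi = np.hstack([y[:M], u, np.ones((M, 1))])    # regression matrix
    Y = y[1:M + 1, i]                               # one-step-ahead targets
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return theta

# Synthetic 2-protein, 1-signal system (hypothetical values)
rng = np.random.default_rng(0)
C_true = np.array([[0.5, 0.1],
                   [0.2, 0.6]])
B_true = np.array([[1.0], [0.0]])
H_true = np.array([0.3, 0.1])

M = 40
u = rng.normal(size=(M, 1))
y = np.zeros((M + 1, 2))
for t in range(M):
    y[t + 1] = C_true @ y[t] + B_true @ u[t] + H_true   # noise-free profile

theta_0 = estimate_protein_parameters(y, u, i=0)   # [c_11, c_12, b_11, h_1]
```

    Repeating the call for each i and stacking the results row by row yields estimates of the full system matrices.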

    If the parameter constraints λi≥0 and hi≥0 need to be considered in the parameter estimation procedure, the constrained least squares parameter estimation method should be used, which can be solved by using the optimization toolbox in Matlab [11,12].

    The least squares parameter estimation algorithm in Eq. (2.20) for the dynamic model (2.1) of the ith protein is equivalent to the following recursive least squares parameter estimation algorithm [8]

    (2.21)

    where the initial parameter estimate and the initial covariance matrix are given.
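
    A minimal sketch of a recursive least squares update is given below, using the textbook gain and covariance recursions. The initial values (a zero parameter vector and a large multiple of the identity for the covariance) are assumptions chosen so that the recursion approaches the batch solution; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def recursive_least_squares(Phi, Y, theta0=None, p0=1e6):
    """Process one regression row at a time, updating theta and P."""
    n = Phi.shape[1]
    theta = np.zeros(n) if theta0 is None else np.asarray(theta0, dtype=float).copy()
    P = p0 * np.eye(n)                    # large P(0): vague initial knowledge
    for phi, y in zip(Phi, Y):
        P_phi = P @ phi
        gain = P_phi / (1.0 + phi @ P_phi)          # gain vector K(t)
        theta = theta + gain * (y - phi @ theta)    # correct by prediction error
        P = P - np.outer(gain, P_phi)               # P(t) = P(t-1) - K(t) phi^T P(t-1)
    return theta, P

# Synthetic regression problem (hypothetical values)
rng = np.random.default_rng(1)
Phi = rng.normal(size=(30, 3))
theta_true = np.array([0.8, -0.5, 0.2])
Y = Phi @ theta_true + 0.01 * rng.normal(size=30)

theta_rls, _ = recursive_least_squares(Phi, Y)
theta_batch, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
```

    With a large initial covariance the recursive estimate agrees with the batch estimate to within a small tolerance. Passing the returned estimate back in as `theta0` is one way to run a further round over the same data.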

    This recursive least square parameter estimation algorithm uses time-profile microarray data or real-time PCR data to update parameters protein-by-protein. Therefore, it can be used for real-time parameter estimation. If the number of time points in the time-profile data yi(t) is small, we can repeat several rounds of the recursive parameter estimation algorithm in Eq. (2.21). After the parameters are estimated for i=1,…,N, the system matrices of the transduction pathways in Eq. (2.2) are obtained as follows:

    (2.22)

    which are the estimates of the system matrices in Eq. (2.2), respectively.

    Remark 2.2

    1. The parameter estimation problem in Eq. (2.15) or (2.18) can be also solved by the maximum likelihood estimation (MLE) method [8]. The MLE parameter estimation method will be introduced in the following chapters. However, in the case of Gaussian noise, the MLE method is equivalent to the constrained least square estimation method using the optimization toolbox in Matlab [11,12].

    2. Since the coupled signal transduction network in Fig. 2.2 may be constructed by big database mining, there may be many false positive protein interactions in this candidate pathway. After the estimation by the least square parameter estimation algorithm in Eq. (2.20) or (2.21), we will prune these false positive protein interactions by the system order (interaction number in our case) detection method, i.e., the insignificant parameters in the system order will be removed from the coupled signal transduction pathways. The parameter estimation error variance and model complexity can be included in one statistic by the Akaike information criterion (AIC) as follows [8]:

    (2.23)

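    where the first term is related to the estimated residual variance and the second term is related to the model complexity; M denotes the number of data points, and the model order denotes the number of interaction parameters in Eq. (2.1) or (2.15) for the ith protein. The number of interaction parameters that minimizes the AIC is taken as the system order, and the interactions whose parameters fall outside this order must be removed to prune the false positive protein interactions in Fig. 2.2 by the system identification method using microarray data.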
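
    The pruning step can be sketched as follows. The AIC is taken here in the common form log(residual variance) + 2p/M, which is an assumption about Eq. (2.23). Ranking candidate interactions by the magnitude of their full-model estimate is a simple heuristic chosen for this illustration and presumes comparably scaled regressors; it is not necessarily the ordering used by the authors.

```python
import numpy as np

def aic(Phi, Y, columns):
    """AIC = log(residual variance) + 2 * (number of parameters) / M."""
    M = len(Y)
    sub = Phi[:, columns]
    theta, *_ = np.linalg.lstsq(sub, Y, rcond=None)
    residual = Y - sub @ theta
    return np.log(residual @ residual / M) + 2.0 * len(columns) / M

def prune_by_aic(Phi, Y):
    """Keep the regressors of the nested model with the smallest AIC."""
    theta_full, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    order = np.argsort(-np.abs(theta_full))          # strongest candidates first
    scores = [aic(Phi, Y, list(order[:p])) for p in range(1, len(order) + 1)]
    best_p = int(np.argmin(scores)) + 1              # detected system order
    kept = sorted(int(c) for c in order[:best_p])
    return kept, scores

# Six candidate interactions, of which only two are real (hypothetical data)
rng = np.random.default_rng(2)
Phi = rng.normal(size=(50, 6))
theta_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])
Y = Phi @ theta_true + 0.01 * rng.normal(size=50)

kept, scores = prune_by_aic(Phi, Y)
```

    Candidates outside `kept` would be treated as false positives and removed from the candidate network.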

    Remark 2.3

    If the protein interaction in Eq. (2.1) is of the following nonlinear dynamic model

    (2.24)

    where the two nonlinear terms denote the nonlinear interactions of protein i with other proteins yj(t) and extracellular signals uk(t), respectively, then the least square estimation algorithm in Eq. (2.20) or (2.21) can also be employed for parameter estimation.

    2.1.2 Dynamic models and their parameter estimation by time-profile microarray data for GRNs

    Suppose the microarray data is used for the system identification of candidate GRNs in Fig. 2.3, which is also extracted from Fig. 2.1 for the simplicity of analysis. The regulatory dynamics of the ith gene can be represented by the following regressive model

    (2.25)

    where the model involves the gene expression level of the ith gene at time t, the regulatory ability ai,j of gene j on gene i, the degradation rate of the mRNA of the ith gene, and the mRNA basal level ki, which represents regulations other than genetic regulations, such as epigenetic regulations.

    Figure 2.3 The candidate gene regulatory network (GRN) extracted from Fig. 2.1.

    In general, the regulation of the jth gene on the ith gene must act through its translated protein yj(t) (the TF yj(t) of the jth gene). However, if we only want to study the GRN among genes, the translation of genes into TF proteins is neglected, and the regulatory ability ai,j is taken to include the effect of the translational process. Further, the epigenetic regulation of miRNAs on genes is also neglected here. The regulation of genes by TFs and miRNAs will be discussed in Section 2.3 in detail.

    A positive ai,j means the jth gene activates the ith gene, and a negative ai,j means the jth gene represses the ith gene. The regulatory abilities, degradation rates, and basal levels ki, for i=1,…,n, in Eq. (2.25) can be identified with the constraint ki≥0, for i=1,…,n [7]. Similarly, after parameter estimation, the AIC in Eq. (2.23) can also be employed to prune the false positive regulators in Eq. (2.25) if the candidate GRN in Fig. 2.3 is constructed through big database mining.

    Remark 2.4

    If the gene regulatory functions in Eq. (2.25) are of the following nonlinear form

    (2.26)

    where the nonlinear term denotes the gene regulatory function from gene j to gene i, then the least square parameter estimation algorithm in Eq. (2.20) or (2.21) can also be employed for system identification of Eq. (2.26) through the corresponding time-profile microarray data. Similarly, after parameter estimation, the AIC in Eq. (2.23) can be used to prune the false positive regulations in the candidate GRN in Fig. 2.3 from big data mining to obtain the real GRN in a specific biological condition.

    Therefore, based on the regulatory dynamic in Eq. (2.25), the GRN in Fig. 2.3 can be shown as follows:

    (2.27)

    which can be represented by

    (2.28)

    The solution of the state dynamic equation in Eq. (2.28) is given by

    Suppose we want to calculate the regulatory information flow from gene j to gene i. In general, it is difficult to solve the regulatory information flow problem for the GRN in Fig. 2.3 from the graph theory perspective [13], especially for a digraph (directed graph). In the GRN case, an input/output state space method is proposed to solve this difficult information flow problem of the digraph as follows. First, the dynamic model of a GRN in Eq. (2.27) can be represented by the following input/output dynamic state space equation [14]

    (2.29)

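    In the input/output dynamic state space system for the GRN in Eq. (2.29), the expression of gene j is treated as an input signal and the expression of gene i as an output signal. Therefore, Eq. (2.29) is simply represented by

    (2.30)

    which is similar to the signal transduction dynamic equation in Eq. (2.6).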

    Using systems theory [15], the regulatory signal flow from gene j to gene i in the GRN as shown in Fig. 2.3 is given by

    (2.31)

    2.2 Static Models and Their Parameter Estimation by Sample Microarray Data

    2.2.1 Static modeling and parameter identification of PPIN by sample microarray data

    If the available microarray data for measuring the coupled signaling pathways in Fig. 2.2 come from a single time point across K different samples (e.g., K patients), the regression model for the expression of the ith protein cannot be represented by the discrete-time dynamic equation in Eq. (2.2), but can be represented by the following linear static regression form

    (2.32)

    where C and B are defined similarly to Eq. (2.2), but with the diagonal entries ci,i given as follows:

    and the vectors indexed by k, for k=1,…,K, denote the samples of microarray data. In this situation,

    (2.33)

    From Eq. (2.33), it can be seen that the transduction function T from extracellular signal u to protein expression level y is

    (2.34)

    If we want to estimate the signal transduction function T in Eq. (2.34), we need to estimate the system matrices C and B in Eq. (2.32) from the sample microarray data. From the linear regression model in Eq. (2.32), we get

    (2.35)

    Similarly, the least square parameter estimation problem with the constraint hi≥0 is formulated as follows:

    (2.36)

    The problem can be solved using the optimization toolbox in Matlab [11,12].

    The recursive least square estimate of the parameters of the ith protein in Eq. (2.35) with K sample microarrays is given by [8]

    (2.37)

    If the sample number K of the microarray data is small, we can repeat several rounds of the recursive parameter estimation algorithm in Eq. (2.37). After the parameters of all proteins are estimated by the recursive least square estimation algorithm in Eq. (2.37), we can use the AIC in Eq. (2.23) for interaction number detection to prune the false positive interactions and obtain the true linear static model in Eq. (2.32). Then, the signal transduction function T in Eq. (2.34) can be obtained. If we only want the signal transduction to the ith protein in Fig. 2.2, then T in Eq. (2.34) should be modified as

    (2.38)

    where Di is defined as in Eqs. (2.29) and (2.30), i.e., a row vector with zeros at all elements except a 1 at the ith element. The information transduced from the jth extracellular signal uj(t) to the ith protein is given by

    (2.39)

    where Bj is defined in Eq. (2.30), i.e., a column vector with zero at all elements except 1 at the jth element.
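
    The static transduction function and its row and entry selections can be sketched numerically. The sketch assumes the static model has the form y = C y + B u + H + w, so that T = (I - C)^(-1) B; this form is inferred from the surrounding text rather than taken from the displayed equations. The matrices and the helper name are hypothetical.

```python
import numpy as np

def transduction_matrix(C, B):
    """T = (I - C)^{-1} B, computed by solving (I - C) T = B."""
    N = C.shape[0]
    return np.linalg.solve(np.eye(N) - C, B)

# Hypothetical static network: 3 proteins, 2 extracellular signals
C = np.array([[0.0, 0.2, 0.0],
              [0.5, 0.0, 0.1],
              [0.0, 0.4, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.5]])
T = transduction_matrix(C, B)

i, j = 2, 0
D_i = np.zeros(3)
D_i[i] = 1.0                 # row selector for the ith protein
B_j = np.zeros(2)
B_j[j] = 1.0                 # column selector for the jth signal

T_i = D_i @ T                # transduction from all signals to protein i
T_ij = D_i @ T @ B_j         # transduction from signal j to protein i
```

    Solving the linear system avoids forming the inverse explicitly, which is usually preferable when I - C is poorly conditioned.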

    2.2.2 Static modeling and parameter identification in the GRN by sample microarray data

    If the microarray data measuring the regulatory information flow in the GRN in Fig. 2.3 are from one time point and K different samples, then the regression model for the regulation of the ith gene in Eq. (2.25) is modified to the following

    (2.40)

    where the expression variables denote the gene expression levels of the GRN in Fig. 2.3 in the kth sample microarray. Therefore, the whole GRN in Fig. 2.3 can be represented by

    (2.41)

    where

    Suppose we want to calculate the regulatory information flow from gene j to gene i in the GRN in Fig. 2.3. Then, Eq. (2.41) needs to be rearranged as

    (2.42)

    where the matrices and selector vectors are defined as in Eqs. (2.29) and (2.30). From Eq. (2.42), we can obtain the regulatory information flow from gene j to gene i as follows

    (2.43)

    Hence, the regulatory information flow equation from gene j to gene i is given by

    (2.44)

    After estimating the regulatory abilities ai,j and the basal levels ki in Eq. (2.40) by the least square parameter estimation algorithm with the constraint ki≥0 in Eq. (2.36) or (2.37), using microarray data with K samples for all genes in the GRN, we can identify the regulatory matrix A in Eq. (2.41) and then determine Aj, Bj, and Di in Eq. (2.44) between any two genes in the GRN.

    After the model and parameters of the PPIN in Fig. 2.2 and GRN in Fig. 2.3 are identified by microarray data, as described in the sections above, the PPIN and GRN should be integrated as a genetic and epigenetic cellular network as seen in Fig. 2.1.

    2.3 Modeling and Identification of Integrated Genetic and Epigenetic Cellular Networks

    2.3.1 Dynamic system model for integrated genetic and epigenetic cellular networks

    The signal transduction pathways, GRNs, metabolic pathways, and epigenetic regulatory networks are always integrated as a genetic and epigenetic cellular network as shown in Fig. 2.1. In this situation, the translation from mRNA to protein needs to be considered at the protein expression level, and the protein interaction equation of the ith protein in Eq. (2.1) should be modified as

    (2.45)

    .

    Similarly, if gene i is also regulated by miRNAs, then the gene regulatory equation of the ith gene in Eq. (2.25) should be modified as

    (2.46)

    where ai,j denotes the regulation from TF j to gene i, and the epigenetic term denotes the inhibitive regulation of gene i by the lth miRNA, whose expression equation is described by the following

    (2.47)

    where the model includes the degradation rate of the lth miRNA and the loss of the lth miRNA due to gene regulation in Eq. (2.46).

    Remark 2.5

    in in the gene regulation dynamic equation in Eq. (2.46).

    2. After the integrated genetic and epigenetic cellular network is modeled as Eqs. (2.45)–(2.47), the parameter estimation procedures are similar to Eq. (2.20) or (2.21) if these interactive dynamic equations are formulated as the regression form in Eq. (2.15); e.g.,

    (2.48)

    from which the parameters are estimated by the least square estimation algorithm in Eq. (2.20) or (2.21). Similarly, the system order detection scheme AIC in Eq. (2.23) can also be employed to prune the false positives by removing the insignificant parameters out of the system order.

    2.3.2 Static system model for integrated genetic and epigenetic cellular networks

    If the experimental data for integrated genetic and epigenetic cellular networks are one time point data from different samples, then the dynamic genetic and epigenetic model in Eqs. (2.45)–(2.47) should be modified to the following static model for more suitable system parameter identification

    (2.49)

    where K denotes the number of sample data points.

    These static models of integrated genetic and epigenetic regulation in Eq. (2.49) can be transformed to the regression form as seen in Eq. (2.35) for parameter estimation by the least square parameter estimation algorithm in Eq. (2.36) or (2.37) through the corresponding NGS data. Similarly, after parameter estimation, the system order detection scheme AIC in Eq. (2.23) can be employed to prune the false positives by removing the insignificant parameters out of the system order to obtain the real integrated genetic and epigenetic cellular network from candidate-integrated networks through big database mining and NGS data.

    2.4 The Core Network by PNP of the Integrated Genetic and Epigenetic Cellular Network Using PCA

    If the genetic and epigenetic cellular network is still very complex after database mining and system identification using NGS data have been employed to construct the real integrated network in Eqs. (2.45)–(2.47) for the dynamic model or Eq. (2.49) for the static model, it is still not easy to gain insight into the Big Mechanisms of systems biology. In this situation, the PNP is introduced to extract the most significant integrated genetic and epigenetic cellular network as the core network. Let the combined network matrix of the integrated genetic and epigenetic cellular network model in Eqs. (2.45)–(2.47) or in Eq. (2.49) be represented by:

    (2.50)

    where ci,j denotes the interaction ability between proteins i and j, ai,j denotes the regulatory ability of gene j on gene i, and the miRNA entries denote the inhibitive epigenetic regulation of miRNA l on gene j in Eq. (2.49). PCA of H is carried out through the following SVD:

    (2.51)

    where the factors on either side of the diagonal matrix contain the ith left and right singular vectors of H, respectively. The diagonal entries of the diagonal matrix, D=diag(d1, …, dm, …, dN), are the N singular values of H in descending order, i.e., d1≥…≥dm≥…≥dN. The eigen expression fraction (Em) is defined as

    (2.52)

    For example, we select the smallest M such that the top M singular vectors of V (the M principal components) contain 85% of the energy of the GEN. Then, the principal projections of H onto the top M singular vectors of V, or the similarities, are defined respectively as follows:

    (2.53)

    where hk denotes the kth row vector of H in Eq. (2.50) and vmT denotes the mth singular vector of V.

    We further define the 2-norm distance from each gene to the top M singular vectors.

    (2.54)

    If D(k) is close to zero, this implies that the kth node of the biological network is independent of the top M singular vectors. Thus, we define three thresholds, th1, th2, and th3, to identify the respective core genes, D(k)≥th1 for k=1,…, N, and the core proteins, D(k)≥th2 for k=N+1,…, 2N, and the core miRNAs D(k)≥th3 for k=2N+1,…, 2N+L, in the core network of the integrated genetic and epigenetic cellular network, which has the principal structure (or the so-called core) of the network.
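
    The PNP steps can be sketched with NumPy's SVD. The eigen expression fraction is taken here as the squared singular value divided by the sum of squared singular values, which is an assumption about Eq. (2.52). A single illustrative threshold is used instead of the three thresholds th1, th2, and th3, and the small matrix is hypothetical.

```python
import numpy as np

def principal_network_projection(H, energy=0.85):
    """Score each row (node) of H by its projection onto the top-M right singular vectors."""
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    E = d**2 / np.sum(d**2)                          # eigen expression fractions
    M = int(np.searchsorted(np.cumsum(E), energy)) + 1
    M = min(M, len(d))                               # guard against rounding at the top end
    T = H @ Vt[:M].T                                 # T[k, m] = h_k . v_m
    D = np.linalg.norm(T, axis=1)                    # 2-norm over the top-M projections
    return D, M, E

# Hypothetical network matrix: rows 0 and 1 carry almost all of the energy
H = np.array([[4.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 0.1],
              [0.0, 0.0, 0.0]])

D, M, E = principal_network_projection(H)
core_nodes = np.flatnonzero(D >= 1.0)                # threshold chosen only for illustration
```

    In this example one principal component already exceeds the 85% energy level, and only the first two nodes have a large projection onto it. In the setting of the text, separate thresholds would be applied to the gene, protein, and miRNA index ranges.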

    References

    1. Dibner C, Schibler U, Albrecht U. The mammalian circadian timing system: organization and coordination of central and peripheral clocks. Annu Rev Physiol. 2010;72:517–549.

    2. Dodd AN, Kudla J, Sanders D. The language of calcium signaling. Annu Rev Plant Biol. 2010;61:593–620.

    3. Kim EK, Choi EJ. Pathological roles of MAPK signaling pathways in human diseases. Biochim Biophys Acta. 2010;1802(4):396–405.

    4. Kim TH, Bohmer M, Hu H, Nishimura N, Schroeder JI. Guard cell signal transduction network: advances in understanding abscisic acid, CO2, and Ca2+ signaling. Annu Rev Plant Biol. 2010;61:561–591.

    5. Leung AK, Sharp PA. MicroRNA functions in stress responses. Mol Cell. 2010;40(2):205–215.

    6. Mosenden R, Tasken K. Cyclic AMP-mediated immune regulation—overview of mechanisms of action in T cells. Cell Signal. 2011;23(6):1009–1016.

    7. Klipp E. Systems biology in practice: concepts, implementation and application. Weinheim: Wiley-VCH; 2005.

    8. Johansson R. System modeling and identification. Englewood Cliffs, NJ: Prentice Hall; 1993.

    9. Boyd SP. Linear matrix inequalities in system and control theory. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1994.

    10. Chen B-S, Wu C-C. Systems biology: an integrated platform for bioinformatics, systems synthetic biology and systems metabolic engineering. New York, NY: Nova Science Pub Inc.; 2014.

    11. Coleman TF, Li YY. A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. SIAM J Optim. 1996;6(4):1040–1058.

    12. Gill PE, Murray W, Wright MH. Practical optimization. San Diego, CA: Academic Press; 1986.

    13. Kreyszig E. Advanced engineering mathematics. 7th ed. New York, NY: Wiley; 1993.

    14. Chen B-S, Li C-W. Measuring information flow in cellular networks by the systems biology method through microarray data. Front Plant Sci. 2015;6:390.

    15. Ogata K. Discrete-time control systems. Englewood Cliffs, NJ: Prentice-Hall; 1987.

    Chapter 3

    Procedure for Exploring Big Mechanisms of Systems Biology Through System Identification and Big Database Mining

    Abstract

    To improve the understanding of Big Mechanisms in systems biology, several procedures are described in this chapter to illustrate how to explore Big Mechanisms of systems biology through system modeling, system identification, and big database mining of high-throughput microarray data and next generation sequencing data. The procedures for extracting Big Mechanisms based on the gene regulatory network (GRN), the protein–protein interaction network (PPIN), the integrated GRN and PPIN, and the integrated genetic and epigenetic network are introduced in this chapter to offer scientists a useful method for enhancing their understanding of the Big Mechanisms mediating life processes.

    Keywords

    Big Mechanisms; GRN; PPIN; integrated GRN and PPIN; integrated GEN

    Introduction

    After introducing system modeling and system identification methods of biological systems in Chapter 2, System Modeling and System Identification Methods for Big Mechanisms in Biological Systems, some examples and procedures on how to extract Big Mechanisms of biological systems by system identification and big database mining methods using genome-wide high-throughput data are introduced in this chapter. First, we introduce Big Mechanisms of gene regulatory networks (GRNs) and protein–protein interaction networks (PPINs) by
