Deep Learning in Bioinformatics: Techniques and Applications in Practice
Ebook, 670 pages, 6 hours


About this ebook

Deep Learning in Bioinformatics: Techniques and Applications in Practice introduces the topic in an easy-to-understand way, exploring how it can be utilized for addressing important problems in bioinformatics, including drug discovery, de novo molecular design, sequence analysis, protein structure prediction, gene expression regulation, protein classification, biomedical image processing and diagnosis, biomolecule interaction prediction, and in systems biology. The book also presents theoretical and practical successes of deep learning in bioinformatics, pointing out problems and suggesting future research directions. Dr. Izadkhah provides valuable insights and will help researchers use deep learning techniques in their biological and bioinformatics studies.
  • Introduces deep learning in an easy-to-understand way
  • Presents how deep learning can be utilized for addressing some important problems in bioinformatics
  • Presents the state-of-the-art algorithms in deep learning and bioinformatics
  • Introduces deep learning libraries in bioinformatics
Language: English
Release date: Jan 8, 2022
ISBN: 9780128238363
Author

Habib Izadkhah

Dr. Habib Izadkhah is an Associate Professor at the Department of Computer Science, University of Tabriz, Iran. He worked in industry for a decade as a software engineer before becoming an academic. His research interests include algorithms and graphs, software engineering, and bioinformatics. More recently he has been working on developing and applying deep learning to a variety of problems, dealing with biomedical images, speech recognition, text understanding, and generative models. He has contributed to various research projects, authored a number of research papers for international conferences, workshops, and journals, and has also written five books, including Source Code Modularization: Theory and Techniques from Springer.


    Book preview

    Deep Learning in Bioinformatics - Habib Izadkhah

    Preface

    Habib Izadkhah     

    Artificial Intelligence, Machine Learning, Deep Learning, and Big Data have become the latest hot buzzwords, with deep learning and bioinformatics being two of the hottest areas of contemporary research. Deep learning, an emerging branch of machine learning, is a good solution for big data analytics. Deep learning methods have been extensively applied to various fields of science and engineering, including computer vision, speech recognition, natural language processing, social network analysis, and bioinformatics, where they have produced results comparable to, and in some cases superior to, those of domain experts. A vital strength of deep learning is its ability to analyze and learn from massive amounts of data, which makes it a valuable method for big data analytics.

    Bioinformatics research has entered an era of Big Data. With the increasing volume of data in biology, it is expected that deep learning will become increasingly important in the field and will be utilized in the vast majority of analysis problems. Mining the potential value in biological data is of great significance for researchers and the health care domain. Deep learning, which is especially powerful in handling big data, shows outstanding performance in biological data processing.

    To practice deep learning, you need to have a basic understanding of the Python ecosystem. Python is a versatile language that offers a large number of libraries and features that are helpful for Artificial Intelligence and Machine Learning in particular, and, of course, you do not need to learn all of these libraries and features to work with deep learning. In this book, I first give you the necessary Python background knowledge to study deep learning. Then, I introduce deep learning in an easy-to-understand and practical way, and explore how deep learning can be utilized for addressing several important problems in bioinformatics, including drug discovery, de novo molecular design, protein structure prediction, gene expression regulation, protein sequence classification, and biomedical image processing. Through real-world case studies and working examples, you'll discover various methods and strategies for building deep neural networks using the Keras library. The book gives you the practical information you need to work in the bioinformatics domain, including best practices. I believe that this book will provide valuable insights for a successful career and will serve as a starting point to help graduate students, researchers, and applied bioinformaticians working in industry and academia use deep learning techniques in their biological and bioinformatics studies.

    This book

    •  provides necessary Python background for practicing deep learning,

    •  introduces deep learning in a convenient way,

    •  provides the most practical information available on the domain to build efficient deep learning models,

    •  presents how deep learning can be utilized for addressing several important problems in bioinformatics,

    •  explores classic deep learning architectures, including convolutional and recurrent neural networks, for bioinformatics,

    •  discusses the challenges of deep learning and offers suggestions for addressing them.

    Chapter 1: Why life science?

    Abstract

    This chapter discusses deep learning applications, which show that deep learning is embedded in all aspects of our lives. In fact, people deal with these applications on a daily basis. The chapter will then point out why deep learning is important in bioinformatics. Finally, the concepts presented in all chapters of the book will be reviewed.

    Keywords

    Bioinformatics; Life Science; Deep Learning Applications

    1.1 Introduction

    There are many paths people can follow based on their technical interests and their interest in data. Due to the availability of massive data in recent years, biomedical studies have drawn a great deal of attention. The advent of modern medicine has transformed many fundamental aspects of human life. Over the past 20 years, there have been innovations affecting the lives of many people. Not so long ago, HIV/AIDS was considered a fatal disease. The ongoing development of antiviral treatments has significantly increased the life expectancy of patients in developed countries. Other diseases such as hepatitis C, which was not effectively treatable a decade ago, can now be treated. Genetic breakthroughs have brought about high hopes for the treatment of different diseases. Innovation in diagnosis and the availability of precision tools enable physicians to diagnose and target a specific disease in the human body. Many of these breakthroughs have relied on computational methods and will continue to benefit from them.

    1.2 Why deep learning?

    Living in the golden era of machine learning, we are now experiencing a revolution directed by machine learning programs.

    In today's world, machine learning algorithms are indispensable to every process ranging from prediction to financial services. As a matter of fact, machine learning is a modern human invention that has not only led to developments in industries and different businesses but also left a significant footprint on the individual lives of humans. Scientists are developing certain algorithms which enable digital assistants (e.g., Amazon Echo and Google Home) to speak well. There have also been notable advances in psychologist robots.

    Sentiment analysis is another modern application of machine learning. This is the process of determining a speaker's or an author's attitudes or beliefs. Machine learning developments have also allowed for multilingual translation. Beyond daily life, machine learning has affected many areas of the physical sciences and other aspects of life. The algorithms are employed for different purposes, ranging from the identification of new galaxies through telescopic images to the classification of subatomic reactions in the Large Hadron Collider.

    The development of a class of machine learning methods, known as deep neural networks, has contributed to these technological advances. Although the technological infrastructure of artificial neural networks was developed in the 1950s and modified in the 1980s, the real power of this technique was not totally perceived until the recent decade, in which many breakthroughs have been achieved in computer hardware. While Chapters 3 and 4 give a more comprehensive review of neural networks, and a deep neural network (deep learning) is presented in the subsequent chapters of the book, it is important to know about some of the breakthroughs achieved with deep learning first.

    A common application of deep learning is image recognition. Using deep learning for facial recognition includes a wide range of applications from security areas and cell phone unlocking methods to automated tagging of individuals who are present in an image. Companies now seek to use this feature to set up the process of making purchases without the need for credit cards. For instance, have you noticed that Facebook has developed an extraordinary feature that lets you know about the presence of your friends in your photos? Facebook used to make you click on photos and type your friends' names to tag them. However, as soon as a photo is uploaded, Facebook now does the magic and tags everybody for you. This technology is called facial recognition.

    Deep learning can also be utilized to restore images or eliminate their noise. This feature of machine learning is also employed in different security areas, identification of criminals, and quality enhancement of a family photo or a medical image. Producing fake images is also another feature of deep learning. In fact, deep learning algorithms are able to generate new images of people's faces, objects, and even sceneries that have never existed. These images are utilized in graphic design, video game development, and movie production.

    Many of the same deep learning developments that have led to a plethora of consumer applications are now employed in bioinformatics and biomedicine, for example to classify tumor cells into various categories. Given the scarcity of medical data, synthetic images can also be produced to generate new training data.

    Deep learning has also resulted in many speech recognition developments that have become pervasive in search engines, cell phones, computers, TV sets, and other online devices everywhere.

    So far, various speech recognition technologies have been developed, such as Alexa, Cortana, Google Assistant, and Siri, changing human interactions with devices, homes, cars, and jobs. Through the speech recognition technology, it is possible to talk with computers and devices, which can also understand what the speech means and can make a response. Introducing voice-controlled or digital assistants into the speech recognition market has changed the outlook of this technology in the 21st century.

    By analyzing its user's behavior, a recommender system suggests the most appropriate items (e.g., data, information, and goods). Helping users find their targets faster, such a system is an approach proposed to deal with the problems caused by the ever-growing amount of information. Many companies with extensive websites now employ recommender systems to facilitate their processes. Given the different preferences of users at different ages, users naturally select different products; thus, recommender systems should yield correspondingly different results. Recommender systems have significant effects on the revenues of different companies. If employed correctly, these systems can bring about high profitability. For instance, Netflix has announced that 60% of the DVDs rented by its users are selected through recommender systems, which can greatly affect user choices of films.

    Recommender systems can also be employed to prescribe appropriate medicines for patients. In fact, prescribing the right medicines for patients is among the most important processes of their treatments, for which accurate decisions must be made based on patients' current conditions, history, and symptoms. In many cases, patients may need more than one medicine or new medicines for another condition in addition to a previous disease. Such cases increase the chances of medical error in the prescription of medicines and the incidence of side effects of medicine misuse.

    These are only a few innovations achieved through the use of deep learning methods in bioinformatics. Ranging from medical diagnosis and tumor detection to production and prescription of customized medicines based on a specific genome, deep learning has attracted many large pharmaceutical and medical companies. Many deep learning ideas used in bioinformatics are inspired by the conventional applications of deep learning.

    We are living in an interesting era when there is a convergence of biological data and the extensive scientific methods of processing that kind of data. Those who can combine data with novel methods to learn from data patterns can achieve significant scientific breakthroughs.

    1.3 Contemporary life science is about data

    As discussed earlier, the fundamental nature of the life sciences has changed. The large-scale use of automated experiments has significantly increased the amount of experimental data that can be produced. For instance, signal processing and 3D imaging in experimental molecular biology can produce a large amount of raw information. In the 1980s, a biologist would conduct an experiment and draw a conclusion. This experiment would lack a sufficient amount of data because of computational limitations. In addition, the experimental data would not be made available to others due to the absence of extensive communication tools. However, modern biology benefits from mechanisms that can generate millions of experimental data points in one or two days. Furthermore, experiments such as gene sequencing, which can generate massive datasets, have become inexpensive and easy to access.

    Advances in gene sequencing can produce databases that link a person's genetic code to a multitude of health-related outcomes, including diabetes, cancer, and genetic diseases such as cystic fibrosis. Employing computational techniques for the analysis and extraction of data, scientists are now coming to understand the causes of these diseases correctly in order to develop novel treatment methods.

    The disciplines which used to basically rely on human observations now benefit from the datasets that cannot easily be analyzed manually due to their massive dimensions. Machine learning is now usually used for image classification. The outputs of these machine learning models are employed to detect and classify cancerous tumors and evaluate the effects of potential treatments for a disease.

    Advances in empirical techniques have resulted in the development of several databases which list the structures of chemicals and their effects on a wide range of processes or biological activities. These structure–activity relationships (SARs) lay the foundations for a discipline that is known as cheminformatics. Scientists use the data of these large datasets to develop predictive models. Moreover, making good and rapid decisions in the field of medicine can lead to the identification and optimization of problems.

    The huge amount of data requires a new generation of scientists who are competent in both scientific and computational areas. Those who possess these combinatorial skills will have the potential to work on the structures and procedures for big datasets and make scientific discoveries.

    Bioinformatics is an interdisciplinary science that includes methods and software for understanding biological information. Bioinformatics uses a combination of computer science, statistics, and mathematics to analyze and interpret biological information. In other words, bioinformatics analyzes biological problems using computer algorithms and mathematical and statistical techniques.

    1.4 Deep learning and bioinformatics

    Deep learning, with its successful experimental results and wide range of applications, has the potential to change the future of medical science. Today, the use of artificial intelligence has become increasingly common and is applied in various fields such as cancer diagnosis. Deep learning also enables computer vision, medical imaging, and more accurate medical diagnosis. So it is no surprise that a report from Report Linker states that the market for artificial intelligence in the medical industry is expected to grow from $1.2 billion in 2018 to $26 billion in 2025.

    Deep learning: the future of medical science

    As deep learning has become so popular in industry, the question arises as to how it will affect our lives in the next few years. In medicine, although we have stored large amounts of patient data over the past few years, deep learning has so far mostly been used to analyze image or text data. In addition, deep learning has recently been used to predict a wide range of clinical problems and outcomes. Deep learning has a promising future in medicine.

    Today's interest in deep learning in medicine stems from two factors: first, the widespread growth of deep learning techniques, and second, a dramatic increase in health care data.

    Using deep learning in e-health records

    Electronic health systems store patient data such as demographic information, medical records, and test results. These systems can use deep learning algorithms to improve diagnostic accuracy and reduce the time required to diagnose a disease. These algorithms use the data stored in electronic health systems to identify patterns in health trends and risk factors, and draw conclusions based on the identified patterns. Researchers can also use data from e-health systems to create deep learning models that predict the likelihood of certain health-related outcomes.

    1.5 What will you learn?

    Let us briefly review what you will learn in this book:

    Chapter 2 provides a brief introduction to machine learning. I begin with a definition of Artificial Intelligence from the Oxford Dictionary. Then, I provide a figure that shows the relationship between Artificial Intelligence, Machine Learning, and Deep Learning, and state the difference between traditional programming and machine learning methods. In Chapter 2, I discuss the concept of a model. Machine learning aims to automatically create a model from data, which you can then use to make decisions. Machine learning typically proceeds by initially splitting a dataset into a training set that is used to generate a model and a test set that is used to evaluate the performance of the model. Chapter 2 also discusses generalization. Generalization refers to a machine learning model's ability to perform well on unseen data rather than just the data it was trained on. From the concept of generalizability, two other terms emerge, called underfitting and overfitting. If your model is overfitted, it will not generalize well. I describe these problems using an example and then summarize the meanings of these two concepts. To deal with the overfitting problem, there are in general two approaches, namely regularization and cross-validation, both of which I discuss.
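
    As a minimal sketch of these ideas (my own illustration, not code from the book), the following Scikit-learn snippet shows how cross-validation exposes overfitting: a high-degree polynomial model with almost no regularization scores well on its training data but poorly on held-out folds, while a regularized version generalizes better. The toy dataset and alpha values are illustrative choices.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        # Noisy one-dimensional regression problem (illustrative data).
        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(40, 1))
        y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)

        for alpha in (1e-6, 1.0):  # weak vs. strong regularization
            model = make_pipeline(PolynomialFeatures(degree=12), Ridge(alpha=alpha))
            cv_scores = cross_val_score(model, X, y, cv=5)     # R^2 on held-out folds
            train_score = model.fit(X, y).score(X, y)          # R^2 on the training data
            print(f"alpha={alpha}: train R^2={train_score:.3f}, mean CV R^2={cv_scores.mean():.3f}")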

    There are different ways of how machines learn. In some cases, we train them (called supervised learning) and, in some other cases, machines learn on their own (called unsupervised learning). In Chapter 2, I discuss the three ways that a machine can learn, which are supervised learning, unsupervised learning, and reinforcement learning.

    To work with deep learning, you need to be familiar with a number of mathematical and statistical concepts. In Chapter 2, I outline some of the important concepts you will be working with, e.g., tensors. Chapter 2 also introduces the Keras library, in which we will implement the deep learning projects, and ends by introducing several real-world tensors.
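
    As a small, hedged illustration of the tensor terminology (mine, not the book's), tensors of different ranks can be represented directly as NumPy arrays:

        import numpy as np

        scalar = np.array(12)                    # rank-0 tensor: a single number
        vector = np.array([1.0, 2.5, 3.3])       # rank-1 tensor: e.g., one expression profile
        matrix = np.array([[1, 2], [3, 4]])      # rank-2 tensor: e.g., samples x features
        images = np.zeros((64, 28, 28))          # rank-3 tensor: e.g., a batch of 64 grayscale images

        for t in (scalar, vector, matrix, images):
            print(t.ndim, t.shape)               # ndim is the tensor's rank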

    Chapter 3 provides a brief introduction to the Python ecosystem. If you would like to make a career in the domain of deep learning, you need to know Python programming along with the Python ecosystem. According to GitHub, Python is the most popular programming language used for machine learning projects hosted on its service. To build effective deep learning models, you need to have some basic understanding of the Python ecosystem, e.g., the NumPy, Pandas, Matplotlib, and Scikit-learn libraries. This chapter introduces various Python libraries and examples that are very useful for developing deep learning applications.

    The chapter begins by introducing four high-performance computing environments that you can use to write Python programs without installing anything, including IPython, the Jupyter notebook, Colaboratory, and Kaggle. Chapter 3 then provides general descriptions of the SciPy (Scientific Python) ecosystem and the Scikit-learn library. It also covers the basic Python syntax you should be familiar with to understand the code and write a typical program: identifiers, comments, data types, control flow statements, data structures, and functions, each illustrated with examples.
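
    As a quick, self-contained taste of that syntax (an illustration of my own, not an example taken from Chapter 3), the short function below touches comments, data types, a data structure, control flow, and a function definition in a bioinformatics-flavored setting:

        def gc_content(sequence):
            """Return the fraction of G and C bases in a DNA string."""
            gc = 0
            for base in sequence:          # control flow: a for loop
                if base in ("G", "C"):     # data structure: a tuple used for a membership test
                    gc += 1
            return gc / len(sequence)      # integer count divided by length gives a float

        print(gc_content("ATGCGC"))        # prints 0.666...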

    NumPy is a core Python library that is widely used in deep learning applications. It supports multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them. In Chapter 3, I provide several examples of this library that you will need in deep learning applications. After the overview of NumPy, I discuss the Matplotlib library, a plotting library used for creating plots and charts. An easy way to load data is to use the Pandas library, which is built on top of the Python programming language; in Chapter 3, you learn how to use this library to load data. In Python, there are several ways to load a CSV data file for use in deep learning algorithms. In Chapter 3 you will learn two frequently used ways: (1) loading CSV files with NumPy and (2) loading CSV files with Pandas. Reviewing the shape of the dataset, that is, seeing how much data we have in terms of rows and columns, is one of the most frequent data manipulation operations in deep learning applications, and Chapter 3 provides examples of this as well. After that, I explain how you can use the Pearson correlation coefficient to determine the correlation between features.
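
    The following hedged sketch (my own; the file name "dataset.csv" and its columns are placeholders rather than a dataset distributed with the book) shows the two loading routes, the shape check, and the Pearson correlation in a few lines:

        import numpy as np
        import pandas as pd

        # (1) Load a numeric CSV file with NumPy, skipping the header row.
        data_np = np.loadtxt("dataset.csv", delimiter=",", skiprows=1)

        # (2) Load the same file with Pandas, which keeps the column names.
        data_pd = pd.read_csv("dataset.csv")

        print(data_pd.shape)                     # number of rows and columns
        print(data_pd.corr(method="pearson"))    # pairwise Pearson correlation (numeric columns)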

    In Chapter 3, I also explain histograms, box-and-whisker plots, and correlation matrix plots, three techniques that you can use to understand each feature of your dataset independently. Deep learning algorithms use numerical features to learn from the data. However, when the features have different scales, such as age in years and income in hundreds of dollars, the features on larger scales can unduly influence the model. As a result, we want the features to be on a similar scale, which can be achieved through scaling techniques. In this chapter, you learn how to standardize the data using Scikit-learn.
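
    A minimal standardization sketch (illustrative data of my own, using Scikit-learn's StandardScaler, which is one common way to do this) might look like the following:

        import numpy as np
        from sklearn.preprocessing import StandardScaler

        # Toy feature matrix: columns are age (years) and income (dollars).
        X = np.array([[25.0, 40_000.0],
                      [47.0, 85_000.0],
                      [35.0, 120_000.0]])

        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)       # each column now has mean 0 and unit variance
        print(X_scaled.mean(axis=0), X_scaled.std(axis=0))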

    Bioinformatics datasets are often high-dimensional. Chapter 3 therefore introduces several feature selection methods. Feature selection is one of the key concepts in machine learning; it is used to select a subset of features that contribute the most to the output and thus has a large impact on the performance of the constructed model. Chapter 3 ends by introducing the train_test_split() function, which allows you to split a dataset into training and test sets.
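
    As a hedged sketch of how these pieces fit together (SelectKBest is just one of several possible feature selection methods, and the synthetic data stands in for a real high-dimensional bioinformatics dataset), the combination might look like this:

        import numpy as np
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 500))          # 200 samples, 500 features
        y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels depend on only a few features

        X_small = SelectKBest(f_classif, k=10).fit_transform(X, y)   # keep the 10 best-scoring features
        X_train, X_test, y_train, y_test = train_test_split(
            X_small, y, test_size=0.2, random_state=42)
        print(X_train.shape, X_test.shape)       # (160, 10) (40, 10)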

    Chapter 4 presents the basic structure of neural networks. In this chapter, I discuss the types of neural networks and provide an example of how to train a single-layer neural network. Chapter 4 discusses gradient descent, which is used to update the network's weights. To this end, three gradient descent methods, namely Stochastic Gradient Descent, Batch Gradient Descent, and Mini-batch Gradient Descent, are discussed. Chapter 4 ends with a discussion of the limitations of single-layer neural networks.
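
    The compact NumPy sketch below (my own simplification, not the book's implementation) illustrates mini-batch gradient descent on a single-layer network (logistic regression); setting batch_size to 1 would give stochastic gradient descent, and setting it to the full dataset size would give batch gradient descent.

        import numpy as np

        # Synthetic binary classification data (illustrative).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(256, 4))
        y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(float)

        w, b = np.zeros(4), 0.0
        lr, batch_size = 0.1, 32                              # try 1 (SGD) or 256 (batch GD)
        for epoch in range(50):
            order = rng.permutation(len(X))                   # shuffle each epoch
            for start in range(0, len(X), batch_size):
                batch = order[start:start + batch_size]
                p = 1 / (1 + np.exp(-(X[batch] @ w + b)))     # sigmoid output of the single layer
                grad = p - y[batch]                           # gradient of the cross-entropy loss
                w -= lr * X[batch].T @ grad / len(batch)      # weight update
                b -= lr * grad.mean()                         # bias update

        pred = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(float)
        print("training accuracy:", (pred == y).mean())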

    Training a multilayer neural network is discussed in Chapter 5. In this chapter, the backpropagation algorithm, an effective algorithm used to train a neural network, is introduced. After that, I explain how you can design a neural network in Keras. The MNIST dataset is often considered the "hello world" of deep learning. The purpose of this example is first to classify different types of handwritten digits based on their appearance and then to assign a handwritten input to the most similar group in order to identify the corresponding digit. In this chapter, I implement this handwritten-digit classification problem with dense layers in Keras. Through this implementation, you can learn the components of neural networks without going into technical details. Chapter 5 ends with a discussion of two more general data preprocessing techniques, namely vectorization and value normalization. After studying this chapter, you will be able to design a deep learning network with dense layers.
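
    A minimal Keras sketch of that kind of dense-layer MNIST classifier, including the vectorization and value-normalization steps, might look like the following (my own illustration; the layer sizes, optimizer, and epoch count are assumptions rather than the book's exact settings):

        from tensorflow import keras
        from tensorflow.keras import layers

        # Load MNIST, then vectorize the 28x28 images into 784-long vectors
        # and normalize pixel values from [0, 255] to [0, 1].
        (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
        x_train = x_train.reshape(60000, 784).astype("float32") / 255
        x_test = x_test.reshape(10000, 784).astype("float32") / 255

        model = keras.Sequential([
            keras.Input(shape=(784,)),
            layers.Dense(512, activation="relu"),
            layers.Dense(10, activation="softmax"),            # one output per digit class
        ])
        model.compile(optimizer="rmsprop",
                      loss="sparse_categorical_crossentropy",  # integer labels 0-9
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=5, batch_size=128)
        print(model.evaluate(x_test, y_test))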
