Gene Expression Programming: Fundamentals and Applications
Ebook, 185 pages, 2 hours


About this ebook

What Is Gene Expression Programming


Gene expression programming (GEP) is an evolutionary algorithm that creates computer programs or models. These programs are complex tree structures that, much like living organisms, learn and adapt by changing their sizes, shapes, and composition. And like the chromosomes of living organisms, GEP programs are encoded in simple linear chromosomes of fixed length. GEP is therefore a genotype-phenotype system: a simple genome stores and transmits the genetic information, while a complex phenotype explores the environment and adapts to it.


How You Will Benefit


(I) Insights and validations about the following topics:


Chapter 1: Gene expression programming


Chapter 2: Symbolic regression


Chapter 3: Decision tree


Chapter 4: Evolutionary algorithm


Chapter 5: Genetic algorithm


Chapter 6: Genetic programming


Chapter 7: Grammatical evolution


Chapter 8: Artificial intelligence


Chapter 9: Machine learning


Chapter 10: Artificial neural network


(II) Answers to the public's top questions about gene expression programming.


(III) Real-world examples of the use of gene expression programming in many fields.


Who This Book Is For


Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and those who want to go beyond basic knowledge of gene expression programming.


What Is the Artificial Intelligence Series


The artificial intelligence book series provides comprehensive coverage in over 200 topics. Each ebook covers a specific Artificial Intelligence topic in depth, written by experts in the field. The series aims to give readers a thorough understanding of the concepts, techniques, history and applications of artificial intelligence. Topics covered include machine learning, deep learning, neural networks, computer vision, natural language processing, robotics, ethics and more. The ebooks are written for professionals, students, and anyone interested in learning about the latest developments in this rapidly advancing field.
The artificial intelligence book series provides an in-depth yet accessible exploration, from the fundamental concepts to the state-of-the-art research. With over 200 volumes, readers gain a thorough grounding in all aspects of Artificial Intelligence. The ebooks are designed to build knowledge systematically, with later volumes building on the foundations laid by earlier ones. This comprehensive series is an indispensable resource for anyone seeking to develop expertise in artificial intelligence.

Language: English
Release date: Jul 1, 2023

    Book preview

    Gene Expression Programming - Fouad Sabry

    Chapter 1: Gene expression programming

    Gene expression programming (GEP) is an evolutionary algorithm that creates computer programs or models. These programs are complex tree structures that, much like living organisms, learn and adapt by changing their sizes, shapes, and composition. And like the chromosomes of living organisms, GEP programs are encoded in simple linear chromosomes of fixed length. GEP is therefore a genotype–phenotype system, benefiting from a simple genome to store and transmit the genetic information and a complex phenotype to explore the environment and adapt to it.

    Evolutionary algorithms use populations of individuals, select individuals according to their fitness, and introduce genetic variation using one or more genetic operators. Their use in artificial computational systems dates back to the 1950s, when they were used to solve optimization problems (e.g. Box 1957). Gene expression programming belongs to the family of evolutionary algorithms and is closely related to genetic algorithms and genetic programming. From genetic algorithms it inherited linear chromosomes of fixed length, and from genetic programming it inherited expressive parse trees of varied sizes and shapes.

    In gene expression programming the linear chromosomes work as the genotype and the parse trees as the phenotype, creating a genotype/phenotype system. This system is multigenic, so each chromosome encodes multiple parse trees. This means that the computer programs created by GEP are composed of multiple parse trees. Because these parse trees are the result of gene expression, in GEP they are called expression trees.

    The genome of gene expression programming is a linear, symbolic string or chromosome of fixed length, composed of one or more genes of equal size. Although these genes all have the same length, the expression trees they code for differ in size and shape. For example, the following string is a chromosome with two genes, each of size 9 (position zero indicates the start of each gene):

    012345678012345678

    L+a-baccd**cLabacd

    where L represents the natural logarithm function and a, b, c, and d represent the variables and constants used in a problem.

    As shown above, the genes of gene expression programming all have the same size. However, these fixed-length strings code for expression trees of different sizes. This means that the size of the coding region varies from gene to gene, which allows adaptation and evolution to occur smoothly.

    As an example, the mathematical expression:

    \sqrt{(a-b)(c+d)}

    can also be represented as an expression tree whose root Q is applied to the product of the subtrees for (a − b) and (c + d), where Q represents the square root function.

    This kind of expression tree is the phenotypic expression of a GEP gene, whereas the gene itself is a linear string that encodes this more complex structure. For this particular example, the linear string is:

    01234567

    Q*-+abcd

    This is the straightforward reading of the expression tree from top to bottom and, within each level, from left to right. These linear strings are called k-expressions (from Karva notation).
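
The reverse direction, from a k-expression back to an expression tree, can be sketched as a level-by-level fill. This is a minimal illustration rather than a reference implementation: the names `ARITY`, `kexpr_to_tree`, and `to_infix` are hypothetical, and any symbol not listed in `ARITY` is assumed to be a terminal.

```python
from collections import deque

# Arity of each function symbol; unlisted symbols are treated as terminals.
ARITY = {"Q": 1, "L": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def kexpr_to_tree(kexpr):
    """Build a nested [symbol, children] tree by filling it level by level."""
    symbols = iter(kexpr)
    root = [next(symbols), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Each function node takes the next unread symbols as its children.
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root

def to_infix(node):
    """Render the tree as a fully parenthesized infix string."""
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    left, right = (to_infix(child) for child in children)
    return f"({left} {symbol} {right})"

print(to_infix(kexpr_to_tree("Q*-+abcd")))  # Q(((a - b) * (c + d)))
```

The queue holds the nodes whose children have not been read yet, so the symbols of the string are consumed in exactly the top-to-bottom, left-to-right order described above.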

    Going from k-expressions to expression trees is also very simple. For example, the following k-expression:

    01234567890

    Q*b**+baQba

    is composed of two different terminals (the variables a and b), two different functions of two arguments (* and +), and a function of one argument (Q).

    Expressed as a tree and read back as a formula, it gives:

    \sqrt{b \cdot (b \cdot a) \cdot (\sqrt{a} + b)}

    The k-expressions of gene expression programming correspond to the region of a gene that gets expressed. This means that genes may contain sequences that are not expressed, which is indeed the case for most genes. These noncoding regions provide a buffer of terminals so that every k-expression encoded in a GEP gene always corresponds to a valid program or expression.

    The genes of gene expression programming are therefore composed of two domains, a head and a tail, each with different properties and functions. The head is used mainly to encode the functions and variables chosen to solve the problem at hand. The tail, while also used to encode the variables, essentially provides a reservoir of terminals to ensure that all programs are error-free.

    For GEP genes, the length of the tail is given by the formula:

    t=h(n_{\max }-1)+1

    where h is the head's length and n_{\max} is the maximum arity.

    For example, for a gene created using the set of functions F = {Q, +, −, ∗, /} and the set of terminals T = {a, b}, n_{\max} = 2.

    Choosing a head length of 15 gives t = 15 (2 − 1) + 1 = 16, so the gene length is 15 + 16 = 31.
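
The arithmetic above is easy to check in code. The following sketch assumes a hypothetical `ARITY` table for the function set F and two helper names, `tail_length` and `gene_length`, that are not part of the original text.

```python
# Arity of each function in F = {Q, +, -, *, /}.
ARITY = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def tail_length(head_length, max_arity):
    """t = h * (n_max - 1) + 1"""
    return head_length * (max_arity - 1) + 1

def gene_length(head_length, max_arity):
    """Total gene length is the head plus the tail."""
    return head_length + tail_length(head_length, max_arity)

n_max = max(ARITY.values())
print(n_max, tail_length(15, n_max), gene_length(15, n_max))  # 2 16 31
```

The tail is sized for the worst case: a head made up entirely of functions of maximum arity still finds enough terminals in the tail to complete the expression.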

    The following string, which was created at random, is an example of such a gene:

    0123456789012345678901234567890

    *b+a-aQab+//+b+babbabbbababbaaa

    It encodes an expression tree that, in this case, uses only 8 of the 31 elements that make up the gene.

    It is not difficult to see that, despite their fixed length, each gene has the potential to code for expression trees of different sizes and shapes. The simplest expression trees are composed of only one node (when the first element of a gene is a terminal), while the most complex expression trees are composed of as many nodes as there are elements in the gene (when all the elements in the head are functions with maximum arity).
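
One way to find how many elements of a gene are actually expressed is to track how many symbols are still needed to complete the expression. This is a sketch under the same assumptions as the example gene (functions Q, +, −, ∗, / and terminals a, b); the function name `coding_length` is hypothetical.

```python
# Arity of each function symbol; unlisted symbols are treated as terminals.
ARITY = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def coding_length(gene):
    """Return the number of leading symbols that form the expressed region."""
    needed = 1  # one symbol is needed for the root
    for position, symbol in enumerate(gene):
        # Reading a symbol fills one open slot and opens `arity` new ones.
        needed += ARITY.get(symbol, 0) - 1
        if needed == 0:
            return position + 1
    raise ValueError("gene ends before the expression is complete")

gene = "*b+a-aQab+//+b+babbabbbababbaaa"
print(coding_length(gene), gene[:coding_length(gene)])  # 8 *b+a-aQa

# The two extremes for a head of 15 and a tail of 16:
print(coding_length("a" + "+" * 14 + "a" * 16))  # 1
print(coding_length("+" * 15 + "a" * 16))        # 31
```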

    It is also easy to see that all kinds of genetic modification (mutation, inversion, insertion, recombination, and so on) are trivial to implement, with the guarantee that all resulting offspring encode correct, error-free programs.
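
As an illustration of that guarantee, here is a sketch of point mutation that respects the head/tail boundary: any symbol may appear in the head, but only terminals may appear in the tail. The mutation rate, the seed, and the names `mutate` and `is_valid` are assumptions made for this example.

```python
import random

FUNCTIONS = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}
TERMINALS = "ab"
HEAD = 15  # head length of the example gene; the tail is 16

def mutate(gene, rate, rng):
    """Point mutation: heads may receive any symbol, tails only terminals."""
    out = []
    for i, symbol in enumerate(gene):
        if rng.random() < rate:
            if i < HEAD:
                pool = list(FUNCTIONS) + list(TERMINALS)
            else:
                pool = list(TERMINALS)
            symbol = rng.choice(pool)
        out.append(symbol)
    return "".join(out)

def is_valid(gene):
    """True if the gene contains a complete expression."""
    needed = 1
    for symbol in gene:
        needed += FUNCTIONS.get(symbol, 0) - 1
        if needed == 0:
            return True
    return False

rng = random.Random(0)
gene = "*b+a-aQab+//+b+babbabbbababbaaa"
offspring = [mutate(gene, 0.2, rng) for _ in range(1000)]
print(all(is_valid(child) for child in offspring))  # True
```

Validity does not depend on the random draws: even a head of 15 two-argument functions needs only 16 terminals, which is exactly what the tail supplies.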

    The chromosomes of gene expression programming are usually composed of more than one gene of equal length. Each gene codes for a sub-expression tree (sub-ET) or sub-program. The sub-ETs can then interact with one another in different ways to form a more complex program. The figure shows an example of a program composed of three sub-ETs.

    In the final program the sub-ETs may be linked by addition or by any other function, as there are no restrictions on the kind of linking function one might choose. Examples of more complex linkers include taking the average, the median, or the midrange, thresholding their sum to make a binomial classification, and applying the sigmoid function to compute a probability. These linking functions are usually chosen a priori for each problem, but they can also be evolved elegantly and efficiently by the cellular system of gene expression programming.
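
The sketch below expresses the two-gene chromosome shown earlier (L+a-baccd**cLabacd) and links its sub-ETs by addition. The variable values in `env`, the function names, and the `link` parameter are assumptions for illustration; division is left out to avoid having to choose a rule for division by zero.

```python
import math
from collections import deque

# Each function symbol maps to (arity, implementation); L is the natural log.
FUNCTIONS = {
    "L": (1, math.log),
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
}

def express(gene):
    """Turn a gene into a nested [symbol, children] sub-ET, level by level."""
    root = [gene[0], []]
    queue, position = deque([root]), 1
    while queue:
        node = queue.popleft()
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [gene[position], []]
            position += 1
            node[1].append(child)
            queue.append(child)
    return root

def evaluate(node, env):
    """Evaluate a sub-ET, looking terminals up in env."""
    symbol, children = node
    if symbol in FUNCTIONS:
        return FUNCTIONS[symbol][1](*(evaluate(c, env) for c in children))
    return env[symbol]

def run_chromosome(chromosome, gene_size, env, link=sum):
    """Split the chromosome into genes, evaluate each sub-ET, then link."""
    genes = [chromosome[i:i + gene_size]
             for i in range(0, len(chromosome), gene_size)]
    return link([evaluate(express(gene), env) for gene in genes])

env = {"a": 2.0, "b": 3.0, "c": 4.0, "d": 5.0}
# Gene 1 is ln(a + (b - a)); gene 2 is (ln(b) * a) * c.
print(run_chromosome("L+a-baccd**cLabacd", 9, env))
```

Passing a different callable as `link`, for example `max` or a function that averages the list, swaps the linker without touching the genes.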

    In gene expression programming, homeotic genes control the interactions of the different sub-ETs or modules of the main program. The expression of these genes results in different main programs or cells; that is, they determine which genes are expressed in each cell and how the sub-ETs of each cell interact with one another. In other words, homeotic genes determine which sub-ETs are called, how often, in which main program or cell, and what kind of connections they establish with one another.

    Homeotic genes have exactly the same structural organization as normal genes and are built using an identical process. They also contain a head domain and a tail domain, but their heads now contain linking functions and a special kind of terminal, the genic terminal, that represents the normal genes. The expression of the normal genes results, as usual, in different sub-ETs, which in the cellular system are called ADFs (automatically defined functions). The tails contain only genic terminals, that is, derived features generated on the fly by the algorithm.

    For instance, the chromosome shown in the figure has three normal genes and one homeotic gene, and it encodes a main program that invokes three different functions a total of four times, linking them in a particular way.

    This example makes it clear that the cellular system allows not only the unconstrained evolution of linking functions but also code reuse. In addition, recursion should not be hard to implement in this system.
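
A single cell can be sketched as follows. This is a hypothetical example, not the chromosome from the figure: the homeotic gene "+*+0121" (head "+*+", tail "0121") and the three stand-in ADFs are invented here, with each digit acting as a genic terminal that calls the ADF at that index.

```python
from collections import Counter

# Linking functions allowed in the homeotic head (a hypothetical choice).
LINKERS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}

def run_cell(homeotic_gene, adfs, *args):
    """Express a homeotic gene level by level and evaluate it.

    Returns the result and a count of how often each ADF was called.
    """
    children, next_free = [], 1
    for i, symbol in enumerate(homeotic_gene):
        if i >= next_free:  # everything from here on is noncoding
            break
        arity = 2 if symbol in LINKERS else 0
        children.append(range(next_free, next_free + arity))
        next_free += arity
    calls = Counter()
    values = [None] * len(children)
    # Children always sit after their parent, so evaluate back to front.
    for i in reversed(range(len(children))):
        symbol = homeotic_gene[i]
        if symbol in LINKERS:
            left, right = (values[j] for j in children[i])
            values[i] = LINKERS[symbol](left, right)
        else:
            calls[int(symbol)] += 1
            values[i] = adfs[int(symbol)](*args)
    return values[0], calls

# Stand-ins for three already expressed normal genes (ADF 0, 1, and 2).
adfs = [lambda a, b: a + b, lambda a, b: a * b, lambda a, b: a - b]
result, calls = run_cell("+*+0121", adfs, 3, 2)
print(result)               # 37, from (ADF0 * ADF1) + (ADF2 + ADF1)
print(sum(calls.values()))  # 4 calls to 3 distinct ADFs
```

ADF 1 is called twice, which is the kind of code reuse the cellular system makes possible.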

    Multicellular systems are composed of more than one homeotic gene. Each homeotic gene in such a system assembles a different combination of sub-expression trees or ADFs, creating multiple cells or main programs.

    For instance, the program shown in the figure was created using a cellular system with two cells and three normal genes.

    These multicellular systems have a wide range of applications and, like multigenic systems, can be applied both to problems with a single output and to problems with multiple outputs.

    The head/tail domain of GEP genes, both normal and homeotic, is the basic building block of all GEP algorithms. However, gene expression programming also explores chromosomal organizations that are more complex than the head/tail structure. Essentially, these complex structures consist of functional units or genes with a basic head/tail domain plus one or more extra domains. These extra domains usually encode random numerical constants, which the algorithm fine-tunes iteratively in order to find a good solution. For instance, these numerical constants could be the weights or factors in a function approximation problem (see the GEP-RNC algorithm below); the weights and thresholds of a neural network (see the GEP-NN algorithm below); the numerical constants needed for the design of decision trees (see the GEP-DT algorithm below); the weights needed for polynomial induction; or the random numerical constants used in parameter optimization tasks.

    The main steps of the basic gene expression algorithm are listed below in pseudocode:

    1. Select the function set;
    2. Select the terminal set;
    3. Load the dataset for fitness evaluation;
    4. Create the chromosomes of the initial population randomly;
    5. For each program in the population:
       a. Express the chromosome;
       b. Execute the program;
       c. Evaluate its fitness;
    6. Verify the stop condition;
    7. Select programs;
    8. Replicate the selected programs to form the next population;
    9. Modify the chromosomes using genetic operators;
    10. Go to step 5.

    The first four steps prepare all the ingredients needed for the iterative loop of the algorithm (steps 5 through 10). Of these preparatory steps, the most important is the creation of the initial population, which is generated randomly from the elements of the function and terminal sets.
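
The ten steps can be sketched as a small single-gene program. Several choices here are assumptions made only to keep the example short and are not prescribed by the text: the target function x² + x, the head length of 7, tournament selection with one elite copy, and point mutation as the only genetic operator.

```python
import random

FUNCTIONS = {"+": (2, lambda x, y: x + y),            # step 1
             "-": (2, lambda x, y: x - y),
             "*": (2, lambda x, y: x * y)}
TERMINALS = "a"                                        # step 2
HEAD = 7
TAIL = HEAD * (2 - 1) + 1                              # t = h(n_max - 1) + 1
DATA = [(x, x * x + x) for x in range(-5, 6)]          # step 3 (assumed target)

def random_gene(rng):
    head = [rng.choice(list(FUNCTIONS) + list(TERMINALS)) for _ in range(HEAD)]
    tail = [rng.choice(TERMINALS) for _ in range(TAIL)]
    return "".join(head + tail)

def run(gene, a):
    """Express the gene level by level (5a), then execute it (5b)."""
    children, next_free = [], 1
    for i, symbol in enumerate(gene):
        if i >= next_free:  # past the coding region
            break
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        children.append(range(next_free, next_free + arity))
        next_free += arity
    values = [0] * len(children)
    for i in reversed(range(len(children))):  # children follow their parents
        if gene[i] in FUNCTIONS:
            values[i] = FUNCTIONS[gene[i]][1](*(values[j] for j in children[i]))
        else:
            values[i] = a
    return values[0]

def fitness(gene):
    """Negative total absolute error (5c); 0 is a perfect fit."""
    return -sum(abs(run(gene, x) - y) for x, y in DATA)

def mutate(gene, rng, rate=0.1):
    symbols = list(gene)
    for i in range(len(symbols)):
        if rng.random() < rate:
            if i < HEAD:
                pool = list(FUNCTIONS) + list(TERMINALS)
            else:
                pool = list(TERMINALS)
            symbols[i] = rng.choice(pool)
    return "".join(symbols)

def evolve(seed=0, population_size=30, generations=50):
    rng = random.Random(seed)
    population = [random_gene(rng) for _ in range(population_size)]  # step 4
    for _ in range(generations):                                     # step 10
        best = max(population, key=fitness)                          # step 5
        if fitness(best) == 0:                                       # step 6
            break
        parents = [max(rng.sample(population, 2), key=fitness)       # step 7
                   for _ in range(population_size - 1)]
        # Steps 8 and 9: keep one unmodified elite, mutate the other copies.
        population = [best] + [mutate(p, rng) for p in parents]
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the best program of each generation is copied unchanged, the best fitness can never get worse from one generation to the next, though nothing guarantees that a perfect solution is found within the generation limit.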

    Like all other evolutionary algorithms, gene expression programming works with populations of individuals, which in this case are computer programs. Some kind of initial population must therefore be created to get things started. Subsequent populations are descendants of this initial population, produced through selection and genetic modification.

    In the genotype-phenotype system of gene expression programming, creating the individuals only requires creating their simple linear chromosomes. There is no need to worry about the structural soundness of the programs that these
