Post-transcriptional Gene Regulation in Human Disease
Ebook, 932 pages, 8 hours

About this ebook

Post-transcriptional Gene Regulation in Human Disease, a new volume in the Translational Epigenetics book series, offers a thorough overview and discussion of post-transcriptional genetic control mechanisms and their roles across various pathologies and human developmental outcomes, along with regulatory mechanisms targeted for therapeutic approaches. The book is broadly divided in two parts: early chapters describe the basics of post-transcriptional gene regulation, associated epigenetic mechanisms, the role of RNA binding proteins, the evolution of post-transcriptional gene regulation, and methods to study these mechanisms. The second half of the book includes deeper discussion of post-transcriptional gene regulation across specific diseases and therapeutics targets. Various post-transcriptional events, including alternative splicing and polyadenylation, mRNA stability, and miRNAs and their involvement in the disease progression, are examined in detail.
  • Includes full-color imagery illustrating key concepts and post-transcriptional disease processes, as well as descriptions of methods for studying post-transcriptional gene regulation
  • Presents fundamental knowledge, molecular and biochemical mechanisms, and recent findings in concise and easily understandable formats
  • Features a summary and conclusion at the end of each chapter
Language: English
Release date: Aug 12, 2022
ISBN: 9780323914246

    Book preview

    Post-transcriptional Gene Regulation in Human Disease - Buddhi Prakash Jain

    Chapter 1: Regulation of gene expression in mammals: an overview

    Shyamal K. Goswami, School of Life Sciences, Jawaharlal Nehru University, New Delhi, India

    Abstract

    The human genome is a large and complex storehouse of information that is meticulously propagated and deciphered by macromolecular assemblies of gene regulatory proteins and RNAs. An intricate network of feed-forward and feedback interactions among these entities ensures precise levels of gene expression in any cell type in a given context. Such macromolecular interactions involve simple diffusion, synergistic interactions, phase separation (in the nucleus), and kinetic regulation. Molecular biology, the subdiscipline that studies the process of gene expression, has shown phenomenal growth since the discovery of the DNA structure in 1953. As of January 31, 2022, the title words "Gene expression" yielded 95,691 entries in PubMed, of which 7186 were review articles. Almost all other disciplines of biology, viz., biochemistry, structural biology, genetics, epigenetics, developmental biology, proteomics, transcriptomics, and bioinformatics, have contributed significantly to the accumulation of such a large volume of information. Because summarizing such an ocean of information in a single review would be a gargantuan effort, I have summarized only the major aspects of gene expression, viz., the structure of core promoters, the RNA polymerases and transcription factors that perform the synthesis of pre-mRNA, post-transcriptional processing of the pre-mRNA, small regulatory RNAs, and epigenetic regulation. For readers seeking further information, I have cited recent reviews by eminent researchers on each topic covered, apart from citing the original pioneering discoveries. I hope this review will give readers a platform to comprehend the essence of the excitement in the field, as presented by the other authors of this book on post-transcriptional gene regulation in the context of various human diseases.

    Keywords

    Enhancer; Epigenetics; lncRNA; miRNA; Promoter; Splicing; Transcription; Transcription factor

    Introduction

    In the year 1958, Francis Crick proposed the Central Dogma of Molecular Biology. He was awarded the Nobel Prize in 1962 for his contributions toward the understanding of the DNA structure and its functions. Since then, phenomenal progress has been made in analyzing the structure and the organization of the genomes and the mechanisms of their expression across species. Although the basic tenets of the molecular mechanisms by which the DNA sequence is transcribed into the mRNA, which in turn is translated into the proteins remain unchallenged, the past 60 years have witnessed the discovery of newer and newer regulatory modules and their complex interactions that precisely control the entire process of gene expression, especially in the mammals. The progress of our understanding of the mechanisms of gene expression can be divided into two distinct phases, viz., the pre- and the postgenomics era that also happens to be at the juncture of the past and the present millennium. In the 1970s, with the advent of breakthrough tools like recombinant DNA technology, DNA sequencing, Southern and Northern hybridizations, construction and screening of cDNA and genomic libraries, etc., numerous genes were isolated and studied. In the following decades, using biochemical, genetic, and molecular biological approaches, different RNA polymerases and a large number of general and tissue-specific transcription factors across species were also isolated and characterized. Among the breakthrough discoveries were the post-transcriptional capping, splicing, and polyadenylation of the pre-mRNAs; and the role of enhancers and long noncoding RNAs in the metazoan gene expression. These created a conceptual framework that is, the interactions of the RNA polymerase, general and gene-specific transcription factors, chromatin modifiers, and RNA processing enzymes with the segments of DNA called promoters, and the enhancers govern the expression of the cognate gene. 
Over the years, it became evident that transcription of the mRNA-coding genes and post-transcriptional processing of the pre-mRNA do not occur sequentially, as envisaged earlier; rather, they occur simultaneously for most genes. Also, the covalent modifications of the histones play a major role in determining whether a given gene is expressed or silenced. These observations heightened the complexities of gene regulation, especially in mammals. Completion of genome sequences in the late 1990s followed the emergence of various tools of bioinformatics and the expressed sequence tag (EST) database, as well as methods for high-throughput RNA sequencing, chromatin immunoprecipitation with DNA microarray (ChIP-on-chip), etc. Based on these new tools of genome analysis, there was a paradigm shift in our understanding of gene regulation. They revealed, astonishingly, that almost 90% of the genome is transcribed from both strands of the DNA (hence named "pervasive transcription"); that transcription can start from many nucleotides spread over several hundred base pairs around the promoter (thereby challenging the concept of the "transcription start site"); and that long noncoding, enhancer-derived, and micro-RNAs play a major role in gene regulation. These studies established that RNA, originally envisaged as merely a product of transcription, is indeed the major regulator of gene expression.

    Another fascinating development in the past 50 years is the understanding of the 3D structure of the genome and the epigenetic regulation that ensures the selective expression of a set of genes in a given cell type. Although many fundamental discoveries in gene regulation in the early days treated DNA as a string of nucleotides that interacts with transcription factors and generates RNAs, it emerged over the years that the histone proteins, around which the DNA is wrapped and which regulate the access of transcription, splicing, and other factors, are key determinants of the gene expression profile in a given condition. Consider the simple fact that the human genome is 2 m long and is packaged into a 10-μm nucleus in such a way that a fraction of it is dynamically and selectively exposed for expression in a given context. Therefore, the mechanisms by which nucleosomes are selectively modified to regulate the expression of various genes came to be seen as even more important than the genome itself. In contrast to the belief that the nucleotide sequence is the final determinant of a cell's fate, it has now been found that certain diseases are caused by changes in the local chromatin conformations and the 3D structure of the genome.
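
    The packaging figures in the paragraph above can be checked with a back-of-the-envelope calculation. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA (a standard textbook value that is not stated in this chapter) and uses the genome size of about 3.2 billion nucleotides quoted elsewhere in the chapter; the variable and function names are illustrative only.

```python
# Rough estimate of how much DNA is packed into one nucleus.
BP_PER_HAPLOID_GENOME = 3.2e9   # approximate genome size quoted in this chapter
RISE_PER_BP_M = 0.34e-9         # ASSUMPTION: B-form DNA rise per base pair (0.34 nm)
NUCLEUS_DIAMETER_M = 10e-6      # 10-μm nucleus, from the text


def dna_length_m(base_pairs: float) -> float:
    """Contour length in metres of linear B-form DNA with the given number of base pairs."""
    return base_pairs * RISE_PER_BP_M


# A somatic cell is diploid, so it carries two copies of the genome.
diploid_length_m = dna_length_m(2 * BP_PER_HAPLOID_GENOME)
length_to_diameter = diploid_length_m / NUCLEUS_DIAMETER_M

print(f"Diploid DNA length: {diploid_length_m:.2f} m")
print(f"DNA length / nucleus diameter: {length_to_diameter:,.0f}")
```

    Under these assumptions the diploid genome comes out at roughly 2 m, in line with the figure in the text, and the DNA is on the order of 200,000 times longer than the nucleus is wide.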

    Although this book primarily deals with the role of post-transcriptional regulation of gene expression in diseases, this chapter aims to give a general background on gene regulation in humans so that the specific aspects of human diseases described in the other chapters become easier to comprehend for readers with different backgrounds. However, considering the vastness of the literature on the mechanisms of gene expression, the following sections provide a brief overview rather than a comprehensive compilation of this immense subject. Readers are also referred to an excellent review by Klaus Scherrer wherein several fundamental concepts of gene expression have been critically assessed [1].

    The genome, genes, and the regulatory elements: how many genes do we have?

    The human genome comprises ∼3,200,000,000 nucleotides distributed across 23 distinct chromosomes. Among those, the smallest chromosome has ∼50,000,000 nucleotides and the longest ∼260,000,000 [2]. Each somatic cell contains pairs of these chromosomes, of which 22 pairs are called autosomes, while the 23rd pair, the sex chromosomes, differs between males and females. Despite the enormity of the genome and the complexity of the human species, the number of genes it harbors is surprisingly low. As per the most recent analyses, out of a total of 42,611 genes, 20,352 encode proteins, 18,887 encode long noncoding RNAs (lncRNAs), and the remainder code for transfer, ribosomal, antisense, and small regulatory RNAs (http://ccb.jhu.edu/chess). These genes are transcribed into 323,258 transcripts, of which 266,331 represent isoforms of protein-coding transcripts and the rest are noncoding regulatory RNAs. Surprisingly, there are also over 30 million transcripts spread over ∼650,000 genomic loci that are either nonfunctional (noise) or have functions yet to be defined [3]. The obvious question is: how many proteins are produced by this small number of genes, and does that number explain the complexity of the human species? Detecting all the constituent proteins, both abundant and scarce, in a cell or tissue with the available tools of proteomics is still a major challenge. As per the UniProt Knowledgebase (UniProtKB), the central repository for information on all proteins, there are currently 20,386 human proteins corresponding to the protein-coding genes (https://www.uniprot.org/help/uniprotkb; https://www.uniprot.org/).
According to the Human Proteome Map, another resource portal that records protein products from multiple organs, tissues, and cell types, there are 30,057 proteins corresponding to 17,294 genes (https://www.humanproteomemap.org/). However, these estimates are highly conservative and are aimed primarily at validating the identities of the protein-coding genes rather than at making a comprehensive catalog of the human proteome. Considering alternative splicing and editing of the pre-mRNAs, the existence of alternate open reading frames, the variability of translational start sites, and post-translational modifications, each of which would yield functionally different proteins from the same transcript, the total number of unique proteins in humans is likely to be very high. To date, determining the number and quantity of unique proteins in any human tissue remains highly challenging, and the estimates are speculative, varying from researcher to researcher. While some experts in the field estimate the number to be 80,000–400,000, others anticipate about 1,000,000 different proteins in the human proteome (https://www.mpg.de/11447687/W003_Biology_medicine_0594-05.pdf [4]). Therefore, the transmission of genetic information from DNA to RNA to proteins appears highly complex and does not conform to the one gene-one enzyme hypothesis proposed by George Beadle in 1941 (awarded the Nobel Prize in 1958), which was later revised as the one gene-one peptide hypothesis. Our present understanding of the complexities of the mechanisms of gene expression came largely from studying the expression of the protein-coding genes.
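
    The gene and transcript counts quoted in this section imply a few derived figures that the text leaves implicit. The short sketch below simply redoes that arithmetic from the numbers given above; the variable names are illustrative, and the ratios are plain averages, not estimates of the true size of the proteome.

```python
# Counts quoted in the text (CHESS catalog and Human Proteome Map).
TOTAL_GENES = 42_611
PROTEIN_CODING_GENES = 20_352
LNCRNA_GENES = 18_887
TOTAL_TRANSCRIPTS = 323_258
PROTEIN_CODING_TRANSCRIPTS = 266_331
HPM_PROTEINS = 30_057   # Human Proteome Map proteins
HPM_GENES = 17_294      # Human Proteome Map genes

# Genes for transfer, ribosomal, antisense, and small regulatory RNAs.
other_rna_genes = TOTAL_GENES - PROTEIN_CODING_GENES - LNCRNA_GENES

# Transcripts that are not isoforms of protein-coding transcripts.
noncoding_transcripts = TOTAL_TRANSCRIPTS - PROTEIN_CODING_TRANSCRIPTS

# Simple averages; individual genes vary widely around these values.
isoforms_per_coding_gene = PROTEIN_CODING_TRANSCRIPTS / PROTEIN_CODING_GENES
hpm_proteins_per_gene = HPM_PROTEINS / HPM_GENES

print(f"Other RNA genes: {other_rna_genes:,}")
print(f"Noncoding transcripts: {noncoding_transcripts:,}")
print(f"Mean transcript isoforms per protein-coding gene: {isoforms_per_coding_gene:.1f}")
print(f"Mean proteins per gene in the Human Proteome Map: {hpm_proteins_per_gene:.2f}")
```

    On these figures, each protein-coding gene has on average about 13 annotated transcript isoforms, which is one way to see why the protein count is expected to exceed the gene count by a wide margin.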

    Promoters, enhancers, and their regulation

    The core promoter and the preinitiation complex

    The first step of gene expression is transcription. RNA Polymerase II transcribes all protein-coding genes, and in addition, many noncoding genes are also transcribed by it. Early methodologies like S1 nuclease mapping and primer extension studies suggested that RNA Polymerase II-mediated transcription starts at a particular nucleotide named transcription start site (TSS). The TSS is central to the Core promoter that encompasses ∼50 nucleotides upstream and ∼50 nucleotides downstream. The RNA Polymerase II along with the general transcription factors (GTFs) uses the core promoter as the platform for its binding, followed by the initiation of transcription. The core promoters harbor combinations of several conserved motifs (small nucleotide sequences) that are targeted by the GTFs and the RNA Polymerase II. Common among these motifs are the TATA-box, Initiator elements (Inr), Downstream Promoter Element (DPE), Motif 10 Element (MTE), TFIIB Recognition Elements (BREs), and downstream core elements (DCE). These sequences assist the recruitment of the RNA Polymerase II and the GTFs on the core promoter to form the preinitiation complex (PIC), a large multimeric assembly involving about a hundred proteins [5]. Also, these motifs are located at specific distances on the core promoters so that the PIC can align properly and initiate transcription at the TSS. All these motifs are not necessarily found together in a promoter, as they occur in different combinations in different classes of genes, providing the first layer of the regulation of transcription. The characteristics of these motifs and the GTFs are summarized in Tables 1.1A and 1.1B and Fig. 1.1.

    The PIC consists of six GTFs, viz., TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, plus the RNA Polymerase II and the Mediator complex [6]. In addition, long noncoding RNAs also play roles in its assembly and functions [6]. Note that the subunit composition of the PIC is not fixed; rather, it varies for different types of genes (constitutive, housekeeping, inducible, etc.). The assembly of the PIC is a highly complex process that has been studied extensively during the past 50 years [7,8]. Earlier studies on the formation of the PIC and the initiation of transcription were done in vitro using biochemically purified GTFs and RNA Polymerase II, with prototype promoters as the templates. These studies showed that the formation of the PIC is hierarchical and involves cooperative interactions between various GTFs. Several salient features of this process are as follows. It is initiated by the engagement of the TATA-box binding protein (TBP), a component of the TFIID complex, with the core promoter. TBP is a crescent-shaped molecule that binds through its concave surface to the TATA or TATA-like sequences located ∼30 base pairs upstream of the TSS and bends the DNA by ∼90 degrees. The bent DNA structure is then stabilized by the engagement of TFIIB and TFIIA, and thereafter TFIIF and the RNA Polymerase II are brought in. The recruitment of TFIIE and TFIIH then completes the formation of the PIC ([8], Fig. 1.2). In promoters devoid of the TATA box, the PIC assembles on the Inr site. Since the promoter DNA is packaged into nucleosomes, RNA Polymerase II-mediated initiation of transcription also involves ATP-dependent chromatin remodeling complexes and histone-modifying enzymes (discussed later in detail). The ATP-dependent translocase xeroderma pigmentosum type B (XPB), a component of TFIIH, separates the DNA strands at the promoter, enabling the single-stranded template to be transcribed.
Once the transcription is initiated, the RNA Polymerase II leaves behind the GTFs and the Mediator complex and moves forward for the elongation, allowing the entry of another molecule of RNA Polymerase II for the reinitiation of transcription of the same gene. To ensure that the RNA Polymerase II does not pause or stall due to the nucleosomes or the DNA structures, elongation is facilitated by the elongation factors. During the elongation process, the transcripts are simultaneously processed by capping, splicing, and polyadenylation, preceding termination [6–8].

    Table 1.1A

    D: A, G, or T; K: G or T; N: A, C, G, or T; R: A or G; S: G or C; V: A, C, or G; W: A or T; Y: C or T.

    Table 1.1B

    Reconstituted from Yoshiaki Ohkuma, J Biochem 1997;122, 481–489

    Figure 1.1  Schematic representation of a typical core promoter of a mammalian protein coding gene. The solid line represents the segment of the DNA spanning about 50 base pair upstream (-) to 50 base pair downstream (+) of the transcription start site (TSS). Various sequence motifs described in Table 1 are represented by boxes and the consensus sequences are shown. None of the core promoters carry all these motifs together. The relative positions of the elements in the promoter are maintained but are not in absolute scale. Reproduced from Vo Ngoc L, Wang Y.-L, Kassavetis GA, Kadonaga JT, The punctilious RNA polymerase II core promoter, Genes Dev, 2017;31(13):1289–1301. https://doi.org/10.1101/gad.303149.117.

    One key aspect of the transcription process is the role of the carboxy-terminal domain (CTD) of the largest subunit of the RNA polymerase II. In humans, the CTD domain comprises 52 repeats of the heptad Tyr-Ser-Pro-Thr-Ser-Pro-Ser (YSPTSPS), in which the phosphorylation of the serine residues at second and fifth positions plays a critical role in transcription initiation, elongation, and pre-mRNA processing. Unphosphorylated/hypophosphorylated CTD is necessary for the formation of the PIC, promoter melting, and the initiation of transcription. The phosphatase activity associated with the TFIIF ensures that the CTD domain remains dephosphorylated during the initiation of transcription [9]. Phosphorylated CTD is required for the elongation and concurrent capping, splicing, and polyadenylation of the transcript. Phosphorylated CTD is also required for the interaction of RNA Polymerase II with the Mediator complexes through which it communicates with the transcription factors bound to the upstream promoters and the enhancers. Following transcription initiation, the CTD domain becomes phosphorylated by CDK7, a TFIIH associated kinase, and by CDK9, a constituent of P-TEFb (Positive Transcription Elongation Factor), facilitating the elongation process [10].
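
    To make the numbering of the phosphorylated serines concrete, the sketch below builds an idealized CTD from 52 perfect copies of the consensus heptad and lists the Ser2 and Ser5 positions. This is a simplification: the repeats of the real human CTD do not all match the consensus exactly, and the function name `phospho_serine_sites` is hypothetical.

```python
HEPTAD = "YSPTSPS"   # consensus heptad repeat from the text
N_REPEATS = 52       # number of repeats in the human CTD, from the text


def phospho_serine_sites(n_repeats: int = N_REPEATS) -> list[tuple[int, str, int]]:
    """List (repeat number, site name, residue index) for Ser2 and Ser5.

    All numbers are 1-based; the residue index counts from the first
    residue of the idealized CTD, not of the full polymerase subunit.
    """
    sites = []
    for repeat in range(n_repeats):
        offset = repeat * len(HEPTAD)
        for position in (2, 5):  # serines at the second and fifth positions
            sites.append((repeat + 1, f"Ser{position}", offset + position))
    return sites


# ASSUMPTION: every repeat is a perfect consensus heptad.
ctd = HEPTAD * N_REPEATS
sites = phospho_serine_sites()

print(f"Idealized CTD length: {len(ctd)} residues")
print(f"Candidate Ser2/Ser5 phosphorylation sites: {len(sites)}")
print("First repeat:", sites[:2])
```

    Even in this idealized form, the CTD offers more than a hundred Ser2 and Ser5 sites, which gives a sense of how finely its phosphorylation state can be tuned between initiation and elongation.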

    Figure 1.2  Sequential assembly of the preinitiation complex (PIC) on the core promoter of a mammalian protein-coding gene. The assembly starts with the engagement of TFIID with the core promoter at the TATA sequence ∼23–30 base pairs upstream of the transcription start site at the Inr. Thereafter, TFIIB, TFIIF, and RNA Polymerase II enter the complex. The assembly is completed by the engagement of TFIIE and TFIIH. The involvement of TFIIA is not shown in this scheme as it is nonessential; it engages along with TFIIB to stabilize the binding of TFIID to the promoter. The CTD of RNA Polymerase II brings in the Mediator complexes that coordinate with the transcription factors bound further upstream to initiate transcription (not shown).

    Although during the pregenomic era the core promoters were viewed as generic, with similar mechanisms of activation for all genes, extensive sequencing of total cellular RNA alongside analyses of the genome made it clear that there is substantial diversity in the core promoters and in the mechanisms of transcription initiation. Apart from the presence of different combinations of motifs in the core promoters, it is now also established that there are two distinct modes of transcription initiation in vertebrates, viz., focused and dispersed. In focused transcription, RNA synthesis starts at a single nucleotide as discussed earlier, but in dispersed transcription, there are several start sites over a span of about 50–100 nucleotides. Interestingly, focused transcription is more prevalent in lower metazoans like Drosophila, while in vertebrates, ∼70% of the promoters are dispersed [5,6]. Dispersed promoters are generally associated with constitutive genes, are devoid of core promoter motifs like TATA, BRE, DPE, and MTE, and generally have CpG islands (a ∼200-bp region in the proximal promoter with a higher GC content than is commonly found in the genome). Thus, the mechanisms of transcription from focused versus dispersed promoters are likely to be fundamentally different, and a better understanding of this process needs further investigation. In the past two decades, with the advent of powerful tools of structural biology, especially cryo-electron microscopy, it has become evident that the initiation of transcription is an enormously complex but precise process that ensures the appropriate level of expression of every gene in a spatiotemporal manner [11].

    The proximal- and the distal promoters

    In vitro and ex vivo assays have shown that although the PIC assembled on a core promoter can initiate transcription, the output is quite low. To understand the overall process of gene expression, it is thus necessary to understand the anatomy of the protein-coding genes. The segment of DNA extending several hundred base pairs upstream of the core promoter is called the Proximal Promoter, and that extending up to about 2000 base pairs further upstream is the Distal Promoter. The optimum expression of any gene depends on the regulatory sequences present in both the proximal and the distal promoters [11,12]. In addition, segments a few hundred base pairs long, located as far as hundreds of kilobases or even more from the promoter, often significantly boost the expression of many genes. These sequences are called Enhancers. Enhancers have two distinct characteristics. Firstly, their functions are position-independent, as they can increase transcription even if they are shifted to a different location closer to or further away from the coding sequence. Many enhancers also work when placed downstream of the gene sequences, and enhancers have also been found in the introns of the genes they regulate. Secondly, they remain functional even when their 5ʹ-3ʹ orientation is reversed. The reason for these unique characteristics is the 3D structure of the genome, which brings enhancers into the proximity of the core promoter even if their positions or orientations are altered [13,14]. A schematic representation of the proximal and distal promoters and the enhancers in the context of gene expression is shown in Fig. 1.3.

    Figure 1.3  A schematic representation of the role of proximal-, distal-promoters and the enhancers in gene expression. The enhancer and the promoter (both proximal and distal) are occupied by various transcription factors (TF). These factors along with the coactivators make a local assembly that then communicates with the RNA Polymerase II complex through the mediators. Reproduced from Haberle V, Stark A., Eukaryotic core promoters and the functional basis of transcription initiation. Nat Rev Mol Cell Biol. 2018; 19(10):621–637. https://doi.org/10.1038/s41580-018-0028-8.

    Transcription factors are the drivers of gene expression

    The promoters (both distal and proximal) and the enhancers exert their effects through a series of small (4–10 base pairs) cis-acting sequences targeted by a family of proteins called transcription factors. Unlike the GTFs described above, these transcription factors are not generic; rather, they are quite specific for the set of genes they regulate in various tissues. In humans, ∼1600 such transcription factors have been predicted, of which a few hundred have been extensively studied to date [15]. While many of these transcription factors are ubiquitous (e.g., SP-1) and are involved in the regulation of a diverse set of genes in many tissues, others (e.g., NFκB) are quite exclusive to a group of genes.

    Depending on their secondary structures, transcription factors are often classified as the Helix-loop-Helix (HLH), Leucine zipper (LZ), Zn-finger, homeodomain, etc. The leucine zipper family of transcription factors are characterized by the presence of about 30 amino acid segment with a repetition of leucine residues at every seventh position so that they are aligned with each other on the outer surface of the α-helix it forms. These factors form homo- or heterodimers through the leucine residues (hence named leucine zipper) and they bind the DNA as the dimers only. That way, about 50 such potential leucine zipper transcription factors in humans can create a larger repertoire of functional dimers [16]. Nuclear hormone receptors are also a large group of transcription factors that are activated by the steroid hormones, lipophilic vitamins, sterols, and many genotoxic compounds. Many of the nuclear hormone receptors are zinc finger proteins. The Homeobox (Hox) transcription factors are involved in the development of the body plan in the early embryo that eventually leads to the complex structures of the limb and the organs. This family of factors harbors a conserved Homeodomain of 60 amino acids (the corresponding DNA sequence of 180 bases is called the homeobox) with a helix-turn-helix secondary structure that binds the target DNA. In mammals, 39 Hox genes have been predicted and their functional specificity lies with the variable amino-acid residues within the homeodomains [17]. Drosophila genetics has enormously contributed to the understanding of the role of Hox genes in embryonic development.
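
    The combinatorial reach of dimerization mentioned above for the leucine zipper family can be illustrated with a short count. The sketch assumes, purely for illustration, that each of the roughly 50 monomers could pair with itself and with every other monomer; real pairing is selective, so the result is an upper bound rather than an observed number. The function name `max_dimer_repertoire` is hypothetical.

```python
from math import comb


def max_dimer_repertoire(n_monomers: int) -> dict[str, int]:
    """Upper bound on distinct dimers if every pairing were allowed.

    ASSUMPTION: unrestricted pairing, which real bzip proteins do not show.
    """
    homodimers = n_monomers              # each monomer paired with itself
    heterodimers = comb(n_monomers, 2)   # unordered pairs of different monomers
    return {
        "homodimers": homodimers,
        "heterodimers": heterodimers,
        "total": homodimers + heterodimers,
    }


repertoire = max_dimer_repertoire(50)   # about 50 leucine zipper factors, from the text
print(repertoire)
```

    With 50 monomers the bound is 1275 possible dimers, which shows how a modest number of genes could, in principle, generate a much larger set of DNA-binding complexes.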

    Transcription factors have certain common characteristics. Each factor has one well-defined DNA binding domain that is rich in basic amino acids and it is called the basic domain. Due to the presence of this basic domain, helix–loop–helix and leucine zipper transcription factors are called bHLH and bLZ (or bzip) factors. Most transcription factors have structured dimerization domains and function as hetero- or homodimers (e.g., the leucine zipper family). They also have compatible protein–protein interaction domains that are involved in the interaction with the neighboring transcription factors being bound to the respective DNA sequences. Finally, each transcription factor is characterized by an activation domain through which they communicate with the RNA Polymerase II to drive transcription (Fig. 1.4).

    Many transcription factors are expressed in the specific locations of the developing embryos and determine the cell lineage. The bHLH transcription factors Neurog1/2/4 are involved in the initiation of neuronal specification and differentiation [18]. Myogenic Regulatory Factors viz., MyoD, myogenin, Myf5, and MRF4 are also members of the bHLH family and they orchestrate the determination and differentiation of skeletal muscle cells during embryogenesis. They also coordinate the expression of muscle-specific genes during postnatal development [19]. Ectopic expression of the myogenic factors in certain nonmuscle cells converts those into myoblasts, a process called transdifferentiation.

    Another class of transcription factors, named Pioneer Transcription Factors, can bind specific DNA sequences in the heterochromatin state (heterochromatin is the part of the genome where gene expression is silenced by epigenetic modifications of the histones). Upon binding, they initiate the opening of the chromatin structure by erasing those epigenetic marks, paving the way for other factors to bind their target sequences and initiate transcription. NeuroD1 is one such pioneer factor that alters the epigenetic and transcriptional program during neurogenesis, converting microglia to neurons. These factors play a major role in tissue differentiation during embryonic development [20,21].

    Figure 1.4  A schematic (linearised protein structure) representation of the functional domains of human glucocorticoid receptor alpha (hGRα). NTD: N-terminal domain; DBD: DNA-binding domain; HR: hinge region; LBD: ligand (glucocorticoid) binding domain. These domains are further divided into several subdomains as shown below. AF: activation function. The AF-1 subdomain plays an important role in the interaction of the receptor with the coactivators, chromatin modulators, and basal transcription factors. The DBD contains two zinc finger motifs through which the receptor binds to specific DNA sequences called the glucocorticoid-response elements (GREs) in the promoter region(s) of the target genes. The DBD also contains sequences important for receptor dimerization and nuclear translocation. The HR confers structural flexibility on the receptor dimers. The LBD binds glucocorticoids and plays a critical role in the ligand-induced activation of the receptor. The LBD also contains a second transactivation domain, termed AF-2, which is ligand-dependent, and harbors sequences important for the dimerization of the receptor, its translocation to the nucleus, binding to the heat shock proteins, and interaction with coactivators. Adapted with permission from Nicolas C. Nicolaides, Zoi Galata, Tomoshige Kino, George P. Chrousos, and Evangelia Charmandari, The Human Glucocorticoid Receptor: Molecular Basis of Biologic Function, Steroids. 2010 January; 75(1): 1. https://doi.org/10.1016/j.steroids.2009.09.002

    Although the number of genes for the transcription factors in mammals lies around 1600, the total number of functional factors is much higher as many of those factors have multiple isoforms arising out of alternative splicing. Various domains of the transcription factors are post-translationally modified by ubiquitination, acetylation, phosphorylation, methylation, etc., by a range of extra- and intracellular stimuli [22]. Such modulations regulate their nuclear-cytoplasmic localization, stability, DNA binding activity, interaction with other transcription factors, and transcriptional activity ([23], Fig. 1.4). While many transcription factors are constitutively expressed, many are induced at the level of transcription in response to specific stimuli.

    Interestingly, the target DNA sequences of the mammalian transcription factors are quite short, that is, 4–10 nucleotides (the GATA factors involved in hematopoiesis are named after their four-nucleotide-long target sequence). Considering the total length of the human genome, such short sequences are likely to occur randomly millions of times. So, how do these factors maintain their target specificities? Generally, many of these transcription factors act as dimers, and therefore their target sequences need to be in tandem with proper orientation and spacing (to accommodate each of the binding factors, which are often bulky). As an example, the AP-1 transcription factor is a dimer of the two bzip proteins Fos and Jun. It binds to the target sequence 5ʹ-TGACTCA-3ʹ (TRE element), wherein each monomer binds to the terminal three nucleotides, called the half-sites (TGA and TCA). In vitro studies have shown that since the TRE is nearly palindromic (the reverse strand read from its 5ʹ end is TGAGTCA, differing only at the central base pair), the Fos:Jun heterodimer can bind it in both orientations (Fig. 1.5). However, when it binds to different AP-1 sites in various promoters, it shows a preference for one orientation over the other, which depends upon the sequences flanking the core TRE sequence as well as certain amino acids beyond the DNA-binding (basic) domains [24]. Such preference also plays a role in its interactions with the transcription factors bound to the adjoining sequences in the same promoter. The cAMP response element-binding protein (CREB) family of bzip factors binds to the CRE sequences and mediates the cAMP response. The CRE is an eight-nucleotide sequence (5′-TGACGTCA-3′) that is identical to the TRE except for one extra G between the two terminal half-sites. CRE sites are occupied by either a homodimer of CREB or a CREB-ATF-1 heterodimer, but not by AP-1 (Jun-Fos heterodimer) (Fig. 1.5).
Such selectivity in the binding sequence is conferred by certain amino acid residues in the DNA binding, leucine zipper, and intervening domains of the CREB/ATF-1 proteins [25]. Another example of short nucleotide sequences with high functional specificity is the binding sites of the nuclear receptors. Nuclear receptors are a large family of transcription factors that are activated by small lipophilic molecules such as steroid hormones, retinoic acids, thyroid hormone, vitamin D3, and genotoxic compounds. They regulate numerous physiological and developmental processes through the modulation of their respective target genes. Their target sites consist of two hexanucleotide half-sites that are arranged in one of two orientations. One is the inverted repeat, where the two half-sites are arranged in opposite orientations (head-to-head), forming a palindrome with an intervening sequence of three nucleotides. The other is the direct repeat, where the half-sites are oriented in the same direction (head-to-tail) with an intervening sequence of 1–5 nucleotides [26]. Nuclear receptors bind their respective target sequences in different ways. The steroid receptor subfamily, that is, the glucocorticoid, progesterone, mineralocorticoid, androgen, and estrogen receptors, binds to the inverted repeats as homodimers (Fig. 1.5). Another subgroup, comprising the retinoic acid, the vitamin D, and the peroxisome proliferator-activated receptors, heterodimerizes with the retinoid X receptors, and the heterodimers bind the direct repeats (Fig. 1.5). The binding sites of the steroid hormone receptors are quite similar, and these receptors recognize each other's target sequences in vitro. In the case of the direct repeats, where the heterodimers bind, the target sequences are also quite similar for all, and the length of the intervening spacer sequence (DR1-DR5) determines their target specificities (Fig. 1.5).
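The sequence arguments above can be checked with a few lines of code. The following is a minimal sketch, not a method from the chapter, and it rests on stated assumptions: a haploid genome length of about 3.1 billion base pairs, equal base frequencies, and matches counted on one strand only. The function names (expected_random_hits, reverse_complement, is_palindrome, classify_repeat) are hypothetical helpers introduced here for illustration. It estimates how often a short site would occur by chance, tests whether a site reads the same on both strands, and labels a pair of half-sites as a direct or inverted repeat with a given spacer length.

```python
# Assumption (not from the text): approximate haploid human genome length in bp.
GENOME_BP = 3.1e9

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_random_hits(k, genome_bp=GENOME_BP):
    """Expected exact matches to one fixed k-mer on one strand of a random
    sequence with equal base frequencies (a deliberate simplification)."""
    return (genome_bp - k + 1) / 4 ** k


def reverse_complement(seq):
    """Opposite strand read 5' to 3'; non-ACGT symbols are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if the site reads the same on both strands."""
    return seq == reverse_complement(seq)


def classify_repeat(site, half_site):
    """Label a site as a direct (DRn) or inverted (IRn) repeat of half_site,
    where n is the spacer length; return None if neither arrangement fits."""
    k = len(half_site)
    spacer_len = len(site) - 2 * k
    if spacer_len < 0 or not site.startswith(half_site):
        return None
    second = site[-k:]
    if second == half_site:
        return f"DR{spacer_len}"
    if second == reverse_complement(half_site):
        return f"IR{spacer_len}"
    return None


# Chance occurrences for site lengths in the 4-10 nucleotide range.
for k in (4, 6, 8, 10):
    print(f"{k}-mer: ~{expected_random_hits(k):,.0f} chance matches")

# Strand symmetry of the TRE and CRE core sequences quoted in the text.
for name, site in (("TRE", "TGACTCA"), ("CRE", "TGACGTCA")):
    print(name, site, reverse_complement(site), is_palindrome(site))

# Illustrative sites only: the half-site is used as an example and the
# spacers ("N") are placeholders, not sequences taken from the text.
half = "AGAACA"
print(classify_repeat(half + "NNN" + reverse_complement(half), half))  # IR3
print(classify_repeat(half + "NNNN" + half, half))  # DR4
```

Under these assumptions, a four-nucleotide site is expected roughly 12 million times by chance and a ten-nucleotide site about 3,000 times, which is why sequence length alone cannot account for target specificity. The strict strand test also shows that the CRE is a perfect palindrome, whereas the TRE is symmetric in its half-sites but differs at the central nucleotide.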
The sequences that are targeted by the homo- and heterodimers of various receptors are called the hormone response elements (HREs). This is a generic term, as not all nuclear receptors are regulated by hormones. Interestingly, while the sequences of individual HREs can vary considerably, all nuclear receptors can bind strongly to two consensus half-sites, that is, 5′-AGAACA-3′ and
