Epigenetics in Organ Specific Disorders
Ebook, 1,555 pages, 17 hours


About this ebook

Epigenetics in Organ Specific Disorders, a new volume in the Translational Epigenetics series, provides a foundational overview and nuanced analysis of epigenetic gene regulation distinct to each organ type and organ specific disorders, fully elucidating the epigenetic pathways that promote and regulate disease. After a brief introduction, chapter authors compare epigenetic regulations across normal and disease conditions in different organ tissues, exploring similarities and contrasts. The role of epigenetic mechanisms in stem cells, cell-matrix interactions and cell proliferation, cell migration, cellular apoptosis, necrosis, pyknosis, tumor suppression, and immune responses across different organ types is examined in depth. Organ specific epigenetic mechanisms and biomarkers of early use in developing drugs, which can selectively target the organ of interest, are also explored to enable new precision therapies.
  • Identifies unique epigenetic mechanisms that occur in normal and disease conditions in each organ, examining differences and similarities
  • Explores organ specific epigenetic mechanisms to enable drug discovery and development
  • Features chapter contributions from leading researchers in the field
Language: English
Release date: Dec 2, 2022
ISBN: 9780128239322

    Book preview

    Epigenetics in Organ Specific Disorders - Academic Press

    Section 1

    Molecular and structural epigenetics

    Chapter 2: Higher-order chromatin structure and gene regulation

    Kenta Nakaiᵃ; Alexis Vandenbonᵇ,ᶜ

    a Institute of Medical Science, The University of Tokyo, Tokyo, Japan

    b Institute for Life and Medical Sciences, Kyoto University, Kyoto, Japan

    c Institute for Liberal Arts and Sciences, Kyoto University, Kyoto, Japan

    Abstract

    It is estimated that the human body consists of several hundred cell types, all of which are nevertheless specified by a single genome. Understanding how such a variety of cells is created from a common set of genes is one of the most fundamental problems in modern biology. In each cell type, a certain subset of genes is expressed, and this cell type-specific expression is likely to be regulated mostly by enhancers/cis-regulatory regions, which exist at a great many places in the noncoding regions of the genome. Since enhancers stimulate the expression of their target genes through so-called enhancer-promoter interactions caused by DNA looping, the combinatorial pattern of these interactions would change between cell types along with changes in higher-order chromatin structure. In this article, we briefly review how this notion has been explored. In the first half, we give an overview of what is known about (higher-order) chromatin structure; in the second half, we review what is known about enhancers, emphasizing their relationship with cell type-specific regulation and chromatin structure. In most cases, we cite only recent review articles for general topics and add reports on recent findings where available.

    Keywords

    Molecular mechanism of gene regulation; Genomics; Enhancers; Insulated neighborhoods; Cell-type-specific gene expression

    1: Introduction

    It is estimated that the human body consists of several hundred cell types, all of which are nevertheless specified by a single genome. Understanding how such a variety of cells is created from a common set of genes is one of the most fundamental problems in modern biology. In each cell type, a certain subset of genes is expressed, and this cell type-specific expression is likely to be regulated mostly by enhancers/cis-regulatory regions, which exist at a great many places in the noncoding regions of the genome. Since enhancers stimulate the expression of their target genes through so-called enhancer-promoter interactions caused by DNA looping, the combinatorial pattern of these interactions would change between cell types along with changes in higher-order chromatin structure. In this article, we briefly review how this notion has been explored. In the first half, we give an overview of what is known about (higher-order) chromatin structure; in the second half, we review what is known about enhancers, emphasizing their relationship with cell type-specific regulation and chromatin structure. In most cases, we cite only recent review articles for general topics and add reports on recent findings where available.

    2: Chromatin structure

    2.1: Nucleosomes and the factors that regulate their dynamics

    Although the main topic of this article is the higher-order structure of chromatin, its lower-order structure is also important for understanding differential gene regulation. It is well established that genomic DNA takes a beads-on-a-string structure, wrapped around histone octamers (i.e., forming nucleosomes).¹ Since DNA is negatively charged, binding to positively charged histone proteins facilitates the close packing of DNA within the narrow space of the nucleus, and the nucleosome is the basic unit of chromatin structure. However, if DNA is very tightly packed within nucleosomes, it is difficult for the transcriptional machinery and other factors to access these regions. Thus, transcriptionally active regions are considered to be loosely packed or even free of nucleosomes. Since such active regions can differ between cell types or conditions, nucleosome dynamics must be regulated.²,³ Several factors are known to be involved in this dynamic process: DNA methylation, exchange of histones with their variants, and posttranslational modification of core histones.⁴

    First, in mammals, DNA methylation means that a methyl group is added to the 5 position of the pyrimidine ring of a cytosine base, usually one that precedes a guanine base (i.e., in a CpG dinucleotide). It has been reported that DNA methylation increases nucleosome compaction and rigidity.⁵ Since this compaction process is rather complicated, its details have not been sufficiently clarified, but a group of proteins that possess a characteristic domain (the methyl-CpG-binding domain; MBD), which specifically binds to methylated DNA, seem to play important roles.⁶ Because of this compaction, DNA methylation is a repressive factor in transcription, and the enzymes that mediate DNA methylation (DNA methyltransferases) and DNA demethylation (which can also occur passively, in other words, when methylation is not maintained at the time of DNA replication) should play important roles in the regulation of transcription. In typical mammalian somatic cells, about 75% of CpGs are methylated.⁷ In addition, in some specific cells, such as neurons and embryonic stem cells, a small but significant number of methylations occur on cytosines in other contexts (denoted as CpH methylation). The methyltransferases responsible seem to differ between neurons and embryonic stem cells.⁸,⁹ In addition, an oxidized form of methylated cytosine, 5-hydroxymethylcytosine (5hmC), is observed relatively abundantly in these cells. The oxidation is catalyzed by the TET (ten-eleven translocation) family of proteins, and this modification is considered an epigenetic mark per se.⁶,¹⁰,¹¹
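    To make the distinction between CpG and CpH contexts concrete, the following minimal sketch classifies each cytosine on one strand of a sequence by the base that follows it. The function name `cytosine_contexts` and the toy sequence are illustrative assumptions, not part of any published pipeline, and the sketch says nothing about whether a given cytosine is actually methylated.

```python
from collections import Counter

def cytosine_contexts(seq):
    """Count cytosines on the given strand by sequence context.

    'CpG'     : cytosine followed by guanine
    'CpH'     : cytosine followed by A, C, or T
    'unknown' : cytosine with no (or an ambiguous) downstream base
    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    counts = Counter()
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if nxt == "G":
            counts["CpG"] += 1
        elif nxt in ("A", "C", "T"):
            counts["CpH"] += 1
        else:
            counts["unknown"] += 1
    return dict(counts)

# Toy sequence: two cytosines precede G, two precede another base.
print(cytosine_contexts("ACGTCCATCG"))
```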

    Second, core histones (H3, H4, H2A, and H2B) have several variants, such as H3.3 for canonical H3 and H2A.Z for H2A.³ These variant histones have slightly different amino acid sequences from the canonical ones and can exert some structural influence on chromatin. However, their exact roles have not been clarified because of their complicated behaviors, perhaps partly due to their posttranslational modifications. The exchange between variant histones and canonical ones is mediated by protein complexes, called histone chaperones (or a class of chromatin remodelers; see below).

    Third, as for the posttranslational modification of histones, many examples are known.¹² Histones are globular proteins with an unstructured N-terminal region called the tail, and modifications occur in this tail region. Among the many types of chemical modification known to date, methylation and acetylation of lysine residues are the most typical. Importantly, specific types of modification at specific positions are known to be enriched in specific functional states of gene regulatory regions. For example, monomethylation of the fourth residue (lysine) of histone H3 (denoted as H3K4me1) is often found in primed (i.e., one step before activation; see Section 3.2) enhancers, while trimethylation of the ninth residue of H3 (H3K9me3) is associated with transcriptionally repressive regions (enriched in so-called heterochromatin). As for the acetylation of lysine residues, all cases known so far are associated with transcriptionally active regions. Obviously, there are mechanisms that read and interpret such modifications (marks), subsequently leading to the corresponding chromatin states (see below). The pattern of these histone modifications is therefore known as the histone code; it seems to matter mainly as a set of marks to be recognized, although the modifications also have some direct structural influence on the surrounding region. It is still not well known how frequently such marks are updated. We observed that certain types of histone modification changed rapidly in mouse dendritic cells upon stimulation by LPS (lipopolysaccharide), while others changed more slowly.¹³ Thus, there seems to be some hierarchy among the histone marks. Before closing this section, we should also note the importance of the linker histone H1 in regulating chromatin structure.¹⁴,¹⁵

    2.2: How the nucleosomes are (not) organized

    Earlier X-ray crystallographic studies have firmly established how DNA is bound to histone octamers.¹ Including the linker region of about 60 base pairs (bp), one nucleosome unit contains about 200 bp of DNA. The chain of these units is called the 10-nm fiber because of its approximate diameter. However, how this 10-nm fiber is organized into a higher-order structure is still controversial.¹⁶ Based on earlier in vitro studies, nucleosomes were long believed to form a 30-nm fiber, for which two structural models were originally proposed: the solenoid model and the zig-zag model. Both models are supported by several experimental results under varying conditions. Notably, more recent studies often suggest that the 30-nm structure is not the standard form of chromatin in the cell (even in condensed heterochromatin). Rather, nucleosomes are folded into a more heterogeneous and irregular structure. The difficulty of solving this problem lies in the fact that the condition of chromatin within the nucleus can be quite heterogeneous. However, even if chromatin structure at this level is irregular, it seems to take a somewhat ordered higher-level structure (e.g., chromosome territories, thanks to proteins such as condensins; see Section 2.5).

    In addition, a number of recent studies have implied that the liquid-liquid phase separation (LLPS) mechanism can play important roles in forming membrane-less organelle-like compartments in the cell.¹⁷ Examples include nucleoli, heterochromatin, and polycomb bodies (see Section 2.5). Since LLPS results from physical interactions, such structures can form spontaneously (without additional energy sources, such as ATP). It is also relatively well accepted that the presence of large intrinsically disordered regions (IDRs) in component proteins can facilitate the formation of such phase-separated structures through multivalent interactions. LLPS may be important in the regulation of transcription, too. For example, a target gene promoter may interact with multiple distantly located enhancer elements simultaneously within a separated fluid condensate. As another possibility, it has been proposed that some transcriptional coactivators and RNA polymerase II (Pol II) coexist in phase-separated condensates, which are enriched in highly active genomic domains containing an array of enhancers (called super-enhancers; see Section 3.6). Although these notions are quite attractive and may explain some of the long-standing mysteries in transcription, they are still disputed, and this problem should be considered with care.

    2.3: Chromatin loops

    With the progress in high-throughput chromatin conformation assays, such as Hi-C and ChIA-PET (chromatin interaction analysis by paired-end tag sequencing),¹⁸ it has become increasingly accepted that topologically associating domains (TADs) (and their related structures obtained from various experimental techniques) are the units of higher-order chromatin structure.¹⁹,²⁰ These high-throughput assays rely on next-generation sequencing (NGS). Intuitively, their basic steps are as follows:

    1. Two distinct DNA regions that are in close proximity are chemically cross-linked.

    2. The DNA is fragmented into segments that include the cross-linked ones.

    3. The ends of the two segments are ligated and a circular DNA consisting of the two regions is made.

    4. This DNA is isolated (through immunoprecipitation in ChIA-PET or through biotin-labeling in Hi-C) and sequenced.

    5. By aligning the two ligated sequences to their reference genome, we can infer the two positions in the genome that were in close contact.

    6. When such information is massively accumulated, a triangle map (the genome contact matrix) is obtained. The map shows the degree of contact between any pairs of positions (loci) in the genome. An example is shown in Fig. 2.1.

    Fig. 2.1 A schematic illustration on how (sub)TADs are defined from the triangle map obtained from a Hi-C experiment. This figure is just for explaining the conceptual relationship between the triangle map from Hi-C and (sub)TADs. Note that this figure is not based on real data. No permission required.
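    Steps 5 and 6 above can be sketched in a few lines: once the two ends of each ligated fragment have been mapped to genomic coordinates, the contact matrix is simply a count of read pairs per pair of genomic bins. This is a minimal illustration for a single chromosome; the function name `contact_matrix`, the bin size, and the toy coordinates are assumptions made here, and real pipelines add mapping-quality filters, duplicate removal, and matrix normalization.

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Build a symmetric bin-by-bin contact count matrix.

    pairs: iterable of (pos1, pos2), the 0-based genomic coordinates of the
           two ends of each ligated fragment on one chromosome.
    Hypothetical helper for illustration only.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:  # keep the matrix symmetric without double-counting the diagonal
            matrix[j][i] += 1
    return matrix

# Toy data: four read pairs on a 30-bp "chromosome" with 10-bp bins.
toy_pairs = [(5, 15), (12, 18), (3, 27), (25, 8)]
for row in contact_matrix(toy_pairs, chrom_length=30, bin_size=10):
    print(row)
```

    Because the matrix is symmetric, only one half needs to be drawn, which is why such maps are usually displayed as a triangle.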

    A TAD is loosely defined as a genomic region within which DNA has frequent physical interactions. Thus, TADs can be recognized from the map as subtriangles within which the contacts are more significant. In Fig. 2.1, a simplified (conceptual) relationship between the triangle map obtained from Hi-C experiments and a TAD is shown. Roughly speaking, the color indicates the degree of the interaction between pairs of chromosomal regions, and the red triangular shapes indicate TADs and sub-TADs, i.e., chromosomal regions that are in close contact with each other. The median size of TADs is estimated to be about 1 Mb while the existence of their nested substructures (called sub-TADs, chromatin loops, or contact domains, by different research groups) has been reported.²¹ Their median size is about 300 kb. Several algorithms for TAD recognition have been proposed.⁵,¹⁵
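    As one illustration of how domain boundaries can be read off a contact matrix, the sketch below computes an insulation-style profile: for each position between two neighboring bins, it averages the contacts that cross that position within a small window, and reports local minima as boundary candidates. This is a simplified stand-in chosen here for clarity, not the specific algorithms cited above; the function names, the window size, and the toy matrix are assumptions.

```python
def insulation_profile(matrix, window):
    """Mean contact frequency across each inter-bin position.

    Position b denotes the border between bin b-1 and bin b. The score
    averages matrix[i][j] over up to `window` bins upstream (i < b) and
    up to `window` bins downstream (j >= b). Low scores mean that few
    contacts cross the border. Hypothetical helper for illustration only.
    """
    n = len(matrix)
    profile = {}
    for b in range(1, n):
        upstream = range(max(0, b - window), b)
        downstream = range(b, min(n, b + window))
        cells = [matrix[i][j] for i in upstream for j in downstream]
        profile[b] = sum(cells) / len(cells)
    return profile

def boundary_candidates(profile):
    """Return positions whose score is a strict local minimum."""
    positions = sorted(profile)
    return [b for prev, b, nxt in zip(positions, positions[1:], positions[2:])
            if profile[b] < profile[prev] and profile[b] < profile[nxt]]

# Toy matrix: two 3-bin domains with dense contacts inside, sparse between.
toy = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
print(boundary_candidates(insulation_profile(toy, window=2)))
```

    In the toy matrix the only candidate is the border between the third and fourth bins, which is where the two dense sub-triangles meet.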

    The TAD unit seems to be more or less conserved among cell types and across metazoan species, although the notion of conservation should be treated carefully.²² Similarly, the binding pattern of CTCF proteins, which are often found at both ends of a TAD, is thought to be relatively stable (see Section 3.7). On the other hand, sub-TAD boundaries are more variable between cell types. In many mammalian cases, the CTCF proteins at the two ends of a TAD/sub-TAD form a dimer with each other, and it seems that TADs/sub-TADs can also be characterized as extruded chromatin loops that are bounded by such CTCF protein pairs as well as by cohesin rings; cohesin is a member of the SMC (structural maintenance of chromosomes) protein complex family. Indeed, according to a recent comprehensive ChIA-PET study of 24 human cell types, 28% of all cohesin-mediated chromatin loops, which are roughly thought to correspond to sub-TADs, show a certain degree of variation across these cell types (see Section 3.8).²¹ As a molecular motor driven by ATP hydrolysis, the cohesin ring may drive dynamic changes of the loop structure, a process known as loop extrusion, though its details are still unknown.²²,²³

    2.4: Chromatin remodelers

    Within the nucleus, an appropriate chromatin structure must be formed and maintained after DNA replication; nucleosomes should be assembled on DNA at regular intervals and be ejected or edited (i.e., have their subunits exchanged) when necessary. Such functions are mediated by specific protein complexes called chromatin remodeling complexes (CRCs).²⁴ More than 20 CRC members are known, and all of them use ATP as the energy source for their DNA helicase-like activity, sharing a subunit that contains an evolutionarily related domain (i.e., the ATPase-translocase subunit). Based on sequence similarity, most of them are classified into one of four subfamilies: ISWI (imitation switch), CHD (chromodomain helicase DNA-binding), SWI/SNF (switch/sucrose nonfermentable), and INO80 (inositol-requiring protein 80). Roughly speaking, ISWI and CHD members are involved in nucleosome assembly; SWI/SNF members change the accessibility of chromatin to RNA polymerase II and related factors through nucleosome repositioning; and INO80 members participate in exchanging nucleosome subunits (nucleosome editing). In this sense, the SWI/SNF subfamily might be considered the most closely related to transcriptional regulation, but in reality this view is oversimplified, and other members are clearly involved in transcriptional regulation as well. Indeed, a new classification scheme for CRCs has been proposed, based on the genome-wide binding profiles of CRCs with respect to a variety of epigenetic information, such as DNA methylation, nucleosome positioning, histone modification, and Hi-C contacts.²⁵ According to this scheme, CRCs are classified into those associated (mainly) with actively marked chromatin (Group 1) and those associated with repressively marked chromatin (Group 2). Interestingly, this classification looks rather independent of the sequence similarity-based one above: for example, the mammalian SWI/SNF family has two kinds of ATPase subunits, BRG1 and BRM, and BRG1 belongs to Group 1 while BRM belongs to Group 2 (Table 2.1). More studies are needed to further understand their functional roles. Many additional factors also contribute to chromatin remodeling. According to a recent study using Xenopus, ISWI is required for de novo TAD formation.²⁶

    Table 2.1

    Eight typical CRCs are classified based on the features of their ATPase catalytic subunits by a conventional sequence-based way (columns) and another proposed by Giles et al.²⁵ (rows).

    Another important family of regulator complexes is the polycomb-group (PcG) proteins, which are classified into two subfamilies: PRC1 (polycomb repressive complex 1) and PRC2.²⁷ PRC2 trimethylates lysine 27 of histone H3 (producing H3K27me3) and thus induces the silencing of chromatin. PRC1 stabilizes this silencing. They are known to play important roles especially in developmental processes, such as homeotic gene regulation and X-chromosome inactivation. Their (positive/negative) interaction with ATP-dependent CRCs is also important.²⁸

    It is interesting to know how the histone marks are read by such protein complexes, leading to the remodeling of the region. It is known that some subunits of CRCs contain protein domains that can recognize histone modifications. For example, bromodomains (the name is confusing because they have nothing to do with the element bromine) exist in the SWI/SNF subfamily of CRCs and are known to recognize acetylated lysine; chromodomains exist in the CHD subfamily and recognize methylated lysine; and PHD (plant homeodomain) finger domains exist in many proteins (over 100 in the human genome), including some members of the ISWI subfamily, and are thought to be the main readers. Many of the PHD fingers characterized so far recognize unmodified or methylated lysine residues (H3K4); according to a recent systematic study, 31 of 123 annotated domains showed a strong preference for binding to the tail of histone H3.²⁹ Some other domains, including the BAH (bromo-adjacent homology) domain, the PWWP domain, and the WD40 domain, have been shown to be important, too. However, the reality of the CRC-recruiting mechanisms is likely to be far more complicated, partly because each complex contains many subunits that may have additional domains and partly because their interaction with modification enzymes (writers and erasers) as well as transcription regulators is also important.

    2.5: Chromosome compartmentalization

    Imaging and related studies have made it clear that the inside of the nucleus is not homogeneous. One typical example is the nucleolus, which is not bounded by a membrane. The existence of many other subnuclear membrane-less bodies has been proposed. They are called nuclear bodies (or nuclear domains or dots) and include Cajal bodies and PML (promyelocytic leukemia protein) nuclear bodies. In the study of transcription, nuclear speckles are important³⁰: their shape is irregular and about 20–50 of them exist in each nucleus. Although they were first found as granules enriched in RNA splicing factors, they also seem to play important roles in transcription: a recent study shows that nuclear speckles work as a gene expression amplifier (i.e., the expression of endogenous genes is enhanced when they are associated with speckles).³¹ In contrast, some bodies seem to be associated with gene repression: it is proposed that PcG forms a membrane-less compartment called the PcG (or polycomb) body, which contains many co-repressed genes, some of which are important during early development. How these nuclear bodies are formed is not fully understood, but several recent lines of evidence suggest that all of them are formed through LLPS¹⁷ (see below).

    From the side of chromatin DNA, it is known that some local chromatin regions are associated with the above bodies. Two typical examples are NADs (nucleolus-associated domains) and LADs (lamina-associated domains).³² The nucleolus is the largest substructure in the nucleus and is characterized as the place of ribosome biosynthesis. The gene-density of NADs is low and the genes in these domains are thought to be relatively inactive but, at least in stem cells, some of them may work actively. The nuclear lamina is a fibrillar network structure that exists on the inner surface of the nuclear membrane. LADs are chromatin domains that interact with the nuclear lamina and thus are located at the periphery of the nucleus. The existence of LADs was first proposed using an experimental technique known as DamID, which uses a bacterial methylation system as a mark of interaction.

    More generally, through Hi-C studies, it has been proposed that the entire genome is divided into A and B compartments, both of which are several megabases in size.³³ In short, the A compartment can be regarded as euchromatin and the B compartment as heterochromatin. The boundaries of the compartments can be defined based on the sign (positive/negative) of the first principal component derived from the contact matrix normalized by the expected contacts. A more recent study showed that A/B compartments can also be reliably estimated from various kinds of epigenetic data. Although the relationship between TADs, which are defined from the same Hi-C data, and A/B compartments is not established, it is tempting to assume that some TADs with open chromatin form A compartments. Live imaging studies show that B compartments are basically positioned at the periphery of the nucleus, containing LADs. Each compartment (genomic region) tends to interact with regions of the same compartment type; probably because these interactions are not strong enough to separate the two compartments completely, there are many microcompartments in the nucleus, each of which contains either A or B compartments. This may be interpreted as a phase-separation process: DNA can be regarded as a block copolymer composed of alternating A/B compartments (or euchromatin and heterochromatin), in which blocks of the same type attract each other. Based on this idea, Belaghzal et al. developed a variant of the Hi-C method, liquid chromatin Hi-C, in which DNA is fragmented before cross-linking: how the stability of a given condensate depends on fragment length reveals the strength of intrinsic interactions, such as A-to-A interactions.³⁴ They concluded that phase segregation occurs when blocks of a particular chromatin state are at least 10 kb. In addition, they report that LADs, which are enriched in a subgroup of the B compartment, are the most stable, whereas speckles and polycomb bodies, which can be regarded as more facultative, are less stable. With further progress in studies of this kind, we will be able to understand in a unified way how chromatin is organized and dynamically regulated.
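    The compartment definition described above (the sign of the first principal component of the expected-normalized contact matrix) can be sketched as follows. This is a minimal, standard-library illustration under several assumptions made here: the expected contact at each genomic distance is taken as the mean of the corresponding diagonal, the leading eigenvector of the row-correlation matrix stands in for the first principal component, and it is found by plain power iteration. All function names are hypothetical. The overall sign of an eigenvector is arbitrary, so deciding which sign corresponds to the A compartment needs external information (for example, gene density), which this sketch does not attempt.

```python
import math

def observed_over_expected(matrix):
    """Divide each contact by the mean contact at the same genomic distance."""
    n = len(matrix)
    expected = [sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
                for d in range(n)]
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
             for j in range(n)] for i in range(n)]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def leading_eigenvector(sym, iterations=200):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    n = len(sym)
    vec = [1.0] + [0.0] * (n - 1)  # start vector; assumed not orthogonal to the answer
    for _ in range(iterations):
        nxt = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    return vec

def compartment_signs(matrix):
    """Return +1/-1 per bin; bins sharing a sign fall in the same compartment."""
    oe = observed_over_expected(matrix)
    n = len(oe)
    corr = [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]
    vec = leading_eigenvector(corr)
    flip = -1 if vec[0] < 0 else 1  # fix the arbitrary sign so bin 0 is +1
    return [flip * (1 if v >= 0 else -1) for v in vec]

# Toy matrix: bins 0-2 and bins 3-5 contact their own group four times as often.
labels = "AAABBB"
toy = [[4 if labels[i] == labels[j] else 1 for j in range(6)] for i in range(6)]
print(compartment_signs(toy))
```

    On the toy matrix the first three bins receive one sign and the last three the other, recovering the two groups that were built into it.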

    Finally, as an even higher-order chromatin structure, it has been proposed that each chromosomal DNA molecule occupies a certain nonrandom region within the nucleus, named a chromosome territory.³⁵ Although the details of such structures are still enigmatic, a recent study indicated the importance of condensin II, another family member of the SMC protein complexes, in maintaining the chromosome architecture.³⁶ The study also showed the cross-species conservation/differences of these architectures.

    3: Roles of enhancers in cell type-specific gene expression

    3.1: General features

    Although it is not easy to define enhancers rigorously (and their definition can vary between researchers and over time), they are roughly considered to be DNA regions (100–1000 bp in length) that increase the basal transcription level of their target gene(s), which are sometimes located very far (> 1 Mbp) away from them.³⁷–⁴⁴ Enhancers are considered to be bound by a variety of transcription factors and cofactors, which can interact with the promoter (or transcriptional start site (TSS)) region of their target genes through DNA looping. Therefore, the relative distance and orientation between them are comparatively unimportant (though most effective enhancers are thought to be located within 100 kb of their target promoters⁴⁵); the target gene of an enhancer is not always its nearest neighbor, and the nearest gene to an enhancer is not always its target (according to a recent study, 23% of the predicted target genes were not the closest gene to their enhancer⁴⁶). This looping mechanism seems to be associated with chromatin remodeling, as described below. It has been postulated that enhancers play a major role in orchestrating cell type-specific and developmentally specific gene expression. Here, we summarize what is known about enhancers.¹

    While enhancers used to be characterized individually using, say, reporter assays, they are nowadays characterized genome-wide from rather indirect evidence, because it is now accepted that there are a great many enhancers; according to a recent (predictive) estimation from 131 samples of various types of cells, there were about 270,000 unique enhancers in the human genome, compared to about 23,000 expressed (protein-coding) genes; on average, each enhancer regulates 2.7 genes while each gene is regulated by 2.8 enhancers.⁴⁶ Histone marks are the most conventionally used evidence⁴⁷ (see below). Among the many algorithms that link epigenetic information to the functional state of the genome (such as active enhancers), ChromHMM,⁴¹ which is used in the ENCODE project⁴⁸ and by the Roadmap Epigenomics Consortium,⁴⁹ is well known. In Table 2.2, a simplified list of chromatin states and their typical marks is shown. Please note that the representative marks shown here are neither necessary nor sufficient conditions for the definition of these states; the states are determined from the combination of marks, including their neighboring features, and the appearance of these marks in each state is not uniform. For example, H3K4me1 is known as a mark of active enhancers but it is also seen (weakly) in many promoters (and vice versa). The balance between H3K4me1 and H3K4me3 might be important for distinguishing them.

    Table 2.2

    For simplicity, the original 15 chromatin states used in the Roadmap Epigenomics Consortium are summarized into eight states here. For each state, typical histone marks are indicated with the × character, and its averaged coverage in percentage. Note that these marks are displayed intuitively rather than quantitatively. For more details, see original references.⁵⁰

    In addition, positions of open chromatin, determined with ATAC-seq or DNase-seq/MNase-seq, can be a useful clue. ATAC-seq (assay for transposase-accessible chromatin using sequencing) is a conventional technique for profiling chromatin accessibility using a Tn5 transposase mutant.⁵¹ Another characteristic of active enhancers is enhancer RNA (eRNA), which is short (0.5–2 kb) and is transcribed bidirectionally from the enhancer region.⁵² eRNAs are not polyadenylated and are rapidly degraded. The activity of an enhancer is roughly correlated with its eRNA expression. Another class of eRNAs, 1D eRNAs, was once proposed.⁵³ They are transcribed unidirectionally, polyadenylated, and rather long (> 4 kb). The general function of eRNAs has not been established; it is still possible that they are transcriptional noise, produced by leaky transcription at open chromatin, though several studies support the view that some eRNAs have functional roles in cis (as an adaptor) or in trans (i.e., working at distant locations). Nevertheless, they are used as a mark of active enhancers; an atlas of enhancers was constructed based on systematic detection of eRNAs.⁵⁴ Another mark of active enhancers is bound coactivators, such as the p300 histone acetyltransferase (HAT).⁵⁵

    Most importantly for this review, it is generally accepted that enhancers are mainly responsible for the cell type-specific expression of genes.³⁹ However, although there are plenty of examples where disruption of certain enhancers significantly affects development and disease,⁴⁰,⁴²,⁵⁶ direct proof of this notion is not easy, partly because at present there are no reliable ways to systematically detect the target genes of each active enhancer (but see Section 3.8). At least, it is certain that there is a set of enhancers that are active in each cell type and that they can be characterized by histone marks and some additional evidence, such as RNA polymerase II occupancy.⁵⁷ Recently, based on both their own experiments using embryonic/adult erythroblasts and computational studies using public databases, Cai et al. claimed that the enhancer dependence of cell type-specific gene expression increases with developmental age⁵⁸; in other words, the dependence on more proximal promoters is higher in embryonic cells. It should be noted that these promoters might be regarded as one particular subset of enhancers, because the distinction between enhancers and promoters is not so clear, except for the fact that promoter sequences contain transcriptional start site information.⁵⁹

    3.2: Classification of enhancer states

    Since there are so many (potential) enhancers in the genome and since they need to orchestrate cell type-specific gene expression, appropriate enhancers must be chosen for each cell type or condition. Moreover, since there are series of related cell types, the use of enhancers must allow for corresponding flexibility. Indeed, the states of enhancers are not binary (on/off or active/inactive) but include intermediate states ready for activation. Namely, they are classified into four states: inactive, primed, poised, and active.³⁷,³⁹ Their properties are discussed below and are summarized in Table 2.3.

    •Inactive enhancers are thought to be buried in tightly packed chromatin. No histone marks are necessary on them.

    •Primed enhancers can be regarded as being prepared for immediate activation. Such activation may be triggered by the binding of additional signal-dependent transcription factor(s) (SDTFs), working as a cue. Primed enhancers are typically marked with H3K4me1 (and H3K4me2) but lack H3K27ac; the methyl marks are deposited by two (redundant) histone methyltransferases, MLL3 (myeloid/lymphoid or mixed-lineage leukemia protein 3) and MLL4. Generally speaking (though this may not be mandatory), H3.3/H2A.Z variants, which make the nucleosomes hyperdynamic, are enriched in primed enhancers and thus nucleosomes are sparse (i.e., open chromatin). However, no eRNAs are produced. Primed enhancers may be created by the binding of a pioneer transcription factor (or lineage-determining transcription factor; LDTF), which may directly bind to tightly packed chromatin by, say, recruiting a chromatin remodeler complex and/or a variety of protein complexes containing readers/erasers/writers of histone marks. Two examples of such pioneering factors are FOXA1 in liver development and PU.1 in macrophage/B-cell development (see below). The 5hmC modification (see Section 2.1) may also be enriched in primed enhancers.

    •Poised enhancers are predominantly found in human and mouse ESCs (embryonic stem cells). They can be regarded as a subset of primed enhancers but are characterized by the presence of the repressive H3K27me3 mark as well as H3K4me1/me2. They are bound by p300 and, additionally, by PRC2. EZH2, a component of PRC2, works as a histone-lysine N-methyltransferase to deposit H3K27me3 marks. A complex containing HDAC (histone deacetylase) prevents the histones from being acetylated.

    •Finally, in active enhancer regions, the chromatin is even more open and their flanking histones are predominantly marked with H3K27ac (established by HATs (histone acetyltransferases)) and H3K4me1/me2 (by MLL3 and MLL4). In addition, histone demethylases (HDMs) remove the H3K27me3 marks. These enzymes are thought to be recruited as parts of coactivators by the SDTFs. Their histones are likely to be enriched in H3.3 and H2A.Z; eRNAs are synthesized; many transcription factors as well as p300, Pol II, and the mediator complex (see Section 3.5) bind to them.

    Table 2.3

    Key properties of the four states of enhancers are summarized.
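
    As an illustration only, the histone marks described above can be turned into a simple rule-based classifier. The following Python sketch assumes binary (present/absent) calls for H3K4me1, H3K27ac, and H3K27me3 at a candidate region; the class and function names are hypothetical, and real data would require thresholds on ChIP-seq signal as well as further evidence (eRNA, p300 binding, chromatin accessibility) that is ignored here.

```python
from dataclasses import dataclass


@dataclass
class HistoneMarks:
    """Hypothetical present/absent calls for one candidate enhancer region."""
    h3k4me1: bool
    h3k27ac: bool
    h3k27me3: bool


def classify_enhancer_state(marks: HistoneMarks) -> str:
    """Map three histone marks to an enhancer state (simplified rules)."""
    if not (marks.h3k4me1 or marks.h3k27ac or marks.h3k27me3):
        return "inactive"      # no marks are required on inactive enhancers
    if not marks.h3k4me1:
        return "unclassified"  # combination not covered by the four states
    if marks.h3k27ac and marks.h3k27me3:
        return "unclassified"  # conflicting active and repressive marks
    if marks.h3k27ac:
        return "active"        # H3K4me1 plus H3K27ac
    if marks.h3k27me3:
        return "poised"        # H3K4me1 plus repressive H3K27me3
    return "primed"            # H3K4me1 only


print(classify_enhancer_state(HistoneMarks(True, False, True)))
```

    Combinations that the four-state scheme does not describe are reported as "unclassified" rather than being forced into a state.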

    It is mysterious how pioneering factors (or lineage-determining TFs) can find appropriate cell type-specific enhancer candidates among so many potential binding motifs in the genome. Through ChIP-seq analysis, it was demonstrated that PU.1 distinguishes macrophage-specific and B-cell-specific enhancers rather precisely; PU.1 works with cell type-specific partner factors (e.g., C/EBPs in macrophages versus EBF1 in B-cells).⁶⁰ According to a systematic survey of combinations of DNA motifs in Drosophila, direct interactions between TFs require DNA motifs to be located within some range of spacing.⁶¹ In other words, these motifs need not be immediately adjacent to each other, which substantially increases the number of potential false positives (i.e., positions that apparently satisfy the required condition for the motifs but do not work as real enhancers). Computational studies to identify cell type-specific motifs associated with H3K4me1-marked regions have also been attempted. Further studies are needed to clarify the detailed mechanisms of enhancer selection.

    As described above, primed and poised enhancers are thought to be activated when additional transcription factors (i.e., SDTFs) become active. Typical examples of SDTFs are the members of the nuclear receptor and NF-κB families. These factors also regulate gene expression as ordinary TFs (i.e., non-SDTFs) in different cell types. Although these SDTFs play a role in the activation of primed/poised enhancers, they are not required for the definition of these enhancers, which are predefined by LDTFs. However, the SDTFs can also cooperate with LDTFs to define de novo enhancers. For example, ecdysone receptor is known to select enhancers with other factors in insects,⁶² while FOXP3 binds almost exclusively to poised enhancers for regulatory T cell lineage specification.¹⁰,⁶³

    3.3: Enhancer grammar

    In the above scenario, the binding of pioneering transcription factor(s) to tightly packed chromatin predefines cell type-specific enhancers. Although this scenario might be oversimplified, enhancers can be characterized as clusters of transcription factor binding sites (TFBSs), and thus, it is of special interest to identify general rules of sequence features that define cell type-specific enhancers. Such rules are often called the enhancer grammar and have been studied by many researchers. For the organization of TFBSs, two extreme models can be considered conceptually: one is the enhanceosome model, where the bound TFs interact cooperatively and thus the relationship of their binding sites, in terms of their relative spacing, orientation, order, number, binding affinity, etc., is rather rigid, while the other is the billboard model, where the interaction between bound TFs is indirect and the relative positions of their binding sites are rather flexible.⁴⁰ Real enhancers are intermediate between the two models but, as described below, the evolutionary conservation of enhancer sequences is not so strict, supporting the flexible nature of the grammar. Recently, a review on motif grammar, which is a basis of enhancer grammar, was published.⁶⁴

    There have been many attempts to characterize cell type- or tissue-specific enhancers/cis-regulatory modules computationally. Our group has also attempted to detect effective rules that can discriminate a given set of tissue-specific regulatory sequences from others in several organisms, using a genetic algorithm.⁶⁵ One of the weak points of our approach was that we only searched for features in regions close to the TSS, which cannot cover distantly located regulatory regions, and thus, we have recently focused on the analyses of organisms with smaller genomes.⁶⁶ Zhao et al. explored cell type-specific chromatin signatures using many RNA-seq and ChIP-seq datasets from several cell types.⁶⁷ Although their findings do not look so striking, this kind of research should be attempted more often. Recently, Chen and Capra reported an interesting attempt to learn and interpret the enhancer (or regulatory) grammar using a deep learning framework⁶⁸: they prepared synthetic training sequences based on a combination of several grammars (such as homotypic/heterotypic clusters of TFBSs) and trained deep residual neural networks (ResNets). What the networks learned can be visualized using an unsupervised clustering method. When applied to real enhancers, the networks could learn a known heterotypic grammar. Such an approach will be useful for identifying the variation of existing enhancers, but the method should be further improved because its results can be significantly influenced by, for example, the negative data used.

    Enhancer grammar has also been explored experimentally. For example, Keller et al. found that two TFs, Dorsal (Dl) and Zelda (Zld: a pioneering factor), play distinct roles in determining the spatiotemporal range of the t48 gene in Drosophila development, in terms of the transcriptional bursting mechanism.⁶⁹ In another recent example, King et al. synthesized artificial cis-regulatory sequences composed of binding sites for SOX2, OCT4, KLF4, and ESRRB.⁷⁰ With an MPRA (massively parallel reporter assay), they compared the activity of these sequences with that of the original genomic sequence. Their results show that the mere number of binding sites was the most important factor in the grammar, implying relatively independent activity of the TFs rather than cooperativity. Recently, a rather large-scale exploration of randomized synthetic yeast promoters was conducted.⁷¹ Such data will undoubtedly be useful for future theoretical studies on motif grammar. Of course, this kind of assay does not reflect the situation in the 3D genome, but the value of such data is obvious.

    3.4: Evolution of enhancers

    Another way to explore the enhancer grammar is to examine the evolutionary conservation of enhancers. Namely, genome sequence comparisons between closely/distantly related species have been used to delineate functionally important regions/features. When used to find hidden regulatory elements, such a method has been called phylogenetic footprinting.⁷² However, in spite of their functional importance, the degree of sequence conservation of enhancer regions is not so high. In a survey of 104 experimentally validated murine enhancers, only 10.5% of them were conserved in zebrafish and the conserved ones were preferentially associated with developmental regulator genes, suggesting the relative importance of enhancers in development.⁷³ In another experiment using synthetic enhancers derived from highly conserved zebrafish ones, the relative distance and orientation between two elements were highly flexible and the authors’ conclusion was that the words of the regulatory code are arranged in a variable manner.⁷⁴ Moreover, according to a comprehensive comparison of enhancers/promoters (identified using histone marks) that are active in liver across 20 species, enhancers were found to be far less conserved than promoters and their turnover is very rapid through exaptation (i.e., repurposing of ancestral sequences that were used for a different purpose).⁷⁵ Using a similar histone mark-based approach, highly conserved enhancers and species-specific enhancers were compared⁷⁶: highly conserved enhancers had higher density and diversity in their TFBSs and they were likely to be more pleiotropic (i.e., influencing multiple traits). The same group developed enhancer predictors based on SVM (support vector machine) or CNN (convolutional neural network) algorithms applied to the 5-mer frequency of input sequences and reported that the predictors can work across species, suggesting that some degree of sequence conservation exists between species.⁷⁷ Recently, it has been shown that some human and mouse enhancers share a similar set of sequence motifs with those of marine sponge and zebrafish.⁷⁸ Moreover, the human and mouse enhancers turned out to drive a similar cell type-specific expression of a reporter gene, implying the deep conservation of the enhancer grammar.
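
    To make the notion of a 5-mer frequency feature concrete, the following Python sketch converts a DNA sequence into a fixed-length vector of k-mer frequencies that could serve as input to a classifier such as an SVM. It is a minimal illustration and not the cited authors' implementation: the function name is hypothetical, windows containing characters other than A, C, G, and T are simply skipped, and reverse-complement k-mers are counted separately, all of which are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 5) -> list[float]:
    """Return the frequency of every possible DNA k-mer in a fixed order."""
    seq = sequence.upper()
    # All 4**k k-mers in lexicographic order (1,024 entries when k = 5).
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing N or other symbols are not in all_kmers and are ignored.
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


features = kmer_frequencies("ACGTACGTTTGACCA")
print(len(features))
```

    Because the vector has the same length and ordering for any input, sequences from different species can be scored by the same trained model, which is the property exploited in cross-species prediction.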

    3.5: Enhancer-promoter interactions

    As described above, the positions of enhancers are thought to be first recognized by pioneering transcription factors. After several transcription factors have bound, they recruit coactivators to the enhancers.⁵⁵ Unlike transcription factors, coactivators do not bind to DNA specifically but help to convey the action of the activators (i.e., the transcription factors bound to these enhancers) to the promoters of their target genes. Corepressors, which repress the transcription of the target genes, also exist; indeed, a given factor seems able to switch between the roles of coactivator and corepressor depending on its surrounding environment, and thus, it may be more suitable to collectively call them coregulators. The basic functions of coactivators are (1) the modification of histones (histone acetyltransferases, such as p300/CBP, and histone methyltransferases, such as MLL3/4); (2) the remodeling of the chromatin structure (such as BRG1); and (3) the promotion of cross-linking between the enhancers and the promoters (more specifically, the basal transcriptional machinery or the preinitiation complex (PIC) on them) by DNA looping. For the third, cross-linking function, the role of the mediator complex seems to be most important.⁷⁹

    The mediator complex is a large multisubunit protein complex, containing 25 subunits in yeast and up to 30 in humans; its composition can vary under different conditions. The complex is required for the expression of almost all protein-coding genes in yeast. In addition to promoting PIC formation, it stimulates the phosphorylation of the C-terminal domain (CTD) of RNA polymerase II by CDK7; this phosphorylation is necessary for the transition from the initiation step to the elongation step of transcription. The mediator complex consists of four modules: head, middle, tail, and CDK8 kinase modules. The association of the CDK8 module with the others is rather transient. The complex can interact with many kinds of transcription factors, mainly at its tail module. This versatility partly comes from the use of different components. The interactions between mediator and TFs can be complicated because they are likely to be regulated by posttranslational modifications (e.g., phosphorylation). Recently, the 3D structure of the human mediator bound with the preinitiation complex was solved using cryo-electron microscopy by several groups.⁸⁰–⁸² These structures will no doubt be useful for understanding the above-mentioned open questions about the function of the mediator complex. It seems that the mediator complex is for general use and may not be so critical in determining the specificity of enhancers and their targets. As will be discussed in later sections, the most important factor that determines the target(s) of an enhancer seems to be the surrounding chromatin domain, such as the TAD, though there must be additional factors.¹

    Indeed, the involvement of several factors in the enhancer-promoter interaction has been proposed. For example, the cohesin complex,⁸³ which forms a characteristic ring structure surrounding DNA strands and thus can stabilize loop formation, has been shown to interact with the mediator complex.⁸⁴ As described above, it has been proposed that cohesin helps to change enhancer-promoter interactions between different cell types via the DNA loop extrusion mechanism.²³,⁸⁵ In another example, several studies support the involvement of noncoding RNAs. Namely, (some of the) eRNAs may interact with subunits of cohesin and/or mediator, facilitating loop formation; they may also interact with coactivators, such as p300, affecting histone modification, or with the transcription machinery. In addition, some studies claim that some long noncoding RNAs can work as enhancers (i.e., activating the transcription of neighboring genes) or assist other enhancers by interacting with the mediator complex, etc.⁵³ Since these topics are still controversial, more studies are needed.

    3.6: Super-enhancers

    In mammalian genomes, there are regions where a large number of enhancers are densely located. Such regions were named super-enhancers by Richard A. Young’s group in 2013.⁸⁶,⁸⁷ A related notion was termed stretch enhancers by a different group. Although the definition of super-enhancers can vary, the ROSE (rank ordering of super-enhancers) algorithm, which stitches neighboring enhancers together and evaluates the resulting clusters using ChIP-seq data, is frequently used.⁸⁶ The median size of super-enhancers is reported to be from 10 kb to over 60 kb, whereas that of typical enhancers is from 1 to 4 kb (note the discrepancy with the typical size of enhancers given in Section 3.1; this is due to the ambiguity in the definition of enhancers). Their number for each cell type is about 100–1000, much smaller than that of typical enhancers. Super-enhancers are hyperactive; they are densely bound by master transcription factors as well as the mediator complex; they are also highly enriched with active histone marks. It is still controversial whether their function is more than the mere sum of the enhancers they contain.
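
    The stitching-and-ranking idea behind ROSE can be sketched in a few lines of Python. This is a simplified illustration rather than the ROSE implementation itself: enhancers are given as hypothetical (start, end, signal) tuples on a single chromosome, the 12.5 kb maximum gap is an assumed parameter that is not stated in the text, and the final step of choosing a signal cutoff that separates super-enhancers from typical enhancers is omitted.

```python
def stitch_enhancers(enhancers, max_gap=12_500):
    """Merge enhancers on one chromosome that lie within max_gap bp.

    enhancers: iterable of (start, end, signal) tuples.
    Returns stitched (start, end, summed_signal) regions in genomic order.
    """
    stitched = []
    for start, end, signal in sorted(enhancers):
        if stitched and start - stitched[-1][1] <= max_gap:
            # Close enough to the previous region: extend it and add the signal.
            prev_start, prev_end, prev_signal = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def rank_by_signal(regions):
    """Order stitched regions from the highest to the lowest total signal."""
    return sorted(regions, key=lambda region: region[2], reverse=True)


peaks = [(0, 1_000, 5.0), (5_000, 6_000, 3.0), (50_000, 51_000, 1.0)]
print(rank_by_signal(stitch_enhancers(peaks)))
```

    In this toy input, the first two peaks are merged into one 6 kb region with a summed signal of 8.0, which then ranks above the isolated third peak.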

    Super-enhancers are thought to control cell identity. One basis for this view is that the majority of super-enhancers are active in only a few cell types. Moreover, grouping different cell types by the similarity of their active super-enhancer positions yields a classification that fits reasonably well with the nature of these cell types. There are several databases of super-enhancers, such as SEA (ver.3)⁸⁸ and SEdb,⁸⁹ which contain hundreds of thousands of super-enhancers, derived from the data of hundreds of cells.¹²

    Concerning the debate on whether super-enhancers are just the sum of many enhancers or not, it has been suggested that super-enhancers can be a core of active and stable transcription factories that are separated by the LLPS mechanism. It has been shown that super-enhancers are bound by proteins, such as BRD4 and MED1, that have intrinsically disordered regions (also known as low-complexity domains), which tend to cause phase separation. In 2018, three reports supporting this idea were published in the same issue of Science.⁹⁰–⁹² However, it must be noted that the role of such condensates is still controversial, with critics arguing that the supporting evidence is only qualitative (see McSwiggen et al. for a critical review on phase separation in living cells⁹³).

    3.7: Insulated neighborhoods

    Insulators are a type of cis-regulatory element, working as an enhancer blocker and/or a barrier to the spreading of heterochromatin.⁹⁴ Enhancer blocking means that an insulator element located between an enhancer and its potential target gene interferes with their interaction (i.e., loop forming). In mammalian cells, this function is mediated by DNA loop formation that connects a pair of insulators, both bound by the CTCF protein (i.e., homodimer formation). In addition, this connecting point (anchor) is surrounded by the cohesin ring. The loop region delimited by a pair of CTCF-bound insulators is called the insulated neighborhood. It is thought that the action of an enhancer within an insulated neighborhood only reaches genes that are in the same neighborhood. These insulated neighborhoods are thought to correspond to TADs or sub-TADs. According to a study by Richard A. Young’s group in 2016, the landscape of CTCF-CTCF loops in naïve and primed ES cells is very similar and the regulatory changes between these cell states tend to occur within the loops.⁹⁵
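
    If one takes the insulated neighborhood notion literally, the candidate targets of an enhancer are simply the genes whose TSSs lie in the same CTCF-CTCF loop. The Python sketch below illustrates this filter under several assumptions: all coordinates are on one chromosome, loops are given as hypothetical (start, end) anchor pairs, and when loops are nested the smallest loop containing the enhancer is used. The function name and the input format are illustrative only.

```python
def candidate_targets(enhancer_pos, gene_tss, loops):
    """List genes whose TSS shares the enhancer's insulated neighborhood.

    enhancer_pos: enhancer midpoint (bp).
    gene_tss: dict mapping gene name -> TSS position (bp).
    loops: list of (start, end) positions of CTCF-CTCF loop anchors.
    """
    containing = [(s, e) for s, e in loops if s <= enhancer_pos <= e]
    if not containing:
        return []  # the enhancer is not inside any annotated loop
    # Assumption: with nested loops, use the smallest enclosing one.
    start, end = min(containing, key=lambda loop: loop[1] - loop[0])
    return sorted(g for g, tss in gene_tss.items() if start <= tss <= end)


loops = [(0, 100_000), (200_000, 300_000)]
genes = {"geneA": 50_000, "geneB": 250_000, "geneC": 90_000}
print(candidate_targets(10_000, genes, loops))
```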

    If the insulated neighborhood really defines the target genes of any given enhancer, it would be helpful in systematically identifying the enhancer-target gene relationships.⁹⁶ Although an interesting project has been reported recently (see below), it seems that the notion that insulated neighborhoods govern enhancer selection in different cell types has not yet been validated quantitatively. Nevertheless, 3D genome data have been used for validating/constructing enhancer-target gene prediction algorithms. As a recent example, a benchmark of candidate enhancer-gene interactions (BENGI) was constructed using three types of data (3D chromatin interaction, genetic interaction (eQTLs), and CRISPR/Cas9 perturbation).⁹⁷ According to a recent review, the enhancer-target gene prediction algorithms can be classified into four groups: correlation-based, supervised learning-based, regression-based, and those based on other scores.⁹⁸ Of these, correlation-based methods use the co-occurrence between the active enhancer marks and the active genes in a specific cell type(s); supervised learning-based ones integrate various kinds of features based on positive/negative data; and the regression-based methods deal with the combinatorial effects of multiple enhancers on a target gene. According to a benchmark using the above BENGI data, correlation-based approaches were worse than a baseline distance-based method (such as the closest pair), while supervised methods were better than the baseline methods. Among them, TargetFinder showed relatively good performance.⁹⁹
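
    For reference, a distance-based baseline of the closest-pair kind takes only a few lines. The Python sketch below assigns an enhancer to the gene with the nearest TSS on the same chromosome; the function name and input format are hypothetical, and ties are resolved arbitrarily by dictionary order.

```python
def closest_gene(enhancer_pos, gene_tss):
    """Return the gene whose TSS is nearest to the enhancer midpoint.

    enhancer_pos: enhancer midpoint (bp).
    gene_tss: non-empty dict mapping gene name -> TSS position (bp).
    """
    return min(gene_tss, key=lambda gene: abs(gene_tss[gene] - enhancer_pos))


genes = {"geneA": 50_000, "geneB": 250_000, "geneC": 90_000}
print(closest_gene(100_000, genes))
```

    Such a baseline uses no cell type-specific information at all, which is why it serves as a useful lower bar when more elaborate predictors are benchmarked.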

    In 2019, Fulco et al. proposed a simple model to estimate the relative contribution of distal enhancers to a given gene based on their extensive enhancer perturbation experiments with their CRISPRi-FlowFISH method.⁴⁵ They named the model the ABC (activity-by-contact) model: the score is calculated from the product of the activity term, estimated from DHS (DNase I hypersensitive sites) and ChIP-seq (H3K27ac) data, and the contact term, estimated from Hi-C data. They reported that their ABC model is more reliable than other predictors. More recently, the same group expanded their prediction based on their ABC model to 131 human cell types and tissues.⁴⁶ Across these samples, they identified about 6.3 million enhancer-gene connections for about 23,000 expressed genes. On average, there were about 48,000 enhancer-gene connections for about 18,000 unique enhancers in a sample; these enhancers covered about 12% of chromatin accessible regions. They found that their map of enhancer-target genes in these cell types is effective in interpreting GWAS (genome-wide association study) variants. Thus, this kind of data will be useful for increasing our understanding about the mechanism of cell type-specific gene expression, as well as for medical applications, in which individual genomes could be used in the diagnosis of, say, cancer patients.
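
    The arithmetic of such an activity-by-contact score can be sketched as follows. The text specifies only that the score is the product of an activity term and a contact term; the use of the geometric mean of the DHS and H3K27ac signals as the activity, and the normalization by the sum over all candidate elements of the gene, follow our reading of the published model and should be treated as assumptions here. The function name and input format are hypothetical.

```python
import math


def abc_scores(elements):
    """Compute activity-by-contact scores for the candidate elements of one gene.

    elements: dict mapping element name -> (dhs_signal, h3k27ac_signal, contact),
    where contact is the Hi-C contact frequency with the gene's promoter.
    Returns a dict of scores that sum to 1 whenever any element has signal.
    """
    raw = {}
    for name, (dhs, h3k27ac, contact) in elements.items():
        activity = math.sqrt(dhs * h3k27ac)  # assumed: geometric mean of the two signals
        raw[name] = activity * contact       # activity multiplied by contact
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    # Assumed normalization: each element's share of the total for this gene.
    return {name: value / total for name, value in raw.items()}


scores = abc_scores({"e1": (4.0, 9.0, 0.5), "e2": (1.0, 1.0, 1.0)})
print(scores)
```

    In this toy example, e1 has an activity of 6 and a contact of 0.5 (product 3), and e2 has a product of 1, so their scores are 0.75 and 0.25, respectively.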

    3.8: Cell type-specific variation of chromatin structure

    As described above, cell type-specific gene expression is thought to be largely regulated by enhancers (though the degree of their contribution can vary between cells). Moreover, the selection of target genes by an enhancer seems to be highly constrained by its surrounding higher-order chromatin structure, such as the insulated neighborhood. However, whether each cell type has its own specific 3D chromatin structure, such as the TAD organization, has not been clarified yet. In other words, it is not clear whether the variation of TAD organizations between cell types is significant and stable, compared to the variations between cells belonging to the same cell type. Nor has it been clarified whether differential chromatin structures can explain much of the cell type-specific gene expression. As we described above, it has been reported that the TAD structure (as well as the binding profile of CTCFs) is relatively stable between cell types, while sub-TADs (or their related structures, such as the contact domains) can be more flexible. Note that this issue depends on how we define cell types and how we distinguish different cell states belonging to the same cell type. There is a hypothesis that a cell type can be defined based on an evolutionarily conserved activated gene regulatory network (termed the core regulatory complex), while different cell states of the same cell type are derived from the activation of additional regulatory module(s).¹⁰⁰ Perhaps, the transition of a set of enhancers from their poised state to the active state can cause such changes of cell states.

    As for studies on the relationship between chromatin structure and gene expression, in an early study using 20 human cell types in 2014, about 25% of cell type-specific gene expression was explained by changes of chromatin structure, probed by changes in DNaseI hypersensitivity.¹⁰¹ In 2018, it was shown using 4C-seq (i.e., sequencing with circular chromatin conformation capture) and ChIP-seq data that a cell type-specific rewiring of the enhancer-promoter interaction explains the cell type-specific expression of the GILZ gene.¹⁰² In a more global analysis, we compared published Hi-C data from various sources and found that about 5% of genes that were in the repressive B compartment in normal pro-B cells were switched to the A compartment in B lymphoma cells.¹⁰³ Fig. 2.2 shows an example of how the boundaries of A/B compartments are similar/dissimilar between four types of mouse cells (Luis A.E. Nagai and Kenta Nakai, unpublished; original data were taken from four references¹⁰⁴–¹⁰⁷). Another study reported a more comprehensive comparison between 137 Hi-C samples from nine studies.¹⁰⁸ It showed that TAD structures are relatively stable, though there are differences even between replicate experiments as well as biases caused by different experimental procedures. The authors also reported that the difference between individual genomes is not so significant. In another study based on the comparison of multiple single-cell Hi-C data, cell type-specific chromatin structure was detectable in spite of the rather severely noisy and sparse nature of the data.¹⁰⁹ Recently, a rather comprehensive atlas of cohesin-mediated chromatin loops, detected by ChIA-PET across 24 human cell types, was published as an effort of the ENCODE project.²¹,¹¹⁰ As described in Section 2.3, 28% of the loops were variable between cell types and their correlation with cell type-specific gene expression was modest. But the similarity between the loop organizations was correlated with the similarity in their cell lineage, such as blood, embryonic cells, and the cells derived from solid tissues. Therefore, it is likely that at least a subset of genes is regulated by relatively subtle changes of their surrounding chromatin loop structure. Indeed, according to a study using iPS cells and their differentiated cells (iPSC-derived cardiomyocytes) from seven individuals, there was quantitative proportionality between the (small) changes in contact propensity caused by the subtle changes of chromatin loops and the molecular phenotypes, such as the gene expression and the H3K27ac level.¹¹¹ They also reported that the allelic differences of the propensity were frequently observed in imprinted
