Genomics of Rare Diseases: Understanding Disease Genetics Using Genomic Approaches
Ebook, 724 pages, 9 hours

About this ebook

Genomics of Rare Diseases: Understanding Disease Genetics Using Genomic Approaches, a new volume in the Translational and Applied Genomics series, offers readers a broad understanding of current knowledge on rare diseases through a genomics lens. This clear understanding of the latest molecular and genomic technologies used to elucidate the molecular causes of more than 5,000 genetic disorders brings readers closer to unraveling many more that remain undefined and undiscovered. The challenges associated with performing rare disease research are also discussed, as well as the opportunities that the study of these disorders provides for improving our understanding of disease architecture and pathophysiology.

Leading chapter authors in the field discuss approaches such as karyotyping and genomic sequencing for the better diagnosis and treatment of conditions including recessive diseases, dominant and X-linked disorders, de novo mutations, sporadic disorders and mosaicism.

  • Compiles applied case studies and methodologies, enabling researchers, clinicians and healthcare providers to effectively classify DNA variants associated with disease and patient phenotypes
  • Discusses the main challenges in studying the genetics of rare diseases through genomic approaches and possible or ongoing solutions
  • Explores opportunities for novel therapeutics
  • Features chapter contributions from leading researchers and clinicians
Language: English
Release date: Jun 12, 2021
ISBN: 9780128204368

    Book preview

    Genomics of Rare Diseases - Claudia Gonzaga-Jauregui

    Chapter 1

    Introduction to concepts of genetics and genomics

    Karlla Welch Brigatti,    Clinic for Special Children, Strasburg, PA, United States

    Abstract

    Several thousand rare diseases have now been characterized. Advances in genomic sequencing platforms have accelerated the pace at which new diseases and their molecular underpinnings are discovered, with continuous additions to this burgeoning catalog of rare diseases. Connecting all of them are shared principles inherent to human genetics and genomics. This chapter serves as a primer on those basic concepts, with applications throughout this book and in rare disease research and treatment. An exhaustive survey of this topic would take a book in and of itself; this chapter merely provides an overview of these concepts and serves as an accessible resource geared toward the nongeneticist, beginning with cell biology and extending to classes of disorders and modes of inheritance. Woven throughout are aspects and concepts of genetics and genomics with broad application for most clinicians and scientists interested in the genomics of human diseases.

    Keywords

    Genetics; genomics; rare diseases; Mendelian genetics; gene; chromosome; human genome

    1.1 Introduction

    Genetics is broadly considered the study of biological inheritance and traits, whereas the totality of the genetic information of an organism is known as the genome. The large-scale study of the information contained in the genome is referred to as genomics. While genomic studies began with the sequencing of the whole genome of the bacterium Haemophilus influenzae in 1995, the completion of the first draft of the human genome reference sequence by the Human Genome Project (HGP) in 2003 (see Chapter 4: Genomic Sequencing of Rare Diseases) and the continued advances in molecular biology, biochemistry, biophysics, computational sciences, and biotechnology ushered in a new discipline of study, human genomics. The advancements of human genomics in recent decades have allowed scientists to understand the human genome, and its variation, to a precise and unprecedented degree, further enabling the application of this knowledge in clinical genomics. Bioinformatics is the interdisciplinary field of biology and computer science that analyzes complex genomic information to interpret biological variant data and predict gene function or dysfunction.

    1.2 The human genome: structure and function

    With the exception of mature red blood cells and cornified hair cells, all cells in the human body contain a nucleus that houses the majority of the human genome (nuclear genome). A much smaller genome can also be found elsewhere in the cell within the mitochondria (mitochondrial genome), the organelles responsible for producing the energy needed for cell function. Disruptions, or pathogenic variation, to either genome can lead to human disease.

    The genetic information of the human genome is maintained as deoxyribonucleic acid (DNA), a double-stranded macromolecule bound together in stable form as a double helix. Each DNA strand is made of a sugar and phosphate backbone coupled to a sequence of bases, each of which is one of four types: adenine (A), cytosine (C), guanine (G), and thymine (T). These bases pair with one another across the two strands by hydrogen bonds in a prescribed Watson–Crick complementary base-pairing fashion: guanine on one strand pairs with cytosine on the other, as do adenine and thymine. This strict pairing of bases makes the sequence of one strand the reverse complement of the sequence on the opposite strand. Approximately six billion base pairs make up the diploid nuclear genome, which consists of two sets of chromosomes and genes and is consequently double the three billion base pairs that comprise the haploid reference human genome sequence (see Chapter 4: Genomic Sequencing of Rare Diseases).
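
    The reverse-complement relationship described above can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function name and the example sequence are hypothetical, and only the four unambiguous bases are handled.

```python
# Watson-Crick pairing: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read 5'->3', for a strand given 5'->3'."""
    # Reverse the strand first, then swap each base for its pairing partner.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


example = "ATGCCG"  # hypothetical sequence, for illustration only
print(reverse_complement(example))  # CGGCAT
```

    Applying the function twice returns the original sequence, which reflects the fact that each strand fully determines the other.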

    Long stretches of DNA are organized, supported, and packaged into 46 rod-shaped structures called chromosomes within the nucleus of the cell, arranged in 23 homologous pairs of matching DNA sequence. Twenty-two of these pairs are numbered from 1 to 22 roughly in order of decreasing size and DNA content, with chromosome 1 being the largest and chromosomes 21 and 22 the smallest. These 22 pairs of chromosomes are called autosomes and are the same in males and females. The 23rd pair makes up the sex chromosomes: two X chromosomes are found in biologically female individuals, while an X and a Y chromosome are found in biologically male individuals. The study of chromosomes, their structure, and their inheritance is known as cytogenetics. The chromosomal complement in a given cell is the karyotype, a term that also refers to the photographic arrangement of the magnified chromosome pairs after specific preparation and under certain staining conditions. Karyotyping and its role in rare disease discovery and diagnosis are explored further in Chapter 2, Karyotyping as the First Genomic Approach.

    The genome is characterized by long stretches of noncoding DNA sequences interspersed with smaller sequence content (coding DNA) that make up genes. Genes contain the instructions to direct the production of proteins or ribonucleic acid (RNA) products necessary for cells to perform their given function (Fig. 1.1). The human genome contains about 25,000 genes. Genes vary in length, but they share similar characteristics that relate to their function and help differentiate them from the surrounding noncoding sequence to computationally annotate and map them on the genome.

    Figure 1.1 Gene expression through transcription and translation, simplified in a hypothetical gene containing four exons and three introns. During transcription in the nucleus, the DNA sequence of a gene is used as a template to produce a pre-mRNA transcript that includes introns and exons. The four bases of DNA are shown in exon 2. In mature mRNA, the introns are spliced out, such that the coding sequence is continuous. The mRNA moves to the cytoplasm for translation, where ribosomes attach to the mRNA template and protein synthesis occurs. Specific amino acid tRNA molecules bind to the mRNA as determined by the sequence of mRNA codons, groups of three mRNA bases that each specify one of 20 amino acids or one of three stop signals. Peptide bonds join each incoming amino acid to the growing chain until a stop codon is reached and the polypeptide is released. The polypeptide chain undergoes folding and posttranslational modifications to become a functional protein. DNA, Deoxyribonucleic acid; mRNA, messenger RNA; tRNA, transfer RNA.

    Structurally, genes are composed of different regions and recognizable sequence features. Exons are the regions of a gene whose DNA sequence determines the amino acid sequence of its protein product. Conversely, introns are regions of noncoding sequence that separate exons from one another and are eliminated from the mature messenger RNA (mRNA) after transcription. In addition, a sequence of noncoding DNA known as the promoter can be found adjacent to the beginning of a gene (classically defined as the 5′ end); it acts as the region where certain regulatory proteins bind sequence elements or motifs to enhance gene expression or silence it altogether. Alterations to the canonical DNA sequence, in the form of mutations or variants, in any of these structural elements of genes can disrupt normal function and expression of the gene, leading to human disease. The majority of variants currently associated with genetic conditions are found in the exons, which make up only about 1% of the haploid human genome and are maintained or constrained by selection and evolution; the aggregate sum of all exons is known as the exome.
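
    As a rough illustration of how introns are removed to leave a continuous coding sequence, the sketch below joins exon segments from a toy pre-mRNA. The sequence, the exon coordinates, and the function name are all hypothetical; real splicing depends on sequence signals and cellular machinery that are not modeled here.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based start, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exons in upper case, introns in lower case for readability.
pre_mrna = "AUGGCC" + "guaaguuuag" + "UUUGAA" + "guacgcacag" + "UAA"
exons = [(0, 6), (16, 22), (32, 35)]  # assumed coordinates of the three toy exons

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGGCCUUUGAAUAA
```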

    Following the HGP, the development of massively parallel sequencing technologies, also known as next-generation sequencing (NGS), enabled the rapid sequencing of millions of short DNA fragments in parallel, significantly reducing the cost of sequencing individual human genomes. The main applications of NGS in rare disease have focused on sequencing the protein-coding fraction of the genome through whole-exome sequencing (WES), together with whole-genome sequencing (WGS), which involves sequencing the totality of the DNA in the human genome, including the nonprotein-coding regions; these technologies are explored in depth in Chapter 4, Genomic Sequencing of Rare Diseases. Both techniques are commonly used in research and clinical genetics settings to identify rare variants and investigate their potential contribution to the clinical presentation of the disorder under investigation, thereby rendering a molecular diagnosis.

    The flow of genetic information from DNA to RNA to protein product is known as the central dogma of molecular biology. It can be predicted thanks to the elucidation of the genetic code, which establishes the rules of translation, via a three-base or triplet code, from DNA sequence to the amino acid composition of proteins. RNA is the intermediary through which the genetic information stored in DNA is conveyed to the cell machinery that produces bioactive molecules in the form of proteins or noncoding RNA. RNA is similar to DNA, except that the sugar in its backbone is ribose, the thymine base is replaced by uracil (U), and RNA is single-stranded rather than double-stranded like the DNA double helix. When the product of a particular gene is needed, the portion of DNA containing the gene unwinds and, through a process known as transcription, a single strand of complementary RNA is generated; the intronic sequences are then spliced out to produce a mature mRNA. Transcription takes place in the nucleus, where the DNA resides. The mRNA then moves from the nucleus to the cytoplasm, where organelles known as ribosomes use the mRNA template for protein synthesis through a process known as translation. During translation, the ribosome moves along the mRNA strand and pairs the mRNA template with a second type of RNA known as transfer RNA (tRNA), which delivers specific amino acids as determined by groups of three consecutive mRNA bases (known as codons), each of which encodes one of 20 possible amino acids (Table 1.1). The genetic code is said to be degenerate in that most amino acids are encoded by more than one codon. The standard start codon for translation of a gene is AUG, which encodes the amino acid methionine (Met or M) and establishes the reading frame for the ribosome to follow, adding corresponding amino acids to a polypeptide chain. The translation complex halts protein production once it reaches a stop codon (one of the three codons UAA, UAG, or UGA), and the completed polypeptide is released from the ribosome for posttranslational modification. This process is illustrated in Fig. 1.1.
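
    The codon-by-codon logic of translation can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the standard genetic code with one-letter amino acid symbols, starts at the first AUG found anywhere in the sequence, and ignores all other aspects of translation initiation. The function name and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order for each
# position (first base varies slowest). "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no reading frame is established
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the polypeptide is released
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUGAAUAAGC"))  # MAFE

# Degeneracy: leucine (L) is encoded by six different codons.
print(sum(1 for aa in CODON_TABLE.values() if aa == "L"))  # 6
```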

    Table 1.1

    The genetic code determines the translation of the DNA sequence encoded in genes into the corresponding sequence of amino acids to produce proteins.

    As previously mentioned, in addition to the genome housed in the nucleus of cells (nuclear genome), human cells also contain another smaller genome that resides within the energy-producing organelles of the cells, the mitochondria. The mitochondrial genome (mtDNA) is made up of a little over 16,000 DNA bases arranged in a circle that contains two related promoter sequences, one for each strand, which are transcribed in their entirety. All cells contain multiple mitochondria, each of which has several copies of the mitochondrial genome. The 37 genes encoded by the mtDNA are specific to the structure and function of the mitochondria themselves, which are integral to the production of cellular energy. Unlike the nuclear genome, the mitochondrial genome is inherited only through the maternal line, as the sperm cell contributes no mitochondria during conception. A change in the mtDNA that alters the production of proteins necessary to meet the energy requirements of the cell can cause mitochondrial disease, often affecting organs with high energy requirements, such as the brain, heart, eyes, and skeletal muscles. Nuclear genes also contribute to mitochondrial function, so mitochondrial disorders can result from alterations in nuclear or mitochondrial genes. Mitochondrial disorders are examined in depth in Chapter 7, X-linked and Mitochondrial Disorders.

    Replication of nuclear DNA occurs during cell division of somatic cells in a process known as mitosis, in which two genetically identical daughter cells are produced from the original parent cell, and which maintains the diploid (46) chromosomal content. Meiosis is the biological process of germ cell production. It is specific to the cells of the reproductive system and results in four haploid gametes (23 chromosomes each) that are genetically distinct from one another and from the parent cell, due to the process of meiotic recombination. During fertilization, the egg and sperm join together and the full chromosomal complement is restored. The biological significance of these two processes is that they ensure the constancy of genetic information from one generation to the next and promote genetic diversity. During the replication of DNA, either in mitosis or meiosis, changes in the DNA can occur, known as mutations. Several cellular proofreading and repair processes exist to ensure the integrity of the nuclear genome and fidelity of the code, though changes can sometimes escape detection. Certain exposures, such as ionizing radiation, can also increase the rate of mutation. Additionally, other biological factors in humans may contribute to an increased rate of mutation in offspring, such as advanced maternal age for chromosome aneuploidies like trisomy 21, commonly known as Down syndrome (see Chapter 2: Karyotyping as the First Genomic Approach), and advanced paternal age for single gene defects (see Chapter 6: Dominant and Sporadic De Novo Disorders).

    1.3 Genetic variation

    DNA sequence variation is a constant feature of both germ and somatic cells and can occur on a scale varying from single DNA nucleotide changes to deletions or duplications of entire chromosomes. The genetic information and variation encoded in the DNA, combined with environmental influences, determine individual characteristics and susceptibility to disease, and together make up the clinical characteristics or traits known as the phenotype. The effect of DNA variation on gene expression and ultimately phenotype often depends on where the change occurs. For example, changes that occur in genes can ultimately alter proteins, whereas DNA alterations in the noncoding regions of the genome tend to have no obvious or strong effect on cellular function. Such silent or subtle changes are generally considered to be benign polymorphisms. Some of these benign polymorphisms can also occur in coding sequences of genes, but if they do not confer a deleterious effect on the cell, they can be passed on through generations, contributing to common variation in the human population and to traits such as hair, skin, or eye color. Sequence variants that change one nucleotide in the DNA sequence and that differ between individuals, or even between humans and other species, are known as single-nucleotide polymorphisms (SNPs) and occur quite frequently across the human genome (see Chapter 4: Genomic Sequencing of Rare Diseases). SNPs have been extensively studied in association with disease and drug response. When certain DNA changes occur in introns, exons, or promoters, or span entire genes or chromosomes, they can abolish or alter the normal function of the encoded proteins and consequently exert a profound phenotypic effect. These changes are often referred to as deleterious variants or mutations, depending on whether they have been observed in other individuals in the population or occurred as a new event in a given person, respectively.

    Single base-pair mutations (also referred to as single or simple nucleotide variants, SNVs) within the exon can alter the coding sequence, as illustrated in Fig. 1.2. Synonymous variants, also called silent mutations, occur when the single base pair substitution maintains the same amino acid, due to the degenerate nature of the genetic code, and consequently do not alter the final protein product. Nonsynonymous or missense mutations cause a codon change from one amino acid to another. A missense mutation may not exert a strong phenotypic effect if the new amino acid shares similar physicochemical properties with the original conserved amino acid at that position or occurs at a nonessential site along the protein. Other missense mutations alter the protein configuration or enzymatic function; they may introduce novel properties that exert a deleterious effect, or change an important property and render the protein inefficient or nonfunctional. Nonsense mutations result from substitutions that introduce a premature stop codon in the mRNA sequence. If the nonsense mutation occurs early in the transcribed mRNA, the cell can identify the abnormal location of the stop codon and dispose of the defective transcript through a mechanism known as nonsense-mediated decay (NMD), which effectively leads to the destruction of the mRNA and the absence of a protein product, resulting in a loss-of-function (LoF). Conversely, if the nonsense mutation occurs toward the end of the transcript, the mRNA can escape NMD and go on to be translated into a truncated and nonfunctional protein product. In some instances, this truncated form of the protein, although unable to perform normal biological functions, can act as a toxic protein product that interferes with other proteins it may interact with, causing disease through a dominant negative effect (see Chapter 6: Dominant and Sporadic De Novo Disorders). Frameshift mutations are caused by the insertion or deletion (indels) of a number of nucleotides, from one to a few, that is not divisible by three. This disrupts the reading frame such that all ensuing sequence is read incorrectly and improper amino acids are incorporated during translation from the location where the indel occurred. A frameshift mutation may also introduce a premature stop codon, resulting in a nonfunctional and truncated protein product or leading to degradation of the mRNA through NMD. When nucleotides are inserted or deleted in the DNA sequence in multiples of three, the reading frame is conserved, although amino acids may be added or missing from the final protein product; these mutations are known as nonframeshifting or in-frame mutations. The addition or deletion of in-frame amino acids can sometimes occur in regions of the protein that are important for proper function or affect amino acids essential for enzymatic or catalytic functions. Lastly, splice site mutations occur at the junctions between exons and introns and may cause exons to be removed or intronic sequence to remain in the mature mRNA, altering the amino acid sequence and exerting a functional effect on the gene product.
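
    The categories above can be made concrete with a small classifier. The sketch below is illustrative only: the function names and the toy coding sequence are hypothetical, positions are 0-based within a coding sequence that begins at the start codon, and effects such as NMD, splice-site changes, or loss of a stop codon are not modeled.

```python
from itertools import product

# Standard genetic code on the DNA alphabet (T in place of U); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(cds: str, position: int, alt: str) -> str:
    """Classify a single-base substitution at a 0-based position in a coding sequence."""
    offset = position % 3                  # position of the base within its codon
    codon_start = position - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def classify_indel(net_length_change: int) -> str:
    """Positive values are insertions and negative values are deletions, in bases."""
    return "in-frame" if net_length_change % 3 == 0 else "frameshift"


cds = "ATGGCCTTTGAATAA"  # hypothetical sequence: Met-Ala-Phe-Glu-stop
print(classify_substitution(cds, 5, "T"))  # GCC -> GCT, Ala -> Ala: synonymous
print(classify_substitution(cds, 3, "A"))  # GCC -> ACC, Ala -> Thr: missense
print(classify_substitution(cds, 9, "T"))  # GAA -> TAA, Glu -> stop: nonsense
print(classify_indel(-1), classify_indel(3))  # frameshift in-frame
```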

    Figure 1.2 Types of mutations that can occur to the reference sequence of nucleotide base pairs and their effect on the resulting protein.

    Copy number variants (CNVs) are a class of structural variation (SV), meaning variation that modifies the architecture of the genome. CNVs involve alterations in the number of copies of specific regions of DNA, which can be either deleted or duplicated (see Chapter 3: Genomic Disorders in the Genomics Era). They involve stretches of DNA ranging from thousands of base pairs to chromosomal segments or entire chromosomes (chromosomal aneuploidy; see Chapter 2: Karyotyping as the First Genomic Approach). Some large CNVs have no impact on gene function, while some small ones can exert a strong effect by removing sections of a coding gene or altering the expression or dosage of a given gene. While large CNVs can be evident on a karyotype, changes smaller than 3 to 5 Mb are below the resolution of chromosome studies; therefore the most common and precise technique in use for identifying submicroscopic CNVs is chromosomal microarray analysis (CMA). As discussed in depth in Chapter 3, Genomic Disorders in the Genomics Era, CMA will not identify balanced rearrangements of genetic material, such as balanced translocations, where different chromosomal segments can be joined together. Intrachromosomal submicroscopic inversions, although copy number neutral, can alter the normal expression of genes or disrupt those that occur at the breakpoint of the genomic rearrangement. Even if a balanced translocation maintains the full genetic complement, it can lead to abnormalities in copy number during meiosis and introduce CNVs in the gametes. Implementation of genomic sequencing technologies is allowing better detection and characterization of CNVs and SV in human genomes.

    1.4 Nomenclature in human genetics and genomics

    The Human Genome Variation Society (www.hgvs.org) maintains the standards for consistent nomenclature for the description of sequence variations and gene names. Human genes are named using symbols designated by the HUGO Gene Nomenclature Committee (HGNC) and are generally capitalized and italicized in print (e.g., SMN1 is the name for the survival of motor neuron 1 gene), while the protein product of the gene uses a nonitalicized symbol (e.g., SMN1 is the survival of motor neuron 1 protein). Various symbols and abbreviations are used to refer to designated variants or changes to the reference sequence and their impact on different molecules. References to particular molecules generally use the RefSeq database maintained by the National Center for Biotechnology Information. A table of common abbreviations and nomenclature conventions is found in Table 1.2 below.
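
    To give a sense of how such variant descriptions are structured, the sketch below parses one simple HGVS-style form, a single-base substitution such as c.76A>T, where the prefix indicates the type of reference sequence and ">" denotes a substitution. It is a minimal illustration with a hypothetical function name and an assumed example; it deliberately ignores reference sequence accessions, intronic offsets, protein-level descriptions, and every other variant type.

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    reference_type: str
    position: int
    ref: str
    alt: str


# Prefixes for DNA-level reference sequences handled by this sketch.
REFERENCE_TYPES = {
    "c": "coding DNA",
    "g": "genomic DNA",
    "m": "mitochondrial DNA",
    "n": "noncoding DNA",
}

# Matches only simple substitutions of the form <prefix>.<position><ref>><alt>.
PATTERN = re.compile(r"^(?P<type>[cgmn])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")


def parse_substitution(description: str) -> Optional[Substitution]:
    """Parse a simple substitution description; return None for anything else."""
    match = PATTERN.match(description)
    if match is None:
        return None
    return Substitution(
        REFERENCE_TYPES[match["type"]],
        int(match["pos"]),
        match["ref"],
        match["alt"],
    )


print(parse_substitution("c.76A>T"))
```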

    Table 1.2

    A commonly used resource in human genetics is the Online Mendelian Inheritance in Man (OMIM) database (www.omim.org). OMIM is a continuously updated, comprehensive compendium of human genes and phenotypes with a presumed genetic basis. It focuses on the relationships between phenotype and genotype and documents established gene-disease associations of so-called Mendelian disorders, based on literature review and curation. Throughout this book, we refer to many different genetic disorders by name and also by acronym, and provide their designated six-digit identification number or MIM number. The reader can then look up disorders of interest in OMIM using these unique identifiers to learn more about their clinical features and associated information. The # symbol prior to the MIM numbers of genetic disorders referenced throughout indicates that the molecular basis or gene affected in that disorder has been identified and documented in the scientific
