Principles and Applications of Molecular Diagnostics
Ebook, 1,644 pages, 21 hours

About this ebook

Principles and Applications of Molecular Diagnostics serves as a comprehensive guide for clinical laboratory professionals applying molecular technology to clinical diagnosis. The first half of the book covers principles and analytical concepts in molecular diagnostics such as genomes and variants, nucleic acids isolation and amplification methods, and measurement techniques, circulating tumor cells, and plasma DNA; the second half presents clinical applications of molecular diagnostics in genetic disease, infectious disease, hematopoietic malignancies, solid tumors, prenatal diagnosis, pharmacogenetics, and identity testing. A thorough yet succinct guide to using molecular testing technology, Principles and Applications of Molecular Diagnostics is an essential resource for laboratory professionals, biologists, chemists, pharmaceutical and biotech researchers, and manufacturers of molecular diagnostics kits and instruments.

  • Explains the principles and tools of molecular biology
  • Describes standard and state-of-the-art molecular techniques for obtaining qualitative and quantitative results
  • Provides a detailed description of current molecular applications used to solve diagnostics tasks

Language: English
Release date: Jun 13, 2018
ISBN: 9780128160626

    Principles and Applications of Molecular Diagnostics

    Editors

    Nader Rifai, PhD

    Professor of Pathology, Harvard Medical School, Louis Joseph Gay-Lussac Chair in Laboratory Medicine, Director of Clinical Chemistry, Boston Children’s Hospital, Boston, MA, United States

    Andrea Rita Horvath, MD, PhD

    Professor, Department of Clinical Chemistry & Endocrinology, New South Wales Health Pathology, School of Medical Sciences, University of New South Wales, Sydney, Australia

    Carl T. Wittwer, MD, PhD

    Professor of Pathology, University of Utah School of Medicine, Medical Director, Immunologic Flow Cytometry, ARUP Laboratories, Salt Lake City, UT, United States

    Table of Contents

    Cover image

    Title page

    Copyright

    Contributors

    Preface

    1. Principles of Molecular Biology

    Historical Developments in Genetics and Molecular Biology

    Molecular Biology Essentials

    Nucleic Acid Structure and Function

    Central Dogma of Molecular Biology

    Epigenetics

    Understanding Our Genome

    2. Genomes and Variants

    Introduction

    Human Genome

    Variations in Specific Populations

    Nonhuman Genomes

    DNA That Codes for RNA but Not Protein

    Variation in the Human Genome

    Nomenclature

    Databases

    Informatics

    3. Nucleic Acid Isolation

    History of Nucleic Acid Preparation Tools

    Steps Involved in NA Preparation

    Impact of Sample Matrix on Nucleic Acid Preparation

    Processing Throughput

    Specific Applications

    Nucleic Acid Quality and Quantity

    Methods to Measure Nucleic Acid Quality and Quantity

    Summary of Methods

    Conclusion

    4. Nucleic Acid Techniques

    Introduction

    Nucleic Acid Preparation

    Amplification Techniques

    Detection Techniques

    Discrimination Techniques

    5. Molecular Microbiology

    Introduction

    Viral Syndromes

    Sexually Transmitted Infections

    Respiratory Tract Infections

    Bloodstream Infections

    Central Nervous System

    Gastroenteritis

    Antibacterial Drug Resistance

    Human Microbiome and Metagenomics

    Future Directions

    6. Genetics

    Diseases With Mendelian Inheritance

    Mitochondrial DNA Diseases

    Imprinting

    Expanded Carrier Screening

    Massively Parallel Sequencing

    Whole-Exome Sequencing

    Cytogenomics

    Reporting of Test Results

    Laboratory Regulation

    7. Solid Tumor Genomics

    Considerations for Solid Tumor Genomics

    Genomic Analysis of Solid Tumors

    Massive Parallel Sequencing of Solid Tumor Samples

    Interpreting Somatic Alterations in a Clinical Context

    Summary and Concluding Remarks

    8. Genetic Aspects of Hematopoietic Malignancies

    Recurrent Translocations and Structural Chromosomal Abnormalities

    Gene Mutations in Hematopoietic Malignancies

    Role of Genetic Regulatory Mechanisms in Hematopoietic Malignancies

    Clonality Testing in Hematopoietic Malignancies

    Conclusion

    9. Circulating Tumor Cells and Circulating Tumor DNA

    Circulating Tumor Cells

    Circulating Tumor DNA

    Combined Discussion of CTC and ctDNA

    Conclusions

    10. Circulating Nucleic Acids for Prenatal Diagnostics

    Brief Overview of the Early Developments of Prenatal Genetic Diagnostics

    Strategies to Mitigate Risks of Invasive Prenatal Diagnosis

    Noninvasive Fetal DNA Analysis

    The Biology of Circulating Cell-Free Fetal Nucleic Acids in Maternal Plasma

    Diagnostic Applications of Circulating cffDNA

    Noninvasive Fetal ‘Omics

    Analytical Aspects

    Conclusion

    11. Pharmacogenetics

    Principles of Pharmacogenetics

    Specific Examples of Pharmacogene Associations

    Future Directions

    12. Identity Testing

    Short Tandem Repeats and Amelogenin

    The Analytical Process

    Y–Short Tandem Repeats

    Mitochondrial DNA

    Touch Samples and DNA Mixtures

    Single Nucleotide Polymorphisms

    Quality Assurance

    Statistical Interpretation

    Rapid DNA

    Massively Parallel Sequencing

    Legal Issues

    Parentage and Kinship

    Other Clinical Applications of DNA Identity Markers

    13. Amino Acids, Peptides, and Proteins

    Introduction

    Amino Acids

    Peptides

    Proteins

    14. Proteomics

    Historical Perspective

    Biomarker Pipeline

    Bottom-Up Targeted Proteomics

    Top-Down Proteomics

    Preanalytical and Other Technical Considerations

    Conclusions

    Index

    Copyright

    Elsevier

    Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    Copyright © 2018 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    ISBN: 978-0-12-816061-9

    For information on all Elsevier publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Susan Dennis

    Acquisitions Editor: Kathryn Morrissey

    Editorial Project Manager: Carly Demetre

    Production Project Manager: Paul Prasad Chandramohan

    Cover Designer: Miles Hitchen

    Typeset by TNQ Technologies

    Contributors

    D. Hunter Best, PhD,     Associate Professor of Pathology, University of Utah School of Medicine, Medical Director, Molecular Genetics and Genomics, ARUP Laboratories, Salt Lake City, Utah

    Cory Bystrom, BS, MS, PhD,     Vice President, Research and Development, Cleveland HeartLab, Cleveland, Ohio

    Rossa W.K. Chiu, MBBS, PhD, FHKAM, FRCPA,     Choh-Ming Li Professor of Chemical Pathology, Department of Chemical Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China

    Dennis J. Dietzen, PhD,     Professor of Pediatrics, Washington University School of Medicine, Medical Director, Core Laboratory and Metabolic Genetics Laboratory, St. Louis Children’s Hospital, St. Louis, Missouri

    Katherine B. Gettings, PhD,     Research Biologist, Applied Genetics Group, National Institute of Standards and Technology, Gaithersburg, Maryland

    Andrew N. Hoofnagle, MD, PhD,     Professor, Head, Division of Clinical Chemistry, Department of Laboratory Medicine, University of Washington, Seattle, Washington

    Dave Hoon, MSc, PhD,     Professor of Translational Molecular Medicine, Division of Molecular Oncology, John Wayne Cancer Institute, Providence Health Systems, Santa Monica, California

    John Greg Howe, PhD,     Associate Professor of Laboratory Medicine, Yale University School of Medicine, New Haven, Connecticut

    Todd W. Kelley, MD, MS,     Associate Professor of Pathology, University of Utah, Medical Director, Molecular Hematopathology, ARUP Laboratories, Salt Lake City, Utah

    Evi Lianidou, PhD,     Professor of Analytical Chemistry–Clinical Chemistry, National and Kapodistrian University of Athens, Athens, Greece

    Y.M. Dennis Lo, DM, DPhil,     Li Ka Shing Professor of Medicine, Department of Chemical Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China

    G. Mike Makrigiorgos, PhD,     Professor and Director of Medical Physics and Biophysics, Radiation Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts

    Elaine R. Mardis, PhD,     Professor of Pediatrics, Ohio State University College of Medicine, Co-Director, Institute for Genomic Medicine, Research Institute, Nationwide Children's Hospital, Columbus, OH, United States

    Gwendolyn A. McMillin, PhD,     Professor of Pathology, University of Utah, Medical Director, Toxicology and Pharmacogenomics, ARUP Laboratories, Salt Lake City, Utah

    Frederick S. Nolte, PhD, D(ABMM), F(AAM),     Professor and Vice-Chair for Laboratory Medicine, Department of Pathology and Laboratory Medicine, Director, Clinical Laboratories, Medical University of South Carolina, Charleston, South Carolina

    Jason Y. Park, MD, PhD,     Associate Professor, Joint Appointment, Pathology and the Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Director, Advanced Diagnostics Laboratory, Department of Pathology, Children’s Medical Center Dallas, Dallas, Texas

    Jay L. Patel, MD,     Assistant Professor of Pathology, University of Utah School of Medicine, Salt Lake City, Utah

    Daniele S. Podini, PhD,     Associate Professor of Forensic Sciences, George Washington University, Washington, D.C.

    Victoria M. Pratt, PhD,     Associate Professor of Medical and Molecular Genetics, Director, Pharmacogenomics Laboratory, Indiana University School of Medicine, Indianapolis, Indiana

    Stephanie A. Thatcher, MS,     Director of Systems Integration, BioFire Diagnostics, Salt Lake City, Utah

    Cindy L. Vnencak-Jones, PhD,     Professor of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Medical Director, Molecular Diagnostics Laboratory, Vanderbilt University Medical Center, Nashville, Tennessee

    Mia Wadelius, MD, PhD,     Associate Professor of Medical Sciences, Clinical Pharmacology, Uppsala University, Uppsala, Sweden

    Victor W. Weedn, MD, JD,     Professor and Chair of Forensic Sciences, George Washington University, Washington, D.C.

    Carl T. Wittwer, MD, PhD,     Professor of Pathology, University of Utah School of Medicine, Medical Director, Immunologic Flow Cytometry, ARUP Laboratories, Salt Lake City, Utah

    Preface

    Biology is complex. Over 3 billion (3,000,000,000) nucleic acid base pairs are present in two copies within each human cell. Their sequence and redundancy determine the expression and regulation of proteins that determine human traits.

    Molecular diagnostics usually refers only to nucleic acids, but that is unfair. Indeed, most laboratory analytes are molecules, except some electrolytes and heavy metal ions. In particular, proteins execute the orders of nucleic acids and deserve at least as much recognition. Proteogenomics is an emerging field that combines proteomic, genomic, and transcriptomic data to better define the molecular signatures of disease.

    Both nucleic acids and proteins are macromolecules made up of 4 (nucleic acid) or 20 (protein) components whose sequence determines function. Both are typically modified beyond their basic structure. The cytosines in DNA are variably methylated, and proteins are often glycosylated and may be reversibly phosphorylated. The combination of nucleic acids and proteins in the right environment is a highly regulated, yet adaptable, self-generating system that we call life.

    We start with an introductory chapter on molecular biology, laying the groundwork for both nucleic acids and proteins. This is followed by a detailed look at the human genome, whose sequencing is perhaps the defining achievement of the early 21st century. Chapter 3 focuses on nucleic acid isolation as a preamble to analysis. This chapter is unique in that very few reviews on sample preparation are available from the chemistry perspective. The chapter on nucleic acid techniques provides a detailed look at methods ranging from basic PCR to massively parallel sequencing. We then present various applications of nucleic acid analysis, including infectious disease, genetics, and cancer. First presented is molecular microbiology: analysis of DNA or RNA from pathogenic organisms. Microbiology has been revolutionized by nucleic acid tests replacing phenotypic and culture methods. Then, genetics is covered in detail, including autosomal recessive, dominant, and X-linked diseases, in addition to mitochondrial disease and inherited cancer predisposition. Chapters on solid tumor genomics, genetic aspects of hematopoietic malignancies, and circulating tumor cells and circulating tumor DNA follow. Specialty topics of circulating nucleic acids for prenatal diagnostics, pharmacogenetics, and identity testing complete our coverage of nucleic acids.

    As mentioned earlier, if nucleic acids are the brain of the organism, proteins are the muscles. No story of macromolecules is complete without proteins. We end our volume with an introductory chapter on amino acids, peptides, and proteins, and a final chapter on proteomics, assessing the spectrum of proteins present by conventional gel methods and powerful mass spectrometry tools.

    We thank the authors and our publisher who have made this compilation possible. The chapters presented here were first published in the 6th edition of the Tietz Textbook of Clinical Chemistry and Molecular Diagnostics in 2017. However, 4 of the 14 chapters included here were only published electronically at that time. We are now pleased to present all 14 chapters in one printed volume and hope you will enjoy consuming them as much as we enjoyed assembling them.

    1

    Principles of Molecular Biology

    John Greg Howe

    Abstract

    Background

    Molecular diagnostics and its parent field, molecular pathology, examine the origins of disease at the molecular level, primarily by studying nucleic acids. Deoxyribonucleic acid (DNA), which contains the blueprint for constructing a living organism, is the centerpiece for research and clinical analysis. Molecular pathology is an outgrowth of the enormous amount of successful research in the field of molecular biology that has discovered over the last seven decades the basic biological and chemical processes of how a living cell functions. The success of molecular biology, as noted by the large number of Nobel prizes awarded for its discoveries, is now used for clinical diagnosis and the development and use of therapeutics.

    Content

    The following chapters are devoted to describing this field and the specific applications currently being used to characterize and help treat patients with a variety of ailments, including hereditary genetic diseases, cancer neoplasms, and infectious diseases. In this chapter the fundamentals of molecular biology are reviewed, followed by a focus on genomes and their variants in Chapter 2. In Chapters 3 and 4 techniques for isolating and analyzing nucleic acids are discussed. The clinically important subdivisions of molecular diagnostics are then reviewed and include microbiology in Chapter 5, genetics in Chapter 6, solid tumors in Chapter 7, and hematopoietic malignancies in Chapter 8. Chapters 9 and 10 are devoted to the molecular diagnostic analysis of circulating tumor cells and circulating nucleic acids. Finally, pharmacogenetics and identity assessment are the focus of Chapters 11 and 12.

    Historical Developments in Genetics and Molecular Biology

    Molecular diagnostics would not be possible without the many significant pioneering efforts in genetics and molecular biology. Early observations in genetics began with Gregor Mendel's discovery of the inheritance of biological traits in 1866 and Thomas Morgan's observation in 1910 that genes were associated with chromosomes. The initial findings that helped establish DNA as the transmissible genetic material were made by Griffith in 1928 and by Avery, MacLeod, and McCarty in 1944.¹,² The definitive studies, published by Hershey and Chase in 1952, showed that radiolabeled phosphorus incorporated into the DNA of a bacteriophage, but not radiolabeled sulfur incorporated into its protein, appeared in newly produced bacteriophage, demonstrating that DNA and not protein was the genetic material.³

    Deciphering the structure of DNA required several crucial findings. These included the observation by Erwin Chargaff that the quantity of adenine is generally equal to the quantity of thymine and the quantity of guanine is similar to the quantity of cytosine,⁴ and the pivotal x-ray crystallography results produced by Rosalind Franklin and Maurice Wilkins.⁵,⁶

    Molecular biology has historically traced its beginnings to the first description of the structure of DNA by James Watson and Francis Crick in 1953.⁷,⁸ The description of the DNA structure initiated the dramatic increase in the knowledge of the biology and chemistry of our genetic machinery. The impact of the Watson and Crick discovery was so significant that it is considered one of the most important scientific discoveries of the 20th century.⁹

    One reason the work of Watson and Crick had such a dramatic impact on scientific discovery was that they not only described the structure of DNA, but hypothesized about many of its properties, which took decades to confirm experimentally.⁷,⁸,¹⁰ One of those properties was the replication of DNA, which was shown to be semiconservative by Meselson and Stahl¹¹ in 1958. At the same time, DNA polymerase, which replicates the DNA, was discovered by Arthur Kornberg.¹² Deciphering the genetic code was vital for understanding the information stored in DNA, and cracking the code in 1965 required many scientists, most prominently Marshall Nirenberg.¹³ Additional studies described the transcription and translation processes and uncovered several startling findings. One finding was the isolation of reverse transcriptase, an enzyme that synthesizes DNA from ribonucleic acid (RNA), which demonstrates that genetic information can be transferred in part in a bidirectional manner.¹⁴,¹⁵ Another finding showed that the eukaryotic gene structure was composed of alternating non–protein-encoding introns and protein-encoding exons.¹⁶,¹⁷ Along with the discovery of the basic biology of genes and their expression, many important techniques were invented. For example, the isolation of restriction enzymes¹⁸ and DNA ligase allowed for the construction of recombinant DNA,¹⁹ which could be transferred from one organism to another, leading to the cloning of DNA²⁰ and the emergence of genetic engineering. The Southern blot method, which identified specific electrophoretically separated pieces of DNA, contributed to many discoveries and was one of the first molecular diagnostics methods to be used to test for genetic diseases.²¹ DNA sequencing technologies were invented,²²,²³ and further advances in these technologies led to the first large biological science research undertaking, the Human Genome Project. Along with DNA sequencing, further technical discoveries, including the polymerase chain reaction in 1986²⁴ and microarray technology in 1995,²⁵ became methodologic foundations for molecular diagnostics.

    Molecular Biology Essentials

    Whether it is a bacterium, virus, or eukaryotic cell, the genetic material located in these organisms dictates their form and function. For the most part the genetic material is DNA, which is composed of two strands, each with a sugar-phosphate backbone, held together in a double helix by hydrogen bonds between the bases (two purines and two pyrimidines) attached to the sugar molecule, deoxyribose (Figs. 1.1 and 1.2). DNA in human cells is wrapped around histone proteins and packaged into nucleosome units, which are compacted further to form chromosomes (Fig. 1.3). There are 23 pairs of chromosomes, one pair of which is the sex chromosomes (X and Y). Each chromosome is a single length of DNA with a stretch of short repeats at the ends called telomeres and additional repeats in the centromere region. The two sets of 23 chromosomes come from the mother's egg and the father's sperm. Each egg and sperm therefore carries a single, or haploid, set of 23 chromosomes, and the combination of the two creates a diploid set of human DNA, allowing each individual to possess two different sequences, genes, and alleles on each chromosome pair, one from each parent. Each child has a unique combination of alleles because of homologous recombination between homologous chromosomes during meiosis in the development of gametes (egg and sperm cells). This creates genetic diversity within the human population. If a child has a random DNA sequence change or mutation, the child's genotype is different from that inherited from either of the parents (de novo variant). If the child's genotype leads to visible disease, the child has acquired a different phenotype from the parents.

    Human cells have a limited lifespan and die through a process called apoptosis. Therefore most cells replace themselves as they progress naturally through their cell cycle. As a cell moves through phases of the cell cycle, its DNA doubles during the synthesis phase when the double-stranded DNA molecule separates. Each strand of DNA is used as a template to make a complementary strand by DNA polymerase in a process called DNA replication. Eventually during the cell cycle, two cells are created from one during the final mitotic phase.

    DNA is composed of genes that code for proteins and RNA. For DNA to convert its store of vital information into functional RNA and protein, the DNA strands need to separate so that RNA polymerase can bind to the start region of the gene. With the help of transcription factors that bind upstream to promoters, the RNA polymerase produces single strands of RNA that are further processed to remove the introns and retain the protein-encoding exons. The mature, processed RNA molecule, the messenger RNA (mRNA), migrates to the cytoplasm, where it is used in the production of protein.

    To start the process of protein synthesis or translation, the mRNA is bound by various protein factors and a ribosome, which contains ribosomal RNA (rRNA) and protein. The mRNA-bound ribosome begins to produce a polypeptide chain by binding a methionine-bound transfer RNA (tRNA) to the mRNA's initiating AUG codon or triplet code. The conversion of the nucleic acid triplet code to a polypeptide is accomplished by the tRNA, which contains a nucleic acid triplet code (anticodon) in its RNA sequence that is specific for an amino acid bound to one end of the tRNA molecule. After synthesis, the protein migrates to its functional location and eventually is removed and degraded.
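    The codon-by-codon reading described above can be sketched in a few lines of Python. This is an illustrative simplification, not a model of the cellular machinery: the function name `translate` is hypothetical, the first AUG found is assumed to be the initiating codon, and only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# for each of the three positions; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string (5' to 3') into one-letter amino acid codes.

    Assumption: translation starts at the first AUG and stops at the
    first in-frame stop codon or when fewer than three bases remain.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiating codon found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Toy sequence: AUG GCC UGG UAA reads as Met-Ala-Trp followed by a stop.
print(translate("GGAUGGCCUGGUAAUU"))
```

    Each tRNA anticodon corresponds to one key lookup in `CODON_TABLE`; the dictionary plays the role of the full set of charged tRNAs.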

    FIGURE 1.1   A, Purine and pyrimidine bases and the formation of complementary base pairs. Dashed lines indicate the formation of hydrogen bonds. (∗In RNA, thymine is replaced by uracil, which differs from thymine only in its lack of the methyl group.) B, A single-stranded DNA chain. Repeating nucleotide units are linked by phosphodiester bonds that join the 5′ carbon of one sugar to the 3′ carbon of the next. Each nucleotide monomer consists of a sugar moiety, a phosphate residue, and a base. (†In RNA, the sugar is ribose, which adds a 2′-hydroxyl to deoxyribose.)

    FIGURE 1.2  The DNA double helix, with sugar-phosphate backbone and pairing of the bases in the core forming planar structures.

    From Jorde LB, Carey JC, Bamshad MJ, editors: Medical genetics. 4th ed. Philadelphia: Mosby; 2010.

    Nucleic Acid Structure and Function

    DNA is a rather simple molecule with a limited number of components compared to those of proteins. DNA is composed of a deoxyribose sugar, phosphate group, and four nitrogen-containing bases. Deoxyribose is a pentose sugar containing five carbon atoms that are numbered from 1′ to 5′, starting with the carbon that will be attached to the base in DNA and progressing around the ring until the last carbon that is not part of the ring structure. The bases consist of the purines, adenine and guanine, and the pyrimidines, cytosine and thymine; an additional base, uracil, replaces thymine in RNA. A basic building block is the nucleotide, which consists of a deoxyribose sugar with an attached base at the 1′ carbon and a phosphate group at the 5′ carbon. The triphosphate nucleotide is the building block for making newly synthesized DNA. Newly synthesized DNA forms a polynucleotide chain that connects the individual nucleotides through the 5′ and 3′ carbons of each deoxyribose sugar via phosphodiester bonds.

    Structure of Deoxyribonucleic Acid

    DNA is double stranded, and the two strands bind to one another through hydrogen bonds between the bases on each strand. Hydrogen bonding is augmented by hydrophobic attraction (stacking) between bases on adjacent rungs of the DNA ladder. Both hydrogen bonds and base stacking are not covalent, but are weak bonds that can be broken and reestablished. This important property is exploited by many of the methods that are used in molecular diagnostics. The composition of DNA is equal quantities of guanine and cytosine and equal quantities of adenine and thymine, because, in general, guanine binds to cytosine and adenine binds to thymine.⁴,⁷ There are two hydrogen bonds between adenine (A) and thymine (T) and three hydrogen bonds between cytosine (C) and guanine (G), and because of this difference in the number of hydrogen bonds, separating a guanine-cytosine (G-C) pair takes more energy than an adenine-thymine (A-T) pair (see Fig. 1.1).
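    The hydrogen-bond counts above translate directly into a simple tally. The following Python sketch counts the Watson-Crick hydrogen bonds in a duplex from the sequence of one strand and reports its G-C fraction. The function names are hypothetical, and the count ignores base stacking, so it is only a rough indicator of how much energy strand separation requires.

```python
def _validate(seq: str) -> str:
    """Uppercase the sequence and reject anything other than A, C, G, T."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return seq


def gc_fraction(seq: str) -> float:
    """Fraction of positions in one strand that are G or C."""
    seq = _validate(seq)
    return sum(base in "GC" for base in seq) / len(seq)


def hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds in the duplex: 3 per G-C pair, 2 per A-T pair."""
    seq = _validate(seq)
    return sum(3 if base in "GC" else 2 for base in seq)


# Two duplexes of equal length: the G-C-rich one has more hydrogen bonds.
print(hydrogen_bonds("GGCC"), hydrogen_bonds("AATT"))
```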

    Each of the two DNA strands is formed by a phosphate sugar backbone that starts at the 5′ phosphate and ends at a 3′ hydroxyl group with the complementary bases binding to one another between the two phosphate sugar backbones. Each strand is therefore a polar opposite of the other (see Fig. 1.2). When the two strands are bound to one another they progress in opposite 5′ to 3′ directions in an antiparallel configuration. By convention, the DNA sequence is denoted in a 5′ to 3′ direction. As discussed later, both the replication of new DNA and the transcription of DNA to RNA progress in the 5′ to 3′ direction. In addition, the conversion of RNA to protein, a process called translation, proceeds from the 5′ end of the RNA to the 3′ end. The combination of the base pairing and the directionality of the two DNA strands allows for the deciphering of the DNA sequence of one strand of DNA when the other complementary strand sequence is known.
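    Deriving one strand from the other combines the two rules just described: each base is swapped for its partner, and the result is reversed so that it still reads 5′ to 3′. A minimal Python sketch follows; the function name `reverse_complement` is illustrative and the input is assumed to contain only A, C, G, and T.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    Assumption: `strand` is given 5' to 3' and contains only A, C, G, T.
    """
    strand = strand.upper()
    if set(strand) - set("ACGT"):
        raise ValueError("strand must contain only A, C, G, T")
    # Complement each base, then reverse because the strands are antiparallel.
    return strand.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))    # complementary strand, read 5' to 3'
print(reverse_complement("AAGCTT"))  # a palindromic site reads the same on both strands
```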

    FIGURE 1.3  Structural organization of human chromosomal DNA. Double-stranded DNA is wound around the octamer core of histone proteins to form nucleosomes, which are further compacted into a helical structure called a solenoid. Nuclear DNA in conjunction with its associated structural proteins is known as chromatin. Chromatin in its most compact state forms chromosomes. The primary constriction of a chromosome is the centromere, and the chromosome's ends are the telomeres. 

    From Jorde LB, Carey JC, Bamshad MJ, editors. Medical genetics. 4th ed. Philadelphia: Mosby; 2010.

    Types of Deoxyribonucleic Acid

    Double-stranded DNA in living cells is generally found as the right-handed B-DNA helical structure, which has specific dimensions. Each turn of the helix is 3.4 nm long and consists of about 10 base pairs. The DNA sugar-phosphate backbone is on the outside of the helix, and the bases of each strand are inside bound to their complement on the other strand by hydrogen bonds. Other conformational structures of DNA occur, mostly associated with DNA sequences that are repeated. These non-B DNA forms include a left-handed Z-form, A-motif, tetraplex G-quadruplex, i-motif, hairpin, cruciform, and triplex and are abundant in the human genome because a large percentage of the genome contains various repeats. Non-B DNA is associated with many biological processes, including transcriptional control. However, these structures also can create genetic instability, which can lead to various diseases such as neurologic disorders.²⁶
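    The B-DNA dimensions give a rise of about 0.34 nm per base pair (3.4 nm per turn divided by roughly 10 base pairs per turn), which makes it easy to estimate the contour length of a stretch of DNA. The Python sketch below applies this to the approximately 3 billion base pairs cited in the Preface. The constants are the rounded values from the text and the function names are hypothetical, so the result is an order-of-magnitude estimate only.

```python
NM_PER_TURN = 3.4   # length of one helical turn of B-DNA, from the text
BP_PER_TURN = 10    # approximate base pairs per turn (rounded)


def b_dna_length_nm(n_base_pairs: int) -> float:
    """Approximate contour length, in nanometers, of B-DNA with n base pairs."""
    return n_base_pairs * NM_PER_TURN / BP_PER_TURN


def helical_turns(n_base_pairs: int) -> float:
    """Approximate number of helical turns in B-DNA with n base pairs."""
    return n_base_pairs / BP_PER_TURN


haploid_bp = 3_000_000_000  # rounded haploid genome size
length_m = b_dna_length_nm(haploid_bp) / 1e9  # nanometers to meters
print(f"Haploid genome stretched out: about {length_m:.2f} m")
```

    The estimate of about one meter per haploid genome illustrates why the compaction into nucleosomes and chromatin shown in Fig. 1.3 is necessary.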

    Molecular Composition of Ribonucleic Acid

    The composition of RNA is similar to that of DNA because it contains four nucleotides linked together by a phosphodiester bond, but with several important differences. RNA consists of a ribose sugar with a hydroxyl group at the 2′ carbon instead of the hydrogen atom in DNA. The bases attached to the ribose sugar are adenine, cytosine, and guanine, but not thymine because RNA uses another pyrimidine—uracil—as a substitute for thymine.

    Structure of Ribonucleic Acid

    One significant difference between DNA and RNA is that RNA does not normally exist as two strands bound to one another, although a single strand can bind internally to itself, creating functionally important secondary structures. Although in the past several decades the known complexity and number of different RNAs have greatly expanded, the majority of cellular RNA is composed of a rather small number of RNA types. These include mRNA, rRNA, and tRNA.

    Ribonucleic Acids Associated With Protein Production

    mRNA is the most diverse group of the three major types of RNAs, but constitutes only a small percentage of the total RNA. mRNAs are transcribed from DNA that codes for proteins and therefore are used as the template for the translation of proteins. In the case of prokaryotes the mRNA is colinear with the protein that is translated; however, in eukaryotes the mRNA begins as a precursor RNA called premessenger or heterogeneous nuclear RNA (hnRNA) that includes untranslated intron and translated exon regions. After transcription the hnRNA is spliced into mature mRNA lacking the introns. The mature mRNA contains only exons and can be further modified by the addition of a 7-methylguanosine cap at the 5′ end, which protects the mRNA from degradation, and a polyadenosine (polyA) sequence at the 3′ end. In eukaryotes the production and processing of the hnRNA to mRNA takes place in the nucleus, and the final form of the mRNA is then transported to the cytoplasm to be translated.

    rRNA is associated with ribosomes, which are the primary structures that produce protein through the biological process of translation. rRNA, unlike mRNA, does not code for proteins. The ribosome is composed of two structures, the 50S and 30S subunits found in prokaryotes and the 60S and 40S subunits found in eukaryotes. The S stands for Svedberg units, a measure of the centrifugal sedimentation rate. The sedimentation rate depends on the mass, density, and shape of an object. The ribosome is a mixture of RNA and protein. In eukaryotes there are four major rRNAs: the 18S rRNA found in the 40S subunit and the 28S, 5.8S, and 5S rRNAs found in the 60S subunit. In prokaryotes, the 50S subunit contains the 23S and 5S rRNAs and the 30S subunit contains the 16S rRNA. Synthesis of eukaryotic rRNA occurs as a large 45S precursor RNA that is enzymatically cleaved to form all the rRNAs except the 5S RNA, which is transcribed separately. Ribosomal RNAs have secondary and tertiary structures that are well conserved with various loops, stem loops, and pseudoknots that contribute to their function. Ribosomal RNA and protein, as the components of ribosomes, function to carry out the translation of proteins. The sequence of the 16S rRNA has alternating conserved and divergent regions that can be used to identify microorganisms. The structure of the ribosome is now known, and the rRNA is more important than ribosomal proteins in ribosome functioning. The RNA acts as a catalytic agent called a ribozyme.²⁷,²⁸

    Another important group of RNAs are the tRNAs, which function as key molecules that act as a bridge between the nucleic acids and the proteins. They have a unique cloverleaf secondary structure, with the 3′ end covalently attached to the amino acid by specific aminoacyl tRNA synthetases. In the middle of the tRNA structure is the anticodon sequence that binds to a specific complementary codon in the mRNA. Therefore the codon directs the binding of a specific tRNA linked to its corresponding amino acid. The genetic code, which consists of 64 three-base codons, specifies the appropriate amino acid to be attached to the growing polypeptide chain (see Figs. 1.7 and 1.8, later in the chapter). There are several different classes of aminoacyl tRNA synthetases, but there is at least one aminoacyl tRNA synthetase for each of the 20 amino acids. There is also at least one tRNA for each amino acid; however, there can be more depending on the species.²⁹

    Besides the three major types of RNAs, other RNAs include nuclear, nucleolar, and cytoplasmic small RNAs, signaling RNAs, telomerase RNA, and micro-RNAs.³⁰ This list appears to be growing with each passing year. Some of the first characterized small RNAs, the nuclear and nucleolar small RNAs, are involved with the processing of precursor RNAs to mature RNAs, including splicing of hnRNA to mRNA and precursor rRNA to mature rRNAs. More recently a large number of microRNAs have been discovered that partly function in the regulation of translation. In addition, there are many other noncoding RNAs whose functions are just beginning to be understood.

    Human Chromosome

    Human double-stranded DNA that is contained in the sperm or egg is a single copy or haploid amount of DNA made up of approximately 3 billion base pairs (bp). To be more precise, the Human Genome Project consensus sequence of the human genome was 2.91 × 10⁹ bp³¹ and the first human to be sequenced, Craig Venter, had a genome size of 2.81 × 10⁹ bp,³² not including remaining gaps of highly repetitive sequences, many near centromeres and telomeres (see Chapter 2). The DNA in the cell is bound by many proteins to form chromatin (see Fig. 1.3). The proteins in chromatin consist of histones, which are bound in precise amounts per length of DNA, and other proteins called nonhistone proteins that are bound more irregularly and in widely varying amounts. The histone proteins consist of eight proteins (two copies each of H2A, H2B, H3, and H4) that bind as a unit to 147 bp of DNA to make up a nucleosome, and the protein, H1, that binds between the nucleosomes (Fig. 1.4). The nucleosomes are the basic structure that many other proteins interact with and modify to regulate gene expression. For example, the access to DNA by transcription factors is controlled by proteins that remodel the histone proteins through phosphorylation, acetylation, and methylation. The nucleosomes are condensed into filaments and even more compact structures to form a chromosome (see Fig. 1.3). There are 23 pairs of chromosomes: 22 pairs of autosomal chromosomes and 1 pair of sex chromosomes, X and Y, with an XX pair denoting female and an XY pair denoting male. The DNA in chromosomes is continuous for each chromosome and can be as much as several hundred million base pairs in length for the largest chromosomes.
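
    As a rough worked example, the B-DNA dimensions quoted earlier (3.4 nm per helical turn of about 10 base pairs) can be combined with the genome sizes above to estimate how long the haploid genome would be if it were stretched out. The sketch below is only an illustration: the function and constant names are assumptions introduced here, and the rise of about 0.34 nm per base pair follows from the rounded figures in the text.

```python
# Illustrative arithmetic only; names are hypothetical, values come from the text.
HELIX_TURN_NM = 3.4        # length of one B-DNA helical turn
BASE_PAIRS_PER_TURN = 10   # rounded figure used in this chapter


def contour_length_m(base_pairs):
    """Return the stretched-out length in meters of B-DNA with this many base pairs."""
    rise_nm = HELIX_TURN_NM / BASE_PAIRS_PER_TURN  # about 0.34 nm per base pair
    return base_pairs * rise_nm * 1e-9


consensus = contour_length_m(2.91e9)  # Human Genome Project consensus sequence
venter = contour_length_m(2.81e9)     # first individual genome sequenced
print(f"{consensus:.2f} m and {venter:.2f} m")
```

    Under these assumptions both estimates come to just under 1 m per haploid genome, which gives a sense of how much compaction nucleosomes and higher-order chromatin must achieve.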

    FIGURE 1.4  Schematic illustration of a nucleosome unit. A segment of DNA is wound around a nucleosome core particle consisting of an octamer of two each of the histone proteins H2A, H2B, H3, and H4. Tails with modifications (indicated by a red star ) are shown to protrude from H3 and H4. Adjacent nucleosomes are separated by a segment of linker DNA and the linker histone, H1.

    From a cytogenetic viewpoint, regions of the chromosomes can be classified by their transcriptional activity. The more condensed heterochromatin DNA is transcriptionally inactive and stains with Giemsa, a mixture of several dyes that bind to AT-rich regions of DNA. The less condensed euchromatin DNA is transcriptionally active and does not stain with Giemsa. The ends of the chromosomes, called telomeres, contain a repeat sequence, such as TTAGGG that is found in humans and shortens with age. The centromeres, at the center of most chromosomes, are important for linking sister chromatids during mitosis and contain various satellite DNAs, such as α-satellite tandem repeats (171 bp) that are over several million base pairs (Mb) in length.

    Surprisingly, most of the human DNA does not code for the expression of protein. As much as 50% of human DNA consists of many types of interspersed repeat sequences, such as satellites, telomeres, microsatellites, minisatellites, short and long interspersed nuclear elements (SINEs, LINEs), and retrovirus elements.³¹ Like other eukaryotes, human genes are in pieces with the protein-encoding regions, exons, alternating with the introns, which do not code for protein sequence and occupy more than a quarter of the human DNA.³³ Other regions around the genes, such as the promoter regions and the 3′ untranslated regions, are also not translated into proteins. After all the noncoding sequences are removed, the protein-coding DNA sequence spans only approximately 1.2% to 1.5% of human DNA. Even though most human DNA is not associated with protein-producing genes, the Encyclopedia of DNA Elements (ENCODE) project has shown that much of the non–protein-encoding DNA is transcribed into noncoding RNAs, most with unknown function.

    Central Dogma of Molecular Biology

    Francis Crick originated the concept of the central dogma of biology, which describes the transfer of genetic information into functional macromolecules.³⁴ This was generally depicted to show the movement of genetic information from DNA to RNA via transcription using RNA polymerase and further translated into protein via ribosomes and various factors. This is a simplistic version of the original concept, which took into consideration every possible transfer of information even though no evidence existed at the time. However, since the original publication a number of other postulated transfers have been described. DNA can enzymatically replicate itself by DNA polymerase, and RNA can be made into DNA using reverse transcriptase.³⁵ Many of these enzymes are used in molecular diagnostics assays.

    Deoxyribonucleic Acid Replication

    A general principle underlying the synthesis or replication of new DNA is that it uses each of the two DNA strands as a template to make a new complementary strand. This is termed semiconservative replication and was first theorized by Watson and Crick.⁷ DNA replication begins at an adenine and thymine (AT)-rich structure called an origin of replication. In bacteria there is generally only one origin of replication, but in eukaryotic cells there are thousands. Since DNA can be supercoiled into more compact structures, a topoisomerase is required to first unwind this structure so that the DNA is accessible. A DNA helicase binds to the double-stranded DNA and separates the two strands, providing two single-stranded DNA templates. Replication progresses in a 5′ to 3′ direction; therefore one strand, the leading strand, is synthesized as one continuous strand using the 3′ to 5′ template and the other strand, called the lagging strand, is synthesized in small segments called Okazaki fragments from the 5′ to 3′ template. Because the DNA polymerase requires a primer, small RNA primers are made by a primase enzyme on the 5′ to 3′ template and the Okazaki fragments are synthesized starting from the primer. Okazaki fragments are finally linked by a ligase (Fig. 1.5).³⁶

    DNA polymerases of various types have been identified and they function in many different roles, the most important being the replication of new DNA and the repair of existing DNA. Using the template strand as a guide, the DNA polymerase adds a nucleotide triphosphate to the primer at a free 3′ hydroxyl group, releasing pyrophosphate. The specific nucleotide selected depends on the base on the template strand; for example, an adenine nucleotide is used if a thymine nucleotide is in the template strand. In summary, a complementary sequence is synthesized opposite the template strand. The insertion of the correct nucleotide does not always occur. Mistakes occur approximately once every 100,000 nucleotides; therefore a major function of a DNA polymerase is error correction or proofreading, which is accomplished by an intrinsic 3′ to 5′ exonuclease activity. DNA polymerases are important in molecular diagnostics because they are used in the polymerase chain reaction (PCR) and DNA sequencing.
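
    The base-selection rule and the error rate described above can be sketched in a few lines. This is a minimal illustration, not a model of any particular polymerase: the function names are hypothetical, and the example assumes the quoted rate of one mistake per 100,000 nucleotides applies uniformly before proofreading.

```python
# Watson-Crick pairing used when a new DNA strand is copied from a template.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_strand(template_3_to_5):
    """Build the new strand 5'->3' by pairing each template base read 3'->5'."""
    return "".join(DNA_PAIR[base] for base in template_3_to_5)


def reverse_complement(strand_5_to_3):
    """Return the complementary strand, also written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(strand_5_to_3))


def expected_misincorporations(nucleotides_copied, nucleotides_per_error=100_000):
    """Expected insertion mistakes at the rate quoted in the text (an assumption)."""
    return nucleotides_copied / nucleotides_per_error


print(synthesize_strand("TACGGT"))                 # ATGCCA
print(reverse_complement("ATGCCA"))                # TGGCAT
print(expected_misincorporations(2_910_000_000))   # 29100.0
```

    At that rate, copying a haploid genome of 2.91 × 10⁹ bp once would introduce on the order of 29,000 mistakes, which illustrates why the proofreading exonuclease activity matters.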

    FIGURE 1.5  DNA replication. Double-stranded DNA is separated at the replication fork. The leading strand is synthesized continuously, whereas the lagging strand is synthesized discontinuously but is joined later by DNA ligase.

    DNA replication is part of the cell cycle and occurs during the synthesis (S) phase. Interphase is divided into the first growth phase (G1), the S phase, and the second growth phase (G2), with the S phase lying between G1 and G2. The mitosis (M) phase, which involves the splitting of one cell into two cells, occurs after the G2 phase. Mitosis is divided into six subphases: prophase, prometaphase, metaphase, anaphase, telophase, and cytokinesis.

    At important control points in the cell cycle the cell will commit significant resources to proceed further. One of these control points is between the G1 and S phase, just before it begins DNA replication. The G1/S boundary control point is disrupted in many cancers. It is common for neoplasms to have mutations in the retinoblastoma gene (RB1), whose protein product regulates cell cycle progression from G1 to S. Another control point is between G2 and M, just as the cell commits to creating two cells from one.

    Deoxyribonucleic Acid Repair

    The integrity of DNA is damaged in a variety of ways that culminate in changes or mutations in the DNA sequence. DNA bases may be damaged, removed, cross-linked, or incorrectly paired with one another, and single- or double-stranded breaks may also occur.³⁷,³⁸ When the cell senses that its DNA has become damaged, it stops the progression of its cell cycle and initiates DNA repair processes.³⁹ Cells repair these lesions by employing multiple DNA repair mechanisms that are specific for the type of DNA lesion and include base excision repair, nucleotide excision repair, mismatch repair, and homologous recombination repair.

    Mechanisms

    Base excision repair removes bases that are damaged by deamination, oxidation, and alkylation. Deamination of guanine, cytosine, and adenine converts them into structures that will incorrectly base pair, creating transition mutations, which are changes between similar nitrogenous bases, such as a purine to a purine. A transversion mutation is a change from a purine to a pyrimidine or vice versa. DNA glycosylases, such as uracil-DNA-glycosylase, cleave the damaged base, and a 5′-deoxyribose phosphate lyase removes the nucleotide upstream of the removed base. DNA polymerase and ligase then add a new nucleotide, repairing the damage. One of the inherited disorders associated with this repair process that leads to a predisposition to various neoplasms is caused by mutations in MUTYH, a DNA glycosylase gene.³⁸,⁴⁰
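
    The distinction between transitions and transversions is easy to express in code. The following is a minimal sketch with hypothetical names; it classifies a single-base DNA substitution using only the purine and pyrimidine groupings described above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(reference, variant):
    """Classify a single-base DNA change as a transition or a transversion."""
    reference, variant = reference.upper(), variant.upper()
    valid = PURINES | PYRIMIDINES
    if reference not in valid or variant not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if reference == variant:
        raise ValueError("reference and variant bases are identical")
    pair = {reference, variant}
    # Same chemical class on both sides means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(classify_substitution("C", "T"))  # transition (pyrimidine to pyrimidine)
print(classify_substitution("A", "C"))  # transversion (purine to pyrimidine)
```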

    Nucleotide excision repair removes base modifications that change the helical structure of DNA, including bulky DNA distortions and covalently bound structures that may be created by ultraviolet radiation and certain cancer drugs. The damage is recognized by global and transcription-mediated repair processes. After the repair is initiated, the transcription factor, TFIIH, binds to a complex of proteins and makes an incision. The damaged DNA is unwound, and the gap is filled by DNA polymerase and finally sealed by DNA ligase. Mutations in the nucleotide excision repair genes cause xeroderma pigmentosum, which leaves affected individuals susceptible to specific tumors.³⁸,⁴¹

    Mismatch repair recognizes base incorporation errors and base damage. DNA polymerase has a 3′ to 5′ editing exonuclease with a proofreading function that is not completely effective and allows some mismatches to occur that can lead to mutations after DNA replication. The mismatched nucleotides must be repaired on the newly synthesized strand of DNA, which in prokaryotes is recognized by its unmethylated state. In eukaryotes the mechanism is different, and it is proposed that proteins associated with the replication apparatus, specifically the proliferating cell nuclear antigen protein, determine the appropriate DNA strand for repair.³⁸ These mutations are corrected with DNA mismatch repair proteins, which identify the mismatches by their methylation patterns, excise the surrounding sequence, and then repair the excision with new sequence. Mutations in the human mismatch repair genes are associated with Lynch syndrome (hereditary nonpolyposis colorectal cancer).

    Double-strand breaks are a very destructive form of DNA damage that destabilizes the genome, sometimes resulting in gross chromosomal changes, such as translocations that are frequently found in cancer. Double-stranded breaks are caused by several processes, including ionizing radiation and chemotherapy drugs, and are repaired by either homologous recombination or nonhomologous end joining.³⁸,⁴¹ The homologous recombination repair pathway is initiated by recognition of a double-stranded break, followed by resection using exonucleases to create a 3′ single-stranded overhang. With the assistance of many proteins, RAD51 is bound to the single-stranded DNA, which invades the intact homologous double-stranded DNA of the sister chromatid and uses it as a template for new double-stranded DNA repair.³⁸

    DNA repair mechanisms operate independently to repair simple lesions. However, the repair of more complex lesions involves multiple DNA processing steps regulated by the DNA damage response pathway. When single- and double-stranded DNA breaks occur, a cascade of responses is initiated that culminates in either DNA repair, stopping the cell cycle, or programmed cell death. After DNA damage has occurred, the DNA damage response pathway activates the protein kinases ATM (ataxia telangiectasia mutated) and ATR (ataxia telangiectasia and Rad3-related protein) to phosphorylate signaling proteins, such as p53, which eventually leads to cell cycle arrest at the G1/S boundary. This gives time for the DNA repair mechanism to repair the damaged DNA; however, if the damage is too extensive, the cell initiates apoptosis or cell death.³⁹

    Deoxyribonucleic Acid Modification Enzymes

    There are two groups of nucleases, the endonucleases that cut through the sugar-phosphate backbone and exonucleases that digest the ends of DNA. The commercially important restriction endonucleases, which bacteria have acquired to protect themselves from viral infections, are used to cleave DNA at a specific nucleotide sequence or restriction sites.⁴² Several thousand restriction endonucleases have been characterized and are used extensively to manipulate DNA in molecular biology and molecular diagnostics. Recent work has described new nucleases, such as the RNA-guided engineered nuclease, CRISPR/Cas system, that can precisely cleave genomic DNA.⁴³
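
    To make the idea of cleavage at a restriction site concrete, the sketch below locates a recognition sequence in a linear DNA string and splits the string at each site. It is an illustration under stated assumptions: the helper names are hypothetical, only the top strand is modeled, and the example enzyme (EcoRI, which recognizes GAATTC and cuts after the first G) is a well-known case that is not taken from this chapter.

```python
def find_sites(sequence, site):
    """Return the 0-based start positions of every occurrence of the site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence, site, cut_offset):
    """Cut a linear top strand cut_offset bases into each recognition site."""
    fragments = []
    previous = 0
    for position in find_sites(sequence, site):
        cut = position + cut_offset
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# EcoRI: recognition site GAATTC, cut between G and AATTC (offset 1).
dna = "AAGAATTCTTGAATTCGG"
print(find_sites(dna, "GAATTC"))   # [2, 10]
print(digest(dna, "GAATTC", 1))    # ['AAG', 'AATTCTTG', 'AATTCGG']
```

    The fragment lengths produced this way are what a restriction digest followed by size separation would report for a linear molecule, ignoring the complementary strand and its overhangs.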

    DNA glycosylases are a family of enzymes associated with base excision repair that are used in the first step of DNA repair to remove the damaged base, without disrupting the sugar-phosphate backbone. An important member of that family, uracil DNA glycosylase, repairs the most common mutation found in humans, the spontaneous deamination of cytosine to uracil, by removing the uracil base.

    Gene Structure

    The structure of prokaryotic genes is straightforward; almost all of the gene sequence is used to make protein. However, this is not the case with eukaryotic genes. One of the unique hallmarks of eukaryotic genes is that the protein-coding DNA is interspersed with regions that do not code for protein, an observation made by Richard Roberts and Phillip Sharp in 1977. A mature mRNA retains only the protein-coding sequences called exons, and the sequences between the exons are non–protein-encoding sequences called introns that are removed during mRNA maturation (Fig. 1.6).⁴⁴

    In addition to introns and exons, eukaryotic genes consist of regulatory regions, such as promoters and enhancers, and 3′ regions that contain termination and polyadenylation signals. The regulation of the expression of eukaryotic genes can occur at all levels from transcription to splicing to translation to degradation; however, most gene regulation occurs at the initiation of transcription by various promoters and enhancers.⁴⁵ There are two groups of regulatory elements: one is close to the transcriptional start site and is made up of the core promoter and ancillary promoters slightly further away from the start of transcription. The other group of regulatory elements can be much further away, not only upstream but also downstream from the gene. This second group is made up of enhancers, silencers, insulators, and locus-specific control regions.⁴⁵,⁴⁶ These regulatory elements contain specific sequences that bind to transcription factors that can upregulate or downregulate the expression of a gene. There are only several thousand human transcription factors, far fewer than the number of human genes; therefore each gene has many regulatory elements to provide the needed complexity to function in 200 different human cell types.⁴⁵

    A surprising property of human genes is that there are so few compared to less complex species. Humans have approximately 20,000 genes, many fewer than found in rice and only slightly more than found in the roundworm, Caenorhabditis elegans.⁴⁷-⁴⁹ Recently, results from the ENCODE project have challenged the concept of one gene, one protein.⁵⁰ Their studies show that the exon of one gene can be spliced into the exon of another gene.⁵¹ This result, along with alternative splicing, demonstrates that one gene can make multiple proteins and is probably the reason humans have such a small number of genes.

    Ribonucleic Acid Transcription and Splicing

    RNA transcription involves synthesizing an RNA strand using DNA as a template. This requires many different proteins, the most important being the RNA polymerases, of which there are three types in eukaryotic cells. RNA polymerase I is specific for the rRNAs, 28S, 18S, and 5.8S, which are initially transcribed as a single primary transcript of 45S. RNA polymerase II transcribes all genes that encode proteins and the small nuclear RNA (snRNA) genes. RNA polymerase III transcribes a variety of small RNAs, including the 5S rRNA, and tRNA. Additional proteins called transcription factors function in combination to recognize and regulate transcription of different genes.⁵²

    The synthesis of RNA proceeds in a 5′ to 3′ direction using DNA as a template and a specific DNA sequence acts as a transcription start site. Transcription progresses through three phases: initiation, elongation, and termination. The initiation phase includes the binding of transcription factors to promoters upstream from the start site and includes the core promoter immediately upstream and the ancillary promoters further away. However, some of the small RNA gene promoters are in the middle of the gene. Transcription factors binding to upstream promoters act as regulators of the transcription of genes. These factors generally bind in pairs or dimers and have several functional domains. One functional domain of the transcription factor binds to a specific promoter DNA sequence via several structures, such as the helix-turn-helix, zinc finger, and leucine zipper structures. Another domain binds to the other transcription factor of the dimer pair, and a third domain may bind to the RNA polymerase complex that carries out transcription.⁴⁶ Even though promoters and the transcription factors binding to them are far away from the transcription initiation complex, the promoter DNA folds back on itself to allow for the transcription factors to interact with the RNA polymerase complex.⁵³
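
    The template-directed nature of RNA synthesis can be shown with a short sketch. The function names below are hypothetical, and the example ignores promoters, transcription factors, and termination; it only demonstrates that the RNA is complementary to the template strand and matches the coding strand with uracil in place of thymine.

```python
# Pairing rules for RNA synthesis: uracil is inserted opposite adenine.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the RNA 5'->3' from a DNA template strand read 3'->5'."""
    return "".join(RNA_PAIR[base] for base in template_3_to_5)


def coding_strand_to_rna(coding_5_to_3):
    """Equivalent shortcut: the RNA matches the coding strand with U for T."""
    return coding_5_to_3.replace("T", "U")


template = "TACGGT"  # template strand, written 3'->5'
coding = "ATGCCA"    # coding strand, written 5'->3'
print(transcribe(template))          # AUGCCA
print(coding_strand_to_rna(coding))  # AUGCCA
```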

    FIGURE 1.6  DNA transcription and messenger RNA processing. A gene that encodes for a protein contains a promoter region and variable numbers of introns and exons. Transcription commences at the transcription start site. Premessenger RNA or heterogeneous nuclear RNA (hnRNA) is processed by capping, polyadenylation, and intron splicing and becomes a mature messenger RNA.

    Important recurring sequences are found in the core promoter. For example, the core promoter of an RNA polymerase II gene contains a TATAAA sequence, called a TATA box located upstream 25 to 40 nucleotides from the transcriptional start site. Only 20% to 30% of eukaryotic promoters contain TATA boxes, but they are highly regulated compared to those without TATA boxes that are mostly housekeeping genes.⁴⁵,⁵⁴,⁵⁵
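
    A simple motif search illustrates how a TATA box might be looked for in a promoter sequence. This is a sketch only: the function name and parameters are hypothetical, the distance is assumed to be measured from the first base of the motif to the transcriptional start site, and real promoter prediction tolerates variation in the TATAAA consensus that an exact match does not.

```python
def tata_box_distances(sequence, tss_index, motif="TATAAA", min_up=25, max_up=40):
    """Return upstream distances (in nucleotides) of exact motif matches.

    Only matches starting 25 to 40 nucleotides upstream of the transcriptional
    start site (tss_index, 0-based) are reported.
    """
    hits = []
    lowest = max(0, tss_index - max_up)
    highest = tss_index - min_up
    for start in range(lowest, highest + 1):
        if sequence[start:start + len(motif)] == motif:
            hits.append(tss_index - start)
    return hits


# Synthetic promoter: the motif starts 30 nucleotides upstream of position 50.
promoter = "G" * 20 + "TATAAA" + "C" * 24 + "ATGGCC"
print(tata_box_distances(promoter, 50))  # [30]
```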

    The first step in mRNA transcription is the binding of transcription factor IID (TFIID) to the TATA box, which in turn promotes the binding of other transcription factors (TFIIA, TFIIB, TFIIE, TFIIF, and TFIIH), RNA polymerase II, and proteins attached to the upstream promoter sites. To form a functional transcription complex, the promoter region's double-stranded DNA separates and the transcription complex moves away from the core promoter region.⁴⁵ Once started, the RNA polymerase adds nucleotides to the 3′ free hydroxyl group in a manner similar to that of DNA replication. Transcription is eventually terminated by one of several termination mechanisms. In bacteria a termination factor bound to the RNA polymerase recognizes a DNA sequence termination signal. In the case of genes transcribed by RNA polymerase II, termination is coupled with the polyadenylation step (see Fig. 1.6).

    Two posttranscriptional processing events are performed on the newly formed hnRNA, one at each end of the RNA. At the 5′ end, the hnRNA is capped with a 7-methyl guanosine molecule to help protect the hnRNA from degradation. At the 3′ end, a polyadenosine (poly A) stretch is added by poly A polymerase after the RNA sequence AAUAAA is synthesized. Some transcribed mRNAs are not polyadenylated, such as histone mRNAs.⁵⁶

    Transcription initially produces an hnRNA that contains both exons and introns, which needs to be processed or spliced into mature mRNA for it to be properly translated into protein. RNA splicing involves cleavage and removal of intron RNA segments and splicing of exon RNA segments. The process uses consensus splice site sequences located at both the 5′ (GU) and 3′ (AG) ends of the intron and an internal intron sequence. Splicing requires the effort of a number of proteins and small RNAs that come together to form a spliceosome, which directs the splicing of exons and removal of introns.⁵⁷ Splicing begins with the binding of the U1 small nuclear ribonucleoprotein (snRNP) to the donor splice site and the U2 snRNP to the internal intron sequence, followed by the binding of U4, U5, and U6 snRNPs, resulting in excision of the intron and joining (splicing) of the ends of the two exons on either side of the excised intron (see Fig. 1.6).⁵⁷
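
    The outcome of splicing can be sketched as a string operation: each intron is checked for the GU and AG boundary dinucleotides and then removed, and the flanking exons are joined. The function name and the coordinate convention (0-based, end-exclusive intron positions) are assumptions made for this illustration, and the sketch does not model the internal intron sequence or the spliceosome itself.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) pairs and join the remaining exons.

    Each intron must begin with GU and end with AG, the consensus boundaries
    described in the text; coordinates are 0-based and end-exclusive.
    """
    mature = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        mature.append(pre_mrna[position:start])  # exon preceding this intron
        position = end
    mature.append(pre_mrna[position:])           # final exon
    return "".join(mature)


pre_mrna = "AUGGCC" + "GUAAGUCCCUAG" + "UUUUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCUUUUAA
```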

    An important modification of the splicing process, alternative splicing, allows for the generation of different mRNAs from the same primary RNA transcript by the cutting and joining of the RNA strand at different locations. Among the types of alternative splicing are exon skipping, alternative 3′ and 5′ splice sites, and intron retention. It is estimated that 92% to 95% of all human genes are alternatively spliced.⁵⁸,⁵⁹
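
    Exon skipping alone can generate many transcripts from one gene. As a hedged illustration, the sketch below enumerates every isoform obtainable by keeping the first and last exons and including or skipping each internal exon; this is a simplifying assumption, since real genes constrain which combinations occur, and the function name is hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """List isoforms that keep the first and last exon and any subset of the rest."""
    first, *internal, last = exons
    isoforms = []
    for kept_count in range(len(internal) + 1):
        # combinations() preserves the original 5'->3' order of the exons.
        for kept in combinations(internal, kept_count):
            isoforms.append((first, *kept, last))
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

    With n internal exons this simple model yields 2ⁿ possible mRNAs, so a gene with only a handful of skippable exons could in principle encode dozens of transcripts.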

    The movement of cellular signals from the surface of a cell to the nucleus is called signal transduction, and one of the eventual targets is the modification (eg, phosphorylation) of transcription factors, which can modulate the binding of other transcription factors to DNA and their dimerization, thereby controlling gene expression.⁶⁰ A common cascade of signaling begins with the activation of a receptor on the cell surface, such as a tyrosine kinase receptor. The tyrosine kinase receptor in the form of a dimer can be activated by binding to a hormone or growth factor, for example, which causes a dimerization and autophosphorylation of the tyrosine receptor protein kinase. This in turn activates a cytoplasmic protein, such as the guanine nucleotide exchange factor that activates the G-protein Ras, which can then activate the protein kinase Raf, which propagates the signal to a common signaling pathway, the mitogen-activated protein (MAP) kinases. The final enzyme in the pathway can then act on downstream targets, including other protein kinases and transcription factors. Some mutations in the tyrosine kinase receptor or Ras protein switch them to an unregulated on position, which can lead to uncontrolled growth of the cell and eventually to cancer.⁶⁰

    Translation

    The final phase of the transfer of information from DNA is the production of proteins, the structural and functional molecules that make up the majority of a living organism, such as the human body. Proteins are long single strands of various amino acids and are synthesized by a process called translation, which requires the functioning of many protein factors, tRNAs, and ribosomes.

    Amino acids have a common structure consisting of a carbon atom bound to amino and carboxylic acid groups and a unique side chain. There are 20 amino acids each with a different side chain that give them their unique properties. The side chains can be divided into four types: nonpolar (hydrophobic), polar (hydrophilic uncharged), and negative and positively charged. Nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, methionine, phenylalanine, and tryptophan. The uncharged polar (hydrophilic) amino acids include glycine, serine, threonine, cysteine, tyrosine, glutamine, and asparagine. The negatively charged (acidic) amino acids are aspartic acid and glutamic acid, and the positively charged (basic) amino acids are arginine, histidine, and lysine. A protein's amino acid makeup and sequence in the polypeptide chain determine the overall structure and function of the protein. Some amino acids have a more significant presence than others. For example, proline, which disrupts secondary structure, and cysteine, which can cross-link to another cysteine through disulfide bonds, can change the structure of a protein.
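
    The four side-chain groups listed above map naturally onto a lookup table. The sketch below simply encodes the grouping as given in this section (other texts classify some residues, such as glycine, differently); the variable and function names are hypothetical.

```python
# Side-chain classes exactly as grouped in this section.
SIDE_CHAIN_CLASSES = {
    "nonpolar": [
        "alanine", "leucine", "isoleucine", "valine",
        "proline", "methionine", "phenylalanine", "tryptophan",
    ],
    "polar uncharged": [
        "glycine", "serine", "threonine", "cysteine",
        "tyrosine", "glutamine", "asparagine",
    ],
    "negatively charged": ["aspartic acid", "glutamic acid"],
    "positively charged": ["arginine", "histidine", "lysine"],
}

# Invert the table so each amino acid points to its class.
CLASS_OF = {
    amino_acid: side_chain
    for side_chain, members in SIDE_CHAIN_CLASSES.items()
    for amino_acid in members
}


def side_chain_class(amino_acid):
    """Return the side-chain class of an amino acid given by its full name."""
    return CLASS_OF[amino_acid.lower()]


print(len(CLASS_OF))                # 20
print(side_chain_class("Proline"))  # nonpolar
print(side_chain_class("lysine"))   # positively charged
```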

    Protein structures are grouped into four different classes. The primary structure is the sequence of the amino acids in the protein. There are several common types of secondary structure, such as β-pleated sheets and α-helices. Proteins can be constructed with a combination of these different types of secondary structures. Tertiary structure applies to the folding of the polypeptide chain into a three-dimensional form. Quaternary structure is the structural relationship of more than one polypeptide/protein joining together, such as in immunoglobulin molecules, which contain light and heavy chains bound together by disulfide bonds between cysteine residues.

    Once proteins are synthesized, they can be modified in various ways. One of the most common modifications is phosphorylation of the amino acids serine, threonine, and tyrosine, which can regulate protein activity. Other modifications include proteolytic cleavage, such as removal of the signal transport sequence, and acetylation of the N-terminus of most eukaryotic proteins that helps to prevent degradation. Glycosylation of secreted and membrane proteins on asparagine, serine, and threonine residues and formation of disulfide bonds via cysteine cross-linking are additional modifications.

    Taking into consideration these posttranslational modifications and alternatively spliced forms mentioned in an earlier section, the total number of proteins in the more than 200 human cell types is estimated to range from 250,000 to several million.⁶¹

    The genetic code, which was deciphered in the early 1960s, is required to convert a nucleic acid sequence into an amino acid sequence.¹³ It was reasoned that if there are 20 amino acids, a code of at least 3 nucleotides was necessary to have enough combinations. A 3-nucleotide code gives 64 combinations, and therefore one hallmark of the genetic code is that it is redundant, meaning that there are several codes for one amino acid. That is the case for most amino acids, but not all; for example, methionine and tryptophan have only one code. The redundancy is usually in the third base of the code. All of the 64 3-nucleotide codon possibilities code for an amino acid, except 3 that serve as stop codons (UAA, UGA, and UAG) (Fig. 1.7).
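
    The reasoning behind a 3-nucleotide code is a short piece of arithmetic that can be checked directly. In the sketch below the names are hypothetical; the numbers (4 nucleotides, 20 amino acids, 3 stop codons) are those given in the text.

```python
NUCLEOTIDES = 4                        # A, C, G, U
AMINO_ACIDS = 20
STOP_CODONS = ("UAA", "UGA", "UAG")


def minimum_codon_length(symbols, meanings):
    """Smallest word length whose number of combinations covers all meanings."""
    length = 1
    while symbols ** length < meanings:
        length += 1
    return length


length = minimum_codon_length(NUCLEOTIDES, AMINO_ACIDS)
total_codons = NUCLEOTIDES ** length
sense_codons = total_codons - len(STOP_CODONS)

# 4**1 = 4 and 4**2 = 16 are too few for 20 amino acids; 4**3 = 64 suffices.
print(length, total_codons, sense_codons)  # 3 64 61
```

    Because 61 sense codons must cover only 20 amino acids, most amino acids are necessarily specified by more than one codon, which is the redundancy described above.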

    Protein synthesis or translation occurs in the cytoplasm and proceeds in three steps: initiation, elongation, and termination. The process requires tRNA and rRNA molecules, as well as ribosomes and initiation, elongation, and termination factors. One of the most important groups of molecules are the tRNAs, which are recognized by aminoacyl tRNA synthetase enzymes that attach amino acids to the 3′ end of specific tRNA molecules. Each tRNA has a 3-base sequence (anticodon) that facilitates the specific recognition and interaction with a codon in the mRNA.
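
    Codon and anticodon recognition follows the same antiparallel pairing as other nucleic acid duplexes. The sketch below derives the strictly complementary anticodon for a codon; the function name is hypothetical, and the example assumes standard base pairing at all three positions, so it does not capture relaxed pairing at the third codon position.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon_5_to_3):
    """Return the fully complementary anticodon, written 5'->3'."""
    # The strands pair antiparallel, so the complement is read in reverse.
    return "".join(RNA_PAIR[base] for base in reversed(codon_5_to_3))


print(anticodon_for("AUG"))  # CAU (pairs with the methionine start codon)
print(anticodon_for("AAA"))  # UUU
```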

    The initiation step of protein synthesis is the most complex and begins with the binding of initiation factor 4E to the cap structure on the 5′ end of the mRNA and binding of poly-adenosine–binding protein (PABP) to the 3′ polyadenosine tail. The binding of initiation factor 4G to both initiation factor 4E and PABP circularizes the mRNA and prepares it for binding to the preinitiation complex containing the 40S ribosomal subunit, initiation factor 2, and methionine tRNA. The preinitiation complex then scans the mRNA until it finds a methionine start codon (AUG), at which point the 60S ribosomal subunit binds, forming the 80S initiation complex, and translation elongation begins.⁶² This is a simplified description of the initiation process because over a dozen additional initiation and auxiliary factors are involved.

    Ribosomes have at least three structural positions where tRNAs can bind: the acceptor (A), peptidyl (P), and exit (E) sites. The acceptor site binds the incoming aminoacyl-tRNA. The peptidyl site holds the peptidyl-tRNA that is covalently linked to the growing polypeptide chain, and the exit site binds the outgoing empty tRNA that carries no amino acid.⁶²,⁶³

    The first codon (AUG) always codes for methionine; therefore, to initiate translation, the methionine tRNA binds to the peptidyl site of the ribosome. The tRNA specific for the next 3-base codon (for example, lysine) binds to the acceptor site of the ribosome and, with the help of elongation factors (eg, eEF2), the amino acid in the peptidyl site is joined to the amino acid in the acceptor site by a peptide bond. A peptide bond forms between the carboxyl group of one amino acid and the amino group of the next through a condensation reaction that releases water. The tRNAs then shift positions: the now empty methionine tRNA moves to the exit site, and the tRNA carrying the growing chain of amino acids moves to the peptidyl site. At the same time, the ribosome moves forward one codon, the tRNA whose anticodon matches the next codon binds in the acceptor site, and the process is repeated until a termination codon is reached (Fig. 1.8). Termination factors then bind and stop the translation process.⁶² In eukaryotes, protein synthesis occurs in the cytoplasm and on the endoplasmic reticulum, where multiple ribosomes, together called a polyribosome, translate an individual mRNA.
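
    The scanning and elongation cycle described above can be sketched as a toy simulation. It is a deliberately simplified model, not a description of any real software: it starts at the first AUG from the 5′ end, ignores initiation and elongation factors, and records which codon sits in the exit (E) and peptidyl (P) sites after each translocation. The function name `translate` and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str):
    """Translate an mRNA from the first AUG until a stop codon or the 3' end.

    Returns (peptide, steps). Each step is a tuple (e_site_codon, p_site_codon,
    peptide_so_far) describing the ribosome just after one translocation.
    """
    start = mrna.find("AUG")           # simplified "scanning" from the 5' end
    if start == -1:
        return "", []

    peptide = "M"                      # initiator methionine sits in the P site
    steps = []
    p_pos = start                      # index of the codon currently in the P site

    while True:
        a_codon = mrna[p_pos + 3:p_pos + 6]   # codon presented in the A site
        if len(a_codon) < 3:
            break                      # ran off the 3' end without a stop codon
        amino_acid = CODON_TABLE[a_codon]
        if amino_acid == "*":
            break                      # termination codon: release the peptide
        peptide += amino_acid          # peptide bond formation
        e_codon = mrna[p_pos:p_pos + 3]       # old P-site codon moves to the E site
        p_pos += 3                     # ribosome advances by one codon
        steps.append((e_codon, mrna[p_pos:p_pos + 3], peptide))

    return peptide, steps


if __name__ == "__main__":
    peptide, steps = translate("GGCAUGAAAUGGUAAGC")
    print(peptide)
    for e_site, p_site, chain in steps:
        print(f"E={e_site} P={p_site} chain={chain}")
```

    For the example mRNA GGCAUGAAAUGGUAAGC, the sketch starts at AUG, adds lysine (AAA) and tryptophan (UGG), and stops at UAA, giving the peptide MKW.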

    FIGURE 1.7  Genetic code. Translation of messenger RNA to amino acids during protein synthesis.

    FIGURE 1.8  Translation. Shown is a ribosome bound to a messenger RNA converting the messenger RNA triplet code (codon) via a specific amino acid–bound transfer RNA containing a complementary anticodon sequence. There are three transfer RNA positions. A new amino acid–bound transfer RNA first arrives on the ribosome at the A or acceptor site at the front of the moving ribosome and then moves to the P or peptidyl site where the amino acid on the newly arrived transfer RNA combines with the growing polypeptide chain. Finally the now empty transfer RNA moves to the E, or exit site, where it prepares to leave the ribosome. 

    Modified from Huether SE, McCance KL. Understanding pathophysiology. 6th ed. St. Louis, Elsevier; 2017.

    Regulation of translation is not as extensive as that for transcription. However, there is global regulation of eukaryotic translation at the initiation step through phosphorylation of initiation factor 2 by four different protein kinases. This occurs when cells are under stress, such as amino acid starvation or DNA damage.⁶⁴ In addition, mRNA-specific translational regulation can occur through binding to specific sequences located in the 5′ and 3′ untranslated regions. Furthermore, there are over 1000 microRNAs in humans,⁶⁵ many of which regulate translation. The microRNA genes are transcribed as precursor RNA and then processed into a mature form of approximately 22 nucleotides by the processing enzymes Drosha and Dicer. The mature microRNA, while associated with the Argonaute protein, can bind to specific sites on an mRNA and either reversibly inhibit its translation or cause its degradation.⁶²,⁶⁶ For example, the microRNAs miR-15a/16-1 are deleted in chronic lymphocytic leukemia, thereby increasing Bcl2 expression and inhibiting apoptosis (cell death), which prolongs the life span of the cell.⁶⁷

    After proteins are synthesized, there are two major processes to remove excess or damaged proteins. One process degrades ingested proteins: nonspecific proteases, such as pepsin and trypsin, digest food proteins in the gut into amino acids so that they can be absorbed. The second process digests extracellular and intracellular proteins either by general proteinases within lysosomes or by protein degradation via
