Navigating Non-coding RNA: From Biogenesis to Therapeutic Application
Ebook, 926 pages, 9 hours

About this ebook

Navigating Non-coding RNA: From Biogenesis to Therapeutic Application provides a concise overview of the field of non-coding RNA (ncRNA). Chapters cover the history of discoveries in the field and specific types of ncRNA: housekeeping ncRNAs such as ribosomal RNA, transfer RNA, small nuclear RNA and telomerase RNA, and regulatory ncRNAs such as microRNA, small interfering RNA, long non-coding RNA and Y RNA. The biogenesis, structure, function, and regulation of each of these are also explored, along with traditional and cutting-edge methods for the identification, functional characterization and structural characterization of ncRNA.

The book also focuses on the different types of epitranscriptomic modifications and their involvement in regulating ncRNA structure, stability and intermolecular interactions in addition to the role of ncRNAs in a range of diseases and potential therapeutic applications.

  • Covers a wide range of non-coding RNAs, including ribosomal RNA, transfer RNA, telomerase RNA, microRNA, small interfering RNA and circular RNA
  • Features both traditional and novel methodologies for investigating ncRNA, from microarray and conventional chemical probing to CAGE-seq and computational methods
  • Includes chapters on ncRNAs in a range of diseases, including cancers, neurological disorders, cardiovascular conditions and infectious illnesses
  • Discusses novel therapeutic strategies for targeting ncRNAs, including CRISPR/Cas9 applications and RNAi-based strategies
  • Explores the molecular mechanisms and intermolecular interactions of ncRNA
Language: English
Release date: Jun 17, 2023
ISBN: 9780323907002

    Book preview

    Navigating Non-coding RNA - Joanna Sztuba-Solinska

    Navigating Non-coding RNA

    From Biogenesis to Therapeutic Application

    Edited by

    Joanna Sztuba-Solinska

    Department of Biological Sciences, Auburn University, Auburn, AL, United States

    Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland

    Table of Contents

    Cover image

    Title page

    Copyright

    List of contributors

    Chapter 1. History and definitions of ncRNAs

    The C-value paradox—there is more to a genome than protein-coding genes

    Protein synthesis, splicing, and RNA with catalytic function—the early days of ncRNA research

    Pervasive transcription

    Small ncRNAs

    Medium ncRNAs

    Long non-coding RNAs

    Conclusions

    Chapter 2. Regulatory non-coding RNAs: biogenesis, mechanisms of action and role in gene expression regulation

    Introduction

    microRNAs (miRNAs)

    Small interfering RNAs (siRNAs)

    Long non-coding RNAs (lncRNAs)

    Enhancer RNAs (eRNAs)

    Circular RNAs (circRNAs)

    Piwi RNAs (piRNAs)

    Vault RNAs (vRNAs)

    Concluding remarks

    Chapter 3. Non-coding RNAs: Mechanisms of action

    MicroRNAs

    PIWI-interacting RNAs

    Small nucleolar RNAs

    Small nuclear RNAs

    tRNA-derived fragments

    Long non-coding RNAs

    Circular RNAs

    Enhancer RNAs

    CRediT author statement

    Chapter 4. Functional characterization of lncRNAs

    Introduction

    Genomic features

    Transcriptomic features

    Subcellular localization

    Interacting partners (RNA/DNA/protein)

    RNA structure

    Perturbation experiments followed by phenotypic assessment

    High-throughput functional screening

    Outlook

    Chapter 5. Secondary structural characterization of non-coding RNAs

    Introduction

    In silico approaches for the characterization of non-coding RNA secondary structure

    Experimental approaches for the characterization of non-coding RNA secondary structure

    Example analysis of the human H19 long noncoding RNA

    Conclusion

    Author contributions

    Chapter 6. Regulation of non-coding RNAs

    Introduction

    Transcriptional regulation of ncRNAs

    Identification of promoter regions for miRNAs and lncRNAs

    Epitranscriptomic regulation of non-coding RNA

    A-to-I editing

    Funding

    Chapter 7. Non-coding RNAs in human non-infectious diseases

    Introduction

    Aberrant regulation of non-coding RNAs in rheumatoid arthritis

    Dysregulation of non-coding RNA in systemic lupus erythematosus

    Conclusions and perspectives: autoimmune and inflammatory diseases

    Cardiac and skeletal muscle

    Disease of cardiac muscle

    Atrophic conditions in skeletal muscle

    Conclusions and perspectives: cardiac and skeletal muscle diseases

    Chapter 8. Non-coding RNAs in human infectious diseases

    Introduction

    Biogenesis of non-coding RNA

    Innate immune response elicited by pathogens

    Non-coding RNAs in viral infections

    Non-coding RNAs in bacterial infections

    Non-coding RNAs in other infections

    Predisposition to infectious diseases due to genetic polymorphism

    Non-coding RNAs as therapeutics for infectious diseases

    Conclusion

    Chapter 9. Therapeutic targeting non-coding RNAs

    Introduction

    Non-coding RNA-based therapeutics

    Conclusions and perspectives

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    125 London Wall, London EC2Y 5AS, United Kingdom

    525 B Street, Suite 1650, San Diego, CA 92101, United States

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    Copyright © 2023 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    ISBN: 978-0-323-90406-3

    For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Wolff, Andre G.

    Acquisitions Editor: Fisher, Michelle

    Editorial Project Manager: Mapes, Matthew

    Production Project Manager: Raviraj, Selvaraj

    Cover Designer: Christian J. Bilbow

    Typeset by TNQ Technologies

    List of contributors

    Ryan J. Andrews,     Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

    Nicole I. Anthony,     Department of Biological Sciences, Auburn University, Auburn, AL, United States

    Liliana Roxana Balahura (Stamat),     Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Alessandro Bonetti,     Translational Genomics, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden

    Alessia Corbelli,     Dipartimento di Biologia Ecologia Scienze Della Terra, Università Della Calabria, Rende, Italy

    Marieta Costache

    Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Research Institute of the University of Bucharest, Bucharest, Romania

    Sarah D. Diermeier

    Department of Biochemistry, University of Otago, Dunedin, New Zealand

    Amaroq Therapeutics Ltd., Auckland, New Zealand

    Sorina Dinescu

    Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Research Institute of the University of Bucharest, Bucharest, Romania

    Agnieszka Dzikiewicz-Krawczyk,     Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland

    Ota Fuchs,     Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic

    Marta Elżbieta Kasprzyk,     Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland

    Marta Kazimierska,     Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland

    Akhilesh Kumar,     Department of Biological Sciences, Laboratory of Immunology and Infectious Disease Biology, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal, Madhya Pradesh, India

    Himanshu Kumar

    Department of Biological Sciences, Laboratory of Immunology and Infectious Disease Biology, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal, Madhya Pradesh, India

    Laboratory of Host Defense, WPI Immunology, Frontier Research Centre, Osaka University, Osaka, Japan

    Andreea Daniela Lazar,     Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Megan P. Leask

    Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States

    Department of Physiology, University of Otago, Dunedin, New Zealand

    Phillip J. McCown,     Department of Internal Medicine - Nephrology, Michigan Medicine, University of Michigan, Ann Arbor, MI, United States

    Alexandra Elena Mocanu-Dobranici,     Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Walter N. Moss,     Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

    Simona Nazarie (Ignat),     Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Collin A. O'Leary,     Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

    Clay E. Pandorf,     Cell Biology and Physiology, Edward Via College of Osteopathic Medicine-Auburn, Auburn, AL, United States

    Simona Panni,     Dipartimento di Biologia Ecologia Scienze Della Terra, Università Della Calabria, Rende, Italy

    Jake M. Peterson,     Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

    Marta Podralska,     Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland

    Athira S. Raj,     Department of Biological Sciences, Laboratory of Immunology and Infectious Disease Biology, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal, Madhya Pradesh, India

    Roslyn Michelle Ray,     Gene Therapy Research, CSL Behring, Pasadena, CA, United States

    Warren B. Rouse,     Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

    Iuliana Samoilă,     Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Aida Şelaru,     Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania

    Weronika Sura,     Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland

    Joanna Sztuba-Solinska

    Department of Biological Sciences, Auburn University, Auburn, AL, United States

    Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland

    Van S. Tompkins,     Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

    Emma Catharina Walsh,     Translational Genomics, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden

    Chapter 1: History and definitions of ncRNAs

    Sarah D. Diermeier¹,² and Megan P. Leask³,⁴

    ¹ Department of Biochemistry, University of Otago, Dunedin, New Zealand

    ² Amaroq Therapeutics Ltd., New Zealand

    ³ Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States

    ⁴ Department of Physiology, University of Otago, Dunedin, New Zealand

    Abstract

    Recently, non-coding RNAs (ncRNAs) have received a lot of attention in the literature due to their involvement in a plethora of molecular roles, such as regulating gene expression and splicing or modulating protein activity. As reflected in the number of primary research articles published over the past 5–10 years, the spotlight has been in particular on some of the more recently discovered RNA biotypes, such as long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs). However, these are just the most recent developments in a long history of ncRNA research that extends some 60 years. Here, we describe the history of ncRNA research from the first findings of catalytically active RNA molecules in the 1950s to the discovery of pervasive transcription in genome-wide transcriptome studies, and the current state of research. We go on to describe and define the different classes of ncRNAs based on size and provide some examples of their functions in the cell.

    Keywords

    circRNA; eRNA; lncRNA; miRNA; ncRNA; piRNA; PROMPTs; rRNA; siRNA; snoRNA; snRNA; tRNA

    The C-value paradox—there is more to a genome than protein-coding genes

    The earliest hints at the widespread existence of non-coding RNAs stem from the C-value paradox, which was coined in the 1970s (Thomas, 1971). Around that time, it became apparent that the amount of DNA in a haploid genome, referred to as the C-value (constant value of a haploid genome), showed little correlation with the complexity of the organism. Frogs, for example, have larger genomes than humans, and single-celled amoebae have some of the largest genomes we know of, which seemed paradoxical to scientists at the time, who assumed that the higher organism should have the bigger genome. A few years later, an explanation for the C-value paradox arose as newer research suggested that much of the genome does not code for proteins. Calculations of the mutational load in the genome indicated that the human genome likely contains 20,000–30,000 genes (Ohno, 1972). In this context, genes were defined as transcriptional units ultimately translated to proteins. A few years later, DNA–RNA hybridization experiments further supported this estimate (Levin, 1980). Interestingly, this number is very close to the currently identified number of protein-coding genes in the human genome, just under 20,000 (19,955 according to GENCODE v38, November 2021). The remaining non-coding part of the genome was termed junk DNA under the initial assumption that it had no function at all. However, not all scientists agreed with this notion at the time, and some interest in the potential functionality of non-coding DNA remained. Early hypotheses included regulatory functions, RNA processing, a reservoir for evolutionary innovation, and genome integrity, among others (Yunis and Yasmineh, 1971; Britten and Davidson, 1971; Orgel and Crick, 1980; Lewin, 1982).

    As we now know, much of the non-coding genome does indeed have biochemical functionality, ranging from regulatory regions such as enhancers and promoters to transposons, repetitive elements and, of course, non-coding RNA (ncRNA) genes (ENCODE Project Consortium et al., 2007, 2020). This chapter explores the history of ncRNA research (Fig. 1.1), which contributes to our current understanding of the C-value paradox.

    Protein synthesis, splicing, and RNA with catalytic function—the early days of ncRNA research

    A decade before the C-value paradox was formulated, the very first ncRNA species were discovered. In 1958, transfer RNAs (tRNAs) (Hoagland et al., 1958) and ribosomal RNAs (rRNAs) (Crick, 1958) were described, both of which are essential for protein synthesis. Not only were these discoveries crucial to forming the Central Dogma of Molecular Biology (the flow of information from a gene to its corresponding protein), earning Watson and Crick their Nobel Prize in 1962, but these ncRNAs also demonstrated for the first time that RNA is an important functional molecule in the cell. Three years later, in 1961, three papers published within the same month described the protein-coding messenger RNA (mRNA), the third and final component needed to understand the Central Dogma (Brenner et al., 1961; Gros et al., 1961; Jacob and Monod, 1961).

    Figure 1.1  A schematic of the history of ncRNA discoveries from 1950 to the present day. Discoveries that led to the award of Nobel Prizes are shown in gold.

    Some 20 years later, one of the earliest discoveries that started to provide some explanation for the C-value paradox was the identification of intronic sequences in eukaryotes, which account for a substantial proportion (25%) of non-coding DNA (Jo and Choi, 2015). In 1977, two labs independently discovered split genes in eukaryotes (Chow et al., 1977; Berget et al., 1977). In bacteria, mRNAs are a perfect, sequence-complementary copy of the original DNA sequence. Eukaryotic genes, however, contain intervening sequences, called introns, that are removed from the primary transcript by splicing and are not part of the coding sequence that is translated. Some organisms have large intronic sequences, which correlate with larger genomes but not necessarily with more coding genes (Vinogradov, 1999). Using nucleic acid hybridization, both teams showed that when an mRNA was hybridized to the corresponding genomic DNA (forming structures termed R loops), the DNA looped out in regions with no counterpart in the mRNA, indicating the presence of an intron. For this discovery, Richard Roberts and Philip Sharp were awarded the Nobel Prize in Physiology or Medicine in 1993. In the 1980s, scientists showed that these intronic sequences were removed by a novel class of ncRNAs functioning in a ribonucleoprotein (RNP) complex (Black et al., 1985; Chabot et al., 1985), which was named the spliceosome in the same year (Brody and Abelson, 1985). This unexpected class of ncRNAs was in fact first identified in 1966 through gel electrophoresis (Hadjiolov et al., 1966) and originally termed U-RNAs due to their uridine content; these RNAs are now known as the small nuclear RNAs (snRNAs) (Busch et al., 1982) (described in more detail in the section Small nuclear RNAs).

    In the late 1970s and 1980s, a large number of studies found that many different ncRNAs are complexed in RNPs (Lerner et al., 1981; Lerner and Steitz, 1979; Reimer et al., 1987; Kedersha and Rome, 1986), some of which have catalytic function, including rRNAs in the ribosome and snRNAs in the spliceosome. These RNA enzymes, or ribozymes, catalyze chemical reactions that are critical for life. In both the ribosome and the spliceosome, the RNA component is sufficient for the enzymatic reaction, while proteins are the structural units that support and stabilize the RNA core. Although Woese, Crick, and Orgel were the first to suggest that RNA could act as an enzymatic catalyst in 1967 (Woese, 1967), the general discovery of RNAs with catalytic functions is credited to Thomas Cech's group, who found a self-splicing intron in the rRNA locus of Tetrahymena and coined the term ribozyme (Cech et al., 1981; Kruger et al., 1982). Despite extensive efforts of the group to remove proteins from the experiment, the splicing reaction kept occurring, leaving catalytic RNA as the only possible explanation. Thomas Cech went on to win the 1989 Nobel Prize in Chemistry together with Sidney Altman for establishing the catalytic properties of RNA. It took another decade to show that the ribosome is indeed a ribozyme as well. The conclusion that rRNA has catalytic function could only be made once the RNP structure of the large subunit was solved, which revealed that there are no proteins in the active center where peptide bond formation occurs (Ban et al., 2000). Similarly, once the structure of the spliceosome was revealed, this RNP was classed as a protein-directed metalloribozyme (Shi, 2017). While RNA-catalyzed protein synthesis is a universal principle across all kingdoms of life, the ncRNA field did not progress much further at the time, as researchers assumed ncRNAs to be, for the most part, unstable intermediates.

    Nonetheless, the discoveries of the first ncRNA genes and intronic sequences explained the C-value paradox in part, but there was still a lot to be discovered over the next decades.

    Pervasive transcription

    In the 1970s, more hints emerged that some of the junk DNA was actually being transcribed. In mammals, a new class of RNAs was discovered as being transcribed from repetitive and heterochromatic regions, as well as >20% of non-repetitive regions, and termed heterogeneous nuclear RNA (hnRNA). As the name implies, about half of these hnRNAs were restricted to the nucleus (Holmes et al., 1972; Pierpont and Yunis, 1977). In 1975, nucleic acid hybridization reassociation kinetics, or cot curves, showed that there is 10-fold more hnRNA in the cell than mRNA (Hough et al., 1975). In 1980, cot curves further demonstrated a 10-fold greater complexity of nuclear compared to cytoplasmic polyadenylated RNA (Holland et al., 1980).

    While these findings were intriguing, it was not until new technologies such as microarrays and high-throughput sequencing emerged in the 2000s that the genomics era revealed just how widespread or pervasive transcription in eukaryotic genomes is and elucidated the many different ncRNA classes comprising hnRNA. Multiple landmark studies demonstrated independently that genomes contain a lot more transcripts than expected based on existing annotations, dramatically reducing the percentage of junk or non-functional/non-transcribed DNA. In 2002, tiling arrays identified widespread unannotated transcription (Kapranov et al., 2002). In the same year, several studies detected widespread ncRNA and antisense transcription by sequencing full-length cDNAs (Chen et al., 2002; Saha et al., 2002; Okazaki et al., 2002). The first whole-genome transcriptome mapping experiments were completed in 2003–04 for Drosophila, Arabidopsis, and human (Stolc et al., 2004; Yamada et al., 2003; Bertone et al., 2004) and many of the newly discovered transcripts comprised numerous different biotypes of RNA, including previously unidentified protein-coding genes, new splice isoforms of protein-coding genes, and overlapping transcripts on both strands, leading to the discovery of extensive antisense transcription. In a landmark paper in Science in 2005, >70% of all mammalian sense transcripts were found to have antisense partners (Katayama et al., 2005). Furthermore, these early transcriptomics studies hinted at the existence of a vast abundance of transcripts lacking open reading frames (ORFs) of significant length. These ncRNAs appeared to be more tissue-specific than mRNAs, and many of them were expressed in a tightly controlled manner throughout development or associated with disease (Inagaki et al., 2005; Ravasi et al., 2006; Sasaki et al., 2007).

    Several international consortia were formed at around the same time, producing large-scale datasets from many different cell and tissue types in human and mouse. These large groups of scientists set out to identify and characterize all functional elements in the human and mouse genomes, taking a swing at the antiquated concept of junk DNA. Functional ANnoTation Of the Mammalian genome (FANTOM) was established in 2000 at RIKEN, Japan, to generate an atlas of mouse transcripts (Kawai et al., 2001; Carninci et al., 2005). Over the years, FANTOM evolved and expanded, with the two latest projects, FANTOM5 and FANTOM6, focusing specifically on cataloging mammalian ncRNAs such as microRNAs (miRNAs) and long ncRNAs (lncRNAs), and on functionally characterizing lncRNAs, respectively (Hon et al., 2017; de Rie et al., 2017; Ramilowski et al., 2020). FANTOM data is publicly available and provides an invaluable resource for the scientific community. Another international consortium, the Encyclopedia of DNA Elements (ENCODE), was launched in 2003 by the US National Human Genome Research Institute to identify all functional elements of the human genome. They went on to catalog all existing human and mouse transcripts and determined their subcellular localizations (ENCODE Project Consortium et al., 2007; Djebali et al., 2012). Their findings revealed that only 1%–2% of the human genome encodes proteins, while 60%–75% of the mammalian genome can be transcribed in a context- and cell type-specific manner, with up to 80% of the genome being transcribed or having regulatory function. In particular, the number of short/small RNAs and long poly-adenylated ncRNAs expanded significantly due to the efforts of the consortium. Like FANTOM, ENCODE makes its data publicly available and continues to release regular updates, with the latest major release in 2020 (ENCODE Project Consortium et al., 2020).

    In addition to direct detection of RNA transcripts, further evidence for pervasive transcription emerged from epigenetics studies such as genome-wide histone modification profiles and chromatin accessibility assays that confirmed the existence of many actively transcribed non-coding loci. In 2004, chromatin immunoprecipitation coupled to tiling microarrays (ChIP-chip) showed an unexpected abundance of mammalian promoters, many of which were associated with ncRNAs (Cawley et al., 2004). ENCODE continues to extend these initial epigenetic findings by integrating transcriptomics with multi-omics datasets, including DNA methylomes, histone modifications, chromatin accessibility, chromatin organization in 3D, transcription factor occupancies, and RNA binding proteins, creating an ever-expanding repository of functional annotations (ENCODE Project Consortium et al., 2020).

    The discovery of pervasive transcription provided some of the answers for the C-value paradox proposed over 30 years earlier. In 2012, ENCODE reported that 80.4% of the genome had biochemical function (ENCODE Project Consortium, 2012). Some scientists suggest that all or most transcripts that have been detected are in fact functional in one cell state or developmental context or another (Mattick et al., 2010; Pennisi, 2012). However, pervasive transcription is not an unchallenged concept, with some scientists continuing to argue that the C-value paradox remains unsolved and most of the genome consists of junk DNA. This opinion is based on the observation that many ncRNAs are expressed at very low levels, leading to the classification of some non-coding transcripts as non-functional products of transcriptional noise, such as spurious transcription from randomly initiating or leaky RNA polymerases. Detection limits of the technologies used have been criticized as well. In addition, many ncRNAs are not conserved across species, which leads some experts to question whether they can have function (Eddy, 2012; Graur et al., 2013; Palazzo and Gregory, 2014). On the other hand, ncRNAs may be evolutionarily new and highly species-specific, and sequence conservation alone may not be sufficient to assign function to an RNA, as structure or genomic location may be equally or more important. Nevertheless, it has been suggested multiple times that the null hypothesis when studying a previously uncharacterized ncRNA should be that it has no function, and that experiments should be designed to conclusively demonstrate whether a novel transcript has a molecular function in the given context. Thus, although large-scale studies by the FANTOM and ENCODE consortia have produced huge catalogs of ncRNAs, only a small percentage of the discovered transcripts have been assigned function so far, with new molecular functions being revealed at a steadily increasing rate.

    We will go on to review landmark discoveries and define ncRNAs based on their size (small, mid-sized, and long, see Fig. 1.1).

    Small ncRNAs

    miRNA, siRNA, and RNAi

    One of the best-studied classes of ncRNAs is the miRNAs, the first of which was discovered in 1993 in the nematode Caenorhabditis elegans in Victor Ambros' laboratory (Lee et al., 1993). The group observed that the protein LIN-14 had to be downregulated for C. elegans larvae to progress from the first (L1) to the second (L2) stage. LIN-14 downregulation was dependent on the gene lin-4; however, it was discovered that the lin-4 transcript is not translated into a protein but instead produces two small RNAs of 21 and 61 nucleotides (nts) in length. The longer RNA was found to be a precursor for the shorter RNA. Interestingly, they and others showed that the short RNA derived from the lin-4 transcript was sequence complementary to the 3′ untranslated region (UTR) of lin-14 mRNA (Lee et al., 1993; Wightman et al., 1993). It turned out that the binding of the short lin-4 RNA to lin-14 mRNA results in the downregulation of LIN-14 at the protein level and that this is essential for developmental progression. Initially, this example of gene regulation by a short ncRNA was believed to be exclusive to C. elegans. In 2000, another small RNA involved in C. elegans larval development was discovered: let-7 (Reinhart et al., 2000; Slack et al., 2000). Importantly, let-7 homologs were discovered in many other organisms, including humans (Pasquinelli et al., 2000), and it became increasingly clear in the following years that miRNAs were indeed a large class of ncRNAs, important for gene regulation in many eukaryotes.

    miRNAs are defined as short/small (19–24 nts) ncRNAs that are derived from a longer precursor miRNA. Unlike many of the earlier discovered ncRNAs such as rRNAs, tRNAs, and snRNAs, miRNAs are not necessarily expressed ubiquitously but can be restricted to certain tissue types or developmental stages. Since the early findings in the 1990s, thousands of miRNAs have been described in the literature, and a microRNA database (miRBase) was set up in 2003 as a searchable repository for all miRNAs (Griffiths-Jones, 2004), with the current release (v22.1) containing almost 40,000 miRNAs across 271 organisms (Kozomara et al., 2019). According to GENCODE, 1879 miRNAs are annotated in the current version of the human genome (v38, November 2021) and 2201 in mouse (M27, November 2021). miRNAs have also been found to be associated with various diseases such as cancer, where they can act as oncogenes or tumor suppressors (Iorio and Croce, 2012).

    miRNAs function to trigger gene silencing via RNA interference (RNAi) (Fig. 1.2), which was initially reported in plants (Napoli et al., 1990; Romano and Macino, 1992) and later observed in C. elegans by Guo and Kemphues (Guo and Kemphues, 1995). However, the true mechanism of RNAi remained elusive until Andrew Fire and Craig Mello deciphered it somewhat serendipitously in 1998 (Fire et al., 1998) and were subsequently awarded the 2006 Nobel Prize in Physiology or Medicine for their efforts (Fig. 1.1). In their seminal 1998 paper, they observed that double-stranded RNA (dsRNA) was 10–100 times more effective at silencing the target mRNA unc-22 than single-stranded RNA (ssRNA). Indeed, ssRNA only silenced unc-22 when both sense RNA and antisense RNA were injected. Thus, dsRNA was sufficient to cause systemic silencing of the target mRNA and was identified as the trigger of RNAi; however, the exact mechanisms and key players in small ncRNA function took several more years to define (Tomari and Zamore, 2005). In 2000, two separate teams identified the functional silencing intermediates: small interfering RNAs (siRNAs), a class of RNA of 21–23nt in length that co-purified with the sequence-specific nuclease. Thus, it was proposed that siRNAs degrade mRNA when incorporated into the RNA-induced silencing complex (RISC) (Hammond et al., 2000; Zamore et al., 2000).

    Figure 1.2  A schematic outlining the biogenesis and function of miRNAs, siRNAs, and piRNAs. The left panel depicts RNAi. miRNAs are transcribed from the genome as primary miRNAs, whereas siRNAs derive from exogenous dsRNA. After transcription, processing by the microprocessor containing Drosha (black), and export from the nucleus via Exportin5 (purple) (miRNA pathway), or after exogenous introduction of dsRNA (siRNA pathway), Dicer (dark gray) cleaves the dsRNA into RNA duplexes. Subsequently, the duplexes are bound by RISC containing Ago2 (light blue). The Ago2 slicer cleaves the RNA duplex and, via complementarity to the mRNA, targets the mRNA for cleavage, degradation, and/or translational repression. The right panel is a schematic outlining piRNA transcription, biosynthesis, and function. From piRNA gene clusters, a piRNA precursor transcript is transcribed and exported from the nucleus, where it is bound by Ago3 and cleaved. Upon cleavage, the piRNA interacts with Aub, where it enters either the ping-pong amplification step of piRNA biogenesis or the primary piRNA biogenesis step at the interface of the mitochondria in Yb bodies. After biogenesis, the piRNA is transported back into the nucleus, where it silences genes. The piRNA can also remain in the cytoplasm, where it can modify gene translation via mRNA deadenylation, cleavage, and stability.

    Since its discovery, the phenomenon of RNAi has been a key tool for gene knockdown experiments in molecular biology and has also led to the development of RNA therapeutics. The first siRNA drug, Patisiran, was approved by the FDA for the treatment of hereditary transthyretin amyloidosis (hATTR) in 2018 (Wood, 2018). While Patisiran was the first approved siRNA drug, other RNA therapeutics paved the way, with Fomivirsen as the first approval in 1998. Fomivirsen is a first-generation antisense oligonucleotide (ASO) targeting the cytomegalovirus (CMV) IE-2 mRNA for treatment of CMV retinitis. Although ASOs do not work through RNAi, they work on the same principle of targeting mRNA for cleavage, albeit via RNase H. Numerous other ASOs and siRNAs have been developed since, which target a myriad of inherited human diseases, some of which are fatal if untreated. One of the most groundbreaking ASO treatments developed to date is Spinraza, an ASO developed to treat spinal muscular atrophy (SMA). Spinraza was FDA approved in 2016 and corrects skipping of exon 7 in SMN2, leading to full-length, functional SMN protein (Rigo et al., 2012; Cartegni and Krainer, 2003; Chiriboga et al., 2016).

    miRNAs are endogenous and encoded in the genome as an RNA stem-loop structure (pri-miRNAs) (Ambros et al., 2003; Lee and Ambros, 2001; Lagos-Quintana et al., 2001), whereas siRNAs are synthetic or are excised from long, fully complementary dsRNAs of other exogenous origin, such as viruses or transposons (Ambros et al., 2003; Zamore et al., 2000; Hammond et al., 2000). Nevertheless, the size similarities and sequence-specific inhibitory functions of miRNAs and siRNAs indicate that they are related in biogenesis and function (Zeng et al., 2003). The detailed mechanism of how siRNAs and miRNAs function in RNAi along with RISC is outlined in Fig. 1.2. The miRNA pathway begins with transcription of the primary miRNA (pri-miRNA) encoded in the genome (75–110nts). After transcription, the pri-miRNA base-pairs with complementary sequences in other regions of the same molecule to form a double-stranded RNA structure defined as the hairpin. The microprocessor complex, which contains Drosha, excises the hairpin structure (Yoontae Lee et al., 2003; Zeng et al., 2003), resulting in the precursor miRNA (pre-miRNA) that is actively transported from the nucleus to the cytoplasm by exportin (Yi et al., 2003; Lund et al., 2004). The siRNA and miRNA pathways converge at this point, where the exogenous dsRNA (siRNA pathway) and pre-miRNA (miRNA pathway) are bound by Dicer, which cleaves the dsRNA and pre-miRNA into the 21–25nt siRNA and miRNA duplexes, respectively (Bernstein et al., 2001; Knight and Bass, 2001; Grishok et al., 2001; Ketting et al., 2001; Hutvágner and Zamore, 2002). The cleaved duplexes are then loaded onto Argonaute (AGO) proteins and incorporated into RISC as single-stranded RNAs (Rivas et al., 2005). Finally, this RISC:ssRNA identifies target messages via complementary sequences, leading to gene repression via a number of different mechanisms. In its canonical role in RISC, the slicer Ago2 cleaves the mRNA (Liu et al., 2004; Martinez et al., 2002), leading to its degradation via deadenylation of the mRNA poly(A) tail; however, RISC can also inhibit translation by blocking translation initiation (Pillai et al., 2005), and small RNAs and RISC also function in heterochromatin formation (Reinhart and Bartel, 2002; Volpe et al., 2002).
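
    The target-recognition step described above, in which a single-stranded guide RNA in RISC finds messages carrying a complementary sequence, can be illustrated with a minimal Python sketch. It assumes perfect Watson-Crick pairing over the whole guide, which is a simplification of real miRNA targeting; the function names and both sequences are hypothetical and chosen only for illustration.

```python
# Watson-Crick complement table for RNA (G:U wobble pairs are ignored here).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_complementary_sites(guide: str, target: str) -> list[int]:
    """Return 0-based start positions in `target` that pair perfectly with `guide`."""
    site = reverse_complement(guide)
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        # Continue one position further so overlapping sites are also found.
        start = target.find(site, start + 1)
    return positions


# Hypothetical guide and target, not real lin-4 or lin-14 sequences.
guide = "AUGGCUAC"
target = "CC" + "GUAGCCAU" + "AAGG" + "GUAGCCAU" + "UU"
sites = find_complementary_sites(guide, target)
print(sites)
```

    In this toy target the guide's reverse complement occurs twice, so two candidate sites are reported. Real target prediction tolerates mismatches and weighs additional features, which this sketch does not attempt.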

    piRNAs

    At the same time as the discovery of RNAi, a novel class of long siRNAs (first termed repeat-associated siRNAs (rasiRNAs) because they originated from repetitive elements such as transposable sequences of the genome) was identified in Drosophila testis and found to silence the gene Stellate on the X-chromosome (Aravin et al., 2001). Later, in 2006, Aravin and colleagues as well as three other studies reported their discovery in mammalian genomes (Aravin et al., 2006; Girard et al., 2006; Lau et al., 2006; Grivna et al., 2006), and they were renamed piRNAs because of their interaction with the PIWI (P-element Induced WImpy testis) protein family. piRNAs are distinct from miRNAs and siRNAs (Vagin et al., 2006), specific to animals, and by definition short (26–31nts) ncRNAs that are expressed specifically in the germline. Unlike miRNAs and siRNAs, they are processed from long single-stranded precursors, the large majority of which are generated from piRNA clusters, although they can also be derived from protein coding genes, transposons, tRNA, rRNA, and intergenic loci including lncRNA (Aravin et al., 2006; Girard et al., 2006). In contrast to miRNA and siRNA, the processing of these precursors is independent of the Dicer/Drosha mechanisms. The exact mechanisms of piRNA biogenesis in humans are unclear, and most of the information has been gained from studies carried out in Drosophila; however, these processes are likely similar in humans (Williams et al., 2015; Rouget et al., 2010) (Fig. 1.2).

    After export from the nucleus (ElMaghraby et al., 2019; Kneuss et al., 2019), piRNAs interact with the germline-specific PIWI clade of Argonaute proteins, forming a ribonucleoprotein complex (piRISC) analogous to miRNA-RISC and siRNA-RISC. This piRISC cleaves the complementary piRNA precursor transcript, generating a piRNA intermediate that interacts with the PIWI protein Aubergine (Aub). After interacting with Aub, further piRNA biogenesis occurs via two interconnected mechanisms: phasing and amplification (Fig. 1.2). Briefly, primary biogenesis via phasing occurs in Yb bodies at the outer membrane of the mitochondria (Ge et al., 2019; Huiyan Huang et al., 2011; Haidong Huang et al., 2014; Watanabe et al., 2011) and generates de novo piRNAs, increasing piRNA diversity. Secondary piRNA processing via amplification (Brennecke et al., 2007; Gunawardane et al., 2007) results in reciprocal cleavage of the piRNA via PIWI proteins, increasing the available pool of certain piRNAs for gene silencing (Ramat and Simonelig, 2021). piRNAs have been implicated in gene silencing mainly through their role in transposon silencing (Brennecke et al., 2008; Khurana et al., 2011), but they also function to regulate gene expression via mRNA deadenylation, cleavage, and stability (Rojas-Ríos et al., 2017; Ma et al., 2017; Barckmann et al., 2015; Gou et al., 2014; Rouget et al., 2010; Zhang et al., 2015; Ramat and Simonelig, 2021).

    Medium ncRNAs

    Non-coding RNAs are often divided into just two categories (small/short and long), with a somewhat arbitrary threshold of 200nt in length to distinguish the two types (Brosnan and Voinnet, 2009). However, some publications suggest that mid- or medium-sized ncRNAs should be in an intermediate category of their own (Boivin et al., 2019). Here, we define medium ncRNAs as ranging between ∼50 and 200nt and of diverse regulatory functions, including important structural ncRNAs such as tRNAs, rRNAs, and snoRNAs.
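
    The three size classes used in this chapter can be written down as a simple rule. The following Python sketch is only an illustration of the approximate thresholds given above (∼50nt and 200nt); the function name is hypothetical, the example lengths are illustrative rather than measured values, and real ncRNA classes overlap these boundaries.

```python
def classify_ncrna_by_length(length_nt: int) -> str:
    """Assign a size class using the approximate thresholds from the text."""
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    if length_nt < 50:
        return "small"
    if length_nt <= 200:
        return "medium"
    return "long"


# Illustrative lengths in nt; the lncRNA value is an arbitrary example.
examples = {"miRNA": 22, "piRNA": 28, "tRNA": 76, "snRNA": 150, "lncRNA": 2300}
for name, length in examples.items():
    print(f"{name}: {length}nt -> {classify_ncrna_by_length(length)}")
```

    Because the cutoffs are conventions rather than biological boundaries, a length-based label of this kind is a first approximation only.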

    Transfer RNAs (tRNAs) and tRNA fragments

    The most abundant class of medium ncRNAs is the tRNAs, with 587 genes in the human genome as defined by the HUGO Gene Nomenclature Committee (HGNC, www.genenames.org) (Seal et al., 2020). Originally identified in 1958 as described above, tRNAs serve as adaptor molecules between the mRNA codon and the corresponding amino acid during protein synthesis in the ribosome, translating the genetic code from mRNA into a protein sequence. tRNAs vary in length from 73 to 93nts, contain many modified nucleotides, and fold into a well-characterized cloverleaf structure (Giegé, 2008). In addition to their essential and highly conserved canonical function in translation, tRNAs have more recently been implicated in a number of metabolic pathways, from gene regulation in response to nutritional stress to cell wall biosynthesis and antibiotic synthesis (Raina and Ibba, 2014; Avcilar-Kucukgoze and Kashina, 2020). Recently, tRNAs have gained renewed attention as a myriad of tRNA-derived small ncRNAs, including tRNA-derived fragments (tRFs) and tRNA halves (tiRNAs), were identified in high-throughput sequencing experiments. An emerging body of research suggests that tRFs are not just degradation products of tRNAs but are produced through precise, defined cleavage events and can have functions in gene expression, translation, and the cell cycle (Schorn et al., 2017; Xie et al., 2020).

    Ribosomal RNAs (rRNAs)

    As described above, rRNA is the primary component of ribosomes, which are the catalytic enzymes underpinning the translation of proteins. Many copies of rRNA genes are encoded in the human genome (Seal et al., 2020) at the 5S rRNA cluster (transcribed by RNA Pol III) and at five 47S rRNA loci (47S is a multi-cistronic precursor to the mature 28S, 5.8S, and 18S rRNAs), which are transcribed by RNA Pol I. These rRNAs are further processed by snoRNA-RNPs (described in the section Small nucleolar RNAs). Although their canonical role as essential constituents of the ribosome is very well defined, emerging evidence indicates that miRNA sequences exist within rRNA; these are termed rRNA-hosted miRNA analogs and might be important in stress conditions and development (Yoshikawa and Fujii, 2016; Locati et al., 2018; Yunwei Shi et al., 2019; Mangrauthia et al., 2018).

    Small nuclear RNAs (snRNAs)

    In 1979, Lerner and colleagues showed that small nuclear RNAs (also described as U-RNAs) are complexed with proteins in RNPs recognized by antibodies from patients with systemic lupus erythematosus (SLE) (Lerner and Steitz, 1979). This finding would form the basis of the work carried out by Thomas Cech and Sidney Altman which led to their 1989 Nobel Prize as discussed above, but was also the first work in a slew of manuscripts defining many of the mid-sized RNA moieties. By definition, snRNAs are nuclear in nature, ∼150nt in length, extensively modified like tRNAs and rRNAs, and functionally distinct from the small nucleolar RNAs defined below. snRNAs always form RNP complexes referred to as snRNPs (pronounced snurps), and the most common snRNAs are U1, U2, U4, U5, and U6, which are highly conserved among eukaryotes and are all components of the spliceosome, involved in the splicing of introns from pre-mRNAs (Bohnsack and Sloan, 2018; Villa et al., 2002). Additionally, some studies suggest non-canonical functions of snRNAs, such as the regulation of gene expression and mRNA processing (Ideue et al., 2012).

    Small nucleolar RNAs (snoRNAs)

    Small nucleolar RNAs (snoRNAs) function in the nucleolus, the largest substructure of the nucleus, where they play multiple roles in ribosome biogenesis, such as modifying rRNA, guiding pre-rRNA processing, and acting as molecular chaperones. snoRNAs are abundant, ∼60–300nt long, and some of the most functionally diverse trans-acting ncRNAs currently known. Similar to rRNAs and snRNAs, snoRNAs form RNPs with proteins to exert their functions (Filipowicz and Pogacić, 2002). The first human snoRNA was in fact first classified as a snRNA and thus named U3 (SNORD3), identified in 1976 via biochemical fractionation assays as the most abundant small nuclear RNA in HeLa cells (Zieve and Penman, 1976). About a decade later, U3 was found to form an RNP, which was targeted by autoantibodies in a patient with scleroderma (Reimer et al., 1987). More specifically, the autoantibodies targeted fibrillarin, a protein binding to U3. These antibodies proved useful in the identification of many other snoRNAs that also bound fibrillarin, including U8 (SNORD118), U13 (SNORD13), U14 (SNORD14), and U15 (SNORD15) in humans over the following years (Tyc and Steitz, 1989; Tycowski et al., 1993). Based on the common sequence motif and secondary structure enabling fibrillarin binding, these snoRNAs were classified as C/D-box snoRNAs. The two motifs after which they are named, the C box (RUGAUGA) and the D box (CUGA), together with a short stem loop, constitute a kink-turn (K-turn) structural motif that is recognized by the snoRNP protein fibrillarin. C/D-box snoRNAs can also contain internal motifs that are frequently imperfect copies of the C and D box motifs (Tamás Kiss, 2002).

    Another class of snoRNAs was discovered independently, the H/ACA-box snoRNAs (Kiss and Filipowicz, 1993; Ruff et al., 1993; Ganot et al., 1997). Like C/D-box snoRNAs, they are 60–300nts in length, often originate from intronic regions in mRNAs or other ncRNAs, but they bind to different protein partners and serve different functions in the cell (Balakin et al., 1996). H/ACA-box snoRNAs contain two motifs—the H (ANANNA) and the ACA (ACANNN) boxes—and fold into a hairpin-hinge-hairpin-tail structure. The 5′ and/or 3′ hairpin also consists of an internal loop pocket where the substrate RNAs are bound.
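
    The consensus motifs quoted above for the two snoRNA classes can be expressed as regular expressions. The Python sketch below scans a sequence for the C box (RUGAUGA, where R is A or G), D box (CUGA), H box (ANANNA), and ACA box (ACANNN) as written in the text. The function name and the toy sequence are hypothetical, and a motif hit alone is not evidence of a snoRNA: short patterns such as CUGA occur frequently by chance, and the required secondary structure is not checked here.

```python
import re

# Consensus motifs from the text, written as regular expressions over RNA.
MOTIFS = {
    "C box": "[AG]UGAUGA",             # RUGAUGA, R = purine (A or G)
    "D box": "CUGA",
    "H box": "A[ACGU]A[ACGU][ACGU]A",  # ANANNA
    "ACA box": "ACA[ACGU]{3}",         # ACANNN
}


def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif, including overlapping hits."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input as well
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets the regex engine report overlapping occurrences.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", rna)]
    return hits


# Toy sequence built for illustration; it is not a real snoRNA.
toy = "GGAUGAUGA" + "CCCCCCCC" + "UGAGG"
hits = find_motifs(toy)
for name, positions in hits.items():
    print(name, positions)
```

    On the toy sequence only the C box and D box patterns are found, consistent with how it was constructed.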

    snoRNAs have been shown to act by modifying other ncRNAs, more specifically by posttranscriptionally 2′-O-methylating and pseudouridylating rRNA and snRNA molecules (Tamás Kiss, 2002). While C/D-box snoRNAs are responsible for 2′-O-methylation, H/ACA-box snoRNAs mediate pseudouridylation (Kiss and Filipowicz, 1993; Tamás Kiss, 2002). snoRNAs are present in archaea as well as in eukaryotes, indicating that they arose over 2–3 billion years ago. Currently, 943 snoRNAs are annotated in the human genome (GENCODE v38, November 2021). With only 100–200 rRNA sites known to carry snoRNA-mediated modifications, the number of identified snoRNAs far exceeds what would be expected based on known modified ncRNA sites (Bachellerie et al., 2002). In addition, many snoRNAs lack sequence complementarity to potential rRNA or snRNA targets, and their localization is not restricted to the nucleolus. Therefore, these orphan snoRNAs are likely involved in other molecular mechanisms, such as modifying mRNAs and other RNA biotypes, impacting splicing, serving as precursors of miRNAs, or mediating chromatin accessibility. One of the more well-studied examples of an orphan snoRNA is the brain-specific C/D-box snoRNA HBII-52, which has been described to modulate alternative splicing of the transcript encoding the serotonin receptor. Patients with the genetic imprinting disorder Prader–Willi syndrome lack HBII-52, resulting in different serotonin receptor isoforms and, ultimately, in altered serotonin sensitivity (Kishore and Stamm, 2006; Sahoo et al., 2008). A 2012 study showed that specific snoRNAs can mediate chromatin accessibility in Drosophila cells (Schubert et al., 2012). There are likely many other functions of snoRNAs yet to be unveiled.

    snaRs, Y RNAs, and vault RNAs

    Worthy of mention is the discovery of the more obscure and less studied examples of mid-sized RNAs (Y-RNA, snaRs, and vault RNAs) that are not very abundant and have ill-defined functions. Of these, small NF90-associated RNAs (snaRs) are by far the least characterized category of ncRNA having only been outlined in a handful of published manuscripts (Parrott and Mathews, 2009, 2007; Mathews and Parrott 2008; Parrott et al., 2011). They were first identified in 2007 after they immunoprecipitated with antibodies against NF90 (a protein product derived from ILF3). This combined with the fact that they also bind ribosomes suggests that they play a role in translational control; however, their function remains unclear. They are transcribed by RNA polymerase III, are only present in great apes, and have undergone rapid evolution, some specific to humans. There are 28 snaRs in the human genome (Seal et al., 2020).

    Shortly after the discovery of snRNAs from the nuclear extract of SLE patients (Lerner and Steitz, 1979) (described above in the section Small nuclear RNAs), the same scientists identified a second set of ncRNA using whole cell extract (Lerner et al., 1981). These ncRNAs were termed Y-RNAs because they were mostly cytoplasmic which differentiated them from the nuclear localized snRNAs and nucleolar snoRNAs. These RNAs complexed with the Ro60 protein, which is clinically important in patients with rheumatic disease, SLE and Sjögren's syndrome. There are only four Y-RNAs in the human genome which are all encoded at 7q36.1 and transcribed by RNA polymerase III (Seal et al., 2020). They are approximately 100 nucleotides in length and have a distinctive secondary structure, containing a stem formed from the base pairing of the 5′ and 3′ ends which includes the Ro60 binding site. Although our knowledge on the function of Y-RNAs is incomplete, they appear to influence subcellular location of Ro60 (Sim et al., 2009, 2012) and its ability to bind misfolded RNAs (Fuchs et al., 2006; Stein et al., 2005).

    In 1986, a novel class of mid-sized RNA termed vault RNAs (vRNAs) was discovered as part of the largest known RNP complexes, called vaults (Kedersha and Rome, 1986). The function of these vault complexes (named as such due to similarities to the arches found in the vaults of cathedrals) is not well understood; however, in response to external stimuli, they translocate to different subcellular compartments and are thought to mediate shuttling processes between cytoplasm and nucleus (van Zon et al., 2003; Hahne et al., 2021). The vault protein TEP1 that binds vRNAs is similar to Ro60, which binds Y-RNAs, implying that these two RNAs might be evolutionarily related (Bateman and Kickhoefer, 2003; Kickhoefer et al., 2001). Vault RNAs are only found in higher eukaryotes. In humans, four vRNAs are encoded on chromosome 5 at two different genomic locations and are transcribed by RNA polymerase III (Seal et al., 2020). Although named vault RNAs, 95% of these RNAs do not associate with vaults, and even 30 years after their discovery, the molecular functions of vault RNAs are not well defined (Hahne et al., 2021). A recent study suggests that vault RNA1-1 is a riboregulator of autophagy, which may open new avenues of protein posttranslational regulation by vault RNAs (Horos et al., 2019).

    Long non-coding RNAs

    Early discoveries of lncRNAs

    To distinguish small and medium ncRNAs (see the sections Small ncRNAs and Medium ncRNAs) from long non-coding RNAs (lncRNAs), a cutoff of 200nt in length is generally accepted in the literature. In the pre-genomics era, the first human lncRNA to be identified was H19, a paternally imprinted lncRNA gene that was identified as one of the highest expressed transcripts in embryos, but silenced in most tissues at birth (Davis et al., 1987; Brannan et al., 1990; Bartolomei et al., 1991). H19 is reciprocally imprinted with its adjacent protein coding gene Igf2. Several functional mechanisms have been proposed for H19, which predominantly localizes to the cytoplasm, including its role as a precursor for miR-675 (Yoshimizu et al., 2008) but also as a competitive endogenous RNA (ceRNA), a molecular
