Navigating Non-coding RNA: From Biogenesis to Therapeutic Application
About this ebook
Navigating Non-coding RNA: From Biogenesis to Therapeutic Application provides a concise overview of the field of non-coding RNA (ncRNA). Chapters cover the history of discoveries in the area of ncRNA and specific types of ncRNA: housekeeping ncRNAs such as ribosomal RNA, transfer RNA, small nuclear RNA and telomerase RNA, and regulatory ncRNAs such as microRNA, small interfering RNA, long non-coding RNA and Y RNA. The biogenesis, structure, function, and regulation of each of these are also explored, along with traditional and cutting-edge methods for the identification, functional characterization and structural characterization of ncRNA.
The book also focuses on the different types of epitranscriptomic modifications and their involvement in regulating ncRNA structure, stability and intermolecular interactions in addition to the role of ncRNAs in a range of diseases and potential therapeutic applications.
- Covers a wide range of non-coding RNAs, including ribosomal RNA, transfer RNA, telomerase RNA, microRNA, small interfering RNA and circular RNA
- Features both traditional and novel methodologies for investigating ncRNA, from microarray and conventional chemical probing to CAGE-seq and computational methods
- Includes chapters on ncRNAs in a range of diseases, including cancers, neurological disorders, cardiovascular conditions and infectious illnesses
- Discusses novel therapeutic strategies for targeting ncRNAs, including CRISPR/Cas9 applications and RNAi-based strategies
- Explores the molecular mechanisms and intermolecular interactions of ncRNA
Navigating Non-coding RNA - Joanna Sztuba-Solinska
Navigating Non-coding RNA
From Biogenesis to Therapeutic Application
Edited by
Joanna Sztuba-Solinska
Department of Biological Sciences, Auburn University, Auburn, AL, United States
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
Table of Contents
Cover image
Title page
Copyright
List of contributors
Chapter 1. History and definitions of ncRNAs
The C-value paradox—there is more to a genome than protein-coding genes
Protein synthesis, splicing, and RNA with catalytic function—the early days of ncRNA research
Pervasive transcription
Small ncRNAs
Medium ncRNAs
Long non-coding RNAs
Conclusions
Chapter 2. Regulatory non-coding RNAs-biogenesis, mechanisms of action and role in gene expression regulation
Introduction
microRNAs (miRNAs)
Small interfering RNAs (siRNAs)
Long non-coding RNAs (lncRNAs)
Enhancer RNAs (eRNAs)
Circular RNAs (circRNAs)
Piwi RNAs (piRNAs)
Vault RNAs (vRNAs)
Concluding remarks
Chapter 3. Non-coding RNAs: Mechanisms of action
MicroRNAs
PIWI-interacting RNAs
Small nucleolar RNAs
Small nuclear RNAs
tRNA-derived fragments
Long non-coding RNAs
Circular RNAs
Enhancer RNAs
CRediT author statement
Chapter 4. Functional characterization of lncRNAs
Introduction
Genomic features
Transcriptomic features
Subcellular localization
Interacting partners (RNA/DNA/protein)
RNA structure
Perturbation experiments followed by phenotypic assessment
High-throughput functional screening
Outlook
Chapter 5. Secondary structural characterization of non-coding RNAs
Introduction
In silico approaches for the characterization of non-coding RNA secondary structure
Experimental approaches for the characterization of non-coding RNA secondary structure
Example analysis of the human H19 long noncoding RNA
Conclusion
Author contributions
Chapter 6. Regulation of non-coding RNAs
Introduction
Transcriptional regulation of ncRNAs
Identification of promoter regions for miRNAs and lncRNAs
Epitranscriptomic regulation of non-coding RNA
A-to-I editing
Funding
Chapter 7. Non-coding RNAs in human non-infectious diseases
Introduction
Aberrant regulation of non-coding RNAs in rheumatoid arthritis
Dysregulation of non-coding RNA in systemic lupus erythematosus
Conclusions and perspectives: autoimmune and inflammatory diseases
Cardiac and skeletal muscle
Disease of cardiac muscle
Atrophic conditions in skeletal muscle
Conclusions and perspectives: cardiac and skeletal muscle diseases
Chapter 8. Non-coding RNAs in human infectious diseases
Introduction
Biogenesis of non-coding RNA
Innate immune response elicited by pathogens
Non-coding RNAs in viral infections
Non-coding RNAs in bacterial infections
Non-coding RNAs in other infections
Predisposition to infectious diseases due to genetic polymorphism
Non-coding RNAs as therapeutics for infectious diseases
Conclusion
Chapter 9. Therapeutic targeting non-coding RNAs
Introduction
Non-coding RNA-based therapeutics
Conclusions and perspectives
Index
Copyright
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2023 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-323-90406-3
For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Wolff, Andre G.
Acquisitions Editor: Fisher, Michelle
Editorial Project Manager: Mapes, Matthew
Production Project Manager: Raviraj, Selvaraj
Cover Designer: Christian J. Bilbow
Typeset by TNQ Technologies
List of contributors
Ryan J. Andrews, Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States
Nicole I. Anthony, Department of Biological Sciences, Auburn University, Auburn, AL, United States
Liliana Roxana Balahura (Stamat), Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Alessandro Bonetti, Translational Genomics, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
Alessia Corbelli, Dipartimento di Biologia Ecologia Scienze Della Terra, Università Della Calabria, Rende, Italy
Marieta Costache
Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Research Institute of the University of Bucharest, Bucharest, Romania
Sarah D. Diermeier
Department of Biochemistry, University of Otago, Dunedin, New Zealand
Amaroq Therapeutics Ltd., Auckland, New Zealand
Sorina Dinescu
Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Research Institute of the University of Bucharest, Bucharest, Romania
Agnieszka Dzikiewicz-Krawczyk, Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
Ota Fuchs, Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
Marta Elżbieta Kasprzyk, Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
Marta Kazimierska, Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
Akhilesh Kumar, Department of Biological Sciences, Laboratory of Immunology and Infectious Disease Biology, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal, Madhya Pradesh, India
Himanshu Kumar
Department of Biological Sciences, Laboratory of Immunology and Infectious Disease Biology, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal, Madhya Pradesh, India
Laboratory of Host Defense, WPI Immunology, Frontier Research Centre, Osaka University, Osaka, Japan
Andreea Daniela Lazar, Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Megan P. Leask
Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States
Department of Physiology, University of Otago, Dunedin, New Zealand
Phillip J. McCown, Department of Internal Medicine - Nephrology, Michigan Medicine, University of Michigan, Ann Arbor, MI, United States
Alexandra Elena Mocanu-Dobranici, Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Walter N. Moss, Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States
Simona Nazarie (Ignat), Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Collin A. O'Leary, Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States
Clay E. Pandorf, Cell Biology and Physiology, Edward Via College of Osteopathic Medicine-Auburn, Auburn, AL, United States
Simona Panni, Dipartimento di Biologia Ecologia Scienze Della Terra, Università Della Calabria, Rende, Italy
Jake M. Peterson, Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States
Marta Podralska, Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
Athira S. Raj, Department of Biological Sciences, Laboratory of Immunology and Infectious Disease Biology, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal, Madhya Pradesh, India
Roslyn Michelle Ray, Gene Therapy Research, CSL Behring, Pasadena, CA, United States
Warren B. Rouse, Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States
Iuliana Samoilă, Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Aida Şelaru, Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest, Romania
Weronika Sura, Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
Joanna Sztuba-Solinska
Department of Biological Sciences, Auburn University, Auburn, AL, United States
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
Van S. Tompkins, Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States
Emma Catharina Walsh, Translational Genomics, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
Chapter 1: History and definitions of ncRNAs
Sarah D. Diermeier¹,² and Megan P. Leask³,⁴
¹ Department of Biochemistry, University of Otago, Dunedin, New Zealand
² Amaroq Therapeutics Ltd., New Zealand
³ Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States
⁴ Department of Physiology, University of Otago, Dunedin, New Zealand
Abstract
Recently, non-coding RNAs (ncRNAs) have received a lot of attention in the literature due to their involvement in a plethora of molecular roles, such as regulation of gene expression, splicing, or as modulators of protein activity. As reflected in the number of published primary research articles over the past 5–10 years, the spotlight has been in particular on some of the more recently discovered RNA biotypes, such as long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs). However, these are just the most recent developments in a long history of ncRNA research that extends some 60 years. Here, we describe the history of ncRNA research from the first findings of catalytically active RNA molecules in the 1950s to the discovery of pervasive transcription in genome-wide transcriptome studies, and the current state of research. We go on to describe and define the different classes of ncRNAs based on size and provide some examples of their functions in the cell.
Keywords
circRNA; eRNA; lncRNA; miRNA; ncRNA; piRNA; PROMPTs; rRNA; siRNA; snoRNA; snRNA; tRNA
The C-value paradox—there is more to a genome than protein-coding genes
The earliest hints at the widespread existence of non-coding RNAs stem from the "C-value paradox," which was coined in the 1970s (Thomas, 1971). Around that time, it became apparent that the amount of DNA in a haploid genome, referred to as the "C-value" ("constant value of a haploid genome"), showed little correlation with the complexity of its organism. Frogs, for example, have larger genomes than humans, and single-celled amoebae have some of the largest genomes we know of, which seemed paradoxical to scientists at the time based on the assumption that the "higher" organism should have the bigger genome. A few years later, an explanation for the C-value paradox arose as newer research suggested that much of the genome does not code for proteins. Calculations of the mutational load in the genome determined that the human genome likely contains 20,000–30,000 genes (Ohno, 1972). In this context, "genes" were defined as transcriptional units ultimately translated to proteins. Subsequently, DNA–RNA hybridization experiments further confirmed this estimate (Levin, 1980). Interestingly, this number is very close to the currently identified number of protein-coding genes in the human genome, just under 20,000 (19,955 according to GENCODE v38, November 2021). The remaining "non-coding" part of the genome was termed "junk DNA" under the initial assumption that it had no function at all. However, not all scientists agreed with this notion at the time, and some interest in the potential functionality of non-coding DNA remained. Early hypotheses included regulatory functions, RNA processing, a reservoir for evolutionary innovation, and genome integrity, among others (Yunis and Yasmineh, 1971; Britten and Davidson, 1971; Orgel and Crick, 1980; Lewin, 1982). As we now know, much of the non-coding genome does indeed have biochemical functionality, ranging from regulatory regions such as enhancers or promoters to transposons, repetitive elements and, of course, non-coding RNA (ncRNA) genes (ENCODE Project Consortium et al., 2007, 2020). This chapter explores the history of ncRNA research (Fig. 1.1), which contributes to our current understanding of the C-value paradox.
Protein synthesis, splicing, and RNA with catalytic function—the early days of ncRNA research
A decade before the C-value paradox was hypothesized, the very first ncRNA species were discovered. In 1958, transfer RNAs (tRNAs) (Hoagland et al., 1958) and ribosomal RNAs (rRNAs) (Crick, 1958) were described, both of which are essential for protein synthesis. Not only were these discoveries crucial to forming the Central Dogma of Molecular Biology (the flow of information from a gene to its corresponding protein), earning Watson and Crick their Nobel Prize in 1962, but these ncRNAs also demonstrated for the first time that RNA is an important functional molecule in the cell. Three years later, in 1961, three papers published within the same month described the protein-coding messenger RNA (mRNA), the third and final component needed to understand the Central Dogma (Brenner et al., 1961; Gros et al., 1961; Jacob and Monod, 1961).
Figure 1.1 A schematic of the history of ncRNA discoveries from 1950 to the present day. In gold are discoveries that led to the award of Nobel Prizes.
Some 20 years later, one of the earliest discoveries that started to provide some explanation for the C-value paradox was the identification of intronic sequences in eukaryotes, which account for a substantial proportion (25%) of non-coding DNA (Jo and Choi, 2015). In 1977, two labs independently discovered "split genes" in eukaryotes (Chow et al., 1977; Berget et al., 1977). In bacteria, mRNAs are a perfect, sequence-complementary copy of the original DNA sequence. However, eukaryotic genes contain intervening sequences, called introns, that are removed during mRNA processing and are not part of the coding sequence that is translated. Some organisms have large intronic sequences, which correlate with larger genomes but not necessarily with more coding genes (Vinogradov, 1999). Using nucleic acid hybridization, both teams showed that when an mRNA probe hybridized with its genomic DNA complement, the DNA looped out in the region where the non-coding sequence was displaced by the mRNA (termed R loops), indicating the presence of an intron. For this discovery, Richard Roberts and Philip Sharp were awarded the Nobel Prize in Physiology or Medicine in 1993. In the 1980s, scientists showed that these intronic sequences were removed by a novel class of ncRNAs functioning in a ribonucleoprotein (RNP) complex (Black et al., 1985; Chabot et al., 1985), defined in the same year as the spliceosome (Brody and Abelson, 1985). This unexpected class of ncRNAs was in fact first identified in 1966 through gel electrophoresis (Hadjiolov et al., 1966) and originally termed U-RNAs due to their uridine content, but they are now known as the small nuclear RNAs (snRNAs) (Busch et al., 1982) (described in more detail in the section "Small nuclear RNAs").
In the late 1970s and 1980s, a large number of studies found that many different ncRNAs are complexed in RNPs (Lerner et al., 1981; Lerner and Steitz, 1979; Reimer et al., 1987; Kedersha and Rome, 1986), some of which have catalytic function, including rRNAs in the ribosome and snRNAs in the spliceosome. These RNA enzymes, or "ribozymes," catalyze chemical reactions that are critical for life. In the case of both the ribosome and the spliceosome, the RNA component is sufficient for the enzymatic reaction catalyzed, while proteins are the structural units that support and stabilize the RNA core. Although Woese, Crick, and Orgel were the first to suggest that RNA could act as an enzymatic catalyst in 1967 (Woese, 1967), the general discovery of RNAs with catalytic functions is credited to Thomas Cech's group, who found a self-splicing intron in the rRNA locus of Tetrahymena and coined the term "ribozyme" (Cech et al., 1981; Kruger et al., 1982). Despite extensive efforts of the group to remove proteins from the experiment, the splicing reaction kept occurring, leaving catalytic RNA as the only possible explanation. Thomas Cech went on to win the Nobel Prize in Chemistry in 1989, together with Sidney Altman, for establishing the catalytic properties of RNA. It took another decade to show that the ribosome is indeed a ribozyme as well. The conclusion that rRNA has catalytic function could only be made once the RNP structure of the large subunit was solved, which revealed that there are no proteins in the active center where peptide bond formation occurs (Ban et al., 2000). Similarly, once the structure of the spliceosome was revealed, this RNP was classed as a "protein-directed metalloribozyme" (Shi, 2017). While RNA-catalyzed protein synthesis is a universal principle across all kingdoms of life, the ncRNA field did not progress much further at the time, as researchers assumed ncRNAs to be, for the most part, unstable intermediates. Nonetheless, the discoveries of the first ncRNA genes and intronic sequences were able to explain the C-value paradox in part, but there was still a lot to be discovered over the next decades.
Pervasive transcription
In the 1970s, more hints emerged that some of the "junk DNA" was actually being transcribed. In mammals, a new class of RNAs was discovered as being transcribed from repetitive and heterochromatic regions, as well as >20% of non-repetitive regions, and termed "heterogeneous nuclear RNA" (hnRNA). As the name implies, about half of these hnRNAs were restricted to the nucleus (Holmes et al., 1972; Pierpont and Yunis, 1977). In 1975, nucleic acid hybridization reassociation kinetics, or "cot curves," showed that there is 10-fold more hnRNA than mRNA in the cell (Hough et al., 1975). In 1980, cot curves further demonstrated a 10-fold greater complexity of nuclear compared to cytoplasmic polyadenylated RNA (Holland et al., 1980).
While these findings were intriguing, it was not until new technologies such as microarrays and high-throughput sequencing emerged in the 2000s that the genomics era revealed just how widespread or "pervasive" transcription in eukaryotic genomes is and elucidated the many different ncRNA classes comprising hnRNA. Multiple landmark studies demonstrated independently that genomes contain far more transcripts than expected based on existing annotations, dramatically reducing the percentage of "junk" or non-functional/non-transcribed DNA. In 2002, tiling arrays identified widespread unannotated transcription (Kapranov et al., 2002). In the same year, several studies detected widespread ncRNA and antisense transcription by sequencing full-length cDNAs (Chen et al., 2002; Saha et al., 2002; Okazaki et al., 2002). The first whole-genome transcriptome mapping experiments were completed in 2003–04 for Drosophila, Arabidopsis, and human (Stolc et al., 2004; Yamada et al., 2003; Bertone et al., 2004). The newly discovered transcripts comprised numerous different biotypes of RNA, including previously unidentified protein-coding genes, new splice isoforms of protein-coding genes, and overlapping transcripts on both strands, leading to the discovery of extensive antisense transcription. In a landmark paper in Science in 2005, >70% of all mammalian sense transcripts were found to have antisense partners (Katayama et al., 2005). Furthermore, these early transcriptomics studies hinted at the existence of a vast abundance of transcripts lacking open reading frames (ORFs) of significant length. These ncRNAs appeared to be more tissue-specific than mRNAs, and many of them were expressed in a tightly controlled manner throughout development or associated with disease (Inagaki et al., 2005; Ravasi et al., 2006; Sasaki et al., 2007).
Several international consortia were formed at around the same time, producing large-scale datasets from many different cell and tissue types in human and mouse. These large groups of scientists set out to identify and characterize all functional elements in the human and mouse genomes, taking a swing at the antiquated concept of "junk" DNA. Functional ANnoTation Of the Mammalian genome (FANTOM) was established in 2000 at RIKEN, Japan, to generate an atlas of mouse transcripts (Kawai et al., 2001; Carninci et al., 2005). Over the years, FANTOM evolved and expanded, with the two latest projects, FANTOM5 and FANTOM6, focusing specifically on cataloging mammalian ncRNAs such as microRNAs (miRNAs) and long ncRNAs (lncRNAs), and on functionally characterizing lncRNAs, respectively (Hon et al., 2017; de Rie et al., 2017; Ramilowski et al., 2020). FANTOM data is publicly available and provides an invaluable resource for the scientific community. Another international consortium, the Encyclopedia of DNA Elements (ENCODE), was launched in 2003 by the US National Human Genome Research Institute to identify all functional elements of the human genome. They went on to catalog all existing human and mouse transcripts and determined their subcellular localizations (ENCODE Project Consortium et al., 2007; Djebali et al., 2012). Their findings revealed that only 1%–2% of the human genome codes for proteins, while 60%–75% of the mammalian genome can be transcribed in a context- and cell type-specific manner, with up to 80% of the genome being transcribed or having regulatory function. In particular, the number of short/small RNAs and long polyadenylated ncRNAs expanded significantly due to the efforts of the consortium. Like FANTOM, ENCODE makes its data publicly available and continues to release regular updates, with the latest major release in 2020 (ENCODE Project Consortium et al., 2020).
In addition to direct detection of RNA transcripts, further evidence for pervasive transcription emerged from epigenetics studies such as genome-wide histone modification profiles and chromatin accessibility assays that confirmed the existence of many actively transcribed non-coding loci. In 2004, chromatin immunoprecipitation coupled to genomic tiling arrays (ChIP-chip) showed an unexpected abundance of mammalian promoters, many of which were associated with ncRNAs (Cawley et al., 2004). ENCODE continues to build on these initial epigenetic findings by integrating transcriptomics with multi-omics datasets, including DNA methylomes, histone modifications, chromatin accessibility, chromatin organization in 3D, transcription factor occupancies, and RNA binding proteins, creating an ever-expanding repository of functional annotations (ENCODE Project Consortium et al., 2020).
The discovery of pervasive transcription provided some of the answers for the C-value paradox proposed over 30 years earlier. In 2012, ENCODE reported that 80.4% of the genome had biochemical function (ENCODE Project Consortium, 2012). Some scientists suggest that all or most transcripts that have been detected are in fact functional in one cell state or developmental context or another (Mattick et al., 2010; Pennisi, 2012). However, pervasive transcription is not an unchallenged concept, with some scientists continuing to argue that the C-value paradox remains unsolved and most of the genome consists of "junk" DNA. This opinion is based on the observation that many ncRNAs are expressed at very low levels, leading to the classification of some non-coding transcripts as non-functional, caused by "transcriptional noise" such as spurious transcription from randomly initiating or "leaky" RNA polymerases. Detection limits of the technologies used have been criticized as well. In addition to this, many ncRNAs are not conserved across species, which leads some experts to question if they can have function (Eddy, 2012; Graur et al., 2013; Palazzo and Gregory, 2014). On the other hand, ncRNAs may be evolutionarily "new" and highly species-specific, and sequence conservation alone may not be sufficient to assign function to an RNA, as structure or genomic location may be equally or more important. Nevertheless, it has been suggested multiple times that the null hypothesis when studying a previously uncharacterized ncRNA should be that it has no function, and that experiments should be designed to conclusively demonstrate whether a novel transcript has a molecular function in the given context. Thus, although large-scale studies by the FANTOM and ENCODE consortia have produced huge catalogs of ncRNAs, only a small percentage of the discovered transcripts have been assigned function thus far, with new molecular functions being revealed at a steadily increasing rate. We will go on to review landmark discoveries and define ncRNAs based on their size (small, mid-sized, and long, see Fig. 1.1).
Small ncRNAs
miRNA, siRNA, and RNAi
The miRNAs are one of the best-studied classes of ncRNAs. The first miRNA was discovered in 1993 in the nematode Caenorhabditis elegans in Victor Ambros' laboratory (Lee et al., 1993). The group observed that the protein LIN-14 had to be downregulated for C. elegans larvae to progress from the first (L1) to the second (L2) stage. LIN-14 downregulation was dependent on the gene lin-4; however, it was discovered that the lin-4 transcript is not translated into a protein but instead produces two small RNAs of 21 and 61 nucleotides (nts) in length. The longer RNA was found to be a precursor for the shorter RNA. Interestingly, they and others showed that the short RNA derived from the lin-4 transcript was complementary in sequence to the 3′ untranslated region (UTR) of lin-14 mRNA (Lee et al., 1993; Wightman et al., 1993). It turned out that the binding of the short lin-4 RNA to lin-14 mRNA results in the downregulation of LIN-14 at the protein level and that this is essential for developmental progression. Initially, this example of gene regulation by a short ncRNA was believed to be exclusive to C. elegans. In 2000, another small RNA involved in C. elegans larval development was discovered: let-7 (Reinhart et al., 2000; Slack et al., 2000). Importantly, let-7 homologs were discovered in many other organisms, including human (Pasquinelli et al., 2000), and it became increasingly clear in the following years that miRNAs were indeed a large class of ncRNAs, important for gene regulation in many eukaryotes.
miRNAs are defined as short/small (19–24nts) ncRNAs that are derived from a longer precursor miRNA. Unlike many of the earlier discovered ncRNAs such as rRNAs, tRNAs, and snRNAs, miRNAs are not necessarily expressed ubiquitously but can be restricted to certain tissue types or developmental stages. Since the early findings in the 1990s, thousands of miRNAs have been described in the literature, and a microRNA database (miRBase) was set up in 2003 as a searchable repository for all miRNAs (Griffiths-Jones, 2004), with the current release (v22.1) containing almost 40,000 miRNAs across 271 organisms (Kozomara et al., 2019). According to GENCODE, 1879 miRNAs are annotated in the current version of the human genome (v38, November 2021) and 2201 in mouse (M27, November 2021). miRNAs have also been found to be associated with various diseases such as cancer, where they can act as oncogenes or tumor suppressors (Iorio and Croce, 2012).
miRNAs function to trigger gene silencing via RNA interference (RNAi) (Fig. 1.2), which was initially reported in plants (Napoli et al., 1990; Romano and Macino, 1992) and later by Guo and Kemphues in C. elegans (Guo and Kemphues, 1995). However, the true mechanism of RNAi remained elusive until Andrew Fire and Craig Mello deciphered it somewhat serendipitously in 1998 (Fire et al., 1998) and were subsequently awarded the 2006 Nobel Prize in Physiology or Medicine for their efforts (Fig. 1.1). In their seminal 1998 paper, they observed that double-stranded RNA (dsRNA) was 10–100 times more effective at silencing the target mRNA unc-22 than single-stranded RNA (ssRNA). Indeed, ssRNA only silenced unc-22 when sense RNA and antisense RNA were injected. Thus, dsRNA was sufficient to cause systemic silencing of the target mRNA and identified as the "trigger" of RNAi; however, the exact mechanisms and key players in small ncRNA function took several more years to define (Tomari and Zamore, 2005). In 2000, two separate teams identified the functional silencing intermediates: small interfering RNAs (siRNAs), a class of RNA of 21–23nt in length which co-purified with the sequence-specific nuclease. Thus, it was proposed that siRNAs degrade mRNA when incorporated into the RNA-induced silencing complex (RISC) (Hammond et al., 2000; Zamore et al., 2000).
Figure 1.2 A schematic outlining the biogenesis and function of miRNA, siRNA, and piRNAs. In the left panel is a depiction of RNAi. miRNAs are derived from a genome-encoded primary miRNA, and siRNAs from exogenous dsRNA. After transcription, processing by the microprocessor containing Drosha (black), and export from the nucleus via Exportin5 (purple) (miRNA pathway), or after exogenous introduction of dsRNA (siRNA pathway), Dicer (dark gray) cleaves the dsRNA into RNA duplexes. Subsequently, the duplexes are bound by RISC containing Ago2 (light blue). The Ago2 "slicer" cleaves the RNA duplex and, via complementarity to the mRNA, targets the mRNA for cleavage, degradation, and/or translational repression. In the right panel is a schematic outlining piRNA transcription, biosynthesis, and function. From piRNA gene clusters a piRNA precursor transcript is transcribed and exported from the nucleus where it is bound by Ago3 and cleaved. Upon cleavage, the piRNA interacts with Aub where it either enters the ping-pong amplification step of piRNA biogenesis or the primary piRNA biogenesis step at the interface of the mitochondria in Yb bodies. After biogenesis, the piRNA is transported back into the nucleus where it silences genes. The piRNA can also remain in the cytoplasm where it can modify gene translation via mRNA deadenylation, cleavage, and stability.
Since its discovery, the phenomenon of RNAi has been a key tool for gene knockdown experiments in molecular biology and has also led to the development of RNA therapeutics. The first siRNA drug, Patisiran, was approved by the FDA for the treatment of hereditary transthyretin amyloidosis (hATTR) in 2018 (Wood, 2018). While Patisiran was the first approved siRNA drug, other RNA therapeutics paved the way, with Fomivirsen as the first approval in 1998. Fomivirsen is a first-generation antisense oligonucleotide (ASO) targeting the cytomegalovirus (CMV) IE-2 mRNA for treatment of CMV retinitis. Although ASOs do not work through RNAi, they work on the same principle of targeting mRNA for cleavage, albeit via RNase H. Numerous other ASOs and siRNAs have been developed since, which target a myriad of inherited human diseases, some of which are fatal if untreated. One of the most groundbreaking ASO treatments developed to date is Spinraza, an ASO developed to treat spinal muscular atrophy (SMA). Spinraza was FDA approved in 2016 and corrects skipping of exon 7 in SMN2, leading to full-length, functional SMN protein (Rigo et al., 2012; Cartegni and Krainer, 2003; Chiriboga et al., 2016).
miRNAs are endogenous and encoded in the genome as an RNA stem-loop structure (pri-miRNAs) (Ambros et al., 2003; Lee and Ambros, 2001; Lagos-Quintana et al., 2001), whereas siRNAs are synthetic or from other exogenous sources such as viruses or transposons, and are excised from long, fully complementary dsRNAs (Ambros et al., 2003; Zamore et al., 2000; Hammond et al., 2000). Nevertheless, the size similarities and sequence-specific inhibitory functions of miRNAs and siRNAs indicate that they are related in biogenesis and function (Zeng et al., 2003). The detailed mechanism of how siRNAs and miRNAs function in RNAi along with RISC is outlined in Fig. 1.2. The miRNA pathway begins with transcription of the primary miRNA (pri-miRNA) encoded in the genome (75–110nts). After transcription, the pri-miRNA base-pairs with complementary sequences in other regions of the same molecule to form a double-stranded RNA structure defined as the hairpin. The microprocessor complex containing Drosha removes the hairpin structure (Yoontae Lee et al., 2003; Zeng et al., 2003), resulting in the precursor miRNA (pre-miRNA) that is actively transported from the nucleus to the cytoplasm by exportin (Yi et al., 2003; Lund et al., 2004). The siRNA and miRNA pathways converge at this point, where the exogenous dsRNA (siRNA pathway) and pre-miRNA (miRNA pathway) are bound by Dicer, which cleaves the dsRNA and pre-miRNA into the 21–25nt siRNA and miRNA duplexes, respectively (Bernstein et al., 2001; Knight and Bass, 2001; Grishok et al., 2001; Ketting et al., 2001; Hutvágner and Zamore, 2002). The cleaved duplexes are then loaded onto Argonaute (AGO) proteins and incorporated into RISC as single-stranded RNAs (Rivas et al., 2005). Finally, this RISC:ssRNA identifies target messages via complementary sequences, leading to gene repression via a number of different mechanisms. In its canonical role in RISC, the "slicer" Ago2 cleaves the mRNA (Liu et al., 2004; Martinez et al., 2002), leading to its degradation via deadenylation of the mRNA poly(A) tail; however, RISC can also inhibit translation by blocking translation initiation (Pillai et al., 2005), and small RNAs and RISC also function in heterochromatin formation (Reinhart and Bartel, 2002; Volpe et al., 2002).
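The idea that a RISC-loaded small RNA finds its targets through sequence complementarity can be illustrated with a short Python sketch. This is a deliberate simplification and not a target-prediction method: it only reports perfect Watson-Crick matches, the function names and the `length` window are hypothetical choices made for illustration, and the sequences in the usage example are invented rather than taken from any real miRNA or mRNA.

```python
# Toy sketch: locate sites in a target RNA that are perfectly complementary
# to the 5' end of a small RNA. Real miRNA targeting tolerates mismatches and
# G-U wobble pairs, which this simplified example ignores (assumption).

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A-U and G-C pairs only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_complementary_sites(small_rna: str, target: str, length: int = 7) -> list[int]:
    """Return 0-based start positions in `target` that pair perfectly with the
    first `length` nucleotides of `small_rna` (hypothetical window size)."""
    probe = reverse_complement(small_rna[:length])
    target = target.upper()
    sites = []
    start = target.find(probe)
    while start != -1:
        sites.append(start)
        start = target.find(probe, start + 1)  # step by 1 to allow overlapping sites
    return sites


if __name__ == "__main__":
    # Invented sequences, for illustration only.
    small_rna = "AUGGCUAACGUAGCUAGCUAG"
    target_utr = "GGGGUAGCCAUCCCCUAGCCAUAA"
    print(find_complementary_sites(small_rna, target_utr))
```

In this toy input the target carries two copies of the site, so two start positions are reported; a target without a complementary stretch yields an empty list.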
piRNAs
At the same time as the discovery of RNAi, a novel class of "long siRNAs" (first termed repeat-associated siRNAs (rasiRNAs) because they originated from repetitive elements such as transposable sequences of the genome) were identified in Drosophila testis and found to silence the gene Stellate on the X-chromosome (Aravin et al., 2001). Later in 2006, Aravin and colleagues as well as three other studies reported their discovery in mammalian genomes (Aravin et al., 2006; Girard et al., 2006; Lau et al., 2006; Grivna et al., 2006), and they were renamed piRNAs because of their interaction with the PIWI (P-element Induced WImpy testis) protein family. piRNAs are distinct from miRNAs and siRNAs (Vagin et al., 2006), specific to animals and by definition short (26–31nts) ncRNAs that are expressed specifically in the germline. Unlike miRNAs and siRNAs, they are processed from a long single-stranded precursor, of which the large majority are generated from piRNA clusters, but they can also be derived from protein coding genes, transposons, tRNA, rRNA, and intergenic loci including lncRNA (Aravin et al., 2006; Girard et al., 2006). In contrast to miRNA and siRNA, the processing of these precursors is independent of the Dicer/Drosha mechanisms. The exact mechanisms of piRNA biogenesis in humans are unclear and most of the information has been gained from studies carried out in Drosophila; however, these processes are likely similar in humans (Williams et al., 2015; Rouget et al., 2010) (Fig. 1.2).
After export from the nucleus (ElMaghraby et al., 2019; Kneuss et al., 2019), piRNAs interact with the germline-specific PIWI clade of Argonaute proteins, forming a ribonucleoprotein complex (piRISC) analogous to miRNA-RISC and siRNA-RISC. This piRISC cleaves the complementary piRNA precursor transcript, generating a piRNA intermediate that interacts with the PIWI protein Aubergine (Aub). After interacting with Aub, further piRNA biogenesis occurs via two interconnected mechanisms: phasing and amplification (Fig. 1.2). Briefly, primary biogenesis via phasing occurs in Yb bodies at the outer membrane of the mitochondria (Ge et al., 2019; Huiyan Huang et al., 2011; Haidong Huang et al., 2014; Watanabe et al., 2011) and generates de novo piRNAs, increasing piRNA diversity. Secondary piRNA processing via amplification (Brennecke et al., 2007; Gunawardane et al., 2007) results in reciprocal cleavage of the piRNA via PIWI proteins, increasing the available pool of certain piRNAs for gene silencing (Ramat and Simonelig, 2021). piRNAs have been implicated in gene silencing mainly through their role in transposon silencing (Brennecke et al., 2008; Khurana et al., 2011), but they also function to regulate gene expression via mRNA deadenylation, cleavage, and stability (Rojas-Ríos et al., 2017; Ma et al., 2017; Barckmann et al., 2015; Gou et al., 2014; Rouget et al., 2010; Zhang et al., 2015; Ramat and Simonelig, 2021).
Medium ncRNAs
Non-coding RNAs are often divided into just two categories (small/short and long), with a somewhat arbitrary threshold of 200nt in length to distinguish the two types (Brosnan and Voinnet, 2009). However, some publications suggest that "mid"- or "medium"-sized ncRNAs should be in an intermediate category of their own (Boivin et al., 2019). Here, we define medium ncRNAs as ranging between ∼50 and 200nt and of diverse regulatory functions, including important structural ncRNAs such as tRNAs, rRNAs, and snoRNAs.
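The size-based grouping used in this chapter can be written down as a small Python sketch. The thresholds follow the approximate values given above (∼50 and 200nt); the function name, the parameter names, and the choice to treat exactly 200nt as medium are assumptions made for illustration, since the boundaries are somewhat arbitrary and several classes (for example snoRNAs) span more than one bin.

```python
# Minimal sketch of the size classes described in the text.
# Cutoffs are approximate; boundary handling here is an assumption.

def classify_ncrna_by_length(length_nt: int, medium_min: int = 50, long_min: int = 200) -> str:
    """Assign a transcript length (in nucleotides) to a size class."""
    if length_nt <= 0:
        raise ValueError("length_nt must be a positive number of nucleotides")
    if length_nt < medium_min:
        return "small"
    if length_nt <= long_min:
        return "medium"
    return "long"


if __name__ == "__main__":
    # Illustrative lengths only, chosen to fall within ranges quoted in the chapter.
    for name, length in [("miRNA", 22), ("piRNA", 28), ("tRNA", 76), ("snRNA", 150), ("lncRNA", 2300)]:
        print(f"{name:7s} {length:5d} nt -> {classify_ncrna_by_length(length)}")
```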
Transfer RNAs (tRNAs) and tRNA fragments
The most abundant class of medium ncRNAs are tRNAs, with 587 genes in the human genome as defined by the HUGO Gene Nomenclature Committee (HGNC, www.genenames.org) (Seal et al., 2020). Originally identified in 1958 as described above, tRNAs serve as an adaptor molecule between anticodon and corresponding amino acid during protein synthesis in the ribosome, translating the genetic code from DNA to a protein sequence. tRNAs vary in length from 73 to 93nts, contain many modified nucleotides, and fold into a well-characterized cloverleaf structure (Giegé, 2008). In addition to their essential and highly conserved canonical function in translation, tRNAs have more recently been implicated in a number of metabolic pathways, from gene regulation in response to nutritional stress to cell wall biosynthesis and antibiotic synthesis (Raina and Ibba, 2014; Avcilar-Kucukgoze and Kashina, 2020). Recently, tRNAs have gained renewed attention as a myriad of tRNA-derived small ncRNAs, or tRNA-derived fragments (tRFs), and tRNA halves (tiRNAs) were identified in high-throughput sequencing experiments. An emerging body of research suggests that tRFs are not just degradation products of tRNAs but are produced through precise, defined cleavage events and can have functions in gene expression, translation, and the cell cycle (Schorn et al., 2017; Xie et al., 2020).
Ribosomal RNAs (rRNAs)
As described above, rRNA is the primary component of ribosomes, which are the catalytic enzymes underpinning the translation of proteins. Many copies of rRNA genes are encoded in the human genome (Seal et al., 2020) at the 5S rRNA cluster (transcribed by RNA Pol III) and at five 47S rRNA loci (47S is a multi-cistronic precursor to the mature 28S, 5.8S, and 18S rRNAs), which are transcribed by RNA Pol I. These rRNAs are further processed by snoRNA-RNPs (described in the section "Small nucleolar RNAs"). Although their canonical role as essential constituents of the ribosome is very well defined, emerging evidence suggests that miRNA-like sequences exist within rRNA, termed rRNA-hosted miRNA analogs, which might be important in stress conditions and development (Yoshikawa and Fujii, 2016; Locati et al., 2018; Yunwei Shi et al., 2019; Mangrauthia et al., 2018).
Small nuclear RNAs (snRNAs)
In 1979, Lerner and colleagues showed that small nuclear RNAs (also described as U-RNAs) are complexed with proteins in RNPs that are recognized by antibodies from patients with systemic lupus erythematosus (SLE) (Lerner and Steitz, 1979). This finding would form the basis of the work carried out by Thomas Cech and Sidney Altman which led to their 1989 Nobel Prize as discussed above, but was also the first work in a slew of manuscripts defining many of the mid-sized RNA moieties. By definition, snRNAs are nuclear in nature, ∼150nt in length, extensively modified like tRNAs and rRNAs, and functionally distinct from the small nucleolar RNAs defined below. snRNAs always form RNP complexes referred to as snRNPs (pronounced "snurps"), and the most common snRNAs are U1, U2, U4, U5, and U6, which are highly conserved among eukaryotes and are all components of the spliceosome, involved in the splicing of introns from pre-mRNAs (Bohnsack and Sloan, 2018; Villa et al., 2002). Additionally, some studies suggest non-canonical functions of snRNAs, such as the regulation of gene expression and mRNA processing (Ideue et al., 2012).
Small nucleolar RNAs (snoRNAs)
Small nucleolar RNAs (snoRNAs) function in the nucleolus, the largest substructure of the nucleus, where they fulfill multiple roles in ribosome biogenesis, such as modifying rRNA, guiding pre-rRNA processing, and acting as molecular chaperones. snoRNAs are abundant, ∼60–300nt long and some of the most functionally diverse trans-acting ncRNAs currently known. Similar to rRNAs and snRNAs, snoRNAs form RNPs with proteins to exert their functions (Filipowicz and Pogacić, 2002). The first human snoRNA was in fact first classified as a snRNA and thus named U3 (SNORD3), identified in 1976 via biochemical fractionation assays as the most abundant small nuclear RNA in HeLa cells (Zieve and Penman, 1976). Almost a decade later, U3 was found to form an RNP, which was targeted by autoantibodies in a patient with scleroderma (Reimer et al., 1987). More specifically, the autoantibodies targeted fibrillarin, a protein binding to U3. These antibodies proved useful in the identification of many other snoRNAs that also bound fibrillarin, including U8 (SNORD118), U13 (SNORD13), U14 (SNORD14), and U15 (SNORD15) in human over the following years (Tyc and Steitz, 1989; Tycowski et al., 1993). Based on the common sequence motif and secondary structure enabling fibrillarin binding, these snoRNAs were classified as C/D-box snoRNAs. The two motifs, C box (RUGAUGA) and D box (CUGA) after which they are named, and a short stem loop constitute a kink-turn (K-turn) structural motif that is recognized by the snoRNP protein fibrillarin. C/D-box snoRNAs can also contain internal motifs that are frequently imperfect copies of the C and D box motifs (Tamás Kiss, 2002).
Another class of snoRNAs was discovered independently, the H/ACA-box snoRNAs (Kiss and Filipowicz, 1993; Ruff et al., 1993; Ganot et al., 1997). Like C/D-box snoRNAs, they are 60–300nts in length and often originate from intronic regions in mRNAs or other ncRNAs, but they bind to different protein partners and serve different functions in the cell (Balakin et al., 1996). H/ACA-box snoRNAs contain two motifs, the H (ANANNA) and the ACA (ACANNN) boxes, and fold into a "hairpin-hinge-hairpin-tail" structure. The 5′ and/or 3′ hairpin also contains an internal loop pocket where the substrate RNAs are bound.
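The consensus box motifs quoted above are written in IUPAC nucleotide codes (R = A or G, N = any nucleotide), so they translate directly into regular expressions. The following Python sketch scans a sequence for the four consensus motifs exactly as given in the text. It is only a motif scan under simplifying assumptions: it ignores the secondary-structure context (K-turn, hairpins) that real snoRNA annotation relies on, short motifs such as CUGA will match frequently by chance, and the function names and test sequences are hypothetical.

```python
import re

# IUPAC codes needed for the motifs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]", "N": "[ACGU]"}

# Consensus motifs as given in the chapter.
BOX_MOTIFS = {
    "C box": "RUGAUGA",
    "D box": "CUGA",
    "H box": "ANANNA",
    "ACA box": "ACANNN",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Convert an IUPAC motif to a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif)
    return re.compile(f"(?=({body}))")


def find_box_motifs(rna: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each box motif in an RNA sequence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return {
        name: [m.start() for m in motif_to_regex(motif).finditer(rna)]
        for name, motif in BOX_MOTIFS.items()
    }


if __name__ == "__main__":
    # Invented sequences, for illustration only.
    print(find_box_motifs("GGAUGAUGACCCCUGAGG"))
    print(find_box_motifs("AGAGGAACAUUU"))
```

A hit from this scan is at best a starting point; distinguishing a genuine C/D-box or H/ACA-box snoRNA would additionally require checking motif position and the surrounding fold.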
snoRNAs have been shown to act by modifying other ncRNAs, more specifically by posttranscriptionally pseudouridylating and 2′-O-methylating rRNA and snRNA molecules (Tamás Kiss, 2002). While C/D-box snoRNAs are responsible for 2′-O-methylation, H/ACA-box snoRNAs mediate pseudouridylation (Kiss and Filipowicz, 1993; Tamás Kiss, 2002). snoRNAs are present in archaea as well as in eukaryotes, indicating that they arose over 2–3 billion years ago. Currently, 943 snoRNAs are annotated in the human genome (GENCODE v38, November 2021). With only 100–200 rRNA sites known to carry snoRNA-mediated modifications, the number of identified snoRNAs far exceeds what would be expected based on known modified ncRNA sites (Bachellerie et al., 2002). In addition, many snoRNAs lack sequence complementarity to potential rRNA or snRNA targets and their localization is not restricted to the nucleolus. Therefore, these "orphan" snoRNAs are likely involved in other molecular mechanisms, such as modifying mRNAs and other RNA biotypes, impacting splicing, serving as precursors of miRNAs, or mediating chromatin accessibility. One of the more well-studied examples of an "orphan" snoRNA is the brain-specific C/D-box snoRNA HBII-52, which has been described to modulate alternative splicing of the transcript encoding the serotonin receptor. Patients with the genetic imprinting disorder Prader–Willi syndrome lack HBII-52, resulting in different serotonin receptor isoforms and, ultimately, in altered serotonin sensitivity (Kishore and Stamm, 2006; Sahoo et al., 2008). A 2012 study showed that specific snoRNAs can mediate chromatin accessibility in Drosophila cells (Schubert et al., 2012). There are likely many other functions of snoRNAs yet to be unveiled.
snaRs, Y RNAs, and vault RNAs
Worthy of mention is the discovery of the more obscure and less studied examples of mid-sized RNAs (Y-RNA, snaRs, and vault RNAs) that are not very abundant and have ill-defined functions. Of these, small NF90-associated RNAs (snaRs) are by far the least characterized category of ncRNA having only been outlined in a handful of published manuscripts (Parrott and Mathews, 2009, 2007; Mathews and Parrott 2008; Parrott et al., 2011). They were first identified in 2007 after they immunoprecipitated with antibodies against NF90 (a protein product derived from ILF3). This combined with the fact that they also bind ribosomes suggests that they play a role in translational control; however, their function remains unclear. They are transcribed by RNA polymerase III, are only present in great apes, and have undergone rapid evolution, some specific to humans. There are 28 snaRs in the human genome (Seal et al., 2020).
Shortly after the discovery of snRNAs from the nuclear extract of SLE patients (Lerner and Steitz, 1979) (described above in the section "Small nuclear RNAs"), the same scientists identified a second set of ncRNA using whole cell extract (Lerner et al., 1981). These ncRNAs were termed Y-RNAs because they were mostly cytoplasmic, which differentiated them from the nuclear localized snRNAs and nucleolar snoRNAs. These RNAs complexed with the Ro60 protein, which is clinically important in patients with rheumatic disease, SLE and Sjögren's syndrome. There are only four Y-RNAs in the human genome, which are all encoded at 7q36.1 and transcribed by RNA polymerase III (Seal et al., 2020). They are approximately 100 nucleotides in length and have a distinctive secondary structure, containing a stem formed from the base pairing of the 5′ and 3′ ends which includes the Ro60 binding site. Although our knowledge on the function of Y-RNAs is incomplete, they appear to influence subcellular location of Ro60 (Sim et al., 2009, 2012) and its ability to bind misfolded RNAs (Fuchs et al., 2006; Stein et al., 2005).
In 1986, a novel class of mid-sized RNA termed vault RNAs (vRNAs) was discovered as part of the largest known RNP complexes, called vaults (Kedersha and Rome, 1986). The function of these vault complexes (named as such due to similarities to the arches found in the vaults of cathedrals) is not well understood; however, in response to external stimuli, they translocate to different subcellular compartments and are thought to mediate shuttling processes between cytoplasm and nucleus (van Zon et al., 2003; Hahne et al., 2021). The vault protein TEP1 that binds vRNAs is similar to Ro60, which binds Y-RNAs, implying that these two RNAs might be evolutionarily related (Bateman and Kickhoefer, 2003; Kickhoefer et al., 2001). Vault RNAs are only found in higher eukaryotes. In humans, four vRNAs are encoded on chromosome 5 at two different genomic locations and are transcribed by RNA polymerase III (Seal et al., 2020). Although named vault RNAs, 95% of these RNAs do not associate with vaults, and even 30 years after their discovery, the molecular functions of vault RNAs are not well defined (Hahne et al., 2021). A recent study suggests that vault RNA1-1 is a riboregulator of autophagy, which may open new avenues of protein posttranslational regulation by vault RNAs (Horos et al., 2019).
Long non-coding RNAs
Early discoveries of lncRNAs
To distinguish small and medium ncRNAs (see the sections "Small ncRNAs" and "Medium ncRNAs") from long non-coding RNAs (lncRNAs), a cutoff of 200nt in length is generally accepted in the literature. In the pre-genomics era, the first human lncRNA to be identified was H19, a paternally imprinted lncRNA gene that was identified as one of the highest expressed transcripts in embryos, but silenced in most tissues at birth (Davis et al., 1987; Brannan et al., 1990; Bartolomei et al., 1991). H19 is reciprocally imprinted with its adjacent protein coding gene Igf2. Several functional mechanisms have been proposed for H19, which predominantly localizes to the cytoplasm, including its role as a precursor for miR-675 (Yoshimizu et al., 2008) but also as a competitive endogenous RNA (ceRNA), a molecular