Computational Non-coding RNA Biology
Ebook · 727 pages · 7 hours


About this ebook

Computational Non-coding RNA Biology is a resource for the computation of non-coding RNAs. The book covers computational methods for the identification and quantification of non-coding RNAs, including miRNAs, tasiRNAs, phasiRNAs, lariat originated circRNAs and back-spliced circRNAs, the identification of miRNA/siRNA targets, and the identification of mutations and editing sites in miRNAs. The book introduces basic ideas of computational methods, along with their detailed computational steps, a critical component in the development of high throughput sequencing technologies for identifying different classes of non-coding RNAs and predicting the possible functions of these molecules.

Finding, quantifying, and visualizing non-coding RNAs from high-volume, high throughput sequencing datasets is complex. It is therefore usually difficult for biologists to complete all of the necessary analysis steps on their own.

  • Presents a comprehensive resource of computational methods for the identification and quantification of non-coding RNAs
  • Introduces 23 practical computational pipelines for various topics of non-coding RNAs
  • Provides a guide to assist biologists and other researchers dealing with complex datasets
  • Introduces basic computational methods and provides guidelines for their replication by researchers
  • Offers a solution to researchers approaching large and complex sequencing datasets
Language: English
Release date: Sep 14, 2018
ISBN: 9780128143667
Author

Yun Zheng

Yun Zheng is Associate Professor in Bioinformatics at Kunming University of Science and Technology in China. He has been working in bioinformatics for more than 10 years, concentrating on non-coding RNAs, and has published over 30 papers in the area. He has developed novel tools for a wide range of computational topics in non-coding RNAs, validated by influential work in the field. Yun Zheng holds a PhD from the Nanyang Technological University in Singapore.

    Book preview

    Computational Non-coding RNA Biology - Yun Zheng

    2018

    Part 1

    Background

    Outline

    Introduction

    Chapter 1. Introduction to Non-coding RNAs and High Throughput Sequencing

    Introduction

    Non-coding RNAs, RNA-seq Technologies, and Computational Tools

    In recent years, with the development of high throughput sequencing technologies, more and more non-coding RNAs (ncRNAs) have been identified. Based on their sizes, non-coding RNAs are further classified as small RNAs and long non-coding RNAs (lncRNAs). Small RNAs mainly consist of two major types, microRNAs (miRNAs) and small interfering RNAs (siRNAs). LncRNAs are longer than 200 nucleotides. Some lncRNAs are in circular form, derived from lariats generated in splicing processes or from back-spliced exons, and are known as circular RNAs (circRNAs). These diverse types of ncRNAs are attracting attention from all fields of life sciences and translational medicine. Many of these ncRNAs were discovered very recently thanks to the fast development of RNA high throughput sequencing (RNA-seq) technologies. This part therefore introduces the ncRNAs, the RNA-seq technologies, and the computational methods that are used throughout the book.

    Chapter 1

    Introduction to Non-coding RNAs and High Throughput Sequencing

    Abstract

    There are many kinds of non-coding RNAs (ncRNAs) in living cells. We briefly introduce six classes of ncRNAs that are covered and analyzed in the book: microRNA (miRNA), trans-acting small interfering RNA (tasiRNA) and phased small interfering RNA (phasiRNA), long non-coding RNA (lncRNA), lariat RNA, and circular RNA (circRNA). The functions of these ncRNAs and important issues when predicting them, and some online resources of miRNAs, lncRNAs, and circRNAs are also discussed. We also introduce some sequencing technologies for detecting and/or quantifying ncRNAs. Then we introduce some general computational tools that are used in the following parts of the book. Finally, we introduce several file formats for storage and analysis of nucleotide sequences and high throughput sequencing data, and several file formats for gene annotation.

    Keywords

    Non-coding RNA; microRNA (miRNA); trans-acting small interfering RNA; Phased small interfering RNA; Circular RNA; Lariat RNA; Resources of ncRNAs; High throughput sequencing; RNA-seq; Small RNA-seq; PAR-CLIP; Degradome; Computational tools; FASTA; FASTQ; GFF; GTF; BED

    Chapter Outline

    Acknowledgements

    1.1  Introduction to Different Classes of Non-coding RNAs

    1.1.1  Introduction to microRNAs

    1.1.2  Introduction to trans-acting siRNAs and Phased siRNAs

    1.1.3  Introduction to Long Non-coding RNAs

    1.1.4  Introduction to Lariat Originated Circular RNAs

    1.1.5  Introduction to Back-spliced Circular RNAs

    1.2  Introduction to High Throughput Sequencing Technologies

    1.2.1  Introduction to RNA-seq Technologies

    1.2.2  Introduction to sRNA-seq Technologies

    1.2.3  Introduction to PAR-CLIP Sequencing Technology

    1.2.4  Introduction to Degradome Sequencing Technology

    1.3  Brief Introduction to the Software Used in the Book

    1.3.1  The Java Platform

    1.3.2  The JSmallRNA Package

    1.3.3  The FastQC Program

    1.3.4  The Vienna Package

    1.3.5  BLAST

    1.3.6  SOAP2

    1.3.7  BOWTIE and BOWTIE2

    1.3.8  The tcsh Shell

    1.3.9  Cufflinks

    1.3.10  SAMTools and BCFTools

    1.3.11  BEDTools

    1.3.12  Integrated Genomics Viewer

    1.3.13  MATLAB

    1.3.14  The R Environment and RStudio

    1.3.15  The edgeR Package

    1.3.16  The SRA Toolkit

    1.4  File Formats of Sequences and Sequencing Profiles

    1.4.1  The FASTA Format

    1.4.2  Special FASTA Format for Processed Small RNA Profiles

    1.4.3  The FASTQ Format for Raw Sequencing Profiles

    1.5  File Formats for Gene Annotations

    1.5.1  The GFF Format

    1.5.2  The GTF Format

    1.5.3  The BED Format

    1.5.4  The bedGraph Format

    1.6  Summary

    Glossary or Keywords

    Acknowledgements

    Some materials in this chapter were modified from a paper published in the journal PLOS Genetics of Public Library of Science (PLOS), Ziwei Li, Shengpeng Wang, Jinping Cheng, Chuanbin Su, Songxiao Zhong, Qi Liu, Yuda Fang, Yao Yu, Hong Lv, Yun Zheng, and Binglian Zheng. Intron Lariat RNA Inhibits MicroRNA Biogenesis by Sequestering the Dicing Complex in Arabidopsis, PLOS Genetics, Volume 12, Issue 11, 21 November 2016, Pages e1006422; a paper published in the journal BMC Genomics of BioMed Central, Kun Chen, Li Liu, Xiaotuo Zhang, Yuanyuan Yuan, Shuchao Ren, Junqiang Guo, Qingyi Wang, Peiran Liao, Shipeng Li, Xiuming Cui, Yong-Fang Li, and Yun Zheng. Phased secondary small interfering RNAs in Panax notoginseng. BMC Genomics 2018, 19(Suppl 1):41; and a paper published in the journal Cancer Letters of Elsevier B.V., Yun Zheng, Li Liu, and Girish C. Shukla. A comprehensive review of web-based non-coding RNA resources for cancer research, Cancer Letters, Volume 407, 28 October 2017, Pages 1–8.

    1.1 Introduction to Different Classes of Non-coding RNAs

    1.1.1 Introduction to microRNAs

    1.1.1.1 Basic Information on microRNAs

    MicroRNAs (miRNAs) are small non-coding RNA molecules (ncRNAs), with 21 to 22 nucleotides (nt), that can regulate gene expression by specifically recognizing base-pairing sites on their target mRNAs [1]. The first miRNAs discovered were lin-4 and let-7, which regulate the development of Caenorhabditis elegans [2–4]. miRNAs have been found to be pervasive regulatory molecules in both animals [5–7] and plants [8,9]. To date, more than twenty thousand miRNA genes have been identified in more than 200 species [10].

    1.1.1.2 Biogenesis of microRNAs

    Although miRNAs exist in both animals and plants, their biogenesis differs between the two. In animals, the primary transcript of a miRNA (pri-miRNA) is transcribed by RNA polymerase II (Pol II) or III and then folds into a hairpin-like secondary structure (Fig. 1.1A) [11,12]. In the nucleus, the free ends of the hairpin-structured pri-miRNA are cut by Drosha to release a precursor miRNA (pre-miRNA) of 80 to 100 nt [13,14]. The pre-miRNA is then exported into the cytoplasm by Exportin 5 (Exp5) [15]. Another enzyme, Dicer, cleaves the loop end of the pre-miRNA to release a miRNA:miRNA* duplex with a 2 nt overhang at the 3' end [16]. The functional miRNA is loaded into the RNA-induced silencing complex (RISC), which normally contains a core protein of the Argonaute (Ago) family [17–19]. The miRNA then guides the RISC to its mRNA targets through sequence complementarity [19]. The mRNA targets are then either translationally repressed or destabilized at the RNA level [19].

    Figure 1.1 The biogenesis of miRNA and tasiRNA/phasiRNA. (A) The biogenesis of miRNA in animals. (B) The biogenesis of miRNA in plants. (C) The biogenesis of tasiRNA/phasiRNA in plants.

    Although most animal miRNAs are produced in the canonical way shown in Fig. 1.1A, miR-451 in vertebrates is processed by a Dicer-independent pathway [20]. Similarly, some pre-miRNAs in introns, the so-called mirtrons, bypass Drosha cleavage by using the splicing machinery to conduct the first cleavage of their transcripts [21–23].

    Most miRNAs negatively regulate their target genes through homology-based mRNA cleavage or translation inhibition at the post-transcriptional level [24,11]; however, some miRNAs may activate their targets through different mechanisms [25–27].

    As shown in Fig. 1.1B, plant miRNA genes are transcribed by RNA polymerase II [28,29], and their primary transcripts often form typical hairpin structures that are cleaved twice by Dicer Like 1 (DCL1) in the nucleus to release the miRNA:miRNA* duplex [9]. Two co-factors, HYPONASTIC LEAVES 1 (HYL1) and SERRATE (SE), work with DCL1 to induce efficient cleavages [30–32]. Unlike in animals, miRNA:miRNA* duplexes in plants are methylated by HUA ENHANCER 1 (HEN1) [9]. After being exported to the cytoplasm by HASTY (an Exportin 5 homolog) [33], plant miRNAs are loaded into a RISC that also contains an Argonaute (AGO) protein (most commonly AGO1), and guide the RISC to cause site-specific cleavages of mRNA targets [34,35] or translational repression of the targets [36].

    1.1.1.3 Functions of miRNAs

    The miRNA-mediated gene regulation mechanism is conserved from worms to mammals [37,38], which indicates its important functions. miRNAs are involved in many biological processes including cell cycle, differentiation, development, and metabolism [39–44]. Recent studies have emphasized the essential roles of miRNAs in diverse diseases [45–49].

    Although both animal and plant miRNAs can bind to their targets, the mechanisms of animal and plant miRNAs are different. Animal miRNAs normally have partial complementarities with their targets, while plant miRNAs often complement their targets perfectly or nearly perfectly. The first eight nucleotides in animal miRNAs, normally called the seed region, are particularly important in determining the functionality of miRNAs [50]. The imperfect complementarities between animal miRNAs and their targets normally lead to translational repression or induce degradation of the target mRNAs. In contrast, the miRNAs in plants often induce cleavages in the center regions of their fully or nearly fully matched complementary sites.
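    The role of the seed region can be illustrated with a minimal sketch that scans a target sequence for perfect matches to a miRNA seed. The function names and the default seed boundaries (positions 2 to 8 from the 5' end, a commonly used convention) are assumptions made for illustration; passing `seed_start=1` gives the "first eight nucleotides" window described above. Real target predictors use many additional features, so this is not a substitute for the methods introduced later in the book.

```python
def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(rna))


def seed_sites(mirna, target, seed_start=2, seed_end=8):
    """Return 0-based start positions in `target` that perfectly pair with the seed.

    seed_start and seed_end are 1-based, inclusive positions on the miRNA.
    The 2..8 default is an illustrative assumption, not a value from the text.
    """
    # Normalise both sequences to upper-case RNA.
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")

    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # what the seed pairs with on the target

    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)  # allow overlapping sites
    return hits


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a sequence
    toy_target = "AAACUACCUCAAA"      # hypothetical target fragment
    print(seed_sites(let7a, toy_target))
```

    A plant-style search would instead require near-complete complementarity along the whole miRNA, which is why the two kingdoms need different prediction strategies.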

    Plant miRNAs are important small non-coding RNAs that play essential regulatory roles in plant development and stress responses by targeting important transcription factors [51,52,24]. Animal miRNAs have a much wider range of functions than plant miRNAs. The number of direct targets of a given animal miRNA is generally larger than that of a given plant miRNA by at least an order of magnitude [53].

    1.1.1.4 Computational Considerations When Predicting miRNAs and Their Targets

    When predicting plant miRNAs, a set of criteria proposed by Meyers et al. [54] was often used. Recently, Axtell and Meyers updated these criteria [55]. The updated criteria consider several aspects: repeatable detection of miRNAs and miRNA*s with a clear 3' overhang in the sRNA-seq profiles; hairpin-like secondary structures of pre-miRNAs without a large central loop; balanced miRNA and miRNA* regions; sequencing reads mostly generated from the mature miRNA or miRNA*; and mature miRNAs preferentially of 20 to 22 nt. These criteria should also be considered when predicting animal miRNAs, although the settings may need to be revised for animal miRNAs. We revisit these criteria and the identification of miRNAs in detail in Chapter 2.
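    As a rough illustration, several of the criteria above can be expressed as a simple filter over candidate loci. This is a minimal sketch: the `Candidate` fields, the function name, and every threshold value (2 nt overhang, 0.75 read fraction, 2 replicates) are hypothetical placeholders chosen for the example rather than values taken from the text or from [55], and the secondary-structure criteria are deliberately left out because they require an RNA folding tool.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Summary statistics for one candidate miRNA locus (hypothetical fields)."""
    mature_len: int              # length of the mature miRNA in nt
    star_detected: bool          # whether miRNA* reads were observed
    overhang_3p: int             # 3' overhang of the miRNA:miRNA* duplex in nt
    duplex_read_fraction: float  # share of locus reads from miRNA or miRNA*
    replicate_count: int         # number of libraries in which it was detected


def passes_basic_checks(c, min_len=20, max_len=22, overhang=2,
                        min_fraction=0.75, min_replicates=2):
    """Apply the read-based criteria; all thresholds are illustrative."""
    return (
        min_len <= c.mature_len <= max_len
        and c.star_detected
        and c.overhang_3p == overhang
        and c.duplex_read_fraction >= min_fraction
        and c.replicate_count >= min_replicates
    )


if __name__ == "__main__":
    good = Candidate(21, True, 2, 0.9, 3)
    too_long = Candidate(24, True, 2, 0.9, 3)
    print(passes_basic_checks(good), passes_basic_checks(too_long))
```

    Keeping the thresholds as parameters makes it easy to relax or tighten them, for example when adapting the filter to animal miRNAs.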

    Predicted targets of animal miRNAs should preferably be supported by PhotoActivatable Ribonucleoside-enhanced CrossLinking and ImmunoPrecipitation (PAR-CLIP) sequencing reads (see Section 1.2.3). In comparison, when predicting the targets of miRNAs in plants, the most critical consideration is the accumulation of degradome-seq reads (see Section 1.2.4). However, because degradome-seq and PAR-CLIP profiles are generated from particular tissues or cell lines, some functional miRNA:target pairs might not be detected in the tissues from which the profiles were generated. We introduce miRNA target prediction in animals and plants in Chapters 5 and 6, respectively.

    1.1.1.5 Databases and Resources of miRNAs

    The web-based resources for miRNAs are listed in Table 1.1. The first database, miRBase, is the official registry of miRNAs in all species, including humans [10]. The sequences of pre-miRNAs and mature miRNAs, the secondary structures of pre-miRNAs, and related literature can be obtained from miRBase.

    Table 1.1

    Web-based resources for miRNAs.

    EVpedia is an integrated and comprehensive proteome, transcriptome, and lipidome database of extracellular vesicles (EVs) in many species, including humans [56,57]. EVpedia provides databases of vesicular mRNAs, miRNAs, and lipids. Users can search EVpedia for miRNAs in EVs originating from different cells and some cancer cell lines.

    deepBase annotates various small RNAs (miRNAs, siRNAs, and piRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs) [58,59]. In addition to expression functions of sRNAs, deepBase also provides conservation, expression, and prediction functions of lncRNAs.

    miRGator provides diversities or isoforms, expression profiles, targets of miRNAs, and expression relations between miRNAs and targets [60–62].

    ChIPBase collects transcription factor (TF) binding sites and histone modifications and motifs for lncRNAs, miRNAs, and protein-coding genes from 10,200 ChIP-seq data sets [63,64]. ChIPBase includes a tool for exploring the co-expression patterns between TFs and genes by integrating around 10,000 tumors and 9100 normal samples. ChIPBase also provides a tool to find enriched Gene Ontology (GO) terms of a given TF.

    GTRD has processed 8828 ChIP-seq data sets for 713 TFs for humans and mice with four different peak-calling algorithms. The gene regulated by a given TF or the potential TFs for a given gene can be searched for in GTRD [65]. GTRD visualizes the putative TF binding sites on a genome browser.

    DIANA-TarBase is a collection of over 500,000 miRNA:target relations with experimental validations from 356 different cell types from 24 species [66].

    miRTarBase includes around 360,000 experimentally verified miRNA:target relations obtained by text mining and manual surveying [69]. miRTarBase offers various ways to query, such as by targets, pathways, and diseases, to find the relation between miRNAs and diseases.

    miRCode reports putative miRNA target sites across the complete GENCODE annotated transcriptome, including 10,419 lncRNA genes [70].

    starBase reports interactions between miRNAs and various molecules, such as mRNAs, lncRNAs, and circRNAs, by analyzing CLIP-seq data [71,72]. starBase provides a useful tool for analyzing the networks of miRNAs, specific targets of interest, and competitive endogenous RNAs (ceRNAs) for The Cancer Genome Atlas data [72].

    miRWalk provides predicted and experimentally verified miRNA:target interactions within the complete sequence of a gene, and combines this information with a comparison of binding sites from 12 existing miRNA-target prediction programs [73,74]. miRNA:target pairs can be searched for specific GO terms, diseases, and Online Mendelian Inheritance in Man (OMIM) disorders in miRWalk.

    Mutations in miRNAs or miRNA target sites may change the specificities between miRNAs and their targets. Thus, some mutations in miRNAs or miRNA target sites may have played roles in cancers [85]. PolymiRTS is a database of mutations in miRNAs and miRNA target sites [75,76]. SomamiR provides somatic mutations in miRNAs, or miRNA complementary sites in multiple classes of target RNAs, including mRNAs, circRNAs, and lncRNAs [77,78].

    Oncomir provides miRNA expression in sarcoma and colon cancer [79,80]. OncomiRDB collects experimentally verified oncogenic and tumor-suppressive miRNAs using text mining [81]. miRCancer provides miRNA expression profiles in various human cancers obtained by text mining techniques and manual revision [82].

    HMDD (v2.0) [83] is a comprehensive database of experimentally supported associations between miRNAs and diseases. HMDD supports searching for a miRNA's role in different diseases or for the miRNAs related to a specific disease. In addition to miRNA:target relations, HMDD also collects circulation, genetic, and epigenetic relations between miRNAs and diseases.

    miR2Disease [84] is another database of curated relations of miRNA and diseases. miR2Disease supports queries based on miRNAs, targets, and diseases. miR2Disease includes deregulated expression patterns of miRNAs in various human diseases, experimentally verified miRNA targets, and related references.

    1.1.2 Introduction to trans-acting siRNAs and Phased siRNAs

    1.1.2.1 Basic Information on tasiRNAs and phasiRNAs

    In addition to miRNAs, there is another class of small RNA called small interfering RNAs (siRNAs) in plants and some animals [32,86]. siRNAs are characterized by their biogenesis depending on different RNA-dependent RNA polymerase (RDR) members. Several types of siRNAs have been identified in plants, including natural antisense siRNAs (nasiRNAs or natsiRNAs), trans-acting siRNAs (tasiRNAs), repeat-associated siRNAs (rasiRNAs), phased siRNAs (phasiRNAs), chromatin-associated siRNAs (casiRNAs), and promoter-associated siRNAs (pasiRNAs).

    1.1.2.2 Biogenesis of tasiRNAs and phasiRNAs

    Phased siRNAs are a class of secondary siRNAs whose biogenesis is often triggered by miRNAs. "Phased" simply means that these siRNAs are generated precisely in a phased pattern initiated at a specific nucleotide. As shown in Fig. 1.1C, the biogenesis of phasiRNAs or tasiRNAs requires an initiating cleavage of the phasiRNA precursor transcript (PHAS) by a specific miRNA in either a one-hit or two-hit manner [87–92]. One of the cleaved products is then made double stranded by RNA-dependent RNA polymerase 6 (RDR6) and Suppressor of Gene Silencing 3 (SGS3) [93]. The dsRNA is then processed by DCL4 and DRB4 into 21 nt siRNAs in a phased pattern [88,90,87,94,89,91]. Some of the phasiRNAs may also target their parental genes in cis or other genes in trans [88,87,95,96].
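    The phased pattern can be made concrete with a short sketch: starting from the miRNA-guided cleavage position, successive siRNAs begin every 21 nt, so a read is "in phase" when its start coordinate differs from the cleavage position by a multiple of 21. The function names and the toy coordinates are assumptions for illustration, only reads on the cleaved strand are considered, and this is not the phasing score used later in the book.

```python
def phased_starts(cleavage_pos, n_cycles, phase_len=21):
    """Return expected siRNA start positions for n_cycles downstream of cleavage."""
    return [cleavage_pos + i * phase_len for i in range(n_cycles)]


def in_phase_fraction(read_starts, cleavage_pos, phase_len=21):
    """Fraction of reads whose start lies in the register set by the cleavage site."""
    if not read_starts:
        return 0.0
    in_phase = sum(
        1 for start in read_starts
        if (start - cleavage_pos) % phase_len == 0
    )
    return in_phase / len(read_starts)


if __name__ == "__main__":
    # Hypothetical coordinates on a PHAS transcript.
    print(phased_starts(100, 3))
    print(in_phase_fraction([100, 121, 130, 142], 100))
```

    Setting `phase_len=24` adapts the same check to 24 nt phasiRNAs.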

    PhasiRNAs can be generated from either long non-coding RNAs or coding genes. TAS loci are a special type of PHAS locus originating from non-coding RNAs. Arabidopsis miR173 (TAS1 and TAS2), miR390 (TAS3), and miR828 (TAS4) can function as guides on non-coding primary transcripts to initiate tasiRNA1-2, tasiRNA3, and tasiRNA4 processing, respectively. These tasiRNAs can further target pentatricopeptide repeat (PPR) family members, auxin response factors (ARFs), and the myeloblastosis (MYB) transcription factor in a trans manner [88,90,97,98]. Among them, TAS3 is highly conserved in land plants [87,88,95]. Recently, 21 and 24 nt phasiRNAs derived from long non-coding RNAs have been reported in the male reproductive organs of rice and maize; they are triggered by miR2118 and miR2275, and cleaved by DCL4 and DCL5 (also known as DCL3b), respectively [91,99]. A non-coding PHAS locus, triggered by miR4392, was found to accumulate preferentially in soybean anthers
