Omics in Horticultural Crops
Ebook, 2,189 pages, 25 hours


About this ebook

Omics in Horticultural Crops presents a comprehensive view of germplasm diversity, genetic evolution, genomics, proteomics and transcriptomics of fruit crops (temperate, tropical and subtropical fruits, fruit nuts, berries), vegetables, tuberous crops, ornamental and floricultural crops and medicinal aromatic plants. The book covers phenomics, genetic diversity, phylogenetic studies, genome sequencing, and genome barcoding with molecular markers, all of which play an important role in the characterization and effective utilization of diverse germplasm. This is a valuable reference for researchers and academics seeking to improve cultivar productivity through enhanced genetic diversity while also retaining optimal traits and protecting the growing environment.
  • Highlights perspectives, progress and promises of -omics application
  • Provides a systematic overview of origin, progenitor and domestication process as well as genetic insights
  • Includes full range of horticultural crops
Language: English
Release date: Jul 16, 2022
ISBN: 9780323899130
    Book preview

    Omics in Horticultural Crops - Gyana Ranjan Rout

    Chapter 1: State of the art of omics technologies in horticultural crops

    Thomas Debener    Faculty of Natural Sciences, Leibniz University Hannover, Institute for Plant Genetics, Hannover, Germany

    Abstract

    Horticultural crops display an extraordinary diversity in species, which corresponds to different levels of application of omics technologies. Omics technologies were adopted early in major fruit and vegetable crops. Here, vast resources, such as several sequenced genomes per genus or species, SNP chips, and extensive transcriptomic resources, are available and already in use in commercial breeding programs. In contrast, ornamentals, representing the most diverse group of horticultural crops, are rarely represented among species with sequenced genomes. However, for major species of fruit, vegetable, and ornamental crops, sequenced genomes and transcriptomic and proteomic resources are available, and new resources are being added with increasing speed. In addition, numerous metabolomics studies have been published, and recently, phenotyping protocols supported by machine learning analysis tools have come to complement the breeder's toolbox. In this overview, the focus is on genomic, transcriptomic, and high-throughput marker analysis in major horticultural crops available via public databases.

    Keywords

    NGS technologies; Genomics; Transcriptomics; Proteomics; Phenomics; SNP marker; SNP array; RNA-Seq; Multiomics integration; Marker-assisted selection

    1: Introduction

    The term omics is relatively new and describes scientific strategies that aim at the collective characterization and quantification of entire pools of biomolecules and their functions in the structure, physiology, and dynamics of an organism (Jamil et al., 2020; Kolker, 2004; Kumar et al., 2021a). Its use has become popular within the last two decades and emerged from the development of high-throughput analysis technologies for biomolecules. In the omics disciplines, the suffix -ome is used (in contrast to the similar suffix -oma of classical medicine, which describes swellings or tumors) for the entirety of a particular class of biomolecules, such as DNA (genome), RNA (transcriptome), proteins (proteome), or metabolites (metabolome). As a complete census of most of these molecules is hardly possible in a single analysis (with the exception of the genome), the terms often describe the analyses of these molecules in particular organs, cellular compartments, physiological states, or developmental stages (e.g., the defense proteome of a plant). The term genomics is based on the term genome, which was coined as early as 1920 by the German biologist Winkler (Winkler, 1920), who defined the German term Genom as the hereditary material present in the nucleus, prior to knowledge of the true nature of the genome. The popularity of omics has led to an influx of new terms carrying the suffix -ome. Among the generally accepted ones are the interactome, describing the interactions among proteins but also between molecule classes (Ran et al., 2020), and terms for subgroups of molecules, such as the phosphoproteome (Chen et al., 2021), describing the entirety of proteins that are phosphorylated.

    In addition to these individual disciplines, researchers have also begun to combine data and knowledge gained in the individual omics disciplines in so-called multiomics integration (MOI) studies (Jamil et al., 2020). The most basic MOI approaches try to detect correlations between the abundances of different biomolecules, while other strategies aim at a higher-level integration of different omics data to enhance our understanding of plant functions. The most relevant and most widely applied omics disciplines for the plant sciences are genomics, transcriptomics, proteomics, metabolomics, and phenomics. In this review, I will give a very brief definition of these disciplines before I present some examples focusing on genome and transcriptome analyses in horticultural crops (Fig. 1).
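    The most basic form of multiomics integration mentioned above, a correlation between the abundances of two biomolecules, can be illustrated with a minimal Python sketch. The abundance values are hypothetical and serve only as an illustration; the function computes a plain Pearson correlation coefficient and is not taken from any of the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical abundances of one transcript and one metabolite,
# measured in the same five samples (illustrative values only).
transcript_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
metabolite_abundance = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson(transcript_abundance, metabolite_abundance)
print(f"Pearson r = {r:.3f}")
```

    A high coefficient only indicates co-variation across samples; it does not by itself show that the transcript and the metabolite are functionally linked.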

    Fig. 1 Graphical overview of the major omics technologies with the highest relevance for horticultural crops. MOI, multiomics integration.

    1.1: Genomics

    1.1.1: Tasks

    •Analysis of the structure, function, and evolution of genomes

    •Molecular marker maps, sequencing of individual genes, and functional analysis of genetic information with various techniques, including the generation of transgenic plants

    •Complete sequences and gene spaces of whole genomes

    1.1.2: Technological advances

    •Development of various novel sequencing technologies (next-generation sequencing, NGS) following Sanger technology

    •Development of bioinformatics tools for the analysis of large sequence datasets

    •Development of high-throughput marker technologies (single-nucleotide polymorphism (SNP) arrays, genotyping by sequencing (GBS))

    1.1.3: Challenges

    •Large complex genomes of many crop species, often polyploid, hinder completion of chromosome-level genomes due to repetitive DNA and allelic differences in heterozygous plants

    •Significant structural differences in the genomes between genotypes within species

    •Exponential increase in datasets that need to be analyzed

    1.2: Transcriptomics

    1.2.1: Tasks

    •Analysis of the transcribed part of the genetic information of the genome

    •Analysis of gene expression differences (differential gene expression) among genotypes, tissues, and physiological and developmental stages

    1.2.2: Technological advances

    •Development of various novel sequencing technologies following Sanger technology

    •Development of bioinformatics tools for the analysis of large sequence datasets

    •Development of array technologies

    1.2.3: Challenges

    •Complexity of transcriptomes higher than that of genomes due to splicing variants

    •Analysis of some tissues and/or stages difficult (rare stages, small groups of cells)

    •Assignment of short reads to individual paralogues difficult in complex genomes and without completely sequenced genomes

    1.3: Proteomics

    1.3.1: Tasks

    •Analysis of the translated part of the genome

    •Analysis of protein abundance in different tissues and developmental and physiological stages

    1.3.2: Technological advances

    •Significant extension of sensitivity and throughput by novel separation technologies, such as liquid chromatography (LC) and mass spectrometric (MS) methods

    •Development of bioinformatics tools to assign putative amino acid sequences to peptides

    •Development of reference databases

    1.3.3: Challenges

    •Posttranslational modifications lead to a higher diversity of proteins than transcripts

    •Separation technologies and MS can only identify a subset of all postulated proteins and only a subset of all possible posttranslational modifications

    •Analysis of some tissues and/or stages difficult (rare stages, small groups of cells)

    1.4: Metabolomics

    1.4.1: Tasks

    •Analysis of the type and amount of metabolites and signaling molecules in genotypes, tissues, physiological and developmental stages

    1.4.2: Technological advances

    •Significant extension of sensitivity and throughput by novel separation technologies (gas chromatography (GC), liquid chromatography (LC), mass spectrometric methods)

    1.4.3: Challenges

    •Very large complexity of metabolites in plants, including numerous chemical derivatives of compounds

    •Limited capacity to detect all molecular variants of compound groups

    •Limited methods to analyze compounds nondestructively in plant tissues and putative artifacts generated during the isolation processes

    1.5: Phenomics

    1.5.1: Tasks

    •Measurement of plant growth, architecture, physiological state, and composition at different scales

    •Low- and high-throughput analysis of particular traits in large groups of plants

    1.5.2: Technological advances

    •Significant progress in sensor technologies, particularly optical sensors, on various scales

    •Development of software for automated image processing, recently including machine learning

    1.5.3: Challenges

    •Huge, high-dimensional datasets difficult to process due to computational limits

    •Large-scale (whole greenhouse/field phenotyping) very resource demanding

    •Analyses of complex structures (e.g., compound leaves and complex plant architectures) and some physiological changes computationally demanding

    2: Preliminary remarks

    This introductory chapter will first give an overview of the application of omics technologies to horticultural crops. It is not meant to be complete in the sense that not all horticultural crops will be covered. The reason for this is that the species diversity of horticultural crops by far exceeds the diversity of agricultural crops. A major contribution to this species diversity comes from ornamental plants, with several thousand cultivated species (as one of several examples, see the information provided by the International Society for Horticultural Science, ISHS: https://www.ishs.org/). In this group, another level of diversity is the large number of varieties bred for some of the major ornamental crops, which is responsible for the large number of protected ornamental varieties in the CPVO database in Europe, with 29,785 granted variety rights out of a total of 56,561 granted for all crop species together (https://cpvo.europa.eu/en/statistics; https://cpvo.europa.eu/sites/default/files/documents/a4_cpvo_statistics_version_2-2021.pdf, updated on 03.05.2021).

    In addition, because genomics resources are currently advancing with tremendous speed, only a selection of horticultural crops will be discussed as examples in this introductory chapter.

    Furthermore, different omics technologies have not had an equal impact on horticultural crops thus far. Only a few model species, such as tomato and petunia, are found among horticultural crops. As a consequence, a significant part of research conducted in horticulture can be considered to be applied research. This in turn makes genomics the most important omics technology; therefore, this lead chapter will be biased toward developments in genomics and transcriptomics with less coverage of proteomics and metabolomics. Another reason for this is that genomics paved the way for transcriptomics and proteomics in providing gene models that are needed in transcriptomic research and, to some extent, also in proteomic research.

    3: Genomics

    Before novel technologies emerged for the analysis of plant genomes, the foundations had been laid by decades of research on genetics successively supported by molecular biology techniques, such as molecular markers. By 2000, when the first plant genome was published (Kaul et al., 2000), maps with molecular markers had been published for hundreds of horticultural crops, and a large number of genes had been isolated and characterized (Ahmad et al., 2021; Rajapakse, 2003).

    As a prominent example, tomato, as a model plant for vegetables, already profits from numerous genetic and genomic resources, including genotype and mutant collections (Bai and Lindhout, 2007), genetic maps, and cloned genes. One of the first genetic maps constructed with molecular markers in plants was available for tomato starting in 1986 (Bernatzky and Tanksley, 1986). The first marker maps for fruit crops were published for Citrus in 1992 (Jarrell et al., 1992), and marker maps for ornamentals were published starting in 1999 (Debener and Mattiesch, 1999). These resources advanced genetic research in that they significantly supported trait analysis and analyses of genome structure.

    3.1: Technological advances

    If we consider the revolutionary developments in omics technologies in plants, the genomics revolution has probably had the highest impact among all omics technologies thus far. The tremendous increase in sequencing capacity, accompanied by a drop in sequencing costs (https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data), triggered major developments throughout the plant sciences (Le Nguyen et al., 2019). The breakthroughs in omics technologies stem from three areas: the development of new sequencing technologies, the development of microarray technologies, and progress in bioinformatics, which has allowed us to handle the ever-increasing datasets generated by sequencing technologies (Le Nguyen et al., 2019; van Dijk et al., 2018; Kumar et al., 2021b; Lowe et al., 2017). Until 2005, genomics projects relied on Sanger sequencing technology (van Dijk et al., 2018), which had been moderately improved over time by increasing the capacity of sequencing machines; the first plant genome mentioned above, that of Arabidopsis (Kaul et al., 2000), was mainly sequenced with this technology. Starting in 2005 with the introduction of 454 technology (Margulies et al., 2005), followed by several others, sequencing capacity increased exponentially and sequencing costs dropped tremendously (https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost). To date, the major technologies used for genomics are Illumina technology, which offers unprecedented numbers of reads per machine run; PacBio and Nanopore sequencing for sequencing long DNA molecules; and complementary technologies, such as chromosome conformation capture (HiC) and optical mapping with BioNano technology (Lam et al., 2012), which are combined with sequencing to build chromosome-scale assemblies.

    A remarkable example of horticultural crops that profit from these technologies is the recent de novo sequencing of an apple genome (Zhang et al., 2019) that combined PacBio sequencing with the HiC technique and a BioNano optical map to support the assembly. This genome provides a significant improvement over the initial sequence of the Golden Delicious genome in average scaffold length and also ultimately in the precision of the genome models derived from the sequence.

    Today, chromosome-scale assemblies represent the gold standard of genome assemblies, where the sequences are arranged in so-called pseudomolecules, each representing a more or less complete chromosome. These sequences are neither complete nor free of gaps: regions such as centromeres, telomeres, and other repeat-rich regions are extremely difficult to assemble even with the technologies mentioned above.

    3.2: Completely sequenced horticultural genomes

    To date, several hundred plant genomes have been sequenced. As the number of sequenced genomes is constantly increasing, only examples will be given in this introductory chapter. Information on sequenced plant genomes can be obtained from various databases that cover genomes from all taxonomic groups or just subsets, for example, resources dedicated to plant families such as Solanaceae (The Sol-Genomics network: https://solgenomics.net/; Bombarely et al., 2011) or Rosaceae (Genome Database For Rosaceae: https://www.rosaceae.org/; Main et al., 2016). The following web-based platforms represent a small selection of general overviews on the current status of sequenced plant genomes, with the restriction that none of the databases contains all sequenced and/or published genomes (Table 1).

    Table 1

    Fully sequenced genomes also have significance for recently developed biotechnological strategies (Fig. 2). Here, genome editing is of prime importance, representing one of the most important advances in biological sciences of the last three decades (Atkins and Voytas, 2020; Barman et al., 2020). Genome editing, in particular genome editing conducted by CRISPR-Cas techniques, relies on precise prediction of target sequences. These can be obtained from high-quality genomes and resequencing projects. Most importantly, a fully sequenced genome also provides information on related genes (as putative so-called off-target sequences), which is important for the development of specific target sequences. If genome-edited plants have to be analyzed for the extent of off-target modifications, a high-quality reference genome sequence allows efficient targeted (i.e., only sequences closely related to the original target are analyzed) or nontargeted (i.e., the whole genome of edited genotypes is resequenced) resequencing approaches (Modrzejewski et al., 2019).
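    The idea of screening a reference sequence for sites closely related to an intended target can be illustrated with a minimal Python sketch. It is a deliberately simplified assumption-laden example: it compares a hypothetical 20-nucleotide target with every window on one strand by counting mismatches, and it ignores PAM requirements, the reverse strand, and insertions or deletions, all of which dedicated off-target prediction tools take into account. The sequences are invented for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equally long strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_candidate_sites(genome_seq, target, max_mismatches=3):
    """Return (position, site, mismatches) for every window of the
    sequence that differs from the target by at most max_mismatches."""
    genome_seq = genome_seq.upper()
    target = target.upper()
    k = len(target)
    hits = []
    for pos in range(len(genome_seq) - k + 1):
        site = genome_seq[pos:pos + k]
        mismatches = hamming(site, target)
        if mismatches <= max_mismatches:
            hits.append((pos, site, mismatches))
    return hits


# Hypothetical target and a toy "genome" containing the exact target
# and one related site that differs at a single position.
target = "ACGTTGCATGCCGATAGCTA"
related_site = "ACGTTGCATGTCGATAGCTA"
genome = "TTTT" + target + "CCCC" + related_site + "GGGG"

for pos, site, mismatches in find_candidate_sites(genome, target, max_mismatches=1):
    print(pos, site, mismatches)
```

    Sites with zero mismatches correspond to the intended target; sites with a small number of mismatches are the putative off-target sequences that would be prioritized in a targeted resequencing approach.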

    Fig. 2 Graphical representation of genomics and transcriptomics, their major sources of innovation, and their mutual interaction for important applications. MAS, marker-assisted selection.

    3.2.1: Vegetables

    The genomes of a large number of vegetables have been sequenced to date; a selection of five important species is given in Table 2. In vegetables, as in the major agricultural crops, a significant number of genomes have been generated by the private sector and hence have not been published. The first published genome was the cucumber genome in 2009, quickly followed by others (Table 2). In addition to the cucumber genomes deposited in the public databases mentioned above, more than 180 additional so-called resequenced genomes, i.e., genomes sequenced with lower sequencing depth, have been generated (Huang et al., 2009; Zhang et al., 2020). A somewhat similar situation can be seen for many other important horticultural crops.

    Table 2

    3.2.2: Fruits

    The first fruit crop genomes to be sequenced were grape in 2007 (Jaillon et al., 2007), papaya in 2008 (Ming et al., 2008), and apple in 2010 (Velasco et al., 2010). Genomic data for a selection of six important fruit species are given in Table 3.

    Table 3

    3.2.3: Ornamentals

    The first ornamental genome to be sequenced was that of Prunus mume in 2012 (Zhang et al., 2012), which has a relatively small genome of 280 Mb, followed by carnation (Yagi et al., 2014). In a recent review on genomics in ornamentals, which cited publications up to October 2020, 69 ornamental genome sequences were reported (Zheng et al., 2021). A selection of important ornamental species is shown in Table 4. However, the quality of the sequenced genomes of ornamentals varies significantly, from scaffold-level assemblies in carnation to high-quality chromosome-level assemblies in roses. Compared with major agricultural crops, completed genomes are fewer and of lower average quality (Table 4).

    Table 4

    4: Transcriptomics

    As in genomics, transcriptomics started before the advent of NGS technologies with numerous conventional techniques, such as Sanger sequencing of cDNAs (EST sequencing) and transcript profiling technologies, such as cDNA-AFLPs, cDNA microarrays, and other techniques (Lowe et al., 2017). In particular, cDNA arrays provided the first medium- to high-throughput information on gene expression in plants. However, array-based technologies often do not reach the sensitivity of NGS-based technologies in the detection of gene expression (Yang and Wei, 2015).

    The use of NGS tools significantly boosted gene expression analysis in two ways: it allowed for more comprehensive analyses of the transcribed genes of a plant and for analyses of the physiological states of tissues or whole plants with unprecedented depth. However, this is only possible due to a continuous improvement in the bioinformatics tools that now allow the quantification of short reads assigned to specific transcripts or gene models (Simoneau et al., 2021). To date, transcriptomic studies using high-throughput platforms (i.e., either various types of expression arrays or variations of the RNA-Seq strategy) have been applied to numerous horticultural crops. All species listed in Tables 2–4 are represented by several publications using transcriptomic techniques for a broad range of scientific questions. The greatest potential of transcriptomics, however, is reached when RNA-Seq analyses are performed in species where comprehensive sets of gene models have been generated (Fig. 2). In these cases, read mapping, i.e., the determination of the genes matching transcripts, is technically much easier on complete sets of gene models than in nonsequenced genomes, in which the targets of the mapping process have to be generated from the transcriptome data itself (Lowe et al., 2017).
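    Once reads have been assigned to gene models, expression levels can be compared between samples. The following Python sketch illustrates this step under simplifying assumptions: the read counts and gene names are hypothetical, counts are normalized to counts per million (CPM) to correct for library size, and a log2 fold change is computed with a pseudocount. Real differential expression analyses rely on biological replicates and statistical models, which this sketch does not attempt to reproduce.

```python
from math import log2


def counts_per_million(counts):
    """Scale raw read counts per gene to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no assigned reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(b / a) per gene; the pseudocount avoids division by zero."""
    return {
        gene: log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
        if gene in cpm_b
    }


# Hypothetical read counts per gene model in two samples.
control = {"gene_1": 500, "gene_2": 300, "gene_3": 200}
treated = {"gene_1": 500, "gene_2": 100, "gene_3": 400}

cpm_control = counts_per_million(control)
cpm_treated = counts_per_million(treated)
lfc = log2_fold_change(cpm_control, cpm_treated)

for gene, value in lfc.items():
    print(f"{gene}: log2 fold change = {value:+.2f}")
```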

    5: Applications of combined genomic and transcriptomic data

    5.1: Definition of gene spaces

    The sequencing of genomes at the chromosome level now allows us to define the complete gene space of a plant (i.e., the complete set of all its genes). This is usually done by predicting gene models in sequenced genomes using various gene prediction algorithms that are trained on complete cDNA sequences to capture the particular gene structure of the species under study (Fig. 2, Stanke et al., 2006; Keller et al., 2011). Here, chromosome-level genome sequences are the gold standard because they not only provide information on the position of genes but also more comprehensive information on the relationship between closely related paralogues and alleles. This also allows a higher precision in assigning short RNA-Seq reads to the correct transcript in differential expression analyses via RNA-Seq.

    5.2: Marker development

    Despite available genomes and transcriptomes, molecular markers are still crucial for the genomic analysis of many horticulturally important traits and for the isolation of causal genetic factors (Rout and Mohapatra, 2006). Technological advances leading to progress in genomics and transcriptomics have also significantly improved marker technologies (Le Nguyen et al., 2019; Zhang et al., 2020; Jaganathan et al., 2020). Although single locus marker systems are still in use in many horticultural crops, more advanced applications, such as genome-wide association studies (GWAS) or high-resolution gene and quantitative trait locus (QTL) mapping, mostly utilize high-throughput marker systems (Schulz et al., 2016; Luo et al., 2020). Among these markers, SNP markers have acquired a dominant position, as they can be efficiently automated and multiplexed. In contrast to single locus markers such as SSRs (simple sequence repeats), SNP markers also provide marker dosage information. The latter characteristic is very important in polyploid crops, where heterozygotes fall into several allele dosage classes rather than the single heterozygous class found in diploids (Smulders et al., 2019). Two main strategies are currently used for high-throughput SNP genotyping: single-nucleotide polymorphism (SNP) arrays and NGS-based genotyping on complexity-reduced genomic DNA, such as genotyping by sequencing (GBS) or restriction site-associated DNA sequencing (RadSeq) (Rasheed et al., 2017; Zhang et al., 2020). SNP analysis is based on the detection of two alternative states for a polymorphic nucleotide position. On SNP arrays, this is based on DNA hybridization, and each marker detects the state of a single nucleotide position within a clearly defined sequence. Some sequences (mostly but not exclusively coding sequences) may be represented by several markers on an array, each detecting a different SNP position within the target sequence. GBS and RadSeq sample a subset of the genomic regions of a genome, and the resulting fragments are sequenced with one of the short-read NGS technologies. The sequences are then mapped onto genomic targets, and SNPs can be detected for sequenced regions if the sequencing depth is sufficient to allow allele discrimination and dosage determination (Torkamaneh et al., 2018; Clevenger et al., 2015). For many of the major horticultural crops, SNP-based technologies have been developed (Table 5). Custom-made SNP arrays are available for a number of crops (Table 5), and numerous service providers offer processing of DNA samples, including bioinformatics analyses for both SNP arrays and sequencing-based SNP analyses, so that even labs without direct access to expensive equipment can use these resources.
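    The dependence of dosage determination on sequencing depth can be illustrated with a minimal Python sketch. It assumes a simple nearest-class rule: for a given ploidy, a genotype with dosage d of the alternative allele is expected to show that allele in a fraction d/ploidy of the reads. The function name, the default minimum depth of 20 reads, and the read counts are illustrative assumptions; published genotype callers typically use probabilistic models rather than this simple rule.

```python
def call_dosage(ref_reads, alt_reads, ploidy=4, min_depth=20):
    """Return the alternative-allele dosage (0..ploidy) closest to the
    observed read fraction, or None if the read depth is too low."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None  # not enough reads to discriminate dosage classes
    alt_fraction = alt_reads / depth
    # The expected alternative-allele fraction for dosage d is d / ploidy.
    return min(range(ploidy + 1), key=lambda d: abs(alt_fraction - d / ploidy))


# A diploid has a single heterozygous class (dosage 1) ...
print(call_dosage(20, 20, ploidy=2))
# ... whereas a tetraploid has three heterozygous classes (dosage 1, 2, 3).
print(call_dosage(30, 10, ploidy=4))
print(call_dosage(22, 18, ploidy=4))
print(call_dosage(9, 31, ploidy=4))
# With too few reads, no call is made.
print(call_dosage(5, 5, ploidy=4))
```

    The higher the ploidy, the closer the expected read fractions of neighboring dosage classes lie together, which is why polyploid crops require a greater sequencing depth per locus than diploids.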

    Table 5

    5.3: Use of omics to link phenotypes to genes and QTLs

    One of the immediate advantages of the application of omics technologies in horticultural crops is the facilitation of gene identification. Here, both genomics and transcriptomics led to a major leap forward in identifying a large number of genes responsible for major characteristics and horticultural traits (Fig. 2). Individual candidate genes for particular monogenic horticultural characteristics can be identified based on various omics technologies. These include the following:

    •Differentially expressed genes for which additional information is available, e.g., annotations matching an expected gene function.

    •Predicted genes within a marker interval determined by molecular marker maps. Here, annotations matching the expected function together with expression information will help to limit the number of possible candidate genes.

    •Candidate genes predicted for well-studied pathways (e.g., anthocyanin biosynthesis) for which gene models and expression information help to distinguish likely candidates (expressed in the right tissue and physiological state, sequences carrying no disruptive mutations) from less likely candidates.

    Often, a combination of positional information by high-density marker analyses and transcriptomic analyses is particularly fruitful in the identification of novel gene functions.

    One of the numerous examples from horticultural crops is the isolation of candidate genes for amygdalin production in sweet almonds (Sánchez-Pérez et al., 2019). Here, the almond reference genome was completed by NGS, followed by high-resolution mapping with molecular markers in a segregating almond population, leading to a genomic interval of 46 kb between the most closely linked markers. This interval comprised 11 predicted genes, 5 of which belonged to a group of bHLH transcription factors that are putative regulators of structural genes in amygdalin biosynthesis. By functional analyses, which also revealed loss-of-function mutations in orthologues isolated from almonds low in amygdalin, one of the five candidate genes was identified to regulate amygdalin biosynthesis. It was therefore identified as the genetic factor selected in sweet almond domestication.
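    The combination of positional information and annotation used in such studies can be sketched in a few lines of Python. The gene identifiers, coordinates, and annotations below are hypothetical and do not reproduce the almond data; the sketch only shows how predicted genes overlapping a marker interval can be selected and then narrowed down by an annotation keyword.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int
    annotation: str


def genes_in_interval(genes, chrom, left_marker, right_marker):
    """Genes on the given chromosome that overlap the marker interval."""
    lo, hi = sorted((left_marker, right_marker))
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


def filter_by_annotation(genes, keyword):
    """Genes whose annotation contains the keyword (case-insensitive)."""
    return [g for g in genes if keyword.lower() in g.annotation.lower()]


# Hypothetical gene models; the flanking markers span 46 kb on chr5.
genes = [
    Gene("gene_A", "chr5", 10_000, 12_500, "bHLH transcription factor"),
    Gene("gene_B", "chr5", 20_000, 23_000, "unknown protein"),
    Gene("gene_C", "chr5", 44_000, 47_500, "bHLH transcription factor"),
    Gene("gene_D", "chr5", 90_000, 92_000, "bHLH transcription factor"),
    Gene("gene_E", "chr2", 15_000, 18_000, "bHLH transcription factor"),
]

in_interval = genes_in_interval(genes, "chr5", 5_000, 51_000)
candidates = filter_by_annotation(in_interval, "bHLH")

print([g.gene_id for g in in_interval])
print([g.gene_id for g in candidates])
```

    In practice, the remaining candidates would be ranked further with expression data and sequence comparisons between contrasting genotypes before functional analyses are started.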

    Another area with significant progress stimulated by the development of methods and resources in omics is the analysis of quantitative traits. An example of the combination of low-throughput analysis of QTL mapping and gene expression analysis is the analysis of russeting in apple in a biparental segregating population (Lashbrooke et al., 2015). Russeting, a fruit skin disorder in apple, and cuticle-related traits were phenotyped in 88 individuals of an F1 population for which a dense marker map had been previously generated. Two major QTL regions were identified, and in two very large genomic intervals spanned by each of the QTLs, a likely candidate gene was identified by expression analysis in affected and nonaffected fruits.

    However, conventional QTL analysis in biparental segregating populations with low-throughput molecular markers often suffers from low resolution of QTL loci. The analysis of unrelated individuals in so-called association mapping approaches can circumvent this problem and allows both higher resolution in the detection of QTLs and the possible detection of both more alleles and more effective QTLs than in biparental populations. However, this depends on how large and how diverse the population under study is (Nordborg and Weigel, 2008; Myles et al., 2009; Rafalski, 2010). The application of high-throughput marker systems, such as SNP arrays or GBS-based SNP genotyping, has significantly facilitated GWAS in horticultural crops. An example from an ornamental crop with a complex genome was recently described by Schulz et al. (2021), in which a set of 96 mostly polyploid rose genotypes was phenotyped for petal-related traits and genotyped with an Axiom68k SNP array. This revealed a number of associated loci, some of which only spanned small regions of the genome.

    5.4: Understanding crop evolution

    Genome sequences and large datasets on expressed genes that resulted from omics projects have also opened new opportunities in studies of crop plant evolution (Adams and Wendel, 2005; Clark and Donoghue, 2018; Soltis and Soltis, 2016). Here, either significantly extended sets of sequences or even the structure of chromosomes can be used to infer the evolutionary history of related crop species. This allows us to analyze the domestication history of crops and the genomic changes that happened during the diversification of crop species from progenitors. Examples from the Rosaceae include the publications of Illa et al. (2011) and Jung et al. (2012), where both markers and detailed genomic information were used to analyze synteny among rosaceous fruit crops. These studies revealed several fissions and fusions of syntenic chromosomes across the Rosoideae, Maloideae, and Prunoideae within the Rosaceae family and finally resulted in the postulation of an ancestral genome structure for all three groups. Other examples are results from comparisons between chromosome-level genomes of Cucumis sativus and Cucumis melo (Ling et al., 2021) with a focus on resistance-related genome regions or comparisons of the Raphanus species with NGS-generated high-density marker maps (Luo et al., 2020).

    Major results related to genome evolution based on omics data:

    •Every plant genome, including those of diploid species with small genomes such as Arabidopsis thaliana, contains ancient duplicated regions. These regions result from so-called paleopolyploidy and are the remnants of ancient genome duplication events. Each extant angiosperm genome has undergone several such events since the angiosperms diverged from their common ancestor with the gymnosperms (Bayer et al., 2020; Clark and Donoghue, 2018).

    •Within each species, some genomic regions vary extensively between genotypes (Cao et al., 2011; de Franceschi et al., 2018). This is most pronounced in regions harboring members of large gene families, e.g., TIR-NBS genes. In some cases, little collinearity exists between otherwise closely related genotypes (de Boer et al., 2015; Ballvora et al., 2007).

    •Resequencing projects and analyses of transcriptome data revealed differences in the sets of expressed genes between individuals of the same species. These observations led to the pangenome concept, which distinguishes the core genome, present in all genotypes of a species, from the dispensable genome, which comprises genes present in only some genotypes. Together, the core genome and the dispensable genome form the pangenome (Morgante et al., 2007; Hirsch et al., 2014; Hubner et al., 2019; Jayakodi et al., 2021).
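
    The relationship between core genome, dispensable genome, and pangenome can be illustrated with simple set operations. The following Python sketch is only an illustration: the function name partition_pangenome, the genotype labels, and the gene identifiers are hypothetical and do not come from any of the cited studies, and real pangenome analyses work from presence/absence calls derived from assemblies or resequencing data.

```python
def partition_pangenome(presence):
    """Split a pangenome into core and dispensable gene sets.

    presence maps each genotype name to the set of genes detected in it.
    Returns (pangenome, core, dispensable).
    """
    gene_sets = list(presence.values())
    if not gene_sets:
        return set(), set(), set()
    pangenome = set().union(*gene_sets)    # genes found in at least one genotype
    core = set.intersection(*gene_sets)    # genes found in every genotype
    dispensable = pangenome - core         # genes missing from at least one genotype
    return pangenome, core, dispensable


# Hypothetical presence/absence calls for three genotypes (illustrative only).
example = {
    "genotype_A": {"gene1", "gene2", "gene3"},
    "genotype_B": {"gene1", "gene2", "gene4"},
    "genotype_C": {"gene1", "gene3", "gene4", "gene5"},
}
pan, core, dispensable = partition_pangenome(example)
```

    In this toy example, only gene1 is present in all three genotypes and therefore forms the core genome; the remaining four genes belong to the dispensable genome.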

    References

    Adams and Wendel, 2005 Adams K.L., Wendel J.F. Polyploidy and genome evolution in plants. Curr. Opin. Plant Biol. 2005;8(2):135–141. doi:10.1016/j.pbi.2005.01.001.

    Ahmad et al., 2021 Ahmad R., Anjum M.A., Naz S., Balal R.M. Applications of molecular markers in fruit crops for breeding programs—a review. Phyton Int. J. Exp. Bot. 2021;90(1):17–34. doi:10.32604/phyton.2020.011680.

    Atkins and Voytas, 2020 Atkins P.A.P., Voytas D.F. Overcoming bottlenecks in plant gene editing. Curr. Opin. Plant Biol. 2020;54:79–84. doi:10.1016/j.pbi.2020.01.002.

    Bai and Lindhout, 2007 Bai Y., Lindhout P. Domestication and breeding of tomatoes: what have we gained and what can we gain in the future? Ann. Bot. 2007;100(5):1085–1094. doi:10.1093/aob/mcm150.

    Ballvora et al., 2007 Ballvora A., Joecker A., Viehoever P., Ishihara H., Paal J., Meksem K., et al. Comparative sequence analysis of Solanum and Arabidopsis in a hot spot for pathogen resistance on potato chromosome V reveals a patchwork of conserved and rapidly evolving genome segments. BMC Genomics. 2007;8. doi:10.1186/1471-2164-8-112.

    Barman et al., 2020 Barman A., Deb B., Chakraborty S. A glance at genome editing with CRISPR-Cas9 technology. Curr. Genet. 2020;66(3):447–462. doi:10.1007/s00294-019-01040-3.

    Bayer et al., 2020 Bayer P.E., Golicz A.A., Scheben A., Batley J., Edwards D. Plant pan-genomes are the new reference. Nat. Plants. 2020;6(8):914–920. doi:10.1038/s41477-020-0733-0.

    Bernatzky and Tanksley, 1986 Bernatzky R., Tanksley S.D. Toward a saturated linkage map in tomato based on isozymes and random cDNA sequences. Genetics. 1986;112(4):887–898.

    Bombarely et al., 2011 Bombarely A., Menda N., Tecle I.Y., Buels R.M., Strickler S., Fischer-York T., et al. The Sol Genomics Network (solgenomics.net): growing tomatoes using Perl. Nucleic Acids Res. 2011;39(1):D1149–D1155. doi:10.1093/nar/gkq866.

    Bombarely et al., 2016 Bombarely A., Moser M., Amrad A., Bao M., Bapaume L., Barry C.S., et al. Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida. Nat. Plants. 2016;2(6). doi:10.1038/NPLANTS.2016.74.

    Cai et al., 2015 Cai J., Liu X., Vanneste K., Proost S., Tsai W.C., Liu K.W., et al. The genome sequence of the orchid Phalaenopsis equestris. Nat. Genet. 2015;47(1):65. doi:10.1038/ng.3149.

    Cao et al., 2011 Cao J., Schneeberger K., Ossowski S., Guenther T., Bender S., Fitz J., et al. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat. Genet. 2011;43(10):956–U60. doi:10.1038/ng.911.

    Chen et al., 2021 Chen Y., Wang Y., Yang J., Zhou W., Dai S. Exploring the diversity of plant proteome. J. Integr. Plant Biol. 2021. doi:10.1111/jipb.13087.

    Clark and Donoghue, 2018 Clark J.W., Donoghue P.C.J. Whole-genome duplication and plant macroevolution. Trends Plant Sci. 2018;23(10):933–945. doi:10.1016/j.tplants.2018.07.006.

    Clevenger et al., 2015 Clevenger J., Chavarro C., Pearl S.A., Ozias-Akins P., Jackson S.A. Single nucleotide polymorphism identification in polyploids: a review, example, and recommendations. Mol. Plant. 2015;8(6):831–846. doi:10.1016/j.molp.2015.02.002.

    D’Hont et al., 2012 D’Hont A., Denoeud F., Aury J.-M., Baurens F.-C., Carreel F., Garsmeur O., et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488(7410):213. doi:10.1038/nature11241.

    de Boer et al., 2015 de Boer J.M., Datema E., Tang X., Borm T.J.A., Bakker E.H., van Eck H.J., et al. Homologues of potato chromosome 5 show variable collinearity in the euchromatin, but dramatic absence of sequence similarity in the pericentromeric heterochromatin. BMC Genomics. 2015;16. doi:10.1186/s12864-015-1578-1.

    de Franceschi et al., 2018 de Franceschi P., Bianco L., Cestaro A., Dondini L., Velasco R. Characterization of 25 full-length S-RNase alleles, including flanking regions, from a pool of resequenced apple cultivars. Plant Mol. Biol. 2018;97(3):279–296. doi:10.1007/s11103-018-0741-x.

    Debener and Mattiesch, 1999 Debener T., Mattiesch L. Construction of a genetic linkage map for roses using RAPD and AFLP markers. Theor. Appl. Genet. 1999;99(5):891–899. doi:10.1007/s001220051310.

    Finkers et al., 2021 Finkers R., van Kaauwen M., Ament K., Burger-Meijer K., Egging R., Huits H., et al. Insights from the first genome assembly of Onion (Allium cepa). G3. 2021;11(9):jkab243. doi:10.1093/g3journal/jkab243.

    Guo et al., 2013 Guo S., Zhang J., Sun H., Salse J., Lucas W.J., Zhang H., et al. The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat. Genet. 2013;45(1):51. doi:10.1038/ng.2470.

    Hirakawa et al., 2014 Hirakawa H., Shirasawa K., Miyatake K., Nunome T., Negoro S., Ohyama A., et al. Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res. 2014;21(6):649–660. doi:10.1093/dnares/dsu027.

    Hirakawa et al., 2019 Hirakawa H., Sumitomo K., Hisamatsu T., Nagano S., Shirasawa K., Higuchi Y., et al. De novo whole-genome assembly in Chrysanthemum seticuspe, a model species of Chrysanthemums, and its application to genetic and gene discovery analysis. DNA Res. 2019;26(3):195–203. doi:10.1093/dnares/dsy048.

    Hirsch et al., 2014 Hirsch C.N., Foerster J.M., Johnson J.M., Sekhon R.S., Muttoni G., Vaillancourt B., et al. Insights into the maize pan-genome and pan-transcriptome. Plant Cell. 2014;26(1):121–135. doi:10.1105/tpc.113.119982.

    Huang et al., 2009 Huang S., Li R., Zhang Z., Li L., Gu X., Fan W., et al. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 2009;41(12):1275–1289. doi:10.1038/ng.475.

    Hubner et al., 2019 Hubner S., Bercovich N., Todesco M., Mandel J.R., Odenheimer J., Ziegler E., et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat. Plants. 2019;5(1):54–62. doi:10.1038/s41477-018-0329-0.

    Illa et al., 2011 Illa E., Sargent D.J., Lopez Girona E., Bushakra J., Cestaro A., Crowhurst R., et al. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family. BMC Evol. Biol. 2011;11:9. doi:10.1186/1471-2148-11-9.

    Iorizzo et al., 2016 Iorizzo M., Ellison S., Senalik D., Zeng P., Satapoomin P., Huang J., et al. A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution. Nat. Genet. 2016;48(6):657. doi:10.1038/ng.3565.

    Jaganathan et al., 2020 Jaganathan D., Bohra A., Thudi M., Varshney R.K. Fine mapping and gene cloning in the post-NGS era: advances and prospects. Theor. Appl. Genet. 2020;133(5):1791–1810. doi:10.1007/s00122-020-03560-w.

    Jaillon et al., 2007 Jaillon O., Aury J.-M., Noel B., Policriti A., Clepet C., Casagrande A., et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449(7161):463–465. doi:10.1038/nature06148.

    Jamil et al., 2020 Jamil I.N., Remali J., Azizan K.A., Muhammad N., Azlan N., Arita M., Goh H.-H., Aizat W.M. Systematic multi-omics integration (MOI) approach in plant systems biology. Front. Plant Sci. 2020;11. doi:10.3389/fpls.2020.00944.

    Jarrell et al., 1992 Jarrell D.C., Roose M.L., Traugh S.N., Kupper R.S. A genetic-map of citrus based on the segregation of isozymes and RFLPS in an intergeneric cross. Theor. Appl. Genet. 1992;84(1–2):49–56. doi:10.1007/BF00223980.

    Jayakodi et al., 2021 Jayakodi M., Schreiber M., Stein N., Mascher M. Building pan-genome infrastructures for crop plants and their use in association genetics. DNA Res. 2021;28(1). doi:10.1093/dnares/dsaa030.

    Jung et al., 2012 Jung S., Cestaro A., Troggio M., Main D., Zheng P., Cho I., et al. Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies. BMC Genomics. 2012;13:129. doi:10.1186/1471-2164-13-129.

    Kaul et al., 2000 Kaul S., Koo H.L., Jenkins J., Rizzo M., Rooney T., Tallon L.J., et al. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408(6814):796–815. doi:10.1038/35048692.

    Keller et al., 2011 Keller O., Kollmar M., Stanke M., Waack S. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics. 2011;27(6):757–763. doi:10.1093/bioinformatics/btr010.

    Kolker, 2004 Kolker E. Editorial. OMICS J. Integr. Biol. 2004;8(1):1. doi:10.1089/153623104773547444.

    Kumar et al., 2021a Kumar R., Sharma V., Suresh S., Ramrao D.P., Veershetty A., Kumar S., et al. Understanding omics driven plant improvement and de novo crop domestication: some examples. Front. Genet. 2021a;12:637141. doi:10.3389/fgene.2021.637141.

    Kumar et al., 2021b Kumar R., Sharma V., Suresh S., Ramrao D.P., Veershetty A., Kumar S., et al. Understanding omics driven plant improvement and de novo crop domestication: some examples. Front. Genet. 2021b;12:637141. doi:10.3389/fgene.2021.637141.

    Lam et al., 2012 Lam E.T., Hastie A., Lin C., Ehrlich D., Das S.K., Austin M.D., et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotechnol. 2012;30(8):771–776. doi:10.1038/nbt.2303.

    Lashbrooke et al., 2015 Lashbrooke J., Aharoni A., Costa F. Genome investigation suggests MdSHN3, an APETALA2-domain transcription factor gene, to be a positive regulator of apple fruit cuticle formation and an inhibitor of russet development. J. Exp. Bot. 2015;66(21):6579–6589. doi:10.1093/jxb/erv366.

    Le Nguyen et al., 2019 Le Nguyen K., Grondin A., Courtois B., Gantet P. Next-generation sequencing accelerates crop gene discovery. Trends Plant Sci. 2019;24(3):263–274. doi:10.1016/j.tplants.2018.11.008.

    Ling et al., 2021 Ling J., Xie X., Gu X., Zhao J., Ping X., Li Y., et al. High-quality chromosome-level genomes of Cucumis metuliferus and Cucumis melo provide insight into Cucumis genome evolution. Plant J. 2021. doi:10.1111/tpj.15279.

    Liu et al., 2014 Liu S., Liu Y., Yang X., Tong C., Edwards D., Parkin I.A.P., et al. The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes. Nat. Commun. 2014;5. doi:10.1038/ncomms4930.

    Lowe et al., 2017 Lowe R., Shirley N., Bleackley M., Dolan S., Shafee T. Transcriptomics technologies. PLoS Comput. Biol. 2017;13(5):e1005457. doi:10.1371/journal.pcbi.1005457.

    Lu et al., 2016 Lu M., An H., Li L. Genome survey sequencing for the characterization of the genetic background of Rosa roxburghii Tratt and leaf ascorbate metabolism genes. PLoS One. 2016;11(2). doi:10.1371/journal.pone.0147530.

    Luo et al., 2020 Luo X., Xu L., Wang Y., Dong J., Chen Y., Tang M., et al. An ultra-high-density genetic map provides insights into genome synteny, recombination landscape and taproot skin colour in radish (Raphanus sativus L.). Plant Biotechnol. J. 2020;18(1):274–286. doi:10.1111/pbi.13195.

    Main et al., 2016 Main D., Jung S., Cheng C.-H., Lee T., Ficklin S.P., Gasic K., et al. Genome database for Rosaceae: a resource for genomics, genetics and breeding research. Hortscience. 2016;51(9):S134.

    Margulies et al., 2005 Margulies M., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L., et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437(7057):376–380. doi:10.1038/nature03959.

    Ming et al., 2008 Ming R., Hou S., Feng Y., Yu Q., Dionne-Laporte A., Saw J.H., et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008;452(7190):991–997. doi:10.1038/nature06856.

    Modrzejewski et al., 2019 Modrzejewski D., Hartung F., Sprink T., Krause D., Kohl C., Wilhelm R. What is the available evidence for the range of applications of genome-editing as a new tool for plant trait modification and the potential occurrence of associated off-target effects: a systematic map. Environ. Evid. 2019;8(1):11. doi:10.1186/s13750-019-0171-5.

    Morgante et al., 2007 Morgante M., de Paoli E., Radovic S. Transposable elements and the plant pan-genomes. Curr. Opin. Plant Biol. 2007;10(2):149–155. doi:10.1016/j.pbi.2007.02.001.

    Myles et al., 2009 Myles S., Peiffer J., Brown P.J., Ersoz E.S., Zhang Z., Costich D.E., Buckler E.S. Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell. 2009;21(8):2194–2202. doi:10.1105/tpc.109.068437.

    Nordborg and Weigel, 2008 Nordborg M., Weigel D. Next-generation genetics in plants. Nature. 2008;456(7223):720–723. doi:10.1038/nature07629.

    Rafalski, 2010 Rafalski J.A. Association genetics in crop improvement. Curr. Opin. Plant Biol. 2010;13(2):174–180. doi:10.1016/j.pbi.2009.12.004.

    Rajapakse, 2003 Rajapakse S. Progress in application of molecular markers to genetic improvement of horticultural crops. Acta Hortic. 2003;625:29–36.

    Ran et al., 2020 Ran X., Zhao F., Wang Y., Liu J., Zhuang Y., Ye L., et al. Plant Regulomics: a data-driven interface for retrieving upstream regulators from plant multi-omics data. Plant J. 2020;101(1):237–248. doi:10.1111/tpj.14526.

    Rasheed et al., 2017 Rasheed A., Hao Y., Xia X., Khan A., Xu Y., Varshney R.K., He Z. Crop breeding chips and genotyping platforms: progress, challenges, and perspectives. Mol. Plant. 2017;10(8):1047–1064. doi:10.1016/j.molp.2017.06.008.

    Reyes-Chin-Wo et al., 2017 Reyes-Chin-Wo S., Wang Z., Yang X., Kozik A., Arikit S., Song C., et al. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat. Commun. 2017;8. doi:10.1038/ncomms14953.

    Rout and Mohapatra, 2006 Rout G.R., Mohapatra A. Use of molecular markers in ornamental plants: a critical reappraisal. Eur. J. Hortic. Sci. 2006;71(2):53–68.

    Sánchez-Pérez et al., 2019 Sánchez-Pérez R., Pavan S., Mazzeo R., Moldovan C., Aiese Cigliano R., Del Cueto J., et al. Mutation of a bHLH transcription factor allowed almond domestication. Science. 2019;364(6445):1095–1098. doi:10.1126/science.aav8197.

    Sato et al., 2012 Sato S., Tabata S., Hirakawa H., Asamizu E., Shirasawa K., Isobe S., et al. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012;485(7400):635–641. doi:10.1038/nature11119.

    Schulz et al., 2016 Schulz D.F., Schott R.T., Voorrips R.E., Smulders M.J.M., Linde M., Debener T. Genome-wide association analysis of the anthocyanin and carotenoid contents of rose petals. Front. Plant Sci. 2016;7:1798.

    Schulz et al., 2021 Schulz D., Linde M., Debener T. Detection of reproducible major effect QTL for petal traits in garden roses. Plants-Basel. 2021;10(5). doi:10.3390/plants10050897.

    Shulaev et al., 2011 Shulaev V., Sargent D.J., Crowhurst R.N., Mockler T.C., Folkerts O., Delcher A.L., et al. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 2011;43(2):109–116. doi:10.1038/ng.740.

    Simoneau et al., 2021 Simoneau J., Dumontier S., Gosselin R., Scott M.S. Current RNA-seq methodology reporting limits reproducibility. Brief. Bioinform. 2021;22(1 SI):140–145. doi:10.1093/bib/bbz124.

    Smulders et al., 2019 Smulders M.J.M., Arens P., Bourke P.M., Debener T., Linde M., de Riek J., et al. In the name of the rose: a roadmap for rose research in the genome era. Hortic. Res. 2019;6(1):65.

    Soltis and Soltis, 2016 Soltis P.S., Soltis D.E. Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 2016;30:159–165. doi:10.1016/j.pbi.2016.03.015.

    Stanke et al., 2006 Stanke M., Schoffmann O., Morgenstern B., Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7. doi:10.1186/1471-2105-7-62.

    Torkamaneh et al., 2018 Torkamaneh D., Boyle B., Belzile F. Efficient genome-wide genotyping strategies and data integration in crop plants. Theor. Appl. Genet. 2018;131(3):499–511. doi:10.1007/s00122-018-3056-z.

    van Dijk et al., 2018 van Dijk E.L., Jaszczyszyn Y., Naquin D., Thermes C. The third revolution in sequencing technology. Trends Genet. 2018;34(9):666–681. doi:10.1016/j.tig.2018.05.008.

    Velasco et al., 2010 Velasco R., Zharkikh A., Affourtit J., Dhingra A., Cestaro A., Kalyanaraman A., et al. The genome of the domesticated apple (Malus x domestica Borkh.). Nat. Genet. 2010;42(10):833. doi:10.1038/ng.654.

    Winkler, 1920 Winkler H. Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche. 1 Band Jena: Fischer; 1920.

    Xu et al., 2013 Xu Q., Chen L.-L., Ruan X., Chen D., Zhu A., Chen C., et al. The draft genome of sweet orange (Citrus sinensis). Nat. Genet. 2013;45(1):59–92. doi:10.1038/ng.2472.

    Yagi et al., 2014 Yagi M., Kosugi S., Hirakawa H., Ohmiya A., Tanase K., Harada T., et al. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.). DNA Res. 2014;21(3):231–241. doi:10.1093/dnares/dst053.

    Yang and Wei, 2015 Yang C., Wei H. Designing microarray and RNA-Seq experiments for greater systems biology discovery in modern plant genomics. Mol. Plant. 2015;8(2):196–206. doi:10.1016/j.molp.2014.11.012.

    Zhang et al., 2012 Zhang Q., Chen W., Sun L., Zhao F., Huang B., Yang W., et al. The genome of Prunus mume. Nat. Commun. 2012;3. doi:10.1038/ncomms2290.

    Zhang et al., 2019 Zhang L., Hu J., Han X., Li J., Gao Y., Richards C.M., et al. A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nat. Commun. 2019;10(1):1494. doi:10.1038/s41467-019-09518-x.

    Zhang et al., 2020 Zhang J., Yang J., Zhang L., Luo J., Zhao H., Zhang J., Wen C. A new SNP genotyping technology Target SNP-seq and its application in genetic analysis of cucumber varieties. Sci. Rep. 2020;10(1):5623. doi:10.1038/s41598-020-62518-6.

    Zheng et al., 2021 Zheng T., Li P., Li L., Zhang Q. Research advances in and prospects of ornamental plant genomics. Hortic. Res. 2021;8(1). doi:10.1038/s41438-021-00499-x.

    Chapter 2: Small RNA-omics: Decoding the regulatory networks associated with horticultural traits

    Jitendra Kumar Mohanty; Swarup Kumar Parida    Genomics-assisted Breeding and Crop Improvement Laboratory, National Institute of Plant Genome Research (NIPGR), New Delhi, India

    Abstract

    Small RNAs (sRNAs) are noncoding RNAs that function as ubiquitous regulators of gene expression at the genetic and epigenetic levels. They are classified into several categories, such as miRNA and siRNA, based on their biogenesis and mode of action. Their wide distribution and involvement in many biological processes make these tiny regulatory molecules an important subject of investigation, from model plants to advanced crop species. Early sRNA research was carried out mainly in model species, but with the advancement of next-generation sequencing (NGS) and computational genomics, sRNA-driven regulation is now being investigated extensively in horticultural crops. This chapter focuses on the different categories of sRNAs and on methods for identifying novel and known sRNAs and their targets, followed by their validation. It further describes the diverse strategies used in horticultural crops to characterize sRNAs and their targets that regulate vital biological processes in response to developmental signals, pathogen infection, and environmental stress.

    Keywords

    Small RNA; miRNA; siRNA; Gene silencing

    1: Introduction

    Small RNAs are tiny RNA molecules, 20–30 nucleotides in length and processed from double-stranded (helical) RNA precursors, that show various regulatory properties. They can target chromatin as well as transcripts and can therefore influence the flow of genetic information at both the genomic and the transcriptomic level (Guleria et al., 2011; Weiberg et al., 2014). Genomic-level regulation occurs through RNA-directed DNA methylation and histone modification, whereas transcript-level regulation occurs through posttranscriptional gene silencing (PTGS) (Weiberg et al., 2014; Wang et al., 2020). Over the past two decades, biological science has witnessed a rapid surge of research on small RNAs, revealing their diversity, mechanisms of action, and biological functions (Chen, 2009). Behind this surge, however, lies a steady series of discoveries. Earlier views of eukaryotic gene expression and its regulation centered on proteins as the sole regulatory molecules (Szymański et al., 2003). Over time, the concept of small RNA came into the picture with the serendipitous discovery in Caenorhabditis elegans of small noncoding dsRNA that can potently switch off gene expression (Fire et al., 1998). Although small RNA came into clear focus in 1998, its presence and effects had been observed several years earlier in plants and fungi. At that time, the role of dsRNA was not known, and different terms were used to describe the phenomenon, such as co-suppression in plants and quelling in fungi (Napoli et al., 1990; Van der Krol et al., 1990; Romano and Macino, 1992; Cogoni et al., 1996). About 10 years after the discovery of animal small RNA, the first plant sRNA was reported, and research in this area gathered pace around mid-2002 (Kamthan et al., 2015). Subsequently, advances in next-generation and third-generation sequencing have accelerated the decoding of numerous plant genomes. Together with advanced computational genomics, these high-throughput technologies have revolutionized the discovery of small RNAs and have generated a global landscape of sRNAs and their regulatory roles in numerous developmental and stress-related processes. With this in mind, the current chapter provides an overview of the different classes of small RNAs (sRNAs), their biogenesis and mode of action, identification and validation strategies, and their role in improving horticultural traits.

    2: Classification and biogenesis of small RNAs

    Plant small RNAs (sRNAs) are derived from the processing of helical RNA precursors. Based on their origin, small RNAs can be classified into two groups: the first consists of those derived from a single-stranded precursor that forms an intramolecular hairpin structure owing to self-complementarity, and the second includes those derived from a double-stranded RNA precursor formed by two distinct complementary RNA strands (Axtell, 2013). MicroRNAs (miRNAs) fall into the first group, whereas the different types of small interfering RNAs (siRNAs), such as trans-acting siRNAs (ta-siRNAs) and natural antisense siRNAs (nat-siRNAs), fall into the second group. Apart from these two groups, other forms of biogenesis also exist, such as the 22G RNAs of C. elegans, which are derived from direct transcription by an endogenous RNA-dependent RNA polymerase (Pak and Fire, 2007), and the Piwi-associated RNAs (piRNAs) of animals, which are derived from nonhelical single-stranded precursors (Juliano et al., 2011). These two forms of biogenesis are generally not found in plants. Plants mainly possess two forms of sRNAs, miRNAs and siRNAs, so this chapter focuses on these two classes (Fig. 1).

    Fig. 1 Overview of different strategies used for sRNA and sRNA target identification and validation, and their implications for crop genetic improvement. GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; NGS, next-generation sequencing; rRNA, ribosomal RNA; snoRNA, small nucleolar RNA; tRNA, transfer RNA.

    2.1: Micro RNA (miRNA)

    miRNAs are the most widely studied and most important class of sRNA in plants and play a significant role in various biological processes. In general, plant miRNAs are widely conserved and 20–24 nucleotides (nt) long. Their complex biogenesis starts with the transcription of 70–300 nt long MIR genes (Kim, 2005; Mirlohi and He, 2016). MIR genes are transcribed by RNA polymerase II, giving rise to a primary transcript (pri-miRNA) (Lee et al., 2004), which is capped at the 5′ end and polyadenylated at the 3′ end just like any other mRNA transcript. This pri-miRNA forms a fold-back structure (Baulcombe, 2004) owing to self-complementarity (Allen et al., 2005; Jones-Rhoades et al., 2006; Voinnet, 2009; Achkar et al., 2016) and is then cleaved by DCL1 (DICER-LIKE 1), an endonuclease of the RNase III family, to give rise to the precursor miRNA (pre-miRNA). DCL1 then cleaves the pre-miRNA again to produce the mature miRNA as a miRNA/miRNA* duplex (Kurihara et al., 2006). The resulting duplex carries 3′ dinucleotide overhangs, which are subsequently 2′-O-methylated by HEN1 (HUA ENHANCER 1) (Yu et al., 2005; Achkar et al., 2016). All of the cleavage and modification steps described above take place inside the nucleus; once the duplex has been methylated, it is transported to the cytoplasm by HASTY (HST) (Park et al., 2005). In the cytoplasm, the miRNA* strand is separated from the duplex, and the remaining mature miRNA is loaded into the ARGONAUTE 1 (AGO1) protein to form the RNA-INDUCED SILENCING COMPLEX (RISC) (Baumberger and Baulcombe, 2005; Yu et al., 2017). The RISC is a potent gene-silencing complex that searches for transcripts with sequence complementarity to its guide RNA (the miRNA). The miRNA-guided RISC then silences expression of the target gene either by translational inhibition or by transcript cleavage (Jones-Rhoades et al., 2006; Achkar et al., 2016).
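
    The geometry of the miRNA/miRNA* duplex described above, two antiparallel strands with two unpaired nucleotides at each 3′ end, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: both strands have equal length, G:U wobble pairs are counted as pairs, bulges are ignored, and the function name and the six-nucleotide sequences are invented for readability (real duplexes are 20–24 nt long and often contain mismatches).

```python
# Watson-Crick pairs plus the G:U wobble pair (assumed here to count as paired).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def duplex_mismatches(guide, star, overhang=2):
    """Count non-pairing positions in a miRNA/miRNA* duplex.

    Both strands are written 5'->3' and must have equal length. The last
    `overhang` nucleotides at the 3' end of each strand are left unpaired.
    """
    if len(guide) != len(star):
        raise ValueError("this sketch only handles strands of equal length")
    paired_len = len(guide) - overhang
    guide_stem = guide[:paired_len]
    # The star strand runs antiparallel, so its paired part is read 3'->5'.
    star_stem = star[:paired_len][::-1]
    return sum((g, s) not in PAIRS for g, s in zip(guide_stem, star_stem))


# Invented toy strands, written 5'->3' (illustrative only).
guide = "GGAACC"
star = "UUCCAA"
mismatches = duplex_mismatches(guide, star)
```

    Here the first four nucleotides of each strand pair with each other and the final two nucleotides of each strand form the 3′ overhangs, so the function reports no mismatches.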

    2.2: Small interfering RNA (siRNA)

    The first siRNA in plants was reported in 1999, followed by the discovery of a diverse set of siRNAs in subsequent years (Hamilton and Baulcombe, 1999; Llave et al., 2002a,b; Reinhart et al., 2002). Plant siRNAs are generally around 24 nucleotides long and, unlike miRNAs, can be generated from either endogenous (the host’s own genome) or exogenous (transgenes or viruses) sequences. In plants, the precursors of siRNAs are either long double-stranded RNAs or transcripts generated from inverted repeat regions (Chen, 2009; Voinnet, 2009). According to their biogenesis, siRNAs are further classified into several groups, such as phased siRNAs (phasiRNAs), natural antisense siRNAs (nat-siRNAs), and repeat-associated siRNAs (rasiRNAs). PhasiRNAs are also called miRNA-triggered siRNAs because they are produced by miRNA-directed cleavage of the primary PHAS transcript, which is transcribed from a PHAS gene (Chen et al., 2018; Ramachandran et al., 2017). In the case of nat-siRNAs, the dsRNA precursor is formed by the pairing of two RNA strands that are transcribed separately, either from the same locus or from different loci (Jin et al., 2008). rasiRNAs are a unique group of 24 nt long siRNAs found in plants and are involved in transcriptional gene silencing through DNA methylation (Wassenegger, 2005). The methylation leads to histone modification in the homologous region, which in turn leads to transcriptional inhibition of the gene concerned. The biogenesis and mode of action of siRNAs are broadly similar to those of miRNAs, although some components of siRNA processing and gene targeting differ. Generally, the Argonaute proteins AGO1 and AGO10 are associated with miRNAs, whereas for siRNAs the RISC mainly contains AGO1, AGO4, AGO6, or AGO7. In addition, miRNA-mediated gene regulation occurs mainly at the posttranscriptional level, through either translational repression or mRNA degradation, whereas siRNA-mediated regulation can occur at both the transcriptional and the posttranscriptional level by means of histone modification, DNA methylation, and mRNA degradation (Sunkar and Zhu, 2007; Guleria et al., 2011).

    3: Identification and validation of small RNAs

    The methods used to discover small RNAs have changed considerably over the years. In the early days, sRNAs were discovered mainly either by cloning followed by sequencing or by genetic screening of phenotypic variants (McConnell and Barton, 1998; McConnell et al., 2001; Llave et al., 2002a,b; Reinhart et al., 2002; Lu et al., 2005). Among sRNAs, miRNAs are an important class for which classical strategies relied mainly on bioinformatic prediction following the defined criteria for miRNA annotation, with subsequent validation by several experimental techniques (Meyers et al., 2008). The miRNA precursor typically forms a stem-loop structure (Voinnet, 2009); hence, confidence in a prediction can be increased by analyzing the secondary structure of the predicted or cloned transcript with the RNAfold software (Hofacker, 2003; Li et al., 2019). Over time, small RNAs were discovered steadily and accumulated in RNA databases (such as miRBase for miRNAs), and their analysis revealed that they are conserved among species. This conservation steered prediction toward microarray-based approaches (Jeyaraj et al., 2017), in which probes are designed from the available sequence resources and can capture miRNAs expressed at different developmental stages or in response to different biotic and abiotic stimuli. However, the main drawback of the microarray technique is its fixed probe set, which limits the identification of novel sRNAs. In addition, background noise and cross-hybridization limit its potential (Yin et al., 2008; Chen et al., 2010). To overcome the problems of this hybridization-based approach, cost-effective sequencing-based methods such as next-generation sequencing (NGS) and third-generation sequencing came into use, offering de novo sRNA prediction (Simon et al., 2009). This approach requires no prior sequence information, so it can discover both conserved and novel sRNAs (Chen et al., 2010). In this method, total sRNAs are isolated through lithium enrichment techniques and then sequenced. The unwanted small RNAs (tRNA, rRNA, snoRNA) are filtered out of the sequencing reads, after which miRNAs and siRNAs are annotated on the basis of their specific features (Chappell et al., 2006; Pilcher et al., 2007; Carra et al., 2009; Liu et al., 2017a,b,c; Li et al., 2019).
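
    The filtering step described above can be sketched as follows. This is a minimal illustration, not the procedure of any of the cited pipelines: the function name filter_srna_reads is hypothetical, the 20–24 nt window is an assumption taken from the miRNA size range given earlier in this chapter, and structural RNAs are screened by exact substring matching, whereas real workflows typically align reads to reference collections of tRNA, rRNA, and snoRNA sequences and tolerate mismatches.

```python
from collections import Counter


def filter_srna_reads(reads, structural_rnas, min_len=20, max_len=24):
    """Collapse reads and keep candidate sRNAs.

    reads: iterable of adapter-trimmed read sequences.
    structural_rnas: known tRNA/rRNA/snoRNA sequences to screen against.
    Returns a Counter mapping each retained sequence to its read count.
    """
    counts = Counter(reads)  # collapse identical reads into sequence -> abundance
    kept = Counter()
    for seq, n in counts.items():
        if not min_len <= len(seq) <= max_len:
            continue  # outside the size range assumed for miRNA/siRNA candidates
        if any(seq in ref for ref in structural_rnas):
            continue  # exact fragment of a structural RNA, so discard it
        kept[seq] = n
    return kept


# Invented reads and one invented structural RNA (illustrative only).
structural = ["GGG" + "ACGUACGUACGUACGUACGUA" + "CCC"]
reads = ["U" * 21, "U" * 21, "ACGUACGUACGUACGUACGUA", "ACGU", "A" * 30]
candidates = filter_srna_reads(reads, structural)
```

    Collapsing identical reads first keeps the abundance of each sequence, which later steps such as differential expression analysis rely on.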

    These advanced technologies demand knowledge of bioinformatics and depend on various tools such as TopHat (Trapnell et al., 2009), Cufflinks (Trapnell et al., 2010), HMMER (Eddy, 2009), miRDeep-P (Yang and Li, 2011), miR-PREFeR (Lei and Sun, 2014), miRPlant (An et al., 2014), and miRCat2 (Paicu et al., 2017). An integrated software suite called sRNAtoolbox is available for the analysis of sRNA deep-sequencing data. It is a collection of tools that perform different functions, such as genome mapping, novel miRNA prediction, differential expression analysis, annotation of unassigned reads, consensus target prediction, and functional annotation of a given target (Rueda et al., 2015; Gómez-Martín et al., 2017). As small RNA research has advanced, many features of small RNAs (miRNA and siRNA) have become clear. Based on this accumulated knowledge, several in silico sRNA gene prediction techniques have been developed. These techniques are of two types: homology-based prediction and ab initio miRNA gene prediction (prediction from the primary sequence alone) (Yousef and Allmer, 2014).

    Once an sRNA has been predicted, the next step is to validate it, that is, to determine whether it is a genuine sRNA. Several techniques are used for validation, depending on the needs of the specific study. These include stem-loop RT-qPCR (Smoczynska et al., 2019), northern blot analysis (De la Rosa and Reyes, 2019), in situ hybridization (Chen, 2004; Sieber et al., 2007), and the generation of reporter lines in which sRNA promoters are fused to GFP (green fluorescent protein) or GUS (β-glucuronidase) to validate the expression pattern of the sRNAs (Kawashima et al., 2009; Chen et al., 2010) (Fig. 1).

    4: Small RNA target prediction and validation

    Since their discovery, knowledge of sRNAs has increased significantly. A basic feature of sRNAs is that they target genes on the basis of sequence complementarity. Drawing on the available data and empirical case studies, a few tools have been developed for in silico target prediction, such as miRU (Zhang, 2005) and its updated version psRNATarget (data; http://bioinfo3.noble.org/psRNATarget/), Target Finder (Allen et al., 2005; Fahlgren et al., 2007), and psRobot (Wu et al., 2012). These tools allow several parameters to be customized and require a plant genomic dataset for candidate target identification (Chen et al., 2010). In addition, several more recently developed tools such as comTar (Chorostecki and Palatnik, 2014), Tapirhybrid (Bonnet et al., 2010), and CIDER (Zhang et al., 2016) have shown great efficiency with fewer false-positive results. Cytoscape is an important tool for visualizing miRNA-target networks; it can be integrated with other databases and runs on any operating system (http://www.cytoscape.org/). All of these prediction tools are built on the principle of complementarity between an sRNA and its target, which is further refined with supporting features such as free energy change, target site accessibility, evolutionary relatedness, multiplicity of target sites, location of the target site (coding region or UTR), and outcome for the target (cleavage or translational repression) (Pandey et al., 2019). Once the target gene for a given sRNA is predicted, Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis are performed to predict the pathways in which the target gene is involved. Together, these analyses give a clue about the biological pathways that might be regulated by that particular sRNA.
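
    The complementarity principle underlying these tools can be sketched in a few lines. The example below is a minimal illustration, not the scoring scheme of psRNATarget, psRobot, or any other cited tool: the function names and the mismatch threshold are assumptions, only Watson-Crick pairs are counted, input is expected to contain A, C, G, and U only, and none of the supporting features listed above (free energy, site accessibility, and so on) are modeled.

```python
# Hypothetical sketch of complementarity-based target search. Function
# names and the mismatch threshold are illustrative assumptions.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_target_sites(srna, transcript, max_mismatches=3):
    """Slide along the transcript and report (start, mismatches) for each
    window that is complementary to the sRNA within the allowed number
    of mismatches. Positions are 0-based."""
    site = reverse_complement(srna)  # perfectly complementary target site
    n = len(site)
    hits = []
    for start in range(len(transcript) - n + 1):
        window = transcript[start:start + n]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

    Candidate sites found this way would still need to be ranked with the additional criteria described above and then confirmed experimentally.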

    For sRNA target validation, it is important to know the mode of action of the sRNA. One of the regulatory roles of sRNAs is target cleavage. Importantly, target cleavage generates a ligation-competent 5′ monophosphate group. This 5′ end can be ligated to an RNA adaptor, which allows the degraded mRNA to be detected by the 5′ RNA Ligase-Mediated Rapid Amplification of cDNA Ends (5′ RLM-RACE) technique (Llave et al., 2002a,b). It is a widely used strategy for screening cleavage products and mapping target sites, but it becomes tedious for large-scale target validation. To overcome this problem, a high-throughput NGS-based strategy was developed, known as Parallel Analysis of RNA Ends (PARE) or degradome sequencing (Addo-Quaye et al., 2008; German et al., 2008). In this technique, 5′ RLM-RACE is used to prepare a degradome library containing the 3′ cleaved mRNA products, and sequencing these libraries by NGS provides a genome-wide prediction of degraded/cleaved mRNAs. It provides a large amount of data regarding RNA degradation and is used for large-scale cleavage site
