Frontiers in Clinical Drug Research - Anti Infectives: Volume 3
Atta-ur Rahman
Atta-ur-Rahman, Professor Emeritus, International Center for Chemical and Biological Sciences (H. E. J. Research Institute of Chemistry and Dr. Panjwani Center for Molecular Medicine and Drug Research), University of Karachi, Pakistan, was the Pakistan Federal Minister for Science and Technology (2000-2002), Federal Minister of Education (2002), and Chairman of the Higher Education Commission with the status of a Federal Minister from 2002-2008. He is a Fellow of the Royal Society of London (FRS) and a UNESCO Science Laureate. He is a leading scientist with more than 1283 publications in several fields of organic chemistry.
Frontiers in Clinical Drug Research - Anti Infectives - Atta-ur Rahman
Modern Approaches to Genome Mining for the Development of New Anti-infectives: In Silico Gene Prediction and Experimental Metabolomics
INTRODUCTION
More than half of the known natural products that have antimicrobial, antiviral or antitumor activity originate from only five cultivated bacterial groups: filamentous Actinomycetes, Myxobacteria, Cyanobacteria, as well as members of the genera Pseudomonas and Bacillus [1]. Actinomycetes are Gram-positive mycelial bacteria found mainly in the soil, but are also present in symbiotic association with terrestrial and aquatic invertebrates [2]. Actinomycetes produce metabolites as they undergo the morphological and physiological differentiation processes that are part of their life cycle [2].
The secondary metabolites that bacteria produce include aminoglycosides, polyketides, as well as small proteinaceous and peptide structures such as bacteriocins, oligopeptides and lipopeptides. These secondary metabolites may have bactericidal, immune suppression and tumor suppression properties and can be useful for human and veterinary medicine. Lipopeptides and polyketides have linear, cyclic or branched structures. Lipopeptides are generated by non-ribosomal peptide synthases (NRPSs) whilst polyketides are generated by polyketide synthases (PKSs) [3, 4].
The function that these metabolites have in their natural environment is not always known, but they are thought to provide a competitive advantage to the producing organism since many of these possess potent antibiotic activity [2]. It has also been suggested that antibiotics act as signaling molecules facilitating intra- or interspecies interactions within microbial communities [5].
Most of the antibiotics in clinical use are microbial natural products or their derivatives [6]. In fact, of the 18,000 currently known bioactive compounds, 10,000 were described from the genus Streptomyces (Actinobacteria) [7]. Actinobacteria are still among the most important producers of natural products that are currently applied as antibiotics, immunosuppressants, anticancer drugs, anthelmintics and antifungals [8, 9].
The threat of multidrug-resistant pathogens puts the advances of modern medicine at grave risk [6, 10, 11], and yet few new antibiotics are reaching the market. Drug discovery is expensive and the return on investment is difficult to predict. New products sell poorly because their prescription is restricted in the hope of slowing the development of resistance [6, 8].
Silent Biosynthetic Gene Clusters
With the onset of the genomic era, it became evident that Actinomycetes contain a largely untapped and unexplored potential for the production of secondary metabolites [2, 12]. Analyses of genome sequences have revealed that each genome contains clusters to synthesize 20 or more secondary metabolites [13], which increases the chances of discovering novel bioactive natural products. Genome mining software detects the biosynthetic gene clusters encoded in a genome, but bioinformatics programs alone will not lead to the discovery of new metabolites, since many secondary metabolism gene clusters are silent under laboratory conditions [14].
Secondary metabolite biosynthetic gene clusters remain silent until the required signals occur, which may be environmental or physiological [15]. However, it has been proposed that the majority of secondary metabolism gene clusters in Streptomyces are not silent, but are expressed at very low levels under laboratory conditions, so the transcription of these gene clusters is not sufficient to produce detectable amounts of novel secondary metabolites [16].
What is Genome Mining?
Genome mining consists of using genetic information to assess the potential of microorganisms to produce novel compounds [17]. Such analysis has to be followed by extensive experimental research [2] involving proteomics and metabolomics to confirm that the predicted gene cluster produces the target secondary metabolite [7]. Genome mining as a natural product discovery strategy is based on connecting the unknown structure of a natural product with its corresponding biosynthetic genes by applying biosynthetic knowledge. As proposed by Nett [18], genome mining involves "basic in silico analyses to aid in the proposal of putative genes and putative products, as well as the emerging chemical or genetic methods that are applied to trace the metabolic products of the (putative genes)". New methods are needed to link a gene cluster and a natural product conclusively [19]. Several in silico and experiment-guided approaches have been developed for this purpose.
The first step in a genome mining approach is to identify the putative biosynthetic gene clusters in the genome sequence [17]. In the second step, once putative clusters have been identified, it is necessary to predict the biosynthetic products resulting from the enzymes encoded in the cluster [17].
Genome mining consists not only of the in silico determination of a gene cluster, but also of the activation of a cryptic gene cluster [20]. In fact, genome mining is typically accompanied by proteome and/or metabolome analyses to accurately link a metabolite to its biosynthetic gene cluster [17, 21]. Such a connection may help to ascertain the novelty of a compound, guide fractionation/identification and allow heterologous expression in a suitable host [19].
A gene cluster is a set of co-localized and co-regulated genes whose products are functionally connected [22]. A secondary metabolite biosynthetic gene cluster normally does not contain more than 20 genes. Identifying the gene clusters that are likely to encode new molecules is a key priority for genome mining. Thus, it is necessary to compare the predicted gene clusters to known ones and to predict the structure of the putative product, as summarized in Fig. (2), which is extremely challenging [23].
Ultimately, a genome mining approach should be more efficient at finding bioactive secondary metabolites that are novel, so as to avoid rediscovering the same compounds over and over again. As summarized in Fig. (1), the isolation of natural products with bioactivity traditionally relied on screening extracts to detect bioactivity, followed by separation and characterization of the compounds by assay-guided fractionation [24].
Fig. (1)
The traditional approach compared to a genome mining approach for the discovery of novel compounds with bioactivity. The main difference between the two is that a genome mining approach provides information about the genes involved in the biosynthesis of a metabolite. Figure based on [13].
Fig. (2)
Summarized steps involved in a genome mining approach. The greatest challenge of a genome mining approach is to link a produced metabolite with the genes involved in the biosynthesis of the metabolite. To propose the biosynthetic pathway of a metabolite, it is necessary to carry out the analysis of a genome sequence aided by bioinformatics platforms for the prediction of secondary metabolite gene clusters. Experimental work is required to confirm the computational prediction of the genes involved in the production of the metabolite and the actual structure of the metabolite.
Genome Mining Approaches and the Awakening of Silent Gene Clusters
Similarly to Choi et al. [13], Gomez-Escribano and Bibb [25] regard the activation of silent gene clusters and heterologous expression in a suitable organism as approaches to genome mining. According to [25], genome mining can be defined as the use of bioinformatics, molecular genetics and natural product analytical chemistry to access the metabolic product of a gene cluster found in the genome of an organism.
Bachmann et al. in 2014 [26] reported that genome mining now includes the bioinformatic prediction of clusters, the control of gene expression and the identification of new metabolites. The reason a genome mining approach includes strategies for the awakening of silent clusters is that, if they are not awakened, it will not be possible to produce the metabolite for its identification and bioactivity tests [27].
In agreement with the view that genome mining may be defined as "the process of technically translating secondary metabolite-encoding gene sequence data into purified molecules in tubes" [26], in the following pages we describe in silico biosynthetic gene cluster prediction software platforms and we have compiled a list of online databases created to store information on secondary metabolites, biosynthetic genes and predictions. The in silico prediction section is followed by a metabolomics section that describes experimental metabolomics techniques aimed at identifying and characterizing secondary metabolites. It has been of particular interest to us to discuss how the data generated by metabolomics studies have been integrated with the data generated by bioinformatics platforms to link a metabolite to its gene cluster.
Homologous vs. Heterologous Expression
Genome-mining efforts can be organized into two categories: homologous and heterologous expression. Homologous expression tries to elicit the production of secondary metabolites in the organism that encodes them [26], i.e., a gene or gene cluster is homologously expressed when the organism that encodes the cluster in its genome is the one expressing its genes. When the clusters are silent, strategies must be sought to force their expression by the organism that encodes them. Once the cluster has been expressed by the encoding organism, the product of the cluster can be identified by comparison to non-expressing cells, then purified and analyzed. Several strategies have been reported in the literature to awaken the expression of secondary metabolite gene clusters, such as modifying the growth conditions, co-culturing various strains together, or genetic engineering approaches that involve over-expressing a native transcriptional activator within the native strain [27]. In contrast, a gene cluster that has been isolated or amplified from one organism and introduced into another, perhaps more suitable, organism for its expression is heterologously expressed by that new host.
For a review on heterologous expression of actinomycete genes in Streptomyces coelicolor A3(2), we invite the reader to consult the 2014 work of Gomez-Escribano and Bibb [25], who have reviewed publications involving the cloning of genome fragments from diverse Actinomycetes for expression in the host S. coelicolor A3(2). According to [25], heterologous expression is normally favored over homologous expression when clusters are difficult to study in their natural hosts. For example:
When deletion of the cluster or gene cannot be achieved to validate a host strain.
When it is desired to generate novel unnatural chemical structures by combining genes from a variety of pathways to create a new unnatural pathway.
To study the function of particular genes from an imported cluster in order to propose the biosynthetic pathway of a metabolite.
To optimize the production of a metabolite.
It is possible to modify the genetic content of the cluster by changing the native promoters, eliminating negative regulators or recoding codons to generate mutants of an enzyme, as detailed in [25]. The metabolites produced by the wild-type strain and the heterologous host can be compared to find the metabolite encoded in the foreign genes [28].
New approaches are required to identify and characterize the silent gene clusters that appear to be present among all Actinomycetes [19]. Therefore, the aim of this chapter is to review the latest proposed strategies for the computational identification of secondary metabolite biosynthetic gene clusters in the genome, as well as the most recent experimental methodologies that have succeeded at linking novel natural products with their biosynthetic gene clusters. Also, considering that Actinomycetes produce more than 70% of the natural product scaffolds of clinically used anti-infective compounds [2], a summary of some of the most recent natural products synthesized by Actinobacteria is included.
Bioinformatics Tools and Databases for Secondary Metabolite Discovery
The current excess of genetic information encoding uncharacterized proteins [29] requires a variety of computational tools that have been developed to aid scientists in genome mining, as summarized in Fig. (3) [7].
Fig. (3)
Tools for the mining of genomes. Several programs and databases have been created to analyze genome sequences to predict gene clusters involved in the production of secondary metabolites and their putative metabolic products.
Bioinformatics platforms are essential in the search for secondary metabolite clusters because many organisms cannot be isolated or cultured in a laboratory [2, 30]. The key feature these programs exploit is the high degree of sequence similarity of the catalytic domains from enzymes involved in secondary metabolite biosynthesis, in spite of the immense chemical diversity of secondary metabolites [17]. In most bacteria, a secondary metabolite biosynthetic cluster includes not only the genes involved in the biosynthesis of the secondary metabolite but also regulators, transporters, and genes involved in conferring resistance to the metabolite produced. Computational screening of genes is a way to complement experimental assays where extracts or purified compounds are tested against specific targets with the goal of finding bioactive compounds [17]. Computationally identified biosynthetic clusters can be cloned or synthesized for heterologous expression [30].
The application of bioinformatics tools in the search for polyketide (PK) and non-ribosomal peptide (NRP) pathways requires browsing genetic sequences to pinpoint the location of the putative pathways by comparison with an ortholog of a known protein from the pathway, for which conserved catalytic domains are often used [23]. Once the biosynthetic gene cluster has been located, it is necessary to identify all the genes involved in the biosynthesis of the metabolite, which include the genes encoding polyketide synthases (PKS), nonribosomal peptide synthases (NRPS) and tailoring enzymes, other biosynthetic genes, regulatory elements and resistance genes, which are typically tightly clustered together on the chromosome [23]. The automation of this second step of the process is challenging if some of the genes are located far from the core signature genes or if gene clusters that are located close together are merged into one cluster [23].
Often, NRPS and type I PKS enzymes work predictably, i.e. the order of recruitment for assembly of amino acids for NRPS or carboxylic acids for PKS is the same as the order of the catalytic domains. This insight into the architecture of the domains facilitates prediction of the structures they might produce based exclusively on genomic information [4]. Quoting a recent work by C. N. Boddy: "the predicted connectivity of the individual building blocks selected by the A and AT domains is defined by the order of the A and AT coding regions in the gene cluster" [23].
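The collinearity rule described above can be illustrated with a minimal sketch. The `Module` class, its fields and the example substrates are hypothetical and exist only for this illustration; real clusters can deviate from collinearity (for example through iterative or skipped modules), so this is a first approximation rather than a prediction method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One assembly-line module (hypothetical representation).

    position  -- order of the module's coding region within the gene cluster
    substrate -- monomer the module's A (or AT) domain is predicted to select
    """
    position: int
    substrate: str


def predict_linear_scaffold(modules: List[Module]) -> List[str]:
    """Order the building blocks by the position of their modules.

    Under the collinearity assumption, the order of the A/AT coding
    regions defines the connectivity of the monomers in the product.
    """
    return [m.substrate for m in sorted(modules, key=lambda m: m.position)]


# Illustrative input: three modules listed out of genomic order.
modules = [Module(2, "Leu"), Module(1, "Val"), Module(3, "Orn")]
print("-".join(predict_linear_scaffold(modules)))
```

Sorting by genomic position is the whole technique here; substrate prediction itself is assumed to come from a separate tool.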
Several tools have been developed to aid in genome mining of bacterial secondary metabolite gene clusters such as ClustScan, ClusterFinder, antiSMASH, PRISM, Pep2Path, and NRPquest, among several others. Hidden Markov Models are statistical models generated from multiple sequences that are superior to pairwise search methods such as BLAST to detect distantly related homologs [23].
ClustScan
The ClustScan program [31] contains descriptions of 170 natural product clusters. The program was developed to analyze modular clusters. What makes ClustScan unique is that it takes a top-down approach to annotate gene clusters, so the cluster is also considered as a whole unit. Then the modules and domains are organized in a hierarchical way, so it is possible to predict the structures of the products [32].
The ClustScan program proceeds to construct the ClustScan database on its own, without any further manual curation. According to Weber [17], by 2014 the ClustScan database contained 57 entries on characterized PKS, 51 entries on NRPS and 62 entries on hybrid PKS/NRPS clusters. It is important to mention that the ClustScan program is a commercial program for the analysis of biosynthetic clusters.
ClusterFinder
ClusterFinder is a hidden Markov model-based probabilistic algorithm developed by Cimermancic et al. [33]. ClusterFinder aims to find clusters that are not well-characterized by converting the nucleotide sequence into a string of contiguous Pfam domains. Each domain is then assigned a probability of being part of a given gene cluster based on how frequently these domains occur in the ClusterFinder datasets [33].
Cimermancic et al. [33] experimentally tested the ClusterFinder predictions of biosynthetic gene clusters and discovered an aryl polyene (APE) carboxylic acid.
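The idea of scoring a string of Pfam domains with a hidden Markov model can be sketched with a toy two-state model (cluster vs. background). This is not the ClusterFinder implementation: the emission probabilities, the transition probability `stay`, the prior and the floor value are all invented for illustration, whereas the real algorithm derives its parameters from its training datasets [33].

```python
from typing import Dict, List


def cluster_posteriors(domains: List[str],
                       emit_bgc: Dict[str, float],
                       emit_bg: Dict[str, float],
                       stay: float = 0.9,
                       prior_bgc: float = 0.1,
                       floor: float = 1e-3) -> List[float]:
    """Posterior probability that each domain lies inside a gene cluster.

    Two hidden states: 0 = cluster (BGC), 1 = background.
    Computed with the scaled forward-backward algorithm.
    """
    n = len(domains)
    if n == 0:
        return []
    # Emission probability of each observed domain under both states.
    emits = [(emit_bgc.get(d, floor), emit_bg.get(d, floor)) for d in domains]
    trans = ((stay, 1.0 - stay), (1.0 - stay, stay))

    def normalise(pair):
        total = pair[0] + pair[1]
        return (pair[0] / total, pair[1] / total)

    # Forward pass.
    fwd = [normalise((prior_bgc * emits[0][0], (1.0 - prior_bgc) * emits[0][1]))]
    for i in range(1, n):
        prev = fwd[-1]
        fwd.append(normalise(tuple(
            emits[i][k] * (prev[0] * trans[0][k] + prev[1] * trans[1][k])
            for k in (0, 1))))

    # Backward pass.
    bwd = [(1.0, 1.0)] * n
    for i in range(n - 2, -1, -1):
        nxt = bwd[i + 1]
        bwd[i] = normalise(tuple(
            sum(trans[k][j] * emits[i + 1][j] * nxt[j] for j in (0, 1))
            for k in (0, 1)))

    # Combine both passes into per-position posteriors for the BGC state.
    return [f[0] * b[0] / (f[0] * b[0] + f[1] * b[1]) for f, b in zip(fwd, bwd)]


# Invented emission tables keyed by Pfam-style domain labels.
emit_bgc = {"ketoacyl-synt": 0.3, "Acyl_transf_1": 0.3, "PP-binding": 0.2,
            "Ribosomal_L2": 0.01}
emit_bg = {"ketoacyl-synt": 0.01, "Acyl_transf_1": 0.01, "PP-binding": 0.02,
           "Ribosomal_L2": 0.3}
domains = ["Ribosomal_L2", "ketoacyl-synt", "Acyl_transf_1", "PP-binding",
           "Ribosomal_L2"]
for name, p in zip(domains, cluster_posteriors(domains, emit_bgc, emit_bg)):
    print(f"{name:15s} {p:.2f}")
```

Because neighbouring domains influence each other through the transition probabilities, a domain that is only weakly informative on its own can still receive a high cluster probability when it is flanked by strongly cluster-associated domains.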
AntiSMASH
AntiSMASH is so far the most comprehensive and user-friendly tool for the analysis and identification of secondary metabolite biosynthetic gene clusters in bacteria [17, 21]. The program antiSMASH, now available in version 3.0 [21] at http://antismash.secondarymetabolites.org, has undergone major improvements since its first release in 2011 [34].
AntiSMASH is a powerful prediction tool because (1) it permits a more detailed prediction of the clusters, since it allows BLAST searches of the predicted cluster to find the closest homologues in the database and (2) it allows the analysis of fragmented genomes and metagenomes [4]. AntiSMASH includes rule-based and statistics-based algorithms and offers various modules for pathway analysis [7]. The new version of antiSMASH has integrated the ClusterFinder algorithm developed by Cimermancic et al. [33], which no longer limits results exclusively to the detection of known biosynthetic gene clusters.
AntiSMASH also incorporates the CLUSEAN and NRPS predictor tools [17] and it also takes advantage of the conserved regions within clusters. In particular, it can detect conserved operons responsible for the biosynthesis of specific building blocks [17]. The antiSMASH pipeline involves the following steps [35]:
Genes are predicted from the genome using Glimmer3.
Biosynthetic gene clusters are identified using profile Hidden Markov Models in order to search databases for homologous sequences.
Biosynthetic clusters are automatically annotated.
The core chemical structure of natural products is predicted based on the annotated gene clusters.
Fig. (4)
Screenshot of antiSMASH output. Identification of secondary metabolite biosynthetic gene clusters of S. coelicolor (NCBI accession number GCA_000203835.1) by antiSMASH. Biosynthetic gene clusters are identified, classified and listed according to the type of the potential metabolite.
The secondary metabolite clusters that antiSMASH predicts can be of the following types: NRPS, PKS, hybrid PKS/NRPS, siderophore, bacteriocin and lantibiotic [4]. AntiSMASH includes the ClusterBlast algorithm to quantify similarity between query clusters and close homolog clusters deposited in the NCBI database [23]. AntiSMASH defines clusters as groups of signature genes within 10 kb of each other and extends the cluster 20 kb on each side of the last signature gene when defining the cluster boundaries, to ensure that no important genes are left out of the predicted gene cluster [23]. Two screenshots are included of the output obtained after analyzing the S. coelicolor genome sequence (accession number: GCA_000203835.1) with the program antiSMASH. Fig. (4) shows a list of all the putative biosynthetic clusters in the genome sequence predicted by antiSMASH, while Fig. (5) shows the output obtained by selecting one of the gene clusters.
Fig. (5)
Identification of biosynthetic gene cluster in antiSMASH. The biosynthetic, transport-related and regulatory genes are displayed including genomic information. The predicted core structure is shown according to the logical architecture of assembly.
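The boundary rule quoted above (signature genes within 10 kb of each other, extended by 20 kb on each side) can be sketched as follows. The function name, the coordinate representation and the example coordinates are assumptions made for this illustration and do not reproduce antiSMASH code.

```python
from typing import List, Optional, Tuple


def define_clusters(signature_genes: List[Tuple[int, int]],
                    max_gap: int = 10_000,
                    extension: int = 20_000,
                    genome_length: Optional[int] = None) -> List[Tuple[int, int]]:
    """Group signature genes into clusters and extend the boundaries.

    signature_genes -- (start, end) coordinates in bp
    max_gap         -- genes closer than this are placed in the same cluster
    extension       -- bp added on each side of the outermost signature genes
    genome_length   -- if given, cluster ends are clamped to this value
    """
    groups: List[List[int]] = []
    for start, end in sorted(signature_genes):
        if groups and start - groups[-1][1] <= max_gap:
            # Close enough to the current group: enlarge it.
            groups[-1][1] = max(groups[-1][1], end)
        else:
            groups.append([start, end])

    clusters = []
    for start, end in groups:
        left = max(0, start - extension)
        right = end + extension
        if genome_length is not None:
            right = min(right, genome_length)
        clusters.append((left, right))
    return clusters


# Hypothetical coordinates: two neighbouring signature genes and a distant one.
genes = [(100_000, 105_000), (110_000, 112_000), (300_000, 303_000)]
print(define_clusters(genes))
```

The generous extension explains why such rule-based predictions tend to overestimate cluster length: flanking genes are included by distance alone, not by evidence of involvement in biosynthesis.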
PRISM
PRediction Informatics for Secondary Metabolomes (PRISM) is an open-source web application for the genomic prediction, as well as the bio- and chemo-informatic dereplication, of nonribosomal peptides and type I and II polyketide chemical structures, including those involving trans-acting acyltransferase or adenylation domains [36]. The program PRISM, available since 2015 at http://magarveylab.ca/prism/, can identify enzyme domains and regulatory genes associated with natural product biosynthesis and resistance. A combinatorial library of natural product scaffolds is also suggested: each monomer is identified using an algorithm that accounts for all combinations of enzyme substrates consistent with known biosynthetic logic. Moreover, the set of predicted chemical structures is compared to a database of 49,860 known natural products via the Tanimoto coefficient in order to chemo-informatically dereplicate known natural products [36].
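For reference, the Tanimoto coefficient between two structures represented as feature sets A and B is |A ∩ B| / |A ∪ B|. The sketch below applies it to dereplication against a tiny library. The feature sets, the compound names and the convention of returning 0.0 for two empty sets are assumptions for illustration; the fingerprints PRISM actually uses are not described here.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def closest_known(predicted: Set[str],
                  library: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the library entry most similar to the predicted structure."""
    name = max(library, key=lambda k: tanimoto(predicted, library[k]))
    return name, tanimoto(predicted, library[name])


# Hypothetical substructure features for a predicted scaffold and two known compounds.
predicted = {"amide", "thiazoline", "beta-OH", "methyl-branch"}
library = {
    "known_A": {"amide", "thiazoline", "beta-OH", "ester"},
    "known_B": {"glycoside", "ester"},
}
print(closest_known(predicted, library))
```

A score near 1 suggests the predicted product is already known, while a low best score flags a candidate for novelty; the threshold used to make that call is a design choice.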
The enzyme domains associated with specialized metabolites are identified through a library of 479 hidden Markov models. The hypothetical domains are grouped into putative biosynthetic gene clusters and biosynthetically plausible open reading frame permutations are generated. Fig. (6) shows a screenshot of the predictions by PRISM from the analysis of the genome sequence of an actinobacterium.
Fig. (6)
Screenshot of PRISM output. Identification of the genes that compose a biosynthetic gene cluster from the genome sequence of an actinobacterium as determined by PRISM.
Pep2Path
Once antiSMASH has identified possible gene clusters and predicted putative NRPSs, the program NRPSPredictor2, developed by Röttig et al. in 2011 [37], can be used to obtain substrate specificity predictions for the NRPSs. Previously, these predictions and the possible amino acid sequences had to be compared manually [38]. Pep2Path is a program that aims to bridge the gap between antiSMASH and NRPSPredictor2. The creation of Pep2Path [38] was motivated by the tedium of manually matching possible amino acid sequences to substrate specificity predictions. Pep2Path uses mass shift sequence tags detected by tandem mass spectrometry to automatically identify candidate biosynthetic gene clusters of either nonribosomally synthesized peptides (NRPs) or ribosomally synthesized and post-translationally modified peptides (RiPPs) [38].
The inputs for Pep2Path are: (1) the MS mass shift sequences or the amino acid search tags (MS mass shift sequences translated into amino acids) and (2) the biosynthetic gene clusters plus substrate specificity predictions generated by antiSMASH and NRPSPredictor2. Pep2Path then bridges both, producing an output that consists of all the possible matches between the amino acid query sequence and all potential assembly steps that have the highest likelihood of generating peptidic natural products containing the sequence [35, 38]. Pep2Path is freely available at http://pep2path.sourceforge.net/.
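The matching problem that Pep2Path automates can be illustrated with a deliberately simplified sketch: mass shifts are translated into an amino acid tag, and the tag is slid along the per-module substrate predictions of each candidate cluster. The residue masses are standard monoisotopic values, but the tolerance, the cluster names, the predictions and the match-counting score are assumptions for this example and are not Pep2Path's actual scoring scheme.

```python
from typing import Dict, List, Optional

# Monoisotopic residue masses (Da); "Leu" also stands for the isobaric Ile.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Leu": 113.08406,
    "Phe": 147.06841, "Tyr": 163.06333,
}


def shifts_to_tag(shifts: List[float], tol: float = 0.01) -> List[Optional[str]]:
    """Translate MS/MS mass shifts into residues (None if nothing matches)."""
    tag: List[Optional[str]] = []
    for shift in shifts:
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - shift) <= tol]
        tag.append(hits[0] if hits else None)
    return tag


def score_clusters(tag: List[Optional[str]],
                   predictions: Dict[str, List[str]]) -> Dict[str, int]:
    """Best number of tag residues matched at any offset along each cluster."""
    scores = {}
    for name, modules in predictions.items():
        best = 0
        for offset in range(len(modules) - len(tag) + 1):
            window = modules[offset:offset + len(tag)]
            matched = sum(1 for t, m in zip(tag, window) if t is not None and t == m)
            best = max(best, matched)
        scores[name] = best
    return scores


# Hypothetical substrate predictions for two candidate NRPS clusters.
predictions = {"bgc_A": ["Ala", "Val", "Leu", "Phe"], "bgc_B": ["Ser", "Gly", "Tyr"]}
tag = shifts_to_tag([99.068, 113.084, 147.068])
print(tag, score_clusters(tag, predictions))
```

Counting exact matches ignores the uncertainty of both the mass measurements and the specificity predictions, which is why a practical tool needs a probabilistic score rather than this simple tally.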
NRPquest
The current problem of NRP analysis is that standard de novo sequencing tools were developed for linear peptides and cannot be applied to identify cyclic and branched NRPS peptides [39]. In 2014, Mohimani et al. [39] presented NRPquest, a resource that couples mass spectrometry and genome mining for the discovery of nonribosomal peptides. NRPquest performs mutation-tolerant and modification-tolerant searches of spectral datasets against a database of putative NRPs. NRPquest, available at www.cyclo.ucsd.edu, first generates a database of putative NRPs extracted from the genome using the nonribosomal code. The putative NRPs are then matched to the MS/MS spectra in the spectral dataset to identify the NRP [39].
Genome-to-Natural Products (GNP)
Johnston et al. [40] proposed using LC-MS/MS data of crude extracts to discover natural products. Their proposed Genome-to-Natural Products platform, available at http://magarveylab.ca/gnp, is a tool that can generate natural product predictions from LC-MS/MS data [40]. The Genome-to-Natural Products platform uses hidden Markov models and predicts chemical structures based on genome-guided prediction of biosynthetic gene clusters and identification of modules, domains and substrate specificities [40].
The Motif Density Method (MDM)
The main disadvantage of similarity-based programs for the detection of secondary metabolite gene clusters, such as those described above, is that only a limited number of known, tested clusters can serve as templates. As a result, these programs may overestimate the length of a cluster or may not be able to differentiate between adjacent clusters [22].
The Motif Density Method (MDM) is an entirely different approach proposed by Wolf et al. [22]. It consists of determining in silico the occurrences of transcription factor binding sites in promoter regions in order to predict gene clusters and potential regulators of the clusters. It can also help discover clusters co-regulated by the same transcription factors [22]. Cluster-specific transcription factors that bind at several locations within the gene cluster must be detectable through a common motif in the promoter regions; such common motifs can be identified with the program MEME [41]. The MDM is an interesting alternative to the similarity-based programs for in silico prediction of secondary metabolite gene clusters. It opens up exciting prospects for the future of genome mining.
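The underlying intuition, that promoters inside a co-regulated cluster carry the shared motif more often than their neighbours, can be sketched in a few lines of Python. This is a minimal sliding-window illustration and not the published MDM procedure: the motif, the promoter sequences, the window width and the function names are all hypothetical, and the motif is assumed to be already known (for example from a tool such as MEME).

```python
import re


def motif_counts(promoters, motif):
    """Count (possibly overlapping) motif occurrences in each promoter sequence."""
    pattern = re.compile(f"(?=(?:{motif}))")  # zero-width lookahead allows overlaps
    return [sum(1 for _ in pattern.finditer(seq)) for seq in promoters]


def densest_window(counts, width):
    """Return (start index, total count) of the gene window richest in motifs."""
    best_start, best_total = 0, -1
    for start in range(len(counts) - width + 1):
        total = sum(counts[start:start + width])
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Hypothetical promoter sequences of six consecutive genes on a chromosome.
promoters = ["AAAA", "TTTT", "TGACGAATGACG", "CCTGACGCC", "TGACG", "GGGG"]
counts = motif_counts(promoters, "TGACG")   # hypothetical binding-site motif
start, total = densest_window(counts, width=3)
print(counts, start, total)
```

Here the window of genes 2 to 4 would be flagged as a candidate cluster. A fixed window width is a simplification; in practice the cluster borders would be set where the motif density falls back to the genomic background.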
Additional Bioinformatics Platforms
Several other bioinformatics platforms are mentioned in genome mining research articles, such as:
The bioinformatics platform MicroScope [42] enables visualizing genome synteny in annotated bacterial genomes. Genome synteny (colocalization of genetic loci) may also provide information on gene function [43].
The Natural Product Domain Seeker (NaPDoS) [44] uses hidden Markov models to identify NRPSs and PKSs in bacterial genomes [4]. NaPDoS analysis provides excellent identification of the KS and C domains [23].
The MIDDAS-M software [45] uses gene expression data to identify and assess cooperatively transcribed gene clusters.
Thiofinder is a genome mining tool for thiopeptides [46].
PKMiner [47] is a combination of database and program for genome mining. The database contains 42 characterized and 230 uncharacterized type II PKS clusters of Actinomycetes.
NRPSpredictor2 [37] is a web server for predicting NRPS adenylation domain specificity. The A domains select the amino acid building blocks. According to [23], in NRP pathways, the substrate selectivity of the A domains can be predicted.
BAGEL3 [48] is a program that allows the identification and analysis of bacteriocins as well as ribosomally synthesized and post-translationally modified peptides (RiPPs).
NP.searcher [49] predicts amino acid composition and connectivity, backbone heterocyclization, and tailoring chemistries including dimerization, heterocyclization and glycosylation [17].
SBSPKS [50] is a tool for predicting the order of substrate addition that takes into account that not all PK and NRP biosynthetic pathways follow co-linearity.
Databases for Genome Mining
The analysis of sequences is time-consuming and generates a large amount of data regarding the specificities of domains and the chemical structures of the products. To incorporate the accumulated information [32], several databases have been developed, among them IMG-ABC, StreptomeDB, DoBISCUIT, ClusterMine360, Norine and Protein Data Bank (PDB).
Integrated Microbial Genomes-Atlas of Biosynthetic Clusters (IMG-ABC)
In 2015, Hadjithomas et al. presented IMG-ABC [30], the largest publicly available database of biosynthetic gene clusters, which allows genomic and metagenomic data to be screened for secondary metabolite clusters. IMG-ABC is a platform that integrates powerful search and analysis tools and is available at https://img.jgi.doe.gov/abc. IMG-ABC aims to expand constantly to become an essential bioinformatics tool in the search for secondary metabolites.
StreptomeDB
StreptomeDB, now available in version 2.0 [51] is a database that