Frontiers in Clinical Drug Research - Anti Infectives: Volume 3
Ebook, 454 pages, 5 hours


About this ebook

Frontiers in Clinical Drug Research – Anti infectives is an eBook series that brings updated reviews to readers interested in learning about advances in the development of pharmaceutical agents for the treatment of infectious diseases. The scope of the eBook series covers a range of topics including the chemistry, pharmacology, molecular biology and biochemistry of natural and synthetic drugs employed in the treatment of infectious diseases. Reviews in this series also include research on multi drug resistance and pre-clinical / clinical findings on novel antibiotics, vaccines, antifungal agents and antitubercular agents. Frontiers in Clinical Drug Research – Anti infectives is a valuable resource for pharmaceutical scientists and postgraduate students seeking updated and critically important information for developing clinical trials and devising research plans in the field of anti infective drug discovery and epidemiology.

The third volume of this series features reviews that cover a variety of topics including:

- Genomic mining and metabolomic techniques for developing antimicrobials

- Probiotic use in complementary antiretroviral therapy

- Anti-HIV pharmaceuticals

- Phytochemicals used for antimicrobial purposes

- Antimicrobial photodynamic therapy (APDT)

Language: English
Release date: Apr 7, 2017
ISBN: 9781681083698
Author

Atta-ur Rahman

Atta-ur-Rahman, Professor Emeritus, International Center for Chemical and Biological Sciences (H. E. J. Research Institute of Chemistry and Dr. Panjwani Center for Molecular Medicine and Drug Research), University of Karachi, Pakistan, was the Pakistan Federal Minister for Science and Technology (2000-2002), Federal Minister of Education (2002), and Chairman of the Higher Education Commission with the status of a Federal Minister from 2002 to 2008. He is a Fellow of the Royal Society of London (FRS) and a UNESCO Science Laureate. He is a leading scientist with more than 1283 publications in several fields of organic chemistry.


    Book preview

    Frontiers in Clinical Drug Research - Anti Infectives - Atta-ur Rahman

    Modern Approaches to Genome Mining for the Development of New Anti-infectives: In Silico Gene Prediction and Experimental Metabolomics

    INTRODUCTION

    More than half of the known natural products that have antimicrobial, antiviral or antitumor activity originate from only five cultivated bacterial groups: filamentous Actinomycetes, Myxobacteria, Cyanobacteria, as well as members of the genera Pseudomonas and Bacillus [1]. Actinomycetes are Gram-positive mycelial bacteria found mainly in the soil, but are also present in symbiotic association with terrestrial and aquatic invertebrates [2]. Actinomycetes produce metabolites as they undergo the morphological and physiological differentiation processes that are part of their life cycle [2].

    The secondary metabolites that bacteria produce include aminoglycosides, polyketides, as well as small proteinaceous and peptide structures such as bacteriocins, oligopeptides and lipopeptides. These secondary metabolites may have bactericidal, immune suppression and tumor suppression properties and can be useful for human and veterinary medicine. Lipopeptides and polyketides have linear, cyclic or branched structures. Lipopeptides are generated by non-ribosomal peptide synthases (NRPSs) whilst polyketides are generated by polyketide synthases (PKSs) [3, 4].

    The function that these metabolites have in their natural environment is not always known, but they are thought to provide a competitive advantage to the producing organism since many of these possess potent antibiotic activity [2]. It has also been suggested that antibiotics act as signaling molecules facilitating intra- or interspecies interactions within microbial communities [5].

    Most of the antibiotics in clinical use are microbial natural products or their derivatives [6]. In fact, of the 18,000 currently known bioactive compounds, 10,000 were described from the genus Streptomyces (Actinobacteria) [7]. Actinobacteria are still among the most important producers of natural products that are currently applied as antibiotics, immunosuppressants, anticancer drugs, anthelmintics and antifungals [8, 9].

    The threat of multi-drug resistant pathogens puts the advances of modern medicine at grave risk [6, 10, 11], and yet few new antibiotics are reaching the market. Drug discovery is expensive and the return on investment is difficult to predict. New products sell poorly because they are prescribed sparingly in the hope of slowing the development of resistance [6, 8].

    Silent Biosynthetic Gene Clusters

    With the onset of the genomic era, it became evident that Actinomycetes contain a largely untapped and unexplored potential for the production of secondary metabolites [2, 12]. Analyses of genome sequences have revealed that each genome contains clusters to synthesize 20 or more secondary metabolites [13], which increases the chances of discovering novel bioactive natural products. Genome mining bioinformatics software detects biosynthetic gene clusters encoded in the genome, but bioinformatics programs alone will not lead to the discovery of new metabolites, since many of the secondary metabolism gene clusters are silent under laboratory conditions [14].

    Secondary metabolite biosynthetic gene clusters remain silent until the required signals occur, which may be environmental or physiological [15]. However, it has been proposed that the majority of secondary metabolism gene clusters in Streptomyces are not silent, but are expressed at very low levels under laboratory conditions, so the transcription of these gene clusters is not sufficient to produce detectable amounts of novel secondary metabolites [16].

    What is Genome Mining?

    Genome mining consists of using genetic information to assess the potential of microorganisms to produce novel compounds [17]. Such analysis has to be followed by extensive experimental research [2] involving proteomics and metabolomics to confirm that the predicted gene cluster produces the target secondary metabolite [7]. Genome mining as a natural product discovery strategy is based on connecting an unknown structure of a natural product with its corresponding biosynthetic genes by applying biosynthetic knowledge. As proposed by Nett [18], genome mining involves "basic in silico analyses to aid in the proposal of putative genes and putative products, as well as the emerging chemical or genetic methods that are applied to trace the metabolic products of the (putative genes)". New methods are needed that conclusively link a gene cluster to a natural product [19]. Several in silico and experiment-guided approaches have been developed for this purpose.

    The first step in a genome mining approach is to identify the putative biosynthetic gene clusters in the genome sequence [17]. In the second step, once putative clusters have been identified, it is necessary to predict the biosynthetic products resulting from the enzymes encoded in the cluster [17].

    Genome mining consists not only of the in silico determination of a gene cluster, but also of the activation of a cryptic gene cluster [20]. In fact, genome mining is typically accompanied by proteome and/or metabolome analyses to accurately link a metabolite to its biosynthetic gene cluster [17, 21]. Such a connection may help to ascertain the novelty of a compound, guide fractionation/identification and allow heterologous expression in a suitable host [19].

    A gene cluster is a set of co-localized and co-regulated genes, whose products are functionally connected [22]. A secondary metabolite biosynthetic gene cluster normally does not contain more than 20 genes. Identifying the gene clusters that are likely to encode new molecules is a key priority for genome mining. Thus, it is necessary to compare the predicted to known gene clusters and to predict the structure of the putative product, as summarized in Fig. (2), which is extremely challenging [23].

    Ultimately, a genome mining approach should be more efficient at finding novel bioactive secondary metabolites, so as to avoid re-discovering the same compounds over and over again. As summarized in Fig. (1), the isolation of natural products with bioactivity traditionally relied on screening extracts to detect bioactivity, followed by separation and characterization of the compounds by assay-guided fractionation [24].

    Fig. (1)

    The traditional approach compared to a genome mining approach for the discovery of novel compounds with bioactivity. The main difference between the two is that a genome mining approach provides information about the genes involved in the biosynthesis of a metabolite. Figure based on [13].

    Fig. (2)

    Summarized steps involved in a genome mining approach. The greatest challenge of a genome mining approach is to link a produced metabolite with the genes involved in the biosynthesis of the metabolite. To propose the biosynthetic pathway of a metabolite, it is necessary to carry out the analysis of a genome sequence aided by bioinformatics platforms for the prediction of secondary metabolite gene clusters. Experimental work is required to confirm the computational prediction of the genes involved in the production of the metabolite and the actual structure of the metabolite.

    Genome Mining Approaches and the Awakening of Silent Gene Clusters

    Like Choi et al. [13], Gomez-Escribano and Bibb [25] regard the activation of silent gene clusters and heterologous expression in a suitable organism as approaches to genome mining. They define genome mining as the use of bioinformatics, molecular genetics and natural product analytical chemistry to access the metabolic product of a gene cluster found in the genome of an organism [25].

    Bachmann et al. in 2014 [26] reported that genome mining now includes the bioinformatic prediction of clusters, the control of gene expression and the identification of new metabolites. The reason a genome mining approach includes strategies for the awakening of silent clusters is that, if they are not awakened, it will not be possible to produce the metabolite for its identification and bioactivity tests [27].

    In agreement with the view that genome mining may be defined as the process of technically translating secondary metabolite-encoding gene sequence data into purified molecules in tubes [26], in the following pages we describe in silico biosynthetic gene cluster prediction software platforms and we have compiled a list of online databases created to store information on secondary metabolites, biosynthetic genes and predictions. The in silico prediction section is followed by a metabolomics section that describes experimental metabolomics techniques aimed at identifying and characterizing secondary metabolites. We are particularly interested in discussing how the data generated by metabolomics studies have been integrated with data generated by bioinformatics platforms to link a metabolite to its gene cluster.

    Homologous vs. Heterologous Expression

    Genome-mining efforts can be organized into two categories: homologous and heterologous expression. Homologous expression tries to elicit the production of secondary metabolites in the organism that natively encodes them [26], i.e., a gene or gene cluster is homologously expressed when the organism that encodes the cluster in its genome is the one expressing its genes. When the clusters are silent, strategies must be sought to force their expression in the encoding organism. Once the cluster has been expressed, its product can be identified by comparison to non-expressing cells, then purified and analyzed. Several strategies have been reported in the literature to awaken the expression of secondary metabolite gene clusters, such as modification of the growth conditions, co-culturing various strains together or genetic engineering approaches that involve over-expressing a native transcriptional activator within the native strain [27]. In contrast, a gene cluster that has been isolated or amplified from one organism and introduced into another, perhaps more suitable, organism is heterologously expressed by that new host.

    For a review of heterologous expression of actinomycete genes in Streptomyces coelicolor A3(2), we invite the reader to consult the 2014 work of Gomez-Escribano and Bibb [25], who reviewed publications involving the cloning of genome fragments from diverse Actinomycetes for expression in the host S. coelicolor A3(2). According to [25], heterologous expression is normally favored over homologous expression when clusters are difficult to study in their natural hosts. For example:

    When deletion of the cluster or gene cannot be achieved to validate a host strain.

    When it is desired to generate novel unnatural chemical structures by combining genes from a variety of pathways to create a new unnatural pathway.

    To study the function of particular genes from an imported cluster in order to propose the biosynthetic pathway of a metabolite.

    To optimize the production of a metabolite.

    It is possible to modify the genetic content of the cluster by changing the native promoters, eliminating negative regulators or re-coding the codon to generate mutants of an enzyme, as detailed in [25]. The metabolites produced by the wild-type strain and the heterologous host can be compared to find the metabolite encoded in the foreign genes [28].

    New approaches are required to identify and characterize the silent gene clusters that appear to be present among all Actinomycetes [19]. Therefore, the aim of this chapter is to review the latest proposed strategies for the computational identification of secondary metabolite biosynthetic gene clusters in the genome, as well as the most recent experimental methodologies that have succeeded in linking novel natural products with their biosynthetic gene clusters. Also, considering that Actinomycetes produce more than 70% of the natural product scaffolds of clinically used anti-infective compounds [2], a summary of some of the most recent natural products synthesized by Actinobacteria is included.

    Bioinformatics Tools and Databases for Secondary Metabolite Discovery

    The current excess of genetic information encoding uncharacterized proteins [29] has prompted the development of a variety of computational tools to aid scientists in genome mining, as summarized in Fig. (3) [7].

    Fig. (3)

    Tools for the mining of genomes. Several programs and databases have been created to analyze genome sequences to predict gene clusters involved in the production of secondary metabolites and their putative metabolic products.

    Bioinformatics platforms are essential in the search for secondary metabolite clusters because many organisms cannot be isolated or cultured in a laboratory [2, 30]. The key feature these programs exploit is the high degree of sequence similarity of the catalytic domains from enzymes involved in secondary metabolite biosynthesis, in spite of the immense chemical diversity of secondary metabolites [17]. In most bacteria, a secondary metabolite biosynthetic cluster includes not only the genes involved in the biosynthesis of the secondary metabolite but also regulators, transporters, and genes involved in conferring resistance to the metabolite produced. Computational screening of genes is a way to complement experimental assays where extracts or purified compounds are tested against specific targets with the goal of finding bioactive compounds [17]. Computationally identified biosynthetic clusters can be cloned or synthesized for heterologous expression [30].

    The application of bioinformatics tools in the search for polyketide (PK) and non-ribosomal peptide (NRP) pathways requires browsing genetic sequences to pinpoint the location of the putative pathways by comparison with an ortholog of a known protein from the pathway, for which conserved catalytic domains are often used [23]. Once the biosynthetic gene cluster has been located, it is necessary to identify all the genes involved in the biosynthesis of the metabolite. These include the genes encoding polyketide synthases (PKS), nonribosomal peptide synthases (NRPS), tailoring enzymes and other biosynthetic enzymes, as well as regulatory elements and resistance genes, which are typically tightly clustered together on the chromosome [23]. The automation of this second step of the process is challenging if some of the genes are located far from the core signature genes or if gene clusters that are located close together are merged into one cluster [23].

    Often, NRPS and type I PKS enzymes work predictably, i.e. the order of recruitment for assembly of amino acids for NRPS or carboxylic acids for PKS is the same as the order of the catalytic domains. This insight into the architecture of the domains facilitates prediction of the structures they might produce based exclusively on genomic information [4]. Quoting a recent work by C. N. Boddy: the predicted connectivity of the individual building blocks selected by the A and AT domains is defined by the order of the A and AT coding regions in the gene cluster [23].
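
    The colinearity rule described above can be illustrated with a minimal sketch. The module names, coordinates and substrate predictions below are hypothetical, and the function simply orders building blocks by the position of their A-domain coding regions; real assembly lines may deviate from colinearity, so this is not a general predictor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One NRPS module (hypothetical example data)."""
    name: str
    start: int       # start coordinate of the A-domain coding region
    substrate: str   # amino acid predicted for the A domain


def predict_linear_product(modules: List[Module]) -> List[str]:
    """Order building blocks by the genomic order of their A domains."""
    ordered = sorted(modules, key=lambda m: m.start)
    return [m.substrate for m in ordered]


# Hypothetical three-module NRPS, deliberately listed out of genomic order.
example_modules = [
    Module("module_2", start=4200, substrate="Orn"),
    Module("module_1", start=900, substrate="Val"),
    Module("module_3", start=7600, substrate="Leu"),
]

predicted_product = predict_linear_product(example_modules)
print("-".join(predicted_product))
```

    Under the colinearity assumption, the predicted backbone follows the coordinate order of the modules rather than the order in which they happen to be listed.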

    Several tools have been developed to aid in genome mining of bacterial secondary metabolite gene clusters, such as ClustScan, ClusterFinder, antiSMASH, PRISM, Pep2Path, and NRPquest, among several others. Several of these tools rely on hidden Markov models, which are statistical models generated from multiple sequences and are superior to pairwise search methods such as BLAST for detecting distantly related homologs [23].

    ClustScan

    The ClustScan program [31] contains descriptions of 170 natural product clusters. The program was developed to analyze modular clusters. What makes ClustScan unique is that it takes a top-down approach to annotate gene clusters, so the cluster is also considered as a whole unit. Then the modules and domains are organized in a hierarchical way, so it is possible to predict the structures of the products [32].

    The ClustScan program proceeds to construct the ClustScan database on its own, without any further manual curation. According to Weber [17], by 2014 the ClustScan database contained 57 entries on characterized PKS, 51 entries on NRPS and 62 entries on hybrid PKS/NRPS clusters. It is important to mention that the ClustScan program is a commercial program for the analysis of biosynthetic clusters.

    ClusterFinder

    ClusterFinder is a hidden Markov model-based probabilistic algorithm developed by Cimermancic et al. [33]. ClusterFinder aims to find clusters that are not well-characterized by converting the nucleotide sequence into a string of contiguous Pfam domains. Each domain is then assigned a probability of being part of a given gene cluster based on how frequently these domains occur in the ClusterFinder datasets [33].
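
    To make the idea of assigning each domain a probability of belonging to a gene cluster more concrete, the following sketch runs the forward-backward algorithm on a two-state hidden Markov model (background versus cluster). It is an illustration only: the domain labels and all probabilities are invented for this example and are not ClusterFinder's trained parameters or its implementation.

```python
from typing import Dict, List

STATES = ("background", "cluster")

# Hypothetical parameters, chosen only for illustration.
START = {"background": 0.9, "cluster": 0.1}
TRANSITION = {
    "background": {"background": 0.9, "cluster": 0.1},
    "cluster": {"background": 0.1, "cluster": 0.9},
}
EMISSION = {
    "background": {"ketosynthase": 0.01, "acyltransferase": 0.01, "housekeeping": 0.5},
    "cluster": {"ketosynthase": 0.3, "acyltransferase": 0.3, "housekeeping": 0.05},
}


def _emit(state: str, domain: str, floor: float = 0.01) -> float:
    """Emission probability, with a small floor for unseen domains."""
    return EMISSION[state].get(domain, floor)


def _normalise(scores: Dict[str, float]) -> Dict[str, float]:
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


def cluster_posteriors(domains: List[str]) -> List[float]:
    """Posterior probability that each domain lies in a cluster."""
    if not domains:
        return []
    # Forward pass, rescaled at each step to avoid numerical underflow.
    forward: List[Dict[str, float]] = []
    for i, dom in enumerate(domains):
        cur = {}
        for s in STATES:
            if i == 0:
                prior = START[s]
            else:
                prior = sum(forward[-1][r] * TRANSITION[r][s] for r in STATES)
            cur[s] = prior * _emit(s, dom)
        forward.append(_normalise(cur))
    # Backward pass, also rescaled at each step.
    backward: List[Dict[str, float]] = [{s: 1.0 for s in STATES}]
    for i in range(len(domains) - 2, -1, -1):
        nxt = backward[0]
        cur = {
            s: sum(TRANSITION[s][r] * _emit(r, domains[i + 1]) * nxt[r] for r in STATES)
            for s in STATES
        }
        backward.insert(0, _normalise(cur))
    # Combine both passes and renormalise per position.
    return [_normalise({s: f[s] * b[s] for s in STATES})["cluster"]
            for f, b in zip(forward, backward)]


domain_string = ["housekeeping", "housekeeping", "ketosynthase", "acyltransferase",
                 "ketosynthase", "housekeeping", "housekeeping"]
posteriors = cluster_posteriors(domain_string)
for dom, p in zip(domain_string, posteriors):
    print(f"{dom:16s} {p:.3f}")
```

    With these made-up parameters, the three biosynthetic domains in the middle of the string receive high cluster probabilities while the flanking housekeeping domains do not.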

    Cimermancic et al. [33] experimentally tested the ClusterFinder predictions of biosynthetic gene clusters and discovered an aryl polyene (APE) carboxylic acid.

    AntiSMASH

    AntiSMASH is so far the most comprehensive and user-friendly tool for the analysis and identification of secondary metabolite biosynthetic gene clusters in bacteria [17, 21]. The program antiSMASH, now available in version 3.0 [21] at http://antismash.secondarymetabolites.org, has undergone major improvements since its first release in 2011 [34].

    AntiSMASH is a powerful prediction tool because (1) it permits a more detailed prediction of the clusters, since it allows BLAST searches of the predicted cluster to find the closest homologues in the database and (2) it allows the analysis of fragmented genomes and metagenomes [4]. AntiSMASH includes rule-based and statistics-based algorithms and offers various modules for pathway analysis [7]. The new version of antiSMASH has integrated the ClusterFinder algorithm developed by Cimermancic et al. [33], so that results are no longer limited exclusively to the detection of known biosynthetic gene clusters.

    AntiSMASH also incorporates the CLUSEAN and NRPS predictor tools [17] and it also takes advantage of the conserved regions within clusters. In particular, it can detect conserved operons responsible for the biosynthesis of specific building blocks [17]. The antiSMASH pipeline involves the following steps [35]:

    Genes are predicted from the genome using Glimmer3.

    Biosynthetic gene clusters are identified using profile Hidden Markov Models in order to search databases for homologous sequences.

    Biosynthetic clusters are automatically annotated.

    The core chemical structure of natural products is predicted based on the annotated gene clusters.

    Fig. (4)

    Screenshot of antiSMASH output. Identification of secondary metabolite biosynthetic gene clusters of S. coelicolor (NCBI accession number GCA_000203835.1) by antiSMASH. Biosynthetic gene clusters are identified, classified and listed according to the type of the potential metabolite.

    The predicted secondary metabolite clusters that antiSMASH yields can be of the following types: NRPS, PKS, hybrid PKS/NRPS, siderophore, bacteriocin or lantibiotic [4]. AntiSMASH includes the ClusterBlast algorithm to quantify similarity between query clusters and close homolog clusters deposited in the NCBI database [23]. AntiSMASH defines clusters as groups of signature genes within 10 kb of each other and extends the cluster 20 kb on each side of the last signature gene to define the cluster boundaries, which ensures that no important genes are left out of the predicted gene cluster [23]. Two screenshots are included of the output obtained after analyzing the S. coelicolor genome sequence (accession number: GCA_000203835.1) with antiSMASH. Fig. (4) shows a list of all the putative biosynthetic clusters in the genome sequence predicted by antiSMASH, while Fig. (5) shows the output obtained after selecting one of the gene clusters.
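
    The boundary rule described above (signature genes within 10 kb of each other, extended by 20 kb on each side) can be sketched as follows. This is a simplified illustration and not antiSMASH code: the gene coordinates are hypothetical, and "within 10 kb" is assumed here to mean the gap between the end of one signature gene and the start of the next.

```python
from typing import List, Optional, Tuple


def define_clusters(
    signature_genes: List[Tuple[int, int]],
    max_gap: int = 10_000,
    extension: int = 20_000,
    genome_length: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Group signature genes into clusters and extend the boundaries.

    signature_genes: (start, end) coordinates in base pairs.
    """
    merged: List[List[int]] = []
    for start, end in sorted(signature_genes):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous signature gene: same cluster.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    clusters = []
    for start, end in merged:
        left = max(0, start - extension)
        right = end + extension
        if genome_length is not None:
            right = min(genome_length, right)
        clusters.append((left, right))
    return clusters


# Hypothetical signature gene coordinates on one chromosome.
genes = [(100_000, 105_000), (112_000, 118_000), (300_000, 304_000)]
cluster_bounds = define_clusters(genes)
print(cluster_bounds)
```

    In this example the first two genes are separated by 7 kb and fall into one cluster, while the third gene forms a cluster of its own. The sketch also shows how two adjacent clusters could be merged when their extended boundaries or signature genes lie close together.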

    Fig. (5)

    Identification of biosynthetic gene cluster in antiSMASH. The biosynthetic, transport-related and regulatory genes are displayed including genomic information. The predicted core structure is shown according to the logical architecture of assembly.

    PRISM

    PRediction Informatics for Secondary Metabolomes (PRISM) is an open-source web application for the genomic prediction, as well as bio- and chemo-informatic dereplication, of nonribosomal peptides and type I and II polyketide chemical structures, including trans-acting acyltransferase or adenylation domains [36]. The program PRISM, available since 2015 at http://magarveylab.ca/prism/, can identify enzyme domains and regulatory genes associated with natural product biosynthesis and resistance. Also, a combinatorial library of natural product scaffolds is suggested by the identification of each monomer using an algorithm that accounts for all combinations of enzyme substrates consistent with known biosynthetic logic. Moreover, the set of predicted chemical structures is compared to a database of 49,860 known natural products via the Tanimoto coefficient in order to chemo-informatically dereplicate known natural products [36].
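
    The Tanimoto coefficient used for dereplication is the size of the intersection of two feature sets divided by the size of their union. The sketch below applies it to fingerprints represented as sets of integers; the fingerprints, compound names and the 0.5 threshold are hypothetical and are not taken from PRISM.

```python
from typing import Dict, Optional, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def closest_known(
    predicted: Set[int],
    known: Dict[str, Set[int]],
    threshold: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Return the most similar known compound if it reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, fingerprint in known.items():
        score = tanimoto(predicted, fingerprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None


# Hypothetical fingerprints for one predicted scaffold and two known products.
predicted_fp = {1, 2, 3, 4}
known_db = {"known_A": {1, 2, 3, 5}, "known_B": {7, 8}}

match = closest_known(predicted_fp, known_db)
print(match)
```

    A predicted structure whose best match exceeds the chosen threshold would be flagged as a probable rediscovery, while one with no close match is a candidate for a novel compound.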

    The enzyme domains associated with specialized metabolites are identified through a library of 479 hidden Markov models. The hypothetical domains are grouped into putative biosynthetic gene clusters and biosynthetically plausible open reading frame permutations are generated. Fig. (6) shows a screenshot of the predictions made by PRISM from the analysis of the genome sequence of an actinobacterium.

    Fig. (6)

    Screenshot of PRISM output. Identification of the genes that compose a biosynthetic gene cluster from the genome sequence of an actinobacterium, as determined by PRISM.

    Pep2Path

    Once antiSMASH has identified possible gene clusters and predicted putative NRPSs, the program NRPSpredictor2, developed by Röttig et al. in 2011 [37], can be used to obtain substrate specificity predictions for the NRPSs. Previously, these predictions and the possible amino acid sequences had to be compared manually [38]. Pep2Path is a program that aims to bridge the gap between antiSMASH and NRPSpredictor2. The creation of Pep2Path [38] was motivated by the tediousness of manually matching possible amino acid sequences to substrate specificity predictions. Pep2Path uses mass shift sequence tags detected by tandem mass spectrometry to automatically identify candidate biosynthetic gene clusters of either nonribosomally synthesized peptides (NRPs) or ribosomally synthesized and post-translationally modified peptides (RiPPs) [38].

    The inputs for Pep2Path can be either: (1) the MS mass shift sequences or the amino acid search tags (MS mass shift sequences translated into amino acids) or (2) the biosynthetic gene clusters plus substrate specificity predictions generated by antiSMASH and NRPSpredictor2. Pep2Path then bridges both, producing an output that consists of all the possible matches between the amino acid query sequence and all potential assembly steps that have the highest likelihood of generating peptidic natural products containing the sequence [35, 38]. Pep2Path is freely available at http://pep2path.sourceforge.net/.
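
    The general idea of translating mass shifts into amino acid tags and matching them against predicted substrate specificities can be sketched as below. This is a deliberately simplified illustration and not Pep2Path's actual scoring scheme: the cluster names and substrate predictions are hypothetical, only a few proteinogenic residues are included, and the score is a plain count of matching positions in the best window.

```python
from typing import Dict, List, Set, Tuple

# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Leu": 113.08406, "Ile": 113.08406,
    "Phe": 147.06841,
}


def candidates_for_shift(shift: float, tol: float = 0.02) -> Set[str]:
    """Amino acids whose residue mass matches a mass shift within tol."""
    return {aa for aa, mass in RESIDUE_MASS.items() if abs(mass - shift) <= tol}


def best_window_score(tag: List[Set[str]], substrates: List[str]) -> int:
    """Best number of tag positions matched by any contiguous module window."""
    best = 0
    for offset in range(len(substrates) - len(tag) + 1):
        window = substrates[offset:offset + len(tag)]
        score = sum(1 for cands, sub in zip(tag, window) if sub in cands)
        best = max(best, score)
    return best


def rank_clusters(
    shifts: List[float], clusters: Dict[str, List[str]], tol: float = 0.02
) -> List[Tuple[int, str]]:
    """Rank candidate clusters by how well their substrates fit the tag."""
    tag = [candidates_for_shift(s, tol) for s in shifts]
    scored = [(best_window_score(tag, subs), name) for name, subs in clusters.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical mass shift tag and substrate predictions for two clusters.
mass_shifts = [99.068, 113.084, 71.037]
candidate_clusters = {
    "cluster_A": ["Ser", "Val", "Leu", "Ala"],
    "cluster_B": ["Gly", "Pro", "Val"],
}
ranking = rank_clusters(mass_shifts, candidate_clusters)
print(ranking)
```

    Note that a shift of 113.084 Da is compatible with both leucine and isoleucine, which is why each tag position is kept as a set of candidates rather than a single residue.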

    NRPquest

    The current problem with NRP analysis is that standard de novo sequencing tools were developed for linear peptides and cannot be applied to identify cyclic and branched NRPS peptides [39]. In 2014, Mohimani et al. [39] presented NRPquest, a resource that couples mass spectrometry and genome mining for the discovery of nonribosomal peptides. NRPquest performs mutation-tolerant and modification-tolerant searches of spectral datasets against a database of putative NRPs. NRPquest is available at www.cyclo.ucsd.edu. It first generates a database of putative NRPs extracted from the genome using the nonribosomal code. The putative NRPs are then matched to MS/MS spectra in the database to identify the NRP [39].

    Genome-to-Natural Products (GNP)

    Johnston et al. [40] proposed using LC-MS/MS data of crude extracts to discover natural products. Their proposed Genome-to-Natural Products platform, available at http://magarveylab.ca/gnp, is a tool that can generate natural product predictions from LC-MS/MS data [40]. The Genome-to-Natural Products platform uses hidden Markov models and predicts chemical structures based on genome-guided prediction of biosynthetic gene clusters and identification of modules, domains and substrate specificities [40].

    The Motif Density Method (MDM)

    The main disadvantage of similarity-based programs for the detection of secondary metabolite gene clusters, such as those described above, is that there is actually a limited number of known tested clusters that can serve as a template. As a result, these programs may overestimate the length of a cluster or may not be able to differentiate between adjacent clusters [22].

    The Motif Density Method (MDM) is an entirely different approach proposed by Wolf et al. [22] that consists of determining in silico transcription factor binding site occurrences in promoter regions to predict gene clusters and potential regulators of the clusters. It can also help discover clusters co-regulated by the same transcription factors [22]. Cluster-specific transcription factors that bind in several locations within the gene cluster must be detectable by a common motif in the promoter regions. The program that can identify common motifs is MEME [41]. The Motif Density Method is an interesting alternative to the similarity-based programs for in silico prediction of secondary metabolite gene clusters, and it opens up exciting prospects for the future of genome mining.
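
    The intuition behind a motif density can be shown with a small sketch: count occurrences of a binding-site motif in the promoter of each gene, in chromosomal order, and average the counts over a sliding window of neighbouring genes. This is only a toy illustration of the concept and not the published method of Wolf et al.; the promoter sequences, the motif and the window size are hypothetical.

```python
import re
from typing import List, Tuple


def count_motif(promoter: str, motif_regex: str) -> int:
    """Count motif occurrences, including overlapping ones."""
    return len(re.findall(f"(?=({motif_regex}))", promoter))


def motif_density(
    promoters: List[str], motif_regex: str, window: int = 3
) -> Tuple[List[int], List[float]]:
    """Per-gene motif counts and their sliding-window averages."""
    counts = [count_motif(p, motif_regex) for p in promoters]
    densities = [
        sum(counts[i:i + window]) / window
        for i in range(len(counts) - window + 1)
    ]
    return counts, densities


# Hypothetical promoter sequences of five neighbouring genes.
promoter_regions = ["ATGCATGC", "TTCGGAATTCGG", "CGGTTCGGA", "ACGTCGGT", "AAAAAA"]
motif_counts, window_density = motif_density(promoter_regions, "TCGG", window=3)

peak_window = max(range(len(window_density)), key=window_density.__getitem__)
print(motif_counts, window_density, peak_window)
```

    A run of neighbouring genes whose promoters are enriched in the same motif, relative to the rest of the genome, is the kind of signal that would suggest a co-regulated cluster.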

    Additional Bioinformatics Platforms

    There are several other bioinformatics platforms mentioned in genome mining research articles such as:

    The bioinformatics platform MicroScope [42] enables visualizing genome synteny in annotated bacterial genomes. Genome synteny (colocalization of genetic loci) may also provide information on gene function [43].

    The Natural Product Domain Seeker (NaPDoS) [44] uses hidden Markov models to identify NRPSs and PKSs in bacterial genomes [4]. NaPDoS analysis provides excellent identification of the KS and C domains [23].

    The MIDDAS-M software [45] uses gene expression data to identify and assess cooperatively transcribed gene clusters.

    Thiofinder is a genome mining tool for thiopeptides [46].

    PKMiner [47] is a combination of database and program for genome mining. The database contains 42 characterized and 230 uncharacterized type II PKS clusters of Actinomycetes.

    NRPSpredictor2 [37] is a web server for predicting NRPS adenylation domain specificity. The A domains select the amino acid building blocks. According to [23], in NRP pathways, the substrate selectivity of the A domains can be predicted.

    BAGEL3 [48] is a program that allows the identification and analysis of bacteriocins as well as ribosomally synthesized and post-translationally modified peptides (RiPPs).

    NP searcher [49] predicts amino acid composition and connectivity, backbone heterocyclization, and tailoring chemistries including dimerization, heterocyclization and glycosylation [17].

    SBSPKS [50] is a tool for predicting the order of substrate addition that takes into account that not all PK and NRP biosynthetic pathways follow co-linearity.

    Databases for Genome Mining

    The analysis of sequences is time-consuming and generates a large amount of data regarding the specificities of domains and the chemical structures of the products. To incorporate the accumulated information [32], several databases have been developed, among them IMG-ABC, StreptomeDB, DoBISCUIT, ClusterMine360, Norine and Protein Data Bank (PDB).

    Integrated Microbial Genomes-Atlas of Biosynthetic Clusters (IMG-ABC)

    In 2015, Hadjithomas et al. presented IMG-ABC [30], the largest publicly available database of biosynthetic gene clusters, which permits the screening of genomic and metagenomic data for secondary metabolite clusters. IMG-ABC is a platform that integrates powerful search and analysis tools and is available at https://img.jgi.doe.gov/abc. IMG-ABC aims to expand constantly to become an essential bioinformatics tool in the search for secondary metabolites.

    StreptomeDB

    StreptomeDB, now available in version 2.0 [51] is a database that
