Insect Molecular Biology and Biochemistry
Ebook · 2,010 pages · 26 hours

About this ebook

The publication of the extensive seven-volume work Comprehensive Molecular Insect Science provided a complete reference encompassing important developments and achievements in modern insect science. One of the most swiftly moving areas in entomological and comparative research is molecular biology, and this volume, Insect Molecular Biology and Biochemistry, is designed for those who desire a comprehensive yet concise work on important aspects of this topic.

This volume contains ten fully revised or rewritten chapters from the original series as well as five completely new chapters on topics such as insect immunology, insect genomics, RNAi, and molecular biology of circadian rhythms and circadian behavior. The topics included are key to an understanding of insect development, with emphasis on the cuticle, digestive properties, and the transport of lipids; extensive and integrated chapters on cytochrome P450s; and the role of transposable elements in the developmental processes as well as programmed cell death. This volume will be of great value to senior investigators, graduate students, post-doctoral fellows and advanced undergraduate research students. It can also be used as a reference for graduate courses and seminars on the topic. Chapters will also be valuable to the applied biologist or entomologist, providing the understanding necessary for probing the more applied research areas related to insect control.

  • Topics specially selected by the editor-in-chief of the original major reference work
  • Fully revised and new contributions bring together the latest research in the rapidly moving fields of insect molecular biology and insect biochemistry, including coverage of development, physiology, immunity and proteomics
  • Full-color presentation provides readers with clear, useful illustrations that highlight important research findings
Language: English
Release date: Aug 16, 2011
ISBN: 9780123847485
    Insect Molecular Biology and Biochemistry - Lawrence I. Gilbert

    Table of Contents

    Cover image

    Front Matter

    Copyright

    Preface

    Contributors

    1. Insect Genomics

    1.1. Introduction

    1.2. Genome Sequencing

    1.3. Genome Analysis

    1.4. Proteomics

    1.5. Structural Genomics

    1.6. Metabolomics

    1.7. Systems Biology

    1.8. Conclusions and Future Prospects

    2. Insect MicroRNAs

    2.1. Introduction: The Big World of Small RNAs

    2.2. Biogenesis of miRNAs

    2.3. Mechanism of Action of miRNAs

    2.4. Identification of miRNAs in Insects

    2.5. Target Prediction

    2.6. miRNA Functions

    2.7. Conclusions and Perspectives

    3. Insect Transposable Elements

    3.1. Introduction

    3.2. Classification and Transposition Mechanisms of Eukaryotic Transposable Elements

    3.3. Methods to Uncover and Characterize Insect TEs

    3.4. Diversity and Characteristics of Insect TEs

    3.5. Search for Active TEs in Insect Genomes

    3.6. Evolution of Insect TEs

    3.7. TEs in Insect Populations

    3.8. Impact of TEs in Insects

    3.9. Applications of Insect TEs

    3.10. Summary

    4. Transposable Elements for Insect Transformation

    4.1. Introduction

    4.2. P Element Transformation

    4.3. Excision and Transposition Assays for Vector Mobility

    4.4. Transformation Marker Systems

    4.5. Transposon Vectors

    4.6. Transformation Methodology

    4.7. Summary

    5. Cuticular Proteins

    5.1. Introduction

    5.2. Cuticle Structure and Synthesis

    5.3. Classes of Proteins Found in Cuticles

    5.4. Genomic Information

    5.5. Interactions of Cuticular Proteins with Components of Cuticle

    5.6. Summary and Future Challenges

    6. Cuticular Sclerotization and Tanning

    6.1. Introduction

    6.2. A Model for Cuticular Sclerotization

    6.3. Sclerotization (Tanning) Precursors

    6.4. Transport of Sclerotization Precursors to the Cuticle

    6.5. Cuticular Enzymes and Sclerotization

    6.6. Control of Sclerotization

    6.7. Cuticular Darkening

    6.8. Cuticular Sclerotization in Insects Compared to That in Other Arthropods

    6.9. Unsolved Problems

    7. Chitin Metabolism in Insects

    7.1. Introduction

    7.3. Chitin Synthesis

    7.4. Chitin Degradation and Modification

    7.5. Chitin-Binding Proteins

    7.6. Chitin-Organizing Proteins

    7.7. Hormonal Regulation of Chitin Metabolism

    7.8. Chitin Metabolism and Insect Control

    7.9. Future Studies and Concluding Remarks

    8. Insect CYP Genes and P450 Enzymes

    8.1. Introduction

    8.2. Diversity and Evolution of Insect CYP Genes

    8.3. P450 Enzymes

    8.4. P450 Functions

    8.5. Regulation of P450 Gene Expression

    8.6. Working with Insect P450 Enzymes

    8.7. Conclusion and Prospects

    9. Lipid Transport

    9.1. Historical Perspective

    9.2. Flight-Related Processes

    9.3. Apolipophorin III

    9.4. Lipophorin Receptor Interactions

    9.5. Other Lipid-Binding Proteins

    10. Insect Proteases

    10.1. Introduction and History

    10.2. Proteases in Eggs and Embryos

    10.3. Hemolymph Plasma Proteases

    10.4. Cellular Proteases

    10.5. Conclusions and Future Prospects

    11. Biochemistry and Molecular Biology of Digestion

    11.1. Introduction

    11.2. Overview of the Digestive Process

    11.3. Midgut Conditions Affecting Enzyme Activity

    11.4. Digestion of Carbohydrates

    11.5. Digestion of Proteins

    11.6. Digestion of Lipids and Phosphates

    11.7. Microvillar Membranes

    11.8. The Peritrophic Membrane

    11.9. Organization of the Digestive Process

    11.9.2.9. Lepidoptera

    11.10. Digestive Enzyme Secretion Mechanisms

    11.11. Concluding Remarks

    12. Programmed Cell Death in Insects

    12.1. Introduction

    12.2. PCD, Apoptosis, Autophagy, or Necrosis?

    12.3. Historical Overview and Current Trends

    12.4. The Manduca Model

    12.5. The Drosophila Model

    12.6. Insights from Other Tissues

    12.7. Summary and Conclusions

    13. Regulation of Insect Development by TGF-β Signaling

    13.1. Overview and Components

    13.2. Dpp, the BMP Pathway, and Gradients

    13.3. Other Developmental Contexts and Regulation of BMPs

    13.4. Activins and Non-Canonical TGF-β Signaling

    13.5. Evolution of TGF-β Signaling in Insects

    14. Insect Immunology

    14.1. Introduction

    14.2. Insect Immunology Background

    14.3. PAMP-Recognition Proteins in Insect Immunology

    14.4. Humoral Innate Immune Responses

    14.5. Cellular Innate Immune Responses

    14.6. Newly Emerging Topics in Insect Immunology

    14.7. Conclusion

    15. Molecular and Neural Control of Insect Circadian Rhythms

    15.1. Introduction

    15.2. The Drosophila Circadian Pacemaker

    15.3. Input Pathways to the Drosophila Circadian Pacemaker

    15.4. Neural Control of Drosophila Circadian Behavior

    15.4.1. Anatomy of the Drosophila Circadian Neural Circuit

    15.5. Control of Circadian Rhythms in Non-Drosophilid Insects

    15.6. Conclusions

    Index

    Front Matter

    Insect Molecular Biology and Biochemistry

    Edited by

    LAWRENCE I. GILBERT

    Department of Biology, University of North Carolina, Chapel Hill, NC

    Copyright

    Academic Press is an imprint of Elsevier

    32 Jamestown Road, London NW1 7BY, UK

    225 Wyman Street, Waltham, MA 02451, USA

    525 B Street, Suite 1800, San Diego, CA 92101-4495, USA

    First edition 2012

    Copyright © 2012 Elsevier B.V. All Rights Reserved

    No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher

    Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively, visit the Science and Technology Books website at www.elsevierdirect.com/rights for further information

    Notice

    No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.

    Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    ISBN: 978-0-12-384747-8

    For information on all Academic Press publications visit our website at elsevierdirect.com

    Typeset by TNQ Books and Journals Pvt Ltd. www.tnq.co.in

    Printed and bound in China

    Preface

    In 2005 the seven-volume series Comprehensive Molecular Insect Science appeared and summarized the research in many fields of insect research, including one volume on Biochemistry and Molecular Biology. That volume covered many, but not all, fields, and the newest references were from 2004, with many chapters having 2003 references as the latest in a particular field. The series did very well and chapters were cited quite frequently, although, because of the price and the inability to purchase single volumes, the set was purchased mainly by libraries. In 2010 I was approached by Academic Press to think about bringing two major fields up to date with volumes that could be purchased singly, and would therefore be available to faculty members, scientists in industry and government, postdoctoral researchers, and interested graduate students. I chose Insect Molecular Biology and Biochemistry for one volume because of the remarkable advances that have been made in those fields in the past half dozen years.

    With the help of outside advisors in these fields, we decided to revise 10 chapters from the series and select five more chapters to bring the volume in line with recent advances. Of these five new chapters, two, by Subba Palli and by Xavier Belles and colleagues, are concerned with techniques and very special molecular mechanisms that influence greatly the ability of the insect to control its development and homeostasis. Another chapter, by Park and Lee, summarizes in a sophisticated but very readable way the immunology of insects, a field that has exploded in the past six years and which was noticeably absent from the Comprehensive series. The other two new chapters are by Yong Zhang and Pat Emery, who deal with circadian rhythms and behavior at the molecular genetic level, and by Philip Jensen, who reviews the role of TGF-β in insect development, again mainly at the molecular genetic level. In most cases the main protagonist is Drosophila melanogaster, but where information is available representative insects from other orders are discussed in depth. The 10 updated chapters have been revised with care, and in several cases completely rewritten. The authors are leaders in their research fields, and have worked hard to contribute chapters that they are proud of.

    I was mildly surprised that, almost without exception, authors whom I invited to contribute to this volume accepted the invitation, and I am as proud of this volume as any of the other 26 volumes I have edited in the past half-century. This volume is splendid, and will be of great help to senior and beginning researchers in the fields covered.

    Lawrence I. Gilbert

    Department of Biology, University of North Carolina, Chapel Hill

    Contributors

    Svend O. Andersen

    The Collstrop Foundation, The Royal Danish Academy of Sciences and Letters, Copenhagen, Denmark

    Yasuyuki Arakane

    Division of Plant Biotechnology, Chonnam National University, Gwangju, South Korea

    Hua Bai

    Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, USA

    Xavier Belles

    Instituto de Biología Evolutiva (CSIC-UPF), Barcelona, Spain

    Rollie J. Clem

    Division of Biology, Kansas State University, Manhattan, KS, USA

    Alexandre S. Cristino

    Queensland Brain Institute, The University of Queensland, Brisbane St Lucia, Queensland, Australia

    Patrick Emery

    University of Massachusetts Medical School, Department of Neurobiology, Worcester, MA, USA

    Susan E. Fahrbach

    Department of Biology, Wake Forest University, Winston-Salem, NC, USA

    Clélia Ferreira

    University of São Paulo, São Paulo, Brazil

    René Feyereisen

    INRA Sophia Antipolis, France

    Stavros J. Hamodrakas

    Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens, Greece

    Alfred M. Handler

    USDA, ARS, Center for Medical, Agricultural, and Veterinary Entomology, Gainesville, FL, USA

    Vassiliki A. Iconomidou

    Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens, Greece

    Philip A. Jensen

    Department of Biology, Rocky Mountain College, Billings, MT, USA

    Michael R. Kanost

    Department of Biochemistry, Kansas State University, Manhattan, KS, USA

    Karl J. Kramer

    Department of Biochemistry, Kansas State University, and USDA-ARS, Manhattan, KS, USA

    Bok Luel Lee

    Pusan National University, Busan, Korea

    Hans Merzendorfer

    University of Osnabrueck, Osnabrueck, Germany

    Subbaratnam Muthukrishnan

    Department of Biochemistry, Kansas State University, Manhattan, KS, USA

    John R. Nambu

    Department of Biological Sciences, Charles E. Schmidt College of Science, Florida Atlantic University, Boca Raton, FL, USA

    David A. O’Brochta

    University of Maryland, Department of Entomology and The Institute for Bioscience and Biotechnology Research, College Park, MD, USA

    Subba R. Palli

    Department of Entomology, University of Kentucky, Lexington, KY, USA

    Nikos C. Papandreou

    Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens, Greece

    Ji Won Park

    Pusan National University, Busan, Korea

    Maria-Dolors Piulachs

    Instituto de Biología Evolutiva (CSIC-UPF), Barcelona, Spain

    Mercedes Rubio

    Instituto de Biología Evolutiva (CSIC-UPF), Barcelona, Spain

    Robert O. Ryan

    Children’s Hospital Oakland Research Institute, Oakland, CA, USA

    Lawrence M. Schwartz

    Department of Biology, 221 Morrill Science Center, University of Massachusetts, Amherst, MA, USA

    Erica D. Tanaka

    Instituto de Biología Evolutiva (CSIC-UPF), Barcelona, Spain

    Walter R. Terra

    University of São Paulo, São Paulo, Brazil

    Zhijian Tu

    Department of Biochemistry, Virginia Tech, Blacksburg, VA, USA

    Dick J. Van der Horst

    Utrecht University, Utrecht, The Netherlands

    John Wigginton

    Department of Entomology, University of Kentucky, Lexington, KY, USA

    Judith H. Willis

    Department of Cellular Biology, University of Georgia, Athens, GA, USA

    Yong Zhang

    University of Massachusetts Medical School, Department of Neurobiology, Worcester, MA, USA

    1. Insect Genomics

    Subba R. Palli

    Department of Entomology, University of Kentucky, Lexington, KY, USA

    Hua Bai

    Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, USA

    John Wigginton

    Department of Entomology, University of Kentucky, Lexington, KY, USA

    1.1. Introduction

    1.2. Genome Sequencing

    1.2.1. Genome Assembly

    1.2.2. Homology Detection

    1.2.3. Gene Ontology Annotation

    1.2.4. Conserved Domains and Localization Signal Recognition

    1.2.5. Fisher's Exact Test

    1.2.6. Sequenced Genomes

    1.3. Genome Analysis

    1.3.1. Forward and Reverse Genetics

    1.3.2. DNA Microarray

    1.3.3. Next Generation Sequencing (NGS)

    1.3.4. Other Methods

    1.4. Proteomics

    1.4.1. Sample Protein Labeling and Separation

    1.4.2. Enrichment for PTM

    1.4.3. Applications of Proteomics

    1.5. Structural Genomics

    1.5.1. Analysis of Protein–Ligand Interactions

    1.5.2. Cytochrome C: A Case Study

    1.5.3. Selecting a Template Structure

    1.5.4. Target–Template Sequence Alignment

    1.5.5. Modeling Suite Choice

    1.5.6. Critical Assessment of Protein Structure

    1.5.7. Structural Determination

    1.6. Metabolomics

    1.7. Systems Biology

    1.8. Conclusions and Future Prospects

    Summary

    Genomic sequencing has become a routinely used molecular biology tool in many insect science laboratories. In fact, whole-genome sequences for 22 insects have already been completed, and sequencing of genomes of many more insects is in progress. This information explosion on gene sequences has led to the development of bioinformatics and several omics disciplines, including proteomics, transcriptomics, metabolomics, and structural genomics. Considerable progress has already been made by utilizing these technologies to address long-standing problems in many areas of molecular entomology. Attempts at integrating these independent approaches into a comprehensive systems biology view or model are just beginning. In this chapter, we provide a brief overview of insect whole-genome sequencing as well as information on 22 insect genomes and recent developments in the fields of insect proteomics, transcriptomics, and structural genomics.

    1.1. Introduction

    Research on insects, especially in the areas of physiology, biochemistry, and molecular biology, has undergone notable transformations during the past two decades. Completion of the sequencing of the first insect genome, the fruit fly Drosophila melanogaster, in 2000 was followed by a flurry of activities aimed at sequencing the genomes of several additional insect species. Indeed, genome sequencing has become a routinely used method in molecular biology laboratories. Initial expectations of genome sequencing were that much could be learned by simply looking at the genetic code. In practice, insects are too complex for a complete understanding based on nucleotide sequences alone, and this has led to the realization that insect genome sequences must be complemented with information on mRNA expression as well as the proteins they encode. This, in turn, has driven the development of a variety of omics technologies, including functional genomics, transcriptomics, proteomics, metabolomics, and others. The vast amount of data generated by these technologies has led to rapid growth in bioinformatics, a field that focuses on the interpretation of biological data. Developments in the World Wide Web have allowed the distribution of these omics data, along with analysis tools, to people all over the world. Integrating these data into a holistic view of all the simultaneous processes occurring within an organism allows complex hypotheses to be developed. Instead of breaking down interactions into smaller, more easily understandable units, scientists are moving towards creating models which encompass the totality of an organism’s molecular, physical, and chemical phenomena. This movement, known as systems biology, focuses on the integration and analysis of all the available data about an entire biological system, and it aims to paint an authentic and comprehensive portrait of biology.

    During the past two decades, research on insects has produced large volumes of information on the genome sequences of several model insects. Genome sequencing allows quantification of mRNAs and proteins, as well as predictions on protein structure and function. Attempts to integrate these data into systems biology models are just beginning. While it is difficult to cover all the developments in these disciplines, we will try to summarize the latest developments in these existing fields. In the first section of this chapter, insect genome sequencing and the lessons learned from this will be presented. In the next section, analysis of sequenced genomes using omics and high-throughput sequencing technologies will be summarized. In the third part of this chapter, an overview of proteomics and structural genomics will be given. A brief overview of insect systems biology approaches will be presented at the end of this chapter.

    1.2. Genome Sequencing

    Almost all insect genomes sequenced to date employed the whole-genome shotgun sequencing (WGS) method (Figure 1). Shotgun genome sequencing begins with isolation of high molecular weight genomic DNA from nuclei isolated from isogenic lines of insects. The genomic DNA is then randomly sheared, end-polished with Bal31 nuclease/T4 DNA polymerase primers and, finally, size-selected. The size-selected, sheared DNA is then ligated to restriction enzyme adaptors such as the BstXI adaptors. The genomic fragments are then inserted into restriction enzyme-linearized plasmid vectors. The plasmid DNA is purified (generally by the alkaline lysis plasmid purification method), isolated, sequenced, and assembled using bioinformatics tools. Automated Sanger sequencing technology has been the main sequencing method used during the past two decades. Most genomes sequenced to date employed this technology. Sanger sequencing must be distinguished from next generation sequencing (NGS) technology, which has entered the marketplace during the past four years and is rapidly changing the approaches used to sequence genomes. Genomes sequenced by NGS technologies will be completed more quickly and at a lower price than the first few insect genomes.

    1.2.1. Genome Assembly

    Genomes and transcriptomes are assembled from shorter reads that vary in size, depending on the sequencing technology used. Contigs are created from these short reads by comparing all reads against each other. If sequence identity and overlap length pass a certain threshold value, they are lumped together into a contig by a program called an assembler. Many assembly programs are available, which differ mainly in the details of their implementation and of the algorithms employed. The most commonly used assembler programs are: The Institute for Genomic Research (TIGR) Assembler; the Phrap assembly program developed at the University of Washington; the Celera Assembler; Arachne, the Broad Institute of MIT assembler; Phusion, an assembly program developed by the Sanger Center; and Atlas, an assembly program developed at the Baylor College of Medicine.
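
    The overlap-and-merge idea described above can be sketched in a few lines. This is a minimal greedy illustration, not the algorithm of any assembler named in the text: the function names, the default `min_overlap` threshold, and the requirement of 100% identity in the overlap are all simplifying assumptions, and reads contained entirely within other reads are not handled.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when the best exact overlap is shorter than `min_overlap`.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if size > 0 and left[-size:] == right[:size]:
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads into contigs by repeatedly joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        # Compare all contigs against each other, in both orders.
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining pair passes the threshold
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads: three overlap into one contig, one stays on its own.
example_contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA", "GGGGGGG"])
```

    The all-against-all comparison makes this sketch quadratic in the number of reads per merge, which is one reason production assemblers differ so much in their implementation details.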

    The contigs produced by an assembly program are then ordered and oriented along a chromosome using a variety of additional information. The sizes of the fragments generated by the shotgun process are carefully controlled to establish a link between the sequence-reads generated from the ends of the same fragment. In WGS projects, multiple libraries with varying insert sizes are normally sequenced. Additional markers such as ESTs are also used during the assembly of genome sequences. The ultimate goal of any sequencing project is to determine the sequence of every chromosome in a genome at single base-pair resolution. Most often gaps occur within the genome after assembly is completed. These gaps are filled in through directed sequencing experiments using DNA from a variety of sources, including clones isolated from libraries, direct PCR amplification, and other methods.

    1.2.2. Homology Detection

    After assembly, sequences representing the genome or transcriptome are analyzed for functional interpretation by comparing them with known homologous sequences. Proteins typically carry out the cellular functions encoded in the genome. Protein coding sequences, in the form of open reading frames (ORFs), must first be distinguished from other sequences or those that encode other types of RNA. Transcriptome analysis is simplified by the fact that the sequenced mRNAs have already been processed for intron removal in the cell. Distinguishing the correct ORF where translation occurs, from 5′ and 3′ untranslated regions, is easily accomplished by a blast search against a protein database, or possibly by selecting the longest ORF. Finding genes in eukaryotic genomes is more complex, and presents a unique set of challenges.
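
    The "longest ORF" shortcut for an assembled transcript can be illustrated as follows. This is a sketch under stated assumptions: the transcript is taken to be in the sense orientation (so only the three forward frames are scanned), an ORF is defined as ATG through the first in-frame stop codon, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end, sequence) of the longest ATG-to-stop ORF.

    Only the three forward frames are scanned. Coordinates are 0-based and
    end-exclusive, and the stop codon is included in the returned sequence.
    Returns (0, 0, "") when no complete ORF is found.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if pos + 3 - start > len(best[2]):
                    best = (start, pos + 3, seq[start:pos + 3])
                start = None  # look for the next ORF in this frame
    return best
```

    Everything upstream of `start` and downstream of `end` is then treated as 5′ and 3′ untranslated sequence. A blast search against a protein database remains the more reliable check, since the longest ORF is not guaranteed to be the translated one.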

    1.2.2.1. Genomic ORF detection

    Detection of ORFs is more complex in eukaryotes than prokaryotes due to the presence of alternative splicing, poorly understood promoter sequences, and the under-representation of protein coding segments compared to the whole genome. If transcriptome data are available, a number of programs exist to map these sequences back to an organism’s genome (Langmead et al., 2009; Clement et al., 2010). This strategy is especially useful when analyzing non-model organisms, or those projects that lack the manpower of worldwide genome sequencing consortiums. In this manner a large number of transcripts can potentially be identified, along with their regulatory and promoter sequences, and information on gene synteny.

    De novo gene prediction algorithms often use Hidden Markov Models or other statistical methods to recognize ORFs that are significantly longer than might be expected by chance. These algorithms also search for sequences containing start and stop codons, polyA tails, promoter sequences, and other characteristics indicative of protein coding segments (Burge and Karlin, 1997). De novo gene discovery is partially dependent on the organism used, since compositional differences such as GC content and codon frequency introduce bias, which must be considered for each organism. Artificial intelligence algorithms can be trained to recognize these differences when a sufficient number of protein coding sequences are available. These may originate from transcriptome sequencing, or more traditional approaches such as PCR amplification and Sanger sequencing of mRNAs. Based on a small sample of known genes, artificial intelligence programs can learn the codon bias and splice sites, for example, and extrapolate these findings to the rest of the genome. However, this process is often inaccurate (Korf, 2004).
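
    Two of the quantities mentioned above, compositional bias and ORF length expected by chance, are simple to compute. The sketch below is illustrative only: the function names are hypothetical, and the chance model assumes equal base frequencies (so 61 of the 64 codons are non-stop), which real genomes with skewed GC content do not satisfy.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def codon_frequencies(coding_sequences):
    """Relative frequency of each codon across in-frame coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def chance_of_open_frame(n_codons):
    """Probability that n random codons contain no stop codon.

    Assumes equal base frequencies, so each codon is a stop with
    probability 3/64.
    """
    return (61 / 64) ** n_codons
```

    Under this simple model a stop-free run of 100 codons arises by chance less than 1% of the time, which is why unusually long ORFs are treated as candidate genes. Codon frequencies estimated from a training set of known genes give a predictor the organism-specific bias described in the text.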

    Comparative genomics is the process of comparing newly sequenced genomes to more well-curated reference genomes. Two highly related species will likely have well conserved protein coding sequences with similar order along a chromosome. The contigs or scaffolds from a newly assembled genome can be mapped to the reference, or the shorter reads can be mapped and assembled in a hybrid approach. Programs that perform this task may often be used to map transcriptome data to a genome, since the two approaches are mechanistically similar.

    1.2.2.2. Transcriptome gene annotation

    By definition, mRNA represents protein coding sequences, and finding the correct ORF requires only a blast search. However, ribosomal RNA (rRNA) may represent more than 99% of cellular RNA content. The presence of rRNA may be detrimental to the assembly process because stretches of mRNA may overlap, and thus cause erroneously assembled RNA amalgams. Strategies to reduce the amount of sequenced rRNA include mRNA purification and rRNA removal. Oligo(dT)-based strategies, such as the Promega PolyATract mRNA isolation kit, use oligo(dT) sequences that bind to the poly(A) tail of mRNA. The poly(T) tract is linked to a purification tag, such as biotin, which binds to streptavidin-coated magnetic beads. The beads can be captured, allowing the non-polyadenylated RNA to be washed away. The Invitrogen Ribominus kit uses a similar principle, except that oligo sequences complementary to conserved portions of rRNA allow it to be subtracted from total RNA.

    During RNA amplification, oligo(dT) primers may be used to increase the proportion of mRNA to total RNA. This process may introduce bias near the 3′ side of mRNA, and thus protocols have been developed to normalize the representation of 5′, 3′, and middle segments of mRNA (Meyer et al., 2009). If the rRNA sequence has already been determined, many assembly programs can be supplied with a filter file of rRNA and other detrimental contaminant sequences, such as common vectors, which will be excluded from the assembly process.
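
    The effect of such a filter file can be approximated with a simple k-mer screen. This is a stand-in sketch, not the mechanism of any particular assembler: the function names, the k-mer size, and the `max_shared` cutoff are all assumptions chosen for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_filter(contaminants, k=8):
    """Index the k-mers of known contaminants (rRNA, vectors) on both strands."""
    index = set()
    for seq in contaminants:
        seq = seq.upper()
        index |= kmers(seq, k) | kmers(reverse_complement(seq), k)
    return index


def screen_reads(reads, contaminant_kmers, k=8, max_shared=0.5):
    """Keep reads whose fraction of contaminant k-mers is at most max_shared."""
    kept = []
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        if read_kmers:
            shared = len(read_kmers & contaminant_kmers) / len(read_kmers)
        else:
            shared = 0.0  # read shorter than k: nothing to compare
        if shared <= max_shared:
            kept.append(read)
    return kept
```

    Reads that pass the screen would then be handed to the assembler. The same `k` must be used when building the index and when screening.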

    1.2.2.3. Homology detection

    Annotation is the step of linking sequences with their functional relevance. Since protein homology is the best predictor of function, the NCBI blastx algorithm (Altschul et al., 1990) is a good place to start in predicting homology and thus function. The blastx algorithm translates sequences in all six possible reading frames and compares them against a database of protein sequences.
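
    The six-frame translation step that blastx performs before the protein comparison can be sketched as below. This illustrates the idea only and is not NCBI's implementation; it assumes the standard genetic code, and the frame labels (`+1` to `-3`) and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate a DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames
```

    Each of the six protein strings is then compared against the protein database, and the frame that produces the best hit indicates the likely reading frame of the query.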

    For less technically inclined users, the blastx algorithm may be most easily implemented in Windows-based programs such as Blast2GO (Conesa et al., 2005; Conesa and Gotz, 2008; http://www.blast2go.org/). Blast2GO offers a comprehensive suite of tools for blasting and advanced functional annotation. However, relying on the NCBI server to perform blast steps often introduces a substantial bottleneck between the server and querying computer. Local blast searches, performed by the end user’s computer(s), may significantly reduce annotation time. The blast program suite and associated databases may be downloaded for local blast searches (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/). The NCBI non-redundant (NR) protein database is quite large and time consuming to search. Meyer et al. (2009) advocate a local approach where sequences are first queried against the smaller, better curated Swiss-Prot database, and then sequences with no match are blasted against the NR protein database. Faster algorithms such as AB-Blast (previously known as WU-Blast) may also speed up the blasting process. After a blastx search, sequences may be compared to other nucleotide sequences (blastn), or translated and compared to translated sequences, to help identify unigenes, or unique sequences. However, blastx is the first choice, since the amino acid sequence is more conserved than the nucleotide sequence. This step will also yield the correct open reading frame of a sequence. In some cases, homologous relationships may be discovered using blastn and tblastn where blastx did not. The expectation value (also called an e value), which is the number of hits of equal or better score expected to occur by chance in a database of the same size, is an important consideration in blasting: setting the e value threshold too high (too permissive) may create false relationships, while setting it too low (too stringent) may exclude real ones. As sequence length increases, the probability of finding significant blast hits also increases. In practice, blasting with a permissive e value threshold and a small minimum overlap length initially, and then filtering the results based on the distribution of hits obtained, may be beneficial.
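
    The filter-after-search step, together with the tiered search that passes unmatched queries on to a larger database, can be sketched as follows. The code assumes the default 12-column tabular output of BLAST+ (`-outfmt 6`); the thresholds, function names, and sample rows are hypothetical and chosen only for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5, min_length=30):
    """Keep the highest-bitscore hit per query that passes both thresholds."""
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS,
                            delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue or int(row["length"]) < min_length:
            continue  # hit is too weak or the alignment is too short
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


def unmatched(query_ids, hits):
    """Queries with no accepted hit, e.g. to re-query against a larger database."""
    return [q for q in query_ids if q not in hits]


# Made-up rows standing in for the output of a permissive first-pass search.
SAMPLE = "\n".join("\t".join(r) for r in [
    ("contig1", "P1", "85.0", "120", "18", "0", "1", "360", "5", "124", "1e-60", "210"),
    ("contig1", "P2", "40.0", "50", "30", "0", "1", "150", "9", "58", "2e-8", "55.1"),
    ("contig2", "P3", "30.0", "25", "17", "1", "10", "84", "3", "27", "0.5", "28.0"),
])
```

    With these sample rows, `best_hits(SAMPLE)` keeps only the P1 hit for contig1, and `unmatched(["contig1", "contig2", "contig3"], best_hits(SAMPLE))` lists contig2 and contig3 as candidates for the second-tier search. For blastx output the `length` column counts aligned amino acids, so a minimum length should be chosen with that in mind.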

    1.2.3. Gene Ontology Annotation

    Gene Ontology (GO) provides a structured and controlled vocabulary to describe cellular phenomena in terms of biological processes, molecular function, and subcellular localization. These terms do not directly describe the gene or protein; rather, they describe phenomena, and if there is sufficient evidence that the product of a gene, a protein, is involved in a given phenomenon, then the probability increases that a homologous protein is also involved (Ashburner et al., 2000).

    For example, GO analysis of the molecular functions of the Drosophila melanogaster protein Tango indicates that it is a transcription factor which heterodimerizes with other proteins, binds to specific DNA elements, and recruits RNA polymerase. The evidence codes show what types of experiments or analyses were performed to determine the function. GO evidence can be inferred experimentally from assays, mutant phenotypes, genetic interactions, or expression patterns, as well as computationally from sequence, sequence model, and sequence or structural similarity. The biological process information shows that Tango is involved in brain, organ, muscle, and neuron development. The cellular component information indicates that Tango’s subcellular localization is primarily nuclear. Gene Ontology annotation programs often allow the user to set evidence code weights manually. For example, evidence inferred from direct experiments may provide more confidence than evidence inferred from computational analysis which has been manually curated. Uncurated computational evidence may carry the lowest confidence level. Tango and its human ortholog, the Aryl Hydrocarbon Receptor Nuclear Translocator (ARNT), are both well-studied proteins. However, when using the Tribolium castaneum sequence as a query, for example, a good GO mapping algorithm must decide how to report the more relevant information on Tango without losing pertinent information about the better studied ARNT.
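
    One way to picture evidence code weighting is the sketch below. It is an illustration under stated assumptions rather than the scoring rule of any particular annotation program: the weights, the similarity values, and the candidate annotations are all made up for the example, and only the evidence code abbreviations (IDA, IPI, ISS, IEA, and so on) are standard GO codes. Each candidate term inherited from a blast hit is scored as similarity multiplied by the weight of its evidence code, and the best-scoring source is kept for each term.

```python
# Sketch: rank GO terms transferred from blast hits, weighting by evidence code.
# All weights and example values are hypothetical.

EVIDENCE_WEIGHT = {
    # inferred from experiment
    "IDA": 1.0, "IMP": 1.0, "IGI": 1.0, "IPI": 1.0, "IEP": 1.0,
    # curated computational analysis
    "ISS": 0.8, "ISO": 0.8, "ISA": 0.8, "ISM": 0.8,
    # uncurated electronic annotation
    "IEA": 0.6,
}
DEFAULT_WEIGHT = 0.5  # any code not listed above


def score(similarity, evidence_code):
    """Similarity to the hit (0-1) scaled by confidence in the hit's annotation."""
    return similarity * EVIDENCE_WEIGHT.get(evidence_code, DEFAULT_WEIGHT)


def rank_terms(candidates):
    """candidates: (term, source protein, similarity, evidence code) tuples.

    Returns (term, (score, source, code)) pairs, best score per term, high to low.
    """
    best = {}
    for term, source, similarity, code in candidates:
        s = score(similarity, code)
        if term not in best or s > best[term][0]:
            best[term] = (s, source, code)
    return sorted(best.items(), key=lambda item: item[1][0], reverse=True)


# Hypothetical annotations for a T. castaneum query with hits to Tango and ARNT
candidates = [
    ("nucleus", "Tango", 0.70, "IDA"),
    ("nucleus", "ARNT", 0.55, "IDA"),
    ("protein heterodimerization activity", "ARNT", 0.55, "IPI"),
    ("transcription factor activity", "Tango", 0.70, "IEA"),
]

ranked = rank_terms(candidates)
for term, (s, source, code) in ranked:
    print(f"{term}: {s:.2f} from {source} ({code})")
```

    In this toy case the closer hit supplies the shared term, while a term known only from the more distant but better studied protein is still reported, with a lower score.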

    Gene ontology mapping works well when a well-studied homologous protein is available and the blast e value is low enough to provide statistical confidence in the evolutionary relatedness and conservation of function between the two proteins. In our example, the user now has a wealth of information about T. castaneum Tango function, and can design primers for qRT-PCR, RNAi, or protein expression, or link function to the mRNAs which may have changed between two treatment groups in a transcriptome expression survey such as microarray analysis.

    Enzyme codes are a numerical classification for reactions that are catalyzed by enzymes, given by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) in consultation with the IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN). Enzyme codes can be inferred from GO relationships.

    The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database of enzymatic, biochemical, and signaling pathways that also maps a variety of other data. KEGG is an integrated database resource consisting of systems, genomic, and chemical information (Kanehisa and Goto, 2000 and Kanehisa et al., 2006). The KEGG pathway database consists of hand-drawn maps for cell signaling and communication, ligand receptor interactions, and metabolic pathways gathered from the literature. Figure 2 shows the pathway for D. melanogaster hormone biosynthesis annotated in KEGG. The information in this database can help in the interpretation of data from genome analyses employing omics methods.

    1.2.4. Conserved Domains and Localization Signal Recognition

    Conserved domains often act as modular functional units and can be useful in predicting a protein’s function. Domain detection algorithms do not require a close homolog to predict function, but often use multiple sequence alignments and Hidden Markov Models based on a number of homologous proteins that share common domains. Examples include SMART (Schultz et al., 1998), Pfam (Finn et al., 2010), and the NCBI Conserved Domain Database (CDD) (Marchler-Bauer et al., 2002). Some databases, such as SCOP (Lo Conte et al., 2002), CATH (Martin et al., 1998), and DALI (Holm and Rosenstrom, 2010), focus on structural relationships and evolution. These databases group and classify protein folds based on their structural and evolutionary relatedness. Domain recognition programs have strengths and weaknesses depending on their focus, algorithm implementation, and the database used. InterProScan (Zdobnov and Apweiler, 2001) is a direct or indirect gateway to the majority of these programs and the information they can reveal. InterProScan may be accessed on the web, or through the Blast2GO program suite. Other programs accessed via InterProScan allow the identification of localization signals (e.g., nuclear localization signals), transmembrane spanning domains, sites for post-translational modifications, sequence repeats, intrinsically disordered regions, and many more.

    1.2.5. Fisher’s Exact Test

    Perturbations in the expression levels between two treatment groups of gene products involved in GO phenomena or KEGG signaling, or which belong to domain/protein families, can indicate the physiologic effects of the treatment and the mechanisms that are ultimately responsible for changes in phenotypes. mRNA expression changes must be tested for statistical significance to ensure that changes between treatments are not the result of sampling a variable population. Fisher’s Exact Test calculates a p-value which corresponds to the probability that functional groups are over-represented by chance. A low p-value might indicate that the over-represented functional groups share some regulatory mechanism which was perturbed by treatment.
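
    As a worked illustration of the test, the one-sided p-value for over-representation can be computed directly from the hypergeometric distribution. The sketch below uses only the Python standard library; the function name and the gene counts are hypothetical, chosen so that the arithmetic can be checked by hand.

```python
from math import comb


def fisher_exact_enrichment(k, n, K, N):
    """One-sided Fisher's Exact Test for over-representation.

    N: genes in the background set (e.g. all genes on the array)
    K: background genes annotated with the functional group
    n: genes in the test set (e.g. differentially expressed genes)
    k: test-set genes annotated with the functional group
    Returns the probability of seeing k or more annotated genes by chance.
    """
    total = comb(N, n)                      # ways to draw the test set
    upper = min(n, K)                       # largest possible overlap
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical counts: 10 genes in total, 4 in the group,
# 3 differentially expressed, all 3 of them in the group.
p = fisher_exact_enrichment(k=3, n=3, K=4, N=10)
print(round(p, 4))  # 4/120
```

    With these counts only 4 of the 120 possible three-gene draws consist entirely of group members, so p = 4/120, or about 0.033. When many functional groups are tested, the resulting p-values are themselves subject to multiple test correction.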

    1.2.6. Sequenced Genomes

    Table 1 lists some sequenced genomes.

    Fruit fly, Drosophila melanogaster. The D. melanogaster sequencing project used several types of sequencing strategies, including sequencing of individual clones, and sequencing of genomic libraries with three insert sizes (Adams et al., 2000). A portion of the D. melanogaster genome corresponding to approximately 120 megabases of euchromatin was assembled. This assembled genomic sequence contained 13,600 predicted genes. Some of the proteins coded by these predicted genes showed high similarity with vertebrate homologs involved in processes such as replication, chromosome segregation, and iron metabolism. About 700 transcription factors have been identified based on their sequence similarity with those reported from other organisms. Half of these transcription factors are zinc-finger proteins, and 100 of them contain homeodomains. Genome sequencing identified 22 additional homeodomain-containing proteins and 4 additional nuclear receptors. Nuclear receptors are sequence-specific ligand-dependent transcription factors that function as both transcriptional activators and repressors, and which regulate many physiological and metabolic processes. The D. melanogaster genome encodes 20 nuclear receptor proteins. General translation factors identified in other sequenced genomes are also present in the D. melanogaster genome. Interestingly, the D. melanogaster genome contained six genes encoding proteins highly similar to the messenger RNA (mRNA) cap-binding protein, eIF4E, suggesting that there may be an added level of complexity to regulation of cap-dependent translation in the fruit fly. The cytochrome P450 monooxygenases (P450s) are a large superfamily of proteins that are involved in synthesis or degradation of hormones and pheromones, as well as the metabolism of natural and synthetic toxins and insecticides (Feyereisen, 2006; see also Chapter 8 in this volume). Eighty-six genes coding for P450 enzymes and four P450 pseudogenes were identified in the D. melanogaster genome. About 20% of the proteins encoded by the D. melanogaster genome are likely targeted to the cellular membranes, since they contain four or more hydrophobic helices. The largest families of membrane proteins are sugar permeases, mitochondrial carrier proteins, and the ATP-binding cassette (ABC) transporters, coded by 97, 38, and 48 genes respectively. Among the proteins involved in biosynthetic networks, 31 triacylglycerol lipases (which are involved in lipolysis and in energy storage and redistribution) and 32 uridine diphosphate (UDP) glycosyl transferases (which participate in the production of sterol glycosides and in the biodegradation of hydrophobic compounds) are encoded by the D. melanogaster genome. One additional ferritin gene and two additional transferrin genes have been identified by genome sequencing.

    In 2005, Richards and colleagues published the genome of a second Drosophila species, Drosophila pseudoobscura (Richards et al., 2005). In 2007 the Drosophila Genome Consortium completed the sequencing of 10 additional Drosophila genomes: D. sechellia; D. simulans; D. yakuba; D. erecta; D. ananassae; D. persimilis; D. willistoni; D. mojavensis; D. virilis; and D. grimshawi (Drosophila 12 Genome Consortium, 2007). Comparative analysis of sequences from these 10 genomes and the 2 genomes published earlier (D. melanogaster and D. pseudoobscura) identified many changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. Many characteristics of the genomes, such as the overall size, the total number of genes, the distribution of transposable element classes, and the patterns of codon usage, are well conserved among these 12 genomes. Interestingly, a number of genes coding for proteins involved in environmental interactions and reproduction showed rapid change. In these 12 genomes, microRNA genes are more conserved than the protein-coding genes (see Chapter 2 in this volume). Genome-wide alignments of the 12 Drosophila species resulted in the prediction and refinement of thousands of protein-coding exons, genes coding for RNAs such as miRNAs, transcriptional regulatory motifs, and functional regulatory regions (Stark et al., 2007). For more information on comparative analysis of the 12 Drosophila species genomes, the reader is directed to Ashburner’s excellent preface article (Ashburner, 2007).

    Malaria mosquito, Anopheles gambiae. 278Mb of genome sequence from An. gambiae was obtained by the WGS method (Holt et al., 2002). About 10-fold coverage of the genome sequence was achieved. The size of the assembled An. gambiae genome is larger than that of D. melanogaster (120Mb). About 14,000 predicted genes were identified in the assembled genome sequence. When compared to the D. melanogaster genome, the An. gambiae genome contained 100 additional serine proteases, which are central effectors of innate immunity and other proteolytic processes (see Chapter 10 and Chapter 14 in this volume). The presence of additional serine proteases in An. gambiae may be due to differences in feeding behavior, as well as its intimate interactions with both vertebrate hosts and parasites. Also, 36 additional proteins containing fibrinogen domains (carbohydrate-binding lectins that participate in the first line of defense against pathogens by activating the complement pathway in association with serine proteases) and 24 additional cadherin domain-containing proteins were found in An. gambiae. Most of the genes coding for transcription factors reported from sequenced genomes, including the C2H2 zinc-finger, POZ, Myb-like, basic helix–loop–helix, and homeodomain-containing proteins, are also present in the An. gambiae genome. An over-representation of the MYND domain was observed in the An. gambiae genome. This domain is predominantly found in chromatin proteins, which are believed to mediate transcriptional repression.

    Genes coding for proteins involved in the visual system, structural components of the cell adhesion and contractile machinery, and energy-generating glycolytic enzymes that are required for active food seeking are present in higher numbers in the An. gambiae genome when compared with the D. melanogaster genome. Genes coding for salivary gland components, as well as anabolic and catabolic enzymes involved in protein and lipid metabolism, are over-represented in the An. gambiae genome. Genes coding for proteins involved in insecticide resistance, such as transporters and detoxification enzymes, were also found in higher numbers in the An. gambiae genome when compared to their numbers in the D. melanogaster genome.

    Red flour beetle, Tribolium castaneum. The 160-Mb T. castaneum genome sequence was obtained by WGS, and contained 16,404 predicted genes (Richards et al., 2008). The T. castaneum genome showed expansions in odorant and gustatory receptors, as well as P450s and other detoxification enzyme families (see also Chapter 7 in this volume). In addition, the T. castaneum genome contained more ancestral genes involved in cell–cell communication when compared to other insect genomes sequenced to date. RNA interference is systemic in T. castaneum, and thus works very well. The SID-1 multi-transmembrane protein involved in double-stranded RNA (dsRNA) uptake in C. elegans was not found in D. melanogaster. However, three genes that encode proteins similar to SID-1 were found in the T. castaneum genome. Expansions of odorant receptors, CYP proteins, proteinases, diuretic hormones, a vasopressin hormone and receptor, and chemoreceptors suggest that these adaptations allowed T. castaneum to become a serious pest of stored grain.

    Honeybee, Apis mellifera. The 236-Mb A. mellifera genome was assembled based on 1.8Gb of sequence obtained by WGS (The Honey Bee Genome Consortium, 2006). A total of 10,157 potential genes were identified in the assembled genome sequence. Genes coding for most of the highly conserved cell signaling pathways are present in the A. mellifera genome. Seventy-four genes coding for 96 homeobox domains were identified in the A. mellifera genome. When compared to the D. melanogaster genome, the A. mellifera genome contained more genes coding for odorant receptors and proteins involved in nectar and pollen utilization. This genome also showed fewer genes coding for proteins involved in innate immunity, detoxification enzymes, cuticle-forming proteins, and gustatory receptors.

    Parasitoid wasps, Nasonia vitripennis, N. giraulti, and N. longicornis. 240Mb of the N. vitripennis genome was assembled from sequences obtained by the Sanger sequencing method (Werren et al., 2010). Sequences from two sibling species, N. giraulti and N. longicornis, were completed with one-fold Sanger and 12-fold, 45 base-pair (bp) Illumina genome coverage. The assembled genome sequence contained 17,279 predicted genes. About 60% of Nasonia genes code for proteins showing high similarity with human proteins, 18% of the genes code for proteins showing similarity with other arthropod homologs, and about 2.4% of Nasonia genes code for proteins similar to those in A. mellifera, which could therefore be Hymenoptera-specific. About 12% of genes code for proteins that showed no similarity with known proteins, and therefore may be Nasonia-specific.

    Body louse, Pediculus humanus humanus. 108Mb of the P. h. humanus genome was assembled from 1.3 million paired-end reads from plasmid libraries obtained by WGS (Kirkness et al., 2010). The body louse has the smallest genome size of all the insect genomes sequenced so far. The assembled genome contained 10,773 protein-coding genes and 57 microRNAs. Compared with other insect genomes, the body-louse genome contains significantly fewer genes associated with environmental sensing and response. These include genes encoding odorant and gustatory receptors and detoxifying enzymes. Only 104 non-sensory G protein-coupled receptors (GPCRs) and 3 opsins were identified in the P. h. humanus genome. This insect has the smallest repertoire of GPCRs identified in any sequenced insect genome to date. Only 10 odorant receptors were detected in the P. h. humanus genome, and only 37 genes encode P450s. Despite its smaller size, the P. h. humanus genome contains homologs of all 20 nuclear receptors identified in the D. melanogaster genome.

    Pea aphid, Acyrthosiphon pisum. The 464-Mb genome of A. pisum was assembled from 4.4 million Sanger sequencing reads (The Pea Aphid Genome Consortium, 2010). Analysis of the A. pisum genome showed extensive gene duplication events. As a result, the aphid genome appears to have more genes than any of the previously sequenced insects. Genes coding for proteins involved in chromatin modification, miRNA synthesis, and sugar transport are over-represented in the A. pisum genome when compared with other insect genomes sequenced to date. About 20% of the predicted genes in the A. pisum genome code for proteins with no significant similarity to other known proteins. Proteins involved in amino acid and purine metabolism are encoded by both host and symbiont genomes at different enzymatic steps. Selenocysteine biosynthesis is not present in the pea aphid, and selenoproteins are absent. Several genes in the A. pisum genome were found to have arisen from bacterial ancestors, and some of these genes are highly expressed in bacteriocytes, where they may function in the regulation of symbiosis. Interestingly, the genes coding for proteins that function in the IMD pathway of the immune system are absent in the A. pisum genome.

    Yellow fever mosquito, Aedes aegypti. The 1.38-Gb genome of Ae. aegypti was assembled from sequence reads obtained by WGS (Nene et al., 2007). This is the largest insect genome sequenced to date, and is about five times larger than the An. gambiae and D. melanogaster genomes. Approximately 47% of the Ae. aegypti genome consists of transposable elements. The presence of large numbers of transposable elements could have contributed to the larger size of the Ae. aegypti genome. A total of 15,419 predicted genes were identified in the assembled genome. Compared to the genome of An. gambiae, an increase in the number of genes encoding odorant binding proteins, cytochrome P450s, and cuticle proteins was observed in the Ae. aegypti genome.

    Silk moth, Bombyx mori. The silkworm genome was sequenced by Japanese and Chinese laboratories simultaneously. The Japanese group used the sequence data derived from WGS to assemble 514Mb including gaps, and 387Mb without gaps (Mita et al., 2004). Chinese scientists assembled sequences obtained by WGS into a 429-Mb genome (Xia et al., 2004). The two data sets were later merged and reassembled (The International Silkworm Genome, 2008). This resulted in 8.5-fold sequence coverage of an estimated 432-Mb genome. The repetitive sequence content of this genome was estimated at 43.6%. A total of 14,623 gene models were predicted using a GLEAN-based algorithm. Among the predicted genes, 3000 showed no homologs in insects or vertebrates. The presence of specific tRNA clusters, and several sericin gene clusters, correlates with the main function of this insect: the massive production of silk.

    Recently, a consortium of international scientists sequenced the genomic DNA of 40 domesticated and wild silkworm strains to coverage of approximately three-fold. This represents 99.88% of the genome, and led to the development of a single base-pair resolution silkworm genetic variation map (Xia et al., 2009). This effort identified ~16 million single-nucleotide polymorphisms, many indels, and structural variations. These studies showed that domesticated silkworms are genetically different from wild ones; nonetheless, they have managed to maintain large levels of genetic variability. These findings suggest a short domestication event involving a large number of individuals. Candidate genes, numbering 354, that are expressed in the silk gland, midgut, and testes, may have played an important role during domestication.

    Southern house mosquito, Culex quinquefasciatus. C. quinquefasciatus is a vector of important viruses such as the West Nile virus and the St Louis encephalitis virus, and harbors nematodes that cause lymphatic filariasis. Arensburger and colleagues sequenced and assembled the whole genome of C. quinquefasciatus (Arensburger et al., 2010). A total of 18,883 genes, a larger number than reported from the other two mosquito genomes (Aedes aegypti and Anopheles gambiae), were identified in the assembled C. quinquefasciatus genome. An increase in the number of genes coding for olfactory and gustatory receptors, immune proteins, and enzymes such as cytosolic glutathione transferases and cytochrome P450s involved in xenobiotic detoxification was observed.

    1.3. Genome Analysis

    Since its introduction, Sanger sequencing has been applied in most genome sequencing projects (Sanger et al., 1977); as a result, a large volume of sequence information from a variety of species has been deposited into various databases. With deciphered full genome sequences for a number of species, scientists can now begin to address biological questions on a genome-wide level. These analyses include the measurement of global gene expression, the identification of functional elements, and the mapping of genome regions associated with quantitative traits. Various new technologies have also been developed to assist with genome analysis. These include DNA microarrays (Schena et al., 1995), serial analysis of gene expression (SAGE) (Schena et al., 1995), chromatin immunoprecipitation microarrays (Ren et al., 2000, Iyer et al., 2001 and Lieb et al., 2001), next generation sequencing (NGS) (Margulies et al., 2005 and Shendure et al., 2005), genome-wide RNAi screens (Kiger et al., 2003), comparative genomics (Kiger et al., 2003), and metagenomics (Chen and Pachter, 2005). These genomic analysis tools have greatly improved our understanding of how biological and cellular functions are regulated by the RNAs or proteins encoded in an organism’s genome. Especially in the agricultural research field, functional genomics studies will enhance our understanding of the biology of insect pests and disease vectors, which in turn will assist the design of future pest control strategies. Here, we will discuss technologies used for functional genomics studies, with an emphasis on forward genetics, DNA microarray, and NGS technologies, and their applications in research on insects.

    1.3.1. Forward and Reverse Genetics

    The function of genes is often studied using forward genetics approaches. In forward genetic screens, insects are treated with mutagens to induce DNA lesions, followed by a screen to identify mutants with a phenotype of interest. The mutated gene is then identified by employing standard genetic and molecular methods. Follow-up studies on the mutant phenotype, including molecular analyses of the gene, often lead to determination of its function. Forward genetics approaches have been used for determining the function of many genes. In the fruit fly, D. melanogaster, genetic screens have been used for a number of years to discover gene–phenotype associations. With the availability of massive amounts of data derived from whole-genome and omics studies, a systems biology approach needs to be applied to enhance the power of gene function discovery in vivo. Mobile elements or chemicals are often used as mutagenesis tools (Ryder and Russell, 2003). The P element has been widely used in D. melanogaster forward genetics since its development as a tool for transgenesis in 1982 (Rubin and Spradling, 1982). The insertion of P elements into the D. melanogaster genome allowed subsequent cloning and characterization of a large number of fly genes. P-element mediated transgenesis is often used to create mutants by excising the flanking genes based on imprecise mobilization of the P elements. P elements were also modified to study genes, not only based on a phenotype, but also based on RNA or protein expression patterns, which are often referred to as enhancer trap and gene trap technologies. P elements are also being used as mutagenesis agents in a project aimed at generating insertions in every predicted gene in the fruit fly genome.

    Recent developments in transgenic techniques, focused on the integration of transgenes at specific genomic sites by employing recombinases and integrases, have made forward genetics in D. melanogaster effective and specific. One of the major drawbacks of P-element mediated transgenesis is the non-specific and positional effects caused by inserting exogenous DNA into the insect genome. Recently, several methods have been developed to eliminate these unwanted, non-specific effects in transgenic insects. Transgene co-placement was developed by Siegal and Hartl (1996). This method uses two transgenes, a rescue fragment and its mutant version, which are inserted into the same locus by using a P-element vector that contains the recognition sites FRT (the FLP recombinase recognition site) and loxP (the Cre recombinase recognition site). After integration, FLP can remove one transgene, such as the rescue gene. Cre can remove the other transgene, which may be the mutant version. Golic et al. (1997) developed a method that uses FLP recombinase to remobilize a transgene from a donor transposon, which contains the transgenic insert together with a marker gene such as white flanked by two FRT sites, into an acceptor transposon that contains a second marker and one FRT site. The remobilization of the donor transposon by FLP can be followed by the changes in the expression of the white gene. The remobilization results in the excision of the transgene and its potential integration into the FRT site of the acceptor transposon.

    Homologous recombination is the best method for in vivo gene targeting, since positional effects can be eliminated completely. Insertional gene targeting (Rong and Golic, 2000) and replacement gene targeting (Gong and Golic, 2003) are two alternative methods that have been developed. Insertional gene targeting results in the insertion of a target gene at a region of homology. Replacement gene targeting results in replacement of endogenous homologous DNA sequences with exogenous DNA through a double reciprocal recombination between two stretches of homologous sequences. Site-specific zinc-finger-nuclease-stimulated gene targeting has been developed to further improve in vivo gene targeting (Bibikova et al., 2003 and Beumer et al., 2006). The most widely used site-specific integration system in D. melanogaster employs the bacteriophage ΦC31 integrase. This integrase catalyzes the recombination between the phage attachment site (attP), previously integrated into the fly genome, and a bacterial attachment site (attB) present in the injected transgenic construct (Groth et al., 2004). A combination of different transgenic methods should aid in D. melanogaster functional genomics studies aimed at determining the function of every gene in this insect.

    In the reverse genetics approach, studies on the function of the genes start with the gene sequences, rather than a mutant phenotype, which is often used in forward genetics approaches. In this approach, the gene sequence is used to alter the gene function by employing a variety of methods. The effect of the altered gene function on physiological and developmental processes of insects is then determined. Reverse genetics is an excellent complement to forward genetics, and some of the experiments are much easier to perform using reverse genetics rather than forward genetics. For example, RNA interference, a reverse genetics method (covered in Chapter 2 in this volume) is a better method compared to forward genetics to investigate the functions of all the members of a gene family. The availability of whole-genome sequences for a number of insects and the functioning of RNAi in these insects will keep scientists busy studying the functions of all genes in insects during the next few years.

    1.3.2. DNA Microarray

    In most cases, a group of functionally associated genes share similar expression patterns, which may be temporal, spatial, developmental, or physiological. For example, environmental changes and pathological conditions could alter global gene expression patterns. To understand and characterize the biological roles of an individual gene or a cluster of genes, a high-throughput quantitative method is needed to detect gene expression at the whole-genome level. The DNA microarray technique is one such method that has been developed for monitoring global gene expression patterns. Through robotic printing of thousands of DNA oligonucleotides onto a solid surface, one DNA microarray chip can accommodate more than 50,000 probes (unique DNA sequences). DNA microarrays utilize the principle of Southern blotting (Schena et al., 1995). First, fluorescently labeled targets are synthesized from RNA samples by reverse transcription; the labeled targets are then hybridized to DNA microarrays which contain the complementary DNA probes. After washing away the unbound targets, the intensity of the fluorescent signal for each spot is captured using a microarray scanner. DNA microarrays have been widely used in functional genomics research. In addition to their application in gene expression profiling, DNA microarrays can also be used to identify transcriptional or functional elements in the genome, or to identify single nucleotide polymorphisms (SNPs) among alleles within or between populations. The applications of DNA microarrays and various other types of arrays are listed in Table 2.

    1.3.2.1. Global gene expression analysis (transcriptome analysis)

    1.3.2.1.1. DNA microarray fabrication

    The DNA microarrays used for global gene expression analysis usually contain tens of thousands of probes which cover all the predicted genes in a genome, or sequences representing transcribed regions, also called expressed sequence tags (ESTs). For example, the Affymetrix GeneChip® Drosophila Genome 2.0 Array contains over 500,000 data points representing 18,500 transcripts and various SNPs (Affymetrix technical data sheets). DNA microarrays can be prepared by various methods, including photolithography, ink-jet technology, and spotted array technology. Photolithography and ink-jet technologies are used for fabricating so-called oligonucleotide microarrays, which are made by synthesizing or printing short oligonucleotide sequences (25-mer in Affymetrix arrays or 60-mer in Agilent arrays) directly onto a solid array surface. The photolithography method is used by Affymetrix and NimbleGen, while the ink-jet print method is used by Agilent. Typically, multiple probes per gene are used in order to achieve precise estimation of gene expression. Long oligonucleotides have better hybridization specificities than short ones, although short oligonucleotides can be printed at a higher density and synthesized at lower cost. In contrast, spotted microarrays are made by synthesizing probes prior to deposition onto the array surface. The probes used for spotted microarrays can be oligonucleotides, cDNA, or PCR products. Because of its relatively low cost and flexibility, spotted microarray technology has been widely used to produce custom arrays in many academic laboratories and facilities. However, spotted microarrays are less uniform and have a lower probe density than oligonucleotide arrays. As the cost of custom commercial arrays such as Agilent Custom Gene Expression Microarrays (eArray) has decreased, the use of spotted microarrays is decreasing as well.

    1.3.2.1.2. Target preparation and hybridization

    Total RNA or mRNA is isolated from experimental samples using commercial TRIzol reagent or RNA isolation and purification kits. Total RNA (1μg to 15μg) or mRNA (0.2μg to 2μg) is reverse transcribed into first-strand cDNA. For smaller amounts of total starting RNA (10ng to 100ng), Affymetrix offers a two-cycle target labeling method to obtain sufficient amounts of labeled targets for DNA hybridization. Then, cDNAs are labeled and hybridized to spotted or oligonucleotide microarrays. In oligonucleotide microarrays, one mRNA sample labeled with one fluorescent dye is analyzed on a single channel. Alternatively, two different fluorescent dyes, such as Cy3 and Cy5, can be used to determine gene expression changes from two different experimental conditions.

    1.3.2.1.3. Data analysis

    Although the data analysis methods among commercial microarrays vary, the basic concepts are similar. After hybridization, the fluorescence images are captured by a microarray scanner. The fluorescence intensity data are then corrected for background (noise), which may result from non-specific hybridization or autofluorescence. In two-channel arrays, the fluorescence intensity ratio between the two dyes is calculated and adjusted. If data from different arrays or hybridizations are to be compared, they need to be normalized before further analysis.
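
    The sequence of background correction, ratio calculation, and normalization can be sketched for a two-channel array as follows. This is a deliberately simple illustration with made-up intensities; it uses global median centering of the log ratios, one of the simplest normalization schemes, as a stand-in for whatever method a given commercial package applies.

```python
from math import log2
from statistics import median


def background_correct(signal, background, floor=1.0):
    """Subtract local background; clamp at a small floor so the log is defined."""
    return [max(s - b, floor) for s, b in zip(signal, background)]


def log_ratios(cy5, cy3):
    """Per-spot log2(Cy5 / Cy3) from background-corrected intensities."""
    return [log2(r / g) for r, g in zip(cy5, cy3)]


def median_center(values):
    """Global normalization: shift log ratios so that their median is zero."""
    m = median(values)
    return [v - m for v in values]


# Hypothetical raw intensities for three spots (arbitrary scanner units)
cy5_raw, cy5_bg = [1100, 2100, 8100], [100, 100, 100]
cy3_raw, cy3_bg = [600, 1100, 2100], [100, 100, 100]

cy5 = background_correct(cy5_raw, cy5_bg)   # [1000, 2000, 8000]
cy3 = background_correct(cy3_raw, cy3_bg)   # [500, 1000, 2000]
m_values = log_ratios(cy5, cy3)             # [1.0, 1.0, 2.0]
normalized = median_center(m_values)        # [0.0, 0.0, 1.0]
print(normalized)
```

    In this toy data set the Cy5 channel is uniformly twice as bright as Cy3, which is treated as dye bias and removed by centering; only the third spot retains a two-fold difference after normalization.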

    After normalization, various statistical analysis methods can be applied to identify genes that are differentially expressed between two treatments. Usually, a t-test is used for comparing the means of two sample populations, while ANOVA (analysis of variance) is applied for comparing multiple sets of samples or treatments to obtain more accurate variance estimates. Since many genes are tested for statistical differences, multiple test corrections, such as the Bonferroni correction and the Benjamini and Hochberg false discovery rate (FDR) (Benjamini and Hochberg, 1995), are applied to adjust the P-value and limit the occurrence of false positives. The Bonferroni correction is a very stringent method that uses α/n as the threshold P-value for each test, where α is the desired overall significance level and n is the number of tests (here, the number of genes). In contrast, the Benjamini and Hochberg FDR is less stringent, and therefore produces fewer false negatives. Various statistical analysis programs are now available from either commercial microarray providers or open source websites. These include GeneSpring from Silicon Genetics (acquired by Agilent in 2004) and Significance Analysis of Microarrays (SAM) (Tusher et al., 2001). Besides differential expression analysis, genes with similar expression patterns can be grouped into one or more clusters using hierarchical clustering methods. Hierarchical clustering analysis helps to visualize gene expression patterns and identify relationships between functionally associated genes (Eisen et al., 1998). In addition, programs such as Gene Set Enrichment Analysis (GSEA) are used to determine whether there is a statistically significant, coordinated difference between control and treatment samples for a predefined set of genes that are involved in a similar biological process (Subramanian et al., 2005). Unlike traditional microarray analyses at the single-gene level, GSEA addresses the situation where the fold change between control and treatment samples is small for individual genes, but there is a concordant difference across a set of functionally related genes. Many published microarray datasets have been deposited in online databases, including Gene Expression Omnibus (GEO) at NCBI, ArrayExpress at the European Bioinformatics Institute, and Stanford Genomic Resource at Stanford University. A list of microarray analysis tools and databases is shown in Table 3.
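
    The difference in stringency between the two multiple test corrections can be seen in a short Python sketch. It assumes a list of per-gene P-values has already been obtained (for example, from t-tests); the function names and the example P-values are hypothetical. The Bonferroni function applies the α/n threshold, and the second function applies the Benjamini and Hochberg step-up rule, which rejects the k smallest P-values, where k is the largest rank for which the ranked P-value does not exceed (k/n)·α.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag genes whose P-value is at or below alpha / n."""
    n = len(pvalues)
    threshold = alpha / n
    return [p <= threshold for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag genes using the Benjamini-Hochberg step-up procedure."""
    n = len(pvalues)
    # Gene indices ordered from smallest to largest P-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / n) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / n * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    significant = [False] * n
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            significant[idx] = True
    return significant


# Hypothetical P-values for five genes.
pvals = [0.001, 0.012, 0.02, 0.045, 0.5]
print(bonferroni(pvals))          # only the first gene passes alpha / n = 0.01
print(benjamini_hochberg(pvals))  # the first three genes pass
```

    With these example values, the Bonferroni threshold of 0.01 retains a single gene, whereas the Benjamini and Hochberg procedure retains three, which illustrates why the latter yields fewer false negatives at the cost of tolerating a controlled proportion of false positives.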

    1.3.2.1.4. Applications

    The primary goal of developing gene expression microarray technology is to monitor differentially expressed genes at the whole-genome level. Accordingly, microarray technology has been used to study the molecular basis of pesticide resistance (Djouaka et al., 2008 and Zhu et al., 2010) (Figure 3), insect–plant interactions (Held et al., 2004), insect host–parasitoid associations (Lawniczak and Begun, 2004, Barat-Houari et al., 2006, Mahadav et al., 2008 and Kankare et al., 2010), insect behavior (McDonald and Rosbash, 2001, Etter and Ramaswami, 2002, Dierick and Greenspan, 2006, Adams et al., 2008 and Kocher et al., 2008), and development and reproduction (White et al., 1999, Kawasaki et al., 2004, Dana et al., 2005, Kijimoto et al., 2009, Bai and Palli, 2010, Parthasarathy et al., 2010a and Parthasarathy et al., 2010b), among other topics. Understanding the mechanisms of pesticide resistance is critical for prolonging the life of existing insecticides, designing novel pest control agents, and improving control strategies. As a result, several laboratories have begun using microarrays to identify genes responsible for insecticide resistance. For example, using a custom microarray, one cytochrome P450 gene, CYP6BQ9, was identified as being responsible for the majority of deltamethrin resistance in T. castaneum (Zhu et al., 2010) (Figure 3). Another microarray study discovered that two cytochrome P450 genes, CYP6P3 and CYP6M2, are upregulated in multiple pyrethroid-resistant Anopheles gambiae populations collected in Southern Benin and Nigeria (Djouaka et al., 2008). A global view of tissue-specific gene expression has also been reported in Drosophila melanogaster (Chintapalli et al., 2007). This study identified many genes that are uniquely expressed in specific fly tissues, and provided useful information for understanding the tissue-specific functions of these candidate genes.
