Multivariate Analyses of Codon Usage Biases
About this ebook
The main objective is to provide a complete and reproducible example of the power of multivariate analyses in the study of codon usage biases.
Jean R. Lobry
Dr. Jean R. Lobry has been a professor-researcher at Claude Bernard University Lyon, first as a university lecturer from 1992 and then as a university professor since 2006.
Multivariate Analyses of Codon Usage Biases
Jean R. Lobry
Statistics for Bioinformatics Set
coordinated by
Guy Perrière
Table of Contents
Cover image
Title page
Dedication
Copyright
Acknowledgments
Introduction
I.1 Prerequisites and notations
I.2 The case under study is a genomic monster
1: Introduction to Correspondence Analysis
Abstract
1.1 Chapter objectives
1.2 Metric choice
1.3 Properties
2: Global Correspondence Analysis
Abstract
2.1 Data set
2.2 Running global correspondence analysis
2.3 The missing factor F0
2.4 First factor
2.5 Second and third factors
2.6 Fourth and fifth factors
3: Within and Between Correspondence Analysis
Abstract
3.1 Running the analyses
3.2 Synonymous codon usage (WCA)
3.3 Amino acid usage (BCA)
4: Internal Correspondence Analysis
Abstract
4.1 Running the analyses
4.2 Synonymous codon usage
4.3 Non-synonymous codon usage
Conclusion
Appendix 1
A1.1 Introduction
A1.2 Chapter 1
A1.3 Chapter 2
A1.4 Chapter 3
A1.5 Chapter 4
Appendix 2
A2.1 Session information
References
Index
Dedication
This book is dedicated to Tami and Noboru Sueoka
Copyright
First published 2018 in Great Britain and the United States by ISTE Press Ltd and Elsevier Ltd
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Press Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
Elsevier Ltd
The Boulevard, Langford Lane
Kidlington, Oxford, OX5 1GB
UK
www.elsevier.com
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
For information on all our publications visit our website at http://store.elsevier.com/
© ISTE Press Ltd 2018
The rights of Jean R. Lobry to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
Library of Congress Cataloging in Publication Data
A catalog record for this book is available from the Library of Congress
ISBN 978-1-78548-296-0
Printed and bound in the UK and US
Acknowledgments
Thanks are due to Anne-Béatrice DUFOUR, Laurent GUÉGUEN, Paweł MACKIEWICZ, Simon PENEL, Guy PERRIÈRE and Haruo SUZUKI.
This work was performed using the computing facilities of the CC LBBE/PRABI.
Introduction
I.1 Prerequisites and notations
Familiarity with the R statistical software [RCO 13] is assumed¹. Readers should be able to install and load the seqinr [CHA 07] and ade4 [CHE 04] packages. In the following, outputs are presented in black to distinguish them from inputs, and comments start with the # character:
library(seqinr)
library(ade4)
pi <- 3 # God’s approximation in 1 Kings 7:23 and 2 Chronicles 4:2
pi
[1] 3
library(fortunes)
fortune("pi <- 3")

John Fox: I’ve never understood why it’s legal to change the built-in global constants in R, including T and F. That just seems to me to set a trap for users. Why not treat these as reserved symbols, like TRUE, Inf, etc.?
Rolf Turner: I rather enjoy being able to set pi <- 3.
   -- John Fox and Rolf Turner
      R-help (June 2013)
To avoid excessive verbosity, the code used to produce the figures is given in the Appendix. This book is composed with LaTeX using the Sweave() command […], so that the text can refer directly to R outputs. For instance, the approximation of π used by the Babylonians, defined in the above code, could be referred to in the source text as \Sexpr{pi} and then automatically transformed into 3 in the final document.
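The \Sexpr{} mechanism can be tried on a toy document. The following is a minimal sketch, not the book's own source. It assumes only base R's utils::Sweave(). The file name demo.Rnw and the variable approx.pi are hypothetical, and a separate variable is used so that the built-in constant pi is left untouched:

```r
# Minimal Sweave source (hypothetical file "demo.Rnw"): a hidden code chunk
# defines a value, and \Sexpr{} inlines that value into the LaTeX text.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=FALSE>>=",
  "approx.pi <- 3",
  "@",
  "The approximation used here is \\Sexpr{approx.pi}.",
  "\\end{document}"
)

old.wd <- setwd(tempdir())               # Sweave writes demo.tex to the working directory
writeLines(rnw, "demo.Rnw")
utils::Sweave("demo.Rnw", quiet = TRUE)  # runs the chunk and expands \Sexpr{}
tex <- readLines("demo.tex")
setwd(old.wd)

# In the generated LaTeX, \Sexpr{approx.pi} has been replaced by its value
grep("approximation used here", tex, value = TRUE)
```

Running LaTeX on demo.tex would then typeset the sentence with the number already substituted.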
Some code fragments that need an Internet connection or that are time-consuming are guarded by a logical flag, which must be set to TRUE to allow their execution. These flags are listed below:
# NIC: Need Internet Connection
# CTI: Computer Time Intensive
TODO <- FALSE # Global switch
final.query0 <- TODO # NIC: get Bb chromosome for chirochore structure
final.stability <- TODO # NIC + CTI: codon usage in 12,317 bacteria
final.query1 <- TODO # NIC: get sequences and info from remote database
final.query2 <- TODO # NIC: get ribosomal sequence information
final.query3 <- TODO # NIC: get PheL phenylalanine operon leader peptide
final.query4 <- TODO # NIC: get GC content for Ixodes scapularis CDS
final.screeplot2 <- TODO # CTI: many CA on simulated tables under H0
final.delrow <- TODO # CTI: many CA while progressively deleting small CDS
final.delcol <- TODO # CTI: many CA while progressively deleting minor codons
final.ttuco.eig <- TODO # CTI: many WCA and BCA on simulated tables under H0
final.ica.eig1 <- TODO # CTI: many WCA and BCA for within-between group analysis
final.ica.eig2 <- TODO # CTI: many ICA on simulated tables under H0
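R assignment copies values. Each final.* flag therefore captures the value TODO has at the moment of assignment, and changing TODO afterwards does not affect flags that are already set. In practice, the line TODO <- FALSE itself has to be edited before the list above is run. The following minimal sketch assumes a fragment is guarded by a plain if() test. The guarding code of the book is not shown in this excerpt, and the variable flag.after.switch is hypothetical:

```r
TODO <- FALSE                      # global switch
final.delrow <- TODO               # the flag receives a copy of the switch's value

if (final.delrow) {
  # Time-consuming code would go here; it is skipped while the flag is FALSE.
  message("running the guarded fragment")
}

TODO <- TRUE                       # changing the switch afterwards...
flag.after.switch <- final.delrow  # ...does not update the flag: still FALSE

final.delrow <- TODO               # the flag must be re-assigned from the switch
```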
Notes on Figure I.1
The colors and point shapes used in this book to represent the different classes of coding sequences are given in the top panel. Simulations with the package dichromat […]; the code is available in Appendix 1, section A1.1.1.
Figure I.1 shows the colors used in this book; they were chosen so that they can be distinguished by color-blind readers. Most figures should be understandable in black and white because the illustrative variables are also encoded by the shape of the points. For figures in color, a link to a color version is provided. Here is a brief description of these variables:
−Leading: used for a coding sequence that is transcribed divergently from the origin of replication, and is therefore encoded on the leading strand for replication, as illustrated in Figure I.4.
−Lagging: used for a coding sequence that is transcribed convergently toward the origin of replication, and is therefore encoded on the lagging strand for replication. In bacteria, there is no documented example of a coding sequence being simultaneously leading and lagging².
−Ribosomal: used for a sequence coding for a ribosomal protein, taken as a proxy for sequences with high expressivity, that is, with a high expression level in at least some environmental conditions.
−Integral membrane protein: used for a sequence coding for an integral membrane protein; its location in the hydrophobic phospholipid bilayer requires an enrichment in hydrophobic amino acids.
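The leading/lagging assignment follows from the gene's strand and its position relative to the origin and terminus of replication. The following is a minimal sketch, not the book's or seqinr's method. It assumes a circular chromosome with known positions ori and ter, CDS coordinates with start <= end that do not wrap around position 1, and CDS that do not straddle ori or ter. The function replication.strand() is hypothetical:

```r
# Hypothetical helper: label each CDS as "leading" or "lagging".
# start, end:    1-based CDS coordinates (start <= end)
# strand:        "+" or "-"
# ori, ter:      positions of the replication origin and terminus
# genome.length: chromosome length, used for circular distances
replication.strand <- function(start, end, strand, ori, ter, genome.length) {
  mid <- (start + end) / 2
  # distances from ori, travelling toward increasing coordinates
  d.gene <- (mid - ori) %% genome.length
  d.ter  <- (ter - ori) %% genome.length
  right.replichore <- d.gene < d.ter   # fork moves toward increasing coordinates
  plus.strand <- strand == "+"         # transcribed toward increasing coordinates
  # leading when transcription runs in the same direction as the fork,
  # i.e. when the CDS is transcribed away from the origin
  ifelse(right.replichore == plus.strand, "leading", "lagging")
}

replication.strand(start = c(10, 10, 60, 60), end = c(20, 20, 70, 70),
                   strand = c("+", "-", "+", "-"),
                   ori = 1, ter = 51, genome.length = 100)
# [1] "leading" "lagging" "lagging" "leading"
```

The gene midpoint is used so that each CDS falls into a single replichore.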
Figure I.1 Color code. For a color version of this figure, see www.iste.co.uk/lobry/multivariate.zip
Elementary knowledge, or at least some practice, of dimension reduction methods such as principal component analysis (PCA) is assumed. Figure I.2 provides a basic introduction to PCA.
Figure I.2 Principal component analysis in a nutshell. For a color version of this figure, see www.iste.co.uk/lobry/multivariate.zip
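As a reminder of what Figure I.2 summarizes, PCA on centered data amounts to an eigendecomposition of the covariance matrix. The eigenvalues are the variances along the principal axes, and projecting the rows onto the eigenvectors gives their coordinates. The following minimal sketch uses simulated data, not data from the book, and checks this against base R's prcomp():

```r
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), nrow = n)       # simulated table: 50 rows, 4 variables
X[, 2] <- X[, 1] + rnorm(n, sd = 0.3)     # make two variables strongly correlated

Xc <- scale(X, center = TRUE, scale = FALSE)  # center each column
S <- crossprod(Xc) / (n - 1)                  # covariance matrix
e <- eigen(S, symmetric = TRUE)               # principal axes and their variances
scores <- Xc %*% e$vectors                    # row coordinates on the principal axes

pca <- prcomp(X)                              # same analysis, computed via SVD
round(e$values / sum(e$values), 3)            # share of total variance per axis
```

The eigenvalues match pca$sdev^2. The coordinates match pca$x up to the sign of each axis, which is arbitrary in PCA.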
Let C be the set of the 64 possible codons:
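$$C = \{\mathtt{aaa}, \mathtt{aac}, \mathtt{aag}, \ldots, \mathtt{ttt}\}, \qquad |C| = 4^3 = 64 \qquad [I.1]$$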
Let A be the set of the pseudo-amino acid Stp plus the 20 possible amino acids in proteins:
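$$A = \{\mathrm{Stp}, \mathrm{Ala}, \mathrm{Arg}, \mathrm{Asn}, \ldots, \mathrm{Val}\}, \qquad |A| = 21 \qquad [I.2]$$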
A genetic code is a surjective function from C onto A: every element of C maps to exactly one element of A, and every element of A is the image of at least one element of C. Figure I.3 illustrates this for the so-called universal genetic code. Codons for the same amino acid are termed synonymous. The genetic code can be displayed at R’s prompt with the tablecode() function from seqinr.
Figure I.3 The surjective nature of genetic code. For a color version of this figure, see www.iste.co.uk/lobry/multivariate.zip
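The surjective map from C onto A can be made concrete without seqinr. The following base-R sketch is an illustration, not the book's code. It assumes the universal code as given by NCBI translation table 1, written in the usual t, c, a, g codon order, and uses seqinr-style three-letter names with Stp for stop codons. It builds the map, checks that all 21 elements of A are reached, and groups synonymous codons:

```r
# Universal genetic code (NCBI translation table 1), one letter per codon,
# codons ordered t, c, a, g at each position; "*" marks stop codons.
aa1 <- strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
                "")[[1]]
bases <- c("t", "c", "a", "g")
g <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(g$b1, g$b2, g$b3)   # first base varies slowest: ttt, ttc, tta, ...

# Three-letter names, with the pseudo-amino acid Stp for stop codons
aa3 <- c(A = "Ala", R = "Arg", N = "Asn", D = "Asp", C = "Cys", Q = "Gln",
         E = "Glu", G = "Gly", H = "His", I = "Ile", L = "Leu", K = "Lys",
         M = "Met", F = "Phe", P = "Pro", S = "Ser", T = "Thr", W = "Trp",
         Y = "Tyr", V = "Val", "*" = "Stp")

code <- setNames(unname(aa3[aa1]), codons)  # the function from C to A

length(code)                           # 64 codons: the domain C
length(unique(code))                   # 21 images: every element of A is reached
synonyms <- split(names(code), code)   # synonymous codons, grouped by amino acid
lengths(synonyms)                      # degeneracy of each element of A
synonyms$Leu                           # the six Leu codons
```

The degeneracy ranges from one codon (Met, Trp) to six codons (Leu, Ser, Arg).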