Multivariate Analysis of Ecological Data with ade4
About this ebook
This book introduces the ade4 package for R, which provides multivariate methods for the analysis of ecological data. The package is built around the mathematical concept of the duality diagram, which provides a unified framework for multivariate analysis. The authors offer a detailed presentation of the theoretical framework of the duality diagram and also of its application to real-world ecological problems. These two goals may seem contradictory, as they concern two separate groups of scientists, namely statisticians and ecologists. However, statistical ecology has become a scientific discipline of its own, and the proper use of multivariate data analysis methods by ecologists requires a fair knowledge of the mathematical properties of these methods.
The organization of the book is based on ecological questions, but these questions correspond to particular classes of data analysis methods. The first chapters present both usual and multiway data analysis methods. Further chapters are dedicated, for example, to the analysis of spatial data, of phylogenetic structures, and of biodiversity patterns. One chapter deals with multivariate data analysis graphs. In each chapter, the basic mathematical definitions of the methods and the outputs of the R functions available in ade4 are detailed in two different boxes. The text of the book itself can be read independently from these boxes. Thus the book offers the opportunity to find information about the ecological situation from which a question arises alongside the mathematical properties of methods that can be applied to answer this question, as well as the details of software outputs. Each example and all the graphs in this book come with executable R code.
Multivariate Analysis of Ecological Data with ade4 - Jean Thioulouse
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Jean Thioulouse, Stéphane Dray, Anne-Béatrice Dufour, Aurélie Siberchicot, Thibaut Jombart and Sandrine Pavoine
Multivariate Analysis of Ecological Data with ade4
https://doi.org/10.1007/978-1-4939-8850-1_1
1. Introduction
Jean Thioulouse¹, Stéphane Dray¹, Anne-Béatrice Dufour¹, Aurélie Siberchicot¹, Thibaut Jombart² and Sandrine Pavoine³
(1) Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR 5558 – Université de Lyon, Villeurbanne, France
(2) Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
(3) Centre d’Ecologie et des Sciences de la Conservation (CESCO), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, Paris, France
Abstract
This introductory chapter presents the intended readership of the book and a short history of the ade4 software. It also describes the associated packages of the ade4 family and how to install and use these packages with R. Lastly, we provide a short presentation of the types of ecological data sets found in real case studies.
1.1 Intended Readership
Multivariate data analysis methods are not restricted to any particular application field: they have been used in many scientific domains. However, because of the background of its authors, ade4 has always been more particularly intended for biologists, especially in the field of Ecology. The subject area analysis of the list of scientific papers citing the three ade4 references (Thioulouse et al. 1997; Dray and Dufour 2007; Thioulouse and Dray 2007) highlights this trend (Fig. 1.1, source: ISI Web of Knowledge).
Fig. 1.1 Number of citations of the three ade4 papers by ISI research area (top 15 research areas). The total number of citations reached 2655 in March 2018. This figure was created with the treemap package (Tennekes 2017).
Researchers and students in ecological fields are therefore potentially interested in using multivariate analysis methods, and this book was primarily written for them. Other areas with fewer citations include, for example, Tropical Medicine, Physics Particles and Fields, Spectroscopy, Sociology and Literature. Researchers in these areas can also be interested in this book, but the examples used throughout the text come from ecological case studies.
Multivariate data analysis methods are particularly useful for analysing large data sets, for example tens or hundreds of variables measured on hundreds or thousands of samples. The synthetic properties of these methods are really helpful in this case. When fewer parameters and/or samples are available, other methods should be considered. Today, molecular biology methods provide huge data sets in almost every area of biology, and these can be analysed very effectively with multivariate analysis methods.
1.2 Evolutions of ade4
1.2.1 The ade4 Add-On Package for R
The current version of ade4 is an add-on package for the R software. This has important consequences for the user: you need to install R on your computer and learn to handle it before you can start using ade4. But it also has many advantages: learning to deal with R will be valuable beyond the use of ade4, as all the common statistical computations needed by biologists can be performed with R.
There is also an easy-to-use Graphical User Interface (GUI) implemented in the ade4TkGUI add-on package (Thioulouse and Dray 2007, see Appendix B). This GUI can facilitate the transition from previous versions of ade4 to the R package, or help beginners start to use R and ade4.
Another advantage is that R is multi-platform software: it runs on Windows, Mac and many Unix-like platforms, with optimised performance. Multi-platform compatibility also extends to the data file format. You can, for example, start computations on one computer (say a Windows PC) and save the results in the .RData file created at the end of the work session. You can then copy this .RData file to another computer (including a Mac or Linux PC) and continue computations without problem. The .RData file can even be stored on a network file server and used through the network on a Linux, Mac or Windows computer.
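As a minimal sketch of this workflow, objects can also be saved and restored explicitly with the base functions save and load. The object name and file path below are hypothetical, and a temporary file stands in for the .RData file that R writes at the end of a session.

```r
# Hypothetical result object created during a work session
results <- data.frame(site = c("S1", "S2", "S3"),
                      score = c(0.12, -0.45, 0.33))

# Save it to an .RData file (a temporary path is used for illustration)
rdata_file <- file.path(tempdir(), "session_example.RData")
save(results, file = rdata_file)

# Simulate a new session on another machine: drop the object, then restore it
saved_copy <- results
rm(results)
restored_names <- load(rdata_file)  # load() returns the names of the restored objects

restored_names
identical(results, saved_copy)
```

The same file can be copied to a computer running another operating system and read there with load in the same way.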
The first version of the ade4 package was submitted to CRAN in late 2002. It has kept evolving since that time: many functions have been added and several spin-off packages have appeared. The current version of the ade4 package is number 1.7-11. It comprises 225 functions and 108 data sets.
1.2.2 Previous Versions of ade4
Previous versions of ade4 date back to the early 1980s. Their evolution was cyclic, with periods of intense development that were needed to catch up with the fast evolution of operating systems and computer hardware. These periods were followed by several years of distribution of a stable version, during which the evolution was limited to the addition of new statistical or graphical methods.
Everything started from a small set of programs written in BASIC on the Data General Nova 3 minicomputer of the Biometry Lab (Lyon 1 University, France). The first move occurred in 1985: a diagonalisation procedure in assembly language was written for the Eclipse S/140 that had replaced the Nova. This procedure made it possible to compute the eigenvalues and eigenvectors of a matrix in a reasonable time, and thus to use multivariate data analysis methods interactively on real-size ecological data sets.
In the late 1980s, the Eclipse was discontinued and we switched from Data General to the new Apple Macintosh microcomputer. We ported the programs to Microsoft QuickBasic, and added a HyperCard interface. The first version of this new setup was called ADECO and its distribution started in 1989.
ADECO developed into ADE-3.7 in 1994, but it was still written in QuickBasic, which had been abandoned by Microsoft at that time. So in the mid-1990s, we switched again and started a new version, called ADE-4, completely re-written in C. This allowed us to propose a multi-platform solution in 1995, using a HyperCard user interface on Macintosh, and WinPlus on Windows PC.
A few years later, we decided to start teaching S-Plus to Master’s students. Courses began in 1999, but we eventually switched to R in 2001. After a few months of hesitation, we started working on the R version of the new ade4 package in early 2002, and submitted ade4-1.0 to CRAN in December 2002. Since that date, ade4 stands for Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences.
All these developments, over almost 40 years, were the fruit of the work of many people. Only a few are cited here; please forgive inconsistencies, errors or omissions. The first BASIC programs were written by (among others) Jean-Dominique Lebreton, Daniel Chessel and Jean Thioulouse. A little while later, the ADECO software benefited from the help of Sylvain Dolédec and Jean-Michel Olivier. During the 1980s and 1990s, many other people contributed to the work, including Yves Auda, Stéphane Champely, François Chevenet, James Devillers, Mohamed Hanafi, Yves Lasne, Monique Simier and Claire Boisson. The ADE-4 development was financially supported by several contracts with the French Ministry of Environment and the National Center for Scientific Research (CNRS). Alain Pavé, Richard Tomassone, Christian Gautier, Claude Amoros, Bernhard Statzner and Bernard Hugueny helped keep the boat afloat.
The R add-on package (ade4) started a new era, with many new contributors, among them Stéphane Dray, Anne-Béatrice Dufour, Aurélie Siberchicot, Jean Lobry, Sandrine Pavoine, Clément Calenge, Thibaut Jombart and Sébastien Ollier. The recent switch to GitHub introduced a new open development model (svn/git) and new contributors:
https://github.com/sdray/ade4/graphs/contributors
1.3 Using ade4
1.3.1 Computer Hardware
Any microcomputer sold today is sufficient to perform most ecological data analysis tasks. Even small laptops and netbooks have enough computing power to do a Principal Component Analysis (PCA) on a large ecological data table. Only a few computing-intensive tasks, like permutation tests on large tables, may require a more powerful desktop workstation with a faster CPU.
The size of the disk and of the main memory of mainstream microcomputers is more than enough for almost any data analysis problem. Even large DNA fingerprint, microarray or metagenomic data tables will easily fit. Data tables with thousands of rows and columns can be analysed without problem.
1.3.2 Installing R
The first step to start using ade4 is to install R. The R project homepage is here:
https://www.r-project.org/
and precompiled binary distributions are available for the main operating systems (Linux, Windows, Mac). A list of international mirrors can be used to choose the nearest source:
https://cran.r-project.org/mirrors.html
Instructions on how to download, install and run R can be found on all the mirrors. It is advisable to use the most recent version of the R software. Use the sessionInfo() function to get information about the current version of R and of attached or loaded packages.
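A short sketch of how this information can be queried from a running session is given below. It only uses base R; the variable name info is an arbitrary choice.

```r
# Version string of the running R installation
R.version.string

# Full session description: R version, platform, attached and loaded packages
info <- sessionInfo()
print(info)

# Individual fields can be extracted, e.g. the attached base packages
info$basePkgs
```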
1.3.3 Installing ade4
After installing R, you need to install the ade4 package. The easiest way to do this is to launch R and type the following command:
> install.packages("ade4")
This is to be done only once. After package installation, you must load the package with the following command:
> library(ade4)
This must be redone each time R is launched, but it can be automated by placing the library command in the .Rprofile file. See the Startup documentation in R for more information about this:
> ?Startup
This documentation page is very important and explains many things about the R startup mechanism.
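In scripts that are shared between computers, it can be convenient to install a package only when it is missing. The helper below is a hypothetical sketch (ensure_package is not part of R or ade4) built on the base functions requireNamespace and install.packages.

```r
# Hypothetical helper: returns TRUE if the package can be loaded,
# installing it from CRAN first only when install = TRUE
ensure_package <- function(pkg, install = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE) && install) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never triggers an installation
ensure_package("stats")

# Typical use for ade4 (needs an internet connection on the first run):
# if (ensure_package("ade4", install = TRUE)) library(ade4)
```

The installation step is kept behind an explicit argument so that running the script never downloads anything unless this is requested.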
The latest development versions of ade4 are available on GitHub:
https://github.com/sdray/ade4
The development version of ade4 can be easily installed using the functionality provided by the devtools package (Wickham et al. 2018):
> devtools::install_github("sdray/ade4")
1.3.4 Dependencies
Using advanced features of the ade4 package may require other R packages (called dependencies). You can install all the dependencies (i.e., all the packages potentially needed by ade4) at once by using the following install command:
> install.packages("ade4", dependencies = TRUE)
This will download many other packages and can take some time, depending on your internet connection speed.
1.3.5 Packages of the ade4 Family
Since the first release of ade4 on CRAN, several associated packages have been developed. These packages improve or extend the original functionalities of ade4:
adegraphics: An S4 Lattice-Based Package for the Representation of Multivariate Data
ade4TkGUI: Tcl/Tk Graphical User Interface
adespatial: Multivariate Multiscale Spatial Analysis
adephylo: Exploratory Analyses for the Phylogenetic Comparative Method
adegenet: Exploratory Analysis of Genetic and Genomic Data
adehabitat(HR/HS/LT/MA): Analysis of Habitat Selection by Animals
adiv: Analysis of Diversity
Some chapters of this book also require the use of packages adegraphics, ade4TkGUI, adespatial and adephylo.
1.3.5.1 adegraphics
The adegraphics package (Siberchicot et al. 2017, see Chapter 4) offers a flexible framework to create and manage graphics. It is based on the lattice package (Sarkar 2008) and contains the definitions of graphical S4 classes and methods that were previously implemented in ade4 as plain functions and S3 classes. A full chapter of this book is dedicated to this package (see Chap. 4).
adegraphics is available from CRAN mirrors, and it can be installed and loaded independently from ade4. adegraphics replaces some former implementations of graphical functions in ade4. If both packages are used, always load adegraphics after ade4 to make sure you are using the right version of the functions:
> library(ade4)
> library(adegraphics)
adegraphics is distributed with a tutorial vignette, which can be opened from R with the vignette or browseVignettes functions.
The latest development versions of adegraphics are available on GitHub:
https://github.com/sdray/adegraphics
1.3.5.2 ade4TkGUI
The ade4TkGUI package (Thioulouse and Dray 2007, see Appendix B) provides a graphical user interface for ade4. It depends on ade4 and adegraphics, which means that these two packages must be installed and that they are automatically loaded when ade4TkGUI is loaded. It is also available from CRAN mirrors, and you can install it just like you installed ade4:
> install.packages("ade4TkGUI")
The latest development versions of ade4TkGUI are available on GitHub:
https://github.com/aursiber/ade4TkGUI
1.3.5.3 adespatial
adespatial (Dray et al. 2018, see Chapter 12) provides tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran’s Eigenvector Maps, MEM).
adespatial is available from CRAN mirrors:
> install.packages("adespatial")
adespatial is distributed with a tutorial vignette, which can be opened from R with the vignette or browseVignettes functions.
The latest development versions of adespatial are available on GitHub:
https://github.com/sdray/adespatial
1.3.5.4 adephylo
adephylo (Jombart et al. 2010a, see Chapter 13) has been developed at the interface between packages for exploratory data analysis (ade4), phylogenetic reconstruction (ape, Paradis et al. 2004) and phylogenetic comparative methods (phylobase, R Hackathon et al. 2017). adephylo is available from CRAN mirrors:
> install.packages("adephylo")
adephylo replaces some former implementations of phylogenetic comparative methods in ade4, which are now deprecated.
adephylo is distributed with a tutorial vignette, which can be opened from R with the vignette or browseVignettes functions.
The latest development versions of adephylo are available on GitHub:
https://github.com/thibautjombart/adephylo
1.3.6 Version of the Packages Used in This Book
The versions of R and of the packages that were used to compile this book are given by the sessionInfo function:
> sessionInfo()
(output listing not reproduced)
1.3.7 The adelist Forum
The ade4 package homepage is here:
http://pbil.univ-lyon1.fr/ade4/home.php?lang=eng
A public forum and mailing list can be found at this address:
http://listes.univ-lyon1.fr/wws/info/adelist
This is the place where questions about all aspects of ade4 and related packages should be asked. All the users of ade4 should subscribe to this list, at least temporarily. To report problems or errors, you can use the GitHub functionality (e.g., https://github.com/sdray/ade4/issues for ade4). Do not forget to quote the result of the sessionInfo function.
1.3.8 Using the Help System
You are now ready to start using the ade4 package. You can browse through the package documentation using the HTML interface (see the help.start function). As in any R package, all the functions and data sets have a documentation page that can be accessed with the help function or, equivalently, with the ? operator.
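The calls below sketch the usual ways of reaching a documentation page. The base function read.table is used as the topic so that the example runs without ade4; the commented lines assume that ade4 is installed and use its dudi.pca function as an illustration.

```r
# Look up the documentation page of a function; printing the returned
# object displays the page (same result as typing ?read.table)
h <- help("read.table")
class(h)

# Documentation index of a whole package
idx <- help(package = "stats")
class(idx)

# With ade4 installed, the same calls apply to its functions and data sets:
# help("dudi.pca", package = "ade4")
# help(package = "ade4")
```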
1.4 Interactive Code Snippets
The code snippets used throughout this book are available online. They can be run, modified and checked thanks to the shiny system at the following address:
http://pbil.univ-lyon1.fr/ADE-4/book.php
1.5 Ecological Data Sets
The structure of ecological data sets can be very complex, but can generally be reduced to simpler forms, compatible with R data structures. Figures 1.2 and 1.3 show the main data structures used in ecological data analysis. These structures also correspond to particular data analysis methods in ade4.
Fig. 1.2 Common structures of ecological data sets. A: rectangular data table (site × environmental variables or site × species), B: distance matrix, C: row and column weights, D: data table with groups of sites, E: pair of ecological tables (X = environmental variables, Y = species data), F: pair of ecological tables with groups of sites, G: K-table, H: pair of K-tables.
Fig. 1.3 Common structures of ecological data sets (continued). I: pair of ecological tables with species traits, J: dissimilarities between species and communities composition tables, K: rectangular data table (site × environmental variables or site × species) with geographical coordinates (xy), L: rectangular data table with phylogenetic information between rows.
The most frequent data structure is a rectangular table with samples (sites) as rows and variables as columns (Fig. 1.2A). This structure corresponds to quantitative environmental variable data tables (sites × variables, see Chap. 5), and also to floro-faunistic tables (sites × species, see Chap. 6). It perfectly fits the R data frame structure, and can be used directly in the ade4 package for single-table multivariate analysis methods. The case of qualitative (or categorical) environmental variables also fits R data frames well, with the class of the corresponding columns set to factor. Mixes of quantitative and qualitative variables can also be stored in data frames, since data frame columns can have mixed types.
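A small base R sketch of such a table is given below. The variable names and values are invented for illustration and do not come from an ade4 data set.

```r
# Hypothetical site x variable table: 4 sites, two quantitative variables
# and one qualitative variable stored as a factor
env <- data.frame(
  temperature = c(10.2, 12.5, 15.1, 9.8),
  ph          = c(7.1, 6.8, 7.4, 6.5),
  substrate   = factor(c("gravel", "sand", "sand", "silt")),
  row.names   = c("S1", "S2", "S3", "S4")
)

str(env)                # structure of the table, one line per column
sapply(env, class)      # class of each column
levels(env$substrate)   # categories of the qualitative variable
```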
Another common practice in Ecology is to consider distance matrices. These distances can be either directly measured by ecologists or derived from original raw data (see functions dist.binary, dist.quant, etc. in ade4). Distances are used to describe dissimilarities among individuals such as genetic, morphometric or geographic distances. The analysis of distance matrices (Fig. 1.2B) requires an adequate statistical treatment and some methods are implemented in ade4 for that purpose (Sect. 6.5). In R, distance matrices are stored as objects of class dist.
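As an illustration, the base R function dist computes such an object from raw measurements. The data below are hypothetical, and the Euclidean distance is only one possible choice among the distances mentioned above.

```r
# Hypothetical quantitative measurements on four individuals
morpho <- data.frame(
  length = c(12.1, 13.4, 11.8, 15.0),
  width  = c(4.2, 4.9, 4.0, 5.6),
  row.names = c("ind1", "ind2", "ind3", "ind4")
)

# Euclidean distances between rows, computed with the base function dist()
d <- dist(morpho, method = "euclidean")

class(d)          # object of class "dist"
attr(d, "Size")   # number of individuals
as.matrix(d)      # full symmetric matrix, with zeros on the diagonal
```

Only the lower triangle is stored in a dist object, which is why as.matrix is needed to display the full square matrix.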
In ade4, all the multivariate analysis methods make use of row and column weights, and they are a very important part of the analysis itself. The row and column weights of a data table can be stored in numeric vectors (Fig. 1.2C). Weights are generally not defined by the user: they are associated with a particular analysis and are computed directly by ade4 functions. For instance, in correspondence analysis (Sect. 6.2), row and column weights are derived from the row and column totals of the data set. However, in some cases, these weights can also be defined by the user as external constraints. For instance, in the case of differential sampling effort, row weights can be chosen proportional to sampling intensities so that highly sampled sites have more weight in the analysis.
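The sketch below shows, in base R and on an invented abundance table, how weights of the kind used in correspondence analysis can be derived from row and column totals, and how user-defined weights proportional to a sampling effort could be built. It is an illustration of the idea, not the code used inside ade4.

```r
# Hypothetical site x species abundance table
spe <- matrix(c(10, 0, 5,
                 2, 8, 0,
                 0, 4, 6,
                 3, 3, 9),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("S", 1:4), c("sp1", "sp2", "sp3")))

total <- sum(spe)
row_w <- rowSums(spe) / total   # row weights: site totals / grand total
col_w <- colSums(spe) / total   # column weights: species totals / grand total

row_w
col_w
c(sum(row_w), sum(col_w))       # each set of weights sums to 1

# User-defined alternative: weights proportional to a hypothetical sampling effort
effort   <- c(2, 1, 1, 4)
effort_w <- effort / sum(effort)
effort_w
```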
In many cases, it is useful to define groups of samples to take into account different geographical locations, several types of habitats, or successive sampling dates (Fig. 1.2D). More generally, groups of samples can correspond to the experimental design used to collect the data, and it is very important to be able to take this information into account in the statistical analysis of the data set. We shall see in Chap. 7 that several methods exist in the ade4 package for this purpose. In R, a vector of class factor with a length equal to the number of rows of the data table is used to define groups of rows.
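A minimal base R sketch of such a grouping factor follows. The table, the variable names and the three sampling dates are hypothetical.

```r
# Hypothetical table: 6 samples, 2 variables
tab <- data.frame(
  oxygen  = c(8.1, 7.9, 6.5, 6.2, 9.0, 8.8),
  nitrate = c(1.2, 1.4, 3.1, 2.9, 0.8, 0.9)
)

# One factor value per row, here three sampling dates
date <- factor(c("spring", "spring", "summer", "summer", "winter", "winter"))

length(date) == nrow(tab)   # the factor must have one value per row
table(date)                 # number of samples in each group

# Group means of each variable, as a simple use of the grouping
group_means <- aggregate(tab, by = list(date = date), FUN = mean)
group_means
```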
When both the abundance of species and environmental variables are recorded at the same sites, it is possible to study how the species respond to environmental gradients. This is the most classical problem of ecological data analysis (see Chap. 8) and requires the simultaneous analysis of a pair of tables. The rows of the two tables must be identical, as they correspond to the same sampling sites. In ade4, one data frame is used to store the environmental variables and another to store the species data. These two data frames can be pre-processed by simple one-table analysis methods, and the resulting objects can then be passed to two-table coupling methods (Fig. 1.2E). If the rows of these two tables are also partitioned in groups, it is possible to study species-environment relationships in different conditions, treatments or areas (Fig. 1.2F).
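Before coupling two tables, it is worth checking that their rows really describe the same sites in the same order. The following base R sketch, on invented data, aligns a species table on an environmental table by row names.

```r
# Hypothetical environmental and species tables recorded at the same sites
env <- data.frame(depth = c(0.5, 1.2, 2.0), flow = c(0.8, 0.4, 0.1),
                  row.names = c("S1", "S2", "S3"))
spe <- data.frame(sp1 = c(12, 3, 0), sp2 = c(0, 5, 9),
                  row.names = c("S3", "S1", "S2"))  # rows in a different order

# Reorder the species table so that its rows match those of env
spe <- spe[rownames(env), , drop = FALSE]

# Both tables must describe the same sites, in the same order
stopifnot(identical(rownames(env), rownames(spe)))
spe
```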
When sampling is repeated over time, one gets a series of tables, called a K-table. In ade4 this information is stored in a compact and easy-to-use data structure (a list of class ktab). This structure provides functions allowing a straightforward manipulation of individual tables and of the whole series (Fig. 1.2G). Many methods are available in ade4 to analyse a ktab globally (see Chap. 9) and study how the structure of ecological communities changes in time. Pairs of ktab objects can be used to analyse the evolution of the relationships between species and environment (Fig. 1.2H, see Chap. 10).
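The idea of a series of tables can be sketched in base R with a plain named list, one table per sampling date. This is only an analogy on invented data: the list below is not an object of class ktab, which is built with dedicated ade4 functions.

```r
# Hypothetical long-format data: 3 sites sampled at 2 dates, 2 species
obs <- data.frame(
  date = factor(rep(c("t1", "t2"), each = 3)),
  site = rep(c("S1", "S2", "S3"), times = 2),
  sp1  = c(4, 0, 7, 5, 1, 6),
  sp2  = c(1, 3, 0, 2, 2, 1)
)

# One site x species table per date, collected in a named list
tables <- lapply(split(obs, obs$date), function(x) {
  data.frame(x[, c("sp1", "sp2")], row.names = as.character(x$site))
})

names(tables)         # one entry per sampling date
tables[["t2"]]        # a single table of the series
sapply(tables, dim)   # all tables share the same dimensions
```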
To improve our understanding of the functioning of ecological systems, it is possible to integrate information on species. Species traits can be integrated to identify which characteristics of species drive their response to environmental conditions. Several methods focusing on this question are presented in this book (Fig. 1.3I, see Chap. 11). Species traits can also be used to define species dissimilarities that are then used to measure functional diversities within or among communities (Fig. 1.3J, see Chap. 14).
Lastly, it should be noted that neither sites nor species can be considered as independent samples. Sites are usually georeferenced and thus have geographical attributes (Fig. 1.3K, see Chap. 12). On the other hand, species share some common evolutionary history that can be represented by a phylogenetic tree (Fig. 1.3L, see Chap. 13). The adespatial and adephylo packages provide tools to study spatial and phylogenetic autocorrelation, respectively, in order to understand how ecological properties are affected by spatial and phylogenetic relatedness.
References
Dray S, Dufour A (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22(4):1–20
Dray S, Blanchet G, Borcard D, Clappe S, Guenard G, Jombart T, Larocque G, Legendre P, Madi N, Wagner HH (2018) adespatial: multivariate multiscale spatial analysis. https://CRAN.R-project.org/package=adespatial, R package version 0.2-0
Jombart T, Balloux F, Dray S (2010a) adephylo: new tools for investigating the phylogenetic signal in biological traits. Bioinformatics 26:1907–1909
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290
R Hackathon et al (2017) phylobase: base package for phylogenetic structures and comparative data. https://CRAN.R-project.org/package=phylobase, R package version 0.8.4
Sarkar D (2008) Lattice: multivariate data visualization with R. Springer, New York
Siberchicot A, Julien-Laferrière A, Dufour AB, Thioulouse J, Dray S (2017) adegraphics: an S4 lattice-based package for the representation of multivariate data. R J 9(2):198–212
Tennekes M (2017) treemap: treemap visualization. https://CRAN.R-project.org/package=treemap, R package version 2.4-2
Thioulouse J, Dray S (2007) Interactive multivariate data analysis in R with the ade4 and ade4TkGUI packages. J Stat Softw 22(5):1–14
Thioulouse J, Chessel D, Dolédec S, Olivier J (1997) ADE-4: a multivariate analysis and graphical display software. Stat Comput 7(1):75–83
Wickham H, Hester J, Chang W (2018) devtools: tools to make developing R packages easier. https://CRAN.R-project.org/package=devtools, R package version 1.13.5
Jean Thioulouse, Stéphane Dray, Anne-Béatrice Dufour, Aurélie Siberchicot, Thibaut Jombart and Sandrine Pavoine
Multivariate Analysis of Ecological Data with ade4
https://doi.org/10.1007/978-1-4939-8850-1_2
2. Useful R Functions and Data Structures
Jean Thioulouse¹, Stéphane Dray¹, Anne-Béatrice Dufour¹, Aurélie Siberchicot¹, Thibaut Jombart² and Sandrine Pavoine³
(1) Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR 5558 – Université de Lyon, Villeurbanne, France
(2) Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
(3) Centre d’Ecologie et des Sciences de la Conservation (CESCO), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, Paris, France
Abstract
This chapter explains the basic R functions needed for data import and export operations, and for handling vectors, data tables and qualitative variables (factors). This introductory presentation is limited to a few key elements needed for multivariate data analysis in Ecology with the ade4 package. It is not intended as a general introduction to R, and if needed, the reader should refer to a basic book on R. See, for example, here: https://cran.r-project.org/manuals.html or here: https://cran.r-project.org/other-docs.html.
2.1 Introduction
Data preparation and import are among the most time-consuming operations in the process of analysing ecological data with the ade4 package in R. Raw data sets are often stored in spreadsheet documents, and there is a long way from these raw documents to the data table that can be used in a multivariate analysis. Both technical and theoretical considerations must be taken into account during these preparation steps.
Technical problems arise in the task of cleaning up the data, that is, for example, checking for special characters that could prevent normal reading of the raw file, checking for row and column names, verifying aberrant values, removing missing data, etc. Some of these steps must be taken in the spreadsheet software, and some should preferably be done in R.
More theoretical questions appear later, and they are related, for example, to which variables should be included or not in the data table, which type of data should be considered, which data analysis method should be used, etc. Most of these steps should be performed inside R, using its powerful data handling functions.
2.2 Basic Data Import and Export Functions
The basic functions for reading and writing data tables in R are read.table and write.table. The data function can be used to load a predefined data set, either from the base R distribution or from a contributed package like ade4.
2.2.1 read.table
The main data import function is read.table. This is the function to use when reading a text file (for example, a spreadsheet exported from Excel).
> read.table(file, header = FALSE, dec = ".")
The first argument (file) is the name of the file which the data are to be read from. The argument header is a logical value indicating whether the file contains the names of the variables as its first line. The dec argument can be used to set the decimal mark ("." by default). Many other arguments are described in the read.table documentation. Use help("read.table") in R to get access to this documentation.
From the spreadsheet software (see Fig. 2.1), the data table should be saved to a text file using the "Save as…" command. It is then possible to read this text file using the read.table function in R, and to store the result in a data frame. In the following example, the text file "MeauEnv.txt" is read and the resulting data are stored in the env data frame.
Fig. 2.1 Screenshot of an example Excel spreadsheet "MeauEnv.xls". The first row contains variable names, and the first column contains row names. The first cell (A,1) is left empty.
Note that for R, row names must be unique. Failing to follow this rule will prevent reading of the file. It results in the error message "duplicate ‘row.names’ are not allowed".
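The sketch below reproduces this layout on a small invented file (the variable names and values are hypothetical and are not the contents of MeauEnv.txt). It shows how read.table picks up row names from the first column when the header line has one field fewer than the data lines, and how duplicated row names stop the import.

```r
# Tab-delimited text file with an empty first cell in the header line
txt_file <- tempfile(fileext = ".txt")
writeLines(c("\tTemp\tFlow\tpH",
             "S1\t10.5\t41\t8.5",
             "S2\t12.0\t158\t8.3",
             "S3\t13.1\t198\t8.4"), txt_file)

# The first column is used as row names, the first line as variable names
env <- read.table(txt_file, header = TRUE)
env
rownames(env)

# Same layout, but with a duplicated row name: the import fails
dup_file <- tempfile(fileext = ".txt")
writeLines(c("\tTemp\tFlow",
             "S1\t10.5\t41",
             "S1\t12.0\t158"), dup_file)
msg <- tryCatch(read.table(dup_file, header = TRUE),
                error = function(e) conditionMessage(e))
msg   # error message returned by read.table
```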
Column names should not contain any special character, particularly spaces, as they would be interpreted by default as column separators. These names will be used by ade4 graphical functions as labels on factor maps, so they should be kept short and informative. For example, do not use long species names that would clutter factor maps, but short species codes (like "irve" for Iris versicolor). Usually, rows correspond to items (individuals, samples, etc.) and columns correspond to descriptors (variables, species, etc.).
Depending on the spreadsheet software preferences and on the computer system settings, the decimal mark in the text file can be a dot "." or a comma "," (this is particularly the case in some European countries). It is necessary to check this point and to set the dec argument accordingly, or to change the decimal mark in the text file using a compatible text editor.
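The effect of the dec argument can be checked on a small invented file that uses a comma as decimal mark (file contents and variable names are hypothetical):

```r
# Text file exported with a comma as decimal mark
comma_file <- tempfile(fileext = ".txt")
writeLines(c("\tTemp\tpH",
             "S1\t10,5\t8,5",
             "S2\t12,0\t8,3"), comma_file)

# With the default dec = ".", values such as "10,5" are not read as numbers
wrong <- read.table(comma_file, header = TRUE)
sapply(wrong, class)

# Setting dec = "," gives numeric columns
right <- read.table(comma_file, header = TRUE, dec = ",")
sapply(right, class)
right$Temp
```

Checking the class of each column right after the import is a quick way to detect a wrong decimal mark before running any analysis.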
By default, read.table transforms