Multivariate Analysis of Ecological Data with ade4
Ebook, 614 pages, 4 hours

About this ebook

This book introduces the ade4 package for R which provides multivariate methods for the analysis of ecological data. It is implemented around the mathematical concept of the duality diagram, and provides a unified framework for multivariate analysis.  The authors offer a detailed presentation of the theoretical framework of the duality diagram and also of its application to real-world ecological problems.  These two goals may seem contradictory, as they concern two separate groups of scientists, namely statisticians and ecologists. However, statistical ecology has become a scientific discipline of its own, and the good use of multivariate data analysis methods by ecologists implies a fair knowledge of the mathematical properties of these methods.

The organization of the book is based on ecological questions, but these questions correspond to particular classes of data analysis methods. The first chapters present both usual and multiway data analysis methods. Further chapters are dedicated for example to the analysis of spatial data, of phylogenetic structures, and of biodiversity patterns. One chapter deals with multivariate data analysis graphs.

In each chapter, the basic mathematical definitions of the methods and the outputs of the R functions available in ade4 are detailed in two different boxes. The text of the book itself can be read independently from these boxes. Thus the book offers the opportunity to find information about the ecological situation from which a question raises alongside the mathematical properties of methods that can be applied to answer this question, as well as the details of software outputs. Each example and all the graphs in this book come with executable R code. 

Language: English
Publisher: Springer
Release date: Nov 8, 2018
ISBN: 9781493988501

    Book preview

    Multivariate Analysis of Ecological Data with ade4 - Jean Thioulouse

    © Springer Science+Business Media, LLC, part of Springer Nature 2018

    Jean Thioulouse, Stéphane Dray, Anne-Béatrice Dufour, Aurélie Siberchicot, Thibaut Jombart and Sandrine Pavoine. Multivariate Analysis of Ecological Data with ade4. https://doi.org/10.1007/978-1-4939-8850-1_1

    1. Introduction

    Jean Thioulouse¹, Stéphane Dray¹, Anne-Béatrice Dufour¹, Aurélie Siberchicot¹, Thibaut Jombart² and Sandrine Pavoine³

    (1)

    Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR 5558 – Université de Lyon, Villeurbanne, France

    (2)

    Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK

    (3)

    Centre d’Ecologie et des Sciences de la Conservation (CESCO), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, Paris, France

    Abstract

    This introductory chapter presents the intended readership of the book and a short history of the ade4 software. It also describes the associated packages of the ade4 family and how to install and use these packages with R. Lastly, we provide a short presentation of the types of ecological data sets found in real case studies.

    1.1 Intended Readership

    Multivariate data analysis methods are not restricted to any particular application field: they have been used in many scientific domains. However, because of the background of its authors, ade4 has always been more particularly intended for biologists, especially in the field of Ecology. The subject area analysis of the list of scientific papers citing the three ade4 references (Thioulouse et al. 1997; Dray and Dufour 2007; Thioulouse and Dray 2007) highlights this trend (Fig. 1.1, source: ISI Web of Knowledge).

    Fig. 1.1

    Number of citations of the three ade4 papers by ISI research area (top 15 research areas). The total number of citations reaches 2655 in March 2018. This figure is created with the treemap package (Tennekes 2017).

    Researchers and students in ecological fields are therefore potentially interested in using multivariate analysis methods, and this book was primarily written for them. Other areas with fewer citations include, for example, Tropical Medicine, Physics Particles and Fields, Spectroscopy, Sociology and Literature. Researchers in these areas can also be interested in this book, but the examples used throughout the text come from ecological case studies.

    Multivariate data analysis methods are particularly useful to analyse large data sets, for example tens or hundreds of variables measured on hundreds or thousands of samples. The synthetic properties of these methods are really helpful in this case. When fewer parameters and/or samples are available, other methods should be considered. Today, molecular biology methods provide huge data sets in almost every area of biology, and these can be analysed very effectively with multivariate analysis methods.

    1.2 Evolutions of ade4

    1.2.1 The ade4 Add-On Package for R

    The current version of ade4 is an add-on package for the R software. This has important consequences for the user: you need to install R on your computer and learn to handle it before you can start using ade4. But it also has many advantages: learning to deal with R will be valuable beyond the use of ade4, as all the common statistical computations needed by biologists can be performed with R.

    There is also an easy-to-use Graphical User Interface (GUI) implemented in the ade4TkGUI add-on package (Thioulouse and Dray 2007, see Appendix B). This GUI can facilitate the transition from previous versions of ade4 to the R package, or help beginners start to use R and ade4.

    Another advantage is that R is multi-platform software. This means that it runs on Windows, Mac and many Unix-like platforms, with optimised performance. Multi-platform compatibility also extends to data file formats. You can, for example, start computations on one computer (say a Windows PC) and save the results in the .RData file created at the end of the work session. You can then copy this .RData file to another computer (including a Mac or Linux PC) and continue computations without problem. The .RData file can even be stored on a network file server and used through the network on a Linux, Mac or Windows computer.
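
    As a minimal sketch of this portability, a result can be written to an .RData file and restored later. The object name, its values and the temporary file path are hypothetical, and save() is used here on a single object rather than on the whole workspace image that R writes at the end of a session.

```r
# Results computed in a first work session (hypothetical values)
site_scores <- c(s1 = 0.8, s2 = -1.2, s3 = 0.4)

# Write the object to a platform-independent .RData file
rdata_file <- file.path(tempdir(), "session.RData")
save(site_scores, file = rdata_file)

# Later, possibly on another operating system after copying the file:
rm(site_scores)
load(rdata_file)      # restores site_scores under its original name
print(site_scores)
```

    The same file could be loaded unchanged on Windows, Mac or Linux, which is the point made in the paragraph above.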

    The first version of the ade4 package was submitted to CRAN in late 2002. It has kept evolving since then: many functions have been added and several spin-off packages have appeared. The current version of the ade4 package is 1.7-11. It comprises 225 functions and 108 data sets.

    1.2.2 Previous Versions of ade4

    Previous versions of ade4 date back to the early 1980s. Their evolution was cyclic, with periods of intense development that were needed to catch up with the fast evolution of operating systems and computer hardware. These periods were followed by several years of distribution of a stable version, during which the evolution was limited to the addition of new statistical or graphical methods.

    Everything started from a small set of programs written in BASIC on the Data General Nova 3 minicomputer of the Biometry Lab (Lyon 1 University, France). The first move occurred in 1985: a diagonalisation procedure in assembly language was written for the Eclipse S/140 that had replaced the Nova. This procedure made it possible to compute the eigenvalues and eigenvectors of a matrix in a reasonable time, and therefore to use multivariate data analysis methods interactively on real-size ecological data sets.

    In the late 1980s, the Eclipse was discontinued and we switched from Data General to the new Apple Macintosh microcomputer. We ported the programs to Microsoft QuickBasic, and added a HyperCard interface. The first version of this new setup was called ADECO and its distribution started in 1989.

    ADECO developed into ADE-3.7 in 1994, but it was still written in QuickBasic, which had been abandoned by Microsoft at that time. So in the mid-1990s, we switched again and started a new version, called ADE-4, completely re-written in C. This allowed us to propose a multi-platform solution in 1995, using a HyperCard user interface on Macintosh, and WinPlus on Windows PC.

    A few years later, we decided to start teaching S-Plus to Master’s students. Courses began in 1999, but we eventually switched to R in 2001. After a few months of hesitation, we started working on the R version of the new ade4 package in early 2002, and submitted ade4-1.0 to CRAN in December 2002. Since that date, ade4 stands for Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences.

    All these developments, over almost 40 years, were the fruit of the work of many people. Only a few are cited here; please forgive inconsistencies, errors or omissions. The first BASIC programs were written by (among others) Jean-Dominique Lebreton, Daniel Chessel and Jean Thioulouse. A little while later, the ADECO software benefited from the help of Sylvain Dolédec and Jean-Michel Olivier. During the 1980s and 1990s, many other people contributed to the work, including Yves Auda, Stéphane Champely, François Chevenet, James Devillers, Mohamed Hanafi, Yves Lasne, Monique Simier and Claire Boisson. The ADE-4 development was financially supported by several contracts with the French Ministry of Environment and the National Center for Scientific Research (CNRS). Alain Pavé, Richard Tomassone, Christian Gautier, Claude Amoros, Bernhard Statzner and Bernard Hugueny helped keep the boat afloat.

    The R add-on package (ade4) started a new era, with many new contributors, among them Stéphane Dray, Anne-Béatrice Dufour, Aurélie Siberchicot, Jean Lobry, Sandrine Pavoine, Clément Calenge, Thibaut Jombart and Sébastien Ollier. The recent switch to GitHub introduced a new open development model (svn/git) and new contributors:

    https://github.com/sdray/ade4/graphs/contributors

    1.3 Using ade4

    1.3.1 Computer Hardware

    Any microcomputer sold today is sufficient to perform most ecological data analysis tasks. Even small laptops and netbooks have enough computing power to do a Principal Component Analysis (PCA) on a large ecological data table. Only a few computing-intensive tasks like permutation tests on large tables can necessitate a more powerful desktop workstation with a faster CPU.

    The size of the disk and of the main memory of mainstream microcomputers is more than enough for almost any data analysis problem. Even large DNA fingerprint, microarray or metagenomic data tables will easily fit. Data tables with thousands of rows and columns can be analysed without problem.

    1.3.2 Installing R

    The first step to start using ade4 is to install R. The R project homepage is here:

    https://www.r-project.org/

    and precompiled binary distributions are available for the main operating systems (Linux, Windows, Mac). A list of international mirrors can be used to choose the nearest source:

    https://cran.r-project.org/mirrors.html

    Instructions on how to download, install and run R can be found on all the mirrors. It is advisable to use the most recent version of the R software. Use the sessionInfo() function to get information about the current version of R and about attached or loaded packages.
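
    A short sketch of how this version information can be inspected, using only base R. The element names used below (R.version, basePkgs, otherPkgs) are components of the object returned by sessionInfo(); the variable name si is arbitrary.

```r
# Version string of the running R installation
print(R.version.string)

# Full description of the current session
si <- sessionInfo()
print(si$R.version$version.string)  # same string, taken from sessionInfo()
print(si$basePkgs)                  # base packages currently attached
print(names(si$otherPkgs))          # attached add-on packages (NULL if none)
```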

    1.3.3 Installing ade4

    After installing R, you need to install the ade4 package. The easiest way to do this is to launch R and type the following command:

    [R command shown as an image in the original; not reproduced in this preview]

    This is to be done only once. After package installation, you must load the package with the following command:

    [R command shown as an image in the original; not reproduced in this preview]

    This must be redone each time R is launched, but it can be automated by placing the library command in the .Rprofile file. See the Startup documentation in R for more information about this:

    [R command shown as an image in the original; not reproduced in this preview]

    This documentation page is very important and explains many things about the R startup mechanism.
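
    The installation and loading steps described in this section can be sketched as follows. This is a reconstruction under stated assumptions, not the book's own listing: the installation command is left as a comment so that the block runs without an internet connection, and the loading step is guarded so that it does not fail when ade4 is absent.

```r
# Run once per R installation (needs an internet connection):
# install.packages("ade4")

# Run in every new R session:
if (requireNamespace("ade4", quietly = TRUE)) {
  library(ade4)
  print(packageVersion("ade4"))
} else {
  message("ade4 is not installed yet; run install.packages(\"ade4\") first")
}

# To load ade4 automatically, the line `library(ade4)` can be placed in the
# .Rprofile file; in an interactive session, ?Startup opens the page that
# documents the startup mechanism.
```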

    The latest development versions of ade4 are available on GitHub:

    https://github.com/sdray/ade4

    The development version of ade4 can be easily installed using the functionality provided by the devtools package (Wickham et al. 2018):

    [R command shown as an image in the original; not reproduced in this preview]
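
    A possible way to install the development version is sketched below. The wrapper function install_ade4_dev is hypothetical (it is not part of ade4 or devtools); it relies on devtools::install_github and on the repository path given above, and it is only defined here, not called, because it needs an internet connection.

```r
# Hypothetical helper: install the development version of ade4 from GitHub
install_ade4_dev <- function(repo = "sdray/ade4") {
  # devtools itself must be installed first
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github(repo)
}

# Uncomment to run (downloads and builds the package):
# install_ade4_dev()
```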

    1.3.4 Dependencies

    Using advanced features of the ade4 package can necessitate the use of other R packages (called dependencies). You can install all the dependencies (i.e., all the packages potentially needed by ade4) at once by using the following install command:

    [R command shown as an image in the original; not reproduced in this preview]

    This will download many other packages and can take some time, depending on your internet connection speed.

    1.3.5 Packages of the ade4 Family

    Since the first release of ade4 on CRAN, several associated packages have been developed. These packages improve or extend the original functionalities of ade4:

    adegraphics: An S4 Lattice-Based Package for the Representation of Multivariate Data

    ade4TkGUI: Tcl/Tk Graphical User Interface

    adespatial: Multivariate Multiscale Spatial Analysis

    adephylo: Exploratory Analyses for the Phylogenetic Comparative Method

    adegenet: Exploratory Analysis of Genetic and Genomic Data

    adehabitat(HR/HS/LT/MA): Analysis of Habitat Selection by Animals

    adiv: Analysis of Diversity

    Some chapters of this book also require the use of packages adegraphics, ade4TkGUI, adespatial and adephylo.
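
    Before starting these chapters, it can be useful to check which of the four packages are already installed. The sketch below uses only base R; the variable names are arbitrary, and the installation call is left as a comment because it requires an internet connection.

```r
# Packages of the ade4 family required by some chapters
family <- c("adegraphics", "ade4TkGUI", "adespatial", "adephylo")

# system.file() returns "" for a package that is not installed
is_installed <- vapply(
  family,
  function(p) nzchar(system.file(package = p)),
  logical(1)
)
print(is_installed)

# Names of the packages that still need to be installed
missing_pkgs <- names(is_installed)[!is_installed]
# install.packages(missing_pkgs)
```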

    1.3.5.1 adegraphics

    The adegraphics package (Siberchicot et al. 2017, see Chapter 4) offers a flexible framework to create and manage graphics. It is based on the lattice package (Sarkar 2008) and contains the definitions of graphical S4 classes and methods that were previously implemented in ade4 as plain functions and S3 classes. A full chapter of this book is dedicated to this package (see Chap. 4).

    adegraphics is available from CRAN mirrors, and it can be installed and loaded independently from ade4. adegraphics replaces some former implementations of graphical functions in ade4. If both packages are used, always load adegraphics after ade4 to make sure you are using the right version of the functions:

    [R command shown as an image in the original; not reproduced in this preview]

    adegraphics is distributed with a tutorial vignette which can be accessed using:

    [R command shown as an image in the original; not reproduced in this preview]
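
    The loading order and the access to the vignette can be sketched as follows. This is an illustration, not the book's listing: the block is guarded so that it does nothing when the packages are missing, and vignette(package = ...) is used to list the available vignettes because the exact vignette name is not given in this preview.

```r
pkgs <- c("ade4", "adegraphics")
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(available)) {
  library(ade4)
  library(adegraphics)   # loaded last, so its graphical functions take precedence
  print(search()[2:3])   # the two most recently attached packages
  print(vignette(package = "adegraphics"))  # list the vignettes of the package
} else {
  message("Install ade4 and adegraphics to run this example")
}
```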

    The latest development versions of adegraphics are available on GitHub:

    https://github.com/sdray/adegraphics

    1.3.5.2 ade4TkGUI

    The ade4TkGUI package (Thioulouse and Dray 2007, see Appendix B) provides a graphical user interface for ade4. It depends on ade4 and adegraphics, which means that these two packages must be installed and that they are automatically loaded when ade4TkGUI is loaded. It is also available from CRAN mirrors, and you can install it just like you installed ade4:

    [R command shown as an image in the original; not reproduced in this preview]

    The latest development versions of ade4TkGUI are available on GitHub:

    https://github.com/aursiber/ade4TkGUI

    1.3.5.3 adespatial

    adespatial (Dray et al. 2018, see Chapter 12) provides tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran’s Eigenvector Maps, MEM).

    adespatial is available from CRAN mirrors:

    [R command shown as an image in the original; not reproduced in this preview]

    adespatial is distributed with a tutorial vignette which can be accessed using:

    [R command shown as an image in the original; not reproduced in this preview]

    The latest development versions of adespatial are available on GitHub:

    https://github.com/sdray/adespatial

    1.3.5.4 adephylo

    adephylo (Jombart et al. 2010a, see Chapter 13) has been developed at the interface between packages for exploratory data analysis (ade4), phylogenetic reconstruction (ape, Paradis et al. 2004) and phylogenetic comparative methods (phylobase, R Hackathon et al. 2017). adephylo is available from CRAN mirrors:

    [R command shown as an image in the original; not reproduced in this preview]

    adephylo replaces some former implementations of phylogenetic comparative methods in ade4, which are now deprecated.

    adephylo is distributed with a tutorial vignette which can be accessed using:

    [R command shown as an image in the original; not reproduced in this preview]

    The latest development versions of adephylo are available on GitHub:

    https://github.com/thibautjombart/adephylo

    1.3.6 Version of the Packages Used in This Book

    The versions of R and of the packages that were used to compile this book are given by the sessionInfo function:

    [sessionInfo() output shown as images in the original; not reproduced in this preview]

    1.3.7 The adelist Forum

    The ade4 package homepage is here:

    http://pbil.univ-lyon1.fr/ade4/home.php?lang=eng

    A public forum and mailing list can be found at this address:

    http://listes.univ-lyon1.fr/wws/info/adelist

    This is the place where questions about all aspects of ade4 and related packages should be asked. All the users of ade4 should subscribe to this list, at least temporarily. To report problems or errors, you can use the GitHub functionality (e.g., https://github.com/sdray/ade4/issues for ade4). Do not forget to quote the result of the sessionInfo function.

    1.3.8 Using the Help System

    You are now ready to start using the ade4 package. You can browse through the package documentation using the HTML interface (see the help.start function). As in any R package, all the functions and data sets have a documentation page that can be accessed with the help command:

    [R command shown as an image in the original; not reproduced in this preview]

    or

    [R command shown as an image in the original; not reproduced in this preview]
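
    The two forms of the help command can be illustrated as follows. The topics chosen here (read.table from base R and dudi.pca from ade4) are assumptions made for the example, since the original listings are not available; the ade4 part is guarded so that the block also runs when the package is not installed.

```r
# help() returns an object describing the matching documentation page;
# printing it (or typing the call at the prompt) displays the page
h <- help("read.table")          # base R topic, always available
print(class(h))

if (requireNamespace("ade4", quietly = TRUE)) {
  h_ade4 <- help("dudi.pca", package = "ade4")
  print(length(h_ade4))          # number of matching pages
}

# In an interactive session, the shorter forms are:
# ?read.table
# ?dudi.pca
# help.start()    # HTML interface to the whole documentation
```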

    1.4 Interactive Code Snippets

    The code snippets used throughout this book are available online. They can be run, modified and checked thanks to the shiny system at the following address:

    http://pbil.univ-lyon1.fr/ADE-4/book.php

    1.5 Ecological Data Sets

    The structure of ecological data sets can be very complex, but can generally be reduced to simpler forms, compatible with R data structures. Figures 1.2 and 1.3 show the main data structures used in ecological data analysis. These structures also correspond to particular data analysis methods in ade4.

    Fig. 1.2

    Common structures of ecological data sets. A: rectangular data table (site × environmental variables or site × species), B: distance matrix, C: row and column weights, D: data table with groups of sites, E: pair of ecological tables (X = environmental variables, Y = species data), F: pair of ecological tables with groups of sites, G: K-table, H: pair of K-tables.

    Fig. 1.3

    Common structures of ecological data sets (continued). I: pair of ecological tables with species traits, J: dissimilarities between species and communities composition tables, K: rectangular data table (site × environmental variables or site × species) with geographical coordinates (xy), L: rectangular data table with phylogenetic information between rows.

    The most frequent data structure is a rectangular table with samples (sites) as rows and variables as columns (Fig. 1.2A). This structure corresponds to quantitative environmental variable data tables (sites × variables, see Chap. 5), and also to floro-faunistic tables (sites × species, see Chap. 6). It perfectly fits the R data frame structure, and can be used directly in the ade4 package for single-table multivariate analysis methods. The case of qualitative (or categorical) environmental variables also fits R data frames well, with the class of the corresponding columns set to factor. Mixes of quantitative and qualitative variables can also be stored in data frames, since the columns of a data frame can have different types.
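
    A small sketch of such a table, with sites as rows and a mix of quantitative and qualitative variables as columns. The site names, variable names and values are invented for the illustration.

```r
# Hypothetical sites x variables table: two quantitative columns, one factor
env <- data.frame(
  temp      = c(12.1, 14.3, 9.8, 11.0),
  ph        = c(7.2, 6.8, 7.9, 7.4),
  substrate = factor(c("sand", "gravel", "sand", "mud")),
  row.names = c("S1", "S2", "S3", "S4")
)

str(env)                            # structure of the data frame
col_classes <- sapply(env, class)   # class of each column
print(col_classes)
```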

    Another common practice in Ecology is to consider distance matrices. These distances can be either directly measured by ecologists or derived from original raw data (see functions dist.binary, dist.quant, etc. in ade4). Distances are used to describe dissimilarities among individuals such as genetic, morphometric or geographic distances. The analysis of distance matrices (Fig. 1.2B) requires an adequate statistical treatment and some methods are implemented in ade4 for that purpose (Sect. 6.5). In R, distance matrices are stored as objects of class dist.
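
    As an illustration of the dist class, the sketch below derives Euclidean distances between sites from a small invented table. It uses the base R function dist() rather than the ade4 functions named above, so that it runs without any add-on package.

```r
# Hypothetical quantitative table: 4 sites x 2 variables
env_quant <- data.frame(
  temp = c(12.1, 14.3, 9.8, 11.0),
  ph   = c(7.2, 6.8, 7.9, 7.4),
  row.names = c("S1", "S2", "S3", "S4")
)

d <- dist(env_quant)        # Euclidean distances between rows (sites)
print(class(d))             # "dist"
print(d)                    # lower triangle only

# A dist object can be expanded to a full symmetric matrix when needed
d_mat <- as.matrix(d)
print(dim(d_mat))
```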

    In ade4, all the multivariate analysis methods make use of row and column weights, and they are a very important part of the analysis itself. The row and column weights of a data table can be stored in numeric vectors (Fig. 1.2C). Weights are generally not defined by the user: they are associated with a particular analysis and are computed directly by ade4 functions. For instance, in correspondence analysis (Sect. 6.2), row and column weights are derived from the row and column totals of the data set. However, in some cases, these weights can also be defined by the user as external constraints. For instance, in the case of differential sampling effort, row weights can be chosen proportional to sampling intensities so that highly sampled sites have more weight in the analysis.
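
    The correspondence analysis case can be sketched with base R: weights are obtained by dividing the row and column totals by the grand total. The species table is invented, and the comparison with the lw and cw components returned by dudi.coa is an optional check that only runs when ade4 is installed.

```r
# Hypothetical sites x species abundance table
spe <- data.frame(
  sp1 = c(10, 0, 5),
  sp2 = c(2, 8, 0),
  sp3 = c(0, 4, 11),
  row.names = c("S1", "S2", "S3")
)

total <- sum(spe)
row_w <- rowSums(spe) / total   # row weights, sum to 1
col_w <- colSums(spe) / total   # column weights, sum to 1
print(row_w)
print(col_w)

# Optional comparison with the weights stored by ade4
if (requireNamespace("ade4", quietly = TRUE)) {
  coa <- ade4::dudi.coa(spe, scannf = FALSE, nf = 2)
  print(all.equal(as.numeric(row_w), as.numeric(coa$lw)))
  print(all.equal(as.numeric(col_w), as.numeric(coa$cw)))
}
```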

    In many cases, it is useful to define groups of samples to take into account different geographical locations, several types of habitats, or successive sampling dates (Fig. 1.2D). More generally, groups of samples can correspond to the experimental design used to collect the data, and it is very important to be able to take this information into account in the statistical analysis of the data set. We shall see in Chap. 7 that several methods exist in the ade4 package for this purpose. In R, a vector of class factor with a length equal to the number of rows of the data table is used to define groups of rows.
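
    A minimal sketch of this grouping structure, with an invented table of six sites and a hypothetical habitat factor. Only base R is used.

```r
# Hypothetical table: one variable measured at six sites
env <- data.frame(
  temp = c(12.1, 14.3, 9.8, 11.0, 13.2, 10.5),
  row.names = paste0("S", 1:6)
)

# One factor value per row of the table defines the groups of sites
habitat <- factor(c("riffle", "pool", "riffle", "pool", "riffle", "pool"))
stopifnot(length(habitat) == nrow(env))

print(table(habitat))                          # number of sites per group
group_means <- tapply(env$temp, habitat, mean) # mean temperature per group
print(group_means)
```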

    When both the abundance of species and environmental variables are recorded at the same site, it is possible to study how the species respond to environmental gradients. This is the most classical problem of ecological data analysis (see Chap. 8) and requires the simultaneous analysis of a pair of tables. The rows of the two tables must be identical, as they correspond to the same sampling sites. In ade4, one data frame is used to store the environmental variables and another to store the species data. These two data frames can be pre-processed by simple one-table analysis methods, and the resulting objects can then be passed to two-table coupling methods (Fig. 1.2E). If the rows of these two tables are also partitioned in groups, it is possible to study species-environment relationships in different conditions, treatments or areas (Fig. 1.2F).
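
    The requirement that both tables share the same rows can be checked before any analysis. The sketch below uses two invented data frames whose sites are stored in a different order, and aligns the species table on the environmental table; it illustrates the check only, not the coupling methods themselves.

```r
# Hypothetical environmental table
env <- data.frame(
  temp = c(12.1, 14.3, 9.8),
  ph   = c(7.2, 6.8, 7.9),
  row.names = c("S1", "S2", "S3")
)

# Hypothetical species table with the same sites in another order
spe <- data.frame(
  sp1 = c(5, 10, 0),
  sp2 = c(0, 2, 8),
  row.names = c("S3", "S1", "S2")
)

# Reorder the rows of spe so that they match the rows of env
spe <- spe[rownames(env), , drop = FALSE]

# The pair of tables is usable only if the row names are identical
stopifnot(identical(rownames(env), rownames(spe)))
print(cbind(env, spe))
```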

    When sampling is repeated over time, one gets a series of tables, called a K-table. In ade4 this information is stored in a compact and easy-to-use data structure (a list of class ktab). This structure provides functions allowing a straightforward manipulation of individual tables and of the whole series (Fig. 1.2G). Many methods are available in ade4 to analyse ktab objects globally (see Chap. 9) and study how the structure of ecological communities changes over time. Pairs of ktab objects can be used to analyse the evolution of the relationships between species and environment (Fig. 1.2H, see Chap. 10).

    To improve our understanding of the functioning of ecological systems, it is possible to integrate information on species. Species traits can be integrated to identify which characteristics of species drive their response to environmental conditions. Several methods focusing on this question are presented in this book (Fig. 1.3I, see Chap. 11). Species traits can also be used to define species dissimilarities that are then used to measure functional diversities within or among communities (Fig. 1.3J, see Chap. 14).

    Lastly, it should be noted that neither sites nor species can be considered as independent samples. Sites are usually georeferenced and thus have geographical attributes (Fig. 1.3K, see Chap. 12). On the other hand, species share some common evolutionary history that can be represented by a phylogenetic tree (Fig. 1.3L, see Chap. 13). The adespatial and adephylo packages provide tools to study spatial and phylogenetic autocorrelation, respectively, in order to understand how ecological properties are affected by spatial and phylogenetic relatedness.

    References

    Dray S, Dufour A (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22(4):1–20

    Dray S, Blanchet G, Borcard D, Clappe S, Guenard G, Jombart T, Larocque G, Legendre P, Madi N, Wagner HH (2018) adespatial: multivariate multiscale spatial analysis. https://CRAN.R-project.org/package=adespatial, R package version 0.2-0

    Jombart T, Balloux F, Dray S (2010a) adephylo: new tools for investigating the phylogenetic signal in biological traits. Bioinformatics 26:1907–1909

    Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290

    R Hackathon et al (2017) phylobase: base package for phylogenetic structures and comparative data. https://CRAN.R-project.org/package=phylobase, R package version 0.8.4

    Sarkar D (2008) Lattice: multivariate data visualization with R. Springer, New York

    Siberchicot A, Julien-Laferrière A, Dufour AB, Thioulouse J, Dray S (2017) adegraphics: an S4 lattice-based package for the representation of multivariate data. R J 9(2):198–212

    Tennekes M (2017) treemap: treemap visualization. https://CRAN.R-project.org/package=treemap, R package version 2.4-2

    Thioulouse J, Dray S (2007) Interactive multivariate data analysis in R with the ade4 and ade4TkGUI packages. J Stat Softw 22(5):1–14

    Thioulouse J, Chessel D, Dolédec S, Olivier J (1997) ADE-4: a multivariate analysis and graphical display software. Stat Comput 7(1):75–83

    Wickham H, Hester J, Chang W (2018) devtools: tools to make developing R packages easier. https://CRAN.R-project.org/package=devtools, R package version 1.13.5

    Jean Thioulouse, Stéphane Dray, Anne-Béatrice Dufour, Aurélie Siberchicot, Thibaut Jombart and Sandrine Pavoine. Multivariate Analysis of Ecological Data with ade4. https://doi.org/10.1007/978-1-4939-8850-1_2

    2. Useful R Functions and Data Structures

    Jean Thioulouse¹, Stéphane Dray¹, Anne-Béatrice Dufour¹, Aurélie Siberchicot¹, Thibaut Jombart² and Sandrine Pavoine³

    (1)

    Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR 5558 – Université de Lyon, Villeurbanne, France

    (2)

    Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK

    (3)

    Centre d’Ecologie et des Sciences de la Conservation (CESCO), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, Paris, France

    Abstract

    This chapter explains the basic R functions needed for data import and export operations, and for handling vectors, data tables and qualitative variables (factors). This introductory presentation is limited to a few key elements needed for multivariate data analysis in Ecology with the ade4 package. It is not intended as a general introduction to R, and if needed, the reader should refer to a basic book on R. See, for example, here: https://cran.r-project.org/manuals.html or here: https://cran.r-project.org/other-docs.html.

    2.1 Introduction

    Data preparation and import are among the most time-consuming operations in the process of analysing ecological data with the ade4 package in R. Raw data sets are often stored in spreadsheet documents, and it is a long way from these raw documents to the data table that can be used in a multivariate analysis. Both technical and theoretical considerations must be taken into account during these preparation steps.

    Technical problems arise in the task of cleaning up the data, that is, for example, checking for special characters that could prevent normal reading of the raw file, checking for row and column names, verifying aberrant values, removing missing data, etc. Some of these steps must be taken in the spreadsheet software, and some should preferably be done in R.

    More theoretical questions appear later, and they are related, for example, to which variables should be included or not in the data table, which type of data should be considered, which data analysis method should be used, etc. Most of these steps should be performed inside R, using its powerful data handling functions.

    2.2 Basic Data Import and Export Functions

    The basic functions for reading and writing data tables in R are read.table and write.table. The data function can be used to load a predefined data set, either from the base R distribution or from a contributed package like ade4.

    2.2.1 read.table

    The main data import function is read.table. This is the function to use when reading a text file (for example, a spreadsheet exported from Excel).

    > read.table(file, header = FALSE, dec = ".")

    The first argument (file) is the name of the file from which the data are to be read. The argument header is a logical value indicating whether the file contains the names of the variables as its first line. The dec argument can be used to set the decimal mark ("." by default). Many other arguments are described in the read.table documentation. Use help(read.table) in R to get access to this documentation.

    From the spreadsheet software (see Fig. 2.1), the data table should be saved to a text file using the Save as… command. It is then possible to read this text file using the read.table function in R, and to store the result in a data frame. In the following example, the text file MeauEnv.txt is read and the resulting data are stored in the env data frame.

    Fig. 2.1

    Screenshot of an example Excel spreadsheet MeauEnv.xls. The first row contains variable names, and the first column contains row names. The first cell (A,1) is left empty.

    [R command shown as an image in the original; not reproduced in this preview]
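
    Since the MeauEnv.txt file is not available here, the reading step can be sketched with a small invented text file that has the same layout as the spreadsheet of Fig. 2.1: variable names in the first line, row names in the first column, and an empty first cell. The file name, variable names and values are hypothetical.

```r
# Write a small tab-separated file mimicking an exported spreadsheet
txt_file <- file.path(tempdir(), "example_env.txt")
writeLines(
  c("\tTemp\tpH\tFlow",
    "S1\t12.1\t7.2\t35",
    "S2\t14.3\t6.8\t42",
    "S3\t9.8\t7.9\t18"),
  txt_file
)

# The header line has one field fewer than the data lines, so read.table
# uses the first column as row names
env <- read.table(txt_file, header = TRUE)
print(env)
str(env)
```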

    Note that for R, row names must be unique. Failing to follow this rule will prevent reading of the file. It results in the error message duplicate ‘row.names’ are not allowed.
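
    This failure can be reproduced safely by catching the error. The file below is invented and deliberately contains the row name S1 twice; the make.unique() call at the end is one possible way to repair duplicated names, shown here on a plain character vector.

```r
# Hypothetical file in which the row name S1 appears twice
dup_file <- file.path(tempdir(), "duplicated_rows.txt")
writeLines(
  c("\tTemp\tpH",
    "S1\t12.1\t7.2",
    "S1\t14.3\t6.8"),
  dup_file
)

# tryCatch() returns the error object instead of stopping the script
res <- tryCatch(
  read.table(dup_file, header = TRUE),
  error = function(e) e
)
if (inherits(res, "error")) {
  message("read.table failed: ", conditionMessage(res))
}

# Duplicated names can be made unique before being used as row names
fixed_names <- make.unique(c("S1", "S1"))
print(fixed_names)
```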

    Column names should not contain any special character, particularly spaces, as they would be interpreted by default as column separators. These names will be used by ade4 graphical functions as labels on factor maps, so they should be kept short and informative. For example, do not use long species names that would clutter factor maps, but short species codes (like irve for Iris versicolor). Usually, rows correspond to items (individuals, samples, etc.) and columns correspond to descriptors (variables, species, etc.).

    Depending on the spreadsheet software preferences and on the computer system settings, the decimal mark in the text file can be a dot (.) or a comma (,); the comma is used in particular in some European countries. It is necessary to check this point and to set the dec argument accordingly, or to change the decimal mark in the text file using a compatible text editor.
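
    The effect of the dec argument can be seen on a small invented file written with decimal commas. Reading it with the default setting does not produce numeric columns, whereas dec = "," gives the expected numbers.

```r
# Hypothetical file using a comma as decimal mark
comma_file <- file.path(tempdir(), "decimal_comma.txt")
writeLines(
  c("\tTemp\tpH",
    "S1\t12,1\t7,2",
    "S2\t14,3\t6,8"),
  comma_file
)

wrong <- read.table(comma_file, header = TRUE)             # default dec = "."
right <- read.table(comma_file, header = TRUE, dec = ",")  # decimal comma

print(sapply(wrong, class))   # not numeric: values were read as text
print(sapply(right, class))   # numeric
print(right)
```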

    By default, read.table transforms
