R coding for data analysts: from beginner to advanced
Ebook, 503 pages, 2 hours

About this ebook

This book on coding with R for aspiring data analysts is designed to be a guide in this programming language from the basics. By the end of this book, you will be able to create, import, manipulate and manage datasets. We will learn together how to download, install and use some of the most important tools and libraries for using R. We will then move on to the creation of objects: R is based on certain structures that you need to know, such as vectors, matrices, lists and dataframes. Once we understand how to create and manipulate these data structures, extract elements from them and save them locally on the computer, we will move on to the use of loops and the creation of functions. 

We will look at a number of useful topics: how to set up a working directory, how to install and retrieve a package, how to get information about data, where to find datasets for testing, and how to get help with a function. When analysing data, we need to understand the concept of dataset or dataframe. We will therefore see how to import a dataframe from your computer, or from the internet, into R. There are many functions that are suitable for this purpose, and many packages that are useful for importing data that is in some particular format, such as the formats for Excel, .csv, .txt or JSON. We will then see how to manipulate data, create new variables, aggregate data, sort them horizontally and longitudinally, and how to merge two datasets. To do this, we will use some specific packages and functions, such as dplyr, tidyr or reshape2. We will also briefly see how to interface with a database and use other packages to streamline the management of somewhat larger datasets. 

R is also a very important language in the field of statistics. We will therefore learn some of the basic functions, such as calculating averages per row or per column, and the most common statistical functions in the field of descriptive statistics. When it comes to data analysis, we will often find ourselves creating graphs to explain our data and analyses. For this reason, we devote part of the book to seeing how to create graphs with both the functions of the basic library and the ggplot2 package. In the final sections, we will see how to create and export reports and slides, summarise the topics we have seen and the functions we have used, and look at the supporting material.
Language: English
Release date: Aug 10, 2023
ISBN: 9791222434810
    Book preview

    R coding for data analysts - Porcu Valentina

    Valentina Porcu © 2023

    Introduction

    R is a programming language born in the early 90s as an implementation of S, a language for statistical analysis developed earlier at Bell Laboratories. The development of R as a language of its own is due to Robert Gentleman and Ross Ihaka, of the University of Auckland, New Zealand. Initially, the two researchers meant to develop software for their students. But in 1993, encouraged by the success of the project, they decided to turn it into open source software. R is in fact free software, distributed under the GNU GPL license, and can be downloaded and installed from r-project.org. To simplify our work in R, we can use development environments, the most famous of which is RStudio, which was developed specifically for R.

    R, together with RStudio, is a powerful combination suitable for various application areas:

    programming: R can be used to write scripts and functions aimed at data analysis and management

    it is one of the most important programming languages in data analysis, statistics, data visualisation, and for the creation of predictive models

    it is among the first languages created for statistical analysis, but it has kept up with the times, and is one of the most flexible and simple in all the steps of the data life cycle

    it is open source, so it can be downloaded and used free of charge, either for individual projects or in collaboration with other data analysts

    it is simple: with just a few lines of code we can start exploring data, since many basic tools and datasets are already pre-loaded and easy to access.
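    As a minimal sketch of the last point, the lines below explore mtcars, one of the datasets that ships with a standard R installation. The choice of dataset and the variable name mean_mpg are illustrative assumptions, not taken from the book.

```r
# mtcars is bundled with R (datasets package), so nothing has to be imported
data(mtcars)

head(mtcars, 3)   # show the first three rows
str(mtcars)       # show the structure: column names and types

# Average fuel consumption (miles per gallon) across all cars
mean_mpg <- mean(mtcars$mpg)
mean_mpg
```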

    This book was conceived as an introduction to programming in R. It is meant to be an agile guide for people who are beginning to study programming and want to learn the ropes of R.

    After this Introduction, where we learn how to install, customize and use the main tools we need to program properly, Chapter One covers the basics of the language: data structures, relational operators, control structures and functions. We will then learn how to create objects in R and how to use our first functions, for instance to reorder a vector or add columns to a matrix.
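    The two operations mentioned as examples, reordering a vector and adding a column to a matrix, can be sketched with base R functions as follows. The object names and values are hypothetical and chosen only for illustration.

```r
# Reorder a numeric vector
v <- c(5, 2, 9, 1)
v_sorted <- sort(v)                    # ascending order
v_desc   <- sort(v, decreasing = TRUE) # descending order

# Create a 3 x 2 matrix (filled column by column) and add a third column
m  <- matrix(1:6, nrow = 3)
m2 <- cbind(m, c(7, 8, 9))

v_sorted
dim(m2)   # number of rows and columns after adding the column
```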

    In Chapter Two we will learn how to set up our working environment in R, how to install and retrieve a package, and how to get help in case of doubts and problems.
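    A minimal sketch of the first two tasks, assuming base R functions only: the temporary directory stands in for a real project folder, and dplyr is used merely as an example package name. The installation line is left commented out because it needs an internet connection.

```r
# Inspect and change the working directory
getwd()                        # where R currently reads and writes files
old_wd <- setwd(tempdir())     # setwd() returns the previous directory
getwd()
setwd(old_wd)                  # go back to where we started

# Install (once) and load (every session) a package
pkg <- "dplyr"                 # example package name, an assumption
if (!requireNamespace(pkg, quietly = TRUE)) {
  # install.packages(pkg)      # uncomment to install from CRAN
  message("Package '", pkg, "' is not installed yet")
} else {
  library(pkg, character.only = TRUE)
}
```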

    Chapter Three is dedicated to importing data into R from various formats, mainly .csv, but also Excel, .txt and other formats.
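    To give an idea of what a .csv import looks like, the sketch below first writes a small file to a temporary location and then reads it back with read.csv(). The file and its contents are made up for the example; with real data only the read.csv() call would be needed.

```r
# Create a small example file so the import can be tried anywhere
csv_path <- tempfile(fileext = ".csv")
sample_data <- data.frame(id = 1:3, city = c("Rome", "Milan", "Turin"))
write.csv(sample_data, csv_path, row.names = FALSE)

# Import the .csv file into a dataframe
imported <- read.csv(csv_path)
str(imported)

# Remove the temporary file
unlink(csv_path)
```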

    In Chapter Four we will begin to talk about how to manipulate our data through specific functions and packages, such as dplyr, while in Chapter Five we will briefly explore databases and the data.table package.

    Chapter Six will deal with basic statistics in R, while Chapter Seven is about the basics of creating graphs, both through the base functions of R and through specific packages, in particular ggplot2.

    Finally, in Chapter Eight we will learn how to structure reports in R, via Markdown and knitr, and look at the basics of Shiny. I hope that this book can be the simplest possible introduction to R, especially for those who have no previous programming background, and I therefore invite the reader to write the code and execute it step by step to better understand how programming for data analysis with R works. The complete code can be downloaded from the following link: https://github.com/valentinap/coding-in-R-for-data-analysis-
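    As a rough preview of these two topics, the sketch below computes averages per row and per column of a small matrix and draws a scatter plot with base R graphics. The numbers are invented, and the plot is written to a temporary PDF file only so that the example also runs outside RStudio.

```r
# A 2 x 3 matrix of made-up values, filled row by row
scores <- matrix(c(10, 20, 30,
                   40, 50, 60), nrow = 2, byrow = TRUE)

row_avg <- rowMeans(scores)   # one average per row
col_avg <- colMeans(scores)   # one average per column

# Descriptive statistics for one column of the built-in mtcars dataset
summary(mtcars$mpg)

# Base R scatter plot saved to a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon",
     main = "Fuel consumption by weight")
dev.off()
```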

    Note for the reader

    Dear reader, please report errors and suggestions for improvement to the following email address:

    info@datawiring.me

    Every contribution is valuable. Thank you.

    Valentina

    Chapter 1

    First steps

    Downloading and Installing R

    The first step we need to take to get started with R is to download and install the R programming language. You can do this from the website https://www.r-project.org.

    The Download link in the middle of the page takes us to the download page.

    This is where we will find all the versions of R that are hosted on different servers in different countries. We will choose the country closest to us and click on one of the links.

    On the page that opens, we click the link for the operating system installed on our computer and wait for the installer to be downloaded.

    Once the installer has been downloaded, we double-click the installation file.

    The first screen of the installer shows an introduction to the software. We click Continue at the bottom right.

    The second screen shows information on the version of R. Here too, we click Continue.

    The third screen shows the license agreement. We click Continue and accept the license.

    We then decide where to install the software and for which users, and click Continue again.

    We proceed with the installation by clicking Continue once more, and wait for it to complete.

    Once the software is installed, the installer shows a confirmation screen.

    We can close the installer. R is now installed on our computer. We can open the R console by double-clicking the software logo, which opens the console window.

    From here we can already type in code and perform some operations. We can also open the terminal or command prompt, type the R command, and press Enter.

    This will allow us to open and use R directly from the terminal.
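    To check that the installation works, a few first commands can be typed at the prompt, whether in the R console or in the terminal. The object names total and greeting are hypothetical.

```r
# Show which version of R is running
R.version.string

# Use R as a calculator and store the result in an object
total <- 2 + 3
total

# Build and print a short text string
greeting <- paste("Hello", "R")
print(greeting)
```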

    Downloading and Installing RStudio

    As we have seen, we can use the R language either directly from its console or from the terminal. As we will see shortly, there are dozens of programming environments that allow us to use our favorite programming language. The environment most often preferred by those starting out with R is RStudio. To download it we go to www.posit.co.

    On this page we click DOWNLOAD RSTUDIO at the top right, and then scroll down to the installation options.

    In the tab on the left we click the blue DOWNLOAD RSTUDIO DESKTOP button.

    On the page that opens, we click one of the DOWNLOAD buttons in the left tab, which gives us the free, open source version of RStudio.

    On the next page we can download RStudio for our operating system. Once the software is downloaded, we double-click it and run the program.

    Customizing and Using RStudio

    When we open RStudio, we first notice that the software is made up of 4 different windows.

    The first window, at the top left, called the script or editor window, is where we write the code that we want to save for later use. From the File → New File command we can open a new R script, while File → Recent Files lists the scripts we recently worked on so that we can open them again. Once the code is written in a script, we can execute it with Ctrl + Enter and, at the end of the session, save it as an .R file for later use.

    When we run some code, it is executed in the second window, the console. We can also type code directly into the console, but it is better not to: as operations pile up, earlier commands scroll out of view, so we risk losing work or getting confused. We can therefore run some tests from the console, but the important code and the more complex operations should be written in a script, which is then saved as an .R file.

    The workspace window contains three distinct tabs. In the first, Environment, the objects we create appear as we run code. The History tab, on the other hand, lists all the operations that we performed in our work session. From there we can also retrieve code that we perhaps typed by mistake in the console and send it to a script or back to the console.
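    What the Environment tab displays can also be inspected from code. In this small sketch the objects ages and people are hypothetical; ls() lists the objects currently in the workspace and rm() removes one of them.

```r
# Create two objects; in RStudio they appear in the Environment tab
ages   <- c(31, 45, 27)
people <- data.frame(name = c("Ann", "Bob"), age = c(31, 45))

ls()   # names of the objects currently in the workspace

# Check that 'ages' exists, remove it, and check again
before <- exists("ages", inherits = FALSE)
rm(ages)
after  <- exists("ages", inherits = FALSE)

before
after
```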

    In the third tab of this window, Connections, we have tools that simplify the connection to more complex databases or, in a Big Data context, to working sessions in Spark, a framework for certain types of data analysis.

    The fourth and last window features five separate tabs. In the first, Files, we can browse the files on our computer and also import datasets through RStudio's import interface. The Plots tab contains the graphs that we create during a work session.

    The third tab, Packages, is reserved for libraries. R features many preloaded functions, but it can be extended through additional libraries. We can install the libraries and, in each session, load only those that we need for a certain type of analysis. In the Help tab we find support documentation for functions and libraries. Finally, the last tab, Viewer, is dedicated to local web content, such as interactive HTML output.

    Finally, we can customize the appearance of RStudio from the RStudio → Preferences menu at the top.

    From the Appearance tab, in particular, we can change the font, its size, and the style of RStudio, for example by choosing a dark theme.
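    The information shown in the Packages and Help tabs can also be reached from code. The sketch below uses mean() as an arbitrary example function; the help() call is wrapped in interactive() because it is meant for sessions where documentation can be displayed.

```r
# Packages currently attached in this session
attached <- search()
attached

# Find functions whose name contains "mean"
matches <- apropos("mean")
head(matches)

# Open the documentation page for mean() (same as typing ?mean)
if (interactive()) {
  help("mean")
}

# Show the arguments a function accepts
args(sort)
```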

    Using other IDEs with R

    RStudio is the most famous and important integrated development environment for R, and certainly very suitable for those who start programming from the basics of this language. But there are many other development environments. Such environments are known as IDEs, an acronym for Integrated Development Environment. An IDE is a working environment that, natively or through installable plugins, features a series of tools aimed at simplifying the work of programmers or data analysts. Usually IDEs are designed to work with many programming languages. A survey carried out annually by the Stack Overflow website, which can be found at this link: https://insights.stackoverflow.com/survey/2019 gives us an idea of the most used development environments among developers.
