R coding for data analysts: from beginner to advanced
About this ebook
We will look at a number of useful topics: how to set up a working directory, how to install and retrieve a package, how to get information about data, where to find datasets for testing, and how to get help with a function. When analysing data, we need to understand the concept of dataset or dataframe. We will therefore see how to import a dataframe from your computer, or from the internet, into R. There are many functions that are suitable for this purpose, and many packages that are useful for importing data that is in some particular format, such as the formats for Excel, .csv, .txt or JSON. We will then see how to manipulate data, create new variables, aggregate data, sort them horizontally and longitudinally, and how to merge two datasets. To do this, we will use some specific packages and functions, such as dplyr, tidyr or reshape2. We will also briefly see how to interface with a database and use other packages to streamline the management of somewhat larger datasets.
R is also a very important language in the field of statistics. We will therefore learn some of the basic functions, such as calculating averages per row or per column, and the most common statistical functions in the field of descriptive statistics. When it comes to data analysis, we will often find ourselves creating graphs to explain our data and analyses. For this reason, we devote part of the book to seeing how to create graphs with both the functions of the basic library and the ggplot2 package. In the final sections, we will see how to create and export reports and slides, summarise the topics we have seen and the functions we have used, and look at the supporting material.
R coding for data analysts - Porcu Valentina
Valentina Porcu © 2023
Introduction
R is a programming language created in the early 1990s as an implementation of S, a language for statistical analysis developed earlier at Bell Laboratories. The development of R as a language of its own is due to Robert Gentleman and Ross Ihaka, of the University of Auckland, New Zealand. Initially, the two researchers set out to develop software for their students. But in 1993, encouraged by the success of the project, they decided to turn it into open source software. R is in fact free software, distributed under the GNU GPL license, and can be downloaded and installed from r-project.org. To simplify our work in R, we can use development environments, the most famous of which is RStudio, which was developed specifically for R.
R, together with RStudio, provides a powerful combination suitable for various application areas:
programming: R can be used to write scripts and functions aimed at data analysis and management
it is one of the most important programming languages in data analysis, statistics, data visualisation, and for the creation of predictive models
it is among the first languages created for statistical analysis, but it has kept up with the times, and is one of the most flexible and simple in all the steps of the data life cycle
it is open source, so it can be downloaded and used free of charge, either for individual projects or in collaboration with other data analysts
it is simple: with just a few lines of code we can start exploring data, since many basic tools and datasets are already pre-loaded and easy to access.
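As a small illustration of the last point, the sketch below uses mtcars, one of the datasets that ship with R. The choice of dataset and the object name avg_mpg are only assumptions made for this example.

```r
# mtcars is one of the datasets shipped with base R (package 'datasets')
data(mtcars)

# First rows and overall structure of the dataframe
head(mtcars, 3)
str(mtcars)

# Average fuel consumption (miles per gallon) across all cars
avg_mpg <- mean(mtcars$mpg)
avg_mpg

# Descriptive statistics for every column
summary(mtcars)
```

No package needs to be installed for this to run: the data and the functions used are available as soon as R is started.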
This book was conceived as an introduction to programming in R. It is meant to be an agile guide for people who are beginning to study programming and want to learn the ropes of R.
After this Introduction, where we will learn how to install, customize and use the main tools we need to program properly, in Chapter One we will start with the basics of the language: data structures, relational operators, control structures and functions. We will then learn how to create objects in R and how to use our first functions, for instance to reorder a vector or add columns to a matrix.
In Chapter Two we will learn how to set up our working environment in R, how to install and load a package, and how to get help in case of doubts and problems.
Chapter Three is dedicated to importing data into R from various formats, mainly .csv, but also Excel, .txt and other formats.
In Chapter Four we will begin to talk about how to manipulate our data through specific functions and packages, such as dplyr, while in Chapter Five we will briefly explore databases and the data.table package.
Chapter Six will deal with basic statistics in R, while Chapter Seven covers the basics of creating graphs, both with the base functions of R and with specific packages, in particular ggplot2.
Finally, in Chapter Eight we will learn how to structure reports in R with Markdown and knitr, and look at the basics of Shiny. I hope that this book can be the simplest possible introduction to R, especially for those who have no previous programming background, and I therefore invite the reader to write the code and execute it step by step to better understand how programming for data analysis with R works. The complete code can be downloaded from the following link: https://github.com/valentinap/coding-in-R-for-data-analysis-
Note for the reader
Dear reader, please report errors and suggestions for improvement to the following email address:
info@datawiring.me
Every contribution is valuable. Thank you.
Valentina
Chapter 1
First steps
Downloading and Installing R
The first step we need to take to get started with R is to download and install the R programming language. We can do this from the website https://www.r-project.org.
The Download link in the middle of the page will take us to this page:
This is where we will find all the versions of R that are hosted on different servers in different countries. We will choose the country closest to us and click on one of the links.
On the page, we choose the link for the operating system installed on our computer.
We then click on one of the installer links offered for that operating system and wait for the installation software to be downloaded.
Once the installer has been downloaded, we double-click the installation file.
The first screen of the installer shows an introduction to the software. We click on 'Continue' at the bottom right.
On the second screen of the installer we find information on the version of R. Also in this case, we click on Continue.
In the third screen we find the license agreement. Click on Continue and accept as in the following screen:
We then decide where to install the software, for which users, then again we click on Continue:
We proceed with the installation by clicking on Continue again:
We then wait for the installation to be completed:
Once the software is installed, the following image will be displayed:
We can close the installer. R is now installed on our computer. We can open the R console by double clicking on the software logo, and we will get a window like this:
From here we can already type in code and perform some operations. We can also open the terminal or command prompt, type the R command, as in the first line of the following image, and press Enter:
This will allow us to open and use R directly from the terminal.
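As a rough sketch of the kind of operations we can try as soon as the console is open, the lines below can be typed one at a time, either in the R console or in an R session started from the terminal. The object names radius and area are only illustrative.

```r
# Simple arithmetic: R prints the result immediately
2 + 3

# Store a value in an object with the assignment operator <-
radius <- 2
area <- pi * radius^2
area

# Show which version of R is running
R.version.string

# q() closes the session; it is left commented out here
# q()
```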
Downloading and Installing RStudio
As we have seen, we can use the R language either directly from its console or from the terminal. As we will see shortly, there are also many programming environments that allow us to work with our favorite programming language. The environment preferred by most people who start working with R is RStudio. To download it we go to www.posit.co.
From this page we click on DOWNLOAD RSTUDIO at the top right, and from there we scroll down the page to the installation options.
From the tab on the left we click on the blue DOWNLOAD RSTUDIO DESKTOP button.
From the page that will open, we click on one of the DOWNLOAD buttons on the left tab, which will allow us to use the open source and free version of RStudio.
From the page that will open, we can download RStudio for our operating system. Once the software is downloaded, we double-click on it and run the program.
Customizing and Using RStudio
When opening RStudio we can first see that the software is made up of 4 different windows.
The first window, at the top left, is the script or editor window: this is where we write the code that we are interested in saving for later use. From the File → New File command we can open a new R script, while in File → Recent Files we can see the scripts we recently worked on and open them again. Once code is written in a script, we can execute it using Ctrl + Enter (Cmd + Enter on macOS) and, at the end of the session, save it as an .R file for later use.
When we run some code, it is executed in the second window, the console. However, we do not usually write our code directly in the console because, as operations pile up, the commands we ran first scroll out of view, and we risk losing work or getting confused. We can run some quick tests from the console, but important code and more complex operations should be written in a script, which will then be saved as an .R file.
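To make the difference between console and script more concrete, here is a minimal sketch of what saving code in an .R file buys us: the whole file can be run again later with source(). The script content and the temporary file path are hypothetical and are created from code only so that the example is self-contained; in practice we would write and save the script from the RStudio editor.

```r
# Hypothetical script content: two lines we want to keep for later use
script_lines <- c(
  "values <- c(4, 8, 15, 16, 23, 42)",
  "values_mean <- mean(values)"
)

# Write the script to a temporary .R file
# (in practice we would save it from the editor window)
script_path <- tempfile(fileext = ".R")
writeLines(script_lines, script_path)

# Run the whole script in one go; the objects it creates
# become available in the workspace
source(script_path)
values_mean
```

Code typed only in the console would have to be retyped or recovered from the history, while the saved script can be re-run at any time.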
The workspace window contains three distinct tabs. In the first, Environment, we will see the objects we create appearing as we run our code. The History tab, on the other hand, lists all the operations that we performed in our work session. From here we can also retrieve code that we have perhaps written by mistake in the console and send it back to a script or to the console itself.
In the third tab of this window, Connections, we have tools that simplify the connection with more complex databases or, in a Big Data context, with working sessions in Spark, a framework for certain types of data analysis.
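What the Environment tab shows can also be inspected from code. The sketch below uses only base R functions; the object names scores and session_note are hypothetical.

```r
# Create two objects; in RStudio they would appear in the Environment tab
scores <- c(10, 20, 30)
session_note <- "test run"

# List the names of the objects in the current workspace
ls()

# Check that an object exists, remove it, and check again
exists("session_note")
rm(session_note)
exists("session_note")
```

After rm(), the removed object would also disappear from the Environment tab.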
The fourth and last window features five separate tabs. In the first, Files, we can browse the files on our computer and also import datasets through RStudio's import interface. The Plots tab will contain the graphs that we create during a work session.
The third tab, Packages, is reserved for libraries. R features many preloaded functions, but it can be extended through additional libraries. We can install the libraries and, at each session, load only those that we need for a certain type of analysis. In the Help tab we find support documentation for the functions and libraries themselves. The last tab, Viewer, is dedicated to more advanced views: it displays local web content such as interactive graphics and HTML output.
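The operations available from the libraries and Help tabs have equivalents in code. The following is a minimal sketch, in which dplyr is used only as an example of a contributed package; the installation and help lines are commented out because they need an internet connection or an interactive session.

```r
# Packages that ship with R only need to be loaded
library(stats)

# Contributed packages must be installed once before they can be loaded:
# install.packages("dplyr")

# Check whether a package is installed without attaching it
dplyr_available <- requireNamespace("dplyr", quietly = TRUE)
dplyr_available

# Open the documentation for a function (shown in the Help tab in RStudio)
# help("mean")   # equivalent to ?mean

# Names of the packages currently attached in this session
loaded <- .packages()
loaded
```

Installing is done once per computer, while library() has to be called in every session in which the package is needed.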
We can customize the appearance of RStudio from the RStudio → Preferences menu at the top.
From the Appearance tab, in particular, we can change the font, its size, and the editor theme, for example choosing a dark theme.
Using other IDEs with R
RStudio is the most famous and important integrated development environment for R, and it is certainly well suited to those who are learning this language from the basics. But there are many other development environments. They are known as IDEs, an acronym for Integrated Development Environment. An IDE is a working environment that, natively or through installable plugins, offers a series of tools aimed at simplifying the work of programmers or data analysts. IDEs are usually designed to work with many programming languages. A survey carried out annually by the Stack Overflow website, available at https://insights.stackoverflow.com/survey/2019, gives us an idea of the most used development environments among developers.