Simple Data Science (R)
Ebook, 223 pages

About this ebook

The book Simple Data Science (R) covers the R language, graphing, and machine learning. It is beginner-friendly, precise, and complete. The book explains data science concepts in simple language and then implements them in R. It is one of the fastest ways to learn data science. The hands-on projects provide a detailed step-by-step guide to implementing machine learning solutions.

Topics covered -

* Data science introduction
* Basic statistics
* Data visualization
* Machine Learning (linear regression, logistic regression, random forests, and other machine learning algorithms)
* Hands-on projects

Language: English
Release date: Nov 1, 2022
ISBN: 9798215903315

Reviews for Simple Data Science (R)

  • Rating: 5 out of 5 stars
    The book itself is a fairly technical manual on how to use R for data analysis. There are coding examples throughout, such as one showing the R code a user would write to create a specific type of graph. The book takes you through data wrangling, predictive modeling, data classification, and various other R features.

    The graphs later in the book are both visually appealing and informative. The statistics chapter is an excellent summary of what a user needs to know to use R for data analysis, along with the graphing elements involved.

    A nice addition to the book is the three "Hands-on Projects", which give the reader links to datasets and walk through how to use them: first loading the data into R, then viewing the data structure, and finally showing the outcomes visually through charts and graphs, with various outcomes listed and illustrated. Another nice addition is the chapter on use cases for the R programming language.

    I am pleased with the results of the work.

    I would recommend this book to those who are curious about R programming, to those in the data science field who are looking for new ways to approach data, and to programmers who like to take on a challenge.

Simple Data Science (R) - Narayana Nemani

About this book

0.1 Preface

Data science is an emerging field. Many organizations use it for research and to improve their business. Glassdoor has ranked data science among the best careers.

This book is for beginners and domain experts who want to start their data science journey. The book is precise and complete. It is one of the fastest ways to learn data science. It covers data science, graphing, and machine learning.

As this is a beginner-level book, no prior knowledge is needed, although knowing mathematics, statistics, and programming would be helpful.

0.2 Book Links

The book is available online and your feedback is appreciated.

0.3 About the Author

Narayana Nemani

Narayana Nemani is a Lead Data Scientist. He is involved in the teaching and research of data science.

Kaggle Account, Twitter Account, Email - murthynn2015@gmail.com

Copyright

Published by Narayana Nemani

© 2022 Visakhapatnam

All rights reserved. No part of this book may be reproduced or modified in any form, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

The scanning, uploading, and distribution of this book via the internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions, and do not participate in or encourage piracy of copyrighted materials.

1 Getting Started

1.1 What is Data Science?

Data science is the field of studying and using data. Its primary objectives are analyzing data and making predictions from it. It applies to both small and large datasets.

Typical use cases of data science are -

A grocery store estimates its future sales and fills up the inventory accordingly.

A shopping app recommends products to customers based on their previous purchases.

Popular data science applications include self-driving cars, game AI, search engines (Google, DuckDuckGo), and virtual assistants (Siri, Alexa).

Figure 1.1: Applications of data science

Data science draws on knowledge from several fields. Statistics, domain knowledge, and programming are its pillars.

Figure 1.2: Pillars of data science

Data is the core of data science. Common data sources include organizations' internal data, government data, and surveys. For example, news channels conduct voting surveys before elections.

Data science jobs are among the highest-paid occupations. Both programmers and domain experts fill these positions.

Models

Models are collections of code for understanding and predicting data.

Create models with the following steps -

Understand the problem statement.

Transform data.

Analyze data.

Apply algorithms.

Creating models is an iterative process. When a step reveals a new insight, revisit the other steps and make the relevant changes.
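The four steps can be sketched in R as follows. This is a minimal illustration, not an example from the book: the problem statement (predicting fuel economy from car weight) is assumed, the data is the mtcars dataset bundled with R, and the column name weight_lbs is hypothetical.

```r
# 1. Problem statement (assumed for illustration):
#    predict a car's fuel economy (mpg) from its weight.

# 2. Transform: keep the relevant columns and convert weight from
#    1000 lbs to lbs (a hypothetical transformation step).
cars <- data.frame(mpg = mtcars$mpg, weight_lbs = mtcars$wt * 1000)

# 3. Analyze: summary statistics and the correlation between columns.
summary(cars)
cor(cars$mpg, cars$weight_lbs)

# 4. Apply an algorithm: fit a linear regression model.
model <- lm(mpg ~ weight_lbs, data = cars)
coef(model)

# Use the fitted model to predict mpg for a hypothetical 3000 lb car.
predict(model, newdata = data.frame(weight_lbs = 3000))
```

Because the process is iterative, a finding in step 3 (for example, a curved relationship between weight and mpg) would send you back to step 2 to add a transformed column before refitting in step 4.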

Figure 1.3: Steps of model creation

1.2 R Language

R is a programming language created by statisticians for statistics. It has built-in statistical and visualization capabilities. It is popular in the scientific community.
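A minimal sketch of those built-in capabilities, using only base R and the mtcars dataset that ships with it (the dataset choice is an assumption for illustration). No packages need to be installed.

```r
# mtcars is bundled with R; mpg is fuel economy in miles per gallon.
mpg <- mtcars$mpg

avg_mpg <- mean(mpg)   # arithmetic mean
sd_mpg  <- sd(mpg)     # sample standard deviation
avg_mpg
sd_mpg
summary(mpg)           # min, quartiles, median, mean, max

# Correlation between car weight (1000 lbs) and fuel economy
cor(mtcars$wt, mtcars$mpg)

# A quick histogram using base graphics
hist(mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
```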

When introducing a new programming language, it is customary to start with a "hello world" example. Given below is the hello world example in R.

# Hello World Example
string1 <- "Hello World"
string1

## [1] "Hello World"

The hello world example is explained below.

Add comments with the number sign/hash (#) character. R ignores everything after # on a line. Comments document the code.

# Hello World Example

The assignment operator (<-) assigns a value to a variable.

string1 <- "Hello World"

R supports implicit printing: at the top level, R prints the value of an expression automatically. Run a variable's name to print its value.

string1

## [1] "Hello World"
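Implicit printing only happens at the top level. A small sketch of the difference, using base R's print() and cat():

```r
string1 <- "Hello World"

# At the top level, running a name prints its value automatically.
string1

# Inside a loop, values are NOT auto-printed,
# so print() must be called explicitly.
for (i in 1:2) {
  string1          # evaluated, but nothing is shown
  print(string1)   # shown each time: [1] "Hello World"
}

# cat() writes the text without the [1] index or the quotes.
cat(string1, sep = "\n")
```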

1.3 RStudio

RStudio is the recommended IDE for the R language. Install it locally, or access it from the web at rstudio.cloud. This ebook itself was written in RStudio.

The graphical interface of RStudio has four areas. Each area contains one or more panes.

Figure 1.4: RStudio

Source code editor pane - Write the code in the editor pane. To create a new script file in RStudio, select File > New File > R Script. Save the script file so that you can reuse the code.

Console pane - Run commands and view their output in the console. It is a command-line interface.

The image below shows the date() function and its result.

Figure 1.5: Editor pane and console pane
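The date() command can be tried directly in the console. A short sketch with base R's date() and Sys.Date(); the exact text printed depends on when you run it.

```r
# date() returns the current date and time as a single character string.
now_text <- date()
now_text

# Sys.Date() returns a Date object, which supports date arithmetic.
today <- Sys.Date()
class(today)               # "Date"
today + 7                  # the date one week from today
format(today, "%d %B %Y")  # custom display format, e.g. day month year
```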

Environment pane - The environment pane displays the variables and their values.
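The Environment pane is a view of the global environment, and the same information is available from code. A minimal sketch using base R's ls(), exists(), and rm(); the variable n_items is hypothetical.

```r
# Objects assigned in the console or in a script are stored in the
# global environment, which the Environment pane displays.
string1 <- "Hello World"
n_items <- 42   # hypothetical example variable

ls()                 # names of all objects in the global environment
exists("n_items")    # TRUE while the object exists

rm(n_items)          # delete the object; it also disappears from the pane
exists("n_items")    # FALSE after removal
```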

Figure 1.6: Environment and history panes

Files and plots panes - The files pane displays the file system. Use it to view and open code files. The plots pane displays graphs.
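Any base R plotting call run in RStudio sends its output to the plots pane. A minimal sketch using the built-in mtcars data; the dataset and axis labels are assumptions for illustration.

```r
# Base R graphics draw on the active graphics device;
# in RStudio that device is the plots pane.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy vs. weight",
     pch = 19)   # solid circles for the points
```

Outside RStudio, for example when running a script with Rscript, the same call writes to a default file device instead of a pane.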

Frequently used keyboard shortcuts in the RStudio IDE -

Ctrl + Enter - Run the current line or selected code

Ctrl + Alt + R - Run the entire document

Ctrl + Shift + C - Comment/uncomment the current line or selected code

Ctrl + L - Clear the console

1.4 R Projects

Create separate script files for data cleaning, transformation, and machine learning. This improves the maintainability of the code. When a single task needs multiple files, group them in a folder. For example, if there are three transformation scripts, place all of them in a transform folder.

An R project is an RStudio feature for grouping and accessing all the associated files. Add the script files and data files to the project.

Figure 1.7: Typical project structure

To create a project in RStudio, select File > New Project.

Figure 1.8: New project wizard
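A folder layout like the one described for R projects can also be created from code. This sketch works in a temporary directory so nothing permanent is written; the project, folder, and file names are hypothetical examples, not the book's.

```r
# Hypothetical project root inside the session's temporary directory
project <- file.path(tempdir(), "sales_project")

# One folder per kind of work (names are illustrative)
folders <- c("data", "clean", "transform", "models")
for (f in folders) {
  dir.create(file.path(project, f), recursive = TRUE, showWarnings = FALSE)
}

# Three transformation scripts grouped in the transform folder
scripts <- file.path(project, "transform",
                     c("merge.R", "reshape.R", "features.R"))
file.create(scripts)

# List every folder and file, with paths relative to the project root
list.files(project, recursive = TRUE, include.dirs = TRUE)
```

When an RStudio project is opened, the working directory is set to the project folder, so scripts inside it can use relative paths such as a hypothetical "data/sales.csv".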

2 Statistics and R

2.1 Statistics Introduction

Statistics is the science of collecting,
