Learn Emotion Analysis with R: Perform Sentiment Assessments, Extract Emotions, and Learn NLP Techniques Using R and Shiny (English Edition)
Ebook, 876 pages, 5 hours


About this ebook

This book covers how to conduct Emotion Analysis based on Lexicons. Through a detailed code walkthrough, the book will explain how to develop systems for Sentiment and Emotion Analysis from popular sources of data, including WhatsApp, Twitter, etc.

The book starts with a discussion on R programming and Shiny programming as these will lay the foundation for the system to be developed for Emotion Analysis. Then, the book discusses essentials of Sentiment Analysis and Emotion Analysis. The book then proceeds to build Shiny applications for Emotion Analysis. The book rounds off with creating a tool for Emotion Analysis from the data obtained from Twitter and WhatsApp.

Emotion Analysis can also be performed using Machine Learning. However, this requires labeled data, and is a logical next step after reading this book.
Language: English
Release date: Jun 2, 2021
ISBN: 9789390684236

    Book preview

    Learn Emotion Analysis with R - Partha Majumdar

    SECTION - 1

    CHAPTER 1

    Getting Started with R

    R is a free software environment for statistical computing and graphics. The R language is very widely used, and its use has grown tremendously as data science has gained popularity. It is used by scientists, researchers, and students, and across industries. R is open source, so many developers contribute to its growth. As a result, R has a rich library of packages covering most aspects of system development across many domains.

    We will use R language to create our system for emotion analysis in this book. So, let us discuss the basics of R language for you to understand the programs discussed in this book.

    Structure

    In this chapter, we will discuss the following topics:

    Brief introduction to R language

    Setting up the R software

    – Obtaining the R software

    – Installing R

    – Invoking R

    Setting up RStudio

    – Obtaining the RStudio software

    – Installing RStudio

    – Invoking RStudio

    Introduction to packages

    Introduction to vector

    Objectives

    After studying this unit, you should be able to:

    Install R and RStudio

    Install packages and load libraries

    Perform statistical operations using R

    Create graphs using R

    Brief introduction to R language

    R language was initially created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. The project was conceived in 1992. An initial version was released in 1995. The first stable version was released on 29th February 2000, versioned as 1.0. The latest version (at the time of writing this book) is R version 4.0.0 (Arbor Day) which was released on 24th April 2020.

    The R language originated from the S language, developed by John M. Chambers at Bell Labs. R is an implementation of the S language, written in C, FORTRAN, and R itself.

    R is supported by The R Project for Statistical Computing. The official website is www.r-project.org. Since mid-1997, a core group (the "R Core Team") has been responsible for modifying the R source code archive.

    The Comprehensive R Archive Network (CRAN) is a collection of sites that carry identical material, consisting of the R distribution(s), the contributed extensions, the documentation for R, and binaries. The official website of CRAN is cran.r-project.org. Its main site is in Austria, and it has many mirror sites around the world. All materials related to the R language can be obtained from any of the CRAN sites.

    The documentation at the following URL is recommended reading: https://cran.r-project.org/doc/FAQ/R-FAQ.html. It covers some interesting facts, including the names of the people who can modify the R Core software, why R was named R, what the R Foundation is, what R-Forge is, what S is, what S-Plus is, what R-Plus is, how to create an R package, how to contribute to R, how to report R bugs, and so on.

    Setting up the R software

    R can be installed on any UNIX-like machine, Apple Mac, and Windows machines. We will walk through the process of installing R on an Apple Mac.

    Obtaining the R software

    To obtain the R software, visit www.r-project.org. When this URL is opened in Google Chrome, the web page looks as shown below:

    Figure 1.1: The R Project Home Page

    On this web page, click on the link to CRAN site as shown in the following screenshot:

    Figure 1.2: Click on the CRAN link to reach a CRAN mirror

    On clicking the CRAN link, you should get a web page as shown below with links to all the Mirror sites:

    Figure 1.3: Links to CRAN mirrors

    Click on any of the mirror sites to obtain the following web page:

    Figure 1.4: R Software download web page

    In the box highlighted in RED, you will find the links to the R software for different operating systems. Click on the appropriate link as per the operating system of your machine to obtain the R software.

    Installing R

    On the web page shown in figure 1.4, click on the link Download R for (Mac) OS X. You should get the web page shown in the following screenshot:

    Figure 1.5: R software download link for Mac

    Click on the link highlighted in the RED box. A dialog box asking for the location to download the R software package is displayed. Select a folder and click Save:

    Figure 1.6: R Package download dialog box

    Once the software has been downloaded, locate the downloaded package and run it. The installation process starts with the dialog box shown in the following screenshot:

    Figure 1.7: R Installation step 1

    Click on the Continue button. The dialog box changes as shown in the following screenshot:

    Figure 1.8: R Installation step 2

    Read the information provided. You can print it and/or save it to your disk. Once done, click the Continue button. The dialog box changes as shown in the following screenshot:

    Figure 1.9: R installation step 3

    This is the licensing agreement. R is free software. Read the information provided. You can print it and/or save it to your computer. Once done, click the Continue button. The dialog box shown in the following screenshot is displayed:

    Figure 1.10: R installation step 4

    Click the Agree button to continue. The dialog box shown in the following screenshot is displayed:

    Figure 1.11: R installation step 5

    Select the destination disk where R should be installed and click Install. The installation then starts:

    Figure 1.12: R installation step 6

    When the installation is complete, a message is displayed, as shown in the following screenshot:

    Figure 1.13: R installation step 7

    Invoking R

    Once R is installed, click on Launchpad to locate the R software. The icon for the R Software is as shown in the following screenshot:

    Figure 1.14: Click this icon to invoke the R software

    On clicking this icon, the R software is invoked. The R Console looks as shown in the following screenshot:

    Figure 1.15: R Console

    The > symbol at the bottom of the screen is the R prompt. Any R command needs to be typed at the R prompt. Try typing any arithmetic expression at the R prompt and press Enter. You should see the result. An illustration is provided in the following screenshot:

    Figure 1.16: Performing simple arithmetic using R

    The process of installing the R software on Windows and Linux machines is very similar, so I will not repeat it. Note that the installer file types on Windows and Linux differ from those on an Apple Mac, as does the method of invoking an application.

    Setting up RStudio

    RStudio is an Integrated Development Environment (IDE) for R. Initially, RStudio was exclusively for R programming; more recently, it also supports Python programming. An IDE provides all the requisite tools for programming. RStudio provides templates for creating, running, and debugging different types of R programs, such as R Scripts, R Markdown documents, and Shiny applications. It also provides a shell interface through which commands can be run on the shell of the operating system, and a Graphical User Interface (GUI) for activities such as installing and updating libraries, setting working directories, debugging programs, and version control of code units.

    Other IDEs are also available for R Programming, like Jupyter.

    Obtaining the RStudio software

    To obtain the RStudio software, visit www.rstudio.com. When this URL is opened in Google Chrome, the web page looks as shown in the following screenshot:

    Figure 1.17: RStudio home page

    Click on the Download link as shown in figure 1.17. The web page shown in the following screenshot is displayed:

    Figure 1.18: RStudio download page

    Scroll down on this page to get the download options as shown in the following screenshot:

    Figure 1.19: RStudio download options

    Select the download option suitable for you. The free RStudio Desktop version is enough for most requirements.

    Installing RStudio

    On the web page shown in figure 1.19, click on the download link under RStudio Desktop. You should get the web page shown in the following screenshot:

    Figure 1.20: RStudio download link for Mac

    Click on the download link as shown in figure 1.20 to start downloading the software. Provide a location where the download should be saved (see the following screenshot):

    Figure 1.21: Save RStudio software to the disk

    The download is a .dmg file (Disk Mirror Image). Once the software has been downloaded, locate the downloaded .dmg file and run it. When the .dmg file is run, the installation process starts. Once the installation is complete, the window shown in the following screenshot is displayed:

    Figure 1.22: RStudio installation completed

    In this window, drag the RStudio icon onto the applications icon. On doing so, the RStudio installation gets completed.

    Invoking RStudio

    You can see the RStudio icon in figure 1.22. Click on Launchpad and locate this icon. Then, click it to invoke RStudio.

    The RStudio interface looks as shown in the following screenshot:

    Figure 1.23: RStudio interface

    As you can see, there are many parts in the RStudio interface. We will go through two aspects of the RStudio to get started. The area marked 1 is the area where the code is written. On the top-right corner of this panel, you can see a button titled Run. On clicking the Run button, the written program (or a part of the program) is executed.

    The area marked 2 is the R Console. When the programs are run, the execution steps can be seen here. Also, it is possible to directly provide a command here and run it. Let us see an example for this.

    To find out which version of R we are using, we can give the command version. In the picture below, you can see the command version in the R Console of the RStudio. The output of the command is shown after hitting Enter:

    Figure 1.24: Running commands in R Console of RStudio
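
    The same version details can also be read programmatically. The following is a minimal sketch using objects that ship with base R; the variable name r_is_recent and the threshold "3.5.0" are only illustrative choices, not something the book prescribes:

```r
# R.version is a built-in list describing the running R build.
R.version.string       # a one-line description of the R version in use
R.version$nickname     # the release nickname of the R version in use

# getRversion() returns the version as an object that can be compared.
# The threshold "3.5.0" is an arbitrary value chosen for illustration.
r_is_recent <- getRversion() >= "3.5.0"
r_is_recent
```

    A check of this kind can be useful at the top of a script that depends on features of a particular R release.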

    The process of installing the RStudio software on Windows and Linux machines is very similar, so I will not repeat it. Note that the installer file types on Windows and Linux differ from those on an Apple Mac, as does the method of invoking an application.

    Introduction to packages

    Packages are program collections which provide the libraries containing various functionalities. Using these libraries, it becomes possible to create various applications without bothering to program these essential features. The added advantage is that these libraries are thoroughly tested and tried by experts. Also, these libraries are routinely updated so that they function seamlessly with the new releases of R.

    When R is installed, it comes with the default packages. Experts from various fields of studies regularly create and publish packages for specific requirements. These packages not only contain program units, but also data. For example, the package tm contains various program units (functions) essential for text mining. Also, for example, the package janeaustenr contains data regarding all the books of Jane Austen.

    One of the most powerful features of R is that it has packages for almost all kinds of programming needs, covering almost all domains. One of the most essential skills that R programmers need to develop is the ability to find the packages required for their purposes and to learn how to use them. Generally, elaborate documentation for each package can be found on the CRAN website. There are also various other websites that publish detailed documentation on the published R packages.

    If an essential feature of the program being developed is not available in any of the published packages, it can be developed by the programmer. Once these program units have been developed, they can be compiled into a library, and such libraries can be published for other R users. We will cover how to create functions in this book. However, it is not within the scope of this book to discuss how to create and publish libraries.

    In general, all packages can be used without any need for licensing. However, there are certain packages which need licensing agreements if the developed software is to be used for commercial purposes.

    As a good practice, R programmers should cite the packages used by them in their programs. In the Appendix of this book, the mechanism to include citations is provided.

    Installing packages

    We can install packages using either the Character User Interface (CUI), that is, the Command Line Interface (CLI), or the Graphical User Interface (GUI). We will discuss both methods, starting with the GUI in RStudio.

    Installing packages using RStudio GUI

    RStudio provides an easy interface for installing packages. In the screenshot below, the area of RStudio where all the installed packages are displayed is highlighted in the RED box. On clicking a package name, the documentation for the package is provided:

    Figure 1.25: Package information in RStudio

    On clicking the Install button as shown in figure 1.25, a new package can be installed by providing the name of the package. To update the installed packages to the latest version, click the Update button.

    To load a package (or library), tick the check box in front of the package name.

    Installing packages using CLI

    To install a package from the command line or programmatically, we need to use the install.packages() command. The package name is passed as a quoted string. For example, to install the package caret, issue the following command:

    install.packages("caret")

    trying URL ‘https://cran.rstudio.com/bin/macosx/contrib/4.0/caret_6.0-86.tgz’

    Content type ‘application/x-gzip’ length 6246328 bytes (6.0 MB)

    ==================================================

    downloaded 6.0 MB

    The downloaded binary packages are in

    /var/folders/bv/rqp385z157s87nvm3mg_tgn00000gn/T//RtmpJr4yKG/downloaded_packages

    We can check the packages installed currently using the command installed.packages():

    row.names(installed.packages())

    ##  [1]   abind        askpass     assertthat

    ##  [4]   backports    base        base64enc

    ##  [7]   BayesFactor  bayestestR  BH

    ##  [10]   bitops      boot        broom

    ##  [13]   callr       car         carData

    ##  [16]   caret       cellranger  CGPfunctions

    ##    …

    ##  [247]   utils      vctrs       viridis

    ##  [250]   viridisLite  whisker    withr

    ##  [253]   wordcloud    xfun       xml2

    ##  [256]   xtable       yaml       zip

    ##  [259]   zoo

    So, in a program, we can install a package only if it is not already installed, as shown in the following code:

    # Install caret only when it is not among the installed packages
    if (!("caret" %in% row.names(installed.packages()))) {
      install.packages("caret")
    }
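
    The same check extends to several packages at once. The following is a minimal sketch; missing_packages() is a hypothetical helper written for this illustration (it is not part of R or of the book's code), and the package names in wanted are only examples:

```r
# Hypothetical helper (not from the book): report which of the requested
# packages are not yet installed, without installing anything.
missing_packages <- function(pkgs) {
  installed <- row.names(installed.packages())
  pkgs[!(pkgs %in% installed)]
}

# Example package names; stats and utils ship with R itself.
wanted <- c("stats", "utils", "caret")
to_install <- missing_packages(wanted)
to_install

# Install only what is missing. This line is left commented out so that
# running the sketch does not download anything:
# if (length(to_install) > 0) install.packages(to_install)
```

    Keeping the check separate from the installation step makes it easy to see what would be downloaded before anything is changed on the machine.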

    Loading libraries

    Installing a package does not by itself make its libraries available to the R environment or to R programs. To make the libraries available, they need to be loaded.

    To load a library, we use the library() command. For example, to load the caret library, issue the following command:

    library(caret)
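
    As a minimal sketch of what loading means in practice, the following uses only base R functions: search() lists the attached packages, the package::function form calls a function without attaching its package, and requireNamespace() reports whether a package is installed. The variable name has_caret is an illustrative choice:

```r
# The stats package ships with R and is attached by default.
"package:stats" %in% search()

# A function can also be called in the package::function form,
# which does not require a prior library() call.
stats::median(c(3, 1, 2))

# requireNamespace() returns TRUE or FALSE depending on whether the
# package is installed, instead of stopping with an error.
has_caret <- requireNamespace("caret", quietly = TRUE)
has_caret
```

    The package::function form is handy when only one or two functions from a package are needed.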

    Introduction to vector

    Vector is a basic data structure in R. It contains elements of the same type. The data types of the elements of a vector can be logical, integer, double, character, complex, or raw.

    We can create a vector of characters as follows:

    c('h', 'e', 'l', 'l', 'o')

    This vector contains 5 elements of the type character.

    We can create a vector of strings as follows:

    c('Welcome', 'to', 'R')

    This vector contains 3 elements of the type character.

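    We can create a vector of numbers as follows:

    c(20, 32, 24, 32, 41, 34, 67, 96, 22, 52)

    This vector contains 10 numeric elements. Although the values are whole numbers, R stores plain numeric literals as the type double; the integer type is used only when requested, for example with the L suffix (20L).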
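
    A quick check of how R stores such values, and of what happens when types are mixed, can be sketched as follows. The variable names are illustrative only:

```r
x_double  <- c(20, 32, 24)       # plain numeric literals are stored as double
x_integer <- c(20L, 32L, 24L)    # the L suffix requests integer storage
typeof(x_double)     # "double"
typeof(x_integer)    # "integer"

# A vector holds one type only, so mixing types coerces every
# element to the most general type present (here, character).
x_mixed <- c(20, "thirty", TRUE)
typeof(x_mixed)      # "character"
x_mixed              # "20" "thirty" "TRUE"
```

    This coercion is why a single stray text value in a column of numbers turns the whole vector into character data.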

    Assigning vectors to variables

    We can store a vector in a variable. To store a vector in a variable, we need to use the <- (assignment) operator:

    v_age <- c(20, 32, 24, 32, 41, 34, 67, 96, 22, 52)

    The above command stores a vector containing 10 numeric elements in a variable named v_age. Let us consider that the elements of the vector v_age contain the ages of 10 persons.

    Let us create another vector of salaries. Let us call this vector v_salary. Suppose that the vector, v_salary, contains the monthly salaries of people whose ages are stored in the vector v_age:

    v_salary <- c(2000, 3200, 2400, 2000, 4000, 4200, 2000, 1000, 5000, 10000)

    Checking the type of a vector

    A vector’s type can be checked with the typeof() function:

    typeof(v_salary)

    [1] "double"

    Checking the length of a vector

    A vector’s length can be checked with the length() function:

    length(v_salary)

    [1] 10

    Conducting statistical operations on a vector

    Various statistical operations can be conducted on a vector. Many of the functions for conducting statistical operations are available in base R. More specialized statistical functions are provided by additional packages.

    We will see some statistical operations on a vector. We can find the SUM of all the elements of a vector using the sum() function:

    sum(v_salary)

    [1] 35800

    We can find the average of all the elements of a vector using the mean() function:

    mean(v_salary)

    [1] 3580

    We can find the standard deviation of all the elements of a vector using the sd() function:

    sd(v_salary)

    [1] 2570.689
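
    To see what sd() computes, the sample standard deviation can be reproduced step by step. This is a minimal sketch using the v_salary values from above; the names n, deviations, and sd_by_hand are illustrative:

```r
v_salary <- c(2000, 3200, 2400, 2000, 4000, 4200, 2000, 1000, 5000, 10000)

n <- length(v_salary)
deviations <- v_salary - mean(v_salary)   # distance of each value from the mean

# Sample standard deviation: the sum of squared deviations is divided
# by n - 1 (not n) before taking the square root.
sd_by_hand <- sqrt(sum(deviations^2) / (n - 1))
sd_by_hand

all.equal(sd_by_hand, sd(v_salary))       # TRUE
```

    Note the n - 1 divisor: sd() treats the vector as a sample rather than as a complete population.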

    To see the summary of a vector, we can use the summary() function:

    summary(v_age)

    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

    ##   20.00   26.00   33.00   42.00   49.25   96.00
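
    The individual figures in this summary can also be obtained one at a time. The following minimal sketch uses base R functions on the v_age values from above; the name q is illustrative:

```r
v_age <- c(20, 32, 24, 32, 41, 34, 67, 96, 22, 52)

sort(v_age)                                    # values in ascending order

# quantile() returns the 1st quartile, median, and 3rd quartile
# (using R's default interpolation method).
q <- quantile(v_age, probs = c(0.25, 0.5, 0.75))
q

range(v_age)    # minimum and maximum
mean(v_age)     # arithmetic mean
```

    Because the quartiles are interpolated between neighbouring sorted values, they need not be values that actually occur in the vector (49.25, for example).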

    To find the correlation between the elements of 2 vectors, we can use the cor() function. The cor() function is provided by the stats package, which is part of base R and is loaded by default, so no additional library needs to be loaded:

    cor(v_age, v_salary)

    [1] -0.1320019
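
    The value returned by cor() is, by default, the Pearson correlation coefficient. As a minimal sketch, it can be reproduced from its definition using the two vectors above; the names dev_age, dev_salary, and r_by_hand are illustrative:

```r
v_age    <- c(20, 32, 24, 32, 41, 34, 67, 96, 22, 52)
v_salary <- c(2000, 3200, 2400, 2000, 4000, 4200, 2000, 1000, 5000, 10000)

dev_age    <- v_age - mean(v_age)
dev_salary <- v_salary - mean(v_salary)

# Pearson correlation: the sum of the products of the deviations, divided
# by the square root of the product of the summed squared deviations.
r_by_hand <- sum(dev_age * dev_salary) /
  sqrt(sum(dev_age^2) * sum(dev_salary^2))
r_by_hand

all.equal(r_by_hand, stats::cor(v_age, v_salary))   # TRUE
```

    A value close to 0, as here, indicates little linear relationship between age and salary in this small data set.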

    To create a scatter plot to visualize the relation between 2 vectors, we can use the qplot() function. The qplot() function is available in the ggplot2 package:

    library(ggplot2)

    qplot(v_age, v_salary)

    Figure 1.26: Scatter plot created using qplot()
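
    If ggplot2 is not installed, a comparable scatter plot can be drawn with the plot() function from base R graphics. The following is a minimal sketch and not the book's method; the axis labels, the title, and the use of a temporary PDF file (plot_file) are assumptions made for illustration:

```r
v_age    <- c(20, 32, 24, 32, 41, 34, 67, 96, 22, 52)
v_salary <- c(2000, 3200, 2400, 2000, 4000, 4200, 2000, 1000, 5000, 10000)

# Send the plot to a temporary PDF file instead of the screen, so the
# sketch also runs in a non-interactive session.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(v_age, v_salary,
     xlab = "Age", ylab = "Monthly salary",
     main = "Salary against age")
dev.off()

plot_file   # path of the file that now contains the scatter plot
```

    In an interactive RStudio session, the pdf() and dev.off() lines can be omitted so that the plot appears in the Plots pane. Note also that recent ggplot2 releases mark qplot() as deprecated in favour of ggplot(), so longer-lived code may prefer ggplot() with geom_point().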

    Conclusion

    R is a very powerful and versatile language. Though it was initially created for statistical computing, it can now be used for many other types of computing. R is free software. Besides using R for developing applications, one can contribute libraries to R. As many people contribute libraries to R, R contains libraries for almost all types of computer programming for almost all domains.

    In the next chapter, we will discuss the various operations that we can perform using R.

    Points to remember

    R is free software. R comprises many packages. Most of the packages are available without the need for licensing. However, some packages may require licenses for commercial use.

    R can be installed on any UNIX-like machine, Apple Mac, and Windows machines.

    RStudio is an Integrated Development Environment (IDE) for R programming.

    Packages in R can be installed using the install.packages command. To check for the installed packages, use the installed.packages command.

    To load a library, use the library command.

    A vector is a basic data structure in R. A vector contains elements of the same data type.

    Use the typeof() function to find the type of a vector.

    Use the length() function to find the length of a vector.

    Use the sum() function to find the sum of all the elements of a vector containing numeric values.

    Use the mean() function to find the average of all the elements of a vector containing numeric values.

    Use the sd() function to find the standard deviation of all the elements of a vector containing numeric values.

    Use the summary() function to find the summary of a vector containing numeric values.

    Use the cor() function to find the correlation between 2 vectors.

    Use the qplot() function to plot a graph displaying the relation between 2 vectors.

    Multiple choice questions

    What is the nickname of the R version used in this book?

    Moby Dick

    Arbor Day

    Armistice Day

    None of these

    What is the RStudio version used in this book?

    2.0.587

    1.2.531

    1.3.959

    None of these

    DMG stands for:

    Disk Mirror imaGe

    Disk Management Gate

    Direct Messaging Game

    None of these

    The command to load libraries in R is:

    load.libraries

    load.packages

    library

    None of these

    The command to list installed packages in R is:

    list.packages

    list.installed.packages

    installed.packages

    None of these

    The function to find the sum of all the elements of a vector is:

    SUM

    sum

    add

    None of these

    The cor() function is available in:

    correlation package

    stats package

    statistics package

    None of these

    The function to find the average of all the elements of a vector is:

    mean

    average

    avg

    None of these

    The function to find the standard deviation of all the elements of a vector is:

    sd

    stdev

    std.dev

    None of these

    The qplot() command is available in the package:

    graphics

    plots

    ggplot2

    None of these

    Answers to MCQs

    B

    C

    A

    C

    C

    B

    B

    A

    A

    C

    Questions

    What is the type of file provided by RStudio for installing the software on a Windows machine?

    Explain the process of installing the R Software on a Linux machine.

    Find the documentation for the package caret on the CRAN website. List 5 functions available in the caret package.

    Key terms

    CLI: Command Line Interface processes commands to a computer program in the form of lines of text. The program which handles the interface is called a command-line interpreter or command-line processor.

    CRAN: The Comprehensive R Archive Network (CRAN) is a collection of sites which carry identical material consisting of the R distribution(s), the contributed extensions, the documentation for R, and binaries.

    CUI: Character User Interface is a way for users to interact with computer programs. It works by allowing the user to issue commands as one or more lines of text to a program.

    DMG: A file with the DMG file extension is an Apple Disk Mirror Image file, sometimes called a Mac OS X Disk Mirror Image file, which is essentially a digital reconstruction of a physical disk.

    GUI: Graphical User Interface is a form of user interface that allows users to interact with electronic devices through graphical icons and audio indicators, rather than through the text-based interfaces, typed command labels, or text navigation used by a CUI.

    IDE: Integrated Development Environment provides an easy way of writing programs for a platform.

    CHAPTER 2

    Simple Operations Using R

    In Chapter 1: Getting Started with R, we discussed how to install R and RStudio. We also discussed how to issue some basic commands using the R language.

    In this chapter, we will discuss how to conduct various operations using R language. There is a lot which can be done using R language. We will not try to cover everything in an exhaustive manner. Instead, we will focus on the features of R language that are required for creating our programs for emotion analysis. The ground that we will cover should provide a good foundation to find out the features that we will not discuss.

    Structure

    In this chapter, we will discuss the following topics:

    Introduction to data frame

    – Creating an empty data frame

    – Creating a Data frame from vector(s)

    – Renaming column(s) in a data frame

    – Referencing row(s)/column(s) of a data frame

    – Adding column(s) to a data frame

    – Adding row(s) to a data frame

    – Sorting a data frame

    – Visualizing Data in a data frame

    Reading Data from a comma separated value (CSV) File

    Reading Data from a database

    Objectives

    After studying this unit, you should be able to:

    Create data frames and conduct operations on data frames

    Read data from CSV Files

    Read data from databases

    Create graphs

    Introduction to data frame

    A data frame is a data structure in R whose columns can hold data of different data types. A data frame has the variables of a data set as columns and the observations as rows. It is essential for storing the kind of practical data sets required for most analyses.

    For example, to store the scores from an examination of a class of students, we may need to store the names and the roll numbers of the students along with the marks in each subject. While the names and the roll numbers would be of character data type, the marks in each subject would be of numeric data type (maybe integer data type).

    Let us visit various operations on data frames in R including how to create data frames, how to view data frames, etc.

    Creating an empty data frame

    An empty data frame can be created by initializing a data frame with a set of empty vectors using the data.frame() function. An example is shown in the following code:

    v_marks <- data.frame(Name = character(),
                          Roll_Number = character(),
                          Company = character(),
                          Computer_Architecture = integer(),
                          Data_Analysis = integer(),
                          R_Programming = integer(),
                          stringsAsFactors = FALSE
    )

    In the above example, we created a data frame named v_marks containing 6 columns. The names of the columns are Name, Roll_Number, Company, Computer_Architecture, Data_Analysis, and R_Programming. The columns Name, Roll_Number, and Company are intended to store the names of the students, their roll numbers, and the names of their companies, respectively, and are thus of character data type. The columns Computer_Architecture, Data_Analysis, and R_Programming are intended to store the marks in the subjects computer architecture, data analysis, and R programming, respectively, and are thus of integer data type.

    The basic data types in R are character, numeric, integer, complex and logical.

    To view a data frame, we need to provide the name of the data frame as shown in the following code:

    v_marks

    ## [1] Name                  Roll_Number           Company

    ## [4] Computer_Architecture Data_Analysis         R_Programming

    ## <0 rows> (or 0-length row.names)
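
    The shape of the empty data frame can also be confirmed programmatically. The following minimal sketch recreates v_marks as defined above and queries it with base R functions; the name column_classes is illustrative:

```r
v_marks <- data.frame(Name = character(),
                      Roll_Number = character(),
                      Company = character(),
                      Computer_Architecture = integer(),
                      Data_Analysis = integer(),
                      R_Programming = integer(),
                      stringsAsFactors = FALSE)

nrow(v_marks)     # number of rows: 0, since no data has been added yet
ncol(v_marks)     # number of columns: 6
names(v_marks)    # the column names

# The class of each column, as declared when the data frame was created.
column_classes <- sapply(v_marks, class)
column_classes
```

    Even with no rows, the data frame remembers the type declared for each column, which helps keep later additions consistent.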

    Every data frame has a structure. To view the structure
