
Text Analysis with Python: A Research-Oriented Guide
Ebook · 456 pages · 2 hours


About this ebook

Text Analysis with Python: A Research-Oriented Guide is a quick and comprehensive reference on text mining using Python code. The main objective of the book is to equip the reader with the knowledge to apply various machine learning and deep learning techniques to text data. The book is organized into eight chapters that present the topic in a structured and progressive way.

Key Features
· Introduces the reader to Python programming and data processing
· Introduces the reader to the preliminaries of natural language processing (NLP)
· Covers data analysis and visualization using predefined Python libraries and datasets
· Teaches how to write text mining programs in Python
· Includes text classification and clustering techniques
· Informs the reader about different types of neural networks for text analysis
· Includes advanced analytical techniques such as fuzzy logic and deep learning
· Explains concepts in a simplified and structured way that is ideal for learners
· Includes references for further reading

Text Analysis with Python: A Research-Oriented Guide is an ideal guide for students in data science and computer science courses, and for researchers and analysts who want to work on artificial intelligence projects that require the application of text mining and NLP techniques.

Language: English
Release date: Sep 12, 2002
ISBN: 9789815049602
Author

Mamta Mittal

Dr. Mamta Mittal works as Head and Associate Professor (Data Analytics and Data Science) at Delhi Skill & Entrepreneurship University (under Government of NCT Delhi), New Delhi. She received a PhD in Computer Science and Engineering from Thapar University, Patiala; an MTech (Honors) in Computer Science & Engineering from YMCA, Faridabad; and a B. Tech in Computer Science & Engineering from Kurukshetra University, Kurukshetra, in 2001. She has been teaching for the last 18 years, with emphasis on Data Mining, Machine Learning, DBMS and Data Structures. Dr. Mittal is a lifetime member of CSI and has published more than 80 research papers. She holds five patents and copyrights in the areas of Artificial Intelligence, IoT and Deep Learning, two of which have been granted and three published. Dr. Mittal has edited/authored many books with reputed publishers and is working on the DST-approved project “Development of IoT based hybrid navigation module for mid-sized autonomous vehicles”. Currently, she is guiding PhD scholars in Machine Learning, Computer Vision and Deep Learning. Dr. Mittal is an Editorial Board member with Inder-Science, Bentham Science, Springer and Elsevier, has handled special issues, and has chaired many conferences.


    Book preview

    Text Analysis with Python - Mamta Mittal

    PREFACE

    This book focuses on the latest research in text mining using Python code. Its main objective is to apply various machine learning and deep learning techniques to textual data. Natural language processing and fuzzy rule generation are also discussed in detail, along with a basic introduction to Python, data handling, and data shaping. Several datasets are used to demonstrate text mining techniques in different research domains. The book is intended for readers who want to work in fields related to text mining. The authors present the content in a simple, understandable manner through step-by-step implementations of different algorithms. The book teaches text mining concepts from scratch and is organized into eight chapters.

    Chapter 1 covers the basics and preliminaries of natural language processing. It introduces the text mining workflow, information retrieval, and information extraction.

    Chapter 2 provides a brief introduction to the Python programming language. It focuses on the core Python language and the important libraries for natural language processing, using environments such as Anaconda and Google Colaboratory.

    Chapter 3 discusses data analysis, concentrating on data loading and the pre-processing concepts of text mining in Python, including how to import predefined Python libraries and apply visualization techniques with various Python modules.

    Chapter 4 discusses the basics of text mining and how to write Python programs with open-source NLP libraries. It also covers text mining techniques such as pre-processing, feature selection, feature extraction, and text summarization, with detailed examples.

    Chapter 5 presents text classification and text prediction techniques. Using a real movie review dataset, it discusses four classifiers in detail: naive Bayes, random forest, k-nearest neighbour, and support vector machine.

    Chapter 6 explains how to perform text clustering in Python with unsupervised machine learning techniques. To illustrate this, it uses the well-known Iris dataset from the UCI Machine Learning Repository, with accompanying Python scripts.

    Chapter 7 discusses fuzzy logic, different membership functions, their applications and challenges, and how to implement text mining tasks such as pre-processing, feature extraction, clustering, association rules, and classification using fuzzy membership functions and fuzzy rules.

    Chapter 8 covers deep learning for text mining in Python. It presents the basics of deep learning, different activation functions, their applications and challenges, and how to write Python programs that use deep learning.

    CONSENT FOR PUBLICATION

    Not applicable.

    CONFLICT OF INTEREST

    The author declares no conflict of interest, financial or otherwise.

    ACKNOWLEDGEMENT

    Declared none.

    Mamta Mittal

    Delhi Skill & Entrepreneurship University,

    New Delhi, India

    Gopi Battineni

    University of Camerino,

    Camerino,

    Italy

    Ms Bhimavarapu Usharani

    Department of CSE,

    Koneru Lakshmaiah Education

    Foundation at Vaddeswaram,

    Andhra Pradesh, India

    &

    Lalit Mohan Goyal

    Department of Computer Engineering,

    J.C. Bose University of Science

    & Technology,

    YMCA Faridabad (Hr.), India

    Introduction

    Mittal Mamta, Battineni Gopi, Usharani Bhimavarapu, Mohan Goyal Lalit

    Abstract

    This chapter explains what text data is, defines natural language, and describes the techniques involved in processing it. It also presents the text mining framework and its workflow.

    Keywords: Data mining, Linguistics, NLP, Sentiment analysis, Text data.

    1.1. INTRODUCTION

    The volume of text available on the web and on personal computers is growing quickly, but handling that text is not an easy task. To address this, intelligent algorithms have been introduced to retrieve significant information from data archives [1]. This process is called text mining. Text mining is not a new concept; it is an extension of data mining, and data mining algorithms can be applied to text data [2]. The difference between the two is that data mining is applied only to structured data, whereas text mining can be applied to both unstructured and semi-structured data. The approach taken in text mining depends on whether we are searching for the context of a text or its content.

    As noted, text mining works largely on unstructured data. To make this possible in practice, the data has to be converted into a semi-structured or structured form so that machine learning (ML) algorithms from data mining can be applied easily [3]. This conversion is done with data pre-processing techniques. Pre-processing is an essential step because it prepares the text data for mining. If it is skipped, data conflicts may occur and it may not be possible to obtain comprehensive analysis results. During pre-processing, punctuation and irrelevant words are removed, words can be grouped together and reduced to their roots (stemming), missing features can be replaced with average values, and letter case can be normalized to a single form. Depending on the needs of the application, further steps can be applied [4]. After pre-processing, the data is converted into a vector space model, on which different algorithms operate.
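
    The pre-processing steps and the vector space model described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's implementation: the stop-word list is a small hypothetical one, `crude_stem` is a naive suffix stripper standing in for a real stemmer such as Porter's, and each document is represented by raw term counts over a shared vocabulary.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects usually
# rely on a curated list such as the one shipped with NLTK.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "of", "to", "in", "on", "it"}

# Crude suffix stripping used as a stand-in for a real stemmer (e.g. Porter).
# Order matters: "es" is tried before "s".
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, drop punctuation and stop words, then stem each token."""
    text = text.lower()                    # normalize letter case
    tokens = re.findall(r"[a-z]+", text)   # keep alphabetic runs, dropping punctuation
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def to_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a shared vocabulary and a term-frequency vector per document."""
    vocab = sorted({term for doc in docs for term in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


if __name__ == "__main__":
    corpus = [
        "The movie was amazing, and the acting is superb!",
        "Mining text data is an essential step in text mining.",
    ]
    processed = [preprocess(doc) for doc in corpus]
    vocab, vectors = to_vectors(processed)
    print(vocab)
    print(vectors)
```

    Each row of `vectors` is one document in the vector space model; classification or clustering algorithms can then operate on these rows. Weighting schemes such as TF-IDF are a common refinement of the raw counts used here.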

    If users need help exploring text content, clustering, navigation, and visualization can be applied to group similar patterns. Finding connections, identifying relationships, and summarizing texts require content analysis, including information retrieval, information extraction, and natural language processing (NLP) [5]. In this chapter, we explore the idea of NLP and its underlying philosophy. We also look at what text data is and its syntax, the text pre-processing steps, and real-world applications of text mining.
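
    Finding connections between documents usually starts from a similarity measure on their vectors, which clustering algorithms then build on. As a minimal sketch, assuming whitespace tokenization and raw term counts, the hypothetical helpers below compute cosine similarity and report the most similar pair of documents in a small collection; this is a building block, not a clustering algorithm in itself.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def most_similar_pair(docs: list[str]) -> tuple[int, int, float]:
    """Return the indices of the two most similar documents and their score.

    Assumes at least two documents; tokenization is a plain lowercase split.
    """
    bags = [Counter(doc.lower().split()) for doc in docs]
    best = (0, 1, -1.0)
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            score = cosine_similarity(bags[i], bags[j])
            if score > best[2]:
                best = (i, j, score)
    return best
```

    For example, `most_similar_pair(["text mining finds patterns in text", "data mining finds patterns in data", "the weather is sunny today"])` pairs the first two documents, which share four of their terms.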

    1.2. NATURAL LANGUAGE

    A natural language is the language we use to communicate with each other. Both spoken and written communication are forms of natural language. Our daily lives are filled with text data: we come across it constantly in menus, e-mails, SMS messages, signs, webpages, and other sources [6]. As a species, we probably speak to one another more than we write, and speaking may be easier to learn than writing. We communicate primarily through voice and text. Because this type of data is so important, we need techniques to use and understand natural language, just as we do for other types of information.

    The problem with natural language is that it is messy and follows few fixed rules, yet we still understand each other most of the time. Human languages are also highly ambiguous, and they are constantly changing and evolving. Humans are exceptionally good at producing and understanding language and can express, perceive, and interpret nuanced meanings. Despite this proficiency in using language, we are very poor at formally understanding and describing the rules that govern it.

    Working with natural language in text data remains an unsolved problem. It has been studied for half a century and is still very difficult. It is hard from the perspective of the child, who spends years learning a language; it is hard for the adult language learner; it is hard for the researcher who designs experiments to demonstrate important phenomena; and it is hard for the engineer who builds systems that deal with natural language input or output. These tasks are so hard that Turing made fluent conversation in natural language the centerpiece of his test for intelligence [7].

    1.2.1. From Linguistics to Natural Language Processing (NLP)

    Linguistics is the scientific study of language, including its grammar, semantics, and phonetics. Classical linguistics involved devising and evaluating rules of language [8]. In general, however, the difficulties of understanding natural language resist clean mathematical formalisms.

    The term linguist can refer to anyone who studies language, although a self-described linguist may be more focused on fieldwork. Mathematics is a tool of science: mathematicians working on natural language may call their work mathematical linguistics, focusing on discrete mathematical formalisms and theory applied to natural language.

    Computational linguistics, on the other hand, studies language with the tools of computer science. Computational tools and thinking have overtaken most fields of study, so yesterday's linguist may be today's computational linguist. Computational linguistics studies computer systems for understanding and generating natural language, and one natural application is testing the grammars proposed by theoretical linguists.

    By writing and running software, large text datasets can be mined and new things discovered. Owing to their better results, robustness, and speed, statistical methods and statistical machine learning largely replaced classical top-down, rule-based approaches to language during the 1990s. Today, the field of natural language research is dominated by statistical methods.
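
    Corpus statistics are the simplest case of mining text with software. The sketch below, a hypothetical `top_terms` helper built on the standard library's `collections.Counter`, counts word frequencies across a list of texts; the lowercase, letters-and-apostrophes tokenization is an assumption chosen for brevity.

```python
import re
from collections import Counter


def top_terms(texts: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count word occurrences across a collection of texts and return the n most common."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)
```

    Because `Counter.update` streams token counts text by text, the same pattern scales to large collections read from files one document at a time.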

    Data-driven approaches to natural language have become so popular that they are now considered the standard approaches in computational linguistics. The wide availability of, and access to, electronic data has undoubtedly contributed to this development; another factor may be the perceived fragility of approaches based mostly on hand-crafted rules.

    Natural language can be handled not only with basic statistical methods but also with inference techniques such as those used in applied machine learning. Understanding natural language involves knowledge of morphology, grammar, semantics, pragmatics, and the world. A fundamental requirement for developing successful language systems is the ability to collect and encode all of this knowledge.

    1.2.2. Natural Language Processing (NLP)

    NLP is the automatic manipulation of natural language, such as text and speech, by software. Over the past 50 years, the study of NLP has grown out of the field of linguistics alongside the development of computers. This section discusses the idea behind natural language processing and why it is so significant.

    After reading this chapter, the reader should be able to answer the following questions:

    • What is NLP, and how does natural language differ from other types of data?

    • What makes natural language challenging to work with?

    • Where did the field of NLP come from, and how do current practitioners define it?

    To reflect the more empirical, engineering-oriented approach of statistical methods, computational linguistics is often referred to as NLP. Because statistical methods dominate the field, it is also often called statistical NLP, perhaps to distinguish it from classical computational linguistics techniques.

    Computational linguistics has both a scientific and an engineering side. The engineering side, often referred to as NLP, focuses on building computational tools for language, such as machine translation, summarization, question answering, and so forth. Like other engineering disciplines, NLP draws on a wide range of scientific disciplines. Even though the statistical approach to NLP has been highly successful in certain areas, traditional top-down approaches still have room to contribute and can bring great benefits.

    In general, statistical NLP associates probabilities with the alternatives encountered while analyzing an utterance or a text, and accepts the most probable outcome as the correct one. As ML practitioners working with text, we are concerned with the tools and methods of NLP. The previous section traced the path from linguistics to NLP; we now look at how researchers and practitioners describe the field. One of the more general textbooks written by leading scientists in the field calls the subject linguistic science, a term that allows discussion of both traditional linguistics and contemporary statistical techniques.
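
    The idea of accepting the most probable alternative can be illustrated with a bigram language model, a standard statistical NLP technique chosen here only as an example. The sketch below, with hypothetical helper names, estimates P(next word | previous word) from raw counts (maximum likelihood, no smoothing) and returns the most probable continuation; `<s>` is an assumed sentence-start marker.

```python
from collections import Counter, defaultdict
from typing import Optional, Tuple


def train_bigrams(sentences: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each preceding word."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, curr in zip(words, words[1:]):
            model[prev][curr] += 1
    # Return a plain dict so lookups of unseen words do not create entries.
    return dict(model)


def most_probable_next(model: dict[str, Counter], prev: str) -> Optional[Tuple[str, float]]:
    """Return the most probable next word and its estimated probability, or None."""
    followers = model.get(prev.lower())
    if not followers:
        return None  # the word was never seen in a preceding position
    word, count = followers.most_common(1)[0]
    return word, count / sum(followers.values())
```

    With the training sentences "the model predicts the next word", "the model learns", and "the data is noisy", the word following "the" is "model" with an estimated probability of 0.5, since "model" accounts for two of the four words observed after "the". Real systems add smoothing so that unseen word pairs do not receive zero probability.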

    Linguistic science aims to characterize and explain the many linguistic observations that surround us, such as those in conversations, literature, music, and other media. It has to
