Natural Language Processing: Fundamentals and Applications
Ebook, 139 pages, 1 hour


About this ebook

What Is Natural Language Processing


Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence that focuses on the interactions between computers and human language, specifically on how to program computers to process and analyze large volumes of natural language data. The end goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language used within them. Such a system can then accurately extract the information and insights contained in the documents, as well as classify and organize the documents themselves.


How You Will Benefit


(I) Insights and validations about the following topics:


Chapter 1: Introduction to Natural Language Processing


Chapter 2: Tokenization and Text Normalization


Chapter 3: Part-of-Speech Tagging


Chapter 4: Parsing and Syntax Trees


Chapter 5: Named Entity Recognition


Chapter 6: Sentiment Analysis


Chapter 7: Machine Translation


Chapter 8: Word Embeddings and Vector Space Models


Chapter 9: Deep Learning for Natural Language Processing


Chapter 10: Dialogue Systems and Chatbots


(II) Answers to the public's top questions about natural language processing.


(III) Real-world examples of the use of natural language processing in many fields.


(IV) 17 appendices explaining, briefly, 266 emerging technologies in each industry, to give a 360-degree understanding of natural language processing technologies.


Who This Book Is For


Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and anyone who wants to go beyond basic knowledge of natural language processing.

Language: English
Release date: Jul 4, 2023


    Book preview

    Natural Language Processing - Fouad Sabry

    Chapter 1: Natural language processing

    Natural language processing, also known as NLP, is a subfield of linguistics, computer science, and artificial intelligence that focuses on the interactions between computers and human language. More specifically, NLP investigates how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of understanding the contents of documents, including the contextual nuances of the language contained within them. Such a system can then accurately extract the information and insights held in the documents, as well as classify and organize the documents themselves.

    Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.

    The 1950s were a formative decade for natural language processing. In 1950, Alan Turing published an article titled Computing Machinery and Intelligence, in which he proposed what is now called the Turing test as a criterion of intelligence, though at the time this was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language.

    The premise of symbolic NLP is well summarized by John Searle's famous Chinese room thought experiment: given a collection of rules (for example, a Chinese phrasebook with questions and matching answers), a computer can emulate natural language understanding (or perform other NLP tasks) by applying those rules to the data it is presented with.
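
    To make the rule-application idea concrete, here is a minimal sketch of a purely symbolic question answerer: it produces answers only by looking questions up in a hand-written rule table, with no understanding involved. The questions, answers, and function names below are invented for illustration.

        # Symbolic (rule-based) NLP in the spirit of the Chinese-room thought
        # experiment: the program "answers" only by consulting a fixed,
        # hand-written rule table. All entries are invented for illustration.
        RULES = {
            "what is your name?": "My name is Room.",
            "how are you?": "I am fine, thank you.",
            "what time is it?": "I do not keep track of time.",
        }

        def answer(question: str) -> str:
            """Look the question up in the rule table; fall back to a stock reply."""
            key = question.strip().lower()
            return RULES.get(key, "I do not have a rule for that question.")

        print(answer("What is your name?"))    # My name is Room.
        print(answer("Why is the sky blue?"))  # falls through to the stock reply

    Anything outside the table falls through to a stock reply, which is exactly the brittleness that the machine-learning approaches discussed later in this chapter are meant to address.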

    1950s: As part of the Georgetown experiment in 1954, more than sixty Russian sentences were automatically translated into English. The authors claimed that the problem of machine translation would be solved within three to five years. Real progress was much slower, however, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill the expectations, funding for machine translation was drastically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed.

    Two of the most successful natural language processing systems developed in the 1960s were SHRDLU, a natural language system working in restricted blocks worlds with restricted vocabularies, and ELIZA, a simulation of a Rogerian psychotherapist written by Joseph Weizenbaum between 1964 and 1966. Although it used almost no information about human thought or emotion, ELIZA sometimes produced a startlingly human-like interaction. When the patient exceeded its very small knowledge base, ELIZA might give a generic response; for example, it might reply to My head aches with Why do you say your head hurts?.
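
    The following is a minimal ELIZA-style responder, included only to illustrate the pattern-matching mechanism just described; the patterns and reply templates are simplified inventions, not Weizenbaum's original script.

        # A toy ELIZA-style responder: each rule is a regular expression paired
        # with a reply template; when nothing matches, a generic response is
        # returned, mimicking ELIZA's behaviour outside its knowledge base.
        import re

        PATTERNS = [
            (re.compile(r"\bmy (\w+) (hurts|aches)\b", re.IGNORECASE),
             "Why do you say your {0} hurts?"),
            (re.compile(r"\bi am (.+)", re.IGNORECASE),
             "How long have you been {0}?"),
            (re.compile(r"\bi feel (.+)", re.IGNORECASE),
             "Why do you feel {0}?"),
        ]

        GENERIC = "Please tell me more."  # fallback when no pattern matches

        def respond(utterance: str) -> str:
            for pattern, template in PATTERNS:
                match = pattern.search(utterance)
                if match:
                    return template.format(*match.groups())
            return GENERIC  # the very small knowledge base has been exceeded

        print(respond("My head aches"))           # Why do you say your head hurts?
        print(respond("The weather is strange"))  # Please tell me more.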

    1970s: During the 1970s, many programmers began to write conceptual ontologies, which structured real-world information into computer-understandable data. Examples include MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981). During this time, the first chatterbots were written (e.g., PARRY).

    1980s: The 1980s and early 1990s were the heyday of symbolic methods in natural language processing. Focus areas of the time included research on rule-based parsing (for example, the development of HPSG as a computational operationalization of generative grammar) and morphology (for example, two-level morphology).

    Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing, which led to significant advances in the field. This was due both to the steady increase in computational power (see Moore's law) and to the gradual lessening of the dominance of Chomskyan theories of linguistics (such as transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing.

    1990s: Many of the notable early successes with statistical methods in natural language processing occurred in the field of machine translation, due especially to work at IBM Research. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. Most other systems, however, depended on corpora developed specifically for the tasks implemented by those systems, which was (and often still is) a major limitation on their success. As a result, a great deal of research has gone into methods of learning more effectively from limited amounts of data.

    2000s: With the growth of the web, increasing quantities of raw (unannotated) language data have become available since the mid-1990s. Research has therefore focused more and more on semi-supervised and unsupervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers, or from a combination of annotated and non-annotated data. In general, this task is much more difficult than supervised learning and typically produces less accurate results for a given amount of input data. However, an enormous amount of non-annotated data is readily available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results if the algorithm used has a low enough time complexity to be practical.
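
    As a toy illustration of learning from raw, unannotated text, the sketch below estimates word similarity purely from co-occurrence counts, with no hand-labelled answers involved. The corpus, window size, and function names are invented for illustration; real systems learn from web-scale text.

        # Unsupervised learning from unannotated text: build co-occurrence
        # vectors for each word and compare them with cosine similarity.
        from collections import Counter, defaultdict
        import math

        corpus = (
            "the cat sat on the mat . the dog sat on the rug . "
            "the cat chased the dog . the dog chased the cat ."
        ).split()

        WINDOW = 2                    # neighbours counted on each side of a word
        cooc = defaultdict(Counter)   # word -> counts of neighbouring words

        for i, word in enumerate(corpus):
            for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
                if i != j:
                    cooc[word][corpus[j]] += 1

        def cosine(a: str, b: str) -> float:
            """Cosine similarity between the co-occurrence vectors of two words."""
            va, vb = cooc[a], cooc[b]
            dot = sum(va[w] * vb[w] for w in va)
            na = math.sqrt(sum(v * v for v in va.values()))
            nb = math.sqrt(sum(v * v for v in vb.values()))
            return dot / (na * nb) if na and nb else 0.0

        # Words used in similar contexts ("cat" and "dog") end up more similar
        # than words used in different contexts ("cat" and "on").
        print(cosine("cat", "dog"), cosine("cat", "on"))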

    In the 2010s, representation learning and deep neural network-style machine learning methods became widespread in natural language processing. That popularity was due in part to a flurry of results showing that such techniques were successful.

    In the early days of computing, many language-processing systems were designed using symbolic methods, i.e., the hand-coding of a set of rules coupled with a dictionary lookup: for example, by writing grammars or devising heuristic rules for stemming.
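
    As a concrete (and deliberately crude) example of such a hand-written heuristic, the sketch below strips a few common English suffixes using a fixed, ordered rule list; the rules are simplified inventions, and a real stemmer such as Porter's has many more rules and conditions.

        # Heuristic suffix-stripping stemming: apply the first matching rule.
        SUFFIX_RULES = [
            ("sses", "ss"),   # "classes"  -> "class"
            ("ies",  "i"),    # "ponies"   -> "poni"
            ("ing",  ""),     # "running"  -> "runn"
            ("ed",   ""),     # "jumped"   -> "jump"
            ("s",    ""),     # "cats"     -> "cat"
        ]

        def stem(word: str) -> str:
            """Strip the first matching suffix; leave the word alone otherwise."""
            for suffix, replacement in SUFFIX_RULES:
                # the length check is a crude guard against stemming short words
                if word.endswith(suffix) and len(word) > len(suffix) + 2:
                    return word[: len(word) - len(suffix)] + replacement
            return word

        print([stem(w) for w in ["classes", "ponies", "running", "jumped", "cats", "is"]])
        # ['class', 'poni', 'runn', 'jump', 'cat', 'is']

    The imperfect outputs ("poni", "runn") hint at why such rule lists grow complicated quickly, which is the point the following paragraphs make about hand-written rules.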

    More recent systems based on machine-learning algorithms have several advantages over hand-produced rules:

    The learning procedures used during machine learning automatically focus on the most common cases, whereas when writing rules by hand it is often not at all obvious where the effort should be directed.

    Automatic learning procedures can make use of statistical inference algorithms to produce models that are robust to unfamiliar input (for example, containing words or structures that have not been seen before) and to erroneous input (for example, with misspelled words or words accidentally omitted); a toy illustration of this robustness appears after these points. Generally, handling such input gracefully with handwritten rules, or, more broadly, creating systems of handwritten rules that make soft decisions, is extremely difficult, error-prone, and time-consuming.

    Systems based on automatically learning the rules can be made more accurate simply by supplying more input data. By contrast, systems based on handwritten rules can only be made more accurate by increasing the complexity of the rules, which is a much harder task. In particular, there is a limit to the complexity of systems based on handwritten rules, beyond which they become more and more difficult to manage. However, creating more data to feed machine-learning systems simply requires a corresponding increase in the number of man-hours worked, generally without significant increases in the complexity of the annotation process.
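
    The following toy comparison illustrates the robustness point made above: an exact-match keyword rule fails on a misspelled word, while a simple statistical scheme based on character-trigram overlap still makes a sensible guess. The tiny lexicon, labels, and function names are invented for illustration.

        # Robustness to erroneous input: exact-match rules versus a crude
        # statistical similarity measure over character trigrams.
        def trigrams(word: str) -> set:
            padded = f"#{word.lower()}#"             # pad to mark word edges
            return {padded[i:i + 3] for i in range(len(padded) - 2)}

        LEXICON = {"excellent": "positive", "great": "positive",
                   "terrible": "negative", "awful": "negative"}

        def rule_based(word: str) -> str:
            # Hand-written rule: exact dictionary lookup only.
            return LEXICON.get(word.lower(), "unknown")

        def statistical(word: str) -> str:
            # Pick the lexicon entry whose character trigrams overlap the most,
            # a crude stand-in for statistical inference over noisy input.
            def overlap(known: str) -> int:
                return len(trigrams(word) & trigrams(known))
            best = max(LEXICON, key=overlap)
            return LEXICON[best] if overlap(best) > 0 else "unknown"

        print(rule_based("excelent"))   # unknown  -- the exact rule breaks on a typo
        print(statistical("excelent"))  # positive -- trigram overlap still works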

    Despite the growing popularity of machine learning in NLP research, symbolic methods were still (as of 2020) commonly used:

    when there is an insufficient quantity of training data to adequately apply machine-learning methods
