Speech Recognition: Fundamentals and Applications
By Fouad Sabry
About this ebook
What Is Speech Recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that focuses on developing methods and technologies that enable computers to recognize spoken language and translate it into text. It is also known as automatic speech recognition (ASR), computer speech recognition (CSR), or speech to text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering. Speech synthesis is the reverse process.
How You Will Benefit
(I) Insights and validations about the following topics:
Chapter 1: Speech recognition
Chapter 2: Computational linguistics
Chapter 3: Natural language processing
Chapter 4: Speech processing
Chapter 5: Pattern recognition
Chapter 6: Language model
Chapter 7: Deep learning
Chapter 8: Recurrent neural network
Chapter 9: Long short-term memory
Chapter 10: Voice computing
(II) Answers to the public's top questions about speech recognition.
(III) Real-world examples of the use of speech recognition in many fields.
(IV) 17 appendices that briefly explain 266 emerging technologies in each industry, for a 360-degree understanding of speech recognition technologies.
Who This Book Is For
Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and anyone who wants to go beyond basic knowledge of any aspect of speech recognition.
Chapter 1: Speech recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that focuses on developing methods and technologies that enable computers to recognize spoken language and translate it into text. Its main benefit is that the resulting text can be searched. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech to text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering. Speech synthesis is the reverse process.
Some speech recognition systems require training, also known as enrollment, in which an individual speaker reads text or isolated vocabulary into the system. The system analyzes that person's specific voice and uses the analysis to fine-tune recognition of their speech, which improves accuracy for that speaker. Systems that do not require training are called speaker-independent; systems that require training are called speaker-dependent.
Speech recognition applications include voice user interfaces such as voice dialing (for example, "call home"), call routing (for example, "I would like to make a collect call"), domotic appliance control, keyword search (for example, finding a podcast in which particular words were spoken), simple data entry (for example, entering a credit card number), preparation of structured documents (for example, a radiology report), determining speaker characteristics, speech-to-text processing (for example, word processors), and aircraft control (usually termed direct voice input).
The term voice recognition, or speaker identification, refers to identifying who is speaking rather than what is being said. Recognizing the speaker can simplify the task of transcribing speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify a speaker's identity as part of a security process. The latter use is important for protecting sensitive information.
Speech recognition has a long history, with several waves of major innovation. Recent advances in deep learning and big data have benefited the field. The progress is evident not only in the growing number of academic papers published in the field but, more importantly, in the worldwide industrial adoption of a variety of deep learning methods for designing and deploying speech recognition systems.
The most significant improvements were made in the following areas: vocabulary size; speaker independence; and processing speed.
In 1952, three researchers from Bell Labs, Stephen Balashek, …
In 1960, Gunnar Fant created and published the source-filter model of speech production.
At the 1962 World's Fair, IBM demonstrated the speech recognition capabilities of its "Shoebox" machine, which could recognize up to 16 words.
In 1966, while working on speech recognition, Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT) first proposed linear predictive coding (LPC), a speech coding method.
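LPC models each speech sample as a weighted sum of the preceding samples. The sketch below shows one common textbook way to estimate those weights for a single frame: the autocorrelation method, solved with the Levinson-Durbin recursion. It is a minimal illustration under that assumption, not the procedure Itakura and Saito published, and the function names `autocorrelation` and `levinson_durbin` are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame (no normalization)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err), where the predictor is x[n] ~ sum(a[k-1] * x[n-k])
    for k = 1..order, and err is the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused; a[1..order] are the weights
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                    # signal is perfectly predictable; stop early
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k           # error energy shrinks at every stage
    return a[1:], err
```

For example, with `r = [1.0, 0.5, 0.25]` (the autocorrelation of a first-order process) and `order = 2`, the recursion returns coefficients `[0.5, 0.0]`: the second weight is zero because each sample depends only on the one before it.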
In 1969, the influential John Pierce wrote an open letter that was critical of speech recognition research. As a result, funding for speech recognition research at Bell Labs dried up for many years, and it was not restored until Pierce left the company and James L. Flanagan took charge.
Raj Reddy was the first person to work on continuous speech recognition, as a graduate student at Stanford University in the late 1960s. Earlier systems required the user to pause after each word. Reddy's system used spoken commands to play chess.
Around this time, Soviet researchers developed the dynamic time warping (DTW) algorithm and used it to build a recognizer that could operate on a vocabulary of up to 200 words. DTW processed speech by dividing it into short frames (for example, 10 ms segments) and treating each frame as a single unit. Although DTW was later superseded by more advanced algorithms, the technique itself lived on. Speaker independence, however, remained unsolved at this point.
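The frame-based matching that dynamic time warping (DTW) performs can be sketched as follows. This is a minimal illustration, assuming non-overlapping 10 ms frames, a single log-energy value per frame as a stand-in feature, and absolute difference as the local cost; it does not reproduce the features or settings of the historical Soviet recognizer, and all function names are hypothetical.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=10):
    """Split samples into non-overlapping frames of frame_ms milliseconds.

    Trailing samples that do not fill a whole frame are dropped.
    """
    size = max(1, int(sample_rate * frame_ms / 1000))
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def frame_energy(frame):
    """Crude per-frame feature: log energy, with a small floor to avoid log(0)."""
    return math.log(sum(s * s for s in frame) + 1e-10)


def dtw_distance(a, b):
    """Minimum cumulative cost of aligning feature sequences a and b.

    Each step may advance in a, in b, or in both, so sequences spoken at
    different speeds can still be matched frame by frame.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

In a template-matching recognizer of this kind, each vocabulary word would be stored as a sequence of frame features, and an incoming utterance would be labeled with the word whose template has the smallest `dtw_distance`. Because a sequence such as `[1, 2, 3]` aligns with the slower `[1, 2, 2, 3]` at zero cost, differences in speaking rate do not count against the match.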
In 1971, DARPA funded five years of Speech Understanding Research, a speech recognition program that aimed for a vocabulary of at least 1,000 words. The program's organizers believed that understanding speech would be essential to progress in speech recognition, but this later proved untrue. The program revived speech recognition research after John Pierce's letter.
In 1972, the IEEE Acoustics, Speech, and Signal Processing section hosted a conference in Newton, Massachusetts.
The International Conference on Acoustics, Speech, and Signal Processing (ICASSP), first held in 1976, has since been the premier venue for presenting and publishing speech recognition research.
The use of hidden Markov models (HMMs) allowed researchers to combine several sources of knowledge, such as acoustics, language, and grammar, into a single probabilistic model.
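The core computation behind an HMM can be illustrated with the forward algorithm, which sums over every hidden-state path to give the probability of an observation sequence. The sketch below assumes a small discrete HMM; the toy states, symbols, and probabilities are hypothetical. Practical recognizers typically chained HMMs for individual sound units and used continuous acoustic features (for example, Gaussian mixture emissions) rather than this discrete toy.

```python
def forward(obs, start_p, trans_p, emit_p):
    """Return P(obs) under a discrete HMM using the forward algorithm.

    start_p[s]    : probability of starting in state s
    trans_p[s][t] : probability of moving from state s to state t
    emit_p[s][o]  : probability that state s emits observation symbol o
    """
    if not obs:
        return 1.0  # the empty sequence is certain
    states = list(start_p)
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            t: sum(alpha[s] * trans_p[s][t] for s in states) * emit_p[t][o]
            for t in states
        }
    return sum(alpha.values())


# Hypothetical toy model: two hidden states standing in for sound units,
# each emitting one of two coarse acoustic symbols, "lo" or "hi".
TOY_START = {"S1": 0.6, "S2": 0.4}
TOY_TRANS = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
TOY_EMIT = {"S1": {"lo": 0.5, "hi": 0.5}, "S2": {"lo": 0.1, "hi": 0.9}}
```

A call such as `forward(["lo", "hi", "hi"], TOY_START, TOY_TRANS, TOY_EMIT)` scores one observation sequence. In a recognizer, the transition structure can encode pronunciation and grammar while the emission probabilities encode acoustics, which is how one probabilistic model can combine these knowledge sources.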
In the mid-1980s, Fred Jelinek's team at IBM built a voice-activated typewriter called Tangora, which could handle a vocabulary of 20,000 words.
In addition, the n-gram language model was developed and put into use throughout the 1980s.
1987 saw the introduction of the back-off model, which made it possible for language models to use n-grams of multiple lengths. At the same time, CSELT began using HMMs to distinguish different languages (both in software and in specialized hardware processors, e.g. RIPAC).
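A back-off language model uses a longer n-gram when it has been observed and falls back to a shorter one otherwise. The sketch below shows the idea for bigrams backing off to unigrams. It deliberately swaps in absolute discounting, a simpler scheme than the Good-Turing discounting used in Katz back-off, so treat it as an illustration of the back-off principle only; the class name `BackoffBigram` and the default `discount` value are hypothetical.

```python
from collections import Counter


class BackoffBigram:
    """Bigram language model with absolute discounting and unigram back-off.

    Seen bigrams get a discounted relative frequency. The probability mass
    removed by the discount is shared among words never seen after the
    history, in proportion to their unigram probabilities.
    """

    def __init__(self, sentences, discount=0.5):
        assert 0.0 < discount < 1.0   # keeps every seen bigram's probability positive
        self.d = discount
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.context = Counter()      # how often each word occurs as a bigram history
        self.successors = {}          # history word -> set of words seen after it
        for words in sentences:
            self.unigrams.update(words)
            for v, w in zip(words, words[1:]):
                self.bigrams[(v, w)] += 1
                self.context[v] += 1
                self.successors.setdefault(v, set()).add(w)
        self.total = sum(self.unigrams.values())

    def p_unigram(self, w):
        return self.unigrams[w] / self.total

    def prob(self, v, w):
        """Return P(w | v)."""
        if self.context[v] == 0:
            return self.p_unigram(w)  # history never seen: use the unigram model
        c_vw = self.bigrams[(v, w)]
        if c_vw > 0:
            return (c_vw - self.d) / self.context[v]
        seen = self.successors[v]
        # Mass freed by discounting every seen bigram that starts with v
        left_over = self.d * len(seen) / self.context[v]
        # Unigram mass of the words NOT seen after v, used to renormalize
        unseen_mass = 1.0 - sum(self.p_unigram(u) for u in seen)
        if unseen_mass <= 0.0:
            return 0.0
        return left_over * self.p_unigram(w) / unseen_mass
```

Renormalizing the leftover mass over only the unseen successors keeps each conditional distribution summing to one. A model trained on `[["a", "b"], ["a", "c"], ["b", "c"]]` therefore still assigns a nonzero probability to the unseen bigram ("b", "b") by backing off to the unigram estimate.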
Much of the progress in this area is owed to the rapidly increasing capabilities of computers. When the DARPA program ended in 1976, the best computer available to researchers was the PDP-10 with 4 MB of RAM.
Two practical products also appeared:
1984 – the Apricot Portable, which supported up to 4096 words, of which only 64 could be held in RAM at a time.
1987 – a recognizer from Kurzweil Applied Intelligence.
1990 – Dragon Dictate, a consumer product, was released.
Xuedong Huang, a former student of Raj Reddy's, developed the Sphinx-II system at CMU. Sphinx-II was the first system to achieve speaker-independent, large-vocabulary, continuous speech recognition, and it had the best performance in DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone in the history of speech recognition. Huang went on to found the speech recognition group at Microsoft in 1993. Kai-Fu Lee, another of Raj Reddy's students, joined Apple, where in 1992 he helped create a speech interface prototype for the Apple computer known as Casper.
Lernout & Hauspie (L&H), a Belgium-based speech recognition company, acquired several other companies over the years, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. L&H speech technology was used in the Windows XP operating system. L&H was an industry leader until an accounting scandal brought the company to an end in 2001. Its speech technology was bought by ScanSoft, which became Nuance in 2005. Apple originally licensed software from Nuance to provide speech recognition for its digital assistant, Siri.
In the 2000s, DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002 and Global Autonomous Language Exploitation (GALE). Four teams participated in the EARS program: IBM, a team led by BBN that included LIMSI and the University of Pittsburgh, Cambridge University, and