Applied Speech Processing: Algorithms and Case Studies
By Nilanjan Dey
About this ebook
- Includes basics of speech data analysis and management tools with several applications, highlighting recording systems
- Covers different techniques of big data and Internet-of-Things in speech signal processing, including machine learning and data mining
- Offers a multidisciplinary view of current and future challenges in this field, with extensive case studies on the design, implementation, development and management of intelligent systems, neural networks, and related machine learning techniques for speech signal processing
Nilanjan Dey
Nilanjan Dey is an Associate Professor in the Department of Computer Science and Engineering, Techno International New Town, Kolkata, India. He is a visiting fellow of the University of Reading, UK, and an Adjunct Professor at Ton Duc Thang University, Ho Chi Minh City, Vietnam. Previously, he held an honorary position of Visiting Scientist at Global Biomedical Technologies Inc., CA, USA (2012–2015). He was awarded his PhD from Jadavpur University in 2015. He is the Editor-in-Chief of the International Journal of Ambient Computing and Intelligence, IGI Global, USA. He is the Series Co-Editor of Springer Tracts in Nature-Inspired Computing (Springer Nature), Data-Intensive Research (Springer Nature), and Advances in Ubiquitous Sensing Applications for Healthcare (Elsevier). He was an associate editor of IET Image Processing and an editorial board member of Complex & Intelligent Systems, Springer Nature. He is an editorial board member of Applied Soft Computing, Elsevier. He has authored 35 books and over 300 publications in the areas of medical imaging, machine learning, computer-aided diagnosis, data mining, etc. He is a Fellow of IETE and a Senior Member of IEEE.
Book preview
Preface
Nilanjan Dey, Editor, Department of Computer Science and Engineering, JIS University, Kolkata, India
This book presents the basics of speech data analysis and management tools with several applications by covering different techniques in speech signal processing. Part 1, "Speech enhancement and synthesis," includes five chapters, and Part 2, "Speech identification, feature selection, and classification," includes three chapters. In Chapter 1, Radhika and Chandrasekar apply a data-selective affine projection algorithm (APA) for speech processing applications. To remove noninnovative data and impulsive noise, the authors propose a kurtosis of error-based update rule. The results of the authors' study suggest that the proposed scheme is suitable for speech processing applications, as it reduces space and time requirements while increasing efficiency. In Chapter 2, Upadhyay and Rosales propose a recursive noise estimation-based Wiener filtering method for monaural speech enhancement. This method continuously estimates the noise from present and past frames of noisy speech, using a smoothing parameter value between 0 and 1. The authors compare the performance of the proposed approach with traditional speech enhancement methods. In Chapter 3, Kalamani and Krishnamoorthi develop a least mean square adaptive noise reduction (LMS-ANR) algorithm that automatically adapts its coefficients to the noisy input signal, enhancing Tamil speech signals with acceptable quality in nonstationary noisy environments. In Chapter 4, Saleem and Khattak propose an unsupervised speech enhancement method to reduce the noise in nonstationary and difficult noisy backgrounds. They accomplish this by replacing the spectral phase of the noisy speech with an estimated spectral phase and merging it with a novel time-frequency mask during signal reconstruction. The results show considerable improvements in terms of short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), segmental signal-to-noise ratio (SSNR), and speech distortion. Chapter 5 by Khosravy et al. introduces a novel approach to speech synthesis by adaptively constructing and combining the harmonic components based on the fusion of Fourier series and adaptive filtering.
Part 2 begins with Chapter 6 by Bibish Kumar et al., who discuss the primary task of identifying visemes and the number of frames required to encode the temporal evolution of vowel and consonant phonemes using an audio-visual Malayalam speech database. In Chapter 7, Al-Kaltakchi et al. propose novel fusion strategies for text-independent speaker identification. The authors run four main simulations of speaker identification accuracy (SIA), using different fusion strategies, including feature-based early fusion, score-based late fusion, early-late fusion (a combination of feature-based and score-based fusion), late fusion for concatenated features, and statistically independent normalized scores fusion for all the previous scores. In Chapter 8, Sangeetha et al. use the TAU Urban Acoustic Scenes 2019 dataset and the DCASE 2016 Challenge dataset to compare various standard classifiers, including support vector machines (SVMs) with different kernels, decision trees, and logistic regression, for classifying audio events. The authors extract several features, such as Mel-frequency cepstral coefficients (MFCCs), to generate the feature vector. The experimental results indicate that the SVM with a linear kernel yields the best result compared to the other machine learning algorithms.
The editor would like to express his gratitude to the authors and referees for their contributions. Without their hard work and cooperation, this book would not have come to fruition. Extended thanks are given to the members of the Elsevier team for their support.
Part 1
Speech enhancement and synthesis
Chapter 1: Kurtosis-based, data-selective affine projection adaptive filtering algorithm for speech processing application
S. Radhika (a); A. Chandrasekar (b)
(a) Department of Electrical and Electronics Engineering, School of Electrical and Electronics Engineering, Sathyabama Institute of Science and Technology, Chennai, India
(b) Department of Computer Science and Engineering, St. Joseph’s College of Engineering, Chennai, India
Abstract
The data sets involved in speech processing applications are very large and, as such, they require large amounts of memory and high-speed processing algorithms. Data selectivity therefore becomes essential in the present context. Moreover, these aggregated data sets often suffer from outliers that may occur due to the surroundings or measurement errors. Data-selective adaptive filters combine data selection with the removal of outliers. This is particularly useful when new data provide no useful information beyond what the existing data already contain. The affine projection algorithm (APA) is one of the most widely used algorithms for speech processing applications because of its low steady-state error and fast convergence speed. The conventional algorithms do not incorporate data selectivity and hence cannot solve the problems associated with large data size. The available variants suffer from low efficiency, high computational load, high power consumption, and data redundancy, and they are suited mainly to Gaussian noise. Thus this chapter focuses on data-selective APA for speech processing applications. It proposes a kurtosis of error-based update rule that can simultaneously remove noninnovative data and impulsive noise. The proposed algorithm can reduce computational cost by reducing the number of coefficients to be updated while maintaining the same accuracy. Simulations were performed on real and simulated data sets to validate the performance improvement of the proposed algorithm.
Keywords
Affine projection algorithm; Data selection; Speech; Kurtosis; Steady state mean square error; Convergence
1.1: Introduction
Speech is an important mode of communication that is produced naturally without any electronic devices. Some of the major applications of the speech signal include vehicle automation, gaming, communication systems, new language acquisition, correct pronunciation, online teaching, and so on. In addition, speech signals also find applications in medicine for developing assistive devices, identifying cognitive disorders, and so on. In the military field, speech signals are used for the development of high-end fighter jets, immersive audio flights, and more [1–4]. Nowadays, speech signal processing is indispensable for the development of smart cities. Generally, these speech signals are collected using acoustic sensors deployed in different places. The tremendous increase in the number of sensors available at low cost results in large amounts of data. The bulk data sets produced by these sensors demand huge amounts of memory space and fast processing speeds. Moreover, these aggregated data sets often suffer from outliers that may occur due to the surroundings or measurement errors [5]. Another key issue is that not all data in the data set are useful, as some data may not contain new information. Thus there is a growing demand for an adaptive algorithm that combines data selectivity with the capability to remove noise and outliers [2]. Typically, the error is used as a metric to gauge the level of new information in the data set. Because speech signals are often corrupted by non-Gaussian noise, second-order statistics of the error are not suitable metrics for speech processing applications. This work proposes an improved data-selective affine projection algorithm (APA) based on kurtosis of error for speech processing applications.
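To illustrate why a fourth-order statistic such as kurtosis can be more informative than a second-order one when the error is impulsive, the following is a minimal sketch in Python with NumPy. It is not the authors' update rule. The noise model (Gaussian background plus impulses occurring with probability 0.01 and a standard deviation 10 times larger) and the names excess_kurtosis, gaussian_error, and impulsive_error are assumptions made only for this illustration.

```python
import numpy as np


def excess_kurtosis(e):
    """Sample excess kurtosis: E[(e - mean)^4] / var^2 - 3 (about 0 for Gaussian data)."""
    e = np.asarray(e, dtype=float)
    centered = e - e.mean()
    var = np.mean(centered ** 2)
    return np.mean(centered ** 4) / var ** 2 - 3.0


rng = np.random.default_rng(0)
n = 20000

# Hypothetical error sequence under purely Gaussian background noise.
gaussian_error = rng.normal(0.0, 1.0, n)

# Hypothetical impulsive contamination (Bernoulli-Gaussian model, assumed here):
# each sample carries an impulse with probability 0.01 and standard deviation 10.
impulse_mask = rng.random(n) < 0.01
impulsive_error = gaussian_error + impulse_mask * rng.normal(0.0, 10.0, n)

k_gauss = excess_kurtosis(gaussian_error)
k_imp = excess_kurtosis(impulsive_error)

for name, e, k in [("gaussian", gaussian_error, k_gauss),
                   ("impulsive", impulsive_error, k_imp)]:
    # The variance changes modestly; the kurtosis changes by orders of magnitude.
    print(f"{name:10s} variance={e.var():.2f} excess kurtosis={k:.2f}")
```

Under these assumptions, the variance of the contaminated error only roughly doubles, whereas its excess kurtosis moves far away from the near-zero value of the Gaussian case, which is the kind of separation a kurtosis-based criterion can exploit.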
This chapter is organized as follows. Section 1.2 discusses the nature of speech signals, adaptive algorithms for speech processing applications, the traditional Least Mean Square (LMS) and Normalized LMS (NLMS) adaptive algorithms, and the APA. It also examines the problems associated with current data-selective adaptive algorithms. Section 1.3 details the system model for the adaptive algorithm, and Section 1.4 examines the proposed update rule. Section 1.4 also discusses the mean squared error (MSE) of the algorithm and the nature of noise and error sources. Further, the section analyzes the steady-state MSE of the APA using the proposed update rule. It provides simulations in which different scenarios are compared with their original counterparts, and discusses the results obtained. Finally, Section 1.5 presents conclusions along with the limitations and future scope of the proposed work.
1.2: Literature review
To design adaptive algorithms suitable for speech processing applications, it is necessary to understand the nature of speech signals. The impulse response of a speech signal is generally characterized by a long sequence length. It is also time varying and subject to both impulsive and background noise. Because the signal statistics are not known a priori or are time varying, speech processing applications require a filter capable of adjusting its coefficients as the signal properties change. An adaptive filter is a type of filter in which the coefficients are changed according to the adaptive algorithm used. Therefore adaptive filters are the natural choice for speech processing applications [6]. The criteria to be satisfied by an adaptive filter used for speech are data-selective capability, fast convergence, low steady-state error, robustness against background and impulsive noise (in case of double talk), and reduced computational complexity