Applied Speech Processing: Algorithms and Case Studies
Ebook, 337 pages, 2 hours

About this ebook

Applied Speech Processing: Algorithms and Case Studies addresses the use of speech analytics in a range of systems and real-world activities, including sharing data-analytics information, building collaboration networks among participants, and applying video-conferencing in different application areas. The book offers a forum for discussing the characteristics of intelligent speech signal processing systems in different domains. It is intended for professionals, scientists, and engineers working with new techniques in intelligent speech signal processing methods and systems, and it also provides a foundation for undergraduate and postgraduate students.
  • Includes basics of speech data analysis and management tools with several applications, highlighting recording systems
  • Covers different techniques of big data and Internet-of-Things in speech signal processing, including machine learning and data mining
  • Offers a multidisciplinary view of current and future challenges in this field, with extensive case studies on the design, implementation, development and management of intelligent systems, neural networks, and related machine learning techniques for speech signal processing
Language: English
Release date: Jan 19, 2021
ISBN: 9780128242131
Author

Nilanjan Dey

Nilanjan Dey is an Associate Professor in the Department of Computer Science and Engineering, Techno International New Town, Kolkata, India. He is a visiting fellow of the University of Reading, UK. He also holds a position of Adjunct Professor at Ton Duc Thang University, Ho Chi Minh City, Vietnam. Previously, he held an honorary position of Visiting Scientist at Global Biomedical Technologies Inc., CA, USA (2012–2015). He was awarded his PhD from Jadavpur University in 2015. He is the Editor-in-Chief of the International Journal of Ambient Computing and Intelligence, IGI Global, USA. He is the Series Co-Editor of Springer Tracts in Nature-Inspired Computing (Springer Nature), Data-Intensive Research (Springer Nature), and Advances in Ubiquitous Sensing Applications for Healthcare (Elsevier). He was an associate editor of IET Image Processing and an editorial board member of Complex & Intelligent Systems, Springer Nature. He is an editorial board member of Applied Soft Computing, Elsevier. He has authored 35 books and over 300 publications in the areas of medical imaging, machine learning, computer-aided diagnosis, data mining, etc. He is a Fellow of IETE and a Senior Member of IEEE.

    Book preview

    Applied Speech Processing - Nilanjan Dey

    Preface

    Nilanjan Dey, Editor, Department of Computer Science and Engineering, JIS University, Kolkata, India

    This book presents the basics of speech data analysis and management tools with several applications by covering different techniques in speech signal processing. Part 1, "Speech enhancement and synthesis," includes five chapters, and Part 2, "Speech identification, feature selection, and classification," includes three chapters. In Chapter 1, Radhika and Chandrasekar apply a data-selective affine projection algorithm (APA) for speech processing applications. To remove noninnovative data and impulsive noise, the authors propose a kurtosis of error-based update rule. Results of the authors' study show that the proposed scheme is suitable for speech processing applications, as it reduces space and time requirements while increasing efficiency. In Chapter 2, Upadhyay and Rosales propose a recursive noise estimation-based Wiener filtering method for monaural speech enhancement. This method continuously estimates the noise from present and past frames of noisy speech, using a smoothing parameter value between 0 and 1. The authors compare the performance of the proposed approach with traditional speech enhancement methods. In Chapter 3, Kalamani and Krishnamoorthi develop a least mean square adaptive noise reduction (LMS-ANR) algorithm that automatically adapts its coefficients to the noisy input signal, enhancing Tamil speech with acceptable quality in a nonstationary noisy environment. In Chapter 4, Saleem and Khattak propose an unsupervised speech enhancement method to decrease the noise in nonstationary and difficult noisy backgrounds. They accomplish this by replacing the spectral phase of the noisy speech with an estimated spectral phase and merging it with a novel time-frequency mask during signal reconstruction. The results show considerable improvements in terms of short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), segmental signal-to-noise ratio (SSNR), and speech distortion. Chapter 5 by Khosravy et al. introduces a novel approach to speech synthesis by adaptively constructing and combining the harmonic components based on the fusion of Fourier series and adaptive filtering.

    Part 2 begins with Chapter 6 by Bibish Kumar et al., who discuss the primary task of identifying visemes and the number of frames required to encode the temporal evolution of vowel and consonant phonemes using an audio-visual Malayalam speech database. In Chapter 7, Al-Kaltakchi et al. propose novel fusion strategies for text-independent speaker identification. The authors run four main simulations of speaker identification accuracy (SIA), using different fusion strategies, including feature-based early fusion, score-based late fusion, early-late fusion (combination of feature and score-based), late fusion for concatenated features, and statistically independent normalized scores fusion for all the previous scores. In Chapter 8, Sangeetha et al. use the TAU Urban Acoustic Scenes 2019 dataset and the DCASE 2016 Challenge dataset to compare various standard classifiers, including support vector machines (SVMs) with different kernels, decision trees, and logistic regression, for classifying audio events. The authors extract several features, such as Mel-frequency cepstral coefficients (MFCCs), to generate the feature vector. The experimental results indicate that the SVM with a linear kernel yields the best result compared to the other machine learning algorithms.

    The editor would like to express his gratitude to the authors and referees for their contributions. Without their hard work and cooperation, this book would not have come to fruition. Extended thanks are given to the members of the Elsevier team for their support.

    Part 1

    Speech enhancement and synthesis

    Chapter 1: Kurtosis-based, data-selective affine projection adaptive filtering algorithm for speech processing application

    S. Radhika (a); A. Chandrasekar (b)

    (a) Department of Electrical and Electronics Engineering, School of Electrical and Electronics Engineering, Sathyabama Institute of Science and Technology, Chennai, India

    (b) Department of Computer Science and Engineering, St. Joseph’s College of Engineering, Chennai, India

    Abstract

    The data sets involved in speech processing applications are very large and, as such, they require huge memory and high-speed processing algorithms. Data selectivity therefore becomes essential in the present context. Moreover, these aggregated data sets often suffer from outliers caused by the surroundings or by measurement errors. Data-selective adaptive filters combine the strategy of data selection with the removal of outliers. This is particularly useful when new data do not provide any useful information compared with existing data. The affine projection algorithm (APA) is one of the most widely used algorithms for speech processing applications because of its low steady-state error and fast convergence speed. The conventional algorithms do not incorporate data selectivity and hence cannot solve problems associated with large data size. The available variants suffer from low efficiency, high computational load, high power consumption, and data redundancy, and they are better suited to Gaussian noise. This chapter therefore focuses on data-selective APA for speech processing applications. It proposes a kurtosis of error-based update rule that can simultaneously remove noninnovative data and impulsive noise. The proposed algorithm can reduce computational cost by updating fewer coefficients while maintaining the same accuracy. Simulations were performed on real and simulated data sets to validate the performance improvement of the proposed algorithm.

    Keywords

    Affine projection algorithm; Data selection; Speech; Kurtosis; Steady state mean square error; Convergence

    1.1: Introduction

    Speech is an important mode of communication that is produced naturally, without any electronic devices. Major applications of the speech signal include vehicle automation, gaming, communication systems, new language acquisition, pronunciation correction, online teaching, and so on. Speech signals also find applications in medicine, for example in developing assistive devices and identifying cognitive disorders. In the military field, speech signals are used in the development of high-end fighter jets, immersive audio flights, and more [1–4]. Nowadays, speech signal processing is indispensable for the development of smart cities. Generally, these speech signals are collected using acoustic sensors deployed in different places. The tremendous increase in the number of low-cost sensors has made large amounts of data available. The bulk data sets produced by these sensors demand huge amounts of memory and fast processing. Moreover, these aggregated data sets often suffer from outliers caused by the surroundings or by measurement errors [5]. Another key issue is that not all data in the data set are useful, as some data may carry no new information. Thus there is a growing demand for adaptive algorithms that combine data selectivity with the capability to remove noise and outliers [2]. Typically, the error is used as a metric for the level of new information in the data. Because speech signals contain a large share of non-Gaussian noise, second-order statistics of the error are not suitable metrics for speech processing applications. This work proposes an improved data-selective affine projection algorithm (APA) based on kurtosis of error for speech processing applications.
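    To illustrate why a fourth-order statistic such as kurtosis can be more informative than second-order statistics when the noise is impulsive, the following minimal sketch compares the excess kurtosis of a purely Gaussian error sequence with that of a Gaussian sequence contaminated by rare, large impulses. It is not the chapter's update rule; the function names, the contamination model, and all parameter values are assumptions chosen for illustration only.

```python
import random


def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth standardized moment minus 3.

    It is close to 0 for Gaussian data and grows when rare, large
    outliers (impulses) are present.
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0


def make_errors(n, impulse_prob, impulse_scale, seed):
    """Hypothetical error model: unit-variance Gaussian noise plus,
    with probability impulse_prob, a large Gaussian impulse."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        if rng.random() < impulse_prob:
            e += rng.gauss(0.0, impulse_scale)
        errors.append(e)
    return errors


# Assumed settings: 1% of samples carry an impulse with standard deviation 30.
gaussian_errors = make_errors(20000, impulse_prob=0.0, impulse_scale=0.0, seed=1)
impulsive_errors = make_errors(20000, impulse_prob=0.01, impulse_scale=30.0, seed=1)

k_gauss = excess_kurtosis(gaussian_errors)
k_impulsive = excess_kurtosis(impulsive_errors)

print(f"excess kurtosis, Gaussian errors:  {k_gauss:.3f}")
print(f"excess kurtosis, impulsive errors: {k_impulsive:.3f}")
```

    In this sketch the impulsive sequence yields a much larger excess kurtosis than the Gaussian one, which is the property a kurtosis-based test could exploit to flag outliers.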

    This chapter is organized as follows. Section 1.2 discusses the nature of speech signals, adaptive algorithms for speech processing applications, the traditional Least Mean Square (LMS) and Normalized LMS (NLMS) adaptive algorithms, and the proposed APA. It also examines the problems associated with current data-selective adaptive algorithms. Section 1.3 details the system model for the adaptive algorithm, and Section 1.4 examines the proposed update rule. Section 1.4 also discusses the mean squared error (MSE) of the algorithm and the nature of noise and error sources, analyzes the steady-state MSE of the APA under the proposed update rule, and presents simulations in which different scenarios are compared with their original counterparts, together with a discussion of the results. Finally, Section 1.5 presents conclusions along with the limitations and future scope of the proposed work.

    1.2: Literature review

    To design adaptive algorithms suitable for speech processing applications, it is necessary to understand the nature of speech signals. The impulse response of a speech signal is generally characterized by a long sequence length. It is also time varying and subject to both impulsive and background noise. A filter capable of adjusting its coefficients as the signal properties change is therefore required for speech processing applications, because the signal statistics are either not known a priori or time varying. An adaptive filter is a filter whose coefficients are adjusted by an adaptive algorithm. Adaptive filters are therefore the preferred choice for speech processing applications [6]. The criteria to be satisfied by an adaptive filter used for speech are data-selective capability, fast convergence, low steady-state error, robustness against background and impulsive noise (in case of double talk), and reduced computational complexity
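    As a concrete picture of a filter whose coefficients are adjusted by an adaptive algorithm, the sketch below uses the NLMS algorithm, one of the traditional algorithms named in the chapter overview, to identify an unknown FIR system from its input and noisy output. It is a baseline illustration under assumed settings, not the data-selective APA proposed in this chapter; the function name, the four-tap "unknown" system, the step size, and the noise level are all hypothetical.

```python
import random


def nlms_identify(x, d, num_taps, step_size=0.5, eps=1e-6):
    """Normalized LMS adaptive filter.

    x: input samples, d: desired (reference) samples.
    Returns the final coefficient vector and the a priori error sequence.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Regressor: the num_taps most recent inputs, newest first, zero-padded.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))   # filter output
        e = d[n] - y                               # a priori error
        norm = sum(uk * uk for uk in u) + eps      # regularized input energy
        # Coefficient update, normalized by the input energy.
        w = [wk + step_size * e * uk / norm for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors


# Assumed test setup: white Gaussian input through a short FIR system,
# observed with a small amount of Gaussian measurement noise.
rng = random.Random(0)
true_system = [0.8, -0.4, 0.2, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(3000)]
d = []
for n in range(len(x)):
    clean = sum(h * x[n - k] for k, h in enumerate(true_system) if n - k >= 0)
    d.append(clean + rng.gauss(0.0, 0.01))

w_est, err = nlms_identify(x, d, num_taps=len(true_system))
print("estimated coefficients:", [round(c, 3) for c in w_est])
```

    Every input sample triggers a coefficient update in this baseline, which is the cost that a data-selective scheme aims to reduce by skipping samples that carry no new information.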
