Silent Speech Interface: Fundamentals and Applications

Ebook · 187 pages · 2 hours


About this ebook

What Is Silent Speech Interface


The term "silent speech interface" refers to a technology that enables communication through speech without the need of the sound that is produced when humans vocalize their speech sounds. In this sense, it can be thought of as a form of electronic lip reading. The computer is able to determine the phonemes that a person pronounces based on information about their speech motions and other non-auditory sources of information about those movements. After that, speech synthesis is utilized to reproduce the speech based on these components.


How You Will Benefit


(I) Insights and validations about the following topics:


Chapter 1: Silent Speech Interface


Chapter 2: Speech Synthesis


Chapter 3: Brain-computer Interface


Chapter 4: Electromyography


Chapter 5: Subvocalization


Chapter 6: Gesture Recognition


Chapter 7: Subvocal Recognition


Chapter 8: Electroencephalography


Chapter 9: Facial Electromyography


Chapter 10: AlterEgo


(II) Answers to the public's top questions about silent speech interfaces.


(III) Real-world examples of the use of silent speech interfaces in many fields.


Who This Book Is For


Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and anyone who wants to go beyond basic knowledge of silent speech interfaces of any kind.


What Is the Artificial Intelligence Series


The artificial intelligence book series provides comprehensive coverage of more than 200 topics. Each ebook covers a specific artificial intelligence topic in depth, written by experts in the field. The series aims to give readers a thorough understanding of the concepts, techniques, history, and applications of artificial intelligence. Topics covered include machine learning, deep learning, neural networks, computer vision, natural language processing, robotics, ethics, and more. The ebooks are written for professionals, students, and anyone interested in learning about the latest developments in this rapidly advancing field.
The artificial intelligence book series provides an in-depth yet accessible exploration, from the fundamental concepts to the state-of-the-art research. With over 200 volumes, readers gain a thorough grounding in all aspects of Artificial Intelligence. The ebooks are designed to build knowledge systematically, with later volumes building on the foundations laid by earlier ones. This comprehensive series is an indispensable resource for anyone seeking to develop expertise in artificial intelligence.

Language: English
Release date: Jul 6, 2023


    Book preview

    Silent Speech Interface - Fouad Sabry

    Chapter 1: Auditory brainstem implant

    An auditory brainstem implant, also known as an ABI, is an electronic device that is surgically implanted in the brainstem of a person who is severely deaf as a result of retrocochlear hearing impairment (illness or injury damaging the cochlea or auditory nerve, and so precluding the use of a cochlear implant). In Europe, ABIs have been used in both children and adults, including individuals diagnosed with neurofibromatosis type II.

    The auditory brainstem implant was first developed in 1979 by William F. House, an otologist specializing in neurology who was affiliated with the House Ear Institute, for people who have been diagnosed with neurofibromatosis type 2 (NF2).

    House's original ABI consisted of two ball electrodes that were implanted near the surface of the cochlear nucleus on the brainstem.

    In 1997, Robert Behr at the University of Würzburg, Germany, carried out an ABI implantation using a 12-electrode array implant in conjunction with an audio processor based on the MED-EL C40+ cochlear implant.

    An ABI system is made up of an internal component, known as the implant, and an external component, the audio processor (or sound processor). Its appearance and purpose are comparable to those of a cochlear implant.

    The external audio processor can be worn on or behind the ear. At a minimum, it is equipped with a microphone that picks up sound signals from the surroundings. The audio processor converts these analog signals into digital signals and sends them to the coil, which transmits them through the skin to the implant.
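    As a rough, hedged illustration of that signal chain, the Python sketch below quantizes an analog microphone buffer into digital sample codes and frames them into packets for transmission to the coil. The sampling rate, bit depth, and packet size are invented for illustration and are not specifications of any real audio processor.

```python
# Illustrative sketch of an external audio-processor signal chain:
# microphone samples -> digital quantization -> framed packets for the coil.
# Parameters (16 kHz, 12-bit, 64-sample packets) are assumptions for illustration.
from typing import List
import numpy as np

SAMPLE_RATE_HZ = 16_000
BIT_DEPTH = 12
PACKET_SAMPLES = 64

def quantize(analog: np.ndarray, bits: int = BIT_DEPTH) -> np.ndarray:
    """Map analog samples in [-1, 1] to signed integer codes."""
    levels = 2 ** (bits - 1) - 1
    return np.clip(np.round(analog * levels), -levels, levels).astype(np.int16)

def frame_packets(samples: np.ndarray, size: int = PACKET_SAMPLES) -> List[np.ndarray]:
    """Split the digital stream into fixed-size packets for coil transmission."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# Simulated 50 ms microphone buffer (a 440 Hz tone standing in for environmental sound).
t = np.arange(int(0.05 * SAMPLE_RATE_HZ)) / SAMPLE_RATE_HZ
analog_buffer = 0.5 * np.sin(2 * np.pi * 440 * t)

packets = frame_packets(quantize(analog_buffer))
print(f"{len(packets)} packets ready to send across the skin to the implant")
```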

    Before 2018, the ABI was recommended only for individuals diagnosed with neurofibromatosis type 2 (NF2). The NF2 genetic abnormality is characterized primarily by the development of benign tumors throughout the nervous system. These vestibular schwannomas, often referred to as acoustic neuromas, frequently grow on the auditory nerve. Surgical excision of these NF2 tumors may damage the auditory nerve, which in turn limits the patient's ability to hear.

    Patients aged 12 months and older who cannot benefit from a cochlear implant owing to non-functional auditory nerves are eligible for ABI treatment in Europe and other countries where the device has been CE-marked and authorized. This covers etiologies present at birth as well as those acquired over time:

    Auditory nerve aplasia

    Auditory nerve hypoplasia

    Head trauma

    Non-NF2 tumours

    Severe cochlear ossification

    Because it needs a craniotomy, ABI implantation is a far more complicated procedure than CI surgery. In most cases, the procedure is carried out jointly by a neurosurgeon and an ENT surgeon. During the procedure, the electrode array is positioned on the surface of the cochlear nucleus by inserting it via the fourth ventricle.

    Speech perception results with an ABI are, in general, far lower than those found in users of multichannel cochlear implants. The majority of patients are able to identify the presence of speech and other noises from their surroundings.

    {End Chapter 1}

    Chapter 2: Speech synthesis

    Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and it can be implemented in software or hardware. A text-to-speech (TTS) system converts written-language text into spoken language, whereas other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech. The reverse process is speech recognition.

    Synthetic speech can be generated by stringing together segments of recorded speech that are stored in a database. Systems differ in the size of the stored speech units: a system that stores phones or diphones can produce the widest range of sounds but may lack clarity, while storing entire words or phrases allows high-quality output for specific application areas. Alternatively, a synthesizer can incorporate a model of the human vocal tract and other human voice characteristics to create a completely synthetic voice output.

    The effectiveness of a speech synthesizer is judged by how closely it reproduces the sound of the human voice and by how easily it can be understood. An intelligible text-to-speech program allows people who cannot read because of visual impairments or reading disabilities to listen to written words on a home computer. Many computer operating systems have included speech synthesis software since the early 1990s.

    A text-to-speech system, sometimes known as an engine, is made up of two components: a front end, which converts raw text into a symbolic linguistic representation such as phonetic transcriptions and prosodic units, and a back end, which converts that representation into sound, often computing a target prosody (pitch contour and phoneme durations) which is then superimposed over the speech that is produced.
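    To make the front-end/back-end split concrete, here is a minimal, hypothetical sketch of a TTS engine in Python. The tiny pronunciation lexicon, the flat 80 ms phoneme duration, and the stubbed back end are placeholder assumptions; real front ends also handle text normalization, abbreviations, and much larger lexicons.

```python
# Minimal sketch of a TTS engine split into a front end and a back end.
# The tiny lexicon and the flat phoneme duration are placeholders, not real models.
from typing import List, Tuple

LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}

def front_end(text: str) -> List[Tuple[str, float]]:
    """Convert raw text into (phoneme, duration) pairs - the symbolic representation."""
    phonemes: List[Tuple[str, float]] = []
    for word in text.lower().split():
        for ph in LEXICON.get(word, []):
            phonemes.append((ph, 0.08))  # assume a flat 80 ms per phoneme
    return phonemes

def back_end(symbols: List[Tuple[str, float]]) -> bytes:
    """Convert the symbolic representation into audio (stubbed out here)."""
    total_seconds = sum(duration for _, duration in symbols)
    print(f"Synthesizing {len(symbols)} phonemes, about {total_seconds:.2f} s of audio")
    return b""  # a real back end would return a waveform

audio = back_end(front_end("hello world"))
```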

    People attempted to build machines that could mimic human speech long before the invention of electronic signal processing. Early legends about brazen heads mentioned Pope Silvester II (died 1003 AD), Albertus Magnus (1198–1280), and Roger Bacon (1214–1294).

    In 1779 the German-Danish scientist Christian Gottlieb Kratzenstein won the first prize in a competition announced by the Russian Imperial Academy of Sciences and Arts for models he built of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation: [aː], [eː], [iː], [oː] and [uː]).

    The vocoder, developed at Bell Labs in the 1930s, could automatically break human speech down into its component tones and resonances. Homer Dudley, known for his work on the vocoder, went on to design a keyboard-operated voice synthesizer called the Voder (Voice Demonstrator), which he exhibited at the 1939 New York World's Fair.

    Dr. Franklin S. Cooper and his colleagues at Haskins Laboratories began building the Pattern Playback in the late 1940s and completed it in 1950. The hardware existed in several distinct versions, only one of which survives today. The machine converts pictures of the acoustic patterns of speech, in the form of spectrograms, back into sound. Using this device, Alvin Liberman and his colleagues were able to discover acoustic cues for the perception of phonetic segments (consonants and vowels).

    The first computer-based speech-synthesis systems appeared in the late 1950s. In 1968, Noriko Umeda and her colleagues at the Electrotechnical Laboratory in Japan developed the first general English text-to-speech system. Later, linear predictive coding (LPC) served as the basis for early speech synthesizer chips, such as the Texas Instruments LPC Speech Chips used in the Speak & Spell toys from 1978.

    While working at NTT in 1975, Fumitada Itakura created the line spectral pairs (LSP) technique for high-compression voice coding. This approach was named after the spectral lines that appear in pairs.

    MUSA, released in 1975, was one of the first speech synthesis systems. It consisted of stand-alone computer hardware and specialized, language-specific software that enabled it to read Italian. A second version, released in 1978, was also able to sing Italian in an a cappella style.

    The dominant systems in the 1980s and 1990s were the DECtalk system, based largely on the work of Dennis Klatt at MIT, and the Bell Labs system; the latter was one of the first multilingual language-independent systems, making extensive use of natural language processing methods.

    Portable electronic devices with speech synthesis began to appear in the 1970s. One of the first was the Telesensory Systems Inc. (TSI) Speech+ portable calculator for the blind, released in 1976. Another early example was the arcade version of Berzerk, released in 1980; in the same year, the Milton Bradley Company produced Milton, the first multi-player electronic game to use speech synthesis.

    Early electronic speech synthesizers sounded robotic and were often barely intelligible. Although the quality of synthesized speech has steadily improved, as of 2016 the output of modern speech synthesis systems remained clearly distinguishable from genuine human speech.

    Prior to 1990, when Ann Syrdal of AT&T Bell Laboratories produced a female voice, synthesized voices often had a masculine quality to them.

    Naturalness and intelligibility are the two most essential characteristics of a speech synthesis system. Naturalness describes how closely the output resembles human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible, and speech synthesis systems usually try to maximize both characteristics.

    Concatenative synthesis and formant synthesis are the two basic methods that are used in the generation of waveforms for synthetic speech. Each method has advantages and disadvantages, and the objectives of the synthesis system will often serve as the primary determinant in selecting one method over another.

    Concatenative synthesis is based on the concatenation, or stringing together, of segments of recorded speech. It generally produces the most natural-sounding synthetic speech. However, differences between natural variations in speech and the automated techniques used to segment the waveforms sometimes result in audible glitches in the output. Concatenative synthesis can be broken down into three distinct sub-types.

    Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences. The segmentation is typically done with a specially modified speech recognizer set to a forced alignment mode, followed by some manual correction using visual representations such as the waveform and the spectrogram. Once the units in the speech database have been segmented, an index of those units is built from the segmentation and from acoustic properties such as the fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At run time, the desired target utterance is created by selecting from the database the chain of candidate units with the best overall score (unit selection). This process is typically carried out using a specially weighted decision tree.
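    As a sketch of the selection step only, the Python toy below scores candidate units by a target cost (how well a unit's pitch and duration match the request) plus a join cost (how smoothly it connects to the previous unit) and picks the cheapest one greedily; production systems instead run a Viterbi-style search over the whole candidate lattice and use far richer features. The Unit fields, cost weights, and the tiny inventory are invented for illustration.

```python
# Toy unit-selection sketch: pick one recorded unit per target phone by
# minimizing target cost (match to the request) plus join cost (smoothness
# of the concatenation). Features and weights are made up for illustration.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Unit:
    phone: str
    pitch_hz: float      # fundamental frequency of the recorded unit
    duration_s: float    # duration of the recorded unit

def target_cost(unit: Unit, want_pitch: float, want_dur: float) -> float:
    return abs(unit.pitch_hz - want_pitch) / 100 + abs(unit.duration_s - want_dur) * 10

def join_cost(prev: Unit, cur: Unit) -> float:
    # Penalize pitch discontinuities at the concatenation point.
    return abs(prev.pitch_hz - cur.pitch_hz) / 100

def select_units(targets: List[Tuple[str, float, float]],
                 inventory: Dict[str, List[Unit]]) -> List[Unit]:
    """Greedy left-to-right selection (real systems search the whole lattice)."""
    chosen: List[Unit] = []
    for phone, want_pitch, want_dur in targets:
        best = min(
            inventory[phone],
            key=lambda u: target_cost(u, want_pitch, want_dur)
            + (join_cost(chosen[-1], u) if chosen else 0.0),
        )
        chosen.append(best)
    return chosen

inventory = {
    "AH": [Unit("AH", 110, 0.09), Unit("AH", 180, 0.07)],
    "L":  [Unit("L", 115, 0.06), Unit("L", 190, 0.05)],
}
targets = [("AH", 120, 0.08), ("L", 120, 0.06)]
print(select_units(targets, inventory))
```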

    Because unit selection applies only a small amount of digital signal processing (DSP) to the recorded speech, it produces the most natural-sounding output of the three methods. DSP often makes recorded speech sound less natural, although some systems apply a small amount of signal processing at the concatenation points to smooth the waveform. The output of the best unit-selection systems can be difficult to distinguish from real human voices, especially in contexts for which the TTS system has been tuned. However, achieving maximum naturalness typically requires very large unit-selection speech databases, in some systems ranging into the terabytes of recorded data and representing hundreds of hours of speech.
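    One common way to smooth the waveform at a join is a short crossfade between adjacent units, as in the NumPy sketch below. The 5 ms overlap and the test tones are arbitrary illustrative choices, not parameters of any particular synthesizer.

```python
# Minimal crossfade at a concatenation point: the end of one unit is faded out
# while the start of the next is faded in, reducing audible discontinuities.
import numpy as np

def crossfade_concat(a: np.ndarray, b: np.ndarray, sample_rate: int,
                     overlap_ms: float = 5.0) -> np.ndarray:
    n = min(int(sample_rate * overlap_ms / 1000), len(a), len(b))
    fade_out = np.linspace(1.0, 0.0, n)
    fade_in = 1.0 - fade_out
    blended = a[-n:] * fade_out + b[:n] * fade_in
    return np.concatenate([a[:-n], blended, b[n:]])

sr = 16_000
t = np.arange(sr // 10) / sr                      # two 100 ms test tones
unit_a = np.sin(2 * np.pi * 200 * t)
unit_b = np.sin(2 * np.pi * 210 * t)
joined = crossfade_concat(unit_a, unit_b, sr)
print(joined.shape)
```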

    Diphone synthesis uses a minimal speech database containing all the diphones, or sound-to-sound transitions, occurring in a language. The number of diphones depends on the phonotactics of the language: Spanish has around 800 diphones, while German has approximately 2,500. In diphone synthesis, only one example of each diphone is stored in the speech database. At run time, the target prosody of a sentence is superimposed on these minimal units by means of digital signal processing techniques such as linear predictive coding, PSOLA, and others.
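    The sketch below illustrates the lookup-and-concatenate step under simplifying assumptions: a phoneme sequence is turned into diphone keys, each key maps to exactly one stored recording, and the prosody-modification step (PSOLA, LPC, and so on) is only stubbed out as a comment. The inventory contents are placeholders.

```python
# Toy diphone synthesis: convert a phoneme string into diphone keys, look each
# one up in an inventory that stores exactly one recording per diphone, and
# concatenate them. Real systems would then adjust pitch/duration (e.g. PSOLA).
from typing import Dict, List
import numpy as np

def to_diphones(phonemes: List[str]) -> List[str]:
    """['HH','AH','L','OW'] -> ['HH-AH', 'AH-L', 'L-OW'] (sound-to-sound transitions)."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]

def synthesize_diphones(phonemes: List[str], inventory: Dict[str, np.ndarray]) -> np.ndarray:
    pieces = [inventory[d] for d in to_diphones(phonemes)]   # one instance per diphone
    waveform = np.concatenate(pieces)
    # Placeholder: a real system would impose the target prosody here (PSOLA, LPC, ...).
    return waveform

inventory = {"HH-AH": np.zeros(800), "AH-L": np.zeros(800), "L-OW": np.zeros(800)}
print(synthesize_diphones(["HH", "AH", "L", "OW"], inventory).shape)
```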

    Domain-specific synthesis creates complete utterances by stringing together prerecorded words and phrases. It is used in applications where the variety of texts the system will output is limited to a particular domain, such as transit schedule announcements or weather reports. The technology is simple to implement and has been used in a variety of consumer products, such as talking clocks and calculators.
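    A talking clock is the classic example: the whole utterance is assembled from a small, fixed set of prerecorded clips. The clip file names in the sketch below are invented for illustration.

```python
# Domain-specific synthesis sketch: a talking clock assembles its output from a
# small, fixed set of prerecorded clips. File names here are invented examples.
from typing import List

def clock_prompt(hour: int, minute: int) -> List[str]:
    """Return the ordered list of prerecorded clips for 'It is <hour> <minute>'."""
    clips = ["it_is.wav", f"hour_{hour:02d}.wav"]
    if minute == 0:
        clips.append("oclock.wav")
    else:
        clips.append(f"minute_{minute:02d}.wav")
    return clips

# The playback layer would simply concatenate and play these recordings in order.
print(clock_prompt(9, 30))   # ['it_is.wav', 'hour_09.wav', 'minute_30.wav']
```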
