Neural Modeling of Speech Processing and Speech Learning: An Introduction
About this ebook

This book explores the processes of spoken language production and perception from a neurobiological perspective. After presenting the basics of speech processing and speech acquisition, a neurobiologically-inspired and computer-implemented neural model is described, which simulates the neural processes of speech processing and speech acquisition. This book is an introduction to the field and aimed at students and scientists in neuroscience, computer science, medicine, psychology and linguistics.

Language: English
Publisher: Springer
Release date: Jul 11, 2019
ISBN: 9783030158538

    Book preview

    Neural Modeling of Speech Processing and Speech Learning - Bernd J. Kröger

    © Springer Nature Switzerland AG 2019

    Bernd J. Kröger and Trevor Bekolay, Neural Modeling of Speech Processing and Speech Learning. https://doi.org/10.1007/978-3-030-15853-8_1

    1. Introduction

    Bernd J. Kröger¹  and Trevor Bekolay²

    (1)

    Department of Phoniatrics, Pedaudiology and Communications Disorders, RWTH Aachen University, Aachen, Germany

    (2)

    Applied Brain Research, Waterloo, ON, Canada

    Abstract

    Language is an effective means of communication. Spoken language provides information quickly. How does spoken language work? Broadly speaking, spoken language relies on speech production and speech perception networks, and neural repositories for linguistic knowledge and linguistic skills.

    Keywords

    Speech production · Speech perception · Speech learning · Neural model · Knowledge · Memory

    This book is about the processing and learning of spoken language. In contrast to the processing of written language, spoken language is always based on communication scenarios: speech-based interactions between a speaker and one or more listeners. In addition to the word sequence within an utterance, hand-arm gesturing, facial expressions, intonation, and voice quality give essential information within a communication scenario.

    The question of why and how spoken language emerged would go well beyond the scope of this book. However, the will of every person to communicate is present from birth, and contributes significantly to our success and survival. There are several reasons why this driving force to learn to speak exists. From the evolutionary point of view, language offers clear advantages for humans, since language allows a group of people to develop complex strategies for hunting, defense, and so on. Groups that can speak are more successful at surviving than individuals or groups with more primitive means of communication. Language is also an effective tool for the direct transmission of complex information both within and across generations.

    Speech processing encompasses speech production and speech perception. Production starts from an intention—a specific pattern of neural activity in the brain—and ends with the generation of an acoustic speech signal. Perception starts from the acoustic signal and ends with understanding or comprehension—again, a pattern of neural activity. Both production and perception are important components of language learning, which requires listening (perception), and imitative attempts to reproduce heard items (production). A breakdown of the specific processes necessary for perception and production is shown in Fig. 1.1. In addition to these processes, both perception and production require a repertoire of the speaking skills and linguistic knowledge acquired through the phases of language learning. These knowledge and skill repositories are stored permanently in the brain and are shown in the central column of Fig. 1.1. The necessity of the knowledge and skill repositories stored in our long-term memory becomes clear when we try to communicate with a person whose language we do not know.

    [Fig. 1.1: A basic model of speech processing. Left side: perception; right side: production; middle: skill and knowledge repositories.]

    Speech perception (left column in Fig. 1.1) starts with the acoustic speech signal generated by a speaker. This signal is converted to nerve impulses via cells in the inner ear and is transmitted to the brain through the auditory nerve. The inner ear mechanically performs a frequency analysis, which allows the auditory nerve and brainstem pathways to encode simple sound features. However, most sound features relevant to speech and other aspects of phonology are extracted in the cerebral cortex. The acoustic feature analysis in the cerebral cortex allows us to distinguish between periodic and nonperiodic signal components, which correlate with voiced and unvoiced portions of the speech signal (see Sect. 2.3). Further analysis with respect to the spectral structure of the signal allows us to relate the acoustic signal to movements of speech articulators. Thus, even at this very low processing level, acoustic sound features can be related to articulation (see Sect. 2.2), and these relationships are learned implicitly (see Sect. 4.2). This phonetic but not necessarily language-specific knowledge (see center column in Fig. 1.1) forms the basis of our library of articulatory-acoustic relationships and is built up in the early phases of speech acquisition (see Sect. 4.1). Already at this level, an acoustic-auditory short-term memory is needed to carry out sound feature analysis at the level of sounds and syllables.
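
    The distinction between periodic and aperiodic signal portions can be illustrated with a few lines of code. The following sketch is our own illustration, not part of the book's model; the sampling rate, frame length, pitch range, and threshold are arbitrary assumptions. It labels each frame of a signal as voiced or unvoiced according to the height of its normalized autocorrelation peak (Python with NumPy).

    import numpy as np

    def voiced_frames(signal, sample_rate=16000, frame_ms=25, threshold=0.3):
        """Label each frame True (periodic/voiced) or False (aperiodic/unvoiced)."""
        frame_len = int(sample_rate * frame_ms / 1000)
        min_lag = sample_rate // 400   # highest pitch candidate (~400 Hz)
        max_lag = sample_rate // 60    # lowest pitch candidate (~60 Hz)
        labels = []
        for start in range(0, len(signal) - frame_len, frame_len):
            frame = signal[start:start + frame_len]
            frame = frame - frame.mean()
            energy = float(np.dot(frame, frame))
            if energy == 0.0:
                labels.append(False)
                continue
            # Full autocorrelation, keeping non-negative lags only
            ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
            periodicity = ac[min_lag:max_lag].max() / energy
            labels.append(periodicity > threshold)
        return labels

    # Example: a synthetic 200 Hz tone is labeled voiced, white noise is not
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000
    print(any(voiced_frames(np.sin(2 * np.pi * 200 * t))))   # True
    print(any(voiced_frames(rng.standard_normal(16000))))    # typically False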

    After the recognition of acoustic features, the subsequent phonological analysis (see Sect. 2.1) is closely related to word recognition and is already language specific. This sound analysis process activates learned language-specific speech sound and syllable candidates, which in turn leads to word candidate activation. At higher levels of the perception hierarchy, the extracted phonological sound sequence is held for several seconds in a phonological short-term memory to facilitate a syntactic and semantic analysis of whole sentences. This analysis leads to the extraction of sentence meaning, or comprehension—that is, the utterance is now understood. High levels of the perception hierarchy require the grammatical, syntactic, and lexical knowledge stored in long-term memory (see middle column in Fig. 1.1).

    Memory plays a significant role in perception. Acoustic-auditory short-term memory is needed to extract syllables from the mental syllabary. Phonological short-term memory is needed to extract words and for further lexical and grammatical analysis. These short-term memories, which hold current speech information, are fundamentally different from long-term repositories like the mental syllabary and mental lexicon, which store language-specific knowledge that is accessed when needed. Both types of memory, and interactions between both types of memory, are essential for speech perception and production.

    Full understanding of a communication process also draws upon knowledge about the environmental context, particularly when discerning the intention of a communication partner. For example, the intention of a person uttering the sentence I would like to get a carton of milk! in a grocery store is different from the intention of the same person uttering that sentence in a library. Depending on the situation, determining intention can call upon knowledge stored in procedural memory and parts of long-term semantic memory that are not language specific. A more detailed presentation of the entire process of language perception is given in Chap. 3.

    Speech production (right column in Fig. 1.1) starts with conceptualization. The speaker begins formulating a sentence with a specific intention by mentally specifying the information to be transmitted at the concept level. This specification is realized as neural activations in the semantic network (see Sect. 2.1). In the formulator module (Fig. 1.1), words (concepts, lemmas, and lexemes) are activated and grouped together using grammatical knowledge of phrases and sentences. Here again, the speaker needs knowledge from long-term memory (grammatical knowledge, see middle column in Fig. 1.1) to specify the grammatical structure of the desired sentence, properly inflect the words, and put them in the correct order. These grammatical and syntactic processes take place before the phonological realization of the utterance is activated from the mental lexicon (see Sect. 2.1). Following syllabification, motor plans for each syllable are activated from the mental syllabary, resulting in articulator movements that produce the desired speech signal.

    It should already be clear from these brief presentations of speech perception and production that both processes depend on previously acquired knowledge. As such, production and perception are interwoven and complementary, both during and after learning. Learning language and speech requires listening (perception) as well as speaking (production). The linguistic interaction between communication partners is essential and allows for learning with reference to the target language (mother tongue). Learning a language and how to speak takes place primarily in childhood, but continues over our whole life span. We continually extend our vocabulary and adapt our listening and speaking behavior to different communication partners in a myriad of ways. Speech acquisition, the lifelong adaptation of our speech behavior, and the refinement of speech production and perception are discussed in detail in Chap. 4.

    The main goal of this book is to describe a computer-implemented model of speech processing and acquisition. We provide a clear and quantitative description of the neural processes underlying speech production, perception, and learning, including how short-term and long-term memories are stored and accessed by other parts of the model. We present the model in Chaps. 8 and 9. Before that, we introduce the functional anatomy of the nervous system for speech processing (Chap. 5) and a framework for neural modeling (Chaps. 6 and 7).

    Chapters 1–5, introducing the linguistic, phonetic, and neurobiological aspects of speech production, perception, and acquisition, were written with the goal of preparing the reader to understand our computer-implemented model (Chap. 9). These introductions are not the only possible path to understanding the model, nor the only path to a good understanding of the linguistics, phonetics, and neurobiology of speech. We have attempted to teach what we consider to be accepted by most linguists and neurobiologists (i.e., the common ground of the field). However, there is no single definitive neurobiological theory of language processing, and so we have carefully cited basic literature at the end of each chapter. This literature is, in our opinion, accessible to readers who would consider themselves beginners in this field. We recommend that readers consult the cited literature to further develop their linguistic and neurobiological knowledge.

    The same applies to our introduction of neural models in Chaps. 6 and 7. In this book, we provide only basic information on connectionism (Chap. 6) and one approach to neural modeling (Chap. 7). We focus our discussion of neural modeling on the neural engineering framework (NEF) because it provides a comprehensive approach to cognitive and sensorimotor aspects of speech processing and learning, which allows all aspects of speech processing and learning to be implemented in one large-scale neural model. Additionally, using the NEF allows a computer implementation of our speech processing and learning model with the Nengo neural simulator, which is freely available for noncommercial purposes, and is understandable and usable for those with little or no programming experience (www.nengo.ai, Chaps. 7 and 8).
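
    To give a flavor of what such an implementation looks like, here is a minimal Nengo sketch. It is a toy example of the NEF style of modeling, not a piece of the book's model; the ensemble sizes, input signal, and decoded function are arbitrary assumptions. Two populations of spiking neurons represent scalar values, and the connection between them computes a simple function of the represented value.

    import numpy as np
    import nengo

    model = nengo.Network(label="toy NEF example")
    with model:
        # A time-varying scalar input, standing in for some sensory feature
        stimulus = nengo.Node(lambda t: np.sin(2 * np.pi * t))
        # Two populations of spiking neurons, each representing one scalar value
        feature = nengo.Ensemble(n_neurons=100, dimensions=1)
        transformed = nengo.Ensemble(n_neurons=100, dimensions=1)
        # Feed the input in, and compute x**2 between the two populations
        nengo.Connection(stimulus, feature)
        nengo.Connection(feature, transformed, function=lambda x: x ** 2)
        # Record the decoded (filtered) output of the second population
        probe = nengo.Probe(transformed, synapse=0.01)

    with nengo.Simulator(model) as sim:
        sim.run(1.0)               # simulate 1 second of spiking activity
    print(sim.data[probe].shape)   # (1000, 1): one decoded value per timestep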

    Conclusion to Chap. 1

    The brain uses distinct cognitive and sensory-motor levels to process spoken language. At the sensory-motor level, speech motor skills and associated sensory knowledge are stored in the mental syllabary. At the cognitive level, linguistic symbolic knowledge is stored in the mental lexicon. Speech perception, speech production, and language comprehension networks in the brain all use the mental lexicon and mental syllabary.

    Part I: Basics of Linguistics and Phonetics

    © Springer Nature Switzerland AG 2019

    Bernd J. Kröger and Trevor Bekolay, Neural Modeling of Speech Processing and Speech Learning. https://doi.org/10.1007/978-3-030-15853-8_2

    2. Speech Production

    Bernd J. Kröger¹  and Trevor Bekolay²

    (1)

    Department of Phoniatrics, Pedaudiology and Communications Disorders, RWTH Aachen University, Aachen, Germany

    (2)

    Applied Brain Research, Waterloo, ON, Canada

    Abstract

    In this chapter we introduce semantic networks, the mental lexicon, the mental syllabary, and articulation, and describe how the acoustic speech signal is generated. We detail the types of information associated with lexical items (concept, lemma, and phonological form) and syllables (motor form or motor plan, auditory form, somatosensory form), and discuss how motor plans are built from speech movement units. We then explain how motor plans activate articulatory movements of the lips, tongue, velum, and larynx, and how those movements generate the acoustic speech signal. At the end of the chapter, we briefly discuss production-related language and speech disorders.

    Keywords

    Speech movement units · Syllable · Articulation · Vocal tract · Formant · Phonation · Speech production

    2.1 Words, Syllables, and Speech Sounds

    2.1.1 Concepts and Semantic Networks

    If we want to produce an utterance, we start with a communicative intention, or some specific linguistic content. This intention or content is first activated on the semantic level, in the semantic neural network (see also Steyvers and Tenenbaum 2005). The extent to which this level is metalinguistic or already linguistically determined goes beyond the scope of this book. We assume here that at least parts of this semantic level contain information from the already learned target language (mother tongue). The semantic-linguistic network works as follows: if, for example, a speaker wants to produce the sentence the dog chases the cat, the concepts dog, chase, and cat are activated. These concepts are stored within the mental lexicon, which also stores their associated lemmas and lexemes (i.e., their phonological representations). In addition, it is necessary to define at the semantic level how the nouns dog and cat and the verb chase are related to each other:

    $$ Subject:\left\langle Dog\right\rangle + Object:\left\langle Cat\right\rangle + Verb: active:\left\langle chase\right\rangle $$

    (2.1)

    Using learned grammatical and syntactic knowledge, the verb is then inflected to the form chases due to the third-person singular subject, and the function word the is added before the subject and object. The resulting phonological form of the sentence is a sound sequence written in phonetic transcription; the phonetic characters used for such transcriptions are explained in Sect. 2.1.3.

    To elucidate the role of the metalinguistic part of the semantic network, we use a simpler example: word production by a toddler who is still learning their mother tongue. The intention of the child is to make a communication partner aware of a round, red object lying in a corner of the room. The intention results in producing the word ball through the following procedure. First, the child sees the ball lying in a corner of the room. The visual features of the ball activate the child's semantic neural network with respect to the visually triggered semantic features ⟨round⟩ and ⟨red⟩, and other semantic features arising from their prior knowledge, such as ⟨toy⟩, ⟨movable⟩, and ⟨can be thrown⟩. In the metalinguistic semantic network of the child, all these semantic features or concepts are activated together with the concept ⟨ball⟩.

    Because of their experience gained through spoken communication, the child knows that the concept ⟨ball⟩ can be named directly by the lexeme ball, despite not being able to write. Not all concepts have a simple one-to-one relationship between the concept and a lexeme. For example, the concept ⟨bicycle⟩ could be mapped to the lexeme bike or bicycle depending on the context. For the child, however, ball is the clear lexeme to be uttered, which results in a motor realization corresponding to that lexeme. At their age, the articulation of /bɔl/ is still too difficult, so they instead produce a simpler form. After a certain age, however, the correct articulation will be stored. Thus, a concept activated in the semantic network leads to the retrieval of the associated phonological form of the lexeme from the mental lexicon. The phonological form or sound sequence of the word (e.g., /bɔl/) can then be articulated by activating an associated motor plan in the mental syllabary. The semantic network, mental lexicon, and mental syllabary all play a role in converting the child's intention and the visual features of the ball into the final (simplified) utterance.

    Individual semantic features like ⟨round⟩, ⟨movable⟩, ⟨toy⟩, and ⟨object⟩, as well as the resulting complexes of semantic features, e.g., ⟨ball⟩, make up the semantic network. The semantic network represents a speaker's knowledge about their entire world. Concepts within this network are linked to other concepts via semantic relations. For example, the concept ⟨ball⟩ becomes understandable only if semantic features like ⟨round⟩ and ⟨toy⟩ are linked to it by semantic relations like [is], [is a], and [can be] (see also Sect. 7.4):

    $$ \langle ball\rangle\ [is]\ \langle movable\rangle;\quad \langle ball\rangle\ [is]\ \langle round\rangle;\quad \langle ball\rangle\ [is\ a]\ \langle toy\rangle $$
    $$ \langle ball\rangle\ [is\ an]\ \langle object\rangle;\quad \langle ball\rangle\ [can\ be]\ \langle thrown\rangle;\quad \langle ball\rangle\ [can\ be]\ \langle caught\rangle $$

    (2.2)

    These relations point from one concept to another and are therefore always directed. Semantic networks can be visualized with box-and-arrow diagrams, as in Fig. 2.1. Here, the expressions surrounded by ovals represent concepts, while the arrows represent directed semantic relations like [is a] or [requires]. The nature of each relation in Fig. 2.1 is indicated by text near the arrow. Implicit relations between concepts do not have to be explicitly stated in semantic networks.
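
    A relational structure like (2.2) can be written down directly as data. The following sketch is our illustration rather than the book's implementation: it stores the semantic network as a list of directed, labeled relations and retrieves the concepts reachable from ⟨ball⟩.

    from collections import defaultdict

    # Directed, labeled relations taken from example (2.2)
    relations = [
        ("ball", "is", "movable"),
        ("ball", "is", "round"),
        ("ball", "is a", "toy"),
        ("ball", "is an", "object"),
        ("ball", "can be", "thrown"),
        ("ball", "can be", "caught"),
    ]

    # Index outgoing relations per source concept
    outgoing = defaultdict(list)
    for source, relation, target in relations:
        outgoing[source].append((relation, target))

    def related_concepts(concept):
        """All concepts directly reachable from `concept` via an outgoing relation."""
        return [target for _, target in outgoing[concept]]

    print(related_concepts("ball"))
    # ['movable', 'round', 'toy', 'object', 'thrown', 'caught']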

    [Fig. 2.1: A small semantic network.]

    Questions for Sect. 2.1.1

    1.

    Name the components of a semantic network

    2.

    Name several concepts related to the concept ⟨ball⟩

    ▸ Answers

    1.

    Concepts and semantic relations

    2.

    [is] ; [is] ; [has] ; [can]

    2.1.2 Mental Lexicon and Mental Syllabary

    The metalinguistic semantic network contains the speaker's world knowledge (Fig. 1.1). A word or lexeme, on the other hand, has both a phonological form (i.e., a sound sequence) and a conceptual meaning. All words are stored in a knowledge repository called the mental lexicon. In a typical adult, the mental lexicon contains knowledge of approximately 60,000 words. The mental lexicon is closely tied to the semantic network, connecting the concepts from the semantic network with language-specific phonological forms (cp. Levelt et al. 1999).

    But how does a child learn a word form? A child who can voice a word (e.g., ball) does not necessarily need to be aware that this word consists of a sequence of three sounds, namely /b/, /ɔ/, and /l/. As a result, we can surmise that phonological consciousness—the knowledge of a word's specific sound sequence—develops gradually. Instead, children learn to voice words through a series of imitation trials. Consider how a child might learn the word ball. They are interested in a ball lying in the corner of the room, and want their caretaker to pass them the ball. Since they have formed an association between the ball and their caretaker's utterance of ball, they imitate that utterance as best as possible, perhaps producing a simplified sound sequence while pointing towards the ball. If the caretaker does not understand, the child will continue to produce sounds they associate with the ball. If a similar communication scenario takes place days later, the child's practice producing the word ball may have paid off such that their utterance of the word now satisfies the caretaker. The caretaker will likely praise (reward) the child's first successful production of the word, resulting in the child remembering the sound sequence and corresponding articulation as the correct realization of the word ball. It is important to note that this learning scenario always involves interaction with a communication partner (e.g., a caretaker; cp. Sect. 4.2).

    Learning through imitation trials leads not only to the learning and storage of the motor form or motor plan of a word (how it is articulated), but also to the learning and storage of the auditory form of the word (how it sounds). The stored auditory form for each word remains important throughout a speaker's life, as we use it to correct and refine our pronunciation. Since we naturally economize and simplify our speech movements over time, we must monitor our pronunciation to ensure that we have not oversimplified to the degree that a word becomes incomprehensible.

    In addition to how a syllable or word sounds, we also store the somatosensory form, which is how it physically feels to articulate that syllable or word. Somatosensory feedback includes touch and proprioception, which is the perception of joint positions (like the angle of the jaw) and muscle tension (like the tension required to position the tongue appropriately). Somatosensory feedback is typically faster than auditory feedback; we can intuit that an articulation sequence has been performed correctly if it felt normal, and then use auditory feedback to confirm our intuition.

    Concepts are abstract symbolic entities that are stored efficiently in the semantic network. Unlike motor sequences, concepts require relatively few neurons in the brain. The approximately 60,000 concepts and their phonological forms can be stored in a few square millimeters of neocortex because they are cognitive entities. With cognitive entities, we only need to distinguish between different entities, meaning that we only need 60,000 distinct activity patterns in the small number of neurons representing these concepts. In the naïve case we can do this with 60,000 neurons, but since individual neurons are unreliable, and there are advantages to similar concepts having similar activity patterns, we use more than 60,000 neurons to represent concepts.
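
    The idea of distinct activity patterns can be made concrete with a small sketch. It is illustrative only; the number of units and the random patterns are arbitrary assumptions. Each concept is assigned an activity pattern over a shared pool of units, and two concepts are distinguishable to the degree that their patterns are dissimilar.

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n_units = 512  # a shared pool of units, reused for every concept
    concepts = ["ball", "toy", "dog", "cat"]

    # Each concept gets a random activity pattern over the same units
    patterns = {c: rng.standard_normal(n_units) for c in concepts}

    def similarity(a, b):
        """Cosine similarity between the activity patterns of two concepts."""
        va, vb = patterns[a], patterns[b]
        return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

    # Random high-dimensional patterns are nearly orthogonal,
    # so different concepts remain easy to tell apart.
    print(round(similarity("ball", "ball"), 2))  # 1.0
    print(round(similarity("ball", "dog"), 2))   # close to 0.0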

    Storage of cognitive entities contrasts with storage of motor plans, auditory forms, and somatosensory forms, which are grounded in the real world. While concepts only need to be distinguishable from one another, these grounded representations need to store enough information to relive the sensory experience of hearing or articulating a speech sound. To store all this information with our limited neural resources, we exploit the fact that words and utterances can be decomposed into syllables. Approximately 95% of what we speak on a daily basis consists of different sequences of the same 2000 syllables. We therefore only need to store about 2000 motor sequences, auditory forms, and somatosensory forms, which can also be accomplished with a few square millimeters of neocortex.

    We call the parts of the brain that store the motor plans, auditory forms, and somatosensory forms of our well-learned syllables the mental syllabary (cf. Levelt et al. 1999; Cholin 2008). Syllabification is the process by which conceptual and symbolic language information (stored in the mental lexicon, together with grammar rules) is transformed into motor and sensory information (stored in the mental syllabary). Syllabification is a critical component of speech production (see Fig. 1.1).

    Finally, it should be noted that the concept and the phonological form of a word are stored separately from its grammatical status (e.g., noun or verb, masculine or feminine, singular or plural). In some theories, the grammatical status of a word form is called a lemma. Accordingly, each word in the mental lexicon is defined on three levels: the concept level, the lemma level, and the phonological level.
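
    The three-level lexical entry and its link to the mental syllabary can be summarized as a small data sketch. The entries below are illustrative assumptions, not the book's data structures: each lexical entry carries a concept, a lemma, and a syllabified phonological form, and each frequent syllable in the syllabary carries its sensorimotor forms.

    from dataclasses import dataclass

    @dataclass
    class LexicalEntry:
        concept: str             # word meaning (concept level)
        lemma: str               # grammatical status (lemma level)
        phonological_form: list  # syllabified sound sequence (phonological level)

    mental_lexicon = {
        "ball":   LexicalEntry("<ball>", "noun, singular", ["bɔl"]),
        "banana": LexicalEntry("<banana>", "noun, singular", ["bə", "næ", "nə"]),
    }

    # The mental syllabary stores the sensorimotor forms of frequent syllables
    mental_syllabary = {
        "bɔl": {"motor_plan": "...", "auditory_form": "...", "somatosensory_form": "..."},
        "bə":  {"motor_plan": "...", "auditory_form": "...", "somatosensory_form": "..."},
        "næ":  {"motor_plan": "...", "auditory_form": "...", "somatosensory_form": "..."},
        "nə":  {"motor_plan": "...", "auditory_form": "...", "somatosensory_form": "..."},
    }

    def motor_plans(word):
        """Look up a word's syllabified phonological form and fetch each motor plan."""
        entry = mental_lexicon[word]
        return [mental_syllabary[syl]["motor_plan"] for syl in entry.phonological_form]

    print(len(motor_plans("banana")))  # 3 syllables -> 3 motor plans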

    Questions for Sect. 2.1.2

    1.

    The speech production hierarchy consists of the cognitive-symbolic level and the sensory-motor level. Which is the higher level?

    2.

    Which of the two levels above is phonetic, and which includes phonology?

    3.

    Name the representational forms of words in the mental lexicon and describe them briefly.

    ▸ Answers

    1.

    The cognitive-symbolic level.

    2.

    The phonetic level is the sensory-motor level. Phonology is part of the cognitive-symbolic level.

    3.

    Concept level: word meanings. Lemma level: grammatical status. Phonological level: word form. In some models, the concept and lemma levels are grouped as one level.

    2.1.3 Mental Syllabary and Phonological Awareness

    As a child learns more and more syllables, they intuitively recognize certain structural principles: syllables always seem to have a vowel sound, and zero, one, or more consonant sounds can occur before and after the vowel sound. Children learn to distinguish between consonants and vowels with no explicit instruction. In addition, children recognize that the number of vowels and consonants used by their communication partners is limited. After learning many syllables, children will come to recognize that words and syllables are composed of smaller sounds. For example, the word ball is composed of the sounds /b/, /ɔ/, and /l/, which they recognize are sounds that also occur in other syllables (e.g., /b/ also occurs in the first syllable of banana; /ɔ/ in fault; /l/ in land). Children will also recognize different vowels by comparing pairs of words. For example, /pɛn/ and /pæn/ are phonetically quite similar, but refer to completely different objects, pen and pan. The difference results from just one differing sound. Since the two sounds differentiate between two distinct words, we consider it a phonemic difference, meaning that both sounds are present in the repertoire of sounds used by that language. In a different language, both sounds may map to the same phoneme. These processes lead to phonological awareness. This accumulated phonological knowledge is stored in the mental syllabary. Minimal pairs like /pɛn/ and /pæn/ are therefore useful tools for linguists, children learning to speak, and second language learners. In Fig. 2.2, a simplified version of the sounds of American English is listed by their broad phonetic (or phonological) transcription. In Table 2.1, we provide some additional minimal pairs and their transcriptions for American English.
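
    Minimal pairs can be found mechanically once words are written as sound sequences. The sketch below uses a toy vocabulary with our own broad transcriptions (an assumption, not the book's data) and flags word pairs whose transcriptions differ in exactly one sound.

    # Toy vocabulary: broad transcriptions as lists of sounds
    words = {
        "pen":  ["p", "ɛ", "n"],
        "pan":  ["p", "æ", "n"],
        "ban":  ["b", "æ", "n"],
        "ball": ["b", "ɔ", "l"],
    }

    def is_minimal_pair(a, b):
        """Same length and exactly one differing sound."""
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

    minimal_pairs = [(w1, w2) for w1 in words for w2 in words
                     if w1 < w2 and is_minimal_pair(words[w1], words[w2])]
    print(minimal_pairs)  # [('pan', 'pen'), ('ban', 'pan')]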

    [Fig. 2.2: Sound system of American English (simplified). Top: vowels; bottom: consonants (per cell: left side: voiceless consonants, right side: voiced consonants). Laterals are subtypes of approximants.]

    [Table 2.1: Small selection of minimal pairs for vowels and consonants.]

    The vowel list in Fig. 2.2 is simplified. English also makes use of vowels with inherent articulatory movements, called diphthong-like vowels; for example, the vowel sounds in tight /taɪt/ and goat /goʊt/ are not found in Fig. 2.2. Importantly, however, Fig. 2.2 shows that vowels are primarily defined by the position of the tongue in the mouth, along two dimensions: front-back and high-low. A third dimension not visualized is whether the lips are rounded or unrounded when voicing that vowel. In American English, the non-low back vowels are rounded while the front and low vowels are unrounded.

    Consonants can be distinguished in several ways. They can be voiced or voiceless (e.g., /b/ and /ð/ are voiced versions of /p/ and /θ/); they can be articulated in a different way (manner of articulation, e.g., plosive, fricative, nasal); and they can be articulated in different places (place of articulation, e.g., tongue tip to teeth, lips together). The speech sounds shown in Fig. 2.2 become clearer when seen as part of a minimal pair, like those listed in Table 2.1.
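
    This voicing/manner/place description can be captured in a small feature table. The selection of consonants and the feature labels below are our own illustrative choices; the sketch looks up the voiced counterpart of a voiceless consonant by matching manner and place.

    # Each consonant: (voicing, manner of articulation, place of articulation)
    consonants = {
        "p": ("voiceless", "plosive",   "bilabial"),
        "b": ("voiced",    "plosive",   "bilabial"),
        "θ": ("voiceless", "fricative", "dental"),
        "ð": ("voiced",    "fricative", "dental"),
        "s": ("voiceless", "fricative", "alveolar"),
        "z": ("voiced",    "fricative", "alveolar"),
        "m": ("voiced",    "nasal",     "bilabial"),
    }

    def voiced_counterpart(sound):
        """The voiced consonant sharing manner and place with a voiceless one."""
        _, manner, place = consonants[sound]
        for other, (voicing, m, p) in consonants.items():
            if voicing == "voiced" and (m, p) == (manner, place):
                return other
        return None

    print(voiced_counterpart("p"))  # b
    print(voiced_counterpart("θ"))  # ð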

    In addition to intuitively learning the meaning-differentiating sounds and sound classes of their language, children also learn which syllable structures are permitted in that language. American English allows many syllable structures, including, but not limited to, /CV/ like /kɪ/ as in kick, /CCV/ like /kli/ as in clean, /CVC/ like /pɛt/ as in pet, and /CCVC/ like /klɪk/ as in click. Children also learn that not all sounds or sound sequences are allowed to occur in every syllable position. For example, in American English, the sound sequence /klɪ/ as in click is allowed, while the sound sequence /lkɪ/ is not. Syllable structures and other syllable-level rules are language specific, though languages often share common syllable structures.
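
    The /klɪ/ versus /lkɪ/ example can be checked mechanically. In the sketch below, the vowel set and the list of permitted onsets are small illustrative subsets (not a complete grammar of English); a syllable's initial consonant cluster is extracted and tested against the permitted onsets.

    VOWELS = set("iɪeɛæɑɔoʊuʌə")  # broad vowel symbols (subset)
    PERMITTED_ONSETS = {"", "p", "b", "k", "l", "pl", "bl", "kl", "kr", "br"}  # subset

    def onset(syllable):
        """The consonants preceding the first vowel, e.g., 'klɪk' -> 'kl'."""
        cluster = []
        for sound in syllable:
            if sound in VOWELS:
                break
            cluster.append(sound)
        return "".join(cluster)

    print(onset("klɪk"), onset("klɪk") in PERMITTED_ONSETS)  # kl True
    print(onset("lkɪ"), onset("lkɪ") in PERMITTED_ONSETS)    # lk False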

    Because words are composed of one or more syllables, children learn to recognize the phonological structure of syllables, and can therefore decompose complex words into syllables (e.g., banana is made up of three syllables, /bə/, /næ/, and /nə/). For some grammatical procedures like inflection, syllabification may be important for speech production; for example, act /ækt/ does not share a syllable with acting /æk.tɪŋ/. For efficiency reasons, the brain represents the speech sounds of words with cognitive symbols, rather than storing the whole motor and sensory form of each word. The symbolic representations of speech sounds are linked to the conceptual representations of the word in the mental lexicon (Fig. 2.3). The storage of words in the mental lexicon is therefore purely cognitive, which saves neural resources. During speech production, the phonological word form is syllabified, resulting in syllabic cognitive forms, which are projected to the mental syllabary to retrieve the motor and sensory forms for those syllables (Fig. 2.3).

    [Fig. 2.3: Flow of processing in speech production, including the mental lexicon and the mental syllabary. In the mental lexicon, concepts are associated with syntactic attributes (lemmas) and word pronunciations (sound chains or phonological forms). In the mental syllabary, the sound sequence of a single syllable activates the motor plan for articulation, the learned auditory form (how the syllable sounds), and the somatosensory form (how the syllable feels to produce).]

    Questions for Sect. 2.1.3

    1.

    Name the one cognitive-symbolic and three sensory-motor representations for each syllable in the mental syllabary.

    2.

    Define the term phoneme.

    3.

    What is a minimal pair?

    4.

    Name at least two minimal pairs of English, one for a vowel and one for a consonant.

    ▸ Answers

    1.

    Cognitive-symbolic representation: phonological form. Sensory-motor representations: motor plan, auditory form, somatosensory form (consisting of tactile and proprioceptive subforms).

    2.

    A speech sound that has a distinctive (meaning-differentiating) function within a language.

    3.

    Two words with different meanings that differ in only one speech sound.

    4.

    See Table 2.1. Example: beat-bit; pin-bin.

    Conclusion to Sect. 2.1

    The semantic network contains a set of concepts and a set of relations between concepts. Concepts, in turn, contain one or more semantic features. While concepts can be abstract, words (lemmas and lexemes) are language specific. Many concepts can also be viewed as language specific (and thus word specific). The pronunciation of a word defines its phonological form. The grammatical status of a word is specified at the lemma level. The set of speech sounds that function as phonemes within a specific language is called its phoneme inventory and allows the phonological specification of each word. In the mental lexicon, all the learned words of a language are stored as symbolic-cognitive entities (typically around 60,000 words). In the mental syllabary, the auditory, somatosensory, and motor representations of the frequently used syllables of a language are stored (around 2000 syllables). As a result, all words and sentences consisting of frequently used syllables can be realized using stored motor representations (motor plans); in English, this accounts for about 95% of utterances. The stored auditory and somatosensory forms of frequent syllables help control articulatory execution. Rare syllables may be realized from frequent syllables with a similar syllable structure.

    2.2 Articulation

    The syllable is the basic unit of articulation. Even when pronouncing isolated sounds, we do so in syllable form. We do not say /b/ but /bə/, because to utter a plosive in
