Informal Speech: Alphabetic and Phonemic Text with Statistical Analyses and Tables
Ebook, 664 pages, 6 hours


About this ebook

This title is part of UC Press's Voices Revived program, which commemorates University of California Press’s mission to seek out and cultivate the brightest minds and give them voice, reach, and impact. Drawing on a backlist dating to 1893, Voices Revived makes high-quality, peer-reviewed scholarship accessible once again using print-on-demand technology. This title was originally published in 1974.
Language: English
Release date: Jul 28, 2023
ISBN: 9780520329331
Author

Edward C. Carterette

    Informal Speech

    Alphabetic & Phonemic Texts With Statistical Analyses And Tables

    by

    EDWARD C. CARTERETTE

    and

    MARGARET HUBBARD JONES

    UNIVERSITY OF CALIFORNIA PRESS

    Berkeley Los Angeles London 1974

    University of California Press Berkeley and Los Angeles, California

    University of California Press, Ltd.

    London, England

    Copyright © 1974 by The Regents of the University of California

    ISBN: 0-520-01476-6

    Library of Congress Catalog Card Number: 73-92376

    Printed in the United States of America

    CONTENTS

    PREFACE

    1 INTRODUCTION

    2 PROCEDURES

    3 DISCUSSION OF THE STATISTICAL TABLES

    4 APPENDIX

    5 REFERENCES

    6 THE RUNNING TEXTS

    SECTION 7 FIRST ORDER STATISTICAL TABLES

    SECTION 8 HIGHER ORDER STATISTICAL TABLES

    PREFACE

    ACKNOWLEDGMENTS

    1. The U.S. Office of Education, through Project OE-1877, supported the initial phase of the study, including collection of data, key-punching of texts, and the statistical analysis of first-order tables and sequential constraints. Support for higher-order analyses and preparation of the manuscript came from the U.S. Public Health Service, National Institute of Mental Health (Grant MH-07809). The Hope for Hearing Foundation, Los Angeles, California, provided funds to defray the cost of typing the phonemic texts and tables as well as their proofreading. The University of California Press, through its Scientific Fund, graciously bore the original publication cost. We are particularly grateful for the encouragement of Mr. Robert Zachary, Los Angeles Editor of the Press, and Mr. James Kubeck, Managing Editor of the Press in Los Angeles.

    There is no brief way of saying how much we are in the debt of Mrs. Claudia Seapy (the former Miss Croteau), Professor David W. Packard, and Mr. Howard Golden. The help they gave in editing, in the design of type and format, and in expert consulting on the arts of computer typesetting and programming is incalculable. Mere money could never pay our debt to them.

    2. Computing assistance was obtained from the Health Sciences Computing Facility, UCLA, sponsored by National Institutes of Health Grant FR-3, and from the UCLA Campus Computing Network.

    3. We owe thanks to many people for their assistance in arranging for subjects and space, and especially to Mr. Kyle Esgate, Mr. William Haley, Mrs. Esther S. McGinnis, Dr. John D. McNeil, Mrs. Marjorie M. Rohrbough, and Mrs. Genie Swinney.

    4. Thanks are due also to our Psychology research assistants over the course of the project: Janis Stone, Dolores D. Kluppel, and S. Joyce Brotsky, and to our phoneticians: Elite Ohlstain, Robert Chamberlain, Jane Chamberlain, and Herman Pevner. In addition a number of people have spent many tedious hours transcribing tapes, punching IBM cards, typing tables, coding phonetic texts into Fortran symbols and proofing. To them all we owe thanks for a job well done, but especially to Lori Bohlmann, Judith Hayward, Mary Ann Nakagawa, Gudrun Ulman, and Cheryl Grossman.

    5. We thank Dr. Peter Neely and Messrs. Charles Goldberg, John Dewey Lovell, and Robert Madigan for having written some useful and complex computer programs.

    1 INTRODUCTION

    1.1 PURPOSE OF THE STUDY

    In spite of a vast amount of interest in language, there is very little information available about the informal spoken language. Yet psychologists, linguists, psycholinguists, and educators need such information for both research and curricular purposes. Among other things, the native language, the spoken dialect learned mainly at the mother’s knee, is the most overlearned behavior in an individual’s entire repertoire. It is important for many purposes to know how different it is from the formal written language, when learning of the native language can be considered virtually complete, what the natural units are, and what is usual and what unusual in speech. Some description of standard spoken language by means of which to measure dialectical variations is doubtless important too.

    This monograph will not answer all these questions but we hope that it will throw some light on them. All previous statistical studies of language known to us have derived their material from written language, whatever the claims.¹ It is part of our purpose to show that genuine spoken language is actually quite different from written language, even on such a gross level as proportional frequencies of letters and phonemes. To this end many sorts of comparisons of spoken and written language will be presented. A second purpose is the comparison of informal spoken language across the age range from 6 years to adulthood, in terms which permit of some quantitative evaluation, with an eye to estimating the rate of maturation of native language facility and its state at the beginning of formal schooling. Still a third purpose is the use of the rather powerful tools of information theory in the description of informal speech over the age range, in an effort to trace the role of redundancy in shaping language as a person uses it and presumably understands it in discourse. There are several pieces of evidence that might lead to prediction of a curvilinear relationship between age and redundancy of speech. If this were supported, it could lead to the development of some interesting instructional strategies.

    A final but important purpose of this monograph is to present the verbatim transcriptions of the corpus, in both alphabetic and phonemic form, so that others, with other purposes, will be spared the arduous, expensive, and time-consuming task of collecting and editing such material.²

    1.2 USES FOR TEXTS AND TABLES

    Although the reader will often bring with him his own different uses for the materials presented here, we shall discuss briefly some needs of a general sort for these materials. The tabular material will be discussed in order of appearance.

    Section 6. The Running Texts. These texts, alphabetic and phonemic on facing pages, represent the corpus from which all tables were derived and the basis of all statistics. The method of recording and transcribing is described more fully in Section 2: Procedures. Briefly, the texts contain running dialogue from a number of peer groups of three persons in free discussion. This is considered of prime importance, since our pilot work with various techniques based on interviews and child-adult interactions yielded very impoverished language with children (vide infra), yet these techniques are widely used without comment. The transcription is sequential, just as it was recorded, from one group to the next. The texts for each of the four age groups are kept separate, however. There are more than 15,000 words at each age level and more than 1500 sentences. These can be used for many sorts of syntactic analysis of informal language; for example, tabulation of frequent syntactic patterns, use of alternative forms, sentence-depth, or right-branching versus left-branching. Theories of structural linguistics may want some revision of classes of transformations in the light of these data. Comparisons of the syntax of spoken language with formal written language are possible, as well as comparisons between the language of children and that of adults, and among dialects. We caution that the strings of sentences interweave among triples of talkers and that one cannot identify an individual speaker. Indentation indicates a change of speaker, but it has not proved possible to identify all speakers accurately. These data are intended to provide reliable, quantifiable information about a homogeneous group of speakers which can serve as a norm for spoken language. Labov also claims (1968) that data of this sort are essential for development of linguistics as a behavioral science.

    The texts can also serve as a source of material for psycholinguistic and educational experiments. These utterances were apparently easily comprehended by peers and could be used to construct lists for testing memory, comprehension, or the effect of context appropriate for a given age. For example, comprehension of sample sentences could be compared, by means of the Cloze procedure (Taylor, 1956, 1957; Peisach, 1965), with various of their transforms with somewhat more justification than can be summoned for sentences artificially created by adults for the purpose of testing children’s comprehension.
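
A Cloze test item of the kind mentioned above might be prepared roughly as follows. This is only a sketch: it assumes the common convention of blanking every fifth word, and the function name `make_cloze` and its parameters are hypothetical, not taken from Taylor or from this monograph.

```python
def make_cloze(sentence, every=5, blank="_____"):
    """Blank out every `every`-th word and return (test item, deleted words).

    Assumption: deletion at a fixed interval, counted from the start
    of the sentence. Other deletion schemes are possible.
    """
    words = sentence.split()
    answers = []
    for i in range(every - 1, len(words), every):
        answers.append(words[i])   # keep the deleted word for scoring
        words[i] = blank
    return " ".join(words), answers


if __name__ == "__main__":
    item, key = make_cloze("we went down to the park and then we played ball")
    print(item)
    print(key)
```

A reader's restorations would then be scored against the returned list of deleted words.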

    Another possibly fruitful area for those interested in speech is extent of phonemic variability within this rather homogeneous dialect.

    Section 7. First-order Statistical Tables. There are a number of different tables in this section with rather different potential utilities.

    Table 7.1 presents the frequencies and proportions of each letter in transcribed natural speech for first, third, and fifth graders, and adults. These are useful in experimental situations where high and low frequency letters are needed. These simple proportions are not very different from those derived from written texts, and neither are the several age levels very different from one another.
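
Frequencies and proportions of the sort reported in Table 7.1 could be tallied along the following lines. This is a minimal sketch, not the authors' program: the function name `letter_proportions` is hypothetical, and it assumes that only the 26 letters are counted, with word and sentence marks ignored.

```python
from collections import Counter


def letter_proportions(text):
    """Return {letter: (count, proportion)} over the letters a-z in text."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the table from most to least frequent letter
    return {ch: (n, n / total) for ch, n in counts.most_common()}


if __name__ == "__main__":
    sample = "well I went to the store and then I went home"  # made-up text
    for ch, (n, p) in letter_proportions(sample).items():
        print(f"{ch}  {n:3d}  {p:.3f}")
```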

    Table 7.2 shows similar data for the phonemes of natural speech but these data are unique, since other published tabulations of phonemes have been derived from written material transcribed into a phonemic alphabet,¹ not from informal discourse. This table should be useful in understanding some of the difficulties of learning to read and to spell: the most frequent phoneme in English is schwa, /ə/, which is not rendered by any single letter of the alphabet, nor by any group consistently unless stress be taken into account.

    Table 7.3 presents the comparison of rank orders of vowel and consonant phonemes for the four age levels of informal speech. It includes, in addition, Denes’ (1960) and Roberts’ (1965) data, both of which sets are essentially transcriptions of written language. The data based on informal speech could be useful in more accurate evaluation of the importance of acoustic confusions; for example, the biasing effects of frequency of occurrence could be taken into account. Table 7.4 presents the same data grouped according to articulatory categories, by age. These tables should provide useful new data for speech research.

    ¹ See the discussion of Table 7.5 in Section 3.1.

    Table 7.5 contains the correlation matrix for 22 consonant phonemes from 16 sets of data. It is useful in evaluating the differences between real speech and various kinds of other language sources.

    Table 7.6 presents data for relative sequential constraints, redundancy H, mean word, and mean sentence length for both letter and phonemic transcriptions of informal speech at all four ages. The growth of constraint over the first 11 pairs of symbols can be traced for each age, and comparison of child and adult language shows that very similar constraints operate.

    Table 7.7 presents similar data for relevant written texts, permitting a comparison with alphabetic transcriptions of speech in terms of redundancy. Per cent redundancy is seen to be a useful quantitative measure of difficulty of a text. As we have already shown (Carterette and Jones, 1965) redundancy does not depend on style and is not correlated with length of word or sentence, as many currently employed measures are.
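
To make the notion of per cent redundancy concrete, the sketch below computes the standard first-order Shannon estimate, H = -Σ p log2 p, and the redundancy 1 - H/Hmax. It is an illustration only. The tables in this monograph rest on sequential constraints over longer chains of symbols, which this zero-memory version does not capture, and the function name and the `alphabet_size` parameter are hypothetical.

```python
import math
from collections import Counter


def first_order_redundancy(symbols, alphabet_size):
    """Return (H, redundancy) for a sequence of symbols.

    H is the first-order entropy in bits per symbol. Redundancy is
    1 - H / log2(alphabet_size), a fraction between 0 and 1.
    Assumption: symbols are treated as independent draws, so no
    sequential constraint between neighbors is measured.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    h_max = math.log2(alphabet_size)
    return h, 1.0 - h / h_max


if __name__ == "__main__":
    text = "the cat sat on the mat."  # made-up sample
    # 28 = 26 letters plus word mark and sentence mark (an assumption)
    h, r = first_order_redundancy(text, alphabet_size=28)
    print(f"H = {h:.3f} bits per symbol, redundancy = {100 * r:.1f} per cent")
```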

    Table 7.8 presents comparative data from four levels of alphabetically transcribed natural speech and four levels of written texts of the proportional frequencies of the 26 letters of the alphabet, permitting a rough evaluation of similarities between speech and what is considered appropriate reading material at each age level. Table 7.9 shows that correlations are all remarkably high.

    Table 7.10 shows the relative sequential constraint in free-reading choices for three levels of children’s reading material. Table 7.11 shows the effect of sample size on the various statistical measures used here.

    Section 8. Higher-order Statistical Tables. Here are presented tables of digrams (Tables 8.1.1-8.1.4) and diphones (Tables 8.2.1-8.2.4), trigrams (Tables 8.6.1-8.6.4) and triphones (Tables 8.9.1-8.9.4), for the running texts taken from each of the four age levels. They can be used in preparation of materials for verbal learning experiments, the diphones and triphones being very much more useful for auditory experiments than their letter counterparts currently in use. Until now such materials have usually been constructed from digrams and trigrams drawn from lexicons. Our data indicate that there is progressive divergence of spoken and written language as the unit of analysis becomes larger. Whereas simple proportional frequencies of single symbols are similar, differences begin to appear when higher order constraints obtain. These differences are already substantial for successive pairs of symbols. Control of language frequency in an experiment should be based on real usage.
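
Digram, trigram, diphone, and triphone counts of the kind tabulated here can be sketched with one small routine that slides a window of n symbols along a running text. The function name `ngram_counts` and the sample phoneme codes are hypothetical, and whether the original tables counted sequences across word marks is not assumed either way.

```python
from collections import Counter


def ngram_counts(symbols, n):
    """Count every run of n consecutive symbols in a sequence.

    `symbols` may be a string of letters or a list of phoneme codes.
    Each n-gram is stored as a tuple so both cases work alike.
    """
    return Counter(
        tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)
    )


if __name__ == "__main__":
    letters = "the cat sat"                      # made-up letter text
    phones = ["dh", "ax", "k", "ae", "t"]        # made-up phoneme codes
    print(ngram_counts(letters, 2).most_common(3))   # digrams
    print(ngram_counts(phones, 3).most_common(3))    # triphones
```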

    Special tables have been prepared for use by educators, although psychologists and speech specialists should find them useful also. Tables 8.3.1 and 8.3.2 give information regarding the relative frequency of word-initial and word-final letters derived from the alphabetic transcription of natural speech. Tables 8.4.1 and 8.4.2 show similar data for phrase-initial and phrase-final phonemes. Table 8.5 compares the proportion of vowels and consonants in word-initial, word-final, and unspecified position for letters and phonemes. Tables 8.7 and 8.8 present those trigrams which occur in the initial position and those which occur in the final position of a word. Since the first and last parts of words carry most of the information, it was thought that tables showing frequency of occurrence of trigrams in these positions might be of especial use in preparing material for initial reading instruction or for perceptual experiments. Similar tables of word-initial and word-final tetragrams are also given (Tables 8.10 and 8.11).
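
Word-initial and word-final trigram counts could be gathered roughly as shown below. This is a sketch under stated assumptions: the function name `edge_trigrams` is hypothetical, words are taken to be separated by spaces, sentence marks are periods, and words shorter than three letters are simply skipped, which may not match the treatment in the original tables.

```python
from collections import Counter


def edge_trigrams(text):
    """Return (initial, final) Counters of three-letter word edges."""
    initial, final = Counter(), Counter()
    for word in text.lower().split():
        word = word.strip(".")        # drop a trailing sentence mark
        if len(word) >= 3:            # assumption: shorter words are skipped
            initial[word[:3]] += 1
            final[word[-3:]] += 1
    return initial, final


if __name__ == "__main__":
    ini, fin = edge_trigrams("the other kids went there then.")  # made-up text
    print(ini.most_common(3))
    print(fin.most_common(3))
```

The same routine extends to tetragrams by changing the slice width from three to four.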

    ² These texts have already been used by the Southwest Regional Laboratory of the U.S. Office of Education for analysis of syntactic patterns.

    ¹ A possible exception is the groundwork for the St. Cloud method of teaching French as a second language (Gougenheim et al., 1956), but the language sample is neither large nor thoroughly analyzed.

    2 PROCEDURES

    2.1 SOURCES OF MATERIAL

    2.1.1 Spoken Language

    The few published samples of spoken language which have been collected are limited by several difficulties: small sample size; restricted vocabulary; formal, stressful, or otherwise constrained situations; and, more particularly, poor fidelity of recording, a very important point for phonetic analysis. They are also ordinarily unavailable for reanalysis. Of the four modern analyses of spoken English which concern themselves with phonology and analysis of language constraints, Denes’ paper (1963) is only an analysis of two phonetic readers; Baddeley, Conrad, and Thomson’s paper (1960) is based upon written British radio dialogue; Hultzen, Allen, and Miron’s monograph (1964) is based upon conversation from American one-act high school plays; and Roberts’ data are derived from words from a lexicon spoken in sentences. For our analysis it was desired that the sample should be sufficiently large so that any descriptive measure (with the exception of word counts, which require a vast and expensive sample) would be stable and representative. Likewise, it was important to obtain adequate language samples at several levels of language competency. The two reasonable extremes were thought to be an adult sample and a first grade (approximately 6-year-olds) sample, the latter consisting of children who were just beginning to learn to read and thus had virtually no acquaintance with visual language. To have dealt with younger children would have entailed great difficulties in terms of rapport, time, and representativeness. The other two samples chosen were third and fifth graders, judged to be at such positions on the growth curve as to shed some light on the development of language to adult status. The adult sample was similar to the child samples in community, national and regional origin, and socio-economic status. It was obtained from junior college classes of a city college in California.

    We arranged to use children from two different schools to reduce possible biases of unsuspected sorts. Both schools had children drawn largely from the middle socio-economic level. The investigation of the effect of socio-economic status on language development is an important one, but not the subject under investigation here. Moreover, it also requires a norm. It was just this norm for studies of language development of all sorts that we wished to make available. We used all children in a grade who were present when called and were not excused by reason of foreign language background, marked non-California dialect, or speech impediments. Since regions of the country differ in the phonemes used in speech, it was judged more in keeping with the aims of the study to include only one type of regional speech, which happened to be Southern Californian.

    Several ways of arranging a situation in which natural speech would occur were tried. First, an attempt was made to record unrestricted free speech of children among their peers in the lunchroom or on the playground. This was a complete failure acoustically because the noise exceeded the signal at all times. An attempt to record discussions in the classroom also failed because of excessive noise and poor flow of speech. Next, recordings of speech were made while individual children responded to requests to make up a story about pictures shown them. The speech was impoverished in grammatical constructions, vocabulary, rate, and even phonemes. The speech of a child when responding to requests to tell about summer vacation or what he would do during the next vacation likewise was not normal speech. Next, Strickland’s method (1962) was tried. This required props (a group of objects on a table) and a small group (3 or 4) of children seated around it. The children were asked by the adult moderator to talk about the objects. The response was better in this situation, but even here there appeared to be restriction of both vocabulary and grammatical forms; the normal flow of speech became halting. Finally we tried a simple social situation. Three children were seated around a small table with a young, friendly adult. The adult greeted the children by name, told them she wanted to find out what children in their grade were interested in, and asked them to talk to each other about anything they wanted to talk about. Some groups required somewhat more encouragement; if so, the adult asked a question or two: “What do you do after school?” or “How many in your family?” Thereafter she said nothing. After the initial warm-up period, which was discarded for the transcription, the speech appeared to be children’s normal speech. It was rapid; there were interruptions; it covered every conceivable topic; it was full of slang and noise words; there was give-and-take. This, then, was the situation in which all the child speech samples were recorded.

    For the adults the situation was structured differently. The participants were from elementary psychology classes, so their knowledge of psychological jargon was flattered. They were told that the experiment was one in small group process and that the situation was to be completely non-directive. Then they were introduced by first names and told they were at a party. The experimenter excused himself (psychologically) to get the snacks. Again groups of three were always used. Most of the adults did not know each other, whereas in the children’s group, they did. The 3-person interaction proved as useful for adults as for children, and the language produced was judged to be normal, everyday conversation, as rapid, slangy, and diverse as any in an unrecorded situation.

    The second major problem concerned the quality of the recording. If phoneticians were to be able to transcribe the material with any reliability, the recording had to be very good, better than any speech recordings we had heard. To this end we assembled the following high-fidelity components: (1) an Ampex 1260 4-track stereophonic magnetic tape recorder and (2) two Altec-Lansing Model M-20 condenser microphone systems. The microphones were placed 2 feet apart, and about 8 feet from the farthest talker. New Scotch (Minnesota Mining and Manufacturing Company) 1.5 mil acetate magnetic tape was used. Recording was at 7.5 inches per second and each microphone fed a separate channel. Only two of the four tracks were used, giving the maximum possible channel separation, since an unused channel separated the two which were used. The cross-talk rejection ratio in the middle frequencies was much greater than 60 dB, more than adequate for our purposes. The recordings were made in the best available room at the three schools. Test tapes were made to disclose the acoustic properties of the room. Where necessary, one section of the room was draped with cloth, felt was applied to chair and table legs, and microphones were adjusted and insulated. The gain of amplifiers was checked at every tape change. The microphones were always concealed from the elementary school children. To the best of our knowledge, their presence was not suspected. For the young adults, the microphones were in plain sight, since their presence would have been deduced in any case. After the brief warm-up period, the students lost themselves in conversation and paid no further attention to the presence of the microphones or the experimenter; in fact, it was often hard to get them to stop at the end of the period.

    Since the aim was to collect a sample as representative as possible of natural language, we attempted to include as many individuals and as many groups as possible, in order to reduce the effect of idiosyncrasies of vocabulary, topic, sentence construction, pronunciation, and various aspects peculiar to spoken language. However, it was also important to allow sufficient time for each group to warm up and then become thoroughly engrossed in conversation, for otherwise many aspects of oral language suffer.

    We have shown (Jones and Carterette, 1963) that at least 6,000 words are necessary for stable statistical results, so the goal was set at 10,000 words per level. It was felt that more than this would require too much time for phonemic transcription, which is very slow. Well over 10,000 words per level were actually transcribed, but more material than that was collected and remains untranscribed. The number of groups and individuals represented in this analysis follows:

    The reason for the discrepancy in number of groups and individuals is the size of the sample required at each level. First graders did more giggling, interrupting, and drowning each other out, so less usable material resulted per unit time. The total number of characters transcribed is given in the following table.

    * Column A refers to total number of characters, B refers to this total with word marks (space) and sentence marks (period) removed.

    2.1.2 Written Language

    Sources of written language suitable for the various grades were sought first in standard reading textbooks. Readers of two widely used series were chosen as representative of material of this sort; first, third, and fifth readers were used in both series. The series were the Ginn Readers (1956, 1957), and the Sheldon Basic Readers (Allyn and Bacon, 1957). In addition, the second reader in the Ginn series was analyzed, after it appeared that the step between first and third readers was very large. All material in a given section was used with the exception of foreign words and poetry. Abbreviations were spelled out according to rules (e.g., Mr. = Mister). Any story containing a large number of foreign words was omitted. The only punctuation marks used were word mark and sentence mark. The exact rules are given in Section 4.1.

    Next, material which children like to read was sought. Wilson’s Children’s Catalog (West and Schor, 1961) was used as a guide because of the large number and wide distribution of the judges making the ratings. These books are rated by children’s librarians from all over the country and by specialists in children’s literature as books they could not do without. Each rating is accompanied by a consensus of proper grade level. We chose at random from among those books rated in the top one-ninth and assigned to kindergarten through second or third grade as our level 1, those assigned to grade 2 or 3 through 5 as our level 3, and those assigned to grades 4-6 as our level 5. At a single level we used all or part of three or four separate books, drawing equally from them to reduce the effects of idiosyncratic language, aiming at approximately 10,000-word samples. Although there is undoubtedly some adult bias in these ratings due to pedagogical urges, the ratings are also dependent upon circulation and enjoyment in story hours. It is difficult to believe that a book which is unpopular with children could remain in the list year after year, regardless of adult bias. At least these ratings have the advantage of representing a national sample and a large one.

    Written material on the adult level had previously been analyzed by Newman and Waugh (1960). Their calculations were used for average adult text (William James sample) and difficult adult text (Atlantic Monthly sample),⁴ but for the easy adult text (the King James’ Bible), we punched the same passage used by Newman and Waugh and as a test of the computer program calculated our own statistics.

    2.2 TRANSCRIPTION PROCEDURES

    2.2.1 Spoken Language

    Transcription of the spoken language was very difficult and time-consuming. Each tape was first transcribed by a typist, listening with binaural earphones on a research-quality playback system (Ampex Model 350-2, two channel magnetic tape-recorder). This often required many iterations. The instructions were to include all consecutive material, but if some part was incomprehensible to omit the entire utterance. Similarly, if some person was interrupted and did not pick up the thread of his utterance, the whole utterance was to be omitted, but if he did resume, then the interruption was to be omitted. The purpose of these rules was to preserve the continuous and sequentially-dependent structure of speech. It should be understood clearly that the texts preserve the order of occurrence of the speech of three peers. There was no attempt to collate the speech of a single individual. A string of phonemes from a single speaker is immediately followed by a string from another individual of the group; indentation signals change of speaker. In this way the natural sequence of meaningful discourse among several individuals is retained.

    Ordinary lexical words were used in these letter transcriptions to make them comparable with the written material; i.e., words were spelled properly and word marks occurred where they would normally occur in written material. However, normalized spellings were constructed for idiosyncratic words (e.g., guy, used by many of these children as an exclamation), noise words (cf. Strickland) like er, uh, and the like.

    The second step was to have the letter transcriptions checked by a research assistant. This always resulted in many changes. Next the phoneticians took over. When we could get them, there were two working. Because of the volume of material, it was not feasible to have them both transcribe all the text independently. Instead we had to be content with occasional reliability checks together with subsequent ironing out of discrepancies.

    Two formal reliability checks were made. The number of disagreements in a passage was counted and this number was divided by the total number of phonemes in the passage. In the first instance there was six per cent discrepancy in 2300 phonemes transcribed. In subsequent discussion, agreement was reached. This involved listening together to the taped passage and evolving new rules for transcription. Examples of these rules are: (1) Don’t use schwa if you can hear the quality of the vowel; (2) If the sound is genuinely between a t and a d, use the one which is normal for the word. A second check revealed a still greater discrepancy, sixteen per cent (1800 phonemes). Much of this was due to insistence by the authors that the transcription be as phonetic as possible within the bounds of the phonemic alphabet. Since the phoneticians were not used to this procedure, and also had used different systems in the past, some learning was necessary. The intent, with which everyone concurred, was to represent all the variations possible within the alphabets used. There were three very good reasons for not attempting a phonetic transcription: (1) the programming would have been more difficult, (2) the transcriptions would have been very much less reliable, and (3) the import of speech seems to be much better rendered by the phonemic transcription since this is presumably based upon the contrastive features of the language (Daniel Jones, 1962). The phonetic use of the phonemic alphabet with children was intended to permit discovery of lesser redundancy in young children due to inconsistency in pronunciation, if this existed. As an aside we should like to point out that apparently most so-called phonemic transcriptions of an individual’s speech that appear in the linguistic literature are really pseudo-phonetic transcriptions like ours. That is, a standard phonemic alphabet is used to represent the phonetic character of the speech, but the contrastive analysis necessary to establish his phonemic system is never undertaken. We shall hereafter refer to our analysis as phonemic, with the understanding that it is not technically a fully phonemic one.
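
The reliability measure described above, disagreements divided by total phonemes, can be written out as a short sketch. The function names are hypothetical, and the comparison assumes the two transcriptions are already aligned symbol for symbol, which the text does not describe. The disagreement counts in the usage lines (138 and 288) are not reported in the source; they are back-computed from the rounded percentages purely for illustration.

```python
def count_disagreements(transcript_a, transcript_b):
    """Count positions where two aligned phoneme sequences differ.

    Assumption: both sequences have the same length and are aligned.
    """
    if len(transcript_a) != len(transcript_b):
        raise ValueError("transcriptions must be aligned and of equal length")
    return sum(a != b for a, b in zip(transcript_a, transcript_b))


def discrepancy_rate(disagreements, total_phonemes):
    """Disagreements as a fraction of all phonemes in the passage."""
    return disagreements / total_phonemes


if __name__ == "__main__":
    # Illustrative counts only, chosen to match the reported percentages.
    print(f"first check:  {discrepancy_rate(138, 2300):.0%}")
    print(f"second check: {discrepancy_rate(288, 1800):.0%}")
```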

    The problems of reliability of transcription were dealt with as effectively as possible under the constraints of time and availability of phoneticians. Phonetics is not an area in which Linguistics students in America are highly trained. It is therefore difficult to hire phoneticians with any but minimum experience. The turnover was likewise high, with long periods of vacancies and retraining. Four phoneticians participated in the final transcriptions and all were interested enough in the project to give it more than mere duty required. There was free interchange and discussion between the members of each pair and some interchange across pairs.

    The phoneticians placed “word marks” only where there were brief pauses. Instructions were to not make printed words a guide, but to indicate only breaks in the flow of speech. The placement of sentence marks became a very difficult problem. Since a good deal of the material was agrammatical, the use of a grammatical criterion was deemed inappropriate, because it might conceal any unusual implications of the data. An attempt was made to use tentative pause and final pause, but the two phoneticians could not assign the first of these reliably, so it was abandoned. The terminal juncture was placed by the phoneticians and was used whenever an utterance was terminated by a change of speaker or when there was a clear pause of a type the phoneticians were willing to call terminal, judging largely by intonation contours and appreciable silence. The reliability was not high for this aspect of the transcription. Detailed transcription rules are given in Section 4.1.

    The purpose of these rules was to permit generation of hypotheses about the natural units of language, since it became immediately obvious upon listening to the tapes that the language was quite different from linguistically ideal or written language. The 41-character alphabet used is presented in Section 4.3.1. There are 39 phonemes, 14 vowel and 25 consonant phonemes, plus word mark and sentence mark. The system used is basically the Trager-Smith system (1951), with some modifications suggested by Professor Peter Ladefoged for this study. The phonemic transcriptions were coded into an alphabet accepted by Fortran programs, checked, and key-punched.

    2.2.2 Written Language

    Since ordinary alphabetic characters are acceptable to the Fortran program to be used in the statistical analysis, transcription of the written material consisted in key-punching directly from the text, according to the rules given in Section 4.1. Once punched, these were proof-read by two people, corrections made, and the corrections proofed. The original keypuncher did not proof her own work alone.

    2.3 STATISTICAL METHODS

    A program for computing the sequential constraints of chains of symbols (i.e., letters, phonemes, words, grammatical forms) was written for the IBM 7094 computer. The program computed measures of information and redundancy on a printed language each of whose symbols was represented as an arbitrary symbol of the Fortran alphabet. The measure of information, an estimate of the Shannon-Wiener
