The Syllable in Speech Production

A Theory About the Syllable
The following speculation is based on a paper by Peter MacNeilage on the evolution of articulatory capabilities, the neural organization theories of Gerald Edelman, and various other sources.

MacNeilage's idea is that articulation developed from jaw open/close movements used for chewing. The primary basis for this claim, beyond the obvious movement similarities, is that the highest levels of articulatory control are in the same temporal/parietal/premotor brain areas as jaw control. According to MacNeilage, new modulator circuits would have evolved to alter various parameters during the basic jaw cycle, perhaps initially expanding on lip movements, then tongue position, and, much later (I assume), glottal control. This allowed the selective articulation of specific phonetic qualities, forming the basis for syllable control.

Lexical items are stored with specific codes for syllable-initial (SI) consonants, vowels, and syllable-final (SF) consonants. Speech error data (MacNeilage's analysis), the practices of the speech recognition industry, and "my intuition" suggest that SI and SF consonants use different coding sets. Thus, I will refer to "phones" rather than "phonemes", consistent with the terminology used by the speech processing industry. As evidence for this position, I would cite articulation gestures which differ between initial and final consonants, such as /l/ and /r/, and duration differences, especially of stops. I find it interesting that Maddieson's work on "resyllabification" does not strongly support one side or the other. Languages can be found which tend to support the argument for either side of the question. For the time being, I am going with distinct SI and SF codes. (We'll see why in a minute).

When a word is activated for speech production, the syllables are sequentially activated, probably triggered by the timing of syllable output. As each syllable is activated, the phone codes and prosodic controls are linked to the jaw cycle modulators. The jaw movement cycle proceeds under timing control of midbrain sequencing circuits [note 1].

As the jaw cycle proceeds, the various articulatory modulations are imposed upon the basic open/close cycle, producing the syllable. As the jaw cycle timing pattern starts to repeat, a trigger is sent back to the lexical entry, which activates the next syllable code packet. Having distinct SI and SF consonants means that the entire set of articulation modulators can be connected in one step, without requiring a sub-syllable mechanism.
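
To make that control flow concrete, here is a minimal sketch (in Python, purely for illustration) of the sequencing I have in mind. The names and the three-slot SI/V/SF code packet are my own placeholders, not anything proposed by MacNeilage or Edelman:

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Syllable:
        onset: Optional[str]   # syllable-initial (SI) consonant code
        nucleus: str           # vowel code
        coda: Optional[str]    # syllable-final (SF) consonant code

    def run_jaw_cycle(modulators: dict) -> None:
        # Stand-in for one midbrain-timed open/close cycle with the
        # articulatory modulations imposed on it.
        print("open/close cycle with", modulators)

    def speak(word: List[Syllable]) -> None:
        # With distinct SI and SF codes, each syllable's full set of modulator
        # settings can be connected in a single step; when the cycle pattern
        # starts to repeat, control returns to the lexical entry for the next
        # syllable's code packet (here, simply the next loop iteration).
        for syllable in word:
            modulators = {"SI": syllable.onset,
                          "V": syllable.nucleus,
                          "SF": syllable.coda}
            run_jaw_cycle(modulators)

    speak([Syllable("k", "ae", "t")])   # e.g. the single syllable of "cat"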

Speech errors are overwhelmingly dominated by swaps between similar phones in corresponding positions of adjacent syllables. This strongly suggests a pre-load or look-ahead buffer that is filled with the upcoming syllable.
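
Purely as an illustration of why such a buffer would produce exactly this error pattern (the example words and slot names below are mine): if the upcoming syllable is pre-loaded into slots matching those of the current syllable, a mis-selection reads the same slot from the wrong syllable, so the swapped phones necessarily occupy corresponding positions.

    def positional_swap(current: dict, upcoming: dict, slot: str):
        # An exchange error: the same positional slot is read from the wrong
        # buffer entry, so SI swaps with SI, SF with SF, and so on.
        current, upcoming = dict(current), dict(upcoming)
        current[slot], upcoming[slot] = upcoming[slot], current[slot]
        return current, upcoming

    # "left hemisphere" -> "heft lemisphere": the SI consonants exchange,
    # but each stays in syllable-initial position.
    print(positional_swap({"SI": "l", "V": "eh", "SF": "ft"},
                          {"SI": "h", "V": "eh", "SF": "m"}, "SI"))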

Actually, I've left out a major step. As any linguist will protest, "Hey, what about the phonological component?" For this, I like Jackendoff's model of a "community blackboard" system, in which lexical items are sequentially posted to a memory space which is also accessible to the phonological rules. Any necessary changes are made by the rules, and the modified codes are then available to the articulatory mechanism.
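
The following sketch shows how I picture that blackboard step, again with placeholder names of my own; the one rule shown (a toy final-devoicing rule) merely stands in for whatever the phonological component actually does:

    blackboard = []   # shared memory space; lexical code packets are posted in sequence

    def post(syllable: dict) -> None:
        blackboard.append(syllable)

    def final_devoicing(syllable: dict) -> dict:
        # Toy placeholder rule: devoice a voiced syllable-final stop.
        voiceless = {"b": "p", "d": "t", "g": "k"}
        if syllable.get("SF") in voiceless:
            syllable = dict(syllable, SF=voiceless[syllable["SF"]])
        return syllable

    PHONOLOGICAL_RULES = [final_devoicing]

    def read_for_articulation() -> list:
        # The articulatory mechanism sees the codes only after the rules
        # have had their chance to modify them on the blackboard.
        out = []
        for syllable in blackboard:
            for rule in PHONOLOGICAL_RULES:
                syllable = rule(syllable)
            out.append(syllable)
        return out

    post({"SI": "h", "V": "ae", "SF": "d"})
    print(read_for_articulation())   # the SF code surfaces as "t" in this toy example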

In terms of "Edelman-like" neural connectivity, it would be something like the following: Links to (certain details of) the lexical items are activated, which connect those details to corresponding control structures in the phonological system. The links are rearranged (in some way that I haven't yet seen described or worked out for myself) to implement any needed phonological changes and then new links from there to the jaw cycle modulator controls are activated.

The articulation modulators are quite abstract at the level we have been discussing. Keep in mind that the actual motor movements still need to be processed through smoothing and coordination circuits (what Edelman calls the thalamo-cerebellar "major loops") for handling bite-block adjustments, etc., and through the basic cerebellar motor control circuits. Considering these additional layers of processing, it is my belief that much of what linguists tend to ascribe to a phonological component is actually done by the interaction of these lower-level motor control circuits with the physical constraints of the articulator mechanisms.

I earlier believed that all phonological processing could be accounted for in this manner. I now think that "most" of it can be so attributed, but that there remains a core of phonological operations which must be handled in the traditional way by a component of the grammatical system.

Phonetic/Phonological Features
Note that the abstractness of the jaw-cycle modulator representations at lexical and phonological levels might suggest the appropriateness of a traditional system of distinctive phonological features. I see no reason to believe that all of the modulators must be expressed by a distinctive feature system, but, on the other hand, if a particular abstract articulator movement specification is available at the right place and the right time, it might very well be used opportunistically to aid in specifying some other correlated activity. In that sense, phonological rules could make use of something very much like phonological features. However, I believe they need not be systematically "distinctive".

Browman and Goldstein discuss an interesting articulatory model based on gestures, rather than the more traditional features. Many in the speech community hope that this model, or something similar, will be taken up and developed beyond the point where Browman and Goldstein left off.

King and Taylor present a fairly convincing argument that the perceptual system can make good use of a feature representation in recognition. The issue is that, due to coarticulatory effects, the various features do not all turn on or off at phone boundaries. The overlaps can be used to good advantage by the perceptual system. Although promising, their results are not yet final. It seems fairly clear that such feature effects would not account for all allophonic variation, but with variable, rather than binary, values, some of it might be captured in this way.
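
By way of illustration only (the numbers below are invented, not King and Taylor's data), graded feature values that drift across a nominal phone boundary might look like this, with nasality anticipating the upcoming nasal while voicing never switches at all:

    # Nominal segmentation: a vowel followed by a nasal, boundary at frame 3.
    nasality = [0.0, 0.1, 0.4, 0.8, 1.0, 1.0]   # spreads leftward onto the vowel
    voicing  = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # constant across the boundary

    for t, (n, v) in enumerate(zip(nasality, voicing)):
        marker = "  <- nominal phone boundary" if t == 3 else ""
        print(f"frame {t}: nasality={n:.1f} voicing={v:.1f}{marker}")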


Note 1:
Edelman has suggested that these timing circuits would be located in the caudate nucleus. MacNeilage makes several references to a timing system known as the central pattern generator. However, Churchland and Sejnowski use this term to refer to a timing system further down in the spinal cord in which alternating flexor-extensor actions are out of phase between the left and right sides of the body. While appropriate for gait pattern control, such a system is clearly not applicable to chewing. Thach et al., in Gazzaniga, describe the cerebellar connections which regulate the motor activity, but do not discuss timing generators. Related discussion may be found in Arbib et al.

