The Evolution of Language

The Starting Point
We start with the assumption (see MacNeilage 1998) that our pre-linguistic forebears possessed brain functions and capabilities similar to those we see in today's non-human primates. We will occasionally refer to modern apes as a basis for guessing what could have been going on with our ancestors.

The story begins about 2.5 million years ago, with the appearance of Homo habilis, at a time when few specifically human traits can be discerned. One human-like characteristic that is fairly well developed (and well documented) is upright walking. Tool making has not advanced much beyond using sticks and stones in essentially their natural forms for various purposes, often related to obtaining food. There is a rich social structure of self-aware individuals with well-established pairwise and kinship relationships. And at about this time, climate changes are underway that will lead to new ways of life, out of the forests and onto the savannas.

A change that was perhaps underway, or just beginning, was a new approach to tool making. Evidence from Olduvai levels of about 2 million years ago (give or take half a million years) shows tool materials being transported over distances well beyond anything seen in present-day apes. In addition, Toth and Schick claim that the care with which stones were chipped to form cutting edges demonstrates a level of manual skill and hand/finger coordination that exceeds the abilities of apes.

The parietal/temporal brain areas that will later develop into articulator controls are at this point devoted to chewing and other jaw movements. MacNeilage notes in particular the cyclic, repetitive action involved in chewing, and proposes that these controls were well suited to being restructured for speech as new circuits were added to modulate the basic jaw movement cycle in finer detail. The modulated jaw cycle lays the foundation on which syllable articulations will form.

The basic jaw movement consists of opening and closing motions, with lip and tongue movements limited to manipulating food particles (and staying clear of the teeth). MacNeilage lists three types of modulation that come under increasingly detailed control: lip position, tongue body position, and voicing timing. He suggests that independent lip control may have been the first to appear, growing out of the lip smacking, still observable today, that often accompanies grooming and other close social behavior.

Lieberman argues that the throat configuration of H. habilis would not allow the nasal tract to be closed off from the throat, meaning that it would not have been possible to produce a "ba" sound distinct from a "ma" sound. This also implies that control of voicing timing relative to lip movement need not initially have been particularly precise. Precise voicing timing, I believe, almost certainly came later, after vocabulary growth had begun.

Lieberman also notes that the larynx/tongue configuration would limit the range of vowel qualities that could be produced. MacNeilage considers an initial ability to manage three distinct vowel qualities, with the tongue positioned at its limits of travel forward (but still behind the teeth), backward, and downward, giving a trio of sounds we could write as "i", "u" and "a". Again, we need not assume that all three were available from the start. A plausible scenario could begin with a single distinction, such as "a" vs. "u". This pair is suggested because, as MacNeilage notes, the distinction can initially be made using only jaw and lip movements, with little or no need to modify tongue position controls. These jaw and lip movements are already used in the same way in existing calls.

An interesting question is whether a vocabulary can begin to develop from a total syllable repertoire of just "ma" and "mu". Keep in mind that these new utterances appear alongside an existing repertoire of a few dozen established calls. Calls already exist that mean "food", "hawk", "snake" and "big cat" (or some other large predatory animal?), as well as a variety of sounds used during social interactions. What is different is that the new sounds are controlled by different brain areas, bringing them under greater voluntary control, whereas the existing calls are largely uttered involuntarily. This difference is hypothesized by Calvin and Bickerton and discussed at length by MacNeilage.

Before the new "words" are useful, however, they must mean something. The new vocabulary items must be connectable to conceptual structures. Present-day primates can distinguish the sounds of different words when spoken by humans, but do not readily learn meanings for them. We can imagine the "ma" and "mu" sounds occurring more or less randomly, without predetermined environmental associations.

Falk, Lieberman and Tobias have described possible changes in the Homo habilis skull that indicate brain growth in the parietal/temporal region, although these interpretations have been sharply criticized (see MacNeilage paper and ??). Countering this interpretation, _______ claims that the evidence for brain configuration based on skull indentations is extremely doubtful. He states that ... (see Lieberman?). MacNeilage discusses an increase in the ability to mimic and adopt observed behavior. Bickerton discusses a scenario centered on a change of habitat from forest to savanna, which would make additional vocabulary valuable in situations not covered by the existing predetermined calls.

We must assume, then, that some such pressure leads to the development of new connections between the utterance controls for the new sounds and existing conceptual structures. If we can imagine how the new semantic connections might develop, it seems reasonable that established connections between perception and motor activity will be elaborated concurrently, allowing the perceptual identification of the new sounds to be connected to the same conceptual structures.

How could these new semantic connections be formed? The question can be approached from two perspectives: as individual adaptations or as group changes. One possible starting point is MacNeilage's observation that lip smacking often occurs in close social situations. Even without specific conceptual structures, it is still plausible that associations could form between specific sounds and particular social activities. (Present-day primates can, with some difficulty, form such associations.) An individual who habitually used such associations might be seen as a more desirable mate. At the group level, increased use of such social mechanisms could conceivably enhance group cohesion.

Clearly, these are speculative ideas. Pinker has presented additional ideas on how meanings might have developed. Pinker and Bloom present some scenarios for the initial development of semantic assignments.

With no predetermined semantic content, any specific meanings the new utterances might have are only those assigned by community agreement. One can imagine a long period of time, many thousands of generations, during which a few new utterances exist, perceived as random noises by many individuals, but associated with specific objects or events by gradually increasing numbers of individuals. See remarks on this point by Pinker and Bloom.

New Articulations, New Vocabulary
Once the first brain connections for learning meanings have begun to appear in a significant fraction of the population, there is a relatively sudden demand for additional vocabulary capabilities. This puts pressure on the mechanisms for modulation of the jaw/syllable cycles to refine existing distinctions and to add new combinations. The same tongue position controls that allow the distinction between front and back vowels can also, with new jaw cycle modulators, be used to produce a closed articulation at the tongue tip. Without nasality control, this adds "ni" and "nu" to the vocabulary.

Each new articulatory capability brings a relatively large payoff in potential vocabulary size. For example, if we assume lip vs. tongue-tip closure and high vs. low vowels, an increase in vocal fold timing accuracy that allows a voiced vs. unvoiced distinction would potentially double the syllable inventory from four to eight:

  closure       original    added (unvoiced)
  lips          ma, mu      pa, pu
  tongue-tip    na, nu      ta, tu
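
The doubling is just multiplication of independent choices: the inventory size is the number of closure places times the number of voicing options times the number of vowels (2 x 1 x 2 = 4 before the voicing contrast, 2 x 2 x 2 = 8 after). The Python sketch below enumerates the table under those assumptions; the feature labels and the `syllables` helper are hypothetical, chosen only to mirror the table, and the treatment of the voiced member as a nasal follows the no-velar-closure point made in the text.

```python
from itertools import product

# Hypothetical feature model of the scenario in the text: two closure places,
# two vowels, and an optional voiced/unvoiced contrast. With no velar closure,
# the voiced member of each pair is heard as a nasal ("m", "n").
CONSONANTS = {
    ("lips", "voiced"): "m",
    ("lips", "unvoiced"): "p",
    ("tongue-tip", "voiced"): "n",
    ("tongue-tip", "unvoiced"): "t",
}
PLACES = ["lips", "tongue-tip"]
VOWELS = ["a", "u"]


def syllables(voicing_contrast):
    """Enumerate every consonant+vowel syllable the feature set allows."""
    voicing = ["voiced", "unvoiced"] if voicing_contrast else ["voiced"]
    return [CONSONANTS[(place, v)] + vowel
            for place, v, vowel in product(PLACES, voicing, VOWELS)]


before = syllables(voicing_contrast=False)
after = syllables(voicing_contrast=True)
print(before)                   # ['ma', 'mu', 'na', 'nu']
print(after)                    # ['ma', 'mu', 'pa', 'pu', 'na', 'nu', 'ta', 'tu']
print(len(before), len(after))  # 4 8
```

Adding a third vowel such as "i" to `VOWELS` would raise the voiced/unvoiced inventory to 12 in the same way, which is the sense in which each new distinction multiplies, rather than adds to, the syllable count.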

This change, however, requires refinement of both motor control and auditory discrimination. Again, there will be long periods during which the new distinctions are used more or less randomly, with some individuals having better command of them than others. It does not really matter whether a particular assignment of meaning survives across these generations. There will be no shortage of environmental situations begging to be "named", and we could expect continuing flux in the actual semantic assignments.

There are a couple of technical problems with this development. The minor one is that all vowels would be nasalized, rather than oral as the notation above suggests. The second is more serious. There is some doubt about just what a labial (or any oral) closure would amount to, given that the nasal passage cannot be closed at the velum. According to Lieberman, the change in larynx height that will remedy this situation will not occur for about another million years. Without velar control, a voiced/unvoiced distinction is much less salient than it would otherwise be. This raises the possibility that the phonemic system was limited to changes in place of articulation (and possibly a fricative distinction) until well after lexical management capabilities had developed. The implication is that vocabulary growth was slow until the larynx height change provided a larger phonemic inventory. On the plus side, this limitation sets the stage for increased pressure on the range of available articulations, supplying the selective forces needed to change the height of the larynx. But, as Lieberman makes clear, that change came at a large cost in breathing and food management, so it could take a while. This could help explain why neither H. habilis nor its successor, H. erectus, made much progress in tool development until near the emergence of H. sapiens.

Homo Erectus and More Words

Larynx Height Changes

The Evolution of Syntax
What is the nature of the innate component of syntax? While theoretical linguists hotly debate this point, AI researchers and neural network builders have shown how it might be possible to learn language with very little, if any, innate grammatical capability. Elman has shown that parts of speech and related agreement patterns can be learned by certain types of network architecture without any prewired connections. This learning, which Chomsky and others have claimed is not possible, requires that the network be set up so that it "advances" through early "developmental phases". If the complete network is exposed from the outset to the complete array of possible syntactic arrangements, learning does not occur.
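
Elman's networks were simple recurrent networks, in which the hidden layer's activations are copied into "context units" and fed back in alongside the next word; his staged-training result is usually known as "starting small". The Python sketch below shows only those two ingredients: a forward pass with context units, and an input curriculum that admits longer sentences in later stages. It is not Elman's code and it performs no learning (the weight-update step is omitted). The toy vocabulary and corpus, the use of sentence length as a stand-in for syntactic complexity, and the names `SimpleRecurrentNetwork` and `staged_corpus` are all assumptions for illustration.

```python
import math
import random


class SimpleRecurrentNetwork:
    """Forward pass of an Elman-style simple recurrent network (untrained sketch).

    At each step the hidden layer receives the current word plus a copy of its
    own previous activations, held in "context units". That copy is what lets
    the network carry information about earlier words in the sentence.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def weights(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = weights(n_hidden, n_in)       # input -> hidden
        self.w_ctx = weights(n_hidden, n_hidden)  # context -> hidden
        self.w_out = weights(n_out, n_hidden)     # hidden -> output
        self.context = [0.0] * n_hidden

    def reset(self):
        """Clear the context units, e.g. at a sentence boundary."""
        self.context = [0.0] * len(self.context)

    def step(self, x):
        """Consume one input vector; return a probability distribution over outputs."""
        hidden = []
        for row_in, row_ctx in zip(self.w_in, self.w_ctx):
            z = sum(w * xi for w, xi in zip(row_in, x))
            z += sum(w * c for w, c in zip(row_ctx, self.context))
            hidden.append(1.0 / (1.0 + math.exp(-z)))  # logistic activation
        self.context = hidden  # the copy-back that defines the Elman architecture
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


def staged_corpus(sentences, max_lengths):
    """'Starting small' as an input curriculum: each stage admits only sentences
    up to a length limit (length is a crude, assumed proxy for complexity)."""
    for limit in max_lengths:
        yield [s for s in sentences if len(s) <= limit]


# Hypothetical toy vocabulary and corpus.
vocab = ["boy", "boys", "dog", "sees", "see", "who"]
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}
corpus = [["boy", "sees"], ["boys", "see", "dog"], ["boy", "who", "dog", "sees", "sees"]]

net = SimpleRecurrentNetwork(len(vocab), 8, len(vocab))
for stage in staged_corpus(corpus, [2, 3, 5]):
    for sentence in stage:
        net.reset()
        for word in sentence:
            prediction = net.step(one_hot[word])  # a training update would go here
```

Staging here is done on the input side. Elman also reported a variant that staged the network's memory instead, limiting how much context it retained early in training, which is closer to the idea of the network itself advancing through developmental phases.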

Just what does it mean to require no innate syntactic structure? The network itself must exist, of course. And, if we apply that logic to the brain, the connections to the network from the appropriate word percepts must also exist. If we assume that a semantic structure capable of learning community-assigned meanings already exists, how much more is required to add the structures needed for learning syntax? The semantic connections need to be extended so that they connect to the outputs of the new syntax system, while retaining the old connections to the word percepts. The word percepts must also be connected to the syntactic network inputs.

More to follow ...
