ARCHIVEUM · The Architecture of Language · Artifact I of VIII
The biological and social origins of the most extraordinary thing human beings do, examined from first principles through the fossil record, genetics, neuroscience, and the animal kingdom.
Prelude
Consider what is happening when two humans talk. One of them produces a rapid series of small, shaped sounds, each one lasting a fraction of a second, each one formed by a precise choreography of tongue, teeth, lips, and expelled breath. The other, without touching, without being shown anything, without any physical demonstration at all, reconstructs from those sounds a complex thought about something that happened yesterday, in another city, to someone who is not present. They do this effortlessly, automatically, at conversational speed, while simultaneously tracking the social dynamics of the exchange, planning their own response, and sipping coffee.
No other animal on Earth does this. Not remotely. A chimpanzee can learn to associate a symbol with an object, and can use that symbol to request the object. But it cannot tell another chimpanzee about something that happened last week, or warn it about a danger that lies around a corner it has not yet turned, or propose a plan for tomorrow. The gap between what language allows and what any other communication system allows is not a gap of degree. It is a gap of kind.
Language is so deeply woven into human experience that its strangeness is nearly impossible to perceive. It is like asking a fish to notice water.
Understanding how language evolved requires sitting with its strangeness first. Something happened in the history of one primate lineage, something at once genetic, neurological, anatomical, and social, that produced a communicative system of unlimited expressive power. That it happened at all is extraordinary. That we carry it inside us, and use it every waking moment without noticing its improbability, is more extraordinary still.
This artifact works from the ground up: first, what language is (formally, precisely); then, what biology makes it possible; then, the contested questions of when it evolved, why it evolved, and what its universality tells us about human nature. Several of the central questions remain genuinely open. Where that is the case, the honest position is to say so.
I
The first serious attempt to define language formally enough to ask whether other animals have it was made by the American linguist Charles Hockett in a 1960 paper in Scientific American, "The Origin of Speech." Hockett identified thirteen design features that characterise human language, properties that, taken together, distinguish it from every other known communication system. Not all of them are unique to human language individually, but their combination is.
The features that matter most are five.
Ferdinand de Saussure
1857 – 1913 · Cours de linguistique générale, 1916
Saussure distinguished between langue, the abstract system of rules and conventions shared by all speakers of a language, and parole, meaning any individual act of speaking. Linguistics, for Saussure, studies langue, not parole. Parole is too variable, too contextual, too individual to yield stable generalizations. Langue is a social fact: it exists between minds, in the space of shared convention, rather than in any single individual. He also distinguished synchronic analysis (the language as a system at one moment in time) from diachronic analysis (how the language changes over time), arguing that synchronic analysis has logical priority. These distinctions turned linguistics from historical philology into a structural science, and they propagated outward to influence anthropology, literary criticism, and philosophy throughout the twentieth century.
II
Language is not purely a cognitive phenomenon. It is also a biological one: a set of structural adaptations in the brain, the skull, and the throat that no other primate shares in the same configuration. Understanding these adaptations is essential to understanding both what language is and how it evolved.
In most other adult mammals, and in human infants up to about three months of age, the larynx sits high in the throat, where it can interlock with the nasopharynx, creating a passage that allows simultaneous breathing and swallowing. This is a useful arrangement for animals that need to drink without pausing to breathe. In adult humans, the larynx has descended to a lower position, opening up a resonating chamber (the pharynx) that permits the production of the wide variety of vowel sounds that human languages use. The vocal tract becomes, in effect, a two-resonator system capable of generating the acoustic complexity that language requires.
The linguist Philip Lieberman, working from the late 1960s onward, reconstructed the probable vocal tracts of Neanderthals and other extinct hominins from fossil skulls and argued that a high larynx would have severely limited their phonemic range. His specific claim about Neanderthals has been disputed; the Neanderthal hyoid bone found at Kebara Cave in 1983 and described in 1989 has a morphology essentially identical to that of modern humans, which weakened Lieberman's case. The general point still stands: the descended larynx is a human specialization with a clear functional connection to complex vocalization.
Broca's area (inferior frontal gyrus, left hemisphere) governs speech production; Wernicke's area (posterior superior temporal gyrus) governs comprehension. The arcuate fasciculus, a bundle of white-matter fibres, connects them. Damage anywhere in this circuit produces characteristic language deficits.
In 1861, the French surgeon Paul Broca examined a patient at the Bicêtre hospital near Paris. The patient, nicknamed "Tan" because it was essentially the only syllable he could produce, had spent some twenty years unable to speak in any other way, despite appearing to understand language perfectly. He could express himself clearly through gesture and facial expression, and his intelligence seemed undiminished. When Tan died, Broca performed an autopsy and found a lesion in the left inferior frontal gyrus.
Broca concluded that speech production was localized to this region. He was partially right. Broca's area is not a simple speech-production centre; modern neuroimaging has shown that it is also involved in syntactic processing, in understanding complex sentences, and in sequencing motor actions more generally. But Broca's observation was foundational for two reasons. First, it demonstrated that specific cognitive capacities can be localized to specific brain regions, a radical claim in 1861. Second, together with the further cases Broca reported over the following years, it revealed that language is lateralized, predominantly to the left hemisphere. This lateralization is itself significant, and its implications remain actively studied.
In 1874, Carl Wernicke described a complementary region: the posterior superior temporal gyrus of the left hemisphere. Patients with lesions there produce fluent, flowing speech, but what they say is semantically incoherent. They may substitute incorrect words (paraphasia), produce invented non-words (neologisms), or generate grammatically structured but meaningless output. Their comprehension of what is said to them is severely impaired. Their problem is not general confusion: they have lost the mechanism by which spoken sound is mapped to meaning.
Wernicke also proposed that damage to the white-matter tract connecting his area to Broca's area should produce a characteristic syndrome: patients who can both understand and produce speech but cannot repeat what they have just heard. This prediction was confirmed. The tract is the arcuate fasciculus, and the syndrome is conduction aphasia. The existence of conduction aphasia as a distinct clinical entity, predicted from the anatomy alone, was early evidence that the neural architecture of language is genuinely modular.
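The logic of a prediction "from the anatomy alone" can be made concrete with a toy graph model, loosely in the style of the classical Wernicke–Lichtheim diagrams. This is a sketch under stated assumptions, not a clinical model: the "ear", "mouth", and "concepts" nodes are hypothetical labels, and treating verbatim repetition as a route that bypasses meaning is a simplification introduced here. A lesion is modelled by deleting a connection and checking which routes survive.

```python
from collections import deque

# Hypothetical labels: "ear" and "mouth" stand for auditory input and
# articulatory output; "concepts" stands for meaning. Only the two named
# regions and the connecting tract come from the text.
CLASSICAL_MODEL = frozenset({
    ("ear", "wernicke"),        # heard speech reaches Wernicke's area
    ("wernicke", "concepts"),   # sound is mapped to meaning
    ("concepts", "broca"),      # meaning drives speech planning
    ("wernicke", "broca"),      # direct route: the arcuate fasciculus
    ("broca", "mouth"),         # planned speech is articulated
})


def reachable(edges, start, goal, avoid=frozenset()):
    """Breadth-first search from start to goal that never enters nodes in `avoid`."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in avoid:
                seen.add(nxt)
                queue.append(nxt)
    return False


def abilities(edges):
    """Which capacities survive, given the connections that remain intact."""
    return {
        "comprehension": reachable(edges, "ear", "concepts"),
        "production": reachable(edges, "concepts", "mouth"),
        # Verbatim repetition (say, of an unfamiliar word) cannot go through meaning.
        "repetition": reachable(edges, "ear", "mouth", avoid={"concepts"}),
    }


def cut(edges, *lesioned):
    """Return the model with the given connections removed."""
    return edges - set(lesioned)


print(abilities(CLASSICAL_MODEL))
print(abilities(cut(CLASSICAL_MODEL, ("wernicke", "broca"))))
# {'comprehension': True, 'production': True, 'repetition': False}
```

Cutting only the direct tract leaves comprehension and production intact but abolishes verbatim repetition, the conduction-aphasia pattern. The toy model is far too coarse to capture Wernicke's aphasia, where speech stays fluent but loses its meaning; it illustrates the style of reasoning, not the clinical picture.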
Modern neuroimaging has significantly complicated the Broca-Wernicke model. Language involves a distributed network that includes the left inferior frontal gyrus, the posterior superior temporal sulcus, the angular gyrus, the middle temporal gyrus, the premotor cortex, and regions of the basal ganglia. Angela Friederici's work at the Max Planck Institute for Human Cognitive and Brain Sciences has traced two distinct white-matter pathways connecting frontal and temporal language regions: a dorsal stream involved in syntactic integration and a ventral stream involved in semantic processing. Language is not a single cognitive system; it is an architecture of interacting systems.
What makes this neural architecture interesting in evolutionary terms is that the homologues of most language regions exist in chimpanzee brains. The relevant areas are not unique to humans; they have simply been expanded, re-wired, and more tightly lateralised. The human language faculty did not emerge from nothing. It was built from pre-existing cognitive and neural machinery, repurposed and elaborated. This fact is both expected given evolutionary theory and fascinating in its details.
III
In the late 1980s, researchers at University College London began studying a London family, designated the KE family in the scientific literature, in which a severe speech and language disorder ran through three generations. About half the family's members were affected, in a pattern consistent with autosomal dominant inheritance (meaning a single copy of a mutated gene was sufficient to cause the condition), and they shared a characteristic cluster of difficulties that had been observed in the family for at least thirty years.
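The phrase "a single copy was sufficient" is what makes a roughly half-and-half split the expected signature. A minimal sketch, assuming simple Mendelian segregation, full penetrance, and hypothetical allele labels ("M" for the mutated copy, "n" for the typical one): each parent passes on one of its two alleles at random, so the possible children can simply be enumerated.

```python
from fractions import Fraction
from itertools import product

# Hypothetical allele labels: "M" is the mutated (dominant) copy, "n" the typical one.
MUTANT = "M"


def affected_fraction(parent_a: str, parent_b: str) -> Fraction:
    """Expected share of children carrying at least one mutant allele.

    Each parent contributes one of its two alleles with equal probability,
    so the four equally likely allele combinations are enumerated directly.
    """
    children = [a + b for a, b in product(parent_a, parent_b)]
    affected = sum(MUTANT in child for child in children)
    return Fraction(affected, len(children))


print(affected_fraction("Mn", "nn"))  # 1/2: one affected parent, one unaffected
print(affected_fraction("Mn", "Mn"))  # 3/4: both parents affected
```

In a real family the observed proportion drifts around these expectations by chance, which is why "about half" rather than exactly half is the pattern to look for.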
The affected members had profound difficulty with the fine motor control required for fluent speech. They could not consistently make the rapid, precisely timed movements of the tongue, lips, jaw, and velum that articulate speech at normal rates. But the deficit was not purely motoric, and this is what made the family scientifically significant. Affected members also showed specific difficulties with grammatical processing: they struggled with morphological inflections (marking past tense, plurals), with complex syntactic structures, and with tasks that required explicit manipulation of grammatical rules. Non-verbal intelligence was somewhat affected but not severely. Language, and particularly its grammar, was disproportionately impaired.
The hyoid bone from Kebara Cave, Israel, the first Neanderthal hyoid ever recovered. Its morphology is essentially indistinguishable from that of modern humans, suggesting the anatomical prerequisites for speech were present in Neanderthals. Whether those prerequisites were sufficient for full language remains contested.
In 2001, Cecilia Lai and colleagues published in Nature the identification of the mutation responsible for the KE family's disorder. It was a point mutation, a single nucleotide change, in a gene on chromosome 7, designated FOXP2. The mutation changed a single amino acid in the FOXP2 protein, disrupting its function.
FOXP2 belongs to a family of transcription factors, proteins that bind to DNA and regulate the expression of other genes. It is not a "language gene" in any simple sense. It is a master regulator of gene expression during development, with binding sites affecting several hundred downstream genes. These downstream genes are involved in lung development, heart development, gut development, and neural circuit formation. FOXP2 is expressed in birds that learn songs. It is expressed in bats that use echolocation. It is expressed in mice. Its evolutionary history extends back at least 450 million years, long before anything resembling language, long before birds or mammals. What matters for language is not FOXP2 itself, but the specific version of FOXP2 that humans have.
The human version differs from the chimpanzee version at two amino acid positions, two changes that occurred in the human lineage after its divergence from chimpanzees. These changes appear to have altered the regulatory targets of the FOXP2 protein, particularly in the striatum and the cerebellum, regions involved in the learning and execution of motor sequences. Neuroimaging of the affected KE family members shows reduced grey matter density precisely in these regions, as well as in Broca's area.
In 2007, Johannes Krause and colleagues at the Max Planck Institute for Evolutionary Anthropology published a striking finding: they had sequenced FOXP2 from ancient DNA extracted from Neanderthal remains at the El Sidrón cave in Spain, and found that the Neanderthals carried both of the human-specific mutations that distinguish our version of FOXP2 from that of chimpanzees. On this evidence, the mutations, thought to have been favoured by positive selection in the human lineage after its split from chimpanzees, were already in place before modern humans and Neanderthals diverged, roughly 500,000 to 600,000 years ago.
What this does and does not prove deserves careful attention. Neanderthals had the human variant of FOXP2. FOXP2 is involved in the fine motor control of speech articulation and possibly in the neural processing of grammatical sequences. Many commentators concluded that Neanderthals could, therefore, speak. But the argument moves too fast.
FOXP2 is one gene among the thousands involved in the development of the language-capable brain. Its presence indicates that one necessary component of the language system was in place, but not that all necessary components were. A car engine is one component of a working automobile; its presence does not prove the vehicle has working brakes, steering, or wheels. FOXP2's presence in Neanderthals tells us that the neuromotor foundations of speech articulation were likely more similar to ours than some models had assumed. It does not tell us whether Neanderthals had syntax, displacement, or productivity, the features that most distinguish human language from animal communication.
The deeper lesson of the FOXP2 story is about the relationship between genes and complex cognitive traits. There is no gene for language. There is no gene for syntax. There are genes that, when mutated, disrupt specific aspects of the neural architecture that language depends on. FOXP2 is a lever in a machine of extraordinary complexity. The machine is what natural selection built over hundreds of thousands of years. What the KE family's story gives us is a window into one component of that machine, and a reminder that the window is very small.
IV
The desire to determine whether other primates could acquire human language is understandable. If a chimpanzee (our closest living relative, sharing approximately 98.7% of our DNA) could be taught language, the uniqueness of human linguistic capacity would need to be radically reconsidered. Beginning in the 1960s, a series of research programs attempted this directly. The results were instructive, though not always in the ways their authors hoped.
Earlier attempts to teach chimpanzees to speak had failed because chimpanzees lack the vocal anatomy for human speech: their larynx is high, their tongue musculature different, and their control of the oral articulators insufficient. Beatrix and Allen Gardner at the University of Nevada sidestepped this constraint. In 1966, they began raising a female chimpanzee, Washoe, in a home environment and teaching her American Sign Language. By the time Washoe was five, she had acquired approximately 160 signs, used them spontaneously in novel combinations, and generalized them appropriately to new referents. She signed "water bird" on seeing a swan, apparently producing a novel compound. Most compellingly, her adopted son Loulis, who was never explicitly trained by human researchers, acquired signs by observing and imitating Washoe, who at times appeared to teach them to him. Some of the sign combinations Washoe produced had the flavour of productivity: novel combinations that conveyed meanings not expressly taught.
Herbert Terrace at Columbia University set out to teach ASL to a chimpanzee he named Nim Chimpsky, the name a pointed tribute to Chomsky, whose nativist theory of language the project was designed to test. Nim acquired a vocabulary of roughly 125 signs over several years. But when Terrace analyzed the videotape record of Nim's signing sessions in 1979, he found something troubling. The majority of Nim's combinations were not productive new constructions; they were imitations or partial repetitions of what his trainers had just signed. Nim's "sentences" appeared to be extensions of the trainer's initiating signs rather than independent linguistic propositions.
Terrace extended his critique to the other ape language projects, including the Gardners' data on Washoe. His argument: apes were responding to subtle, involuntary cues from their trainers, cueing effects that had also misled researchers in studies of "Clever Hans," the horse famous in the early twentieth century for appearing to perform arithmetic. The apes were genuinely intelligent; they were learning to navigate the social dynamics of their interactions with humans. But they were not learning language.
Terrace's critique did not end the ape language studies, but it changed them. Subsequent researchers were more careful about controlling for cueing, more rigorous in their documentation, and more conservative in their interpretive claims.
Kanzi, a bonobo (Pan paniscus), acquired his lexigram system not through direct training but by observing his mother's training sessions. His demonstrated comprehension of spoken English sentences under controlled conditions remains the most impressive result in the ape language literature.
The most sophisticated case remains that of Kanzi, a bonobo (Pan paniscus) studied by Sue Savage-Rumbaugh at the Language Research Center in Atlanta. Kanzi's acquisition of his lexigram system, a keyboard of symbols each standing for a word, was itself unusual: he was not explicitly trained, but acquired it by observing his mother's training sessions as a young juvenile. This incidental learning through observation has a distinctly human quality.
Under controlled conditions designed to eliminate cueing, Kanzi demonstrated comprehension of spoken English sentences at a level comparable to a human child of approximately two and a half years. He could follow novel instructions that required correct understanding of sentence structure: distinguishing "Make the dog bite the snake" from "Make the snake bite the dog," or "Give the ball to Rose" from "Give Rose to the ball." These cannot be solved by symbolic association alone; they require sensitivity to word order as a carrier of grammatical meaning.
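The force of this test is that each pair of instructions contains exactly the same words. A minimal sketch, with hypothetical helper names and a deliberately rigid sentence frame (not a model of how Kanzi or a child actually parses): a strategy that registers only which symbols occur cannot tell the two instructions apart, whereas one that reads position can.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Which words occur, and how often, with order thrown away."""
    return Counter(sentence.lower().rstrip(".").split())


def who_bites_whom(sentence: str) -> tuple:
    """Hypothetical toy reader for the rigid frame 'Make the X bite the Y'."""
    words = sentence.lower().rstrip(".").split()
    fits_frame = (
        len(words) == 6
        and words[0] == "make"
        and words[1] == "the"
        and words[3] == "bite"
        and words[4] == "the"
    )
    if not fits_frame:
        raise ValueError("sentence does not fit the toy frame")
    # Position, not mere presence, decides who does what to whom.
    return words[2], words[5]


a = "Make the dog bite the snake"
b = "Make the snake bite the dog"
print(bag_of_words(a) == bag_of_words(b))  # True
print(who_bites_whom(a))                   # ('dog', 'snake')
print(who_bites_whom(b))                   # ('snake', 'dog')
```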
What does this establish? Bonobos can learn symbolic representation. They can combine symbols in ways that produce novel requests. They can respond to syntactic structure in comprehension tasks. What they do not do: spontaneously develop grammar in the wild. Produce sentences beyond three or four symbols. Teach language to their offspring without human intervention. Show anything resembling the explosive vocabulary acquisition, the "naming explosion," that human children exhibit at around eighteen months. Embed propositions within propositions. Narrate events. Ask questions. Refer to things that are not present.
The gap between what Kanzi can do and what a human four-year-old can do is not a quantitative gap to be narrowed by more training. It is a structural gap: a difference in the kind of cognitive system being applied to communicative tasks. Kanzi illuminates the lower boundary of what counts as language (symbolic reference and some sensitivity to word order), while making the upper boundary, which humans occupy, all the more remarkable in its isolation.
V
When did language evolve? The honest answer is that we do not know. Language leaves almost no direct fossil record. The brain is soft tissue; it does not survive for tens of thousands of years. We are left, as paleoanthropologists so often are, inferring cognitive capacity from physical proxies: anatomical structures, genetic sequences, and archaeological remains whose relationship to language is always a matter of interpretation.
The modern human skull shape, high and rounded with a relatively flat face, took form over roughly the last 200,000 to 300,000 years. Homo sapiens specimens from Jebel Irhoud in Morocco, dated in 2017 to approximately 300,000 years before present, already show the facial morphology of modern humans, though their braincase was still somewhat more elongated than today's. By around 200,000 years ago, skulls essentially indistinguishable from those of modern humans are found in the record.
From the skull shape, researchers can make inferences about the likely position of the larynx and therefore about the range of sounds the vocal tract could produce. The inference is indirect and contested, as the larynx itself does not fossilize, but the overall picture is consistent with the vocal anatomy required for full human speech being in place for at least 200,000 years.
Endocasts, casts of the inner surface of fossil skulls that preserve the gross shape of the brain, offer another line of evidence. Broca's area can appear as a subtle bulge on the left frontal lobe. Endocasts of specimens of Homo habilis from approximately 2 million years ago show what may be an enlarged Broca's area, though interpreting endocasts is notoriously difficult, and claims about cognitive capacity from skull impressions must be treated cautiously. The endocasts of Homo heidelbergensis and Neanderthals are less ambiguous: they show brains comparable in size to those of modern humans and similar in overall organisation.
The most striking evidence for fully modern symbolic cognition comes not from anatomy but from archaeology. The Upper Palaeolithic, beginning around 45,000 to 50,000 years ago in Europe, is characterised by a rapid proliferation of symbolic and cultural material that leaves an unmistakable signature in the archaeological record: cave paintings, carved figurines, musical instruments, body ornaments, and deliberately modified objects of no obvious practical function.
The paintings at Chauvet Cave in southern France, dated to roughly 32,000 radiocarbon years (some 36,000 calendar years) before present, are drawn with a sophistication that is genuinely difficult to process. The horses, aurochs, and rhinoceroses depicted are anatomically accurate, rendered in perspective, with an understanding of how animals look in motion and how overlapping bodies create spatial depth. These are not the scratching experiments of creatures testing a new cognitive capacity. They are the products of an aesthetic tradition already highly developed.
From Hohle Fels in Germany: a vulture-bone flute dated to approximately 40,000 years ago, with carefully aligned finger holes. From the same site: a carved ivory figurine, the Venus of Hohle Fels, of similar age. From Blombos Cave in South Africa: ochre engraved with abstract geometric patterns, dated to 77,000 years ago, and shells of the marine snail Nassarius kraussianus, perforated and apparently strung as beads, about 75,000 years old. These African dates push the evidence for symbolic behaviour back well before the European Upper Palaeolithic.
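The question this raises is whether the Upper Palaeolithic explosion represents the arrival of language or merely its first clear archaeological signature. Both interpretations are defensible. Neither is conclusive. The key difficulty is that absence of evidence is not evidence of absence: speech leaves no durable trace of its own, so the earliest surviving symbolic artefacts set only a minimum age for language, not a date for its origin.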
Jared Diamond argued in The Third Chimpanzee (1991) for what he called the "Great Leap Forward," a relatively sudden transformation in human cognitive and cultural capacity around 50,000 years ago, possibly caused by a genetic mutation that enabled fully modern language. On this model, anatomically modern humans existed for 150,000 years before they were cognitively modern; something changed around 50,000 years ago that unlocked the full capacity.
The African evidence has increasingly complicated this picture. The earliest unambiguous symbolic behaviour (ochre engravings, shell beads, and the preparation of ochre pigments at Blombos Cave) predates the European Upper Palaeolithic by tens of thousands of years. This suggests that behavioural modernity emerged gradually in Africa over a long period; or that it emerged and then periodically disappeared as small populations passed through bottlenecks; or that it was triggered by demographic expansion rather than genetic mutation. The debate is genuinely unresolved.
Neanderthals (Homo neanderthalensis) were large-brained, sophisticated tool-makers who buried their dead, used ochre, made personal ornaments, and, according to FOXP2 evidence, carried the human variant of the gene most directly associated with speech articulation. Their brain volumes overlapped with those of modern humans. Were they capable of language?
The evidence is genuinely ambiguous. The Kebara hyoid is consistent with a vocal tract capable of human speech. The FOXP2 evidence is consistent with the neuromotor foundations of articulation being in place. But the archaeological record of Neanderthal symbolic behaviour, while present, is less elaborate and less consistent than that of contemporary Homo sapiens. Whether this reflects a genuine cognitive difference, different lifestyles and ecological contexts, or simply how little of Neanderthal Europe for the relevant period has been excavated remains unresolved.
The minimum defensible claim: the anatomical and genetic prerequisites for language were in place in the hominin lineage leading to both modern humans and Neanderthals at least 500,000 years ago. The behavioural evidence for fully modern, symbolically complex language use becomes unambiguous by 40,000–50,000 years ago at the latest and is strongly suggested by 75,000–80,000 years ago in southern Africa. Everything before that is inference from proxies, and the proxies are imperfect.
VI
Once the question of when language evolved is set aside, or at least honestly deferred, the question of why it evolved becomes pressing. What was the selective pressure? What problem did language solve? Three serious accounts exist. They are not simply alternative hypotheses waiting for a decisive experiment to adjudicate between them; they are answers to partially different questions, and the conflict between them reveals deep disagreement about what language fundamentally is.
Robin Dunbar
b. 1947 · Grooming, Gossip and the Evolution of Language, 1996
Dunbar is an evolutionary psychologist and anthropologist at Oxford. His Social Brain Hypothesis, developed through the 1990s, links neocortex size to social complexity across primates. His specific argument about language and social bonding, published in 1996, generated significant debate and remains influential in evolutionary linguistics, primatology, and social psychology.
Dunbar's argument begins with data rather than theory. In a 1992 paper in the Journal of Human Evolution, "Neocortex size as a constraint on group size in primates," he established a robust correlation across 38 genera of primates: species with a higher ratio of neocortex to total brain volume consistently live in larger social groups. The correlation holds across prosimians, New World monkeys, Old World monkeys, and apes. Computing the expected group size for a creature with the human neocortex ratio yields approximately 147 to 150 individuals, a figure that has become known as "Dunbar's number."
Neocortex Ratio vs. Mean Social Group Size Across Primates, after Dunbar (1992)
Data points approximate values from Dunbar (1992), Journal of Human Evolution. The human data point is the predicted value from the regression, not a measured wild group size. The dramatic discontinuity between great apes (~50) and predicted human group size (~150) anchors Dunbar's argument about the scale of human sociality.
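The 147-to-150 figure comes from reading the human neocortex ratio off the primate regression line. The sketch below assumes the log-log form and coefficients commonly quoted from Dunbar's 1993 follow-up to the 1992 paper (log₁₀ N = 0.093 + 3.389 log₁₀ CR, where CR is the neocortex ratio and N the mean group size) and a human neocortex ratio of about 4.1. Both the coefficients and the human ratio are assumptions to check against the original papers, and the function name is hypothetical.

```python
import math

# Assumed regression coefficients, as commonly quoted from Dunbar's 1993
# follow-up: log10(N) = 0.093 + 3.389 * log10(CR).
INTERCEPT = 0.093
SLOPE = 3.389
HUMAN_NEOCORTEX_RATIO = 4.1  # assumed value for modern humans


def predicted_group_size(neocortex_ratio: float) -> float:
    """Mean group size N predicted from the neocortex ratio CR on a log-log line."""
    return 10 ** (INTERCEPT + SLOPE * math.log10(neocortex_ratio))


print(round(predicted_group_size(HUMAN_NEOCORTEX_RATIO), 1))  # 147.8
```

Because the fit is on logarithms, the prediction is sensitive to the assumed ratio: with a slope of about 3.4, a neocortex ratio 10% larger predicts a group roughly 38% larger.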
The puzzle Dunbar poses: primate social cohesion is maintained through physical grooming, picking through each other's fur, which reduces stress hormones and reinforces social bonds. Data on primate time budgets show that monkeys and apes spend between 10% and 20% of their day grooming. Above approximately 20%, grooming becomes incompatible with adequate feeding and predator vigilance. This puts a hard ceiling on the group size any grooming-based social species can maintain.
Human ancestral groups, projected from neocortex size, should have been about three times larger than the groups that physical grooming alone could have maintained. Something had to replace physical grooming as the mechanism of social bonding. Dunbar's argument: language did. More specifically, conversation did, particularly conversation about social relationships: who did what to whom, alliances, betrayals, reputations. What people talk about much of the time, Dunbar points out, is other people: their behaviour, their characters, their relationships. He calls this "gossip" and argues that it is not a trivial byproduct of language but its original function.
The elegance of this account is that it explains both the content of language (social information) and the scale of human sociality (groups of ~150) from a single mechanism. Its vulnerability is that it explains the communicative function of language while leaving its formal properties (recursion, displacement, productivity) largely unexplained. Why would social grooming require syntax? Why would bonding conversations require the ability to embed propositions within propositions? Dunbar's account explains why our ancestors needed a richer communicative system than physical grooming; it does not explain why that system took the particular formal shape that human language has.
Noam Chomsky
b. 1928 · Syntactic Structures, 1957; The Minimalist Program, 1995
Chomsky transformed linguistics in 1957 by arguing that the grammars of human languages are not sets of memorised patterns but systems of generative rules, rules that can produce an infinite number of sentences from finite means. His subsequent work has refined and radically pared down this proposal, arriving at the Minimalist Program's account of syntax as fundamentally a single recursive operation, Merge.
Chomsky's position on language evolution has been deliberately contrarian. His view, particularly as developed in the 2002 paper with Hauser and Fitch and in subsequent work, is that language evolved primarily as an instrument of thought, a system for internal computation, and that communication is a secondary application of this system. The evidence he cites: language is a remarkably poor design for communication. It is deeply ambiguous. It leaves vast amounts unsaid, relying on context to fill the gaps. It is not transparent. These would be strange properties for a system shaped primarily by communicative pressure. They make more sense if the system was optimised for something else, for the internal manipulation of complex representations, and was subsequently pressed into communicative service.
The key operation is Merge. Take two syntactic objects, words or already-assembled phrases, and combine them into a new syntactic object. The output of Merge is available as input to a further Merge operation. This single recursive rule, applied to a lexicon, generates the structured, hierarchical sentences of human language. Chomsky's minimalist claim is that this is all that is uniquely linguistic in the narrow sense. Everything else, the ability to produce sound, the ability to perceive speech, the conceptual and intentional systems that give words their meanings, is either shared with other cognitive systems or shared with other animals.
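The recursive character of Merge is easy to see in a few lines of code. This is a minimal sketch, assuming the common simplification that Merge(X, Y) forms the unordered set {X, Y} and that lexical items are plain strings; the function names are hypothetical, and nothing here models labels, features, movement, or linearization into spoken order.

```python
from typing import FrozenSet, Union

# A syntactic object is either a lexical item (here, just a string) or the
# output of an earlier Merge: the recursive type mirrors the recursive operation.
SyntacticObject = Union[str, FrozenSet["SyntacticObject"]]


def merge(x: SyntacticObject, y: SyntacticObject) -> FrozenSet[SyntacticObject]:
    """Combine two syntactic objects into a new one: Merge(X, Y) = {X, Y}."""
    return frozenset({x, y})


def depth(obj: SyntacticObject) -> int:
    """Levels of embedding; a bare lexical item has depth 0."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(part) for part in obj)


def lexical_items(obj: SyntacticObject) -> set:
    """Collect the words a structure was built from, ignoring hierarchy."""
    if isinstance(obj, str):
        return {obj}
    return set().union(*(lexical_items(part) for part in obj))


# Build "the dog bite the snake" bottom-up: every output is reused as an input.
subject = merge("the", "dog")
predicate = merge("bite", merge("the", "snake"))
clause = merge(subject, predicate)

# The same words, merged into different positions, give a different structure.
swapped = merge(merge("the", "snake"), merge("bite", merge("the", "dog")))

print(depth(clause))                                    # 3
print(lexical_items(clause) == lexical_items(swapped))  # True
print(clause == swapped)                                # False
```

The frozenset is a deliberate choice: set-based Merge imposes hierarchy but not linear order, so the two clauses differ in what is combined with what, not in word sequence.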
The Minimalist Program's account of language evolution is accordingly modest: Merge is a small computational innovation, possibly arising from a single genetic change, that plugged into pre-existing cognitive systems and produced an enormous increase in cognitive expressivity. The suggestion is that language, specifically its recursive structure, is not a gradual adaptation but something close to a sudden emergent property.
Michael Tomasello
b. 1950 · The Cultural Origins of Human Cognition, 1999
Tomasello is a developmental psychologist and comparative cognitivist, formerly co-director of the Max Planck Institute for Evolutionary Anthropology in Leipzig. His work on joint attention in human infants and its absence in great apes has produced the most empirically grounded developmental account of how language-specific cognitive capacities emerge in human children.
Tomasello begins not with linguistics but with developmental psychology and comparative cognition. The critical observation: between roughly nine and twelve months of age, human infants begin to engage in what psychologists call joint attention. They follow gaze: if an adult looks in a particular direction, the infant looks there too, monitoring what the adult is attending to. They follow a pointing gesture, looking at the object indicated rather than at the pointing finger itself. They initiate joint attention themselves, pointing to objects simply to share their attention to them with another person. These behaviours emerge reliably in this window across the cultures that have been studied.
Chimpanzees raised in human households, given every opportunity to develop these capacities, do not show the full pattern. They follow gaze, but they tend to treat pointing as a directive about where to go or what to get rather than as a referential act that invites shared attention; in experiments where a human helpfully points to the container hiding food, they often fail to use the gesture. Most strikingly, they do not point at objects simply to share their attention to them with humans. They point to request, pointing at food they want, but not to share experience. This distinction, Tomasello argues, marks the boundary between a signalling system and a communicative one.
The foundation of language, on Tomasello's account, is not a formal operation like Merge but a social-cognitive capacity: the ability to understand that another mind has intentions, that those intentions can be shared, and that communication is a cooperative act in which both parties construct meaning together from the interaction of speaker intention and listener inference. This is the Gricean insight, the philosopher Paul Grice's argument that meaning in communication is fundamentally about the recognition of communicative intentions, applied at the level of developmental and evolutionary psychology.
These three accounts answer different questions and contain genuine incompatibilities. Dunbar's theory is about the selective pressure that made a richer communicative system advantageous, the ecological problem of managing large social groups. Tomasello's theory is about the cognitive prerequisite that makes language possible, the capacity for shared intentionality. Chomsky's theory is about the formal minimum of the linguistic system itself, the recursive operation that generates structured hierarchy.
The genuine conflict runs between Chomsky and Tomasello. Chomsky treats syntax as an autonomous formal system, shaped by internal computational efficiency rather than communicative use. Tomasello argues that the structure of language is shaped by the cooperative communicative context in which it developed, that syntax emerged from patterns of social interaction rather than abstract computation. If Tomasello is right, then the Chomskyan picture of a language organ disconnected from its social context is a fundamental mischaracterisation of what language is. If Chomsky is right, then Tomasello's developmental account, however empirically rich, misses the formal core of what distinguishes human language from everything else.
VII
Every known human society has language. No known human group lacks it, and no group of humans has ever been found whose language is too simple or too primitive to count as language. This universality, as complete as that of upright walking, demands explanation. The most influential explanation is Chomsky's Universal Grammar hypothesis: that humans are born with a specialised linguistic endowment, a set of abstract principles and parameters to which all human languages conform.
Children acquire the grammar of their native language by the age of three or four, from the often fragmentary, disordered, and error-ridden speech they hear around them. They receive no explicit grammatical instruction. They are not corrected for grammatical errors in any systematic way; parents who hear their child say "I goed to the shop" typically understand and respond rather than correcting the morphology. And yet children converge on the same complex grammatical system as adult speakers of their language, with a speed and reliability that, nativists argue, vastly exceeds what a general learning algorithm operating on the available evidence could achieve.
This is the Poverty of the Stimulus argument, first articulated by Chomsky in the late 1950s and refined repeatedly since. Consider a specific case. The rule for question formation in English involves inversion: "The man is tall" becomes "Is the man tall?" Simple. But what about "The man who is tall is kind"? The question form is "Is the man who is tall kind?" and not "Is the man who tall is kind?", which is what a linear rule that simply fronts the first "is" in the string would produce. Children uniformly produce the correct form. They could not have learned this from simple induction over their input, the argument runs, because the input does not unambiguously distinguish the structure-dependent rule (front the verb of the main clause, not the first verb encountered) from simpler linear alternatives. The correct rule must, on this argument, be somehow specified in advance, contributed by biology rather than learned from data.
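The contrast between the two candidate rules can be made explicit in a toy Python sketch. The sentence structure is supplied by hand: the `Clause` class and both rule functions are hypothetical illustrations, not a parser and not a model of acquisition. The sketch shows only that the two rules agree on simple sentences, which dominate a child's input, and diverge once the subject contains an embedded clause.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Clause:
    """A hand-annotated main clause: subject phrase, main auxiliary, predicate.
    The subject may itself contain an embedded clause (e.g. 'who is tall')."""
    subject: list[str]
    aux: str
    predicate: list[str]

    def words(self) -> list[str]:
        return self.subject + [self.aux] + self.predicate


def linear_question(words: list[str]) -> list[str]:
    """Structure-independent rule: front the first 'is' in the word string."""
    i = words.index("is")
    return [words[i]] + words[:i] + words[i + 1:]


def structural_question(clause: Clause) -> list[str]:
    """Structure-dependent rule: front the auxiliary of the main clause."""
    return [clause.aux] + clause.subject + clause.predicate


simple = Clause(["the", "man"], "is", ["tall"])
embedded = Clause(["the", "man", "who", "is", "tall"], "is", ["kind"])

for c in (simple, embedded):
    print(" ".join(linear_question(c.words())), "|", " ".join(structural_question(c)))
# is the man tall | is the man tall
# is the man who tall is kind | is the man who is tall kind
```

The sketch sidesteps the hard part by handing the structure to the rule; the nativist claim is precisely that children arrive already treating sentences this way.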
Chomsky's proposed mechanism is the Language Acquisition Device (LAD): a specialised cognitive module that provides children with Universal Grammar, the abstract principles that all human languages share, and a set of parameters that individual languages set in particular ways. English sets the head of a phrase before its complement; Japanese sets it after. The child's LAD is pre-equipped with the knowledge that heads and complements exist and that their order is a parametric option; the child merely needs to determine which option the ambient language has selected. Language acquisition becomes, on this account, a matter of parameter setting rather than learning a system from scratch.
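What "parameter setting" amounts to can be sketched in a few lines of Python: the same hierarchical structure is spelled out in either order depending on a single Boolean. The names `Phrase`, `linearize`, and `set_head_parameter` are hypothetical. The head-final output uses English glosses that only mimic Japanese word order, and real parameter setting is far less tidy than reading the value off one example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Phrase:
    """A head with an optional complement, which may itself be a phrase."""
    head: str
    complement: Phrase | str | None = None


def linearize(node: Phrase | str | None, head_initial: bool) -> list[str]:
    """Spell out a structure as a word string under one setting of the head parameter."""
    if node is None:
        return []
    if isinstance(node, str):
        return [node]
    comp = linearize(node.complement, head_initial)
    return [node.head] + comp if head_initial else comp + [node.head]


def set_head_parameter(observed: list[str], head: str, complement: str) -> bool:
    """Toy parameter setting: one unambiguous head-complement pair fixes the value."""
    return observed.index(head) < observed.index(complement)


# "read about whales": a verb taking a prepositional phrase as its complement.
vp = Phrase("read", Phrase("about", "whales"))
print(linearize(vp, head_initial=True))   # ['read', 'about', 'whales']
print(linearize(vp, head_initial=False))  # ['whales', 'about', 'read']

# A learner who hears "read books" concludes the language is head-initial.
print(set_head_parameter(["read", "books"], head="read", complement="books"))  # True
```

Note that the parameter is consulted at every level of the structure, which is why head-initial and head-final languages tend to mirror each other consistently rather than phrase by phrase.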
Steven Pinker's The Language Instinct (1994) made the nativist case accessible to a wide readership, arguing that language is a biological adaptation in the same sense that echolocation is an adaptation in bats: not a cultural tool that humans invented and that other animals merely lack, but a species-specific capacity shaped by natural selection, encoded in the genome, and expressed in the normal course of development.
In 2009, Nicholas Evans and Stephen Levinson published a paper in Behavioral and Brain Sciences titled "The myth of language universals: Language diversity and its importance for cognitive science." Their argument was empirical: vanishingly few, if any, proposed structural properties hold of every human language without exception. Among the features often assumed to be universal, they argued that each of the following has counterexamples: a noun-verb distinction, consonant-vowel syllable structure, particular constituent orders, case marking, basic colour terms, and even, most controversially, recursion.
The recursion claim enters through the Pirahã controversy. Daniel Everett spent decades as a missionary and then linguist among the Pirahã people of Amazonia, a group of approximately 300 speakers of a language with highly unusual properties. In a 2005 paper in Current Anthropology, Everett argued that Pirahã lacks grammatical embedding, the ability to put a clause inside another clause. If true, this would directly contradict Chomsky's claim that recursion is a universal feature of human language, the core property of the Faculty of Language in the narrow sense.
Chomsky's response was that recursion is present in human cognition regardless of whether it appears in surface grammatical structures; the language may lack embedding but the cognitive capacity remains. Everett replied that this renders the UG hypothesis unfalsifiable. The debate, still unresolved, has forced the field to examine its foundational assumptions with unusual care.
Tomasello's alternative to Universal Grammar explains the universality of language without invoking innate grammatical principles. Human languages share properties not because those properties are pre-specified in the genome, but because all human languages are used by human beings with the same cognitive architecture, for the same communicative purposes, in broadly similar social contexts. The structural similarities are convergent products of common human cognition and communicative need, not a common grammatical blueprint.
The evidence Tomasello marshals from child language acquisition: children's early language is systematically item-specific. They do not begin with abstract grammatical categories and fill them with words; they begin with individual words and constructions, gradually abstracting away from the specifics. They over-generalise in ways that UG would not predict (producing "She giggled me" on the model of "She tickled me"). Cross-linguistically, the course of acquisition tracks the structural features of the ambient language rather than converging on a language-independent trajectory.
Where does this leave Universal Grammar? The nativist hypothesis has been weakened empirically by the diversity evidence, the Pirahã case, and the developmental data. But it has not been refuted. The Poverty of the Stimulus argument remains logically compelling: children do acquire grammatical knowledge that their input underdetermines. The question is whether this gap is best explained by innate linguistic structure or by domain-general learning mechanisms applied to rich, structured input. This remains one of the central unresolved questions in cognitive science.
VIII
Beginning in 1977, and expanding after the Sandinista government came to power in 1979, Nicaragua established schools for deaf children in the capital, Managua. Before this, deaf Nicaraguans had been largely isolated from each other, raised in hearing families that communicated with them through improvised, family-specific gestural systems. When these children were brought together, something happened that had almost certainly happened many times in human history but had never been observed by linguists in real time: a new language was born.
The first cohort of students, older children who had already spent years developing their individual home-sign systems, communicated with each other. Their ad hoc gestural systems began to merge and stabilise. Patterns emerged. Conventions formed. By the early 1980s, a contact language was in use among the students: loose, relatively ungrammaticalised, and resembling a pidgin.
Then younger children arrived, children for whom this contact language was one of their first exposures to any conventional communication system. Ann Senghas of Columbia University has studied the Idioma de Señas de Nicaragua (ISN) longitudinally over more than two decades. Her key finding, reported with Marie Coppola in Psychological Science in 2001 and extended in a 2004 Science paper with colleagues, was that the second cohort of learners did not simply adopt the system of the first cohort. They systematised and elaborated it. Specifically, they developed consistent use of spatial grammar, the use of locations in signing space to mark grammatical relationships, a feature present in most natural sign languages but absent or inconsistent in the first cohort's signing.
This is striking in the extreme. The second generation received as input the relatively ungrammaticalised signing of the first generation. Their output was more grammatically complex. They did not simply learn what they were given; something in their language-learning apparatus imposed additional structure on the input. The language became more organised in the generation of learners who were acquiring it as a first language, rather than as a secondary contact system.
The Nicaraguan case is the closest thing available to a controlled natural experiment in language genesis. It demonstrates that human children, given communicative contact with other humans, can spontaneously generate a fully structured language. They do not need a fully formed model; they build one from impoverished input. They do not need instruction; they impose grammatical organisation on communicative input. And the language they create in a generation or two bears all the hallmarks of a natural human language: systematic phonology (in this case, cheirological organisation, the structure of hand shapes and movements), morphology, syntax, and displacement.
Derek Bickerton's related observation concerns creole languages, the languages that emerge when populations of speakers of mutually unintelligible languages are thrown into contact (historically, most often under conditions of forced labour). A pidgin develops first: a minimal contact language used for practical communication, with reduced grammar, no native speakers, and heavy reliance on lexical meaning rather than grammatical structure. When children are born to pidgin-speaking parents and acquire the pidgin as a first language, something remarkable happens: they systematically expand it into a full, grammatically complex language, a creole.
Bickerton's observation, first developed in Roots of Language (1981) and carried into later work including Language and Species (1990), is that creoles from different parts of the world, Caribbean English creoles, Pacific creoles, creoles arising from contact between completely different language families, share grammatical structures that their source languages do not share. He proposed a Language Bioprogram Hypothesis: that there is a default grammar built into the human language faculty, which the child's brain falls back on when a full linguistic model is unavailable. The creole is, on this view, not the product of mixing the source languages; it is the product of the bioprogram asserting itself.
Bickerton's hypothesis has been challenged; later researchers have identified both more variation among creoles and more influence of substrate languages than the bioprogram model allows. But his core observation stands: children consistently do more with their linguistic input than the input warrants. They are not passive recorders; they are active constructors. What they construct, even from impoverished models, systematically resembles natural human language.
IX
Ferdinand de Saussure's Cours de linguistique générale was not written by Saussure. It was assembled from the lecture notes of his students and published posthumously in 1916, three years after his death. The irony of the foundational text of modern linguistics being a document reconstructed from other people's records of speech, parole in the service of describing langue, would not have escaped him.
The most famous of Saussure's insights is the arbitrariness of the linguistic sign. This is not immediately obvious, and it is worth being precise about what it claims and what it does not. The linguistic sign, for Saussure, is the pairing of a signifier (the sound pattern, the acoustic image) with a signified (the concept, the mental content). The claim is that the connection between signifier and signified is arbitrary: there is no natural, intrinsic, or necessary relationship between the sound "tree" and the concept of a tree. The evidence is cross-linguistic comparison. If the connection were natural, all languages would use the same or similar sounds for the same concepts. They do not.
The implications are significant. Arbitrariness means that the mapping between sounds and meanings is a social contract, a convention maintained by the community of speakers. An individual cannot unilaterally change the meaning of a word; the community's usage determines meaning. But it also means that the specific inventory of sounds and meanings is, in principle, contingent: any sounds could mean anything. The fact that English speakers use "tree" and Mandarin speakers use "树 (shù)" for the same concept reflects historical accident and social convention, not natural necessity.
The distinction between langue and parole, the system and the individual performance, solved a methodological problem that had been invisible to the historical linguists who preceded Saussure. Language, as actually encountered, is messy: people mispronounce, hesitate, make slips of the tongue, use idioms incorrectly, invent new words, and fail to apply rules they manifestly know. If linguistics tries to describe this actuality, it loses the object of study in the noise.
Saussure's move: the object of linguistics is not parole but langue, the abstract system of differences and relations that makes communication possible. Langue exists between minds, in the space of social convention. No individual speaker completely embodies it; it is a collective fact about a speech community. This is why it is possible to write the grammar of a language, a description of the system, despite no individual speaker knowing or applying that system perfectly.
The consequences of this distinction extended far beyond linguistics. Structuralism, the intellectual movement that dominated French and European thought from the 1950s through the 1970s, applied the langue/parole distinction to anthropology (Claude Lévi-Strauss: myth as a system of differences), to literary theory (Roland Barthes: narrative analysed as a system with its own underlying grammar), and to psychoanalysis (Jacques Lacan: the unconscious structured like a language). Whether these extensions were philosophically valid is a separate question; that Saussure's foundational move made them possible is not in doubt.
Saussure's other major contribution was the argument that linguistic meaning is relational rather than intrinsic. The meaning of a word is not a property it has independently; it is determined by its relationship to other words in the system. "Sheep" means what it means partly because English has "mutton." French does not have this distinction; French uses mouton for both the animal and the meat. The system carves up conceptual space through differences, not through positive content.
This relational theory of meaning anticipates contemporary work in distributional semantics, the approach that defines word meaning by the contexts in which it appears, and which underlies the word-embedding models used in modern natural language processing. The specific technical implementations are different, but the underlying intuition, that meaning is relational and that words mean what they mean in relation to other words, has proven remarkably durable.
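The distributional intuition can be illustrated with a deliberately tiny, count-based sketch in Python: each word is represented by the words that occur near it, and similarity is the cosine between those count vectors. The four-sentence corpus, the window size, and the function names are invented for illustration; modern embedding models learn dense vectors from vast corpora rather than using raw counts like these.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Represent each word by counts of the words appearing within `window` positions of it."""
    vectors: dict[str, Counter] = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    context[tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


corpus = [
    "the farmer feeds the sheep",
    "the farmer feeds the goat",
    "we cooked the mutton for dinner",
    "we cooked the pork for dinner",
]
vectors = cooccurrence_vectors(corpus)
print(round(cosine(vectors["sheep"], vectors["goat"]), 2))    # 1.0
print(round(cosine(vectors["sheep"], vectors["mutton"]), 2))  # 0.35
```

No word here carries any meaning of its own: "sheep" and "goat" come out as similar purely because they occupy the same contexts, and "mutton" sits apart from "sheep" because English distributes the two words across different contexts.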
X
The capacity for language is biological, but it requires environmental input to be expressed, and it requires that input during a specific developmental window. This is the Critical Period Hypothesis, first formally articulated by the neurologist Eric Lenneberg in his 1967 book Biological Foundations of Language. The hypothesis proposes that there is a period, roughly from birth to puberty, during which language acquisition proceeds naturally, automatically, and completely. After this period, full language acquisition becomes extremely difficult or impossible, even with intensive instruction and exposure.
The evidence converges from several independent sources. Second language acquisition: adults who begin learning a second language after puberty almost never achieve native-like competence across all dimensions of the language. Their grammar may become near-native; their phonology, the ability to perceive and produce the phonemic distinctions of the second language, typically does not. Adults learning a second language usually retain an accent, in part because the perceptual categories of the first language are already fixed, and the fine motor programmes for second-language articulation are acquired late and imperfectly. This is not a question of effort or motivation; even the most dedicated adult learners show systematic differences from native speakers in phonological processing.
The lateralisation evidence: in young children, the language effects of damage to either hemisphere can often be compensated for; the right hemisphere can take on language function if the left is damaged early enough. After puberty, this plasticity is substantially reduced. Lesions to the left-hemisphere language areas in adulthood produce persistent aphasias; equivalent lesions in early childhood often leave language largely intact after a recovery period. The window for neural reorganisation closes at approximately the same time that the window for effortless language acquisition closes.
The deprivation evidence is the most troubling and the most direct, though also the most difficult to interpret. Cases of children raised in conditions of severe language deprivation, lacking meaningful exposure to language in early childhood, consistently show the same pattern: partial acquisition is possible, but complete acquisition is not. The case known as Genie, discovered in Los Angeles in 1970 at the age of thirteen having been confined and abused in conditions of near-total linguistic isolation, is the most studied. With intensive intervention, Genie acquired a vocabulary and some grammatical structures. But the full syntactic competence of a native speaker, the automatic, effortless grammatical processing that typically developing children achieve by age four, was never reached. Her language, carefully documented by Susan Curtiss and others, showed the asymmetric profile predicted by the critical period hypothesis: vocabulary and lexical semantics relatively accessible, syntax and morphology persistently impoverished.
Cases like Genie's cannot be clean experiments; the severity of the abuse she suffered affected many cognitive and emotional systems. But the pattern they reveal is consistent with, and convergently supported by, every other line of evidence: the brain is maximally receptive to language during childhood, and this receptivity is not simply a matter of time and opportunity. It is a biological given.
The question of what happens to children raised without language is not new. "Victor," the "wild boy of Aveyron," discovered in the forests of southern France in 1800 and brought to Paris for study and education, became a celebrated case in Enlightenment debates about human nature and the role of language in making us human. Despite the patient efforts of Jean Marc Gaspard Itard, the physician who undertook his education, Victor never acquired spoken language. He learned to read a small number of words and to communicate through gesture and written word, but language as an expressive and generative system remained beyond him.
Victor's case was not interpreted through the lens of critical period theory in 1800; that framework did not yet exist. But it raised, with urgency, the question that the critical period hypothesis would later address: language is not simply a matter of intelligence or exposure. There is something biological at stake, something that time and developmental trajectory either equip or foreclose.
XI
Charles Darwin devoted a substantial and underappreciated section of The Descent of Man, and Selection in Relation to Sex (1871) to language. Twelve years after the Origin of Species, Darwin was willing to make the claim he had carefully avoided in 1859: that human beings, including their most distinctively human capacities, were products of evolution. Language was central to his argument.
Darwin's analysis of language in The Descent of Man is worth reading in the original because it anticipates, with remarkable precision and from first principles, conclusions that would require another century of research to establish properly. Darwin observes that children acquire language spontaneously and at a consistent developmental schedule, but that the specific language they acquire depends entirely on which language they are exposed to. He draws the analogy with birds: young birds of song-learning species are born with a capacity to learn the species' song, and will learn it from adults if exposed to them during a sensitive period, but will develop only an impoverished version if raised in isolation. Language is, for Darwin, an instinct for learning a language, not an instinct for any particular language. This is, in essence, the distinction between the Language Acquisition Device and any particular natural language, a distinction Chomsky would formalise ninety years later.
Darwin also observed that the diversity of languages among different human groups parallels biological diversity: different groups speak different languages much as they differ in physical traits. Languages share common ancestors; they diverge as populations separate; they go extinct as their speakers are absorbed into larger groups. Darwin was making explicit and deliberate analogies between linguistic and biological evolution. He compared the appearance of new words through "variation" with the variation on which natural selection acts, the "struggle for existence" between competing linguistic forms in contact situations with natural selection itself, and the "extinction" of languages with biological extinction.
The analogy is imperfect; linguistic evolution differs from biological evolution in important respects, particularly in the possibility of horizontal transfer (languages borrow from each other in ways that genes between separate species typically do not). But Darwin's insight that language is a biological phenomenon subject to evolutionary analysis, not a purely cultural invention, was foundational. August Schleicher had been developing a biological model of language evolution contemporaneously; Darwin's endorsement placed evolutionary linguistics on more secure theoretical ground.
Darwin lacked the tools (genetics, comparative neuroanatomy, cognitive psychology) that would make his intuitions testable and precise. But his basic characterisation has proven durable: language is a biological capacity, as much a part of the human species' biological profile as the upright gait or the precision grip. That it is expressed through cultural transmission, through specific languages that must be learned, does not make it any less biological. It means that the biology in question is of an unusual kind: a capacity whose realisation depends on, and is shaped by, the social environment in which it develops.
Charles Darwin
1809 – 1882 · The Descent of Man, 1871
"Man has an instinctive tendency to speak, as we see in the babble of our young children; whilst no child has an instinctive tendency to brew, bake, or write. Even the deaf and dumb invent instinctively certain signs by which they can hold converse with each other." Darwin's observation prefigures the distinction between the biological capacity for language and the culturally specific forms it takes, and his inclusion of spontaneous sign language invention among congenitally deaf individuals anticipates the Nicaraguan case by over a century.
XII · Synthesis
It is worth taking stock of what the evidence, considered together, actually establishes: not what is speculated or dramatically proposed, but what is genuinely supported and what remains open.
Language is a biological phenomenon, not merely a cultural tool. Every known human society has it. No known human group lacks it. Children deprived of linguistic input in early development do not invent language independently from scratch (Genie, Victor), but groups of children brought into communicative contact do spontaneously generate it (Nicaragua). Individual languages are learned; the capacity for language is biological. This much is not seriously contested.
Language is built on a neural architecture that is partly shared with other primates and partly uniquely human. Broca's area, Wernicke's area, and the arcuate fasciculus have homologues in chimpanzee brains; the human versions are enlarged, more lateralised, and more elaborately connected. The descended larynx is a distinctive human anatomical specialisation with a real fitness cost, most plausibly maintained by selection because of the acoustic flexibility it enables. The FOXP2 gene has a human-specific variant associated with the fine neuromotor control of articulation and with grammatical processing; the Neanderthal version appears to have carried the same two human-specific amino-acid changes.
The formal properties of language, particularly its recursivity and productivity, distinguish it from every other known communication system by more than degree. The gap between what Kanzi can do and what a human four-year-old can do is structural, not quantitative. The poverty-of-the-stimulus argument remains unrefuted: children acquire grammatical knowledge their input does not fully support. Whether this gap is best explained by an innate universal grammar (Chomsky, Pinker) or by domain-general learning capacities applied to rich structured input (Tomasello, Evans, Levinson) is genuinely unresolved.
On the when question, the honest answer is uncertainty. The anatomical prerequisites for language were likely in place by at least 200,000 years ago. The unambiguous archaeological signature of fully modern symbolic cognition appears between roughly 80,000 and 35,000 years ago, from the engraved ochre of Blombos Cave to the Hohle Fels flute and the paintings of Chauvet. Whether behavioural modernity arrived suddenly or gradually, and what role Neanderthals had in linguistic culture, remain open.
The why question has no single answer, and the competing answers illuminate different aspects of the phenomenon. Language is simultaneously a technology for managing large social groups (Dunbar), a system for internal computation and thought (Chomsky), and a cooperative communicative act grounded in shared intentionality (Tomasello). These accounts are not mutually exclusive in every respect; the selective pressure that Dunbar identifies, the cognitive prerequisite that Tomasello identifies, and the formal architecture that Chomsky identifies can all be simultaneously true. But they embody fundamentally different commitments about what language essentially is.
The most important thing that the universality of language tells us about human nature is not that all humans share a common cognitive endowment, though they do, but that the medium of human thought is inherently social. Language did not evolve in isolated brains. It evolved between them.
The strangeness noted at the beginning of this artifact deserves one more pass. Two humans sit across from each other. One produces shaped air. The other reconstructs a thought. This happens billions of times a day across thousands of languages in every corner of the planet. What makes it possible is a neural architecture several hundred thousand years in the making, built on top of primate social cognition at least six million years old, and assembled by the same blind process of variation and selection that built the peacock's tail and the immune system.
The difference is that the peacock cannot think about its tail. Language is the capacity through which we examine all other capacities, including, with the particular difficulty and particular reward of turning a system on itself, language itself. Every subsequent artifact in this curriculum examines something that language makes possible. The study of those things begins, inescapably, here.
Charles Hockett (1916–2000): "The Origin of Speech," Scientific American, 1960. Thirteen design features of human language. · Ferdinand de Saussure (1857–1913): Cours de linguistique générale, 1916 (posth.). Arbitrariness of the sign; langue/parole. · Paul Broca (1824–1880): Identification of Broca's area, 1861. · Carl Wernicke (1848–1905): Identification of Wernicke's area, 1874. · Philip Lieberman (1934–2022): Vocal tract evolution and the descended larynx. · Angela Friederici (b. 1948): Neuroscience of language processing; dorsal/ventral pathway distinction. · Cecilia Lai et al.: FOXP2 mutation discovery, Nature, 2001. · Johannes Krause et al.: Neanderthal FOXP2 sequencing, Current Biology, 2007. · Allen and Beatrix Gardner: Washoe ASL studies, 1966 onward. · Sue Savage-Rumbaugh (b. 1946): Kanzi lexigram studies. · Herbert Terrace (b. 1936): Nim Chimpsky study; critique of ape language research, 1979. · Robin Dunbar (b. 1947): Social Brain Hypothesis; Grooming, Gossip, 1996. · Noam Chomsky (b. 1928): Universal Grammar; Language Acquisition Device; Minimalist Program, 1995; Merge. · Marc Hauser, N. Chomsky, W.T. Fitch: FLN/FLB distinction, Science, 2002. · Michael Tomasello (b. 1950): Shared intentionality; Cultural Origins, 1999. · Steven Pinker (b. 1954): The Language Instinct, 1994. · Eric Lenneberg (1921–1975): Critical Period Hypothesis, Biological Foundations of Language, 1967. · Ann Senghas: Nicaraguan Sign Language acquisition; grammatical elaboration across cohorts. · Derek Bickerton (1926–2018): Language Bioprogram Hypothesis; creole genesis. · Daniel Everett (b. 1951): Pirahã and the recursion controversy. · Nicholas Evans & Stephen Levinson: "The myth of language universals," Behavioral and Brain Sciences, 2009. · Charles Darwin (1809–1882): The Descent of Man, 1871. Language as a biological instinct.
The first movement establishes why language exists at all. The next one turns from origin to internal design: the architecture of sounds, grammar, and linguistic form.
Next Artifact · II. The Structure of Language