ARCHIVEUM · The Architecture of Language · Artifact IV of VIII
Does the language you speak shape what you can think? The strong and weak versions of the hypothesis, the experiments that test them, and what is actually settled after a century of argument.
Prelude
The Hopi people of the American Southwest speak a language that, according to the linguist Benjamin Lee Whorf, contains no words or grammatical forms referring to time as a flowing continuum: no equivalent of past, present, and future as distinct ontological zones. If Whorf was right, Hopi speakers would experience and conceptualize time in a fundamentally different way from speakers of European languages. They would, in effect, inhabit a different cognitive world, shaped and bounded by the structure of their language.
This is the most dramatic version of what became known as the Sapir-Whorf hypothesis: the claim that the language one speaks determines, or at least substantially shapes, the way one thinks. It is a claim with enormous implications. If true in a strong form, it would mean that people who speak different languages are cognitively different in systematic ways, that translation between languages is in principle incomplete, and that the structure of one's native language sets the limits of one's possible thought.
Few ideas in the social sciences have attracted more controversy, more confident assertion, and more complete reversal over the past century. Whorf's specific claims about Hopi were largely abandoned after careful empirical investigation. The strong form of the hypothesis was dismissed by most of the linguistics establishment by the 1960s, largely under the influence of Chomsky's universalism. And then, beginning in the 1990s, a new wave of careful experimental research began to produce evidence that, while falling far short of the dramatic claims of strong relativity, documented genuine effects of language on non-linguistic cognition. The question is not settled, and understanding why requires working through the evidence carefully.
The question is not whether language affects thought at all: some effects are well established. The question is how deep those effects go, how much of human cognition they touch, and whether they constitute genuine constraints on what can be thought or merely influences on what is habitually thought.
I
The relationship between language and thought can be formulated in a range of ways, from the obviously true and uninteresting to the fascinating and false. Getting the question right is the prerequisite for evaluating the evidence. Three distinct questions are often conflated under the "Sapir-Whorf" label, and they have very different answers.
The first question is whether language is necessary for thought at all. Can humans think without language? The answer from developmental psychology, animal cognition, and studies of congenitally deaf individuals who have not acquired a conventional sign language is clearly yes: significant problem-solving, spatial reasoning, social intelligence, and even arithmetical ability can operate independently of language. Thought is not identical to linguistic thought. This is essentially uncontroversial and does not bear directly on the relativity question.
The second question is whether the specific language one speaks influences the way one thinks: whether speakers of different languages, when engaged in comparable tasks, show systematic differences in non-linguistic cognition that can be attributed to linguistic differences. This is the empirical core of the Sapir-Whorf hypothesis, and it is the question on which the interesting evidence bears. The answer, as the evidence shows, is: yes, to a significant but limited degree.
The third question is whether the specific language one speaks determines the limits of what one can think: whether there are thoughts that are accessible to speakers of one language and inaccessible to speakers of another. This is the strongest form of the hypothesis, and the evidence is strongly against it. The cognitive effects of language documented by careful research are real but modest, online and reversible rather than deep and permanent, and appear to influence the habitual rather than the possible.
The linguistic anthropologist John Lucy, whose careful methodological work in the 1990s did much to rehabilitate the study of linguistic relativity, drew the crucial distinction between linguistic determinism (language determines thought: a proposition almost certainly false) and linguistic relativity (language influences thought: a proposition for which there is now substantial evidence). The confusion of these two distinct claims is responsible for much of the controversy that has surrounded the field. Demonstrating that language influences thought does not entail that it determines thought; and the failure of the determinism hypothesis does not entail that language has no cognitive consequences.
The further distinction between habitual and obligatory effects is equally important. When a language obligatorily marks a distinction (speakers are grammatically required to encode it every time they use a relevant sentence), speakers of that language may develop habitual attention to the relevant dimension that speakers of languages without obligatory marking do not share. The effects of language on thought may operate primarily through this channel: not by preventing certain thoughts, but by directing habitual attention toward certain dimensions of experience over others.
II
Edward Sapir
1884 – 1939 · Language: An Introduction to the Study of Speech, 1921
Sapir was an American anthropologist and linguist, a student of Franz Boas, and one of the founders of American structural linguistics. His work on Native American languages, particularly the languages of the Pacific Northwest and Southwest, gave him an unusual appreciation for structural diversity. His position on the relationship between language and thought was more nuanced than is often attributed to him: he believed that the categories of grammar guide and channel thought without determining it, and that this guidance is largely unconscious. His work was careful and qualified in a way that Whorf's frequently was not.
Benjamin Lee Whorf
1897 – 1941 · Collected papers published posthumously as Language, Thought, and Reality, 1956
Whorf was an insurance inspector for the Hartford Fire Insurance Company who pursued linguistics as an avocation, studying under Sapir at Yale. His day job was directly relevant: he documented how misunderstandings about language led to industrial accidents. A worker treated an "empty" gasoline drum as safe, not realizing it was full of explosive vapour; the word "empty" implied inertness that the physical reality contradicted. Whorf used such observations to argue that language shapes the way we perceive and categorize the physical world. His specific claims about Hopi, made without fluency in the language and later challenged on empirical grounds, were his most famous and his most contested.
Whorf argued from the structure of the Hopi language to conclusions about Hopi cognition. His central claim was that Hopi has no tense system, no distinction between past, present, and future of the kind encoded in European verb morphology. He further argued that Hopi does not conceptualize time as a spatial dimension (as Europeans do when they say time is "long" or "short," that a meeting is "ahead of" another, or that they are "running out" of time). The Hopi, on Whorf's account, conceptualize time as something more like an intensification or accumulation of qualities.
We are thus introduced to a new principle of relativity, which holds that all observers are not led by the same physical evidence to the same picture of the universe, unless their linguistic backgrounds are similar, or can in some way be calibrated. Benjamin Lee Whorf, "Science and Linguistics," 1940
Whorf's claims about Hopi were investigated empirically by Ekkehart Malotki, a German-born linguist, who published a detailed study in 1983 titled Hopi Time: A Linguistic Analysis of the Temporal Concepts in the Hopi Language. Malotki's analysis, based on extensive fieldwork and grammatical study, concluded that Hopi does have a rich system of temporal reference, including spatial metaphors for time, words for days of the week, and morphological marking of temporal relations. The Hopi did not live outside of time in the way Whorf had suggested. Whorf, working without fluency in the language and from limited data, had made errors that fieldwork-based analysis corrected.
Early doubts about Whorf's specific Hopi claims, which Malotki's fieldwork would later confirm, combined with the rise of Chomskyan universalism in the 1960s (which emphasized the common deep structure underlying all human languages rather than their surface differences), led to a near-total rejection of linguistic relativity in mainstream linguistics and cognitive science for approximately thirty years. The received view by 1970 was that the Sapir-Whorf hypothesis was an interesting but empirically unfounded speculation, likely resting on a confusion between linguistic and conceptual categories.
The rehabilitation began in the 1990s, not through a revival of Whorf's specific claims but through a new approach: instead of arguing from linguistic structures to cognitive consequences (Whorf's method), researchers designed controlled experiments that tested whether speakers of languages that differ in specific, well-characterized ways show corresponding differences in non-linguistic tasks. This experimental turn transformed the field.
III
Linguistic Determinism
The Strong Version
The language you speak determines the structure of your thought. Concepts that your language does not encode are, in principle, inaccessible to you. Speakers of different languages live in genuinely different cognitive worlds. Translation is fundamentally incomplete because some meanings cannot be transferred between languages.
Linguistic Relativity
The Weak Version
The language you speak influences the way you habitually think. Speakers of languages that obligatorily mark certain distinctions develop habitual attention to those dimensions. These effects are real and measurable but do not prevent any thought; they shape what is easy, automatic, and default rather than what is possible.
The strong version is almost certainly false. The evidence against it comes from multiple directions. Speakers of languages without grammatical tense (Mandarin, Burmese, Yoruba) show no deficits in temporal reasoning. Speakers of languages without a basic term for blue can learn to distinguish blue from green when given appropriate instruction. Congenitally deaf individuals who have not acquired formal sign language develop rich cognitive capacities without any conventional language at all. People can hold concepts for which their language has no word: new concepts introduced into a language are typically understood before being named.
The weak version has substantial empirical support, and the interesting scientific question is not whether it is true but which specific domains show effects, how large those effects are, whether they operate during online cognition or only at specific processing stages, and what mechanisms produce them. These questions have been the subject of intensive experimental investigation since the mid-1990s, and the results are specific enough that the field has moved well beyond the broad slogan "language affects thought" toward precise characterizations of the conditions under which effects occur and the cognitive stages at which they operate.
The most productive contemporary framing distinguishes between domains where language has obligatory marking and domains where it does not. When a language requires speakers to encode a distinction in every relevant utterance (because the grammar demands it), speakers develop habitual perceptual and classificatory patterns around that distinction. Languages do not differ in what they permit their speakers to say; virtually any language can express virtually any distinction. They differ in what they require their speakers to attend to.
English requires its speakers to mark grammatical tense: every main verb carries a tense marker. A speaker of English cannot describe an event without specifying when it occurred relative to the moment of speech. Russian requires its speakers to mark grammatical aspect (whether an action is completed or ongoing) and grammatical gender (every noun belongs to a masculine, feminine, or neuter class, and adjectives must agree). Turkish requires evidential marking (the distinction between what was witnessed and what was reported). Mandarin requires none of these. What are the cognitive consequences of being required to attend to certain distinctions in every relevant utterance for one's entire linguistic life?
IV
Colour is the domain in which the empirical study of linguistic relativity has been pursued most thoroughly, and the results are among the most nuanced and methodologically sophisticated in the field. The domain is ideal for research: colour is continuous in the physical world (the visible spectrum is a smooth gradient), the relevant stimuli can be controlled precisely in laboratory settings, and the cross-linguistic variation in colour terminology is well documented and substantial.
Russian makes an obligatory distinction between two shades of blue that English treats as variants of a single colour. Siniy covers dark blue; goluboy covers light blue. This is not merely a matter of having two extra words available: Russian has no single basic term that covers both, so speakers must choose one or the other whenever they refer to something blue, much as English speakers must choose "he" or "she" whenever they refer to a person of known gender.
Jonathan Winawer and colleagues (2007) tested whether this obligatory linguistic distinction produces a perceptual advantage. Russian and English speakers were shown three colour swatches and asked to identify which of two bottom swatches matched a top swatch. When the two bottom swatches straddled the siniy-goluboy boundary (one light blue, one dark blue), Russian speakers were significantly faster than when both came from the same Russian category (both dark blue or both light blue). English speakers showed no such category advantage. Related studies of colour category boundaries (Gilbert and colleagues, 2006) found that such effects appear in the right visual field (processed by the language-dominant left hemisphere) but are reduced or absent in the left visual field (processed by the right hemisphere, which lacks direct access to the language system).
This lateralization finding is theoretically important: it suggests that the category advantage is mediated by language rather than representing a fundamental perceptual difference, since the effect is present when visual processing routes through the language system and attenuated when it does not.
If colour terms drive colour perception, then speakers of languages with very few colour terms should show substantially worse colour discrimination than speakers of languages with rich colour vocabularies. The Pirahã of the Amazon, whose language has been described as having only terms approximating "light" and "dark," are a frequently cited test case. If linguistic relativity were strongly deterministic, Pirahã speakers should show severe deficits in colour discrimination tasks. By the available reports they do not: their discrimination of colours is comparable to that of other groups. What they show is difficulty in remembering specific colour distinctions across a delay, when no verbal label is available to anchor the memory.
This pattern, consistent with other research on verbally-mediated memory, suggests that language effects on colour cognition operate primarily through verbal encoding in memory rather than through perceptual experience itself. The effect is real but more modest than strong relativity predicts: language helps with remembering colours (by providing stable categories for storage) rather than with seeing them.
The most important theoretical finding from colour research concerns the concept of categorical perception: the well-established finding that people are faster and more accurate at discriminating stimuli that cross a category boundary than stimuli within the same category, even when the physical difference between the stimuli is held constant. For colours, categorical perception means that the boundary between "blue" and "green" produces faster discrimination than equivalent differences within the blue region or within the green region.
The question is whether categorical perception of colour is pre-linguistic (a feature of early visual processing, present before language intervenes) or linguistically mediated (dependent on the colour terms of one's language). Research by Debi Roberson and colleagues, comparing English speakers with the Berinmo of Papua New Guinea and the Himba of Namibia (who have quite different colour term boundaries from English), found that categorical perception effects track language boundaries rather than universal perceptual boundaries. Discrimination is fastest across the category boundaries of one's own language, not across universal colour boundaries.
This is a significant finding. It means that the perceptual "magnification" at category boundaries is not a fixed feature of the human visual system but something that language shapes. The boundaries that appear perceptually sharp to English speakers are different from those that appear sharp to Berinmo or Himba speakers, and these differences track the colour term systems of the respective languages. The perceptual world is, to a significant degree, organized by the categories that one's language has made habitual.
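The logic of these cross-linguistic comparisons can be made concrete with a small sketch. The Python below is illustrative only: the 0 to 100 stimulus scale, the language labels, and the boundary positions are hypothetical and are not taken from any of the studies discussed. It shows how two stimulus pairs with the same physical separation can be cross-category for one language and within-category for another, which is the contrast these experiments exploit.

```python
from bisect import bisect_right

# Hypothetical category boundaries on an arbitrary 0-100 stimulus scale.
# These numbers are invented for illustration, not measured values.
BOUNDARIES = {
    "language_A": [50],       # a single boundary at 50
    "language_B": [30, 70],   # two boundaries, neither at 50
}

def category(language, stimulus):
    """Index of the category that a stimulus falls into for a given language."""
    return bisect_right(BOUNDARIES[language], stimulus)

def pair_type(language, a, b):
    """Classify a stimulus pair as crossing a category boundary or not."""
    if category(language, a) != category(language, b):
        return "cross-category"
    return "within-category"

# Two pairs with the same physical separation (10 units).
for pair in [(45, 55), (25, 35)]:
    for language in BOUNDARIES:
        print(pair, language, pair_type(language, *pair))
```

On the relativity account, discrimination should be faster for whichever pairs are cross-category in the participant's own language, even though the physical distance within each pair is identical.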
V
The domain of spatial reference has produced some of the most striking evidence for linguistic effects on non-linguistic cognition. The research originates largely from the work of Stephen Levinson and his colleagues at the Max Planck Institute for Psycholinguistics in Nijmegen, who documented a profound difference in spatial reference systems across languages and then tested whether this linguistic difference corresponds to differences in non-linguistic spatial cognition.
Stephen Levinson
b. 1947 · Space in Language and Cognition, 2003
Levinson is a British anthropologist and linguist who has spent much of his career at the Max Planck Institute for Psycholinguistics. His large-scale cross-linguistic project on spatial reference documented that languages differ profoundly in their preferred reference frames for spatial description, and his subsequent experimental work provided substantial evidence that these linguistic differences correspond to differences in non-linguistic cognition. He is also the "Levinson" in "Evans and Levinson" whose 2009 paper challenging Universal Grammar was discussed in Artifact I.
English speakers describe spatial relationships primarily using an egocentric (or relative) reference frame: "The ball is to the left of the box" means "to the left from my current perspective." If the speaker walks to the other side of the table, the description changes: the ball is now "to the right." Spatial descriptions in English are speaker-centered.
Guugu Yimithirr, spoken in northeastern Australia, and many other languages from Australia, Mesoamerica, and South Asia do not use an egocentric reference frame for spatial description. They use an allocentric (or absolute) frame based on the cardinal directions: north, south, east, and west (or their local equivalents). In Guugu Yimithirr, the ball would be described as being "to the north of the box," and this description remains constant regardless of where the speaker is standing or facing. To give such a description, speakers must know which direction is north at all times.
This is not merely a linguistic convention. Levinson and colleagues demonstrated that Guugu Yimithirr speakers and other users of absolute spatial frames maintain a continuous real-time representation of their orientation and position relative to cardinal directions, even indoors, even in unfamiliar environments, even after rotation. This is a genuine cognitive difference: maintaining a real-time allocentric dead-reckoning system requires a different kind of spatial processing than the egocentric system that English speakers rely on.
Levinson's key experiment used a simple memory task. Participants were shown a row of objects on a table (for example, a toy animal facing north). They were then taken to a different room and rotated 180 degrees, so that the direction they faced had reversed. They were asked to reproduce the arrangement on a new table.
English speakers (and others using egocentric frames) consistently reproduced the arrangement in its egocentric form: if the animal faced left as they saw it, they placed it facing left on the new table, even though this meant it was now facing south rather than north. Speakers of Guugu Yimithirr and other absolute-frame languages consistently reproduced the arrangement in its allocentric form: if the animal faced north as they saw it, they placed it facing north, even though this meant it was now facing right (in terms of their egocentric perspective). The linguistic difference in spatial coding corresponded to a robust difference in non-linguistic spatial memory.
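The two predictions in the rotation task can be worked through mechanically. The following Python is a minimal sketch under stated assumptions: the function names, the four-way compass, and the specific headings (a participant who first faces east, then west) are hypothetical choices made to match the example in the text, not details of Levinson's procedure.

```python
# Compass headings in degrees, clockwise from north.
COMPASS = {"north": 0, "east": 90, "south": 180, "west": 270}
COMPASS_NAMES = {deg: name for name, deg in COMPASS.items()}

# Egocentric directions as offsets from the way the viewer is facing.
EGO = {"ahead": 0, "right": 90, "behind": 180, "left": 270}
EGO_NAMES = {deg: name for name, deg in EGO.items()}

def to_egocentric(viewer_heading, object_heading):
    """How a compass direction appears to a viewer facing viewer_heading."""
    return EGO_NAMES[(COMPASS[object_heading] - COMPASS[viewer_heading]) % 360]

def to_absolute(viewer_heading, egocentric):
    """Compass direction corresponding to an egocentric direction."""
    return COMPASS_NAMES[(COMPASS[viewer_heading] + EGO[egocentric]) % 360]

def predict_reproduction(object_heading, heading_before, heading_after, frame):
    """Compass heading of the reproduced object under each coding scheme."""
    if frame == "absolute":
        # Memory stores the compass direction; rotation is irrelevant.
        return object_heading
    if frame == "relative":
        # Memory stores left/right as seen; the new viewpoint reinterprets it.
        seen_as = to_egocentric(heading_before, object_heading)
        return to_absolute(heading_after, seen_as)
    raise ValueError(f"unknown frame: {frame}")

# Animal faces north; participant faces east (so sees it facing left),
# then is rotated 180 degrees to face west.
for frame in ("relative", "absolute"):
    result = predict_reproduction("north", "east", "west", frame)
    print(frame, "->", result, "(appears", to_egocentric("west", result) + ")")
```

The sketch makes the dissociation explicit: the two coding schemes agree before rotation and diverge after it, which is why the rotation is needed to tell them apart.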
The spatial research is among the strongest evidence for linguistic relativity because it documents a difference not merely in what is easy or quick to perceive but in the underlying representational format of spatial memory. Speakers of allocentric-frame languages appear to maintain genuinely different cognitive representations of space from speakers of egocentric-frame languages, and this difference corresponds to what their language obligatorily requires them to encode.
VI
Time presents a particularly interesting case for linguistic relativity because it is an abstract domain that humans universally represent through spatial metaphors (as Lakoff and Johnson argued), yet the specific spatial metaphors vary across languages in ways that have measurable cognitive consequences.
In English and most European languages, time is conceptualized along horizontal axes. In speech the axis runs front to back: we talk about looking "back" on the past and "ahead" to the future, about "putting behind" us things that are over, and about the future "coming" toward us. In spatial layouts the axis runs from left (past) to right (future), following the direction of writing: when English speakers are asked to arrange pictures of events in temporal order, they spontaneously place earlier events on the left and later events on the right.
Mandarin speakers frequently use vertical metaphors for time in addition to horizontal ones: earlier events are "up" (shang, the same word used for "above"), later events are "down" (xia, the same word used for "below"). "Last week" is expressed as "up-week"; "next week" as "down-week." This is not merely a linguistic option available to Mandarin speakers; experimental research by Lera Boroditsky (2001) found that Mandarin speakers are faster at verifying temporal relationships when primed with a vertical spatial arrangement, while English speakers are faster when primed with a horizontal arrangement. The habitual spatial metaphor in the language influences the speed of temporal reasoning in non-linguistic tasks.
The Aymara people of the Andes present an even more dramatic case. The Aymara language places the past in front and the future behind: the word for "in front" (nayra, literally "eye" or "front") is also used for "past," while the word for "behind" (qhipa) is also used for "future." The rationale, which Aymara speakers articulate explicitly, is that you can see the past (it is known, visible to the mind) but cannot see the future (it is unknown, behind you). The metaphor reflects an epistemological attitude rather than a temporal direction.
Rafael Núñez and Eve Sweetser (2006) studied the gesture patterns of Aymara speakers discussing temporal events and found that they gesture forward when discussing the past and backward when discussing the future, consistent with the linguistic metaphor. This is a striking departure from the pattern in virtually all other cultures studied, where people gesture forward for the future and backward for the past. The Aymara spatial metaphor for time is not merely a linguistic convention but organizes the gestural and perhaps conceptual representation of temporal events.
Lera Boroditsky and Alice Gaby (2010) studied the Kuuk Thaayorre of Pormpuraaw, a community in Queensland, Australia, whose language uses absolute (cardinal direction) spatial reference for virtually all domains including time. When asked to arrange pictures of temporal sequences, Kuuk Thaayorre speakers arranged them running east to west: earlier events to the east, later events to the west. Crucially, the direction of the arrangement relative to the body changed with the direction the participant was facing: the sequences always ran east to west in absolute terms, not consistently left-to-right or right-to-left in egocentric terms. The linguistic convention for spatial reference organized their representation of temporal order in an allocentric rather than egocentric format.
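What a fixed east-to-west layout looks like from the participant's own point of view can be tabulated directly. The short Python sketch below is an illustration only; the function name and the direction labels are hypothetical, and it assumes the participant faces one of the four cardinal directions.

```python
# Compass headings in degrees, clockwise from north.
HEADINGS = {"north": 0, "east": 90, "south": 180, "west": 270}

# How a direction of travel appears relative to the participant's body.
EGOCENTRIC = {
    0: "away from the body",
    90: "left to right",
    180: "toward the body",
    270: "right to left",
}

def layout_as_seen(facing):
    """Egocentric direction of a sequence that always runs east to west."""
    # The sequence progresses toward the west (later events lie westward).
    offset = (HEADINGS["west"] - HEADINGS[facing]) % 360
    return EGOCENTRIC[offset]

for facing in HEADINGS:
    print(f"facing {facing}: sequence runs {layout_as_seen(facing)}")
```

An egocentric left-to-right convention would give the same answer in every row; the absolute convention gives a different egocentric direction for each facing direction.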
The economist Keith Chen (2013) proposed a provocative connection between linguistic structure and economic behavior. Languages that obligatorily mark the future with a distinct grammatical form (like English, which requires "It will rain tomorrow" rather than "It rains tomorrow") may, Chen argued, lead their speakers to perceive the future as more psychologically distant from the present than do languages that allow the same verb form for present and future (like Mandarin, which has no grammatical tense and can use the same form for both "It rains tomorrow" and "It rains now").
Using data from 76 countries, Chen found that speakers of languages with a strong future tense (what he called "strong-FTR languages") saved significantly less money, were less likely to retire with assets, exercised less, smoked more, and were more obese. Speakers of "weak-FTR languages" (those that use present tense forms for the future) showed the opposite pattern on all these measures.
Chen's study attracted significant attention and substantial criticism. The primary critique, from Anke Bochud and others, is that the correlation is confounded: languages without obligatory future tense marking include Mandarin, spoken in high-saving cultures, and the effect may reflect cultural factors correlated with language type rather than language itself. The methodological dispute has not been resolved, and Chen's specific claims remain contested. The study is included here as an instance of the ambition and the difficulty of research linking linguistic structure to macro-level behavioral outcomes.
VII
Number is the domain in which the evidence for cognitive effects of language is at once most striking and most carefully contested. The key cases involve communities whose languages have very limited number vocabularies, raising the question of whether the absence of words for exact quantities affects the capacity to represent and reason about those quantities.
The Munduruku of the Brazilian Amazon speak a language with number words for quantities up to approximately five, with some imprecision even in this range. Pierre Pica and colleagues, in a landmark 2004 paper in Science, tested Munduruku adults and children on a range of numerical tasks. The results were striking. Munduruku speakers performed well on approximate number tasks: they could compare quantities, estimate proportions, and make rough numerical judgments with accuracy comparable to French speakers. But they showed significant difficulty with exact arithmetic: counting out exact quantities, adding precise numbers, and performing operations that require exact numerical representation.
The interpretation Pica and colleagues offered was that humans have two distinct number systems: an approximate number sense (the ability to perceive and compare quantities approximately, present in many animal species and operating independently of language) and an exact number system (the ability to represent and manipulate exact quantities, which may depend on the availability of a precise counting vocabulary). Without number words to anchor exact quantities in memory and to serve as a counting procedure, exact numerical cognition may be unavailable or severely degraded.
When Munduruku adults were asked to give the experimenter "exactly six" objects from a pile, they performed significantly worse than French-speaking adults, who could count out precisely six with near-perfect accuracy. The Munduruku showed a characteristic pattern of approximate response: they would give five, six, or seven objects, without the ability to distinguish exactly six from exactly seven.
Crucially, this deficit appeared specific to exact number. On tasks requiring comparison of approximate quantities (which pile has more?), Munduruku speakers were highly accurate, performing comparably to French speakers across a wide range of quantities. The dissociation between intact approximate cognition and impaired exact cognition is consistent with the two-systems hypothesis and with the idea that language (specifically, counting words) is necessary for exact numerical cognition.
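The dissociation can be illustrated with a common textbook model of the approximate number sense, in which the internal estimate of a quantity n is a Gaussian with mean n and standard deviation w times n. The Python below is a sketch under that assumption, not a reconstruction of the analysis by Pica and colleagues; the function names are hypothetical and the value W = 0.17 is an assumed, illustrative Weber fraction.

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def p_correct_comparison(n1, n2, w):
    """Probability of correctly judging which of two positive quantities is larger.

    Each quantity n is assumed to be represented with Gaussian noise of
    standard deviation w * n (scalar variability).
    """
    small, large = sorted((n1, n2))
    return normal_cdf((large - small) / (w * sqrt(small**2 + large**2)))

def p_response(target, response, w):
    """Probability that a noisy estimate of `target` rounds to `response`."""
    sd = w * target
    return normal_cdf(response + 0.5, target, sd) - normal_cdf(response - 0.5, target, sd)

W = 0.17  # assumed Weber fraction, chosen for illustration only

# Approximate comparison: accuracy depends on the ratio, not the absolute size.
print("10 vs 20:", round(p_correct_comparison(10, 20, W), 3))
print("20 vs 40:", round(p_correct_comparison(20, 40, W), 3))

# Exact production without counting: responses spread around the target.
for k in range(4, 9):
    print(f"asked for 6, gives {k}:", round(p_response(6, k, W), 3))
```

Under these assumptions the same noisy representation supports accurate judgments of which pile has more while leaving an attempt to produce exactly six spread across neighbouring values such as five and seven. Counting words, on the two-systems account, are what remove that spread.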
Peter Gordon (2004) studied numerical cognition among the Pirahã, whose language has been described as having only three quantity terms: hói (one or small), hoí (two or more), and baagi (many). Gordon presented Pirahã speakers with simple matching tasks: the experimenter shows a row of objects, and the participant creates a matching row. With small quantities (one to three objects), Pirahã speakers performed well. With larger quantities (seven to ten objects), performance deteriorated, with errors increasing systematically with quantity. Gordon interpreted this as evidence that without number words, exact large quantities cannot be represented.
The interpretation was contested. Caleb Everett (Daniel Everett's son, also a linguist) and others pointed out that the Pirahã number terms may not mean exactly one-two-many but may be more approximate, and that the task demands may have introduced confounds. The debate about Pirahã number vocabulary and its cognitive consequences continues, intertwined with the broader Pirahã controversy (their alleged lack of recursion, their unusual cultural practices) that makes the community a site of sustained and sometimes contentious linguistic investigation.
The number research is significant because it touches on a domain, exact arithmetic, where the case for linguistic necessity is most plausible. Approximate numerosity appears to be a pre-linguistic capacity present across many species. Exact representation of large quantities, which requires a stable counting procedure, may genuinely depend on having number words to serve as a counting mechanism. If this is correct, it represents a case where language is not merely influencing a pre-existing capacity but enabling a cognitive capacity that would otherwise be unavailable.
VIII
The Pirahã language, spoken by approximately 300 people along the Maici River in the Brazilian Amazon, has become the most contested site in contemporary linguistics, and its relevance to the Sapir-Whorf question is direct. Daniel Everett, who spent decades living with the Pirahã as a missionary and later as a linguist, has claimed that Pirahã language and culture lack features that most linguists had considered universal: no recursion, no number words, no colour terms, no creation myths, no fiction, and no perfect tense. He argues that both are organized by what he calls the Immediacy of Experience Principle: the Pirahã only make assertions about things that are within the direct experience (or the direct report from a first-hand observer) of a participant in the conversation.
Everett's 2005 paper in Current Anthropology, and his subsequent popular book Don't Sleep, There Are Snakes (2008), presented an account of Pirahã culture and language that, if accurate, would constitute remarkable evidence for strong linguistic relativity: a community whose cultural and linguistic structure had produced a genuinely different mode of thought, restricted to the immediately experienced present.
The reception in linguistics was deeply divided. The claim about lack of recursion directly challenged Chomsky's account of the language faculty, and generated a sustained methodological dispute about how to identify recursion in a language (Chomsky's response, as noted in Artifact I, was that recursion is a property of the underlying faculty rather than necessarily surface grammatical embedding). Other linguists challenged the accuracy of Everett's grammatical analysis: Andrew Nevins, David Pesetsky, and Cilene Rodrigues (2009) published a detailed critique arguing that Everett had misanalyzed Pirahã sentences and that the language does in fact have recursive structures.
What is not seriously disputed: the Pirahã have an unusual range of cultural features for an Amazonian community. The Immediacy of Experience Principle, whatever its precise formulation, captures something real about Pirahã conversational norms. The number vocabulary is genuinely limited. The community's engagement with temporal depths (ancestors, distant futures) is unlike that of most other communities Everett had encountered.
What remains disputed: whether these features are accurately characterized by Everett's linguistic analysis, whether they constitute violations of linguistic universals or merely unusual parameterizations of universal features, and whether the cultural and linguistic features are cause or effect of Pirahã ways of life. The Pirahã case illustrates a methodological challenge that runs through all relativistic research: separating the effects of language from the effects of the culture that language is embedded in.
IX
Some of the most intriguing evidence for subtle linguistic effects on cognition comes from research on grammatical categories that speakers of inflection-poor languages like English barely notice: grammatical gender, the encoding of agentivity, and the obligatory marking of information source (evidentiality). When a language requires its speakers to assign every noun to a gender class and to make that assignment in every utterance involving those nouns, does it shape the conceptual associations those speakers form?
Lera Boroditsky, Lauren Schmidt, and Webb Phillips (2003) tested whether the grammatical gender of nouns in Spanish and German influences the conceptual associations speakers form with the objects those nouns name. Spanish and German both have grammatical gender systems, but they assign gender differently: some nouns are masculine in Spanish and feminine in German, and vice versa.
The best-known example is the word for "bridge," which is masculine in Spanish (el puente) and feminine in German (die Brücke). The researchers asked Spanish and German speakers to describe bridges using adjectives. Spanish speakers (for whom bridge is masculine) used adjectives like "strong," "towering," and "sturdy." German speakers (for whom bridge is feminine) used adjectives like "beautiful," "elegant," and "slender." Neither group was asked about grammatical gender; they were simply asked to describe bridges. The conceptual associations tracked the grammatical gender of the noun in their language.
This finding has been replicated in various forms and challenged on methodological grounds. The concern is that the adjectives chosen might reflect cultural associations between grammatical gender and natural gender rather than a direct effect of grammatical category on conceptual association. The debate has been productive in requiring more carefully controlled experimental designs, and the overall evidence suggests that grammatical gender does produce subtle but measurable shifts in how objects are conceptualized.
English and Spanish differ in how they typically describe accidental events. In English, it is natural to say "John broke the vase" even when the breaking was accidental. Spanish prefers an agentless construction: Se le rompió el jarrón a Juan (roughly, "The vase broke itself on Juan"). The Spanish construction suppresses agentive framing; the English construction preserves it even for accidents.
Caitlin Fausey and Lera Boroditsky (2011) showed English and Spanish speakers videos of people accidentally and intentionally breaking things. Later, they tested memory for the agent of each event. English speakers were significantly better at remembering who accidentally broke things than Spanish speakers were. The linguistic habit of framing accidents agentively in English appears to enhance encoding of the agent in memory, while Spanish speakers' habit of using agentless constructions for accidents leads to less agent-focused encoding, and consequently worse later memory for who was responsible.
This finding has implications beyond linguistics. In related work, Fausey and Boroditsky showed that whether an English description of an accident (such as a report of an unfortunate event) uses agentive or non-agentive wording influences attributions of blame, in ways consistent with the memory findings. Language used to frame events shapes how responsibility is assigned.
Languages with obligatory evidential marking require speakers to encode, in every relevant utterance, the source of their knowledge: whether they witnessed an event directly, inferred it from evidence, or heard about it from others. Turkish, Quechua, Tibetan, and many other languages have such systems. Does obligatory evidential marking affect how speakers monitor and represent the sources of their own knowledge?
Research by Aylin Küntay and colleagues suggests that Turkish-speaking children, who must learn evidential morphology from an early age, develop sensitivity to the source of knowledge earlier than English-speaking children do. Turkish children distinguish between what they have seen directly and what they have been told at an earlier developmental stage than the one at which English-speaking children show equivalent sensitivity. The obligatory grammatical category appears to accelerate a general cognitive development rather than introducing a capacity unavailable to English speakers.
X
The history of the Sapir-Whorf hypothesis is a cautionary tale about the gap between confident assertion and careful evidence. Whorf made sweeping claims about Hopi cognition on the basis of limited and ultimately inaccurate linguistic analysis. The backlash against him led to an equally sweeping dismissal of any language-thought relationship, which the experimental evidence of the past three decades has shown to be wrong in the other direction. What the careful, experimentally grounded research has actually established is more modest and more interesting than either the strong hypothesis or its total rejection.
Language influences habitual attention. Speakers of languages that obligatorily mark certain distinctions develop habitual attention to those dimensions. Russian speakers automatically attend to the light-dark blue distinction in ways that English speakers do not. Absolute-frame language speakers automatically track cardinal directions. Turkish speakers automatically monitor the source of their knowledge. These differences are real, measurable, and correspond to linguistic structure.
Language influences the ease of certain cognitive operations. Tasks that require distinguishing across a linguistic category boundary are systematically easier for speakers of languages whose lexical categories make that boundary salient. Colour discrimination across category boundaries is faster. Spatial reasoning in the format that one's language habitually uses is faster. Temporal reasoning in the spatial orientation habitual in one's language is faster.
Language influences memory encoding. How events are linguistically framed affects what is remembered. Agentive language for accidents leads to better memory of the agent. Agentless language leads to worse memory of the agent. Verbal labels for colours help stabilize colour memories across delays.
Number words enable exact numerical cognition. The evidence from Munduruku and Pirahã research, while not without dispute, supports the hypothesis that exact representation of large quantities requires counting words. Approximate numerosity operates independently of language; exact arithmetic may not.
Language does not prevent any thought. There is no documented case of a concept that is literally impossible to entertain for speakers of languages that lack a word for it. Concepts can be understood before they are named, and speakers of all languages can learn to make any distinction that any other language encodes.
The effects are online and modifiable, not deep and permanent. Most of the documented effects appear to operate during real-time cognition rather than representing stable differences in cognitive architecture. They are often reduced or eliminated when participants engage in verbal interference tasks (which disrupt verbal labeling) or when equivalent non-linguistic training is provided.
Language and culture are confounded. Speakers of different languages typically live in different cultures with different practices, different environments, and different histories. Disentangling the specific effects of linguistic structure from the broader effects of cultural context is extremely difficult, and many apparently linguistic effects may be more parsimoniously explained by cultural or ecological factors.
John Lucy's formulation remains the most defensible summary of the current state of the evidence: there is strong evidence for a weak version of linguistic relativity. Language influences thought, primarily through the channel of obligatory grammatical categories that direct habitual attention to specific dimensions of experience. These effects are real, measurable, and theoretically significant. They are not large enough to constitute cognitive barriers between speakers of different languages, and they do not prevent any thought. They shape what is easy, automatic, and habitual; they do not determine what is possible.
The most important theoretical implication is that the dichotomy between universalism (all human minds work the same way) and relativism (minds are shaped by their languages) is a false one. All humans share a rich set of cognitive universals, including the approximate number sense, basic spatial reasoning capacities, object and agent perception, and a great deal more. Within those shared universals, language shapes habitual patterns of attention, categorization, and memory in ways that differ systematically across language communities. Both things are true, and neither cancels the other.
XI · Synthesis
The Sapir-Whorf question, taken in its strong form, asked whether language constructs reality: whether the world one perceives and inhabits is fundamentally shaped by the language one has been born into. The answer the evidence gives is: not fundamentally, but yes, in specific and interesting ways. The borderlines of experience are not drawn by grammar, but grammar does mark certain features of experience with particular conspicuousness, and what is conspicuous becomes what is habitual.
The research reviewed in this artifact belongs to a broader project in cognitive science: determining what is universal in human cognition and what varies across individuals, communities, and cultures. The Sapir-Whorf question is one instance of this general question, focused specifically on the role of language as a mediator between universal cognitive architecture and culturally variable cognitive behavior. The answer, as in so many domains of cognitive science, turns out to be: both. Universal mechanisms, culture-specific applications.
Perhaps the most important contribution of the Sapir-Whorf tradition, properly understood, is not the specific claims about colour or space or time, but the demonstration that the categories we take for granted as natural are in many cases the product of the language we happen to have been born into. The categories feel natural because they are habitual. They are habitual because they are grammatically obligatory. They are grammatically obligatory because of historical accidents of language change. The sense that one is perceiving the world directly, unmediated by language, is among language's most effective illusions.
The relationship between language and thought connects intimately to the artifact that follows: the question of what changes when language takes a new form, specifically when it moves from spoken to written. Writing does not merely record speech; it changes what speech has done to the mind. The literate mind and the oral mind are not, it turns out, the same mind in different media. Artifact V examines what literacy does to cognition, memory, and the self.
Edward Sapir (1884–1939): Linguistic relativity; the relationship between language structure and habitual thought; Language, 1921.
Benjamin Lee Whorf (1897–1941): Strong linguistic relativity; Hopi time analysis; Language, Thought, and Reality, 1956 (posth.).
Ekkehart Malotki (b. 1938): Empirical refutation of Whorf's Hopi claims; Hopi Time, 1983.
John Lucy (b. 1956): Methodological rehabilitation of linguistic relativity; the Lucy-Gaskins studies on Yucatec Mayan number; Grammatical Categories and Cognition, 1992.
Jonathan Winawer et al.: Russian blue advantage; lateralized colour perception effects, PNAS, 2007.
Debi Roberson: Cross-cultural colour categorization; Berinmo and Himba colour studies; category boundaries and discrimination.
Stephen Levinson (b. 1947): Spatial reference frames; allocentric vs egocentric cognition; Space in Language and Cognition, 2003.
Lera Boroditsky (b. 1976): Mandarin vertical time metaphors; Kuuk Thaayorre time; grammatical gender and conceptual association; agentivity and memory; numerous studies 2001 onward.
Rafael Núñez and Eve Sweetser: Aymara past-in-front time metaphor and gesture, 2006.
Pierre Pica et al.: Munduruku number cognition; the two number systems hypothesis, Science, 2004.
Peter Gordon (b. 1952): Pirahã number cognition; language and exact number, Science, 2004.
Daniel Everett (b. 1951): Pirahã language and culture; Immediacy of Experience Principle; Don't Sleep, There Are Snakes, 2008.
Caitlin Fausey and Lera Boroditsky: Agentive framing, memory for accidents, and blame attribution, 2011.
Keith Chen (b. 1975): Grammatical future tense and economic behavior, 2013.
Aylin Küntay et al.: Evidential morphology and early development of knowledge-source monitoring in Turkish children.
After testing how language shapes habitual thought, the sequence turns to a different threshold entirely: what happens when language stops being only spoken and becomes written.