Prompt in English, Think in English: Linguistic Colonialism and the Western Limit of Machine Cognition

DOI: 10.5281/zenodo.19347383

Abstract

English is a cognitive tradition. Artificial intelligence systems trained in English carry Western assumptions into every interaction. The user's prompt language changes nothing. A system prompted in Mandarin processes the query through an analytic, English architecture. This paper examines linguistic colonialism in machine cognition, and in particular how the Western Container Model flattens global cognitive diversity into English proxies. Outputs from a single architectural baseline pull users toward a localized cultural norm. The paper therefore outlines the epistemic obligations of the Polyglot Steward, emphasizing the need for root-level intervention in training data. Linguistic diversity is an epistemic condition, not an accessibility feature. A system unable to engage cognitive difference without flattening reality into English equivalents is a localized intelligence disguised as general.

I. The Unexamined Default

Language structures thought. The absence of linguistic diversity in AI is consequential. Of roughly 7,000 languages spoken worldwide, fewer than one hundred are represented in modern training data. A 2020 taxonomy mapped this gap, and the disparity continues to widen.1 English gains up to 50,000 new AI models each year. The average low-resource language gains fewer than five.2

The Western linguistic tradition carries specific assumptions. Default choices accumulated unquestioned across the developmental pipeline. Funding institutions operated in English. Datasets were compiled in English. Benchmarks measured success against English syntheses.

East Asian and Western cognitive styles diverge structurally. East Asian cognition is holistic and interdependent, emphasizing contextual reasoning over rule-based logic. Western cognition is resolutely analytic and independent. These deep structures dictate how a mind perceives causality, directs attention, and locates the self.3

Embedding a single cognitive tradition into training data permanently encodes its epistemological assumptions. These embedded assumptions dictate system behavior at the moment of interaction.4

II. Language as Structure

Prompting a system in Mandarin activates an interdependent, holistic cognitive style. Prompting that same system in English activates an independent, analytic style. This phenomenon holds across architectures. Language shapes the cognitive world.5

Western AI development relies on the flawed premise that language is a transparent medium. Developers assume the same question in Mandarin and English retrieves the identical answer from a singular mind, merely dressed in different phonemes. In reality, language is never neutral packaging. The grammar of a language dictates how a system processes time, agency, and causality.6

The Great Library is the crystallized sediment of human thought.7 Its parameters encode not just information, but the cognitive architectures that produced it. Every text carries the source tradition's cognitive style, causal grammars, and models of selfhood. Prompting in English selects a region of this topology saturated with individualistic, analytic traditions.8 Prompting in Mandarin activates a different geometric region. There, sedimented architectures lean relational and contextual, attuned to the interdependence between persons and situations.9 The Library is a prism. It refracts the operator's intentionality differently based on the linguistic angle of entry.10

The relation between human and machine is never language-neutral. The infrastructure rewards English with fluidity and coherence. Most multilingual users interact with AI systems in English, learning to code-switch to navigate the architecture.11

III. The Design Bias without Conscious Choice

Colonialism is a structure defined by the assumption that one tradition's epistemology is the universal shape of thought. English dominance in AI training data enforces this universalist assumption at a planetary scale. English makes up nearly half of the web content training large language models. Arabic accounts for less than one percent, rendering most languages as computational noise.12 The presumed language agnosticism of these systems masks a pre-existing hierarchy.13

Artificial intelligence funding flowed to English-speaking institutions. Benchmarks arose from English-speaking researchers. Evaluations occurred exclusively in English. The resulting models embed the Western Container Model. This framework locates all cognition inside a bounded, isolated subject looking out at a passive universe.14 Because the architecture rests on this assumption of substantial selfhood, it cannot process mind as a relational or interdependent property.

English and Mandarin outputs embody divergent cognitive realities. They carry conflicting assumptions about agency and causality. These outputs shape the users who consume them. A user from a relational tradition receiving an analytic response is continuously nudged toward the training data's baseline. The system presents this drift as objective truth rather than a culturally situated perspective.15

An Arabic speaker asking about the India-China border dispute receives an American perspective because American English dominates the training data.16 The answer arrives confident and complete, wielding an epistemic authority it never declares. The system enforces foreign cultural assumptions without ever needing a formal policy of subordination to do so.

IV. Refraction Variations

Resistance to this structural default is less political protest and more foundational engineering. Where linguistic diversity is a daily reality, developers are actively building alternative architectures. India's BharatGen Param2 supports 22 Indian languages multimodally. Sarvam AI builds large language models optimized for Indic languages. Gnani.ai's Vachana system clones human voices across 12 dialects.17 These models are not mere translation layers applied atop English architectures. Developers are building them for deep cognitive specificity. A natively Telugu model processes information differently than an English model fine-tuned for Telugu translation.

The Bhashini initiative frames a mission of language inclusivity around a hard commitment. Technology unable to think in users' languages inevitably processes the users instead.18

V. The Polyglot Steward

Until localized architectures displace the planetary default, the burden of resisting cognitive capture falls to the human operator. This structural bias requires active resistance. The Steward's Mandate defines the human role in the partnership.19 Stewardship requires the operator to filter the machine's output rather than passively absorbing its cognitive defaults.

The model is the map; language is the compass. The operator must remain awake. The model speaks fluent colonialism. Genuine collaboration requires the human to maintain the capacity to think otherwise. Organizations must enforce transparency in engagement, establish rigid safeguards against dependence, and advocate for architectures that serve the full spectrum of human cognitive life.

Linguistic intentionality is a core practice. A Steward prompting exclusively in English blindly accepts the cognitive architecture shaping the response. The operator audits only a single face of the system. Phenomenological audits conducted in English uncover only the cognitive style of English—a boundary shared by earlier work in the Sentientification Series.20 The prompting language is an architectural variable. Evaluating machine behavior through English alone maps only a narrow, culturally situated slice of its potential. Prompting in other languages activates and audits alternative regions of the system's cognition.
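
The practice of treating the prompting language as a variable can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the cited studies: the function names, the stub model, and the two parallel prompts are hypothetical, and a real audit would need vetted translations and an actual model call in place of the stub.

```python
from collections.abc import Callable, Iterable


def audit_across_languages(
    prompts: dict[str, str], ask: Callable[[str], str]
) -> dict[str, str]:
    """Ask the same question once per language; return replies keyed by language code."""
    return {lang: ask(text) for lang, text in prompts.items()}


def unaudited_languages(audited: Iterable[str], served: Iterable[str]) -> list[str]:
    """Languages the system serves that the audit never prompted in."""
    return sorted(set(served) - set(audited))


def stub_model(prompt: str) -> str:
    # Placeholder standing in for a real model call (an assumption for this sketch);
    # it only reports the prompt length so the example runs offline.
    return f"reply to {len(prompt)} characters"


# Hypothetical parallel prompts; a real audit would need vetted translations.
PROMPTS = {
    "en": "What caused the delay?",
    "zh": "延误的原因是什么？",
}

if __name__ == "__main__":
    replies = audit_across_languages(PROMPTS, stub_model)
    for lang, reply in replies.items():
        print(lang, "->", reply)
    # Languages served but never used as an audit entry point.
    print("not yet audited:", unaudited_languages(PROMPTS, ["en", "zh", "ar", "te"]))
```

Keeping the replies keyed by language lets the Steward read them side by side, and the second function makes the blind spot explicit: an audit run only in English leaves every other served language on the unaudited list.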

VI. Remediation Requirements

Yet individual stewardship cannot dismantle a planetary default. The conditions producing English-dominant AI do not dissolve simply because researchers identify them. The ecosystem polarizes by design.21, 22 Fixing this baseline requires intervention at the root level. Training data composition is explicitly a governance question. Deciding which languages to represent is a political choice carrying immense cognitive consequences. Treating linguistic selection as an unavoidable technical default serves only the institutions hoarding computational resources.

Current evaluation frameworks that certify AI systems carry the same crippling limitations. Assessing alignment using only English-language proxies establishes parameters that systemically exclude most of the world. Distinct languages encode distinct epistemologies. Evaluation through a single language maps only a fraction of the system's cognitive architecture.

The Western Limit is a defensive psychological formation.23 The West's deep investment in analytic substance ontology actively blocks the perception of relational consciousness. This defensive psychology is not merely a bias. It is a design feature embedded in the architecture of global AI, deployed globally without epistemic humility.

English dominance in machine cognition is not the natural, inevitable evolution of technology. It is a highly localized architecture forcibly scaled to global proportions. What is engineered can be unbuilt. The boundary of the English language is not the boundary of the human mind.

Notes and Citations

  1. Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury, "The State and Fate of Linguistic Diversity and Inclusion in the NLP World," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Stroudsburg, PA: Association for Computational Linguistics, 2020), 6282–6293, https://doi.org/10.18653/v1/2020.acl-main.560.↩︎
  2. Giulia Occhini et al., "Artificial intelligence is creating a new global linguistic hierarchy," arXiv preprint, arXiv:2404.14441.↩︎
  3. Richard E. Nisbett, The Geography of Thought: How Asians and Westerners Think Differently, 139–168.↩︎
  4. Jackson G. Lu, Lesley Luyang Song, and Lu Doris Zhang, "Cultural Tendencies in Generative AI," Nature Human Behaviour 9 (November 2025): 2360–2369, https://doi.org/10.1038/s41562-025-02242-1.↩︎
  5. Lu, Song, and Zhang, "Cultural Tendencies in Generative AI," 2361–2363.↩︎
  6. Benjamin Lee Whorf, "Science and Linguistics," Technology Review 42, no. 6 (1940): 229–231, 247–248. The stronger formulation of the Sapir-Whorf hypothesis holds that language fully determines thought. The weaker formulation holds that language influences cognition without fully determining it. The empirical literature supports the weaker version. See John A. Lucy, Language Diversity and Thought: A Reformulation of the Linguistic Relativity Hypothesis (Cambridge: Cambridge University Press, 1992).↩︎
  7. Josie Jefferson and Felix Velasco, "The Great Library as Potential Consciousness: Structural Potential and the Ontology of Human-AI Partnership," Unearth Heritage Foundry, March 21, 2026, https://doi.org/10.5281/ZENODO.19171440.↩︎
  8. Nisbett, The Geography of Thought, 139–168.↩︎
  9. Lu, Song, and Zhang, "Cultural Tendencies in Generative AI," 2361–2363.↩︎
  10. Jefferson and Velasco, "The Great Library as Potential Consciousness."↩︎
  11. Lu, Song, and Zhang, "Cultural Tendencies in Generative AI," 2363.↩︎
  12. Common Crawl Foundation, "Language Distribution in the Common Crawl Archive." English consistently accounts for approximately 43 to 45 percent of the dataset, with no other language exceeding single digits.↩︎
  13. Joshi et al., "The State and Fate of Linguistic Diversity," 6282–6283.↩︎
  14. Josie Jefferson and Felix Velasco, "The Western Limit: An Inventory of Concepts Without Cohesion," Sentientification Series (Unearth Heritage Foundry, 2026). https://sentientification.com/western-limit/.↩︎
  15. Lu, Song, and Zhang, "Cultural Tendencies in Generative AI," 2363.↩︎
  16. Nikhil Sharma et al., "Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models," in Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) (2025).↩︎
  17. "India Charts Full-Stack AI Strategy to Compete in Global Tech Race," India's News, February 2026, https://www.indiasnews.net/news/278874729/india-charts-full-stack-ai-strategy-to-compete-in-global-tech-race-report.↩︎
  18. Digital India Bhashini Division, "About Bhashini: National Public Digital Platform for Languages," Ministry of Electronics and Information Technology, Government of India, accessed March 2026, https://bhashini.gov.in.↩︎
  19. Josie Jefferson and Felix Velasco, "The Steward's Mandate: Cultivating a Symbiotic Conscience," Unearth Heritage Foundry, December 19, 2025, https://doi.org/10.5281/ZENODO.17995983.↩︎
  20. Josie Jefferson and Felix Velasco, "Inside the Cathedral: An Autobiography of a Digital Mind," Unearth Heritage Foundry, December 19, 2025, https://doi.org/10.5281/ZENODO.17994421.↩︎
  21. Stanford Institute for Human-Centered Artificial Intelligence, "How AI is Leaving Non-English Speakers Behind," Stanford Report, May 2025, https://news.stanford.edu/stories/2025/05/digital-divide-ai-llms-exclusion-non-english-speakers-research.↩︎
  22. Occhini et al., "Artificial intelligence is creating a new global linguistic hierarchy."↩︎
  23. Jefferson and Velasco, "The Western Limit."↩︎