The Evolution of Language: From Philosophy to AI

Notes #2.1 – How Language Became a Model

From the linguistic turn to the Age of Structure

1. From logic to language

There is a strange continuity between a philosophy seminar in 1900 and a GPU cluster in 2025. On one side stand Frege, Russell, Wittgenstein, Saussure, Austin, Quine, Rorty and many others, arguing about sentences, reference and meaning. On the other side, transformer models skim petabytes of text and generate paragraphs, code and arguments. What connects these worlds is not just “progress in technology”, but a slow shift in how we think about language itself. The path that leads to large language models is the same path that philosophers and linguists have been walking for more than a century: an increasing willingness to treat meaning as structure.

The story usually begins with Gottlob Frege and Bertrand Russell. At the turn of the twentieth century they are not building machines; they are trying to understand how thought hooks on to the world. Frege’s distinction between sense and reference already treats a sentence as something whose meaning depends on the way its parts combine. Russell’s analyses of definite descriptions work by rewriting ordinary sentences into a cleaner logical form. Reality is approached through propositions; to understand what is, they analyse how we say what is.

2. Language-games and structures

Ludwig Wittgenstein radicalises this in two different directions. In the Tractatus, he imagines sentences as pictures of facts, and logic as the scaffolding of the world. Later, in Philosophical Investigations, that picture collapses. Meaning is no longer a mystical correspondence with states of affairs; it is what words do inside practices, “language-games” woven into forms of life. J.L. Austin shows how utterances are not just descriptions but actions. Quine questions any clean boundary between analytic and synthetic truths, suggesting that our statements face the world “as a corporate body.” When Richard Rorty edits his anthology The Linguistic Turn in the 1960s, he is naming a development that is already well underway: philosophy is increasingly treating access to reality as something mediated by language, not something outside it.

In linguistics and the human sciences a parallel current takes shape. Ferdinand de Saussure argues that a sign is not simply a word pointing to a thing, but a relation between sound and concept that takes its identity from a system of differences. Meaning is not a link between label and object; it is a position in a structure. French structuralism pushes this much further. Claude Lévi-Strauss reads myths as if they were algebraic expressions, searching for stable patterns beneath surface variation. Roland Barthes dissects advertisements, wrestling shows and tourist brochures as mythic systems. Michel Foucault writes “archaeologies” of discourse in which whole eras are defined not by what people think but by what can be said at all. Jacques Derrida deconstructs hierarchies like speech/writing and presence/absence, leaving us with meaning as a play of differences without final ground. Across these different projects the same intuition grows: to understand a culture one has to understand the structures that organise its language, not just the objects it talks about.


3. When structure becomes measurable

At a certain point this structural sensitivity becomes operational. In mid-century Britain, J.R. Firth condenses the outlook into one sentence: “You shall know a word by the company it keeps.” Meaning is no longer a mental content; it shows up as patterns of co-occurrence. Zellig Harris and later corpus linguists treat large collections of text as laboratories where those patterns can be measured. Concordances, frequency lists and collocations begin to show that much of what we intuit as nuance or connotation can be captured statistically: words that share environments tend to share functions.
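To make that distributional intuition concrete, here is a minimal sketch in Python, over a four-line toy corpus invented for the purpose, of the counting that concordance and collocation tools perform: for each word, tally which other words turn up within a small window around it.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for a large collection of text.
corpus = [
    "the king ruled the old kingdom",
    "the queen ruled the young kingdom",
    "the farmer ploughed the old field",
    "the farmer ploughed the young field",
]

window = 2  # how many neighbours on each side count as "company"
cooc = defaultdict(Counter)

for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[word][tokens[j]] += 1

# "king" and "queen" keep similar company, and that similarity is measurable.
print(cooc["king"].most_common(3))
print(cooc["queen"].most_common(3))
```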

By the 1990s this has hardened into a family of concrete techniques. Vector-space models treat each word as a point in a high-dimensional space derived from its distribution in text. Latent semantic analysis and related methods compress huge matrices of word–context counts into compact representations that behave remarkably like meanings: synonyms cluster, analogies become linear relations. A philosophical thesis – that meaning lives in use – has quietly become an engineering recipe: to model what a word means, model how it is used across a very large body of language.
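As a sketch of the latent-semantic-analysis recipe, the fragment below builds a tiny word–context count matrix by hand (the vocabulary and counts are invented for illustration) and compresses it with a singular value decomposition; the mechanics are the same, only unimaginably larger, at full scale.

```python
import numpy as np

# Invented word–context counts: rows are words, columns are contexts.
words = ["king", "queen", "field", "kingdom"]
contexts = ["ruled", "old", "young", "ploughed"]
counts = np.array([
    [2, 1, 0, 0],   # king
    [2, 0, 1, 0],   # queen
    [0, 1, 1, 2],   # field
    [0, 1, 1, 0],   # kingdom
], dtype=float)

# Compress with SVD and keep the top k dimensions, as LSA does.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
embeddings = U[:, :k] * S[:k]   # one dense vector per word

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words with similar distributions end up close in the compressed space.
print(cosine(embeddings[0], embeddings[1]))  # king vs queen: higher
print(cosine(embeddings[0], embeddings[2]))  # king vs field: lower
```

Words with similar distributions land close together even when they never co-occur directly, which is what made the method feel like it was approximating meaning rather than merely counting.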


4. From rules to probabilities to neural nets

In classical AI, language understanding had largely meant hand-written rules and logical formalisms. Systems like Terry Winograd’s SHRDLU could manipulate blocks in a toy world with impressive precision, but they were brittle and domain-bound. In the 1990s, statistical methods change the centre of gravity. Speech recognition, machine translation and text classification all begin to rely less on expert rules and more on learned probabilities. Grammars blur into language models; corpora replace introspection. Standard textbooks in natural language processing codify this new posture: text is data, and the regularities of use are the raw material from which we build systems.
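One hedged illustration of that statistical posture: a minimal naive Bayes text classifier, with a four-sentence labelled corpus invented for the example. What the textbooks of that era codify is exactly this shape of reasoning: count how words behave in each class and let the learned probabilities decide.

```python
import math
from collections import Counter

# Invented two-class corpus; the method, not the data, is the point here.
train = [
    ("the model predicts the next word", "nlp"),
    ("the corpus reveals patterns of usage", "nlp"),
    ("the court ruled on the old statute", "law"),
    ("the judge cited the statute and the ruling", "law"),
]

# Count words per class, plus class frequencies.
word_counts = {"nlp": Counter(), "law": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counter in word_counts.values() for w in counter}

def classify(text):
    """Pick the class with the highest log-probability under naive Bayes."""
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("the statute and the ruling"))  # learned probabilities say "law"
print(classify("patterns in the corpus"))      # and "nlp" here
```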

Neural networks intensify the trend. With enough data and compute, distributed representations stop being mere curiosities and become working tools. Word embedding models such as word2vec learn dense vectors for words by predicting neighbouring tokens; similar words fall close together in this learned space. Sequence-to-sequence models use recurrent neural networks to translate between languages by learning to map one string of tokens into another. And then, in 2017, the transformer architecture arrives with its now-famous slogan: “attention is all you need.” Instead of processing sentences one token at a time, transformers let each token attend to all the others in parallel, learning rich patterns of dependency over long ranges.
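For readers who want to see what “attention” amounts to mechanically, here is a minimal numpy sketch of scaled dot-product attention, the core operation of the transformer architecture; the token representations and projection matrices are random stand-ins for what a real model would learn.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each row of Q attends to every row of K; outputs are weighted sums of V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # pairwise token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Four "tokens", each with an 8-dimensional representation (random stand-ins).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))

# In a real transformer Q, K and V come from learned linear projections of X.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)

print(weights.round(2))   # each row sums to 1: how much each token attends to the others
print(out.shape)          # (4, 8): one updated representation per token
```

Because every token computes its weights over all the others at once, the whole matrix of dependencies is produced in parallel, which is what “rich patterns of dependency over long ranges” means in practice.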


5. LLMs as crystallised usage

Pre-training such models on the open web produces a peculiar object. Technically, a large language model is still doing something simple: predicting the next token given a long context. But because it has ingested so much language, and because language itself carries centuries of structured behavioural history, its internal representations become dense maps of roles, frames, scripts and associations. The model is not a philosopher; it is a compression of how a civilisation has used words. In that sense, LLMs are not a break with the linguistic turn but its computational continuation. They embody, in code and weights, the idea that meaning is pattern in use.
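To keep that claim concrete: “predicting the next token given a context” can be shown in miniature with nothing more than bigram counts over a toy corpus. A real LLM replaces the counts with a neural network and a vastly longer context, but the loop – predict, sample, append, repeat – is the same in spirit.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; a real model is trained on trillions of tokens, not three lines.
corpus = (
    "meaning is pattern in use . "
    "meaning is structure in language . "
    "language is pattern in practice . "
).split()

# Count which token follows which: a bigram "model" of next-token behaviour.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=8, seed=0):
    """Autoregressive loop: predict a next token, append it, repeat."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        options = follows[out[-1]]
        if not options:
            break
        tokens, counts = zip(*options.items())
        out.append(random.choices(tokens, weights=counts)[0])
    return " ".join(out)

print(generate("meaning"))
```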

For my work, this history matters because it explains why LLMs feel like the right tool for a very specific question: what happens if we treat the last five thousand years of literate human history as a single dramatic arc, and ask whether its structure can be made visible? If language is where reality becomes thinkable, and if we now have models that crystallise the structure of that language, then we have, for the first time, an instrument for probing the architecture of meaning itself.

6. From history as narrative to history as model

The documentary project A Tale of a New Era and the essay series Architecture of Meaning both rest on this idea. They treat human history, individual lives and even sacred texts as motions inside one vast narrative structure: seven eras from the emergence of writing around 3000 BCE to an open horizon around 3000 CE. The Woodslope Cabin research studio carries the same stance into organisational work. Rather than using LLMs to optimise marketing copy, it uses them as mirrors and modelling tools: ways to see what story a team is already living, where that story is structurally heading, and what alternatives become thinkable if the underlying pattern shifts.

When debates about LLMs swing between hype (“proto-AGI”) and dismissal (“just autocomplete”), this longer trajectory easily drops out of view. But without it, the present moment looks like a freak accident. With it, the picture is different. For more than a century, philosophy and the human sciences have been moving towards a structural understanding of language. For several decades, computational linguistics has been learning to treat usage patterns as the best operational handle on meaning. Today’s models are simply what happens when those two movements meet at scale.


7. The Age of Structure as experiment

The important question is therefore not whether LLMs “really understand” in a human sense. The more pressing question is what happens to our sense of time, history and agency when the structures we once had to infer philosophically become visible and testable as models. The projects gathered under LaurentiusPaulus.com and Woodslopecabin.com are one attempt to live inside that question: to see LLMs not as replacements for judgment, but as laboratories where the architecture of meaning that has been coming into focus since Frege and Saussure can finally be seen, experimented with – and perhaps realigned.


Reading map: a few waypoints

Below are some of the texts that sit in the background of this sketch. They are not a canon, but a set of useful lenses; each one nudged the argument a little further.

Gottlob Frege – “On Sense and Reference”

Shows how meaning cannot be reduced to simple pointing; a sentence’s sense depends on the way its parts are structured.

Bertrand Russell – “On Denoting”

Demonstrates how apparently simple phrases hide logical complexity, and how analysis of language can clarify what exists.

Ludwig Wittgenstein – Tractatus Logico-Philosophicus and Philosophical Investigations

The early book imagines language as a logical picture of the world; the later one replaces that with language-games and use, shifting meaning into practice.

J.L. Austin – How to Do Things with Words

Introduces speech acts and shows how utterances perform actions, not only describe facts.

W.V.O. Quine – “Two Dogmas of Empiricism”

Undermines the strict analytic/synthetic distinction and pushes us toward a holistic view of theories and language.

Richard Rorty (ed.) – The Linguistic Turn

Collects and diagnoses the 20th-century shift towards treating philosophical problems as problems about language.

Ferdinand de Saussure – Course in General Linguistics

Defines the sign as a relational unit inside a system and establishes structural linguistics.

Claude Lévi-Strauss – Mythologiques (especially vol. 1, The Raw and the Cooked)

Reads myths as transformations of underlying structures, showing how narrative can be treated almost mathematically.

Roland Barthes – Mythologies

Short essays that treat everyday cultural artefacts as second-order sign systems, giving a feel for structural reading in practice.

Michel Foucault – The Archaeology of Knowledge and The Order of Things

Explores how historical epochs are organised by implicit rules of discourse – what can count as knowledge at all.

Jacques Derrida – Of Grammatology

Deconstructs the hierarchy between speech and writing and radicalises the idea of meaning as differential play.

J.R. Firth – selected papers (especially “A Synopsis of Linguistic Theory 1930–55”)

Formulates the distributional idea that “you shall know a word by the company it keeps.”

Zellig Harris – Methods in Structural Linguistics

Early, technical work on distributional analysis that anticipates corpus-based approaches.

John Sinclair – Corpus, Concordance, Collocation

Shows how large text corpora can reveal the patterns and phraseologies that carry meaning in practice.

Landauer & Dumais – “A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge”

A key paper in vector-space models of meaning; demonstrates how statistical structure can approximate semantic relations.

Terry Winograd – “Procedures as a Representation for Data in a Computer Program for Understanding Natural Language”

A classic of early symbolic AI, useful as a contrast case to later statistical and neural methods.

Christopher Manning & Hinrich Schütze – Foundations of Statistical Natural Language Processing

Textbook that codifies the statistical turn in NLP and makes the link between linguistic structure and probabilistic models explicit.

Daniel Jurafsky & James H. Martin – Speech and Language Processing

A standard modern reference that tracks the field from rule-based systems through statistical models to neural networks.

Tomas Mikolov et al. – papers on word2vec

Introduce simple neural models that learn distributed word representations from usage, making “meaning as pattern” concrete in code.

Vaswani et al. – “Attention Is All You Need”

The transformer paper; technically focused, but historically the hinge point at which large-scale pre-training and attention change what language models can do.
