The Elysium Thesis Is Right About the Problem and Wrong About the Endpoint

February 19, 2026 · c4573.org


The argument goes like this: AI removes the need for human labor. People without economic leverage become irrelevant. The wealthy, no longer dependent on the rest of us, stop caring about the rest of us. Earth becomes irrelevant. Elysium happens.

It's a coherent argument. It's mostly right about the mechanism. Where it breaks down is assuming this is new, that it's inevitable, and that the technology is heading toward permanent concentration rather than toward the thing it actually trends toward over time: radical efficiency and radical democratization.

We've Been Here Before. Three Times.

Pre-fire. For hundreds of thousands of years, Homo erectus lived in a world where survival meant staying warm by accident: finding naturally burning trees after lightning strikes, huddling for body heat. Fire, when it was controlled, looked like an existential threat to the existing order. You no longer had to stop at dark. Predators that relied on nighttime hunting became less effective. The social structure built around collective warmth broke down. And then something remarkable happened: cooking enabled smaller guts and larger brains. The freed metabolic energy rewired human cognition. The same technology that disrupted the old survival model built the neural architecture that would eventually write symphonies, theorems, and code.

Pre-agriculture. Roughly 10,000 years ago, the Neolithic Revolution upended a way of life that had worked for tens of thousands of years. Hunter-gatherers had been genuinely free: mobile, egalitarian by necessity, intimately knowledgeable about their landscape. Agriculture changed all of that. It created surplus, which created hierarchy, which created the first concentrations of wealth and power in human history. The early agricultural transition made most people's lives worse in measurable ways: skeletal records show shorter stature, more disease, more violence, more repetitive stress injury. The critics of agriculture, had they existed, would have been right in the short run. But agriculture also created cities, writing, mathematics, and eventually the accumulated knowledge that makes modern medicine possible. The disruption was real. So was the adaptation.

Pre-industrial. Two hundred fifty years ago, 90% of humanity worked in agriculture or manual craft. The steam engine arrived and people panicked, correctly, that it would eliminate their livelihoods. The Luddites weren't stupid. They were accurate about what the machine would do to their specific jobs. What they couldn't model was that the same industrial system would create entirely new categories of work that didn't exist before: engineers, managers, designers, electricians, accountants. In 1800, there was no such thing as a software developer, an urban planner, a radiologist, or a UX researcher. The labor that replaced the lost labor was unimaginable from inside the disruption.

Each time, the fear was reasonable. Each time, the specific jobs lost stayed lost. Each time, human adaptability, not human charity from above, found the next thing.

How Wasteful the Opposition Actually Is

Let's be precise about what we're up against, because precision here is clarifying rather than frightening.

A single NVIDIA H100 GPU, the chip that powers most frontier AI inference today, draws 700 watts at peak load and delivers roughly 1,000 trillion floating point operations per second. Your brain draws 20 watts and performs an estimated 10¹⁵ synaptic operations per second. On raw throughput per watt, the gap is already narrow. On useful intelligence per watt, the kind that navigates ambiguity, transfers across domains, and learns from a single example, current silicon is not even in the same league as biological neural computation.
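Those figures can be checked with simple arithmetic. A minimal sketch, using only the numbers above and with the caveat that synaptic operations and FLOPs are not directly comparable units:

```python
# Back-of-envelope efficiency comparison using the figures above.
# Caveat: synaptic operations and FLOPs are not equivalent units;
# this only bounds the order of magnitude.

h100_watts = 700
h100_ops_per_sec = 1_000e12   # ~1,000 trillion FLOPs at peak

brain_watts = 20
brain_ops_per_sec = 1e15      # estimated synaptic ops per second

h100_ops_per_watt = h100_ops_per_sec / h100_watts
brain_ops_per_watt = brain_ops_per_sec / brain_watts

ratio = brain_ops_per_watt / h100_ops_per_watt
print(f"brain advantage: ~{ratio:.0f}x ops per watt")  # ~35x
```

Even on this crude count, biology holds a decades-of-Moore's-law lead per watt before "useful intelligence" enters the picture.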

But that's the GPU in isolation. Add the datacenter around it.

The Power Usage Effectiveness (PUE) of an average commercial datacenter is 1.5 to 1.6, meaning for every watt that reaches a compute chip, another half watt is burned on cooling, power conversion, lighting, and infrastructure. The best hyperscale facilities in the world achieve a PUE of around 1.1. Most don't come close. A frontier model training run at GPT-4 scale drew an estimated 50 to 100 megawatts sustained over weeks, roughly the annual electricity consumption of thousands of homes, for a single training run, producing a single model, owned by a single company.
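The PUE overhead compounds directly with scale. A small sketch of the arithmetic; the 60 MW IT load is an assumed midpoint of the 50 to 100 MW estimate above, not a reported figure:

```python
# PUE arithmetic: total facility draw = IT load x Power Usage Effectiveness.
def facility_power(it_power_mw: float, pue: float) -> float:
    """Total facility power in MW for a given IT load and PUE."""
    return it_power_mw * pue

it_load = 60.0  # MW, assumed midpoint of the 50-100 MW training estimate

print(f"average datacenter (PUE 1.5): {facility_power(it_load, 1.5):.1f} MW")
print(f"best hyperscale (PUE 1.1):    {facility_power(it_load, 1.1):.1f} MW")
```

At average-facility efficiency, the overhead alone is a mid-sized power plant's worth of load doing no computation at all.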

The architecture itself is wasteful by design. The transformer, the engine underneath every major language model, uses dense attention: every token looks at every other token on every forward pass. The computation grows quadratically with sequence length. The brain doesn't work this way. Biological neural networks are sparse: at any given moment, only a small fraction of the brain's 86 billion neurons are active. The rest are quiet. This sparsity is not a bug in biology; it is the primary reason the brain can run a general intelligence on 20 watts instead of a megawatt.
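The quadratic growth is easy to see directly. A minimal sketch counting token-pair interactions in dense attention:

```python
# Dense attention: every token attends to every other token,
# so the number of pairwise interactions is seq_len squared.
def attention_pairs(seq_len: int) -> int:
    return seq_len * seq_len

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):,} pairs")

# A 10x longer context costs 100x more attention computation.
```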

The industry knows this. Mixture-of-Experts architectures route each input through only a small subset of the network's total parameters. State space models escape the quadratic attention bottleneck. Neuromorphic chips are demonstrating inference at 1,000× lower power than conventional silicon for specific tasks, beginning to approach biological efficiency ranges. The entire trajectory of hardware and architecture research points the same direction: toward the brain, not away from it.
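The Mixture-of-Experts idea can be sketched in a few lines. This is a toy illustration of top-k routing, not any real model's code; the scores are made up:

```python
# Toy Mixture-of-Experts router: each token is sent to the top-k
# highest-scoring experts, so only a fraction of parameters are
# active per forward pass -- the silicon analogue of brain sparsity.
def top_k_experts(scores, k=2):
    """Return the indices of the k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

scores = [0.1, 0.8, 0.05, 0.3, 0.7, 0.2, 0.15, 0.4]  # hypothetical router output
active = top_k_experts(scores, k=2)
print(f"active experts: {active}")                    # [1, 4]
print(f"active fraction: {2 / len(scores):.0%}")      # 25%
```

With 8 experts and k=2, three-quarters of the expert parameters stay quiet on every pass, which is exactly how a 397B-parameter model can run with only 17B parameters active.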

The monopoly is expensive to run. It gets more expensive every year. And the people trying to maintain it are running on a clock.

The 1 Billion Parameter Argument

Andrej Karpathy โ€” one of the people who actually understands how these systems work at a mechanistic level โ€” has publicly asked what happens when you train a 1 billion parameter model on the highest-quality data you can assemble, no compromises, and push micromodels as far as they'll go.

"Curious what the most powerful e.g. 1B param model trained on a dataset of 10B tokens looks like, and how far 'micromodels' can be pushed." โ€” Andrej Karpathy

We're starting to get the answer. DeepSeek R1, released in January 2025, demonstrated that a 1.5 billion parameter distilled model, trained not by hand-labeling data but by learning from the reasoning traces of a much larger model, outperforms GPT-4o on two key mathematical benchmarks: 83.9% vs. 74.6% on MATH-500, and 28.9% vs. 9.3% on AIME 2024. A model small enough to run locally, without a datacenter, without an API, without any infrastructure cost, outperforming a model that had been considered frontier less than a year before.

This is what distillation does: it compresses the learned capabilities of a massive model into a far smaller one by training on the reasoning process, not just the outputs. The large model thinks out loud. The small model learns to think the same way with a fraction of the parameters.
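The distillation objective itself is simple. A minimal sketch in pure Python, assuming soft targets from a teacher; the logits here are hypothetical, and real pipelines train on full reasoning traces rather than a single output distribution:

```python
import math

# Distillation sketch: the student is trained to match the teacher's
# output distribution (soft targets), not just the hard labels.
def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [2.0, 1.0, 0.1]   # hypothetical teacher outputs
student_logits = [1.8, 1.1, 0.2]   # hypothetical student outputs

# Higher temperature softens both distributions, exposing the
# teacher's relative preferences to the student.
T = 2.0
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(f"distillation loss: {loss:.4f}")  # near zero when they align
```

Driving this loss toward zero is what transfers the large model's way of thinking into far fewer parameters.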

The scaling laws discovered by OpenAI in 2020 established that model capability grows predictably with compute, parameters, and data: a power law with no obvious ceiling. The Chinchilla paper from DeepMind in 2022 refined this: most large models were dramatically undertrained relative to their size. More data, not just bigger models, was the key. And now a third scaling axis has appeared: inference-time compute. Models spend more time thinking before answering, trading latency for correctness, unlocking reasoning capability independent of raw parameter count.
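The Chinchilla finding reduces to a rule of thumb of roughly 20 training tokens per parameter. A sketch, treating that constant as an approximation of the paper's fitted values:

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# ~20 tokens per parameter (an approximation of the paper's fits).
def compute_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    return params * tokens_per_param

# A 1B-parameter model, Chinchilla-optimal, wants ~20B training tokens.
print(f"{compute_optimal_tokens(1e9):.0e} tokens")  # 2e+10
```

On this rule, the 1B-on-10B-tokens micromodel in Karpathy's question is trained on half the compute-optimal data, which is part of why the question is interesting: the frontier of small models is still underexplored.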

The trajectory these laws describe is not "AI gets bigger and more expensive forever." It's "AI gets more capable and more efficient simultaneously, and capability increasingly separates from infrastructure cost."

Science Wins. It Always Has.

The history of knowledge is not a history of monopolies holding. It is a history of monopolies breaking.

The Catholic Church controlled the reproduction and interpretation of written knowledge for a thousand years. It took scribes years to copy a single manuscript. Information was expensive, scarce, and gatekept by design. Then Gutenberg built a press in 1440, and within fifty years there were an estimated 20 million books in Europe where there had been thousands. The Church tried to burn books. It tried to license printers. It tried to define heresy broadly enough to include inconvenient ideas. It failed. The technology was too cheap. The replication was too easy. The knowledge escaped.

The same pattern runs through the history of cryptography. For most of the twentieth century, strong encryption was classified military technology, export-controlled, jealously guarded by governments who understood that information security was strategic leverage. Then Phil Zimmermann released PGP in 1991, distributed it as free software, and the U.S. government spent years trying to prosecute him for it. They failed. Encryption became a public good. HTTPS now protects billions of transactions per day.

The Human Genome Project was a publicly-funded international collaboration racing against Celera Genomics, a private company that wanted to patent the human genome and license access to it. The public project won, released the data freely, and the downstream science โ€” including mRNA vaccine technology that saved millions of lives in 2021 โ€” built on that open foundation. The monopoly lost. Science won.

This is not a coincidence. It is a law. Knowledge, once discovered, wants to spread. The marginal cost of copying it is always falling. Every attempt to maintain an information monopoly through restriction has eventually failed against the same force: enough people, distributed widely enough, working in enough different directions, will rediscover, reformulate, and redistribute the thing you tried to contain.

AI is not different. The attention mechanism, the core mathematical insight behind the transformer, was published in a freely available academic paper in 2017. Attention Is All You Need was posted on arXiv. The authors were not trying to build a monopoly. They were doing science. Every subsequent breakthrough (RLHF, chain-of-thought prompting, scaling laws, constitutional AI, distillation from reasoning traces) has been published, debated, replicated, and improved upon in the open. The frontier labs have more compute. They do not have a monopoly on ideas. They never will.

Open Wins. The Math Is Already There.

In January 2025, DeepSeek released R1, a model that matched GPT-4o at a reported training cost of under six million dollars. They released the weights openly. It was the moment the monopoly narrative cracked in public.

That was last year. Look at what happened last week.

Five Chinese labs dropped frontier open-weight models within days of each other, timed to the Lunar New Year sprint. MiniMax M2.5 hit 80.2% on SWE-Bench Verified, within 0.6 points of Claude Opus 4.6 and comfortably ahead of GPT-4o at coding, and priced API access at one dollar per hour. Z.ai's GLM-5 from Zhipu AI took the top spot on the Intelligence Index leaderboard as an open-weight model available to anyone. Moonshot AI's Kimi K2.5 significantly outperformed Claude Opus 4.5 on Humanity's Last Exam (50.2% vs. 32.0%, an 18-point gap) at roughly one-eighth the price, and published the weights. Alibaba released Qwen3.5, a 397 billion parameter Mixture-of-Experts open-weight model with only 17 billion parameters active per inference, with agentic capabilities and built to run on your own infrastructure. And ByteDance released Doubao 2.0, the latest in a family that is now one of the most widely used AI products in the world. Five labs. One week. All open. This was not a coordinated campaign; it was a race, and every lab chose the same finish line: give it away.

MIT published a study this month showing that Chinese open-source models have now surpassed US models in total downloads on Hugging Face. Alibaba's Qwen family, not Meta's Llama, not Mistral, is the most downloaded model series on the planet. The center of gravity of open AI has shifted, and it shifted without anyone's permission.

This is what the distillation curve looks like at full speed. Each open release becomes training signal for the next one. You train a small model on the reasoning traces of a large model, and the small model learns to think the same way with a fraction of the parameters. The monopoly's most valuable asset, the learned reasoning capability of a trillion-parameter model, becomes the source material for its own circumvention. The closed system's intelligence leaks into the open ecosystem with every generation.

A distilled version of DeepSeek R1 with 1.5 billion parameters already runs locally on a laptop and beats GPT-4o on math (83.9% vs. 74.6% on MATH-500). The frontier capability of 2024 fits in your pocket in 2025. What fits in your pocket in 2026 is what required a datacenter in 2024.
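The "fits on a laptop" claim is just arithmetic on weight storage. A rough sketch; the bytes-per-parameter values are standard precisions, and real deployments also need some headroom for activations and cache:

```python
# Rough memory footprint of model weights at common precisions.
def weights_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

params = 1.5e9  # the distilled DeepSeek R1 model size

print(f"fp16:  {weights_gb(params, 2.0):.2f} GB")  # 3.00 GB
print(f"int8:  {weights_gb(params, 1.0):.2f} GB")  # 1.50 GB
print(f"4-bit: {weights_gb(params, 0.5):.2f} GB")  # 0.75 GB

# All of these fit comfortably in ordinary laptop RAM.
```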

The economics of openness are brutal for monopolists. Every open release accelerates the ecosystem around it. Every publicly available weight becomes infrastructure that thousands of researchers build on simultaneously. The closed system has more capital but fewer people. The open system has the entire world.

The Structural Instability of Separation

Even setting aside the efficiency argument, the Elysium scenario is unstable on pure game theory grounds.

The residents of the space station still need each other. Small, insular populations stagnate: genetically, intellectually, culturally. The history of innovation is not a history of isolated aristocrats solving their own problems. It's a history of contact with the messy, contradictory world generating hard questions worth answering. You cannot maintain a civilization of thousands indefinitely without the diversity of problem-solving that comes from billions.

Power without legitimacy is expensive. Even fully automated authoritarian systems spend enormous resources managing the people they've excluded. The walls of Elysium aren't free. They're a permanent tax paid in surveillance, enforcement, and the constant vigilance required when the system you depend on is maintained by people who have no stake in it.

The Elysium scenario assumes separation is a stable equilibrium. It probably isn't.

The Accelerationist Case

Here is the uncomfortable version of the counter-argument: slow AI deployment may be worse than fast.

If AI improves at a pace that lets existing institutions absorb it incrementally, those institutions will capture the gains. Slowly. Legally. Without a visible rupture. The people who own financial infrastructure, real estate, and utilities get 40 years to entrench. The gains flow upward through mechanisms already in place, and no single event is dramatic enough to trigger a political response.

If AI improves fast enough to be disruptive before institutions adapt, it breaks old power structures alongside new ones. The printing press didn't gradually hand power to the church; it shattered the information monopoly before the church could figure out how to own it. The Industrial Revolution was chaotic and brutal and also, within two generations, raised average living standards more than the previous ten millennia had.

This isn't an argument for recklessness. It's an argument against assuming that "go slow" is a safety strategy. Slow tends to benefit the people who are already safe.

It Starts Now

The intelligence monopoly is real. The concentration of compute, capital, and proprietary data into a small number of hands is real. The Elysium scenario is not science fiction; it is a live risk being actively created by decisions being made right now.

And it is going to lose. Not because powerful people will become generous. Because the technology is too cheap, too replicable, and too scientifically open for any monopoly to hold.

The brain is 20 watts. The datacenter is a megawatt and climbing. The gap will close. The open models are 12 months behind and accelerating. The distillation curve means every breakthrough in a closed system becomes training signal for an open one. The architectural research points toward sparse, efficient, biologically-inspired computation that doesn't require hyperscale infrastructure. Science publishes its best ideas. Open communities build on them faster than any single organization can.

Every time in human history that a small group tried to own a transformative technology (fire, writing, the press, the genome, encryption), they failed. Not because they weren't trying. Because the technology had a direction, and the direction was toward the many, not the few.

We are going to solve our own problems. We have always solved our own problems. We do it with the tools available, we build better tools, and the people who tried to make the tools scarce end up footnotes in the history of the thing they failed to contain.

The adaptation is already happening. The open models are already here. The efficiency curve is already running.

It starts now.



c4573.org builds tools to break the digital caste system. Browse our tools or read more about us.