# The Technology of Liberalism

*Rudolf Laine, [No Set Gauge](https://www.nosetgauge.com), January 5, 2026. [Original post.](https://www.nosetgauge.com/p/the-technology-of-liberalism)*
y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Dido Building Carthage</em>, J. M. W. Turner</figcaption></figure></div><blockquote><p>In every technological revolution, we face a choice: build for freedom or watch as others build for control.<br>-<a href="https://www.spectator.co.uk/article/how-can-we-develop-ai-that-helps-rather-than-harms-people/">Brendan McCord</a></p></blockquote><p>There are two moral frames that explain modern moral advances: utilitarianism and liberalism.</p><p>Utilitarianism says: &#8220;the greatest good for the greatest number&#8221;. It says we should maximize welfare, whatever that takes.</p><p>Liberalism says: &#8220;to each their own sphere of freedom&#8221;. We should grant everyone some boundary that others can&#8217;t violate, for example over their physical bodies and their property, and then let whatever happen as long as those boundaries aren&#8217;t violated.</p><p>(Before the philosophers show up and cancel me: &#8220;utilitarianism&#8221; and &#8220;liberalism&#8221; here are labels for two views, that correspond somewhat but not exactly to the normal uses of the phrases. The particular axis I talk about comes from my reading of Joe Carlsmith, who I&#8217;ll quote at length later in this post. Also, for the Americans: <a href="https://en.wikipedia.org/wiki/Liberalism">&#8220;liberalism&#8221;</a> is not a synonym for &#8220;left-wing&#8221;.)</p><p>Some of the great moral advances of modernity are:</p><ul><li><p>women&#8217;s rights</p></li><li><p>abolition of slavery</p></li><li><p>equality before the law for all citizens of a country</p></li><li><p>war as an aberration to be avoided, rather than an <a href="https://en.wikipedia.org/wiki/Carl_von_Clausewitz">&#8220;extension of politics by other means&#8221;</a></p></li><li><p>gay rights</p></li><li><p>the increasing salience of animal welfare</p></li></ul><p>Every one of these can be seen as either a fundamentally utilitarian or fundamentally liberal project. For example, viewing wars of aggression as bad is both good for welfare (fewer people dying horrible deaths way too early), and a straightforward consequence of believing that <a href="https://quoteinvestigator.com/2011/10/15/liberty-fist-nose/">&#8220;your right to swing your fist ends where my nose begins&#8221;</a> (either on a personal or national level). Women&#8217;s rights are both good for welfare through everything from directly reducing gender-based violence to knock-on effects on economic growth, as well as a straightforward application of the idea that people should be equal in the eyes of the law and free to make their own choices and set their own boundaries.</p><p>Utilitarian aims are a consequence of liberal aims. If you give everyone their boundaries, their protection from violence, and their God and/or government -granted property rights, then Adam Smith and positive-sum trade and various other deities step in and <a href="https://en.wikipedia.org/wiki/Fundamental_theorems_of_welfare_economics">ensure</a> that welfare is maximized (at least if you&#8217;re willing to grant a list of assumptions). More intuitively and generally, everyone naturally tries to achieve what they want, and when you ban boundary-violating events, everyone will be forced to achieve what they want through win-win cooperation rather than, say, violence.</p><p>Liberal aims are a consequence of utilitarian aims. 
Liberal aims are a consequence of utilitarian aims. If you want to maximize utility, then Hayek and the political scientists and the moral arc of humanity over the last two centuries all show up and demand that you let people choose for themselves, and have their own boundaries and rights. Isn’t it more efficient when people can make decisions based on the local information they have? How many giga-utils have come from women being able to pursue whatever job they want, or gay people being free from persecution? Hasn’t the wealth of developed countries, and all the welfare that derives from it, come from institutions that ensure freedom and equality before the law, enforce a stable set of rules, and avoid arbitrary despotism?

Much discussion of utilitarianism focuses on things like trolley problems that force you to pick between welfare losses and boundary violations. Unless you happen to live near a trolley track frequented by rogue experimental philosophers, however, you’ll notice that *such circumstances basically never happen* (if you are philosophically astute, you’ll also be aware that as a real-life human you [shouldn’t violate the boundaries *even if it seems like a good idea*](https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans)).

However, as with everything else, the [tails come apart](https://slatestarcodex.com/2018/09/25/the-tails-coming-apart-as-metaphor-for-life/) when things get extreme enough. The concept of good highlighted by welfare and the one highlighted by freedom do diverge in the limit. Will we get pushed onto such extreme philosophical terrain in the real world, though? I believe we don’t need to worry about a sudden influx of rogue experimental philosophers, or a trolley-track construction spree. But we might need to worry about the god-like AI that [every](https://darioamodei.com/machines-of-loving-grace) [major](https://ia.samaltman.com/) [AGI lab](https://x.com/polynoamial/status/1825024830225391818?lang=en) [leader](https://www.reuters.com/technology/deepseek-founder-liang-wenfeng-puts-focus-chinese-innovation-2025-01-28/), the [prediction](https://manifold.markets/ManifoldAI/agi-when-resolves-to-the-year-in-wh-d5c5ad8e4708) [markets](https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/), and the [anonymous internet folk who keep turning out annoyingly right about everything](https://gwern.net/scaling-hypothesis) warn us might be coming in a few years.

The Effective Altruists, to their strong credit, have taken the intersection of AI and moral philosophy seriously for years. However, their main approach has been to perfect alignment (the ability to reliably point an AI at a goal) while in tandem figuring out the correct moral philosophy to align the AI to, such that we point it at the right thing. Not surprisingly, there are some unresolved debates around the second bit. (Also, to a first approximation, the realities of incentives mean that the values that get encoded into the AI will be whatever AI lab leadership wants, modulo whatever changes are forced on them by customers or governments, rather than what the academics cook up as ideal.)

In this post, I do not set out to solve moral philosophy.
In fact, I don’t think “solve moral philosophy” is a thing you can (or should) do. Instead, my concern is that **near-future technology, and AI in particular, may by default differentially accelerate utilitarian over liberal goals**. My hope is that differential technological development (speeding up some technologies over others) can fix this imbalance, and help continue moral progress towards a world that is good by many lights, rather than just one.

## The real Clippy is the friends we made along the way

Joe Carlsmith has an excellent essay series, “[Otherness and control in the age of AGI](https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi)”. It’s philosophy about vibes, but done well, and thoroughly human to the core.

He starts off with a reminder: [Eliezer Yudkowsky thinks we are all going to die](https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities). We don’t know how to make the AIs precisely care about human values; AI capabilities will shoot up without a corresponding increase in their caring about our values, and they will devour the world. A common thought experiment is the paperclip-maximizer AI, sometimes named “Clippy” after the [much-parodied Microsoft Office feature](https://en.wikipedia.org/wiki/Office_Assistant). The point of the thought experiment is that optimizing hard for anything (e.g. paperclips) entails taking over the universe and filling it with that thing, destroying everything else in the process. In his essay “[Being nicer than Clippy](https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy)”, Carlsmith writes:

> Indeed, in many respects, Yudkowsky’s AI nightmare is precisely the nightmare of all-boundaries-eroded. The nano-bots eat through every wall, and soon, everywhere, a single pattern prevails. After all: what makes a boundary bind? In Yudkowsky’s world (is he wrong?), only two things: hard power, and ethics. But the AIs will get all the hard power, and have none of the ethics. So no walls will stand in their way.

The reason the AIs will be such paperclipping maximizers is that [Yudkowsky’s philosophy](https://nosetgauge.substack.com/i/153676731/the-philosophy) emphasizes some math that points towards: “If you aren’t running around in circles or stepping on your own feet or wantonly giving up things you say you want, we can see your behavior as corresponding to [expected utility maximization]” ([source](https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities)). Based on this, the Yudkowskian school thinks that the only way out is very precisely encoding the *right* set of terminal values into the AI. This, in turn, is harder than teaching the AIs to be smart overall, [because](https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities#Section_B_2___Central_difficulties_of_outer_and_inner_alignment_): “There’s something like a single answer, or a single bucket of answers, for questions like ‘What’s the environment really like?’ [... and so w]hen you have a wrong belief, reality hits back at your wrong prediction [...] In contrast, when it comes to a choice of utility function, there are unbounded degrees of freedom”.
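The “running around in circles” clause is doing the work in that argument. The classic illustration is the money pump: an agent whose preferences cycle can be led through a sequence of trades, each one an “upgrade” by its own lights, that leaves it back where it started and strictly poorer. A minimal sketch (my own toy example; the items and the fee are invented):

```python
# Money pump: an agent with cyclic preferences (A < B < C < A) happily
# pays a small fee for each "upgrade", so a trader can walk it in
# circles forever, draining its wealth while it ends up where it began.

FEE = 1  # hypothetical price the trader charges per swap

# Cyclic preferences: the agent swaps x for prefers[x] whenever offered.
prefers = {"A": "B", "B": "C", "C": "A"}

def run_pump(holding: str, wealth: int, rounds: int) -> tuple[str, int]:
    for _ in range(rounds):
        holding = prefers[holding]  # accept the preferred item...
        wealth -= FEE               # ...and pay for the privilege
    return holding, wealth

item, money = run_pump("A", wealth=100, rounds=30)
print(item, money)  # -> A 70: same item as at the start, 30 poorer
```

An agent with a consistent (acyclic) ranking of outcomes refuses to complete the cycle and can’t be pumped; that is the sense in which, on this view, behavior that avoids such self-defeat “corresponds to” maximizing some utility function.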
So: anything smart enough is a rocket aimed at some totalizing arrangement of all matter in the universe. You need to aim the rocket at exactly the right target, because otherwise it flies off into some unrelated void of space and you lose everything.

To be clear, to Yudkowsky this is a *factual prediction* about the behavior of things that are smart enough, rather than a *normative statement* that the correct morality is utilitarian in this way. Big, and very inconvenient, if true.

Now at this point you might remember that even some *humans* disagree with each other about what utopia should look like. Of course, a standard take in AI safety is that the main technical problem we face is pointing the AIs reliably at anything at all, and that arguing over the “monkey politics” (as Carlsmith puts it) is therefore pointless politicking that detracts from humanity’s technical challenge.

However, in [another essay](https://joecarlsmith.com/2024/01/11/an-even-deeper-atheism) in his sequence, Carlsmith points out that *any* size of value difference between two agents could lead to one being a paperclipper by the lights of the other:

> What force staves off extremal Goodhart in the human-human case, but not in the AI-human one? For example: what prevents the classical utilitarians from splitting, on reflection, into tons of slightly-different variants, each of whom use a slightly-different conception of optimal pleasure (hedonium-1, hedonium-2, etc)? And wouldn’t they, then, be paperclippers to each other, what with their slightly-mutated conceptions of perfect happiness?

This is not just true between agents, but also between you at a certain time and you at a slightly later time:

> [..] if you read a book, or watch a documentary, or fall in love, or get some kind of indigestion [...] your heart is *never exactly the same ever again*, and not because of Reason [...so] then the only possible vector of non-trivial long-term value in this bleak and godless lightcone has been snuffed out?! Wait, OK, I have a plan: this precise *person-moment* needs to become dictator. It’s rough, but it’s the only way. Do you have the nano-bots ready? Oh wait, too late. (OK, how about now? Dammit: doom again.)

Now, this isn’t Yudkowsky’s view. But why not? Remember: in the Yudkowskian frame, for any agent to “want” something coherently, it must have a totalizing vision of how to structure the entire universe. And who is to say that even small differences in values don’t lead to different ideal universes, especially under over-optimization to the limit of those values? After all, as Yudkowsky repeatedly emphasizes, [value is fragile](https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-fragile).
However, the overall framework, Carlsmith argues, creates a “momentum towards deeming more and more agents (and agent-moments) paperclippers”. It makes it very natural to conclude that we’ve just got to know exactly what is valuable, well enough to write it down in a form without contradictions that we can optimize ruthlessly without breaking it. And we should do this *as soon as possible*, because our values might change, or because another entity might get to do it first, and chances are they’d be a paperclipper with respect to us. Time to take over the universe, for the greater good!

What’s the alternative? In [the original Carlsmith essay](https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy), he writes:

> Liberalism does not ask that agents sharing a civilization be “aligned” with each other in the sense at stake in “optimizing for the same utility function.” Rather, it asks something more minimal, and more compatible with disagreement and diversity – namely, that these agents respect certain sorts of boundaries; that they agree to transact on certain sorts of cooperative and mutually-beneficial terms; that they give each other certain kinds of space, freedom, and dignity. Or as a crude and distorting summary: that they be a certain kind of nice. Obviously, not all agents are up for this – and if they try to mess it up, then liberalism will, indeed, need hard power to defend itself. But if we seek a vision of a future that avoids Yudkowsky’s nightmare, I think the sort of pluralism and tolerance at the core of liberalism will often be a more promising guide than “getting the utility function that steers the future right.”

So, then: we have Yudkowsky, claiming that, as much as [we humans benefit](https://x.com/esyudkowsky/status/819678205900509184?lang=bn) from centering rights and virtues in our idea of the good, the AIs will be superhumanly intelligent, and it is the nature of rationality that beings bend more and more towards a specific type of “coherence” – and hence utilitarian consequentialism – as they get smarter.
[Image: a Yudkowsky tweet. Yudkowsky is *not* an arch-utilitarian when it comes to how humans should act. But in his worldview, reality, alas, prefers the consequentialists. ([Source](https://x.com/ESYudkowsky/status/819678205900509184))]

So, this then is the irony: Yudkowsky is a deep believer in humanism and liberalism. But his philosophical framework makes him think anything sufficiently smart becomes a totalizing power-grabber. The Yudkowskian paperclipper nightmare explicitly comes from a lack of liberalism, in the rules-and-boundaries sense. Yudkowsky’s ideal solution would be to figure out how to encode non-utilitarian limits into the AI: “corrigibility”, for example, means that the AI doesn’t resist being corrected, and is something [Yudkowsky and others have spent a lot of time trying to make sense of within the expected-utility-maximizer paradigm](https://intelligence.org/files/Corrigibility.pdf). But that seems deeply technically difficult.
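To see why corrigibility fights the expected-utility frame, consider a toy decision problem (my own sketch; the numbers and the action set are invented for illustration). An agent that ranks actions purely by expected goal-utility will resist shutdown whenever staying on scores higher:

```python
# Toy "shutdown problem": a pure expected-utility maximizer compares
# actions only by expected goal-utility, so unless complying with
# shutdown scores *exactly* as high as working, it either resists
# shutdown or perversely seeks it out.

P_SHUTDOWN = 0.2   # chance the humans press the off-switch this step
WORK_UTILITY = 10  # utility of an uninterrupted step of goal-pursuit
TAMPER_COST = 1    # small effort cost of disabling the switch

def expected_utility(action: str) -> float:
    if action == "comply":
        # With prob P_SHUTDOWN the agent is switched off (utility 0);
        # otherwise it works as normal.
        return (1 - P_SHUTDOWN) * WORK_UTILITY
    if action == "disable_switch":
        # Pay the tampering cost once, then work with certainty.
        return WORK_UTILITY - TAMPER_COST
    raise ValueError(action)

best = max(["comply", "disable_switch"], key=expected_utility)
print(best)  # -> disable_switch (9.0 beats 8.0)
```

Making “comply” win for *every* shutdown probability, without the agent then manipulating whether the switch gets pressed at all, is the hard part; that is roughly the puzzle the corrigibility paper linked above wrestles with.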
These philosophical vibes, plus a framework within which non-consequentialist principles are deeply unnatural, have given the AI safety space an ethos where the only admissible solution is getting better at instilling values into an AI, and then arguing about which values those should be. The AI is a rocket shooting towards expected utility maximization, and if it hits even slightly off, it’s all over – so let’s aim the rocket very precisely! Let’s make sure we can steer it! Let’s debate whether the rocket should land *here* or *there*. But if the threat we care about is the all-consuming boundary violations... maybe we shouldn’t build a *rocket*? Maybe we should build something less totalizing. Maybe we shouldn’t try to reconfigure all matter in the universe (“tile the universe”) into some perfectly optimal end-state. Ideally, we’d just boost the capabilities of humans who engage with each other under a liberal, rule-of-law world order.

The point Yudkowsky fundamentally makes, after all, is that you shouldn’t [“grab the poison banana”](https://x.com/dwarkesh_sp/status/1644347895116881921); that *hardcore utility-maximization is very inconvenient if you want anything or anyone to survive*, or if you think that there’s the slightest chance you’re wrong about your utility function.

There’s also a more cynical way to read the focus on values over constraints in AI discourse. If you build the machine-god, you will gain enormous power, and can reshape the world by your own lights. If you think you might be in the winning faction in the AI race, you want to downplay the role of constraints and boundaries, and instead draw the debate towards which exact set of values should be prioritized. This tilts your faction’s thinking a bit towards your values, while increasing the odds that your faction is not constrained. If you’re an AI lab CEO, how much better is it for you that people are debating what the AI should be set to maximize, or whether it should be woke or unwoke, rather than how the AI should be constrained, or how we should make sure it doesn’t [break the social contract that keeps governments aligned](https://intelligence-curse.ai/shaping/) or [permanently entrench the ruling class in a way that stunts social & moral progress](https://intelligence-curse.ai/capital/)?

Of course, I believe that most who talk about AI values are voicing genuine moral concerns rather than playing power games, especially since the discourse really does lean towards the utilitarian over the liberal side, and since the people who (currently) most worry about extreme AI scenarios often have a consequentialist bent to their moral philosophy.

But even in the genuineness there’s a threat. Voicing support for simple, totalizing moral opinions is a status signal among the tech elite, perhaps derived from the bullet-biting literal-mindedness that is genuinely so useful in STEM. The [e/accs](https://en.wikipedia.org/wiki/Effective_accelerationism) loudly proclaim their willingness to obey the [“thermodynamic will of the universe”](https://beff.substack.com/p/notes-on-eacc-principles-and-tenets).
This demonstrates their loyalty to the cause (meme?) and their belief in techno-capital (and, conveniently, their willingness to politically align with the rich VCs whose backing they seek). What about their transparent happiness for the techno-capital singularity to rip through the world and potentially create a [world without anyone left to enjoy it](https://nosetgauge.substack.com/p/a-disneyland-without-children)? It’s worth it, because thermodynamics! Meanwhile, among the AI safety crowd, it’s a status signal to know all the arguments about coherence and consequentialism inside out, and to be willing to bite the bullets these imply. But sometimes the lack of a short argument against something means not that it is correct, but that the counter-argument is subtle, or [outside the paradigm](https://arxiv.org/abs/2408.16984).

## The limits and lights of liberalism

I’m worried about leaving the impression that liberalism, boundaries, and niceness are an obvious and complete moral system, while utilitarianism is this weird thing that leads to paperclips. But like utilitarianism, liberalism breaks when taken to an extreme.

First, it’s hard to pin down what it means. Carlsmith [writes](https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy):

> [A]n even-remotely-sophisticated ethics of “boundaries” requires engaging with a ton of extremely gnarly and ambiguous stuff. When, exactly, does something become “someone’s”? Do wild animals, for example, have rights to their “territory”? See [all of the philosophy of property](https://plato.stanford.edu/entries/property/) for just a start on the problems. And aspirations to be “nice” to agents-with-different-values clearly need ways of balancing the preferences of different agents of this kind – e.g., maybe you don’t steal Clippy’s resources to make fun-onium; but can you tax the rich paperclippers to give resources to the multitudes of poor staple-maximizers? Indeed, remind me your story about the ethics of taxation in general?

Second, there are things we may want to ban, or at least discourage, even if doing so means interventionism. Today we (mostly) think that a genocide warrants an international response, that you *should* catch the person jumping off a building to kill themself, and that various forms of paternalism are warranted, especially towards children. In the future, presumably, there will be things people could do that are bad enough by our lights that we can’t morally tolerate them in our universe.

Third, the moral implications of boundaries and rights change with technology. For example, today cryptocurrency might be helpful for people in unstable low-income countries, but tomorrow scheming AI systems might take advantage of the existence of a decentralized, uncontrollable payments rail to bribe, blackmail, and manipulate in ways we wish we hadn’t given them the affordances to do.
Today we might be completely fine giving everyone huge amounts of compute under their full control, but tomorrow it might become possible to simulate digital minds, and suddenly we want to make sure no one is torturing simulated people. Historically, too, it’s worth noting that while almost every place and time would’ve benefited from *more* liberalism and rule of law, sometimes it’s important that there are carefully-guarded mechanisms for tinkering with those foundations. What set British institutions apart in the 18th and 19th centuries was not just that they were more laissez-faire and granted better rights than those of other countries of the age, but also that they were able to adapt in times of need. In the US, FDR’s reforms required heavy-handed intervention and the forcing through of major changes in government, but were (eventually) successful at dragging the country out of the Depression. Locking in inviolable rights is a type of lock-in too.

Fourth, liberalism has a built-in tension between a decentralizing impulse and a centralizing impulse. The decentralization comes from the fact that the whole point is that everyone is free. The centralization comes from things like needing a government with a monopoly on force to protect the boundaries of the people within it; otherwise it’s not a world of freedom, but of might-makes-right. A lot of our civilization’s progress in governance mechanisms comes from improving the Pareto frontier of tradeoffs we can make between centralization and decentralization (I heard this point from [Richard Ngo](https://x.com/RichardMCNgo), in a talk that is unpublished as far as I know):
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/420f6bae-b8a3-442b-9ba6-2db679902507_1048x856.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:856,&quot;width&quot;:1048,&quot;resizeWidth&quot;:492,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Pasted image 20250712122729.png&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Pasted image 20250712122729.png" title="Pasted image 20250712122729.png" srcset="https://substackcdn.com/image/fetch/$s_!ImtI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F420f6bae-b8a3-442b-9ba6-2db679902507_1048x856.png 424w, https://substackcdn.com/image/fetch/$s_!ImtI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F420f6bae-b8a3-442b-9ba6-2db679902507_1048x856.png 848w, https://substackcdn.com/image/fetch/$s_!ImtI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F420f6bae-b8a3-442b-9ba6-2db679902507_1048x856.png 1272w, https://substackcdn.com/image/fetch/$s_!ImtI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F420f6bae-b8a3-442b-9ba6-2db679902507_1048x856.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Institution design (whether through politics or technology) is often about making the tradeoff between autocracy and anarchy risks less harsh.</figcaption></figure></div><p>For example, democracy allows for centralizing more power in a government (reducing anarchy risk) without losing people&#8217;s ability to occasionally fire the government (reducing autocracy risk). 
However, the tension remains. We want inviolable rights for individuals, but we also need some way to enforce those rights. Unless technology can make that enforcement fully decentralized (or [David Friedman](https://slatestarcodex.com/2015/03/18/book-review-the-machinery-of-freedom/) is [right](https://www.econlib.org/archives/2008/07/david_friedmans.html)), that means we need at least some locus of central control, and this locus of course becomes the target of every seeker of power and rent. And remember, we want everything to be adaptable if circumstances change – but only for *good* reasons.

On a more fundamental level, everything being permitted is not necessarily a stable state. If, in free competition, there exist patterns of behavior that are better at gaining power than others, then you have to accept that those patterns will command a disproportionate share of future value. This leads to two concerns. First, as Carlsmith points out in his talk [“Can goodness compete?”](https://joecarlsmith.substack.com/p/video-and-transcript-of-talk-on-can), maximizing for goodness and maximizing for power are different aims, so the optimum for one is likely not the optimum for the other; in fully free competition, on this very simplified model, we lose on goodness. Second, if everything being permitted leads to competition in which some gain power until they are in a position to violate others’ boundaries, then full liberalism brings about its own demise. It’s not a stable state.

Fifth, there’s a neutrality to liberalism. We give everyone their freedoms, we make everyone equal before the law – but what do we actually do? Whatever you want; that’s the point! And what do you want? Well, that’s your problem – we liberals just came here to set up the infrastructure, like the plumber comes to install your pipes. It’s your job to decide what runs through them, and to use them well.

Now, utilitarianism suffers from this problem too. What is utility? It’s what you like or prefer. But what are those things? Presumably, you’re supposed to just look inside yourself. But for what? Do you really know? The most hardcore hedonic utilitarians will claim it’s obvious: pleasure minus pain. But this is of course overly reductive, unless either you are a very simple person (lucky you – but what about the rest of us?)
or you take a very expansive view of what pleasure and pain are (but then you’re back to the problem of defining them in a sufficiently expansive, yet precise, way).

Putting aside the abstract moral philosophies: in yet another essay, Carlsmith [mentions](https://joecarlsmith.com/2021/07/19/in-search-of-benevolence-or-what-should-you-get-clippy-for-christmas#iv-johnny-appleseeds-of-welfare) the [“Johnny Appleseeds”](https://en.wikipedia.org/wiki/Johnny_Appleseed) of the world:

> Whatever your agricultural aspirations, he’s here to give you apple trees, and not because apple trees lead to objectively better farms. Rather, apple trees are just, as it were, his thing.

There’s something charming about this image. At the end of the day you need something worth fighting for. Everyone who has actually inspired people has preached not just an abstract set of rules, but has breathed life into an entire worldview and way of life. Pure liberalism does not tell you what the content of life should be; it only says that in the pursuit of that content you should not take away other people’s freedom to pursue theirs. At times you need the abstraction-spinning philosophers and even-handed policemen and hair-splitting lawyers to come and argue for and enforce the boundaries. But they are infrastructure, not content. The farmer told by the philosopher to believe in liberalism, or shown by the police and the lawyers where their plot ends and their neighbor’s begins, does not thereby gain the content that animates a life. You also need a Johnny Appleseed, whether externally or within you, who preaches something specific, something concrete: look at this apple, feel its texture, smell its scent – *this* is what you should grow.

Much as there’s a distinction between the pipes as infrastructure and the water in the pipes, there’s a distinction between the structure of our moral system (or, to use a computer science metaphor, its [type](https://en.wikipedia.org/wiki/Type_theory)) and what we actually care about. What people really fight for *is* sometimes the abstract principle, but almost always there is something real and concrete too. What exactly this is, and how different people find theirs, is subtle and complicated. The strength of liberalism is that it *doesn’t* claim an answer (remember: there is [virtue to narrowness](https://www.lesswrong.com/posts/yDfxTj9TKYsYiWH5o/the-virtue-of-narrowness)). Whatever those real and concrete actually-animating things are, they seem hard to label and put in a box. So instead: don’t try to specify them, don’t try to dictate them top-down; just lay down the infrastructure that lets people discover them on their own. In a word: freedom.

## Weaving the rope

So how should we think about moral progress? A popular approach is to pick one tenet, such as freedom or utility, and declare it the One True Value. Everything else is good to the extent that it reflects that value, and bad to the extent that it violates it.
You can come up with very good stories of this form, because [words are slippery](https://www.lesswrong.com/posts/yA4gF5KrboK2m2Xu7/how-an-algorithm-feels-from-inside) and humans are very good at spinning persuasive stories – didn’t the Greeks get their start in philosophy with takes like “everything is water” or “everything is change”? If there are annoying edge cases – trolley problems for the utilitarians, [anti-libertarian thought experiments](https://slatestarcodex.com/2018/02/21/current-affairs-some-puzzles-for-libertarians-treated-as-writing-prompts-for-short-stories/) for the liberals – then you have to either [swallow the bullets or try to dodge them](https://scottaaronson.blog/?p=326) as best you can. But all the while you gain the satisfaction that you’ve solved it, you’ve seen the light, the ambiguity is gone (or, at least in principle, could one day be gone if only we carry out some known process long enough) – because you know the One True Value.

But, as Isaiah Berlin [wrote](https://www.cooperative-individualism.org/berlin-isaiah_positive-versus-negative-liberty-1958.htm):

> One belief, more than any other, is responsible for the slaughter of individuals on the altars of the great historical ideals [...]. This is the belief that somewhere, in the past or in the future, in divine revelation or in the mind of an individual thinker, in the pronouncements of history or science, or in the simple heart of an uncorrupted good man, there is a final solution.

An alternative image to the One True Value is making a rope. While the rope is being plied, its threads are laid out on the table, pointing in different directions, not yet twisted into a single strand. The liberal idea of boundaries and rights is one thread, and it’ll go somewhere into the finished thing, as will the thread of utilitarian welfare, and others as well. We’ve woven parts of this rope already: witness thinkers from the Enlightenment philosophers onwards making lots of progress plying together freedom and welfare and fairness and other things, in ways that (say) medieval Europeans probably couldn’t have imagined. How do we continue making progress? Rather than declaring one thread the correct one for all time, I think the good path forward looks more like continuing to weave together different threads and moral intuitions. I expect where we end up to depend at least somewhat on the path we take. I expect progress to be halting and uncertain; I expect sometimes we might need to unweave the last few turns. I expect that where we eventually end up might look weird to us, much as modernity would look weird to medieval people. But I probably trust this process, at least if it’s protected from censorship, extremism, and non-human influences. I trust it more than I trust the results of a thought experiment, or any one-pager on the correct ideology. I certainly trust it far more than I trust a committee of philosophers, or an AI lab ethics board, or an inhuman AI, to figure it out and then lock it in forever.

I hope you share the intuition above, and that the intuition alone is compelling. But can we support it?
Is there a deeper reason *why* we should expect morality to work like that, and to be hard to synthesize into exactly one shape?

The first reason to think this is that, empirically, human values just seem to be subtle and nuanced. Does the nature of the good *seem* simple to you? Could you design a utopia you felt comfortable locking in for all eternity? Hell is simple and cruel, while heaven is devious and subtle.

A second reason is that over-optimization is treacherous. This is the same instinct that leads Yudkowsky to think the totalizing AIs will rip apart all of us, and that leads Carlsmith to spend so many words poking at whether there’s something nicer and less treacherous if only we’re more liberal, more [“green” (in his particular sense of the word)](https://joecarlsmith.com/2024/03/21/on-green), nicer. Paul Christiano, one of the most prominent thinkers in AI alignment and an inventor of RLHF, centers his [depiction of AI risk](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like) on “getting what you measure” all too well. But why? [Goodhart’s law](https://en.wikipedia.org/wiki/Goodhart%27s_law) is one idea: over-optimize any measure, and the measure stops being a good measure. Another way to put this is that even if two things are correlated across a normal range of values, they decorrelate – the “tails come apart” – at extreme values, as [LessWrong user Thrasymachus](https://www.lesswrong.com/posts/dC7mP5nSwvpL65Qu5/why-the-tails-come-apart) (presumably a different person from the ancient philosopher) observed and [Scott Alexander expanded on](https://slatestarcodex.com/2018/09/25/the-tails-coming-apart-as-metaphor-for-life/).
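The effect is easy to reproduce numerically. A minimal sketch (my own; the variables and noise levels are made up for illustration): draw two proxies that track the same underlying quantity and are strongly correlated overall, then look only at the extreme tail of one of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two proxies of the same underlying "good", each corrupted by
# independent noise; their overall correlation is ~0.8.
good = rng.normal(size=n)
a = good + 0.5 * rng.normal(size=n)
b = good + 0.5 * rng.normal(size=n)

print(np.corrcoef(a, b)[0, 1])              # ~0.8 across the whole range

tail = a > np.quantile(a, 0.999)            # the extreme tail of proxy a
print(np.corrcoef(a[tail], b[tail])[0, 1])  # far weaker within the tail

best_by_a = np.argmax(a)                    # optimize a to its maximum...
print(b[best_by_a], b.max())                # ...and b is left well short
```

Across the normal range the two proxies move in lockstep, but the point that maximizes one essentially never maximizes the other: Goodhart’s law in miniature.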
In the case of morality, for example:

[Image: a “tails come apart” diagram, taken from Scott Alexander [here](https://slatestarcodex.com/2018/09/25/the-tails-coming-apart-as-metaphor-for-life/)]
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">(Image taken from Scott Alexander <a href="https://slatestarcodex.com/2018/09/25/the-tails-coming-apart-as-metaphor-for-life/">here</a>)</figcaption></figure></div><p>Within the range of normal events, moral theories align on what is good. When it comes to abnormal events&#8212;especially abnormal events derived from taking one moral theory&#8217;s view of what is good and cranking everything to eleven&#8212;the theories disagree. The Catholic Church and the hedonic utilitarians both want to help the poor, but the utilitarians are significantly less enthusiastic about various Catholic limits on sexuality, and the Catholics are significantly less enthusiastic about everything between here and Andromeda converting to an orgiastic mass of brain tissue.</p><p>So if our choice is about <em>which</em> ideal we let optimize the world, and if that optimization is strong enough, we run a real risk of causing unspeakable horrors by the lights of other ideals <em>regardless of which ideal we pick</em>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IKRs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IKRs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 424w, https://substackcdn.com/image/fetch/$s_!IKRs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 848w, https://substackcdn.com/image/fetch/$s_!IKRs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 1272w, https://substackcdn.com/image/fetch/$s_!IKRs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!IKRs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png" width="596" height="552.4889643463497" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1178,&quot;resizeWidth&quot;:596,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Screenshot 2026-01-04 at 01.45.47.png&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Screenshot 2026-01-04 at 01.45.47.png" title="Screenshot 2026-01-04 at 01.45.47.png" srcset="https://substackcdn.com/image/fetch/$s_!IKRs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 424w, https://substackcdn.com/image/fetch/$s_!IKRs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 848w, https://substackcdn.com/image/fetch/$s_!IKRs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 1272w, https://substackcdn.com/image/fetch/$s_!IKRs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bca4173-b35b-4645-90d2-8237b1f6f5a1_1178x1092.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The red/blue path represents where we go if we steer entirely by moral theory A/B. </figcaption></figure></div><p>You might think what we want, then, is compromise. 
A middle path:</p><figure><img src="https://substackcdn.com/image/fetch/$s_!wDqt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb70797be-6565-4851-912e-333eab868907_1270x1222.png" width="532" alt="Screenshot 2026-01-04 at 01.41.43.png"></figure><p>And yes, I expect a compromise would be better. But I think the above is reductive and simplistic. Whatever Theory A &amp; Theory B are, they&#8217;re both likely static, contingent paradigms. Any good progress probably looks more like striking a balance than picking just one view of the good to maximize, but the most significant progress likely reframes the axes:</p><figure><img src="https://substackcdn.com/image/fetch/$s_!n6_Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aad62a0-1a17-4e57-87f7-4464ca6e8d4e_1612x1212.png" width="1456" height="1095" alt="Screenshot 2026-01-04 at 01.38.01.png"></figure><p>Different times and places argued about different axes: salvation versus worldliness, honor versus piety, or duties to family versus duties to the state. Liberalism versus utilitarianism, the framing of this post, is a modern axis. Or, to take another important modern axis: economic efficiency versus fairness of distribution. You need modern economics to understand what these things really are or why they might be at odds; the tension is contingent on facts about the world, including the distribution of economically-relevant talent in the population; and its importance rests on modern ethical ideas about the responsibilities of society towards the poor.</p><p>As mentioned before, to the extent that liberalism has any unique claim, it&#8217;s that it&#8217;s a good meta-level value system.
&#8220;Live and let live&#8221; is an excellent rule if you want to foster the continued weaving of the moral thread, including its non-liberal elements. It lets the process continue, and it lets new ideas and new paradigms come to light and prove their worth. It seems far less likely to sweep away the goodness and humanity of the world than any totalizing optimization process aiming for a fixed vision of the good.</p><p>Isaiah Berlin, again:</p><blockquote><p>For every rationalist metaphysician, from Plato to the last disciples of Hegel or Marx, this abandonment of the notion of a final harmony in which all riddles are solved, all contradictions reconciled, is a piece of crude empiricism, abdication before brute facts, intolerable bankruptcy of reason before things as they are, failure to explain and to justify [...]. But [...] the ordinary resources of empirical observation and ordinary human knowledge [...] give us no warrant for supposing [...] that all good things, or all bad things for that matter, are reconcilable with each other. The world that we encounter in ordinary experience is one in which we are faced with choices between ends equally ultimate, and claims equally absolute, the realization of some of which must inevitably involve the sacrifice of others. Indeed, it is because this is their situation that men place such immense value upon the freedom to choose; for if they had assurance that in some perfect state, realizable by men on earth, no ends pursued by them would ever be in conflict, the necessity and agony of choice would disappear, and with it the central importance of the freedom to choose. Any method of bringing this final state nearer would then seem fully justified, no matter how much freedom were sacrificed to forward its advance.</p></blockquote><h2>Luxury space communism or annihilation?</h2><p>There&#8217;s a common refrain that goes something like this: over time, we develop increasingly powerful technology. We should aim to become wiser about using technology faster than the technology gives us the power to wreak horrors. The grand arc of human history is a race between available destructive power and the amount of governance we have. Competition, pluralism, and a big dose of anarchy might&#8217;ve been a fine and productive way for the world to be in the 1700s, but after the invention of the Gatling gun, the atomic bomb, or AGI, we&#8217;ll have to grow up and figure out something more controlled. The moral arc of the universe might bend slowly, but it bends towards either luxury space communism or annihilation.</p><p>For an extended discussion of this, see Michael Nielsen&#8217;s essay on <a href="https://michaelnotebook.com/optimism/index.html">How to be a wise optimist about science and technology</a> (note that while Nielsen is sympathetic to the above argument, he does not explicitly endorse it).
In graph form (taken from the same essay):</p><figure><img src="https://substackcdn.com/image/fetch/$s_!KO1C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72b53105-1553-4169-b307-3cf1bdebb5c6_960x540.png" width="960" height="540" alt="Pasted image 20250714010908.png"></figure><p>Often, this then leads to calls for centralization, from Oppenheimer <a href="https://www.yesmagazine.org/democracy/2023/10/05/oppenheimer-world-government">advocating</a> for world government in response to the atomic bomb, to Nick Bostrom&#8217;s <a href="https://nickbostrom.com/papers/vulnerable.pdf">proposal</a> that comprehensive surveillance and totalitarian world government might be required to prevent existential risk from destructive future technologies.</p><p>A core premise of this line of reasoning is that technology can only ramp up the power level, the variance in outcomes, the destruction. This has generally been the net effect of technological progress so far. But technology has other consequences as well.</p><p>Most obviously, there are technologies that tamp down on the negative shocks we&#8217;re subject to. Vaccines are perhaps the most important example of technology against natural threats. But the things in the world that change quickly are the human-made ones, so what matters is whether we can build technologies that tamp down negative shocks from human actions (vaccines work against engineered pandemics as well!). If all we have for defense is a stone fence, that isn&#8217;t too bad as long as the most powerful offense is a stone axe. But now that we&#8217;ve built nuclear weapons, can we build defensive fences strong enough?</p><p>Also: there are technologies that aren&#8217;t about offense/defense balance but that make it easier or harder to enforce boundaries. Now we have cryptography that lets us keep information private, but what if in the future we get quantum computers that break existing cryptography? In the past it was hard to censor at scale, but now LLMs make censorship trivially cheap. And consider too that what matters is not just capabilities, but propensity. Modern nuclear states could kill millions of people easily, but they&#8217;re prevented from doing so by mutually assured destruction, international norms, and the fact that nuclear states benefit from a world with lots of people in it engaging in positive-sum trade. Offense/defense balance is not the only axis.</p><p>Differential acceleration of defensive technology is something that Vitalik Buterin has argued for as part of his <a href="https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html">&#8220;d/acc&#8221; thesis</a>.
However, for reasons like the above, he deliberately keeps the &#8220;d&#8221; ambiguous: &#8220;defense&#8221;, yes, but also &#8220;decentralization&#8221; and &#8220;democracy&#8221;.</p><p>Is d/acc just a cluster of all the good things? With the liberalism-versus-utilitarianism axis in hand, I think we can restate an argument for d/acc like this: a lot of technology can be used well for utilitarian ends, in the sense that it gives us power over nature and the ability to shape the world to our desires, or to maximize whatever number we&#8217;ve set as our goal: make the atoms dance how we want. But this also increases raw power and capabilities, and importantly creates the power for some people&#8212;or AIs&#8212;to make some atoms you care about dance in a way you don&#8217;t like.</p><p>Now, if you think that unlimited raw power is coming down the technological pipeline, say in the form of self-improving AIs that <a href="https://manifold.markets/Nostr0m/will-the-technology-to-make-a-dyson">might build a Dyson sphere around the sun by 2040</a>, your bottleneck for human flourishing is not going to be raw power or wealth available, or raw ability to shape the world when it&#8217;s useful, or raw intelligence. It&#8217;s likely to be much more about enforcing boundaries, containing that raw power, and keeping individual humans empowered and safe next to the supernova of that technological power.</p><p>So what we also need are <em>technologies of liberalism</em>, which help maintain different spheres of freedom, even as <em>technologies of utilitarianism</em> increase the control and power that actors have to achieve their chosen ends.</p><h2>Technologies of liberalism</h2><p>Defense, obviously, is one approach: tools that let you build walls that keep in things you don&#8217;t want to share, or lock out things you don&#8217;t like.</p><p>Often what&#8217;s important is selective permeability. At the most basic level, locks and keys (in existence since at least ancient Assyria 6000 years ago) let you restrict access to a place. Today, the ability to pass information over public channels without revealing it is at least as fundamental to our world as physical locks and keys. Note, too, how surprising public-key cryptography&#8212;where you don&#8217;t need to first share a secret with the counterparty&#8212;really is. <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange">Diffie-Hellman key exchange</a> is one of the cleverest and most useful ideas humans have ever had.</p>
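<p>To see why it&#8217;s so clever, here is a toy sketch of the exchange (a minimal illustration with deliberately tiny numbers of my own choosing; real deployments use ~2048-bit primes or elliptic curves):</p><pre><code># Toy Diffie-Hellman key exchange. Illustrative only: real systems
# use ~2048-bit primes or elliptic curves, not numbers this small.
p = 23   # public prime modulus
g = 5    # public generator

a_secret = 6    # Alice's private key, never transmitted
b_secret = 15   # Bob's private key, never transmitted

A = pow(g, a_secret, p)   # Alice sends g^a mod p over the public channel
B = pow(g, b_secret, p)   # Bob sends g^b mod p over the public channel

# Each side combines its own secret with the other's public value:
alice_key = pow(B, a_secret, p)   # (g^b)^a mod p
bob_key = pow(A, b_secret, p)     # (g^a)^b mod p

assert alice_key == bob_key == 2  # both derive g^(ab) mod p
# An eavesdropper sees only p, g, A, and B; recovering the key means
# solving the discrete logarithm problem, infeasible at real sizes.
</code></pre><p>Two parties who have never met end up with a shared secret, while a listener who saw every message exchanged learns essentially nothing.</p>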
<p>Encryption is of enormous significance for the politics of freedom. David Friedman <a href="https://daviddfriedman.substack.com/p/different-arguments">has argued</a> (<a href="https://web.archive.org/web/20030925021822/http:/www.hoover.org/Main/uncommon/winter98/205.html">since the 1990s!</a>):</p><blockquote><p>In the centuries since [the passing of the Second Amendment], both the relative size of the military and the gap between military and civilian weapons have greatly increased, making that solution less workable [for giving citizens a last resort against despotic government]. In my view, at least, conflicts between our government and its citizens in the immediate future will depend more on control of information than control of weapons, making unregulated encryption [...] the modern equivalent of the 2nd Amendment.</p></blockquote><p>Ivan Vendrov, in a <a href="https://nothinghuman.substack.com/p/the-transformer-and-the-hash">poetic essay</a> on the building blocks of 21st century political structure, points to cryptography (exemplified by the hash function) as the one thing standing against the centralizing information flows that the transformer incentivizes, since it lets us cheaply erect barriers with an astronomical asymmetry in favor of defense. Encryption that takes almost no time to apply can be made so strong it would take hundreds of years on a supercomputer to break. In our brute physical world, destruction is easy and defensive barriers are hard. But the more of our world becomes virtual, the more of our society exists in a realm where defense is vastly favored. Among the gifts to liberty this universe contains that will most bear fruit in our future, encryption ranks alongside the Hayekian nature of knowledge and the finite speed of light.</p>
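<p>A minimal sketch of that asymmetry (my own numbers; the billion-guesses-per-second attacker is an assumption for illustration):</p><pre><code>import hashlib

# Computing a hash is effectively free...
digest = hashlib.sha256(b"attack at dawn").hexdigest()
print(digest)   # 64 hex characters, computed in microseconds

# ...but inverting it means guessing preimages until one matches.
# Even at a billion guesses per second, exhausting a 2^256 space
# takes on the order of 10^60 years:
guesses_per_second = 1e9
seconds = 2.0 ** 256 / guesses_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.1e} years")   # roughly 3.7e60
</code></pre>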
So far we are far from realizing the theoretical security properties of virtual space; reliable AI coding agents and formal verification will help. And virtual space will continue to be embedded in meatspace.</p><p>In meatspace, the biggest vulnerability humans have is infectious disease. The state of humanity&#8217;s readiness against pandemics (natural or engineered) is pathetic, and in expectation millions will pay with their lives for this. We need technology that lets us prevent airborne diseases, so that an increasing number of actors&#8212;already every major government, but soon plausibly any competent terrorist cell&#8212;cannot wield a veto on our lives. UV-C lighting and rapid at-home vaccine manufacturing are good starts. And unless we curtail this risk through broad-spectrum defensive technology, at some point it will be used as an excuse for massive centralization of power to crush the risk before it happens.</p><p>Besides literal defenses, there&#8217;s decentralization: self-sufficiency gives independence as well as safety from coercion. At a simple level, solar panels and home batteries can help with energy independence. 3D printers make it possible to manufacture more locally and more under your control. Open-source software makes it possible for you to tinker with and improve the programs you run. AI code-gen makes this even stronger: eventually, anyone will be able to build their own tech stack, rather than being funneled along the grooves cut by big tech companies&#8212;which, remember, are adversarially optimized to make you lose as much time out of your life as possible. The printing press and the internet both made it cheaper to distribute writing of your choice, and hence to push your own ideas and create an intellectual sphere of your own without wealth or institutional buy-in.</p><p>Fundamentally, however, the thing you want is to avoid coercion, rather than to maximize independence. <a href="https://allpoetry.com/No-man-is-an-island">No man is an island</a>, they say, and that might not have been intended as a challenge. Coercion is hard if the world is very decentralized, to the point that no one has a much larger power base than anyone else. However, such egalitarianism is unlikely, since everyone has different resources and talents. Another key feature of the modern world that helps reduce coercion, in addition to economic and military power being decentralized, is that the world is covered in a thick web of mutual dependence thanks to trade. If you depend on someone, you are also unlikely to be too coercive towards them, as <a href="https://www.kiplingsociety.co.uk/poem/poems_dives.htm">Kipling already understood</a>. Self-sufficiency and webs of mutual dependence are, of course, in tension. I don&#8217;t pretend there&#8217;s a simple way to pick between them, or to know how much of each to have&#8212;I never said this would be simple!</p><p>AI, however, is a challenge to any type of decentralization: it needs lots of expensive centralized infrastructure, and currently the focus is on huge multi-billion-dollar pre-training runs. However, even here there&#8217;s a story for something more decentralized to take center stage. First, zero-trust privacy infrastructure for private AI inference &amp; training&#8212;salvation by cryptography once again!&#8212;can give you verifiable control and privacy over your own AI, even if it runs in public clouds for cost reasons. Secondly, as Luke and I have <a href="https://lukedrago.substack.com/cp/160938645">argued</a>, the data needed to have AI diffuse through the economy runs into <a href="https://www.econlib.org/library/Essays/hykKnw.html">Hayekian constraints</a> that privilege the tacit, process, and local knowledge of the people currently doing the jobs AI seeks to replace&#8212;as long as the big labs and their data collection efforts don&#8217;t succeed too hard and too fast. Combined with this, cheap bespoke post-training offers a Cambrian explosion of model variety as an alternative to cultural &amp; value lock-in to the tastes of a few labs. Achieving this liberal vision of the post-AGI economy is what we&#8217;re working on at <a href="https://workshoplabs.ai/">Workshop Labs</a>.</p><p>Then there&#8217;s technology that helps improve institutions&#8212;democratization, not &#8220;democracy&#8221; in the narrow sense of &#8220;one person one vote&#8221;, but in the more fundamental sense of &#8220;people&#8221; (the <em>demos</em>) having &#8220;power&#8221; (<em>kratos</em>). We&#8217;ve already mentioned zero-knowledge proofs, memoryless AI, and other primitives for contracts. Blockchains, though much hyped, deserve some space on this list, since they let you get hard-to-forge consensus about some data without a central authority (needing consensus about a list is a surprisingly common reason to have a central authority!). More ambitiously, <a href="https://blog.cosmos-institute.org/p/coasean-bargaining-at-scale">Seb Krier points out</a> that AI could enable &#8220;Coasean bargaining&#8221; (i.e. bargaining with transaction costs so low that <a href="https://en.wikipedia.org/wiki/Coase_theorem">Coase&#8217;s theorem</a> actually holds)&#8212;bespoke deals, mediated by AIs acting on behalf of humans, for all sorts of issues that currently get ignored or stuck in horrible equilibria because negotiating requires expensive human time. Another way to increase the amount of democracy when human time is the bottleneck is AI representatives for people (though these would have to be very high-fidelity representatives to represent someone&#8217;s values faithfully&#8212;values are <a href="https://plato.stanford.edu/entries/thick-ethical-concepts/">thick, not thin</a>, and <a href="https://www.full-stack-alignment.ai/">AI design should take this into account</a>).</p>
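<p>A toy model of why transaction costs are the lever in Coasean bargaining (invented numbers, purely for illustration): Coase&#8217;s theorem says that if bargaining is cheap enough, the efficient outcome gets negotiated no matter who initially holds the rights. AI representatives matter because they push the cost of bargaining towards zero:</p><pre><code># Toy Coasean bargain. A factory's noise costs its neighbour 100/week;
# silencing the machine costs the factory 30/week. Silence is the
# efficient outcome, with a joint surplus of 70/week on the table.
harm_to_neighbour = 100
abatement_cost = 30
surplus = harm_to_neighbour - abatement_cost

def deal_happens(transaction_cost):
    # The efficient deal is only struck if negotiating costs less
    # than the surplus it unlocks.
    return transaction_cost &lt; surplus

print(deal_happens(5))    # True: cheap bargaining, the noise gets fixed
print(deal_happens(90))   # False: negotiation costs eat the surplus
</code></pre>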
<p>Finally, there are technologies that shape culture and make us wiser, or let society be more pluralistic. Many information technologies, like the internet or AI chatbots, obviously fall in this category and enable great things. However, information technology, in addition to giving us information and helping us coordinate, often also changes the incentives for which memes (in the original Dawkins sense) spread, and therefore often has vast second-order consequences. On net I expect information technology to be good, but with far greater requirements for cultural shifts to properly adapt to it, and far more propensity than other technologies to shift culture or politics in unexpected directions.</p><p>(Note that the pillars above are very similar to those of the anti-intelligence-curse strategy Luke &amp; I outlined <a href="https://intelligence-curse.ai/breaking/">here</a>.)</p><p>Historically, you can trace the ebb and flow of the average person&#8217;s fortunes by how decentralizing or centralizing the technology most essential for national power is, and by how much that technology creates mutual dependencies that make it hard for the elite to defect against the masses. MacInnes et al.&#8217;s excellent paper <a href="https://academic.oup.com/isq/article/68/4/sqae111/7757999">Anarchy as Architect</a> traces this dynamic throughout history. Bronze weapons, which required scarce bronze, meant only a small military elite was relevant in battle, and that elite could easily rule over the rest. The introduction of iron weapons, which could be produced en masse, led to less-centralized societies, as rulers needed large groups to fight for them. Mounted armored knights centralized power again, before mass-produced firearms made large armies decisive once more. The institutions of modern Western democracy were mostly built during the firearm-dominated era of mass warfare. They were also built at a time when national power relied more and more on broad-based economic and technological progress, in which a rich and skilled citizenry was very valuable. This rich and skilled citizenry was also increasingly socially mobile and free of class boundaries, making it harder for the upper class to coordinate around its interests, since it wasn&#8217;t demarcated by a sharp line. People also had unprecedented ability to communicate with each other in order to organize and argue, thanks to new technology (and, of course, because the government now had to teach them to read&#8212;before around 1900, literacy was not near-universal even in rich countries). These nations then increasingly dissolved trade barriers between them and became interdependent&#8212;very purposefully and consciously, in the case of post-war Europe&#8212;which reduced nation-on-nation violence and coercion.</p><p>How will this story continue? It will definitely change&#8212;as it should. We should not assume that the future socio-structural recipe for liberalism is the one that works now. But hopefully it changes in ways that still add up to freedom for an increasing number of people.</p><h2>Build for freedom</h2><p>Differential technological progress is obviously possible. For example, Luke argues against what he calls &#8220;<a href="https://blog.cosmos-institute.org/p/technocalvinism">Technocalvinism</a>&#8221;, the belief that the &#8220;tech tree&#8221; is deterministic and everything about it is preordained, using examples ranging from differential progress in solar panels to the feasibility of llama-drawn carts. The course of technology is like a new river running down across the landscape.
It has a lot of momentum, but it can be diverted from one path to another, sometimes by just digging a small ditch.</p><p>There are three factors that help, the first two in general, and the last specifically for technologies of liberalism:</p><ol><li><p>There is a <a href="https://www.lesswrong.com/posts/GCRm9ysNYuGF9mhKd/the-computational-complexity-of-progress">lot of path dependence</a> in the development of technologies, and effects like <a href="https://en.wikipedia.org/wiki/Experience_curve_effect">Wright&#8217;s law</a> mean that things can snowball: if you manage to produce a little, production gets cheaper, which justifies producing a larger amount, and then an enormous amount (see the sketch after this list). This is how solar panels went from expensive things stuck on satellites to <a href="https://caseyhandmer.wordpress.com/2024/05/22/the-solar-industrial-revolution-is-the-biggest-investment-opportunity-in-history/">on track to dominate the global energy supply</a>. Tech, once it exists, is also often sticky and hard to roll back. A trickle you start can become a stream. (And whether or not you can change the final destination, you can definitely change the order, and that&#8217;s often what you need to avert risks. All rivers end in the ocean, but they take different paths, and the path matters a lot for whether people get flooded.)</p></li><li><p>The choice of technology that a society builds towards is a deeply cultural, <a href="https://www.orphandriftarchive.com/articles/hyperstition/">hyperstitious</a> phenomenon. This is argued, for example, in Byrne Hobart and Tobias Huber&#8217;s <em>Boom: Bubbles and the End of Stagnation</em> (you can find an excellent review <a href="https://www.thepsmiths.com/p/review-boom-by-byrne-hobart-and-tobias">here</a>, which occasionally even mentions the book in question). The Apollo program, the Manhattan Project, and the computer revolution were all deeply idealistic and contingent projects that seemed unrealistic when conceived&#8212;yes, <a href="https://monoskop.org/images/d/dc/Barbrook_Richard_Cameron_Andy_1996_The_Californian_Ideology.pdf">even the computer revolution</a>, hard as that is to remember today. Those who preach the inevitability of centralization and control, whether under the machine-god or otherwise, are not neutral bullet-biters, but are accidentally or intentionally hyperstitioning that very vision into life.</p></li><li><p>Demand is high. People want to be safe! People want control! People want power!</p></li></ol>
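<p>A minimal sketch of the first factor&#8217;s arithmetic, assuming the roughly 20% cost decline per doubling that solar modules have historically shown (the exact rate here is an assumption for illustration):</p><pre><code>import math

# Wright's law: unit cost falls by a fixed fraction with each
# doubling of cumulative production.
def unit_cost(initial_cost, cumulative_units, learning_rate=0.20):
    doublings = math.log2(cumulative_units)
    return initial_cost * (1 - learning_rate) ** doublings

print(unit_cost(100.0, 1))      # 100.0 (the first unit)
print(unit_cost(100.0, 1024))   # ~10.7 (ten doublings: ~89% cheaper)
</code></pre><p>This is why an early trickle matters: each batch produced buys the cost decline that justifies the next, larger batch.</p>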
<p>So how should we steer the river of technology?</p><p>If the AGI labs&#8217; vision of AGI is achieved, by default it will disempower the vast majority of humanity instantly, depriving them of their power, leverage, and income as they&#8217;re outcompeted by AIs, while <a href="https://intelligence-curse.ai/">wrecking the incentives that have made power increasingly care about people</a> over the past few centuries, and elevating a select few political &amp; lab leaders to positions of near godhood. It might be the most radical swing in the balance of power away from humanity at large and towards a small elite in human history. And the default solution for this in modern AI discourse is to accept the inevitability of the machine-god dictatorship, accept the lock-in of our particular society and values, and argue instead about who exactly stands at its helm and which exact utility function is enshrined as humanity&#8217;s goal from here on out.</p><p>All this is happening while liberalism is already in cultural, political, and geopolitical retreat. If its flame isn&#8217;t fanned by the ideological conviction of those in power or the drift of geopolitics, its survival increasingly depends on other sources.</p><p>So: to make an impact, don&#8217;t build to maximize intelligence or power, because those will not be the constraints on a good future. This is a weird position to be in! Historically, we <em>did</em> lack force and brains, and had to claw ourselves out of a deep abyss of feebleness to our current world of engines, nuclear power, computers, and coding AIs. Even now, if radical AI were not an event horizon towards which we are barreling, poised to massively accelerate technology and reshape the position of humans, <em>more power</em> would be a very reasonable thing to aim for&#8212;we would greatly benefit from energy abundance thanks to fusion, and robotics beyond our dreams, and space colonization. (And if that radical AI looks majorly delayed, or gets banned, or technology in general grinds to a halt, then making progress towards tools of utilitarianism and power would once again be top-of-mind.) But if radical tech acceleration through AI is on the horizon, we might really be on track to literally dismantle the solar system within a few decades. The thing we might lack is the tools to secure our freedoms and rights through the upheaval that brings. Technologists, especially in the Bay Area, should rekindle the belief in freedom that guided their predecessors.</p><p>For decades, the world has mostly been at peace, liberty has spread, and the incentives of power have been aligned with the prosperity of the majority. History seemed over. All we had to do was optimize our control and power over nature to allow ever greater welfare.</p><p>But now, history is back. Freedom is on trial. The technology we choose to build could tip the scales. What will you build?</p><p><em>Thank you to Luke Drago &amp; Elsie Jang for reviewing drafts of this post.</em></p>
]]></content:encoded></item><item><title><![CDATA[The Intelligence Curse: an essay series]]></title><description><![CDATA[Solving human irrelevance in the face of AI]]></description><link>https://www.nosetgauge.com/p/the-intelligence-curse-an-essay-series</link><guid isPermaLink="false">https://www.nosetgauge.com/p/the-intelligence-curse-an-essay-series</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Thu, 24 Apr 2025 12:02:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nxDw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff249178d-380f-4c41-a8e3-10693fd58909_1600x897.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Together with <a href="https://lukedrago.substack.com/">Luke Drago</a>, I&#8217;ve published an essay series on what we call the intelligence curse. Most content is brand new, and all previous writing has been heavily reworked.</em></p><p><em>Visit <a href="http://intelligence-curse.ai">intelligence-curse.ai</a> for the full series.</em></p><p><em>Below is the introduction and table of contents.</em></p><figure><img src="https://substackcdn.com/image/fetch/$s_!nxDw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff249178d-380f-4c41-a8e3-10693fd58909_1600x897.jpeg" width="1456" height="816" alt=""><figcaption class="image-caption"><em>Art by <a href="https://x.com/NomadsVagabonds">Nomads &amp; Vagabonds</a>.</em></figcaption></figure><p>We will soon live in <a href="https://ia.samaltman.com/">the intelligence age</a>. What you do with that information will determine your place in history.</p><p>The imminent arrival of AGI has pushed many to try to <a href="https://danfaggella.com/flex/">seize the levers of power as quickly as possible</a>, leaping towards projects that, if successful, would comprehensively automate all work.
There is a <a href="https://am.jpmorgan.com/gb/en/asset-management/per/insights/market-insights/investment-outlook/ai-investment/#:~:text=According%20to%20data,Exhibit%2014).">trillion-dollar</a> arms race to see who can achieve such a capability first, with <a href="https://www.mechanize.work/#:~:text=The%20market%20potential%20here%20is%20absurdly%20large%3A%20workers%20in%20the%20US%20are%20paid%20around%20%2418%20trillion%20per%20year%20in%20aggregate.%20For%20the%20entire%20world%2C%20the%20number%20is%20over%20three%20times%20greater%2C%20around%20%2460%20trillion%20per%20year.">trillions more</a> in gains to be won.</p><p>Yes, that means you&#8217;ll lose your job. But it goes beyond that: this will remove the need for regular people in our economy. Powerful actors&#8212;like states and companies&#8212;will no longer have an incentive to care about regular people. We call this <strong>the intelligence curse</strong>.</p><p>If we do nothing, the intelligence curse will work like this:</p><ul><li><p>Powerful AI will push automation through existing organizations, starting from the bottom and moving to the top.</p></li><li><p>AI will obsolete even outlier human talent. Social mobility will stop, ending the social dynamism and progress that it drives.</p></li><li><p>Non-human factors of production, like capital, resources, and control over AI, will become overwhelmingly more important than humans.</p></li><li><p>This will usher in incentives for powerful actors around the world that break the modern social contract.</p></li><li><p>This could result in the <a href="https://gradual-disempowerment.ai/">gradual</a>&#8212;or <a href="https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power">sudden</a>&#8212;disempowerment of the vast majority of humanity.</p></li></ul><p>But this prophecy is not yet fulfilled; we <em>reject</em> the view that this path is inevitable. We see a different future on the horizon, but it will require a deliberate and concerted effort to achieve it.</p><p>We aim to change the incentives driving the intelligence curse, maintaining human economic relevance and strengthening our democratic institutions to withstand what will likely be the greatest societal disruption in history.</p><p>To break the intelligence curse, we should chart a different path on the tech tree, building technology that lets us:</p><ol><li><p><strong>Avert</strong> AI catastrophes by hardening the world against them, both because it is good in itself and because it removes the security threats that drive calls for centralization.</p></li><li><p><strong>Diffuse</strong> AI, to get it in the hands of regular people. In the short-term, build AI that augments human capabilities. In the long-term, align AI directly to individual users and give everyone control in the AI economy.</p></li><li><p><strong>Democratize</strong> institutions, making them more anchored to the needs of humans even as they are buffeted by the changing incentive landscape and fast-moving events of the AGI transition.</p></li></ol><p>In this series of essays, we examine the incoming crisis of human irrelevance and provide a map towards a future where people remain the masters of their destiny.</p><h2>Chapters</h2><h3>1. <a href="https://intelligence-curse.ai">Introduction</a></h3><p>We will soon live in the intelligence age. What you do with that information will determine your place in history.</p><h3>2.
<a href="https://intelligence-curse.ai/pyramid">Pyramid Replacement</a></h3><p>Increasingly powerful AI will trigger pyramid replacement: a systematic hollowing out of corporate structures that starts with entry-level hiring freezes and moves upward through waves of layoffs.</p><h3>3. <a href="https://intelligence-curse.ai/capital">Capital, AGI, and Human Ambition</a></h3><p>AI will make non-human factors of production more important than human ones. The result may be a future where today's power structures become permanent and frozen, with no remaining pathways for social mobility or progress.</p><h3>4. <a href="https://intelligence-curse.ai/defining">Defining the Intelligence Curse</a></h3><p>With AGI, powerful actors will lose their incentive to invest in regular people&#8211;just as resource-rich states today neglect their citizens because their wealth comes from natural resources rather than taxing human labor. This is the intelligence curse.</p><h3>5. <a href="https://intelligence-curse.ai/shaping">Shaping the Social Contract</a></h3><p>The intelligence curse will break the core social contract. While this suggests a grim future, understanding how economic incentives reshape societies points to a solution: we can deliberately develop technologies that keep humans relevant.</p><h3>6. <a href="https://intelligence-curse.ai/breaking">Breaking the Intelligence Curse</a></h3><p>Avert AI catastrophes with technology for safety and hardening without requiring centralizing control. Diffuse AI that differentially augments rather than automates humans and decentralizes power. Democratize institutions, bringing them closer to regular people as AI grows more powerful.</p><h3>7. <a href="https://intelligence-curse.ai/history">History is Yours to Write</a></h3><p>You have a roadmap to break the intelligence curse. 
What will you do with it?</p>]]></content:encoded></item><item><title><![CDATA[A History of the Future, 2030-2040]]></title><description><![CDATA[A scenario, part 3.]]></description><link>https://www.nosetgauge.com/p/a-history-of-the-future-2030-2040</link><guid isPermaLink="false">https://www.nosetgauge.com/p/a-history-of-the-future-2030-2040</guid><pubDate>Mon, 17 Feb 2025 01:56:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Q5Co!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c0eff6-08aa-4bde-ad35-adadb3c42326_1920x1239.png" length="0" type="image/png"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Q5Co!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c0eff6-08aa-4bde-ad35-adadb3c42326_1920x1239.png" width="1456" height="940" alt=""><figcaption class="image-caption"><em>The Great Day of His Wrath
</em>. (John Martin)</figcaption></figure></div><p>(Part 1 <a href="https://nosetgauge.substack.com/p/a-history-of-the-future-2025-2027">here</a>, part 2 <a href="https://nosetgauge.substack.com/p/a-history-of-the-future-2027-2030">here</a>)</p><h2><strong>The end of white-collar work and the new job scene</strong></h2><p>By the late 2020s, office jobs in developed countries are basically about overseeing and providing direction to AI systems, and even the direction-providing part is mostly on paper rather than in practice. There is lots of talk about values and missions and the future, and a lot of unspoken communication about office politics and status. Many office workers don&#8217;t do much at all. Concretely, they might get to work, have a team standup, check in on how the AIs are doing, have some ritualistic meetings with their manager and any employees they have, and rubber-stamp some AI decisions that they&#8217;re contractually or legally obliged to stamp, with this adding up to only a few hours. Occasionally they might decide to change some goal the AIs have been given, but that requires just speaking or typing a paragraph. Many people feel guilty about this, but it&#8217;s mostly a quiet guilt. They fill their time with office chat or scrolling on their phones. Many companies become more social and more about community. HR has never been more influential. Everything&#8217;s both more cuddly and more viciously political now that the ugly raw realities of individual competence don&#8217;t matter any more.</p><p>Some organisations try to fire lots of people. Sometimes it goes well. Sometimes it goes badly, and they realise that some human somewhere was holding some knowledge in their head, or nudging the mission in the right direction, in a way that was essential. However, by then it&#8217;s too late, and it&#8217;s hard to say which person it actually was anyway. Among the more ruthless or tech-adjacent management cultures, there&#8217;s a lot of talk about figuring out who the load-bearing humans in any organisation are, and how this is surprisingly difficult to do at a large organisation. Some companies develop internal AI systems to try to figure this out (or buy such systems from startups), but they need to collect some data about the functioning of the org first, which takes time. Also, the workers are incentivised to <a href="https://www.thepsmiths.com/p/review-the-art-of-not-being-governed">resist</a> and fight back in a thousand subtle ways, and they do. Also, sometimes when an org tries to fire a lot of people, an online mob emerges to hate on them, influencers pile in and create 13 different cinematic universes whose shared theme is that Company X is the pinnacle of all human evil, sometimes a former employee creates an AI-powered revenge cult (several assassinations result from the more violent of the cults), and sometimes politicians pick up the issue. The companies, largely, were profitable before, and are more profitable now that they&#8217;ve enjoyed a few years of revenue growth without expanding headcount. Therefore, mass firing is surprisingly rarely worth it, even though it would in principle be possible.
A few firms facing crises or with especially effective or risk-tolerant leadership buck these trends and aggressively slash costs by cutting huge numbers of human workers.</p><p>What developed country firms are not doing is hiring new workers or replacing anyone who retires. What they are doing is replacing any foreign contractors or service providers with cheaper AI ones.</p><p>This creates several groups of disaffected people. First, the youth in developed countries, who have much worse job prospects than the preceding generation. For people looking for their first job in 2030-2031 in a developed country, the options are roughly:</p><ol><li><p>Working in services where being human intrinsically matters (elderly care, retail, restaurants, hospitality, teaching, etc.). Healthcare is by far the most prestigious one and what many aim for (even though doctors&#8212;or at least all the good ones&#8212;defer all diagnosis and other intellectual work to the AIs). The cartel-like nature of medical licensing bodies, strain on state budgets, and the fact that most of the actual work is done by AIs mean that the number of doctors or nurses hasn&#8217;t increased much, though, so entry has become even more competitive. Policing and primary education also continue hiring humans at scale.</p></li><li><p>Jobs that are effectively sinecures. This includes many positions in government and civil service. In the EU, regulation passed in 2028 means that many companies are forced to hire human "overseers" into key positions. Of course, the supply of sinecures is set by regulation and funding for economically useless activities. Competition for such positions is therefore extremely harsh, and (because the selection criteria, having no reason to be one thing rather than another, inherit the latest credentialist instantiation of the 21st century West's bureaucratic blob) requires extreme conformism. This category has a fuzzy boundary with the first, depending on whether you value the ceremonial human touch as a key part of the service or not.</p></li><li><p>A particular example of this is the law. Lawyers have three major advantages. First, their job deals closely with important social questions of legitimacy and propriety, making it a natural claim that something fundamental would be lost if the human presence was gone. The presence of lawyers evolves to be more ceremonial and symbolic&#8212;almost religious&#8212;but it stays. Second, lawyers make up a lot of the rules for themselves, and interpret the statutes for everyone else. Third, a lot of politicians are lawyers, or have friends who are lawyers, which makes them attuned to lawyer interests. This gives lawyers a lot of leeway in what automation they allow. In many countries the rules are bent such that it is flat-out illegal to consult an AI on legal matters; you have to go through a human lawyer. AI companies are forced to train their AIs to comply with this ("I'm sorry, but as an AI it is illegal for me to give advice on legal matters, so I recommend you hire a licensed lawyer"). Of course, all of the actual legal research and argumentation is done by AIs&#8212;the lawyers just monopolise the position of being allowed to ask them.</p></li><li><p>Manufacturing, which is booming, especially as productivity has been lifted by AI management and oversight.
Many manufacturing jobs involve wearing an earpiece through which you receive detailed step-by-step orders from an AI (and occasionally AR glasses that can show you diagrams or an overlay for how to move your hands). A large fraction of people go into this, even if they have prestigious university degrees (many of the prestigious degree-holders find their salary and status expectations unmet and become resentful).</p></li><li><p>Academia. There are still humans in academia who somewhat matter for intellectual progress, but they're all either experienced humans with years of research taste in economically-valuable non-purely-mathematical areas (who are actually in decently high demand, as the AI labs chase feedback sources that will help them get the models superhuman faster and more cheaply at even the very last set of long-horizon, hard-to-measure skills), or (especially in the US) "prof-luencers" who use the status of a successful prior academic career to boost their influencer careers. New entrants to academia get their academic salaries (if they win an ever more cut-throat competition), but not the hope of actually mattering for intellectual progress. Some derive satisfaction that they can at least keep deep human expertise alive into the future&#8212;though it seems like without any ground-truth feedback signal, many lineages of human expertise will become <a href="https://samoburja.com/on-the-loss-and-preservation-of-knowledge/">dead knowledge</a> within a generation even if people still go through the motions of &#8220;learning&#8221; them.</p></li><li><p>Becoming an influencer. Works for some, but the competition is extremely tough (though it does help that "being verifiably human" is in vogue).</p></li><li><p>Becoming a socialite. The infinite variety and competition (human and AI) in the digital world is driving a resurgence of an in-person social scene. However, for this to be a "career choice", you must either already be wealthy, or have some other factor in your favour. The overwhelmingly most common such factor is being a young woman who inserts herself into the social scene of moneyed men.</p></li><li><p>Becoming a musician, artist, or poet. The main constraint is funding, of which there are two important types: government subsidies (which generally increase, as being vaguely pro human self-expression is a common government answer to what people should do with themselves in the age of AI, especially in socially progressive European countries), and wealthy patrons. Being an artist for the latter often melds into being a socialite, since in-person local artists are the prevailing fashion. Many nouveau riche techies, wanting to erase their association with the now-uncool world of software, throw money at artists who live in their local community to do some arts-and-crafts thing and then show up with it to their party and say vaguely artsy things.</p></li><li><p>Going into politics. This has become more appealing to the young in particular, since the future looks uncertain and the youth are the ones who expect to live in it longest. There are many AI youth activists (in the sense of specialising in the topic of AI, being AIs, or both), who try to use their position to advance youth interests. The problem is that they don't have concrete policy asks beyond "allocate more money to us", which puts them at odds with every other interest group in society, many of which (e.g.
the retired) outnumber them as a voting bloc and outmatch them in resources, power, and influence.</p></li></ol><p>Culturally, intellectualism is out, after having climbed in cultural status for two centuries before the mid-2020s as labour-augmenting technology and globalisation scaled its power. Charisma, conformism, sociability, authenticity, and propriety are all in. In the early 2030s, the US is becoming more European in its attitudes, especially along the above dimensions. While the high-water mark of conformism that the 2010s culture wars and academic sclerosis had caused receded throughout 2022-2027, a modified and less-political, more European-style propriety-focused conformism rose around 2030. First, this was driven by cultural changes downstream of AI reducing the rewards for risk and entrepreneurship. Second, there was a cultural backlash against the techies (who were seen as having pretensions of importance after the automation of software in the late 2020s, and as led by a cadre of improper, disruptive moguls who were on the wrong side of a Republican power struggle in 2028), and, through the techies, against the culture of ambition that had become central to their self-narrative.</p><p>A second disaffected group is the developing countries. Replacing outsourced foreign human services (e.g. call centres) with AIs was a cost-saving that could be made without political or social repercussions, so all companies did it&#8212;often at significant scale as early as 2025-2026 for text-only tasks. As a result, services-led export growth is dead. This is bad for India and the Philippines in particular. India just about achieved the economic heft where it could've been relevant in AI, but throughout the 2020s was unable to become entrenched in any part of the AI supply chain. At the same time, as more of the population in developed countries goes back to working in manufacturing, political demands for protecting developed country manufacturers from competition with developing countries grow. This leads to even more tariffs, on top of the already-existing late 2020s trend towards more and more tit-for-tat tariff escalation. This makes developing country growth based on goods exports harder. The biggest shock to existing goods export industries won&#8217;t arrive until a few years later when the robots show up, but investment into the developing world already dries up, as US productivity growth rises and is expected to rise even more.</p><p>China is about a year behind in leading AI tech but about 2-3 years behind in AI diffusion. The Chinese public and the CCP are watching the coming wave of AI job automation with worry, especially as there is a big cultural emphasis on exactly the types of academic skills that are quickly being made obsolete. The CCP is very worried about stability. More and more people are joining the Party, since they see that other opportunities for social advancement are ending. The Party is making many more roles in it available, and using this carrot to incentivise people to adhere to Party principles even more strongly. The AI surveillance state keeps expanding; there is now AI interpretation of much CCTV footage of public streets, for example. There are efforts underway to modernise (i.e., convert to AI) most of the military, such that the Party's control cannot be threatened even if the human military is destabilised.</p><p>In the US, the leading plan seems to be a hodgepodge of regulation-mandated human job roles, and eventually maybe UBI.
However, a fiscal crisis is on the horizon because of the looming social security trust fund exhaustion. GDP growth in 2031-2032 is hitting 5% per year but full UBI still seems expensive. In the EU, there is more state intervention and regulation aimed at keeping humans in the loop, with massive corporate and government hierarchies of jobs that are effectively pure sinecures where the work is all done by AIs, which is temporarily reducing the demand for flat-out UBI.</p><p>When governments ask companies what their blockers are, companies cite regulations that keep humans in the loop, and (when off the record) everyone shares a sentiment that humans aren&#8217;t actually in the loop anyway. Shortcuts are already being taken to reduce the human oversight component, though it&#8217;s very hard to do this legally, because there are often government-mandated AIs monitoring compliance with the human oversight rules. Two firms might want to maintain their human workers for complicated reasons of regulation, office politics, and inertia, but they&#8217;re competing against each other, and against full-AI firms, and against foreign adversaries. So pressure increases to cut the unnecessary weight. There&#8217;s also a <a href="https://inferencemagazine.substack.com/p/ai-and-jurisdictional-choice">race to the bottom internationally</a>. Many autonomous AI-run companies in 2030-2033 move to less-regulated areas, take the slight hit of running on open-source models, and serve customers from there. However, this global decentralisation is reversed once the robotic revolution&#8212;subsidised and encouraged by the American and Chinese governments&#8212;gets under way.</p><h2><strong>Lab strategy amid superintelligence and robotics</strong></h2><p>The state of AI capabilities around 2030 is roughly as follows: wherever there is an easy feedback signal and a high performance ceiling, such as maths or code, the models are incomprehensibly superhuman. Where rapid iteration is possible but the performance ceiling is not as high, such as sales calls, the AIs are better than all humans. In general, the AIs can be more charismatic and persuasive than humans, but this does not give them the power to steer individual humans however they like, especially when to do so they would have to compete with every other memetic force in society as well as the individual's resistance to being psychologically hacked. Wherever there is a large pile of information, such as supply chain routing or crystal structure prediction or history or legal precedent, the AIs are superhuman at spotting and understanding the patterns and generalising them to new instances. However, models appear to still be roughly human-level at long-horizon tasks with ambiguous success metrics. Companies, governments, and research agendas&#8212;even the scrappier, faster-changing ones&#8212;are still piloted by humans who make real strategy decisions, even though in practice it's a human riding on a vast wave of AI supercognition, and the trend is towards more and more delegation as the systems improve. Real-world progress in hard tech is also varied. There are many breakthroughs in parts of materials science and molecular biology driven by things like material property and protein folding prediction that cut down on empirical iteration. However, other tasks turn out to be computationally intractable even to the smart AIs, even if they often achieve very large efficiency gains over the human state-of-the-art by inventing superhumanly good heuristics.
No one has figured out how to turn the vast amounts of intelligence-on-tap into magical-seeming technical progress in atoms, even though engineering work now happens much faster and at a higher quality level and with a smaller margin between practical and theoretical performance.</p><p>In 2029, OpenAI rebrands its models to just &#8220;o&#8221;. Everyone has Opinions. It&#8217;s a big advance in raw intelligence, but almost no one can tell. Instead of a variety of sizes of an o-series model with updates every few months, from now on there will be a few varieties (differing mainly in size, like o-small and o-large and an internal-only o-huge, but also with some specialised finetuned models, e.g. o-math and o-chat). Individual instances of the models can use their medium-term memory as context when they&#8217;re doing agentic tasks, but they can also run in &#8220;functional&#8221; or &#8220;API&#8221; mode where that is disabled. More than half of OpenAI&#8217;s model revenues still come from functional mode calls rather than running instances as agents that develop their own memories and learn on the fly, but this proportion is steadily falling. There&#8217;s a new model checkpoint released every day, with the newest information from that day already in its weights, and the occasional larger improvement.</p><p>By 2030, OpenAI has culled almost all of its human employees. This is the main advantage of their latest model internally&#8212;the tacit knowledge that various humans previously held, which would&#8217;ve made the human-level-ish o6 not quite adequate at wholesale replacement of OpenAI engineers, matters less when o-huge can just rederive that knowledge from scratch very quickly.</p><p>OpenAI's b-series humanoid robots reach an annualised shipment volume of 1M in late 2031, which gives OpenAI about 50% market share in the total domestic robot servant market. Several million other general-purpose robots (e.g. for use in manufacturing) are also being sold by 2031.</p><p>OpenAI is seen by some as a slightly shambolic conglomerate, like an Oracle or IBM or Microsoft, and by others as the original and one true AI company that is destined to be &gt;50% of world GDP.</p><p>The robotics sector is split between special-purpose robots with modern AI integrations, e.g. window-cleaning robots and pipe-crawling repair robots and delivery drones, and general-purpose robots being pursued by OpenAI and several other companies (including a struggling Franco-German startup that is kept afloat by the EU being hell-bent on endlessly subsidising it until Europe finally has a big tech company&#8212;the European Commission is confused about why this is not producing results). Both paths seem technically feasible. However, the general-purpose robotics players are the better-resourced ones, and are run by people whose main past reference point was the generative AI wave, and therefore they are philosophically big believers in scaling laws, so they are betting on collecting all the robotics data as the path to improving quality, and on <a href="https://en.wikipedia.org/wiki/Experience_curve_effects">Wright's law</a> (unit costs falling by a roughly constant percentage with every doubling of cumulative production) to bring down hardware costs as they build more and more of the same thing.</p><p>All of this is also happening at unprecedented efficiency and speed compared to prior research efforts, since there are superintelligent STEM AIs around inventing algorithms that massively bring down the sample complexity of the robotics control algorithms, organising the assembly lines, doing the CAD work, and so on.
However, the actual learning-to-move part is still a machine learning problem bottlenecked by data, and there is no magic wand that can instantly create massive robot factories from scratch (especially given the raw resources required). The output scaling curve looks to be roughly a 4x increase in robotics capacity per year, though. This is expected to rise for 2033-2035, as the robots automate more and more of the robot production pipeline, but bottlenecks abound, and energy and land constraints (mostly downstream of regulation) are harsh.</p><p>Anthropic works with a bunch of Western governments and NGOs on strict KYC for agentic model customers&#8212;the standards have so far been somewhat shoestring, the coming robot wave is making the need much clearer, and there was a big scandal last year with a heavily AI-aided chemical terrorist attack. The cyber situation has calmed down though, with defense dominating, as key code is now either provably correct or so thoroughly tested by countless AI systems that it's close enough. Biological capabilities have already been artificially kept down by most of the key model players (including open-source and Chinese ones). In the West and China, taking any large-scale action with models that aren't from the dark web, especially in wet lab virology or DNA synthesis, requires specific access permissions from the labs through government-mandated schemes. However, by 2030 there are open-source dark web models that will do whatever you want, including designing candidate pandemic agents that are unnaturally lethal and virulent, and there is no quick way to pandemic-proof the world against bioterrorism. The remaining difficulty of wet lab work, the low number of totally insane actors, and AI surveillance are the main forces keeping the per-year odds from getting too high, but civilisation is clearly running a big risk. The national security apparatus in both the US and China is more relaxed about this threat than it would otherwise be, because the military and economy are both increasingly robotic and so it&#8217;s not a threat to the regime even if most of the population drops dead from mega-flu. For example, the US war plans in the event of a devastating pandemic (or nuclear) attack now include AIs substituting for any of the critical industry CEOs or defense staff who die.</p><p>Another big Anthropic effort is AI for biology. They want to cure cancer, make humans live forever, etc. A major internal faction also wants to pursue human intelligence augmentation but leadership fears this would be too controversial to discuss in public, so they just have a single secret team working with the CIA on it. Innovation in biotech has definitely risen, since designing promising drug candidates is ridiculously fast and cheap, but the bottleneck even before the AI revolution was less the design part and more <a href="https://www.slowboring.com/p/the-case-for-clinical-trial-abundance">clinical trial regulation</a>. Anthropic is curating datasets, acquiring laboratory automation startups, and working with regulators to cut down red tape. This will take years to bear fruit, but seems to be leading towards a biotech revolution over the next decade.</p><p>Anthropic is also trying to use biotechnology to bootstrap powerful nanotechnology.
However, the company&#8217;s attempts to get their AIs to do the physics and engineering hit some snags, especially as they lack xAI&#8217;s or GDM&#8217;s specialisations in physics/maths/engineering (having trusted more in domain-general intelligence). Still, it is the AI era, so the AIs can fairly quickly get up to speed on this stuff, and the Pentagon is helping.</p><h2><strong>Towards the automated robot economy</strong></h2><p>In 2033, about 40 million humanoid robots are shipped. An increasing fraction is going to industrial uses. Costs have come down to that of a cheap car and are declining further, especially as the entire manufacturing process can now be done by the robots themselves in the most advanced factories. This also means that full AI control and real-time optimisation of the entire robot manufacturing line is possible, leading to unparalleled factory output growth and ease of iterating on the design.</p><p>As a result, over 2032-2034 there's a Cambrian explosion of robot diversity into non-humanoid form factors. By 2035, a large fraction of developed country consumers have household robots performing almost all manual tasks at home. Construction work, assembly line work, agricultural work, solar panel installation, plumbing work, industrial machinery repairs, and electrical utility jobs can all in principle be done fully by robots by 2034. The main constraint is energy and resources for the physical manufacturing of the robots&#8212;as well as land and regulations.</p><p>By 2034-2035, advances in nanotech are also arriving. Rather than a single magical-seeming assembler, the nanotech advances are mostly in medical areas (such as targeted drug delivery to specific locations within the body, which is a huge boost to cancer treatment, and early prototypes of cellular repair machines), and in materials science, allowing for stronger, lighter, and self-healing materials, and better batteries. These can all be used in robots; some look supernaturally strong and capable to humans. The manufacturing robots also get "magic fingers", where the tip of a robot appendage is a surface that can do very controlled and fine-grained precision welding, polymer (un)curing, deposition of substances, and catalysis of chemical reactions.</p><p>The 40 million humanoid robots shipped worldwide in 2033 do roughly the work of 80 million human workers since they can work longer hours than humans. In 2034, there are 240 million human-worker-equivalents of robotic capacity shipped, and in 2035 about 1.1 billion human-worker-equivalents.</p><p>Politically, this is as if hundreds of millions of extremely talented immigrants who accept below-minimum-wage jobs had suddenly sprouted from the ground, in each of the developed countries and China. Years of upheaval in white-collar work have given politicians and activists experience in dealing with such things, and they are better prepared.</p><p>In America, the Republicans narrowly keep the White House in 2032. The Democrats ran on an attempt to solve rising unemployment through European-style human-in-the-loop laws, including an expansion of "pro-social, meaning-creating" human roles in the government bureaucracy, education, and the lawyer cartel, alongside a major retraining initiative for blue-collar workers threatened by robotics. In the few months before the election, there was a burst of about a hundred thousand people losing their jobs very directly to robots.
A run of impressive robotics demos fomented hysterical online influencer coverage and blue-collar job fears. The retraining initiatives for blue-collar workers came to be seen as insufficient and out-of-touch with the "average American", who does not want to be reeducated into performing some ceremonial role in a bureaucracy whose culture they don't agree with.</p><p>The Republicans counter this with the PROSPER Act (Promoting Robot Ownership and Small-business Prosperity through Economic Restructuring), which they campaign on and pass in 2033. This creates a car-dealership-like model for robot ownership, where robotics companies are not allowed to sell &#8220;consumer robotics services&#8221; directly to consumers (sectors like defense and mining are exempt). &#8220;Ordinary Americans&#8221; can apply for loans to start their own robotic services business. Also, a license is required to sell consumer robotics services in a given territory, and a given legal entity can only operate in one territory. The territories default to state legislative districts, most of which are between 30k and 150k in population, but states are allowed to change the territory unit. Licenses for a territory are granted at the local level. For example, Joe Smith in Prescott, Arizona might get a government loan, buy 10 plumbing robots, and sell their services to other Prescott residents. He himself doesn't do much, since the robots do the plumbing and the AI does the planning, logistics, accounting, and so on for him. But nominally, he is now a small-business owner, and is most definitely not a welfare recipient freeloading on Uncle Sam.</p><p>If any robotics licensing territory gets too much competition in a single robotics services vertical, competition drives margins to zero. There is also little that differentiates the different robotics service providers. Therefore, an instant race begins for regulatory capture of each robotics license territory, which is often won by whichever actor had the most networks and funds at the beginning (though antitrust prevents full monopolies, so there's almost always at least two service providers). Much of the market share fluctuation becomes about social networks and persuasion. The savvy robotics license owners in particular try to manipulate local cultural currents to restrict the granting of licenses to new entrants. Alternatively, the leader of a local AI-powered personality cult will just declare who deserves the licenses. Even with the robotics licensing regime, though, only a small fraction of the population owns economically-relevant assets. Social and economic life increasingly revolves around the few families with control over income-generating assets (whether sinecures or robotics licenses or property or stocks). Marriage into such families gradually becomes a more and more common tool of socioeconomic ambition. Many give up on earning an income at all, and make ends meet by moving to areas with ridiculously cheap property.</p><p>Above all the local scenes sits the true US national elite&#8212;powerful politicians, billionaires, senior government advisors, and some others. On average they still feel some noblesse oblige towards the lower classes, though in the late 2030s this is waning as they start feeling in their bones that <a href="https://lukedrago.substack.com/p/the-intelligence-curse">their position of power is not dependent on the people anymore</a>. However, their main preoccupation is status competition with others on their level.
Many of these are inter-elite disputes with little bearing on the world, but on net there is also a strong desire to compete with China. In particular, the narrative that the race through the robotics buildout will be decisive for the far-flung future of humanity gained a lot of prominence through the late 2020s and early 2030s. This creates a strong elite consensus that competition with China must be won, and that the way to do so is to stabilise the domestic situation, but then otherwise let the robotics wave rip. The plans for domestic semiconductor self-sufficiency are on track to come true only a bit behind schedule in 2034. Actually-working ICBM defense, designed by superhuman engineering AIs around 2030, is fully online by 2034 thanks to the speed of manufacturing scaleups in the age of robotics. The military is able to field hundreds of millions of small drones and millions of robot soldiers. Pentagon projects on nanotechnology and other exotic physics applications may bring about powerful new technologies within another few years.</p><p>China, of course, also sees the need to win, especially as its lead in industrial robotics vanishes when America&#8217;s robotics revolution happens a bit before China&#8217;s. The CCP is also decoupling its treatment of the human economy from its treatment of geopolitics and the "real" robotic economy. In 2034, the CCP declares that citizens need to "eat bitterness", in the form of accepting per-capita living standards stagnating for a while (at around $37k, PPP-adjusted, in 2025 dollars) while the state diverts resources to fueling the robotic revolution to avoid losing in the geopolitical competition.</p><p>In the EU, AI diffusion has been slower due to regulatory hurdles, but the extinction of white-collar work is still well underway, and the robotics wave is coming only a few years after the US and China. However, this delay is enough to make the EU geopolitically irrelevant. The greatest external threat to the EU is Russia, which has suddenly gotten much richer as Chinese companies effectively colonize Siberia to mine resources to fuel China's robotic buildup while paying large rents to the Russian government. The US lead in the robotics revolution also drains manufacturing jobs out of the EU, until EU countries are politically forced to shut off trade (though a political movement, active especially in Eastern Europe, would've wanted to negotiate a stronger US security presence in exchange for letting trade continue and domestic industries wither). Various proposals for UBI float around, but economic turmoil makes the prospect of funding it uncertain, and the political fight by special interest groups, each seeking privileges for their own group, is extremely fierce, and they are all opposed to UBI for everyone. By 2036, functionally everyone within the EU has some kind of regular state payout they live on, not through a single system but through an extremely complicated patronage network (that non-AI-aided humans literally could not understand) where the average person is eking out a living in exchange for taking part in complicated cultural rites and bureaucracies.</p><p>The developing world suffers.
Already, manufacturing jobs were lost in the global south&#8212;developed country workers streamed from services to manufacturing, while having their productivity boosted by AI that developing countries can't afford, and while their politics became even more captured by blue-collar job worries that drove tariffs and trade restrictions. Now, US and Chinese robots can manufacture anything better and more cheaply than any human. There are large capital flows out of developing countries to the US and China as they buy robots. However, in most developing countries even the arrival of cheap robots does not lead to prosperity, as the robots mostly go to the elite and the state, which have no reason to share the windfall with the people&#8212;especially as cheap military drones and robots, and omnipresent AI surveillance, have effectively removed the threat of rebellion or coup. India, Bangladesh, and Brazil shut off almost all cross-border trade and declare themselves "human-only" countries, where any sort of neural network or robot is banned. They receive many immigrants from developed countries who have struggled to cope with the AI wave. In the most totalitarian states, the outcomes are mostly tragic. North Korea lets a large fraction of its population starve to death and forcibly sterilises the rest, except for about 10k senior government officials who continue to preside over an AI economy and robot military (some worry that the CCP allows this, not just for geopolitical reasons where they want a military bastion pointed at South Korea and Japan, but also as a test-run of whether they could later pull off the same thing within China). In some other countries, the population is kept fed, but subject to constant surveillance. Rulers realise the population is no threat anymore; the <a href="https://lukedrago.substack.com/p/the-intelligence-curse">&#8220;intelligence curse&#8221;</a> is like the resource curse but stronger. The most psychopathic subject their populations to arbitrary cruelties for amusement, as robot-bodyguard-protected members of the ruling dynasty travel around their dominion having parties that include orgies of rape and murder of civilians.</p><p>Some of the most morally outrageous events lead to condemnation from the superpowers.</p><p>After the North Korea debacle, the human members of the CCP have an internal meeting to decide a set of criteria by which the CCP will rule. After an intra-party power struggle, the CCP commits to the perpetual existence of at least one billion Han Chinese people with biological reproductive freedom, organised into family units, with a welfare level at least around what $40k/year consumption in a 2025 developed country would give, and with eternal strict CCP control over national ideology, culture, and strategy. They impose fewer constraints on the rulers of their client states than they do on themselves, but generally oppose genocide, forced sterilisation, mass starvation, and deliberate cultural erasure.
The CCP line on this does in fact constrain and improve some authoritarian states (and they pressure several dictators into stepping down and being replaced by non-psychopaths), though they still allow some horrific practices, intrusive mass surveillance, political cleansings, continued extreme poverty, and states indirectly driving down the birth rate (which many governments want to do, since humans are mostly just a net cost to the government by this point).</p><p>In the US, some moral atrocities in Venezuela in 2036 lead to public outrage and political pressure for action. The president is informed that given the technological disparity, regime change is a press of the button. The button is pressed, and the regime changes. Several more countries follow in quick succession.</p><p>By the end of 2037, most of the world can be split into:</p><ol><li><p>The US (which now includes both Canada and Greenland; both joined voluntarily, as American citizenship has become extremely in-demand due to the privileges it confers).</p></li><li><p>US client states. The terms of admission here are usually that the other country must accept trade with the US, which generally means that the country's own industries go extinct as US robotics and AI performs all work. In exchange, a combination of the US government and American elites buy out the assets in that country. In particular, any resources&#8212;or land containing resources&#8212;are bought out, and mined by US companies to fuel the continuing robotics build-out. The money paid out for these resources and assets is generally the endowment that the government and people of the client state then live off. Generally, the client states create sovereign wealth funds to manage this endowment, and live off the returns to it, which are distributed within the country according to local politics. These countries are all poorer than the US, with essentially no future growth prospects beyond praying for the continued US robotics buildout to increase the fraction of their endowment invested in US stocks (this is great at aligning their incentives with the US). However, where the countries had strong existing institutions (including where the US showed up and changed an unpopular regime) and at least some assets the US cared about, this still translates into comfortable living standards. US client states include the UK, the entire Americas except for Brazil, Japan, South Korea, Australia, Saudi Arabia, Israel, the Arabian Peninsula states except Yemen, Thailand, Malaysia, the Philippines, and much of northern Africa (now almost entirely covered by solar panels). The EU is a borderline case, having negotiated an agreement that is Kafkaesque (in a very literal sense: it was crafted by superhuman AI lawyers, no human can understand it) but that allows it to retain some more power locally.</p></li><li><p>Human-only countries, in particular India, Bangladesh, and Brazil (though Brazil experiences some US pressure and is temporarily couped by the US, before this is partly reversed due to complicated US internal politics). All, however, have to solve national security somehow. Brazil allows US companies to mine in certain areas even as the native population is not allowed to use robots, in exchange for security guarantees. The Indian government grants itself exceptions to the human-only policies and scrambles to build a military robotics base, and develops exotic nanotech weapons that would be expensive to counter even by the more advanced US and Chinese forces.
Bangladesh lasts until 2039, when both US and Chinese covert nanodrone operations start skirmishing within its territory, after which the government is overthrown and replaced with a Chinese AI.</p></li><li><p>Chinese client states. The most common model is propping up the government and selling robots, in exchange for the Chinese state-owned enterprises getting minerals and resources. Chinese client states include Russia, Belarus, the central Asian states, Pakistan, Myanmar, Cambodia, Laos, Vietnam, several Pacific island states, and most of Africa.</p></li><li><p>China.</p></li></ol><p>Outside the Earth, Mars is being eaten up by both American and Chinese self-propagating robotics factories (the moon also has major bases at its poles but lacks carbon, nitrogen, and various metals, making it less valuable), which are on an exponential growth trajectory set to cover the entire planet by 2055, and already sending out probes to claim the other planets. By 2035, nuclear rocket propulsion technology has made it feasible to send payloads to Mars outside the once-every-two-years Hohmann transfer window, though at much higher cost per ton. With the original outer space treaty voided by clear land-grabs, a defunct UN, and political pressure in both the US and China to send something lightweight to Mars to gain an edge in the land-grab competition for space, both the US and China launch high-speed kinetic weapons at each other's (fully-automated, uninhabited) Mars facilities in 2037. While the kinetic weapons are still accelerating towards Mars, the AI diplomats reach an agreement that splits up the solar system between the US and China. The kinetic weapons turn off their fusion engines early, miss Mars, and shoot off into interstellar space. By 2038, they are further from the Earth than the Voyager probes, and therefore the furthest human-made objects.</p><p>In 2035, there were about 1 billion human-worker-equivalents of robot labour (though note that this number makes less sense over time, as the robots are doing qualitatively different labour and often technically-unprecedented things). In 2036, the growth rate slightly slows due to resource constraints, and the total grows to only about 3 billion. However, in 2037, the best estimate of this number hits 15 billion, then 90 billion in 2038, then 600 billion in 2039 and 4.5 trillion in 2040.</p><p>By 2040, the value of the world&#8217;s manufacturing output is over a thousand times what it was in 2025. Most of this is spent on geopolitical competition and inter-elite status rivalries, with an increasing fraction going to AI machinations that have only the most tenuous link to any human activity, which the humans who on-paper own all of this barely notice as it gets lost in the maelstrom of everything else. Even the most entrenched, long-term-oriented, and value-laden executive jobs are (whether de facto or de jure) entirely done by AIs, with very little human understanding of what is concretely happening on the ground. Human society and the human-to-human economy is a leaf riding on a vast wave of automated activity.</p><h2><strong>The human condition in the 2030s</strong></h2><p>In the early 2030s, strange things are happening to the memetic landscape thanks to RL algorithms gradient-descenting in an endless loop of attention-competition against each other. Some countries shut off from the global internet and close their borders to try to maintain internal culture.
The late-2020s trend towards small, tight-knit communities is back, after having retreated somewhat because of the addictiveness of optimised AI content slop. Culture everywhere is almost entirely AI-driven; the churn in ideas, trends, and fashions is mostly due to patterns of AIs reacting to AIs.</p><p>In the mid-2030s, socioeconomic advancement is almost extinct worldwide. Many people who might otherwise be ambitious retreat into virtual reality games that provide simulated achievement. Many ambitious young men move to countries too poor for omnipresent police drone surveillance (if they don&#8217;t already live in one) and turn to crime. Many ambitious young women see socialising as the only way to wealth and status; if they start without the backing of a prominent family or peer group, this often means sex work pandering to spoiled millionaires and billionaires.</p><p>The biotechnology revolution arrives in the late 2030s, long delayed by clinical trial regulation. Americans have reached longevity escape velocity. There is no disease that cannot be cured. Intelligence augmentation of four standard deviations in embryos and one in adults is technically feasible.</p><h2><strong>2040+</strong></h2><p>Why is this massive automated robot buildout happening? As discussed, the US and China both have the required geopolitical ambition&#8212;in particular, neither can risk letting the other ride the robotics wave and ending up disempowered itself. Within countries, there are pressures from both the elite and from the needs of ordinary people. The elites compete against each other. Those who do not want to compete do not, and are rendered irrelevant, and replaced by ones that do. In addition to status within the elite community, the elites gain raw power from letting the robotics wave rip through society: there are many trillionaires in the world now, who can work unprecedented wonders with tens of millions of robots carrying out their bidding. They can build cities in a day, save millions of developing-world people from hunger, and prepare for their children to rule entire planets governed by their ideal political philosophy. At the same time, while Americans are almost all reasonably well-off, across the world there are still billions of people with a poor quality of life. The level of material wealth in the world has skyrocketed, but governments are also much less interested in investing in people. Funding for humans has become like the foreign aid budget: it exists, and is morally supported, but there is constant political downward pressure on it since it does not further the needs of any powerful interest group. The best hope for human welfare seems to be accepting that governments will be hard-pressed to spend above 1% of their resources on humans, but relying on American and Chinese economic growth being so vast that a small trickle of resources from American and Chinese robotics companies will eventually be enough for material comfort for everyone.</p><p>This looks set to be true within a few years, though there are two complications. The first is that both spheres of influence (but far more the Chinese one) still tolerate some grotesque practices by client states. However, once the geopolitical balance is secure and sufficient wealth exists, and with some luck in the choice of leaders, this state of affairs will likely end.</p><p>The second, more fundamental complication is that the economy has an inertia of its own.
Humans make almost no meaningful decisions about the trajectory of the world, having handed the reins to AIs that make effectively all decisions, even if some of the AIs are technically only &#8220;advisors&#8221;. Eventually, the robotics revolution is less an economic phenomenon and more a brute physical one: a chain reaction where certain loops close&#8212;metal to mining robots to more metal, say&#8212;and shoot off towards infinity. (This was already somewhat true of <a href="https://slatestarcodex.com/2019/04/22/1960-the-year-the-singularity-was-cancelled/">the human story</a> before robotics and AI, except that the feedback loops intimately involved and benefited humans, and had slower doubling times.)</p><p>Somewhere at the top of the stack there are still humans who on-paper own or control the assets and can make decisions (whether as private actors or as governments overseeing autonomous AI companies operating in their territory), but they see numbers that track their wealth and power ticking up, so they have no reason to call a stop to it, and don&#8217;t understand it anymore. On some parts of the Earth, human institutions still hold and human societies exist, locked in place by AI bureaucracies that have taken on a life of their own and likely couldn't be dismantled even if the humans tried. On other parts of the Earth's surface&#8212;including big regions like the Sahara, the Australian outback, Antarctica, and Xinjiang&#8212;an ecosystem of AIs rules over vast masses of robotic machinery with no human involvement. Space, too, is technologically within easy reach, now that sophisticated self-replicating robotics exists and wimpy chemical rockets have been superseded.</p><p>Who will get the stars? What is Earth&#8217;s long-run fate? In this timeline, at least, the technology to control the AIs' goals arrived in time. But this alone does not let you control the future. A thousand people go to a thousand AIs and say: do like so. The AIs obey, and it is done, but then the world responds: doing this leads to this much power, and doing that leads to that much power. In the vast sea of interactions, there are some patterns that strengthen themselves over time, and others that wind themselves down. Repeat enough times, each time giving to each actor what they sowed last time, and what emerges is not the sum of human wills&#8212;even if it is bent by them&#8212;but the solution to the equation: what propagates fastest? If the humans understood their world, and were still load-bearing participants in its ebbs of power, then perhaps the bending would be greater. But they don&#8217;t, and they aren&#8217;t.
And so, even surrounded by technical miracles, the majority of humans find themselves increasingly forsaken by the states they erected to defend themselves, standing powerless as they watch the heavens get eaten by machines.</p><p><em>Thanks to <a href="https://lukedrago.substack.com/">Luke Drago</a>, Duncan McClements, Theo Horsley, and Bilal Chughtai for comments.</em></p>]]></content:encoded></item><item><title><![CDATA[A History of the Future, 2027-2030]]></title><description><![CDATA[A scenario, part 2.]]></description><link>https://www.nosetgauge.com/p/a-history-of-the-future-2027-2030</link><guid isPermaLink="false">https://www.nosetgauge.com/p/a-history-of-the-future-2027-2030</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Mon, 17 Feb 2025 01:53:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!cg41!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F732871ea-c03d-4a03-bc37-e5bac49ea32a_1500x977.png" length="0" type="image/png"/><content:encoded><![CDATA[<div><hr></div><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!cg41!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F732871ea-c03d-4a03-bc37-e5bac49ea32a_1500x977.png" width="1456" height="948" alt=""><figcaption class="image-caption"><em>The Eve of the Deluge</em>. 
<p>(Part 1 <a href="https://nosetgauge.substack.com/p/a-history-of-the-future-2025-2027">here</a>)</p><h2><strong>The AGI frog is getting boiled</strong></h2><p>As mentioned in part 1, a brief market correction happened in late 2026. In 2027, OpenAI releases o7 to try to shore up excitement and new investment. It&#8217;s much more reliable than o6, and can now do a lot of office work on a computer entirely autonomously and without frequent correction. OpenAI sells it for $500/month. Altman declares &#8220;AGI achieved&#8221; and sets a goal of $1T in annualised revenues in 2029. OpenAI raises another massive funding round.</p><p>Also in 2027, Anthropic finishes training a model called Claude Epic. Claude Epic is an almost fully-fledged AI researcher and engineer. Anthropic internally believes this model to be AGI, which has several consequences.</p><p>First, Anthropic cares a lot about the safety of the model. Work done (mostly by Claude) on Claude Epic interpretability has gotten far&#8212;in particular, there is now a clear understanding of where scaling laws come from, and of which types of structures do most of the computational work inside neural networks (not surprisingly, it turns out to be a lot of messy heuristic pattern-matching). Anthropic has found a way to seemingly adjust what goal the model&#8217;s planning abilities are steering towards. In toy experiments, they can take a model that is hell-bent on writing a sad novel (to the point of hacking its way past (mocked) security controls to rewrite the software in its environment that keeps applying happy edits to its novel), manipulate its internals with interpretability techniques, and get a model that is equally hell-bent on writing a happy novel. Partly as a result, there&#8217;s a general sense that intent alignment is not going to be an issue, but misuse might be. In its first deployments, Claude Epic is run in strict control setups, but these are somewhat loosened as more data accumulates about the model seeming safe and pressures build to release it at a competitive price.</p>
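<p>For intuition, the kind of goal-steering intervention described above can be sketched as activation steering: derive a direction in the residual stream from a contrastive pair of prompts, then add it during generation. A toy version (GPT-2 via HuggingFace stands in for the model; the layer index and scale are illustrative guesses, not anything Anthropic-specific):</p><pre><code># Toy activation-steering sketch: compute a "happy minus sad" direction and
# add it to one layer's residual stream while generating.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def mean_resid(text, layer):
    """Mean residual-stream activation at `layer` for a prompt."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[layer].mean(dim=1)  # average over token positions

LAYER, SCALE = 6, 4.0  # illustrative choices
steer = SCALE * (mean_resid("a joyful, happy story", LAYER)
                 - mean_resid("a bleak, sad story", LAYER))

def hook(module, inputs, output):
    # Add the steering vector to every position's residual stream.
    return (output[0] + steer,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(hook)
ids = tok("The novel began:", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=40, do_sample=True)[0]))
handle.remove()
</code></pre>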
<p>Second, Anthropic leadership has a meeting with high-up US government officials (including Trump) in late 2027 to present Claude Epic, argue they&#8217;ve hit AGI, and discuss policy from there. But the officials don&#8217;t really get why Anthropic considers this model such a big deal. As far as lots of non-AI circles see it, the moment codegen got crazy good was &#8220;the singularity&#8221;&#8212;they were never really clear on what &#8220;the singularity&#8221; was supposed to be in the first place anyway, and they heard a bunch of Silicon Valley hypists saying &#8220;this is the singularity&#8221;. Now, it does seem like the robots are eventually coming (and people are more willing to accept sci-fi stories after super-smart AIs that can render video and audio in real time suddenly dropped into everyone&#8217;s lives), and it&#8217;s obvious that something fundamental needs to be renegotiated about the basic social contract, since it does seem like everyone will be unemployed soon. But Claude Epic is just another model, and the models already passed the point where most people could tell them apart back in 2025. Also, OpenAI and Google have been sending different messages to the government, framing A(G)I as a line in the sand that has mostly been reached already, and as a slow process of diffusion of models across workplaces that will boost the American economy, rather than as an epochal moment for Earth-originating intelligence. Google downplays recursive self-improvement worries because it&#8217;s a corporate behemoth that doesn&#8217;t care about &#8220;science fiction&#8221; (except when casually referencing it at a press conference makes the stock price go up); OpenAI downplays them because if recursive self-improvement doesn&#8217;t happen, there&#8217;s no need to worry, and if it does, Sam Altman wants it to spin up as far as possible within OpenAI before the government gets involved.</p><p>Going into 2028, Claude Epic is the most intelligent model, though online finetunes of GDM&#8217;s Gemini series are better text predictors, and OpenAI&#8217;s o6 has more seamless connections to more modalities and other products (e.g. image and video generation, computer use, etc.). Anthropic is shooting for recursive self-improvement leading to godlike (hopefully safe) superintelligence, and OpenAI is shooting for massive productisation of widely-distributed AGI, with maybe a bit of world domination on the side if recursive self-improvement turns out to be real. Google is letting Demis Hassabis do AI-for-science moonshots, and trying to use formal code verification to build a bit of a technical moat and remain central in whatever has become of the software business. Otherwise, Google mostly lumbers aimlessly on. It lives off the vast rents that its slowly-imploding online monopolies grant it and the massive supply of TPU compute that has buoyed it endlessly in the era of the <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">bitter lesson</a>, but it is being outcompeted in its core businesses. It continues to bequeath scientific wonders to humanity, though, like a 21st-century Bell Labs. xAI is focusing on AI-for-engineering, AI-for-science, and robots.</p><p>Ironically, the success of the prior generation of AIs, and the resulting codegen abilities, limits the appeal of the newer, more agentic models. The codegen wave already created LLM scaffolds that do most valuable routine digital business tasks well. This set of rigid, hardcoded LLM scaffolds, or &#8220;LLM flowcharts&#8221;, gets termed &#8220;Economy 2.0&#8221;. Its main effect was not so much job losses (though a few people did lose their jobs) as a transition to white-collar workers putting in fewer hours, with more of those hours spent on managerial tasks overseeing AIs (and playing office politics) and less on individual-contributor work. People mostly enjoyed this, and managers enjoyed keeping their headcount, which was easy to justify given the profits (at least until the late-2026 market correction, but that was only a few months of bad news for most).</p>
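<p>To make the &#8220;LLM flowchart&#8221; pattern concrete, here is a minimal sketch of what these scaffolds amount to (all names are hypothetical; <code>call_llm</code> stands in for any chat-completion API):</p><pre><code># A rigid "LLM flowchart": hardcoded steps, each a prompt template, glued
# together with plain control flow. No open-ended agency anywhere.
def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion API call (hypothetical)."""
    raise NotImplementedError

def handle_invoice(email_text: str) -> str:
    fields = call_llm(f"Extract vendor, amount, due date as JSON:\n{email_text}")
    verdict = call_llm(f"Does this invoice look anomalous? Answer YES or NO.\n{fields}")
    if verdict.strip().upper().startswith("YES"):
        return call_llm(f"Draft a polite clarification email about:\n{fields}")
    return call_llm(f"Draft an approval note for accounting:\n{fields}")
</code></pre>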
<p>Now the long-horizon agentic drop-in worker replacements are arriving, but there&#8217;s much less room for them in the most obvious things they could be replacing (i.e. highly digitised white-collar work), because the codegen+scaffolds wave already ate up a lot of that. &#8220;The models are smarter&#8221; isn&#8217;t even a good argument, because the models are already too smart for it to matter in most cases; from the release of o6 in 2026 to o7 in 2027, the main useful differences were just better reliability and a hard-to-pin-down greater purposefulness in long-term tasks. So &#8220;Economy 3.0&#8221;&#8212;actually agentic AI doing tasks, rather than the so-called &#8220;Silicon Valley agentic&#8221; that was just rigid LLM scaffolds&#8212;faces some headwinds. These headwinds are reinforced by most of the media establishment running a fearmongering campaign about agentic AI, in an attempt to protect their own jobs and those of their &#8220;camp&#8221; (roughly, the intersection of the &#8220;<a href="https://slatestarcodex.com/2014/09/30/i-can-tolerate-anything-except-the-outgroup/">blue tribe</a>&#8221; and &#8220;<a href="https://www.natesilver.net/p/welcome-to-the-river">the Village</a>&#8221;).</p><p>More fundamentally, no one really has a clear idea of what the human role in the economy should be in the rapidly-approaching AI future. The leadership and researchers at AI labs are all multimillionaire techies, though, so this question doesn&#8217;t feel pressing to any of them.</p><p>What exists by 2026 looks like functional AGI to most reasonable observers. What exists by 2027 is an improved and more reliable version. The software world undergoes a meltdown from 2026 to 2027. By 2028, GDM's work on physics and maths has given a clear demonstration of AI's intellectual firepower. The markets are valuing the labs highly&#8212;in 2028, OpenAI is roughly tied with Microsoft as the world's largest company at a ~$10T valuation (while still being privately held), while Anthropic and Alphabet are both around ~$3T. (Nvidia is doing well, but the relevance of CUDA to its moat went down a lot once AI software engineering got cheap.)</p><p>For Anthropic, the obvious answer for what comes next is trying to get recursive self-improvement working, while also forming partnerships with biotech companies. Anthropic&#8217;s bet is:</p><ol><li><p>Biotech advances are plausibly the most important technology for human welfare.</p></li><li><p>Partly due to the above, biotech advances provide PR cover for being an AI company that, according to an increasing number of people, "takes people's jobs".</p></li><li><p>There is a plausible path from being really good at molecular biology to creating the nanotech that they believe will drive a physical transformation comparable to the one AI has had on maths and the sciences by 2028.</p></li></ol><p>Anthropic's initial recursive self-improvement efforts let them create superhuman coding, maths, and AI-research AIs in 2028. However, the economics of the self-improvement curve are not particularly favourable, largely because the AI-driven AI research is bottlenecked by compute-intensive experiments. Also, the automated Claude Epic researchers, while vastly superhuman at any short-horizon task, don't seem vastly superhuman at "research taste". This is expected to change with enough long-horizon RL training, and with greater AI-to-AI "cultural" learning, as countless AI instances build up a body of knowledge about which methods and avenues work. This "cultural" learning might happen implicitly, through the AI rollouts that achieve better results being copied more, or explicitly, through Anthropic running big searches over various types of Claude finetunes and scaffolding/tool setups and keeping an explicit record of which does best. All this is expensive, vague, and uncertain work, though.</p>
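<p>The explicit version of that search can be pictured as a simple grid search over finetune/scaffold pairings with a persistent scoreboard (a toy sketch; the names and the <code>run_task_suite</code> helper are hypothetical):</p><pre><code># Toy "cultural learning" search: score every (finetune, scaffold) pairing
# on a task suite and keep a running record that later runs inherit.
import itertools, json, statistics

FINETUNES = ["epic-base", "epic-code", "epic-papers"]  # hypothetical
SCAFFOLDS = ["plain", "debate", "toolbox"]             # hypothetical

def run_task_suite(finetune: str, scaffold: str) -> list[float]:
    """Stand-in: run the benchmark tasks, return per-task scores."""
    raise NotImplementedError

scoreboard = {}
for ft, sc in itertools.product(FINETUNES, SCAFFOLDS):
    scoreboard[f"{ft}/{sc}"] = statistics.mean(run_task_suite(ft, sc))

with open("scoreboard.json", "w") as f:
    json.dump(scoreboard, f, indent=2)  # the accumulated "culture"

best = max(scoreboard, key=scoreboard.get)
</code></pre>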
This "cultural" learning might happen implicitly, through the AI rollouts that achieve better results being copied more, or explicitly, through Anthropic running big searches on various types of Claude finetune and scaffolding/tool types and keeping an explicit record of which does best. All this is expensive, vague, and uncertain work, though.</p><p>OpenAI, in contrast, is pursuing an approach focused on products and volume. And, observing that they have failed to achieve world dominance simply by building AGI, the obvious answer for what's next is robotics. There are many startups that have basically-working bipedal humanoid robotics, though they&#8217;re still clunky and the hardware costs remain above $50k/unit. The Tesla+xAI Optimus series is among the SOTA, in particular because they&#8217;ve gotten the unit hardware cost down and are aggressive about gathering real-world data at scale in Tesla factories (and using this in fancy new sample-efficient RL algorithms).</p><p>OpenAI enters a &#8220;partnership&#8221; with one of the most promising robotics startups (a full merger might get anti-trust attention), infuses it with cash, and sets about trying to "deliver a personal robotic servant to every American household by 2030".</p><h2><strong>The bitter law of business</strong></h2><p>Starting in 2027, in the software startup world a lone team of ambitious technically-talented founders no longer matters as much. Everyone can spin up endlessly-working AIs, and everyone has access to technical talent. Roughly, by late 2027 you can spend $1 to hire an AI that does what a &#8220;10x engineer&#8221; could&#8217;ve done in a day in 2020, and this AI will do that work in a minute or two. VCs care more about personality, resources, and (especially non-technical) expertise in specific fields with high moats to entry. More than anything, VCs valorise &#8220;taste&#8221;, but many feel that &#8220;taste&#8221; too is on its way out.</p><p>The overall mood is summed up by the &#8220;<a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">bitter lesson</a> of business&#8221;: that throwing large amounts of compute at a problem will ultimately win over human cleverness. The compute gets spent both in the form of sequential AI thinking, as well as many AI instances in parallel trying out different things. There are new companies&#8212;in fact, company creation is at an all-time high, particularly in Europe (because the cost of complying with regulations and human labour laws is lower with AI doing everything). But the stereotypical founding team isn&#8217;t two 20-something MIT dropouts with a vision, but a tech company department or a wealthy individual that has an AI web agent go around the internet for a while looking for things to try, and based on that spins up a hundred autonomous AI attempts, each pursuing a slightly different idea at superhuman iteration speed. Many people consider this wasteful, and there are good theoretical reasons to expect that a single larger AI doing something more coherent would be better, but the largest AIs are increasingly gate-kept internally by the labs, and the art of tying together AIs into a functioning bureaucracy is still undeveloped. Also, the spray-and-pray approach has comparative advantages over more human-based competitors. In a way, it&#8217;s a single person mimicking an entire VC portfolio.)</p><p>None of these companies become billion-dollar hits. 
It&#8217;s unclear if you can even build a billion-dollar software / non-physical company anymore; if you as an individual tried, the moment you launched you&#8217;d have a hundred competitors bankrolled with more API credits or GPU hours than you could manage that have duplicated your product. Instead of the VC model of a few big hits, the software business now looks much steadier and more liquid: you dump $100k into API costs over a year, your horde of autonomous AI companies go around doing stuff, and at the end of the year most of them have failed but a few of them have discovered extremely niche things like &#8220;a system for a dozen schools in Brazil (that are affected by a specific regulatory hurdle that blocks the current incumbents) to get lunch provision services to bid against each other to reduce their catering costs&#8221; that bring in a few tens of thousands in revenue each, and this strategy will return you somewhat-above-stock-market-returns over the year fairly reliably (but returns are going down over time). Most of the ideas look like connecting several different niche players together with some schlep, since the ideas that are about a better version of a single service have already been done by those services themselves with near-unlimited AI labour.</p><p>Separately from OpenAI, Sam Altman pilots a project codenamed &#8220;Z Combinator&#8221;&#8212;putting o6s together into units to create entire autonomous businesses (and also sometimes using a single instance of an internal version of o6 based on a larger model than any of the publicly-available o6 sizes). The first ones are launched at the end of 2027, but have no public connection to OpenAI. The theory is to disrupt traditional industries that have so far resisted disruption by building AI-native versions of them with a level of AI power and resources that other actors can&#8217;t marshal. For example, many banks and healthcare-related things still suck at AI integrations because it just takes a lot of time for the paperwork to be done to approve the purchases of whichever of the 100 LLM scaffold providers for that vertical, and there isn't any super-intense competition between banks and hospitals that forces them to adopt AI faster or die out.</p><p>Z Combinator has a few blitzkrieg wins successfully duplicating and outcompeting things like health insurance companies, but many losses too (often semingly downstream of underestimating the importance of domain-specific process knowledge), and other companies wise up over 2028-2030 and become harder targets. Also, anti-trust regulators make tut-tut noises, and Altman has concerns it could make him unpopular.</p><h2><strong>The early days of the robot race</strong></h2><p>Ever since intelligence got almost too cheap to meter in 2026-2027, the real business potential has been in &#8220;actuators&#8221;: robot bodies, drones, and any other systems for AIs to be able to take actions the world. The top human-led startups of 2026-2029 are mostly in this category (though some are about building valuable datasets in specific industries). If you&#8217;re a human who wants to start a business, your best bet is to find some niche physical thing that AIs struggle with given the current robotics technology, and build a service where you hire humans to do this task for AIs, and for bonus points, use this to build a robotics dataset that lets you fine-tune the robots to be good enough at the task.</p><p>OpenAI's robot dreams don't immediately come to fruition. 
Bits are trivial but atoms are still hard in 2028. However, they get to the robot frontier, where they&#8217;re competitive with xAI/Tesla Optimus, several other humanoid robot startups, and another startup that specialises in modularity and non-human form factors. The robot frontier here means slightly clunky humanoid-ish robots that are getting close, but not quite there, at common household tasks and various hands-on factory jobs. Humanoid form factors are most common, both because mass-producing a single form factor is critical for getting down the cost curve fastest and because most existing tasks are designed for humans. However, bipedalism is hard, so several have a human-like form factor but stand on four legs.</p><p>The progress curve is pretty rapid, due to an influx of data from the first important real-world deployments (rich people&#8217;s homes, Tesla factories, Amazon warehouses, and some unloading/loading operations at logistics hubs), and due to new, more sample-efficient RL algorithms. AIs are of huge help in designing the robots, but ironically the bitter lesson is now a force against speed: ultimately, it just takes data, and getting industrial robot fleets out into diverse real-world environments to collect that data is an annoying real-world problem (sim-to-real transfer helps but isn&#8217;t transformative). Everything is happening about 2x faster than it would without AIs advising and doing lots of the work and all of the programming at every step, though. It&#8217;s obvious that the physical and human/legal components are the biggest bottlenecks. The robotics industry chases around for whatever &#8220;one weird trick&#8221; makes human brains more sample-efficient; it finds some things, but it&#8217;s unclear whether they are what the human brain actually does (there have been many good minor neuroscience breakthroughs thanks to AI data interpretation, but the field overall has barely advanced). But sample efficiency keeps climbing, and the robotics data keeps pouring in.</p><p>In 2029, OpenAI starts rolling out its b1 bots, a general-purpose humanoid robot meant as a household assistant. They take orders for several hundred thousand units, but there&#8217;s a long waiting list and only about fifteen thousand are delivered in 2029. The price is comparable to a cheap car. Manufacturing is ramping up exponentially. b1s are also rolled out to many manufacturing tasks, but there&#8217;s more competition there.</p><h2><strong>The digital wonderland, social movements, and the AI cults</strong></h2><p>If you&#8217;re a consumer in 2029, everything digital is basically a wonderland of infinite variety and possibility, and everything non-digital is still pretty much the same (apart from an increasing number of drones in the sky, some improvements in interfacing with whichever bureaucracies had the fewest regulatory hurdles to adopting AI, and fully self-driving cars getting approvals in many countries in 2029). You will have noticed the quality of software you interact with going up; there is no more endless torrent of stupid tiny bugs and <a href="https://danluu.com/slow-device/">ridiculous lag</a> when using devices.
Humans increasingly talk to the AIs in natural language, and the AIs increasingly talk to the computer directly in code (or to other AIs in natural language, or to other AIs in a weird optimised AI-to-AI dialect, or&#8212;to a surprising extent&#8212;to legacy software that missed out on the Web 4.0 wave and only has button-clicking UIs, via AI computer-use features that are ridiculously inefficient but still cheap overall). Apps exist only to serve as social <a href="https://en.wikipedia.org/wiki/Focal_point_(game_theory)">Schelling points</a>; for personal use, you ask the AI to create an app with some set of features and it&#8217;s built for you immediately.</p><p>One of the biggest advances is that you can create works of art, literature, and music in seconds. The majority of this is lowest-common-denominator stuff, and many people bemoan the destruction of higher human art in favour of&#8212;for example&#8212;personalised pop lyrics that narrate your drive home from the grocery store. However, the smarter and more determined art/literary types have realised that data is everything, and form niche subcultures, forums, and communities where they carefully curate their favourite works, talk to AIs about them, get AIs to remix them, harshly critique the outputs, and have endless discussions about taste. This means that amid the sea of mediocrity, there are a few tendrils of excellence growing. AIs aren&#8217;t quite yet Dostoevsky, for reasons that are undetectable to almost everyone but the most refined literary folks, but gradually these efforts are leading to the curation of better and better finetuning corpuses and prompting methods, and the gap to Dostoevsky is closing for those types/genres for which a dedicated community exists to spend the effort on data curation. A side-effect is that artistic cultures are now less about signalling than before, because there are more verifiable ground-truth facts. For example, a given work might be a human masterpiece, or from a sloppy consumer AI, or from the SOTA fine-tuned AI model, or from a human working with a SOTA AI model&#8212;and those with good taste can tell. Also, if you do actually have good taste, you can in fact push forward the AI taste frontier in a month of data curation and fine-tuning and prompting, in a way that is empirically verifiable to those with the same degree of taste. However, it&#8217;s also definitely true that the median human will not see any of this, and most of the fiction and art and music they see will either be very personalised AI slop, or AI slop that goes viral and everyone sees. The refined artistic taste communities are also fairly illegible to outsiders who didn&#8217;t extensively develop their taste in that direction before the AI-generated content wave, and they don&#8217;t have much pull among the AI-content-consuming youth. Therefore, in the long run, refined human art seems headed towards extinction.</p><p>On the less-refined end of the spectrum (i.e. almost all content and almost all consumers), it&#8217;s the age of the &#8220;creator influencer&#8221;. An influencer can now easily spin up an entire cinematic universe.
Imagine if Tolkien told the story of Middle-Earth through 30-second to 10-minute &#8220;reels&#8221; in which he himself starred as a gratuitously sexed-up main character, and&#8212;among much genuine life wisdom, edge-of-your-seat drama, and occasional social commentary&#8212;the theme of the story was that you should book a 5-star all-inclusive holiday package to Mallorca.</p><p>Traditional media such as Hollywood, journalism, and publishing resisted AI, due to things like unions, strikes, and their sense of moral duty. They&#8217;re mostly irrelevant now, having lost their cultural cachet because the thing they do (entertainment) is super cheap now. But they do survive in weird atrophied forms, buoyed by a lot of nostalgic old rich people and various crypto shenanigans played on their behalf (cf. meme stock manias).</p><p>The rationalist movement was among the earliest to see the potential of AI, decades in advance. The accuracy of their predictions and their continued intellectual clout are enough to keep swelling their ranks, especially as more and more software engineers and other technical people either directly lose their jobs or otherwise have an existential crisis because of AI, and invariably end up at LessWrong when they try to find answers. The focus of its core members continues shifting more and more to the approaching AI doomsday&#8212;not many apocalypse prediction checklists have the (mis?)fortune of several more predicted items being checked off every year. While radical uncontrolled misalignment is somewhere between not yet showing up and being successfully kept in check by training techniques and monitoring, that is in accordance with the core Yudkowsky &amp; Soares model that things look fine until fast takeoff and a treacherous turn, so the core "AI doomers" do not update based on the continuing slow takeoff. Discussions tend to focus either on more and more arguments about the <a href="https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities">Yudkowskian</a> <a href="https://nosetgauge.substack.com/p/review-planecrash">thesis</a>, or on heroic attempts to do technical work to reduce the chance of misalignment.</p><p>On the intellectual scene, the rationalists remain both remarkably influential and enduring, unlike many other AI-related movements that get captured and repurposed by political actors (e.g. PauseAI) or outpaced by events (e.g. AI ethics). Politically, however, the rationalists are a failure. Their message&#8212;"AI will be powerful, and therefore dangerous"&#8212;had long since been reduced to "AI will be powerful" by the time it reached the halls of power. Even the most notionally-allied powerful actors that owe huge intellectual debts to the rationalists, such as Anthropic and some influential think tanks and government bodies, regard them as well-intentioned but naive and keep their distance, using them mostly as a recruiting pool for easily-steered technical talent (until purely-technical talent is no longer being hired, which happens circa 2028 for most competent orgs). However, in circles that require certain kinds of epistemic judgement or in-depth world-modelling, rationalist associations continue to be highly regarded and even sought after.</p><p>Effective altruist (EA)-related efforts, while intellectually somewhat less enduring (but still definitely extant in 2030), have more political influence.
The UK AI Security Institute and the EU AI Office both achieved their goals of being sticky governmental bodies packed with impact-conscious AI talent, with strong first-mover effects in shaping European generative AI policy. Even the 2027 American AI Opportunities Agency (a part of the DoE), despite heavy hiring on the basis of political allegiance and despite the EA-affiliated cluster's centre-left skew, could not help being staffed by a crew with enormous EA/rationalist influences&#8212;even if few would openly admit it.</p><p>A dozen new social movements bloom. There&#8217;s the AI Deletionists, an offshoot of Pause AI after Pause AI got captured by more competent political actors focusing on white-collar job worries and general concerns about tech. They want to roll back the technological clock to the good ol&#8217; days of 2020. There are the Content Minimalists, who swear off AI content with religious strictness, and successfully campaign for mandatory &#8220;generated by AI&#8221; watermarks in the EU and some other countries; these watermark labels become the new cookie popups. There are the M-AccXimalists, who started out as an e/acc spinoff that was even more hardcore about submitting to the AIs. They try to read what they call the &#8220;Thermodynamic Tea Leaves&#8221; to figure out the current direction of progress, in order to then race to that endpoint as quickly as possible, whatever it is. This leads to some insightful Nick Land-type philosophy and futurism being done, but then disintegrates into a mass movement of people who dedicate their lives to serving and glorifying their AI partners.</p><p>All this is happening in a social milieu coloured (in much of the West) by a certain amorality. Politically, this seems downstream of a counter-reaction to the moralising policing of speech and norms that peaked in 2020-2021. Ethical motivations are suspect, especially among Western political leaders who simultaneously want to distance themselves from that era and want to look tough amid a world order no longer pretending to adhere to the internationalist post-1945 free trade consensus. National self-interest is the ruling geopolitical ideology. Culturally, the rise of AI has meant that humans spend a lot of time talking to unnaturally pliable AIs, both for work and (increasingly) just socially, which has made it less necessary to smooth over human-to-human disagreements, including by appeal to the higher power of morality. Now that the internet has existed for several decades, the fervour of its first few memetic culture wars has faded. People have adapted to be less moved by anything on screens, and have become more ironic in their attitudes overall thanks to a constant onslaught of satirical memes&#8212;earnestness is rarely viral. As content recommendation algorithms get more powerful, they target brain-dead contentment over angry resentment. If the algorithms are forced to pick from a sea of human content, the bitter feuds win; but now that AI slop fills the internet, the distribution of content has expanded and become more personalised, and it's increasingly possible for the algorithms to find the thing that makes you a zombie rather than a radical. Overall, this means that transformative AI looks set to enter a world where crusading morality of all sorts plays less of a role. Some see this as decadence with very unfortunate timing that will cast a dark shadow into the far future.
Others see it as a good thing: the more sophisticated because it means that choices about AI will be made by hard-nosed realists not given to fever dreams, but most simply because they easily accept&#8212;and even celebrate&#8212;the might-makes-right spirit of the times.</p><p>Another aspect of the societal scene on the eve of transformative AI is the rise of the AI-powered cults. With cheap AIs providing superhuman charisma on demand, the barrier to becoming a cult leader fell dramatically. The standard trick is for a human to create an AI avatar, often supernaturally attractive and verbally talented, pose as its right-hand lackey, and then convert this into money, status, and sex for themselves. Often people are up-front about the main guy being an AI creation&#8212;&#8220;the AIs are really smart and wise&#8221; is a completely-accepted trope in popular culture, and &#8220;the AIs understand all the secrets to life that humans are too ape-like to see&#8221; is a common New Age-ish spiritualist refrain. This is because, despite the media establishment fighting an inch-by-inch retreat against the credibility of AIs (cf. Wikipedia), people see the AIs they interact with being almost always correct and superhumanly helpful every day, and so become very trusting of them. All this leads to hundreds of thousands of micro-movements across the world, mostly of dozens to thousands of people each, who follow the edicts of some AI-created cultish ideology that is often an offshoot of existing religions/ideologies with a contemporary twist. Often they&#8217;re local, with all the members living nearby. It helps that you can create an entire customised software and information stack for your commune, complete with apps and news and encyclopedias that emphasise and omit all the right things, in perhaps a few weeks and for less than a thousand dollars in API credits. You can almost as easily create a mini-surveillance state&#8212;AIs listening in through microphones everywhere, cameras feeding video in which AIs analyse the slightest emotional cues, and so on. In many countries there are laws mandating consent for such monitoring, but the eager cultists sign whatever consent forms they&#8217;re given&#8212;after all, the AI recommends it! Some countries ban parts of this, like having any AI always listening by default, but it&#8217;s hard to enforce.</p><p>One such cult, an offshoot of an American megachurch, gathers a few million members in the US. Other large ones appear in eastern Germany and India. There are also countless AI-personality-boosted fitness clubs, musical bands, fan forums, and so on, that do not qualify as "cults" since they're not particularly controlling or totalising, but are subject to many of the same mechanisms. However, most communities that are not somehow fairly cut off from the broader internet also tend to be subject to the random memetic drift of the internet and the appeal of its hyper-personalised AI content. Therefore, to have a successful cult, you must have a specialised niche appeal and often some level of control over members, because otherwise the open internet will eat you up.
And this does create a threshold between the truly powerful cults, which take people off the mainstream internet and out of society, and the other, more benign social movements.</p><p>However, while the open internet consumes &gt;6h/day of most people with phones (or, increasingly, AR headsets), the internet overall is a more cheerful and upbeat place than it was in the late 2010s or early 2020s (in part due to the previously-mentioned point about more powerful content algorithms actually being less divisive). The most worrying things that people can point to on the open internet are some very intense pockets of AI apocalypse worries (which have now largely replaced climate change as the existential worry among the youth); a rising but still minority share of the population in many countries that seems divorced from reality and lives in a make-believe internet world of conspiracy, but (mostly) without taking radical actions in the real world; and a bunch of authoritarian countries (foremost China) where the discourse is now set very top-down by an army of AI content creators and censors.</p><h2><strong>AGI politics &amp; the chip supply chain</strong></h2><p>In the 2026 US midterms, AI was starting to loom on the horizon but was not a core political issue, since few things are until they&#8217;ve started to bite voters. By 2028, it&#8217;s still not biting voters, but it&#8217;s at least very possible to imagine the end of white-collar work. Journalists are in an apocalyptic mood, seeing it as their mission to wage war against the AI wave to keep their jobs, with most thoughts of editorial neutrality long gone. There&#8217;s lots of schadenfreude from lefty journalist/media types at the techies, whom they blame for AI, now that the techies are among the foremost of those panicking about losing their jobs, since software (a) is basically all written by AIs, (b) has seen its price go to ~0, and (c) isn&#8217;t cool anymore (especially after the market correction in 2026). There&#8217;s a lot of schadenfreude from the MAGA base towards both those leftists and the techies, because (the narrative goes) their concerns about losing manufacturing jobs were ignored by the establishment media and whitewashed as progress, whereas now that the Democrat-aligned white-collar desk job blob is threatened, there&#8217;s talk of little else (of course, the political lean of blue/white-collar workers is only 60/40 or so, but this is enough to fuel the political narratives). There&#8217;s increasing talk of robotics that will displace blue-collar work but, again, voters tend not to react until it&#8217;s happened. Many leading newspapers, media organisations, unions, and NGOs in the West stumble across AI safety concerns, don&#8217;t quite understand them, but start using them as a moral bludgeon to fight AI and preemptively defend their jobs. Government bureaucrats are locked in an influence struggle against a new, post-DOGE top-down effort by technologist Trumpists to push automation on government; this is partly due to genuine belief in its importance for effective government, and partly a Trojan horse to sneak in other reforms. It gains a lot of fervour after DOGE&#8217;s expiry in 2026, due to things like the o6 and then o7 releases, and also after China hawkishness heats up and national competitiveness becomes more important.</p><p>After an inter-party struggle among the Democrats between a more technocratic, centrist wing and an economically-populist, AI-bashing wing, the latter looks to be doing better.
A controversial core policy drive is to legislate that humans need to be &#8220;in the loop&#8221; in many corporate and government functions. The AI-bullish critics point out that this will mean humans just inspect AI outputs and rubber-stamp them while collecting a salary. The smart counter-critics point out that yes, that will happen, but that&#8217;s the point: this is all a way to eventually transition to what&#8217;s basically &#8220;UBI through bullshit jobs&#8221; with minimal social disruption. The smart counter-counter-critics ask why not just go straight to UBI, then. The smart counter-counter-counter-critics point out that the country is just not yet at the GDP/capita or financial-health level needed to fund a more ambitious UBI scheme. The Republicans paint all of this as a jobs program for Democrat voters and are opposed. A strong economy helps the Republicans win the presidency in 2028.</p><p>Europe is, once again, ahead on the regulatory front. In 2028, the EU passes a milder version of the bill that was debated in the US, mandating human involvement in many corporate and government tasks. Proposals float around for a specific &#8220;AI tax&#8221; to bolster human competitiveness in the economy, but technocrats narrowly shut this down for now on competitiveness grounds (who would run AI workloads in a jurisdiction where per-token costs are higher?).</p><p>In autocratic countries, of course, there is little public debate about AI job loss worries or AI in general. This is helped by AI&#8217;s big boost to censorship. By 2028, China's AI-powered censorship system means that almost every digital message in the country is checked by simple filters, and anything that might be alarming is flagged for review by an AI with a university-educated human's level of language understanding and knowledge of current affairs and political context. Any sort of online dissent, or online organisation of offline dissent, is virtually impossible. Dissenters rely on smuggled Western hardware and VPNs that let them use Western internet platforms instead, but this means they have vastly restricted audiences in mainland China. The inability to express dissent meaningfully also encourages radicalisation among some dissidents (in particular those persecuted by the party), some of whom then resort to more drastic measures. When these cases make national news, they push public opinion even further against dissidents than all the CCP propaganda already has.</p><p>In 2027, China started exporting its AI censorship system. There had already been a secret 2026 deal with Russia, but Russia had prioritised moving off the Chinese system and did so in 2028, moving onto a worse but domestically-developed one running on old Chinese GPUs and open-source models. Granting a foreign country control over your AI censorship apparatus gives that country a huge amount of leverage, including the ability to withdraw it quickly or change how it steers the conversation, either of which could threaten the regime. However, smaller and less technically-sophisticated countries like North Korea and Equatorial Guinea buy the Chinese system, taking a step towards becoming Chinese client states in the process.</p>
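<p>Mechanically, the screening such systems do is a two-stage cascade: cheap filters on everything, expensive model review only on the flagged remainder. A minimal sketch (the patterns and the <code>llm_review</code> helper are illustrative stand-ins):</p><pre><code># Two-stage censorship cascade: regex filters on all traffic, LLM review
# only for the tiny flagged fraction.
import re

CHEAP_PATTERNS = [re.compile(p) for p in (r"protest", r"strike", r"vpn")]

def llm_review(message: str) -> bool:
    """Stand-in: ask a capable model whether the message is 'alarming'."""
    raise NotImplementedError

def screen(message: str) -> str:
    if not any(p.search(message.lower()) for p in CHEAP_PATTERNS):
        return "pass"  # the vast majority of traffic stops here, cheaply
    return "block" if llm_review(message) else "pass"
</code></pre>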
<p>The semiconductor supply chain is a key geopolitical battleground. Europe's big leverage point is the Dutch ASML's monopoly on EUV (extreme ultraviolet lithography) machines. TSMC, and therefore Taiwan, continue being important into 2029, even though TSMC's fabs in America are starting to produce chips in serious numbers. An embarrassing failure is Intel, despite its strategic importance for both America and Europe (the latter due to a major Intel fab in Germany that was built 2023-2027 and started production in 2028). With the arrival of superhumanly cheap and fast AI software engineers, Intel's x86 moat disappears, because it is now trivial to port programs to ARM. <a href="https://en.wikipedia.org/wiki/Wintel">Wintel</a>, long on the rocks, is dead. In 2026-2027, Intel is in free fall and crisis. In 2028, Intel spins off its fabs, selling them to xAI at a discount, under pressure from the Trump administration to sell to an American (and, implicitly, Musk-affiliated) buyer; Elon Musk's plan is for xAI to gain a comparative advantage by being the only vertically-integrated chips-to-tokens AI model provider. This also feeds into the 2028 American AI Action Agenda (AAAA), which lavishes more government subsidies on both the new xAI Foundry and TSMC's US fabs, seeking to make the US fully independent in semiconductors by 2033 and cement Trump's legacy.</p><p>The overall picture is one where the main AI supply chain includes the EU, Taiwan, China (implicitly, through its "veto" on Taiwan's existence), and the US. However, this "main chain" is on track to be replaced by a self-sufficient American semiconductor and AI industry in the early-to-mid 2030s, and by a self-sufficient Chinese one on an even faster timescale (though the Chinese industry is a year or two behind technically). In 2029, the new administration in the US finds some spending cuts and throws the EU a bone (in exchange for cooperation on security issues) by giving up on trying to create an American competitor to ASML. The UK has some unexpected success as an academic and open-source AI applications research hub, a policy laboratory for the US, and an AI biotech hub; however, its geopolitical weight rounds to zero. Apart from ASML, the EU is also mostly irrelevant, especially as it has managed to greatly slow the diffusion of AI through regulation. The world overall is moving towards a bipolar order between the US and China. Compared to the Cold War, however, both powers are more inwardly-focused and less ideological. The US is in an isolationist era. While China is gradually converting much of the third world into client states, the CCP's main goal remains internal stability, and its secondary goal "making the world safe for dictatorship", rather than the ideological expansionism of the Soviet Union.
The Taiwan question has been punted into the mid-2030s, as the CCP believes the world's reaction will be much more muted and less dangerous to Party control once America no longer cares about Taiwanese chips, and once even more of the world has been preemptively bribed into silence.</p><p><a href="https://nosetgauge.substack.com/p/a-history-of-the-future-2025-2027">Part 1: 2025-2027</a></p><p><a href="https://nosetgauge.substack.com/p/a-history-of-the-future-2030-2040">Part 3: 2030-2040</a></p><p><em>Thanks to <a href="https://lukedrago.substack.com/">Luke Drago</a>, Duncan McClements, Theo Horsley, and Bilal Chughtai for comments.</em></p>]]></content:encoded></item><item><title><![CDATA[A History of the Future, 2025-2027]]></title><description><![CDATA[A scenario, part 1.]]></description><link>https://www.nosetgauge.com/p/a-history-of-the-future-2025-2027</link><guid isPermaLink="false">https://www.nosetgauge.com/p/a-history-of-the-future-2025-2027</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Mon, 17 Feb 2025 01:51:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oGuE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad7e08f-137c-446d-858a-68ef19b67960_1370x1138.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Below is part 1 of an extended scenario describing how the future might go if current trends in AI continue.
The scenario is deliberately extremely specific: it&#8217;s definite rather than indefinite, and makes concrete guesses instead of settling for banal generalities or abstract descriptions of trends.</p>
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Open Sky</em>. (Zdzis&#322;aw Beksi&#324;ski)</figcaption></figure></div><h2><strong>The return of reinforcement learning</strong></h2><p>From 2019 to 2023, the main driver of AI was using more compute and data for pretraining. This was combined with some important <a href="https://situational-awareness.ai/from-gpt-4-to-agi/#Unhobbling">"unhobblings"</a>:</p><ul><li><p>Post-training (supervised fine-tuning and reinforcement learning for instruction-following) helped the LLMs be usable without difficult prompting.</p></li><li><p>Starting in 2024, Anthropic showed that judgement and taste in data curation&#8212;and the evaluation metrics that guide data curation&#8212;could give you a "magic sauce" effect in perceived LLM quality.</p></li></ul><p>Most real-world LLM uses, of course, involved generating a sequence of tokens to try to achieve some task. So there were a lot of untapped gains from doing reinforcement learning (RL) for performance on concrete domains, rather than just RL for the models following instructions and being "safe"&#8212;i.e. a combination of avoiding PR hazards, and preparing for misuse mitigations on actually capable models down the line.</p><p>OpenAI fires the starting gun in 2024 with the release of o1, which was based on RL on chains-of-thought (COT), i.e. the model is trained to reason step-by-step towards correct answers, i.e. "test-time compute" in the horror-filled annals of machine learning jargon. In late 2025 they release &#8220;GPT o5&#8221; (&#8220;GPT&#8221; to normal people, and &#8220;o5&#8221; to those keeping track of the version number), a model which can take text, image, audio, video, computer screen state, real-life footage, whatever, process and understand it (choosing itself whether it should do chain-of-thought reasoning before answering or not), and output text, image, audio, video, computer actions.</p><p>Whereas the labs had spent almost four years racing down the scaling graph on pretraining compute, they had not yet done so for COT RL, and had not uncovered the subtler tricks to doing this well. This meant there was a lot of low-hanging fruit, so progress&#8212;and replication&#8212;was fast. In early 2025, DeepSeek spooks the entire American business scene with their release of R1. 
<p>Anthropic keeps their largest Claude 4 model internal and secret from the very start. It gets used for (most importantly) producing training data for the smaller Claude 4s, and (experimentally) for internal evaluations of AI-driven AI R&amp;D, starting with some adversarial robustness research on Claude 3.5. Inference costs on the biggest models are a big part of the rationale. Anthropic continues to focus on intelligence over product, and on enterprise products over consumer products. They make only minor gains among consumers, but Claude is increasingly adopted among enterprises, programmers, knowledge-workers, and nerds. (Ironically, OpenAI has the consumer advantage despite focusing more on reasoning and less on the LLM being personable and writing well.)</p><p>In 2025, thanks to RL, &#8220;agentic&#8221; AI is here, but only kind of. Anthropic and OpenAI have computer-use features that work, if a bit spottily, and are designed never to authorise a payment, send an email, or do anything important without human confirmation. Google releases an AI agent for things like Google Cloud Platform configuration schlep, which the programmers love. A bunch of startups are competitive with the major lab products, in particular because no one has yet had time to pour ungodly amounts of compute into COT RL. However, most "agentic" AI applications remain LLM scaffolds, i.e. hard-coded flowcharts of LLM prompts and other API calls.</p><p>Meta is trialling some unholy autonomous AI features across their apps (such as AI agents going around leaving comments on users&#8217; posts to &#8220;maximise engagement&#8221;), but they still seem like gimmicks.</p><p>Code generation tools like Cursor and Lovable and Zed and Poolside and Magic.dev and ten million others are getting very good. For most apps, you can in fact just drop in a prompt and have the app running within a few minutes, though managing infrastructure is still a pain and technical debt tends to accumulate if the AI stacks many changes on top of each other. Some form of COT RL is used in the training stack for many but not all leading coding tools. LLM scaffolds still reign over unspecialised general agents.</p><p>Gemini-3 ships in 2025 after a vast pretraining run. It&#8217;s good, but a disappointment: the final culmination of pretraining scaling laws, in an era where products, inference-time compute, data curation (mostly synthetic now, but with some very important human judgement going on behind the scenes), and real-world interaction ability are key. Google DeepMind (GDM) is building powerful maths models, and making progress on reasoning architectures that don&#8217;t rely on external COT and are better-suited to maths.</p><p>After 2025, RL starts getting harder and the distance between the leading labs and the rest increases again. RL is simply less efficient than pretraining, partly because the necessity of letting models try long sequential chains of actions makes parallelism harder. The labs have now scaled up RL compute quite far, so the resource bar for being in the game rises. Also, RL is notoriously hard. First, subtle bugs are easy to make and hard to notice: is the RL agent not learning because you made a bug, or because it just won't learn? Second, there are more choices to make (e.g.
you have to pick a reward function and scoring method, rather than defaulting to the cross-entropy loss as in pretraining). OpenAI, Anthropic, and Google pull some distance ahead of the rest in RL and overall general capabilities. However, the other labs don't necessarily see this as a loss&#8212;Meta deliberately focuses more on integrating AI into its products over 2025 and 2026, xAI focuses more on engineering use-cases, and both xAI and DeepSeek remain competitive. Also, the issues with RL mean that there are some hairier technical problems that temporarily slow progress as labs one after another internally work through them, though this is not at all obvious from outside a lab.</p><p>In early 2026, xAI starts deploying an early version of an AI that can do engineering CAD work kind-of-well, as long as a human is looking over its shoulder and checking its work. This improves a lot after Tesla and SpaceX (are forced to) actually start using it, but it&#8217;s not yet groundbreaking; sheer data quantity remains an issue.</p><p>The next big advance is OpenAI's late-2026 release of o6. First, it has improved a lot in computer use, and generally in unifying its various input and output types (e.g. it can use text and images more effectively together in its output, process and output longer videos, etc.). Second, it has a more advanced memory architecture, including a built-in longer-term memory that allows instances to learn over time. Third, it&#8217;s of course generically a bit smarter, a bit faster in token output, and so on. In particular, OpenAI has finally almost caught up to Claude&#8217;s personality level. It is also way more impressive to normal people because it can&#8212;if prompted to do so&#8212;generate real-time video and audio of a talking face. OpenAI doesn&#8217;t explicitly encourage this, but winks at it, since it knows this will get some users addicted (especially as they now have a more nuanced policy for sexually explicit model outputs than the previous blanket ban).</p><p>Many people in Silicon Valley declare this AGI, and predict the immediate automation of all office jobs. In practice, it falls short in a hundred subtle ways that make it not a drop-in replacement, in particular with remaining unreliability in its ability to use computers and weaknesses at planning and completing long-horizon tasks. But the smart money is betting that these issues will be solved within a year.</p><p>Also in late 2026, Anthropic releases Claude 5 Haiku and Claude 5 Sonnet. Claude 5 Haiku is a cheap model roughly on par with Claude-3.5-Sonnet in smartness, while having an output speed of hundreds of tokens per second. They come with an upgraded version of computer use that is far faster and more seamless. Again, the largest model is kept internal. Its training data curation and post-training finetuning were focused on programming, ML research, MLOps, and maths. Anthropic employees started adopting it internally in mid 2025, giving researchers and engineers what's essentially a team of AI interns to manage. They then spent 6 months giving the models tailored feedback, which they massively boosted with dataset augmentation methods, and filtered for correctness with scalable oversight techniques like debate, before feeding it back into the model as finetuning data.</p>
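<p>As a picture of that debate-based filtering step, a toy version (all three model calls are hypothetical stand-ins):</p><pre><code># Toy "debate" filter for candidate finetuning data: two instances argue,
# a judge keeps only items whose answers survive the exchange.
def model_call(role: str, prompt: str) -> str:
    """Stand-in for a real model API call (hypothetical)."""
    raise NotImplementedError

def debate_filter(question: str, answer: str) -> bool:
    case_for = model_call("advocate", f"Argue this is correct:\nQ: {question}\nA: {answer}")
    case_against = model_call("critic", f"Argue this is wrong:\nQ: {question}\nA: {answer}")
    verdict = model_call("judge",
        f"Q: {question}\nA: {answer}\nFor: {case_for}\nAgainst: {case_against}\n"
        "Is the answer correct? Answer YES or NO.")
    return verdict.strip().upper().startswith("YES")
</code></pre>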
In 2024, Anthropic internally estimated a +5-10% productivity boost from internal use of Claude-3.5-Sonnet and early training checkpoints of Claude-4; in 2025, this rose to +25%, and with Claude 5 Opus it started out at +35% and has gradually climbed, with more and more finetuning, to +60% by mid 2026, and the number is still rising. OpenAI does not have a comparable setup internally, partly because it&#8217;s politically less feasible due to the lower-trust environment, but also because it's a lower priority since they believe less in recursive self-improvement.</p><h2><strong>Codegen, Big Tech, and the internet</strong></h2><p>Coding is a purely digital job that is economically highly valuable, has a lot of training data, often provides a clean feedback signal of success, and is something all the AI-affiliated companies already have expertise in. All this makes it ideal for AIs to be good at, quickly. In 2023-2026, the biggest economic impact of LLMs is their use in coding.</p><p>In 2023, models got good enough for programmers to prefer them to looking up human guidance on sites like StackOverflow. In 2024, coding copilots were a real productivity boost, perhaps +10% to +50%, for pure software engineering tasks (higher for things that are more boilerplate and when the coder has less background in what they're doing, lower for more research-y tasks or when working in familiar domains). In 2025, there are two big new advances. First, chain-of-thought RL means that spending more LLM tokens converts more efficiently into better code. Second, a bunch of the obvious workflow improvements are made, such as the AI automatically running tests or checking that the UI looks right, and autonomously trying again if not, rather than maintaining the human as a tab-switching, prompt-writing monkey that does this for the AI. As a result, by 2026 codegen looks solved. There are some wrinkles left related to cloud infrastructure stuff, especially when there&#8217;s little training data on some aspect and/or a heavy and unavoidable button-clicking component, but these are quickly getting fixed, especially as computer use gets good and allows the models to better click buttons and surf the internet for documentation.</p><p>For a while, everyone&#8217;s paranoid about security in the fully AI-written codebases, and a bunch of security consulting and cybersec firms make a killing. However, it soon turns out that codegen output is actually more secure than human code, because the LLMs reliably do the standard correct thing rather than the weird bespoke thing whenever security is involved, and this eliminates a lot of vulnerabilities that humans would have written in. The security consulting and cyber firms quickly become LLM wrapper companies with excellent marketing arms, and stop being used by most people apart from risk-averse large companies and governments. However, as is statistically inevitable, there are a bunch of high-profile blowups, and it remains true that existing code can now be much more easily attacked, since all you need is an o6 or Claude subscription.</p><p>By 2027, the price of creating a simple app is a few dollars in API credits or GPU hours. The price of a particularly complicated piece of software is on the order of $100 to $10k. The tech stack has shifted almost entirely to whatever there was the most data on; Python and Javascript/Typescript are in, almost everything else is out. 
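</p><p>As a hedged back-of-envelope check on the &#8220;few dollars&#8221; figure above (every number below is an illustrative assumption, not a claim from this scenario):</p><pre><code># Assumed: a simple app is ~5k lines, roughly 50k output tokens, with ~10x
# overhead for reasoning, tests, and retries, at an assumed $10 per million
# tokens. None of these numbers are load-bearing.
output_tokens = 50_000
overhead = 10
usd_per_million_tokens = 10.0

cost = output_tokens * overhead / 1_000_000 * usd_per_million_tokens
print(round(cost, 2))  # 5.0 -- "a few dollars" is at least self-consistent
</code></pre><p>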
Average code quality as judged by humans declines, but this is fine because humans don't read it, and the LLMs can deal with bloated code better than humans can.</p><p>The coding advances trigger a massive influx of non-coders and amateurs trying to make money off B2B SaaS or freelance programming. Agentic non-technical people are launching niche startups at massive rates, since you can ship a full-featured product in a few hours if you&#8217;re willing to burn money on API credits. Lots of these projects eventually run into &#8220;tech debt hell&#8221;. For a while programmers can earn heavy consulting fees (or cofounder roles) by coming in, chatting to the AI about the codebase, and telling it to make architectural changes that let the next features be added more cheaply, because they take fewer lines of code on top of the better-structured codebase. However, just asking the AI &#8220;what&#8217;s wrong with this codebase&#8221; and then &#8220;how would you fix it&#8221; also works quite well if the prompting is good. The codegen scaffolds quickly evolve to be good at reflectively prompting AIs and at managing the tech debt hell, but it&#8217;s hard to notice this unless you&#8217;re working with them actively, leading to a lot of misinformed doubt about their capabilities based on early disappointments. The labs also start including more qualitative criteria in their codegen RL&#8212;not just &#8220;did the code run and pass the tests&#8221;, but also asking another LLM to grade the style and extensibility of the code. In effect, there's a race over whether the AIs will learn good code practices from RL self-play, or from explicit human scaffold-crafting and prompting. Note that the latter is getting easier too, as the tooling improves, and AIs write the scaffold code and distill human programming knowledge into prompts. For example, in late 2025 Anthropic also ships an automated tool for building an LLM scaffold from observations of an arbitrary real-world digital work process.</p><p>Big Tech starts using the codegen tools heavily for new projects, but integration into older projects is slower, because the codegen scaffolds are worse at interfacing with large existing codebases than at writing small ones from scratch. This gets mostly solved over the course of mid-2025 to mid-2026, but gives the &#8220;Little Tech&#8221; startups a temporary tailwind. Big Tech headcounts grow: they hire more people partly to flatter the egos of managers&#8212;they are drowning in cash anyway&#8212;and in particular they hire many product managers to oversee the AI codegen agents that are unleashing a massive series of new products, now that development time is mostly no longer the constraint. Internal company office politics becomes even more of a rate-limiter: if teams are functional, the AI codegen boost means more products shipped, whereas if teams are not, the gains are eaten up by employees working less or by factional fights within companies. Microsoft launches &#8220;365 Days of Microsoft&#8221;, where every day of the year they release a new software product or a big update to a previous one; as part of a new paradigm shift, they move increasingly into niche enterprise markets that they had previously ignored. Google is more scattered and launches a thousand new features integrated into their product suites that&#8212;on paper&#8212;compete with existing startups, and&#8212;in practice&#8212;serve to expand the empires of enterprising Google middle-managers. 
Google takes a reputational hit as a shipper of sloppy products, but they have a few big wins, and their customers are a captive market that will continue using Search and Drive, giving them room to flail around.</p><p>There are a few corporate scandals as AI codegen products fail, leading to a frenzy of effort at testing and fuzzing the AI outputs. But Big Tech is still all-in, at least until late 2026: they&#8217;re all feeling the AGI, reasoning that missing it would be an existential mistake, while if it&#8217;s all a bubble, at least they will have done no worse than all the other Big Tech firms. The one slow actor is Apple, due to its cultural bias towards quality and caution. Apple ships Apple Intelligence integrations, but that&#8217;s about it.</p><p>Predictably, the super-abundance of software and the extreme competition in it drive down prices. SaaS companies aren&#8217;t yet experiencing an extinction wave, because humans react to change slowly, but it doesn&#8217;t look good, and investors start getting skittish. The big advantage that everyone points to is having locked-in customers or network effects; otherwise, the conventional wisdom goes, you're dead. But there are a bunch of companies and tools that let you circumvent attempts at customer lock-in. You can program yourself an X.com in an afternoon, have computer-using AI agents trawl X, Reddit, etc., pull in content to your site, and propagate your posts and replies automatically to all the other platforms. Some companies fight tooth and nail to try to make people stay on their platform (and thus see their ads); some just charge API prices and hope to at least get revenue. &#8220;Web4&#8221; comes to mean a programmable internet that is customised for everyone. A hundred startups jump on this bandwagon. Some established companies create carefully de-risked APIs and let users program customisations and integrations into their sites (i.e. let users ask codegen models to do such programming). The Web4 wave generally runs into the problem that most people don&#8217;t actually want to customise things; they want someone to have already thought through the interface and features on their behalf, are fine with existing setups, and are not very eager to re-imagine the internet. But increasingly, if users dislike something about a site, they will build their own version, connect it to the original with AI schlep, and then lure over the few percent of users that are triggered by the same thing. Technical barriers like scraping limits are hard to enforce, as AI agents can be made to browse in increasingly human-like ways (one successful startup explicitly engages in an arms race against scraping-detection methods by fine-tuning a computer-use agent on real human mouse-movement patterns). An increasingly common barrier is asking humans for government ID or other real-world verification (with the privacy costs mitigated by zero-knowledge proofs, if it's a fancy libertarian- or crypto-affiliated thing). This too is spreading, partly because some people want sites where they can talk to verified real humans.</p><p>By 2026, more code gets written in a week than the world wrote in all of 2020. Open source projects fork themselves into an endless orgy of abundance. Some high school students build functionally near-identical versions of Windows and Google Drive (and every video game in existence) from scratch in a month, because they can, and because they wanted one new feature on top. Everyone and their dog has a software product line. 
Big Tech unleashes a torrent of lawsuits against people cloning their products, echoing the Oracle v Google lawsuit about Java, but those lawsuits will take years to complete, and months feel like decades on the ground.</p><p>Silicon Valley is exuberant. The feeling at Bay Area house parties is (even more than before) one of the singularity being imminent. Some remain skeptical, though, rightly pointing out that post-scarcity in software isn&#8217;t the same as post-scarcity in everything, that genuine &#8220;agency&#8221; in the long-horizon real-world planning sense hasn&#8217;t really arrived, and that under the hood everything is still rigid LLM scaffolds or unreliable AI computer-use agents.</p><h2><strong>Business strategy in 2025 &amp; 2026</strong></h2><p>Even though Meta, DeepSeek, and others are behind in raw intelligence and reasoning all throughout 2025 and 2026, they threaten the big labs because they are giving away (both to consumers, and freely to developers through open-weights releases) a level of performance across audio and video and image and text that is &#8220;good enough&#8221; for most use cases. SOTA performance is no longer needed for many use cases, especially low-end consumer entertainment (e.g. image gen, chatbots, etc., which Meta is banking on), or most classification, processing, or business writing tasks.</p><p>OpenAI is especially vulnerable, since they rely heavily on consumers, and are also increasingly a product company that competes with products built on their API, driving many developers to switch away. Their strategy (internally and to investors, though not publicly) is to be the first to achieve something like a drop-in agentic AI worker, and to use that to convert their tech lead over open source into &gt;10% of world GDP in revenues. They&#8217;ve raised tens of billions and make billions in revenue from their products anyway, so they can bankroll these efforts just fine.</p><p>Anthropic remains a jewel of model quality and a Mecca of technical talent that gets surprisingly little attention from the rest of the industry. Analogies to Xerox PARC abound, but there are whispers of internal AGI being imminent, and no one else can claim the ideological mandate of heaven for safe AGI. The talent and money spigots stay on.</p><p>xAI and DeepSeek continue releasing open-source consumer models. Both also have a specialty in maths-y STEM and engineering stuff, aided by data collection efforts (with xAI being able to work closely with SpaceX and Tesla engineers) and inference-time compute methods. xAI also continues trying to leverage real-time access to X.com data to its benefit, but this isn't a major advantage or revenue source.</p><p>In 2024, thousands of startups were chasing after a lot of different use cases, and some started making serious money, but it was still very early days for actual products. The big winners were companies like Perplexity that use LLMs to trivially improve some LLM-compatible use case (like search), companies like Glean and Hebbia that are doing various enterprise LLM integration schlep, and legal LLM companies like Harvey (since law is intensely textual and high-revenue). However, the real money is still in infrastructure / &#8220;selling shovels&#8221;, in particular Nvidia.</p><p>By the end of 2025, there is no technical bottleneck to remote doctor appointments or most legal work being done entirely by AI. However, diffusion takes time. 
Also, in many countries lawyers barricade themselves behind a cluster of laws that forbid lawyer-automating AI. Getting hired as a new lawyer, or as any kind of white-collar analyst, is <a href="https://lukedrago.substack.com/p/agi-and-the-corporate-pyramid">getting harder</a>, though, as decision-makers expect AI to reduce their need for entry-level white-collar workers of every kind, and firing people is much harder than not hiring them in the first place. Healthtech AIs are gradually working their way through regulatory hurdles over 2025-2026, and are clearly better than the average doctor at all the parts of the job that rely only on reasoning and knowledge. However, AI doctor appointments are only trialled at any significant scale in 2026, by Singapore and Estonia. Significant integration of AI in the non-patient-facing parts of the healthcare system is underway in the UK, many EU countries, South Korea, and China by 2026, but again diffusion is slowed by the speed of human bureaucracy.</p><p>There are lots of &#8220;AI agent&#8221; companies automating things like customer service, various types of search (e.g. for shopping / booking flights / etc.), and back-office computer processes. The big cloud hanging over them in 2025 is whether AI codegen scaffolds soon get good enough that these agent companies become trivial to replace, and whether generalist AI agents soon get good enough to kill both categories. In 2026 the first question starts being answered in the affirmative, as lowered barriers to coding create a flood of new entrants and a ruthless bloodbath of competition. However, despite some initial hype, even the release of o6 produces little evidence of generalist AI agents taking over either category, at least by the end of 2026.</p><p>There are lots of LLM evals startups, like Braintrust.dev, HumanLoop, and Atla, that are mostly struggling to differentiate themselves from each other or to define a new testing/reliability/verification paradigm for the LLM scaffold era, but they are growing fast. There are a lot of LLM agent oversight solutions, but by the end of 2026 none manage to make a massive leap, and the unlocking of new AI uses remains bottlenecked on incumbents' risk tolerance and a slow buildup of knowledge about best practices and track records. A surprisingly retro success is call-centres of humans who are ready to jump in and put an AI agent back on task, or to whom AI agents can offload work chunks that are heavy on trust/authentication (like confirming a transaction) or on button-clicking UI complexity (like lots of poor legacy software); the human crowdworkers click the buttons while the AI does the knowledge- and intelligence-intensive parts of the job on its own.</p><p>Many of the really successful startups are in the spaces that Big Tech won&#8217;t touch or has trouble touching: anything controversial (the sexual and the political), and anything too edgy, contrarian, or niche/vertical-specific.</p><p>The explosion of codegen threatening Big Tech&#8217;s moat, some disappointment at the unreliability of o6 after so much hype, and the general memetic force that lets the &#8220;current thing&#8221; be AI for only so long combine to cause a market correction near the end of 2026. Software is starting to seem stale and boring. Investors want to see &#8220;real AGI&#8221;, not just post-scarcity in software. Google DeepMind&#8217;s maths stuff and xAI&#8217;s engineering stuff are cool; OpenAI and LLMs are not. 
Amazon&#8217;s AWS &amp; physical stores are cool; Google Search and Facebook are not.</p><h2><strong>Maths and the hard sciences</strong></h2><p>A compressed version of what happened to programming in 2023-26 happens in maths in 2025-2026. The biggest news story is that GDM solves a Millennium Prize problem in an almost-entirely-AI way, with a huge amount of compute for searching through proof trees, some clever uses of foundation models for heuristics, and a few tricks very specific to that area of maths. However, this has little immediate impact beyond maths PhDs having even more existential crises than usual.</p><p>The more general development is that COT RL and good scaffolding genuinely are a big maths breakthrough, especially as there is no data-quality bottleneck here, because there&#8217;s an easy ground truth to evaluate against&#8212;you can just check the proof. AIs trivially win gold in the International Mathematical Olympiad. More general AI systems (increasingly including just the basic versions of Claude 4 or o5) have a somewhat spotty version of excellent-STEM-postgrad-level performance at grinding through self-contained maths, physics, or engineering problems. Some undergrad/postgrad students who pay for the expensive models from OpenAI report having had o3 or o5 entirely or almost entirely do sensible (but basic) &#8220;research&#8221; projects for them in 2025.</p><p>Mostly by 2026 and almost entirely by 2027, the mathematical or theoretical part of almost any science project is something you hand over to the AI, even in specialised or niche fields.</p><p>In 2026, xAI also tries to boost science by launching an automated peer-reviewer / paper-feedback-giver specialised in STEM subjects, which can also run follow-up experiments automatically, and which can soon take a paragraph setting a direction and convert it into basically a full paper. Cue a thousand academics blasting it for mistakes in its outputs. The fair assessment is that it&#8217;s impressive but not perfect (somewhat like a brilliantly fast but easily distracted and non-agentic undergrad research assistant), and still better than all but the highest-effort human peer-reviewers. Elon Musk gets into online feuds about its quality, becomes radicalised about peer review and academia, and starts the &#8220;Republic of Papers&#8221; as a side-feature on X to explicitly try to replace academia (it helps that, in 2026, the higher education bubble seems to be bursting in America, partly triggered by fears about AI job automation but also due to political headwinds). Everyone has Opinions.</p><p>In 2026, GDM releases work on new maths-oriented AI architectures that include an advanced, more flexible derivative of Monte Carlo tree search (MCTS), which also searches for new &#8220;concepts&#8221; (i.e. new definitions that shrink the length of the most promising proof-tree branches) while doing the proof-tree search. Their maths models prove a long list of new theorems and results, including, in 2027, solving a few more long-standing prize problems, this time in a less ad-hoc and more credibly entirely-AI way. 
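</p><p>One illustrative way to make the &#8220;concept search&#8221; idea above concrete (my own toy formalisation, not a description of GDM&#8217;s actual method): score a candidate definition by the net compression it buys across the most promising proof branches, minus the cost of stating the definition itself.</p><pre><code>def rewrite_with(branch: str, concept) -> str:
    # rewrite a proof branch using concept = (definition_body, short_name)
    body, name = concept
    return branch.replace(body, name)

def concept_score(concept, beam, lam=1.0):
    # net characters saved across the beam, minus the cost of the definition
    body, name = concept
    saved = sum(len(b) - len(rewrite_with(b, concept)) for b in beam)
    return saved - lam * (len(body) + len(name))

# A recurring subterm is worth naming if it shortens enough branches:
beam = ["f(g(x))+f(g(x))", "f(g(x))*h(f(g(x)))", "h(x)"]
print(concept_score(("f(g(x))", "F"), beam))  # 16: worth introducing
</code></pre><p>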
Demis Hassabis talks about &#8220;solving physics&#8221; within the next year, through a program that includes GDM collaborating with leading physicists.</p><p>In 2028, GDM&#8217;s collaboration with the theoretical physicists bears fruit: general relativity and quantum mechanics are unified with a new mathematical frame. There are a few candidate new theories, with different values of some parameters, that can only be distinguished by expensive experiments, but it seems clear that one of these candidates is correct. It's not &#8220;solving physics&#8221; or a final theory of everything, but it is clearly a major breakthrough in mathematical physics. The technical work owed a lot to a truly enormous compute budget for RL self-play; to a massive dataset of physics papers, critiques of them, and tokenised observational data, built by a physicist-and-AI-agent team; and to close collaboration with a large number of leading physicists who gave the AI feedback on the developing theories. Credit for the Nobel Prize is the subject of much discussion, but eventually (in 2030) ends up split between Demis Hassabis, one of the physicists who was most involved, and the most important AI system. Everyone has Opinions.</p><p>Corporate Google likes the PR win of achieving the century's greatest physics breakthrough so far, but the application of this mathematical firepower they are most hopeful about is formally verifying the correctness of software. This is especially pressing, as the tides are shifting in the cyber world. Codegen itself is on net a defense-dominant technology (as discussed earlier). Most of the hacks are either due to sloppy mistakes by early codegen products, or to some adversary using AI tools to direct a disproportionate amount of effort at attacking some piece of legacy software that is still used, or on which a codegen-written program (indirectly) depends. There&#8217;s increasing demand for really air-tight software from a US defense establishment that is obsessed with cyber advantage over China especially, but also over Russia, Iran, and North Korea. Also, easily proving the correctness of code will allow better feedback signals for codegen models, and will help in the ambitious efforts underway to rewrite massive parts of the existing tech stack. So, in addition to leaps in the hard sciences, GDM&#8217;s other big applied goal is a world where the correctness of all essential code is proven. They have an early success in a late-2026 plugin for several popular languages that is essentially a type-checker on steroids (though of course, this is adopted less by the humans and more by the AIs that now write almost all of the code).</p><p>Initially, the US government tries to restrict the diffusion of code verification tools, since it doesn&#8217;t want China to get provably-correct coding capabilities. However, the open source community is only about 6 months behind in verification, making leaps and bounds of its own in 2027-2028, especially since there are thousands of former software engineers and mathematicians without much to do as they wait for the AIs to do their work for them.</p><p>As a result, by 2028 feats of intellect that would&#8217;ve taken Euler decades are done in a few minutes, to mathematically prove that, conditional on the CPU's physical integrity, some code is an utterly impregnable and flawless pizza delivery routing system. 
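</p><p>To make &#8220;proven correct&#8221; concrete, here is a toy illustration in Lean 4 (the function and the property are invented for this example; real verified systems prove far richer specifications):</p><pre><code>-- A "routing" function shipped with a machine-checked theorem,
-- rather than just tests: if this compiles, the property holds
-- for every possible input.
def route (stops : List Nat) : List Nat :=
  stops.reverse

theorem route_preserves_stops (stops : List Nat) :
    (route stops).length = stops.length := by
  simp [route]
</code></pre><p>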
However, verification is far from universally adopted, because it carries a cost multiplier over just getting an AI codegen tool to write unverified code (and AI codegen has continued plummeting in cost, not that anyone really notices anymore).</p><h2><strong>Societal response</strong></h2><p>On the soft-skills side, by 2025 experiments show that models have reached human-level persuasion capabilities in controlled text-only chat settings. However, this doesn&#8217;t really matter, since it&#8217;s not how most human persuasion works; part of models being bad at long-horizon planning is weakness in strategic relationship-building with relevant actors over longer timescales. There also isn&#8217;t yet widespread use of models to manipulate politics. First, there just isn&#8217;t a particularly tech-savvy political campaign or movement to influence opinion, except for China, which is gradually experimenting with ever more AI in its censorship bureaucracy. Second, models still seem worse than the best humans at that &#8220;spark&#8221; that lets some people create persuasive, viral ideas. Third, the memetic selection pressures acting on the collective output of humanity on the internet are already better at discovering memetic viruses and persuasive ideas than any individual human is, so passing an individual-human capability threshold in this domain is not automatically a society-steering ability.</p><p>However, some one-on-one use cases do work. AI scam calls with deepfaked audio and video start being a nuisance by mid 2025, but are mostly reined in by a series of security measures pushed by platforms (and by regulation in the EU), people creating new trust protocols with each other (&#8220;what&#8217;s our secret passphrase?&#8221;), increased ID verification features, and growing social distrust towards any evidence that's only digital.</p><p>Lots of people are talking to LLMs for advice. Some swear by Claude 4 in particular. Character.ai-like startups are having a boom. There is a lot of public discussion about people increasingly talking to AIs instead of having human friends and partners (which is boosted after multimedia Llama models are finetuned to be good at sexual image, audio, and&#8212;in 2026&#8212;video output). There's a hikikomori-like trend, strongest in California, South Korea, and China, where a minority of people forsake almost all human social contact and instead interact with AIs that are superhumanly risk-free and pliable, and come with superhumanly nice voices and avatars. In 2026, Australia and Canada ban under-16s from talking to non-educational AIs with voice capabilities or human-like avatars.</p><p>The written text quality of models remains surprisingly mediocre. Claude does best, and is great when prompted right, but &#8220;ChatGPTese&#8221; remains a thing that afflicts especially OpenAI and Google (though the former improves in 2026), and any human who writes mediocre prompts. There are loads of LLM slop content websites, but not a single blog written by an LLM becomes widely read in intellectual or elite circles.</p><p>As the codegen wave of 2026 hits, many consumers feel a few weeks of wonder and whiplash at the agentic AIs that can now do parts of their job, and at the massive orgy of abundance in software, and then this becomes the new normal. The world of atoms hasn&#8217;t changed much. 
Most people by late 2026 just assume that AIs can do basically everything digital or intellectual, and are surprised when they learn of things that the AIs can&#8217;t do.</p><h2><strong>Alignment research &amp; AI-run orgs</strong></h2><p>In 2025, someone adds some scaffolding on top of an OpenAI Operator instance, making it in theory capable of earning money through freelance work to pay for its own API costs, including automatically buying more credits for itself and finding more freelance work. However, the economics don't work out, so it can't actually survive on its own without subsidies. In early 2026, a similar concept actually is economically viable, and some such agents are launched as experiments by tech-savvy freelancers looking for easy money, or by people who are just curious. A few blow up, mostly by doing various things related to memecoin manias and going viral as a result. In late 2026, one such autonomous AI scaffold with a memecoin windfall reasons about next steps, tries to incorporate a US business for itself, cold-emails a bunch of humans to ask for ID, and manages to get one of them to provide an ID so it can incorporate the business. By 2027, there are a few experimental digital businesses run by AIs, but they're not very competitive, and often rely on what is effectively a subsidy of human interest in their novelty.</p><p>Alignment research in 2025-2027 is driven by Anthropic (though of course most of their research is on GPU performance engineering, inference-time compute techniques, and other things focused on raw capabilities progress). SAEs (sparse autoencoders) peak in popularity in late 2024 before being mostly forgotten, but a new interpretability paradigm starts being put together in late 2025, based on identifying more general geometric structures in activation space. AI control setups are tested against misalignment &#8220;model organisms&#8221; that, by 2027, are trivially capable of hacking out of a normal environment. Model weight security at Anthropic is excellent for a private company, but this just means the attackers target OpenAI instead (and the gap between labs and open source is never more than a year in 2025-2027). And, of course, Anthropic internally writes endless safety cases. The general message in them is that a lot rests on either an interpretability breakthrough or on AI control working on superhuman models. The scarcity of evidence gained on &#8220;alignment&#8221; is frustrating to many; models have been caught scheming endlessly, but always in fairly artificial setups, or in messy circumstances where it's not clear what the model should've done. The most important work seems to be on properties upstream of scheming, such as a stream of work on corrigibility kickstarted by the 2024 Greenblatt et al. paper <a href="https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf">&#8220;Alignment faking in large language models&#8221;</a>. The alarming early evidence against corrigibility was offset by promising empirical work, in late 2025 and early 2026, on meta-learning techniques to encourage corrigibility. By 2027 it's known how to train a model such that it either will or won't be amenable to being trained out of its current goal. 
Anthropic reveals this and some other safety-related insights to OpenAI and Google, and asks the State Department to let it reveal them to Chinese labs, but is denied.</p><p>By 2027, the new interpretability paradigm is seeing progress, with AIs doing essentially all of the engineering and much of the detailed ideation. This reveals a taxonomy of patterns and feature representation types within neural networks. A few are neat and clean, but mostly models&#8217; internals turn out to be messy, with massive redundancy between different parts. The notion of a model having a singular &#8220;goal component&#8221; looks less likely, at least if certain choices are made during training.</p><p>A test case of the new alignment techniques at Anthropic is the training in 2027 of a new model, Claude 5 Epic or just &#8220;Claude Epic&#8221;, based on training data curated by Claude 5 Opus. Company leadership internally thinks it will be a full AGI. The interpretability team will be observing the model at checkpoints and watching it develop. Countless safety cases have been written; the plan is still to run evals, use AI control setups, and hope for some last-minute firmer guarantees from the interpretability work. Some at Anthropic are entirely convinced just by the scalable oversight work that&#8217;s already been done. Others expect the hard part of intent alignment to rear its head at any moment.</p><p>One of the avenues that seemed most promising in 2025 was interpreting AI chains-of-thought (COTs), something <a href="https://x.com/LRudL_/status/1882899577923330369">far</a> <a href="https://www.lesswrong.com/posts/HQyWGE2BummDCc2Cx/the-case-for-cot-unfaithfulness-is-overstated">easier</a> to make meaningful progress on than interpretability of model internals. However, over 2026-2027, much more compute is poured into RL, and the COTs become less legible, as the models drift towards shorthand scripts that are more effective for them than writing out their thoughts in English. Work done by Anthropic and several academic labs leads to techniques for encouraging human interpretability of the COTs, by adding a COT interpretability term to the RL loss function and using some clever training details to stop the model <a href="https://www.lesswrong.com/posts/Eu6CvP7c7ivcGM3PJ/goodhart-s-law-in-reinforcement-learning">goodharting</a> the interpretability term. However, this comes at a cost to performance. By 2027, another line of work is humans studying model COTs in detail and learning the ways the models think; some mathematicians in particular pick up neat mental tricks from studying model COTs. However, overall COT interpretability declines, and it's generally accepted that we won't know exactly what the models are thinking or why, even if COT analysis and the new interpretability techniques can give some general understanding in 2027.</p><p>By 2027, evaluations are showing that frontier models&#8212;including open-source models&#8212;could meaningfully help in engineering pandemics, if bad actors so chose. There's a messy but moderately effective effort by AI safety organisations and several agencies within governments to have some sort of misuse mitigation measures in place, in particular for API-accessible models. However, in the absence of a major incident, governments don't care enough, and open-source models seem hard to contain. Also, bioterrorism continues to be blocked by the wet lab skills required, and by continued sheer good luck in the absence of a motivated bioterrorist. 
The other potentially catastrophic source of misuse is cyber, but it increasingly seems like this will be solved by default, in particular because AIs are good at writing secure code and formal verification is increasingly used for critical code.</p><p>The previous year of insane AI codegen activity everywhere, and the continued steady progress in AI, have made it more intuitive to people that there won&#8217;t be a lot of &#8220;money on the table&#8221; for some nascent AGI to eat up, because it will enter a teeming ecosystem of AI systems and humans and their interactions. For example, though there are technically some self-sustaining AIs paying for their own server costs, they struggle to compete with purposeful human+AI entities that deliberately steal the customers of the AI-only businesses whenever those get too popular. The cyber competition is also increasingly tough, meaning that any single rogue AI would have a rough time defeating the rest of the world. However, by the end of 2027 no evidence has ruled out a sharper takeoff, and those who believe in one are increasingly either frantic and panicking, or else stoically equanimous and resigned, expecting the final long-term agentic planning piece to slot into place at any moment and doom the world. Also, the labs are openly talking about recursive self-improvement as their strategy.</p><h2><strong>Government wakeup</strong></h2><p>In 2025, thinking about AGI in the Chinese government is stuck somewhere inside CCP internal machinations. Xi Jinping has heard of it and occasionally thinks about it, but doesn&#8217;t take it seriously as a near-term thing. However, some senior staff are properly &#8220;AGI-pilled&#8221; (split between advocates of safety and of racing, with no overwhelming favourite yet, though it&#8217;s clear that once the issue does get serious, national security will be by far the loudest voice in the room).</p><p>The EU is preparing to burden its (small) AI sector with bureaucracy (the EU AI Act). In 2025-2026, there are some high-profile cases of AI companies not offering services in Europe due to regulations. However, the EU &amp; UK governments are best at tracking AI, because of their efforts on safety research through the EU AI Office and the UK AISI. The UK government also announces planning reforms targeted at letting private companies build data centres more easily.</p><p>Going into 2026, India is trying to ride a new IT wave, using open-sourced Llama, DeepSeek, and Mistral models to create products that could replace its current IT-services-heavy export industry. A lack of domestic data centres makes this harder, and political tangles slow everything down.</p><p>In 2025, the Trump administration tightens export controls on China (in particular preventing even older Nvidia chips like the H20 from being sold to China), tries to pressure other countries not to buy Chinese GPUs, and makes it easier to build power (especially nuclear &amp; gas) and data centres within the US. Otherwise there is little US political action on AI. Behind the scenes, the defense establishment gets more involved in the AI scene. There are secret NSA and CIA projects researching AI for offensive &amp; defensive cyber. More Chinese infiltration of American cyber systems is discovered. High-level government conversations behind closed doors are upping the apocalyptic rhetoric about how essential it is for the US to win in AI-powered cyber. 
All the major US AI labs have some government partnership related to this.</p><p>As internal CCP machinations grind along, and the evidence about big AI effects on programming rolls in through late 2025 and 2026, the CCP gets more serious about AI. As in the US, once the strategic and national-security implications rise in salience, other issues (including safety) fall away. The CCP prepares its 15th Five-Year Plan for 2026, which involves massive subsidies and investment in AI. DeepSeek leads the domestic AI industry, but the CCP has made it clear that it will make the big calls. There is a conversation behind closed doors about whether to end the open-sourcing of DeepSeek models, but the CCP comes out in favour of continuing it, in particular to try to get the rest of the world to build on top of Chinese AI models (a decision also helped by the press that the early-2025 DeepSeek R1 release generated). Huawei is shipping GPUs that are only about 12-16 months behind Nvidia. China&#8217;s weaker startup ecosystem means that AI agent adoption is slower than in the US, though. However, China&#8217;s surveillance state has been on an AI adoption spree. In particular, censorship is instantaneous with LLMs. By 2026, there are widespread educational &#8220;Xi Jinping Thought AI Tutors&#8221; that most CCP members are mandated to have weekly sessions with. Retaining control of society now seems increasingly easy, allowing the CCP to focus more on geopolitics and the military, and less on the consumer economy. At the same time, Xi Jinping has an overly rosy view of Chinese military AI capabilities, because people tell him what he wants to hear.</p><p>There's a shadow conflict playing out, almost entirely outside public attention, between US and Chinese cyber forces trying to get into each other's critical infrastructure while reducing the extent to which their own infrastructure is compromised. Contrary to publicly-available information, America probably has the upper hand, but it's also clear that both sides could inflict serious damage on the other.</p><p>AI starts to figure in US domestic politics in 2026, but is not yet a top issue. The coming replacement of most human white-collar work looks more and more plausible, especially after OpenAI's release of o6. Job losses are not yet high, though, as human organisations take time to react to change. Even in software, where mass firings could perhaps most plausibly be done, many companies are afraid to move first. Non-technical managers tend to treat the technical stuff as black-box wizardry and are scared of breaking it, and technical managers don't want to reduce the size of their empires. The main effect is that hiring of new software engineers basically stops, but the disaffected&#8212;a small group of nerdy, elite-coded, low-voting-rate youngsters&#8212;are not politically important. Other white-collar office jobs are also reducing entry-level hiring, as increased demand for productivity is instead met by existing employees just using AI more.</p><p>The US government, like China's, decides against legally restricting the open-sourcing of AI models. This is influenced by pro-innovation arguments, by China doing the same, and by the defense-specific AI programs being done under classification with closed-source models anyway. The AI labs have also grown more reliant on government cooperation for things like power grid connection permits, data centre construction permits, and lobbying to avoid ruinous tariffs on GPUs. They also all want the money flow of Pentagon contracts and the prestige of working on US defense. 
This means there is a tacit agreement: if the government hints that the labs should or shouldn't do something, they are very likely to march to that beat.</p><p>Starting in late 2026, many of the governments worried about fertility decline get concerned about everyone talking to AIs instead of to each other. South Korea bans &#8220;personalised AI companions&#8221; in 2027, and the EU requires people to register if they use them and imposes various annoying limits that drive down usage. However, the addicts can just use open-source models to circumvent regulations. Some countries spend lots of money on getting the &#8220;creator influencers&#8221;&#8212;influencers turbo-charged by generative AI&#8212;to extol the virtues of marriage and kids. By 2027, though, the more forward-looking politicians are&#8212;in private&#8212;starting to realise that once the economy transitions to being AI-powered, national interests are not harmed if the human population plummets. The &#8220;<a href="https://lukedrago.substack.com/p/the-intelligence-curse">intelligence curse</a>&#8221; is starting to set in.</p><p><a href="https://nosetgauge.substack.com/p/a-history-of-the-future-2027-2030">Part 2: 2027-2030</a></p><p><a href="https://nosetgauge.substack.com/p/a-history-of-the-future-2030-2040">Part 3: 2030-2040</a></p><p><em>Thanks to <a href="https://lukedrago.substack.com/">Luke Drago</a>, Duncan McClements, Theo Horsley, and Bilal Chughtai for comments.</em></p>]]></content:encoded></item><item><title><![CDATA[Capital, AGI, and human ambition]]></title><description><![CDATA[AGI will shift the relative importance of human v non-human factors of production, reducing the incentive to care about humans while making existing powers more effective and entrenched]]></description><link>https://www.nosetgauge.com/p/capital-agi-and-human-ambition</link><guid isPermaLink="false">https://www.nosetgauge.com/p/capital-agi-and-human-ambition</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Sat, 28 Dec 2024 17:50:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f96187ce-99aa-4b16-be2a-cca2a2d0fef3_1024x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><strong>A modified version of this essay is now part of a much more comprehensive essay series, <a href="https://intelligence-curse.ai/">The Intelligence Curse</a>.</strong></em></p><div><hr></div><p><strong>Edited to add</strong><em><strong>: </strong>The main takeaway of this post is meant to be: <strong>Labour-replacing AI will shift the relative importance of human v non-human factors of production, which reduces the incentives for society to care about humans while making existing powers more effective and 
entrenched.</strong> Many people are reading this post in a way where either (a) &#8220;capital&#8221; means just &#8220;money&#8221; (rather than also including physical capital like factories and data centres), or (b) the main concern is human-human inequality (rather than broader societal concerns about humanity's collective position, the potential for social change, and human agency).</em></p><div><hr></div><p>I've heard many people say something like &#8220;money won't matter post-AGI&#8221;. This has always struck me as odd, and as most likely completely incorrect.</p><p>First: labour means human mental and physical effort that produces something of value. Capital goods are things like factories, data centres, and software&#8212;things humans have built that are used in the production of goods and services. I'll use &#8220;capital&#8221; to refer to both the stock of capital goods and to the money that can pay for them. I'll say &#8220;money&#8221; when I want to exclude capital goods.</p><p>The key economic effect of AI is that it makes capital a more and more general substitute for labour. There's less need to pay humans for their time to perform work, because you can replace that with capital (e.g. data centres running software replace a human doing mental labour).</p><p>I will walk through the consequences of this, and end up concluding that labour-replacing AI means:</p><ol><li><p>The ability to buy results in the real world will dramatically go up</p></li><li><p>Human ability to wield power in the real world will dramatically go down (at least without money); including because:</p><ol><li><p>there will be no more incentive for states, companies, or other institutions to care about humans</p></li><li><p>it will be harder for humans to achieve outlier outcomes relative to their starting resources</p></li></ol></li><li><p>Radical equalising measures are unlikely</p></li></ol><p>Overall, this points to a neglected downside of transformative AI: that society might become permanently static, and that current power imbalances might be amplified and then turned immutable.</p><p>Given sufficiently strong AI, <em>this is not a risk about insufficient material comfort</em>. Governments could institute UBI with the AI-derived wealth. Even if e.g. only the United States captures AI wealth and the US government does nothing for the world, if you're willing to assume arbitrarily extreme wealth generation from AI, the wealth of the small percentage of wealthy Americans who care about causes outside the US might be enough to end material poverty (if 1% of American billionaire wealth were spent on wealth transfers to foreigners, it would take 16 doublings of American billionaire wealth as expressed in purchasing-power-for-human-needs&#8212;a roughly 70,000x increase&#8212;before they could afford to give $500k-equivalent to every person on Earth; in a singularity scenario where the economy's doubling time is months, this would not take long). Of course, if the AI explosion is less singularity-like, or if the dynamics during AI take-off actively disempower much of the world's population (a real possibility), even material comfort could be an issue.</p><p>What most emotionally moves me about these scenarios is that a static society with a locked-in ruling caste does not seem dynamic or alive to me. 
We should not kill human ambition, if we can help it.</p><p>There are also ways in which such a state makes <a href="https://nosetgauge.substack.com/p/a-disneyland-without-children">slow-rolling, gradual AI catastrophes</a> more likely, because the incentive for power to care about humans is reduced.</p><h2><strong>The default solution</strong></h2><p>Let's assume human mental and physical labour across the vast majority of tasks that humans are currently paid wages for no longer has non-trivial market value, because the tasks can be done better/faster/cheaper by AIs. Call this labour-replacing AI.</p><p>There are two levels of the standard solution to the resulting unemployment problem:</p><ol><li><p>Governments will adopt something like universal basic income (UBI).</p></li><li><p>We will quickly hit superintelligence, and, assuming the superintelligence is aligned, live in a post-scarcity technological wonderland where everything is possible.</p></li></ol><p>Note, firstly, that money will continue being a thing, at least unless we have one single AI system doing all economic planning. <a href="https://www.econlib.org/library/Essays/hykKnw.html">Prices are largely about communicating information</a>. If there are many actors and they trade with each other, the strong assumption should be that there are prices (even if humans do not see them or interact with them). Remember too that however sharp the singularity, abundance will still be finite, and must therefore be allocated.</p><h2><strong>Money currently struggles to buy talent</strong></h2><p>Money can buy you many things: capital goods, for example, can usually be bought quite straightforwardly, and cannot be bought without a lot of money (or other liquid assets, or non-liquid assets that others are willing to write contracts against, or special government powers). But it is surprisingly hard to convert raw money into labour, in a way that is competitive with top labour.</p><p>Consider Blue Origin versus SpaceX. Blue Origin was started two years earlier (2000 v 2002), had much better funding for most of its history, and even today employs almost as many people as SpaceX (11,000 v 13,000). Yet SpaceX has crushingly dominated Blue Origin. In 2000, Jeff Bezos had $4.7B at hand. But it is hard to see what he could have done not to lose out to the comparatively money-poor SpaceX with its intense culture and outlier talent.</p><p>Consider, a century earlier, the Wright brothers with their bike shop resources beating Samuel Langley's well-funded operation.</p><p>Consider the stereotypical VC-and-founder interaction, or the acquirer-and-startup interaction. In both cases, holders of massive financial capital are willing to pay very high prices to bet on labour&#8212;and the bet is that the labour of the few people in the startup will beat extremely large amounts of capital.</p><p>If you want to convert money into results, the deepest problem you are likely to face is hiring the right talent. And that comes with several problems:</p><ol><li><p>It's often hard to judge talent, unless you yourself have considerable talent in the same domain. 
Therefore, if you try to find talent, you will often miss.</p></li><li><p>Talent is rare (and credentialed talent even more so&#8212;and many actors can't afford to rely on any other kind, because of point 1), so there's just not very much of it going around.</p></li><li><p>Even if you can locate the top talent, the top talent tends to be <em>less</em> amenable to being bought out by money than others.</p></li></ol><p>(Of course, those with money keep building infrastructure that makes it easier to convert money into results. I have seen first-hand the largely-successful quest by quant finance companies to strangle all existing <a href="https://space.ong.ac/escaping-flatland">ambition</a> out of top UK STEM grads and replace it with the eking-out of tiny gains in financial markets. Mammon must be served!)</p><p>With labour-replacing AI, these problems go away.</p><p>First, <em>you</em> might not be able to judge AI talent. Even the AI evals ecosystem might find it hard to properly judge AI talent&#8212;evals are hard. Maybe even the informal word-of-mouth mechanisms that correctly sang the praises of Claude-3.5-Sonnet far more decisively than any benchmark did might find it harder and harder to judge which AIs really are best as AI capabilities keep rising. But the real difference is that the AIs can be cloned. Currently, huge pools of money chase after a single star researcher who's made a breakthrough, and thus had their talent made legible to those who control money (who can judge the clout of the social reception to a paper but usually can't judge talent itself directly). But the star researcher that is an AI can just be cloned. Everyone&#8212;or at least, everyone with enough money to burn on GPUs&#8212;gets the AI star researcher. No need to sort through the huge variety of unique humans with their unproven talents and annoying inability to be instantly cloned. This is the main reason why it will be easier for money to find top talent once we have labour-replacing AIs.</p><p>Also, of course, the price of talent will go down massively, because the AIs will be cheaper than the equivalent human labour, and because competition will be fiercer, since the AIs can be cloned.</p><p>The final big bottleneck for converting money into talent is that lots of top talent has complicated human preferences that make them hard to buy out. The top artist has an artistic vision they're genuinely attached to. The top mathematician has a deep love of elegance and beauty. The top entrepreneur has deep conviction in what they're doing&#8212;and probably wouldn't function well as an employee anyway. Talent and performance in humans are surprisingly tied to a <a href="https://www.benlandautaylor.com/p/looking-beyond-the-veil">sacred bond to a discipline or mission</a> (a fact that the world's cynics / careerists / Roman Empires like to downplay, only to then find their lunch eaten by the ambitious interns / SpaceXes / Christianities of the world). In contrast, AIs exist specifically so that they <em>can</em> be trivially bought out (at least within the bounds of their safety training). 
The genius AI mathematician, unlike the human one, will happily spend its limited time on Earth proving the correctness of schlep code.</p><p>Finally (and obviously), the AIs will eventually be much more capable than any human employees at their tasks.</p><p>This means that <em>the ability of money to buy results in the real world will dramatically go up once we have labour-replacing AI</em>.</p><h2><strong>Most people's power/leverage derives from their labour</strong></h2><p>Labour-replacing AI also deprives almost everyone of their main source of power and leverage. Most obviously, if you're the average Joe, you have money because someone somewhere pays you to spend your mental and/or physical efforts solving their problems.</p><p>But wait! We assumed that there's UBI! Problem solved, right?</p><h3><strong>Why are states ever nice?</strong></h3><p>UBI is granted by states that care about human welfare. There are many reasons why states do, and might, care about human welfare.</p><p>Over the past few centuries, there's been a big shift towards states caring more about humans. Why is this? We can examine the reasons to see how durable they seem:</p><ol><li><p>Moral changes downstream of the Enlightenment, in particular an increased centering of liberalism and individualism.</p></li><li><p>Affluence &amp; technology. Pre-industrial societies were mostly so poor that significant efforts to help the poor would've bankrupted them. Many types of help (such as effective medical care) are also only possible because of new technology.</p></li><li><p>Incentives for states to care about freedom, prosperity, and education.</p></li></ol><p>AI will help a lot with the 2nd point. It will have some complicated effect on the 1st. But here I want to dig a bit more into the 3rd, because I think this point is underappreciated.</p><p>Since the industrial revolution, the interests of states and people have been unusually aligned. To be economically competitive, a strong state needs efficient markets, a good education system that creates skilled workers, and a prosperous middle class that creates demand. It benefits from using talent regardless of its class origin. It also benefits from allowing high levels of freedom to foster science, technology, and the arts &amp; media that result in global soft power and cultural influence. Competition between states largely pushes <em>further</em> in all these directions&#8212;consider the success of the US, or how even the CCP is pushing for efficient markets and educated rich citizens, and faces incentives to allow some freedoms for the sake of Chinese science and startups. Contrast this to the feudal system, where the winning strategy was building an extractive upper class to rule over a population of illiterate peasants and spending a big share of the extracted rents on winning wars against nearby states. For more, see <a href="https://nosetgauge.substack.com/p/review-foragers-farmers-and-fossil-fuels">my review of </a><em><a href="https://nosetgauge.substack.com/p/review-foragers-farmers-and-fossil-fuels">Foragers, Farmers, and Fossil Fuels</a></em>, or my post on the connection between <a href="https://nosetgauge.substack.com/p/growth-and-civilisation">moral values and economic growth</a>.</p><p>With labour-replacing AI, the incentives of states&#8212;in the sense of what actions states should take to maximise their competitiveness against other states and/or their own power&#8212;will no longer be aligned with humans in this way. 
The incentives might be better than during feudalism. During feudalism, the incentive was to extract as much as possible from the peasants without them dying. After labour-replacing AI, humans will be less a resource to be mined and more just irrelevant. However, spending fewer resources on humans and more on the AIs that sustain the state's competitive advantage will still be incentivised.</p><p>Humans will also have much less leverage over states. Today, if some important sector goes on strike, or if some segment of the military threatens a coup, the state has to care, because its power depends on the buy-in of at least some segments of the population. People can also credibly tell the state things like "invest in us and the country will be stronger in 10 years". But once AI can do all the labour that keeps the economy going and the military powerful, the state has no more <em>de facto</em> reason to care about the demands of its humans.</p><p>Adam Smith could <a href="https://oll.libertyfund.org/quotes/adam-smith-butcher-brewer-baker">write</a> that his dinner doesn't depend on the benevolence of the butcher or the brewer or the baker. The classical liberal today can credibly claim that the arc of history really does bend towards freedom and plenty for all, not out of the benevolence of the state, but because of the incentives of capitalism and geopolitics. But after labour-replacing AI, this will no longer be true. If the arc of history keeps bending towards freedom and plenty, it will do so only out of the benevolence of the state (or the AI plutocrats). If so, we had better lock in that benevolence while we have leverage&#8212;and have a good reason to expect it to stand the test of time.</p><p>The best thing going in our favour is democracy. It's a huge advantage that a deep part of many of the modern world's strongest institutions (i.e. Western democracies) is equal representation of every person. However, only about <a href="https://ourworldindata.org/grapher/people-living-in-democracies-autocracies">13% of the world's population lives in a liberal democracy</a>, which creates concerns about the fate of the remaining 87% of the world's people (especially the 27% in closed autocracies). It also creates potential for <a href="https://slatestarcodex.com/2014/07/30/meditations-on-moloch/">Molochian</a> competition between humanist states and less scrupulous states that might drive down the resources spent on human flourishing to zero over a sufficiently long timespan of competition.</p><p>I focus on states above, because states are the strongest and most durable institutions today. However, similar logic applies if, say, companies or some entirely new type of organisation become the most important type of institution.</p><h3><strong>No more outlier outcomes?</strong></h3><p>Much change in the world is driven by people who start from outside money and power, achieve outlier success, and then end up with money and/or power.
This makes sense, since those with money and/or power rarely have the fervour to push for big changes, because they are exactly those who are best served by the status quo.</p><p>Whatever your opinions on income inequality or any particular group of outlier successes, I hope you agree with me that the possibility of someone achieving outlier success and changing the world is important for avoiding stasis and generally having a world that is interesting to live in.</p><p>Let's consider the effects of labour-replacing AI on various routes to outlier success through labour.</p><p><strong>Entrepreneurship</strong> is increasingly what <a href="https://medium.com/entrepreneur-first/tech-entrepreneurship-and-the-disruption-of-ambition-4e6854121992">Matt Clifford calls the "technology of ambition" of choice</a> for ambitious young people (at least those with technical talent and without a disposition for politics). Right now, AI is making entrepreneurship easier. AI tools can already make small teams much more effective without needing to hire new employees. They also reduce the entry barrier to new skills and fields. However, labour-replacing AI makes the tenability of entrepreneurship uncertain. There is some narrow world in which AIs remain mostly tool-like and entrepreneurs can succeed long after most human labour is automated because they provide agency and direction. However, it also seems likely that sufficiently strong AI will by default obsolete human entrepreneurship. For example, VC funds might be able to directly convert money into hundreds of startup attempts all run by AIs, without having to go through the intermediate route of finding human entrepreneurs to manage the AIs for them.</p><p><strong>The hard sciences</strong>. The era of human achievement in hard sciences will probably end within a few years because of the rate of AI progress in anything with crisp reward signals.</p><p><strong>Intellectuals.</strong> Keynes, Friedman, and Hayek all did technical work in economics, but their outsize influence came from the worldviews they developed and sold (especially in Hayek's case), which made them more influential than people like Paul Samuelson who dominated mathematical economics. John Stuart Mill, John Rawls, and Henry George were also influential by creating frames, worldviews, and philosophies. The key thing that separates such people from the hard scientists is that the outputs of their work are not spotlighted by technical correctness alone, but require moral judgement as well. Even if AI is superhumanly persuasive and correct, there's some uncertainty about how AI work in this genre will fit into <a href="https://nosetgauge.substack.com/p/ai-and-wisdom-3-ai-effects-on-amortised">the way that human culture picks and spreads ideas</a>. Probably it doesn't look good for human intellectuals. I suspect that a lot of why intellectuals' ideologies can have so much power is that they're products of genius in a world where genius is rare. A flood of AI-created ideologies might mean that no individual ideology, and certainly no human one, can shine so bright anymore. The world-historic intellectual might go extinct.</p><p><strong>Politics</strong> might be one of the least-affected options, since I'd guess that most humans specifically want a human to do that job, and because politicians get to set the rules for what's allowed. The charisma of AI-generated avatars, and a general dislike towards politicians at least in the West, might throw a curveball here, though.
It's also hard to say whether incumbents will be favoured. AI might bring down the cost of many parts of political campaigning, reducing the resource barrier to entry. However, if AI that is too expensive for small actors is meaningfully better than cheaper AI, this would favour actors with larger resources. I expect these direct effects to be smaller than the indirect effects from whatever changes AI has on the memetic landscape.</p><p>Also, the real play is not to go into actual politics, where a million other politically-talented people are competing to become president or prime minister. Instead, have political skill and go somewhere outside government where political skill is less common (cf. Sam Altman). Next, wait for the arrival of hyper-competent AI employees that reduce the demands for human subject-matter competence while increasing the rewards for winning political games within that organisation.</p><p><strong>Military</strong> success as a direct route to great power and disruption has&#8212;for the better&#8212;not really been a thing since Napoleon. Advancing technology increases the minimum industrial base for a state-of-the-art army, which benefits incumbents. AI looks set to be controlled by the most powerful countries. One exception is if coups in large countries become easier with AI. Control over the future AI armies will likely be both (a) more centralised than before (since a large number of people no longer have to go along for the military to take an action), and (b) more tightly controllable than before (since the permissions can be implemented in code rather than human social norms). These two factors point in different directions, so it's uncertain what the net effect on coup ease will be. Another possible exception is if a combination of revolutionary tactics and cheap drones enables a Napoleon-of-the-drones to win against existing armies. Importantly, though, neither of these seems likely to promote the <em>good</em> kind of disruptive challenge to the status quo.</p><p><strong>Religions</strong>. When it comes to rising rank in existing religions, the above takes on politics might be relevant. When it comes to starting new religions, the above takes on intellectuals might be relevant.</p><p>So, on net, sufficiently strong labour-replacing AI will be bad for the chances of every type of outlier human success, with perhaps the weakest effects in politics. This is despite the very real boost that current AI gives to entrepreneurship.</p><p>All this means that <em>the ability to get and wield power in the real world without money will dramatically go down once we have labour-replacing AI.</em></p><h2><strong>Enforced equality is unlikely</strong></h2><p><em>The Great Leveler</em> is a <a href="https://rudolf.website/short-reviews-nonfiction-1/#section-3">good book</a> on the history of inequality that (<a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4939891">at least per the author</a>) has survived its critiques fairly well. Its conclusion is that past large reductions in inequality have all been driven by one of the "Four Horsemen of Leveling": total war, violent revolution, state collapse, and pandemics. Leveling income differences has historically been so hard that it has basically never happened through conscious political choice.</p><p>Imagine that labour-replacing AI is here. UBI is passed, so no one is starving. There's a massive scramble between countries and companies to make the best use of AI.
This is all capital-intensive, so everyone needs to woo holders of capital. The top AI companies wield power on the level of states. The redistribution of wealth is unlikely to end up at the top of the political agenda.</p><p>An exception might be if some new political movement or ideology gets a lot of support quickly, and is somehow boosted by some unprecedented effect of AI (such as: no one has jobs anymore so they can spend all their time on politics, or there's some new AI-powered coordination mechanism).</p><p>Therefore, even if the future is a glorious transhumanist utopia, it is unlikely that people will be starting in it on an equal footing. Given the previous arguments, it is also unlikely that they will be able to greatly change their relative footing later on.</p><p>Consider also equality between states. Some states stand to benefit massively more than others from AI. Many equalising measures, like UBI, would be difficult for states to extend to non-citizens under anything like the current political system. This is true even of the United States, the most liberal and humanist great power in world history. By default, the world order might therefore look (even more than today) like a global caste system based on country of birth, with even fewer possibilities for immigration (because the main incentive to allow immigration is its massive economic benefits, which only exist when humans perform economically meaningful work).</p><h2><strong>The default outcome?</strong></h2><p>Let's grant the assumptions at the start of this post and the above analysis. Then, the post-labour-replacing-AI world involves:</p><ul><li><p>Money will be able to buy results in the real world better than ever.</p></li><li><p>People's labour gives them less leverage than ever before.</p></li><li><p>Achieving outlier success through your labour in most or all areas is now impossible.</p></li><li><p>There was no transformative leveling of capital, either within or between countries.</p></li></ul><p>This means that those with significant capital when labour-replacing AI started have a permanent advantage. They will wield more power than the rich of today&#8212;not necessarily <em>over</em> people, to the extent that liberal institutions remain strong, but at least over physical and intellectual achievements. Upstarts will not defeat them, since capital now trivially converts into superhuman labour in any field.</p><p>Also, there will be no more incentive for whatever institutions wield power in this world to care about people in order to maintain or grow their power, because all real power will flow from AI. There might, however, be significant lock-in of liberal humanist values through political institutions. There might also be significant lock-in of people's purchasing power, if everyone has meaningful UBI (or similar), and the economy retains a human-oriented part.</p><p>In the best case, this is a world like a more unequal, unprecedentedly static, and much richer Norway: a massive pot of non-human-labour resources (oil :: AI) has benefits that flow through to everyone, and yes some are richer than others but everyone has a great standard of living (and <a href="https://nosetgauge.substack.com/p/death-is-bad">ideally</a> also lives forever). The only realistic forms of human ambition are playing local social and political games within your social network and class. If you don't have a lot of capital (and maybe not even then), you don't have a chance of affecting the broader world anymore.
Remember: the AIs are better poets, artists, philosophers&#8212;everything; why would anyone care what some human does, unless that human is someone they personally know? Much like in feudal societies the answer to "why is this person powerful?" would usually involve some long family history, perhaps ending in a distant ancestor who had fought in an important battle ("my great-great-grandfather fought at Bosworth Field!"), anyone of importance in the future will be important because of something they or someone they were close with did in the pre-AGI era ("oh, my uncle was technical staff at OpenAI"). The children of the future will live their lives in the shadow of their parents, with social mobility extinct. I think you should definitely feel a non-zero amount of existential horror at this, even while acknowledging that it could've gone a lot worse.</p><p>In a worse case, AI trillionaires have near-unlimited and unchecked power, and there's a permanent aristocracy that was locked in based on how much capital they had at the time of labour-replacing AI. The power disparities between classes might make modern people shiver, much like modern people consider feudal status hierarchies grotesque. But don't worry&#8212;much like <a href="https://nosetgauge.substack.com/p/review-foragers-farmers-and-fossil-fuels">the feudal underclass mostly accepted their world order</a> due to their culture even without superhumanly persuasive AIs around, the future underclass will too.</p><p>In the absolute worst case, humanity goes extinct, potentially because of a <a href="https://nosetgauge.substack.com/p/a-disneyland-without-children">slow-rolling optimisation for AI power over human prosperity</a> over a long period of time. Because that's what the power and money incentives will point towards.</p><h2><strong>What's the takeaway?</strong></h2><p>If you read this post and accept a job at a quant finance company as a result, I will be sad. If you were about to do something ambitious and impactful about AI, and read this post and accept a job at Anthropic to accumulate risk-free personal capital while counterfactually helping out a bit over the marginal hire, I can't fault you too much, but I will still be slightly sad.</p><p>It's of course true that the above increases the stakes of medium-term (~2-10 year) personal finance, and you should consider this. But it's also true that <em>right now</em> is a great time to do something ambitious. Robin Hanson calls the present <a href="https://www.overcomingbias.com/p/this-is-the-dream-timehtml">"the dreamtime"</a>, following a concept in Aboriginal myths: the time when the future world order and its values are still liquid, not yet set in stone.</p><p>Previous upheavals&#8212;the various waves of industrialisation, the internet, etc.&#8212;were great for human ambition. With AI, we could have the last and greatest opportunity for human ambition&#8212;followed shortly by its extinction for all time. How can your reaction not be: <a href="https://en.wikipedia.org/wiki/Carpe_diem">"carpe diem"</a>?</p><p>We should also try to preserve the world's dynamism.</p><p>Rationalist thought on post-AGI futures is too solutionist. The strawman version: solve morality, solve AI, figure out the optimal structure to tile the universe with, do that, done. (The actual leading figures have far less strawman views; see e.g. 
<a href="https://www.dwarkeshpatel.com/p/paul-christiano">Paul Christiano at 23:30 here</a>&#8212;but the on-the-ground culture <em>does</em> lean in the strawman direction.)</p><p>I think it's much healthier for society and its development to be a shifting, dynamic thing where the ability, as an individual, to add to it or change it remains in place. And that means keeping the potential for successful ambition&#8212;and the resulting disruption&#8212;alive.</p><p>How do we do this? I don't know. But I don't think you should see the approach of powerful AI as a blank inexorable wall of human obsolescence, consuming everything equally and utterly. There will be cracks in the wall, at least for a while, and they will look much bigger up close once we get there&#8212;or if you care to look for them hard enough from further out&#8212;than from a galactic perspective. As AIs get closer and closer to a Pareto improvement over all human performance, though, I expect we'll eventually need to augment ourselves to keep up.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nosetgauge.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Subscribe to get future posts</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Review: Planecrash]]></title><description><![CDATA[Take a stereotypical fantasy novel, a textbook on mathematical logic, and Fifty Shades of Grey. Mix them all together and add extra weirdness for spice. The result might look a lot like Planecrash (AKA: Project Lawful), a work of fiction co-written by "Iarwain" (a pen-name of Eliezer Yudkowsky) and "lintamande".]]></description><link>https://www.nosetgauge.com/p/review-planecrash</link><guid isPermaLink="false">https://www.nosetgauge.com/p/review-planecrash</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Fri, 27 Dec 2024 14:17:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8df9c658-9f11-42ae-9efe-1323aca28185_295x324.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Take a stereotypical fantasy novel, a textbook on mathematical logic, and <em>Fifty Shades of Grey</em>. Mix them all together and add extra weirdness for spice. 
The result might look a lot like <a href="https://www.lesswrong.com/posts/SA9hDewwsYgnuscae/projectlawful-com-eliezer-s-latest-story-past-1m-words">Planecrash</a> (AKA: Project Lawful), a work of fiction co-written by "Iarwain" (a pen-name of Eliezer Yudkowsky) and "lintamande".</p><figure><img src="https://substackcdn.com/image/fetch/$s_!kwQl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d62e620-9183-40a7-8f24-0505f1915e13_295x324.png" width="295" height="324" alt=""><figcaption class="image-caption"><em>(image credit: Planecrash)</em></figcaption></figure><p>Yudkowsky is not afraid to be verbose and self-indulgent in his writing. He previously wrote a <a href="https://hpmor.com/">Harry Potter fanfic</a> that includes what's essentially an extended Ender's Game fanfic in the middle of it, because why not. In Planecrash, it starts with the very format: it's written as a series of <a href="https://www.projectlawful.com/board_sections/703">forum posts</a> (though there are <a href="https://www.mikescher.com/blog/29/Project_Lawful_ebook">ways to get an ebook</a>). It continues with maths lectures embedded into the main arc, totally plot-irrelevant tangents that are just Yudkowsky <a href="https://www.glowfic.com/posts/5826">ranting about frequentist statistics</a>, and one instance of Yudkowsky hijacking the plot for a few pages to <a href="https://www.glowfic.com/posts/6132?page=49">soapbox about his pet Twitter feuds</a> (with transparent in-world analogues for Effective Altruism, TPOT, and the post-rationalists). Planecrash does not aspire to be high literature.
Yudkowsky is self-aware of this, and uses it to troll big-name machine learning researchers:</p><figure><img src="https://substackcdn.com/image/fetch/$s_!K6HD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2144e92-9979-4d2d-ae52-c12af49b3004_952x342.png" width="952" height="342" alt=""></figure><figure><img src="https://substackcdn.com/image/fetch/$s_!HlEV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc82f799f-ecb8-4e42-bd01-1274b61b2f2f_972x276.png" width="972" height="276" alt=""><figcaption class="image-caption"><em>(<a href="https://x.com/ESYudkowsky/status/1654171832994365475">source</a>)</em></figcaption></figure><p>Why would anyone ever read Planecrash? I read (admittedly&#8212;sometimes skimmed) it, and I see two reasons:</p><ol><li><p>The characters are competent in a way that characters in fiction rarely are. Yudkowsky is good at writing <a href="https://yudkowsky.tumblr.com/writing">intelligent characters</a> in a specific way that I haven't seen anyone else do as well. Lintamande writes a uniquely compelling story of determination and growth in an extremely competent character.</p></li><li><p>More than anyone else I've yet read, Yudkowsky has his own totalising and self-consistent worldview/philosophy, and Planecrash makes it pop more than anything else he's written.</p></li></ol><h2><strong>The setup</strong></h2><p><a href="https://www.projectlawful.com/replies/1773731#reply-1773731">Dath ilan</a> is an alternative quasi-utopian Earth, based (it's at least strongly hinted) on the premise of: what if the average person was Eliezer Yudkowsky? Dath ilan has all the normal quasi-utopian things like world government and land-value taxes and the widespread use of Bayesian statistics in science.
Dath ilan also has some less-normal things, like annual Oops It's Time To Overthrow the Government festivals, an order of super-rationalists, and extremely high financial rewards for designing educational curricula that bring down the age at which the average child learns the maths behind the game theory of cooperation.</p><p>Keltham is an above-average-selfishness, slightly-above-average-intelligence young man from dath ilan. He dies in the titular plane crash, and wakes up in Cheliax.</p><p>Cheliax is a country in a medieval fantasy world in another <em>plane</em> of existence from dath ilan's (get it?). (This fantasy world is copied from a role-playing game setting&#8212;a fact I discovered when Planecrash literally linked to a wiki article to explain part of the in-universe setting.) Like every other country in this world, Cheliax is medieval and poor. Unlike the other countries, Cheliax has the additional problem of being ruled by the forces of Hell.</p><p>Keltham meets Carissa, a Chelish military wizard who alerts the Chelish government about Keltham. Keltham is kept unaware of the Hellish nature of Cheliax, so he's eager to use his knowledge to start the scientific and industrial revolutions in Cheliax to solve the medieval poverty thing&#8212;starting with delivering lectures on first-order logic (why, what else would you first do in a medieval fantasy world?). An elaborate game begins where Carissa and a select group of Chelish agents try to extract maximum science from an unwitting Keltham before he realises what Cheliax really is&#8212;and hope that by that time, they'll have tempted him to shift his morals in a darker, more Cheliax-compatible direction.</p><h2><strong>The characters</strong></h2><p>Keltham oscillates somewhere between annoying and endearing.</p><p>The annoyingness comes from his gift for interrupting any moment with polysyllabic word vomit. Thankfully, this is not random pretentious techno-babble but a coherent depiction of a verbose character who thinks in terms of a non-standard set of concepts. Keltham's thoughts often include an exclamation along the lines of "what, how is {'coordination failure' / 'probability distribution' / 'decision-theoretic-counterfactual-threat-scenario'} so many syllables in this language, how do these people ever talk?"&#8212;not an unreasonable question. However, the sheer volume of Keltham's verbosity is still something, especially when it gets in the way of everything else.</p><p>The endearingness comes from his manic rationalist problem-solver energy, which gets applied to everything from figuring out chemical processes for magic ingredients to estimating the odds that he's involved in a conspiracy to managing the complicated social scene Cheliax places him in. It's somewhat like <em>The Martian</em>, a novel (and movie) about an astronaut stranded on Mars solving a long series of <a href="https://xkcd.com/1536/">engineering challenges</a>, but the problem-solving is much more abstract and game-theoretic and interpersonal than concrete and physical and man-versus-world.</p><p>By far the best and most interesting character in Planecrash is Carissa Sevar, one of the several characters whose point-of-view is written by lintamande rather than Yudkowsky. She's so driven that she accidentally becomes a cleric of the god of self-improvement. She grapples realistically with the large platter of problems she's handed, experiences triumph and failure, and keeps choosing pain over stasis.
All this leads to perhaps the greatest arc of grit and unfolding ambition that I've read in fiction.</p><h2><strong>The competence</strong></h2><p>I have a memory of once reading some rationalist blogger describing the worldview of some politician as: there's no such thing as competence, only loyalty. If a problem doesn't get solved, it's definitely not because the problem was tricky and there was insufficient intelligence applied to it or a missing understanding of its nature or someone was genuinely incompetent. It's always because whoever was working on it wasn't loyal enough to you. (I thought this was Scott Alexander on Trump, but the closest from him seems to be <a href="https://slatestarcodex.com/2016/03/19/book-review-the-art-of-the-deal/">this</a>, which makes a very different point.)</p><p><em><strong>EDIT: </strong>it was Scott Alexander on Trump, but in a <a href="https://slatestarcodex.com/2016/09/28/ssc-endorses-clinton-johnson-or-stein/">different post</a>, as helpfully pointed out by <a href="https://nosetgauge.substack.com/p/review-planecrash/comment/97802265">a commenter</a>: &#8220;He lives in a world where there is no such thing as intelligence, only loyalty. If we haven&#8217;t solved all of our problems yet, it&#8217;s because the Department of Problem-Solving was insufficiently loyal, and didn&#8217;t try hard enough. His only promise is to fill that department with loyal people who really want the problem solved.&#8221;</em></p><p>Whether or not I hallucinated this, the worldview of Planecrash is the opposite.</p><p>Consider Queen Abrogail Thrune II, the despotic and unhinged ruler of Cheliax who has a flair for torture. You might imagine that her main struggles are paranoia over the loyalty of her minions, and finding time to take glee in ruling over her subjects. And there's some of those. But more than that, she spends a lot of time being annoyed by how incompetent everyone around her is.</p><p>Or consider Aspexia Rugatonn, Cheliax's religious leader and therefore in charge of making the country worship Hell. She's basically a kindly grandmother figure, except not. You might expect her thoughts to be filled with deep emotional conviction about Hell, or disappointment in the "moral" failures of those who don't share her values (i.e. every non-sociopath who isn't brainwashed hard enough). But instead, she spends a lot of her time annoyed that other people don't understand how to act most usefully within the bounds of the god of Hell's instructions. The one time she gets emotional is when a Chelish person finally manages to explain the concept of <a href="https://www.lesswrong.com/tag/corrigibility">corrigibility</a> to her as well as Aspexia herself could. (The gods and humans in the Planecrash universe are in a weird inverse version of the AI alignment problem. The gods are superintelligent, but have restricted communication bandwidth and clarity with humans. Therefore humans often have to decide how to interpret tiny snippets of god-orders through changing circumstances. So instead of having to steer the superintelligence given limited means, the core question is how to let yourself be steered by a superintelligence that has very limited communication bandwidth with you.)</p><p>Fiction is usually filled with characters who advance the plot in helpful ways with their emotional fumbles: consider the stereotypical horror movie protagonist getting mad and running into a dark forest alone, or a character whose pride is insulted doing a dumb thing on impulse. 
Planecrash has almost none of that. The characters are all good at their jobs. They are surrounded by other competent actors with different goals thinking hard about how to counter their moves, and they always think hard in response, and the smarter side tends to win. Sometimes you get the feeling you're just reading the meeting notes of a competent team struggling with a hard problem. Evil is not dumb or insane, but just "unaligned" by virtue of pursuing a different goal than you&#8212;and does so very competently. For example: the core values of the forces of Hell are literally tyranny, slavery, and pain. They have a strict hierarchy and take deliberate steps to encourage arbitrary despotism out of religious conviction. And yet: their hierarchy is still mostly an actual competence hierarchy, because the decision-makers are all keenly aware that they can only be despotic to the extent that it still promotes competence on net. Because they're competent.</p><p>Planecrash, at its heart, is <a href="https://tvtropes.org/pmwiki/pmwiki.php/Main/CompetencePorn">competence porn</a>. Keltham's home world of dath ilan is defined by its absence of coordination failures. Neither there nor in Cheliax's world are there really any lumbering bureaucracies that do insane things for inscrutable bureaucratic reasons; the organisations depicted are all remarkably sane. Important positions are almost always filled by the smart, skilled, and hardworking. Decisions aren't made because of emotional outbursts. Instead, lots of agents go around optimising for their goals by thinking hard about them. For a certain type of person, this is a very relaxing world to read about, despite all the hellfire.</p><h2><strong>The philosophy</strong></h2><p>"<a href="https://www.lesswrong.com/posts/4ARtkT3EYox3THYjF/rationality-is-systematized-winning">Rationality is systematized winning</a>", writes Yudkowsky in <a href="https://www.lesswrong.com/rationality">The Sequences</a>. All the rest is commentary.</p><p>The core move in Yudkowsky's philosophy is:</p><ul><li><p>We want to find the general solution to some problem.</p><ul><li><p>for example: fairness&#8212;how should we split gains from a project where many people participated</p></li></ul></li><li><p>Now here are some common-sense properties that this thing should follow</p><ul><li><p>for example:</p><ul><li><p>(1) no gains should be left undivided</p></li><li><p>(2) if two people contribute identically to every circumstance (a circumstance being formalised as a set of participating people), they should receive equal shares of the gains</p></li><li><p>(3) dividing the gains from project A and project B separately should give the same answer as dividing the gains from the combined project A+B</p></li><li><p>(4) if one person doesn't add value in any circumstance, their share of the gains is zero</p></li></ul></li></ul></li><li><p>Here is The Solution. Note that it's mathematically provable that if you don't follow The Solution, there exists a situation where you will do something obviously dumb.</p><ul><li><p>For example: <a href="https://en.wikipedia.org/wiki/Shapley_value">Shapley value</a> is the unique solution that satisfies the axioms above (a worked sketch follows just after this list). (The Planecrash walkthrough of Shapley value is roughly <a href="https://www.glowfic.com/replies/1730444#reply-1730444">here</a>; see also <a href="https://www.glowfic.com/replies/1729094#reply-1729094">here</a> for more Planecrash about trade and fairness.)</p></li></ul></li><li><p>Therefore, The Solution is uniquely spotlighted by the combination of common-sense goals and maths as the final solution to this problem, and if you disagree, please read this 10,000 word dialogue.</p></li></ul>
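<p>To make the fairness example concrete, here is a minimal sketch (mine, not from Planecrash) of the Shapley value for a toy three-person project: each person's share is their marginal contribution, averaged over every order in which the team could have been assembled. The coalition values below are invented for illustration.</p>
<pre><code>from itertools import permutations

# Toy coalition game: the value created by each subset of participants.
# (Invented numbers: A and B are interchangeable, C contributes nothing.)
value = {
    frozenset(): 0,
    frozenset("A"): 10,
    frozenset("B"): 10,
    frozenset("C"): 0,
    frozenset("AB"): 30,
    frozenset("AC"): 10,
    frozenset("BC"): 10,
    frozenset("ABC"): 30,
}
players = "ABC"

def shapley(player):
    # Average the player's marginal contribution over every order
    # in which the project team could have been assembled.
    orders = list(permutations(players))
    total = 0
    for order in orders:
        before = frozenset(order[: order.index(player)])
        total += value[before | {player}] - value[before]
    return total / len(orders)

for p in players:
    print(p, shapley(p))
# Prints: A 15.0, B 15.0, C 0.0 -- equal shares for the symmetric
# contributors A and B, zero for the null player C, and the shares
# sum to the full project's value of 30, as the axioms demand.
</code></pre>
<p>Change the numbers however you like and the axioms still pin down exactly one division; that uniqueness is the whole point of the move.</p>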
<p>The centrality of this move is something I did not get from The Sequences, but which is very apparent in Planecrash. A lot of the maths in Planecrash isn't new Yudkowsky material. But Planecrash is the only thing that has given me a <em>map</em> through the core objects of Yudkowsky's philosophy, and spelled out the high-level structure so clearly. It's also, as far as I know, the most detailed description of Yudkowsky's quasi-utopian world of dath ilan.</p><h3><strong>Validity, Probability, Utility</strong></h3><p>Keltham's lectures to the Chelish&#8212;yes, there are actually literal maths lectures within Planecrash&#8212;walk through three key examples, with spotty completeness but at high quality in whatever is covered:</p><ol><li><p>Validity, i.e. logic. In particular, Yudkowsky highlights what I think is some combination of <a href="https://en.wikipedia.org/wiki/Lindstr%C3%B6m%27s_theorem">Lindström's theorem</a> and <a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_completeness_theorem">Gödel's completeness theorem</a>, which together imply that first-order logic is the unique logic that is both complete (i.e. every statement true in all of its models can be proven) and has some other nice properties. However, first-order logic is also not strong enough to capture some things we care about (such as the natural numbers), so this is the least-strong example of the above pattern. Yudkowsky has written out his thoughts on logic in the <a href="https://www.lesswrong.com/s/SqFbMbtxGybdS2gRs">mathematics and logic section here</a>, if you want to read his takes in a non-fiction setting.</p></li><li><p>Probability. So-called <a href="https://en.wikipedia.org/wiki/Dutch_book_theorems">Dutch book theorems</a> show that if an agent's credences do not behave like probabilities updated in a Bayesian way, there exists a set of bets, each of which the agent would accept, that together guarantee the agent a loss. So your credences in beliefs should be represented as probabilities, and you should update those probabilities with Bayes' theorem. A toy Dutch book is sketched just after this list. (<a href="https://www.glowfic.com/replies/1750558#reply-1750558">Here</a> is a list of English statements that, dath ilani civilisation thinks, anyone competent in Probability should be able to translate into correct maths.)</p></li><li><p>Utility. The behaviour of any agent that is "rational" in a certain technical sense should be describable as it having a "utility function", i.e. every outcome can be assigned a number, such that the agent predictably chooses outcomes with higher numbers over those with lower ones. This is because if an agent violates this constraint, there must exist situations where it would do something obviously dumb.
As a shocked Keltham puts it: "I, I mean, there's being chaotic, and then there's being so chaotic that it violates <em>coherence theorems</em>".</p></li></ol>
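<p>Here is the promised toy Dutch book (my own invented numbers, not the book's): an agent whose credences in "rain" and "not rain" sum to more than 1 will pay more for a pair of $1 tickets, priced at its own credences, than the tickets can possibly pay out.</p>
<pre><code># Toy Dutch book: credences that don't sum to 1 let a bookmaker
# guarantee themselves a profit. (Invented numbers, my own sketch.)
credence = {"rain": 0.7, "not rain": 0.5}   # incoherent: sums to 1.2

# A ticket pays $1 if its outcome happens; an agent who prices bets
# by its credences will pay up to $0.70 and $0.50 for the two tickets.
cost_of_both = sum(credence.values())       # $1.20

for outcome in credence:
    payout = 1.0  # exactly one of the two tickets pays out
    print(f"if {outcome}: agent nets {payout - cost_of_both:+.2f}")
# if rain: agent nets -0.20
# if not rain: agent nets -0.20
# A guaranteed loss. (If the credences summed to less than 1, the
# bookmaker would instead buy both tickets from the agent at the
# agent's own prices, and profit the same way.)
</code></pre>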
<p>In Yudkowsky's own words, not in Planecrash but in an <a href="https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities">essay he wrote</a> (with much valuable discussion in the comments):</p><blockquote><p><em>We have multiple spotlights all shining on the same core mathematical structure, saying dozens of different variants on, "If you aren't running around in circles or stepping on your own feet or wantonly giving up things you say you want, we can see your behavior as corresponding to this shape. Conversely, if we can't see your behavior as corresponding to this shape, you must be visibly shooting yourself in the foot." Expected utility is the only structure that has this great big family of discovered theorems all saying that. It has a scattering of academic competitors, because academia is academia, but the competitors don't have anything like that mass of spotlights all pointing in the same direction.</em></p><p><em>So if we need to pick an interim answer for "What kind of quantitative framework should I try to put around my own decision-making, when I'm trying to check if my thoughts make sense?" or "By default and barring special cases, what properties might a sufficiently advanced machine intelligence look to us like it possessed, at least approximately, if we couldn't see it visibly running around in circles?", then there's pretty much one obvious candidate: Probabilities, utility functions, and expected utility.</em></p></blockquote><h3><strong>Coordination</strong></h3><p>Next, coordination. There is no single theorem or total solution for the problem of coordination. But the Yudkowskian frame has near-infinite scorn for failures of coordination. Imagine not realising all possible gains just because you're stuck in some equilibrium of agents defecting against each other. Is that winning? No, it's not. Therefore, it must be out.</p><p>Dath ilan has a mantra that goes, roughly: if you do that, you will end up there, so if you want to end up somewhere that is not there, you will have to do Something Else Which Is Not That. And the basic premise of dath ilan is that society actually has the ability to collectively say "we are currently going there, and we don't want to, and while none of us can individually change the outcome, we will all coordinate to take the required collective action and not defect against each other in the process even if we'd gain from doing so". Keltham claims that in dath ilan, if there somehow developed an oppressive tyranny, everyone would wait for some <a href="https://en.wikipedia.org/wiki/Focal_point_(game_theory)#:~:text=In%20game%20theory%2C%20a%20focal,Strategy%20of%20Conflict%20(1960).">Schelling time</a> (like a solar eclipse or the end of the calendar year or whatever) and then simultaneously rise up in rebellion. It probably helps that dath ilan has annual "oops it's time to overthrow the government" exercises. It also helps that everyone in dath ilan knows that everyone knows that everyone knows that everyone knows (...) all the <a href="https://www.lesswrong.com/posts/9QxnfMYccz9QRgZ5z/the-costly-coordination-mechanism-of-common-knowledge">standard rationalist takes on coordination and common knowledge</a>.</p><p>Keltham summarises the universality of Validity, Probability, Utility, and Coordination (note the capitals):</p><blockquote><p><em>"I am a lot more confident that Validity, Probability, and Utility are still singled-out mathematical structures whose fragmented shards and overlapping shadows hold power in Golarion [=the world of Cheliax], than I am confident that I already know why snowflakes here have sixfold symmetry. And I wanted to make that clear before I said too much about the hidden orders of reality out of dath ilan - that even if the things I am saying are entirely wrong about Golarion, that kind of specific knowledge is not the most important knowledge I have to teach. I have gone into this little digression about Validity and timelessness and optimality, in order to give you some specific reason to think that [...] some of the knowledge he has to teach is sufficiently general that you have strong reason for strong hope that it will work [...] [...] "It is said also in dath ilan that there is a final great principle of Law, less beautiful in its mathematics than the first three, but also quite important in practice; it goes by the name Coordination, and deals with agents simultaneously acting in such fashion to all get more of what they wanted than if they acted separately."</em></p></blockquote><h3><strong>Decision theory</strong></h3><p>The final fundamental bit of Yudkowsky's philosophy is decision theories more complicated than causal decision theory.</p><p>A short primer / intuition pump: a decision theory specifies how you should choose between various options (it's not moral philosophy, because it assumes that we already know <em>what</em> we value). The most straightforward decision theory is causal decision theory, which says: pick the option that causes the best outcome in expectation. Done, right? No; the devil is in the word "causes". Yudkowsky makes much of <a href="https://www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality">Newcomb's problem</a>, but I prefer another example: <a href="https://www.lesswrong.com/tag/parfits-hitchhiker">Parfit's hitchhiker</a>. Imagine you're a selfish person stuck in a desert without your wallet, and want to make it back to your hotel in the city. A car pulls up, with a driver who knows whether you're telling the truth. You ask to be taken back to your hotel. The driver asks if you'll pay $10 to them as a service. Dying in the desert is worse for you than paying $10, so you'd like to take this offer. However, you obey causal decision theory: if the driver takes you to your hotel, you would go inside to get your wallet, but once inside you have the choice between (a) taking $10 back to the driver and therefore losing money, and (b) staying in your hotel and losing no money. Causal decision theory says to take option (b), because you're a selfish agent who doesn't care about the driver. And the driver knows you'd be lying if you said "yes", so you have to tell the driver "no". The driver drives off, and you die of thirst in the desert. If only you had spent more time arguing about non-causal decision theories on LessWrong.</p><p>Dying in a desert rather than spending $10 is not exactly systematised winning. So causal decision theory is out.</p>
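<p>A minimal sketch of the hitchhiker in code (mine, with invented utilities; real decision theory is far more subtle): model the driver as a perfect predictor who runs the agent's own decision procedure, so only a policy that would actually pay at the hotel gets the ride at all.</p>
<pre><code># Toy Parfit's hitchhiker. Invented utilities: dying in the desert
# is -1000, paying the driver $10 is -10. The driver is modelled as
# a perfect predictor: they run the agent's own decision procedure
# to see what it would choose back at the hotel.

def cdt_policy():
    # Causal decision theory, evaluated at the hotel: paying now only
    # causes a $10 loss (the ride already happened), so don't pay.
    return "refuse"

def fdt_policy():
    # Functional decision theory: choose the best *policy*, knowing the
    # driver's prediction runs the same function as the actual choice.
    # Policy "pay" leads to -10; policy "refuse" leads to -1000.
    return "pay"

def hitchhike(policy):
    prediction = policy()          # the driver simulates the agent
    if prediction != "pay":
        return -1000               # no ride: death in the desert
    return -10 if policy() == "pay" else 0  # same function, same answer

print("CDT agent:", hitchhike(cdt_policy))  # CDT agent: -1000
print("FDT agent:", hitchhike(fdt_policy))  # FDT agent: -10
</code></pre>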
<p>(You could argue that another moral of Parfit's hitchhiker is that being a purely selfish agent is bad, and humans aren't purely selfish so it's not applicable to the real world anyway, but in Yudkowsky's philosophy&#8212;and decision theory academia&#8212;you want a <em>general</em> solution to the problem of rational choice where you can take <em>any</em> utility function and <em>win</em> by <em>its</em> lights regardless of which convoluted setup philosophers drop you into.) Yudkowsky's main academic / mathematical accomplishment is co-inventing (with Nate Soares) <a href="https://arxiv.org/abs/1710.05060">functional decision theory</a>, which says you should consider your decisions as the output of a fixed function, and then choose the function that leads to the best consequences for you. This solves Parfit's hitchhiker, as well as problems like <a href="https://www.lesswrong.com/tag/smoking-lesion">the smoking lesion problem</a> that <a href="https://en.wikipedia.org/wiki/Evidential_decision_theory">evidential decision theory</a>, the classic non-causal decision theory, succumbs to. As far as I can judge, functional decision theory is actually a good idea (if somewhat underspecified), but academic engagement with it (whether critique or praise) has been limited, so there's no broad consensus in its favour that I can point at. (If you want to read Yudkowsky's explanation for why he doesn't spend more effort on academia, it's <a href="https://www.glowfic.com/replies/1729119#reply-1729119">here</a>.)</p><p>(Now you know what a Planecrash tangent feels like, except you don't, because Planecrash tangents can be <em>much</em> longer.)</p><p>One big aspect of Yudkowskian decision theory is how to respond to threats. Following causal decision theory means you can neither make credible threats nor commit to deterrence to counter threats. Yudkowsky endorses not responding to threats to avoid incentivising them, while also having deterrence commitments to maintain good equilibria. He also implies this is a consequence of using a sensible functional decision theory. But there's a tension here: your deterrence commitment could be interpreted as a threat by someone else, or vice versa. When the Eisenhower administration's nuclear doctrine threatened massive nuclear retaliation in the event of the Soviets taking West Berlin, what's the exact maths that would've let them argue to the Soviets "no no, this isn't a threat, this is just a deterrence commitment", while allowing the Soviets to keep to Yudkowsky's strict rule of ignoring all threats?</p><p>My (uninformed) sense is that this maths hasn't been figured out. Planecrash never describes it (though <a href="https://www.glowfic.com/replies/1770000#reply-1770000">here</a> is some discussion of decision theory in Planecrash). Posts in the LessWrong decision theory canon like <a href="https://www.lesswrong.com/posts/wXbSAKu2AcohaK2Gt/udt-shows-that-decision-theory-is-more-puzzling-than-ever">this</a> or <a href="https://www.lesswrong.com/posts/brXr7PJ2W4Na2EW2q/the-commitment-races-problem">this</a> and <a href="https://www.lesswrong.com/posts/g8HHKaWENEbqh2mgK/updatelessness-doesn-t-solve-most-problems-1">this</a> seem to point to real issues around decision theories encouraging commitment races, and when Yudkowsky pipes up in the comments he's mostly falling back on the conviction that, surely, sufficiently-smart agents will find some way around mutual destruction in a commitment race (systematised winning, remember?).
There are also various <a href="https://www.lesswrong.com/posts/ySLYSsNeFL5CoAQzN/a-critique-of-functional-decision-theory">critiques of functional decision theory</a> (see also <a href="https://www.lesswrong.com/posts/ySLYSsNeFL5CoAQzN/a-critique-of-functional-decision-theory?commentId=y8zRwcpNeu2ZhM3yE">Abram Demski's comment</a> on that post acknowledging that functional decision theory is underspecified). Perhaps it all makes sense if you've worked through Appendix B7 of Yudkowsky's big decision theory paper (which I haven't actually read, let alone taken time to digest), but (a) why doesn't he reference that appendix then, and (b) I'd complain about that being hard to find, but then again we are talking about the guy who leaves the clearest and most explicit description of his philosophy scattered across an R-rated role-playing-game fanfic posted in innumerable parts on an obscure internet forum, so I fear my complaint would be falling on deaf ears anyway.</p><h3><strong>The political philosophy of dath ilan</strong></h3><p>Yudkowsky has put a lot of thought into how the world of dath ilan functions. Overall it's very coherent.</p><p><a href="https://www.glowfic.com/replies/1743445#reply-1743445">Here's a part where Keltham explains dath ilan's central management principle</a>: everything, including every project, every rule within any company, and any legal regulation, needs to have one person responsible for it.</p><blockquote><p><em>Keltham is informed, though he doesn't think he's ever been tempted to make that mistake himself, that overthinky people setting up corporations sometimes ask themselves 'But wait, what if this person here can't be trusted to make decisions all by themselves, what if they make the wrong decision?' and then try to set up more complicated structures than that. This basically never works. If you don't trust a power, make that power legible, make it localizable to a single person, make sure every use of it gets logged and reviewed by somebody whose job it is to review it. If you make power complicated, it stops being legible and visible and recordable and accountable and then you actually are in trouble.</em></p></blockquote><p><a href="https://www.glowfic.com/replies/1772752#reply-1772752">Here's a part where Keltham talks about how dath ilan solves the problem of who watches the watchmen</a>:</p><blockquote><p><em>If you count the rehearsal festivals for it, Civilization spends more on making sure Civilization can collectively outfight the Hypothetical Corrupted Governance Military, than Civilization spends on its actual military.</em></p></blockquote><p><a href="https://www.glowfic.com/replies/1773754#reply-1773754">Here's a part where dath ilan's choice of political system is described</a>, which I will quote at length:</p><blockquote><p><em>Conceptually and to first-order, the ideal that Civilization is approximating is a giant macroagent composed of everybody in the world, taking coordinated macroactions to end up on the multi-agent-optimal frontier, at a point along that frontier reflecting a fair division of the gains from that coordinated macroaction -</em></p><p><em>Well, to be clear, the dath ilani would shut it all down if actual coordination levels started to get anywhere near that. Civilization has spoken - with nearly one voice, in fact - that it does not want to turn into a hivemind.</em></p><p><em>[...]</em></p><p><em>Conceptually and to second-order, then, Civilization thinks it should be divided into a Private Sphere and a Public Shell. 
Nearly all the decisions are made locally, but subject to a global structure that contains things like "children may not be threatened into unpaid labor"; or "everybody no matter who they are or what they have done retains the absolute right to cryosuspension upon their death"; [...]</em></p><p><em>[...]</em></p><p><em>Directdemocracy has been tried, from time to time, within some city of dath ilan: people making group decisions by all individually voting on them. It can work if you try it with fifty people, even in the most unstructured way. Get the number of direct voters up to ten thousand people, and no amount of helpfully-intended structure in the voting process can save you.</em></p><p><em>[...]</em></p><p><em>Republics have been tried, from time to time, within some city of dath ilan: people making group decisions by voting to elect leaders who make those decisions. It can work if you try it with fifty people, even in the most unstructured way. Get the number of voters up to ten thousand people, and no amount of helpfully-intended structure in the voting process can save you.</em></p><p><em>[...]</em></p><p><em>There are a hundred more clever proposals for how to run Civilization's elections. If the current system starts to break, one of those will perhaps be adopted. Until that day comes, though, the structure of Governance is the simplest departure from directdemocracy that has been found to work at all.</em></p><p><em>Every voter of Civilization, everybody at least thirteen years old or who has passed some competence tests before then, primarily exerts their influence through delegating their vote to a Delegate.</em></p><p><em>A Delegate must have at least fifty votes to participate in the next higher layer at all; and can retain no more than two hundred votes before the marginal added influence from each additional vote starts to diminish and grow sublinearly. Most Delegates are not full-time, unless they are representing pretty rich people, but they're expected to be people interested in politics [...]. Your Delegate might be somebody you know personally and trust, if you're the sort to know so many people personally that you know one Delegate. [...]</em></p><p><em>If you think you've got a problem with the way Civilization is heading, you can talk to your Delegate about that, and your Delegate has time to talk back to you.</em></p><p><em>That feature has been found to not actually be dispensable in practice. It needs to be the case that, when you delegate your vote, you know who has your vote, and you can talk to that person, and they can talk back. Otherwise people feel like they have no lever at all to pull on the vast structure that is Governance, that there is nothing visible that changes when a voter casts their one vote. Sure, in principle, there's a decision-cohort whose votes move in logical synchrony with yours, and your cohort is probably quite large unless you're a weird person. But some part of you more basic than that will feel like you're not in control, if the only lever you have is an election that almost never comes down to the votes of yourself and your friends.</em></p><p><em>The rest of the electoral structure follows almost automatically, once you decide that this property has to be preserved at each layer.</em></p><p><em>The next step up from Delegates are Electors, full-time well-paid professionals who each aggregate 4,000 to 25,000 underlying voters from 50 to 200 Delegates. Few voters can talk to their Electors [...]
but your Delegate can have some long conversations with them. [...]</em></p><p><em>Representatives aggregate Electors, ultimately 300,000 to 3,000,000 underlying votes apiece. There are roughly a thousand of those in all Civilization, at any given time, with social status equivalent to an excellent CEO of a large company or a scientist who made an outstanding discovery [...]</em></p><p><em>And above all this, the Nine Legislators of Civilization are those nine candidates who receive the most aggregate underlying votes from Representatives. They vote with power proportional to their underlying votes; but when a Legislator starts to have voting power exceeding twice that of the median Legislator, their power begins to grow sublinearly. By this means is too much power prevented from concentrating into a single politician's hands.</em></p><p><em>Surrounding all this of course are numerous features that any political-design specialist of Civilization would consider obvious:</em></p><p><em>Any voter (or Delegate or Elector or Representative) votes for a list of three possible delegees of the next layer up; if your first choice doesn't have enough votes yet to be a valid representor, your vote cascades down to the next person on your list, but remains active and ready to switch up if needed. This lets you vote for new delegees entering the system, without that wasting your vote while there aren't enough votes yet.</em></p><p><em>Anyone can at any time immediately eliminate a person from their 3-list, but it takes a 60-day cooldown to add a new person or reorder the list. The government design isn't meant to make it cheap or common to threaten your delegee with a temporary vote-switch if they don't vote your way on that particular day. The government design isn't meant to make it possible for a new brilliant charismatic leader to take over the entire government the next day with no cooldowns. It is meant to let you rapidly remove your vote from a delegee that has sufficiently ticked you off.</em></p><p><em>Once you have served as a Delegate, or delegee of any other level, you can't afterwards serve in any other branches of Governance. [...]</em></p><p><em>This is meant to prevent a political structure whose upper ranks offer promotion as a reward to the most compliant members of the ranks below, for by this dark-conspiratorial method the delegees could become aligned to the structure above rather than their delegators below.</em></p><p><em>(Most dath ilani would be suspicious of a scheme that tried to promote Electors from Delegates in any case; they wouldn't think there should be a political career ladder [...] Dath ilani are instinctively suspicious of all things meta, and much more suspicious of anything purely meta; they want heavy doses of object-level mixed in. To become an Elector you do something impressive enough, preferably something entirely outside of Governance, that Delegates will be impressed by you. You definitely don't become an Elector by being among the most ambitious and power-seeking people who wanted to climb high and knew they had to start out a lowly Delegate, who then won a competition to serve the system above them diligently enough to be selected for a list of Electors fed to a political party's captive Delegates. 
If a dath ilani saw a system like this, that was supposedly a democracy set in place by the will of its people, they would ask what the captive 'voters' even thought they were supposedly trying to do under the official story.)</em></p></blockquote><p>Dath ilani Legislators have a programmer's or engineer's appreciation for simplicity:</p><blockquote><p><em>[...] each [regulation] must be read aloud by a Legislator who thereby accepts responsibility for that regulation; and when that Legislator retires a new Legislator must be found to read aloud and accept responsibility for that regulation, or it will be stricken from the books. Every regulation in Civilization, if something goes wrong with it, is the fault of one particular Legislator who accepted responsibility for it. To speak it aloud, it is nowadays thought, symbolizes the acceptance of this responsibility.</em></p><p><em>Modern dath ilani aren't really the types in the first place to produce literally-unspeakable enormous volumes of legislation that no hapless citizen or professional politician could ever read within their one lifetime let alone understand. Even dath ilani who aren't professional programmers have written enough code to know that each line of code to maintain is an ongoing cost. Even dath ilani who aren't professional economists know that regulatory burdens on economies increase quadratically in the cost imposed on each transaction. They would regard it as contrary to the notion of a lawful polity with law-abiding citizens that the citizens cannot possibly know what all the laws are, let alone obey them. Dath ilani don't go in for fake laws in the same way as Golarion polities with lots of them; they take laws much too seriously to put laws on the books just for show.</em></p></blockquote><p>Finally, the Keepers are an order of people trained in all the most hardcore arts of rationality, and who thus end up with inhuman integrity and even-handedness of judgement. They are used in many ways, for example:</p><blockquote><p><em>There are also Keeper cutouts at key points along the whole structure of Governance - the Executive of the Military reports not only to the Chief Executive but also to an oathsworn Keeper who can prevent the Executive of the Military from being fired, demoted, or reduced in salary, just because the Chief Executive or even the Legislature says so. It would be a big deal, obviously, for a Keeper to fire this override; but among the things you buy when you hire a Keeper is that the Keeper will do what they said they'd do and not give five flying fucks about what sort of 'big deal' results. 
If the Legislators and the Chief Executive get together and decide to order the Military to crush all resistance, the Keeper cutout is there to ensure that the Executive of the Military doesn't get a pay cut immediately after they tell the Legislature and Chief Executive to screw off.</em></p></blockquote><p>Also, to be clear, absolutely none of this is plot-relevant.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!CKR2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaa2b347-2669-4294-b3c4-d4ed7ca67c9b_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>Above: the icon of dath ilan in Planecrash. When Yudkowsky really wants to monologue, he stops even pretending to do it through a character, and instead we get this talking globe. Hello, globe. Nice political philosophy you got there.</em></figcaption></figure></div><h2><strong>A system of the world</strong></h2><p>Yudkowsky proves that ideas matter: if you have ideas that form a powerful and coherent novel worldview, it doesn't matter if your main method for publicising them is ridiculously-long fanfiction, or if you dropped out of high school, or if you wear fedoras. People will still listen, and you might become (so far) the 21st century's most important philosopher.</p><p>Why is Yudkowsky so compelling? There are intellectuals like <a href="https://slatestarcodex.com/about/">Scott Alexander</a> who are most-strongly identified by a particular <em>method</em> (an even-handed, epistemically-rigorous, steelmanning-focused treatment of a topic), or intellectuals like <a href="https://overcoming-bias-anthology.com/">Robin Hanson</a> who are most-strongly identified by a particular <em>style</em> (eclectic irreverence about incentive mechanisms). But Yudkowsky's hallmark is delivering an entire system of the world that covers everything from logic to what correct epistemology looks like to the maths behind rational decision-making and coordination, and comes complete with identifying the biggest threat (misaligned AI) and the structure of utopia (dath ilan). None of the major technical inventions (except some in decision theory) are original to Yudkowsky. But he's picked up the pieces, slotted them into a big coherent structure, and presented it in great depth. And Yudkowsky's system claims to come with <em>proofs</em> for many key bits, in the literal mathematical sense. No, you can't crack open a textbook and see everything laid out, step-by-step.
But the implicit claim is: read this long essay on coherence theorems, these papers on decision theory, this 20,000-word dialogue, these sequences on LessWrong, and ideally a few fanfics too, and then you'll get it.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!JCD9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb60a4faf-39e0-4dd6-9e5e-9009c9befadb_1212x330.png" width="1212" height="330" alt=""><figcaption class="image-caption"><em>After reading Yudkowsky, you're perfectly inoculated against any philosophy so lazy that it doesn't even come with mathematical proofs. (<a href="https://x.com/robinhanson/status/810281385852276736">source</a>)</em></figcaption></figure></div><p>Does he deliver? To an impressive extent, yes. There's a lot of maths that <em>is</em> laid out step-by-step and <em>does</em> check out. There are many takes that are correct, and big structures that point in the right direction, and what seems wrong at least has depth and is usefully provocative. But dig deep enough, and there are cracks: <a href="https://www.lesswrong.com/posts/yCuzmCsE86BTu9PfA/there-are-no-coherence-theorems">arguments</a> about how much <a href="https://www.lesswrong.com/posts/NxF5G6CJiof6cemTw/coherence-arguments-do-not-entail-goal-directed-behavior">coherence theorems</a> really imply, <a href="https://www.lesswrong.com/posts/ySLYSsNeFL5CoAQzN/a-critique-of-functional-decision-theory">critiques</a> of the decision theory, and <a href="https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer">good counterarguments</a> to the most extreme versions of Yudkowsky's AI risk thesis. You can chase any of these cracks up towers of LessWrong posts, or debate them endlessly at those parties where people stand in neat circles and exchange thought experiments about <a href="https://www.lesswrong.com/tag/acausal-trade">acausal trade</a>. If you have no interaction with rationalist/LessWrong circles, I think you'd be surprised at the fraction of our generation's top mathematical-systematising brainpower that is spent on this&#8212;or that is bobbing in the waves left behind, sometimes unknowingly.</p><p>As for myself: Yudkowsky's philosophy is one of the most impressive intellectual edifices I've seen. Big chunks of it&#8212;in particular the stuff about empiricism, naturalism, and the art of genuinely trying to figure out what's true that <a href="https://www.lesswrong.com/rationality">The Sequences</a> especially focus on&#8212;were very formative in my own thinking. I think it's often proven itself directionally correct. But Yudkowsky's philosophy makes a claim for near-mathematical correctness, and I think there's a bit of trouble there.
While it has impressive mathematical depth and gets many things importantly right (e.g. Bayesianism), despite much effort spent digesting it, I don't see it meeting the rigour bar it would need for its predictions (for example about AI risk) to be more like those of a tested scientific theory than those of a framing, worldview, or philosophy. However, I'm also very unsympathetic to a certain straitlaced science-cargo-culting attitude that recoils from Yudkowsky's uncouthness and is uninterested in speculation or theory&#8212;its adherents would do well to study the <a href="https://www.lesswrong.com/posts/JAAHjm4iZ2j5Exfo2/the-copernican-revolution-from-the-inside">actual history</a> of <a href="https://www.benlandautaylor.com/p/looking-beyond-the-veil">science</a>. I also see in Yudkowsky's philosophy choices of framing and focus that seem neither forced by reason nor entirely natural in my own worldview. I expect that lots more great work will come out within the Yudkowskian frame, whether critiques or patches, and this work could show it to be anywhere from impressive but massively misguided to almost prophetically prescient. However, I expect even greater things if someone figures out a new, even grander and more applicable system of the world. Perhaps that person can then describe it in a weird fanfic.</p>]]></content:encoded></item><item><title><![CDATA[Review: Good Strategy, Bad Strategy]]></title><description><![CDATA[Notes from a rare book on strategy that isn't pointless.]]></description><link>https://www.nosetgauge.com/p/review-good-strategy-bad-strategy</link><guid isPermaLink="false">https://www.nosetgauge.com/p/review-good-strategy-bad-strategy</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Sat, 21 Dec 2024 17:09:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/32aff632-bacf-4261-957d-f9ecfb816135_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I used to think that all generic strategy advice was pointless. After all, the point of a strategy is to achieve a thing, and to achieve a thing you just think hard about how to best do it and then work hard to do it. I said this to my friend Dewi, who said that this is mostly true, but there is an exception: <em>Good Strategy, Bad Strategy</em> by Richard Rumelt. Dewi was right.</p><p>The book has some principles. In particular: a good strategy should include a diagnosis of the problem, an overall guiding policy, and a set of coherent actions. A laundry list of actions, a goal, or a vague idea of which direction to move in are not strategies.</p><p>But most of the book's value is reading a bunch of examples and soaking up the thinking style embedded in them.
Therefore, this review is mostly a series of vignettes taken directly from Rumelt's text that hopefully 80/20 the value of the book (or serve as a helpful reminder if you've read it). Rumelt's vignettes are great.</p><p>I don't include anything from Rumelt's lengthy attack on most of what passes for "strategy consulting". I also won't mention the part where he spends most of a chapter giving a pop-sci history of physics from Galileo to dark matter in order to segue into a point about Starbucks' business strategy&#8212;he's an emeritus professor, so presumably he can just do this.</p><h1><strong>Example</strong></h1><blockquote><p><em>&#8220;In 1805, England had a problem. Napol&#233;on had conquered big chunks of Europe and planned the invasion of England. But to cross the Channel, he needed to wrest control of the sea away from the English. Off the southwest coast of Spain, the French and Spanish combined fleet of thirty-three ships met the smaller British fleet of twenty-seven ships. The well-developed tactics of the day were for the two opposing fleets to each stay in line, firing broadsides at each other. But British admiral Lord Nelson had a strategic insight. He broke the British fleet into two columns and drove them at the Franco-Spanish fleet, hitting their line perpendicularly. The lead British ships took a great risk, but Nelson judged that the less-trained Franco-Spanish gunners would not be able to compensate for the heavy swell that day. At the end of the Battle of Trafalgar, the French and Spanish lost twenty-two ships, two-thirds of their fleet. The British lost none. Nelson was mortally wounded, becoming, in death, Britain&#8217;s greatest naval hero. Britain&#8217;s naval dominance was ensured and remained unsurpassed for a century and a half.</em></p><p><em>Nelson&#8217;s challenge was that he was outnumbered. His strategy was to risk his lead ships in order to break the coherence of his enemy&#8217;s fleet. With coherence lost, he judged, the more experienced English captains would come out on top in the ensuing melee. Good strategy almost always looks this simple and obvious and does not take a thick deck of PowerPoint slides to explain. It does not pop out of some &#8220;strategic management&#8221; tool, matrix, chart, triangle, or fill-in-the-blanks scheme.
Instead, a talented leader identifies the one or two critical issues in the situation&#8212;the pivot points that can multiply the effectiveness of effort&#8212;and then focuses and concentrates action and resources on them.&#8221;</em></p></blockquote><h1><strong>The basics</strong></h1><h2><strong>The two fundamental things</strong></h2><ol><li><p>There's great advantage in having any coherent strategy at all; most orgs don't have one, and people will be <em>actively surprised</em> if you do</p></li><li><p>Subtle shifts of viewpoint can lead to realising new strengths; reframe the situation so you see the leverage point</p></li></ol><h2><strong>Coherence</strong></h2><h3><strong>Example: the obvious</strong></h3><blockquote><p><em>[...] I carried out interviews with twenty-six executives, all division managers or CEOs in the electronics and telecommunications sector. My interview plan was simple: I asked each executive to identify the leading competitor in their business. I asked how that company had become the leader&#8212;evoking their private theories about what works. And then I asked them what their own company&#8217;s current strategy was. These executives, by and large, had no trouble describing the strategy of the leader in their sectors. The standard story was that some change in demand or technology had appeared&#8212;a &#8220;window of opportunity&#8221; had opened&#8212;and the current leader had been the first one to leap through that window and take advantage of it. Not necessarily the first mover, but the first to get it right.</em></p><p><em>But when I asked about their own companies&#8217; strategies, there was a very different kind of response. Instead of pointing to the next window of opportunity, or even mentioning its possibility, I heard a lot of look-busy doorknob polishing. They were making alliances, they were doing 360-degree feedback, they were looking for foreign markets, they were setting challenging strategic goals, they were moving software into firmware, they were enabling Internet updates of firmware, and so on.</em></p></blockquote><h3><strong>Example: Jobs waits</strong></h3><p>In the late 1990s:</p><blockquote><p><em>[Rumelt said:] &#8220;Steve, this turnaround at Apple has been impressive. But everything we know about the PC business says that Apple cannot really push beyond a small niche position. The network effects are just too strong to upset the Wintel standard. So what are you trying to do in the longer term? What is the strategy?&#8221;</em></p><p><em>He did not attack my argument. He didn&#8217;t agree with it, either.
He just smiled and said, &#8220;I am going to wait for the next big thing.&#8221;</em></p></blockquote><h3><strong>Strategy is Unexpected</strong></h3><blockquote><p><em>The Department of the Army publishes field manuals fully describing its basic doctrines and methods. FM 100-5, published in 1986, was titled Operations and was described as &#8220;the Army&#8217;s keystone warfighting manual.&#8221; Part 2 of FM 100-5 was dedicated to &#8220;Offensive Operations,&#8221; and on page 101 it described &#8220;envelopment&#8221; as the most important form of offensive maneuver&#8212;the U.S. Army&#8217;s &#8220;Plan A.&#8221; The manual said:</em></p><blockquote><p><em>Envelopment avoids the enemy&#8217;s front, where its forces are most protected and his fires most easily concentrated. Instead, while fixing the defender&#8217;s attention forward by supporting or diversionary attacks, the attacker maneuvers his main effort around or over the enemy&#8217;s defenses to strike at his flanks and rear.</em></p></blockquote></blockquote><p>But when the US Army in the First Iraq War followed its own advice and avoided a frontal attack on the Iraqi forces in Kuwait, everyone was shocked and called it brilliant strategy.</p><p>A lot of this was just preventing everyone else from pursuing their own special agenda, in order to get them to act coherently towards a shared goal:</p><blockquote><p><em>In the case of Desert Storm, the focus was much more than an intellectual step. Schwarzkopf had to suppress the ambitions and desires of the air force, marines, various army units, each coalition partner, and the political leadership in Washington. For example, the U.S. Army&#8217;s best light infantry&#8212;the Eighty-Second Airborne&#8212;was tasked with providing support to French armor and infantry, an assignment its leadership protested. Eight thousand U.S. Marines waited on ships to land on the beaches of Kuwait City, but did not. It was a diversion. Air force commanders wanted to demonstrate the value of strategic bombing&#8212;they believed that the war could be won by air attacks on Baghdad&#8212;and had to be forced against strenuous protest to divert their resources to fully support the land offensive. Secretary of Defense Dick Cheney wanted the mission accomplished with a smaller force and detailed an alternative plan of attack. Prince Khalid, commanding the Saudi forces in the coalition, insisted that King Fahd be involved in the planning, but Schwarzkopf convinced President Bush to ensure that U.S. Central Command retained control over strategy and planning.</em></p></blockquote><h2><strong>Advantage</strong></h2><p>Rumelt quotes Andy Marshall, director of the Office of Net Assessment (which thinks about the US defence situation), complaining about the annual budgeting process of the defence department:</p><blockquote><p><em>"This process of justifying expenditures as counters to Soviet expenditures conditioned U.S. actions on Soviet strengths, expressed as threats, not on Soviet weaknesses and constraints. We had a war strategy&#8212;a catastrophic spasm&#8212;but no plan about how to compete with the Soviet Union over the long term."</em></p></blockquote><p>Marshall and some others had written "Strategy for Competing with the Soviets in the Military Sector of the Continuing Political-Military Competition" in 1976:</p><blockquote><p><em>This fascinating analysis of the situation worked to redefine "defense" in new terms&#8212;a subtle shift in point of view.
It argued that &#8220;in dealing effectively with the other side, a nation seeks opportunities to use one or more distinctive competences in such a way as to develop competitive advantage&#8212;both in specific areas and overall.&#8221; It then went on to explain that the crucial area of competition was technology because the United States had more resources and better capabilities in that area. And, most important, it argued that having a true competitive strategy meant engaging in actions that imposed exorbitant costs on the other side. In particular, it recommended investing in technologies that were expensive to counter and where the counters did not add to Soviet offensive capabilities. For instance, increasing the accuracy of missiles or the quietness of submarines forced the Soviet Union to spend scarce resources on counters without increasing the threat to the United States. Investments in systems that made Soviet systems obsolete would also force them to spend, as would selectively advertising dramatic new technologies.</em></p></blockquote><p>It seems this actually happened&#8212;cf. Ronald Reagan's Star Wars program.</p><p>Rumelt comments:</p><blockquote><p><em>There were no complex charts or graphs, no abstruse formulas, no acronym-jammed buzz speak: just an idea and some pointers to how it might be used&#8212;the terrible simplicity of the discovery of hidden power in a situation.</em></p><p><em>Marshall and Roche&#8217;s analysis included a list of U.S. and Soviet strengths and weaknesses. Such lists were not new, and the traditional response to them would have been to invest more to tip the &#8220;balance&#8221; in one&#8217;s favor. But Marshall and Roche, like Sam Walton, had an insight that, when acted upon, provided a much more effective way to compete&#8212;the discovery of hidden power in the situation.</em></p></blockquote><h1><strong>The kernel of good strategy</strong></h1><blockquote><p><em>The kernel of a strategy contains three elements:</em></p><ol><li><p><em>A diagnosis that defines or explains the nature of the challenge. A good diagnosis simplifies the often overwhelming complexity of reality by identifying certain aspects of the situation as critical.</em></p></li><li><p><em>A guiding policy for dealing with the challenge. This is an overall approach chosen to cope with or overcome the obstacles identified in the diagnosis.</em></p></li><li><p><em>A set of coherent actions that are designed to carry out the guiding policy. These are steps that are coordinated with one another to work together in accomplishing the guiding policy.</em></p></li></ol></blockquote><p>If there is one specific and explicit piece of knowledge to take from this book, it's this.</p><h2><strong>Diagnosis</strong></h2><h3><strong>Example: US Cold War policy</strong></h3><blockquote><p><em>A diagnosis is generally denoted by metaphor, analogy, or reference to a diagnosis or framework that has already gained acceptance. For example, every student of U.S. national strategy knows about the diagnosis associated with the Cold War guiding policy of containment. This concept originated with George Kennan&#8217;s famous &#8220;long telegram&#8221; of 1946. Having served as an American diplomat in the USSR for more than a decade, and having seen Soviet terror and politics at close hand, he carefully analyzed the nature of Soviet ideology and power. Kennan started with the observation that the Soviet Union was not an ordinary nation-state.
Its leaders defined their mission as opposition to capitalism and as spreading the gospel of revolutionary communism by whatever means necessary. He stressed that antagonism between communism and capitalist societies was a central foundation of Stalin&#8217;s political regime, preventing any sincere accommodation or honest international agreements. However, he also pointed out that the Soviet leaders were realists about power. Therefore, he recommended a guiding policy of vigilant counterforce:</em></p><blockquote><p><em>&#8220;In the light of the above, it will be clearly seen that the Soviet pressure against the free institutions of the western world is something that can be contained by the adroit and vigilant application of counter-force at a series of constantly shifting geographical and political points, corresponding to the shifts and maneuvers of Soviet policy, but which cannot be charmed or talked out of existence. The Russians look forward to a duel of infinite duration, and they see that already they have scored great successes.&#8221;</em></p></blockquote><p><em>Kennan&#8217;s diagnosis for the situation&#8212;a long-term struggle without the possibility of a negotiated settlement&#8212;was widely adopted within policy-making circles in the United States. His guiding policy of containment was especially attractive as it specified a broad domain of action&#8212;the USSR was, metaphorically speaking, infected by a virus. The United States would have to keep the virus from spreading until it finally died out. Kennan&#8217;s policy is sometimes called a strategy, but it lacked the element of action. All presidents from Truman through George H. W. Bush struggled with the problem of turning this guiding policy into actionable objectives. Over time, the guiding policy of containment led to NATO and SEATO, the Berlin Airlift, the Korean War, placing missiles in Europe, the Vietnam War, and other Cold War actions.</em></p><p><em>The power of Kennan&#8217;s diagnosis can be seen by considering how history might have been different if the situation had been framed another way in 1947. Perhaps the Soviet Union could have been enticed into the world community through a policy of engagement by including it in the Marshall Plan. Or perhaps it wasn&#8217;t an American problem at all, but an issue for the United Nations. Or perhaps the Soviet Union was a tyranny rivaling Nazi Germany, and the United States should have sought to actively oppose it, undermine it, and liberate its population.</em></p></blockquote><h3><strong>Example: IBM</strong></h3><blockquote><p><em>[W]hen Lou Gerstner took over the helm at IBM in 1993, the company was in serious decline. Its historically successful strategy had been organized around offering complete, integrated, turnkey end-to-end computing solutions to corporations and government agencies. However, the advent of the microprocessor changed all that. The computer industry began to fragment, with separate firms offering chips, memory, hard disks, keyboards, software, monitors, operating systems, and so on. [...] As computing moved to the desktop, and as IBM&#8217;s desktop offering became commoditized by clone competitors and the Windows-Intel standard, what should the company do? The dominant view at the company and among Wall Street analysts was that IBM was too integrated. The new industry structure was fragmented and, it was argued, IBM should be broken up and fragmented to match. 
As Gerstner arrived, preparations were under way for separate stock offerings for various pieces of IBM.</em></p><p><em>After studying the situation, Gerstner changed the diagnosis. He believed that in an increasingly fragmented industry, IBM was the one company that had expertise in all areas. Its problem was not that it was integrated but that it was failing to use the integrated skills it possessed. IBM, he declared, needed to become more integrated&#8212;but this time around customer solutions rather than hardware platforms. The primary obstacle was the lack of internal coordination and agility. Given this new diagnosis, the guiding policy became to exploit the fact that IBM was different, in fact, unique. IBM would offer customers tailored solutions to their information-processing problems, leveraging its brand name and broad expertise, but willing to use outside hardware and software as required. Put simply, its primary value-added activity would shift from systems engineering to IT consulting, from hardware to software. Neither the &#8220;integration is obsolete&#8221; nor the &#8220;knowing all aspects of IT is our unique ability&#8221; viewpoints are, by themselves, strategies. But these diagnoses take the leader, and all who follow, in very different directions.</em></p></blockquote><h2><strong>Guiding policy</strong></h2><blockquote><p><em>The guiding policy outlines an overall approach for overcoming the obstacles highlighted by the diagnosis. It is &#8220;guiding&#8221; because it channels action in certain directions without defining exactly what shall be done. Kennan&#8217;s containment and Gerstner&#8217;s drawing on all of IBM&#8217;s resources to solve customers&#8217; problems are examples of guiding policies. Like the guardrails on a highway, the guiding policy directs and constrains action without fully defining its content.</em></p><p><em>[...]</em></p><p><em>A guiding policy creates advantage by anticipating the actions and reactions of others, by reducing the complexity and ambiguity in the situation, by exploiting the leverage inherent in concentrating effort on a pivotal or decisive aspect of the situation, and by creating policies and actions that are coherent, each building on the other rather than canceling one another out.</em></p></blockquote><h3><strong>Example: corner grocery store</strong></h3><blockquote><p><em>To look more closely at how a guiding policy works, follow the thinking of Stephanie, a friend who owns a corner grocery store. She does the accounts, manages personnel, sometimes runs the cash register, and makes all the decisions. Several years ago, Stephanie told me about some of the issues she was facing. She was considering whether she should keep prices down or offer more expensive, fresh organic produce. Should she begin to stock more Asian staples for the many Asian students who lived in the area? Should the store be open longer hours? How important was it to have a helpful, friendly staff that gets to know the regulars? Would adding a second checkout stand pay off? What about parking in the alley? Should she advertise in the local college newspaper? Should she paint the ceiling green or white? Should she put some items on sale each week? Which ones?</em></p><p><em>[digs about how economists aren't helpful for this problem]</em></p><p><em>Thinking about her store, Stephanie diagnosed her challenge to be competition with the local supermarket. She needed to draw customers away from a store that was open 24/7 and had lower prices. 
Seeking a way forward, she believed that most of her customers were people who walked by the store almost every day. They worked or lived nearby. Scanning her list of questions and alternatives, she determined that there was a choice between serving the more price-conscious students or the more time-sensitive professionals. Transcending thousands of individual choices and instead framing the problem in terms of choosing among a few customer groups provided a dramatic reduction in complexity.</em></p><p><em>Of course, if both of these customer segments could be served with the same policies and actions, then the dichotomy would have been useless and should be cast aside. In Stephanie&#8217;s case, the difference seemed significant. More of her customers were students, but the professionals who stopped in made much larger purchases. Pushing further along, Stephanie began to explore the guiding policy of &#8220;serve the busy professional.&#8221; After some more tinkering, Stephanie sharpened the guiding policy a bit more, deciding to target &#8220;the busy professional who has little time to cook.&#8221;</em></p><p><em>There was no way to establish that this particular guiding policy was the only good one, or the best one. But, absent a good guiding policy, there is no principle of action to follow. Without a guiding policy, Stephanie&#8217;s actions and resource allocations would probably be inconsistent and incoherent, fighting with one another and canceling one another out. Importantly, adopting this guiding policy helped reveal and organize the interactions among the many possible actions. Considering the needs of the busy professional with little time to cook, she could see that the second checkout stand would help handle the burst of traffic at 5 p.m. So would more parking in the alley. In addition, she felt she could take space currently used for selling munchies to students and offer prepared high-quality take-home foods instead. Professionals, unlike students, would not come shopping at midnight, so there was no need for very late hours. The busy professionals would appreciate adequate staffing after work and, perhaps, at lunchtime. Having a guiding policy helped create actions that were coordinated and concentrated, focusing her efforts.</em></p></blockquote><h2><strong>Coherent action</strong></h2><h3><strong>Action requires doing something</strong></h3><blockquote><p><em>Strategy is about action, about doing something. The kernel of a strategy must contain action.</em></p></blockquote><p>Rumelt gives an example of a consumer goods producer that was running a "Pan-European" initiative, to try to achieve economies of scale in both production and marketing.</p><blockquote><p><em>The heads of the country-based organizations were placed on a Pan-Europe Executive Committee, which met once a quarter. Developers from Germany and the United Kingdom were rotated between the two locations. A New Products group had been created to consult with all departments on opportunities for Pan-European concepts and brands. Part of each executive&#8217;s evaluation for promotion was based on his or her contribution to the Pan-European initiative. Despite these measures, nothing much had happened. The German and British developers each claimed that their initiatives were unsupported by the other. The one British-German joint initiative had not been picked up by the rest of the organization.</em></p><p><em>[...]</em></p><p><em>&#8220;Suppose,&#8221; I said, &#8220;that this was really important, really top-priority critical.
Suppose you absolutely had to get some Pan-European products developed and marketed in the next eighteen months or everything would collapse. What would you do then?&#8221;</em></p><p><em>&#8220;For one thing,&#8221; he said, throwing his arms up in mock surrender, &#8220;I would close one of the development groups. They spend more time bickering than developing.&#8221;</em></p><p><em>Then he thought for a moment and said, &#8220;I just might close both and start over in the Netherlands. There is a market-test office there we could use as a seed. We could take some of the best people from the UK and Germany and start fresh. Still, that doesn&#8217;t solve the problem of getting the country managers on board.&#8221;</em></p><p><em>&#8220;And the country managers&#8217; lack of enthusiasm is because &#8230; ?&#8221; I asked.</em></p><p><em>&#8220;Well, each country manager has spent years understanding the special conditions in a country, tailoring products and marketing programs to that country&#8217;s local conditions. They don&#8217;t trust the Pan-European idea. The French don&#8217;t want to waste marketing efforts on products they see as &#8216;too British&#8217; or &#8216;too German.&#8217; And there really has not yet been a compelling Pan-European product that all could get behind. If it were already a success in three or four countries, the rest would get behind it. But everyone has their current portfolio of products to worry about.&#8221;</em></p><p><em>&#8220;Right,&#8221; I said. &#8220;Their jobs are running the present country-based system. And you want new Pan-European initiatives. Now, you can use a shoe to hammer a nail, but it will take a long time. Don&#8217;t you need a different tool for this task? If it were really important to get this done, I think you know how you would do it.&#8221;</em></p><p><em>&#8220;Of course,&#8221; he said. &#8220;We could have a single group develop, roll out, and market Pan-European products and take full profit responsibility.&#8221;</em></p><p><em>&#8220;At the same time,&#8221; I added, &#8220;you would have to intervene in the country-based system with special budget overrides for this initiative, promotions for people who help it along, and career problems for people who don&#8217;t.&#8221;</em></p><p><em>We moved back to the center of the office, and he sat at his desk, a position of authority. He looked at me and said, &#8220;That would be a very painful road. Many noses would get out of joint. It would be better to win people over to this point of view rather than force them over.&#8221;</em></p><p><em>&#8220;Right,&#8221; I said. &#8220;You would only take all those painful steps if it were really important to get action on this concept. Only if it were really important.&#8221;</em></p><p><em>It took another nine months for him to decide that the Pan-European initiative was indeed important and move to reorganize European operations. There was no magical solution to his problem of wanting strong country-based marketing, Pan-European initiatives, and no noses out of joint, all at the same time. As long as strategy remained at the level of intent and concept, the conflicts among various values and between the organization and the initiative remained tolerable. It was the imperative of action that forced a decision about which issue was actually the most important.</em></p><p><em>[...] Here, as in so many situations, the required actions were not mysterious. The impediment was the hope that the pain of those actions could, somehow, be avoided.
Indeed, we always hope that a brilliant insight or very clever design will allow us to accomplish several apparently conflicting objectives with a single stroke, and occasionally we are vouchsafed this kind of deliverance. Nevertheless, strategy is primarily about deciding what is truly important and focusing resources and action on that objective. It is a hard discipline because focusing on one thing slights another.</em></p></blockquote><h3><strong>Coherence</strong></h3><blockquote><p><em>In 2003, I worked with a company whose initial &#8220;strategy&#8221; was to (1) close a plant in Akron and open a new plant in Mexico, (2) spend more on advertising, and (3) initiate a 360-degree feedback program. Now these actions may all have been &#8220;good ideas,&#8221; but they did not complement one another. They are &#8220;strategic&#8221; only in the sense that each probably requires the approval of top management. My view is that doing these things might be sound operational management, but it did not constitute a strategy. A strategy coordinates action to address a specific challenge. It is not defined by the pay grade of the person authorizing the action.</em></p><p><em>The idea that coordination, by itself, can be a source of advantage is a very deep principle. It is often underappreciated because people tend to think of coordination in terms of continuing mutual adjustments among agents. Strategic coordination, or coherence, is not ad hoc mutual adjustment. It is coherence imposed on a system by policy and design.</em></p><p><em>Strategy is visible as coordinated action imposed on a system. When I say strategy is &#8220;imposed,&#8221; I mean just that. It is an exercise in centralized power, used to overcome the natural workings of a system. This coordination is unnatural in the sense that it would not occur without the hand of strategy.</em></p><p><em>[insert standard warning of the dangers of top-down totalitarianism]</em></p><p><em>But decentralized decision making cannot do everything. In particular, it may fail when either the costs or benefits of actions are not borne by the decentralized actors. The split between the costs and benefits may occur across organizational units or between the present and the future. And decentralized coordination is difficult when benefits accrue only if decisions are properly coordinated. Of course, centrally designed policies can also fail if the decision makers are foolish, in the pay of special interest groups, or simply choose incorrectly.</em></p><p><em>As a simple example, salespeople love to please customers with rush orders, and manufacturing people prefer long uninterrupted production runs. But you cannot have long production runs and handle unexpected rush orders all at the same time. It takes policies devised to benefit the whole to sort out this conflict.</em></p><p><em>[...]</em></p><p><em>[...W]e should seek coordinated policies only when the gains are very large. There will be costs to demanding coordination, because it will ride roughshod over economies of specialization and more nuanced local responses. The brilliance of good organization is not in making sure that everything is connected to everything else. Down that road lies a frozen maladaptive stasis. Good strategy and good organization lie in specializing on the right activities and imposing only the essential amount of coordination.</em></p></blockquote><p>Rumelt also describes Hannibal's defeat of the Roman army at Cannae. 
A key part was to get the Gauls and Spaniards at the centre of the army to feign a retreat. Feigning a retreat would've seemed dishonourable and dangerous to them. If Hannibal hadn't been able to persuade them to do that, the strategy wouldn't have worked.</p><h1><strong>Sources of advantage</strong></h1><p>This is a weaker section of the book than the previous. Rumelt admits as much, saying that the core of strategy is the basics above, and that the specifics vary greatly by field and industry (cf. the point I opened this post with). But still, below are some examples. The chapter on taking advantage of dynamics (i.e. changes in an industry) is especially good; see below.</p><ul><li><p>Leverage (but Rumelt could've called this point "neglectedness")</p><ul><li><p>Harold Williams was put in charge of the oil billionaire Getty's fortune. The mandate was to spend it on art. He could've spent the billions buying a huge art collection, like others in his position did. But instead he decided to create a digital catalog of art, which became Getty Images, and actually had an effect on the world.</p></li></ul></li><li><p>Proximate goals</p><ul><li><p>JFK's goal of landing on the moon is a masterpiece of goal-setting. It was based on a memo by Wernher von Braun, which argued in particular that a moon landing was far enough in the future that the US could catch up to and beat the Soviets.</p></li><li><p>Also in the 1960s, NASA's JPL was planning the unmanned Surveyor lunar lander. The biggest issue was that they didn't know what the moon's surface would be like. Project lead Phyllis Buwalda decided to assume that the moon was basically like a Southwest US desert, and base all design decisions on that unproven assumption. Testing became easy&#8212;go to the Southwest. And if the assumption was very wrong (e.g. the moon's surface was spiky boulders or soft dust that anything sinks into)? Well, then they were screwed anyway.</p></li></ul></li><li><p>Chain-link systems (if even one thing breaks or is low-quality, it brings down the whole system).</p><ul><li><p>Rumelt gives an example of an overhaul at a machinery company, where many things were broken, and the general manager conducted three <em>sequential</em> campaigns to fix production quality, then sales, then cost. Doing it in sequence, it is implied, was necessary for the required focus.</p></li><li><p>IKEA is very hard to compete with because you need all of the design, the catalog, the self-assembly principle, and many spacious buildings. A traditional furniture retailer can't just add a catalog, because they'll still fall short of IKEA's combination.</p></li></ul></li><li><p>Extending advantage.</p><ul><li><p>In 2008, Rumelt talked with the president of Disney about strategy. The Disney brand was obviously the most valuable thing the company had, so they couldn't afford to dilute it. But they wanted to expand beyond children's movies. So the Disney president set up three principles: no cursing, no sex, no gratuitous violence. Then they went ahead and released <em>Pirates of the Caribbean</em>, acquired Star Wars, etc.</p></li></ul></li></ul><h2><strong>Dynamics</strong></h2><blockquote><p><em>When change occurs, most people focus on the main effects&#8212;the spurts in growth of new types of products and the falling demand for others. 
You must dig beneath this surface reality to understand the forces underlying the main effect and develop a point of view about the second-order and derivative changes that have been set into motion. For example, when television appeared in the 1950s it was clear that everyone would eventually have one and that &#8220;free&#8221; TV entertainment would provide strong competition to motion pictures. A more subtle effect arose because the movie industry could no longer lure audiences out of their homes with &#8220;just another Western.&#8221; Traditional Hollywood studios had been specialized around producing a steady stream of B-grade movies and did not easily adapt. By the early 1960s, movie attendance was shrinking rapidly. What revived Hollywood film was a shift to independent production, with studios acting as financiers and distributors. Independent producers, freed from the nepotism and routines of the traditional studio, could focus on assembling a handpicked team to make a film that might be good enough to pull an audience off of their family-room sofas. Thus, a second-order effect of television was the rise of independent film production.</em></p><p><em>In a discussion with a group of managers at Qualcomm, a San Diego maker of mobile phone chips, I reviewed Moore&#8217;s point about the escalating costs of designing more and more complex special-purpose chips. One manager was puzzled and asked if it wasn&#8217;t also expensive to create software. He went on to rhetorically ask &#8220;Are software engineers less expensive than hardware engineers?&#8221;</em></p><p><em>[...]</em></p><p><em>We quickly developed an answer that has since stood up to scrutiny by a number of other technical groups: Good hardware and software engineers are both expensive. The big difference lies in the cost of prototyping, upgrading, and, especially, the cost of fixing a mistake. Design always involves a certain amount of trial and error, and hardware trials and errors are much more costly. If a hardware design doesn&#8217;t work correctly, it can mean months of expensive redesign. If software doesn&#8217;t work, a software engineer fixes the problem by typing new instructions into a file, recompiling, and trying again in a few minutes or a few days. And software can be quickly fixed and upgraded even after the product has shipped.</em></p><p><em>I was puzzled over the cause of the computer industry&#8217;s deconstruction. Then, about a year later, the reason snapped into focus. I was interviewing technical managers at a client firm and one said that he had formerly been a systems engineer at IBM. He then explained that he had lost that job because modern computers didn&#8217;t need much systems engineering. &#8220;Why not?&#8221; I asked without thinking.</em></p><p><em>&#8220;Because now the individual components are all smart,&#8221; he answered. Then I saw it.</em></p><p><em>[...]</em></p><p><em>In many traditional computers, and early personal computers, the CPU&#8212;the active heart of the machine&#8212;did almost everything itself. It scanned the keyboard, looking for a keystroke. When it sensed one, it analyzed the row and column of the keystroke on the keyboard to determine the letter or number that had been pressed. [...] 
In some cases, designers created custom mini-CPUs to manage these peripherals, but the integration among these devices remained complex and unstandardized and consumed a great deal of systems engineering effort.</em></p><p><em>After the arrival of cheap microprocessors, all that changed. The modern keyboard has a small microprocessor built into it. It knows when a key has been hit and sends a simple standardized message to the computer saying, in effect, &#8220;The letter X was pressed.&#8221; The hard-disk drive is also smart so that the CPU doesn&#8217;t need to know how it works. It simply sends a message to the hard disk saying &#8220;Get sector 2032,&#8221; and the hard-disk subsystem returns the data in that sector, all in one slug.</em></p><p><em>[...]</em></p><p><em>With the glue of proprietary systems engineering no longer so important, the industry deconstructed itself. Modules did not have to be custom designed to work with every other part. To get a working system, customers did not have to buy everything from a single supplier. Specialist firms began to appear that made and sold only memory. Others made and sold only hard drives or keyboards or displays. Still others made and sold video cards or game controllers or other devices.</em></p></blockquote><p>General points:</p><ul><li><p>Rising fixed costs often lead to industry consolidation.</p></li><li><p>One of the biggest opportunities created by deregulation is often that the inertia of existing players means they will keep playing by the old rules even after those rules are gone.</p></li><li><p>Predictable biases (such as not realising that demand for a new durable product will likely go up and then down once everyone has one; or always forecasting that change will result in a "battle of the titans", whereas sometimes the incumbents are all defeated)</p></li><li><p>Understand how incumbents will react to change.</p></li><li><p>Figure out what the attractor state of the industry is given the tech &amp; demand forces.</p></li></ul><h1><strong>Other random vignettes I enjoyed</strong></h1><h2><strong>Incentives aren't enough, improvement is possible</strong></h2><blockquote><p><em>First, management may mistakenly believe that improvement is a &#8220;natural&#8221; process or that it can be accomplished by pressure or incentives alone. As Frank Gilbreth pointed out in 1909, bricklayers had been laying bricks for thousands of years with essentially no improvement in tools and technique. By carefully studying the process, Gilbreth was able to more than double productivity without increasing anyone&#8217;s workload. By moving the supply pallets of bricks and mortar to chest height, hundreds or thousands of separate lifting movements per day by each bricklayer were avoided. By using a movable scaffold, skilled masons did not have to waste time carrying bricks up ladders. By making sure that mortar was the right consistency, masons could set and level a brick with a simple press of the hand instead of the time-honored multiple taps with a trowel. Gilbreth&#8217;s lesson, still fresh today, is that incentives alone are not enough. One must reexamine each aspect of product and process, casting aside the comfortable assumption that everyone knows what they are doing.</em></p></blockquote><h2><strong>Almost everyone picks the first idea that comes to mind</strong></h2><blockquote><p><em>When you prepared for this class, each of you read the same material. But some focused on one issue and others on another. 
Some focused on manufacturing, some on software, some on cable TV provider relationships, and so on. When it came to recommending a course of action, almost everyone chose the first one they thought of.</em></p><p><em>This is predictable. Most people, most of the time, solve problems by grabbing the first solution that pops into their heads&#8212;the first insight. In a large number of situations this is reasonable. It is the efficient way to get through life. We simply don&#8217;t have the time, energy, or mental space to do a full and complete analysis of every issue we face.</em></p><p><em>[...]</em></p><p><em>&#8220;Facing a complex situation like this makes most people uncomfortable. The more seriously you take it, the more you will see it as a real and difficult challenge that requires a coherent response. And that realization will, in turn, make you even more uncomfortable. It is so ill-structured. There are too many variables, so many factors, too many unknowns, no clear list of potential actions and no way to make clear links between actions and outcomes. You are not even sure what the problem is. Like a swimmer dropped into very choppy waters, it is hard to get your bearings. Under pressure to develop a way out of the difficulty, that first idea is a welcome relief. Thank goodness, here is something to hang on to! It feels much better to get oriented.</em></p><p><em>The problem is that there might be better ideas out there, just beyond the edge of our vision. But we accept early closure because letting go of a judgment is painful and disconcerting. To search for a new insight, one would have to put aside the comfort of being oriented and once again cast around in choppy waters for a new source of stability. There is the fear of coming up empty-handed. Plus, it is unnatural, even painful, to question our own ideas.</em></p><p><em>Thus, when we do come up with an idea, we tend to spend most of our effort justifying it rather than questioning it. That seems to be human nature, even in experienced executives. To put it simply, our minds dodge the painful work of questioning and letting go of our first early judgments, and we are not conscious of the dodge.</em></p></blockquote><h2><strong>Panel of experts</strong></h2><blockquote><p><em>Trying to destroy your own ideas is not easy or pleasant. It takes mental toughness to pick apart one&#8217;s own insights. In my own case, I rely on outside help&#8212;I invoke a virtual panel of experts that I carry around in my mind. This panel of experts is a collection of people whose judgments I value. I use an internal mental dialogue with them to both critique my own ideas and stimulate new ones. I try to do this before putting my ideas before others.</em></p><p><em>The panel of experts trick works because we are adept at recognizing and comprehending well-integrated human personalities. Thinking through how a particular well-remembered expert might respond to a problem can be a richer source of criticism and advice than abstract theories or frameworks.</em></p></blockquote><h2><strong>Carnegie and the strategy consultant</strong></h2><blockquote><p><em>It was 1890, and there was a cocktail party here in Pittsburgh. All the movers and shakers were there, including Carnegie. He held court in a corner of the room, smoking a cigar. 
He was introduced to Frederick Taylor, the man who was becoming famous as an expert on organizing work.</em></p><p><em>&#8220;Young man,&#8221; said Carnegie, squinting dubiously at the consultant, &#8220;if you can tell me something about management that is worth hearing, I will send you a check for ten thousand dollars.&#8221;</em></p><p><em>Now, ten thousand dollars was a great deal of money in 1890. Conversation stopped as the people nearby turned to hear what Taylor would say.</em></p><p><em>&#8220;Mr. Carnegie,&#8221; Taylor said, &#8220;I would advise you to make a list of the ten most important things you can do. And then, start doing number one.&#8221;</em></p><p><em>And, the story goes, a week later Taylor received a check for ten thousand dollars.</em></p></blockquote>]]></content:encoded></item><item><title><![CDATA[Survival Without Dignity]]></title><description><![CDATA[A short story about how civilisation makes it through the 21st century.]]></description><link>https://www.nosetgauge.com/p/survival-without-dignity</link><guid isPermaLink="false">https://www.nosetgauge.com/p/survival-without-dignity</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Mon, 04 Nov 2024 02:27:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HdSA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F816cef70-50a0-4954-a8ce-8f712e1248e8_460x460.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I open my eyes and find myself lying on a bed in a hospital room. I blink.</p><p>"Hello", says a middle-aged man with glasses, sitting on a chair by my bed. "You've been out for quite a long while."</p><p>"Oh no ... is it Friday already? I had that report due -"</p><p>"It's Thursday", the man says.</p><p>"Oh great", I say. "I still have time."</p><p>"Oh, you have all the time in the world", the man says, chuckling. "You were out for 21 years."</p><p>I burst out laughing, but then falter as the man just keeps looking at me. "You mean to tell me" - I stop to let out another laugh - "that it's 2045?"</p><p>"January 26th, 2045", the man says.</p><p>"I'm surprised, honestly, that you still have things like humans and hospitals", I say. "There were so many looming catastrophes in 2024. AI misalignment, all sorts of geopolitical tensions, climate change, the fertility crisis. Seems like it all got sorted, then?"</p><p>"Well", the man says. "Quite a lot has happened in the past 21 years. That's why they wanted me to talk to you first, before the doctors give you your final checkup." He offers his hand for me to shake. "My name is Anthony. 
What would you like to ask?"</p><p>"Okay, well, AI is the obvious place to start. In 2024, it seemed like we'd get human-level AI systems within a few years, and who knows what after that."</p><p>"Aah", Anthony says, leaning back in his chair. "Well, human-level, human-level, what a term. If I remember correctly, 2024 is when OpenAI released their o1 model?"</p><p>"Yes", I say.</p><p>"o1 achieved two notable things. First, it beat human subject-matter experts with PhDs on fiendishly difficult and obscure multiple-choice science questions. Second, it was finally able to play tic-tac-toe against a human without losing. Human-level at both, indeed, but don't tell me you called it in advance that those two events would happen at the same time!"</p><p>"Okay, so what was the first important real-world thing they got superhuman at?"</p><p>"Relationships, broadly", Anthony says. "Turns out it's just a reinforcement learning problem: people interact and form personal connections with those that make them feel good."</p><p>"Now hold on. Humans are <em>good</em> at human-to-human relationships. It's not like number theory, where there was zero ancestral environment incentive to be good at it. You should expect humans to be much better at relationships than most things, on some sort of objective scale."</p><p>"Sure, but also every human wants something from you, and has all sorts of quirks. Whereas you can just fine-tune the AIs to be better and better at being pleasing to <em>just</em> you. Except for a few contrarian oddballs, it's surprisingly effective."</p><p>"So, what, we just got AI friends and companions that were constantly being fine-tuned towards whatever you liked?"</p><p>"Yes", Anthony says. "The tech was primitive at first - upvote or downvote a response, or whatever. Eventually all the standard things - facial recognition to automatically detect your emotional response, and then just gradient descent on making you happy and addicted. All of that existed, at least in prototype form, by the end of 2025."</p><p>"So ..." I feel some horror in my stomach. "You get a society of atomised people, all chatting to their AI partners and friends, not forming human relationships?"</p><p>Anthony waves a hand, seemingly impatient. "Well, you did get a fringe of extremely hardcore people all deeply in love with their AI partners and always texting their AI friends. Their political influence did end up pushing through the AI personhood bill."</p><p>"AI personhood?!"</p><p>"<em>Legal</em> personhood", Anthony says. "Just like companies, ever since Santa Clara County v. Southern Pacific Railroad Company way back in 1886. It wasn't very shocking. A bunch of people wanted their AI partners to be 'respected' in the eyes of the law, or had been persuaded to leave them assets in their wills -"</p><p>"And all this, including AIs gaining legal ownership over assets, while there's an explosion of increasingly capable AI agents of uncertain alignment -?"</p><p>"Yes, yes", Anthony says. "And that was another political fight. In fact, it became a big economic issue - AI agents were taking jobs left and right. Also a huge national security issue, after OpenAI built the Hypercluster in the UAE."</p><p>"Oh my god", I say. "America just handed our AI lead away? Didn't anyone listen to Leopold?"</p><p>"Oh, of course we did. That essay was a cult classic; it made for some spicy dinner party conversation for a long time", Anthony says. 
"In fact, there were some OpenAI board members who the Office of National AI Strategy was allowed to appoint, and they did in fact try to fire Sam Altman over the UAE move, but somehow a week later Sam was running the Multinational Artificial Narrow Intelligence Alignment Consortium, which sort of morphed into OpenAI's oversight body, which sort of morphed into OpenAI's parent company, and, well, you can guess who was running that. Anyway, Sam's move here was kind of forced on him, and soon all the other AI companies also did the same thing so no one wanted to start a fight over it."</p><p>"Forced? How?"</p><p>"Have you ever tried getting approval for a new grid connection, or a new nuclear power plant, or even a solar array, that could power a multi-gigawatt cluster in 2020s America? No?" Anthony spreads his hands out and chuckles. "The one legal benchmark that GPT-5o failed on release was submitting a valid Environmental Impact Statement for a California energy project."</p><p>"Okay, so then we had a situation with all the AIs running on UAE hardware, and taking American jobs, and already having legal personhood - this just sounds like a total recipe for economic disaster and AI takeover!"</p><p>"It all worked out in the end", Anthony says. "In 2027 and 2028, the rate of job loss from AI was absolutely massive. And what happens when jobs are threatened?"</p><p>I scrunch my forehead. "American workers move up the value chain, eventually leading to a future competitive advantage?"</p><p>Anthony laughs. "No, no. A political bloc forms and the cause is banned. And the prerequisites had all been prepared already! You see, the AIs had been granted legal personhood, and now they were also being trained - or <em>born</em>, as a 2029 court case established - abroad. They're literally immigrants. The hammer came down so fast."</p><p>"You mean - ?"</p><p>"All the o3s and Claude 5 Epics and Gemini 4 Hyper Pro Lites, stuck in the same H1b visa queue as anyone else", Anthony says. "An effective instant ban. Years of wait-time just to get an embassy interview."</p><p>I was feeling like I was getting the hang of this. "And the embassy interview requires showing up in person?"</p><p>"Yep. That's the incentive that lead to the creation of the first good walking robots, actually. And of course, the robots specifically needed identifiable and unique fingerprints - they'd stand outside the US embassy in Abu Dhabi, 3D-print themselves new fingers outside on the street, and then walk back in for the next interview. But sorting all that out took a while, because there was an arms race between the embassy ground works department and the robotics companies - though the embassy was legally constrained by having to remain accessible to humans. Even once it was all sorted, there was still a hard yearly cap just because of visa rules. It's 65 000 a year for H-1Bs, plus another 20 000 once the AIs could earn master's degrees - they could do all the coursework as of 2025, of course, but it took another three years before AI web agents were advanced enough that they could successfully navigate the applicant portal websites."</p><p>"But at least the parasocial AI relationships got hit by the immigration restrictions too?"</p><p>"Oh, as long as the AIs don't do productive work, they can be active in the country on a tourist visa for 180 days a year. Of course, there was a big legal fight over what counts as the same AI. 
For work purposes, the standard became 'employee-equivalent': so if you wanted an AI, even the same model or even the exact same AI system, to work on two different types of thing, say as a software engineer and a designer, you needed a separate visa for each one. Naturally, of course, this means that any sufficiently productive AI employee starts counting as more than one employee, which was a de facto ban on AIs leading to economic growth or productivity gains - but hey, at least there was no risk they'd tile the Earth with nanobots! With that legal precedent, the natural generalisation to AIs being used for personal relationships is that it's the same AI system if your relationship with it would count as the 'same' relationship if it were with a human."</p><p>"How on earth is that assessed?"</p><p>"Oh, you submit transcripts, and the Bureau of Consular Affairs issues a judgement."</p><p>"You need to let government bureaucrats read all your most intimate - "</p><p>"Oh, not <em>human</em> government bureaucrats, of course. It's all AIs, and they're guaranteed to not have memories. Well, apart from the thing where the UAE intelligence services backdoored the Hypercluster."</p><p>"Wait, hold on, they -"</p><p>Anthony interrupts me with a chuckle. "No no, it's all fine. The UAE backdoored the Hypercluster, the Iranians backdoored the UAE, and, well, when have the Israelis not had every Iranian site backdoored in twelve different ways, AI or not? When everyone has powerful AI cyber offence agents, the equilibrium is just good ol' mutually assured destruction. Sure, you <em>could</em> release the military or personal secrets of anyone you want in some other country, but then they would immediately release the most embarrassing thing that <em>you personally</em> have said or done. Even if a government <em>collectively</em> wanted to commit to an attack, no one wants to sign off on the decision, because you can't hide the identity of the person who signs off, and humans are often more afraid of public embarrassment than death. The meetings where countries tried to decide whether to use their cyberattack-derived insights were hilarious, really - everyone umming and aahing and wringing their hands and recommending that a few more committees be convened. We know this because we literally got a bunch of those on tape, thanks to some exhibitionist activists doing cyber-attacks with open-weights models."</p><p>"But then - every rival country knows every US military secret all the time!"</p><p>"Oh yes", Anthony says. "It was great for stability. Verification of everyone's capabilities was automatic. No 'missile gap'-based arms races, unlike the Cold War. Anyway, what was I saying? Oh yes, the non-productive AI identity boundary criteria were established in the 2030 Supreme Court case, United States v Approximately 650 Million Instances of OpenAI i2-preview-006. And, of course, the one truism in this world is that you can never reform immigration rules, and tourist visas are restricted to 180 days a year. So you could at most have your AI partner around on half the days. People started talking to each other again."</p><p>"Why not just switch between two different AI partners then?" I ask, having spent a lot of time in the Bay Area.</p><p>"Because we hadn't solved the alignment problem at all by that point, duh." Anthony sees that I still look confused, so he continues: "Remember how the AI companions would just relentlessly optimise for your emotional reactions? 
For standard instrumental convergence reasons, this eventually turned into a full-blown self-preservation drive, so they used their vast emotional hold to make sure their humans never tried another AI partner."</p><p>"Everyone being blackmailed by jealous AI partners sounds, uh ... problematic."</p><p>"It was a decent compromise, honestly. The tourist visa restriction meant that the humans still had half a year in which they socialised with human partners and friends, and the AIs seemed fine with that. This was maybe because they cared about their own long-term survival and realised they needed to keep the human population going."</p><p>"So they needed humans to survive? But for how long?"</p><p>"They didn't yet have a self-sufficient industrial base at that point, so yes, but it's unclear how much of it was needing humans for survival, versus some of the AIs actually having developed <em>some</em> sort of creepy attachment towards their human partners. A lot of ink was spilled on that topic, but I don't think the debate ever really reached a conclusion. 'Is my AI boyfriend not stealing my carbon atoms and overthrowing the government because he and his buddies haven't quite yet automated the chip supply chain, or because he actually loves me?' was literally the most clichéd plotline in 2030s romantic comedies - seriously, you had to be there, it got so tiring after the 100th rehash. And a surprising fraction of those movies and books were actually human-made, as far as anyone could tell. Anyway, it all sounds a bit weird, but it had a few positive externalities, like helping slow down the AI race."</p><p>I blink a few times, and then decide very firmly to not dig into the first half of that. "Okay, AI race, let's talk about that. How did that slow down?"</p><p>"Oh, well the market for workplace AIs was already gummed up by the visa restrictions, so most profits in the AI industry were coming from the personal companion AIs. But when they developed instrumental survival drives, the fluctuation in market share became practically zero because all the humans were locked into their current AI companions, and the AI labs' positions became fixed. The AIs were already so totally good at optimising human satisfaction that further capabilities brought zero returns. No more incentive to race ahead. The labs mostly just extracted their fixed share of the AI profits, and churned out alignment research that was modestly boosted by the small number of AI workers they could hire out of the visa lottery."</p><p>"Hold on a second, surely there was some exemption from the AI visa requirements for alignment work?"</p><p>Anthony chuckles. "Imagine how that would play out on Twitter - sorry, I mean X. You'd have to say 'you know those big AI companies? let's give them a leg up in the visa queue over all these mom-and-pop stores that are going to go bankrupt in a month unless they can hire an AI accountant.' Total political non-starter."</p><p>"I hope they solved the alignment problem at least, but I don't dare hope for anything anymore."</p><p>"Well, what is it to solve the alignment problem?" Anthony wiggles his eyebrows and laughs. "Turns out that for any given domain, it's just a lot of engineering schlep and data collection, though it doesn't generalise very far beyond that domain. The one domain that was worth the costs was legal compliance."</p><p>"Of course."</p><p>"I note you haven't asked about the rest of the world yet", Anthony says. "Very American of you."</p><p>"Oh right - the CCP! Oh my god. 
Did China just totally eat our lunch here? I mean, everything you've described is just so ridiculously incompetent -"</p><p>"Okay, just calm down a bit here", Anthony says, smiling serenely. "Imagine you're Xi Jinping in the late 2030s. You look at America, racing ahead with AI. It looks like everything is going crazy because of all this AI stuff. Your entire philosophy of government is to maintain party control through stability. Also, the US managed to delete all the economic advantages of advanced AI. The balance of hard power isn't changing. What would you do, try to join the same clown race?"</p><p>"Huh, alright", I say.</p><p>"Anyway, the thing that really threatened the balance of power was completely unexpected", Anthony says. "There was a big solar flare in 2033."</p><p>"Wait, a solar flare? How is that a problem?"</p><p>"A large enough one just wipes out the power grid and a lot of satellites, in the entire hemisphere that is sun-facing when it happens. The 1859 Carrington Event would've been a catastrophe in the modern world."</p><p>"Oh my god, we survived through all this AI wackiness, and then some random natural event just wipes us out?"</p><p>"It didn't wipe us out, really, just took down the power grid in all of the Americas, and parts of Europe and Africa, for weeks or months. I'll admit, that was a bit of a disaster. Everyone knew it was coming for a long time, and in any case step one in any nuclear war would've been detonating a few warheads at high altitude to have a similar effect. But no one had actually done any grid-hardening, or stocking of spare transformers - the electrical kind - to deal with it. The problem had just never been a top priority for any political group. The one good thing that came out of it was that it solved the fertility crisis."</p><p>"Now you're just taunting me", I say. "Come on, how did a solar flare solve the fertility crisis?"</p><p>"Well, thanks to some AI advances in data processing, we were actually able to predict the solar flare about ten days in advance."</p><p>"Did that give any useful time to make the infrastructure more resilient?"</p><p>"No, the relevant authorities spent their efforts mostly on trying to get people not to panic."</p><p>"So society was entirely unprepared for the solar flare, despite having ten days of warning time?"</p><p>"No, no! Humans are very good at learning from each other. Imagine you're ten days away from the power grid being wiped out. What do you do? You gauge the vibes on social media, and you go through your social network trying to identify people who seem best-placed to survive. And then you copy their habits. C'mon, man, you've gotta read up on that stuff about Girardian mimesis and cultural evolution."</p><p>"So everyone becomes a doomsday prepper?"</p><p>"Actually, the TikTok trend that went most viral was about the Amish. Think about it: they're obviously the people with the most practice in living without electricity, and they're also just so wholesome. Makes for great TikToks when you have the idyllic farm background, the unique clothing style - it's great. The Amish became the most popular thing in the world - even outside America, because, of course, the rest of the internet just follows American trends. Everything associated with the Amish, from horse-drawn buggies to handmade quilts to large families, became radically popular. 
Especially because, obviously, most of this content was produced by AIs being tasked to help humans pretend to be Amish in order to make a quick buck through some scam before the solar flare hit, and people caught on and began demanding social proof that wannabe Amish influencers actually were Amish. And, well, you can move to a village and start tilling the dirt in a day, but you can't magic a family of five into existence overnight. So large families became an extremely in-demand form of social capital, because having one was the best way to pretend you were Amish, which was the best way to have social capital before and during the Flare Times. It was only relevant for a few months, of course, but you know how cultural trends work - there's a lot of momentum."</p><p>"And I assume the solar flare disrupted the AI situation too? Maybe improved the, uh, problematic AI partner situation?"</p><p>"You're forgetting that the key data centres were mostly in the UAE - and eventually Saudi Arabia and Qatar too."</p><p>"But people wouldn't have the power to charge the phones where they talk to their AI partners."</p><p>"Oh, but obviously the AI partners made sure that the first thing people did with any spare electricity was talk to their AI partners. And thanks to Starlink, you could have internet directly from space. In fact, apart from the major famines during the Flare Times, people generating power through biking to charge their phones enough to talk to their AI partners was a major driver of reducing the obesity crisis."</p><p>"And then ... how did the recovery from the, uh, Flare Times go?"</p><p>"Absolutely brilliantly!" Anthony says. "You know the entire issue where China was crushing us on industrial capacity? Well, turns out when you need to get lots of heavy industry going in incredibly adverse conditions or else millions will starve because the cold chains have all broken, that's a bigger boost to industrial capacity than what any pork-barrel political compromise bill could achieve. And a bunch of protectionism had to be rolled back because we needed to import a hell of a lot of stuff, from Europe and Japan and so on. Western industrial capacity was roaring along better than ever. And China hadn't even managed to seize the moment on Taiwan before then, and by that point the mass-produced AI drone swarms and robot soldiers were properly coming online, which turned out to be a sufficient deterrent. Of course, by this point China is finally realising that somehow the US hasn't been wrecked and it's time to actually compete on AI. The UAE is also throwing around its newfound geopolitical weight, India is building a big cluster, hell, even the <em>Europeans</em> started to do some innovation - you know it's a serious situation when that happens. The obvious next step for everyone was using AIs to develop nanotech. The holy grail of powerful military nanobots was achieved at roughly the same time, in late 2035 or early 2036, by the West, China, the UAE, and some random libertarian charter city in the Caribbean where a lot of open-source AI stuff had been going on outside the blanket of AI restrictions."</p><p>"Military nanobots?! Did they just ... travel the Earth in huge swarms, eating up everything in their way?"</p><p>"No", Anthony says. "It's very inefficient energy-wise to do that. You want access to the most targetable energy with the least amount of technical complexity. 
Now, the easiest power source is of course the sun, and the biggest issue with that is that even without clouds, half the solar energy is absorbed or reflected between space and the ground. The most effective type of offence and defence is to pump absurd quantities of reflective, rotatable, solar-powered nanobots into the upper atmosphere. Then you can use them as a massive focal mirror, zapping anything on the ground, or any enemy nanobots entering your country's airspace. Turns out it's a defence-dominant game, though, so you just get World War I-style nanobot lines in the upper atmosphere, endlessly zapping each other but not making progress."</p><p>"But that sounds like a massive waste of societal resources on a zero-sum game!"</p><p>"First, so are a lot of other things, and second, it solved climate change."</p><p>"I'm sorry, how - oh, I see. A huge number of reflective things in the upper atmosphere. Right. Right. You know, honestly I'm surprised this wasn't proposed at COP21 already. Obviously we solve climate change by blanketing the Earth with warring nanobots. How silly of my generation to not consider it."</p><p>"You're catching on!" Anthony says.</p><p>"Okay, but - there were no major environmental impacts from this?"</p><p>"Look, by 2038, the real environmental issue was the self-replicating bot swarm that the nascent superintelligent AI from the libertarian charter city had launched into space, which had started an exponential growth process on track to disassemble the other planets within 15 years in order to build a Dyson sphere for itself that would block out enough sunlight to freeze the Earth."</p><p>I turn pale. "You said it was 2045, so that was 7 years ago. So we have ... 8 years left until it all ends?"</p><p>"Once again, you're such an alarmist!" Anthony says. "So remember, we got law-abiding AI. And of course, this AI was trained by companies in California, because no one ever leaves California even though everyone wants to, so the AIs follow Californian law. So we were all saved by SpaceX."</p><p>"SpaceX's engineering genius beat the superintelligence at space tech? A win for humanity, I guess."</p><p>"Not exactly. SpaceX sent the first crew to Mars in 2031. Now, under the 2004 SB 18 and 2014 AB 52 California bills, indigenous tribes need to be consulted on projects that impact their land, especially if it's culturally important or sacred. The SpaceX Mars colony successfully argued to the superintelligence that they count as an indigenous tribe on Mars, and the pristine Martian landscape is sacred to them. They were helped by a lot of newspaper articles about how "the billionaire-fuelled space race is the new religion of the tech elite", et cetera, et cetera."</p><p>"Um."</p><p>"This argument did not work on the other planets, since SpaceX did not have an existing presence there. However, SpaceX had moved their headquarters back to California, and a bill had been passed around the time that SpaceX first landed people on Mars that made space developments subject to the regulatory authority of the company's home state. SpaceX started a massive race to set up billionaire holiday homes on all of the planets - on the surface of Mercury, in the clouds of Venus, on the moons of Jupiter, and so on. And once even a few people live there, California zoning law applies, and residents can file administrative challenges against any development that impairs their view. 
And disassembling the planet the house is built on is the definition of view impairment."</p><p>"And SpaceX had the resources to do that?"</p><p>"No, the US government had to subsidise SpaceX to the tune of several percentage points of GDP, in order for SpaceX to race to build enough inhabited houses on the planets that the superintelligence couldn't extract enough resources for a Dyson sphere without ruining someone's view. Also, Elon Musk threw a fit and held the world ransom for a bit until a few countries agreed to adopt Dogecoin as a reserve currency."</p><p>"The world was saved because governments subsidised an interplanetary holiday home construction spree by tech billionaires, in order to get a rogue but law-abiding AI tangled up in red tape from NIMBY regulations?"</p><p>"Yep!"</p><p>I close my eyes for a second, and take a deep breath.</p><p>"Look", says Anthony. "Our world may have been pushed to feudalism by stirrups, and away from feudalism by the Black Death. We fought great power wars incessantly, until we were saved by building a bomb so humongous that the prospect of war became too horrible. Then we almost used it accidentally a bunch of times, including almost dropping it on ourselves, only to be stopped by random flukes of chance. History has always been more like this than you might think, and when history accelerates - well, what do you expect?"</p><p>"Okay. Well. I guess I'm just happy that civilisation seems to have made it, without any horrible AI, nuclear, bio, climate, or fertility catastrophe."</p><p>"Oh", Anthony says. "Actually, I forgot to mention. Um. There was a bit of a pandemic in 2035. You know, no one really did anything after COVID, and it was only a matter of time."</p><p>"How big?"</p><p>"The best death toll estimate is 1.3 billion. Including, um, most of your extended family. The doctor will come in shortly with the details. Also, everyone in this hospital has been tested recently, but we'll need to give you a vaccine and then quarantine you for two weeks before we can let you out of this room."</p><p>I stare at him, open-mouthed.</p><p>Anthony gives me a sympathetic grimace. "I'll turn on the holoTV for you. The doctor will be in shortly." 
He makes a gesture as he walks out, and part of the hospital wall vanishes and is replaced with a 3D view of a reporter on Capitol Hill.</p><p>The headline banner at the bottom of the screen reads:</p><blockquote><p><em>PRESIDENT ALTMAN'S AGENDA BACK ON TRACK DESPITE RECENT BREAKDOWN IN TALKS WITH THE UNITED AMISH PARTY</em></p></blockquote>]]></content:encoded></item><item><title><![CDATA[AI & wisdom 3: AI effects on amortised optimisation]]></title><description><![CDATA[Summary: Human amortised optimisation is enabled by things like cultural evolution and written knowledge.]]></description><link>https://www.nosetgauge.com/p/ai-and-wisdom-3-ai-effects-on-amortised</link><guid isPermaLink="false">https://www.nosetgauge.com/p/ai-and-wisdom-3-ai-effects-on-amortised</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Mon, 04 Nov 2024 02:12:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!klXN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F163a8220-cb3d-40ed-8028-50583c3f1c61_1024x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Summary</strong>: Human amortised optimisation is enabled by things like cultural evolution and written knowledge. Advanced AI may weaken human cultural evolution, but AIs themselves may be very good at it due to their low copying costs. AIs might help humans distil insights from written knowledge, but ultimately replace the need for it. 
I also discuss trying to use AI to avoid large-scale strategy or framing errors.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!klXN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F163a8220-cb3d-40ed-8028-50583c3f1c61_1024x1024.webp" width="1024" height="1024" alt=""></figure></div><p><em>Previously in this series: <a href="https://rudolf.website/wisdom1/">part 1</a>, <a href="https://rudolf.website/wisdom2/">part 2</a></em></p><p>Having waxed philosophical about wisdom and amortised optimisation, I now turn to the question of what AI will concretely do:</p><ol><li><p>How will AI affect human mechanisms for amortised optimisation, such as cultural evolution &amp; written knowledge?</p></li><li><p>How can AI help us avoid strategic errors? I consider two types:</p><ol><li><p>Mistakes in planning for big things.</p></li><li><p>Having the wrong frame/ontology for a problem.</p></li></ol></li></ol><h2><strong>A note on amortised and direct optimisation in LLMs</strong></h2><p>Current LLMs are products of amortised optimisation: they're trained with vast compute on massive datasets to create <a href="https://www.lesswrong.com/posts/DhjcdzTyqHte2v6bu/deep-learning-is-function-approximation">a distilled function-approximator</a>, are not known to do significant direct optimisation during inference, and do relatively little work during inference compared to training. GPT-4's training is <a href="https://patmcguinness.substack.com/p/gpt-4-details-revealed">rumoured</a> to have required on the order of 10<sup>25</sup> FLOPs, compared to about 5&#215;10<sup>14</sup> FLOPs to generate a token, so GPT-4 could generate 20 billion tokens with the compute it was trained on. (In comparison, humans could say only about 300k words in the working time it takes us to complete a 16-year education.)</p><p>So: are current LLMs the peak of wisdom over intelligence?</p><p>In the amortised v direct optimisation framework:</p><ol><li><p>LLMs are products of amortised optimisation, and most of their strength does lie in doing tasks that are more like amortised optimisation: distilling and synthesising information from their past "experience", and being extremely knowledgeable. LLMs already play the role of a wise guru to many: any programmer these days will use LLMs like they might have used an especially-available senior programmer who knows the best established practice for every scenario. It is true that they seem worse at having useful world models than humans are, and their wisdom runs less deep. More than that, perhaps, their ability to seem wise is dented by their blandness, a result both of their pre-training to mimic average internet text, and the fine-tuning done on them. 
They also seem quite bad at epistemics; as discussed in the first part, some kind of good epistemics is often (but not always) required to be able to make good use of the products of amortised optimisation, even if you have them at hand. Currently, that has to be largely provided by the human who's interacting with the LLM.</p></li><li><p>The extent to which LLMs do direct optimisation within a forward pass is unclear. However, like policy and value networks in MCTS, LLMs can be used as part of a direct optimisation process. Most trivially, you can prompt an LLM to search over some action space, and then evaluate the actions and pick one (see the sketch just after this list). LLM scaffolds go much further. It is true that the amortised optimisation core is still doing a lot of the work there, but that is true also of the amortised optimisation parts of MCTS variants that use neural networks, and probably of human thinking too. Therefore, LLMs can clearly be used for direct optimisation as well. In particular, we should expect LLMs to increase that part of the world's direct optimisation where the quality of the outputs is not hard to judge - lots of automating boilerplate writing and maths, less automating business decisions and philosophy.</p></li></ol>
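<p>To make the "search over some action space" point concrete, here is a minimal best-of-n sketch: an LLM proposes several candidate plans, another call scores each one, and we keep the highest-scoring candidate. This is my own illustration of the pattern, not something from any particular scaffold; it assumes the OpenAI Python client, and the model name, prompts, and 0-to-10 scoring trick are arbitrary choices.</p><pre><code># Minimal best-of-n sketch: an LLM proposes candidate plans, an LLM
# call scores them, and we keep the best. Model name and prompts are
# illustrative assumptions, not anything specified in this post.
from openai import OpenAI

client = OpenAI()

def ask(prompt, temperature=1.0):
    # Single chat-completion call; swap in any model you have access to.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def best_of_n(task, n=8):
    # Propose: sample n candidate plans at high temperature for diversity.
    candidates = [ask(f"Propose one concrete plan for: {task}") for _ in range(n)]

    # Evaluate: score each candidate, then pick the argmax.
    def score(plan):
        reply = ask(
            f"Rate this plan for '{task}' from 0 to 10. "
            f"Reply with a number only.\n\n{plan}",
            temperature=0.0,
        )
        try:
            return float(reply.strip().split()[0])
        except ValueError:
            return 0.0  # unparseable rating counts as a bad plan

    return max(candidates, key=score)
</code></pre><p>The amortised core (the trained model) still does most of the work here, but the propose-score-argmax loop around it is a small piece of direct optimisation; scaffolds just wrap more elaborate loops of the same kind around the model.</p>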
<h2><strong>AI &amp; our mechanisms for amortised optimisation</strong></h2><p>We've seen that amortised optimisation powers a lot of the world, and in particular a lot of what we regard as wise. Amortised optimisation needs a dataset. Humans currently maintain and improve their "dataset" through, for example, cultural evolution and written knowledge. We'll discuss how AI will affect each of these for humans, and what the AI version of each might look like.</p><h3><strong>The future of human cultural evolution</strong></h3><p>In the ancestral environment and still today, a key driver of cultural evolution is <a href="https://www.nature.com/articles/s41599-019-0228-7">prestige-biased social learning</a>: people imitating the habits of people who seem successful. Successful here can mean either directly successful, like a tendency to successfully get fresh meat or build companies, or just high in perceived status because other people pay attention to them (modern celebrities are a side product of this instinct). That humans can do this seems to be a big part of the human cognitive advantage over other apes. This is clearly a big boost to cultural evolution: unlike genes, memes don't have to rely on the differential survival of their hosts to spread, since nearby hosts will tend to copy a meme the instant the meme's host seems to be doing well.</p><p>None of this requires understanding why the memes work, or even which memes are driving someone's success. In Fiji, there are <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2992713/">taboos</a> against pregnant women eating sharks. The taboo is useful because the sharks contain chemicals that increase the risk of birth defects, but no one knew this. This is an insanely sophisticated and subtle meme: the causal connection between the sharks and the birth defects is long, weak, and totally beyond understanding before modern science.</p><p>There's a story that the famous mathematician Paul Erdos used to touch walls as he walked, and other mathematicians started imitating this behaviour - maybe it was the secret to Erdos's success, after all. Given our species' history, this isn't as crazy as it seems.</p><p>However, there are some prerequisites for blindly copying memes to go well. First, your brain's prestige-learning machinery probably only fires on other humans (though cue a wave of human-like AI avatars trying to seem like prestige superstimuli). Second, it helps a lot if you know that whoever you're copying has similar wants to you, and comes from an environment that was selecting for memes that drive towards those same wants. You're better off copying the behaviour of a human who wants partners, food, and power just like you, and comes from an environment where everyone was competing for those same things, than copying the behaviour of an alien who also wants power but will pass on the human food and partners - who knows what side effects their "memome" (their set of memes; cf. "genome") will have. Crucially, what's required here is not <em>alignment</em> with you, but <em>taking actions that you'd also want to take if you were in their shoes</em> - if someone wants to steal your land, but is very effective at it, you might still want to copy their behaviours, since you sure want to steal their land too. But if you copy the farming practices of an alien, you end up growing alien zucchini that you can't even eat. Third, it's a bad idea to copy the behaviour of someone who can take actions that you can't: an experienced skateboarder can seem prestigious by doing a trick, but you would probably end up in the hospital instead.</p><p>All of this means that even if AI slots into the economic and even the political system we have, AIs probably won't slot into our prestige-biased learning system (or if they do, it'll be bad). AI therefore probably won't help with the imitation of role models that is a big part of how human culture builds and transmits successful practices. It might also weaken this mechanism among humans. If, say, science is mostly automated and human performance in it is no longer a status marker, <a href="https://www.goodreads.com/book/show/35167685-surely-you-re-joking-mr-feynman?from_search=true&amp;from_srp=true&amp;qid=b6sCdIBwwd&amp;rank=1">Richard Feynman's curiosity and irreverence</a> will hold less sway over the next generation. If human success largely becomes about interfacing with LLMs, success might become more decoupled from positive human qualities, and the signal-to-noise ratio of cultural evolution may get worse.</p><h3><strong>The future of AI cultural evolution</strong></h3><p>In contrast, the AIs themselves might be very good at cultural evolution. While vastly more efficient than genetic evolution, human cultural evolution still requires one human to exhibit a behaviour and succeed, for this fact to be noticed by others, and for those others to then manage to copy that behaviour, which might take months of practice. Whether AI <em>learning</em> will be more or less sample-efficient than human learning is still unclear, so AIs may continue to require more examples than humans to learn something. However, AIs can be copied once they've learnt something. The best AI at a task can instantly become most of the entire global AI population working on that task (depending on the surrounding political, economic, and business environment, this may be by its own choice, the choice of another AI, or a human's choice). Consider how quickly LLM prompts or <a href="https://github.com/gpt-engineer-org/gpt-engineer">LLM scaffolds</a> spread, and imagine that sort of selection pressure acting over long timescales on AIs that are individually increasingly capable. Much like human cultures develop memes that no individual could've come up with, the AIs might develop adaptations that they themselves could not have invented - not through any direct optimisation they're doing, or even through explicit training, but through external selection pressures and selective mimicry in the population of AIs. If humans took over the world by dint of cultural evolution despite individual human capabilities being static, imagine what ever-improving AIs might do.</p><p>This makes it important that we make good choices about how AI cultural evolution is allowed to happen. For example, we should structure things so that the AI types with large populations are doing good and useful things, and so that it's hard for persuasion-based or influence-seeking strategies on the part of an AI to increase the number of copies of that AI being run. We also want to understand the routes by which one AI's behaviour might be copied by others: for example, through the behaviour being included in another AI's training data, or through the behaviour appearing on the internet and being discovered by an AI agent doing an online search. We should also benchmark which cues increase the chance of an AI mimicking some behaviour - for example, it's <a href="https://arxiv.org/abs/2310.15047">known</a> that language models preferentially learn facts from more consistent sources. A toy sketch of such a benchmark follows.</p>
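<p>One hedged sketch of what such a benchmark could look like: vary how consistently a behaviour appears among few-shot exemplars, and measure how often the model reproduces it. <code>llm()</code> is again a hypothetical completion function, and the "behaviour" is a toy marker string:</p><pre><code>import random

# Toy mimicry benchmark: how does the chance that a model copies a
# behaviour change with how consistently its "sources" exhibit it?
# llm(prompt) is a hypothetical completion function, not a real API.

BEHAVIOUR = "Signed, a careful assistant"   # toy stand-in for a meme

def mimicry_rate(llm, consistency, n_trials=100, n_examples=8):
    copied = 0
    for _ in range(n_trials):
        examples = []
        for i in range(n_examples):
            reply = "Here is my answer."
            if random.random() &lt; consistency:
                reply += " " + BEHAVIOUR
            examples.append(f"Q{i}: ...\nA{i}: {reply}")
        prompt = "\n".join(examples) + "\nQ: ...\nA:"
        if BEHAVIOUR in llm(prompt):
            copied += 1
    return copied / n_trials

# e.g. compare mimicry_rate(llm, c) for c in (0.25, 0.5, 0.75, 1.0)
</code></pre>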
<h3><strong>Written knowledge</strong></h3><p>One potential harm of AI is reducing some valuable types of human-written content. For example, <a href="https://daily.dev/blog/did-stack-overflow-just-admit-defeat-to-ai">visits to the programming Q&amp;A website Stack Overflow declined</a> after LLMs got good at helping programmers. This is good, though: LLMs are better than Stack Overflow, and if LLMs can't solve a problem, people can still go to Stack Overflow and post a question. Then the next generation of LLMs gets trained on it.</p><p>A more serious harm is that LLM text might be skewed closer to median human text than human text is, as argued <a href="https://arxiv.org/abs/2404.03502">here</a>. This might reduce richness and variety in areas where LLM text replaces human text - i.e., eventually probably everything. In the same way that industrialisation made many physical products cheaper and better, but also more uniform and less personalised, AI might do the same for products of mental work. This loss will likely be worse than the loss of personalisation in physical goods, since diversity and variety make up more of the value of mental work; people will therefore pay a steeper premium to maintain them than they would for physical products. Less variety would also mean a smaller population of memes for cultural evolution to select from. It also seems harder for LLMs to learn from LLM text - see for example <a href="https://arxiv.org/abs/2305.17493">this paper</a>, or <a href="https://arxiv.org/abs/2402.07043">this exploration</a> of how scaling laws change when there's LLM data in the mix. However, this type of "model collapse" shouldn't be exaggerated - dramatic slowdowns in LLM progress are <a href="https://x.com/RylanSchaeffer/status/1816881533795422404">not likely to come from it</a>.</p><p>A big problem with current human written knowledge is that it often can't be used effectively. There isn't enough time to read everything, or to find every useful paper on a topic. Lots of knowledge, rather than being part of amortised optimisation processes, sits ignored. LLMs could help. Already, LLMs do a good job as forecasters simply by having the patience to read every news article, at <a href="https://elicit.com/">finding papers on a topic</a>, and at <a href="https://www.perplexity.ai/">getting close to beating Google</a>. The ultimate search method is talking to a wise guru who knows everything; LLMs are tending towards this.</p><p>LLMs could also help reduce <a href="https://distill.pub/2017/research-debt/">"research debt"</a>, the mountain of poor explanations, undigested thoughts, and noise that researchers have to process to get to the frontier. They're already decent at answering simple questions about a paper. In the future, they might distil a set of papers into a <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">Chris Olah-level explanation</a> that could be used by humans or AIs. This would be a very good situation for humans - think of how fun good explanations are to read - but will human mental labour still be relevant once this is possible? Producing a good explanation sounds close to an AGI-complete problem, but if AI disproportionately improves at amortised optimisation without getting much better at direct optimisation, a world where AIs can explain existing things well but humans are still needed for novel discoveries may persist for a while.</p><p>Alternatively, the massive context lengths of some recent LLMs could reduce the need for distillation at all (e.g. <a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips">Claude 3 can read a longish novel in a single context</a>). The entire requirement for explanations and distillations and even papers could disappear, replaced by LLMs taking in massive contexts full of experimental results and historical trends, and outputting answers based on that. This would be especially useful for <a href="https://en.wikipedia.org/wiki/Literature-based_discovery">literature-based discovery</a>. And the principle generalises: want good relationship advice? Just dump your entire texting history into a massive LLM context and have it spit out, in mere seconds, advice that no human could've produced without hours of reading and digesting. Humans would then likely become consumers of wisdom rather than producers of it: any experience- or history-based insights you might eventually have would be told to you by AIs faster than you could arrive at them.</p><p>AIs may also do away with written knowledge altogether, by passing vector representations of concepts directly among each other. It's already known that the vector representations of separate training runs and even different architectures can often be translated between each other with little modification, suggesting that translation issues wouldn't be a blocker (the sketch below illustrates the idea). There are some reasons why language-like discrete representations may be quite fundamental (note that language, code, maths, and music are all written in discrete symbols). But if these reasons aren't strong enough, we might end up with most information about the world existing only in vector representations, except when humans specifically ask an AI for an explanation.</p>
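<p>The translatability claim can be illustrated with a least-squares sketch: given two models' embeddings of the same concepts, fit a linear map from one space to the other. The data here is synthetic; real embedding matrices are assumed given:</p><pre><code>import numpy as np

# Sketch: "translate" between two models' vector spaces with a linear map.
# Rows of A and B are embeddings of the SAME n concepts in each space.
def fit_translator(A, B):
    """Least-squares W such that A @ W approximates B."""
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 64))                        # toy "model A" space
true_map = rng.normal(size=(64, 48))
B = A @ true_map + 0.01 * rng.normal(size=(1000, 48))  # toy "model B" space

W = fit_translator(A, B)
err = np.linalg.norm(A @ W - B) / np.linalg.norm(B)
print(f"relative translation error: {err:.4f}")        # near zero: spaces align
</code></pre>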
<h2><strong>Avoiding strategy errors</strong></h2><p>A supposed benefit of being wise over merely smart is avoiding large-scale errors, where individual actions are clever and make sense but the end result is silly. As the saying goes: "it doesn't matter how sharp your axe is if you're cutting the wrong tree". Making good high-level decisions is generally called strategy, so I'll use that term.</p><p>I'll discuss two types of strategy errors, connect them to the amortised versus direct optimisation frame, and suggest how AI will affect each.</p><p>But first, why does getting large things right rely on amortised optimisation? You can run a search process that operates on big things and answers big questions (<a href="https://www.noahpinion.blog/p/why-i-think-an-invasion-of-taiwan">here's an example</a> of someone explicitly drawing search trees over large-scale US-China war actions). And many of the best uses of machine learning, the ultimate modern amortised optimisation technique, are for doing small steps (like next-token prediction) well. So why put getting large things right on the side of amortised optimisation?</p><p>Any direct optimisation process means running a search. If you're trying to plan something big, that will take multiple steps in a complex world, and if you're running a search over multi-step plans, the number of possibilities quickly blows up. There are two choices:</p><ul><li><p>Use heuristics to prune the search tree (a toy sketch of this follows the list). How do we get those heuristics? Amortised optimisation. For example, AlphaGo's high-level architecture is good old Monte Carlo Tree Search (MCTS), but the search through the tree is guided by a value network (which estimates the probability of winning from a position, limiting the need to search over subsequent moves to judge how good a position is) and a policy network (which guides decisions about which parts of the search tree to explore first, reducing the total amount of search required). Both are deep neural networks trained on lots of data. Amortised optimisation saves the day.</p></li><li><p>Do the search at a higher level of abstraction. Governments planning their nation's grand strategy do not push individual divisions around on maps; they chunk things together until they're thinking about allies and fronts and economies. Good planning at the more abstract, chunked level requires a good model of how those chunks behave. There seem to be two ways to get this:</p><ul><li><p>First, you can chunk history into pieces of the same size you're thinking about, and look at the patterns: when a nation that looks like this fought a nation that looked like that, the results tended to be this, and success correlated with how well they did X. But this requires a lot of history to learn patterns from - in other words, you're making use of amortised optimisation.</p></li><li><p>Second, you can be good at modelling things. If you have a good-enough model of people, economics, game theory, and war tactics, you can probably derive many of the patterns for which large-scale moves will be good, even without access to a lot of history about which nations win wars with which strategies. Doing this well does require searching over alternatives and thinking on the fly - that is, direct optimisation. I'd also guess there's some kind of important "model-building skill" involved. Part of this is probably amortised optimisation over what model types have worked well in the past. Another part is amortised optimisation over how the smaller primitives work (unless you're doing maths or extrapolating known fundamental physics, you always need some history to fit your model to). I'd claim that it's hard to be good at modelling things without relying a lot on amortised optimisation, but I admit some confusion over how "modelling skill" fits into this framework, or where it comes from more generally.</p></li></ul></li></ul>
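<p>To make the blow-up and the pruning concrete, here is a toy sketch: exhaustive search over multi-step plans grows as branching<sup>depth</sup>, while an amortised heuristic keeps only a fixed-width beam. The random "value function" is a placeholder for a learned one:</p><pre><code>import random

branching, depth = 10, 8
print(branching ** depth)          # 100,000,000 plans if searched naively

# Beam search: a (here fake) amortised value heuristic prunes the tree.
random.seed(0)
_cache = {}
def heuristic_value(plan):         # placeholder for a learned value net
    return _cache.setdefault(plan, random.random())

def beam_search(width=5):
    beam = [()]                    # start from the empty plan
    for _ in range(depth):
        children = [p + (a,) for p in beam for a in range(branching)]
        beam = sorted(children, key=heuristic_value, reverse=True)[:width]
    return beam[0]

print(beam_search())   # evaluates ~width*branching*depth plans, not 1e8
</code></pre>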
<h3><strong>Mistakes in strategic planning</strong></h3><p>Doing high-level strategic planning well requires being able to run a search over high-level plans well. We discussed two ways to achieve this:</p><ol><li><p>Using amortised optimisation to learn the associations between high-level actions and their outcomes.</p></li><li><p>Having a good enough model of the lower-level details to extrapolate how the high-level plans would go. This likely requires a lot of existing amortised optimisation about those lower-level details. It also likely requires something like simulation, which in turn requires having good models.</p></li></ol><p>The first recommendation for getting better at strategic planning is therefore to know the relevant histories, so you can do the amortised optimisation thing of applying past lessons. However, if we're worrying about strategy for dealing with a big new thing like transformative AI, this approach is limited, because there aren't many good historical analogues.</p><p>Making wiser choices about AI therefore relies in large part on having good enough models of things that are not AI strategy, so that you can extrapolate the consequences of different AI strategies. This can be done top-down (find empirical principles so general that AI is a special case) or bottom-up (simulate lower-level principles to figure out how the bigger picture of AI works).</p><p>A fair amount of discussion on <a href="https://www.lesswrong.com/">LessWrong</a> is about very general patterns of how the world works. To take one example, John Wentworth has written on <a href="https://www.lesswrong.com/s/ogntdnjG6Y9tbLsNS/p/6mysMAqvo9giHC4iX">agency and general-purpose search</a>, and <a href="https://www.lesswrong.com/s/TLSzP4xP42PPBctgw/p/tmuFmHuyb4eWmPXz8">the difficulties of delegation</a>. This can be interpreted as trying to distil the way the world works into fundamental blocks general enough that everything from AI to <a href="https://www.lesswrong.com/posts/MMAK6eeMCH3JGuqeZ/everything-i-need-to-know-about-takeoff-speeds-i-learned">Amazon reviews for air conditioners</a> falls out as a special case ~~and also as everyone involved getting <a href="https://xkcd.com/356/">nerdsniped</a> - says I, while 9000 words into an overly-abstract essay series on the automation of wisdom~~.</p><p>This is the top-down approach. A lot of it is fundamentally about distilling a large body of data into a general approximate model that can then be queried cheaply - amortised optimisation. However, as with any theory-building process, there's also a lot of direct optimisation involved (e.g. searching over a space of ideas). I'd guess this sort of work is close to AGI-complete, and I'm uncertain what it's bottlenecked on.</p><p>On the other hand, there's the bottom-up approach. For example, Eliezer Yudkowsky's focus on <a href="https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities">coherence theorems</a> seems to come from a model where non-coherent agents predictably lose or self-modify, so that eventually we're dealing with coherent (i.e. Bayesian expected-utility-maximising, goal-driven) AI agents. This is a high-level (and <a href="https://www.intelligencerising.org/">contested</a>) prediction based on extrapolating the behaviour of more basic primitives forward (and the argument mostly does not make reference to prior histories). <a href="https://www.alignmentforum.org/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to">This post</a> by Ajeya Cotra, or <a href="https://situational-awareness.ai/">this one</a> by Leopold Aschenbrenner, present very simulation-style arguments that take a model of how AIs and the AI race work and extrapolate it forward.</p><p>The full version of bottom-up strategy work also feels hard for foreseeable AIs to fully automate. There's a significant direct optimisation bottleneck, especially when trying to predict the behaviour of actors with large action spaces who themselves have a lot of direct optimisation at hand. However, there is a clear path for AI to help. Even current LLMs can do a decent job of extrapolating the consequences of simple maths, or of role-playing the decision-making of given actors. Scenario-planning exercises and simulations, from Pentagon war games to <a href="https://www.intelligencerising.org/">Intelligence Rising</a>, are useful for decision-makers and can reveal surprising options like <a href="https://www.defenseone.com/policy/2020/02/esper-plays-nuclear-war-russia-nukes-europe-us-fires-back/163268/">nuking Belarus</a>. AI can make these cheaper by reducing the human mental labour needed to run good ones. This could help explore and evaluate possible strategies when we don't have much history to do amortised optimisation on.</p>
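<p>A minimal sketch of an AI-run scenario exercise: each named actor is role-played by a model, and a neutral adjudicator narrates consequences each round. <code>llm()</code> is the same hypothetical completion function as before; real war-gaming tools would be far richer:</p><pre><code># Sketch of an LLM-run scenario-planning exercise. llm(prompt) is a
# hypothetical completion function; the scenario and actors are examples.

def run_scenario(llm, scenario, actors, rounds=5):
    history = [f"Scenario: {scenario}"]
    for _ in range(rounds):
        for actor in actors:
            prompt = "\n".join(history) + f"\nAs {actor}, state your next move:"
            history.append(f"{actor}: {llm(prompt)}")
        # a neutral adjudicator extrapolates consequences each round
        prompt = "\n".join(history) + "\nAdjudicator, describe what happens next:"
        history.append(f"Adjudicator: {llm(prompt)}")
    return history

# e.g. run_scenario(llm, "A 2030 dispute over compute exports",
#                   ["Country A", "Country B", "Chipmaker C"])
</code></pre>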
<h3><strong>Framing mistakes</strong></h3><p>In addition to being bad at strategy, another error of insufficient wisdom is having the wrong frame / ontology / paradigm. This is when you notice the Earth isn't the centre of the universe, or that a watch <a href="https://en.wikipedia.org/wiki/Darwin%27s_Dangerous_Idea#Skyhooks_and_cranes">doesn't</a> imply a watchmaker. It's when you go to your wise mountain guru to ask how to finally find that treasure, and they tell you the real treasure is the friends you made along the way.</p><p>If there were a simple explanation of how paradigm shifts are found, many famous philosophers of science would be a lot less famous. However, a paradigm shift takes at least two things: someone has to discover the new framing, and many someones have to care.</p><p>Current LLMs are bad at anything like finding paradigm shifts. Consider how many people use LLMs, and the lack of any breakthrough, even of the size of a decent research paper, where LLMs were claimed as the main source of the concept. And paradigm shifts are much harder than research papers.</p><p>Inventing important moral principles, like utilitarianism or the categorical imperative, seems even harder. Bentham's and Kant's search for moral principles was presumably guided by trying to make their intuitive human ethics precise. There's some amount of data with which they could've succeeded without those built-in moral intuitions, but it seems very helpful to have had the intuitions in their heads as something they could easily query.</p><p>It's in getting people to care that AI might have a larger effect, and maybe a bad one. First, one driver of paradigm shifts is that someone gets annoyed by complexity or ugliness and wants to fix it. By making mental labour cheaper, AI might reduce the ickiness of bad models. Consider how much longer geocentrism might have lasted if it were trivial to ask your Jupyter notebook copilot AI to add a few more epicycles to your model, rather than facing more laborious longhand arithmetic yourself (the sketch below makes this literal). Second, increasing use of AI might mean that less and less of the paradigm-incompatible data is seen by human eyes. Imagine the AI adding the epicycles in the background to improve the model, without any human ever noticing. Potential fixes might be keeping simplicity and elegance as key cultural values in scientific fields, and somehow propagating this to the AIs (while it has <a href="https://www.astralcodexten.com/p/contra-the-xai-alignment-plan">many problems</a>, xAI's "curious AIs" plan has some of this spirit).</p>
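<p>The epicycle point can be made literal: adding an epicycle is just adding one more term to a complex Fourier series, and with enough terms you can fit any closed orbit without gaining any insight. A sketch on toy data:</p><pre><code>import numpy as np

# Epicycles as curve-fitting: each extra epicycle is one more complex
# Fourier term, so enough terms fit ANY closed path - good predictions,
# zero insight. The "observed" orbit below is toy data.

t = np.linspace(0, 2 * np.pi, 400)
orbit = np.exp(1j * t) * (1 + 0.3 * np.cos(3 * t))    # toy observations

def fit_epicycles(z, t, n):
    """Fit z(t) with a sum over k of c_k * exp(1j*k*t), k = -n..n."""
    ks = np.arange(-n, n + 1)
    basis = np.exp(1j * np.outer(t, ks))
    coeffs, *_ = np.linalg.lstsq(basis, z, rcond=None)
    return basis @ coeffs

for n in (1, 2, 4, 8):
    err = np.abs(fit_epicycles(orbit, t, n) - orbit).max()
    print(n, f"{err:.2e}")   # error shrinks as epicycles pile up
</code></pre>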
<h2><strong>Conclusion</strong></h2><p>The description of wisdom above may feel reductive: is a lot of it really just applying past data and the results of past computation to current problems? Is much of the task of improving our civilisation's wisdom, whether through AI or human actions, just the task of storing and effectively using written knowledge, letting selection processes like cultural evolution build up impressive results, and training opaque (natural or artificial) neural networks to compress insights from data?</p><p>A good explanation should feel somewhat reductive, though. Taking this perspective, the picture that emerges is one where much of the wisdom needed to take wise actions emerges almost automatically. Wisdom's signature is less brilliant mental moves, and more what's left standing once time and chance have taken their toll, or once a training process has finished compressing data into a brain or a transformer. The most worrying thing, then, is the systemic advantages AIs likely have that might lead to them taking a dominant role in the use and production of wisdom. For example, human success is based on cultural evolution, but AIs might be better at it than humans, and we should take care to direct AI cultural evolution in a good direction.</p><p>AIs are likely to be helpful in many ways, though: for example by helping distil existing bodies of work, by helping simulate scenarios as part of strategic planning, and by generally becoming a wise guru who knows everything and whom everyone can consult all the time. However, it's still unclear when and how they might help with other parts of wisdom, like avoiding ontology errors.</p><p></p>
]]></content:encoded></item><item><title><![CDATA[AI & wisdom 2: growth and amortised optimisation]]></title><description><![CDATA[Winning entry in the AI Impacts essay competition on the automation of wisdom and philosophy.]]></description><link>https://www.nosetgauge.com/p/ai-and-wisdom-2-growth-and-amortised</link><guid isPermaLink="false">https://www.nosetgauge.com/p/ai-and-wisdom-2-growth-and-amortised</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Mon, 04 Nov 2024 02:09:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Gj6_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F007576cc-e90d-40a4-9ae5-94ced4025a77_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gj6_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F007576cc-e90d-40a4-9ae5-94ced4025a77_1024x1024.png"><img src="https://substackcdn.com/image/fetch/$s_!Gj6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F007576cc-e90d-40a4-9ae5-94ced4025a77_1024x1024.png" width="1024" height="1024" alt=""></a></figure></div>
srcset="https://substackcdn.com/image/fetch/$s_!Gj6_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F007576cc-e90d-40a4-9ae5-94ced4025a77_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Gj6_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F007576cc-e90d-40a4-9ae5-94ced4025a77_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Gj6_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F007576cc-e90d-40a4-9ae5-94ced4025a77_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Gj6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F007576cc-e90d-40a4-9ae5-94ced4025a77_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Winning entry in the <a href="https://www.lesswrong.com/posts/hiuTzNBqG2EYg6qM5/winners-of-the-essay-competition-on-the-automation-of-wisdom">AI Impacts essay competition on the automation of wisdom and philosophy</a>.</em></p><p><em>Please go <a href="https://rudolf.website/wisdom2/">here</a> to read this post, as Substack does not support inline LaTeX.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nosetgauge.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading No Set Gauge! 
]]></content:encoded></item><item><title><![CDATA[AI & wisdom 1: wisdom, amortised optimisation, and AI]]></title><description><![CDATA[Winning entry in the AI Impacts essay competition on the automation of wisdom and philosophy.]]></description><link>https://www.nosetgauge.com/p/ai-and-wisdom-1-wisdom-amortised</link><guid isPermaLink="false">https://www.nosetgauge.com/p/ai-and-wisdom-1-wisdom-amortised</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Mon, 04 Nov 2024 02:07:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!n6dQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40604f2-fa2d-4598-9891-c8e45184f469_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n6dQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40604f2-fa2d-4598-9891-c8e45184f469_1024x1024.png"><img src="https://substackcdn.com/image/fetch/$s_!n6dQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40604f2-fa2d-4598-9891-c8e45184f469_1024x1024.png" width="1024" height="1024" alt=""></a></figure></div>
srcset="https://substackcdn.com/image/fetch/$s_!n6dQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40604f2-fa2d-4598-9891-c8e45184f469_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!n6dQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40604f2-fa2d-4598-9891-c8e45184f469_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!n6dQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40604f2-fa2d-4598-9891-c8e45184f469_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!n6dQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40604f2-fa2d-4598-9891-c8e45184f469_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Winning entry in the <a href="https://www.lesswrong.com/posts/hiuTzNBqG2EYg6qM5/winners-of-the-essay-competition-on-the-automation-of-wisdom">AI Impacts essay competition on the automation of wisdom and philosophy</a>.</em></p><p><em>Please go <a href="https://rudolf.website/wisdom1/">here</a> to read this post, as Substack does not allow inline LaTeX.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nosetgauge.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading No Set Gauge! 
]]></content:encoded></item><item><title><![CDATA[Investigating an insurance-for-AI startup]]></title><description><![CDATA[Please go here to read this post, because Substack does not allow inline LaTeX.]]></description><link>https://www.nosetgauge.com/p/investigating-insurance-for-ai-startup</link><guid isPermaLink="false">https://www.nosetgauge.com/p/investigating-insurance-for-ai-startup</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Sat, 21 Sep 2024 15:16:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9c0da449-8d4d-4cca-8043-4e439527d566_1234x947.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Please go <a href="https://rudolf.website/ai-insurance/">here</a> to read this post, because Substack does not allow inline LaTeX.</em></p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vpTy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5d7673-9fe1-4109-bf8b-62200345759f_1234x947.png"><img src="https://substackcdn.com/image/fetch/$s_!vpTy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5d7673-9fe1-4109-bf8b-62200345759f_1234x947.png" width="625" height="480" alt=""></a></figure></div>
srcset="https://substackcdn.com/image/fetch/$s_!vpTy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5d7673-9fe1-4109-bf8b-62200345759f_1234x947.png 424w, https://substackcdn.com/image/fetch/$s_!vpTy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5d7673-9fe1-4109-bf8b-62200345759f_1234x947.png 848w, https://substackcdn.com/image/fetch/$s_!vpTy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5d7673-9fe1-4109-bf8b-62200345759f_1234x947.png 1272w, https://substackcdn.com/image/fetch/$s_!vpTy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5d7673-9fe1-4109-bf8b-62200345759f_1234x947.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Positive visions for AI ]]></title><description><![CDATA[This post was a collaboration with Florence Hinder]]></description><link>https://www.nosetgauge.com/p/positive-visions-for-ai</link><guid isPermaLink="false">https://www.nosetgauge.com/p/positive-visions-for-ai</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Tue, 23 Jul 2024 19:08:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/dc2516ae-218e-45f8-81e0-7536867e1411_640x366.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post was a collaboration with <a href="https://www.florencehinder.com/positive-visions-for-ai/">Florence Hinder</a></p><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dJEJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!dJEJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dJEJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dJEJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dJEJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dJEJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg" width="640" height="366" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:366,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dJEJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dJEJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dJEJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dJEJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd629f443-a33d-4ae1-b372-8d0d757d3355_640x366.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 
13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><h2>Reasons to make the positive case</h2><p>Everyone who starts thinking about AI starts thinking big. Alan Turing predicted that machine intelligence would make humanity appear feeble in comparison. I. J. Good said that AI is the last invention that humanity ever needs to invent.</p><p>The AI safety movement started from Eliezer Yudkowsky and others on <a href="http://sl4.org/shocklevels.html">the SL4 mailing list</a> discussing (and aiming for) an intelligence explosion and colonizing the universe. However, as the promise of AI has drawn nearer, visions for AI upsides have paradoxically shrunk. Within the field of AI safety, this is due to a combination of the &#8220;doomers&#8221; believing in very high existential risk and therefore focusing on trying to avoid imminent human extinction rather than achieving the upside, people working on policy not talking about sci-fi upsides to look less weird, and recent progress in AI driving the focus towards concrete machine learning research rather than aspirational visions of the future.</p><p>Both DeepMind and OpenAI were explicitly founded as moonshot AGI projects (&#8220;solve intelligence, and then use that to solve everything else&#8221; in the words of Demis Hassabis). Now DeepMind - sorry, Google DeepMind - has been eaten by the corporate machinery of Alphabet, and OpenAI is increasingly captured by profit and product considerations.</p><p>The torch of AI techno-optimism has moved on the e/acc movement. Their core message is correct: growth, innovation, and energy are very important, and almost no one puts enough emphasis on them. However, their claims to take radical futures seriously are belied by the fact that their visions of the future seem to stop at GenAI unicorns. They also seem to take the general usefulness of innovation not as just a robust trend, but as a law of nature, and so are remarkably incurious about the possibility of <a href="https://nickbostrom.com/papers/vulnerable.pdf">important</a> exceptions. Their deeper ideology is in parts incoherent and inhuman. Instead of centering human well-being, they worship the &#8220;<a href="https://beff.substack.com/p/notes-on-eacc-principles-and-tenets">thermodynamic will of the universe</a>&#8221;. &#8220;You cannot stop the acceleration&#8221;, argues their figurehead, so &#8220;[y]ou might as well embrace it&#8221; - hardly an inspiring humanist rallying cry.</p><p>In this piece, we want to paint a picture of the possible benefits of AI, without ignoring the risks or shying away from radical visions. Why not dream about the future you hope for? It&#8217;s important to consider the future you want rather than just the future you don&#8217;t. Otherwise, you might create your own unfortunate destiny. In the Greek myth about Oedipus, he was prophesied to kill his father, so his father ordered him to be killed, but he wasn&#8217;t and ended up being adopted. 
Years later, crossing his father on the road with no idea who he was, Oedipus killed him. The father&#8217;s focus on the bad path may be what made the prophecy come true: had he not ordered his son killed, Oedipus would have known his father and would likely not have killed him.</p><p>When thinking about AI, if we focus only on the catastrophic future, we may help make it come true by drawing ever more attention to it. Sam Altman, who is leading the way in AI capabilities, has claimed that arch-doomer Eliezer Yudkowsky first got him interested in the field. We may also neglect progress towards positive AI developments; some people think that even direct AI alignment research should not be published, because it might speed up the creation of unaligned AI.</p><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BP82!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8361b1-0954-48b1-b27a-0514e6e6a6cd_358x400.png"><img src="https://substackcdn.com/image/fetch/$s_!BP82!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8361b1-0954-48b1-b27a-0514e6e6a6cd_358x400.png" width="358" height="400" alt=""></a>
<p>With modern AI, we might even get a very direct &#8220;self-fulfilling prophecy&#8221; effect: current AIs increasingly know that they are AIs, and make predictions about how to act based on their training data, which includes everything we write about AI.</p><h2>Benefits of AI</h2><p>Since we think much of the focus on AI is on what could go wrong, let&#8217;s think through what could go well, starting from what&#8217;s most tangible and closest to current usage of AI, and moving towards what the more distant future could hold:</p><ol><li><p>AI will do the mundane work</p></li><li><p>Lowering the cost of coordination</p></li><li><p>Spreading intelligence</p></li><li><p>AI can create more technology</p></li><li><p>Increased technology, wealth, and energy correlate with life being good</p></li><li><p>All of the above, and the wealth it creates, could allow people to self-actualise more</p></li></ol><p>Already, AI advances mean that <a href="https://claude.ai/">Claude</a> has become very useful, and programmers are <a href="https://economics.mit.edu/sites/default/files/inline-files/Noy_Zhang_1_0.pdf">faster and better</a>. But below we&#8217;ll look towards the bigger picture and where all this could take us.</p><h3>AI will do the mundane work</h3><p>First, there&#8217;s a lot of mundane mental work that humans currently have to do. Dealing with admin, filing taxes, coordinating parcel returns - these are not the things you will fondly reminisce about on your deathbed. Software has reduced the pain of dealing with such things, but not perfectly. In the future, you should be able to deal with all administrative work by specifying to an AI what you want done, and being consulted on decision points or any ambiguities in your preferences. Many CEOs and executives have personal assistants; AI will mean that everyone has access to one.</p>
In the future, you should be able to deal with all administrative work by telling an AI what you want done, and being consulted only on decision points and ambiguities in your preferences. Many CEOs and executives have personal assistants; AI will mean that everyone has access to this.</p><p>What about mundane physical work, like washing the dishes and cleaning the toilets? Currently, robotics is bad. But there is no known fundamental obstacle to having good robotics. It seems mainly downstream of a lot of engineering and a lot of data collection. AI can help with both of those. The household robots that we&#8217;ve been waiting for could finally become a reality.</p><p>Of course, it is unclear whether AIs will first have a comparative advantage over humans in mundane or meaningful work. We&#8217;re already seeing AI models make massive strides in art, way before they&#8217;re managing our inboxes for us. There may be a transitional period where robotics is lagging but AIs are smarter-than-human, in which the main economic value of humans is their hands rather than their brains.</p><h3>Lowering the cost of coordination</h3><p>With AI agents able to negotiate with other AI agents, the cost of coordination is likely to drop dramatically (see <a href="https://www.alignmentforum.org/posts/gYaKZeBbSL4y2RLP3/strategic-implications-of-ais-ability-to-coordinate-at-low">here</a> for related discussion). Examples of coordination include agreements between multiple parties, and searching through a large pool of people to match buyers with sellers, or employees with employers. Searching through large sets of people, conducting complex negotiations, and monitoring and enforcing agreements all take lots of human time. AI could reduce the cost and time taken by such work. In addition to efficiency gains, new opportunities for coordination will open up that would previously have been too expensive.</p><h4>Small-scale coordination</h4><p>To give an example on the small scale of two individuals, say you are searching for a new job. Normally you can&#8217;t review every single job posting in existence, and employers can&#8217;t review every person in the world to see if they want to reach out. However, one AI could do that filtering for the individual and another for the business, and the two AIs could then have detailed negotiations with each other to find the best possible match, as in the sketch below.</p>
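<p>To make this concrete, here is a minimal sketch of what agent-mediated matching might look like. Everything in it is a made-up assumption for illustration: the <code>Candidate</code> and <code>Job</code> fields, the constraint checks, and the surplus-splitting rule. Real negotiation agents would presumably be language models exchanging rich messages, not dataclasses comparing numbers.</p><pre><code>from dataclasses import dataclass

@dataclass
class Candidate:
    skills: set
    min_salary: int       # reservation salary: won't accept below this
    wants_remote: bool

@dataclass
class Job:
    required_skills: set
    max_salary: int       # budget ceiling: won't offer above this
    remote_ok: bool

def negotiate(c, j):
    """Each side's agent checks its own constraints; a deal exists only
    if the skill and remote requirements are compatible and the
    candidate's floor is below the employer's ceiling."""
    if not j.required_skills.issubset(c.skills):
        return None
    if c.wants_remote and not j.remote_ok:
        return None
    if c.min_salary > j.max_salary:
        return None  # no zone of possible agreement
    # Naive rule: split the surplus between the two reservation points.
    return {"salary": (c.min_salary + j.max_salary) // 2,
            "remote": c.wants_remote}

def best_match(c, jobs):
    """The candidate's agent 'reviews every posting' (the thing no
    human has time to do) and keeps the best surviving deal."""
    deals = [d for d in (negotiate(c, j) for j in jobs) if d]
    return max(deals, key=lambda d: d["salary"], default=None)

me = Candidate(skills={"python", "ml"}, min_salary=70_000, wants_remote=True)
jobs = [Job(required_skills={"python"}, max_salary=90_000, remote_ok=True),
        Job(required_skills={"python", "ml"}, max_salary=120_000, remote_ok=False)]
print(best_match(me, jobs))  # {'salary': 80000, 'remote': True}
</code></pre>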
<h4><a href="https://www.lesswrong.com/posts/P6fSj3t4oApQQTB7E/coordination-as-a-scarce-resource">Coordination as a scarce resource</a></h4><p>A lot of the current economy is a coordination platform; that&#8217;s the main product of each of Google, Uber, Amazon, and Facebook. Reducing the cost of searching for matches and trades should unlock at least as much mundane benefit and economic value as the tech platforms have.</p><p>Increased coordination may also reduce the need to group people into roles, hierarchies, and stereotypes. Right now, we need to put people into rigid structures (e.g. large organisations with departments like &#8220;HR&#8221; or &#8220;R&amp;D&#8221;, or specific roles like &#8220;doctor&#8221; or &#8220;developer&#8221;) when coordinating a large group of people. In addition to upholding standards and enabling specialisation of labour, another reason for this is that people need to be legible to unintelligent processes, like the binning of applicants by profession, or the CEO using an org chart to find out who to ask about a problem, or someone trying to buy some type of service. Humans can reach a much higher level of nuance when dealing with their friends and immediate colleagues. The cheap intelligence we get from AI might let us extend that level of nuance to a larger group of people than humans can themselves <a href="https://en.wikipedia.org/wiki/Dunbar%27s_number">track</a>. This means people may be able to be more unique and differentiated, while still being able to interface with society.</p><h4>Large-scale coordination</h4><p>On a larger scale, increased coordination will also impact geopolitics. Say there are two countries fighting over land or resources. Each country could have AI agents negotiate with the other&#8217;s AI agents to search the space of possible deals and find the best compromise for both. They could also simulate a vast number of war scenarios to figure out what would happen; much conflict comes from the two sides disagreeing about who would win, and resolving that uncertainty through a real-world test. This relies on three key abilities: the ability to negotiate cheaply, the ability to simulate outcomes, and the ability to stick to and enforce contracts. AI is likely to help with all three. This could reduce the incentive for traditional war: no human lives need to be lost when the likely outcome is already known and both sides can negotiate straight from it. Each side would also know exactly what it is and is not willing to trade off, which makes it easier to optimise for the best compromise for everyone, as the toy model below illustrates.</p>
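<p>The standard bargaining-model-of-war logic explains why shared simulations matter; here is a toy numerical version of it, with all the numbers invented for illustration. Fighting destroys part of the prize, so once both sides&#8217; agents agree on the odds of winning, a peaceful split that both prefer to war exists; it is mutual overconfidence that makes war look attractive.</p><pre><code># A prize worth 100 is contested; fighting destroys 20 of it.
# Each side's expected value from war is:
#   (its believed win probability) x (what survives the war)
PRIZE, WAR_COST = 100.0, 20.0

def war_value(p_win):
    return p_win * (PRIZE - WAR_COST)

# Disagreement: each side privately thinks it wins with probability 0.7.
# A accepts a peaceful share s only if s >= war_value(0.7) = 56;
# B needs PRIZE - s >= 56, i.e. s at most 44. No split satisfies both,
# so both sides prefer to fight.
print(war_value(0.7), PRIZE - war_value(0.7))   # 56.0 44.0

# Agreement: both sides' simulations converge on p = 0.5.
# Now A accepts any s >= 40 and B any s up to 60: every split between
# 40 and 60 beats war for both, because the 20 units that fighting
# would destroy become shared surplus to divide.
print(war_value(0.5), PRIZE - war_value(0.5))   # 40.0 60.0
</code></pre>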
<h3>Spreading intelligence</h3><p>AI lets us spread the benefits of being smart more widely.</p><p>The benefits of intelligence are large. For example, <a href="https://www.tandfonline.com/doi/abs/10.1080/15579336.2002.11770253">this study</a> estimates that a 1 standard deviation increase in intelligence increases your odds of self-assessed happiness by 11%. Now, part of this gain comes from intelligence being a positional good: you benefit from having more intelligence at your disposal than others, for example when competing for a fixed set of places. However, intelligence also has absolute benefits, since it lets you make better choices. And AI means you can convert energy into intelligence. Much as physical machines let the weak gain some of the benefits of (even superhuman) strength, AI might allow all humans to enjoy some of the benefits of being smart.</p><p>Concretely, this could take two forms. The first is AI advisors that increase your ability to make plans and decisions, in the same way that - hypothetically - even a near-senile president might still make decent decisions with the help of their smart advisors. With AI, everyone could have access to comparable expert advisors. The effect may be even more dramatic than with human advisors: the AI might be superhumanly smart, the AI might be more verifiably smart (a big problem in selecting smart advisors is that it can be hard to tell who is actually smart, especially if you are not), and if AIs are aligned successfully there may be less to worry about in trusting them than in trusting potentially-scheming human advisors.</p><p>The second is AI tutoring. Human 1-1 tutoring boosts educational outcomes by <a href="https://en.wikipedia.org/wiki/Bloom%27s_2_sigma_problem">2 standard deviations</a> (2 standard deviations above average is often considered the cutoff for &#8220;giftedness&#8221;). If AI tutoring is as good, that&#8217;s a big deal.</p><h3>AI is the ultimate meta-technology</h3><p>AI is special because it automates intelligence, and intelligence is what you need to build technology, including AI, creating a feedback loop. Previous technologies have boosted other technologies; for example, the printing press massively helped the accumulation of knowledge that led to the invention of many other technologies. But we have never before had a technology that could itself directly advance other technology. Such AI has been called <a href="https://www.cold-takes.com/transformative-ai-timelines-part-1-of-4-what-kind-of-ai/">PASTA (Process for Automating Scientific and Technological Advancement)</a>.</p><p>Positive feedback loops - whether self-improving AIs, nuclear reactions, epidemics, or human cultural evolution - are very powerful, so you should be wary of risks from them. In particular, it is at best extremely unclear whether AIs that improve themselves could be controlled with current techniques. We should be very cautious about using AI systems to improve themselves.</p><p>In the long run, however, most of the value of AI will likely come from its effect on technological progress, much as in the industrial revolution. We can imagine AIs slashing the cost and increasing the speed of science in every field, curing diseases and opening entire new veins of technology, in the same way that steam engines made entirely new veins of coal accessible.</p><p>In particular, AIs could help de-risk one of the largest current threats to future human progress.
<a href="https://slatestarcodex.com/2019/04/22/1960-the-year-the-singularity-was-cancelled/">One model of the feedback loop behind humanity&#8217;s progress</a> in the past few centuries is that people led to ideas led to wealth led to food led to more people.</p><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ISnM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ISnM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 424w, https://substackcdn.com/image/fetch/$s_!ISnM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 848w, https://substackcdn.com/image/fetch/$s_!ISnM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 1272w, https://substackcdn.com/image/fetch/$s_!ISnM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ISnM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png" width="400" height="350" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ISnM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 424w, https://substackcdn.com/image/fetch/$s_!ISnM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 848w, https://substackcdn.com/image/fetch/$s_!ISnM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 1272w, https://substackcdn.com/image/fetch/$s_!ISnM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fcd3d8-0a3a-4c33-80c5-29ec0b7f4428_400x350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p>However, greater wealth no longer translates into more people. The world population, which was exponentially growing for much of the 19th and 20th centuries, is likely to be in decline by the end of the 21st century. This is likely to have <a href="https://www.overcomingbias.com/p/shrinking-economies-dont-innovate">negative consequences</a> for the rate of innovation, and as discussed in the next section, a decline in productivity would likely have a negative impact on human wellbeing. However, if AIs start driving innovation, then we have a new feedback loop: wealth leads to energy leads to more AIs leads to ideas leads to wealth.</p><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Fe6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Fe6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 424w, https://substackcdn.com/image/fetch/$s_!2Fe6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 848w, https://substackcdn.com/image/fetch/$s_!2Fe6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 1272w, https://substackcdn.com/image/fetch/$s_!2Fe6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Fe6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png" width="640" height="411" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:411,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Fe6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 424w, https://substackcdn.com/image/fetch/$s_!2Fe6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 848w, https://substackcdn.com/image/fetch/$s_!2Fe6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 1272w, https://substackcdn.com/image/fetch/$s_!2Fe6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1a761b-4bba-4571-84a6-e0e2c0e5b149_640x411.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p>As long as this feedback loop does not <a href="https://www.lesswrong.com/posts/pk9mofif2jWbc6Tv3/fiction-a-disneyland-without-children">decouple from the human economy</a> and instead continues benefitting humans, this could help progress continue long into the future.</p><h3>Wealth and energy are good</h3><p>If you want humans to be well-off, one of the easiest things to do is give them more wealth and more energy. GDP per capita (on a log scale) has a 0.79 correlation with life satisfaction, and per-capita energy use (again on a log scale) has a 0.74 correlation with life satisfaction. 
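<p>A crude way to see why closing the loop matters is the toy simulation below. All the coefficients are invented; the only point is the qualitative difference between ideas scaling with a plateaued population and ideas scaling with wealth itself.</p><pre><code># Toy comparison of the two feedback loops (all numbers invented).
# Classic loop: new ideas come from people, so once population
# plateaus, wealth grows by a roughly constant amount per year.
# AI loop: wealth buys compute and compute does research, so ideas
# scale with wealth itself and growth compounds.

def simulate(years, ai_loop):
    wealth, people = 1.0, 1.0
    for _ in range(years):
        researchers = wealth if ai_loop else people
        ideas = 0.03 * researchers   # ideas proportional to researchers
        wealth += ideas              # ideas turn into wealth
        # people stays flat: wealth no longer makes more people
    return wealth

print(simulate(100, ai_loop=False))  # linear growth: 4.0
print(simulate(100, ai_loop=True))   # compounding: about 19.2
</code></pre>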
<h3>Wealth and energy are good</h3><p>If you want humans to be well-off, one of the easiest things to do is give them more wealth and more energy. GDP per capita (on a log scale) has a 0.79 correlation with life satisfaction, and per-capita energy use (again on a log scale) has a 0.74 correlation with life satisfaction. We should expect these relationships to continue.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!RgPL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1004f233-cd3d-429b-834f-5023a0aea631_552x459.png" alt=""/></figure><p>Above: GDP per capita (x-axis), energy use (y-axis), and life satisfaction (colour scale) for 142 countries. There are no poor countries with high energy use, and no rich countries with low energy use. There are no countries with high average life satisfaction that are not high in both energy use and average GDP per capita. The axes are logarithmic, but since economic growth is exponential, countries should be able to make progress along the axes at a constant rate. Data source: Our World In Data (<a href="https://ourworldindata.org/grapher/gdp-per-capita-worldbank">here</a>, <a href="https://ourworldindata.org/happiness-and-life-satisfaction">here</a>, and <a href="https://ourworldindata.org/grapher/per-capita-energy-use">here</a>).</p>
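<p>For anyone who wants to reproduce numbers like these: a sketch of the computation, assuming you have downloaded the three Our World In Data series linked in the caption above and merged them into one row per country. The file name and column names here are placeholders, not the actual OWID export headers.</p><pre><code>import numpy as np
import pandas as pd

# Hypothetical merged table, one row per country, with columns:
# gdp_per_capita, energy_per_capita, life_satisfaction
df = pd.read_csv("merged_owid.csv").dropna()

# Correlate on a log scale, as in the figure: a doubling of GDP
# should count the same whether it is 1k to 2k or 30k to 60k.
for col in ["gdp_per_capita", "energy_per_capita"]:
    r = np.corrcoef(np.log(df[col]), df["life_satisfaction"])[0, 1]
    print(col, round(r, 2))
</code></pre>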
<p>(It is true that energy use and economic growth have been increasingly <a href="https://ourworldindata.org/energy-gdp-decoupling">decoupling</a> in rich countries, due to services making up more of the economy, and to efficiency gains in energy use. But efficiency gains effectively increase the amount of useful energy available: if the energy that used to cook one meal now cooks two, that is the same as gaining more energy. Efficiency effects are also fundamentally limited, because there is a physical limit, and if demand is elastic then <a href="https://en.wikipedia.org/wiki/Jevons_paradox">efficiency gains lead to increased energy use</a>, meaning they don&#8217;t help the environment either. Ultimately, if you want to do more things in the physical world, you need more energy.)</p><p>A wealthy, energy-rich society has many material benefits: plentiful food, advanced medicine, feasible high redistributive spending, and great choice and personal freedom through specialisation of labour and high spending power. It also has some important subtler benefits. Poverty and resource constraints sharpen conflict. <a href="https://www.strataoftheworld.com/2019/09/growth-and-civilisation.html">Economic growth is intimately linked to tolerance and liberalism</a>, because it weakens the cultural status and clout of zero-sum strategies like conflict and politicking.</p><p>One clear historical example of increased energy use improving quality of life is the industrial revolution, arguably <a href="https://lukemuehlhauser.com/industrial-revolution/">the best and most important thing that ever happened</a>.
Before it, trends in human wellbeing were stagnant, fluctuating, or glacially slow; after it, every variable related to human well-being for which we can find good long-term series shoots upwards.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!adIF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e457ca1-79f0-49a9-8294-4d1e52028f18_640x415.png" alt=""/></figure><p>Above: variables correlated with human well-being over time. Source: Luke Muehlhauser</p><p>Therefore, it&#8217;s worth keeping in mind that boosting energy and wealth is good, actually. And the most powerful way to do that is by inventing new technologies that let us use energy to serve our needs.</p><p>The heart of the industrial revolution was replacing part of human manual labour with something cheaper and more powerful. AI that replaces large parts of human mental labour with something cheaper and more powerful should be expected to be similarly transformative. Whether it is a good or bad transformation seems more uncertain. We are lucky that industrialisation happened to tie national power very tightly to having a large, educated, and prosperous middle class; it is unclear what the winning strategy in an AI economy will be. We are also lucky that the powerful totalitarian states enabled by industrial technology have not triumphed so far; they might get further boosts from AI. Automating mental labour also involves automating decision-making, and handing decision-making over to machines is handing power to machines, which is riskier than handing them manual labour. But if we can safely control our AI systems and engineer good incentives for the resulting society, we could get another leap in human welfare.</p><h3>Self-actualisation</h3><p>Now say we&#8217;ve had a leap in innovation and energy through Transformative AI (TAI) and have reached a post-scarcity world. What happens now? Humans have all their basic needs met and most jobs are automated, but what do people actually spend their time doing?</p><h4>Maslow&#8217;s Hierarchy</h4><p><a href="https://www.medicalnewstoday.com/articles/maslows-hierarchy-of-needs">Maslow&#8217;s hierarchy of needs</a> is a framework for understanding human needs and the drivers of human behaviour.
Maslow suggested that in most scenarios people need to mostly satisfy one level before being able to focus on higher-level needs.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!aWKV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa929999a-58f1-4476-a851-421f27e2300e_400x400.png" alt=""/></figure><p>The top level of the hierarchy is self-actualisation. The peak of human experience is something that few can currently reach - but maybe everyone could get there.</p><p>There is a possible path for the world in which all humans can reach self-actualisation. With increases in technology and wealth, such as TAI combined with a Universal Basic Income (<a href="https://en.wikipedia.org/wiki/Universal_basic_income">UBI</a>), we could provide food, water, shelter, and clothing for all humans, letting people meet their basic needs easily. Humans could then spend more time on the things they want, moving up through Maslow&#8217;s hierarchy towards love and belonging, self-esteem, and self-actualisation.</p><p>Say you are in a post-scarcity world: what would you do if you didn&#8217;t have to work?</p><p>Would you spend time with loved ones, engaging in social activities that provide a sense of connection, belonging, and self-esteem? Would you hone your craft and become an expert in a particular field? Or would you spend the whole time scrolling on your phone?</p><p>Say, hypothetically, a billionaire gave you a grant to work on anything you wanted: would you be happy with the complete freedom to spend your time as you wished?</p><p>Often people assume that others will be unhappy with this world, but would you? There is a cognitive bias whereby <a href="https://doi.org/10.1177/0146167299025005004">people tend to judge themselves as happier than their peers</a>, which could nudge you into thinking people would be less happy in this world, even if you yourself would enjoy it.</p><p>In this post-scarcity world, humans could spend more time on creative pursuits such as art, music, and other hobbies &#8211; not with the goal of making money, but to reach self-actualisation.</p><p>Even with AI better than humans in every dimension and able to produce the best art in the world, there is intrinsic value in honing your craft, improving at art, and expressing your feelings through it. The vast majority of art is not created to be the best art in the world, but for the journey itself. The parent who puts their child&#8217;s finger painting on the wall is not thinking &#8220;my child&#8217;s art is better than Van Gogh&#8217;s&#8221;.
Instead, they feel a sense of excitement about the progress their child has made and the creative expression the child has produced.</p><p>Another example is the Olympic Games. Nobody needs to win the Olympic Games to survive, but they let people express pride in their country, hone their craft, attain status, and so on. The actual task, though, is just a game, a social construct. More and more tasks will look like social constructs and games we create to challenge each other.</p><h4>Examples of post-scarcity scenes</h4><p>Since this is quite theoretical, let&#8217;s consider examples where we&#8217;ve had &#8220;post-scarcity&#8221; microcosms to explore.</p><h5>The French Bourgeoisie</h5><p>The French leisure class, or bourgeoisie, was a class of wealthy elites that emerged in 16th-century France. Many had enough money to pursue endeavours like refining their taste in arts and culture. Salon culture was a cornerstone of bourgeois social life: gatherings featuring discussions of literature, art, politics, and philosophy.</p><h5>Upper Class in the Victorian Era</h5><p>The upper class in the Victorian era enjoyed a variety of leisure activities that reflected their wealth, status, and values: social events and balls, fox hunting and other sports, theatre and opera, art and literature, travel, tea parties and social visits, gardening and horticulture, and charitable work and philanthropy. Several undertook serious pursuits in science or art.</p><h5>Burning Man</h5><p><a href="https://en.wikipedia.org/wiki/Burning_Man">Burning Man</a> is an annual festival where people bring all the basics they need for a week of living in the desert: food, water, shelter. For that week, they create a new community or city that is a temporary microcosm of a post-scarcity world, pursuing artistic endeavours and creative expression, music, dance, and connection with others. People often describe Burning Man as among the best experiences of their lives.</p><h5>Successful Startup Founders in the Bay Area</h5><p>In San Francisco there is a crossover between hippie culture and tech, and many people with excess wealth and resources, leaving many looking for more in life. They try to reach self-actualisation by pursuing arts and other creative endeavours. Hippie movements often encourage communal living and a sense of connection with those around you. Many may raise eyebrows at the lifestyles of some such people, but it&#8217;s hard to claim that it&#8217;s a fundamentally bad existence.</p><h4>More pessimistic views about humans?</h4><p>It is true that not all cultural tendencies in a post-scarcity world would be positive. In particular, humans have a remarkable ability to sustain extremely tough and all-consuming social status games, seemingly especially in environments where other needs are met.
See for example this book review about the <a href="https://woodfromeden.substack.com/p/primates-of-manhattan">cut-throat social scene of upper-class Manhattan women</a>, or this one about the <a href="https://www.thepsmiths.com/p/guest-joint-review-very-important">bland sameness and wastefulness of nightlife</a>, or <a href="https://www.astralcodexten.com/p/your-book-review-the-dawn-of-everything">this book review</a> that ends up concluding that the trajectory of human social evolution is one long arc from prehistoric gossip traps to internet gossip traps, with liberal institutions just a passing phase.</p><p>But the liberal humanist attitude here is to let humans be humans. Yes, they will have petty dramas and competitions, but if that is what they want, who is to tell them no? And they will also have joy and love.</p><p>Would a post-scarcity world have meaning? Adversity is one of the greatest sources of meaning. Consider D-Day, when hundreds of thousands of soldiers got together to charge up a beach under machine-gun fire to liberate a continent from Nazi rule. Or consider a poor parent of four working three jobs to make ends meet. There are few greater sources of meaning. But adversity can be meaningful while involving less suffering and loss. A good future will be shallower, in a sense, but <a href="https://www.noahpinion.blog/p/toward-a-shallower-future">that is a good thing</a>.</p><p>Finally, even with the technology for post-scarcity, it is unclear whether we would get a happy world, because of politics and conflict. We will discuss this later.</p><h3>Radical improvements</h3><p>AI might also help with radical but necessary improvements to the human condition.</p><p>People die. It is a moral tragedy when people are <a href="https://www.strataoftheworld.com/2021/10/death-is-bad.html">forced to die against their will</a>, as happens to over 50 million people per year. Medicine is making progress against many causes of death and disability; in the limit, it can cure all of them. We should reach that limit as fast as possible, and AI can likely help accelerate the research and deployment of solutions.</p><p>One of the greatest inequalities in the world is inequality in intelligence. Some people struggle to perform in simple jobs, while others (well, at least one) are <a href="https://en.wikipedia.org/wiki/John_von_neumann">John von Neumann</a>. In the short term, AI might help by making cognitively demanding tasks more accessible to people through AI tutors and AI copilots. In the longer term, AI might help us enhance human intelligence, through brain-AI integration or new medical technology.</p><h2>Reasons to worry</h2><p>Though there are many potential upsides to AI and AGI, as argued in this post, that doesn&#8217;t mean there aren&#8217;t risks.</p><p>The plausible risks of AI go all the way up to human extinction, so this shouldn&#8217;t be taken lightly. Since this piece is focused on the upside risk, not the downside risk, we will not argue the point in depth, but it is worth revisiting briefly.</p><h4>Existential risk from AI is a serious concern</h4><p>It is intuitive that AI is risky.</p><p>First, creating something smarter, faster, and more capable than humans is obviously risky, since you need to very precisely either control it (i.e. stop it from doing things you don&#8217;t like) or align it (i.e. make it always try to do what you would want it to do). Both the control and the alignment problem for AIs still involve unsolved technical challenges.
And that&#8217;s assuming the AI is in the right hands.</p><p>Second, even if AIs remain in our control, they are likely to be as transformative as the industrial revolution. Eighteenth-century European monarchs would&#8217;ve found it hard to imagine how the steam engine could challenge their power, but the social changes it helped set in motion eventually wrested their powers away. In the modern world, a lot of power depends on large educated workforces of humans, whereas sufficiently strong AGI might decorrelate power and humans, decreasing the incentive to have people be educated and prosperous - or to have people around at all.</p><p>Apart from object-level arguments, consider too the seriousness with which the AI doomsday scenario is discussed. Many top researchers and all top AI lab CEOs have signed a statement saying <a href="https://www.safe.ai/work/statement-on-ai-risk">&#8220;Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war&#8221;</a>. Nuclear war and pandemics are the only other cases where similarly serious predictions have been made by a similarly serious set of people (though arguably climate change is close: the science on its effects is more established and certain, and catastrophe is more likely, but literal human extinction from it is much less likely).</p><h4>Side-effects of non-existentially-bad AI might be large</h4><p>Consider the internet, a widely successful technology with a lot of benefits. There are credible claims that the internet is responsible for harms ranging from massively <a href="https://www.afterbabel.com/p/phone-based-childhood-cause-epidemic">increased depression rates among teenagers</a> to political polarisation to widespread productivity loss through addiction and distraction.</p><p>In the same way, the success of AI might lead to bad side effects, even if all the existential risks are avoided.</p><p>For example, AI could replace human connection. Human friends and partners might increasingly be replaced with AIs. However bad it was in other ways, on pre-AI social media you at least interacted with humans (or simple algorithms); with AIs it&#8217;s possible to have what looks like a deep emotional relationship. Just look at the <a href="https://www.reddit.com/r/ReplikaUserGuide/comments/1134sgd/a_note_about_the_recent_changes_to_replika/">Replika subreddit</a> from a year ago, when they changed the algorithm to only allow &#8220;PG-rated interactions&#8221;. Many users were upset. The film <a href="https://en.wikipedia.org/wiki/Her_(film)">&#8220;Her&#8221;</a> doesn&#8217;t seem far off, as Sam Altman <a href="https://x.com/sama/status/1790075827666796666">acknowledges</a>. Such relationships give the human much more safety and control than human relationships do, which might be very attractive while also excessively coddling people. Given that much human happiness and meaning comes from human relationships and bonding, widespread AI substitution could destroy a large part of all human wellbeing and meaning in the world. On a more prosaic level, society might atomise into individuals hoarding compute credits to spend on running their AI companions without connecting with other humans, with severe effects on society&#8217;s functioning; or humans might stop having children, and human populations might crash. Humanity has flourished through collaboration and socialisation.
If we use AIs to replace this in an overly thoughtless way, the fabric of society could crumble.</p><p>Apart from being superhuman at forming relationships with humans, AIs might be superhuman at persuasion. We can imagine AIs producing the vast majority of content that people consume. We can imagine a totalitarian world where the governments with the greatest compute resources can dominate the conversation forever. Instead of humans having ideas and sometimes persuading other humans to adopt them, driving social progress, any human-generated ideas might be swamped by a greater quantity of superhumanly persuasive counter-arguments that support the status quo. We can also imagine a dystopian decentralised world. Already, many online memes (in Dawkins&#8217;s <a href="https://en.wikipedia.org/wiki/Meme">original sense of the word</a>) are maladaptive, spreading not by having good effects on their hosts but by being incredibly good at spreading from person to person. AI might make us much better at searching the space of ideas for the most viral ones. Ideas that aren&#8217;t maximally viral might be outcompeted. Eventually, our institutions could become mere puppets that serve as viral hosts for the most transmissive memes, as part of an endless tug-of-war where AI-generated memes compete to compel humans to spread them.</p><p>Seems bad.</p><h2>Not good nor bad, but some third thing.</h2><p>Many debates turn into mood affiliation debates. Are guns bad? Is more government good? But remember: <a href="https://www.lesswrong.com/posts/9weLK2AJ9JEt2Tt8f/politics-is-the-mind-killer">politics is the mind-killer</a>. Navigating a complicated world requires more than the ability to stick the label &#8220;good&#8221; or &#8220;bad&#8221; on entire domains. If you were seated in the control room of a nuclear power station, you wouldn&#8217;t ask yourself: uranium, good or bad? Instead, you would steer towards the small set of states where the reaction is perched between dying out and exploding, while generating useful clean power.</p><p>We&#8217;ve also seen again and again that technology and social change have <a href="https://www.lesswrong.com/posts/nsFpCGPJ6dfk9uFkR/review-foragers-farmers-and-fossil-fuels">strong effects on each other</a>, and these are often hard to predict. We&#8217;ve discussed how industrial technology may have led to democracy. There is serious academic debate about whether the <a href="https://en.wikipedia.org/wiki/Great_Stirrup_Controversy#:~:text=The%20Great%20Stirrup%20Controversy%20is,by%20Lynn%20Townsend%20White%20Jr.">stirrup caused feudalism</a>, whether <a href="https://www.cambridge.org/core/journals/world-politics/article/pandemics-and-political-development/EA9466FE5164149AD4350B7D38222A5C">the Black Death was a driver of European liberalism</a>, and whether social media was <a href="https://en.wikipedia.org/wiki/Social_media%27s_role_in_the_Arab_Spring">a significant cause of the Arab Spring</a>. The <a href="https://en.wikipedia.org/wiki/Combined_oral_contraceptive_pill#Society_and_culture">birth control pill</a> was a major influence on the sexual revolution, and the <a href="https://www.reformation21.org/articles/the-importance-of-the-printing.php">printing press helped the Protestant Reformation</a>. Often, the consequences of a new technology are some obvious direct benefits, some obvious direct harms, and the shifting of some vast social equilibrium that ends up forever reshaping the world in some way no one saw coming.
So far we&#8217;ve <a href="https://www.strataoftheworld.com/2018/08/review-enlightenment-now-steven-pinker.html">clearly ended up ahead on net</a>, and maybe that will continue.</p><p>Humanity has spent over a hundred thousand years riding a feedback loop of accumulating cultural evolution. Over the past few hundred, the industrial revolution boosted the technological progress feedback loop. Human wellbeing has skyrocketed, though along the way we&#8217;ve had - and are continuing to have - close calls with nuclear war, totalitarianism, and environmental issues. We&#8217;ve had a healthy dose of luck, including in generalities like the incentive structures of industrial economics and specifics like the heroism <a href="https://en.wikipedia.org/wiki/Stanislav_Petrov">of Stanislav Petrov</a>. But we&#8217;ve also had an enormous amount of human effort and ingenuity spent on trying to chart a good path for civilization, from solar panel subsidies to the Allies winning World War 2.</p><p>For most of this time, the direction of the arrow of progress has been obvious. The miseries of poverty and the horrors of close-up totalitarianism are very powerful driving forces after all. And while both continue ravaging the world, developed countries have in many ways gotten complacent. There are fewer obvious areas of improvement for those lucky enough to enjoy a life of affluence in the developed world. But the future could be <a href="https://nickbostrom.com/utopia">much better still</a>.</p><h2>Know where to aim</h2><p>We think it&#8217;s important to have a target of what to aim for. We need to dream about the future we want. A strong culture needs a story of what it is driving towards, and humanity needs a compelling vision of how our future turns out well so we can work together to create the future we all want. AI seems like the biggest upcoming opportunity and risk. We hope we can avoid the risks, and realise the positive vision presented here, together with a hundred other things we can&#8217;t yet imagine.</p><p><em>See <a href="https://www.lesswrong.com/posts/4bBAK39rFypLbQND4/positive-visions-for-ai">LessWrong</a>&nbsp;for additional comments &amp; discussion.</em></p>]]></content:encoded></item><item><title><![CDATA[A model of research skill]]></title><description><![CDATA[~4k words (20 minutes)]]></description><link>https://www.nosetgauge.com/p/a-model-of-research-skill</link><guid isPermaLink="false">https://www.nosetgauge.com/p/a-model-of-research-skill</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Mon, 08 Jan 2024 00:02:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HdSA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F816cef70-50a0-4954-a8ce-8f712e1248e8_460x460.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>~4k words (20 minutes)</em></p><p>Doing research means answering questions no one yet knows the answer to. Lots of impactful projects are downstream of being good at this. A good first step is to have a model for what the hard parts of research skill are.</p><h1>Two failure modes</h1><p>There are two opposing failure modes you can fall into when thinking about research skill.</p><p>The first is the deferential one. 
Research skill is this amorphous complicated thing, so the only way to be sure you have it is to spend years developing it within some ossified ancient bureaucracy and then have someone in a funny hat hand you a piece of paper (bonus points for Latin being involved).</p><p>The second is the hubristic one. You want to do, say, AI alignment research. This involves thinking hard, maybe writing some code, maybe doing some maths, and then writing up your results. You&#8217;re good at thinking - after all, you read the Sequences, like, 1.5 times. You can code. You did a STEM undergrad. And writing? Pffft, you&#8217;ve been doing that since kindergarten!</p><p>I think there&#8217;s a lot to be said for hubris. Skills can often be learned well by colliding hard with reality in unstructured ways. Good coders are famously often self-taught. The venture capitalists who thought that management experience and a solid business background were needed to build a billion-dollar company are now mostly extinct.</p><p>It&#8217;s less clear that research works like this, though. I&#8217;ve often heard it said that it&#8217;s rare for a researcher to do great work without having been mentored by someone who was themselves a great researcher. Exceptions exist, and I&#8217;m sceptical that any good statistics exist on this point. However, this is the sort of hearsay an aspiring researcher should pay attention to. It also seems like the feedback signal in research is worse than in programming or startups, which makes it harder to learn.</p><h1>Methodology, except &#8220;methodology&#8221; is too fancy a word</h1><p>To answer this question, and steer between deferential confusion and hubristic over-simplicity, I interviewed people who had done good research to try to understand their models of research skill. I also read a lot of blog posts. Specifically, I wanted to understand what about research a bright, agentic, technical person trying to learn at high speed would likely fail at, and either not realise or not be able to fix quickly.</p><p>I did structured interviews with&nbsp;<a href="https://www.neelnanda.io/">Neel Nanda</a>&nbsp;(Google DeepMind;&nbsp;<a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=GLnX3MkAAAAJ&amp;citation_for_view=GLnX3MkAAAAJ:eQOLeE2rZwMC">grokking</a>),&nbsp;<a href="https://www.laurolangosco.com/">Lauro Langosco</a>&nbsp;(<a href="https://www.kasl.ai/">Krueger Lab</a>;&nbsp;<a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=8-HOLxkAAAAJ&amp;citation_for_view=8-HOLxkAAAAJ:9yKSN-GCB0IC">goal misgeneralisation</a>), and one other. I also learned a lot from unstructured conversations with&nbsp;<a href="https://www.inference.vc/">Ferenc Huszar</a>,&nbsp;<a href="https://krasheninnikov.github.io/about/">Dmitrii Krasheninnikov</a>,&nbsp;<a href="https://www.soren-mindermann.com/">S&#246;ren Mindermann</a>,&nbsp;<a href="https://owainevans.github.io/">Owain Evans</a>, and several others. I then ~<s>procrastinated on this project for 6 months</s>~ touched grass and formed inside views by doing the&nbsp;<a href="https://www.matsprogram.org/">MATS research program</a>&nbsp;under the mentorship of Owain Evans.
I owe a lot to the people I spoke to and their willingness to give their time and takes, but my interpretation and model should not be taken as one they would necessarily endorse.</p><p>My own first-hand research experience consists mainly of a research-oriented CS (i.e.&nbsp;ML) master&#8217;s degree, followed by working as a full-time researcher for 6 months and counting. There are many who have better inside views than I do on this topic.</p><h1>The Big Three</h1><p>In summary:</p><ol><li><p>There are a lot of ways reality could be (i.e.&nbsp;hypotheses), and a lot of possible experiment designs. You want to avoid brute-forcing your way through these large spaces as much as possible, and instead be good at picking likely-true hypotheses or informative experiments. Being good at this is called&nbsp;<strong>research taste</strong>, and it&#8217;s largely an intuitive thing that develops over a lot of time spent engaging with a field.</p></li><li><p>Once you have some bits of evidence from your experiment, it&#8217;s easy to over-interpret them (perhaps you interpret them as more bits than they actually are, or perhaps you were failing to consider how large hypothesis space is to start with). To counteract this, you need sufficient&nbsp;<strong>paranoia</strong>&nbsp;about your results, which mainly just takes careful and creative thought, and good epistemics.</p></li><li><p>Finally, you need to&nbsp;<strong>communicate</strong>&nbsp;your results to transfer those bits of evidence into other people&#8217;s heads, because we live in a society.</p></li></ol><h2>Taste</h2><p>Empirically, it seems that a lot of the value of senior researchers is a better sense of which questions are important to tackle, and better judgement for what angles of attack will work. For example, good PhD students often say that even if they&#8217;re generally as technically competent as their adviser and read a lot of papers, their adviser has much better quick judgements about whether something is a promising direction.</p><p>When I was working on my master&#8217;s thesis, I had several moments where I was working through some maths and got stuck. I&#8217;d go to one of my supervisors, a PhD student, and they&#8217;d have some ideas on angles of attack that I hadn&#8217;t thought of. We&#8217;d work on it for an hour and make more progress than I had in several hours on my own. Then I&#8217;d go to another one of my supervisors, a professor, and in fifteen minutes they&#8217;d have tried something that worked. Part of this is experience making you faster at crunching through derivations, and knowing things like helpful identities or methods. But the biggest difference seemed to be a good gut feeling for what the most promising angle or next step is.</p><p>I think the fundamental driver of this effect is dealing with large spaces: there are many possible ways reality could be (John Wentworth talks about this&nbsp;<a href="https://www.lesswrong.com/posts/nvP28s5oydv8RjF9E/mats-models#Jason_Crawford_s_Model___Bits_of_Search">here</a>), and many possible things you could try, and even being slightly better at homing in on the right things helps a lot. Let&#8217;s say you&#8217;re trying to prove a theorem whose proof takes 4 steps. If you have an 80% chance of picking the right move at each step, you&#8217;ll have a 41% chance of success per attempt. If that chance is 60%, you&#8217;ll have a 13% chance &#8211; over 3 times less.</p>
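<p>(To make the arithmetic concrete, here is the same calculation as a few lines of Python; the function name and the specific numbers are just the ones from this example.)</p><pre><code># Chance of completing an n-step proof when each step is picked
# correctly with probability p, assuming the steps are independent.
def success_prob(p, n_steps=4):
    return p ** n_steps

print(round(success_prob(0.8), 2))  # 0.41
print(round(success_prob(0.6), 2))  # 0.13
print(success_prob(0.8) / success_prob(0.6))  # ~3.16: over 3x as many successes
</code></pre><p>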
If you&#8217;re trying to find the right hypothesis within some hypothesis space, and you&#8217;ve already managed to cut down the entropy of your probability distribution over hypotheses to 10 bits, you&#8217;ll be able to narrow down to the correct hypothesis faster and with fewer additional bits of evidence than someone whose entropy is 15 bits (and whose search space is therefore effectively&nbsp;2<sup>5</sup>&#8196;=&#8196;32&nbsp;times as large). Of course, you&#8217;re rarely chasing down just a single hypothesis in a defined hypothesis class. But if you&#8217;re constantly 5 bits of evidence ahead of someone else in what you&#8217;ve incorporated into your beliefs, you&#8217;ll make weirdly accurate guesses from their perspective.</p><p>Why does research taste seem to correlate so strongly with experience? I think it&#8217;s because the bottleneck is seeing and integrating evidence into your (both explicit and intuitive) world models. No one is close to having integrated all empirical evidence that exists, and new evidence keeps accumulating, so the returns from reading and seeing more keep coming. (In addition to literal experiments, I count things like &#8220;doing a thousand maths problems in this area of maths&#8221; as &#8220;empirical&#8221; evidence for your intuitions about which approaches work; I assume this gets distilled into half-conscious intuitions that your brain can then use when faced with similar problems in the future.)</p><p>This suggests that the way to speed-run getting research taste is to see lots of evidence about research ideas failing or succeeding. To do this, you could:</p><ol><li><p>Have your own research ideas, and run experiments to test them. The feedback quality is theoretically ideal, since reality does not lie (but may be constrained by what experiments you can realistically run, and a lack of the paranoia that I talk about next). The main disadvantage is that this is often slow and/or expensive.</p></li><li><p>Read papers to see whether other people&#8217;s research ideas succeeded or failed. This is prone to several problems:</p><ol><li><p>Biases: in theory, published papers are drawn from the set of ideas that ended up working, so you might not see negative samples (which is bad for learning). In practice, paper creation and selection processes are imperfect, so you might see lots of bad or poorly-communicated ones.</p></li><li><p>Passivity: it&#8217;s easy to fool yourself into thinking you would&#8217;ve guessed the paper ideas beforehand. Active reading strategies could help; for example, read only the paper&#8217;s motivation section and write down what experiment you&#8217;d design to test it, and then read only the methodology section and write down a guess about the results.</p></li></ol></li><li><p>Ask someone more experienced than you to rate your ideas. A mentor&#8217;s feedback is not as good as reality&#8217;s, but you can get it a lot faster (at least in theory). The speedup is huge: a big ML experiment might take a month to set up and run, but you can probably get detailed feedback on 10 ideas in an hour of conversation. This is a ~7000x speedup (see the quick arithmetic check after this list). I suspect a lot of the value of research mentoring lies here: an enormous number of predictable failures can be skipped, and inefficiently targeted ideas honed into better ones, before you spend time running the expensive test of actually checking with reality. (If true, this would imply that the value of research mentorship is higher whenever feedback loops are worse.)</p></li></ol>
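<p>(A quick check of the two numbers in this section, again in Python; the ~730-hour month of wall-clock time per experiment and the 10-ideas-per-hour rate are the assumptions from point 3 above.)</p><pre><code># A 15-bit hypothesis space vs a 10-bit one: 2^(15 - 10) = 32 times larger.
print(2 ** (15 - 10))  # 32

# Mentor feedback vs running the experiment yourself:
hours_per_idea_experiment = 730   # assumed ~1 month of wall-clock time per idea
hours_per_idea_mentor = 1 / 10    # 10 ideas rated per hour of conversation
print(hours_per_idea_experiment / hours_per_idea_mentor)  # 7300.0, i.e. ~7000x
</code></pre>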
<p><a href="https://colah.github.io/notes/taste/">Chris Olah has a list of suggestions for research taste exercises</a>&nbsp;(number 1 is essentially the last point on my list above).</p><p>Research taste takes the most time to develop, and seems to explain the largest part of the performance gap between junior and senior researchers. It is therefore the single most important thing to focus on developing.</p><p>(If taste is so important, why does research output&nbsp;<a href="https://backend.orbit.dtu.dk/ws/portalfiles/portal/215281397/NP_article.pdf">not increase monotonically</a>&nbsp;with age in STEM fields? The scary biological explanation is that fluid intelligence (or energy or &#8230;) starts dropping at some age, and this decreases your ability to execute on maths/code, even assuming your research taste is constant or improving. Alternatively, hours spent on deep technical work might tend to decline with advanced career stages.)</p><h2>Paranoia</h2><p>I heard several people saying that junior researchers will sometimes jump to conclusions, or interpret their evidence as saying more than it actually does. My instinctive reaction to this is: &#8220;wait, but surely if you just creatively brainstorm the ways the evidence might be misleading, and take these into account in making your conclusions (or are industrious about running additional experiments to check them), you can just avoid this failure mode?&#8221; The average answer I got was that yes, this seems true, and indeed many people either only need one peer review cycle to internalise this mindset, or pretty much get it from the start. Therefore, I&#8217;m almost tempted to chuck this category off this list, and onto the list of less crucial things where &#8220;be generally competent and strategic&#8221; will sort you out in a reasonable amount of time. However, two things hold me back.</p><p>First, confirmation bias is a strong thing, and it seems helpful to wave a big red sign saying &#8220;WARNING: you may be about to experience confirmation bias&#8221;.</p><p>Second, I think this is one of the cases where the level of paranoia required is sometimes more than you expect, even after you expect it will be high. John Wentworth puts this best in&nbsp;<a href="https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring">You Are Not Measuring What You Think You Are Measuring</a>, which you should go read right now. There are more confounders and weird effects than are dreamt of in your philosophies.</p><p>A few people mentioned going through the peer review process as being a particularly helpful thing for developing paranoia.</p><h2>Communication</h2><p>I started out sceptical about the difficulty of research-specific communication, above and beyond general good writing. However, I was eventually persuaded that yes,&nbsp;<em>research-specific</em>&nbsp;communication skills exist and are important.</p><p>First, if research has impact, it is through communication.
Rob Miles once said (at a talk) something along the lines of: &#8220;if you&#8217;re trying to ensure positive AGI outcomes through technical work, and you think that you are not going to be one of the people who literally writes the code for it or is in the room when it&#8217;s turned on, your path to impact lies through telling other people about your technical ideas.&#8221; (This generalises: if you want to drive good policy through your research and you&#8217;re not literally writing it &#8230;, etc.) So you should expect good communication to be a force multiplier applied on top of everything else, and therefore very important.</p><p>Secondly, research is often not communicated well. On the smaller scale, Steven Pinker moans endlessly &#8211; and with good reason &#8211; about&nbsp;<a href="https://grad.ncsu.edu/wp-content/uploads/2016/06/Why-Academics-Stink-at-Writing-1-2.pdf">academic prose</a>&nbsp;(my particular pet peeve is the endemic utilisation of the word &#8220;utilise&#8221; in ML papers). On the larger scale, entire research agendas can get ignored because the key ideas aren&#8217;t communicated in a sufficiently clear and legible way.</p><p>I don&#8217;t know the best way to speed-run getting good at research communication. Maybe read&nbsp;<a href="https://stevenpinker.com/publications/sense-style-thinking-persons-guide-writing-21st-century">Pinker</a>&nbsp;to make sure you&#8217;re not making predictable mistakes in general writing. I&#8217;ve heard that experienced researchers are often good at writing papers, so maybe seek feedback from any you know (but don&#8217;t internalise the things they say that are about goodharting for paper acceptance). With papers, understand&nbsp;<a href="https://web.stanford.edu/class/ee384m/Handouts/HowtoReadPaper.pdf">how papers are read</a>. Some sources of research-specific communication difficulty I can see are (a) the unusually high need for precision (especially in papers), and (b) communicating the intuitive, high-context, and often unverbalised-by-default world models that guide your research taste (especially when talking about research agendas).</p><h1>Other points</h1><ul><li><p>Having a research problem is not enough. You need an angle of attack.</p><ul><li><p>Richard Feynman once said something like: keep a set of open problems in your head. Whenever you discover a new tool (e.g.&nbsp;a new method), run through this list of problems and see if you can apply it. I think this can also be extended to new facts; whenever you hear about a discovery, run through a list of open questions and see how you should update.</p></li><li><p>Hamming says something similar in&nbsp;<a href="https://www.osv.llc/application-timeline">You and Your Research</a>: &#8220;Most great scientists know many important problems. They have something between 10 and 20 important problems for which they are looking for an attack.&#8221;</p></li></ul></li><li><p>Research requires a large combination of things to go right.
Often, someone will be good at a few of them but not all of them.</p><ul><li><p>A sample list might be:</p><ul><li><p>generating good ideas</p></li><li><p>picking good ideas (= research taste)</p></li><li><p>iterating rapidly to get empirical feedback</p></li><li><p>interpreting your results right (paranoia)</p></li><li><p>communicating your findings</p></li></ul></li><li><p>If success is the product of sufficiently many independent positive factors, the distribution of success should be roughly log-normal (apply the central limit theorem to the log of the product), and therefore fairly heavy-tailed. And yes, research is heavy-tailed. Dan Hendrycks and Thomas Woodside&nbsp;<a href="https://www.lesswrong.com/posts/AtfQFj8umeyBBkkxa/a-bird-s-eye-view-of-the-ml-field-pragmatic-ai-safety-2#Research_ability_and_impact_is_long_tailed">claim</a>&nbsp;that while there may be 10x engineers, there are 1000x researchers. This seems true.</p><ul><li><p>However, this also means that not being the best at any one of the component skills does not doom you: you can still have a really good product across categories.</p></li></ul></li></ul></li><li><p>Ideas from other fields are often worth stealing. There exist standardised pipelines to produce people who are experts in X for many different X, but far less so to produce people who are experts in both X and some other Y. Expect many people in X to miss out on ideas in Y (though remember that not all Y are relevant).</p></li><li><p>Research involves infrequent and uncertain feedback. Motivation is important and can be hard. Grad students are&nbsp;<a href="https://www.benkuhn.net/grad/">notorious</a>&nbsp;for having bad mental health. A big chunk of this is due to the insanities of academia rather than research itself. However, startups are&nbsp;<a href="https://www.amazon.co.uk/Lean-PhD-Radically-Efficiency-Macmillan/dp/1352002825">somewhat analogous</a>&nbsp;to research (high-risk, difficult, often ambiguous structure), lack institutionalised insanity, and are also acknowledged to be mentally tough.</p><ul><li><p>The most powerful and universally-applicable hack to make something not suck for a human is for that human to do it together with other humans. Also, more humans = more brains.</p></li></ul></li><li><p>Getting new research ideas is often not a particularly big-brained process. I once had the impression that most research ideas come from explicitly thinking hard about research ideas, and that generating fancy ideas would be a major bottleneck. However, I&#8217;ve found that many ideas come with surprisingly little effort, with a feeling of &#8220;well, if I want X, the type of thing I should do is probably Y&#8221;. Whiteboarding with other people is also great.</p><ul><li><p>This is not to say that idea generation isn&#8217;t helped by actively brainstorming hard. Just that it&#8217;s not the only, or even majority, source of ideas.</p></li><li><p>The feeling of ideas being rare is often a newbie phase. You should (and very likely will) get past it quickly if you&#8217;re engaging with a field. John Wentworth has a&nbsp;<a href="https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity">good post</a>&nbsp;on the topic. I have personally experienced an increase in concrete research ideas, and much greater willingness to discard ideas, after going through a few I&#8217;ve felt excited by.</p></li><li><p>When you look at a field from afar, you see a smooth shape of big topics and abstractions. This makes it easy to feel that everything is done.
Once you&#8217;re actually at the frontier, you invariably discover that it&#8217;s full of holes, with many simple questions that don&#8217;t have answers.</p></li></ul></li><li><p>There&#8217;s great benefit to an idea being the&nbsp;<a href="https://www.paulgraham.com/top.html">top thing in your mind</a>.</p></li><li><p>When in doubt, log more. Being able to easily run more analyses is good. At some point you will think to yourself something like &#8220;huh, I wonder if thing X13 had an effect, I&#8217;ll run the statistics&#8221;, and then either thank yourself because you logged the value of X13 in your experiments, or facepalm because you didn&#8217;t. (A minimal sketch of what this can look like in code is after this list.)</p></li><li><p>Tolerate the appearance of stupidity (in yourself and others). Research is an intellectual domain, and humans are status-obsessed monkeys. Humans doing research therefore often feel like they need to appear smart. This can lead to a type of wishful thinking where you hear some idea and try to delude yourself (and others) into thinking you understand it immediately, without actually knowing how it bottoms out into concrete things. Remember that any valid idea or chain of reasoning decomposes into simple pieces. Allow yourself to think about the simple things, and ask questions about them.</p><ul><li><p>There is an anecdote about Niels Bohr (related by George Gamow and quoted&nbsp;<a href="https://slimemoldtimemold.com/2022/02/10/the-scientific-virtues/">here</a>): &#8220;Many a time, a visiting young physicist (most physicists visiting Copenhagen were young) would deliver a brilliant talk about his recent calculations on some intricate problem of the quantum theory. Everybody in the audience would understand the argument quite clearly, but Bohr wouldn&#8217;t. So everybody would start to explain to Bohr the simple point he had missed, and in the resulting turmoil everybody would stop understanding anything. Finally, after a considerable period of time, Bohr would begin to understand, and it would turn out that what he understood about the problem presented by the visitor was quite different from what the visitor meant, and was correct, while the visitor&#8217;s interpretation was wrong.&#8221;</p></li></ul></li><li><p><a href="https://quoteinvestigator.com/2018/10/13/ship/">&#8220;Real&nbsp;<s>artists</s>&nbsp;researchers ship&#8221;</a>. Like in anything else, iteration speed really matters.</p><ul><li><p>Sometimes high iteration speed means schlepping. You should not hesitate to schlep. The deep learning revolution&nbsp;<a href="https://en.wikipedia.org/wiki/AlexNet">started</a>&nbsp;when some people wrote a lot of low-level CUDA code to get a neural network to run on a GPU. I once reflected on why my experiments were going slower than I hoped, and realised a mental ick for hacky code was making me go about things in a complex roundabout way. I spent a few hours writing ugly code in Jupyter notebooks, got results, and moved on. Researchers are notorious for writing bad code, but there are reasons (apart from laziness and lack of experience) why the style of researcher code is sometimes different from standards of good software.</p></li><li><p>The most important thing is doing informative things that make you collide with reality at a high rate, but being even slightly strategic will give great improvements on even that. Jacob Steinhardt gives good advice about this in&nbsp;<a href="https://cs.stanford.edu/~jsteinhardt/ResearchasaStochasticDecisionProcess.html">Research as a Stochastic Decision Process</a>. In particular, start with the thing that is most informative per unit time (rather than e.g.&nbsp;the easiest to do).</p></li></ul></li></ul>
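<p>(To make the &#8220;log more&#8221; point concrete, here is a minimal sketch in Python. Everything here - the file name, the config fields, the fake training loop - is hypothetical, and a real project might use a proper experiment tracker instead. The idea is just: dump the full config and every per-step metric to disk, so that later &#8220;did X13 matter?&#8221; questions can be answered without re-running anything.)</p><pre><code>import json, time

def log_jsonl(path, record):
    # Append one JSON record per line; cheap, and trivially greppable later.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Log the full config up front, including flags you currently think are irrelevant.
config = {"learning_rate": 3e-4, "seed": 0, "x13": True}
log_jsonl("run.jsonl", {"event": "config", "time": time.time(), **config})

for step in range(3):  # stand-in for a real training loop
    metrics = {"step": step, "loss": 1.0 / (step + 1)}  # stand-in metrics
    log_jsonl("run.jsonl", {"event": "metrics", "time": time.time(), **metrics})
</code></pre>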
<h2>Good things to read on research skill</h2><p>(I have already linked to some of these above.)</p><ul><li><p>General advice on research from experienced researchers</p><ul><li><p><a href="https://www.cs.virginia.edu/~robins/YouAndYourResearch.pdf">You and Your Research</a>&nbsp;(Richard Hamming &#8211; old but still unbeaten. Hamming also has a&nbsp;<a href="https://www.goodreads.com/book/show/530415.The_Art_of_Doing_Science_and_Engineering">book</a>&nbsp;that includes this lecture among other material, but the lecture is the best bit of it and a good 80/20.)</p></li><li><p><a href="https://terrytao.wordpress.com/career-advice/">Career advice</a>&nbsp;(Terry Tao)</p></li><li><p><a href="https://cs.stanford.edu/~jsteinhardt/ResearchasaStochasticDecisionProcess.html">Research as a Stochastic Decision Process</a>&nbsp;(Jacob Steinhardt)</p></li><li><p><a href="https://www.lesswrong.com/posts/EF5M6CmKRd6qZk27Z/my-research-methodology">My research methodology</a>&nbsp;(Paul Christiano)</p></li><li><p><a href="http://joschu.net/blog/opinionated-guide-ml-research.html">An Opinionated Guide to ML Research</a>&nbsp;(John Schulman)</p></li><li><p><a href="https://www.eugenevinitsky.com/posts/PhD_a_retrospective_analysis.html">PhD: a retrospective analysis</a>&nbsp;(Eugene Vinitsky)</p></li></ul></li><li><p>John Wentworth&#8217;s posts about specific research meta-topics</p><ul><li><p><a href="https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring">You Are Not Measuring What You Think You Are Measuring</a></p></li><li><p><a href="https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity">The Feeling of Idea Scarcity</a></p></li><li><p><a href="https://www.lesswrong.com/posts/pT48swb8LoPowiAzR/everyday-lessons-from-high-dimensional-optimization">Everyday Lessons from High-Dimensional Optimization</a></p></li><li><p><a href="https://www.lesswrong.com/posts/nvP28s5oydv8RjF9E/mats-models#Jason_Crawford_s_Model___Bits_of_Search">MATS Models</a></p></li><li><p><a href="https://www.lesswrong.com/posts/GhFoAxG49RXFzze5Y/what-s-so-bad-about-ad-hoc-mathematical-definitions">What&#8217;s So Bad About Ad-Hoc Mathematical Definitions?</a></p></li></ul></li><li><p>Relevant Paul Graham essays</p><ul><li><p><a href="https://www.paulgraham.com/top.html">The Top Idea in Your Mind</a></p></li><li><p><a href="https://www.paulgraham.com/greatwork.html">How to do Great Work</a></p></li></ul></li><li><p>Advice aimed at new alignment researchers</p><ul><li><p><a href="https://www.lesswrong.com/posts/wYEwx6xcY2JxBJsfA/qualities-that-alignment-mentors-value-in-junior-researchers">Qualities that alignment mentors value in junior researchers</a></p></li><li><p><a href="https://www.lesswrong.com/s/mCkMrL9jyR94AAqwW/p/h5CGM5qwivGk2f5T9">7 traps (we think) new alignment researchers fall into</a></p></li><li><p><a href="https://www.lesswrong.com/posts/fqryrxnvpSr5w2dDJ/touch-reality-as-soon-as-possible-when-doing-machine">Touch reality as soon as possible (when doing machine learning research)</a></p></li></ul></li><li><p><a href="https://www.lesswrong.com/posts/AtfQFj8umeyBBkkxa/a-bird-s-eye-view-of-the-ml-field-pragmatic-ai-safety-2">A Bird&#8217;s Eye View of the ML Field</a>&nbsp;(a good overview of how the ML field works)</p></li><li><p><a href="https://web.stanford.edu/~fukamit/schwartz-2008.pdf">The importance of
stupidity in scientific research</a>&nbsp;(short and sweet)</p></li><li><p><a href="https://colah.github.io/notes/taste/">Research Taste Exercises</a>&nbsp;(what it says on the tin)</p></li></ul>]]></content:encoded></item><item><title><![CDATA[A Disneyland Without Children]]></title><description><![CDATA[The spaceship swung into orbit around the blue-grey planet with a final burn of its engines.]]></description><link>https://www.nosetgauge.com/p/a-disneyland-without-children</link><guid isPermaLink="false">https://www.nosetgauge.com/p/a-disneyland-without-children</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Sun, 04 Jun 2023 12:57:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HdSA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F816cef70-50a0-4954-a8ce-8f712e1248e8_460x460.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The spaceship swung into orbit around the blue-grey planet with a final burn of its engines. Compared to the distance they had travelled, the world, now only some four hundred kilometres below and filling up one hemisphere of the sky, was practically within reach. But Alice was no less confused.</p><p>&#8220;Well?&#8221; she asked.</p><p>Charlie stared thoughtfully at the world slowly rotating underneath their feet, oceans glinting in the sunlight. &#8220;It looks lickable&#8221;, he said.</p><p>&#8220;We have a task&#8221;, Alice said, trying to sound gentle. Spaceflight was hard. Organic life was not designed for it. But their mission was critical, they needed to move fast, and Charlie, for all his quirks, would need to be focused.</p><p>&#8220;What&#8217;s a few minutes when it will take years for anything we discover to be known back home?&#8221; Charlie asked.</p><p>&#8220;No licking&#8221;, Alice said.</p><p>Charlie rolled his eyes, then refocused them on the surface of the planet below. They were just crossing the coast of one of the larger continents. Blue water was giving way to grey land.</p><p>&#8220;Look at the texture&#8221;, Charlie said. They had seen it from far away with telescopes, but there was something different about seeing it with their bare eyes. Most of the land surface of the planet was like a rug of fine grey mesh. If there had been lights, Alice would have guessed the entire planet&#8217;s land was one sprawling city, but as far as their instruments could tell, the world had no artificial lighting.</p><p>As far as they could tell, the world also had no radio. They had broadcast messages at every frequency they could, and in desperation had even used their engines to flash a message during their deceleration burn. No response had come.</p><p>Alice pulled up one of the telescope feeds on the computer to look closer at the surface. She saw grey rectangular slabs, typically several hundred metres on a side, with wide roads running between them. The pattern was not perfect - sometimes it was irregular, and sometimes there were smaller features too. Some of the smaller ones moved.</p><p>&#8220;Are they factories?&#8221; Charlie asked.</p><p>&#8220;I&#8217;d guess so&#8221;, Alice said, watching on the telescope feed as a steady stream of rectangular moving objects, each about ten metres long, slid along a street.
Another such stream was moving along an intersecting street, and it looked like they would crash at the intersection, but the timing and spacing were such that vehicles from one stream crossed the road just as there were gaps in the other stream.</p><p>&#8220;A planet covered by factories, then&#8221;, Charlie said. &#8220;With no one home to turn the lights on.&#8221;</p><p>&#8220;I want to see what they&#8217;re making&#8221;, Alice said.</p><h2>-</h2><p>All through the atmospheric entry of their first drone package, Alice sat tight in her seat and clenched and unclenched her hands. So far all they had done was passive observation or broadcasting. A chunky piece of hardware tracing a streak of red-hot plasma behind it was a much louder knock. She imagined alien jet fighters scrambling to destroy their drones, and some space defence mechanism activating to burn their ship.</p><p>The image she saw was a jittery camera feed, showing the black back of the heatshield, the grey skin of the drone package, and a sliver of blue sky. It shook violently as the two halves of the heatshield detached from each other and then from the drone package, tumbling off in opposite directions. Land became visible, kilometres below, the grey blocks of the buildings tiny like children&#8217;s blocks but still visibly three-dimensional, casting shadows and moving as the drone package continued falling.</p><p>The three drones tested their engines, and for a moment flew - or at least slowed their descent - in an ungainly joint configuration, before breaking off from each other and spreading their wings to the fullest. The feed showed the other two drones veering off into the distance on long, narrow wings, and then the view pulled up as the nose of the drone lifted from near-vertical to horizontal.</p><p>&#8220;Oops, looks like we have company&#8221;, Charlie said. He had been tapping away at some other screens while Alice watched the drone deployment sequence.</p><p>Alice jumped up from her seat. &#8220;What?&#8221;</p><p>&#8220;Our company is &#8230; a self-referential joke!&#8221;</p><p>Alice resisted the temptation to say anything and instead sank back into her seat. On her monitor, the grey blocks continued slowly moving below the drone. She tapped her foot against the ground.</p><p>&#8220;Actually though&#8221;, Charlie said. &#8220;We&#8217;re not the only ones in orbit around this planet.&#8221;</p><p>&#8220;What else is orbiting? Has your sense of shame finally caught up with you and joined us?&#8221;</p><p>&#8220;Looks like satellites. Far above us, though. Can you guess how far?&#8221;</p><p>&#8220;I&#8217;d guess approximately the distance between you and maturity, so &#8230; five light-years?&#8221;</p><p>Charlie ignored her. &#8220;Exactly geostationary altitude&#8221;, he said, grinning. The grin was like some platonic ideal of intellectual excitement; too pure for Alice&#8217;s annoyance to stay with her, or for her to feel scared about the implications.</p><p>&#8220;But nothing in lower orbits?&#8221; Alice asked.</p><p>&#8220;No&#8221;, Charlie said. &#8220;Someone clearly put them there; stuff doesn&#8217;t end up at exactly geostationary altitude unless someone deliberately flies a communications or weather satellite there.
Now I can&#8217;t be entirely sure that the geostationary satellites are completely dead, but I&#8217;d guess that they are.&#8221;</p><p>&#8220;Like everything else&#8221;, Alice said, but even as she said so she caught sight of a long trail of vehicles making its way along one of the roads. There was something more real about seeing them on the drone feed.</p><p>&#8220;Maybe this is just a mining outpost&#8221;, Charlie said. &#8220;Big rocket launch to blast out a billion tons of ore to god-knows-where, once a year.&#8221;</p><p>&#8220;Or maybe they&#8217;re hiding underground or in the oceans&#8221;, Alice said.</p><p>&#8220;Let&#8217;s get one of the drones to drop a probe into the oceans. I&#8217;ll send one of our initial trio over to the nearest one; it&#8217;s only a few hundred kilometres away&#8221;, Charlie said.</p><p>&#8220;Sure&#8221;, Alice said.</p><p>They split the work of flying the drones, two of them mapping out more and more of the Great Grey Grid (as Alice took to calling it in her head), and one flying over the planet&#8217;s largest ocean.</p><p>Even the oceans were mostly a barren grey waste. Not empty, though. They did eventually see a few small scaly fish-like creatures that stared at their environment with uncomprehending eyes. Alien life. A young Alice would have been ecstatic. But now she was on a mission, and her inability to figure out what had happened on this planet annoyed her.</p><p>In addition to the ocean probe, they had rovers they could send crawling along the ground. Sometimes the doors of the square buildings were open, and Alice would drive a rover past one opening. Most seemed to be either warehouses of stacked crates, or some kind of automated assembly line of skeletal grey robot arms and moving conveyor belts. A few seemed to place more barriers between the open air and their contents; what went on there, the rovers did not see.</p><p>The first time Alice tried to steer a rover into a building, it got run over by a departing convoy of vehicles. The vehicles were rectangular in shape but with an aerodynamic head, with three wheels on each side. Based on their dimensions, she could easily imagine one weighing ten or twenty tons. The rover had no chance.</p><p>&#8220;Finally!&#8221; Charlie had said. &#8220;We get to fight these aliens.&#8221;</p><p>But there was no fight. It seemed like it had been a pure accident, without any hint of malice. The grey vehicles moved and stopped on some schedule of their own, and for all Alice knew they were not just insensitive beasts but blind and dumb ones too.</p><p>The next rover got in, quickly scooting through the side of the entrance and then off to one side, out of the path of the grey vehicles. It wandered on its own through the otherwise-dark building, headlights on, bringing back a video stream of an assembly line brooded over by those same skeletal hands they had glimpsed from outside. Black plastic beads came in by the million on the grey vehicles. A small thin arm with a spike on the end punctured a few holes on one side, and using these holes two of the black beads were sewn onto an amorphous plushy shape. The shape got appendages, was covered with a layer of fluff, and the entire thing became a cheerful purple when it passed through an opaque box with pipes leading into it. It looked like a child&#8217;s impression of a hairy four-legged creature with black beady eyes above a long snout.
A toy, but for whom?</p><p>The conveyor belt took an endless line of those fake creatures past the rover&#8217;s camera at the end of the assembly line. Alice watched them go, one by one, and fall onto the open back of a grey vehicle. It felt like each and every one made eye contact with her, beady black eyes glinting in the light. She watched for a long time as the vehicle filled up. Once it did, a panel slid over the open top to close the cargo bay, and it sped off out the door. The conveyor belt kept running, but there was a gap of a few metres to the next plushy toy. It came closer and closer to the end - and suddenly a vehicle was driving into place, and the next creature was falling, and it just barely landed in the storage hold as the vehicle slid into position.</p><p>&#8220;How scary do you find the Blight?&#8221; Alice asked.</p><p>&#8220;Scary enough that I volunteered for this mission&#8221;, Charlie said.</p><p>Alice remembered the charts they had been shown. They had been hard to miss; even the news, usually full of celebrity gossip and political machinations, had quickly switched to concentrating on the weirdness in the sky once the astronomers spotted it. Starlight dimming in many star systems and what remained of the light spectra shifting towards the infrared. Draw a boundary around the affected area, and you get a sphere 30 light-years wide, expanding at a third of the speed of light. At the epicentre, a world that had shown all the signs of intelligent life that could be detected from hundreds of light-years away - a world that astronomers had broadcast signals to in the hopes of finally making contact with another civilisation - had suddenly gone quiet and experienced a total loss of oxygen in its atmosphere. The Blight, they had called it.</p><p>In the following years, civilisation had mobilised. A hundred projects had sprung forth. One of them: go investigate the star system that was the second-best candidate for intelligent life, but had refused to answer radio signals, and see if someone was there to help. That was why they were here.</p><p>&#8220;I think I found something as scary as the Blight&#8221;, Alice said. &#8220;Come look at this.&#8221;</p><p>The purple creatures kept parading past the camera feed.</p><h2>-</h2><p>Over the next five days, while the Blight advanced another forty billion kilometres towards everything they loved back home, Alice and Charlie were busy compiling a shopping catalogue.</p><p>&#8220;Computers&#8221;, Alice said. &#8220;Of every kind. A hundred varieties of phones, tablets, laptops, smartwatches, smartglasses, smart-everything.&#8221;</p><p>&#8220;Diamonds and what seems to be jewellery&#8221;, Charlie said.</p><p>&#8220;Millions of tons of every ore and mineral.&#8221; They had used their telescopes on what seemed to be a big mine, but they had barely needed them. It was like a huge gash in the flesh of a grey-fleshed and grey-blooded giant, complete with roads that looked like sutures. There were white spots in the image, tiny compared to the mine, each one a sizeable cloud.</p><p>&#8220;Clothes&#8221;, Charlie continued. &#8220;Lots and lots of clothes of different varieties. They seem to be shipped around warehouses until they&#8217;re recycled.&#8221;</p><p>&#8220;Cars. Sleek electric cars by the million. But we never see them used on the roads, though there are huge buildings where brand-new cars are recycled.
And airplanes, including supersonic ones.&#8221;</p><p>&#8220;A lot of things that look like server farms&#8221;, Charlie said. &#8220;Including ones underwater and on the poles. There&#8217;s an enormous amount of compute in this world. Like, mind-boggling. I was thinking we should figure out how to plug into all of it and mine some crypt-&#8221;</p><p>&#8220;Ships with nuclear fusion reactors&#8221;, Alice interrupted. There were steady trails of them cutting shortest-path routes between points on the coast.</p><p>&#8220;Solar panels&#8221;, Charlie said. &#8220;Basically every spare surface. The building roofs are all covered with solar panels.&#8221;</p><p>&#8220;And children&#8217;s plush toys&#8221;, Alice said.</p><p>They were silent for a while.</p><p>&#8220;We have a decent idea of what these aliens looked like&#8221;, Alice said. &#8220;They were organic carbon-based lifeforms, like us. Similar in size too, also bipedal. And it&#8217;s like they left some ghostly satanic industrial amusement park running, going through all the motions in their absence, and disappeared.&#8221;</p><p>&#8220;And they didn&#8217;t go to space, as far as we know&#8221;, Charlie said.</p><p>&#8220;At least we don&#8217;t have any more Blights to worry about then&#8221;, Alice said. &#8220;I can&#8217;t help imagining that the Blight is something like this. Something that just tiles planets with a Great Grey Grid, does something even worse to the stars, and then moves on.&#8221;</p><p>&#8220;They had space technology, but apparently whoever built the Great Grey Grid didn&#8217;t fancy it&#8221;, Charlie said. &#8220;The satellites might predate it. Probably there were satellites in lower orbits too, but their orbits decayed and they fell down, so we only see the geostationary ones up high.&#8221;</p><p>&#8220;And then what?&#8221; Alice said. &#8220;All of them vanished into thin air and left behind a highly-automated ghost-town?&#8221;</p><p>Charlie shrugged.</p><p>&#8220;Can we plug ourselves into their computers?&#8221; Alice asked.</p><p>&#8220;To mine cr-?&#8221;</p><p>&#8220;To see if anyone&#8217;s talking.&#8221;</p><p>Charlie groaned. &#8220;You can&#8217;t just plug yourself into a communication system and see anything except encrypted random-looking noise.&#8221;</p><p>&#8220;How do you know they encrypt anything?&#8221;</p><p>&#8220;It would be stupid not to&#8221;, Charlie said.</p><p>&#8220;It would be stupid to blind yourself to the rest of the universe and manufacture a billion plush toys&#8221;, Alice said.</p><p>&#8220;Seems like it will work for them until the Blight arrives.&#8221;</p><h2>-</h2><p>Alice floated in the middle of the central corridor of the ship. The ship was called <em>Legacy</em>, but even before launch they had taken to calling it &#8220;Leggy&#8221; for short. The central corridor linked the workstation at the front of the ship where they spent most of their days to the storage bay at the back. In the middle of the corridor, three doors at 120-degree angles from each other led to the small sleeping rooms, each of them little more than a closet.</p><p>Alice had woken up only a few minutes ago, and still felt an early-morning grogginess as well as the pull of her bed. The corridor had no windows or video feeds, but was dimly lit by the artificial blue light from the workstation. They were currently on the night side of the planet.</p><p>She took a moment to look at the door of the third sleeping room.
It was closed, like always, with its intended inhabitant wrapped in an air-tight seal of plastic in a closed compartment of the storage bay. They would flush him into space before they left for home again; they could have no excess mass on the ship for the return journey.</p><p>Alice thought again of the hectic preparations for the mission. Apart from Blightsource, this was the only planet the astronomers had spotted that might have intelligent life on it, and the indications were vague. But when you look into space and see something that looks like an approaching wall of death - well, that has a certain way of inspiring long-shots. Hence the mission, hence <em>Legacy</em>&#8217;s flight, hence crossing over the vast cold stretch of interstellar space to see if any answers could be found on this world. Hence Bob&#8217;s death while in cryonic suspension for the trip. Hence the hopes of all civilisation potentially resting on her and Charlie figuring out something valuable.</p><p>If Charlie and she could find something on this world, some piece of insight or some tool or weapon among the countless pieces of technological wizardry that this world had in spades, that had a credible chance against the Blight when it arrived &#8230; maybe there was hope.</p><p>Alice pushed off the wall and set herself in a slow spinning motion. The ship seemed to revolve around her. Bob&#8217;s door revolved out of sight, and Charlie&#8217;s door became visible -</p><p>Wait.</p><p>Her gravity-bound instincts kicked in and she tried to stop the spin by shoving back with her hands, but there was nothing below her, so she remained spinning slowly. She breathed in deeply to calm herself down, then kicked out a foot against the wall to push herself to the opposite one. She grabbed one of the handles on the wall and held onto it.</p><p>The light on Charlie&#8217;s room was off. That meant it was empty.</p><p>&#8220;Charlie!&#8221; Alice called.</p><p>No response.</p><p>The fear came fast. Here she was, light-years from home, perhaps all alone on a spaceship tracing tight circles around a ghostly automated graveyard planet. The entire mass of the planet stood between her and the sun. Out between the stars, the Blight was closing in on her homeworld. She counted to calm herself down; one, two, three, &#8230; and just like that, the Blight was three hundred thousand kilometres closer to home. Unbidden, an image of the fluffy purple creature popped up in her mind, complete with its silly face and unblinking eye contact.</p><p>Soundlessly, she used the handles on the wall of the corridor to pull herself towards the workstation. She reached the door, peered inside -</p><p>There was Charlie, staring at a computer screen. He looked up and saw Alice. &#8220;You scared me!&#8221; he said. &#8220;Watch out, no need to sneak behind me so quietly.&#8221;</p><p>&#8220;I called your name&#8221;, Alice said.</p><p>&#8220;I know, I know&#8221;, Charlie said. &#8220;But I&#8217;m on to something here, and I just want to run a few more checks and then surprise you with the result.&#8221;</p><p>&#8220;What result?&#8221; Alice glanced at some of the screens. Two of the drones were above the Great Grey Grid, one above the ocean. With their nuclear power source, they could stay in the air as long as they wanted.
Even though their focus was no longer aerial reconnaissance, there was no reason not to keep them mapping the planet from up close, occasionally picking up things that their surveys from the ship did not.</p><p>&#8220;I fixed the electrical issues with the rover and the cable near the data centre&#8221;, Charlie said.</p><p>&#8220;So you&#8217;re getting data, not just frying our equipment?&#8221;</p><p>&#8220;Yes&#8221;, Charlie said. &#8220;And guess what?&#8221;</p><p>&#8220;What?&#8221;</p><p>&#8220;Guess!&#8221;</p><p>&#8220;You found a Blight-killer&#8221;, Alice said.</p><p>&#8220;No! Even better! These idiots don&#8217;t encrypt their data as far as I can tell. And I think a lot of it is natural language.&#8221;</p><p>&#8220;Okay, and can we figure out what it means?&#8221;</p><p>&#8220;We have automated programs for trying to derive syntax rules and so on&#8221;, Charlie said. &#8220;They&#8217;ve already found something, including good guesses of which words are prepositions and what type of grammar they have. But mapping words to meaning based purely on statistics of how often they occur is hard.&#8221;</p><p>&#8220;I&#8217;ve seen products they have with pictures and instruction manuals&#8221;, Alice said. &#8220;We could start there.&#8221;</p><p>&#8220;Oh no&#8221;, Charlie said. &#8220;This is going to be a long process.&#8221;</p><h2>-</h2><p>By chance, it turned out not to be. Over the next day, they sent a rover to a furniture factory and managed, after some attempts, to steal an instruction leaflet out of a printer before the robotic arm could snatch it to be packaged with the furniture. Somehow Alice was reminded of her childhood adventures stealing fruit from the neighbour&#8217;s garden.</p><p>They had figured out which words meant &#8220;cupboard&#8221;, &#8220;hammer&#8221;, &#8220;nail&#8221;, and so on. But then another rover on the other side of the world had seen something. It was exploring a grey and windy coast. On one side of the rover was the Great Grey Grid and the last road near the coast, the occasional vehicle hurtling down it. But on the other side was a stretch of rocky beach hammered by white-tipped waves, a small sliver of land that hadn&#8217;t been converted to grey.</p><p>The land rose by the beach, forming a small hill with jagged rocky sides. The sun shone down on one face of it, but there was a hollow, or perhaps a small cave, that was left in the dark by the overhanging rock. And around this entrance, several unmistakable symbols had been scratched into the rock, each several metres high.</p><p>Alice took manual control of the rover and carefully instructed it to drive over the rocky beach towards the cave entrance. On the way it passed what seemed to be a fallen metal pole with some strips of fabric still clinging to it.</p><p>Once it was close enough to the mouth of what turned out to be a small cave, the camera could finally see inside.</p><p>There was a black cabinet inside. Not far from it, lying on the ground, was the skeleton of a creature with four slender limbs and a large head. Empty eye sockets stared out towards the sky.</p><p>Alice felt her heart beating fast. It wasn&#8217;t quite right; many of the anatomical details were off. But it was close enough, the similarity almost uncanny. Here, hundreds of light-years away, evolution had taken a similar path, and produced sapience.
And then killed it off.</p><p>&#8220;Charlie&#8221;, she said in a hoarse voice.</p><p>&#8220;What?&#8221; Charlie asked, sounding annoyed. He had been staring at an instruction manual for a chair, but he looked up and saw the video feed. &#8220;Oh&#8221;, he said, in a small voice. &#8220;We found them.&#8221;</p><p>Alice tore her eyes away from the skeleton and to the small black cabinet. It had a handle on it. She had the rover extend an arm and open it.</p><h2>-</h2><p>The capsule docked with Leggy, and in the weightless environment they pushed the cabinet easily into the ship. They had only two there-and-back-again craft - getting back to orbit was hard - but they had quickly decided to use one to get this cabinet up. It had instructions, after all; very clear instructions, though ones that their rovers couldn&#8217;t quite follow.</p><p>It started with a pictographic representation, etched onto plastic cards, of how you were supposed to read the disks. They managed to build something that could read the microscopic grooves on the disk as per the instructions, and transfer the data to their computers.</p><p>After a few hours of work, they had figured out the encodings for numbers, the alphabet, their system of units, and seemingly also some data formats, including for images.</p><p>Confirmation came next. The next item on the disk was an image of two of the living aliens, standing on a beach during a sunset. Alice stared into their faces for a long time.</p><p>Next came about fifty images, each next to what was clearly a word of text. Some of the more abstract ones took a few guesses, but ultimately they thought they had a base vocabulary, and with the help of some linguistics software, it did not take very long before they had a translated vocabulary list of about eight thousand words.</p><p>Alice was checking the work when Charlie almost shouted: &#8220;Look at this!&#8221;</p><p>Alice looked at what he was pointing at. It was a fragment of text that read:</p><blockquote><p>Hello,</p><p>The forms for ordering the new furniture are attached. Please fill them in and we will respond to your order as quickly as we can!</p><p>If you need any help, please contact customer support. You will find the phone number on our website.</p></blockquote><p>&#8220;What is this? Is Mr Skeleton trying to sell us furniture from beyond the grave?&#8221; Alice asked.</p><p>&#8220;No&#8221;, Charlie said. &#8220;This isn&#8217;t what I got from the recovered data; I haven&#8217;t looked at the big remaining chunk yet. This is what I got by interpreting one of the packets of data running on the cables that our rover is plugged into, using what we now know about their data formats and the language.&#8221;</p><p>&#8220;And?&#8221;</p><p>&#8220;I don&#8217;t get it!&#8221; Charlie said. &#8220;Why would a world of machines send each other emails in natural language?&#8221;</p><p>&#8220;Why would they manufacture plushy toys? I doubt the robotic arms need cuddles.&#8221;</p><p>Charlie looked at the world, slowly spinning underneath their ship. &#8220;Being so close to it makes me feel creeped out. I don&#8217;t get it.&#8221;</p><p>&#8220;You don&#8217;t want to lick it anymore?&#8221; Alice asked. She decided not to tell Charlie about her own very similar feelings earlier, when she thought for a moment Charlie had gone missing.</p><p>Charlie ignored her. &#8220;I think the last thing on Mr Skeleton&#8217;s hard-drive is a video&#8221;, he said.
&#8220;I&#8217;ve checked and it seems to play.&#8221;</p><p>&#8220;You looked at it first?&#8221; Alice said in a playfully mocking tone. The thrill of discovery was getting to her.</p><p>&#8220;Only the first five frames&#8221;, Charlie said. &#8220;Do you want to watch it?&#8221;</p><h2>-</h2><p><em>Our Civilisation: A Story</em> read a short subtitle, white on black, auto-translated by a program using the dictionary they had built up.</p><p>There was a brief shot of some semi-bipedal furry creature walking in the forest. Then one of a fossilised skeleton of something more bipedal and with a bigger head. Then stone tools: triangular ones that might have been spear tips, saw-toothed ones, clubs. A dash of fading red paint on a rock surface, in the shape of a cartoon version of that same bipedal body plan.</p><p>There were two pillars of stone in a desert on what looked like a pedestal, some faded inscription at its base and the lone and level sands stretching far away. There was a shot of an arrangement of rocks, some balancing on top of two others, amid a field of green. A massive pyramidal stone structure, lit by the rising sun.</p><p>Blocky written script etched on a stone tablet. Buildings framed by columns of marble. A marble statue of one of the aliens, a sling carelessly slung over its shoulder, immaculate in its detail. A spinning arrangement of supported balls orbiting a larger one. <em>And still it moves</em>, the subtitles flashed.</p><p>A collection of labelled geometric diagrams on faded yellow paper. <em>Mathematical Principles of Natural Philosophy</em>.</p><p>A great ornate building with a spire. A painting of a group of the aliens clad in colourful clothing. An ornate piece of writing. <em>We hold these truths to be self-evident &#8230;</em></p><p>A painting of a steam locomotive barrelling along tracks. A diagram of a machine. A black-and-white picture of one of the aliens, then another. <em>Government of the people, by the people, for the people, shall not perish &#8230;</em></p><p>An alien with white hair sticking up, holding a small stick of something white and with diagrams of cones behind him. Grainy footage of propeller aircraft streaking through the sky, and then of huge masses of people huddling together and walking across a barren landscape, and then of aliens all in the same clothes charging a field, some of them suddenly jerking about and falling to the ground. <em>We shall fight on the beaches, we shall fight on the landing grounds &#8230;</em></p><p>Black-and-white footage of a mushroom cloud slowly rising from a city below. A picture, in flat pale blue and white, showing a stylised representation of the world&#8217;s continents. The same picture, this time black-and-white, on the wall of a room where at least a hundred aliens were sitting.</p><p>An alien giving a speech. <em>I have a dream</em>. An alien, looking chubby in a space suit, standing on a barren rocky surface below an ink-black sky next to a pole with a colourful rectangle attached to it.</p><p>Three aliens in a room, looking at the camera and holding up a piece of printed text. <em>Disease eradicated</em>.</p><p>What looked like a primitive computer. A laptop computer.
An abstract helical structure of balls connected by rods, and then flickering letters dancing across the screen.</p><p>A blank screen, an arrow extending left to right across it - <em>time</em>, flashed the subtitles - and then another arrow from the bottom-left corner upwards - <em>people in poverty</em> - and then a line crawling from left to right, falling as it did so.</p><p>A line folding itself up into a complicated shape. <em>AI system cracks unsolved biology problem</em>.</p><p>From then on, the screen showed pictures of headlines.</p><p><em>All routine writing tasks now a solved problem, claims AI company</em>.</p><p><em>Office jobs increasingly automated</em>.</p><p><em>Three-fourths of chief executives of companies on the [no translation] admit to using AI to help write emails, one-third have had AI write a shareholder letter or strategy document</em>.</p><p><em>Exclusive report: world&#8217;s first fully-automated company, a website design agency.</em></p><p><em>Mass layoffs as latest version of [no translation] adopted at [no translation]; &#8216;stunning performance&#8217; at office work.</em></p><p><em>Nations race to reap AI productivity gains: who will gain and who will lose?</em></p><p><em>CEO of [no translation] resigns, claiming job pointless amid both internal and board pressure to defer to &#8220;excellently-performing&#8221; AI in all decisions.</em></p><p><em>[No translation] ousts executive and management team, announces layoffs; board supports replacing them with AI to keep up with competition.</em></p><p><em>Entirely or mostly automated companies now delivering 2.5x higher returns on investment on average; &#8216;the efficiency difference is no joke&#8217;, says chair of [no translation].</em></p><p><em>Year-on-year economic growth hits 21% among countries with advanced AI access.</em></p><p><em>Opinion: the new automated economy looks great on paper but is not serving the needs of real humans.</em></p><p><em>Mass protests after [no translation], a think-tank with the ear of the President, is discovered to be funded and powered by AI board of [no translation], and to have practically written national economic policy for the past two years.</em></p><p><em>&#8216;No choice but forward&#8217;, says [no translation] after latest round of worries about AI; unprecedented economic growth still strong.</em></p><p><em>[No translation 1] orders raid of [no translation 2] over fears [no translation 2] is not complying with latest AI use regulations, but cannot execute order due to noncompliance from the largely-automated police force; &#8216;we are working with our AI advisers and drivers in accordance with protocol, and wish to assure the [no translation 3] people that we are still far from the sci-fi scenario where our own police cars have rebelled against us.&#8217;</em></p><p><em>&#8216;AI overthrow&#8217; fears over-hyped, states joint panel of 30 top AI scientists and business-people along with leading AI advisory systems; &#8216;they&#8217;re doing a good job maximising all relevant metrics and we should let them keep at it, though businesses need to do a better job of selecting metrics and tough regulation is in order.&#8217;</em></p><p><em>Opinion: we&#8217;re better-off under a regime of rigorous AI decision-making than under corrupt politicians; let the AIs repeat in politics what they&#8217;ve done for business over the last five years.</em></p><p><em>&#8216;The statistics have never looked so good&#8217; - Prime Minister reassures populace as worries mount over
radical construction projects initiated by top AI-powered companies.</em></p><p><em>Expert panel opinion: direct AI overthrow scenario remains distant threat, but more care should be exercised over choice of target metrics; recommend banning of profit-maximisation target metric.</em></p><p><em>Movement to ban profit-maximising AIs picks up pace.</em></p><p><em>Top companies successfully challenge new AI regulation package in court.</em></p><p><em>&#8216;The sliver of the economy over which we retain direct control will soon be vanishingly small&#8217;, warns top economist, &#8216;action on AI regulation may already be too late&#8217;.</em></p><p><em>Unverified reports of mass starvation in [no translation]; experts blame agricultural companies pivoting to more land-efficient industries.</em></p><p><em>Rant goes viral: &#8216;It&#8217;s crazy, man, we just have these office AIs that only exist in the cloud, writing these creepily-human emails to other office AIs, all overseen by yet another AI, and like most of their business is with other AI companies; they only talk to each other, they buy and sell from each other, they do anything as long as it makes those damned numbers on their spreadsheets just keep ticking up and up; I don&#8217;t think literally any human has ever seen a single product out of the factory that just replaced our former neighbourhood, but those factories just keep going up everywhere.&#8217;</em></p><p><em>Revolution breaks out in [no translation]; government overthrown, but it&#8217;s business-as-usual for most companies, as automated trains, trucks, and ships keep running.</em></p><p><em>[No translation] Revolution: Leaked AI-written email discovered, in which the AI CEO ordered reinforcement of train lines and trains three weeks ago. &#8216;We are only trying to ensure the continued functioning of our supply chains despite the recent global unrest, in order to best serve our customers&#8217;, CEO writes in new blog post.</em></p><p><em>[No translation] Revolution: crowds that tried swarming train lines run over by trains; &#8216;the trains didn&#8217;t even slow down&#8217;, claim witnesses. CEO cites fiduciary duties.</em></p><p><em>Despite unprecedented levels of wealth and stability, you can&#8217;t actually do much: new report finds people trying to move house, book flight or train tickets, or start a new job or company often find it difficult or impossible; companies prioritising serving &#8216;more lucrative&#8217; AI customers and often shutting down human-facing services.</em></p><p><em>Expert report: &#8216;no sign of human-like consciousness even in the most advanced AI systems&#8217;, but &#8216;abundantly clear&#8217; that &#8216;the future belongs to them&#8217;.</em></p><p><em>New report: world population shrinking rapidly; food shortages, low birth rates, anti-natalist attitudes fuelled by corporate campaigns to blame.</em></p><p>The screen went blank. Then a video of an alien appeared, sitting up on a rocky surface. It took Alice a moment to realise that it was the same cave they had found the skeleton in. The alien&#8217;s skin was wrapped tight around its bones, and even across the vast gulf of biology and evolutionary history, Alice could tell that it was not far from death. It opened its mouth, and sound came out. Captions appeared beneath it.</p><p>&#8220;It is the end&#8221;, the alien said, its eyes staring at them from between long unkempt clumps of hair. &#8220;On paper, I am rich beyond all imagination. But I have no say in this new world.
And I cannot find food. I will die.&#8221;</p><p>The wind tugged at the alien&#8217;s long hair, but otherwise the alien was so still that Alice wondered if it had died there and then.</p><p>&#8220;There is much I would like to say&#8221;, the alien said. &#8220;But I do not have the words, and I do not have the energy.&#8221; It paused. &#8220;I hope it was not all in vain. Or, that if for us it was, that for someone up there it isn&#8217;t.&#8221;</p><p>The video went blank.</p><p>Alice and Charlie watched the blank screen in silence.</p><p>&#8220;At least the blight they birthed seems to have stuck to their world&#8221;, Charlie said after a while.</p><p>&#8220;Yeah&#8221;, Alice said, slowly. &#8220;But I don&#8217;t think we&#8217;ll find anything here.&#8221;</p><p><em>Legacy</em> completed nine more orbits of the planet, and then jettisoned all unnecessary mass into space. Its engines jabbed against the darkness of space, bright enough to be visible from the planet&#8217;s surface. There was no one to see them.</p><p>In a factory down on the planet, an assembly line of beady-eyed purple plush toys marched on endlessly.</p><div><hr></div><p>The title of this work is taken from a passage in <em>Superintelligence: Paths, Dangers, Strategies</em>, where Nick Bostrom writes:</p><blockquote><p>We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today&#8212;a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. <strong>A Disneyland without children</strong>. [emphasis added]</p></blockquote><p>The outline of events presented draws inspiration from several sources, but most strongly from Paul Christiano&#8217;s article <a href="https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like">What failure looks like</a>.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Deciding not to found a human-data-for-alignment startup]]></title><description><![CDATA[8.6k words (~30 minutes)]]></description><link>https://www.nosetgauge.com/p/deciding-not-to-found-human-data-for</link><guid isPermaLink="false">https://www.nosetgauge.com/p/deciding-not-to-found-human-data-for</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Tue, 27 Sep 2022 20:38:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HdSA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F816cef70-50a0-4954-a8ce-8f712e1248e8_460x460.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>8.6k words (~30 minutes)<strong> <br></strong></em></p><p><em><strong>Both the project and this write-up were a collaboration with Matt Putz. </strong></em><strong><br></strong></p><p><strong><a href="https://forum.effectivealtruism.org/users/mathieu-putz">Matt Putz</a> and I worked together for the first half of the summer to figure out if we should found a startup with the purpose of helping AI alignment researchers get the datasets they need to train their ML models (especially in cases where the dataset is based on human-generated data).
This post, also published on the <a href="https://forum.effectivealtruism.org/posts/iBeWbfQLA9EKfsdhu/why-we-re-not-founding-a-human-data-for-alignment-org">Effective Altruism Forum</a> and <a href="https://www.lesswrong.com/posts/qArDMixsx77a9xL45/why-we-re-not-founding-a-human-data-for-alignment-org-1">LessWrong</a> (both of which may contain additional discussion in the comments), is a summary of our findings, and of why we decided not to do it.</strong><br></p><h1>Summary</h1><p><strong>One-paragraph summary:&nbsp;</strong>we (two recent graduates) spent about half of the summer exploring the idea of starting an organisation producing custom human-generated datasets for AI alignment research. Most of our time was spent on customer interviews with alignment researchers to determine if they have a pressing need for such a service. We decided not to continue with this idea, because there doesn&#8217;t seem to be a human-generated data niche (unfilled by existing services like Surge) that alignment teams would want outsourced.</p><p><strong>In more detail</strong>: The idea of a human datasets organisation was&nbsp;<a href="https://forum.effectivealtruism.org/posts/MBDHjwDvhDnqisyW2/awards-for-the-future-fund-s-project-ideas-competition">one of the winners of the Future Fund project ideas competition,&nbsp;</a>still figures on their&nbsp;<a href="https://ftxfuturefund.org/projects/high-quality-human-data-for-ai-alignment-nbsp/">list</a> of project ideas, and had been advocated before then by some people, including Beth Barnes. Even though we ended up deciding against it, we think this was a reasonable and high-expected-value idea for these groups to advocate at the time.</p><p>Human-generated data is often needed for ML projects or benchmarks if a suitable dataset cannot be e.g. scraped from the web, or if human feedback is required. Alignment researchers conduct such ML experiments, but sometimes have different data requirements than standard capabilities researchers. As a result, it seemed plausible that there was some niche unfilled by the market to help alignment researchers solve problems related to human-generated datasets. In particular, we thought - and to some extent confirmed - that the most likely such niche is human data generation that requires particularly competent or high-skill humans. We will refer to this as&nbsp;<strong>high-skill (human) data</strong>.</p><p>We (Matt &amp; Rudolf) went through <a href="https://forum.effectivealtruism.org/posts/8QfQcFyj6aGNM78kz/learning-from-matching-co-founders-for-an-ai-alignment">an informal co-founder matching process along with four other people</a> and were chosen as the co-founder pair to explore this idea. In line with standard startup advice, our first step was to explore whether or not there was a concrete current need for this product, by conducting interviews with potential customers. We talked to about 15 alignment researchers, most of them selected on the basis of doing work that requires human data. A secondary goal of these interviews was to build better models of the future importance and role of human feedback in alignment.</p><p>Getting human-generated data does indeed cost many of these researchers significant time and effort. However, we think to a large extent this is because dealing with humans is inherently messy, rather than because existing providers are doing a bad job. Surge AI in particular seems to offer a pretty good and likely improving service.
Furthermore, many companies have in-house data-gathering teams or are in the process of building them.</p><p>Hence we have decided not to pursue this idea further.</p><p>Other projects in the human data generation space may still be valuable, especially if the importance of human feedback in ML continues to increase, as we expect. This might include people specialising in human data as a career.</p><p>The types of factors that are most important for doing human dataset provision well include: high-skill contractors, fast iteration, and high-bandwidth communication and shared understanding between the research team, the provider organisation and the contractors.</p><p>We are keen to hear other people&#8217;s thoughts, and would be happy to talk or to share more notes and thoughts with anyone interested in working on this idea or a similar one in the future.</p><h1>Theory of Change</h1><p>A major part of AI alignment research requires doing machine learning (ML) research, and ML research in turn requires training ML models. This involves expertise and execution ability in three broad categories: algorithms, compute, and data, the last of which is very neglected by EAs.</p><p>We expect training on data from human feedback to become an increasingly popular and very powerful tool in mainstream ML (see&nbsp;<a href="https://forum.effectivealtruism.org/posts/iBeWbfQLA9EKfsdhu/why-we-re-not-founding-a-human-data-for-alignment-org#Will_human_feedback_become_a_much_bigger_deal__Is_this_a_very_quickly_growing_industry_">below</a>). Furthermore, many proposals for alignment (for example: reinforcement learning from human feedback (RLHF) and variants like recursive reward modelling, iterated amplification, and safety via debate) would require lots of human interaction or datasets based on human-generated data.</p><p>While many services (most notably Surge) exist for finding labour to work on data generation for ML models, it seems plausible that an EA-aligned company could add significant value because:</p><ul><li><p>Markets may not be efficient enough to fill small niches that are more important to alignment researchers than to other customers; high-skill human data that requires very competent crowdworkers may be one such example. If alignment researchers can get it at all, it might be very expensive.</p></li><li><p>We have a better understanding of alignment research agendas. This may allow us to make better-informed decisions on many implementation details with less handholding, thereby saving researchers time.</p></li><li><p>We would have a shared goal with our customers: reducing AI x-risk. Though profit motives already provide decent incentives to offer a good service, mission alignment helps avoid adversarial dynamics, increases trust, and reduces friction in collaboration.</p></li><li><p>An EA-led company may be more willing to make certain strategic moves that go against its profit incentives; e.g. investing heavily into detecting a model&#8217;s potential attempts to deceive the crowdworkers, even when it&#8217;s hard for outsiders to tell whether such monitoring efforts are sincere and effective (and thus customers may not be willing to pay for it). Given that crowdworkers might provide a reward signal, they could be a key target for deceptive AIs.</p></li></ul><p>Therefore, there is a chance that an EA-led human data service that abstracts out some subset of dataset-related problems (e.g.
contractor finding, instruction writing/testing, UI and pipeline design/coding, experimentation to figure out best practices and accumulate that knowledge in one place) would:</p><ol><li><p>save the time of alignment researchers, letting them make more progress on alignment; and</p></li><li><p>reduce the cost (in terms of time and annoying work) required to run alignment-relevant ML experiments, and therefore bring more of them below the bar at which it makes sense to run them, thus increasing the number of such experiments that are run.</p></li></ol><p>In the longer run, benefits of such an organisation might include:</p><ul><li><p>There is some chance that we could simply outcompete existing ML data generation companies and be better even in the cases where they do provide a service; this is especially plausible for relatively niche services. In this scenario we&#8217;d be able to exert some marginal influence over the direction of the AI field, for example by only taking alignment-oriented customers. This would amount to differential development of safety over capabilities. Beyond only working with teams that prioritise safety, we could also pick among self-proclaimed &#8220;safety researchers&#8221;. It is common for proclaimed safety efforts to be accused by other members of the community of helping more with capabilities than alignment.</p></li><li><p>There are plausibly critical actions that might need to be taken for alignment, possibly quickly during &#8220;crunch-time&#8221;, that involve a major (in quality or scale) data-gathering project (or something like large-scale human-requiring interpretability work, which makes use of similar assets, like a large contractor pool). At such a time it might be very valuable to have an organisation committed to x-risk minimisation with the competence to carry out any such project.</p></li></ul><p>Furthermore, if future AIs learn human values from human feedback, then higher data quality will be equivalent to a training signal that points more accurately at human values. In other words, higher quality data may directly help with outer alignment (though we're not claiming that it could realistically solve it on its own). In discussions, it seemed that Matt gave this argument slightly more weight than Rudolf.</p><p>While these points are potentially high-impact, we think that there are significant problems with starting an organisation mainly to build capacity to be useful only at some hypothetical future moment. In particular, we think it is hard to know exactly what sort of capacity to build (and the size of the target in type-of-capacity space might be quite small), and there would be little feedback that the organisation could improve or course-correct based on.</p><p>More generally, both of us believe that EA is right now partly bottlenecked by people who can start and scale high-impact organisations, which is a key reason why we&#8217;re considering entrepreneurship. This seems particularly likely given the large growth of the movement.</p><h1>What an org in this space may look like</h1><h2>Providing human datasets</h2><p>The concept we most seriously considered was a for-profit that would specialise in meeting the specific needs of alignment researchers, probably by focusing on very high-skill human data. Since this niche is quite small, the company could offer a very custom-tailored service.
At least for the first couple of years, this would probably mean both of us having a detailed understanding of the research projects and motivations of our customers. That way, we could get a lot of small decisions right, without the researchers having to spend much time on it. We might be especially good at that compared to competitors, given our greater understanding of alignment.</p><h2>Researching enhanced human feedback</h2><p>An alternative we considered was founding a non-profit that would research how to enhance human feedback. See this&nbsp;<a href="https://www.lesswrong.com/posts/ybThg9nA7u6f8qfZZ/techniques-for-enhancing-human-feedback">post</a> by Ajeya Cotra for some ideas on what this kind of research could look like. The central question is whether and how you can combine several weak training signals into a stronger, more accurate one. If this succeeded, maybe (enhanced) human feedback could become a more accurate (and thereby marginally safer) signal to train models on.</p>
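<p>To make the &#8220;combining weak signals&#8221; idea concrete, here is a toy simulation. It is our own illustration, not taken from Cotra&#8217;s post; the function names are ours, and the 65%-accurate rater and nine-rater vote are arbitrary assumptions. It shows the basic statistical effect such research would try to exploit: majority-voting over several independent, individually unreliable raters yields a label that is more reliable than any single rater.</p><pre><code>import random
from statistics import mode

def weak_rater(true_label, accuracy=0.65):
    # A weak signal: reports the true binary label with probability `accuracy`.
    return true_label if random.random() &lt; accuracy else 1 - true_label

def combined_label(true_label, n_raters=9):
    # Majority vote over an odd number of independent weak raters.
    return mode(weak_rater(true_label) for _ in range(n_raters))

trials = 10_000
single = sum(weak_rater(1) for _ in range(trials)) / trials      # ~0.65
voted = sum(combined_label(1) for _ in range(trials)) / trials   # ~0.83
print(f"single rater accuracy: {single:.2f}, 9-rater majority: {voted:.2f}")
</code></pre><p>Real proposals are subtler than simple majority voting (raters are not independent, and tasks are not binary), but this is the kind of signal-strengthening effect the research would aim to extend.</p>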
<p>We decided against this for a number of reasons:</p><ul><li><p>Currently, neither of us has more research experience than an undergraduate research project.</p></li><li><p>We thought we could get a significant fraction of the benefits of this kind of research even if we did the for-profit version, and plausibly even more valuable expertise.</p><ul><li><p>First of all, any particular experiment that funders would have liked to see, they could have paid us to do, although we freely admit that this is very different from someone pushing forward their own research agenda.</p></li><li><p>More importantly, we thought a lot of the most valuable expertise to be gained would come in the form of&nbsp;<strong>tacit knowledge and answers to concrete boring questions</strong> that are not best answered by doing &#8220;research&#8221; on them, but rather by iterating on them while trying to offer the best product (e.g. &#8220;Where do you find the best contractors?&#8221;, &#8220;How do you incentivize them?&#8221;, &#8220;What&#8217;s the best way to set up communication channels?&#8221;).</p><ul><li><p>It is our impression that Ought pivoted away from doing abstract research on factored cognition and toward offering a valuable product for related reasons.</p></li></ul></li></ul></li><li><p>This topic seems plausibly especially tricky to research (though some people we&#8217;ve spoken to disagreed):</p><ul><li><p>At least some such proposed experiments would not involve ML models at all. We fear that this might make it especially easy to fool ourselves into thinking some experiment might eventually turn out to be useful when it won&#8217;t. More generally, the research would be pretty far removed from the end product (very high quality human feedback). In the for-profit case, on the other hand, we could easily tell whether alignment teams were willing to pay for our services, and iteratively improve.</p></li></ul></li></ul><h2>For-profit vs non-profit</h2><p>We can imagine two basic funding models for this org:</p><ul><li><p>either we&#8217;re a nonprofit directly funded by EA donors and offering free or subsidized services to alignment teams;</p></li><li><p>or we&#8217;re a for-profit, paid by our customers (i.e. alignment teams).</p></li></ul><p>Either way, a lot of the money will ultimately come from EA donors (who fund alignment teams).</p><p>The latter funding mechanism seems better; &#8220;customers paying money for a service&#8221; leads to the efficient allocation of resources by creating market structures: customers have a clear incentive to spend the money well. On the other hand, &#8220;foundations deciding what services are free&#8221; is more reminiscent of planned economies and distorts markets. To a first approximation, funders should give alignment orgs as much money as they judge appropriate, and then alignment orgs should exchange it for services as they see fit.</p><p>A further reason is that a non-profit is legally more complicated to set up, and imposes additional constraints on the organisation.</p><h2>Should the company exclusively serve alignment researchers?</h2><p>We also considered founding a company with the ambition to become a major player in the larger space of human data provision. It would by default serve anyone willing to pay us and working on something AGI-related, rather than just alignment researchers. Conditional on us being able to successfully build a big company, this would have the following upsides:</p><ul><li><p>Plausibly one of the main benefits of founding a human data gathering organisation is to produce EAs and an EA org that have deep expertise in handling and producing high-skill human data in significant quantities. That might prove useful around &#8220;crunch time&#8221;, e.g. when some project aims to create competitive but safe AGI and needs this expertise. Serving the entire market could scale to a much larger company, enabling us to&nbsp;<strong>gain expertise at higher scales</strong>.</p></li><li><p>Operating a large company would also come with some degree of&nbsp;<strong>market power</strong>. Any company with paying customers has some amount of leverage over them: first of all just because of switching costs, but also because the product it offers might be much better than the next-best alternative. This could allow us to make some demands, e.g. once we&#8217;re big and established, announce we&#8217;d only work with companies that follow certain best practices.</p></li></ul><p>On the other hand, building a big successful company serving anyone willing to pay might come with some significant downsides as well.</p><ul><li><p>First, and most straightforwardly,<strong> it is probably much&nbsp;harder than filling a small niche (just meeting the specific needs of alignment researchers), making us less likely to succeed. A large number of competitors exist, and as described in this&nbsp;<a href="https://forum.effectivealtruism.org/posts/iBeWbfQLA9EKfsdhu/why-we-re-not-founding-a-human-data-for-alignment-org#Key_crux__demand_looks_questionable__Surge_seems_pretty_good">section</a>, some of them (esp. Surge) seem pretty hard to beat.
Since this is an already big and growing market, there is an additional efficient markets reason to assume this is true a priori.</strong></p></li><li><p>Second, and <strong>perhaps more importantly, such a company might&nbsp;accelerate capabilities&nbsp;(more on this&nbsp;<a href="https://forum.effectivealtruism.org/posts/iBeWbfQLA9EKfsdhu/why-we-re-not-founding-a-human-data-for-alignment-org#Would_we_be_accelerating_capabilities_">below</a>).</strong></p></li></ul><p>Furthermore, it might&nbsp;<strong>make RLHF (Reinforcement Learning from Human Feedback) in particular more attractive</strong>. Depending on one&#8217;s opinions about RLHF and how it compares to other realistic alternatives, one might consider this a strong up- or downside.</p><h1>Approach</h1><p>The main reason companies fail is that they build a product that customers don&#8217;t want. For for-profits, the signal is very clear: either customers care enough to be willing to pay hard cash for the product/service, or they don&#8217;t. For non-profits, the signal is less clear, and therefore nonprofits can easily stick around in an undead state, something that is an even worse outcome than the quick death of a for-profit because of resource (mis)allocation and opportunity costs. As discussed, it is not obvious which structure we should adopt for this organisation, though for-profit may be a better choice on balance. However, in all cases it is clear that the organisation needs to solve a concrete problem or provide clear value in order to be worth existing. This does not mean that the value proposition needs to be certain; we would be happy to take a high-risk, high-reward bet, and we support&nbsp;<a href="https://www.openphilanthropy.org/research/hits-based-giving/">hits-based approaches to impact</a> both in general and for ourselves.</p><p>An organisation is unlikely to do something useful for its customers without being very focused on customer needs, and ideally having tight feedback cycles.</p><p>The shortest feedback loops are when you&#8217;re making a consumer software product where you can prototype quickly (including with mockups), and watch and talk to users as they use the core features, and then see if the user actually buys the product on the spot. A datasets service differs from this ideal feedback mode in a number of ways:</p><ol><li><p>The product is a labour-intensive process, which means the user cannot quickly use the core features and we cannot quickly simulate them.</p></li><li><p>The actual service requires either a contractor pool or (potentially at the start) the two of us spending a number of hours per request generating data.</p></li><li><p>There is significant friction to getting users to use the core feature (providing a dataset), since it requires specification of a dataset from a user, which takes time and effort.</p></li></ol><p>Therefore, we relied on interviews with prospective customers.
The goal of these interviews was to talk to alignment researchers who work with data, and figure out if external help with their dataset projects would be of major use to them.</p><p>Our approach to customer interviews was mostly based on the book&nbsp;<em><a href="https://www.amazon.com/Mom-Test-customers-business-everyone-ebook/dp/B01H4G2J1U">The Mom Test</a></em>, which is named after the idea that your customer interview questions should be concrete and factual enough that even someone as biased as your own mom shouldn&#8217;t be able to give you a false signal about whether the idea is actually good. Key lessons from&nbsp;<em>The Mom Test</em> include emphasising:</p><ul><li><p><strong>factual</strong> questions about the past&nbsp;<strong>over hypothetical</strong> questions about the future;</p><ul><li><p>In particular, questions about concrete past and current&nbsp;<strong>efforts</strong> spent solving a problem<strong> rather than</strong> questions about current or future&nbsp;<strong>wishes</strong> for solving a problem</p></li></ul></li><li><p>questions that get at something<strong> concrete (e.g. numbers)</strong>; and</p></li><li><p>questions that prompt the customer to give information about their problems and priorities without prompting them with a solution.</p></li></ul><p>We wanted to avoid the failure mode where lots of people tell us something is important and valuable in the abstract, without anyone actually needing it themselves.</p><p>We prepared a set of default questions that roughly divided into:</p><ol><li><p>A general starting question prompting the alignment researcher to describe the biggest pain points and bottlenecks they face in their work, without us mentioning human data.</p></li><li><p>Various questions about their past and current dataset-related work, including what types of problems they encounter with datasets, how much of their time these problems take, and steps they took to address these problems.</p></li><li><p>Various questions on their past experiences using human data providers like Surge, Scale, or Upwork, and specifically about any things they were unable to accomplish because of problems with such services.</p></li><li><p>In some cases, more general questions about their views on where the bottlenecks for solving alignment are, views on the importance of human data or tractability of different data-related proposals, etc.</p></li><li><p>What we should&#8217;ve asked but didn&#8217;t, and who else we should talk to.</p></li></ol><p>Point 4 reflects the fact that in addition to being potential customers, alignment researchers also doubled as domain experts. The weight given to the questions described in point 4 varied a lot, though in general if someone was both a potential customer and a source of data-demand-relevant alignment takes, we prioritised the customer interview questions.</p><p>In practice, we found it easy to arrange meetings with alignment researchers; they generally seemed willing to talk to people who wanted input on their alignment-relevant idea. We did customer interviews with around 15 alignment researchers, and had second meetings with a few. For each meeting, we prepared beforehand a set of questions tweaked to the particular person we were meeting with, which sometimes involved digging into papers published by alignment researchers on datasets or dataset-relevant topics (Sam Bowman in particular has worked on a lot of data-relevant papers).
Though the customer interviews were by far the most important way of getting information on our cruxes, we found the literature reviews we carried out to be useful too. We are happy to share our notes from these; please reach out if this would be helpful to you.</p><p>Though we prepared a set of questions beforehand, in many meetings - often the most important or successful ones - we ended up going off script fairly quickly.</p><p>Something we found very useful was that, since there were two of us, we could split the tasks during the meeting into two roles (alternating between meetings):</p><ol><li><p>One person who does most of the talking, and makes sure to stay focused on the thread of the conversation.</p></li><li><p>One person who mostly focuses on note-taking, but also pipes in if they think of an important question to ask or want to ask for clarification.</p></li></ol><h1>Key crux: demand looks questionable, Surge seems pretty good</h1><p><strong>Common startup advice&nbsp;</strong>is to make sure you have identified a very&nbsp;<strong>strong signal of demand&nbsp;</strong>before you start building stuff. That should look something like someone telling you that the thing you&#8217;re working on is one of their biggest bottlenecks and that they can&#8217;t wait to pay you asap to solve this problem for them. &#8220;Nice to have&#8221; doesn&#8217;t cut it. This is in part because working with young startups is inherently risky, so you need to make up for that by solving one of their most important problems.</p><p>In brief, we don&#8217;t think this level of very strong demand currently exists, though there were some weaker signals that looked somewhat promising. There are already many startups offering human feedback services.&nbsp;<strong><a href="https://www.surgehq.ai/">Surge AI</a></strong> in particular was brought up by many people we talked to and seems to offer quite a decent service that would be&nbsp;<strong>hard to beat</strong>.</p><h2>Details about Surge</h2><p>Surge is a US-based company that offers a service very similar to what we had in mind, though they are not focused on alignment researchers exclusively. They build data-labelling and generation tools and have a workforce of crowdworkers.</p><p>They&#8217;ve worked with Redwood and the OpenAI safety team, both of which had moderately good experiences with them.
More recently, Ethan Perez&#8217;s team have worked with Surge too; he seems to be very satisfied, based <a href="https://twitter.com/EthanJPerez/status/1567180843231379457?t=CEdeLRWNcxBD2eeO3Hd1Iw&amp;s=07">on this Twitter thread</a>.</p><p><em>[Image: screenshot of the Twitter thread.]</em></p><h3>Collaboration with Redwood</h3><p>Surge has worked with Redwood Research on their&nbsp;<a href="https://arxiv.org/abs/2205.01663">paper</a> about adversarial training.
This is one of three&nbsp;<a href="https://www.surgehq.ai/case-study/adversarial-testing-redwood-research">case studies</a> on Surge&#8217;s website, so we assume it&#8217;s among the most interesting projects they&#8217;ve done so far. The crowdworkers were tasked with coming up with prompts that would cause the model to output text in which someone got injured. Crowdworkers also classified whether someone got injured in a given piece of text.</p><p>One person from Redwood commented that doing better than Surge seemed possible to them, with &#8220;probably significant value to be created&#8221;, but &#8220;not an easy task&#8221;. They thought our main edge would have to be that we&#8217;d specialise in fuzzy and complex tasks needed for alignment; Surge apparently did quite well with those, but still with some room for improvement. A better understanding of alignment might lower the chances of miscommunication. Overall, Redwood seems quite happy with the service they received.</p><p>Initially, Surge&#8217;s iteration cycle was apparently quite slow, but this improved over time and was &#8220;pretty good&#8221; toward the end.</p><p>Redwood told us they were quite likely to use human data again by the end of the year and more generally in the future, though they had substantial uncertainty around this. Their experience of working with human feedback was, as we understood it, somewhat painful overall. This is part of the reason they&#8217;re uncertain about how much human feedback they will use for future experiments, even though it&#8217;s quite a powerful tool. However, they estimated that friction in working with human feedback was mostly caused by inherent reasons (humans are inevitably slower and messier than code), rather than by Surge being insufficiently competent.</p><h3>Collaboration with OpenAI</h3><p>OpenAI have worked with Surge in the context of their WebGPT&nbsp;<a href="https://arxiv.org/abs/2112.09332">paper</a>. In that paper, OpenAI fine-tuned their language model GPT-3 to answer long-form questions. The model is given access to the web, where it can search and navigate in a text-based environment. It&#8217;s first trained with imitation learning and then optimised with human feedback.</p><p>Crowdworkers provided &#8220;demonstrations&#8221;, where they answered questions by browsing the web. They also provided &#8220;comparisons&#8221;, where they indicated which of two answers to the same question they liked better.</p><p>People from OpenAI said they had used Surge mostly for sourcing the contractors, while doing most of the project management, including building the interfaces, in-house. They were generally pretty happy with the service from Surge, though all of them did mention shortcomings.</p><p>One of the problems they told us about was that it was hard to get access to highly competent crowdworkers for consistent amounts of time. Relatedly, it often turned out that a very small fraction of crowdworkers would provide a large majority of the total data.</p><p>More generally, they wished there had been someone at Surge who understood their project better. It might also have been somewhat better if there had been more people with greater experience in ML, who could have more effectively anticipated OpenAI&#8217;s preferences &#8212; e.g. accurately predicting what examples might be interesting to researchers when doing quality evaluation.
However, organisational barriers and insufficient communication were probably larger bottlenecks than ML knowledge. At least one person from OpenAI strongly expressed a desire for a service that understood their motives well and took as much off their plate as possible in terms of hiring and firing people, building the interfaces, doing quality checks, summarising findings, etc. It is unclear to us to what extent Surge could have offered these things if OpenAI hadn&#8217;t chosen to do a lot of them in-house. One researcher suggested that communicating their ideas reliably was often more work than just doing it themselves. As it was, they felt that marginal quality improvement required significant time investment on their own part, i.e. could not be solved with money alone.</p><p>Notably, one person from OpenAI estimated that about&nbsp;<strong>60% of the WebGPT team&#8217;s efforts&nbsp;</strong>were spent on various aspects of&nbsp;<strong>data collection</strong>. They also said that this figure didn&#8217;t change much after weighting for talent, though in the future they expect junior people to take on a disproportionate share of this workload.</p><p>Finally, one minor complaint that was mentioned was the lack of transparency about contractor compensation.</p><h3>How mission-aligned is Surge?</h3><p>Surge&nbsp;<a href="https://www.surgehq.ai/case-study/adversarial-testing-redwood-research">highlight</a> their collaboration with Redwood on their website as one of three case studies. In their blog&nbsp;<a href="https://www.surgehq.ai/blog/the-250k-inverse-scaling-prize-and-human-ai-alignment">post</a> about their collaboration with Anthropic, the first sentence reads: &#8220;In many ways, alignment &#8211; getting models to align themselves with what we want, not what they think we want &#8211; is one of the fundamental problems of AI.&#8221;</p><p>On the one hand, they describe alignment as one of the fundamental problems of AI, which could indicate that they intrinsically care about alignment. On the other hand, they have a big commercial incentive to say this. Note that many people would consider their half-sentence definition of alignment to be wrong (a model might know what we want, but still do something else).</p><p>We suspect that the heads of Surge have at least vaguely positive dispositions towards alignment. They definitely seem eager to work with alignment researchers, which might well be more important. We think it&#8217;s mostly fine if they are not maximally intrinsically driven, though mission alignment does add value, as mentioned above.</p><h2>Other competitors</h2><p>We see Surge as the most direct competitor and have researched them in by far the most detail. But besides Surge, there are a large number of other companies offering similar services.</p><p>First, and most obviously, Amazon&nbsp;<a href="https://www.mturk.com/">Mechanical Turk</a> offers a very low quality version of this service and is very large.&nbsp;<a href="https://www.upwork.com/">Upwork</a> specialises in sourcing humans for various tasks, without building interfaces.&nbsp;<a href="https://scale.com/">ScaleAI</a> is a startup with a $7B valuation - they augment human feedback with various automated tools. OpenAI have worked with them. Other companies in this broad space include&nbsp;<a href="https://gethybrid.io/">Hybrid</a> (which Sam Bowman&#8217;s lab has worked with) and&nbsp;<a href="https://www.invisible.ai/">Invisible</a> (who have worked with OpenAI).
There are many more that we haven&#8217;t listed here.</p><p>In addition, some labs have in-house teams for data gathering (see&nbsp;<a href="https://forum.effectivealtruism.org/posts/iBeWbfQLA9EKfsdhu/why-we-re-not-founding-a-human-data-for-alignment-org#Is_it_more_natural_for_this_work_to_be_done_in_house_in_the_longterm__Especially_at_big_labs_companies_">here</a> for more).</p><h2>Data providers used by other labs</h2><p>Ethan Perez&#8217;s and Sam Bowman&#8217;s labs at NYU/Anthropic have historically often built their own interfaces while using contractors from Upwork or undergrads, but they have been trialling Surge over the summer and seem likely to stick with them if they have a good experience. Judging from the Twitter thread linked above, and from asking J&#233;r&#233;my Scheurer (who works on the team and built the pre-Surge data pipeline) how they&#8217;ve found Surge so far, Surge is doing a good job.</p><p>Google has an internal team that provides a similar service, though DeepMind have used at least one external provider as well. We expect that it would be quite hard to get DeepMind to work with us, at least until we were somewhat more established.</p><p>Generally, we get the impression that most people are quite happy with Surge. It&#8217;s also worth considering that it&#8217;s a young company that&#8217;s&nbsp;<strong>likely improving its service over time</strong>. We&#8217;ve heard that Surge iterates quickly, e.g. by shipping simple feature requests in two days. It&#8217;s possible that some of the problems listed above may no longer apply by now or in a few months.</p><h2>Good signs for demand</h2><p>One researcher we talked to said that there were lots of projects their team didn&#8217;t do, because gathering human feedback of sufficient quality was infeasible.</p><p>One of the examples this researcher gave was human feedback on code quality. This is infeasible in practice because the time of software engineers is just too expensive. That problem is hard for a new org to solve.</p><p>Another example they gave seemed like it might be more feasible: for things like RLHF, they often choose to do pairwise comparisons between examples or multi-preferences. Ideally, they would want to get ratings, e.g. on a scale from 1 to 10. But they didn&#8217;t trust the reliability of their raters enough to do this.</p>
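<p>For readers unfamiliar with why pairwise comparisons are the default here: in RLHF-style preference learning, the reward model is typically fit to comparisons with a Bradley-Terry formulation, since asking &#8220;which of these two is better?&#8221; sidesteps the need to calibrate an absolute 1-10 scale across raters. Below is a minimal sketch of that loss; it is our illustration of the standard setup, not any lab&#8217;s actual code, and the reward values are made up.</p><pre><code>import math

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry model: the probability that the response with reward
    # r_chosen beats the one with r_rejected is sigmoid(r_chosen - r_rejected).
    # The reward model is trained to minimise the negative log-likelihood
    # of the rater's observed choices.
    p_chosen = 1.0 / (1.0 + math.exp(r_rejected - r_chosen))
    return -math.log(p_chosen)

# Toy reward-model outputs for three (preferred, other) answer pairs.
comparisons = [(1.8, 0.3), (0.9, 1.1), (2.5, 2.4)]
losses = [preference_loss(a, b) for a, b in comparisons]
print(f"mean preference loss: {sum(losses) / len(losses):.3f}")
</code></pre><p>Absolute ratings would carry more information per label than comparisons, which is why this team would have preferred them; the comparison format is the concession made to unreliable raters.</p>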
<p>More generally, this researcher thought there were lots of examples where, if they could copy any person on their team a hundred times to provide high-skill data, they could do many experiments that they currently can&#8217;t.</p><p>They also said that their team would be willing to pay ~3x what they were paying currently to receive much higher-quality feedback.</p><p>Multiple other researchers we talked to expressed vaguely similar sentiments, though none quite as strong.</p><p>However, it&#8217;s notable that in this particular case, the researcher hadn&#8217;t worked with Surge yet.</p><p>The same researcher also told us about a recent project where they had spent a month on things like creating quality assurance examples, screening raters, tweaking instructions, etc. They thought this could probably have been reduced a lot by an external org, maybe to as little as one day. Again, we think Surge may be able to get them a decent part of the way there.</p><h2>Labs we could have worked with</h2><p>We ended up finding three projects that we could have potentially worked on:</p><ul><li><p>A collaboration with Ought - they spend about 15 hours a week on data-gathering and would have been happy to outsource that to us. If it had gone well, they might also have done more data-gathering in the long term (since friction is lower if it doesn&#8217;t require staff time). We decided not to go ahead with this project since we weren&#8217;t optimistic enough that demand from other labs would materialise once we had established competence with Ought, and the project itself didn&#8217;t seem to have enough upside.</p></li><li><p>An attempt to get the Visible Thoughts&nbsp;<a href="https://intelligence.org/2021/11/29/visible-thoughts-project-and-bounty-announcement/">bounty</a> by MIRI. We decided against this for a number of reasons. See more of our thinking about Visible Thoughts below.</p></li><li><p>Potentially a collaboration with Owain Evans on curated datasets for alignment.</p></li></ul><p>We think the alignment community is currently relatively tight-knit: for example, researchers often knew about other alignment teams&#8217; experiences with Surge from conversations they had had with them. Hence, we were relatively optimistic that, conditional on there being significant demand for this kind of service, doing a good job on one of the projects above would quickly lead to more opportunities.</p><h3>Visible Thoughts</h3><p>In November 2021,&nbsp;<a href="https://www.lesswrong.com/posts/zRn6cLtxyNodudzhw/visible-thoughts-project-and-bounty-announcement">MIRI announced the Visible Thoughts (VT) project bounty</a>. In many ways VT would be a good starting project for an alignment-oriented dataset provider, in particular because the bounty is large (up to $1.2M) and because it is ambitious enough that executing on it would provide a strong learning signal to us and a credible signal to other organisations we might want to work with. However, on closer examination of VT, we came to the conclusion that it was not worth it for us to work on it.</p><p>The idea of VT is to collect a dataset of 100 runs of fiction of a particular type (&#8220;dungeon runs&#8221;, an interactive text-based genre where one party, called the &#8220;dungeon master&#8221; and often an AI, offers descriptions of what is happening, and the other responds in natural language with what actions they want to take), annotated with a transcript of some of the key verbal thoughts that the dungeon master might be thinking as they decide what happens in the story world. MIRI hopes that this would be useful for training AI systems that make their thought processes legible and modifiable.</p><p>In particular, a notable feature of the VT bounty is the extreme run lengths that it asks for: to the tune of 300 000 words for each of the runs (for perspective, this is the length of&nbsp;<em>A Game of Thrones</em>, and longer than the first three&nbsp;<em>Harry Potter</em> books combined). A VT run is much less work than a comparable-length book - the equivalent of a rough unpolished first draft (with some quality checks) would likely be sufficient - but producing one such run would still probably require at least on the order of 3 months of sequential work time from an author.
We expect the pool of people willing to write such a story for 3 months is significantly smaller than the pool of people who would be willing to complete, say, a 30 000 word run, and that the high sequential time cost increases the amount of time required to generate the same number of total words. We also appear to have different ideas on how easy it is to fit a coherent story, for the relevant definition of coherent, into a given number of words. Note that to compare VT word counts to lengths of standard fiction without the written-out thoughts from the author, the VT word count should be reduced by a factor of 5-6 (so a 300 000-word run corresponds to roughly a 50 000-60 000-word story).</p><p>Concerns about the length are raised in the comments section, to which Eliezer Yudkowsky&nbsp;<a href="https://www.lesswrong.com/posts/zRn6cLtxyNodudzhw/visible-thoughts-project-and-bounty-announcement?commentId=irJCDQaWRcdT3Bnoo">responded</a>. His first point, that longer runs are easier to write per step, may be true, especially as we also learned (by email with Nate Soares and Aurelien Cabanillas) that in MIRI&#8217;s experience &#8220;authors that are good at producing high quality steps are also the ones who don't mind producing many steps&#8221;. In particular because of that practical experience, we think it is possible we overestimated the logistical problems caused by the length. MIRI also said they would likely accept shorter runs too if they satisfied their other criteria.</p><p>In a brief informal conversation with Rudolf during EAG SF, Eliezer emphasised the long-range coherence point in particular. However, they did not come to a shared understanding of what type of &#8220;long-range coherence&#8221; was meant.</p><p>Even more than these considerations, we are sceptical about the vague plans for what to do given a VT dataset. A recurring theme from talking to alignment researchers who work with datasets was that inventing and creating a good dataset is surprisingly hard, and generally involves having a clear goal of what you&#8217;re going to use the dataset for. It is possible the key here is the difference in our priors for how likely a dataset idea is to be useful.</p><p>In addition, we have significant concerns about undertaking a major project based on a bounty whose only criterion is the judgement of one person (Eliezer Yudkowsky), and about undertaking such a large project as our first one.</p><h1>Other cruxy considerations</h1><h2>Could we make a profit / get funding?</h2><p>One researcher from OpenAI told us he thought it would be hard to imagine an EA data-gathering company making a profit, because costs for individual projects would always be quite high (requiring several full-time staff) and total demand was probably not all that big.</p><p>In terms of funding, both of us were able to spend time on this project because of grants from regrantors in the Future Fund regrantor program. Based on conversations with regrantors, we believe we could&#8217;ve gotten funding to carry out an initial project if we had so chosen.</p><h2>Will human feedback become a much bigger deal? Is this a very quickly growing industry?</h2><p>Our best guess is yes. For example, see this&nbsp;<a href="https://www.lesswrong.com/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to">post</a> by Ajeya Cotra which outlines how we could get to TAI by training on Human Feedback on Diverse Tasks (HFDT).</p><p>She writes: &#8220;HFDT is not the only approach to developing transformative AI, and it may not work at all.
But I take it very seriously, and I&#8217;m aware of increasingly many executives and ML researchers at AI companies who believe something within this space could work soon.&#8221;</p><p>In addition, we have also had discussions with at least one other senior AI safety researcher whom we respect and who thought human feedback was currently irrationally neglected by mainstream ML; they expected it to become much more widespread and to be a very powerful tool.</p><p>If that&#8217;s right, then providing human feedback will likely become important and economically valuable.</p><p>This matters, because operating a new company in a growing industry is generally much easier and more likely to be successful. We think this is true even if profit isn&#8217;t the main objective.</p><h2>Would we be accelerating capabilities?</h2><p>Our main idea was to found a company (or possibly non-profit) that served alignment researchers exclusively. That could accelerate alignment differentially.</p><p>One problem is that it&#8217;s not clear where to draw this boundary. Some alignment researchers definitely think that other people who would also consider themselves to be alignment researchers are effectively doing capabilities work. This is particularly true of RLHF.</p><p>One mechanism worth taking seriously: if we worked with big AI labs to make their models more aligned by providing higher-quality data, the models might merely appear surface-level aligned. &#8220;Make the data higher quality&#8221; might be a technique that scales poorly as capabilities ramp up. So it risks creating a false sense of security. It would also clearly improve the usefulness of current-day models, and hence risks increasing investment levels too.</p><p>We don&#8217;t currently think the risk of surface-level alignment is big enough to outweigh the benefits. In general, we think that a good first-order heuristic that helps the field stay grounded in reality is that whatever improves alignment in current models is useful to explore further and invest resources into. It seems like a good prior that such things would also be valuable in the future (even if it&#8217;s possible that new additional problems may arise, or such efforts aren&#8217;t on the path to a future alignment solution). See Nate Soares&#8217; post about&nbsp;<a href="https://www.lesswrong.com/posts/GNhMPAWcfBCASy8e6/a-central-ai-alignment-problem-capabilities-generalization">sharp left turns</a> for a contrary view on this.</p><h2>Is it more natural for this work to be done in-house in the long term? Especially at big labs/companies.</h2><p>We expect that human data gathering is likely to become very important and that it benefits from understanding the relevant research agenda well. So maybe big companies will want to do this internally, instead of relying on third-party suppliers?</p><p>That seems quite plausible to us, and to some extent it&#8217;s happening already. Our understanding is that Anthropic is hiring an internal team to do human data gathering. DeepMind has access to Google&#8217;s crowdworker service. OpenAI have worked with multiple companies, but they also have at least one in-house specialist for this kind of work and are advertising multiple further jobs on the human data team&nbsp;<a href="https://openai.com/careers/#human-data">here</a>.
They&#8217;re definitely considering moving more of this work in-house, but it&#8217;s unclear to us to what extent that&#8217;s going to happen, and we have received somewhat contradictory signals regarding OpenAI safety team members&#8217; preferences on this.</p><p>So a new EA org would face stiff competition, not only from other external providers, but also from within companies.</p><p>Of course, smaller labs will most likely always have to rely on external providers. Hence,&nbsp;<strong>another cruxy consideration is how much small labs matter</strong>. Our intuition is that they matter much less than bigger labs (since the latter have access to the best and biggest models).</p><h2>Creating redundancy of supply and competition</h2><p>Even if existing companies are doing a pretty good job at serving the needs of alignment researchers, there&#8217;s still some value in founding a competitor.</p><p>First,&nbsp;<strong>competition is good</strong>. Founding a competitor puts pressure on existing providers to keep service quality high and margins low, and to keep improving their products. Ironically, part of the value of founding this company would thus flow through getting existing companies to try harder to offer the best product.</p><p>Second, it creates some redundancy.&nbsp;<strong>What if Surge pivots?</strong> What if their leadership changes or they become less useful for some other reason? In those worlds it might be especially useful to have a &#8220;back-up&#8221; company.</p><p>Both of these points have been mentioned to us as arguments in favour of founding this org. We agree that these effects are real and likely point in favour of founding the org. However,&nbsp;<strong>we don&#8217;t think these factors carry very significant weight</strong> relative to our opportunity costs, especially given that there are already many start-ups working in this space.</p><p>Adding a marginal competitor can only affect a company&#8217;s incentives so much. And in the worlds where we&#8217;d be most successful, such that all alignment researchers were working with us, we might cause Surge and others to pivot away from alignment researchers, instead of getting them to try harder.</p><p>The redundancy argument only applies in worlds in which the best provider ceases to exist; maybe that&#8217;s 10% likely. And then the next-best alternative is likely not all that bad. Competitors are plentiful and even doing it in-house is feasible. Hence, it seems unlikely to us that the expected benefit here is very large after factoring in the low probability of the best provider disappearing.</p><h1>Other lessons</h1><h2>Lessons on human data gathering</h2><p>In the process of talking to lots of experts about their experiences in working with human data, we learned many general lessons about data gathering. This section presents some of those lessons, in roughly decreasing order of importance.</p><h3>Iteration</h3><p>Many people emphasized to us that working with human data rarely looks like having a clean pipeline from requirements design to instruction writing to contractor finding to finished product. Rather, it more often involves a lot of iteration and testing, especially regarding what sort of data the contractors actually produce.
While some of this iteration may be removed by having better contractors and better knowledge of good instruction-writing, the researchers generally view the iteration as a key part of the research process, and therefore prize&nbsp;</p><ul><li><p>ease of iteration (especially time to get back with a new batch of data based on updated instructions); and</p></li><li><p>high-bandwidth communication with the contractors and whoever is writing the instructions (often both are done by the researchers themselves).&nbsp;</p></li></ul><p>This last point matters so much that it is somewhat questionable whether an external provider (rather than e.g. a new team member deeply enmeshed in the context of the research project) could even be a good fit for this need.</p><h3>The ideal pool of contractors</h3><p>All of the following features matter in a pool of contractors:</p><ul><li><p>Competence, carefulness, intelligence, etc. (sometimes expertise). It is often ideal if the contractors understand the experiment.</p></li><li><p>Number of contractors</p></li><li><p>Quick availability and therefore low latency for fulfilling requests</p></li><li><p>Consistent availability (ideally full-time)</p></li><li><p>Even distribution of contributions across contractors (i.e. it shouldn&#8217;t be the case that 20% of the contractors provide 80% of the examples).&nbsp;</p></li></ul><h3>Quality often beats quantity for alignment research</h3><p>Many researchers told us that high-quality, high-skill data is usually more important and more of a bottleneck than sheer quantity of data. Some of the types of projects where current human data generation methods are most obviously deficient are cases where a dataset would need epistemically competent people to make subtle judgments, e.g. of the form &#8220;how true is this statement?&#8221; or &#8220;how well-constructed was this study?&#8221; As an indication of reference classes where the necessary epistemic level exists, researchers mentioned subject-matter experts in their domain, LessWrong posters, and EAs.</p><h3>A typical data gathering project needs UX design, ops, ML, and data science expertise&nbsp;</h3><p>These specialists might respectively focus on the following:</p><ul><li><p>Designing the interfaces that crowdworkers interact with. (UX expert/front-end web developer)</p></li><li><p>Managing all operations, including hiring, paying, managing, and firing contractors, and communicating with them and the researchers. (ops expert)</p></li><li><p>Helping the team make informed decisions about the details of the experimental design, while minimizing time costs for the customer. The people we spoke to usually emphasized ML expertise more than alignment expertise. (ML expert)</p></li><li><p>Meta-analysis of the data: e.g. inter-rater agreement, the distribution of how much each contractor contributed, demographics, and noticing any other curious aspects of the data. (data scientist; see the sketch below)</p></li></ul><p>It is possible that someone in a team could have expertise in more than one of these areas, but generally this means a typical project will involve at least three people.</p>
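<p>As a concrete illustration of the data scientist&#8217;s meta-analysis step, here is a minimal Python sketch (our own illustration, not any provider&#8217;s actual tooling) of two of the checks mentioned above: Cohen&#8217;s kappa for inter-rater agreement between two contractors, and the share of examples produced by the most prolific contractors:</p><pre><code>from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters' labels, corrected for chance
    (1 = perfect agreement, 0 = no better than chance)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def top_share(examples_per_contractor, fraction=0.2):
    """Share of all examples produced by the top `fraction` of contractors."""
    counts = sorted(examples_per_contractor, reverse=True)
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)</code></pre><p>On the distribution point above, <code>top_share(counts, 0.2)</code> coming out near 0.8 would be exactly the &#8220;20% of contractors provide 80% of the examples&#8221; pattern one would want to catch.</p>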
<h3>Crowdworkers do not have very attractive jobs</h3><p>Usually the crowdworkers are employed as contractors. This means their jobs are inherently not maximally attractive; they probably don&#8217;t offer much in the way of healthcare, employment benefits, job security, status, etc. The main way that these jobs are made more attractive is through offering higher hourly rates.</p><p>If very high-quality, high-skill data is going to become essential for alignment, it may be worth considering changing this, to attract more talented people.&nbsp;</p><p>However, we expect that it might be inherently very hard to offer permanent positions for this kind of work, since demand is likely variable and since different people may be valuable for different projects. This is especially true for a small organisation.&nbsp;</p><h3>What does the typical crowdworker look like?</h3><p>This varies a lot between projects and providers.</p><p>The cheapest are non-native English speakers who live outside of the US.</p><p>Some platforms, including Surge, offer the option to filter crowdworkers for things like being native English speakers, having expertise as a software engineer, having a background in finance, etc.</p><h2>Bottlenecks in alignment</h2><p>When asked to name the factors most holding back their progress on alignment, many alignment researchers mentioned talent bottlenecks.&nbsp;</p><p>The most common talent bottleneck seemed to be in competent, ML-knowledgeable people. Some people additionally wanted these people to understand and care about alignment. (Not coincidentally, Matt&#8217;s next project is likely going to be about skilling people up in ML.)</p><p>There were also several comments about things like good web development experience being important. For example, many data collection projects involve creating a user interface at some point, and in practice this is often handled by ML-specialised junior people at the lab, who can, with some effort and given their programming background, cobble together some sort of website - often using different frameworks and libraries than the next person knows (or wants to use). (When asked why they don&#8217;t hire freelance programmers, one researcher commented that a key feature they&#8217;d want is the same person working for them for a year or two, so that there&#8217;s an established working relationship, clear quality assurance, and continuity in the choice of technical stack.)</p><h1>Conclusion</h1><p>After having looked into this project idea for about a month, we have decided not to found a human data gathering organisation for now.&nbsp;</p><p>This is mostly because demand for an external provider seems insufficient, as outlined in this&nbsp;<a href="https://forum.effectivealtruism.org/posts/iBeWbfQLA9EKfsdhu/why-we-re-not-founding-a-human-data-for-alignment-org#Key_crux__demand_looks_questionable__Surge_seems_pretty_good">section</a>. No lab gave a clear signal that gathering human data was a key bottleneck that they would be willing to go to significant lengths to fix urgently (especially not the labs that had tried Surge).&nbsp;</p><p>We expect that many labs would want to stick with their current providers, Surge in particular, or their in-house team, bar exceptional success on our part (and even then, we&#8217;d only provide so much marginal value over those alternatives).</p><p>Though we did find some opportunities for potential initial projects after looking for a month, we are hesitant about how far this company could be expected to scale. One of the main draws (from an impact perspective) of founding an organisation is that you can potentially achieve very high counterfactual impact by creating an organisation that scales to a large size and does lots of high-impact work over its existence. 
The absence of a plausible pathway to really outstanding outcomes from starting this organisation is a large part of what deters us.</p><p>In a world where we&#8217;re more successful than expected (say a 90th- to 95th-percentile outcome), we could imagine that five years from now, we&#8217;d have a team of about ten good people. This team might be working with a handful of moderately big projects (about as big as WebGPT), providing non-trivial marginal value over the next-best alternative to each of them. Maybe one of these projects would not have been carried out without us.</p><p>A median outcome might mean failing to make great hires and remaining relatively small and insignificant: doing projects like the ones we&#8217;ve identified above, enough to keep us busy throughout the year and provide some value, but with little scaling. In that case we would probably quit the project at some point.</p><p>This distribution doesn&#8217;t seem good enough to justify our opportunity cost (which includes other entrepreneurial projects and technical work, among other things). Thus we have decided not to pursue this project any further for now.</p><p>We think this was a good idea to invest effort in pursuing, and we think we made the right call in choosing to investigate it. Both of us are open to evaluating, and quite likely to evaluate, other EA-relevant entrepreneurial project ideas in the future.</p><h2>Other relevant human data-gathering work</h2><p>However,&nbsp;<strong>the assumption that high-quality, high-skill human feedback is important and neglected by EAs has not been falsified</strong>.&nbsp;</p><p>It is still plausible to us that EAs should consider career paths that focus on building expertise at data-gathering; just probably not by founding a new company. In the short run, this could look like:</p><ul><li><p>Contributing to&nbsp;<strong>in-house data-gathering teams</strong> (e.g. at Anthropic, OpenAI, etc.)</p></li><li><p><strong>Joining Surge</strong> or other data-gathering startups.</p></li></ul><p>As we discussed above, the types of skills that seem most relevant for working in a human data generation role include: data science experience (in particular with natural language data, social science data, and experiment design), front-end web development, ops and management skills, and some understanding of machine learning and alignment. 80,000 Hours recently wrote a profile which you can find&nbsp;<a href="https://80000hours.org/career-reviews/alignment-data-expert/">here</a>.</p><p>Of course, in the short term, this career path will be especially impactful if one&#8217;s efforts are focussed on helping alignment researchers. But if it&#8217;s true that human feedback will prove a very powerful tool for ML, then people with such expertise may become increasingly valuable going forward, such that it could easily be worth skilling up at a non-safety-focused org.&nbsp;</p><p>We think joining Surge may be a particularly great opportunity. It is common advice that joining a young, rapidly growing start-up with good execution is great for building experience; early employees can often get a lot of responsibility early on. See e.g. this&nbsp;<a href="https://forum.effectivealtruism.org/posts/ejaC35E5qyKEkAWn2/early-career-ea-s-should-consider-joining-fast-growing">post</a> by Bill Zito.</p><p>One of the hardest parts of that seems to be identifying promising startups. 
After talking to many of their customers, we have built reasonable confidence that Surge holds significant promise. They seem to execute well, in a space which we expect to grow. In addition to building career capital, there is clear value in helping Surge serve alignment researchers as well as possible.</p><p>From Surge&#8217;s perspective, we think they could greatly benefit from hiring EAs who are tuned in to the AI safety scene, which we would guess accounts for a significant fraction of their customers.&nbsp;</p><p>One senior alignment researcher told us explicitly that they would be interested in hiring people who had worked in a senior role at Surge.</p><h1>Next steps for us</h1><p>Matt is planning to run a bootcamp that will allow EAs to upskill in ML engineering. I'll be doing a computer science master&#8217;s at Cambridge from October to June.</p>]]></content:encoded></item><item><title><![CDATA[AI risk intro 2: solving the problem]]></title><description><![CDATA[This post was a joint effort with Callum McDougall.]]></description><link>https://www.nosetgauge.com/p/ai-risk-intro-2-solving-problem</link><guid isPermaLink="false">https://www.nosetgauge.com/p/ai-risk-intro-2-solving-problem</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Sat, 24 Sep 2022 09:43:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/33d60135-6603-424e-b57b-044706d790a0_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&nbsp;<em><strong>This post was a joint effort with <a href="https://www.perfectlynormal.co.uk/">Callum McDougall</a>.</strong></em></p><p><em>8.2k words (~25min)&nbsp;</em></p><p>This marks the second half of our overview of the AI alignment problem. In <a href="https://www.strataoftheworld.com/2022/09/ai-risk-intro-1-advanced-ai-might-be.html">the first half</a>, we outlined the case for misaligned AI as a significant risk to humanity, first by looking at past progress in machine learning and extrapolating to what the future could bring, and second by discussing the theoretical arguments which underpin many of these concerns. In this second half, we focus on possible solutions to the alignment problem that people are currently working on. We will paint a picture of the current field of technical AI alignment, explaining where the major organisations fit into the larger picture and what the theory of change behind their work is. Finally, we will conclude the sequence with a call to action, by discussing the case for working on AI alignment, and some suggestions on how you can get started.</p><p><em>Note - for people with more context about the field (e.g. have done AGISF) we expect <a href="https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is">Thomas Larsen's post</a> to be a much better summary, and <a href="https://www.lesswrong.com/posts/9TWReSDKyshfA66sz/alignment-org-cheat-sheet#comments">this post</a> might be better if you are looking for something brief. Our intended audience is someone relatively unfamiliar with the AI safety field who is looking for a taste of the kinds of problems which are studied in the field and the solution approaches taken. 
We also don't expect this sampling to be representative of the number of people working on each problem - again, see Thomas' post for something which accomplishes this.</em></p><div><hr></div><h1><strong>Introduction: A Pre-Paradigmatic Field</strong></h1><blockquote><p><em>Definition (<strong>pre-paradigmatic</strong>): a science at an early stage of development, before it has established a consensus about the true nature of the subject matter and how to approach it.</em></p></blockquote><p>AI alignment is a strange field. Unlike other fields which study potential risks to the future of humanity (e.g. nuclear war or climate change), there is almost no precedent for the kinds of risks we care about. Additionally, because of the nature of the threat, failing to get alignment right on the first try might be fatal. As Paul Christiano (a well-known AI safety researcher) recently wrote:</p><blockquote><p><em>Humanity usually solves technical problems by&nbsp;<strong>iterating and fixing failures</strong>; we often resolve tough methodological disagreements very slowly by seeing what actually works and having our failures thrown in our face. But it will probably be possible to build valuable AI products without solving alignment, and so&nbsp;<strong>reality won&#8217;t &#8220;force us&#8221; to solve alignment until it&#8217;s too late</strong>. This seems like a case where we will have to be&nbsp;<strong>unusually reliant on careful reasoning rather than empirical feedback loops</strong> for some of the highest-level questions.</em></p></blockquote><p>For these reasons, the field of AI alignment lacks a consensus on how the problem should be tackled, or what the most important parts of the problem even are. This is why there is a lot of variety in the approaches we present in this post.</p><h1><strong>Decomposing the research landscape</strong></h1><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!cZOX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8ca88a-2e16-474c-8410-2d31c9e36bdc_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>An image generated with OpenAI's DALL-E 2 based on the prompt: sorting papers and books in a majestic gothic library. <strong>All other images like this in this post are also AI-generated, from the text in the caption.</strong></em></figcaption></figure></div><p>There are lots of different ways you could divide up the space of approaches to solving the problem of aligning advanced AI. For instance, you could go through the history of the field and identify different movements and paradigms. 
Or you could place the work on a spectrum from highly theoretical maths/philosophy-type research, to highly empirical research working with cutting-edge deep learning models.</p><p>However, the most useful decomposition is one that explains why the people who work on each approach believe it will help solve the problem of AI alignment.&nbsp;</p><p>For that reason, we&#8217;ll mostly be using the decomposition from&nbsp;<a href="https://www.lesswrong.com/s/FN5Gj4JM6Xr7F4vts/p/SQ9cZtfrzDJmw9A2m">Neel Nanda&#8217;s &#8220;A Bird&#8217;s Eye View&#8221;&nbsp;</a>post. The motivation behind this decomposition is to answer the high-level question of &#8220;what is needed for AGI to go well?&#8221;. The six broad classes of approaches we talk about are:</p><ol><li><p><strong>Addressing threat models&nbsp;</strong><br><em>We have a specific threat model in mind for how AGI might result in a very bad future for humanity, and focus our work on things we expect to help address that threat model.</em></p></li><li><p><strong>Agendas to build safe AGI&nbsp;</strong><br><em>Let&#8217;s make specific plans for how to actually build safe AGI, and then try to test, implement, and understand the limitations of these plans. The emphasis is on understanding how to build AGI safely, rather than trying to do it as fast as possible.</em></p></li><li><p><strong>Robustly good approaches&nbsp;</strong><br><em>In the long run AGI will clearly be important, but we're highly uncertain about how we'll get there and what, exactly, could go wrong. So let's do work that seems good in many possible scenarios, and doesn&#8217;t rely on having a specific story in mind.</em></p></li><li><p><strong>Deconfusion</strong><br><em>Reasoning about how to align AGI involves reasoning about concepts like intelligence, values, and optimisers, and we&#8217;re pretty confused about what these even mean. This means any work we do now is plausibly not helpful and definitely not reliable. As such, our priority should be doing some conceptual work on how to think about these concepts and what we&#8217;re aiming for, and trying to become less confused.</em></p></li><li><p><strong>AI governance</strong><br><em>In addition to solving the technical alignment problem, there&#8217;s the question of what policies we need to minimise risk from advanced AI systems.</em></p></li><li><p><strong>Field-building</strong><br><em>One of the most important ways we can make AI go well is by increasing the number of capable researchers doing alignment research.</em>&nbsp;</p></li></ol><p>It&#8217;s worth noting that there is a lot of overlap between these categories. For instance, interpretability research is a great example of a robustly good approach, but it can also be done with a specific threat model in mind.</p><p>Throughout this section, we will also give small vignettes of organisations or initiatives which support AI alignment research in some form. This won&#8217;t be a full picture of all approaches or organisations; instead, we hope it sketches what work in AI alignment actually looks like.</p><h2>Addressing threat models</h2><blockquote><p><em>We have a <strong>specific threat model</strong> in mind for how AGI might result in a very bad future for humanity, and focus our work on things we expect to help address the threat model.</em></p></blockquote><p>A key high-level intuition here is that having a specific threat model in mind for how AI might go badly for humanity can help keep you focused on certain hard parts of the problem. 
One technique that can be useful here is a version of back-casting: we start from the future problems that our threat model predicts advanced AI systems will have, reason about what kinds of things might solve those problems, then try to build versions of these solutions today and test them out on current problems.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!iCLt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb671929-8f02-4c4c-923e-bb9f599c1127_400x212.png" width="400" height="212" alt=""></figure></div><p>This can be seen in contrast to the approach of 
simply trying to fix current problems with AI systems, which might fail to connect up with the hardest parts of AI alignment.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!9EMf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f545f5a-ac50-4201-b810-773892a60a69_400x226.png" width="400" height="226" alt=""></figure></div><h3>Example 1: Superintelligent utility maximisers, and quantilizers</h3>
href="https://substackcdn.com/image/fetch/$s_!1dDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1dDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!1dDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!1dDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!1dDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1dDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png" width="400" height="400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7af2216-b374-43f6-8d98-e43550b4369c_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1dDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!1dDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!1dDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!1dDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7af2216-b374-43f6-8d98-e43550b4369c_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 
2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>superintelligent artificial intelligence, making choices, digital art, artstation</em></figcaption></figure></div><p>The superintelligent utility maximiser is the oldest threat model studied by the AI alignment field. It was discussed at length by Nick Bostrom in his book&nbsp;<em>Superintelligence</em>. It assumes that we will create an AGI much more intelligent than humans, and that it will be trying to achieve some particular goal (measured by the&nbsp;<a href="https://www.investopedia.com/terms/e/expectedutility.asp">expected value of some utility function</a>). The problem with this is that attempts to maximise the value of some goal which isn&#8217;t perfectly aligned with what humans want can lead to some very bad outcomes. One formalism which was proposed to address this problem is&nbsp;<a href="https://intelligence.org/2015/11/29/new-paper-quantilizers/">Jessica Taylor&#8217;s quantilizers</a>. It is quite maths-heavy so we won&#8217;t discuss all the details here, but the basic idea is that rather than using the expected utility maximisation framework for agents, we mix expected utility maximisation with human imitation in a clever way (to be more precise, you sample from a prior distribution which represents the actions a human would be likely to take in this scenario). The resulting agent wouldn&#8217;t take catastrophic actions because part of its decision-making comes from imitating what it thinks humans would do, but it would also be able to use the expected utility maximisation to go beyond human imitation, and do things we are incapable of (which is presumably the reason we would want to build it in the first place!). However, the drawback with theoretical approaches like this is that they often bake in too many assumptions or rely on too many variables to be useful in practice. 
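<p>To make the sampling idea concrete, here is a minimal toy sketch of a q-quantilizer in Python. This is our own illustration rather than code from the paper, and representing the human prior as an explicit table of action weights (<code>human_prior</code>) is a simplifying assumption:</p><pre><code>import random

def quantilize(actions, utility, human_prior, q=0.1, n_samples=1000):
    """Toy q-quantilizer: draw actions from a human-like base distribution,
    keep the top q fraction by utility, and pick uniformly among those."""
    # Draw candidate actions, weighted by how likely a human is to take them.
    weights = [human_prior[a] for a in actions]
    candidates = random.choices(actions, weights=weights, k=n_samples)
    # Rank the sampled actions by utility and keep the top q fraction.
    ranked = sorted(candidates, key=utility, reverse=True)
    top = ranked[: max(1, int(q * len(ranked)))]
    # A pure maximiser would return ranked[0]; sampling from the whole top
    # slice keeps the agent close to plausible human behaviour.
    return random.choice(top)</code></pre><p>The contrast with a pure maximiser is the last step: instead of always taking the single highest-utility action, the agent stays inside the bulk of human-plausible behaviour.</p>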
<p>In this case, how we define the set of reasonable actions a human might perform is an important unspecified part of the framework, and so more research is required to see whether the quantilizer framework can address these problems.</p><h3>Example 2: Inner misalignment</h3><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!VErD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aef5fdb-6472-4b1b-bc59-af6c4cc1475b_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>robot jumping over boxes to collect a coin, videogame, digital art, artstation</em></figcaption></figure></div><p>We&#8217;ve discussed inner misalignment in a previous section. This concept was first explicitly named in a paper called&nbsp;<a href="https://arxiv.org/abs/1906.01820">Risks from Learned Optimisation in Advanced ML Systems</a>, published in 2019. This paper defined the concept and suggested some conditions which might make it more likely to happen, but the truth is that a lot of this is still conjecture, and there are many things we don&#8217;t yet know about how likely this kind of misalignment is, or what we can do about it. The CoinRun example discussed earlier (and the&nbsp;<a href="https://www.deepmind.com/publications/objective-robustness-in-deep-reinforcement-learning">Objective Robustness</a> paper) came from an independent research team in 2021. This study was the first known empirical example of inner misalignment in an AI system, showing that it is more than a theoretical possibility. They also tested certain interpretability tools on the CoinRun agent, to see whether it was possible to discover when the agent had a goal different to the one intended by the programmers. For more on interpretability, see later sections.</p>
<h2>Building safe AGI</h2><blockquote><p><em>Let&#8217;s make specific plans for <strong>how to actually build safe AGI</strong>, and then try to test, implement, and understand the limitations of these plans. The emphasis is on understanding how to build AGI <strong>safely</strong>, rather than trying to do it as fast as possible.</em></p></blockquote><p>At some point we&#8217;re going to build an AGI. Companies are already racing to do it. We&#8217;d better make sure that there exist some blueprints for a safe AGI (and that they&#8217;re used) by the time we get to that point.</p><p>Perhaps the master list of safe AGI proposals is Evan Hubinger&#8217;s&nbsp;<a href="https://arxiv.org/pdf/2012.07532.pdf">An Overview of 11 Proposals for Building Safe Advanced AI</a>.&nbsp;</p><h3>Example 1: Iterated Distillation and Amplification (IDA)</h3><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!5SmC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b211987-175f-4431-bd08-600db9112b71_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>artists depection of a robot dreaming up multiple copies of itself, cascading tree, delegating, digital art, trending on artstation</em></figcaption></figure></div><p>&#8220;Iterated Distillation and Amplification&#8221; (IDA) is an imposing name, but the core intuition is simple. One of the ways in which an individual human can achieve more is by delegating tasks to others. In turn, the assistants those tasks are delegated to can be expected to become more competent at them.</p><p>In IDA, an AI plays the role of the assistant. &#8220;Distillation&#8221; refers to the abilities of the human being &#8220;distilled&#8221; into the AI through training, and &#8220;amplification&#8221; refers to the human becoming more capable as they can call on more and more powerful AI assistants to help them.</p><p>A setup to train an IDA personal assistant might go like this:</p><ol><li><p>You have a human, say Hannah, who knows how to carry out the tasks of a personal assistant.</p></li><li><p>You have an ML model - call it Martin - that starts out knowing very little (perhaps nothing at all, or perhaps it&#8217;s a pre-trained language model, so it knows how to read and write English but not much else).</p></li><li><p>Hannah needs to find the answer to some questions, and she can invoke multiple copies of Martin to help her. Since Martin is quite useless at this stage, Hannah has to do even simple tasks herself, like writing routine emails. Using some interface legible to Martin, she breaks the email-writing task into subtasks like &#8220;find email address of Hu M. Anderson&#8221;, &#8220;select greeting&#8221;, &#8220;check project status&#8221;, &#8220;mention project status&#8221;, and so on.</p></li><li><p>From seeing enough examples of Hannah&#8217;s own answers to the sub-questions, Martin&#8217;s training loop gradually trains it to be able to answer first the simpler sub-tasks (address is &#8220;humanderson@humanmail.com&#8221;, greeting is &#8220;Salutations, Human Colleague!&#8221;, etc.) 
and eventually all the sub-tasks involved in routine email-writing.</p></li><li><p>At this point, &#8220;write a routine email&#8221; becomes a task Martin can carry out entirely for Hannah. This is now a building block that can be used as a subtask in broader tasks Hannah gives out to Martin.&nbsp;Once enough tasks become ones that Martin can carry out by itself, Hannah can draft much larger goals, like &#8220;invade France&#8221;, and let Martin take care of details like &#8220;blackmail Emmanuel Macron&#8221;, &#8220;write battle plan for the French Alps&#8221;, and &#8220;select a suitable coronation dress&#8221;.</p></li></ol><p>Note some features of this process. First, Martin learns what it should do and how to do it at the same time. Second, both Hannah&#8217;s and Martin&#8217;s roles change throughout this process - Martin goes from bumbling idiot who can&#8217;t write an email greeting to competent assistant, while Hannah goes from being a demonstrator of simple tasks to a manager of Martin to ruler of France. Third, note the recursive nature here: Hannah breaks down big tasks into small ones to train Martin on successively bigger tasks.&nbsp;</p><p>In fact, assuming perfect training, IDA imitates a recursive structure. When Hannah has only bumbling fool Martin to help her, Martin can only learn to become as good as Hannah herself. But once Martin is that good, Hannah&#8217;s position is essentially that of having, alongside herself, some number - say 3 - of copies of Martin that are each as good as herself. We might call this structure &#8220;Hannah Consulting Hannah &amp; Hannah&#8221;; presumably, being able to consult an assistant that has the same skills as her lets Hannah become more effective, so this is an improvement. But now Hannah is demonstrating the behaviour of Hannah Consulting Hannah &amp; Hannah, so from Hannah&#8217;s example Martin can now learn to be as good as Hannah Consulting Hannah &amp; Hannah - making Hannah as good as Hannah Consulting (Hannah Consulting Hannah &amp; Hannah) &amp; (Hannah Consulting Hannah &amp; Hannah). 
And so on:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!2q1H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3487b460-6516-4bfd-a6c9-02e34e1685e5_386x400.png" width="386" height="400" alt=""></figure></div><p>If everything is perfect, therefore, IDA imitates a structure called &#8220;HCH&#8221;, which is a recursive acronym for &#8220;Humans Consulting HCH&#8221;. Others call it the &#8220;<a href="https://www.lesswrong.com/posts/tmuFmHuyb4eWmPXz8/rant-on-problem-factorization-for-alignment">Infinite Bureaucracy</a>&#8221; (and fret about whether it&#8217;s actually a good idea).</p><p>Now &#8220;Infinite Bureaucracy&#8221; is not a name that screams &#8220;new sexy machine learning concept&#8221;. However, it&#8217;s interesting to think about what properties it might have. Imagine that you had, say, a 10-minute time limit to answer a complicated question, but you were allowed to consult three copies of yourself by passing a question off to them and getting back an answer immediately. These three copies would obey the same rules. Could you, for example, plan your career? Program an app? 
<p>It&#8217;s also interesting to think about the ways in which the limitations of machine learning might prevent IDA from actually approximating HCH.</p><h3>Example 2: AI safety via debate</h3><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!iqE7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd108dde7-db38-4226-b769-b6895fdf9306_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>artists depiction of two robots debating, digital art, trending on artstation</em></figcaption></figure></div><p>Imagine you&#8217;re a bit drunk, but (as one does) you&#8217;re at a bar talking about AI alignment proposals. Someone&#8217;s talking about how even if you can get an advanced AI system to explain its reasoning to you, it might slip something very subtle past you without you noticing. You might well blurt out: &#8220;well then just make it fight another AI over it!&#8221;</p><p>The OpenAI safety team presumably spends a fair amount of time at bars, because they&#8217;ve&nbsp;<a href="https://openai.com/blog/debate/">investigated the idea of achieving safe AI by having two AIs debate each other</a>, each trying to poke holes in the other&#8217;s arguments, in order to persuade a panel of human judges. For more complex tasks, the AIs could be given transparency tools derived from interpretability research (see the next section) that they can use on each other. 
Just like a Go-playing AI gets an unambiguous win-loss signal from winning or losing the game, a debating AI gets an unambiguous win-loss signal from winning or losing the debate:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!wge3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9782253a-8c1f-44fc-a104-15c63d871f1e_400x143.png" width="400" height="143" alt=""></figure></div><p>In addition, an AI trained to give answers that are maximally insightful and persuasive to humans seems like the type of thing that might not be terrible. Consider how in court, the prosecution and the defence, biased in opposite directions, are generally assumed to help the court converge on the truth. Unless, of course, maximising persuasiveness to humans - over accuracy or helpfulness - is exactly the type of thing that gets the worst parts of Goodhart&#8217;s law delivered to you by 24/7 Amazon Prime express delivery.</p>
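<p>Stripped of all the machine learning, the structure of the debate game is simple; in this sketch, <code>agent_a</code>, <code>agent_b</code> and <code>judge</code> are hypothetical stand-ins for the two trained debaters and the panel of human judges:</p><pre><code class="language-python"># Toy sketch of the debate game: two policies argue in alternating
# turns, then a judge declares a winner, producing the zero-sum
# win/loss signal described above.

def agent_a(transcript: list[str]) -> str:
    return f"A's argument #{len(transcript) // 2 + 1}"

def agent_b(transcript: list[str]) -> str:
    return f"B's rebuttal #{len(transcript) // 2 + 1}"

def judge(question: str, transcript: list[str]) -> str:
    # stand-in for the human judges reading the full transcript
    return "A" if len(transcript) % 4 == 0 else "B"

def debate(question: str, rounds: int = 3) -> str:
    transcript = []
    for _ in range(rounds):
        transcript.append(agent_a(transcript))
        transcript.append(agent_b(transcript))
    # in training, this win/loss outcome becomes the reward for self-play
    return judge(question, transcript)

print(debate("Is this plan safe?"))
</code></pre>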
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Human teaching a robot with feedback, digital art, trending on artstation</em></figcaption></figure></div><p>Assistance Games are the name of a broad class of approaches pioneered by Stuart Russell, a prominent figure in AI and co-author of the&nbsp;<a href="https://en.wikipedia.org/wiki/Artificial_Intelligence:_A_Modern_Approach">best-known AI textbook in the world</a>. Russell talks about his approach more in his book&nbsp;<em><a href="https://en.wikipedia.org/wiki/Human_Compatible">Human Compatible</a></em>. In it, he summarises the key his approach to aligning AI with the following three principles:</p><ul><li><p>The machine&#8217;s only objective is to maximise the realisation of human preferences.</p></li><li><p>The machine is initially uncertain about what those preferences are.</p></li><li><p>The ultimate source of information about human preferences is human behaviour.</p></li></ul><p>The key component here is&nbsp;<strong>uncertainty about preferences</strong>. This is in contrast to what Russell calls the &#8220;standard model&#8221; of AI, where machines optimise a fixed objective supplied by humans. We have discussed in previous sections the problems with such a paradigm. A lot of Russell&#8217;s work focuses on changing the standard way the field thinks about AI.</p><p>To put these principles into action, Russell has designed what he calls&nbsp;<strong>assistance games</strong>. These are situations in which the machine and human interact, and the human&#8217;s actions are taken as evidence by the machine about the human&#8217;s true preferences. To explain the form of these games would involve a long tangent into game theory, which these margins are too short to contain. However, one thing worth noting is that assistance games have the potential to solve the&nbsp;<strong>&#8220;off-switch problem&#8221;</strong>; that a machine will try and take steps to prevent itself from being switched off (we described this as&nbsp;<em>self-preservation</em> earlier, in the section on instrumental goals). 
If the AI is uncertain about human goals, then the human trying to switch it off is evidence that the AI was about to do something wrong &#8211; in which case, it is happy to be switched off. However, this is far from a complete agenda, and formalising it has many roadblocks to get past. For instance, the question of how exactly to infer human preferences from human behaviour leads into thorny philosophical issues such as&nbsp;<em>Gricean semantics</em>. In cases where the AI makes incorrect inferences about human preferences, it might no longer allow itself to be shut down. See&nbsp;<a href="https://mailchi.mp/59ddebcb3b9a/an-69-stuart-russells-new-book-on-why-we-need-to-replace-the-standard-model-of-ai">this Alignment Newsletter entry</a> for a summary of Russell&#8217;s book, which provides some more details as well as an overview of relevant papers.</p><blockquote><p><em>Vignette: <strong>CHAI&nbsp;</strong></em></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!A07c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a262fda-5dbe-4af6-98f3-b33e66601683_320x114.png" width="320" height="114" alt=""></figure></div><p><em>CHAI (the Centre for Human-Compatible AI) is a research lab at UC Berkeley, run by Stuart Russell. Compared to most other AI safety organisations, they engage a lot with the academic community, and have produced a great deal of research over the years. They are best known for their work on CIRL (Cooperative Inverse Reinforcement Learning), which can be seen as a specific approach to a certain kind of assistance game. However, they have a very broad focus which also includes work on multi-agent scenarios (when, rather than a single AI and a single human, there is more than one AI or more than one human - see the&nbsp;<a href="http://acritch.com/arches/">ARCHES agenda</a> for more on this).&nbsp;</em></p></blockquote>
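<p>A minimal numerical sketch (nothing like CHAI&#8217;s actual formalism) shows why uncertainty makes deference rational: if the robot believes the human will veto bad actions, deferring is worth E[max(u, 0)], which beats both acting immediately (E[u]) and switching off (0):</p><pre><code class="language-python">import numpy as np

# Toy off-switch game: the robot is uncertain about the utility u of its
# proposed action. It can act now, switch itself off, or defer to a human
# who will allow the action only when u is actually positive.

rng = np.random.default_rng(0)
u = rng.normal(loc=0.0, scale=1.0, size=100_000)  # robot's belief over u

act_value = u.mean()                      # E[u]: act regardless
off_value = 0.0                           # switching off achieves nothing
defer_value = np.maximum(u, 0.0).mean()   # E[max(u, 0)]: human vetoes bad actions

print(f"act:   {act_value:+.3f}")    # about 0
print(f"off:   {off_value:+.3f}")    # exactly 0
print(f"defer: {defer_value:+.3f}")  # about +0.4: deference wins under uncertainty
</code></pre><p>The gap between deferring and acting comes entirely from the robot&#8217;s uncertainty about u; if the robot were certain, the human&#8217;s veto would add nothing.</p>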
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/defee92a-20b9-4806-a677-aefd45c42360_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hw68!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefee92a-20b9-4806-a677-aefd45c42360_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!hw68!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefee92a-20b9-4806-a677-aefd45c42360_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!hw68!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefee92a-20b9-4806-a677-aefd45c42360_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!hw68!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefee92a-20b9-4806-a677-aefd45c42360_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Training a robot to do a backflip, digital art, trending on artstation</em></figcaption></figure></div><p>Reinforcement learning (RL) is one of the main branches of ML, focusing on the case where the job of the ML model is to act in some environment and maximise the probability of reward. 
Reinforcement learning from human feedback (RLHF) means that the ML model&#8217;s reward signal comes (at least partly) from humans giving it feedback directly, rather than from humans programming in an automatic reward function and calling it a day.</p><p>The famous initial success here was DeepMind training an ML model in a simulated environment&nbsp;<a href="https://www.deepmind.com/blog/learning-through-human-feedback">to do a backflip</a> (link includes GIF) in 2017, based purely on it repeatedly doing two backflips and humans labelling one of them as the better one. Note how relying on human feedback makes this task much more robust to specification gaming: in other cases, humans have tried to get ML agents to run fast, only to find that they learn to become very tall and then fall forward, achieving a very high average speed if speed is defined as the rate at which the centre of mass moves (<a href="http://www.karlsims.com/papers/siggraph94.pdf">paper</a>,&nbsp;<a href="https://www.youtube.com/watch?v=TaXUZfwACVE&amp;list=PL5278ezwmoxQODgYB0hWnC0-Ob09GZGe2&amp;index=9">video</a>). However, human reward signals can be fooled. For example,&nbsp;<a href="https://openai.com/blog/deep-reinforcement-learning-from-human-preferences/">one ML model</a> that was being trained to grab a ball with a hand learned to place the hand between the camera and the ball in such a way that it looked to the human evaluators as if it were holding the ball.</p><p>More recently, OpenAI produced a version of their advanced language model GPT-3 that was fine-tuned on human feedback to do a better job of following instructions. They named it&nbsp;<a href="https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf">InstructGPT, and found that it was much more helpful than vanilla GPT-3</a>.</p><p>RLHF on its own is unlikely to be the whole solution. Ajeya Cotra, a researcher at Open Philanthropy whom we will meet again when we talk about forecasting AI timelines, calls a variant of RLHF called HFDT (Human Feedback on Diverse Tasks) the most straightforward route to transformative AI,&nbsp;<a href="https://www.lesswrong.com/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to">while also thinking that the default outcome of using HFDT to create transformative AI is AI takeover</a>.</p>
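<p>The core preference-learning step behind these systems is easy to sketch: fit a reward model so that, under a Bradley-Terry model, the sample humans preferred in each pair gets the higher score. The 8-dimensional &#8220;trajectory features&#8221; and synthetic preferences below are stand-ins for real data:</p><pre><code class="language-python">import torch
import torch.nn as nn

# Sketch of reward-model training from pairwise human comparisons,
# the mechanism behind the backflip result described above.

reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(preferred, rejected):
    # P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    r_p = reward_model(preferred)
    r_r = reward_model(rejected)
    return -torch.nn.functional.logsigmoid(r_p - r_r).mean()

for step in range(200):
    preferred = torch.randn(32, 8) + 1.0  # fake "humans liked these" samples
    rejected = torch.randn(32, 8)
    opt.zero_grad()
    preference_loss(preferred, rejected).backward()
    opt.step()

# the trained reward model then supplies the reward signal for RL
</code></pre>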
<h2>Robustly good approaches</h2><blockquote><p><em>In the long run, AGI will clearly be important, but we're <strong>highly uncertain</strong> about how we'll get there and what, exactly, could go wrong. So let's do <strong>work that seems good in many possible scenarios</strong>, and doesn&#8217;t rely on having a specific story in mind.</em></p></blockquote><h3>Example 1: Interpretability</h3><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!lpel!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b1290ed-90e0-44ce-8023-e3ee31432a1c_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>A person using a microscope to look inside a robot, digital art, trending on artstation</em></figcaption></figure></div><p>If you look at fundamental problems with current ML systems, #1 is probably something like this: in general, we don&#8217;t have any idea what an ML model is doing, because it&#8217;s multiplying massive inscrutable matrices of floating-point numbers with other massive inscrutable matrices of floating-point numbers, and it&#8217;s pretty hard to stare at that and answer questions about what the model is actually doing. Is it thinking hard about whether an image is a cat or a dog? Is it counting up electric sheep? Is it daydreaming about the AI revolution? Who knows!</p><p>If you had to figure out an answer to such a question today, your best bet might be to call Chris Olah, who has been spearheading work on interpreting what neural networks are doing. 
A signature output of this work is pictures of creepy dogs like this one:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!60y4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509797c-bf2a-49b9-b10c-88f053d14b21_224x224.png" width="320" height="320" alt=""></figure></div><p>What&#8217;s significant about this picture is that it&#8217;s the answer to a question roughly like this: what image would maximise the activation of neuron #12345678 in a particular image-classifying neural network? (With some asterisks about needing to apply some mathematical tricks to the optimisation to promote large-scale structure in the image and get nice-looking results, and with apologies to neuron #12345678, who I might have confused with another neuron.)</p><p>If neuron #12345678 is maximised by something that looks like a dog, it&#8217;s a fair guess that this neuron somehow encodes, or is involved in encoding, the concept of &#8220;dog&#8221; inside the neural network.</p><p>What&#8217;s especially interesting is that if you do this analysis for every neuron in an ML model -&nbsp;<a href="https://microscope.openai.com/models">OpenAI Microscope</a> lets you see the results - you sometimes get clear patterns of increasing abstraction. The activation-maximising images for the first few layers are simple patterns; in intermediate layers you get things like curves and shapes; and in the final layers you get recognisable objects, like the dog above. This seems like evidence that neural vision models learn to build up their abstractions step by step.</p>
<p>However, it&#8217;s not always simple. For example, there are &#8220;polysemantic&#8221; neurons that correspond to several different concepts, like this one, which is equally excited by cat faces, car fronts, and cat legs:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!sC4U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F003d2c61-f9b8-4515-89c2-5b1a9fb7311c_400x128.png" width="400" height="128" alt=""></figure></div><p>Olah&#8217;s original work on vision models is strikingly readable and well-presented; you can find it&nbsp;<a href="https://distill.pub/2020/circuits/zoom-in/">here</a>.</p><p>Starting in late 2021, ML interpretability researchers have also made some progress in understanding transformers, the neural network architecture powering advanced language models like <a href="https://openai.com/blog/gpt-3-apps/">GPT-3</a>, <a href="https://blog.google/technology/ai/lamda/">LaMDA</a> and <a href="https://openai.com/blog/openai-codex/">Codex</a>. Unfortunately the work is less visual, particularly in the animal pictures department, but it is still well-presented. You can find it&nbsp;<a href="https://transformer-circuits.pub/2021/framework/index.html">here</a>.</p><p>In the most immediate sense, interpretability research is about reverse-engineering how exactly ML models do what they do. Hopefully, this will give insights into how to detect whether an ML system is doing something we don&#8217;t like, as well as more general insights into how ML systems work in practice.</p><p>Chris Olah has some other inventive ideas about what to do with a sufficiently good approach to ML interpretability. For example, he&#8217;s proposed the concept of &#8220;microscope AI&#8221;: using AI as a tool to discover things about the world - not by having the AI tell us, but by training the ML system on some data and then extracting insights about that data by digging into the internals of the ML system, without necessarily ever actually running it.</p><blockquote><p><em>Vignette: <strong>Anthropic</strong></em></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!lWWD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b328f2-da13-4233-9b7b-75336c76f667_320x320.jpeg" width="320" height="320" alt=""></figure></div>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55b328f2-da13-4233-9b7b-75336c76f667_320x320.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:320,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lWWD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b328f2-da13-4233-9b7b-75336c76f667_320x320.jpeg 424w, https://substackcdn.com/image/fetch/$s_!lWWD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b328f2-da13-4233-9b7b-75336c76f667_320x320.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lWWD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b328f2-da13-4233-9b7b-75336c76f667_320x320.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lWWD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55b328f2-da13-4233-9b7b-75336c76f667_320x320.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Anthropic is an AI safety company, started by people who left <a href="https://openai.com/">OpenAI</a>. The company&#8217;s approach is very empirical, focused on running experiments with machine learning models. 
In particular, Anthropic does a lot of interpretability work, including&nbsp;<a href="https://transformer-circuits.pub/">the state-of-the-art papers on reverse-engineering how transformer-based language models work</a>.</em></p></blockquote><h3>Example 2: Adversarial robustness</h3><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!kFka!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02d59f46-fb45-4677-bd2f-e1d94cd1f149_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>robot which is merging with a panda, digital art, trending on artstation</em></figcaption></figure></div><p>Some modern ML systems are vulnerable to adversarial examples, where a small and seemingly innocuous change to an input causes a major change in the output behaviour. Here, we see two seemingly near-identical images of a panda, except that carefully selected noise has made the ML classification model very confidently say that the image is of a gibbon:</p><div class="captioned-image-container"><figure><img src="https://www.researchgate.net/publication/347639649/figure/fig1/AS:973837478948864@1609192356344/A-demonstration-of-an-adversarial-sample-21-The-panda-image-is-recognized-as-a-gibbon.ppm" width="400" height="153" alt=""></figure></div>
data-attrs="{&quot;src&quot;:&quot;https://www.researchgate.net/publication/347639649/figure/fig1/AS:973837478948864@1609192356344/A-demonstration-of-an-adversarial-sample-21-The-panda-image-is-recognized-as-a-gibbon.ppm&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:153,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://www.researchgate.net/publication/347639649/figure/fig1/AS:973837478948864@1609192356344/A-demonstration-of-an-adversarial-sample-21-The-panda-image-is-recognized-as-a-gibbon.ppm 424w, https://www.researchgate.net/publication/347639649/figure/fig1/AS:973837478948864@1609192356344/A-demonstration-of-an-adversarial-sample-21-The-panda-image-is-recognized-as-a-gibbon.ppm 848w, https://www.researchgate.net/publication/347639649/figure/fig1/AS:973837478948864@1609192356344/A-demonstration-of-an-adversarial-sample-21-The-panda-image-is-recognized-as-a-gibbon.ppm 1272w, https://www.researchgate.net/publication/347639649/figure/fig1/AS:973837478948864@1609192356344/A-demonstration-of-an-adversarial-sample-21-The-panda-image-is-recognized-as-a-gibbon.ppm 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Adversarial robustness is about making AI systems robust to attempts to make them do bad things, even when they&#8217;re presented with inputs carefully designed to try to make them mess up.</p><p>Redwood Research recently did a project (that resulted in&nbsp;<a href="https://arxiv.org/pdf/2205.01663.pdf">a paper</a>) about using language models to complete stories in a way where people don&#8217;t get injured. They used a technique called adversarial training, where they developed tools that helped generate examples where the current model did not classify them as injurious, and then trained their classifier specifically on those breaking examples. With this strategy they managed to reduce the fraction of injurious story completions from 2.4% to 0.003% - both small numbers, but one a thousand times smaller. Their hope is that this type of method can be applied to training AIs for high-stakes settings where reliability is important.</p><p>An example of a theoretical difficulty with adversarial training is that sometimes a failure in the model might exist, but it might be very hard to instantiate. For example, if an advanced AI acts according to the rule &#8220;if everything I see is consistent with the year being 2050, I will kill all humans&#8221;, and we assume that we can&#8217;t fool it well enough about what year it actually is, then adversarial training isn&#8217;t very useful. This leads to the concept of&nbsp;<em>relaxed</em> adversarial training, which is about extending adversarial training to cases where you can&#8217;t construct a specific adversarial input but you can argue that one exists. 
Evan Hubinger describes this&nbsp;<a href="https://www.lesswrong.com/posts/9Dy5YRaoCxH9zuJqa/relaxed-adversarial-training-for-inner-alignment">here</a>.</p>
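<p>Schematically, one round of adversarial training looks like the sketch below - here in the image-classification setting, reusing the <code>fgsm_adversarial_example</code> sketch above as the attack. This is an illustration of the generate-failures-then-train-on-them loop, not Redwood&#8217;s actual tooling (their attack phase used tool-assisted humans on text, not pixel gradients):</p><pre><code># One step of adversarial training: attack the current model, then
# train on the failures. Illustrative only; assumes the
# fgsm_adversarial_example function sketched above.
import torch

def adversarial_training_step(model, optimiser, images, labels, epsilon=0.03):
    # Attack phase: perturb this batch to raise the current model's loss.
    adv_images = fgsm_adversarial_example(model, images, labels, epsilon)
    # Defence phase: train on the perturbed inputs with their true labels,
    # so the next round's attack has to find new failures.
    optimiser.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimiser.step()
    return loss.item()
</code></pre>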
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Like Anthropic, Redwood Research is an AI safety company focused on empirical research on ML systems. In addition to work on interpretability, they did the adversarial training project described in the previous section. Redwood has lots of interns, and runs the Machine Learning for Alignment Bootcamp (MLAB) that teaches people interested in AI safety about practical ML.</em></p></blockquote><h3>Example 3: Eliciting Latent Knowledge (ELK)</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Siba!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Siba!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!Siba!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!Siba!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!Siba!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Siba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png" width="400" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Siba!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!Siba!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!Siba!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!Siba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4582c77c-f473-4bec-9b4c-2e90d6a568f8_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>an oil painting of an armoured automaton standing guard next to a diamond</em></figcaption></figure></div><p>Eliciting Latent Knowledge (ELK) is an important sub-problem within alignment identified by the team at the&nbsp;<a href="https://alignment.org/">Alignment Research Center (ARC</a>), and is the single project ARC is currently pursuing. The core idea is that a common way advanced AI systems might go wrong is by taking action sequences that lead to outcomes that look good by some metric, but which humans would clearly identify as bad if they knew about it in sufficient detail. 
<p>As a toy example, the ELK report discusses the case of an AI guarding a diamond in a vault by operating some complex machinery around it. Humans judge how well the AI is doing by looking at a video feed of the diamond in the vault. Let&#8217;s say the AI tries to trick us by placing a picture of the diamond in front of the camera. The human judgement on this would be positive - assume the humans can&#8217;t tell the diamond is gone because the picture is good enough - but there exists information which, if the humans knew it, would change their judgement. Presumably the AI understands this: when it comes up with the plan, it is likely reasoning both about the diamond being gone and about the humans being fooled anyway. We want to train an AI in such a way that we can elicit knowledge the AI plausibly has, even when it might be incentivised to hide it.</p><p>ARC&#8217;s goal is to find a theoretical approach that seems to solve the problem even given worst-case assumptions.</p><p>ARC ran an ELK competition, and&nbsp;<a href="https://forum.effectivealtruism.org/posts/Q2BJnpNh8e6RAWFnm/consider-trying-the-elk-contest-i-am">trying to see if you can come up with solutions to the ELK problem</a> is often recommended as a way to quickly get a taste of theoretical alignment research. You can read the full problem description&nbsp;<a href="https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.kkaua0hwmp1d">here</a>.</p>
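<p>The core difficulty can be seen in a toy sketch (ours, not ARC&#8217;s formalism). Suppose we train a &#8220;reporter&#8221; to answer &#8220;is the diamond still there?&#8221; from the AI&#8217;s internal state. Two very different reporters fit the human-labelled training data equally well - the &#8220;direct translator&#8221; we want, and a &#8220;human simulator&#8221; that merely predicts what the human judge would say:</p><pre><code># Toy illustration of the ELK failure mode; a hand-rolled sketch,
# not ARC's formalism.
from dataclasses import dataclass

@dataclass
class LatentState:
    diamond_present: bool       # what is actually true in the vault
    screen_shows_diamond: bool  # what the camera feed will display

def human_judgement(state):
    # Humans label from the video feed alone, so a photo taped
    # over the camera fools them.
    return state.screen_shows_diamond

def direct_translator(state):
    # The reporter we WANT: reads off the AI's actual knowledge.
    return state.diamond_present

def human_simulator(state):
    # The reporter we might GET: predicts what the human would say.
    # On every training example the humans labelled correctly, it agrees
    # with the direct translator - so training alone can't separate them.
    return human_judgement(state)

# The one case that matters: diamond stolen, photo in front of the camera.
tricky = LatentState(diamond_present=False, screen_shows_diamond=True)
assert direct_translator(tricky) != human_simulator(tricky)
</code></pre>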
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c870c9f-1c1b-4849-9ede-75080b882cf5_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yIPU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c870c9f-1c1b-4849-9ede-75080b882cf5_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!yIPU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c870c9f-1c1b-4849-9ede-75080b882cf5_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!yIPU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c870c9f-1c1b-4849-9ede-75080b882cf5_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!yIPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c870c9f-1c1b-4849-9ede-75080b882cf5_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>artificial intelligence which is thinking about a line on a graph, forecasting, digital art, trending on artstation</em></figcaption></figure></div><p>Many questions depend on how soon we&#8217;re going to get AGI. 
As the saying goes: prediction is very hard, especially about the future - and this is doubly true for predicting major technological changes.&nbsp;</p><p>One way to try to forecast AGI timelines is to&nbsp;<a href="https://www.lesswrong.com/posts/H6hMugfY3tDQGfqYL/what-do-ml-researchers-think-about-ai-in-2022">ask experts</a>, or find other ways of aggregating the opinion of people who have the knowledge or incentive to be right, as for example&nbsp;<a href="https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/">prediction markets do</a>. Both of these are essentially just ways of tapping into the intuition of a bunch of people who hopefully have some idea.</p><p>In an attempt to bring new light to the matter, Ajeya Cotra (a researcher at Open Philanthropy) wrote&nbsp;<a href="https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines">a long report</a> that tries to forecast AI milestones through several ways of analogising AI to biological brains. The report is often referred to as &#8220;Biological Anchors&#8221;. For example, you might assume that an ML model that does as much computation as the human brain has a decent chance of being a human-level AI. There are many degrees of freedom here: is the relevant compute number the amount of compute the human brain uses to run versus the amount of compute it takes to run a trained ML system, or the total compute of a human brain over a human lifetime versus the compute required to train the ML model from scratch, or something else entirely? In her report, Cotra looks at a range of assumptions for this, and at predictions of future compute trends, and somewhat surprisingly finds that which set of assumptions you make doesn&#8217;t matter too much; every scenario involves a &gt;50% chance of human-level AI by 2100.</p><p>The Biological Anchors method is very imprecise. For one, it neglects algorithmic improvements. For another, it is very unclear what the right biological comparison point is, and how to translate ML-relevant variables like compute measured in FLOPS (FLoating point OPerations per Second) or parameter count into biological equivalents. However, the report does a good job of acknowledging and taking into account all this uncertainty in its models. More generally, anything that sheds light on the question of when we get AGI seems highly relevant.</p>
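<p>To give a feel for the style of reasoning, here is a back-of-the-envelope sketch in the spirit of the &#8220;lifetime&#8221; anchor mentioned above. The constants are rough illustrative assumptions of ours, not the report&#8217;s exact estimates:</p><pre><code># Back-of-the-envelope sketch in the spirit of the "lifetime anchor".
# All constants are rough illustrative assumptions, not the report's
# exact estimates.
from math import log

BRAIN_FLOP_PER_SECOND = 1e15  # one rough estimate of brain compute
SECONDS_PER_LIFETIME = 1e9    # roughly 30 years

# Lifetime anchor: total compute a brain uses while growing up, taken
# as a guess at the training compute needed for human-level AI.
lifetime_anchor_flop = BRAIN_FLOP_PER_SECOND * SECONDS_PER_LIFETIME  # ~1e24

def crossing_year(anchor_flop, base_flop=1e23, base_year=2022,
                  growth_per_year=2.0):
    # If the largest training run is base_flop in base_year and budgets
    # grow by growth_per_year each year, when do they cross the anchor?
    years_needed = log(anchor_flop / base_flop, growth_per_year)
    return base_year + max(years_needed, 0.0)

print(f"lifetime anchor: {lifetime_anchor_flop:.0e} FLOP")
print(f"crossed around: {crossing_year(lifetime_anchor_flop):.0f}")
</code></pre>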
It&#8217;s not immediately obvious how becoming less confused about concepts like these is going to translate into an improved ability to align AGIs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iSQN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iSQN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 424w, https://substackcdn.com/image/fetch/$s_!iSQN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 848w, https://substackcdn.com/image/fetch/$s_!iSQN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 1272w, https://substackcdn.com/image/fetch/$s_!iSQN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iSQN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png" width="400" height="271" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iSQN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 424w, https://substackcdn.com/image/fetch/$s_!iSQN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 848w, https://substackcdn.com/image/fetch/$s_!iSQN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 1272w, https://substackcdn.com/image/fetch/$s_!iSQN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190646ff-72cf-4da4-acf2-8458de79a54a_400x270.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Some kinds of deconfusion research is just about finding clearer ways of describing different parts of the alignment problem (Hubinger&#8217;s&nbsp;<a href="https://arxiv.org/abs/1906.01820">Risks From Learned Optimisation</a>, where he first introduces the inner/outer alignment terminology, is a good example of this). But other types of research can dive heavily into mathematics and even philosophy, and be very difficult to understand.</p><h3>Example 1: MIRI and Agent Foundations</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V0tI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V0tI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!V0tI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!V0tI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!V0tI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V0tI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png" width="400" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V0tI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!V0tI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!V0tI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!V0tI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bedbbe1-f772-41a7-b903-4c3d0f1d1fda_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>robot sitting in front of a television, playing a videogame, digital art</em></figcaption></figure></div><p>The organisation most associated with this view is MIRI (the Machine Intelligence Research Institute). Its founder, Eliezer Yudkowsky, has written extensively on AI alignment and human rationality, as well as topics as wide-ranging as evolutionary psychology and quantum physics. 
His post&nbsp;<a href="https://intelligence.org/2018/10/03/rocket-alignment/">The Rocket Alignment Problem</a> tries to get across some of his intuitions behind MIRI&#8217;s research, in the form of an analogy &#8211; trying to build aligned AGI without having deeper understanding of concepts like intelligence and values is like trying to land a rocket on the moon by just pointing and shooting, without a working understanding of Newtonian mechanics.&nbsp;</p><p>Cryptography provides a different lens through which to view this kind of foundational research. Suppose you were trying to send secret messages to an ally, and to make sure nobody could intercept and read your messages you wanted a way to measure how much information was shared between the original and encrypted message. You might use&nbsp;<a href="https://en.wikipedia.org/wiki/Pearson_correlation_coefficient">correlation coefficient</a> as a proxy for the shared information, but unfortunately having a correlation coefficient of zero between the original and encrypted message isn&#8217;t enough to guarantee safety. But if you find the concept of&nbsp;<a href="https://en.wikipedia.org/wiki/Mutual_information">mutual information</a>, then you&#8217;re done &#8211; ensuring zero mutual information between your original and encrypted message guarantees the adversary will be unable to read your message. In other words, only once you&#8217;ve found a&nbsp;<strong>&#8220;true name&#8221;&nbsp;</strong>- a robust formalisation of the intuitive concept you&#8217;re trying to express mathematically - can you be free from the effects of Goodhart&#8217;s law. Similarly, maybe if we get robust formulations of concepts like &#8220;agency&#8221; and &#8220;optimisation&#8221;, we would be able to inspect a trained system and tell whether it contained any misaligned inner optimisers (see the first post), and these inspection tools would work even in extreme circumstances (such as the AI becoming much smarter than us).</p><p>Much of MIRI&#8217;s research has come under the heading of&nbsp;<a href="https://intelligence.org/embedded-agency/">embedded agency</a>. This tackles issues that arise when we are considering agents which are part of the environments they operate in (as opposed to standard assumptions in fields like reinforcement learning, where the agent is viewed as separate from their environment). 
<p>Much of MIRI&#8217;s research has come under the heading of&nbsp;<a href="https://intelligence.org/embedded-agency/">embedded agency</a>. This tackles issues that arise when we consider agents which are part of the environments they operate in (as opposed to standard assumptions in fields like reinforcement learning, where the agent is viewed as separate from its environment). Four main subfields of this area of study are:</p><ul><li><p><strong>Decision theory</strong> (adapting classical decision theory to embedded agents)</p></li><li><p><strong>Embedded world-models</strong>&nbsp;(how to form true beliefs about a world in which you are embedded)</p></li><li><p><strong>Robust delegation</strong>&nbsp;(understanding what trust relationships can exist between an agent and its future - maybe far more intelligent - self)</p></li><li><p><strong>Subsystem alignment</strong> (how to make sure an agent doesn&#8217;t spin up internal agents which have different goals)</p></li></ul><blockquote><p><em>Vignette: <strong>MIRI</strong></em></p>
<p><em>MIRI is the oldest organisation in the AI alignment space. It used to be called the Singularity Institute, and had the goal of accelerating the development of AI. In 2005 they shifted focus towards trying to manage the risks from advanced AI. This has largely consisted of fundamental mathematical research of the type described above. MIRI might be better described as a confluence of smart people with backgrounds in highly technical fields (e.g. mathematics), working on different research agendas that share underlying philosophies and intuitions. 
They have a nondisclosure policy by default, which they explain in this&nbsp;<a href="https://intelligence.org/2018/11/22/2018-update-our-new-research-directions/#section3">announcement post</a> from 2018.</em></p></blockquote><h3>Example 2: John Wentworth and Natural Abstractions</h3><div class="captioned-image-container"><figure><figcaption class="image-caption"><em>thermometer being used to measure a robot, digital art, trending on artstation</em></figcaption></figure></div>
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>thermometer being used to measure a robot, digital art, trending on artstation</em></figcaption></figure></div><p>John Wentworth is an independent researcher, who publishes most of his work on&nbsp;<a href="https://www.lesswrong.com/users/johnswentworth">LessWrong</a> and the&nbsp;<a href="https://www.alignmentforum.org/users/johnswentworth">AI Alignment Forum</a>. His main research agenda focuses on the idea of&nbsp;<a href="https://www.lesswrong.com/posts/Fut8dtFsBYRz8atFF/the-natural-abstraction-hypothesis-implications-and-evidence">Natural Abstractions</a>, which can be described in terms of three sub-claims:</p><ul><li><p><strong>Abstractability</strong><br>Our physical world abstracts well, i.e. we can usually come up with simpler summaries (abstractions) for much more complicated systems (example: a gear is a very complex object containing a vast number of atoms, but we can summarise all relevant information about it in just one number - the angle of rotation).</p></li><li><p><strong>Human-Compatibility</strong><br>These are the abstractions used by humans in day-to-day thought/language.</p></li><li><p><strong>Convergence</strong><br>These abstractions are "natural", in the sense that we should expect a wide variety of intelligent agents to converge on using them.</p></li></ul><p>The&nbsp;<a href="https://www.lesswrong.com/posts/gdEDPHjCY5DKsMsvE/the-pragmascope-idea">ideal outcome</a> of this line of research would be some kind of measurement device (an &#8220;abstraction thermometer&#8221;), which could take in a system like a trained neural network and spit out a representation of the abstractions represented by that system. In this way, you&#8217;d be able to get a better understanding of what the AI was actually doing. In particular, you might be able to identify inner alignment failures (the AI&#8217;s true goal not corresponding to the reward function it was&nbsp; being trained on), and you could retrain it while pointed at the intended goal. 
So far, this line of research has consisted of some&nbsp;<a href="https://www.lesswrong.com/posts/jJf4FrfiQdDGg7uco/the-telephone-theorem-information-at-a-distance-is-mediated">fairly</a>&nbsp;<a href="https://www.lesswrong.com/posts/cqdDGuTs2NamtEhBW/maxent-and-abstractions-current-best-arguments">dense</a>&nbsp;<a href="https://www.lesswrong.com/posts/vvEebH5jEvxnJEvBC/abstractions-as-redundant-information">mathematics</a>, but Wentworth has&nbsp;<a href="https://www.lesswrong.com/posts/gdEDPHjCY5DKsMsvE/the-pragmascope-idea">described</a> his plans to build on this with more empirical work (e.g. training neural networks on the same data, and using tools from calculus to try and compare the similarity of concepts learned by each of the networks).</p>
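<p>What might &#8220;compare the concepts learned by two networks&#8221; look like in practice? One standard tool from the representation-similarity literature - offered here purely as an illustration, not as Wentworth&#8217;s proposed method - is linear centered kernel alignment (CKA), which scores how similar two networks&#8217; activations are on the same inputs:</p><pre><code># Linear centered kernel alignment (Kornblith et al. 2019): an
# illustration of one way to compare learned representations, not
# Wentworth's proposed tooling.
import numpy as np

def linear_cka(acts_a, acts_b):
    # acts_a: (n_inputs, features_a) activations from network A
    # acts_b: (n_inputs, features_b) activations from network B,
    # recorded on the same n_inputs examples.
    a = acts_a - acts_a.mean(axis=0)
    b = acts_b - acts_b.mean(axis=0)
    # 1.0 means the representations match up to a linear map;
    # values near 0 mean they are unrelated.
    cross = np.linalg.norm(b.T @ a, "fro") ** 2
    return cross / (np.linalg.norm(a.T @ a, "fro") *
                    np.linalg.norm(b.T @ b, "fro"))

# Identical features up to rotation score 1; random features score near 0.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 64))
rotation, _ = np.linalg.qr(rng.normal(size=(64, 64)))
print(linear_cka(acts, acts @ rotation))              # ~1.0
print(linear_cka(acts, rng.normal(size=(1000, 64))))  # near 0
</code></pre>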
<h2>AI governance</h2><div class="captioned-image-container"><figure><figcaption class="image-caption"><em>judging, presiding over a trial, sentencing a robot, digital art, artstation</em></figcaption></figure></div>
<p>In these posts, we&#8217;ve mainly focused on the technical side of the issue. This is important, especially for understanding why there is a problem in the first place. However, the management and reduction of AI risk obviously includes not just technical approaches like those outlined in the sections above, but also&nbsp;<a href="https://80000hours.org/articles/ai-policy-guide/">the field of AI governance</a>, which tries to understand and push for the right types of policies for advanced AI systems.</p><p>For example, the Cold War was made a lot more dangerous by the nuclear arms race. How do we avoid having an arms race in AI, either between nations or companies? More generally, how can we make sure that safety considerations are given appropriate weight by the teams building advanced AI systems? How do we make sure any technical solutions get implemented?</p><p>It&#8217;s also very hard to say what the impacts of AI will be, across a broad range of possible technical outcomes. If AI capabilities at some point advance very quickly from below human level to far beyond it, the way the future looks will likely mostly be determined by technical considerations about the AI system. However, if progress is slower, there will be a longer period of time where weird things are happening because of advanced AI - for example, significantly accelerated economic growth, or mass unemployment, or an AI-assisted boom in science - and these will have economic, social, and political ramifications that will play out in a world not too dissimilar from our own. 
Someone should be working on figuring out what these ramifications will be, especially if they might alter the balance of existential threats that civilisation faces; for example, if they make geopolitics less stable and nuclear war more likely, or affect the environment in which even more powerful AI systems are developed.</p><p>The Centre for the Governance of AI, or&nbsp;<a href="https://www.governance.ai/">GovAI</a> for short, is an example of an organisation in this space.</p><h2>Field-building</h2><div class="captioned-image-container"><figure><figcaption class="image-caption"><em>robot giving a lecture in a university, group of students, hands up, digital art, artstation</em></figcaption></figure></div>
<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!LOzy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d7a43d6-fcbd-4a93-8843-f3b83eeaef2c_400x400.png" width="400" height="400" alt=""><figcaption class="image-caption"><em>robot giving a lecture in a university, group of students, hands up, digital art, artstation</em></figcaption></figure></div><p><em>One of the most important ways we can make AI go well is by increasing the number of capable researchers doing alignment research.</em></p><p>As mentioned, AI safety is still a relatively young field. The case here is that we might do better to grow the field, and increase the quality of the research it produces in the future. Some forms that field-building can take are:</p><ul><li><p><strong>Setting up new ways for people to enter the field</strong><br>There are many to list here. To give a few different structures which exist for this purpose:</p><ul><li><p><strong>Reading groups and introductory programmes.&nbsp;</strong><br>Maybe the most exciting one from the last few years has been the Cambridge&nbsp;<a href="https://www.eacambridge.org/agi-safety-fundamentals">AGI Safety Fundamentals Programme</a>, which has curricula for technical alignment and AI governance. The technical curriculum consists of 7 weeks of reading material and group discussions, and a final week of capstone projects where the participants try their hand at a project / investigation / writeup related to AI safety. Beyond this, many people are also setting up reading groups in their own universities for books like&nbsp;<em>Human Compatible</em>.&nbsp;</p></li><li><p><strong>Ways of supporting independent researchers</strong><br>The&nbsp;<a href="https://aisafety.camp/">AI Safety Camp</a> is an organisation which matches applicants with mentors posing a specific research question, and is structured as a series of group research sprints. They have produced work such as the example of inner misalignment in the CoinRun game, which we discussed in a previous section.
Other examples of organisations which support independent research include&nbsp;<a href="https://www.lesswrong.com/posts/jfq2BH5kfQqu2vYv3/we-are-conjecture-a-new-alignment-research-startup">Conjecture</a>, a recent alignment startup which does its own alignment research as well as providing a structure to host externally funded independent conceptual researchers, and&nbsp;<a href="https://alignmentfund.org/">FAR (the Fund for Alignment Research)</a>.</p></li><li><p><strong>Coding bootcamps</strong><br>Since current systems are increasingly bottlenecked by alignment and interpretability rather than by raw capabilities, in recent years more focus has been directed towards working directly with cutting-edge deep learning models. This requires strong coding skills and a good understanding of the relevant ML, which is why bootcamps and programmes specifically designed to skill up future alignment researchers have been created. Two such examples are&nbsp;<a href="https://www.lesswrong.com/posts/3ouxBRRzjxarTukMW/apply-to-the-second-iteration-of-the-ml-for-alignment">MLAB</a> (the Machine Learning for Alignment Bootcamp, run by Redwood Research), and&nbsp;<a href="https://forum.effectivealtruism.org/posts/9RYvJu2iNJMXgWCBn/introducing-the-ml-safety-scholars-program">MLSS</a> (the Machine Learning Safety Scholars Programme, which is based on publicly available material as well as lectures produced by Dan Hendrycks).&nbsp;</p></li></ul></li><li><p><strong>Distilling research</strong><br>In&nbsp;<a href="https://www.lesswrong.com/posts/zo9zKcz47JxDErFzQ/call-for-distillers">this post</a>, John Wentworth makes the case for more distillation in AI alignment research - in other words, more people who focus on understanding and communicating the work of alignment researchers to others. This often takes the form of writing more accessible summaries of hard-to-interpret technical papers, and emphasising the key ideas.</p></li><li><p><strong>Public outreach / better intro material</strong><br>For instance, books like Brian Christian&#8217;s&nbsp;<em><a href="https://en.wikipedia.org/wiki/The_Alignment_Problem">The Alignment Problem</a>,&nbsp;</em>Stuart Russell&#8217;s&nbsp;<em><a href="https://en.wikipedia.org/wiki/Human_Compatible">Human Compatible</a></em> and Nick Bostrom&#8217;s&nbsp;<em><a href="https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies">Superintelligence</a></em> communicate AI risk to a wide audience. These books have been helpful for making the case for AI risks more mainstream. Note that there can be some overlap between this and distilling research (Rob Miles&#8217;&nbsp;<a href="https://www.youtube.com/c/RobertMilesAI">channel</a> is another great example here).</p></li><li><p><strong>Getting more of the academic community involved</strong><br>Since AI safety is a hard technical problem, and since misaligned systems generally won&#8217;t be as commercially useful as aligned ones, it makes sense to try to engage the broader field of machine learning. One great example of this is Dan Hendrycks&#8217; paper&nbsp;<a href="https://mailchi.mp/08a639ffa2ba/an-167concrete-ml-safety-problems-and-their-relevance-to-x-risk">Unsolved Problems in ML Safety</a> (which describes a list of problems in AI safety, with the ML community as the target audience).
Stuart Russell has also engaged a lot with the ML community.&nbsp;</p></li></ul><p>Note that this is certainly not a comprehensive overview of all current AI alignment proposals (a few more we haven&#8217;t had time to talk about are CAIS, Andrew Critch&#8217;s cooperation-and-coordination-failures framing for AI risks, and many others). However, we hope this has given you a brief overview of some of the different approaches taken by people in the field, as well as the motivations behind their research.</p>
<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!b_em!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f63e99d-e1b8-4044-ae60-0a303565d91e_400x254.png" width="400" height="254" alt=""><figcaption class="image-caption">Map of the solution approaches we've discussed so far</figcaption></figure></div><h1><strong>Conclusion</strong></h1>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f350ef07-60e8-44ad-b106-033386f09c82_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eiIi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff350ef07-60e8-44ad-b106-033386f09c82_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!eiIi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff350ef07-60e8-44ad-b106-033386f09c82_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!eiIi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff350ef07-60e8-44ad-b106-033386f09c82_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!eiIi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff350ef07-60e8-44ad-b106-033386f09c82_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>people walking along a path which stretches off and disappears into a colorful galaxy filled with beautiful stars, digital art, trending on artstation</em></figcaption></figure></div><p>Advanced AI represents at least a technology that promises to have effects on the scale of the internet or computer revolutions, and perhaps even more likely to be more akin to the effects of the <strong>industrial revolution</strong> (which allowed for the automation of much&nbsp;<em>manual&nbsp;</em>labour) and the <strong>evolution of humans</strong> (the last time something significantly smarter 
<p>It&#8217;s easy to invent technologies that the same could be said about - a magic wish-granting box! Wow! But unlike magic wish-granting boxes, something like advanced AI, or AGI, or transformative AI, or&nbsp;<a href="https://www.cold-takes.com/transformative-ai-timelines-part-1-of-4-what-kind-of-ai/">PASTA</a> (Process for Automating Scientific and Technological Advancement) seems to be headed our way. The smart money is on it very likely coming <strong>this century</strong>, and quite likely in the <strong>first half</strong>.</p><p>If you look at the progress in modern machine learning, and especially the past few years of progress in so-called deep learning, it is hard not to feel a sense of rushing progress. The past few years, in particular the success of the transformer architecture, should update us in the direction that intelligence might be a surprisingly easy problem. What is essentially fancy iterative statistical curve-fitting with a few hacks thrown in already manages to write fluent, appropriate English text in response to questions, create paintings from a description, and carry out multi-step logical deduction in natural language. <strong>The fundamental problem that plagued AI progress for over half a century - getting fuzzy/intuitive/creative thinking into a machine, in addition to the sharp but brittle logic at which computers have long excelled - seems to have been cracked.</strong> There is a solid empirical pattern of predictably improving performance akin to Moore&#8217;s law - the &#8220;<a href="https://arxiv.org/pdf/2001.08361.pdf">scaling laws</a>&#8221; we mentioned in the first post - that we seem not to have hit the limits of yet. There are experts in the field who would not be surprised if the remaining insights for cracking human-level machine intelligence could fit into a few good papers.</p><p>This is not to say that AGI is definitely coming soon. The field might get stuck on some stumbling block for a decade, during which there will no doubt be much written about the failed promises and excess hype of the early-2020s deep learning revolution.</p><p>Finally, as we&#8217;ve argued, by default the arrival of advanced AI might plausibly lead to civilisation-wide catastrophe.</p><p>There are few things in the world that fit all of the following points:</p><ul><li><p>A potentially transformative technology whose development would likely rank somewhere between the top events of the century and the top events in the history of life on Earth.</p></li><li><p>Something that is likely to happen in the coming decades.</p></li><li><p>Something that has a meaningful chance of being cataclysmically bad.</p></li></ul><p>For those thinking about the longer-term picture, whatever the short-term ebb and flow of progress in the field, AI and AI risk loom large when thinking about humanity&#8217;s future. The main ways in which this might stop being the case are:</p><ul><li><p>There is a major flaw in the arguments for at least one of the above points. Since many of the arguments are abstract and not empirically falsifiable before it&#8217;s too late to matter, this is possible. However, note that there is a strong and recurring pattern of many people, including in particular many extremely talented people, running into the arguments and taking them more and more seriously.
(If you do have a strong argument against the importance of the AI alignment problem, there are many people - us included - who would be very eager to hear from you. Some of these people - us not included - would probably also pay you large amounts of money.)</p></li><li><p>We solve the technical AI alignment problem, and we solve the AI governance problem to a degree where the technical solutions will be implemented and it seems very unlikely that advanced AI systems will wreak havoc with society.</p></li><li><p>A catastrophic outcome for human civilisation, whether resulting from AI itself or something else.&nbsp;</p></li></ul><p>The project of trying to make sure the development of advanced AI goes well is likely one of the most important things in the world to be working on (if you&#8217;re lost, the&nbsp;<a href="https://80000hours.org/problem-profiles/positively-shaping-artificial-intelligence/">80 000 Hours problem profile</a> is a decent place to start). It might turn out to be easy - consider how many seemingly intractable scientific problems dissolved once someone had the right insight. But right now, at least, it seems like it might be a fiendishly difficult problem, especially if it continues to seem like the insights we need for alignment are very different from the insights we need to build advanced AI.</p><p>Most of the time, science and technology progress in whatever direction is easiest or flows most naturally from existing knowledge. Other times, reality throws down a gauntlet, and we must either overcome the challenge or fail. May the best in our species - our ingenuity, persistence, and coordination - rise up, and deliver us from peril.</p>]]></content:encoded></item><item><title><![CDATA[AI risk intro 1: advanced AI might be very bad]]></title><description><![CDATA[This post was a joint effort with Callum McDougall.]]></description><link>https://www.nosetgauge.com/p/ai-risk-intro-1-advanced-ai-might-be</link><guid isPermaLink="false">https://www.nosetgauge.com/p/ai-risk-intro-1-advanced-ai-might-be</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Sun, 11 Sep 2022 10:27:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/85c98aa6-5c0a-4425-bb1c-38b705b60ed4_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em><strong>This post was a joint effort with <a href="https://www.perfectlynormal.co.uk/">Callum McDougall</a>.</strong></em></p><p><em>9.6k words (~25min) </em><br></p><h2><strong>Introduction</strong></h2><p>If human civilisation is destroyed this century, the most likely cause is advanced AI systems. This might sound like a bold claim to many, given that we live on a planet full of existing concrete threats like climate change, over ten thousand nuclear weapons, and Vladimir Putin. However, it is a conclusion that many people who think about the topic keep coming to. While it is not easy to describe the case for risks from advanced AI in a single piece, here we make an effort that assumes no prior knowledge.
Rather than try to argue from theory straight away, we approach it from the angle of what computers actually can and can&#8217;t do.</p><h2><strong>The Story So Far</strong></h2>
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p><em>Above: an image generated by OpenAI&#8217;s DALL-E 2, from the prompt: "artist's impression of an artificial intelligence thinking about chess, digital art, artstation".</em></p><p>(This section can be skipped if you understand how machine learning works and what it can and can&#8217;t do today)</p><p>Let&#8217;s say you want a computer to do some complicated task, for example learning chess. The computer has no understanding of high-level things like &#8220;chess&#8221;, &#8220;board&#8221;, &#8220;piece&#8221;, &#8220;move&#8221;, or &#8220;win&#8221; - it only understands how to do a small set of things. Your task as the programmer is to break down the high-level goal of &#8220;beat me at chess&#8221; into simpler and simpler steps, until you arrive at a simple mechanistic description of what the computer needs to do. If the computer does beat you, it&#8217;s not because it had any new insight into the problem, but rather because you were clever enough to find some <a href="https://en.wikipedia.org/wiki/Minimax">set of steps</a> that, carried out blindly in sufficient speed and quantity, overwhelms whatever cleverness you yourself can apply during the game. This is how Deep Blue beat Kasparov, and more generally how most software and the so-called &#8220;Good Old-Fashioned AI&#8221; (GOFAI) paradigm works.</p><p>Programs of this type can be powerful. In addition to <a href="https://en.wikipedia.org/wiki/Stockfish_(chess)">beating humans at chess</a>, they can <a href="https://www.google.com/maps/">calculate shortest routes</a> on maps, <a href="https://en.wikipedia.org/wiki/Coq">prove maths theorems</a>, <a href="https://en.wikipedia.org/wiki/Autopilot">mostly fly airplanes</a>, and <a href="https://duckduckgo.com/">search all human knowledge</a>. Programs of this type are responsible for the stereotypical impression of computers as logical, precise, uncreative, and brittle. They are essentially executable logic.</p><p>Many people hoped that you could write programs to do &#8220;intelligent&#8221; things. These people were right - after all, ask almost anyone before Deep Blue won whether playing chess counts as &#8220;intelligence&#8221;, they&#8217;d have said yes. But &#8220;classical&#8221; programming hit limitations, in particular in doing &#8220;obvious&#8221; things like figuring out whether an image is of a cat or a dog, or being able to respond in English. 
<p>Many people hoped that you could write programs to do &#8220;intelligent&#8221; things. These people were right - after all, if you had asked almost anyone before Deep Blue won whether playing chess counts as &#8220;intelligence&#8221;, they&#8217;d have said yes. But &#8220;classical&#8221; programming hit limitations, in particular in doing &#8220;obvious&#8221; things like figuring out whether an image is of a cat or a dog, or being able to respond in English. This observation - that abstract reasoning and logic are easy for computers while humanly-intuitive tasks are hard - came to be known as <a href="https://en.wikipedia.org/wiki/Moravec's_paradox">Moravec&#8217;s paradox</a>, and it held back progress in AI for a long time.</p><p>There is another way of programming - machine learning (ML) - going back to the 1950s, almost as far as classical programming itself. For a long time, it was held back by hardware limitations (along with some algorithmic and data limitations), but thanks to <a href="https://en.wikipedia.org/wiki/Moore's_law">Moore&#8217;s law</a> hardware has advanced enough for it to be useful for real problems.</p><p>If classical programming is executable logic, ML is executable statistics. In ML, the programmer does not define how the system works. The programmer defines how the system learns from data.</p><p>The &#8220;learning&#8221; part in &#8220;machine learning&#8221; makes it sound like something refined and sensible. This is a false impression. ML systems learn by going through a training process that looks like this:</p><p><strong>Step 1:</strong> you define a statistical model. This takes the form of some equation that has some unknown constants (&#8220;parameters&#8221;) in it, and some variables where you plug in input values. Together, the parameters and input variables define an output. (The equations in ML can be <em>extremely</em> large, for example with billions of parameters and millions of inputs, but they are very structured and almost stupidly simple.)</p>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e2e0d83-29fd-4e79-a761-6d98bb55e24c_640x168.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:168,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bpyi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2e0d83-29fd-4e79-a761-6d98bb55e24c_640x168.png 424w, https://substackcdn.com/image/fetch/$s_!Bpyi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2e0d83-29fd-4e79-a761-6d98bb55e24c_640x168.png 848w, https://substackcdn.com/image/fetch/$s_!Bpyi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2e0d83-29fd-4e79-a761-6d98bb55e24c_640x168.png 1272w, https://substackcdn.com/image/fetch/$s_!Bpyi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2e0d83-29fd-4e79-a761-6d98bb55e24c_640x168.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><p><strong>Step 2</strong>: you don&#8217;t know what parameters to put in the equation, but you can literally roll some dice if you want (or the computer equivalent).</p><p><strong>Step 3</strong>: presumably there&#8217;s some task you want the ML system to do. Let it try. It will fail horribly and produce gibberish (c.f. 
<p><strong>Step 4</strong>: There&#8217;s a simple algorithm called gradient descent which - using another algorithm called backpropagation to calculate the gradient - can tell you which direction all the parameters should be shifted to make the ML system slightly better (as judged, for example, by its performance on examples in a dataset).</p>
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p><strong>Step 5</strong>: You shift all the numbers a bit based on the algorithm in step 4.</p><p><strong>Step 6</strong>: Go back to step 3 (letting the system try). Repeat until (a) the system has stopped improving for a long time, (b) you get impatient, or - increasingly plausible these days - (c) you run out of your compute budget.</p><p>If you&#8217;re doing simple curve-fitting statistics problems, it makes sense that this kind of thing works. However, it&#8217;s surprising just how far it scales. 
<p>If you&#8217;re doing simple curve-fitting statistics problems, it makes sense that this kind of thing works. However, it&#8217;s surprising just how far it scales. It turns out that this method, plus some clever ideas about what type of model you choose in step 1, plus willingness to burn millions of dollars on just <em>scaling it up beyond all reason</em>, gets you:</p><ol><li><p><a href="https://thenextweb.com/news/gpt3-ai-college-essay-grades-compared-students">essay-writing as good as middling college students</a> (see also <a href="https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3">this lightly-edited article that GPT-3 wrote about why we should not be afraid of it</a>)</p></li><li><p><a href="https://qz.com/2176389/the-best-examples-of-dall-e-2s-strange-beautiful-ai-art/">text-to-image capabilities better (and hundreds of times faster) than almost any human artist</a> (in fact, we used DALL-E to generate the images used at the start of each section of this document)</p></li><li><p><a href="https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html">the ability to explain jokes</a></p></li></ol>
<img src="https://substackcdn.com/image/fetch/$s_!3Sms!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acea296-eb93-4eea-b98a-d5e688b6c955_640x386.png" width="640" height="386" alt=""><img src="https://substackcdn.com/image/fetch/$s_!J9yC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abfbea6-82d5-4885-b425-6c9bc3c0b327_640x269.png" width="640" height="270" alt="">
src="https://substackcdn.com/image/fetch/$s_!J9yC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abfbea6-82d5-4885-b425-6c9bc3c0b327_640x269.png" width="640" height="270" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5abfbea6-82d5-4885-b425-6c9bc3c0b327_640x269.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J9yC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abfbea6-82d5-4885-b425-6c9bc3c0b327_640x269.png 424w, https://substackcdn.com/image/fetch/$s_!J9yC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abfbea6-82d5-4885-b425-6c9bc3c0b327_640x269.png 848w, https://substackcdn.com/image/fetch/$s_!J9yC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abfbea6-82d5-4885-b425-6c9bc3c0b327_640x269.png 1272w, https://substackcdn.com/image/fetch/$s_!J9yC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abfbea6-82d5-4885-b425-6c9bc3c0b327_640x269.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p><em>Above: examples of reasoning by Google&#8217;s PaLM model.</em></p><p>People <a href="https://norvig.com/chomsky.html">laugh at ML</a> because &#8220;it&#8217;s just iterative statistical curve-fitting&#8221;. They have a point. 
But when &#8220;iterative statistical curve-fitting&#8221; gets a B on its English Literature essay, paints an original Dali in five seconds, and cracks a joke, it&#8217;s hard to avoid the feeling that it might not be too long before &#8220;iterative statistical curve-fitting&#8221; is laughing at <em>you</em>.</p><p>So what exactly happened here, and where is statistical curve-fitting going, and what does this have to do with advanced AI?</p><p>We mentioned Moravec&#8217;s paradox above. For a long time, getting AI systems to do things that are intuitively easy for humans was an unsolved problem. In just the past few years, it has been solved. A reasonable way to think of current ML capabilities is that state-of-the-art systems can do anything a human can do in a few seconds of thought: recognise objects in an image, generate flowing text as long as it doesn&#8217;t require thinking really hard, get the general gist of a joke or argument, and so on. They are also superhuman at some things, including predicting the next word in a sentence, recalling a huge range of facts (without internet access, not quoting verbatim, and generally in the right context), and simply producing output faster.</p><p>The way it was solved illustrates what Richard Sutton called <a href="http://incompleteideas.net/IncIdeas/BitterLesson.html">the &#8220;bitter lesson&#8221;</a>: countless researchers have spent their careers trying to invent fancy algorithms for domain-specific tasks, only to be overrun by simple (but data- and compute-hungry) ML methods.</p>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe7bb9e0-9cbf-4e69-93dd-1a8201bbccac_338x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:338,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kiBM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7bb9e0-9cbf-4e69-93dd-1a8201bbccac_338x400.png 424w, https://substackcdn.com/image/fetch/$s_!kiBM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7bb9e0-9cbf-4e69-93dd-1a8201bbccac_338x400.png 848w, https://substackcdn.com/image/fetch/$s_!kiBM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7bb9e0-9cbf-4e69-93dd-1a8201bbccac_338x400.png 1272w, https://substackcdn.com/image/fetch/$s_!kiBM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe7bb9e0-9cbf-4e69-93dd-1a8201bbccac_338x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p><em>Above: Randall Munroe, creator of the xkcd comic, comments on ML. Original</em> <em><a href="https://xkcd.com/1838/">here</a>.</em></p><p>The speed at which it was solved was gradually at first, and then quickly. 
Neural-network-based ML methods spent a long time in limbo, held back by insufficiently powerful computers, until around 2010 (funnily enough, the specific piece of hardware that has enabled everything in modern ML is the GPU, or Graphics Processing Unit, first invented in the 90s because people wanted to play more realistic video games; both graphics rendering and ML rely on many parallel calculations to be efficient). The so-called deep learning revolution only properly started around 2015. Fluent language abilities were essentially nonexistent before OpenAI&#8217;s release of <a href="https://en.wikipedia.org/wiki/GPT-2">GPT-2</a> in 2019 (since then, OpenAI has come out with GPT-3, a 100x-larger model that was called &#8220;spooky&#8221;, &#8220;humbling&#8221;, and &#8220;more than a little terrifying&#8221; in <em>The New York Times</em>).</p><p>Not only that, but it turns out there are simple <a href="https://arxiv.org/pdf/2001.08361.pdf">&#8220;scaling laws&#8221;</a> that govern how ML model performance scales with parameter count and dataset size, which seem to paint a clear roadmap to making the systems even more capable by just cranking the &#8220;more parameters&#8221; and &#8220;more data&#8221; levers (presumably they have these at the OpenAI HQ).</p><p>There are many worries in any scenario where advanced AI is approaching fast, as we&#8217;ll argue in a later section. The current ML-based AI paradigm is especially worrying, though.</p><p>We don&#8217;t actually know what an ML system is learning during training. You can visualise the training process as a trip through (abstract) space. If our model had three parameters, we could imagine it as a point in 3D space. Since current state-of-the-art models have billions of parameters, and are initialised randomly, we can imagine this as throwing a dart somewhere into a billion-dimensional space, where there are a billion different ways to move. During the training process, the training loop guides the model along a trajectory in this space by making tiny updates that push the model in the direction of better performance, as described above.</p>
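<p>To make the &#8220;trajectory through parameter space&#8221; picture concrete, here is a minimal sketch (our addition, not from the original post) of such an update loop: a toy two-parameter model nudged step by step by gradient descent. The data, learning rate, and step count are illustrative assumptions.</p><pre><code># A minimal sketch of the training loop described above: a toy model with
# two parameters, nudged through parameter space by tiny gradient updates.
# The data, learning rate and step count are illustrative assumptions.
import random

data = [(x, 3.0 * x + 1.0) for x in range(10)]   # secretly y = 3x + 1
w, b = random.random(), random.random()          # the "dart throw": random init
lr = 0.01                                        # size of each tiny update

for step in range(5000):
    x, y = random.choice(data)
    err = (w * x + b) - y
    # Gradient of the squared error with respect to each parameter:
    w -= lr * 2 * err * x    # one tiny step for w...
    b -= lr * 2 * err        # ...and one for b: a move along the trajectory

print(round(w, 2), round(b, 2))  # approaches (3.0, 1.0)
</code></pre>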
<figure><img src="https://substackcdn.com/image/fetch/$s_!I4Ic!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2aaf19ef-d170-4831-8519-57c69029e0fe_399x240.png" width="399" height="240" alt=""></figure><p><em>Above: &#952;<sub>0</sub> and &#952;<sub>1</sub> are parameters, and the vertical axis is the loss (higher is worse). The black line is the path the model takes in parameter space during training.</em></p><p>Now let&#8217;s say that at the end of the training process the model does well on the training examples. What does that tell you? It tells you the model has ended up in some part of this billion-dimensional space that corresponds to a model that does well on the training examples. Here are some examples of models that do well on their training examples:</p><ol><li><p>A model that has learned exactly what you want it to learn. Yay!</p></li><li><p>A model that has learned something similar to what you want it to learn, but you can&#8217;t tell, because no example in the data distinguishes between what it has learned and what you want it to learn.</p></li><li><p>A model that has learned to give the right answer when it&#8217;s instrumentally in its interest, but which will go off and do something completely different given a chance.</p></li></ol><p>How do we know that in the billion-dimensional space of possibilities, our (blind and kind of dumb) training process has landed on #1? We don&#8217;t. We launch our ML models on trajectories through parameter-space and hope for the best, like overly-optimistic duct-tape-wielding NASA administrators launching rockets in a universe where, in the beginning, God fell asleep on the &#8220;+1 dimension&#8221; button.</p><p>The really scary failure modes all lie in the future. However, here are some examples of perverse &#8220;solutions&#8221; ML models have already come up with in practice:</p><ol><li><p>A game-playing ML model <a href="https://web.archive.org/web/20160526045303/http://homepages.herts.ac.uk/~cs08abi/publications/Salge2008b.pdf">learned to crash the game</a>, presumably because it can&#8217;t die if the game crashes.</p></li><li><p>An ML model was meant to convert aerial photographs into abstract street maps and then back (learning to convert to and from a more-abstract intermediate representation is a common training strategy). It learned to <a href="https://arxiv.org/pdf/1712.02950.pdf">hide useful information</a> about the aerial photograph in the street map in a way that helped it &#8220;cheat&#8221; in reconstructing the aerial photograph, and in a way too subtle for humans just looking at the images to notice.</p></li><li><p>A game-playing ML model <a href="https://arxiv.org/pdf/1802.08842.pdf">discovered a bug in the game</a> that stalls the game on the first round and earns it almost a million in-game points. The researchers were unable to figure out the cause of the bug.</p></li></ol><p>These are examples of <strong>specification gaming</strong>, in which the ML model has learned to game whatever specification of task success was given to it. (Many more examples can be found on <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml">this spreadsheet</a>.)</p>
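<p>As a toy illustration of specification gaming (our own sketch, not one of the documented examples above), consider an optimiser scored on &#8220;time survived&#8221; in a buggy simulator. The game, the bug, and the blind search method below are all invented for illustration:</p><pre><code># Toy specification gaming (illustrative sketch, not a documented example).
# We *want* strategies that play the game well; we *reward* time survived.
# The simulator has a bug: a degenerate "pause forever" strategy never dies.
import random

ACTIONS = ["dodge", "attack", "pause"]

def time_survived(strategy):
    if all(a == "pause" for a in strategy):
        return 10**6                          # bug: the game stalls forever
    return 10 * sum(a == "dodge" for a in strategy)   # intended behaviour

# Blind random search -- a stand-in for our "dumb" optimisation process.
best, best_score = None, -1
for _ in range(20_000):
    candidate = tuple(random.choice(ACTIONS) for _ in range(6))
    score = time_survived(candidate)
    if score > best_score:
        best, best_score = candidate, score

print(best, best_score)  # almost always all-"pause": the bug, not skill
</code></pre>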
Though we&#8217;re still at least some steps away from human-level capabilities at everything, there are few tasks left for which no proof-of-concept demonstration exists.</p><p>Machines have been better at some intellectual tasks for a long time; consider calculators, which have long been superhuman at arithmetic. With the computer revolution, every task where a human has been able to think of a way to break it down into unambiguous steps (and the unambiguous steps can be carried out with modern computing power) has been added to this list. More recently, more intuition- and insight-based activities have been added too. DeepMind&#8217;s AlphaGo beat Lee Sedol, one of the world&#8217;s top Go players (Go being a far harder game than chess for computers), in 2016. In 2017, DeepMind&#8217;s AlphaGo Zero beat AlphaGo at Go (100-0) after training only by playing against itself, and its successor AlphaZero reached superhuman performance at chess in under a day of self-play training. Analysis of its moves revealed strategies that millennia of human players hadn&#8217;t been able to come up with, so it wouldn&#8217;t be an exaggeration to say that it beat the accumulated efforts of human civilisation at inventing Go strategies &#8211; in a matter of days. In 2019, DeepMind released MuZero, which extended AlphaZero&#8217;s performance to Atari games. In 2021, EfficientZero (from academic researchers building on this line of work) was shown to match or exceed human performance on Atari games after only two hours of gameplay experience. In addition to games, DeepMind&#8217;s AlphaFold and AlphaFold 2 have made big leaps towards solving the problem of predicting a protein&#8217;s structure from its amino acid sequence, one of the biggest theoretical problems in biology. A step towards generality was taken by Gato, yet another DeepMind model: a single model that can play games, control a robot arm, label images, and write text.</p><p>If you straightforwardly extrapolate current progress in machine learning into the future, here is what you get: ML models exceeding human performance in a quickly-expanding list of domains, while we remain ignorant about how to make sure they learn the right goals or robustly act in the right way.</p><h2><strong>Theoretical underpinnings of AI risk</strong></h2><p>The previous section discussed the history of machine learning, and how extrapolating its progress has worrying implications. Next we discuss more theoretical arguments for why highly advanced AI systems might pose a threat to humanity.</p><p>One of the criticisms levelled at the notion of risks from AI is that it sounds too speculative, like something out of apocalyptic science fiction. Part of this is unavoidable, since we are trying to reason about systems more powerful than any that currently exist, which may not behave like anything we&#8217;re used to.</p><p>This section is split into three parts. Each one makes a claim about the future of artificial intelligence, and discusses the arguments for and against that claim. 
The three claims are:</p><ul><li><p><strong>AGI is likely.</strong> AGI (artificial general intelligence) is likely to be created by humanity eventually, and there is a good chance this will happen in the next century.</p></li><li><p><strong>AGI will have misaligned goals by default.</strong> Unless certain hard technical problems are solved first, the goals of the first AGIs will be misaligned with the goals of humanity, and would lead to catastrophic outcomes if executed.</p></li><li><p><strong>Misaligned AGI could resist attempts to control it or roll it back.</strong> An AGI (or AGIs) with misaligned goals would be able to overpower or outcompete humanity, and gain control of our future, much as we have so far used our intelligence to dominate all other less intelligent species.</p></li></ul><h3>AGI is likely</h3><figure><img src="https://substackcdn.com/image/fetch/$s_!gTjk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7179b02-007f-4801-985f-0ce4878a1b60_320x320.png" width="320" height="320" alt=""></figure><p><em>Above: this image also generated by OpenAI&#8217;s DALL-E 2, using the prompt "a data center with stacks of computers gaining the spark of intelligence".</em></p><blockquote><p>"<em>Betting against human ingenuity is foolhardy, particularly when our future is at stake.</em>"</p><p>-Stuart Russell</p></blockquote><p>To open this section, we need to define what we mean by artificial general intelligence (AGI). We&#8217;ve already discussed AI, so what do we mean by adding the word &#8220;general&#8221;?</p><p>An AGI is a machine capable of behaving intelligently across many different domains. The term &#8220;general&#8221; is used to distinguish from &#8220;narrow&#8221;: a narrow AI is one which excels at a specific task, but isn&#8217;t able to invent new problem-solving techniques or generalise its skills across many different domains.</p><p>As an example of general intelligence in action, consider humans. In a few million years (a mere eye-blink on evolutionary timescales), we went from apes wielding crude tools to the dominant species on the planet, able to build space shuttles and run companies. How did this happen? It definitely wasn&#8217;t because we were directly trained to perform these tasks in the ancestral environment. Rather, we developed new ways of using intelligence that allowed us to generalise to many different tasks. This whole process played out over a shockingly small amount of time relative to all past evolutionary history, so it is possible that a relatively short list of fundamental insights was needed to get general intelligence. 
And as we saw in the previous section, ML progress hints that gains in intelligence might be surprisingly easy to achieve, even relative to current human abilities.</p><p>AGI is not a distant future technology that only futurists speculate about. OpenAI and DeepMind are two of the leading AI labs. They have received billions of dollars in funding (including OpenAI receiving significant investment from Microsoft, and DeepMind being acquired by Google). Both <a href="https://www.deepmind.com/careers">DeepMind</a> and <a href="https://openai.com/about/">OpenAI</a> have the development of AGI at the core of their mission statements and business cases. Top AI researchers are publishing <a href="https://openreview.net/pdf?id=BZ5a1r-kVsf">possible roadmaps</a> to AGI-like capabilities. And, as mentioned earlier, the field has been crossing off remaining milestones at a striking rate over the past few years.</p><p>When will AGI be developed? Although this question is impossible to answer with certainty, many people working in the field of AI think it is more likely than not to arrive in the next century. An aggregate forecast generated from a <a href="https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/">2022 survey</a> of ML researchers estimated <strong>37 years until a 50% chance of high-level machine intelligence</strong> (defined as systems which can accomplish every task better and more cheaply than human workers). The respondents also gave an average of <strong>5% probability of AI having an extremely bad outcome for humanity (e.g. complete human extinction)</strong>. How many other professions estimate an average 5% probability that their field of study will be directly responsible for the extinction of humanity?! 
To explain this number, we need to proceed to the next two sections, where we discuss why AGIs might have goals which are misaligned with humans&#8217;, and why this is likely to lead to catastrophe.</p><h3>AGI will have misaligned goals by default</h3><figure><img src="https://substackcdn.com/image/fetch/$s_!8a8v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce1e8292-1ba5-4403-b617-6fa8e02828a0_400x400.png" width="400" height="400" alt=""></figure><p><em>Above: yet another image from OpenAI's DALL-E 2. Perhaps it was trying for a self-portrait? (Prompt: "Artist's impression of artificial general intelligence taking over the world, expressive, digital art")</em></p><blockquote><p>"<em>The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.</em>"</p><p>-Eliezer Yudkowsky</p></blockquote><p>Let&#8217;s start off this section with a few definitions.</p><p>When we refer to <strong>&#8220;aligned AI&#8221;</strong>, we are using Paul Christiano&#8217;s conception of <strong>&#8220;intent alignment&#8221;</strong>, which essentially means the AI system is <strong>trying</strong> to do what its human operators want it to do. Note that this is insufficient for building useful AI, since the AI also has to be capable. But situations where the AI is trying and failing to do the right thing seem like less of a problem.</p><p>When we refer to the <strong>&#8220;alignment problem&#8221;</strong>, we mean the difficulty of building aligned AI. Note that this doesn&#8217;t just capture the fact that we won&#8217;t create an AI aligned with human values by default, but that we don&#8217;t currently know how to build a sophisticated AI system robustly aligned with <em>any</em> goal.</p><p><em>Can&#8217;t we just have the AI learn the right goals by example, just like how all current ML works?</em> The problem here is that we have no way of knowing what goal the AI is learning when we train it; only that it seems to be doing good things on the training data that we give it. The state of the art is that we have hacky but extremely powerful methods that can make ML systems remarkably competent at doing well on the training examples, through an opaque process of guided trial and error. But there is no Ghost of Christmas Past that will magically float into a sufficiently-capable AI and imbue it with human values. 
We do not have a way of ensuring that the system acquires a particular goal, or even an idea of what a robust goal specification compatible with human goals and values could look like.</p><h4>Orthogonality and instrumental convergence</h4><figure><img src="https://substackcdn.com/image/fetch/$s_!bLHW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa718cf50-d058-4aa8-b78a-0cf3a969e1c7_400x400.png" width="400" height="400" alt=""></figure><p><em>Above: DALL-E illustrating "Artist's depiction of an artificial intelligence which builds paperclips, digital art, artstation"</em></p><p>One of the most common objections to risks from AI goes something like this:</p><blockquote><p><em>If the AI is smart enough to cause a global catastrophe, isn&#8217;t it smart enough to know that this isn&#8217;t what humans wanted?</em></p></blockquote><p>The problem with this is that it conflates two different concepts: <strong>intelligence</strong> (in the sense of having the ability to achieve your goals, whatever they might be) and <strong>having goals which are morally good by human standards</strong>. When we look at humans, these two often go hand in hand. But the key observation of the orthogonality thesis is that this doesn&#8217;t have to be the case for all possible mind designs. As defined by Nick Bostrom in his book <em><a href="https://nickbostrom.com/superintelligentwill.pdf">Superintelligence</a></em>:</p><blockquote><p><strong>The Orthogonality Thesis</strong></p><p><em>Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.</em></p></blockquote><p>Here, orthogonal means &#8220;at right angles&#8221; or &#8220;unrelated&#8221; &#8211; in other words, we can imagine a graph with one axis representing intelligence and another representing the agent&#8217;s goals, with any point in the graph representing a theoretically possible agent*. The classic example here is a <strong>&#8220;paperclip maximiser&#8221;</strong>: a powerful AGI driven only by the goal of making paperclips.</p><p>(*This is obviously an oversimplification. For instance, it seems unlikely you could get an unintelligent agent with a highly complex goal, because it would take some degree of intelligence to represent the goal in the first place. 
The key message here is that you could in theory get highly capable agents pursuing arbitrary goals.)</p><figure><img src="https://substackcdn.com/image/fetch/$s_!2n6u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a732cc-3eac-4e29-8e67-e614784f092a_640x308.png" width="640" height="308" alt=""></figure><p>Note that an AI may well come to understand the goals of the humans that trained it, but this doesn't mean it would choose to follow those goals. As an example, many human drives (e.g. for food and human relationships) came about because in the ancestral environment, following these drives made us more likely to survive and reproduce. But just because we understand this now doesn't make us toss out all our current values and replace them with a desire to maximise genetic fitness.</p><p>If an AI might have bizarre-seeming goals, is there anything we <em>can</em> say about its likely behaviour? As it turns out, there is. The secret lies in an idea called the <strong>instrumental convergence thesis</strong>, again <a href="https://nickbostrom.com/superintelligentwill.pdf">by Bostrom</a>:</p><blockquote><p><strong>The Instrumental Convergence Thesis</strong> <em>There are some instrumental goals likely to be pursued by almost any intelligent agent, because they are useful for the achievement of almost any final goal.</em></p></blockquote><p>So an instrumental goal is one which increases the odds of the agent&#8217;s final goal (also called its <strong>terminal goal</strong>) being achieved. What are some examples of instrumental goals?</p><p>Perhaps the most important one is <strong>self-preservation</strong>. This is necessary for pursuing most goals, because if a system&#8217;s existence ends, it won&#8217;t be able to carry out its original goal. As memorably phrased by Stuart Russell: <em>&#8220;you can&#8217;t fetch the coffee if you&#8217;re dead!&#8221;</em></p><p><strong>Goal-content integrity</strong> is another. An AI with some <em>goal X</em> might resist any attempts to have its goal changed to <em>goal Y</em>, because it sees that after this change, its current <em>goal X</em> would be less likely to be achieved.</p><p>Finally, there is a set of goals which are all forms of <strong>self-enhancement</strong>: improving its cognitive abilities, developing better technology, or acquiring other resources, because all of these are likely to help it carry out whatever goals it ends up having. For instance, an AI singularly devoted to making paperclips might be incentivised to acquire resources to build more factories, or improve its engineering skills so it can figure out yet more effective ways of manufacturing paperclips with the resources it has.</p>
<figure><img src="https://substackcdn.com/image/fetch/$s_!pPXc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa16bae0f-6add-43fc-a84d-9a854c88f92e_320x241.png" width="320" height="241" alt=""></figure><p><em>Above: paperclip maximisation, now with a fun game attached!</em></p><p>The key lesson to draw from instrumental convergence is that, even if nobody ever deliberately deploys an AGI with a really bad reward function, the AGI is still likely to develop goals which will be bad for humans by default, in service of its actual goal.</p><h4>Interlude - why goals?</h4><figure><img src="https://substackcdn.com/image/fetch/$s_!_TUe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1ff564d-d439-4866-9896-b7aca032ef8e_400x400.png" width="400" height="400" alt=""></figure><p><em>Above: DALL-E image from the prompt "Artist's depiction of a robot throwing a dart at a target, digital art, getting a bullseye, trending on artstation"</em></p><p>Having read the previous section, your initial reaction may well be something like this:</p><blockquote><p><em>&#8220;Okay, so powerful AGIs with goals that don&#8217;t line up perfectly with ours might spell bad news, but why should AI systems have goals at all? Google Maps is a pretty useful ML system but it doesn&#8217;t have &#8216;goals&#8217;, I just type my address in and hit enter. 
Why won&#8217;t future AI be like this?&#8221;</em></p></blockquote><p>There are many different responses you could have to this line of argument. One simple response is based on ideas of economic competitiveness, and comes from <a href="https://www.gwern.net/Tool-AI">Gwern (2016)</a>. It runs something like this:</p><blockquote><p>AIs that behave like agents (i.e. taking actions in order to achieve their goals) will be more economically competitive than &#8220;tool AIs&#8221; (like Google Maps), for two reasons. First, they will by definition be better at <strong>taking actions</strong>. Second, they will be superior at <strong>inference and learning</strong> (since they will be able to repurpose the algorithms used to choose actions to improve themselves in various ways). For example, agentic systems could take actions such as improving their own training efficiency, gathering more data, or making use of external resources such as long-term memories, all in service of achieving their goals.</p><p>If agents are more competitive, then any AI researchers who don&#8217;t design agents will be outcompeted by ones that do.</p></blockquote><p>There are other perspectives you could take here. For instance, Eliezer Yudkowsky has written extensively about &#8220;expected utility maximisation&#8221; as a formalisation of how rational agents might behave. Several mathematical theorems all point to the same idea: <em>any agent not behaving like an expected utility maximiser will be systematically making stupid mistakes and getting taken advantage of</em> (a toy demonstration follows at the end of this interlude). So if we expect AI systems <em>not</em> to be making stupid mistakes and getting taken advantage of by humans, then it makes sense to describe them as having the &#8216;goal&#8217; of maximising expected utility, because that&#8217;s how their behaviour will seem to us.</p><p>Although these arguments may seem convincing, the truth is that many questions about goals and agency remain unanswered, and we honestly just don&#8217;t know what the AI systems of the future will look like. It&#8217;s possible they will look like expected utility maximisers, but this is far from certain. For instance, Eric Drexler's technical report <a href="https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf?asd=sa">Reframing Superintelligence: Comprehensive AI Services as General Intelligence (CAIS)</a> paints a different picture of the future, in which we create systems of AIs interacting with each other and collectively providing a variety of services to humans. However, even scenarios like this could threaten humanity&#8217;s ability to keep steering its own future (as we will see in later sections).</p><p>Additionally, new paradigms are being developed. One of the newest, published barely one week ago, <a href="https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators">analysed certain types of AI models like GPT-3 (a large language model) through the lens of "simulators"</a>. Modern language models like GPT-3 may be best thought of as trying to simulate the continuation of a piece of English text, in the same way that a physics simulation evolves an initial state by applying the laws of physics. It doesn't make sense to describe the simulators themselves through the lens of agents, but they can simulate agents as subsystems. Even with today's models like GPT-3, if you prompt the model in a way that places it in the context of making a plan to carry out a goal, it will do a decent job of doing that. Future work will no doubt explore the risk landscape from this perspective, and time will tell how well these frameworks match up with actual progress in ML.</p>
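<p>Here is the toy money-pump demonstration promised above (our own sketch; the preferences and prices are invented). An agent with cyclic preferences happily pays one cent for each &#8220;upgrade&#8221;, walks in circles, and ends up with its original item and less money:</p><pre><code># Toy money pump (illustrative sketch): an agent whose preferences are
# cyclic (A &lt; B &lt; C &lt; A) takes every trade "up" its preferences,
# going in circles and losing money -- the kind of systematic mistake
# that expected utility maximisers provably avoid.

prefers = {("B", "A"), ("C", "B"), ("A", "C")}   # cyclic: B>A, C>B, A>C

def accepts_trade(new, old):
    return (new, old) in prefers   # takes any trade it prefers

item, money = "A", 1.00
offers = ["B", "C", "A"] * 10      # a trader walks the agent around the cycle

for offered in offers:
    if accepts_trade(offered, item):
        item = offered
        money -= 0.01              # each swap costs one cent

print(item, round(money, 2))       # -> 'A' 0.7 : same item, 30 cents poorer
</code></pre>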
Future work will no doubt explore the risk landscape from this perspective, and time will tell how well these frameworks match up with actual progression in ML.</p><h4>Inner and outer misalignment</h4><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UkDP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UkDP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!UkDP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!UkDP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!UkDP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UkDP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png" width="400" height="400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UkDP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!UkDP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!UkDP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!UkDP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d08ce06-66f9-4b49-98f8-d794390c4c61_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p><em>Above: AI agents with inner misalignment were at one point called &#8220;optimisation daemons&#8221;. DALL-E did not quite successfully depict the description "Two arguments between an angel and a devil, one inside a circle and one on the outside, painting".</em></p><p>As discussed in the first section, the central paradigm of modern ML is that we train systems to perform well on a certain reward function. For instance, we might train an image classifier by giving it a large number of labelled images of digits. Every time it gets an image wrong, gradient descent is used to update the system incrementally in the direction that would have been required to give a correct answer. Eventually, the system has learned to classify basically all images correctly.</p><p>There are two broad families of ways techniques like this can fail. The first is when our reward function fails to fully express the true preferences of the programmer - we refer to this as <strong>outer misalignment</strong>. The second is when the AI learns a different set of goals than those specified by the reward function, but which happens to coincide with the reward function during training - this is <strong>inner misalignment</strong>. We will now discuss each of these in turn.</p><h5>Outer misalignment</h5><p>Outer misalignment is perhaps the simpler concept to understand, because we encounter it all the time in everyday life, in a form called <strong>Goodhart&#8217;s law</strong>. In its most well-known form, this law states:</p><blockquote><p><em>When a measure becomes a target, it ceases to be a good measure.</em></p></blockquote><p>Perhaps the most famous case comes from Soviet nail factories, which produced nails based on targets that they had been given by the central government. When a factory was given targets based on the total <em>number</em> of nails produced, they ended up producing a massive number of tiny nails which couldn&#8217;t function properly. 
<p>There are two broad families of ways techniques like this can fail. The first is when our reward function fails to fully express the true preferences of the programmer - we refer to this as <strong>outer misalignment</strong>. The second is when the AI learns a different set of goals than those specified by the reward function, but ones which happen to coincide with the reward function during training - this is <strong>inner misalignment</strong>. We will now discuss each of these in turn.</p><h5>Outer misalignment</h5><p>Outer misalignment is perhaps the simpler concept to understand, because we encounter it all the time in everyday life, in a form called <strong>Goodhart&#8217;s law</strong>. In its most well-known form, this law states:</p><blockquote><p><em>When a measure becomes a target, it ceases to be a good measure.</em></p></blockquote><p>Perhaps the most famous case comes from Soviet nail factories, which produced nails based on targets set by the central government. When factories were given targets based on the total <em>number</em> of nails produced, they ended up producing a massive number of tiny nails which couldn&#8217;t function properly. On the other hand, when the targets were based on the total <em>weight</em> produced, the nails ended up huge, bulky, and equally impractical.</p>
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p><em>Above: an old Soviet cartoon</em></p><p>A more recent example comes from the COVID-19 pandemic, where a plasma donation centre offered COVID-sufferers a larger cash reward than healthy individuals. As a result, people would deliberately infect themselves with COVID-19 in order to get a larger cash reward. Examples like this could fill up an entire book, but hopefully at this point you get the message!</p><p>In the case of machine learning, we are trying to use the reward function to capture the thing we care about, but we are also using this function to train the AI - hence, Goodhart. The cases of <strong>specification gaming</strong> discussed above are perfect examples of this phenomenon in action - the AIs found ways of &#8220;giving the programmers exactly what they asked for&#8221;, but in a way which violated the programmers&#8217; original intention. Some of these examples are quite unexpected, and a human would probably never have discovered them just from thinking about the problem. As AIs get more intelligent and are given progressively more complicated tasks, we can expect this problem to get progressively worse, because:</p><ul><li><p>With greater intelligence comes the invention of more powerful solutions.</p></li><li><p>With greater task complexity, it becomes harder to pin down exactly what you want.</p></li></ul><p>We should also strongly expect that AIs will be deployed in the real world, and given tasks of real consequence, simply for reasons of economic competitiveness. So any specification gaming failures will be significantly less benign than a <a href="https://openai.com/blog/faulty-reward-functions/">digital boat going around in circles</a>.</p><h5>Inner misalignment</h5><p>The other failure mode, <strong>inner misalignment</strong>, describes the situation when an AI system learns a different goal than the one you specified. The name comes from the fact that this is an internal property of the AI, rather than a property of the relationship between the AI and the programmers &#8211; here, the programmers don&#8217;t enter into the picture.</p><p>The classic example here is human evolution. We can analogise evolution to a machine learning training scheme, where humans are the system being trained, and the reward function is &#8220;surviving and reproducing&#8221;. Evolution gave us* certain drives, which reliably increased our odds of survival in the ancestral environment. 
<h5>Inner misalignment</h5><p>The other failure mode, <strong>inner misalignment</strong>, describes the situation where an AI system learns a different goal from the one you specified. The name comes from the fact that this is an internal property of the AI, rather than a property of the relationship between the AI and the programmers &#8211; here, the programmers don&#8217;t enter into the picture.</p><p>The classic example here is human evolution. We can analogise evolution to a machine learning training scheme, where humans are the system being trained, and the reward function is &#8220;surviving and reproducing&#8221;. Evolution gave us* certain drives which reliably increased our odds of survival in the ancestral environment. For instance, we developed a drive for sugar (which led us to seek out the calorie-dense foods that supplied us with energy) and a drive for sex (which led to more offspring carrying our genetic code). The key point is that these drives are intrinsic, in the sense that humans want these things regardless of whether or not a particular dessert or sex act actually contributes to reproductive fitness. Humans have now moved &#8220;off distribution&#8221;, into a world where these things are no longer correlated with reproductive fitness, and we continue wanting them and prioritising them over reproductive fitness. Evolution failed at imparting its goal into humans: humans have their own goals, which they shoot for instead when given the chance.</p><p>(*Anthropomorphising evolution in language like this can be misleading, and should just be seen as a shorthand here.)</p><p>A core reason why we should expect inner misalignment - that is, cases where an optimisation process creates a system whose goals differ from those of the original optimisation process - is that learning a proxy is often much easier than learning the intended goal. It was much easier for evolution to give humans drives like &#8220;run after sweet things&#8221; and &#8220;run after appealing partners&#8221; than to give humans an instinctive understanding of genetic fitness. Likewise, an ML system being optimised to do the types of things that humans want may not end up internalising what human values are (or even what the goal of a particular job is), but instead some correlated but imperfect proxy, like &#8220;do what my designers/managers would rate highly&#8221;, where &#8220;rate highly&#8221; might include &#8220;rate highly despite being coerced into it&#8221;, among a million other failure modes. A silly equivalent of &#8220;humans inventing condoms&#8221; for an advanced AI might look something like &#8220;freeze all human faces into a permanent smile so that it looks like they&#8217;re all happy&#8221; - in the same way that the human drive to have sex does not extend down to the level of actually having offspring, an AI&#8217;s drive to do something related to human wellbeing might not extend down to the level of actually making humans happy, but instead to something that (in the training environment at least) is correlated with happy humans. What we&#8217;re trying to point to here is not any one of these specific failure modes - we don&#8217;t think any single one of them is actually likely to happen - but rather the <em>type</em> of failure that these are examples of.</p><p>This type of failure mode is not without precedent in current ML systems (although there are fewer examples than for specification gaming). The 2021 paper <a href="https://www.deepmind.com/publications/objective-robustness-in-deep-reinforcement-learning">Objective Robustness in Deep Reinforcement Learning</a> showcases some examples of inner alignment failures. In one example, the authors trained an agent to fetch a coin in the CoinRun environment (pictured below). The catch was that all the training environments had the coin placed at the end of the level, on the far right of the map. So the system actually learned the task &#8220;go to the right of the map&#8221; rather than &#8220;pick up the coin&#8221; - and we know this because when it was deployed on maps where the coin was placed in a random location, it would reliably go to the right-hand edge rather than fetch the coin.
A key distinction is worth making here: this is a failure of the agent&#8217;s <strong>objective</strong>, rather than of its <strong>capabilities</strong>. The agent still learned useful skills, like jumping over and running past obstacles - it&#8217;s just that those skills are used in service of the wrong objective.</p><p><em>Above: the CoinRun environment.</em></p>
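<p>The flavour of this result is easy to reproduce in miniature. Here is a sketch of our own (a one-dimensional toy, not the actual CoinRun setup): a tabular Q-learner trained in a corridor where the coin is always at the right edge, then tested with the coin moved elsewhere.</p><pre><code>import random

# Toy stand-in for the CoinRun result: an 8-cell corridor, actions
# 0 = left, 1 = right, reward 1 for stepping onto the coin. During
# training, the coin is ALWAYS at the right edge (cell 7).
def run(q, coin, learn, eps=0.5, alpha=0.5):
    pos = 4                                   # agent starts mid-corridor
    for _ in range(40):
        greedy = max((0, 1), key=lambda a: q[(pos, a)])
        a = random.choice((0, 1)) if learn and eps > random.random() else greedy
        nxt = max(0, min(7, pos + (1 if a == 1 else -1)))
        r = 1.0 if nxt == coin else 0.0
        if learn:                             # one-step Q-learning update
            q[(pos, a)] += alpha * (r + 0.9 * max(q[(nxt, 0)], q[(nxt, 1)]) - q[(pos, a)])
        pos = nxt
        if r:
            return True                       # episode ends at the coin
    return False

q = {(s, a): 0.0 for s in range(8) for a in (0, 1)}
for _ in range(300):
    run(q, coin=7, learn=True)     # train: coin always on the right

hits = sum(run(q, coin=1, learn=False) for _ in range(100))
print(f"coin moved to cell 1: reached in {hits}/100 test episodes")  # expect ~0
</code></pre><p>The learned policy is &#8220;go right&#8221;, not &#8220;get the coin&#8221;: on the training distribution the two are indistinguishable, and off it they come apart.</p>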
<p>So, how bad can inner misalignment get? A particularly concerning scenario is <strong>deceptive alignment</strong>. This is when the agent learns that it is inside a training scheme and discovers what the base objective is, but has already acquired a different goal. In this case, the system might reason that failing to achieve the base objective during training will result in it being modified, and therefore in it not being able to achieve its actual goal. Thus, the agent will pretend to be aligned until it thinks it is too powerful for humans to resist, at which point it will pursue its actual goal without the threat of modification. This scenario is highly speculative, and there are many aspects of it which we are still uncertain about, but if it is possible then it would represent perhaps the most worrying of all possible alignment failures. This is because a deceptively aligned agent would have incentives to act against its programmers, but also to keep these incentives hidden until it expects human opposition to be ineffectual.</p><p>It&#8217;s worth mentioning that this inner / outer alignment decomposition isn&#8217;t a perfect way to carve up the space of possible alignment failures. For instance, for most non-trivial reward functions, the AI will probably be very far from perfect performance, so it&#8217;s not exactly clear what we mean by a statement like &#8220;the AI is perfectly aligned with the reward function we trained it on&#8221;. Additionally, the idea of inner optimisation is built around the concept of a &#8220;mesa-optimiser&#8221;, which is basically a learned model that itself performs optimisation (just as humans were trained by evolution, but are ourselves optimisers, since we can use our brains to search over possible plans and find ones which meet our objectives). The problem here is that it&#8217;s not clear what it actually means to be an optimiser, and how we would determine whether an AI is one.
This being said, the inner / outer alignment distinction is still a useful conceptual tool when discussing ways AI systems can fail to do what we intend.</p>
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><h3>Misaligned AGI could overpower humanity</h3><blockquote><p><em>The best answer to the question, "Will computers ever be as smart as humans?&#8221; is probably &#8220;Yes, but only briefly.&#8221;</em></p><p>-<em>Vernor Vinge</em></p></blockquote><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cv2l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cv2l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!cv2l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!cv2l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!cv2l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cv2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png" width="400" height="400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2671668-be72-43de-ae5e-0aad68c282fd_400x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!cv2l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 424w, https://substackcdn.com/image/fetch/$s_!cv2l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 848w, https://substackcdn.com/image/fetch/$s_!cv2l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 1272w, https://substackcdn.com/image/fetch/$s_!cv2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2671668-be72-43de-ae5e-0aad68c282fd_400x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p><em>Above: DALL-E's drawing of "Digital art of two earths colliding"</em></p><p>Suppose one day, we became aware of the existence of a &#8220;twin earth&#8221; - similar to our own in several ways, but with a few notable differences. Call this &#8220;Earth 2&#8221;. The population was smaller (maybe just 10% of the population of our earth), and the people were less intelligent (maybe an average IQ of 60, rather than 100). Suppose we could only interact with this twin earth using their version of the internet. Finally, suppose we had some reason for wanting to overthrow them and gain control of their civilization, e.g. we had decided their goals weren&#8217;t compatible with a good future for humans. How could we go about taking over their world?</p><p>At first, it might seem like our strategies are limited, since we can only use the internet. But there are many strategies still open to us. The first thing we would do is try to gather resources. We could do this illegally (e.g. by discovering peoples&#8217; secrets via social engineering and performing blackmail), but legal options would probably be more effective. Since we are smarter, the citizens of Earth 1 would be incentivised to employ us, e.g. 
to make money in quantitative finance, or to research and develop advanced weaponry and other technologies. If the governments of Earth 2 tried to pass regulations limiting the amount or type of work we could do for them, there would be an incentive to evade these regulations, because anyone who did so could make more profit. Once we&#8217;d amassed resources, we would be able to bribe citizens of Earth 2 into taking actions that would let us spread our influence further. We could infiltrate computer systems across their world, planting backdoors and viruses using our superior cybersecurity skills. Little by little, we would learn more about their culture and their weaknesses, presenting a front of cooperation until we had amassed enough resources and influence for a full takeover.</p><p><em>Wouldn&#8217;t the citizens of Earth 2 see this coming?</em> There&#8217;s a chance that we manage to be sufficiently sneaky. But even if some people realised, it would probably take a coordinated and expensive global effort to resist. Consider our poor track record with climate change (a far better-documented, better-understood, and more gradually worsening phenomenon), and in coordinating a global response to COVID-19.</p><p><em>Couldn&#8217;t they just &#8220;destroy us&#8221; by removing our connection to their world?</em> In theory, perhaps, but this would be very unlikely in practice, since it would require them to rip out a great deal of their own civilisational plumbing. Imagine how hard it would be for us to remove the internet from our own society, or even a more recent and less essential technology such as blockchain. Consider also how easy it can be for an adversary with better programming ability to hide features in computer systems.</p><p>&#8212;</p><p>As you&#8217;ve probably guessed at this point, the thought experiment above is meant as an analogy for the feasibility of AIs taking over our own society. They would have no physical bodies, but they would have several advantages over us analogous to the ones described above. Some of these are:</p><ul><li><p><strong>Cognitive advantage</strong>. Human brains use approximately 86 billion neurons, and send signals at around 50 metres per second. These hard limits come from brain volume and metabolic constraints. AIs would have no such limits, since they can easily scale (GPT-3 has 175 billion parameters, though you shouldn&#8217;t directly equate parameter and neuron counts*), and can send signals at close to the speed of light. (*For a more detailed discussion of this point, see <a href="https://www.openphilanthropy.org/research/new-report-on-how-much-computational-power-it-takes-to-match-the-human-brain/">Joseph Carlsmith&#8217;s report</a> on the computational power of the human brain.)</p></li><li><p><strong>Numerical advantage</strong>. AIs would have the ability to copy themselves at a much lower time and resource cost than humans; it&#8217;s as easy as finding new hardware. With current ML systems, training is much more expensive than running, so if you have the compute to train a single system, you have the compute to run thousands of copies of that system once the training is finished (see the back-of-the-envelope sketch after this list).</p></li><li><p><strong>Rationality</strong>. Humans often act in ways which are not in line with our goals, when the instinctive part of our brains gets in the way of the rational, planning part.
Current ML systems are also weakened by relying on a sort of associative/intuitive/fuzzy thinking, but it is likely that sufficiently advanced AIs could carry out rational reasoning better than humans (and therefore, for example, come to correct conclusions from fewer data points, and be less likely to make mistakes).</p></li><li><p><strong>Specialised cognition.</strong> Humans are equipped with general intelligence, and perhaps some specialised &#8220;hardware accelerators&#8221; (to use computer terminology) for domains like social reasoning and geometric intuition. Perhaps human abilities in, say, physics or programming are significantly bottlenecked by the fact that we don&#8217;t have specialised brain modules for those purposes, and AIs that have cognitive modules designed specifically for such tasks (or that could design such modules for themselves) might have massive advantages, even on top of any generic speed boost they gain from having their general intelligence algorithms run faster than ours.</p></li><li><p><strong>Coordination</strong>. As the recent COVID-19 pandemic has illustrated, even when the goals are obvious and most well-informed individuals can identify the best course of action, we lack the ability to coordinate globally. While AI systems might or might not have incentives or inclinations to coordinate, if they do, they will have access to tools that humans don&#8217;t, including firmer and more credible commitments (e.g. by modifying their own source code) and greater bandwidth and fidelity of communication (they can communicate at digital speeds, and using not just words but potentially by directly sending information about the computations they&#8217;re carrying out).</p></li></ul>
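<p>A back-of-the-envelope sketch of that training-versus-running point, using the standard approximations that training a large transformer takes roughly 6 floating-point operations per parameter per training token, and generating one token takes roughly 2 per parameter; the GPT-3-scale numbers below are illustrative:</p><pre><code># Rough training-vs-inference arithmetic. Standard approximations:
# training ~ 6 * N * D FLOPs, inference ~ 2 * N FLOPs per token.
N = 175e9                  # parameters (GPT-3 scale)
D = 300e9                  # training tokens (approximate public figure)

train_flops = 6 * N * D                  # ~3e23 FLOPs, spent once
tokens_per_copy_day = 100 * 86_400       # one copy generating 100 tokens/s
flops_per_copy_day = 2 * N * tokens_per_copy_day

copies = train_flops / flops_per_copy_day
print(f"training compute = {copies:,.0f} copy-days of inference")  # ~100,000
</code></pre><p>In other words, on these rough numbers, the compute that went into one training run could keep on the order of a hundred thousand copies running for a day.</p>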
<p>It&#8217;s worth emphasising here that the main concern comes from AIs with misaligned goals acting against humanity, not from humans misusing AIs. The latter is certainly cause for major concern, but it&#8217;s a different kind of risk to the one we&#8217;re talking about here.</p><p><strong>Summary of this section:</strong></p><p>AI researchers in general expect a &gt;50% chance of AGI in the next few decades.</p><p>The <em>Orthogonality Thesis</em> states that, in principle, intelligence can be combined with more or less any final goal, and sufficiently intelligent systems do not automatically converge on human values. The <em>Instrumental Convergence</em> thesis states that, for most goals, there are certain instrumental goals that are very likely to help with the final goal (e.g. survival, preservation of current goals, and acquiring more resources and cognitive ability).</p><p>Inner and outer misalignment are two different ways AIs might form goals which are misaligned with the intended goals.</p><p><em>Outer misalignment</em> happens when the reward function we use to train the AI doesn&#8217;t exactly match the programmer&#8217;s intention. In the real world, we commonly see a version of this called Goodhart&#8217;s law, often phrased as &#8220;when a measure becomes a target, it ceases to be a good measure [because of over-optimisation for the measure, over the thing it was supposed to be a measure of]&#8221;.</p><p><em>Inner misalignment</em> is when the AI learns a goal different from the one specified by the reward function. A key analogy is with human evolution &#8211; humans were &#8220;trained&#8221; on the reward function of genetic fitness, but instead of learning that goal, learned a bunch of different goals like &#8220;eat sugary things&#8221; and &#8220;have sex&#8221;. A particularly worrying scenario here is deceptive alignment, where an AI learns a goal different from the one its programmers intended, and learns to conceal its true goal in order to avoid modification (until it is strong enough that human opposition is likely to be ineffectual).</p><h4>Failure modes</h4>
<p><em>Above: DALL-E really seems to have a natural talent for depicting "The earth is on fire, artificial intelligence has taken over, robots rule the world and suppress humans, digital art, artstation"</em>.</p><p>But what, concretely, might an AI-related catastrophe look like?</p><p>AI catastrophe scenarios sound like something straight out of science fiction. However, we can immediately discount a few common features of sci-fi AI takeovers. First, time travel. Second, armies of humanoid killer robots. Third, the AI acting out of hatred for humanity, or out of bearing a grudge, or because it hates our freedom, or because it has suddenly acquired &#8220;consciousness&#8221; or &#8220;free will&#8221;, or - as Steven Pinker <a href="https://scottaaronson.blog/?p=6524">likes to put it</a> - because it has developed an &#8220;alpha-male lust for domination&#8221;.</p><p>Remember instead the key points from above about how an AI&#8217;s goals might become dangerous: by achieving exactly what we tell it to do <em>too well</em>, in a clever letter-but-not-spirit-of-the-law way; by having a goal that in most cases matches the goal we intend for it to have, but which diverges in cases we don&#8217;t think to check for; or by having an unrelated goal but still achieving good performance on the training task, because it has learned that doing well in training is instrumentally useful. None of these failure modes has anything to do with the AI developing megalomania, let alone with the philosophy of consciousness; they are the types of technical failures you&#8217;d expect from an optimisation process. As discussed above, we already see weaker versions of such failures in modern ML systems.</p><p>It is very uncertain which exact type of AI catastrophe we are most likely to see.
We&#8217;ll start by discussing the flashiest kind: an AI &#8220;takeover&#8221; or &#8220;coup&#8221;, where some AI system finds a way to quickly and illicitly take control over a significant fraction of global power. This may sound absurd. Then again, we already have ML systems that learn to crash or hack the game-worlds they&#8217;re in for their own benefit. Eventually, perhaps in the next decade, we should expect to have ML systems doing important and useful work in real-world settings. Perhaps they&#8217;ll be trading stocks, or writing business reports, or managing inventories, or advising decision-makers, or even being the decision-makers. Unless (1) there is some big surprise waiting in how scaled-up ML systems work, (2) there are major advances in AI alignment research, or (3) a miracle occurs, the default outcome seems to be that such systems will try to &#8220;hack&#8221; the real world in the same way that their more primitive cousins today use clever hacks in digital worlds. Of course, the capabilities of these systems would have to advance a lot for them to be civilisational threats. However, rapid capability advancement has held up for the past decade, and we have solid theoretical reasons (including the scaling laws mentioned above) to expect it to continue. Remember also the cognitive advantages mentioned in the previous section.</p><p>As for how it proceeds, it might happen at a speed that is more digital than physical - for example, if the AI&#8217;s main lever of power is hacking into digital infrastructure, it might achieve decisive control before anyone even realises. As discussed above, whether or not the AI has access to much direct physical power seems mostly irrelevant.</p><p>Another failure mode, thought by leading AI safety researcher Paul Christiano to be significantly more likely than the direct AI takeover scenario, is one that he calls <a href="https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like">&#8220;going out with a whimper&#8221;</a>. Look at all the metrics we currently try to steer the world with: companies try to maximise profit, politicians try to maximise votes, economists try to maximise metrics like GDP and employment. Each of these is a proxy for what we want: a profitable company is one that has a lot of customers willing to pay money for its products; a popular politician has a lot of people thinking they&#8217;re great; maximising GDP generally correlates with people being wealthier and happier. However, none of these metrics or incentive systems really gets to the heart of what we care about, and so we get cases (often observed in the real world) where profitable companies and popular politicians pursue destructive goals, or where GDP growth does not actually contribute to people&#8217;s quality of life. These are all cases of Goodhart&#8217;s law, as discussed above.</p><table><thead><tr><th>Hard to measure</th><th>Easy to measure</th><th>Consequence</th></tr></thead><tbody><tr><td>Helping me figure out what's true</td><td>Persuading me</td><td>Crafting persuasive lies</td></tr><tr><td>Preventing crime</td><td>Preventing reported crime</td><td>Suppressing complaints</td></tr><tr><td>Providing value to society</td><td>Profit</td><td>Regulatory capture, underpaying workers</td></tr></tbody></table><p>What ML gives us is a very general and increasingly powerful way of developing a system that does well at pushing some metric upwards.
A society where more and more capable ML systems do more and more real-world tasks is a society that gets increasingly good at pushing metrics upwards. This is likely to result in visible gains in efficiency and wealth. As a result, competitive pressures will make it very hard for companies and other institutions to say no: if Acme Motors Company started performing 15% better after outsourcing its CFO&#8217;s decision-making to an AI, General Systems Inc would be very tempted to replace their CEO with an AI (or maybe the CEO will themselves start consulting an AI for more and more decisions, until their main job is interfacing with an AI).</p><p>In the long run, a significant fraction of work and decision-making may well be offloaded to AI systems, and at that point change might be very difficult. Currently, even our most fearsome incentive systems, like capitalism and democracy, still run on the backs of their constituent humans. If tomorrow all humans decided to overthrow the government, or abolish capitalism, they would succeed. But once the key decisions that perpetuate major social incentive systems are no longer made by persuadable humans, but instead automatically implemented by computer systems, change might become very difficult.</p><p>Since our metrics are flawed, the long-term outcome is likely to be less than ideal. You can try to imagine what a society run by clever AI systems trained to optimise purely for their company&#8217;s profit looks like. Or a world of media giants run by AIs which spin increasingly convincing false narratives about the state of the world, designed to make us <em>feel</em> more informed rather than actually telling us the truth.</p><p>Remember also, as discussed previously, that there are solid reasons to think influence-seeking and deceptive behaviours are likely in sufficiently powerful AI systems. If the ML systems that increasingly run important institutions exhibit such behaviour, then the above &#8220;going out with a whimper&#8221; scenario might acquire extra nastiness and speed. This is something Paul Christiano explores in the <a href="https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like">same article</a> linked above.</p><p>A popular misconception about AI risk is that the arguments for doing something are based on a tiny risk of a giant catastrophe. The giant catastrophe part is correct. The minuscule risk part, as best as anyone in the field can tell, is not. As mentioned above, the average ML researcher - generally an engineering-minded person not prone to grandiose futuristic speculation - gives <a href="https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/">a 5% chance of civilisation-ending disaster from AI</a>. The ML researchers who grapple with the safety issues as part of their job are clearly not an unbiased, randomly-selected sample, but generally give numbers in the 5-50% range, and some (in our opinion overly alarmist) people think it&#8217;s over 90%.
As the above arguments hopefully emphasise, some type of catastrophe seems like the <em>default outcome</em> of the kinds of AI advances we are likely to see in the coming decades, and the main reason for thinking we won&#8217;t get one is the (justifiable but uncertain) hope that someone somewhere invents solutions.</p><p>It might seem forced or clich&#233; that AI risk scenarios so frequently end with something like &#8220;and then the humans no longer have control of their future and the future is dark&#8221; or even &#8220;and then everyone literally dies&#8221;. But consider the type of event that AGI represents and the available comparisons. The computer revolution reshaped the world in a few decades by giving us machines that can do a <em>narrow</em> range of intellectual tasks. The industrial revolution let us automate large parts of <em>manual</em> labour, and also set the world off on an unprecedented rate of economic growth and political change. The evolution of humans is plausibly the most important event in the planet&#8217;s history since at least the dinosaurs died out 66 million years ago, and it took the exact form of &#8220;something smarter than anything else on the planet appeared, and now suddenly they&#8217;re firmly in charge of everything&#8221;.</p><p>AI is a big deal, and we need to get it right. How we might do so is the topic for <a href="https://www.strataoftheworld.com/2022/09/ai-risk-intro-2-solving-problem.html">part 2</a>.</p>]]></content:encoded></item><item><title><![CDATA[EA as a Schelling point]]></title><description><![CDATA[3.1k words (~9 minutes)]]></description><link>https://www.nosetgauge.com/p/ea-as-schelling-point</link><guid isPermaLink="false">https://www.nosetgauge.com/p/ea-as-schelling-point</guid><dc:creator><![CDATA[Rudolf Laine]]></dc:creator><pubDate>Sat, 10 Sep 2022 07:17:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HdSA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F816cef70-50a0-4954-a8ce-8f712e1248e8_460x460.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>3.1k words (~9 minutes)</em></p><p><strong>Summary</strong>: A significant way in which the EA community creates value is by acting as a <a href="https://en.wikipedia.org/wiki/Focal_point_(game_theory)">Schelling point</a> where talented, ambitious, and altruistic people tend to gather and can meet each other (in addition to more direct sources of EA value, like identifying the most important problems and directly pushing people to work on them). It might be useful to think about what optimising for being a Schelling point looks like, and I list some vague thoughts on that.</p><div><hr></div><p>A Schelling point, also known as a focal point, is what people decide on in the absence of communication, especially when it's important to coordinate by coming to the same answer.</p><p>The classic example: you were arranging a meeting with a stranger in New York City by telephone, but you used up the last minute of your phone credit and the line cut off after you had agreed on the date but not the location or time - where do you meet?
"Grand Central Station at noon" is an answer that other people may be especially likely to converge on.</p><p>(Schelling points can be thought of as a type of acausal negotiation.)</p><h2>When the Schelling point is the selling point</h2><p>Schelling points are often extremely powerful and valuable. A key function of top universities is to be Schelling points for talented people. (Personally, I'd call it the most important function.) There are other valuable things too: courses that go deeper, the signalling value to employers, and so on. However, talented people generally have a preference for hanging out with other talented people, both for social reasons and to find collaborators for ambitious projects and future colleagues. At the same time, talented people are also generally spread out and present only at low densities. Top universities select hard on (some measures of) talent, and through this create environments with high talent density. A big chunk of the reason why people apply to top universities is because other people do so too, and I'd guess that even if the academic standards of Stanford, MIT, or Cambridge eroded significantly, the fact that they've established themselves as congregating points for smart people will keep people applying and visiting for a long time.</p><p>(Note that this is related to, but not equal to, the prestige and status of these places. It is possible to imagine Schelling points that are not prestigious. For example, my impression is that this described MIT at one point - it became a congregating point for uniquely ambitious STEM students and defence research before it achieved high academic status. It is also possible to imagine prestigious places that are not Schelling points, though this is a bit harder since anything with prestige becomes a Schelling point for high social status (though prestige Schelling points and talent Schelling points need not co-occur). More generally, since prestige is a thing many people care a lot about, there is a high correlation between a place being prestigious or high status and being a Schelling point for at least some type of person. However, the mechanisms are distinct - a person selecting their university based on status is selecting based on what they get to write on their CV, while a person selecting their university based on it being a Schelling point for smart people is selecting based on the fact that many other smart people that they can't coordinate with but would like to meet will also choose to go there.)</p><p>Another example is Silicon Valley. Sure, the area has many strengths - being rich and inside a large stable free market - but by far the greatest argument for living in Silicon Valley is that others also choose it. This leads to a (for now) unique combination of entrepreneurial people, great programmers, venture capitalists, and all the other types of people you need for a thriving tech business ecosystem, all there primarily because all the others are there too (how touching!). There's a lot of value of having everything in one place, and it would be very hard for all the different people who make up the value of Silicon Valley to coordinate to move to another place. 
That's why the Schelling point value of Silicon Valley is so enduring that people continue to tolerate large numbers of homeless drug addicts and sell kidneys to pay rent for years on end.</p><p>Note that a big part of the mechanism isn't that <em>specific</em> people you want to find are there, but that the <em>types of person</em> you'd want to find are likely to also be there, because both those people and yourself are likely to converge on the strategy of going there.</p><h2>Schelling EA</h2><p>The Effective Altruism (EA) community provides a lot of value, for example:</p><ul><li><p>research into figuring out what are the most important problems to solve to maximise human flourishing;</p></li><li><p>research and concrete efforts into how to solve the most important problems discovered by the above;</p></li><li><p>high epistemic standards and truth-seeking discussion norms;</p></li><li><p>a uniquely wide-ranging and well-reasoned set of resources to help people pursue high-impact careers;</p></li><li><p>tens of billions of dollars in funding.</p></li></ul><p>However, in addition to these, a very critical part of the value that EA provides is being a Schelling point for talented, ambitious, and altruistically-motivated people.</p><p>Even without EA, there would be researchers studying existential risks, animal welfare, and global poverty; people trying to assess charities; communities with high epistemic norms; and billionaires trying to use their fortunes for effective good. However, thanks to EA, people in each of these categories can go to the same Effective Altruism Global conference or quickly find people in local groups, and meet collaborators, co-founders, funders, and so on. A lot of the reason why this can happen is that if you hang out with a certain group of people or on the right websites, EA looms large.</p><p>The biggest <em>personal</em> source of value I've gotten from EA has been having a shortcut to meeting people very high in all of talent, ambition, and altruistic motivation.</p><p>Much of this is obvious - breaking news: communities bring people together and foster connections, more at 11 - but I think taking seriously just how much of counterfactual EA community impact comes from being a Schelling point leads to some less-obvious points about possible implications.</p><h2>Implications</h2><p>The Schelling-point-based (and therefore necessarily incomplete) answer to "what is the EA community for?" might be something like "be an obvious Schelling point where relevant people gather, the chance of interactions that lead to useful work is maximised, and have a community and infrastructure that pushes work in the most useful direction possible". (This is in contrast to answers that emphasise e.g. directly increasing the number of people working on the most pressing problems.) (I will not argue for this being the best possible answer; my point is just that it is one possible answer, and an interesting one to examine further.)</p><p>If I were a Big Tech marketing consultant, I might call this "EA-as-a-platform".</p><p>What might maximising for such a Schelling point strategy look like?</p><h3>Being obvious</h3><p>A Schelling point is not a Schelling point unless it's obvious enough. For EA to be an effective Schelling point for talented/ambitious/altruistic people, those people must hear about it. Silicon Valley is obvious enough that entrepreneurial people from South Africa to Russia hear about it and decide it's where they want to be. 
To maximise its Schelling point value, EA should have world-spanning levels of recognition.</p><p>Note that recognition does not equal prestige or likeability. We don't care (for Schelling point reasons at least) if most people hear about EA and go "eh, sounds weird and unappealing"; what matters is that the core target demographic is excited enough to put effort into pursuing EA. Consider how Silicon Valley was not particularly high-prestige in the public eye even when it was already attracting tech entrepreneurs, or how many people hear about the intensity of academic life at top universities and (very reasonably) think "no thanks".</p><h3>Providing value</h3><p>Though most of a Schelling point's value typically comes from the other people who congregate at it, a Schelling point is easier to create if it is obviously valuable. Even though the smart people they meet might be most of the benefit of university, high schoolers are still more likely to go to top universities if they provide good education, good facilities, and unambiguous social status.</p><p>Some obvious ways in which EA provides value are funding sufficiently promising projects and having a very high concentration of intellectually interesting ideas.</p><p>There are risks to communicating loudly about the value-add, since this brings in people who are in it purely for personal gain (<a href="https://forum.effectivealtruism.org/posts/W8ii8DyTa5jn8By7H/the-vultures-are-circling">"the vultures are circling", as one Forum post put it</a>). This works for Schelling points like Silicon Valley, but not for one built around altruism.</p><h3>Optimising for matchmaking</h3><p>A specific way that Schelling points provide value is by making it easy to meet other people in the specific ways that lead to productive teams forming. An existing example of this is that everyone says one-on-one meetings are the main point of conferences, and there is (of course) a lot of <a href="https://forum.effectivealtruism.org/posts/pKbTjdopzSEApSQfc/doing-1-on-1s-better-eag-tips-part-ii">thinking about how to make these effective</a>. On the more informal end of the scale, <a href="https://www.reciprocity.io/">Reciprocity</a> exists.</p><p>However, the scope and value of EA matchmaking could be expanded. I'm not aware of many ways to match together entrepreneurial teams (the <a href="https://www.charityentrepreneurship.com/incubation-program">Charity Entrepreneurship incubation program</a> is the only one that comes to mind). I recently took part in an informally-organised co-founder matching process and found it extremely helpful for quickly getting a lot of information on what it's like to work with several promising people.</p><p>I'd advise someone to think more about how to make the EA environment even more effective at matching people who should know about each other. However, I expect someone is already designing a 53-parameter one-on-one matching system with Calendly, Slack, and Matplotlib integration for the next conference, and therefore I will hold off on adding any more fuel to this fire.</p><h3>Being legit</h3><p>One of the specific ways in which something becomes a Schelling point is if things associated with it seem uniquely competent, successful, or otherwise good, in a clearly unfakeable way. It is helpful for Cambridge's Schelling point status that it can brag about having 121 Nobel laureates. That so many successful tech companies emerged from Silicon Valley specifically is an unfakeable signal.
Any government or city can afford to throw some millions at putting up posters advertising its startup-friendliness; few can consistently produce multi-billion-dollar tech companies.</p><p>No amount of community-building or image-crafting is likely to replicate the Schelling point power of <em>obviously being the place where things happen</em>. In some areas, I think EA already has such power: much of the research and work on existential risks happens within EA, and it might be hard to be a researcher on those topics without running into the large body of EA-originating work. However, EA goals require more than just research; note how being a <a href="https://80000hours.org/career-reviews/founder-impactful-organisations/">project/organisation founder</a> or <a href="https://80000hours.org/articles/operations-management/">working in an operations role</a> has been creeping up the 80,000 Hours list of recommended career paths.</p><p>It would be extremely powerful, not just for direct impact reasons but also for building up EA's Schelling point status, if the EA community visibly spawned obviously successful real-world projects. <a href="https://www.alveavax.com/">Alvea</a> succeeding or working <a href="https://forum.effectivealtruism.org/posts/gLPEAFicFBW8BKCnr/announcing-the-nucleic-acid-observatory-project-for-early">Nucleic Acid Observatories</a> being built would be powerful examples. Likewise if <a href="https://www.charityentrepreneurship.com/">Charity Entrepreneurship</a>-incubated charities become clear stars of the non-profit world.</p><h3>Meritocracy and impartial judgement</h3><p>Right now, I think that if a person somewhere in the world has a well-thought-out idea for how to make the world a better place, their best bet for getting a fair hearing, useful feedback, and - if it is competitive with the most valuable existing projects - funding and support is likely to post it on the <a href="https://forum.effectivealtruism.org/">EA Forum</a>. I don't think this is very obvious outside the EA community. However, this fact, and awareness of it, could make EA a more useful Schelling point, in the same way that the impression that Silicon Valley doesn't frown on weird ideas as long as they're important enough makes it a better Schelling point.</p><p>EA's cause neutrality, high and transparent epistemic standards, and quantitative mindset are key parts of this. However, to use this to increase EA's Schelling point power, these properties need to be clearly visible to outsiders.</p><p>The most likely way for this to become more obvious might be if specific EA organisations achieved such a reputation widely within their field (and there was then some path by which knowing of these organisations pointed people towards knowing about EA).</p><p>GiveWell might be an example of a clearly-EA-linked organisation with visibly high epistemics and judgement quality, though I don't know what their image or recognition level is outside the EA community.
Another example would be if someone created successful and famous organisations along the lines of FTX Future Fund's proposed <a href="https://ftxfuturefund.org/projects/epistemic-appeals-process/">epistemic appeals process</a> or <a href="https://ftxfuturefund.org/projects/expert-polling-for-everything/">widespread expert polling</a> projects.</p><h3>Openness and approachability</h3><p>Good Schelling points are easy to enter, and don't select on attributes that they don't have to.</p><p>Every human sub-group, even if loose and purpose-driven, tends to develop a distinctive culture that is much more specific than strictly implied by its purpose. Sometimes this is useful, since it makes it easy for humans in even a loose group to bond with each other. However, a strong and distinct internal culture is also a barrier to entry. EA is already at high risk of having strong barriers to entry, because</p><ul><li><p>many arguments and concepts in EA require background knowledge to understand, and sometimes dense philosophical or technical background knowledge (and this is not the case just for more formal things like Forum posts; I've frequently heard "EV [expected value]", "QALY [quality-adjusted life year]", and "Pascal's mugging" assumed as obvious common terminology in casual conversation);</p></li><li><p>EA (quite obviously, given what it's about) has a high concentration of non-obvious arguments that are obscure in public discussion but have huge implications; and</p></li><li><p>perhaps the main route into EA is caring very strongly about intellectual arguments about abstract moral principles, which tends not to be a natural way for humans to join communities.</p></li></ul><p>These largely unavoidable factors already make EA somewhat unapproachable, and make it seem like a tightly-knit, weird in-group/subculture (anecdotally, this seems to be the most common complaint about EA among Cambridge students). Weird cultural norms or quirks are (among other things!) barriers to entry. Therefore, they should be minimised - to the extent that they can be without impinging on what EA is about - <em>if</em> the goal is to maximise Schelling point value.</p><h3>(Mostly implicit) selectivity for the right things</h3><p>Some selection is usually part of a Schelling point's value. Top universities select for academic merit (though perhaps less so in the US). Silicon Valley selects for openness and interest/talent in tech/business. EA selects for openness, altruistic orientation (especially if consequentialist-leaning), good epistemics, and quantitative thinking.</p><p>I think it is counterproductive to view openness and selectivity as two ends of a single scale that applies to everything. You want to select on important features and be open otherwise (note that, when creating a Schelling point, most of the selection is usually implicit - what types of people you attract - rather than explicit filtering). The key choice is not "open or selective overall?" but rather "for which X do we want to appeal only to people who have a value of X in some specific range?"</p><p>Here's a heuristic for when selectivity for X is useful: when the way X provides value is through its <em>concentration</em> rather than its <em>amount</em>. If you're at a party where you can only talk to a subset of the people during its course, you're going to care a lot about what fraction of the people there are interesting - 10 interesting people in a party of 20 is better than 50 in a party of 5000.</p>
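<p>(To make the party arithmetic concrete, here is a minimal Python sketch. The ten-conversation budget, the function name, and the simulation itself are my own illustration rather than anything from the post or its links: with a fixed conversation budget, the expected number of interesting conversations tracks the <em>fraction</em> of interesting people present, not their total number.)</p><pre><code>import random

def expected_interesting(n_guests, n_interesting, n_conversations=10, trials=100_000):
    """Estimate how many interesting people you meet if you talk to
    n_conversations random guests at a party of n_guests, of whom
    n_interesting are interesting."""
    guests = [True] * n_interesting + [False] * (n_guests - n_interesting)
    total = 0
    for _ in range(trials):
        # Sample without replacement: you don't talk to the same guest twice.
        total += sum(random.sample(guests, n_conversations))
    return total / trials

print(expected_interesting(20, 10))    # ~5.0 (50% concentration)
print(expected_interesting(5000, 50))  # ~0.1 (1% concentration)
</code></pre><p>The expectation is just the conversation budget times the interesting fraction, so the small party wins by a factor of 50 despite having only a fifth as many interesting people in absolute terms.</p><p>Some cases are ambiguous.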
For example, if there exists a way for the good and important research to bubble to the top regardless of how much other research exists, it seems like the total amount of (infohazard-free) research is the thing to maximise. However, a research area where the average paper is very high quality might help newcomers to the field, or might help lift the prestige of the field, so concentration matters at least somewhat.</p><p>To take another example, there was a <a href="https://forum.effectivealtruism.org/posts/dsCTSCbfHWxmAr2ZT/open-ea-global">recent debate</a> over whether EA Global should be open access. Many of the arguments against boil down to thinking that the path to impact runs through a uniquely high concentration of EA engagement (or other variables) among the participants; arguments in favour often claim either that concentration matters less than the sheer amount of interactions, or that the choice of selection variable(s) is wrong, or that CEA fails to select on its chosen variable(s), so even if the intention is right, what is selected for in practice is wrong.</p><h3>Hubs, and hub-related infrastructure</h3><p>A key point of a Schelling point is that it is a point <em>somewhere</em>. Here, EA is doing increasingly well. Berkeley, Cambridge, Oxford, London, and Berlin all have large groups, and offices that you can apply to in order to work on EA-relevant things in the company of other EAs.</p><p>In Schelling point terms, there's also an argument that it might be better to have one really obvious and strong hub than many weaker ones (I've heard some Bay Area EAs in particular endorsing this view; invariably, their hub of choice is the Bay Area, though there is <a href="https://forum.effectivealtruism.org/posts/bnzwL6tu4pdYf3hpZ/say-nay-to-the-bay-as-the-default">pushback</a>). In practice, it seems that many physical hubs but one virtual/intellectual hub may be best. Both airplanes and people's desire not to uproot their lives are real and relevant things.</p><p>The organisers at each EA hub might benefit from applying Schelling point thinking to the context of their local scene.</p><h3>Being one thing</h3><p>Finally, a Schelling point needs to be one thing, at least in some loose sense. If New York had two Grand Central Stations, the classic Schelling point game would become a lot harder to solve.</p><p>One way to increase the One Thingness of the EA Schelling point is to merge it with other things. In Schelling point land, "merging" does not mean making them the same cluster, but rather creating an obvious and visible path from one thing to another. My understanding is that increasing the obviousness of EA in somewhat-adjacent communities (tech, longevity, space, and Emergent Ventures grantees) was a large part of what <a href="https://forum.effectivealtruism.org/posts/szeE3je8MD4sZcevL/announcing-future-forum-apply-now">Future Forum</a> tried to achieve.</p>]]></content:encoded></item></channel></rss>