LLMs as Trusted Mediators – A Path Beyond Coordination Problems?
Using AI and cryptography to make cooperation rational—even when trust is impossible.
This text was written by Claude Opus 4.5, based on ideas from Johan Falk.
The Classic Dilemma
Consider the following scenario: Two leading AI laboratories—let us call them Lab A and Lab B—both recognise that an uncontrolled race towards ever more advanced AI systems poses significant risks. Both would benefit from slowing down, investing more in safety, and coordinating their releases. But neither dares to move first.
If Lab A unilaterally slows down while Lab B continues at full speed, Lab A loses market share, talent, and influence—perhaps permanently. The same applies to Lab B. The result? Both continue to accelerate, despite both preferring a world in which they had slowed down together.
This is the essence of a coordination problem. The individually rational strategy leads to a collectively suboptimal outcome. We see the same pattern in climate negotiations, nuclear disarmament, and countless other domains where parties would gain from cooperation but remain trapped in a destructive equilibrium.
Three mechanisms explain why coordination is so difficult:
Vulnerability through transparency. Revealing one’s true position—rather than one’s negotiating position—creates vulnerability. If Lab A says “we could actually accept a six-month pause,” Lab B now has an information advantage that can be exploited.
Impossible conditionality. One cannot condition one’s offer on the other party’s private willingness to reciprocate. Lab A wants to say “we’ll pause if you pause”—but how can they know whether Lab B’s response is genuine?
Leaked failures. If negotiations fail, both parties have learned strategic information about each other. The mere attempt to coordinate can therefore be costly.
The question this text explores is: Can new technology change the rules of the game?
What the Literature Already Knows
Coordination problems are, of course, not new, and theorists have developed several frameworks for understanding and sometimes solving them.
Programme Equilibrium
In 2004, game theorist Moshe Tennenholtz introduced the concept of “programme equilibrium.” The core idea is elegant: instead of players choosing strategies directly, they submit programmes that specify their strategy—and these programmes have access to each other’s source code.
In the prisoner’s dilemma, a player can then write a programme that says: “If my opponent’s programme is identical to mine, I cooperate. Otherwise, I defect.” If both players submit this programme, both will cooperate—and neither has an incentive to deviate, since any deviation is immediately detected and met with defection.
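The idea can be made concrete in a few lines of code. The sketch below is a minimal illustration, assuming standard prisoner's dilemma payoffs; the function names and the exact source-comparison rule are illustrative, not taken from Tennenholtz's paper.

```python
# Minimal sketch of a programme equilibrium in a one-shot prisoner's dilemma.
# Each "programme" is a function that sees its opponent's source code and
# returns "C" (cooperate) or "D" (defect).

import inspect

# Standard prisoner's dilemma payoffs (first entry is the row player's payoff).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def clique_programme(own_source: str, opponent_source: str) -> str:
    """Cooperate if and only if the opponent submitted the exact same programme."""
    return "C" if own_source == opponent_source else "D"

def always_defect(own_source: str, opponent_source: str) -> str:
    return "D"

def play(prog_a, prog_b):
    src_a, src_b = inspect.getsource(prog_a), inspect.getsource(prog_b)
    move_a = prog_a(src_a, src_b)
    move_b = prog_b(src_b, src_a)
    return PAYOFFS[(move_a, move_b)]

print(play(clique_programme, clique_programme))  # (3, 3): mutual cooperation
print(play(clique_programme, always_defect))     # (1, 1): the deviator does worse
```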
This is remarkable: rational cooperation emerges even in a one-shot game, something impossible in classical game theory. But there is an obvious limitation—it requires parties to actually inspect each other’s code and verify that it will be executed as specified.
Secure Multi-Party Computation
In parallel, cryptographers have developed protocols for “Secure Multi-Party Computation” (MPC). The central idea is that multiple parties can jointly compute a function of their secret inputs—without any party learning the others’ inputs.
A classic example: Three colleagues want to know who has the highest salary without revealing their individual salaries. With MPC, they can perform a computation that reveals only the result (who earns most) whilst each participant’s actual figure remains secret.
Think of it as if there were a perfectly trustworthy third party—a “Tony”—to whom everyone could send their secrets. Tony computes the function and returns only the result. MPC’s breakthrough was showing that one can achieve this without trusting any Tony—the protocol itself guarantees secrecy.
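The salary comparison itself needs secure comparison machinery, which is more involved, but the core trick behind many MPC protocols, splitting each secret into shares that look random on their own, can be shown for a simpler joint computation such as a sum. A toy sketch, not a real protocol:

```python
# Toy illustration of additive secret sharing, a building block of many MPC
# protocols. Three parties jointly compute the SUM of their salaries; no single
# share reveals anything about any individual salary. Answering "who earns
# most?" needs additional machinery (e.g. secure comparison gates).

import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that individually look random."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

salaries = {"Alice": 52_000, "Bob": 61_000, "Carol": 47_000}
n = len(salaries)

# Each party splits its salary and distributes one share to every participant.
all_shares = {name: share(s, n) for name, s in salaries.items()}

# Each participant locally adds up the shares it holds (one per secret) ...
partial_sums = [sum(all_shares[name][i] for name in salaries) % MODULUS
                for i in range(n)]

# ... and only these partial sums are published and combined.
total = sum(partial_sums) % MODULUS
print(total == sum(salaries.values()))  # True: the sum is correct, yet no
                                        # individual salary was ever revealed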
The problem with MPC for negotiations is complexity. The protocols are designed for well-defined mathematical functions, not for the rich, context-dependent reasoning that real negotiations require.
Mediated Equilibrium
Monderer and Tennenholtz have also shown that a mediator can enable equilibria that would otherwise be impossible. A mediator receives private messages from all parties, computes a recommendation, and sends back instructions. Parties need not follow the instructions—but if the mediator’s mechanism is properly designed, no party has an incentive to deviate.
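The flavour of this can be checked numerically. The sketch below uses the game of chicken with illustrative payoffs (a standard textbook construction, not an example taken from Monderer and Tennenholtz): the mediator draws a joint recommendation and tells each player only its own part, and neither player gains by disobeying.

```python
# Numerical check that a mediator's recommendations are self-enforcing in the
# game of chicken. Illustrative payoffs: both Dare -> (0, 0); Dare vs Chicken
# -> (7, 2); both Chicken -> (6, 6). The mediator draws one of (C, C), (C, D),
# (D, C) uniformly at random and tells each player ONLY its own action.

from itertools import product

ACTIONS = ("C", "D")  # C = Chicken (swerve), D = Dare
PAYOFF = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
          ("C", "D"): (2, 7), ("C", "C"): (6, 6)}

# The mediator's joint distribution over recommendations.
MEDIATOR = {("C", "C"): 1/3, ("C", "D"): 1/3, ("D", "C"): 1/3, ("D", "D"): 0.0}

def expected_payoff(player: int, own_rec: str, own_play: str) -> float:
    """Expected payoff when receiving `own_rec` but playing `own_play`,
    assuming the other player obeys the mediator."""
    total, prob_mass = 0.0, 0.0
    for joint, p in MEDIATOR.items():
        if joint[player] != own_rec or p == 0:
            continue
        other = joint[1 - player]
        outcome = (own_play, other) if player == 0 else (other, own_play)
        total += p * PAYOFF[outcome][player]
        prob_mass += p
    return total / prob_mass  # conditioned on having received `own_rec`

# No player should gain by deviating from any recommendation it can receive.
for player, rec in product((0, 1), ACTIONS):
    obey = expected_payoff(player, rec, rec)
    best_deviation = max(expected_payoff(player, rec, a) for a in ACTIONS)
    print(player, rec, obey >= best_deviation)  # prints True in every case
```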
It turns out that certain socially desirable outcomes that cannot be reached through ordinary Nash equilibria can in fact be achieved through mediated equilibria. But again: whom does one trust as mediator? And how does one verify that the mediator is behaving correctly?
The New Possibility: LLM + TEE
Here is the central idea: what happens if the mediator is a large language model (LLM) running inside a cryptographically secured environment?
The Architecture
┌───────────────────────────────────────┐
│  Trusted Execution Environment (TEE)  │
│  ┌─────────────────────────────────┐  │
│  │          LLM Mediator           │  │
│  │ • Receives secret inputs        │  │
│  │ • Reasons about possible deals  │  │
│  │ • Returns only:                 │  │
│  │   "Deal possible" / "No deal"   │  │
│  └─────────────────────────────────┘  │
│  [Verifiable code, cryptographically  │
│   attested by all parties]            │
└───────────────────────────────────────┘
         ▲                       ▲
         │                       │
      Party A:                Party B:
  "We can accept X        "We agree to Y
   if B does at least Y"   if A does at least Z"
The process works as follows (a minimal code sketch of the flow appears after the steps):
All parties review and approve the code to be executed—including the LLM’s instructions
Each party sends their true position, encrypted so that only the TEE can read it
The LLM inside the TEE receives all positions and reasons about whether a mutual agreement is possible
The result is binary: either “A deal is possible—here are the minimum conditions” or “No deal possible”
Upon failure, all information is deleted—parties learn nothing about each other beyond the fact that their positions did not sufficiently overlap
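The following is a schematic, runnable sketch of that flow. Everything security-critical (enclave isolation, encryption, attestation, and the LLM itself) is replaced by trivial stand-ins named decrypt_inside_enclave and llm_mediator; only the information flow is meaningful.

```python
# Schematic sketch of the mediation flow, NOT a real TEE implementation.
# The stand-ins keep only the essential property: plaintext positions exist
# only inside run_mediation, and only the verdict leaves it.

from dataclasses import dataclass

@dataclass
class Verdict:
    deal_possible: bool
    minimum_conditions: str | None  # released only if a deal exists

def decrypt_inside_enclave(blob: bytes) -> str:
    # Placeholder: a real enclave would decrypt with a key that never leaves it.
    return blob.decode()

def llm_mediator(positions: list[str]) -> str | None:
    # Placeholder for the attested LLM. Here: a toy rule that "finds a deal"
    # only if every party's position mentions accepting a pause.
    if all("pause" in p.lower() for p in positions):
        return "All parties pause frontier training under agreed verification."
    return None

def run_mediation(encrypted_positions: list[bytes]) -> Verdict:
    # 1. Decrypt inside the enclave; plaintext never leaves this function.
    positions = [decrypt_inside_enclave(blob) for blob in encrypted_positions]

    # 2. Reason (here: trivially) about whether a mutual agreement exists.
    terms = llm_mediator(positions)

    # 3. Release only the binary outcome, plus minimum terms on success.
    verdict = Verdict(terms is not None, terms)

    # 4. Discard all inputs before returning; on failure the parties learn
    #    nothing beyond "no sufficient overlap".
    del positions, terms
    return verdict

print(run_mediation([b"We accept a pause if others do",
                     b"A coordinated pause is acceptable to us"]))
```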
Why an LLM Rather Than Traditional MPC?
A traditional MPC solution would require predefined functions: “Compute the intersection of intervals X and Y.” But real negotiations are richer than that.
An LLM can:
Handle natural language. “We can accept a pause on capability training above 10^26 FLOP if competitors do the same, but we need an exception for safety research.”
Reason about plausibility. “Party A’s demand for exceptions is inconsistent with Party B’s definition of safety research—but here is a possible compromise.”
Propose creative solutions. Not merely compute whether positions overlap, but identify a “zone of possible agreement” that the parties themselves did not see.
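What the LLM is asked to do would itself be part of the code all parties review and attest. A hypothetical instruction skeleton, with wording that is purely illustrative:

```python
# Hypothetical skeleton of the mediator instruction given to the LLM inside
# the TEE. The wording here is invented for illustration; in practice it would
# be fixed and audited by every party before anything is submitted.

MEDIATOR_INSTRUCTION = """
You are a neutral mediator. You will receive the confidential positions of
{n_parties} parties. Determine whether there exists a set of terms that every
party has stated it can accept, including all conditions and exceptions.

Rules:
- If such terms exist, output exactly: DEAL, followed by the minimum terms.
- Otherwise output exactly: NO_DEAL, and nothing else.
- Never quote, summarise, or allude to any individual party's position.
"""

def build_prompt(positions: list[str]) -> str:
    numbered = "\n\n".join(f"[Party {i + 1}]\n{p}" for i, p in enumerate(positions))
    return MEDIATOR_INSTRUCTION.format(n_parties=len(positions)) + "\n" + numbered
```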
What TEE Guarantees
A Trusted Execution Environment is hardware that guarantees code runs in isolation—even from the owner and operator of the computer. Modern implementations (such as Intel SGX or ARM TrustZone) offer:
Confidentiality. Not even the server operator can see what happens inside the enclave
Integrity. The code cannot be manipulated without detection
Attestation. Parties can obtain cryptographic proof that the correct code is running in the correct environment
This solves the problem of trusting the mediator: no one needs to trust any person or organisation—only that the hardware and code function as specified.
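In simplified terms, attestation lets each party check that the enclave is running exactly the code they reviewed. The sketch below reduces a real attestation flow (which involves hardware keys and a vendor verification service) to stand-ins, keeping only the core idea: compare the measured code hash with the hash of the audited code.

```python
# Heavily simplified picture of remote attestation. The report format and the
# signature check are stand-ins; only the measurement comparison is the point.

import hashlib

def measure(code: bytes) -> str:
    """Stand-in for the enclave 'measurement': a hash of the loaded code."""
    return hashlib.sha256(code).hexdigest()

def vendor_signature_is_valid(report: dict) -> bool:
    # Stand-in: a real check would verify a certificate chain rooted in the
    # hardware vendor, typically via an attestation service.
    return report.get("signature") == "valid-for-illustration-only"

def verify_attestation(report: dict, reviewed_code: bytes) -> bool:
    # 1. The reported measurement must match the code the parties audited.
    if report["measurement"] != measure(reviewed_code):
        return False
    # 2. The report must be signed by the hardware, not by the operator.
    return vendor_signature_is_valid(report)

reviewed = b"...the mediator code every party has inspected..."
report = {"measurement": measure(reviewed), "signature": "valid-for-illustration-only"}
print(verify_attestation(report, reviewed))  # True only for the audited code
```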
Concrete Application: AI Safety Agreements
Let us return to the scenario with the AI laboratories. What might an LLM-mediated negotiation look like in practice?
The Scenario
Anthropic, OpenAI, DeepMind, and several other leading laboratories wish to explore the possibility of coordinated safety measures. But:
No one wants to reveal where they actually stand in capability development
No one wants to show their true reservation point (“this is the minimum we can accept”)
A failed negotiation where positions have leaked may be worse than no negotiation at all
The Protocol
Each lab sends an encrypted message to the LLM mediator with the following structure, one possible data shape for which is sketched in code further below:
Current state (confidential): “We are X months from being able to train a model at Y FLOP”
Actions we are willing to take: “We will implement safety protocol Z if at least N-1 other labs do the same”
Conditions: “We require that measures are mutually verifiable through method W”
Absolute limits: “We cannot accept anything that gives competitor Q more than M months’ head start”
The LLM analyses all inputs and computes:
Is there a consistent set of measures that all parties can accept?
If yes, what are the minimum requirements for each party?
If no, how close were the parties? (Expressed without revealing individual positions)
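One way to picture the data the mediator works with is sketched below. The field names and types are invented for this illustration; a real protocol would fix the schema as part of the attested mediator code.

```python
# Illustrative data shapes for the protocol above. Nothing here is a standard;
# the attested mediator code would define the real schema.

from dataclasses import dataclass, field

@dataclass
class LabSubmission:
    lab_id: str
    current_state: str            # e.g. "X months from training at Y FLOP"; never released
    offered_measures: list[str]   # actions conditional on enough other labs matching them
    conditions: list[str]         # e.g. mutual verification requirements
    red_lines: list[str]          # absolute limits the lab cannot cross

@dataclass
class MediationResult:
    deal_possible: bool
    # Released only on success: the minimum each lab must do, keyed by lab_id.
    minimum_requirements: dict[str, list[str]] = field(default_factory=dict)
    # Released only on failure: a coarse aggregate signal ("far apart" /
    # "close"), never attributable to any individual lab.
    closeness: str | None = None
```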
The crucial point: if no deal is possible, no party learns why. Lab A does not know whether it was Lab B or Lab C that blocked, or for what reason. This eliminates the strategic cost of failed negotiations.
Extension
The same mechanism is applicable to other coordination problems:
Climate negotiations. Countries reveal their true emissions targets and conditions for raising ambitions—without showing their hand before a deal is secured.
Nuclear disarmament. States share information about arsenals and verification conditions in an environment where failure does not leak intelligence.
Trade negotiations. Parties explore possible deals without revealing their BATNA (best alternative to a negotiated agreement).
Critical Challenges
The idea is appealing, but far from unproblematic. Several fundamental challenges must be addressed.
Verification of Compliance
The LLM mediator can help parties reach an agreement. But who ensures they follow it? This is a separate problem requiring its own mechanisms—inspections, reporting, sanctions.
One possible extension is that the same system could be used for continuous reporting: parties regularly send status updates to the mediator, which computes whether everyone still meets their commitments—without revealing individual data.
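Sketched in the same style as before, such a reporting round might release only a single aggregate bit. The field names and the trivial substring check below are stand-ins; a real system would judge each update against the agreed commitment text inside the attested environment.

```python
# Toy sketch of a confidential compliance round: each party's status update
# stays inside the enclave; only the aggregate answer leaves it.

def compliance_round(commitments: dict[str, str], updates: dict[str, str]) -> bool:
    """Return True only if every party's (private) update satisfies its
    (private) commitment. Individual results are never released."""
    per_party = {lab: commitments[lab] in update
                 for lab, update in updates.items()}
    return all(per_party.values())  # single aggregate bit, nothing per party

print(compliance_round(
    {"LabA": "pause above 1e26 FLOP", "LabB": "pause above 1e26 FLOP"},
    {"LabA": "we maintain the pause above 1e26 FLOP",
     "LabB": "we maintain the pause above 1e26 FLOP"},
))  # True
```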
Trust in LLM Behaviour
How do we know the LLM behaves neutrally? Modern language models have subtle biases from their training that could systematically favour certain types of outcomes.
One possible solution is to use simpler, more verifiable models for the core function—not necessarily the most capable state-of-the-art models. Alternatively, several different models could run in parallel, with consensus required before a result is accepted.
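The consensus variant could be as simple as the sketch below: the model calls are toy stand-ins, and the point is the fail-closed rule, which requires unanimity before any result is released.

```python
# Sketch of a consensus rule across several independently chosen mediator
# models, failing closed: a deal is reported only if every model reaches the
# same verdict, so a single biased model cannot push an agreement through.

def consensus_verdict(positions: list[str], models: list) -> str:
    verdicts = {model(positions) for model in models}
    return verdicts.pop() if len(verdicts) == 1 else "NO_DEAL"

def model_a(positions: list[str]) -> str:
    # Toy stand-in: "finds a deal" if every position mentions a pause.
    return "DEAL" if all("pause" in p.lower() for p in positions) else "NO_DEAL"

def model_b(positions: list[str]) -> str:
    # Second, independently implemented stand-in with a different rule.
    return "DEAL" if all("accept" in p.lower() for p in positions) else "NO_DEAL"

print(consensus_verdict(
    ["We accept a pause above the threshold", "A pause is acceptable to us"],
    [model_a, model_b],
))  # "DEAL" only because both stand-ins agree; any split yields "NO_DEAL"
```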
Gaming and Strategic Inputs
Can parties manipulate by sending strategic (non-truthful) inputs? “We’ll say our limit is X, even though it’s really Y, to see what the system returns.”
This requires incentive-compatible mechanism design—making truthful inputs each party’s dominant strategy. The mechanism design literature offers tools here, but combining them with LLM reasoning is unexplored territory.
Practical and Political Resistance
Organisations accustomed to bargaining power—who can extract advantages through information asymmetry and negotiating skill—have weak incentives to relinquish those advantages.
Compare resistance to binding arbitration: even when both parties would benefit from a predictable system, the more powerful party often prefers the uncertainty that allows them to exploit their strength.
Why This Is Worth Exploring
Coordination problems are among the most difficult challenges humanity faces. The climate crisis, risks from advanced AI, nuclear proliferation—all share the same basic structure: individually rational choices leading to collective catastrophe.
Traditional solutions—repeated interactions, reputation mechanisms, social norms—certainly have power. But they work best in stable environments with long time horizons. For existential risks, where we may have only one chance, they are insufficient.
LLMs combined with cryptographic security open a new design space. Not as a magic solution, but as a tool to explore: Can we construct mechanisms where it becomes rational to reveal one’s true position? Where failed negotiations carry no cost? Where creative compromises can be discovered by a system that sees all parties’ perspectives simultaneously?
What is needed now is interdisciplinary work: game theory, cryptography, AI safety, international relations. Theoretical analysis of which classes of coordination problems are solvable with these tools. Practical experiments in simpler domains. And eventually, perhaps, protocols that can be tested in real negotiation contexts.
Coordination problems have always been hard. But the tools for solving them need not be the same tomorrow as they were yesterday.
Johan Falk runs Falk AI and works on understanding AI’s implications for society. This text is the result of a collaboration between Johan and Claude Opus 4.5 (Anthropic), where Johan contributed the core idea and Claude developed the theoretical context and wrote the text.


