Part of my research on robust evaluation of adaptive systems.

Grounded Commitment Learning

Coordination Without Shared Semantics

Natural language coordination assumes shared semantics—that "complete the task" means the same thing to all agents. This assumption fails with semantic drift: agents with different training, architectures, or even the same agent over time may interpret identical phrases differently.

MARL addresses this by learning coordination without language. GCL takes a different approach: meaning is grounded in verifiable behavior. A commitment's meaning is defined not by how agents interpret it, but by what observable outcomes count as success or failure.

Alignment connection: This addresses a core challenge in scalable oversight—how do you verify agent behavior when you can't inspect internal states? By grounding commitments in observable actions rather than stated intentions, GCL provides behavioral verification that doesn't require interpretability of internal representations.

25-50× sample efficiency vs MARL81% self-selection advantage97% of MARL coordination quality

View on GitHub

The Punishment Paradox

Counterintuitive finding: Increasing consequences for commitment violations decreases cooperation. This is the opposite of what traditional game theory predicts.

Why It Happens: Retaliation Cascades

High consequences trigger retaliation cascades: penalties cause counter-defection, which spreads through the population. The correlation is strong: r = -0.951, p < 0.001.

Statistical Validation

• No consequences vs full: t = 36.18, p < 0.001
• Effect size: Cohen's d = 9.34
• Monotonic decrease across all 5 levels
• n = 30 seeds per condition

Redemption Resolves the Paradox

The solution: add a redemption pathway that allows agents to recover from failures. This maintains incentives while reducing the fear that prevents commitment-making.

How Redemption Works

1.Failed agents can attempt recovery actions
2.Successful recovery reduces permanent reputation damage
3.Effort costs prevent gaming
4.Order effects controlled via eligibility snapshots

Hart-Moore Validation (Experiment 21)

GCL connects to Hart-Moore incomplete contract theory from economics (Nobel Prize 2016). Experiment 21 validates all four theoretical predictions.

Prediction 1: Complete Contracts Enable Investment

t = 28.37, d = 7.33

Agents with complete contracts invested 73% more than those with incomplete contracts.

Prediction 2: GCL Approaches Complete Contract Benefits

t = 15.72, d = 4.06

GCL agents invested 47% more than incomplete-contract agents—capturing 64% of complete-contract benefit.

Prediction 3: Incomplete Contracts Enable Hold-ups

t = 25.22, d = 6.51

Incomplete-contract environments showed 4.2× more hold-up incidents.

Prediction 4: GCL Reduces Hold-up Vulnerability

t = 10.38, d = 2.68

GCL reduced hold-up incidents by 36.8% (95% CI: [28.4%, 45.2%]).

Coordination Scaling (Experiment 23)

Coordination efficiency degrades logarithmically with population size (R² = 0.88, p = 0.0017), suggesting a Dunbar-like coordination limit around 100 agents.

Scaling Results

• ~100 agents: Efficiency drops to 50% of maximum
• Messages: Grow at 0.08 per agent (sublinear)
• Task concentration: Gini increases 0.35 → 0.98 as population grows

Network Topology

• Clustering: Low global density (0.12)
• Small-world: Not detected (coefficient 0.89)
• Structure: Hub-and-spoke topology emerges

Clarification: Task concentration (Gini) measures how tasks distribute across agents—larger populations concentrate tasks on fewer high-reputation agents. This differs from role specialization (HHI < 0.02 in Experiments 33-34), which measures whether agents focus on specific task types. The two can diverge: an agent may handle many tasks without specializing in any particular type.

Emergent Network Properties

GCL populations self-organize into structured networks without explicit coordination rules. Four properties emerge consistently across 1200+ runs (all p < 0.001):

Protocol Convergence

Agents converge on shared commitment templates without central coordination.

82.3% reduction in protocol diversity by episode 50. Convergence accelerates with population size (R² = 0.91).

Sparse Trust Networks

Agents form hub-and-spoke topologies rather than dense meshes.

Clustering coefficient = 0.699 (high local clustering). Global density remains low (0.12), enabling efficient coordination.

Task Concentration

High-capability agents attract disproportionate task volume.

Gini coefficient = 0.745 for task distribution. Top 20% of agents handle 68% of tasks—a natural consequence of reputation-weighted selection.

Efficiency Improvement

Coordination efficiency improves over time without parameter tuning.

26.5% improvement in task completion rate from episode 1 to episode 100. Improvement rate correlates with template sharing (r = 0.73).

Note on measurement: Task concentration (Gini = 0.745) measures how tasks distribute across agents. This differs from role specialization (HHI), which measures whether agents focus on specific task types. High task concentration can occur without role specialization—agents may handle many tasks across diverse types.

The GCL Framework

Why This Formalization?

Traditional multi-agent coordination assumes agents share semantic understanding. GCL replaces this assumption with verifiable behavioral contracts:

• Trigger (τ): When does this commitment activate? Removes ambiguity about scope.
• Action (a): What behavior is promised? Observable, not interpretive.
• Verification (φ): How do we know it succeeded? Third-party verifiable.
• Failures (F): What can go wrong, and what happens then? Enumerated, not implicit.
• Stake (σ): What does the agent risk? Skin in the game.

Formal Definition

A grounded commitment is a 5-tuple:

— trigger predicate
— action function
— verification predicate
— failure modes with stakes and remediations
— stake (reputation at risk)

commitment_example.yaml

1[COMMITMENT]
2ISSUER: Agent_A
3TRIGGER: Task requires capability X
4BEHAVIOR: Complete subtask within 3 rounds
5SUCCESS: Subtask verified complete
6FAILURES:
7  - IF timeout THEN stake_loss=0.5, REMEDIATION: delegate
8  - IF capability_mismatch THEN stake_loss=0.2, REMEDIATION: escalate
9  - IF resource_exhaustion THEN stake_loss=0.3, REMEDIATION: request_resources
10CONFIDENCE: 85%
11STAKE: 1.0
12[/COMMITMENT]

Key Insight: Failure-First Design

Unlike traditional contracts that specify success conditions, GCL commitments enumerate failure modes. Success is the complement of all failure conditions. This design choice enables auditability: auditors know exactly what to check, and agents cannot claim success by exploiting undefined edge cases.

Commitment-Grounded Learning

Agents learn what commitments to make via reinforcement learning. The policy maps states to commitment portfolios, optimizing for expected value minus stake risk:

gcl_agent.py

1class GroundedCommitmentLearner:
2    """Agent that learns what commitments to make via reinforcement learning.
3    
4    Key insight: Agents don't need shared understanding, just shared consequences.
5    """
6    
7    def __init__(self, capabilities, stake_budget):
8        self.capabilities = capabilities
9        self.stake_budget = stake_budget
10        self.reputation = ReputationTracker()
11        self.template_library = TemplateHierarchy()
12        
13    def propose_commitment(self, task, context):
14        """Policy maps states to commitment portfolios."""
15        capability_match = self.assess_capability(task)
16        observability = self.assess_verifiability(task)
17        
18        if capability_match < 0.5 or observability < 0.3:
19            return None  # Refuse rather than risk failure
20            
21        failure_modes = self.enumerate_failures(task)
22        
23        return Commitment(
24            issuer=self.id,
25            trigger=task.trigger,
26            behavior=task.required_behavior,
27            success=task.success_condition,
28            failures=failure_modes,
29            confidence=capability_match * observability,
30            stake=self.calculate_stake(expected_value, risk)
31        )

Implications for AI Safety

GCL provides a foundation for verifiable multi-agent coordination with properties relevant to scalable oversight and multi-agent alignment:

Auditability

Every agent action traces to a specific commitment with enumerated failure modes.

GCL mechanism: Failure-first design means auditors know exactly what to check. Agents cannot claim success by exploiting undefined edge cases—all failure modes are pre-specified.

Accountability

Failures have defined consequences. Agents stake reputation on every commitment.

GCL mechanism: The stake parameter (σ) creates skin in the game. Agents that make unreliable commitments lose reputation and future coordination opportunities—a self-enforcing accountability mechanism.

Alignment

Value-consistent commitments can be verified. The framework supports constraints on allowable commitments.

GCL mechanism: Commitment templates can encode policy constraints. Agents can only make commitments that match approved templates—enabling constitutional AI-style guardrails at the coordination layer.

Connection to Scalable Oversight

GCL addresses a key challenge in scalable oversight: how do you verify coordination between agents you cannot fully observe? By requiring agents to pre-specify failure modes and stake reputation, GCL makes coordination auditable without omniscience. Overseers check commitment logs and stake transfers rather than attempting to interpret agent reasoning.

The Core Insight

GCL dissolves rather than solves the interpretation problem. Agents don't need shared understanding, just shared consequences. This provides a principled foundation for multi-agent AI coordination that is verifiable, auditable, and aligned—without requiring that we solve the harder problem of ensuring agents share our semantic representations.

Self-Selection vs. External Assignment

Core finding: Self-selection outperforms oracle matching by 81% (p < 0.001), even with effort controlled. The primary mechanism is information asymmetry—agents possess private self-knowledge that external coordinators cannot access.

Controlled experiments isolate information and effort effects. Agents have privileged access to their own capabilities—information that external coordinators cannot observe, regardless of how much observable data they collect.

Mechanism Decomposition

~75%Information Asymmetry

Agents possess private information about task-agent fit that external coordinators cannot observe. Effect: +0.239 cooperation (self-select: 0.534 vs oracle: 0.295, effort fixed at 0.8).

~25%Emergent Effort

Agents that self-select exert higher effort (0.904 vs 0.776 for assigned agents). This is emergent behavior, not a designed parameter. Effect: +0.074 additional cooperation.

Why Self-Selection Outperforms Oracle Assignment

Even when an external coordinator has complete information about observable agent capabilities, agents retain private information about:

→Internal state: Current capacity, resource availability, and readiness that affect task performance
→Task-specific fit: Aspects of capability alignment that are not captured by general metrics
→Unobservable capabilities: Information that is difficult or impossible to externalize to a coordinator

Experimental Design

• Self-select vs oracle assignment conditions
• Fixed effort (0.8) vs emergent effort conditions
• Oracle has complete observable information
• 1200+ independent runs, bootstrap CIs

Statistical Validation

• Cohen's d = 4.05 (large effect)
• p < 10^-72
• Power = 1.0
• Mechanism decomposition via controlled comparison

Design Implications

Coordination systems should leverage agent self-assessment rather than relying on external assignment, even when the external assigner has complete observable information. This principle underlies GCL's commitment-based approach: agents select commitments based on their private assessment of task-agent fit, and the framework provides mechanisms for verifiable execution without requiring capability disclosure.

Gaming-Resistant Reputation Mechanisms

Key finding: Reputation visibility without anti-gaming mechanisms induces strategic task selection. Difficulty-weighted reputation reduces gaming behavior by 59.8% (t = 8.42, p < 0.001, d = 2.17).

Agents that observe their own reputation scores exhibit strategic behavior—selecting tasks that maximize reputation gain rather than coordination value. Four reputation visibility conditions characterize this effect.

Gaming Behavior by Condition

✗Naive visible (45% gaming rate): Agents select low-difficulty tasks to inflate scores
○Blind reputation (12% gaming rate): No visibility eliminates strategic selection but limits coordination

Anti-Gaming Mechanisms

✓Difficulty-weighted (18% gaming rate): Reputation gains normalized by task complexity
★Social + difficulty (15% gaming rate): Peer observation combined with difficulty weighting

The Self-Selection Requirement

These experiments revealed a critical constraint: external task assignment degrades coordination. This finding aligns with the information asymmetry mechanism—agents possess private information about task-agent fit that external assigners cannot access.

Design implication: Coordination structures must use voluntary task selection (get_volunteers()) rather than external assignment (assign_task()).

This preserves the information advantage that self-selection provides while enabling reputation-based coordination.

Statistical Validation

Gaming reduction: t = 8.42, p < 0.001
Effect size: Cohen's d = 2.17

Mechanism

Difficulty weighting normalizes reputation gains by task complexity, removing the incentive for easy-task selection.

Applications

Applicable to multi-agent systems with reputation, trust-based coordination, and any system where agents can observe their own scores.

GCL vs. Multi-Agent RL

Key finding: GCL achieves 25-50× better sample efficiency than MARL while maintaining 97% of MARL's coordination quality. In cold-start or non-stationary environments, this efficiency advantage dominates.

GCL converges in 2 episodes; MARL requires 52-102 episodes to reach equivalent coordination. This difference stems from GCL's use of agent self-knowledge—information that MARL must learn through trial and error.

Sample Efficiency

GCL Episodes to Converge2

MARL Episodes to Converge52-102

Efficiency Advantage25-50×

Coordination Quality

GCL Coordination0.534

MARL Coordination0.552

Relative Performance97%

Non-Stationary Environments

GCL's advantage grows in non-stationary environments. When environment parameters change, MARL must relearn; GCL adapts immediately via agent self-knowledge:

Low Volatility

+0.2%

Medium

+0.5%

High

+0.7%

Very High

+0.9%

Performance delta vs. MARL across environment change frequencies.

When to Use Each Approach

GCL is preferred when:

• Cold-start coordination (no training data)
• Non-stationary environments
• Privacy-preserving systems (agents keep capabilities private)
• Sample efficiency is critical

MARL may be better when:

• Training time is available
• Environment is stable
• Maximum performance is required (3% gap matters)
• Agent capabilities are fully observable

Limitations

Simulation Environment

Results validated in controlled simulations. Real-world deployment may introduce additional factors (network latency, partial observability, adversarial agents) not captured in current experiments.

Task Complexity

Experimental tasks are simplified relative to production multi-agent systems. Commitment verification in complex, multi-step tasks may require additional mechanisms not yet validated.

Scaling Bounds

Coordination overhead suggests hierarchical structures for populations exceeding ~100 agents. Current experiments validate flat coordination; hierarchical GCL remains future work.

Information Asymmetry Bounds

The 75/25 information/effort decomposition is specific to our experimental conditions. Ratios may vary with task structure, agent architecture, and capability observability.

Ongoing Work

The findings above represent validated results from 39 experiments with 1200+ independent runs. Current focus areas:

Papers in Preparation

→
Information Asymmetry in Multi-Agent Coordination
In preparation for peer review
→
Self-Selection vs. Optimal Assignment: A Mechanism Design Analysis
In preparation for peer review

Future Directions

•Quantifying information asymmetry bounds across agent architectures
•Hierarchical GCL for populations > 100 agents
•Real-world deployment validation
•Integration with constitutional AI approaches

Open Questions

The self-selection advantage raises a deeper question: what did agents learn to do that external matching couldn't capture?

?
Meta-policy over exploration strategies
Hypothesis: Self-selecting agents don't just learn which tasks to take—they learn how to explore the task space. The 75% information asymmetry advantage may encode a meta-policy that adapts exploration based on private capability signals. External assignment can't replicate this because it lacks access to the agent's internal uncertainty estimates.
?
Commitment templates as geometric operations
Can template learning be characterized as geometric transformations on a pre-linguistic capability manifold? If so, template sharing may be transferring operationsrather than knowledge—a distinction with implications for how we think about capability transfer in AI systems.
?
Emergent coordination primitives
What computational primitives underlie the emergent specialization we observe? The clustering coefficient (0.699) suggests agents discover coordination structures that weren't designed in. Understanding these primitives could inform how we design multi-agent AI systems that scale gracefully.

Collaboration & Discussion

This research connects to broader questions in AI safety, multi-agent alignment, and scalable oversight. If you're working on related problems or interested in collaboration, I'd welcome the conversation.

Email GitHub

Explore the Implementation

View on GitHubFull source code More ResearchSee all work

Interested in discussing this work? Email me

Grounded Commitment Learning

The Punishment Paradox

Why It Happens: Retaliation Cascades

Statistical Validation

Redemption Resolves the Paradox

How Redemption Works

Hart-Moore Validation (Experiment 21)

Prediction 1: Complete Contracts Enable Investment

Prediction 2: GCL Approaches Complete Contract Benefits

Prediction 3: Incomplete Contracts Enable Hold-ups

Prediction 4: GCL Reduces Hold-up Vulnerability

Coordination Scaling (Experiment 23)

Scaling Results

Network Topology

Emergent Network Properties

Protocol Convergence

Sparse Trust Networks

Task Concentration

Efficiency Improvement

The GCL Framework

Why This Formalization?

Formal Definition

Key Insight: Failure-First Design

Commitment-Grounded Learning

Template Sharing (Experiment 24)

Experimental Conditions

Results

Implications for AI Safety

Auditability

Accountability

Alignment

Connection to Scalable Oversight

The Core Insight

Self-Selection vs. External Assignment

Mechanism Decomposition

Why Self-Selection Outperforms Oracle Assignment

Experimental Design

Statistical Validation

Design Implications

Gaming-Resistant Reputation Mechanisms

Gaming Behavior by Condition

Anti-Gaming Mechanisms

The Self-Selection Requirement

Statistical Validation

Mechanism

Applications

GCL vs. Multi-Agent RL

Sample Efficiency

Coordination Quality

Non-Stationary Environments

When to Use Each Approach

Limitations

Simulation Environment

Task Complexity

Scaling Bounds

Information Asymmetry Bounds

Ongoing Work

Papers in Preparation

Future Directions

Open Questions

Collaboration & Discussion

Explore the Implementation