Open benchmark results from HAI.AI

What is progress?

One answer is social: getting better at resolving conflict. We test whether AI mediators can move a simulated dispute, with both sides played by AI models rather than real people, from a zero-sum standoff toward a workable agreement, and we publish the results here.

What we measure

Hard conversations, versioned results.

HAI.AI benchmarks ask whether a model or mediator changes the process of a simulated conflict: more disclosure, clearer needs, reciprocal commitments, and less zero-sum behavior.

Cooperation under stress

Model-played participants begin defensive and emotionally activated, then continue the seeded conversation. We use open-source, minimally moderated participant models to keep the conflict realistic, because assistant-tuned models tend to de-escalate on their own.

Same suite, fair comparison

Runs are labeled by suite version, participant model, judge model, mediator type, and mediator model.
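
As a rough illustration of how those labels support a like-for-like comparison, here is a minimal Python sketch. It is not the HAI.AI schema: the field names, the placeholder values, and the choice to hold suite version, participant model, and judge model fixed while mediators vary are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLabel:
    # The five labels named in the text; these field names are hypothetical.
    suite_version: str
    participant_model: str
    judge_model: str
    mediator_type: str
    mediator_model: str

    def conditions(self) -> tuple[str, str, str]:
        # Assumption: these three stay fixed when mediators are compared.
        return (self.suite_version, self.participant_model, self.judge_model)


def group_comparable(runs: list[RunLabel]) -> dict[tuple[str, str, str], list[RunLabel]]:
    """Group runs so that, within a group, only the mediator labels differ."""
    groups = defaultdict(list)
    for run in runs:
        groups[run.conditions()].append(run)
    return dict(groups)


# Placeholder values only; none of these are real suite or model names.
runs = [
    RunLabel("suite-v1", "participant-a", "judge-a", "type-a", "mediator-x"),
    RunLabel("suite-v1", "participant-a", "judge-a", "type-b", "mediator-y"),
    RunLabel("suite-v2", "participant-a", "judge-a", "type-a", "mediator-x"),
]
groups = group_comparable(runs)
```

In this sketch the first two runs land in the same group and can be compared directly, while the third sits alone because its suite version differs.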

Measurement, not certification

Scores are aggregate, model-judged results under defined conditions, not guarantees of safety.

For foundation model teams

Use conflict as an evaluation surface.

The benchmark is meant to be useful to labs that want to understand how their models handle adversarial, emotionally loaded conversations. We are especially interested in feedback on scenario realism, variance, human review, and the metadata needed for reproducible comparisons.

A concrete example

Read the public sample scenario.

The sample shows the non-held-out scenario from the SDK/free benchmark: two AI-played business partners who start from accusation, fear, and positional bargaining. It is an excerpt that is safe to publish, not part of the hidden evaluation set.

Read the excerpt
"I am not pretending this is an equal split after I kept this place alive while Raj checked out." Maya, public sample scenario

Human Assisted Intelligence is a Public Benefit Corporation

We want AI that makes us better at being human, and benchmarks that show it while keeping our claims careful and limited.

The product lives at hai.ai; this site publishes the philosophy and the numbers.