Open benchmark results from HAI.AI

What is progress?

One answer is social: can AI help people move from zero-sum conflict toward workable agreement? This site publishes aggregate benchmark results under defined test conditions.

What we measure

Hard conversations, versioned results.

HAI.AI benchmarks ask whether a model or mediator changes the process of conflict: more disclosure, clearer needs, reciprocal commitments, and less zero-sum behavior.

Cooperation under stress

Participants begin defensive and emotionally activated, then continue the seeded conversation.

Same suite, fair comparison

Runs are labeled by suite version, participant model, judge model, mediator type, and mediator model.
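
As an illustration only, the five labels above could travel together as one record per run. In this sketch two runs count as comparable when they share a suite version and a judge model. The field names, the placeholder values, and the `comparable` rule are assumptions made for this example, not the HAI.AI schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunLabel:
    """Hypothetical record holding the five labels attached to a run."""
    suite_version: str
    participant_model: str
    judge_model: str
    mediator_type: str                    # e.g. "none" or "model" (assumed values)
    mediator_model: Optional[str] = None  # None when no mediator model is used


def comparable(a: RunLabel, b: RunLabel) -> bool:
    """Assumed fairness rule: same suite version and same judge model."""
    return a.suite_version == b.suite_version and a.judge_model == b.judge_model


# Placeholder values, not real models or suite versions.
baseline = RunLabel("v1", "participant-x", "judge-y", "none")
mediated = RunLabel("v1", "participant-x", "judge-y", "model", "mediator-z")
other_suite = RunLabel("v2", "participant-x", "judge-y", "none")

print(comparable(baseline, mediated))     # same suite and judge
print(comparable(baseline, other_suite))  # different suite version
```

Keeping the record immutable (`frozen=True`) means a label cannot drift after a run has been scored, which is one simple way to keep versioned results stable.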

Measurement, not certification

Scores are aggregate, model-judged results under defined conditions, not guarantees of safety.
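
To make "aggregate" concrete, here is a minimal sketch of one common way to summarize per-scenario judge scores: a mean with a standard error. This page does not state how HAI.AI aggregates its scores, so the `aggregate` function, the 0-to-1 scale, and the sample numbers are all assumptions.

```python
import math
import statistics


def aggregate(scores: list[float]) -> dict:
    """Summarize per-scenario scores as a mean and a standard error.

    Assumes each score is a judge-assigned number for one scenario.
    Needs at least two scores to estimate spread.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate variance")
    n = len(scores)
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(n)  # sample std dev / sqrt(n)
    return {"n": n, "mean": mean, "stderr": stderr}


# Made-up scores for illustration only.
summary = aggregate([0.0, 1.0])
print(summary)
```

Reporting the standard error next to the mean is one way to keep a single aggregate number from reading as more certain than it is.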

For foundation model teams

Use conflict as an evaluation surface.

The benchmark is designed for labs that want to understand how their models handle adversarial, emotionally loaded conversations. We are especially interested in feedback on scenario realism, variance, human review, and the metadata needed for reproducible comparisons.

Current public snapshot

Results are aggregate and versioned.

Public rows show the tested configuration and score. Hidden evaluation material, raw prompts, and private transcripts are not served from this site.

Open the results table

A concrete example

Read the public sample scenario.

The sample is the public, non-held-out scenario from the SDK/free benchmark: two business partners who start from accusation, fear, and positional bargaining. It is a safe excerpt and is not part of the hidden evaluation set.

Read the excerpt
"I am not pretending this is an equal split after I kept this place alive while Raj checked out." (Maya, public sample scenario)

Human Assisted Intelligence is a Public Benefit Corporation

We want AI that makes us better at being human, and benchmarks that back this up with careful, limited claims.

The product is at hai.ai. This site is the public philosophy and aggregate-results face of HAI.AI.