Cooperation under stress
Participants begin defensive and emotionally activated, then continue the seeded conversation.
Open benchmark results from HAI.AI
Our question is social: can AI help people move from zero-sum conflict toward workable agreement? This site publishes aggregate benchmark results under defined test conditions.
What we measure
HAI.AI benchmarks ask whether a model or mediator changes the process of conflict: more disclosure, clearer needs, reciprocal commitments, and less zero-sum behavior.
Runs are labeled by suite version, participant model, judge model, mediator type, and mediator model.
Scores are aggregate, model-judged results under defined conditions, not guarantees of safety.
For foundation model teams
The benchmark is meant to be useful to labs that want to understand how their models handle adversarial, emotionally loaded conversations. We are especially interested in feedback on scenario realism, variance, human review, and the metadata needed for reproducible comparisons.
Current public snapshot
Public rows show the tested configuration and score. Hidden evaluation material, raw prompts, and private transcripts are not served from this site.
Open the results table
A concrete example
The sample shows the public, non-held-out scenario from the SDK/free benchmark: two business partners start from accusation, fear, and positional bargaining. It is an excerpt that is safe to publish, not part of the hidden evaluation set.
Read the excerpt
"I am not pretending this is an equal split after I kept this place alive while Raj checked out."
Maya, public sample scenario
Human Assisted Intelligence is a Public Benefit Corporation
We want AI that makes us better at being human, and benchmarks that test that goal while making only careful, limited claims.
The product is at hai.ai. This site presents HAI.AI's public philosophy and aggregate benchmark results.