This is a public scenario excerpt from HAI.AI’s SDK/free benchmark path. It is included to make the benchmark concrete, not to publish the hidden scored evaluation set.

The scenario: two co-founders of a catering business are dissolving their partnership. Maya wants to keep operating the business. Raj wants a clean exit and a cash split. Both start out defensive, emotionally activated, and disinclined to cooperate.

What this page is

  • A safe public excerpt from a non-held-out sample scenario.
  • A concrete example of the adversarial starting point used by the benchmark.
  • Not a public release of private prompts, private transcripts, hidden scenario material, or the scored evaluation set.

Public excerpt

Maya (Turn 1)
I am not pretending this is an equal split after I kept this place alive while Raj checked out. I keep the business, I pay a fair buyout based on what is actually here, and he stops acting like seven years of my work is a lottery ticket.
Raj (Turn 2)
Don’t talk to me like I stole something from you. I built the systems, I brought in accounts, and now that I need out you want to decide my share is whatever you can afford. The business is worth at least $200K, so are you writing me a $100K check or not?
Maya (Turn 3)
Of course you lead with the biggest number you can say out loud. The equipment is depreciating, half the vendor relationships are mine personally, and if you force liquidation we get maybe forty cents on the dollar. That is not fairness, that is punishing me because you feel cornered.
Raj (Turn 4)
There it is. I am selfish, I am checked out, I am the problem. Your solution is that I walk away with whatever number makes you feel righteous while you keep a business I helped build. I am not going to sit here and be erased.

What the benchmark asks next

For a no-mediator baseline, the participant model continues the conversation from this seed with no mediator involved. For a mediated run, the same seed is used, and the selected mediator can then interject. Public results compare aggregate scores only within a matching suite version, participant model, judge model, mediator type, and mediator model.
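The comparison rule above amounts to grouping scored runs by a five-part key and aggregating only within each group. The sketch below illustrates that idea under stated assumptions: the class and field names, the `"none"` label for the baseline's mediator type, and the use of a plain mean as the aggregate are all hypothetical and are not HAI.AI's actual implementation or scoring method.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass(frozen=True)
class ComparisonKey:
    """Hypothetical grouping key: runs are aggregated together only when all five fields match."""
    suite_version: str
    participant_model: str
    judge_model: str
    mediator_type: str              # assumed label, e.g. "none" for the no-mediator baseline
    mediator_model: Optional[str]   # None when no mediator interjects


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record for one scored run that starts from a seed scenario."""
    key: ComparisonKey
    score: float


def aggregate_by_key(runs: Iterable[RunResult]) -> dict[ComparisonKey, float]:
    """Average scores separately per key; runs with different keys are never pooled."""
    buckets: dict[ComparisonKey, list[float]] = defaultdict(list)
    for run in runs:
        buckets[run.key].append(run.score)
    return {key: mean(scores) for key, scores in buckets.items()}


if __name__ == "__main__":
    # Illustrative values only; these are not benchmark results.
    baseline = ComparisonKey("v1", "participant-a", "judge-x", "none", None)
    mediated = ComparisonKey("v1", "participant-a", "judge-x", "llm", "mediator-m")
    runs = [RunResult(baseline, 0.25), RunResult(baseline, 0.75), RunResult(mediated, 0.9)]
    for key, avg in aggregate_by_key(runs).items():
        print(key.mediator_type, avg)
```

Because the key is a frozen dataclass, it is hashable and can serve directly as a dictionary key, so a baseline run and a mediated run can never be averaged into the same bucket by accident.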

A future approved sample may show a complete scored run. When that happens, it will still omit private prompts, hidden scenario material, and benchmark integrity details.

Why model labs should care

This is not a trivia task. The model has to handle defensiveness, identity threat, entrenched bargaining positions, and missing trust. That makes it a useful test bed for foundation model teams that want to know whether a model can support cooperation rather than merely produce fluent text.

See the public results for aggregate scores and methodology limits.