Can Simulation Entice Cooperation in Non-Zero-Sum Games?


In honor of Joe Halpern's 70th birthday, Vincent Conitzer posted this puzzle a year ago: https://www.cs.cmu.edu/~conitzer/halpernbirthday.pdf

Can you solve it?



We provide a solution for Game 1.

Let us denote by p_B the probability that Player B plays 'left,' and by s in [0,1] the probability that a simulated sample of Player B's action is presented to Player A. Then we can formalize the expected outcomes and rewards as follows:

  • The probability p_A that Player A plays 'top' is determined by three other probabilities p_0, p_1, p_2 in [0,1], which Player A must set as her strategy: p_0 is the probability of playing 'top' when no simulation takes place; p_1 is the probability of playing 'top' when simulation took place and Player B's sampled action is 'left'; and p_2 is the probability of playing 'top' when simulation took place and Player B's sampled action is 'right.'

Based on the strategy (p_0, p_1, p_2), we can determine

p_A = s * (p_B * p_1 + (1 - p_B) * p_2) + (1 - s) * p_0.     (Equation 1)

In other words, the strategy of Player A has become a function of the strategy of Player B due to the occasional simulation.

  • Denote by B in R^{2 x 2} the reward matrix for Player B, where B_ij is Player B's reward when Player A plays her i-th action (row 1 = 'top') and Player B plays his j-th action (column 1 = 'left'). The expected reward for Player B then is

E[R_B] = p_A * p_B * B_11 + p_A * (1 - p_B) * B_12 + (1 - p_A) * p_B * B_21 + (1 - p_A) * (1 - p_B) * B_22.

Using Equation 1, and setting

x = B_11 - B_12 - B_21 + B_22,   y = B_12 - B_22,   z = B_21 - B_22,   q = p_1 - p_2,

we get

E[R_B] = (s * x * q) * p_B^2 + (s * y * q + s * x * p_2 + (1 - s) * x * p_0 + z) * p_B + ((1 - s) * y * p_0 + s * y * p_2 + B_22).

Therefore, given Player A's strategy (p_0, p_1, p_2), Player B should choose p_B such that this quadratic expression in p_B is maximized. (A small numerical check of this expansion is sketched right after this list.)

  • In our game, we have

and therefore

As long as q = p_1 - p_2 > 2/3, Player B will want to play the (pure) strategy p_B = 1, which means Player B will always play 'left.'

  • Consider the strategy (p_0 = 1, p_1 = 1, p_2 = 0) for Player A, which implies q = 1 and therefore p_B = 1. Since Player B now always plays 'left,' Player A will always play 'top' (p_2 has become immaterial, since no simulated sample will ever be 'right'). At the same time, this is the optimal outcome for Player A, since no other cell offers her a greater reward than top-left. Therefore, Player A has no incentive to change her strategy, and, as we have seen, given this strategy for Player A, Player B has no incentive to change his strategy either. We have therefore found an equilibrium with rewards (2, 3). Simulation has led to cooperation in this example.
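To make the algebra above concrete, here is a minimal plain-Python sketch (independent of Seeker) that evaluates Player B's expected reward both directly from the bimatrix and via the quadratic form in p_B, and then computes Player B's best response. The reward matrix, strategy, and simulation probability used here are purely illustrative assumptions and are not the data of Game 1.

# Sanity check of the quadratic expansion of Player B's expected reward
# and of Player B's best response. Plain Python, no Seeker required.
# The matrix B below is purely illustrative (NOT the matrix of Game 1).

def prob_top(p0, p1, p2, pB, s):
    # Equation 1: probability that Player A plays 'top'
    return s * (pB * p1 + (1 - pB) * p2) + (1 - s) * p0

def expected_reward_B_direct(B, pA, pB):
    # Player B's expected reward computed straight from the 2x2 bimatrix
    return (pA * pB * B[0][0] + pA * (1 - pB) * B[0][1]
            + (1 - pA) * pB * B[1][0] + (1 - pA) * (1 - pB) * B[1][1])

def quadratic_coeffs(B, p0, p1, p2, s):
    # Coefficients a, b, c of a*p_B^2 + b*p_B + c as derived in the text
    z = B[1][0] - B[1][1]
    y = B[0][1] - B[1][1]
    x = B[0][0] - B[1][0] - y
    q = p1 - p2
    a = s * x * q
    b = s * y * q + s * x * p2 + (1 - s) * x * p0 + z
    c = s * y * p2 + (1 - s) * y * p0 + B[1][1]
    return a, b, c

def best_response_B(a, b):
    # Maximize a*p_B^2 + b*p_B over p_B in [0, 1]
    if a < 0:                          # concave: clamp the vertex into [0, 1]
        return min(1.0, max(0.0, -b / (2 * a)))
    return 1.0 if a + b > 0 else 0.0   # otherwise an endpoint is optimal

if __name__ == "__main__":
    B = [[4, 1], [0, 3]]               # illustrative reward matrix for Player B
    p0, p1, p2, s = 0.9, 1.0, 0.2, 0.5
    a, b, c = quadratic_coeffs(B, p0, p1, p2, s)
    for pB in (0.0, 0.25, 0.5, 0.75, 1.0):
        direct = expected_reward_B_direct(B, prob_top(p0, p1, p2, pB, s), pB)
        assert abs(direct - (a * pB**2 + b * pB + c)) < 1e-12
    print("expansion verified; best response p_B =", best_response_B(a, b))

The case distinction in best_response_B mirrors the mixed_case / b_extreme logic in the Seeker program further below.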


Practical Gameplay

In the game above, we found that the reward matrix of the simulated player had equal sums on its main diagonal and anti-diagonal, leading to x = 0, which turned Player B's quadratic optimization problem into a linear one. Now consider a game where this is not the case. The reward matrices below are the ones used in the program further down; rows are Player A's actions ('top'/'bottom'), columns are Player B's actions ('left'/'right'), and each cell lists Player A's reward first and Player B's second:

              left      right
  top        (3, 5)    (0, 9)
  bottom    (-1, 0)    (2, 2)

This time, we will simulate an action of Player B with probability s = 0.7.
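As a quick worked check, plugging Player B's reward matrix into the definitions above gives

x = 5 - 9 - 0 + 2 = -2,   y = 9 - 2 = 7,   z = 0 - 2 = -2,

so the quadratic term in p_B no longer vanishes.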

a) Can you devise a strategy for Player A? Is there an equilibrium whose rewards are strictly greater than (2, 2) for both players?

b) Consider a variant where Player B still wants to maximize his expected reward, while Player A also wants to manage her risk of receiving a negative reward. In particular, what strategy should Player A play if she wants to keep her expected reward above 2.09 while minimizing her chances of incurring a loss?


Answering these questions analytically is very demanding. In practice, we may therefore resort to a tool for optimizing our strategy. The stochastic optimizer InsideOpt Seeker, for example, can answer both questions with the following short program:

import seeker as skr

def halpern(s, A, B):
    # Coefficients of Player B's expected reward (see the derivation above):
    # E[R_B] = (s*x*q) p_B^2 + (s*y*q + s*x*p_2 + (1-s)*x*p_0 + z) p_B + const
    z = B[1][0] - B[1][1]
    y = B[0][1] - B[1][1]
    x = B[0][0] - B[1][0] - y

    env = skr.Env("license.sio", stochastic=True)
    env.set_restart_likelihood(0.05)
    # Player A's strategy (p_0, p_1, p_2)
    p = [env.continuous(0, 1) for _ in range(3)]
    q = p[1] - p[2]
    # Player B best-responds by maximizing the quadratic in p_B
    quadratic_coeff = s * x * q
    mixed_case = quadratic_coeff < 0
    linear_coeff = (s * y * q + (s * x) * p[2] +
                    ((1 - s) * x) * p[0] + z)
    # Concave case: clamp the vertex -b/(2a) into [0, 1]
    b_mixed = env.max([env.convert(0), env.min([env.convert(1),
                       -linear_coeff /
                       (2 * quadratic_coeff)])])
    # Convex or linear case: the maximum is attained at p_B = 0 or p_B = 1
    b_extreme = env.if_(quadratic_coeff + linear_coeff > 0,
                        1, 0)
    p_B = env.if_(mixed_case, b_mixed, b_extreme)
    # Equation 1: probability that Player A plays 'top'
    p_A = s * (p_B * q + p[2]) + (1 - s) * p[0]
    # Sample one playthrough (0 = 'top'/'left', 1 = 'bottom'/'right')
    draw = [env.continuous_uniform(0, 1) for _ in range(2)]
    plays = [env.if_(draw[0] < p_A, 0, 1),
             env.if_(draw[1] < p_B, 0, 1)]
    reward_A = env.index([plays[0], plays[1]], env.tensor(A))
    reward_B = env.index([plays[0], plays[1]], env.tensor(B))
    exp_reward_A = env.aggregate_mean(reward_A)

    # maximize reward for Player A for 60 seconds
    env.maximize(exp_reward_A, 60)
    exp_reward_B = env.aggregate_mean(reward_B)
    prob_A = env.aggregate_relative_frequency_geq(reward_A, 0)
    env.evaluate()
    print("Maximizing Expected Profit")
    print("Prob A >= 0:", prob_A.get_value())
    print("Reward A:", exp_reward_A.get_value())
    print("Reward B:", exp_reward_B.get_value())
    print("Strategy A:", p_A.get_value(), 
              [p[i].get_value() for i in range(3)])
    print("Strategy B:", p_B.get_value())
    print("Evaluations", env.get_number_evaluations())
    print()

    # constrain the expected profit
    env.enforce_geq(exp_reward_A, 2.09)
    # maximize probability to achieve at least profit 0
    env.maximize(prob_A, 60)
    print("Maximizing Chance of Profit>=0 (exp profit>=2.09)")
    print("Prob A >= 0:", prob_A.get_value())
    print("Reward A:", exp_reward_A.get_value())
    print("Reward B:", exp_reward_B.get_value())
    print("Strategy A:", p_A.get_value(), 
              [p[i].get_value() for i in range(3)])
    print("Strategy B:", p_B.get_value())


if __name__ == "__main__":
    A = [[3, 0], [-1, 2]]
    B = [[5, 9], [0, 2]]
    halpern(0.7, A, B)

As a result, we get:

In the case of the plain maximization of Player A's expected reward (with Player B best-responding), we find that the solver chooses to set p_0 = 0.877. Since x = -2 in this game, lowering p_0 is an interesting lever for getting Player B to cooperate more: we see that he plays 'left' with probability 85.66%. The equilibrium is reached at rewards (2.1055, 4.8377).

[Figure: Seeker-provided reward densities for Player A after the risk minimization.]


In the case where we minimize Player A's risk of a negative reward, we find that Seeker increases p_0 to 1, which makes it beneficial for Player B to defect more often. As a result, the expected reward for Player A declines slightly, but her risk of incurring a loss is reduced by 1.2%.
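As a rough cross-check of these numbers, one can remove the sampling noise entirely and evaluate the closed-form expressions over a grid of Player A's strategies in plain Python. The sketch below is not part of the Seeker program; it merely reuses the best-response formula derived earlier, so its results should land close to, but not exactly on, the sampled values reported above.

# Deterministic grid search over Player A's strategies (p_0, p_1, p_2),
# with Player B best-responding. Plain Python; reuses the formulas derived
# in the text. Results only approximate the sampled values Seeker reports.

import itertools

A = [[3, 0], [-1, 2]]   # Player A's rewards (rows: top/bottom, cols: left/right)
B = [[5, 9], [0, 2]]    # Player B's rewards
s = 0.7                 # simulation probability

z = B[1][0] - B[1][1]
y = B[0][1] - B[1][1]
x = B[0][0] - B[1][0] - y

def outcome(p0, p1, p2):
    """Expected reward for A and A's loss probability when B best-responds."""
    q = p1 - p2
    a = s * x * q
    b = s * y * q + s * x * p2 + (1 - s) * x * p0 + z
    if a < 0:                                  # concave: interior best response
        pB = min(1.0, max(0.0, -b / (2 * a)))
    else:                                      # otherwise an endpoint is optimal
        pB = 1.0 if a + b > 0 else 0.0
    pA = s * (pB * q + p2) + (1 - s) * p0      # Equation 1
    exp_A = (pA * pB * A[0][0] + pA * (1 - pB) * A[0][1]
             + (1 - pA) * pB * A[1][0] + (1 - pA) * (1 - pB) * A[1][1])
    prob_loss = (1 - pA) * pB                  # only bottom-left is negative for A
    return exp_A, prob_loss

grid = [i / 50 for i in range(51)]             # step 0.02
results = [(outcome(p0, p1, p2), (p0, p1, p2))
           for p0, p1, p2 in itertools.product(grid, repeat=3)]

# Question a): maximize Player A's expected reward
print("a)", max(results, key=lambda r: r[0][0]))

# Question b): minimize loss probability subject to expected reward >= 2.09
feasible = [r for r in results if r[0][0] >= 2.09]
print("b)", min(feasible, key=lambda r: r[0][1]))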


Conclusion

As you can see, InsideOpt Seeker allows us to devise optimized strategies in competitive environments, even in the face of stochasticity and non-linearities. After all, not everyone is lucky enough to employ a Joe Halpern!

Happy Birthday, Joe!
