## Abstract

In principal-agent models, a principal offers a contract to an agent to perform a certain task. The agent exerts a level of effort that maximizes her utility. The principal cannot observe the agent's chosen level of effort, and conditions her wage only on possible outcomes. In this work, we consider a model in which the principal is unaware of the agent's utility and action space: she sequentially offers contracts to identical agents, and observes the resulting outcomes. We present an algorithm for learning the optimal contract under mild assumptions. We bound the number of samples needed for the principal to obtain a contract that is within ϵ of her optimal net profit for every ϵ>0. Our results are robust even when considering risk-averse agents. Furthermore, we show that when there are only two possible outcomes, or the agent is risk-neutral, the algorithm's outcome approximates the optimal contract described in the classical theory.
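The interaction described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: every name and number below (`best_response`, `principal_profit`, the two-action, two-outcome instance) is a hypothetical chosen for illustration, and the final grid search assumes full knowledge of the agent, which the paper's principal does not have. The agent picks the action that maximizes her expected utility from the wage minus her effort cost, and the principal's net profit is the expected outcome value minus the expected wage.

```python
from typing import Callable, Sequence


def best_response(
    contract: Sequence[float],
    outcome_probs: Sequence[Sequence[float]],
    costs: Sequence[float],
    utility: Callable[[float], float] = lambda w: w,  # identity = risk-neutral agent
) -> int:
    """Return the index of the action maximizing expected utility minus cost."""

    def agent_utility(action: int) -> float:
        expected = sum(p * utility(w) for p, w in zip(outcome_probs[action], contract))
        return expected - costs[action]

    # Ties are broken in favor of the lowest-index action (an assumption).
    return max(range(len(costs)), key=agent_utility)


def principal_profit(
    contract: Sequence[float],
    outcome_probs: Sequence[Sequence[float]],
    costs: Sequence[float],
    values: Sequence[float],
    utility: Callable[[float], float] = lambda w: w,
) -> float:
    """Expected outcome value minus expected wage, given the agent's best response."""
    action = best_response(contract, outcome_probs, costs, utility)
    return sum(p * (v - w) for p, v, w in zip(outcome_probs[action], values, contract))


# Hypothetical instance: outcomes are (failure, success); actions are (low, high) effort.
values = [0.0, 10.0]                     # principal's value for each outcome
outcome_probs = [[0.8, 0.2], [0.2, 0.8]]  # outcome distribution per action
costs = [0.0, 2.0]                       # agent's cost per action

# Brute-force stand-in for learning: try contracts that pay w only on success.
grid = [i * 0.5 for i in range(21)]      # w = 0.0, 0.5, ..., 10.0
best_w = max(grid, key=lambda w: principal_profit([0.0, w], outcome_probs, costs, values))
print(best_w, principal_profit([0.0, best_w], outcome_probs, costs, values))
```

In this toy instance, a success wage that is too low leaves the agent at low effort, and the best wage on the grid is the smallest one that makes high effort worthwhile. Passing a concave `utility` (for example `math.sqrt`) models a risk-averse agent under the same interface.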

Original language | English |
---|---|
Article number | 114219 |
Journal | Theoretical Computer Science |
Volume | 980 |
DOIs | |
State | Published - 20 Nov 2023 |

## Keywords

- Bandits
- Contracts
- Game theory
- Learning

## ASJC Scopus subject areas

- Theoretical Computer Science
- General Computer Science