DISCUSSION PAPER SERIES

No. 7578

TRACTABILITY IN INCENTIVE CONTRACTING Alex Edmans and Xavier Gabaix

FINANCIAL ECONOMICS

Available online at: www.cepr.org/pubs/dps/DP7578.asp and www.ssrn.com/xxx/xxx/xxx

ISSN 0265-8003

TRACTABILITY IN INCENTIVE CONTRACTING Alex Edmans, Wharton School of Management Xavier Gabaix, NYU Stern, NBER and CEPR Discussion Paper No. 7578 November 2009 Centre for Economic Policy Research 53–56 Gt Sutton St, London EC1V 0DG, UK Tel: (44 20) 7183 8801, Fax: (44 20) 7183 8820 Email: [email protected], Website: www.cepr.org This Discussion Paper is issued under the auspices of the Centre’s research programme in FINANCIAL ECONOMICS. Any opinions expressed here are those of the author(s) and not those of the Centre for Economic Policy Research. Research disseminated by CEPR may include views on policy, but the Centre itself takes no institutional policy positions. The Centre for Economic Policy Research was established in 1983 as an educational charity, to promote independent analysis and public discussion of open economies and the relations among them. It is pluralist and nonpartisan, bringing economic research to bear on the analysis of medium- and long-run policy questions. These Discussion Papers often represent preliminary or incomplete work, circulated to encourage discussion and comment. Citation and use of such a paper should take account of its provisional character. Copyright: Alex Edmans and Xavier Gabaix


ABSTRACT

Tractability in Incentive Contracting

This paper identifies a class of multiperiod agency problems in which the optimal contract is tractable (attainable in closed form). By modeling the noise before the action in each period, we force the contract to provide sufficient incentives state-by-state, rather than merely on average. This tightly constrains the set of admissible contracts and allows for a simple solution to the contracting problem. Our results continue to hold in continuous time, where noise and actions are simultaneous. We thus extend the tractable contracts of Holmstrom and Milgrom (1987) to settings that do not require exponential utility, a pecuniary cost of effort, Gaussian noise or continuous time. The contract's functional form is independent of the noise distribution. Moreover, if the cost of effort is pecuniary (multiplicative), the contract is linear (log-linear) in output and its slope is independent of the noise distribution, utility function and reservation utility. In a two-stage contracting game, the optimal target action depends on the costs and benefits of the environment, but is independent of the noise realization.

JEL Classification: D2, D3, G34 and J3

Keywords: closed forms, contract theory, dispersive order, executive compensation, incentives, principal-agent problem and subderivative

Alex Edmans, The Wharton School, University of Pennsylvania, 2428 Steinberg Hall - Dietrich Hall, 3620 Locust Walk, Philadelphia PA 19104-6367, USA

Xavier Gabaix Finance Department Stern School of Business New York University 44 West 4th Street, 9-190 New York, NY 10012, USA

Email: [email protected]

Email: [email protected]

For further Discussion Papers by this author see:

For further Discussion Papers by this author see:

www.cepr.org/pubs/new-dps/dplist.asp?authorid=164005

www.cepr.org/pubs/new-dps/dplist.asp?authorid=139904

Submitted 19 November 2009

For helpful comments, we thank three anonymous referees, the Editor (Stephen Morris), Andy Abel, Franklin Allen, Heski Bar-Isaac, Patrick Cheridito, Peter DeMarzo, Ingolf Dittmann, Florian Ederer, Chris Evans, Itay Goldstein, Gary Gorton, Narayana Kocherlakota, Ernst Maug, Holger Mueller, Christine Parlour, David Pearce, Canice Prendergast, Michael Roberts, Tomasz Sadzik, Yuliy Sannikov, Nick Souleles, Rob Stambaugh, Luke Taylor, Rob Tumarkin, Bilge Yilmaz, and seminar participants at AEA, Five Star, Chicago, Columbia, Harvard-MIT Organizational Economics, Minneapolis Fed, Northwestern, NYU, Princeton, Richmond Fed, Stanford, Toulouse, WFA, Wharton and Wisconsin. We thank Andrei Savotchkine and Qi Liu for excellent research assistance. AE is grateful for the hospitality of the NYU Stern School of Business, where part of this research was carried out. XG thanks the NSF for financial support. This paper was formerly circulated under the title “Tractability and Detail-Neutrality in Incentive Contracting.”

1 Introduction

The principal-agent problem is central to many economic settings, such as employment contracts, insurance, taxation and regulation. A vast literature analyzing this problem has found that it is typically difficult to solve: even in simple settings, the optimal contract can be highly complex (see, e.g., Grossman and Hart (1983)). The first-order approach is often invalid, requiring the use of more intricate techniques. Even if an optimal contract can be derived, it is often not attainable in closed form, which reduces tractability, a particularly important feature in applied theory models.

Against this backdrop, Holmstrom and Milgrom (1987, “HM”) made a major breakthrough by showing that the optimal contract is linear in profits under certain conditions. Their result has since been widely used by applied theorists to justify assuming a linear contract, which leads to substantial tractability. However, HM emphasized that their result only holds under exponential utility, a pecuniary cost of effort, Gaussian noise, and continuous time. These assumptions may not hold in a number of situations: for example, there is ample evidence of decreasing absolute risk aversion, and many effort decisions do not involve a monetary expenditure (e.g. exerting effort rather than shirking, or forgoing private benefits). In addition, in certain settings, the modeler may wish to use discrete time or binary noise for simplicity.

Can tractable contracts be achieved in broader settings? When allowing for alternative utility functions or noise distributions, do these details affect the form of the optimal contract? What factors do and do not matter for the incentive scheme? These questions are the focus of our paper.

We consider a discrete-time, multiperiod model where the agent consumes only in the final period. We first solve for the cheapest contract that implements a given, but possibly time-varying, path of target effort levels. The optimal incentive scheme is tractable, i.e. attainable in closed form. The key source of tractability is our timing assumption that, in each period, the agent first observes noise and then exerts effort, before observing the noise in the next period. This is similar to theories in which the agent observes total cash flow before deciding how much to divert (e.g. Lacker and Weinberg (1989), DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007)). Since the agent knows the noise realization when taking his action, incentive compatibility requires the agent’s marginal incentives to be sufficient state-by-state (i.e. for every possible noise outcome), which tightly constrains the set of admissible contracts. By contrast, if the action were taken before the noise, incentive compatibility would only pin down marginal incentives on average. There are many possible contracts that induce incentive compatibility on average, and the problem is complex as the principal must solve for the cheapest contract out of this continuum. Note that the timing assumption does not change the fact that the agent faces uncertainty when deciding his effort level since each action, except the final one, continues to be followed by noise. Even in a one-period model, the agent faces risk after signing the contract.

The analysis demonstrates what features of the environment do and do not matter for the optimal implementation contract. The contract’s functional form is independent of the agent’s noise distribution and reservation utility, i.e. it can be written without references to these parameters. The functional form depends only on how the agent trades off the benefits of cash against the cost of providing effort. Moreover, the contract’s slope, as well as its functional form, is independent of the agent’s utility function, reservation utility and noise distribution in two cases. First, if the cost of effort is pecuniary as in HM (i.e. can be expressed as a subtraction from cash pay), the incentive scheme is linear in output regardless of these parameters, even if the cost of effort is itself non-linear. Second, if the agent’s preferences are multiplicative in cash and effort, the contract is independent of utility and log-linear, i.e. the percentage change in pay is linear in output. This robustness contrasts with many classical principal-agent models (e.g. Grossman and Hart (1983)), where even the implementation contract is contingent upon many specific features of the contracting situation. This poses practical difficulties, as some of the important determinants are difficult for the principal to observe and thus use to guide the contract, such as the noise distribution and agent’s utility function. Our results suggest that, under some specifications, the implementation contract is robust to such parametric uncertainty.

Closed-form solutions allow the economic implications of a contract to be transparent. We consider an application to CEO incentives to demonstrate the implications that can flow from a tractable contract structure. For CEOs, the appropriate output measure is the percentage stock return, and multiplicative preferences are theoretically motivated by Edmans, Gabaix and Landier (2009). The percentage change in pay is thus linear in the percentage change in firm value, i.e. the relevant measure of incentives is the elasticity of pay with respect to firm value.
This analysis provides a theoretical justification for using elasticities to measure incentives, a metric previously advocated by Murphy (1999) on empirical grounds.

The above results are derived under a general contracting framework, where the contract may depend on messages sent by the agent to the principal, and also be stochastic. Using recent advances in continuous-time contracting (Sannikov (2008)), we then show that the contract retains the same form in a continuous-time model where noise and effort occur simultaneously. This consistency suggests that, if underlying reality is continuous time, it is best approximated in discrete time by modeling noise before effort in each period.

We next allow the target effort path to depend on the noise realizations. The optimal contract now depends on messages sent by the agent regarding the noise. However, it remains tractable, for a given “action function” that links the observed noise to the principal’s recommended effort level. We then solve for the optimal action function chosen by the principal. In classical agency models, the chosen action is the result of a trade-off between the benefits of effort (which are increasing in firm size) and its costs (direct disutility plus the risk imposed by incentives, which are of similar order of magnitude to the agent’s wage). We show that, if the output under the agent’s control is sufficiently large compared to his salary (e.g. the agent is a CEO who affects total firm value), these trade-off considerations disappear: the benefits of effort swamp the costs. Thus, maximum effort is optimal, regardless of the noise outcome.

The “maximum effort principle”,$^{1}$ when applicable, significantly increases tractability, since it removes the need to solve the trade-off required to derive the optimal effort level when it is interior. Indeed, jointly deriving the optimal effort level and the efficient contract that implements it can be highly complex. Thus, many contracting papers focus exclusively (e.g. Dittmann and Maug (2007) and Dittmann, Maug and Spalt (2009)) or predominantly (e.g. Grossman and Hart (1983), Lacker and Weinberg (1989), Biais et al. (2009), He (2009a, 2009b)) on implementing a fixed target effort level; see also the overview of the literature in Chapters 4 and 8 in Laffont and Martimort (2002). Our result rationalizes this approach: if maximum effort is always efficient, the problem of deriving optimal effort has a simple solution, since there is no trade-off to be simultaneously tackled and the analysis can focus on the cheapest contract to implement this effort level.

Finally, we allow the principal to choose the maximum productive effort level depending on the costs and benefits of the environment. We extend the model to a two-stage game. In the first stage, the principal chooses the maximum productive effort level, e.g. by selecting the size of the plant. In the second stage, the contract is played out as before: the principal wishes the agent to run the plant (whatever its size) with maximum efficiency. As in standard models, the effort level set in the first stage is typically decreasing in the agent’s risk aversion, cost of effort and noise dispersion. Thus, our setup allows for contracts that are simple (since the maximum effort principle applies in the second stage and so solving for a trade-off is not required) yet still respond to the costs and benefits of the environment and thus generate comparative static predictions.

In sum, our analysis generates a set of sufficient conditions to obtain tractable contracts. For the implementation contract to be tractable, modeling the action after the noise is sufficient; for the full contract that also solves for the optimal effort level, ex-post actions plus a high benefit of effort are sufficient; in turn, large firm size is sufficient (although not necessary) for the latter. These sufficient conditions are quite different from the HM assumptions of exponential utility, a pecuniary cost of effort, Gaussian noise, and continuous time, and so may be satisfied in many settings in which the HM assumptions do not hold and tractability was previously believed to be unattainable.

We achieve simple contracts in other settings than HM due to a different modeling setup. In a dynamic setting, high prior period outcomes increase the agent’s wealth and distort the current period decision through two “wealth effects.” First, higher wealth affects the agent’s current risk aversion and thus effort choice. HM assume exponential utility to remove this effect. Second, higher wealth reduces the agent’s marginal utility of money; if the marginal cost of effort is unchanged, the agent has fewer incentives to exert effort. This problem occurs with any risk-averse utility function, including exponential utility. HM assume that the cost of effort is pecuniary, so that it also declines when wealth increases. HM require these two assumptions to remove the intertemporal link between periods and allow the multiperiod problem to collapse into a succession of identical static problems.

Even the single-period problem remains potentially complex, since many contracts satisfy the incentive compatibility condition on average. HM address this by giving the agent substantial freedom: rather than simply selecting the mean return of the firm, he has control over the probabilities of $N$ different states of nature.$^{2}$ This freedom simplifies the contracting problem by reducing the set of allowable contracts. However, this formulation is more cumbersome since effort is the choice of a probability vector, and is thus relatively seldom used in applied theory models. We model effort as a scalar that affects the firm’s mean return, because this formulation is most commonly used in theoretical applications owing to its simplicity. We instead give the agent freedom by specifying the noise before the action, a choice that is not possible when effort involves the selection of probabilities, since noise unavoidably follows the action.

In addition to achieving tractability by forcing the contract to hold state-by-state, the timing assumption also removes the need for exponential utility by allowing the multiperiod model to be solved by backward induction, so that it becomes a succession of single-period problems. In the single-period problem, the noise is observed before the action; thus, the agent’s risk aversion is unimportant and exponential utility is not required. A potential intertemporal link remains since high past outcomes, or high current noise, mean that the agent already expects high consumption and thus has a lower incentive to exert effort, if he exhibits diminishing marginal utility. This issue is present in the Mirrlees (1974) contract if the agent can observe past outcomes.

$^{1}$ We allow for the agent to exert effort that does not benefit the principal. The “maximum effort principle” refers to the maximum productive effort that the agent can undertake to benefit the principal.
Put differently, in the single-period problem, the agent does not face risk (as the noise is known) but faces distortion (as the noise affects his effort incentives). The optimal contract must address these issues: if the utility function is concave, the contract is convex so that, at high levels of consumption, the agent is awarded a greater number of dollars for exerting effort, to offset the lower marginal utility of each additional dollar. Allowing for convex contracts also allows us to drop the second critical assumption of a pecuniary cost of effort. Even if high wealth reduces the marginal utility of cash but not the marginal cost of effort, incentives are preserved because the contract is steeper at high wealth levels.

In addition to its results, the paper’s proofs import and extend some mathematical techniques that are relatively rare in economics and may be of use in future models. We use the subderivative, a generalization of the derivative that allows for quasi first-order conditions even if the objective function is not everywhere differentiable. This concept is related to Krishna and Maenner’s (2001) use of the subgradient, although the applications are quite different. It allows us to avoid the first-order approach, and so may be useful for models where sufficient conditions for the first-order approach cannot be verified.$^{3}$ We also use the notion of “relative dispersion” to prove that the incentive compatibility constraints bind, i.e. the principal imposes the minimum slope that induces effort. We show that the binding contract is less dispersed than alternative solutions, constituting efficient risk sharing. A similar argument rules out stochastic contracts, where the payout is a random function of output.$^{4}$ We extend a result from Landsberger and Meilijson (1994), who use relative dispersion in another economic setting.

This paper builds on a rich literature on tractable multiperiod agency problems. HM show the optimal contract is linear in profits under exponential utility and a pecuniary cost of effort, if the agent controls only the drift of the process and time is continuous; they show that this result does not hold in discrete time. A number of papers have extended their result to more general settings, although all continue to require exponential utility and a pecuniary cost of effort. In Sung (1995) and Ou-Yang (2003), the agent also controls the diffusion of the process in continuous time. Hellwig and Schmidt (2002) achieve linearity in discrete time, under the additional assumptions that the agent can destroy profits before reporting them to the principal, and that the principal can only observe output in the final period. Our setting allows the principal to observe signals in each period. Mueller (2000) shows that linear contracts are not optimal in HM if the agent can only change the drift at discrete points, even if these points are numerous and so the model closely approximates continuous time.

Our modeling of noise before the action is most similar to models in which the agent can observe total cash flow before deciding how much to divert. Lacker and Weinberg (1989) show that the optimal contract to deter all diversion (the analog of maximum effort) is piecewise linear, regardless of the noise distribution and utility function. Their core result is similar to a specific case of our Theorem 1, restricted to a pecuniary cost of effort and a single period. In DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007), the optimal contract is linear because the agent is risk-neutral; therefore, there is no issue with wealth affecting risk aversion (which is always zero) nor the marginal benefit of diversion (which is constant for each dollar diverted). The risk-neutral version of Garrett and Pavan (2009) also predicts linear contracts. Our setting considers risk aversion, where high past output reduces the marginal benefit of effort, thus requiring a convex contract to preserve incentives.

This paper proceeds as follows. In Section 2 we derive tractable contracts in both discrete and continuous time, given a target path of effort levels. Section 3 allows the effort level to depend on the noise realization, derives conditions under which maximum productive effort is optimal for all noise outcomes, and allows the principal to determine this maximum according to the environment. Section 4 concludes. The Appendix contains proofs and other additional materials; further peripheral material is in the Online Appendix.

$^{2}$ This specification refers to the discrete-time version of the HM model, as this is most comparable to our setting. In that version, the contract is linear in accounts, although not linear in profits.

$^{3}$ See Rogerson (1985) for sufficient conditions for the first-order approach to be valid under a single signal, and Jewitt (1988) for situations in which the principal can observe multiple signals. Schaettler and Sung (1993) derive sufficient conditions for the first-order approach to be valid in a large class of principal-agent problems, of which HM is a special case.

$^{4}$ With separable utility, it is simple to show that the constraints bind: the principal offers the least risky contract that achieves incentive compatibility. With non-separable utility, introducing additional randomization by giving the agent a riskier contract than necessary may be desirable (Arnott and Stiglitz (1988)), an example of the theory of second best. We use the concept of relative dispersion to prove that constraints bind.


2 The Core Model

2.1 Discrete Time

We consider a $T$-period model; its key parameters are summarized in Table 1. In each period $t$, the agent observes noise $\eta_t$, takes an unobservable action $a_t$, and then observes the noise in period $t+1$. The action $a_t$ is broadly defined to encompass any decision that benefits output but is personally costly to the agent. The main interpretation is effort, but it can also refer to rent extraction: low $a_t$ reflects cash flow diversion or the pursuit of private benefits. We assume that noises $\eta_1, \dots, \eta_T$ are independent with interval support with interior $(\underline{\eta}_t, \overline{\eta}_t)$, where the bounds may be infinite, and that $\eta_2, \dots, \eta_T$ have log-concave densities.$^{5}$ We require no other distributional assumption for $\eta_t$; in particular, it need not be Gaussian. The action space $A$ has interval support, bounded below and above by $\underline{a}$ and $\overline{a}$. We allow for both open and closed action sets and for the bounds to be infinite. After the action is taken, a verifiable signal

$$r_t = a_t + \eta_t \tag{1}$$

is publicly observed at the end of each period $t$.

Insert Table 1 about here

Our assumption that $\eta_t$ precedes $a_t$ is featured in models in which the agent sees total output before deciding how much to divert (e.g. Lacker and Weinberg (1989), DeMarzo and Fishman (2007), Biais et al. (2007)), or observes the “state of nature” before choosing effort (e.g. Harris and Raviv (1979), Sappington (1983), Baker (1992), and Prendergast (2002)$^{6}$). Note that this timing assumption does not make the agent immune to risk: in every period, except the final one, his action is followed by noise. Even in a one-period model, the agent bears risk as the noise is unknown when he signs the contract. In Section 2.2 we show that the contract has the same functional form in continuous time, where $\eta$ and $a$ are simultaneous. While the timing assumption extends the model’s applicability to a cash flow diversion setting (an application that is not possible if noise follows the action), a limitation is that $\eta$ cannot be interpreted as measurement error.

$^{5}$ A random variable is log-concave if it has a density with respect to the Lebesgue measure, and the log of this density is a concave function. Many standard density functions are log-concave, in particular the Gaussian, uniform, exponential, Laplace, Dirichlet, Weibull, and beta distributions (see, e.g., Caplin and Nalebuff (1991)). On the other hand, most fat-tailed distributions are not log-concave, such as the Pareto distribution.

$^{6}$ In such papers, the optimal action typically depends on the state of nature. We allow for such dependence in Section 3.1.


In period $T$, the principal pays the agent cash of $c$.$^{7}$ The agent’s utility function is

$$E\left[u\left(v(c) - \sum_{t=1}^{T} g(a_t)\right)\right]. \tag{2}$$

$g$ represents the cost of effort, which is increasing and weakly convex. $u$ is the utility function and $v$ is the felicity$^{8}$ function which denotes the agent’s utility from cash; both are increasing and weakly concave. $g$, $u$ and $v$ are all twice continuously differentiable. We specify functions for both utility and felicity to maximize the generality of the setup. For example, the utility function $\left(c e^{-g(a)}\right)^{1-\gamma}/(1-\gamma)$ is commonly used in macroeconomics (see e.g. Cooley and Prescott (1995)), which entails $u(x) = e^{(1-\gamma)x}/(1-\gamma)$ (with $\gamma > 1$ so that $u$ is concave; when $\gamma = 1$, the limit is understood as $u(x) = x$) and $v(x) = \ln x$. The case $u(x) = x$ denotes additively separable preferences; $v(c) = \ln c$ generates multiplicative preferences. If $v(c) = c$, the cost of effort is expressed as a subtraction from cash pay. This is appropriate if effort represents an opportunity cost of forgoing an alternative income-generating activity (e.g. outside consulting), or involves a financial expenditure. HM assume $u(x) = -e^{-\gamma x}$ and $v(c) = c$.

The only assumption that we make for the utility function $u$ is that it exhibits nonincreasing absolute risk aversion (NIARA), i.e. $-u''(x)/u'(x)$ is nonincreasing in $x$. Many common utility functions (e.g. constant absolute risk aversion $u(x) = -e^{-\gamma x}$ and constant relative risk aversion $u(x) = x^{1-\gamma}/(1-\gamma)$, $\gamma > 0$) exhibit NIARA. This assumption turns out to be sufficient to rule out randomized contracts.

The agent’s reservation utility is given by $\underline{u} \in \operatorname{Im} u$, where $\operatorname{Im} u$ is the image of $u$, i.e. the range of values taken by $u$. We assume that $\operatorname{Im} v = \mathbb{R}$ so that we can apply the $v^{-1}$ function to any real number.$^{9}$ We take an optimal contracting approach that imposes no restrictions on the contracting space available to the principal, so the contract $\tilde{c}(\cdot)$ can be stochastic, nonlinear in the signals $r_t$, and depend on messages $M_t$ sent by the agent.
By the revelation principle, we can assume that the space of messages $M_t$ is $\mathbb{R}$ and that the principal wishes to induce truth-telling by the agent. The full timing is as follows:

1. The principal proposes a (possibly stochastic) contract $\tilde{c}(r_1, \dots, r_T, M_1, \dots, M_T)$.
2. The agent agrees to the contract or receives his reservation utility $\underline{u}$.
3. The agent observes noise $\eta_1$, sends the principal a message $M_1$, then exerts effort $a_1$.
4. The signal $r_1 = \eta_1 + a_1$ is publicly observed.

$^{7}$ If the agent quits before time $T$, he receives a very low wage $\underline{c}$.

$^{8}$ We note that the term “felicity” is typically used to denote one-period utility in an intertemporal model. We use it in a non-standard manner here to distinguish it from the utility function $u$.

$^{9}$ This assumption could be weakened. With $K$ defined as in Theorem 1, it is sufficient to assume that there exists a value of $K$ which makes the participation constraint bind, and a “threat consumption” which deters the agent from exerting very low effort, i.e. inf c v (c) inf at t g (at ) t + at + K. t g (a )


5. Steps (3)-(4) are repeated for $t = 2, \dots, T$.
6. The principal pays the agent $\tilde{c}(r_1, \dots, r_T, M_1, \dots, M_T)$.
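The timing above can be sketched numerically. The following is a minimal illustration rather than part of the model: it assumes a quadratic cost $g(a) = a^2/2$, uniform noise on $[-1, 1]$ (a log-concave distribution), the macroeconomic example $u(x) = e^{(1-\gamma)x}/(1-\gamma)$ with $v(c) = \ln c$ and $\gamma = 2$, and a deterministic contract without messages. The names `play`, `example_contract` and `obedient` are hypothetical.

```python
import math
import random

GAMMA = 2.0  # hypothetical curvature parameter; gamma > 1 keeps u concave


def g(a):
    """Hypothetical convex effort cost, chosen only for illustration."""
    return 0.5 * a * a


def v(c):
    """Felicity v(c) = ln c, i.e. multiplicative preferences."""
    return math.log(c)


def u(x):
    """Outer utility u(x) = e^{(1-gamma) x} / (1-gamma)."""
    return math.exp((1.0 - GAMMA) * x) / (1.0 - GAMMA)


def play(contract, effort_rule, T, rng):
    """Run steps 3-6 once; return (signals, pay, realized utility)."""
    signals, efforts = [], []
    for t in range(T):
        eta = rng.uniform(-1.0, 1.0)  # step 3: the agent observes the noise first
        a = effort_rule(t, eta)       # step 3: effort is chosen knowing eta
        efforts.append(a)
        signals.append(a + eta)       # step 4: r_t = a_t + eta_t becomes public
    c = contract(signals)             # step 6: pay depends on the signals only
    utility = u(v(c) - sum(g(a) for a in efforts))  # expression (2), ex post
    return signals, c, utility


def example_contract(signals):
    """Hypothetical log-linear contract with unit slope and zero intercept."""
    return math.exp(sum(signals))


def obedient(t, eta):
    """Effort rule that ignores the noise and always plays a target of 1."""
    return 1.0


if __name__ == "__main__":
    print(play(example_contract, obedient, T=3, rng=random.Random(0)))
```

With these assumed forms, the realized utility coincides with the closed form $\left(c e^{-\sum_t g(a_t)}\right)^{1-\gamma}/(1-\gamma)$, which is the multi-period analog of the macroeconomic example.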

Throughout most of the paper, we abstract from imperfect commitment problems and focus on a single source of market imperfection: moral hazard. This assumption is common in the dynamic moral hazard literature: see, e.g., Rogerson (1985), HM, Spear and Srivastava (1987), Phelan and Townsend (1991), Biais et al. (2007, 2009). The Online Appendix extends the model to accommodate quits and firings.

As in Grossman and Hart (1983), in this section we fix the path of effort levels that the principal wants to implement at $(a_t^*)_{t=1,\dots,T}$, where $a_t^* > \underline{a}$ and $a_t^*$ may be time-varying.$^{10}$ An admissible contract gives the agent an expected utility of at least $\underline{u}$ and induces him to take path $(a_t^*)$ and truthfully report noises $(\eta_t)_{t=1,\dots,T}$. The principal is risk-neutral, and so the optimal contract is the admissible contract with the lowest expected cost $E[\tilde{c}]$. Section 3 studies the optimal effort level.

We now formally define the principal’s program. Let $\mathcal{F}_t$ be the filtration induced by $(\eta_1, \dots, \eta_t)$, the noise revealed up to time $t$. The agent’s policy is $(a, M) = (a_1, \dots, a_T, M_1, \dots, M_T)$, where $a_t$ and $M_t$ are $\mathcal{F}_t$-measurable. $a_t$ is the effort taken by the agent if noise $(\eta_1, \dots, \eta_t)$ has been realized, and $M_t$ is a message sent by the agent upon observing $(\eta_1, \dots, \eta_t)$. Let $S$ denote the space of such policies, and $\Delta(S)$ the set of randomized policies. Define $(a^*, M^*) = (a_1^*, \dots, a_T^*, M_1^*, \dots, M_T^*)$ as the policy of exerting effort $a_t^*$ at time $t$ and sending the truthful message $M_t^*(\eta_1, \dots, \eta_t) = \eta_t$. The program is given below:

Program 1 The principal chooses a contract $\tilde{c}(r_1, \dots, r_T, M_1, \dots, M_T)$ and an $\mathcal{F}_t$-measurable message policy $(M_t)_{t=1,\dots,T}$, that minimizes expected cost:

$$\min_{\tilde{c}(\cdot)} E\left[\tilde{c}\left(a_1^* + \eta_1, \dots, a_T^* + \eta_T, M_1^*, \dots, M_T^*\right)\right], \tag{3}$$

subject to the following constraints:

$$\text{IC: } (a_t^*, M_t^*)_{t=1,\dots,T} \in \arg\max_{(a,M) \in \Delta(S)} E\left[u\left(v\left(\tilde{c}(a_1 + \eta_1, \dots, a_T + \eta_T, M_1, \dots, M_T)\right) - \sum_{s=1}^{T} g(a_s)\right)\right], \tag{4}$$

$$\text{IR: } E\left[u\left(v\left(\tilde{c}(\cdot)\right) - \sum_{t=1}^{T} g(a_t^*)\right)\right] \geq \underline{u}. \tag{5}$$

$^{10}$ If $a_t^* = \underline{a}$, then a flat wage induces the optimal action.

If the analysis is restricted to message-free contracts, (4) implies that the time-t action at is given by: "

c (a1 + 8 1 ; :::; t ; at 2 arg max E u v (e at

1 ; :::; at

+

t ; :::; aT

+

T ))

g (at )

T X

s=1;s6=t

!

g (as )

j (6)

11

Theorem 1 below describes our solution to Program 1.

Theorem 1 (Optimal contract, discrete time). The following contract is optimal. The agent is paid ! T X 1 0 c=v g (at ) rt + K , (7) t=1

!# 0 ) r + g (a t t t = where K is a constant that makes the participation constraint bind (E u P K t g (at ) u). The functional form (7) is independent of the utility function u, the reservation utility u, and the distribution of the noise ; these parameters a¤ect only the scalar K. The optimal contract is deterministic and does not require messages. In particular, if the target action is time-independent (at = a 8 t), the contract c=v is optimal, where r =

PT

t=1 rt

1

"

(g 0 (a ) r + K)

P

(8)

is the total signal.

Proof. (Heuristic). The Appendix presents a rigorous proof that rules out stochastic contracts and messages, and does not assume that the contract is di¤erentiable. Here, we give a heuristic proof by induction on T that conveys the essence of the result for deterministic message-free contracts, using …rst-order conditions and assuming at < a. We commence with T = 1. Since 1 is known, we can remove the expectations operator from the IC condition (6). Since u is an increasing function, it also drops out to yield: a1 2 arg max v (c (a1 + a1

1 ))

g (a1 ) :

(9)

The …rst-order condition is: v 0 (c (a1 +

1 )) c

0

(a1 +

1)

g 0 (a1 ) = 0:

(10)

11 Theorem 1 characterizes a contract that is optimal, i.e. solves Program 1. Strictly speaking, there exist other optimal contracts which pay the same as (7) on the equilibrium path, but take di¤erent values for returns that are not observed on the equilibrium path. Note that the contract in Theorem 1 allows c to be negative. Limited liability could be incorporated, at the cost of additional notational complexity, by imposing a lower bound on or adding a …xed constant to the signal.


Therefore, for all $r_1$, $v'(c(r_1))\, c'(r_1) = g'(a_1^*)$, which integrates to

$$v(c(r_1)) = g'(a_1^*)\, r_1 + K \quad (11)$$

for some constant $K$. Contract (11) must hold for all $r_1$ that occur with non-zero probability, i.e. for $r_1 \in \left(a_1^* + \underline{\eta}_1,\; a_1^* + \bar{\eta}_1\right)$. We now proceed by induction on the total number of periods $T$: we show that, if the result holds for $T$, it also holds for $T+1$. Let $V(r_1,\ldots,r_{T+1}) \equiv v(c(r_1,\ldots,r_{T+1}))$ denote the indirect felicity function, i.e. the contract in terms of felicity rather than cash. At $t = T+1$, the IC condition is:

$$a_{T+1}^* \in \arg\max_{a_{T+1}}\; V(r_1,\ldots,r_T,\; \eta_{T+1} + a_{T+1}) - g(a_{T+1}) - \sum_{t=1}^{T} g(a_t). \quad (12)$$

Applying the result for $T = 1$, to induce $a_{T+1}^*$ at $T+1$, the contract must be of the form:

$$V(r_1,\ldots,r_T,r_{T+1}) = g'(a_{T+1}^*)\, r_{T+1} + k(r_1,\ldots,r_T), \quad (13)$$

where the integration “constant” now depends on the past signals, i.e. $k(r_1,\ldots,r_T)$. In turn, $k(r_1,\ldots,r_T)$ is chosen to implement $a_1^*,\ldots,a_T^*$ viewed from $t = 0$, when the agent's utility is:

$$E\left[u\left(k(r_1,\ldots,r_T) + g'(a_{T+1}^*)\, r_{T+1} - g(a_{T+1}^*) - \sum_{t=1}^{T} g(a_t)\right)\right].$$

Defining

$$\hat{u}(x) = E\left[u\left(x + g'(a_{T+1}^*)\, r_{T+1} - g(a_{T+1}^*)\right)\right], \quad (14)$$

the principal's problem is to implement $a_1^*,\ldots,a_T^*$ with a contract $k(r_1,\ldots,r_T)$, given a utility function

$$E\left[\hat{u}\left(k(r_1,\ldots,r_T) - \sum_{t=1}^{T} g(a_t)\right)\right].$$

Applying the result for T , the contract must have the form k (r1 ; :::; rT ) = for some constant K. Combining this with (11), the contract must satisfy: V (r1 ; :::; rT ; rT +1 ) =

T +1 X t=1

11

g 0 (at ) rt + K:

PT

t=1

g 0 (at ) rt + K

(15)

for (rt ) that occurs with non-zero probability (i.e. (r1 ; :::; rT ) 2 1

PT +1

0

T Y

at +

t

; at +

t

. The

t=1

associated pay is $c = v^{-1}\left(\sum_{t=1}^{T+1} g'(a_t^*)\, r_t + K\right)$, as in (7). Conversely, any contract that satisfies (15) is incentive compatible.

Theorem 1 yields a closed-form contract for any $T$ and $(a_t^*)$. The Theorem also clarifies the parameters that do and do not matter for the contract's functional form. It depends only on the felicity function $v$ and the cost of effort $g$, i.e. how the agent trades off the benefits of cash against the costs of providing effort, and is independent of the utility function $u$, the reservation utility $\underline{u}$, and the distribution of the noise $\eta$. Even though these parameters do not affect the contract's functional form, in general they will affect its slope via their impact on the scalar $K$. However, if $v(c) = c$ (the cost of effort is pecuniary) as assumed by HM, the contract's slope is also independent of $u$, $\underline{u}$ and $\eta$: it is linear, regardless of these parameters. The linear contracts of HM can thus be achieved in settings that do not require exponential utility, Gaussian noise or continuous time. Note that, even if the cost of effort is pecuniary, it remains a general, possibly non-linear function $g(a_t)$.

The origins of the contract's tractability can be seen in the heuristic proof. We first consider $T = 1$. Since $\eta_1$ is known, the expectations operator can be removed from (6). $u$ then drops out to yield (9). The specific form of $u$ is irrelevant: all that matters is that it is monotonic, and so it is maximized by maximizing its argument. In particular, exponential utility is not required, since the agent's attitude to risk does not matter when $\eta_1$ is known. In turn, (9) yields the first-order condition (10), which must hold for every possible realization of $\eta_1$, i.e. state-by-state. This pins down the slope of the contract: for all $\eta_1$, the agent must receive a marginal felicity of $g'(a_1^*)$ for a one unit increment to the signal $r_1$. The principal's only degree of freedom is the constant $K$, which is itself pinned down by the participation constraint.
By contrast, if $\eta_1$ followed the action, and assuming linear $u$ for simplicity, (10) would be

$$E\left[v'(c(r_1))\, c'(r_1)\right] = g'(a_1^*). \quad (16)$$

This first-order condition only determines the agent's marginal incentives on average, rather than state-by-state. There are multiple contracts that will satisfy (16) and implement $a_1^*$, and the problem is significantly more complex as the principal must solve for the cheapest contract out of this continuum. By giving the agent greater flexibility in the action space (by allowing him to respond to $\eta_1$), our timing assumption simplifies the contracting problem by tightly constraining the set of incentive compatible contracts. This is similar to the intuition behind the linear contracts of HM, who give the agent flexibility by granting him control over not just the mean signal, but the probability of each realization. Equation (8) shows that, if the target action (and thus marginal cost of effort) is constant, incentives must be constant time-by-time as well as state-by-state, and so only aggregate performance ($r = \sum_{t=1}^{T} r_t$) matters.
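The state-by-state logic can be checked numerically. The following is a minimal sketch, not the paper's own computation: it assumes $T = 1$, log felicity $v(c) = \ln c$, a hypothetical quadratic cost $g(a) = a^2/2$, a target action of 1, and an arbitrary constant $K$ (in the paper $K$ is set by the participation constraint). It searches a grid for the action the agent prefers once the noise is known.

```python
import math

A_STAR = 1.0   # target action (assumed for illustration)
K = 0.3        # arbitrary constant; the paper pins it down via the participation constraint


def g(a):
    """Hypothetical quadratic cost of effort."""
    return 0.5 * a * a


def g_prime(a):
    """Marginal cost of effort for the quadratic cost above."""
    return a


def pay(r):
    """Contract (8) with T = 1 and v = ln, so that v^{-1} = exp."""
    return math.exp(g_prime(A_STAR) * r + K)


def felicity_net_of_cost(a, eta):
    """v(c(a + eta)) - g(a): the agent's objective once eta is known, as in (9)."""
    return math.log(pay(a + eta)) - g(a)


def best_action(eta, grid_size=4001, a_max=2.0):
    """Grid search over [0, a_max] for the agent's preferred action given eta."""
    grid = [a_max * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda a: felicity_net_of_cost(a, eta))


if __name__ == "__main__":
    for eta in (-2.0, 0.0, 3.0):
        print(eta, best_action(eta))
```

Under these assumptions the preferred action does not move with the noise realization, which is the sense in which the contract provides incentives state-by-state rather than on average.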

Even though all noise is known when the agent takes his action, it is not automatically irrelevant. First, since the agent does not know $\eta_1$ when he signs the contract, he is subject to risk and so the first-best is not achieved. Second, the noise realization has the potential to undo incentives. If $\eta_1$ is high, $r_1$ and thus $c$ will already be high; a high $\underline{u}$ has the same effect. If the agent exhibits diminishing marginal felicity (i.e. $v$ is concave), he will have lower incentives to exert effort. Put differently, at the time the agent takes his action, he does not face risk (as $\eta_1$ is known) but faces distortion (as $\eta_1$ affects his effort incentives). The optimal contract must address this problem. It does so by being convex, via the $v^{-1}$ transformation: if noise is high, it gives a greater number of dollars for exerting effort ($\partial c/\partial r_1$), to exactly offset the lower marginal felicity of each dollar ($v'(c)$). Therefore, the marginal felicity from effort remains $v'(c)\,\partial c/\partial r_1 = g'(a_1^*)$, and incentives are preserved regardless of $\underline{u}$ or $\eta_1$. If the cost of effort is pecuniary ($v(c) = c$), $v^{-1}(c) = c$ and so no transformation is needed. Since both the costs and benefits of effort are in monetary terms, high $\eta_1$ reduces them equally. Thus, incentives are unchanged even with a linear contract.

The idea of subjecting the agent to a constant incentive pressure is also similar to HM. However, in HM, the constant incentive pressure involves giving the agent a constant increase in cash for an increase in the signal. Here, the agent is given a constant increase in felicity, $v'(c(r_1))\, c'(r_1)$. This generalization allows us to drop the assumption of a pecuniary cost of effort, in which case the contract is non-linear. In the cash flow diversion models of DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007), the optimal contract is linear because the agent is risk-neutral.
His utility rises by a constant amount for each dollar diverted, and so the optimal contract must give him a constant share of output. Lacker and Weinberg (1989) achieve a (piecewise) linear contract with general utility functions and noise distributions, under a pecuniary cost of effort and for $T = 1$. We extend their result to general $T$ and a non-pecuniary cost of effort.

We now move to $T > 1$. In all periods $t < T$, the agent is now exposed to risk, since he does not know future noise realizations when he chooses $a_t$. Much like the effect of a high current noise realization, if the agent expects future noise to be high, his incentives to exert effort will be reduced. This would typically require the agent to integrate over future noise realizations when choosing $a_t$, leading to high complexity. Here the unknown future noise outcomes do not matter, as can be seen in the heuristic proof. Before $T+1$, $\eta_{T+1}$ is unknown. However, (13) shows that the unknown $\eta_{T+1}$ enters additively and does not affect the incentive constraints of the $t = 1,\ldots,T$ problems: regardless of what $\eta_{T+1}$ turns out to be, the contract must give the agent a marginal felicity of $g'(a_t^*)$ for exerting effort at $t$.^12 Our timing assumption thus allows us to solve the multiperiod problem via backward induction, reducing it to a succession of one-period problems, each of which can be solved tractably.

^12 This can be most clearly seen in the definition of the new utility function (14), which “absorbs” the $T+1$ period problem.
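As a worked illustration of the convexity argument (a sketch for the special case $v(c) = \ln c$ and $T = 1$, chosen here for concreteness), contract (7) and its slope can be written out explicitly:

```latex
% Special case v(c) = \ln c, T = 1, so that v^{-1}(x) = e^{x}.
\begin{align*}
c(r_1) &= \exp\!\left(g'(a_1^*)\, r_1 + K\right), \\
\frac{\partial c}{\partial r_1} &= g'(a_1^*)\, c(r_1)
  \quad \text{(increasing in } r_1 \text{, so the contract is convex)}, \\
v'(c)\,\frac{\partial c}{\partial r_1} &= \frac{1}{c(r_1)}\; g'(a_1^*)\, c(r_1) = g'(a_1^*).
\end{align*}
```

The dollar slope rises with the signal at exactly the rate at which the marginal felicity of a dollar falls, so the marginal felicity from effort stays at $g'(a_1^*)$ for every noise realization.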


Even though we can consider each problem separately, the periods remain interdependent. Much like the current noise realization, past outcomes may affect the current effort choice. The Mirrlees (1974) contract punishes the agent if final output is below a threshold. Therefore, if the agent can observe past outcomes, he will shirk if interim output is high. This complexity distinguishes our multiperiod model from a static multi-action model, where the agent chooses $T$ actions simultaneously. As in HM, and unlike in a multi-action model, here the agent observes past outcomes when taking his current action, and can vary his action in response. HM assume exponential utility and a pecuniary cost of effort to remove such “wealth effects” and eliminate the intertemporal link between periods. We instead ensure that past outcomes do not distort incentives via the above $v^{-1}$ transformation, and so do not require either assumption.

The Appendix proves that, even though the agent privately observes $\eta_t$, there is no need for him to communicate it to the principal. Since $a_t^*$ is implemented for all $\eta_t$, there is a one-to-one correspondence between $r_t$ and $\eta_t$ on the equilibrium path. The principal can thus infer $\eta_t$ from $r_t$, rendering messages redundant.

The Appendix also rules out randomized contracts. There are two effects of randomization. First, it leads to inefficient risk-sharing, for any concave $u$. Second, changing the reward for effort from a certain payment to a lottery may increase or decrease the agent's effort incentives.^13 We show that with NIARA utility, this second effect is negative. Thus, both effects of randomization are undesirable, and deterministic contracts are unambiguously optimal. The proof makes use of the independence of noises and the log-concavity of $\eta_2,\ldots,\eta_T$. While these assumptions, combined with NIARA utility, are sufficient to rule out randomized contracts, they may not be necessary.
In future research, it would be interesting to explore whether randomized contracts can be ruled out in broader settings.^14

In addition to allowing for stochastic contracts, the above analysis also allows for $a_t^* = \bar{a}$, under which the IC constraint is an inequality. Therefore, the contract in (7) only provides a lower bound on the contract slope. A sharper-than-necessary contract has a similar effect to a stochastic contract, since it subjects the agent to additional risk. Again, the combination of NIARA and independent and log-concave noises is sufficient to rule out such contracts.

If the analysis is restricted to deterministic contracts and $a_t^* < \bar{a}$ $\forall t$, the contract in (7) is the only incentive-compatible contract (for the signal values realized on the equilibrium path). We can thus relax the above three assumptions. This result is stated in Proposition 1 below.

^13 See Arnott and Stiglitz (1988) for detail on how randomization can sometimes be desirable: if low effort leads to a random payoff, this may induce the agent to exert effort. They derive sufficient conditions under which randomization is suboptimal. Our conditions to guarantee the suboptimality of random contracts generalize their results to broader agency problems (their setting focuses on insurance).

^14 For instance, consider $T = 2$. We only require that $\hat{u}(x)$ as defined in (43) exhibits NIARA. The concavity of $\eta_2$ is sufficient, but unnecessary for this. Separately, if NIARA is violated, the marginal cost of effort falls with randomization. However, this effect may be outweighed by the inefficient risk-sharing, so randomized contracts may still be dominated.

Proposition 1 (Optimal deterministic contract, $a_t^* < \bar{a}$ $\forall t$). Consider only deterministic contracts and $a_t^* < \bar{a}$ $\forall t$. Relax the assumptions of NIARA utility, independent noises, and log-concave noises for $\eta_2,\ldots,\eta_T$. Any incentive-compatible contract takes the form

$$c = v^{-1}\left( \sum_{t=1}^{T} g'(a_t^*)\, r_t + K \right), \quad (17)$$

where $K$ is a constant. The optimal deterministic contract features a $K$ that makes the agent's participation constraint bind.

Proof. See Appendix.

The following Remark states that the contract's incentive compatibility is robust to the timing assumption. In particular, if noise follows the action in each period, the contract in Theorem 1 continues to implement the target actions: since it provides sufficient incentives state-by-state, it automatically does so on average. However, we can no longer show that it is optimal, since there are many other contracts that provide sufficient incentives on average.

Remark 1 (Robustness of the contract's incentive compatibility to timing). For any timing of the noise $(\eta_t)_{t=1,\ldots,T}$ (i.e. regardless of whether it follows or precedes $a_t$ in each period), the contract in Theorem 1 is incentive compatible and implements $(a_t^*)_{t=1,\ldots,T}$. Indeed, given the contract, the agent's utility is:

$$u\left( \sum_{t=1}^{T} g'(a_t^*)\,(a_t + \eta_t) + K - \sum_{t=1}^{T} g(a_t) \right),$$

so that, regardless of the timing of $(\eta_t)_{t=1,\ldots,T}$, the agent maximizes his utility by taking action $a_t = a_t^*$, as it solves $\max_{a_t} g'(a_t^*)\, a_t - g(a_t)$.

Closed-form solutions allow the economic implications of a contract to be transparent. We close this section by considering two specific applications of Theorem 1 to executive compensation, to highlight the implications that can be gleaned from a tractable contract structure. While contract (7) can be implemented for any informative signal $r$, the firm's log equity return is the natural choice of $r$ for CEOs, since they have a fiduciary duty to maximize shareholder value. When the cost of effort is pecuniary ($v(c) = c$), Theorem 1 implies that the CEO's dollar pay $c$ is linear in the firm's return $r$. Hence, the relevant incentives measure is the dollar change in CEO pay for a given percentage change in firm value (i.e. “dollar-percent” incentives), as advocated by Hall and Liebman (1998).^15

Another common specification is $v(c) = \ln c$, in which case the CEO's utility function (2) now becomes, up to a monotonic (logarithmic) transformation:

$$E\left[U\left(c\, e^{-g(a)}\right)\right] \geq \underline{U}, \quad (18)$$

^15 This incentive measure refers to “ex ante” incentives, i.e. how much the CEO's pay will change over the next year if the stock return over the next year increases by one percentage point.

where $u(x) \equiv U(e^x)$ and $\underline{U} \equiv \ln \underline{u}$ is the CEO's reservation utility. Utility is now multiplicative in effort and cash; Edmans, Gabaix and Landier (2009) show that multiplicative preferences are necessary to generate empirically consistent predictions for the scaling of various measures of CEO incentives with firm size. Thus, the ability to drop the HM assumption of $v(c) = c$ becomes valuable. Applying Theorem 1 with $T = 1$ for simplicity, the optimal contract becomes

$$\ln c = g'(a^*)\, r + K. \quad (19)$$

The contract prescribes the percentage change in CEO pay for a percentage change in firm value, i.e. “percent-percent” incentives; this slope is independent of the utility function $U$ and the noise distribution. Murphy (1999) advocated this elasticity measure over alternative incentive measures (such as “dollar-percent” incentives) on two empirical grounds: it is invariant to firm size, and firm returns have much greater explanatory power for percentage than dollar changes in pay. However, he notes that “elasticities have no corresponding agency-theoretic interpretation.” The above analysis shows that elasticities are the theoretically justified measure under multiplicative preferences, for any utility function. This result extends Edmans et al., who advocated “percent-percent” incentives in a risk-neutral, one-period model.
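The difference between “percent-percent” and “dollar-percent” incentives under contract (19) can be illustrated with a short sketch. The slope value and the constants below are hypothetical numbers chosen only for illustration; `slope` stands in for $g'(a^*)$.

```python
import math


def log_linear_pay(r, slope, K):
    """Contract (19): ln c = slope * r + K, where slope stands in for g'(a*)."""
    return math.exp(slope * r + K)


def pay_elasticity(r, slope, K, dr=0.01):
    """Change in ln(pay) per unit change in the log return r (percent-percent incentives)."""
    hi = math.log(log_linear_pay(r + dr, slope, K))
    lo = math.log(log_linear_pay(r, slope, K))
    return (hi - lo) / dr


def dollar_slope(r, slope, K):
    """Derivative of pay with respect to r (dollar-percent incentives): slope * c."""
    return slope * log_linear_pay(r, slope, K)


if __name__ == "__main__":
    # Hypothetical values: slope 0.5, two pay levels generated by different constants K.
    for K in (0.0, 2.0):
        print(K, pay_elasticity(0.1, 0.5, K), dollar_slope(0.1, 0.5, K))
```

In this sketch the elasticity equals the slope whatever the level of pay, whereas the dollar slope scales with pay, which is consistent with the invariance property attributed to the elasticity measure above.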

2.2 Continuous Time

This section shows that the contract has the same tractable form in continuous time, where actions and noise are simultaneous. This consistency suggests that, if reality is continuous time, it is best approximated in discrete time by modeling noise before effort in each period. At every instant $t$, the agent takes action $a_t$ and the principal observes signal $r_t$, where

$$r_t = \int_0^t a_s\, ds + \eta_t, \quad (20)$$

$\eta_t = \int_0^t \sigma_s\, dZ_s + \int_0^t \mu_s\, ds$, $Z_t$ is a standard Brownian motion, and $\sigma_t > 0$ and $\mu_t$ are deterministic. The agent's utility function is:

$$E\left[u\left(v(c) - \int_0^T g(a_t)\, dt\right)\right]. \quad (21)$$

The principal observes the path of $(r_t)_{t\in[0,T]}$ and wishes to implement a deterministic action $(a_t^*)_{t\in[0,T]}$ at each instant. She solves Program 1 with utility function (21). The optimal contract is of the same tractable form as Theorem 1.

Theorem 2 (Optimal contract, continuous time). The following contract is optimal. The agent is paid

$$c = v^{-1}\left( \int_0^T g'(a_t^*)\, dr_t + K \right), \quad (22)$$

where $K$ is a constant that makes the participation constraint bind ($E\left[u\left(\int_0^T g'(a_t^*)\, dr_t + K - \int_0^T g(a_t^*)\, dt\right)\right] = \underline{u}$). In particular, if the target action is time-independent ($a_t^* = a^*$ $\forall t$), the contract

$$c = v^{-1}\left( g'(a^*)\, r_T + K \right) \quad (23)$$

is optimal.

Proof. See Appendix.

To highlight the link with the discrete time case, consider the model of Section 2.1 and define $r = \sum_{t=1}^{T} r_t = \sum_{t=1}^{T} a_t + \sum_{t=1}^{T} \eta_t$. Taking the continuous time limit of Theorem 1 gives Theorem 2.
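The link between (22) and (23) can be sketched numerically. The code below is an illustrative discretization, not part of the paper: it assumes constant $a^*$, $\sigma$ and $\mu$ (the names `a_star`, `sigma`, `mu` and the Euler scheme are choices made here), simulates one path of the signal in (20), and approximates the integral $\int_0^T g'(a_t^*)\, dr_t$ by a left-point sum.

```python
import random


def simulate_signal_path(a_star, sigma, mu, n_steps, T=1.0, seed=0):
    """Euler discretization of r_t = int_0^t a_s ds + eta_t with constant a, sigma, mu (assumed)."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [0.0]
    for _ in range(n_steps):
        dZ = rng.gauss(0.0, dt ** 0.5)  # Brownian increment over one step
        path.append(path[-1] + a_star * dt + mu * dt + sigma * dZ)
    return path


def contract_index(path, slope_fn, T=1.0):
    """Left-point sum approximating int_0^T g'(a_t^*) dr_t, with slope_fn(t) = g'(a_t^*)."""
    n = len(path) - 1
    dt = T / n
    return sum(slope_fn(i * dt) * (path[i + 1] - path[i]) for i in range(n))


if __name__ == "__main__":
    path = simulate_signal_path(a_star=1.0, sigma=0.2, mu=0.0, n_steps=1000)
    constant_slope = 2.0  # hypothetical value of g'(a*)
    print(contract_index(path, lambda t: constant_slope), constant_slope * path[-1])
```

With a constant slope the sum telescopes, so the index depends on the path only through the terminal signal $r_T$, as in (23); with a time-varying slope the whole path matters, as in (22).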

2.3 Discussion: What is Necessary for Tractable Contracts?

The framework considered thus far shows that tractable implementation contracts can be achieved without requiring exponential utility, a pecuniary cost of effort, continuous time or Gaussian noise. However, it has still imposed a number of restrictions. We now discuss the features that are essential for our contract structure, inessential features that we have already relaxed in extensions, and additional assumptions which may be relaxable in future research.

1. Timing of noise. This assumption is central to the intuition of attaining simple contracts as it restricts the principal's flexibility. Remark 1 states that, if $a_t$ precedes $\eta_t$, contract (7) still implements $(a_t^*)_{t=1,\ldots,T}$. However, we can no longer show that it is optimal.

2. Risk-neutral principal. The full proof of Theorem 1 extends the model to the case of a risk-averse principal. If the principal wishes to minimize $E[w(c)]$ (where $w$ is an increasing function) rather than $E[c]$, then contract (7) is optimal if $u\left(v(w^{-1}(\cdot)) - \sum_t g(a_t^*)\right)$ is concave. This holds if, loosely speaking, the principal is not too risk-averse.

3. NIARA utility, independent and log-concave noise. Proposition 1 states that, if $a_t^* < \bar{a}$ $\forall t$ and deterministic contracts are assumed, (7) is the only incentive-compatible contract. Therefore, these assumptions are not required. Allowing for $a_t^* = \bar{a}$ and stochastic contracts, these assumptions are sufficient but may not be necessary.

4. Unidimensional noise and action. Appendix D shows that our model is readily extendable to settings where the action $a$ and the noise $\eta$ are multidimensional. A close analog to our result obtains.

5. Linear signal, $r_t = a_t + \eta_t$. Remark 2 in Section 3.1 later shows that with general signals $r_t = R(a_t, \eta_t)$, the optimal contract remains tractable and its functional form remains independent of $u$, $\underline{u}$ and the distribution of $\eta$.

6. Timing of consumption. The current setup assumes that the agent only consumes at the end of period $T$. In Edmans, Gabaix, Sadzik and Sannikov (2009), we develop the analog of Theorem 1 where the agent consumes in each period, for the case of $v(c) = \ln c$ and a CRRA utility function. The contract remains tractable.

7. Renegotiation. Since the target effort path is fixed, there is no scope for renegotiation after the agent observes the noise. In Section 3.1, the optimal action may depend on $\eta$. Since the contract specifies an optimal action for every realization of $\eta$, again there is no incentive to renegotiate.

3 The Optimal Effort Level

The analysis has thus far focused on the optimal implementation of a given path of effort levels $(a_t^*)$. In Section 3.1 we allow the target effort level to depend on the current period noise. Section 3.2 derives conditions under which the principal wishes to implement the maximum productive effort level for all noise realizations (the “maximum effort principle”). Section 3.3 allows the principal to choose the maximum productive effort level according to the environment.

3.1 Contingent Target Actions

Let $A_t(\eta_t)$ denote the “action function”, which defines the target action for each noise realization. (Thus far, we have assumed $A_t(\eta_t) = a_t^*$.) Since different noises $\eta_t$ may lead to the same observed signal $r_t = A_t(\eta_t) + \eta_t$, the analysis must consider revelation mechanisms. If the agent announces noises $\hat\eta_1,\ldots,\hat\eta_T$, he is paid $c = C(\hat\eta_1,\ldots,\hat\eta_T)$ if the observed signals are $A_1(\hat\eta_1) + \hat\eta_1,\ldots,A_T(\hat\eta_T) + \hat\eta_T$, and a very low amount $\underline{c}$ otherwise.

As in the core model, we assume that $A_t(\eta_t) > \underline{a}$ $\forall \eta_t$, else a flat contract would be optimal for some noise realizations. We also assume that the signal $A_t(\eta_t) + \eta_t$ is nondecreasing in $\eta_t$: otherwise, as the proof of Proposition 2 shows, the action function cannot be implemented. If a higher noise corresponds to a significantly lower action, the agent would over-report the noise and exert less effort. We make three additional technical assumptions: the action space $\mathcal{A}$ is open, $A_t(\eta_t)$ is bounded within any compact subinterval of the support of $\eta_t$, and $A_t(\eta_t)$ is almost everywhere continuous. The final assumption still allows for a countable number of jumps in $A_t(\eta_t)$. Given the complexity and length of the proof that randomized contracts are inferior in Theorem 1, we now restrict the analysis to deterministic contracts and assume $A_t(\eta_t) < \bar{a}$. We conjecture that the same arguments in that proof continue to apply with a noise-dependent target action.

The optimal contract induces both the target effort level ($a_t = A_t(\eta_t)$) and truth-telling ($\hat\eta_t = \eta_t$). It is given by the next Proposition:

Proposition 2 (Optimal contract, noise-dependent action). A series of contingent actions $(A_t(\eta_t))_{t=1,\ldots,T}$ can be implemented if and only if for all $t$, $A_t(\eta_t) + \eta_t$ is nondecreasing in $\eta_t$. If that condition is verified, the following contract is optimal. For each $t$, after noise $\eta_t$ is realized, the agent communicates a value $\hat\eta_t$ to the principal. If the subsequent signal is not $A_t(\hat\eta_t) + \hat\eta_t$ in each period, he is paid a very low amount $\underline{c}$. Otherwise he is paid $C(\hat\eta_1,\ldots,\hat\eta_T)$, where

$$C(\eta_1,\ldots,\eta_T) = v^{-1}\left( \sum_{t=1}^{T} g(A_t(\eta_t)) + \sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} g'(A_t(x))\, dx + K \right), \quad (24)$$

$\eta_*$ is an arbitrary constant, and $K$ is a constant that makes the participation constraint bind ($E\left[u\left(\sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} g'(A_t(x))\, dx + K\right)\right] = \underline{u}$).

Proof. (Heuristic). The Appendix presents a rigorous proof that does not assume differentiability of $V$ and $A$. Here, we give a heuristic proof that conveys the essence of the result using first-order conditions. We set $T = 1$ and drop the time subscript. Instead of reporting $\eta$, the agent could report $\hat\eta \neq \eta$, in which case he receives $\underline{c}$ unless $r = A(\hat\eta) + \hat\eta$. Therefore, he must take action $a$ such that $\eta + a = \hat\eta + A(\hat\eta)$, i.e. $a = A(\hat\eta) + \hat\eta - \eta$. In this case, his utility is $V(\hat\eta) - g(A(\hat\eta) + \hat\eta - \eta)$. The truth-telling constraint is thus:

$$\eta \in \arg\max_{\hat\eta}\; V(\hat\eta) - g\left(A(\hat\eta) + \hat\eta - \eta\right).$$

The first-order condition is

$$V'(\eta) = g'(A(\eta))\, A'(\eta) + g'(A(\eta)).$$

Integrating over $\eta$ gives the indirect felicity function

$$V(\eta) = g(A(\eta)) + \int_{\eta_*}^{\eta} g'(A(x))\, dx + K$$

for constants $\eta_*$ and $K$. The associated pay is given by (24).

The contract in Proposition 2 remains in closed form and its functional form does not depend on $u$, $\underline{u}$ nor the distribution of $\eta$.^16 However, it is somewhat more complex than the contracts in Section 2, as it involves calculating an integral. In the particular case where $A(\eta) = a^*$ $\forall \eta$, Proposition 2 reduces to Theorem 1.

^16 Even though (24) features an integral over the support of $\eta$, it does not involve the distribution of $\eta$.

Remark 2 (Extension of Proposition 2 to general signals). Suppose the signal is a general function $r_t = R(a_t, \eta_t)$, where $R$ is differentiable and has positive derivatives in both arguments, $R_1(a,\eta)/R_2(a,\eta)$ is nondecreasing in $a$, and $R(A_t(\eta_t), \eta_t)$ is nondecreasing in $\eta_t$. The same analysis as in Proposition 2 derives the following contract as optimal:

$$C(\eta_1,\ldots,\eta_T) = v^{-1}\left( \sum_{t=1}^{T} g(A_t(\eta_t)) + \sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} \frac{R_2(A_t(x), x)}{R_1(A_t(x), x)}\, g'(A_t(x))\, dx + K \right), \quad (25)$$

where $\eta_*$ is an arbitrary constant and $K$ is a constant that makes the participation constraint bind.

The heuristic proof is as follows (setting $T = 1$ and dropping the time subscript). If $\eta$ is observed and the agent reports $\hat\eta \neq \eta$, he has to take action $a$ such that $R(a, \eta) = R(A(\hat\eta), \hat\eta)$. Taking the derivative at $\hat\eta = \eta$ yields $R_1\, \partial a/\partial \hat\eta = R_1 A'(\eta) + R_2$. The agent solves $\max_{\hat\eta} V(\hat\eta) - g(a(\hat\eta))$, with first-order condition $V'(\eta) - g'(A(\eta))\, \partial a/\partial \hat\eta = 0$. Substituting for $\partial a/\partial \hat\eta$ from above and integrating over $\eta$ yields (25).
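The truth-telling property behind (24) can be checked on a simple case. The sketch below is illustrative only and rests on assumptions made here: $T = 1$, the linear signal $r = a + \eta$, a hypothetical action function $A(\eta) = 1 + 0.5\eta$ (so that $A(\eta) + \eta$ is increasing), quadratic cost $g(a) = a^2/2$, and constants $\eta_* = 0$ and $K = 0$. It searches a grid for the report that maximizes the agent's payoff.

```python
def A(eta):
    """Hypothetical action function; A(eta) + eta = 1 + 1.5 * eta is increasing in eta."""
    return 1.0 + 0.5 * eta


def g(a):
    """Hypothetical quadratic cost of effort, so that g'(a) = a."""
    return 0.5 * a * a


def V(eta, K=0.0, eta_star=0.0):
    """Indirect felicity g(A(eta)) + int_{eta_star}^{eta} g'(A(x)) dx + K from the heuristic proof."""
    # Closed form of the integral of A(x) = 1 + 0.5 x for the assumed A and g.
    integral = (eta + 0.25 * eta ** 2) - (eta_star + 0.25 * eta_star ** 2)
    return g(A(eta)) + integral + K


def payoff_from_report(report, eta):
    """Agent's felicity net of effort cost when the true noise is eta and he reports `report`."""
    required_action = A(report) + report - eta  # action needed to deliver the promised signal
    return V(report) - g(required_action)


def best_report(eta, lo=-3.0, hi=3.0, n=6001):
    """Grid search for the payoff-maximizing report."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(grid, key=lambda b: payoff_from_report(b, eta))


if __name__ == "__main__":
    for eta in (-1.0, 0.0, 0.5):
        print(eta, best_report(eta))
```

For these assumed functions the payoff is a concave quadratic in the report that peaks at the true noise, so the agent reports truthfully and takes the target action $A(\eta)$.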

3.2 Maximum Effort Principle

We now consider the optimal action function $A(\eta)$, specializing to $T = 1$ for simplicity and dropping the time index. The principal chooses $A(\eta)$ to maximize

$$\max_{\{a(\eta)\}} \int b(a(\eta), \eta)\, f(\eta)\, d\eta - C[A]. \quad (26)$$

The …rst term represents the productivity of e¤ort, where a ( ) = min A ( ) ; a and a < a is the maximum productive e¤ort level. The min A ( ) ; a function conveys the fact that, while the action space may be unbounded (a may be in…nite), there is a limit to the number of productive activities the agent can undertake to bene…t the principal. For example, in a cash ‡ow diversion model, a re‡ects zero stealing; in an e¤ort model, there is a limit to the number of hours a day the agent can work while remaining productive. In a project selection model, there is a limit to the number of positive-NPV projects available; a re‡ects taking all of these projects while rejecting negative-NPV projects. In addition to being economically realistic, this assumption is useful technically as it prevents the optimal action from being in…nite. Actions a > a do not bene…t the principal, but improve the signal: one interpretation is manipulation (see Appendix C for further details). Clearly, the principal will never wish to implement a > a. For brevity, we use “maximum e¤ort” to refer to maximum productive e¤ort a. b( ) is the productivity function of e¤ort which is di¤erentiable with respect to a ( ). f ( ) is the density of , assumed to be …nite. The second term, C [A], is the expected cost of the contract required to implement A ( ) (we suppress the dependence on for brevity). We assume that g is strictly convex, and that g (g 0 ) 1 and g 0 are convex; this assumption is satis…ed for many standard cost functions, e.g. g (a) = Ga2 and g (a) = eGa for G > 0. The following Proposition bounds the di¤erence in the costs of the contract implementing maximum


e¤ort, and an arbitrary contract:17 Proposition 3 (Bound on di¤erence in costs.) There exists a function all plans fa ( )g where 8 ; a ( ) a, h i C A

C [A]

Z

a;

a

a;

such that, for

(27)

a( ) d :

Proof. See Appendix. The next Theorem gives conditions under which maximum e¤ort is optimal. Theorem 3 (Maximum e¤ort principle). Assume that 8 ; 8a a, @1 b (a; ) f ( ) a; , i.e. the marginal bene…t of e¤ort is su¢ ciently large. Then, the optimal plan is to implement maximum e¤ort, A ( ) = a. Proof. For any plan, Z

b a;

Z

b (a ( ) ; ) f ( ) d

Z

inf @1 b (a; ) a a

a( ) f ( )d

a; a a( ) d h i C [A] C A by Proposition 3. Hence, Z

b a;

f ( )d

h i C A

Z

b (a ( ) ; ) f ( ) d

C [A]

i.e., the principal’s objective is maximized by inducing maximum e¤ort. Theorem 3 above shows that, if the marginal bene…t of e¤ort is su¢ ciently greater than the marginal cost, than maximum e¤ort is optimal. A su¢ cient (although unnecessary) condition is for the …rm to be su¢ ciently large. To demonstrate this, we parameterize the b function by b (a; ) = Sb (a; ), where S is the baseline value of the output under the agent’s control. For example, if the agent is a CEO, S is …rm size; if he is a divisional manager, S is the size of his division. We will refer to S as …rm size for brevity. Under this speci…cation, the bene…t of e¤ort is multiplicative in …rm size. This is plausible for most agent actions, which can be “rolled out” across the whole company and thus have a greater e¤ect in a larger …rm. Examples include the h i The proof shows that we can take a; = max @C A [email protected] ( ) ; 0 . We use partial derivatives such @C [A] [email protected] ( ). Their meaning is traditional and is as follows. Under weak conditions, C [ ] is di¤erentiable A, in the sense that there is a function ( ) (unique up to sets of measure 0) such that, for any fB ( )g, R limh!0 (C [A + hB] C [A]) =h = ( ) B ( ) d . Then, we de…ne @C [A] [email protected] ( ) = ( ). 17


choice of strategy, the launch of new projects, or increasing production e¢ ciency.18 Let F denote the complementary cumulative distribution function of , i.e. F (x) = Pr ( x). We assume that sup F ( ) =f ( ) < 1 and inf @1 b a; > 0, and de…ne: S =

a inf @1 b

a;

;

a

g 0 (a) + g 00 (a) sup v0 v

1

u 1 (u) + g(a) + (

F( ) f( )

)g 0 (a)

:

(28)

Calculations in the Online Appendix show that if, S > S , i.e. the …rm is su¢ ciently large, then it is optimal for the principal to induce maximum e¤ort. Indeed, in Proposition 3 we can a; = a f ( ): take a contains the two costs The intuition for the above is as follows. The numerator of of inducing higher e¤ort – the disutility imposed on the agent (the …rst term) plus the risk imposed by the incentive contract required to implement e¤ort (the second term). These are scaled by the denominator, where the term in brackets is an upper bound on the pay received by the agent. The costs of e¤ort are thus of similar order of magnitude to the agent’s pay. The bene…t of e¤ort is enhanced …rm value and thus of similar order of magnitude to …rm size. If the …rm is su¢ ciently large (S > S ), the bene…ts of e¤ort outweigh the costs and so maximum productive e¤ort is optimal. A simple numerical example illustrates. Consider a …rm with a $10b market value and, to be conservative, assume that maximum e¤ort increases …rm value by only 1%. Then, maximum e¤ort creates $100m of value, which vastly outweighs the agent’s salary. Even if it is necessary to double the agent’s salary to compensate him for the costs of increased e¤ort, this is swamped by the bene…ts. The comparative statics on the threshold …rm size S are intuitive. First, S is increasing in noise dispersion, because the …rm must be large enough for maximum e¤ort to be optimal for all noise realizations. Indeed, a rise in increases u 1 (u) + g(a) + ( )g 0 (a), lowers , and raises sup F =f . (For example, if the noise is uniform, then sup F =f = .) Second, it is increasing in the agent’s risk aversion parameterized by v and thus the risk imposed by incentives. Third, it is increasing in the disutility of e¤ort, and thus the marginal cost of e¤ort g 0 a and the convexity of the cost function g 00 (a). Fourth, it is decreasing in the marginal bene…t of e¤ort (inf @1 b a; ). 
Thus, the maximum effort principle is especially likely to hold if noise, risk aversion and the cost of effort are small. We conjecture that a “maximum effort principle” holds under more general conditions than those considered above. For instance, it likely continues to hold if the principal's objective function is $\max_{\{a(\eta)\}} \int b(A(\eta), \eta)\, f(\eta)\, d\eta - C[A]$ and the action space is bounded above by the maximum productive effort level, i.e. the maximum feasible effort level equals the maximum productive effort level. This slight variant is economically very similar, since the principal never wishes to implement an action above the maximum productive effort level in our setting, but substantially more complicated mathematically, because the agent's action space now has boundaries and so the incentive constraints become inequalities. We leave this extension to future research. Hellwig (2007) shows that this reason alone is sufficient for a boundary effort level to be always optimal in a multiperiod discrete model and a continuous-time model that can be approximated by a discrete-time model, even in the absence of the condition on the benefit of effort featured in this paper. Since the incentive constraints are inequalities with a boundary effort level, the principal has greater freedom in choosing the contract, which allows her to select a cheaper contract. Thus, the maximum effort result holds in settings even without a large benefit of effort.

Lacker and Weinberg (1989) similarly derive a condition under which maximum effort (zero diversion in their setting) is optimal, for the case $v(c) = c$. In DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007), zero diversion is optimal since the agent is risk-neutral and so there is no trade-off between risk and incentives. Edmans, Gabaix, Sadzik and Sannikov (2009) extend the maximum effort principle to general $T$, for the case where $v(c) = \ln c$ (multiplicative preferences) and $u$ is CRRA.

In the full contracting problem, which solves for both the optimal effort level and the cheapest implementing contract, tractable contracts are attained by forcing a constant incentive slope on the agent to rule out the ambiguous reward for performance. This is achieved in our paper through two key mechanisms. First, we achieve a constant marginal cost of effort by implementing a constant target action. This requires the removal of dynamics so that the action that the principal wishes to implement is independent of prior period outcomes.

^18 Bennedsen, Perez-Gonzalez and Wolfenzon (2009) provide empirical evidence that CEOs have the same percentage effect on firm value, regardless of firm size; Edmans, Gabaix and Landier (2009) show that a multiplicative production function is necessary to generate empirically consistent predictions for the scaling of various measures of incentives with firm size.
Previous papers remove dynamics via removing wealth e¤ects, so that the cost of implementing a given action is constant. For example, HM assume CARA utility and a pecuniary cost of e¤ort, so that wealth has no e¤ect on the agent’s risk aversion, and has an identical e¤ect on the felicity from cash and cost of e¤ort. DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007) assume risk-neutrality, so that risk aversion is independent of wealth (it is always zero) and the marginal utility of money is constant. The key insight of this paper is that we can remove dynamics without removing wealth e¤ects, and thus without imposing constraints on the utility function or the cost of e¤ort. Speci…cally, a constant target action need not require the cost of implementing the action to be constant – it only requires changes in these costs to be small compared to the bene…ts of e¤ort. If the bene…ts of e¤ort are su¢ ciently large (e.g. the …rm is big), maximum e¤ort remains optimal regardless of how the cost of implementing e¤ort changes over time. Thus, our formulation allows for wealth e¤ects to exist (and thus the utility function to be unrestricted), while at the same time removing dynamics and thus achieving tractability because such e¤ects are small. The main limitation of our setup is that, in order to relax the HM assumptions, we require a restriction on the bene…t of e¤ort for Theorem 3 to hold. Second, our timing assumption forces the constant marginal cost of e¤ort (which is a consequence of the constant action) to equal the marginal felicity from cash state-by-state, and 23

thus requires the reward for performance to be the same after every noise realization. In sum, the paper provides a set of su¢ cient conditions under which simple contracts can be obtained –actions following noise and a large bene…t of e¤ort –which is quite di¤erent than considered in prior literature. They may therefore hold in settings where the alternative assumptions are not satis…ed and tractability was previously believed to be unattainable. Appendix E considers other su¢ cient conditions required for Proposition 3 to hold, which do not assume the bene…t of e¤ort is multiplicative in …rm size. That section also shows that we can derive the optimal fA ( )g in certain cases even where the maximum e¤ort principle does not apply.

3.3 Determinants of the Maximum Effort Level

The previous section assumed that the maximum productive effort level $\bar{a}$ is exogenous. This section allows the principal to choose it endogenously according to the environment. We extend the contracting game to two stages. In the first stage, the principal chooses $\bar{a}$. In practice, this may be achieved by physical investment, training the agent, or organizational design. For example, building a larger plant gives the agent greater scope to add value; training the agent or choosing an organizational structure that gives him greater responsibility and freedom have the same effect. Since physical investment, training and organizational design are costly to reverse, we model this decision as irreversible. In the second stage, the game studied in the core model is played out. In this stage, the action $a$ may respond to the noise $\eta$, but the maximum productive effort $\bar{a}$ has been fixed. The principal's payoff is:

\[ \int b\left(\min\left(A(\eta), \bar{a}\right), \eta, \bar{a}\right) d\eta - C[A] \tag{29} \]

where $b(a, \eta, \bar{a})$ is weakly increasing in $a$ and decreasing in $\bar{a}$. Higher flexibility $\bar{a}$ is costly to the principal; for instance, we could have $b(a, \eta, \bar{a}) = b(a, \eta) - H(\bar{a})$, where $H(\bar{a})$ is the cost of implementing flexibility level $\bar{a}$.

Before we state the result formally, we summarize it. Under conditions described below, in the second stage, the principal will wish to implement the contract in Theorem 1 with $a^* = \bar{a}$, i.e. the maximum effort principle applies. In the first stage, when choosing $\bar{a}$, she will trade off the costs and benefits of a higher maximum effort. For instance, in the examples at the end of this section, $\bar{a}$ is decreasing in the agent's disutility of effort and the noise dispersion.

A trade-off exists in the first stage because the costs and benefits of flexibility are of similar order of magnitude. For example, increasing plant size has a continuous effect on firm value and involves a significant cost, which is also a function of firm size. However, it does not exist in the second stage because the costs of effort are a function of the agent's salary, and the benefits are discontinuous. Once the plant has been built, the agent must run it fully efficiently to prevent significant value loss: even small imperfections will cause large reductions in value and so the marginal benefit of effort is high (analogous to Kremer's (1993) O-ring theory). Thus, this enriched game features a simple optimal contract (since the target action in the second stage is constant), but one which also responds to the comparative statics of the environment. It may thus be a potentially useful way of modeling various economic problems, to achieve tractability while at the same time generating comparative statics.

To proceed more formally, consider the two following problems.

Problem 1: maximize over $\bar{a}$ and all unrestricted contracts:

\[ \max_{\bar{a}, \{a(\eta)\}} E\left[ b\left(\min\left(A(\eta), \bar{a}\right), \eta, \bar{a}\right) \right] - C[A]. \]

Problem 2: maximize over $\bar{a}$ and use the contract in Theorem 1 which implements $\bar{a}$:

\[ \max_{\bar{a}} E\left[ b\left(\bar{a}, \eta, \bar{a}\right) \right] - C[\bar{a}]. \]

where $C[\bar{a}]$ is the expected cost of the contract implementing a constant action $\bar{a}$. Problem 2 optimizes over only a scalar $\bar{a}$, while Problem 1 optimizes over a whole continuum of contracts, including those that do not implement maximum effort. However, under some simple conditions, Problem 2 is not restrictive: both problems have the same solution.

Proposition 4 (Maximum effort in two-stage game). Let $\bar{a}^*$ denote the value of $\bar{a}$ in a solution to Problem 1, and assume that $\bar{\bar{a}} > \bar{a}^*$ and that $\forall \eta$, $\inf_a \partial_1 b(a, \eta, \bar{a}^*) f(\eta) \ge \Lambda(\bar{a}^*, \eta)$. Then, the solution of Problem 1 is the same solution as Problem 2: that is, the solution of the problem that implements $A(\eta) = \bar{a}$ is also the solution of the unrestricted contract.

Proof. Immediate given Theorem 3. At $\bar{a}^*$, the principal wants to implement maximum effort, i.e. $a(\eta) = \bar{a}^*$ for all $\eta$.

At first glance, the condition in Proposition 4 may appear restrictive, since verifying it requires solving Problem 1. However, sufficient conditions are simply $\inf_a \partial_1 b(a, \eta, \bar{a}) f(\eta) \ge \Lambda(\bar{a}, \eta)$ for all $\bar{a}$ and $\eta$. The value $\Lambda$ can be calculated up to an integral, so bounds are reasonably straightforward to check in a given setting.

Illustrations

We now illustrate the contract and comparative statics in three examples, for specific cases of $u$ and $v$. We define $B(\bar{a}) = E[b(\bar{a}, \eta, \bar{a})]$, the principal's expected payoff given target effort $\bar{a}$. The optimal contract gives $c = v^{-1}(g'(\bar{a}) \eta + k)$ where $k$ satisfies $E[u(g'(\bar{a}) \eta - g(\bar{a}) + k)] = \underline{u}$. Using previous notation, $k = K + g'(\bar{a}) \bar{a}$. The expected cost of the contract is $C[\bar{a}] \equiv E[c(r)] = E[v^{-1}(g'(\bar{a}) \eta + k)]$. It is straightforward to show that $C[\bar{a}]$ increases in target effort $\bar{a}$, the agent's reservation utility $\underline{u}$, and the dispersion of noise $\eta$; the proof relies on the dispersion techniques used in this paper.


The principal's problem is:

\[ \max_{\bar{a}} B(\bar{a}) - C[\bar{a}] \tag{30} \]

and the optimal contract is the contract described in Theorem 1 implementing a constant $\bar{a}$. This is a simple problem to solve in many applied settings.

Example 1. Consider $u(x) = x$, $v(x) = x^{\gamma}$, $\gamma \in (0, 1]$. We have $k = g(\bar{a}) + \underline{u}$, and the contract is $c(r) = \left(g'(\bar{a})(r - \bar{a}) + g(\bar{a}) + \underline{u}\right)^{1/\gamma}$. The expected cost is19

\[ C[\bar{a}] = E\left[ \left( g'(\bar{a}) \eta + g(\bar{a}) + \underline{u} \right)^{1/\gamma} \right]. \]

$C[\bar{a}]$ can be obtained in closed form for various specific cases. For example, $\gamma = 1/2$ yields $C[\bar{a}] = g'(\bar{a})^2 \sigma^2 + (g(\bar{a}) + \underline{u})^2$; $\underline{u} = 0$ and $g(a) = e^{Ga}$ yields $C[\bar{a}] = e^{G\bar{a}/\gamma} E\left[(G\eta + 1)^{1/\gamma}\right]$. The principal chooses $\bar{a}$ to maximize $B(\bar{a}) - e^{G\bar{a}/\gamma} E\left[(G\eta + 1)^{1/\gamma}\right]$. Simple calculations show that the target action is decreasing in the marginal cost of effort $G$, risk aversion and the dispersion of noise $\sigma$.

Example 2. Consider $v(x) = \ln x$ and $u(x) = e^{(1-\gamma)x}/(1-\gamma)$ for $\gamma > 1$, so that the utility function is $\left(c e^{-g(a)}\right)^{1-\gamma}/(1-\gamma)$, as is commonly used in macroeconomics: it is CRRA and multiplicative in consumption and effort. We also assume $\eta \sim N(0, \sigma^2)$. Then, the contract is $c(r) = \exp\left(g'(\bar{a})(r - \bar{a}) + k\right)$ with $k = \ln \underline{c} + g(\bar{a}) - (1-\gamma) g'(\bar{a})^2 \sigma^2/2$, where $u(\ln \underline{c})$ is the reservation utility. The expected cost of the contract is:

\[ C[\bar{a}] = \underline{c} \exp\left( g(\bar{a}) + \gamma g'(\bar{a})^2 \sigma^2 / 2 \right). \]
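The closed form in Example 2 can be sanity-checked numerically. The sketch below is only an illustration: the quadratic cost $g(a) = Ga^2/2$, the parameter values, and the helper names (`normal_expectation`, `example2_quantities`) are assumptions introduced here, not part of the paper. It integrates against the Gaussian density with a plain trapezoid rule and compares the result with $\underline{c} \exp(g(\bar{a}) + \gamma g'(\bar{a})^2 \sigma^2/2)$, and also checks that the participation constraint binds.

```python
import math


def normal_expectation(fn, sigma, width=12.0, steps=20_000):
    """Trapezoid approximation of E[fn(eta)] for eta ~ N(0, sigma^2)."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(steps + 1):
        eta = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * (eta / sigma) ** 2) / norm
        total += weight * fn(eta) * density
    return total * h


def example2_quantities(a_bar, G, gamma, sigma, c_res):
    """Return (numeric cost, closed-form cost, numeric utility, reservation utility).

    Assumes the hypothetical cost g(a) = G a^2 / 2; Example 2 itself leaves g general.
    """
    g = 0.5 * G * a_bar ** 2
    g_prime = G * a_bar
    # Constant k from Example 2: k = ln(c_res) + g - (1 - gamma) g'^2 sigma^2 / 2.
    k = math.log(c_res) + g - (1.0 - gamma) * g_prime ** 2 * sigma ** 2 / 2.0
    u = lambda x: math.exp((1.0 - gamma) * x) / (1.0 - gamma)
    cost_numeric = normal_expectation(lambda eta: math.exp(g_prime * eta + k), sigma)
    cost_closed = c_res * math.exp(g + gamma * g_prime ** 2 * sigma ** 2 / 2.0)
    utility_numeric = normal_expectation(lambda eta: u(g_prime * eta + k - g), sigma)
    return cost_numeric, cost_closed, utility_numeric, u(math.log(c_res))


# Illustrative parameter values only.
cost_numeric, cost_closed, utility_numeric, utility_reservation = example2_quantities(
    a_bar=0.8, G=1.0, gamma=2.0, sigma=0.3, c_res=1.5
)
print(cost_numeric, cost_closed)
print(utility_numeric, utility_reservation)
```

Under these assumed values the two costs, and the two utilities, should agree up to integration error.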

Again, calculations show that $\bar{a}$ is decreasing in the cost of effort, risk aversion and noise dispersion. We thus obtain the standard comparative statics, but for a contract that is log-linear, rather than linear in returns. Murphy (1999) argues that log-linear contracts are empirically more relevant.

Example 3. Consider $v(x) = x$, $g(a) = \frac{1}{2} G a^2$, $u(x) = -e^{-\gamma x}$ with $G, \gamma > 0$, and $\eta \sim N(0, \sigma^2)$ as in HM. The cost of the contract is $C[\bar{a}] = \underline{c} + g(\bar{a}) + \gamma g'(\bar{a})^2 \sigma^2/2$, and the same three comparative statics hold.

Note that HM not only have a constant target action, but an additive effect of effort. We can obtain this result with $b(a, \eta, \bar{a}) = \bar{a} + (a - \bar{a}) \phi(\bar{a}, \eta)$, for some function $\phi(\bar{a}, \eta) \ge \Lambda(\bar{a}, \eta)/f(\eta)$. In the second stage of the game, having chosen $\bar{a}$, the principal wishes to implement constant effort $\bar{a}$ for all $\eta$, because the marginal cost of shirking (parameterized by $\phi$) is sufficiently high. Moving to the first stage, since the principal knows that $a = \bar{a}$ in the second stage, her benefit function is $b(\bar{a}, \eta, \bar{a}) = \bar{a}$: effort has an additive effect.

The key complication in obtaining the HM result is reconciling the linear marginal benefit of effort required for an additive effect, with the high marginal benefit of effort required for the maximum effort principle to apply to guarantee a constant action. The two-stage game resolves this tension because the marginal benefit of effort is moderate in the first stage and very high in the second stage, as discussed in the plant example earlier. Under this formulation, the cost of the contract implementing $a = \bar{a}$ is $C[\bar{a}] = \underline{c} + \frac{1}{2} G \bar{a}^2 + \gamma G^2 \bar{a}^2 \sigma^2/2$ and the principal maximizes $\bar{a} - \underline{c} - \frac{1}{2} G \bar{a}^2 - \gamma G^2 \bar{a}^2 \sigma^2/2$, which yields the result $\bar{a} = 1/\left(G(1 + \gamma G \sigma^2)\right)$, exactly as in HM. Thus, using the HM conditions of exponential utility, a pecuniary quadratic cost of effort and Gaussian noise in the above specification, leads to the same optimal contract (not just the implementation contract) as in HM.

In Appendix E, we also provide explicit conditions under which maximum effort is optimal for the three above examples, i.e. a specialization of the conditions in Proposition 4 to these cases. These conditions allow straightforward verification of whether the maximum effort principle holds.

19 A variant is the case $u(x) = x$ and $v(x) = \ln x$. Then, the contract is $\ln c(r) = g'(\bar{a})(r - \bar{a}) + g(\bar{a}) + \underline{u}$, and the expected cost is $C(\bar{a}) = \exp\left[g(\bar{a}) + \underline{u}\right] E\left[e^{g'(\bar{a}) \eta}\right]$.
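The first-stage optimum in Example 3 follows from a one-line first-order condition, and can also be confirmed by brute force. The following is a minimal sketch under assumed parameter values; the function names and the grid search are illustrative choices made here, not the paper's method. It maximizes $\bar{a} - C[\bar{a}]$ on a grid and compares the maximizer with $1/(G(1 + \gamma G \sigma^2))$.

```python
def principal_objective(a_bar, G, gamma, sigma, c_res=0.0):
    """a_bar - C[a_bar], with C[a_bar] = c_res + G a^2/2 + gamma G^2 a^2 sigma^2/2."""
    cost = c_res + 0.5 * G * a_bar ** 2 + 0.5 * gamma * G ** 2 * a_bar ** 2 * sigma ** 2
    return a_bar - cost


def closed_form_optimum(G, gamma, sigma):
    """Solution of the first-order condition 1 - G a - gamma G^2 sigma^2 a = 0."""
    return 1.0 / (G * (1.0 + gamma * G * sigma ** 2))


def grid_argmax(fn, lo, hi, steps=20_000):
    """Return the grid point in [lo, hi] at which fn is largest."""
    best_x, best_val = lo, fn(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        val = fn(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


# Illustrative parameter values only.
G, gamma, sigma = 2.0, 3.0, 0.5
a_closed = closed_form_optimum(G, gamma, sigma)
a_grid = grid_argmax(lambda a: principal_objective(a, G, gamma, sigma), 0.0, 1.0)
print(a_closed, a_grid)
```

Because the objective is a concave quadratic in $\bar{a}$, the grid maximizer can differ from the closed form only by the grid spacing.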

4 Conclusion

This paper has identified and analyzed a class of multiperiod situations in which the optimal contract is tractable, without requiring exponential utility, a pecuniary cost of effort, Gaussian noise or continuous time. The contract's functional form is independent of the agent's utility function, reservation utility and noise distribution. Furthermore, when the cost of effort can be expressed in financial terms, the optimal contract is linear and so the slope, in addition to the functional form, is independent of these parameters.

The key to tractability in discrete time is specifying the noise before the action in each period, which forces the incentive compatibility constraint to hold state-by-state rather than just on average, and tightly constrains the set of contracts available to the principal. The optimal contract is very similar in continuous time, where noise and actions occur simultaneously. Hence, if underlying reality is continuous time, it is best approximated in discrete time under our timing assumption.

Moving to the full contracting problem, our two-stage model allows the principal to choose the target effort level to respond to the details of the environment, while retaining tractability. The principal initially sets a lower maximum productive effort level if the agent is more risk averse or faces a higher cost of effort or greater noise. However, in each subsequent period, the principal wishes the agent to exert maximum effort, regardless of how output evolves. If the benefits of effort are sufficiently high (e.g. the firm is much larger than the agent's salary), they swamp the costs, and so the optimal effort level is independent of how the agent's wealth evolves over time.

Our paper suggests several avenues for future research. The HM framework has proven valuable in many areas of applied contract theory owing to its tractability; however, some models have used the HM result in settings where the assumptions are not satisfied (see the critique of Hemmer (2004)). Our framework allows tractable contracts to be achieved in such situations. In particular, our contracts are valid in situations where time is discrete, utility cannot be modeled as exponential (e.g. in calibrated models where it is necessary to capture decreasing absolute risk aversion), effort is non-pecuniary, or noise is not Gaussian (e.g. is bounded). While we considered the specific application of executive compensation, other possibilities include bank regulation, team production, insurance or taxation.20 In ongoing work (Edmans, Gabaix, Sadzik and Sannikov (2009)) we extend tractable contracts to a dynamic setting where the agent consumes in each period, can privately save, and may smooth earnings intertemporally.

In addition, while our model has relaxed a number of assumptions required for tractability, it continues to impose a number of restrictions. In particular, the optimal action can only be solved tractably if the maximum effort principle applies or in certain other cases (e.g. linear cost of effort). Grossman and Hart (1983) and Garrett and Pavan (2009) show that solving for the optimal action in a general case is typically extremely complex; whether we can extend tractability to broader settings is an important area for future research. Similarly, while Section 3 allows for the action to depend on the noise in period $t$, a useful extension would be to allow the action to depend on the full history of outcomes. Other restrictions are mostly technical rather than economic. For example, our multiperiod model assumes independent noises with log-concave density functions; and our extension to noise-dependent target actions assumes an open action set where the maximum feasible effort level exceeds the maximum productive effort level. Some of these assumptions may not be valid in certain situations, limiting the applicability of our framework. Further research may be able to broaden the current setup.

20 See Golosov, Kocherlakota and Tsyvinski (2003) and Farhi and Werning (2009) for taxation applications of the principal-agent problem.


Symbol              Description
$a$                 Effort (also referred to as "action")
$\bar{\bar{a}}$     Maximum effort
$\bar{a}$           Maximum productive effort
$a^*$               Target effort
$b$                 Benefit function for effort, defined over $a$
$c$                 Cash compensation, defined over $r$ or $\eta$
$f$                 Density of the noise distribution
$g$                 Cost of effort, defined over $a$
$r$                 Signal (or "return"), typically $r = a + \eta$
$u$                 Agent's utility function, defined over $v(c) - g(a)$
$\underline{u}$     Agent's reservation utility
$v$                 Agent's felicity function, defined over $c$
$\eta$              Noise
$A$                 Action function, defined over $\eta$
$C[A]$              Expected cost of contract implementing $A(\eta)$, $\eta \in (\underline{\eta}, \bar{\eta})$
$F$                 Complementary cumulative distribution function of $\eta$
$M$                 Message sent by agent to the principal
$S$                 Baseline size of output under agent's control
$T$                 Number of periods
$V$                 Felicity provided by contract, defined over $r$ or $\eta$

Table 1: Key Variables in the Model.

A Mathematical Preliminaries

This section derives some mathematical results that we use for the main proofs.

A.1 Dispersion of Random Variables

We repeatedly use the "dispersive order" for random variables to show that IC constraints bind. Shaked and Shanthikumar (2007, Section 3.B) provide an excellent summary of known facts about this concept. This section provides a self-contained guide of the relevant results for our paper, as well as proving some new results.

We commence by defining the notion of relative dispersion. Let $X$ and $Y$ denote two random variables with cumulative distribution functions $F$ and $G$ and corresponding right continuous inverses $F^{-1}$ and $G^{-1}$. $X$ is said to be less dispersed than $Y$ if and only if $F^{-1}(\beta) - F^{-1}(\alpha) \le G^{-1}(\beta) - G^{-1}(\alpha)$ whenever $0 < \alpha \le \beta < 1$. This concept is location-free: $X$ is less dispersed than $Y$ if and only if it is less dispersed than $Y + z$, for any real constant $z$.

A basic property is the following result (Shaked and Shanthikumar (2007), p.151):

Lemma 1 Let $X$ be a random variable and $f$, $h$ be functions such that $0 \le f(y) - f(x) \le h(y) - h(x)$ whenever $x \le y$. Then $f(X)$ is less dispersed than $h(X)$.
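A finite-sample illustration of Lemma 1 may help fix ideas. The sketch below is an assumption-laden check rather than a proof: it uses the empirical distribution of a simulated sample, for which the quantile-spacing definition reduces to comparing gaps between adjacent order statistics, and it picks the hypothetical pair $f(x) = x$ and $h(x) = x + x^3$, which satisfies $0 \le f(y) - f(x) \le h(y) - h(x)$ for $x \le y$. The function name `is_less_dispersed` is introduced here for illustration.

```python
import random


def is_less_dispersed(xs, ys, tol=1e-12):
    """Empirical dispersive-order check for two equal-sized samples.

    Returns True when every gap between adjacent order statistics of xs
    is no larger than the matching gap of ys (up to a float tolerance).
    Adjacent gaps suffice because any quantile spacing is a sum of them.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    sx, sy = sorted(xs), sorted(ys)
    return all(
        (sx[i + 1] - sx[i]) <= (sy[i + 1] - sy[i]) + tol
        for i in range(len(sx) - 1)
    )


rng = random.Random(0)  # fixed seed so the illustration is reproducible
draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]

f_of_x = [x for x in draws]           # f(x) = x
h_of_x = [x + x ** 3 for x in draws]  # h(x) = x + x^3 magnifies differences

print(is_less_dispersed(f_of_x, h_of_x))
print(is_less_dispersed(h_of_x, f_of_x))
```

Since both functions are increasing, they preserve the ordering of the draws, so the gap comparison applies the lemma's hypothesis pair by pair.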

This result is intuitive: $h$ magnifies differences to a greater extent than $f$, leading to more dispersion. We will also use the next two comparison lemmas.

Lemma 2 Assume that $X$ is less dispersed than $Y$ and let $f$ denote a weakly increasing function, $h$ a weakly increasing concave function, and $\phi$ a weakly increasing convex function. Then:

\[ E[f(X)] \ge E[f(Y)] \Rightarrow E[h(f(X))] \ge E[h(f(Y))] \]

\[ E[f(X)] \le E[f(Y)] \Rightarrow E[\phi(f(X))] \le E[\phi(f(Y))]. \]

Proof. The first statement comes directly from Shaked and Shanthikumar (2007), Theorem 3.B.2, which itself is taken from Landsberger and Meilijson (1994). The second statement is derived from the first, applied to $\hat{X} = -X$, $\hat{Y} = -Y$, $\hat{f}(x) = -f(-x)$, $\hat{h}(x) = -\phi(-x)$. It can be verified directly (or via consulting Shaked and Shanthikumar (2007), Theorem 3.B.6) that $\hat{X}$ is less dispersed than $\hat{Y}$. In addition, $E[\hat{f}(\hat{X})] \ge E[\hat{f}(\hat{Y})]$. Thus, $E[\hat{h}(\hat{f}(\hat{X}))] \ge E[\hat{h}(\hat{f}(\hat{Y}))]$. Substituting $\hat{h}(\hat{f}(\hat{X})) = -\phi(f(X))$ yields $E[\phi(f(X))] \le E[\phi(f(Y))]$.

Lemma 2 is intuitive: if $E[f(X)] \ge E[f(Y)]$, applying a concave function $h$ should maintain the inequality. Conversely, if $E[f(X)] \le E[f(Y)]$, applying a convex function $\phi$ should maintain the inequality. In addition, if $E[X] = E[Y]$, Lemma 2 implies that $X$ second-order stochastically dominates $Y$. Hence, it is a stronger concept than second-order stochastic dominance.

Lemma 2 allows us to prove Lemma 3 below, which states that the NIARA property of a utility function is preserved by adding a log-concave random variable to its argument.

Lemma 3 Let $u$ denote a utility function with NIARA and $Y$ a random variable with a log-concave distribution. Then, the utility function $\hat{u}$ defined by $\hat{u}(x) \equiv E[u(x + Y)]$ exhibits NIARA.

Proof. Consider two constants $a < b$ and a lottery $Z$ independent from $Y$. Let $C_a$ and $C_b$ be the certainty equivalents of $Z$ with respect to utility function $\hat{u}$ and evaluated at points $a$ and $b$ respectively, i.e. defined by

\[ \hat{u}(a + C_a) = E[\hat{u}(a + Z)], \qquad \hat{u}(b + C_b) = E[\hat{u}(b + Z)]. \]

$\hat{u}$ exhibits NIARA if and only if $C_a \le C_b$, i.e. the certainty equivalent increases with wealth. To prove that $C_a \le C_b$, we make three observations. First, since $u$ exhibits NIARA, there exists an increasing concave function $h$ such that $u(a + x) = h(u(b + x))$ for all $x$. Second, because $Y$ is log-concave, $Y + C_b$ is less dispersed than $Y + Z$ by Theorem 3.B.7 of Shaked and Shanthikumar (2007). Third, by definition of $C_b$ and the independence of $Y$ and $Z$, we have $E[u(b + Y + C_b)] = E[u(b + Y + Z)]$. Hence, we can apply Lemma 2, which yields $E[h(u(b + Y + C_b))] \ge E[h(u(b + Y + Z))]$, i.e.

\[ E[u(a + Y + C_b)] \ge E[u(a + Y + Z)] = E[u(a + Y + C_a)] \text{ by definition of } C_a. \]

Thus we have $C_b \ge C_a$ as required.

A.2 Subderivatives

Since we cannot assume that the optimal contract is differentiable, we use the notion of subderivatives to allow for quasi first-order conditions in all cases.

Definition 1 For a point $x$ and function $f$ defined in a left neighborhood of $x$, we define the subderivative of $f$ at $x$ as:

\[ \frac{d_-}{dx} f \equiv f'_-(x) \equiv \liminf_{y \uparrow x} \frac{f(x) - f(y)}{x - y}. \]

This notion will prove useful since $f'_-(x)$ is well-defined for all functions $f$ (with perhaps infinite values). We take limits "from below," as we will often apply the subderivative at the maximum feasible effort level $\bar{\bar{a}}$. If $f$ is left-differentiable at $x$, then $f'_-(x) = f'(x)$.

We use the following Lemma to allow us to integrate inequalities with subderivatives. All the Lemmas in this subsection are proven in the Online Appendix.

Lemma 4 Assume that, over an interval $I$: (i) $f'_-(x) \ge j(x)$ $\forall x$, for a continuous function $j(x)$ and (ii) there is a $C^1$ function $h$ such that $f + h$ is nondecreasing. Then, for two points $a \le b$ in $I$, $f(b) - f(a) \ge \int_a^b j(x)\,dx$.

Condition (ii) prevents $f(x)$ from exhibiting discontinuous downwards jumps, which would prevent integration.21 The following Lemma is the chain rule for subderivatives.

Lemma 5 Let $x$ be a real number and $f$ be a function defined in a left neighborhood of $x$. Suppose that function $h$ is differentiable at $f(x)$, with $h'(f(x)) > 0$. Then, $(h \circ f)'_-(x) = h'(f(x)) f'_-(x)$.

In general, subderivatives typically follow the usual rules of calculus, with inequalities instead of equalities. One example is below.

Lemma 6 Let $x$ be a real number and $f$, $h$ be functions defined in a left neighborhood of $x$. Then $(f + h)'_-(x) \ge f'_-(x) + h'_-(x)$. When $h$ is differentiable at $x$, then $(f + h)'_-(x) = f'_-(x) + h'(x)$.

21 For example, $f(x) = 1\{x \le 0\}$ satisfies condition (i) as $f'_-(x) = 0$ $\forall x$, but violates both condition (ii) and the conclusion of the Lemma, as $f(-1) > f(1)$.
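The left-sided difference quotient behind the subderivative, and the step-function counterexample to Lemma 4 without condition (ii), can be explored numerically. The sketch below is only a heuristic: a true lim inf cannot be computed from finitely many points, so `approx_subderivative` (a name introduced here) simply takes the smallest quotient over a few small left steps, which is an assumption that works for the simple functions shown.

```python
def left_difference_quotients(f, x, steps=(1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6)):
    """Quotients (f(x) - f(y)) / (x - y) for y = x - step, approaching x from below."""
    return [(f(x) - f(x - s)) / s for s in steps]


def approx_subderivative(f, x):
    """Crude proxy for the lim inf: smallest quotient over the three finest steps."""
    return min(left_difference_quotients(f, x)[-3:])


def indicator(x):
    """Step function 1{x <= 0}: equal to 1 up to and including 0, then 0."""
    return 1.0 if x <= 0 else 0.0


def square(x):
    return x * x


# The step function has zero left difference quotients at and away from the jump...
print(approx_subderivative(indicator, 0.0), approx_subderivative(indicator, 1.0))
# ...yet it falls between -1 and 1, so integrating the bound j(x) = 0 would fail.
print(indicator(-1.0), indicator(1.0))
# For a differentiable function the proxy is close to the ordinary derivative.
print(approx_subderivative(square, 1.0))
```

The downward jump just to the right of 0 is invisible to every left-sided quotient, which is why a monotonicity-type condition is needed before integrating.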


B Detailed Proofs

Throughout these proofs, we use tildes to denote random variables. For example, $\tilde{\eta}$ is the noise viewed as a random variable and $\eta$ is a particular realization of that noise. $E[f(\tilde{\eta})]$ denotes the expectation over all realizations of $\tilde{\eta}$ and $E[\tilde{f}(\tilde{\eta})]$ denotes the expectation over all realizations of both $\tilde{\eta}$ and a stochastic function $\tilde{f}$.

Proof of Theorem 1

Roadmap. We divide the proof in three parts. The first part shows that messages are redundant, so that we can restrict the analysis to contracts without messages. This part of the proof is standard and can be skipped at a first reading. The second part proves the theorem considering only deterministic contracts and assuming that $a_t^* < \bar{\bar{a}}$ $\forall t$. This case requires weaker assumptions (see Proposition 1). The third part, which is significantly more complex, rules out randomized contracts and allows for the target effort to be the maximum $\bar{\bar{a}}$. Both these extensions require the concepts of subderivatives and dispersion from Appendix A.

1). Redundancy of Messages

Let $r$ denote the vector $(r_1, \ldots, r_T)$ and define $\eta$ and $a$ analogously. Define $g(a) = g(a_1) + \ldots + g(a_T)$. Let $\tilde{V}_M(r, \eta) = v(\tilde{c}(r, \eta))$ denote the felicity given by a message-dependent contract if the agent reports $\eta$ and the realized signals are $r$. Under the revelation principle, we can restrict the analysis to mechanisms that induce the agent to truthfully report the noise $\eta$. The incentive compatibility (IC) constraint is that the agent exerts effort $a^*$ and reports $\hat{\eta} = \eta$:

\[ \forall \eta, \forall \hat{\eta}, \forall a: \quad E\left[ u\left( \tilde{V}_M(\eta + a, \hat{\eta}) - g(a) \right) \right] \le E\left[ u\left( \tilde{V}_M(\eta + a^*, \eta) - g(a^*) \right) \right]. \tag{31} \]

The principal's problem is to minimize expected pay $E\left[ v^{-1}\left( \tilde{V}_M(\tilde{\eta} + a^*, \tilde{\eta}) \right) \right]$, subject to the IC constraint (31), and the agent's individual rationality (IR) constraint

\[ E\left[ u\left( \tilde{V}_M(\tilde{\eta} + a^*, \tilde{\eta}) - g(a^*) \right) \right] \ge \underline{u}. \tag{32} \]

Since $\eta = r - a^*$ on the equilibrium path, the message-dependent contract is equivalent to $\tilde{V}_M(r, r - a^*)$. We consider replacing this with a new contract $\tilde{V}(r)$, which only depends on the realized signal and not on any messages, and yields the same felicity as the corresponding message-dependent contract. Thus, the felicity it gives is defined by:

\[ \tilde{V}(r) = \tilde{V}_M(r, r - a^*). \tag{33} \]

The IC and IR constraints for the new contract are given by:

\[ \forall \eta, \forall a: \quad E\left[ u\left( \tilde{V}(\eta + a) - g(a) \right) \right] \le E\left[ u\left( \tilde{V}(\eta + a^*) - g(a^*) \right) \right], \tag{34} \]

\[ E\left[ u\left( \tilde{V}(\tilde{\eta} + a^*) - g(a^*) \right) \right] \ge \underline{u}. \tag{35} \]

If the agent reports $\hat{\eta} \ne \eta$, he must take action $a$ such that $\eta + a = \hat{\eta} + a^*$. Substituting $\hat{\eta} = \eta + a - a^*$ into (31) and (32) indeed yields (34) and (35) above. Thus, the IC and IR constraints of the new contract are satisfied. Moreover, the new contract costs exactly the same as the old contract, since it yields the same felicity by (33). Hence, the new contract $\tilde{V}(r)$ induces incentive compatibility and participation at the same cost as the initial contract $\tilde{V}_M(r, \eta)$ with messages, and so messages are not useful. The intuition is that $a^*$ is always exerted, so the principal can already infer $\eta$ from the signal $r$ without requiring messages.

2). Deterministic Contracts, in the case $a_t^* < \bar{\bar{a}}$ $\forall t$

We will prove the Theorem by induction on $T$.

2a). Case $T = 1$. Dropping the time subscript for brevity, the incentive compatibility (IC) constraint is:

\[ \forall \eta, \forall a: \quad V(\eta + a) - g(a) \le V(\eta + a^*) - g(a^*). \]

Defining $r = \eta + a^*$ and $r' = \eta + a$, we have $a = a^* + r' - r$. The IC constraint can be rewritten:

\[ g(a^*) - g(a^* + r' - r) \le V(r) - V(r'). \]

Rewriting this inequality interchanging $r$ and $r'$ yields $g(a^*) - g(a^* + r - r') \le V(r') - V(r)$, and so:

\[ g(a^*) - g(a^* + r' - r) \le V(r) - V(r') \le g(a^* + r - r') - g(a^*). \tag{36} \]

We first consider $r > r'$. Dividing through by $r - r'$ yields:

\[ \frac{g(a^*) - g(a^* + r' - r)}{r - r'} \le \frac{V(r) - V(r')}{r - r'} \le \frac{g(a^* + r - r') - g(a^*)}{r - r'}. \tag{37} \]

Since $a^*$ is in the interior of the action space $A$ and the support of $\eta$ is open, there exists $r'$ in the neighborhood of $r$. Taking the limit $r' \uparrow r$, the first and third terms of (37) converge to $g'(a^*)$. Therefore, the left derivative $V'_{left}(r)$ exists, and equals $g'(a^*)$. Second, consider $r < r'$. Dividing (36) through by $r - r'$, and taking the limit $r' \downarrow r$ shows that the right derivative $V'_{right}(r)$ exists, and equals $g'(a^*)$. Therefore,

\[ V'(r) = g'(a^*). \tag{38} \]

Since $r$ has interval support22, we can integrate to obtain, for some integration constant $K$:

\[ V(r) = g'(a^*) r + K. \tag{39} \]
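The role of the linear contract (39) can be seen in a small numerical experiment: under $V(r) = g'(a^*) r + K$ with convex $g$, the agent's payoff $V(\eta + a) - g(a)$ is maximized at $a = a^*$ whatever noise $\eta$ he has observed. The sketch below assumes a hypothetical quadratic cost $g(a) = Ga^2/2$, an action grid on $[0, 1]$ and illustrative parameter values; none of these choices come from the paper.

```python
G = 2.0  # assumed cost-of-effort parameter for the illustrative quadratic cost


def g(a):
    """Hypothetical convex cost of effort, g(a) = G a^2 / 2."""
    return 0.5 * G * a * a


def g_prime(a):
    """Marginal cost of effort for the assumed quadratic cost."""
    return G * a


ACTION_GRID = [i / 1000 for i in range(1001)]  # candidate actions in [0, 1]


def best_response(eta, a_star, K=0.0, grid=ACTION_GRID):
    """Action maximizing V(eta + a) - g(a) under the linear contract of (39)."""
    def felicity(r):
        return g_prime(a_star) * r + K

    return max(grid, key=lambda a: felicity(eta + a) - g(a))


a_star = 0.4
responses = {eta: best_response(eta, a_star) for eta in (-1.0, 0.0, 0.7, 2.5)}
print(responses)
```

The noise shifts the agent's payoff by a constant that does not depend on his action, so the same action is chosen after every realization; this is the state-by-state incentive compatibility that pins down the slope $g'(a^*)$.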

2b). If the Theorem holds for $T$, it holds for $T + 1$. This part is as in the main text.

Note that the above proof (for deterministic contracts where $a_t^* < \bar{\bar{a}}$) does not require log-concavity of $\eta_t$, nor that $u$ satisfies NIARA. This is because the contract (7) is the only incentive compatible contract. These assumptions are only required for the general proof, where other contracts (e.g. randomized ones) are also incentive compatible, to show that they are costlier than contract (7).

3). General Proof

We no longer restrict $a_t^*$ to be in the interior of $A$, and allow for randomized contracts. We wish to prove the following Statement $T$ by induction on integer $T$:

Statement $T$: Consider a utility function $u$ with NIARA, independent random variables $\tilde{r}_1, \ldots, \tilde{r}_T$ where $\tilde{r}_2, \ldots, \tilde{r}_T$ are log-concave, and a sequence of nonnegative numbers $g'(a_1^*), \ldots, g'(a_T^*)$. Consider the set of (potentially randomized) contracts $\tilde{V}(r_1, \ldots, r_T)$ such that (i) $E\left[ u\left( \tilde{V}(\tilde{r}_1, \ldots, \tilde{r}_T) \right) \right] \ge \underline{u}$; (ii) $\forall t = 1 \ldots T$,

\[ \frac{d_-}{d\varepsilon} E\left[ u\left( \tilde{V}(\tilde{r}_1, \ldots, \tilde{r}_t + \varepsilon, \ldots, \tilde{r}_T) \right) \mid \tilde{r}_1, \ldots, \tilde{r}_t \right] \Big|_{\varepsilon = 0} \ge g'(a_t^*) E\left[ u'\left( \tilde{V}(\tilde{r}_1, \ldots, \tilde{r}_t, \ldots, \tilde{r}_T) \right) \mid \tilde{r}_1, \ldots, \tilde{r}_t \right] \tag{40} \]

and (iii) $\forall t = 1 \ldots T$, $E\left[ u\left( \tilde{V}(\tilde{r}_1, \ldots, \tilde{r}_t, \ldots, \tilde{r}_T) \right) \mid \tilde{r}_1, \ldots, \tilde{r}_t \right]$ is nondecreasing in $\tilde{r}_t$. In this set, for any increasing and convex cost function $\phi$, $E\left[ \phi\left( \tilde{V}(\tilde{r}_1, \ldots, \tilde{r}_T) \right) \right]$ is minimized with contract: $V^0(r_1, \ldots, r_T) = \sum_{t=1}^{T} g'(a_t^*) r_t + K$, where $K$ is a constant that makes the participation constraint (i) bind.

Condition (ii) is the local IC constraint, for deviations from below. We first consider the case of deterministic contracts, and then show that randomized contracts are costlier. We use the notation $E_t[\cdot] = E[\cdot \mid \tilde{r}_1, \ldots, \tilde{r}_t]$ to denote the expectation based on time-$t$ information.

3a). Deterministic Contracts

The key difference from the proof in 2) is that we now must allow for $a_t^* = \bar{\bar{a}}$.

3ai). Proof of Statement $T$ when $T = 1$. (40) becomes $\frac{d_-}{d\varepsilon} u(V(r + \varepsilon))\big|_{\varepsilon = 0} \ge g'(a_1^*) u'(V(r))$. Applying Lemma 5 to $h = u^{-1}$ yields:

\[ V'_-(r) \ge g'(a^*). \tag{41} \]

22 The model could be extended to allowing non-interval support: if the domain of $r$ was a union of disjoint intervals, we would have a different integration constant $K$ for each interval.


It is intuitive that (41) should bind, as this minimizes the variability in the agent's pay and thus constitutes efficient risk-sharing. We now prove that this is indeed the case; to simplify exposition, we normalize $g(a^*) = 0$ w.l.o.g.23

If constraint (41) binds, the contract is $V^0(r) = g'(a^*) r + K$, where $K$ satisfies $E[u(g'(a^*) \tilde{r} + K)] = \underline{u}$. We wish to show that any other contract $V(r)$ that satisfies (41) is weakly costlier. By assumption (iii) in Statement 1, $V$ is nondecreasing. We can therefore apply Lemma 4 to equation (41), where condition (ii) of the Lemma is satisfied by $h(r) \equiv 0$. This implies that for $r \le r'$, $V(r') - V(r) \ge g'(a^*)(r' - r) = V^0(r') - V^0(r)$. Thus, using Lemma 1, $V(\tilde{r})$ is more dispersed than $V^0(\tilde{r})$. Since $V$ must also satisfy the participation constraint, we have:

\[ E[u(V(\tilde{r}))] \ge \underline{u} = E\left[ u\left( V^0(\tilde{r}) \right) \right]. \tag{42} \]

Applying Lemma 2 to the convex function $\phi \circ u^{-1}$ and inequality (42), we have:

\[ E\left[ \phi \circ u^{-1} \circ u(V(\tilde{r})) \right] \ge E\left[ \phi \circ u^{-1} \circ u\left( V^0(\tilde{r}) \right) \right], \]

i.e. $E[\phi(V(\tilde{r}))] \ge E[\phi(V^0(\tilde{r}))]$. The expected cost of $V^0$ is weakly less than for $V$. Hence, the contract $V^0$ is cost-minimizing.

We note that this last part of the reasoning underpins item 2 in Section 2.3, the extension to a risk-averse principal. Suppose that the principal wants to minimize $E[w(c)]$, where $w$ is an increasing and concave function, rather than $E[c]$. Then, the above contract is optimal if $w \circ v^{-1} \circ u^{-1}$ is convex, i.e. $u \circ v \circ w^{-1}$ is concave. This requires $w$ to be "not too concave," i.e. the principal to be not too risk-averse.

Finally, we verify that the contract $V^0$ satisfies the global IC constraint. The agent's objective function becomes $u(g'(a^*)(a + \eta) - g(a))$. Since $g(a)$ is convex, the argument of $u(\cdot)$ is concave. Hence, the first-order condition gives the global optimum.

3aii). Proof that if Statement $T$ holds for $T$, it holds for $T + 1$. We define a new utility function $\hat{u}$ as follows:

\[ \hat{u}(x) = E\left[ u\left( x + g'(a^*_{T+1}) \tilde{r}_{T+1} \right) \right]. \tag{43} \]

Since $\tilde{r}_{T+1}$ is log-concave, $g'(a^*_{T+1}) \tilde{r}_{T+1}$ is also log-concave. From Lemma 3, $\hat{u}$ has the same NIARA property as $u$. For each $\tilde{r}_1, \ldots, \tilde{r}_T$, we define $k(\tilde{r}_1, \ldots, \tilde{r}_T)$ as the solution to equation (44) below:

\[ \hat{u}(k(\tilde{r}_1, \ldots, \tilde{r}_T)) = E_T\left[ u(V(\tilde{r}_1, \ldots, \tilde{r}_{T+1})) \right]. \tag{44} \]

23 Formally, this can be achieved by replacing the utility function $u(x)$ by $u^{new}(x) = u(x - g(a^*))$ and the cost function $g(a)$ by $g^{new}(a) = g(a) - g(a^*)$, so that $u(x - g(a)) = u^{new}(x - g^{new}(a))$.

$k$ represents the expected felicity from contract $V$ based on all noise realizations up to and including time $T$. The goal is to show that any other contract $V \ne V^0$ is weakly costlier. To do so, we wish to apply Statement $T$ for utility function $\hat{u}$ and contract $k$.

The first step is to show that, if Conditions (i)-(iii) hold for utility function $u$ and contract $V$ at time $T + 1$, they also hold for $\hat{u}$ and $k$ at time $T$, thus allowing us to apply the Statement for these functions. Taking expectations of (44) over $\tilde{r}_1, \ldots, \tilde{r}_T$ yields:

\[ E\left[ \hat{u}(k(\tilde{r}_1, \ldots, \tilde{r}_T)) \right] = E\left[ u(V(\tilde{r}_1, \ldots, \tilde{r}_{T+1})) \right] \ge \underline{u}, \tag{45} \]

where the inequality comes from Condition (i) for utility function $u$ and contract $V$ at time $T + 1$. Hence, Condition (i) holds for utility function $\hat{u}$ and contract $k$ at time $T$. In addition, it is immediate that $E[\hat{u}(k(\tilde{r}_1, \ldots, \tilde{r}_T)) \mid \tilde{r}_1, \ldots, \tilde{r}_t]$ is nondecreasing in $\tilde{r}_t$ (Condition (iii)). We thus need to show that Condition (ii) is satisfied.

Since equation (40) holds for $t = T + 1$, we have

\[ \frac{d_-}{d\varepsilon} u\left( V(\tilde{r}_1, \ldots, \tilde{r}_T, \tilde{r}_{T+1} + \varepsilon) \right) \Big|_{\varepsilon = 0} \ge g'(a^*_{T+1}) u'\left[ V(\tilde{r}_1, \ldots, \tilde{r}_{T+1}) \right]. \]

Applying Lemma 5 with function $u$ yields:

\[ \frac{d_- V(r_1, \ldots, r_{T+1})}{d r_{T+1}} \ge g'(a^*_{T+1}). \tag{46} \]

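Hence, using Lemma 1 and Lemma 4, we see that conditional on $\tilde r_1,\dots,\tilde r_T$, $V(\tilde r_1,\dots,\tilde r_{T+1})$ is more dispersed than $k(\tilde r_1,\dots,\tilde r_T) + g'(a^*_{T+1})\tilde r_{T+1}$. Using (43), we can rewrite equation (44) as

$$E_T\left[u\left(k(\tilde r_1,\dots,\tilde r_T) + g'(a^*_{T+1})\tilde r_{T+1}\right)\right] = E_T\left[u\left(V(\tilde r_1,\dots,\tilde r_{T+1})\right)\right].$$

Since $u$ exhibits NIARA, $-u''(x)/u'(x)$ is nonincreasing in $x$. This is equivalent to $u' \circ u^{-1}$ being weakly convex. We can thus apply Lemma 2 to yield:

$$E_T\left[u' \circ u^{-1} \circ u\left(V(\tilde r_1,\dots,\tilde r_{T+1})\right)\right] \geq E_T\left[u' \circ u^{-1} \circ u\left(k(\tilde r_1,\dots,\tilde r_T) + g'(a^*_{T+1})\tilde r_{T+1}\right)\right],$$

i.e.

$$E_T\left[u'\left(V(\tilde r_1,\dots,\tilde r_{T+1})\right)\right] \geq E_T\left[\hat u'\left(k(\tilde r_1,\dots,\tilde r_T)\right)\right]. \tag{47}$$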

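Applying definition (44) to the left-hand side of Condition (ii) for $T+1$ yields, with $t = 1,\dots,T$,

$$\frac{d}{d\varepsilon_-} E_t\left[\hat u\left(k(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T)\right)\right]\Big|_{\varepsilon=0} \geq g'(a^*_t)\, E\left[u'\left(V(\tilde r_1,\dots,\tilde r_t,\dots,\tilde r_{T+1})\right) \mid \tilde r_1,\dots,\tilde r_t\right].$$

Taking expectations of equation (47) at time $t$ and substituting into the right-hand side of the above equation yields:

$$\frac{d}{d\varepsilon_-} E_t\left[\hat u\left(k(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T)\right)\right]\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon_-} E_t\left[u\left(V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_{T+1})\right)\right]\Big|_{\varepsilon=0} \geq g'(a^*_t)\, E_t\left[\hat u'\left(k(\tilde r_1,\dots,\tilde r_T)\right)\right].$$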

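Hence the IC constraint holds for contract $k(\tilde r_1,\dots,\tilde r_T)$ and utility function $\hat u$ at time $T$, and so Condition (ii) of Statement T is satisfied. We can therefore apply Statement T at $T$ to contract $k(r_1,\dots,r_T)$, utility function $\hat u$ and cost function $\hat\phi$ defined by:

$$\hat\phi(x) \equiv E\left[\phi\left(x + g'(a^*_{T+1})\tilde r_{T+1}\right)\right]. \tag{48}$$

We observe that the contract $V^0 = \sum_{t=1}^{T+1} g'(a^*_t) r_t + K$ satisfies:

$$E\left[\hat u\left(\sum_{t=1}^{T} g'(a^*_t)\tilde r_t + K\right)\right] = E\left[u\left(\sum_{t=1}^{T+1} g'(a^*_t)\tilde r_t + K\right)\right] = \underline{u}.$$

Therefore, applying Statement T to $k$, $\hat u$ and $\hat\phi$ implies:

$$C_k = E\left[\hat\phi\left(k(\tilde r_1,\dots,\tilde r_T)\right)\right] \geq C_{V^0} = E\left[\phi\left(\sum_{t=1}^{T+1} g'(a^*_t)\tilde r_t + K\right)\right].$$

Using equation (48) yields:

$$C_k = E\left[\phi\left(k(\tilde r_1,\dots,\tilde r_T) + g'(a^*_{T+1})\tilde r_{T+1}\right)\right] \geq C_{V^0} = E\left[\phi\left(\sum_{t=1}^{T+1} g'(a^*_t)\tilde r_t + K\right)\right]. \tag{49}$$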

Finally, we compare the cost of contract $k(r_1,\dots,r_T) + g'(a^*_{T+1})\tilde r_{T+1}$ to the cost of the original contract $V(r_1,\dots,r_{T+1})$. Since equation (44) is satisfied, we can apply Lemma 2 to the convex function $\phi \circ u^{-1}$ and the random variable $\tilde r_{T+1}$ to yield

$$E_T\left[\phi\left(V(\tilde r_1,\dots,\tilde r_{T+1})\right)\right] \geq E_T\left[\phi\left(k(\tilde r_1,\dots,\tilde r_T) + g'(a^*_{T+1})\tilde r_{T+1}\right)\right],$$

$$E\left[\phi\left(V(\tilde r_1,\dots,\tilde r_{T+1})\right)\right] \geq E\left[\phi\left(k(\tilde r_1,\dots,\tilde r_T) + g'(a^*_{T+1})\tilde r_{T+1}\right)\right] = C_k \geq C_{V^0},$$

where the final inequality comes from (49). Hence the cost of contract $V$ is weakly greater than the cost of contract $V^0$. This concludes the proof for $T+1$.

3b). Optimality of Deterministic Contracts

Consider a randomized contract $\tilde V(r_1,\dots,r_T)$ and define the "certainty equivalent" contract $\bar V$ by:

$$u\left(\bar V(r_1,\dots,r_T)\right) \equiv E_T\left[u\left(\tilde V(r_1,\dots,r_T)\right)\right]. \tag{50}$$

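We wish to apply Statement T (which we have already proven for deterministic contracts) to contract $\bar V$, and so must verify that its three conditions are satisfied. From the above definition, we obtain

$$E\left[u\left(\bar V(\tilde r_1,\dots,\tilde r_T)\right)\right] = E\left[u\left(\tilde V(\tilde r_1,\dots,\tilde r_T)\right)\right] \geq \underline{u},$$

i.e., $\bar V$ satisfies the participation constraint (32). Hence, Condition (i) holds. Also, it is clear that Condition (iii) holds for $\bar V$, given it holds for $\tilde V$. We thus need to show that Condition (ii) is also satisfied. Applying Jensen's inequality to equation (50) and the function $u' \circ u^{-1}$ (which is convex since $u$ exhibits NIARA) yields: $u'\left(\bar V(r_1,\dots,r_T)\right) \leq E_T\left[u'\left(\tilde V(r_1,\dots,r_T)\right)\right]$. We apply this to $r_t = \tilde r_t$ for $t = 1,\dots,T$ and take expectations to obtain

$$E_t\left[u'\left(\tilde V(\tilde r_1,\dots,\tilde r_T)\right)\right] \geq E_t\left[u'\left(\bar V(\tilde r_1,\dots,\tilde r_T)\right)\right]. \tag{51}$$

Applying definition (50) to the left-hand side of (40) yields:

$$\frac{d}{d\varepsilon_-} E_t\left[u\left(\bar V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T)\right)\right]\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon_-} E_t\left[u\left(\tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T)\right)\right]\Big|_{\varepsilon=0} \geq g'(a^*_t)\, E_t\left[u'\left(\tilde V(\tilde r_1,\dots,\tilde r_t,\dots,\tilde r_T)\right)\right],$$

and using (51) yields:

$$\frac{d}{d\varepsilon_-} E_t\left[u\left(\bar V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T)\right)\right]\Big|_{\varepsilon=0} \geq g'(a^*_t)\, E_t\left[u'\left(\bar V(\tilde r_1,\dots,\tilde r_t,\dots,\tilde r_T)\right)\right].$$

Condition (ii) of Statement T therefore holds for $\bar V$. We can therefore apply Statement T to show that $V^0$ has a weakly lower cost than $\bar V$.

We next show that the cost of $\bar V$ is weakly less than the cost of $\tilde V$. Applying Jensen's inequality to (50) and the convex function $\phi \circ u^{-1}$ yields: $\phi\left(\bar V(r_1,\dots,r_T)\right) \leq E\left[\phi\left(\tilde V(r_1,\dots,r_T)\right)\right]$. We apply this to $r_t = \tilde r_t$ for $t = 1,\dots,T$ and take expectations over the distribution of $\tilde r_t$ to obtain:

$$E\left[\phi\left(\bar V(\tilde r_1,\dots,\tilde r_T)\right)\right] \leq E\left[\phi\left(\tilde V(\tilde r_1,\dots,\tilde r_T)\right)\right].$$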

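Hence $\bar V$ has a weakly lower cost than $\tilde V$. Therefore, $V^0$ has a weakly lower cost than $\tilde V$. This proves the Statement for randomized contracts.

3c). Main Proof.

Having proven Statement T, we now turn to the main proof of Theorem 1. The value of the signal on the equilibrium path is given by $\tilde r_t \equiv a^*_t + \tilde\eta_t$. We define

$$\mathbf{u}(x) \equiv u\left(x - \sum_{s=1}^{T} g(a^*_s)\right). \tag{52}$$

We seek to use Statement T applied to function $\mathbf{u}$ and random variable $\tilde r_t$, and thus must verify that its three conditions are satisfied. Since $E\left[\mathbf{u}\left(\tilde V(\tilde r_1,\dots,\tilde r_T)\right)\right] \geq \underline{u}$, Condition (i) holds. The IC constraint for time $t$ is:

$$0 \in \arg\max_{\varepsilon} E_t\left[u\left(\tilde V(a^*_1+\tilde\eta_1,\dots,a^*_t+\tilde\eta_t+\varepsilon,\dots,a^*_T+\tilde\eta_T) - g(a^*_t+\varepsilon) - \sum_{s=1,\dots,T,\ s\neq t} g(a^*_s)\right)\right],$$

i.e.

$$0 \in \arg\max_{\varepsilon} E_t\left[u\left(\tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T) - g(a^*_t+\varepsilon) - \sum_{s=1,\dots,T,\ s\neq t} g(a^*_s)\right)\right]. \tag{53}$$

We note that, for a function $f(\varepsilon)$, $0 \in \arg\max_\varepsilon f(\varepsilon)$ implies that for all $\varepsilon < 0$, $(f(0) - f(\varepsilon))/(-\varepsilon) \geq 0$, hence, taking the $\liminf_{\varepsilon \uparrow 0}$, we obtain $\frac{d}{d\varepsilon_-} f(\varepsilon)\big|_{\varepsilon=0} \geq 0$. Call $X(\varepsilon)$ the argument of $u$ in equation (53). Applying this result to (53), we find: $\frac{d}{d\varepsilon_-} E_t\left[u(X(\varepsilon))\right]\big|_{\varepsilon=0} \geq 0$. Using Lemma 5, we find $E_t\left[u'(X(0)) \frac{d}{d\varepsilon_-} X(\varepsilon)\big|_{\varepsilon=0}\right] \geq 0$. Using Lemma 6, $\frac{d}{d\varepsilon_-} X(\varepsilon)\big|_{\varepsilon=0} = \frac{d}{d\varepsilon_-} \tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T) - g'(a^*_t)$, hence we obtain:

$$E_t\left[u'(X(0)) \left(\frac{d}{d\varepsilon_-} \tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T) - g'(a^*_t)\right)\right] \geq 0.$$

Using again Lemma 5, this can be rewritten:

$$\frac{d}{d\varepsilon_-} E_t\left[u\left(\tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T) - \sum_{s=1,\dots,T} g(a^*_s)\right)\right]\Big|_{\varepsilon=0} \geq g'(a^*_t)\, E_t\left[u'(X(0))\right],$$

i.e., using the notation (52),

$$\frac{d}{d\varepsilon_-} E_t\left[\mathbf{u}\left(\tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T)\right)\right]\Big|_{\varepsilon=0} \geq g'(a^*_t)\, E_t\left[\mathbf{u}'\left(\tilde V(\tilde r_1,\dots,\tilde r_t,\dots,\tilde r_T)\right)\right].$$

Therefore, Condition (ii) of Statement T holds. Finally, we verify Condition (iii). Apply (53) to signal $r_t$ and deviation $\varepsilon < 0$. Since $g$ is increasing, $g(a^*_t + \varepsilon) \leq g(a^*_t)$, and we obtain:

$$E_t\left[u\left(\tilde V(\tilde r_1,\dots,\tilde r_t,\dots,\tilde r_T) - \sum_{s=1,\dots,T} g(a^*_s)\right)\right] \geq E_t\left[u\left(\tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T) - g(a^*_t+\varepsilon) - \sum_{s=1,\dots,T,\ s\neq t} g(a^*_s)\right)\right] \geq E_t\left[u\left(\tilde V(\tilde r_1,\dots,\tilde r_t+\varepsilon,\dots,\tilde r_T) - g(a^*_t) - \sum_{s=1,\dots,T,\ s\neq t} g(a^*_s)\right)\right],$$

so Condition (iii) holds for contract $\tilde V$ and utility function $\mathbf{u}$.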

We can now apply Statement T to contract $\tilde V$ and function $\mathbf{u}$, to prove that any globally IC contract is weakly costlier than contract $V^0 = \sum_{t=1}^{T} g'(a^*_t) r_t + K$. Moreover, it is clear that $V^0$ satisfies the global IC conditions in equation (53). Thus, $V^0$ is the cheapest contract that satisfies the global IC constraint.

Proof of Proposition 1

Conditionally on $(\eta_t)_{t \leq T+1}$, we must have:

$$a^*_{T+1} \in \arg\max_{a_{T+1}} u\left(V(a^*_1+\eta_1,\dots,a_{T+1}+\eta_{T+1}) - g(a_{T+1}) - \sum_{t\neq T+1} g(a^*_t)\right).$$

Using the proof of Theorem 1 with $T = 1$, this implies that, for $r_{T+1}$ in the interior of the support of $\tilde r_{T+1}$ (given $(r_t)_{t\leq T}$), $V(r_1,\dots,r_{T+1})$ can be written:

$$V(r_1,\dots,r_{T+1}) = K_T(r_1,\dots,r_T) + g'(a^*_{T+1})\, r_{T+1},$$

for some function $K_T(r_1,\dots,r_T)$. Next, consider the problem of implementing action $a^*_T$ at time $T$. We require that, for all $(\eta_t)_{t\leq T}$,

$$a^*_T \in \arg\max_{a_T} E_T\left[u\left(K_T(a^*_1+\eta_1,\dots,a_T+\eta_T) + g'(a^*_{T+1})\left(\tilde\eta_{T+1}+a^*_{T+1}\right) - g(a_T) - \sum_{t\neq T} g(a^*_t)\right)\right].$$

This can be rewritten

$$a^*_T \in \arg\max_{a_T} \hat u\left(K_T(a^*_1+\eta_1,\dots,a_T+\eta_T) - g(a_T)\right),$$

where $\hat u(x) \equiv E\left[u\left(x + g'(a^*_{T+1})\left(\tilde\eta_{T+1}+a^*_{T+1}\right) - \sum_{t\neq T} g(a^*_t)\right) \mid \eta_1,\dots,\eta_T\right]$. Using the same arguments as above for $T+1$, that implies that, for $r_T$ in the interior of the support of $\tilde r_T$ (given $(r_t)_{t\leq T-1}$) we can write:

$$K_T(r_1,\dots,r_T) = K_{T-1}(r_1,\dots,r_{T-1}) + g'(a^*_T)\, r_T$$

for some function $K_{T-1}(r_1,\dots,r_{T-1})$. Proceeding by induction, we see that this implies that we can write, for $(r_t)_{t\leq T+1}$ in the interior of the support of $(\tilde r_t)_{t\leq T+1}$,

$$V_{T+1}(r_1,\dots,r_{T+1}) = \sum_{t=1}^{T+1} g'(a^*_t)\, r_t + K_0,$$

for some constant $K_0$. This yields the "necessary" first part of the Proposition.

The converse part of the Proposition is immediate. Given the proposed contract, the agent faces the decision:

$$\max_{(a_t)_{t\leq T}} E\left[u\left(\sum_{t=1}^{T}\left(g'(a^*_t)\, a_t - g(a_t)\right) + \sum_{t=1}^{T} g'(a^*_t)\,\eta_t\right)\right],$$

which is maximized pointwise when $g'(a^*_t)\, a_t - g(a_t)$ is maximized. This in turn requires $a_t = a^*_t$.

Proof of Theorem 2

We shall use the following purely mathematical Lemma, proven in the Online Appendix.

Lemma 7 Consider a standard Brownian process $Z_t$ with filtration $\mathcal{F}_t$, a deterministic nonnegative process $\alpha_t$, an $\mathcal{F}_t$-adapted process $\beta_t$, $T \geq 0$, $X = \int_0^T \alpha_t\, dZ_t$, and $Y = \int_0^T \beta_t\, dZ_t$. Suppose that almost surely, $\forall t \in [0,T]$, $\beta_t \geq \alpha_t$. Then $X$ second-order stochastically dominates $Y$.

Lemma 7 is intuitive: since $\beta_t \geq \alpha_t \geq 0$, it makes sense that $Y$ is more volatile than $X$.

To derive the IC constraint, we use the methodology introduced by Sannikov (2008). We observe that the term $\int_0^T \mu_t\, dt$ induces a constant shift, so w.l.o.g. we can assume $\mu_t = 0$ $\forall t$. For an arbitrary adapted policy function $a = (a_t)_{t\in[0,T]}$, let $Q^a$ denote the probability measure induced by $a$. Then, $Z^a_t = \int_0^t (dr_s - a_s\, ds)/\sigma_s$ is a Brownian motion under $Q^a$, and $Z^{a^*}_t = \int_0^t (dr_s - a^*_s\, ds)/\sigma_s$ is a Brownian motion under $Q^{a^*}$, where $a^*$ is the policy $(a^*_t)_{t\in[0,T]}$.

Recall that, if the agent exerts policy $a^*$, then $r_t = \int_0^t \left(a^*_s\, ds + \sigma_s\, dZ_s\right)$. We define $v_T = v(c)$. By the martingale representation theorem (Karatzas and Shreve (1991), p. 182) applied to process $v_t = E_t[v_T]$ for $t \in [0,T]$, we can write: $v_T = \int_0^T \theta_t\,(dr_t - a^*_t\, dt) + v_0$ for some constant $v_0$ and a process $\theta_t$ adapted to the filtration induced by $(r_s)_{s\leq t}$. We proceed in two steps.

1) We show that policy $a^*$ is optimal for the agent if and only if, for almost all $t \in [0,T]$:

$$a^*_t \in \arg\max_{a_t}\ \theta_t a_t - g(a_t). \tag{54}$$

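To prove this claim, consider another action policy $(a_t)$, adapted to the filtration induced by $(Z_s)_{s\leq t}$. Consider the value $W = v_T - \int_0^T g(a_t)\, dt$, so that the final utility for the agent under policy $a$ is $u(W)$. Defining $L \equiv \int_0^T \left[\theta_t a_t - g(a_t) - \theta_t a^*_t + g(a^*_t)\right] dt$, it can be rewritten

$$W = v_0 + \int_0^T \theta_t\,(dr_t - a_t\, dt) - \int_0^T g(a^*_t)\, dt + L.$$

Suppose that (54) is not verified on the set of times $\tau$ with positive measure. Then, consider a policy $a$ such that $\theta_t a_t - g(a_t) > \theta_t a^*_t - g(a^*_t)$ for $t \in \tau$, and $a_t = a^*_t$ on $[0,T] \setminus \tau$. We thus have $L > 0$. Consider the agent's utility under policy $a$:

$$
\begin{aligned}
U^a &= E^a\left[u\left(v_T - \int_0^T g(a_t)\, dt\right)\right] = E^a\left[u\left(v_0 + \int_0^T \theta_t\,(dr_t - a_t\, dt) - \int_0^T g(a^*_t)\, dt + L\right)\right] \\
&= E^a\left[u\left(v_0 + \int_0^T \theta_t \sigma_t\, dZ^a_t - \int_0^T g(a^*_t)\, dt + L\right)\right] \\
&> E^a\left[u\left(v_0 + \int_0^T \theta_t \sigma_t\, dZ^a_t - \int_0^T g(a^*_t)\, dt\right)\right] \quad \text{since } L > 0 \\
&= E^{a^*}\left[u\left(v_0 + \int_0^T \theta_t \sigma_t\, dZ^{a^*}_t - \int_0^T g(a^*_t)\, dt\right)\right] = E^{a^*}\left[u\left(v_T - \int_0^T g(a^*_t)\, dt\right)\right] = U^{a^*},
\end{aligned}
$$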

where $U^{a^*}$ is the agent's utility under policy $a^*$. Hence, as $U^a > U^{a^*}$, the IC condition is violated. We conclude that condition (54) is necessary for the contract to satisfy the IC condition.

We next show that condition (54) is also sufficient to satisfy the IC condition. Indeed, consider any adapted policy $a$. Then, $L \leq 0$. So, the above reasoning shows that $U^a \leq U^{a^*}$. Policy $a^*$ is at least as good as any alternative strategy $a$.

2) We show that cost-minimization entails $\theta_t = g'(a^*_t)$. (54) implies $\theta_t = g'(a^*_t)$ if $a^*_t \in (\underline{a}, \bar a)$, and $\theta_t \geq g'(\bar a)$ if $a^*_t = \bar a$.

The case where $a^*_t \in (\underline{a}, \bar a)$ $\forall t$ is straightforward. The IC contract must have the form:

$$v(c_T) = v_0 + \int_0^T g'(a^*_t)\,(dr_t - a^*_t\, dt) = \int_0^T g'(a^*_t)\, dr_t + K,$$

where $K = v_0 - \int_0^T g'(a^*_t)\, a^*_t\, dt$. Cost minimization entails the lowest possible $v_0$.

The case where $a^*_t = \bar a$ for some $t$ is more complex, since the IC constraint is only an inequality: $\theta_t \geq g'(a^*_t)$. We must therefore prove this inequality binds. Consider

$$X = \int_0^T g'(a^*_t)\,\sigma_t\, dz_t, \qquad Y = \int_0^T \theta_t \sigma_t\, dz_t.$$

By reshifting $u(x) \to u\left(x - \int_0^T g(a^*_t)\, dt\right)$ if necessary, we can assume $\int_0^T g(a^*_t)\, dt = 0$ to simplify notation. We wish to show that a contract $v_T = Y + K_Y$, with $E[u(Y + K_Y)] \geq \underline{u}$, has a weakly greater expected cost than a contract $v = X + K_X$, with $E[u(X + K_X)] = \underline{u}$. Lemma 7 implies that $E[u(X + K_X)] \geq E[u(Y + K_X)]$, and so

$$E[u(Y + K_X)] \leq E[u(X + K_X)] = \underline{u} \leq E[u(Y + K_Y)].$$

Thus, $K_X \leq K_Y$. Since $v$ is increasing and concave, $v^{-1}$ is convex and $-v^{-1}$ is concave. We can therefore apply Lemma 7 to function $-v^{-1}$ to yield:

$$E\left[v^{-1}(X + K_X)\right] \leq E\left[v^{-1}(Y + K_X)\right] \leq E\left[v^{-1}(Y + K_Y)\right],$$

where the second inequality follows from $K_X \leq K_Y$. Therefore, the expected cost of $v = X + K_X$ is weakly less than that of $Y + K_Y$, and so contract $v = X + K_X$ is cost-minimizing. More explicitly, that is the contract (22) with $K = K_X - \int_0^T g'(a^*_t)\, a^*_t\, dt$.
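The cost comparison in step 2 can be checked in closed form in a special case. The sketch below is an illustration only, under assumptions that are not part of the theorem: CARA utility $u(x) = -e^{-\gamma x}$, linear felicity $v(c) = c$ (so the expected cost of a contract $X + K$ with $E[X] = 0$ is simply $K$), and constant loadings, so that $X \sim N(0, \alpha^2 T)$ and $Y \sim N(0, \beta^2 T)$ with $\alpha \leq \beta$. The parameter values and function names are hypothetical.

```python
import math

def expected_utility(gamma, variance, K):
    """E[-exp(-gamma * (X + K))] for X ~ N(0, variance), using the normal MGF."""
    return -math.exp(-gamma * K + 0.5 * gamma ** 2 * variance)

def binding_intercept(gamma, variance, u_bar):
    """Constant K that makes the participation constraint E[u(X + K)] = u_bar bind."""
    # Solve -exp(-gamma*K + 0.5*gamma^2*variance) = u_bar for K (u_bar < 0).
    return 0.5 * gamma * variance - math.log(-u_bar) / gamma

# Hypothetical parameters (assumptions for illustration only).
gamma, T, u_bar = 2.0, 1.0, -1.0
alpha, beta = 0.5, 0.8          # constant loadings with alpha <= beta

K_X = binding_intercept(gamma, alpha ** 2 * T, u_bar)  # less volatile contract
K_Y = binding_intercept(gamma, beta ** 2 * T, u_bar)   # more volatile contract

print("K_X =", K_X, " K_Y =", K_Y)
print("E[u(Y + K_X)] =", expected_utility(gamma, beta ** 2 * T, K_X))
```

With linear felicity, the expected cost of each contract equals its intercept, so $K_X \leq K_Y$ is exactly the statement that the less volatile contract is cheaper. The last line shows that paying the more volatile contract the smaller intercept $K_X$ would violate the participation constraint.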

Proof of Proposition 2

The proof is by induction.

Proof of Proposition 2 for $T = 1$. We remove time subscripts and let $V(\hat\eta) = v(C(\hat\eta))$ denote the felicity received by the agent if he announces $\hat\eta$ and signal $A(\hat\eta) + \hat\eta$ is revealed. If the agent reports $\eta$, the principal expects to see signal $\eta + A(\eta)$. Therefore, if the agent deviates to report $\hat\eta \neq \eta$, he must take action $a$ such that $\eta + a = \hat\eta + A(\hat\eta)$, i.e. $a = A(\hat\eta) + \hat\eta - \eta$. Hence, the truth-telling constraint is: $\forall \eta$, $\forall \hat\eta$,

$$V(\hat\eta) - g\left(A(\hat\eta) + \hat\eta - \eta\right) \leq V(\eta) - g(A(\eta)). \tag{55}$$

Defining

$$\Psi(\eta) \equiv V(\eta) - g(A(\eta)),$$

the truth-telling constraint (55) can be rewritten,

$$g(A(\hat\eta)) - g\left(A(\hat\eta) + \hat\eta - \eta\right) \leq \Psi(\eta) - \Psi(\hat\eta). \tag{56}$$

Rewriting this inequality interchanging $\eta$ and $\hat\eta$ and combining with the original inequality (56) yields: $\forall \eta$, $\forall \hat\eta$:

$$g(A(\hat\eta)) - g\left(A(\hat\eta) + \hat\eta - \eta\right) \leq \Psi(\eta) - \Psi(\hat\eta) \leq g\left(A(\eta) + \eta - \hat\eta\right) - g(A(\eta)). \tag{57}$$

Consider a point $\eta$ where $A$ is continuous and take $\hat\eta < \eta$. Dividing (57) by $\eta - \hat\eta > 0$ and taking the limit $\hat\eta \uparrow \eta$ yields $\Psi'_{left}(\eta) = g'(A(\eta))$. Next, consider $\hat\eta > \eta$. Dividing (57) by $\eta - \hat\eta < 0$ and taking the limit $\hat\eta \downarrow \eta$ yields $\Psi'_{right}(\eta) = g'(A(\eta))$. Hence,

$$\Psi'(\eta) = g'(A(\eta)), \tag{58}$$

at all points $\eta$ where $A$ is continuous.

Equation (58) holds only almost everywhere, since we have only assumed that $A$ is almost everywhere continuous. To complete the proof, we require a regularity argument about $\Psi$ (otherwise $\Psi$ might jump, for instance). We will show that $\Psi$ is absolutely continuous (see, e.g., Rudin (1987), p.145). Consider a compact subinterval $I$, and $\bar a_I = \sup\{A(\eta) + \eta - \hat\eta \mid \eta, \hat\eta \in I\}$, which is finite because $A$ is assumed to be bounded in any compact subinterval of $\eta$. Then, equation (57) implies:

$$|\Psi(\eta) - \Psi(\hat\eta)| \leq \max\left\{\left|g(A(\hat\eta)) - g\left(A(\hat\eta) + \hat\eta - \eta\right)\right|,\ \left|g\left(A(\eta) + \eta - \hat\eta\right) - g(A(\eta))\right|\right\} \leq |\eta - \hat\eta| \sup_{a \leq \bar a_I} g'(a).$$

This implies that $\Psi$ is absolutely continuous on $I$. Therefore, by the fundamental theorem of calculus for almost everywhere differentiable functions (Rudin (1987), p.148), we have that for any $\eta$, $\eta_*$, $\Psi(\eta) = \Psi(\eta_*) + \int_{\eta_*}^{\eta} \Psi'(x)\, dx$. From (58), $\Psi(\eta) = \Psi(\eta_*) + \int_{\eta_*}^{\eta} g'(A(x))\, dx$, i.e.

$$V(\eta) = g(A(\eta)) + \int_{\eta_*}^{\eta} g'(A(x))\, dx + k \tag{59}$$

with $k = \Psi(\eta_*)$. This concludes the derivation of the contract when $T = 1$.

"Second-order conditions." We next show that the contract (59) does implement effort $A(\eta)$, iff $A(\eta) + \eta$ is nondecreasing: we have verified the first order condition, but we need to show that (55) holds given the proposed contract, that is, that $\Gamma(\hat\eta) \equiv V(\hat\eta) - g\left(A(\hat\eta) + \hat\eta - \eta\right)$ has a maximum at $\hat\eta = \eta$.

Proof that $A(\eta) + \eta$ nondecreasing is a sufficient condition for the contract to implement the action. First, we do this when $A(\eta)$ is a $C^1$ function. Then,

$$\Gamma'(\hat\eta) = V'(\hat\eta) - g'\left(A(\hat\eta) + \hat\eta - \eta\right)\left(A'(\hat\eta) + 1\right) = \left[g'(A(\hat\eta)) - g'\left(A(\hat\eta) + \hat\eta - \eta\right)\right]\left(A'(\hat\eta) + 1\right).$$

As $A'(\hat\eta) + 1 \geq 0$ and $g$ is convex, we have $\Gamma'(\hat\eta) \geq 0$ for $\hat\eta \leq \eta$ and $\Gamma'(\hat\eta) \leq 0$ for $\hat\eta \geq \eta$. That shows that $\Gamma(\hat\eta)$ is maximized at $\hat\eta = \eta$.

Second, in the case where $A$ is not necessarily $C^1$, we approximate the weakly increasing function $A(\eta) + \eta$ by a series of $C^1$ weakly increasing functions $A_n(\eta) + \eta$. (It is well-known that this is easy to do by convolution: take a random variable $\varepsilon$ with bounded support and $C^1$ density $f$, and define $A_n(\eta) + \eta = E\left[A\left(\eta + \frac{\varepsilon}{n}\right) + \eta + \frac{\varepsilon}{n}\right] = \int (A(x) + x)\, f(n(x - \eta))\, n\, dx$, which is increasing in $\eta$ by the first equality, and $C^1$ by the second.) Consider the associated contract $V_n \to V$. We have seen that $\eta \in \arg\max_{\hat\eta} V_n(\hat\eta) - g\left(A_n(\hat\eta) + \hat\eta - \eta\right)$, so in the limit, $\eta \in \arg\max_{\hat\eta} V(\hat\eta) - g\left(A(\hat\eta) + \hat\eta - \eta\right)$.

Proof that $A(\eta) + \eta$ nondecreasing is a necessary condition. Call $R(\eta) = A(\eta) + \eta$. Suppose by contradiction that there are two points $\eta < \eta'$ such that $R(\eta) > R(\eta')$. Those two points can be taken arbitrarily close (indeed, consider a large $N$, the points $\eta_i = \eta + (\eta' - \eta)\, i/N$, $i = 0,\dots,N$; there must be an $i$ such that $R(\eta_i) > R(\eta_{i+1})$, otherwise we would have $R(\eta) = R(\eta_0) \leq R(\eta_N) = R(\eta')$). As domain $\mathcal{A}$ of actions is open, that implies that $A(\eta) + \eta - \eta' \in \mathcal{A}$. Applying (55) at point $\eta$ and $\eta'$, we have:

$$V(\eta') - g\left(A(\eta') + \eta' - \eta\right) \leq V(\eta) - g(A(\eta)) \quad \text{and} \quad V(\eta) - g\left(A(\eta) + \eta - \eta'\right) \leq V(\eta') - g(A(\eta'))$$

$$\Rightarrow\quad g(A(\eta')) - g\left(A(\eta) + \eta - \eta'\right) \leq V(\eta') - V(\eta) \leq g\left(A(\eta') + \eta' - \eta\right) - g(A(\eta)).$$

Calling $y \equiv A(\eta) + \eta - \eta' < x \equiv A(\eta)$ and $h = A(\eta') + \eta' - A(\eta) - \eta$, this writes $g(y + h) - g(y) \leq g(x + h) - g(x)$, and we have a contradiction if $g$ is strictly convex, since $h < 0$ and $y < x$.
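The role of the monotonicity condition can be illustrated numerically. The following is a minimal sketch under assumptions chosen purely for convenience: a quadratic cost $g(a) = a^2/2$, a hypothetical linear target action $A(\eta) = 1 + s\eta$, and contract (59) with $\eta_* = 0$ and $k = 0$, so that $V(\eta) = g(A(\eta)) + \eta + s\eta^2/2$. Here $A(\eta) + \eta$ is nondecreasing if and only if $s \geq -1$. The grid search stands in for the agent's choice of report.

```python
def g(a):
    """Hypothetical strictly convex cost of effort, with g'(a) = a."""
    return 0.5 * a * a

def make_payoff(s):
    """Agent's felicity from reporting `report` when the true noise is `eta`,
    under contract (59) built for the target action A(eta) = 1 + s * eta."""
    def A(eta):
        return 1.0 + s * eta

    def V(eta):
        # g(A(eta)) + integral_0^eta g'(A(x)) dx, with g'(A(x)) = 1 + s * x
        return g(A(eta)) + eta + 0.5 * s * eta * eta

    def payoff(report, eta):
        # To match the signal, the agent must take a = A(report) + report - eta.
        return V(report) - g(A(report) + report - eta)

    return payoff

GRID = [i / 100 for i in range(-200, 201)]   # candidate reports in [-2, 2]

def best_report(payoff, eta):
    """Report on the grid that maximizes the agent's payoff."""
    return max(GRID, key=lambda report: payoff(report, eta))

monotone = make_payoff(0.5)       # A(eta) + eta has slope 1.5 >= 0
non_monotone = make_payoff(-2.0)  # A(eta) + eta has slope -1 < 0

for eta in (-1.0, 0.0, 0.5, 1.5):
    print(eta, best_report(monotone, eta), best_report(non_monotone, eta))
```

In the monotone case the payoff is a concave quadratic in the report with its peak at the truth, so the best report coincides with $\eta$. In the non-monotone case the first-order condition still holds at the truth, but it identifies a minimum, and the agent prefers to misreport.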

Proof that if Proposition 2 holds for $T$, it holds for $T+1$. This part of the proof is as the proof of Theorem 1 in the main text. At $t = T+1$, if the agent reports $\hat\eta_{T+1}$, he must take action $a = A(\hat\eta_{T+1}) + \hat\eta_{T+1} - \eta_{T+1}$ so that the signal $a + \eta_{T+1}$ is consistent with declaring $\hat\eta_{T+1}$. The IC constraint is therefore:

$$\eta_{T+1} \in \arg\max_{\hat\eta_{T+1}} V(\eta_1,\dots,\eta_T,\hat\eta_{T+1}) - g\left(A(\hat\eta_{T+1}) + \hat\eta_{T+1} - \eta_{T+1}\right) - \sum_{t=1}^{T} g(a_t). \tag{60}$$

Applying the result for $T = 1$, to induce $\hat\eta_{T+1} = \eta_{T+1}$, the contract must be of the form:

$$V(\eta_1,\dots,\eta_T,\hat\eta_{T+1}) = W_{T+1}(\hat\eta_{T+1}) + k(\eta_1,\dots,\eta_T), \tag{61}$$

where $W_{T+1}(\hat\eta_{T+1}) = g(A(\hat\eta_{T+1})) + \int_{\eta_*}^{\hat\eta_{T+1}} g'(A(x))\, dx$ and $k(\eta_1,\dots,\eta_T)$ is the "constant" viewed from period $T+1$.

In turn, $k(\eta_1,\dots,\eta_T)$ must be chosen to implement $\hat\eta_t = \eta_t$ $\forall t = 1,\dots,T$, viewed from time 0, when the agent's utility is:

$$E\left[u\left(k(\eta_1,\dots,\eta_T) + W_{T+1}(\hat\eta_{T+1}) - \sum_{t=1}^{T} g(a_t)\right)\right].$$

Defining

$$\hat u(x) = E\left[u\left(x + W_{T+1}(\tilde\eta_{T+1})\right)\right], \tag{62}$$

the principal's problem is to implement $\hat\eta_t = \eta_t$ $\forall t = 1,\dots,T$, with a contract $k(\eta_1,\dots,\eta_T)$, given a utility function $E\left[\hat u\left(k(\eta_1,\dots,\eta_T) - \sum_{t=1}^{T} g(a_t)\right)\right]$. Applying the result for $T$, we see that $k$ must be:

$$k(\eta_1,\dots,\eta_T) = \sum_{t=1}^{T} g(A_t(\eta_t)) + \sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} g'(A_t(x))\, dx + k_*$$

for some constant $k_*$. Combining this with (59), the only incentive compatible contract is:

$$V(\eta_1,\dots,\eta_T,\eta_{T+1}) = \sum_{t=1}^{T+1} g(A_t(\eta_t)) + \sum_{t=1}^{T+1} \int_{\eta_*}^{\eta_t} g'(A_t(x))\, dx + k_*.$$

The treatment of the second-order conditions ($A_t(\eta_t) + \eta_t$ nondecreasing) is as in the $T = 1$ case.

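Proof of Proposition 3

Step 1. It is easier to work in terms of $Q(\eta) = g'(A(\eta))$, the marginal cost of effort associated with plan $A(\eta)$. With a slight abuse of notation, define $C[Q]$ as the expected cost of implementing plan $Q = \{Q(\eta)\}$. From Proposition 2 with $T = 1$, $c(\eta; Q) = v^{-1}\left(g \circ (g')^{-1}(Q(\eta)) + \int_0^{\eta} Q(x)\, dx + K\right)$, where $K$ is the solution of $E\left[u\left(\int_0^{\eta} Q(x)\, dx + K\right)\right] = \underline{u}$. Then, the expected cost is: $C[Q] = E[c(\eta; Q)]$.

We first establish that the contract cost $C[Q]$ is convex in the plan $Q$. Consider two plans $Q^1$ and $Q^2$, $\lambda_1 + \lambda_2 = 1$ with $\lambda_1, \lambda_2 \in [0,1]$, and the plan $Q$ defined by $Q(\eta) = \lambda_1 Q^1(\eta) + \lambda_2 Q^2(\eta)$. Since $u$ is concave,

$$E\left[u\left(\int_0^{\eta} Q(x)\, dx + \lambda_1 K_1 + \lambda_2 K_2\right)\right] \geq \underline{u},$$

so the constant $K$ associated with the new plan satisfies $K \leq \lambda_1 K_1 + \lambda_2 K_2$. This shows that the function $K[Q]$ is convex in $Q$. Since $g \circ (g')^{-1}$ and $v^{-1}$ are convex, $C[Q] \leq \lambda_1 C[Q^1] + \lambda_2 C[Q^2]$, i.e., $C$ is convex.

Step 2. Since $C$ is convex, we have:

$$C\left[\bar Q\right] - C[Q] \leq \int \frac{\partial C\left[\bar Q\right]}{\partial Q(\eta)} \left(\bar Q - Q(\eta)\right) d\eta.$$

Furthermore, since $g'$ is convex, $\bar Q - Q(\eta) \leq g''(\bar a)\left(\bar a - A(\eta)\right)$. Defining $\Lambda(\bar a, \eta) = \max\left\{0, \frac{\partial C\left[\bar A\right]}{\partial A(\eta)}\right\}$, we have $C\left[\bar A\right] - C[A] \leq \int \Lambda(\bar a, \eta)\left(\bar a - A(\eta)\right) d\eta$.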

A Microfoundation for the Principal's Objective

We offer a microfoundation for the principal's objective function (26). Suppose that the agent can take two actions, a "fundamental" action $a^F \in (\underline{a}, \bar a]$ and a manipulative action $m \geq 0$. Firm value is a function of $a^F$ only, i.e. the benefit function is $b(a^F, \eta)$. The signal is increasing in both actions: $r = a^F + m + \eta$. The agent's utility is $v(c) - \left[g^F(a^F) + G(m)\right]$, where $g^F$, $G$ are increasing and convex, $G(0) = 0$, and $G'(0) \geq (g^F)'(\bar a)$. The final assumption means that manipulation is costlier than fundamental effort. We define $a = a^F + m$ and the cost function $g(a) = \min_{a^F, m}\left\{g^F(a^F) + G(m) \mid a^F + m = a\right\}$, so that $g(a) = g^F(a)$ for $a \in (\underline{a}, \bar a]$ and $g(a) = g^F(\bar a) + G(a - \bar a)$ for $a \geq \bar a$, which is increasing and convex. Then, firm value can be written $b(\min(a, \bar a), \tilde\eta)$, as in equation (26).

This framework is consistent with rational expectations. Suppose $b(a^F, \eta) = e^{a^F + \eta}$. After observing the signal $r$, the market forms its expectation $P_1$ of the firm value $b(a^F, \eta)$. The incentive contract described in Proposition 2 implements $a \leq \bar a$, so the agent will not engage in manipulation. Therefore, the rational expectations price is $P_1 = e^r$. In more technical terms, consider the game in which the agent takes action $a$ and the market sets price $P_1$ after observing signal $r$. It is a Bayesian Nash equilibrium for the agent to choose $A(\eta)$ and for the market to set price $P_1 = e^r$.
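The composite cost function can be checked by brute force. The sketch below is an illustration under hypothetical functional forms that are not taken from the paper: $g^F(x) = x^2/2$ on $(0, 1]$ (so $\underline{a} = 0$ and $\bar a = 1$ are assumed), and $G(m) = 2m + m^2$, which satisfies $G(0) = 0$ and $G'(0) = 2 \geq (g^F)'(\bar a) = 1$. It minimizes total cost over all feasible splits of $a$ into fundamental effort and manipulation.

```python
A_BAR = 1.0   # assumed upper bound on the fundamental action

def g_F(x):
    """Hypothetical cost of fundamental effort; (g_F)'(A_BAR) = 1."""
    return 0.5 * x * x

def G(m):
    """Hypothetical cost of manipulation; G(0) = 0 and G'(0) = 2 >= 1."""
    return 2.0 * m + m * m

def composite_cost(a, steps=2000):
    """Cheapest way to deliver a total action a = a_F + m,
    with 0 <= a_F <= A_BAR and m >= 0, found by grid search over m."""
    m_min = max(0.0, a - A_BAR)   # manipulation forced once a_F hits its cap
    m_max = a                     # keeps a_F = a - m nonnegative
    best = float("inf")
    for i in range(steps + 1):
        m = m_min + (m_max - m_min) * i / steps
        best = min(best, g_F(a - m) + G(m))
    return best

def piecewise_cost(a):
    """Closed form stated in the text."""
    if a <= A_BAR:
        return g_F(a)
    return g_F(A_BAR) + G(a - A_BAR)

for a in (0.3, 0.8, 1.0, 1.5, 2.0):
    print(a, composite_cost(a), piecewise_cost(a))
```

Because manipulation is marginally costlier than fundamental effort everywhere on the feasible range, the minimizing split uses no manipulation until $a^F$ reaches $\bar a$, which reproduces the piecewise form.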

References

[1] Arnott, R. and J. Stiglitz (1988): “Randomization with Asymmetric Information.” RAND Journal of Economics 19, 344-362.

[2] Baker, G. (1992): “Incentive Contracts and Performance Measurement.” Journal of Political Economy 100, 598-614.

[3] Bennedsen, M., F. Perez-Gonzalez and D. Wolfenzon (2009): “Do CEOs Matter?” Working Paper, Copenhagen Business School.

[4] Biais, B., T. Mariotti, G. Plantin and J.-C. Rochet (2007): “Dynamic Security Design: Convergence to Continuous Time and Asset Pricing Implications.” Review of Economic Studies 74, 345-390.

[5] Biais, B., T. Mariotti, J.-C. Rochet and S. Villeneuve (2009): “Large Risks, Limited Liability and Dynamic Moral Hazard.” Working Paper, Université de Toulouse.

[6] Caplin, A. and B. Nalebuff (1991): “Aggregation and Social Choice: A Mean Voter Theorem.” Econometrica 59, 1-23.

[7] Cooley, T. and E. Prescott (2005): “Economic Growth and Business Cycles,” in “Frontiers in Business Cycle Research,” T. Cooley ed., Princeton University Press, Princeton.

[8] DeMarzo, P. and M. Fishman (2007): “Optimal Long-Term Financial Contracting.” Review of Financial Studies 20, 2079-2127.

[9] DeMarzo, P. and Y. Sannikov (2006): “Optimal Security Design and Dynamic Capital Structure in a Continuous-Time Agency Model.” Journal of Finance 61, 2681-2724.

[10] Dittmann, I., and E. Maug (2007): “Lower Salaries and No Options? On the Optimal Structure of Executive Pay.” Journal of Finance 62, 303-343.

[11] Dittmann, I., E. Maug and O. Spalt (2009): “Sticks or Carrots? Optimal CEO Compensation when Managers are Loss-Averse.” Journal of Finance, forthcoming.

[12] Edmans, A., X. Gabaix and A. Landier (2009): “A Multiplicative Model of Optimal CEO Incentives in Market Equilibrium.” Review of Financial Studies, forthcoming.

[13] Edmans, A., X. Gabaix, T. Sadzik and Y. Sannikov (2009): “Dynamic Incentive Accounts.” NBER Working Paper No. 15324.

[14] Farhi, E. and I. Werning (2009): “Capital Taxation: Quantitative Explorations of the Inverse Euler Equation.” Working Paper, Harvard University.

[15] Golosov, M., N. Kocherlakota and A. Tsyvinski (2003): “Optimal Indirect Capital Taxation.” Review of Economic Studies 70, 569-587.

[16] Grossman, S. and O. Hart (1983): “An Analysis of the Principal-Agent Problem.” Econometrica 51, 7-45.

[17] Hall, B. and J. Liebman (1998): “Are CEOs Really Paid Like Bureaucrats?” Quarterly Journal of Economics 113, 653-691.

[18] Hall, B. and K. Murphy (2002): “Stock Options for Undiversified Executives.” Journal of Accounting and Economics 33, 3-42.

[19] Harris, M. and A. Raviv (1979): “Optimal Incentive Contracts With Imperfect Information.” Journal of Economic Theory 20, 231-259.

[20] He, Z. (2009a): “Optimal Executive Compensation when Firm Size Follows Geometric Brownian Motion.” Review of Financial Studies 22, 859-892.

[21] He, Z. (2009b): “Dynamic Compensation Contracts with Private Savings.” Working Paper, University of Chicago.

[22] Hellwig, M. (2007): “The Role of Boundary Solutions in Principal-Agent Problems of the Holmstrom-Milgrom Type.” Journal of Economic Theory 136, 446-475.

[23] Hellwig, M. and K. Schmidt (2002): “Discrete-Time Approximations of the Holmstrom-Milgrom Brownian-Motion Model of Intertemporal Incentive Provision.” Econometrica 70, 2225-2264.

[24] Hemmer, T. (2004): “Lessons Lost in Linearity: A Critical Assessment of the General Usefulness of LEN Models in Compensation Research.” Journal of Management Accounting Research 16, 149-162.

[25] Holmstrom, B. and P. Milgrom (1987): “Aggregation and Linearity in the Provision of Intertemporal Incentives.” Econometrica 55, 308-328.

[26] Jewitt, I. (1988): “Justifying the First-Order Approach to Principal-Agent Problems.” Econometrica 56, 1177-1190.

[27] Karatzas, I. and S. E. Shreve (1991): Brownian Motion and Stochastic Calculus, 2nd edition, Springer Verlag.

[28] Kremer, M. (1993): “The O-Ring Theory of Economic Development.” Quarterly Journal of Economics 108, 551-576.

[29] Krishna, V. and E. Maenner (2001): “Convex Potentials with an Application to Mechanism Design.” Econometrica 69, 1113-1119.

[30] Lacker, J. and J. Weinberg (1989): “Optimal Contracts under Costly State Falsification.” Journal of Political Economy 97, 1345-1363.

[31] Laffont, J.-J. and D. Martimort (2002): “The Theory of Incentives: The Principal-Agent Model.” Princeton University Press, Princeton.

[32] Landsberger, M. and I. Meilijson (1994): “The Generating Process and an Extension of Jewitt’s Location Independent Risk Concept.” Management Science 40, 662-669.

[33] Mirrlees, J. (1974): “Notes on Welfare Economics, Information and Uncertainty” in Michael Balch, Daniel McFadden, and Shih-Yen Wu, eds., Essays on Economic Behavior under Uncertainty, North-Holland, Amsterdam.

[34] Mueller, H. (2000): “Asymptotic Efficiency in Dynamic Principal-Agent Problems.” Journal of Economic Theory 91, 292-301.

[35] Murphy, K. (1999): “Executive Compensation” in Orley Ashenfelter and David Card, eds., Handbook of Labor Economics, Vol. 3b. New York and Oxford: Elsevier/North-Holland, 2485-2563.

[36] Ou-Yang, H. (2003): “Optimal Contracts in a Continuous-Time Delegated Portfolio Management Problem.” Review of Financial Studies 16, 173-208.

[37] Phelan, C. and R. Townsend (1991): “Private Information and Aggregate Behaviour: Computing Multi-Period, Information-Constrained Optima.” Review of Economic Studies 58, 853-881.

[38] Prendergast, C. (2002): “The Tenuous Trade-Off between Risk and Incentives.” Journal of Political Economy 110, 1071-102.

[39] Rogerson, W. (1985): “The First Order Approach to Principal-Agent Problems.” Econometrica 53, 1357-1368.

[40] Rudin, W. (1987): Real and Complex Analysis, 3rd edition, McGraw-Hill.

[41] Sannikov, Y. (2008): “A Continuous-Time Version of the Principal-Agent Problem.” Review of Economic Studies 75, 957-984.

[42] Sappington, D. (1983): “Limited Liability Contracts Between Principal and Agent.” Journal of Economic Theory 29, 1-21.

[43] Schaettler, H. and J. Sung (1993): “The First-Order Approach to the Continuous-Time Principal-Agent Problem With Exponential Utility.” Journal of Economic Theory 61, 331-371.

[44] Shaked, M. and G. Shanthikumar (2007): Stochastic Orders, Springer Verlag.

[45] Spear, S. and S. Srivastava (1987): “On Repeated Moral Hazard With Discounting.” Review of Economic Studies 54, 599-617.

[46] Sung, J. (1995): “Linearity with Project Selection and Controllable Diffusion Rate in Continuous-Time Principal-Agent Problems.” RAND Journal of Economics 26, 720-743.

Online Appendix for “Tractability in Incentive Contracting” Alex Edmans and Xavier Gabaix November 9, 2009

D Multidimensional Signal and Action

While the core model involves a single signal and action, this section shows that our contract is robust to a setting of multidimensional signals and actions. For brevity, we only analyze the discrete-time one-period case, since the continuous time extension is similar.

The agent now takes a multidimensional action $\mathbf{a} \in \mathcal{A}$, which is a compact subset of $\mathbb{R}^I$ for some integer $I$. (Note that in this section, bold font has a different usage than in the proof of Theorem 1.) The signal is also multidimensional:

$$\mathbf{r} \equiv \mathbf{b}(\mathbf{a}) + \boldsymbol{\eta},$$

where $\boldsymbol{\eta}, \mathbf{r} \in \mathbb{R}^S$, and $\mathbf{b}: \mathcal{A} \subset \mathbb{R}^I \to \mathbb{R}^S$. The signal and action can be of different dimensions. In the core model, $S = I = 1$ and $b(a) = a$. As before, the contract is $c(\mathbf{r})$ and the indirect felicity function is $V(\mathbf{r}) = v(c(\mathbf{r}))$. The following Proposition states the optimal contract.

Proposition 5 (Optimal contract, discrete time, multidimensional signal and action). Define the $I \times S$ matrix $L = \mathbf{b}'(\mathbf{a}^*)^{\top}$, i.e. explicitly $L_{ij} = \frac{\partial b_j}{\partial a_i}(a^*_1,\dots,a^*_I)$, and assume that there is a vector $\boldsymbol{\theta} \in \mathbb{R}^S$ such that

$$L\boldsymbol{\theta} = g'(\mathbf{a}^*), \tag{63}$$

i.e., explicitly:

$$\forall i = 1,\dots,I, \quad \sum_{j=1}^{S} \frac{\partial b_j}{\partial a_i}(a^*_1,\dots,a^*_I)\,\theta_j = \frac{\partial g}{\partial a_i}(a^*_1,\dots,a^*_I).$$

The following contract is optimal. The agent is paid

$$c(\mathbf{r}) = v^{-1}\left(\boldsymbol{\theta}\cdot\mathbf{r} + K(\mathbf{r})\right), \tag{64}$$

i.e., explicitly, $c(\mathbf{r}) = v^{-1}\left(\sum_{j=1}^{S} \theta_j r_j + K(r_1,\dots,r_S)\right)$, where the function $K(\cdot)$ is the solution of the following optimization problem:

$$\min_{K(\cdot)} E\left[K(\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta})\right] \quad \text{subject to}$$

$$\forall \mathbf{r},\ L K'(\mathbf{r}) = 0, \tag{65}$$

$$E\left[u\left(\boldsymbol{\theta}\cdot(\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta}) + K(\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta}) - g(\mathbf{a}^*)\right)\right] \geq \underline{u}.$$

Proof. Here we derive the first-order condition; the remainder of the proof is as in Theorem 1 of the main paper. Incentive compatibility requires that, for all $\boldsymbol{\eta}$,

$$\mathbf{a}^* \in \arg\max_{\mathbf{a}} V(\mathbf{b}(\mathbf{a}) + \boldsymbol{\eta}) - g(\mathbf{a}),$$

and so:

$$V'(\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta})\,\mathbf{b}'(\mathbf{a}^*) - g'(\mathbf{a}^*) = 0, \tag{66}$$

where $V'$ is an $S$-dimensional vector, $\mathbf{b}'(\mathbf{a}^*)$ is an $S \times I$ matrix, and $g'(\mathbf{a}^*)$ is an $I$-dimensional vector. Integrating (66) gives: $V(\mathbf{r}) = \boldsymbol{\theta}\cdot\mathbf{r} + K(\mathbf{r})$, where $\boldsymbol{\theta}\cdot\mathbf{r} = \sum_{i=1}^{S} \theta_i r_i$, and $L K'(\mathbf{r}) = 0$.

Note that $K(\mathbf{r})$ is now a function and so determined by solving an optimization problem. In the core model, $K$ is a constant and determined by solving an equality. We now analyze two specific applications of this extension.

Two signals. The agent takes a single action, but there are two signals of performance:

$$r_1 = a + \varepsilon_1, \qquad r_2 = a + \varepsilon_2.$$

In this case, $L = (1\ \ 1)$. Therefore, with $\boldsymbol{\theta} = (\theta_1, \theta_2) \in \mathbb{R}^2$, (63) becomes: $\theta_1 + \theta_2 = g'(a^*)$. For example, we can take $\theta_1 = \theta_2 = g'(a^*)/2$. Next, (65) becomes: $\partial K/\partial r_1 + \partial K/\partial r_2 = 0$. It is well known that this can be integrated into: $K(r_1, r_2) = k(r_1 - r_2)$ for a function $k$. Hence, the optimal contract can be written:

$$c = v^{-1}\left(g'(a^*)\,\frac{r_1 + r_2}{2} + k(r_1 - r_2)\right),$$

where the function $k(\cdot)$ is chosen to minimize the cost of the contract subject to the participation constraint. As in Holmstrom (1979), all informative signals should be used to determine the agent's compensation.

Relative performance evaluation. Again, there is a single action and two signals, but the second signal is independent of the agent's action, as in Holmstrom (1982):

$$r_1 = a + \varepsilon_1, \qquad r_2 = \varepsilon_2.$$

In this case, $L = (1\ \ 0)$. Therefore, with $\boldsymbol{\theta} = (\theta_1, \theta_2) \in \mathbb{R}^2$, (63) becomes: $\theta_1 = g'(a^*)$. Next, (65) becomes: $\partial K/\partial r_1 = 0$, so that $K(r_1, r_2) = k(r_2)$ for a function $k$. Hence, the optimal contract can be written:

$$c = v^{-1}\left(g'(a^*)\, r_1 + k(r_2)\right).$$

The second signal enters the contract even though it is unaffected by the agent's action, since it may be correlated with the noise in the first signal.
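The two-signal case lends itself to a quick numerical check of incentive compatibility. The sketch below is an illustration under assumptions not made in the text: quadratic cost $g(a) = a^2/2$, a hypothetical target $a^* = 1.2$, equal weights $\theta_1 = \theta_2 = g'(a^*)/2$, and an arbitrary function $k(\cdot) = \tanh(\cdot)$ standing in for the solution of the cost-minimization problem. It confirms that, whatever the noise realizations, the agent's preferred action on a grid is $a^*$, because the term $k(r_1 - r_2)$ does not depend on $a$.

```python
import math

A_STAR = 1.2                       # hypothetical target action

def g(a):
    """Hypothetical cost of effort, with g'(a) = a."""
    return 0.5 * a * a

# Weights satisfying (63) in the two-signal case: theta_1 + theta_2 = g'(a*).
THETA = (0.5 * A_STAR, 0.5 * A_STAR)

def k(diff):
    """Arbitrary function of r1 - r2; a stand-in, not the cost-minimizing one."""
    return math.tanh(diff)

def felicity_net_of_cost(a, eps1, eps2):
    """V(r1, r2) - g(a) under the contract V = theta . r + k(r1 - r2)."""
    r1, r2 = a + eps1, a + eps2
    return THETA[0] * r1 + THETA[1] * r2 + k(r1 - r2) - g(a)

GRID = [i / 100 for i in range(0, 301)]   # candidate actions in [0, 3]

def best_action(eps1, eps2):
    """Action on the grid that maximizes the agent's objective."""
    return max(GRID, key=lambda a: felicity_net_of_cost(a, eps1, eps2))

for eps1, eps2 in [(0.0, 0.0), (-0.7, 0.4), (1.3, -2.1)]:
    print((eps1, eps2), best_action(eps1, eps2))
```

The same check with `THETA = (A_STAR, 0.0)` and `k` applied to `r2` alone corresponds to the relative performance evaluation case.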

E Extension to The Optimal Effort Level

E.1 Illustrations for Proposition 2

E.1.1 Affine Cost of Effort

While Theorem 3 shows that $A(\eta) = \bar a$ is optimal when Proposition 3 is satisfied, we now show that $A(\eta)$ can be exactly derived even if Theorem 3 does not hold and the maximum effort principle does not apply, if the cost function is linear, i.e. $g(a) = \theta a$, where $\theta > 0$.$^{24}$ We use the benefit function $S\, b(a, \eta)$ as in Section 3.2.

Proposition 6 (Optimal contract with linear cost of effort). Let $g(a) = \theta a$, where $\theta > 0$. The following contract is optimal:

$$c = v^{-1}(\theta r + K), \tag{67}$$

where $K$ is a constant that makes the participation constraint bind ($E[u(\theta\eta + K)] = \underline{u}$). For each $\eta$, the optimal effort $A(\eta)$ is determined by the following pointwise maximization:

$$A(\eta) \in \arg\max_{a \leq \bar a}\ S\, b(a, \eta) - v^{-1}\left(\theta(a + \eta) + K\right). \tag{68}$$

When the agent is indifferent between an action $a$ and $A(\eta)$, we assume that he chooses action $A(\eta)$.

Proof. From Proposition 2, if the agent announces $\eta$, he should receive a felicity of $V(\eta) = g(A(\eta)) + \int_0^{\eta} \theta\, dx + K = \theta(A(\eta) + \eta) + K$. Since $r = A(\eta) + \eta$ on the equilibrium path, a contract $c = v^{-1}(\theta r + K)$ will implement $A(\eta)$. To find the optimal action, the principal's problem is:

$$\max_{A(\eta)}\ E\left[S\, b\left(\min(A(\eta), \bar a), \eta\right)\right] - E\left[v^{-1}\left(\theta(A(\eta) + \eta) + K\right)\right],$$

which is solved by pointwise maximization, as in (68).

The main advantage of the above contract is that it can be exactly solved regardless of $S$ and so it is applicable even for small firms (or rank-and-file employees who affect a small output). For instance, consider a benefit function $b(a, \eta) = b_0 + a e^{\eta}$, where $b_0 > 0$, so that the marginal productivity of effort is increasing in the noise, and utility function $u(\ln c - \theta a)$ with $\theta \in (0,1)$. Then, the solution of (68) is:

$$A(\eta) = \min\left(\frac{1-\theta}{\theta}\,\eta + \frac{1}{\theta}\left(\ln S - K - \ln\theta\right),\ \bar a\right).$$

The optimal effort level increases linearly with the noise, until it reaches $\bar a$. The effort level is also weakly increasing in firm size.

$^{24}$ Note that the linearity of $g(a)$ is still compatible with $u(v(c) - g(a))$ being strictly concave in $(c, a)$. Also, by a simple change of notation, the results extend to an affine rather than linear $g(a)$.

53
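The closed form for this example follows from the first-order condition of (68) with $v = \ln$, namely $S e^{\eta} = \theta e^{\theta(a + \eta) + K}$. As a rough numerical cross-check, the sketch below compares that formula with a brute-force search over a grid of actions. All parameter values ($S = 10$, $K = 0$, $\theta = 0.5$, $\bar{a} = 8$, $b_0 = 1$) and the lower bound of zero on effort are assumptions chosen for illustration; the symbol $\theta$ for the cost slope is likewise the notation assumed here.

```python
import math

def closed_form_effort(eta, S, K, theta, a_bar):
    """Interior solution of the first-order condition, capped at a_bar."""
    interior = ((1 - theta) / theta) * eta + (math.log(S) - K - math.log(theta)) / theta
    return min(interior, a_bar)

def grid_effort(eta, S, K, theta, a_bar, b0=1.0, steps=20001):
    """Brute-force the pointwise problem on a grid over [0, a_bar]."""
    best_a, best_val = 0.0, -math.inf
    for i in range(steps):
        a = a_bar * i / (steps - 1)
        # Principal's payoff: S * b(a, eta) minus pay c = exp(theta * (a + eta) + K)
        val = S * (b0 + a * math.exp(eta)) - math.exp(theta * (a + eta) + K)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

PARAMS = dict(S=10.0, K=0.0, theta=0.5, a_bar=8.0)  # hypothetical values
ETAS = (-1.0, 0.0, 1.0, 3.0)
closed = [closed_form_effort(eta, **PARAMS) for eta in ETAS]
brute = [grid_effort(eta, **PARAMS) for eta in ETAS]
```

With these values the interior solution rises one-for-one with the noise and hits the cap $\bar{a}$ at the largest noise realization, matching the description of the formula.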

Note that, with a linear rather than strictly convex cost function, the agent is indifferent between all actions. His decision problem is $\max_a v(c(r)) - g(a)$, i.e. $\max_a \theta(\eta + a) + K - \theta a$, which is independent of $a$ and thus has a continuum of solutions. As in, e.g., Grossman and Hart (1983), Proposition 6 therefore assumes that indeterminacies are resolved by the agent following the principal's recommended action, $A(\eta)$.

E.1.2 Exponential u and Linear v

We continue to assume that the maximum effort principle does not apply, and now consider the HM assumptions of exponential utility and a pecuniary cost of effort, but do not impose Gaussian noise or continuous time. We show that, as in HM, the same action function $A_t(\eta_t)$ is optimal in each period $t$. However, unlike in HM, $A_t(\eta_t)$ is not a constant independent of $\eta_t$. The intuition is that, if noise is low, the optimal contract may wish to reduce the required effort level to cushion the effect of low noise on the agent's utility.

Proposition 7 (Constant target action, exponential utility and pecuniary cost of effort). Suppose the agent has a CARA utility function $u(x) = -e^{-\gamma x}$ and a linear felicity function $v(x) = x$, and suppose the benefit of effort in each period is a weakly concave function $b(a)$. Then, the optimal contract prescribes the same (possibly noise-dependent) action $A(\eta)$ in each period.

Proof. Take an optimal contract specifying actions $A_1(\eta_1), \ldots, A_T(\eta_1, \ldots, \eta_T)$, and compensation $C(\eta_1, \ldots, \eta_T)$. Start with period $t = T$. The optimality of the contract implies that for all $(\eta_1, \ldots, \eta_{T-1})$, the choice of target action and compensation solve the optimization problem

$$\max \; E_{\eta_T}\left[ b\left(A_T(\eta_1, \ldots, \eta_{T-1}, \eta_T)\right) - C(\eta_1, \ldots, \eta_{T-1}, \eta_T) \right]$$

subject to

$$\eta_T \in \arg\max_{\hat{\eta}_T} \; -e^{-\gamma\left\{ C(\eta_1, \ldots, \eta_{T-1}, \hat{\eta}_T) - g\left(A_T(\eta_1, \ldots, \eta_{T-1}, \hat{\eta}_T) + \hat{\eta}_T - \eta_T\right) \right\}},$$

$$E_{\eta_T}\left[ -e^{-\gamma\left\{ C(\eta_1, \ldots, \eta_T) - g\left(A_T(\eta_1, \ldots, \eta_T)\right) \right\}} \right] = \underline{u}(\eta_1, \ldots, \eta_{T-1}).$$

By Proposition 2, the cost of compensation for a given action $A_T(\eta_1, \ldots, \eta_T)$ is minimized by

$$C(\eta_1, \ldots, \eta_T) = g\left(A_T(\eta_1, \ldots, \eta_T)\right) + \int^{\eta_T} A_T(\eta_1, \ldots, \eta_{T-1}, x)\, dx + K(\eta_1, \ldots, \eta_{T-1}),$$

so the principal solves a collection of problems

$$\max_{A(\cdot),\, K} \; E_{\eta_T}\left[ b\left(A(\eta_T)\right) - g\left(A(\eta_T)\right) - \int^{\eta_T} A(x)\, dx - K \right] \tag{69}$$

$$\text{s.t.} \quad E_{\eta_T}\left[ -e^{-\gamma\left( \int^{\eta_T} A(x)\, dx + K \right)} \right] = \underline{u}(\eta_1, \ldots, \eta_{T-1}) \tag{70}$$

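for (possibly) varying $\underline{u}(\eta_1, \ldots, \eta_{T-1})$. By concavity, the solutions of these problems for each $\underline{u}(\eta_1, \ldots, \eta_{T-1})$ are unique. Moreover, this uniqueness implies that the solutions for different values of $\underline{u}(\eta_1, \ldots, \eta_{T-1})$ may differ only in the constant $K$: under CARA utility, $e^{-\gamma(y + K)} = e^{-\gamma K} e^{-\gamma y}$, so a change in the reservation utility can be absorbed by a shift in $K$. Therefore, the optimal target action $A_T(\eta_1, \ldots, \eta_{T-1}, \eta_T)$ does not depend on $\eta_1, \ldots, \eta_{T-1}$. Now, since $A_{T-1}$ is the only action that can depend on $\eta_{T-1}$, the above argument can be repeated for $t = T-1, \ldots, 1$. Hence, the optimal profile of actions $A_1(\eta_1), \ldots, A_T(\eta_1, \ldots, \eta_T)$ consists of repeating the same target action $A(\eta)$, which is the unique solution of the problem (69)–(70).

Example A. Suppose $b(x) = Bx$, $g(x) = \frac{1}{2} G x^2$, $\eta \sim U[\underline{\eta}, \bar{\eta}]$. Let $y(\eta) = \int_{\underline{\eta}}^{\eta} a(x)\, dx + K$. Then, the optimal target action is the solution of

$$\max_{a(\cdot),\, y(\cdot)} \int \left( B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) \right) dx$$

$$\text{s.t.} \quad \int -e^{-\gamma y(x)}\, dx = \underline{u}, \qquad y'(x) = a(x).$$

The Lagrangian of this problem is

$$\mathcal{L} = \int \left[ B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) - \lambda e^{-\gamma y(x)} + \mu(x)\left(a(x) - y'(x)\right) \right] dx$$

$$= \int \left[ B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) - \lambda e^{-\gamma y(x)} + \mu(x) a(x) + \mu'(x) y(x) \right] dx - \mu(\bar{\eta}) y(\bar{\eta}) + \mu(\underline{\eta}) y(\underline{\eta}),$$

where $\lambda$ is the multiplier attached to the reservation utility constraint, and $\mu(x)$ is the multiplier for the equation linking $y(x)$ and $a(x)$. Note that $\mathcal{L}$ is concave in $a(x)$ and $y(x)$. The first-order conditions are

$$\frac{\partial \mathcal{L}}{\partial a(x)}: \quad B - G a(x) + \mu(x) = 0;$$

$$\frac{\partial \mathcal{L}}{\partial y(x)}: \quad -1 + \lambda \gamma e^{-\gamma y(x)} + \mu'(x) = 0;$$

$$\frac{\partial \mathcal{L}}{\partial y(\underline{\eta})},\ \frac{\partial \mathcal{L}}{\partial y(\bar{\eta})}: \quad \mu(\underline{\eta}) = \mu(\bar{\eta}) = 0.$$

Substituting the first equality into the second we get

$$-1 + \lambda \gamma e^{-\gamma y(x)} + G a'(x) = 0.$$

Rearranging and taking a logarithm gives

$$\ln(\lambda \gamma) - \gamma y(x) = \ln\left(1 - G a'(x)\right).$$

Differentiating the last equality gives

$$\gamma y'(x) = \frac{G a''(x)}{1 - G a'(x)},$$

which, using $y'(x) = a(x)$, can be simplified into

$$a''(x) = \gamma a(x)\left(1 - G a'(x)\right) / G.$$

So, the optimal action satisfies a second-order ODE with the boundary conditions

$$a(\underline{\eta}) = a(\bar{\eta}) = B/G,$$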

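and indeed does not depend on the reservation utility $\underline{u}$.

Example B. Take the same functions, $b(x) = Bx$, $g(x) = \frac{1}{2} G x^2$, and suppose that the noise is Gaussian, $\eta \sim N(0, \sigma^2)$. We will be solving the optimization problem on the interval $[-z, z]$, and then take the limit as $z \to \infty$. Similar to Example A, the Lagrangian of the problem is

$$\mathcal{L} = \int_{-z}^{z} \left[ \left( B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) - \lambda e^{-\gamma y(x)} \right) \phi\left(\tfrac{x}{\sigma}\right) + \mu(x) a(x) + \mu'(x) y(x) \right] dx - \mu(z) y(z) + \mu(-z) y(-z),$$

and the first-order conditions are

$$\frac{\partial \mathcal{L}}{\partial a(x)}: \quad \left(B - G a(x)\right) \phi\left(\tfrac{x}{\sigma}\right) + \mu(x) = 0;$$

$$\frac{\partial \mathcal{L}}{\partial y(x)}: \quad \left(-1 + \lambda \gamma e^{-\gamma y(x)}\right) \phi\left(\tfrac{x}{\sigma}\right) + \mu'(x) = 0;$$

$$\frac{\partial \mathcal{L}}{\partial y(-z)},\ \frac{\partial \mathcal{L}}{\partial y(z)}: \quad \mu(-z) = \mu(z) = 0.$$

Substituting the first equality into the second to eliminate $\mu'(x)$, and taking note that

$$\frac{1}{\phi(x/\sigma)} \frac{d\left(\phi(x/\sigma)\right)}{dx} = -\frac{x}{\sigma^2},$$

we obtain

$$-1 + \lambda \gamma e^{-\gamma y(x)} + G a'(x) - \frac{1}{\sigma^2}\, x \left(G a(x) - B\right) = 0.$$

Rearranging and taking a logarithm gives

$$\ln(\lambda \gamma) - \gamma y(x) = \ln\left( 1 - G a'(x) + \frac{1}{\sigma^2}\, x \left(G a(x) - B\right) \right).$$

Differentiating, taking note that $y'(x) = a(x)$, and rearranging yields the following: the optimal action is the limit as $z \to \infty$ of the solutions of

$$\begin{cases} a''(x) = \gamma a(x) \left[ \dfrac{1}{G} - a'(x) + \dfrac{x}{\sigma^2} \left( a(x) - \dfrac{B}{G} \right) \right] + \dfrac{x}{\sigma^2}\, a'(x) + \dfrac{1}{\sigma^2} \left( a(x) - \dfrac{B}{G} \right), \\[2ex] a(-z) = a(z) = B/G. \end{cases}$$

E.2 Conditions for Maximum Effort Principle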

Section 3.2 showed that the condition in Theorem 3, 8 ; 8a

a; @1 b (a ( ) ; ) f ( )

a;

required for the maximum e¤ort principle to hold, is satis…ed if …rm size S is su¢ ciently large. This extension considers other cases in which the above condition is satis…ed, and shows su¢ cient conditions for the function a; . By Proposition 2, the optimal contract is: c( ) = v where L ( ) = cost is:

R

g 0 (a (x)) dx;

1

is an arbitrary constant in the support of . The contract’s

C [A] = E v Then we can take a; 25 lowing expression.

(g (a ( )) + L ( ) + K) ;

1

(g (a ( )) + L ( ) + K) :

= max (0; @C [A] [email protected] ( )), where @C [A] [email protected] ( ) is given by the fol-

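Proposition 8. Assume that $\sup f(\eta) < \infty$. For an effort profile $a(\eta)$ satisfying the conditions of Proposition 2, the marginal cost of implementing effort $a(\eta)$ is:

$$\frac{\partial C[A]}{\partial a(\eta)} = \frac{g'(a(\eta))}{v'(c(\eta))}\, f(\eta) + g''(a(\eta)) \left\{ E\left[ \frac{1_{\tilde{\eta} > \eta}}{v'(c(\tilde{\eta}))} \right] - E\left[ \frac{1}{v'(c(\tilde{\eta}))} \right] \frac{E\left[ u'\left(L(\tilde{\eta}) + K\right) 1_{\tilde{\eta} > \eta} \right]}{E\left[ u'\left(L(\tilde{\eta}) + K\right) \right]} \right\}, \tag{71}$$

where the expectation is taken over $\tilde{\eta}$.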

)) The …rst term in (71), gv0(a( f ( ) ; is the “local” compensating di¤erential for inducing (c( )) greater e¤ort. Indeed, consider making the agent work a more at point e. Let c denote the 25

The proof is thus. Note that K satis…es u = E [u (L ( ) + K)]. For simplicity, we assume we can just consider a lower ). Using @L ( 0 ) [email protected] ( ) = 1 0 > g 00 (a ( )), we have: @K = @a ( )

E [u0 (L ( 0 ) + K) 1 0 > ] 00 g (a ( )) E [u0 (L ( 0 ) + K)]

which implies (71).


) : v 0 (c ( )) v (c ( ))

a; is simpler when noise is bounded both Second, the upper bound for @C[A] and thus @a( ) 000 above and below. If supp = [ ; ] and g (x) 0 for all x. Then @C [A] @a ( )

a;

v0 v

g 0 (a)f ( ) + g 00 (a) F ( ) u 1 (u) + g(a) + ( )g 0 (a)

1

In particular, in (27), the function can be replaced by the function a; is increasing in a. The proof is of (72) is thus. We observe that u 1 (u) + (

L( ) + K for any . If it does not hold for some L( )+K =

Z

0

0,

:

(72)

. We observe that

)g 0 (a);

then

g (a(x)) dx+K = L( 0 )+K +

Z

g 0 (a(x)) dx

L( 0 )+K (

)g 0 (a) > u 1 (u)

0

for all , and the constraint E [u(L( ) + K)] = u cannot be satis…ed. Let c = v 1 u 1 (u) + g(a) + ( )g 0 (a) . Then, all on the equilibrium consumptions are


no greater than c. Hence, the terms in inequality (71) can be bounded as g 0 (a(x)) f (x) v 0 (c(x))

g 0 (a) f (x); v 0 (c) 1 1 1 >x g 00 (a)E 0 1 g 00 (a(x))E 0 v (c( )) v (c)

>x

= g 00 (a)

F (x) ; v 0 (c)

which gives the claimed inequality.

E.3 Illustrations for Proposition 4

We now provide explicit conditions to verify the optimality of maximum e¤ort in the three examples in Section 3.3. Example 1. Let u (x) = x, v (x) = x , 2 (0; 1]. Consider the sub-case of u = 0 and g (a) = eGa . As stated in the paper, the objective function is: h i eGa= E (G + 1)1= :

B (a)

Call a the solution of this problem. Proposition 4 proves that implementing a among all contracts (which need not implement a ) if inf B 0 (a)f ( )

a a

is optimal

(a ; );

. where (a; ) = max 0; @C[A] @a( ) Inequality (72) establishes the bound @C [A] @a( )

(a; ) =

1

GeGa=

f ( ) + GF ( )

By Proposition 4, constant target e¤ort a necessarily requesting a constant e¤ort) if 8a , inf B 0 (a) a a

1

GeGa

=

1+(

)G

(1

)=

:

will be the optimum among all contracts (not

1 + G sup

F( ) f( )

1+(

)G

(1

)=

:

Example 2. Let v (x) = ln x, u (x) = e(1 )x = (1 ) for > 0, N (0; 2 ) and u = u (ln c). The contract specifying target e¤ort a pays c( ) = c exp g 0 (a) + g (a) (1 ) g 0 (a)2


2

=2 .

The noise is unbounded here, so we will use equality (71) directly: @C [A] = ce(g(a) @a(x)

(1

)g 0 (a)2

h 0 g 00 (a) E eg (a)

= ce(g(a) 00

(1

)g 0 (a)2

g 0 (a)2

g (a) e

2 =2

n ) g 0 (a) eg0 (a)x f (x) + i 2 2 0 1 >x g 00 (a) e(1 (1 ) )g (a) 2 =2

2 =2

x

) g 0 (a) eg0 (a)x 1

h

x

2 =2

+

x

0

(1

h E e(1

) g (a)

)g 0 (a)

0

g (a)

1

i

>x

io

:

Observing that x for some

(1

between (1 1 @C [A] f (x) @a(x)

x

) g 0 (a)

g 0 (a) =

ce(g(a)

(1

)g 0 (a)2

ce(g(a)

(1

)g 0 (a)2

n ) g 0 (a) eg0 (a)x + n 2 =2 ) g 0 (a) eg0 (a)x + 2 =2

0

g (a) eg (a)

2 2 =2

0

eg (a) max(x;(1

inf @1 b (a; ; a )

x=

;

@C [A] 1 = g 0 (a) @a(x) 1 @C [A] f (x) @a(x)

x

(a; x)

e

+ g 00 (a)

g 0 (a) +

x

, for

inf @1 b (a; ; a )

for all .


e

x=

o

o :

will be the optimum if

x

+

> 0, and

N (0;

x

g 0 (a)

0

g (a) g 00 (a) eg (a) max(0;

2 0

will be the optimum if

a a

)x)

2 2 =2

(a ; )

a a

for all . Example 3. Let v (x) = x, and u (x) = Similar to Example 2,

0

g (a) eg (a)

2 00

Let (a; x) denote the last upper bound. By Proposition 4, a

By Proposition 4, a

2 =2+

e

) g 0 (a) and g 0 (a) , we can obtain

2 00

and

x

g 0 (a)

(a ; )

;

x)

:

2

) as in HM.

F Quits and Firings

Our setup can be extended to accommodate quits and firings. We commence with the former. The agent now has an outside option available in each period $t$, and so the participation constraint in each period becomes $E_t[U_T] \ge \underline{u}_t$. As before, the principal wishes to implement $(a_t^*)_{t \le T}$, and wishes to deter quitting. This can be achieved simply by increasing the constant $K$ such that for all $t$, $E_t[U_T] \ge \underline{u}_t$. Under the conditions of Proposition 1, we can see that this is the only contract that ensures that. Economically, the agent receives rents because of his credible threat to leave in the interim periods. However, these rents only affect $K$, not the form of the contract. As in the core paper, if the benefit of effort is sufficiently high, maximum effort remains optimal.

We now turn to firings, considering $T = 2$ for simplicity and then discussing the generalizability to other $T$. Suppose that the principal wishes to fire the agent if $r_1 \in I_F$ and keep him if $r_1 \in I_F^c$, where $I_F$ and $I_F^c$ are disjoint intervals. Call $r_F$ their common boundary, i.e. the point at which the closures of $I_F$ and $I_F^c$ intersect. The next Proposition describes the contract.

Proposition 9 (Contract with firing, $T = 2$). Under the conditions of Proposition 1 plus the option to fire, the following contract is optimal: (i) if $r_1 \in I_F$, the agent is fired, and receives a payoff $c = v^{-1}\left(g'(a_1^*)\, r_1 + K_1\right)$; (ii) if $r_1 \in I_F^c$, the agent remains employed, and receives a final payoff $c = v^{-1}\left(\sum_{t=1}^{2} g'(a_t^*)\, r_t + K_2\right)$. The constants $K_1$ and $K_2$ are chosen such that the utility of the agent is continuous at $r_1 = r_F$, the cutoff return that triggers firing.

Proof. (This is a sketch of the proof, as the arguments are similar to those in the main body of the paper.) Define $\eta_{1F} = r_F - a_1^*$, the cutoff noise that divides the regions of firing and not firing. When $r_1$ lies in the interior of the no-firing region, by the logic of Proposition 1, very small deviations around $a_1^*$ will still keep $r_1$ in that interior, and so we require $c = v^{-1}\left(\sum_{t=1}^{2} g'(a_t^*)\, r_t + K_{NF}\right)$. When $r_1$ lies in the interior of the firing region, very small deviations around $a_1^*$ will still keep $r_1$ in that interior, and so we require $c = v^{-1}\left(g'(a_1^*)\, r_1 + K_F\right)$ for some other constant. The utility should be continuous at $r_F$ to preserve the IC.

Thus, the contract remains tractable even with the possibility of firing. This is because the intuition in the core model continues to hold: since the noise is observed before the action, the contract must provide sufficient incentives state-by-state and so the principal has little freedom in designing the contract. This contrasts with standard models in which the possibility of firing changes the contract significantly. The only degree of freedom for the principal is finding the domain $I_F^c$. As is standard, this will depend on the cost of finding another agent at $t = 2$. For instance, if the cost of finding a new employee is low, the domain of optimal firing might be large.

It is clear that the same logic would apply for $T > 2$. Suppose that the agent's contract terminates at (a potentially return-dependent) time $\tau$, with the same “tree” structure: at each time $t$, there is a monotone function of $(r_1, \ldots, r_t)$ such that the principal fires the agent if and only if this function is positive. Then, the compensation scheme has the following shape: if the agent works until $\tau$, he receives:

$$c = v^{-1}\left( \sum_{t=1}^{\tau} g'(a_t^*)\, r_t + K_\tau \right) \tag{73}$$

for some constants $K_1, \ldots, K_T$.

In addition, we can unify the two extensions of both quits and firings. Consider the firing model with $T = 2$. Suppose that the principal wishes to fire the agent if $r_1 \in I_F$, but also wishes to deter voluntary departures. Then, the contract is the one described in Proposition 9, but with $K_1$ and $K_2$ simply set high enough such that the agent always receives at least his reservation utility.
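The two branches of the firing contract share the same slope in the first-period return, which is what keeps incentives intact on either side of the cutoff. The sketch below illustrates only that point. It assumes log felicity ($v = \ln$), a quadratic cost $g(a) = a^2/2$, and made-up values for the target efforts and for the constants $K_1$ and $K_2$; it does not attempt to solve for the constants that make utility continuous at the cutoff.

```python
import math

A1_STAR, A2_STAR = 1.0, 1.5   # target efforts (hypothetical values)
K1, K2 = 0.2, -0.4            # contract constants (hypothetical values)

def g_prime(a):
    """Marginal cost of effort for the assumed cost g(a) = a**2 / 2."""
    return a

def pay(r1, r2, fired):
    """Pay on each branch of the contract, with v = ln so that v^{-1} = exp."""
    if fired:
        return math.exp(g_prime(A1_STAR) * r1 + K1)
    return math.exp(g_prime(A1_STAR) * r1 + g_prime(A2_STAR) * r2 + K2)

def felicity_slope_in_r1(r1, r2, fired, h=1e-6):
    """Central finite difference of v(c) = ln(c) with respect to r1."""
    up = math.log(pay(r1 + h, r2, fired))
    down = math.log(pay(r1 - h, r2, fired))
    return (up - down) / (2 * h)

# The marginal felicity from r1 equals g'(a1*) whether or not the agent is fired.
slope_fired = felicity_slope_in_r1(0.3, 0.0, fired=True)
slope_kept = felicity_slope_in_r1(2.0, 0.7, fired=False)
```

Only the constants differ across branches, so the firing rule changes the level of pay but not the local incentive to exert the first-period target effort.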

G Proofs of Mathematical Lemmas

This section contains proofs of some of the mathematical lemmas featured in the appendices of the main paper.

Proof of Lemma 4. We thank Chris Evans for suggesting the proof strategy for this Lemma. We assume $a < b$. We first prove the Lemma when $j(x) = 0\ \forall x$. For a positive integer $n$, define $k_n = (b - a)/n$, and the function $r_n(x)$ as

$$r_n(x) = \begin{cases} \dfrac{f(x) - f(x - k_n)}{k_n} & \text{for } x \in [a + k_n, b], \\[1.5ex] 0 & \text{for } x \in [a, a + k_n). \end{cases}$$

We have, for $x \in (a, b]$, $\liminf_{n \to \infty} r_n(x) \ge \liminf_{\varepsilon \downarrow 0} \frac{f(x) - f(x - \varepsilon)}{\varepsilon} \ge 0$. Define $I_n = \int_a^b r_n(x)\, dx$. As $f + h$ is nondecreasing and $h$ is $C^1$,

$$\frac{f(x) - f(x - k_n)}{k_n} \ge \frac{-h(x) + h(x - k_n)}{k_n} \ge -\sup_{[a,b]} h'(x).$$

Therefore, $r_n(x) \ge \min\left(0, -\sup_{[a,b]} h'(x)\right)\ \forall x$. Hence we can apply Fatou's lemma, which shows:

$$\liminf_{n \to \infty} I_n = \liminf_{n \to \infty} \int_a^b r_n(x)\, dx \ge \int_a^b \liminf_{n \to \infty} r_n(x)\, dx \ge 0.$$

Next, observe that $I_n = \int_{a + k_n}^{b} \frac{f(x) - f(x - k_n)}{k_n}\, dx$ consists of telescoping sums, so:

$$I_n = \int_{b - k_n}^{b} \frac{f(x)}{k_n}\, dx - \int_{a}^{a + k_n} \frac{f(x)}{k_n}\, dx = f(b) - f(a) - \int_{b - k_n}^{b} \frac{f(b) - f(x)}{k_n}\, dx - \int_{a}^{a + k_n} \frac{f(x) - f(a)}{k_n}\, dx = f(b) - f(a) - B_n - A_n.$$

We first minorize $A_n$. From condition (ii) of the Lemma, for any $\varepsilon > 0$, there is a $\delta > 0$ such that for $x \in [a, a + \delta]$, $f(x) - f(a) \ge -\varepsilon$. For $n$ large enough such that $k_n \le \delta$,

$$A_n = \int_{a}^{a + k_n} \frac{f(x) - f(a)}{k_n}\, dx \ge \int_{a}^{a + k_n} \frac{-\varepsilon}{k_n}\, dx = -\varepsilon,$$

and so $\liminf_{n \to \infty} A_n \ge 0$.

We next minorize $B_n$. Since $f'(b) \ge 0$, for every $\varepsilon > 0$, there exists a $\delta > 0$ s.t. for $x \in [b - \delta, b)$, $(f(b) - f(x))/(b - x) \ge -\varepsilon$. Therefore, for $n$ sufficiently large so that $k_n \le \delta$,

$$B_n = \int_{b - k_n}^{b} \frac{f(b) - f(x)}{k_n}\, dx \ge \int_{b - k_n}^{b} \frac{(-\varepsilon)(b - x)}{k_n}\, dx = -\varepsilon\, \frac{k_n}{2},$$

and so $\liminf_{n \to \infty} B_n \ge 0$.

Finally, since $f(b) - f(a) = I_n + A_n + B_n$, we have

$$f(b) - f(a) = \liminf_{n \to \infty} \left(I_n + A_n + B_n\right) \ge \liminf_{n \to \infty} I_n + \liminf_{n \to \infty} A_n + \liminf_{n \to \infty} B_n \ge 0.$$

We now prove the general case. Define $F(x) = f(x) - \int_a^x j(t)\, dt$. Then, $F'(x) \ge 0$. By the above result, $F(b) - F(a) \ge 0$.
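The telescoping decomposition $f(b) - f(a) = I_n + A_n + B_n$ used in the proof of Lemma 4 can be verified numerically. The sketch below is an illustration only: the test function, the interval and the midpoint-rule integrator are arbitrary choices made here, and a smooth function is used for convenience even though the Lemma itself concerns less regular functions.

```python
import math

def integrate(fn, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    width = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * width) for i in range(n)) * width

def decomposition(f, a, b, n):
    """Return (I_n, A_n, B_n) for step size k_n = (b - a) / n."""
    k = (b - a) / n
    I_n = integrate(lambda x: (f(x) - f(x - k)) / k, a + k, b)
    A_n = integrate(lambda x: (f(x) - f(a)) / k, a, a + k)
    B_n = integrate(lambda x: (f(b) - f(x)) / k, b - k, b)
    return I_n, A_n, B_n

def f(x):
    """Smooth test function (an arbitrary choice for illustration)."""
    return math.sin(3 * x) + x ** 2

a, b = 0.0, 1.0
results = {n: decomposition(f, a, b, n) for n in (2, 10, 100)}
gaps = {n: abs(sum(parts) - (f(b) - f(a))) for n, parts in results.items()}
```

For this smooth example the boundary terms $A_n$ and $B_n$ shrink as $n$ grows, so $I_n$ approaches $f(b) - f(a)$; the proof only needs the weaker fact that their limits inferior are nonnegative.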

Proof of Lemma 5 Let (yn ) " x be a sequence such that f (x) yn "x x

f 0 (x) = lim

f (yn ) : yn

We can further assume that limn!1 f (yn ) exists (if not, then we can choose a subsequence ynk such that limnk !1 f (ynk ) exists and replace yn by ynk ). If limn!1 f (yn ) = f (x), Then, h f (x) h f (y) y"x x y h f (x) h f (yn ) lim yn "x x yn h f (x) h f (yn ) f (x) = lim yn "x f (x) f (yn ) x

(h f )0 (x) = lim inf

f (yn ) yn

= h0 (f (x)) f 0 (x) : If limn!1 f (yn ) < f (x), then f 0 (x) = 1, since h0 (f (x)) > 0, we still have (h f )0 (x) h0 (f (x)) f 0 (x). If limn!1 f (yn ) > f (x), then (h f )0 (x) limyn "x h f (x)x hynf (yn ) = 1, hence (h f )0 (x) h0 (f (x)) f 0 (x). 63

On the other hand, suppose (^ yn ) " x be a sequence such that h f (x) y^n "x x

(h f )0 (x) = lim

h f (^ yn ) ; y^n

and that limn!1 f (^ yn ) exists. If limn!1 f (^ yn ) = f (x), Then, h f (x) y^n "x x h f (x) = lim y^n "x f (x) h f (x) = lim y^n "x f (x)

h f (^ yn ) y^n yn ) h f (^ yn ) f (x) f (^ f (^ yn ) x y^n f (x) f (^ yn ) h f (^ yn ) lim y^n "x f (^ yn ) x y^n f (x) f (^ yn ) = h0 (f (x)) lim 5 y^n "x x y^n

(h f )0 (x) = lim

h0 (f (x)) f 0 (x) : Note that the existence of limy^n "x

h f (x) h f (^ yn ) x y^n

h f (x) h f (^ yn ) f (x) f (^ yn )

and limy^n "x

guarantees the ex-

istence of limy^n "x f (x)x yf^n(^yn ) . If limn!1 f (^ yn ) < f (x), then (h f )0 (x) = 1 h0 (f (x)) f 0 (x). If limn!1 f (^ yn ) > f (x), then f 0 (x) limy^n "x f (x)x yfn(^yn ) = 1 (h f )0 (x). Therefore, (h f )0 (x) = h0 (f (x)) f 0 (x). Proof of Lemma 6 We use f (x) + h (x) f (y) h (y) f (x) f (y) h (x) = lim inf + y"x y"x x y x y x f (x) f (y) h (x) h (y) lim inf + lim inf = f 0 (x) + h0 (x) . y"x y"x x y x y

(f + h)0 (x) = lim inf

h (y) y

When h is di¤erentiable at x, (f + h)0 (x) = lim inf y"x

f (x) x

f (y) h (x) + lim y"x y x

h (y) = f 0 (x) + h0 (x) : y

Proof of Lemma 7 We wish to prove that E [h (X)] E [h (Y )] for any concave function h. De…ne I ( ) = E [h (X + (Y X))] for 2 [0; 1], so that I 00 ( ) = E h00 (X + (Y I 0 (0) = E [h0 (X) (Y

X)) (Y

X)2

X)] = E h0 (X)

Z

0


0 T t dZt

;

where t = t 0 almost surely. We wish to prove I (1) I (0). Since I is t , and t 0 concave, it is su¢ cient to prove that I (0) 0. We next use some basic results from Malliavin calculus (see, e.g., Di Nunno, Oksendal and Proske (2008)). The integration by parts formula for Malliavin calculus yields: 0

Z

0

I (0) = E h (X)

T t dZt

=E

0

Z

T

(Dt h0 (X)) t dt ;

0

where Dt h0 (X) is the Malliavin derivative of h0 (X) at time t. Since ( s )s2[0;T ] is deterministic. Therefore, the calculation of Dt h0 (X) is straightforward: 0

Dt h (X)

Z

0

Dt h

T s dZs

00

=h

Z

T s dZs

t

= h00 (X)

t:

0

0

Hence, we have: 0

I (0) = E

Z

T 0

(Dt h (X)) t dt = E

T

h00 (X)

t t dt

:

0

0

Since h00 (X) 0 (because h is concave), and Therefore, I 0 (0) 0 as required.

Z

t

and


t

are nonnegative, we have h00 (X)

t t

0.
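The conclusion of Lemma 7, that $E[h(Y)] \le E[h(X)]$ for concave $h$, can be illustrated by simulation in a simple special case. The sketch below assumes that $X$ and $Y$ are stochastic integrals against the same Brownian motion with deterministic integrands; the names `alpha` and `delta`, their values, the test function $h(x) = -x^2$ and the discretization are all choices made here for illustration and are not taken from the paper.

```python
import math
import random

def simulate(n_paths=4000, n_steps=40, horizon=1.0, seed=0):
    """Simulate X = sum(alpha * dZ) and Y = sum((alpha + delta) * dZ) on shared paths."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    xs, ys = [], []
    for _ in range(n_paths):
        x = y = 0.0
        for step in range(n_steps):
            t = step * dt
            dz = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            alpha = 1.0                          # assumed nonnegative integrand of X
            delta = t                            # assumed nonnegative extra loading in Y
            x += alpha * dz
            y += (alpha + delta) * dz
        xs.append(x)
        ys.append(y)
    return xs, ys

def h(x):
    """A concave test function."""
    return -x * x

def interpolated_mean(lam, xs, ys):
    """Sample analogue of I(lambda) = E[h(X + lambda * (Y - X))]."""
    return sum(h(x + lam * (y - x)) for x, y in zip(xs, ys)) / len(xs)

xs, ys = simulate()
i_zero = interpolated_mean(0.0, xs, ys)
i_half = interpolated_mean(0.5, xs, ys)
i_one = interpolated_mean(1.0, xs, ys)
```

In this setup `i_one` estimates $E[h(Y)]$ and `i_zero` estimates $E[h(X)]$; the midpoint value lies weakly above the average of the two endpoints, reflecting the concavity of $I$ that the proof exploits.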

References

[1] Di Nunno, G., B. Oksendal and F. Proske (2008): Malliavin Calculus for Lévy Processes with Applications to Finance, Springer Verlag.

[2] Holmstrom, B. (1979): “Moral Hazard and Observability.” Bell Journal of Economics 10, 74-91.

[3] Holmstrom, B. (1982): “Moral Hazard in Teams.” Bell Journal of Economics 13, 324-340.




Xavier Gabaix Finance Department Stern School of Business New York University 44 West 4th Street, 9-190 New York, NY 10012, USA

Email: [email protected]

Email: [email protected]

For further Discussion Papers by this author see:

For further Discussion Papers by this author see:

www.cepr.org/pubs/new-dps/dplist.asp?authorid=164005

www.cepr.org/pubs/new-dps/dplist.asp?authorid=139904

Submitted 19 November 2009

For helpful comments, we thank three anonymous referees, the Editor (Stephen Morris), Andy Abel, Franklin Allen, Heski Bar-Isaac, Patrick Cheridito, Peter DeMarzo, Ingolf Dittmann, Florian Ederer, Chris Evans, Itay Goldstein, Gary Gorton, Narayana Kocherlakota, Ernst Maug, Holger Mueller, Christine Parlour, David Pearce, Canice Prendergast, Michael Roberts, Tomasz Sadzik, Yuliy Sannikov, Nick Souleles, Rob Stambaugh, Luke Taylor, Rob Tumarkin, Bilge Yilmaz, and seminar participants at AEA, Five Star, Chicago, Columbia, Harvard-MIT Organizational Economics, Minneapolis Fed, Northwestern, NYU, Princeton, Richmond Fed, Stanford, Toulouse, WFA, Wharton and Wisconsin. We thank Andrei Savotchkine and Qi Liu for excellent research assistance. AE is grateful for the hospitality of the NYU Stern School of Business, where part of this research was carried out. XG thanks the NSF for financial support. This paper was formerly circulated under the title “Tractability and Detail-Neutrality in Incentive Contracting.”

1 Introduction

The principal-agent problem is central to many economic settings, such as employment contracts, insurance, taxation and regulation. A vast literature analyzing this problem has found that it is typically difficult to solve: even in simple settings, the optimal contract can be highly complex (see, e.g., Grossman and Hart (1983)). The first-order approach is often invalid, requiring the use of more intricate techniques. Even if an optimal contract can be derived, it is often not attainable in closed form, which reduces tractability, a particularly important feature in applied theory models.

Against this backdrop, Holmstrom and Milgrom (1987, “HM”) made a major breakthrough by showing that the optimal contract is linear in profits under certain conditions. Their result has since been widely used by applied theorists to justify assuming a linear contract, which leads to substantial tractability. However, HM emphasized that their result only holds under exponential utility, a pecuniary cost of effort, Gaussian noise, and continuous time. These assumptions may not hold in a number of situations: for example, there is ample evidence of decreasing absolute risk aversion, and many effort decisions do not involve a monetary expenditure (e.g. exerting effort rather than shirking, or forgoing private benefits). In addition, in certain settings, the modeler may wish to use discrete time or binary noise for simplicity.

Can tractable contracts be achieved in broader settings? When allowing for alternative utility functions or noise distributions, do these details affect the form of the optimal contract? What factors do and do not matter for the incentive scheme? These questions are the focus of our paper.

We consider a discrete-time, multiperiod model where the agent consumes only in the final period. We first solve for the cheapest contract that implements a given, but possibly time-varying, path of target effort levels. The optimal incentive scheme is tractable, i.e. attainable in closed form. The key source of tractability is our timing assumption that, in each period, the agent first observes noise and then exerts effort, before observing the noise in the next period. This is similar to theories in which the agent observes total cash flow before deciding how much to divert (e.g. Lacker and Weinberg (1989), DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007)). Since the agent knows the noise realization when taking his action, incentive compatibility requires the agent's marginal incentives to be sufficient state-by-state (i.e. for every possible noise outcome), which tightly constrains the set of admissible contracts. By contrast, if the action were taken before the noise, incentive compatibility would only pin down marginal incentives on average. There are many possible contracts that induce incentive compatibility on average, and the problem is complex as the principal must solve for the cheapest contract out of this continuum. Note that the timing assumption does not change the fact that the agent faces uncertainty when deciding his effort level since each action, except the final one, continues to be followed by noise. Even in a one-period model, the agent faces risk after signing the contract.

The analysis demonstrates what features of the environment do and do not matter for the optimal implementation contract. The contract's functional form is independent of the agent's noise distribution and reservation utility, i.e. it can be written without references to these parameters. The functional form depends only on how the agent trades off the benefits of cash against the cost of providing effort. Moreover, the contract's slope, as well as its functional form, is independent of the agent's utility function, reservation utility and noise distribution in two cases. First, if the cost of effort is pecuniary as in HM (i.e. can be expressed as a subtraction to cash pay), the incentive scheme is linear in output regardless of these parameters, even if the cost of effort is itself non-linear. Second, if the agent's preferences are multiplicative in cash and effort, the contract is independent of utility and log-linear, i.e. the percentage change in pay is linear in output. This robustness contrasts with many classical principal-agent models (e.g. Grossman and Hart (1983)), where even the implementation contract is contingent upon many specific features of the contracting situation. This poses practical difficulties, as some of the important determinants are difficult for the principal to observe and thus use to guide the contract, such as the noise distribution and agent's utility function. Our results suggest that, under some specifications, the implementation contract is robust to such parametric uncertainty.

Closed-form solutions allow the economic implications of a contract to be transparent. We consider an application to CEO incentives to demonstrate the implications that can flow from a tractable contract structure. For CEOs, the appropriate output measure is the percentage stock return, and multiplicative preferences are theoretically motivated by Edmans, Gabaix and Landier (2009). The percentage change in pay is thus linear in the percentage change in firm value, i.e. the relevant measure of incentives is the elasticity of pay with respect to firm value. This analysis provides a theoretical justification for using elasticities to measure incentives, a metric previously advocated by Murphy (1999) on empirical grounds.

The above results are derived under a general contracting framework, where the contract may depend on messages sent by the agent to the principal, and also be stochastic. Using recent advances in continuous-time contracting (Sannikov (2008)), we then show that the contract retains the same form in a continuous-time model where noise and effort occur simultaneously. This consistency suggests that, if underlying reality is continuous time, it is best approximated in discrete time by modeling noise before effort in each period.

We next allow the target effort path to depend on the noise realizations. The optimal contract now depends on messages sent by the agent regarding the noise. However, it remains tractable, for a given “action function” that links the observed noise to the principal's recommended effort level. We then solve for the optimal action function chosen by the principal. In classical agency models, the chosen action is the result of a trade-off between the benefits of effort (which are increasing in firm size) and its costs (direct disutility plus the risk imposed by incentives, which are of similar order of magnitude to the agent's wage). We show that, if the output under the agent's control is sufficiently large compared to his salary (e.g. the agent is a CEO who affects total firm value), these trade-off considerations disappear: the benefits of effort swamp the costs. Thus, maximum effort is optimal, regardless of the noise outcome.

The “maximum e¤ort principle”1 , when applicable, signi…cantly increases tractability, since it removes the need to solve the trade-o¤ required to derive the optimal e¤ort level when it is interior. Indeed, jointly deriving the optimal e¤ort level and the e¢ cient contract that implements it can be highly complex. Thus, many contracting papers focus exclusively (e.g. Dittmann and Maug (2007) and Dittmann, Maug and Spalt (2009)) or predominantly (e.g. Grossman and Hart (1983), Lacker and Weinberg (1989), Biais et al. (2009), He (2009a, 2009b)) on implementing a …xed target e¤ort level; see also the overview of the literature in Chapters 4 and 8 in La¤ont and Martimort (2002). Our result rationalizes this approach: if maximum e¤ort is always e¢ cient, the problem of deriving optimal e¤ort has a simple solution –there is no trade-o¤ to be simultaneously tackled and the analysis can focus on the cheapest contract to implement this e¤ort level. Finally, we allow the principal to choose the maximum productive e¤ort level depending on the costs and bene…ts of the environment. We extend the model to a two-stage game. In the …rst stage, the principal chooses the maximum productive e¤ort level, e.g. by selecting the size of the plant. In the second stage, the contract is played out as before –the principal wishes the agent to run the plant (whatever its size) with maximum e¢ ciency. As in standard models, the e¤ort level set in the …rst stage is typically decreasing in the agent’s risk aversion, cost of e¤ort and noise dispersion. Thus, our setup allows for contracts that are simple (since the maximum e¤ort principle applies in the second stage and so solving for a trade-o¤ is not required) yet still respond to the costs and bene…ts of the environment and thus generate comparative static predictions. In sum, our analysis generates a set of su¢ cient conditions to obtain tractable contracts. 
For the implementation contract to be tractable, modeling the action after the noise is sufficient; for the full contract that also solves for the optimal effort level, ex-post actions plus a high benefit of effort are sufficient – in turn, large firm size is sufficient (although not necessary) for the latter. These sufficient conditions are quite different from the HM assumptions of exponential utility, a pecuniary cost of effort, Gaussian noise, and continuous time, and so may be satisfied in many settings in which the HM assumptions do not hold and tractability was previously believed to be unattainable.

1 We allow for the agent to exert effort that does not benefit the principal. The “maximum effort principle” refers to the maximum productive effort that the agent can undertake to benefit the principal.

We achieve simple contracts in other settings than HM due to a different modeling setup. In a dynamic setting, high prior period outcomes increase the agent’s wealth and distort the current period decision through two “wealth effects.” First, higher wealth affects the agent’s current risk aversion and thus effort choice. HM assume exponential utility to remove this effect. Second, higher wealth reduces the agent’s marginal utility of money; if the marginal cost of effort is unchanged, the agent has fewer incentives to exert effort. This problem occurs with any risk-averse utility function, including exponential utility. HM assume that the cost of effort is pecuniary, so that it also declines when wealth increases. HM require these two assumptions to remove the intertemporal link between periods and allow the multiperiod problem to collapse into a succession of identical static problems.

Even the single-period problem remains potentially complex, since many contracts satisfy the incentive compatibility condition on average. HM address this by giving the agent substantial freedom – rather than simply selecting the mean return of the firm, he has control over the probabilities of N different states of nature.2 This freedom simplifies the contracting problem by reducing the set of allowable contracts. However, this formulation is more cumbersome since effort is the choice of a probability vector, and is thus relatively seldom used in applied theory models. We model effort as a scalar that affects the firm’s mean return, because this formulation is most commonly used in theoretical applications owing to its simplicity. We instead give the agent freedom by specifying the noise before the action – a choice that is not possible when effort involves the selection of probabilities, since noise unavoidably follows the action.

In addition to achieving tractability by forcing the contract to hold state-by-state, the timing assumption also removes the need for exponential utility by allowing the multiperiod model to be solved by backward induction, so that it becomes a succession of single-period problems. In the single-period problem, the noise is observed before the action – thus, the agent’s risk aversion is unimportant and exponential utility is not required. A potential intertemporal link remains since high past outcomes, or high current noise, mean that the agent already expects high consumption and thus has a lower incentive to exert effort, if he exhibits diminishing marginal utility. This issue is present in the Mirrlees (1974) contract if the agent can observe past outcomes.
Put differently, in the single-period problem, the agent does not face risk (as the noise is known) but faces distortion (as the noise affects his effort incentives). The optimal contract must address these issues: if the utility function is concave, the contract is convex so that, at high levels of consumption, the agent is awarded a greater number of dollars for exerting effort, to offset the lower marginal utility of each additional dollar. Allowing for convex contracts also allows us to drop the second critical assumption of a pecuniary cost of effort. Even if high wealth reduces the marginal utility of cash but not the marginal cost of effort, incentives are preserved because the contract is steeper at high wealth levels.

In addition to its results, the paper’s proofs import and extend some mathematical techniques that are relatively rare in economics and may be of use in future models. We use the subderivative, a generalization of the derivative that allows for quasi first-order conditions even if the objective function is not everywhere differentiable. This concept is related to Krishna and Maenner’s (2001) use of the subgradient, although the applications are quite different. It allows us to avoid the first-order approach, and so may be useful for models where sufficient conditions for the first-order approach cannot be verified.3 We also use the notion of “relative dispersion” to prove that the incentive compatibility constraints bind, i.e. the principal imposes the minimum slope that induces effort. We show that the binding contract is less dispersed than alternative solutions, constituting efficient risk sharing. A similar argument rules out stochastic contracts, where the payout is a random function of output.4 We extend a result from Landsberger and Meilijson (1994), who use relative dispersion in another economic setting.

This paper builds on a rich literature on tractable multiperiod agency problems. HM show the optimal contract is linear in profits under exponential utility and a pecuniary cost of effort, if the agent controls only the drift of the process and time is continuous; they show that this result does not hold in discrete time. A number of papers have extended their result to more general settings, although all continue to require exponential utility and a pecuniary cost of effort. In Sung (1995) and Ou-Yang (2003), the agent also controls the diffusion of the process in continuous time. Hellwig and Schmidt (2002) achieve linearity in discrete time, under the additional assumptions that the agent can destroy profits before reporting them to the principal, and that the principal can only observe output in the final period. Our setting allows the principal to observe signals in each period. Mueller (2000) shows that linear contracts are not optimal in HM if the agent can only change the drift at discrete points, even if these points are numerous and so the model closely approximates continuous time.

Our modeling of noise before the action is most similar to models in which the agent can observe total cash flow before deciding how much to divert. Lacker and Weinberg (1989) show that the optimal contract to deter all diversion (the analog of maximum effort) is piecewise linear, regardless of the noise distribution and utility function. Their core result is similar to a specific case of our Theorem 1, restricted to a pecuniary cost of effort and a single period. In DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007), the optimal contract is linear because the agent is risk-neutral – therefore, there is no issue with wealth affecting risk aversion (which is always zero) nor the marginal benefit of diversion (which is constant for each dollar diverted). The risk-neutral version of Garrett and Pavan (2009) also predicts linear contracts. Our setting considers risk aversion, where high past output reduces the marginal benefit of effort, thus requiring a convex contract to preserve incentives.

This paper proceeds as follows. In Section 2 we derive tractable contracts in both discrete and continuous time, given a target path of effort levels. Section 3 allows the effort level to depend on the noise realization, derives conditions under which maximum productive effort is optimal for all noise outcomes, and allows the principal to determine this maximum according to the environment. Section 4 concludes. The Appendix contains proofs and other additional materials; further peripheral material is in the Online Appendix.

2 This specification refers to the discrete-time version of the HM model, as this is most comparable to our setting. In that version, the contract is linear in accounts, although not linear in profits.

3 See Rogerson (1985) for sufficient conditions for the first-order approach to be valid under a single signal, and Jewitt (1988) for situations in which the principal can observe multiple signals. Schaettler and Sung (1993) derive sufficient conditions for the first-order approach to be valid in a large class of principal-agent problems, of which HM is a special case.

4 With separable utility, it is simple to show that the constraints bind: the principal offers the least risky contract that achieves incentive compatibility. With non-separable utility, introducing additional randomization by giving the agent a riskier contract than necessary may be desirable (Arnott and Stiglitz (1988)) – an example of the theory of second best. We use the concept of relative dispersion to prove that constraints bind.

2 The Core Model

2.1 Discrete Time

We consider a T-period model; its key parameters are summarized in Table 1. In each period t, the agent observes noise $\eta_t$, takes an unobservable action $a_t$, and then observes the noise in period t + 1. The action $a_t$ is broadly defined to encompass any decision that benefits output but is personally costly to the agent. The main interpretation is effort, but it can also refer to rent extraction: low $a_t$ reflects cash flow diversion or the pursuit of private benefits. We assume that noises $\eta_1, \ldots, \eta_T$ are independent with interval support with interior $(\underline{\eta}_t, \overline{\eta}_t)$, where the bounds may be infinite, and that $\eta_2, \ldots, \eta_T$ have log-concave densities.5 We require no other distributional assumption for $\eta_t$; in particular, it need not be Gaussian. The action space A has interval support, bounded below and above by $\underline{a}$ and $\overline{a}$. We allow for both open and closed action sets and for the bounds to be infinite. After the action is taken, a verifiable signal

$$r_t = a_t + \eta_t \tag{1}$$

is publicly observed at the end of each period t.

Insert Table 1 about here

Our assumption that $\eta_t$ precedes $a_t$ is featured in models in which the agent sees total output before deciding how much to divert (e.g. Lacker and Weinberg (1989), DeMarzo and Fishman (2007), Biais et al. (2007)), or observes the “state of nature” before choosing effort (e.g. Harris and Raviv (1979), Sappington (1983), Baker (1992), and Prendergast (2002)6). Note that this timing assumption does not make the agent immune to risk – in every period, except the final one, his action is followed by noise. Even in a one-period model, the agent bears risk as the noise is unknown when he signs the contract. In Section 2.2 we show that the contract has the same functional form in continuous time, where $\eta$ and a are simultaneous. While the timing assumption extends the model’s applicability to a cash flow diversion setting (an application that is not possible if noise follows the action), a limitation is that $\eta_t$ cannot be interpreted as measurement error.

5 A random variable is log-concave if it has a density with respect to the Lebesgue measure, and the log of this density is a concave function. Many standard density functions are log-concave, in particular the Gaussian, uniform, exponential, Laplace, Dirichlet, Weibull, and beta distributions (see, e.g., Caplin and Nalebuff (1991)). On the other hand, most fat-tailed distributions are not log-concave, such as the Pareto distribution.

6 In such papers, the optimal action typically depends on the state of nature. We allow for such dependence in Section 3.1.
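The log-concavity condition in footnote 5 can be checked numerically for a candidate noise density. The snippet below is a minimal sketch, not part of the paper: it tests whether the second differences of a log density are non-positive on an evenly spaced grid. The function names, grids, and parameter values (`alpha`, `x_min`) are illustrative assumptions.

```python
import math

def is_log_concave(log_density, grid, tol=1e-9):
    """Return True if the second differences of log_density are <= tol on an evenly spaced grid."""
    values = [log_density(x) for x in grid]
    return all(
        values[i - 1] - 2.0 * values[i] + values[i + 1] <= tol
        for i in range(1, len(values) - 1)
    )

def gaussian_log_density(x, mu=0.0, sigma=1.0):
    # Log of the normal density; quadratic in x, hence concave.
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def pareto_log_density(x, alpha=2.0, x_min=1.0):
    # Log of the Pareto density on x >= x_min; convex in x because of the -log(x) term.
    return math.log(alpha) + alpha * math.log(x_min) - (alpha + 1.0) * math.log(x)

# Hypothetical evaluation grids (step 0.1) inside each support.
gaussian_grid = [-4.0 + 0.1 * i for i in range(81)]
pareto_grid = [1.0 + 0.1 * i for i in range(81)]

gaussian_ok = is_log_concave(gaussian_log_density, gaussian_grid)
pareto_ok = is_log_concave(pareto_log_density, pareto_grid)
```

A grid check of this kind can only reject log-concavity or fail to reject it on the points examined; it is a quick screen for a noise specification, not a proof.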

In period T, the principal pays the agent cash of c.7 The agent’s utility function is

$$E\left[u\left(v(c) - \sum_{t=1}^{T} g(a_t)\right)\right]. \tag{2}$$

g represents the cost of effort, which is increasing and weakly convex. u is the utility function and v is the felicity8 function which denotes the agent’s utility from cash; both are increasing and weakly concave. g, u and v are all twice continuously differentiable. We specify functions for both utility and felicity to maximize the generality of the setup. For example, the utility function $(c e^{-g(a)})^{1-\gamma}/(1-\gamma)$ is commonly used in macroeconomics (see e.g. Cooley and Prescott (1995)), which entails $u(x) = e^{(1-\gamma)x}/(1-\gamma)$ (with $\gamma > 1$ so that u is concave; when $\gamma = 1$, the limit is understood as $u(x) = x$) and $v(x) = \ln x$. The case $u(x) = x$ denotes additively separable preferences; $v(c) = \ln c$ generates multiplicative preferences. If $v(c) = c$, the cost of effort is expressed as a subtraction to cash pay. This is appropriate if effort represents an opportunity cost of foregoing an alternative income-generating activity (e.g. outside consulting), or involves a financial expenditure. HM assume $u(x) = -e^{-\gamma x}$ and $v(c) = c$.

The only assumption that we make for the utility function u is that it exhibits nonincreasing absolute risk aversion (NIARA), i.e. $-u''(x)/u'(x)$ is nonincreasing in x. Many common utility functions (e.g. constant absolute risk aversion $u(x) = -e^{-\gamma x}$ and constant relative risk aversion $u(x) = x^{1-\gamma}/(1-\gamma)$, $\gamma > 0$) exhibit NIARA. This assumption turns out to be sufficient to rule out randomized contracts.

The agent’s reservation utility is given by $\underline{u} \in \operatorname{Im} u$, where $\operatorname{Im} u$ is the image of u, i.e. the range of values taken by u. We assume that $\operatorname{Im} v = \mathbb{R}$ so that we can apply the $v^{-1}$ function to any real number.9 We take an optimal contracting approach that imposes no restrictions on the contracting space available to the principal, so the contract $\tilde{c}(\cdot)$ can be stochastic, nonlinear in the signals $r_t$, and depend on messages $M_t$ sent by the agent.
By the revelation principle, we can assume that the space of messages $M_t$ is $\mathbb{R}$ and that the principal wishes to induce truth-telling by the agent. The full timing is as follows:

1. The principal proposes a (possibly stochastic) contract $\tilde{c}(r_1, \ldots, r_T, M_1, \ldots, M_T)$.
2. The agent agrees to the contract or receives his reservation utility $\underline{u}$.
3. The agent observes noise $\eta_1$, sends the principal a message $M_1$, then exerts effort $a_1$.
4. The signal $r_1 = \eta_1 + a_1$ is publicly observed.
5. Steps (3)-(4) are repeated for t = 2, ..., T.
6. The principal pays the agent $\tilde{c}(r_1, \ldots, r_T, M_1, \ldots, M_T)$.

7 If the agent quits before time T, he receives a very low wage $\underline{c}$.

8 We note that the term “felicity” is typically used to denote one-period utility in an intertemporal model. We use it in a non-standard manner here to distinguish it from the utility function u.

9 This assumption could be weakened. With K defined as in Theorem 1, it is sufficient to assume that there exists a value of K which makes the participation constraint bind, and a “threat consumption” which deters the agent from exerting very low effort, i.e. $\inf_c v(c) - \inf_{a_t} \sum_t g(a_t) \le \sum_t g'(a_t^*)(\eta_t + a_t^*) + K - \sum_t g(a_t^*)$.

Throughout most of the paper, we abstract from imperfect commitment problems and focus on a single source of market imperfection: moral hazard. This assumption is common in the dynamic moral hazard literature: see, e.g., Rogerson (1985), HM, Spear and Srivastava (1987), Phelan and Townsend (1991), Biais et al. (2007, 2009). The Online Appendix extends the model to accommodate quits and firings.

As in Grossman and Hart (1983), in this section we fix the path of effort levels that the principal wants to implement at $(a_t^*)_{t=1,\ldots,T}$, where $a_t^* > \underline{a}$ and $a_t^*$ may be time-varying.10 An admissible contract gives the agent an expected utility of at least $\underline{u}$ and induces him to take path $(a_t^*)$ and truthfully report noises $(\eta_t)_{t=1,\ldots,T}$. The principal is risk-neutral, and so the optimal contract is the admissible contract with the lowest expected cost $E[\tilde{c}]$. Section 3 studies the optimal effort level.

We now formally define the principal’s program. Let $\mathcal{F}_t$ be the filtration induced by $(\eta_1, \ldots, \eta_t)$, the noise revealed up to time t. The agent’s policy is $(a, M) = (a_1, \ldots, a_T, M_1, \ldots, M_T)$, where $a_t$ and $M_t$ are $\mathcal{F}_t$-measurable. $a_t$ is the effort taken by the agent if noise $(\eta_1, \ldots, \eta_t)$ has been realized, and $M_t$ is a message sent by the agent upon observing $(\eta_1, \ldots, \eta_t)$. Let S denote the space of such policies, and $\Delta(S)$ the set of randomized policies. Define $(a^*, M^*) = (a_1^*, \ldots, a_T^*, M_1^*, \ldots, M_T^*)$ as the policy of exerting effort $a_t^*$ at time t and sending the truthful message $M_t^*(\eta_1, \ldots, \eta_t) = \eta_t$. The program is given below:

Program 1 The principal chooses a contract $\tilde{c}(r_1, \ldots, r_T, M_1, \ldots, M_T)$ and an $\mathcal{F}_t$-measurable message policy $(M_t^*)_{t=1,\ldots,T}$ that minimizes expected cost:

$$\min_{\tilde{c}(\cdot)} E\left[\tilde{c}(a_1^* + \eta_1, \ldots, a_T^* + \eta_T, M_1^*, \ldots, M_T^*)\right], \tag{3}$$

subject to the following constraints:

$$\text{IC: } (a_t^*, M_t^*)_{t=1,\ldots,T} \in \arg\max_{(a,M) \in \Delta(S)} E\left[u\left(v\left(\tilde{c}(a_1 + \eta_1, \ldots, a_T + \eta_T, M_1, \ldots, M_T)\right) - \sum_{s=1}^{T} g(a_s)\right)\right] \tag{4}$$

$$\text{IR: } E\left[u\left(v\left(\tilde{c}(\cdot)\right) - \sum_{t=1}^{T} g(a_t^*)\right)\right] \ge \underline{u}. \tag{5}$$

10 If $a_t^* = \underline{a}$, then a flat wage induces the optimal action.


If the analysis is restricted to message-free contracts, (4) implies that the time-t action $a_t^*$ is given by:

$$\forall\, \eta_1, \ldots, \eta_t: \quad a_t^* \in \arg\max_{a_t} E\left[u\left(v\left(\tilde{c}(a_1^* + \eta_1, \ldots, a_t + \eta_t, \ldots, a_T^* + \eta_T)\right) - g(a_t) - \sum_{s=1, s \neq t}^{T} g(a_s^*)\right) \,\middle|\, \eta_1, \ldots, \eta_t\right]. \tag{6}$$

Theorem 1 below describes our solution to Program 1.11

Theorem 1 (Optimal contract, discrete time). The following contract is optimal. The agent is paid

$$c = v^{-1}\left(\sum_{t=1}^{T} g'(a_t^*)\, r_t + K\right), \tag{7}$$

where K is a constant that makes the participation constraint bind ($E\left[u\left(\sum_t g'(a_t^*)\, r_t + K - \sum_t g(a_t^*)\right)\right] = \underline{u}$). The functional form (7) is independent of the utility function u, the reservation utility $\underline{u}$, and the distribution of the noise $\eta$; these parameters affect only the scalar K. The optimal contract is deterministic and does not require messages. In particular, if the target action is time-independent ($a_t^* = a^*$ ∀ t), the contract

$$c = v^{-1}\left(g'(a^*)\, r + K\right) \tag{8}$$

is optimal, where $r = \sum_{t=1}^{T} r_t$ is the total signal.

11 Theorem 1 characterizes a contract that is optimal, i.e. solves Program 1. Strictly speaking, there exist other optimal contracts which pay the same as (7) on the equilibrium path, but take different values for returns that are not observed on the equilibrium path. Note that the contract in Theorem 1 allows c to be negative. Limited liability could be incorporated, at the cost of additional notational complexity, by imposing a lower bound on $\eta$ or adding a fixed constant to the signal.

Proof. (Heuristic). The Appendix presents a rigorous proof that rules out stochastic contracts and messages, and does not assume that the contract is differentiable. Here, we give a heuristic proof by induction on T that conveys the essence of the result for deterministic message-free contracts, using first-order conditions and assuming $a_t^* < \overline{a}$.

We commence with T = 1. Since $\eta_1$ is known, we can remove the expectations operator from the IC condition (6). Since u is an increasing function, it also drops out to yield:

$$a_1^* \in \arg\max_{a_1} v\left(c(a_1 + \eta_1)\right) - g(a_1). \tag{9}$$

The first-order condition is:

$$v'\left(c(a_1^* + \eta_1)\right) c'(a_1^* + \eta_1) - g'(a_1^*) = 0. \tag{10}$$

Therefore, for all $r_1$, $v'(c(r_1))\, c'(r_1) = g'(a_1^*)$, which integrates over $\eta_1$ to

$$v(c(r_1)) = g'(a_1^*)\, r_1 + K \tag{11}$$

for some constant K. Contract (11) must hold for all $r_1$ that occurs with non-zero probability, i.e. for $r_1 \in (a_1^* + \underline{\eta}_1,\ a_1^* + \overline{\eta}_1)$.

We now proceed by induction on the total number of periods T: we show that, if the result holds for T, it also holds for T + 1. Let $V(r_1, \ldots, r_{T+1}) \equiv v(c(r_1, \ldots, r_{T+1}))$ denote the indirect felicity function, i.e. the contract in terms of felicity rather than cash. At t = T + 1, the IC condition is:

$$a_{T+1}^* \in \arg\max_{a_{T+1}} V(r_1, \ldots, r_T, \eta_{T+1} + a_{T+1}) - g(a_{T+1}) - \sum_{t=1}^{T} g(a_t). \tag{12}$$

Applying the result for T = 1, to induce $a_{T+1}^*$ at T + 1, the contract must be of the form:

$$V(r_1, \ldots, r_T, r_{T+1}) = g'(a_{T+1}^*)\, r_{T+1} + k(r_1, \ldots, r_T), \tag{13}$$

where the integration “constant” now depends on the past signals, i.e. $k(r_1, \ldots, r_T)$. In turn, $k(r_1, \ldots, r_T)$ is chosen to implement $a_1^*, \ldots, a_T^*$ viewed from t = 0, when the agent’s utility is:

$$E\left[u\left(k(r_1, \ldots, r_T) + g'(a_{T+1}^*)\, r_{T+1} - g(a_{T+1}^*) - \sum_{t=1}^{T} g(a_t)\right)\right].$$

Defining

$$\hat{u}(x) = E\left[u\left(x + g'(a_{T+1}^*)\, r_{T+1} - g(a_{T+1}^*)\right)\right], \tag{14}$$

the principal’s problem is to implement $a_1^*, \ldots, a_T^*$ with a contract $k(r_1, \ldots, r_T)$, given a utility function

$$E\left[\hat{u}\left(k(r_1, \ldots, r_T) - \sum_{t=1}^{T} g(a_t)\right)\right].$$

Applying the result for T, the contract must have the form $k(r_1, \ldots, r_T) = \sum_{t=1}^{T} g'(a_t^*)\, r_t + K$ for some constant K. Combining this with (13), the contract must satisfy:

$$V(r_1, \ldots, r_T, r_{T+1}) = \sum_{t=1}^{T+1} g'(a_t^*)\, r_t + K \tag{15}$$

for $(r_t)$ that occurs with non-zero probability (i.e. $(r_1, \ldots, r_{T+1}) \in \prod_{t=1}^{T+1} (a_t^* + \underline{\eta}_t,\ a_t^* + \overline{\eta}_t)$). The associated pay is $c = v^{-1}\left(\sum_{t=1}^{T+1} g'(a_t^*)\, r_t + K\right)$, as in (7). Conversely, any contract that satisfies (15) is incentive compatible.

Theorem 1 yields a closed-form contract for any T and $(a_t^*)$. The Theorem also clarifies the parameters that do and do not matter for the contract’s functional form. It depends only on the felicity function v and the cost of effort g, i.e. how the agent trades off the benefits of cash against the costs of providing effort, and is independent of the utility function u, the reservation utility $\underline{u}$, and the distribution of the noise $\eta$. Even though these parameters do not affect the contract’s functional form, in general they will affect its slope via their impact on the scalar K. However, if $v(c) = c$ (the cost of effort is pecuniary) as assumed by HM, the contract’s slope is also independent of u, $\underline{u}$ and $\eta$: it is linear, regardless of these parameters. The linear contracts of HM can thus be achieved in settings that do not require exponential utility, Gaussian noise or continuous time. Note that, even if the cost of effort is pecuniary, it remains a general, possibly non-linear function $g(a_t)$.

The origins of the contract’s tractability can be seen in the heuristic proof. We first consider T = 1. Since $\eta_1$ is known, the expectations operator can be removed from (6). u then drops out to yield (9). The specific form of u is irrelevant – all that matters is that it is monotonic, and so it is maximized by maximizing its argument. In particular, exponential utility is not required – the agent’s attitude to risk does not matter as $\eta_1$ is known. In turn, (9) yields the first-order condition (10), which must hold for every possible realization of $\eta_1$, i.e. state-by-state. This pins down the slope of the contract: for all $\eta_1$, the agent must receive a marginal felicity of $g'(a_1^*)$ for a one unit increment to the signal $r_1$. The principal’s only degree of freedom is the constant K, which is itself pinned down by the participation constraint.
By contrast, if $\eta_1$ followed the action, and assuming linear u for simplicity, (10) would be

$$E\left[v'(c(r_1))\, c'(r_1)\right] = g'(a_1^*). \tag{16}$$

This first-order condition only determines the agent’s marginal incentives on average, rather than state-by-state. There are multiple contracts that will satisfy (16) and implement $a_1^*$, and the problem is significantly more complex as the principal must solve for the cheapest contract out of this continuum. By giving the agent greater flexibility in the action space (by allowing him to respond to $\eta_1$), our timing assumption simplifies the contracting problem by tightly constraining the set of incentive compatible contracts. This is similar to the intuition behind the linear contracts of HM, who give the agent flexibility by granting him control over not just the mean signal, but the probability of each realization.

Equation (8) shows that, if the target action (and thus marginal cost of effort) is constant, incentives must be constant time-by-time as well as state-by-state, and so only aggregate performance ($r = \sum_{t=1}^{T} r_t$) matters.
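The state-by-state logic for T = 1 can be illustrated numerically. The following is a rough sketch under assumed primitives that are not taken from the paper: log felicity $v(c) = \ln c$, quadratic cost $g(a) = a^2/2$, a hypothetical target action of 0.6, and an arbitrary constant K. It computes the agent's best response by grid search after each noise realization, once under the contract $c = v^{-1}(g'(a^*) r + K)$ and once under an illustrative linear contract.

```python
import math

A_STAR = 0.6   # hypothetical target effort a*
K = 1.0        # hypothetical constant (would be set by the participation constraint)
ACTIONS = [i / 1000 for i in range(0, 1001)]  # action grid on [0, 1]

def g(a):
    return 0.5 * a * a          # assumed quadratic cost of effort

def g_prime(a):
    return a

def v(c):
    return math.log(c)          # assumed log felicity

def v_inv(x):
    return math.exp(x)

def theorem_contract(r):
    # Pay of the form v^{-1}(g'(a*) r + K), as in (11) for T = 1.
    return v_inv(g_prime(A_STAR) * r + K)

def linear_contract(r, slope=1.0, intercept=5.0):
    # Illustrative alternative: pay linear in the signal (positive on the range used below).
    return slope * r + intercept

def best_response(contract, eta):
    # The agent observes eta, then maximizes felicity net of effort cost.
    return max(ACTIONS, key=lambda a: v(contract(a + eta)) - g(a))

noise_draws = [-2.0, -0.5, 0.0, 0.5, 2.0]
theorem_actions = [best_response(theorem_contract, eta) for eta in noise_draws]
linear_actions = [best_response(linear_contract, eta) for eta in noise_draws]
```

Under these assumptions the first list is constant at the target action for every noise draw, while the second varies with the noise: with concave felicity, a linear contract lets a high noise realization weaken incentives.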

Even though all noise is known when the agent takes his action, it is not automatically irrelevant. First, since the agent does not know $\eta_1$ when he signs the contract, he is subject to risk and so the first-best is not achieved. Second, the noise realization has the potential to undo incentives. If $\eta_1$ is high, $r_1$ and thus c will already be high; a high $\underline{u}$ has the same effect. If the agent exhibits diminishing marginal felicity (i.e. v is concave), he will have lower incentives to exert effort. Put differently, at the time the agent takes his action, he does not face risk (as $\eta_1$ is known) but faces distortion (as $\eta_1$ affects his effort incentives). The optimal contract must address this problem. It does so by being convex, via the $v^{-1}$ transformation: if noise is high, it gives a greater number of dollars for exerting effort ($\partial c / \partial r_1$), to exactly offset the lower marginal felicity of each dollar ($v'(c)$). Therefore, the marginal felicity from effort remains $v'(c)\, \partial c / \partial r_1 = g'(a_1^*)$, and incentives are preserved regardless of $\underline{u}$ or $\eta_1$. If the cost of effort is pecuniary ($v(c) = c$), $v^{-1}(c) = c$ and so no transformation is needed. Since both the costs and benefits of effort are in monetary terms, high $\eta_1$ reduces them equally. Thus, incentives are unchanged even with a linear contract.

The idea of subjecting the agent to a constant incentive pressure is also similar to HM. However, in HM, the constant incentive pressure involves giving the agent a constant increase in cash for an increase in the signal. Here, the agent is given a constant increase in felicity, $v'(c(r_1))\, c'(r_1)$. This generalization allows us to drop the assumption of a pecuniary cost of effort, in which case the contract is non-linear.

In the cash flow diversion models of DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007), the optimal contract is linear because the agent is risk-neutral. His utility rises by a constant amount for each dollar diverted, and so the optimal contract must give him a constant share of output. Lacker and Weinberg (1989) achieve a (piecewise) linear contract with general utility functions and noise distributions, under a pecuniary cost of effort and for T = 1. We extend their result to general T and a non-pecuniary cost of effort.

We now move to T > 1. In all periods t < T, the agent is now exposed to risk, since he does not know future noise realizations when he chooses $a_t$. Much like the effect of a high current noise realization, if the agent expects future noise to be high, his incentives to exert effort will be reduced. This would typically require the agent to integrate over future noise realizations when choosing $a_t$, leading to high complexity. Here the unknown future noise outcomes do not matter, as can be seen in the heuristic proof. Before T + 1, $\eta_{T+1}$ is unknown. However, (13) shows that the unknown $\eta_{T+1}$ enters additively and does not affect the incentive constraints of the t = 1, ..., T problems – regardless of what $\eta_{T+1}$ turns out to be, the contract must give the agent a marginal felicity of $g'(a_t^*)$ for exerting effort at t.12 Our timing assumption thus allows us to solve the multiperiod problem via backward induction, reducing it to a succession of one-period problems, each of which can be solved tractably.

12 This can be most clearly seen in the definition of the new utility function (14), which “absorbs” the T + 1 period problem.
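The claim that unknown future noise does not disturb current incentives can be illustrated with a two-period sketch. Everything specific here is an assumption made for illustration rather than taken from the paper: quadratic cost, CARA utility as one example of a concave u, hypothetical targets 0.4 and 0.7, and discrete distributions for the second-period noise. The agent's first-period best response is computed by grid search, anticipating his own optimal second-period action.

```python
import math

A1_STAR, A2_STAR, K = 0.4, 0.7, 0.0           # hypothetical targets and constant
ACTIONS = [i / 100 for i in range(0, 101)]    # action grid on [0, 1]

def g(a):
    return 0.5 * a * a                        # assumed quadratic cost

def g_prime(a):
    return a

def u(x):
    return -math.exp(-x)                      # CARA utility, one concave example

def felicity(r1, r2):
    # Indirect felicity V(r1, r2) = g'(a1*) r1 + g'(a2*) r2 + K, as in (15).
    return g_prime(A1_STAR) * r1 + g_prime(A2_STAR) * r2 + K

def period2_action(r1, eta2):
    # At t = 2 all noise is known, so u drops out of the agent's problem.
    return max(ACTIONS, key=lambda a2: felicity(r1, a2 + eta2) - g(a2))

def period1_action(eta1, eta2_outcomes):
    # At t = 1 the agent knows eta1 but takes expectations over eta2.
    def expected_utility(a1):
        total = 0.0
        for eta2, prob in eta2_outcomes:
            r1 = a1 + eta1
            a2 = period2_action(r1, eta2)
            total += prob * u(felicity(r1, a2 + eta2) - g(a1) - g(a2))
        return total
    return max(ACTIONS, key=expected_utility)

# Two hypothetical distributions for eta2, given as (outcome, probability) pairs.
narrow = [(-1.0, 0.5), (1.0, 0.5)]
skewed = [(-3.0, 0.2), (0.0, 0.3), (5.0, 0.5)]

chosen = [period1_action(eta1, dist)
          for eta1 in (-1.0, 0.0, 2.0)
          for dist in (narrow, skewed)]
```

In this sketch the first-period choice equals the hypothetical target 0.4 for every current noise draw and for both second-period distributions, because the future noise enters the argument of u additively.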

Even though we can consider each problem separately, the periods remain interdependent. Much like the current noise realization, past outcomes may affect the current effort choice. The Mirrlees (1974) contract punishes the agent if final output is below a threshold. Therefore, if the agent can observe past outcomes, he will shirk if interim output is high. This complexity distinguishes our multiperiod model from a static multi-action model, where the agent chooses T actions simultaneously. As in HM, and unlike in a multi-action model, here the agent observes past outcomes when taking his current action, and can vary his action in response. HM assume exponential utility and a pecuniary cost of effort to remove such “wealth effects” and eliminate the intertemporal link between periods. We instead ensure that past outcomes do not distort incentives via the above $v^{-1}$ transformation, and so do not require either assumption.

The Appendix proves that, even though the agent privately observes $\eta_t$, there is no need for him to communicate it to the principal. Since $a_t^*$ is implemented for all $\eta_t$, there is a one-to-one correspondence between $r_t$ and $\eta_t$ on the equilibrium path. The principal can thus infer $\eta_t$ from $r_t$, rendering messages redundant.

The Appendix also rules out randomized contracts. There are two effects of randomization. First, it leads to inefficient risk-sharing, for any concave u. Second, changing the reward for effort from a certain payment to a lottery may increase or decrease his effort incentives.13 We show that with NIARA utility, this second effect is negative. Thus, both effects of randomization are undesirable, and deterministic contracts are unambiguously optimal. The proof makes use of the independence of noises and the log-concavity of $\eta_2, \ldots, \eta_T$. While these assumptions, combined with NIARA utility, are sufficient to rule out randomized contracts, they may not be necessary. In future research, it would be interesting to explore whether randomized contracts can be ruled out in broader settings.14

In addition to allowing for stochastic contracts, the above analysis also allows for $a_t^* = \overline{a}$, under which the IC constraint is an inequality. Therefore, the contract in (7) only provides a lower bound on the contract slope. A sharper-than-necessary contract has a similar effect to a stochastic contract, since it subjects the agent to additional risk. Again, the combination of NIARA and independent and log-concave noises is sufficient to rule out such contracts.

If the analysis is restricted to deterministic contracts and $a_t^* < \overline{a}$ ∀ t, the contract in (7) is the only incentive-compatible contract (for the signal values realized on the equilibrium path). We can thus relax the above three assumptions. This result is stated in Proposition 1 below.

13 See Arnott and Stiglitz (1988) for detail on how randomization can sometimes be desirable – if low effort leads to a random payoff, this may induce the agent to exert effort. They derive sufficient conditions under which randomization is suboptimal. Our conditions to guarantee the suboptimality of random contracts generalize their results to broader agency problems (their setting focuses on insurance).

14 For instance, consider T = 2. We only require that $\hat{u}(x)$ as defined in (43) exhibits NIARA. The log-concavity of $\eta_2$ is sufficient, but unnecessary for this. Separately, if NIARA is violated, the marginal cost of effort falls with randomization. However, this effect may be outweighed by the inefficient risk-sharing, so randomized contracts may still be dominated.

Proposition 1 (Optimal deterministic contract, $a_t^* < \overline{a}$ ∀ t). Consider only deterministic contracts and $a_t^* < \overline{a}$ ∀ t. Relax the assumptions of NIARA utility, independent noises, and log-concave noises for $\eta_2, \ldots, \eta_T$. Any incentive-compatible contract takes the form

$$c = v^{-1}\left(\sum_{t=1}^{T} g'(a_t^*)\, r_t + K\right), \tag{17}$$

where K is a constant. The optimal deterministic contract features a K that makes the agent’s participation constraint bind.

Proof. See Appendix.

The following Remark states that the contract’s incentive compatibility is robust to the timing assumption. In particular, if noise follows the action in each period, the contract in Theorem 1 continues to implement the target actions – since it provides sufficient incentives state-by-state, it automatically does so on average. However, we can no longer show that it is optimal, since there are many other contracts that provide sufficient incentives on average.

Remark 1 (Robustness of the contract’s incentive compatibility to timing). For any timing of the noise $(\eta_t)_{t=1,\ldots,T}$ (i.e. regardless of whether it follows or precedes $a_t$ in each period), the contract in Theorem 1 is incentive compatible and implements $(a_t^*)_{t=1,\ldots,T}$.

Indeed, given the contract, the agent’s utility is:

$$u\left(\sum_{t=1}^{T} g'(a_t^*)(a_t + \eta_t) + K - \sum_{t=1}^{T} g(a_t)\right),$$

so that, regardless of the timing of $(\eta_t)_{t=1,\dots,T}$, the agent maximizes his utility by taking action $a_t = a_t^*$, as it solves $\max_{a_t} g'(a_t^*)\,a_t - g(a_t)$.

Closed-form solutions allow the economic implications of a contract to be transparent. We close this section by considering two specific applications of Theorem 1 to executive compensation, to highlight the implications that can be gleaned from a tractable contract structure. While contract (7) can be implemented for any informative signal $r$, the firm’s log equity return is the natural choice of $r$ for CEOs, since they have a fiduciary duty to maximize shareholder value. When the cost of effort is pecuniary ($v(c) = c$), Theorem 1 implies that the CEO’s dollar pay $c$ is linear in the firm’s return $r$. Hence, the relevant incentives measure is the dollar change in CEO pay for a given percentage change in firm value (i.e. “dollar-percent” incentives), as advocated by Hall and Liebman (1998).15

Another common specification is $v(c) = \ln c$, in which case the CEO’s utility function (2) now becomes, up to a monotonic (logarithmic) transformation:

$$E\left[U\left(c\,e^{-g(a)}\right)\right] \ge \underline{U}, \qquad (18)$$

where $u(x) \equiv U(e^x)$ and $\underline{U} \equiv \ln \underline{u}$ is the CEO’s reservation utility. Utility is now multiplicative in effort and cash; Edmans, Gabaix and Landier (2009) show that multiplicative preferences are necessary to generate empirically consistent predictions for the scaling of various measures of CEO incentives with firm size. Thus, the ability to drop the HM assumption of $v(c) = c$ becomes valuable. Applying Theorem 1 with $T = 1$ for simplicity, the optimal contract becomes

$$\ln c = g'(a^*)\,r + K. \qquad (19)$$

The contract prescribes the percentage change in CEO pay for a percentage change in firm value, i.e. “percent-percent” incentives; this slope is independent of the utility function $U$ and the noise distribution. Murphy (1999) advocated this elasticity measure over alternative incentive measures (such as “dollar-percent” incentives) on two empirical grounds: it is invariant to firm size, and firm returns have much greater explanatory power for percentage than dollar changes in pay. However, he notes that “elasticities have no corresponding agency-theoretic interpretation.” The above analysis shows that elasticities are the theoretically justified measure under multiplicative preferences, for any utility function. This result extends Edmans et al. (2009), who advocated “percent-percent” incentives in a risk-neutral, one-period model.

15 This incentive measure refers to “ex ante” incentives, i.e. how much the CEO’s pay will change over the next year if the stock return over the next year increases by one percentage point.
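The incentive-compatibility argument of Remark 1 can be checked numerically. The following is a minimal sketch rather than the authors' code: it assumes a hypothetical quadratic cost $g(a) = a^2/2$, felicity $v(c) = \ln c$, a single period, and illustrative values for $a^*$ and $K$; all function names are invented for the illustration. For every noise realization, the agent's felicity net of effort cost, $g'(a^*)(a+\eta) + K - g(a)$, peaks at $a = a^*$, and because $u$ is increasing the same action maximizes his utility.

```python
import math

def g(a):
    """Hypothetical cost of effort (assumption): g(a) = a^2 / 2."""
    return 0.5 * a * a

def g_prime(a):
    """Marginal cost of effort for the quadratic cost above."""
    return a

def pay(r, a_star, K):
    """Log-linear contract (19): ln c = g'(a*) r + K."""
    return math.exp(g_prime(a_star) * r + K)

def net_felicity(a, eta, a_star, K):
    """v(c) - g(a) with v = ln and signal r = a + eta."""
    return math.log(pay(a + eta, a_star, K)) - g(a)

def best_action(eta, a_star, K, grid):
    """Grid search for the action that maximizes the agent's net felicity."""
    return max(grid, key=lambda a: net_felicity(a, eta, a_star, K))

grid = [i / 100 for i in range(0, 301)]   # candidate actions 0.00, 0.01, ..., 3.00
a_star, K = 1.2, 0.3                      # illustrative target action and constant
noises = (-2.0, -0.5, 0.0, 0.7, 3.0)      # arbitrary noise realizations
choices = [best_action(eta, a_star, K, grid) for eta in noises]
print(choices)  # the chosen action does not vary with the noise realization
```

Because the slope of $\ln c$ in $r$ is the constant $g'(a^*)$, the noise enters the agent's objective only as an additive term and cannot shift his preferred action.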

2.2 Continuous Time

This section shows that the contract has the same tractable form in continuous time, where actions and noise are simultaneous. This consistency suggests that, if reality is continuous time, it is best approximated in discrete time by modeling noise before effort in each period.

At every instant $t$, the agent takes action $a_t$ and the principal observes signal $r_t$, where

$$r_t = \int_0^t a_s\,ds + \eta_t, \qquad (20)$$

$\eta_t = \int_0^t \sigma_s\,dZ_s + \int_0^t \mu_s\,ds$, $Z_t$ is a standard Brownian motion, and $\sigma_t > 0$ and $\mu_t$ are deterministic. The agent’s utility function is:

$$E\left[u\left(v(c) - \int_0^T g(a_t)\,dt\right)\right]. \qquad (21)$$

The principal observes the path of $(r_t)_{t\in[0,T]}$ and wishes to implement a deterministic action $(a_t^*)_{t\in[0,T]}$ at each instant. She solves Program 1 with utility function (21). The optimal contract is of the same tractable form as Theorem 1.

Theorem 2 (Optimal contract, continuous time). The following contract is optimal. The agent is paid

$$c = v^{-1}\left(\int_0^T g'(a_t^*)\,dr_t + K\right), \qquad (22)$$

where $K$ is a constant that makes the participation constraint bind ($E\left[u\left(\int_0^T g'(a_t^*)\,dr_t + K - \int_0^T g(a_t^*)\,dt\right)\right] = \underline{u}$). In particular, if the target action is time-independent ($a_t^* = a^*\ \forall t$), the contract

$$c = v^{-1}\left(g'(a^*)\,r_T + K\right) \qquad (23)$$

is optimal.

Proof. See Appendix.

To highlight the link with the discrete time case, consider the model of Section 2.1 and define $r = \sum_{t=1}^{T} r_t = \sum_{t=1}^{T} a_t + \sum_{t=1}^{T} \eta_t$. Taking the continuous time limit of Theorem 1 gives Theorem 2.
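The link between contracts (22) and (23) can be illustrated by discretizing the signal path. The sketch below is an illustration under stated assumptions, not part of the paper: it uses an Euler discretization, constant $\sigma$ and $\mu$, a hypothetical quadratic cost $g(a) = a^2/2$, and invented function names. It simulates $r_t$ with the agent taking the target action, approximates $\int_0^T g'(a_t^*)\,dr_t$ by a sum over increments, and confirms that with a constant target action the sum collapses to $g'(a^*)\,r_T$.

```python
import math
import random

def simulate_signal(a_path, sigma, mu, dt, rng):
    """Euler scheme for dr_t = (a_t + mu) dt + sigma dZ_t (constant sigma, mu assumed)."""
    r = [0.0]
    for a in a_path:
        dZ = rng.gauss(0.0, math.sqrt(dt))
        r.append(r[-1] + (a + mu) * dt + sigma * dZ)
    return r

def incentive_integral(gp_path, r):
    """Left-point sum approximating the integral of g'(a*_t) dr_t."""
    return sum(gp * (r[i + 1] - r[i]) for i, gp in enumerate(gp_path))

def g_prime(a):
    """Marginal cost for the hypothetical quadratic cost g(a) = a^2 / 2."""
    return a

rng = random.Random(0)          # fixed seed so the run is reproducible
n, T = 1000, 1.0
dt = T / n
a_star = 0.8                    # illustrative constant target action
r = simulate_signal([a_star] * n, sigma=0.3, mu=0.0, dt=dt, rng=rng)

lhs = incentive_integral([g_prime(a_star)] * n, r)   # discretized version of (22)
rhs = g_prime(a_star) * r[-1]                        # argument of v^{-1} in (23), net of K
print(abs(lhs - rhs) < 1e-9)
```

With a time-varying target path, the same `incentive_integral` would weight each increment of the signal by the marginal cost of the action targeted at that instant.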

2.3 Discussion: What is Necessary for Tractable Contracts?

The framework considered thus far shows that tractable implementation contracts can be achieved without requiring exponential utility, a pecuniary cost of effort, continuous time or Gaussian noise. However, it has still imposed a number of restrictions. We now discuss the features that are essential for our contract structure, inessential features that we have already relaxed in extensions, and additional assumptions which may be relaxable in future research.

1. Timing of noise. This assumption is central to the intuition of attaining simple contracts as it restricts the principal’s flexibility. Remark 1 states that, if $a_t$ precedes $\eta_t$, contract (7) still implements $(a_t^*)_{t=1,\dots,T}$. However, we can no longer show that it is optimal.

2. Risk-neutral principal. The full proof of Theorem 1 extends the model to the case of a risk-averse principal. If the principal wishes to minimize $E[w(c)]$ (where $w$ is an increasing function) rather than $E[c]$, then contract (7) is optimal if $u\left(v(w^{-1}(\cdot)) - \sum_t g(a_t^*)\right)$ is concave. This holds if, loosely speaking, the principal is not too risk-averse.

3. NIARA utility, independent and log-concave noise. Proposition 1 states that, if every target action $a_t^*$ lies strictly below the upper bound of the action space and deterministic contracts are assumed, (7) is the only incentive-compatible contract. Therefore, these assumptions are not required. Allowing for target actions at the upper bound and for stochastic contracts, these assumptions are sufficient but may not be necessary.

4. Unidimensional noise and action. Appendix D shows that our model is readily extendable to settings where the action $a$ and the noise $\eta$ are multidimensional. A close analog to our result obtains.

5. Linear signal, $r_t = a_t + \eta_t$. Remark 2 in Section 3.1 later shows that with general signals $r_t = R(a_t, \eta_t)$, the optimal contract remains tractable and its functional form remains independent of $u$, $\underline{u}$ and the distribution of $\eta$.

6. Timing of consumption. The current setup assumes that the agent only consumes at the end of period $T$. In Edmans, Gabaix, Sadzik and Sannikov (2009), we develop the analog of Theorem 1 where the agent consumes in each period, for the case of $v(c) = \ln c$ and a CRRA utility function. The contract remains tractable.

7. Renegotiation. Since the target effort path is fixed, there is no scope for renegotiation after the agent observes the noise. In Section 3.1, the optimal action may depend on $\eta$. Since the contract specifies an optimal action for every realization of $\eta$, again there is no incentive to renegotiate.

3 The Optimal Effort Level

The analysis has thus far focused on the optimal implementation of a given path of effort levels $(a_t^*)$. In Section 3.1 we allow the target effort level to depend on the current period noise. Section 3.2 derives conditions under which the principal wishes to implement the maximum productive effort level for all noise realizations (the “maximum effort principle”). Section 3.3 allows the principal to choose the maximum productive effort level according to the environment.

3.1 Contingent Target Actions

Let $A_t(\eta_t)$ denote the “action function”, which defines the target action for each noise realization. (Thus far, we have assumed $A_t(\eta_t) = a_t^*$.) Since different noises $\eta_t$ may lead to the same observed signal $r_t = A_t(\eta_t) + \eta_t$, the analysis must consider revelation mechanisms. If the agent announces noises $\hat\eta_1, \dots, \hat\eta_T$, he is paid $c = C(\hat\eta_1, \dots, \hat\eta_T)$ if the observed signals are $A_1(\hat\eta_1) + \hat\eta_1, \dots, A_T(\hat\eta_T) + \hat\eta_T$, and a very low amount $\underline{c}$ otherwise.

As in the core model, we assume that $A_t(\eta_t) > \underline{a}\ \forall \eta_t$ (where $\underline{a}$ is the lower bound of the action space), else a flat contract would be optimal for some noise realizations. We also assume that the signal $A_t(\eta_t) + \eta_t$ is nondecreasing in $\eta_t$: otherwise, as the proof of Proposition 2 shows, the action function cannot be implemented. If a higher noise corresponds to a significantly lower action, the agent would over-report the noise and exert less effort. We make three additional technical assumptions: the action space $\mathcal{A}$ is open, $A_t(\eta_t)$ is bounded within any compact subinterval of $\eta$, and $A_t(\eta_t)$ is almost everywhere continuous. The final assumption still allows for a countable number of jumps in $A_t(\eta_t)$. Given the complexity and length of the proof that randomized contracts are inferior in Theorem 1, we now restrict the analysis to deterministic contracts and assume that $A_t(\eta_t)$ lies strictly below the upper bound of the action space. We conjecture that the same arguments in that proof continue to apply with a noise-dependent target action.

The optimal contract induces both the target effort level ($a_t = A_t(\eta_t)$) and truth-telling ($\hat\eta_t = \eta_t$). It is given by the next Proposition:

Proposition 2 (Optimal contract, noise-dependent action). A series of contingent actions $(A_t(\eta_t))_{t=1,\dots,T}$ can be implemented if and only if for all $t$, $A_t(\eta_t) + \eta_t$ is nondecreasing in $\eta_t$. If that condition is verified, the following contract is optimal. For each $t$, after noise $\eta_t$ is realized, the agent communicates a value $\hat\eta_t$ to the principal. If the subsequent signal is not $A_t(\hat\eta_t) + \hat\eta_t$ in each period, he is paid a very low amount $\underline{c}$. Otherwise he is paid $C(\hat\eta_1, \dots, \hat\eta_T)$, where

$$C(\eta_1, \dots, \eta_T) = v^{-1}\left(\sum_{t=1}^{T} g(A_t(\eta_t)) + \sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} g'(A_t(x))\,dx + K\right), \qquad (24)$$

$\eta_*$ is an arbitrary constant, and $K$ is a constant that makes the participation constraint bind ($E\left[u\left(\sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} g'(A_t(x))\,dx + K\right)\right] = \underline{u}$).

Proof. (Heuristic). The Appendix presents a rigorous proof that does not assume differentiability of $V$ and $A$. Here, we give a heuristic proof that conveys the essence of the result using first-order conditions. We set $T = 1$ and drop the time subscript. Instead of reporting $\eta$, the agent could report $\hat\eta \neq \eta$, in which case he receives $\underline{c}$ unless $r = A(\hat\eta) + \hat\eta$. Therefore, he must take action $a$ such that $\eta + a = \hat\eta + A(\hat\eta)$, i.e. $a = A(\hat\eta) + \hat\eta - \eta$. In this case, his utility is $V(\hat\eta) - g(A(\hat\eta) + \hat\eta - \eta)$, where $V(\hat\eta) = v(C(\hat\eta))$ is the felicity from pay. The truth-telling constraint is thus:

$$\eta \in \arg\max_{\hat\eta}\; V(\hat\eta) - g(A(\hat\eta) + \hat\eta - \eta).$$

The first-order condition is

$$V'(\eta) = g'(A(\eta))\,A'(\eta) + g'(A(\eta)).$$

Integrating over $\eta$ gives the indirect felicity function

$$V(\eta) = g(A(\eta)) + \int_{\eta_*}^{\eta} g'(A(x))\,dx + K$$

for constants $\eta_*$ and $K$. The associated pay is given by (24).

The contract in Proposition 2 remains in closed form and its functional form does not depend on $u$, $\underline{u}$ nor the distribution of $\eta$.16 However, it is somewhat more complex than the contracts in Section 2, as it involves calculating an integral. In the particular case where $A(\eta) = a^*\ \forall \eta$, Proposition 2 reduces to Theorem 1.

Remark 2 (Extension of Proposition 2 to general signals). Suppose the signal is a general function $r_t = R(a_t, \eta_t)$, where $R$ is differentiable and has positive derivatives in both arguments, $R_1(a, \eta)/R_2(a, \eta)$ is nondecreasing in $a$, and $R(A_t(\eta_t), \eta_t)$ is nondecreasing in $\eta_t$. The same analysis as in Proposition 2 derives the following contract as optimal:

$$C(\eta_1, \dots, \eta_T) = v^{-1}\left(\sum_{t=1}^{T} g(A_t(\eta_t)) + \sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} \frac{R_2(A_t(x), x)}{R_1(A_t(x), x)}\, g'(A_t(x))\,dx + K\right), \qquad (25)$$

where $\eta_*$ is an arbitrary constant and $K$ is a constant that makes the participation constraint bind.

The heuristic proof is as follows (setting $T = 1$ and dropping the time subscript). If $\eta$ is observed and the agent reports $\hat\eta \neq \eta$, he has to take action $a$ such that $R(a, \eta) = R(A(\hat\eta), \hat\eta)$. Taking the derivative at $\hat\eta = \eta$ yields $R_1\,\partial a/\partial\hat\eta = R_1 A'(\eta) + R_2$. The agent solves $\max_{\hat\eta} V(\hat\eta) - g(a(\hat\eta))$, with first-order condition $V'(\eta) - g'(A(\eta))\,\partial a/\partial\hat\eta = 0$. Substituting for $\partial a/\partial\hat\eta$ from above and integrating over $\eta$ yields (25).

16 Even though (24) features an integral over the support of $\eta$, it does not involve the distribution of $\eta$.
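The truth-telling logic behind contract (24) can be checked on a small example. The sketch below is a hedged illustration, not the paper's method: it assumes $T = 1$, a hypothetical quadratic cost $g(a) = a^2/2$, a hypothetical action function $A(\eta) = 1 - 0.5\eta$ (so that $A(\eta) + \eta$ is nondecreasing), $\eta_* = 0$ and $K = 0$, and it evaluates the integral with a trapezoid rule. A grid search over reports then confirms that reporting the true noise is optimal for the agent.

```python
def g(a):
    """Hypothetical quadratic cost of effort (assumption)."""
    return 0.5 * a * a

def g_prime(a):
    return a

def A(eta):
    """Hypothetical action function; A(eta) + eta = 1 + 0.5 * eta is nondecreasing."""
    return 1.0 - 0.5 * eta

def V(eta, eta_star=0.0, K=0.0, steps=2000):
    """Felicity implied by (24) with T = 1:
    g(A(eta)) + integral from eta_star to eta of g'(A(x)) dx + K (trapezoid rule)."""
    h = (eta - eta_star) / steps
    ys = [g_prime(A(eta_star + i * h)) for i in range(steps + 1)]
    integral = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return g(A(eta)) + integral + K

def payoff_from_report(report, eta):
    """Net felicity when the true noise is eta and the agent reports `report`:
    he must take a = A(report) + report - eta to produce the expected signal."""
    return V(report) - g(A(report) + report - eta)

def best_report(eta, grid):
    """Report on the grid that maximizes the agent's net felicity."""
    return max(grid, key=lambda b: payoff_from_report(b, eta))

grid = [i / 20 - 2.0 for i in range(81)]   # candidate reports -2.00, -1.95, ..., 2.00
true_noises = [-1.0, 0.0, 0.5, 1.5]
reports = [best_report(eta, grid) for eta in true_noises]
print(reports)  # each optimal report coincides with the true noise
```

Replacing `A` with a function for which $A(\eta) + \eta$ decreases steeply would break the check, which is the monotonicity requirement in Proposition 2.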

3.2 Maximum Effort Principle

We now consider the optimal action function $A(\eta)$, specializing to $T = 1$ for simplicity and dropping the time index. The principal chooses $A(\eta)$ to solve

$$\max_{\{a(\eta)\}} \int b(a(\eta), \eta)\,f(\eta)\,d\eta - C[A]. \qquad (26)$$

The first term represents the productivity of effort, where $a(\eta) = \min(A(\eta), \overline{a})$ and $\overline{a}$, which lies strictly below the upper bound of the action space, is the maximum productive effort level. The $\min(A(\eta), \overline{a})$ function conveys the fact that, while the action space may be unbounded (its upper bound may be infinite), there is a limit to the number of productive activities the agent can undertake to benefit the principal. For example, in a cash flow diversion model, $\overline{a}$ reflects zero stealing; in an effort model, there is a limit to the number of hours a day the agent can work while remaining productive. In a project selection model, there is a limit to the number of positive-NPV projects available; $\overline{a}$ reflects taking all of these projects while rejecting negative-NPV projects. In addition to being economically realistic, this assumption is useful technically as it prevents the optimal action from being infinite. Actions $a > \overline{a}$ do not benefit the principal, but improve the signal: one interpretation is manipulation (see Appendix C for further details). Clearly, the principal will never wish to implement $a > \overline{a}$. For brevity, we use “maximum effort” to refer to maximum productive effort $\overline{a}$.

$b(\cdot)$ is the productivity function of effort, which is differentiable with respect to $a(\eta)$. $f(\eta)$ is the density of $\eta$, assumed to be finite. The second term, $C[A]$, is the expected cost of the contract required to implement $A(\eta)$ (we suppress the dependence on $\eta$ for brevity). We assume that $g$ is strictly convex, and that $g \circ (g')^{-1}$ and $g'$ are convex; this assumption is satisfied for many standard cost functions, e.g. $g(a) = Ga^2$ and $g(a) = e^{Ga}$ for $G > 0$. The following Proposition bounds the difference in the costs of the contract implementing maximum effort, and an arbitrary contract:17

Proposition 3 (Bound on difference in costs.) There exists a function $\lambda(\overline{a}, \eta)$ such that, for all plans $\{a(\eta)\}$ where $\forall \eta,\ a(\eta) \le \overline{a}$,

$$C[\overline{A}] - C[A] \le \int \lambda(\overline{a}, \eta)\,(\overline{a} - a(\eta))\,d\eta, \qquad (27)$$

where $\overline{A}$ denotes the maximum-effort plan $A(\eta) = \overline{a}$.

Proof. See Appendix.

The next Theorem gives conditions under which maximum effort is optimal.

Theorem 3 (Maximum effort principle). Assume that $\forall \eta,\ \forall a \le \overline{a}$, $\partial_1 b(a, \eta)\,f(\eta) \ge \lambda(\overline{a}, \eta)$, i.e. the marginal benefit of effort is sufficiently large. Then, the optimal plan is to implement maximum effort, $A(\eta) = \overline{a}$.

Proof. For any plan,

$$\int b(\overline{a}, \eta)\,f(\eta)\,d\eta - \int b(a(\eta), \eta)\,f(\eta)\,d\eta \ge \int \left(\inf_{a \le \overline{a}} \partial_1 b(a, \eta)\right)(\overline{a} - a(\eta))\,f(\eta)\,d\eta \ge \int \lambda(\overline{a}, \eta)\,(\overline{a} - a(\eta))\,d\eta \ge C[\overline{A}] - C[A]$$

by Proposition 3. Hence,

$$\int b(\overline{a}, \eta)\,f(\eta)\,d\eta - C[\overline{A}] \ge \int b(a(\eta), \eta)\,f(\eta)\,d\eta - C[A],$$

i.e., the principal’s objective is maximized by inducing maximum effort.

17 The proof shows that we can take $\lambda(\overline{a}, \eta) = \max\left(\partial C[\overline{A}]/\partial A(\eta),\ 0\right)$. We use partial derivatives such as $\partial C[A]/\partial A(\eta)$. Their meaning is traditional and is as follows. Under weak conditions, $C[\cdot]$ is differentiable at $A$, in the sense that there is a function $\phi(\eta)$ (unique up to sets of measure 0) such that, for any $\{B(\eta)\}$, $\lim_{h \to 0} (C[A + hB] - C[A])/h = \int \phi(\eta)\,B(\eta)\,d\eta$. Then, we define $\partial C[A]/\partial A(\eta) = \phi(\eta)$.

Theorem 3 above shows that, if the marginal benefit of effort is sufficiently greater than the marginal cost, then maximum effort is optimal. A sufficient (although unnecessary) condition is for the firm to be sufficiently large. To demonstrate this, we parameterize the $b$ function by $b(a, \eta) = S\,\tilde{b}(a, \eta)$, where $S$ is the baseline value of the output under the agent’s control. For example, if the agent is a CEO, $S$ is firm size; if he is a divisional manager, $S$ is the size of his division. We will refer to $S$ as firm size for brevity. Under this specification, the benefit of effort is multiplicative in firm size. This is plausible for most agent actions, which can be “rolled out” across the whole company and thus have a greater effect in a larger firm. Examples include the choice of strategy, the launch of new projects, or increasing production efficiency.18

Let $\overline{F}$ denote the complementary cumulative distribution function of $\eta$, i.e. $\overline{F}(x) = \Pr(\eta \ge x)$. We assume that $\sup \overline{F}(\eta)/f(\eta) < \infty$ and $\inf \partial_1 \tilde{b}(a, \eta) > 0$, and define:

$$S_* = \frac{\Lambda_{\overline{a}}}{\inf_{a,\eta} \partial_1 \tilde{b}(a, \eta)}, \qquad \Lambda_{\overline{a}} = \frac{g'(\overline{a}) + g''(\overline{a})\,\sup_{\eta} \dfrac{\overline{F}(\eta)}{f(\eta)}}{v'\left(v^{-1}\left(u^{-1}(\underline{u}) + g(\overline{a}) + (\overline{\eta} - \underline{\eta})\,g'(\overline{a})\right)\right)}. \qquad (28)$$

Calculations in the Online Appendix show that if $S > S_*$, i.e. the firm is sufficiently large, then it is optimal for the principal to induce maximum effort. Indeed, in Proposition 3 we can take $\lambda(\overline{a}, \eta) = \Lambda_{\overline{a}}\,f(\eta)$.

The intuition for the above is as follows. The numerator of $\Lambda_{\overline{a}}$ contains the two costs of inducing higher effort: the disutility imposed on the agent (the first term) plus the risk imposed by the incentive contract required to implement effort (the second term). These are scaled by the denominator, where the term in brackets is an upper bound on the pay received by the agent. The costs of effort are thus of similar order of magnitude to the agent’s pay. The benefit of effort is enhanced firm value and thus of similar order of magnitude to firm size. If the firm is sufficiently large ($S > S_*$), the benefits of effort outweigh the costs and so maximum productive effort is optimal. A simple numerical example illustrates. Consider a firm with a $10b market value and, to be conservative, assume that maximum effort increases firm value by only 1%. Then, maximum effort creates $100m of value, which vastly outweighs the agent’s salary. Even if it is necessary to double the agent’s salary to compensate him for the costs of increased effort, this is swamped by the benefits.

The comparative statics on the threshold firm size $S_*$ are intuitive. First, $S_*$ is increasing in noise dispersion, because the firm must be large enough for maximum effort to be optimal for all noise realizations. Indeed, greater dispersion increases $u^{-1}(\underline{u}) + g(\overline{a}) + (\overline{\eta} - \underline{\eta})\,g'(\overline{a})$, which lowers the marginal felicity in the denominator, and raises $\sup \overline{F}/f$. (For example, if the noise is uniform, then $\sup \overline{F}/f = \overline{\eta} - \underline{\eta}$.) Second, it is increasing in the agent’s risk aversion parameterized by $v$ and thus the risk imposed by incentives. Third, it is increasing in the disutility of effort, and thus the marginal cost of effort $g'(\overline{a})$ and the convexity of the cost function $g''(\overline{a})$. Fourth, it is decreasing in the marginal benefit of effort ($\inf \partial_1 \tilde{b}(a, \eta)$).
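The orders of magnitude in the numerical example, and the uniform-noise value of $\sup \overline{F}/f$, can be reproduced in a few lines. This is a rough sketch: the $10b firm value and 1% gain come from the text, while the $10m salary and the support $[-1, 2]$ are hypothetical values chosen only for illustration, and the function names are invented.

```python
def value_created(firm_value, pct_gain):
    """Dollar value created when effort raises firm value by pct_gain."""
    return firm_value * pct_gain

firm_value = 10e9   # $10b market value (from the text)
pct_gain = 0.01     # maximum effort raises firm value by 1% (from the text)
salary = 10e6       # hypothetical $10m salary (assumption, not from the text)

benefit = value_created(firm_value, pct_gain)   # $100m of value
extra_pay = salary                              # cost of doubling the salary
print(benefit, extra_pay, benefit > extra_pay)

def sup_hazard_ratio_uniform(lo, hi, steps=1000):
    """Grid search for sup of Fbar(eta) / f(eta) when eta is uniform on [lo, hi]."""
    f = 1.0 / (hi - lo)                      # uniform density
    best = 0.0
    for i in range(steps + 1):
        eta = lo + (hi - lo) * i / steps
        fbar = (hi - eta) / (hi - lo)        # complementary CDF, Pr(noise >= eta)
        best = max(best, fbar / f)
    return best

# For uniform noise the supremum equals the width of the support, hi - lo.
print(sup_hazard_ratio_uniform(-1.0, 2.0))
```

The supremum is attained at the lower end of the support, where $\overline{F} = 1$, so widening the support raises it one for one.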
Thus, the maximum effort principle is especially likely to hold if noise, risk aversion and the cost of effort are small.

18 Bennedsen, Perez-Gonzalez and Wolfenzon (2009) provide empirical evidence that CEOs have the same percentage effect on firm value, regardless of firm size; Edmans, Gabaix and Landier (2009) show that a multiplicative production function is necessary to generate empirically consistent predictions for the scaling of various measures of incentives with firm size.

We conjecture that a “maximum effort principle” holds under more general conditions than those considered above. For instance, it likely continues to hold if the principal’s objective function is $\max_{\{a(\eta)\}} \int b(A(\eta), \eta)\,f(\eta)\,d\eta - C[A]$ and the action space is bounded above by $\overline{a}$, i.e. the maximum feasible effort level equals the maximum productive effort level $\overline{a}$. This slight variant is economically very similar, since the principal never wishes to implement $A(\eta) > \overline{a}$ in our setting, but substantially more complicated mathematically, because the agent’s action space now has boundaries and so the incentive constraints become inequalities. We leave this extension to future research. Hellwig (2007) shows that this reason alone is sufficient for a boundary effort level to be always optimal in a multiperiod discrete model and a continuous-time model that can be approximated by a discrete-time model, even in the absence of the condition on the benefit of effort featured in this paper. Since the incentive constraints are inequalities with a boundary effort level, the principal has greater freedom in choosing the contract, which allows her to select a cheaper contract. Thus, the maximum effort result holds in settings even without a large benefit of effort. Lacker and Weinberg (1989) similarly derive a condition under which maximum effort (zero diversion in their setting) is optimal, for the case $v(c) = c$. In DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007), zero diversion is optimal since the agent is risk-neutral and so there is no trade-off between risk and incentives. Edmans, Gabaix, Sadzik and Sannikov (2009) extend the maximum effort principle to general $T$, for the case where $v(c) = \ln c$ (multiplicative preferences) and $u$ is CRRA.

In the full contracting problem, which solves for both the optimal effort level and the cheapest implementing contract, tractable contracts are attained by forcing a constant incentive slope on the agent to rule out the ambiguous reward for performance. This is achieved in our paper through two key mechanisms. First, we achieve a constant marginal cost of effort by implementing a constant target action. This requires the removal of dynamics so that the action that the principal wishes to implement is independent of prior period outcomes. Previous papers remove dynamics via removing wealth effects, so that the cost of implementing a given action is constant. For example, HM assume CARA utility and a pecuniary cost of effort, so that wealth has no effect on the agent’s risk aversion, and has an identical effect on the felicity from cash and cost of effort. DeMarzo and Sannikov (2006), DeMarzo and Fishman (2007) and Biais et al. (2007) assume risk-neutrality, so that risk aversion is independent of wealth (it is always zero) and the marginal utility of money is constant. The key insight of this paper is that we can remove dynamics without removing wealth effects, and thus without imposing constraints on the utility function or the cost of effort. Specifically, a constant target action need not require the cost of implementing the action to be constant: it only requires changes in these costs to be small compared to the benefits of effort. If the benefits of effort are sufficiently large (e.g. the firm is big), maximum effort remains optimal regardless of how the cost of implementing effort changes over time. Thus, our formulation allows for wealth effects to exist (and thus the utility function to be unrestricted), while at the same time removing dynamics and thus achieving tractability because such effects are small. The main limitation of our setup is that, in order to relax the HM assumptions, we require a restriction on the benefit of effort for Theorem 3 to hold.

Second, our timing assumption forces the constant marginal cost of effort (which is a consequence of the constant action) to equal the marginal felicity from cash state-by-state, and thus requires the reward for performance to be the same after every noise realization. In sum, the paper provides a set of sufficient conditions under which simple contracts can be obtained (actions following noise and a large benefit of effort), which is quite different from those considered in prior literature. They may therefore hold in settings where the alternative assumptions are not satisfied and tractability was previously believed to be unattainable.

Appendix E considers other sufficient conditions required for Proposition 3 to hold, which do not assume the benefit of effort is multiplicative in firm size. That section also shows that we can derive the optimal $\{A(\eta)\}$ in certain cases even where the maximum effort principle does not apply.

3.3 Determinants of the Maximum Effort Level

The previous section assumed that the maximum productive effort level $\overline{a}$ is exogenous. This section allows the principal to choose it endogenously according to the environment. We extend the contracting game to two stages. In the first stage, the principal chooses $\overline{a}$. In practice, this may be achieved by physical investment, training the agent, or organizational design. For example, building a larger plant gives the agent greater scope to add value; training the agent or choosing an organizational structure that gives him greater responsibility and freedom have the same effect. Since physical investment, training and organizational design are costly to reverse, we model this decision as irreversible. In the second stage, the game studied in the core model is played out. In this stage, the action $a$ may respond to the noise $\eta$, but the maximum productive effort $\overline{a}$ has been fixed. The principal’s payoff is:

$$\int b(\min(A(\eta), \overline{a}), \eta, \overline{a})\,f(\eta)\,d\eta - C[A], \qquad (29)$$

where $b(a, \eta, \overline{a})$ is weakly increasing in $a$ and decreasing in $\overline{a}$. Higher flexibility $\overline{a}$ is costly to the principal: for instance, we could have $b(a, \eta, \overline{a}) = b(a, \eta) - H(\overline{a})$, where $H(\overline{a})$ is the cost of implementing flexibility level $\overline{a}$.

Before we state the result formally, we summarize it. Under conditions described below, in the second stage, the principal will wish to implement the contract in Theorem 1 with $a^* = \overline{a}$, i.e. the maximum effort principle applies. In the first stage, when choosing $\overline{a}$, she will trade off the costs and benefits of a higher maximum effort. For instance, in the examples at the end of this section, $\overline{a}$ is decreasing in the agent’s disutility of effort and the noise dispersion. A trade-off exists in the first stage because the costs and benefits of flexibility are of similar order of magnitude. For example, increasing plant size has a continuous effect on firm value and involves a significant cost, which is also a function of firm size. However, it does not exist in the second stage because the costs of effort are a function of the agent’s salary, and the benefits are discontinuous. Once the plant has been built, the agent must run it fully efficiently to prevent significant value loss: even small imperfections will cause large reductions in value and so the marginal benefit of effort is high (analogous to Kremer’s (1993) O-ring theory). Thus, this enriched game features a simple optimal contract (since the target action in the second stage is constant), but one which also responds to the comparative statics of the environment. It may thus be a potentially useful way of modeling various economic problems, to achieve tractability while at the same time generating comparative statics.

To proceed more formally, consider the two following problems.

Problem 1: maximize over $\overline{a}$ and all unrestricted contracts:

$$\max_{\overline{a},\,\{a(\eta)\}} E\left[b(\min(A(\eta), \overline{a}), \eta, \overline{a})\right] - C[A].$$

Problem 2: maximize over $\overline{a}$ and use the contract in Theorem 1 which implements $\overline{a}$:

$$\max_{\overline{a}} E\left[b(\overline{a}, \eta, \overline{a})\right] - C[\overline{a}],$$

where $C[\overline{a}]$ is the expected cost of the contract implementing a constant action $\overline{a}$. Problem 2 optimizes over only a scalar $\overline{a}$, while Problem 1 optimizes over a whole continuum of contracts, including those that do not implement maximum effort. However, under some simple conditions, Problem 2 is not restrictive: both problems have the same solution.

Proposition 4 (Maximum effort in two-stage game). Let $\overline{a}_*$ denote the value of $\overline{a}$ in a solution to Problem 1, and assume that $\overline{a}_* > \underline{a}$ and that $\forall \eta$, $\inf_a \partial_1 b(a, \eta, \overline{a}_*)\,f(\eta) \ge \lambda(\overline{a}_*, \eta)$. Then, the solution of Problem 1 is the same as the solution of Problem 2: that is, the solution of the problem that implements $A(\eta) = \overline{a}$ is also the solution of the unrestricted contract.

Proof. Immediate given Theorem 3. At $\overline{a}_*$, the principal wants to implement maximum effort, i.e. $a(\eta) = \overline{a}_*$ for all $\eta$.

At first glance, the condition in Proposition 4 may appear restrictive, since verifying it requires solving Problem 1. However, sufficient conditions are simply $\inf_a \partial_1 b(a, \eta, \overline{a})\,f(\eta) \ge \lambda(\overline{a}, \eta)$ for all $\overline{a}$ and $\eta$. The value $\lambda$ can be calculated up to an integral, so bounds are reasonably straightforward to check in a given setting.

Illustrations. We now illustrate the contract and comparative statics in three examples, for specific cases of $u$ and $v$. We define $B(\overline{a}) = E[b(\overline{a}, \eta, \overline{a})]$, the principal’s expected payoff given target effort $\overline{a}$. The optimal contract gives $c = v^{-1}(g'(\overline{a})\,\eta + k)$ where $k$ satisfies $E[u(g'(\overline{a})\,\eta - g(\overline{a}) + k)] = \underline{u}$. Using previous notation, $k = K + g'(\overline{a})\,\overline{a}$. The expected cost of the contract is $C[\overline{a}] \equiv E[c(r)] = E[v^{-1}(g'(\overline{a})\,\eta + k)]$. It is straightforward to show that $C[\overline{a}]$ increases in target effort $\overline{a}$, the agent’s reservation utility $\underline{u}$, and the dispersion of noise $\eta$; the proof relies on the dispersion techniques used in this paper.

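The principal’s problem is:

$$\max_{\overline{a}}\; B(\overline{a}) - C[\overline{a}] \qquad (30)$$

and the optimal contract is the contract described in Theorem 1 implementing a constant $\overline{a}$. This is a simple problem to solve in many applied settings.

Example 1. Consider $u(x) = x$, $v(x) = x^{\gamma}$, $\gamma \in (0, 1]$. We have $k = g(\overline{a}) + \underline{u}$, and the contract is $c(r) = \left(g'(\overline{a})(r - \overline{a}) + g(\overline{a}) + \underline{u}\right)^{1/\gamma}$. The expected cost is19

$$C[\overline{a}] = E\left[\left(g'(\overline{a})\,\eta + g(\overline{a}) + \underline{u}\right)^{1/\gamma}\right].$$

$C[\overline{a}]$ can be obtained in closed form for various specific cases. For example, $\gamma = 1/2$ yields $C[\overline{a}] = g'(\overline{a})^2 \sigma^2 + (g(\overline{a}) + \underline{u})^2$; $\underline{u} = 0$ and $g(a) = e^{Ga}$ yields $C[\overline{a}] = e^{G\overline{a}/\gamma}\,E\left[(G\eta + 1)^{1/\gamma}\right]$. The principal chooses $\overline{a}$ to maximize $B(\overline{a}) - e^{G\overline{a}/\gamma}\,E\left[(G\eta + 1)^{1/\gamma}\right]$. Simple calculations show that the target action is decreasing in the marginal cost of effort $G$, risk aversion and the dispersion of noise.

Example 2. Consider $v(x) = \ln x$ and $u(x) = e^{(1-\gamma)x}/(1-\gamma)$ for $\gamma > 1$, so that the utility function is $\left(c\,e^{-g(a)}\right)^{1-\gamma}/(1-\gamma)$, as is commonly used in macroeconomics: it is CRRA and multiplicative in consumption and effort. We also assume $\eta \sim N(0, \sigma^2)$. Then, the contract is $c(r) = \exp\left(g'(\overline{a})(r - \overline{a}) + k\right)$ with $k = \ln \underline{c} + g(\overline{a}) - (1-\gamma)\,g'(\overline{a})^2 \sigma^2/2$, where $u(\ln \underline{c})$ is the reservation utility. The expected cost of the contract is:

$$C[\overline{a}] = \underline{c}\,\exp\left(g(\overline{a}) + \gamma\,g'(\overline{a})^2 \sigma^2/2\right).$$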
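The closed-form cost in Example 2 can be cross-checked by numerical integration. The sketch below is a hedged illustration under assumptions that are not in the paper: a quadratic cost $g(a) = Ga^2/2$ with $G = 2$, parameter values picked arbitrarily, and a trapezoid rule over $\pm 10\sigma$. It computes $E[\exp(g'(\overline{a})\eta + k)]$ directly from the contract and compares it with $\underline{c}\,\exp(g(\overline{a}) + \gamma g'(\overline{a})^2\sigma^2/2)$.

```python
import math

G = 2.0  # hypothetical cost parameter (assumption)

def g(a):
    """Hypothetical quadratic cost g(a) = G a^2 / 2."""
    return 0.5 * G * a * a

def g_prime(a):
    return G * a

def expected_cost_closed_form(a, c_res, gamma, sigma):
    """Closed form from Example 2: c_res * exp(g(a) + gamma * g'(a)^2 * sigma^2 / 2)."""
    return c_res * math.exp(g(a) + gamma * g_prime(a) ** 2 * sigma ** 2 / 2)

def expected_cost_numeric(a, c_res, gamma, sigma, n=20001, width=10.0):
    """E[exp(g'(a) eta + k)] for eta ~ N(0, sigma^2), trapezoid rule on [-width*sigma, width*sigma]."""
    k = math.log(c_res) + g(a) - (1 - gamma) * g_prime(a) ** 2 * sigma ** 2 / 2
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        eta = lo + i * h
        density = math.exp(-eta ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(g_prime(a) * eta + k) * density
    return total * h

closed = expected_cost_closed_form(a=0.5, c_res=1.0, gamma=2.0, sigma=0.2)
numeric = expected_cost_numeric(a=0.5, c_res=1.0, gamma=2.0, sigma=0.2)
print(closed, numeric)
```

Raising `gamma`, `sigma` or `a` in the calls above increases both numbers, which is the direction behind the comparative statics stated for this example.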

Again, calculations show that $\overline{a}$ is decreasing in the cost of effort, risk aversion and noise dispersion. We thus obtain the standard comparative statics, but for a contract that is log-linear, rather than linear in returns. Murphy (1999) argues that log-linear contracts are empirically more relevant.

Example 3. Consider $v(x) = x$, $g(a) = \frac{1}{2}Ga^2$, $u(x) = -e^{-\gamma x}$ with $G, \gamma > 0$, and $\eta \sim N(0, \sigma^2)$ as in HM. The cost of the contract is $C[\overline{a}] = \underline{c} + g(\overline{a}) + \gamma\,g'(\overline{a})^2 \sigma^2/2$, and the same three comparative statics hold.

Note that HM not only have a constant target action, but an additive effect of effort. We can obtain this result with $b(a, \eta, \overline{a}) = \overline{a} + (a - \overline{a})\,\kappa(\overline{a}, \eta)$, for some function $\kappa(\overline{a}, \eta) \ge \lambda(\overline{a}, \eta)/f(\eta)$. In the second stage of the game, having chosen $\overline{a}$, the principal wishes to implement constant effort $\overline{a}$ for all $\eta$, because the marginal cost of shirking (parameterized by $\kappa$) is sufficiently high. Moving to the first stage, since the principal knows that $a = \overline{a}$ in the second stage, her benefit function is $b(\overline{a}, \eta, \overline{a}) = \overline{a}$: effort has an additive effect. The key complication in obtaining the HM result is reconciling the linear marginal benefit of effort required for an additive effect, with the high marginal benefit of effort required for the maximum effort principle to apply to guarantee a constant action. The two-stage game resolves this tension because the marginal benefit of effort is moderate in the first stage and very high in the second stage, as discussed in the plant example earlier. Under this formulation, the cost of the contract implementing $a = \overline{a}$ is $C[\overline{a}] = \underline{c} + \frac{1}{2}G\overline{a}^2 + \frac{\gamma}{2}G^2\overline{a}^2\sigma^2$ and the principal maximizes $\overline{a} - \underline{c} - \frac{1}{2}G\overline{a}^2 - \frac{\gamma}{2}G^2\overline{a}^2\sigma^2$, which yields the result $\overline{a} = 1/\left(G(1 + \gamma G\sigma^2)\right)$, exactly as in HM. Thus, using the HM conditions of exponential utility, a pecuniary quadratic cost of effort and Gaussian noise in the above specification leads to the same optimal contract (not just the implementation contract) as in HM.

In Appendix E, we also provide explicit conditions under which maximum effort is optimal for the three above examples, i.e. a specialization of the conditions in Proposition 4 to these cases. These conditions allow straightforward verification of whether the maximum effort principle holds.

19 A variant is the case $u(x) = x$ and $v(x) = \ln x$. Then, the contract is $\ln c(r) = g'(\overline{a})(r - \overline{a}) + g(\overline{a}) + \underline{u}$, and the expected cost is $C(\overline{a}) = \exp\left[g(\overline{a}) + \underline{u}\right]\,E\left[e^{g'(\overline{a})\,\eta}\right]$.
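The first-stage choice in Example 3 is a one-dimensional optimization, so the closed-form effort level can be compared with a brute-force search. The following sketch assumes arbitrary illustrative parameter values and invented function names; it is a check of the first-order condition $1 - G\overline{a} - \gamma G^2 \sigma^2 \overline{a} = 0$, not a reproduction of any computation in the paper.

```python
def contract_cost(a, G, gamma, sigma, c_res=0.0):
    """Example 3 cost: c_res + G a^2 / 2 + gamma G^2 a^2 sigma^2 / 2."""
    return c_res + 0.5 * G * a ** 2 + 0.5 * gamma * G ** 2 * a ** 2 * sigma ** 2

def principal_payoff(a, G, gamma, sigma, c_res=0.0):
    """Additive benefit of effort minus the expected cost of the contract."""
    return a - contract_cost(a, G, gamma, sigma, c_res)

def optimal_effort_closed_form(G, gamma, sigma):
    """Solution of the first-order condition: 1 / (G (1 + gamma G sigma^2))."""
    return 1.0 / (G * (1.0 + gamma * G * sigma ** 2))

def optimal_effort_grid(G, gamma, sigma, step=1e-4, upper=5.0):
    """Brute-force maximizer of the principal's payoff on a fine grid."""
    n = int(upper / step)
    return max((i * step for i in range(n + 1)),
               key=lambda a: principal_payoff(a, G, gamma, sigma))

G, gamma, sigma = 2.0, 3.0, 0.5     # illustrative parameters (assumptions)
closed = optimal_effort_closed_form(G, gamma, sigma)
searched = optimal_effort_grid(G, gamma, sigma)
print(closed, searched)
```

Because the payoff is strictly concave in effort, the grid maximizer lies within one grid step of the closed-form value, and raising `G`, `gamma` or `sigma` lowers the chosen effort.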

4 Conclusion

This paper has identi…ed and analyzed a class of multiperiod situations in which the optimal contract is tractable, without requiring exponential utility, a pecuniary cost of e¤ort, Gaussian noise or continuous time. The contract’s functional form is independent of the agent’s utility function, reservation utility and noise distribution. Furthermore, when the cost of e¤ort can be expressed in …nancial terms, the optimal contract is linear and so the slope, in addition to the functional form, is independent of these parameters. The key to tractability in discrete time is specifying the noise before the action in each period, which forces the incentive compatibility constraint to hold state-by-state rather than just on average, and tightly constraints the set of contracts available to the principle. The optimal contract is very similar in continuous time, where noise and actions occur simultaneously. Hence, if underlying reality is continuous time, it is best approximated in discrete time under our timing assumption. Moving to the full contracting problem, our two-stage model allows the principal to choose the target e¤ort level to respond to the details of the environment, while retaining tractability. The principal initially sets a lower maximum productive e¤ort level if the agent is more risk averse or faces a higher cost of e¤ort or greater noise. However, in each subsequent period, the principal wishes the agent to exert maximum e¤ort, regardless of how output evolves. If the bene…ts of e¤ort are su¢ ciently high (e.g. the …rm is much larger than the agent’s salary), they swamp the costs, and so the optimal e¤ort level is independent of how the agent’s wealth evolves over time. Our paper suggests several avenues for future research. 
The HM framework has proven valuable in many areas of applied contract theory owing to its tractability; however, some models have used the HM result in settings where the assumptions are not satisfied (see the critique of Hemmer (2004)). Our framework allows tractable contracts to be achieved in such situations. In particular, our contracts are valid in situations where time is discrete, utility cannot be modeled as exponential (e.g. in calibrated models where it is necessary to capture decreasing absolute risk aversion), effort is non-pecuniary, or noise is not Gaussian (e.g. is bounded). While we considered the specific application of executive compensation, other possibilities include bank regulation, team production, insurance or taxation.20 In ongoing work (Edmans, Gabaix, Sadzik and Sannikov (2009)) we extend tractable contracts to a dynamic setting where the agent consumes in each period, can privately save, and may smooth earnings intertemporally. In addition, while our model has relaxed a number of assumptions required for tractability, it continues to impose a number of restrictions. In particular, the optimal action can only be solved tractably if the maximum effort principle applies or in certain other cases (e.g. linear cost of effort). Grossman and Hart (1983) and Garrett and Pavan (2009) show that solving for the optimal action in a general case is typically extremely complex; whether we can extend tractability to broader settings is an important area for future research. Similarly, while Section 3 allows for the action to depend on the noise in period t, a useful extension would be to allow the action to depend on the full history of outcomes. Other restrictions are mostly technical rather than economic. For example, our multiperiod model assumes independent noises with log-concave density functions, and our extension to noise-dependent target actions assumes an open action set where the maximum feasible effort level exceeds the maximum productive effort level. Some of these assumptions may not be valid in certain situations, limiting the applicability of our framework. Further research may be able to broaden the current setup.

20 See Golosov, Kocherlakota and Tsyvinski (2003) and Farhi and Werning (2009) for taxation applications of the principal-agent problem.

a a a a b c f g r u u v A C [A] F M S T V

Effort (also referred to as “action”) Maximum effort Maximum productive effort Target effort Benefit function for effort, defined over a Cash compensation, defined over r or Density of the noise distribution Cost of effort, defined over a Signal (or “return”), typically r = a + Agent’s utility function, defined over v (c) g (a) Agent’s reservation utility Agent’s felicity function, defined over c Noise Action function, defined over Expected cost of contract implementing A ( ) ; 2 Complementary cumulative distribution function of Message sent by agent to the principal Baseline size of output under agent’s control Number of periods Felicity provided by contract, defined over r or

;

Table 1: Key Variables in the Model.

A Mathematical Preliminaries

This section derives some mathematical results that we use for the main proofs.

A.1 Dispersion of Random Variables

We repeatedly use the “dispersive order” for random variables to show that IC constraints bind. Shaked and Shanthikumar (2007, Section 3.B) provide an excellent summary of known facts about this concept. This section provides a self-contained guide of the relevant results for our paper, as well as proving some new results. We commence by defining the notion of relative dispersion. Let X and Y denote two random variables with cumulative distribution functions F and G and corresponding right-continuous inverses $F^{-1}$ and $G^{-1}$. X is said to be less dispersed than Y if and only if $F^{-1}(\beta) - F^{-1}(\alpha) \le G^{-1}(\beta) - G^{-1}(\alpha)$ whenever $0 < \alpha \le \beta < 1$. This concept is location-free: X is less dispersed than Y if and only if it is less dispersed than Y + z, for any real constant z. A basic property is the following result (Shaked and Shanthikumar (2007), p.151):

Lemma 1 Let X be a random variable and f, h be functions such that $0 \le f(y) - f(x) \le h(y) - h(x)$ whenever $x \le y$. Then f(X) is less dispersed than h(X).
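The quantile-difference definition of the dispersive order can be checked numerically on a grid. The following is a minimal sketch, not part of the original argument: the helper name `less_dispersed`, the tolerance and the choice of normal distributions are all illustrative assumptions, and a finite grid can only suggest, not prove, the ordering.

```python
from statistics import NormalDist


def less_dispersed(inv_f, inv_g, grid, tol=1e-9):
    """Check F^{-1}(b) - F^{-1}(a) <= G^{-1}(b) - G^{-1}(a) for all a <= b on the grid."""
    for i, alpha in enumerate(grid):
        for beta in grid[i:]:
            gap_f = inv_f(beta) - inv_f(alpha)
            gap_g = inv_g(beta) - inv_g(alpha)
            if gap_f > gap_g + tol:
                return False
    return True


grid = [k / 100 for k in range(1, 100)]  # probabilities strictly inside (0, 1)
x = NormalDist(0.0, 1.0)                 # X
y = NormalDist(3.0, 2.0)                 # Y: shifted and more spread out

x_below_y = less_dispersed(x.inv_cdf, y.inv_cdf, grid)
y_below_x = less_dispersed(y.inv_cdf, x.inv_cdf, grid)
```

In this example the shift of 3 in the mean of Y plays no role, which illustrates that the order is location-free; only the larger scale of Y matters.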

This result is intuitive: h magnifies differences to a greater extent than f, leading to more dispersion. We will also use the next two comparison lemmas.

Lemma 2 Assume that X is less dispersed than Y and let f denote a weakly increasing function, h a weakly increasing concave function, and $\phi$ a weakly increasing convex function. Then:

$$E[f(X)] \ge E[f(Y)] \;\Rightarrow\; E[h(f(X))] \ge E[h(f(Y))],$$

$$E[f(X)] \le E[f(Y)] \;\Rightarrow\; E[\phi(f(X))] \le E[\phi(f(Y))].$$

Proof. The first statement comes directly from Shaked and Shanthikumar (2007), Theorem 3.B.2, which itself is taken from Landsberger and Meilijson (1994). The second statement is derived from the first, applied to $\hat{X} = -X$, $\hat{Y} = -Y$, $\hat{f}(x) = -f(-x)$, $h(x) = -\phi(-x)$. It can be verified directly (or via consulting Shaked and Shanthikumar (2007), Theorem 3.B.6) that $\hat{X}$ is less dispersed than $\hat{Y}$. In addition, $E[\hat{f}(\hat{X})] \ge E[\hat{f}(\hat{Y})]$. Thus, $E[h(\hat{f}(\hat{X}))] \ge E[h(\hat{f}(\hat{Y}))]$. Substituting $h(\hat{f}(\hat{X})) = -\phi(f(X))$ yields $E[\phi(f(X))] \le E[\phi(f(Y))]$.

Lemma 2 is intuitive: if $E[f(X)] \ge E[f(Y)]$, applying a concave function h should maintain the inequality. Conversely, if $E[f(X)] \le E[f(Y)]$, applying a convex function $\phi$ should maintain the inequality. In addition, if E[X] = E[Y], Lemma 2 implies that X second-order stochastically dominates Y. Hence, it is a stronger concept than second-order stochastic dominance. Lemma 2 allows us to prove Lemma 3 below, which states that the NIARA property of a utility function is preserved by adding a log-concave random variable to its argument.

Lemma 3 Let u denote a utility function with NIARA and Y a random variable with a log-concave distribution. Then, the utility function $\hat{u}$ defined by $\hat{u}(x) \equiv E[u(x + Y)]$ exhibits NIARA.

Proof. Consider two constants a < b and a lottery Z independent from Y. Let $C_a$ and $C_b$ be the certainty equivalents of Z with respect to utility function $\hat{u}$ and evaluated at points a and b respectively, i.e. defined by

$$\hat{u}(a + C_a) = E[\hat{u}(a + Z)], \qquad \hat{u}(b + C_b) = E[\hat{u}(b + Z)].$$

$\hat{u}$ exhibits NIARA if and only if $C_a \le C_b$, i.e. the certainty equivalent increases with wealth. To prove that $C_a \le C_b$, we make three observations. First, since u exhibits NIARA, there exists an increasing concave function h such that u(a + x) = h(u(b + x)) for all x. Second, because Y is log-concave, $Y + C_b$ is less dispersed than Y + Z by Theorem 3.B.7 of Shaked and Shanthikumar (2007). Third, by definition of $C_b$ and the independence of Y and Z, we have $E[u(b + Y + C_b)] = E[u(b + Y + Z)]$. Hence, we can apply Lemma 2, which yields $E[h(u(b + Y + C_b))] \ge E[h(u(b + Y + Z))]$, i.e.

$$E[u(a + Y + C_b)] \ge E[u(a + Y + Z)] = E[u(a + Y + C_a)] \text{ by definition of } C_a.$$

Thus we have $C_b \ge C_a$ as required.

A.2 Subderivatives

Since we cannot assume that the optimal contract is differentiable, we use the notion of subderivatives to allow for quasi first-order conditions in all cases.

Definition 1 For a point x and function f defined in a left neighborhood of x, we define the subderivative of f at x as:

$$\frac{d}{dx_-} f \equiv f'_-(x) \equiv \liminf_{y \uparrow x} \frac{f(x) - f(y)}{x - y}.$$

This notion will prove useful since $f'_-(x)$ is well-defined for all functions f (with perhaps infinite values). We take limits “from below,” as we will often apply the subderivative at the maximum feasible effort level $\bar{a}$. If f is left-differentiable at x, then $f'_-(x) = f'(x)$. We use the following Lemma to allow us to integrate inequalities with subderivatives. All the Lemmas in this subsection are proven in the Online Appendix.

Lemma 4 Assume that, over an interval I: (i) $f'_-(x) \ge j(x)\ \forall x$, for a continuous function j(x) and (ii) there is a $C^1$ function h such that f + h is nondecreasing. Then, for two points $a \le b$ in I, $f(b) - f(a) \ge \int_a^b j(x)\,dx$.

Condition (ii) prevents f(x) from exhibiting discontinuous downwards jumps, which would prevent integration.21 The following Lemma is the chain rule for subderivatives.

Lemma 5 Let x be a real number and f be a function defined in a left neighborhood of x. Suppose that function h is differentiable at f(x), with $h'(f(x)) > 0$. Then, $(h \circ f)'_-(x) = h'(f(x))\, f'_-(x)$.

In general, subderivatives typically follow the usual rules of calculus, with inequalities instead of equalities. One example is below.

Lemma 6 Let x be a real number and f, h be functions defined in a left neighborhood of x. Then $(f + h)'_-(x) \ge f'_-(x) + h'_-(x)$. When h is differentiable at x, then $(f + h)'_-(x) = f'_-(x) + h'(x)$.

21 For example, $f(x) = 1\{x \le 0\}$ satisfies condition (i) as $f'_-(x) = 0\ \forall x$, but violates both condition (ii) and the conclusion of the Lemma, as f(-1) > f(1).
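The subderivative and the role of condition (ii) in Lemma 4 can be illustrated numerically. The sketch below is a rough finite-step proxy for the lim inf from below, not an exact computation; the function name `left_subderivative` and the step sizes are illustrative assumptions.

```python
def left_subderivative(f, x, steps=(1e-3, 1e-4, 1e-5, 1e-6)):
    """Finite-step proxy for liminf_{y -> x from below} (f(x) - f(y)) / (x - y).

    Takes the smallest difference quotient over a few small steps h, with y = x - h.
    """
    return min((f(x) - f(x - h)) / h for h in steps)


def indicator_nonpositive(x):
    """f(x) = 1{x <= 0}: a function with a downward jump just after 0."""
    return 1.0 if x <= 0 else 0.0


# Left-differentiable cases: the subderivative equals the left derivative.
slope_abs_at_zero = left_subderivative(abs, 0.0)               # left slope of |x| at 0
slope_square_at_one = left_subderivative(lambda x: x * x, 1.0)  # close to 2

# The indicator has subderivative 0 at these points (condition (i) with j = 0),
# yet f(-1) > f(1): without condition (ii) the inequality cannot be integrated.
indicator_slopes = [left_subderivative(indicator_nonpositive, x) for x in (-1.0, 0.0, 1.0)]
```

The indicator example shows why a lower bound on the subderivative alone does not bound increments: the downward jump is invisible to limits taken from below.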

B Detailed Proofs

Throughout these proofs, we use tildes to denote random variables. For example, $\tilde{\eta}$ is the noise viewed as a random variable and $\eta$ is a particular realization of that noise. $E[f(\tilde{\eta})]$ denotes the expectation over all realizations of $\tilde{\eta}$ and $E[\tilde{f}(\tilde{\eta})]$ denotes the expectation over all realizations of both $\tilde{\eta}$ and a stochastic function $\tilde{f}$.

Proof of Theorem 1

Roadmap. We divide the proof in three parts. The first part shows that messages are redundant, so that we can restrict the analysis to contracts without messages. This part of the proof is standard and can be skipped at a first reading. The second part proves the theorem considering only deterministic contracts and assuming that $a_t^* < \bar{a}\ \forall t$. This case requires weaker assumptions (see Proposition 1). The third part, which is significantly more complex, rules out randomized contracts and allows for the target effort to be the maximum $\bar{a}$. Both these extensions require the concepts of subderivatives and dispersion from Appendix A.

1). Redundancy of Messages

Let r denote the vector $(r_1, ..., r_T)$ and define $\eta$ and a analogously. Define $g(a) = g(a_1) + ... + g(a_T)$. Let $\tilde{V}_M(r, \eta) = v(\tilde{c}(r, \eta))$ denote the felicity given by a message-dependent contract if the agent reports $\eta$ and the realized signals are r. Under the revelation principle, we can restrict the analysis to mechanisms that induce the agent to truthfully report the noise $\eta$. The incentive compatibility (IC) constraint is that the agent exerts effort $a^*$ and reports $\hat{\eta} = \eta$:

$$\forall \eta,\ \forall \hat{\eta},\ \forall a: \quad E\left[u\left(\tilde{V}_M(\eta + a, \hat{\eta}) - g(a)\right)\right] \le E\left[u\left(\tilde{V}_M(\eta + a^*, \eta) - g(a^*)\right)\right]. \tag{31}$$

The principal’s problem is to minimize expected pay $E\left[v^{-1}\left(\tilde{V}_M(\tilde{\eta} + a^*, \tilde{\eta})\right)\right]$, subject to the IC constraint (31), and the agent’s individual rationality (IR) constraint

$$E\left[u\left(\tilde{V}_M(\tilde{\eta} + a^*, \tilde{\eta}) - g(a^*)\right)\right] \ge \underline{u}. \tag{32}$$

Since $r = a^* + \eta$ on the equilibrium path, the message-dependent contract is equivalent to $\tilde{V}_M(r, r - a^*)$. We consider replacing this with a new contract $\tilde{V}(r)$, which only depends on the realized signal and not on any messages, and yields the same felicity as the corresponding message-dependent contract. Thus, the felicity it gives is defined by:

$$\tilde{V}(r) = \tilde{V}_M(r, r - a^*). \tag{33}$$

The IC and IR constraints for the new contract are given by:

$$\forall \eta,\ \forall a: \quad E\left[u\left(\tilde{V}(\eta + a) - g(a)\right)\right] \le E\left[u\left(\tilde{V}(\eta + a^*) - g(a^*)\right)\right], \tag{34}$$

$$E\left[u\left(\tilde{V}(\tilde{\eta} + a^*) - g(a^*)\right)\right] \ge \underline{u}. \tag{35}$$

If the agent reports $\hat{\eta} \ne \eta$, he must take action a such that $\eta + a = \hat{\eta} + a^*$. Substituting $\hat{\eta} = \eta + a - a^*$ into (31) and (32) indeed yields (34) and (35) above. Thus, the IC and IR constraints of the new contract are satisfied. Moreover, the new contract costs exactly the same as the old contract, since it yields the same felicity by (33). Hence, the new contract $\tilde{V}(r)$ induces incentive compatibility and participation at the same cost as the initial contract $\tilde{V}_M(r, \eta)$ with messages, and so messages are not useful. The intuition is that $a^*$ is always exerted, so the principal can already infer $\eta$ from the signal r without requiring messages.

2). Deterministic Contracts, in the case $a_t^* < \bar{a}\ \forall t$

We will prove the Theorem by induction on T.

2a). Case T = 1. Dropping the time subscript for brevity, the incentive compatibility (IC) constraint is:

$$\forall \eta,\ \forall a: \quad V(\eta + a) - g(a) \le V(\eta + a^*) - g(a^*).$$

Defining $r = \eta + a^*$ and $r' = \eta + a$, we have $a = a^* + r' - r$. The IC constraint can be rewritten:

$$g(a^*) - g(a^* + r' - r) \le V(r) - V(r').$$

Rewriting this inequality interchanging r and $r'$ yields $g(a^*) - g(a^* + r - r') \le V(r') - V(r)$, and so:

$$g(a^*) - g(a^* + r' - r) \le V(r) - V(r') \le g(a^* + r - r') - g(a^*). \tag{36}$$

We first consider $r > r'$. Dividing through by $r - r'$ yields:

$$\frac{g(a^*) - g(a^* + r' - r)}{r - r'} \le \frac{V(r) - V(r')}{r - r'} \le \frac{g(a^* + r - r') - g(a^*)}{r - r'}. \tag{37}$$

Since $a^*$ is in the interior of the action space A and the support of $\eta$ is open, there exists $r'$ in the neighborhood of r. Taking the limit $r' \uparrow r$, the first and third terms of (37) converge to $g'(a^*)$. Therefore, the left derivative $V'_{left}(r)$ exists, and equals $g'(a^*)$. Second, consider $r < r'$. Dividing (36) through by $r - r'$, and taking the limit $r' \downarrow r$ shows that the right derivative $V'_{right}(r)$ exists, and equals $g'(a^*)$. Therefore,

$$V'(r) = g'(a^*). \tag{38}$$

Since r has interval support22, we can integrate to obtain, for some integration constant K:

$$V(r) = g'(a^*)\, r + K. \tag{39}$$
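The state-by-state logic behind the linear contract (39) can be illustrated with a small numerical check. This is a sketch under illustrative assumptions: a quadratic cost of effort, a grid of feasible actions, and the hypothetical helper `best_response`; none of these specific choices come from the text. Because u is increasing and the contract is deterministic, the agent simply maximizes V(η + a) − g(a) after observing η.

```python
def best_response(slope, cost, eta, actions, K=0.0):
    """Action maximizing V(eta + a) - g(a) under the linear contract V(r) = slope * r + K."""
    return max(actions, key=lambda a: slope * (eta + a) + K - cost(a))


G = 2.0        # illustrative cost parameter, g(a) = 0.5 * G * a^2
a_star = 0.4   # target action, interior to the action grid


def cost(a):
    return 0.5 * G * a * a


slope = G * a_star                            # g'(a*) for the quadratic cost
actions = [i / 1000 for i in range(0, 1001)]  # feasible actions on [0, 1]

# The chosen action should not depend on the realized noise.
responses = {eta: best_response(slope, cost, eta, actions)
             for eta in (-3.0, 0.0, 0.7, 5.0)}
```

With a slope of g'(a*), the noise enters the agent's objective only as an additive constant, so the same action is chosen in every state; no assumption on the noise distribution or on u (beyond monotonicity) is used in this check.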

2b). If the Theorem holds for T, it holds for T + 1. This part is as in the main text.

Note that the above proof (for deterministic contracts where $a_t^* < \bar{a}$) does not require log-concavity of $\eta_t$, nor that u satisfies NIARA. This is because the contract (7) is the only incentive compatible contract. These assumptions are only required for the general proof, where other contracts (e.g. randomized ones) are also incentive compatible, to show that they are costlier than contract (7).

3). General Proof

We no longer restrict $a_t^*$ to be in the interior of A, and allow for randomized contracts. We wish to prove the following Statement T by induction on integer T:

Statement T: Consider a utility function u with NIARA, independent random variables $\tilde{r}_1, ..., \tilde{r}_T$ where $\tilde{r}_2, ..., \tilde{r}_T$ are log-concave, and a sequence of nonnegative numbers $g'(a_1^*), ..., g'(a_T^*)$. Consider the set of (potentially randomized) contracts $\tilde{V}(r_1, ..., r_T)$ such that (i) $E\left[u\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right] \ge \underline{u}$; (ii) $\forall t = 1...T$,

$$\frac{d}{d\varepsilon_-} E\left[u\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_t + \varepsilon, ..., \tilde{r}_T)\right) \mid \tilde{r}_1, ..., \tilde{r}_t\right]_{\mid \varepsilon = 0} \ge g'(a_t^*)\, E\left[u'\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_t, ..., \tilde{r}_T)\right) \mid \tilde{r}_1, ..., \tilde{r}_t\right] \tag{40}$$

and (iii) $\forall t = 1, ..., T$, $E\left[u\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_t, ..., \tilde{r}_T)\right) \mid \tilde{r}_1, ..., \tilde{r}_t\right]$ is nondecreasing in $\tilde{r}_t$. In this set, for any increasing and convex cost function $\phi$, $E\left[\phi\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right]$ is minimized with contract $V^0(r_1, ..., r_T) = \sum_{t=1}^{T} g'(a_t^*)\, r_t + K$, where K is a constant that makes the participation constraint (i) bind.

Condition (ii) is the local IC constraint, for deviations from below. We first consider the case of deterministic contracts, and then show that randomized contracts are costlier. We use the notation $E_t[\cdot] = E[\cdot \mid \tilde{r}_1, ..., \tilde{r}_t]$ to denote the expectation based on time-t information.

3a). Deterministic Contracts

The key difference from the proof in 2) is that we now must allow for $a_t^* = \bar{a}$.

3ai). Proof of Statement T when T = 1. (40) becomes $\frac{d}{d\varepsilon_-} u(V(r + \varepsilon))_{\mid \varepsilon = 0} \ge g'(a_1^*)\, u'(V(r))$. Applying Lemma 5 to $h = u^{-1}$ yields:

$$V'_-(r) \ge g'(a^*). \tag{41}$$

It is intuitive that (41) should bind, as this minimizes the variability in the agent’s pay and thus constitutes efficient risk-sharing. We now prove that this is indeed the case; to simplify exposition, we normalize $g(a^*) = 0$ w.l.o.g.23 If constraint (41) binds, the contract is $V^0(r) = g'(a^*)\, r + K$, where K satisfies $E[u(g'(a^*)\, r + K)] = \underline{u}$. We wish to show that any other contract V(r) that satisfies (41) is weakly costlier. By assumption (iii) in Statement 1, V is nondecreasing. We can therefore apply Lemma 4 to equation (41), where condition (ii) of the Lemma is satisfied by $h(r) \equiv 0$. This implies that for $r \le r'$, $V(r') - V(r) \ge g'(a^*)(r' - r) = V^0(r') - V^0(r)$. Thus, using Lemma 1, $V(\tilde{r})$ is more dispersed than $V^0(\tilde{r})$. Since V must also satisfy the participation constraint, we have:

$$E[u(V(\tilde{r}))] \ge \underline{u} = E\left[u\left(V^0(\tilde{r})\right)\right]. \tag{42}$$

Applying Lemma 2 to the convex function $\phi \circ u^{-1}$ and inequality (42), we have:

$$E\left[\phi \circ u^{-1} \circ u(V(\tilde{r}))\right] \ge E\left[\phi \circ u^{-1} \circ u\left(V^0(\tilde{r})\right)\right],$$

i.e. $E[\phi(V(\tilde{r}))] \ge E[\phi(V^0(\tilde{r}))]$. The expected cost of $V^0$ is weakly less than for V. Hence, the contract $V^0$ is cost-minimizing.

We note that this last part of the reasoning underpins item 2 in Section 2.3, the extension to a risk-averse principal. Suppose that the principal wants to minimize E[w(c)], where w is an increasing and concave function, rather than E[c]. Then, the above contract is optimal if $w \circ v^{-1} \circ u^{-1}$ is convex, i.e. $u \circ v \circ w^{-1}$ is concave. This requires w to be “not too concave,” i.e. the agent to be not too risk-averse.

Finally, we verify that the contract $V^0$ satisfies the global IC constraint. The agent’s objective function becomes $u(g'(a^*)(a + \eta) - g(a))$. Since g(a) is convex, the argument of $u(\cdot)$ is concave. Hence, the first-order condition gives the global optimum.

3aii). Proof that if Statement T holds for T, it holds for T + 1. We define a new utility function $\hat{u}$ as follows:

$$\hat{u}(x) = E\left[u\left(x + g'(a_{T+1}^*)\, \tilde{r}_{T+1}\right)\right]. \tag{43}$$

Since $\tilde{r}_{T+1}$ is log-concave, $g'(a_{T+1}^*)\, \tilde{r}_{T+1}$ is also log-concave. From Lemma 3, $\hat{u}$ has the same NIARA property as u. For each $\tilde{r}_1, ..., \tilde{r}_T$, we define $k(\tilde{r}_1, ..., \tilde{r}_T)$ as the solution to equation (44) below:

$$\hat{u}(k(\tilde{r}_1, ..., \tilde{r}_T)) = E_T\left[u(V(\tilde{r}_1, ..., \tilde{r}_{T+1}))\right]. \tag{44}$$

22 The model could be extended to allowing non-interval support: if the domain of r was a union of disjoint intervals, we would have a different integration constant K for each interval.

23 Formally, this can be achieved by replacing the utility function u(x) by $u^{new}(x) = u(x - g(a^*))$ and the cost function g(a) by $g^{new}(a) = g(a) - g(a^*)$, so that $u(x - g(a)) = u^{new}(x - g^{new}(a))$.

k represents the expected felicity from contract V based on all noise realizations up to and including time T. The goal is to show that any other contract $V \ne V^0$ is weakly costlier. To do so, we wish to apply Statement T for utility function $\hat{u}$ and contract k. The first step is to show that, if Conditions (i)-(iii) hold for utility function u and contract V at time T + 1, they also hold for $\hat{u}$ and k at time T, thus allowing us to apply the Statement for these functions. Taking expectations of (44) over $\tilde{r}_1, ..., \tilde{r}_T$ yields:

$$E[\hat{u}(k(\tilde{r}_1, ..., \tilde{r}_T))] = E[u(V(\tilde{r}_1, ..., \tilde{r}_{T+1}))] \ge \underline{u}, \tag{45}$$

where the inequality comes from Condition (i) for utility function u and contract V at time T + 1. Hence, Condition (i) holds for utility function $\hat{u}$ and contract k at time T. In addition, it is immediate that $E[\hat{u}(k(\tilde{r}_1, ..., \tilde{r}_T)) \mid \tilde{r}_1, ..., \tilde{r}_t]$ is nondecreasing in $\tilde{r}_t$ (Condition (iii)). We thus need to show that Condition (ii) is satisfied. Since equation (40) holds for t = T + 1, we have

$$\frac{d}{d\varepsilon_-} u(V(\tilde{r}_1, ..., \tilde{r}_T, \tilde{r}_{T+1} + \varepsilon))_{\mid \varepsilon = 0} \ge g'(a_{T+1}^*)\, u'[V(\tilde{r}_1, ..., \tilde{r}_{T+1})].$$

Applying Lemma 5 with function $u^{-1}$ yields:

$$\frac{d V(r_1, ..., r_{T+1})}{d r_{T+1,-}} \ge g'(a_{T+1}^*). \tag{46}$$

Hence, using Lemma 1 and Lemma 4, we see that conditional on $\tilde{r}_1, ..., \tilde{r}_T$, $V(\tilde{r}_1, ..., \tilde{r}_{T+1})$ is more dispersed than $k(\tilde{r}_1, ..., \tilde{r}_T) + g'(a_{T+1}^*)\, \tilde{r}_{T+1}$. Using (43), we can rewrite equation (44) as

$$E_T\left[u\left(k(\tilde{r}_1, ..., \tilde{r}_T) + g'(a_{T+1}^*)\, \tilde{r}_{T+1}\right)\right] = E_T[u(V(\tilde{r}_1, ..., \tilde{r}_{T+1}))].$$

Since u exhibits NIARA, $-u''(x)/u'(x)$ is nonincreasing in x. This is equivalent to $u' \circ u^{-1}$ being weakly convex. We can thus apply Lemma 2 to yield:

$$E_T\left[u' \circ u^{-1} \circ u(V(\tilde{r}_1, ..., \tilde{r}_{T+1}))\right] \ge E_T\left[u' \circ u^{-1} \circ u\left(k(\tilde{r}_1, ..., \tilde{r}_T) + g'(a_{T+1}^*)\, \tilde{r}_{T+1}\right)\right],$$

i.e.

$$E_T[u'(V(\tilde{r}_1, ..., \tilde{r}_{T+1}))] \ge E_T[\hat{u}'(k(\tilde{r}_1, ..., \tilde{r}_T))]. \tag{47}$$

Applying definition (44) to the left-hand side of Condition (ii) for T + 1 yields, with t = 1...T,

$$\frac{d}{d\varepsilon_-} E_t[\hat{u}(k(\tilde{r}_1, ..., \tilde{r}_t + \varepsilon, ..., \tilde{r}_T))]_{\mid \varepsilon = 0} \ge g'(a_t^*)\, E[u'(V(\tilde{r}_1, ..., \tilde{r}_t, ..., \tilde{r}_{T+1})) \mid \tilde{r}_1, ..., \tilde{r}_t].$$

Taking expectations of equation (47) at time t and substituting into the right-hand side of the above equation yields:

$$\frac{d}{d\varepsilon_-} E_t[\hat{u}(k(\tilde{r}_1, ..., \tilde{r}_t + \varepsilon, ..., \tilde{r}_T))]_{\mid \varepsilon = 0} = \frac{d}{d\varepsilon_-} E_t[u(V(\tilde{r}_1, ..., \tilde{r}_t + \varepsilon, ..., \tilde{r}_{T+1}))]_{\mid \varepsilon = 0} \ge g'(a_t^*)\, E_t[\hat{u}'(k(\tilde{r}_1, ..., \tilde{r}_T))].$$

Hence the IC constraint holds for contract $k(\tilde{r}_1, ..., \tilde{r}_T)$ and utility function $\hat{u}$ at time T, and so Condition (ii) of Statement T is satisfied. We can therefore apply Statement T at T to contract $k(r_1, ..., r_T)$, utility function $\hat{u}$ and cost function $\hat{\phi}$ defined by:

$$\hat{\phi}(x) \equiv E\left[\phi\left(x + g'(a_{T+1}^*)\, \tilde{r}_{T+1}\right)\right].$$

We observe that the contract $V^0 = \sum_{t=1}^{T+1} g'(a_t^*)\, r_t + K$ satisfies:

$$E\left[\hat{u}\left(\sum_{t=1}^{T} g'(a_t^*)\, \tilde{r}_t + K\right)\right] = E\left[u\left(\sum_{t=1}^{T+1} g'(a_t^*)\, \tilde{r}_t + K\right)\right] = \underline{u}.$$

Therefore, applying Statement T to k, $\hat{u}$ and $\hat{\phi}$ implies:

$$C_k = E\left[\hat{\phi}(k(\tilde{r}_1, ..., \tilde{r}_T))\right] \ge C_{V^0} = E\left[\hat{\phi}\left(\sum_{t=1}^{T} g'(a_t^*)\, \tilde{r}_t + K\right)\right]. \tag{48}$$

Using equation (48) yields:

$$C_k = E\left[\phi\left(k(\tilde{r}_1, ..., \tilde{r}_T) + g'(a_{T+1}^*)\, \tilde{r}_{T+1}\right)\right] \ge C_{V^0} = E\left[\phi\left(\sum_{t=1}^{T+1} g'(a_t^*)\, \tilde{r}_t + K\right)\right]. \tag{49}$$

Finally, we compare the cost of contract $k(r_1, ..., r_T) + g'(a_{T+1}^*)\, \tilde{r}_{T+1}$ to the cost of the original contract $V(r_1, ..., r_{T+1})$. Since equation (44) is satisfied, we can apply Lemma 2 to the convex function $\phi \circ u^{-1}$ and the random variable $\tilde{r}_{T+1}$ to yield

$$E_T[\phi(V(\tilde{r}_1, ..., \tilde{r}_{T+1}))] \ge E_T\left[\phi\left(k(\tilde{r}_1, ..., \tilde{r}_T) + g'(a_{T+1}^*)\, \tilde{r}_{T+1}\right)\right],$$

$$E[\phi(V(\tilde{r}_1, ..., \tilde{r}_{T+1}))] \ge E\left[\phi\left(k(\tilde{r}_1, ..., \tilde{r}_T) + g'(a_{T+1}^*)\, \tilde{r}_{T+1}\right)\right] = C_k \ge C_{V^0},$$

where the final inequality comes from (49). Hence the cost of contract k is weakly greater than the cost of contract $V^0$. This concludes the proof for T + 1.

3b). Optimality of Deterministic Contracts

Consider a randomized contract $\tilde{V}(r_1, ..., r_T)$ and define the “certainty equivalent” contract $\bar{V}$ by:

$$u\left(\bar{V}(r_1, ..., r_T)\right) \equiv E_T\left[u\left(\tilde{V}(r_1, ..., r_T)\right)\right]. \tag{50}$$

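We wish to apply Statement T (which we have already proven for deterministic contracts) to contract $\bar{V}$, and so must verify that its three conditions are satisfied. From the above definition, we obtain

$$E\left[u\left(\bar{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right] = E\left[u\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right] \ge \underline{u},$$

i.e., $\bar{V}$ satisfies the participation constraint (32). Hence, Condition (i) holds. Also, it is clear that Condition (iii) holds for $\bar{V}$, given it holds for $\tilde{V}$. We thus need to show that Condition (ii) is also satisfied. Applying Jensen’s inequality to equation (50) and the function $u' \circ u^{-1}$ (which is convex since u exhibits NIARA) yields: $u'\left(\bar{V}(r_1, ..., r_T)\right) \le E_T\left[u'\left(\tilde{V}(r_1, ..., r_T)\right)\right]$. We apply this to $r_t = \tilde{r}_t$ for t = 1...T and take expectations to obtain

$$E_t\left[u'\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right] \ge E_t\left[u'\left(\bar{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right]. \tag{51}$$

Applying definition (50) to the left-hand side of (40) yields:

$$\frac{d}{d\varepsilon_-} E_t\left[u\left(\bar{V}(\tilde{r}_1, ..., \tilde{r}_t + \varepsilon, ..., \tilde{r}_T)\right)\right]_{\mid \varepsilon = 0} = \frac{d}{d\varepsilon_-} E_t\left[u\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_t + \varepsilon, ..., \tilde{r}_T)\right)\right]_{\mid \varepsilon = 0} \ge g'(a_t^*)\, E_t\left[u'\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_t, ..., \tilde{r}_T)\right)\right],$$

and using (51) yields:

$$\frac{d}{d\varepsilon_-} E_t\left[u\left(\bar{V}(\tilde{r}_1, ..., \tilde{r}_t + \varepsilon, ..., \tilde{r}_T)\right)\right]_{\mid \varepsilon = 0} \ge g'(a_t^*)\, E_t\left[u'\left(\bar{V}(\tilde{r}_1, ..., \tilde{r}_t, ..., \tilde{r}_T)\right)\right].$$

Condition (ii) of Statement T therefore holds for $\bar{V}$. We can therefore apply Statement T to show that $V^0$ has a weakly lower cost than $\bar{V}$. We next show that the cost of $\bar{V}$ is weakly less than the cost of $\tilde{V}$. Applying Jensen’s inequality to (50) and the convex function $\phi \circ u^{-1}$ yields: $\phi\left(\bar{V}(r_1, ..., r_T)\right) \le E\left[\phi\left(\tilde{V}(r_1, ..., r_T)\right)\right]$. We apply this to $r_t = \tilde{r}_t$ for t = 1...T and take expectations over the distribution of $\tilde{r}_t$ to obtain:

$$E\left[\phi\left(\bar{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right] \le E\left[\phi\left(\tilde{V}(\tilde{r}_1, ..., \tilde{r}_T)\right)\right].$$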

Hence V has a weakly lower cost than Ve . Therefore, V 0 has a weakly lower cost than Ve . This proves the Statement for randomized contracts. 3c). Main Proof. Having proven Statement T , we now turn to the main proof of Theorem 1. The value of the signal on the equilibrium path is given by ret at + et . We de…ne u (x)

u x

T X s=1

We seek to use Statement

T

!

g (as ) :

(52)

applied to function u and random variable ret , and thus must 38

h i verify that its three conditions are satis…ed. Since E u Ve (e r1 ; :::; reT ) holds. The IC constraint for time t is: 0 2 arg max Et u Ve (a1 + e1 ; :::; at + et + "; :::; aT + eT ) 0 2 arg max Et u Ve (e r1 ; :::; ret + "; :::; reT ) "

X

g (at + ")

"

i.e.

u, Condition (i)

g (as ) ;

s=1:::T;s6=t

X

g (at + ")

!

g (as ) :

s=1:::T;s6=t

!

(53)

We note that, for a function f ("), 0 2 arg max" f (") implies that for all " < 0, (f (0) f (")) = ( ") 0. Call X (") the argument of u in 0, hence, taking the lim inf y"0 , we obtain d"d f 0 (")j"=0 d equation (53). Applying this result to (53), we …nd: d" Et u (X ("))j"=0 0. h i 0. Using Lemma 6, d"d X (")j"=0 = Using Lemma 5, we …nd Et u0 (X (0)) d"d X (")j"=0 d e V (e r1 ; :::; ret + "; :::; reT ) g 0 (at ), hence we obtain: d"

Et u0 (X (0))

d e V (e r1 ; :::; ret + "; :::; reT ) d"

g 0 (at )

0:

Using again Lemma 5, this can be rewritten: " d r1 ; :::; ret + "; :::; reT ) Et u Ve (e d"

X

!#

g 0 (at ) Et [u0 (X (0))] ;

g (as )

s=1:::T

j"=0

i.e., using the notation (52),

h i d Et u Ve (e r1 ; :::; ret + "; :::; reT ) d" j"=0

h i g 0 (at ) Et u0 Ve (e r1 ; :::; ret ; :::; reT ) :

Therefore, Condition (ii) of Statement T holds. Finally, we verify Condition (iii). Apply (53) to signal rt and deviation " < 0. We obtain: "

X

Et u Ve (e r1 ; :::; ret + "; :::; reT )

g (as )

s=1:::T

"

Et u Ve (e r1 ; :::; ret + "; :::; reT ) "

Et u Ve (r1 ; :::; rt + "; :::; rT )

!#

g (at + ")

s=1:::T;s6=t

g (at )

X

s=1:::T;s6=t

so Condition (iii) holds for contract Ve and utility function u. 39

X

!#

g (as ) !#

g (as )

;

We can now apply Statement T to contract Ve and function u, to prove that any globally P IC contract is weakly costlier than contract V 0 = Tt=1 g 0 (at ) rt + K. Moreover, it is clear that V 0 satis…es the global IC conditions in equation (53). Thus, V 0 is the cheapest contract that satis…es the global IC constraint. Proof of Proposition 1 Conditionally on ( t )t T +1 , we must have: aT +1 2 arg max u V a1 + aT +1

1 ; :::; aT +1

+

g (aT +1 )

T +1

X

!

g (at ) :

t6=T +1

Using the proof of Theorem 1 with T = 1, this implies that, for rT +1 in the interior of the support of reT +1 (given (rt )t T ), V (r1 ; :::; rT +1 ) can be written: V (r1 ; :::; rT +1 ) = KT (r1 ; :::; rT ) + g 0 aT +1 rT +1 ;

for some function KT (r1 ; :::; rT ). Next, consider the problem of implementing action aT at time T . We require that, for all ( t )t T , "

aT 2 arg max ET u KT (a1 + aT

1 ; :::; aT

+

T)

+ g 0 aT +1

T +1

+ aT +1

g (aT )

X t6=T

!#

g (at )

This can be rewritten aT 2 arg max u b (KT (a1 +

1 ; :::; aT

aT

+

T)

g (aT )) ;

i h P ) j ; :::; g (a where u b (x) E u x + g 0 aT +1 + a 1 T . T +1 t T +1 t6=T Using the same arguments as above for T + 1, that implies that, for rT in the interior of the support of reT (given (rt )t T 1 ) we can write: KT (r1 ; :::; rT ) = KT

for some function KT we can write, for (rt )t

1

(r1 ; :::; rT

1)

+ g 0 (aT ) rT

(r1 ; :::; rT 1). Proceeding by induction, we see that this implies that rt )t T +1 , T +1 in the interior of the support of (e

1

VT +1 (r1 ; :::; rT +1 ) =

T +1 X

g 0 (at ) rt + K0 ;

t=1

for some constant K0 . This yields the “necessary”…rst part of the Proposition. The converse part of the Proposition is immediate. Given the proposed contract, the agent


:

faces the decision: "

max E u

(at )t

T

T X

g 0 (at ) at

g (at ) +

T X t=1

t=1

g 0 (at )

t

!#

;

which is maximized pointwise when g 0 (at ) at g (at ) is maximized. This in turn requires at = at . Proof of Theorem 2 We shall use the following purely mathematical Lemma, proven in the Online Appendix. Lemma 7 Consider a standard Brownian process Zt with …ltration Ft , a deterministic nonRT RT negative process t , an Ft adapted process t , T 0, X = 0 t dZt , and Y = 0 t dZt . Suppose that almost surely, 8t 2 [0; T ], t t . Then X second-order stochastically dominates Y. Lemma 7 is intuitive: since t 0, it makes sense that Y is more volatile than X. t To derive the IC constraint, we use the methodology introduced by Sannikov (2008). We RT observe that the term 0 t dt induces a constant shift, so w.l.o.g we can assume t = 0 8 t. For an arbitrary adapted policy function a = (at )t2[0;T ] , let Qa denote the probability meaRt sures induced by a. Then, Zta = 0 (drs as ds) = s is a Brownian motion under Qa , and Rt Zta = 0 (drs as ds) = s is a Brownian under Qa , where a is the policy (at )t2[0;T ] : Rt Recall that, if the agent exerts policy a , then rt = 0 as ds + s dZs . We de…ne vT = v (c). By the martingale representation theorem (Karatzas and Shreve (1991), p. 182) applied to RT process vt = Et [vT ] for t 2 [0; T ], we can write: vT = 0 t (drt at dt) + v0 for some constant v0 and a process t adapted to the …ltration induced by (rs )s t . We proceed in two steps. 1) We show that policy a is optimal for the agent if and only if, for almost all t 2 [0; T ]: at 2 arg max t at at

g (at ) :

(54)

To prove this claim, consider another action policy (at ), adapted to the …ltration induced RT by (Zs )s t . Consider the value W = vT g (at ) dt, so that the …nal utility for the agent 0 RT under policy a is u (W ). De…ning L [ t at g (at ) t at + g (at )] dt, it can be rewritten 0 W = v0 +

Z

T

t (drt

at dt)

0

Z

T

g (at ) dt + L:

0

Suppose that (54) is not veri…ed on the set of times with positive measure. Then, consider a policy a such that t at g (at ) > t at g (at ) for t 2 , and at = at on [0; T ] n . We thus


have L > 0. Consider the agent’s utility under policy a: a

U =E

a

Z

u vT

= E a u v0 +

a

a t t dZt

0

> E a u v0 +

T a t t dZt

0

=E

a

u v0 +

Z

T a t t dZt

T

g (at ) dt + L

0

T

g (at ) dt

= Ua ;

0

0

0

Z

T

at dt) = E u v0 + t (drt 0 Z T g (at ) dt + L 0 Z T since L > 0 g (at ) dt 0 Z Z T a g (at ) dt =E u vT

g (at ) dt

0 Z T

Z

Z

T

where U a is the agent’s utility under policy a . Hence, as U a > U a , the IC condition is violated. We conclude that condition (54) is necessary for the contract to satisfy the IC condition. We next show that condition (54) is also su¢ cient to satisfy the IC condition. Indeed, consider any adapted policy a. Then, L 0. So, the above reasoning shows that U a Ua . Policy a is at least as good as any alternative strategy a. 2) We show that cost-minimization entails t = g 0 (at ). (54) implies t = g 0 (at ) if at 2 (a; a), and t g 0 (a ) if at = a. The case where at 2 (a; a) 8 t is straightforward. The IC contract must have the form: v (cT ) = v0 +

Z

Z

T 0

g (at ) (drt

at dt) =

T

g 0 (at ) drt + K;

0

0

RT where K = v0 + 0 g 0 (at ) at dt. Cost minimization entails the lowest possible v0 . The case where at = a for some t is more complex, since the IC constraint is only an g 0 (at ). We must therefore prove this inequality binds. Consider inequality: t t X=

Z

T t

t dzt ,

Y =

0

Z

T t t dzt :

0

RT RT By reshifting u (x) ! u x g (at ) dt if necessary, we can assume 0 g (at ) dt = 0 to 0 simplify notation. We wish to show that a contract vT = Y + KY , with E [u (Y + KY )] u, has a weakly greater expected cost than a contract v = X + KX , with E [u (X + KX )] = u. Lemma 7 implies that E [u (X + KX )] E [u (Y + KX )], and so E [u (Y + KX )] Thus, KX

E [u (X + KX )] = u

KY . Since v is increasing and concave, v


1

[u (Y + KY )] : is convex and

v

1

is concave. We

can therefore apply Lemma 7 to function E v

1

(X + KX )

v

E v

1

1

to yield:

(Y + KX )

E v

1

(Y + KY ) ;

where the second inequality follows from KX KY . Therefore, the expected cost of v = X +KX is weakly less that of Y + KY , and so contract v = X + KX is cost-minimizing. More explicitly, RT that is the contract (22) with K = KX + 0 g 0 (at ) at dt.

Proof of Proposition 2

The proof is by induction.

Proof of Proposition 2 for T = 1. We remove time subscripts and let $V(\hat\eta) = v(C(\hat\eta))$ denote the felicity received by the agent if he announces $\hat\eta$ and signal $A(\hat\eta) + \hat\eta$ is revealed. If the agent reports $\eta$, the principal expects to see signal $\eta + A(\eta)$. Therefore, if the agent deviates to report $\hat\eta \ne \eta$, he must take action $a$ such that $\eta + a = \hat\eta + A(\hat\eta)$, i.e. $a = A(\hat\eta) + \hat\eta - \eta$. Hence, the truth-telling constraint is: $\forall \eta,\ \forall \hat\eta$,

$$V(\hat\eta) - g(A(\hat\eta) + \hat\eta - \eta) \le V(\eta) - g(A(\eta)). \tag{55}$$

Defining

$$\Psi(\eta) \equiv V(\eta) - g(A(\eta)),$$

the truth-telling constraint (55) can be rewritten,

$$g(A(\hat\eta)) - g(A(\hat\eta) + \hat\eta - \eta) \le \Psi(\eta) - \Psi(\hat\eta). \tag{56}$$

Rewriting this inequality interchanging $\eta$ and $\hat\eta$ and combining with the original inequality (56) yields: $\forall \eta,\ \forall \hat\eta$:

$$g(A(\hat\eta)) - g(A(\hat\eta) + \hat\eta - \eta) \le \Psi(\eta) - \Psi(\hat\eta) \le g(A(\eta) + \eta - \hat\eta) - g(A(\eta)). \tag{57}$$

Consider a point $\eta$ where $A$ is continuous and take $\hat\eta < \eta$. Dividing (57) by $\eta - \hat\eta > 0$ and taking the limit $\hat\eta \uparrow \eta$ yields $\Psi'_{left}(\eta) = g'(A(\eta))$. Next, consider $\hat\eta > \eta$. Dividing (57) by $\eta - \hat\eta < 0$ and taking the limit $\hat\eta \downarrow \eta$ yields $\Psi'_{right}(\eta) = g'(A(\eta))$. Hence,

$$\Psi'(\eta) = g'(A(\eta)), \tag{58}$$

at all points where $A$ is continuous.

Equation (58) holds only almost everywhere, since we have only assumed that $A$ is almost everywhere continuous. To complete the proof, we require a regularity argument about $\Psi$ (otherwise $\Psi$ might jump, for instance). We will show that $\Psi$ is absolutely continuous (see, e.g., Rudin (1987), p.145). Consider a compact subinterval $I$, and $\bar{a}_I = \sup\{A(\eta) + \hat\eta - \eta \mid \eta, \hat\eta \in I\}$, which is finite because $A$ is assumed to be bounded in any compact subinterval of $\eta$. Then, equation (57) implies:

$$|\Psi(\eta) - \Psi(\hat\eta)| \le \max\left\{ |g(A(\hat\eta)) - g(A(\hat\eta) + \hat\eta - \eta)|,\ |g(A(\eta) + \eta - \hat\eta) - g(A(\eta))| \right\} \le |\eta - \hat\eta|\,(\sup g')_I .$$

This implies that $\Psi$ is absolutely continuous on $I$. Therefore, by the fundamental theorem of calculus for almost everywhere differentiable functions (Rudin (1987), p.148), we have that for any $\eta, \eta_*$, $\Psi(\eta) = \Psi(\eta_*) + \int_{\eta_*}^{\eta} \Psi'(x)\,dx$. From (58), $\Psi(\eta) = \Psi(\eta_*) + \int_{\eta_*}^{\eta} g'(A(x))\,dx$, i.e.

$$V(\eta) = g(A(\eta)) + \int_{\eta_*}^{\eta} g'(A(x))\,dx + k \tag{59}$$

with $k = \Psi(\eta_*)$. This concludes the derivation of the contract when T = 1.

“Second-order conditions.” We next show that the contract (59) does implement effort $A(\eta)$, iff $A(\eta) + \eta$ is nondecreasing: we have verified the first order condition, but we need to show that (55) holds given the proposed contract, that is, that $\pi(\hat\eta) \equiv V(\hat\eta) - g(A(\hat\eta) + \hat\eta - \eta)$ has a maximum at $\eta$.

Proof that $A(\eta) + \eta$ nondecreasing is a sufficient condition for the contract to implement the action. First, we do this when $A(\eta)$ is a $C^1$ function. Then,

$$\pi'(\hat\eta) = V'(\hat\eta) - g'(A(\hat\eta) + \hat\eta - \eta)\,(A'(\hat\eta) + 1) = \left[ g'(A(\hat\eta)) - g'(A(\hat\eta) + \hat\eta - \eta) \right] (A'(\hat\eta) + 1).$$

As $A'(\hat\eta) + 1 \ge 0$ and $g$ is convex, we have $\pi'(\hat\eta) \ge 0$ for $\hat\eta \le \eta$ and $\pi'(\hat\eta) \le 0$ for $\hat\eta \ge \eta$. That shows that $\pi(\hat\eta)$ is maximized at $\hat\eta = \eta$.

Second, in the case where $A$ is not necessarily $C^1$, we approximate the weakly increasing function $A(\eta) + \eta$ by a series of $C^1$ weakly increasing functions $A_n(\eta) + \eta$. (It is well-known that this is easy to do by convolution: take a random variable $\varepsilon$ with bounded support and $C^1$ density $f$, and define $A_n(\eta) + \eta = E\left[A\left(\eta + \tfrac{\varepsilon}{n}\right) + \eta + \tfrac{\varepsilon}{n}\right] = \int (A(x) + x)\, f(n(x - \eta))\, n\,dx$, which is increasing in $\eta$ by the first equality, and $C^1$ by the second.) Consider the associated contract $V_n \to V$. We have seen that $\eta \in \arg\max_{\hat\eta} V_n(\hat\eta) - g(A_n(\hat\eta) + \hat\eta - \eta)$, so in the limit, $\eta \in \arg\max_{\hat\eta} V(\hat\eta) - g(A(\hat\eta) + \hat\eta - \eta)$.

Proof that $A(\eta) + \eta$ nondecreasing is a necessary condition. Call $R(\eta) = A(\eta) + \eta$. Suppose by contradiction that there are two points $\eta < \eta'$ such that $R(\eta) > R(\eta')$. Those two points can be taken arbitrarily close (indeed, consider a large $N$, the points $\eta_i = \eta + (\eta' - \eta)\, i/N$, $i = 0 \ldots N$; there must be an $i$ such that $R(\eta_i) > R(\eta_{i+1})$, otherwise we would have $R(\eta) = R(\eta_0) \le R(\eta_N) = R(\eta')$). As domain $\mathcal{A}$ of actions is open, that implies that $A(\eta) + \eta - \eta' \in \mathcal{A}$. Applying (55) at points $\eta$ and $\eta'$, we have:

$$V(\eta') - g(A(\eta') + \eta' - \eta) \le V(\eta) - g(A(\eta)) \quad \text{and} \quad V(\eta) - g(A(\eta) + \eta - \eta') \le V(\eta') - g(A(\eta'))$$

$$\Rightarrow\quad g(A(\eta')) - g(A(\eta) + \eta - \eta') \le V(\eta') - V(\eta) \le g(A(\eta') + \eta' - \eta) - g(A(\eta)).$$

Calling $y \equiv A(\eta) + \eta - \eta' < x \equiv A(\eta)$ and $h = A(\eta') + \eta' - A(\eta) - \eta < 0$, this writes $g(y + h) - g(y) \le g(x + h) - g(x)$, and we have a contradiction if $g$ is strictly convex.
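The second-order condition above can be illustrated numerically. The following is a minimal sketch, not part of the original proof: it assumes a quadratic cost $g(a) = a^2/2$, a lower integration limit of 0, $k = 0$, and two hypothetical action functions (one with $A(\eta) + \eta$ increasing, one with it decreasing). All function names are illustrative.

```python
import math

def g(a):
    # Assumed quadratic effort cost (illustrative choice, not from the paper).
    return 0.5 * a * a

def g_prime(a):
    return a

def integral(f, lo, hi, n=400):
    # Midpoint rule; a negative step handles hi < lo with the correct sign.
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def felicity(A, eta_hat, eta_star=0.0, k=0.0):
    # Contract of the form (59): V = g(A) + integral of g'(A(x)) dx + k.
    return g(A(eta_hat)) + integral(lambda x: g_prime(A(x)), eta_star, eta_hat) + k

def payoff_from_report(A, eta_hat, eta):
    # Agent with true noise eta who reports eta_hat must take
    # action A(eta_hat) + eta_hat - eta to match the expected signal.
    return felicity(A, eta_hat) - g(A(eta_hat) + eta_hat - eta)

def best_report(A, eta, grid):
    return max(grid, key=lambda eta_hat: payoff_from_report(A, eta_hat, eta))

GRID = [-2.0 + 0.02 * i for i in range(201)]

def A_monotone(eta):
    # A(eta) + eta has slope 1 + 0.5*cos(eta) > 0, so it is increasing.
    return 1.0 + 0.5 * math.sin(eta)

def A_decreasing(eta):
    # A(eta) + eta = -eta is decreasing, violating the condition.
    return -2.0 * eta

if __name__ == "__main__":
    print(best_report(A_monotone, 0.5, GRID))
    print(best_report(A_decreasing, 0.5, GRID))
```

Under these assumptions, truthful reporting is the best grid report for the monotone plan, while for the decreasing plan the first-order condition still holds at the truth but marks a minimum of the agent's payoff, so the best report lies elsewhere.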

Proof that if Proposition 2 holds for T, it holds for T + 1. This part of the proof is as the proof of Theorem 1 in the main text. At $t = T + 1$, if the agent reports $\hat\eta_{T+1}$, he must take action $a = A(\hat\eta_{T+1}) + \hat\eta_{T+1} - \eta_{T+1}$ so that the signal $a + \eta_{T+1}$ is consistent with declaring $\hat\eta_{T+1}$. The IC constraint is therefore:

$$\eta_{T+1} \in \arg\max_{\hat\eta_{T+1}} V(\eta_1, \ldots, \eta_T, \hat\eta_{T+1}) - g(A(\hat\eta_{T+1}) + \hat\eta_{T+1} - \eta_{T+1}) - \sum_{t=1}^{T} g(a_t). \tag{60}$$

Applying the result for T = 1, to induce $\hat\eta_{T+1} = \eta_{T+1}$, the contract must be of the form:

$$V(\eta_1, \ldots, \eta_T, \hat\eta_{T+1}) = W_{T+1}(\hat\eta_{T+1}) + k(\eta_1, \ldots, \eta_T), \tag{61}$$

where $W_{T+1}(\hat\eta_{T+1}) = g(A(\hat\eta_{T+1})) + \int_{\eta_*}^{\hat\eta_{T+1}} g'(A(x))\,dx$ and $k(\eta_1, \ldots, \eta_T)$ is the “constant” viewed from period $T + 1$. In turn, $k(\eta_1, \ldots, \eta_T)$ must be chosen to implement $\hat\eta_t = \eta_t\ \forall t = 1 \ldots T$, viewed from time 0, when the agent's utility is:

$$E\left[ u\left( k(\eta_1, \ldots, \eta_T) + W_{T+1}(\hat\eta_{T+1}) - \sum_{t=1}^{T} g(a_t) \right) \right].$$

Defining

$$\hat{u}(x) = E\left[u\left(x + W_{T+1}(\tilde\eta_{T+1})\right)\right], \tag{62}$$

the principal's problem is to implement $\hat\eta_t = \eta_t\ \forall t = 1 \ldots T$, with a contract $k(\eta_1, \ldots, \eta_T)$, given a utility function $E\left[\hat{u}\left(k(\eta_1, \ldots, \eta_T) - \sum_{t=1}^{T} g(a_t)\right)\right]$. Applying the result for $T$, we see that $k$ must be:

$$k(\eta_1, \ldots, \eta_T) = \sum_{t=1}^{T} g(A_t(\eta_t)) + \sum_{t=1}^{T} \int_{\eta_*}^{\eta_t} g'(A_t(x))\,dx + k^*$$

for some constant $k^*$. Combining this with (59), the only incentive compatible contract is:

$$V(\eta_1, \ldots, \eta_T, \eta_{T+1}) = \sum_{t=1}^{T+1} g(A_t(\eta_t)) + \sum_{t=1}^{T+1} \int_{\eta_*}^{\eta_t} g'(A_t(x))\,dx + k^*.$$

The treatment of the second-order conditions ($A_t(\eta_t) + \eta_t$ nondecreasing) is as in the T = 1 case.

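Proof of Proposition 3

Step 1. It is easier to work in terms of $Q(\eta) = g'(A(\eta))$, the marginal cost of effort associated with plan $A(\eta)$. With a slight abuse of notation, define $C[Q]$ as the expected cost of implementing plan $Q = \{Q(\eta)\}$. From Proposition 2 with T = 1, $c(\eta; Q) = v^{-1}\left( g\left((g')^{-1}(Q(\eta))\right) + \int_0^{\eta} Q(x)\,dx + K \right)$, where $K$ is the solution of $E\left[u\left(\int_0^{\eta} Q(x)\,dx + K\right)\right] = \underline{u}$. Then, the expected cost is:

$$C[Q] = E[c(\eta; Q)].$$

We first establish that the contract cost $C[Q]$ is convex in the plan $Q$. Consider two plans $Q^1$ and $Q^2$, $\lambda_1 + \lambda_2 = 1$ with $\lambda_1, \lambda_2 \in [0, 1]$, and the plan $Q$ defined by $Q(\eta) = \lambda_1 Q^1(\eta) + \lambda_2 Q^2(\eta)$. Since $u$ is concave,

$$E\left[u\left(\int_0^{\eta} Q(x)\,dx + \lambda_1 K_1 + \lambda_2 K_2\right)\right] \ge \underline{u},$$

so the constant $K$ associated with the new plan satisfies $K \le \lambda_1 K_1 + \lambda_2 K_2$. This shows that the function $K[Q]$ is convex in $Q$. Since $g \circ (g')^{-1}$ and $v^{-1}$ are convex, $C[Q] \le \lambda_1 C[Q^1] + \lambda_2 C[Q^2]$, i.e., $C$ is convex.

Step 2. Since $C$ is convex, we have:

$$C[\bar{Q}] - C[Q] \le \int \frac{\partial C[\bar{Q}]}{\partial Q(\eta)} \left(\bar{Q} - Q(\eta)\right) d\eta.$$

Furthermore, since $g'$ is convex, $\bar{Q} - Q(\eta) \le g''(\bar{a})\,(\bar{a} - A(\eta))$. Defining $\Lambda(\bar{a}, \eta) = \max\left\{0,\ \frac{\partial C[\bar{A}]}{\partial A(\eta)}\right\}$, we have

$$C[\bar{A}] - C[A] \le \int \Lambda(\bar{a}, \eta)\,(\bar{a} - a(\eta))\,d\eta.$$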

A Microfoundation for the Principal's Objective

We offer a microfoundation for the principal's objective function (26). Suppose that the agent can take two actions, a “fundamental” action $a^F \in (\underline{a}, \bar{a}]$ and a manipulative action $m \ge 0$. Firm value is a function of $a^F$ only, i.e. the benefit function is $b(a^F, \eta)$. The signal is increasing in both actions: $r = a^F + m + \eta$. The agent's utility is $v(c) - \left[g^F(a^F) + G(m)\right]$, where $g^F$, $G$ are increasing and convex, $G(0) = 0$, and $G'(0) \ge (g^F)'(\bar{a})$. The final assumption means that manipulation is costlier than fundamental effort. We define $a = a^F + m$ and the cost function $g(a) = \min_{a^F, m} \left\{ g^F(a^F) + G(m) \mid a^F + m = a \right\}$, so that $g(a) = g^F(a)$ for $a \in (\underline{a}, \bar{a}]$ and $g(a) = g^F(\bar{a}) + G(a - \bar{a})$ for $a \ge \bar{a}$, which is increasing and convex. Then, firm value can be written $b(\min(a, \bar{a}), \eta)$, as in equation (26).

This framework is consistent with rational expectations. Suppose $b(a^F, \eta) = e^{a^F + \eta}$. After observing the signal $r$, the market forms its expectation $P_1$ of the firm value $b(a^F, \eta)$. The incentive contract described in Proposition 2 implements $a \le \bar{a}$, so the agent will not engage in manipulation. Therefore, the rational expectations price is $P_1 = e^{r}$. In more technical terms, consider the game in which the agent takes action $a$ and the market sets price $P_1$ after observing signal $r$. It is a Bayesian Nash equilibrium for the agent to choose $A(\eta)$ and for the market to set price $P_1 = e^{r}$.
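The piecewise form of the combined cost function can be checked by brute force. The sketch below is illustrative only: it assumes $g^F(a) = a^2/2$, $G(m) = m + m^2/2$, $\underline{a} = 0$ and $\bar{a} = 1$ (hypothetical choices that satisfy $G(0) = 0$ and $G'(0) \ge (g^F)'(\bar{a})$), and compares a grid minimization over the manipulation level with the claimed closed form.

```python
import math

A_BAR = 1.0  # assumed upper bound on fundamental effort

def g_F(a):
    # Assumed cost of fundamental effort (illustrative).
    return 0.5 * a * a

def G(m):
    # Assumed cost of manipulation: G(0) = 0 and G'(0) = 1 >= g_F'(A_BAR) = 1.
    return m + 0.5 * m * m

def combined_cost(a, n=20001):
    # Minimize g_F(a_F) + G(m) subject to a_F + m = a,
    # with m >= 0 and 0 <= a_F <= A_BAR, by searching over m on a grid.
    m_min = max(0.0, a - A_BAR)
    m_max = a
    best = math.inf
    for i in range(n):
        m = m_min + (m_max - m_min) * i / (n - 1)
        best = min(best, g_F(a - m) + G(m))
    return best

def claimed_cost(a):
    # Piecewise form stated in the text.
    if a <= A_BAR:
        return g_F(a)
    return g_F(A_BAR) + G(a - A_BAR)

if __name__ == "__main__":
    for a in (0.3, 1.0, 1.7):
        print(a, combined_cost(a), claimed_cost(a))
```

With these functional forms the agent never manipulates while $a \le \bar{a}$, and uses manipulation only for the excess $a - \bar{a}$ beyond it.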

References

[1] Arnott, R. and J. Stiglitz (1988): “Randomization with Asymmetric Information.” RAND Journal of Economics 19, 344-362
[2] Baker, G. (1992): “Incentive Contracts and Performance Measurement.” Journal of Political Economy 100, 598-614
[3] Bennedsen, M., F. Perez-Gonzalez and D. Wolfenzon (2009): “Do CEOs Matter?” Working Paper, Copenhagen Business School
[4] Biais, B., T. Mariotti, G. Plantin and J.-C. Rochet (2007): “Dynamic Security Design: Convergence to Continuous Time and Asset Pricing Implications.” Review of Economic Studies 74, 345-390
[5] Biais, B., T. Mariotti, J.-C. Rochet and S. Villeneuve (2009): “Large Risks, Limited Liability and Dynamic Moral Hazard.” Working Paper, Université de Toulouse
[6] Caplin, A. and B. Nalebuff (1991): “Aggregation and Social Choice: A Mean Voter Theorem.” Econometrica 59, 1-23
[7] Cooley, T. and E. Prescott (2005): “Economic Growth and Business Cycles,” in “Frontiers in Business Cycle Research,” T. Cooley ed., Princeton University Press, Princeton
[8] DeMarzo, P. and M. Fishman (2007): “Optimal Long-Term Financial Contracting.” Review of Financial Studies 20, 2079-2127
[9] DeMarzo, P. and Y. Sannikov (2006): “Optimal Security Design and Dynamic Capital Structure in a Continuous-Time Agency Model.” Journal of Finance 61, 2681-2724
[10] Dittmann, I., and E. Maug (2007): “Lower Salaries and No Options? On the Optimal Structure of Executive Pay.” Journal of Finance 62, 303-343
[11] Dittmann, I., E. Maug and O. Spalt (2009): “Sticks or Carrots? Optimal CEO Compensation when Managers are Loss-Averse.” Journal of Finance, forthcoming
[12] Edmans, A., X. Gabaix and A. Landier (2009): “A Multiplicative Model of Optimal CEO Incentives in Market Equilibrium.” Review of Financial Studies, forthcoming
[13] Edmans, A., X. Gabaix, T. Sadzik and Y. Sannikov (2009): “Dynamic Incentive Accounts.” NBER Working Paper No. 15324.
[14] Farhi, E. and I. Werning (2009): “Capital Taxation: Quantitative Explorations of the Inverse Euler Equation.” Working Paper, Harvard University.
[15] Golosov, M., N. Kocherlakota and A. Tsyvinski (2003): “Optimal Indirect Capital Taxation.” Review of Economic Studies, 70, 569-587.
[16] Grossman, S. and O. Hart (1983): “An Analysis of the Principal-Agent Problem.” Econometrica 51, 7-45
[17] Hall, B. and J. Liebman (1998): “Are CEOs Really Paid Like Bureaucrats?” Quarterly Journal of Economics, 113, 653-691
[18] Hall, B. and K. Murphy (2002): “Stock Options for Undiversified Executives.” Journal of Accounting and Economics 33, 3-42
[19] Harris, M. and A. Raviv (1979): “Optimal Incentive Contracts With Imperfect Information.” Journal of Economic Theory 20, 231-259
[20] He, Z. (2009a): “Optimal Executive Compensation when Firm Size Follows Geometric Brownian Motion.” Review of Financial Studies 22, 859-892.
[21] He, Z. (2009b): “Dynamic Compensation Contracts with Private Savings.” Working Paper, University of Chicago
[22] Hellwig, M. (2007): “The Role of Boundary Solutions in Principal-Agent Problems of the Holmstrom-Milgrom Type.” Journal of Economic Theory 136, 446-475
[23] Hellwig, M. and K. Schmidt (2002): “Discrete-Time Approximations of the Holmstrom-Milgrom Brownian-Motion Model of Intertemporal Incentive Provision.” Econometrica 70, 2225-2264
[24] Hemmer, T. (2004): “Lessons Lost in Linearity: A Critical Assessment of the General Usefulness of LEN Models in Compensation Research.” Journal of Management Accounting Research 16, 149-162
[25] Holmstrom, B. and P. Milgrom (1987): “Aggregation and Linearity in the Provision of Intertemporal Incentives.” Econometrica 55, 303-328
[26] Jewitt, I. (1988): “Justifying the First-Order Approach to Principal-Agent Problems.” Econometrica 56, 1177-1190
[27] Karatzas, I. and S. E. Shreve (1991): Brownian Motion and Stochastic Calculus, 2nd edition, Springer Verlag
[28] Kremer, M. (1993): “The O-Ring Theory of Economic Development.” Quarterly Journal of Economics 108, 551-576.
[29] Krishna, V. and E. Maenner (2001): “Convex Potentials with an Application to Mechanism Design.” Econometrica 69, 1113-1119
[30] Lacker, J. and J. Weinberg (1989): “Optimal Contracts under Costly State Falsification.” Journal of Political Economy 97, 1345-1363
[31] Laffont, J.-J. and D. Martimort (2002): “The Theory of Incentives: The Principal-Agent Model.” Princeton University Press, Princeton.
[32] Landsberger, M. and I. Meilijson (1994): “The Generating Process and an Extension of Jewitt's Location Independent Risk Concept.” Management Science 40, 662-669
[33] Mirrlees, J. (1974): “Notes on Welfare Economics, Information and Uncertainty” in Michael Balch, Daniel McFadden, and Shih-Yen Wu, eds., Essays on Economic Behavior under Uncertainty, North-Holland, Amsterdam.
[34] Mueller, H. (2000): “Asymptotic Efficiency in Dynamic Principal-Agent Problems.” Journal of Economic Theory 91, 292-301
[35] Murphy, K. (1999): “Executive Compensation” in Orley Ashenfelter and David Card, eds., Handbook of Labor Economics, Vol. 3b. New York and Oxford: Elsevier/North-Holland, 2485-2563
[36] Ou-Yang, H. (2003): “Optimal Contracts in a Continuous-Time Delegated Portfolio Management Problem.” Review of Financial Studies 16, 173-208
[37] Phelan, C. and R. Townsend (1991): “Private Information and Aggregate Behaviour: Computing Multi-Period, Information-Constrained Optima.” Review of Economic Studies 58, 853-881
[38] Prendergast, C. (2002): “The Tenuous Trade-Off between Risk and Incentives.” Journal of Political Economy, 110, 1071-1102
[39] Rogerson, W. (1985): “The First Order Approach to Principal-Agent Problems.” Econometrica 53, 1357-1368
[40] Rudin, W. (1987): Real and Complex Analysis, 3rd edition, McGraw-Hill
[41] Sannikov, Y. (2008): “A Continuous-Time Version of the Principal-Agent Problem.” Review of Economic Studies, 75, 957-984
[42] Sappington, D. (1983): “Limited Liability Contracts Between Principal and Agent.” Journal of Economic Theory 29, 1-21
[43] Schaettler, H. and J. Sung (1993): “The First-Order Approach to the Continuous-Time Principal-Agent Problem With Exponential Utility.” Journal of Economic Theory 61, 331-371
[44] Shaked, M. and G. Shanthikumar (2007): Stochastic Orders, Springer Verlag
[45] Spear, S. and S. Srivastava (1987): “On Repeated Moral Hazard With Discounting.” Review of Economic Studies 54, 599-617
[46] Sung, J. (1995): “Linearity with Project Selection and Controllable Diffusion Rate in Continuous-Time Principal-Agent Problems.” RAND Journal of Economics 26, 720-743

Online Appendix for “Tractability in Incentive Contracting” Alex Edmans and Xavier Gabaix November 9, 2009

D Multidimensional Signal and Action

While the core model involves a single signal and action, this section shows that our contract is robust to a setting of multidimensional signals and actions. For brevity, we only analyze the discrete-time one-period case, since the continuous time extension is similar.

The agent now takes a multidimensional action $\mathbf{a} \in \mathcal{A}$, which is a compact subset of $\mathbb{R}^I$ for some integer $I$. (Note that in this section, bold font has a different usage than in the proof of Theorem 1.) The signal is also multidimensional:

$$\mathbf{r} = \mathbf{b}(\mathbf{a}) + \boldsymbol{\eta},$$

where $\boldsymbol{\eta}, \mathbf{r} \in \mathbb{R}^S$, and $\mathbf{b}: \mathcal{A} \subset \mathbb{R}^I \to \mathbb{R}^S$. The signal and action can be of different dimensions. In the core model, $S = I = 1$ and $b(a) = a$. As before, the contract is $c(\mathbf{r})$ and the indirect felicity function is $V(\mathbf{r}) = v(c(\mathbf{r}))$. The following Proposition states the optimal contract.

Proposition 5 (Optimal contract, discrete time, multidimensional signal and action). Define the $I \times S$ matrix $L = \mathbf{b}'(\mathbf{a}^*)^{\top}$, i.e. explicitly $L_{ij} = \frac{\partial b_j}{\partial a_i}(a_1^*, \ldots, a_I^*)$, and assume that there is a vector $\boldsymbol{\theta} \in \mathbb{R}^S$ such that

$$L \boldsymbol{\theta} = g'(\mathbf{a}^*), \tag{63}$$

i.e., explicitly:

$$\forall i = 1 \ldots I, \quad \sum_{j=1}^{S} \frac{\partial b_j}{\partial a_i}(a_1^*, \ldots, a_I^*)\, \theta_j = \frac{\partial g}{\partial a_i}(a_1^*, \ldots, a_I^*).$$

The following contract is optimal. The agent is paid

$$c(\mathbf{r}) = v^{-1}\left(\boldsymbol{\theta} \cdot \mathbf{r} + K(\mathbf{r})\right), \tag{64}$$

i.e., explicitly, $c(\mathbf{r}) = v^{-1}\left(\sum_{j=1}^{S} \theta_j r_j + K(r_1, \ldots, r_S)\right)$, where the function $K(\cdot)$ is the solution of the following optimization problem:

$$\min_{K(\cdot)} E\left[K(\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta})\right] \quad \text{subject to}$$

$$\forall \mathbf{r}, \quad L K'(\mathbf{r}) = 0, \tag{65}$$

$$E\left[u\left(\boldsymbol{\theta} \cdot (\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta}) + K(\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta}) - g(\mathbf{a}^*)\right)\right] \ge \underline{u}.$$

Proof. Here we derive the first-order condition; the remainder of the proof is as in Theorem 1 of the main paper. Incentive compatibility requires that, for all $\boldsymbol{\eta}$,

$$\mathbf{a}^* \in \arg\max_{\mathbf{a}} V(\mathbf{b}(\mathbf{a}) + \boldsymbol{\eta}) - g(\mathbf{a}),$$

and so:

$$V'(\mathbf{b}(\mathbf{a}^*) + \boldsymbol{\eta})\, \mathbf{b}'(\mathbf{a}^*) - g'(\mathbf{a}^*) = 0, \tag{66}$$

where $V'$ is an $S$-dimensional vector, $\mathbf{b}'(\mathbf{a}^*)$ is an $S \times I$ matrix, and $g'(\mathbf{a}^*)$ is an $I$-dimensional vector. Integrating (66) gives: $V(\mathbf{r}) = \boldsymbol{\theta} \cdot \mathbf{r} + K(\mathbf{r})$, where $\boldsymbol{\theta} \cdot \mathbf{r} = \sum_{i=1}^{S} \theta_i r_i$, and $L K'(\mathbf{r}) = 0$.

Note that $K(\mathbf{r})$ is now a function and so determined by solving an optimization problem. In the core model, $K$ is a constant and determined by solving an equality. We now analyze two specific applications of this extension.

Two signals. The agent takes a single action, but there are two signals of performance:

$$r_1 = a + \varepsilon_1, \qquad r_2 = a + \varepsilon_2.$$

In this case, $L = (1\ \ 1)$. Therefore, with $\boldsymbol{\theta} = (\theta_1, \theta_2) \in \mathbb{R}^2$, (63) becomes: $\theta_1 + \theta_2 = g'(a^*)$. For example, we can take $\theta_1 = \theta_2 = g'(a^*)/2$. Next, (65) becomes: $\partial K/\partial r_1 + \partial K/\partial r_2 = 0$. It is well known that this can be integrated into: $K(r_1, r_2) = k(r_1 - r_2)$ for a function $k$. Hence, the optimal contract can be written:

$$c = v^{-1}\left( g'(a^*)\, \frac{r_1 + r_2}{2} + k(r_1 - r_2) \right),$$

where the function $k(\cdot)$ is chosen to minimize the cost of the contract subject to the participation constraint. As in Holmstrom (1979), all informative signals should be used to determine the agent's compensation.

Relative performance evaluation. Again, there is a single action and two signals, but the second signal is independent of the agent's action, as in Holmstrom (1982):

$$r_1 = a + \varepsilon_1, \qquad r_2 = \varepsilon_2.$$

In this case, $L = (1\ \ 0)$. Therefore, with $\boldsymbol{\theta} = (\theta_1, \theta_2) \in \mathbb{R}^2$, (63) becomes: $\theta_1 = g'(a^*)$. Next, (65) becomes: $\partial K/\partial r_1 = 0$, so that $K(r_1, r_2) = k(r_2)$ for a function $k$. Hence, the optimal contract can be written:

$$c = v^{-1}\left( g'(a^*)\, r_1 + k(r_2) \right).$$

The second signal enters the contract even though it is unaffected by the agent's action, since it may be correlated with the noise in the first signal.
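A small numerical check can make the role of the free function $k(\cdot)$ concrete. The sketch below is illustrative and rests on assumptions not in the text: a quadratic cost $g(a) = a^2/2$, a target action $a^* = 1$, and arbitrary choices of $k$. It confirms by finite differences that the agent's first-order condition holds at $a^*$ for both contract forms, whatever $k$ is.

```python
import math

A_STAR = 1.0  # assumed target action

def g(a):
    # Assumed quadratic effort cost (illustrative).
    return 0.5 * a * a

def g_prime(a):
    return a

def V_two_signals(r1, r2, k):
    # Felicity in the two-signal case: g'(a*) (r1 + r2)/2 + k(r1 - r2).
    return g_prime(A_STAR) * (r1 + r2) / 2.0 + k(r1 - r2)

def V_rpe(r1, r2, k):
    # Felicity under relative performance evaluation: g'(a*) r1 + k(r2).
    return g_prime(A_STAR) * r1 + k(r2)

def signals_two(a, eps):
    return a + eps[0], a + eps[1]

def signals_rpe(a, eps):
    return a + eps[0], eps[1]

def marginal_net_felicity(V, signals, a, eps, k, h=1e-5):
    # Central finite difference of V(signals(a)) - g(a) with respect to a.
    def net(x):
        r1, r2 = signals(x, eps)
        return V(r1, r2, k) - g(x)
    return (net(a + h) - net(a - h)) / (2.0 * h)

if __name__ == "__main__":
    for k in (math.sin, lambda d: d * d):
        for eps in ((0.3, -0.7), (-1.2, 0.4)):
            print(marginal_net_felicity(V_two_signals, signals_two, A_STAR, eps, k),
                  marginal_net_felicity(V_rpe, signals_rpe, A_STAR, eps, k))
```

Because $k$ depends only on a combination of signals that the action does not move, it drops out of the first-order condition; it affects only risk sharing and the cost of the contract.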


E Extension to The Optimal Effort Level

E.1 Illustrations for Proposition 2

E.1.1 Affine Cost of Effort

While Theorem 3 shows that $A(\eta) = \bar{a}$ is optimal when Proposition 3 is satisfied, we now show that $A(\eta)$ can be exactly derived even if Theorem 3 does not hold and the maximum effort principle does not apply, if the cost function is linear, i.e. $g(a) = \theta a$, where $\theta > 0$.$^{24}$ We use the benefit function $S\,b(a, \eta)$ as in Section 3.2.

Proposition 6 (Optimal contract with linear cost of effort). Let $g(a) = \theta a$, where $\theta > 0$. The following contract is optimal:

$$c = v^{-1}(\theta r + K), \tag{67}$$

where $K$ is a constant that makes the participation constraint bind ($E[u(\theta \eta + K)] = \underline{u}$). For each $\eta$, the optimal effort $A^*(\eta)$ is determined by the following pointwise maximization:

$$A^*(\eta) \in \arg\max_{a \le \bar{a}}\ S\,b(a, \eta) - v^{-1}\left(\theta(a + \eta) + K\right). \tag{68}$$

When the agent is indifferent between an action $a$ and $A^*(\eta)$, we assume that he chooses action $A^*(\eta)$.

Proof. From Proposition 2, if the agent announces $\eta$, he should receive a felicity of $V(\eta) = g(A(\eta)) + \int_0^{\eta} \theta\,dx + K = \theta(A(\eta) + \eta) + K$. Since $r = A(\eta) + \eta$ on the equilibrium path, a contract $c = v^{-1}(\theta r + K)$ will implement $A(\eta)$. To find the optimal action, the principal's problem is:

$$\max_{A(\cdot)}\ E\left[S\,b\left(\min(A(\eta), \bar{a}), \eta\right)\right] - E\left[v^{-1}\left(\theta(A(\eta) + \eta) + K\right)\right],$$

which is solved by pointwise maximization, as in (68).

The main advantage of the above contract is that it can be exactly solved regardless of $S$ and so it is applicable even for small firms (or rank-and-file employees who affect a small output). For instance, consider a benefit function $b(a, \eta) = b_0 + a e^{\eta}$, where $b_0 > 0$, so that the marginal productivity of effort is increasing in the noise, and utility function $u(\ln c - \theta a)$ with $\theta \in (0, 1)$. Then, the solution of (68) is:

$$A^*(\eta) = \min\left( \frac{1 - \theta}{\theta}\,\eta + \frac{1}{\theta}\left(\ln S - K - \ln \theta\right),\ \bar{a} \right).$$

The optimal effort level increases linearly with the noise, until it reaches $\bar{a}$. The effort level is also weakly increasing in firm size.

$^{24}$ Note that the linearity of $g(a)$ is still compatible with $u(v(c) - g(a))$ being strictly concave in $(c, a)$. Also, by a simple change of notation, the results extend to an affine rather than linear $g(a)$.
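The closed-form effort rule in the example can be cross-checked against a direct maximization of (68). The sketch below is an illustration under stated assumptions: log felicity so that $v^{-1}(x) = e^{x}$, the benefit function $b_0 + a e^{\eta}$ from the example, and hypothetical parameter values ($S = 10$, $K = 0$, $\theta = 0.5$, $\bar{a} = 5$, and an arbitrary lower search bound). In the model $K$ is pinned down by the participation constraint; here it is simply taken as given.

```python
import math

def closed_form_effort(eta, S, K, theta, a_bar):
    # Interior solution of S*exp(eta) = theta*exp(theta*(a + eta) + K), capped at a_bar.
    interior = ((1.0 - theta) / theta) * eta + (math.log(S) - K - math.log(theta)) / theta
    return min(interior, a_bar)

def principal_objective(a, eta, S, K, theta, b0=1.0):
    # S*b(a, eta) minus pay exp(theta*(a + eta) + K), with b(a, eta) = b0 + a*exp(eta).
    return S * (b0 + a * math.exp(eta)) - math.exp(theta * (a + eta) + K)

def grid_effort(eta, S, K, theta, a_bar, a_low=-10.0, n=60001):
    # Brute-force maximizer of the (concave) objective on [a_low, a_bar].
    best_a, best_val = a_low, -math.inf
    for i in range(n):
        a = a_low + (a_bar - a_low) * i / (n - 1)
        val = principal_objective(a, eta, S, K, theta)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

if __name__ == "__main__":
    params = dict(S=10.0, K=0.0, theta=0.5, a_bar=5.0)
    for eta in (-4.0, -2.0, 0.0):
        print(eta, closed_form_effort(eta, **params), grid_effort(eta, **params))
```

For low noise the two agree on an interior effort level that rises one-for-one with $\eta$ (since $(1-\theta)/\theta = 1$ at $\theta = 0.5$); for high enough noise both hit the cap $\bar{a}$.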


Note that, with a linear rather than strictly convex cost function, the agent is indifferent between all actions. His decision problem is $\max_a v(c(r)) - g(a)$, i.e. $\max_a \theta(\eta + a) + K - \theta a$, which is independent of $a$ and thus has a continuum of solutions. As in, e.g., Grossman and Hart (1983), Proposition 6 therefore assumes that indeterminacies are resolved by the agent following the principal's recommended action, $A^*(\eta)$.

E.1.2 Exponential u and Linear v

We continue to assume that the maximum effort principle does not apply, and now consider the HM assumptions of exponential utility and a pecuniary cost of effort, but do not impose Gaussian noise or continuous time. We show that, as in HM, the same action function $A_t(\eta_t)$ is optimal in each period $t$. However, unlike in HM, $A_t(\eta_t)$ is not a constant independent of $\eta_t$. The intuition is that, if noise is low, the optimal contract may wish to reduce the required effort level to cushion the effect of low noise on the agent's utility.

Proposition 7 (Constant target action, exponential utility and pecuniary cost of effort). Suppose the agent has a CARA utility function $u(x) = -e^{-\gamma x}$ and a linear felicity function $v(x) = x$, and suppose the benefit of effort in each period is a weakly concave function $b(a)$. Then, the optimal contract prescribes the same (possibly noise-dependent) action $A(\eta)$ in each period.

Proof. Take an optimal contract specifying actions $A_1(\eta_1), \ldots, A_T(\eta_1, \ldots, \eta_T)$, and compensation $C(\eta_1, \ldots, \eta_T)$. Start with period $t = T$. The optimality of the contract implies that for all $(\eta_1, \ldots, \eta_{T-1})$, the choice of target action and compensation solve the optimization problem

$$\max\ E_{\eta_T}\left[ b(A_T(\eta_1, \ldots, \eta_{T-1}, \eta_T)) - C(\eta_1, \ldots, \eta_{T-1}, \eta_T) \right]$$

$$\text{s.t.}\quad \eta_T \in \arg\max_{\hat\eta_T}\ -e^{-\gamma\left\{ C(\eta_1, \ldots, \eta_{T-1}, \hat\eta_T) - g\left(A(\eta_1, \ldots, \eta_{T-1}, \hat\eta_T) + \hat\eta_T - \eta_T\right) \right\}},$$

$$E_{\eta_T}\left[ -e^{-\gamma\left\{ C(\eta_1, \ldots, \eta_{T-1}, \eta_T) - g\left(A(\eta_1, \ldots, \eta_{T-1}, \eta_T)\right) \right\}} \right] = u(\eta_1, \ldots, \eta_{T-1}).$$

By Proposition 2, the cost of compensation for a given action $A_T(\eta_1, \ldots, \eta_T)$ is minimized by

$$C(\eta_1, \ldots, \eta_T) = g(A(\eta_1, \ldots, \eta_T)) + \int^{\eta_T} A_T(\eta_1, \ldots, \eta_{T-1}, x)\,dx + K(\eta_1, \ldots, \eta_{T-1}),$$

so the principal solves a collection of problems

$$\max_{A(\cdot), K}\ E_{\eta_T}\left[ b(A(\eta_T)) - g(A(\eta_T)) - \int^{\eta_T} A(x)\,dx - K \right] \tag{69}$$

$$\text{s.t.}\quad E_{\eta_T}\left[ -e^{-\gamma\left( \int^{\eta_T} A(x)\,dx + K \right)} \right] = u(\eta_1, \ldots, \eta_{T-1}) \tag{70}$$

for (possibly) varying $u(\eta_1, \ldots, \eta_{T-1})$. By concavity, the solutions of these problems for each $u(\eta_1, \ldots, \eta_{T-1})$ are unique. Moreover, this uniqueness implies that the solutions for different

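values of $u(\eta_1, \ldots, \eta_{T-1})$ may differ only in the constant $K$. Therefore, the optimal target action $A_T(\eta_1, \ldots, \eta_{T-1}, \eta_T)$ does not depend on $\eta_1, \ldots, \eta_{T-1}$. Now, since $A_{T-1}$ is the only action that can depend on $\eta_{T-1}$, the above argument can be repeated for $t = T - 1, \ldots, 1$. Hence, the optimal profile of actions $A_1(\eta_1), \ldots, A_T(\eta_1, \ldots, \eta_T)$ consists of repeating the same target action $A(\eta)$, which is the unique solution of the problem (69)-(70).

Example A. Suppose $b(x) = Bx$, $g(x) = \frac{1}{2} G x^2$, $\eta \sim U[\underline{\eta}, \bar{\eta}]$. Let $y(\eta) = \int^{\eta} a(x)\,dx + K$. Then, the optimal target action is the solution of

$$\max_{a(\cdot), y(\cdot)} \int \left[ B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) \right] dx$$

$$\text{s.t.}\quad \int -e^{-\gamma y(x)}\,dx = \underline{u}; \qquad y'(x) = a(x).$$

The Lagrangian of this problem is

$$L = \int \left[ B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) - \lambda e^{-\gamma y(x)} + \mu(x)\left(a(x) - y'(x)\right) \right] dx$$

$$= \int \left[ B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) - \lambda e^{-\gamma y(x)} + \mu(x) a(x) + \mu'(x) y(x) \right] dx - \mu(\bar{\eta})\, y(\bar{\eta}) + \mu(\underline{\eta})\, y(\underline{\eta}),$$

where $\lambda$ is the multiplier attached to the reservation utility constraint, and $\mu(x)$ is the multiplier for the equation linking $y(x)$ and $a(x)$. Note that $L$ is concave in $a(x)$ and $y(x)$. The first-order conditions are

$$\frac{\partial L}{\partial a(x)}:\quad B - G a(x) + \mu(x) = 0;$$

$$\frac{\partial L}{\partial y(x)}:\quad -1 + \lambda\gamma e^{-\gamma y(x)} + \mu'(x) = 0;$$

$$\frac{\partial L}{\partial y(\underline{\eta})},\ \frac{\partial L}{\partial y(\bar{\eta})}:\quad \mu(\underline{\eta}) = \mu(\bar{\eta}) = 0.$$

Substituting the first equality into the second we get

$$-1 + \lambda\gamma e^{-\gamma y(x)} + G a'(x) = 0.$$

Rearranging and taking a logarithm gives

$$\ln(\lambda\gamma) - \gamma y(x) = \ln\left(1 - G a'(x)\right).$$

Differentiating the last equality gives

$$\gamma y'(x) = \frac{G a''(x)}{1 - G a'(x)},$$

which can be simplified into

$$a''(x) = \gamma\, a(x) \left(1 - G a'(x)\right) / G.$$

So, the optimal action satisfies a second-order ODE with the boundary conditions

$$a(\underline{\eta}) = a(\bar{\eta}) = B/G;$$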

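and indeed does not depend on the reservation utility $\underline{u}$.

Example B. Take the same functions, $b(x) = Bx$, $g(x) = \frac{1}{2} G x^2$ and suppose that the noise is Gaussian, $\eta \sim N(0, \sigma^2)$. We will be solving the optimization problem on the interval $[-z, z]$, and then take the limit as $z \to \infty$. Similar to Example A, the Lagrangian of the problem is

$$L = \int_{-z}^{z} \left[ \left( B a(x) - \tfrac{1}{2} G a(x)^2 - y(x) - \lambda e^{-\gamma y(x)} \right) \phi\left(\tfrac{x}{\sigma}\right) + \mu(x) a(x) + \mu'(x) y(x) \right] dx - \mu(z)\, y(z) + \mu(-z)\, y(-z),$$

and the first-order conditions are

$$\frac{\partial L}{\partial a(x)}:\quad \left(B - G a(x)\right) \phi\left(\tfrac{x}{\sigma}\right) + \mu(x) = 0;$$

$$\frac{\partial L}{\partial y(x)}:\quad \left(-1 + \lambda\gamma e^{-\gamma y(x)}\right) \phi\left(\tfrac{x}{\sigma}\right) + \mu'(x) = 0;$$

$$\frac{\partial L}{\partial y(-z)},\ \frac{\partial L}{\partial y(z)}:\quad \mu(-z) = \mu(z) = 0.$$

Substituting the first equality into the second to eliminate $\mu'(x)$, and taking note that $\frac{1}{\phi(x/\sigma)} \frac{d\left(\phi(x/\sigma)\right)}{dx} = -\frac{x}{\sigma^2}$, we obtain

$$-1 + \lambda\gamma e^{-\gamma y(x)} + G a'(x) - \frac{1}{\sigma^2}\, x \left(G a(x) - B\right) = 0.$$

Rearranging and taking a logarithm gives

$$\ln(\lambda\gamma) - \gamma y(x) = \ln\left(1 - G a'(x) + \frac{1}{\sigma^2}\, x \left(G a(x) - B\right)\right).$$

Differentiating, taking note that $y'(x) = a(x)$, and rearranging yields the following: the optimal action is the limit as $z \to \infty$ of the solutions of

$$\begin{cases} a''(x) = \gamma\, a(x) \left[ \dfrac{1}{G} - a'(x) + \dfrac{x}{\sigma^2} \left( a(x) - \dfrac{B}{G} \right) \right] + \dfrac{x}{\sigma^2}\, a'(x) + \dfrac{1}{\sigma^2} \left( a(x) - \dfrac{B}{G} \right) \\[2mm] a(-z) = a(z) = B/G. \end{cases}$$

E.2 Conditions for Maximum Effort Principle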

Section 3.2 showed that the condition in Theorem 3,

$$\forall \eta,\ \forall a \le \bar{a}, \quad \partial_1 b(a(\eta), \eta)\, f(\eta) \ge \Lambda(\bar{a}, \eta),$$

required for the maximum effort principle to hold, is satisfied if firm size $S$ is sufficiently large. This extension considers other cases in which the above condition is satisfied, and shows sufficient conditions for the function $\Lambda(\bar{a}, \eta)$. By Proposition 2, the optimal contract is:

$$c(\eta) = v^{-1}\left(g(a(\eta)) + L(\eta) + K\right),$$

where $L(\eta) = \int_{\eta_*}^{\eta} g'(a(x))\,dx$; $\eta_*$ is an arbitrary constant in the support of $\eta$. The contract's cost is:

$$C[A] = E\left[v^{-1}\left(g(a(\eta)) + L(\eta) + K\right)\right].$$

Then we can take $\Lambda(\bar{a}, \eta) = \max\left(0, \partial C[A]/\partial a(\eta)\right)$, where $\partial C[A]/\partial a(\eta)$ is given by the following expression.$^{25}$

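Proposition 8 Assume that $\sup f(\eta) < \infty$. For an effort profile $a(\eta) + \eta$ satisfying the conditions of Proposition 2, the marginal cost of implementing effort $a(\eta)$ is:

$$\frac{\partial C[A]}{\partial a(\eta)} = \frac{g'(a(\eta))}{v'(c(\eta))}\, f(\eta) + g''(a(\eta)) \left\{ E\left[ \frac{1}{v'(c(\tilde\eta))}\, 1_{\tilde\eta > \eta} \right] - E\left[ \frac{1}{v'(c(\tilde\eta))} \right] \frac{E\left[u'(L(\tilde\eta) + K)\, 1_{\tilde\eta > \eta}\right]}{E\left[u'(L(\tilde\eta) + K)\right]} \right\}, \tag{71}$$

where the expectation is taken over $\tilde\eta$.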

)) The …rst term in (71), gv0(a( f ( ) ; is the “local” compensating di¤erential for inducing (c( )) greater e¤ort. Indeed, consider making the agent work a more at point e. Let c denote the 25

The proof is thus. Note that K satis…es u = E [u (L ( ) + K)]. For simplicity, we assume we can just consider a lower ). Using @L ( 0 ) [email protected] ( ) = 1 0 > g 00 (a ( )), we have: @K = @a ( )

E [u0 (L ( 0 ) + K) 1 0 > ] 00 g (a ( )) E [u0 (L ( 0 ) + K)]

which implies (71).


) : v 0 (c ( )) v (c ( ))

a; is simpler when noise is bounded both Second, the upper bound for @C[A] and thus @a( ) 000 above and below. If supp = [ ; ] and g (x) 0 for all x. Then @C [A] @a ( )

a;

v0 v

g 0 (a)f ( ) + g 00 (a) F ( ) u 1 (u) + g(a) + ( )g 0 (a)

1

In particular, in (27), the function can be replaced by the function a; is increasing in a. The proof is of (72) is thus. We observe that u 1 (u) + (

L( ) + K for any . If it does not hold for some L( )+K =

Z

0

0,

:

(72)

. We observe that

)g 0 (a);

then

g (a(x)) dx+K = L( 0 )+K +

Z

g 0 (a(x)) dx

L( 0 )+K (

)g 0 (a) > u 1 (u)

0

for all , and the constraint E [u(L( ) + K)] = u cannot be satis…ed. Let c = v 1 u 1 (u) + g(a) + ( )g 0 (a) . Then, all on the equilibrium consumptions are


no greater than c. Hence, the terms in inequality (71) can be bounded as g 0 (a(x)) f (x) v 0 (c(x))

g 0 (a) f (x); v 0 (c) 1 1 1 >x g 00 (a)E 0 1 g 00 (a(x))E 0 v (c( )) v (c)

>x

= g 00 (a)

F (x) ; v 0 (c)

which gives the claimed inequality.

E.3 Illustrations for Proposition 4

We now provide explicit conditions to verify the optimality of maximum e¤ort in the three examples in Section 3.3. Example 1. Let u (x) = x, v (x) = x , 2 (0; 1]. Consider the sub-case of u = 0 and g (a) = eGa . As stated in the paper, the objective function is: h i eGa= E (G + 1)1= :

B (a)

Call a the solution of this problem. Proposition 4 proves that implementing a among all contracts (which need not implement a ) if inf B 0 (a)f ( )

a a

is optimal

(a ; );

. where (a; ) = max 0; @C[A] @a( ) Inequality (72) establishes the bound @C [A] @a( )

(a; ) =

1

GeGa=

f ( ) + GF ( )

By Proposition 4, constant target e¤ort a necessarily requesting a constant e¤ort) if 8a , inf B 0 (a) a a

1

GeGa

=

1+(

)G

(1

)=

:

will be the optimum among all contracts (not

1 + G sup

F( ) f( )

1+(

)G

(1

)=

:

Example 2. Let v (x) = ln x, u (x) = e(1 )x = (1 ) for > 0, N (0; 2 ) and u = u (ln c). The contract specifying target e¤ort a pays c( ) = c exp g 0 (a) + g (a) (1 ) g 0 (a)2

59

2

=2 .

The noise is unbounded here, so we will use equality (71) directly: @C [A] = ce(g(a) @a(x)

(1

)g 0 (a)2

h 0 g 00 (a) E eg (a)

= ce(g(a) 00

(1

)g 0 (a)2

g 0 (a)2

g (a) e

2 =2

n ) g 0 (a) eg0 (a)x f (x) + i 2 2 0 1 >x g 00 (a) e(1 (1 ) )g (a) 2 =2

2 =2

x

) g 0 (a) eg0 (a)x 1

h

x

2 =2

+

x

0

(1

h E e(1

) g (a)

)g 0 (a)

0

g (a)

1

i

>x

io

:

Observing that x for some

(1

between (1 1 @C [A] f (x) @a(x)

x

) g 0 (a)

g 0 (a) =

ce(g(a)

(1

)g 0 (a)2

ce(g(a)

(1

)g 0 (a)2

n ) g 0 (a) eg0 (a)x + n 2 =2 ) g 0 (a) eg0 (a)x + 2 =2

0

g (a) eg (a)

2 2 =2

0

eg (a) max(x;(1

inf @1 b (a; ; a )

x=

;

@C [A] 1 = g 0 (a) @a(x) 1 @C [A] f (x) @a(x)

x

(a; x)

e

+ g 00 (a)

g 0 (a) +

x

, for

inf @1 b (a; ; a )

for all .


e

x=

o

o :

will be the optimum if

x

+

> 0, and

N (0;

x

g 0 (a)

0

g (a) g 00 (a) eg (a) max(0;

2 0

will be the optimum if

a a

)x)

2 2 =2

(a ; )

a a

for all . Example 3. Let v (x) = x, and u (x) = Similar to Example 2,

0

g (a) eg (a)

2 00

Let (a; x) denote the last upper bound. By Proposition 4, a

By Proposition 4, a

2 =2+

e

) g 0 (a) and g 0 (a) , we can obtain

2 00

and

x

g 0 (a)

(a ; )

;

x)

:

2

) as in HM.

F

Quits and Firings

Our setup can be extended to accommodate quits and …ring. We commence with the former. The agent now has an outside option available in each period t, and so the participation constraint in each period becomes Et [UT ] ut . As before, the principal wishes to implement (at )t T , and wishes to deter quitting. This can be achieved simply by increasing the constant K such that for all t, Et [UT ] ut . Under the conditions of Proposition 1, we can see that this is the only contract that ensures that. Economically, the agent receives rents because of his credible threat to leave in the interim periods. However, these rents only a¤ect K, not the form of the contract. As in the core paper, if the bene…t of e¤ort is su¢ ciently high, maximum e¤ort remains optimal. We now turn to …rings, considering T = 2 for simplicity and then discussing the generalizability to other T . Suppose that the principal wishes to …re the agent if r1 2 IF and keep him if r1 2 IFc , where IF and IFc are disjoint intervals. Call rF their common boundary, i.e. r1F = IF \ IFc . The next Proposition describes the contract. Proposition 9 (Contract with …ring, T = 2). Under the conditions of Proposition 1 plus the option to …re, the following contract is optimal: (i) if r1 2 IF , the agent is …red, and receives a payo¤ c = v 1 (g 0 (a1 ) r1 + K1 ), (ii) if r1 2 IFc , the agent remains employed, and receives a P2 0 …nal payo¤ c = v 1 t=1 g (at ) rt + K2 . The constants K1 and K2 are chosen such that the utility of the agent is continuous at r1 = rF , the cuto¤ return that triggers …ring. Proof. (This is a sketch of the proof, as the arguments are similar to those in the main body of the paper). De…ne 1F = r1F a1 , the cuto¤ noise that divides the regions of …ring and not …ring. For 1 2 INc F (where I is the interior of set I), by the logic of Proposition 1, very small P2 0 deviations around a1 will still keep r1 in INc F and so we require c = v 1 t=1 g (at ) rt + KN F . 
For $\eta_1$ with $a_1^* + \eta_1 \in \mathring{I}_F$, very small deviations around $a_1^*$ will still keep $r_1$ in $\mathring{I}_F$, and so we require $c = v^{-1}\left(g'(a_1^*)\, r_1 + K_F\right)$ for some other constant $K_F$. The utility should be continuous at $r_1^F$ to preserve incentive compatibility.

Thus, the contract remains tractable even with the possibility of firing. This is because the intuition in the core model continues to hold: since the noise is observed before the action, the contract must provide sufficient incentives state-by-state, and so the principal has little freedom in designing the contract. This contrasts with standard models in which the possibility of firing changes the contract significantly. The only degree of freedom for the principal is finding the domain $I_F^c$. As is standard, this will depend on the cost of finding another agent at $t = 2$. For instance, if the cost of finding a new employee is low, the domain of optimal firing might be large.

The same logic applies for $T > 2$. Suppose that the agent's contract terminates at a (potentially return-dependent) time $\tau$, with the same "tree" structure: at each time $t$, there is a monotone function $\phi_t(r_1, \ldots, r_t)$ such that the principal fires the agent if and only if $\phi_t(r_1, \ldots, r_t) > 0$. Then, the compensation scheme has the following shape: if the agent works until $\tau$, he receives
\[
c = v^{-1}\left(\sum_{t=1}^{\tau} g'(a_t^*)\, r_t + K_\tau\right) \tag{73}
\]

for some constants $K_1, \ldots, K_T$.

In addition, we can unify the two extensions of quits and firings. Consider the firing model with $T = 2$. Suppose that the principal wishes to fire the agent if $r_1 \in I_F$, but also wishes to deter voluntary departures. Then, the contract is the one described in Proposition 9, but with $K_1$ and $K_2$ set high enough that the agent always receives at least his reservation utility.
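The payoff schedule of Proposition 9 can be sketched numerically. The following is a minimal illustration only, under assumptions that are not in the paper: log utility $v(c) = \ln c$ (so that $v^{-1} = \exp$), a firing region of the form $I_F = (-\infty, r^F)$, and caller-supplied values for the slopes $g'(a_t^*)$ and the constants $K_1$, $K_2$. The function name and parameter names are hypothetical, and the sketch does not solve for the constants.

```python
import math


def firing_contract_pay(r1, r2, slope1, slope2, K1, K2, r_fire):
    """Pay c under a Proposition 9-style contract with T = 2.

    Illustrative assumptions (not from the paper): v(c) = ln(c), so
    v^{-1} = exp, and the firing region is I_F = (-inf, r_fire).
    slope1 and slope2 stand for g'(a_1*) and g'(a_2*).
    """
    if r1 < r_fire:
        # Fired after period 1: log pay is linear in r1 only.
        return math.exp(slope1 * r1 + K1)
    # Retained: log pay is linear in both returns.
    return math.exp(slope1 * r1 + slope2 * r2 + K2)
```

With these assumptions, log pay is linear in the returns on each branch, and only the constant differs between the fired and retained branches.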

G Proofs of Mathematical Lemmas

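This section contains proofs of some of the mathematical lemmas featured in the appendices of the main paper.

Proof of Lemma 4

We thank Chris Evans for suggesting the proof strategy for this Lemma. We assume $a < b$. We first prove the Lemma when $j(x) = 0$ for all $x$. For a positive integer $n$, define $k_n = (b - a)/n$, and the function $r_n(x)$ as
\[
r_n(x) =
\begin{cases}
\dfrac{f(x) - f(x - k_n)}{k_n} & \text{for } x \in [a + k_n, b], \\
0 & \text{for } x \in [a, a + k_n).
\end{cases}
\]
We have, for $x \in (a, b]$,
\[
\liminf_{n \to \infty} r_n(x) \ge \liminf_{\varepsilon \downarrow 0} \frac{f(x) - f(x - \varepsilon)}{\varepsilon} \ge 0.
\]
Define $I_n = \int_a^b r_n(x)\,dx$. As $f + h$ is nondecreasing and $h$ is $C^1$,
\[
\frac{f(x) - f(x - k_n)}{k_n} \ge \frac{-h(x) + h(x - k_n)}{k_n} \ge -\sup_{[a,b]} h'(x).
\]
Therefore, $r_n(x) \ge \min\left(0, -\sup_{[a,b]} h'(x)\right)$ for all $x$. Hence we can apply Fatou's lemma, which shows:
\[
\liminf_{n \to \infty} I_n = \liminf_{n \to \infty} \int_a^b r_n(x)\,dx \ge \int_a^b \liminf_{n \to \infty} r_n(x)\,dx \ge 0.
\]
Next, observe that $I_n$ consists of telescoping sums, so:
\[
\begin{aligned}
I_n &= \int_{a+k_n}^{b} \frac{f(x) - f(x - k_n)}{k_n}\,dx
     = \int_{b-k_n}^{b} \frac{f(x)}{k_n}\,dx - \int_{a}^{a+k_n} \frac{f(x)}{k_n}\,dx \\
    &= f(b) - f(a) - \int_{b-k_n}^{b} \frac{f(b) - f(x)}{k_n}\,dx - \int_{a}^{a+k_n} \frac{f(x) - f(a)}{k_n}\,dx
     = f(b) - f(a) - B_n - A_n.
\end{aligned}
\]
We first minorize $A_n$. From condition (ii) of the Lemma, for any $\varepsilon > 0$, there is an $\eta > 0$ such that for $x \in [a, a + \eta]$, $f(x) - f(a) \ge -\varepsilon$. For $n$ large enough such that $k_n \le \eta$,
\[
A_n = \int_{a}^{a+k_n} \frac{f(x) - f(a)}{k_n}\,dx \ge \int_{a}^{a+k_n} \frac{-\varepsilon}{k_n}\,dx = -\varepsilon,
\]
and so $\liminf_{n \to \infty} A_n \ge 0$.

We next minorize $B_n$. Since $f'_-(b) \ge 0$, for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $x \in [b - \delta, b)$, $(f(b) - f(x))/(b - x) \ge -\varepsilon$. Therefore, for $n$ sufficiently large so that $k_n \le \delta$,
\[
B_n = \int_{b-k_n}^{b} \frac{f(b) - f(x)}{k_n}\,dx \ge \int_{b-k_n}^{b} \frac{(-\varepsilon)(b - x)}{k_n}\,dx = -\varepsilon\,\frac{k_n}{2},
\]
and so $\liminf_{n \to \infty} B_n \ge 0$.

Finally, since $f(b) - f(a) = I_n + A_n + B_n$, we have
\[
f(b) - f(a) = \liminf_{n \to \infty} (I_n + A_n + B_n) \ge \liminf_{n \to \infty} I_n + \liminf_{n \to \infty} A_n + \liminf_{n \to \infty} B_n \ge 0.
\]
We now prove the general case. Define $F(x) = f(x) - \int_a^x j(t)\,dt$. Then, $F'_-(x) \ge 0$. By the above result, $F(b) - F(a) \ge 0$, that is, $f(b) - f(a) \ge \int_a^b j(t)\,dt$.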
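The telescoping decomposition $f(b) - f(a) = I_n + A_n + B_n$ used in the proof of Lemma 4 can be checked numerically. The sketch below is only an illustration: the midpoint-rule integrator, the function names, and the test function $f(x) = x^3 - x$ are assumptions introduced here, not objects from the paper.

```python
def midpoint_integral(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))


def telescoping_terms(f, a, b, n):
    """Return (I_n, A_n, B_n) from the proof of Lemma 4, with k_n = (b - a) / n.

    I_n integrates the backward difference quotient over [a + k_n, b];
    A_n and B_n are the boundary terms near a and b. Requires n >= 2.
    """
    k = (b - a) / n
    I_n = midpoint_integral(lambda x: (f(x) - f(x - k)) / k, a + k, b)
    A_n = midpoint_integral(lambda x: (f(x) - f(a)) / k, a, a + k)
    B_n = midpoint_integral(lambda x: (f(b) - f(x)) / k, b - k, b)
    return I_n, A_n, B_n


def example_f(x):
    # Hypothetical smooth, non-monotone test function.
    return x ** 3 - x


if __name__ == "__main__":
    I_n, A_n, B_n = telescoping_terms(example_f, 0.0, 2.0, 10)
    print(I_n + A_n + B_n, example_f(2.0) - example_f(0.0))
```

For a continuous $f$ such as this one, the boundary terms $A_n$ and $B_n$ shrink as $n$ grows, which is the role they play in the limit argument.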

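Proof of Lemma 5

Let $(y_n) \uparrow x$ be a sequence such that
\[
f'_-(x) = \lim_{y_n \uparrow x} \frac{f(x) - f(y_n)}{x - y_n}.
\]
We can further assume that $\lim_{n \to \infty} f(y_n)$ exists (if not, then we can choose a subsequence $y_{n_k}$ such that $\lim_{n_k \to \infty} f(y_{n_k})$ exists and replace $y_n$ by $y_{n_k}$). If $\lim_{n \to \infty} f(y_n) = f(x)$, then
\[
\begin{aligned}
(h \circ f)'_-(x) &= \liminf_{y \uparrow x} \frac{h \circ f(x) - h \circ f(y)}{x - y}
 \le \lim_{y_n \uparrow x} \frac{h \circ f(x) - h \circ f(y_n)}{x - y_n} \\
 &= \lim_{y_n \uparrow x} \frac{h \circ f(x) - h \circ f(y_n)}{f(x) - f(y_n)} \cdot \frac{f(x) - f(y_n)}{x - y_n}
 = h'(f(x))\, f'_-(x).
\end{aligned}
\]
If $\lim_{n \to \infty} f(y_n) < f(x)$, then $f'_-(x) = +\infty$; since $h'(f(x)) > 0$, we still have $(h \circ f)'_-(x) \le h'(f(x))\, f'_-(x)$. If $\lim_{n \to \infty} f(y_n) > f(x)$, then $(h \circ f)'_-(x) \le \lim_{y_n \uparrow x} \frac{h \circ f(x) - h \circ f(y_n)}{x - y_n} = -\infty$, hence $(h \circ f)'_-(x) \le h'(f(x))\, f'_-(x)$.

On the other hand, let $(\hat{y}_n) \uparrow x$ be a sequence such that
\[
(h \circ f)'_-(x) = \lim_{\hat{y}_n \uparrow x} \frac{h \circ f(x) - h \circ f(\hat{y}_n)}{x - \hat{y}_n},
\]
and such that $\lim_{n \to \infty} f(\hat{y}_n)$ exists. If $\lim_{n \to \infty} f(\hat{y}_n) = f(x)$, then
\[
\begin{aligned}
(h \circ f)'_-(x) &= \lim_{\hat{y}_n \uparrow x} \frac{h \circ f(x) - h \circ f(\hat{y}_n)}{x - \hat{y}_n}
 = \lim_{\hat{y}_n \uparrow x} \frac{h \circ f(x) - h \circ f(\hat{y}_n)}{f(x) - f(\hat{y}_n)} \cdot \frac{f(x) - f(\hat{y}_n)}{x - \hat{y}_n} \\
 &= \lim_{\hat{y}_n \uparrow x} \frac{h \circ f(x) - h \circ f(\hat{y}_n)}{f(x) - f(\hat{y}_n)} \cdot \lim_{\hat{y}_n \uparrow x} \frac{f(x) - f(\hat{y}_n)}{x - \hat{y}_n}
 = h'(f(x)) \lim_{\hat{y}_n \uparrow x} \frac{f(x) - f(\hat{y}_n)}{x - \hat{y}_n} \\
 &\ge h'(f(x))\, f'_-(x).
\end{aligned}
\]
Note that the existence of $\lim_{\hat{y}_n \uparrow x} \frac{h \circ f(x) - h \circ f(\hat{y}_n)}{x - \hat{y}_n}$ and $\lim_{\hat{y}_n \uparrow x} \frac{h \circ f(x) - h \circ f(\hat{y}_n)}{f(x) - f(\hat{y}_n)}$ guarantees the existence of $\lim_{\hat{y}_n \uparrow x} \frac{f(x) - f(\hat{y}_n)}{x - \hat{y}_n}$. If $\lim_{n \to \infty} f(\hat{y}_n) < f(x)$, then $(h \circ f)'_-(x) = +\infty \ge h'(f(x))\, f'_-(x)$. If $\lim_{n \to \infty} f(\hat{y}_n) > f(x)$, then $f'_-(x) \le \lim_{\hat{y}_n \uparrow x} \frac{f(x) - f(\hat{y}_n)}{x - \hat{y}_n} = -\infty$, so that $h'(f(x))\, f'_-(x) = -\infty \le (h \circ f)'_-(x)$. Therefore, $(h \circ f)'_-(x) = h'(f(x))\, f'_-(x)$.

Proof of Lemma 6

We use
\[
\begin{aligned}
(f + h)'_-(x) &= \liminf_{y \uparrow x} \frac{f(x) + h(x) - f(y) - h(y)}{x - y}
 = \liminf_{y \uparrow x} \left[ \frac{f(x) - f(y)}{x - y} + \frac{h(x) - h(y)}{x - y} \right] \\
 &\ge \liminf_{y \uparrow x} \frac{f(x) - f(y)}{x - y} + \liminf_{y \uparrow x} \frac{h(x) - h(y)}{x - y}
 = f'_-(x) + h'_-(x).
\end{aligned}
\]
When $h$ is differentiable at $x$,
\[
(f + h)'_-(x) = \liminf_{y \uparrow x} \frac{f(x) - f(y)}{x - y} + \lim_{y \uparrow x} \frac{h(x) - h(y)}{x - y} = f'_-(x) + h'(x).
\]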
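The inequality in Lemma 6 can be strict when neither function is differentiable. The sketch below illustrates this at $x = 0$ with the hypothetical pair $f(y) = y \sin(1/y)$ (with $f(0) = 0$) and $h = -f$, for which the left difference quotients of $f$ and $h$ oscillate between $-1$ and $1$ while $f + h \equiv 0$. The minimum over a finite grid of offsets is used only as a rough numerical proxy for the lim inf; the grid and all names are assumptions made here.

```python
import math


def left_quotients(func, x, offsets):
    """Difference quotients (func(x) - func(y)) / (x - y) at y = x - d, d > 0."""
    return [(func(x) - func(x - d)) / d for d in offsets]


def f(y):
    # Oscillating example: f(y) = y * sin(1/y) for y != 0, and f(0) = 0.
    return y * math.sin(1.0 / y) if y != 0 else 0.0


def h(y):
    return -f(y)


def f_plus_h(y):
    return f(y) + h(y)


# Offsets d = 1/t with t on a fine grid in [100, 1000], so the oscillation
# of sin(1/d) is well resolved as d shrinks towards 0.
OFFSETS = [1.0 / (100.0 + 0.01 * i) for i in range(90001)]


def subderivative_proxies():
    """Grid minima of the left quotients of f, h and f + h at x = 0."""
    m_f = min(left_quotients(f, 0.0, OFFSETS))
    m_h = min(left_quotients(h, 0.0, OFFSETS))
    m_sum = min(left_quotients(f_plus_h, 0.0, OFFSETS))
    return m_f, m_h, m_sum
```

In this example the proxies for $f'_-(0)$ and $h'_-(0)$ are both close to $-1$, whereas the proxy for $(f+h)'_-(0)$ is $0$, consistent with $(f+h)'_-(x) \ge f'_-(x) + h'_-(x)$ holding with strict inequality.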

Proof of Lemma 7

We wish to prove that $E[h(X)] \ge E[h(Y)]$ for any concave function $h$. Define $I(\lambda) = E[h(X + \lambda (Y - X))]$ for $\lambda \in [0, 1]$, so that
\[
I''(\lambda) = E\left[h''(X + \lambda (Y - X))\,(Y - X)^2\right]
\]
and
\[
I'(0) = E\left[h'(X)\,(Y - X)\right] = E\left[h'(X) \int_0^T \delta_t\,dZ_t\right],
\]
where $\delta_t = \beta_t - \alpha_t$, and $\delta_t \ge 0$ almost surely. We wish to prove $I(1) \le I(0)$. Since $I$ is concave, it is sufficient to prove that $I'(0) \le 0$.

We next use some basic results from Malliavin calculus (see, e.g., Di Nunno, Oksendal and Proske (2008)). The integration by parts formula for Malliavin calculus yields:
\[
I'(0) = E\left[h'(X) \int_0^T \delta_t\,dZ_t\right] = E\left[\int_0^T \left(D_t h'(X)\right) \delta_t\,dt\right],
\]
where $D_t h'(X)$ is the Malliavin derivative of $h'(X)$ at time $t$. Since $(\alpha_s)_{s \in [0,T]}$ is deterministic, the calculation of $D_t h'(X)$ is straightforward:
\[
D_t h'(X) = D_t h'\left(\int_0^T \alpha_s\,dZ_s\right) = h''\left(\int_0^T \alpha_s\,dZ_s\right) \alpha_t = h''(X)\,\alpha_t.
\]
Hence, we have:
\[
I'(0) = E\left[\int_0^T \left(D_t h'(X)\right) \delta_t\,dt\right] = E\left[\int_0^T h''(X)\,\alpha_t\,\delta_t\,dt\right].
\]
Since $h''(X) \le 0$ (because $h$ is concave), and $\alpha_t$ and $\delta_t$ are nonnegative, we have $h''(X)\,\alpha_t\,\delta_t \le 0$. Therefore, $I'(0) \le 0$ as required.
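The ordering in Lemma 7 can be illustrated by simulation. The following is a rough Monte Carlo sketch, not a proof: it takes $X = \int_0^T \alpha_t\,dZ_t$ with a deterministic nonnegative integrand and $Y = \int_0^T \beta_t\,dZ_t$ with an adapted integrand satisfying $\beta_t \ge \alpha_t$, where the labels $\alpha$, $\beta$, the specific choices $\alpha_t = 1$ and $\beta_t = 1 + |Z_t|$, the Euler discretization, and all function and parameter names are assumptions made here for illustration.

```python
import math
import random


def simulate_means(h, n_paths=4000, n_steps=50, T=1.0, seed=0):
    """Monte Carlo estimates of (E[h(X)], E[h(Y)]).

    Assumed integrands: alpha_t = 1 (deterministic) and
    beta_t = 1 + |Z_t| (adapted, with beta_t >= alpha_t). Both stochastic
    integrals use the same Brownian increments and left-endpoint (Ito) sums.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(T / n_steps)
    total_hx = 0.0
    total_hy = 0.0
    for _ in range(n_paths):
        z = x = y = 0.0
        for _ in range(n_steps):
            dz = rng.gauss(0.0, sqrt_dt)
            alpha = 1.0
            beta = 1.0 + abs(z)  # depends on Z at the left endpoint only
            x += alpha * dz
            y += beta * dz
            z += dz
        total_hx += h(x)
        total_hy += h(y)
    return total_hx / n_paths, total_hy / n_paths


def concave_quadratic(v):
    # A simple concave test function, h(v) = -v^2.
    return -v * v


if __name__ == "__main__":
    print(simulate_means(concave_quadratic))
```

With the concave choice $h(v) = -v^2$, the estimate of $E[h(X)]$ should be close to $-\int_0^T \alpha_t^2\,dt = -1$ and should exceed the estimate of $E[h(Y)]$, in line with the Lemma.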

References

[1] Di Nunno, G., B. Oksendal and F. Proske (2008): Malliavin Calculus for Lévy Processes with Applications to Finance, Springer Verlag.

[2] Holmstrom, B. (1979): “Moral Hazard and Observability.” Bell Journal of Economics 10, 74-91.

[3] Holmstrom, B. (1982): “Moral Hazard in Teams.” Bell Journal of Economics 13, 324-340.
