Optimal exit time from casino gambling: Why a lucky coin and a good memory matter∗

Xue Dong He†, Sang Hu‡, Jan Obłój§, Xun Yu Zhou¶

First version: May 24, 2013. This version: May 13, 2015.

Abstract

We consider the dynamic casino gambling model initially proposed by Barberis (2012) and study the optimal stopping strategy of a pre-committing gambler with cumulative prospect theory (CPT) preferences. We develop a systematic and analytical approach to finding the gambler’s optimal strategy. We illustrate how the strategies computed in Barberis (2012) can be strictly improved by reviewing the betting history or by tossing an independent coin, and we explain that the improvement generated by using randomized strategies results from the lack of quasi-convexity of CPT preferences. Finally, we show that any path-dependent strategy is equivalent to a randomization of path-independent strategies.

Key words: casino gambling; cumulative prospect theory; path-dependence; randomized strategies; quasi-convexity; optimal stopping; Skorokhod embedding

∗ The main results in this paper are contained in the Ph.D. thesis of Sang Hu, Optimal Exit Strategies of Behavioral Gamblers (Hu, 2014), which was submitted to the Chinese University of Hong Kong (CUHK) in September 2014. The authors thank Nick Barberis for his helpful comments on an earlier version of the paper. The results were also announced and presented at the Trimester Seminar “Stochastic Dynamics in Economics and Finance” held at the Hausdorff Research Institute for Mathematics in August 2013, as well as at the “Second NUS Workshop on Risk & Regulation” held at the National University of Singapore in January 2014. Comments from the participants of these events are gratefully acknowledged.
† Department of Industrial Engineering and Operations Research, Columbia University, S. W. Mudd Building, 500 W. 120th Street, New York, NY 10027, USA. Email: xh2140@columbia.edu.
He acknowledges support from a start-up fund at Columbia University.
‡ Risk Management Institute, National University of Singapore, 21 Heng Mui Keng Terrace, Singapore 119613. Email: rmihsa@nus.edu.sg.
§ Mathematical Institute, The Oxford-Man Institute of Quantitative Finance and St John’s College, University of Oxford, Oxford, UK. Email: Jan.Obloj@maths.ox.ac.uk. Part of this research was completed while this author was visiting CUHK in March 2013, and he is grateful for the support of that institution. He also gratefully acknowledges support from ERC Starting Grant RobustFinMath 335421.
¶ Mathematical Institute, The University of Oxford, Woodstock Road, OX2 6GG Oxford, UK, and Oxford–Man Institute of Quantitative Finance, University of Oxford. Email: zhouxy@maths.ox.ac.uk. This author acknowledges financial support from research funds at the University of Oxford and the Oxford–Man Institute of Quantitative Finance.

1 Introduction

Gambling is the wagering of money on an event whose outcome is uncertain. Casino gambling is a type of gambling that enjoys huge popularity. By its very nature, casino gambling is an act of risk seeking because a typical casino bet has at most zero expected value. Such risk-seeking behavior cannot be explained by models in the expected utility framework with concave utility functions. By contrast, models in behavioral economics have the potential to shed meaningful light on this behavior.¹

Barberis (2012) was the first to employ the cumulative prospect theory (CPT) of Tversky and Kahneman (1992) to model and study casino gambling. In his model, a gambler comes to a casino at time 0 and is offered a bet with an equal chance to win or lose $1. If the gambler accepts, the bet is then played out and he either gains $1 or loses $1 at time 1. Then, the gambler is offered the same bet, and he can choose to leave the casino or to continue gambling.
If he continues, the second bet is played out and the gambler is offered another bet at time 2, and so on. The problem is to determine whether the gambler will enter the casino to play the game at all and, if yes, what is the best time for him to stop playing. Clearly, the answer depends on the gambler’s preferences.

¹ Another type of model, similar to the one in Conlisk (1993), tries to explain gambling by introducing additional utility of gambling and appending it to the expected utility model.

In Barberis (2012), the gambler’s risk preferences are represented by CPT. In this theory, individuals’ preferences are determined by an S-shaped utility function and two inverse-S-shaped probability weighting functions. The latter effectively overweight the tails of a distribution, so a gambler with CPT preferences overweights very large gains of small probability and thus may decide to play in the casino.

Barberis (2012) compares the strategies of three types of gamblers: naive gamblers, sophisticated gamblers with pre-commitment, and sophisticated gamblers without pre-commitment. Because of probability weighting, the casino gambling problem is time-inconsistent in the sense that the optimal strategy designed by the gambler at time 0 is no longer optimal at a future time if and when the gambler reconsiders the problem. A naive gambler does not realize this inconsistency and thus keeps changing his strategy over time. A sophisticated gambler with pre-commitment is able, through some commitment device, to commit himself in the future to the strategy that is set up at time 0. A sophisticated gambler without pre-commitment realizes the inconsistency but is unable to commit himself to the optimal strategy at time 0, and thus takes this inconsistency into account when making decisions today.
With reasonable parameter values, Barberis (2012) finds that sophisticated gamblers with pre-commitment tend to take loss-exit strategies, i.e., to stop playing at a certain loss level (e.g., $100 loss) and to continue when they are in a gain position. Naive gamblers, by contrast, end up with gain-exit strategies. Finally, sophisticated agents without pre-commitment choose not to play at all.

CPT is a descriptive model for individuals’ preferences. A crucial contribution in Barberis (2012) lies in showing that the optimal strategy of a gambler with CPT preferences is consistent with several commonly observed gambling behaviors such as the popularity of casino gambling and the implementation of gain-exit and loss-exit strategies. However, the setting of Barberis (2012) was restrictive as it assumed that the gambler can only choose among simple (path-independent) strategies. Moreover, the author proceeded by means of an exhaustive search and only found the optimal strategy for an example with five periods.

Barberis (2012) leaves a number of natural and important open questions. First, if the set of strategies is not restricted, could CPT also explain other commonly observed gambling patterns? For instance, Thaler and Johnson (1990) find the house money effect, i.e., that gamblers become more risk seeking in the presence of a prior gain. This is an example of a strategy which is not path-independent but takes into account the history of winnings. Similarly, it has been observed that individuals may use a random device, such as a coin flip, to aid their choices in various contexts. More examples and discussions on randomization can be found in, e.g., Agranov and Ortoleva (2013), Dwenger et al. (2013) and the references therein.² Thus, it would be interesting to see whether CPT gamblers sometimes prefer to use path-dependent and randomized strategies and to understand why this is the case.
Second, instead of relying on enumeration, it seems desirable to develop a general methodology for solving the casino gambling problem with CPT preferences.

We are able here to answer both of the above questions. We consider the casino gambling problem without additional restrictions on the time horizon or the set of available strategies. Our first main contribution is to study different classes of strategies and explain which features of the CPT preferences lead to behaviors consistent with empirical findings. This contribution can be summarized in three parts.

² In Dwenger et al. (2013), several real-life examples exploiting the randomization feature are given: surprise menus at restaurants, last minute holiday booking desks at airports, movie sneak previews that do not advertise the movie titles, surprise-me features of internet services, and home delivery of produce bins with variable content. Another example is the practice of “drawing divination sticks”, popular in Chinese culture even today, in which people who are reluctant or unable to make their own important decisions go to a temple, pray and draw divination sticks, and follow whatever the words on the sticks tell them to do.

First, through a numerical example and assuming reasonable parameter values for the gambler’s CPT preferences, we find that the gambler may strictly prefer path-dependent strategies over path-independent strategies. Likewise, by tossing an independent (possibly biased) coin at some point, the gambler may further strictly improve his preference value.

Secondly, we study, at a theoretical level, the issue of why the gambler prefers randomized strategies.³ Consider an agent who wants to optimally stop a Markov process. We find that the distribution of the process at a randomized stopping time is a convex combination of the distributions of the process at some non-randomized stopping times.
Therefore, if the agent’s objective is to maximize his preference value of the distribution of the process at the stopping time and the preference is quasi-convex, he dislikes randomization. In the casino gambling problem, the gambler has CPT preferences, which are not quasi-convex; so it becomes possible for the gambler to strictly improve performance by using randomized strategies.

Thirdly, we show that any path-dependent strategy is equivalent to a randomization of path-independent strategies. This result implies that agents with quasi-convex preferences do not need to choose path-dependent strategies. In particular, the gambler in the casino problem strictly prefers path-dependent strategies only because CPT is not quasi-convex. This is no longer true if randomization is allowed: randomized path-independent strategies always perform at least as well as path-dependent but non-randomized strategies. Sometimes, as shown by examples, they perform strictly better. It also follows that it is always enough to consider randomized, path-independent strategies since the more complex randomized and path-dependent strategies cannot further improve the gambler’s preference value.

Our second main contribution is to develop a systematic approach to solving the casino gambling problem. Because of probability weighting, classical approaches to optimal stopping problems such as the martingale method and the dynamic programming principle do not apply here. By proving a suitable Skorokhod embedding theorem, we show that the casino gambling problem is equivalent to an infinite-dimensional optimization problem in which the distribution of the cumulative gain and loss at the stopping time becomes the decision variable. Therefore, to find the optimal strategy of the gambler, we only need to solve the infinite-dimensional optimization problem for the optimal distribution and then use the Skorokhod embedding theorem to find the stopping time leading to this distribution.
In the present paper, we focus on the optimal strategy for sophisticated agents with pre-commitment. We do so for a number of reasons. First, this allows us to concentrate on studying the optimal behavior resulting from CPT preferences, describe features matching empirically observed patterns and investigate theoretical properties underpinning them. Second, we believe that such sophisticated agents with pre-commitment are themselves an important type of agents, and many gamblers belong to this type with the aid of some commitment devices. Finally, these agents are key to studying any other types of agents. In particular, once we have found the optimal strategy for them, the realized strategy for naive agents can also be computed. We propose to study it in detail in a subsequent work.

³ In game-theoretical language, the non-randomized and randomized strategies correspond respectively to pure and mixed strategies; see Section 4.1 below for a further discussion.

The remainder of this paper is organized as follows: In Section 2, we briefly review CPT and then formulate the casino gambling model. In Section 3, we offer numerical examples to show that path-dependent and randomized strategies strictly outperform, respectively, path-independent and non-randomized ones. We then explain, in Section 4, the reasons underlying this out-performance. In Section 5, we propose a systematic approach to the casino gambling problem and carry out a numerical study. Finally, Section 6 concludes the paper. All proofs are presented in Appendix A.

2 The Model

2.1 Cumulative Prospect Theory

In the classical expected utility theory (EUT), individuals’ preferences are represented by the expected utility of random payoffs. However, this theory cannot explain many commonly observed behaviors when individuals make decisions under risk.⁴ One of the most notable alternatives to EUT is the cumulative prospect theory (CPT) proposed by Kahneman and Tversky (1979) and Tversky and Kahneman (1992).
In this theory, instead of evaluating random payoffs directly, individuals evaluate random gains and losses relative to a benchmark which is termed the reference point. More precisely, after a reference point $B$ is specified, individuals code each random payoff $Y$ into the corresponding gain or loss $X = Y - B$. Then, the preference for $X$ is represented by the functional

\[
V(X) := \int_0^{\infty} u(x)\, d\big[-w_+(1 - F_X(x))\big] + \int_{-\infty}^{0} u(x)\, d\big[w_-(F_X(x))\big], \tag{1}
\]

where $F_X(\cdot)$ is the cumulative distribution function (CDF) of $X$. The function $u(\cdot)$, which is strictly increasing, is called the utility function and $w_\pm(\cdot)$, two strictly increasing mappings from $[0, 1]$ onto $[0, 1]$, are probability weighting (or distortion) functions on gains and losses, respectively.⁵

⁴ See a detailed discussion in Starmer (2000).

Empirical studies reveal that $u(\cdot)$ is typically S-shaped. In other words, both $u_+(x) := u(x)$, $x \ge 0$, the utility of the gain, and $u_-(x) := -u(-x)$, $x \ge 0$, the disutility of the loss, are concave functions. On the other hand, $w_\pm(\cdot)$ are inverse-S-shaped, representing individuals’ tendency to overweight (relative to EUT) the tails of payoff distributions. The S-shaped utility function and inverse-S-shaped probability weighting functions together result in a fourfold pattern of risk attitudes that is consistent with empirical findings: individuals tend to be risk averse and risk seeking with respect to gains and losses of moderate or high probability, respectively; individuals are risk seeking and risk averse regarding gains and losses of small probability, respectively (Tversky and Kahneman, 1992). Finally, loss aversion, the tendency to be more sensitive to losses than to gains, can also be modeled in CPT by choosing $u_-(\cdot)$ to be steeper than $u_+(\cdot)$.
See Figures 1 and 2 for an illustration of the utility and probability weighting functions, where the functions take the following parametric forms proposed by Tversky and Kahneman (1992):

\[
u(x) = \begin{cases} x^{\alpha_+} & \text{for } x \ge 0, \\ -\lambda(-x)^{\alpha_-} & \text{for } x < 0, \end{cases}
\qquad
w_\pm(p) = \frac{p^{\delta_\pm}}{\big(p^{\delta_\pm} + (1 - p)^{\delta_\pm}\big)^{1/\delta_\pm}}. \tag{2}
\]

⁵ The utility function is called the value function in Kahneman and Tversky’s terminology. In the present paper, we use the term utility function to distinguish it from value functions of optimization problems.

Figure 1: S-shaped utility function. The function takes the form in (2) with $\alpha_+ = \alpha_- = 0.5$ and $\lambda = 2.5$.

Figure 2: Inverse-S-shaped probability weighting function. The function takes the form in (2) with the dotted line corresponding to $\delta = 0.65$, the dashed line to $\delta = 0.4$, and the solid line to $\delta = 1$.

2.2 The Casino Gambling Problem

We consider the casino gambling model proposed by Barberis (2012). At time 0, a gambler is offered a fair bet, e.g., an idealized black or red bet on a roulette wheel: win or lose one dollar with equal probability.⁶ If he declines this bet, he does not enter the casino to gamble. Otherwise, he starts the game and the outcome of the bet is played out at time 1, at which time he either wins or loses one dollar. The gambler is then offered the same bet and he can again choose to play or not, and so forth.

⁶ In this paper we consider the case of a fair game because, while it is an idealization of reality, it is already rich enough to convey the main economic insights. The case of an unfair game is technically more involved without significantly new conceptual insights, and we hope to study it in a future paper.

At time 0, the gambler decides whether to enter the casino and, if yes, the optimal time to leave the casino. We call such a decision a strategy.
As in Barberis (2012), we assume CPT preferences for the gambler; so the decision criterion is to maximize the CPT value of his wealth at the time when he leaves the casino. Because the bet is fair, the cumulative gain or loss of the agent, while he continues to play, is a standard symmetric random walk $S_n$, $n \ge 0$, on $\mathbb{Z}$, the set of integers; see the representation in Figure 3. We further assume that the gambler uses his initial wealth as the reference point, so he perceives $S_n$ as his cumulative gain or loss after playing the bet $n$ times. For any stopping time $\tau$, the CPT value of $S_\tau$ is

\[
V(S_\tau) = \sum_{n=1}^{\infty} u_+(n)\Big(w_+\big(P(S_\tau \ge n)\big) - w_+\big(P(S_\tau > n)\big)\Big)
- \sum_{n=1}^{\infty} u_-(n)\Big(w_-\big(P(S_\tau \le -n)\big) - w_-\big(P(S_\tau < -n)\big)\Big), \tag{3}
\]

with the convention that $+\infty - \infty = -\infty$, so that $V(S_\tau)$ is always well-defined.

The gambler’s problem faced at time 0 is then to find the stopping time $\tau$ in a set of admissible strategies to maximize $V(S_\tau)$. The choice of a specific admissible set is critical to forming the optimal strategy, and we will specify a few important types below.

2.3 Types of Stopping Strategies

A simple class of strategies, considered in Barberis (2012), is given by rules $\tau$ which assign a binary decision, to stop or not to stop, to each node in the gain/loss process. In other words, for any $t \ge 0$, $\{\tau = t\}$ (conditioning on $\{\tau \ge t\}$) is determined by $(t, S_t)$. We call such strategies path-independent or Markovian, and we denote the set of these strategies as $\mathcal{A}_M$.

Extending the above, we can consider path-dependent strategies which base the decision not only on the current state but also on the entire betting history. In other words, we consider strategies $\tau$ such that for any $t \ge 0$, $\{\tau = t\}$ is determined by the information set $\mathcal{F}_t = \sigma(S_u : u \le t)$. The binary decision, to stop or not to stop, depends not only on the current node of the gain/loss process but also on how the gambler arrived at the current node. Mathematically speaking, $\tau$ is a stopping time relative to $\{\mathcal{F}_t\}_{t \ge 0}$.
We denote the set of path-dependent strategies as $\mathcal{A}_D$.

Figure 3: Gain/loss process represented as a binomial tree. Each node is marked by a pair $(n, x)$, where $n$ stands for the time and $x$ the amount of (cumulative) gain/loss. For example, the node $(2, -2)$ signifies a total loss, relative to the initial wealth, of 2 dollars at time 2.

Next, we consider strategies that can make use of independent coin tosses. Imagine that sometimes the gambler may have difficulty deciding whether or not to leave the casino, so he tosses a coin to help with his decision making. More precisely, we consider the following simple strategies: at each step the gambler tosses a coin and decides to leave the casino if the outcome is tails and to continue otherwise. To describe this more formally, consider $\{\xi_{t,x}\}_{t \ge 0, x \in \mathbb{Z}}$, a family of coin tosses that are mutually independent and independent of $S$: $\xi_{t,x} = 0$ stands for tails and $\xi_{t,x} = 1$ stands for heads. The gambler’s behaviour is represented by the following strategy:

\[
\tau := \inf\{t \ge 0 \mid \xi_{t,S_t} = 0\}. \tag{4}
\]

Note that the information available to the gambler includes not only his prior gains and losses but also the coin toss outcomes in the history; i.e., the information at $t$ becomes $\mathcal{G}_t := \sigma(S_u, \xi_{u,S_u}, u \le t)$. Clearly, $\tau$ in (4) is a stopping time with respect to this enlarged information flow. However, $\tau$ defined by (4) is path-independent in that $\{\tau = t\}$ depends only on $S_t$ and $\xi_{t,S_t}$ (conditioning on $\{\tau \ge t\}$). Let us stress that we do not specify the distribution of these random coins, i.e., the value of $r_{t,x} := P(\xi_{t,x} = 0) = 1 - P(\xi_{t,x} = 1)$; these numbers are determined by the gambler as part of his gambling strategy.
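As an illustration of a strategy of the form (4), the following minimal sketch computes the exact law of $S_\tau$ by propagating probability mass forward through the binomial tree. It assumes a finite horizon at which the gambler is forced to leave; the function names `stopped_distribution` and `example_rule` and the particular stopping probabilities are hypothetical choices made for this example only.

```python
from collections import defaultdict
from fractions import Fraction


def stopped_distribution(stop_prob, horizon):
    """Exact law of S_tau for tau = inf{t >= 0 : xi_{t,S_t} = 0}.

    stop_prob(t, x) returns r_{t,x} = P(xi_{t,x} = 0), the probability of
    leaving at node (t, x). Assumption of this sketch: the gambler is forced
    to stop at time `horizon`.
    """
    alive = {0: Fraction(1)}        # P(tau >= t, S_t = x)
    law = defaultdict(Fraction)     # P(S_tau = x)
    for t in range(horizon):
        nxt = defaultdict(Fraction)
        for x, mass in alive.items():
            r = Fraction(stop_prob(t, x))
            law[x] += mass * r                  # tails: leave with gain/loss x
            cont = mass * (1 - r)               # heads: play the fair bet once more
            nxt[x + 1] += cont / 2
            nxt[x - 1] += cont / 2
        alive = nxt
    for x, mass in alive.items():               # forced exit at the horizon
        law[x] += mass
    return {x: p for x, p in law.items() if p != 0}


def example_rule(t, x):
    """Hypothetical rule: leave at any loss, toss a fair coin at node (1, 1)."""
    if x < 0:
        return 1
    if (t, x) == (1, 1):
        return Fraction(1, 2)
    return 0


law = stopped_distribution(example_rule, horizon=2)
print(sorted(law.items()))
```

Setting every $r_{t,x}$ to 0 or 1 in `stop_prob` recovers a non-randomized Markovian strategy, so the same routine covers both cases.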
We denote by $\mathcal{A}_C$ the set of such randomized, path-independent strategies generated by tossing coins, i.e.,⁷

\[
\mathcal{A}_C := \big\{\tau \mid \tau \text{ is defined as in (4), for some } \{\xi_{t,x}\}_{t \ge 0, x \in \mathbb{Z}} \text{ that are independent 0-1 random variables and are independent of } \{S_t\}_{t \ge 0}\big\}.
\]

Note that $\mathcal{A}_M \subset \mathcal{A}_C$ since non-randomized Markovian strategies correspond simply to the case when $r_{t,x} \in \{0, 1\}$.

Finally, we could also consider randomized path-dependent strategies; i.e., at each time $t$ and for each realization $\{x_t\}$ of $\{S_t\}$, an independent random coin $\xi_{t, x_{t \wedge \cdot}}$ is tossed and the agent stops at the first time $\xi_{t, S_{t \wedge \cdot}} = 0$.⁸ Interestingly, as will be seen in Proposition 1 below, using randomized path-dependent strategies does not improve the gambler’s preference value compared to using randomized but path-independent strategies. Put differently, the added complexity of randomized path-dependent strategies, when compared to randomized Markovian strategies in $\mathcal{A}_C$, has no benefit for the gambler. In consequence, the randomized path-dependent strategies will not be considered in the following analysis.

⁷ One can also introduce randomization through mutually independent uniformly distributed random variables.
⁸ These may also be represented as a random (independent of $S$) choice of strategy in $\mathcal{A}_D$. Using game-theoretical language, these are mixed strategies with the set of pure strategies given by $\mathcal{A}_D$; see the discussion in Section 4.1.

Figure 4: Optimal path-independent strategy. The CPT value is V = 0.250440. Black nodes stand for “stop” and white nodes stand for “continue.”

3 Comparison of Strategies

In this section we compare the three types of strategy defined in Section 2.3. We demonstrate, through the following example, that using path-dependent strategies can strictly increase the optimal value achieved by using the Markovian ones and that using randomized strategies can further strictly increase the optimal value over path-dependent ones.
Here we consider the Tversky and Kahneman (1992) utility and probability weighting functions in (2) with $\alpha_+ = \alpha_- = \alpha$ and $\delta_+ = \delta_- = \delta$. Consider a 6-period horizon, $\alpha = 0.9$, $\delta = 0.4$, and $\lambda = 2.25$, which are reasonable parameter values; see for instance Tversky and Kahneman (1992).⁹

⁹ Our results are not specific to the choice of probability weighting functions. For instance, we obtain the same results by using the probability weighting function proposed by Prelec (1998), i.e., $w_\pm(p) = e^{-\gamma(-\ln p)^{\delta}}$, with $\gamma = 0.5$ and $\delta = 0.4$.

Figure 5: Optimal path-dependent strategy. The CPT value is V = 0.250693. Black nodes stand for “stop” and white nodes stand for “continue.” The half-black-half-white node (5, 1) means that if the previous two nodes are (3, 3) and (4, 2), then the gambler stops; if the previous two nodes are (3, 1) and (4, 2), then he continues.

Figure 6: Randomized, path-independent strategy. The CPT value is V = 0.250702. Black nodes stand for “stop” and white nodes stand for “continue.” The grey node (5, 3) means that if node (5, 3) is reached, then the gambler tosses a biased coin with (heads : 31/32, tails : 1/32). If the result is heads, then the gambler continues; otherwise he stops. The grey node (5, 1) means that if node (5, 1) is reached, then the gambler tosses a coin with (heads : 1/2, tails : 1/2) and continues only if the coin turns up heads.

3.1 Optimal Path-Independent Strategy

First, consider the case in which only path-independent strategies are allowed. The optimal strategy can be found through exhaustive search. The optimal CPT value is V = 0.250440. Figure 4 shows the optimal path-independent strategy, where black nodes stand for “stop” and white nodes stand for “continue.” Observe that this strategy is to stop at any loss state and to continue at any gain state until the end of the 6th period, except at node (5, 1).
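The exhaustive search over path-independent strategies can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for the figures: the helper names (`tk_weight`, `cpt_value`, `stopped_law`, `best_markov_strategy`) are hypothetical, the gambler is assumed to be forced to stop at the horizon, and floating-point arithmetic is used throughout. The sketch evaluates (3) with the parametric forms in (2) and enumerates every stop/continue assignment on the tree.

```python
from collections import defaultdict
from itertools import product


def tk_weight(p, delta):
    """Tversky-Kahneman probability weighting function from (2)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p**delta / (p**delta + (1 - p)**delta) ** (1 / delta)


def cpt_value(law, alpha=0.9, delta=0.4, lam=2.25):
    """CPT value (3) of an integer-valued gain/loss with law {x: probability}."""
    top, bottom = max(max(law), 0), min(min(law), 0)
    value = 0.0
    for n in range(1, top + 1):                       # gain part
        ge = sum(p for x, p in law.items() if x >= n)
        gt = sum(p for x, p in law.items() if x > n)
        value += n**alpha * (tk_weight(ge, delta) - tk_weight(gt, delta))
    for n in range(1, -bottom + 1):                   # loss part, u_-(n) = lam * n^alpha
        le = sum(p for x, p in law.items() if x <= -n)
        lt = sum(p for x, p in law.items() if x < -n)
        value -= lam * n**alpha * (tk_weight(le, delta) - tk_weight(lt, delta))
    return value


def stopped_law(stop_nodes, horizon):
    """Law of S_tau when the gambler stops at nodes in stop_nodes or at the horizon."""
    alive, law = {0: 1.0}, defaultdict(float)
    for t in range(horizon):
        nxt = defaultdict(float)
        for x, mass in alive.items():
            if (t, x) in stop_nodes:
                law[x] += mass
            else:
                nxt[x + 1] += mass / 2
                nxt[x - 1] += mass / 2
        alive = nxt
    for x, mass in alive.items():
        law[x] += mass
    return dict(law)


def best_markov_strategy(horizon):
    """Brute-force search over all stop/continue assignments on the tree."""
    nodes = [(t, x) for t in range(horizon) for x in range(-t, t + 1, 2)]
    best_value, best_nodes = float("-inf"), None
    for choice in product((False, True), repeat=len(nodes)):
        stop_nodes = {node for node, stop in zip(nodes, choice) if stop}
        value = cpt_value(stopped_law(stop_nodes, horizon))
        if value > best_value:
            best_value, best_nodes = value, stop_nodes
    return best_value, best_nodes


# A short horizon keeps the demonstration fast.
best_value, best_nodes = best_markov_strategy(horizon=3)
print(best_value, sorted(best_nodes))
```

With a 6-period horizon the tree has 21 decision nodes at times 0 to 5, so this naive enumeration visits 2^21 assignments; that is feasible but slow in pure Python, and the sketch has not been checked against the value V = 0.250440 reported above.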
3.2 Optimal Path-Dependent Strategy

Next, we consider path-dependent strategies. Again, the optimal strategy can be found through exhaustive search. The optimal CPT value is strictly improved: V = 0.250693. Figure 5 shows this optimal path-dependent strategy. Node (5, 1) is half-black-half-white, meaning that if the previous two nodes the path has gone through are (3, 3) and (4, 2), then the gambler stops; if the previous two nodes are (3, 1) and (4, 2), then he continues.¹⁰ Notice that even though (5, 1) is the only node that differs from the optimal path-independent strategy, the optimal CPT value has already been strictly increased; i.e., the optimal CPT value over path-independent strategies is strictly improved by using path-dependent strategies.

¹⁰ The optimal path-dependent strategy is not unique: the gambler can also choose to continue if the path leading to node (5, 1) goes through nodes (3, 3) and (4, 2) and stop if it goes through nodes (3, 1) and (4, 2).

3.3 Randomized, Path-Independent Strategy

Now, we introduce randomization into nodes (5, 3) and (5, 1), which are shown in grey in Figure 6. Once node (5, 3) is reached, the gambler tosses a biased coin with (heads : 31/32, tails : 1/32), i.e., the probability of the coin turning up heads is 31/32, while the probability of tails is 1/32. If the result is heads, then the gambler continues; otherwise he stops. Similarly, once node (5, 1) is reached, the gambler tosses a coin with (heads : 1/2, tails : 1/2) and continues only if the coin turns up heads. The CPT value of such a strategy is increased to V = 0.250702. In particular, the optimal CPT value achieved using path-dependent strategies is strictly improved when using randomized but path-independent strategies.

The previous examples show that, starting with Markovian strategies, it is possible to strictly increase the gambler’s CPT value by allowing for path-dependent strategies and then to further increase the value by switching to randomized strategies in $\mathcal{A}_C$. This is a general fact and the following chain of inequalities holds:

\[
\sup_{\tau \in \mathcal{A}_M} V(S_\tau) \le \sup_{\tau \in \mathcal{A}_D} V(S_\tau) \le \sup_{\tau \in \mathcal{A}_C} V(S_\tau). \tag{5}
\]

The inequality $\sup_{\tau \in \mathcal{A}_M} V(S_\tau) \le \sup_{\tau \in \mathcal{A}_D} V(S_\tau)$ is obvious because the set of path-dependent strategies contains that of Markovian strategies. Similarly, $\sup_{\tau \in \mathcal{A}_M} V(S_\tau) \le \sup_{\tau \in \mathcal{A}_C} V(S_\tau)$ is also trivial because Markovian strategies are special (deterministic) examples of strategies in $\mathcal{A}_C$, which in general rely on more information (i.e., coin tosses). The inequality $\sup_{\tau \in \mathcal{A}_D} V(S_\tau) \le \sup_{\tau \in \mathcal{A}_C} V(S_\tau)$ (with the strict inequality possible) is not trivial and we will show this inequality in Section 4.2.

4 Why Lucky Coins and Good Memory Matter

Preference for randomization has indeed been observed in the literature in a variety of contexts; see the recent empirical evidence provided in Agranov and Ortoleva (2013), Dwenger et al. (2013) and the references therein. The preference for path dependence, however, has not been studied extensively in the literature. In what follows, we explain theoretically, via a general optimal stopping problem, the reasons underlying the observed strict preferences for randomization and for path-dependence in the casino gambling model.

We consider a general discrete-time Markov chain $\{X_t\}_{t \ge 0}$.¹¹ Without loss of generality, we assume that $X_t$ takes values in $\mathbb{Z}$ and that $X_0 = 0$. As in the casino gambling problem, we denote by $\mathcal{A}_M$, $\mathcal{A}_D$, and $\mathcal{A}_C$ the sets of Markovian stopping times, path-dependent stopping times, and randomized (through coin tossing) yet Markovian stopping times, respectively. Suppose an agent wants to choose a stopping time $\tau$ to maximize his preference value of $X_\tau$.
In the following, we study when the agent likes or dislikes randomization or path-dependent strategies.

4.1 Preference for Randomization

Any randomized stopping time in $\mathcal{A}_C$ can be viewed as a special case of full randomization. A fully randomized strategy is implemented by randomly choosing from a given set of non-randomized stopping times, e.g., from $\mathcal{A}_M$, using a random device. Such a device is represented by a random mapping $\sigma$ from $\Omega$ to $\mathcal{A}_M$ and is independent of the Markovian process $\{X_t\}_{t \ge 0}$. Suppose the outcome of $\sigma$ is $\tau$, which is a stopping time $\tau \in \mathcal{A}_M$; the agent then chooses $\tau$ as his strategy. To visualize this, imagine an uneven die with an infinite but countable number of faces, each corresponding to a strategy in $\mathcal{A}_M$. Using game-theoretical language, fully randomized strategies correspond to mixed strategies while elements in $\mathcal{A}_M$ represent pure strategies.

¹¹ The gain/loss process in the casino gambling problem is a symmetric random walk and thus is a Markov chain.

Any $\tilde{\tau} \in \mathcal{A}_C$, i.e., $\tilde{\tau} = \inf\{t \ge 0 \mid \xi_{t,X_t} = 0\}$ for a given set of mutually independent 0-1 random variables $\{\xi_{t,x}\}_{t \ge 0, x \in \mathbb{Z}}$ that are independent of $\{X_t\}_{t \ge 0}$, can be viewed as a special case of full randomization. Indeed, for any $A := \{a_{t,x}\}_{t \ge 0, x \in \mathbb{Z}}$ such that $a_{t,x} \in \{0, 1\}$, define $\tau_A := \inf\{t \ge 0 \mid a_{t,X_t} = 0\}$. Then, $\tau_A \in \mathcal{A}_M$. Define $\sigma$ as $\sigma(\omega) := \tau_A$ with $A = \{\xi_{t,x}(\omega)\}_{t \ge 0, x \in \mathbb{Z}}$. Then, $\sigma$ is a random mapping taking values in $\mathcal{A}_M$. Moreover, because $\{\xi_{t,x}\}_{t \ge 0, x \in \mathbb{Z}}$ is independent of $\{X_t\}_{t \ge 0}$, so is $\sigma$. Thus, $\sigma$ is a fully randomized strategy and, clearly, $\sigma = \tilde{\tau}$.¹²

We now study when the agent dislikes full randomization and hence, in particular, will not use randomized strategies in $\mathcal{A}_C$. We first show that the stopped distribution of $\{X_t\}_{t \ge 0}$ at a randomized stopping time $\sigma$ is a convex combination of the stopped distributions at a set of non-randomized stopping times. To see this, suppose the possible values that $\sigma$ can take are $\tau_i$ with probabilities $p_i$, respectively.
Then,

\[
F_{X_\sigma}(x) = P(X_\sigma \le x) = E\big[P(X_\sigma \le x \mid \sigma)\big] = \sum_{i \ge 1} p_i P(X_{\tau_i} \le x \mid \sigma = \tau_i) = \sum_{i \ge 1} p_i P(X_{\tau_i} \le x) = \sum_{i \ge 1} p_i F_{X_{\tau_i}}(x),
\]

where the second-to-last equality, which drops the conditioning on $\{\sigma = \tau_i\}$, holds because of the independence of $\sigma$ and $\{X_t\}_{t \ge 0}$. Therefore, full randomization effectively convexifies the set of state distributions at non-randomized stopping times.

¹² The converse is also true: for any fully randomized strategy $\sigma$, we can find a strategy $\tau$ in $\mathcal{A}_C$ such that the distributions of $S_\tau$ and $S_\sigma$ are the same; see the discussion after Theorem 2.

Now, suppose the agent’s preference for $X_\tau$ is law-invariant, i.e., the preference value of $X_\tau$ is $\mathcal{V}(F_{X_\tau})$ for some functional $\mathcal{V}$ on distributions. We say $\mathcal{V}$ is quasi-convex if

\[
\mathcal{V}(pF_1 + (1 - p)F_2) \le \max\{\mathcal{V}(F_1), \mathcal{V}(F_2)\}, \quad \forall F_1, F_2,\ \forall p \in [0, 1]. \tag{6}
\]

If, further, $\mathcal{V}$ is continuous, then we can conclude that¹³

\[
\mathcal{V}\Big(\sum_{i=1}^{+\infty} p_i F_i\Big) \le \max_{i \ge 1} \mathcal{V}(F_i), \quad \forall F_i,\ \forall p_i \ge 0,\ i \ge 1, \text{ with } \sum_{i=1}^{\infty} p_i = 1.
\]

Because the state distribution at a randomized stopping time is a convex combination of the state distributions at non-randomized stopping times, we can conclude that the agent dislikes any type of randomization if his preference representation $\mathcal{V}$ is quasi-convex.

It has been noted in the literature that individuals with quasi-convex preferences dislike randomization; see for instance Machina (1985), Camerer and Ho (1994) and Blavatskyy (2006). It is easy to see that a sufficient condition for quasi-convexity is betweenness: for any two distributions $F_1$ and $F_2$ such that $\mathcal{V}(F_1) \ge \mathcal{V}(F_2)$ and any $p \in (0, 1)$, $\mathcal{V}(F_1) \ge \mathcal{V}(pF_1 + (1 - p)F_2) \ge \mathcal{V}(F_2)$. Betweenness is further implied by independence: for any distributions $F_1$, $F_2$, $G$ and any $p \in (0, 1)$, if $\mathcal{V}(F_1) \ge \mathcal{V}(F_2)$, then $\mathcal{V}(pF_1 + (1 - p)G) \ge \mathcal{V}(pF_2 + (1 - p)G)$.¹⁴ Because expected utility theory is underpinned by the independence axiom, agents with EU preferences dislike randomization.
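For the expected utility case, the quasi-convexity condition (6) can also be checked directly, without passing through the independence axiom. The short derivation below is a sketch that assumes $\mathcal{V}(F) = \int u(x)\,dF(x)$ with $u$ integrable under $F_1$ and $F_2$; it uses only the linearity of the integral in the distribution.

```latex
\begin{align*}
\mathcal{V}\big(pF_1 + (1-p)F_2\big)
  &= \int u(x)\, d\big[pF_1 + (1-p)F_2\big](x) \\
  &= p \int u(x)\, dF_1(x) + (1-p) \int u(x)\, dF_2(x) \\
  &= p\,\mathcal{V}(F_1) + (1-p)\,\mathcal{V}(F_2) \\
  &\le \max\{\mathcal{V}(F_1), \mathcal{V}(F_2)\}, \qquad p \in [0, 1].
\end{align*}
```

The last step holds because a weighted average of two numbers never exceeds the larger of them. Probability weighting breaks the second equality, since the weighted distribution is no longer linear in the mixture, which is why the same argument is not available for CPT.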
Similarly, agents with preferences satisfying betweenness or quasi-convexity also dislike randomization.¹⁵

On the other hand, preferences involving probability weighting, such as rank-dependent utility (Quiggin, 1982), prospect theory (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992), and the dual theory of choice (Yaari, 1987), are in general not quasi-convex; see Camerer and Ho (1994). With these preferences, randomization can be strictly preferred, as shown in our casino gambling model. Empirically, individuals have been observed to violate quasi-convexity and, as a result, to prefer randomization; see for instance Camerer and Ho (1994), Blavatskyy (2006), and the references therein.

¹³ Quasi-convexity implies $V\left(\sum_{i=1}^{n} p_i F_i\right) \le \max_{1\le i\le n} V(F_i)$, and continuity is needed to let $n$ go to infinity.

¹⁴ Indeed, by setting $G$ in the definition of the independence axiom to be $F_1$ and $F_2$, respectively, we immediately conclude that independence leads to betweenness.

¹⁵ Examples of preferences satisfying betweenness include disappointment theory (Gul, 1991), the weighted utility model (Chew, 1983; Chew and MacCrimmon, 1979; Fishburn, 1983), the skew-symmetric bilinear model (Fishburn, 1981), expected utility with suspicion (Bordley and Hazen, 1991), and utility models with betweenness only (Chew, 1989; Dekel, 1986). Further, some models in the quadratic class of utilities proposed by Machina (1982) and Chew et al. (1991) satisfy quasi-convexity.

4.2 Preference for Path-Dependent Strategies

It is well known that if the agent's preferences are represented by expected utility, i.e., $V(F) = \int u(x)\,dF(x)$ for some utility function $u$, then the agent's optimal stopping time must be Markovian. This result is a consequence of dynamic programming. For dynamic decision problems with non-EUT preferences, however, dynamic programming may not hold if the problems are time inconsistent, so it is unclear whether the optimal stopping time is still Markovian in this case.
In the following, we show that any path-dependent strategy can be viewed as a randomization of Markovian strategies, so the agent does not strictly prefer path-dependent strategies if his preference for $X_\tau$ is quasi-convex.

Proposition 1 For any $\tau \in \mathcal{A}_D$, there exists $\tilde\tau \in \mathcal{A}_C$ such that $(X_\tau, \tau)$ has the same distribution as $(X_{\tilde\tau}, \tilde\tau)$. More generally, for any randomized path-dependent strategy $\tau'$, there exists $\tilde\tau \in \mathcal{A}_C$ such that $(X_{\tau'}, \tau')$ has the same distribution as $(X_{\tilde\tau}, \tilde\tau)$.

Proposition 1 shows that for any path-dependent stopping time $\tau$, $X_\tau$ is identically distributed as $X_{\tilde\tau}$ for some $\tilde\tau \in \mathcal{A}_C$, which is a randomization of Markovian stopping times. This result has three implications for an agent whose preference for $X_\tau$ is represented by $V(F_{X_\tau})$.

First, the agent is indifferent between $\tau$ and $\tilde\tau$. Consequently, we must have
\[
\sup_{\tau\in\mathcal{A}_D} V(F_{X_\tau}) \le \sup_{\tau\in\mathcal{A}_C} V(F_{X_\tau}),
\]
which explains the second inequality in (5). In other words, randomized path-independent strategies always perform no worse than non-randomized path-dependent strategies. The inequality may be strict, as seen in our examples in Section 3 above.

Second, for any randomized path-dependent strategy $\tau'$, there also exists $\tilde\tau \in \mathcal{A}_C$ such that $X_{\tau'}$ has the same distribution as $X_{\tilde\tau}$. Therefore, using randomized path-dependent strategies cannot improve the gambler's preference value compared to simply using randomized but path-independent strategies. This explains why we consider only the latter in the casino gambling problem.

Third, if the agent's preference representation $V$ is quasi-convex, then it is optimal for him to use non-randomized and path-independent strategies only. Indeed, we have already concluded that he dislikes any type of randomization. Further, by Proposition 1, any path-dependent strategy is equivalent to a randomization of path-independent strategies and is thus no better than some path-independent strategy.
In the casino gambling problem, the gambler can improve his CPT value by considering path-dependent strategies only because CPT is not quasi-convex.

4.3 Discounting and time-dependent preferences

For any full randomization $\sigma$, say taking possible values $\tau_i$, $i \ge 1$, with respective probabilities $p_i$, we have
\[
\mathbb{P}(\sigma \le t, X_\sigma \le x) = \mathbb{E}\left[\mathbb{P}(\sigma \le t, X_\sigma \le x \mid \sigma)\right] = \sum_{i\ge 1} p_i\, \mathbb{P}(\tau_i \le t, X_{\tau_i} \le x \mid \sigma = \tau_i) = \sum_{i\ge 1} p_i\, \mathbb{P}(\tau_i \le t, X_{\tau_i} \le x),
\]
i.e., the joint distribution of $(\sigma, X_\sigma)$ is a convex combination of the joint distributions of $(\tau_i, X_{\tau_i})$, $i \ge 1$. Furthermore, Proposition 1 shows that for any path-dependent stopping time $\tau$ (randomized or not), $(\tau, X_\tau)$ is identically distributed as $(\tilde\tau, X_{\tilde\tau})$ for some randomized but path-independent strategy $\tilde\tau$. Therefore, the conclusions in Sections 4.1 and 4.2 remain true if the agent's preferences are represented by a functional $\tilde V$ of the joint distribution of $(\tau, X_\tau)$. In particular, if $\tilde V$ is quasi-convex, then the agent dislikes randomization and path-dependent strategies.

Suppose the agent's preferences are represented as $V(F_{H(\tau, X_\tau)})$, a functional of the distribution of $H(\tau, X_\tau)$ for some function $H$. Then, this preference representation can also be viewed as a functional of the joint distribution of $(\tau, X_\tau)$. Furthermore, if $V(F_{H(\tau, X_\tau)})$ is quasi-convex in $F_{H(\tau, X_\tau)}$, it is also quasi-convex in the joint distribution $F_{\tau, X_\tau}$. In this case, the agent dislikes randomization and path-dependent strategies. A simple example of the function $H$ is $H(t, x) = e^{-rt}x$, where $r$ is a discount rate. Therefore, if the agent has law-invariant and quasi-convex preferences for the discounted value $e^{-r\tau}X_\tau$, he will choose only Markovian strategies.

5 Solving the Optimal Gambling Problem

In this section, we provide a systematic approach to finding the optimal randomized stopping time $\tau \in \mathcal{A}_C$ to maximize the gambler's CPT value (3).
Classical approaches to optimal stopping problems include the martingale method and the dynamic programming principle, which depend respectively on the linearity of mathematical expectation and on time consistency. Both of these approaches fail in the casino gambling problem due to the presence of probability weighting functions. Observe that the probability distribution of $S_\tau$ is the direct carrier of the gambler's CPT value; this observation motivates us to take this distribution as the decision variable. This idea of changing the decision variable from $\tau$ to the distribution or quantile function of $S_\tau$ was first applied by Xu and Zhou (2012) to solve a continuous-time optimal stopping problem with probability weighting. In our discrete-time setting, however, quantile functions are not a suitable choice for the decision variable because they are integer-valued and thus form a non-convex feasible set. Here, we choose distribution functions as the decision variable.

The procedure for solving the casino gambling problem can be divided into three steps. First, change the decision variable from the stopping time $\tau$ to the probability distribution of $S_\tau$. Second, solve an infinite-dimensional program to obtain the optimal distribution of $S_\tau$. Third, construct an optimal stopping strategy from the optimal probability distribution. The key to carrying out this three-step procedure is to characterize the set of distributions of $S_\tau$ and to recover a stopping time $\tau$ such that $S_\tau$ has the desired optimal distribution; both are achieved using Skorokhod embedding techniques.

To illustrate this idea, we consider the casino gambling problem on an infinite time horizon.¹⁶ In this setting, we need to exclude doubling strategies, e.g., $\tau = \inf\{t \ge 0 \mid S_t \ge b\}$ for some $b \in \mathbb{R}$.
One way to exclude such strategies is to restrict ourselves to stopping times $\tau$ such that $\{S_{\tau\wedge t}\}_{t\ge 0}$ is uniformly integrable.¹⁷ Therefore, the feasible set of stopping times under consideration becomes
\[
\mathcal{T} := \{\tau \in \mathcal{A}_C \mid \{S_{\tau\wedge t}\}_{t\ge 0} \text{ is uniformly integrable}\},
\]
and the gambler's problem can be formulated as
\[
\max_{\tau\in\mathcal{T}} V(S_\tau). \tag{7}
\]

¹⁶ Considering an infinite time horizon greatly reduces the technical difficulties. In the finite time horizon case, which we hope to study in a separate work, the characterization of the feasible distributions of $S_\tau$ is much more involved.

¹⁷ Another approach would be to impose a finite credit line, e.g., $S_t \ge -L$, $t \ge 0$, for some $L > 0$.

Next, we characterize the distribution of $S_\tau$ for $\tau \in \mathcal{T}$. Because $\{S_{\tau\wedge t}\}_{t\ge 0}$ is uniformly integrable, we conclude that $S_\tau$ is integrable and that $\mathbb{E}[S_\tau] = 0$. Thus, the following is a natural candidate for the set of feasible distributions of $S_\tau$:
\[
\mathcal{M}_0(\mathbb{Z}) = \Big\{\text{probability measure } \mu \text{ on } \mathbb{Z} : \sum_{n\in\mathbb{Z}} |n| \cdot \mu(\{n\}) < \infty,\ \sum_{n\in\mathbb{Z}} n \cdot \mu(\{n\}) = 0\Big\}. \tag{8}
\]
The following theorem confirms that $\mathcal{M}_0(\mathbb{Z})$ fully characterizes the feasible set.

Theorem 2 For any $\mu \in \mathcal{M}_0(\mathbb{Z})$, there exists $\{r_i\}_{i\in\mathbb{Z}}$ such that $S_\tau$ follows $\mu$, where $\tau$ is defined as in (4) with $\mathbb{P}(\xi_{t,i} = 0) = r_i$, $i \in \mathbb{Z}$. Furthermore, $\{S_{\tau\wedge t}\}_{t\ge 0}$ is uniformly integrable and does not visit states outside any interval that contains the support of $\mu$. Conversely, for any $\tau \in \mathcal{A}_C$ such that $\{S_{\tau\wedge t}\}_{t\ge 0}$ is uniformly integrable, the distribution of $S_\tau$ belongs to $\mathcal{M}_0(\mathbb{Z})$.

Note that $\mathcal{M}_0(\mathbb{Z})$ is a convex set and contains the distribution of $S_\tau$ for any $\tau \in \mathcal{A}_D$ with uniform integrability. Consequently, the distribution of $S_\sigma$ for any full randomization $\sigma$ of possibly path-dependent strategies also lies in $\mathcal{M}_0(\mathbb{Z})$, and Theorem 2 implies that it may be achieved using a strategy in $\mathcal{A}_C$.
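For a target distribution with finite support, the coin probabilities in Theorem 2 can be computed from identities (15) and (16) in Appendix B. The following is a minimal sketch under stated assumptions, not the authors' implementation: the walk starts at $S_0 = 0$, probabilities are passed as exact `Fraction` values, and the boundary condition $g_i = 0$ is imposed at and beyond both endpoints of the support (where $r_i = 1$). The function names `coins_from_distribution` and `stopped_distribution` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def coins_from_distribution(mu):
    """Stopping probabilities r_i for a finite-support, zero-mean target mu.

    mu maps integer states to positive Fraction probabilities.
    Uses (15): g_{i+1} = 2*(g_i + p_i - 1{i=0}) - g_{i-1}, with g = 0 at and
    below the left endpoint, then (16): r_i = p_i / (p_i + g_i).
    """
    lo, hi = min(mu), max(mu)
    assert all(p > 0 for p in mu.values())
    assert sum(mu.values()) == 1                       # total mass one
    assert sum(i * p for i, p in mu.items()) == 0      # zero mean
    g = {lo - 1: Fraction(0), lo: Fraction(0)}
    for i in range(lo, hi + 1):
        started_here = 1 if i == 0 else 0
        g[i + 1] = 2 * (g[i] + mu.get(i, 0) - started_here) - g[i - 1]
    # Mass one and zero mean force g to vanish again at the right endpoint.
    assert g[hi] == 0 and g[hi + 1] == 0
    r = {}
    for i in range(lo, hi + 1):
        p = mu.get(i, Fraction(0))
        r[i] = p / (p + g[i]) if p + g[i] > 0 else Fraction(1)
    return r

def stopped_distribution(r, steps=400):
    """Distribution of S_tau obtained by propagating the walk's mass forward."""
    alive = {0: 1.0}
    stopped = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for state, mass in alive.items():
            stop = mass * float(r[state])
            stopped[state] += stop
            rest = mass - stop
            if rest > 0:                 # nothing continues past r_i = 1
                nxt[state - 1] += rest / 2
                nxt[state + 1] += rest / 2
        alive = nxt
    return dict(stopped)

mu = {-1: Fraction(2, 5), 0: Fraction(2, 5), 2: Fraction(1, 5)}
r = coins_from_distribution(mu)
print(r)
print(stopped_distribution(r))
```

For this target (mass 0.4 at $-1$, 0.4 at $0$, and 0.2 at $2$), the sketch gives $r_{-1} = r_2 = 1$, $r_0 = 1/3$, and $r_1 = 0$; propagating the walk with these coins recovers the target distribution up to a negligible residual.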
With $x_n := \mathbb{P}(S_\tau \ge n)$ and $y_n := \mathbb{P}(S_\tau \le -n)$, $n \ge 1$, we have
\[
\begin{aligned}
V(S_\tau) &= \sum_{n=1}^{\infty} u_+(n)\left(w_+(x_n) - w_+(x_{n+1})\right) - \sum_{n=1}^{\infty} u_-(n)\left(w_-(y_n) - w_-(y_{n+1})\right) \\
&= \sum_{n=1}^{\infty} \left(u_+(n) - u_+(n-1)\right) w_+(x_n) - \sum_{n=1}^{\infty} \left(u_-(n) - u_-(n-1)\right) w_-(y_n),
\end{aligned}
\]
where the second equality is due to Fubini's theorem. Recall that $V(S_\tau) := -\infty$ if the second sum is infinite. With the notation $x := (x_1, x_2, \dots)$, $y := (y_1, y_2, \dots)$, we denote
\[
U(x, y) := \sum_{n=1}^{\infty} \left(u_+(n) - u_+(n-1)\right) w_+(x_n) - \sum_{n=1}^{\infty} \left(u_-(n) - u_-(n-1)\right) w_-(y_n). \tag{9}
\]
Again, $U(x, y) := -\infty$ if the second sum is infinite.

In view of Theorem 2, problem (7) can be translated into the following optimization problem:
\[
\begin{aligned}
\max_{x,y}\quad & U(x, y) \\
\text{subject to}\quad & 1 \ge x_1 \ge x_2 \ge \dots \ge x_n \ge \dots \ge 0, \\
& 1 \ge y_1 \ge y_2 \ge \dots \ge y_n \ge \dots \ge 0, \\
& x_1 + y_1 \le 1, \\
& \sum_{n=1}^{\infty} x_n = \sum_{n=1}^{\infty} y_n.
\end{aligned} \tag{10}
\]
Note that $x_n$ stands for $\mathbb{P}(S_\tau \ge n)$, so we must have the first constraint in (10). Similarly, $y_n$ stands for $\mathbb{P}(S_\tau \le -n)$, hence the second constraint in (10). In addition, $x_1 + y_1 = \mathbb{P}(S_\tau \ge 1) + \mathbb{P}(S_\tau \le -1) = 1 - \mathbb{P}(S_\tau = 0) \le 1$, which explains the third constraint. Finally, the last constraint is the translation of $\mathbb{E}[S_\tau] = 0$.

Because of Theorem 2, problems (7) and (10) are equivalent in terms of the optimal value and the existence of the optimal solution. If we can find an optimal solution to problem (10), which represents a distribution on $\mathbb{Z}$, then Theorem 2 shows that we can find a stopping time $\tau \in \mathcal{T}$ such that $S_\tau$ follows this distribution. Consequently, $\tau$ is the optimal stopping time for the gambling problem (7). The details of the construction of $\tau$ are provided in Appendix B. Therefore, to find the gambler's optimal strategy, we only need to solve (10).

To model individuals' tendency to overweight the tails of a distribution, $w_\pm(z)$ should be concave in $z$ in the neighbourhood of 0. Consequently, $U(x, y)$ is neither concave nor convex in $(x, y)$, and thus it is hard to solve (10) analytically.
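The change of variable from a distribution of $S_\tau$ to the pair $(x, y)$ in (9) and (10) can be sketched as follows for a finite-support distribution. This is an illustration under assumptions rather than the paper's code: the utility forms $u_+(n) = n^{\alpha_+}$ and $u_-(n) = \lambda n^{\alpha_-}$ and the power weighting $w_\pm(z) = z^{\delta}$ are assumed here, the default parameter values are the ones quoted in the paper's numerical example, and the function names are hypothetical.

```python
def tail_probabilities(mu):
    """x_n = P(S >= n) and y_n = P(S <= -n), n = 1..N, for a finite-support mu."""
    n_max = max(max(mu), -min(mu), 1)
    x = [sum(p for i, p in mu.items() if i >= n) for n in range(1, n_max + 1)]
    y = [sum(p for i, p in mu.items() if i <= -n) for n in range(1, n_max + 1)]
    return x, y

def is_feasible(x, y, tol=1e-12):
    """Check the four constraints of problem (10), up to a numerical tolerance."""
    def decreasing(v):
        padded = [1.0] + list(v) + [0.0]
        return all(a >= b - tol for a, b in zip(padded, padded[1:]))
    return (decreasing(x) and decreasing(y)
            and x[0] + y[0] <= 1 + tol
            and abs(sum(x) - sum(y)) <= tol)

def cpt_objective(x, y, alpha_plus=0.5, alpha_minus=0.9, delta=0.52, lam=2.25):
    """U(x, y) in (9) with assumed power utilities and power weighting."""
    u_plus = lambda n: n ** alpha_plus
    u_minus = lambda n: lam * n ** alpha_minus
    w = lambda z: z ** delta
    gains = sum((u_plus(n) - u_plus(n - 1)) * w(xn) for n, xn in enumerate(x, start=1))
    losses = sum((u_minus(n) - u_minus(n - 1)) * w(yn) for n, yn in enumerate(y, start=1))
    return gains - losses

# Stopping after one fair bet: S_tau is +1 or -1 with probability 1/2 each.
x, y = tail_probabilities({-1: 0.5, 1: 0.5})
print(is_feasible(x, y), cpt_objective(x, y))
```

Under these assumed parameters the one-bet distribution is feasible for (10) and has a negative objective value, because loss aversion ($\lambda > 1$) outweighs the equal weighting of the two tails.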
Assuming power probability weighting functions $w_\pm(z) = z^{\delta_\pm}$, we are able to find a complete solution to (10).¹⁸ Because the rigorous analysis leading to this complete solution is complex and lengthy, we develop and present the analysis in a companion paper, Authors (2014). Here, we only take a numerical example from that paper to illustrate the procedure of solving the gambling problem set out above.

¹⁸ Assuming concave power probability weighting functions is reasonable because these functions are concave in the neighbourhood of 0. For general weighting functions, we are not able to find a complete solution to (10).

In this example, with $\alpha_+ = 0.5$, $\alpha_- = 0.9$, $\delta_\pm = 0.52$, and $\lambda = 2.25$, the optimal distribution of $S_\tau$ is
\[
\mathbb{P}(S_\tau = n) = \begin{cases}
0.1360\left[\left(n^{0.5} - (n-1)^{0.5}\right)^{1/0.48} - \left((n+1)^{0.5} - n^{0.5}\right)^{1/0.48}\right], & n \ge 2, \\
0.0933, & n = 1, \\
0.8851, & n = -1, \\
0, & n = 0 \text{ or } n \le -2.
\end{cases}
\]
We now construct a stopping time that achieves this distribution. In view of Theorem 2, we only need to determine the $r_i$, i.e., the probabilities of the random coins showing tails. Following the construction in Appendix B, we obtain
\[
r_n = 1,\ n \le -1, \quad r_0 = 0, \quad r_1 = 0.0571, \quad r_2 = 0.0061, \quad r_3 = 0.0025, \quad r_4 = 0.0014, \ \dots
\]
Therefore, the gambler's strategy is as follows: stop once reaching $-1$; never stop at 0; toss a coin with tails probability 0.0571 when reaching 1 and stop only if the toss outcome is tails; toss a coin with tails probability 0.0061 when reaching 2 and stop only if the toss outcome is tails; and so on. Note that this strategy requires a coin toss at every node.¹⁹

6 Conclusion

This paper considers the dynamic casino gambling model with CPT preferences that was initially proposed by Barberis (2012). There, by studying a specific example, it was shown that CPT is able to explain several empirically observed gambling behaviors.
Because of the restrictive set of strategies used in Barberis (2012), it was not possible to consider there two other commonly observed phenomena, namely the use of path-dependent strategies and the use of independent randomization for decision making. Our first contribution here was to show that CPT, as a descriptive model for individuals' preferences, also accounts for these two types of gambling behavior. We have illustrated, via simple examples, that path-dependent stopping strategies and randomized stopping strategies strictly outperform path-independent strategies. Moreover, we studied in detail the relation between different classes of strategies and which features of preferences favour which strategies. We have shown that agents with quasi-convex preferences have no incentive to use path-dependent or randomized strategies. In particular, the improvement in performance brought by these strategies in the casino gambling problem is a consequence of the lack of quasi-convexity of CPT preferences.

¹⁹ In our companion paper, a different type of randomized strategy, one that is path-dependent but requires only a limited number of coin tosses, will be presented.

As mentioned before, the analysis in Barberis (2012) proceeds by enumerating all the (Markovian) strategies in the 5-period model. Our second main contribution here was to develop a systematic approach to solving the casino gambling problem analytically. Our method involves changing the decision variables, establishing a new Skorokhod embedding result, and solving an infinite-dimensional program. The solution of the infinite-dimensional program is highly involved, so we leave it to a companion paper, Authors (2014). In addition, we have studied here only the infinite-horizon setting, which allowed us to focus on the main features of the problem. Adapting our approach to a finite-horizon case, which we hope to pursue in future work, will involve new technical challenges.
Finally, we focus only on one type of gambler, i.e., sophisticated gamblers with pre-commitment. This is the crucial and most involved type of agent, and it also best highlights the features of CPT preferences. The other two types of gamblers addressed in Barberis (2012), i.e., naive gamblers and sophisticated gamblers without pre-commitment, can be studied using the results of this paper, and we intend to do so in our future work.

Note. Recently, and independently of our work, Henderson et al. (2014) observed that randomized strategies may be necessary for optimal gambling strategies. This observation emerged in the course of a conversation between one of those authors and two of the authors of the present paper at the SIAM Financial Mathematics meeting in November 2014 in Chicago. The other paper was subsequently posted on SSRN.

A Proofs

A.1 Proof of Proposition 1

For any $\tau \in \mathcal{A}_D$, let $r(t, x) = \mathbb{P}(\tau = t \mid X_t = x, \tau \ge t)$, $t \ge 0$, $x \in \mathbb{Z}$. Take mutually independent random variables $\{\xi_{t,x}\}_{t\ge 0, x\in\mathbb{Z}}$, which are also independent of $\{X_t\}_{t\ge 0}$, such that $\mathbb{P}(\xi_{t,x} = 0) = r(t, x) = 1 - \mathbb{P}(\xi_{t,x} = 1)$. Define $\tilde\tau = \inf\{t \in \mathbb{N} : \xi_{t,X_t} = 0\} \in \mathcal{A}_C$. We will show that $(X_{\tilde\tau}, \tilde\tau)$ is identically distributed as $(X_\tau, \tau)$, i.e., for any $s \ge 0$, $x \in \mathbb{Z}$,
\[
\mathbb{P}(X_\tau = x, \tau = s) = \mathbb{P}(X_{\tilde\tau} = x, \tilde\tau = s). \tag{11}
\]
We prove this by mathematical induction. We first show that (11) is true for $s = 0$. Indeed, for any $x \in \mathbb{Z}$, we have
\[
\begin{aligned}
\mathbb{P}(X_\tau = x, \tau = 0) &= \mathbb{P}(X_0 = x, \tau = 0, \tau \ge 0) \\
&= \mathbb{P}(X_0 = x, \tau \ge 0)\,\mathbb{P}(\tau = 0 \mid X_0 = x, \tau \ge 0) \\
&= \mathbb{P}(X_0 = x)\, r(0, x) \\
&= \mathbb{P}(X_0 = x)\,\mathbb{P}(\xi_{0,x} = 0) \\
&= \mathbb{P}(X_0 = x, \xi_{0,x} = 0) \\
&= \mathbb{P}(X_0 = x, \tilde\tau = 0) \\
&= \mathbb{P}(X_{\tilde\tau} = x, \tilde\tau = 0),
\end{aligned}
\]
where the fifth equality is due to the independence of $\xi_{0,x}$ and $\{X_t\}_{t\ge 0}$ and the sixth equality follows from the definition of $\tilde\tau$.

Next, we suppose that (11) is true for $s \le t$ and show that it is also true for $s = t + 1$. First, note that $\{X_t\}_{t\ge 0}$ is Markovian with respect both to the filtration generated by itself and to the filtration enlarged by the randomization $\{\xi_{t,x}\}_{t\ge 0, x\in\mathbb{Z}}$.
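Furthermore, $\tau$ and $\tilde\tau$ are stopping times with respect to these two filtrations, respectively. As a result, for any $s < t$, given $X_s$, the events $\{\tau = s\}$ and $\{\tilde\tau = s\}$ are independent of $X_t$. Then, we have
\[
\begin{aligned}
\mathbb{P}(X_t = x, \tau \le t) &= \sum_{s\le t}\sum_{y} \mathbb{P}(X_t = x \mid X_s = y, \tau = s)\,\mathbb{P}(X_s = y, \tau = s) \\
&= \sum_{s\le t}\sum_{y} \mathbb{P}(X_t = x \mid X_s = y)\,\mathbb{P}(X_s = y, \tau = s) \\
&= \sum_{s\le t}\sum_{y} \mathbb{P}(X_t = x \mid X_s = y)\,\mathbb{P}(X_s = y, \tilde\tau = s) \\
&= \sum_{s\le t}\sum_{y} \mathbb{P}(X_t = x \mid X_s = y, \tilde\tau = s)\,\mathbb{P}(X_s = y, \tilde\tau = s) \\
&= \mathbb{P}(X_t = x, \tilde\tau \le t),
\end{aligned}
\]
where the third equality is the case because (11) holds for $s \le t$ by the induction hypothesis. Consequently,
\[
\begin{aligned}
\mathbb{P}(X_\tau = x, \tau = t+1) &= \mathbb{P}(\tau = t+1 \mid X_{t+1} = x, \tau \ge t+1)\,\mathbb{P}(X_{t+1} = x, \tau \ge t+1) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x, X_t = y, \tau \ge t+1) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x \mid X_t = y, \tau \ge t+1)\,\mathbb{P}(X_t = y, \tau \ge t+1) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x \mid X_t = y)\,\mathbb{P}(X_t = y, \tau \ge t+1) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x \mid X_t = y)\left(\mathbb{P}(X_t = y) - \mathbb{P}(X_t = y, \tau \le t)\right) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x \mid X_t = y)\left(\mathbb{P}(X_t = y) - \mathbb{P}(X_t = y, \tilde\tau \le t)\right) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x \mid X_t = y)\,\mathbb{P}(X_t = y, \tilde\tau \ge t+1) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x \mid X_t = y, \tilde\tau \ge t+1)\,\mathbb{P}(X_t = y, \tilde\tau \ge t+1) \\
&= r(t+1, x) \sum_{y} \mathbb{P}(X_{t+1} = x, X_t = y, \tilde\tau \ge t+1) \\
&= \mathbb{P}(\tilde\tau = t+1 \mid X_{t+1} = x, \tilde\tau \ge t+1)\,\mathbb{P}(X_{t+1} = x, \tilde\tau \ge t+1) \\
&= \mathbb{P}(X_{\tilde\tau} = x, \tilde\tau = t+1).
\end{aligned}
\]
Here, the fourth and eighth equalities hold because of the Markov property of $\{X_t\}_{t\ge 0}$, and the tenth equality is the case because of the definition of $\tilde\tau$. By mathematical induction, (11) holds for any $s$ and $x$.

Finally, consider a randomized, path-dependent strategy $\tau'$, which is constructed as $\tau' = \inf\{t \ge 0 \mid \xi_{t, X_{t\wedge\cdot}} = 0\}$, where $\{\xi_{t, x_{t\wedge\cdot}}\}$ is a family of independent 0-1 random variables that are independent of $\{X_t\}$. Denote by $\{\mathcal{G}'_t\}$ the filtration generated by $\{X_t\}$ and the independent 0-1 random variables, i.e., $\mathcal{G}'_t := \sigma(X_u, \xi_{u, X_{u\wedge\cdot}}, u \le t)$. Then, $\{X_t\}$ is still a Markov process with respect to $\{\mathcal{G}'_t\}$ and $\tau'$ is a (possibly path-dependent) stopping time with respect to the same filtration.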
As a result, the above proof is still valid if we replace $\tau$ with $\tau'$, so $(\tau', X_{\tau'})$ is identically distributed as $(\tilde\tau, X_{\tilde\tau})$ for some $\tilde\tau \in \mathcal{A}_C$.

A.2 Proof of Theorem 2

It is straightforward to see that for any $\tau \in \mathcal{A}_C$ such that $\{S_{\tau\wedge t}\}_{t\ge 0}$ is uniformly integrable, the distribution of $S_\tau$ belongs to $\mathcal{M}_0(\mathbb{Z})$. We only need to show that for any $\mu \in \mathcal{M}_0(\mathbb{Z})$, we can construct $\tau \in \mathcal{A}_C$ such that $S_\tau$ is distributed as $\mu$ and $\{S_{\tau\wedge t}\}_{t\ge 0}$ is uniformly integrable.

In the following, we denote $\mathbf{r} := \{r_i\}_{i\in\mathbb{Z}}$, where $r_i \in [0, 1]$. Given $\mathbf{r}$, consider independent 0-1 random variables $\{\xi_{t,i}\}_{t\ge 0, i\in\mathbb{Z}}$ that are also independent of $\{S_t\}_{t\ge 0}$ and satisfy $\mathbb{P}(\xi_{t,i} = 0) = 1 - \mathbb{P}(\xi_{t,i} = 1) = r_i$, $i \in \mathbb{Z}$. Denote $\tau(\mathbf{r}) := \inf\{t \ge 0 : \xi_{t,S_t} = 0\}$. Note that $\{S_t\}_{t\ge 0}$ is still a symmetric random walk with respect to $\{\mathcal{G}_t\}_{t\ge 0}$, where $\mathcal{G}_t := \sigma(S_u, \xi_{u,S_u} : u \le t)$, and $\tau(\mathbf{r})$ is a stopping time with respect to $\{\mathcal{G}_t\}_{t\ge 0}$. Furthermore, the distribution of $S_{\tau(\mathbf{r})}$ depends on $\mathbf{r}$ but is independent of the selection of $\{\xi_{t,i}\}_{t\ge 0, i\in\mathbb{Z}}$.

The proof of Theorem 2 is divided into several steps. We first consider the case in which $\mu$ has finite support, i.e., $\mu([-B, A]) = 1$ and $\mu(\{A\}) > 0$, $\mu(\{-B\}) > 0$ for some $A, B \in \mathbb{Z}$. Because $\mu$ is centered, i.e., it has zero mean, we must have $A, B \ge 0$. We want to find $\mathbf{r}$ such that $\tau(\mathbf{r})$ embeds the distribution $\mu$, i.e., $S_{\tau(\mathbf{r})}$ has distribution $\mu$. Let $H_{[-B,A]} = \inf\{t \ge 0 \mid S_t \ge A \text{ or } S_t \le -B\}$. Consider the set of $\mathbf{r}$ which embed less probability mass than prescribed by $\mu$ on $(-B, A)$, i.e.,
\[
R_\mu = \{\mathbf{r} \in [0,1]^{\mathbb{Z}} \mid r_i = 1 \text{ if } i \notin (-B, A) \text{ and } \mathbb{P}(S_{\tau(\mathbf{r})} = i) \le \mu(\{i\}) \text{ if } i \in (-B, A)\}.
\]
By choosing $r_i = 0$ for $i \in (-B, A)$ and $r_i = 1$ for $i \notin (-B, A)$, we have $\mathbb{P}(S_{\tau(\mathbf{r})} = i) = 0$ for $i \in (-B, A)$, so $R_\mu$ is non-empty. Furthermore, by definition, $\tau(\mathbf{r}) \le H_{[-B,A]}$ for any $\mathbf{r} \in R_\mu$.

Proposition 3 If $\mathbf{r}, \mathbf{r}' \in R_\mu$, then their maximum also belongs to $R_\mu$, i.e., $\tilde{\mathbf{r}} \in R_\mu$, where $\tilde r_i = r_i \vee r'_i$, $i \in \mathbb{Z}$.

Proof Fix $\mathbf{r}, \mathbf{r}' \in R_\mu$ and define $\tilde{\mathbf{r}} = \{\tilde r_i\}_{i\in\mathbb{Z}}$ with $\tilde r_i = r_i \vee r'_i$.
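Denote by $\{\xi_{t,i}\}_{t\ge 0, i\in\mathbb{Z}}$ the family of 0-1 random variables used to construct $\tau(\mathbf{r})$. Construct another family of 0-1 random variables $\{\varepsilon_{t,i}\}_{t\ge 0, i\in\mathbb{Z}}$ that are independent of each other and of $\{S_t\}_{t\ge 0}$, and satisfy
\[
\mathbb{P}(\varepsilon_{t,i} = 0) = 1 - \mathbb{P}(\varepsilon_{t,i} = 1) = \frac{(r'_i - r_i)^+}{1 - r_i}\,\mathbf{1}_{r_i < 1}, \quad t \ge 0,\ i \in \mathbb{Z}.
\]
Define $\tilde\xi_{t,i} := \xi_{t,i}\,\varepsilon_{t,i}$. Then,
\[
\mathbb{P}(\tilde\xi_{t,i} = 0) = \mathbb{P}(\xi_{t,i} = 0) + \mathbb{P}(\xi_{t,i} = 1)\,\mathbb{P}(\varepsilon_{t,i} = 0) = r_i + (1 - r_i)\frac{(r'_i - r_i)^+}{1 - r_i}\,\mathbf{1}_{r_i < 1} = r_i \vee r'_i.
\]
Therefore, $\tau(\tilde{\mathbf{r}})$ can be constructed as follows:
\[
\tau(\tilde{\mathbf{r}}) = \inf\{t \ge 0 \mid \tilde\xi_{t,S_t} = 0\} = \inf\{t \ge 0 \mid \xi_{t,S_t}\,\varepsilon_{t,S_t} = 0\}.
\]
One can easily see that $\tau(\tilde{\mathbf{r}}) \le \tau(\mathbf{r})$.

Now, consider any $i \in \mathbb{Z} \cap (-B, A)$ such that $r_i \ge r'_i$, i.e., $\tilde r_i = r_i$. In this case, $\mathbb{P}(\varepsilon_{t,i} = 0) = 0$, i.e., $\varepsilon_{t,i} = 1$. Note that for any $t \ge 0$,
\[
\begin{aligned}
\{S_{\tau(\tilde{\mathbf{r}})} = i, \tau(\tilde{\mathbf{r}}) = t\} &= \{\xi_{u,S_u}\varepsilon_{u,S_u} = 1,\ u \le t-1,\ \xi_{t,i}\varepsilon_{t,i} = 0,\ S_t = i\} \\
&\subseteq \{\xi_{u,S_u} = 1,\ u \le t-1,\ \xi_{t,i}\varepsilon_{t,i} = 0,\ S_t = i\} \\
&= \{\xi_{u,S_u} = 1,\ u \le t-1,\ \xi_{t,i} = 0,\ S_t = i\} \\
&= \{S_{\tau(\mathbf{r})} = i, \tau(\mathbf{r}) = t\},
\end{aligned}
\]
where the second equality is the case because $\varepsilon_{t,i} = 1$. Therefore, we conclude $\{S_{\tau(\tilde{\mathbf{r}})} = i\} \subseteq \{S_{\tau(\mathbf{r})} = i\}$, so $\mathbb{P}(S_{\tau(\tilde{\mathbf{r}})} = i) \le \mathbb{P}(S_{\tau(\mathbf{r})} = i) \le \mu(\{i\})$. Similarly, for any $i \in \mathbb{Z} \cap (-B, A)$ such that $r_i < r'_i$, we also have $\mathbb{P}(S_{\tau(\tilde{\mathbf{r}})} = i) \le \mu(\{i\})$. Furthermore, it is obvious that $\tilde r_i = 1$, $i \notin (-B, A)$. Thus, $\tilde{\mathbf{r}} \in R_\mu$. Q.E.D.

The following lemma is useful:

Lemma 1 The distribution of $S_{\tau(\mathbf{r})}$ is continuous in $\mathbf{r} \in R := \{\mathbf{r} = \{r_i\}_{i\in\mathbb{Z}} \mid r_i = 1,\ i \notin (-B, A)\}$.

Proof We need to prove that $\mathbb{P}(S_{\tau(\mathbf{r})} = i)$ is continuous in $\mathbf{r}$ for any $i \in \mathbb{Z}$. For any $t \ge 0$,
\[
\mathbb{P}(S_{\tau(\mathbf{r})} = i, \tau(\mathbf{r}) = t) = \mathbb{P}(\xi_{u,S_u} = 1,\ u \le t-1,\ \xi_{t,S_t} = 0,\ S_t = i) = \mathbb{E}\left[\mathbb{E}\left[\mathbf{1}_{\{\xi_{u,S_u} = 1,\ u \le t-1,\ \xi_{t,S_t} = 0,\ S_t = i\}} \mid \sigma(S_u : u \le t)\right]\right].
\]
For each realization of $S_u$, $u \le t$, the above conditional probability is obviously continuous in $\mathbf{r}$. Because the number of realizations of $S_u$, $u \le t$, is finite, we conclude that $\mathbb{P}(S_{\tau(\mathbf{r})} = i, \tau(\mathbf{r}) = t)$ is continuous in $\mathbf{r}$ for any $t \ge 0$. Finally, we have $\tau(\mathbf{r}) \le H_{[-B,A]}$ for any $\mathbf{r} \in R$, so
\[
\sup_{\mathbf{r}\in R} \mathbb{P}(S_{\tau(\mathbf{r})} = i, \tau(\mathbf{r}) \ge t) \le \mathbb{P}(H_{[-B,A]} \ge t),
\]
which goes to zero as $t$ goes to infinity.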
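Therefore, we conclude that $\mathbb{P}(S_{\tau(\mathbf{r})} = i)$ is continuous in $\mathbf{r} \in R$. Q.E.D.

Define $\mathbf{r}^{\max} = \{r_i^{\max}\}_{i\in\mathbb{Z}}$ with $r_i^{\max} := \sup\{s \mid \mathbf{r} \in R_\mu \text{ with } r_i = s\}$.

Proposition 4 $\mathbf{r}^{\max}$ is the maximal element of $R_\mu$, i.e., $\mathbf{r}^{\max} \in R_\mu$ and $r_i^{\max} \ge r_i$, $i \in \mathbb{Z}$, for any $\mathbf{r} = \{r_i\}_{i\in\mathbb{Z}} \in R_\mu$. Furthermore, $S_{\tau(\mathbf{r}^{\max})}$ follows distribution $\mu$ and $\{S_{\tau(\mathbf{r}^{\max})\wedge t}\}_{t\ge 0}$ is uniformly integrable.

Proof By definition, $r_i^{\max} = 1$ for $i \notin (-B, A)$. For $i \in (-B, A)$, there exist $\mathbf{r}^{n,i} = \{r_j^{n,i}\}_{j\in\mathbb{Z}} \in R_\mu$ such that $\lim_{n\to\infty} r_i^{n,i} = r_i^{\max}$. Define $\tilde{\mathbf{r}}^n := \{\tilde r_j^n\}_{j\in\mathbb{Z}}$ with $\tilde r_j^n := \max_{i\in(-B,A)} r_j^{n,i}$. Then, by Proposition 3, $\tilde{\mathbf{r}}^n \in R_\mu$ and, furthermore, we can assume that $\tilde r_j^n$ is increasing in $n$ for each $j$. Moreover, by construction, $\lim_{n\to\infty} \tilde r_i^n = r_i^{\max}$ for each $i \in (-B, A)$. Because of Lemma 1, we conclude that $\mathbf{r}^{\max} \in R_\mu$. By definition, it is clear that $\mathbf{r}^{\max}$ is the maximal element of $R_\mu$.

Next, $\{S_{\tau(\mathbf{r}^{\max})\wedge t}\}_{t\ge 0}$ is uniformly integrable because $\tau(\mathbf{r}^{\max}) \le H_{[-B,A]}$ and $\{S_{H_{[-B,A]}\wedge t}\}_{t\ge 0}$ is uniformly integrable. Consequently, $S_{\tau(\mathbf{r}^{\max})}$ has zero mean.

Finally, we show that $S_{\tau(\mathbf{r}^{\max})}$ follows distribution $\mu$. By definition, $\mathbb{P}(S_{\tau(\mathbf{r}^{\max})} = i) = 0 = \mu(\{i\})$ for $i \notin [-B, A]$. We claim that $\mathbb{P}(S_{\tau(\mathbf{r}^{\max})} = i) = \mu(\{i\})$ for $i \in (-B, A)$. Otherwise, there exists $i_0 \in (-B, A)$ such that $\mathbb{P}(S_{\tau(\mathbf{r}^{\max})} = i_0) < \mu(\{i_0\})$. Consider $\mathbf{r}^{\epsilon}$ with $r_i^{\epsilon} = r_i^{\max}$ for $i \ne i_0$ and $r_{i_0}^{\epsilon} = r_{i_0}^{\max} + \epsilon$. It follows that $\mathbb{P}(S_{\tau(\mathbf{r}^{\epsilon})} = i) \le \mathbb{P}(S_{\tau(\mathbf{r}^{\max})} = i) \le \mu(\{i\})$ for $i \ne i_0$. Further, by Lemma 1, $\mathbb{P}(S_{\tau(\mathbf{r}^{\epsilon})} = i_0) \le \mu(\{i_0\})$ for sufficiently small $\epsilon$. Therefore, $\mathbf{r}^{\epsilon} \in R_\mu$, contradicting the fact that $\mathbf{r}^{\max}$ is the maximal element of $R_\mu$. Because both $S_{\tau(\mathbf{r}^{\max})}$ and $\mu$ have zero mean and they agree on all states $i$ except for $-B$ and $A$, we conclude that they have the same distribution. Q.E.D.

Proposition 4 establishes Theorem 2 for $\mu$ with bounded support. The final step is to extend the result to general $\mu \in \mathcal{M}_0(\mathbb{Z})$ through a limiting and coupling procedure. Let us fix $\mu \in \mathcal{M}_0(\mathbb{Z})$ without bounded support and, without loss of generality, assume that the support is unbounded on the right, i.e., $\mu((-\infty, n]) < 1$ for all $n \ge 0$.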
For each integer $n \ge 1$, we want to construct $\mu_n$ such that $\mu_n$ has support on $[-\tilde n, n]$, is the same as $\mu$ on $(-\tilde n, n)$, and has zero mean, where $\tilde n$ will be determined later. To meet the constraints, we must have
\[
\begin{cases}
\mu_n(\{-\tilde n\}) + \mu((-\tilde n, n)) + \mu_n(\{n\}) = 1, \\
-\tilde n\,\mu_n(\{-\tilde n\}) + n\,\mu_n(\{n\}) + \sum_{i=-\tilde n+1}^{n-1} i\,\mu(\{i\}) = 0,
\end{cases}
\]
from which we can solve
\[
\begin{aligned}
\mu_n(\{n\}) &= \frac{1}{n + \tilde n}\Big[\tilde n\left(1 - \mu((-\tilde n, n))\right) - \sum_{i=-\tilde n+1}^{n-1} i\,\mu(\{i\})\Big], \\
\mu_n(\{-\tilde n\}) &= \frac{1}{n + \tilde n}\Big[n\left(1 - \mu((-\tilde n, n))\right) + \sum_{i=-\tilde n+1}^{n-1} i\,\mu(\{i\})\Big].
\end{aligned} \tag{12}
\]
We need to choose $\tilde n$ such that $\mu_n$ is truly a probability measure, i.e., such that $\mu_n(\{n\}) \ge 0$ and $\mu_n(\{-\tilde n\}) \ge 0$. We define $\tilde n$ as
\[
\tilde n := \min\{k \in \mathbb{N} \mid f(n, k) > 0\}, \qquad f(n, k) := k\left(1 - \mu((-k, n))\right) - \sum_{i=-k+1}^{n-1} i\,\mu(\{i\}).
\]
For each fixed $n$, one can see that $f(n, k)$ is increasing in $k$ and $\lim_{k\to\infty} f(n, k) = +\infty$ because $\mu((-\infty, n]) < 1$. Therefore, $\tilde n$ is well-defined. By definition, $\mu_n(\{n\})$ defined as in (12) is positive. Furthermore, $\tilde n > 0$ because $f(n, 0) \le 0$. On the other hand,
\[
n\left(1 - \mu((-\tilde n, n))\right) + \sum_{i=-\tilde n+1}^{n-1} i\,\mu(\{i\}) = -f(n, \tilde n - 1) + (n + \tilde n - 1)\left(1 - \mu((-\tilde n, n))\right) > 0,
\]
where the inequality is the case because $f(n, \tilde n - 1) \le 0$ by the definition of $\tilde n$, $\mu((-\infty, n)) < 1$, $n \ge 1$, and $\tilde n > 0$. Therefore, $\mu_n(\{-\tilde n\})$ defined as in (12) is positive. Finally, because $f(n, k)$ is decreasing in $n$ for each $k$, we conclude that $\tilde n$ is increasing in $n$.

We claim that $-\tilde n$ converges to the left end of the support of $\mu$ as $n$ goes to infinity, i.e., $\lim_{n\to\infty} \tilde n = \sup\{i \in \mathbb{Z} \mid \mu((-\infty, -i]) > 0\}$. Otherwise, there exists a positive integer $m < \sup\{i \in \mathbb{Z} \mid \mu((-\infty, -i]) > 0\}$ such that $\tilde n \le m$ for any $n$. Because $f(n, k)$ is increasing in $k$, we conclude that $f(n, m) > 0$ for any $n$. However,
\[
\lim_{n\to\infty} f(n, m) = m\left(1 - \mu((-m, +\infty))\right) - \sum_{i=-m+1}^{\infty} i\,\mu(\{i\}) < 0,
\]
where the last inequality is the case because $\sum_{i=-\infty}^{\infty} i\,\mu(\{i\}) = 0$ and $m < \sup\{i \in \mathbb{Z} \mid \mu((-\infty, -i]) > 0\}$.

To conclude, for each $n \ge 1$, we construct a positive integer $\tilde n$ and a measure $\mu_n$ such that $\mu_n$ has support $[-\tilde n, n]$ and is the same as $\mu$ on $(-\tilde n, n)$, and $-\tilde n$ converges to the left end of the support of $\mu$ as $n$ goes to infinity.
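Denote by $\mathbf{r}^n = \{r_i^n\}_{i\in\mathbb{Z}}$ the maximal element of $R_{\mu_n}$ as in Proposition 4. We show that $r_i^n$ is decreasing in $n$ for each $i$. To this end, we only need to show that $r_i^n \ge r_i^{n+1}$ for any $i \in (-\tilde n, n)$. If this is not the case, there exists $i_0 \in (-\tilde n, n)$ such that $r_{i_0}^n < r_{i_0}^{n+1}$. We define $\tilde{\mathbf{r}} = \{\tilde r_i\}_{i\in\mathbb{Z}}$ with $\tilde r_i := 1$ for $i \notin (-\tilde n, n)$ and $\tilde r_i := r_i^{n+1}$ for $i \in (-\tilde n, n)$. It is obvious that for any $i \in (-\tilde n, n)$, $\mathbb{P}(S_{\tau(\tilde{\mathbf{r}})} = i) \le \mathbb{P}(S_{\tau(\mathbf{r}^{n+1})} = i) = \mu(\{i\})$. Therefore, $\tilde{\mathbf{r}} \in R_{\mu_n}$, but this contradicts the maximality of $\mathbf{r}^n$ in $R_{\mu_n}$.

Because $r_i^n$ is decreasing in $n$ for each $i$, we can define $r_i := \lim_{n\to\infty} r_i^n$ and denote $\mathbf{r} = \{r_i\}_{i\in\mathbb{Z}}$. We now construct a version of the $\tau(\mathbf{r}^n)$. Construct 0-1 random variables $\{\varepsilon^n_{t,i}\}_{t\ge 0, n\ge 1, i\in\mathbb{Z}}$ that are independent of each other and of $\{S_t\}_{t\ge 0}$ and satisfy
\[
\mathbb{P}(\varepsilon^1_{t,i} = 1) = r_i^1, \qquad \mathbb{P}(\varepsilon^n_{t,i} = 1) = \frac{r_i^n}{r_i^{n-1}}\,\mathbf{1}_{r_i^{n-1} > 0}, \quad n \ge 2.
\]
Define $\xi^n_{t,i} = 1 - \prod_{k=1}^{n} \varepsilon^k_{t,i}$, $n \ge 1$. Then, it is straightforward to verify that $\mathbb{P}(\xi^n_{t,i} = 0) = r_i^n$. Because for each $n$, the $\xi^n_{t,i}$, $t \ge 0$, $i \in \mathbb{Z}$, are independent of each other and of $\{S_t\}_{t\ge 0}$, we can define $\tau(\mathbf{r}^n) = \inf\{t \ge 0 \mid \xi^n_{t,S_t} = 0\}$. By definition, $\tau(\mathbf{r}^n)$ is increasing in $n$ and
\[
\lim_{n\to\infty} \tau(\mathbf{r}^n) = \inf\{t \ge 0 \mid \xi_{t,S_t} = 0\},
\]
where $\xi_{t,i} := \lim_{n\to\infty} \xi^n_{t,i}$. Note that the $\xi_{t,i}$, $t \ge 0$, $i \in \mathbb{Z}$, are well-defined 0-1 random variables, independent of each other, and independent of $\{S_t\}_{t\ge 0}$. Furthermore,
\[
\mathbb{P}(\xi_{t,i} = 0) = \lim_{n\to\infty} \mathbb{P}(\xi^n_{t,i} = 0) = \lim_{n\to\infty} r_i^n = r_i,
\]
so $\inf\{t \ge 0 \mid \xi_{t,S_t} = 0\}$ is a version of $\tau(\mathbf{r})$. With this version, we have $\tau(\mathbf{r}) = \lim_{n\to\infty} \tau(\mathbf{r}^n)$.

We now show that $\tau(\mathbf{r}) < \infty$ almost surely. To this end, let $F_t := |S_t| - \sum_{j=0}^{t-1} \mathbf{1}_{S_j = 0}$, $t \in \mathbb{N}$. One can check that $\{F_t\}_{t\ge 0}$ is a martingale. By the optional sampling theorem, using $\tau(\mathbf{r}^n) \le H_{[-\tilde n, n]}$, we conclude $\mathbb{E}[F_{\tau(\mathbf{r}^n)}] = 0$. This yields the first equality in the following computation:
\[
\mathbb{E}\Big[\sum_{j=0}^{\tau(\mathbf{r}^n)-1} \mathbf{1}_{S_j = 0}\Big] = \mathbb{E}\left[|S_{\tau(\mathbf{r}^n)}|\right] = \sum_{i\in\mathbb{Z}} |i|\,\mu_n(\{i\}) \le \sum_{i\in\mathbb{Z}} |i|\,\mu(\{i\}) < \infty. \tag{13}
\]
By the monotone convergence theorem, the left-hand side of (13) converges to $\mathbb{E}\big[\sum_{j=0}^{\tau(\mathbf{r})-1} \mathbf{1}_{S_j = 0}\big]$, and it follows that $\tau(\mathbf{r}) < \infty$ almost surely. Otherwise, this expectation would be infinite by the recurrence of the symmetric random walk. Therefore, $\tau(\mathbf{r}^n)$ converges increasingly to $\tau(\mathbf{r})$ and $S_{\tau(\mathbf{r}^n)}$ converges to $S_{\tau(\mathbf{r})}$, almost surely, as $n \to \infty$.

Note that for each $i \ne \inf\{j \in \mathbb{Z} \mid \mu(\{j\}) > 0\}$, we have $\mathbb{P}(S_{\tau(\mathbf{r}^n)} = i) = \mu_n(\{i\}) = \mu(\{i\})$ for sufficiently large $n$, so $\mathbb{P}(S_{\tau(\mathbf{r})} = i) = \lim_{n\to\infty} \mathbb{P}(S_{\tau(\mathbf{r}^n)} = i) = \mu(\{i\})$. Hence $S_{\tau(\mathbf{r})}$ follows distribution $\mu$. Furthermore,
\[
\sum_{i\in\mathbb{Z}} |i|\,\mu(\{i\}) = \mathbb{E}\left[|S_{\tau(\mathbf{r})}|\right] \le \liminf_{n\to\infty} \mathbb{E}\left[|S_{\tau(\mathbf{r}^n)}|\right] \le \limsup_{n\to\infty} \mathbb{E}\left[|S_{\tau(\mathbf{r}^n)}|\right] \le \sum_{i\in\mathbb{Z}} |i|\,\mu(\{i\}) < \infty,
\]
where the first inequality is due to Fatou's lemma and the third inequality is because of (13). Therefore, we have $\lim_{n\to\infty} \mathbb{E}[|S_{\tau(\mathbf{r}^n)}|] = \mathbb{E}[|S_{\tau(\mathbf{r})}|] < \infty$. It follows by Scheffé's lemma (see e.g., Williams, 1991) that $S_{\tau(\mathbf{r}^n)} \to S_{\tau(\mathbf{r})}$ in $L^1$, and hence $\mathbb{E}[S_{\tau(\mathbf{r})} \mid \mathcal{F}_{\tau(\mathbf{r}^n)}] = S_{\tau(\mathbf{r}^n)}$ almost surely. By the martingale convergence theorem and the tower property of conditional expectation,
\[
\begin{aligned}
\mathbb{E}[S_{\tau(\mathbf{r})} \mid \mathcal{F}_{\tau(\mathbf{r})\wedge t}] &= \lim_{n\to\infty} \mathbb{E}[S_{\tau(\mathbf{r})} \mid \mathcal{F}_{\tau(\mathbf{r}^n)\wedge t}] = \lim_{n\to\infty} \mathbb{E}\left[\mathbb{E}[S_{\tau(\mathbf{r})} \mid \mathcal{F}_{\tau(\mathbf{r}^n)}] \mid \mathcal{F}_{\tau(\mathbf{r}^n)\wedge t}\right] \\
&= \lim_{n\to\infty} \mathbb{E}[S_{\tau(\mathbf{r}^n)} \mid \mathcal{F}_{\tau(\mathbf{r}^n)\wedge t}] = \lim_{n\to\infty} S_{\tau(\mathbf{r}^n)\wedge t} = S_{\tau(\mathbf{r})\wedge t}.
\end{aligned}
\]
Therefore, we conclude that $\{S_{\tau(\mathbf{r})\wedge t}\}_{t\ge 0}$ is uniformly integrable.

Finally, by construction, it is easy to see that $r_i = 1$ for any $i$ that lies beyond the range of the support of $\mu$ or at a boundary of the support. Therefore, $\{S_{\tau(\mathbf{r})\wedge t}\}_{t\ge 0}$ will never visit states outside any interval that contains the support of $\mu$.

B Construction of Random Coins

In this section, we work under the assumptions of Theorem 2 and provide an algorithmic method for computing the $\{r_i\}_{i\in\mathbb{Z}}$ obtained therein. We let $\tau = \tau(\mathbf{r})$ and let $g_i$ denote the expected number of visits of $\{S_t\}_{t\ge 0}$ to state $i$ strictly before $\tau$, i.e.,
\[
g_i := \sum_{t=0}^{\infty} \mathbb{P}(\tau > t, S_t = i). \tag{14}
\]
An argument similar to the one in (13) shows that $g_i < \infty$.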
We may then compute, writing $p_i := \mu(\{i\}) = \mathbb{P}(S_\tau = i) = \sum_{t=0}^{\infty} \mathbb{P}(\tau = t, S_t = i)$, as follows:
\[
\begin{aligned}
g_i + p_i &= \sum_{t=0}^{\infty} \mathbb{P}(\tau \ge t, S_t = i) = \mathbf{1}_{\{i = S_0\}} + \sum_{t=0}^{\infty} \mathbb{P}(\tau \ge t+1, S_{t+1} = i) \\
&= \mathbf{1}_{\{i = S_0\}} + \sum_{t=0}^{\infty} \mathbb{P}(\tau > t, S_{t+1} = i, S_t = i-1) + \sum_{t=0}^{\infty} \mathbb{P}(\tau > t, S_{t+1} = i, S_t = i+1) \\
&= \mathbf{1}_{\{i = S_0\}} + \sum_{t=0}^{\infty} \mathbb{P}(\tau > t, S_t = i-1)\,\mathbb{P}(S_{t+1} = i \mid S_t = i-1, \tau > t) \\
&\qquad + \sum_{t=0}^{\infty} \mathbb{P}(\tau > t, S_t = i+1)\,\mathbb{P}(S_{t+1} = i \mid S_t = i+1, \tau > t) \\
&= \mathbf{1}_{\{i = S_0\}} + \sum_{t=0}^{\infty} \frac{1}{2}\mathbb{P}(\tau > t, S_t = i-1) + \sum_{t=0}^{\infty} \frac{1}{2}\mathbb{P}(\tau > t, S_t = i+1) \\
&= \mathbf{1}_{\{i = S_0\}} + \frac{1}{2} g_{i-1} + \frac{1}{2} g_{i+1},
\end{aligned}
\]
where the fifth equality is the case because $\{S_t\}_{t\ge 0}$ is Markovian. In other words, we have the following identity:
\[
g_i + p_i - \mathbf{1}_{\{i = S_0\}} = \frac{1}{2} g_{i-1} + \frac{1}{2} g_{i+1}, \quad i \in \mathbb{Z}. \tag{15}
\]
Note that given a probability vector $\{p_i\}_{i\in\mathbb{Z}}$, (15) may have an infinite number of solutions unless we specify a boundary condition. In general, we have $g_i \to 0$ as $i \to \pm\infty$. Further, if the support of $\mu$ is bounded from above or below, then $g_i$ is zero for $i$ above or below the bound of the support, respectively. Then (15) admits a unique solution and the solution is finite. This allows us to recover $g_i$ from $p_i$.

Finally, to construct $\mathbf{r}$, we need to connect $\mathbf{r}$ with $g_i$ and $p_i$. Note that
\[
\begin{aligned}
p_i = \mathbb{P}(S_{\tau(\mathbf{r})} = i) &= \sum_{t=0}^{\infty} \mathbb{P}(\tau(\mathbf{r}) = t, S_t = i) \\
&= \sum_{t=0}^{\infty} \mathbb{P}(\xi_{u,S_u} = 1,\ u = 0, 1, \dots, t-1,\ \xi_{t,S_t} = 0,\ S_t = i) \\
&= \sum_{t=0}^{\infty} \mathbb{P}(\xi_{u,S_u} = 1,\ u = 0, 1, \dots, t-1,\ S_t = i)\,\mathbb{P}(\xi_{t,S_t} = 0 \mid \xi_{u,S_u} = 1,\ u = 0, 1, \dots, t-1,\ S_t = i) \\
&= \sum_{t=0}^{\infty} \mathbb{P}(\xi_{u,S_u} = 1,\ u = 0, 1, \dots, t-1,\ S_t = i)\, r_i \\
&= r_i \sum_{t=0}^{\infty} \mathbb{P}(\tau(\mathbf{r}) \ge t, S_t = i) = r_i (g_i + p_i).
\end{aligned}
\]
Therefore, if $p_i + g_i > 0$, we must have $r_i = \frac{p_i}{p_i + g_i}$. If $p_i + g_i = 0$, then $r_i$ can take any value in $[0, 1]$. Because we consider $\mathbf{r}$ to be the maximal element of $R_\mu$, we set $r_i = 1$. To summarize, we have
\[
r_i = \frac{p_i}{p_i + g_i}\,\mathbf{1}_{\{p_i + g_i > 0\}} + \mathbf{1}_{\{p_i + g_i = 0\}}, \quad i \in \mathbb{Z}. \tag{16}
\]

References

Agranov, M. and Ortoleva, P. (2013). Stochastic choice and hedging, Technical report, Mimeo California Institute of Technology.
Authors (2014). Stopping strategies of behavioral gamblers in infinite horizon. Working Paper.
Barberis, N. (2012). A model of casino gambling, Management Science 58(1): 35–51.
Blavatskyy, P. R. (2006). Violations of betweenness or random errors?, Economics Letters 91(1): 34–38.
Bordley, R. and Hazen, G. B. (1991). SSB and weighted linear utility as expected utility with suspicion, Management Science 37(4): 396–408.
Camerer, C. F. and Ho, T.-H. (1994). Violations of the betweenness axiom and nonlinearity in probability, Journal of Risk and Uncertainty 8(2): 167–196.
Chew, S.-H. (1983). A generalization of the quasilinear mean with applications to the measurement of income inequality and decision theory resolving the Allais paradox, Econometrica 51(4): 1065–1092.
Chew, S. H. (1989). Axiomatic utility theories with the betweenness property, Annals of Operations Research 19(1): 273–298.
Chew, S. H., Epstein, L. G. and Segal, U. (1991). Mixture symmetry and quadratic utility, Econometrica pp. 139–163.
Chew, S. H. and MacCrimmon, K. R. (1979). Alpha-nu choice theory: a generalization of expected utility theory. Working Paper.
Conlisk, J. (1993). The utility of gambling, Journal of Risk and Uncertainty 6(3): 255–275.
Dekel, E. (1986). An axiomatic characterization of preferences under uncertainty: Weakening the independence axiom, Journal of Economic Theory 40(2): 304–318.
Dwenger, N., Kübler, D. and Weizsacker, G. (2013). Flipping a coin: Theory and evidence. Working Paper. URL: http://ssrn.com/abstract=2353282
Fishburn, P. (1981). An axiomatic characterization of skew-symmetric bilinear functionals, with applications to utility theory, Economics Letters 8(4): 311–313.
Fishburn, P. C. (1983). Transitive measurable utility, Journal of Economic Theory 31(2): 293–317.
Gul, F. (1991). A theory of disappointment aversion, Econometrica 59(3): 667–686.
Henderson, V., Hobson, D. and Tse, A. (2014). Randomized strategies and prospect theory in a dynamic context. SSRN: 2531457.
Hu, S. (2014). Optimal Exit Strategies of Behavioral Gamblers, PhD thesis, The Chinese University of Hong Kong.
Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under risk, Econometrica 47(2): 263–291.
Machina, M. J. (1982). “Expected utility” analysis without the independence axiom, Econometrica 50(2): 277–322.
Machina, M. J. (1985). Stochastic choice functions generated from deterministic preferences over lotteries, Economic Journal 95(379): 575–594.
Prelec, D. (1998). The probability weighting function, Econometrica 66(3): 497–527.
Quiggin, J. (1982). A theory of anticipated utility, Journal of Economic Behavior and Organization 3: 323–343.
Starmer, C. (2000). Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk, Journal of Economic Literature 38(2): 332–382.
Thaler, R. H. and Johnson, E. J. (1990). Gambling with the house money and trying to break even: The effects of prior outcomes on risky choice, Management Science 36(6): 643–660.
Tversky, A. and Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty, Journal of Risk and Uncertainty 5(4): 297–323.
Williams, D. (1991). Probability with Martingales, Cambridge University Press.
Xu, Z. Q. and Zhou, X. Y. (2012). Optimal stopping under probability distortion, Annals of Applied Probability 23(1): 251–282.
Yaari, M. E. (1987). The dual theory of choice under risk, Econometrica 55(1): 95–115.