Dynamic Optimization Under Uncertainty

Time plays an important role in investment decisions. One aspect of this is the opportunity to make the same decision later; the option of postponement should therefore be included in today’s menu of choices.
$\newcommand\bbE{\mathbb{E}}$

Two-Period Example

Let $I$ denote the sunk cost of an investment that then produces one unit of a product per period forever, and let $r$ be the interest rate. Suppose the price of the product in period $0$ is $P_0$. From period $1$ onward, it will be $(1+u)P_0$ with probability $q$ and $(1-d)P_0$ with probability $1-q$.

If the investment opportunity is available only in period $0$, the expected present value of the revenue is

\begin{align}
V_0=P_0+\sum_{t=1}^{\infty}{\bbE[P_t]\over(1+r)^t}=P_0+{\left[q(1+u)+(1-q)(1-d)\right]P_0\over r},
\end{align}

and the net payoff of the project is $\Omega_0=\max[V_0-I,0]$.

Now consider the actual situation, where the investment opportunity remains available in future periods. If the firm waits, then in period $1$ it invests only if doing so is profitable, receiving $\Omega_1=\max[V_1-I,0]$, where $V_1=P_1(1+r)/r$ is the value of a perpetuity starting at price $P_1$. The outcome of future optimal decisions, which is called the expected continuation value, can be written as

\begin{align}
{1\over 1+r}\,\bbE[\Omega_1]={1\over 1+r}\left\lbrace q\max\left[{(1+u)P_0(1+r)\over r}-I,0\right]+(1-q)\max\left[{(1-d)P_0(1+r)\over r}-I,0\right]\right\rbrace.
\end{align}

Therefore, the net present value of the whole investment opportunity optimally deployed, which we denote by $F_0$, is

\begin{align}
F_0=\max\left[V_0-I,\ {1\over 1+r}\,\bbE[\Omega_1]\right].
\end{align}
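As a concrete check of this two-period logic, the quantities $V_0$, $\Omega_0$, and $F_0$ can be computed numerically. All parameter values below are assumptions chosen for illustration:

```python
# Two-period investment example; the numbers (I, r, P0, u, d, q) are
# illustrative assumptions, not values fixed by the text.
I, r, P0, u, d, q = 1600.0, 0.10, 200.0, 0.5, 0.5, 0.5

# Expected present value of revenue if forced to invest in period 0:
# V0 = P0 + E[P1] * sum_{t>=1} (1+r)^(-t) = P0 + E[P1]/r
EP1 = q * (1 + u) * P0 + (1 - q) * (1 - d) * P0
V0 = P0 + EP1 / r
Omega0 = max(V0 - I, 0.0)              # net payoff of investing now

# Expected continuation value: wait, then invest in period 1 only if
# profitable; investing at price P1 yields the perpetuity P1*(1+r)/r.
V_up = (1 + u) * P0 * (1 + r) / r
V_down = (1 - d) * P0 * (1 + r) / r
cont = (q * max(V_up - I, 0.0) + (1 - q) * max(V_down - I, 0.0)) / (1 + r)

F0 = max(Omega0, cont)                 # value of the whole opportunity
print(V0, Omega0, cont, F0)            # 2200.0 600.0 772.72... 772.72...
```

With these numbers the value of waiting exceeds $V_0-I=600$, so postponing the investment is optimal even though investing today has a positive net present value.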

Many-Period Problem on a Markov Chain

For a problem with many periods, let $u_t$ be the possible control variables at time $t$, $\pi_t(x_t,u_t)$ be the profit flow at time $t$ after taking control $u_t$ at position $x_t$, $\rho$ be the discount rate, and $F_t(x_t)$ be the expected net present value when the firm makes all decisions optimally from this point onwards. For any time $t$, $F_{t+1}(x_{t+1})$ is random, because $x_{t+1}$ is generated by a Markov chain. Thus,

\begin{align}
F_t(x_t)=\max_{u_t}\left\lbrace\pi_t(x_t,u_t)+{1\over 1+\rho}\,\bbE\left[F_{t+1}(x_{t+1})\mid x_t,u_t\right]\right\rbrace.
\end{align}

The idea behind this decomposition is Bellman’s Principle of Optimality. To reiterate, the first term on the right-hand side is the immediate profit, the second term constitutes the continuation value, and the optimal action this period is the one that maximizes the sum of these two components. If the many-period problem has a fixed finite time horizon $T$, we can start at the end and work backward. If there is no fixed finite time horizon for the decision problem, there is no known final value function from which we can work backward.
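For a fixed finite horizon, backward induction can be sketched directly. The specification below (two states, two controls, the profit matrix $\pi$, the transition array, and $T=5$) is an assumed toy example, not part of the text:

```python
import numpy as np

rho = 0.10
beta = 1.0 / (1.0 + rho)                  # one-period discount factor
T = 5                                     # assumed finite horizon
pi = np.array([[1.0, 0.5],                # pi[x, u]: profit in state x under control u
               [0.2, 0.8]])
P = np.array([[[0.7, 0.3], [0.4, 0.6]],   # P[u, x, x']: transition probabilities
              [[0.5, 0.5], [0.2, 0.8]]])

F = np.zeros(2)                           # terminal condition: F_{T+1} = 0
policy = None
for t in range(T, -1, -1):
    # Q[x, u] = immediate profit + discounted expected continuation value
    Q = pi + beta * np.einsum('uxy,y->xu', P, F)
    policy = Q.argmax(axis=1)             # optimal control at time t
    F = Q.max(axis=1)                     # F_t(x)
print(F, policy)                          # period-0 values and optimal controls
```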

The crucial simplification that an infinite horizon brings to the above equation is independence from time $t$ as such, i.e. $t$ by itself has no effect. This works provided the flow profit function $\pi$, the transition probability distribution function $\Phi$, and the discount rate $\rho$ are themselves all independent of the actual label of $t$. In this setting, the problem one period hence looks exactly like the problem today, except for a new starting state. Therefore the value function is common to all periods, although it will be evaluated at different points $x_t$. Thus, the Bellman equation for any $t$ becomes

\begin{align}
F(x_t)=\max_{u_t}\left\lbrace\pi(x_t,u_t)+{1\over 1+\rho}\,\bbE\left[F(x_{t+1})\mid x_t,u_t\right]\right\rbrace.
\end{align}

The recursive Bellman equation can be thought of as a whole list of equations, one for each possible value of $x$, with a whole list of unknowns, namely all the values $F(x)$. If $x$ took on only a finite number of discrete values $x_i$, this would be a system of equations with exactly as many equations as the number of unknowns $F(x_i)$. However, the equation is not linear as the optimal choice of $u$ depends on all the values $F(x_{t+1})$ that appear in the expectation. In general, we do not know whether nonlinear functional equations have solutions, let alone unique ones. Fortunately, the recursive Bellman equation has a special structure that allows existence and uniqueness of a solution function $F(x)$.

The proof of this result is essentially a practical solution method, often called value function iteration. We start with any guess for the true value function, say $F^{(1)}(x)$. Use it on the right-hand side of the Bellman equation and find the corresponding optimal choice rule $u^{(1)}$, which can be expressed as a function of $x$ alone. Substituting it back, the RHS becomes a new function of $x$; call it $F^{(2)}$. The successive guesses converge to the true function regardless of the initial guess. The key lies in the factor $1/(1+\rho)$, which is strictly less than $1$ and scales down any errors in the guess.
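This successive-approximation scheme can be sketched on an assumed toy specification (states, profits, and transitions are all illustrative; the stopping tolerance and iteration cap are arbitrary):

```python
import numpy as np

rho = 0.10
beta = 1.0 / (1.0 + rho)                  # contraction factor 1/(1+rho) < 1
pi = np.array([[1.0, 0.5],                # pi[x, u], assumed values
               [0.2, 0.8]])
P = np.array([[[0.7, 0.3], [0.4, 0.6]],   # P[u, x, x'], assumed values
              [[0.5, 0.5], [0.2, 0.8]]])

F = np.zeros(2)                           # arbitrary initial guess F^(1)
for n in range(1000):
    Q = pi + beta * np.einsum('uxy,y->xu', P, F)
    F_new = Q.max(axis=1)                 # next guess F^(n+1)
    if np.max(np.abs(F_new - F)) < 1e-12:
        break                             # errors shrink by factor beta each pass
    F = F_new
print(F_new)                              # (approximate) fixed point F(x)
```

At the fixed point, plugging $F$ back into the right-hand side reproduces $F$ itself, which is exactly the Bellman equation.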

Optimal Stopping

Here the choice in any period is binary. One alternative corresponds to stopping the process to take the termination payoff, and the other entails continuation for one period, when another similar binary choice will be available. In the investment model, stopping corresponds to making the investment, and continuation corresponds to waiting.

Let $\pi(x)$ denote the flow profit, and $\Omega(x)$ the termination payoff. Then the Bellman equation becomes

\begin{align}
F(x)=\max\left\lbrace\Omega(x),\ \pi(x)+{1\over 1+\rho}\,\bbE\left[F(x')\mid x\right]\right\rbrace,
\end{align}

where $x'$ is next period’s state.

For some range of values of $x$, the maximum on the right-hand side will be achieved by termination, and for other values of $x$ it will be achieved through continuation. In general this division could be arbitrary; intervals where termination is optimal could alternate with ones where continuation is optimal. However, most economic applications have more structure: there will be a single cutoff $x^\ast$, with termination optimal on one side and continuation on the other.
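The cutoff structure can be seen numerically by applying value iteration to a discretized stopping problem. Everything in this sketch is an assumed example: $x$ follows a driftless random walk on a grid with reflecting endpoints, the flow profit is zero, and the termination payoff is $\Omega(x)=x-10$:

```python
import numpy as np

rho = 0.05
beta = 1.0 / (1.0 + rho)
x = np.arange(0.0, 21.0)              # grid of states 0, 1, ..., 20
Omega = x - 10.0                      # termination payoff (assumed)
F = np.maximum(Omega, 0.0)            # initial guess

for _ in range(2000):
    EF = np.empty_like(F)
    EF[1:-1] = 0.5 * (F[:-2] + F[2:])          # symmetric random walk
    EF[0] = 0.5 * (F[0] + F[1])                # reflecting boundaries
    EF[-1] = 0.5 * (F[-1] + F[-2])
    cont = beta * EF                           # zero flow profit while waiting
    F = np.maximum(Omega, cont)                # stop or continue

stop = Omega >= cont - 1e-9           # states where termination is optimal
print(x[stop][0])                     # the single cutoff x*
```

Termination turns out to be optimal on an upper interval of the grid; below the cutoff, the option value of waiting makes continuation worth more than immediate stopping even where $\Omega(x)>0$.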

Continuous Time

Suppose each time period is of length $\Delta t$. We write $\pi(x,u,t)$ for the rate of the profit flow, so that the actual profit over the time period of length $\Delta t$ is $\pi(x,u,t)\Delta t$. Similarly, let $\rho$ be the discount rate per unit time, so the total discounting over an interval of length $\Delta t$ is by the factor $1/(1+\rho\Delta t)$. The Bellman equation now becomes

\begin{align}
F(x,t)=\max_u\left\lbrace\pi(x,u,t)\Delta t+{1\over 1+\rho\Delta t}\,\bbE\left[F(x',t+\Delta t)\mid x,u\right]\right\rbrace,
\end{align}

where $x'$ is the state at time $t+\Delta t$.

Multiply both sides by $(1+\rho\Delta t)$, subtract $F(x,t)$, and get

\begin{align}
\rho\Delta t\,F(x,t)=\max_u\left\lbrace\pi(x,u,t)\Delta t(1+\rho\Delta t)+\bbE\left[\Delta F\mid x,u\right]\right\rbrace,
\end{align}

where $\Delta F=F(x',t+\Delta t)-F(x,t)$ and $x'$ is the state at time $t+\Delta t$.

Divide by $\Delta t$ and let it go to zero. We get

\begin{align}
\rho F(x,t)=\max_u\left\lbrace\pi(x,u,t)+\lim_{\Delta t\to 0}{1\over\Delta t}\,\bbE\left[\Delta F\mid x,u\right]\right\rbrace.
\end{align}

On the LHS we have the normal return per unit time that a decision maker using $\rho$ as the discount rate would require for holding this asset. On the RHS, the first term is the immediate payout or dividend from the asset, while the second term is its expected rate of capital gain (loss if negative). Thus the RHS is the expected total return per unit time from holding the asset. The equality becomes a no-arbitrage or equilibrium condition, expressing the investor’s willingness to hold the asset. The maximization with respect to $u$ means that the current operation of the asset is being managed optimally, bearing in mind not only the immediate payout but also the consequences for future values.

The limit on the RHS depends on the expectation over the random $x'$ a time $\Delta t$ later. There are two classes of stochastic processes in continuous time that allow such limits in a form conducive to further analysis and solution for the function $F(x,t)$ in the continuation region: the Ito and Poisson processes.

In discrete time, we stipulated that the action $u$, taken in the current period $t$, could depend on knowledge of the current state $x_t$, but not on the random future state $x_{t+1}$. In continuous time the two coalesce. We have to be careful not to allow choices to depend on information about the future, even about “the next instant.” Otherwise we would be acting with the benefit of hindsight, and could make infinite profits. Technically this can be avoided by requiring the uncertainty to be “continuous from the right” in time while the strategies are “continuous from the left.” Then any jumps in the stochastic processes occur at an instant, while the actions cannot change until just after the instant.

Ito Processes

An Ito process evolves according to

\begin{align}
dx=a(x,u,t)\,dt+b(x,u,t)\,dW_t,
\end{align}

where $dW_t$ is the increment of a standard Wiener process.
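A quick Monte Carlo sanity check of Ito’s Lemma, which underlies the next equation: for constant $a$ and $b$ and a smooth test function such as $F(x)=x^2$, the expected change of $F$ per unit time is $aF'(x)+{1\over 2}b^2F''(x)$. All numerical values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded for reproducibility
a, b, x0, dt, n = 0.1, 0.2, 1.0, 1e-3, 1_000_000

dW = np.sqrt(dt) * rng.standard_normal(n)   # Wiener increments
x1 = x0 + a * dt + b * dW                   # one Euler step of dx = a dt + b dW

# Monte Carlo estimate of E[F(x1) - F(x0)] / dt for F(x) = x^2 ...
drift_mc = np.mean(x1**2 - x0**2) / dt
# ... versus the Ito prediction a F'(x0) + (1/2) b^2 F''(x0)
drift_ito = a * 2 * x0 + 0.5 * b**2 * 2
print(drift_mc, drift_ito)                  # both close to 0.24
```

The $b^2$ contribution is exactly the term a naive first-order expansion of $F$ would miss.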

Applying Ito’s Lemma to the value function $F$, the limit $\lim_{\Delta t\to 0}\bbE[\Delta F\mid x,u]/\Delta t$ equals $F_t+a\,F_x+{1\over 2}b^2F_{xx}$, and the Bellman equation becomes

\begin{align}\label{eq:ito}
\rho F(x,t)=\max_u\left\lbrace\pi(x,u,t)+F_t(x,t)+a(x,u,t)F_x(x,t)+{1\over 2}b^2(x,u,t)F_{xx}(x,t)\right\rbrace
\end{align}

Substituting the expression for the optimal $u$ back into the RHS of equation $\eqref{eq:ito}$, we get a partial differential equation of the second order, with $F$ as the dependent variable and $x$ and $t$ as the independent variables. In general this equation is very complicated.

If the time horizon is infinite and the functions $\pi$, $a$, and $b$ do not depend explicitly on time, then neither does the value function depend on time, and equation $\eqref{eq:ito}$ becomes an ODE with $x$ as its only independent variable:

\begin{align}
\rho F(x)=\max_u\left\lbrace\pi(x,u)+a(x,u)F'(x)+{1\over 2}b^2(x,u)F''(x)\right\rbrace
\end{align}

Optimal Stopping and Smooth Pasting

Here we consider a binary decision problem. At every instant, the firm can either continue its current situation to get a profit flow, or stop and get a termination payoff. Both the profit flow $\pi(x,t)$ and the termination payoff $\Omega(x,t)$ can depend on a state variable $x$ and on time $t$, where $x$ follows an Ito process

\begin{align}
dx=a(x,t)\,dt+b(x,t)\,dW_t.
\end{align}

The most obvious example is of a firm deciding whether to cease operation and sell its equipment for its scrap value. Investment decisions can also be put in this form: continuation means waiting, and the flow payoff is zero; stopping means investing, and the termination payoff is just the expected present value of future profits from the project minus the cost of investment.

Intuition suggests that for each $t$ there will be a critical value $x^\ast(t)$, with continuation optimal if $x_t$ lies on one side of $x^\ast(t)$, and stopping optimal on the other side. We can regard the critical values $x^\ast(t)$ for various $t$ as forming a curve that divides the $(x,t)$ space into two regions, with continuation optimal above the curve and termination optimal below it.

The Bellman equation is

\begin{align}
F(x,t)=\max\left\lbrace\Omega(x,t),\ \pi(x,t)\Delta t+{1\over 1+\rho\Delta t}\,\bbE\left[F(x',t+\Delta t)\mid x\right]\right\rbrace,
\end{align}

where $x'$ is the state at time $t+\Delta t$.

In the continuation region, the second term on the RHS is the larger of the two. Expanding it by Ito’s Lemma and letting $\Delta t\to 0$, we get a PDE satisfied by the value function:

\begin{align}
\rho F(x,t)=\pi(x,t)+F_t(x,t)+a(x,t)F_x(x,t)+{1\over 2}b^2(x,t)F_{xx}(x,t).
\end{align}

This holds for $x>x^\ast(t)$, and we need to look for boundary conditions that hold along $x=x^\ast(t)$. From the Bellman equation, we know that in the stopping region we have $F(x,t)=\Omega(x,t)$, so by continuity we can impose the condition

\begin{align}
F(x^\ast(t),t)=\Omega(x^\ast(t),t)
\end{align}

for all $t$. This is often called the “value-matching condition” because it matches the values of the unknown function $F(x,t)$ to those of the known termination payoff function $\Omega(x,t)$.

However, the boundary itself, i.e. the region in $(x,t)$ space over which the PDE is valid, is an unknown. The boundary of the region, namely the curve $x^\ast(t)$, is called a “free boundary,” and the whole problem of solving the equation and determining its region is called a free-boundary problem.

We need a second condition if we are to find $x^\ast(t)$ jointly with the function $F(x,t)$. The extra condition comes from economic considerations. We must have

\begin{align}
F_x(x^\ast(t),t)=\Omega_x(x^\ast(t),t)
\end{align}

for all $t$. This is called the “smooth-pasting condition” because it requires not just the values but also the derivatives or slopes of the two functions to match at the boundary.
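As a worked instance, consider the classic investment problem under assumed geometric dynamics $dx=\alpha x\,dt+\sigma x\,dW$ with no flow profit and termination payoff $\Omega(x)=x-I$ (none of these specifics are fixed by the text). In the continuation region the ODE $\rho F=\alpha xF'+{1\over 2}\sigma^2x^2F''$ has the solution $F(x)=Ax^\beta$, where $\beta>1$ is the positive root of ${1\over 2}\sigma^2\beta(\beta-1)+\alpha\beta-\rho=0$; value matching and smooth pasting then pin down $A$ and $x^\ast$:

```python
import numpy as np

rho, alpha, sigma, I = 0.05, 0.03, 0.2, 1.0   # illustrative parameters (rho > alpha)
s2 = sigma**2

# Positive root of (1/2) s2 * beta * (beta - 1) + alpha * beta - rho = 0
beta = 0.5 - alpha / s2 + np.sqrt((alpha / s2 - 0.5)**2 + 2 * rho / s2)

x_star = beta / (beta - 1.0) * I      # from smooth pasting: F'(x*) = 1
A = (x_star - I) / x_star**beta       # from value matching: F(x*) = x* - I

F = lambda x: A * x**beta             # value in the continuation region
dF = lambda x: A * beta * x**(beta - 1)
print(beta, x_star)
print(F(x_star), x_star - I)          # value matching holds
print(dF(x_star))                     # smooth pasting: derivative equals 1
```

Note that $x^\ast=\beta I/(\beta-1)>I$: the optimal trigger strictly exceeds the break-even point of the naive NPV rule, reflecting the option value of waiting.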
