Dynamic Investment

Bellman Equations

The Hamilton–Jacobi–Bellman (HJB) equation is a partial differential equation which is central to optimal control theory. The solution of the HJB equation is the value function which gives the minimum cost for a given dynamical system with an associated cost function. The equation is a result of the theory of dynamic programming which was pioneered in the 1950s by Richard Bellman and coworkers. The corresponding discrete-time equation is usually referred to as the Bellman equation.

To understand the Bellman equation, several underlying concepts must be understood. First, any optimization problem has some objective: minimizing travel time, minimizing cost, maximizing profits, maximizing utility, et cetera. The mathematical function that describes this objective is called the objective function.

Dynamic programming breaks a multi-period planning problem into simpler steps at different points in time. Therefore, it requires keeping track of how the decision situation is evolving over time. The information about the current situation which is needed to make a correct decision is called the “state”. For example, to decide how much to consume and spend at each point in time, people would need to know (among other things) their initial wealth. Therefore, wealth $W$ would be one of their state variables, but there would probably be others.

The variables chosen at any given point in time are often called the control variables. For example, given their current wealth, people might decide how much to consume now. Choosing the control variables now may be equivalent to choosing the next state; more generally, the next state is affected by other factors in addition to the current control. For example, in the simplest case, today’s wealth (the state) and consumption (the control) might exactly determine tomorrow’s wealth (the new state), though typically other factors will affect tomorrow’s wealth too.

The dynamic programming approach describes the optimal plan by finding a rule that tells what the controls should be, given any possible value of the state. For example, if consumption $c$ depends only on wealth $W$, we would seek a rule $c(W)$ that gives consumption as a function of wealth. Such a rule, determining the controls as a function of the states, is called a policy function.

Finally, by definition, the optimal decision rule is the one that achieves the best possible value of the objective. For example, if someone chooses consumption, given wealth, in order to maximize happiness (assuming happiness $H$ can be represented by a mathematical function, such as a utility function), then each level of wealth will be associated with some highest possible level of happiness, $H(W)$. The best possible value of the objective, written as a function of the state, is called the value function.
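For instance, in the classic “cake-eating” special case (a standard textbook example, assumed here only for illustration), where period happiness is $\ln c_t$, future happiness is discounted by a factor $0<\beta<1$, and unspent wealth carries over as $W_{t+1}=W_t-c_t$, the optimal policy function and value function turn out to be

\begin{equation}
c(W)=(1-\beta)W,\qquad H(W)={\ln W\over 1-\beta}+\text{constant}
\end{equation}

so the household always consumes the same fixed fraction of its current wealth.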

Richard Bellman showed that a dynamic optimization problem in discrete time can be stated in a recursive, step-by-step form known as backward induction by writing down the relationship between the value function in one period and the value function in the next period. The relationship between these two value functions is called the “Bellman equation”. In this approach, the optimal policy in the last time period is specified in advance as a function of the state variable’s value at that time, and the resulting optimal value of the objective function is thus expressed in terms of that value of the state variable. The next-to-last period’s optimization then involves maximizing the sum of that period’s period-specific objective function and the optimal value of the future objective function, giving that period’s optimal policy contingent upon the value of the state variable as of the next-to-last-period decision. This logic continues recursively back in time, until the first-period decision rule is derived, as a function of the initial state variable value, by optimizing the sum of the first period’s objective function and the value of the second period’s value function, which captures the value for all future periods. Thus, each period’s decision is made by explicitly acknowledging that all future decisions will be optimally made.
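The backward-induction logic can be made concrete with a small numerical sketch. The specific consumption–savings (cake-eating-style) problem, the wealth grid, and the square-root utility below are illustrative assumptions rather than anything specified in the text:

```python
import numpy as np

# Illustrative finite-horizon problem (assumed): state = wealth on an integer
# grid, control = consumption, period utility = sqrt(c), leftover wealth
# carries over one-for-one, and future utility is discounted by beta.
beta, T = 0.95, 5
wealth_grid = np.arange(0, 101)            # wealth levels 0, 1, ..., 100

# Last period: it is optimal to consume everything, so V_T(W) = sqrt(W).
V_next = np.sqrt(wealth_grid)
policy = {}

for t in range(T - 1, -1, -1):             # work backwards: T-1, ..., 1, 0
    V_now = np.empty_like(V_next)
    c_now = np.empty_like(V_next)
    for i, W in enumerate(wealth_grid):
        c = np.arange(0, W + 1)            # feasible consumption choices
        total = np.sqrt(c) + beta * V_next[W - c]   # payoff + discounted continuation
        best = np.argmax(total)
        V_now[i], c_now[i] = total[best], c[best]
    policy[t] = c_now                      # optimal rule c_t(W) for this period
    V_next = V_now                         # becomes the continuation value

print(policy[0][100])                      # first-period consumption when W_0 = 100
```

Each pass through the outer loop applies exactly the relationship described above: the current period’s best choice trades off today’s payoff against the already-computed value of tomorrow.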

Deriving the Bellman Equation in a Discrete-Time Setup

Let the state at time $t$ be $x_t$. For a decision problem that begins at time $0$, we take as given the initial state $x_0$. Then let $a_t\in \Gamma(x_t)$ denote the control variable (or vector of control variables), whose feasible set $\Gamma(x_t)$ depends on the current state $x_t$. We also assume that the state changes from $x$ to a new state $T(x,a)$ when action $a$ is taken, and that the current payoff from taking action $a$ in state $x$ is $F(x,a)$. Finally, we have a discount factor $0<\beta<1$. Under these assumptions, the infinite-horizon decision problem is

\begin{equation}
\max_{\{a_t\}_{t=0}^{\infty}}\ \sum_{t=0}^{\infty}\beta^tF(x_t,a_t)
\end{equation}

subject to the constraints $a_t\in\Gamma(x_t)$ and $x_{t+1}=T(x_t,a_t)$ for all $t\ge 0$. The value function $V(x_0)$ is the optimal value that can be obtained by maximizing this objective function subject to these constraints.

Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Therefore, we have the Bellman equation as follows:

\begin{equation}
V(x)=\max_{a\in\Gamma(x)}\left\{F(x,a)+\beta V\big(T(x,a)\big)\right\}
\end{equation}
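In the infinite-horizon case the Bellman equation can be solved numerically by value iteration: start from an arbitrary guess and apply the maximization on the right-hand side repeatedly until the value function stops changing. The sketch below reuses the illustrative problem from the finite-horizon sketch above, now over an infinite horizon with $F(x,a)=\sqrt{a}$, $T(x,a)=x-a$, and $\Gamma(x)=\{0,\dots,x\}$; these specifics are assumptions for the example, not part of the general statement:

```python
import numpy as np

# Same illustrative problem as the finite-horizon sketch, now with an
# infinite horizon: payoff F(x, a) = sqrt(a), transition T(x, a) = x - a.
beta = 0.9
grid = np.arange(0, 101)
V = np.zeros(grid.shape)                   # initial guess V_0(x) = 0

for _ in range(1000):                      # apply the Bellman operator repeatedly
    V_new = np.empty_like(V)
    for i, x in enumerate(grid):
        a = np.arange(0, x + 1)            # feasible actions a in Gamma(x)
        V_new[i] = np.max(np.sqrt(a) + beta * V[x - a])
    if np.max(np.abs(V_new - V)) < 1e-8:   # converged to (approximately) a fixed point
        V = V_new
        break
    V = V_new

print(V[100])                              # approximate value V(x_0) for x_0 = 100
```

Because $0<\beta<1$ and the payoffs on the grid are bounded, the Bellman operator is a contraction, so this iteration converges to the same fixed point regardless of the initial guess.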

Basic Model

The starting point is a model first developed by McDonald and Siegel (1986). They considered the following problem: At what point is it optimal to pay a sunk cost $I$ in return for a project whose value is $V$, given that $V$ evolves according to the following geometric Brownian motion:

\begin{equation}\label{eq:geometric}
dV=\alpha Vdt+\sigma Vdz \tag{$\star$}
\end{equation}

where $dz$ is the increment of a Wiener process.
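For intuition, equation $(\star)$ is easy to simulate; the sketch below uses the exact log-normal discretization of geometric Brownian motion, with parameter values chosen arbitrarily for illustration:

```python
import numpy as np

# Simulate dV = alpha*V*dt + sigma*V*dz via the exact discretization
# V_{t+dt} = V_t * exp((alpha - 0.5*sigma^2)*dt + sigma*sqrt(dt)*eps).
# All parameter values here are arbitrary illustrative choices.
rng = np.random.default_rng(seed=0)
alpha, sigma = 0.04, 0.20                  # drift and volatility of the project value
V0, dt, n_steps = 1.0, 1 / 252, 5 * 252    # five "years" of daily steps

dz = np.sqrt(dt) * rng.standard_normal(n_steps)   # Wiener increments
log_V = np.log(V0) + np.cumsum((alpha - 0.5 * sigma**2) * dt + sigma * dz)
V_path = np.exp(log_V)                     # one sample path of the project value V

print(V_path[-1])                          # terminal value of this particular path
```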

Note that the firm’s investment opportunity is equivalent to a perpetual call option: the right, but not the obligation, to buy a share of stock at a pre-specified price. The decision to invest is therefore equivalent to deciding when to exercise such an option, so the investment decision can be viewed as a problem of option valuation, or equivalently as a problem in dynamic programming.

In this model, we will denote the value of the investment opportunity by $F(V)$. We want a rule that maximizes this value. Since the payoff from investing at time $t$ is $V_t-I$, we want to maximize its expected present value:

\begin{equation}\label{eq:objective}
F(V)=\max_{T}\ \mathbb{E}\left[(V_T-I)e^{-\rho T}\right] \tag{$\star\star$}
\end{equation}

where $T$ is the (unknown) future time at which the investment is made and $\rho$ is the discount rate.

Note: For this problem to make sense, we must also assume that $\alpha<\rho$; otherwise the expected present value in $\eqref{eq:objective}$ could be made indefinitely large by choosing a larger $T$, so waiting longer would always be a better policy and the optimum would not exist. Let $\delta$ denote the difference $\rho-\alpha$, which is therefore greater than $0$.

Because the investment opportunity, $F(V)$, yields no cash flows up to the time $T$ at which the investment is undertaken, the only return from holding it is its capital appreciation. Hence, the Bellman equation is

\begin{equation}
\rho F\,dt=\mathbb{E}[dF]
\end{equation}

It says that over a short time interval $dt$, the total expected return on the investment opportunity, $\rho F\,dt$, is equal to its expected capital appreciation, $\mathbb{E}[dF]$.

We expand $dF$ using Ito’s Lemma:

\begin{equation}
dF=F'(V)\,dV+{1\over 2}F''(V)\,(dV)^2
\end{equation}

Substituting equation $\eqref{eq:geometric}$ for $dV$, noting that $(dz)^2=dt$ and $\mathbb{E}[dz]=0$, and taking expectations gives

\begin{equation}\label{eq:differential}
\mathbb{E}[dF]=\alpha VF'(V)dt+{1\over 2}\sigma^2V^2F''(V)dt
\end{equation}

Hence, dividing by $dt$, the Bellman equation becomes the ordinary differential equation

\begin{equation}
{1\over 2}\sigma^2V^2F''(V)+\alpha VF'(V)-\rho F(V)=0
\end{equation}
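As an aside (a standard step that the text does not spell out), the coefficient on each derivative is a power of $V$ matching its order, so a power function is a natural trial solution. Substituting $F(V)=AV^{\beta_1}$, where $\beta_1$ denotes an exponent rather than the discount factor used earlier, reduces the equation to the quadratic

\begin{equation}
{1\over 2}\sigma^2\beta_1(\beta_1-1)+\alpha\beta_1-\rho=0
\end{equation}

whose positive root exceeds $1$ whenever $\alpha<\rho$. The constant $A$ and the investment threshold $V^\ast$ are then pinned down by the boundary conditions that follow.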

In addition, $F(V)$ must satisfy the following boundary conditions:
\begin{equation}\label{eq:condition1}
F(0)=0
\end{equation}

\begin{equation}\label{eq:condition2}
F(V^\ast)=V^\ast-I
\end{equation}

\begin{equation}\label{eq:condition3}
F'(V^\ast)=1
\end{equation}

The first condition arises from the observation that if $V$ goes to zero, it will stay at zero; therefore the option to invest has no value when $V=0$. Since $V^\ast$ is the critical value at which it becomes optimal to invest, the second condition is a value-matching condition: upon investing, the firm receives the net payoff $V^\ast-I$. Finally, the last condition is the “smooth-pasting” condition: if $F(V)$ were not continuous and smooth at the critical exercise point $V^\ast$, one could do better by exercising at a different point.
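A short numerical sketch ties the pieces together. Assuming the power-function form $F(V)=AV^{\beta_1}$ suggested above, with arbitrary illustrative parameter values, the three boundary conditions determine $\beta_1$, the threshold $V^\ast$, and the constant $A$; the last two lines check value matching and smooth pasting numerically:

```python
import numpy as np

# Arbitrary illustrative parameter values (assumptions, not from the text).
rho, alpha, sigma, I = 0.08, 0.02, 0.20, 1.0
assert alpha < rho                          # so that delta = rho - alpha > 0

# Positive root of (1/2)*sigma^2*b*(b - 1) + alpha*b - rho = 0
# (the quadratic obtained from the trial solution F(V) = A*V**b).
a2, a1, a0 = 0.5 * sigma**2, alpha - 0.5 * sigma**2, -rho
beta1 = (-a1 + np.sqrt(a1**2 - 4 * a2 * a0)) / (2 * a2)

# Value matching and smooth pasting then give the threshold and the constant.
V_star = beta1 / (beta1 - 1) * I            # critical value V* that triggers investment
A = (V_star - I) / V_star**beta1            # level of the option value F(V) = A*V**beta1

print(beta1, V_star)
print(A * V_star**beta1 - (V_star - I))     # value matching:  F(V*) - (V* - I)  ~ 0
print(beta1 * A * V_star**(beta1 - 1) - 1)  # smooth pasting:  F'(V*) - 1        ~ 0
```

With these particular numbers, $\beta_1=2$ and $V^\ast=2I$, so under these assumed parameters it is optimal to invest only once the project is worth twice its sunk cost, rather than at the break-even point $V=I$.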
