Wednesday, March 17, 2010

How to do Stochastic Calculus


I have recently had to learn a mathematical approach called "stochastic calculus" as part of my actuarial exams. As misery loves company, I've decided to share the details with you...

This post is a short tutorial in stochastic calculus, with an emphasis on getting to the point where we can solve practical problems and skipping as much foundational material as I can get away with.

What is stochastic calculus?

Stochastic calculus is a useful tool in financial maths. In normal calculus, you might take a function and find its derivatives (gradient, curvature, etc) as time changes. Or you might take a differential equation (an equation relating a function to its derivatives) and use it to figure out what the corresponding function looks like.

"Stochastic" just means probability-related. In stochastic calculus, you take a random variable and find its derivatives, or take a differential equation and find the random variable it represents.

An example. Let's say we're looking at the size (nt) of a population of lemurs. We reckon that the rate of population change is directly proportional to the size of the population. In mathematical notation:
dnt/dt = g.nt
where g is a constant growth factor (or shrinkage factor if negative). If we solved this with differential calculus we'd find that:
nt = n0.exp(g.t)

But wait! Real populations don't work like that. They behave far more randomly, which can lead to very different results. In particular, you'll notice that, no matter how negative the growth rate is, the lemurs can never go extinct. Sadly this is not true of real animals.

How do we represent this uncertainty? One way is to add a "noise" term to the calculation. So:
dNt/dt = g.Nt + v.Wt.Nt
where v is a constant volatility factor and Wt is a continuously-changing random variable - a "stochastic process". You'll note that we're now using Nt rather than nt, to show that population size is probabilistic rather than predestined.

Brownian motion

Which stochastic process should we use for our noise term? This would depend very much on the situation - our team of biologists would need to research how species' populations change in practice. However, a very popular choice is based on "Brownian motion". It's typical to try to reframe any stochastic problem in terms of this process.

Brownian motion is named after the phenomenon of pollen grains bouncing around in water (they're small enough to be affected by the impact of individual water molecules). It is a "random walk" process, with several nice properties:

- it is distributed as N(0,t2) (e.g the standard deviation at time t is t0.5)
- it has independent increments, so you can model B2t as Bt + B't
- dBt.dBt = dt (please take it on trust that this is important, we'll see how it's relevant later)

In the case of our population equation, we might set Wt such that Wt.dt = dBt. In other words, Bt is in some sense the integral between times 0 and t of Wt. This allows us to rephrase our equation as:
dNt = g.Nt.dNt + v.Nt.Wt.dt
= g.Nt.dNt + v.Nt.dBt

Interlude: What the hell is that?

I keep using terms like dBt and dNt to mean increments of random variables. Intuitively, this doesn't make a heck of a lot of sense. Random variables, even time-dependent ones, don't have fixed values - that's kinda the point - so how the blazes can they have well-defined increments?

The answer is: the increment is also a random variable, just an infinitesimally small one (in the same way that dt is an infinitesimally small deterministic variable). When we say (for example) I = ∫0t 1.ds, the answer is clearly: I=t. When we say I = ∫0t 1.dBs, the answer is equally obvious: I = Bt, the sum of the random increments.

More complex expressions like ∫0t Bs.dBs are quite hard to understand intuitively. Multiplying a random variable by an infinitesimal random variable and taking an infinite sum? Wtf? Basically, the purpose of stochastic calculus is to allow us to transform messy situations like this back into simple ones like the two in the preceding paragraph.

The Ito formula

So anyway, we now know what the stochastic differential equation looks like. How do we solve it? We use an equation called the Ito formula to figure out what it should look like, and then we forcibly bring the two forms together.

The Ito formula says that if:
dXt = u.dt + v.dBt
(u, v real numbers)

Yt = f(t, Xt) for some function f(t,x)

dYt = ∂f/∂t.dt + ∂f/∂x.dXt + 1/2.∂2f/∂2x.dXt.dXt
using e.g. ∂f/∂x as a shorthand for ∂f/∂x(t, Xt)

Usually it makes sense to take Xt = Bt.

A simple example. Let's say we're evaluating It = ∫ Bt.dBt. We can rephrase this as: dIt = Bt.dBt

The variable It will end up depending on time and on the effects of the random variable, so we can write: It = f(t, Bt). Therefore, by Ito's formula:
dIt = ∂f/∂t.dt + ∂f/∂x.dBt + 1/2.∂2f/∂2x.dBt.dBt
= (∂f/∂t + 1/2.∂2f/∂2x).dt + ∂f/∂x.dBt because dBt.dBt = dt (see the section on Brownian motion above).

Since we know: dIt = 0.dt + Bt.dBt
we can say that: ∂f/∂x = x
and: ∂f/∂t = -1/2.∂2f/∂2x
From the first of these partial derivatives, we have:
f = x2/2 + g(t)
We can now rewrite the second equality as: ∂f/∂t = -1/2.1
giving: f = -t/2 + h(x)
So overall: f = x2/2 - t/2

Therefore (drumroll), we have: It = f(t,Bt) = Bt2/2 - t/2

That population equation

Now we've got the tools in place, let's solve that population equation. First assume that there is some function f(t,x) such that Nt = f(t,Bt)

We know that:
dNt = g.Nt.dNt + v.Nt.dBt
and that:
dNt = (∂f/∂t - 1/2.∂2f/∂2x).dt + ∂f/∂x.dBt

∂f/∂x = v.f
f = exp(v.x).p(t)
(I'm using p(t) and q(x) as my dummy functions rather than g(t) and h(x), to avoid confusion between g and g(t))

∂f/∂t + 1/2.∂2f/∂2x = g.f
∂f/∂t + 1/2.v2.f = g.f
∂f/∂t = (g - v2/2).f
f = exp((g - v2/2).t).q(x)

So overall we can infer that:
f = C.exp(v.x + (g-v2/2).t) for constant C

In other words:
Nt = N0.exp(v.Bt + (g-v2/2).t)

QED. This is similar, but not identical to what we'd expect from looking at the traditional differential equation.

How we can use this

Now, just having an equation is not terribly useful. But once we've got it we can start asking interesting questions such as "what is the probability that the population of lemurs will have shrunk at time t?"

P(Nt < N0) = P(N0.exp(v.Bt + (g-v2/2).t) < N0)
= P(exp(v.Bt + (g-v2/2).t) < 1)
= P(v.Bt + (g-v2/2).t < 0)
= P(Bt < (v2/2-g).t/v)

Since Bt ~ N(0,t), this can be worked out using a set of tables. For example if g = 5%, v = 2%, and t = 5, then:
P(N5 < N0) = P(B5 < (0.0004/2 - 0.05)*5/0.02)
= P(Z < (0.0004/2 - 0.05)*5/0.02/5^0.5)
= P(Z < (0.0004/2 - 0.05)*5/0.02/5^0.5)
= P(X < -5.5678)

Past a certain point the population of lemurs is likely to go extinct (although this is not modelled well by the equation - Nt can never reach 0). If the population was declining, and we could figure out values of g and v, we could determine the probability of extinction. We could then allocate our animal conservation money appropriately.

Applying these principles to pension fund capital is left as an exercise to the reader.

Why is that not the full story?

There are several things I've left out here, two mathematical points and two practical points.

Firstly, Wt is not a "real" distribution. If you actually try to work out what dBt/dt looks like, you find that it is infinite at infinitely many points. That is just plain weird.

Fortunately, there is a mathematical hold-all concept - the "extended distribution" that sweeps all that under the carpet. And since it all works out OK, the specifics aren't really a concern, any more than you need to know Real Analysis to understand differential calculus.

Or are they a concern? Point #2 is that there are actually several ways to define ∫...dBt, which give mutually exclusive results. Explaining the differences is beyond the scope of this post. Suffice to say that the one we've been using - the Ito integral - is mathematically nice.

Practically speaking, what we've looked at here is only half the battle. As responsible scientists, we'd want to produce sensible values for g and v, and in general confirm that the population is behaving as the equation predicts. This calibration is a nightmarish task that I'm not even going to begin to discuss here.

And finally there's the question of whether our general approach - all founded, you'll recall, on Brownian motion - is remotely sensible. There has been a lot of muttering recently pointing out that extreme events are far more likely than the normal distribution would suggest. It is very hard to produce Brownian processes that can emulate this behaviour, and nigh-on impossible to calibrate. There is a strong suggestion that a broader family called Levy processes might be more plausible, but they are also a bugger to calibrate.

This sort of thing won't be solved any time soon.


I learned SC from the textbook by Bernt Oksendal (5th edition). In retrospect this was probably a mistake, as he goes very deeply into the mathematical fundamentals before covering anything remotely practical. If you can get past the first 3 chapters without falling asleep then it's not a bad book, although not terribly readable (even by maths book standards...).

1 comment:

Anonymous said...

You wrote: "- it is distributed as N(0,t^2) (e.g the standard deviation at time t is t^0.5)"

But if something is N(0, t^2) then its sd is just `t`.