Renewals and Repeats

After talking about DTMCs (Discrete time markov chains) there are 2 ways you can go. Either talk about CTMCs (Continuous time markov chains) or talk about Renewal processes. I am choosing to talk about Renewal processes for a few reasons.

  1. It will be nice to take a break from Markov chains for a while.
  2. CTMCs in a way are just a combination of DTMCs and Renewal processes.
  3. I find Renewal processes to be more interesting

If you aren’t satisfied with those reasons, then you can come back later when I discuss CTMCs.

This post will mostly introduce a lot of the terminology/notation, so you don’t need to read it if you are already familiar with the topic but it can’t hurt.

Suppose we are given a set of iid (independent identically distributed) random variables \{Y_i\}_{i=1}^\infty, and Y_i\sim F (Y_1 has distribution F). To generalize a little further we will also assume that we have Y_0 which may have a different distribution that the other Y_i. If Y_0=0 we say that the process is pure, otherwise it is said to be delayed. The Y_i are referred to as the interarrival times, you will see why soon.

From the Y_i we define the following sequence \displaystyle S_n=\sum_{i=0}^{n-1}Y_i. If we think of the S_n as the time at which the n^{th} event occurs, then you can see why the Y_i are interarrival times.

Finally, for t\in [0,\infty) define the following function N(t)=\sup\{n:\ S_{n-1}\le t\}. Basically, this counts the number of events that have happened in the time interval [0,t]. Since N(t) depends on the S_n it is obvious that it to is also a random variable. If our process is pure then we define U(t):=\mathbb EN(t), and for a delayed process V(t):=\mathbb EN(t).

If the Y_i are exponentially distributed and Y_0=0, then N(t) is called a Poisson process. If you don’t know what the exponential distribution is then I suggest you take a moment to visit the above link. The reason I take the time to mention this case is because with this assumption we are able to create CTMCs, but more on that later.

Convolution

Suppose we have two random variables X\sim F and Y\sim G and we want to describe the distribution for X+Y. Well the answer is F*G, where * designates convolution.

Suppose f:X\to\mathbb R and g:X\to\mathbb R, where X (informally speaking) is any space that makes the following integral make sense. We define the convolution f*g(t):=\int_0^tf(t-x)g(x)dx. If instead we have a measure G:X\to [0,\infty) then f*G(t):=\int_0^tf(t-x)dG(x). In most of the cases I will talk about G will be absolutely continuous so there exist a function g such that f*G(t)=\int_0^tf(t-x)g(x)dx.

Now that we know that we can show that X+Y\sim F*G. This is rather informal but suppose we want to know P(X+Y=x), we could find this by first conditioning on the value of Y and then integrating over the values of Y.

P(X+Y=x)=\int_0^t P(X+Y=t | Y=x)P(Y=x)dx

which can be rewritten as

\int_0^t P(X=t-x)P(Y=x)dx=\int_0^tf(t-x)g(x)dx.

As you can see this is rather informal and glosses over a few issues (for instance; if Y is continuous then P(Y=s)=0) but the general idea is still there.

With this we now know that if our process is pure then S_n\sim F^{n*} which is the n fold convolution of F. If we have a delayed process then S_n\sim G*F^{(n-1)*}.

Posted in Markov Process, Poisson Process, Renewal Theory | Leave a comment

Markov Chains 2

The last post ended with a series of definitions. In this post I will attempt to show how they are related and how they can be used. Just as a refresher, here are the terms that were defined: communication, transient, recurrent, positive recurrent, and null recurrent. If you are unable to recall, or you just don’t know, what these terms mean then please read the ending section of the previous post.

We already know how to find the probability that after n steps you have traveled from state i to state j. This is just P^n_{ij}, where P is the transition matrix for our Markov chain. However, what if we want to know the probability of reaching state j for the FIRST TIME with n steps, given we start in state i? Clearly this will be different (in fact smaller) than the answer to the previous question. We will denote this quantity as f^{(n)}_{ij}, and then let \displaystyle f_{ij}:=\sum_nf^{(n)}_{ij}. In case you aren’t able to see it yourself, you can interpret f_{ij} as being the probability of ever going from state i to j.

One inequality that is somewhat obvious is f^{(n)}_{ij}\le p^n_{ij}. From here one can easily deduce that f_{ij}\le\sum p^n_{ij}.

Sometimes you might have to explicitly calculate the value of f^{(n)}_{ij}. After some thinking this isn’t all that “difficult”. The formula I present can be derived from a basic first step analysis. It should be noted that the case n=1 is rather special, for this we just have that f^{(1)}_{ij}=P_{ij}. Now suppose that n>1, then in the first step you go from state i to some other state k with k\neq j (since we have to reach j for the first time on the n^{th} step). Now that we are at state k the problem starts over, except this time we have to reach j in n-1 steps. Thus it follows that when n>1 we have

\displaystyle f^{(n)}_{ij}=\sum_{k\neq j}P_{ik}f^{(n-1)}_{kj}

Stability Properties

Suppose that \mathscr K is a property that can be used to describe a state, such as recurrence/transience, and state i has this property. If i\leftrightarrow j implies that state j also has property \mathscr K, then \mathscr K is called a stability property.

From the definition, you might have guessed that both recurrence and transience are stability properties. A result of this fact is that it is impossible to go from a recurrent state to a transient one. This is because you will return to the recurrent state with probability 1 in a finite number of steps. However, then you will also return to the transient state with probability 1 with a finite number of steps.

Theorem: If state i is transient and i\leftrightarrow j, then state j is also transient.
proof: In this proof I will use the two following facts (which I do not prove, but are true none the less), i is recurrent if and only if f_{ii}=1 if and only if \sum p^n_{ii}=\infty. For transience you have a similar result, but you replace = with <.
Since i\leftrightarrow j there exists k and m such that
\displaystyle P^{(k)}_{ij}>0 and P^{(m)}_{ji}>0. Moreover, for any n we have that

P^{(n+m+k)}_{jj}=\displaystyle\sum_a\sum_bP^{(m)}_{ja}P^{(n)}_{ab}P^{(k)}_{bj}\ge P^{(m)}_{ji}P^{(n)}_{ii}P^{(k)}_{ij}.

Since this is true for any n, we can sum both sides over all possible n. This gives us \displaystyle\sum_{n=1}^\infty P^{(n+m+k)}_{jj}\ge P^{(m)}_{ji}P^{(k)}{ij}\sum_{n=1}^\infty P^{(n)}_{ii}. So \sum P^{(n)}_{jj}=\infty since \sum P^{(n)}_{ii}=\infty.

There is another stability property that I won’t discuss too much, but it is worth mentioning. The period of a state, d(i), is defined as follows:

d(i)=\gcd\{n:\ P^{(n)}_{ii}>0\}

If d(i)=1 then the state is said to be aperiodic (as you would expect). The reason periodicity is important is that sometimes you wish to calculate \lim P^{(n)}_{ii} as n\to\infty. Well if the state is not aperiodic then this limit will not exist (not too difficult to see why, just think about the delta epsilon definition of a limit). However, to get around this we instead consider the Cesaro sum

\displaystyle\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N P^{(n)}_{ii}

This allows us to find limiting probabilities (in a sense) of both periodic and aperiodic states. This is because if a_n\to\alpha then the Cesar sum also converges to \alpha.

Posted in Markov Process | Leave a comment

Markov Chains

Wow it really has been a while since my last post. I guess grad school will do that to you, or at least the first year anyway.

So as can be seen from the title this post will be on Markov chains. I don’t think I’ll present too many proofs, but at least I’ll state some interesting things.

Discrete Time Markov Chains

Since I am lazy and don’t want to type Discrete Time Markov Chain over and over, I will use the acronym DTMC.
Let S be a countable set (possibly finite). Most of the time we will assume that S is either \{0,1,2,\dots\} or \mathbb Z. You should by now know what a Markov chain is, but just in case here is a refresher.

Definition: A collection of random variables X_n:\mathbb N\to S such that P(X_n=j | X_{n-1}=i, X_{n-2}=i_{n-2}, \dots, X_0=i_0)=P(X_n=j | X_{n-1}=i). We will assume that our Markov chain in time homogeneous, so that P(X_{n+k}=j | X_k=i)=P(X_n=j | X_0=i).

We will denote this transition from state i to state j in one step by P(X_n=j | X_{n-1}=i)=P_{ij}. Using this we can form an |S|\times |S| matrix P which will represent all the one step transition probabilities. Note that each row of P sums to 1, but the same need not be true of the columns.

How would you express P_i(X_n=j)=P(X_n=j | X_0=i) for n>1? Let’s look at the case where n=2.

\displaystyle P_i(X_2=j)=\sum_{k\in S}P(X_2=j | X_1=k, X_0=i)P(X_1=k | X_0=i)

Then using the markov property and time homogeneity, the above is equal to

\displaystyle \sum_{k\in S}P(X_1=j | X_0=k)P(X_1=k | X_0=i)=\sum_{k\in S}P_{ik}P_{kj}=P_{ij}^2

Which is just the (i,j) element in P^2. To prove the general case is a simple induction. You should find that,

Theorem: P_i(X_n=j)=(P^n)_{ij} and P^{n+m}=P^{n}P^{m}

Nothing all that surprising if you think about it for a moment.

Classifications of States

Definition: A state i is said to communicate with a state j if there exist n such that (P^n)_{ij}>0. This is expressed as i\to j. If j also communicates with i then i\leftrightarrow j.

It turns out that communication defines an equivalence relation and can thus be used to decompose our state space into equivalence classes. I won’t prove that it is indeed an equivalence relation but the proof is not hard and almost obvious. So if you have a strong urge to do so then prove away, otherwise you are free to continue reading.

Definition: Choose an arbitrary state i\in S. For this state define the function \tau_i(n) as follows; \tau_i(0)=0, and \tau_i(n)=\inf\{m: X_m=i, m>\tau_i(n-1)\} for n\ge 1. Thus \tau_i(n) can be thought of as the time of the n^{th} visit to state i, given that you started at i.

Definition: A state if called recurrent if, given that you start in that state, you will return to it a finite amount of steps with probability 1. In other words P_i(\tau_i(1)<\infty)=1. If this expected return time also happens to be finite, E_i[\tau_i(1)]<\infty, the state is called positive recurrent. However, if the expected return time is infinite (which can happen even if P_i(\tau_i(1)<\infty)=1) the state is called null recurrent. Finally, if there is a chance that you will never return to the state you started from, P_i(\tau_i(1)=\infty)>0, that state is said to be transient. It is obvious that transient states are also null recurrent, even though they are not recurrent.

This seems like a good place to end (mainly because I have to do other things right now). So next time I will further develop the ideas mentioned at the end, sorry for the long list of what seems like random definitions.

Posted in Markov Process | Leave a comment

Expectation Calculation

Well I have officially started graduate school, so my update frequency might drop a little (not that it was that great to begin with sadly). I haven’t seen anything all that new yet in terms of probability but I’ll be sure to let you know when I do. In the mean time here are some lesser known ways of calculating the expected value for a random variable. The method that everyone learns is that \mathbb EX=\sum kP(X=k) for discrete random variables, and something similar for continuous ones.

Theorem: Suppose that X is a non-negative integer valued random variable. Then \displaystyle\mathbb EX=\sum_{k=1}^\infty P(X\ge k).

proof: It should be obvious that \displaystyle X=\sum_{k=1}^\infty 1_{\{X\ge k\}}. Then to find the expected value we just integrate both sides.

\displaystyle\int_\Omega Xd\mathbb P=\int_\Omega\sum_{k=1}^\infty 1_{\{X\ge k\}}d\mathbb P

Since all the terms in the sum are positive, we are able to interchange the summation and integral. This is basically and application of Tonelli’s Theorem. Thus we have,

\displaystyle \mathbb EX=\sum_{k=1}^\infty\int_\Omega 1_{\{X\ge k\}}d\mathbb P=\sum_{k=1}^\infty P(X\ge k)

Now what if X is a continuous random variable? Well, in a sense the same result holds (assuming that X is still non-negative).

Theorem: If X is a non-negative, continuous random variable then \mathbb EX=\int_0^\infty(1-F(x))dx. Here F(x)=P(X\le x).

proof: This proof will be a little different. This time we will show that both sides of the desired equality are equal to
\displaystyle\int_\Omega\int_0^\infty 1_{[0,X(w)]}(x)dxd\mathbb P(w).
If we evaluate the integral in the given order we find
\displaystyle\int_\Omega X(w)\mathbb P(w)=\mathbb EX.
The term that we are integrating is positive, so we can once again invoke Tonelli’s theorem and change the order of integration. Doing so yields the following.
\displaystyle\int_0^\infty P(X\ge x)dx=\int_0^\infty(1-F(x))dx.

If you feel that you understand the above proof here is a “challenge” that you might want to try.

Problem: Show that if X is a non-negative random variable, then \displaystyle\mathbb EX^n=\int_0^\infty nx^{n-1}(1-F(x))dx.

Posted in Expected value and Variance | Leave a comment

Derivatives and Integrals, Part 2

Near the end of my last post I commented on how I would like to prove the interchanging of the integral using only the fact that \int_a^b|f(x,y)|dx<\infty. As it turns out it’s not possible, if this is our only assumption. In the closing remarks, Wikipedia noted that the interchange of the derivative and integral was essentially the same as Fubini’s theorem.

Using the notation from the last post we have the following theorem.

Theorem: \displaystyle g'(y)=\int\frac{\partial}{\partial y}f(x,y)\ dx if and only if \displaystyle \int\int\frac{\partial}{\partial y}f(x,y)\ dx\ dy=\int\int\frac{\partial}{\partial y}f(x,y)\ dy\ dx.

Proof: In the previous post it was shown that the conclusion of Fubini’s theorem implies the desired result. Thus here we will only show the other direction.
At first when I thought about this problem I was at a loss of where to begin. That was until I remembered a trick I used to do all the time in my pre-university days (it is documented in this mouseover in this xkcd comic).

So by the Lebesgue Differentiation Theorem (which was mentioned in the last post) we have that
\displaystyle g'(y)=\lim_{s\to y}\frac{1}{y-s}\int_s^y\int_a^b\frac{\partial}{\partial r}f(x,r)\ dx\ dr,
which implies
\displaystyle g(y)-g(s)=\int_s^y\int_a^b\frac{\partial}{\partial r}f(x,r)\ dx\ dr.

Next we evaluate g(y)-g(s) in a different way. From our definition of g we have that
g(y)-g(s)=\int_a^b[f(x,y)-f(x,s)]\ dx.
Then we use the fundamental theorem of calculus to introduce another integral.
\displaystyle g(y)-g(s)=\int_a^b\int_s^y\frac{\partial}{\partial r}f(x,r)\ dr\ dx.

And we are done.

So what good would a theorem be if there were no problems it could be used to solve.

Example 1: Evaluate \displaystyle\int_0^\infty\frac{\sin x}{x}\ dx.

Solution: Define \displaystyle I(t)=\int_0^\infty\frac{e^{-tx}\sin x}{x}\ dx, and we see that we want to find I(0). Then we have that \int_0^\infty\left|\frac{\partial}{\partial t}\frac{e^{-tx}\sin x}{x}\right|dx<\infty, by a simple comparison test.
Thus we can write, I'(t)=\int_0^\infty e^{-tx}\sin xdx; which is an easy exercise with integration by-parts. So we get I'(t)=-(1+t^2)^{-1}, which means that I(t)=-\tan^{-1}(t)+C. We see that I(t)\to 0 as t\to\infty from its integral definition. So using this we have that I(t)=\frac{\pi}{2}-\tan^{-1}(t)\Longrightarrow I(0)=\frac{\pi}{2}.

Example 2: Evaluate \displaystyle\int_0^1\frac{x-1}{\log x}\ dx.

Solution: Let \displaystyle I(t)=\int_0^1\frac{x^t-1}{\log x}dx. After a quick check we see that Fubini’s theorem holds, so we have
\displaystyle I'(t)=\int_0^1x^tdx\Longrightarrow I'(t)=\frac{1}{t+1}.

Therefore, I(t)=\log(t+1)+C and since I(0)=0, I(t)=\log(t+1). So our final answer is \log(2).

Example 3: Evaluate \displaystyle\int_0^\infty\left(\frac{\sin x}{x}\right)^2dx. (This is actually an old Putnam question)

Solution: This time let I(a)=\displaystyle\int_0^\infty\left(\frac{\sin(ax)}{x}\right)^2dx, then

\displaystyle I'(a)=\int_0^\infty\frac{2\sin(ax)\cos(ax)}{x}dx=\int_0^\infty\frac{\sin(2ax)}{x}dx=\int_0^\infty\frac{\sin u}{u}du
after making the substitution u=2ax. From the first example we have that I'(a)=\frac{\pi}{2}, so I(a)=\frac{a\pi}{2}+C. From our definition of I(a) we have the initial condition I(0)=0, thus I(a)=\frac{a\pi}{2}\Longrightarrow I(1)=\frac{\pi}{2}.

Posted in Measure Theory | Leave a comment

Derivatives and Integrals

In the last post I stated the following as a theorem.

If \displaystyle\int |f(x,y)|\ dx<\infty then \displaystyle\frac{d}{dy}\int f(x,y)\ dx=\int\frac{\partial}{\partial y}f(x,y)\ dx.

I was trying to prove this earlier and found that I needed a different set of assumptions. I’m not saying that the above is wrong (though it might be), but I managed to find a different version that does happen to work.

Theorem: Let f:\mathbb R^2\to\mathbb C be measurable and differentiable with respect to its second argument. Additionally define \displaystyle g(y)=\int_a^bf(x,y)\ dx.

Suppose that either \displaystyle\frac{\partial}{\partial y}f(x,y)\ge 0 or \displaystyle\int_a^b\left |\frac{\partial}{\partial y}f(x,y)\right |\ dx<\infty. Then we have that

\displaystyle g'(y)=\int_a^b\frac{\partial}{\partial y}f(x,y)\ dx, for almost every y.

Proof: From the fundamental theorem of calculus we have that f(b,y)-f(a,y)=\int_a^b\frac{\partial}{\partial r}f(x,r)\ dr. With this, we begin by creating a difference quotient for g.

\displaystyle\frac{g(y)-g(s)}{y-s}=\frac{1}{y-s}\int_a^bf(x,y)-f(x,s)\ dx=\frac{1}{y-s}\int_a^b\int_s^y\frac{\partial}{\partial r}f(x,r)\ dr\ dx.

At this point we are able to make use of our assumptions about \frac{\partial}{\partial y}f(x,y), which allows us to apply Fubini’s theorem and change the order of integration. After doing so we find that

\displaystyle\frac{g(y)-g(s)}{y-s}=\frac{1}{y-s}\int_s^y\int_a^b\frac{\partial}{\partial r}f(x,r)\ dx\ dr.

Finally we are ready to take the limit as s\to y. Doing so on the left hand side gives us g'(y). For the right hand side we use the Lebesgue Differentiation Theorem. So

\displaystyle g'(y)=\lim_{s\to y}\frac{1}{y-s}\int_s^y\int_a^b\frac{\partial}{\partial r}f(x,r)\ dx\ dr=\int_a^b\frac{\partial}{\partial y}f(x,y)\ dx for almost every y.

I wonder if there is a way to prove this (assuming it is true) that only makes use of \int_a^b|f(x,y)|\ dx<\infty.

According to Wikipedia:

The following three basic theorems on the interchange of limits are essentially equivalent:

  • the interchange of a derivative and an integral (differentiation under the integral sign; i.e., Leibniz integral rule)
  • the change of order of partial derivatives
  • the change of order of integration (integration under the integral sign; i.e., Fubini’s theorem)
  • \
    Here are some links that you might find of interest;
    http://en.wikipedia.org/wiki/Differentiation_under_the_integral_sign
    http://en.wikipedia.org/wiki/Differentiation_of_integrals
    http://en.wikipedia.org/wiki/Leibniz_integral_rule

    Posted in Measure Theory | Leave a comment

    Brownian Motion

    Finally we are at the point where we can talk about Brownian motion. In a previous post I mentioned that we can obtain Brownian motion from W^n(t) by letting n\to\infty. So we will let W(t)=\lim W^n(t). It should also be noted that Brownian motion is sometimes referred to as a Wiener process, after the mathematician Norbert Wiener.

    Instead of working with \displaystyle\lim_n W^n(t) whenever we want to talk about Brownian motion we shall instead use the following definition.

    Definition: Let (\Omega,\mathscr F,\mathbb P) be a probability space, and W:\Omega\times\mathbb R^+\to\mathbb R. We say that W is a Brownian motion if:

    1. For each \omega\in\Omega we have that W(\omega,t) is continuous and W(0)=0 (normally we will drop the first argument of function but it doesn’t take long to get used to). Basically this can be understood as \mathbb P\left(\omega\in\Omega:W(\omega,0)\neq 0\right)=0.
    2. For all 0=t_0<t_1<t_2<\cdots<t_m, the increments
      W(t_1)-W(t_0), W(t_2)-W(t_1),\cdots,W(t_m)-W(t_{m-1})
      are independent and normally distributed.
    3. \mathbb E[W(t_i)-W(t_{i-1})]=0 and Var(W(t_i)-W(t_{i-1}))=t_i-t_{i-1}.

    Conditioning upon known information is a very common practice, and in order to do this with Brownian motion we will need “special” type of filtration.

    Definition: Let (\Omega,\mathscr F,\mathbb P) be a probability space with a Brownian motion, W(t). A filtration for a Brownian motion is a collection of \sigma-algebras \mathcal F(t), t\ge 0 such that:

  • If s\le t then every set in \mathcal F(s) is also in \mathcal F(t).
  • W(t) is \mathcal F(t)-measurable.
  • For 0\le s\le t, the increment W(t)-W(s) is independent of \mathcal F(s).
  • Problem:

    Let (\Omega,\mathscr F,\mathbb P) be a probability space with a Brownian motion W(t). Additionally, suppose that \mathcal F(t) is a filtration for the Brownian motion. Show that W(t) is a martingale.

    Proof: This proof is rather simple but I suppose I should type it out.

    \begin{array}{rcl}\mathbb E[W(t)|\mathcal F(s)]&=&\mathbb E[W(t)-W(s)+W(s)|\mathcal F(s)]\\\\&=&\mathbb E[W(t)-W(s)|\mathcal F(s)]+\mathbb E[W(s)|\mathcal F(s)]\\\\&=&\mathbb E[W(t)-W(s)]+W(s)\\\\&=&W(s)\end{array}

    It is also true that W(t) is a Markov process, but this proof requires additional techniques and thus will be presented later.

    From the definition of Brownian motion it is obvious that for t\ge 0 we have that Var(W(t))=t and \mathbb E[W(t)]=0. In the following theorem it is important to remember that the variance and quadratic variation are two different concepts. The quadratic variation depends upon the path that is taken when performing the computation. However, the variance can be thought of a kind of average over all possible paths.

    Theorem: Let W(t) be a Brownian motion. Then [W,W](T)=T for all T\ge 0 almost surely.
    The reason for the almost surely, is because there might exist paths such that [W,W](T)\neq T. But the set of all such paths has probability 0.

    Proof: I consider this to be the coolest proof I have seen all summer. Not because of the theorem, but more because of the technique used.

    Let \Pi=\{t_0,t_1,\dots,t_n\} be a partition with 0=t_0<t_1<\cdots<t_n=T, and \|\Pi\|=\max|t_i-t_{i-1}|. Now define the following random variable

    \displaystyle Q_\Pi=\sum_{i=1}^n\left(W(t_i)-W(t_{i-1})\right)^2

    We shall show that \mathbb EQ_\Pi=T and Var(Q_\Pi)\to 0 as \|\Pi\|\to 0. This is what makes this my favorite proof of the summer.

    Calculating the expected value is rather easy once we recall some properties of Brownian motion.

    \displaystyle\mathbb EQ_\Pi=\sum_{i=1}^n\mathbb E[(W(t_i)-W(t_{i-1})^2]=\sum_{i=1}^nVar(W(t_i)-W(t_{i-1})) and thus we have that

    \displaystyle\mathbb EQ_\Pi=\sum_{i=1}^nt_i-t_{i-1}=T.

    To calculate Var(Q_\Pi) we will use the fact that if X is a normally distributed random variable with Var(X)=\sigma^2 then \mathbb E[X^4]=3\sigma^4. The proof of this is rather simple and thus we will present it after the conclusion of this proof.

    Making use of this fact we have Var((W(t_i)-W(t_{i-1}))^2)=3(t_i-t_{i-1})^2-2(t_i-t_{i-1})^2+(t_i-t_{i-1})^2=2(t_i-t_{i-1})^2.

    Therefore,

    \begin{array}{rcl}Var(Q_\Pi)&=&\sum Var\left(W(t_i)-W(t_{i-1})\right)\\\\&=&\sum 2(t_i-t_{i-1})^2\\\\&\le&\sum 2\|\Pi\|(t_i-t_{i-1})\\\\&=&2\|\Pi\|T\end{array}

    Thus it clearly follows that

    \displaystyle\lim_{\|\Pi\|\to 0}Var(Q_\Pi)=0

    Which is gives us the desired result.

    Now to prove the fact that \mathbb E[X^4]=3\sigma^4 when X is normally distributed with mean 0 and variance \sigma^2. Define the function \varphi(u)=\mathbb Ee^{uX}. It is “easy” calculus 2 exercise to show that

    \displaystyle\varphi(u)=e^{\frac{1}{2}u^2\sigma^2}

    it actually took me a while to show this, but once you know what the final answer should be it’s quite easy to arrive at. Then taking the 4th derivative we have that

    \displaystyle\varphi^{(4)}(u)=\mathbb E[X^4e^{uX}]=\left(3\sigma^4+6u^2\sigma^6+u^4\sigma^8\right)e^{\frac{1}{2}u^2\sigma^2}

    Thus setting u=0, we have that \varphi^{(4)}(0)=\mathbb E[X^4]=3\sigma^4.

    The differentiation under the integral (inside the expectation) is allowed since everything is positive and integrable. This can be better phrased by the following theorem.

    Theorem: If \displaystyle\int |f(x,y)|\ d\mu(x)<\infty, then

    \displaystyle\frac{d}{dy}\int f(x,y)\ d\mu(x)=\int\frac{\partial}{\partial y}f(x,y)\ d\mu(x)

    Posted in Brownian Motion, Martingale | Leave a comment