Sunday 17 June 2012

The Power Iteration

In the middle of the last post, we needed to find the largest eigenvector of a symmetric 3x3 matrix.  I used the Power Iteration to find it, mostly because it's such a forgiving algorithm:  Even a trivial implementation will still get a correct result.



In this post, I'm going to talk about how to actually write the Power Iteration, along with a few caveats along the way.

Eigenvectors and Eigenvalues


Okay, okay, okay, so what actually is an Eigenvector/Eigenvalue?

Let's take an even further step back: what is a matrix?  In mathematics, we think of a matrix as representing a Linear Transformation.  Intuitively, we might think of it like this:  A matrix takes an incoming vector, and forms an output vector, where each output term is just a linear combination of the input terms.

[a, b,           (3,             (3*a + 5*b,
 c, d]    *       5)      =      3*c + 5*d) 
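If it helps to see that rule as code, here's a tiny sketch in C++.  (The Vec2/Mat2 names are just for illustration here - they're not part of the Matrix33/Vector3 library used later in this post.)

    // A minimal sketch of "matrix times vector":
    // each output term is a linear combination of the input terms.
    struct Vec2 { float x, y; };
    struct Mat2 { float a, b, c, d; };

    Vec2 Multiply(const Mat2 &m, const Vec2 &v){
        return Vec2{ m.a*v.x + m.b*v.y,
                     m.c*v.x + m.d*v.y };
    }

    // Multiply(Mat2{a,b,c,d}, Vec2{3,5}) returns (3*a + 5*b, 3*c + 5*d)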


Let's consider a really simple matrix, called Scale79:

[7,  0,
 0,  9]

You can see Scale79 takes a vector like (1,0) and returns (7,0).  We say that (1,0) is an eigenvector of Scale79, with an Eigenvalue (λ) of 7.  Notice that (2,0) is also an eigenvector, as is (3,0), (4,0), (-4,0), and also (7,0) and (49,0).

Scale79 also takes the vector (0,1) to (0,9).  We say that (0,1) is an eigenvector of Scale79, associated with the eigenvalue (λ) 9.

Our example matrix, Scale79, has two eigenvalues, 7 and 9.  Its eigenvectors associated with the eigenvalue 7 look like (1,0), and its eigenvectors associated with the eigenvalue 9 look like (0,1).

Do all (square) matrices have eigenvalues?  Well, yes and no.  It's true that all n-by-n square matrices have n eigenvalues, but only if we allow ourselves the luxury of eigenvalues taking on complex values.  Complex valued eigenvalues can sometimes be subtle to deal with, especially when using floating point math, so let's restrict ourselves to real numbers.

In the real-number-only case, a symmetric n-by-n matrix is guaranteed to have n real eigenvalues (counting multiplicity); a non-symmetric matrix may have fewer, because some of them can turn complex.

This beautiful result, and so much more (including some technical details regarding multiplicity and repeated eigenvalues) is well beyond the scope of this blog, but I urge you to read about Jordan Normal Form for an introduction to this fascinating topic if you're interested to know more.

Let's look at Scale79 a little closer.  We know it has two eigenvalues, 7 and 9, and it scales eigenvectors like (1,0) and (0,1) by those eigenvalues.
 
Scale79 * (x, 0)^T = 7 * (x, 0)^T

Scale79 * (0, y)^T = 9 * (0, y)^T




In some sense, we can replace a costly operation (matrix times vector) by a much simpler operation (scalar times vector), but only for some vectors!

Eigenvectors are like a sort of super matrix compression technology, but it only works some of the time!

Singular Value Decomposition

Before we go much further, let's get something out of the way.  If ever you actually want to compute eigenvalues and eigenvectors for a matrix, you just need to use the library function called "Singular Value Decomposition" or S.V.D.  Any half-way decent matrix library will provide an implementation which will factorize your matrix into a length-preserving matrix (U), a diagonal matrix of scale factors (S) (the singular values), and another length-preserving matrix (V).

M = U . S . V^T    (via S.V.D.)

Using this decomposition, it's trivial to pick off the largest eigenvalue (it's the first one, since the singular values come back sorted in decreasing order), and its associated eigenvector (it's the first column of the V matrix).

For example, in GNU Octave:
octave> scale79 = [7,0; 0,9]; [u,s,v] = svd(scale79)
u =
   0   1
   1  -0

s =
   9   0         <---- Largest Eigenvalue (λ = 9)
   0   7

v =
   0   1         <---- Eigenvector associated with 9
   1   0

Update: The SV Decomposition is only equivalent to the Eigen Decomposition if the eigenvalues are non-negative - best to use your library's eigen solver instead...
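In Octave, that's the built-in eig.  A quick sketch (the ordering and formatting of its output can differ from svd's, and between versions):

octave> [v, lambda] = eig(scale79)

The columns of v are the eigenvectors, and the matching eigenvalues (with their signs intact) sit on the diagonal of lambda.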

The Power Iteration


The S.V.D. is great if we want all of the eigenvalues and eigenvectors.  But sometimes the matrix M is too big to fit in memory, or perhaps we don't have handy access to a reliable S.V.D. implementation (e.g. on a GPU or an embedded controller).

The Power Iteration works by doing the simplest possible thing - it mindlessly multiplies the matrix by a vector (any vector), repeatedly, again and again, over and over until the vector becomes very close to an actual eigenvector:

M*v
M*(M*v)
M*(M*(M*v))
M*(M*(M*(M*v)))
M*(M*(M*(M*(M*v))))

Or, in eigenvalue terms (picture v as an eigenvector with eigenvalue λ):

 λ*v
λ*(λ*v)
λ*(λ*(λ*v))
λ*(λ*(λ*(λ*v)))
λ*(λ*(λ*(λ*(λ*v)))) 


After many iterations, the vector v will tend to become closer and closer to an eigenvector associated with the largest eigenvalue.  How quickly?  Well, after 5 iterations, we have λ^5 * v.  And that's where the power method gets its name: we are raising the matrix (and hence its eigenvalues) to a really large power, say, 10000 or more, and then seeing what remains.

In theory, we can form λ^10000 * v, and it will be just as good an eigenvector as any other.  In practice, we need to deal with floating point overflow.  For this reason we normalize the vector after each iteration to prevent overflow.

In code, that looks like this:

    Vector3 v(1, 1, 1);
    for(int i = 0; i < 10000; i++)
    {
        // Re-normalize each step so the vector can't overflow.
        v = (M * v).GetUnit();
    }

    return v;

Convergence

How fast will we find the eigenvector?  In the limit, it actually depends on the ratio between the second largest eigenvalue and the largest eigenvalue (by magnitude).  For the matrix Scale79, this ratio is 7/9 or 0.7777...

This is because the components of v along the eigenvectors with small eigenvalues quickly disappear into the round-off error, and in the end it's just the two largest eigenvalues battling for supremacy - the leftover error shrinks by roughly that 7/9 factor every iteration.

In code, that looks like this:

    Vector3 v(1, 1, 1);
    Vector3 lastV = v;
    for(int i = 0; i < 10000; i++)
    {
        v = (M * v).GetUnit();
        // Stop early once the iteration has (all but) stopped moving.
        if(DistanceSquared(v, lastV) < 1e-16f)
        {
            break;
        }
        lastV = v;
    }


If you've been playing along at home, you'll notice I've been talking about "the largest eigenvalue".  In fact, this algorithm converges to an eigenvector associated with the eigenvalue of largest magnitude, e.g. if the two eigenvalues were -7 and -9, it will converge to an eigenvector associated with -9.

So the full name for the above code snippet is really:

FindEigenvectorAssociatedWithEigenvalueWithLargestMagnitudeAssumingSquareSymmetricMatrix(m);

Ouch! That's quite a mouthful!  Please don't let the terms throw you off.  All of this stuff is built on simple concepts that have deep roots in physics, chemistry, economics, etc etc.  Because these are still relatively newish concepts, historically speaking, it makes sense that they don't (yet) map well to traditional natural languages such as English.  Trust me, it's totally worth putting in the time to get an intuitive feel for these concepts because they come up again and again in so many different contexts.

Ask questions, break it down, ignore the clumsy verbiage and focus on the concepts, and it'll all make sense.


Preconditioning

Preconditioning is a technique we use when we want to make math problems easier to solve.  We say, "rather than solve this difficult problem, let's solve this easier problem that has the same answer."

In our case, if iterating M*v will eventually converge to an eigenvector, then (M*M)*v will converge twice as fast to the same eigenvector (but to a different eigenvalue - the square of the original one!)

 
    float scale = FindLargestEntry(m);
    Matrix33 matrixConditioned = m / scale;

    matrixConditioned = matrixConditioned * matrixConditioned;  // (m/scale)^2
    matrixConditioned = matrixConditioned * matrixConditioned;  // (m/scale)^4
    matrixConditioned = matrixConditioned * matrixConditioned;  // (m/scale)^8

We have to be careful that the matrix multiply itself doesn't overflow!  By dividing through by the largest entry, we have good confidence we can raise the matrix to the 8th power and still keep the structure of the matrix.

At this point you might be tempted to skip the normalization step, and divide through by the determinant (or some multiplier) instead.  Be careful - these two techniques, while having the same effect, are actually solving two different problems.  If you try and solve both at once you may find you solve neither.  In particular, if your matrix has a particularly fiendish condition called "mixing of the modes", it can wreak all sorts of havoc with your computation.
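The helper FindLargestEntry gets used above without being defined.  Here's a minimal sketch of what it might look like - it assumes Matrix33 exposes its entries as m(row, col), so adjust it to whatever accessor your matrix class actually provides:

    // Sketch: return the largest magnitude entry of a 3x3 matrix.
    // fabsf comes from <math.h>.
    float FindLargestEntry(const Matrix33 &m){
        float result = 0.0f;
        for(int i = 0; i < 3; i++){
            for(int j = 0; j < 3; j++){
                float entry = fabsf(m(i, j));
                if(entry > result){
                    result = entry;
                }
            }
        }
        return result;
    }

Using the largest magnitude (rather than the raw largest entry) keeps the division safe even when the biggest entries happen to be negative.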

 

Putting it All Together

So now we have all the pieces for our Power Iteration.  The primary motivation for this algorithm is ease-of-coding, so keep it simple - if you find yourself putting time into making this algorithm better, you're almost always better off using a library implementation of SVD instead.

And remember, this function only works well for symmetric matrices!



// From http://missingbytes.blogspot.com/2012/06/power-iteration.html
// note: This function will perform badly if the largest eigenvalue is complex
Vector3 FindEigenVectorAssociatedWithLargestEigenValue(const Matrix33 &m){
    //pre-condition: scale the matrix down to avoid overflow, then raise it to the 8th power
    float scale=FindLargestEntry(m);
    Matrix33 mc=m*(1.0f/scale);
    mc=mc*mc;
    mc=mc*mc;
    mc=mc*mc;
    //power iteration on the conditioned matrix, normalizing every step
    Vector3 v(1,1,1);
    Vector3 lastV=v;
    for(int i=0;i<100;i++){
        v=(mc*v).GetUnit();
        //early out once the iteration has stopped moving
        if(DistanceSquared(v,lastV)<1e-16f){
            break;
        }
        lastV=v;
    }
    return v;
}
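If you also want the eigenvalue itself (for the original matrix m, not the preconditioned one), the Rayleigh quotient will recover it.  A quick sketch, assuming your vector library has a Dot function to go with GetUnit and DistanceSquared:

    // Sketch: since v has unit length, the Rayleigh quotient v.(m*v)/(v.v)
    // reduces to a single dot product.
    Vector3 v = FindEigenVectorAssociatedWithLargestEigenValue(m);
    float lambda = Dot(v, m * v);   // eigenvalue of m with the largest magnitude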



