missingbytes: June 2012

Sunday 17 June 2012

The Power Iteration

In the middle of the last post, we needed to find the largest eigenvector of a symmetric 3x3 matrix. I used the Power Iteration to find the largest eigenvector, mostly because it's such a forgiving algorithm: Even a trivial implementation will still get a correct result.

In this post, I'm going to talk about how to actually write the Power Iteration, along with a few caveats along the way.

Eigenvectors and Eigenvalues

Okay, okay, okay, so what actually is an Eigenvector/Eigenvalue?

Lets take an even further step back, what is a matrix? In mathematics, we think of a matrix as representing a Linear Transformation. Intuitively, we might think of it like this : A matrix takes an incoming vector, and forms an output vector, where each output term is just a linear combination of the input terms.

[a, b, (3, (3*a + 5*b,

c, d] ^* 5) ⁼ 3*c + 5*d)

Lets consider a really simple matrix, called Scale79:

[7, 0,

0, 9]

You can see Scale79 takes a vector like (1,0) and returns (7,0). We say that (1,0) is an eigenvector of Scale79, with an Eigenvalue (λ) of 7. Notice that (2,0) is also an eigenvector, as is (3,0), (4,0) , (-4,0) and also (7,0) and (49,0)

Scale79 also takes the vector (0,1) to (0,9). We say that (0,1) is an eigenvector of Scale79, associated with the eigenvalue (λ) 9.

Our example matrix, Scale79, has two eigenvalues, 7 and 9. Its eigenvectors associated with the eigenvalue 7 look like (1,0), and its eigenvectors associated with the eigenvalue 9, look like (0,1)

Do all (square) matrices have eigenvalues? Well, yes and no. It's true that all n-by-n square matricies have n eigenvalues, but only if we allow ourselves the luxury of eigenvalues taking on complex values. Complex valued eigenvalues can sometimes be subtle to deal with, especially when using floating point math, so lets restrict ourselves to real numbers.

In the real-number-only case, it turns out that n-by-n square matrices have n eigenvalues, iff the matrix is symmetric.

This beautiful result, and so much more (including some technical details regarding multiplicity and repeated eigenvalues) is well beyond the scope of this blog, but I urge you to read about Jordan Normal Form for an introduction to this fascinating topic if you're interested to know more.

Let look at Scale79 a little closer. We know it has two eigenvalues, 7 and 9, and it scales eigenvectors like (1,0) and (0,1) by those eigenvalues.

Scale79 * (x,0)^T = 7 * (x, 0)^T

Scale79 * (0,y)^T = 9 * (0, y)^T

In some sense, we can replace a costly operation (matrix times vector) by a much simpler operation (scalar times vector), but only for some vectors!

Eigenvectors are like a sort of super matrix compression technology, but it only works some of the time!

Singular Value Decomposition

Before we go much further, lets get something out of the way. If ever you actually want to compute eigenvalues and eigenvectors for a matrix, you just need to use the library function called "Singular Value Decomposition" or S.V.D. Any half-way decent matrix library will provide an implementation which will factorize your matrix into a length-preserving matrix (U), a matrix of (generalized) eigenvalues for scaling (S), and another length-preserving matrix (V).

M = U . S . V (via S.V.D.)

Using this decomposition, it's trivial to pick off the largest eigenvalue (it's almost always the first one!), and its associated eigenvector (it's the first row of the V matrix)

For example, in GNU octave :

octave> scale79 = [7,0; 0,9]; [u,s,v] = svd(scale79)
u =
   0   1
   1 -0

s =
   9   0         <---- Largest Eigenvalue (λ = 9)
   0   7

v =
   0   1         <---- Eigenvector associated with 9
   1   0

Update: The SV Decomposition is only equivalent to the Eigen Decomposition if the eigenvalues are positive - best to use your library's eigen solver instead...

The Power Iteration

The s.v.d. is great for if we want all of the eigenvalues and eigenvectors. But sometimes the matrix M is too big to fit in memory, or perhaps we don't have handy access to a reliable s.v.d. implementation (e.g. on a GPU or an embedded controller)

The Power Iteration works by doing the simplest possible thing - it mindlessly multiplies the matrix by a vector (any vector), repeatedly, again and again, over and over until the vector becomes very close to an actual eigenvector:

M*v

M*(M*v)

M*(M*(M*v))

M*(M*(M*(M*v)))

M*(M*(M*(M*(M*v))))

Or, in Eigenvalue terms

λ*v

λ*(λ*v)

λ*(λ*(λ*v))

λ*(λ*(λ*(λ*v)))

λ*(λ*(λ*(λ*(λ*v))))

After many iterations, the vector v will tend to become closer and closer to an eigenvector associated with the largest eigenvalue. How quickly? Well after 5 iterations, we have λ⁵*v. And that's where the power method gets its name, we are raising the eigenvectors/matrix to a really large power, say, 10000 or more and then seeing what remains.

In theory, we can form λ¹⁰⁰⁰⁰*v, and it will be just as good an eigenvector as any other. In practice, we need to deal with floating point overflow. For this reason we normalize the vector after each iteration to prevent overflow.

In code, that looks like this :

    Vector3 v(1, 1, 1);
    for(int i=0; i<10000; i++)
    {
        v = (M * v).GetUnit();
    }
    return v;

Convergence

How fast will we find the eigenvector? In the limit, it actually depends on the ratio between the largest eigenvector and the second largest eigenvector. For the matrix scale79, this ratio is 7/9 or 0.7777...

This is because the smallest eigenvalues will quickly disappear into the round-off error, and in the end, it will just be the last two eigenvectors battling for supremacy.

In code, that looks like this :

    Vector3 v(1, 1, 1);
    Vector3 lastV = v;
    for(int i=0; i<10000; i++)
    {
        v = (M * v).GetUnit();
        if(DistanceSquared(v, lastV) < 1e-16f)
        {
            break;
        }
        lastV = v;
    }

If you've been playing along at home, you'll notice I've been talking about "the largest eigenvalue". In fact, this algorithm converges to the eigenvalue with the largest magnitude. e.g. if the two eigenvalues were -7 and -9, it will converge to an eigenvector associated with -9.

So the full name for the above code snipped is really:

FindEigenvectorAssociatedWithEigenvalueWithLargestMagnitudeAssumingSquareSymmetricMatrix(m);

Ouch! That's quite a mouthful! Please don't let the terms throw you off. All of this stuff is built on simple concepts that have deep roots in physics, chemistry, economics, etc etc. Because these are still relatively newish concepts, historically speaking, it makes sense that they don't (yet) map well to traditional natural languages such as English. Trust me, it's totally worth putting in the time to get an intuitive feel for these concepts because they come up again and again in so many different contexts.

Ask questions, break it down, ignore the clumsy verbiage and focus on the concepts, and it'll all make sense.

Preconditioning

Preconditioning is a technique we use when we want to make math problems easier to solve. We say, "rather than solve this difficult problem, lets solve this easier problem that has the same answer."

In our case, if iterating M*v will eventually converge to an eigenvector, then (M*M)*v will converge twice as fast to the same eigenvector (but to a different eigenvalue - the square of the original one!)

    float scale = FindLargestEntry(m);
    Matrix33 matrixConditioned = m / scale;
    matrixConditioned = matrixConditioned * matrixConditioned;
    matrixConditioned = matrixConditioned * matrixConditioned;
    matrixConditioned = matrixConditioned * matrixConditioned;

We have to be careful that the matrix multiply itself doesn't overflow! By dividing through by the largest entry, we have good confidence we can raise the matrix to the 8th power and still keep the structure of the matrix.

At this point you might be tempted to skip the normalization step, and divide through by the determinant (or some multiplier) instead. Be careful - these two techniques, while having the same effect, are actually solving two different problems. If you try and solve both at once you may find you solve neither. In particular, if your matrix has a particularly fiendish condition called "mixing of the modes", it can wreak all sorts of havoc with your computation.

Putting it All Together

So now we have all the pieces for our Power Iteration. The primary motivation for this algorithm is ease-of-coding, so keep it simple - if you find yourself putting time into making this algorithm better, you're almost always better off using a library implementation of SVD instead.

And remember, this function only works well for symmetric matrices!

// From http://missingbytes.blogspot.com/2012/06/power-iteration.html
// note: This function will perform badly if the largest eigenvalue is complex
Vector3 FindEigenVectorAssociatedWithLargestEigenValue(const Matrix33 &m){
    //pre-condition
    float scale=FindLargestEntry(m);
    Matrix33 mc=m*(1.0f/scale);
    mc=mc*mc;
    mc=mc*mc;
    mc=mc*mc;
    Vector3 v(1,1,1);
    Vector3 lastV=v;
    for(int i=0;i<100;i++){
        v=(mc*v).GetUnit();
        if(DistanceSquared(v,lastV)<1e-16f){
            break;
        }
        lastV=v;
    }
    return v;
}

Saturday 9 June 2012

Fitting a plane to a point cloud

A buddy of mine is trying to find the plane that best describes a cloud of points, and, naturally, my very first thought is, "Wow... that would make an awesome blogpost!"

The Problem

Given a collection of points in 3D space, we're trying to find the plane that is the closest to those points.

minimize( sum ( distance²(point, plane), point), plane)

If you're anything like me, you're probably wondering why we sum the squares of the distances.

The reasoning is a little backwards, but basically it's so that we can solve it by using the method of Linear Least Squares. It turns our approximation problem into a so-called "Quadratic-Form", which we know from theory we can solve exactly and efficiently using linear algebra.

There are other ways we could have phrased the problem. One of my favorites involves using Robust Statistics to ignore outliers. Another involves using radial basis functions to try and trap the plane within a region - this works great when we're not sure what type of function best approximates the points. Yet another approach involves perturbing and clustering the points to re-weight the problem.

Something I find fascinating is that most of those methods eventually require a linear least squares solver.

[Update 2015-11: I have a suspicion that finding the 'maximum likelihood estimator' for this particular problem doesn't require the use of LLS - can any kind reader confirm or deny?]

So lets start by forming the linear least squares approximation, and then later we can see if we still need to extend it. (Of course, if you'd like to know more, why not leave a comment in the comments section down below?)

Linear Least Squares

Let's start by defining our plane in implicit form:

C = Center of Plane

N = Plane's normal

C + X.N = 0

Then for an arbitrary point P, we can write it in 'plane' co-ordinates like so :

P = C + μN + pN^⟂

Here μ is the distance from the point to the plane, and N^⟂ is a 2-by-3 matrix representing the perpendicular to the plane's normal, and p is a 2-vector of co-factors.

We are trying to minimize the following :

E = sum( μ², points)

With a little math, we can show that

C = sum ( P, points ) / count ( points )

With a lot more math, we can show that N is the eigenvector associated with the smallest eigenvalue of the following matrix :

M = sum [ (P_x-C_x).(P_x-C_x), (P_x-C_x).(P_y-C_y), (P_x-C_x).(P_z-C_z),

(P_y-C_y).(P_x-C_x), (P_y-C_y).(P_y-C_y), (P_y-C_y).(P_z-C_z),

(P_z-C_z).(P_x-C_x), (P_z-C_z).(P_y-C_y), (P_z-C_z).(P_z-C_z)]

Finding the Center

Lets find the C first, that's the Center of the plane. In the mathematical theory, it doesn't really make sense to talk about the center of an infinite plane - any point on the plane, and any multiple of the plane's normal describe the same plane. Indeed, it's tantalizingly seductive to describe a plane by it's normal, N, and its distance to the origin, C.N.

But for those of us fortunate enough to be trapped inside a computer, we must face the realities of floating point numbers and round off error. For this reason, I always try to represent a plane using a Center and a Normal, until I've exhausted all other available optimizations.

 def FindCenter(points):

    sum = Vector3(0, 0, 0) 

    for p in points:

       sum += p

    return sum / len(points)

Finding the Normal

Now lets find the normal. You'll see this type of computation fairly often when dealing with quadratic forms, lots of square terms on the diagonal, and lots of cross terms off the diagonal.

As it's a symmetric matrix, we only need to compute half of the off-diagonal terms : X.{X,Y,Z} Y.{Y,Z} Z.{Z}

 def FindNormal(points):

    sumxx = sumxy = sumxz = 0

    sumyy = sumyz = 0

    sumzz = 0

    center = FindCenter(points) 

    for p in Points:

        dx = p.X - center.X

        dy = p.Y - center.Y

        dz = p.Z - center.Z

        sumxx += dx*dx

        sumxy += dx*dy

        sumxz += dx*dz

        sumyy += dy*dy

        sumyz += dy*dz

        sumzz += dz*dz

    symmetricM = Matrix33( \

           sumxx,sumxy,sumxz, \

           sumxy,sumyy,sumyz, \

           sumxz,sumyz,sumzz)

    return FindSmallestMumbleMumbleVector(symmetricM)

Note that we've had to make two passes through the points collection of points - once to find the center, and a second pass to form the matrix. In theory, we can compute both at the same time using a single pass through the points collection. In practice, the round-off error will obliterate any precision in our result. Still, it's useful to know it's possible if you're stuck on a geometry shader and only have one chance to see the incoming vertices.

The smallest Eigenvalue

So now we just need to find an eigenvector associated with the smallest eigenvalue of a matrix.

If you recall, for a (square) matrix M, an eigenvalue, λ, and an eigenvector, v are given by:

Mv = λv

That is, the matrix M, operating on the vector v, simply scales the vector by λ.

Our 3x3 symmetric matrix will have 1, 2 or 3 (real) eigenvalues ^{(see appendix!)}, and we need to find the smallest one. But watch this:

M^-1v = λ^-1v

We just turned our smallest eigenvalue of M into the largest eigenvalue of M^-1!

In code :

def FindEigenvectorAssociatedWithSmallestEigenvalue(m):

   det = m.GetDeterminant()

   if det == 0:

       return m.GetNullSpace()

   mInverse = m.GetInverse() 

   return FindEigenvectorAssociatedWithLargestEigenvalue(mInverse)

The largest Eigenvalue

Computing eigenvalues exactly can be a little tricky to code correctly, but luckily forming an approximation to the largest eigenvalue is really easy:

def FindEigenvectorAssociatedWithLargestEigenvalue(m):

   v = Vector(1, 1, 1) 

   for _ in xrange(10000):

       v = (m * v).GetNormalized()

   return v

Sucesss!

So there we have it, linear least squares in two passes! All that's left to do is write the code!

Appendix - Linear Least Squares plane for a Point Cloud in C++

The following 3 functions for linear least squares are hereby licensed under CC0.

// From http://missingbytes.blogspot.com/2012/06/fitting-plane-to-point-cloud.html
float FindLargestEntry(const Matrix33 &m){
    float result=0.0f;
    for(int i=0;i<3;i++){
        for(int j=0;j<3;j++){
            float entry=fabs(m.GetElement(i,j));
            result=std::max(entry,result);
        }
    }
    return result;
}

// From http://missingbytes.blogspot.com/2012/06/fitting-plane-to-point-cloud.html
// note: This function will perform badly if the largest eigenvalue is complex
Vector3 FindEigenVectorAssociatedWithLargestEigenValue(const Matrix33 &m){
    //pre-condition
    float scale=FindLargestEntry(m);
    Matrix33 mc=m*(1.0f/scale);
    mc=mc*mc;
    mc=mc*mc;
    mc=mc*mc;
    Vector3 v(1,1,1);
    Vector3 lastV=v;
    for(int i=0;i<100;i++){
        v=(mc*v).GetUnit();
        if(DistanceSquared(v,lastV)<1e-16f){
            break;
        }
        lastV=v;
    }
    return v;
}

// From http://missingbytes.blogspot.com/2012/06/fitting-plane-to-point-cloud.html
void FindLLSQPlane(Vector3 *points,int count,Vector3 *destCenter,Vector3 *destNormal){
    assert(count>0);
    Vector3 sum(0,0,0);
    for(int i=0;i<count;i++){
        sum+=points[i];
    }
    Vector3 center=sum*(1.0f/count);
    if(destCenter){
        *destCenter=center;
    }
    if(!destNormal){
        return;
    }
    float sumXX=0.0f,sumXY=0.0f,sumXZ=0.0f;
    float sumYY=0.0f,sumYZ=0.0f;
    float sumZZ=0.0f;
    for(int i=0;i<count;i++){
        float diffX=points[i].X-center.X;
        float diffY=points[i].Y-center.Y;
        float diffZ=points[i].Z-center.Z;
        sumXX+=diffX*diffX;
        sumXY+=diffX*diffY;
        sumXZ+=diffX*diffZ;
        sumYY+=diffY*diffY;
        sumYZ+=diffY*diffZ;
        sumZZ+=diffZ*diffZ;
    }
    Matrix33 m(sumXX,sumXY,sumXZ,\
        sumXY,sumYY,sumYZ,\
        sumXZ,sumYZ,sumZZ);

    float det=m.GetDeterminant();
    if(det==0.0f){
        m.GetNullSpace(destNormal,NULL,NULL);
        return;
    }
    Matrix33 mInverse=m.GetInverse();
    *destNormal=FindEigenVectorAssociatedWithLargestEigenValue(mInverse);
}

Appendix - Citation Needed

Our 3x3 symmetric matrix is Hermitian, therefore all it's eigenvalues are real and it's Jordan Normal form looks like this:

[ λ₁, 0, 0,

0, λ₂, 0,

0, 0, λ₃ ]

Assume w.l.o.g. that λ₁ >= λ₂ >= λ₃.

If λ₁ == λ₂ == λ₃, then we have only one eigenvalue.

If λ₁ > λ₂ > λ₃, then we have three eigenvalues.

The only remaining cases both have two eigenvalues.

Saturday 2 June 2012

The Cocktail Party Effect (Part 2 of 2)

Late at night, when the boys are (finally) asleep, we like to be able to hear the voices of the people on the television without waking up the neighbors. In part one I described how to build a custom digital audio filter by specifying a frequency response and running an optimizer to determine the optimal coefficients.

This post looks at one of the actual filters I use on my television in more detail.

First up, here's the response curve for the filter.

White = Filter, Blue=Desired Response

You can see the extremely strong cut-off starting at 1000Hz, dropping very quickly, -10 dB at 300Hz, and -20dB at 50Hz. In the other direction, we have an almost flat response between 1000Hz and 5000Hz, and then a very gradual drop, -3dB at 10kHz.

This corresponds nicely with the human voice, which ranges between 800Hz - 8000Hz, with the majority of the sound energy between 1500Hz-4000Hz

Here's the actual code I use to set the desired frequency:

    float GetDesiredResponse(float freq)

    {

        float logFreq = log(freq);

        float cutOff0 = log(1000.0f);

        float cutOff1 = log(3400.0f);

        float cutOff2 = log(8000.0f);

        float result = 1.0f;

        if(logFreq  < cutOff0)

        {

            float factor = logFreq / cutOff0;

            result *= pow(factor, 10.0f);

        }

        if(logFreq > cutOff1)

        {

            float factor = 2.0f - logFreq / cutOff1;

            result *= pow(factor, 2.0f);

        }

        if(logFreq > cutOff2)

        {
            float factor = 2.0f - logFreq / cutOff2;
            result *= pow(factor, 2.0f);
        }
        return result;
    }

As you can see with this method, it's relatively easy to get precise control over the frequency response.

A couple quick notes:

It's important that the frequency response be a quasiconcave function. This ensures there are no kinks in the response, which will (1) cause visiting audiophiles to complain, and (2) make some voices more difficult to comprehend.

Be careful when specifying a very steep transition, or trying to completely stop-pass some frequencies. A digital filter has some pretty strict limits on the kinds of things it can filter. Like the proverbial genie, if you try and go there, the optimizer will give you exactly what you ask for.

I can't hear past ~17kHz, so I took extra care to test with high frequencies, but played back at half speed to make sure I wasn't torturing the cats and dogs.

This is also a good time to read up about Odd and Even functions, which correspond to odd/even numbers of coefficients in the polynomial. An Odd function, for example, will start at -∞ and rise to +∞. That's useful if you want a high-pass or low-pass filter, but bad for a band-pass filter.

Having problems with the y racing off to infinity? Your Y polynomial may have an unstable feedback loop. Try a longer or shorter filter, or add an A-stability criteria to your optimizer.

Weighting

So you're probably wondering why the filter matches the desired function so accurately over the vocal range, but seems to drift in other areas. Here's the weighting function I use to compute the error:

That is, I sample 129 frequencies logarithmically spaced between 9Hz to 18000Hz. For each frequency, I add the square of the L2 error, weighted by the square of (desired response + a quarter).

Why so much squaring? To help the function minimizer converge to a global minimum, it's nice to give big parabolas for the optimizer to slide inside.

Oh, and before I forget, here's the actual filter in C++, licensed under CC0:

    // From http://missingbytes.blogspot.com/2012/06/cocktail-party-effect-part-2-of-2.html
    static float x[5] = {0};
    for(int i=0; i<length; i++)
    {
        float sampleLeft = GetNextLeftSample();
        float sampleRight = GetNextRightSample();

        for(int j=4; j>0; j--)
        {
            x[j] = x[j-1];
        }
        x[0] = sampleLeft + sampleRight;

        // The filter!
        static float y1=0.0f, y2=0.0f, y3=0.0f, y4=0.0f;
        f32 y0=+0.588746*x[0]-0.492725*x[1]-0.757061*x[2]+0.661062*x[3]
            +1.242758*y1+0.545942*y2-1.044236*y3+0.238844*y4;
        y4=y3;y3=y2;y2=y1;y1=y0;

        float value = y0*volume;
        float outValue = bound(value, -32768, 32767);
        if(value != outValue){clippingCount++;}
        unsigned short emit = (unsigned short)outValue;
        *dest++ = emit;// Output left
        *dest++ = emit;// Output right
    }

Results

So what does it sound like? Well, for music, pretty bad actually :) But that's not the point! It's designed for voice, and that where it really shines - even though the voice quality sounds a little unnatural, you can understand what people are saying, even at very low volume levels.

(Oops, having some video encoding problems - updated video coming real soon now)

The Future

Can we do better than this? Yes, we can! When I finally get around to setting up a 7.1 audio system, I plan to measure the room response function using a calibrated microphone. Then it's a simple matter of taking the FFT of the incoming audio, dividing through by the room response, then taking the inverse fourier transform before sending to the sound card. Normally this would entail some delay/lag, but that's not a problem when playing buffered video. I could even have different responses for different listener locations, or dynamically respond to changes in temperature or humidity.

... watch this space for a future update.