Support Vector Machines - An Introduction

This one goes out to Bryan to explain what a support vector machine is, but first I must cover a little bit of background information first.

Pattern Recognition
Let's say that we are a farmer trying to decide if our crops are going to be healthy this season, or ridden with locusts. We know that the chance of locust infection is heavily dependent on a lot of factors such as the length of the winter, the previous harvest size, and the last season's locust amounts. Ideally we would like to feed this data, the 'feature vector', into a black box that would output an accurate prediction for the presence or absence of locusts this year.

Being a mathematically trained farmer, we would know that there exists toolsets to do just that, so we would root around in our filing cabinets pulling up data of locust infections from years past and the corresponding weather patterns and harvests to makeup a training data set.

Now for a bit of definitions, a classification rule is a mathematical function that takes a set of training data and produces a classifier. A classifier is a function that maps a single feature vector, or sample point, to a label. These labels can be yes/no, locust/no locust, healthy/unhealthy, or even consist of multiple classes.

As a farmer, we will take our training data, use a classification rule in order to produce a classifier, and then input this year's parameters into the classifier in order to obtain a prediction for this years locust population.

Now, a small aside, why do we even bother with this in the first place? Usually the need to automate this process of prediction comes down to speed and complexity. For instance, in a factory where we want to separate tuna from sea bass we may want to classify several hundred fish per second. Or more often the case, the predictions are done in such highly complex scenarios that humans cannot easily handle them intuitively (can you visualize hundred dimensional spaces?).

Which classification rule (thus classifier) we use is a complicated issue, but today we'll talk about support vector machines (SVMs).

Support Vector Machines
The essence of SVMs is simple, take the training data, and divide the data into two (for a two class classifier) using a straight line. Or in three dimensions, use a plane, and anything higher, use a hyper plane. The math works out the same, but it is easier to visualize in the 2D line case.

How do we determine where to draw the line? Simply pick the line that creates the largest gap (or margin) between the closest set of opposing points. This can be formalized into an optimization problem, which I won't get into how to solve here, and the resulting hyperplane is used as our classifier. To do so, we simply take the current data point (this year's data) and see if it is on the locust or non-locust side of the (hyper)plane. See the image to the right (credit to Wikimedia) for a nice visual.

Now what if the data overlaps? Then we have to move to nonlinear kernel based SVMs where we map our 'low' dimensional data into a higher dimensional space using a fancy kernel function and try to separate it there.

There are many technicalities and finer points here, but that is the essence of what SVM does.

Note to Bryan, Kevin, and Hope: Now, for hyperspectral imaging, each pixel of a hyperspectral image  contains hundreds of values, one for each frequency band, and these are usually taken far enough away from the object of interest, that each pixel represents the combination of several signatures (or endmembers). Thus, an image might be 5 meter resolution and contain the signature for both soil and corn crop, and we wish to extract each of those endmembers from the several hundred values in the pixel. Using SVM to accomplish this leads to SVM endmember extraction. :)