Machine learning/Classification algorithms

Classification is a subcategory of supervised learning problems.

k-nearest neighbor

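  • A simple classification algorithm.
  • Intuition: classify a new point by a majority vote among the training points closest to it.
  • This is a discriminative model: it predicts the label for a given input but does not model how the inputs themselves are distributed, so it cannot be used to generate data points.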

Algorithm

  • Define a distance or similarity metric. The simplest choice is Euclidean distance.
  • Given an input point x, find its k nearest neighbors in the training set.
  • Take a majority vote among these k neighbors and assign x to the category with the most votes.
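
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `euclidean` and `knn_classify`, the toy data, and the tie-breaking rule (the label seen first among the neighbors wins) are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(x, train_points, train_labels, k=3, distance=euclidean):
    """Return the majority label among the k training points nearest to x."""
    # Step 1 and 2: rank training indices by distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: distance(x, train_points[i]))
    # Keep the labels of the k nearest neighbors.
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Step 3: majority vote. On a tie, the label encountered first
    # among the neighbors is returned (an arbitrary choice for this sketch).
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data, made up for illustration: two clusters in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify((0.5, 0.5), points, labels, k=3))
```

Sorting all training points costs O(n log n) per query, which is fine for small data sets; larger ones usually call for a spatial index or a partial sort.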

Probabilistic interpretation

Treat the classification output as a random variable y. The probability of y given an input x and the training data D is defined as

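$$P(y \mid x, D) = \frac{1}{k} \, \bigl| \{\, i \in N_k(x) : y_i = y \,\} \bigr|$$

that is, the fraction of points x_i among the k nearest neighbors of x whose label y_i equals y. Here N_k(x) denotes the indices of those k nearest neighbors.

The output of the classification is

$$\hat{y} = \arg\max_{y} P(y \mid x, D)$$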
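
As a small illustration of this probabilistic view, the sketch below estimates P(y|x,D) as the fraction of the k nearest neighbors carrying each label and then takes the argmax. The names `knn_probabilities` and `knn_predict` and the toy data are hypothetical, chosen only for this example; Euclidean distance is assumed, computed with `math.dist` (Python 3.8 or later).

```python
import math
from collections import Counter


def knn_probabilities(x, train_points, train_labels, k=3):
    """Estimate P(y | x, D) as the fraction of the k nearest neighbors with label y."""
    # Rank training indices by Euclidean distance to x and keep the k nearest.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(x, train_points[i]))
    neighbors = order[:k]
    counts = Counter(train_labels[i] for i in neighbors)
    # Labels that do not occur among the neighbors implicitly have probability 0.
    return {label: count / len(neighbors) for label, count in counts.items()}


def knn_predict(x, train_points, train_labels, k=3):
    """Return the label that maximizes the estimated P(y | x, D)."""
    probs = knn_probabilities(x, train_points, train_labels, k)
    return max(probs, key=probs.get)


# Toy data, made up for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 2.0), (5.0, 5.0)]
labels = ["a", "a", "b", "b", "b"]

probs = knn_probabilities((0.2, 0.2), points, labels, k=3)
print(probs)
print(knn_predict((0.2, 0.2), points, labels, k=3))
```

With k = 3, two of the three nearest neighbors of the query point are labeled "a", so the estimate is 2/3 for "a" and 1/3 for "b", and the prediction is "a". The majority vote of the basic algorithm is exactly this argmax.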