Machine learning/Classification algorithms

From Wikiversity


Classification is a subcategory of supervised learning problems in which each input is assigned to one of a discrete set of categories.

k-nearest neighbor

k-nearest neighbor (k-NN) is a simple classification algorithm.
Intuition: classify a new point by a majority vote among the training points closest to it.
This is a discriminative model: it predicts the label given the input, but provides no way to generate data points.

Algorithm

Define a distance metric or similarity metric. The simplest choice is Euclidean distance.
Given an input point $x$ , find its $k$ nearest neighbors in the training set.
Take a majority vote among these nearest neighbors and classify the input point as the category with the highest number of votes.
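
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `knn_classify`, the toy data, and the default `k=3` are assumptions made for the example, and Euclidean distance is used because it is the simplest case mentioned above.

```python
import math
from collections import Counter

def knn_classify(x, train_points, train_labels, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Step 1: distance from x to every training point (Euclidean distance here).
    distances = [
        (math.dist(x, point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k training points closest to x.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters in the plane.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify((0.5, 0.5), train_points, train_labels, k=3))  # prints "a"
```

The algorithm as stated does not say how to break ties in the vote; this sketch simply leaves that to `Counter.most_common`. Choosing an odd $k$ avoids ties when there are only two categories.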

Probabilistic interpretation

Treat the classification output as a random variable $y$ . The probability of $y$ given the input $x$ and the training data $D$ is defined as

$$P(y \mid x, D) = \text{fraction of the } k \text{ nearest neighbors } x_{i} \text{ of } x \text{ such that } y_{i} = y$$

The output of the classification is

$$\hat{y} = \arg\max_{y} P(y \mid x, D)$$
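
As a hedged sketch of this interpretation, the snippet below estimates $P(y|x,D)$ as the fraction of the $k$ nearest neighbors carrying each label and then takes the arg max. The names `knn_class_probabilities` and `knn_predict` and the one-dimensional toy data are assumptions made for illustration; Euclidean distance is again assumed.

```python
import math
from collections import Counter

def knn_class_probabilities(x, train_points, train_labels, k=3):
    """Return {label: fraction of the k nearest neighbors of x with that label}."""
    # Rank the training pairs (x_i, y_i) by distance from x.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(x, pair[0]),
    )
    neighbor_labels = [label for _, label in ranked[:k]]
    counts = Counter(neighbor_labels)
    # Divide by the number of neighbors actually found (equals k when k <= len(D)).
    return {label: count / len(neighbor_labels) for label, count in counts.items()}

def knn_predict(x, train_points, train_labels, k=3):
    """Return the label y that maximizes the estimated P(y | x, D)."""
    probs = knn_class_probabilities(x, train_points, train_labels, k)
    return max(probs, key=probs.get)

# Hypothetical one-dimensional toy data.
train_points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
train_labels = ["a", "a", "b", "b", "b"]

probs = knn_class_probabilities((1.4,), train_points, train_labels, k=3)
print(probs)
print(knn_predict((1.4,), train_points, train_labels, k=3))  # prints "a"
```

Here the three nearest neighbors of 1.4 are the points 1.0, 2.0 and 0.0, with labels a, b and a, so the estimate is 2/3 for a and 1/3 for b. Labels that do not occur among the neighbors are omitted from the dictionary, which corresponds to a probability of zero.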

Read more about the probabilistic interpretation here:

https://www.cc.gatech.edu/~afb/classes/CS7616-Spring2014/slides/CS7616-13a-PKNN.pdf
