Methods such as kNN and Parzen-Windowing are called nonparametric. There are also nonparametric *Bayesian* methods such as Gaussian processes and Dirichlet processes. Ironically, these methods do have parameters that can be tuned. However, they distinguish themselves from parametric methods in the sense that the complexity of their hypothesis space (roughly, the number of essentially different hypotheses) grows as the number of data instances grow. This means that these models can not be compressed into a fixed set of parameters.

What about kernel machines? On the one hand, an SVM with a linear kernel just entertains a linear separation boundary. This must therefore be parametric. The primal formulation makes this clear. On the other hand, if we use kernels based on an infinite set of features, then kernel machines can grow their complexity based on the number of data instances (the situation is slightly complicated due to the fact that kernel methods need regularization terms.) This can intuitively be seen from the fact that prediction in this case looks a lot like NN: f(t) = sum_{SV:i} a_i K(t,x_i) y_i with K(.,.) the kernel and y_i the label for train-case i and a_i is the weight for support vector i obtained from solving the dual SVM optimization problem.

Now what if we have a kernel based on M > N features but we grow N to be bigger than M? We start of with something that behaves non-parametrically but then as N > M the modeling capacity is reached and the model behaves parametrically. So perhaps the boundary between parametric and non-parametric methods is not all that clear-cut after all.

## Thursday, September 29, 2011

Subscribe to:
Posts (Atom)