Well, it doesn't. PCA simply projects the data into a lower dimensional space using a linear projection. It completely ignores the labels of the data-cases (if there are any) in doing so. Instead it finds the directions of highest variance. It is an example of unsupervised learning. Kernel-PCA is more powerful in the sense that it can deal with nonlinear projections. Even better are the more recent nonlinear embedding methods such as isomap, LLE, maximum variance unfolding etc.
If you want to project in such a way that data-cases of different classes are maximally separated in the projection then another technique is available: Fisher Linear Discriminant Analysis (FLDA). Also in this case, kernelised versions are available.