Learning-Kernel

We investigate a series of kernel learning problems with polynomial combinations of base kernels, which we apply to regression and classification. We also report numerical experiments with the learned polynomial kernels on regression and classification tasks across several datasets.

Introduction

The study of kernel learning has spawned a panoply of fascinating research in many important areas. In this project, we study diverse methods for learning linear and polynomial combinations of kernels in regression and classification settings.

In the first part, we consider the problem of learning the kernel for Kernel Ridge Regression. Starting from the dual formulation, one can derive several gradient-descent-type algorithms, depending on the family of kernels chosen and on possible regularizations. Algorithms of this type were first proposed in a very general setting.
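To make the dual formulation concrete: for a nonnegative linear combination $K_\mu = \sum_k \mu_k K_k$ of base Gram matrices, the kernel learning problem reduces to minimizing $F(\mu) = y^T (K_\mu + \lambda I)^{-1} y$ over the feasible set of weights, and the gradient is available in closed form as $\partial F / \partial \mu_k = -\alpha^T K_k \alpha$ with $\alpha = (K_\mu + \lambda I)^{-1} y$. Below is a minimal NumPy sketch of this objective and gradient; the function name and the convention of passing precomputed Gram matrices are illustrative choices, not the project's API.

```python
import numpy as np

def dual_objective_and_grad(mu, base_kernels, y, lam):
    """Dual KRR objective F(mu) = y^T (K_mu + lam*I)^{-1} y for the
    combination K_mu = sum_k mu_k * K_k, together with its gradient.

    base_kernels: list of precomputed m x m Gram matrices (an assumption
    about how the base kernels are supplied)."""
    m = y.shape[0]
    K_mu = sum(w * K for w, K in zip(mu, base_kernels))
    alpha = np.linalg.solve(K_mu + lam * np.eye(m), y)  # dual variables
    F = y @ alpha
    # d/dmu_k of y^T (K_mu + lam*I)^{-1} y equals -alpha^T K_k alpha
    grad = np.array([-alpha @ K @ alpha for K in base_kernels])
    return F, grad
```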

Starting from the general setting, we look at different algorithms that solve the kernel learning problem for the families of kernels we consider. We analyze the Interpolated Iterative Algorithm (IIA) and the Projection-Based Gradient Descent algorithm (PGD). For the latter, we provide further detail on its convergence. We then turn to a slightly modified optimization problem and derive a Regularized Interpolated Iterative Algorithm (rIIA) for the linear case and a Regularized Projection-Based Gradient Descent algorithm (rPGD2) for the polynomial case. Finally, we briefly discuss the generalization error for this learning problem.
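As an illustration of the projection-based scheme, the sketch below alternates a gradient step on the dual objective above with a Euclidean projection of the weights back onto the feasible set. For concreteness we project onto the probability simplex using the standard sort-based method; the actual constraint set in the report (e.g. a norm ball around an initial $\mu_0$) may differ, as may the step size and stopping rule.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {mu : mu >= 0, sum(mu) = 1}
    via the standard sort-based method."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def pgd(base_kernels, y, lam, eta=0.1, iters=100):
    """Projection-based gradient descent on the kernel weights mu:
    gradient step on F(mu) = y^T (K_mu + lam*I)^{-1} y, then projection."""
    m = y.shape[0]
    mu = np.full(len(base_kernels), 1.0 / len(base_kernels))
    for _ in range(iters):
        K_mu = sum(w * K for w, K in zip(mu, base_kernels))
        alpha = np.linalg.solve(K_mu + lam * np.eye(m), y)
        grad = np.array([-alpha @ K @ alpha for K in base_kernels])
        mu = project_simplex(mu - eta * grad)
    return mu
```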

The above algorithms are then tested on several UCI datasets. We report the results from our implementation and comment on them briefly. Finally, we ask how the kernels learned with the above algorithms perform when plugged into an SVM. Some empirical results are reported and discussed.

More empirical results are reported in the appendix, together with a more detailed proof of the proposition. We also discuss some ideas from manifold optimization that could be used in place of the PGD algorithm presented here.

Algorithms for Kernel Learning

Kernel Ridge Regression

We consider the problem of learning the kernel for Kernel Ridge Regression (KRR). Let $S = \{(x_1,y_1),\dots,(x_m,y_m)\}$ be the training sample, $y = [y_1,\dots,y_m]^T\in\mathbb{R}^m$ the vector of training labels, and $\Phi(x)\in \mathbb{R}^d$ the feature vector associated with an input point $x\in \mathbb{R}^n$.
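With a fixed kernel, the KRR solution in the dual is $\alpha = (K + \lambda I)^{-1} y$, and the learned function is $f(x) = \sum_{i=1}^m \alpha_i K(x_i, x)$; the kernel learning algorithms above wrap an outer optimization over the kernel around this inner solve. A minimal sketch, assuming the $\lambda I$ regularization convention (some texts scale the identity by $m\lambda$ instead):

```python
import numpy as np

def krr_fit(K_train, y, lam):
    """Dual KRR fit: alpha = (K + lam*I)^{-1} y."""
    return np.linalg.solve(K_train + lam * np.eye(len(y)), y)

def krr_predict(K_test_train, alpha):
    """Predictions f(x) = sum_i alpha_i K(x_i, x); K_test_train is the
    n_test x m matrix of kernel values between test and training points."""
    return K_test_train @ alpha
```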
