Machine learning with kernel methods

Jean-Philippe Vert, Mines ParisTech

MSc Mathematics, Vision, Learning (ENS Cachan), Spring 2010

Results

Slides (last update: Feb 26, 2010)

Internships / PhD

I post here the internship and PhD proposals I receive that may be of interest to students. If you would like to post a proposal, just send me an e-mail.

Outline

Many problems in computational biology and chemistry can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors. Indeed, objects such as gene sequences, small molecules, protein 3D structures or phylogenetic trees, to name just a few, have particular structures that contain relevant information for the statistical problem but can hardly be encoded into finite-dimensional vector representations.

Kernel methods are a class of algorithms well suited to such problems. Indeed, they extend the applicability of many statistical methods initially designed for vectors to virtually any type of data, without the need for explicit vectorization of the data. The price to pay for this extension to non-vectors is the need to define a positive definite kernel between the objects, which is formally equivalent to an implicit vectorization of the data. The "art" of kernel design for various objects has witnessed important advances in recent years, resulting in many state-of-the-art algorithms in computational biology and chemistry, as well as in many other fields.
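To make the idea of an implicit vectorization concrete, here is a minimal sketch in Python of a kernel between sequences. It is not taken from the course material: the choice of a k-mer counting ("spectrum") kernel, the function names, the DNA alphabet and the toy sequences are all assumptions made for illustration. The kernel value is computed directly from the two strings, yet it equals the inner product of explicit k-mer count vectors, so any Gram matrix it produces is positive semidefinite.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=2):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Kernel between two strings: sum over shared k-mers of the product
    of their counts. No explicit feature vector is ever built."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(n * ct[kmer] for kmer, n in cs.items())


def explicit_features(seq, k=2, alphabet="ACGT"):
    """The vectorization the kernel uses implicitly: one coordinate per
    possible k-mer over the (assumed) alphabet."""
    counts = kmer_counts(seq, k)
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_form(gram, c):
    """Compute sum_ij c_i c_j K_ij, which is nonnegative for a p.d. kernel."""
    n = len(c)
    return sum(c[i] * c[j] * gram[i][j] for i in range(n) for j in range(n))


# Toy sequences, invented for illustration only.
sequences = ["ACGTAC", "CGTACG", "TTTTAA"]
gram = [[spectrum_kernel(s, t) for t in sequences] for s in sequences]

if __name__ == "__main__":
    for row in gram:
        print(row)
    # The kernel agrees with the inner product of the explicit vectors.
    print(dot(explicit_features("ACGTAC"), explicit_features("CGTACG")))
```

With k = 2 and a four-letter alphabet the explicit vectors have only 16 coordinates, but the number of coordinates grows exponentially with k, whereas the kernel computation only ever touches the k-mers that actually occur in the two sequences.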

The goal of this course is to present the mathematical foundations of kernel methods, as well as the main approaches that have emerged so far in kernel design. The relevance of these methods will be illustrated by several examples in computational biology and chemistry.

Schedule and Homework

Lectures usually take place in room C102 (ENS Cachan, building Cournot).

Homework is due at the beginning of the following lecture, as a hard copy or (better) by e-mail to Jean-Philippe.Vert@ensmp.fr. Implementations can be done in the programming language of your choice, e.g., the free R language, or Matlab and its free clone Octave.

Date                      | Topic                                                    | Slides  | Homework   | Data
Jan 14, 2010, 11h-13h30   | Positive definite kernels, RKHS                          | 1-36    | Homework 1 |
Jan 21, 2010, 11h-13h30   | Kernel trick                                             | 37-52   | Homework 2 |
Feb 4, 2010, 11h-13h30    | Representer theorem, kernel PCA, kernel ridge regression | 53-84   | Homework 3 | xtrain.txt, ytrain.txt, xtest.txt, ytest.txt
Feb 11, 2010, 11h-13h30   | Pattern recognition, support vector machines             | 85-134  | Homework 4 |
Feb 23, 2010, 10h30-13h   | SVM, Mercer kernels, RKHS and Green functions            | 135-173 | Homework 5 | data.txt, labels.txt
Mar 9, 2010, 10h30-13h    | Semigroup kernels                                        | 174-203 | Homework 6 |
Mar 16, 2010, 11h-13h30   |                                                          |         |            |
Mar 30, 2010, 10h30-13h   |                                                          |         |            |
Mar 30, 2010, 14h-16h30   |                                                          |         |            |

Results

The final grade will be the average of the homework grades.
Vert Jean-Philippe
Last modified: Wed May 28 12:01:55 CEST 2008