Artificial Intelligence: Introduction to Machine Learning

COORDINATOR

Michel Riveill, PR Université Côte d'Azur, Polytech I3S

LOCATION

Tutorials at the Valrose Campus
Videoconference lectures remotely held from the SophiaTech Campus

Prerequisites

Python programming (see details below)

ABOUT THIS MINOR

This minor is also open to students from the DS4H and SPECTRUM graduate schools.
Summary

Broadly speaking, machine learning (ML) is the scientific field concerned with building models and inferring knowledge by applying algorithms to data. The process therefore involves the (statistical) analysis of data and the design of models, possibly predictive ones. These tasks are fundamental to modern science in general, and to biology and medicine in particular. This course provides an introduction to ML by reviewing its fundamental principles and methods.

Each lecture will be accompanied by a hands-on practical (in Python), during which datasets of biological and/or medical importance will be processed. Doing so will provide a unique opportunity to assess the performance of the various methods studied (running time, stability, sensitivity to noise/outliers, etc.) and to think critically about the quality of models in biology/medicine.

The datasets used during the practicals will cover the main classes of data used in modern biology, at all scales (individual molecules, cells, organs, individuals).

General introduction

This lecture will introduce the main ingredients of ML, namely the different classes of problems, the data involved in such processes, the main classes of algorithms, and the learning process.

Topics:

  • Data types
  • Supervised vs. unsupervised learning
  • Taxonomy of algorithms
  • Software platforms and languages


Practical/potential applications:

  • Data manipulations
  • Model complexity and under/overfitting
  • The bias-variance trade-off
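
The under/overfitting behaviour can be illustrated with a minimal NumPy sketch; the sinusoidal toy data and the chosen polynomial degrees are illustrative assumptions, not course material:

```python
import numpy as np

# Toy 1-D regression data: y = sin(x) + noise (an illustrative example).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 3.0, 20))
y_train = np.sin(x_train) + rng.normal(0.0, 0.1, 20)
x_test = np.sort(rng.uniform(0.0, 3.0, 20))
y_test = np.sin(x_test) + rng.normal(0.0, 0.1, 20)

def fit_and_score(degree):
    """Fit a polynomial of the given degree by least squares;
    return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 1 underfits (high bias); a very high degree drives the training
# error down but can inflate the test error (high variance).
for degree in (1, 3, 10):
    train_mse, test_mse = fit_and_score(degree)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, "
          f"test MSE = {test_mse:.4f}")
```

Because the polynomial models are nested, the training error can only decrease with the degree; the test error is what reveals the bias-variance trade-off.
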
Regression with the linear model

Regression is the problem of predicting a response value from explanatory variables. This course will cover the basics of the method, including variable selection and the design of sparse models.

Topics:

  • Linear regression and least squares
  • Errors and model adequacy
  • Sparse models


Practical/potential application:
The prostate cancer dataset (cf. The Elements of Statistical Learning)
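
A minimal sketch of least squares vs. a sparse (L1-penalised) model, assuming scikit-learn is available; the synthetic data below is a hypothetical stand-in for the prostate cancer dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

# Synthetic stand-in for the prostate data (hypothetical):
# only the first 2 of 6 features actually influence the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=100)

ols = LinearRegression().fit(X, y)   # ordinary least squares
lasso = Lasso(alpha=0.1).fit(X, y)   # L1-penalised least squares

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
```

The L1 penalty shrinks the four irrelevant coefficients towards exactly zero, yielding a sparse model, whereas OLS keeps small non-zero values for all six features.
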

Classification with logistic regression

Logistic regression is a supervised classification algorithm used to model the probability that an observation belongs to a given class. To do so, the log-odds of this probability are modelled as a linear function of the features.

Topics:

  • Classification using linear models
  • The logistic regression


Practical/potential application:
The South African heart disease dataset (cf. The Elements of Statistical Learning)
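
A minimal sketch of logistic regression with scikit-learn (assumed available); the two Gaussian clouds below are a hypothetical stand-in for the heart-disease data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two synthetic classes: Gaussian clouds centred at (-1, -1) and (+1, +1).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.repeat([0, 1], 50)

clf = LogisticRegression().fit(X, y)
# The model estimates P(class | x) as the logistic of a linear score.
proba = clf.predict_proba([[2.0, 2.0]])[0, 1]
print(f"P(class 1 | x = (2, 2)) = {proba:.3f}")
print(f"training accuracy = {clf.score(X, y):.2f}")
```

A point deep inside the class-1 cloud receives a probability close to 1, reflecting its large linear score.
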

Support Vector Machines (SVM)

SVMs are a popular and robust class of models for supervised classification. The main difficulties are dealing with classes that are partially mixed (e.g. due to noise) and with class boundaries that have a complex geometry.

Topics:

  • Linear separability and support vectors
  • Soft-margin separators
  • Kernels and non-linear separation
  • Multiclass classification


Practical/potential application:
Classification of protein and DNA sequences (paper: Biological applications of support vector machines)
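
A minimal sketch contrasting a linear and a kernel (RBF) SVM, assuming scikit-learn is available; the two-moons toy dataset stands in for sequence data after feature extraction:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: a classic example of classes whose
# boundary has a non-linear geometry.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print(f"linear kernel, training accuracy: {linear.score(X, y):.2f}")
print(f"RBF kernel,    training accuracy: {rbf.score(X, y):.2f}")
print(f"support vectors used by the RBF model: {rbf.n_support_.sum()}")
```

The kernel trick lets the soft-margin separator act in an implicit high-dimensional feature space, which is why the RBF model can follow the curved boundary that defeats the linear one.
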

Linear Discriminant Analysis

LDA is another supervised classification algorithm, which uses a linear combination of features to define the boundaries separating two or more classes. This lecture will introduce LDA and compare it to the so-called Naive Bayes classifier.

Topics:

  • Naive Bayes classifier
  • LDA


Practical/potential application:
Those of lectures 4 and 5
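
The two classifiers can be compared in a few lines, assuming scikit-learn is available; the bundled iris dataset is used here purely for convenience:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Compare LDA with Gaussian Naive Bayes under 5-fold cross-validation.
X, y = load_iris(return_X_y=True)

for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("Naive Bayes", GaussianNB())]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold CV accuracy = {acc:.3f}")
```

Both model class-conditional Gaussians; Naive Bayes additionally assumes the features are independent within each class, while LDA assumes a shared covariance matrix across classes.
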

CART / Decision Tree / Random Forest 

Tree-based models partition the data space to exploit local properties of the data, and can be used for both regression and classification. Multiple trees can also be combined to compensate for the arbitrariness of the partitioning induced by a single tree.


Topics:

  • Classification And Regression Trees (CART)
  • Decision-tree-based classification
  • Tree induction and split rules
  • Ensembles of decision trees and random forests

Practical/potential application:
The iris dataset + datasets used in lectures 2 and 3.
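
A minimal sketch on the iris dataset, assuming scikit-learn is available, comparing a single tree to an ensemble of randomized trees:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A single CART tree vs. an ensemble of 100 randomized trees.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

tree_acc = cross_val_score(tree, X, y, cv=5).mean()
forest_acc = cross_val_score(forest, X, y, cv=5).mean()
print(f"single decision tree, mean CV accuracy: {tree_acc:.3f}")
print(f"random forest,        mean CV accuracy: {forest_acc:.3f}")
```

Averaging over many trees, each grown on a bootstrap sample with random feature subsets, smooths out the arbitrary axis-aligned partitions of any single tree.
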

Clustering (k-means, hclust)

In an unsupervised context, clustering aims at grouping the data into homogeneous groups by minimizing the intra-group variance. This fundamental task is surprisingly challenging due to several difficulties: the (generally) unknown number of clusters, clusters whose boundaries have a complex geometry, overlapping clusters (due to noise), high-dimensional data, etc. This class will present two main clustering techniques:

Topics:

  • k-means and k-means++
  • Hierarchical clustering


Practical/potential applications:
Clustering molecular conformations
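
A minimal k-means sketch, assuming scikit-learn is available; the three Gaussian blobs are a hypothetical stand-in for conformations described by two features:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])

# k-means minimises the within-cluster (intra-group) variance;
# n_init=10 restarts guard against bad random initialisations.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
print(f"within-cluster sum of squares (inertia): {km.inertia_:.1f}")
```

Note that the number of clusters (3) is given to the algorithm here; in practice it is usually unknown, which is one of the difficulties mentioned above.
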

Dimension reduction (PCA, t-SNE)

Dimensionality reduction methods aim at embedding high-dimensional data into a lower-dimensional space while preserving specific properties, such as pairwise distances or the data spread. The field originated with the celebrated Principal Component Analysis (PCA) method; more recent methods focus on data lying on non-linear spaces.

Topics:

  • Principal Component Analysis (PCA)
  • t-distributed Stochastic Neighbor Embedding (t-SNE)


Practical/potential applications:
Cell classification from RNA-seq data (cf. papers by Dana Pe'er)
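
A minimal PCA sketch, assuming scikit-learn is available; the bundled iris dataset is used here in place of RNA-seq data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Embed the 4-dimensional iris measurements into 2 dimensions while
# preserving as much of the data spread (variance) as possible.
X, y = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

print("embedded shape:", X2.shape)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

The explained variance ratio tells how much of the original spread survives the projection; for genuinely non-linear structure, t-SNE is the usual alternative.
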

Lecturers
Prerequisites

Python programming
English comprehension: level B1 recommended
Pedagogical resources
Bibliography
  • Introduction to Machine Learning, E. Alpaydin
  • The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman
  • Machine Learning: A Probabilistic Perspective, K. Murphy
  • Learning with Kernels, B. Schölkopf and A. Smola
  • Python Data Science Handbook, J. VanderPlas