Lebeau, Hugo (2021) A random matrix analysis of on-line learning PFE - Project Graduation, ENSTA.



This work is an analysis of kernel spectral clustering in streaming context. Assuming data x1, x2, . . . comes as a continuous flow and only a few number L of them can be kept in the learning pipeline, the Gram kernel matrix can only be computed around its diagonal: K = X⊤ X ⊙ T where X ∈ Rp×n is the data matrix and T ∈ {0, 1}n×n is a mask that keeps only the elements which can be computed in this on-line setting. As n, p, L → +∞ with p/n → c ∈ ]0, +∞[ and (2L − 1)/n → ε ∈ ]0, +∞[, we show how the limiting spectral distribution of K can be computed and we study the behavior of isolated eigenvalues and eigenvectors, which carry the information. This analysis reveals that c affects much more the performance than ε, which can fortunately be kept small. We detail how to perform on-line kernel spectral clustering and apply our algorithm to an image classification task. This work is a new step towards a better understanding of machine learning in order to make cost-efficient algorithms.

Item Type:Thesis (PFE - Project Graduation)
Subjects:Mathematics and Applications
ID Code:8914
Deposited By:Hugo Lebeau
Deposited On:30 sept. 2021 11:03
Dernière modification:30 sept. 2021 11:03

Repository Staff Only: item control page