PSC - Predictive Subspace Clustering


The PSC algorithm clusters high-dimensional data. It assumes that, within each cluster, the data are well approximated by a linear subspace estimated by principal component analysis (PCA). PSC partitions the data into clusters while simultaneously estimating the PCA parameters of each cluster. The algorithm minimises an objective function based on a new measure of influence for PCA models. A penalised version of the algorithm carries out subspace clustering and variable selection simultaneously.
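The alternating structure described above (assign points to clusters, then refit a PCA model per cluster) can be illustrated with a minimal Python sketch. This is not the PSC implementation: this page does not define the influence measure, so the sketch substitutes the plain squared reconstruction error of each point under each cluster's PCA model, which gives a generic "K-subspaces" scheme. The function names (`fit_pca`, `residuals`, `k_subspaces`) and all parameters are hypothetical, chosen for illustration only, and the penalised variable-selection variant is not covered.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, basis) for the top principal subspace of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def residuals(X, mean, basis):
    """Squared distance from each row of X to the subspace (mean, basis)."""
    Xc = X - mean
    proj = Xc @ basis.T @ basis
    return np.sum((Xc - proj) ** 2, axis=1)

def k_subspaces(X, n_clusters, n_components, n_iter=50, seed=0):
    """Alternate between per-cluster PCA fits and reassignment of points.

    Assumption: reconstruction error stands in for the influence-based
    objective used by PSC, which is not specified on this page.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=X.shape[0])  # random initial partition
    for _ in range(n_iter):
        models = []
        for k in range(n_clusters):
            members = X[labels == k]
            # Refit a cluster that is too small from a few random points.
            if members.shape[0] <= n_components:
                idx = rng.choice(X.shape[0], n_components + 1, replace=False)
                members = X[idx]
            models.append(fit_pca(members, n_components))
        # Assign each point to the cluster whose subspace fits it best.
        errs = np.column_stack([residuals(X, m, B) for m, B in models])
        new_labels = errs.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # the partition is stable
        labels = new_labels
    return labels, models
```

A call such as `labels, models = k_subspaces(X, n_clusters=3, n_components=2)` returns one label per row of `X` and one `(mean, basis)` pair per cluster. Like other alternating schemes, the result depends on the random initial partition, so several restarts with different `seed` values are advisable.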

 

 

Matlab code

Gene expression data sets

Reference

McWilliams B. and Montana G. (2013) Subspace clustering of high-dimensional data: a predictive approach. Data Mining and Knowledge Discovery

 
