Machine Learning - Unsupervised: Key Points

Clustering Introduction

The automated assignment of data points to distinct groups is called clustering.
Gaussian Mixture Models (GMM) are one method of cluster analysis.
.fit, .score_samples and .predict are some of the key methods in GMM clustering.
adjusted_rand_score measures how well predicted cluster labels agree with reference labels, corrected for chance agreement and independent of how the clusters are numbered.
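
The points above can be sketched with Scikit-learn's GaussianMixture class. This is a minimal illustration, not the lesson's own code: the synthetic blob data, the choice of three components and the variable names are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Synthetic data (assumed for illustration): three well-separated 2-D blobs.
X, true_labels = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.8,
    random_state=0,
)

# Fit a mixture of three Gaussians to the data.
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(X)

# .predict assigns each point to its most probable component.
pred_labels = gmm.predict(X)

# .score_samples returns the log-likelihood of each point under the model.
log_density = gmm.score_samples(X)

# adjusted_rand_score compares two labelings; it does not depend on
# which integer is used for which cluster.
ari = adjusted_rand_score(true_labels, pred_labels)
ari_permuted = adjusted_rand_score(true_labels, (true_labels + 1) % 3)
print(f"ARI of GMM prediction: {ari:.3f}")
print(f"ARI of a relabelled copy of the truth: {ari_permuted:.3f}")
```

A score close to 1 means the two labelings group the points in nearly the same way, while a score near 0 is what random assignment would be expected to give.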

Clustering Images

Image analysis almost always requires a bit of pre-processing.
Image scaling is performed by using the fit_transform method of the StandardScaler class in Scikit-learn.
GMM offers a good starting point in image clustering.
Diagonal plots are very useful in exploratory data analysis.
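
As a rough sketch of the scaling and clustering steps, the example below builds a small synthetic RGB image, scales its pixels with StandardScaler and clusters them with a GMM. The image, its size and the choice of two components are assumptions for illustration; a real image would first be loaded into a NumPy array of the same shape.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic image (assumed for illustration): dark left half, bright right half.
rng = np.random.default_rng(0)
height, width = 40, 40
image = np.empty((height, width, 3))
image[:, : width // 2] = rng.normal(50, 5, size=(height, width // 2, 3))
image[:, width // 2 :] = rng.normal(200, 5, size=(height, width - width // 2, 3))

# Pre-processing: flatten to one row per pixel, one column per colour channel.
pixels = image.reshape(-1, 3)

# Scale each channel to zero mean and unit variance.
scaled = StandardScaler().fit_transform(pixels)

# Cluster the scaled pixels, then reshape the labels back into image form.
gmm = GaussianMixture(n_components=2, random_state=0)
labels = gmm.fit(scaled).predict(scaled)
segmented = labels.reshape(height, width)

print(segmented.shape)
print(np.unique(segmented, return_counts=True))
```

The segmented array has the same height and width as the image, so it can be displayed directly as a label map.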

Dimensionality Reduction

Reducing the number of features in a dataset reduces redundancy; this process is called dimensionality reduction.
Principal component analysis (PCA) is one of the commonly used dimensionality reduction methods.
n_components is used to specify the number of components to keep in Scikit-learn's PCA.
PCA can be helpful to find groups of genes that seem to be co-regulated.
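
A minimal sketch of PCA on gene-expression-like data follows. The expression matrix is synthetic and entirely assumed: four of the ten hypothetical genes are made to follow a shared hidden signal, standing in for a co-regulated group, and the rest are independent noise.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic expression matrix (assumed): rows are samples, columns are genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 100, 10
expression = rng.normal(size=(n_samples, n_genes))

# Genes 0-3 follow one shared hidden signal plus a little noise.
latent = rng.normal(size=(n_samples, 1))
expression[:, :4] = latent + 0.1 * rng.normal(size=(n_samples, 4))

# Standardise each gene, then keep the first two principal components.
scaled = StandardScaler().fit_transform(expression)
pca = PCA(n_components=2)
reduced = pca.fit_transform(scaled)

print(reduced.shape)
print(pca.explained_variance_ratio_)

# Genes with the largest absolute loadings on the first component
# are the ones that vary together most strongly.
top_genes = np.argsort(np.abs(pca.components_[0]))[::-1][:4]
print(sorted(top_genes.tolist()))
```

Inspecting the loadings in pca.components_ is one simple way to see which features move together; on real data the groups are rarely this clean and need checking against domain knowledge.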