Clustering Introduction
- The automated assignment of data points to distinct groups is called clustering.
- Gaussian Mixture Models (GMM) are one example of cluster analysis.
- .fit, .score_samples and .predict are some of the key methods in GMM clustering.
- The adjusted_rand_score function scores predicted labels against reference labels, adjusting for the agreement expected from random labelling.
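The GMM workflow summarised above can be sketched roughly as follows. This is a minimal illustration, not the lesson's own code: the two synthetic blobs, the variable names and the choice of two components are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Synthetic data (assumed for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 2)),
])
true_labels = np.repeat([0, 1], 100)

# Fit a two-component mixture to the data.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(X)

# Hard cluster assignment for each point.
predicted = gmm.predict(X)

# Log-likelihood of each point under the fitted mixture.
log_density = gmm.score_samples(X)

# Agreement between predicted and reference labels, corrected for chance.
ari = adjusted_rand_score(true_labels, predicted)

# The score ignores how clusters are named: swapping 0 and 1 changes nothing.
ari_swapped = adjusted_rand_score(true_labels, 1 - true_labels)
```

Because the score is invariant to label names, it does not matter whether the model calls the first blob cluster 0 or cluster 1.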
Clustering Images
- Image analysis almost always requires a bit of pre-processing.
- Image scaling is performed using the fit_transform method of the StandardScaler class in Scikit-learn.
- GMM offers a good starting point in image clustering.
- Diagonal plots are very useful in exploratory data analysis.
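As a rough sketch of the scaling and clustering steps listed above, the example below standardises the pixels of a small synthetic image and clusters them with a GMM. The image itself, its two-region layout, and the pixels-by-channels reshaping are assumptions made for illustration; real images usually need more pre-processing than this.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic 32x32 RGB image (assumed): dark left half, bright right half, plus noise.
rng = np.random.default_rng(0)
image = np.zeros((32, 32, 3))
image[:, 16:] = 200.0
image += rng.normal(0.0, 5.0, size=image.shape)

# Treat every pixel as one sample with three features (the colour channels).
pixels = image.reshape(-1, 3)

# Standardise each channel to zero mean and unit variance.
scaled = StandardScaler().fit_transform(pixels)

# Cluster the scaled pixels into two groups and map labels back to image shape.
gmm = GaussianMixture(n_components=2, random_state=0)
labels = gmm.fit_predict(scaled)
segmented = labels.reshape(image.shape[:2])
```

The resulting segmented array has one cluster label per pixel and can be displayed like a greyscale image.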
Dimensionality Reduction
- Reducing the number of features in a dataset reduces redundancy; this process is called dimensionality reduction.
- Principal component analysis (PCA) is one of the commonly used dimensionality reduction methods.
- The n_components parameter specifies the number of components to keep in Scikit-learn's PCA.
- PCA can be helpful to find groups of genes that seem to be co-regulated.
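The PCA points above can be illustrated with a small sketch. The data here is a made-up samples-by-genes matrix in which the first three "genes" share a hidden driving factor; the matrix, its size and the factor strength are assumptions for the example, not real expression data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic matrix (assumed): 50 samples x 6 genes of low-level noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = rng.normal(scale=0.3, size=(50, 6))

# Genes 0, 1 and 2 follow the same hidden factor, mimicking co-regulation.
X[:, :3] += 3.0 * latent

# Keep only the first two principal components.
pca = PCA(n_components=2)
reduced = pca.fit_transform(X)

# Fraction of total variance captured by each kept component.
variance_ratio = pca.explained_variance_ratio_

# Genes with the largest absolute loadings on the first component.
top_genes = {int(i) for i in np.argsort(np.abs(pca.components_[0]))[-3:]}
```

In this toy setting the genes with the largest loadings on the first component are the ones that were built to vary together, which is the kind of grouping the last bullet refers to.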