Clustering Introduction


  • The automated assignment of data points to distinct groups is called clustering.
  • A Gaussian Mixture Model (GMM) is one example of a clustering method.
  • .fit, .score_samples and .predict are some of the key methods in GMM clustering.
  • The adjusted_rand_score function scores predictions by measuring how well two sets of labels agree, corrected for the agreement expected by chance.
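
The points above can be sketched in a few lines. This is a minimal illustration, not the lesson's own code: the data are synthetic blobs from make_blobs, and the number of components, the blob centres and the random seed are assumptions chosen so the example is reproducible.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Synthetic data (assumption): three well-separated 2-D blobs with known labels.
X, true_labels = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# Fit a Gaussian mixture with one component per expected cluster.
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(X)

# Hard cluster assignment for each point.
labels = gmm.predict(X)

# Log-likelihood of each point under the fitted mixture.
log_density = gmm.score_samples(X)

# Agreement between predicted and true labels, corrected for chance.
# The score ignores how clusters are numbered; 1.0 means identical groupings.
ari = adjusted_rand_score(true_labels, labels)
print(f"Adjusted Rand score: {ari:.3f}")
```

Because adjusted_rand_score is invariant to label permutations, it does not matter that the cluster numbers returned by .predict differ from the numbering of the true labels.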

Clustering Images


  • Image analysis almost always requires a bit of pre-processing.
  • Image scaling is performed with the fit_transform method of the StandardScaler class in Scikit-learn.
  • GMM offers a good starting point in image clustering.
  • Diagonal plots are very useful in exploratory data analysis.
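
As a rough sketch of the pre-processing and clustering steps above, the following scales the pixels of an image and clusters them with a GMM. The image here is a small synthetic array (an assumption standing in for real data), and treating each pixel's three channel values as its features is one possible choice, not necessarily the one used in the lesson.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic 20 x 30 RGB "image" (assumption): dark left half, bright right half.
rng = np.random.default_rng(0)
image = np.empty((20, 30, 3))
image[:, :15] = rng.normal(0.2, 0.02, size=(20, 15, 3))
image[:, 15:] = rng.normal(0.8, 0.02, size=(20, 15, 3))

# Flatten to one row per pixel, one column per channel.
pixels = image.reshape(-1, 3)

# Scale each channel to zero mean and unit variance.
scaled = StandardScaler().fit_transform(pixels)

# Cluster the scaled pixels into two groups.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(scaled)

# Reshape the per-pixel cluster labels back into the image layout.
segmentation = gmm.predict(scaled).reshape(image.shape[:2])
print(segmentation.shape)
```

The resulting segmentation array has the same height and width as the image and can be displayed like any other single-channel image.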

Dimensionality Reduction


  • Reducing the number of features in a dataset reduces redundancy; this process is called dimensionality reduction.
  • Principal component analysis (PCA) is one of the commonly used dimensionality reduction methods.
  • n_components is used to specify the number of components in Scikit-learn.
  • PCA can be helpful to find groups of genes that seem to be co-regulated.
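
A minimal sketch of PCA with n_components, assuming a synthetic samples-by-genes matrix in which ten hypothetical genes are driven by two hidden factors. The data, matrix sizes and noise level are illustrative assumptions only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic expression matrix (assumption): 50 samples x 10 genes,
# generated from 2 hidden factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 10))
expression = latent @ loadings + 0.05 * rng.normal(size=(50, 10))

# Keep only the first two principal components.
pca = PCA(n_components=2)
reduced = pca.fit_transform(expression)

# Fraction of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)

# One row per component, one column per gene: genes with large weights
# of the same sign on a component tend to vary together across samples.
print(pca.components_.shape)
```

Inspecting the rows of pca.components_ is one way to look for groups of genes that change together, which is the sense in which PCA can hint at co-regulation.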