cytopy.flow.clustering.flowsom

Here you will find cytopy’s implementation of the FlowSOM algorithm, which relies on the MiniSOM library for self-organising maps. The work was adapted from https://github.com/Hatchin/FlowSOM for integration with cytopy and the database architecture.

Copyright 2020 Ross Burton

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Classes:

FlowSOM(data, features, …)

Python implementation of the FlowSOM algorithm, adapted from https://github.com/Hatchin/FlowSOM. This class uses MiniSOM in an almost identical manner to the work by Hatchin, but removes all of the data handling steps seen in Hatchin’s original library, since these are handled by the infrastructure in cytopy.

class cytopy.flow.clustering.flowsom.FlowSOM(data: pandas.core.frame.DataFrame, features: list, neighborhood_function: str = 'gaussian', normalisation: bool = False, verbose: bool = True)

Python implementation of the FlowSOM algorithm, adapted from https://github.com/Hatchin/FlowSOM. This class uses MiniSOM in an almost identical manner to the work by Hatchin, but removes all of the data handling steps seen in Hatchin’s original library, since these are handled by the infrastructure in cytopy. The FlowSOM algorithm is implemented here so that it requires only a Pandas DataFrame, like that typically produced when retrieving data from the cytopy database, and gives access to methods for clustering and meta-clustering. Compared with Hatchin’s work, the cytopy implementation has improved error handling and integrates better with the cytopy workflow.

Parameters
  • data (Pandas.DataFrame) – training data

  • features (List) – list of columns to include

  • neighborhood_function (str) – name of the neighbourhood function used by the self-organising map (default 'gaussian')

  • normalisation (bool) – if True, min max normalisation applied prior to computation
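
The min-max scaling that the normalisation flag refers to can be sketched as follows. This is a minimal, standard-library illustration and not cytopy's own code: the helper name min_max_normalise and the choice to map constant columns to 0.0 are assumptions made for the example.

```python
def min_max_normalise(rows):
    """Scale each column of a list of equal-length rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero
            # (an assumption of this sketch, not documented cytopy behaviour).
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Three hypothetical cells measured on two features.
cells = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
normalised = min_max_normalise(cells)
```

Each feature is rescaled independently, so channels with very different ranges contribute comparably to the distances computed during SOM training.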

Methods:

meta_cluster(cluster_class[, min_n, max_n, …])

Perform meta-clustering. Implementation of consensus clustering, following the paper https://link.springer.com/content/pdf/10.1023%2FA%3A1023949509487.pdf

predict()

Predict the cluster allocation for each cell in the associated dataset.

train([som_dim, sigma, learning_rate, …])

Train self-organising map.

meta_cluster(cluster_class: callable, min_n: int = 5, max_n: int = 50, iter_n: int = 10, resample_proportion: float = 0.5)

Perform meta-clustering. Implementation of consensus clustering, following the paper https://link.springer.com/content/pdf/10.1023%2FA%3A1023949509487.pdf

Parameters
  • cluster_class (callable) – clustering object (must follow the Sklearn standard; needs a fit_predict method and accepts the parameter n_clusters)

  • min_n (int) – the minimum proposed number of clusters

  • max_n (int) – the maximum proposed number of clusters

  • iter_n (int) – the number of iterations to run for each proposed number of clusters

  • resample_proportion (float, default=0.5) – the proportion of the data to resample in each clustering run; must lie within (0, 1)

Returns

Return type

None
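
The resampling idea behind consensus clustering can be sketched as below: cluster repeated subsamples and record how often each pair of items lands in the same cluster when both are sampled. This is an illustration under stated assumptions, not the cytopy implementation. GapSplitClusterer is a hypothetical one-dimensional stand-in for a scikit-learn style clusterer, and consensus_matrix is a hypothetical helper name. How the final number of clusters is chosen from such matrices is not covered here.

```python
import random


class GapSplitClusterer:
    """Toy 1-D clusterer with a scikit-learn style interface (hypothetical)."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit_predict(self, values):
        # Sort the values and cut the ordering at the (n_clusters - 1) widest gaps.
        order = sorted(range(len(values)), key=lambda i: values[i])
        gap_ranks = sorted(
            range(1, len(order)),
            key=lambda r: values[order[r]] - values[order[r - 1]],
            reverse=True,
        )
        cuts = set(gap_ranks[: self.n_clusters - 1])
        labels = [0] * len(values)
        current = 0
        for rank, idx in enumerate(order):
            if rank in cuts:
                current += 1
            labels[idx] = current
        return labels


def consensus_matrix(values, cluster_class, n_clusters,
                     iter_n=10, resample_proportion=0.5, seed=42):
    """Fraction of co-sampled runs in which each pair shared a cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    sample_size = max(n_clusters, int(n * resample_proportion))
    for _ in range(iter_n):
        idx = rng.sample(range(n), sample_size)
        labels = cluster_class(n_clusters=n_clusters).fit_predict(
            [values[i] for i in idx]
        )
        for a, label_a in zip(idx, labels):
            for b, label_b in zip(idx, labels):
                sampled[a][b] += 1
                if label_a == label_b:
                    together[a][b] += 1
    # Pairs that were never sampled together are reported as 0.0.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Two well-separated groups of hypothetical 1-D values.
values = [0.1, 0.2, 0.15, 5.0, 5.1, 5.2]
matrix = consensus_matrix(values, GapSplitClusterer, n_clusters=2, iter_n=20)
```

Entries close to 0 or 1 indicate pairs whose assignment is stable across resamples; many intermediate values suggest the proposed number of clusters fits the data poorly.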

predict()

Predict the cluster allocation for each cell in the associated dataset. Requires that train and meta_cluster have been called previously.

Returns

Predicted labels

Return type

numpy.ndarray
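
One plausible reading of this two-stage prediction is sketched below, assuming each cell is first assigned to its nearest SOM node and then inherits that node's meta-cluster label. The names nearest_node, predict_labels and node_labels are hypothetical and not part of the cytopy API. The sketch uses plain lists rather than the numpy.ndarray returned by the real method.

```python
def nearest_node(weights, cell):
    """Return the (row, col) grid position of the node closest to the cell."""
    positions = (
        (r, c) for r, row in enumerate(weights) for c, _ in enumerate(row)
    )
    return min(
        positions,
        key=lambda pos: sum(
            (w - x) ** 2 for w, x in zip(weights[pos[0]][pos[1]], cell)
        ),
    )


def predict_labels(cells, weights, node_labels):
    """Map every cell to the meta-cluster label of its nearest node."""
    return [node_labels[nearest_node(weights, cell)] for cell in cells]


# A hypothetical trained 1 x 2 grid with two features per node.
weights = [[[0.0, 0.0], [1.0, 1.0]]]
# Hypothetical meta-cluster label for each grid position.
node_labels = {(0, 0): 0, (0, 1): 1}
labels = predict_labels([[0.1, 0.2], [0.9, 0.8], [0.4, 0.3]], weights, node_labels)
```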

train(som_dim: tuple = (50, 50), sigma: float = 1.0, learning_rate: float = 0.5, batch_size: int = 500, seed: int = 42, weight_init: str = 'random')

Train self-organising map.

Parameters
  • som_dim (tuple, default=(50, 50)) – dimensions of SOM embedding (number of nodes)

  • sigma (float, default=1.0) – the radius of the different neighbors in the SOM

  • learning_rate (float, default=0.5) – alters the rate at which weights are updated

  • batch_size (int, default=500) – size of batches used in training (alters number of total iterations)

  • seed (int, default=42) – random seed

  • weight_init (str, default='random') – how to initialise weights: either 'random' or 'pca' (initialises the weights to span the first two principal components)

Returns

Return type

None
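
To show what sigma and learning_rate control, here is a minimal sketch of a single self-organising map update with a Gaussian neighbourhood. It is a standard-library illustration, not the MiniSOM or cytopy code: best_matching_unit and som_update are hypothetical names, and the decay of sigma and learning_rate over iterations that SOM libraries typically apply is left out.

```python
import math


def best_matching_unit(weights, sample):
    """Return the (row, col) of the node whose weights are closest to sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, sample))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def som_update(weights, sample, learning_rate=0.5, sigma=1.0):
    """Apply one Gaussian-neighbourhood update in place and return the BMU."""
    bmu = best_matching_unit(weights, sample)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Influence falls off with squared grid distance from the BMU;
            # sigma sets how wide the affected neighbourhood is.
            grid_dist_sq = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            influence = math.exp(-grid_dist_sq / (2 * sigma ** 2))
            for k in range(len(w)):
                # learning_rate scales how far each weight moves towards the sample.
                w[k] += learning_rate * influence * (sample[k] - w[k])
    return bmu


# A hypothetical 2 x 2 grid with two features per node.
weights = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]]
bmu = som_update(weights, [0.9, 0.1])
```

The best matching unit moves the furthest towards the sample, while its grid neighbours move by a smaller amount determined by sigma.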