cytopy.flow.cell_classifier.utils

Functions:

assert_population_labels(ref, expected_labels)

Given a reference FileGroup and the expected population labels, check the validity of the labels and return a list containing only the valid populations.

auto_weights(y)

Estimate optimal weights from a list of class labels.

calc_metrics(metrics, y_true[, y_pred, y_score])

Given a list of Scikit-Learn supported metrics (https://scikit-learn.org/stable/modules/model_evaluation.html) or callable functions with signature ‘y_true’, ‘y_pred’ and ‘y_score’, return a dictionary of results after checking that the required inputs are provided.

check_downstream_populations(ref, …)

Check that in the ordered list of population labels, all populations are downstream of the given ‘root’ population.

confusion_matrix_plots(classifier, x, y, …)

Generate a figure of two heatmaps showing a confusion matrix, one normalised by support and one showing raw values, to display a classifier's performance.

multilabel(ref, root_population, …)

Load the root population DataFrame from the reference FileGroup (assumed to be the first population in ‘population_labels’).

singlelabel(ref, root_population, …)

Load the root population DataFrame from the reference FileGroup (assumed to be the first population in ‘population_labels’).

cytopy.flow.cell_classifier.utils.assert_population_labels(ref, expected_labels: list)

Given a reference FileGroup and the expected population labels, check the validity of the labels and return a list containing only the valid populations.

Parameters
  • ref (FileGroup) –

  • expected_labels (list) –

Returns

Return type

List

Raises

AssertionError – Ref missing expected populations

cytopy.flow.cell_classifier.utils.auto_weights(y: numpy.ndarray)

Estimate optimal weights from a list of class labels.

Parameters

y (numpy.ndarray) –

Returns

Dictionary of class weights {label: weight}

Return type

dict
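One common heuristic for estimating class weights is the "balanced" scheme, where each class is weighted by n_samples / (n_classes * class_count). The sketch below illustrates that heuristic with NumPy only. Whether auto_weights uses exactly this formula is an assumption, and balanced_weights is a hypothetical name that is not part of cytopy.

```python
import numpy as np


def balanced_weights(y: np.ndarray) -> dict:
    """Hypothetical helper: weight each class inversely to its frequency."""
    classes, counts = np.unique(y, return_counts=True)
    # Rare classes receive larger weights, common classes smaller ones
    weights = len(y) / (len(classes) * counts)
    return {label: float(w) for label, w in zip(classes.tolist(), weights)}


# Three cells of class 0 and one cell of class 1
y = np.array([0, 0, 0, 1])
weights = balanced_weights(y)
print(weights)
```

The resulting dictionary has the same {label: weight} shape described above, so it could be passed to estimators that accept a class_weight mapping.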

cytopy.flow.cell_classifier.utils.calc_metrics(metrics: list, y_true: numpy.array, y_pred: Optional[numpy.array] = None, y_score: Optional[numpy.array] = None) -> dict

Given a list of Scikit-Learn supported metrics (https://scikit-learn.org/stable/modules/model_evaluation.html) or callable functions with signature ‘y_true’, ‘y_pred’ and ‘y_score’, return a dictionary of results after checking that the required inputs are provided.

Parameters
  • metrics (list) – List of metric names (strings) or callable functions with signature ‘y_true’, ‘y_pred’ and ‘y_score’

  • y_true (numpy.ndarray) – True labels or binary label indicators. The binary and multiclass cases expect labels with shape (n_samples,) while the multilabel case expects binary label indicators with shape (n_samples, n_classes).

  • y_pred (numpy.ndarray) – Estimated targets as returned by a classifier

  • y_score (numpy.ndarray) – Target scores. In the binary and multilabel cases, these can be either probability estimates or non-thresholded decision values (as returned by decision_function on some classifiers). In the multiclass case, these must be probability estimates which sum to 1. The binary case expects a shape (n_samples,), and the scores must be the scores of the class with the greater label. The multiclass and multilabel cases expect a shape (n_samples, n_classes). In the multiclass case, the order of the class scores must correspond to the order of labels, if provided, or else to the numerical or lexicographical order of the labels in y_true.

Returns

Dictionary of performance metrics

Return type

dict

Raises
  • AssertionError – F1 score requested yet y_pred is missing

  • AttributeError – Requested metric requires probability scores and y_score is None

  • ValueError – Invalid metric provided; possibly missing signatures: ‘y_true’, ‘y_score’ or ‘y_pred’
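As an illustration of the callable option, the sketch below defines two custom metrics that accept ‘y_true’, ‘y_pred’ and ‘y_score’ and collects their results in a dictionary. The names accuracy, mean_top_score and run_callables are hypothetical, and both the keyword-argument call and keying the results by function name are assumptions; calc_metrics itself may call and label callables differently.

```python
import numpy as np


def accuracy(y_true, y_pred, y_score=None):
    """Custom metric that only needs hard predictions."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def mean_top_score(y_true, y_pred=None, y_score=None):
    """Custom metric that needs scores: mean of the highest score per row."""
    if y_score is None:
        raise AttributeError("mean_top_score requires y_score")
    return float(np.mean(np.max(np.asarray(y_score), axis=1)))


def run_callables(metrics, y_true, y_pred=None, y_score=None):
    """Hypothetical stand-in: call each metric and key the result by its name."""
    return {m.__name__: m(y_true=y_true, y_pred=y_pred, y_score=y_score)
            for m in metrics}


y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
y_score = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]])

results = run_callables([accuracy, mean_top_score], y_true, y_pred, y_score)
print(results)
```

Giving every callable the same three-argument signature lets a single loop evaluate metrics that need predictions, scores, or both.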

cytopy.flow.cell_classifier.utils.check_downstream_populations(ref, root_population: str, population_labels: list) -> None

Check that in the ordered list of population labels, all populations are downstream of the given ‘root’ population.

Parameters
  • ref (FileGroup) –

  • root_population (str) –

  • population_labels (list) –

Returns

Return type

None

Raises

AssertionError – One or more populations not downstream of root
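The idea of a "downstream" check can be sketched on a plain child-to-parent mapping. The parents dictionary, the population names and is_downstream are all hypothetical; the real function works on the population tree of a FileGroup, whose interface is not shown here.

```python
def is_downstream(parents: dict, root: str, population: str) -> bool:
    """Walk up from `population`; True if `root` is one of its ancestors."""
    node = parents.get(population)
    while node is not None:
        if node == root:
            return True
        node = parents.get(node)
    return False


# Hypothetical population tree expressed as child -> parent
parents = {
    "T cells": "root",
    "B cells": "root",
    "CD4+": "T cells",
    "CD8+": "T cells",
}

labels = ["CD4+", "CD8+"]
not_downstream = [p for p in labels if not is_downstream(parents, "T cells", p)]
# Mirrors the documented behaviour: fail if any label is not under the root
assert not not_downstream, f"Not downstream of root: {not_downstream}"
```

This walk assumes a proper tree with no cycles, so each population reaches the top of the tree in a finite number of steps.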

cytopy.flow.cell_classifier.utils.confusion_matrix_plots(classifier, x: pandas.core.frame.DataFrame, y: numpy.ndarray, class_labels: list, cmap: Optional[str] = None, figsize: tuple = (8, 20), **kwargs)

Generate a figure of two heatmaps showing a confusion matrix, one normalised by support and one showing raw values, to display a classifier's performance. Returns a Matplotlib.Figure object.

Parameters
  • classifier (object) – Scikit-Learn classifier

  • x (Pandas.DataFrame) – Feature space

  • y (numpy.ndarray) – Labels

  • class_labels (list) – Class labels (as they should be displayed on the axis)

  • cmap (str) – Colour scheme, defaults to Matplotlib Blues

  • figsize (tuple (default=(8, 20))) – Size of the figure

  • kwargs – Additional keyword arguments passed to sklearn.metrics.plot_confusion_matrix

Returns

Return type

Matplotlib.Figure
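The difference between the two heatmaps can be shown numerically. The sketch below builds a raw confusion matrix with NumPy and then divides each row by its support (the number of true samples in that class). Reading "normalised by support" as row-wise normalisation is an assumption, and confusion_counts is a hypothetical helper rather than part of cytopy.

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """Raw confusion matrix: rows are true classes, columns are predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Add 1 to cell (true, predicted) for every sample
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 0])

raw = confusion_counts(y_true, y_pred, n_classes=3)
support = raw.sum(axis=1, keepdims=True)  # true samples per class
normalised = raw / support                # each row now sums to 1

print(raw)
print(normalised.round(2))
```

The raw matrix shows absolute counts, which are dominated by large classes, while the normalised matrix makes per-class error rates comparable. The division assumes every class has at least one true sample.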

cytopy.flow.cell_classifier.utils.multilabel(ref, root_population: str, population_labels: list, features: list) -> (pandas.DataFrame, pandas.DataFrame)

Load the root population DataFrame from the reference FileGroup (assumed to be the first population in ‘population_labels’). Then iterate over the remaining populations, creating a dummy matrix of population affiliations for each row of the root population.

Parameters
  • ref (FileGroup) –

  • root_population (str) –

  • population_labels (list) –

  • features (list) –

Returns

Root population fluorescent intensity values, population affiliations (dummy matrix)

Return type

(Pandas.DataFrame, Pandas.DataFrame)
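A dummy matrix of this kind can be sketched with pandas. The marker names, cell indexes and the population_index mapping below are made-up illustrative values, and representing each population as a list of row indexes into the root DataFrame is an assumption about how membership is stored.

```python
import pandas as pd

# Hypothetical root population: one row per cell, indexed by cell id
root = pd.DataFrame(
    {"CD3": [0.9, 0.8, 0.1, 0.7], "CD4": [0.8, 0.1, 0.0, 0.9]},
    index=[10, 11, 12, 13],
)

# Hypothetical membership: population name -> cell ids belonging to it
population_index = {"T cells": [10, 11, 13], "CD4+": [10, 13]}

# One 0/1 column per population; a cell may belong to several populations
labels = pd.DataFrame(
    {name: root.index.isin(idx).astype(int)
     for name, idx in population_index.items()},
    index=root.index,
)
print(labels)
```

Because a cell can carry a 1 in several columns at once, this layout suits multilabel classifiers, where parent and child populations are predicted together.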

cytopy.flow.cell_classifier.utils.singlelabel(ref, root_population: str, population_labels: list, features: list) -> (pandas.DataFrame, numpy.ndarray)

Load the root population DataFrame from the reference FileGroup (assumed to be the first population in ‘population_labels’). Then iterate over the remaining populations, creating an array of population affiliations; each cell (row) is assigned to its terminal leaf node in the FileGroup population tree.

Parameters
  • root_population (str) –

  • ref (FileGroup) –

  • population_labels (list) –

  • features (list) –

Returns

Root population fluorescent intensity values, labels

Return type

(Pandas.DataFrame, numpy.ndarray)
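Assigning each cell to its terminal leaf can be sketched by writing labels in tree order, so that deeper populations overwrite their parents. The cell ids, population names and integer coding below (0 for cells that belong only to the root) are assumptions made for illustration, as is the requirement that populations are listed parent before child.

```python
import numpy as np

# Hypothetical cell ids of the root population
cell_ids = np.array([10, 11, 12, 13])

# Hypothetical populations, ordered from parent to child
population_index = [("T cells", [10, 11, 13]), ("CD4+", [10, 13])]

# 0 marks cells that belong to the root population only
labels = np.zeros(len(cell_ids), dtype=int)
for code, (name, idx) in enumerate(population_index, start=1):
    # Later (deeper) populations overwrite the label of their parent
    labels[np.isin(cell_ids, idx)] = code

print(labels)
```

Each cell ends up with exactly one label, which is the form expected by single-label multiclass classifiers.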