cytopy.flow.fda_norm

This module provides normalisation methods based on landmark registration, first applied to cytometry data by Hahne et al. [1] and later extended by Finak et al. [2]. Landmark registration is implemented in the LandmarkReg class using Scikit-FDA.

[1] Hahne F, Khodabakhshi AH, Bashashati A, Wong CJ, Gascoyne RD, Weng AP, Seyfert-Margolis V, Bourcier K, Asare A, Lumley T, Gentleman R, Brinkman RR. Per-channel basis normalization methods for flow cytometry data. Cytometry A. 2010 Feb;77(2):121-31. doi: 10.1002/cyto.a.20823. PMID: 19899135; PMCID: PMC3648208.

[2] Finak G, Jiang W, Krouse K, et al. High-throughput flow cytometry data normalization for clinical trials. Cytometry A. 2014;85(3):277-286. doi:10.1002/cyto.a.22433

Copyright 2020 Ross Burton

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Classes:

LandmarkReg(target, ref, var, mpt, **kwargs)

One technique for handling technical variation in cytometry data is local normalisation by aligning the probability density function of some data to a reference sample.

Functions:

cluster_landmarks(p, plabels)

Cluster peaks (p).

estimate_pdfs(target, ref, var)

Given some target and reference DataFrame, estimate PDF for each using convolution based kernel density estimation (see KDEpy).

filter_by_closest_centroid(x, labels, centroid)

Filter peaks (‘x’) to keep only those closest to their nearest centroid (centroid of clustered peaks).

match_landmarks(p, plabels)

Given an array of peaks (p) labelled according to their origin (plabels; 0 being from target and 1 being from reference), match landmarks with each other, between samples, using K means clustering and a nearest centroid approach.

peaks(y, x, **kwargs)

Detect peaks of some function, y, in the grid space, x.

unique_clusters_filter_nearest_centroid(p, …)

Under the assumption that clusters have zero entropy (that is, all peaks within a cluster originate from the same sample), filter peaks to keep only those nearest to the centroid.

zero_entropy_clusters(km_labels, plabels, …)

Determine which clusters (if any) have zero entropy, that is, contain peaks from a single sample only (either target or reference).

class cytopy.flow.fda_norm.LandmarkReg(target: pandas.core.frame.DataFrame, ref: pandas.core.frame.DataFrame, var: str, mpt: float = 0.001, **kwargs)

One technique for handling technical variation in cytometry data is local normalisation by aligning the probability density function of some data to a reference sample. This should be applied to a population immediately prior to applying a gate.

The alignment algorithm is inspired by previous work [1, 2] and is performed as follows:

  1. The probability density functions of some target data and a reference sample are estimated using a convolution based fast kernel density estimation algorithm (KDEpy.FFTKDE).
  2. Landmarks are identified in both samples as peaks of local maximal density.
  3. The peaks from both target and reference are combined and clustered using K means clustering; the number of clusters is chosen as the number of peaks identified in the target.
  4. Unique pairings of peaks between samples, closest to the centroid of a cluster, are generated and used as landmarks.
  5. Landmark registration is performed using the Scikit-FDA package to generate a warping function, with the target location being the mean between paired peaks.
  6. The warping function is applied to the target data, generating a new adjusted vector with high density regions matched to the reference sample.

[1] Hahne F, Khodabakhshi AH, Bashashati A, Wong CJ, Gascoyne RD, Weng AP, Seyfert-Margolis V, Bourcier K, Asare A, Lumley T, Gentleman R, Brinkman RR. Per-channel basis normalization methods for flow cytometry data. Cytometry A. 2010 Feb;77(2):121-31. doi: 10.1002/cyto.a.20823. PMID: 19899135; PMCID: PMC3648208.

[2] Finak G, Jiang W, Krouse K, et al. High-throughput flow cytometry data normalization for clinical trials. Cytometry A. 2014;85(3):277-286. doi:10.1002/cyto.a.22433
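Steps 5 and 6 can be illustrated with a minimal NumPy sketch. It is not the Scikit-FDA implementation used by LandmarkReg: a piecewise-linear map (np.interp) stands in for the smooth warping function, and the helper name piecewise_linear_warp, the example landmarks, and the lower/upper bounds are all assumptions made for illustration.

```python
import numpy as np


def piecewise_linear_warp(x, landmarks, locations, lower, upper):
    """Map values so that each landmark moves to its registration location.

    Hypothetical helper: linear interpolation stands in for the smooth
    warping function that a functional data analysis package would fit.
    The interval end points (lower, upper) are held fixed.
    """
    src = np.concatenate(([lower], np.sort(landmarks), [upper]))
    dst = np.concatenate(([lower], np.sort(locations), [upper]))
    return np.interp(x, src, dst)


# Paired landmarks in the documented (2, n) layout:
# row 0 holds target peaks, row 1 holds reference peaks.
landmarks = np.array([[1.0, 4.0],
                      [2.0, 6.0]])

# Step 5: the registration location is the mean of each pair of peaks.
locations = landmarks.mean(axis=0)

# Step 6: apply the target's warp to the target data.
target_data = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
adjusted = piecewise_linear_warp(
    target_data, landmarks[0], locations, lower=0.0, upper=10.0
)
```

In this sketch the target peaks at 1.0 and 4.0 move to 1.5 and 5.0, and values between landmarks are shifted proportionally.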

Parameters
  • target (Pandas.DataFrame) – Target data to be transformed; must contain column corresponding to ‘var’

  • ref (Pandas.DataFrame) – Reference data for computing alignment; must contain column corresponding to ‘var’

  • var (str) – Name of the target variable to align

  • mpt (float (default=0.001)) – Minimum peak threshold, given as a proportion of the ‘highest’ peak (max density); peaks lower than this proportion of the highest peak will be ignored. Use this to remove small perturbations.

  • kwargs – Additional keyword arguments passed to cytopy.flow.fda_norm.peaks call

landmarks

(2, n) array, where n is the number of clusters. Order conserved between samples; first row is peaks from target, second row is peaks from reference.

Type

numpy.ndarray

original_functions

Original PDFs for target and reference

Type

skfda.representation.grid.FDataGrid

warping_function

Warping function

Type

skfda.representation.grid.FDataGrid

adjusted_functions

Registered curves following function composition of original PDFs and warping function

Type

skfda.representation.grid.FDataGrid

landmark_shift_deltas

Corresponding shifts to align the landmarks of the PDFs described in original_functions

Type

numpy.ndarray

Methods:

plot_shift(x[, ax])

Plot the reference PDF and overlay the target data before and after landmark registration.

plot_warping([ax])

Generate a figure that plots the PDFs prior to landmark registration, the warping function, and the registered curves.

shift_data(x)

Provided the original vector of data to transform, use the warping function to normalise the data and align it with the reference.

plot_shift(x: numpy.ndarray, ax: Optional[matplotlib.axes._axes.Axes] = None)

Plot the reference PDF and overlay the target data before and after landmark registration.

Parameters
  • x (numpy.ndarray) – Target data

  • ax (Matplotlib.Axes, optional) –

Returns

Return type

Matplotlib.Axes

plot_warping(ax: Optional[list] = None)

Generate a figure that plots the PDFs prior to landmark registration, the warping function, and the registered curves.

Parameters

ax (Matplotlib.Axes, optional) –

Returns

Return type

Matplotlib.Axes

shift_data(x: numpy.ndarray)

Provided the original vector of data to transform, use the warping function to normalise the data and align it with the reference.

Parameters

x (numpy.ndarray) –

Returns

Return type

numpy.ndarray

Raises

AssertionError – If the class has not been called and therefore a warping function has not been defined

cytopy.flow.fda_norm.cluster_landmarks(p: numpy.ndarray, plabels: numpy.ndarray)

Cluster peaks (p). plabels indicates where each peak originated from; either target sample (0) or reference (1). The number of clusters used for K means clustering is equal to the number of peaks for the target sample.

Parameters
  • p (numpy.ndarray) – Peaks

  • plabels (numpy.ndarray) – Peak labels

Returns

K Means labels for each peak, cluster centroids

Return type

numpy.ndarray, numpy.ndarray
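The clustering step can be sketched in plain NumPy. This is a minimal one-dimensional Lloyd's algorithm with a deterministic start, not the K means implementation the module itself uses; the helper name kmeans_1d and the example peaks are assumptions for illustration.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=50):
    """Minimal Lloyd's algorithm for one-dimensional data (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centroids over the sorted values.
    start = np.linspace(0, values.size - 1, k).astype(int)
    centroids = np.sort(values)[start]
    for _ in range(n_iter):
        # Assign every value to its nearest centroid.
        distances = np.abs(values[:, None] - centroids[None, :])
        labels = np.argmin(distances, axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        updated = np.array([
            values[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Pooled peaks and their origin: 0 = target, 1 = reference.
p = np.array([1.0, 5.0, 1.4, 5.6])
plabels = np.array([0, 0, 1, 1])

# As documented, the number of clusters equals the number of target peaks.
n_clusters = int(np.sum(plabels == 0))
km_labels, centroids = kmeans_1d(p, n_clusters)
```

Here each cluster ends up holding one target peak and one reference peak, which is the situation in which the pair can be used directly as a landmark.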

cytopy.flow.fda_norm.estimate_pdfs(target: pandas.core.frame.DataFrame, ref: pandas.core.frame.DataFrame, var: str)

Given some target and reference DataFrame, estimate the PDF for each using convolution based kernel density estimation (see KDEpy). ‘var’ is the variable of interest and should be a column in both ref and target.

Parameters
  • target (Pandas.DataFrame) –

  • ref (Pandas.DataFrame) –

  • var (str) –

Returns

Target PDF, reference PDF, and grid space

Return type

(numpy.ndarray, numpy.ndarray, numpy.ndarray)
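The idea of evaluating both densities on one shared grid can be sketched as follows. A naive Gaussian kernel density estimate in NumPy is swapped in for the convolution based estimator from KDEpy, and plain arrays replace the DataFrame columns; the helper name gaussian_kde_on_grid, the bandwidth, the padding, and the simulated data are assumptions for illustration.

```python
import numpy as np


def gaussian_kde_on_grid(data, grid, bandwidth):
    """Naive O(n * m) Gaussian KDE evaluated at every grid point."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (data.size * bandwidth)


# Simulated stand-ins for target[var] and ref[var].
rng = np.random.default_rng(0)
target = rng.normal(loc=1.0, scale=0.5, size=500)
ref = rng.normal(loc=2.0, scale=0.5, size=500)

# One equally spaced grid spanning both samples, padded so the tails are captured.
pooled = np.concatenate([target, ref])
grid = np.linspace(pooled.min() - 1.0, pooled.max() + 1.0, 512)

target_pdf = gaussian_kde_on_grid(target, grid, bandwidth=0.2)
ref_pdf = gaussian_kde_on_grid(ref, grid, bandwidth=0.2)
```

Sharing the grid means that a peak position found in one PDF is directly comparable with a peak position found in the other.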

cytopy.flow.fda_norm.filter_by_closest_centroid(x: numpy.ndarray, labels: numpy.ndarray, centroid: float)

Filter peaks (‘x’) to keep only those closest to their nearest centroid (centroid of clustered peaks). Labels indicate where the peak originated from; either target sample (0) or reference (1).

Parameters
  • x (numpy.ndarray) –

  • labels (numpy.ndarray) –

  • centroid (float) –

Returns

Peaks closest to centroid in cluster 1, Peaks closest to centroid in cluster 2

Return type

float, float
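One possible reading of this filter, sketched in NumPy: within a single cluster, keep the target peak and the reference peak that lie closest to the cluster centroid. The helper name closest_per_origin and the example values are assumptions, and the sketch assumes the cluster contains at least one peak from each origin.

```python
import numpy as np


def closest_per_origin(x, labels, centroid):
    """Return the target peak and the reference peak nearest to the centroid.

    Hypothetical helper; assumes both origins (0 and 1) are present in labels.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    picks = []
    for origin in (0, 1):
        candidates = x[labels == origin]
        nearest = np.argmin(np.abs(candidates - centroid))
        picks.append(float(candidates[nearest]))
    return tuple(picks)


# A cluster holding two target peaks (label 0) and one reference peak (label 1).
cluster_peaks = np.array([2.0, 2.6, 3.1])
origin = np.array([0, 0, 1])
centroid = float(cluster_peaks.mean())

target_peak, ref_peak = closest_per_origin(cluster_peaks, origin, centroid)
```

The target peak at 2.0 is discarded because 2.6 is nearer to the centroid, leaving one unique pair for the cluster.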

cytopy.flow.fda_norm.match_landmarks(p: numpy.ndarray, plabels: numpy.ndarray)

Given an array of peaks (p) labelled according to their origin (plabels; 0 being from target and 1 being from reference), match landmarks with each other, between samples, using K means clustering and a nearest centroid approach.

Parameters
  • p (numpy.ndarray) –

  • plabels (numpy.ndarray) –

Returns

(2, n) array, where n is the number of clusters. Order conserved between samples; first row is peaks from target, second row is peaks from reference.

Return type

numpy.ndarray

cytopy.flow.fda_norm.peaks(y: numpy.ndarray, x: numpy.ndarray, **kwargs)

Detect peaks of some function, y, in the grid space, x.

Parameters
  • y (numpy.ndarray) –

  • x (numpy.ndarray) –

  • kwargs – Additional keyword arguments passed to detecta.detect_peaks function

Returns

Return type

List
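Peak detection with a minimum height threshold can be sketched without detecta. The version below is a simplified stand-in that only finds strict interior local maxima; the helper name local_maxima, its min_fraction parameter, and the synthetic curve are assumptions for illustration.

```python
import numpy as np


def local_maxima(y, x, min_fraction=0.001):
    """Return grid positions of interior local maxima of y.

    Only maxima at least min_fraction * y.max() high are kept, which
    mirrors the idea of a minimum peak threshold.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    interior = np.arange(1, y.size - 1)
    is_peak = (y[interior] > y[interior - 1]) & (y[interior] > y[interior + 1])
    tall_enough = y[interior] >= min_fraction * y.max()
    return x[interior[is_peak & tall_enough]].tolist()


# A large bump near x = 3 and a small bump (5% of its height) near x = 7.
x = np.linspace(0.0, 10.0, 1001)
y = (np.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)
     + 0.05 * np.exp(-0.5 * ((x - 7.0) / 0.5) ** 2))

all_peaks = local_maxima(y, x)                       # both bumps pass
major_peaks = local_maxima(y, x, min_fraction=0.1)   # the small bump is dropped
```

Raising the threshold is the way to stop small perturbations in the estimated density from being treated as landmarks.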

cytopy.flow.fda_norm.unique_clusters_filter_nearest_centroid(p: numpy.ndarray, plabels: numpy.ndarray, km_labels: numpy.ndarray, centroids: numpy.ndarray)

Under the assumption that clusters have zero entropy (that is, all peaks within a cluster originate from the same sample), filter peaks to keep only those nearest to the centroid.

Parameters
  • p (numpy.ndarray) – Peaks

  • plabels (numpy.ndarray) – Origin of the peak; either target (0) or reference (1)

  • km_labels (numpy.ndarray) – Cluster label for each peak

  • centroids (numpy.ndarray) – Cluster centroids

Returns

Updated peaks and peak labels containing only those closest to cluster centroids

Return type

numpy.ndarray, numpy.ndarray

Raises

AssertionError – If a supplied cluster entropy is not zero

cytopy.flow.fda_norm.zero_entropy_clusters(km_labels: numpy.ndarray, plabels: numpy.ndarray, centroids: numpy.ndarray)

Determine which clusters (if any) have zero entropy, that is, contain peaks from a single sample only (either target or reference).

Parameters
  • km_labels (numpy.ndarray) – K means cluster labels

  • plabels (numpy.ndarray) – Origin of the peak; either target (0) or reference (1)

  • centroids (numpy.ndarray) – Cluster centroids

Returns

List of centroids for clusters with zero entropy

Return type

List
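The entropy of the origin labels within a cluster is zero exactly when only one origin is present, so the check reduces to counting unique labels. A minimal sketch follows; the helper name single_origin_centroids and the example arrays are assumptions, and it assumes cluster label i corresponds to centroids[i].

```python
import numpy as np


def single_origin_centroids(km_labels, plabels, centroids):
    """List the centroids of clusters whose peaks all share one origin.

    Hypothetical helper; assumes cluster label i corresponds to centroids[i].
    """
    km_labels = np.asarray(km_labels)
    plabels = np.asarray(plabels)
    pure = []
    for cluster_id, centroid in enumerate(centroids):
        origins = plabels[km_labels == cluster_id]
        # One unique origin label means zero entropy for this cluster.
        if origins.size > 0 and np.unique(origins).size == 1:
            pure.append(float(centroid))
    return pure


km_labels = np.array([0, 0, 1, 2, 2])   # cluster assigned to each peak
plabels = np.array([0, 1, 0, 1, 1])     # origin: 0 = target, 1 = reference
centroids = np.array([1.2, 3.0, 5.5])

zero_entropy = single_origin_centroids(km_labels, plabels, centroids)
```

Cluster 0 mixes target and reference peaks, so only the centroids of clusters 1 and 2 are returned.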