cytopy.flow.fda_norm
This module provides normalisation methods using landmark registration, first described with application to cytometry data by Hahne et al. [1] and further expanded by Finak et al. [2]. Landmark registration is implemented in the LandmarkReg class using Scikit-FDA.
[1] Hahne F, Khodabakhshi AH, Bashashati A, Wong CJ, Gascoyne RD, Weng AP, Seyfert-Margolis V, Bourcier K, Asare A, Lumley T, Gentleman R, Brinkman RR. Per-channel basis normalization methods for flow cytometry data. Cytometry A. 2010 Feb;77(2):121-31. doi: 10.1002/cyto.a.20823. PMID: 19899135; PMCID: PMC3648208.
[2] Finak G, Jiang W, Krouse K, et al. High-throughput flow cytometry data normalization for clinical trials. Cytometry A. 2014;85(3):277-286. doi:10.1002/cyto.a.22433
Copyright 2020 Ross Burton
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Classes:

LandmarkReg(target, ref, var[, mpt]) – One technique for handling technical variation in cytometry data is local normalisation by aligning the probability density function of some data to a reference sample.

Functions:

cluster_landmarks(p, plabels) – Cluster peaks (p).

estimate_pdfs(target, ref, var) – Given some target and reference DataFrame, estimate PDF for each using convolution-based kernel density estimation (see KDEpy).

filter_by_closest_centroid(x, labels, centroid) – Filter peaks (‘x’) to keep only those closest to their nearest centroid (centroid of clustered peaks).

match_landmarks(p, plabels) – Given an array of peaks (p) labelled according to their origin (plabels; 0 being from target and 1 being from reference), match landmarks with each other, between samples, using K means clustering and a nearest centroid approach.

peaks(y, x, **kwargs) – Detect peaks of some function, y, in the grid space, x.

unique_clusters_filter_nearest_centroid(p, plabels, km_labels, centroids) – Under the assumption that clusters have zero entropy (that is, all peaks within a cluster originate from the same sample), filter peaks to keep only those nearest to the centroid.

zero_entropy_clusters(km_labels, plabels, centroids) – Determine which clusters (if any) have zero entropy (contain peaks from only a single sample; either target or reference).

class cytopy.flow.fda_norm.LandmarkReg(target: pandas.core.frame.DataFrame, ref: pandas.core.frame.DataFrame, var: str, mpt: float = 0.001, **kwargs)
One technique for handling technical variation in cytometry data is local normalisation by aligning the probability density function of some data to a reference sample. This should be applied to a population immediately prior to applying a gate.
The alignment algorithm is inspired by previous work [1, 2] and is performed as follows:
1. The probability density functions of some target data and a reference sample are estimated using a convolution-based fast kernel density estimation algorithm (KDEpy.FFTKDE).
2. Landmarks are identified in both samples as peaks of local maximal density.
3. The peaks from both target and reference are combined and clustered using K means clustering; the number of clusters is chosen as the number of peaks identified in the target.
4. Unique pairings of peaks between samples, closest to the centroid of a cluster, are generated and used as landmarks.
5. Landmark registration is performed using the Scikit-FDA package to generate a warping function, with the target location being the mean between paired peaks.
6. The warping function is applied to the target data, generating a new adjusted vector with high density regions matched to the reference sample.
[1] Hahne F, Khodabakhshi AH, Bashashati A, Wong CJ, Gascoyne RD, Weng AP, Seyfert-Margolis V, Bourcier K, Asare A, Lumley T, Gentleman R, Brinkman RR. Per-channel basis normalization methods for flow cytometry data. Cytometry A. 2010 Feb;77(2):121-31. doi: 10.1002/cyto.a.20823. PMID: 19899135; PMCID: PMC3648208.
[2] Finak G, Jiang W, Krouse K, et al. High-throughput flow cytometry data normalization for clinical trials. Cytometry A. 2014;85(3):277-286. doi:10.1002/cyto.a.22433
 Parameters
target (Pandas.DataFrame) – Target data to be transformed; must contain column corresponding to ‘var’
ref (Pandas.DataFrame) – Reference data for computing alignment; must contain column corresponding to ‘var’
var (str) – Name of the target variable to align
mpt (float (default=0.001)) – Minimum peak threshold; peaks that are less than the given percentage of the ‘highest’ peak (max density) will be ignored. Use this to remove small perturbations.
kwargs – Additional keyword arguments passed to cytopy.flow.fda_norm.peaks call
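
Steps 5 and 6 of the alignment algorithm can be illustrated with a small, self-contained sketch. It is not the class's implementation: the documented class builds its warping function with Scikit-FDA, whereas this sketch substitutes plain piecewise-linear interpolation (`numpy.interp`). The function name `piecewise_linear_warp` and the `lower`/`upper` domain bounds are hypothetical and exist only for illustration.

```python
import numpy as np


def piecewise_linear_warp(x, landmarks, lower, upper):
    """Move each target landmark to the mean of its target/reference pair.

    `landmarks` is a (2, n) array: row 0 holds target peaks, row 1 the
    matched reference peaks. Piecewise-linear interpolation is an
    assumption made for this sketch only.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    order = np.argsort(landmarks[0])           # keep pairs together while sorting
    target_peaks = landmarks[0][order]
    ref_peaks = landmarks[1][order]
    location = (target_peaks + ref_peaks) / 2  # mean between paired peaks
    # Pin the edges of the domain so the warp is defined everywhere.
    src = np.concatenate(([lower], target_peaks, [upper]))
    dst = np.concatenate(([lower], location, [upper]))
    return np.interp(x, src, dst)


landmarks = np.array([[1.0, 4.0],   # target peaks
                      [2.0, 5.0]])  # matched reference peaks
x = np.array([0.0, 1.0, 4.0, 6.0])
shifted = piecewise_linear_warp(x, landmarks, lower=0.0, upper=6.0)
print(shifted)
```

Values sitting on a target peak land on the midpoint of the pair, and values between landmarks are moved proportionally.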

landmarks
(2, n) array, where n is the number of clusters. Order conserved between samples; first row is peaks from target, second row is peaks from reference.
 Type
numpy.ndarray

original_functions
Original PDFs for target and reference
 Type
skfda.representation.grid.FDataGrid

warping_function
Warping function
 Type
skfda.representation.grid.FDataGrid

adjusted_functions
Registered curves following function composition of original PDFs and warping function
 Type
skfda.representation.grid.FDataGrid

landmark_shift_deltas
Corresponding shifts to align the landmarks of the PDFs described in original_functions
 Type
numpy.ndarray
Methods:
plot_shift(x[, ax]) – Plot the reference PDF and overlay the target data before and after landmark registration.
plot_warping([ax]) – Generate a figure that plots the PDFs prior to landmark registration, the warping function, and the registered curves.
shift_data(x) – Provided the original vector of data to transform, use the warping function to normalise the data and align the reference.

plot_shift(x: numpy.ndarray, ax: Optional[matplotlib.axes._axes.Axes] = None)
Plot the reference PDF and overlay the target data before and after landmark registration.
 Parameters
x (numpy.ndarray) – Target data
ax (Matplotlib.Axes, optional) –
 Returns
 Return type
Matplotlib.Axes

plot_warping(ax: Optional[list] = None)
Generate a figure that plots the PDFs prior to landmark registration, the warping function, and the registered curves.
 Parameters
ax (Matplotlib.Axes, optional) –
 Returns
 Return type
Matplotlib.Axes

shift_data(x: numpy.ndarray)
Provided the original vector of data to transform, use the warping function to normalise the data and align the reference.
 Parameters
x (numpy.ndarray) –
 Returns
 Return type
numpy.ndarray
 Raises
AssertionError – If the class has not been called and therefore a warping function has not been defined

cytopy.flow.fda_norm.cluster_landmarks(p: numpy.ndarray, plabels: numpy.ndarray)
Cluster peaks (p). plabels indicate where each peak originated from; either target sample (0) or reference (1). The number of clusters used for K means clustering is equal to the number of peaks in the target sample.
 Parameters
p (numpy.ndarray) – Peaks
plabels (numpy.ndarray) – Peak labels
 Returns
K Means labels for each peak, cluster centroids
 Return type
numpy.ndarray, numpy.ndarray
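
As a rough illustration of the clustering step, the sketch below runs a plain Lloyd's iteration on one-dimensional peaks, with the number of clusters set to the number of target peaks as described above. It is a stand-in for the library's K means call, not a copy of it: the name `kmeans_1d`, the quantile-based initialisation, and the iteration cap are all assumptions made for this example.

```python
import numpy as np


def kmeans_1d(p, k, n_iter=100):
    """Cluster 1D peaks into k groups; returns (labels, centroids)."""
    p = np.asarray(p, dtype=float)
    # Deterministic start (assumption): k evenly spaced quantiles of the peaks.
    centroids = np.quantile(p, np.linspace(0, 1, k))
    for _ in range(n_iter):
        # Assign every peak to its nearest centroid.
        labels = np.argmin(np.abs(p[:, None] - centroids[None, :]), axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        updated = np.array([
            p[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


p = np.array([1.0, 5.0, 1.2, 5.4])   # peaks from both samples
plabels = np.array([0, 0, 1, 1])     # 0 = target, 1 = reference
k = int(np.sum(plabels == 0))        # one cluster per target peak
km_labels, centroids = kmeans_1d(p, k)
print(km_labels, centroids)
```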

cytopy.flow.fda_norm.estimate_pdfs(target: pandas.core.frame.DataFrame, ref: pandas.core.frame.DataFrame, var: str)
Given some target and reference DataFrame, estimate PDF for each using convolution-based kernel density estimation (see KDEpy). ‘var’ is the variable of interest and should be a column in both ref and target.
 Parameters
target (Pandas.DataFrame) –
ref (Pandas.DataFrame) –
var (str) –
 Returns
Target PDF, reference PDF, and grid space
 Return type
(numpy.ndarray, numpy.ndarray, numpy.ndarray)
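
The key idea in this function is that both PDFs are evaluated on one shared grid, so that their peaks are directly comparable. The sketch below shows that idea with a naive Gaussian kernel density estimate written in NumPy. The documented function relies on KDEpy's convolution-based estimator instead; the helper `gaussian_kde_on_grid`, the fixed bandwidth, the padding, and the grid size here are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kde_on_grid(values, grid, bw):
    """Naive Gaussian KDE of `values`, evaluated at every point of `grid`."""
    values = np.asarray(values, dtype=float)
    z = (grid[:, None] - values[None, :]) / bw
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (values.size * bw)


target = np.array([0.9, 1.0, 1.1, 4.0, 4.2])
ref = np.array([1.4, 1.5, 1.6, 4.5, 4.7])

# One grid spanning both samples, padded so the tails are not cut off.
pad = 2.0
x = np.linspace(min(target.min(), ref.min()) - pad,
                max(target.max(), ref.max()) + pad, 512)
y_target = gaussian_kde_on_grid(target, x, bw=0.3)
y_ref = gaussian_kde_on_grid(ref, x, bw=0.3)
```

The three arrays mirror the documented return value: target PDF, reference PDF, and grid space.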

cytopy.flow.fda_norm.filter_by_closest_centroid(x: numpy.ndarray, labels: numpy.ndarray, centroid: float)
Filter peaks (‘x’) to keep only those closest to their nearest centroid (centroid of clustered peaks). Labels indicate where the peak originated from; either target sample (0) or reference (1).
 Parameters
x (numpy.ndarray) –
labels (numpy.ndarray) –
centroid (float) –
 Returns
Peaks closest to centroid in cluster 1, Peaks closest to centroid in cluster 2
 Return type
float, float
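
One plausible reading of this filter, sketched below, is that within a single cluster it keeps one peak per sample: the target peak and the reference peak nearest to the cluster centroid. That interpretation, and the helper name `closest_pair_to_centroid`, are assumptions; the sketch also assumes the cluster holds at least one peak from each sample.

```python
import numpy as np


def closest_pair_to_centroid(x, labels, centroid):
    """Return (nearest target peak, nearest reference peak) to `centroid`."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    target = x[labels == 0]   # peaks that came from the target sample
    ref = x[labels == 1]      # peaks that came from the reference sample
    nearest_target = target[np.argmin(np.abs(target - centroid))]
    nearest_ref = ref[np.argmin(np.abs(ref - centroid))]
    return float(nearest_target), float(nearest_ref)


pair = closest_pair_to_centroid(
    x=[1.0, 1.5, 1.2, 3.0],
    labels=[0, 0, 1, 1],
    centroid=1.3,
)
print(pair)
```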

cytopy.flow.fda_norm.match_landmarks(p: numpy.ndarray, plabels: numpy.ndarray)
Given an array of peaks (p) labelled according to their origin (plabels; 0 being from target and 1 being from reference), match landmarks with each other, between samples, using K means clustering and a nearest centroid approach.
 Parameters
p (numpy.ndarray) –
plabels (numpy.ndarray) –
 Returns
(2, n) array, where n is the number of clusters. Order conserved between samples; first row is peaks from target, second row is peaks from reference.
 Return type
numpy.ndarray

cytopy.flow.fda_norm.peaks(y: numpy.ndarray, x: numpy.ndarray, **kwargs)
Detect peaks of some function, y, in the grid space, x.
 Parameters
y (numpy.ndarray) –
x (numpy.ndarray) –
kwargs – Additional keyword arguments passed to detecta.detect_peaks function
 Returns
 Return type
List
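
To make the idea concrete, the sketch below finds local maxima of `y` and reports them as positions on the grid `x`, discarding peaks below a minimum height. The documented function delegates detection to `detecta.detect_peaks`; this stand-in does not. Treating the threshold as a fraction of the maximum density is this example's reading of the `mpt` parameter of LandmarkReg, and the name `find_peaks_on_grid` is hypothetical.

```python
import numpy as np


def find_peaks_on_grid(y, x, mpt=0.001):
    """Return grid positions of local maxima at least mpt * max(y) high."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    interior = np.arange(1, y.size - 1)   # end points cannot be local maxima
    is_peak = (y[interior] > y[interior - 1]) & (y[interior] >= y[interior + 1])
    tall_enough = y[interior] >= mpt * y.max()
    return [float(v) for v in x[interior[is_peak & tall_enough]]]


x = np.linspace(0, 10, 11)
y = np.array([0, 1, 3, 1, 0, 0.001, 0, 2, 5, 2, 0], dtype=float)

print(find_peaks_on_grid(y, x, mpt=0.01))  # small bump at x = 5 is ignored
print(find_peaks_on_grid(y, x, mpt=0.0))   # every local maximum is kept
```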

cytopy.flow.fda_norm.unique_clusters_filter_nearest_centroid(p: numpy.ndarray, plabels: numpy.ndarray, km_labels: numpy.ndarray, centroids: numpy.ndarray)
Under the assumption that clusters have zero entropy (that is, all peaks within a cluster originate from the same sample), filter peaks to keep only those nearest to the centroid.
 Parameters
p (numpy.ndarray) – Peaks
plabels (numpy.ndarray) – Origin of the peak; either target (0) or reference (1)
km_labels (numpy.ndarray) – Cluster label for each peak
centroids (numpy.ndarray) – Cluster centroids
 Returns
Updated peaks and peak labels containing only those closest to cluster centroids
 Return type
numpy.ndarray, numpy.ndarray
 Raises
AssertionError – If a supplied cluster entropy is not zero

cytopy.flow.fda_norm.zero_entropy_clusters(km_labels: numpy.ndarray, plabels: numpy.ndarray, centroids: numpy.ndarray)
Determine which clusters (if any) have zero entropy, that is, contain peaks from only a single sample (either target or reference).
 Parameters
km_labels (numpy.ndarray) – K means cluster labels
plabels (numpy.ndarray) – Origin of the peak; either target (0) or reference (1)
centroids (numpy.ndarray) – Cluster centroids
 Returns
List of centroids for clusters with zero entropy
 Return type
List
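
The zero-entropy test can be sketched by computing the Shannon entropy of the origin labels inside each cluster: a cluster whose peaks all come from one sample has entropy 0. This is an illustrative reimplementation under stated assumptions, not the library's code: the name `zero_entropy_centroids` is hypothetical, and the sketch assumes that `centroids[k]` belongs to the cluster labelled `k`.

```python
import numpy as np


def zero_entropy_centroids(km_labels, plabels, centroids):
    """Return centroids of clusters whose peaks all share one origin."""
    km_labels = np.asarray(km_labels)
    plabels = np.asarray(plabels)
    pure = []
    for k, centroid in enumerate(centroids):
        origin = plabels[km_labels == k]
        if origin.size == 0:
            continue  # nothing was assigned to this cluster
        # Fraction of peaks from target (0) and from reference (1).
        frac = np.bincount(origin, minlength=2) / origin.size
        frac = frac[frac > 0]  # skip empty classes to avoid log2(0)
        entropy = -(frac * np.log2(frac)).sum()
        if np.isclose(entropy, 0.0):
            pure.append(float(centroid))
    return pure


km_labels = [0, 0, 1, 2, 2]      # cluster assigned to each peak
plabels = [0, 1, 0, 1, 1]        # 0 = target, 1 = reference
centroids = [1.0, 3.0, 5.0]
pure = zero_entropy_centroids(km_labels, plabels, centroids)
print(pure)
```

Cluster 0 mixes both samples and is excluded; clusters 1 and 2 each contain peaks from a single sample.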