cytopy.data.fcs¶
The fcs module houses all functionality for the management and manipulation of data pertaining to a single biological specimen. This might include multiple cytometry files (primary staining and controls), all of which are housed within the FileGroup document. FileGroups should be generated and accessed through the Experiment class.
Copyright 2020 Ross Burton
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Classes:
FileGroup
(*args, **kwargs) Document representation of a file group; a selection of related fcs files (e.g. a sample and its associated controls).
Functions:
data_loaded
(func) Decorator that asserts the h5 file corresponding to the FileGroup exists.
overwrite_or_create
(file, data, key) Check if node exists in hdf5 file.
population_in_file
(func) Wrapper to test if requested population passed to the given function exists in the given h5 file object
population_stats
(filegroup) Given a FileGroup generate a DataFrame detailing the number of events, proportion of parent population, and proportion of total (root population) for each population in the FileGroup.
set_column_names
(df, channels, markers[, preference]) Given a dataframe of fcs events and lists of channels and markers, set the column names according to the given preference.
-
class
cytopy.data.fcs.
FileGroup
(*args, **kwargs)¶ Document representation of a file group; a selection of related fcs files (e.g. a sample and its associated controls).
-
primary_id
¶ Unique ID to associate to group
- Type
str, required
-
files
¶ List of File objects
- Type
EmbeddedDocList
-
flags
¶ Warnings associated to file group
- Type
str, optional
-
notes
¶ Additional free text
- Type
str, optional
-
populations
¶ Populations derived from this file group
- Type
EmbeddedDocList
-
gates
¶ Gate objects that have been applied to this file group
- Type
EmbeddedDocList
-
collection_datetime
¶ Date and time of sample collection
- Type
DateTime, optional
-
processing_datetime
¶ Date and time of sample processing
- Type
DateTime, optional
-
valid
¶ True if FileGroup is valid
- Type
BooleanField (default=True)
-
subject
¶ Reference to Subject. If Subject is deleted, this field is nullified but the FileGroup will persist
- Type
ReferenceField
Methods:
add_ctrl_file
(ctrl_id, data, channels, markers) Add a new control file to this FileGroup.
add_population
(population) Add a new Population to this FileGroup.
delete
([delete_hdf5_file]) Delete FileGroup
delete_populations
(populations) Delete given populations.
get_population
(population_name) Given the name of a population associated with the FileGroup, return the Population object with its index and control index already loaded.
get_population_by_parent
(parent) Given the name of some parent population, return the Population objects whose parent matches
init_new_file
(data, channels, markers) Under the assumption that this FileGroup has not been previously defined, generate an HDF5 file and initialise the root Population
list_downstream_populations
(population) For a given population find all dependencies
list_populations
() List population names
load_ctrl_population_df
(ctrl, population[, …]) Load a population from an associated control.
load_population_df
(population[, transform, …]) Load the DataFrame for the events pertaining to a single population.
merge_gate_populations
(left, right[, …]) Merge two populations present in the current population tree.
merge_non_geom_populations
(populations, …) Merge multiple populations that are sourced from either classification or clustering methods.
population_stats
(population[, warn_missing]) Returns a dictionary of statistics (number of events, proportion of parent, and proportion of all events) for the requested population.
print_population_tree
([image, path]) Print population tree to stdout or save as an image if ‘image’ is True.
quantile_clean
([upper, lower]) Iterate over every channel in the flow data and cut the upper and lower quantiles.
save
(*args, **kwargs) Save FileGroup and associated populations
subtract_populations
(left, right[, …]) Subtract the right population from the left population.
update_population
(pop) Replace an existing population.
-
exception
DoesNotExist
¶
-
exception
MultipleObjectsReturned
¶
-
add_ctrl_file
(ctrl_id: str, data: numpy.array, channels: List[str], markers: List[str])¶ Add a new control file to this FileGroup.
- Parameters
ctrl_id (str) – Name of the control, e.g. “CD45RA FMO” or “HLA-DR isotype control”
data (numpy.ndarray) – Single cell events data obtained for this control
channels (list) – List of channel names
markers (list) – List of marker names
- Returns
- Return type
None
- Raises
AssertionError – If control already exists
-
add_population
(population: cytopy.data.population.Population)¶ Add a new Population to this FileGroup.
- Parameters
population (Population) –
- Returns
- Return type
None
- Raises
DuplicatePopulationError – Population already exists
AssertionError – Population is missing index
-
delete
(delete_hdf5_file: bool = True, *args, **kwargs)¶ Delete FileGroup
- Parameters
delete_hdf5_file (bool (default=True)) –
- Returns
- Return type
None
-
delete_populations
(populations: list) → None¶ Delete given populations. Populations downstream from the deleted population(s) will also be removed.
- Parameters
populations (list or str) – Either a list of populations (list of strings) to remove or a single population as a string. If a value of “all” is given, all populations are dropped.
- Returns
- Return type
None
- Raises
AssertionError – If invalid value given for populations
-
get_population
(population_name: str) → cytopy.data.population.Population¶ Given the name of a population associated with the FileGroup, return the Population object with its index and control index already loaded.
- Parameters
population_name (str) – Name of population to retrieve from database
- Returns
- Return type
Population
- Raises
MissingPopulationError – If population doesn’t exist
-
get_population_by_parent
(parent: str) → Generator¶ Given the name of some parent population, return the Population objects whose parent matches.
- Parameters
parent (str) – Name of the parent population to search for
- Returns
Generator of matching Population objects
- Return type
Generator
-
init_new_file
(data: numpy.array, channels: List[str], markers: List[str])¶ Under the assumption that this FileGroup has not been previously defined, generate an HDF5 file and initialise the root Population.
- Parameters
data (numpy.ndarray) –
channels (list) –
markers (list) –
- Returns
- Return type
None
-
list_downstream_populations
(population: str) → list¶ For a given population find all dependencies
- Parameters
population (str) – population name
- Returns
List of populations dependent on given population
- Return type
list or None
- Raises
AssertionError – If Population does not exist
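The idea of "finding all dependencies" can be illustrated with a small tree walk. This is a minimal sketch, not the library's implementation: the `parents` mapping (population name to parent name) and the function name `list_downstream` are hypothetical stand-ins for the FileGroup's population tree.

```python
from collections import deque
from typing import Dict, List, Optional


def list_downstream(parents: Dict[str, Optional[str]], population: str) -> List[str]:
    """Return every population that descends from `population`.

    `parents` maps each population name to its parent name (None for root).
    It is a hypothetical stand-in for the FileGroup population tree.
    """
    assert population in parents, f"{population} does not exist"
    downstream: List[str] = []
    queue = deque([population])
    while queue:
        current = queue.popleft()
        # Children are the populations whose recorded parent is `current`
        children = [name for name, parent in parents.items() if parent == current]
        downstream.extend(children)
        queue.extend(children)
    return downstream


tree = {
    "root": None,
    "T cells": "root",
    "CD4+": "T cells",
    "CD8+": "T cells",
    "CD4+CD25+": "CD4+",
}
descendants = list_downstream(tree, "T cells")
```

A breadth-first walk returns direct children before deeper descendants, which is convenient when the result is used to decide what else must be removed along with a deleted population.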
-
list_populations
() → list¶ List population names
- Returns
- Return type
List
-
load_ctrl_population_df
(ctrl: str, population: str, classifier: str = 'XGBClassifier', classifier_params: Optional[dict] = None, scoring: str = 'balanced_accuracy', transform: str = 'logicle', transform_kwargs: Optional[dict] = None, verbose: bool = True, evaluate_classifier: bool = True, kfolds: int = 5, n_permutations: int = 25, sample_size: int = 10000) → pandas.core.frame.DataFrame¶ Load a population from an associated control. The assumption here is that control files have been collected at the same time as primary staining and differ by the absence or permutation of a marker/channel/stain. Therefore the population of interest in the primary staining will be used as training data to identify the equivalent population in the control.
The user should specify the control file, the population they want (which MUST already exist in the primary staining), and the type of classifier to use. Additional parameters can be passed to control the classifier. Stratified cross validation with permutation testing will be performed if evaluate_classifier is set to True.
- Parameters
ctrl (str) – Control file to estimate population for
population (str) – Population of interest. MUST already exist in the primary staining.
classifier (str (default='XGBClassifier')) – Classifier to use. String value should correspond to a valid Scikit-Learn classifier class name or XGBClassifier for XGBoost.
classifier_params (dict, optional) – Additional keyword arguments passed when initiating the classifier
scoring (str (default='balanced_accuracy')) – Method used to evaluate the performance of the classifier if evaluate_classifier is True. String value should be one of the functions of Scikit-Learn’s classification metrics: https://scikit-learn.org/stable/modules/model_evaluation.html.
transform (str (default='logicle')) – Transformation to be applied to data prior to classification
transform_kwargs (dict, optional) – Additional keyword arguments applied to Transformer
verbose (bool (default=True)) – Whether to provide feedback
evaluate_classifier (bool (default=True)) – If True, stratified cross validation with permutation testing is applied prior to predicting the control population, feeding back to stdout the performance of the classifier across k folds and n permutations
kfolds (int (default=5)) – Number of cross validation rounds to perform if evaluate_classifier is True
n_permutations (int (default=25)) – Number of rounds of permutation testing to perform if evaluate_classifier is True
sample_size (int (default=10000)) – Number of events to sample from primary data for training
- Returns
- Return type
Pandas.DataFrame
- Raises
AssertionError – If desired population is not found in the primary staining
MissingControlError – If the chosen control does not exist
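The general approach described above (train on the primary staining, predict in the control) can be sketched with scikit-learn on synthetic data. This is an illustration under stated assumptions, not the method's actual code: the data are simulated, LogisticRegression is swapped in for the default XGBClassifier, and no transformation or cross validation is shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic primary staining: two channels, 500 events inside the population
# of interest (centred on 10) and 500 outside it (centred on 0).
primary = np.vstack([rng.normal(10, 1, (500, 2)), rng.normal(0, 1, (500, 2))])
in_population = np.array([1] * 500 + [0] * 500)

# Synthetic control acquired on the same channels: 200 inside, 300 outside.
ctrl = np.vstack([rng.normal(10, 1, (200, 2)), rng.normal(0, 1, (300, 2))])

# Down-sample the primary events for training (the role played by sample_size).
sample_size = 400
idx = rng.choice(primary.shape[0], size=sample_size, replace=False)

# Train on primary labels, then predict membership for each control event.
clf = LogisticRegression()
clf.fit(primary[idx], in_population[idx])
ctrl_population = ctrl[clf.predict(ctrl) == 1]
```

The simulated clusters are deliberately well separated so the example is easy to follow; real control data would call for the evaluation step that evaluate_classifier enables.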
-
load_population_df
(population: str, transform: str = 'logicle', features_to_transform: Optional[list] = None, transform_kwargs: Optional[dict] = None, label_downstream_affiliations: bool = False) → pandas.core.frame.DataFrame¶ Load the DataFrame for the events pertaining to a single population.
- Parameters
population (str) – Name of the desired population
transform (str or dict, optional (default="logicle")) – Transform to be applied; specify a value of None to not perform any transformation
features_to_transform (list, optional) – Features (columns) to be transformed. If not provided, all columns are transformed
transform_kwargs (dict, optional) – Additional keyword arguments passed to Transformer
label_downstream_affiliations (bool (default=False)) – If True, an additional column named “population_label” is generated containing the end node membership of each event. For example, if you choose the CD4+ population and its downstream populations form a tree like “CD4+ -> CD4+CD25+ -> CD4+CD25+CD45RA+”, the population label column will contain the name of the lowest possible “leaf” population that each event is assigned to.
- Returns
- Return type
Pandas.DataFrame
- Raises
AssertionError – Invalid population, does not exist
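The leaf labelling described for label_downstream_affiliations can be sketched in plain Python. The `parents` and `indexes` dictionaries and the function `leaf_labels` are hypothetical; the sketch assumes each child's event index is a subset of its parent's.

```python
from typing import Dict, List


def leaf_labels(parents: Dict[str, str],
                indexes: Dict[str, List[int]],
                population: str) -> Dict[int, str]:
    """Label each event in `population` with its deepest downstream population."""
    # Every event starts out labelled with the requested population
    labels = {event: population for event in indexes[population]}
    frontier = [population]
    while frontier:
        current = frontier.pop(0)
        for name, parent in parents.items():
            if parent == current:
                # A child is visited after its parent, so deeper labels overwrite
                for event in indexes[name]:
                    labels[event] = name
                frontier.append(name)
    return labels


parents = {
    "CD4+": "T cells",
    "CD4+CD25+": "CD4+",
    "CD4+CD25+CD45RA+": "CD4+CD25+",
}
indexes = {
    "CD4+": [0, 1, 2, 3, 4],
    "CD4+CD25+": [1, 2, 3],
    "CD4+CD25+CD45RA+": [3],
}
labels = leaf_labels(parents, indexes, "CD4+")
```

If sibling populations overlap, this sketch lets the sibling visited last win; how the library resolves that case is not stated in the documentation above.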
-
merge_gate_populations
(left: cytopy.data.population.Population, right: cytopy.data.population.Population, new_population_name: Optional[str] = None)¶ Merge two populations present in the current population tree. The merged population will have the combined index of both populations but will not inherit any clusters and will not be associated to any children downstream of either the left or right population. The population will be added to the tree as a descendant of the left population's parent. New population will be added to FileGroup.
- Parameters
left (Population) –
right (Population) –
new_population_name (str (optional)) –
- Returns
- Return type
None
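The "combined index" of a merged population can be thought of as a set union of the two event indexes. A minimal sketch with NumPy, where the index arrays are made-up values:

```python
import numpy as np

# Hypothetical event indexes for the left and right populations
left_index = np.array([2, 5, 7, 11])
right_index = np.array([5, 8, 11, 13])

# The union keeps each event once, sorted, even if both populations contain it
merged_index = np.union1d(left_index, right_index)
```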
-
merge_non_geom_populations
(populations: list, new_population_name: str)¶ Merge multiple populations that are sourced from either classification or clustering methods. (Not supported for populations from autonomous gates)
- Parameters
populations (list) – List of populations to merge
new_population_name (str) – Name of the new population
- Returns
- Return type
None
- Raises
ValueError – If populations is invalid
-
population_stats
(population: str, warn_missing: bool = False)¶ Returns a dictionary of statistics (number of events, proportion of parent, and proportion of all events) for the requested population.
- Parameters
population (str) –
warn_missing (bool (default=False)) –
- Returns
- Return type
Dict
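The three statistics are simple ratios of event counts. The following sketch shows the arithmetic only; the function name and dictionary keys are hypothetical and may not match the keys the method actually returns.

```python
def population_stats_sketch(n_events: int, n_parent: int, n_root: int) -> dict:
    """Compute event count, proportion of parent, and proportion of root."""
    return {
        "n": n_events,
        "prop_of_parent": n_events / n_parent,
        "prop_of_root": n_events / n_root,
    }


# 250 events in the population, 1000 in its parent, 5000 in the root
stats = population_stats_sketch(250, 1000, 5000)
```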
-
print_population_tree
(image: bool = False, path: Optional[str] = None)¶ Print population tree to stdout or save as an image if ‘image’ is True.
- Parameters
image (bool (default=False)) – Save tree as a png image
path (str (optional)) – File path for image, ignored if ‘image’ is False. Defaults to working directory.
- Returns
- Return type
None
-
quantile_clean
(upper: float = 0.999, lower: float = 0.001)¶ Iterate over every channel in the flow data and cut the upper and lower quantiles.
- Parameters
upper (float (default=0.999)) –
lower (float (default=0.001)) –
- Returns
- Return type
None
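Quantile-based cleaning can be sketched with pandas as follows. This assumes that thresholds are computed per channel on the unfiltered data and that an event is dropped if it falls outside the bounds in any channel; the method itself may differ in both respects. The function name and the simulated events are hypothetical.

```python
import numpy as np
import pandas as pd


def quantile_clean_sketch(df: pd.DataFrame,
                          upper: float = 0.999,
                          lower: float = 0.001) -> pd.DataFrame:
    """Keep only events within [lower, upper] quantiles of every channel."""
    keep = pd.Series(True, index=df.index)
    for channel in df.columns:
        lo = df[channel].quantile(lower)
        hi = df[channel].quantile(upper)
        # between() is inclusive at both ends
        keep &= df[channel].between(lo, hi)
    return df[keep]


rng = np.random.default_rng(0)
events = pd.DataFrame(rng.normal(size=(10000, 3)), columns=["FSC-A", "SSC-A", "CD3"])
cleaned = quantile_clean_sketch(events)
```

With the default bounds, only the most extreme 0.1% of events at each end of each channel are removed, so the vast majority of events survive.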
-
save
(*args, **kwargs)¶ Save FileGroup and associated populations
- Returns
- Return type
None
-
subtract_populations
(left: cytopy.data.population.Population, right: cytopy.data.population.Population, new_population_name: Optional[str] = None)¶ Subtract the right population from the left population. The right population must either have the same parent as the left population or be downstream of the left population. The new population will descend from the same parent as the left population. The new population will have a PolygonGeom geom. New population will be added to FileGroup.
- Parameters
left (Population) –
right (Population) –
new_population_name (str (optional)) –
- Returns
- Return type
None
- Raises
ValueError – If left and right population are not sourced from root or Gate
AssertionError – If left and right population do not share the same parent or the right population is not downstream of the left population
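At the level of event indexes, subtraction amounts to a set difference. A minimal sketch with NumPy, using made-up index arrays and ignoring the geometry and tree bookkeeping the method also performs:

```python
import numpy as np

# Hypothetical event indexes; here the right population is downstream of the left
left_index = np.array([1, 2, 3, 4, 5, 6])
right_index = np.array([2, 4, 6])

# Events in the left population that are not in the right population
subtracted_index = np.setdiff1d(left_index, right_index)
```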
-
update_population
(pop: cytopy.data.population.Population)¶ Replace an existing population. Population to replace identified using ‘population_name’ field. Note: this method does not allow you to edit the
- Parameters
pop (Population) – New population object
- Returns
- Return type
None
-
-
cytopy.data.fcs.
data_loaded
(func: callable) → callable¶ Decorator that asserts the h5 file corresponding to the FileGroup exists.
- Parameters
func (callable) – Function to wrap
- Returns
Wrapper function
- Return type
callable
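A decorator of this kind can be sketched with the standard library. The attribute name `h5path`, the class `DummyGroup`, and the function `data_loaded_sketch` are assumptions made for illustration; the real decorator may locate the file differently.

```python
import os
import tempfile
from functools import wraps


def data_loaded_sketch(func):
    """Assert that the object's HDF5 file exists before calling `func`."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        # `h5path` is a hypothetical attribute holding the file location
        assert os.path.isfile(self.h5path), f"Could not locate {self.h5path}"
        return func(self, *args, **kwargs)
    return wrapper


class DummyGroup:
    def __init__(self, h5path: str):
        self.h5path = h5path

    @data_loaded_sketch
    def n_bytes(self) -> int:
        return os.path.getsize(self.h5path)


# File exists: the wrapped method runs normally
with tempfile.NamedTemporaryFile(suffix=".hdf5") as tmp:
    size = DummyGroup(tmp.name).n_bytes()

# File missing: the assertion fires before the method body is reached
try:
    DummyGroup("/no/such/file.hdf5").n_bytes()
    raised = False
except AssertionError:
    raised = True
```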
-
cytopy.data.fcs.
overwrite_or_create
(file: h5py._hl.files.File, data: numpy.ndarray, key: str)¶ Check if node exists in hdf5 file. If it does exist, overwrite with the given array otherwise create a new dataset.
- Parameters
file (h5py File object) –
data (Numpy Array) –
key (str) –
- Returns
- Return type
None
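The overwrite-or-create pattern can be sketched with h5py as below. The function name and the key "index/root" are hypothetical, and an in-memory file is used so nothing is written to disk; this is one plausible way to implement the behaviour described, not necessarily the library's.

```python
import h5py
import numpy as np


def overwrite_or_create_sketch(file: h5py.File, data: np.ndarray, key: str) -> None:
    """Replace the dataset at `key` if present, otherwise create it."""
    if key in file:
        # Delete first: the new array may differ in shape from the old one
        del file[key]
    file.create_dataset(key, data=data)


# driver="core" with backing_store=False keeps the file purely in memory
with h5py.File("sketch.hdf5", "w", driver="core", backing_store=False) as f:
    overwrite_or_create_sketch(f, np.arange(5), "index/root")
    overwrite_or_create_sketch(f, np.arange(3), "index/root")
    stored = f["index/root"][:]
```

Deleting and recreating, rather than assigning into the existing dataset, avoids shape mismatch errors when the replacement array has a different length.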
-
cytopy.data.fcs.
population_in_file
(func: callable)¶ Wrapper to test if requested population passed to the given function exists in the given h5 file object
- Parameters
func (callable) – Function to wrap
- Returns
- Return type
callable
-
cytopy.data.fcs.
population_stats
(filegroup: cytopy.data.fcs.FileGroup) → pandas.core.frame.DataFrame¶ Given a FileGroup generate a DataFrame detailing the number of events, proportion of parent population, and proportion of total (root population) for each population in the FileGroup.
- Parameters
filegroup (FileGroup) –
- Returns
- Return type
Pandas.DataFrame
-
cytopy.data.fcs.
set_column_names
(df: pandas.core.frame.DataFrame, channels: list, markers: list, preference: str = 'markers')¶ Given a dataframe of fcs events and lists of channels and markers, set the column names according to the given preference.
- Parameters
df (pd.DataFrame) –
channels (list) –
markers (list) –
preference (str) – Valid values are: ‘markers’ or ‘channels’
- Returns
- Return type
Pandas.DataFrame
- Raises
AssertionError – Preference must be either ‘markers’ or ‘channels’
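Renaming columns by preference can be sketched with pandas. The fallback to the other name when the preferred one is missing is an assumption of this sketch, as are the function name and the example channel and marker names.

```python
import pandas as pd


def set_column_names_sketch(df: pd.DataFrame,
                            channels: list,
                            markers: list,
                            preference: str = "markers") -> pd.DataFrame:
    """Name columns by marker or channel, falling back when one is missing."""
    assert preference in ("markers", "channels"), \
        "preference must be either 'markers' or 'channels'"
    if preference == "markers":
        preferred, fallback = markers, channels
    else:
        preferred, fallback = channels, markers
    df = df.copy()
    # Assumption: an empty or None preferred name falls back to the other list
    df.columns = [p if p else f for p, f in zip(preferred, fallback)]
    return df


raw = pd.DataFrame([[1.0, 2.0, 3.0]])
channels = ["FSC-A", "BV421-A", "PE-A"]
markers = [None, "CD3", "CD4"]
by_marker = set_column_names_sketch(raw, channels, markers, preference="markers")
by_channel = set_column_names_sketch(raw, channels, markers, preference="channels")
```

Scatter channels such as FSC-A typically have no marker, which is why a fallback is useful when markers are preferred.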