cytopy.data.experiment

The experiment module houses the Experiment class, used to define cytometry-based experiments that can consist of one or more biological specimens. An experiment should be defined for each cytometry staining panel used in your analysis, and the single cell data (contained in *.fcs files) added to the experiment using the ‘add_new_sample’ method. Experiments should be created using the Project class (see cytopy.data.project). All functionality for Experiments and Panels is housed within this module.

Copyright 2020 Ross Burton

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Classes:

Experiment(*args, **values)

Container for Cytometry experiment.

NormalisedName(*args, **kwargs)

Defines a standardised name for a channel or marker and provides a method for testing whether a channel/marker should be associated with that standard.

Panel(*args, **kwargs)

Document representation of channel/marker definition for an experiment.

Functions:

check_duplication(x)

Internal method.

check_excel_template(path)

Check excel template and if valid return pandas dataframes

check_pairing(channel_marker, ref_mappings)

Internal method.

compenstate(x, spill_matrix)

Compensate the given data, x, using the spillover matrix by solving for their linear combination.

duplicate_mappings(mappings)

Check for duplicates in a list of dictionaries describing channel/marker mappings.

load_control_population_from_experiment(…)

Load Population from a given control from samples in the given Experiment and generate a standard exploration dataframe that contains the columns ‘sample_id’, ‘subject_id’, and initialises additional columns with null values if specified (additional_columns).

load_population_data_from_experiment(…[, …])

Load Population from samples in the given Experiment and generate a standard exploration dataframe that contains the columns ‘sample_id’, ‘subject_id’, ‘meta_label’ and initialises additional columns with null values if specified (additional_columns).

missing_channels(mappings, channels[, errors])

Check a list of channel/marker dictionaries for missing channels according to the reference channels given.

query_normalised_list(x, ref)

Internal method for querying a channel/marker against a reference list of NormalisedName objects

standardise_names(channel_marker, …)

Given a dictionary detailing a channel/marker pair ({“channel”: str, “marker”: str}) standardise its contents using the reference material provided.

class cytopy.data.experiment.Experiment(*args, **values)

Container for Cytometry experiment. The correct way to generate and load these objects is using the Project.add_experiment method (see cytopy.data.project.Project). This object provides access to all experiment-wide functionality. New files can be added to an experiment using the add_new_sample method.

experiment_id

Unique identifier for experiment

Type

str, required

panel

Panel object describing associated channel/marker pairs

Type

ReferenceField, required

fcs_files

Reference field for associated files

Type

ListField

flags

Warnings associated to experiment

Type

str, optional

notes

Additional free text comments

Type

str, optional

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Methods:

control_counts([ax])

Generates a barplot of total counts of each control in the Experiment's FileGroups

delete([signal_kwargs])

Delete Experiment; will delete all associated FileGroups.

delete_all_populations(sample_id)

Delete population data associated to experiment.

filter_samples_by_subject(query)

Filter FileGroups associated to this experiment based on some subject meta-data

generate_panel(panel_definition)

Associate a panel to this Experiment, either by fetching an existing panel using the given panel name or by generating a new panel using the panel definition provided (path to a valid template).

get_sample(sample_id)

Given a sample ID, return the corresponding FileGroup object

list_samples([valid_only])

Generate a list of IDs of file groups associated to experiment

merge_populations(mergers)

For each FileGroup in sequence, merge populations.

population_statistics([populations])

Generates a Pandas DataFrame of population statistics for all FileGroups of an Experiment, for the given populations or all available populations if ‘populations’ is None.

remove_sample(sample_id)

Remove sample (FileGroup) from experiment.

sample_exists(sample_id)

Returns True if the given sample_id exists in Experiment

exception DoesNotExist
exception MultipleObjectsReturned
control_counts(ax: Optional[matplotlib.axes._axes.Axes] = None) -> matplotlib.axes._axes.Axes

Generates a barplot of total counts of each control in the Experiment's FileGroups

Parameters

ax (Matplotlib.Axes, optional) –

Returns

Return type

Matplotlib.Axes

delete(signal_kwargs=None, **write_concern)

Delete Experiment; will delete all associated FileGroups.

Returns

Return type

None

delete_all_populations(sample_id: str) -> None

Delete population data associated to experiment. Give a value of ‘all’ for sample_id to remove all population data for every sample.

Parameters

sample_id (str) – Name of the sample to remove populations from; give a value of ‘all’ to remove population data for every sample.

Returns

Return type

None

filter_samples_by_subject(query: str) -> list

Filter FileGroups associated to this experiment based on some subject meta-data

Parameters

query (str or mongoengine.queryset.visitor.Q) – Query to make on Subject

Returns

Return type

List

generate_panel(panel_definition: str) -> None

Associate a panel to this Experiment, either by fetching an existing panel using the given panel name or by generating a new panel using the panel definition provided (path to a valid template).

Parameters

panel_definition (str) – Path to a panel definition

Returns

Return type

None

Raises

ValueError – Panel definition is not a string or dict

get_sample(sample_id: str) -> cytopy.data.fcs.FileGroup

Given a sample ID, return the corresponding FileGroup object

Parameters

sample_id (str) – Sample ID for search

Returns

Return type

FileGroup

Raises

MissingSampleError – If requested sample is not found in the experiment

list_samples(valid_only: bool = True) -> list

Generate a list of IDs of file groups associated to experiment

Parameters

valid_only (bool) – If True, returns only valid samples (samples without ‘invalid’ flag)

Returns

List of IDs of file groups associated to experiment

Return type

List

merge_populations(mergers: dict)

For each FileGroup in sequence, merge populations. Given dictionary should contain a key corresponding to the new population name and value being a list of populations to merge. If one or more populations are missing, then available populations will be merged.

Parameters

mergers (dict) –

Returns

Return type

None
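
The expected shape of the mergers dictionary can be illustrated with a small sketch. The population names and the merge_available helper below are hypothetical, and populations are modelled as plain sets of event indices instead of CytoPy objects, so this shows the described behaviour (missing populations are skipped and the available ones merged) and is not the library's implementation. Skipping a merge when none of the listed populations exist is an assumption.

```python
from typing import Dict, List, Set

# Hypothetical mergers dictionary: new population name -> populations to merge
mergers: Dict[str, List[str]] = {
    "T cells": ["CD4+ T cells", "CD8+ T cells"],
    "Myeloid": ["Monocytes", "Neutrophils"],
}


def merge_available(populations: Dict[str, Set[int]],
                    mergers: Dict[str, List[str]]) -> Dict[str, Set[int]]:
    """Union the event indices of whichever listed populations are present."""
    merged = {}
    for new_name, targets in mergers.items():
        available = [populations[t] for t in targets if t in populations]
        if available:  # assumption: nothing is created when no target exists
            merged[new_name] = set().union(*available)
    return merged


# Populations of one sample; "Neutrophils" is deliberately missing
sample_populations = {
    "CD4+ T cells": {0, 1, 2},
    "CD8+ T cells": {3, 4},
    "Monocytes": {5, 6},
}
result = merge_available(sample_populations, mergers)
```

Here "Myeloid" is built from "Monocytes" alone because "Neutrophils" is absent from the sample.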

population_statistics(populations: Optional[list] = None) -> pandas.core.frame.DataFrame

Generates a Pandas DataFrame of population statistics for all FileGroups of an Experiment, for the given populations or all available populations if ‘populations’ is None.

Parameters

populations (list, optional) –

Returns

Return type

Pandas.DataFrame

remove_sample(sample_id: str)

Remove sample (FileGroup) from experiment.

Parameters

sample_id (str) – ID of sample to remove

Returns

Return type

None

sample_exists(sample_id: str) -> bool

Returns True if the given sample_id exists in Experiment

Parameters

sample_id (str) – Name of sample to search for

Returns

True if exists, else False

Return type

bool

class cytopy.data.experiment.NormalisedName(*args, **kwargs)

Defines a standardised name for a channel or marker and provides a method for testing whether a channel/marker should be associated with that standard.

standard

the “standard” name i.e. the nomenclature we used for a channel/marker in this panel

Type

str, required

regex_str

regular expression used to test if a term corresponds to this standard

Type

str

permutations

String values that have direct association to this standard (comma-separated values)

Type

str

case_sensitive

is the nomenclature case sensitive? This would be false for something like ‘CD3’ for example, where ‘cd3’ and ‘CD3’ are synonymous

Type

bool, (default=False)

Methods:

query(x)

Given a term ‘x’, determine if ‘x’ is synonymous to this standard.

query(x: str) -> str

Given a term ‘x’, determine if ‘x’ is synonymous to this standard. If so, return the standardised name.

Parameters

x (str) – search term

Returns

Standardised name if synonymous to standard, else None

Return type

str or None
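
How the four attributes could combine in a query is sketched below. StandardName is a hypothetical stand-in written for illustration, not the CytoPy class; the order of checks (regular expression first, then exact permutations) and the example pattern are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandardName:
    """Hypothetical stand-in mirroring the attributes documented above."""
    standard: str
    regex_str: str
    permutations: str = ""       # comma-separated exact synonyms
    case_sensitive: bool = False

    def query(self, x: str) -> Optional[str]:
        """Return the standard name if x is synonymous with it, else None."""
        flags = 0 if self.case_sensitive else re.IGNORECASE
        if re.search(self.regex_str, x, flags):
            return self.standard
        # Fall back to exact matches against the listed permutations
        synonyms = [p.strip() for p in self.permutations.split(",") if p.strip()]
        if x in synonyms:
            return self.standard
        return None


cd3 = StandardName(standard="CD3", regex_str=r"^\s*CD[\-\s]*3\s*$", permutations="T3")
```

With this definition, "cd3", "CD-3" and "T3" all resolve to "CD3", whereas "CD33" does not match.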

class cytopy.data.experiment.Panel(*args, **kwargs)

Document representation of channel/marker definition for an experiment. A panel, once associated to an experiment, will standardise data upon input: when an fcs file is created in the database, it will be associated to an experiment and the channel/marker definitions in the fcs file will be mapped to the associated panel.

markers

list of marker names; see NormalisedName

Type

EmbeddedDocListField

channels

list of channels; see NormalisedName

Type

EmbeddedDocListField

mappings

list of channel/marker mappings; see ChannelMap

Type

EmbeddedDocListField

initiation_date

date of creation

Type

DateTime

Methods:

create_from_dict(x)

Populate panel attributes from a python dictionary

create_from_excel(path)

Populate panel attributes from an excel template

list_channels()

List of channels associated to panel

list_markers()

List of markers associated to panel

create_from_dict(x: dict)

Populate panel attributes from a python dictionary

Parameters

x (dict) – dictionary object containing panel definition

Returns

Return type

None

Raises

AssertionError – If invalid dictionary template

create_from_excel(path: str) -> None

Populate panel attributes from an excel template

Parameters

path (str) – path of file

Returns

Return type

None

Raises

AssertionError – If file path is invalid

list_channels() -> list

List of channels associated to panel

Returns

Return type

List

list_markers() -> list

List of markers associated to panel

Returns

Return type

List

cytopy.data.experiment.check_duplication(x: list) -> bool

Internal method. Given a list check for duplicates. Warning generated for duplicates.

Parameters

x (list) –

Returns

True if duplicates are found, else False

Return type

bool

cytopy.data.experiment.check_excel_template(path: str) -> (pandas.core.frame.DataFrame, pandas.core.frame.DataFrame)

Check excel template and if valid return pandas dataframes

Parameters

path (str) – file path for excel template

Returns

tuple of pandas dataframes (nomenclature, mappings) or None

Return type

(Pandas.DataFrame, Pandas.DataFrame) or None

Raises

AssertionError – If duplicate entries or missing entries in excel template

cytopy.data.experiment.check_pairing(channel_marker: dict, ref_mappings: List[cytopy.data.mapping.ChannelMap]) -> bool

Internal method. Given a channel and marker check that a valid pairing exists in the list of given mappings.

Parameters
  • channel_marker (dict) –

  • ref_mappings (list) – List of ChannelMap objects

Returns

True if pairing exists, else False

Return type

bool
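
The pairing check described above might look like the following sketch. Pair is a hypothetical stand-in for ChannelMap, assumed here to expose channel and marker fields, and pairing_exists is an illustrative name, not the library function.

```python
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """Hypothetical stand-in for a ChannelMap with channel and marker fields."""
    channel: str
    marker: str


def pairing_exists(channel_marker: Dict[str, str], ref_mappings: List[Pair]) -> bool:
    """True if the channel/marker pair matches one of the reference mappings."""
    return any(
        channel_marker["channel"] == ref.channel
        and channel_marker["marker"] == ref.marker
        for ref in ref_mappings
    )


refs = [Pair("FITC-A", "CD3"), Pair("PE-A", "CD4")]
ok = pairing_exists({"channel": "FITC-A", "marker": "CD3"}, refs)
mismatch = pairing_exists({"channel": "FITC-A", "marker": "CD4"}, refs)
```

The second call fails because the channel and the marker each exist in the reference list but not as a pair.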

cytopy.data.experiment.compenstate(x: numpy.ndarray, spill_matrix: numpy.ndarray) -> numpy.ndarray

Compensate the given data, x, using the spillover matrix by solving for their linear combination.

Parameters
  • x (numpy.ndarray) –

  • spill_matrix (numpy.ndarray) –

Returns

Return type

numpy.ndarray
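
Compensation by solving a linear system can be sketched as follows. This is a minimal illustration and not necessarily how the library function is written: it assumes events are rows, detectors are columns, and the observed signal equals the true signal multiplied by the spillover matrix. The function name compensate and the example values are chosen for this sketch.

```python
import numpy as np


def compensate(x: np.ndarray, spill_matrix: np.ndarray) -> np.ndarray:
    """Recover unmixed signals from observed events (rows) by detectors (columns).

    Assumes observed = true @ spill_matrix, so the true signal is found by
    solving spill_matrix.T @ true.T = observed.T without an explicit inverse.
    """
    return np.linalg.solve(spill_matrix.T, x.T).T


# Two detectors: 10% of fluorochrome 1 spills into detector 2, 5% the other way
spill = np.array([[1.00, 0.10],
                  [0.05, 1.00]])
true_signal = np.array([[100.0, 0.0],
                        [0.0, 200.0],
                        [50.0, 50.0]])
observed = true_signal @ spill          # what the instrument would record
recovered = compensate(observed, spill)
```

Solving the system directly is generally preferred over computing the inverse of the spillover matrix, since it is numerically more stable.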

cytopy.data.experiment.duplicate_mappings(mappings: List[dict])

Check for duplicates in a list of dictionaries describing channel/marker mappings. Raise AssertionError if duplicates found.

Parameters

mappings (list) –

Returns

Return type

None

Raises

AssertionError – If duplicate channel/marker found
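
A duplicate check over channel/marker dictionaries could be written as in the sketch below. The helper name is hypothetical, and both the decision to check channels and markers separately and to ignore empty values are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def assert_no_duplicate_mappings(mappings: List[Dict[str, str]]) -> None:
    """Raise AssertionError if any channel or marker name is used more than once."""
    for key in ("channel", "marker"):
        # Empty values are ignored here; that choice is an assumption
        names = [m[key] for m in mappings if m.get(key)]
        repeated = sorted(name for name, n in Counter(names).items() if n > 1)
        assert not repeated, f"Duplicate {key}(s) found: {repeated}"


valid_panel = [
    {"channel": "FITC-A", "marker": "CD3"},
    {"channel": "PE-A", "marker": "CD4"},
]
assert_no_duplicate_mappings(valid_panel)  # passes silently
```

Passing a list in which "FITC-A" appears twice would raise an AssertionError naming the repeated channel.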

cytopy.data.experiment.load_control_population_from_experiment(experiment: cytopy.data.experiment.Experiment, population: str, ctrl: str, transform: str = 'logicle', transform_kwargs: Optional[dict] = None, sample_ids: Optional[list] = None, verbose: bool = True, additional_columns: Optional[list] = None)

Load Population from a given control from samples in the given Experiment and generate a standard exploration dataframe that contains the columns ‘sample_id’, ‘subject_id’, and initialises additional columns with null values if specified (additional_columns).

Parameters
  • experiment (Experiment) –

  • population (str) –

  • ctrl (str) –

  • transform (str) –

  • transform_kwargs (dict, optional) –

  • sample_ids (list, optional) –

  • verbose (bool (default=True)) –

  • additional_columns (list, optional) –

Returns

Return type

Pandas.DataFrame

cytopy.data.experiment.load_population_data_from_experiment(experiment: cytopy.data.experiment.Experiment, population: str, transform: str = 'logicle', transform_kwargs: Optional[dict] = None, sample_ids: Optional[list] = None, verbose: bool = True, additional_columns: Optional[list] = None)

Load Population from samples in the given Experiment and generate a standard exploration dataframe that contains the columns ‘sample_id’, ‘subject_id’, ‘meta_label’ and initialises additional columns with null values if specified (additional_columns).

Parameters
  • experiment (Experiment) –

  • population (str) –

  • transform (str) –

  • transform_kwargs (dict, optional) –

  • sample_ids (list, optional) –

  • verbose (bool (default=True)) –

  • additional_columns (list, optional) –

Returns

Return type

Pandas.DataFrame

cytopy.data.experiment.missing_channels(mappings: List[dict], channels: List[cytopy.data.experiment.NormalisedName], errors: str = 'raise')

Check a list of channel/marker dictionaries for missing channels according to the reference channels given.

Parameters
  • mappings (list) –

  • channels (list) –

  • errors (str) –

Returns

Return type

None

Raises

KeyError – If channel is missing
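
The check can be sketched with plain strings standing in for the reference NormalisedName objects. The helper name is hypothetical, it returns the missing names for convenience (the documented function returns None), and warning instead of raising for any errors value other than 'raise' is an assumption.

```python
import warnings
from typing import Dict, List


def find_missing_channels(mappings: List[Dict[str, str]],
                          channels: List[str],
                          errors: str = "raise") -> List[str]:
    """Report reference channels that no mapping refers to."""
    present = {m["channel"] for m in mappings}
    missing = [c for c in channels if c not in present]
    if missing:
        message = f"Missing channel(s): {missing}"
        if errors == "raise":
            raise KeyError(message)
        warnings.warn(message)  # assumed behaviour for any other value
    return missing


panel_mappings = [{"channel": "FITC-A", "marker": "CD3"}]
complete = find_missing_channels(panel_mappings, ["FITC-A"])
```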

cytopy.data.experiment.query_normalised_list(x: str, ref: List[cytopy.data.experiment.NormalisedName]) -> str

Internal method for querying a channel/marker against a reference list of NormalisedName objects

Parameters
  • x (str or None) – channel/marker to query

  • ref (list) – list of NormalisedName objects for reference search

Returns

Standardised name

Return type

str

Raises

AssertionError – If no or multiple matches found in query
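
The "exactly one match" rule can be illustrated with a short sketch. Each reference entry below is a hypothetical (standard, regex) pair standing in for a NormalisedName object, and query_reference is an illustrative name; case-insensitive matching is assumed.

```python
import re
from typing import List, Tuple


def query_reference(x: str, ref: List[Tuple[str, str]]) -> str:
    """Return the one standard name whose pattern matches x.

    Raises AssertionError when there is no match or more than one match.
    """
    matches = [standard for standard, pattern in ref
               if re.search(pattern, x, re.IGNORECASE)]
    assert len(matches) != 0, f"No match found for {x!r}"
    assert len(matches) == 1, f"Multiple matches found for {x!r}: {matches}"
    return matches[0]


reference = [("CD3", r"^CD[\-\s]*3$"), ("CD4", r"^CD[\-\s]*4$")]
standardised = query_reference("cd-4", reference)
```

Requiring a unique match guards against overly broad patterns silently mapping one raw name to the wrong standard.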

cytopy.data.experiment.standardise_names(channel_marker: dict, ref_channels: List[cytopy.data.experiment.NormalisedName], ref_markers: List[cytopy.data.experiment.NormalisedName], ref_mappings: List[cytopy.data.mapping.ChannelMap]) -> dict

Given a dictionary detailing a channel/marker pair ({“channel”: str, “marker”: str}) standardise its contents using the reference material provided.

Parameters
  • channel_marker (dict) –

  • ref_channels (list) –

  • ref_markers (list) –

  • ref_mappings (list) –

Returns

Return type

dict

Raises

ValueError – If channel and marker are missing