Dna.signature

missionbio.mosaic.dna.Dna.signature

Dna.signature(attribute: Union[str, ndarray, DataFrame], kind: str = 'median', splitby: Optional[Union[str, ndarray, DataFrame]] = 'label', features: Optional[Union[str, ndarray, DataFrame]] = None, nan_value: Optional[float] = None) -> DataFrame

The chosen signature for each cluster and feature.

Generate feature signatures for each cluster/feature pair across all barcodes using the supplied assay and layer.

Parameters:
attribute : Union[str, np.ndarray, pd.DataFrame]

Name of the layer or row attribute to be evaluated. Uses _Assay.get_attribute() constrained by row to retrieve the values.

kind : [“median”, “mode”, “std”, “mean”], default “median”

The kind of signature to return.

splitby : Union[None, str, np.ndarray], default LABEL

The labels by which the cells are split. The signature is returned for each unique label. Uses _Assay.get_attribute() to retrieve the values constrained by row. The shape must be equal to (#cells). When this is None no grouping across cells is performed.

features : Union[None, str, np.ndarray], default None

The labels by which the ids are grouped. The signature is returned for each unique group. The signature kind (“median”, “mean”…) is first applied across the cells and then across the features. _Assay.get_attribute() constrained by col is used to fetch the features. The shape must be equal to (#ids) if grouping is to be performed. A subset of ids can also be passed to get the signature for only those ids, or to reorder the ids in the returned DataFrame. When this is None, no grouping across ids is performed.

nan_value : Optional[float]

The value in the matrix that is to be converted to NaN. NaN values are removed before calculating the signatures.

Returns:
pd.DataFrame

The index holds the clusters and the columns hold the features.
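
The two-stage aggregation described for splitby and features can be sketched in plain Python. This is only an illustration under stated assumptions, not Mosaic's implementation: the helper name grouped_median and the nested-list matrix are hypothetical, and only the median kind is shown.

```python
import statistics
from collections import defaultdict


def grouped_median(matrix, cell_labels, id_groups):
    """Hypothetical helper. matrix[i][j] is the value for cell i and id j."""
    # Collect the rows (cells) that belong to each label.
    rows_by_label = defaultdict(list)
    for row, label in zip(matrix, cell_labels):
        rows_by_label[label].append(row)

    result = {}
    for label, rows in rows_by_label.items():
        # Stage 1: median across the cells of this label, one value per id.
        per_id = [statistics.median(column) for column in zip(*rows)]
        # Stage 2: median across the ids that share a feature group.
        by_group = defaultdict(list)
        for value, group in zip(per_id, id_groups):
            by_group[group].append(value)
        result[label] = {g: statistics.median(v) for g, v in by_group.items()}
    return result


# Three cells (two labelled "A", one "B") and three ids on chromosomes 1, 1, 7.
example = grouped_median(
    matrix=[[2, 2, 4], [2, 4, 4], [1, 1, 2]],
    cell_labels=["A", "A", "B"],
    id_groups=["1", "1", "7"],
)
```

Here `example` maps each label to one value per feature group, which mirrors the layout of the returned DataFrame (clusters as rows, features as columns).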

Notes

  1. The signature of all NaNs is NaN.

  2. The median of an even number of values is the average of the two middle values.

  3. When there are multiple modes, the lowest value is returned.

  4. The standard deviation of a single point is NaN.
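
The four rules above, together with the nan_value handling, can be reproduced for a single cluster/feature pair with the standard library. This is a minimal sketch, not the library's code: signature_1d is a hypothetical name, and the use of the sample standard deviation is an assumption chosen because it is undefined for a single point.

```python
import math
import statistics


def signature_1d(values, kind="median", nan_value=None):
    """Hypothetical helper: signature of one cluster/feature pair."""
    # Treat nan_value as missing, then drop every missing value.
    kept = [v for v in values if not math.isnan(v) and v != nan_value]
    if not kept:
        return math.nan  # rule 1: a signature of all NaNs is NaN
    if kind == "median":
        return statistics.median(kept)  # rule 2: averages the two middle values
    if kind == "mean":
        return statistics.fmean(kept)
    if kind == "mode":
        return min(statistics.multimode(kept))  # rule 3: lowest of tied modes
    if kind == "std":
        # rule 4 (assumption: sample standard deviation, undefined for one point)
        return statistics.stdev(kept) if len(kept) > 1 else math.nan
    raise ValueError(f"unknown kind: {kind!r}")


# Missing genotype calls coded as 3 are dropped before taking the median.
median_ngt = signature_1d([0, 1, 3, 2], nan_value=3)
```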

Examples

To remove all the missing NGT values before calculating the median NGT, the following can be called.

>>> import missionbio.mosaic as ms
>>> sample = ms.load_example_dataset("3 cell mix")
>>> sample.dna.signature("NGT", nan_value=3)

To compute the standard deviation of AF where DP is not 0, use the AF_MISSING layer, in which the values to be excluded are stored as -50.

>>> sample.dna.signature("AF_MISSING", kind="std", nan_value=-50)

To compute the median copy number for each DNA cell type, the CNV assay will be used. Note that features=None by default, so the values are not combined across the ids.

>>> dna_labels = sample.dna.get_labels()
>>> sample.cnv.signature("ploidy", splitby=dna_labels)

To get the median value per cell type per chromosome, the features argument can be passed.

>>> sample.cnv.signature("ploidy", splitby=dna_labels, features="CHROM")

It is also possible to obtain the median value per chromosome but for all the cells by setting splitby=None. These values can be used to color the cells in scatterplots. For example, to color the cells in a UMAP by the median ploidy of chromosome 7, the following can be executed.

>>> chr_ploidy = sample.cnv.signature("ploidy", splitby=None, features="CHROM")
>>> chr7_ploidy = chr_ploidy["7"].values
>>> sample.protein.scatterplot("umap", colorby=chr7_ploidy)
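
With splitby=None the cells are not grouped, so only the aggregation across ids remains and each cell receives its own value per feature group. The sketch below illustrates that idea in plain Python. The helper per_cell_group_median and the nested-list matrix are hypothetical and are not part of Mosaic.

```python
import statistics


def per_cell_group_median(matrix, id_groups, group):
    """Hypothetical helper: one median per cell over the ids in `group`."""
    # Column positions of the ids that belong to the requested group.
    columns = [j for j, g in enumerate(id_groups) if g == group]
    # No grouping across cells: each row (cell) is aggregated on its own.
    return [statistics.median([row[j] for j in columns]) for row in matrix]


# Two cells and four ids, three of which lie on chromosome 7.
chr7_values = per_cell_group_median(
    matrix=[[2, 2, 4, 2], [2, 4, 4, 4]],
    id_groups=["1", "7", "7", "7"],
    group="7",
)
```

The resulting list has one entry per cell, which is the shape needed when the values are used to color individual cells in a scatterplot.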
