Dna.count

Dna.count(features, layer='NGT', group_missing=True, min_clone_size=1, ignore_zygosity=False, show_plot=False)

A clustering method available only for DNA.

Labels the cells based on the groups formed by the chosen features. The values are stored in the LABEL layer.
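The grouping idea can be illustrated with a small, library-free sketch. It is not the method's actual implementation: the helper names `label_cells` and `clone_sizes`, the label format, and the NGT coding (0 = WT, 1 = HET, 2 = HOM, 3 = missing) are assumptions made for illustration only.

```python
from collections import Counter

MISSING = 3  # assumed NGT code for a missing call (0 = WT, 1 = HET, 2 = HOM)


def label_cells(ngt_rows, group_missing=True, ignore_zygosity=False):
    """Label each cell by its genotype pattern across the chosen features.

    ngt_rows is a list of per-cell tuples of NGT codes, one code per feature.
    """
    labels = []
    for row in ngt_rows:
        if group_missing and MISSING in row:
            # Any cell with a missing call goes into one shared group.
            labels.append("Missing")
            continue
        if ignore_zygosity:
            # Collapse HOM (2) onto HET (1) so both count as mutated.
            row = tuple(1 if g == 2 else g for g in row)
        labels.append("-".join(str(g) for g in row))
    return labels


def clone_sizes(labels):
    """Return each label's share of all cells, as a percentage."""
    total = len(labels)
    return {name: 100 * n / total for name, n in Counter(labels).items()}


# Five cells genotyped at two hypothetical features.
cells = [(0, 1), (0, 1), (2, 1), (3, 1), (0, 0)]
labels = label_cells(cells)
print(labels)
print(clone_sizes(labels))
```

The percentages returned by `clone_sizes` are on the same 0 to 100 scale as min_clone_size, which makes it easy to see which groups would be large enough to count as separate clones.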

The returned dataframe describes the nature of the subclones, because false positive clones can arise from Allele Dropout (ADO) events. It contains three columns: a score, the parent clones, and the sister ADO clones. The indices are the subclone names.

In an ADO event, HET is miscalled as WT or HOM for a given variant in a subset of the cells. Here, the HET clone is called the parent clone, and the resulting HOM and WT clones are the ADO clones, together called the sister clones.

The parent and sister clones are np.nan if the score is zero. Otherwise, the parent is the name of the clone from which the subclone was obtained due to an ADO event, and the sister is the name of its sister ADO clone.
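The parent and sister relationships described above can be sketched as follows. This is only an illustration of the definitions, not the library's algorithm: the function name `ado_relatives`, the tuple representation of clones, and the NGT coding (0 = WT, 1 = HET, 2 = HOM) are assumptions.

```python
WT, HET, HOM = 0, 1, 2  # assumed NGT codes


def ado_relatives(clone, all_clones):
    """Find (parent, sister) candidates for `clone`.

    The parent equals `clone` except at one variant, where the parent is HET
    and `clone` is WT or HOM. The sister is the other ADO outcome at that
    variant (HOM if `clone` is WT, WT if `clone` is HOM), or None if no
    such clone was observed.
    """
    clones = set(all_clones)
    found = []
    for i, call in enumerate(clone):
        if call not in (WT, HOM):
            continue
        # Candidate parent: same genotype, but HET at variant i.
        parent = clone[:i] + (HET,) + clone[i + 1:]
        if parent not in clones:
            continue
        other_call = HOM if call == WT else WT
        sister = clone[:i] + (other_call,) + clone[i + 1:]
        found.append((parent, sister if sister in clones else None))
    return found


# Four hypothetical clones over two variants.
clones = [(1, 1), (0, 1), (2, 1), (0, 0)]
for c in clones:
    print(c, ado_relatives(c, clones))
```

A clone with no candidate parent returns an empty list, which corresponds to the case where the parent and sister entries would be np.nan.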

The score for each subclone measures the likelihood that it is a false positive subclone obtained due to an ADO event. The score is 0 if the subclone is unlikely to be due to ADO and 1 if it is highly likely to be an ADO clone.

The score takes into account the following metrics.
  1. NGT values of the clones

  2. Relative proportions of the clones

  3. Absolute proportions of the clones (uses min_clone_size as a parameter)

  4. Mean GQ of the clones

  5. Mean DP of the clones

The score is calculated from four sub-scores.

score = (ss + ds + gs) * ps

  1. ss - sister score (0 - 0.8)

    It measures the proportion of the clone with respect to its sister clone. This score is closer to 0.8 when the sister clones have similar proportions, and exactly 0.8 when their proportions are within min_clone_size of each other.

  2. ds - DP score (0 - 0.1)

    It measures the mean DP of the clone with respect to its parent clone. It is closer to 0.1 if the DP of the clone is lower than the parent’s DP.

  3. gs - GQ score (0 - 0.1)

    It measures the mean GQ of the clone with respect to its parent clone. It is closer to 0.1 if the GQ of the clone is lower than the parent’s GQ.

  4. ps - parent score (0 - 1)

    It measures the proportion of the clone with respect to the parent clone. This score is closer to 1 the larger the parent is compared to the clone, and closer to 0 the smaller the parent is compared to the clone.
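The way the four sub-scores combine can be checked with a minimal sketch. Only the formula and the ranges come from the text above; the function name `ado_score` and the range validation are assumptions, and how each sub-score is itself derived from the clone data is not shown.

```python
def ado_score(ss, ds, gs, ps):
    """Combine the four sub-scores as score = (ss + ds + gs) * ps."""
    limits = {"ss": (ss, 0.8), "ds": (ds, 0.1), "gs": (gs, 0.1), "ps": (ps, 1.0)}
    for name, (value, upper) in limits.items():
        # Each sub-score must stay inside its documented range.
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be in [0, {upper}], got {value}")
    return (ss + ds + gs) * ps


# A clone with a similarly sized sister, low DP and GQ, and a much larger parent.
print(ado_score(ss=0.8, ds=0.1, gs=0.1, ps=1.0))
# The same clone when the parent is much smaller than the clone itself.
print(ado_score(ss=0.8, ds=0.1, gs=0.1, ps=0.0))
```

Because ps multiplies the sum of the other three, the total stays between 0 and 1, and a clone whose parent is small relative to it scores low regardless of its sister, DP, and GQ sub-scores.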

Parameters
features : list-like

The features which are to be considered while allocating the groups formed by the genotype.

layer : str

Name of the layer used to count the cell types. Expected values are NGT or NGT_FILTERED as obtained from the Dna.filter_variants() method.

group_missing : bool

Whether the clusters caused by missing values are merged into one cluster named ‘Missing’.

min_clone_size : float [0, 100]

The minimum proportion of total cells that must be present in a clone for it to count as a separate clone.

ignore_zygosity : bool

Whether HET and HOM are treated as the same genotype.

show_plot : bool

Whether to show a plot of the ADO identification process.

Returns
pd.DataFrame / None

None is returned if ignore_zygosity is True or group_missing is False; otherwise, a pandas dataframe is returned.