LikelihoodMethod

missionbio.demultiplex.dna.likelihood.LikelihoodMethod

class LikelihoodMethod(truth: DataFrame, het_mismatch_penalty: float = 0.75, nocall_mismatch_penalty: float = 0.15, **kwargs: Any)

A likelihood-based demultiplexing method for DNA data.

This method assigns a label to every cell. The label is the sample that is most likely to have generated the observed DNA data. The likelihoods are calculated using the allele frequencies and depths of the variants in the DNA data and the allele frequencies in the truth data.
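The idea of picking the most likely sample can be illustrated with a small standalone sketch. It is not the library's implementation: it assumes a simple binomial read model, allele frequencies expressed as fractions in [0, 1], and a hypothetical `truth_af` mapping from sample name to per-variant allele frequency. The helper names `binom_log_pmf` and `most_likely_sample` are invented for this example.

```python
import math


def binom_log_pmf(k, n, p):
    """Log-probability of k alt reads out of n under a binomial with rate p."""
    p = min(max(p, 1e-3), 1 - 1e-3)  # clamp so log() never receives 0
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)


def most_likely_sample(alt_reads, depths, truth_af):
    """Return the sample whose truth AFs best explain one cell's reads.

    alt_reads, depths: per-variant read counts for a single cell.
    truth_af: hypothetical mapping of sample name -> per-variant AF in [0, 1].
    """
    scores = {
        sample: sum(
            binom_log_pmf(k, n, af) for k, n, af in zip(alt_reads, depths, afs)
        )
        for sample, afs in truth_af.items()
    }
    return max(scores, key=scores.get), scores


# Two samples with opposite genotypes at the first and last variant.
truth_af = {"sample_A": [0.0, 0.5, 1.0], "sample_B": [1.0, 0.5, 0.0]}
label, scores = most_likely_sample(
    alt_reads=[0, 9, 20], depths=[20, 20, 20], truth_af=truth_af
)
print(label)
```

Working in log space keeps the per-variant terms additive and avoids underflow when many variants are combined.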

Parameters:
  • truth – The truth data used to label the cells. It must be filtered to the relevant variants and must also contain the signatures for doublets. The only check performed is the overlap with the DNA variants, which happens when label_cells is called.

  • het_mismatch_penalty – The penalty for a mismatch in a HET call. The lower the value, the more lenient the filter is towards HET calls being called as WT / HOM due to allele dropout (ADO).

  • nocall_mismatch_penalty – The penalty for a mismatch in a missing call. The lower the value, the more lenient the filter is towards cells with missing data. Increasing this value reduces miscalling in incomplete cells.

  • **kwargs – Passed to GTModel. Model parameters can be changed using these arguments.
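To make the effect of the two penalties concrete, here is a toy scoring sketch. It is an illustration under stated assumptions, not how LikelihoodMethod combines the penalties internally: the genotype codes, the full cost of 1.0 for any other mismatch, and the helpers `mismatch_cost` and `penalized_match_score` are all hypothetical. Only the two penalty names and their default values come from the signature above.

```python
# Assumed genotype codes, used only for this illustration.
WT, HET, HOM, MISSING = 0, 1, 2, 3


def mismatch_cost(expected, observed,
                  het_mismatch_penalty=0.75, nocall_mismatch_penalty=0.15):
    """Cost of one variant call, given the expected genotype from the truth."""
    if observed == expected:
        return 0.0
    if observed == MISSING:
        return nocall_mismatch_penalty  # missing data is penalized lightly
    if expected == HET:
        return het_mismatch_penalty  # HET seen as WT / HOM, e.g. through ADO
    return 1.0  # assumption: any other mismatch carries the full cost


def penalized_match_score(expected_calls, observed_calls, **penalties):
    """Return 1 minus the mean mismatch cost across variants."""
    costs = [
        mismatch_cost(e, o, **penalties)
        for e, o in zip(expected_calls, observed_calls)
    ]
    return 1.0 - sum(costs) / len(costs)


expected = [HET, HOM, WT, HET]
observed = [WT, HOM, MISSING, HET]

default_score = penalized_match_score(expected, observed)
lenient_score = penalized_match_score(expected, observed, het_mismatch_penalty=0.25)
print(default_score, lenient_score)
```

Lowering `het_mismatch_penalty` raises the score of a cell whose HET sites dropped out, which is the sense in which a lower value is more lenient.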

Functions

__init__

param truth: The truth used to label the cells. It must be filtered to all the relevant variants and contain the signature for doublets as well.

af_to_col_index

Convert the AF values to the col indices in the likelihoods DataFrame.

af_to_row_index

Convert the AF values to the row indices in the likelihoods DataFrame.

cluster_truth

Variants that could be different across samples

collapse_doublets_to_mixed

Convert all doublet labels to MIXED
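A minimal sketch of this kind of relabeling, assuming doublet labels are formed by joining two sample names with a `+` separator. That label format and the helper name `collapse_doublets` are assumptions made for illustration; the actual format used by the library may differ.

```python
def collapse_doublets(labels, separator="+", mixed_label="MIXED"):
    """Replace every label containing the doublet separator with mixed_label."""
    return [mixed_label if separator in label else label for label in labels]


labels = ["sample_A", "sample_A+sample_B", "sample_B", "sample_B+sample_C"]
collapsed = collapse_doublets(labels)
print(collapsed)
```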

get_fraction_match

Get the fraction of matches between the truth and the observed data.

label_cells

Run the demultiplexing

likelihoods

Generating a probability distribution

sim_af

Simulate allele frequencies for the cells based on the profiles
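One way to picture such a simulation is to draw reads for each variant from a binomial distribution around the profile's allele frequency. The sketch below does that with the standard library only. The fixed depth, the [0, 1] AF scale, and the function name `simulate_af` are assumptions for illustration and are not taken from the library.

```python
import random


def simulate_af(profile_af, depth, n_cells, seed=0):
    """Simulate per-cell AFs by sampling `depth` reads per variant.

    profile_af: expected AF per variant, as fractions in [0, 1].
    Returns one list of simulated AFs per cell.
    """
    rng = random.Random(seed)  # seeded for reproducible draws
    cells = []
    for _ in range(n_cells):
        cell = []
        for af in profile_af:
            # Each read is an alt read with probability `af`.
            alt_reads = sum(rng.random() < af for _ in range(depth))
            cell.append(alt_reads / depth)
        cells.append(cell)
    return cells


sim = simulate_af([0.0, 0.5, 1.0], depth=30, n_cells=5)
for cell in sim:
    print(cell)
```

Homozygous sites stay at exactly 0 or 1 in this toy model, while the heterozygous site scatters around 0.5 with a spread that shrinks as the depth grows.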

Attributes