LikelihoodMethod.label_cells
missionbio.demultiplex.dna.likelihood.LikelihoodMethod.label_cells
- LikelihoodMethod.label_cells(dna: Optional[Assay] = None, *, af: Optional[DataFrame] = None, dp: Optional[DataFrame] = None, clone_weights: Union[bool, ndarray] = True, filter_ado: bool = True, min_fraction_genotyped: float = 0.3, min_fraction_match: float = 0.7, mixing_rate: float = 0.2) → ndarray
Run the demultiplexing and assign a label to each cell.
- Parameters:
dna – DNA assay object. Either af and dp or dna must be provided.
af – The allele frequency of the variants. If not provided, it will be obtained from dna.
dp – The depth of the variants. If not provided, it will be obtained from dna.
clone_weights – Whether to weight the samples based on their estimated proportions in the data. This is useful when the samples are not expected to be in equal proportions and the number of variants is low. If True, ADO cells are more accurately assigned to their HET clones, and mixed cells are unlikely to be larger than their parents. Only set this to False when strict likelihood-based assignment is required and doublets are not added to the truth. If an array is provided, it must have the same length as the number of samples in the truth and sum to 1. These values will be used as the prior probabilities for the samples instead of estimating them from the data.
filter_ado – Whether to mark cells that have more than 50% of the variants with possible ADO as Ambiguous.
min_fraction_genotyped – The minimum fraction of variants genotyped in a cell. The variants included in the computation of this fraction are the ones present in both the DNA and the truth data. These variants must also be genotyped in at least a fraction min_fraction_genotyped of the cells.
min_fraction_match – The minimum fraction of matches required between the truth genotype and the genotype of the cell. The mismatches are weighted: a mismatch for a HET call is weighted at 0.75, missing calls are weighted at 0.15, and all other mismatches are weighted at 1. The ratio of the sum of the weights to the number of genotyped variants is the fraction mismatched. Its complement (1 minus the fraction mismatched) is the fraction matched, which should be greater than this value. It is not recommended to set this lower than 0.65 or greater than 0.85.
mixing_rate – The expected rate of doublets in the data. This is used to set the prior probabilities for the samples. The prior for singlets is set to (1 - mixing_rate) and the prior for doublets is set to mixing_rate. Doublets are assumed to be labels in the truth that contain a colon (“:”).
- Returns:
The labels assigned to the cells.
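The arithmetic described for min_fraction_match and mixing_rate can be illustrated with a small standalone sketch. This is not the library's implementation. The helper names fraction_matched and label_priors are hypothetical, the genotype encoding (0 = WT, 1 = HET, 2 = HOM, 3 = missing) is an assumption, and "a mismatch for a HET call" is read here as the cell being called HET while the truth differs. The priors are a literal, unnormalized reading of the mixing_rate description; how the library normalizes or combines them with clone_weights is not stated above.

```python
import numpy as np

# Assumed genotype encoding (not confirmed by the text above).
WT, HET, HOM, MISSING = 0, 1, 2, 3


def fraction_matched(cell, truth):
    """Hypothetical helper: weighted fraction of matches for one cell."""
    cell = np.asarray(cell)
    truth = np.asarray(truth)

    missing = cell == MISSING
    mismatch = (cell != truth) & ~missing

    weights = np.zeros(cell.shape, dtype=float)
    weights[mismatch] = 1.0                     # ordinary mismatch
    weights[mismatch & (cell == HET)] = 0.75    # mismatch on a HET call
    weights[missing] = 0.15                     # missing call

    n_genotyped = int((~missing).sum())
    if n_genotyped == 0:
        return 0.0  # nothing genotyped, so nothing can match

    fraction_mismatched = weights.sum() / n_genotyped
    return 1.0 - fraction_mismatched


def label_priors(labels, mixing_rate=0.2):
    """Hypothetical helper: per-label prior, doublets contain a colon."""
    is_doublet = np.array([":" in label for label in labels])
    return np.where(is_doublet, mixing_rate, 1.0 - mixing_rate)


# One cell compared against one truth genotype over ten variants:
# one missing call (0.15), one HET mismatch (0.75), one other mismatch (1.0).
cell = [0, 1, 2, 3, 1, 0, 2, 2, 0, 1]
truth = [0, 1, 2, 1, 0, 0, 2, 0, 0, 1]
frac = fraction_matched(cell, truth)
passes = frac > 0.7  # compare against the default min_fraction_match

# Two singlet labels and one doublet label.
priors = label_priors(["sampleA", "sampleB", "sampleA:sampleB"], mixing_rate=0.2)
```

In this example the weights sum to 1.9 over 9 genotyped variants, so the fraction matched is 1 - 1.9/9, about 0.79, which clears the default threshold of 0.7. The singlet labels receive a prior of 0.8 and the doublet label a prior of 0.2.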