Dna.assign_from_truth


missionbio.mosaic.dna.Dna.assign_from_truth

Dna.assign_from_truth(truth: DataFrame, add_doublets: bool = True, collapse: bool = True, filter_variants: bool = True, min_fraction_genotyped: float = 0.3, min_fraction_match: float = 0.7) -> None
Parameters:
truth: pd.DataFrame

DataFrame with the cell lines in the index and the variants in the columns. The values should be NGT-like {0, 1, 2, 3} or AF-like {0, 50, 100, -50}. When add_doublets is False, the AF values can also contain {25, 75}, i.e. the values for mixed cells.

add_doublets: bool

Whether to add doublets to the truth.

collapse: bool

Whether all the doublet clones should be collapsed to one cluster named Mixed.

filter_variants: bool

Whether to filter the variants based on the truth before assigning the labels. These are variants that are not SNPs, are correlated, or do not pass the default filters.

min_fraction_genotyped: float

The minimum fraction of variants genotyped in a cell. The variants that are included in the computation of this fraction are the ones which are present in both the DNA and the truth data. These variants must also be genotyped in at least min_fraction_genotyped cells.

min_fraction_match: float

The minimum fraction of matches required between the truth genotype and the genotype of the cell. Mismatches are weighted: a mismatch for a HET call is weighted at 0.75, a missing call at 0.15, and any other mismatch at 1. The sum of the weights divided by the number of genotyped variants is the fraction mismatched. Its complement (one minus the fraction mismatched) is the fraction matched, which should be greater than this value. Values lower than 0.65 or greater than 0.85 are not recommended.
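
The two thresholds described above can be illustrated with a minimal pure-Python sketch. It is not Mosaic's implementation. It assumes the usual NGT coding (0 = WT, 1 = HET, 2 = HOM, 3 = missing), that the HET weight applies when the cell's own call is HET, and that missing calls add their weight without counting as genotyped. The helper names fraction_genotyped and fraction_matched are hypothetical.

```python
# NGT-like codes, assuming the usual convention: 0 = WT, 1 = HET, 2 = HOM, 3 = missing.
HET, MISSING = 1, 3

# Weights quoted in the min_fraction_match description.
HET_WEIGHT = 0.75
MISSING_WEIGHT = 0.15
OTHER_WEIGHT = 1.0


def fraction_genotyped(cell_ngt):
    """Fraction of the shared variants with a non-missing call in this cell."""
    if not cell_ngt:
        return 0.0
    return sum(call != MISSING for call in cell_ngt) / len(cell_ngt)


def fraction_matched(cell_ngt, truth_ngt):
    """One minus (sum of mismatch weights / number of genotyped variants)."""
    if len(cell_ngt) != len(truth_ngt):
        raise ValueError("cell and truth must cover the same variants")
    weight_sum = 0.0
    genotyped = 0
    for call, expected in zip(cell_ngt, truth_ngt):
        if call == MISSING:
            weight_sum += MISSING_WEIGHT  # missing call in the cell
            continue
        genotyped += 1
        if call != expected:
            # Assumption: the HET weight applies when the cell's call is HET.
            weight_sum += HET_WEIGHT if call == HET else OTHER_WEIGHT
    if genotyped == 0:
        return 0.0
    return 1.0 - weight_sum / genotyped


# One cell compared against one truth row over five shared variants.
cell = [0, 1, 2, 3, 1]
truth_row = [0, 1, 2, 1, 2]

genotyped = fraction_genotyped(cell)         # 4 of 5 calls are not missing
matched = fraction_matched(cell, truth_row)  # 1 - (0.15 + 0.75) / 4
passes = genotyped >= 0.3 and matched > 0.7  # default thresholds from the signature
```

With the default thresholds this cell would be kept. A second mismatch that is not HET would add a weight of 1 and drop the matched fraction well below 0.7.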


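A truth table in the layout described for the truth parameter might be built as in the sketch below. The cell line names and variant identifiers are placeholders, not real data; in practice the column names would need to match the variant ids of the DNA assay. The NGT-to-AF mapping is an assumption read off the order of the two value sets in the parameter description.

```python
import pandas as pd

# Hypothetical cell lines (index) and variant ids (columns).
truth = pd.DataFrame(
    {
        "chr1:1000:A/G": [0, 1, 2],
        "chr2:2000:C/T": [2, 0, 1],
        "chr3:3000:G/A": [1, 3, 0],
    },
    index=["cell_line_A", "cell_line_B", "cell_line_C"],
)

# Check that every value is one of the NGT-like codes {0, 1, 2, 3}.
valid_ngt = bool(truth.isin([0, 1, 2, 3]).all().all())

# Assumed NGT -> AF correspondence: 0 -> 0, 1 -> 50, 2 -> 100, 3 -> -50.
ngt_to_af = {0: 0, 1: 50, 2: 100, 3: -50}
truth_af = truth.apply(lambda column: column.map(ngt_to_af))
```

Given a loaded sample, either table would then be passed as `sample.dna.assign_from_truth(truth)`; `sample` is a hypothetical Mosaic sample object that this sketch does not create.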