Dna.filter_somatic_variants
missionbio.mosaic.dna.Dna.filter_somatic_variants
Dna.filter_somatic_variants(whitelist: Optional[Sequence] = None, blacklist: Optional[Sequence] = None, **kwargs) -> ndarray
Find pathogenic somatic variants.
This filtering algorithm first applies the basic filters from filter_variants() with reduced stringency. It then applies additional filters to identify somatic variants. These filters keep two kinds of variant:
- pathogenic variants that are mutated in at least a minimum percentage of cells (parameter: min_mut_prct_cells_denovo);
- variants that are mutated in only a few cells (parameter: min_n_cells) but co-occur with each other.

The algorithm removes germline variants, which it identifies by their high frequency in the gnomAD database (parameter: gnomad_threshold). It also removes variants that lie too close to each other on the same amplicon (parameter: denovo_filters, through its CONSECUTIVE filter). A few further filters are then applied to arrive at the final list of pathogenic somatic variants for the sample.
- Parameters:
- whitelist: Sequence
A list of somatic variants to keep.
- blacklist: Sequence
A list of germline variants to remove.
- kwargs: dict
Keyword arguments passed to missionbio.filter.somatic.config.SomaticFilterConfig. For a full list of parameters, run:
>>> from missionbio.filter.somatic.config import SomaticFilterConfig
>>> print(SomaticFilterConfig())
The important parameters include:
- min_mut_prct_cells_denovo: float [0, 100], default 1
The minimum percentage of all cells in which the variant must be mutated. This is equivalent to min_mut_prct_cells. The _denovo suffix indicates that this value is used only when no background error rate file is provided, which is the default. That file is available only for certain Mission Bio catalog panels, e.g. the AML-MRD panel.
- min_n_cells: int, default 3
The minimum number of cells in which the variant should be present when looking for co-occurring variants.
- gnomad_threshold: float [0, 1], default 0.001
The threshold for the gnomAD frequency. Variants with a frequency greater than this value are removed.
- denovo_filters: tuple, default ('NOT_HOMOPOLYMER', 'IS_NONSYNONYMOUS', 'NOT_IN_GNOMAD', 'CONSECUTIVE', 'IS_FLT3')
The filters to apply to the variants. Variants that do not pass these filters are removed.
NOT_HOMOPOLYMER: Only variants that are not in homopolymer regions are kept.
IS_NONSYNONYMOUS: Only non-synonymous variants are kept.
NOT_IN_GNOMAD: Variants are filtered using gnomad_threshold, as described above.
CONSECUTIVE: Variants that are close to each other are removed. See filter_variants_consecutive().
IS_FLT3: FLT3 variants are kept.
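The interplay of the three numeric thresholds (gnomad_threshold, min_mut_prct_cells_denovo and min_n_cells) can be illustrated with the small, self-contained sketch below. It is not the library's implementation. The following are all assumptions made for illustration:
- the function names;
- the NGT-style genotype encoding (0 = WT, 1 = HET, 2 = HOM, 3 = missing);
- treating variants absent from gnomAD as frequency 0;
- the exact co-occurrence rule, under which a variant must share at least min_n_cells mutated cells with another non-germline candidate.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical genotype encoding (assumption): 0 = WT, 1 = HET, 2 = HOM, 3 = missing.
MUTATED = {1, 2}


def mutated_cells(genotypes: Sequence[Sequence[int]], j: int) -> Set[int]:
    """Return the indices of cells (rows) in which variant column j is HET or HOM."""
    return {i for i, row in enumerate(genotypes) if row[j] in MUTATED}


def sketch_somatic_thresholds(
    variants: List[str],
    genotypes: Sequence[Sequence[int]],
    gnomad_af: Dict[str, Optional[float]],
    min_mut_prct_cells_denovo: float = 1.0,
    min_n_cells: int = 3,
    gnomad_threshold: float = 0.001,
) -> List[str]:
    """Hypothetical illustration of the gnomAD, percent-of-cells and co-occurrence checks."""
    n_cells = len(genotypes)
    # Germline filter: drop variants whose gnomAD frequency exceeds the threshold.
    # Variants with no gnomAD entry are treated as frequency 0 (assumption).
    candidates = [
        j for j, var in enumerate(variants)
        if (gnomad_af.get(var) or 0.0) <= gnomad_threshold
    ]
    cells = {j: mutated_cells(genotypes, j) for j in candidates}
    kept = []
    for j in candidates:
        # Frequent variants: mutated in at least this percentage of all cells.
        frequent = 100 * len(cells[j]) / n_cells >= min_mut_prct_cells_denovo
        # Rare variants: kept if they share >= min_n_cells mutated cells with another candidate.
        co_occurs = any(
            len(cells[j] & cells[k]) >= min_n_cells for k in candidates if k != j
        )
        if frequent or co_occurs:
            kept.append(variants[j])
    return kept


# Toy data: 10 cells (rows) x 5 variants (columns A-E).
VARIANTS = ["A", "B", "C", "D", "E"]
GENOTYPES = [
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [2, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 1, 1, 1, 0],
    [3, 1, 1, 1, 0],
    [0, 0, 0, 2, 1],
]
GNOMAD_AF = {"A": 0.0, "D": 0.3}  # D looks germline; B, C and E are absent from gnomAD

print(sketch_somatic_thresholds(VARIANTS, GENOTYPES, GNOMAD_AF, min_mut_prct_cells_denovo=50))
# ['A', 'B', 'C']
```

In this toy run:
- D is dropped as germline.
- A passes the percentage check (6 of 10 cells).
- B and C each reach only 30%, but they share 3 mutated cells, so both pass the co-occurrence check.
- E is mutated in a single cell and co-occurs with no candidate, so it is dropped.

With the default min_mut_prct_cells_denovo of 1, every candidate in this 10-cell matrix passes the percentage check, so E would be kept as well.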
- Returns:
- np.ndarray
The list of somatic variants.
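This page does not say exactly how whitelist and blacklist interact with the filtered result. One plausible reading is sketched below, purely as an assumption:
- whitelisted variants are added back after filtering;
- blacklisted variants are removed last, so the blacklist wins when a variant appears in both lists.

The helper name is hypothetical. It returns a plain list, whereas the real method returns an np.ndarray.

```python
from typing import Iterable, List, Optional


def apply_white_black_lists(
    filtered: Iterable[str],
    whitelist: Optional[Iterable[str]] = None,
    blacklist: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical post-processing of the filtered variant list."""
    # De-duplicate while preserving the original order.
    result = list(dict.fromkeys(filtered))
    # Whitelisted variants are kept even if the filters dropped them.
    for var in whitelist or []:
        if var not in result:
            result.append(var)
    # Blacklisted variants are removed last, so the blacklist wins on conflicts (assumption).
    blocked = set(blacklist or [])
    return [var for var in result if var not in blocked]


print(apply_white_black_lists(["v1", "v2", "v3"], whitelist=["v4"], blacklist=["v2"]))
# ['v1', 'v3', 'v4']
```

If the real method resolves conflicts the other way, only the order of the last two steps would change.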
Notes
Germline variants whose genotypes change because of copy-number-induced loss of heterozygosity (LoH) are removed. Only pathogenic variants are kept.