Ge.get_target_modifications

missionbio.mosaic.ge.Ge.get_target_modifications

Ge.get_target_modifications(target: str) → DataFrame

Get the target modifications DataFrame for the specified target. The method identifies the variants on allele-1 and allele-2 of the selected target across all cells.

Returns a dataframe with 8 columns:

  • allele1: A comma (,) separated list of variants on allele-1 of the cell

  • allele2: A comma (,) separated list of variants on allele-2 of the cell

  • dp: DP value for the cell as calculated by GATK

  • gq: GQ value for the cell as calculated by GATK

  • af1: Allele frequency for the variants on allele-1 of the cell

  • af2: Allele frequency for the variants on allele-2 of the cell

  • simple_label: A simple label for the cell based on the allele1 and allele2 values. Possible values:

    • “INDEL” - At least one of the alleles contains an INDEL

    • “NO INDEL” - Neither of the alleles contains an INDEL

  • descriptive_label: A descriptive label for the cell based on the allele1 and allele2 values. Possible values:

    • “NO INDEL” - Neither of the alleles contains an INDEL

    • “Mono-allelic INDEL” - Exactly one of the alleles contains an INDEL

    • “Heterozygous Bi-allelic INDEL” - Both alleles contain an INDEL, but they are different INDELs

    • “Homozygous Bi-allelic INDEL” - Both alleles contain an INDEL, and they are the same INDEL
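The relationship between the two label columns can be sketched as follows. The mapping is inferred from the label definitions above and is not taken from the library itself; the barcodes and labels are made-up values for illustration.

```python
import pandas as pd

# Mapping inferred from the label definitions above (an assumption, not library code).
DESCRIPTIVE_TO_SIMPLE = {
    "NO INDEL": "NO INDEL",
    "Mono-allelic INDEL": "INDEL",
    "Heterozygous Bi-allelic INDEL": "INDEL",
    "Homozygous Bi-allelic INDEL": "INDEL",
}

# Hypothetical descriptive labels for four cells, indexed by made-up barcodes.
descriptive = pd.Series(
    [
        "NO INDEL",
        "Mono-allelic INDEL",
        "Homozygous Bi-allelic INDEL",
        "Heterozygous Bi-allelic INDEL",
    ],
    index=["cell-1", "cell-2", "cell-3", "cell-4"],
    name="descriptive_label",
)

# Collapse each descriptive label to the simple label it implies.
simple = descriptive.map(DESCRIPTIVE_TO_SIMPLE).rename("simple_label")

# Count the cells that carry an INDEL on at least one allele.
indel_count = int((simple == "INDEL").sum())
```

A check like this can be useful for confirming that the two label columns of a returned dataframe agree with each other.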

The index of the dataframe is the cell barcode.

If the cell has a homozygous alternate genotype, the values in allele1 match the values in allele2, and the af1 and af2 columns hold the same value.

If the cell does not have sufficient genotyping information for the target, the values in allele1 and allele2 are None, and the values in dp, gq, af1 and af2 are 0.
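As a minimal sketch of how these two cases might be handled downstream, the example below builds a small dataframe by hand that mimics the documented layout. The barcodes, variant names (`varA` etc.), and all numeric values are placeholders, not real output, and the scale of the allele frequencies is an assumption.

```python
import pandas as pd

# Hypothetical frame mimicking the documented columns; every value is made up.
mods = pd.DataFrame(
    {
        "allele1": ["varA", "varB,varC", None],
        "allele2": ["varA", "varD", None],
        "dp": [40, 35, 0],
        "gq": [99, 80, 0],
        "af1": [98.0, 52.0, 0.0],
        "af2": [98.0, 47.0, 0.0],
    },
    index=["cell-1", "cell-2", "cell-3"],
)

# Cells without sufficient genotyping: both alleles are None and dp is 0.
missing = mods["allele1"].isna() & mods["allele2"].isna() & (mods["dp"] == 0)
genotyped = mods[~missing]

# Homozygous alternate cells: identical variant lists and identical allele frequencies.
homozygous = genotyped[
    (genotyped["allele1"] == genotyped["allele2"])
    & (genotyped["af1"] == genotyped["af2"])
]

# Split the comma-separated variant strings into Python lists.
allele1_variants = genotyped["allele1"].str.split(",")
```

Filtering out the ungenotyped cells first keeps the zero-valued dp, gq, af1 and af2 entries from skewing any summary statistics.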

Parameters:
target : str

The target name for which the modifications are to be retrieved.

Returns:
pd.DataFrame

DataFrame containing the target modifications for the specified target.
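A possible usage pattern is sketched below. Only the `get_target_modifications(target)` call and the `descriptive_label` column come from this page; the `label_fractions` helper, the `StubGe` class, and the target name `"target-1"` are hypothetical and exist only so the example runs without real data.

```python
import pandas as pd


def label_fractions(ge, target: str) -> pd.Series:
    """Return the fraction of cells per descriptive label for one target."""
    mods = ge.get_target_modifications(target)
    return mods["descriptive_label"].value_counts(normalize=True)


class StubGe:
    """Hypothetical stand-in for a Ge object, returning made-up labels."""

    def get_target_modifications(self, target: str) -> pd.DataFrame:
        return pd.DataFrame(
            {
                "descriptive_label": [
                    "NO INDEL",
                    "NO INDEL",
                    "Mono-allelic INDEL",
                    "Homozygous Bi-allelic INDEL",
                ]
            },
            index=["cell-1", "cell-2", "cell-3", "cell-4"],
        )


# With real data, a Ge object would be passed in place of the stub.
fractions = label_fractions(StubGe(), "target-1")
```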

