Truth.read_csv


missionbio.demultiplex.dna.truth.Truth.read_csv

static Truth.read_csv(csv_file: Path) → Truth

Load DNA truth from a CSV file.

The CSV file should contain the following columns:

  • chromosome:

    The chromosome the variant is on. Must include the “chr” prefix.

  • position:

    The position of the variant in 1-based coordinates.

  • ref_allele:

    The reference allele of the variant. Special notes:

    Insertions: must be the base pair in the reference genome just before where the insertion starts.

    Deletions: must start with the base pair in the reference genome just before where the deletion starts.

  • alt_allele:

    The alternate allele of the variant. Special notes:

    Insertions: must start with the reference allele.

    Deletions: must be the base pair in the reference genome just before where the deletion starts.

  • genotype:

    The expected genotype of the variant. Acceptable values:

    0 - wildtype
    1 - heterozygous
    2 - homozygous
    3 - missing

  • type:

    The type of variant. Only rows with type=germline are used for demultiplexing.

  • sample_id:

    The name of the sample.
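The column rules above can be checked before loading a file. The following is a minimal sketch using only the Python standard library; it is not part of the missionbio package, and the helper names check_truth_row and check_truth_csv are hypothetical. It assumes that an indel is recognizable by the two alleles having different lengths.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "chromosome", "position", "ref_allele", "alt_allele",
    "genotype", "type", "sample_id",
]
VALID_GENOTYPES = {"0", "1", "2", "3"}


def check_truth_row(row):
    """Return a list of problems found in one CSV row (hypothetical helper)."""
    problems = []
    if not row["chromosome"].startswith("chr"):
        problems.append("chromosome lacks the 'chr' prefix")
    if not row["position"].isdecimal() or int(row["position"]) < 1:
        problems.append("position is not a 1-based integer")
    if row["genotype"] not in VALID_GENOTYPES:
        problems.append("genotype is not one of 0, 1, 2, 3")
    ref, alt = row["ref_allele"], row["alt_allele"]
    # ASSUMPTION: differing allele lengths identify an insertion or deletion.
    if len(ref) < len(alt):
        # Insertion: ref is the single preceding base, alt starts with it.
        if len(ref) != 1 or not alt.startswith(ref):
            problems.append("insertion is not anchored on the preceding base")
    elif len(ref) > len(alt):
        # Deletion: alt is the single preceding base, ref starts with it.
        if len(alt) != 1 or not ref.startswith(alt):
            problems.append("deletion is not anchored on the preceding base")
    return problems


def check_truth_csv(text):
    """Check the header and rows of CSV text; return {line_number: problems}."""
    reader = csv.DictReader(io.StringIO(text), restval="")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    report = {}
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        problems = check_truth_row(row)
        if problems:
            report[line_number] = problems
    return report
```

An empty report means every row passed these checks; it does not guarantee that the alleles actually match the reference genome, which this sketch cannot verify.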

Example

chromosome,position,ref_allele,alt_allele,genotype,type,sample_id
chr1,115256669,G,A,0,germline,SampleA
chr1,115256669,G,A,2,germline,SampleA
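A file in this layout can be produced with the standard csv module. The sketch below writes the example rows to a temporary location; the helper name write_truth_csv and the file name truth.csv are hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path

HEADER = [
    "chromosome", "position", "ref_allele", "alt_allele",
    "genotype", "type", "sample_id",
]
# The two data rows from the example above, copied unchanged.
ROWS = [
    ["chr1", "115256669", "G", "A", "0", "germline", "SampleA"],
    ["chr1", "115256669", "G", "A", "2", "germline", "SampleA"],
]


def write_truth_csv(path, rows):
    """Write the header and rows to path and return the path (hypothetical helper)."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)
    return path


csv_path = write_truth_csv(Path(tempfile.mkdtemp()) / "truth.csv", ROWS)
```

Assuming the missionbio package is installed, the resulting path could then be passed to the loader documented here as Truth.read_csv(csv_path), since the signature takes a Path.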

Parameters:

csv_file – The CSV file with the truth data.

Returns:

pd.DataFrame with the truth data.

The index is the sample_id and the columns are the variant IDs. Each value is the expected variant allele frequency (VAF) in the 0-100 range; unknown or missing values are -50.
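To illustrate the shape of the returned data, the following sketch builds a comparable sample-by-variant table as nested dictionaries using only the standard library. It is not the package's implementation. Several details are assumptions: the genotype-to-VAF mapping (0 to 0, 1 to 50, 2 to 100), the variant ID format, and the filtering to germline rows at load time. Only the -50 value for missing data comes from the description above.

```python
import csv
import io

# ASSUMPTION: genotype codes map to these expected VAF values. Only the
# -50 value for unknown/missing data is stated in the documentation.
GENOTYPE_TO_VAF = {"0": 0, "1": 50, "2": 100, "3": -50}
MISSING_VAF = -50


def variant_id(row):
    # HYPOTHETICAL id format; the real variant id format is not documented here.
    return f'{row["chromosome"]}:{row["position"]}:{row["ref_allele"]}/{row["alt_allele"]}'


def expected_vaf_table(text):
    """Build {sample_id: {variant_id: expected VAF}} from truth CSV text."""
    # ASSUMPTION: non-germline rows are dropped, since only germline rows
    # are used for demultiplexing.
    rows = [r for r in csv.DictReader(io.StringIO(text)) if r["type"] == "germline"]
    variants = sorted({variant_id(r) for r in rows})
    table = {}
    for r in rows:
        # Every sample starts with all variants marked as missing.
        sample = table.setdefault(r["sample_id"], dict.fromkeys(variants, MISSING_VAF))
        sample[variant_id(r)] = GENOTYPE_TO_VAF[r["genotype"]]
    return table
```

In this sketch a variant that is listed for one sample but not for another appears as -50 for the sample that lacks it, mirroring the stated convention for unknown values.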

