Truth.read_csv

missionbio.demultiplex.dna.truth.Truth.read_csv

static Truth.read_csv(csv_file: Path) → Truth

Load DNA truth data from a CSV file.

The CSV file should contain the following columns:

  • variant_id:

    The name of the variant, in chromosome:position:ref_allele/alt_allele format. The components are:

    • chromosome:

      The chromosome the variant is on. Must include the “chr” prefix.

    • position:

      The position of the variant in 1-based coordinates.

    • ref_allele:

      The reference allele of the variant. Special notes:

      • For insertions it must be the base pair in the reference genome just before where the insertion starts.

      • For deletions it must start with the base pair in the reference genome just before where the deletion starts.

    • alt_allele:

      The alternate allele of the variant. Special notes:

      • For insertions it must start with the reference allele.

      • For deletions it must be the base pair in the reference genome just before where the deletion starts.

  • genotype:

    The expected genotype of the variant. Acceptable values are 0 (wildtype), 1 (heterozygous), 2 (homozygous), and 3 (missing).

  • type:

    The type of variant. Only rows with type=germline are used for demultiplexing.

  • sample_id:

    The name of the sample.
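
The column rules above can be sketched as a small per-row check. This is an illustrative sketch only, not the library's implementation: `Variant`, `parse_variant_id`, and `check_row` are hypothetical names, and the validation that `Truth.read_csv` actually performs is not documented here.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    ref_allele: str
    alt_allele: str


# Genotype codes as listed in the column description.
GENOTYPES = {0: "wildtype", 1: "heterozygous", 2: "homozygous", 3: "missing"}


def parse_variant_id(variant_id: str) -> Variant:
    """Split chromosome:position:ref_allele/alt_allele (hypothetical helper)."""
    try:
        chromosome, position, alleles = variant_id.split(":")
        ref_allele, alt_allele = alleles.split("/")
    except ValueError:
        raise ValueError(f"malformed variant_id: {variant_id!r}") from None
    if not chromosome.startswith("chr"):
        raise ValueError(f"chromosome must include the 'chr' prefix: {variant_id!r}")
    pos = int(position)
    if pos < 1:
        raise ValueError(f"position must be 1-based: {variant_id!r}")
    if not ref_allele or not alt_allele:
        raise ValueError(f"empty allele: {variant_id!r}")
    # Insertion: ref is the single anchor base, alt must start with it.
    if len(ref_allele) == 1 and len(alt_allele) > 1 and not alt_allele.startswith(ref_allele):
        raise ValueError(f"insertion alt_allele must start with ref_allele: {variant_id!r}")
    # Deletion: alt is the single anchor base, ref must start with it.
    if len(alt_allele) == 1 and len(ref_allele) > 1 and not ref_allele.startswith(alt_allele):
        raise ValueError(f"deletion ref_allele must start with alt_allele: {variant_id!r}")
    return Variant(chromosome, pos, ref_allele, alt_allele)


def check_row(row: dict) -> bool:
    """Validate one CSV row; return True if its type is germline."""
    parse_variant_id(row["variant_id"])
    if int(row["genotype"]) not in GENOTYPES:
        raise ValueError(f"genotype must be one of {sorted(GENOTYPES)}: {row['genotype']!r}")
    return row["type"] == "germline"
```

A check like this could be run over the rows produced by `csv.DictReader` before handing the file to `Truth.read_csv`, so that format problems surface with a clear message.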

Example

variant_id,genotype,type,sample_id
chr1:115256669:G/A,0,germline,SampleA
chr1:115256669:G/A,2,germline,SampleA

Parameters:

csv_file – Path to the CSV file containing the truth data.

Returns:

pd.DataFrame with the truth data.

The index is the sample_id and the columns are the variant IDs. Each value is the expected variant allele frequency (VAF) on a 0-100 scale; unknown or missing values are -50.
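
The shape of the returned table can be pictured with a plain-Python sketch that pivots germline rows into a sample-by-variant mapping. Several things here are assumptions and not taken from the library: the genotype-to-VAF mapping (0 → 0, 1 → 50, 2 → 100, 3 → -50) is inferred from the stated 0-100 range and the -50 missing value, `truth_table` is a hypothetical function that returns nested dicts instead of a DataFrame, and the input rows are made up for illustration.

```python
import csv
import io

# ASSUMPTION: this genotype -> expected VAF mapping is not stated in the docs.
ASSUMED_VAF = {0: 0, 1: 50, 2: 100, 3: -50}
MISSING = -50  # value documented for unknown/missing entries


def truth_table(csv_text: str) -> dict:
    """Return {sample_id: {variant_id: expected VAF}} built from germline rows."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["type"] == "germline"]
    variants = sorted({r["variant_id"] for r in rows})
    table = {}
    for r in rows:
        # Every sample starts with all variants marked missing.
        sample = table.setdefault(r["sample_id"], dict.fromkeys(variants, MISSING))
        sample[r["variant_id"]] = ASSUMED_VAF[int(r["genotype"])]
    return table


# Made-up illustrative rows (not real truth data).
example = """variant_id,genotype,type,sample_id
chr1:115256669:G/A,0,germline,SampleA
chr1:115256669:G/A,2,germline,SampleB
chr2:100:C/T,1,germline,SampleB
chr3:200:A/G,1,somatic,SampleA
"""

table = truth_table(example)
for sample_id, row in table.items():
    print(sample_id, row)
```

In this sketch a sample with no row for a given variant keeps the -50 placeholder, which mirrors how the documented table marks unknown values; whether the library fills gaps the same way is not confirmed by the source.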