Truth.read_csv

missionbio.demultiplex.dna.truth.Truth.read_csv

static Truth.read_csv(csv_file: Path) → Truth

Load DNA truth data from a CSV file.

The CSV file should contain the following columns:

chromosome:
The chromosome the variant is on. Must include the “chr” prefix.

position:
The position of the variant in 1-based coordinates.

ref_allele:
The reference allele of the variant. Special notes:
Insertions: must be the base pair in the reference genome just before where the insertion starts.
Deletions: must start with the base pair in the reference genome just before where the deletion starts.

alt_allele:
The alternate allele of the variant. Special notes:
Insertions: must start with the reference allele.
Deletions: must be the base pair in the reference genome just before where the deletion starts.

genotype:
The expected genotype of the variant. Acceptable values:
0 - wildtype
1 - heterozygous
2 - homozygous
3 - missing

type:
The type of variant. Only rows with type=germline are used for demultiplexing.

sample_id:
The name of the sample.
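
The column rules above can be checked before a file is handed to the loader. The following is a minimal sketch, not part of the missionbio API: `check_truth_row` is a hypothetical helper, and its insertion and deletion checks reflect one reading of the special notes (a single anchor base on the shorter allele, with the longer allele starting with that base).

```python
def check_truth_row(row: dict) -> list:
    """Return a list of problems found in one truth CSV row (empty if none).

    Hypothetical helper: `row` maps column names to string values,
    as produced by csv.DictReader.
    """
    problems = []

    if not row["chromosome"].startswith("chr"):
        problems.append("chromosome must include the 'chr' prefix")

    # Positions are 1-based, so zero, negatives and non-integers are rejected.
    try:
        position_ok = int(row["position"]) >= 1
    except ValueError:
        position_ok = False
    if not position_ok:
        problems.append("position must be a positive 1-based integer")

    if row["genotype"] not in {"0", "1", "2", "3"}:
        problems.append("genotype must be 0, 1, 2 or 3")

    ref, alt = row["ref_allele"], row["alt_allele"]
    if len(ref) < len(alt):
        # Assumed insertion rule: ref is the single anchor base, alt starts with it.
        if len(ref) != 1 or not alt.startswith(ref):
            problems.append("insertion: alt_allele must start with the single-base ref_allele")
    elif len(ref) > len(alt):
        # Assumed deletion rule: alt is the single anchor base, ref starts with it.
        if len(alt) != 1 or not ref.startswith(alt):
            problems.append("deletion: ref_allele must start with the single-base alt_allele")

    return problems
```

A row that passes returns an empty list; each violated rule adds one message, so the result can be reported per line of the input file.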

Example

chromosome,position,ref_allele,alt_allele,genotype,type,sample_id
chr1,115256669,G,A,0,germline,SampleA
chr1,115256669,G,A,2,germline,SampleA
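
As a sketch of how such a file might be produced, the snippet below writes the example rows with Python's standard `csv` module and reads them back, keeping only the germline rows. The temporary file location and the `COLUMNS`, `rows` and `germline_rows` names are illustrative assumptions. The commented-out call at the end uses the module path and signature shown on this page and needs the missionbio package installed.

```python
import csv
import tempfile
from pathlib import Path

COLUMNS = ["chromosome", "position", "ref_allele", "alt_allele",
           "genotype", "type", "sample_id"]

# The two rows from the example above.
rows = [
    ["chr1", 115256669, "G", "A", 0, "germline", "SampleA"],
    ["chr1", 115256669, "G", "A", 2, "germline", "SampleA"],
]

# Hypothetical location: a fresh temporary directory.
truth_path = Path(tempfile.mkdtemp()) / "truth.csv"
with truth_path.open("w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    writer.writerows(rows)

# Read the file back; only type=germline rows matter for demultiplexing.
with truth_path.open(newline="") as handle:
    germline_rows = [r for r in csv.DictReader(handle) if r["type"] == "germline"]

# With missionbio installed, the file would then be loaded as:
# from missionbio.demultiplex.dna.truth import Truth
# truth = Truth.read_csv(truth_path)
```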

Parameters:

csv_file – Path to the CSV file with the truth data.

Returns:

pd.DataFrame with the truth data.

Index is the sample_id and columns are the variant ids. Value represents expected VAF in 0-100 range. Unknown/missing values are -50.
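
To make the described layout concrete, here is a plain-Python sketch of a sample-by-variant table. It is not the library's implementation and rests on several assumptions: the genotype-to-VAF mapping (0 to 0, 1 to 50, 2 to 100, 3 to -50), the `chrom:pos:ref/alt` variant id format, and the filling of absent sample/variant pairs with -50 are all guesses made for illustration. The sample names and the chr2 row are made up.

```python
# ASSUMPTION: this mapping is inferred from the 0-100 VAF range and the -50
# missing value described above; the library's actual mapping is not documented here.
GENOTYPE_TO_VAF = {0: 0, 1: 50, 2: 100, 3: -50}
MISSING = -50


def truth_table(rows):
    """Build {sample_id: {variant_id: expected VAF}} from truth rows.

    rows: iterable of (chromosome, position, ref, alt, genotype, sample_id).
    """
    values = {}
    variants = []
    samples = []
    for chrom, pos, ref, alt, genotype, sample_id in rows:
        variant_id = f"{chrom}:{pos}:{ref}/{alt}"  # hypothetical id format
        values[(sample_id, variant_id)] = GENOTYPE_TO_VAF[genotype]
        if variant_id not in variants:
            variants.append(variant_id)
        if sample_id not in samples:
            samples.append(sample_id)
    # ASSUMPTION: pairs with no row at all are treated like missing genotypes.
    return {s: {v: values.get((s, v), MISSING) for v in variants} for s in samples}


# Illustrative rows only.
table = truth_table([
    ("chr1", 115256669, "G", "A", 0, "SampleA"),
    ("chr1", 115256669, "G", "A", 2, "SampleB"),
    ("chr2", 1000, "C", "T", 3, "SampleA"),
])
```

Here the outer keys play the role of the index (sample_id) and the inner keys the role of the columns (variant ids).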
