Truth.read_csv
missionbio.demultiplex.dna.truth.Truth.read_csv
static Truth.read_csv(csv_file: Path) → Truth
Load DNA truth data from a CSV file.
The CSV file must contain the following columns:
- chromosome: The chromosome the variant is on. Must include the “chr” prefix.
- position: The position of the variant, in 1-based coordinates.
- ref_allele: The reference allele of the variant. Special notes:
  - Insertions: must be the base in the reference genome immediately before the insertion starts.
  - Deletions: must start with the base in the reference genome immediately before the deletion starts.
- alt_allele: The alternate allele of the variant. Special notes:
  - Insertions: must start with the reference allele.
  - Deletions: must be the base in the reference genome immediately before the deletion starts.
- genotype: The expected genotype of the variant. Acceptable values:
  - 0: wildtype
  - 1: heterozygous
  - 2: homozygous
  - 3: missing
- type: The type of variant. Only rows with type=germline are used for demultiplexing.
- sample_id: The name of the sample.
Example:
    chromosome,position,ref_allele,alt_allele,genotype,type,sample_id
    chr1,115256669,G,A,0,germline,SampleA
    chr1,115256669,G,A,2,germline,SampleA
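The column rules above can be checked before handing a file to read_csv. The sketch below uses only the standard library. `check_alleles` and `validate_truth_rows` are hypothetical helpers and are not part of missionbio.demultiplex. The docstring does not cover equal-length multi-base alleles, so labelling them "mnv" is an assumption, and the input rows are made up for illustration.

```python
import csv
import io

# Columns listed in the read_csv docstring.
REQUIRED_COLUMNS = {
    "chromosome", "position", "ref_allele", "alt_allele",
    "genotype", "type", "sample_id",
}
VALID_GENOTYPES = {0, 1, 2, 3}  # wildtype, heterozygous, homozygous, missing


def check_alleles(ref: str, alt: str) -> str:
    """Classify a variant and enforce the anchor-base rule for indels (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(alt) > len(ref):
        # Insertion: ref is the single base before the insertion; alt starts with it.
        if len(ref) != 1 or not alt.startswith(ref):
            raise ValueError(f"insertion {ref}>{alt}: alt must start with the 1-base ref")
        return "insertion"
    if len(ref) > len(alt):
        # Deletion: alt is the single base before the deletion; ref starts with it.
        if len(alt) != 1 or not ref.startswith(alt):
            raise ValueError(f"deletion {ref}>{alt}: ref must start with the 1-base alt")
        return "deletion"
    # ASSUMPTION: equal-length multi-base alleles are not described in the docs.
    return "mnv"


def validate_truth_rows(text: str) -> list[dict]:
    """Check truth CSV text and return only the germline rows (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    germline = []
    for row in reader:
        if not row["chromosome"].startswith("chr"):
            raise ValueError(f"chromosome needs the 'chr' prefix: {row['chromosome']}")
        if int(row["position"]) < 1:
            raise ValueError("position is 1-based and must be >= 1")
        if int(row["genotype"]) not in VALID_GENOTYPES:
            raise ValueError(f"invalid genotype: {row['genotype']}")
        check_alleles(row["ref_allele"], row["alt_allele"])
        if row["type"] == "germline":  # only germline rows are used for demultiplexing
            germline.append(row)
    return germline


if __name__ == "__main__":
    example = (
        "chromosome,position,ref_allele,alt_allele,genotype,type,sample_id\n"
        "chr1,115256669,G,A,0,germline,SampleA\n"
        "chr2,100,T,TAC,1,germline,SampleB\n"
        "chr3,200,GCA,G,2,somatic,SampleB\n"
    )
    print(len(validate_truth_rows(example)))  # 2: the somatic row is dropped
```

Raising on the first bad row keeps the error message specific. A production validator might instead collect every problem before reporting.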
- Parameters:
csv_file – Path to the CSV file containing the truth data.
- Returns:
pd.DataFrame with the truth data.
The index is sample_id and the columns are variant IDs. Each value is the expected VAF on a 0-100 scale; unknown or missing values are -50.
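The returned frame has sample_id as its index, variant IDs as its columns and VAF values, with -50 for unknown values. A long-to-wide pivot in pandas can produce that shape. The sketch below is not the library's implementation. `truth_matrix` is a hypothetical name. The `chr:pos:ref/alt` variant-ID format is also an assumption. So is the genotype-to-VAF mapping (0→0, 1→50, 2→100, 3→-50), which is inferred from the genotype codes and the documented -50 sentinel.

```python
import pandas as pd

# ASSUMPTION: genotype code -> expected VAF (0-100 scale); 3 (missing) -> -50 sentinel.
GENOTYPE_TO_VAF = {0: 0, 1: 50, 2: 100, 3: -50}


def truth_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format truth rows into a sample_id x variant VAF matrix (hypothetical)."""
    # Only germline rows are used for demultiplexing.
    germline = df[df["type"] == "germline"].copy()
    # HYPOTHETICAL variant-id format; the real Truth class may label columns differently.
    germline["variant"] = (
        germline["chromosome"] + ":" + germline["position"].astype(str) + ":"
        + germline["ref_allele"] + "/" + germline["alt_allele"]
    )
    germline["vaf"] = germline["genotype"].map(GENOTYPE_TO_VAF)
    # pivot raises ValueError if a (sample_id, variant) pair appears more than once.
    matrix = germline.pivot(index="sample_id", columns="variant", values="vaf")
    # Variants with no row for a given sample are treated as unknown (-50).
    return matrix.fillna(-50)
```

DataFrame.pivot is used here rather than pivot_table. With pivot, a sample that lists the same variant twice raises ValueError instead of having its rows silently aggregated, so contradictory truth rows surface early.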