Truth.read_csv
missionbio.demultiplex.dna.truth.Truth.read_csv
- static Truth.read_csv(csv_file: Path) → Truth
Load DNA truth data from a CSV file.
The CSV file should contain the following columns:
- variant_id:
The name of the variant in the
chromosome:position:ref_allele/alt_allele format.
- chromosome:
The chromosome the variant is on. Must include the “chr” prefix.
- position:
The position of the variant in 1-based coordinates.
- ref_allele:
The reference allele of the variant. Special notes:
For insertions it must be the base pair in the reference genome just before where the insertion starts.
For deletions it must start with the base pair in the reference genome just before where the deletion starts.
- alt_allele:
The alternate allele of the variant. Special notes:
For insertions it must start with the reference allele.
For deletions it must be the base pair in the reference genome just before where the deletion starts.
- genotype:
The expected genotype of the variant. Acceptable values are
0 (wildtype), 1 (heterozygous), 2 (homozygous), and 3 (missing).
- type:
The type of variant. Only rows with type=germline are used for demultiplexing.
- sample_id:
The name of the sample
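The allele rules above can be checked before a file is written. The following is a minimal sketch, not part of the missionbio API: `Variant`, `parse_variant_id`, and `classify` are hypothetical helper names, and the classification relies only on the anchor-base convention described for ref_allele and alt_allele.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical container for the parts of a variant_id."""
    chromosome: str
    position: int
    ref_allele: str
    alt_allele: str


def parse_variant_id(variant_id: str) -> Variant:
    """Split 'chromosome:position:ref_allele/alt_allele' into its parts."""
    chromosome, position, alleles = variant_id.split(":")
    ref_allele, alt_allele = alleles.split("/")
    if not chromosome.startswith("chr"):
        raise ValueError(f"chromosome must include the 'chr' prefix: {variant_id}")
    pos = int(position)
    if pos < 1:
        raise ValueError(f"position must be 1-based: {variant_id}")
    return Variant(chromosome, pos, ref_allele, alt_allele)


def classify(variant: Variant) -> str:
    """Classify a variant using the anchor-base convention for indels."""
    ref, alt = variant.ref_allele, variant.alt_allele
    # Insertion: ref is the single base before the insertion, alt starts with it.
    if len(ref) == 1 and len(alt) > 1 and alt.startswith(ref):
        return "insertion"
    # Deletion: alt is the single base before the deletion, ref starts with it.
    if len(alt) == 1 and len(ref) > 1 and ref.startswith(alt):
        return "deletion"
    return "substitution"
```

For instance, `classify(parse_variant_id("chr2:100:A/ATG"))` reports an insertion, where chr2:100:A/ATG is a made-up variant used only for illustration.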
Example
variant_id,genotype,type,sample_id
chr1:115256669:G/A,0,germline,SampleA
chr1:115256669:G/A,2,germline,SampleA
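A file in this layout can be produced with Python's standard csv module. The sketch below is illustrative only: `make_row` and `write_truth_csv` are hypothetical helpers, SampleB and the temporary file location are made up, and it writes all eight listed columns even though the example above shows only four (whether the others are optional is not confirmed here).

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the list in this docstring.
COLUMNS = [
    "variant_id", "chromosome", "position", "ref_allele",
    "alt_allele", "genotype", "type", "sample_id",
]


def make_row(variant_id, genotype, sample_id, variant_type="germline"):
    """Build one CSV row, deriving the allele columns from variant_id."""
    chromosome, position, alleles = variant_id.split(":")
    ref_allele, alt_allele = alleles.split("/")
    return {
        "variant_id": variant_id,
        "chromosome": chromosome,
        "position": int(position),
        "ref_allele": ref_allele,
        "alt_allele": alt_allele,
        "genotype": genotype,
        "type": variant_type,
        "sample_id": sample_id,
    }


def write_truth_csv(path, rows):
    """Write rows to path with a header line."""
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Illustrative data: one variant, wildtype in SampleA, homozygous in SampleB.
rows = [
    make_row("chr1:115256669:G/A", 0, "SampleA"),
    make_row("chr1:115256669:G/A", 2, "SampleB"),
]
out_path = Path(tempfile.mkdtemp()) / "truth.csv"
write_truth_csv(out_path, rows)
```

The resulting path could then be passed to Truth.read_csv, per the signature documented at the top of this page.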
- Parameters:
csv_file – The csv file with the truth data
- Returns:
pd.DataFrame with the truth data.
The index is the sample_id and the columns are the variant ids. Each value is the expected VAF on a 0-100 scale. Unknown or missing values are -50.
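To make the shape of the returned table concrete, here is a rough pure-Python sketch of a sample-by-variant matrix. It is not the library's implementation: `truth_matrix` is a hypothetical name, and the mapping of genotypes 0, 1, 2 to VAFs 0, 50, 100 is an assumption. Only the -50 value for missing data comes from the description above.

```python
# Assumed genotype -> expected VAF mapping; only the -50 is stated in the docs.
GENOTYPE_TO_VAF = {0: 0, 1: 50, 2: 100, 3: -50}
MISSING_VAF = -50


def truth_matrix(rows):
    """Return {sample_id: {variant_id: expected VAF}} from truth rows."""
    samples = sorted({row["sample_id"] for row in rows})
    variants = sorted({row["variant_id"] for row in rows})
    # Start every cell as missing, then fill in the genotypes that are present.
    matrix = {s: {v: MISSING_VAF for v in variants} for s in samples}
    for row in rows:
        vaf = GENOTYPE_TO_VAF[int(row["genotype"])]
        matrix[row["sample_id"]][row["variant_id"]] = vaf
    return matrix


# Illustrative rows; the second variant and SampleB are made up.
example_rows = [
    {"variant_id": "chr1:115256669:G/A", "genotype": "0", "sample_id": "SampleA"},
    {"variant_id": "chr1:115256669:G/A", "genotype": "2", "sample_id": "SampleB"},
    {"variant_id": "chr2:100:A/ATG", "genotype": "1", "sample_id": "SampleB"},
]
matrix = truth_matrix(example_rows)
```

A nested dict of this shape can be turned into a DataFrame with `pd.DataFrame.from_dict(matrix, orient="index")`, which places sample ids in the index and variant ids in the columns.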