load
missionbio.mosaic.io.load
- load(filepath: Any, filter_cells: bool = False, filter_variants: bool = True, whitelist: Optional[Sequence] = None, raw: bool = False, single: bool = False) -> Union[Sample, SampleGroup]
Load an .h5 file containing one or more assays.
This is the preferred way of loading .h5 files.
It directly returns a Sample object (or a SampleGroup object, see the single parameter) which contains all the assays. Assays that are not present in the file are stored as None.
- Parameters:
- filepath:
The path to the .h5 multi-omics file.
- filter_cells:
If True, then only the cells called by the completeness algorithm are loaded. Complete cells are those with greater than 80% completeness. If False, then all the cells are loaded.
- filter_variants:
If False, then all the variants are loaded. If True, then only the filtered DNA variants are loaded. The filtered DNA variants are those that pass the filter_variants() function. This list can be obtained by loading all variants by setting filter_variants=False and then running filter_variants() on it. Information about the default filtered variants is stored in the filtered column attribute of the Dna object.
- whitelist:
The specific DNA variants to load. The items in the whitelist can have three formats:
- Variant IDs - chr1:12345:A/C
These look for exact matches in the variants
- Positions - chr1:12345
These look for all the variants at that position in variants
- Regions - chr1:12345-12350
These look for all the variants in that region in variants. Both 12345 and 12350 are included.
The four cases for whitelist and filter_variants are:
- filter_variants - False, whitelist - None
Load all the variants
- filter_variants - True, whitelist - None
Only load the variants passing as per the filtered column attribute
- filter_variants - False, whitelist - Given
Only load the variants in the whitelist
- filter_variants - True, whitelist - Given
Only load the variants passing as per the filtered column attribute
- raw:
Whether the raw counts are to be loaded. This will load the cnv_raw and protein_raw attributes of the Sample class.
- single:
Whether to load as a single sample despite being a multi-sample h5 file. If False, then a SampleGroup() object is returned. This splits each sample into a different Sample object. This helps with batch corrections when normalising the data, since each sample is treated separately. If single=True, then a single Sample object is returned. This makes interacting with the data easier, but care must be taken when normalising the data. The merge() function and the split() function can be used to switch between the two object types.
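The three whitelist formats described under the whitelist parameter can be illustrated with a small pure-Python sketch. This is not the library's implementation; the helper name matches_whitelist is hypothetical, and it assumes variant IDs always follow the chr:position:ref/alt pattern shown above.

```python
def matches_whitelist(variant_id, whitelist):
    """Return True if variant_id (e.g. 'chr1:12345:A/C') matches any whitelist entry.

    Hypothetical helper for illustration only; it mirrors the three documented
    entry formats: variant IDs, positions, and inclusive regions.
    """
    chrom, pos_text, _alleles = variant_id.split(":", 2)
    pos = int(pos_text)

    for entry in whitelist:
        parts = entry.split(":")
        if len(parts) == 3:
            # Variant ID: requires an exact match.
            if entry == variant_id:
                return True
        elif len(parts) == 2 and parts[0] == chrom:
            if "-" in parts[1]:
                # Region: both endpoints are included.
                start, end = (int(x) for x in parts[1].split("-"))
                if start <= pos <= end:
                    return True
            elif int(parts[1]) == pos:
                # Position: any variant at that position matches.
                return True
    return False
```

For example, a whitelist of ["chr1:12345-12350"] would match a variant at position 12350 on chr1 under this sketch, but not one at 12351.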
- Returns:
- missionbio.mosaic.sample.Sample / missionbio.mosaic.samplegroup.SampleGroup
- Raises:
- Exception
When the h5 file format is not supported.
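A minimal usage sketch, assuming the package is installed and the function is importable from missionbio.mosaic.io as listed above. The wrapper name load_sample and the file path in the comment are hypothetical; the keyword arguments are the ones documented in the signature.

```python
def load_sample(h5_path, whitelist=None, single=False):
    """Call load() with the documented parameters spelled out explicitly."""
    # Imported lazily so the sketch can be defined without the package installed.
    from missionbio.mosaic.io import load

    return load(
        h5_path,
        filter_cells=False,    # load all the cells
        filter_variants=True,  # only the variants flagged in the filtered attribute
        whitelist=whitelist,   # e.g. ["chr1:12345:A/C", "chr1:12345-12350"]
        raw=False,             # skip the raw count attributes
        single=single,         # False returns a SampleGroup for multi-sample files
    )


# Hypothetical call; replace the path with a real .h5 file:
# sample = load_sample("path/to/file.h5", whitelist=["chr1:12345:A/C"])
```

Since load() raises an Exception when the h5 file format is not supported, callers working with files of unknown origin may want to wrap the call in a try/except block.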