load

missionbio.mosaic.io.load

load(filepath: Any, filter_cells: bool = False, filter_variants: bool = True, whitelist: Optional[Sequence] = None, raw: bool = False, single: bool = False) → Union[Sample, SampleGroup]

Loads an .h5 file containing one or more assays.

This is the preferred way of loading .h5 files.

It returns a Sample object that contains all the assays. Assays that are not present in the file are stored as None.

Parameters:
filepath:

The path to the .h5 multi-omics file.

filter_cells:

If True, then only the cells called by the completeness algorithm are loaded. Complete cells are those with greater than 80% completeness. If False, then all the cells are loaded.

filter_variants:

If False, all the variants are loaded. If True, only the filtered DNA variants are loaded. The filtered DNA variants are those that pass the filter_variants() function. To obtain this list, load all the variants by setting filter_variants=False and then run filter_variants() on the result. Information about the default filtered variants is stored in the filtered column attribute of the Dna object.

whitelist:

The specific DNA variants to load. The items in the whitelist can have three formats:

  1. Variant IDs - chr1:12345:A/C

    These match variants with exactly that ID.

  2. Positions - chr1:12345

    These match all the variants at that position.

  3. Regions - chr1:12345-12350

    These match all the variants in that region. Both 12345 and 12350 are included.
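The three whitelist formats can be illustrated with a small pure-Python matcher. This is only a sketch of the matching rules described above, not the library's implementation; the helper names `matches` and `apply_whitelist` and the regular expressions are assumptions made for illustration.

```python
import re

# Hypothetical patterns, one per whitelist format described above.
_VARIANT = re.compile(r"^(?P<chrom>[^:]+):(?P<pos>\d+):(?P<alleles>[^:]+)$")
_POSITION = re.compile(r"^(?P<chrom>[^:]+):(?P<pos>\d+)$")
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def matches(entry, variant_id):
    """Return True if a whitelist entry selects the given variant ID."""
    variant = _VARIANT.match(variant_id)
    if variant is None:
        raise ValueError(f"not a variant ID: {variant_id!r}")
    chrom, pos = variant["chrom"], int(variant["pos"])

    # 1. Variant ID: exact match only.
    if _VARIANT.match(entry):
        return entry == variant_id

    # 2. Position: every variant at that position.
    m = _POSITION.match(entry)
    if m:
        return m["chrom"] == chrom and int(m["pos"]) == pos

    # 3. Region: both end points are included.
    m = _REGION.match(entry)
    if m:
        return m["chrom"] == chrom and int(m["start"]) <= pos <= int(m["end"])

    raise ValueError(f"unrecognised whitelist entry: {entry!r}")


def apply_whitelist(whitelist, variant_ids):
    """Keep the variant IDs selected by at least one whitelist entry."""
    return [v for v in variant_ids if any(matches(e, v) for e in whitelist)]


# Made-up variant IDs used only to exercise the three formats.
variants = [
    "chr1:12345:A/C",
    "chr1:12345:A/G",
    "chr1:12350:T/C",
    "chr1:12351:G/A",
    "chr2:12345:A/C",
]
by_id = apply_whitelist(["chr1:12345:A/C"], variants)
by_position = apply_whitelist(["chr1:12345"], variants)
by_region = apply_whitelist(["chr1:12345-12350"], variants)
```

With these made-up IDs, the variant ID entry selects one variant, the position entry selects both variants at chr1:12345, and the region entry additionally selects the variant at the inclusive end point 12350.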

The four cases for whitelist and filter_variants are:

  1. filter_variants - False, whitelist - None

    Load all the variants

  2. filter_variants - True, whitelist - None

    Only load the variants passing as per the filtered column attribute

  3. filter_variants - False, whitelist - Given

    Only load the variants in the whitelist

  4. filter_variants - True, whitelist - Given

    Only load the variants passing as per the filtered column attribute
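The four cases can be written out as a small decision function. This is a sketch that mirrors the table exactly as it is written above, not the library's code; `select_variants` and its `passing` argument (a list of booleans standing in for the filtered column attribute) are hypothetical, and the whitelist is restricted to exact variant IDs for brevity.

```python
def select_variants(variant_ids, passing, filter_variants=True, whitelist=None):
    """Pick variant IDs according to the four documented cases.

    `passing` is a hypothetical stand-in for the `filtered` column
    attribute: one boolean per variant, True if the variant passes.
    """
    if filter_variants:
        # Cases 2 and 4: the table gives the same outcome with or
        # without a whitelist.
        return [v for v, ok in zip(variant_ids, passing) if ok]
    if whitelist is not None:
        # Case 3: only the whitelisted variants (exact IDs in this sketch).
        wanted = set(whitelist)
        return [v for v in variant_ids if v in wanted]
    # Case 1: everything.
    return list(variant_ids)


# Made-up data used only to walk through the four cases.
ids = ["chr1:100:A/C", "chr1:200:G/T", "chr2:300:C/T"]
passing = [True, False, True]
wl = ["chr1:200:G/T"]

case1 = select_variants(ids, passing, filter_variants=False, whitelist=None)
case2 = select_variants(ids, passing, filter_variants=True, whitelist=None)
case3 = select_variants(ids, passing, filter_variants=False, whitelist=wl)
case4 = select_variants(ids, passing, filter_variants=True, whitelist=wl)
```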

raw:

Whether to load the raw counts. If True, the cnv_raw and protein_raw attributes of the Sample class are loaded.

single:

Whether to load a multi-sample .h5 file as a single sample. If False, a SampleGroup() object is returned, in which each sample is split into its own Sample object. This helps with batch correction when normalising the data, since each sample is treated separately. If True, a single Sample object is returned. This makes interacting with the data easier, but care must be taken when normalising the data. The merge() function and the split() function can be used to switch between the two object types.

Returns:
missionbio.mosaic.sample.Sample / missionbio.mosaic.samplegroup.SampleGroup
Raises:
Exception

When the h5 file format is not supported.
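A minimal usage sketch follows. It passes only the keyword arguments listed in the signature above. The assay attribute names dna, cnv and protein are assumptions not stated on this page, the helper names `load_sample` and `present_assays` are hypothetical, and the file path in the comment is a placeholder. The import is done inside the function so that the rest of the snippet can be run without Mosaic installed.

```python
from types import SimpleNamespace


def load_sample(path):
    """Load an .h5 file as a single Sample with the default variant filter."""
    # Imported lazily so the helper below can be tried without Mosaic installed.
    import missionbio.mosaic.io as mio

    return mio.load(path, filter_variants=True, raw=False, single=True)


def present_assays(sample, names=("dna", "cnv", "protein")):
    """List the assays that were loaded, skipping those stored as None.

    The attribute names in `names` are assumptions, not taken from this page.
    """
    return [name for name in names if getattr(sample, name, None) is not None]


# Stand-in object used in place of a real Sample, with one missing assay.
stub = SimpleNamespace(dna=object(), cnv=None, protein=object())
loaded = present_assays(stub)

# With a real file (placeholder path):
# sample = load_sample("path/to/sample.h5")
# print(present_assays(sample))
```

Checking for None before using an assay avoids attribute errors on files that do not contain every assay.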