Basic usage of mosaic#

Objective
To show the minimum steps required for tertiary
analysis of DNA and protein data, along with some
of the different ways to look at the data

Major questions answered:

Do we see DNA clones?
Do we see protein cell types?
Is the differential expression significant?
Do the clones correlate with the cell types?

Things not shown:

All available methods, e.g. filtering of nearby variants, variant annotation, plots
A discussion of every method and its options (documented here)
Systemic variations seen in protein data

Setup#

H5 files replace loom files. They are part of the DNA and protein pipeline output.

Note: This example h5 file is trimmed specifically for this analysis.

Here is the complete documentation on the load function

import missionbio.mosaic as ms

sample = ms.load_example_dataset("3 cell mix")  # Use ms.load(path_to_h5) for custom h5 files

Loading, <_io.BytesIO object at 0x7fd513095310>

Loaded in 0.4s.

Data Structure#

Dna, Cnv, and Protein are subclasses of the _Assay class.
The information is stored in four ways, and the user
can change each of them:

1. metadata (add_metadata / del_metadata):
    dictionary containing metrics of the assay

2. row_attrs (add_row_attr / del_row_attr):
    dictionary which contains 'barcode' as one of
    the keys. All the values must be of the same
    length i.e. match the number of barcodes
    This is the attribute where 'label', 'pca',
    and 'umap' values are added

3. col_attrs (add_col_attr / del_col_attr):
    dictionary which contains 'ids' as one of
    the keys. All the values must be of the same
    length i.e. match the number of ids
    'ids' contains variants for DNA assays
    and antibodies for Protein assays

4. layers (add_layer / del_layer):
    dictionary containing 'read_counts' as one of
    the keys. All the values have the shape
    (num barcodes) x (num ids). This is the attribute
    where 'normalized_counts' will be added
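
The shape rules listed above can be checked on a toy example. The following is a minimal sketch using plain dictionaries and lists, not Mosaic objects; `check_assay` and the toy values are hypothetical names introduced here for illustration only.

```python
def check_assay(row_attrs, col_attrs, layers):
    """Return a list of shape problems (empty if everything is consistent)."""
    n_barcodes = len(row_attrs["barcode"])
    n_ids = len(col_attrs["ids"])
    problems = []

    # Every row attribute needs one entry per barcode.
    for name, values in row_attrs.items():
        if len(values) != n_barcodes:
            problems.append(f"row_attr {name!r}: expected {n_barcodes} values")

    # Every column attribute needs one entry per id.
    for name, values in col_attrs.items():
        if len(values) != n_ids:
            problems.append(f"col_attr {name!r}: expected {n_ids} values")

    # Every layer must have the shape (num barcodes) x (num ids).
    for name, matrix in layers.items():
        rows_ok = len(matrix) == n_barcodes
        cols_ok = all(len(row) == n_ids for row in matrix)
        if not (rows_ok and cols_ok):
            problems.append(f"layer {name!r}: expected shape ({n_barcodes}, {n_ids})")

    return problems


# Toy assay: 3 barcodes (rows) x 2 ids (columns).
row_attrs = {"barcode": ["AAC", "AAG", "TTG"], "label": ["1", "1", "2"]}
col_attrs = {"ids": ["CD3", "CD19"]}
layers = {"read_counts": [[12, 0], [7, 3], [0, 41]]}

print(check_assay(row_attrs, col_attrs, layers))
```

A layer with the wrong number of rows or columns would show up as one entry in the returned list.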

Sample holds the Dna and Protein information

sample.protein

<missionbio.mosaic.protein.Protein at 0x7fd512253310>

sample.protein.metadata

{'sample_name': array([['3 cell mix']], dtype=object),
 '__mosaic_cluster_description': 'graph-community on pca with Neighbours set to 30',
 '__mosaic_clustered': 1,
 '__mosaic_data_prep_pca': 'scaled CLR',
 '__mosaic_data_prep_scale': 'CLR',
 '__mosaic_data_prep_umap': 'PCA of scaled CLR',
 '__mosaic_initialize': 0,
 '__mosaic_prepped': 1,
 '__mosaic_visual_type': array(['Plots', 'Heatmap'], dtype=object),
 'n_reads': 128914059,
 'n_reads_trimmed': 128590556,
 'n_reads_valid_ab_barcodes': 117037906,
 'n_reads_valid_cell_barcodes': 121712026,
 'pipeline_version': '1.1.0'}

sample.protein.row_attrs

{'barcode': array(['AACAACCTAAACTTGTCG', 'AACAACTGGTACGTTGGA', 'AACAATGCAAGACCACGC',
        ..., 'TTGTCAACCTACAACACC', 'TTGTCAACCTAGTAACGG',
        'TTGTTAGAGATCAGGATG'], dtype=object),
 'label': array(['Mixed', 'Jurkat', 'TOM-1', ..., 'KG-1', 'KG-1', 'KG-1'],
       dtype=object),
 'pca': array([[-0.00739676,  0.01677373, -0.03541206, ..., -0.01174839,
         -0.00960959,  0.00061406],
        [-0.02040745, -0.0168572 , -0.00686639, ..., -0.01645682,
         -0.01039511,  0.01420925],
        [ 0.00012047,  0.04579714, -0.00975074, ..., -0.00724615,
          0.0393313 , -0.03267846],
        ...,
        [ 0.0226351 , -0.01385157, -0.00756582, ...,  0.00323248,
          0.03955304,  0.02694122],
        [ 0.01691241, -0.00197575,  0.00163568, ..., -0.00587812,
          0.0005414 ,  0.01381873],
        [ 0.01485945, -0.00385416, -0.00495111, ...,  0.02740017,
          0.03469586, -0.01425199]]),
 'sample_name': array(['3 cell mix', '3 cell mix', '3 cell mix', ..., '3 cell mix',
        '3 cell mix', '3 cell mix'], dtype=object),
 'umap': array([[ 4.1881294,  2.1518862],
        [ 4.405163 ,  7.9102015],
        [ 5.964074 , -1.2190977],
        ...,
        [-4.759269 , -3.1054091],
        [-5.3096914, -1.1285425],
        [-5.050415 , -2.969383 ]], dtype=float32)}

sample.protein.ids()

array(['CD110', 'CD117', 'CD123', 'CD135', 'CD19', 'CD24', 'CD3', 'CD33',
       'CD34', 'CD38', 'CD44', 'CD45', 'CD56', 'CD90', 'HLA-DR',
       'Mouse IgG1k'], dtype=object)

sample.dna.layers

{'AF': array([[ 19.80676329,  28.57142857,   1.4084507 , ...,  27.65957447,
          25.0965251 ,  13.49693252],
        [ 38.55421687,   0.        ,   0.        , ...,  42.30769231,
          31.69014085,  50.        ],
        [  0.76335878, 100.        ,   0.        , ...,   0.        ,
           0.41322314,   0.41322314],
        ...,
        [  0.        ,   0.        ,  15.38461538, ...,   0.        ,
           0.48543689,   1.32890365],
        [  0.        ,   0.        ,  42.85714286, ...,   0.        ,
           0.70921986,   0.        ],
        [  0.        ,   0.        ,  50.        , ...,   0.        ,
           0.        ,   3.26086957]]),
 'AF_MISSING': array([[ 19.80676329,  28.57142857,   1.4084507 , ...,  27.65957447,
          25.0965251 ,  13.49693252],
        [ 38.55421687,   0.        ,   0.        , ...,  42.30769231,
          31.69014085,  50.        ],
        [  0.76335878, 100.        ,   0.        , ...,   0.        ,
           0.41322314,   0.41322314],
        ...,
        [  0.        ,   0.        ,  15.38461538, ...,   0.        ,
           0.48543689,   1.32890365],
        [  0.        ,   0.        ,  42.85714286, ...,   0.        ,
           0.70921986,   0.        ],
        [  0.        ,   0.        ,  50.        , ...,   0.        ,
           0.        ,   3.26086957]]),
 'DP': array([[207,  63,  71, ...,  47, 259, 326],
        [ 83,  36,  41, ...,  26, 142, 124],
        [131,   8,  17, ...,  19, 242, 242],
        ...,
        [ 39,  23,  13, ...,  31, 206, 301],
        [157,  21,  35, ...,  20, 141, 186],
        [ 76,   6,  20, ...,   7,  81, 184]], dtype=int16),
 'GQ': array([[99, 99, 99, ..., 99, 99,  0],
        [99, 54, 99, ..., 99, 99, 99],
        [99, 23, 51, ..., 57, 99, 99],
        ...,
        [99, 36, 38, ..., 93, 99, 99],
        [99, 33, 99, ..., 60, 99, 99],
        [99,  9, 99, ..., 21, 99, 99]], dtype=int8),
 'NGT': array([[1, 1, 0, ..., 1, 1, 0],
        [1, 0, 0, ..., 1, 1, 1],
        [0, 2, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 1, ..., 0, 0, 0],
        [0, 0, 1, ..., 0, 0, 0],
        [0, 0, 1, ..., 0, 0, 0]], dtype=int8),
 'NGT_FILTERED': array([[3, 1, 0, ..., 1, 1, 3],
        [1, 0, 0, ..., 1, 1, 1],
        [0, 3, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 3, ..., 0, 0, 0],
        [0, 0, 1, ..., 0, 0, 0],
        [0, 3, 1, ..., 3, 0, 0]], dtype=int8),
 'scaled_counts': array([[ 0.17521769,  0.77875598, -0.58027676, ...,  0.40544186,
          0.35280254, -0.1315704 ],
        [ 0.95461827, -0.33207324, -0.61815498, ...,  0.97618787,
          0.6251839 ,  1.37227722],
        [-0.61648665,  3.55582902, -0.61815498, ..., -0.67227966,
         -0.66686127, -0.670591  ],
        ...,
        [-0.64822228, -0.33207324, -0.20440833, ..., -0.67227966,
         -0.66387813, -0.63286694],
        [-0.64822228, -0.33207324,  0.53442497, ..., -0.67227966,
         -0.65463368, -0.6876149 ],
        [-0.64822228, -0.33207324,  0.72652162, ..., -0.67227966,
         -0.68393146, -0.55327411]])}


DNA Analysis#

Topics covered

Whitelist of variants
Manually selecting variants

Basic filtering#

Many filtering options are available.
Use the documentation shared earlier,
or the help() function, to get the same
information here.

help(sample.dna.filter_variants)

Help on method filter_variants in module missionbio.mosaic.dna:

filter_variants(min_dp=10, min_gq=30, vaf_ref=5, vaf_hom=95, vaf_het=35, min_prct_cells=50, min_mut_prct_cells=1) method of missionbio.mosaic.dna.Dna instance
    Find informative variants.
    
    This method also adds the `NGT_FILTERED` layer to the assay
    which is a copy of the NGT layer but with the NGT for the
    cell-variants not passing the filters set to 3 i.e. missing.
    
    Parameters
    ----------
    min_dp : int
        The minimum depth (DP) for the call to be considered.
        Variants with less than this DP in a given
        barcode are treated as no calls.
    min_gq : int
        The minimum genotype quality (GQ) for the call to be
        considered. Variants with less than this GQ
        in a given barcode are treated as no calls.
    vaf_ref : float [0, 100]
        All reference calls (NGT = 0) with VAF > vaf_ref
        are converted to no calls (NGT = 3) for each barcode
        and variant in the NGT matrix
    vaf_het : float [0, 100]
        All hetrozygous calls (NGT = 1) with VAF < vaf_het
        are converted to no calls (NGT = 3) for each barcode
        and variant in the NGT matrix
    vaf_hom : float [0, 100]
        All homozygous calls (NGT = 2) with VAF < vaf_hom
        are converted to no calls (NGT = 3) for each barcode
        and variant in the NGT matrix
    min_prct_cells : float [0, 100]
        The minimum percent of total cells in which the variant
        should be present (NGT ∈ {0, 1, 2}) after the
        filters are applied.
    min_mut_prct_cells : float [0, 100]
        The minimum percent of the total cells in which the
        variant should be mutated, (NGT ∈ {1, 2}) after the
        filters are applied.
    
    Returns
    -------
    numpy.ndarray
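
The rules in the help text above can be restated as a small plain-Python sketch. This is not the Mosaic implementation: `filter_call` and `keep_variant` are hypothetical helper names, the defaults are copied from the signature shown above, and whether the thresholds are inclusive is an assumption.

```python
MISSING = 3  # NGT code for a no call, as described in the help text


def filter_call(ngt, dp, gq, vaf, min_dp=10, min_gq=30,
                vaf_ref=5, vaf_het=35, vaf_hom=95):
    """Return the filtered genotype for one cell-variant pair."""
    if dp < min_dp or gq < min_gq:
        return MISSING  # too shallow or too uncertain
    if ngt == 0 and vaf > vaf_ref:
        return MISSING  # reference call with too much variant signal
    if ngt == 1 and vaf < vaf_het:
        return MISSING  # heterozygous call with too little variant signal
    if ngt == 2 and vaf < vaf_hom:
        return MISSING  # homozygous call with too little variant signal
    return ngt


def keep_variant(filtered_ngt, min_prct_cells=50, min_mut_prct_cells=1):
    """Decide whether one variant (a column of filtered NGT values) is kept."""
    n_cells = len(filtered_ngt)
    present = sum(g in (0, 1, 2) for g in filtered_ngt)
    mutated = sum(g in (1, 2) for g in filtered_ngt)
    # Assumption: "minimum percent" is treated as an inclusive bound.
    return (100 * present / n_cells >= min_prct_cells
            and 100 * mutated / n_cells >= min_mut_prct_cells)


# A heterozygous call with VAF below vaf_het becomes a no call.
print(filter_call(ngt=1, dp=207, gq=99, vaf=19.8))
# A variant genotyped in 3 of 4 cells and mutated in 1 of 4 is kept.
print(keep_variant([0, 0, 1, MISSING]))
```

The two stages are kept separate on purpose: the first works per cell-variant pair, the second per variant across all cells.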

# Filter variants
# This is the default insights filtering method

dna_vars = sample.dna.filter_variants()
dna_vars

array(['chr2:25458546:C/T', 'chr2:25469502:C/T', 'chr2:25470426:C/T',
       'chr2:25470573:G/A', 'chr2:209113192:G/A', 'chr4:55599436:T/C',
       'chr4:106154990:TATAGATAG/T', 'chr4:106154990:T/TATAG',
       'chr4:106158216:G/A', 'chr4:106190862:T/C', 'chr4:106197469:G/A',
       'chr6:62094287:A/T', 'chr7:148506064:A/G', 'chr7:148529851:G/GA',
       'chr7:148543525:A/G', 'chr10:5554293:T/C', 'chr10:77210191:C/T',
       'chr10:106721610:G/A', 'chr11:32414333:G/T', 'chr11:32417945:T/C',
       'chr12:112888239:C/T', 'chr13:28597686:G/A', 'chr13:28610183:A/G',
       'chr14:56969005:C/T', 'chr17:7577427:G/A', 'chr17:7578176:C/T',
       'chr17:7578263:G/A', 'chr20:31023356:G/T', 'chr21:36252917:C/T'],
      dtype='<U26')

# Check the number of filtered variants

len(dna_vars)

Whitelist#

Simply appending the whitelist to the list of filtered
variants is sufficient to then select the variants
using the slice notation

i.e. sample.dna[{list of barcodes}, {list of ids}]

whitelist = ['chr1:115256513:G/A', 'chr21:44514718:C/T']

final_vars = whitelist + list(dna_vars)

len(final_vars)

# Selecting all cells and final variants

sample.dna = sample.dna[sample.dna.barcodes(), final_vars]

# Check the shape i.e. (Number of barcodes, number of ids)
# of the final filtered dna object

sample.dna.shape

(2476, 29)
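
When a whitelisted variant may also pass the filters, plain list concatenation would list it twice. A minimal sketch of an order-preserving merge is given here; `combine_ids` is a hypothetical helper, not part of Mosaic, and the filtered list is shortened for illustration.

```python
def combine_ids(whitelist, filtered):
    """Concatenate two id lists, keeping the first occurrence of each id."""
    # dict.fromkeys preserves insertion order and drops repeats.
    return list(dict.fromkeys(list(whitelist) + list(filtered)))


whitelist = ["chr1:115256513:G/A", "chr21:44514718:C/T"]
filtered = ["chr2:25458546:C/T", "chr21:44514718:C/T"]

for variant in combine_ids(whitelist, filtered):
    print(variant)
```

The resulting list can be used in the slice notation in the same way as `final_vars`.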

Manual variant selection#

Heatmaps are interactive. Clicking on one selects
the corresponding id, whose value is stored in the
`selected_ids` attribute of the object

e.g. sample.dna.selected_ids

sample.dna.stripplot(attribute='AF', colorby='GQ')

sample.dna.heatmap(attribute='AF')

if len(sample.dna.selected_ids) > 0:
    sample.dna = sample.dna.drop(sample.dna.selected_ids)

Clustering#

DNA has a custom clustering method called `find_clones`

It projects the data on a UMAP and then performs
dbscan to identify unique clusters, which are then
merged in case they were formed due to missing
information

sample.dna.find_clones()

Unique clusters found - 6
Clusters after removing missing data - 5

/Users/casp/Documents/code/mosaic/src/missionbio/mosaic/dna.py:133: UserWarning:

Using the "umap" that is already present in the row attributes.
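
The idea of merging clusters that differ only because of missing information can be illustrated on toy genotype signatures. This is a hypothetical sketch, not how `find_clones` is implemented: the compatibility rule, `compatible`, and `merge_clusters` are all assumptions made for illustration.

```python
MISSING = 3  # NGT code for a no call


def compatible(sig_a, sig_b):
    """Two signatures agree if they match wherever both have a call."""
    return all(a == b for a, b in zip(sig_a, sig_b)
               if a != MISSING and b != MISSING)


def merge_clusters(signatures):
    """Map each cluster label to the first earlier cluster it agrees with."""
    merged = {}
    representatives = []
    for label, signature in signatures.items():
        for rep in representatives:
            if compatible(signatures[rep], signature):
                merged[label] = rep
                break
        else:
            # No earlier cluster matches, so this one stands on its own.
            representatives.append(label)
            merged[label] = label
    return merged


# Cluster B differs from A only at a variant where B has no call.
signatures = {"A": [0, 1, 2], "B": [0, 1, MISSING], "C": [2, 0, 0]}
print(merge_clusters(signatures))
```

In this toy case three clusters collapse to two, which mirrors the kind of reduction reported in the output above.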

sample.dna.row_attrs

{'barcode': array(['AACAACCTAAACTTGTCG', 'AACAACTGGTACGTTGGA', 'AACAATGCAAGACCACGC',
        ..., 'TTGTCAACCTACAACACC', 'TTGTCAACCTAGTAACGG',
        'TTGTTAGAGATCAGGATG'], dtype=object),
 'label': array(['4', '2', '3', ..., '1', '1', '1'], dtype=object),
 'pca': array([[ 0.01696292,  0.00791857,  0.02525593, ...,  0.01081608,
         -0.01028663,  0.00942911],
        [ 0.01789271, -0.01653034, -0.00027796, ...,  0.00559107,
          0.00686503, -0.00267619],
        [ 0.0103838 ,  0.04182529, -0.01736327, ...,  0.04889085,
         -0.00238786, -0.01065905],
        ...,
        [-0.02319658, -0.00374513, -0.01804908, ...,  0.01024397,
          0.00164552,  0.01918958],
        [-0.02427855, -0.00700538, -0.00999176, ...,  0.00866599,
         -0.01266088,  0.01117126],
        [-0.02028216, -0.00607941,  0.02598467, ...,  0.01244461,
         -0.00450799,  0.00174288]]),
 'sample_name': array(['3 cell mix', '3 cell mix', '3 cell mix', ..., '3 cell mix',
        '3 cell mix', '3 cell mix'], dtype=object),
 'umap': array([[ 4.2597423,  3.5266023],
        [ 5.3297005,  5.5066705],
        [ 3.561163 , -8.171488 ],
        ...,
        [-5.4089503, -1.4769189],
        [-5.6915836, -1.3970627],
        [-5.5950303, -0.1773623]], dtype=float32)}

sample.dna.scatterplot(attribute='umap', colorby='label')

# AF_MISSING is the same as the AF layer except that it stores the missing values as -50 instead of 0

sample.dna.heatmap('AF_MISSING')
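
The reason for a separate sentinel is that a missing call with AF 0 would otherwise look identical to a reference call on the heatmap. A minimal sketch of the idea follows; it assumes a genotype value of 3 marks a missing call, and `af_with_missing` is a hypothetical helper rather than the way Mosaic builds the layer.

```python
MISSING_NGT = 3   # genotype code for a no call
MISSING_AF = -50  # sentinel used in place of the allele frequency


def af_with_missing(af, ngt):
    """Copy an AF matrix, replacing entries whose genotype is missing."""
    return [
        [MISSING_AF if g == MISSING_NGT else a for a, g in zip(af_row, ngt_row)]
        for af_row, ngt_row in zip(af, ngt)
    ]


af = [[0.0, 50.0], [0.0, 100.0]]
ngt = [[3, 1], [0, 2]]

# The first entry is a no call, the third is a genuine reference call.
print(af_with_missing(af, ngt))
```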

Conclusion#

1. Basic filtering of barcodes and ids demonstrated
2. Basic DNA filtering functionality showcased


CNV Analysis#

Preliminary heatmap of CNV shows that there could be two clusters

Topics covered

Dimension reduction options and their effects

Observation#

sample.cnv.normalize_reads()
sample.cnv.heatmap(attribute='normalized_counts')

PCA options#

Here the UMAP options are kept constant.
The only parameter in PCA is the number of components.

Here we see how to determine this value, and the effect
of deviating from it.

sample.cnv.run_pca(attribute='normalized_counts', components=6, show_plot=True)
sample.cnv.run_umap(attribute='pca', min_dist=0, n_neighbors=100)

../_images/bd9559b0b7a17148b861d8c626101985c6dd38fc73f2463d22efd53c914c51c3.png

Visualization#

The result of the dimension reduction analysis is
visualized using a scatterplot of the umap

sample.cnv.cluster(attribute='umap', method='dbscan', eps=0.55)

sample.cnv.scatterplot(attribute='umap', colorby='label')

CNV Conclusion#

Given that all other variables are kept constant:

Too many PCA components may result in merging of clusters
Too few PCA components may result in splitting of clusters
The appropriate number of components can be determined using the elbow plot
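
An elbow plot is usually read by eye, but one common heuristic can be sketched in code: pick the point farthest from the straight line joining the first and last points of the curve. The function `elbow_index` and the explained-variance values are hypothetical and only illustrate the idea; this is not a Mosaic feature.

```python
def elbow_index(values):
    """Return the 0-based index of the point farthest from the chord
    joining the first and last points of the curve."""
    n = len(values)
    if n < 3:
        return n - 1  # too few points to have an elbow
    dx = n - 1
    dy = values[-1] - values[0]
    norm = (dx * dx + dy * dy) ** 0.5
    best_index, best_distance = 0, -1.0
    for i, value in enumerate(values):
        # Perpendicular distance from (i, value) to the chord.
        distance = abs(dy * i - dx * (value - values[0])) / norm
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index


# Made-up explained variance ratios for 8 principal components.
explained = [0.40, 0.25, 0.15, 0.05, 0.04, 0.03, 0.03, 0.02]
n_components = elbow_index(explained) + 1
print(n_components)
```

The result depends on the relative scaling of the two axes, so it is best treated as a starting point to confirm against the plot.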


Protein Analysis#

Topics covered

Basic workflow
Custom clustering, e.g. selection on biaxial plot
Custom methods by adding layers

Basic workflow#

# Dimension reduction and clustering similar to CNV

sample.protein.normalize_reads('CLR')
sample.protein.run_pca(attribute='normalized_counts', components=5)
sample.protein.run_umap(attribute='pca')

sample.protein.cluster(attribute='pca', method='graph-community', k=100)

Creating the Shared Nearest Neighbors graph.

Identifying clusters using Louvain community detection.

Number of clusters found: 10.
Modularity: 0.753
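
For readers unfamiliar with the normalization step, a minimal sketch follows. It assumes that 'CLR' stands for the centered log-ratio transform applied per cell; the pseudocount and the function `clr` are illustrative choices, not taken from the Mosaic source.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio of one cell's antibody counts."""
    # The pseudocount keeps the logarithm defined for zero counts.
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log centers each cell around zero.
    return [value - mean_log for value in logs]


cell_counts = [10, 200, 0, 35]  # made-up read counts for four antibodies
normalized = clr(cell_counts)
print([round(v, 3) for v in normalized])
```

After the transform the values of each cell sum to zero, so cells with very different total read counts become comparable.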

sample.protein.heatmap(attribute='normalized_counts')

sample.protein.scatterplot(attribute='umap', colorby='label')

# Re-cluster based on the observations from the UMAP

sample.protein.cluster(attribute='umap', method='dbscan')

sample.protein.ids()[:1]

array(['CD110'], dtype=object)

# Preferred way to look at protein expression profiles

features = ["CD110"]

sample.protein.ridgeplot(attribute='normalized_counts',
                         splitby='label',
                         features=features)

# UMAP with the expression of each selected protein overlaid
# In case of error, make sure that ids have been selected on the heatmap and are shown in sample.protein.selected_ids

sample.protein.scatterplot(attribute='umap',
                           colorby='normalized_counts',
                           features=['CD34', 'CD44', 'HLA-DR'])

Custom clustering#

When `colorby` is not provided for a scatterplot,
the lasso tool can be used to cluster the cells
based on the selection made

# Selection on biaxial scatterplot
# The same can be done for the UMAP when labels=False is passed

sample.protein.feature_scatter(layer='normalized_counts',
                               ids=['CD90', 'CD3'])

Custom methods by adding layers#

Users who want to try their own methods
can modify the appropriate layers, attributes,
and metadata to plug their step into this workflow

# Custom normalization by changing the `normalized_counts` layer

import numpy as np

log_reads = np.log10(10 + sample.protein.layers['read_counts'])
norm = np.divide(log_reads, log_reads.mean(axis=1).reshape(-1, 1))

sample.protein.add_layer('normalized_counts', norm)
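
To see what the custom normalization above does, the same arithmetic can be run on a toy matrix in plain Python. This is a sketch with made-up counts; `log_mean_normalize` is a hypothetical name that mirrors the two NumPy lines, dividing each cell's log counts by that cell's mean log count.

```python
import math


def log_mean_normalize(read_counts, offset=10):
    """Log-transform counts, then divide each row (cell) by its own mean."""
    normalized = []
    for row in read_counts:
        logs = [math.log10(offset + count) for count in row]
        row_mean = sum(logs) / len(logs)
        normalized.append([value / row_mean for value in logs])
    return normalized


# Two cells (rows) with three antibodies (columns); values are made up.
toy_counts = [[0, 90, 990], [5, 5, 5]]
toy_norm = log_mean_normalize(toy_counts)

for row in toy_norm:
    print([round(v, 3) for v in row], "row mean:", round(sum(row) / len(row), 3))
```

Each row of the result averages to 1, and the output keeps the (num barcodes) x (num ids) shape that a layer requires.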

Other examples include:

custom labels -> 'label' row_attr
custom palette -> 'palette' metadata   

Protein Conclusion#

1. Protein analysis workflow similar to CNV
2. Different clustering methods can result in
   different types of clusters being identified
3. It is possible to have custom clustering for
   any scatterplot by using the lasso tool
4. Custom analysis is possible by modifying appropriate
   layers, attributes and metadata


Statistical Significance#

The significance of differential expression
based on a t-test can be examined using
the `test_signature` method

dir(sample.protein)

['DeduplicationWarning',
 'NAN_L2',
 '_Assay__heatmap',
 '__abstractmethods__',
 '__annotations__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_brush',
 '_community_clustering',
 '_ensure_no_duplicates',
 '_heatmap_selection',
 '_palette',
 '_rename_barcodes',
 '_rename_duplicate_values',
 '_shape',
 '_switch_color',
 '_update_sample_metadata',
 'add_col_attr',
 'add_extension',
 'add_layer',
 'add_metadata',
 'add_row_attr',
 'assay_color',
 'barcodes',
 'cells_per_ab',
 'cluster',
 'clustered_barcodes',
 'clustered_ids',
 'col_attrs',
 'copy',
 'create',
 'del_col_attr',
 'del_layer',
 'del_metadata',
 'del_row_attr',
 'drop',
 'extensions',
 'feature_scatter',
 'ga',
 'get_attribute',
 'get_labels',
 'get_palette',
 'get_row_ids',
 'get_scaling_factor',
 'get_signal_profile',
 'group_clusters',
 'heatmap',
 'highlight_heatmap',
 'ids',
 'layers',
 'metadata',
 'name',
 'normalize_barcodes',
 'normalize_reads',
 'read_distribution',
 'reads_to_ab',
 'rename_labels',
 'rename_sample',
 'reset_ids',
 'ridgeplot',
 'row_attrs',
 'run_lda',
 'run_pca',
 'run_umap',
 'samples',
 'scale_data',
 'scatterplot',
 'select_columns',
 'select_rows',
 'selected_bars',
 'selected_ids',
 'set_ids_from_cols',
 'set_labels',
 'set_palette',
 'set_selected_labels',
 'shape',
 'signature',
 'signaturemap',
 'sort_columns',
 'sort_rows',
 'split',
 'stripplot',
 'test_signature',
 'title',
 'violinplot']

pval, tstat = sample.protein.test_signature(attribute='normalized_counts')

pval

	CD110	CD117	CD123	CD135	CD19	CD24	CD3	CD33	CD34	CD38	CD44	CD45	CD56	CD90	HLA-DR	Mouse IgG1k
1	7.710438e-135	2.880376e-181	1.389202e-01	4.401413e-139	1.100631e-115	5.561488e-213	6.523013e-207	0.000000e+00	0.000000e+00	1.837784e-287	0.000000e+00	2.287553e-02	1.592769e-191	6.326674e-224	7.221462e-04	3.091710e-153
2	8.800550e-109	2.374534e-29	5.145596e-94	5.200097e-86	3.119501e-90	2.120185e-42	0.000000e+00	1.401892e-68	2.506564e-254	9.959116e-298	5.191803e-147	8.219436e-306	3.504939e-239	0.000000e+00	0.000000e+00	4.250049e-124
3	1.258248e-13	6.297386e-53	5.197609e-125	4.268553e-27	0.000000e+00	0.000000e+00	3.586392e-63	1.853278e-96	1.010879e-48	5.386075e-01	9.412945e-136	0.000000e+00	9.556695e-01	3.126562e-72	3.294629e-296	3.088833e-15
4	4.818229e-09	5.738795e-25	2.111025e-03	1.141661e-11	1.388346e-23	7.238226e-26	3.786053e-13	1.311617e-34	9.702871e-20	1.475976e-11	4.835089e-25	7.794311e-01	3.916305e-01	7.962307e-10	1.142897e-09	9.995305e-11

# Add a small offset so that p-values reported as 0 do not produce log10(0)
pval = pval + 10 ** -50
pvals = -np.log10(pval) * (tstat > 0)
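
The intent of this transform can be spelled out for a single value in plain Python. This is a sketch; `signed_score` is a hypothetical name. A small offset keeps the logarithm defined when a p-value is reported as 0, and the score is kept only where the t-statistic is positive, i.e. where the feature is higher in the cluster.

```python
import math


def signed_score(p_value, t_stat, floor=1e-50):
    """Return -log10(p) for positive t-statistics and 0 otherwise."""
    score = -math.log10(p_value + floor)
    return score if t_stat > 0 else 0.0


print(signed_score(0.0, 3.2))     # p reported as 0, capped by the floor
print(signed_score(1e-10, 2.0))   # significant and higher in the cluster
print(signed_score(1e-10, -2.0))  # significant but lower, so masked
```

With a floor of 1e-50 the score cannot exceed 50, which lines up with `vmax=50` in the heatmap call.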

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 10))
sns.heatmap(pvals.T, vmax=50, vmin=2)

<matplotlib.axes._subplots.AxesSubplot at 0x7fd5147ba9d0>

../_images/8765cdf085c72edb0cb0b8ac08d31d810d93ea334ce70b6da3b6e0b5b066979e.png

Conclusion

Statistical significance of the differential expression
can be ascertained. Median values can be explored for DNA
to determine the difference between clusters.


Combined Visualizations#

Visualization for multiple assays at once

Clone vs Analyte#

CNV#

sample.clone_vs_analyte('cnv')

../_images/0ab193a3ac4f43b1b62acbea48a8dc96faaa78f9c740310025bdc12694599d66.png

Protein#

sample.clone_vs_analyte('protein')

../_images/a84358932af27f318746d5007f86f377268814c6fb417cfac2f7b612698fea8e.png

# Filtering protein and cnv to improve the visualization

sample.protein = sample.protein[:, ['CD3', 'CD90']]
sample.cnv = sample.cnv[:, 58:85]
sample.clone_vs_analyte('protein')

../_images/3568bedc3f95426f7ee1d1a956e9172360444469a423e973da211605976c7084.png

# Certain clones can also be dropped, but they must be dropped from all assays
# Hence the sample object is sliced in this case
# In this case it is better to store the new sample in a separate variable

# This returns the dna barcodes with the given labels
select_bars = sample.dna.barcodes(['2', '3', '4'])

sample_subset = sample[select_bars]
sample_subset.clone_vs_analyte('protein')

../_images/b99d1e0f623e392bb63a898ce8cbdeb5762cc3545c38dc6a3366edd13bdf44a6.png
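
The reason for slicing the whole sample is that every assay must keep the same set of cells. A toy sketch of that bookkeeping with plain lists and dictionaries follows; `barcodes_with_labels` and `subset_rows` are hypothetical helpers, not Mosaic functions.

```python
def barcodes_with_labels(barcodes, labels, wanted):
    """Return the barcodes whose cluster label is in `wanted`."""
    wanted = set(wanted)
    return [b for b, label in zip(barcodes, labels) if label in wanted]


def subset_rows(rows_by_barcode, keep):
    """Keep only the rows of one assay that belong to the chosen barcodes."""
    keep = set(keep)
    return {b: row for b, row in rows_by_barcode.items() if b in keep}


# Toy data: four cells with DNA clone labels and values from two assays.
barcodes = ["AAC", "AAG", "TTC", "TTG"]
dna_labels = ["1", "2", "3", "1"]
dna_rows = {"AAC": [0, 1], "AAG": [1, 1], "TTC": [2, 0], "TTG": [0, 1]}
protein_rows = {"AAC": [5.1], "AAG": [0.2], "TTC": [3.3], "TTG": [4.8]}

selected = barcodes_with_labels(barcodes, dna_labels, ["2", "3"])

# The same barcodes are applied to every assay so that they stay aligned.
dna_subset = subset_rows(dna_rows, selected)
protein_subset = subset_rows(protein_rows, selected)
print(selected, sorted(dna_subset) == sorted(protein_subset))
```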

# The ids can also be reset to the entire set

sample.reset('cnv')
sample.reset('protein')
sample.clone_vs_analyte('protein')

Multi assay heatmap#

sample.heatmap(clusterby='dna', sortby='protein', drop='cnv', flatten=False)

# Try the following
# sample.heatmap(clusterby='dna', sortby='protein', drop='cnv', flatten=True)
# sample.heatmap(clusterby='protein', sortby='dna', drop='cnv', flatten=False)
# sample.heatmap(clusterby='dna', sortby='protein', flatten=False)


Saving#

The analysis can be saved to an h5 file.
This final trimmed file will be much smaller than the original h5 file.
It can be opened in Insights or loaded back into Mosaic.

ms.save(sample, './basics.analyzed.h5')

Data from h5 files can be efficiently manipulated,
visualized, and inferred using Mosaic.

Mosaic v2.4 documentation
