Protein.cluster_and_label
missionbio.mosaic.protein.Protein.cluster_and_label
- Protein.cluster_and_label(truth: Optional[Union[str, Path, Truth]] = None, merge: str = 'labeled', min_prob_diff: float = 0.9, max_adjusted_mixing: float = 0.3, min_distance_for_doublet: int = 5, sticky_antibodies: Sequence = ('IgG1', 'IgG2a', 'IgG2b'), cluster: bool = True) → ClusterMethod
Cluster and label cells using the given truth.
The accuracy of the assignment depends on the accuracy of the NSP normalization. If the scaling factor is not estimated correctly, the assignment will contain a large number of unassigned cells. In such cases, try estimating the scaling factor with the read_depth_dependence() method of the isotype control.
- Parameters:
- truth:
A Truth object or a path to a YAML file containing the truth. If None, the builtin() truth for PBMCs is used. It can be visualised using the plot() function on the truth object.
- merge:
How to merge cell types with multiple clusters
- “none”:
Keep all clusters. Doublets are identified by the “:” in the cluster name.
- “mixed”:
Merge all doublet clusters into a single “Mixed” cluster, e.g. “T cell:B cell” and “T cell:Monocytes” are merged into “Mixed”.
- “labeled”:
Merge the multiple clusters labeled with the same cell type into a single cluster, e.g. “T cell-1” and “T cell-2” are merged into “T cell”.
- “all”:
Merge all “Unassigned” clusters into a single cluster, e.g. “Unassigned-1” and “Unassigned-2” are merged into “Unassigned”.
- min_prob_diff:
Minimum difference in probability between the most likely and second most likely cell type for a cell to be assigned to a single cell type. This threshold is also used when assigning doublet clusters. Here, the probability refers to the probability of the cluster being a particular cell type given the normalized read counts. See the example in assign_from_truth().
- max_adjusted_mixing:
Maximum adjusted mixing rate for a cluster with a mixed signature to be labeled as “Mixed”. If the adjusted mixing rate is higher, the doublet is assigned to the “Mixed Like” cluster.
- min_distance_for_doublet:
The minimum number of antibodies that must differ between two clones for their doublet to be considered. Increase this if too many mixed cells are observed.
- sticky_antibodies:
Antibodies used to identify sticky cells (by default, the IgG isotype controls).
- cluster:
If True, the cells are clustered using graph-community clustering before labeling, with the default parameters k=5 and random_state=42. If False, the existing label row attribute is used as the clusters.
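The threshold parameters above each reduce to a simple comparison. The sketch below illustrates those comparisons in isolation; it is not Mosaic's implementation. Assumptions: the helper names (assign_cell_type, signature_distance, doublet_considered, mixed_label) are hypothetical, cell-type probabilities are supplied directly, each cell type's signature is a hypothetical 0/1 vector of antibody states, a failed min_prob_diff test falls back to "Unassigned", and the adjusted mixing rate is taken as an input because its computation is not described here.

```python
from typing import Dict, Sequence


def assign_cell_type(probs: Dict[str, float], min_prob_diff: float = 0.9) -> str:
    """Pick the most likely cell type only if it beats the runner-up by at
    least min_prob_diff; otherwise fall back to "Unassigned" (assumed)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best_type, best_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_type if best_p - second_p >= min_prob_diff else "Unassigned"


def signature_distance(sig_a: Sequence[int], sig_b: Sequence[int]) -> int:
    """Count antibodies whose (hypothetical) 0/1 state differs between two cell types."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def doublet_considered(sig_a: Sequence[int], sig_b: Sequence[int],
                       min_distance_for_doublet: int = 5) -> bool:
    """A doublet of two cell types is only considered if their signatures
    differ in at least min_distance_for_doublet antibodies."""
    return signature_distance(sig_a, sig_b) >= min_distance_for_doublet


def mixed_label(adjusted_mixing: float, max_adjusted_mixing: float = 0.3) -> str:
    """Doublet clusters at or below the threshold are "Mixed"; above it, "Mixed Like"."""
    return "Mixed" if adjusted_mixing <= max_adjusted_mixing else "Mixed Like"


# Example usage with made-up numbers.
clear_call = assign_cell_type({"T cell": 0.97, "B cell": 0.02, "Monocytes": 0.01})
ambiguous = assign_cell_type({"T cell": 0.6, "B cell": 0.4})
t_sig = [1, 1, 0, 0, 0, 0, 1]
b_sig = [0, 0, 1, 1, 1, 0, 1]
```

Raising min_prob_diff or min_distance_for_doublet makes each comparison stricter, which matches the advice to increase min_distance_for_doublet when too many mixed cells appear.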
- Returns:
- ClusterMethod object:
It stores the labels for the cells with the most likely cell type from the truth.
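The merge options act on cluster names that follow the conventions described for the merge parameter: a “:” marks a doublet, and a trailing “-N” numbers repeated clusters. The sketch below implements each naming rule as a separate, hypothetical helper (merge_mixed, merge_labeled, merge_unassigned). How Mosaic combines these rules within a single mode (for example, whether “all” also applies the other merges) is not stated here, so the sketch does not model it.

```python
import re
from typing import List

# A cluster name ending in "-<digits>", e.g. "T cell-2".
_NUMBERED = re.compile(r"^(.*)-\d+$")


def merge_mixed(labels: List[str]) -> List[str]:
    """Rule behind "mixed": any doublet cluster (name contains ':') becomes "Mixed"."""
    return ["Mixed" if ":" in lab else lab for lab in labels]


def merge_labeled(labels: List[str]) -> List[str]:
    """Rule behind "labeled": drop the "-N" suffix so clusters of one cell type merge.
    Doublet and "Unassigned" clusters are left untouched."""
    out = []
    for lab in labels:
        m = _NUMBERED.match(lab)
        if m and ":" not in lab and m.group(1) != "Unassigned":
            out.append(m.group(1))
        else:
            out.append(lab)
    return out


def merge_unassigned(labels: List[str]) -> List[str]:
    """Rule behind "all": collapse "Unassigned-N" clusters into "Unassigned"."""
    return ["Unassigned" if re.fullmatch(r"Unassigned-\d+", lab) else lab for lab in labels]


clusters = ["T cell-1", "T cell-2", "B cell", "T cell:B cell", "Unassigned-1", "Unassigned-2"]
```

Keeping the rules separate makes it easy to see which names each option touches; a real pipeline would apply whichever combination the chosen mode implies.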
Notes
This modifies the LABEL attribute of the assay.
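With cluster=True, the clusters that get labeled come from graph-community clustering on a k-nearest-neighbor graph (k=5, random_state=42). The sketch below illustrates that idea only: it builds a symmetric kNN graph and then uses asynchronous label propagation as a simple, swapped-in community-detection algorithm. The community algorithm Mosaic actually uses is not specified here, and the function names knn_graph and label_propagation are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set


def knn_graph(points: Sequence[Sequence[float]], k: int = 5) -> List[Set[int]]:
    """Build an undirected k-nearest-neighbor graph as adjacency sets."""
    n = len(points)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        # The k closest other points by Euclidean distance.
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize so the graph is undirected
    return adj


def label_propagation(adj: List[Set[int]], random_state: int = 42,
                      max_iter: int = 100) -> List[int]:
    """Asynchronous label propagation: each node adopts the most common label
    among its neighbors until no node changes (stand-in community detection)."""
    rng = random.Random(random_state)  # seeded for reproducible tie-breaking
    labels = list(range(len(adj)))  # every node starts in its own community
    order = list(range(len(adj)))
    for _ in range(max_iter):
        rng.shuffle(order)
        changed = False
        for i in order:
            if not adj[i]:
                continue
            counts: Dict[int, int] = {}
            for j in adj[i]:
                counts[labels[j]] = counts.get(labels[j], 0) + 1
            top = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == top)
            # Move only if the current label is not already a most common one.
            if labels[i] not in candidates:
                labels[i] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


# Two well-separated groups of made-up 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.2, 10.8)]
labels = label_propagation(knn_graph(points, k=5), random_state=42)
```

Because each group has exactly six points, every point's five nearest neighbors lie in its own group, so the graph splits into two communities. Fixing random_state plays the same role as the documented default: it makes the clustering reproducible across runs.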