Changelog#
v3.12.0#
Release date: 2025-04-21
Added#
A new attribute for all assays: info. This can be used to store arbitrary information of various types including pandas dataframes and dictionaries.
Unlike row attributes, column attributes, and layers, it is not confined to any shape. For example, the signature of the clones generated for only
the somatic variants would have fewer columns than ids and fewer rows than cells in the assay. COMPASS stores a lot of information in its variables, but
until now there was no easy way of saving it in the h5. These are dataframes with shapes that do not align with that of the assay. These values can
now be stored in the assay info as follows:
>>> sample.dna.add_info("somatic_signature", compass.node_genotypes_)
>>> sample.dna.add_info("assignment_probability", compass.probability_)
The assays also now store their palette in the info as a dictionary. sampleinfo() is a shortcut for accessing info for single sample assays.
>>> sample.dna.sampleinfo["palette"]
The functions for handling info are has_info(), add_info(), and del_info().
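The shape-free behaviour described above can be pictured with a small stand-in class. This is only a sketch: InfoStoreSketch is a hypothetical name, and only the method names has_info(), add_info(), and del_info() come from this changelog. Their real signatures and storage details may differ.

```python
class InfoStoreSketch:
    """Hypothetical stand-in for an assay's shape-free info store."""

    def __init__(self):
        # Unlike row attributes, column attributes, and layers,
        # values stored here are not checked against the assay's shape.
        self.info = {}

    def add_info(self, key, value):
        self.info[key] = value

    def has_info(self, key):
        return key in self.info

    def del_info(self, key):
        del self.info[key]


store = InfoStoreSketch()
store.add_info("palette", {"clone 1": "#1f77b4", "clone 2": "#ff7f0e"})
store.add_info("somatic_signature", [[0, 1], [2, 2]])  # any shape is accepted
```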
Updates to COMPASS
Update to the latest version of COMPASS
Add option to rename compass clones with a new probability value using relabel().
Allow passing prefixes for file names to run()
Added option to get fractions of each clone using node_fractions()
Allow passing directories to run()
COMPASS can call LOH clones without calling CNV using the cnv parameter in run().
Plotting updates
Cytoband information added to the CNV ploidy signaturemap() and plot_ploidy() figures. These are also shown in the CopyNumber workflow.
ticks option to heatmap() to disable plotting of ticks for assays with a large number of ids.
The scatterplot size changes depending on the length of the labels to ensure that the plotting area is constant.
Updates to CNV
Locally stored gene name annotations are used instead of pulling them from Ensembl when running get_gene_names().
When using get_gene_names(), the gene name will always be the best match for the amplicon instead of multiple values separated by a /
get_gene_names() returns the gene names while also adding them to the column attributes.
Updates to DNA
get_annotated_ids() returns a list of human-readable names for all the variants in the assay.
set_annotated_ids() updates the ids to be human-readable names.
dna.Dna.genome() to easily access the genome_version metadata value.
Store annotations in the DNA assay info instead of the column attributes to ease their access.
dna.Dna.snps() to quickly get the ids that are SNPs.
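As a rough illustration of what selecting the SNP ids involves, the sketch below keeps only single-base substitutions. It assumes ids of the form "chrom:pos:ref/alt"; is_snp is a hypothetical helper written for this example and is not the implementation of dna.Dna.snps().

```python
def is_snp(variant_id):
    """Return True if the id describes a single-base substitution.

    Assumes ids of the form "chrom:pos:ref/alt" (an assumption of this sketch).
    """
    ref, _, alt = variant_id.rsplit(":", 1)[-1].partition("/")
    bases = set("ACGT")
    return ref in bases and alt in bases and ref != alt


ids = ["chr1:115256513:G/A", "chr2:25457242:C/CT", "chr4:106196829:TA/T"]
snp_ids = [variant_id for variant_id in ids if is_snp(variant_id)]
```

Insertions and deletions fail the test because their ref or alt allele is longer than one base.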
Functionality updates
Option to disable matching of ids in SampleGroup using the match_ids parameter.
Ability to load h5 files with only raw counts.
whitelist=[] is treated just like whitelist=None in load()
Deduplication of barcodes is done using integers instead of sample names and no warning is raised when deduplication of barcodes is performed. It can be manually performed using deduplicate_barcodes() and inverted using normalize_barcodes().
Store the palette in the info instead of the metadata of the assay, making saved h5 files valid in format even with the stored palette.
sample.Sample.vdj to easily access the VDJ assay in the h5 file.
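The integer-based deduplication and its inverse can be sketched as a round trip. The "-<n>" suffix format, and the helper names dedup_sketch and normalize_sketch, are assumptions made for this example. The actual format used by deduplicate_barcodes() and normalize_barcodes() is not stated here.

```python
def dedup_sketch(barcodes_by_sample):
    """Make barcodes unique across samples by appending an integer suffix.

    The "-<n>" suffix is an assumed format, chosen only for illustration.
    """
    unique = []
    for index, barcodes in enumerate(barcodes_by_sample, start=1):
        unique.extend(f"{barcode}-{index}" for barcode in barcodes)
    return unique


def normalize_sketch(barcodes):
    """Invert dedup_sketch by stripping the integer suffix."""
    return [barcode.rsplit("-", 1)[0] for barcode in barcodes]


samples = [["AACA", "GGTC"], ["AACA", "TTAG"]]
deduplicated = dedup_sketch(samples)
restored = normalize_sketch(deduplicated)
```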
Fixed#
ADO score shown in the figure is the same as the ADO score shown in the table in the VariantSubcloneTable workflow
Use the correct genome version from the dna assay in COMPASS
get_annotations raises an appropriate error when the genome version is not supported.
v3.7.0#
Release date: 2024-08-05
Added#
filter_somatic_variants() for automatic filtering of pathogenic somatic variants.
dna.Dna.assign_from_truth() to label the cells for a known set of clones.
protein.Protein.cluster_and_label() to find all protein clusters and label them based on the provided truth. This function can be used to identify novel cell types.
protein.Protein.label_sticky_cells() to mark cells which are likely to be sticky.
protein.Protein.assign_from_truth() to label the cells for a known set of cell types. By default, it labels the PBMC subtypes.
protein.Protein.truth() to convert cluster signatures to a truth that can be used for assign_from_truth()
Ability to pass an external control to compute_ploidy()
read_depth_dependence() - a plot to quickly visualize the need and effectiveness of NSP normalization.
Option to load a subset of the assays in the h5 file.
No error is raised when h5 files with unknown assays are loaded.
Raise an error when the number of variants to annotate is more than 1,000. This is a safeguard to prevent incorrect API calls.
Varsome annotations are stored locally and will not be fetched again unless the local file is deleted.
copy parameter to get_attribute() to return a view of the data instead of a copy.
Option to pass order of labels to ridgeplot().
ADO score is formatted more conveniently in the workflows.variant_subclone_table.VariantSubcloneTable workflow. If it’s 0, then it’s shown as “-” and if <0.05 then it’s shown as “~0.0”.
Ability to rename a sample using rename().
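The ADO score formatting rule can be written as a short function. Only the two special cases (0 and <0.05) come from this changelog. The function name format_ado and the two-decimal fallback for other values are assumptions of this sketch.

```python
def format_ado(score):
    """Format an ADO score for display in a table.

    The 0 and <0.05 rules follow the changelog entry above; the
    two-decimal fallback for larger values is an assumption.
    """
    if score == 0:
        return "-"
    if score < 0.05:
        return "~0.0"
    return f"{score:.2f}"
```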
Changed#
sample.Sample.name is now a property, and cannot be set. It returns a value according to the current sample_name metadata.
assay._Assay.title is now a property, and cannot be set. It returns a value according to the current sample_name metadata.
Behavior of default_label in assay._Assay.set_labels(). When default_label is None, only the labels of the provided barcodes are updated.
normalized_counts in compute_ploidy() is no longer used. The read_counts layer is used directly.
ANNOTATION_COLUMNS constant was moved to missionbio.annotation.constants
Use pynndescent instead of scikit-learn to speed up nearest neighbors calculation during graph-community clustering. Results will not be backwards compatible.
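The default_label behaviour can be sketched on plain dictionaries. set_labels_sketch is a hypothetical function, not the signature of assay._Assay.set_labels(). The branch for a non-None default_label (unlisted barcodes receive the default) is an assumption inferred from the entry above.

```python
def set_labels_sketch(current, new, default_label=None):
    """Return an updated barcode -> label mapping.

    `current` and `new` are dicts keyed by barcode.
    """
    if default_label is None:
        # Only the provided barcodes are updated; the rest keep their labels.
        return {barcode: new.get(barcode, label) for barcode, label in current.items()}
    # Assumed behaviour: barcodes that were not provided get the default label.
    return {barcode: new.get(barcode, default_label) for barcode in current}


labels = {"bc1": "1", "bc2": "2", "bc3": "3"}
partial = set_labels_sketch(labels, {"bc1": "tumor"})
full = set_labels_sketch(labels, {"bc1": "tumor"}, default_label="other")
```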
Fixed#
Ordering of the barcodes in the heatmap when a subset of the variants are used.
Fetching of CNV amplicon gene names for regions where ensembl returns an incomplete response.
Allow custom grouping of amplicons for cnv.CNV.heatmap() by passing amplicons to features and x_groups values.
v3.4.0#
Release date: 2024-04-01
Added#
Support to pass x_groups to signaturemap() and heatmap().
Support to pass variant filters to load().
positions(), amplicon_performance(), & panel_uniformity() to quickly get amplicon positions, performance and panel uniformity.
Option to hide columns in the variants table of the VariantSubcloneTable workflow.
Ability to filter variants through the GUI in the VariantSubcloneTable workflow.
override parameter for the heatmap() function which is simply passed to clustered_ids and clustered_barcodes.
The first column of the subclone table is frozen.
Mandate features when x_groups is provided in heatmap().
An appropriate error is raised when any cell has 0 total reads when running NSP.
An appropriate error is raised when the annotation API is not available.
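A guard of the kind described for NSP could look like the sketch below. The function name check_total_reads, the ValueError type, and the message wording are all assumptions. The changelog only states that an appropriate error is raised.

```python
def check_total_reads(read_counts):
    """Raise a ValueError if any cell (row) has zero total reads.

    `read_counts` is a list of per-cell rows of read counts.
    """
    empty = [i for i, row in enumerate(read_counts) if sum(row) == 0]
    if empty:
        raise ValueError(f"{len(empty)} cell(s) have 0 total reads: rows {empty}")
```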
Changed#
Increased the vertical spacing between the graph and the fishplot from 0 to 0.1.
The plotting functions in missionbio.mosaic.plotting were moved to missionbio.plotting
missionbio.algorithms.nsp was moved to missionbio.demultiplex.protein.nsp
Unpinned scikit-learn and hdbscan as their latest versions are compatible with each other.
scikit-learn>1.3.1 is installed by default which results in slightly different NSP calls due to changes to its Gaussian mixture model.
Fixed#
Load the whitelist variants correctly when filter_variants=True is passed to load().
Null values of DANN score are shown as empty cells instead of º.
name_id_by_pos() does not filter the amplicons.
Lineplot in plot_ploidy() does not connect the medians with a line when using genes+amplicons or positions+amplicons.
The violin plot range is fixed to (0, 100) for the AF and GQ layers in VariantSubcloneTable.
Violin plots generated using violinplot() are equally spaced when split by labels.
Fix resetting of selected_bars when scatterplots are created.
rename_labels() allows swapping of labels.
Fishplot does not disappear when a clone and its parent both have 0 cells at some timepoint.
v3.1.1#
Release date: 2023-09-25
Added#
Relaxed missionbio.h5 requirement to >=4.13.0,<6
Changed#
Disable autouploading of tagged packages to anaconda.
Removed check for h5 file compatibility with H5Reader.
Fixed#
The whitelist option in load() correctly loads exact matches of variants.
v3.1.0#
Release date: 2023-09-13
Added#
The order of the names in the legend matches the order of the traces in the ridgeplot.
Option to pass any sequence type to get_attribute() besides np.ndarray. This includes list, tuple, and range.
features parameter to signature() which allows grouping across ids, just like splitby allows grouping across cells.
The features option in signaturemap() allows plotting using grouped data from signature()
Support for hg38 along with all species available through Ensembl in get_annotations()
Support for hg38 in get_annotations().
Sped up NSP by 2x by using statsmodels for the KDE and using spherical covariance with kmeans++ initialization for the GMM parameters.
ANSP - Approximate NSP to protein normalization. It runs in constant time for large datasets.
get_attribute() also accepts dataframes.
heatmap() can plot arbitrary dataframes as long as they have the expected number of cells.
TreeGraph now supports html tags like <br>, <b>, and <span> in the descriptions.
Changed#
Use latest python 3.8 in installer instead of 3.8.0
Fixed#
The title of clone_vs_analyte() plot does not overlap with the DNA heatmap.
The x-axis label order for CNV in the clone_vs_analyte() plot matches the order of the points in the data shown.
NGT layer not modified after running filter_variants()
“Last modified” timestamp does not change when loading an H5 file.
jitter parameter in NSP works
Failure of VariantSubcloneTable when all the variant calls are filtered.
Pinned hdbscan to v0.8.29. Higher versions (>=0.8.30,<=0.8.33) have runtime issues.
heatmap() and signaturemap() execute successfully when “cnv” is passed before “dna”.
Fix y-compression of TreeGraph by checking the upwards and downwards movement of only the highest and lowest nodes respectively.
Updated#
Switched from using the deprecated JupyterDash to the builtin jupyter dash in Dash v2.11.
jupyter_client from <8 to >=8.1.0 as the ThreadedZMQStream error is fixed in it.
v3.0.1#
Release date: 2023-06-20
Added#
assay._Assay.crosstab() to wrap pandas.crosstab for ease of use with mosaic.
assay._Assay.crosstabmap() to create heatmaps of the output of assay._Assay.crosstab().
assay._Assay.hierarchical_cluster() to get the hierarchical clustering order of the rows of a DataFrame.
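For readers unfamiliar with cross-tabulation, the sketch below counts how often each pair of labels co-occurs, using only the standard library. It illustrates the idea behind pandas.crosstab and is not how assay._Assay.crosstab() is implemented. The function name and the example labels are hypothetical.

```python
from collections import Counter


def crosstab_sketch(row_labels, col_labels):
    """Count co-occurrences of two per-cell label sequences.

    Returns a nested dict: row label -> column label -> count.
    """
    counts = Counter(zip(row_labels, col_labels))
    rows = sorted(set(row_labels))
    cols = sorted(set(col_labels))
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


dna_labels = ["clone1", "clone1", "clone2", "clone2", "clone2"]
protein_labels = ["T", "B", "T", "T", "B"]
table = crosstab_sketch(dna_labels, protein_labels)
```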
Changed#
Updated matplotlib dependency from <=3.2.2 to >=3.4.0
Fixed#
assay._Assay.heatmap() subclustering performed when convolve=0. It was disabled by default.
Custom typography.css used in workflows is included in the package data
Setting labels using dictionaries in assay._Assay.set_labels().
v3.0.0#
Release date: 2023-06-16
Added#
A wrapper for COMPASS.
New variant filters that account for missing data.
Recipe and instructions for building installers.
plot_kind parameter to dna.Dna.group_by_genotype() to change the type of plot shown.
filter_cells to io.load() which loads only the intersection algorithm cells.
Progress bar to io.load()
algorithms.nsp.NSP and algorithms.nsp.ExpressionProfile to modularize the NSP code.
x_groups to assay._Assay.heatmap() to group the x-axis by a given list of ids.
Simplify and speed up assay._Assay.heatmap() by removing duplicate data. (By using plots.heatmap.Heatmap)
assay._Assay.convolve() to convolve the data that was earlier performed in the Heatmap.
Configuration options accessible via Config:
    ms.Config.Colorscale.Dna to change the default color palette for all DNA plots.
    ms.Config.Colorscale.Cnv to change the default color palette for all CNV plots.
    ms.Config.Colorscale.Protein to change the default color palette for all Protein plots.
Custom divergent colorscale for Cnv Ploidy heatmaps
Option to return indices instead of barcodes in assay._Assay.clustered_barcodes().
sample.Sample.common_barcodes() to get the common barcodes across assays.
Add subcluster parameter to assay._Assay.clustered_barcodes() to prevent clustering within the labels
Option to pass n-dimensional arrays as splitby in assay._Assay.clustered_barcodes()
Option to fetch a subset of the assays in sample.Sample.assays() using the names parameter
sample.Sample.clustered_barcodes() to hierarchically cluster using multiple assays
Multiple options added to sample.Sample.heatmap() to sort the assays, barcodes, and the features
assay._Assay.signature() accepts a splitby parameter to get the signature for each unique label in splitby.
Improvements to assay._Assay.signaturemap():
    labels and ids are clustered by default.
    Option to pass a list of labels to assay._Assay.signaturemap() to order the labels.
    The default features option for cnv.Cnv.signaturemap() is set to positions.
Option to copy the labels and palette together by passing an assay._Assay() to assay._Assay.set_labels()
assay._Assay.heatmap() sets subcluster=False when calculating the barcode order when convolve is provided.
Varsome URLs as hyperlinks on the variant name in the VariantSubcloneTable
Add percentage of cells and amplicons present to the CopyNumberWorkflow
dna.Dna.mutated_cells() to get the number of cells with at least 1 mutation in each given clone. This is used in sample.Sample.signaturemap().
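The idea of a per-label signature with splitby can be sketched with the standard library. This example assumes the signature is a per-id median across the cells in each label group. The summary statistic, the function name signature_sketch, and the input layout are assumptions and do not describe the signature of assay._Assay.signature().

```python
from statistics import median


def signature_sketch(values, splitby):
    """Summarise per-cell rows into one row per unique label.

    `values` holds one row per cell (one entry per id); `splitby` holds
    one label per cell. Returns label -> list of per-id medians.
    """
    groups = {}
    for row, label in zip(values, splitby):
        groups.setdefault(label, []).append(row)
    # zip(*rows) transposes the group's rows into per-id columns.
    return {label: [median(col) for col in zip(*rows)] for label, rows in groups.items()}


ngt = [[0, 2], [2, 2], [1, 0], [1, 2]]
labels = ["A", "A", "B", "B"]
sig = signature_sketch(ngt, labels)
```

The result has one row per label instead of one row per cell, which is why such signatures do not match the shape of the assay.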
Changed#
apply_filter changed to filter_variants in io.load()
SubcloneTree and SubcloneTreeGraph classes are renamed to Tree and TreeGraph respectively.
show_plot to return_plot in dna.Dna.group_by_genotype()
plots.heatmap.Heatmap splits the vertical and horizontal lines on the main heatmap into two traces.
The default value of vaf_het in dna.Dna.filter_variants() changed from 35 to 30.
Flattened sample.Sample.heatmap() option has been removed. A more customizable version is available under the sample.Sample.signaturemap() function.
The constants.COLORS constant was changed to have unique values:
    The grey values at the 10th, 20th, 30th.. positions were modified to be unique
    The black (#000000) value was moved from the 20th position to the last position
Fixed#
Get indexes maintains the order as per find_list when there are duplicates in the find_list and order_using_find_list is True.
DANN score in the variants subclone table is shown correctly for saved h5 files.
Overlapping of text in phylogeny trees.
Error in multiprocessing when fetching gene_names for CNV by adding a max_workers parameter and using threads instead of processes.
Missing clone is ignored when finding ADO sisters.
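The ordering rule in the first fix above can be illustrated as follows. get_indexes_sketch is a hypothetical reimplementation written for this example. It assumes the searched list has unique values and that duplicates in find_list are kept in the output, neither of which is confirmed by the changelog.

```python
def get_indexes_sketch(in_list, find_list, order_using_find_list=True):
    """Return positions in `in_list` of the values listed in `find_list`.

    Assumes `in_list` holds unique values.
    """
    positions = {value: i for i, value in enumerate(in_list)}
    if order_using_find_list:
        # Follow find_list exactly, including any repeated values.
        return [positions[value] for value in find_list if value in positions]
    # Otherwise follow the order of in_list.
    wanted = set(find_list)
    return [i for i, value in enumerate(in_list) if value in wanted]


ids = ["a", "b", "c", "d"]
ordered = get_indexes_sketch(ids, ["c", "a", "c"])
unordered = get_indexes_sketch(ids, ["c", "a", "c"], order_using_find_list=False)
```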
Removed#
Functions to convert legacy loom files to h5 files - io._loom_to_h5, io._update_file
Functions to read data from csv files - io._merge_files, io._cnv_raw_counts, io._protein_raw_counts
Function to merge h5 files - io._merge
show_plot from protein.Protein.normalize_reads(). The same plot can be created in plotly using algorithms.nsp.NSP.plot()
show_plot from protein.Protein.get_signal_profile(). The same plot can be created in plotly using algorithms.nsp.ExpressionProfile.plot()
protein.Protein.get_signal_profile function. It can be executed using algorithms.nsp.ExpressionProfile.fit() if needed.
protein.Protein.get_scaling_factor function. It can be executed using algorithms.nsp.NSP.scaling_factor() if needed