Rna.normalize_reads


missionbio.mosaic.rna.Rna.normalize_reads

Rna.normalize_reads(correct_background: bool = False, negative_control_genes: Optional[Sequence[str]] = None, bg_method: str = 'max', min_reads_per_cell: int = 50, min_cells_detec_gene: float = 0.01, max_fraction: float = 0.5, exclude_highly_expressed: bool = False, use_subtraction: bool = True) → None

Normalize raw RNA counts

The function performs a multi-step normalization:

1. Filters cells based on total read counts.
2. Filters genes based on detection rate across cells.
3. Normalizes counts using per-cell size factors (median scaling).
4. Optionally excludes highly expressed genes from size factor computation.
5. Optionally estimates background from negative control genes and performs subtraction-based correction.
6. Returns a log1p-transformed normalized matrix mapped back to the full gene/cell space.
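Steps 1 to 3 and 6 above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `normalize_sketch`, the definition of a size factor as a cell's total divided by the median total, and the zero fill for filtered-out cells and genes are all hypothetical.

```python
import numpy as np


def normalize_sketch(counts, min_reads_per_cell=50, min_cells_detec_gene=0.01):
    """Hypothetical sketch on a cells x genes matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    out = np.zeros_like(counts)  # full cell/gene space, zero-filled (assumption)

    # Step 1: keep cells with enough total reads.
    keep_cells = counts.sum(axis=1) >= min_reads_per_cell
    if not keep_cells.any():
        return out

    # Step 2: keep genes detected (non-zero) in enough of the kept cells.
    detection_rate = (counts[keep_cells] > 0).mean(axis=0)
    keep_genes = detection_rate >= min_cells_detec_gene

    # Step 3: per-cell size factors by median scaling (assumed definition).
    sub = counts[np.ix_(keep_cells, keep_genes)]
    totals = sub.sum(axis=1)
    median_total = np.median(totals)
    if median_total == 0:
        return out
    size_factors = totals / median_total
    size_factors[size_factors == 0] = 1.0  # guard against division by zero
    normalized = sub / size_factors[:, None]

    # Step 6: log1p transform, written back into the full matrix.
    out[np.ix_(keep_cells, keep_genes)] = np.log1p(normalized)
    return out
```

In this sketch, filtered cells and genes stay in the output so that its shape matches the input; how the real method represents them is not stated in this page.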

Parameters:
correct_background: bool

Whether to correct the background using negative control genes

negative_control_genes: Optional[Sequence[str]]

List of negative control genes to use for background correction. Default: [‘BFP’, ‘RFP’, ‘EGFP’]

bg_method: {‘mean’, ‘median’, ‘max’}

Statistic used to compute background from negative control genes.
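As a rough illustration of what the three `bg_method` options compute, the sketch below derives one background value per cell from the negative control columns. The helper name `background_per_cell`, the per-cell granularity, and the `gene_ids` list are assumptions made for the example; the documentation does not say whether the library computes background per cell or globally.

```python
import numpy as np


def background_per_cell(counts, gene_ids, negative_control_genes, bg_method="max"):
    """Hypothetical helper: one background value per cell (row)."""
    stats = {"mean": np.mean, "median": np.median, "max": np.max}
    if bg_method not in stats:
        raise ValueError("bg_method must be one of 'mean', 'median', 'max'")

    # Column positions of the negative control genes.
    idx = [list(gene_ids).index(g) for g in negative_control_genes]
    controls = np.asarray(counts, dtype=float)[:, idx]

    # Reduce across the control genes within each cell.
    return stats[bg_method](controls, axis=1)
```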

min_reads_per_cell: int

Minimum total reads per cell required to be retained.

min_cells_detec_gene: float

Minimum fraction of filtered cells in which a gene must be detected (non-zero) to be retained.

max_fraction: float

Maximum fraction of total cell reads for a gene to be considered “highly expressed.”

exclude_highly_expressed: bool

Whether to exclude highly expressed genes from size factor computation.
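The interaction between `max_fraction` and `exclude_highly_expressed` might look like the following sketch. It assumes that a gene counts as highly expressed when it exceeds `max_fraction` of the reads in at least one cell, and that size factors are each cell's remaining total divided by the median of those totals; both rules, and the name `size_factors_sketch`, are assumptions rather than documented behavior.

```python
import numpy as np


def size_factors_sketch(counts, max_fraction=0.5, exclude_highly_expressed=False):
    """Hypothetical per-cell size factors on a cells x genes matrix."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    use = np.ones(counts.shape[1], dtype=bool)

    if exclude_highly_expressed:
        # Fraction of each cell's reads taken up by each gene.
        with np.errstate(divide="ignore", invalid="ignore"):
            frac = np.where(totals > 0, counts / totals, 0.0)
        # Drop genes above the threshold in any cell (assumed rule).
        use = ~(frac > max_fraction).any(axis=0)

    # Median scaling over the remaining genes; assumes a non-zero median.
    sub_totals = counts[:, use].sum(axis=1)
    return sub_totals / np.median(sub_totals)
```

Excluding dominant genes in this way keeps a single very abundant transcript from inflating a cell's size factor.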

use_subtraction: bool

Whether to perform background subtraction (division not implemented).
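A subtraction-based correction could be sketched as below. The helper `subtract_background`, the per-cell `background` vector, and the clipping of negative results to zero are assumptions for illustration; the documentation only states that subtraction is supported and division is not.

```python
import numpy as np


def subtract_background(values, background, use_subtraction=True):
    """Hypothetical correction: subtract a per-cell background from each row."""
    if not use_subtraction:
        raise NotImplementedError("division-based correction is not implemented")

    values = np.asarray(values, dtype=float)
    background = np.asarray(background, dtype=float)

    # Subtract row-wise and clip at zero so no value goes negative (assumption).
    return np.clip(values - background[:, None], 0.0, None)
```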

Raises:
ValueError

If any negative control gene is not present in the assay IDs.
If bg_method is not one of {‘mean’, ‘median’, ‘max’}.
If min_reads_per_cell is less than 1.
If none of the negative control genes are present after filtering.

NotImplementedError

If use_subtraction is False (division-based correction not yet implemented).
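The up-front checks behind these exceptions can be approximated as follows. This is a hypothetical sketch: the function `validate_args`, the order of the checks, and the error messages are assumptions, and the check that at least one control gene survives filtering is omitted because it can only run after the filtering steps.

```python
def validate_args(assay_ids, negative_control_genes, bg_method,
                  min_reads_per_cell, use_subtraction):
    """Hypothetical argument validation mirroring the documented errors."""
    # Every negative control gene must exist among the assay IDs.
    missing = [g for g in negative_control_genes if g not in assay_ids]
    if missing:
        raise ValueError(f"negative control genes not in assay IDs: {missing}")

    if bg_method not in {"mean", "median", "max"}:
        raise ValueError("bg_method must be one of 'mean', 'median', 'max'")

    if min_reads_per_cell < 1:
        raise ValueError("min_reads_per_cell must be at least 1")

    if not use_subtraction:
        raise NotImplementedError("division-based correction is not implemented")
```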

