The SegTraQ (SegTraQ) class

The SegTraQ (SegTraQ) class represents the core interface for computing SegTraQ metrics.

class segtraq.SegTraQ(sdata: SpatialData, images_key: str | None = 'morphology_focus', tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str | None = 'cell_area', tables_centroid_x_key: str | None = 'x_centroid', tables_centroid_y_key: str | None = 'y_centroid', tables_gene_key: str | None = None, tables_raw_counts_layer: str | None = None, points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int | None = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', points_z_key: str | None = 'z', points_gene_key: str = 'feature_name', shapes_key: str = 'cell_boundaries', shapes_cell_id_key: str = 'cell_id', nucleus_shapes_key: str | None = 'nucleus_boundaries', nucleus_shapes_cell_id_key: str = 'cell_id', filter_low_quality_transcripts: bool = True, filter_kwargs: dict | None = None)

Initialize a SegTraQ object, the core interface for computing SegTraQ metrics. Defaults target 10x Genomics Xenium; override keys for other technologies. By default, this removes low-quality and control transcripts that would otherwise skew metrics, but this can be configured via the filter_low_quality_transcripts and filter_kwargs arguments.

Parameters:
  • sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).

  • images_key (str or None, optional, default="morphology_focus") – Key in sdata.images for a nuclear or morphology image (e.g., DAPI). Used for visualization or to derive a nucleus mask via segtraq.run_cellpose when using the nuclear correlation module (segtraq.nc). If None, no image is expected.

  • tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • tables_area_key (str or None, optional, default="cell_area") – Column in the cell table with cell area (2D). If None, area will be computed via segtraq.bl.morphological_features.

  • tables_centroid_x_key (str or None, optional, default="x_centroid") – Column in the cell table with the x-coordinate of the cell centroid.

  • tables_centroid_y_key (str or None, optional, default="y_centroid") – Column in the cell table with the y-coordinate of the cell centroid.

  • tables_gene_key (str or None, default=None) – Column in sdata.tables[tables_key].var containing gene identifiers. If None, sdata.tables[tables_key].var_names are used.

  • tables_raw_counts_layer (str | None, optional) – Layer containing count data. If None, adata.X is used if it looks like counts. If a layer is specified, it must exist and contain count-like values.

  • points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.

  • points_background_id (str or int or None, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.

  • points_z_key (str or None, optional, default="z") – Column for the z-coordinate (3D data). If None, data are treated as 2D.

  • points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.

  • shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.

  • shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed). If None, the index is assumed to contain cell IDs and renamed to “segtraq_id”.

  • nucleus_shapes_key (str or None, optional, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available. If None, a nucleus mask can be obtained via segtraq.run_cellpose.

  • nucleus_shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[nucleus_shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).

  • filter_low_quality_transcripts (bool, default=True) – Whether to filter out low-quality and control transcripts that would otherwise skew metrics.

  • filter_kwargs (dict or None, optional) – If filter_low_quality_transcripts is True, these keyword arguments are forwarded to the filtering function. Possible keys are: min_qv, control_genes, control_prefixes, inplace. See the function _filter_control_and_low_quality_transcripts for details.

Notes

After initializing a SegTraQ instance, all SegTraQ modules can be run directly from the object using its module facades.

Wrappers (run_baseline, run_nuclear_correlation, etc.) to run all metrics of a module are provided below.
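As a rough illustration of overriding the Xenium-oriented defaults for another technology, the sketch below collects a few key overrides in a plain dictionary and checks them against parameter names copied from the signature above. The helper build_segtraq_kwargs and the override values are hypothetical, and the constructor call is left as a comment because it assumes an already loaded SpatialData object.

```python
# Parameter names and defaults copied from the SegTraQ signature (subset only).
XENIUM_DEFAULTS = {
    "tables_key": "table",
    "points_key": "transcripts",
    "points_gene_key": "feature_name",
    "points_z_key": "z",
    "points_background_id": "UNASSIGNED",
    "shapes_key": "cell_boundaries",
    "nucleus_shapes_key": "nucleus_boundaries",
}


def build_segtraq_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(XENIUM_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown SegTraQ key(s): {sorted(unknown)}")
    return {**XENIUM_DEFAULTS, **overrides}


# Hypothetical dataset: gene names live in a "gene" column, there is no
# z-coordinate, and no nucleus boundaries are available.
kwargs = build_segtraq_kwargs(
    points_gene_key="gene",
    points_z_key=None,
    nucleus_shapes_key=None,
)

# With a SpatialData object `sdata` in hand, the call would then look like:
# st = segtraq.SegTraQ(sdata, **kwargs)
```

Rejecting unknown keys early is a design choice of this sketch: a misspelled key fails before any data is touched.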

filter_cells(col: str, func: Callable, inplace: bool = True)

Filter cells from the cell table based on a user-defined function.

Parameters:
  • col (str) – Column in the cell table to apply the filtering function on.

  • func (Callable) – A function that takes a single argument (the column value) and returns True if the cell should be kept, False otherwise.

  • inplace (bool, default=True) – If True, modifies self.sdata in place. If False, returns a new SpatialData object with the filtered cells.

Returns:

  • If inplace=True: returns None after modifying self.sdata.

  • If inplace=False: returns a new SpatialData object with filtered cells.

Return type:

None or SpatialData

Example

>>> st.filter_cells(col='cell_area', func=lambda x: x > 100)

filter_control_and_low_quality_transcripts(min_qv: float = 20.0, control_prefixes: tuple | list = ('NegControlProbe_', 'antisense_', 'NegControlCodeword', 'BLANK_', 'Blank-', 'NegPrb', 'DeprecatedCodeword_', 'UnassignedCodeword_'), control_genes: tuple | list = (), recompute_expression: bool = True, inplace: bool = True)

Filter control and low-quality transcripts from the SpatialData object. By default this is done in place; see the inplace parameter.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing transcript data.

  • min_qv (float | None, default=20.0) – Minimum quality value (qv) threshold for transcripts to be considered valid. If None, no filtering is applied based on quality.

  • control_prefixes (tuple | list, default=("NegControlProbe_", "antisense_", "NegControlCodeword", "BLANK_", "Blank-", "NegPrb", "DeprecatedCodeword_", "UnassignedCodeword_", "Intergenic_Region_")) – Control prefixes used to identify control probes in gene names. Transcripts whose gene names start with any of these prefixes are treated as control probes and filtered out.

  • control_genes (tuple | list, default=()) – Additional keywords to identify control probes in gene names. For these ones, exact matches will be filtered out (e.g. “GAPDH” or “ERCC-00002”), whereas for the control_prefixes, any gene name starting with the prefix will be filtered out (e.g. “NegControlProbe_1” or “NegControlProbe_2”).

  • points_key (str, default="transcripts") – The key in the SpatialData points attribute that contains transcript data.

  • points_gene_key (str, default="feature_name") – The column name in the points DataFrame that contains gene names.

  • points_cell_id_key (str, default="cell_id") – The column name in the points DataFrame that contains cell IDs.

  • points_background_id (str | int | None, default="UNASSIGNED") – The value in the points DataFrame that indicates background/unassigned transcripts.

  • tables_key (str, default="table") – The key in the SpatialData tables attribute that contains the expression table.

  • tables_cell_id_key (str, default="cell_id") – The column name in the tables DataFrame that contains cell IDs.

  • tables_gene_key (str or None, default=None) – Column in sdata.tables[tables_key].var containing gene identifiers. If None, sdata.tables[tables_key].var_names are used.

  • recompute_expression (bool, default=True) – Whether to recompute the expression matrix after filtering. Note that this can be computationally expensive for large datasets.

  • inplace (bool, default=True) – Whether to modify the SpatialData object in place. Defaults to True.

Returns:

The updated SpatialData object with invalid transcripts marked (in an extra column).

Return type:

sd.SpatialData
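The interaction of min_qv, control_prefixes and control_genes can be sketched in plain Python. This is only an illustration of the rules described above, not the library's implementation: the helper is_valid_transcript is hypothetical, and treating a transcript with qv exactly equal to min_qv as valid is an assumption.

```python
# Prefixes copied from the default in the method signature.
CONTROL_PREFIXES = (
    "NegControlProbe_", "antisense_", "NegControlCodeword", "BLANK_",
    "Blank-", "NegPrb", "DeprecatedCodeword_", "UnassignedCodeword_",
)


def is_valid_transcript(gene, qv, min_qv=20.0,
                        control_prefixes=CONTROL_PREFIXES, control_genes=()):
    """Hypothetical helper: True if the transcript survives all three filters."""
    # Quality filter; skipped entirely when min_qv is None.
    if min_qv is not None and qv < min_qv:
        return False
    # Prefix filter: any gene name starting with a control prefix is dropped.
    if gene.startswith(tuple(control_prefixes)):
        return False
    # Exact-match filter for additional control genes.
    if gene in control_genes:
        return False
    return True


transcripts = [
    ("ACTB", 35.0),               # kept
    ("NegControlProbe_1", 40.0),  # dropped: control prefix
    ("CD3E", 12.0),               # dropped: qv below min_qv
    ("GAPDH", 30.0),              # dropped: listed in control_genes
]
valid = [is_valid_transcript(g, q, control_genes=("GAPDH",)) for g, q in transcripts]
```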

markers_from_reference(adata: AnnData, ref_cell_type: str, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = 'raw', mode: str = 'de', max_fpr: float | None = None, auc_pos_thresh: float = 0.9, method: str = 'wilcoxon', pval_adj_thresh: float = 0.05, logfc_pos_thresh: float = 1.0, vote_fraction_pos: float = 0.5, min_pos_frac: float = 0.1, max_neg_frac: float = 0.05, t_pos: float = 0.25, t_neg: float = 1.0, min_cells_per_celltype: int = 10, n_jobs: int = 1)

Compute positive and negative markers per cell type using pairwise contrasts (AUC/pAUC or DE) followed by voting and a rarity-based definition of negative markers.

Positive markers: For each cell type c, a gene g is considered a positive marker if it is “up in c” in at least ceil(vote_fraction_pos * M_c) of its valid pairwise comparisons (M_c). Additionally, g must be expressed (> 0) in at least min_pos_frac fraction of cells of type c in the reference dataset.
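The voting rule for positive markers can be sketched as follows. The function name, its inputs (precomputed vote counts and expression fractions) and the example genes are hypothetical; only the ceil(vote_fraction_pos * M_c) threshold and the min_pos_frac check are taken from the description above.

```python
import math


def positive_markers(up_votes, n_valid_contrasts, expr_frac,
                     vote_fraction_pos=0.5, min_pos_frac=0.1):
    """Hypothetical sketch of the positive-marker vote for one cell type c.

    up_votes: gene -> number of valid pairwise contrasts in which the gene is up in c
    n_valid_contrasts: M_c, the number of valid pairwise contrasts for c
    expr_frac: gene -> fraction of cells of type c expressing the gene (> 0)
    """
    needed = math.ceil(vote_fraction_pos * n_valid_contrasts)
    return sorted(
        gene
        for gene, votes in up_votes.items()
        if votes >= needed and expr_frac.get(gene, 0.0) >= min_pos_frac
    )


# With M_c = 5 and vote_fraction_pos = 0.5, a gene needs ceil(2.5) = 3 votes.
up_votes = {"CD3E": 4, "MS4A1": 2, "PTPRC": 3}
expr_frac = {"CD3E": 0.6, "MS4A1": 0.5, "PTPRC": 0.05}
markers = positive_markers(up_votes, n_valid_contrasts=5, expr_frac=expr_frac)
```

Here MS4A1 fails the vote and PTPRC passes the vote but fails the min_pos_frac check, so only CD3E remains.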

Negative markers: For each ordered pair (a, b) of cell types, take genes up in a vs b and consider them negative-marker candidates for b if (1) they are expressed (> 0) in at most max_neg_frac fraction of cells of type b, and (2) they are not up in b vs any cell type (computed across all ordered contrasts).

Overlap filtering: Overlap filtering is applied separately to positive and negative markers:

  • Positive lists: genes appearing in ≥ t_pos * n_types lists are dropped.

  • Negative lists: genes appearing in ≥ t_neg * n_types lists are dropped.
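A minimal sketch of the overlap filter, read literally from the two rules above: a gene is dropped from every list once it appears in at least threshold * n_types lists. The function name and marker lists are hypothetical, and the example uses a threshold of 0.5 rather than the t_pos default so that the small four-type case stays readable.

```python
from collections import Counter


def drop_overlapping(marker_lists, threshold):
    """Hypothetical sketch: drop genes appearing in >= threshold * n_types lists."""
    n_types = len(marker_lists)
    # Count in how many per-cell-type lists each gene occurs.
    counts = Counter(g for genes in marker_lists.values() for g in set(genes))
    cutoff = threshold * n_types
    return {
        cell_type: [g for g in genes if counts[g] < cutoff]
        for cell_type, genes in marker_lists.items()
    }


positive = {
    "T": ["CD3E", "PTPRC"],
    "B": ["MS4A1", "PTPRC"],
    "Mono": ["LYZ", "PTPRC"],
    "NK": ["NKG7"],
}
# cutoff = 0.5 * 4 = 2 lists; PTPRC appears in 3 lists and is removed everywhere.
filtered = drop_overlapping(positive, threshold=0.5)
```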

Parameters:
  • adata (AnnData) – Reference single-cell dataset (cells x genes).

  • ref_cell_type (str) – Column in adata.obs containing cell type labels.

  • ref_gene_key (str or None, default=None) – Column in adata.var containing gene identifiers. If None, adata.var_names are used.

  • ref_raw_counts_layer (str or None, default="raw") – Layer containing raw counts. If None, raw counts are expected in adata.X.

  • mode ({"auc", "de"}, optional (default: "de")) –

    • “auc”: compute markers using pairwise AUC/pAUC.

    • “de”: compute markers using pairwise DE.

  • max_fpr (float or None, optional (default: None)) – (AUC mode only) If None, compute full AUC. If in (0, 1], compute standardized pAUC over [0, max_fpr] using sklearn’s roc_auc_score(max_fpr=max_fpr).

  • auc_pos_thresh (float, optional (default: 0.9)) – (AUC mode only) Minimum AUC/pAUC for a gene to be considered “up in c_i vs c_j”.

  • method (str, optional (default: "wilcoxon")) – (DE mode only) DE method passed to sc.tl.rank_genes_groups (“wilcoxon”, “t-test”, “logreg”, …).

  • pval_adj_thresh (float, optional (default: 0.05)) – (DE mode only) FDR (adjusted p-value) cutoff for positive markers.

  • logfc_pos_thresh (float, optional (default: 1.0)) – (DE mode only) Minimum log fold-change for positive markers (c > d).

  • vote_fraction_pos (float, optional (default: 0.5)) – Fraction of valid pairwise contrasts in which a gene must be “up in c” (AUC mode) / significantly up in c (DE mode) to be called a positive marker of c.

  • min_pos_frac (float, optional (default: 0.1)) – Minimum fraction of cells of type c in which a gene must be expressed (counts > 0) in the reference dataset to be considered a positive marker of c.

  • max_neg_frac (float, optional (default: 0.05)) – Maximum fraction of cells of type c in which a gene may be expressed (counts > 0) in the reference dataset to be considered a negative marker of c.

  • t_pos (float, optional (default: 0.25)) – Overlap filter threshold for positive markers.

  • t_neg (float, optional (default: 1.0)) – Overlap filter threshold for negative markers.

  • min_cells_per_celltype (int, optional (default: 10)) – Minimum number of cells required per cell type to be included in pairwise computations.

  • n_jobs (int, optional (default: 1)) – Number of parallel jobs for running pairwise computations.

Returns:

A dictionary mapping each cell type to its positive and negative markers: {cell_type: {“positive”: [genes], “negative”: [genes]}}

Return type:

dict

run_baseline(inplace: bool = True, *, morphological_kwargs: dict | None = None)

Run baseline (bl) metrics.

Convenience wrapper around global and per-cell summary metrics. Runs, in order:

  1. number of cells

  2. number of transcripts

  3. number of genes

  4. % unassigned transcripts

  5. % unassigned transcripts per gene

  6. transcripts per cell

  7. genes per cell

  8. mean transcripts per detected gene per cell

  9. morphological features

  10. transcript density

Parameters:
  • inplace (bool, default=True) – If True, results are merged into .uns, .obs, and/or .var as implemented by each metric, and None is returned. If False, per-metric results are returned in a dict.

  • morphological_kwargs (dict or None, optional) – Extra arguments forwarded to bl.morphological_features().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

  • “num_cells”

  • “num_transcripts”

  • “num_genes”

  • “perc_unassigned_transcripts”

  • “perc_unassigned_transcripts_per_gene”

  • “transcripts_per_cell”

  • “genes_per_cell”

  • “mean_transcripts_per_gene_per_cell”

  • “morphological_features”

  • “transcript_density”

Return type:

None or dict
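To make a few of the baseline quantities concrete, the sketch below computes them from a plain list of (gene, cell_id) pairs. It is an illustration only: the function name is hypothetical, the result keys merely mirror some of the names listed above, and counting unassigned transcripts in num_transcripts is an assumption about the definition.

```python
from collections import Counter, defaultdict


def baseline_summary(transcripts, background_id="UNASSIGNED"):
    """Hypothetical sketch of a few baseline metrics from (gene, cell_id) pairs."""
    transcripts = list(transcripts)
    assigned = [(g, c) for g, c in transcripts if c != background_id]

    tx_per_cell = Counter(c for _, c in assigned)
    genes_by_cell = defaultdict(set)
    for gene, cell in assigned:
        genes_by_cell[cell].add(gene)

    n_total = len(transcripts)
    n_unassigned = n_total - len(assigned)
    return {
        "num_cells": len(tx_per_cell),
        "num_transcripts": n_total,  # assumption: includes unassigned transcripts
        "num_genes": len({g for g, _ in transcripts}),
        "perc_unassigned_transcripts": 100.0 * n_unassigned / n_total if n_total else 0.0,
        "transcripts_per_cell": dict(tx_per_cell),
        "genes_per_cell": {c: len(gs) for c, gs in genes_by_cell.items()},
    }


summary = baseline_summary([
    ("A", "c1"), ("A", "c1"), ("B", "c1"),
    ("A", "c2"),
    ("C", "UNASSIGNED"),
])
```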

run_clustering_stability(key_prefix: str = 'leiden_subset', use_hvg: bool = False, inplace: bool = True, connectedness_kwargs: dict | None = None, silhouette_kwargs: dict | None = None, purity_kwargs: dict | None = None, ari_kwargs: dict | None = None, leiden_kwargs: dict | None = None)

Run clustering-stability metrics.

This method is a convenience wrapper around the clustering-stability (cs) functions. It runs, in order:

  1. cluster connectedness

  2. silhouette score

  3. purity (subset stability)

  4. ARI (subset stability)

Only parameters shared by all four computations are exposed explicitly. All other parameters are provided via method-specific *_kwargs dictionaries.

Parameters:
  • key_prefix (str, default="leiden_subset") – Prefix for Leiden clustering labels written to .obs by the underlying methods (where applicable).

  • use_hvg (bool, optional) – Whether to use highly variable genes (HVGs) for PCA. By default False.

  • inplace (bool, default=True) – If True, metrics are written to sdata.tables[“table”].uns by the underlying methods and this function returns None. If False, the computed metrics are returned as a dictionary.

  • connectedness_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_cluster_connectedness().

  • silhouette_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_silhouette_score().

  • purity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_purity().

  • ari_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_ari().

  • leiden_kwargs (dict or None, optional) – Additional keyword arguments forwarded to Leiden clustering in all underlying methods that perform clustering. For example, flavor="igraph" can be used to specify the Leiden implementation.

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

  • “cluster_connectedness” : float

  • “silhouette_score” : float

  • “mean_purity” : float

  • “mean_ari” : float

Return type:

None or dict
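For readers unfamiliar with the adjusted Rand index behind “mean_ari”, here is a self-contained sketch of the standard ARI formula for two labelings of the same cells. How SegTraQ draws subsets and averages the scores is not specified here, so this shows only the pairwise comparison, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI between two clusterings of the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both labelings must cover the same items.")
    n = len(labels_a)

    # Pairs that fall together in the contingency table, its rows, and its columns.
    sum_ij = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())

    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Degenerate case (e.g. a single cluster in both labelings).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions score 1 regardless of how clusters are named, while the crossed labeling above scores below zero (approximately -0.5), meaning it agrees less than chance would.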

run_label_transfer(adata_ref: AnnData, ref_cell_type: str, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = 'raw', tx_min: float = 10.0, tx_max: float = 2000.0, gn_min: float = 5.0, gn_max: float = inf, cell_type_key: str = 'transferred_cell_type', use_hvg: bool = False, exclude_gene_prefixes: tuple[str, ...] = ('MT-', 'RPL', 'RPS'), inplace: bool = True)

Transfer cell type labels from a reference AnnData object to cells in a SpatialData table using Pearson correlation to reference mean expression profiles.

Raw counts are selected first, normalized with sc.pp.normalize_total, log-transformed with sc.pp.log1p, and then used for label transfer. If a raw-count layer is provided, it is used preferentially. Otherwise, .X is expected to contain raw counts.
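The idea of correlation-based label transfer can be sketched without any dependencies. This is a simplified illustration, not the library's code: the helper names are hypothetical, the fixed target_sum of 1e4 is an assumption (sc.pp.normalize_total uses the median of total counts when no target is given), and the reference profiles are built from toy count vectors rather than from per-type means of normalized cells.

```python
import math


def log_normalize(counts, target_sum=1e4):
    """Scale counts to a fixed total, then apply log1p (assumed normalization)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c * target_sum / total) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def transfer_label(cell_counts, reference_means):
    """Assign the reference cell type whose mean profile correlates best."""
    profile = log_normalize(cell_counts)
    scores = {ct: pearson(profile, mean) for ct, mean in reference_means.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy gene order: CD3E, MS4A1, LYZ. Profiles stand in for per-type mean expression.
reference_means = {
    "T cell": log_normalize([90, 5, 5]),
    "B cell": log_normalize([5, 90, 5]),
}
label, score = transfer_label([40, 2, 3], reference_means)
```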

Parameters:
  • sdata (SpatialData) – SpatialData object containing the query dataset. Cell-level expression data are expected in sdata.tables[tables_key].

  • adata_ref (AnnData) – Reference AnnData object containing annotated cells.

  • ref_cell_type (str) – Column in adata_ref.obs containing the reference cell type labels.

  • tables_raw_counts_layer (str or None, default=None) – Layer in sdata.tables[tables_key].layers containing raw counts for the query data. If None, raw counts are expected in sdata.tables[tables_key].X.

  • ref_raw_counts_layer (str or None, default="raw") – Layer in adata_ref.layers containing raw counts for the reference data. If None, raw counts are expected in adata_ref.X.

  • tables_key (str, default="table") – Key identifying the cell-level AnnData table in sdata.tables.

  • tables_cell_id_key (str, default="cell_id") – Column in sdata.tables[tables_key].obs containing unique cell identifiers.

  • tables_gene_key (str or None, default=None) – Column in sdata.tables[tables_key].var containing gene identifiers. If None, sdata.tables[tables_key].var_names are used.

  • points_key (str, default="transcripts") – Key identifying the transcript-level points element in sdata.points.

  • points_cell_id_key (str, default="cell_id") – Column in the transcript points table containing cell identifiers.

  • points_gene_key (str, default="feature_name") – Column in the transcript points table containing gene names.

  • tx_min (float, default=10.0) – Minimum number of detected transcripts required for a cell to be retained.

  • tx_max (float, default=2000.0) – Maximum number of detected transcripts allowed for a cell to be retained.

  • gn_min (float, default=5.0) – Minimum number of detected genes required for a cell to be retained.

  • gn_max (float, default=np.inf) – Maximum number of detected genes allowed for a cell to be retained.

  • cell_type_key (str, default="transferred_cell_type") – Column name used to store transferred labels in the query table’s .obs.

  • ref_gene_key (str or None, default=None) – Column in adata_ref.var containing gene identifiers. If None, adata_ref.var_names are used.

  • use_hvg (bool, default=False) – If True, restrict label transfer to highly variable genes computed from the reference dataset.

  • exclude_gene_prefixes (tuple of str, default=("MT-", "RPL", "RPS")) – Gene prefixes to exclude from the HVG set before label transfer. Set to an empty tuple to disable this filtering.

  • inplace (bool, default=True) – If True, write transferred labels to sdata.tables[tables_key].obs[cell_type_key] and return None. If False, return a DataFrame with transferred labels and Pearson correlation scores.

Returns:

If inplace=False, returns a DataFrame with columns including tables_cell_id_key, cell_type_key, and “pearson_score”. If inplace=True, modifies sdata in place and returns None.

Return type:

pandas.DataFrame or None
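The tx_min, tx_max, gn_min and gn_max thresholds amount to a per-cell range check before label transfer. A minimal sketch, assuming inclusive bounds (the documentation does not say whether cells exactly at a threshold are kept) and using a hypothetical helper name and toy cell statistics:

```python
import math


def passes_qc(n_transcripts, n_genes,
              tx_min=10.0, tx_max=2000.0, gn_min=5.0, gn_max=math.inf):
    """Hypothetical check: keep a cell only if both counts fall within range."""
    return tx_min <= n_transcripts <= tx_max and gn_min <= n_genes <= gn_max


# cell_id -> (detected transcripts, detected genes)
cells = {
    "c1": (150, 40),   # kept
    "c2": (6, 4),      # too few transcripts and genes
    "c3": (5000, 300), # too many transcripts
    "c4": (25, 3),     # too few genes
}
retained = [cell_id for cell_id, (tx, gn) in cells.items() if passes_qc(tx, gn)]
```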

run_point_statistics(genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, inplace: bool = True, *, centroid_kwargs: dict | None = None, membrane_kwargs: dict | None = None, skew_kwargs: dict | None = None, compartments_kwargs: dict | None = None)

Run point-statistics (ps) metrics.

Convenience wrapper around point-level spatial statistics. Applies shared transcript and cell filtering (by gene(s) and cell type) and runs, in order:

  1. percentage of transcripts in compartments (nucleus overlap, cytoplasm, outside)

  2. distance to centroid (cell or nucleus)

  3. distance to membrane (cell or nucleus)

  4. membrane-distance skewness

Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.

Parameters:
  • genes (str | list[str] | None, optional) – Gene(s) to include. If None, all genes are used.

  • cell_type_key (str, default="transferred_cell_type") – Cell-type annotation key in sdata.tables[…].obs.

  • cell_type_query (str | list[str] | None, optional) – Restrict computations to cells matching these label(s).

  • inplace (bool, default=True) – If True, results are merged into .obs and None is returned. If False, per-metric results are returned.

  • centroid_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_centroid().

  • membrane_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_membrane().

  • skew_kwargs (dict or None, optional) – Extra arguments for ps.membrane_distance_skewness().

  • compartments_kwargs (dict or None, optional) – Extra arguments for ps.percentage_transcripts_in_compartments().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

  • “percentage_transcripts_in_compartments”

  • “distance_to_centroid”

  • “distance_to_membrane”

  • “membrane_distance_skewness”

Return type:

None or dict
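Two of the quantities named above, distance to centroid and skewness of a distance distribution, can be illustrated with the standard library alone. The helpers are hypothetical and use the plain Euclidean distance and the moment-based skewness (third central moment over the 1.5 power of the second); the exact definitions used by the ps module may differ.

```python
import math


def distances_to_centroid(points, centroid):
    """Euclidean distance of each (x, y) transcript position to a centroid."""
    cx, cy = centroid
    return [math.hypot(x - cx, y - cy) for x, y in points]


def skewness(values):
    """Moment-based skewness m3 / m2**1.5; 0.0 for a constant sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


dists = distances_to_centroid([(0, 0), (3, 4), (6, 8)], centroid=(0, 0))
symmetric = skewness(dists)             # evenly spread distances
right_tailed = skewness([1, 1, 1, 10])  # a single far-away transcript
```

A positive skewness indicates that most values are small with a long tail of large ones; a symmetric sample gives zero.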

run_region_similarity(n_jobs: int = -1, parallel_backend: str = 'threading', inplace: bool = True, iou_kwargs: dict = None, similarity_nucleus_cell_kwargs: dict = None, similarity_nucleus_cytoplasm_kwargs: dict = None, similarity_center_border_kwargs: dict = None, similarity_border_neighborhood_kwargs: dict = None, border_admixture_score_kwargs: dict = None)

Compute region similarity metrics and optionally merge them into the cell table.

This runs, in order:

  1. matching between each cell and its best-matching nucleus

  2. similarity between per-cell expression and its matched nucleus

  3. similarity between the cell’s nucleus-overlapping and cytoplasmic expression

  4. similarity between center and border expression

  5. similarity between border and neighborhood expression

  6. border admixture score

Returns:

If inplace=True, returns None after writing to sdata. If inplace=False, returns a dictionary of DataFrames.

Return type:

None or dict
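The first step, matching each cell to its best-matching nucleus, can be sketched with intersection over union (IoU). That IoU is the matching criterion is an assumption suggested by the iou_kwargs parameter name; the helpers below are hypothetical and use sets of pixel coordinates as a stand-in for boundary polygons.

```python
def iou(region_a, region_b):
    """Intersection over union of two regions given as sets of (x, y) pixels."""
    union = region_a | region_b
    return len(region_a & region_b) / len(union) if union else 0.0


def best_matching_nucleus(cell_pixels, nuclei):
    """Return (nucleus_id, iou) of the nucleus overlapping the cell most, or (None, 0.0)."""
    best_id, best_score = None, 0.0
    for nucleus_id, nucleus_pixels in nuclei.items():
        score = iou(cell_pixels, nucleus_pixels)
        if score > best_score:
            best_id, best_score = nucleus_id, score
    return best_id, best_score


# A 4 x 4 cell, one nucleus fully inside it, and one that barely touches it.
cell = {(x, y) for x in range(4) for y in range(4)}
nuclei = {
    "n1": {(1, 1), (1, 2), (2, 1), (2, 2)},
    "n2": {(3, 3), (4, 4)},
}
match_id, match_iou = best_matching_nucleus(cell, nuclei)
```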

run_supervised(*, adata_ref: AnnData | None = None, ref_cell_type: str | None = None, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = None, markers: dict[str, dict[str, list[str]]] | None = None, cell_type_key: str | None = None, inplace: bool = True, label_transfer_kwargs: dict | None = None, markers_from_reference_kwargs: dict | None = None, purity_kwargs: dict | None = None, contamination_kwargs: dict | None = None, mecr_kwargs: dict | None = None)

Run supervised (sp) metrics.

If markers is None, marker genes are generated from adata_ref using self.markers_from_reference() with ref_cell_type.

If cell_type_key is None, label transfer is run first via self.run_label_transfer() using adata_ref and ref_cell_type. The resulting labels are stored under “transferred_cell_type”.

Runs, in order:

  1. label transfer

  2. marker generation from reference

  3. marker_purity

  4. neighbor_contamination

  5. mutually_exclusive_coexpression_rate

Parameters:
  • adata_ref (AnnData or None, default=None) – Reference AnnData object used for label transfer and/or marker extraction. Required if cell_type_key=None or markers=None.

  • ref_cell_type (str or None, default=None) – Column in adata_ref.obs containing reference cell-type labels. Required if cell_type_key=None or markers=None.

  • ref_gene_key (str or None, default=None) – Column in adata_ref.var containing gene identifiers. If None, adata_ref.var_names are used.

  • ref_raw_counts_layer (str or None, default=None) – Layer containing raw counts. If None, raw counts are expected in adata_ref.X.

  • markers (dict or None, default=None) – Dictionary of marker genes in the form {cell_type: {“positive”: list[str], “negative”: list[str]}}. If None, markers are computed from adata_ref using self.markers_from_reference().

  • cell_type_key (str or None, default=None) – Column in the query AnnData .obs with cell-type labels. If None, label transfer is run first using adata_ref and ref_cell_type, and the transferred labels are stored under “transferred_cell_type”.

  • inplace (bool, default=True) – If True, writes results into .obs and/or .uns as implemented by the underlying functions and returns None. If False, returns all results as a dictionary.

  • label_transfer_kwargs (dict or None, optional) – Extra keyword arguments forwarded to self.run_label_transfer(). Do not include adata_ref or ref_cell_type here; pass them directly to run_supervised.

  • markers_from_reference_kwargs (dict or None, optional) – Extra keyword arguments forwarded to self.markers_from_reference(). Do not include adata or ref_cell_type here; pass them directly to run_supervised.

  • purity_kwargs (dict or None, optional) – Extra arguments for sp.marker_purity.

  • contamination_kwargs (dict or None, optional) – Extra arguments for sp.neighbor_contamination.

  • mecr_kwargs (dict or None, optional) – Extra arguments for sp.mutually_exclusive_coexpression_rate.

Returns:

If inplace=True, returns None. If inplace=False, returns a dictionary with keys “label_transfer”, “markers”, “marker_purity”, “neighbor_contamination”, and “mutually_exclusive_coexpression_rate”.

Return type:

None or dict
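The dependency between markers, cell_type_key and the reference arguments can be summarized in a small sketch. The function check_supervised_inputs is hypothetical and only mirrors the documented rules: the reference is required whenever markers or cell-type labels are missing, and the optional steps run only when needed.

```python
def check_supervised_inputs(adata_ref=None, ref_cell_type=None,
                            markers=None, cell_type_key=None):
    """Hypothetical sketch: validate inputs and list the steps that would run."""
    needs_reference = markers is None or cell_type_key is None
    if needs_reference and (adata_ref is None or ref_cell_type is None):
        raise ValueError(
            "adata_ref and ref_cell_type are required when markers or "
            "cell_type_key is None."
        )

    steps = []
    if cell_type_key is None:
        steps.append("label_transfer")
    if markers is None:
        steps.append("markers")
    steps += [
        "marker_purity",
        "neighbor_contamination",
        "mutually_exclusive_coexpression_rate",
    ]
    return steps


# Labels and markers supplied by the user: no reference needed.
user_markers = {"T cell": {"positive": ["CD3E"], "negative": ["MS4A1"]}}
steps_without_ref = check_supervised_inputs(
    markers=user_markers, cell_type_key="cell_type"
)

# Nothing supplied except a reference (a placeholder object stands in for AnnData).
steps_with_ref = check_supervised_inputs(adata_ref=object(), ref_cell_type="celltype")
```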

run_volume(*, adata_ref: AnnData | None = None, ref_cell_type: str | None = None, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = None, cell_type_key: str | None = None, inplace: bool = True, label_transfer_kwargs: dict[str, Any] | None = None, similarity_kwargs: dict[str, Any] | None = None, heterotypic_overlap_kwargs: dict[str, Any] | None = None, vsi_kwargs: dict[str, Any] | None = None)

Run volume-layer (vl) metrics.

Convenience wrapper around segtraq.vl functions via the instance facade self.vl.

If cell_type_key is provided, it is used for cell-type-aware metrics and to infer the number of ovrlpy components from sdata.tables[tables_key].obs[cell_type_key].

If cell_type_key is None and adata_ref is provided, label transfer is run first via self.run_label_transfer() using adata_ref and ref_cell_type. The resulting labels are stored under “transferred_cell_type”.

If neither cell_type_key nor adata_ref is provided, cell-type-aware metrics are skipped and ovrlpy is run with its default number of components.

A precomputed ovrlpy.Ovrlp object can be passed via vsi_kwargs={“ovrlp”: ovrlp}. In that case, ovrlpy is not run internally and n_components is not inferred.

Runs, in order:

  1. similarity_top_bottom

  2. label transfer, if needed and possible

  3. vertical_signal_integrity_per_cell

  4. fraction_heterotypic_overlap, only if cell-type labels and valid shapes are available

Parameters:
  • adata_ref (AnnData or None, default=None) – Reference AnnData object used for label transfer and/or to infer the number of ovrlpy components from reference cell-type labels.

  • ref_cell_type (str or None, default=None) – Column in adata_ref.obs containing reference cell-type labels. Required if cell_type_key=None and label transfer should be run, or if adata_ref is used to infer n_components.

  • ref_gene_key (str or None, default=None) – Column in adata_ref.var containing gene identifiers. If None, adata_ref.var_names are used.

  • ref_raw_counts_layer (str or None, default=None) – Layer containing raw counts. If None, raw counts are expected in adata_ref.X.

  • cell_type_key (str or None, default=None) – Column in sdata.tables[tables_key].obs containing cell-type labels. If provided, this column is used for fraction_heterotypic_overlap and to infer ovrlpy n_components. If None and adata_ref is provided, label transfer is run first and labels are stored under “transferred_cell_type”.

  • inplace (bool, default=True) – If True, writes results into sdata.tables[tables_key].obs as implemented by the underlying functions and returns None. If False, returns all results as a dictionary.

  • label_transfer_kwargs (dict or None, optional) – Extra keyword arguments forwarded to self.run_label_transfer(). Do not include adata_ref or ref_cell_type here; pass them directly to run_volume.

  • similarity_kwargs (dict or None, optional) – Extra keyword arguments forwarded to vl.similarity_top_bottom.

  • heterotypic_overlap_kwargs (dict or None, optional) – Extra keyword arguments forwarded to vl.fraction_heterotypic_overlap.

  • vsi_kwargs (dict or None, optional) – Extra keyword arguments forwarded to vl.vertical_signal_integrity_per_cell. To use a precomputed ovrlpy object, pass it here as {“ovrlp”: ovrlp}.

Returns:

If inplace=True, returns None.

If inplace=False, returns a dictionary with available metric results.

Return type:

None or dict[str, object]
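The three cases described for run_volume (labels given, labels transferred from a reference, or no labels at all) determine which steps run. The sketch below encodes that decision with a hypothetical helper plan_volume_steps; it does not call ovrlpy or SegTraQ and only returns the step names in the documented order.

```python
def plan_volume_steps(cell_type_key=None, has_adata_ref=False, has_valid_shapes=True):
    """Hypothetical sketch of which volume metrics would run, in order."""
    steps = ["similarity_top_bottom"]

    labels_available = cell_type_key is not None
    if not labels_available and has_adata_ref:
        # Labels are transferred and stored under "transferred_cell_type".
        steps.append("label_transfer")
        labels_available = True

    steps.append("vertical_signal_integrity_per_cell")

    # The cell-type-aware metric needs both labels and valid shapes.
    if labels_available and has_valid_shapes:
        steps.append("fraction_heterotypic_overlap")
    return steps


with_labels = plan_volume_steps(cell_type_key="cell_type")
with_reference = plan_volume_steps(has_adata_ref=True)
without_labels = plan_volume_steps()
```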