The SegTraQ (SegTraQ) class
The SegTraQ (SegTraQ) class represents the core interface for computing SegTraQ metrics.
- class segtraq.SegTraQ(sdata: SpatialData, images_key: str | None = 'morphology_focus', tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str | None = 'cell_area', tables_centroid_x_key: str | None = 'x_centroid', tables_centroid_y_key: str | None = 'y_centroid', tables_gene_key: str | None = None, tables_raw_counts_layer: str | None = None, points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int | None = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', points_z_key: str | None = 'z', points_gene_key: str = 'feature_name', shapes_key: str = 'cell_boundaries', shapes_cell_id_key: str = 'cell_id', nucleus_shapes_key: str | None = 'nucleus_boundaries', nucleus_shapes_cell_id_key: str = 'cell_id', filter_low_quality_transcripts: bool = True, filter_kwargs: dict | None = None)
Initialize a SegTraQ object, the core interface for computing SegTraQ metrics. Defaults target 10x Genomics Xenium; override keys for other technologies. By default, this removes low-quality and control transcripts that would otherwise skew metrics, but this can be configured via the filter_low_quality_transcripts and filter_kwargs arguments.
- Parameters:
sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).
images_key (str or None, optional, default="morphology_focus") – Key in sdata.images for a nuclear or morphology image (e.g., DAPI). Used for visualization or to derive a nucleus mask via segtraq.run_cellpose when using the nuclear correlation module (segtraq.nc). If None, no image is expected.
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
tables_area_key (str or None, optional, default="cell_area") – Column in the cell table with cell area (2D). If None, area will be computed via segtraq.bl.morphological_features.
tables_centroid_x_key (str or None, optional, default="x_centroid") – Column in the cell table with the x-coordinate of the cell centroid.
tables_centroid_y_key (str or None, optional, default="y_centroid") – Column in the cell table with the y-coordinate of the cell centroid.
tables_gene_key (str or None, default=None) – Column in sdata.tables[tables_key].var containing gene identifiers. If None, sdata.tables[tables_key].var_names are used.
tables_raw_counts_layer (str | None, optional) – Layer containing count data. If None, adata.X is used if it looks like counts. If a layer is specified, it must exist and contain count-like values.
points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.
points_background_id (str or int or None, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.
points_z_key (str or None, optional, default="z") – Column for the z-coordinate (3D data). If None, data are treated as 2D.
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.
shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.
shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed). If None, the index is assumed to contain cell IDs and renamed to “segtraq_id”.
nucleus_shapes_key (str or None, optional, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available. If None, a nucleus mask can be obtained via segtraq.run_cellpose.
nucleus_shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[nucleus_shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).
filter_low_quality_transcripts (bool, default=True) – Whether to filter out low-quality and control transcripts that would otherwise skew metrics.
filter_kwargs (dict or None, optional) – If filter_low_quality_transcripts is True, these keyword arguments are forwarded to the filtering function. Possible keys are: min_qv, control_genes, control_prefixes, inplace. Please refer to the function _filter_control_and_low_quality_transcripts for details.
Notes
After initializing a SegTraQ instance, all SegTraQ modules can be run directly from the object using its module facades.
Wrappers (run_baseline, run_nuclear_correlation, etc.) to run all metrics of a module are provided below.
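As a minimal sketch of overriding the Xenium defaults for another technology, the helper below merges overrides into the constructor defaults and rejects names that are not in the signature above. The parameter names and default values are copied from that signature; the helper name segtraq_kwargs and the override values ("gene", None) are hypothetical and chosen only for illustration.

```python
# Keyword parameters and Xenium defaults, copied from the constructor signature above.
XENIUM_DEFAULTS = {
    "images_key": "morphology_focus",
    "tables_key": "table",
    "tables_cell_id_key": "cell_id",
    "tables_area_key": "cell_area",
    "tables_centroid_x_key": "x_centroid",
    "tables_centroid_y_key": "y_centroid",
    "tables_gene_key": None,
    "tables_raw_counts_layer": None,
    "points_key": "transcripts",
    "points_cell_id_key": "cell_id",
    "points_background_id": "UNASSIGNED",
    "points_x_key": "x",
    "points_y_key": "y",
    "points_z_key": "z",
    "points_gene_key": "feature_name",
    "shapes_key": "cell_boundaries",
    "shapes_cell_id_key": "cell_id",
    "nucleus_shapes_key": "nucleus_boundaries",
    "nucleus_shapes_cell_id_key": "cell_id",
    "filter_low_quality_transcripts": True,
    "filter_kwargs": None,
}


def segtraq_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the defaults, rejecting unknown names."""
    unknown = sorted(set(overrides) - set(XENIUM_DEFAULTS))
    if unknown:
        raise TypeError(f"unknown SegTraQ argument(s): {unknown}")
    return {**XENIUM_DEFAULTS, **overrides}


# Hypothetical non-Xenium dataset: 2D points, a different gene column, no nucleus shapes.
kwargs = segtraq_kwargs(
    points_gene_key="gene",
    points_z_key=None,
    nucleus_shapes_key=None,
)

# With a SpatialData object `sdata` at hand, the call would then be:
#     st = segtraq.SegTraQ(sdata, **kwargs)
```

Catching a misspelled key before construction is a convenience of this sketch, not documented behavior of SegTraQ itself.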
- filter_cells(col: str, func: Callable, inplace: bool = True)
Filter cells from the cell table based on a user-defined function.
- Parameters:
col (str) – Column in the cell table to apply the filtering function on.
func (Callable) – A function that takes a single argument (the column value) and returns True if the cell should be kept, False otherwise.
inplace (bool, default=True) – If True, modifies self.sdata in place. If False, returns a new SpatialData object with the filtered cells.
- Returns:
If inplace=True: returns None after modifying self.sdata.
If inplace=False: returns a new SpatialData object with filtered cells.
- Return type:
None or SpatialData
Example
>>> st.filter_cells(col='cell_area', func=lambda x: x > 100)
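The predicate semantics can be sketched without SegTraQ: func is called once per value of col, and rows for which it returns True are kept. The toy table (a list of dicts) and the helper keep_rows are illustrative stand-ins, not the library's implementation.

```python
from typing import Any, Callable


def keep_rows(
    rows: list[dict[str, Any]], col: str, func: Callable[[Any], bool]
) -> list[dict[str, Any]]:
    """Keep the rows whose value in `col` makes `func` return True."""
    return [row for row in rows if func(row[col])]


# Toy cell table with one row per cell.
cells = [
    {"cell_id": "a", "cell_area": 80.0},
    {"cell_id": "b", "cell_area": 150.0},
    {"cell_id": "c", "cell_area": 100.0},
]

# Same predicate as in the example above: strictly greater than 100.
kept = keep_rows(cells, col="cell_area", func=lambda x: x > 100)
print([row["cell_id"] for row in kept])  # ['b']
```

Because the comparison is strict, the cell with an area of exactly 100 is dropped.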
- filter_control_and_low_quality_transcripts(min_qv: float = 20.0, control_prefixes: tuple | list = ('NegControlProbe_', 'antisense_', 'NegControlCodeword', 'BLANK_', 'Blank-', 'NegPrb', 'DeprecatedCodeword_', 'UnassignedCodeword_'), control_genes: tuple | list = (), recompute_expression: bool = True, inplace: bool = True)
Filter control and low-quality transcripts from the SpatialData object. By default (inplace=True), the object is modified in place.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing transcript data.
min_qv (float | None, default=20.0) – Minimum quality value (qv) threshold for transcripts to be considered valid. If None, no filtering is applied based on quality.
control_prefixes (tuple | list, default=("NegControlProbe_", "antisense_", "NegControlCodeword", "BLANK_", "Blank-", "NegPrb", "DeprecatedCodeword_", "UnassignedCodeword_", "Intergenic_Region_")) – Control prefixes to identify control probes in gene names. Transcripts with gene names starting with any of these prefixes will be considered control probes and filtered out.
control_genes (tuple | list, default=()) – Additional keywords to identify control probes in gene names. For these ones, exact matches will be filtered out (e.g. “GAPDH” or “ERCC-00002”), whereas for the control_prefixes, any gene name starting with the prefix will be filtered out (e.g. “NegControlProbe_1” or “NegControlProbe_2”).
points_key (str, default="transcripts") – The key in the SpatialData points attribute that contains transcript data.
points_gene_key (str, default="feature_name") – The column name in the points DataFrame that contains gene names.
points_cell_id_key (str, default="cell_id") – The column name in the points DataFrame that contains cell IDs.
points_background_id (str | int | None, default="UNASSIGNED") – The value in the points DataFrame that indicates background/unassigned transcripts.
tables_key (str, default="table") – The key in the SpatialData tables attribute that contains the expression table.
tables_cell_id_key (str, default="cell_id") – The column name in the tables DataFrame that contains cell IDs.
tables_gene_key (str or None, default=None) – Column in sdata.tables[tables_key].var containing gene identifiers. If None, sdata.tables[tables_key].var_names are used.
recompute_expression (bool, default=True) – Whether to recompute the expression matrix after filtering. Note that this can be computationally expensive for large datasets.
inplace (bool, default=True) – Whether to modify the SpatialData object in place. Defaults to True.
- Returns:
The updated SpatialData object with invalid transcripts marked (in an extra column).
- Return type:
sd.SpatialData
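The difference between prefix matching (control_prefixes) and exact matching (control_genes), together with the min_qv threshold, can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper is_invalid is hypothetical, only two of the default prefixes are used, and a transcript is assumed to be valid when its qv is greater than or equal to min_qv.

```python
def is_invalid(
    gene,
    qv,
    min_qv=20.0,
    control_prefixes=("NegControlProbe_", "BLANK_"),
    control_genes=(),
):
    """Return True if a transcript would be flagged as control or low quality."""
    # Prefix match: any gene name starting with one of the prefixes is a control.
    if gene.startswith(tuple(control_prefixes)):
        return True
    # Exact match: only the listed names themselves are controls.
    if gene in set(control_genes):
        return True
    # The quality filter is skipped entirely when min_qv is None.
    # Assumption: the threshold is inclusive, so qv == min_qv is still valid.
    return min_qv is not None and qv < min_qv


# Toy transcripts as (gene name, quality value) pairs.
transcripts = [
    ("ACTB", 35.0),               # kept
    ("ACTB", 12.0),               # low quality
    ("NegControlProbe_1", 40.0),  # control by prefix
    ("GAPDH", 40.0),              # control by exact name
    ("GAPDH2", 40.0),             # kept: exact matching ignores longer names
]

flags = [is_invalid(g, q, control_genes=("GAPDH",)) for g, q in transcripts]
print(flags)  # [False, True, True, True, False]
```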
- markers_from_reference(adata: AnnData, ref_cell_type: str, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = 'raw', mode: str = 'de', max_fpr: float | None = None, auc_pos_thresh: float = 0.9, method: str = 'wilcoxon', pval_adj_thresh: float = 0.05, logfc_pos_thresh: float = 1.0, vote_fraction_pos: float = 0.5, min_pos_frac: float = 0.1, max_neg_frac: float = 0.05, t_pos: float = 0.25, t_neg: float = 1.0, min_cells_per_celltype: int = 10, n_jobs: int = 1)
Compute positive and negative markers per cell type using pairwise contrasts (AUC/pAUC or DE) followed by voting and a rarity-based definition of negative markers.
Positive markers: For each cell type c, a gene g is considered a positive marker if it is “up in c” in at least ceil(vote_fraction_pos * M_c) of its valid pairwise comparisons (M_c). Additionally, g must be expressed (> 0) in at least min_pos_frac fraction of cells of type c in the reference dataset.
Negative markers: For each ordered pair (a, b) of cell types, take genes up in a vs b and consider them negative-marker candidates for b if (1.) they are expressed (> 0) in at most max_neg_frac fraction of cells of type b, and (2.) are not up in b vs any cell type (computed across all ordered contrasts).
Overlap filtering: Overlap filtering is applied separately to positive and negative markers:
Positive lists: genes appearing in ≥ t_pos * n_types lists are dropped.
Negative lists: genes appearing in ≥ t_neg * n_types lists are dropped.
- Parameters:
adata (AnnData) – Reference single-cell dataset (cells x genes).
ref_cell_type (str) – Column in adata.obs containing cell type labels.
ref_gene_key (str or None, default=None) – Column in adata.var containing gene identifiers. If None, adata.var_names are used.
ref_raw_counts_layer (str or None, default="raw") – Layer containing raw counts. If None, raw counts are expected in adata.X.
mode ({"auc", "de"}, optional (default: "de")) –
“auc”: compute markers using pairwise AUC/pAUC.
“de”: compute markers using pairwise DE.
max_fpr (float or None, optional (default: None)) – (AUC mode only) If None, compute full AUC. If in (0, 1], compute standardized pAUC over [0, max_fpr] using sklearn’s roc_auc_score(max_fpr=max_fpr).
auc_pos_thresh (float, optional (default: 0.9)) – (AUC mode only) Minimum AUC/pAUC for a gene to be considered “up in c_i vs c_j”.
method (str, optional (default: "wilcoxon")) – (DE mode only) DE method passed to sc.tl.rank_genes_groups (“wilcoxon”, “t-test”, “logreg”, …).
pval_adj_thresh (float, optional (default: 0.05)) – (DE mode only) FDR (adjusted p-value) cutoff for positive markers.
logfc_pos_thresh (float, optional (default: 1.0)) – (DE mode only) Minimum log fold-change for positive markers (c > d).
vote_fraction_pos (float, optional (default: 0.5)) – Fraction of valid pairwise contrasts in which a gene must be “up in c” (AUC mode) / significantly up in c (DE mode) to be called a positive marker of c.
min_pos_frac (float, optional (default: 0.1)) – Minimum fraction of cells of type c in which a gene must be expressed (counts > 0) in the reference dataset to be considered a positive marker of c.
max_neg_frac (float, optional (default: 0.05)) – Maximum fraction of cells of type c in which a gene may be expressed (counts > 0) in the reference dataset to be considered a negative marker of c.
t_pos (float, optional (default: 0.25)) – Overlap filter threshold for positive markers.
t_neg (float, optional (default: 1.0)) – Overlap filter threshold for negative markers.
min_cells_per_celltype (int, optional (default: 10)) – Minimum number of cells required per cell type to be included in pairwise computations.
n_jobs (int, optional (default: 1)) – Number of parallel jobs for running pairwise computations.
- Returns:
A dictionary mapping each cell type to its positive and negative markers: {cell_type: {“positive”: [genes], “negative”: [genes]}}
- Return type:
dict
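The voting rule for positive markers and the overlap filter described for markers_from_reference can be sketched as follows. The helpers vote_positive and overlap_filter, the gene names, and the per-contrast counts are made up for illustration. The expression-fraction checks (min_pos_frac, max_neg_frac) and the pairwise AUC/DE step are left out, and the overlap threshold of 0.5 is chosen so that the small example is informative.

```python
import math
from collections import Counter


def vote_positive(up_counts, n_valid_contrasts, vote_fraction_pos=0.5):
    """Genes that are 'up' in at least ceil(vote_fraction_pos * M_c) contrasts."""
    needed = math.ceil(vote_fraction_pos * n_valid_contrasts)
    return sorted(g for g, n_up in up_counts.items() if n_up >= needed)


def overlap_filter(marker_lists, threshold):
    """Drop genes that appear in >= threshold * n_types of the per-type lists."""
    n_types = len(marker_lists)
    appearances = Counter(g for genes in marker_lists.values() for g in set(genes))
    cutoff = threshold * n_types
    return {
        ct: [g for g in genes if appearances[g] < cutoff]
        for ct, genes in marker_lists.items()
    }


# Cell type "T" has M_c = 3 valid contrasts, so ceil(0.5 * 3) = 2 votes are needed.
t_markers = vote_positive({"CD3E": 3, "PTPRC": 2, "ACTB": 1}, n_valid_contrasts=3)
print(t_markers)  # ['CD3E', 'PTPRC']

positive = {
    "T": t_markers,
    "B": ["MS4A1", "PTPRC"],
    "Mono": ["LYZ"],
    "Epi": ["EPCAM"],
}
# With 4 types and threshold 0.5, genes found in 2 or more lists are dropped.
filtered = overlap_filter(positive, threshold=0.5)
print(filtered["T"], filtered["B"])  # ['CD3E'] ['MS4A1']
```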
- run_baseline(inplace: bool = True, *, morphological_kwargs: dict | None = None)
Run baseline (bl) metrics.
Convenience wrapper around global and per-cell summary metrics. Runs, in order:
number of cells
number of transcripts
number of genes
% unassigned transcripts
% unassigned transcripts per gene
transcripts per cell
genes per cell
mean transcripts per detected gene per cell
morphological features
transcript density
- Parameters:
inplace (bool, default=True) – If True, results are merged into .uns, .obs, and/or .var as implemented by each metric, and None is returned. If False, per-metric results are returned in a dict.
morphological_kwargs (dict or None, optional) – Extra arguments forwarded to bl.morphological_features().
- Returns:
If inplace=True, returns None. If inplace=False, returns a dict with keys:
“num_cells”
“num_transcripts”
“num_genes”
“perc_unassigned_transcripts”
“perc_unassigned_transcripts_per_gene”
“transcripts_per_cell”
“genes_per_cell”
“mean_transcripts_per_gene_per_cell”
“morphological_features”
“transcript_density”
- Return type:
None or dict
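To make a few of the baseline quantities concrete, the sketch below derives them from a toy transcript list. It only illustrates what the names suggest and is not how SegTraQ computes them: the toy data are invented, cells are counted from transcript assignments rather than from the cell table, the unassigned share is assumed to be reported as a percentage, and only a subset of the result keys is shown.

```python
from collections import defaultdict

# Toy transcript table as (gene, cell_id); "UNASSIGNED" marks background.
transcripts = [
    ("ACTB", "c1"), ("ACTB", "c1"), ("CD3E", "c1"),
    ("ACTB", "c2"),
    ("CD3E", "UNASSIGNED"), ("ACTB", "UNASSIGNED"),
]
background_id = "UNASSIGNED"

# Share of transcripts that are not assigned to any cell.
n_unassigned = sum(1 for _, cell in transcripts if cell == background_id)
perc_unassigned = 100.0 * n_unassigned / len(transcripts)

# Group the assigned transcripts by cell.
genes_by_cell = defaultdict(list)
for gene, cell in transcripts:
    if cell != background_id:
        genes_by_cell[cell].append(gene)

result = {
    "num_cells": len(genes_by_cell),
    "num_transcripts": len(transcripts),
    "num_genes": len({gene for gene, _ in transcripts}),
    "perc_unassigned_transcripts": perc_unassigned,
    "transcripts_per_cell": {c: len(g) for c, g in genes_by_cell.items()},
    "genes_per_cell": {c: len(set(g)) for c, g in genes_by_cell.items()},
}
print(result["transcripts_per_cell"])  # {'c1': 3, 'c2': 1}
```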
- run_clustering_stability(key_prefix: str = 'leiden_subset', use_hvg: bool = False, inplace: bool = True, connectedness_kwargs: dict | None = None, silhouette_kwargs: dict | None = None, purity_kwargs: dict | None = None, ari_kwargs: dict | None = None, leiden_kwargs: dict | None = None)
Run clustering-stability metrics.
This method is a convenience wrapper around the clustering-stability (cs) functions. It runs, in order:
cluster connectedness
silhouette score
purity (subset stability)
ARI (subset stability)
Only parameters shared by all four computations are exposed explicitly. All other parameters are provided via method-specific *_kwargs dictionaries.
- Parameters:
key_prefix (str, default="leiden_subset") – Prefix for Leiden clustering labels written to .obs by the underlying methods (where applicable).
use_hvg (bool, default=False) – Whether to use highly variable genes (HVGs) for PCA.
inplace (bool, default=True) – If True, metrics are written to sdata.tables[“table”].uns by the underlying methods and this function returns None. If False, the computed metrics are returned as a dictionary.
connectedness_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_cluster_connectedness().
silhouette_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_silhouette_score().
purity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_purity().
ari_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_ari().
leiden_kwargs (dict or None, optional) – Additional keyword arguments forwarded to Leiden clustering in all underlying methods that perform clustering. For example, flavor='igraph' can be used to specify the Leiden implementation.
- Returns:
If inplace=True, returns None. If inplace=False, returns a dict with keys:
“cluster_connectedness”: float
“silhouette_score”: float
“mean_purity”: float
“mean_ari”: float
- Return type:
None or dict
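For readers unfamiliar with the ARI reported under "mean_ari", the sketch below computes the adjusted Rand index between two labelings of the same cells using the standard pair-counting formula. It is a plain-Python illustration only. How SegTraQ subsets cells, runs Leiden clustering, and averages the scores is not covered here, and the function name adjusted_rand_index is introduced for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of items that share a cluster in both labelings, in A, and in B.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = 0.5 * (pairs_a + pairs_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (pairs_joint - expected) / (max_index - expected)


# Identical partitions with swapped label names agree perfectly.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# Partitions that split every cluster of the other score below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # approximately -0.5
```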
- run_label_transfer(adata_ref: AnnData, ref_cell_type: str, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = 'raw', tx_min: float = 10.0, tx_max: float = 2000.0, gn_min: float = 5.0, gn_max: float = inf, cell_type_key: str = 'transferred_cell_type', use_hvg: bool = False, exclude_gene_prefixes: tuple[str, ...] = ('MT-', 'RPL', 'RPS'), inplace: bool = True)
Transfer cell type labels from a reference AnnData object to cells in a SpatialData table using Pearson correlation to reference mean expression profiles.
Raw counts are selected first, normalized with sc.pp.normalize_total, log-transformed with sc.pp.log1p, and then used for label transfer. If a raw-count layer is provided, it is used preferentially. Otherwise, .X is expected to contain raw counts.
- Parameters:
sdata (SpatialData) – SpatialData object containing the query dataset. Cell-level expression data are expected in sdata.tables[tables_key].
adata_ref (AnnData) – Reference AnnData object containing annotated cells.
ref_cell_type (str) – Column in adata_ref.obs containing the reference cell type labels.
tables_raw_counts_layer (str or None, default=None) – Layer in sdata.tables[tables_key].layers containing raw counts for the query data. If None, raw counts are expected in sdata.tables[tables_key].X.
ref_raw_counts_layer (str or None, default="raw") – Layer in adata_ref.layers containing raw counts for the reference data. If None, raw counts are expected in adata_ref.X.
tables_key (str, default="table") – Key identifying the cell-level AnnData table in sdata.tables.
tables_cell_id_key (str, default="cell_id") – Column in sdata.tables[tables_key].obs containing unique cell identifiers.
tables_gene_key (str or None, default=None) – Column in sdata.tables[tables_key].var containing gene identifiers. If None, sdata.tables[tables_key].var_names are used.
points_key (str, default="transcripts") – Key identifying the transcript-level points element in sdata.points.
points_cell_id_key (str, default="cell_id") – Column in the transcript points table containing cell identifiers.
points_gene_key (str, default="feature_name") – Column in the transcript points table containing gene names.
tx_min (float, default=10.0) – Minimum number of detected transcripts required for a cell to be retained.
tx_max (float, default=2000.0) – Maximum number of detected transcripts allowed for a cell to be retained.
gn_min (float, default=5.0) – Minimum number of detected genes required for a cell to be retained.
gn_max (float, default=np.inf) – Maximum number of detected genes allowed for a cell to be retained.
cell_type_key (str, default="transferred_cell_type") – Column name used to store transferred labels in the query table’s .obs.
ref_gene_key (str or None, default=None) – Column in adata_ref.var containing gene identifiers. If None, adata_ref.var_names are used.
use_hvg (bool, default=False) – If True, restrict label transfer to highly variable genes computed from the reference dataset.
exclude_gene_prefixes (tuple of str, default=("MT-", "RPL", "RPS")) – Gene prefixes to exclude from the HVG set before label transfer. Set to an empty tuple to disable this filtering.
inplace (bool, default=True) – If True, write transferred labels to sdata.tables[tables_key].obs[cell_type_key] and return None. If False, return a DataFrame with transferred labels and Pearson correlation scores.
- Returns:
If inplace=False, returns a DataFrame with columns including tables_cell_id_key, cell_type_key, and “pearson_score”. If inplace=True, modifies sdata in place and returns None.
- Return type:
pandas.DataFrame or None
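The idea of correlation-based label transfer (normalize, log-transform, then assign each query cell the reference type whose mean profile has the highest Pearson correlation) can be sketched in plain Python. This is a simplified illustration, not SegTraQ's implementation: it does not call scanpy, uses a fixed target_sum for normalization, omits the tx_min/tx_max/gn_min/gn_max and HVG filtering, and assumes every cell has a non-zero total and non-constant expression. All helper names and counts are invented.

```python
import math


def normalize_log1p(counts, target_sum=100.0):
    """Scale a cell's counts to `target_sum`, then apply log(1 + x)."""
    total = sum(counts)
    return [math.log1p(c * target_sum / total) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_profile(cells):
    """Gene-wise mean over a list of per-cell expression vectors."""
    return [sum(col) / len(cells) for col in zip(*cells)]


def transfer_label(query_counts, reference_profiles):
    """Return (best cell type, Pearson score) for one query cell."""
    query = normalize_log1p(query_counts)
    scores = {ct: pearson(query, prof) for ct, prof in reference_profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy reference raw counts over the genes [CD3E, MS4A1, ACTB].
reference = {
    "T": [[9, 0, 5], [7, 1, 6]],
    "B": [[0, 8, 5], [1, 9, 4]],
}
profiles = {
    ct: mean_profile([normalize_log1p(cell) for cell in cells])
    for ct, cells in reference.items()
}

label, score = transfer_label([5, 0, 3], profiles)
print(label)  # T
```

The returned pair corresponds loosely to the cell_type_key and "pearson_score" columns described in the Returns field above.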
- run_point_statistics(genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, inplace: bool = True, *, centroid_kwargs: dict | None = None, membrane_kwargs: dict | None = None, skew_kwargs: dict | None = None, compartments_kwargs: dict | None = None)
Run point-statistics (ps) metrics.
Convenience wrapper around point-level spatial statistics. Applies shared transcript and cell filtering (by gene(s) and cell type) and runs, in order:
percentage of transcripts in compartments (nucleus overlap, cytoplasm, outside)
distance to centroid (cell or nucleus)
distance to membrane (cell or nucleus)
membrane-distance skewness
Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.
- Parameters:
genes (str | list[str] | None, optional) – Gene(s) to include. If None, all genes are used.
cell_type_key (str, default="transferred_cell_type") – Cell-type annotation key in sdata.tables[…].obs.
cell_type_query (str | list[str] | None, optional) – Restrict computations to cells matching these label(s).
inplace (bool, default=True) – If True, results are merged into .obs and None is returned. If False, per-metric results are returned.
centroid_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_centroid().
membrane_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_membrane().
skew_kwargs (dict or None, optional) – Extra arguments for ps.membrane_distance_skewness().
compartments_kwargs (dict or None, optional) – Extra arguments for ps.percentage_transcripts_in_compartments().
- Returns:
If inplace=True, returns None. If inplace=False, returns a dict with keys:
“percentage_transcripts_in_compartments”
“distance_to_centroid”
“distance_to_membrane”
“membrane_distance_skewness”
- Return type:
None or dict
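As a small illustration of a point-level statistic, the sketch below computes the mean Euclidean distance of each cell's transcripts to that cell's centroid. The coordinates are invented, the centroids stand in for the x_centroid and y_centroid columns of the cell table, and the per-cell mean is an assumption about how the distances might be summarized; SegTraQ's ps.distance_to_centroid() may aggregate differently.

```python
import math
from collections import defaultdict

# Toy transcripts as (cell_id, x, y).
transcripts = [
    ("c1", 0.0, 0.0),
    ("c1", 2.0, 0.0),
    ("c1", 1.0, 3.0),
    ("c2", 5.0, 5.0),
]
# Toy centroids, standing in for the x_centroid / y_centroid table columns.
centroids = {"c1": (1.0, 1.0), "c2": (5.0, 5.0)}

# Collect the distance of every transcript to the centroid of its cell.
distances = defaultdict(list)
for cell, x, y in transcripts:
    distances[cell].append(math.dist((x, y), centroids[cell]))

# Summarize per cell with the mean distance.
mean_distance = {cell: sum(d) / len(d) for cell, d in distances.items()}
print(round(mean_distance["c1"], 3))  # 1.609
```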
- run_region_similarity(n_jobs: int = -1, parallel_backend: str = 'threading', inplace: bool = True, iou_kwargs: dict = None, similarity_nucleus_cell_kwargs: dict = None, similarity_nucleus_cytoplasm_kwargs: dict = None, similarity_center_border_kwargs: dict = None, similarity_border_neighborhood_kwargs: dict = None, border_admixture_score_kwargs: dict = None)
Compute region similarity metrics and optionally merge them into the cell table.
This runs, in order:
1) matching between each cell and its best-matching nucleus
2) similarity between per-cell expression and its matched nucleus
3) similarity between the cell’s nucleus-overlapping and cytoplasmic expression
4) similarity between center and border expression
5) similarity between border and neighborhood expression
6) border admixture score
- Returns:
If inplace=True, returns None after writing to sdata. If inplace=False, returns a dictionary of DataFrames.
- Return type:
None or dict
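The first step, matching each cell to its best-matching nucleus, can be pictured as picking the nucleus with the highest intersection over union (IoU). The sketch below uses axis-aligned bounding boxes as a stand-in for the polygon geometries SegTraQ works with. The helper names and coordinates are invented, and treating IoU as the matching score is an assumption suggested by the iou_kwargs parameter.

```python
def box_area(box):
    """Area of an axis-aligned box given as (xmin, ymin, xmax, ymax)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    intersection = max(width, 0.0) * max(height, 0.0)
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


def best_match(cell_box, nuclei):
    """Return (nucleus id, IoU) of the nucleus overlapping the cell most."""
    scores = {nid: box_iou(cell_box, nbox) for nid, nbox in nuclei.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


cells = {"c1": (0.0, 0.0, 4.0, 4.0), "c2": (4.0, 0.0, 8.0, 4.0)}
nuclei = {"n1": (1.0, 1.0, 3.0, 3.0), "n2": (5.0, 1.0, 9.0, 3.0)}

matches = {cid: best_match(box, nuclei) for cid, box in cells.items()}
print(matches["c1"])  # ('n1', 0.25)
```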
- run_supervised(*, adata_ref: AnnData | None = None, ref_cell_type: str | None = None, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = None, markers: dict[str, dict[str, list[str]]] | None = None, cell_type_key: str | None = None, inplace: bool = True, label_transfer_kwargs: dict | None = None, markers_from_reference_kwargs: dict | None = None, purity_kwargs: dict | None = None, contamination_kwargs: dict | None = None, mecr_kwargs: dict | None = None)
Run supervised (sp) metrics.
If markers is None, marker genes are generated from adata_ref using self.markers_from_reference() with ref_cell_type.
If cell_type_key is None, label transfer is run first via self.run_label_transfer() using adata_ref and ref_cell_type. The resulting labels are stored under “transferred_cell_type”.
Runs, in order:
label transfer
marker generation from reference
marker_purity
neighbor_contamination
mutually_exclusive_coexpression_rate
- Parameters:
adata_ref (AnnData or None, default=None) – Reference AnnData object used for label transfer and/or marker extraction. Required if cell_type_key=None or markers=None.
ref_cell_type (str or None, default=None) – Column in adata_ref.obs containing reference cell-type labels. Required if cell_type_key=None or markers=None.
ref_gene_key (str or None, default=None) – Column in adata_ref.var containing gene identifiers. If None, adata_ref.var_names are used.
ref_raw_counts_layer (str or None, default=None) – Layer in adata_ref.layers containing raw counts. If None, raw counts are expected in adata_ref.X.
markers (dict or None, default=None) – Dictionary of marker genes in the form {cell_type: {“positive”: list[str], “negative”: list[str]}}. If None, markers are computed from adata_ref using self.markers_from_reference().
cell_type_key (str or None, default=None) – Column in the query AnnData .obs with cell-type labels. If None, label transfer is run first using adata_ref and ref_cell_type, and the transferred labels are stored under “transferred_cell_type”.
inplace (bool, default=True) – If True, writes results into .obs and/or .uns as implemented by the underlying functions and returns None. If False, returns all results as a dictionary.
label_transfer_kwargs (dict or None, optional) – Extra keyword arguments forwarded to self.run_label_transfer(). Do not include adata_ref or ref_cell_type here; pass them directly to run_supervised.
markers_from_reference_kwargs (dict or None, optional) – Extra keyword arguments forwarded to self.markers_from_reference(). Do not include adata or ref_cell_type here; pass them directly to run_supervised.
purity_kwargs (dict or None, optional) – Extra arguments for sp.marker_purity.
contamination_kwargs (dict or None, optional) – Extra arguments for sp.neighbor_contamination.
mecr_kwargs (dict or None, optional) – Extra arguments for sp.mutually_exclusive_coexpression_rate.
- Returns:
If inplace=True, returns None. If inplace=False, returns a dictionary with keys “label_transfer”, “markers”, “marker_purity”, “neighbor_contamination”, and “mutually_exclusive_coexpression_rate”.
- Return type:
None or dict
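The input requirements of run_supervised (a reference is needed whenever labels or markers have to be derived) can be summarized as a small pre-flight check. The function preparatory_steps below is hypothetical and only restates the rules documented above; it is not part of SegTraQ, and the error SegTraQ raises on missing inputs may differ.

```python
def preparatory_steps(adata_ref=None, ref_cell_type=None, markers=None, cell_type_key=None):
    """List the preparatory steps implied by the inputs, or raise if inputs are missing."""
    steps = []
    if cell_type_key is None:
        # No labels in the query table yet: they have to be transferred.
        steps.append("label transfer")
    if markers is None:
        # No marker dictionary given: it has to be derived from the reference.
        steps.append("marker generation from reference")
    if steps and (adata_ref is None or ref_cell_type is None):
        raise ValueError(
            "adata_ref and ref_cell_type are required when cell_type_key or markers is None"
        )
    return steps


# Markers in the documented format: {cell_type: {"positive": [...], "negative": [...]}}.
markers = {"T": {"positive": ["CD3E"], "negative": ["MS4A1"]}}

# Labels and markers are both supplied, so no reference is needed.
print(preparatory_steps(markers=markers, cell_type_key="cell_type"))  # []

# Nothing is supplied except a reference: both preparatory steps run.
reference = object()  # placeholder for an AnnData object
print(preparatory_steps(adata_ref=reference, ref_cell_type="cell_type"))
```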
- run_volume(*, adata_ref: AnnData | None = None, ref_cell_type: str | None = None, ref_gene_key: str | None = None, ref_raw_counts_layer: str | None = None, cell_type_key: str | None = None, inplace: bool = True, label_transfer_kwargs: dict[str, Any] | None = None, similarity_kwargs: dict[str, Any] | None = None, heterotypic_overlap_kwargs: dict[str, Any] | None = None, vsi_kwargs: dict[str, Any] | None = None)
Run volume-layer (vl) metrics.
Convenience wrapper around segtraq.vl functions via the instance facade self.vl.
If cell_type_key is provided, it is used for cell-type-aware metrics and to infer the number of ovrlpy components from sdata.tables[tables_key].obs[cell_type_key].
If cell_type_key is None and adata_ref is provided, label transfer is run first via self.run_label_transfer() using adata_ref and ref_cell_type. The resulting labels are stored under “transferred_cell_type”.
If neither cell_type_key nor adata_ref is provided, cell-type-aware metrics are skipped and ovrlpy is run with its default number of components.
A precomputed ovrlpy.Ovrlp object can be passed via vsi_kwargs={“ovrlp”: ovrlp}. In that case, ovrlpy is not run internally and n_components is not inferred.
Runs, in order:
similarity_top_bottom
label transfer, if needed and possible
vertical_signal_integrity_per_cell
fraction_heterotypic_overlap, only if cell-type labels and valid shapes are available
- Parameters:
adata_ref (AnnData or None, default=None) – Reference AnnData object used for label transfer and/or to infer the number of ovrlpy components from reference cell-type labels.
ref_cell_type (str or None, default=None) – Column in adata_ref.obs containing reference cell-type labels. Required if cell_type_key=None and label transfer should be run, or if adata_ref is used to infer n_components.
ref_gene_key (str or None, default=None) – Column in adata_ref.var containing gene identifiers. If None, adata_ref.var_names are used.
ref_raw_counts_layer (str or None, default=None) – Layer in adata_ref.layers containing raw counts. If None, raw counts are expected in adata_ref.X.
cell_type_key (str or None, default=None) – Column in sdata.tables[tables_key].obs containing cell-type labels. If provided, this column is used for fraction_heterotypic_overlap and to infer ovrlpy n_components. If None and adata_ref is provided, label transfer is run first and labels are stored under “transferred_cell_type”.
inplace (bool, default=True) – If True, writes results into sdata.tables[tables_key].obs as implemented by the underlying functions and returns None. If False, returns all results as a dictionary.
label_transfer_kwargs (dict or None, optional) – Extra keyword arguments forwarded to self.run_label_transfer(). Do not include adata_ref or ref_cell_type here; pass them directly to run_volume.
similarity_kwargs (dict or None, optional) – Extra keyword arguments forwarded to vl.similarity_top_bottom.
heterotypic_overlap_kwargs (dict or None, optional) – Extra keyword arguments forwarded to vl.fraction_heterotypic_overlap.
vsi_kwargs (dict or None, optional) – Extra keyword arguments forwarded to vl.vertical_signal_integrity_per_cell. To use a precomputed ovrlpy object, pass it here as {“ovrlp”: ovrlp}.
- Returns:
If inplace=True, returns None.
If inplace=False, returns a dictionary with available metric results.
- Return type:
None or dict[str, object]
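Which volume metrics run depends on whether cell-type labels can be obtained, as described for run_volume above. The sketch below restates that decision logic as a plain function. The name plan_volume_steps and its boolean arguments are hypothetical and serve only to make the documented order explicit; the ovrlpy-specific handling (n_components, a precomputed ovrlp object) is not modelled.

```python
def plan_volume_steps(cell_type_key=None, has_adata_ref=False, has_valid_shapes=True):
    """Return the steps that would run, in the documented order."""
    steps = ["similarity_top_bottom"]
    labels_available = cell_type_key is not None
    if not labels_available and has_adata_ref:
        # Labels are missing but a reference exists: transfer them first.
        steps.append("label transfer")
        labels_available = True
    steps.append("vertical_signal_integrity_per_cell")
    if labels_available and has_valid_shapes:
        # The cell-type-aware metric needs both labels and valid shapes.
        steps.append("fraction_heterotypic_overlap")
    return steps


# Neither labels nor a reference: the cell-type-aware metric is skipped.
print(plan_volume_steps())
# ['similarity_top_bottom', 'vertical_signal_integrity_per_cell']

# A reference is available: labels are transferred and all metrics run.
print(plan_volume_steps(has_adata_ref=True))
```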