The SegTraQ class

The SegTraQ class is the core interface for computing SegTraQ metrics.

SegTraQ class

class segtraq.SegTraQ(sdata: SpatialData, images_key: str | None = 'morphology_focus', tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str | None = 'cell_area', tables_centroid_x_key: str | None = 'x_centroid', tables_centroid_y_key: str | None = 'y_centroid', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', points_z_key: str | None = 'z', points_gene_key: str = 'feature_name', shapes_key: str = 'cell_boundaries', shapes_cell_id_key: str = 'cell_id', nucleus_shapes_key: str | None = 'nucleus_boundaries', nucleus_shapes_cell_id_key: str = 'cell_id')

Bases: object

Initialize a SegTraQ object, the core interface for computing SegTraQ metrics. Defaults target 10x Genomics Xenium; override keys for other technologies.

Parameters:
  • sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).

  • images_key (str or None, optional, default="morphology_focus") – Key in sdata.images for a nuclear or morphology image (e.g., DAPI). Used for visualization or to derive a nucleus mask via segtraq.run_cellpose when using the nuclear correlation module (segtraq.nc). If None, no image is expected.

  • tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table. Gene names in sdata.tables[tables_key].var.index should match the gene field in sdata.points[points_key] (see points_gene_key).

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • tables_area_key (str or None, optional, default="cell_area") – Column in the cell table with cell area (2D). If None, area will be computed via segtraq.bl.morphological_features.

  • tables_centroid_x_key (str or None, optional, default="x_centroid") – Column in the cell table with the x-coordinate of the cell centroid.

  • tables_centroid_y_key (str or None, optional, default="y_centroid") – Column in the cell table with the y-coordinate of the cell centroid.

  • points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.

  • points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.

  • points_z_key (str or None, optional, default="z") – Column for the z-coordinate (3D data). If None, data are treated as 2D.

  • points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.

  • shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.

  • shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).

  • nucleus_shapes_key (str or None, optional, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available. If None, a nucleus mask can be obtained via segtraq.run_cellpose.

  • nucleus_shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[nucleus_shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).

Notes

After initializing a SegTraQ instance, all SegTraQ modules can be run directly from the object using its module facades.

Wrappers (run_baseline, run_nuclear_correlation, etc.) to run all metrics of a module are provided below.

filter_cells(col: str, func: Callable, inplace: bool = True)

Filter cells from the cell table based on a user-defined function.

Parameters:
  • col (str) – Column in the cell table to apply the filtering function on.

  • func (Callable) – A function that takes a single argument (the column value) and returns True if the cell should be kept, False otherwise.

  • inplace (bool, default=True) – If True, modifies self.sdata in place. If False, returns a new SpatialData object with the filtered cells.

Returns:

  • If inplace=True: returns None after modifying self.sdata.

  • If inplace=False: returns a new SpatialData object with filtered cells.

Return type:

None or SpatialData
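The predicate-based filtering and the inplace contract described above can be sketched on a plain pandas DataFrame. This is a minimal illustration, not SegTraQ's implementation: the helper filter_rows and the toy cells table are hypothetical, and the real method operates on the cell table inside a SpatialData object.

```python
import pandas as pd


def filter_rows(table: pd.DataFrame, col: str, func, inplace: bool = True):
    """Keep rows where func(value) is truthy (hypothetical stand-in helper)."""
    keep = table[col].map(func).astype(bool)
    if inplace:
        # Drop rejected rows from the caller's table and return None.
        table.drop(index=table.index[~keep.to_numpy()], inplace=True)
        return None
    # Leave the input untouched and hand back a filtered copy.
    return table.loc[keep].copy()


# Toy cell table (assumed column names follow the Xenium defaults).
cells = pd.DataFrame(
    {"cell_id": ["a", "b", "c"], "cell_area": [50.0, 150.0, 300.0]}
)
kept = filter_rows(cells, "cell_area", lambda x: x > 100, inplace=False)
print(kept)
```

With inplace=False the original table keeps all rows, which is convenient when comparing several thresholds before committing to one.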

Example

>>> st.filter_cells(col='cell_area', func=lambda x: x > 100)

filter_control_and_low_quality_transcripts(min_qv: float = 20.0, control_genes: tuple | list = (), recompute_expression: bool = False, inplace: bool = True)

Filter control and low-quality transcripts from the SpatialData object. By default, this is done in place (see inplace).

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing transcript data.

  • min_qv (float | None, default=20.0) – Minimum quality value (qv) threshold for transcripts to be considered valid. If None, no filtering is applied based on quality.

  • control_genes (tuple | list, default=()) – Additional keywords to identify control probes in gene names. By default, only the standard control prefixes are used: "NegControlProbe_", "antisense_", "NegControlCodeword", "BLANK_", "Blank-", "NegPrb", "DeprecatedCodeword_", "UnassignedCodeword_".

  • points_key (str, default="transcripts") – The key in the SpatialData points attribute that contains transcript data.

  • points_gene_key (str, default="feature_name") – The column name in the points DataFrame that contains gene names.

  • points_cell_id_key (str, default="cell_id") – The column name in the points DataFrame that contains cell IDs.

  • tables_key (str, default="table") – The key in the SpatialData tables attribute that contains the expression table.

  • recompute_expression (bool, default=False) – Whether to recompute the expression matrix after filtering. Note that this can be computationally expensive for large datasets.

  • inplace (bool, default=True) – Whether to modify the SpatialData object in place. Defaults to True.

Returns:

The updated SpatialData object with invalid transcripts marked (in an extra column).

Return type:

sd.SpatialData
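The marking logic described above (control probes plus a quality threshold, recorded in an extra column) can be sketched with pandas. Several details here are assumptions rather than SegTraQ's behavior: the helper mark_valid_transcripts, the output column name is_valid, the quality column name qv, matching control keywords as prefixes, and treating qv equal to min_qv as valid.

```python
import pandas as pd

# Prefixes copied from the control_genes parameter description.
DEFAULT_CONTROL_PREFIXES = (
    "NegControlProbe_", "antisense_", "NegControlCodeword", "BLANK_",
    "Blank-", "NegPrb", "DeprecatedCodeword_", "UnassignedCodeword_",
)


def mark_valid_transcripts(points, min_qv=20.0, control_genes=(),
                           gene_col="feature_name", qv_col="qv"):
    """Return a copy of points with a boolean 'is_valid' column (hypothetical)."""
    prefixes = DEFAULT_CONTROL_PREFIXES + tuple(control_genes)
    # str.startswith accepts a tuple and matches any of its entries.
    is_control = points[gene_col].map(lambda g: g.startswith(prefixes))
    # min_qv=None disables the quality filter entirely.
    ok_quality = True if min_qv is None else points[qv_col] >= min_qv
    out = points.copy()
    out["is_valid"] = ~is_control & ok_quality
    return out


transcripts = pd.DataFrame({
    "feature_name": ["ACTB", "NegControlProbe_0001", "GAPDH", "BLANK_0005"],
    "qv": [35.0, 40.0, 10.0, 30.0],
})
marked = mark_valid_transcripts(transcripts)
print(marked)
```

Marking rather than dropping rows keeps the transcript table aligned with any downstream per-transcript annotations.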

run_baseline(inplace: bool = True, *, morphological_kwargs: dict | None = None)

Run baseline (bl) metrics.

Convenience wrapper around global and per-cell summary metrics. Runs, in order:

  1. number of cells

  2. number of transcripts

  3. number of genes

  4. % unassigned transcripts

  5. % unassigned transcripts per gene

  6. transcripts per cell

  7. genes per cell

  8. mean transcripts per detected gene per cell

  9. morphological features

  10. transcript density

Parameters:
  • inplace (bool, default=True) – If True, results are merged into .uns, .obs, and/or .var as implemented by each metric, and None is returned. If False, per-metric results are returned in a dict.

  • morphological_kwargs (dict or None, optional) – Extra arguments forwarded to bl.morphological_features().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

  • "num_cells"

  • "num_transcripts"

  • "num_genes"

  • "perc_unassigned_transcripts"

  • "perc_unassigned_transcripts_per_gene"

  • "transcripts_per_cell"

  • "genes_per_cell"

  • "mean_transcripts_per_gene_per_cell"

  • "morphological_features"

  • "transcript_density"

Return type:

None or dict
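To make several of the listed baseline metrics concrete, the sketch below derives them from a toy transcript table with pandas. It is an illustration under assumptions, not SegTraQ's code: the exact definitions (for example, whether num_transcripts includes unassigned transcripts) are guesses, and the table contents are invented.

```python
import pandas as pd

# Toy transcript table using the default Xenium-style column names.
transcripts = pd.DataFrame({
    "cell_id": ["c1", "c1", "c1", "c2", "UNASSIGNED", "c2"],
    "feature_name": ["A", "A", "B", "B", "A", "C"],
})
background = "UNASSIGNED"  # matches the points_background_id default

assigned = transcripts[transcripts["cell_id"] != background]
per_cell = assigned.groupby("cell_id")["feature_name"]

baseline = {
    "num_cells": assigned["cell_id"].nunique(),
    "num_transcripts": len(transcripts),  # assumption: counts all transcripts
    "num_genes": transcripts["feature_name"].nunique(),
    "perc_unassigned_transcripts":
        100.0 * (transcripts["cell_id"] == background).mean(),
    "transcripts_per_cell": per_cell.size(),
    "genes_per_cell": per_cell.nunique(),
}
# Mean number of transcripts per detected gene, computed per cell.
baseline["mean_transcripts_per_gene_per_cell"] = (
    baseline["transcripts_per_cell"] / baseline["genes_per_cell"]
)
print(baseline)
```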

run_clustering_stability(key_prefix: str = 'leiden_subset', inplace: bool = True, connectedness_kwargs: dict | None = None, silhouette_kwargs: dict | None = None, purity_kwargs: dict | None = None, ari_kwargs: dict | None = None)

Run clustering-stability metrics.

This method is a convenience wrapper around the clustering-stability (cs) functions. It runs, in order:

  1. cluster connectedness

  2. silhouette score

  3. purity (subset stability)

  4. ARI (subset stability)

Only parameters shared by all four computations are exposed explicitly. All other parameters are provided via method-specific *_kwargs dictionaries.

Parameters:
  • key_prefix (str, default="leiden_subset") – Prefix for Leiden clustering labels written to .obs by the underlying methods (where applicable).

  • inplace (bool, default=True) – If True, metrics are written to sdata.tables["table"].uns by the underlying methods and this function returns None. If False, the computed metrics are returned as a dictionary.

  • connectedness_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_cluster_connectedness().

  • silhouette_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_silhouette_score().

  • purity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_purity().

  • ari_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_ari().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

  • "cluster_connectedness" : float

  • "silhouette_score" : float

  • "mean_purity" : float

  • "mean_ari" : float

Return type:

None or dict
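The wrapper convention used here (shared parameters exposed explicitly, everything else forwarded through per-method *_kwargs dictionaries) can be sketched in plain Python. All names below are hypothetical stand-ins: metric_a, metric_b, run_all, and the resolution and n_subsets options are invented for illustration and are not part of SegTraQ.

```python
def metric_a(key_prefix, inplace, resolution=1.0):
    """Stand-in metric with one method-specific option (hypothetical)."""
    return {"key_prefix": key_prefix, "resolution": resolution}


def metric_b(key_prefix, inplace, n_subsets=5):
    """Second stand-in metric with a different option (hypothetical)."""
    return {"key_prefix": key_prefix, "n_subsets": n_subsets}


def run_all(key_prefix="leiden_subset", inplace=True,
            a_kwargs=None, b_kwargs=None):
    """Run both metrics, forwarding shared and method-specific arguments."""
    results = {
        # `or {}` turns the None default into an empty dict for ** unpacking.
        "metric_a": metric_a(key_prefix=key_prefix, inplace=inplace,
                             **(a_kwargs or {})),
        "metric_b": metric_b(key_prefix=key_prefix, inplace=inplace,
                             **(b_kwargs or {})),
    }
    # Mirror the documented contract: None in place, a dict otherwise.
    return None if inplace else results


out = run_all(inplace=False, a_kwargs={"resolution": 0.5})
print(out)
```

Using None as the default instead of an empty dict avoids sharing one mutable default across calls.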

run_label_transfer(adata_ref=<class 'anndata._core.anndata.AnnData'>, tx_min: float = 10.0, tx_max: float = 2000.0, gn_min: float = 5.0, gn_max: float = inf, cell_type_key: str = 'transferred_cell_type', ref_cell_type: str = 'cell_type', ref_ensemble_key: str | None = None, query_ensemble_key: str | None = 'gene_ids', inplace: bool = True)

Transfer cell labels from a reference AnnData to sdata.tables[tables_key] by Pearson correlation to reference mean profiles.

Parameters:
  • sdata (SpatialData-like) – Container with .tables[tables_key] as AnnData, and points needed for QC if absent. sdata.tables[tables_key].X values are ideally normalized and log1p transformed. Otherwise transformation will be performed before running label transfer.

  • adata_ref (AnnData) – Reference dataset (ideally normalized & log1p). Otherwise transformation will be performed before running label transfer.

  • ref_cell_type (str) – Column in adata_ref.obs with reference cell types.

  • tables_key (str) – Key of the AnnData table in sdata.tables.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • points_key (str, optional) – The key to access the transcript data within sdata.points (default is "transcripts").

  • points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is "cell_id").

  • points_gene_key (str, optional) – The column name in the transcript data representing gene names (default is "feature_name").

  • tx_min (float) – Minimum number of transcripts per cell for pre-filtering.

  • tx_max (float) – Maximum number of transcripts per cell for pre-filtering.

  • gn_min (float) – Minimum number of genes per cell for pre-filtering.

  • gn_max (float) – Maximum number of genes per cell for pre-filtering.

  • cell_type_key (str) – Column name to store transferred labels in .obs when inplace=True.

  • ref_ensemble_key (str or None, default=None) – Column name in adata_ref.var that contains unique gene/ensemble IDs. If None, adata_ref.var_names will be used.

  • query_ensemble_key (str or None, default="gene_ids") – Column name in self.sdata.tables[self.tables_key].var that contains unique gene/ensemble IDs. If None, self.sdata.tables[self.tables_key].var_names will be used.

  • q_gene_key (str)

  • inplace (bool) – If True, writes labels into sdata.tables[tables_key].obs and returns None. If False, returns a DataFrame with ['cell_id', 'transferred_cell_type', 'pearson_score'].

Returns:

None when inplace=True; otherwise a DataFrame of assignments.

Return type:

None or pd.DataFrame
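The core idea, assigning each cell the reference type whose mean expression profile has the highest Pearson correlation, can be sketched with NumPy. This is a simplified illustration: the function transfer_labels and the toy matrices are hypothetical, and the pre-filtering, normalization, and gene matching performed by the real method are omitted.

```python
import numpy as np


def transfer_labels(query, ref, ref_labels):
    """Pick, per query cell, the best-correlated reference mean profile."""
    ref_labels = np.asarray(ref_labels)
    types = np.unique(ref_labels)
    # One mean expression profile per reference cell type: (n_types, n_genes).
    profiles = np.vstack([ref[ref_labels == t].mean(axis=0) for t in types])

    # Pearson correlation equals cosine similarity of mean-centered vectors.
    # Note: a constant expression vector has zero norm and yields NaN here.
    q = query - query.mean(axis=1, keepdims=True)
    p = profiles - profiles.mean(axis=1, keepdims=True)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    corr = q @ p.T  # shape (n_query_cells, n_types)

    best = corr.argmax(axis=1)
    return types[best], corr[np.arange(len(best)), best]


# Toy data: 4 reference cells and 2 query cells over 3 shared genes.
ref = np.array([[5, 0, 1], [4, 1, 0], [0, 5, 1], [1, 4, 0]], dtype=float)
ref_labels = ["T", "T", "B", "B"]
query = np.array([[6, 1, 0], [0, 3, 1]], dtype=float)

labels, scores = transfer_labels(query, ref, ref_labels)
print(labels, scores)
```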

run_point_statistics(genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, inplace: bool = True, *, centroid_kwargs: dict | None = None, membrane_kwargs: dict | None = None, skew_kwargs: dict | None = None, compartments_kwargs: dict | None = None)

Run point-statistics (ps) metrics.

Convenience wrapper around point-level spatial statistics. Applies shared transcript and cell filtering (by gene(s) and cell type) and runs, in order:

  1. percentage of transcripts in compartments (nucleus overlap, cytoplasm, outside)

  2. distance to centroid (cell or nucleus)

  3. distance to membrane (cell or nucleus)

  4. membrane-distance skewness

Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.

Parameters:
  • genes (str | list[str] | None, optional) – Gene(s) to include. If None, all genes are used.

  • cell_type_key (str, default="transferred_cell_type") – Cell-type annotation key in sdata.tables[…].obs.

  • cell_type_query (str | list[str] | None, optional) – Restrict computations to cells matching these label(s).

  • inplace (bool, default=True) – If True, results are merged into .obs and None is returned. If False, per-metric results are returned.

  • centroid_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_centroid().

  • membrane_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_membrane().

  • skew_kwargs (dict or None, optional) – Extra arguments for ps.membrane_distance_skewness().

  • compartments_kwargs (dict or None, optional) – Extra arguments for ps.percentage_transcripts_in_compartments().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

  • "percentage_transcripts_in_compartments"

  • "distance_to_centroid"

  • "distance_to_membrane"

  • "membrane_distance_skewness"

Return type:

None or dict
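As an illustration of one of these point-level statistics, the sketch below computes each transcript's Euclidean distance to its cell centroid and averages it per cell. It is a minimal example under assumptions: the toy tables are invented, and aggregating by the mean is a guess rather than SegTraQ's documented behavior.

```python
import numpy as np
import pandas as pd

# Toy transcript coordinates with their assigned cell.
transcripts = pd.DataFrame({
    "cell_id": ["c1", "c1", "c2", "c2"],
    "x": [0.0, 3.0, 10.0, 10.0],
    "y": [0.0, 4.0, 10.0, 12.0],
})
# Toy centroids, indexed by cell ID (column names follow the defaults).
centroids = pd.DataFrame(
    {"x_centroid": [0.0, 10.0], "y_centroid": [0.0, 10.0]},
    index=pd.Index(["c1", "c2"], name="cell_id"),
)

# Attach each transcript's centroid, then take the Euclidean distance.
merged = transcripts.join(centroids, on="cell_id")
merged["dist"] = np.hypot(
    merged["x"] - merged["x_centroid"],
    merged["y"] - merged["y_centroid"],
)
mean_dist = merged.groupby("cell_id")["dist"].mean()
print(mean_dist)
```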

run_region_similarity(metric: str = 'cosine_sim', n_jobs: int = -1, inplace: bool = True, iou_kwargs: dict = None, similarity_nucleus_cell_kwargs: dict = None, similarity_nucleus_cytoplasm_kwargs: dict = None, similarity_border_neighborhood_kwargs: dict = None)

Compute region similarity metrics and optionally merge them into the cell table.

This runs, in order:

  1. IoU between each cell and its best-matching nucleus

  2. Similarity between per-cell expression and its matched nucleus

  3. Similarity between the cell’s nucleus and cytoplasm expression

  4. Similarity of gene expression (1) between an eroded interior (“center”) and a thin outer shell (“border”), and (2) between the border and the neighborhood

Parameters:
  • metric (str, default="cosine_sim")

  • n_jobs (int, default=-1)

  • inplace (bool, default=True) – If True, writes results into sdata.tables[tables_key].obs and returns None. If False, returns a dictionary of DataFrames without writing.

  • iou_kwargs (dict, optional) – Additional keyword arguments to pass to match_nuclei_to_cells.

  • similarity_nucleus_cell_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_nucleus_cell.

  • similarity_nucleus_cytoplasm_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_nucleus_cytoplasm.

  • similarity_border_neighborhood_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_border_neighborhood.

Returns:

If inplace=True, returns None after writing to sdata. If inplace=False, returns a dict with keys:

  • "ious" : pd.DataFrame

  • "similarity_nucleus_cell" : pd.DataFrame

  • "similarity_nucleus_cytoplasm" : pd.DataFrame

  • "similarity_border_neighborhood" : pd.DataFrame

Return type:

None or dict

Notes

  • Requires self.nucleus_shapes_key (nucleus boundaries).
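The two quantities underlying these metrics, intersection over union (IoU) and cosine similarity, can be sketched with NumPy. This is an illustration under assumptions: boolean pixel masks stand in for the boundary polygons SegTraQ works with, and the helpers iou and cosine_sim are hypothetical.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(mask_a, mask_b).sum()
    return float(np.logical_and(mask_a, mask_b).sum() / union) if union else 0.0


def cosine_sim(u, v):
    """Cosine similarity of two expression vectors (NaN if either is all zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else float("nan")


# A 4x4 cell mask fully containing a 2x2 nucleus mask.
cell = np.ones((4, 4), dtype=bool)
nucleus = np.zeros((4, 4), dtype=bool)
nucleus[1:3, 1:3] = True
overlap = iou(cell, nucleus)

# Proportional expression vectors point in the same direction.
similarity = cosine_sim(np.array([1.0, 2.0, 0.0]), np.array([2.0, 4.0, 0.0]))
print(overlap, similarity)
```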

run_supervised(*, markers: dict[str, dict[str, list[str]]], cell_type_key: str = 'transferred_cell_type', inplace: bool = True, purity_kwargs: dict | None = None, contamination_kwargs: dict | None = None, mecr_kwargs: dict | None = None)

Run supervised (sp) metrics.

Convenience wrapper around supervised marker-based QC metrics. Runs, in order:

  1. marker_purity (per-cell precision/recall/F1, neighborhood-aware negatives)

  2. neighbor_contamination (per-cell + directed type-type matrices)

  3. mutually_exclusive_coexpression_rate (MECR)

Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.

Parameters:
  • markers (dict) – {cell_type: {"positive": list[str], "negative": list[str]}}.

  • cell_type_key (str, default="transferred_cell_type") – Column in the AnnData .obs with cell-type labels.

  • inplace (bool, default=True) – If True, writes results into .obs / .uns / .uns[…] as implemented by the underlying functions and returns None. If False, returns all results as a dict.

  • purity_kwargs (dict or None, optional) – Extra args for sp.marker_purity(). (e.g. use_quantiles=…, weight_cont=…, require_neighbor_expression=…, neighbors_key=…)

  • contamination_kwargs (dict or None, optional) – Extra args for sp.neighbor_contamination(). (e.g. require_neighbor_expression=…, neighbors_key=…, uns_key=…, uns_key_binary=…)

  • mecr_kwargs (dict or None, optional) – Extra args for sp.mutually_exclusive_coexpression_rate(). (e.g. pseudocount=…)

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys: - "marker_purity" (pd.DataFrame) - "neighbor_contamination" (dict with per-cell + matrices) - "mutually_exclusive_coexpression_rate" (pd.DataFrame)

Return type:

None or dict
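The sketch below shows a markers dictionary in the documented shape and one common way to define a co-expression rate for a pair of genes that should be mutually exclusive: the number of cells expressing both, divided by the number expressing at least one. The definition is an assumption and may differ from SegTraQ's (which, for example, accepts a pseudocount); the helper coexpression_rate, the marker choices, and the counts are illustrative.

```python
import numpy as np

# Marker dictionary in the documented shape (example genes are illustrative).
markers = {
    "T cell": {"positive": ["CD3E"], "negative": ["CD19"]},
    "B cell": {"positive": ["CD19"], "negative": ["CD3E"]},
}


def coexpression_rate(expr_a, expr_b):
    """Cells expressing both genes / cells expressing at least one."""
    a = np.asarray(expr_a) > 0
    b = np.asarray(expr_b) > 0
    either = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / either) if either else float("nan")


# Per-cell counts for two genes across five cells.
cd3e = [3, 0, 2, 0, 1]
cd19 = [0, 4, 1, 0, 0]
rate = coexpression_rate(cd3e, cd19)
print(rate)
```

Under this definition, lower values indicate cleaner separation between the two cell types.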

run_volume_metrics(*, vsi_map: ndarray | None = None, inplace: bool = True, similarity_kwargs: dict | None = None, heterotypic_overlap_kwargs: dict | None = None, vsi_kwargs: dict | None = None)

Run volume-layer (vl) metrics.

Convenience wrapper around segtraq.vl functions via the instance facade self.vl. Runs, in order:

  1. similarity_top_bottom

  2. fraction_heterotypic_overlap

  3. vertical_signal_integrity_per_cell (only if vsi_map is provided)

Parameters:
  • vsi_map (np.ndarray or None, optional) – Precomputed 2D VSI map required for vertical_signal_integrity_per_cell. If None, VSI will be skipped.

  • inplace (bool, default=True) – If True, metrics are written into sdata.tables[tables_key].obs by the underlying methods and this function returns None. If False, returns a dict of result DataFrames.

  • similarity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.similarity_top_bottom().

  • heterotypic_overlap_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.fraction_heterotypic_overlap().

  • vsi_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.vertical_signal_integrity_per_cell().

Returns:

If inplace=True, returns None.

If inplace=False, returns a dict with keys:

  • "similarity_top_bottom": pd.DataFrame

  • "fraction_heterotypic_overlap": pd.DataFrame

  • "vertical_signal_integrity_per_cell": pd.DataFrame (only if vsi_map is not None)

Return type:

None or dict[str, object]
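One plausible reading of a top/bottom similarity is sketched below: split each cell's transcripts at its median z-coordinate, count genes in each half, and compare the two count vectors by cosine similarity. This is an assumption about what such a metric measures, not SegTraQ's implementation; the helper top_bottom_similarity, the median split, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def top_bottom_similarity(cell_tx, genes):
    """Cosine similarity of gene counts above vs. at or below the median z."""
    z_mid = cell_tx["z"].median()
    upper = cell_tx.loc[cell_tx["z"] > z_mid, "feature_name"]
    lower = cell_tx.loc[cell_tx["z"] <= z_mid, "feature_name"]
    # Align both halves on the same gene order, filling absent genes with 0.
    top = upper.value_counts().reindex(genes, fill_value=0).to_numpy()
    bottom = lower.value_counts().reindex(genes, fill_value=0).to_numpy()
    denom = np.linalg.norm(top) * np.linalg.norm(bottom)
    return float(np.dot(top, bottom) / denom) if denom else float("nan")


transcripts = pd.DataFrame({
    "cell_id": ["c1"] * 6 + ["c2"] * 4,
    "feature_name": ["A", "B", "A", "A", "B", "A", "A", "A", "B", "B"],
    "z": [1.0, 1.0, 2.0, 5.0, 5.0, 6.0, 1.0, 2.0, 5.0, 6.0],
})
genes = ["A", "B"]

similarity = {
    cell_id: top_bottom_similarity(cell_tx, genes)
    for cell_id, cell_tx in transcripts.groupby("cell_id")
}
print(similarity)
```

In this toy data, c1 has the same composition in both halves, while c2 has disjoint gene sets above and below, the pattern expected when two cells overlap vertically.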

property sdata

Underlying SpatialData object (modifiable).