The SegTraQ (`SegTraQ`) class

The SegTraQ (SegTraQ) class represents the core interface for computing SegTraQ metrics.

SegTraQ class

class segtraq.SegTraQ(sdata: SpatialData, images_key: str | None = 'morphology_focus', tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str | None = 'cell_area', tables_centroid_x_key: str | None = 'x_centroid', tables_centroid_y_key: str | None = 'y_centroid', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', points_z_key: str | None = 'z', points_gene_key: str = 'feature_name', shapes_key: str = 'cell_boundaries', shapes_cell_id_key: str = 'cell_id', nucleus_shapes_key: str | None = 'nucleus_boundaries', nucleus_shapes_cell_id_key: str = 'cell_id')

Bases: object

Initialize a SegTraQ object, the core interface for computing SegTraQ metrics. Defaults target 10x Genomics Xenium; override keys for other technologies.

Parameters:

sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).
images_key (str or None, optional, default="morphology_focus") – Key in sdata.images for a nuclear or morphology image (e.g., DAPI). Used for visualization or to derive a nucleus mask via segtraq.run_cellpose when using the nuclear correlation module (segtraq.nc). If None, no image is expected.
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table. Gene names in sdata.tables[tables_key].var.index should match the gene field in sdata.points[points_key] (see points_gene_key).
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
tables_area_key (str or None, optional, default="cell_area") – Column in the cell table with cell area (2D). If None, area will be computed via segtraq.bl.morphological_features.
tables_centroid_x_key (str or None, optional, default="x_centroid") – Column in the cell table with the x-coordinate of the cell centroid.
tables_centroid_y_key (str or None, optional, default="y_centroid") – Column in the cell table with the y-coordinate of the cell centroid.
points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.
points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.
points_z_key (str or None, optional, default="z") – Column for the z-coordinate (3D data). If None, data are treated as 2D.
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.
shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.
shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).
nucleus_shapes_key (str or None, optional, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available. If None, a nucleus mask can be obtained via segtraq.run_cellpose.
nucleus_shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[nucleus_shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).

Notes

After initializing a SegTraQ instance, all SegTraQ modules can be run directly from the object using its module facades.

Wrappers (run_baseline, run_nuclear_correlation, etc.) to run all metrics of a module are provided below.

filter_cells(col: str, func: Callable, inplace: bool = True)

Filter cells from the cell table based on a user-defined function.

Parameters:

col (str) – Column in the cell table to apply the filtering function on.
func (Callable) – A function that takes a single argument (the column value) and returns True if the cell should be kept, False otherwise.
inplace (bool, default=True) – If True, modifies self.sdata in place. If False, returns a new SpatialData object with the filtered cells.

Returns:

If inplace=True: returns None after modifying self.sdata.
If inplace=False: returns a new SpatialData object with filtered cells.

Return type:

None or SpatialData

Example

>>> st.filter_cells(col='cell_area', func=lambda x: x > 100)
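The predicate semantics described above can be sketched without SegTraQ itself. The snippet below is a minimal, dependency-free illustration that uses a list of dicts as a stand-in for the cell table. `keep_rows` is a hypothetical helper, not part of the SegTraQ API, and the toy values are made up.

```python
from __future__ import annotations

from typing import Any, Callable


def keep_rows(rows: list[dict[str, Any]], col: str, func: Callable[[Any], bool]) -> list[dict[str, Any]]:
    """Return the rows for which func(row[col]) is truthy (hypothetical helper)."""
    return [row for row in rows if func(row[col])]


# Toy stand-in for the cell table; the values are illustrative only.
cells = [
    {"cell_id": "a", "cell_area": 50.0},
    {"cell_id": "b", "cell_area": 150.0},
    {"cell_id": "c", "cell_area": 100.0},
]

# Same predicate as in the example above: keep cells with area strictly above 100.
kept = keep_rows(cells, col="cell_area", func=lambda x: x > 100)
kept_ids = [row["cell_id"] for row in kept]
print(kept_ids)  # ['b']
```

Because the predicate receives one column value at a time, a strict comparison such as `x > 100` drops the cell whose area is exactly 100.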

filter_control_and_low_quality_transcripts(min_qv: float = 20.0, control_genes: tuple | list = (), recompute_expression: bool = False, inplace: bool = True)

Filter control and low-quality transcripts from the SpatialData object. This is always done in place.

Parameters:

sdata (sd.SpatialData) – The SpatialData object containing transcript data.
min_qv (float | None, default=20.0) – Minimum quality value (qv) threshold for transcripts to be considered valid. If None, no filtering is applied based on quality.
control_genes (tuple | list, default=()) – Additional keywords to identify control probes in gene names. By default, only standard control prefixes are used. These are: “NegControlProbe_”, “antisense_”, “NegControlCodeword”, “BLANK_”, “Blank-”, “NegPrb”, “DeprecatedCodeword_”, “UnassignedCodeword_”.
points_key (str, default="transcripts") – The key in the SpatialData points attribute that contains transcript data.
points_gene_key (str, default="feature_name") – The column name in the points DataFrame that contains gene names.
points_cell_id_key (str, default="cell_id") – The column name in the points DataFrame that contains cell IDs.
tables_key (str, default="table") – The key in the SpatialData tables attribute that contains the expression table.
recompute_expression (bool, default=False) – Whether to recompute the expression matrix after filtering. Note that this can be computationally expensive for large datasets.
inplace (bool, default=True) – Whether to modify the SpatialData object in place. Defaults to True.

Returns:

The updated SpatialData object with invalid transcripts marked (in an extra column).

Return type:

sd.SpatialData
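The marking logic can be sketched as a per-transcript check. This is a dependency-free illustration, not the library's implementation. The prefix list is taken from the `control_genes` description above. Three points are assumptions: standard prefixes are matched with `startswith`, extra `control_genes` keywords are matched as substrings, and the quality threshold is inclusive (`qv >= min_qv`). `is_valid_transcript` is a hypothetical helper and the gene names are illustrative.

```python
from __future__ import annotations

# Standard control prefixes as listed in the parameter description.
CONTROL_PREFIXES = (
    "NegControlProbe_", "antisense_", "NegControlCodeword", "BLANK_",
    "Blank-", "NegPrb", "DeprecatedCodeword_", "UnassignedCodeword_",
)


def is_valid_transcript(gene: str, qv: float, min_qv: float | None = 20.0,
                        control_genes: tuple | list = ()) -> bool:
    """Hypothetical helper: True if the transcript is neither control nor low quality."""
    # Assumption: prefixes match at the start, extra keywords match anywhere.
    is_control = gene.startswith(CONTROL_PREFIXES) or any(k in gene for k in control_genes)
    # Assumption: the threshold is inclusive; min_qv=None disables the quality filter.
    passes_qv = min_qv is None or qv >= min_qv
    return (not is_control) and passes_qv


# Toy (gene, qv) pairs; values are made up for illustration.
transcripts = [("ACTB", 35.0), ("NegControlProbe_0001", 40.0), ("GAPDH", 12.0), ("BLANK_0042", 5.0)]

valid_flags = [is_valid_transcript(g, q) for g, q in transcripts]
no_qv_flags = [is_valid_transcript(g, q, min_qv=None) for g, q in transcripts]
print(valid_flags)  # [True, False, False, False]
print(no_qv_flags)  # [True, False, True, False]
```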

run_baseline(inplace: bool = True, *, morphological_kwargs: dict | None = None)

Run baseline (bl) metrics.

Convenience wrapper around global and per-cell summary metrics. Runs, in order:

number of cells
number of transcripts
number of genes
% unassigned transcripts
% unassigned transcripts per gene
transcripts per cell
genes per cell
mean transcripts per detected gene per cell
morphological features
transcript density

Parameters:

inplace (bool, default=True) – If True, results are merged into .uns, .obs, and/or .var as implemented by each metric, and None is returned. If False, per-metric results are returned in a dict.
morphological_kwargs (dict or None, optional) – Extra arguments forwarded to bl.morphological_features().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

"num_cells"
"num_transcripts"
"num_genes"
"perc_unassigned_transcripts"
"perc_unassigned_transcripts_per_gene"
"transcripts_per_cell"
"genes_per_cell"
"mean_transcripts_per_gene_per_cell"
"morphological_features"
"transcript_density"

Return type:

None or dict
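A few of the baseline quantities listed above can be illustrated on a toy transcript list. This is a rough sketch in plain Python rather than the `segtraq.bl` implementation. The exact definitions used by SegTraQ (for example, whether percentages are reported on a 0 to 100 scale) are assumptions here, and the variable names are illustrative.

```python
from collections import defaultdict

BACKGROUND_ID = "UNASSIGNED"  # default value of points_background_id

# Toy (gene, cell_id) pairs standing in for the points table.
transcripts = [
    ("A", "c1"), ("A", "c1"), ("B", "c1"),
    ("A", "c2"),
    ("B", "UNASSIGNED"), ("C", "UNASSIGNED"),
]

num_transcripts = len(transcripts)
assigned = [(g, c) for g, c in transcripts if c != BACKGROUND_ID]

# 2 of 6 transcripts are unassigned in this toy example.
perc_unassigned = 100.0 * (num_transcripts - len(assigned)) / num_transcripts

# Group the assigned transcripts by cell.
genes_by_cell = defaultdict(list)
for gene, cell in assigned:
    genes_by_cell[cell].append(gene)

transcripts_per_cell = {c: len(g) for c, g in genes_by_cell.items()}
genes_per_cell = {c: len(set(g)) for c, g in genes_by_cell.items()}
# Mean transcripts per detected gene, per cell.
mean_tx_per_gene = {c: transcripts_per_cell[c] / genes_per_cell[c] for c in genes_by_cell}

print(transcripts_per_cell)  # {'c1': 3, 'c2': 1}
print(genes_per_cell)        # {'c1': 2, 'c2': 1}
print(mean_tx_per_gene)      # {'c1': 1.5, 'c2': 1.0}
```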

run_clustering_stability(key_prefix: str = 'leiden_subset', inplace: bool = True, connectedness_kwargs: dict | None = None, silhouette_kwargs: dict | None = None, purity_kwargs: dict | None = None, ari_kwargs: dict | None = None)

Run clustering-stability metrics.

This method is a convenience wrapper around the clustering-stability (cs) functions. It runs, in order:

cluster connectedness
silhouette score
purity (subset stability)
ARI (subset stability)

Only parameters shared by all four computations are exposed explicitly. All other parameters are provided via method-specific *_kwargs dictionaries.

Parameters:

key_prefix (str, default="leiden_subset") – Prefix for Leiden clustering labels written to .obs by the underlying methods (where applicable).
inplace (bool, default=True) – If True, metrics are written to sdata.tables["table"].uns by the underlying methods and this function returns None. If False, the computed metrics are returned as a dictionary.
connectedness_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_cluster_connectedness().
silhouette_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_silhouette_score().
purity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_purity().
ari_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_ari().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

"cluster_connectedness" : float
"silhouette_score" : float
"mean_purity" : float
"mean_ari" : float

Return type:

None or dict
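The `*_kwargs` convention used by this and the other wrappers can be sketched generically. The functions below are stand-ins, not the real `cs` functions, and their parameters (`n_neighbors`, `metric`) are hypothetical. The sketch only shows how a `None` default is turned into an empty dict and unpacked into the underlying call.

```python
from __future__ import annotations


def _fake_connectedness(n_neighbors: int = 15) -> dict:
    """Stand-in for a metric function; echoes the arguments it received."""
    return {"n_neighbors": n_neighbors}


def _fake_silhouette(metric: str = "euclidean") -> dict:
    """Stand-in for a second metric function."""
    return {"metric": metric}


def run_wrapper(inplace: bool = True,
                connectedness_kwargs: dict | None = None,
                silhouette_kwargs: dict | None = None) -> dict | None:
    """Hypothetical wrapper: forwards each *_kwargs dict to its own function."""
    results = {
        # `or {}` avoids a mutable default argument and handles None.
        "cluster_connectedness": _fake_connectedness(**(connectedness_kwargs or {})),
        "silhouette_score": _fake_silhouette(**(silhouette_kwargs or {})),
    }
    # In this sketch, inplace=True simply discards the results.
    return None if inplace else results


out = run_wrapper(inplace=False, silhouette_kwargs={"metric": "cosine"})
print(out)
```

Only the function named by a given `*_kwargs` dict sees those arguments, so a setting passed for one metric does not leak into the others.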

run_label_transfer(adata_ref=<class 'anndata._core.anndata.AnnData'>, tx_min: float = 10.0, tx_max: float = 2000.0, gn_min: float = 5.0, gn_max: float = inf, cell_type_key: str = 'transferred_cell_type', ref_cell_type: str = 'cell_type', ref_ensemble_key: str | None = None, query_ensemble_key: str | None = 'gene_ids', inplace: bool = True)

Transfer cell labels from a reference AnnData to sdata.tables[tables_key] by Pearson correlation to reference mean profiles.

Parameters:

sdata (SpatialData-like) – Container with .tables[tables_key] as AnnData, and points needed for QC if absent. sdata.tables[tables_key].X values are ideally normalized and log1p transformed. Otherwise transformation will be performed before running label transfer.
adata_ref (AnnData) – Reference dataset (ideally normalized & log1p). Otherwise transformation will be performed before running label transfer.
ref_cell_type (str) – Column in adata_ref.obs with reference cell types.
tables_key (str) – Key of the AnnData table in sdata.tables.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
points_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
points_gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).
tx_min (float, default=10.0) – Minimum number of transcripts per cell for pre-filtering.
tx_max (float, default=2000.0) – Maximum number of transcripts per cell for pre-filtering.
gn_min (float, default=5.0) – Minimum number of genes per cell for pre-filtering.
gn_max (float, default=inf) – Maximum number of genes per cell for pre-filtering.
cell_type_key (str) – Column name to store transferred labels in .obs when inplace=True.
ref_ensemble_key (str or None, default=None) – Column name in adata_ref.var that contains unique gene/ensemble IDs. If None, adata_ref.var_names will be used.
query_ensemble_key (str or None, default="gene_ids") – Column name in self.sdata.tables[self.tables_key].var that contains unique gene/ensemble IDs. If None, self.sdata.tables[self.tables_key].var_names will be used.
q_gene_key (str)
inplace (bool) – If True, writes labels into sdata.tables[tables_key].obs and returns None. If False, returns a DataFrame with [‘cell_id’, ‘transferred_cell_type’, ‘pearson_score’].

Returns:

None when inplace=True; otherwise a DataFrame of assignments.

Return type:

None or pd.DataFrame
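The core idea, assigning each query cell the reference cell type whose mean profile it correlates with best, can be sketched in plain Python. This is a simplified illustration, not the SegTraQ implementation: it skips the pre-filtering and normalization steps, assumes the genes are already aligned between query and reference, and does not handle constant (zero-variance) vectors. All function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (no zero-variance handling)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_profiles(ref_expr, ref_labels):
    """Average the reference expression vectors within each cell type."""
    sums, counts = {}, {}
    for vec, label in zip(ref_expr, ref_labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def transfer_labels(query_expr, profiles):
    """Return (best_label, pearson_score) for each query cell."""
    out = []
    for vec in query_expr:
        scores = {label: pearson(vec, prof) for label, prof in profiles.items()}
        best = max(scores, key=scores.get)
        out.append((best, scores[best]))
    return out


# Toy reference: 4 cells x 3 genes, two cell types (values are made up).
ref_expr = [[5, 0, 1], [4, 1, 0], [0, 5, 1], [1, 4, 0]]
ref_labels = ["T", "T", "B", "B"]
profiles = mean_profiles(ref_expr, ref_labels)

assignments = transfer_labels([[3, 0, 0], [0, 2, 0]], profiles)
labels = [label for label, _ in assignments]
print(labels)  # ['T', 'B']
```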

Run point-statistics (ps) metrics.

Convenience wrapper around point-level spatial statistics. Applies shared transcript and cell filtering (by gene(s) and cell type) and runs, in order:

percentage of transcripts in compartments (nucleus overlap, cytoplasm, outside)
distance to centroid (cell or nucleus)
distance to membrane (cell or nucleus)
membrane-distance skewness

Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.

Parameters:

genes (str | list[str] | None, optional) – Gene(s) to include. If None, all genes are used.
cell_type_key (str, default="transferred_cell_type") – Cell-type annotation key in sdata.tables[…].obs.
cell_type_query (str | list[str] | None, optional) – Restrict computations to cells matching these label(s).
inplace (bool, default=True) – If True, results are merged into .obs and None is returned. If False, per-metric results are returned.
centroid_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_centroid().
membrane_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_membrane().
skew_kwargs (dict or None, optional) – Extra arguments for ps.membrane_distance_skewness().
compartments_kwargs (dict or None, optional) – Extra arguments for ps.percentage_transcripts_in_compartments().

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

"percentage_transcripts_in_compartments"
"distance_to_centroid"
"distance_to_membrane"
"membrane_distance_skewness"

Return type:

None or dict
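As a rough illustration of one of the point-level statistics above, the snippet below computes the mean Euclidean distance of each cell's transcripts to that cell's centroid in 2D. It is a sketch in plain Python under simplifying assumptions (cell centroids only, background transcripts skipped) and is not the `ps.distance_to_centroid()` implementation. The names and coordinates are illustrative.

```python
import math
from collections import defaultdict

BACKGROUND_ID = "UNASSIGNED"  # default value of points_background_id

# Toy centroids (x, y) per cell and transcripts (x, y, cell_id).
centroids = {"c1": (0.0, 0.0), "c2": (10.0, 10.0)}
transcripts = [
    (3.0, 4.0, "c1"),
    (0.0, 0.0, "c1"),
    (10.0, 12.0, "c2"),
    (50.0, 50.0, "UNASSIGNED"),
]

distances = defaultdict(list)
for x, y, cell in transcripts:
    if cell == BACKGROUND_ID:
        continue  # background transcripts belong to no cell
    distances[cell].append(math.dist((x, y), centroids[cell]))

mean_distance = {c: sum(d) / len(d) for c, d in distances.items()}
print(mean_distance)  # {'c1': 2.5, 'c2': 2.0}
```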

run_region_similarity(metric: str = 'cosine_sim', n_jobs: int = -1, inplace: bool = True, iou_kwargs: dict = None, similarity_nucleus_cell_kwargs: dict = None, similarity_nucleus_cytoplasm_kwargs: dict = None, similarity_border_neighborhood_kwargs: dict = None)

Compute region similarity metrics and optionally merge them into the cell table.

This runs, in order:

1) IoU between each cell and its best-matching nucleus
2) Similarity between each cell's expression and that of its matched nucleus
3) Similarity between each cell's nuclear and cytoplasmic expression
4) Border similarity, computed by (1) comparing gene expression in an eroded interior (“center”) with a thin outer shell (“border”), and (2) comparing the border with the neighborhood

Parameters:

metric (str, default="cosine_sim")
n_jobs (int, default=-1)
inplace (bool, default=True) – If True, writes results into sdata.tables[tables_key].obs and returns None. If False, returns a dictionary of DataFrames without writing.
iou_kwargs (dict, optional) – Additional keyword arguments to pass to match_nuclei_to_cells.
similarity_nucleus_cell_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_nucleus_cell.
similarity_nucleus_cytoplasm_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_nucleus_cytoplasm.
similarity_border_neighborhood_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_border_neighborhood.

Returns:

If inplace=True, returns None after writing to sdata. If inplace=False, returns a dict with keys:

"ious" : pd.DataFrame
"similarity_nucleus_cell" : pd.DataFrame
"similarity_nucleus_cytoplasm" : pd.DataFrame
"similarity_border_neighborhood" : pd.DataFrame

Return type:

None or dict

Notes

Requires self.nucleus_shapes_key (nucleus boundaries).
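The default metric name, `cosine_sim`, suggests cosine similarity between two expression vectors, for example the per-gene counts in a cell's nucleus and in its cytoplasm. The following is a minimal sketch of that measure in plain Python. Returning NaN for an all-zero vector is an assumption made here, not documented SegTraQ behavior, and the count vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; NaN if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return float("nan")  # assumption: similarity is undefined for empty regions
    return dot / (norm_u * norm_v)


# Toy per-gene counts for one cell (three genes).
nucleus = [2, 0, 1]
cytoplasm = [4, 0, 2]   # same composition at twice the depth
unrelated = [0, 3, 0]   # no genes shared with the nucleus

sim_same = cosine_similarity(nucleus, cytoplasm)   # approximately 1.0
sim_diff = cosine_similarity(nucleus, unrelated)   # 0.0
print(sim_same, sim_diff)
```

Cosine similarity ignores overall count depth, which is why the cytoplasm vector with doubled counts still scores about 1.0 against the nucleus.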

run_supervised(*, markers: dict[str, dict[str, list[str]]], cell_type_key: str = 'transferred_cell_type', inplace: bool = True, purity_kwargs: dict | None = None, contamination_kwargs: dict | None = None, mecr_kwargs: dict | None = None)

Run supervised (sp) metrics.

Convenience wrapper around supervised marker-based QC metrics. Runs, in order:

marker_purity (per-cell precision/recall/F1, neighborhood-aware negatives)
neighbor_contamination (per-cell + directed type-type matrices)
mutually_exclusive_coexpression_rate (MECR)

Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.

Parameters:

markers (dict) – {cell_type: {“positive”: list[str], “negative”: list[str]}}.
cell_type_key (str, default="transferred_cell_type") – Column in the AnnData .obs with cell-type labels.
inplace (bool, default=True) – If True, writes results into .obs / .uns / .uns[…] as implemented by the underlying functions and returns None. If False, returns all results as a dict.
purity_kwargs (dict or None, optional) – Extra args for sp.marker_purity(). (e.g. use_quantiles=…, weight_cont=…, require_neighbor_expression=…, neighbors_key=…)
contamination_kwargs (dict or None, optional) – Extra args for sp.neighbor_contamination(). (e.g. require_neighbor_expression=…, neighbors_key=…, uns_key=…, uns_key_binary=…)
mecr_kwargs (dict or None, optional) – Extra args for sp.mutually_exclusive_coexpression_rate(). (e.g. pseudocount=…)

Returns:

If inplace=True, returns None. If inplace=False, returns a dict with keys:

"marker_purity" (pd.DataFrame)
"neighbor_contamination" (dict with per-cell + matrices)
"mutually_exclusive_coexpression_rate" (pd.DataFrame)

Return type:

None or dict
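The nested shape expected for `markers` can be made concrete with a small example and a sanity check. `validate_markers` below is a hypothetical helper, not part of SegTraQ. Requiring both the "positive" and "negative" keys, and rejecting genes listed in both, are assumptions. The cell types and gene names are illustrative.

```python
def validate_markers(markers: dict) -> None:
    """Hypothetical check of the {cell_type: {"positive": [...], "negative": [...]}} layout."""
    for cell_type, spec in markers.items():
        missing = {"positive", "negative"} - set(spec)
        if missing:
            raise ValueError(f"{cell_type!r} is missing keys: {sorted(missing)}")
        for key in ("positive", "negative"):
            if not all(isinstance(gene, str) for gene in spec[key]):
                raise TypeError(f"{cell_type!r}[{key!r}] must contain gene names as str")
        # Assumption: a gene cannot be both a positive and a negative marker.
        overlap = set(spec["positive"]) & set(spec["negative"])
        if overlap:
            raise ValueError(f"{cell_type!r} lists {sorted(overlap)} as both positive and negative")


# Illustrative marker definition for two cell types.
markers = {
    "T cell": {"positive": ["CD3E", "CD2"], "negative": ["MS4A1"]},
    "B cell": {"positive": ["MS4A1", "CD79A"], "negative": ["CD3E"]},
}

validate_markers(markers)  # passes silently for a well-formed dict
```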

run_volume_metrics(*, vsi_map: ndarray | None = None, inplace: bool = True, similarity_kwargs: dict | None = None, heterotypic_overlap_kwargs: dict | None = None, vsi_kwargs: dict | None = None)

Run volume-layer (vl) metrics.

Convenience wrapper around segtraq.vl functions via the instance facade self.vl. Runs, in order:

similarity_top_bottom
fraction_heterotypic_overlap
vertical_signal_integrity_per_cell (only if vsi_map is provided)

Parameters:

vsi_map (np.ndarray or None, optional) – Precomputed 2D VSI map required for vertical_signal_integrity_per_cell. If None, VSI will be skipped.
inplace (bool, default=True) – If True, metrics are written into sdata.tables[tables_key].obs by the underlying methods and this function returns None. If False, returns a dict of result DataFrames.
similarity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.similarity_top_bottom().
heterotypic_overlap_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.fraction_heterotypic_overlap().
vsi_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.vertical_signal_integrity_per_cell().

Returns:

If inplace=True, returns None.

If inplace=False, returns a dict with keys:

"similarity_top_bottom": pd.DataFrame
"fraction_heterotypic_overlap": pd.DataFrame
"vertical_signal_integrity_per_cell": pd.DataFrame (only if vsi_map is not None)

Return type:

None or dict[str, object]

property sdata: Underlying SpatialData object (modifiable).
