The SegTraQ (SegTraQ) class
The SegTraQ (SegTraQ) class represents the core interface for computing SegTraQ metrics.
SegTraQ class
- class segtraq.SegTraQ(sdata: SpatialData, images_key: str | None = 'morphology_focus', tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str | None = 'cell_area', tables_centroid_x_key: str | None = 'x_centroid', tables_centroid_y_key: str | None = 'y_centroid', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', points_z_key: str | None = 'z', points_gene_key: str = 'feature_name', shapes_key: str = 'cell_boundaries', shapes_cell_id_key: str = 'cell_id', nucleus_shapes_key: str | None = 'nucleus_boundaries', nucleus_shapes_cell_id_key: str = 'cell_id')
Bases: object
Initialize a SegTraQ object, the core interface for computing SegTraQ metrics. Defaults target 10x Genomics Xenium; override keys for other technologies.
- Parameters:
sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).
images_key (str or None, optional, default="morphology_focus") – Key in sdata.images for a nuclear or morphology image (e.g., DAPI). Used for visualization or to derive a nucleus mask via segtraq.run_cellpose when using the nuclear correlation module (segtraq.nc). If None, no image is expected.
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table. Gene names in sdata.tables[tables_key].var.index should match the gene field in sdata.points[points_key] (see points_gene_key).
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
tables_area_key (str or None, optional, default="cell_area") – Column in the cell table with cell area (2D). If None, area will be computed via segtraq.bl.morphological_features.
tables_centroid_x_key (str or None, optional, default="x_centroid") – Column in the cell table with the x-coordinate of the cell centroid.
tables_centroid_y_key (str or None, optional, default="y_centroid") – Column in the cell table with the y-coordinate of the cell centroid.
points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.
points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.
points_z_key (str or None, optional, default="z") – Column for the z-coordinate (3D data). If None, data are treated as 2D.
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.
shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.
shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).
nucleus_shapes_key (str or None, optional, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available. If None, a nucleus mask can be obtained via segtraq.run_cellpose.
nucleus_shapes_cell_id_key (str, optional, default="cell_id") – Cell ID key for sdata.shapes[nucleus_shapes_key]. Must match either the shapes index name or a column name (which will be set as the index if needed).
Notes
After initializing a SegTraQ instance, all SegTraQ modules can be run directly from the object using its module facades.
Wrappers (run_baseline, run_nuclear_correlation, etc.) to run all metrics of a module are provided below.
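As a minimal sketch of overriding the Xenium defaults, the keyword arguments below describe a hypothetical 2D dataset without nucleus boundaries. The parameter names come from the signature above; the chosen values and the variable name segtraq_kwargs are assumptions for illustration. The constructor call is left commented out because it needs a real SpatialData object.

```python
# Keyword overrides for a hypothetical 2D dataset without nucleus polygons.
# Parameter names are taken from the SegTraQ signature; values are illustrative.
segtraq_kwargs = {
    "points_z_key": None,        # no z column: data are treated as 2D
    "nucleus_shapes_key": None,  # no nucleus boundary polygons available
    "tables_area_key": None,     # no precomputed area column in the cell table
}

# With a SpatialData object `sdata` loaded elsewhere, the call would look like:
# import segtraq
# st = segtraq.SegTraQ(sdata, **segtraq_kwargs)
print(sorted(segtraq_kwargs))
```

All parameters not listed in the dictionary keep their documented defaults.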
- filter_cells(col: str, func: Callable, inplace: bool = True)
Filter cells from the cell table based on a user-defined function.
- Parameters:
col (str) – Column in the cell table to apply the filtering function on.
func (Callable) – A function that takes a single argument (the column value) and returns True if the cell should be kept, False otherwise.
inplace (bool, default=True) – If True, modifies self.sdata in place. If False, returns a new SpatialData object with the filtered cells.
- Returns:
If inplace=True: returns None after modifying self.sdata.
If inplace=False: returns a new SpatialData object with filtered cells.
- Return type:
None or SpatialData
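The predicate semantics described above (func receives one column value and returns True to keep the cell) can be illustrated without SegTraQ. This is only a sketch on a toy list of records; the helper keep_rows and the sample values are hypothetical and do not reflect how filter_cells is implemented internally.

```python
from typing import Any, Callable


def keep_rows(
    rows: list[dict[str, Any]], col: str, func: Callable[[Any], bool]
) -> list[dict[str, Any]]:
    """Return only the rows whose value in `col` makes `func` return True."""
    return [row for row in rows if func(row[col])]


# Toy stand-in for the cell table (hypothetical values).
cells = [
    {"cell_id": "a", "cell_area": 80.0},
    {"cell_id": "b", "cell_area": 150.0},
    {"cell_id": "c", "cell_area": 100.0},
]

kept = keep_rows(cells, col="cell_area", func=lambda x: x > 100)
print([row["cell_id"] for row in kept])  # ['b']
```

Note that a strict comparison such as x > 100 drops cells whose area is exactly 100.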
Example
>>> st.filter_cells(col='cell_area', func=lambda x: x > 100)
- filter_control_and_low_quality_transcripts(min_qv: float = 20.0, control_genes: tuple | list = (), recompute_expression: bool = False, inplace: bool = True)
Filter control and low-quality transcripts from the SpatialData object. This is always done in place.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing transcript data.
min_qv (float | None, default=20.0) – Minimum quality value (qv) threshold for transcripts to be considered valid. If None, no filtering is applied based on quality.
control_genes (tuple | list, default=()) – Additional keywords to identify control probes in gene names. By default, only standard control prefixes are used. These are: “NegControlProbe_”, “antisense_”, “NegControlCodeword”, “BLANK_”, “Blank-”, “NegPrb”, “DeprecatedCodeword_”, “UnassignedCodeword_”.
points_key (str, default="transcripts") – The key in the SpatialData points attribute that contains transcript data.
points_gene_key (str, default="feature_name") – The column name in the points DataFrame that contains gene names.
points_cell_id_key (str, default="cell_id") – The column name in the points DataFrame that contains cell IDs.
tables_key (str, default="table") – The key in the SpatialData tables attribute that contains the expression table.
recompute_expression (bool, default=False) – Whether to recompute the expression matrix after filtering. Note that this can be computationally expensive for large datasets.
inplace (bool, default=True) – Whether to modify the SpatialData object in place. Defaults to True.
- Returns:
The updated SpatialData object with invalid transcripts marked (in an extra column).
- Return type:
sd.SpatialData
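The sketch below shows one plausible way the two filters could combine: a gene is treated as a control if it starts with one of the standard prefixes listed above or contains a user-supplied keyword, and a transcript is low quality if its qv is below min_qv. The function is_valid_transcript, the substring matching for control_genes, and the strict "below threshold" comparison are assumptions, not SegTraQ's confirmed behavior.

```python
# Standard control prefixes as listed in the parameter description.
DEFAULT_CONTROL_PREFIXES = (
    "NegControlProbe_", "antisense_", "NegControlCodeword", "BLANK_",
    "Blank-", "NegPrb", "DeprecatedCodeword_", "UnassignedCodeword_",
)


def is_valid_transcript(gene, qv, min_qv=20.0, control_genes=()):
    """Hypothetical validity check; not SegTraQ's actual implementation."""
    if gene.startswith(DEFAULT_CONTROL_PREFIXES):
        return False  # standard control probe
    if any(keyword in gene for keyword in control_genes):
        return False  # user-defined control keyword (assumed substring match)
    if min_qv is not None and qv < min_qv:
        return False  # low-quality transcript
    return True


# Toy (gene, qv) pairs; values are made up.
transcripts = [("ACTB", 35.0), ("BLANK_0012", 40.0), ("GAPDH", 12.0), ("MyCtrl_1", 30.0)]
flags = [is_valid_transcript(g, q, control_genes=("MyCtrl",)) for g, q in transcripts]
print(flags)  # [True, False, False, False]
```

Passing min_qv=None in this sketch disables the quality filter, mirroring the documented meaning of None.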
- run_baseline(inplace: bool = True, *, morphological_kwargs: dict | None = None)
Run baseline (bl) metrics.
Convenience wrapper around global and per-cell summary metrics. Runs, in order:
number of cells
number of transcripts
number of genes
% unassigned transcripts
% unassigned transcripts per gene
transcripts per cell
genes per cell
mean transcripts per detected gene per cell
morphological features
transcript density
- Parameters:
inplace (bool, default=True) – If True, results are merged into .uns, .obs, and/or .var as implemented by each metric, and None is returned. If False, per-metric results are returned in a dict.
morphological_kwargs (dict or None, optional) – Extra arguments forwarded to bl.morphological_features().
- Returns:
If inplace=True, returns None. If inplace=False, returns a dict with keys:
"num_cells"
"num_transcripts"
"num_genes"
"perc_unassigned_transcripts"
"perc_unassigned_transcripts_per_gene"
"transcripts_per_cell"
"genes_per_cell"
"mean_transcripts_per_gene_per_cell"
"morphological_features"
"transcript_density"
- Return type:
None or dict
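To make a few of the baseline quantities concrete, the sketch below computes the percentage of unassigned transcripts, transcripts per cell, and genes per cell from a toy list of (cell_id, gene) records. It uses only the standard library; the data and variable names are hypothetical, and the exact definitions used by the bl module may differ.

```python
from collections import Counter, defaultdict

BACKGROUND_ID = "UNASSIGNED"  # documented default of points_background_id

# Toy transcript records as (cell_id, gene); values are made up.
transcripts = [
    ("c1", "ACTB"), ("c1", "ACTB"), ("c1", "GAPDH"),
    ("c2", "GAPDH"),
    (BACKGROUND_ID, "ACTB"),
]

# Transcripts assigned to a cell (everything that is not background).
assigned = [(c, g) for c, g in transcripts if c != BACKGROUND_ID]
perc_unassigned = 100.0 * (len(transcripts) - len(assigned)) / len(transcripts)

# Per-cell transcript counts and numbers of distinct genes.
transcripts_per_cell = Counter(c for c, _ in assigned)
genes_by_cell = defaultdict(set)
for c, g in assigned:
    genes_by_cell[c].add(g)
genes_per_cell = {c: len(gs) for c, gs in genes_by_cell.items()}

print(perc_unassigned)             # 20.0
print(dict(transcripts_per_cell))  # {'c1': 3, 'c2': 1}
print(genes_per_cell)              # {'c1': 2, 'c2': 1}
```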
- run_clustering_stability(key_prefix: str = 'leiden_subset', inplace: bool = True, connectedness_kwargs: dict | None = None, silhouette_kwargs: dict | None = None, purity_kwargs: dict | None = None, ari_kwargs: dict | None = None)
Run clustering-stability metrics.
This method is a convenience wrapper around the clustering-stability (cs) functions. It runs, in order:
cluster connectedness
silhouette score
purity (subset stability)
ARI (subset stability)
Only parameters shared by all four computations are exposed explicitly. All other parameters are provided via method-specific *_kwargs dictionaries.
- Parameters:
key_prefix (str, default="leiden_subset") – Prefix for Leiden clustering labels written to .obs by the underlying methods (where applicable).
inplace (bool, default=True) – If True, metrics are written to sdata.tables[“table”].uns by the underlying methods and this function returns None. If False, the computed metrics are returned as a dictionary.
connectedness_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_cluster_connectedness().
silhouette_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_silhouette_score().
purity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_purity().
ari_kwargs (dict or None, optional) – Additional keyword arguments forwarded to cs.compute_ari().
- Returns:
If inplace=True, returns None. If inplace=False, returns a dict with keys:
"cluster_connectedness": float
"silhouette_score": float
"mean_purity": float
"mean_ari": float
- Return type:
None or dict
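For readers unfamiliar with the adjusted Rand index (ARI) reported as "mean_ari", the following standard-library sketch computes the ARI between two labelings of the same cells. It illustrates the metric itself only; how SegTraQ draws subsets and averages the scores is not described here, and the function name adjusted_rand_index is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: labelings trivially agree
    # Pairs of items that share a cluster in both, in A only, and in B only.
    sum_pairs = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_pairs - expected) / (max_index - expected)


# Same partition with swapped label names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because the ARI is corrected for chance, unrelated partitions score near zero and can even be negative.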
- run_label_transfer(adata_ref=<class 'anndata._core.anndata.AnnData'>, tx_min: float = 10.0, tx_max: float = 2000.0, gn_min: float = 5.0, gn_max: float = inf, cell_type_key: str = 'transferred_cell_type', ref_cell_type: str = 'cell_type', ref_ensemble_key: str | None = None, query_ensemble_key: str | None = 'gene_ids', inplace: bool = True)
Transfer cell labels from a reference AnnData to sdata.tables[tables_key] by Pearson correlation to reference mean profiles.
- Parameters:
sdata (SpatialData-like) – Container with .tables[tables_key] as AnnData, and points needed for QC if absent. sdata.tables[tables_key].X values are ideally normalized and log1p transformed. Otherwise transformation will be performed before running label transfer.
adata_ref (AnnData) – Reference dataset (ideally normalized & log1p). Otherwise transformation will be performed before running label transfer.
ref_cell_type (str) – Column in adata_ref.obs with reference cell types.
tables_key (str) – Key of the AnnData table in sdata.tables.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
points_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
points_gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).
tx_min (float, default=10.0) – Minimum number of transcripts per cell for pre-filtering.
tx_max (float, default=2000.0) – Maximum number of transcripts per cell for pre-filtering.
gn_min (float, default=5.0) – Minimum number of genes per cell for pre-filtering.
gn_max (float, default=inf) – Maximum number of genes per cell for pre-filtering.
cell_type_key (str) – Column name to store transferred labels in .obs when inplace=True.
ref_ensemble_key (str or None, default=None) – Column name in adata_ref.var that contains unique gene/ensemble IDs. If None, adata_ref.var_names will be used.
query_ensemble_key (str or None, default="gene_ids") – Column name in self.sdata.tables[self.tables_key].var that contains unique gene/ensemble IDs. If None, self.sdata.tables[self.tables_key].var_names will be used.
q_gene_key (str)
inplace (bool) – If True, writes labels into sdata.tables[tables_key].obs and returns None. If False, returns a DataFrame with [‘cell_id’, ‘transferred_cell_type’, ‘pearson_score’].
- Returns:
None when inplace=True; otherwise a DataFrame of assignments.
- Return type:
None or pd.DataFrame
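The idea of assigning labels "by Pearson correlation to reference mean profiles" can be sketched in a few lines: correlate each query cell's expression vector with the mean profile of every reference cell type and keep the best match. The helpers pearson and transfer_label, the three-gene profiles, and the zero-variance fallback are illustrative assumptions; the real method additionally handles normalization, gene matching, and pre-filtering.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat undefined correlation as no similarity
    return cov / sqrt(vx * vy)


def transfer_label(query_profile, ref_means):
    """Return (best_cell_type, score) for one query cell."""
    scores = {ct: pearson(query_profile, prof) for ct, prof in ref_means.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical mean expression per reference cell type over three shared genes.
ref_means = {"T cell": [5.0, 0.5, 0.1], "B cell": [0.2, 4.0, 0.3]}
query = {"cell_1": [4.0, 1.0, 0.0], "cell_2": [0.0, 3.0, 0.5]}

labels = {cell: transfer_label(vec, ref_means)[0] for cell, vec in query.items()}
print(labels)  # {'cell_1': 'T cell', 'cell_2': 'B cell'}
```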
- run_point_statistics(genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, inplace: bool = True, *, centroid_kwargs: dict | None = None, membrane_kwargs: dict | None = None, skew_kwargs: dict | None = None, compartments_kwargs: dict | None = None)
Run point-statistics (ps) metrics.
Convenience wrapper around point-level spatial statistics. Applies shared transcript and cell filtering (by gene(s) and cell type) and runs, in order:
percentage of transcripts in compartments (nucleus overlap, cytoplasm, outside)
distance to centroid (cell or nucleus)
distance to membrane (cell or nucleus)
membrane-distance skewness
Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.
- Parameters:
genes (str | list[str] | None, optional) – Gene(s) to include. If None, all genes are used.
cell_type_key (str, default="transferred_cell_type") – Cell-type annotation key in sdata.tables[…].obs.
cell_type_query (str | list[str] | None, optional) – Restrict computations to cells matching these label(s).
inplace (bool, default=True) – If True, results are merged into .obs and None is returned. If False, per-metric results are returned.
centroid_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_centroid().
membrane_kwargs (dict or None, optional) – Extra arguments for ps.distance_to_membrane().
skew_kwargs (dict or None, optional) – Extra arguments for ps.membrane_distance_skewness().
compartments_kwargs (dict or None, optional) – Extra arguments for ps.percentage_transcripts_in_compartments().
- Returns:
If inplace=True, returns None. If inplace=False, returns a dict with keys:
"percentage_transcripts_in_compartments"
"distance_to_centroid"
"distance_to_membrane"
"membrane_distance_skewness"
- Return type:
None or dict
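The wrapper design described above (shared arguments exposed explicitly, everything else forwarded through per-method *_kwargs dictionaries) follows a common Python pattern, sketched here with stand-in functions. The stand-ins and their extra parameters scale and use_nucleus are hypothetical and are not part of the ps module.

```python
def distance_to_centroid(genes=None, scale=1.0):
    """Hypothetical stand-in for a per-metric function."""
    return {"genes": genes, "scale": scale}


def distance_to_membrane(genes=None, use_nucleus=False):
    """Hypothetical stand-in for a second per-metric function."""
    return {"genes": genes, "use_nucleus": use_nucleus}


def run_all(genes=None, *, centroid_kwargs=None, membrane_kwargs=None):
    """Pass shared arguments to every metric and extras to one metric each."""
    return {
        # `or {}` turns the default None into an empty set of extra arguments.
        "distance_to_centroid": distance_to_centroid(genes=genes, **(centroid_kwargs or {})),
        "distance_to_membrane": distance_to_membrane(genes=genes, **(membrane_kwargs or {})),
    }


results = run_all(genes=["ACTB"], membrane_kwargs={"use_nucleus": True})
print(results["distance_to_membrane"])  # {'genes': ['ACTB'], 'use_nucleus': True}
print(results["distance_to_centroid"])  # {'genes': ['ACTB'], 'scale': 1.0}
```

Using None rather than {} as the default avoids sharing one mutable dictionary across calls.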
- run_region_similarity(metric: str = 'cosine_sim', n_jobs: int = -1, inplace: bool = True, iou_kwargs: dict = None, similarity_nucleus_cell_kwargs: dict = None, similarity_nucleus_cytoplasm_kwargs: dict = None, similarity_border_neighborhood_kwargs: dict = None)
Compute region similarity metrics and optionally merge them into the cell table.
This runs, in order:
1) IoU between each cell and its best-matching nucleus
2) Similarity between per-cell expression and its matched nucleus
3) Similarity between the cell’s nucleus and cytoplasm expression
4) Similarity of gene expression between an eroded interior (“center”) and a thin outer shell (“border”), and between the border and the neighborhood
- Parameters:
metric (str, default="cosine_sim")
n_jobs (int, default=-1)
inplace (bool, default=True) – If True, writes results into sdata.tables[tables_key].obs and returns None. If False, returns a dictionary of DataFrames without writing.
iou_kwargs (dict, optional) – Additional keyword arguments to pass to match_nuclei_to_cells.
similarity_nucleus_cell_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_nucleus_cell.
similarity_nucleus_cytoplasm_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_nucleus_cytoplasm.
similarity_border_neighborhood_kwargs (dict, optional) – Additional keyword arguments to pass to similarity_border_neighborhood.
- Returns:
If inplace=True, returns None after writing to sdata.
If inplace=False, returns a dict with keys:
“ious”: pd.DataFrame
“similarity_nucleus_cell”: pd.DataFrame
“similarity_nucleus_cytoplasm”: pd.DataFrame
“similarity_border_neighborhood”: pd.DataFrame
- Return type:
None or dict
Notes
Requires self.nucleus_shapes_key (nucleus boundaries).
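Assuming the default metric name 'cosine_sim' refers to cosine similarity between per-region gene count vectors, the sketch below shows how such a score behaves on toy data. The helper cosine_similarity, the three-gene vectors, and the choice to return 0.0 for an all-zero vector are assumptions for illustration.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: empty regions count as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical gene counts (same gene order) for regions of one cell.
nucleus_counts = [10, 0, 5]
cytoplasm_counts = [20, 0, 10]  # same composition at twice the depth
foreign_counts = [0, 7, 0]      # disjoint gene usage

print(round(cosine_similarity(nucleus_counts, cytoplasm_counts), 6))  # 1.0
print(cosine_similarity(nucleus_counts, foreign_counts))              # 0.0
```

Cosine similarity ignores total count depth, so regions with the same gene composition score close to 1 even when one contains far more transcripts.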
- run_supervised(*, markers: dict[str, dict[str, list[str]]], cell_type_key: str = 'transferred_cell_type', inplace: bool = True, purity_kwargs: dict | None = None, contamination_kwargs: dict | None = None, mecr_kwargs: dict | None = None)
Run supervised (sp) metrics.
Convenience wrapper around supervised marker-based QC metrics. Runs, in order:
marker_purity (per-cell precision/recall/F1, neighborhood-aware negatives)
neighbor_contamination (per-cell + directed type-type matrices)
mutually_exclusive_coexpression_rate (MECR)
Only parameters shared by all computations are exposed explicitly. All other parameters are forwarded via method-specific *_kwargs dictionaries.
- Parameters:
markers (dict) – {cell_type: {“positive”: list[str], “negative”: list[str]}}.
cell_type_key (str, default="transferred_cell_type") – Column in the AnnData .obs with cell-type labels.
inplace (bool, default=True) – If True, writes results into .obs / .uns / .uns[…] as implemented by the underlying functions and returns None. If False, returns all results as a dict.
purity_kwargs (dict or None, optional) – Extra args for sp.marker_purity() (e.g. use_quantiles=…, weight_cont=…, require_neighbor_expression=…, neighbors_key=…).
contamination_kwargs (dict or None, optional) – Extra args for sp.neighbor_contamination() (e.g. require_neighbor_expression=…, neighbors_key=…, uns_key=…, uns_key_binary=…).
mecr_kwargs (dict or None, optional) – Extra args for sp.mutually_exclusive_coexpression_rate() (e.g. pseudocount=…).
- Returns:
If inplace=True, returns None. If inplace=False, returns a dict with keys:
"marker_purity" (pd.DataFrame)
"neighbor_contamination" (dict with per-cell + matrices)
"mutually_exclusive_coexpression_rate" (pd.DataFrame)
- Return type:
None or dict
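The markers argument is a nested dictionary of the form {cell_type: {"positive": [...], "negative": [...]}}. The sketch below builds such a dictionary for two illustrative cell types and checks its layout before it would be passed on. The helper validate_markers is hypothetical, and requiring both keys for every cell type is an assumption, not a documented rule.

```python
def validate_markers(markers):
    """Check the {cell_type: {"positive": [...], "negative": [...]}} layout."""
    for cell_type, spec in markers.items():
        if not isinstance(spec, dict) or set(spec) != {"positive", "negative"}:
            raise ValueError(f"{cell_type!r}: expected keys 'positive' and 'negative'")
        for key in ("positive", "negative"):
            genes = spec[key]
            if not isinstance(genes, list) or not all(isinstance(g, str) for g in genes):
                raise ValueError(f"{cell_type!r}[{key!r}] must be a list of gene names")
    return True


# Illustrative marker sets; adapt the gene names to the panel in use.
markers = {
    "T cell": {"positive": ["CD3E", "CD2"], "negative": ["MS4A1"]},
    "B cell": {"positive": ["MS4A1", "CD79A"], "negative": ["CD3E"]},
}

print(validate_markers(markers))  # True
# With a SegTraQ instance `st`, the call would look like:
# st.run_supervised(markers=markers)
```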
- run_volume_metrics(*, vsi_map: ndarray | None = None, inplace: bool = True, similarity_kwargs: dict | None = None, heterotypic_overlap_kwargs: dict | None = None, vsi_kwargs: dict | None = None)
Run volume-layer (vl) metrics.
Convenience wrapper around segtraq.vl functions via the instance facade self.vl. Runs, in order:
similarity_top_bottom
fraction_heterotypic_overlap
vertical_signal_integrity_per_cell (only if vsi_map is provided)
- Parameters:
vsi_map (np.ndarray or None, optional) – Precomputed 2D VSI map required for vertical_signal_integrity_per_cell. If None, VSI will be skipped.
inplace (bool, default=True) – If True, metrics are written into sdata.tables[tables_key].obs by the underlying methods and this function returns None. If False, returns a dict of result DataFrames.
similarity_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.similarity_top_bottom().
heterotypic_overlap_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.fraction_heterotypic_overlap().
vsi_kwargs (dict or None, optional) – Additional keyword arguments forwarded to vl.vertical_signal_integrity_per_cell().
- Returns:
If inplace=True, returns None.
If inplace=False, returns a dict with keys:
"similarity_top_bottom": pd.DataFrame
"fraction_heterotypic_overlap": pd.DataFrame
"vertical_signal_integrity_per_cell": pd.DataFrame (only if vsi_map is not None)
- Return type:
None or dict[str, object]
- property sdata
Underlying SpatialData object (modifiable).