The point statistics (ps) accessor

The point statistics metrics accessor provides metrics to compare the mean spatial distribution of transcripts in relation to the centroid and the segmented outline of the cell.

segtraq.ps.point_statistics.distance_to_centroid(sdata: SpatialData, genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str = 'cell_area', points_gene_key: str = 'feature_name', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', shapes_key: str = 'cell_boundaries', nucleus_shapes_key: str | None = 'nucleus_boundaries', centroid_region: Literal['cell', 'nucleus'] = 'cell', restrict_to_within_boundary: bool = False, select_by: Literal['iou', 'nucleus_fraction'] = 'nucleus_fraction', min_intersection_area: float = 0.0, n_jobs: int = 1, inplace: bool = True) → DataFrame

Compute the Euclidean distance between (i) the mean transcript coordinate per cell and (ii) a centroid derived from either cell or nucleus shapes.

If centroid_region="cell", distances are measured to the centroid of sdata.shapes[shapes_key]. If centroid_region="nucleus", each cell is first matched to a nucleus (see select_by, min_intersection_area) and distances are measured to that nucleus centroid. Optionally, transcripts can be restricted to lie within the cell boundary (restrict_to_within_boundary=True).
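The computation described above can be sketched on a toy table. This is an illustration only, not the segtraq implementation: the centroid_x, centroid_y, distance and distance_norm column names are hypothetical, the centroids are given directly instead of being derived from sdata.shapes, and normalization by sqrt(cell_area) is an assumption borrowed from the distance_to_membrane description.

```python
import numpy as np
import pandas as pd

# Toy transcript table; column names mirror the documented defaults.
points = pd.DataFrame(
    {
        "cell_id": ["a", "a", "a", "b", "b", "UNASSIGNED"],
        "x": [0.0, 2.0, 4.0, 10.0, 12.0, 50.0],
        "y": [0.0, 0.0, 3.0, 10.0, 10.0, 50.0],
    }
)

# Hypothetical per-cell reference centroids and areas.
cells = pd.DataFrame(
    {
        "cell_id": ["a", "b"],
        "centroid_x": [2.0, 11.0],
        "centroid_y": [4.0, 14.0],
        "cell_area": [25.0, 16.0],
    }
).set_index("cell_id")

# Drop background transcripts, then average the coordinates per cell.
assigned = points[points["cell_id"] != "UNASSIGNED"]
mean_xy = assigned.groupby("cell_id")[["x", "y"]].mean()

# Euclidean distance between the mean transcript coordinate and the centroid.
result = cells.join(mean_xy)
result["distance"] = np.hypot(
    result["x"] - result["centroid_x"], result["y"] - result["centroid_y"]
)

# Assumption: normalize by sqrt(cell_area) as a length scale.
result["distance_norm"] = result["distance"] / np.sqrt(result["cell_area"])
```

Here cell "a" has a raw distance of 3 and a normalized distance of 0.6, while cell "b" has a raw distance of 4 and a normalized distance of 1.0.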

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.

  • genes (str | list[str] | None, optional) – String or list of strings indicating the feature/gene(s) to calculate the mean transcript coordinates on. If None, all genes are used.

  • cell_type_key (str, default="transferred_cell_type") – Column in sdata.tables[tables_key].obs with cell-type labels.

  • cell_type_query (str | list[str] | None, optional) – If provided, compute the metric only for cells whose cell_type_key matches these label(s).

  • tables_key (str, default="table") – The key to access the AnnData table from sdata.tables. Default is “table”.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • tables_area_key (str, default="cell_area") – Column in the table with cell area (used for normalization).

  • points_gene_key (str, default="feature_name") – The key to access gene names within the transcript data. Default is “feature_name”.

  • points_key (str, default="transcripts") – The key in sdata.points for the transcript data. Default is “transcripts”.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.

  • points_background_id (str | int, default="UNASSIGNED") – The cell ID value indicating background transcripts that should be ignored.

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.

  • shapes_key (str, default="cell_boundaries") – The key in sdata.shapes for the cell boundary polygons. Default is “cell_boundaries”.

  • nucleus_shapes_key (str | None, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons (required if centroid_region="nucleus").

  • centroid_region ({"cell","nucleus"}, default="cell") – Which shape centroid to use as the reference for distances.

  • restrict_to_within_boundary (bool, default=False) – If True, keep only transcripts that fall within the cell boundary. Uses covers, so points on the boundary are included.

  • select_by ({"iou","nucleus_fraction"}, default="nucleus_fraction") – Criterion to choose the best nucleus for each cell when centroid_region="nucleus".

  • min_intersection_area (float, default=0.0) – Minimum overlap area required to consider a nucleus a candidate for a cell.

  • n_jobs (int, default=1) – Number of parallel jobs for cell-nucleus matching (if needed). -1 uses all CPUs.

  • inplace (bool, default=True) – Whether to add the results to sdata.tables. Default is True.

Returns:

If inplace=False, returns a DataFrame containing per-cell mean transcript x/y, the chosen centroid x/y, raw distance, and the normalized distance column distance_<feature>. If inplace=True, returns the merged (two-column) DataFrame used to write into .obs.

Return type:

pd.DataFrame

segtraq.ps.point_statistics.distance_to_membrane(sdata: SpatialData, genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str = 'cell_area', points_gene_key: str = 'feature_name', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', shapes_key: str = 'cell_boundaries', nucleus_shapes_key: str | None = 'nucleus_boundaries', membrane_region: Literal['cell', 'nucleus'] = 'cell', restrict_to_within_boundary: bool = False, select_by: Literal['iou', 'nucleus_fraction'] = 'nucleus_fraction', min_intersection_area: float = 0.0, n_jobs: int = 1, signed: bool = True, inverse_score: bool = True, eps: float = 1e-06, inplace: bool = True) → DataFrame

Compute the mean transcript distance to the boundary (“membrane”) of either the cell or the matched nucleus.

For each transcript, the distance is measured to the boundary of the selected polygon:
  • membrane_region="cell": uses sdata.shapes[shapes_key].

  • membrane_region="nucleus": matches each cell to a nucleus (see select_by, min_intersection_area) and uses the boundary of that nucleus for all transcripts assigned to the cell.

Distances can be returned as signed (positive inside/on the polygon, negative outside), optionally restricted to transcripts within the boundary, and aggregated per cell (mean distance). A normalized version divides by sqrt(cell_area) as a length scale.
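The sign convention and the per-cell aggregation can be illustrated with a minimal sketch. It is not the library's code: the helper signed_distance_to_rectangle is hypothetical and replaces the polygon geometry with an axis-aligned rectangle so that the example needs only NumPy. Applying the inverse score to the normalized per-cell mean is likewise an assumption, since the text does not say at which stage it is computed.

```python
import numpy as np


def signed_distance_to_rectangle(x, y, xmin, ymin, xmax, ymax):
    """Signed distance to the rectangle boundary: >= 0 inside or on it, < 0 outside."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Distance to the nearest edge, valid for points inside the rectangle.
    inside = np.minimum.reduce([x - xmin, xmax - x, y - ymin, ymax - y])
    # Euclidean distance to the rectangle for points outside it (0 for points inside).
    dx = np.maximum.reduce([xmin - x, np.zeros_like(x), x - xmax])
    dy = np.maximum.reduce([ymin - y, np.zeros_like(y), y - ymax])
    outside = np.hypot(dx, dy)
    return np.where(outside > 0, -outside, inside)


# Toy cell: a 10 x 10 square, so cell_area = 100 and sqrt(cell_area) = 10.
xmin, ymin, xmax, ymax = 0.0, 0.0, 10.0, 10.0
cell_area = (xmax - xmin) * (ymax - ymin)

# Four transcripts assigned to the cell; the third one lies outside the polygon.
tx_x = np.array([5.0, 1.0, 13.0, 5.0])
tx_y = np.array([5.0, 5.0, 14.0, 7.0])

signed = signed_distance_to_rectangle(tx_x, tx_y, xmin, ymin, xmax, ymax)
mean_signed = signed.mean()                   # corresponds to signed=True
mean_unsigned = np.abs(signed).mean()         # corresponds to signed=False
mean_within = signed[signed >= 0].mean()      # restrict_to_within_boundary=True
mean_norm = mean_signed / np.sqrt(cell_area)  # length-scale normalization

# Assumption: the inverse score is taken on the normalized per-cell mean.
eps = 1e-6
inverse = 1.0 / np.sqrt(abs(mean_norm) + eps)
```

The per-transcript signed distances are 5, 1, -5 and 3, so the transcript outside the cell pulls the signed mean down to 1.0, whereas the unsigned mean is 3.5.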

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.

  • genes (str | list[str] | None, optional) – String or list of strings indicating the feature/gene(s) to calculate the mean transcript distances on. If None, all genes are used.

  • cell_type_key (str, default="transferred_cell_type") – Column in sdata.tables[tables_key].obs with cell-type labels.

  • cell_type_query (str | list[str] | None, optional) – If provided, compute the metric only for cells whose cell_type_key matches these label(s).

  • tables_key (str, default="table") – The key to access the AnnData table from sdata.tables. Default is “table”.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • tables_area_key (str, default="cell_area") – Column in the table with cell area (used for normalization).

  • points_gene_key (str, default="feature_name") – The key to access gene names within the transcript data. Default is “feature_name”.

  • points_key (str, default="transcripts") – The key in sdata.points for the transcript data. Default is “transcripts”.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.

  • points_background_id (str | int, default="UNASSIGNED") – The cell ID value indicating background transcripts that should be ignored.

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.

  • shapes_key (str, default="cell_boundaries") – The key in sdata.shapes for the cell boundary polygons. Default is “cell_boundaries”.

  • nucleus_shapes_key (str | None, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons (required if membrane_region="nucleus").

  • membrane_region ({"cell","nucleus"}, default="cell") – Which boundary to use when computing distances.

  • restrict_to_within_boundary (bool, default=False) – If True, keep only transcripts that fall within the cell boundary (uses covers, so boundary points are included).

  • select_by ({"iou","nucleus_fraction"}, default="nucleus_fraction") – Criterion to choose the best nucleus for each cell when membrane_region="nucleus".

  • min_intersection_area (float, default=0.0) – Minimum overlap area required to consider a nucleus a candidate for a cell.

  • n_jobs (int, default=1) – Number of parallel jobs for cell-nucleus matching (if needed). -1 uses all CPUs.

  • signed (bool, default=True) – If True, returns signed distances (positive if transcript is inside/on the polygon, negative if outside). If False, returns unsigned distances to the boundary.

  • inverse_score (bool, default=True) – If True, also computes an inverse-style score that is high when distance is small: 1 / sqrt(abs(distance) + eps).

  • eps (float, default=1e-6) – Small constant for numerical stability in inverse_score.

  • inplace (bool, default=True) – Whether to add the results to sdata.tables. Default is True.

Returns:

If inplace=False, returns a DataFrame with per-cell mean distance columns:
  • distance_to_{membrane_region}_membrane_norm_<feature>

  • optionally distance_to_{membrane_region}_membrane_inverse_<feature>

If inplace=True, returns the DataFrame that was merged into .obs.

Return type:

pd.DataFrame

segtraq.ps.point_statistics.membrane_distance_skewness(sdata: SpatialData, genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', points_gene_key: str = 'feature_name', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', shapes_key: str = 'cell_boundaries', min_transcripts: int = 5, inplace: bool = True) → DataFrame

Compute per-cell skewness of transcript distances to the cell boundary (membrane), using only transcripts that are geometrically within/on the cell polygon.

The function optionally filters by cell_type_query, selects non-background transcripts assigned to those cells (optionally by genes), keeps transcripts inside or on the cell polygon, computes their distance to the polygon boundary, and aggregates these distances per cell to obtain skewness, returning NaN for mean and skewness when fewer than min_transcripts are available.
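The aggregation step can be sketched as follows, starting from distances that have already been computed for transcripts inside the cell. The skewness estimator the library uses is not stated here, so this sketch assumes the biased Fisher-Pearson coefficient m3 / m2**1.5. The helper names skewness and per_cell_skewness and the distance column are hypothetical.

```python
import numpy as np
import pandas as pd


def skewness(values):
    """Biased Fisher-Pearson sample skewness, m3 / m2**1.5 (assumed estimator)."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    m2 = np.mean(centered**2)
    m3 = np.mean(centered**3)
    return m3 / m2**1.5 if m2 > 0 else np.nan


def per_cell_skewness(distances, min_transcripts=5):
    """Return one skewness value per cell, NaN when a cell has too few transcripts."""

    def agg(group):
        if len(group) < min_transcripts:
            return np.nan
        return skewness(group.to_numpy())

    return distances.groupby("cell_id")["distance"].apply(agg)


# Toy within-cell boundary distances for three cells.
distances = pd.DataFrame(
    {
        "cell_id": ["a"] * 5 + ["b"] * 2 + ["c"] * 5,
        "distance": [1, 1, 1, 1, 6, 2, 3, 1, 2, 3, 4, 5],
    }
)

skew = per_cell_skewness(distances, min_transcripts=5)
```

Cell "a" has most transcripts close to the membrane and one far from it, giving a positive skewness of 1.5. Cell "b" has fewer than min_transcripts transcripts and yields NaN, and the symmetric distribution in cell "c" yields 0.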

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.

  • genes (str | list[str] | None, optional) – String or list of strings indicating the feature/gene(s) to calculate the mean transcript distances on. If None, all genes are used.

  • cell_type_key (str, default="transferred_cell_type") – Column in sdata.tables[tables_key].obs with cell-type labels.

  • cell_type_query (str | list[str] | None, optional) – If provided, compute the metric only for cells whose cell_type_key matches these label(s).

  • tables_key (str, default="table") – The key to access the AnnData table from sdata.tables. Default is “table”.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • points_gene_key (str, default="feature_name") – The key to access gene names within the transcript data. Default is “feature_name”.

  • points_key (str, default="transcripts") – The key in sdata.points for the transcript data. Default is “transcripts”.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.

  • points_background_id (str | int, default="UNASSIGNED") – The cell ID value indicating background transcripts that should be ignored.

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.

  • shapes_key (str, default="cell_boundaries") – The key in sdata.shapes for the cell boundary polygons. Default is “cell_boundaries”.

  • min_transcripts (int, default=5) – Minimum number of transcripts required to compute a skewness.

  • inplace (bool, default=True) – Whether to add the results to sdata.tables. Default is True.

Returns:

Per-cell results with columns:
  • points_cell_id_key

  • skew_dist_to_{membrane_region}_membrane_<feature>

where <feature> is:
  • “all_genes” if genes is None

  • the gene name if genes is a single string

  • “<k>_genes” if genes is a list of length k (k>1)
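The <feature> naming rule can be written as a small helper. The function name feature_suffix is hypothetical and not part of segtraq. The case of a one-element list is not specified in the rule, so treating it like a single gene name is an assumption.

```python
def feature_suffix(genes):
    """Build the <feature> suffix used in result column names (illustrative helper)."""
    if genes is None:
        return "all_genes"
    if isinstance(genes, str):
        return genes
    genes = list(genes)
    if len(genes) == 1:
        # Assumption: a one-element list behaves like a single gene name.
        return genes[0]
    return f"{len(genes)}_genes"


suffix_all = feature_suffix(None)
suffix_single = feature_suffix("EPCAM")
suffix_list = feature_suffix(["EPCAM", "CD3E", "PTPRC"])
```

With this rule, a query for three genes produces a column suffix of 3_genes rather than a concatenation of the gene names.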

Return type:

pd.DataFrame

Raises:

ValueError – If no transcripts remain after filtering/joining/within-cell restriction.

segtraq.ps.point_statistics.percentage_transcripts_in_compartments(sdata: SpatialData, genes: str | list[str] | None = None, cell_type_key: str = 'transferred_cell_type', cell_type_query: str | list[str] | None = None, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', shapes_key: str = 'cell_boundaries', nucleus_shapes_key: str = 'nucleus_boundaries', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_gene_key: str = 'feature_name', points_x_key: str = 'x', points_y_key: str = 'y', select_by: Literal['iou', 'nucleus_fraction'] = 'nucleus_fraction', min_intersection_area: float = 0.0, n_jobs: int = 1, predicate: str = 'intersects', inplace: bool = True) → DataFrame

For each cell, compute the percentage of transcripts assigned to that cell that fall in:
  • nucleus overlap region: cell ∩ matched_nucleus

  • cytoplasm region: cell - matched_nucleus (i.e. inside cell but not in nucleus overlap)

  • outside cell: not inside the assigned cell polygon

Notes

  • Nuclei are matched to cells using match_nuclei_to_cells (one nucleus_id per cell).

  • A transcript is counted as “inside cell” only if it spatially joins to some cell polygon AND the joined polygon id equals its assigned points_cell_id_key.

  • Nuclear transcripts are those that join to the matched nucleus polygon for their cell. If no nucleus is matched for a cell, nuclear transcripts are zero by definition for that cell.
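The counting logic in these notes can be sketched with precomputed membership flags. The flag columns in_cell and in_matched_nucleus are hypothetical stand-ins for the results of the spatial joins. Expressing percentages on a 0 to 100 scale relative to n_total is an assumption, as the denominator is not stated here.

```python
import pandas as pd

# One row per transcript assigned to a cell. Cell "b" has no matched nucleus,
# so none of its transcripts can be nuclear.
tx = pd.DataFrame(
    {
        "cell_id": ["a"] * 5 + ["b"] * 4,
        "in_cell": [True, True, True, True, False, True, True, False, False],
        "in_matched_nucleus": [True, True, False, False, False, False, False, False, False],
    }
)

# Nucleus overlap is cell ∩ matched nucleus; cytoplasm is cell minus nucleus.
tx["in_nucleus_overlap"] = (tx["in_cell"] & tx["in_matched_nucleus"]).astype(int)
tx["in_cytoplasm"] = (tx["in_cell"] & ~tx["in_matched_nucleus"]).astype(int)
tx["in_cell_int"] = tx["in_cell"].astype(int)

counts = tx.groupby("cell_id").agg(
    n_total=("in_cell_int", "size"),
    n_in_cell=("in_cell_int", "sum"),
    n_in_nucleus_overlap=("in_nucleus_overlap", "sum"),
    n_in_cytoplasm=("in_cytoplasm", "sum"),
)
counts["n_outside_cell"] = counts["n_total"] - counts["n_in_cell"]

# Assumption: percentages are relative to all transcripts assigned to the cell.
counts["pct_outside_cell"] = 100.0 * counts["n_outside_cell"] / counts["n_total"]
counts["pct_nucleus"] = 100.0 * counts["n_in_nucleus_overlap"] / counts["n_total"]
counts["pct_cytoplasm"] = 100.0 * counts["n_in_cytoplasm"] / counts["n_total"]
```

Under this convention the three percentages of a cell sum to 100: cell "a" splits into 20 / 40 / 40 (outside / nucleus / cytoplasm) and cell "b" into 50 / 0 / 50.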

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.

  • genes (str | list[str] | None, optional) – String or list of strings indicating the feature/gene(s) whose transcripts are counted. If None, all genes are used.

  • cell_type_key (str, default="transferred_cell_type") – Column in sdata.tables[tables_key].obs with cell-type labels.

  • cell_type_query (str | list[str] | None, optional) – If provided, compute the metric only for cells whose cell_type_key matches these label(s).

  • tables_key (str, default="table") – The key to access the AnnData table from sdata.tables. Default is “table”.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • shapes_key (str, default="cell_boundaries") – The key in sdata.shapes for the cell boundary polygons. Default is “cell_boundaries”.

  • nucleus_shapes_key (str, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons.

  • points_key (str, default="transcripts") – The key in sdata.points for the transcript data. Default is “transcripts”.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.

  • points_background_id (str | int, default="UNASSIGNED") – The cell ID value indicating background transcripts that should be ignored.

  • points_gene_key (str, default="feature_name") – The key to access gene names within the transcript data. Default is “feature_name”.

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.

  • select_by ({"iou","nucleus_fraction"}, default="nucleus_fraction") – Criterion to choose the best nucleus for each cell.

  • min_intersection_area (float, default=0.0) – Minimum overlap area required to consider a nucleus a candidate for a cell.

  • n_jobs (int, default=1) – Number of parallel jobs for cell-nucleus matching (if needed). -1 uses all CPUs.

  • predicate (str, default="intersects") – Geometric predicate used to assign transcripts to cell or nucleus polygons during spatial joins (e.g. “covers” includes boundary points, “intersects” is more permissive).

  • inplace (bool, default=True) – Whether to add the results to sdata.tables. Default is True.

Returns:

DataFrame indexed by cell ID with count columns (n_total, n_in_cell, n_outside_cell, n_in_nucleus_overlap, n_in_cytoplasm) and percentage columns (pct_outside_cell, pct_nucleus, pct_cytoplasm).

Return type:

pd.DataFrame