The region similarity (rs) accessor
The region similarity accessor provides metrics to evaluate how well intracellular regions align.
- segtraq.rs.region_similarity.match_nuclei_to_cells(sdata: SpatialData, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', shapes_key: str = 'cell_boundaries', nucleus_shapes_key: str = 'nucleus_boundaries', select_by: str = 'nucleus_fraction', min_intersection_area: float = 0.0, n_jobs: int = -1, inplace: bool = True) → DataFrame
Computes the best-matching nucleus for each cell based on Intersection-over-Union (IoU) or nucleus fraction (area(cell ∩ nucleus) / area(nucleus)).
- Parameters:
sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table. Gene names in sdata.tables[tables_key].var.index should match the gene field in sdata.points[points_key] (see points_gene_key).
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.
nucleus_shapes_key (str, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available.
select_by (str, default="nucleus_fraction") – Score used to select the best-matching nucleus per cell. Options: “iou” maximizes Intersection-over-Union (cell vs nucleus); “nucleus_fraction” maximizes area(cell ∩ nucleus) / area(nucleus). If multiple nuclei have the same score (e.g. fully inside the cell), the larger nucleus (by area) is selected.
min_intersection_area (float, default=0.0) – Minimum area(cell ∩ nucleus) required to consider a nucleus as a candidate. Overlaps <= this threshold are ignored.
n_jobs (int, optional) – Number of parallel jobs. Default=-1 uses all CPUs.
inplace (bool, optional) – Whether to add the results to sdata.tables. Default is True.
- Return type:
pandas.DataFrame
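The two selection scores and the tie-breaking rule can be illustrated with a minimal sketch. It is not the library's implementation: axis-aligned rectangles stand in for the boundary polygons, and `Box`, `intersection_area`, `overlap_score` and `best_nucleus` are hypothetical helper names introduced only for this example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle, a simplified stand-in for a boundary polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two rectangles (0.0 if they are disjoint)."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(0.0, width) * max(0.0, height)


def overlap_score(cell: Box, nucleus: Box, select_by: str) -> float:
    """Score a cell/nucleus pair by IoU or by nucleus fraction."""
    inter = intersection_area(cell, nucleus)
    if select_by == "iou":
        return inter / (cell.area + nucleus.area - inter)
    if select_by == "nucleus_fraction":
        return inter / nucleus.area
    raise ValueError(f"unknown select_by: {select_by!r}")


def best_nucleus(cell, nuclei, select_by="nucleus_fraction", min_intersection_area=0.0):
    """Return the ID of the best-scoring nucleus, or None if no candidate remains."""
    candidates = [
        # Tuples sort by score first, then by nucleus area (larger wins ties).
        (overlap_score(cell, nuc, select_by), nuc.area, nuc_id)
        for nuc_id, nuc in nuclei.items()
        # Overlaps <= the threshold are not considered candidates.
        if intersection_area(cell, nuc) > min_intersection_area
    ]
    return max(candidates)[2] if candidates else None


cell = Box(0, 0, 10, 10)
nuclei = {
    "n1": Box(2, 2, 4, 4),    # fully inside the cell, area 4
    "n2": Box(5, 5, 8, 8),    # fully inside the cell, area 9
    "n3": Box(0, 0, 10, 12),  # covers the cell but extends beyond it
}
by_fraction = best_nucleus(cell, nuclei, select_by="nucleus_fraction")
by_iou = best_nucleus(cell, nuclei, select_by="iou")
```

In this toy setup "n1" and "n2" both reach a nucleus fraction of 1, so the larger one is preferred, while IoU favours "n3" because it penalises the small nuclei for covering little of the cell.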
- segtraq.rs.region_similarity.similarity_border_neighborhood(sdata: SpatialData, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', shapes_key: str = 'cell_boundaries', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str = 'UNASSIGNED', points_x_key: str = 'x', points_y_key: str = 'y', points_gene_key: str = 'feature_name', erosion_fraction_of_radius: float = 0.2, neighborhood_radius_factor: float = 2.0, min_transcripts: int = 10, min_genes: int = 5, metric: str = 'cosine_sim', inplace: bool = True) → DataFrame
Computes the similarity between gene expression profiles in the border region of each cell and two references: (1) the center region of the same cell, and (2) the neighborhood composition vector (NCV) computed within a specified radius around the cell.
Specifically, the function:
1. Erodes each cell polygon to obtain a center region.
2. Defines the border region as the set difference between the full cell and its eroded center.
3. Computes gene expression profiles for center and border.
4. Computes the correlation between center and border expression.
5. Computes the correlation between border expression and the NCV expression profile of the same cell.
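The erosion and border steps can be sketched for an idealised circular cell. This is an illustration only: it assumes that the "equivalent radius" is the radius of a circle with the same area as the cell, which the text above does not state explicitly, and `equivalent_radius`, `erosion_distance` and `region_of_point` are hypothetical helpers.

```python
import math


def equivalent_radius(area: float) -> float:
    """Radius of the circle with the given area (assumed definition)."""
    return math.sqrt(area / math.pi)


def erosion_distance(area: float, erosion_fraction_of_radius: float = 0.2) -> float:
    """Distance by which the cell outline is shrunk to obtain the center region."""
    return erosion_fraction_of_radius * equivalent_radius(area)


def region_of_point(px, py, cx, cy, cell_radius, erosion_fraction_of_radius=0.2):
    """Classify a point relative to a circular cell centered at (cx, cy)."""
    distance = math.hypot(px - cx, py - cy)
    if distance > cell_radius:
        return "outside"
    # Eroding a circle of radius r by a distance e leaves a circle of radius r - e.
    center_radius = cell_radius * (1.0 - erosion_fraction_of_radius)
    return "center" if distance <= center_radius else "border"


cell_radius = 10.0
cell_area = math.pi * cell_radius ** 2
shrink_by = erosion_distance(cell_area)  # 20% of the equivalent radius

labels = [region_of_point(x, 0.0, 0.0, 0.0, cell_radius) for x in (7.0, 9.0, 11.0)]
```

For real cell polygons the same idea would be applied with a geometric erosion (a negative buffer) rather than a radius comparison.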
- Parameters:
sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (tables, points, shapes, etc.).
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.
points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.
points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.
neighborhood_radius_factor (float, default=2.0) – For each cell, the neighborhood consists of the cells whose centroids lie within the radius of the cell times this factor.
erosion_fraction_of_radius (float, default=0.2) – Fraction of the equivalent radius by which each cell is eroded. Example: 0.2 means erode by 20% of the radius.
min_transcripts (int, default=10) – Minimum number of transcripts (raw counts) required per region to compute a correlation. If either region has fewer than min_transcripts counts, the correlation is set to NaN.
min_genes (int, default=5) – Minimum number of non-zero genes required to compute a correlation. If fewer genes are available, the correlation is set to NaN.
metric (str, default="cosine_sim") – Correlation metric to use (“pearson”, “spearman”, “cosine_sim” currently supported).
inplace (bool, optional) – Whether to add the results to sdata.tables[tables_key].obs. Default is True.
- Returns:
- DataFrame with columns:
tables_cell_id_key: identifier of each cell,
similarity_center_border: similarity between center and border expression,
similarity_border_neighborhood: similarity between border and neighborhood expression,
ratio_border_neighborhood_to_center: ratio of the two similarities. A value > 1 indicates that the border is more similar to the neighborhood than to the center, while a value < 1 indicates the opposite.
- Return type:
pandas.DataFrame
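How the thresholds and the ratio column fit together can be sketched with plain Python lists as expression profiles. This is a hedged illustration, not the library's code: `cosine_similarity` and `gated_similarity` are hypothetical helpers, and applying `min_genes` to each profile separately is an assumption made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally long count vectors (NaN for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return math.nan
    return dot / (norm_a * norm_b)


def gated_similarity(a, b, min_transcripts=10, min_genes=5):
    """Return NaN unless both profiles pass the transcript and gene thresholds."""
    for profile in (a, b):
        if sum(profile) < min_transcripts:
            return math.nan
        # Assumption: the non-zero gene count is checked per profile.
        if sum(1 for count in profile if count > 0) < min_genes:
            return math.nan
    return cosine_similarity(a, b)


# Toy profiles over the same six genes, in the same order.
center = [8, 6, 4, 2, 1, 0]
border = [4, 3, 2, 1, 1, 0]
neighborhood = [0, 1, 2, 5, 6, 9]

similarity_center_border = gated_similarity(center, border)
similarity_border_neighborhood = gated_similarity(border, neighborhood)
ratio_border_neighborhood_to_center = (
    similarity_border_neighborhood / similarity_center_border
)

# A sparse region yields NaN instead of an unreliable score.
too_sparse = gated_similarity([1, 0, 0, 0, 0, 0], border)
```

Here the border resembles the center far more than the neighborhood, so the ratio falls below 1.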
- segtraq.rs.region_similarity.similarity_nucleus_cell(sdata: SpatialData, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', shapes_key: str = 'cell_boundaries', nucleus_shapes_key: str = 'nucleus_boundaries', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str = 'UNASSIGNED', points_gene_key: str = 'feature_name', points_x_key: str = 'x', points_y_key: str = 'y', min_transcripts: int = 10, min_genes: int = 5, metric: str = 'cosine_sim', select_by: str = 'nucleus_fraction', min_intersection_area: float = 0.0, n_jobs: int = -1, inplace: bool = True) → DataFrame
For each cell in the SpatialData table, identifies the best-matching nucleus (by nucleus fraction or IoU, see select_by) and computes the similarity (cosine similarity, Pearson correlation, or Spearman correlation) between the gene expression profiles of the whole cell and its nucleus.
- Parameters:
sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.
nucleus_shapes_key (str, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available.
points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.
points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.
min_transcripts (int, default=10) – Minimum number of transcripts (raw counts) required per region (cell and nucleus) to compute a correlation. If either region has fewer than min_transcripts counts, the correlation is set to NaN.
min_genes (int, default=5) – Minimum number of non-zero genes required to compute a correlation. If fewer genes are available, the correlation is set to NaN.
metric (str, default="cosine_sim") – Correlation metric to use (“pearson”, “spearman”, “cosine_sim” currently supported).
n_jobs (int, default=-1) – Number of parallel jobs for computing the cell-nucleus match, if not yet calculated.
select_by (str, default="nucleus_fraction") – Score used to select the best-matching nucleus per cell. Options: “iou” maximizes Intersection-over-Union (cell vs nucleus); “nucleus_fraction” maximizes area(cell ∩ nucleus) / area(nucleus). If multiple nuclei have the same score (e.g. fully inside the cell), the larger nucleus (by area) is selected.
min_intersection_area (float, default=0.0) – Minimum area(cell ∩ nucleus) required to consider a nucleus as a candidate. Overlaps <= this threshold are ignored.
inplace (bool, optional) – Whether to add the results to sdata.tables. Default is True.
- Returns:
- DataFrame with columns:
tables_cell_id_key: identifier of each cell,
nucleus_id: ID of the matching nucleus with the highest nucleus fraction or Intersection-over-Union (None if no match),
similarity_nucleus_cell: similarity (cosine similarity, Pearson correlation, or Spearman correlation) between the gene counts of the cell and its matched nucleus (NaN if no match).
- Return type:
pandas.DataFrame
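The cell and nucleus profiles being compared can be sketched from a handful of transcript records. The column names follow the defaults listed above; everything else is an assumption for illustration, in particular that nuclear transcripts are those falling inside the matched nucleus outline (a rectangle here), and the helper names `count_profile`, `in_box` and `pearson` are hypothetical.

```python
import math
from collections import Counter

transcripts = [
    {"feature_name": "A", "x": 1.0, "y": 1.0, "cell_id": "c1"},
    {"feature_name": "A", "x": 5.0, "y": 5.0, "cell_id": "c1"},
    {"feature_name": "A", "x": 5.5, "y": 4.5, "cell_id": "c1"},
    {"feature_name": "B", "x": 4.5, "y": 5.0, "cell_id": "c1"},
    {"feature_name": "B", "x": 9.0, "y": 2.0, "cell_id": "c1"},
    {"feature_name": "C", "x": 8.0, "y": 8.0, "cell_id": "c1"},
    {"feature_name": "C", "x": 20.0, "y": 20.0, "cell_id": "UNASSIGNED"},
]
gene_order = ["A", "B", "C"]
nucleus_box = (4.0, 4.0, 6.0, 6.0)  # xmin, ymin, xmax, ymax of the matched nucleus


def count_profile(records, genes):
    """Count transcripts per gene, in a fixed gene order."""
    counts = Counter(record["feature_name"] for record in records)
    return [counts[gene] for gene in genes]


def in_box(record, box):
    xmin, ymin, xmax, ymax = box
    return xmin <= record["x"] <= xmax and ymin <= record["y"] <= ymax


def pearson(a, b):
    """Pearson correlation of two equally long vectors (NaN if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        return math.nan
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


# Background transcripts are excluded by their cell ID.
cell_records = [t for t in transcripts if t["cell_id"] == "c1"]
nucleus_records = [t for t in cell_records if in_box(t, nucleus_box)]

cell_profile = count_profile(cell_records, gene_order)
nucleus_profile = count_profile(nucleus_records, gene_order)
similarity = pearson(cell_profile, nucleus_profile)
```

With only three genes and six transcripts this cell would fall below the default min_transcripts and min_genes thresholds, so the example shows the mechanics rather than a meaningful score.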
- segtraq.rs.region_similarity.similarity_nucleus_cytoplasm(sdata, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', shapes_key: str = 'cell_boundaries', nucleus_shapes_key: str = 'nucleus_boundaries', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_gene_key: str = 'feature_name', points_x_key: str = 'x', points_y_key: str = 'y', min_transcripts: int = 10, min_genes: int = 5, metric: str = 'cosine_sim', scale: float = 10000.0, select_by: str = 'nucleus_fraction', min_intersection_area: float = 0.0, n_jobs: int = 1, inplace: bool = True, debug_cell_id: str | None = None)
For each cell in the SpatialData table, identifies the best-matching nucleus (by nucleus fraction or IoU, see select_by) and computes the similarity (cosine similarity, Pearson correlation, or Spearman correlation) between the gene expression profiles of the cytoplasm (cell - nucleus) and the cell region overlapping the nucleus.
- Parameters:
sdata (SpatialData) – A SpatialData object containing segmented and transcript-assigned spatial transcriptomics data (images, tables, points, shapes and optional labels).
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table. Gene names in sdata.tables[tables_key].var.index should match the gene field in sdata.points[points_key] (see points_gene_key).
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
shapes_key (str, default="cell_boundaries") – Key in sdata.shapes for cell boundary polygons.
nucleus_shapes_key (str, default="nucleus_boundaries") – Key in sdata.shapes for nucleus boundary polygons, if available.
points_key (str, default="transcripts") – Key in sdata.points for spot/transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript/spot to a cell.
points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript/spot.
points_x_key (str, default="x") – Column for the x-coordinate of each transcript/spot.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript/spot.
min_transcripts (int, default=10) – Minimum number of transcripts (raw counts) required per region (cytoplasm and nucleus) to compute a correlation. If either region has fewer than min_transcripts counts, the correlation is set to NaN.
min_genes (int, default=5) – Minimum number of non-zero genes required to compute a correlation. If fewer genes are available, the correlation is set to NaN.
metric (str, default="cosine_sim") – Correlation metric to use (“pearson”, “spearman”, “cosine_sim” currently supported).
scale (float, default=1e4) – Scale for library size normalization.
select_by (str, default="nucleus_fraction") – Score used to select the best-matching nucleus per cell. Options: “iou” maximizes Intersection-over-Union (cell vs nucleus); “nucleus_fraction” maximizes area(cell ∩ nucleus) / area(nucleus). If multiple nuclei have the same score (e.g. fully inside the cell), the larger nucleus (by area) is selected.
min_intersection_area (float, default=0.0) – Minimum area(cell ∩ nucleus) required to consider a nucleus as a candidate. Overlaps <= this threshold are ignored.
n_jobs (int, default=1) – Number of parallel jobs for correlation computation.
inplace (bool, optional) – Whether to add the results to sdata.tables. Default is True.
- Returns:
DataFrame with columns [tables_cell_id_key, “nucleus_id”, “similarity_nucleus_cytoplasm”]
- Return type:
pandas.DataFrame
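The cytoplasm profile and the role of scale can be sketched as follows. The exact normalization used by the library is not described above, so this assumes plain library-size scaling (counts divided by their total, multiplied by scale); `library_size_normalize` is a hypothetical helper, and the nucleus counts are assumed to cover only transcripts inside the cell.

```python
import math


def library_size_normalize(counts, scale=1e4):
    """Rescale a count vector so that its entries sum to `scale`."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [count / total * scale for count in counts]


# Toy counts over the same five genes, in the same order.
cell_counts = [30, 20, 10, 5, 5]
nucleus_counts = [10, 10, 5, 0, 0]  # transcripts in the cell region overlapping the nucleus

# Cytoplasm = cell - nucleus, gene by gene.
cytoplasm_counts = [c - n for c, n in zip(cell_counts, nucleus_counts)]

nucleus_norm = library_size_normalize(nucleus_counts)
cytoplasm_norm = library_size_normalize(cytoplasm_counts)
```

After scaling, both regions have the same total, so a large cytoplasm and a small nucleus are compared on their relative gene composition rather than on their raw transcript counts.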