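The volume (vl) accessor

The volume accessor provides metrics about the 3D organization of segmented cells along the z axis: overlap of cell boundaries across z layers, consistency of transcript profiles between the bottom and top of each cell, and vertical signal integrity (VSI) at transcript locations.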

segtraq.vl.volume.fraction_heterotypic_overlap(sdata: SpatialData, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', cell_type_key: str = 'transferred_cell_type', shapes_key_list: list[str] = ('cell_boundaries_z0', 'cell_boundaries_z1', 'cell_boundaries_z2', 'cell_boundaries_z3'), shapes_cell_id_key: str = 'cell_id', unknown_label: str = 'Unknown', unknown_policy: str = 'treat_as_label', inplace: bool = True) → DataFrame

Compute the cross-depth heterotypic overlap fraction per cell, using one representative polygon per cell (the polygon with the largest area across z layers).
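To make the selection rule concrete, here is a minimal sketch that picks one representative polygon per cell by area. It assumes polygons arrive as plain (cell_id, z_layer, vertices) tuples and computes areas with the shoelace formula; `shoelace_area` and `pick_representatives` are hypothetical helpers, not part of segtraq, which works on the GeoDataFrames in sdata.shapes instead.

```python
def shoelace_area(vertices):
    """Absolute area of a simple polygon given as [(x, y), ...]."""
    twice_area = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def pick_representatives(polygons):
    """polygons: iterable of (cell_id, z_layer, vertices) tuples.

    Returns {cell_id: (z_layer, vertices)}, keeping the largest-area polygon
    of each cell across z layers (ties keep the first polygon seen)."""
    best = {}
    for cell_id, z_layer, vertices in polygons:
        area = shoelace_area(vertices)
        if cell_id not in best or area > best[cell_id][0]:
            best[cell_id] = (area, z_layer, vertices)
    return {cid: (z, verts) for cid, (_, z, verts) in best.items()}


polys = [
    ("c1", "cell_boundaries_z0", [(0, 0), (1, 0), (1, 1), (0, 1)]),  # area 1
    ("c1", "cell_boundaries_z1", [(0, 0), (2, 0), (2, 2), (0, 2)]),  # area 4
    ("c2", "cell_boundaries_z0", [(5, 5), (6, 5), (6, 6)]),          # area 0.5
]
reps = pick_representatives(polys)
print({cid: z for cid, (z, _) in reps.items()})
# {'c1': 'cell_boundaries_z1', 'c2': 'cell_boundaries_z0'}
```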

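For a representative polygon i with cell ID id_i, z layer z_i, geometry P_i and type t_i:

$$\mathrm{overlap\_area}_i = \mathrm{Area}\Big(P_i \cap \bigcup_{j:\; z_j \neq z_i,\; \mathrm{id}_j \neq \mathrm{id}_i,\; t_j \neq t_i} P_j\Big)$$

$$\mathrm{overlap\_fraction}_i = \frac{\mathrm{overlap\_area}_i}{\mathrm{Area}(P_i)}$$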

Candidate polygons P_j are restricted to those whose bounding boxes overlap P_i, found via a spatial index.
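The overlap formula and the candidate filter can be illustrated with the sketch below, which uses only NumPy. It is an approximation under stated assumptions: areas are estimated by sampling a regular grid over P_i (spacing `step`) with even-odd ray casting instead of exact polygon intersection, and the spatial index is replaced by a linear bounding-box scan. The dict-based polygon records and all function names are hypothetical.

```python
import numpy as np


def points_in_polygon(px, py, vertices):
    """Even-odd ray casting: True where (px, py) lies inside the polygon."""
    poly = np.asarray(vertices, dtype=float)
    xa, ya = poly[:, 0], poly[:, 1]
    xb, yb = np.roll(xa, -1), np.roll(ya, -1)
    inside = np.zeros(px.shape, dtype=bool)
    for x1, y1, x2, y2 in zip(xa, ya, xb, yb):
        straddles = (y1 > py) != (y2 > py)  # edge crosses the horizontal ray
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= straddles & (px < x_cross)
    return inside


def bbox(vertices):
    v = np.asarray(vertices, dtype=float)
    return v[:, 0].min(), v[:, 1].min(), v[:, 0].max(), v[:, 1].max()


def heterotypic_overlap(focal, candidates, step=0.05):
    """Grid estimate of (overlap_area, overlap_fraction) for one focal polygon.

    focal and candidates are dicts with keys 'id', 'z', 'type', 'vertices'."""
    x0, y0, x1, y1 = bbox(focal["vertices"])
    gx, gy = np.meshgrid(np.arange(x0 + step / 2, x1, step),
                         np.arange(y0 + step / 2, y1, step))
    in_focal = points_in_polygon(gx, gy, focal["vertices"])
    covered = np.zeros_like(in_focal)  # union of qualifying candidates
    for cand in candidates:
        # only polygons from other z layers, other cells and other types count
        if (cand["z"] == focal["z"] or cand["id"] == focal["id"]
                or cand["type"] == focal["type"]):
            continue
        cx0, cy0, cx1, cy1 = bbox(cand["vertices"])
        if cx1 < x0 or cx0 > x1 or cy1 < y0 or cy0 > y1:
            continue  # bounding boxes do not overlap
        covered |= points_in_polygon(gx, gy, cand["vertices"])
    n_focal = int(in_focal.sum())
    if n_focal == 0:
        return 0.0, float("nan")
    n_overlap = int((in_focal & covered).sum())
    return n_overlap * step * step, n_overlap / n_focal


a = {"id": "A", "z": 0, "type": "T1", "vertices": [(0, 0), (2, 0), (2, 2), (0, 2)]}
b = {"id": "B", "z": 1, "type": "T2", "vertices": [(1, 0), (3, 0), (3, 2), (1, 2)]}
c = {"id": "C", "z": 1, "type": "T1", "vertices": [(0, 0), (1, 0), (1, 2), (0, 2)]}
area, frac = heterotypic_overlap(a, [a, b, c])  # C shares A's type and is ignored
print(round(area, 6), frac)  # 2.0 0.5
```

A finer `step` trades speed for accuracy; an exact geometry library would avoid the discretization error entirely.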

Unknown/NA types:
  • unknown_policy="exclude": cells with an NA/unknown type get NaN, and candidates with an NA/unknown type are excluded from the overlap.

  • unknown_policy="treat_as_label": NA is replaced by unknown_label and treated as a regular category.
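A minimal sketch of the two policies as a label-mapping step, assuming missing labels arrive as None or float NaN and that "unknown" means a label equal to unknown_label. `resolve_type` is a hypothetical helper: a returned None marks a label to drop, so a focal cell mapped to None would get NaN and a candidate mapped to None would be skipped.

```python
import math


def resolve_type(label, unknown_policy="treat_as_label", unknown_label="Unknown"):
    """Return the label used for comparisons, or None if it must be excluded."""
    is_missing = label is None or (isinstance(label, float) and math.isnan(label))
    if unknown_policy == "treat_as_label":
        return unknown_label if is_missing else label
    if unknown_policy == "exclude":
        return None if (is_missing or label == unknown_label) else label
    raise ValueError(f"unsupported unknown_policy: {unknown_policy!r}")


labels = ["T cell", None, float("nan"), "Unknown"]
print([resolve_type(lab, "treat_as_label") for lab in labels])
# ['T cell', 'Unknown', 'Unknown', 'Unknown']
print([resolve_type(lab, "exclude") for lab in labels])
# ['T cell', None, None, None]
```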

Parameters:
  • sdata (SpatialData) – A SpatialData object containing cell boundary polygons in multiple z layers and a cell table with transferred cell type labels.

  • tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • cell_type_key (str, default="transferred_cell_type") – Column in the cell table containing cell-type labels (e.g. transferred from scRNA-seq).

  • shapes_key_list (list[str] or tuple[str, ...]) – Keys in sdata.shapes for per-z-layer cell boundary polygons (e.g. [“cell_boundaries_z0”, …, “cell_boundaries_z3”]).

  • shapes_cell_id_key (str, optional, default="cell_id") – Name of the index of the shapes GeoDataFrames that links polygons to cell IDs.

  • unknown_label (str, default="Unknown") – Label name to use when treating NA as a separate category (unknown_policy="treat_as_label").

  • unknown_policy (str, default="treat_as_label") –

    How to handle Unknown/NA cell types:
    • "exclude": exclude polygons with NA/unknown types from comparisons. If the focal cell has an NA/unknown type, its overlap fraction is set to NaN.

    • "treat_as_label": convert NA to unknown_label and treat it as a valid category.

  • inplace (bool, default=True) – Whether to merge the aggregated per-cell result into sdata.tables[tables_key].obs.

Returns:

DataFrame with columns [tables_cell_id_key, “heterotypic_overlap_area”, “heterotypic_overlap_fraction”].

Return type:

pd.DataFrame

segtraq.vl.volume.similarity_top_bottom(sdata, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_gene_key: str = 'feature_name', points_x_key: str = 'x', points_y_key: str = 'y', points_z_key: str = 'z', correct_z_drift: bool = True, max_points: int = 1000000, seed: int | None = 0, q: float = 0.3, normalization: str | None = None, scale: float = 10000.0, min_genes: int = 5, min_transcripts: int = 10, inplace: bool = True)

Compute cosine similarity between gene expression profiles of the bottom and top z-quantiles of transcripts within each cell.

Optionally (by default, correct_z_drift=True), a global z-drift correction is applied before computing the within-cell quantiles. This is useful when raw z coordinates tilt or warp across the field of view (e.g. when the slide is not level in z).
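One way to implement such a correction is to fit a plane to the transcript z coordinates and subtract its tilt. The sketch below assumes a linear model z ≈ b0 + b1·x + b2·y fitted by least squares on at most max_points randomly subsampled transcripts; the regression model segtraq actually uses may differ, and `fit_and_remove_z_tilt` is a hypothetical name.

```python
import numpy as np


def fit_and_remove_z_tilt(x, y, z, max_points=1_000_000, seed=0):
    """Fit z ≈ b0 + b1*x + b2*y on a random subsample and remove the tilt."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    rng = np.random.default_rng(seed)  # seed=None gives non-reproducible sampling
    n = z.size
    if n > max_points:
        idx = rng.choice(n, size=max_points, replace=False)
    else:
        idx = np.arange(n)
    design = np.column_stack([np.ones(idx.size), x[idx], y[idx]])
    coef, *_ = np.linalg.lstsq(design, z[idx], rcond=None)
    # subtract only the x/y-dependent part so z keeps its original offset
    return z - (coef[1] * x + coef[2] * y)


rng = np.random.default_rng(1)
x = rng.uniform(0, 1000, 5000)
y = rng.uniform(0, 1000, 5000)
z_raw = 5.0 + 0.002 * x - 0.001 * y + rng.normal(0, 0.5, 5000)  # tilted slide
z_corr = fit_and_remove_z_tilt(x, y, z_raw, max_points=2000)
```

Only the x/y-dependent trend is removed, so the corrected values stay on the original z scale, which is enough for defining within-cell top/bottom subsets.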

For each cell, transcripts are split into:
  • bottom part: z <= q-quantile within that cell

  • top part: z >= (1-q)-quantile within that cell
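The per-cell split can be sketched with NumPy as follows, assuming transcripts are given as flat arrays of cell IDs and (possibly drift-corrected) z values. `split_top_bottom` is a hypothetical helper, and `np.quantile` uses its default linear interpolation here, which may differ from the interpolation segtraq uses.

```python
import numpy as np


def split_top_bottom(cell_ids, z, q=0.3):
    """Boolean masks (bottom, top): z <= q-quantile and z >= (1-q)-quantile per cell."""
    cell_ids = np.asarray(cell_ids)
    z = np.asarray(z, dtype=float)
    bottom = np.zeros(z.size, dtype=bool)
    top = np.zeros(z.size, dtype=bool)
    for cid in np.unique(cell_ids):
        in_cell = cell_ids == cid
        lo, hi = np.quantile(z[in_cell], [q, 1 - q])  # quantiles within this cell
        bottom[in_cell] = z[in_cell] <= lo
        top[in_cell] = z[in_cell] >= hi
    return bottom, top


cell_ids = ["a"] * 10 + ["b"] * 4
z = list(range(10)) + [0.0, 1.0, 2.0, 3.0]
bottom, top = split_top_bottom(cell_ids, z, q=0.3)
```

The loop over cells keeps the logic readable; for millions of transcripts a grouped (e.g. sort-based) implementation would scale better.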

When normalization="pearson", gene counts are normalized with analytic Pearson residuals (Lause et al. (2021)). The bottom and top profiles of all cells are normalized together, and the resulting residuals are then split back into bottom and top parts.
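For reference, analytic Pearson residuals (Lause et al. (2021)) compare each count with the expectation under a rank-one model, r = (x − μ) / sqrt(μ + μ²/θ) with μ = row total × gene total / grand total, and clip the result. The sketch below stacks bottom and top profiles into one matrix, normalizes them together and splits them apart again. θ = 100 and clipping at √(number of rows) follow defaults commonly used for this method (e.g. in scanpy); whether segtraq uses the same settings is an assumption, and `analytic_pearson_residuals` is a hypothetical name.

```python
import numpy as np


def analytic_pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a (profiles x genes) count matrix."""
    X = np.asarray(counts, dtype=float)
    # expected counts under a rank-one model: row total * gene total / grand total
    mu = X.sum(axis=1, keepdims=True) * X.sum(axis=0, keepdims=True) / X.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        residuals = (X - mu) / np.sqrt(mu + mu**2 / theta)
    residuals = np.nan_to_num(residuals)  # all-zero rows/genes give 0/0 -> 0
    if clip is None:
        clip = np.sqrt(X.shape[0])
    return np.clip(residuals, -clip, clip)


# Bottom and top parts of two cells (rows) over three genes (columns).
bottom_counts = np.array([[5, 0, 3], [2, 2, 0]])
top_counts = np.array([[4, 1, 3], [0, 3, 1]])
stacked = np.vstack([bottom_counts, top_counts])    # normalize all parts together
residuals = analytic_pearson_residuals(stacked)
res_bottom, res_top = residuals[:2], residuals[2:]  # then take them apart again
```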

A cell is scored only if both parts are dense enough; otherwise it is filtered out (its similarity is NaN). A cell needs:
  • at least min_transcripts transcripts in BOTH bottom and top parts

  • at least min_genes genes with nonzero counts across (bottom OR top)
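A minimal sketch of the sparsity filter and the cosine similarity for a single cell, assuming the bottom and top parts have already been turned into gene-count vectors over the same gene order. It works on raw counts (the normalization=None case); with a normalization enabled, the same cosine would be taken between normalized profiles. `score_cell` is a hypothetical helper.

```python
import numpy as np


def score_cell(bottom, top, min_genes=5, min_transcripts=10):
    """Cosine similarity between bottom and top gene-count vectors of one cell,
    or NaN if the cell is too sparse to score."""
    bottom = np.asarray(bottom, dtype=float)
    top = np.asarray(top, dtype=float)
    # at least min_transcripts transcripts in BOTH parts
    if bottom.sum() < min_transcripts or top.sum() < min_transcripts:
        return float("nan")
    # at least min_genes genes detected in bottom OR top
    if np.count_nonzero((bottom > 0) | (top > 0)) < min_genes:
        return float("nan")
    denom = np.linalg.norm(bottom) * np.linalg.norm(top)
    if denom == 0:
        return float("nan")
    return float(bottom @ top / denom)


print(score_cell([3, 2, 1, 2, 2, 0], [3, 2, 1, 2, 2, 0]))  # ≈ 1.0, identical profiles
print(score_cell([5, 5, 0, 0, 0], [0, 0, 4, 3, 3]))        # 0.0, no shared genes
print(score_cell([5, 5, 0, 0, 0], [0, 0, 4, 3, 2]))        # nan, top has only 9
```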

Parameters:
  • sdata (SpatialData) – A SpatialData object containing transcript-assigned spatial transcriptomics data.

  • tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • points_key (str, default="transcripts") – Key in sdata.points for transcript-level data.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript to a cell.

  • points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).

  • points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript.

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript.

  • points_z_key (str, default="z") – Column specifying the z coordinate / depth for each transcript.

  • correct_z_drift (bool, default=True) – If True, correct global z-drift before computing within-cell z-quantiles. The corrected values are used only for defining top/bottom subsets.

  • max_points (int, default=1_000_000) – Maximum number of transcripts, randomly subsampled, used to fit the regression for z-drift correction.

  • seed (int or None, default=0) – Random seed used for subsampling in z drift correction. If None, sampling is not reproducible.

  • q (float, default=0.30) – Quantile defining bottom and top parts. bottom = q, top = 1-q.

  • normalization (str or None, default=None) – Normalization applied before computing the similarity: analytic Pearson residuals ("pearson"), a scaled log-transform ("log"), or raw counts ("raw" or None).

  • scale (float, default=1e4) – Scale for within-cell library size normalization (bottom+top).

  • min_genes (int, default=5) – Minimum number of genes with nonzero counts in (bottom OR top) required to score a cell.

  • min_transcripts (int, default=10) – Minimum number of transcripts required in EACH part (bottom and top) to score a cell.

  • inplace (bool, default=True) – Whether to add the results to sdata.tables[tables_key].obs.
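For the "log" option, one reading of the scale parameter is that both parts are divided by the cell's combined bottom+top library size, multiplied by scale, and log1p-transformed. The sketch below implements that reading; it is an assumption rather than segtraq's confirmed code, and `log_normalize_parts` is hypothetical.

```python
import numpy as np


def log_normalize_parts(bottom, top, scale=1e4):
    """Assumed "log" normalization: shared bottom+top library size, then log1p."""
    bottom = np.asarray(bottom, dtype=float)
    top = np.asarray(top, dtype=float)
    library_size = bottom.sum() + top.sum()  # within-cell, bottom + top together
    if library_size == 0:
        return np.zeros_like(bottom), np.zeros_like(top)
    return (np.log1p(bottom / library_size * scale),
            np.log1p(top / library_size * scale))


bottom_log, top_log = log_normalize_parts([6, 2, 0], [1, 0, 1], scale=1e4)
```

Because both parts share one library size, they are scaled identically; cosine similarity is invariant to a common scale factor, so only the log transform changes the score relative to raw counts.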

Returns:

DataFrame with columns [tables_cell_id_key, “cosine_sim_top_bottom_z”].

Return type:

pd.DataFrame

segtraq.vl.volume.vertical_signal_integrity_per_cell(sdata, vsi_map: ndarray, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_gene_key: str = 'feature_name', points_x_key: str = 'x', points_y_key: str = 'y', inplace: bool = True)

Compute per-cell mean VSI by sampling a precomputed VSI map at transcript locations.

This metric assumes vsi_map is defined on the same coordinate system and scaling as the transcript coordinates in sdata.points[points_key], i.e. VSI values are indexed directly by integer x/y coordinates (after optional shift-to-origin). For each transcript, the VSI value is read from vsi_map[y_int, x_int] and then averaged across transcripts belonging to each cell.
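The sampling step can be sketched with NumPy as below. Assumptions: coordinates are mapped to pixels with np.floor, transcripts falling outside vsi_map are dropped, shift-to-origin subtracts the minimum x and y over all transcripts, and the gene filter against var_names is omitted. `mean_vsi_per_cell` and its `shift_to_origin` flag are hypothetical.

```python
import numpy as np


def mean_vsi_per_cell(vsi_map, x, y, cell_ids, background_id="UNASSIGNED",
                      shift_to_origin=False):
    """Average vsi_map[y_int, x_int] over the transcripts of each cell."""
    vsi_map = np.asarray(vsi_map, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cell_ids = np.asarray(cell_ids)
    if shift_to_origin:
        # assumed convention: move the minimum coordinate of all transcripts to 0
        x, y = x - x.min(), y - y.min()
    x_int = np.floor(x).astype(int)
    y_int = np.floor(y).astype(int)
    height, width = vsi_map.shape
    keep = (
        (cell_ids != background_id)
        & (x_int >= 0) & (x_int < width)
        & (y_int >= 0) & (y_int < height)
    )
    values = vsi_map[y_int[keep], x_int[keep]]  # note the [row=y, col=x] order
    ids, inverse = np.unique(cell_ids[keep], return_inverse=True)
    means = np.bincount(inverse, weights=values) / np.bincount(inverse)
    return dict(zip(ids.tolist(), means.tolist()))


vsi_map = np.arange(12, dtype=float).reshape(3, 4)  # 3 rows (y) x 4 columns (x)
x = [0.5, 1.2, 3.9, 0.0, 10.0]
y = [0.5, 0.1, 2.0, 0.0, 10.0]
ids = ["a", "a", "b", "UNASSIGNED", "b"]  # last "b" point lies outside the map
print(mean_vsi_per_cell(vsi_map, x, y, ids))  # {'a': 0.5, 'b': 11.0}
```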

Parameters:
  • sdata (SpatialData) – A SpatialData object containing transcript-assigned spatial transcriptomics data.

  • vsi_map (np.ndarray) – 2D array of VSI values. Must be indexable as vsi_map[y, x], where x/y correspond to transcript coordinates (after optional shift-to-origin).

  • tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table. If inplace=True, results are merged into sdata.tables[tables_key].obs.

  • tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.

  • points_key (str, default="transcripts") – Key in sdata.points for transcript-level data.

  • points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript to a cell.

  • points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).

  • points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript. Used to filter transcripts to features present in sdata.tables[tables_key].var_names.

  • points_x_key (str, default="x") – Column for the x-coordinate of each transcript.

  • points_y_key (str, default="y") – Column for the y-coordinate of each transcript.

  • inplace (bool, default=True) – Whether to add the results to sdata.tables[tables_key].obs.

Returns:

DataFrame with columns [tables_cell_id_key, “mean_vsi”].

Return type:

pd.DataFrame