The volume (vl) accessor
The volume accessor provides metrics about the 3D spatial distribution of transcripts.
- segtraq.vl.volume.fraction_heterotypic_overlap(sdata: SpatialData, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', cell_type_key: str = 'transferred_cell_type', shapes_key_list: list[str] = ('cell_boundaries_z0', 'cell_boundaries_z1', 'cell_boundaries_z2', 'cell_boundaries_z3'), shapes_cell_id_key: str = 'cell_id', unknown_label: str = 'Unknown', unknown_policy: str = 'treat_as_label', inplace: bool = True) → DataFrame
Compute cross-depth heterotypic overlap fraction per cell using one representative polygon per cell (chosen as the polygon with the largest area across z layers).
For a representative polygon i (cell_id, z_layer) with geometry P_i and type t_i:
overlap_area_i = Area( P_i ∩ Union_{j: z_j != z_i, id_j != id_i, t_j != t_i} P_j )
overlap_fraction_i = overlap_area_i / Area(P_i)
Candidates are restricted to bbox-overlapping polygons via a spatial index.
- Unknown/NA types:
unknown_policy="exclude": cells with NA/unknown type return NaN, and unknown-type candidates are excluded from overlap.
unknown_policy="treat_as_label": NA is replaced by unknown_label and treated as a real category.
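The overlap definition and the two unknown-type policies can be illustrated with a toy sketch. It is not the library's implementation: polygons are approximated by sets of unit grid cells instead of real geometries, the spatial-index prefilter is omitted, and the names `rect`, `is_unknown`, and `overlap_fraction` as well as the dictionary layout are hypothetical.

```python
import math

def rect(x0, y0, x1, y1):
    """Rasterized rectangle: the set of unit grid cells it covers (toy stand-in for a polygon)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def is_unknown(cell_type, unknown_label):
    """A type counts as unknown if it is missing (None) or equals the unknown label."""
    return cell_type is None or cell_type == unknown_label

def overlap_fraction(focal, candidates, unknown_policy="treat_as_label",
                     unknown_label="Unknown"):
    """Fraction of the focal footprint covered by other cells of a different type in other z layers."""
    focal_type = focal["type"]
    if is_unknown(focal_type, unknown_label):
        if unknown_policy == "exclude":
            return math.nan              # focal cell has no usable type
        focal_type = unknown_label       # NA becomes a regular category
    union = set()
    for other in candidates:
        other_type = other["type"]
        if is_unknown(other_type, unknown_label):
            if unknown_policy == "exclude":
                continue                 # unknown candidates never contribute
            other_type = unknown_label
        if (other["z"] != focal["z"] and other["id"] != focal["id"]
                and other_type != focal_type):
            union |= other["cells"]
    return len(focal["cells"] & union) / len(focal["cells"])

focal = {"id": "a", "z": 0, "type": "T cell", "cells": rect(0, 0, 4, 4)}
candidates = [
    {"id": "b", "z": 1, "type": "B cell", "cells": rect(2, 0, 6, 4)},  # counted
    {"id": "c", "z": 1, "type": "T cell", "cells": rect(0, 0, 4, 4)},  # same type: ignored
    {"id": "d", "z": 0, "type": "B cell", "cells": rect(0, 0, 4, 4)},  # same z layer: ignored
    {"id": "e", "z": 2, "type": None, "cells": rect(0, 2, 2, 4)},      # NA type: policy-dependent
]

frac_label = overlap_fraction(focal, candidates, "treat_as_label")  # 0.75
frac_exclude = overlap_fraction(focal, candidates, "exclude")       # 0.5
```

With "treat_as_label" the NA-typed cell `e` is counted as a heterotypic neighbour, so 12 of the 16 focal grid cells are covered. With "exclude" only cell `b` contributes, covering 8 of 16.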
- Parameters:
sdata (SpatialData) – A SpatialData object containing cell boundary polygons in multiple z layers and a cell table with transferred cell type labels.
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
cell_type_key (str, default="transferred_cell_type") – Column in the cell table containing cell-type labels (e.g. transferred from scRNA-seq).
shapes_key_list (list[str] or tuple[str, ...]) – Keys in sdata.shapes for per-z-layer cell boundary polygons (e.g. ["cell_boundaries_z0", …, "cell_boundaries_z3"]).
shapes_cell_id_key (str, optional, default="cell_id") – Index name of shapes GeoDataFrame linking polygons to cell IDs.
unknown_label (str, default="Unknown") – Label name to use when treating NA as a separate category (unknown_policy="treat_as_label").
unknown_policy (str, default="treat_as_label") –
- How to handle Unknown/NA cell types:
"exclude": exclude polygons with NA/unknown types from comparisons. If the focal cell has NA/unknown type, its overlap fraction is set to NaN.
"treat_as_label": convert NA to unknown_label and treat it as a valid category.
inplace (bool, default=True) – Whether to merge the aggregated per-cell result into sdata.tables[tables_key].obs.
- Returns:
DataFrame with columns [tables_cell_id_key, "heterotypic_overlap_area", "heterotypic_overlap_fraction"].
- Return type:
pd.DataFrame
- segtraq.vl.volume.similarity_top_bottom(sdata, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_gene_key: str = 'feature_name', points_x_key: str = 'x', points_y_key: str = 'y', points_z_key: str = 'z', correct_z_drift: bool = True, max_points: int = 1000000, seed: int | None = 0, q: float = 0.3, normalization: str | None = None, scale: float = 10000.0, min_genes: int = 5, min_transcripts: int = 10, inplace: bool = True)
Compute cosine similarity between gene expression profiles of the bottom and top z-quantiles of transcripts within each cell.
Optionally, a global z-drift correction is applied before computing within-cell quantiles (default: True). This is useful when raw z coordinates show tilt or warping across the field of view (e.g. when the slide is not level in z).
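As an illustration of what a global z-drift correction could look like, the sketch below fits a plane z = a*x + b*y + c to a random subsample of transcripts by least squares and subtracts it. The plane model is an assumption: the documentation only states that a regression is fitted on at most max_points randomly subsampled points. The function names `det3`, `fit_plane`, and `remove_z_drift` are hypothetical.

```python
import random

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c via the normal equations (Cramer's rule)."""
    n = float(len(points))
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(lhs)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]           # replace one column by the right-hand side
        coeffs.append(det3(m) / d)
    return tuple(coeffs)                 # (a, b, c)

def remove_z_drift(points, max_points=1_000_000, seed=0):
    """Subtract the fitted plane from every z value; returns the corrected z list."""
    rng = random.Random(seed)
    sample = points if len(points) <= max_points else rng.sample(points, max_points)
    a, b, c = fit_plane(sample)
    return [z - (a * x + b * y + c) for x, y, z in points]

# Synthetic transcripts lying exactly on a tilted plane.
points = [(float(x), float(y), 0.1 * x - 0.2 * y + 5.0)
          for x in range(5) for y in range(5)]
a, b, c = fit_plane(points)
corrected = remove_z_drift(points)
```

Because within-cell quantiles only depend on the ordering of z values inside a cell, subtracting the constant offset c has no effect on the top/bottom split; only the tilt terms matter.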
- For each cell, transcripts are split into:
bottom part: z <= q-quantile within that cell
top part: z >= (1-q)-quantile within that cell
Gene counts can be normalized with analytic Pearson residuals (Lause et al., 2021). The residuals are computed on all counts together, and the normalized residuals are then split back into the bottom and top parts.
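For reference, analytic Pearson residuals compare each count with its expected value under a negative binomial null model. The sketch below follows the formula from Lause et al. (2021) on a small count matrix given as nested lists. The overdispersion value theta=100, the clipping at sqrt(number of rows), and the row layout (bottom and top parts of each cell stacked as alternating rows) are assumptions made for illustration, not confirmed details of this function.

```python
import math

def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a rows-by-genes count matrix (nested lists)."""
    n_rows = len(counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    total = sum(row_sums)
    clip = math.sqrt(n_rows)             # residuals are clipped to [-sqrt(n), sqrt(n)]
    residuals = []
    for i, row in enumerate(counts):
        out = []
        for j, x in enumerate(row):
            mu = row_sums[i] * col_sums[j] / total   # expected count under the null model
            if mu == 0:
                out.append(0.0)          # empty row or gene: nothing to normalize
                continue
            r = (x - mu) / math.sqrt(mu + mu * mu / theta)
            out.append(max(-clip, min(clip, r)))
        residuals.append(out)
    return residuals

# Rows alternate bottom/top for each cell (assumed layout): cell 0 bottom, cell 0 top, ...
stacked = [[2, 4], [4, 8]]
res = pearson_residuals(stacked)
bottom_res, top_res = res[0::2], res[1::2]   # split the residuals back into the two parts
```

In this example both rows have the same gene proportions, so every observed count equals its expectation and all residuals are zero.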
- A cell is scored only if both of the following hold; otherwise its similarity is set to NaN:
at least min_transcripts transcripts in BOTH the bottom and top parts
at least min_genes genes with nonzero counts across (bottom OR top)
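The split, the sparsity checks, and the similarity itself can be sketched for a single cell as follows. This is a simplified illustration on raw counts, not the library's code: the linear-interpolation quantile and the helper names `quantile` and `cosine_top_bottom` are assumptions, and z-drift correction and normalization are left out.

```python
import math
from collections import Counter

def quantile(sorted_vals, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def cosine_top_bottom(transcripts, q=0.3, min_genes=5, min_transcripts=10):
    """transcripts: list of (gene, z) pairs belonging to one cell."""
    if not transcripts:
        return math.nan
    zs = sorted(z for _, z in transcripts)
    z_low, z_high = quantile(zs, q), quantile(zs, 1 - q)
    bottom = Counter(g for g, z in transcripts if z <= z_low)
    top = Counter(g for g, z in transcripts if z >= z_high)
    # Both parts need enough transcripts.
    if sum(bottom.values()) < min_transcripts or sum(top.values()) < min_transcripts:
        return math.nan
    # Enough distinct genes across the two parts combined.
    genes = set(bottom) | set(top)
    if len(genes) < min_genes:
        return math.nan
    dot = sum(bottom[g] * top[g] for g in genes)
    norm_b = math.sqrt(sum(v * v for v in bottom.values()))
    norm_t = math.sqrt(sum(v * v for v in top.values()))
    return dot / (norm_b * norm_t)

middle = [("E", 5.0)] * 4
same = [("A", 0.0), ("A", 0.0), ("B", 0.0)] + middle + [("A", 10.0), ("A", 10.0), ("B", 10.0)]
diff = [("A", 0.0), ("A", 0.0), ("B", 0.0)] + middle + [("C", 10.0), ("C", 10.0), ("D", 10.0)]

sim_same = cosine_top_bottom(same, min_genes=2, min_transcripts=2)   # identical profiles
sim_diff = cosine_top_bottom(diff, min_genes=2, min_transcripts=2)   # disjoint profiles
sim_sparse = cosine_top_bottom(same)                                 # too few transcripts
```

A similarity near 1 means the top and bottom of a cell express the same genes in similar proportions; a value near 0 suggests that the two z ranges may contain transcripts from different cells.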
- Parameters:
sdata (SpatialData) – A SpatialData object containing transcript-assigned spatial transcriptomics data.
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
points_key (str, default="transcripts") – Key in sdata.points for transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript to a cell.
points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript.
points_x_key (str, default="x") – Column for the x-coordinate of each transcript.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript.
points_z_key (str, default="z") – Column specifying the z coordinate / depth for each transcript.
correct_z_drift (bool, default=True) – If True, correct global z-drift before computing within-cell z-quantiles. The corrected values are used only for defining top/bottom subsets.
max_points (int, default=1_000_000) – Maximum number of points used to fit the regression for z-drift correction. If more points are available, a random subsample of this size is used.
seed (int or None, default=0) – Random seed used for subsampling in z drift correction. If None, sampling is not reproducible.
q (float, default=0.30) – Quantile defining the bottom and top parts: the bottom part is z <= the q-quantile, the top part is z >= the (1-q)-quantile.
normalization (str or None, default=None) – Normalization to be applied to the data. Either Pearson residuals ("pearson"), scaled log-transform ("log") or raw counts ("raw" or None).
scale (float, default=1e4) – Scale for within-cell library size normalization (bottom+top).
min_genes (int, default=5) – Minimum number of genes with nonzero counts in (bottom OR top) required to score a cell.
min_transcripts (int, default=10) – Minimum number of transcripts required in EACH part (bottom and top) to score a cell.
inplace (bool, default=True) – Whether to add the results to sdata.tables[tables_key].obs.
- Returns:
DataFrame with columns [tables_cell_id_key, "cosine_sim_top_bottom_z"].
- Return type:
pd.DataFrame
- segtraq.vl.volume.vertical_signal_integrity_per_cell(sdata, vsi_map: ndarray, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: str | int = 'UNASSIGNED', points_gene_key: str = 'feature_name', points_x_key: str = 'x', points_y_key: str = 'y', inplace: bool = True)
Compute per-cell mean VSI by sampling a precomputed VSI map at transcript locations.
This metric assumes vsi_map is defined on the same coordinate system and scaling as the transcript coordinates in sdata.points[points_key], i.e. VSI values are indexed directly by integer x/y coordinates (after optional shift-to-origin). For each transcript, the VSI value is read from vsi_map[y_int, x_int] and then averaged across transcripts belonging to each cell.
- Parameters:
sdata (SpatialData) – A SpatialData object containing transcript-assigned spatial transcriptomics data.
vsi_map (np.ndarray) – 2D array of VSI values. Must be indexable as vsi_map[y, x], where x/y correspond to transcript coordinates (after optional shift-to-origin).
tables_key (str, default="table") – Key in sdata.tables for the cell-level metadata table. If inplace=True, results are merged into sdata.tables[tables_key].obs.
tables_cell_id_key (str, default="cell_id") – Column in the cell table uniquely identifying each cell.
points_key (str, default="transcripts") – Key in sdata.points for transcript-level data.
points_cell_id_key (str, default="cell_id") – Column in the points table linking each transcript to a cell.
points_background_id (str or int, default="UNASSIGNED") – Identifier for transcripts not assigned to any cell (background).
points_gene_key (str, default="feature_name") – Column specifying the gene/feature name for each transcript. Used to filter transcripts to features present in sdata.tables[tables_key].var_names.
points_x_key (str, default="x") – Column for the x-coordinate of each transcript.
points_y_key (str, default="y") – Column for the y-coordinate of each transcript.
inplace (bool, default=True) – Whether to add the results to sdata.tables[tables_key].obs.
- Returns:
DataFrame with columns [tables_cell_id_key, "mean_vsi"].
- Return type:
pd.DataFrame