The nuclear correlation (nc) accessor
The nuclear correlation accessor provides metrics for evaluating the agreement between the cell and nuclear segmentation masks.
- segtraq.nc.nuclear_correlation.compute_cell_nuc_correlation(sdata: SpatialData, table_key: str = 'table', cell_id_key: str = 'cell_id', metric: str = 'pearson', transcripts_key: str = 'transcripts', nucleus_by: str = 'nucleus_boundaries', feature_column: str = 'feature_name', x_coordinate: str = 'x', y_coordinate: str = 'y') → DataFrame
For each cell in the SpatialData table, identifies the nucleus with the highest IoU (intersection over union) and computes a correlation (e.g. Pearson) between the gene expression profiles of the cell and that nucleus.
- Parameters:
sdata (spatialdata.SpatialData) – A SpatialData object containing:
- .shapes['cell_boundaries'] and .shapes['nucleus_boundaries'] for polygon geometries,
- .tables[table_key] as an AnnData table.
table_key (str) – Key in sdata.tables pointing to the expression matrix.
cell_id_key (str) – Column in sdata.tables[table_key].obs containing the cell IDs to match with shapes.
metric (str) – Correlation metric. Only 'pearson' is currently supported.
transcripts_key (str) – Name of the transcripts Points element.
nucleus_by (str) – Name of the nucleus shape layer to aggregate transcripts by.
feature_column (str) – Column in the transcripts containing the feature name (e.g. gene or protein).
x_coordinate (str) – Column in the transcripts containing the x coordinate.
y_coordinate (str) – Column in the transcripts containing the y coordinate.
- Returns:
- DataFrame with columns:
cell_id: identifier of each cell,
best_nuc_id: matching nucleus ID with highest IoU (or None),
correlation: Pearson correlation between the cell and its matched nucleus gene counts (NaN if no match).
- Return type:
pandas.DataFrame
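The matching-then-correlating logic described above can be illustrated without the library. The following is a minimal sketch in plain Python, not segtraq's implementation: the `pearson` helper, the gene names, the counts, and the `best_nucleus` mapping are all hypothetical, and the cell-to-nucleus match is assumed to have been computed already.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        return math.nan
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return math.nan  # a constant profile has no defined correlation
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Hypothetical per-gene transcript counts, one value per gene in `genes`.
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
cell_counts = {"cell_1": [10, 4, 0, 6], "cell_2": [3, 3, 1, 0]}
nucleus_counts = {"nuc_7": [5, 2, 0, 3]}

# Assumed result of the IoU matching step; cell_2 has no overlapping nucleus.
best_nucleus = {"cell_1": "nuc_7", "cell_2": None}

rows = []
for cell_id, profile in cell_counts.items():
    nuc_id = best_nucleus[cell_id]
    if nuc_id is None:
        corr = math.nan  # mirrors the documented "NaN if no match"
    else:
        corr = pearson(profile, nucleus_counts[nuc_id])
    rows.append({"cell_id": cell_id, "best_nuc_id": nuc_id, "correlation": corr})

for row in rows:
    print(row)
```

Here the nucleus profile of `cell_1` is exactly half its whole-cell profile, so the correlation is 1; `cell_2` has no matched nucleus and gets NaN.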
- segtraq.nc.nuclear_correlation.compute_cell_nuc_ious(sdata: SpatialData, cell_shape_key: str = 'cell_boundaries', nuc_shape_key: str = 'nucleus_boundaries', n_jobs: int = -1, use_progress: bool = True) → DataFrame
Compute the per-cell IoU (intersection over union) between cell and nucleus boundaries in a SpatialData object.
- Parameters:
sdata (spatialdata.SpatialData) – Must contain cell and nuclear shapes.
cell_shape_key (str, optional) – The key in the shapes attribute of sdata that corresponds to cell boundaries.
nuc_shape_key (str, optional) – The key in the shapes attribute of sdata that corresponds to nucleus boundaries.
n_jobs (int, optional) – Number of parallel jobs. The default, -1, uses all CPUs.
use_progress (bool, optional) – Whether to display a progress bar with tqdm.
- Returns:
DataFrame with columns [cell_id, best_nuc_id, IoU]
- Return type:
pandas.DataFrame
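To make the IoU criterion concrete, here is a minimal sketch that represents each segmentation mask as a set of pixel coordinates rather than as the polygons the function operates on. The `rect`, `iou`, and `best_nucleus` helpers and all IDs are hypothetical, and reporting `(None, 0.0)` for a cell without any overlapping nucleus is an assumption of this sketch, not documented behavior.

```python
def rect(x0, y0, x1, y1):
    """Pixel set covering the half-open box [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def iou(mask_a, mask_b):
    """Intersection over union of two pixel sets."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def best_nucleus(cell_mask, nuclei):
    """Return (nucleus_id, IoU) for the nucleus with the highest IoU.

    Returns (None, 0.0) when no nucleus overlaps the cell (an assumption).
    """
    best_id, best_score = None, 0.0
    for nuc_id, nuc_mask in nuclei.items():
        score = iou(cell_mask, nuc_mask)
        if score > best_score:
            best_id, best_score = nuc_id, score
    return best_id, best_score


# Toy masks: nuc_a lies fully inside cell_1, nuc_b only touches its corner.
cells = {"cell_1": rect(0, 0, 4, 4), "cell_2": rect(10, 10, 14, 14)}
nuclei = {"nuc_a": rect(1, 1, 3, 3), "nuc_b": rect(3, 3, 6, 6)}

for cell_id, cell_mask in cells.items():
    nuc_id, score = best_nucleus(cell_mask, nuclei)
    print(cell_id, nuc_id, score)
```

For `cell_1`, `nuc_a` covers 4 of the 16 cell pixels and adds nothing outside the cell, giving an IoU of 0.25, which beats the single shared pixel with `nuc_b`.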
- segtraq.nc.nuclear_correlation.compute_correlation_between_parts(sdata: SpatialData, table_key: str = 'table', cell_shape_key: str = 'cell_boundaries', nuc_shape_key: str = 'nucleus_boundaries', transcripts_key: str = 'transcripts', feature_column: str = 'feature_name', x_coordinate: str = 'x', y_coordinate: str = 'y') → DataFrame
Compute the Pearson correlation between the part of each cell that overlaps its nucleus and the rest of the cell.
- Parameters:
sdata (SpatialData) – The SpatialData object containing cells, nuclei, and transcript points.
table_key (str) – Key in sdata.tables pointing to the expression matrix.
cell_shape_key (str) – Key for cell boundaries in sdata.shapes.
nuc_shape_key (str) – Key for nucleus boundaries in sdata.shapes.
transcripts_key (str) – Key for transcript points in sdata.points.
feature_column (str) – Feature column in transcript points (e.g. gene name).
x_coordinate (str) – Column name for x coordinate.
y_coordinate (str) – Column name for y coordinate.
- Returns:
DataFrame with columns ['cell_id', 'best_nuc_id', 'correlation']
- Return type:
pandas.DataFrame
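The idea of splitting a cell into its nucleus-overlapping part and the remainder can be sketched as follows. This is an illustration under simplifying assumptions, not the library's code: the cell and nucleus are axis-aligned boxes instead of polygons, and the transcripts, gene names, and helper functions are hypothetical.

```python
import math
from collections import Counter


def in_box(x, y, box):
    """True if point (x, y) lies inside the closed box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        return math.nan
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else math.nan


# Simplified geometry (assumption): boxes stand in for segmentation polygons.
cell_box = (0, 0, 10, 10)
nucleus_box = (3, 3, 7, 7)

# Hypothetical transcripts as (feature_name, x, y).
transcripts = [
    ("GeneA", 4, 4), ("GeneA", 5, 5), ("GeneA", 6, 4), ("GeneB", 5, 6),
    ("GeneA", 1, 1), ("GeneA", 9, 2), ("GeneB", 2, 8),
    ("GeneC", 1, 9), ("GeneC", 8, 8), ("GeneC", 9, 9),
    ("GeneC", 12, 12),  # outside the cell, ignored
]

nuclear_part, rest_of_cell = Counter(), Counter()
for gene, x, y in transcripts:
    if not in_box(x, y, cell_box):
        continue
    # Nuclear part = inside the cell AND inside the nucleus.
    if in_box(x, y, nucleus_box):
        nuclear_part[gene] += 1
    else:
        rest_of_cell[gene] += 1

# Align both parts on a shared gene order before correlating.
genes = sorted(set(nuclear_part) | set(rest_of_cell))
nuclear_profile = [nuclear_part[g] for g in genes]
rest_profile = [rest_of_cell[g] for g in genes]
corr = pearson(nuclear_profile, rest_profile)
print(genes, nuclear_profile, rest_profile, round(corr, 3))
```

In this toy cell, GeneC appears only outside the nucleus, which pulls the correlation between the two parts below zero (about -0.33).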