The nuclear correlation (nc) accessor

The nuclear correlation accessor provides metrics for evaluating the agreement between the cell and nuclear segmentation masks.

segtraq.nc.nuclear_correlation.compute_cell_nuc_correlation(sdata: SpatialData, table_key: str = 'table', cell_id_key: str = 'cell_id', metric: str = 'pearson', transcripts_key: str = 'transcripts', nucleus_by: str = 'nucleus_boundaries', feature_column: str = 'feature_name', x_coordinate: str = 'x', y_coordinate: str = 'y') → DataFrame

For each cell in the SpatialData table, identifies the nucleus with the highest IoU and computes a correlation (currently only Pearson) between the gene expression profiles of the cell and that nucleus.

Parameters:
  • sdata (spatialdata.SpatialData) –

    A SpatialData object containing:
    • .shapes['cell_boundaries'] and .shapes['nucleus_boundaries'] for polygon geometries,

    • .tables[table_key] as an AnnData table.

  • table_key (str) – Key in sdata.tables pointing to the expression matrix.

  • cell_id_key (str) – Column in sdata.tables[table_key].obs containing the cell IDs to match with shapes.

  • metric (str) – Correlation metric. Currently only 'pearson' is supported.

  • transcripts_key (str) – Name of the transcripts Points element.

  • nucleus_by (str) – Name of the nucleus shapes layer to aggregate by.

  • feature_column (str) – Column in the transcripts element containing the feature name (e.g. gene or protein).

  • x_coordinate (str) – Column in the transcripts element containing the x coordinate.

  • y_coordinate (str) – Column in the transcripts element containing the y coordinate.

Returns:

DataFrame with columns:
  • cell_id: identifier of each cell,

  • best_nuc_id: ID of the nucleus with the highest IoU (None if no nucleus matches),

  • correlation: Pearson correlation between the cell and its matched nucleus gene counts (NaN if no match).

Return type:

pandas.DataFrame
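
The correlation step described above can be illustrated with a minimal, standard-library sketch. It is not the segtraq implementation: the helper names `pearson` and `profile_correlation`, the toy gene lists, and the choice to compare counts over the union of observed genes are all assumptions made for illustration.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        return math.nan
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def profile_correlation(cell_genes, nucleus_genes):
    """Correlate per-gene transcript counts of a cell and its matched nucleus."""
    cell_counts = Counter(cell_genes)
    nuc_counts = Counter(nucleus_genes)
    # Assumption: compare counts over the union of genes seen in either profile.
    genes = sorted(set(cell_counts) | set(nuc_counts))
    return pearson([cell_counts[g] for g in genes], [nuc_counts[g] for g in genes])


# Hypothetical transcripts (gene names) assigned to one cell and its nucleus.
cell = ["ACTB"] * 4 + ["GAPDH"] * 2 + ["CD3E"]
nucleus = ["ACTB"] * 2 + ["GAPDH"]
r = profile_correlation(cell, nucleus)
print(f"correlation = {r:.3f}")
```

A value close to 1 indicates that the nucleus and the whole cell have similar relative gene counts; the sketch returns NaN when either profile is constant, mirroring the NaN convention documented for unmatched cells.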

segtraq.nc.nuclear_correlation.compute_cell_nuc_ious(sdata: SpatialData, cell_shape_key: str = 'cell_boundaries', nuc_shape_key: str = 'nucleus_boundaries', n_jobs: int = -1, use_progress: bool = True) → DataFrame

Compute per-cell IoU between cell and nucleus boundaries in a SpatialData object.

Parameters:
  • sdata (spatialdata.SpatialData) – Must contain cell and nuclear shapes.

  • cell_shape_key (str, optional) – The key in the shapes attribute of sdata that corresponds to cell boundaries.

  • nuc_shape_key (str, optional) – The key in the shapes attribute of sdata that corresponds to nucleus boundaries.

  • n_jobs (int, optional) – Number of parallel jobs. The default, -1, uses all CPUs.

  • use_progress (bool, optional) – Whether to display a progress bar with tqdm.

Returns:

DataFrame with columns [cell_id, best_nuc_id, IoU]

Return type:

pandas.DataFrame
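
As a rough illustration of the per-cell IoU matching, the sketch below picks, for each cell, the nucleus with the highest intersection over union. It is a simplification under stated assumptions: shapes are axis-aligned boxes `(xmin, ymin, xmax, ymax)` rather than the polygons stored in sdata.shapes, the helpers `box_area`, `box_iou` and `best_nucleus` are hypothetical, and reporting an IoU of 0.0 for unmatched cells is a choice made here, not documented behavior.

```python
def box_area(box):
    """Area of an axis-aligned box (xmin, ymin, xmax, ymax); 0 if degenerate."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def best_nucleus(cell_box, nuclei):
    """Return (best_nuc_id, IoU) for the nucleus with the highest IoU.

    Returns (None, 0.0) when no nucleus overlaps the cell.
    """
    best_id, best_iou = None, 0.0
    for nuc_id, nuc_box in nuclei.items():
        iou = box_iou(cell_box, nuc_box)
        if iou > best_iou:
            best_id, best_iou = nuc_id, iou
    return best_id, best_iou


# Hypothetical toy data: two cells and two nuclei.
cells = {"c1": (0, 0, 10, 10), "c2": (20, 20, 30, 30)}
nuclei = {"n1": (2, 2, 6, 6), "n2": (8, 8, 14, 14)}

# One row per cell: (cell_id, best_nuc_id, IoU).
rows = [(cell_id, *best_nucleus(box, nuclei)) for cell_id, box in cells.items()]
for row in rows:
    print(row)
```

In this toy data, cell c1 overlaps both nuclei but contains n1 entirely, so n1 wins; cell c2 overlaps neither nucleus and gets no match.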

segtraq.nc.nuclear_correlation.compute_correlation_between_parts(sdata: SpatialData, table_key: str = 'table', cell_shape_key: str = 'cell_boundaries', nuc_shape_key: str = 'nucleus_boundaries', transcripts_key: str = 'transcripts', feature_column: str = 'feature_name', x_coordinate: str = 'x', y_coordinate: str = 'y') → DataFrame

Compute the Pearson correlation between the part of each cell that overlaps its nucleus and the rest of the cell.

Parameters:
  • sdata (SpatialData) – The SpatialData object containing cells, nuclei, and transcript points.

  • table_key (str) – Key in sdata.tables pointing to the expression matrix.

  • cell_shape_key (str) – Key for cell boundaries in sdata.shapes.

  • nuc_shape_key (str) – Key for nucleus boundaries in sdata.shapes.

  • transcripts_key (str) – Key for transcript points in sdata.points.

  • feature_column (str) – Feature column in transcript points (e.g. gene name).

  • x_coordinate (str) – Column name for x coordinate.

  • y_coordinate (str) – Column name for y coordinate.

Returns:

DataFrame with columns ['cell_id', 'best_nuc_id', 'correlation']

Return type:

pandas.DataFrame