Correlation to nuclear segmentation#

The nuclear_correlation module provides metrics to assess how well cell segmentation masks align with nuclear segmentation masks. It is applicable only when nuclear masks are available.

[1]:

%load_ext autoreload
%autoreload 2

[2]:

import spatialdata as sd

sdata = sd.read_zarr("/home/lazic/src/segtraq/tests/data/xenium_sp_subset.zarr")

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/dask/dataframe/__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option `dataframe.query-planning` to `True` or None to enable the new Dask Dataframe implementation and silence this warning.
  warnings.warn(
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/xarray_schema/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import DistributionNotFound, get_distribution
version mismatch: detected: RasterFormatV02, requested: FormatV04
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only'
  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only'
  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only'
  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only'
  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only'
  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
version mismatch: detected: RasterFormatV02, requested: FormatV04
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only'
  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
version mismatch: detected: RasterFormatV02, requested: FormatV04
version mismatch: detected: RasterFormatV02, requested: FormatV04

[3]:

sdata

[3]:

SpatialData object, with associated Zarr store: /g/huber/users/lazic/src/segtraq/tests/data/xenium_sp_subset.zarr
├── Images
│     ├── 'he_image': DataTree[cyx] (3, 448, 527), (3, 224, 264), (3, 112, 131), (3, 56, 66), (3, 28, 33)
│     ├── 'if_image': DataTree[cyx] (3, 221, 260), (3, 111, 130), (3, 55, 65), (3, 28, 33), (3, 13, 16)
│     ├── 'morphology_focus': DataTree[cyx] (1, 763, 899), (1, 381, 450), (1, 191, 225), (1, 95, 112), (1, 47, 56)
│     └── 'morphology_mip': DataTree[cyx] (1, 763, 899), (1, 381, 450), (1, 191, 225), (1, 95, 112), (1, 47, 56)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (763, 899), (381, 450), (191, 225), (95, 112), (47, 56)
│     └── 'nucleus_labels': DataTree[yx] (763, 899), (381, 450), (191, 225), (95, 112), (47, 56)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 8) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (158, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (122, 2) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (132, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (122, 313)
with coordinate systems:
    ▸ 'global', with elements:
        he_image (Images), if_image (Images), morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)

[4]:

sdata["table"].obs

[4]:

	cell_id	transcript_counts	control_probe_counts	control_codeword_counts	total_counts	cell_area	nucleus_area	region
4612	4613	210	0	0	210	272.066406	27.319531	cell_boundaries
4613	4614	192	0	0	192	154.208594	33.280156	cell_boundaries
4614	4615	174	0	0	174	170.193906	31.744844	cell_boundaries
4615	4616	70	0	0	70	85.796875	26.235781	cell_boundaries
4616	4617	85	0	0	85	101.737031	16.843281	cell_boundaries
...	...	...	...	...	...	...	...	...
80758	80759	227	0	0	227	256.984219	42.221094	cell_boundaries
80759	80760	198	0	0	198	222.800938	30.028906	cell_boundaries
80760	80761	110	0	0	110	111.355313	33.144688	cell_boundaries
80761	80762	150	0	0	150	322.189844	60.644844	cell_boundaries
80762	80763	109	0	0	109	152.763594	41.769531	cell_boundaries

122 rows × 8 columns

The spatialdatadataset contains cell and nuclear masks as Shapesand Labels, as shown below.

Intersection over Union between cell and nucleus masks#

First, we compute the Intersection over Union (IoU) between cell and nuclear masks using compute_cell_nuc_ious.

[5]:

import time

import segtraq as st

n = 1

start = time.time()
results_df = st.nc.compute_cell_nuc_ious(sdata, n_jobs=n)
end = time.time()

print(f"Elapsed time with {n} threads: {end - start:.2f} seconds")

Processing IoU between cells and nuclei: 100%|██████████| 158/158 [00:00<00:00, 702.68it/s]

Elapsed time with 1 threads: 0.23 seconds

For each cell_id, we obtain the ID (best_nuc_id) of the nucleus mask with the highest IoU. If a cell does not overlap with any nucleus, the function returns a missing value for best_nuc_id. If the nucleus has an invalid geometry, IoUis reported as NA.

[6]:

results_df

[6]:

	cell_id	best_nuc_id	IoU
0	4613	4613	0.094093
1	4614	4614	0.207238
2	4615	4615	0.178686
3	4616	4616	0.292909
4	4617	4617	0.161824
...	...	...	...
153	80765	None	0.000000
154	80766	None	0.000000
155	80767	None	0.000000
156	80768	None	0.000000
157	80769	None	0.000000

158 rows × 3 columns

We will store the results in the .obs of the Table within the sdata object for plotting.

[7]:

sdata["table"]

[7]:

AnnData object with n_obs × n_vars = 122 × 313
    obs: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatialdata_attrs'
    obsm: 'spatial'

[8]:

ious_df = results_df.set_index("cell_id")

sdata["table"].obs["IoU"] = sdata["table"].obs["cell_id"].map(ious_df["IoU"])
sdata["table"].obs["best_nuc_id"] = sdata["table"].obs["cell_id"].map(ious_df["best_nuc_id"])

[9]:

sdata

[9]:

SpatialData object, with associated Zarr store: /g/huber/users/lazic/src/segtraq/tests/data/xenium_sp_subset.zarr
├── Images
│     ├── 'he_image': DataTree[cyx] (3, 448, 527), (3, 224, 264), (3, 112, 131), (3, 56, 66), (3, 28, 33)
│     ├── 'if_image': DataTree[cyx] (3, 221, 260), (3, 111, 130), (3, 55, 65), (3, 28, 33), (3, 13, 16)
│     ├── 'morphology_focus': DataTree[cyx] (1, 763, 899), (1, 381, 450), (1, 191, 225), (1, 95, 112), (1, 47, 56)
│     └── 'morphology_mip': DataTree[cyx] (1, 763, 899), (1, 381, 450), (1, 191, 225), (1, 95, 112), (1, 47, 56)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (763, 899), (381, 450), (191, 225), (95, 112), (47, 56)
│     └── 'nucleus_labels': DataTree[yx] (763, 899), (381, 450), (191, 225), (95, 112), (47, 56)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 8) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (158, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (122, 2) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (132, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (122, 313)
with coordinate systems:
    ▸ 'global', with elements:
        he_image (Images), if_image (Images), morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)

The histogram of IoU distribution (below) shows a median IoU is at 0.16, i.e. 16% overlap.

[10]:

import matplotlib.pyplot as plt
import numpy as np

# Prepare IoU values
ious = sdata["table"].obs["IoU"].dropna()
median_iou = np.median(ious)

# Set up figure with histogram
fig, ax = plt.subplots(figsize=(6, 5), constrained_layout=True)

# Plot histogram
ax.hist(ious, bins=30, color="steelblue", edgecolor="black")

# Add median line
ax.axvline(
    median_iou,
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Median = {median_iou:.2f}",
)

# Decorate
ax.set_title("IoU Distribution with Median")
ax.set_xlabel("IoU")
ax.set_ylabel("Cell Count")
ax.legend()

plt.show()

../_images/notebooks_nuclear_correlation_17_0.png

In the spatial plots, we can see that cells that have a high overlap with nuclei, also show a high IoU value.

[11]:

import matplotlib.pyplot as plt

axes = plt.subplots(1, 3, figsize=(15, 5), constrained_layout=True)[1].flatten()

# link annotations with cell boundaries
sdata.tables["table"].obs["region"] = "cell_boundaries"
sdata.set_table_annotates_spatialelement("table", region="cell_boundaries")

# plot
sdata.pl.render_shapes("nucleus_boundaries").pl.show(
    ax=axes[0], title="Nucleus boundaries", coordinate_systems="global"
)

sdata.pl.render_shapes("cell_boundaries", color="IoU").pl.show(
    ax=axes[1], title="Cell boundaries colored by IoU", coordinate_systems="global"
)

sdata.pl.render_shapes(
    element="nucleus_boundaries",
    fill_alpha=0.2,
    outline_alpha=1.0,
    outline_color="black",
).pl.render_shapes(
    element="cell_boundaries",
    color="IoU",
    cmap="viridis",
    fill_alpha=0.5,
    outline_alpha=1.0,
    outline_width=0.5,
    outline_color="black",
).pl.show(ax=axes[2], title="Overlay of nuclei and cell masks colored by IoU", colorbar=True)

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/spatialdata.py:511: UserWarning: Converting `region_key: region` to categorical dtype.
  convert_region_column_to_categorical(table)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:105: UserWarning: Key `cell_boundaries` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:125: UserWarning: Key `table` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:105: UserWarning: Key `cell_boundaries` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:125: UserWarning: Key `table` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)

../_images/notebooks_nuclear_correlation_19_1.png

Expression correlation between cell and nucleus#

Expression correlation between a cell and the “most-overlapping” nucleus can be computed using compute_cell_nuc_correlation. This function will internally compute IoU using compute_cell_nuc_ious unless already present in the spatialdataobject. Currently only pearsoncorrelation is supported. More options will be added later.

[12]:

import time

import segtraq as st

[13]:

start = time.time()
corr_df = st.nc.compute_cell_nuc_correlation(sdata)
end = time.time()

print(f"Elapsed time without computing IoU: {end - start:.2f} seconds")

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/operations/aggregate.py:452: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  aggregated = joined.groupby([INDEX, vk])[ONES_COLUMN].agg(agg_func).reset_index()
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/anndata/_core/aligned_df.py:68: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/models/models.py:1144: UserWarning: Converting `region_key: region` to categorical dtype.
  return convert_region_column_to_categorical(adata)

Elapsed time without computing IoU: 0.85 seconds

[14]:

corr_df

[14]:

	cell_id	best_nuc_id	IoU	correlation
0	4613	4613	0.094093	0.272743
1	4614	4614	0.207238	0.878210
2	4615	4615	0.178686	0.739670
3	4616	4616	0.292909	0.779925
4	4617	4617	0.161824	0.910924
...	...	...	...	...
117	80759	80759	0.160499	0.812877
118	80760	80760	0.128202	0.914975
119	80761	80761	0.288906	0.913159
120	80762	80762	0.180239	0.913756
121	80763	80763	0.269007	0.733573

122 rows × 4 columns

We will store the results in the .obs of the Table within the sdata object for plotting.

[15]:

corr_df = corr_df.set_index("cell_id")

sdata["table"].obs["pearson_corr"] = sdata["table"].obs["cell_id"].map(corr_df["correlation"])

The histogram below shows the distribution of Pearson correlation values across cells. It is right-skewed, with a median of 0.74. This provides intuition about possible spatial spillover: cells may be contaminated by neighboring cells, assuming that nuclei capture expression with less contamination due to their smaller radius.

[16]:

# Prepare corr values
obs = sdata["table"].obs
corr = obs["pearson_corr"]
median_corr = np.median(corr)

# Set up figure with histogram
fig, ax = plt.subplots(figsize=(6, 5), constrained_layout=True)

# Plot histogram
ax.hist(corr, bins=30, color="steelblue", edgecolor="black")

# Add median line
ax.axvline(
    median_corr,
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Median = {median_corr:.2f}",
)

# Decorate
ax.set_title("Pearson correlation between cell and nucleus expression")
ax.set_xlabel("Pearson Correlation")
ax.set_ylabel("Cell Count")
ax.legend()

plt.show()

../_images/notebooks_nuclear_correlation_28_0.png

The calculated R2 (coefficient of determination) is 0.214 indicating that correlation between cell and nucleus increases with their IoU.

[17]:

import seaborn as sns
from scipy.stats import linregress

# Extract the relevant columns, excluding NaNs
df = obs[["IoU", "pearson_corr"]].dropna()

# Compute regression metrics
slope, intercept, r_value, p_value, std_err = linregress(df["IoU"], df["pearson_corr"])
r_squared = r_value**2

# Create the plot
plt.figure(figsize=(6, 6))
ax = sns.regplot(
    data=df,
    x="IoU",
    y="pearson_corr",
    scatter_kws={"alpha": 0.6},
    line_kws={"color": "red"},
    ci=95,
)

# Overlay the R² value
ax.text(
    0.05,
    0.95,
    f"$R^2 = {r_squared:.3f}$",
    transform=ax.transAxes,
    verticalalignment="top",
    fontsize=12,
    bbox=dict(boxstyle="round,pad=0.3", facecolor="white", alpha=0.8),
)

ax.set_xlabel("IoU")
ax.set_ylabel("Pearson Correlation between cell and nucleus intersection")
ax.set_title("Correlation vs. IoU with Regression Line")
ax.grid(True)

plt.show()

../_images/notebooks_nuclear_correlation_30_0.png

The spatial plot below overlays nucleus boundaries, cell masks colored by Pearson correlation, and transcripts. Cells whose transcripts are distributed across the nucleus region tend to have higher correlation with the nucleus, while cells whose transcripts are localized outside the nuclear region—potentially due to spillover from neighboring cells—tend to exhibit lower correlation.

[18]:

# link annotations with cell boundaries
sdata.tables["table"].obs["region"] = "cell_boundaries"
sdata.set_table_annotates_spatialelement("table", region="cell_boundaries")

axes = plt.subplots(1, 2, figsize=(10, 5), constrained_layout=True)[1].flatten()

sdata.pl.render_shapes(
    element="nucleus_boundaries",
    fill_alpha=0.2,
    outline_alpha=1.0,
    outline_color="black",
).pl.render_shapes(
    element="cell_boundaries",
    color="pearson_corr",
    cmap="viridis",
    fill_alpha=0.5,
    outline_alpha=1.0,
    outline_width=0.5,
    outline_color="black",
).pl.render_points(
    "transcripts",
    color="feature_name",
    groups="MMP2",
    palette="red",
).pl.show(
    ax=axes[0],
    title="Overlay of nuclei and cell masks colored by pearson correlation",
    colorbar=True,
    figsize=(6, 6),
)


sdata.pl.render_shapes(
    element="nucleus_boundaries",
    fill_alpha=0.2,
    outline_alpha=1.0,
    outline_color="black",
).pl.render_shapes(
    element="cell_boundaries",
    color="IoU",
    cmap="viridis",
    fill_alpha=0.5,
    outline_alpha=1.0,
    outline_width=0.5,
    outline_color="black",
).pl.show(
    ax=axes[1],
    title="Overlay of nuclei and cell masks colored by IoU",
    colorbar=True,
    figsize=(6, 6),
)

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/spatialdata.py:511: UserWarning: Converting `region_key: region` to categorical dtype.
  convert_region_column_to_categorical(table)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:105: UserWarning: Key `cell_boundaries` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:125: UserWarning: Key `table` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/anndata/_core/anndata.py:381: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/anndata/_core/aligned_df.py:68: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:115: UserWarning: Key `transcripts` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)

INFO     input has more than 103 categories. Uniform 'grey' color will be used for all categories.

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata_plot/pl/utils.py:775: FutureWarning: The default value of 'ignore' for the `na_action` parameter in pandas.Categorical.map is deprecated and will be changed to 'None' in a future version. Please set na_action to the desired value to avoid seeing this warning
  color_vector = color_source_vector.map(color_mapping)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata_plot/pl/render.py:682: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = ax.scatter(
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:105: UserWarning: Key `cell_boundaries` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:125: UserWarning: Key `table` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)

../_images/notebooks_nuclear_correlation_32_3.png

Correlation between Intersection and Remainder Parts of the Cell#

The function compute_correlation_between_parts() computes the Pearson correlation between the spatial transcript feature counts within the intersection of a cell and its matched nucleus, and the remainder of the cell. It returns NaN when the regions have no overlapping transcripts or lack sufficient variability. A low correlation may indicate that the cell boundary extension captures neighborhood or ambient signal rather than true intracellular expression—a concern similarly highlighted by segmentation benchmarks such as Baysor. This is based on the assumption that transcripts are homogeneously distributed within the cell and not localized in specific subcellular regions.

[19]:

start = time.time()
corr_parts_df = st.nc.compute_correlation_between_parts(sdata)
end = time.time()

print(f"Elapsed time without computing IoU: {end - start:.2f} seconds")

Computing correlations: 122it [00:02, 59.79it/s]

Elapsed time without computing IoU: 2.07 seconds

[20]:

len(corr_parts_df) == len(sdata.tables["table"].obs)

[20]:

True

We will store the results in the .obs of the Table within the sdata object for plotting.

[32]:

corr_parts_df = corr_parts_df.set_index("cell_id")

sdata["table"].obs["pearson_corr_parts"] = sdata["table"].obs["cell_id"].map(corr_parts_df["correlation_parts"])

[33]:

sdata["table"].obs

[33]:

	cell_id	transcript_counts	control_probe_counts	control_codeword_counts	total_counts	cell_area	nucleus_area	region	IoU	best_nuc_id	pearson_corr	pearson_corr_parts
4612	4613	210	0	0	210	272.066406	27.319531	cell_boundaries	0.094093	4613	0.272743	0.113081
4613	4614	192	0	0	192	154.208594	33.280156	cell_boundaries	0.207238	4614	0.878210	0.529910
4614	4615	174	0	0	174	170.193906	31.744844	cell_boundaries	0.178686	4615	0.739670	0.289943
4615	4616	70	0	0	70	85.796875	26.235781	cell_boundaries	0.292909	4616	0.779925	0.258221
4616	4617	85	0	0	85	101.737031	16.843281	cell_boundaries	0.161824	4617	0.910924	0.761505
...	...	...	...	...	...	...	...	...	...	...	...	...
80758	80759	227	0	0	227	256.984219	42.221094	cell_boundaries	0.160499	80759	0.812877	0.628438
80759	80760	198	0	0	198	222.800938	30.028906	cell_boundaries	0.128202	80760	0.914975	0.896952
80760	80761	110	0	0	110	111.355313	33.144688	cell_boundaries	0.288906	80761	0.913159	0.751234
80761	80762	150	0	0	150	322.189844	60.644844	cell_boundaries	0.180239	80762	0.913756	0.786104
80762	80763	109	0	0	109	152.763594	41.769531	cell_boundaries	0.269007	80763	0.733573	0.360049

122 rows × 12 columns

[23]:

sdata["table"].obs["pearson_corr_parts"].isna().sum()

[23]:

np.int64(2)

The histogram below shows that the Pearson correlation between the intersection and remainder region of the cell is 0.4.

[24]:

obs = sdata["table"].obs
# Prepare corr_parts values
corr = obs["pearson_corr_parts"]
median_corr_parts = corr.median()

# Set up figure with histogram
fig, ax = plt.subplots(figsize=(6, 5), constrained_layout=True)

# Plot histogram
ax.hist(corr, bins=30, color="steelblue", edgecolor="black")

# Add median line
ax.axvline(
    median_corr_parts,
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Median = {median_corr_parts:.2f}",
)

# Decorate
ax.set_title("Pearson correlation between cell intersection and remainder")
ax.set_xlabel("Pearson Correlation")
ax.set_ylabel("Cell Count")
ax.legend()

plt.show()

../_images/notebooks_nuclear_correlation_41_0.png

The histogram below visualizes the Pearson correlation between transcript counts in the intersection and remainder of each cell, plotted against the Intersection over Union (IoU) with the nucleus. The calculated R2 (coefficient of determination) value is 0.005, indicating that IoU explains less than 1% of the variance in correlation. This negligible relationship suggests that normalization of the correlation metric by IoU is not necessary.

[25]:

import seaborn as sns

# Extract the relevant columns, excluding NaNs
df = obs[["IoU", "pearson_corr_parts"]].dropna()

# Compute regression metrics
slope, intercept, r_value, p_value, std_err = linregress(df["IoU"], df["pearson_corr_parts"])
r_squared = r_value**2

# Create the plot
plt.figure(figsize=(6, 6))
ax = sns.regplot(
    data=df,
    x="IoU",
    y="pearson_corr_parts",
    scatter_kws={"alpha": 0.6},
    line_kws={"color": "red"},
    ci=95,
)

# Overlay the R² value
ax.text(
    0.05,
    0.95,
    f"$R^2 = {r_squared:.3f}$",
    transform=ax.transAxes,
    verticalalignment="top",
    fontsize=12,
    bbox=dict(boxstyle="round,pad=0.3", facecolor="white", alpha=0.8),
)

ax.set_xlabel("IoU")
ax.set_ylabel("Pearson Correlation Between Parts")
ax.set_title("Correlation vs. IoU with Regression Line")
ax.grid(True)

plt.show()

../_images/notebooks_nuclear_correlation_43_0.png

Because spatialdata-plot plots all boundaries in grey, if there are NaN values in the annotation (describe before), we will copy the pearson_corr_parts column and set NaNvalues to 0.0just for plotting. We will try to solve this issue later.

[26]:

sdata["table"].obs["pearson_corr_parts_plot"] = (
    sdata["table"].obs["cell_id"].map(corr_parts_df["correlation_parts"])
)  # TODO
sdata["table"].obs.loc[sdata["table"].obs["pearson_corr_parts_plot"].isna(), "pearson_corr_parts_plot"] = 0.0

The spatial plots below shows the spatial distribution of computed correlations. The Pearson correlation between parts can be a measure of how much neighboring signal is captured.

[27]:

# link annotations with cell boundaries
sdata.tables["table"].obs["region"] = "cell_boundaries"
sdata.set_table_annotates_spatialelement("table", region="cell_boundaries")

axes = plt.subplots(1, 3, figsize=(16, 6), constrained_layout=True)[1].flatten()

sdata.pl.render_shapes(
    element="nucleus_boundaries",
    fill_alpha=0.2,
    outline_alpha=1.0,
    outline_color="black",
).pl.render_shapes(
    element="cell_boundaries",
    color="IoU",
    cmap="viridis",
    fill_alpha=0.5,
    outline_alpha=1.0,
    outline_width=0.5,
    outline_color="black",
).pl.show(
    ax=axes[0],
    title="Overlay of nuclei and cell masks colored by IoU",
    colorbar=True,
    figsize=(6, 6),
)

sdata.pl.render_shapes(
    element="nucleus_boundaries",
    fill_alpha=0.2,
    outline_alpha=1.0,
    outline_color="black",
).pl.render_shapes(
    element="cell_boundaries",
    color="pearson_corr",
    cmap="viridis",
    fill_alpha=0.5,
    outline_alpha=1.0,
    outline_width=0.5,
    outline_color="black",
).pl.render_points(
    "transcripts",
    color="feature_name",
    groups="MMP2",
    palette="red",
).pl.show(
    ax=axes[1],
    title="Overlay of nuclei and cell masks colored by pearson correlation",
    colorbar=True,
    figsize=(6, 6),
)

sdata.pl.render_shapes(
    element="nucleus_boundaries",
    fill_alpha=0.2,
    outline_alpha=1.0,
    outline_color="black",
).pl.render_shapes(
    element="cell_boundaries",
    color="pearson_corr_parts_plot",
    cmap="viridis",
    fill_alpha=0.5,
    outline_alpha=1.0,
    outline_width=0.5,
    outline_color="black",
).pl.render_points(
    "transcripts",
    color="feature_name",
    groups="MMP2",
    palette="red",
).pl.show(
    ax=axes[2],
    title="... colored by correlation between parts",
    colorbar=True,
    figsize=(6, 6),
)

# remove column again
sdata["table"].obs.drop(columns=["pearson_corr_parts_plot"], inplace=True)

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/spatialdata.py:511: UserWarning: Converting `region_key: region` to categorical dtype.
  convert_region_column_to_categorical(table)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:105: UserWarning: Key `cell_boundaries` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:125: UserWarning: Key `table` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)

INFO     input has more than 103 categories. Uniform 'grey' color will be used for all categories.

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:105: UserWarning: Key `cell_boundaries` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:125: UserWarning: Key `table` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/anndata/_core/anndata.py:381: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/anndata/_core/aligned_df.py:68: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:115: UserWarning: Key `transcripts` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata_plot/pl/utils.py:775: FutureWarning: The default value of 'ignore' for the `na_action` parameter in pandas.Categorical.map is deprecated and will be changed to 'None' in a future version. Please set na_action to the desired value to avoid seeing this warning
  color_vector = color_source_vector.map(color_mapping)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata_plot/pl/render.py:682: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = ax.scatter(

INFO     input has more than 103 categories. Uniform 'grey' color will be used for all categories.

/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:105: UserWarning: Key `cell_boundaries` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:125: UserWarning: Key `table` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/anndata/_core/anndata.py:381: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/anndata/_core/aligned_df.py:68: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata/_core/_elements.py:115: UserWarning: Key `transcripts` already exists. Overwriting it in-memory.
  self._check_key(key, self.keys(), self._shared_keys)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata_plot/pl/utils.py:775: FutureWarning: The default value of 'ignore' for the `na_action` parameter in pandas.Categorical.map is deprecated and will be changed to 'None' in a future version. Please set na_action to the desired value to avoid seeing this warning
  color_vector = color_source_vector.map(color_mapping)
/home/lazic/miniforge3/envs/sc_analysis_sdata/lib/python3.10/site-packages/spatialdata_plot/pl/render.py:682: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = ax.scatter(

../_images/notebooks_nuclear_correlation_47_5.png

[ ]:

Correlation to nuclear segmentation

Contents

Correlation to nuclear segmentation#

Intersection over Union between cell and nucleus masks#

Expression correlation between cell and nucleus#

Correlation between Intersection and Remainder Parts of the Cell#