The clustering stability (cs) accessor

The clustering stability accessor provides metrics for assessing how stable clustering results are across Leiden resolutions and random subsets of genes, together with a per-cell correlation of transcripts between the top and bottom of the z-plane.

segtraq.cs.clustering_stability.compute_ari(sdata: SpatialData, resolution: float = 1.0, n_genes_subset: int = 100, key_prefix: str = 'leiden_subset') → float

Compute the clustering stability using pairwise adjusted Rand index (ARI) on random subsets of genes.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing clustering information.

  • resolution (float, optional) – The resolution parameter for Leiden clustering, by default 1.0.

  • n_genes_subset (int, optional) – The number of genes to subset for clustering, by default 100.

  • key_prefix (str, optional) – The prefix for the keys under which the clustering results are stored, by default “leiden_subset”.

Returns:

The average pairwise ARI across the specified cluster keys.

Return type:

float
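
To make the metric concrete, here is a minimal pure-Python sketch of an average pairwise ARI. It is not segtraq's implementation: the helper names and the three example labelings are hypothetical, and the labelings stand in for Leiden runs on different random gene subsets. It assumes at least two cells and at least two labelings.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_a)
    # Contingency counts: cells falling in each (cluster_a, cluster_b) pair.
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_pairwise_ari(labelings):
    """Average ARI over all unordered pairs of labelings."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)


# Hypothetical cluster labels from three runs on different gene subsets.
run_1 = [0, 0, 0, 1, 1, 1]
run_2 = [1, 1, 1, 0, 0, 0]  # same partition as run_1, label names swapped
run_3 = [0, 0, 1, 1, 2, 2]  # a genuinely different partition

stability = mean_pairwise_ari([run_1, run_2, run_3])
print(round(stability, 3))
```

Because ARI ignores label names, run_1 and run_2 agree perfectly; only the differing partition in run_3 pulls the average down.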

segtraq.cs.clustering_stability.compute_purity(sdata: SpatialData, resolution: float = 1.0, n_genes_subset: int = 100, key_prefix: str = 'leiden_subset') → float

Compute the clustering consistency using pairwise purity scores across clustering runs on random gene subsets.

Parameters:
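  • sdata (sd.SpatialData) – The SpatialData object containing clustering information.

  • resolution (float, optional) – The resolution parameter for Leiden clustering, by default 1.0.

  • n_genes_subset (int, optional) – The number of genes to use per clustering run, by default 100.

  • key_prefix (str, optional) – The prefix for the keys under which cluster labels are stored in .obs, by default “leiden_subset”.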

Returns:

Average pairwise purity score.

Return type:

float
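
As an illustration of a pairwise purity score, the sketch below uses only the standard library. The helper names and example labelings are hypothetical. Purity is asymmetric, so this sketch averages over ordered pairs of runs; whether segtraq does the same is an assumption, not something the documentation states.

```python
from collections import Counter, defaultdict
from itertools import permutations


def purity(clusters, reference):
    """Fraction of cells that carry the majority reference label of their cluster."""
    members = defaultdict(Counter)
    for cluster, ref in zip(clusters, reference):
        members[cluster][ref] += 1
    # For each cluster, count the cells that share its most common reference label.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(clusters)


def mean_pairwise_purity(labelings):
    """Average purity over all ordered pairs of labelings (assumed convention)."""
    scores = [purity(a, b) for a, b in permutations(labelings, 2)]
    return sum(scores) / len(scores)


# Hypothetical cluster labels from three runs on different gene subsets.
run_1 = [0, 0, 0, 1, 1, 1]
run_2 = [1, 1, 1, 0, 0, 0]
run_3 = [0, 0, 1, 1, 2, 2]

consistency = mean_pairwise_purity([run_1, run_2, run_3])
print(round(consistency, 3))
```

Note that purity(run_3, run_1) is higher than purity(run_1, run_3): splitting clusters more finely can only raise purity, which is why the direction of the comparison matters.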

segtraq.cs.clustering_stability.compute_rmsd(sdata: SpatialData, resolution: float | list[float] = (0.6, 0.8, 1.0), key_prefix: str = 'leiden_subset', random_state: int = 42) → float

Compute the root-mean-square deviation (RMSD) for different Leiden clustering resolutions and report the best (lowest) value.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing clustering information.

  • resolution (float or list of float, optional) – The resolution parameter(s) for Leiden clustering, by default (0.6, 0.8, 1.0).

  • key_prefix (str, optional) – Prefix for clustering keys in .obs, by default “leiden_subset”.

  • random_state (int, optional) – Seed for reproducibility, by default 42.

Returns:

The best (lowest) RMSD across resolutions.

Return type:

float
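
The documentation does not state what the deviation is measured against. One plausible reading, shown here purely as an assumption, is the RMSD of each cell from the centroid of its assigned cluster, evaluated per resolution with the lowest value kept. The function name, coordinates, and labelings below are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def within_cluster_rmsd(points, labels):
    """RMSD of points from the centroid of their assigned cluster (assumed definition)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    # Centroid of each cluster: the per-dimension mean of its members.
    centroids = {
        label: [sum(dim) / len(members) for dim in zip(*members)]
        for label, members in groups.items()
    }
    squared = sum(
        sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )
    return sqrt(squared / len(points))


# Hypothetical embedding coordinates and cluster labels per resolution.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labelings = {
    0.6: [0, 0, 0, 0],  # one cluster
    1.0: [0, 0, 1, 1],  # two clusters
}

rmsd_by_resolution = {
    res: within_cluster_rmsd(points, labels) for res, labels in labelings.items()
}
best_resolution = min(rmsd_by_resolution, key=rmsd_by_resolution.get)
print(best_resolution, rmsd_by_resolution[best_resolution])
```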

segtraq.cs.clustering_stability.compute_silhouette_score(sdata: SpatialData, resolution: float | list[float] = (0.6, 0.8, 1.0), metric: str = 'euclidean', ncomps: int = 30, key_prefix: str = 'leiden_subset', random_state: int = 42) → float

Compute the silhouette score for different resolutions and report the best one.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing clustering information.

  • resolution (float or list of float, optional) – The resolution parameter(s) for Leiden clustering, by default (0.6, 0.8, 1.0).

  • metric (str, optional) – The metric to use for silhouette score calculation, by default “euclidean”.

  • ncomps (int, optional) – The number of principal components to use, by default 30.

  • key_prefix (str, optional) – The prefix for the keys under which the clustering results are stored, by default “leiden_subset”.

  • random_state (int, optional) – Seed for reproducibility, by default 42.

Returns:

The best silhouette score across the tested resolutions.

Return type:

float
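
The following sketch illustrates the general idea of scoring several labelings by silhouette and keeping the best. It is a plain-Python illustration, not segtraq's code: the points stand in for principal-component coordinates, the labelings for Leiden results at each resolution, and all names are hypothetical. It assumes every labeling has at least two clusters and that "best" means the highest score.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lab, []).append(dist(p, q))
        if own not in by_cluster:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])  # mean distance within own cluster
        # Mean distance to the nearest other cluster.
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical coordinates and cluster labels per resolution.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labelings = {
    0.6: [0, 0, 1, 1],  # two compact clusters
    1.0: [0, 0, 1, 2],  # right-hand pair split into singletons
}

scores = {res: mean_silhouette(points, labels) for res, labels in labelings.items()}
best_resolution = max(scores, key=scores.get)
print(best_resolution, round(scores[best_resolution], 3))
```

In this toy case the over-split labeling scores lower, because singleton clusters contribute zero to the mean.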

segtraq.cs.clustering_stability.compute_z_plane_correlation(sdata: SpatialData, quantile: float = 25, transcript_key: str = 'transcripts', cell_key: str = 'cell_id', gene_key: str = 'feature_name') → DataFrame

Compute the Pearson correlation between the top and bottom quantiles of transcripts in the z-plane.

For each cell, the transcripts are subset by z-coordinate into a bottom and a top quantile, and the Pearson correlation between the two subsets is calculated.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing transcript data.

  • quantile (float, optional) – The quantile to use for bottom and top subsets, by default 25.

  • transcript_key (str, optional) – The key for transcripts in sdata.points, by default “transcripts”.

  • cell_key (str, optional) – The key for cell IDs in sdata.points, by default “cell_id”.

  • gene_key (str, optional) – The key for gene names in sdata.points, by default “feature_name”.

Returns:

A DataFrame with cell IDs as index and Pearson correlations as values.

Return type:

pd.DataFrame
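
A rough sketch of the per-cell computation follows. Several details are assumptions rather than documented behaviour: the correlation is taken between per-gene transcript counts, the quantile is read as a percentage applied within each cell, and the subsets are formed by taking the lowest and highest share of transcripts after sorting by z. The function names and example data are hypothetical, and the sketch returns a dict instead of a DataFrame.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def z_plane_correlation(transcripts, quantile=25):
    """Correlate gene counts of the lowest and highest z subsets of one cell.

    transcripts: list of (z, gene) tuples belonging to a single cell.
    """
    ordered = sorted(transcripts, key=lambda t: t[0])
    # Number of transcripts in each subset (simplified quantile handling).
    k = max(1, int(len(ordered) * quantile / 100))
    bottom = Counter(gene for _, gene in ordered[:k])
    top = Counter(gene for _, gene in ordered[-k:])
    genes = sorted({gene for _, gene in transcripts})
    return pearson([bottom[g] for g in genes], [top[g] for g in genes])


# Hypothetical transcripts: the middle z-range is shared, only the top differs.
middle = [(1.0, "C"), (1.1, "C"), (1.2, "A"), (1.3, "B"), (1.4, "C"), (1.5, "C")]
low = [(0.1, "A"), (0.2, "A"), (0.3, "B")]
cells = {
    "cell_1": low + middle + [(2.0, "A"), (2.1, "A"), (2.2, "B")],  # same profile on top
    "cell_2": low + middle + [(2.0, "C"), (2.1, "C"), (2.2, "B")],  # different profile on top
}

correlations = {cell_id: z_plane_correlation(t) for cell_id, t in cells.items()}
print(correlations)
```

Under this reading, a correlation near 1 means the top and bottom of a cell express the same genes in similar proportions, while a low or negative value hints that the two ends of the cell look like different expression profiles.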