The clustering stability (cs) accessor
The clustering stability accessor provides metrics for assessing the stability of clustering results across different resolutions and random subsets of genes.
- segtraq.cs.clustering_stability.compute_ari(sdata: SpatialData, resolution: float = 1.0, n_genes_subset: int = 100, key_prefix: str = 'leiden_subset') → float
Compute the clustering stability using pairwise adjusted Rand index (ARI) on random subsets of genes.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing clustering information.
resolution (float, optional) – The resolution parameter for Leiden clustering, by default 1.0.
n_genes_subset (int, optional) – The number of genes to subset for clustering, by default 100.
key_prefix (str, optional) – The prefix for the keys under which the clustering results are stored, by default “leiden_subset”.
- Returns:
The average pairwise ARI across the specified cluster keys.
- Return type:
float
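To make the reported quantity concrete, the following is a minimal sketch of an average pairwise ARI over several label assignments. It is not the segtraq implementation: the helper names `adjusted_rand_index` and `mean_pairwise_ari` and the toy `runs` are hypothetical, and the Leiden clustering on gene subsets is assumed to have happened already.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label sequences of equal length."""
    total_pairs = comb(len(a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs of items that share a cluster in both labelings, in a only, in b only.
    same_in_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_in_a = sum(comb(c, 2) for c in Counter(a).values())
    same_in_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = same_in_a * same_in_b / total_pairs
    maximum = (same_in_a + same_in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings have a single cluster
    return (same_in_both - expected) / (maximum - expected)


def mean_pairwise_ari(labelings):
    """Average ARI over all unordered pairs of labelings."""
    pairs = list(combinations(labelings, 2))
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)


# Hypothetical cluster labels from three runs on different gene subsets.
runs = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first run, labels renamed
    [0, 0, 1, 1, 1, 2],  # one cell moved to a different cluster
]
score = mean_pairwise_ari(runs)
```

ARI ignores label names, so the first two runs agree perfectly, and the third run lowers the average.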
- segtraq.cs.clustering_stability.compute_purity(sdata: SpatialData, resolution: float = 1.0, n_genes_subset: int = 100, key_prefix: str = 'leiden_subset') → float
Compute the clustering consistency using pairwise purity scores across clustering runs on random gene subsets.
- Parameters:
sdata (SpatialData) – The SpatialData object.
resolution (float) – Leiden resolution parameter.
n_genes_subset (int) – Number of genes to use per clustering run.
key_prefix (str) – Prefix for storing cluster labels in .obs.
- Returns:
Average pairwise purity score.
- Return type:
float
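As an illustration of a pairwise purity score, here is a small sketch using the standard definition of purity (each cluster is credited with its most common label in the other labeling). How segtraq symmetrises or averages the score is not stated above, so averaging over ordered pairs is an assumption, and `purity`, `mean_pairwise_purity`, and `runs` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import permutations


def purity(reference, predicted):
    """Fraction of items covered by the majority reference label of their predicted cluster."""
    members = defaultdict(Counter)
    for ref_label, pred_label in zip(reference, predicted):
        members[pred_label][ref_label] += 1
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(reference)


def mean_pairwise_purity(labelings):
    """Average purity over ordered pairs (assumption: purity is asymmetric, so both directions count)."""
    pairs = list(permutations(labelings, 2))
    return sum(purity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical cluster labels from three clustering runs.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition, labels renamed
    [0, 0, 0, 0],  # everything merged into one cluster
]
score = mean_pairwise_purity(runs)
```

The merged run shows the asymmetry: scored against the first run it reaches a purity of only 0.5, while the first run scored against it reaches 1.0.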
- segtraq.cs.clustering_stability.compute_rmsd(sdata: SpatialData, resolution: float | list[float] = (0.6, 0.8, 1.0), key_prefix: str = 'leiden_subset', random_state: int = 42) → float
Compute RMSD for different Leiden clustering resolutions and report the best (lowest) RMSD.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing clustering information.
resolution (float or list of float, optional) – The resolution parameter(s) for Leiden clustering, by default (0.6, 0.8, 1.0).
key_prefix (str, optional) – Prefix for clustering keys in .obs, by default “leiden_subset”.
random_state (int, optional) – Seed for reproducibility, by default 42.
- Returns:
The best (lowest) RMSD across resolutions.
- Return type:
float
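The "compute per resolution, keep the lowest" pattern can be sketched as follows. The text above does not define which deviation the RMSD measures, so this sketch assumes the root-mean-square distance of each point to its cluster centroid. The function `within_cluster_rmsd`, the toy `points`, and the labels per resolution are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def within_cluster_rmsd(points, labels):
    """RMS distance of points to their cluster centroid (assumed definition)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    squared_sum = 0.0
    for members in groups.values():
        dims = range(len(members[0]))
        centroid = [sum(m[d] for m in members) / len(members) for d in dims]
        squared_sum += sum((m[d] - centroid[d]) ** 2 for m in members for d in dims)
    return sqrt(squared_sum / len(points))


# Toy 2D embedding and hypothetical labels obtained at two resolutions.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels_by_resolution = {
    0.6: [0, 0, 0, 0],  # coarse: one cluster
    1.0: [0, 0, 1, 1],  # finer: two clusters
}

rmsd_by_resolution = {
    res: within_cluster_rmsd(points, labels)
    for res, labels in labels_by_resolution.items()
}
best_resolution = min(rmsd_by_resolution, key=rmsd_by_resolution.get)
best_rmsd = rmsd_by_resolution[best_resolution]
```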
- segtraq.cs.clustering_stability.compute_silhouette_score(sdata: SpatialData, resolution: float | list[float] = (0.6, 0.8, 1.0), metric: str = 'euclidean', ncomps: int = 30, key_prefix: str = 'leiden_subset', random_state: int = 42) → float
Compute the silhouette score for different resolutions and report the best one.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing clustering information.
resolution (float or list of float, optional) – The resolution parameter(s) for Leiden clustering, by default (0.6, 0.8, 1.0).
metric (str, optional) – The metric to use for silhouette score calculation, by default “euclidean”.
ncomps (int, optional) – The number of principal components to use, by default 30.
key_prefix (str, optional) – The prefix for the keys under which the clustering results are stored, by default “leiden_subset”.
random_state (int, optional) – Seed for reproducibility, by default 42.
- Returns:
The best silhouette score across the tested resolutions.
- Return type:
float
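For orientation, a minimal sketch of a Euclidean silhouette score evaluated for several labelings, keeping the highest value. It skips the PCA step implied by `ncomps` and works directly on toy coordinates. `silhouette_score` here is a hypothetical standalone helper, not the segtraq or scikit-learn function, and the labels per resolution are invented for illustration.

```python
from collections import defaultdict
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Three well-separated pairs of points and hypothetical labels per resolution.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
labels_by_resolution = {
    0.6: [0, 0, 0, 0, 1, 1],  # two groups merged
    1.0: [0, 0, 1, 1, 2, 2],  # matches the true structure
}

scores = {res: silhouette_score(points, lab) for res, lab in labels_by_resolution.items()}
best_resolution = max(scores, key=scores.get)
best_score = scores[best_resolution]
```

Higher silhouette values indicate tighter, better separated clusters, so "best" is taken as the maximum in this sketch.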
- segtraq.cs.clustering_stability.compute_z_plane_correlation(sdata: SpatialData, quantile: float = 25, transcript_key: str = 'transcripts', cell_key: str = 'cell_id', gene_key: str = 'feature_name') → DataFrame
Compute the Pearson correlation between the top and bottom quantiles of transcripts in the z-plane.
For each cell, the transcripts are subset by z-coordinate into a bottom and a top quantile, and the Pearson correlation between the two subsets is calculated.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing transcript data.
quantile (float, optional) – The quantile, expressed as a percentage, used to define the bottom and top subsets, by default 25.
transcript_key (str, optional) – The key for transcripts in sdata.points, by default “transcripts”.
cell_key (str, optional) – The key for cell IDs in sdata.points, by default “cell_id”.
gene_key (str, optional) – The key for gene names in sdata.points, by default “feature_name”.
- Returns:
A DataFrame with cell IDs as index and Pearson correlations as values.
- Return type:
pd.DataFrame
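The per-cell computation can be sketched in plain Python as below. Several details are assumptions rather than documented behaviour: the cut-offs are computed per cell, the bottom and top slabs are compared through their per-gene transcript counts, and the result is a plain dict instead of a DataFrame. All helper names and the toy cells are hypothetical.

```python
from collections import Counter
from math import sqrt


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def pearson(x, y):
    """Pearson correlation; NaN when either input has zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def z_plane_correlation(transcripts, quantile=25):
    """Correlate gene counts of the bottom and top z slabs of one cell.

    transcripts: list of (gene, z) tuples belonging to a single cell.
    """
    z_values = [z for _, z in transcripts]
    low_cut = percentile(z_values, quantile)
    high_cut = percentile(z_values, 100 - quantile)
    bottom = Counter(gene for gene, z in transcripts if z <= low_cut)
    top = Counter(gene for gene, z in transcripts if z >= high_cut)
    genes = sorted({gene for gene, _ in transcripts})
    return pearson([bottom[g] for g in genes], [top[g] for g in genes])


# Two toy cells with 12 transcripts each at z = 0..11.
cells = {
    # same gene composition at the bottom and the top
    "consistent": [("A", 0), ("A", 1), ("B", 2)]
    + [("C", z) for z in range(3, 9)]
    + [("A", 9), ("A", 10), ("B", 11)],
    # different gene composition at the bottom and the top
    "mixed": [("A", 0), ("A", 1), ("B", 2)]
    + [("B", z) for z in range(3, 9)]
    + [("C", 9), ("C", 10), ("B", 11)],
}
correlations = {cell_id: z_plane_correlation(t) for cell_id, t in cells.items()}
```

In this toy setup, a cell whose bottom and top slabs contain the same genes correlates positively, while a cell whose slabs differ correlates negatively, which is the kind of signal that could point to two vertically stacked cells merged into one segment.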