The baseline (`bl`) accessor

The baseline accessor provides basic quality control metrics such as the number of cells, the number of genes, and the number of detected genes per cell.

segtraq.bl.baseline.genes_per_cell(sdata, tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_gene_key: str = 'feature_name', tables_key: str = 'table', inplace: bool = True) → DataFrame

Calculates the number of unique genes detected per cell.

Parameters:

sdata (object) – An object containing spatial transcriptomics data with a points attribute.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing cell IDs to match with points_cell_id_key. Default is “cell_id”.
points_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
points_gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

A DataFrame with one row per cell, containing the cell identifier and the count of unique genes detected in that cell.

Return type:

pandas.DataFrame
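
As an illustration of the quantity described above, the following minimal sketch counts unique genes per cell on a toy transcript table using plain pandas. It is not SegTraQ's implementation: the toy data and the output column name `num_genes` are assumptions made for this example; only the input column names follow the documented defaults.

```python
import pandas as pd

# Toy transcript table; column names mirror the documented defaults
# ("cell_id" and "feature_name"). The data itself is made up.
transcripts = pd.DataFrame(
    {
        "cell_id": [1, 1, 1, 2, 2],
        "feature_name": ["A", "A", "B", "A", "C"],
    }
)

# Number of distinct genes observed in each cell.
# "num_genes" is a hypothetical column name chosen for this sketch.
genes_per_cell = (
    transcripts.groupby("cell_id")["feature_name"]
    .nunique()
    .rename("num_genes")
    .reset_index()
)
```

Cell 1 carries transcripts of genes A and B, and cell 2 carries A and C, so both rows report two genes.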

segtraq.bl.baseline.mean_transcripts_per_gene_per_cell(sdata: SpatialData, tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_gene_key: str = 'feature_name', tables_key: str = 'table', inplace: bool = True) → DataFrame

Computes the mean number of transcripts per gene per cell.

Transcripts are first counted per (cell, gene). Then, for each cell, we compute the mean of these per-gene transcript counts across genes detected in that cell.

Notes

This mean is computed across detected genes only (i.e., genes with at least one transcript in the cell). Genes with zero transcripts in a cell are not included.

Parameters:

sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data with a points attribute.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing cell IDs to match with points_cell_id_key. Default is “cell_id”.
points_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
points_gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

A DataFrame with one row per cell containing the mean transcripts per detected gene: columns are [points_cell_id_key, “mean_transcripts_per_gene”].

Return type:

pd.DataFrame
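
The two-step computation described above (count per (cell, gene), then average within each cell) can be sketched with pandas as follows. This is an illustrative sketch on made-up data, not the library's code; the intermediate column name `n_transcripts` is an assumption of the example.

```python
import pandas as pd

# Toy transcript table using the documented default column names.
transcripts = pd.DataFrame(
    {
        "cell_id": [1, 1, 1, 1, 2, 2],
        "feature_name": ["A", "A", "A", "B", "A", "C"],
    }
)

# Step 1: count transcripts for every (cell, gene) pair that occurs.
# Pairs with zero transcripts never appear, so only detected genes count.
counts = (
    transcripts.groupby(["cell_id", "feature_name"])
    .size()
    .rename("n_transcripts")  # hypothetical intermediate column name
    .reset_index()
)

# Step 2: average the per-gene counts within each cell.
mean_per_cell = (
    counts.groupby("cell_id")["n_transcripts"]
    .mean()
    .rename("mean_transcripts_per_gene")
    .reset_index()
)
```

In the toy data, cell 1 has three transcripts of gene A and one of gene B, giving a mean of 2.0; cell 2 has one transcript each of A and C, giving 1.0.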

segtraq.bl.baseline.morphological_features(sdata: SpatialData, tables_cell_id_key: str = 'cell_id', shapes_key: str = 'cell_boundaries', shapes_cell_id_key: str = 'cell_id', features_to_compute: list | None = None, n_jobs: int = -1, tables_key: str = 'table', inplace: bool = True)

Compute morphological features for cell shapes in a spatial transcriptomics dataset.

Parameters:

sdata (sd.SpatialData) – The SpatialData object containing cell shape information. Must have a .shapes attribute with geometries.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing cell IDs to match with shapes_cell_id_key. Default is “cell_id”.
shapes_key (str, optional) – Key in sdata.shapes for the element holding the cell geometries (default is “cell_boundaries”).
shapes_cell_id_key (str, optional) – Column in sdata.shapes[shapes_key] containing the unique cell identifiers (default is “cell_id”).
features_to_compute (list of str, optional) – List of morphological features to compute. If None, all available features are computed. Available features: “cell_area”, “perimeter”, “circularity”, “bbox_width”, “bbox_height”, “extent”, “solidity”, “convexity”, “elongation”, “eccentricity”, “compactness”, “num_polygons”.
n_jobs (int, optional) – Number of parallel jobs to use for computation. -1 uses all available CPUs (default is -1).
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

features – DataFrame containing the computed morphological features for each cell, indexed by shapes_cell_id_key.

Return type:

pandas.DataFrame

Raises:

ValueError – If any requested feature in features_to_compute is not recognized.

Notes

Requires geopandas, shapely, numpy, pandas, and joblib.
Some features are proxies or approximations (e.g., “sphericity” uses “circularity”).
Invalid or null geometries are filtered out before computation.
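
To give a feel for what some of the listed features measure, here is a small pure-Python sketch that computes area, perimeter, circularity, and bounding-box size for a simple polygon. The formulas are common textbook definitions (shoelace area, circularity as 4πA/P²) and are assumptions for illustration; the definitions SegTraQ actually applies are not confirmed here. The helper `polygon_features` is hypothetical and not part of the library.

```python
import math


def polygon_features(coords):
    """Features of a simple (non-self-intersecting) polygon given as (x, y) vertices."""
    n = len(coords)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x0 * y1 - x1 * y0  # shoelace term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {
        "cell_area": area,
        "perimeter": perimeter,
        # Assumed definition: 1.0 for a perfect circle, smaller otherwise.
        "circularity": 4.0 * math.pi * area / perimeter**2,
        "bbox_width": max(xs) - min(xs),
        "bbox_height": max(ys) - min(ys),
    }


# A 2 x 2 square as a stand-in for a cell outline.
square = polygon_features([(0, 0), (2, 0), (2, 2), (0, 2)])
```

For the square, the area is 4, the perimeter is 8, and the circularity under this definition is π/4.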

segtraq.bl.baseline.num_cells(sdata: SpatialData, tables_key: str = 'table', inplace: bool = True) → int

Counts the number of cells in the given SpatialData object based on the specified table key.

Parameters:

sdata (sd.SpatialData) – The SpatialData object containing spatial information and a table.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

The number of cells found under the specified table key.

Return type:

int

segtraq.bl.baseline.num_genes(sdata: SpatialData, points_key: str = 'transcripts', points_gene_key: str = 'feature_name', tables_key: str = 'table', inplace: bool = True) → int

Counts the number of unique genes in the given SpatialData object.

Parameters:

sdata (sd.SpatialData) – The SpatialData object containing gene information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
points_gene_key (str, optional) – The key to access gene names within the transcript data. Default is “feature_name”.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

The number of unique genes found in the specified SpatialData object.

Return type:

int

segtraq.bl.baseline.num_transcripts(sdata: SpatialData, points_key: str = 'transcripts', tables_key: str = 'table', inplace: bool = True) → int

Counts the total number of transcripts in the given SpatialData object.

Parameters:

sdata (sd.SpatialData) – The SpatialData object containing transcript information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

The total number of transcripts in the specified SpatialData object.

Return type:

int

segtraq.bl.baseline.perc_unassigned_transcripts(sdata: SpatialData, points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: int = -1, tables_key: str = 'table', inplace: bool = True) → float

Calculates the percentage of unassigned transcripts in a SpatialData object.

Parameters:

sdata (sd.SpatialData) – The spatial data object containing transcript information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
points_cell_id_key (str, optional) – The key to access cell assignment information within the transcript data. Default is “cell_id”.
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

The fraction of transcripts that are unassigned.

Return type:

float
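
The summary describes the result as a percentage while the Returns entry calls it a fraction, so the scale of the returned value is not settled by this page. The sketch below computes both on a toy table so the two can be told apart; it is a plain-pandas illustration, and the variable names are assumptions of the example.

```python
import pandas as pd

# Toy transcript table; -1 marks unassigned transcripts (documented default).
transcripts = pd.DataFrame({"cell_id": [1, 1, 2, -1, -1, 3, 2, 1]})
background_id = -1

# Boolean mask of unassigned transcripts; its mean is the unassigned share.
unassigned = transcripts["cell_id"] == background_id
fraction_unassigned = unassigned.mean()           # value in [0, 1]
percent_unassigned = 100.0 * fraction_unassigned  # value in [0, 100]
```

Two of the eight toy transcripts are unassigned, so the fraction is 0.25 and the percentage is 25.0.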

segtraq.bl.baseline.perc_unassigned_transcripts_per_gene(sdata: SpatialData, points_key: str = 'transcripts', points_gene_key: str = 'feature_name', points_cell_id_key: str = 'cell_id', points_background_id: int = -1, tables_key: str = 'table', inplace: bool = True) → DataFrame

Calculates the number and percentage of unassigned transcripts per gene in a SpatialData object.

Parameters:

sdata (sd.SpatialData) – The spatial data object containing transcript information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
points_gene_key (str, optional) – The key for gene names in the transcript data. Default is “feature_name”.
points_cell_id_key (str, optional) – The key for cell assignment information within the transcript data. Default is “cell_id”.
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, stores the resulting DataFrame in sdata.tables[tables_key].uns[“perc_unassigned_transcripts_per_gene”]. Default is True.

Returns:

A DataFrame indexed by gene name with columns:
‘total’ – total number of transcripts for the gene
‘unassigned’ – number of unassigned transcripts
‘perc_unassigned’ – percentage of unassigned transcripts

Return type:

pandas.DataFrame
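
A table with the three documented columns can be reproduced on toy data with a single pandas groupby. This is a sketch rather than the library's code, and it assumes that ‘perc_unassigned’ is expressed on a 0 to 100 scale.

```python
import pandas as pd

# Toy transcript table; -1 marks unassigned transcripts (documented default).
transcripts = pd.DataFrame(
    {
        "feature_name": ["A", "A", "A", "A", "B", "B"],
        "cell_id": [1, -1, 2, -1, 3, 3],
    }
)
background_id = -1

# Flag unassigned transcripts, then aggregate per gene:
# "size" counts all transcripts, "sum" counts the flagged ones.
per_gene = (
    transcripts.assign(is_unassigned=transcripts["cell_id"] == background_id)
    .groupby("feature_name")["is_unassigned"]
    .agg(total="size", unassigned="sum")
)

# Assumed scale: percentage between 0 and 100.
per_gene["perc_unassigned"] = 100.0 * per_gene["unassigned"] / per_gene["total"]
```

Gene A has four transcripts, two of them unassigned (50.0 percent); gene B has two transcripts, none unassigned (0.0 percent).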

segtraq.bl.baseline.transcript_density(sdata: SpatialData, tables_key: str = 'table', points_key: str = 'transcripts', tables_cell_id_key: str = 'cell_id', tables_area_volume_key: str = 'cell_area', inplace: bool = True) → DataFrame

Calculates the transcript density for each cell in a SpatialData object. Transcript density is defined as the number of transcripts per unit area for each cell.

Parameters:

sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
points_key (str, optional) – The key to access the transcript data within sdata.points. Default is “transcripts”.
tables_cell_id_key (str, optional) – The key in the table indicating cell identifiers. Default is “cell_id”.
tables_area_volume_key (str, optional) – The key in the table indicating the cell area/volume. Default is “cell_area”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

A DataFrame with columns [tables_cell_id_key, “transcript_density”], where “transcript_density” is the number of transcripts per unit area for each cell. Rows with missing values are dropped.

Return type:

pd.DataFrame
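
The definition above (transcripts per unit area, with incomplete rows dropped) can be sketched on toy data as follows. The two tables stand in for the transcript points and the cell table; how SegTraQ itself treats cells without transcripts is not stated on this page, so the handling below (they become missing values and are dropped) is an assumption of the sketch.

```python
import pandas as pd

# Toy stand-ins for the transcript points and the per-cell table.
transcripts = pd.DataFrame({"cell_id": [1, 1, 1, 1, 2, 2, 3]})
cells = pd.DataFrame(
    {"cell_id": [1, 2, 3, 4], "cell_area": [2.0, 4.0, None, 5.0]}
)

# Transcript count per cell.
counts = (
    transcripts.groupby("cell_id")
    .size()
    .rename("transcript_count")
    .reset_index()
)

# Join counts onto the cell table and divide by the area.
density = cells.merge(counts, on="cell_id", how="left")
density["transcript_density"] = density["transcript_count"] / density["cell_area"]

# Cell 3 has no area and cell 4 has no counted transcripts; both rows
# contain a missing value and are removed here.
density = density[["cell_id", "transcript_density"]].dropna()
```

Cell 1 yields 4 / 2.0 = 2.0 and cell 2 yields 2 / 4.0 = 0.5; the other two cells are dropped.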

segtraq.bl.baseline.transcripts_per_cell(sdata: SpatialData, tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', tables_key: str = 'table', inplace: bool = True) → DataFrame

Counts the number of transcripts assigned to each cell.

Parameters:

sdata (sd.SpatialData) – A SpatialData object containing transcript and cell assignment information.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing cell IDs to match with points_cell_id_key. Default is “cell_id”.
points_key (str, optional) – The key in sdata.points corresponding to transcript data. Default is “transcripts”.
points_cell_id_key (str, optional) – The column name in the transcript data that contains cell assignment information. Default is “cell_id”.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.

Returns:

A DataFrame with two columns: the cell identifier (cell_key) and the corresponding transcript count (“transcript_count”).

Return type:

pd.DataFrame
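
A per-cell transcript count of this shape can be sketched with a pandas groupby. The example assumes that transcripts labelled -1 are unassigned and should be left out; whether the library applies such a filter is not stated on this page, so treat that step as an assumption.

```python
import pandas as pd

# Toy transcript table using the documented default column name.
transcripts = pd.DataFrame({"cell_id": [1, 1, 2, -1, 1, 2]})
background_id = -1  # assumed label for unassigned transcripts

# Keep only transcripts that belong to a cell, then count rows per cell.
assigned = transcripts[transcripts["cell_id"] != background_id]
transcripts_per_cell = (
    assigned.groupby("cell_id")
    .size()
    .rename("transcript_count")
    .reset_index()
)
```

The toy data gives three transcripts for cell 1 and two for cell 2.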
