The baseline (bl) accessor
The baseline accessor provides basic quality control metrics such as the number of cells, the number of genes, and the number of detected genes per cell.
- segtraq.bl.baseline.genes_per_cell(sdata, tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_gene_key: str = 'feature_name', points_background_id: int = -1, tables_key: str = 'table', inplace: bool = True) → DataFrame
Calculates the number of unique genes detected per cell (excluding unassigned transcripts).
- Parameters:
sdata (object) – An object containing spatial transcriptomics data with a points attribute.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing the cell IDs used to match table rows to cells (default is “cell_id”).
points_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
points_gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
A DataFrame with one row per cell, containing the cell identifier and the count of unique genes detected in that cell.
- Return type:
pandas.DataFrame
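A minimal pandas sketch of the per-cell unique-gene count, assuming the transcripts have been loaded into memory as a pandas DataFrame (for example with `sdata.points["transcripts"].compute()`) and use the default `cell_id` and `feature_name` columns. The helper `unique_genes_per_cell` and its output column `num_genes` are hypothetical; this is not segtraq's implementation.

```python
import pandas as pd


def unique_genes_per_cell(
    transcripts: pd.DataFrame,
    cell_id_key: str = "cell_id",
    gene_key: str = "feature_name",
    background_id: int = -1,
) -> pd.DataFrame:
    """Hypothetical helper: number of distinct genes per cell, ignoring background."""
    # Drop transcripts carrying the background (unassigned) ID.
    assigned = transcripts[transcripts[cell_id_key] != background_id]
    # nunique() counts each gene once per cell, however many transcripts it has;
    # observed=True avoids empty groups if the cell ID column is categorical.
    counts = assigned.groupby(cell_id_key, observed=True)[gene_key].nunique()
    return counts.rename("num_genes").reset_index()


transcripts = pd.DataFrame(
    {
        "cell_id": [1, 1, 1, 2, 2, -1],
        "feature_name": ["A", "A", "B", "C", "C", "A"],
    }
)
result = unique_genes_per_cell(transcripts)
print(result)
```

Cell 1 has two distinct genes (A twice, B once) and cell 2 has one; the background transcript is ignored.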
- segtraq.bl.baseline.mean_transcripts_per_gene_per_cell(sdata: SpatialData, tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_gene_key: str = 'feature_name', points_background_id: int = -1, tables_key: str = 'table', inplace: bool = True) → DataFrame
Computes the mean number of transcripts per gene per cell (excluding unassigned transcripts).
Transcripts are first counted per (cell, gene). Then, for each cell, we compute the mean of these per-gene transcript counts across genes detected in that cell.
Notes
This mean is computed across detected genes only (i.e., genes with at least one transcript in the cell). Genes with zero transcripts in a cell are not included.
- Parameters:
sdata (object) – An object containing spatial transcriptomics data with a points attribute.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing the cell IDs used to match table rows to cells (default is “cell_id”).
points_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
points_gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
A DataFrame with one row per cell containing the mean transcripts per detected gene: columns are [points_cell_id_key, “mean_transcripts_per_gene”].
- Return type:
pd.DataFrame
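The two-step computation described above can be sketched in pandas as follows. This assumes an in-memory transcript DataFrame with the default column names; `mean_transcripts_per_detected_gene` is a hypothetical helper, while the output column name follows the Returns description above. Passing `observed=True` keeps a categorical gene column from producing zero-count (cell, gene) rows, which would otherwise pull the mean down.

```python
import pandas as pd


def mean_transcripts_per_detected_gene(
    transcripts: pd.DataFrame,
    cell_id_key: str = "cell_id",
    gene_key: str = "feature_name",
    background_id: int = -1,
) -> pd.DataFrame:
    """Hypothetical helper following the two steps described above."""
    assigned = transcripts[transcripts[cell_id_key] != background_id]
    # Step 1: transcripts per (cell, gene). Only observed pairs are kept, so
    # genes with zero transcripts in a cell never enter that cell's mean.
    per_gene = assigned.groupby([cell_id_key, gene_key], observed=True).size()
    # Step 2: average the per-gene counts over the genes detected in each cell.
    mean_counts = per_gene.groupby(level=cell_id_key).mean()
    return mean_counts.rename("mean_transcripts_per_gene").reset_index()


transcripts = pd.DataFrame(
    {
        "cell_id": [1, 1, 1, 1, 2, -1, -1],
        "feature_name": ["A", "A", "A", "B", "C", "A", "B"],
    }
)
result = mean_transcripts_per_detected_gene(transcripts)
print(result)
```

Cell 1 has per-gene counts of 3 (A) and 1 (B), giving a mean of 2.0; cell 2 has a single transcript of C, giving 1.0.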
- segtraq.bl.baseline.morphological_features(sdata: SpatialData, tables_cell_id_key: str = 'cell_id', tables_centroid_x_key: str = 'centroid_x', tables_centroid_y_key: str = 'centroid_y', shapes_key: str = 'cell_boundaries', features_to_compute: list | None = None, n_jobs: int = -1, tables_key: str = 'table', inplace: bool = True)
Compute morphological features for cell shapes in a spatial transcriptomics dataset.
- Parameters:
sdata (object) – Spatial data object containing cell shape information. Must have a .shapes attribute with geometries.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing cell IDs to match with the sdata.shapes[shapes_key] index (default is “cell_id”).
tables_centroid_x_key (str, optional) – Column in sdata.tables[tables_key].obs to store the x-coordinate of the centroid (default is “centroid_x”).
tables_centroid_y_key (str, optional) – Column in sdata.tables[tables_key].obs to store the y-coordinate of the centroid (default is “centroid_y”).
shapes_key (str, optional) – Key in sdata.shapes for the element holding the cell boundary geometries (default is “cell_boundaries”).
features_to_compute (list of str, optional) – List of morphological features to compute. If None, all available features are computed. Available features: “centroid”, “cell_area”, “perimeter”, “circularity”, “bbox_width”, “bbox_height”, “extent”, “solidity”, “convexity”, “elongation”, “eccentricity”, “compactness”, “num_polygons”.
n_jobs (int, optional) – Number of parallel jobs to use for computation. -1 uses all available CPUs (default is -1).
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
features – DataFrame containing the computed morphological features for each cell, indexed by sdata[shapes_key].index.name.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If any requested feature in features_to_compute is not recognized.
Notes
Requires geopandas, shapely, numpy, pandas, and joblib.
Some features are proxies or approximations (e.g., “sphericity” uses “circularity”).
Invalid or null geometries are filtered out before computation.
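To make a few of the listed features concrete, the sketch below computes area, perimeter, and circularity for a single polygon from its vertex coordinates with NumPy. It uses the shoelace formula and the common definition circularity = 4πA/P², which equals 1 for a circle; whether segtraq uses exactly these definitions is an assumption, and `polygon_shape_features` is a hypothetical helper. For shapely polygons, `polygon.area` and `polygon.length` give the area and perimeter directly.

```python
import numpy as np


def polygon_shape_features(vertices: np.ndarray) -> dict:
    """Hypothetical helper: area, perimeter and circularity of a simple polygon.

    `vertices` is an (n, 2) array of x, y coordinates in boundary order,
    without repeating the first vertex at the end.
    """
    x, y = vertices[:, 0], vertices[:, 1]
    # Shoelace formula; abs() makes the result independent of vertex orientation.
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    # Perimeter: sum of edge lengths, including the closing edge back to the start.
    edges = np.roll(vertices, -1, axis=0) - vertices
    perimeter = np.linalg.norm(edges, axis=1).sum()
    # Circularity 4*pi*A / P**2 is 1 for a circle and smaller for other shapes.
    circularity = 4 * np.pi * area / perimeter**2
    return {"cell_area": area, "perimeter": perimeter, "circularity": circularity}


square = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
print(polygon_shape_features(square))
```

A 2 × 2 square has area 4 and perimeter 8, so its circularity is π/4; a finely sampled circle approaches 1.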
- segtraq.bl.baseline.num_cells(sdata: SpatialData, tables_key: str = 'table', inplace: bool = True) → int
Counts the number of cells in the given SpatialData object based on the specified table key.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing spatial information and a table.
tables_key (str, optional) – The key of the AnnData table in sdata.tables whose cells are counted. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
The number of cells found under the specified table key.
- Return type:
int
- segtraq.bl.baseline.num_genes(sdata: SpatialData, points_key: str = 'transcripts', points_gene_key: str = 'feature_name', tables_key: str = 'table', inplace: bool = True) → int
Counts the number of unique genes in the given SpatialData object.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing gene information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
points_gene_key (str, optional) – The key to access gene names within the transcript data. Default is “feature_name”.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
The number of unique genes found in the specified SpatialData object.
- Return type:
int
- segtraq.bl.baseline.num_transcripts(sdata: SpatialData, points_key: str = 'transcripts', tables_key: str = 'table', inplace: bool = True) → int
Counts the total number of transcripts in the given SpatialData object.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing transcript information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
The total number of transcripts in the specified SpatialData object.
- Return type:
int
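The three dataset-level counts (num_cells, num_genes, num_transcripts) reduce to one-liners over an in-memory `obs` table and transcript DataFrame. This sketch assumes one `obs` row per cell (for an AnnData table, `sdata.tables["table"].n_obs` gives the same number) and, as a labeled assumption, counts genes over all transcripts regardless of cell assignment; `global_counts` is a hypothetical helper.

```python
import pandas as pd


def global_counts(
    obs: pd.DataFrame, transcripts: pd.DataFrame, gene_key: str = "feature_name"
) -> dict:
    """Hypothetical helper mirroring num_cells, num_genes and num_transcripts."""
    return {
        # Assumes one obs row per cell.
        "num_cells": len(obs),
        # Assumption: distinct genes over all transcripts, assigned or not.
        "num_genes": int(transcripts[gene_key].nunique()),
        # Each row of the points table is one transcript.
        "num_transcripts": len(transcripts),
    }


obs = pd.DataFrame({"cell_id": [1, 2, 3]})
transcripts = pd.DataFrame(
    {"cell_id": [1, 1, 2, -1, 3], "feature_name": ["A", "B", "A", "C", "A"]}
)
counts = global_counts(obs, transcripts)
print(counts)
```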
- segtraq.bl.baseline.perc_unassigned_transcripts(sdata: SpatialData, points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: int = -1, tables_key: str = 'table', inplace: bool = True) → float
Calculates the percentage of unassigned transcripts in a SpatialData object.
- Parameters:
sdata (sd.SpatialData) – The spatial data object containing transcript information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
points_cell_id_key (str, optional) – The key to access cell assignment information within the transcript data. Default is “cell_id”.
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
The fraction of transcripts that are unassigned.
- Return type:
float
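Because the entry above describes the return value as a fraction, the sketch below returns a fraction as well; multiply by 100 to express it as a percentage. It assumes an in-memory transcript DataFrame with the default `cell_id` column, and `unassigned_fraction` is a hypothetical helper.

```python
import pandas as pd


def unassigned_fraction(
    transcripts: pd.DataFrame, cell_id_key: str = "cell_id", background_id: int = -1
) -> float:
    """Hypothetical helper: fraction of transcripts not assigned to any cell."""
    # True where the transcript carries the background ID.
    is_unassigned = transcripts[cell_id_key] == background_id
    # The mean of a boolean Series is the share of True values.
    return float(is_unassigned.mean())


transcripts = pd.DataFrame({"cell_id": [1, -1, 2, -1]})
fraction = unassigned_fraction(transcripts)
print(fraction, 100 * fraction)
```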
- segtraq.bl.baseline.perc_unassigned_transcripts_per_gene(sdata: SpatialData, points_key: str = 'transcripts', points_gene_key: str = 'feature_name', points_cell_id_key: str = 'cell_id', points_background_id: int = -1, tables_key: str = 'table', inplace: bool = True) → DataFrame
Calculates the number and percentage of unassigned transcripts per gene in a SpatialData object.
- Parameters:
sdata (sd.SpatialData) – The spatial data object containing transcript information.
points_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
points_gene_key (str, optional) – The key for gene names in the transcript data. Default is “feature_name”.
points_cell_id_key (str, optional) – The key for cell assignment information within the transcript data. Default is “cell_id”.
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, stores the resulting DataFrame in sdata.tables[tables_key].uns[“perc_unassigned_transcripts_per_gene”]. Default is True.
- Returns:
A DataFrame indexed by gene name with the columns ‘total’ (total number of transcripts for the gene), ‘unassigned’ (number of unassigned transcripts), and ‘perc_unassigned’ (percentage of unassigned transcripts).
- Return type:
pandas.DataFrame
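A pandas sketch of the per-gene breakdown, producing the three columns listed above with `perc_unassigned` on a 0 to 100 scale. It assumes an in-memory transcript DataFrame with the default column names, uses the hypothetical helper name `unassigned_per_gene`, and omits storing the result in `uns`.

```python
import pandas as pd


def unassigned_per_gene(
    transcripts: pd.DataFrame,
    gene_key: str = "feature_name",
    cell_id_key: str = "cell_id",
    background_id: int = -1,
) -> pd.DataFrame:
    """Hypothetical helper: total, unassigned and percent unassigned per gene."""
    is_unassigned = transcripts[cell_id_key] == background_id
    # Group the boolean mask by gene name; the results are indexed by gene.
    grouped = is_unassigned.groupby(transcripts[gene_key], observed=True)
    result = pd.DataFrame(
        {
            "total": grouped.size(),
            "unassigned": grouped.sum().astype(int),
        }
    )
    result["perc_unassigned"] = 100 * result["unassigned"] / result["total"]
    return result


transcripts = pd.DataFrame(
    {
        "cell_id": [1, -1, -1, -1, 2],
        "feature_name": ["A", "A", "B", "B", "B"],
    }
)
per_gene = unassigned_per_gene(transcripts)
print(per_gene)
```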
- segtraq.bl.baseline.transcript_density(sdata: SpatialData, tables_key: str = 'table', tables_cell_id_key: str = 'cell_id', tables_area_key: str = 'cell_area', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: int = -1, inplace: bool = True) → DataFrame
Calculates the transcript density for each cell in a SpatialData object. Transcript density is defined as the number of transcripts per unit area for each cell.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
tables_cell_id_key (str, optional) – The column in the table containing cell identifiers. Default is “cell_id”.
tables_area_key (str, optional) – The column in the table containing cell areas. Default is “cell_area”.
points_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
points_cell_id_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
A DataFrame with a cell identifier column and a “transcript_density” column, where “transcript_density” is the number of transcripts per unit area for each cell. Rows with missing values are dropped.
- Return type:
pd.DataFrame
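Transcript density can be sketched as assigned-transcript counts joined to per-cell areas. This sketch assumes the transcript table and the `obs` table share one cell ID column name (`cell_id`) and that areas sit in the `cell_area` column; cells with no assigned transcripts or a missing area are left out, which may differ from segtraq's handling. `density_per_cell` is a hypothetical helper.

```python
import pandas as pd


def density_per_cell(
    transcripts: pd.DataFrame,
    obs: pd.DataFrame,
    cell_id_key: str = "cell_id",
    area_key: str = "cell_area",
    background_id: int = -1,
) -> pd.DataFrame:
    """Hypothetical helper: assigned transcripts per unit cell area."""
    assigned = transcripts[transcripts[cell_id_key] != background_id]
    counts = assigned.groupby(cell_id_key).size().rename("transcript_count").reset_index()
    # Inner join: only cells present in both the transcripts and obs are kept.
    merged = counts.merge(obs[[cell_id_key, area_key]], on=cell_id_key, how="inner")
    merged["transcript_density"] = merged["transcript_count"] / merged[area_key]
    # A missing area gives a NaN density; drop such rows, as described above.
    return merged[[cell_id_key, "transcript_density"]].dropna().reset_index(drop=True)


transcripts = pd.DataFrame({"cell_id": [1, 1, 2, -1]})
obs = pd.DataFrame({"cell_id": [1, 2, 3], "cell_area": [2.0, float("nan"), 5.0]})
density = density_per_cell(transcripts, obs)
print(density)
```

Cell 1 has two transcripts over an area of 2.0, cell 2 is dropped because its area is missing, and cell 3 has no assigned transcripts.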
- segtraq.bl.baseline.transcripts_per_cell(sdata: SpatialData, tables_cell_id_key: str = 'cell_id', points_key: str = 'transcripts', points_cell_id_key: str = 'cell_id', points_background_id: int = -1, tables_key: str = 'table', inplace: bool = True) → DataFrame
Counts the number of transcripts assigned to each cell (excluding unassigned transcripts).
- Parameters:
sdata (sd.SpatialData) – A SpatialData object containing transcript and cell assignment information.
tables_cell_id_key (str, optional) – Column in sdata.tables[tables_key].obs containing the cell IDs used to match table rows to cells (default is “cell_id”).
points_key (str, optional) – The key in sdata.points corresponding to transcript data. Default is “transcripts”.
points_cell_id_key (str, optional) – The column name in the transcript data that contains cell assignment information. Default is “cell_id”.
points_background_id (int, optional) – The value indicating an unassigned transcript. Default is -1.
tables_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
inplace (bool, optional) – If True, modifies the SpatialData object in place. Default is True.
- Returns:
A DataFrame with two columns: the cell identifier and the corresponding transcript count (“transcript_count”).
- Return type:
pd.DataFrame
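A minimal sketch of the per-cell transcript count on an in-memory transcript DataFrame with the default `cell_id` column; the `transcript_count` column name follows the Returns description above, and `count_transcripts_per_cell` is a hypothetical helper.

```python
import pandas as pd


def count_transcripts_per_cell(
    transcripts: pd.DataFrame, cell_id_key: str = "cell_id", background_id: int = -1
) -> pd.DataFrame:
    """Hypothetical helper: number of assigned transcripts per cell."""
    # Exclude unassigned transcripts before counting.
    assigned = transcripts[transcripts[cell_id_key] != background_id]
    # size() counts rows per group, i.e. transcripts per cell.
    return assigned.groupby(cell_id_key).size().rename("transcript_count").reset_index()


transcripts = pd.DataFrame({"cell_id": [3, 3, 5, -1, 3]})
result = count_transcripts_per_cell(transcripts)
print(result)
```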