The baseline (bl) accessor
The baseline accessor provides basic quality control metrics such as the number of cells, the number of genes, and the number of detected genes per cell.
- segtraq.bl.baseline.genes_per_cell(sdata, transcript_key='transcripts', cell_key='cell_id', gene_key='feature_name')
Calculates the number of unique genes detected per cell.
- Parameters:
sdata (object) – An object containing spatial transcriptomics data with a points attribute.
transcript_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).
cell_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).
gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).
- Returns:
A DataFrame with one row per cell, containing the cell identifier and the count of unique genes detected in that cell.
- Return type:
pandas.DataFrame
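The counting described above can be illustrated with a minimal standard-library sketch. It is not segtraq's implementation: the `transcripts` records and the helper `unique_genes_per_cell` are hypothetical stand-ins for the transcript table and for whatever grouping the library performs internally.

```python
from collections import defaultdict

# Hypothetical transcript records: (cell_id, feature_name) pairs standing in
# for the rows of the transcript table.
transcripts = [
    (1, "ACTB"),
    (1, "ACTB"),
    (1, "GAPDH"),
    (2, "CD3E"),
    (2, "CD3E"),
]


def unique_genes_per_cell(rows):
    """Map each cell identifier to the number of distinct genes seen in it."""
    genes_by_cell = defaultdict(set)
    for cell_id, gene in rows:
        # A set keeps each gene once, however many transcripts it has.
        genes_by_cell[cell_id].add(gene)
    return {cell_id: len(genes) for cell_id, genes in genes_by_cell.items()}


gene_counts = unique_genes_per_cell(transcripts)
print(gene_counts)  # {1: 2, 2: 1}
```

Cell 1 has three transcripts but only two distinct genes, which is the difference between this metric and a plain transcript count.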
- segtraq.bl.baseline.morphological_features(sdata, shape_key: str = 'cell_boundaries', id_key: str = 'cell_id', features_to_compute: list = None, n_jobs: int = -1)
Compute morphological features for cell shapes in a spatial transcriptomics dataset.
- Parameters:
sdata (object) – Spatial data object containing cell shape information. Must have a .shapes attribute with geometries.
shape_key (str, optional) – Key in sdata.shapes specifying the geometry column (default is “cell_boundaries”).
id_key (str, optional) – Key in sdata.shapes specifying the unique cell identifier column (default is “cell_id”).
features_to_compute (list of str, optional) – List of morphological features to compute. If None, all available features are computed. Available features: “cell_area”, “perimeter”, “circularity”, “bbox_width”, “bbox_height”, “extent”, “solidity”, “convexity”, “elongation”, “eccentricity”, “compactness”, “sphericity”.
n_jobs (int, optional) – Number of parallel jobs to use for computation. -1 uses all available CPUs (default is -1).
- Returns:
features – DataFrame containing the computed morphological features for each cell, indexed by id_key.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If any requested feature in features_to_compute is not recognized.
Notes
Requires geopandas, shapely, numpy, pandas, and joblib.
Some features are proxies or approximations (e.g., “sphericity” uses “circularity”).
Invalid or null geometries are filtered out before computation.
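As an illustration of what features such as “cell_area”, “perimeter”, and “circularity” commonly measure, here is a small standard-library sketch. It assumes the widely used definition of circularity, 4πA/P²; the page does not state the exact formulas segtraq applies, and the helpers `polygon_area_perimeter` and `circularity` are hypothetical names used only for this example.

```python
import math


def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0  # shoelace term for this edge
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def circularity(area, perimeter):
    """Assumed definition: 4*pi*A / P**2, which equals 1 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2


# A 2 x 2 square as a stand-in for a cell boundary.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
area, perimeter = polygon_area_perimeter(square)
print(area, perimeter)  # 4.0 8.0
print(round(circularity(area, perimeter), 4))  # 0.7854
```

Under this definition a square scores π/4, and more irregular outlines score lower still.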
- segtraq.bl.baseline.num_cells(sdata: SpatialData, shape_key: str = 'cell_boundaries') -> int
Counts the number of cells in the given SpatialData object based on the specified shape key.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing spatial information and cell boundaries.
shape_key (str, optional) – The key in the shapes attribute of sdata that corresponds to cell boundaries. Default is “cell_boundaries”.
- Returns:
The number of cells found under the specified shape key.
- Return type:
int
- segtraq.bl.baseline.num_genes(sdata: SpatialData, transcript_key: str = 'transcripts', gene_key: str = 'feature_name') -> int
Counts the number of unique genes in the given SpatialData object.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing gene information.
transcript_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
gene_key (str, optional) – The key to access gene names within the transcript data. Default is “feature_name”.
- Returns:
The number of unique genes found in the specified SpatialData object.
- Return type:
int
- segtraq.bl.baseline.num_transcripts(sdata: SpatialData, transcript_key: str = 'transcripts')
Counts the total number of transcripts in the given SpatialData object.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing transcript information.
transcript_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
- Returns:
The total number of transcripts in the specified SpatialData object.
- Return type:
int
- segtraq.bl.baseline.perc_unassigned_transcripts(sdata: SpatialData, transcript_key: str = 'transcripts', cell_key: str = 'cell_id', unassigned_key: int = -1) -> float
Calculates the proportion of unassigned transcripts in a SpatialData object.
- Parameters:
sdata (sd.SpatialData) – The spatial data object containing transcript information.
transcript_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.
cell_key (str, optional) – The key to access cell assignment information within the transcript data. Default is “cell_id”.
unassigned_key (int, optional) – The value indicating an unassigned transcript. Default is -1.
- Returns:
The fraction of transcripts that are unassigned.
- Return type:
float
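The documented return value is a fraction, so a result of 0.25 would mean that a quarter of the transcripts are unassigned. The following sketch shows that calculation in plain Python; the helper `fraction_unassigned` and the sample `cell_ids` are hypothetical and only mirror the documented default sentinel of -1.

```python
def fraction_unassigned(cell_ids, unassigned_value=-1):
    """Return the share of entries equal to the unassigned sentinel value."""
    cell_ids = list(cell_ids)
    if not cell_ids:
        # Assumption for this sketch: an empty input is treated as an error
        # rather than returning 0 or NaN.
        raise ValueError("no transcripts to evaluate")
    unassigned = sum(1 for cell_id in cell_ids if cell_id == unassigned_value)
    return unassigned / len(cell_ids)


# Hypothetical per-transcript cell assignments; -1 marks "no cell".
cell_ids = [1, 1, -1, 2, -1, 3, 3, 3]
fraction = fraction_unassigned(cell_ids)
print(fraction)  # 0.25
```

Multiply the result by 100 if a percentage is needed for reporting.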
- segtraq.bl.baseline.transcript_density(sdata: SpatialData, table_key: str = 'table', transcript_key: str = 'transcripts', cell_key: str = 'cell_id') -> DataFrame
Calculates the transcript density for each cell in a SpatialData object. Transcript density is defined as the number of transcripts per unit area for each cell.
- Parameters:
sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.
table_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.
transcript_key (str, optional) – The key in the transcript table indicating transcript identifiers. Default is “transcripts”.
cell_key (str, optional) – The key in the table indicating cell identifiers. Default is “cell_id”.
- Returns:
A DataFrame with columns [cell_key, “transcript_density”], where “transcript_density” is the number of transcripts per unit area for each cell. Rows with missing values are dropped.
- Return type:
pd.DataFrame
Notes
Requires that the input AnnData table contains a “cell_area” column in .obs.
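The definition above (transcripts per unit area, with incomplete rows dropped) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: `transcript_cell_ids`, `cell_area`, and `density_per_cell` are hypothetical names, and skipping non-positive areas is an extra guard added for this example.

```python
from collections import Counter

# Hypothetical inputs: one cell id per transcript, and an area per cell.
transcript_cell_ids = [1, 1, 1, 2, 2, 3]
cell_area = {1: 6.0, 2: 4.0, 3: None}  # cell 3 has no recorded area


def density_per_cell(cell_ids, areas):
    """Return {cell_id: transcripts / area}, skipping cells without a usable area."""
    counts = Counter(cell_ids)
    density = {}
    for cell_id, count in counts.items():
        area = areas.get(cell_id)
        # Missing areas are dropped, mirroring "rows with missing values are
        # dropped"; the area <= 0 check is an assumption of this sketch.
        if area is None or area <= 0:
            continue
        density[cell_id] = count / area
    return density


densities = density_per_cell(transcript_cell_ids, cell_area)
print(densities)  # {1: 0.5, 2: 0.5}
```

Cells 1 and 2 differ in size and transcript count but end up with the same density, which is why this metric is useful alongside raw counts.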
- segtraq.bl.baseline.transcripts_per_cell(sdata: SpatialData, transcript_key: str = 'transcripts', cell_key: str = 'cell_id') -> DataFrame
Counts the number of transcripts assigned to each cell.
- Parameters:
sdata (sd.SpatialData) – A SpatialData object containing transcript and cell assignment information.
transcript_key (str, optional) – The key in sdata.points corresponding to transcript data. Default is “transcripts”.
cell_key (str, optional) – The column name in the transcript data that contains cell assignment information. Default is “cell_id”.
- Returns:
A DataFrame with two columns: the cell identifier (cell_key) and the corresponding transcript count (“transcript_count”).
- Return type:
pd.DataFrame